To Buyers of 2013 Exam 4 / Exam C Study Guides
Howard C. Mahler, FCAS, MAAA

How much detail is needed and how many problems need to be done varies by person and topic. In order to help you to concentrate your efforts:
1. About 1/6 of the many problems are labeled “highly recommended”, while another 1/6 are labeled “recommended.”
2. Important Sections are listed in bold in the table of contents. Extremely important Sections are listed in larger type and in bold.
3. Important ideas and formulas are in bold.
4. Each Study Guide has a Section of Important Ideas and Formulas.
5. Each Study Guide has a chart of past exam questions by Section.
6. There is a breakdown of percent of questions on each past exam by Study Guide.

My Study Aids are a thick stack of paper.1 However, many students find they do not need to look at the textbooks. For those who have trouble getting through the material, concentrate on the introductions and sections in bold.

Highly Recommended problems (about 1/6 of the total) are double underlined. Recommended problems (about 1/6 of the total) are underlined. Do at least the Highly Recommended problems your first time through. It is important that you do problems when learning a subject and then some more problems a few weeks later. Be sure to do all the questions from the recent Course 4 Exams at some point. I have written some easy and tougher problems.2 The former exam questions are arranged in chronological order. The more recent exam questions are on average more similar to what you will be asked on your exam than are less recent questions.

All of the 2009 Sample Exam questions are included (there were 289 prior to deletions). Their locations are shown in my final study guide, Breakdown of Past Exams.

Each of my study guides is divided into sections as shown in its table of contents. The solutions to the problems in a section of a study guide are at the end of that section.
Footnote 1: The number of pages is not as important as how long it takes you to understand the material. One page in a textbook might take someone as long to understand as ten pages in my Study Guides.
Footnote 2: Points are based on 100 points = a 4 hour exam.
In the electronic version use the bookmarks / table of contents in the Navigation Panel in order to help you find what you want. The Find function will also help. You may find it helpful to print out selected portions, such as the Table of Contents and Important Ideas Section in each of my study guides. Mahlerʼs Guides for Joint Exam 4/C have 14 parts, which are listed below, along with my estimated percent of the exam.3
Study Guides for Joint Exam C / Exam 4:

 1.  Mahler's Guide to Frequency Distributions                       6%
 2.  Mahler's Guide to Loss Distributions                            8%
 3.  Mahler's Guide to Aggregate Distributions                       7%
 4.  Mahler's Guide to Risk Measures                                 3%
 5.  Mahler's Guide to Fitting Frequency Distributions               4%
 6.  Mahler's Guide to Fitting Loss Distributions                   26%
 7.  Mahler's Guide to Survival Analysis                            10%
 8.  Mahler's Guide to Classical Credibility                         3%
 9.  Mahler's Guide to Buhlmann Credibility & Bayesian Analysis     14%
10.  Mahler's Guide to Conjugate Priors                              7%
11.  Mahler's Guide to Semiparametric Estimation                     1%
12.  Mahler's Guide to Empirical Bayesian Credibility                3%
13.  Mahler's Guide to Simulation                                    8%
14.  Breakdown of Past Exams
My Practice Exams are sold separately.
Footnote 3: This is my best estimate, which should be used with appropriate caution, particularly in light of the changes in the syllabus. In any case, the number of questions by topic varies from exam to exam.
Author Biography: Howard C. Mahler is a Fellow of the Casualty Actuarial Society, and a Member of the American Academy of Actuaries. He has taught actuarial exam seminars and published study guides since 1994. He spent over 20 years in the insurance industry, the last 15 as Vice President and Actuary at the Workers' Compensation Rating and Inspection Bureau of Massachusetts. He has published dozens of major research papers and won the 1987 CAS Dorweiler prize. He served 12 years on the CAS Examination Committee including three years as head of the whole committee (1990-1993). Mr. Mahler has taught live seminars for Joint Exam 4/C, Joint Exam MFE/3F, CAS Exam 3L, CAS Exam 5, and what is now CAS Exam 8. He has written study guides for all of the above. Mr. Mahler teaches weekly classes in Boston for Joint Exam 4/C, and Joint Exam MFE/3F.
[email protected] www.howardmahler.com/Teaching
Loss Models, 3rd Edition                      Mahler Study Guides

Chapter 3.1                       Loss Distributions: Sections 2-4, 7-8.
Chapter 3.2                       Loss Distributions: Section 19.
Chapter 3.3                       Freq. Dists.: Section 9, Aggregate Dists.: Sections 4-5
Chapter 3.4                       Loss Distributions: Sections 30, 33, 34.
Chapter 3.5                       Risk Measures
Chapter 4                         Loss Distributions: Sections 21, 38.
Chapter 5.2                       Loss Distributions: Sections 29, 39, 40.
Chapter 5.3                       Loss Distributions: Sections 22, 24-28.
Chapter 5.4                       Conjugate Priors: Section 11.
Chapters 6.1-6.5, 6.7             Frequency Distributions: Sections 1-6, 9, 11-14, 19.
Chapter 8                         Loss Dists.: Sections 6, 15-18, 36, Freq Dists.: Sections 3-6
Chapter 9.1-9.7, 9.11.1-9.11.2 (Footnote 4)    Aggregate Distributions
Chapter 12                        Fitting Loss Dists.: Sections 14, 26. Freq. Dists.: Section 7.
Chapter 13                        Loss Dists.: Section 4, Fitting Loss Dists.: Sections 5-6, Survival Analysis: Sections 1, 5.
Chapter 14                        Survival Analysis: Sections 1, 2, 3, 4, 6, 7, 9, Loss Dists.: Sections 16-17, Fitting Loss Dists.: Sections 5, 6.
Chapter 15.1-15.4                 Fit Loss Dists.: Secs. 7-11, 20-25, 27-31, Surv. Anal.: Sec. 8
Chapter 15.5                      Buhlmann Cred.: Sections 4-6, 16, Conj. Priors: Section 10
Chapter 15.6 (Footnote 5)         Fitting Freq. Dists.: Sections 2, 3, 6.
Chapter 16                        Fitting Freq. Dists.: Secs 4-5, Fitting Loss Dists.: Secs 12-19.
Chapter 20.2                      Classical Credibility
Chapter 20.3 (Footnote 6)         Buhlmann Credibility, Conjugate Priors.
Chapter 20.4 (Footnote 7)         Empirical Bayes, Semiparametric Estimation
Chapter 21 (Footnote 8)           Simulation

Footnote 4: Excluding 9.6.1 and examples 9.9 and 9.11.
Footnote 5: Sections 15.6.1-15.6.4, 15.6.6 only.
Footnote 6: Excluding 20.3.8.
Footnote 7: Excluding 20.4.3.
Footnote 8: Sections 21.1-21.2, excluding 21.2.4.
Besides many past exam questions from the CAS and SOA, my study guides include some past questions from exams given by the Institute of Actuaries and Faculty of Actuaries in Great Britain. These questions are copyright by the Institute of Actuaries and Faculty of Actuaries, and are reproduced here solely to aid students studying for actuarial exams. These IOA questions are somewhat different in format than those on your exam, but should provide some additional perspective on the syllabus material.

Your exam will be 3.5 hours and will consist of approximately 35 multiple choice questions, each of equal value.9 The examination will be offered via computer-based testing. Download from the CAS website or SOA website a copy of the tables to be attached to your exam.10

Read the “Hints on Study and Exam Techniques” in the CAS Syllabus.11 Read “Tips for Taking Exams.”12

Some students have reported success with the following guessing strategy. When you are ready to guess (a few minutes before time is finished for the exam), count up how many you have answered of each letter. Then fill in the least used letter, at each stage. For example, if the fewest were A, fill in A's until some other letter is fewest. Now fill in that letter, etc. Remember that for every question you should fill in a letter answer.13

On Exam 4/C, the following rule applies to the use of the Normal Table: When using the normal distribution, choose the nearest z-value to find the probability, or if the probability is given, choose the nearest z-value. No interpolation should be used.
Example: If the given z-value is 0.759, and you need to find Pr(Z < 0.759) from the normal distribution table, then choose the probability value for z-value = 0.76; Pr(Z < 0.76) = 0.7764.
When using the Normal Approximation to a discrete distribution, use the continuity correction.
Footnote 9: Equivalent to “2.5 points” each, if 100 points = a 4 hour exam.
Footnote 10: http://www.soa.org and http://casact.org
Footnote 11: http://casact.org/admissions/syllabus/index.cfm?fa=hints
Footnote 12: www.casact.org/admissions/index.cfm?fa=tips
Footnote 13: Nothing will be added for an unanswered question and nothing will be subtracted for an incorrect answer. Therefore put down an answer, even a total guess, for every question.
I suggest you buy and use the TI-30XS Multiview calculator. You will save time doing repeated calculations using the same formula. Examples include calculating process variances to calculate an EPV, constructing a Distribution Table of a frequency distribution, simulating from the same continuous distribution several times, etc.

While studying, you should do as many problems as possible. Going back and forth between reading and doing problems is the only way to pass this exam. The only way to learn to solve problems is to solve lots of problems. You should not feel satisfied with your study of a subject until you can solve a reasonable number of the problems.

There are two manners in which you should be doing problems. First you can do problems in order to learn the material. Take as long on each problem as you need to fully understand the concepts and the solution. Reread the relevant syllabus material. Carefully go over the solution to see if you really know what to do. Think about what would happen if one or more aspects of the question were revised.14 This manner of doing problems should be gradually replaced by the following manner as you get closer to the exam.

The second manner is to do a series of problems under exam conditions, with the items you will have when you take the exam. Decide in advance on a number of points to try based on the time available. For example, if you have an uninterrupted hour, then one might try either 60/2.5 = 24 points or 60/3 = 20 points of problems. Do problems as you would on an exam in any order, skipping some and coming back to some, until you run out of time. I suggest you leave time to double check your work.

Expose yourself somewhat to everything on the syllabus. Concentrate on sections and items in bold. Do not read sections or material in italics your first time through the material.15 Each study guide has a chart of where the past exam questions have been; this may also help you to direct your efforts.16 Try not to get bogged down on a single topic. On hard subjects, try to learn at least the simplest important idea. The first time through, do enough problems in each section, but leave some problems in each section to do closer to the exam. At least every few weeks review the important ideas and formulas sections of those study guides you have already completed.

Make a schedule and stick to it. Spend a minimum of one hour every day. I recommend at least two study sessions every day, each of at least 1/2 hour.
Footnote 14: Some may also find it useful to read about a dozen questions on an important subject, thinking about how to set up the solution to each one, but only working out in detail any questions they do not quickly see how to solve.
Footnote 15: Material in italics is provided for those who want to know more about a particular subject and/or to be prepared for more challenging exam questions. Material in italics could be directly needed to answer perhaps one or two questions on an exam.
Footnote 16: While this may indicate what ideas questions on your exam are likely to cover, every exam contains a few questions on ideas that have yet to be asked.
Use whatever order to go through the material that works best for you. Here is a schedule that may work for some people.17

A 15 week Study Schedule for Exam 4/C:
1. Frequency Distributions
2. Start of Loss Distributions: sections 1 to 30.
3. Rest of Loss Distributions: Remainder.
4. Aggregate Distributions
5. Fitting Frequency Distributions; Classical Credibility
6. Start of Buhlmann Credibility and Bayesian Analysis: sections 1-6 and 12.
7. Start of Fitting Loss Distributions: sections 1 to 10.
8. More Buhlmann Credibility and Bayesian Analysis: sections 7 to 10.
9. More Fitting Loss Distributions: sections 11 to 19.
10. Rest of Buhlmann Credibility and Bayesian Analysis: Remainder; Semiparametric Estimation
11. Rest of Fitting Loss Distributions: Remainder.
12. Conjugate Priors
13. Survival Analysis
14. Empirical Bayesian Credibility; Risk Measures
15. Simulation
Footnote 17: This is just an example of one possible schedule. Adjust it to suit your needs or make one up yourself.
Most of you will need to spend a total of 300 or more hours of study time on the entire syllabus; this means an average of at least 2 hours a day. Throughout, do Exam Problems and Practice Problems in my study guides. At least 50% of your time should be spent doing problems. As you get closer to the Exam, the portion of time spent doing problems should increase. Review the important formulas and ideas section at the end of each study guide. During the last several weeks do my practice exams, sold separately.

The CAS/SOA has posted a preview of the tables for Computer Based Testing: http://www.beanactuary.org/exams/4C/Split.html
I would suggest you use them if possible when doing practice exams.

Past students' helpful suggestions and questions have greatly improved these Study Aids. I thank them. Feel free to send me any questions or suggestions: Howard Mahler, Email: [email protected]

Please do not copy the Study Aids, except for your own personal use. Giving them to others is unfair to yourself, your fellow students who have paid for them, and myself.18 If you found them useful, tell a friend to buy his own.

Please send me any suspected errors by Email. (Please specify as carefully as possible the page, Study Guide and Course.) The errata sheet will be posted on my webpage: www.howardmahler.com/Teaching
Footnote 18: These study aids represent thousands of hours of work.
Pass Marks and Passing Percentages for Past Exams: (Footnote 19)

Exam 4/C      Pass Mark   Number of Exams   Effective Number of Exams   Number Passing   Percent Passing   % Effective Passing
Spring 2007   N.A.        2079              1976                        887              42.7%             44.9%
Fall 2007     63%         1857              1786                        926              49.9%             51.8%
Spring 2008   60%         1848              1757                        868              47.0%             49.4%
Fall 2008     55%         1763              1698                        769              43.6%             45.3%
Spring 2009   55%         1957              1861                        746              38.1%             40.1%
Fall 2009     58% (20)    2198              2004                        959              43.6%             47.9%
Spring 2010   66%         1674              1559                        702              41.9%             45.0%
Aug. 2010     66% (21)    1252              1163                        552              44.1%             47.5%
Nov. 2010     64%         1512              1358                        612              40.5%             45.1%
Feb. 2011     64%         1470              1304                        598              40.7%             45.9%
June 2011     64%         1890              1681                        745              39.4%             44.3%
Oct. 2011     64%         1962              1723                        858              43.7%             49.8%
Feb. 2012     67%         1461              1318                        665              45.5%             50.5%
May 2012      67%         2008              1798                        904              45.0%             50.3%

Footnote 19: Information taken from the CAS and SOA webpages. Check the webpages for updated information.
Footnote 20: Starting in Fall 2009, there was computer-based testing. All versions of the exam are constructed to be of comparable difficulty to one another. Apparently, the passing percentage varies somewhat by version of the exam. On average, 58% correct was needed to pass the exam.
Footnote 21: Examination C/4 is administered using computer-based testing (CBT). Under CBT, it is not possible to schedule everyone to take the examination at the same time. As a result, each administration consists of multiple versions of the examination given over a period of several days. The examinations are constructed and scored using Item Response Theory (IRT). Under IRT, each operational item that appears on an examination has been calibrated for difficulty and other test statistics and the pass mark for each examination is determined before the examination is given. All versions of the examination are constructed to be of comparable difficulty to one another. For the August 2010 administration of Examination C/4, an average of 66% correct was needed to pass the exam.
Mahlerʼs Guide to Frequency Distributions
Joint Exam 4/C
prepared by Howard C. Mahler, FCAS
Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-1
Howard Mahler
[email protected]
www.howardmahler.com/Teaching
Mahlerʼs Guide to Frequency Distributions
Copyright 2013 by Howard C. Mahler.

Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (or sections whose title is in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications.

Highly Recommended problems are double underlined. Recommended problems are underlined.1

Solutions to the problems in each section are at the end of that section.
Section #   Pages      Section Name
 1          4          Introduction
 2          5-15       Basic Concepts
 3          16-41      Binomial Distribution
 4          42-72      Poisson Distribution
 5          73-94      Geometric Distribution
 6          95-120     Negative Binomial Distribution
 7          121-148    Normal Approximation
 8          149-161    Skewness
 9          162-175    Probability Generating Functions
10          176-188    Factorial Moments
11          189-210    (a, b, 0) Class of Distributions
12          211-222    Accident Profiles
13          223-242    Zero-Truncated Distributions
14          243-261    Zero-Modified Distributions
15          262-276    Compound Frequency Distributions
16          277-294    Moments of Compound Distributions
17          295-334    Mixed Frequency Distributions
18          335-345    Gamma Function
19          346-387    Gamma-Poisson Frequency Process
20          388-398    Tails of Frequency Distributions
21          399-406    Important Formulas and Ideas

Footnote 1: Note that problems include both some written by me and some from past exams. The latter are copyright by the CAS and SOA, and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus.
[Chart: Past Exam Questions by Section of this Study Aid, excluding any questions that are no longer on the syllabus. The chart lists, for each section of this study guide, the question numbers from the Course 3 Sample Exam, the 5/00 through 11/02 Course 3 Exams, the 11/03 through 11/06 CAS 3 and SOA 3/M Exams, and the 5/07 Exam 4/C.]

The CAS/SOA did not release the 5/02 and 5/03 exams. From 5/00 to 5/03, the Course 3 Exam was jointly administered by the CAS and SOA. Starting in 11/03, the CAS and SOA gave separate exams. The SOA did not release its 5/04 and 5/06 exams. This material was moved to Exam 4/C in 2007. The CAS/SOA did not release the 11/07 and subsequent exams.
Section 1, Introduction

This Study Aid will review what a student needs to know about the frequency distributions in Loss Models. Much of the first seven sections you should have learned on Exam 1 / Exam P.

In actuarial work, frequency distributions are applied to the number of losses, the number of claims, the number of accidents, the number of persons injured per accident, etc. Frequency Distributions are discrete functions on the nonnegative integers: 0, 1, 2, 3, ...

There are three named frequency distributions you should know:
Binomial, with special case Bernoulli
Poisson
Negative Binomial, with special case Geometric.

Most of the information you need to know about each of these distributions is shown in Appendix B, attached to the exam. Nevertheless, since they appear often in exam questions, it is desirable to know these frequency distributions well, particularly the Poisson Distribution.

In addition, one can make up a frequency distribution. How to work with such unnamed frequency distributions is discussed in the next section.

In later sections, the important concepts of Compound Distributions and Mixed Distributions will be discussed.3 The most important case of a mixed frequency distribution is the Gamma-Poisson frequency process.
Footnote 3: Compound Distributions are mathematically equivalent to Aggregate Distributions, which are discussed in “Mahlerʼs Guide to Aggregate Distributions.”
Section 2, Basic Concepts

The probability density function4 f(i) can be non-zero at either a finite or infinite number of points. In the former case, the probability density function is determined by a table of its values at these finite number of points.

The f(i) can take on any values provided they satisfy 0 ≤ f(i) ≤ 1 and Σ f(i) = 1, where the sum runs over i = 0 to ∞.
For example:

Number of Claims    Probability Density Function    Cumulative Distribution Function
 0                  0.1                             0.1
 1                  0.2                             0.3
 2                  0                               0.3
 3                  0.1                             0.4
 4                  0                               0.4
 5                  0                               0.4
 6                  0.1                             0.5
 7                  0                               0.5
 8                  0                               0.5
 9                  0.1                             0.6
10                  0.3                             0.9
11                  0.1                             1
Sum                 1
The Distribution Function5 is the cumulative sum of the probability density function:
F(j) = Σ f(i), where the sum runs over i = 0 to j.

In the above example, F(3) = f(0) + f(1) + f(2) + f(3) = 0.1 + 0.2 + 0 + 0.1 = 0.4.
Footnote 4: Loss Models calls the probability density function of frequency the “probability function” or p.f. and uses the notation pk for f(k), the density at k.
Footnote 5: Also called the cumulative distribution function.
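For readers who like to check such tables with a short program, here is a minimal Python sketch (an optional aid, not part of the syllabus) that builds the Distribution Function by cumulatively summing the densities; the list f is the example density given above.

```python
from itertools import accumulate

# Densities f(0), f(1), ..., f(11) from the example above.
f = [0.1, 0.2, 0, 0.1, 0, 0, 0.1, 0, 0, 0.1, 0.3, 0.1]

# The distribution function is the cumulative sum of the densities.
F = list(accumulate(f))

print(round(F[3], 6))   # 0.4, matching F(3) = f(0) + f(1) + f(2) + f(3)
```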
Moments: One can calculate the moments of such a distribution. For example, the first moment or mean is: (0)(0.1) + (1)(0.2) + (2)(0) + (3)(0.1) + (4)(0) + (5)(0) + (6)(0.1) + (7)(0) + (8)(0) + (9)(0.1) + (10)(0.3) + (11)(0.1) = 6.1.
Number of Claims    Probability Density Function    Probability x # of Claims    Probability x Square of # of Claims
 0                  0.1                             0                            0
 1                  0.2                             0.2                          0.2
 2                  0                               0                            0
 3                  0.1                             0.3                          0.9
 4                  0                               0                            0
 5                  0                               0                            0
 6                  0.1                             0.6                          3.6
 7                  0                               0                            0
 8                  0                               0                            0
 9                  0.1                             0.9                          8.1
10                  0.3                             3                            30
11                  0.1                             1.1                          12.1
Sum                 1                               6.1                          54.9
E[X] = Σ i f(i) = Average of X = 1st moment about the origin = 6.1.
E[X²] = Σ i² f(i) = Average of X² = 2nd moment about the origin = 54.9.
The second moment is: (0²)(0.1) + (1²)(0.2) + (2²)(0) + (3²)(0.1) + (4²)(0) + (5²)(0) + (6²)(0.1) + (7²)(0) + (8²)(0) + (9²)(0.1) + (10²)(0.3) + (11²)(0.1) = 54.9.
Mean = E[X] = 6.1.
Variance = second central moment = E[(X - E[X])²] = E[X²] - E[X]² = 17.69.
Standard Deviation = Square Root of Variance = 4.206.
The mean is the average or expected value of the random variable. For the above example, the mean is 6.1 claims.
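A minimal Python sketch of these moment calculations (an optional check, not part of the syllabus), using the same example densities:

```python
# Densities for the example distribution on 0, 1, ..., 11.
f = [0.1, 0.2, 0, 0.1, 0, 0, 0.1, 0, 0, 0.1, 0.3, 0.1]

mean = sum(i * p for i, p in enumerate(f))                 # E[X] = 6.1
second_moment = sum(i * i * p for i, p in enumerate(f))    # E[X^2] = 54.9
variance = second_moment - mean ** 2                       # 17.69
std_dev = variance ** 0.5                                  # about 4.206

print(mean, second_moment, variance, std_dev)
```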
In general means add: E[X+Y] = E[X] + E[Y]. Also, multiplying a variable by a constant multiplies the mean by the same constant: E[kX] = kE[X]. The mean is a linear operator: E[aX + bY] = aE[X] + bE[Y].

The mean of a frequency distribution can also be computed as a sum of its survival functions:6
E[X] = Σ Prob[X > i] = Σ {1 - F(i)}, where the sums run over i = 0 to ∞.
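An optional Python check of this identity for the example distribution, assuming the same densities as above; it sums the survival function S(i) = 1 - F(i) and compares the result to the mean:

```python
f = [0.1, 0.2, 0, 0.1, 0, 0, 0.1, 0, 0, 0.1, 0.3, 0.1]

F = 0.0
survival_sum = 0.0
for p in f:
    F += p                  # distribution function F(i)
    survival_sum += 1 - F   # add the survival function S(i) = 1 - F(i)

mean = sum(i * p for i, p in enumerate(f))
print(round(survival_sum, 10), mean)   # both 6.1
```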
Mode and Median:

The mean differs from the mode which represents the value most likely to occur. The mode is the point at which the density function reaches its maximum. The mode for the above example is 10 claims.

For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p.7 The 80th percentile for the above example is 10; F(9) = 0.6, F(10) = 0.9.

The median is the 50th percentile. For frequency distributions, and other discrete distributions, the median is the first value at which the distribution function is greater than or equal to 0.5. The median for the above example is 6 claims; F(6) = 0.5.

Definitions:

Exposure Base: The basic unit of measurement upon which premium is determined. For example, the exposure base could be car-years, $100 of payrolls, number of insured lives, etc. The rate for Workersʼ Compensation Insurance might be $3.18 per $100 of payroll, with $100 of payroll being one exposure.

Frequency: The number of losses or number of payments random variable, (unless indicated otherwise) stated per exposure unit. For example the frequency could be the number of losses per (insured) house-year.

Mean Frequency: Expected value of the frequency. For example, the mean frequency might be 0.03 claims per insured life per year.

Footnote 6: This is analogous to the situation for continuous loss distributions; the mean of a Loss Distribution can be computed as the integral of its survival function.
Footnote 7: Definition 3.7 in Loss Models. F(πp-) ≤ p ≤ F(πp).
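An optional Python sketch computing the mode, median, and 80th percentile of the example distribution, using the definitions above (the percentile is the first value at which F reaches the given probability):

```python
f = [0.1, 0.2, 0, 0.1, 0, 0, 0.1, 0, 0, 0.1, 0.3, 0.1]

mode = max(range(len(f)), key=lambda i: f[i])   # 10, the most likely value

def percentile(f, p):
    """First value at which the distribution function is at least p."""
    F = 0.0
    for i, prob in enumerate(f):
        F += prob
        if F >= p:
            return i

print(mode, percentile(f, 0.5), percentile(f, 0.8))   # 10, 6, 10
```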
Problems:

Use the following frequency distribution for the next 5 questions:
Number of Claims    Probability
0                   0.02
1                   0.04
2                   0.14
3                   0.31
4                   0.36
5                   0.13

2.1 (1 point) What is the mean of the above frequency distribution?
A. less than 3
B. at least 3.1 but less than 3.2
C. at least 3.2 but less than 3.3
D. at least 3.3 but less than 3.4
E. at least 3.4

2.2 (1 point) What is the mode of the above frequency distribution?
A. 2   B. 3   C. 4   D. 5   E. None of the above.

2.3 (1 point) What is the median of the above frequency distribution?
A. 2   B. 3   C. 4   D. 5   E. None of the above.

2.4 (1 point) What is the standard deviation of the above frequency distribution?
A. less than 1.1
B. at least 1.1 but less than 1.2
C. at least 1.2 but less than 1.3
D. at least 1.3 but less than 1.4
E. at least 1.4

2.5 (1 point) What is the 80th percentile of the above frequency distribution?
A. 2   B. 3   C. 4   D. 5   E. None of A, B, C, or D.
2.6 (1 point) The number of claims, N, made on an insurance portfolio follows the following distribution:
n    Pr(N=n)
0    0.7
1    0.2
2    0.1
What is the variance of N?
A. less than 0.3
B. at least 0.3 but less than 0.4
C. at least 0.4 but less than 0.5
D. at least 0.5 but less than 0.6
E. at least 0.6

Use the following information for the next 8 questions:
V and X are each given by the result of rolling a six-sided die. V and X are independent of each other. Y = V + X. Z = 2X.
Hint: The mean of X is 3.5 and the variance of X is 35/12.

2.7 (1 point) What is the mean of Y?
A. less than 7.0
B. at least 7.0 but less than 7.1
C. at least 7.1 but less than 7.2
D. at least 7.2 but less than 7.3
E. at least 7.4

2.8 (1 point) What is the mean of Z?
A. less than 7.0
B. at least 7.0 but less than 7.1
C. at least 7.1 but less than 7.2
D. at least 7.2 but less than 7.3
E. at least 7.4

2.9 (1 point) What is the standard deviation of Y?
A. less than 2.0
B. at least 2.0 but less than 2.3
C. at least 2.3 but less than 2.6
D. at least 2.9 but less than 3.2
E. at least 3.2
2.10 (1 point) What is the standard deviation of Z?
A. less than 2.0
B. at least 2.0 but less than 2.3
C. at least 2.3 but less than 2.6
D. at least 2.9 but less than 3.2
E. at least 3.2

2.11 (1 point) What is the probability that Y = 8?
A. less than .10
B. at least .10 but less than .12
C. at least .12 but less than .14
D. at least .14 but less than .16
E. at least .16

2.12 (1 point) What is the probability that Z = 8?
A. less than .10
B. at least .10 but less than .12
C. at least .12 but less than .14
D. at least .14 but less than .16
E. at least .16

2.13 (1 point) What is the probability that X = 5 if Y ≥ 10?
A. less than .30
B. at least .30 but less than .32
C. at least .32 but less than .34
D. at least .34 but less than .36
E. at least .36

2.14 (1 point) What is the expected value of X if Y ≥ 10?
A. less than 5.0
B. at least 5.0 but less than 5.2
C. at least 5.2 but less than 5.4
D. at least 5.4 but less than 5.6
E. at least 5.6

2.15 (3 points) N is uniform and discrete from 0 to b; Prob[N = n] = 1/(b+1), n = 0, 1, 2, ... , b.
N ∧ 10 ≡ Minimum[N, 10]. If E[N ∧ 10] = 0.875 E[N], determine b.
A. 13   B. 14   C. 15   D. 16   E. 17
2.16 (2 points) What is the variance of the following distribution?
Claim Count:              0      1      2     3     4     5     >5
Percentage of Insureds:   60.0%  24.0%  9.8%  3.9%  1.6%  0.7%  0%
A. 0.2   B. 0.4   C. 0.6   D. 0.8   E. 1.0

2.17 (3 points) N is uniform and discrete from 1 to S; Prob[N = n] = 1/S, n = 1, 2, ... , S.
Determine the variance of N, as a function of S.

2.18 (4, 5/88, Q.31) (1 point) The following table represents data observed for a certain class of insureds. The regional claims office is being set up to service a group of 10,000 policyholders from this class.
Number of Claims n    Probability of a Policyholder Making n Claims in a Year
0                     0.84
1                     0.07
2                     0.05
3                     0.04
If each claims examiner can service a maximum of 500 claims in a year, and you want to staff the office so that you can handle a number of claims equal to two standard deviations more than the mean, how many examiners do you need?
A. 5 or less   B. 6   C. 7   D. 8   E. 9 or more

2.19 (4B, 11/99, Q.7) (2 points) A player in a game may select one of two fair, six-sided dice. Die A has faces marked with 1, 2, 3, 4, 5, and 6. Die B has faces marked with 1, 1, 1, 6, 6, and 6. If the player selects Die A, the payoff is equal to the result of one roll of Die A. If the player selects Die B, the payoff is equal to the mean of the results of n rolls of Die B. The player would like the variance of the payoff to be as small as possible. Determine the smallest value of n for which the player should select Die B.
A. 1   B. 2   C. 3   D. 4   E. 5

2.20 (1, 11/01, Q.32) (1.9 points) The number of injury claims per month is modeled by a random variable N with P[N = n] = 1/{(n+1)(n+2)}, where n ≥ 0.
Determine the probability of at least one claim during a particular month, given that there have been at most four claims during that month.
(A) 1/3   (B) 2/5   (C) 1/2   (D) 3/5   (E) 5/6
Solutions to Problems:

2.1. D. mean = (0)(.02) + (1)(.04) + (2)(.14) + (3)(.31) + (4)(.36) + (5)(.13) = 3.34.
Comment: Let S(n) = Prob[N > n] = survival function at n. S(0) = 0.98. S(1) = 0.94.
E[N] = Σ S(i), summed over i = 0 to ∞, = 0.98 + 0.94 + 0.80 + 0.49 + 0.13 + 0 = 3.34.

2.2. C. f(4) = 36%, which is the greatest value attained by the probability density function, therefore the mode is 4.

2.3. B. Since F(2) = 0.2 < 0.5 and F(3) = 0.51 ≥ 0.5, the median is 3.
Number of Claims:   0    1    2     3     4     5
Probability:        2%   4%   14%   31%   36%   13%
Distribution:       2%   6%   20%   51%   87%   100%

2.4. B. Variance = (second moment) - (mean)² = 12.4 - 3.34² = 1.244.
Standard Deviation = √1.244 = 1.116.

2.5. C. Since F(3) = 0.51 < 0.8 and F(4) = 0.87 ≥ 0.8, the 80th percentile is 4.

2.6. C. Mean = (.7)(0) + (.2)(1) + (.1)(2) = 0.4. Variance = (.7)(0 - .4)² + (.2)(1 - .4)² + (.1)(2 - .4)² = 0.44.
Alternately, Second Moment = (.7)(0²) + (.2)(1²) + (.1)(2²) = 0.6. Variance = 0.6 - 0.4² = 0.44.

2.7. B. E[Y] = E[V + X] = E[V] + E[X] = 3.5 + 3.5 = 7.

2.8. B. E[Z] = E[2X] = 2 E[X] = (2)(3.5) = 7.

2.9. C. Var[Y] = Var[V+X] = Var[V] + Var[X] = (35/12) + (35/12) = 35/6 = 5.83.
Standard Deviation[Y] = √5.83 = 2.41.

2.10. E. Var[Z] = Var[2X] = 2² Var[X] = (4)(35/12) = 35/3 = 11.67.
Standard Deviation[Z] = √11.67 = 3.42.
2.11. C. For Y = 8 we have the following possibilities: V=2, X=6; V=3, X=5; V=4, X=4; V=5, X=3; V=6, X=2. Each of these has a (1/6)(1/6) = 1/36 chance, so the total chance that Y = 8 is 5/36 = 0.139.
Comment: The distribution function for Y is:
y      2     3     4     5     6     7     8     9     10    11    12
f(y)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

2.12. E. Z = 8 when X = 4, which has probability 1/6.
Comment: The distribution function for Z is:
z      2    4    6    8    10   12
f(z)   1/6  1/6  1/6  1/6  1/6  1/6
Note that even though Z has the same mean as Y, it has a significantly different distribution function. This illustrates the difference between adding the result of several independent identically distributed variables, and just multiplying a single result by a constant. (If the variable has a finite variance), the Central Limit Theorem applies to the prior situation, but not the latter. The sum of N independent dice starts to look like a Normal Distribution as N gets large. N times a single die has a flat distribution similar to that of X or Z, regardless of N.

2.13. C. If Y ≥ 10, then we have the possibilities V=4, X=6; V=5, X=5; V=5, X=6; V=6, X=4; V=6, X=5; V=6, X=6. Out of these 6 equally likely probabilities, for 2 of them X = 5. Therefore if Y ≥ 10, there is a 2/6 = .333 chance that X = 5.
Comment: This is an example of a conditional distribution. The distribution of f(x | y ≥ 10) is:
x               4    5    6
f(x | y ≥ 10)   1/6  2/6  3/6
The distribution of f(x | y = 10) is:
x               4    5    6
f(x | y = 10)   1/3  1/3  1/3

2.14. C. The distribution of f(x | y ≥ 10) is:
x               4    5    6
f(x | y ≥ 10)   1/6  2/6  3/6
(1/6)(4) + (2/6)(5) + (3/6)(6) = 32/6 = 5.33.
2.15. C. E[N] = (0 + 1 + 2 + ... + b)/(b + 1) = {b(b+1)/2}/(b + 1).
For b ≥ 10, E[N ∧ 10] = {0 + 1 + 2 + ... + 9 + (b-9)(10)}/(b + 1) = (45 + 10b - 90)/(b + 1).
E[N ∧ 10] = 0.875 E[N]. ⇒ 10b - 45 = 0.875 b(b+1)/2. ⇒ 0.875b² - 19.125b + 90 = 0.
b = (19.125 ± √(19.125² - (4)(0.875)(90)))/1.75 = (19.125 ± 7.125)/1.75 = 15 or 6.857.
However, b has to be integer and at least 10, so b = 15.
Comment: The limited expected value is discussed in “Mahlerʼs Guide to Loss Distributions.”
If b = 15, then there are 6 terms that enter the limited expected value as 10:
E[N ∧ 10] = (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 10 + 10 + 10 + 10 + 10)/16 = 105/16.
E[N] = (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15)/16 = 15/2.
Their ratio is 0.875.

2.16. E. Mean = 0.652 and the variance = 1.414 - 0.652² = 0.989.
Number of Claims   A Priori Probability   Number Times Probability   Number Squared Times Probability
0                  0.60000                0.00000                    0.00000
1                  0.24000                0.24000                    0.24000
2                  0.09800                0.19600                    0.39200
3                  0.03900                0.11700                    0.35100
4                  0.01600                0.06400                    0.25600
5                  0.00700                0.03500                    0.17500
Sum                1                      0.652                      1.41400

2.17. E[N] = (1 + 2 + ... + S)/S = {S(S+1)/2}/S = (S + 1)/2.
E[N²] = (1² + 2² + ... + S²)/S = {S(S+1)(2S + 1)/6}/S = (S + 1)(2S + 1)/6.
Var[N] = E[N²] - E[N]² = (S + 1)(2S + 1)/6 - {(S + 1)/2}² = {(S + 1)/12}{2(2S + 1) - 3(S + 1)} = {(S + 1)/12}(S - 1) = (S² - 1)/12.
Comment: For S = 6, a six-sided die, Var[N] = 35/12.

2.18. C. The first moment is: (.84)(0) + (.07)(1) + (.05)(2) + (.04)(3) = 0.29.
The 2nd moment is: (.84)(0²) + (.07)(1²) + (.05)(2²) + (.04)(3²) = 0.63.
Thus the variance is: 0.63 - 0.29² = 0.5459 for a single policyholder.
For 10,000 independent policyholders, the variance of the sum is (10000)(.5459) = 5459.
The standard deviation is: √5459 = 73.9. The mean number of claims is (10000)(.29) = 2900.
Adding two standard deviations one gets 3047.8. This requires 7 claims handlers (since 6 can only handle 3000 claims.)
2.19. C. Both Die A and Die B have a mean of 3.5.
The variance of Die A is: (2.5² + 1.5² + 0.5² + 0.5² + 1.5² + 2.5²)/6 = 35/12.
The variance of Die B is: 2.5² = 6.25. The variance of an average of n rolls of Die B is 6.25/n.
We want 6.25/n < 35/12. Thus n > (6.25)(12/35) = 2.14. Thus the smallest n is 3.

2.20. B. Prob[N ≥ 1 | N ≤ 4] = Prob[1 ≤ N ≤ 4]/Prob[N ≤ 4] = (1/6 + 1/12 + 1/20 + 1/30)/(1/2 + 1/6 + 1/12 + 1/20 + 1/30) = 20/50 = 2/5.
Comment: For integer a and b, such that 0 < a < b:
Σ 1/k, summed over k = a to b-1, equals (b-a) Σ 1/{(n+a)(n+b)}, summed over n = 0 to ∞.
Therefore, f(n) = {(b-a) / (Σ 1/k, summed over k = a to b-1)} / {(n+a)(n+b)}, n ≥ 0, is a frequency distribution.
This is a heavy-tailed distribution without a finite mean. If b = a + 1, then f(n) = a/{(n+a)(n+a+1)}, n ≥ 0.
In this question, a = 1, b = 2, and f(n) = 1/{(n+1)(n+2)}, n ≥ 0.
Section 3, Binomial Distribution

Assume one has five independent lives, each of which has a 10% chance of dying over the next year. What is the chance of observing two deaths?
This is given by the product of three factors. The first is the chance of death to the power two. The second factor is the chance of not dying to the power 3 = 5 - 2. The final factor is the number of ways to pick two lives out of five, or the binomial coefficient:
C(5, 2) = 5! / (2! 3!) = 10.
The chance of observing two deaths is: C(5, 2) 0.1² 0.9³ = 7.29%.
The chance of observing other numbers of deaths in this case is:
Number of Deaths    Chance of Observation    Binomial Coefficient
0                   59.049%                  1
1                   32.805%                  5
2                   7.290%                   10
3                   0.810%                   10
4                   0.045%                   5
5                   0.001%                   1
Sum                 1
This is just an example of a Binomial distribution, for q = 0.1 and m = 5. For the binomial distribution:
f(x) = m! q^x (1-q)^(m-x) / {x! (m-x)!},   x = 0, 1, 2, 3, ..., m.

Note that the binomial density function is only positive for x ≤ m; there are at most m claims. The Binomial has two parameters m and q. m is the maximum number of claims and q is the chance of success.8
Written in terms of the binomial coefficient, the Binomial density function is:
f(x) = C(m, x) q^x (1-q)^(m-x),   x = 0, 1, 2, 3, ..., m.

Footnote 8: I will use the notation in Loss Models and the tables attached to your exam. Many of you are familiar with the notation in which the parameters for the Binomial Distribution are n and p rather than m and q as in Loss Models.
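As an optional check (not part of the syllabus), a short Python sketch of this density formula; it reproduces the five-lives table above (m = 5, q = 0.1):

```python
from math import comb

def binomial_pmf(x, m, q):
    """f(x) = C(m, x) q^x (1 - q)^(m - x), for x = 0, 1, ..., m."""
    return comb(m, x) * q**x * (1 - q)**(m - x)

# The five-lives example: m = 5, q = 0.1.
for x in range(6):
    print(x, round(binomial_pmf(x, 5, 0.1), 5))
# 0 0.59049, 1 0.32805, 2 0.0729, 3 0.0081, 4 0.00045, 5 1e-05
```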
Bernoulli Distribution:

The Bernoulli is a distribution with q chance of 1 claim and 1-q chance of 0 claims. There are only two possibilities: either a success or a failure. The Bernoulli is a special case of the Binomial for m = 1.
The mean of the Bernoulli is q. The second moment of the Bernoulli is (0²)(1-q) + (1²)q = q. Therefore the variance is q - q² = q(1-q).

Binomial as a Sum of Independent Bernoullis:

The example of five independent lives was the sum of five variables each of which was a Bernoulli trial with chance of a claim 10%. In general, the Binomial can be thought of as the sum of the results of m independent Bernoulli trials, each with a chance of success q. Therefore, the sum of two independent Binomial distributions with the same chance of success q, is another Binomial distribution; if X is Binomial with parameters q and m1, while Y is Binomial with parameters q and m2, then X+Y is Binomial with parameters q and m1 + m2.

Mean and Variance:

Since the Binomial is a sum of the results of m identical Bernoulli trials, the mean of the Binomial is m times the mean of a Bernoulli, which is mq. The mean of the Binomial is mq.
Similarly the variance of a Binomial is m times the variance of the corresponding Bernoulli, which is mq(1-q). The variance of a Binomial is mq(1-q).

For the case m = 5 and q = 0.1 presented previously:
Number of Claims   Probability Density Function   Probability x # of Claims   Probability x Square of # of Claims   Probability x Cube of # of Claims
0                  59.049%                        0.00000                     0.00000                               0.00000
1                  32.805%                        0.32805                     0.32805                               0.32805
2                  7.290%                         0.14580                     0.29160                               0.58320
3                  0.810%                         0.02430                     0.07290                               0.21870
4                  0.045%                         0.00180                     0.00720                               0.02880
5                  0.001%                         0.00005                     0.00025                               0.00125
Sum                1                              0.50000                     0.70000                               1.16000
The mean is: 0.5 = (5)(0.1) = mq. The variance is: E[X²] - E[X]² = 0.7 - 0.5² = 0.45 = (5)(0.1)(0.9) = mq(1-q).
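An optional Python simulation (not part of the syllabus) illustrating the Binomial as a sum of independent Bernoullis; the sample mean and variance should be close to mq = 0.5 and mq(1-q) = 0.45:

```python
import random

random.seed(1)
m, q, trials = 5, 0.1, 100_000

# A Binomial(m, q) draw is the sum of m independent Bernoulli(q) trials.
samples = [sum(random.random() < q for _ in range(m)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # close to mq = 0.5 and mq(1-q) = 0.45
```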
Properties of the Binomial Distribution:

Since 0 < q < 1: mq(1-q) < mq. Therefore, the variance of any Binomial is less than its mean.

A Binomial Distribution with parameters m and q is the sum of m independent Bernoullis, each with parameter q. Therefore, if one sums independent Binomials with the same q, then one gets another Binomial, with the same q parameter and the sum of their m parameters.

Exercise: X is a Binomial with q = 0.4 and m = 8. Y is a Binomial with q = 0.4 and m = 22. Z is a Binomial with q = 0.4 and m = 17. X, Y, and Z are independent of each other. What form does X + Y + Z have?
[Solution: X + Y + Z is a Binomial with q = 0.4 and m = 8 + 22 + 17 = 47.]

Specifically, the sum of n independent identically distributed Binomial variables, with the same parameters q and m, is a Binomial with parameters q and nm.

Exercise: X is a Binomial with q = 0.4 and m = 8. What is the form of the sum of 25 independent random draws from X?
[Solution: A random draw from a Binomial Distribution with q = 0.4 and m = (25)(8) = 200.]

Thus if one had 25 exposures, each of which had an independent Binomial frequency process with q = 0.4 and m = 8, then the portfolio of 25 exposures has a Binomial frequency process with q = 0.4 and m = 200.

Thinning a Binomial:

If one selects only some of the claims, in a manner independent of frequency, then if all claims are Binomial with parameters m and q, the selected claims are also Binomial with parameters m and qʼ = q(expected portion of claims selected).
For example, assume that the number of claims is given by a Binomial Distribution with m = 9 and q = 0.3. Assume that on average 1/3 of claims are large. Then the number of large losses is also Binomial, but with parameters m = 9 and q = 0.3/3 = 0.1. The number of small losses is also Binomial, but with parameters m = 9 and q = (0.3)(2/3) = 0.2.9
Footnote 9: The numbers of small and large losses are not independent; in the case of a Binomial they are negatively correlated. In the case of a Poisson, they are independent.
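An optional Python simulation of the thinning example above; the simulated distribution of large losses should be close to a Binomial with m = 9 and q = 0.1:

```python
import random
from math import comb

random.seed(2)
m, q, p_large = 9, 0.3, 1/3      # Binomial(9, 0.3) losses; on average 1/3 are large
trials = 200_000

count_large = []
for _ in range(trials):
    n = sum(random.random() < q for _ in range(m))               # number of losses
    count_large.append(sum(random.random() < p_large for _ in range(n)))

# Simulated distribution of large losses vs. Binomial(m = 9, q = 0.1).
for k in range(4):
    simulated = count_large.count(k) / trials
    exact = comb(9, k) * 0.1**k * 0.9**(9 - k)
    print(k, round(simulated, 4), round(exact, 4))
```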
Binomial Distribution

Support: x = 0, 1, 2, 3, ..., m.        Parameters: 1 > q > 0, m ≥ 1. m is integer.
m = 1 is a Bernoulli Distribution.

D. f.:   F(x) = 1 - β(x+1, m-x; q) = β(m-x, x+1; 1-q)     Incomplete Beta Function

P. d. f.:   f(x) = m! q^x (1-q)^(m-x) / {x! (m-x)!} = C(m, x) q^x (1-q)^(m-x).

Mean = mq

Variance = mq(1-q)          Variance / Mean = 1 - q < 1.

Coefficient of Variation = √{(1-q) / (mq)}.

Skewness = (1 - 2q) / √{mq(1-q)}.

Kurtosis = 3 + 1/{mq(1-q)} - 6/m.

Mode = largest integer in mq + q (if mq + q is an integer, then f(mq + q) = f(mq + q - 1), and both mq + q and mq + q - 1 are modes.)

Probability Generating Function: P(z) = {1 + q(z-1)}^m

f(x+1)/f(x) = a + b/(x+1), a = -q/(1-q), b = (m+1)q/(1-q), f(0) = (1-q)^m.

Moment Generating Function: M(s) = (q e^s + 1 - q)^m
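An optional Python check of several of these formulas for m = 8 and q = 0.2, computing the moments directly from the densities:

```python
from math import comb

m, q = 8, 0.2
f = [comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]

mean = sum(x * p for x, p in enumerate(f))
var = sum((x - mean)**2 * p for x, p in enumerate(f))
skew = sum((x - mean)**3 * p for x, p in enumerate(f)) / var**1.5

print(mean, m*q)                              # both 1.6
print(var, m*q*(1 - q))                       # both 1.28
print(skew, (1 - 2*q) / (m*q*(1 - q))**0.5)   # both about 0.53
```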
[Graphs of the probability density f(x) versus x, for three Binomial Distributions:
Binomial Distribution with m = 8 and q = 0.7.
Binomial Distribution with m = 8 and q = 0.2.
Binomial Distribution with m = 8 and q = 0.5.]
Binomial Coefficients:

The binomial coefficient of x out of n trials is:
C(n, x) = n! / {x! (n-x)!} = n(n-1)(n-2) ... (n+1-x) / {x(x-1)(x-2) ... (1)} = Γ(n+1) / {Γ(x+1) Γ(n+1-x)}.

Below are some examples of Binomial Coefficients:
n     x=0   x=1   x=2   x=3   x=4   x=5   x=6   x=7   x=8   x=9   x=10  x=11
2     1     2     1
3     1     3     3     1
4     1     4     6     4     1
5     1     5     10    10    5     1
6     1     6     15    20    15    6     1
7     1     7     21    35    35    21    7     1
8     1     8     28    56    70    56    28    8     1
9     1     9     36    84    126   126   84    36    9     1
10    1     10    45    120   210   252   210   120   45    10    1
11    1     11    55    165   330   462   462   330   165   55    11    1
It is interesting to note that the entries in a row sum to 2^n. For example, 1 + 6 + 15 + 20 + 15 + 6 + 1 = 64 = 2⁶. Also note that for x = 0 or x = n the binomial coefficient is one.
The entries in a row can be computed from the previous row. For example, the entry 45 in the row n = 10 is the sum of 9 and 36, the two entries above it and to the left. Similarly, 120 = 36 + 84.
Note that: C(n, x) = C(n, n-x).
For example, C(11, 5) = 11! / {5! (11-5)!} = 39,916,800 / {(120)(720)} = 462 = 11! / {6! (11-6)!} = C(11, 6).
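An optional Python sketch illustrating these facts about binomial coefficients, using the standard library function math.comb:

```python
from math import comb, factorial

# Row n = 11 of the table of binomial coefficients.
print([comb(11, x) for x in range(12)])
# [1, 11, 55, 165, 330, 462, 462, 330, 165, 55, 11, 1]

print(sum(comb(6, x) for x in range(7)), 2**6)         # entries in a row sum to 2^n: 64, 64
print(comb(11, 5), comb(11, 6))                         # symmetry C(n, x) = C(n, n-x): 462, 462
print(factorial(11) // (factorial(5) * factorial(6)))   # 11!/(5! 6!) = 462
```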
Using the Functions of the Calculator to Compute Binomial Coefficients:

Using the TI-30X-IIS, the binomial coefficient C(n, i) can be calculated as follows:
n  PRB  nCr  Enter  i  Enter
For example, in order to calculate C(10, 3) = 10! / (3! 7!) = 120:
10  PRB  nCr  Enter  3  Enter
Using instead the BA II Plus Professional, in order to calculate C(10, 3) = 10! / (3! 7!) = 120:
10  2nd  nCr  3  =
The TI-30XS Multiview calculator saves time doing repeated calculations using the same formula. For example, constructing a table of the densities of a Binomial distribution, with m = 5 and q = 0.1:10
f(x) = C(5, x) 0.1^x 0.9^(5-x).

table
y = (5 nCr x) * .1^x * .9^(5-x)
Enter
Start = 0
Step = 1
Auto
OK

x=0   y = 0.59049
x=1   y = 0.32805
x=2   y = 0.07290
x=3   y = 0.00810
x=4   y = 0.00045
x=5   y = 0.00001

Footnote 10: Note that to get Binomial coefficients hit the prb key and select nCr.
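A Python analogue of this calculator table (an optional check), also showing the cumulative distribution function:

```python
from math import comb

m, q = 5, 0.1
F = 0.0
for x in range(m + 1):
    y = comb(m, x) * q**x * (1 - q)**(m - x)   # density f(x)
    F += y                                      # distribution function F(x)
    print(f"x={x}  f(x)={y:.5f}  F(x)={F:.5f}")
```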
Relation to the Beta Distribution:

The binomial coefficient looks almost like 1 over a complete Beta function.11 The incomplete Beta distribution for integer parameters can be used to compute the sum of terms from the Binomial Distribution.12

β(a, b; x) = Σ C(a+b-1, i) x^i (1-x)^(a+b-1-i), where the sum runs over i = a to a+b-1.

For example, β(6, 9; 0.3) = 0.21948 = Σ C(14, i) 0.3^i 0.7^(14-i), where the sum runs over i = 6 to 14.

By taking appropriate differences of two Betas one can get any sum of binomial terms. For example:
C(n, a) q^a (1-q)^(n-a) = β(a, n-(a-1); q) - β(a+1, n-a; q).
For example, C(10, 3) 0.2³ 0.8⁷ = (120) 0.2³ 0.8⁷ = 0.20133 = β(3, 8; 0.2) - β(4, 7; 0.2).

β(a, b; x) = 1 - β(b, a; 1-x) = F2a,2b[bx / {a(1-x)}], where F is the distribution function of the F-distribution with 2a and 2b degrees of freedom.
For example, β(4, 7; 0.607) = 0.950 = F8,14[(7)(0.607) / {(4)(0.393)}] = F8,14[2.70].13

Footnote 11: The complete Beta Function is defined as Γ(a)Γ(b) / Γ(a+b). It is the divisor in front of the incomplete Beta function and is equal to the integral from 0 to 1 of x^(a-1) (1-x)^(b-1).
Footnote 12: For a discussion of the Beta Distribution, see “Mahlerʼs Guide to Loss Distributions”. On the exam you should either compute the sum of binomial terms directly or via the Normal Approximation. Note that the use of the Beta Distribution is an exact result, not an approximation. See for example the Handbook of Mathematical Functions, by Abramowitz, et. al.
Footnote 13: If one did an F-Test with 8 and 14 degrees of freedom, then there would be a 5% chance that the value exceeds 2.7.
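An optional Python check of these identities, computing the incomplete Beta values as the corresponding sums of binomial terms:

```python
from math import comb

def binomial_tail(n, a, x):
    """Sum of binomial terms from i = a to n; equals the incomplete beta β(a, n - a + 1; x)."""
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(a, n + 1))

print(round(binomial_tail(14, 6, 0.3), 5))     # 0.21948 = β(6, 9; 0.3)

# A difference of two incomplete Betas picks out a single binomial term:
term = comb(10, 3) * 0.2**3 * 0.8**7
print(round(term, 5))                          # 0.20133
print(round(binomial_tail(10, 3, 0.2) - binomial_tail(10, 4, 0.2), 5))   # same value
```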
Problems: Use the following information for the next seven questions: One observes 9 independent lives, each of which has a 20% chance of death over the coming year. 3.1 (1 point) What is the mean number of deaths over the coming year? A. less than 1.8 B. at least 1.8 but less than 1.9 C. at least 1.9 but less than 2.0 D. at least 2.0 but less than 2.1 E. at least 2.1 3.2 (1 point) What is the variance of the number of deaths observed over the coming year? A. less than 1.5 B. at least 1.5 but less than 1.6 C. at least 1.6 but less than 1.7 D. at least 1.7 but less than 1.8 E. at least 1.8 3.3 (1 point) What is the chance of observing 4 deaths over the coming year? A. less than 7% B. at least 7% but less than 8% C. at least 8% but less than 9% D. at least 9% but less than 10% E. at least 10% 3.4 (1 point) What is the chance of observing no deaths over the coming year? A. less than 13% B. at least 13% but less than 14% C. at least 14% but less than 15% D. at least 15% but less than 16% E. at least 16% 3.5 (3 points) What is the chance of observing 6 or more deaths over the coming year? A. less than .1% B. at least .1% but less than .2% C. at least .2% but less than .3% D. at least .3% but less than .4% E. at least .4%
3.6 (1 point) What is the median number of deaths per year? A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D 3.7 (1 point) What is the mode of the distribution of deaths per year? A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D
3.8 (1 point) Assume that each year that Joe starts alive, there is a 20% chance that he will die over the coming year. What is the chance that Joe will die over the next 5 years? A. less than 67% B. at least 67% but less than 68% C. at least 68% but less than 69% D. at least 69% but less than 70% E. at least 70% 3.9 (2 points) One insures 10 independent lives for 5 years. Assume that each year that an insured starts alive, there is a 20% chance that he will die over the coming year. What is the chance that 6 of these 10 insureds will die over the next 5 years? A. less than 20% B. at least 20% but less than 21% C. at least 21% but less than 22% D. at least 22% but less than 23% E. at least 23% 3.10 (1 point) You roll 13 six-sided dice. What is the chance of observing exactly 4 sixes? A. less than 10% B. at least 10% but less than 11% C. at least 11% but less than 12% D. at least 12% but less than 13% E. at least 13% 3.11 (1 point) You roll 13 six-sided dice. What is the average number of sixes observed? A. less than 1.9 B. at least 1.9 but less than 2.0 C. at least 2.0 but less than 2.1 D. at least 2.1 but less than 2.2 E. at least 2.2 3.12 (1 point) You roll 13 six-sided dice. What is the mode of the distribution of the number of sixes observed? A. 1 B. 2 C. 3 D. 4 E. None of A, B, C, or D
3.13 (3 point) You roll 13 six-sided dice. What is the median of the distribution of the number of sixes observed? A. 1 B. 2 C. 3 D. 4 E. None of A, B, C, or D 3.14 (1 point) You roll 13 six-sided dice. What is the variance of the number of sixes observed? A. less than 1.9 B. at least 1.9 but less than 2.0 C. at least 2.0 but less than 2.1 D. at least 2.1 but less than 2.2 E. at least 2.2 3.15 (2 point) The number of losses is Binomial with q = 0.4 and m = 90. The sizes of loss are Exponential with mean 50, F(x) = 1 - e-x/50. The number of losses and the sizes of loss are independent. What is the probability of seeing exactly 3 losses of size greater than 100? A. 9% B. 11% C. 13% D. 15% E. 17% 3.16 (2 points) Total claim counts generated from Policy A follow a Binomial distribution with parameters m = 2 and q = 0.1. Total claim counts generated from Policy B follow a Binomial distribution with parameters m = 2 and q = 0.6. Policy A is independent of Policy B. For the two policies combined, what is the probability of observing 2 claims in total? A. 32% B. 34% C. 36% D. 38% E. 40% 3.17 (2 points) Total claim counts generated from a portfolio follow a Binomial distribution with parameters m = 9 and q = 0.1. Total claim counts generated from another independent portfolio follow a Binomial distribution with parameters m = 15 and q = 0.1. For the two portfolios combined, what is the probability of observing exactly 4 claims in total? A. 11% B. 13% C. 15% D. 17% E. 19% 3.18 (3 points) The number of losses follows a Binomial distribution with m = 6 and q = 0.4. Sizes of loss follow a Pareto Distribution with α = 4 and θ = 50,000. There is a deductible of 5000, and a coinsurance of 80%. Determine the probability that there are exactly two payments of size greater than 10,000. A. 11% B. 13% C. 15% D. 17% E. 19%
Use the following information for the next two questions:
• A state holds a lottery once a week.
• The cost of a ticket is 1.
• 1,000,000 tickets are sold each week.
• The prize is 1,000,000.
• The chance of each ticket winning the prize is 1 in 1,400,000, independent of any other ticket.
• In a given week, there can be either no winner, one winner, or multiple winners.
• If there are multiple winners, each winner gets a 1,000,000 prize.
• The lottery commission is given a reserve fund of 2,000,000 at the beginning of the year.
• In any week where no prize is won, the lottery commission sends its receipts of 1 million to the state department of revenue.
• In any week in which prize(s) are won, the lottery commission pays the prize(s) from receipts and if necessary the reserve fund.
• If any week there is insufficient money to pay the prizes, the lottery commissioner must call the governor of the state, in order to ask the governor to authorize the state department of revenue to provide money to pay owed prizes and reestablish the reserve fund.

3.19 (3 points) What is the probability that the lottery commissioner has to call the governor the first week?
A. 0.5%   B. 0.6%   C. 0.7%   D. 0.8%   E. 0.9%

3.20 (4 points) What is the probability that the lottery commissioner does not have to call the governor the first year (52 weeks)?
A. 0.36%   B. 0.40%   C. 0.44%   D. 0.48%   E. 0.52%
3.21 (3 points) The number of children per family follows a Binomial Distribution with m = 4 and q = 0.5.
For a child chosen at random, how many siblings (brothers and sisters) does he have on average?
A. 1.00   B. 1.25   C. 1.50   D. 1.75   E. 2.00

3.22 (2, 5/85, Q.2) (1.5 points) Suppose 30 percent of all electrical fuses manufactured by a certain company fail to meet municipal building standards. What is the probability that in a random sample of 10 fuses, exactly 3 will fail to meet municipal building standards?
A. C(10, 3) (0.3⁷)(0.7³)
B. C(10, 3) (0.3³)(0.7⁷)
C. 10 (0.3³)(0.7⁷)
D. Σ C(10, i) (0.3^i)(0.7^(10-i)), summed over i = 0 to 3
E. 1
3.23 (160, 11/86, Q.14) (2.1 points) In a certain population 40p 25 = 0.9. From a random sample of 100 lives at exact age 25, the random variable X is the number of lives who survive to age 65. Determine the value one standard deviation above the mean of X. (A) 90 (B) 91 (C) 92 (D) 93 (E) 94 3.24 (160, 5/91, Q.14) (1.9 points) From a study of 100 independent lives over the interval (x, x+1], you are given: (i) The underlying mortality rate, qx, is 0.1. (ii) lx+s is linear over the interval. (iii) There are no unscheduled withdrawals or intermediate entrants. (iv) Thirty of the 100 lives are scheduled to end observation, all at age x + 1/3. (v) Dx is the random variable for the number of observed deaths. Calculate Var(Dx). (A) 6.9
(B) 7.0
(C) 7.1
(D) 7.2
(E) 7.3
3.25 (2, 2/96, Q.10) (1.7 points) Let X1 , X2 , and X3 , be independent discrete random variables with probability functions ⎛ni⎞ P[Xi = k] = ⎜ ⎟ pk (1−p)ni − k for i = 1, 2, 3, where 0 < p < 1. ⎝k ⎠ Determine the probability function of S = X1 + X2 + X3 , where positive. ⎛n1 + n2 + n3 ⎞ A. ⎜ ⎟ ps (1− p)n 1 + n2 + n 3 - s s ⎝ ⎠ 3
B.
∑ n + nni + n3 i=1
1
2
⎛ ni⎞ ⎜ ⎟ ps (1− p)ni - s ⎝s ⎠
3 ⎛n i⎞
C. ∏ ⎜⎜ ⎟⎟ p s (1− p)n i - s i = 1⎝s ⎠ 3
D.
⎛n ⎞
∑ ⎜⎝s i⎟⎠ ps (1− p)ni i=1
⎛n1 n2 n3 ⎞ n E. ⎜ ⎟ ps (1− p)n1 n2 3 - s s ⎠ ⎝
3.26 (2, 2/96, Q.44) (1.7 points) The probability that a particular machine breaks down on any day is 0.2 and is independent of the breakdowns on any other day. The machine can break down only once per day. Calculate the probability that the machine breaks down two or more times in ten days. A. 0.0175 B. 0.0400 C. 0.2684 D. 0.6242 E. 0.9596 3.27 (4B, 11/96, Q.23) (2 points) Two observations are made of a random variable having a binomial distribution with parameters m = 4 and q = 0.5. Determine the probability that the sample variance is zero. A. 0 B. Greater than 0, but less than 0.05 C. At least 0.05, but less than 0.15 D. At least 0.15, but less than 0.25 E. At least 0.25 3.28 (Course 1 Sample Exam, Q.40) (1.9 points) A small commuter plane has 30 seats. The probability that any particular passenger will not show up for a flight is 0.10, independent of other passengers. The airline sells 32 tickets for the flight. Calculate the probability that more passengers show up for the flight than there are seats available. A. 0.0042 B. 0.0343 C. 0.0382 D. 0.1221 E. 0.1564 3.29 (1, 5/00, Q.40) (1.9 points) A company prices its hurricane insurance using the following assumptions: (i) In any calendar year, there can be at most one hurricane. (ii) In any calendar year, the probability of a hurricane is 0.05 . (iii) The number of hurricanes in any calendar year is independent of the number of hurricanes in any other calendar year. Using the companyʼs assumptions, calculate the probability that there are fewer than 3 hurricanes in a 20-year period. (A) 0.06 (B) 0.19 (C) 0.38 (D) 0.62 (E) 0.92 3.30 (1, 5/01, Q.13) (1.9 points) A study is being conducted in which the health of two independent groups of ten policyholders is being monitored over a one-year period of time. Individual participants in the study drop out before the end of the study with probability 0.2 (independently of the other participants). What is the probability that at least 9 participants complete the study in one of the two groups, but not in both groups? (A) 0.096 (B) 0.192 (C) 0.235 (D) 0.376 (E) 0.469
3.31 (1, 5/01, Q.37) (1.9 points) A tour operator has a bus that can accommodate 20 tourists. The operator knows that tourists may not show up, so he sells 21 tickets. The probability that an individual tourist will not show up is 0.02, independent of all other tourists. Each ticket costs 50, and is non-refundable if a tourist fails to show up. If a tourist shows up and a seat is not available, the tour operator has to pay 100, the ticket cost plus a penalty of 50, to the tourist. What is the expected revenue of the tour operator? (A) 935 (B) 950 (C) 967 (D) 976 (E) 985 3.32 (1, 11/01, Q.27) (1.9 points) A company establishes a fund of 120 from which it wants to pay an amount, C, to any of its 20 employees who achieve a high performance level during the coming year. Each employee has a 2% chance of achieving a high performance level during the coming year, independent of any other employee. Determine the maximum value of C for which the probability is less than 1% that the fund will be inadequate to cover all payments for high performance. (A) 24 (B) 30 (C) 40 (D) 60 (E) 120 3.33 (CAS3, 11/03, Q.14) (2.5 points) The Independent Insurance Company insures 25 risks, each with a 4% probability of loss. The probabilities of loss are independent. On average, how often would 4 or more risks have losses in the same year? A. Once in 13 years B. Once in 17 years C. Once in 39 years D. Once in 60 years E. Once in 72 years 3.34 (CAS3, 11/04, Q.22) (2.5 points) An insurer covers 60 independent risks. Each risk has a 4% probability of loss in a year. Calculate how often 5 or more risks would be expected to have losses in the same year. A. Once every 3 years B. Once every 7 years C. Once every 11 years D. Once every 14 years E. Once every 17 years
3.35 (CAS3, 11/04, Q.24) (2.5 points) A pharmaceutical company must decide how many experiments to run in order to maximize its profits.
• The company will receive a grant of $1 million if one or more of its experiments is successful. • Each experiment costs $2,900. • Each experiment has a 2% probability of success, independent of the other experiments. • All experiments are run simultaneously. • Fixed expenses are $500,000. • Ignore investment income. The company performs the number of experiments that maximizes its expected profit. Determine the company's expected profit before it starts the experiments. A. 77,818 B. 77,829 C. 77,840 D. 77,851 E. 77,862 3.36 (SOA3, 11/04, Q.8 & 2009 Sample Q.124) (2.5 points) For a tyrannosaur with a taste for scientists: (i) The number of scientists eaten has a binomial distribution with q = 0.6 and m = 8. (ii) The number of calories of a scientist is uniformly distributed on (7000, 9000). (iii) The numbers of calories of scientists eaten are independent, and are independent of the number of scientists eaten. Calculate the probability that two or more scientists are eaten and exactly two of those eaten have at least 8000 calories each. (A) 0.23 (B) 0.25 (C) 0.27 (D) 0.30 (E) 0.3 3.37 (CAS3, 5/05, Q.15) (2.5 points) A service guarantee covers 20 television sets. Each year, each set has a 5% chance of failing. These probabilities are independent. If a set fails, it is replaced with a new set at the end of the year of failure. This new set is included under the service guarantee. Calculate the probability of no more than 1 failure in the first two years. A. Less than 40.5% B. At least 40.5%, but less than 41.0% C. At least 41.0%, but less than 41.5% D. At least 41.5%, but less than 42.0% E. 42.0% or more
Solutions to Problems: 3.1. B. Binomial with q =0 .2 and m = 9. Mean = (9)(.2) = 1.8. 3.2. A. Binomial with q = 0.2 and m = 9. Variance = (9)(.2)(1-.2) = 1.44. 3.3. A. Binomial with q = 0.2 and m = 9. f(4) = 9!/(4! 5!) .24 .85 = 6.61%. 3.4. B. Binomial with q = 0.2 and m = 9. f(0) = 9!/(0! 9!) .20 .89 = 13.4%. 3.5. D. Binomial with q = 0.2 and m = 9. The chance of observing different numbers of deaths is: Number of Deaths 0 1 2 3 4 5 6 7 8 9
Chance of Observation 13.4218% 30.1990% 30.1990% 17.6161% 6.6060% 1.6515% 0.2753% 0.0295% 0.0018% 0.0001%
Binomial Coefficient 1 9 36 84 126 126 84 36 9 1
Adding the chances of having 6, 7 , 8 or 9 claims the answer is .307%. Alternately one can add the chances of having 0, 1, 2, 3, 4 or 5 claims and subtract this sum from unity. Comment: Although you should not do so for the exam, one could also answer this question using the Incomplete Beta Function. The chance of more than x claims is β(x+1, m-x; q). The chance of more than 5 claims is: β(5+1, 9-5; .2) = β(6, 4; .2) = 0.00307. 3.6. C. For a discrete distribution such as we have here, employ the convention that the median is the first value at which the distribution function is greater than or equal to .5. F(1) = 0.134 + 0.302 = 0.436 < 50%, F(2) = 0.134 + 0.302 + 0.302 = 0.738 > 50%, and therefore the median is 2.
3.7. E. The mode is the value at which f(n) is a maximum; f(1) = .302 = f(2) and both 1 and 2 are modes. Alternately, in general for the Binomial the mode is the largest integer in mq + q; the largest integer in 2 is 2, but when mq + q is an integer both it and the integer one less are modes. Comment: This is a somewhat unfair question. While it seems to me that E is the best single answer, one could also argue for B or C. If you are unfortunate enough to have an apparently unfair question on your exam, do not let it upset you while taking the exam. 3.8. B. The chance that Joe is alive at the end of 5 years is (1-.2)5 = .32768. Therefore, the chance that he died is 1 - 0.32768 = 0.67232. 3.9. D. Based on the solution of the previous problem, for each life the chance of dying during the five year period is 0.67232. Therefore, the number of deaths for the 10 independent lives is Binomial with m = 10 and q = 0.67232. f(6) = ((10!)/{(6!)(4!)}) (0.672326 ) (0.327684 ) = (210)(0.0924)(0.01153) = 0.224. The chances of other numbers of deaths are as follows: Number of Deaths 0 1 2 3 4 5 6 7 8 9 10
Chance of Observation 0.001% 0.029% 0.270% 1.479% 5.312% 13.078% 22.360% 26.216% 20.171% 9.197% 1.887%
Binomial Coefficient 1 10 45 120 210 252 210 120 45 10 1
Sum
1
1024
3.10. B. The chance of observing a six on an individual six-sided die is 1/6. Assuming the results of the dice are independent, one has a Binomial distribution with q =1/6 and m =13. f(4) = 13!/(4! 9!) (1/6)4 (5/6)9 =10.7%.
3.11. D, 3.12. B, & 3.13. B. Binomial with q =1/6 and m =13. Mean = (1/6)(13) = 2.17. For the Binomial the mode is the largest integer in mq + q = (13)(1/6) + (1/6) = 2.33; the largest integer in 2.33 is 2. Alternately compute all of the possibilities and 2 is the most likely. F(1) = .336 <.5 and F(2) = .628 ≥ .5, therefore the median is 2. Number of Deaths 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Sum
Chance of Observation 9.3463879% 24.3006085% 29.1607302% 21.3845355% 10.6922678% 3.8492164% 1.0264577% 0.2052915% 0.0307937% 0.0034215% 0.0002737% 0.0000149% 0.0000005% 0.0000000% 1
Binomial Coefficient 1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1 8192
Cumulative Distribution 9.346% 33.647% 62.808% 84.192% 94.885% 98.734% 99.760% 99.965% 99.996% 100.000% 100.000% 100.000% 100.000% 100.000%
3.14. A. Binomial with q =1/6 and m =13. Variance = (13)(1/6)(1-1/6) = 1.806. 3.15. D. S(100) = e-100/50 = .1353. Therefore, thinning the original Binomial, the number of large losses is Binomial with m = 90 and q = (.1353)(.4) = (.05413). f(3) = {(90)(89)(88)/3!} (.054133 )(1 - .05413)87 = .147. 3.16. D. Prob[2 claims in total] = Prob[A = 0]Prob[B = 2] + Prob[A= 1]Prob[ B = 1] + Prob[B = 2]Prob[A = 0] = (.92 )(.62 ) + {(2)(.1)(.9)}{(2)(.6)(.4)} + (.12 )(.42 ) = 37.96%. Comment: The sum of A and B is not a Binomial distribution, since their q parameters differ. 3.17. B. For the two portfolios combined, total claim counts follow a Binomial distribution with parameters m = 9 + 15 = 24 and q = 0.1. ⎛ 24⎞ f(4) = ⎜ ⎟ q4 (1-q)20 = {(24)(23)(22)(21)/4!}(.14 )(.920) = 12.9%. ⎝4⎠
3.18. B. A payment is of size greater than 10,000 if the loss is of size greater than: 10000/.8 + 5000 = 17,500. Probability of a loss of size greater than 17,500 is: {50/(50 + 17.5)}4 = 30.1%. The large losses are Binomial with m = 6 and q = (.301)(0.4) = 0.1204. ⎛6⎞ f(2) = ⎜ ⎟ .12042 (1 - .1204)4 = 13.0%. ⎝2⎠ Comment: An example of thinning a Binomial. 3.19. B. The number of prizes is Binomial with m = 1 million and q = 1/1,400,000. f(0) = (1 - 1/1400000)1000000 = 48.95%. f(1) = 1000000(1 - 1/1400000)999999 (1/1400000) = 34.97%. f(2) = {(1000000)(999999)/2}(1 - 1/1400000)999998 (1/1400000)2 = 12.49%. f(3) = {(1000000)(999999)(999998)/6}(1 - 1/1400000)999997 (1/1400000)3 = 2.97%. n
f(n)
0 1 2 3 4 5 6
48.95% 34.97% 12.49% 2.97% 0.53% 0.08% 0.01%
Sum
100.00%
The first week, the lottery has enough money to pay 3 prizes, (1 million in receipts + 2 million in the reserve fund.) The probability of more than 3 prizes is: 1 - (48.95% + 34.97% + 12.49% + 2.97%) = 0.62%. 3.20. C. Each week there is a .4895 + .3497 = .8392 chance of no need for the reserve fund. Each week there is a .1249 chance of a 1 million need from the reserve fund. Each week there is a .0297 chance of a 2 million need from the reserve fund. Each week there is a .0062 chance of a 3 million or more need from the reserve fund. The governor will be called if there is at least 3 weeks with 2 prizes each (since each such week depletes the reserve fund by 1 million), or if there is 1 week with 2 prizes plus 1 week with 3 prizes, or if there is a week with 4 prizes. Prob[Governor not called] = Prob[no weeks with more than 1 prize] + Prob[1 week @2, no weeks more than 2] + Prob[2 weeks @2, no weeks more than 2] + Prob[0 week @2, 1 week @3, no weeks more than 3] = (.839252) + (52)(.1249)(.839251) + ((52)(51)/2)(.12492 )(.839250) + (52)(.0297)(.839251) = .00011 + .00085 + .00323 + .00020 = 0.00439. Comment: The lottery can not keep receipts from good weeks in order to build up the reserve fund.
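For readers who like to verify such figures numerically, here is a short Python sketch (an illustration added here, not part of the original solutions) recomputing the weekly prize-count probabilities in 3.19 and the chance of more than 3 prizes in a week; the function name is mine.
```python
from math import comb

def binomial_pmf(k, m, q):
    # f(k) = C(m, k) q^k (1 - q)^(m - k)
    return comb(m, k) * q**k * (1 - q)**(m - k)

m, q = 1_000_000, 1 / 1_400_000   # tickets sold each week, chance a single ticket wins

probs = [binomial_pmf(k, m, q) for k in range(7)]
for k, p in enumerate(probs):
    print(k, round(p, 4))              # ~0.4895, 0.3497, 0.1249, 0.0297, ...

print(1 - sum(probs[:4]))              # Prob[more than 3 prizes], ~0.0062
```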
3.21. C. Let n be the number of children in a family. The probability that the child picked is in a family of size n is proportional to the product of the size of family and the proportion of families of that size: n f(n). Thus, Prob[child is in a family of size n] = n f(n) / Σ n f(n) = n f(n) / E[N]. For n > 0, the number of siblings the child has is n - 1.
Thus the mean number of siblings is:
Σ n f(n) (n−1) / E[N] = Σ (n^2 − n) f(n) / E[N] = (E[N^2] − E[N]) / E[N]
= E[N^2]/E[N] − 1 = (Var[N] + E[N]^2)/E[N] − 1 = Var[N]/E[N] + E[N] − 1
= mq(1−q)/(mq) + mq − 1 = 1 − q + mq − 1 = (m − 1)q = (3)(0.5) = 1.5.
Alternately, assume for example 10,000 families:
Children      Binomial    Number of    Total Number    Siblings      # of Children x
per Family    Density     Families     of Children     per Child     # of Siblings
0             0.0625         625            0             0                0
1             0.2500       2,500        2,500             0                0
2             0.3750       3,750        7,500             1            7,500
3             0.2500       2,500        7,500             2           15,000
4             0.0625         625        2,500             3            7,500
Total         1.0000      10,000       20,000             6           30,000
Mean of number of siblings for a child chosen at random is: 30,000 / 20,000 = 1.5. Comment: The average size family has two children; each of these children has one sibling. However, a child chosen at random is more likely to be from a large family. ⎛10 ⎞ 3.22. B. A Binomial Distribution with m = 10 and q = 0.3. f(3) = ⎜ ⎟ (0.33 ) (0.77 ). ⎝3⎠ 3.23. D. The number of people who survive is Binomial with m = 100 and q = 0.9. Mean = (100)(.9) = 90. Variance = (100)(.9)(.1) = 9. Mean + Standard Deviation = 93. 3.24. E. The number of deaths is sum of two Binomials, one with m = 30 and q = 0.1/3, and the other with m = 70 and q = 0.1. The sum of their variances is: (30)(.03333)(1 - .03333) + (70)(.1)(.9) = .967 + 6.3 = 7.267. 3.25. A. Each Xi is Binomial with parameters ni and p. The sum is Binomial with parameters n1 + n2 + n3 and p.
3.26. D. Binomial with m = 10 and q = 0.2. 1 - f(0) - f(1) = 1 - .810 - (10)(.2)(.89 ) = 0.624. 3.27. E. The sample variance is the average squared deviation from the mean; thus the sample variance is positive unless all the observations are equal. In this case, the sample variance is zero if and only if the two observations are equal. For this Binomial the chance of observing a given number of claims is: number of claims: 0 1 2 3 4 probability: 1/16 4/16 6/16 4/16 1/16 Thus the chance that the two observations are equal is: (1/16)2 + (4/16)2 + (6/16)2 + (4/16)2 + (1/16)2 = 70/256 = .273. Comment: For example, the chance of 3 claims is: ((m!)/{(3!)((m-3)!)}) q3 (1-q) = ((4!)/{(3!)((1!)}) .53 (1-.5) = 4/16. 3.28. E. The number of passengers that show up for a flight is Binomial with m = 32 and q = .90. Prob[more show up than seats] = f(31) + f(32) = 32(.1)(.931) + .932 = .1564. 3.29. E. The number of hurricanes is Binomial with m = 20 and q = 0.05. Prob[< 3 hurricanes] = f(0) + f(1) + f(2) = .9520 + 20(.05)(.9519) + 190(.052 )(.9518) = .9245. 3.30. E. Each group is Binomial with m = 10 and q = 0.8. Prob[at least 9 complete] = f(9) + f(10) = 10(.2)(.89 ) + .810 = .376. Prob[one group has at least 9 and one group does not] = (2)(.376)(1 - .376) = .469. 3.31. E. The bus driver collects (21)(50) = 1050 for the 21 tickets he sells. However, he may be required to refund 100 to one passenger if all 21 ticket holders show up. The number of tourists who show up is Binomial with m = 21 and q = .98. Expected penalty is: 100 f(21) = 100(.9821) = 65.425. Expected revenue is: (21)(50) - 65.425 = 984.6.
3.32. D. The fund will be inadequate if there are more than 120/C payments. The number of payments is Binomial with m = 20 and q = .02. x
f
F
0 1 2 3
0.66761 0.27249 0.05283 0.00647
0.66761 0.94010 0.99293 0.99940
There is a 1 - .94010 = 5.990% chance of needing more than one payment. There is a 1 - .992930 = 0.707% chance of needing more than two payments. Thus we need to require that the fund be able to make two payments. 120/C = 2. ⇒ C = 60. 3.33. D. This is the sum of 25 independent Bernoullis, each with q = .04. The number of losses per year is Binomial with m = 25 and q = .04. f(0) = (1 - q)m = (1 - .04)25 = .3604. f(1) = mq(1 - q)m-1 = (25)(.04)(1 - .04)24 = .3754. f(2) = {m(m-1)/2!}q2 (1 - q)m-2 = (25)(24/2)(.042 )(1 - .04)23 = .1877. f(3) = {m(m-1)(m-2)/3!}q3 (1 - q)m-3 = (25)(24)(23/6)(.043 )(1 - .04)22 = .0600. Prob[at least 4] = 1 - {f(0) + f(1) + f(2) + f(3)} = 1 - .9835 = .0165. 4 or more risks have losses in the same year on average once in: 1/.0165 = 60.6 years. 3.34. C. A Binomial Distribution with m = 60 and q = .04. f(0) = .9660 = .08635. f(1) = (60)(.04).9659 = .21588. f(2) = {(60)(59)/2}(.042 ).9658 = .26535. f(3) = {(60)(59)(58)/6}(.043 ).9657 = .21376. f(4) = {(60)(59)(58)(57)/24}(.044 ).9656 = .12692. 1 - f(0) - f(1) - f(2) - f(3) - f(4) = 1 - .08635 - .21588 - .26535 - .21376 - .12692 = .09174. 1/.09174 = Once every 11 years. 3.35. A. Assume n experiments are run. Then the probability of no successes is .98n . Thus the probability of at least one success is: 1 - .98n . Expected profit is: (1,000,000)(1 - .98n ) - 2900n - 500,000 = 500,000 - (1,000,000).98n - 2900n. Setting the derivative with respect to n equal to zero: 0 = -ln(.98)(1,000,000).98n - 2900. ⇒ .98n = .143545. ⇒ n = 96.1. Taking n = 96, the expected profit is 77,818. Comment: For n = 97, the expected profit is 77,794.
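The profit-maximization arithmetic in 3.35 is easy to confirm by brute force; the following is a rough Python sketch (my own illustration, not part of the original solution), scanning over the number of experiments.
```python
def expected_profit(n, grant=1_000_000, cost=2_900, p=0.02, fixed=500_000):
    # grant x Prob[at least one success] - cost of n experiments - fixed expenses
    return grant * (1 - (1 - p) ** n) - cost * n - fixed

best_n = max(range(201), key=expected_profit)
print(best_n, round(expected_profit(best_n)))   # 96 experiments, profit of about 77,818
print(round(expected_profit(97)))               # about 77,794, as in the comment above
```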
3.36. D. (9000 - 8000)/(9000 - 7000) = 1/2. Half the scientists are “large”.
Therefore, thinning the original Binomial, the number of large scientists is Binomial with m = 8 and q = 0.6/2 = 0.3.
f(2) = {(8)(7)/2} (0.7^6)(0.3^2) = 0.2965.
Alternately, this is a compound frequency distribution, with primary distribution a Binomial with q = 0.6 and m = 8, and secondary distribution a Bernoulli with q = 1/2 (half chance a scientist is large.) One can use the Panjer Algorithm.
For the primary Binomial Distribution, a = -q/(1-q) = -0.6/0.4 = -1.5, b = (m+1)q/(1-q) = (9)(1.5) = 13.5, and P(z) = {1 + q(z-1)}^m.
c(0) = Pp(s(0)) = Pp(0.5) = {1 + (0.6)(0.5-1)}^8 = 0.057648.
c(x) = {1/(1 - a s(0))} Σ from j=1 to x of (a + jb/x) s(j) c(x-j) = (1/1.75) Σ from j=1 to x of (-1.5 + j 13.5/x) s(j) c(x-j).
c(1) = (1/1.75)(-1.5 + 13.5) s(1) c(0) = (1/1.75)(12)(1/2)(.057648) = .197650. c(2) = (1/1.75){(-1.5 + 13.5/2) s(1) c(1) + (-1.5 + (2)13.5/2) s(2) c(0)} = (1/1.75){(5.25)(1/2)(.197650) + (12)(0)(.057648)} = 0.296475. Alternately, one can list all the possibilities: Number of Scientist
Binomial Probability
Given the number of Scientist, the Probability that exactly two are large
Extension
0 1 2 3 4 5 6 7 8
0.00066 0.00786 0.04129 0.12386 0.23224 0.27869 0.20902 0.08958 0.01680
0 0 0.25 0.375 0.375 0.3125 0.234375 0.1640625 0.109375
0.00000 0.00000 0.01032 0.04645 0.08709 0.08709 0.04899 0.01470 0.00184
Sum
1.00000
0.29648
For example, if 6 scientists have been eaten, then the chance that exactly two of them are large is: (0.5^6) 6!/(4! 2!) = 0.234375.
In algebraic form, this solution is:
Σ from n=2 to 8 of {8!/(n! (8-n)!)} 0.6^n 0.4^(8-n) {n!/(2! (n-2)!)} 0.5^n
= (1/2) Σ from n=2 to 8 of {8!/((n-2)! (8-n)!)} 0.3^n 0.4^(8-n)
= (1/2)(8)(7)(0.3^2) Σ from i=0 to 6 of {6!/(i! (6-i)!)} 0.3^i 0.4^(6-i) = (28)(0.09)(0.3 + 0.4)^6 = 0.2965.
Comment: The Panjer Algorithm (Recursive Method) is discussed in “Mahlerʼs Guide to Aggregate Distributions.”
“Two or more scientists are eaten and exactly two of those eaten have at least 8000 calories each”
⇔ exactly two “large” scientists are eaten, together with some unknown number of “small” scientists.
In other words, at least 2 claims, of which exactly two are large ⇔ exactly 2 large claims and some unknown number of small claims.
3.37. A. One year is Binomial Distribution with m = 20 and q = 0.05. The years are independent of each other.
Therefore, the number of failures over 2 years is Binomial Distribution with m = 40 and q = 0.05.
Prob[0 or 1 failures] = 0.95^40 + (40)(0.95^39)(0.05) = 39.9%.
Comment: In this question, when a TV fails it is replaced. Therefore, we can have a failure in both years for a given customer.
A somewhat different question than asked would be, assuming each customer owns one set, calculate the probability that no more than one customer suffers a failure during the two years.
For a given customer, the probability of no failure in the first two years is: 0.95^2 = 0.9025.
The probability of 0 or 1 customer suffering a failure is: 0.9025^20 + (20)(0.0975)(0.9025^19) = 40.6%.
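Both approaches in 3.36 (thinning and the Panjer recursion) can be checked with a few lines of Python; the sketch below is my own illustration under the stated Binomial/Bernoulli assumptions and is not from the Guide.
```python
from math import comb

# Thinning: the number of "large" scientists is Binomial with m = 8, q = (0.6)(0.5) = 0.3.
m, q = 8, 0.6 * 0.5
print(comb(m, 2) * q**2 * (1 - q)**(m - 2))     # ~0.2965

# Panjer recursion: compound Binomial(m = 8, q = 0.6) primary, Bernoulli(1/2) secondary.
a, b = -0.6 / 0.4, 9 * 0.6 / 0.4                # a = -q/(1-q), b = (m+1)q/(1-q)
s = {0: 0.5, 1: 0.5}                            # secondary (Bernoulli) density
c = [(1 + 0.6 * (0.5 - 1)) ** 8]                # c(0) = P_primary(s(0))
for x in (1, 2):
    total = sum((a + j * b / x) * s.get(j, 0.0) * c[x - j] for j in range(1, x + 1))
    c.append(total / (1 - a * s[0]))
print(c[2])                                     # ~0.2965 again
```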
Section 4, Poisson Distribution

The Poisson Distribution is the single most important frequency distribution to study for the exam.14
The density function for the Poisson is: f(x) = λ^x e^−λ / x!, x ≥ 0.
Note that unlike the Binomial, the Poisson density function is positive for all x ≥ 0; there is no limit on the possible number of claims. The Poisson has a single parameter λ.
The Distribution Function is 1 at infinity, since λ^x / x! is the series for e^λ.
For example, hereʼs a Poisson for λ = 2.5:
n f(n) F(n)
0 0.082 0.082
1 0.205 0.287
2 0.257 0.544
3 0.214 0.758
4 0.134 0.891
5 0.067 0.958
6 0.028 0.986
7 0.010 0.996
8 0.003 0.999
9 0.001 1.000
10 0.000 1.000
[Graph of the probability density function of the Poisson Distribution with λ = 2.5, for x = 0 to 10.]
For example, the chance of 4 claims is: f(4) = λ^4 e^−λ / 4! = 2.5^4 e^−2.5 / 4! = 0.1336. Remember, there is a small chance of a very large number of claims. For example, f(15) = 2.5^15 e^−2.5 / 15! = 6 x 10^-8. Such large numbers of claims can contribute significantly to the higher moments of the distribution.
14
The Poisson comes up among other places in the Gamma-Poisson frequency process, to be discussed in a subsequent section. Poisson processes are discussed in “Mahlerʼs Guide to Stochastic Models.”
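The λ = 2.5 table and the f(4) and f(15) values above are easy to reproduce; the short Python sketch below (my own illustration, standard library only) does so.
```python
from math import exp, factorial

lam = 2.5

def poisson_pmf(n, lam):
    # f(n) = λ^n e^(-λ) / n!
    return lam ** n * exp(-lam) / factorial(n)

F = 0.0
for n in range(11):
    f = poisson_pmf(n, lam)
    F += f
    print(n, round(f, 3), round(F, 3))   # reproduces the λ = 2.5 table above

print(poisson_pmf(4, lam))    # ~0.1336
print(poisson_pmf(15, lam))   # ~6e-8
```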
Letʼs calculate the first two moments for this Poisson distribution with λ = 2.5:
Number of Claims 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Probability Density Function 0.08208500 0.20521250 0.25651562 0.21376302 0.13360189 0.06680094 0.02783373 0.00994062 0.00310644 0.00086290 0.00021573 0.00004903 0.00001021 0.00000196 0.00000035 0.00000006 0.00000001 0.00000000 0.00000000 0.00000000 0.00000000
Probability x # of Claims 0.00000000 0.20521250 0.51303124 0.64128905 0.53440754 0.33400471 0.16700236 0.06958432 0.02485154 0.00776611 0.00215725 0.00053931 0.00012257 0.00002554 0.00000491 0.00000088 0.00000015 0.00000002 0.00000000 0.00000000 0.00000000
Probability Probability x x Square ofCube ofDistribution # of Claims Function # of Claims 0.00000000 0.08208500 0.20521250 0.28729750 1.02606248 0.54381312 1.92386716 0.75757613 2.13763017 0.89117802 1.67002357 0.95797896 1.00201414 0.98581269 0.48709021 0.99575330 0.19881233 0.99885975 0.06989496 0.99972265 0.02157252 0.99993837 0.00593244 0.99998740 0.00147085 0.99999762 0.00033196 0.99999958 0.00006875 0.99999993 0.00001315 0.99999999 0.00000234 1.00000000 0.00000039 1.00000000 0.00000006 1.00000000 0.00000001 1.00000000 0.00000000 1.00000000
Sum
1.00000000
2.50000000
8.75000000
The mean is 2.5 = λ. The variance is: E[X^2] - E[X]^2 = 8.75 - 2.5^2 = 2.5 = λ. In general, the mean of the Poisson is λ and the variance is λ. In this case the mode is 2, since f(2) = 0.2565, larger than any other value of the probability density function. In general, the mode of the Poisson is the largest integer in λ.15 This follows from the fact that for the Poisson f(x+1) / f(x) = λ / (x+1). Thus for the Poisson the mode is less than or equal to the mean λ. The median in this case is 2, since F(2) = 0.544 ≥ 0.5, while F(1) = 0.287 < 0.5. The median as well as the mode are less than the mean, which is typical for distributions skewed to the right.
15
If λ is an integer then f(λ) = f(λ−1), and both λ and λ-1 are modes.
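A quick numeric check of these moment, mode, and median statements (a sketch of mine, not from the Guide; the truncation point of 60 terms is an assumption that leaves a negligible tail):
```python
from math import exp, factorial

lam = 2.5
f = [lam ** n * exp(-lam) / factorial(n) for n in range(60)]   # tail beyond 60 is negligible

mean = sum(n * p for n, p in enumerate(f))
second_moment = sum(n * n * p for n, p in enumerate(f))
print(mean, second_moment, second_moment - mean ** 2)   # ~2.5, ~8.75, ~2.5

print(max(range(60), key=lambda n: f[n]))                # mode = 2

F, median = 0.0, None
for n, p in enumerate(f):
    F += p
    if F >= 0.5:
        median = n
        break
print(median)                                            # median = 2
```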
Claim Intensity, Derivation of the Poisson: Assume one has a claim intensity of ξ. The chance of having a claim over an extremely small period of time Δt is approximately ξ (Δt). (The claim intensity is analogous to the force of mortality in Life Contingencies.) If the claim intensity is a constant over time and the chance of having a claim in any interval is independent of the chance of having a claim in any other disjoint interval, then the number of claims observed over a period time t is given by a Poisson Distribution, with parameter ξt. A Poisson is characterized by a constant independent claim intensity and vice versa. For example, if the chance of a claim each month is 0.1%, and months are independent of each other, the distribution of number of claims over a 5 year period (60 months) is Poisson with mean = 6%. For the Poisson, the parameter λ = mean = (claim intensity)(total time covered). Therefore, if for example one has a Poisson in each of five years with parameter λ, then over the entire 5 year period one has a Poisson with parameter 5λ. Adding Poissons: The sum of two independent variables each of which is Poisson with parameters λ1 and λ 2 is also Poisson, with parameter λ1 + λ2 . 16 This follows from the fact that for a very small time interval the chance of a claim is the sum of the chance of a claim from either variable, since they are independent.17 If the total time interval is one, then the chance of a claim from either variable over a very small time interval Δt is λ1 Δt + λ2 Δt = (λ1 + λ2 )Δt. Thus the sum of the variables has constant claim intensity (λ1 + λ2 ) over a time interval of one, and is therefore a Poisson with parameter λ 1 + λ2 . For example, the sum of a two independent Poisson variables with means 3% and 5% is a Poisson variable with mean 8%. So if a portfolio consists of one risk Poisson with mean 3% and one risk Poisson with mean 5%, the number of claims observed for the whole portfolio is Poisson with mean 8%.
16 See Theorem 6.1 in Loss Models.
17 This can also be shown from simple algebra, by summing over i + j = k the terms (λ^i e^−λ / i!) (µ^j e^−µ / j!) = e^−(λ+µ) λ^i µ^j / (i! j!). By the Binomial Theorem, these terms sum to e^−(λ+µ) (λ+µ)^k / k!.
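A simulation sketch of the Adding Poissons result (my own illustration; the means λ1 = 3 and λ2 = 5 are my choices for a visible check, rather than the 3% and 5% of the text):
```python
import random
from math import exp, factorial

def poisson_draw(lam):
    # inverse-transform draw using the recursion f(n+1) = f(n) * λ/(n+1)
    u, n, p, cdf = random.random(), 0, exp(-lam), exp(-lam)
    while u > cdf:
        n += 1
        p *= lam / n
        cdf += p
    return n

lam1, lam2, trials = 3.0, 5.0, 200_000
sums = [poisson_draw(lam1) + poisson_draw(lam2) for _ in range(trials)]

empirical = sums.count(8) / trials
theoretical = 8.0 ** 8 * exp(-8.0) / factorial(8)   # Poisson with λ = λ1 + λ2 = 8
print(round(empirical, 4), round(theoretical, 4))   # should be close (~0.14)
```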
Exercise: Assume one had a portfolio of 25 exposures. Assume each exposure has an independent Poisson frequency process, with mean 3%. What is the frequency distribution for the claims from the whole portfolio? [Solution: A Poisson Distribution with mean: (25)(3%) = 0.75.] If one has a large number of independent events each with a small probability of occurrence, then the number of events that occurs has approximately a constant claims intensity and is thus approximately Poisson Distributed. Therefore the Poisson Distribution can be useful in modeling such situations. Thinning a Poisson:18 Sometimes one selects only some of the claims. This is sometimes referred to as “thinning” the Poisson distribution. For example, if frequency is given by a Poisson and severity is independent of frequency, then the number of claims above a certain amount (in constant dollars) is also a Poisson. For example, assume that we have a Poisson with mean frequency of 30 and that the size of loss distribution is such that 20% of the losses are greater than $1 million (in constant dollars). Then the number of losses observed greater than $1 million (in constant dollars) is also Poisson but with a mean of (20%)(30) = 6. Similarly, losses observed smaller than $1 million (in constant dollars) is also Poisson, but with a mean of (80%)(30) = 24. Exercise: Frequency is Poisson with λ = 5. Sizes of loss are Exponential with θ = 100. Frequency and severity are independent. What is the distribution of the number of losses of size between 50 and 200? [Solution: F(200) - F(50) = e-0.5 - e-2 = 0.471. Thus the number of medium sized losses are Poisson with mean: (.471)(5) = 2.36. Comment: The number of large losses are Poisson with mean 5S(200) = (0.135)(5) = 0.68. The number of small losses are Poisson with mean 5F(50) = (0.393)(5) = 1.97. These three Poisson Distributions are independent of each other.] In this example, the total number of losses are Poisson and therefore has a constant independent claims intensity of 5. Since frequency and severity are independent, the large losses also have a constant independent claims intensity of 5S(200), which is therefore Poisson with mean 5S(200). Similarly, the small losses have constant independent claims intensity of 5F(50) and therefore are Poisson. Also, these two processes are independent of each other.
18
See Theorem 6.2 in Loss Models.
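Here is a minimal Python check (my own sketch, not from the Guide) of the thinning exercise above: losses of size between 50 and 200 should be Poisson with mean 5{F(200) − F(50)} ≈ 2.36.
```python
import random
from math import exp

lam, theta, trials = 5.0, 100.0, 100_000
# For the Exponential, F(x) = 1 - e^(-x/θ), so F(200) - F(50) = e^(-0.5) - e^(-2).
print(lam * (exp(-50 / theta) - exp(-200 / theta)))   # ~2.36, the thinned Poisson mean

def poisson_draw(lam):
    u, k, p, cdf = random.random(), 0, exp(-lam), exp(-lam)
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

total = 0
for _ in range(trials):
    k = poisson_draw(lam)
    total += sum(1 for _ in range(k) if 50 < random.expovariate(1 / theta) < 200)
print(total / trials)   # empirical mean count of medium-sized losses, near 2.36
```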
If in this example we had a deductible of 200, then only losses of size greater than 200 would result in a (non-zero) payment. Loss Models refers to the number of payments as NP, in contrast to NL the number of losses.19 In this example, NL is Poisson with mean 5, while for a 200 deductible NP is Poisson with mean 0.68.
Thinning a Poisson based on size of loss is a special case of decomposing Poisson frequencies. The key idea is that there is some way to divide the claims up into mutually exclusive types that are independent. Then each type is also Poisson, and the Poisson Distributions for each type are independent.
Exercise: Claim frequency follows a Poisson Distribution with a mean of 20% per year. 1/4 of all claims involve attorneys. If attorney involvement is independent between different claims, what is the probability of getting 2 claims involving attorneys in the next year?
[Solution: Claims with attorney involvement are Poisson with mean frequency 20%/4 = 5%. Thus f(2) = (0.05)^2 e^−0.05 / 2! = 0.00119.]
Derivation of Results for Thinning Poissons:20
If losses are Poisson with mean λ, and one selects a portion, t, of the losses in a manner independent of the frequency, then the selected losses are also Poisson but with mean λt.
Prob[# selected losses = n] = Σ from m=n to ∞ of Prob[m total # losses] Prob[n of m losses are selected]
= Σ from m=n to ∞ of {e^−λ λ^m / m!} {m!/(n! (m-n)!)} t^n (1-t)^(m-n)
= {e^−λ t^n λ^n / n!} Σ from m=n to ∞ of λ^(m-n) (1-t)^(m-n) / (m-n)!
= {e^−λ t^n λ^n / n!} Σ from i=0 to ∞ of λ^i (1-t)^i / i!
= {e^−λ t^n λ^n / n!} e^λ(1-t) = e^−λt (tλ)^n / n! = f(n) for a Poisson with mean tλ.
In a similar manner, the number not selected follows a Poisson with mean (1-t)λ.
19 I do not regard this notation as particularly important. See Section 8.6 of Loss Models.
20 I previously discussed how these results follow from the constant, independent claims intensity.
Prob[# selected losses = n | # not selected losses = j]
= Prob[total # = n + j and # not selected losses = j] / Prob[# not selected losses = j]
= Prob[total # = n + j] Prob[# not selected losses = j | total # = n + j] / Prob[# not selected losses = j]
= {e^−λ λ^(n+j) / (n+j)!} {(n+j)! / (n! j!)} (1−t)^j t^n / {e^−(1−t)λ ((1−t)λ)^j / j!}
= e^−λt (tλ)^n / n! = f(n) for a Poisson with mean tλ = Prob[# selected losses = n].
Thus the number selected and the number not selected are independent. They are independent Poisson distributions. The same result follows when dividing into more than 2 disjoint subsets.
Effect of Exposures:21
Assume one has 100 exposures with independent, identically distributed frequency distributions. If each one is Poisson, then so is the sum, with mean 100λ. If we change the number of exposures to for example 150, then the sum is Poisson with mean 150λ, or 1.5 times the mean in the first case. In general, as the exposures change, the distribution remains Poisson with the mean changing in proportion.
Exercise: The total number of claims from a portfolio of private passenger automobile insureds has a Poisson Distribution with λ = 60. If next year the portfolio has only 80% of the current exposures, what is its frequency distribution?
[Solution: Poisson with λ = (0.8)(60) = 48.]
This same result holds for a Compound Frequency Distribution, to be discussed subsequently, with a primary distribution that is Poisson.
21
See Section 6.12 of Loss Models.
Poisson Distribution

Parameters: λ > 0
Support: x = 0, 1, 2, 3, ...
Distribution Function: F(x) = 1 - Γ(x+1 ; λ)   (Incomplete Gamma Function22)
Probability Density Function: f(x) = λ^x e^−λ / x!
Mean = λ
Variance = λ          Variance / Mean = 1.
Coefficient of Variation = 1/√λ
Skewness = 1/√λ = CV.
Kurtosis = 3 + 1/λ = 3 + CV^2.
Mode = largest integer in λ (if λ is an integer then both λ and λ-1 are modes.)
nth Factorial Moment = λ^n.
Probability Generating Function: P(z) = e^λ(z-1), λ > 0.
Moment Generating Function: M(s) = exp[λ(e^s - 1)].
f(x+1) / f(x) = a + b / (x+1), a = 0, b = λ, f(0) = e^−λ.

[Graph of the probability density function of a Poisson Distribution with λ = 10.]
22 x+1 is the shape parameter of the Incomplete Gamma which is evaluated at the point λ. Thus one can get the sum of terms for the Poisson Distribution by using the Incomplete Gamma Function.
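As a sanity check on the distribution-function formula above, the following sketch (mine; it assumes SciPy is installed) compares the summed Poisson densities with the incomplete-gamma expression F(x) = 1 - Γ(x+1; λ):
```python
from math import exp, factorial
from scipy.special import gammaincc   # regularized upper incomplete gamma

lam, x = 10.0, 8
cdf_by_sum = sum(lam ** n * exp(-lam) / factorial(n) for n in range(x + 1))
cdf_by_gamma = gammaincc(x + 1, lam)  # equals 1 - Γ(x+1; λ) in the notation above
print(cdf_by_sum, cdf_by_gamma)       # both ~0.3328
```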
Problems: Use the following information for the next five questions: The density function for n is: f(n) = 6.9n e-6.9 / n!, n = 0, 1, 2, ... 4.1 (1 point) What is the mean of the distribution? A. less than 6.9 B. at least 6.9 but less than 7.0 C. at least 7.0 but less than 7.1 D. at least 7.1 but less than 7.2 E. at least 7.2 4.2 (1 point) What is the variance of the distribution? A. less than 6.9 B. at least 6.9 but less than 7.0 C. at least 7.0 but less than 7.1 D. at least 7.1 but less than 7.2 E. at least 7.2 4.3 (2 points) What is the chance of having less than 4 claims? A. less than 9% B. at least 9% but less than 10% C. at least 10% but less than 11% D. at least 11% but less than 12% E. at least 12% 4.4 (2 points) What is the mode of the distribution? A. 5 B. 6 C. 7 D. 8
E. None of the Above.
4.5 (2 points) What is the median of the distribution? A. 5 B. 6 C. 7 D. 8
E. None of the Above.
4.6 (2 points) The male drivers in the State of Grace each have their annual claim frequency given by a Poisson distribution with parameter equal to 0.05. The female drivers in the State of Grace each have their annual claim frequency given by a Poisson distribution with parameter equal to 0.03. You insure in the State of Grace 20 male drivers and 10 female drivers. Assume the claim frequency distributions of the individual drivers are independent. What is the chance of observing 3 claims in a year? A. less than 9.6% B. at least 9.6% but less than 9.7% C. at least 9.7% but less than 9.8% D. at least 9.8% but less than 9.9% E. at least 9.9 % 4.7 (2 points) Assume that the frequency of hurricanes hitting the State of Windiana is given by a Poisson distribution, with an average annual claim frequency of 82%. Assume that the losses in millions of constant 1998 dollars from such a hurricane are given by a Pareto Distribution with α = 2.5 and θ = 400 million. Assuming frequency and severity are independent, what is chance of two or more hurricanes each with more than $250 million (in constant 1998 dollars) of loss hitting the State of Windiana next year? (There may or may not be hurricanes of other sizes.) A. less than 2.1% B. at least 2.1% but less than 2.2% C. at least 2.2% but less than 2.3% D. at least 2.3% but less than 2.4% E. at least 2.4% Use the following information for the next 3 questions: The claim frequency follows a Poisson Distribution with a mean of 10 claims per year. 4.8 (2 points) What is the chance of having more than 5 claims in a year? A. 92% B. 93% C. 94% D. 95% E. 96% 4.9 (2 points) What is the chance of having more than 8 claims in a year? A. 67% B. 69% C. 71% D. 73% E. 75% 4.10 (1 point) What is the chance of having 6, 7, or 8 claims in a year? A. 19% B. 21% C. 23% D. 25% E. 27%
4.11 (2 points) You are given the following: • Claims follow a Poisson Distribution, with a mean of 27 per year.
• The size of claims are given by a Weibull Distribution with θ = 1000 and τ = 3. • Frequency and severity are independent. Given that during a year there are 7 claims of size less than 500, what is the expected number of claims of size greater than 500 during that year? (A) 20 (B) 21 (C) 22 (D) 23 (E) 24 4.12 (1 point) Frequency follows a Poisson Distribution with λ = 7. 20% of losses are of size greater than $50,000. Frequency and severity are independent. Let N be the number of losses of size greater than $50,000. What is the probability that N = 3? A. less than 9% B. at least 9% but less than 10% C. at least 10% but less than 11% D. at least 11% but less than 12% E. at least 12% 4.13 (1 point) N follows a Poisson Distribution with λ = 0.1. What is Prob[N = 1 | N ≤ 1]? A. 8%
B. 9%
C. 10%
D. 11%
E. 12%
4.14 (1 point) N follows a Poisson Distribution with λ = 0.1. What is Prob[N = 1 | N ≥ 1]? A. 91%
B. 92%
C. 93%
D. 94%
E. 95%
4.15 (2 points) N follows a Poisson Distribution with λ = 0.2. What is E[1/(N+1)]? A. less than 0.75 B. at least 0.75 but less than 0.80 C. at least 0.80 but less than 0.85 D. at least 0.85 but less than 0.90 E. at least 0.90 4.16 (2 points) N follows a Poisson Distribution with λ = 2. What is E[N | N > 1]? A. 2.6
B. 2.7
C. 2.8
D. 2.9
E. 3.0
4.17 (2 points) The total number of claims from a book of business with 500 exposures has a Poisson Distribution with λ = 27. Next year, this book of business will have 600 exposures. Next year, what is the probability of this book of business having a total of 30 claims?
A. 5.8%   B. 6.0%   C. 6.2%   D. 6.4%   E. 6.6%

Use the following information for the next two questions:
N follows a Poisson Distribution with λ = 1.3. Define (N-j)+ = N-j if N ≥ j, and 0 otherwise.

4.18 (2 points) Determine E[(N-1)+].
A. 0.48   B. 0.51   C. 0.54   D. 0.57   E. 0.60

4.19 (2 points) Determine E[(N-2)+].
A. 0.19   B. 0.20   C. 0.21   D. 0.22   E. 0.23
• Allshookup establishes a fund that will pay 1000/major earthquake. • The fund charges an annual premium, payable at the start of each year, of 60. • At the start of this year (before the premium is paid) the fund has 300. • Claims are paid immediately when there is a major earthquake. • If the fund ever runs out of money, it immediately ceases to exist. • Assume no investment income and no expenses. What is the probability that the fund is still functioning in 40 years? A. Less than 40% B. At least 40%, but less than 41% C. At least 41%, but less than 42% D. At least 42%, but less than 43% E. At least 43%
4.22 (2 points) You are given the following:
• A business has bought a collision policy to cover its fleet of automobiles. • The number of collision losses per year follows a Poisson Distribution. • The size of collision losses follows an Exponential Distribution with a mean of 600. • Frequency and severity are independent. • This policy has an ordinary deductible of 1000 per collision. • The probability of no payments on this policy during a year is 74%. Determine the probability that of the collision losses this business has during a year, exactly three of them result in no payment on this policy. (A) 8% (B) 9% (C) 10% (D) 11% (E) 12% 4.23 (2 points) A Poisson Distribution has a coefficient of variation 0.5. Determine the probability of exactly seven claims. (A) 4% (B) 5% (C) 6% (D) 7% (E) 8% 4.24 (2, 5/83, Q.4) (1.5 points) If X is the mean of a random sample of size n from a Poisson distribution with parameter λ, then which of the following statements is true? A. X has a Normal distribution with mean λ and variance λ. B. X has a Normal distribution with mean λ and variance λ/n. C. X has a Poisson distribution with parameter λ. D. n X has a Poisson distribution with parameter λn . E. n X has a Poisson distribution with parameter nλ. 4.25 (2, 5/83, Q.28) (1.5 points) The number of traffic accidents per week in a small city has a Poisson distribution with mean equal to 3. What is the probability of exactly 2 accidents in 2 weeks? A. 9e-6
B. 18e-6
C. 25e-6
D. 4.5e-3
E. 9.5e-3
4.26 (2, 5/83, Q.45) (1.5 points) Let X have a Poisson distribution with parameter λ = 1. What is the probability that X ≥ 2, given that X ≤ 4? A. 5/65 B. 5/41 C. 17/65 D. 17/41
E. 3/5
4.27 (2, 5/85, Q.9) (1.5 points) The number of automobiles crossing a certain intersection during any time interval of length t minutes between 3:00 P.M. and 4:00 P.M. has a Poisson distribution with mean t. Let W be time elapsed after 3:00 P.M. before the first automobile crosses the intersection. What is the probability that W is less than 2 minutes? A. 1 - 2e-1 - e-2
B. e-2
C. 2e-1
D. 1 - e-2
E. 2e-1 + e-2
4.28 (2, 5/85, Q.16) (1.5 points) In a certain communications system, there is an average of 1 transmission error per 10 seconds. Let the distribution of transmission errors be Poisson. What is the probability of more than 1 error in a communication one-half minute in duration? A. 1 - 2e-1
B. 1 - e-1
C. 1 - 4e-3
D. 1 - 3e-3
E. 1 - e-3
4.29 (2, 5/88, Q.49) (1.5 points) The number of power surges in an electric grid has a Poisson distribution with a mean of 1 power surge every 12 hours. What is the probability that there will be no more than 1 power surge in a 24-hour period? A. 2e-2
B. 3e-2
C. e-1/2
D. (3/2)e-1/2
E. 3e-1
4.30 (4, 5/88, Q.48) (1 point) An insurer's portfolio is made up of 3 independent policyholders with expected annual frequencies of 0.05, 0.1, and 0.15. Assume that each insured's number of claims follows a Poisson distribution. What is the probability that the insurer experiences fewer than 2 claims in a given year? A. Less than .9 B. At least .9, but less than .95 C. At least .95, but less than .97 D. At least .97, but less than .99 E. Greater than .99 4.31 (2, 5/90, Q.39) (1.7 points) Let X, Y, and Z be independent Poisson random variables with E(X) = 3, E(Y) = 1, and E(Z) = 4. What is P[X + Y + Z ≤ 1]? A. 13e-12
B. 9e-8
C. (13/12)e-1/12
D. 9e-1/8
E. (9/8)e-1/8
4.32 (4B, 5/93, Q.1) (1 point) You are given the following: • A portfolio consists of 10 independent risks. • The distribution of the annual number of claims for each risk in the portfolio is given by a Poisson distribution with mean µ = 0.1. Determine the probability of the portfolio having more than 1 claim per year. A. 5% B. 10% C. 26% D. 37% E. 63% 4.33 (4B, 11/94, Q.19) (3 points) The density function for a certain parameter, α, is f(α) = 4.6α e-4.6 / α!, α = 0, 1, 2, ... Which of the following statements are true concerning the distribution function for α? 1. The mode is less than the mean. 2. The variance is greater than the mean. 3. The median is less than the mean. A. 1 B. 2 C. 3 D. 1, 2
E. 1, 3
4.34 (4B, 5/95, Q.9) (2 points) You are given the following: • The number of claims for each risk in a group of identical risks follows a Poisson distribution. • The expected number of risks in the group that will have no claims is 96. • The expected number of risks in the group that will have 2 claims is 3. Determine the expected number of risks in the group that will have 4 claims. A. Less than .01 B. At least .01, but less than .05 C. At least .05, but less than .10 D. At least .10, but less than .20 E. At least .20 4.35 (2, 2/96, Q.21) (1.7 points) Let X be a Poisson random variable with E(X) = ln(2). Calculate E[cos(πx)]. A. 0
B. 1/4
C. 1/2 D. 1
E. 2ln(2)
4.36 (4B, 11/98, Q.1) (1 point) You are given the following: • The number of claims follows a Poisson distribution. • Claim sizes follow a Pareto distribution. Determine the type of distribution that the number of claims with sizes greater than 1,000 follows. A. Poisson B. Pareto C. Gamma D. Binomial E. Negative Binomial 4.37 (4B, 11/98, Q.2) (2 points) The random variable X has a Poisson distribution with mean n - 1/2, where n is a positive integer greater than 1. Determine the mode of X. A. n-2 B. n-1 C. n D. n+1 E. n+2 4.38 (4B, 11/98, Q.18) (2 points) The number of claims per year for a given risk follows a distribution with probability function p(n) = λn e−λ / n! , n = 0, 1,..., λ > 0 . Determine the smallest value of λ for which the probability of observing three or more claims during two given years combined is greater than 0.1. A. Less than 0.7 B. At least 0.7, but less than 1.0 C. At least 1.0, but less than 1.3 D. At least 1.3, but less than 1.6 E. At least 1.6
4.39 (4B, 5/99, Q.8) (3 points) You are given the following: • Each loss event is either an aircraft loss or a marine loss. • The number of aircraft losses has a Poisson distribution with a mean of 0.1 per year. Each loss is always 10,000,000. • The number of marine losses has a Poisson distribution with a mean of 0.2 per year. Each loss is always 20,000,000. • Aircraft losses occur independently of marine losses. • From the first two events each year, the insurer pays the portion of the combined losses that exceeds 10,000,000. Determine the insurer's expected annual payments. A. Less than 1,300,000 B. At least 1,300,000, but less than 1,800,000 C. At least 1,800,000, but less than 2,300,000 D. At least 2,300,000, but less than 2,800,000 E. At least 2,800,000 4.40 (IOA 101, 4/00, Q.5) (2.25 points) An insurance companyʼs records suggest that experienced drivers (those aged over 21) submit claims at a rate of 0.1 per year, and inexperienced drivers (those 21 years old or younger) submit claims at a rate of 0.15 per year. A driver can submit more than one claim a year. The company has 40 experienced and 20 inexperienced drivers insured with it. The number of claims for each driver can be modeled by a Poisson distribution, and claims are independent of each other. Calculate the probability the company will receive three or fewer claims in a year. 4.41 (1, 5/00, Q.24) (1.9 points) An actuary has discovered that policyholders are three times as likely to file two claims as to file four claims. If the number of claims filed has a Poisson distribution, what is the variance of the number of claims filed? (A) 1/ 3
(B) 1
(C) 2
(D) 2
(E) 4
4.42 (3, 5/00, Q.2) (2.5 points) Lucky Tom finds coins on his way to work at a Poisson rate of 0.5 coins/minute. The denominations are randomly distributed: (i) 60% of the coins are worth 1; (ii) 20% of the coins are worth 5; and (iii) 20% of the coins are worth 10. Calculate the conditional expected value of the coins Tom found during his one-hour walk today, given that among the coins he found exactly ten were worth 5 each. (A) 108 (B) 115 (C) 128 (D) 165 (E) 180
4.43 (1, 11/00, Q.23) (1.9 points) A company buys a policy to insure its revenue in the event of major snowstorms that shut down business. The policy pays nothing for the first such snowstorm of the year and 10,000 for each one thereafter, until the end of the year. The number of major snowstorms per year that shut down business is assumed to have a Poisson distribution with mean 1.5. What is the expected amount paid to the company under this policy during a one-year period? (A) 2,769 (B) 5,000 (C) 7,231 (D) 8,347 (E) 10,578 4.44 (3, 11/00, Q.29) (2.5 points) Job offers for a college graduate arrive according to a Poisson process with mean 2 per month. A job offer is acceptable if the wages are at least 28,000. Wages offered are mutually independent and follow a lognormal distribution, with µ = 10.12 and σ = 0.12. Calculate the probability that it will take a college graduate more than 3 months to receive an acceptable job offer. (A) 0.27 (B) 0.39 (C) 0.45 (D) 0.58 (E) 0.61 4.45 (1, 11/01, Q.19) (1.9 points) A baseball team has scheduled its opening game for April 1. If it rains on April 1, the game is postponed and will be played on the next day that it does not rain. The team purchases insurance against rain. The policy will pay 1000 for each day, up to 2 days, that the opening game is postponed. The insurance company determines that the number of consecutive days of rain beginning on April 1 is a Poisson random variable with mean 0.6. What is the standard deviation of the amount the insurance company will have to pay? (A) 668 (B) 699 (C) 775 (D) 817 (E) 904 4.46 (CAS3, 11/03, Q.31) (2.5 points) Vehicles arrive at the Bun-and-Run drive-thru at a Poisson rate of 20 per hour. On average, 30% of these vehicles are trucks. Calculate the probability that at least 3 trucks arrive between noon and 1:00 PM. A. Less than 0.80 B. At least 0.80, but less than 0.85 C. At least 0.85, but less than 0.90 D. At least 0.90, but less than 0.95 E. At least 0.95
4.47 (CAS3, 5/04, Q.16) (2.5 points) The number of major hurricanes that hit the island nation of Justcoast is given by a Poisson Distribution with 0.100 storms expected per year.
• Justcoast establishes a fund that will pay 100/storm. • The fund charges an annual premium, payable at the start of each year, of 10. • At the start of this year (before the premium is paid) the fund has 65. • Claims are paid immediately when there is a storm. • If the fund ever runs out of money, it immediately ceases to exist. • Assume no investment income and no expenses. What is the probability that the fund is still functioning in 10 years? A. Less than 60% B. At least 60%, but less than 61% C. At least 61%, but less than 62% D. At least 62%, but less than 63% E. At least 63% 4.48 (CAS3, 11/04, Q.17) (2.5 points) You are given:
• Claims are reported at a Poisson rate of 5 per year. • The probability that a claim will settle for less than $100,000 is 0.9. What is the probability that no claim of $100,000 or more is reported during the next 3 years? A. 20.59% B. 22.31% C. 59.06% D. 60.65% E. 74.08% 4.49 (CAS3, 11/04, Q.23) (2.5 points) Dental Insurance Company sells a policy that covers two types of dental procedures: root canals and fillings. There is a limit of 1 root canal per year and a separate limit of 2 fillings per year. The number of root canals a person needs in a year follows a Poisson distribution with λ = 1, and the number of fillings a person needs in a year is Poisson with λ = 2. The company is considering replacing the single limits with a combined limit of 3 claims per year, regardless of the type of claim. Determine the change in the expected number of claims per year if the combined limit is adopted. A. No change B. More than 0.00, but less than 0.20 claims C. At least 0.20, but less than 0.25 claims D. At least 0.25, but less than 0.30 claims E. At least 0.30 claims
4.50 (SOA M, 5/05, Q.5) (2.5 points) Kings of Fredonia drink glasses of wine at a Poisson rate of 2 glasses per day. Assassins attempt to poison the kingʼs wine glasses. There is a 0.01 probability that any given glass is poisoned. Drinking poisoned wine is always fatal instantly and is the only cause of death. The occurrences of poison in the glasses and the number of glasses drunk are independent events. Calculate the probability that the current king survives at least 30 days. (A) 0.40 (B) 0.45 (C) 0.50 (D) 0.55 (E) 0.60 4.51 (CAS3, 11/05, Q.24) (2.5 points) For a compound loss model you are given:
• The claim count follows a Poisson distribution with λ = 0.01. • Individual losses are distributed as follows: x F(x) 100 0.10 300 0.20 500 0.25 600 0.40 700 0.50 800 0.70 900 0.80 1,000 0.90 1,200 1.00 Calculate the probability of paying at least one claim after implementing a $500 deductible. A. Less than 0.005 B. At least 0.005, but less than 0.010 C. At least 0.010, but less than 0.015 D. At least 0.015, but less than 0.020 E. At least 0.020 4.52 (CAS3, 11/05, Q.31) (2.5 points) The Toronto Bay Leaves attempt shots in a hockey game according to a Poisson process with mean 30. Each shot is independent. For each attempted shot, the probability of scoring a goal is 0.10. Calculate the standard deviation of the number of goals scored by the Bay Leaves in a game. A. Less than 1.4 B. At least 1.4, but less than 1.6 C. At least 1.6, but less than 1.8 D. At least 1.8, but less than 2.0 E. At least 2.0
4.53 (CAS3, 11/06, Q.32) (2.5 points) You are given:
• Annual frequency follows a Poisson distribution with mean 0.3. • Severity follows a normal distribution with F(100,000) = 0.6. Calculate the probability that there is at least one loss greater than 100,000 in a year. A. Less than 11 % B. At least 11%, but less than 13% C. At least 13%, but less than 15% D. At least 15%, but less than 17% E. At least 17% 4.54 (SOA M, 11/06, Q.9) (2.5 points) A casino has a game that makes payouts at a Poisson rate of 5 per hour and the payout amounts are 1, 2, 3,… without limit. The probability that any given payout is equal to i is 1/ 2i. Payouts are independent. Calculate the probability that there are no payouts of 1, 2, or 3 in a given 20 minute period. (A) 0.08 (B) 0.13 (C) 0.18 (D) 0.23 (E) 0.28 4.55 (CAS3L, 5/09, Q.8) (2.5 points) Bill receives mail at a Poisson rate of 10 items per day. The contents of the items are randomly distributed:
• 50% of the items are credit card applications. • 30% of the items are catalogs. • 20% of the items are letters from friends. Bill has received 20 credit card applications in two days. Calculate the probability that for those same two days, he receives at least 3 letters from friends and exactly 5 catalogs. A. Less than 6% B. At least 6%, but less than 10% C. At least 10%, but less than 14% D. At least 14%, but less than 18% E. At least 18%
4.56 (CAS3L, 5/09, Q.9) (2.5 points) You are given the following information:
• Policyholder calls to a call center follow a homogenous Poisson process with λ = 250 per day. • Policyholders may call for 3 reasons: Endorsement, Cancellation, or Payment. • The distribution of calls is as follows: Call Type Percent of Calls Endorsement 50% Cancellation 10% Payment 40% Using the normal approximation with continuity correction, calculate the probability of receiving more than 156 calls in a day that are either endorsements or cancellations. A. Less than 27% B. At least 27%, but less than 29% C. At least 29%, but less than 31% D. At least 31%, but less than 33% E. At least 33% 4.57 (CAS3L, 11/09, Q.11) (2.5 points) You are given the following information:
• Claims follow a compound Poisson process. • Claims occur at the rate of λ = 10 per day. • Claim severity follows an exponential distribution with θ = 15,000. • A claim is considered a large loss if its severity is greater than 50,000. What is the probability that there are exactly 9 large losses in a 30-day period? A. Less than 5% B. At least 5%, but less than 7.5% C. At least 7.5%, but less than 10% D. At least 10%, but less than 12.5% E. At least 12.5%
Solutions to Problems: 4.1. B. This is a Poisson distribution with a parameter of 6.9. The mean is therefore 6.9. 4.2. B. This is a Poisson distribution with a parameter of 6.9. The variance is therefore 6.9. 4.3. A. One needs to sum the chances of having 0, 1, 2, and 3 claims: n f(n) F(n)
0 0.001 0.001
1 0.007 0.008
2 0.024 0.032
3 0.055 0.087
For example, f(3) = 6.93 e-6.9 / 3! = (328.5)(.001008)/6 = .055. 4.4. B. The mode is the value at which f(n) is a maximum; f(6) = .151 and the mode is therefore 6. n f(n)
0 0.001
1 0.007
2 0.024
3 0.055
4 0.095
5 0.131
6 0.151
7 0.149
8 0.128
Alternately, in general for the Poisson the mode is the largest integer in the parameter; the largest integer in 6.9 is 6. 4.5. C. For a discrete distribution such as we have here, employ the convention that the median is the first value at which the distribution function is greater than or equal to .5. F(7) ≥ 50% and F(6) < 50%, and therefore the median is 7. n f(n) F(n)
0 0.001 0.001
1 0.007 0.008
2 0.024 0.032
3 0.055 0.087
4 0.095 0.182
5 0.131 0.314
6 0.151 0.465
7 0.149 0.614
8 0.128 0.742
4.6. E. The sum of Poisson variables is a Poisson with the sum of the parameters. The sum has a Poisson parameter of (20)(.05) + (10)(.03) = 1.3. The chance of three claims is (1.33 )e-1.3 / 3! = 9.98%. 4.7. E. For the Pareto Distribution, S(x) = 1 - F(x) = {θ/(θ+x)}α. S(250) = {400/(400+250)}2.5 = 0.2971. Thus the distribution of hurricanes with more than $250 million of loss is Poisson with mean frequency of (82%)(.2971) = 24.36%. The chance of zero such hurricanes is e-0.2436 = 0.7838. The chance of one such hurricane is: (0.2436)e-0.2436 = 0.1909. The chance of more than one such hurricane is: 1 - (0.7838 + 0.1909) = 0.0253.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 63
4.8. B. f(n) = e−λ λ n / n! = e-10 10n / n! n f(n) F(n)
0 0.0000 0.0000
1 0.0005 0.0005
2 0.0023 0.0028
3 0.0076 0.0103
4 0.0189 0.0293
5 0.0378 0.0671
Thus the chance of having more than 5 claims is 1 - .0671 = .9329. Comment: Although one should not do so on the exam, one can also solve this using the Incomplete Gamma Function. The chance of having more than 5 claims is the Incomplete Gamma with shape parameter 5+1 =6 at the value 10: Γ(6;10) = .9329. 4.9. A. f(n) = e−λ λ n / n! = e-10 10n / n! n f(n) F(n)
0 0.0000 0.0000
1 0.0005 0.0005
2 0.0023 0.0028
3 0.0076 0.0103
4 0.0189 0.0293
5 0.0378 0.0671
6 0.0631 0.1301
7 0.0901 0.2202
8 0.1126 0.3328
Thus the chance of having more than 8 claims is 1 - .3328 = .6672. Comment: The chance of having more than 8 claims is the incomplete Gamma with shape parameter 8+1 = 9 at the value 10: Γ(9;10) = 0.6672. 4.10. E. One can add up: f(6) + f(7) + f(8) = 0.0631 + 0.0901 + 0.1126 = 0.2657. Alternately, one can use the solutions to the two previous questions. F(8) - F(5) = {1-F(5)} - {1-F(8)} = 0.9239 - 0.6672 = 0.2657. Comment: Prob[6, 7, or 8 claims] = Γ(6;10) - Γ(9;10) = 0.9239 - 0.6672 = 0.2657. 4.11. E. The large and small claims are independent Poisson Distributions. Therefore, the observed number of small claims has no effect on the expected number of large claims. S(500) = exp(-(500/1000)3 ) = 0.8825. Expected number of large claims is: (27)(0.8825) = 23.8. 4.12. D. Frequency of large losses follows a Poisson Distribution with λ = (20%)(7) = 1.4. f(3) = 1.43 e-1.4/3! = 11.3%. 4.13. B. Prob[N = 1 | N ≤ 1] = Prob[N=1]/Prob[N ≤ 1] = λe−λ/(e−λ + λe−λ) = λ/(1 + λ) = 0.0909. 4.14. E. Prob[N = 1 | N ≥ 1] = Prob[N = 1]/Prob[N ≥ 1] = λe−λ/(1 - e−λ) = λ/(eλ - 1) = 0.9508.
2013-4-1,
Frequency Distributions, §4 Poisson Dist. ∞
4.15. E. E[1/(N+1)] =
∑ f(n)/ (n +1) = 0
∞
=
(e-.2/.2)
∑
0.2i / i!
∞
=
(e-.2/.2){
1
HCM 10/4/12,
∞
∑(e- .2 .2n / n!) / (n +1) = (e-.2/.2) 0
∑0.2i / i! −
Page 64
∞
∑0.2n + 1/ (n +1)! 0
0.20 / 0! } = (e-.2/.2)(e.2 - 1) = (1 - e-.2)/.2 = 0.906.
0
4.16. D. E[N] = P[N = 0]0 + P[N = 1]1 + P[N > 1]E[N | N > 1]. 2 = 2e-2 + (1 - e-2 - 2e-2)E[N | N > 1]. E[N | N > 1] = (2 - 2e-2)/(1 - e-2 - 2e-2) = 2.91. 4.17. E. Next year the frequency is Poisson with λ = (600/500)(27) = 32.4. f(30) = e-32.4 32.430/30! = 6.63%. 4.18. D. E[N | N ≥ 1] Prob[N ≥ 1] + 0 Prob[N = 0] = E[N] = λ. ⇒ E[N | N ≥ 1] Prob[N ≥ 1] = λ. E[(N-1)+] = E[N - 1 | N ≥ 1] Prob[N ≥ 1] + 0 Prob[N = 0] = (E[N | N ≥ 1] - 1) Prob[N ≥ 1] = E[N | N ≥ 1] Prob[N ≥ 1] - Prob[N ≥ 1] = λ - (1 - e−λ) = λ + e−λ - 1 = 1.3 + e-1.3 - 1 = 0.5725. ∞
∞
∞
Alternately, E[(N-1)+] = Σ (n-1) f(n) = Σ n f(n) - Σ f(n) = E[N] - Prob[N≥ 1] = λ + e−λ - 1. n=1
n=1
n=1
Alternately, E[(N-1)+] = E[N] - E[N ∧ 1] = λ - Prob[N ≥ 1] = λ + e−λ - 1. Alternately, E[(N-1)+] = E[(1-N)+] + E[N] - 1 = Prob[N = 0] + λ - 1 = e−λ + λ - 1. Comment: For the last two alternate solutions, see “Mahlerʼs Guide to Loss Distributions.”
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 65
4.19. B. E[N | N ≥ 2] Prob[N ≥ 2] + 1 Prob[N = 1] + 0 Prob[N = 0] = E[N] = λ.
⇒ E[N | N ≥ 2] Prob[N ≥ 2] = λ − λe−λ. E[(N-2)+] = E[N - 2 | N ≥ 2] Prob[N ≥ 2] + 0 Prob[N < 2] = (E[N | N ≥ 2] - 2) Prob[N ≥ 2] = E[N | N ≥ 2] Prob[N ≥ 2] - 2Prob[N ≥ 2] = λ - λe−λ - 2(1 - e−λ - λe−λ) = λ + 2e−λ + λe−λ - 2 = 1.3 + 2e-1.3 + 1.3e-1.3 - 2 = 0.199. ∞
∞
∞
Alternately, E[(N-2)+] = Σ (n-2) f(n) = Σ n f(n) - 2Σ f(n) = E[N] - λe−λ - 2 Prob[N≥ 2] = n=2
n=2
n=2
= λ - λe−λ - 2(1 - e−λ - λe−λ) = λ + 2e−λ + λe−λ - 2. Alternately, E[(N-2)+] = E[N] - E[N ∧ 2] = λ - (Prob[N = 1] + 2Prob[N ≥ 2]) = λ + 2e−λ + λe−λ - 2. Alternately, E[(N-2)+] = E[(2-N)+] + E[N] - 2 = 2Prob[N = 0] + Prob[N = 1] + λ - 2 = 2e−λ + λe−λ + λ - 2. Comment: For the last two alternate solutions, see “Mahlerʼs Guide to Loss Distributions.” 4.20. A. For the Weibull, S(500) = exp[-(500/2000).7] = .6846. S(1000) = exp[-(1000/2000).7] = .5403. Therefore, with the $1000 deductible, the non-zero payments are Poisson, with λ = (.5403/.6846)(3.3) = 2.60. f(4) = e-2.6 2.64 /4! = 14.1%. 4.21. D. In the absence of losses, by the beginning of year 12, the fund would have: 300 + (12)(60) = 1020 > 1000. In the absence of losses, by the beginning of year 29, the fund would have: 300 + (29)(60) = 2040 > 2000. Thus in order to survive for 40 years there have to be 0 events in the first 11 years, at most one event during the first 28 years, and at most two events during the first 40 years. Prob[survival through 40 years] = Prob[0 in first 11 years]{Prob[0 in next 17 years]Prob[0, 1, or 2 in final 12 years] + Prob[1 in next 17 years]Prob[0 or 1 in final 12 years]} = e-.55{(e-.85)(e-.6 + .6e-.6 + .62 e-.6/2) + (.85e-.85)(e-.6 + .6e-.6)} = 3.14e-2 = 0.425 Comment: Similar to CAS3, 5/04, Q.16.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 66
4.22. C. The percentage of large losses is: e-1000/600 = 18.89%. Let λ be the mean of the Poisson distribution of losses. Then the large losses, those of size greater than 1000, are also Poisson with mean 0.1889λ. 74% = Prob[0 large losses] = exp[-0.1889λ]. ⇒ λ = 1.59. The small losses, those of size less than 1000, are also Poisson with mean: (1 - 0.1889) (1.59) = 1.29. Prob[3 small losses] = 1.293 e-1.29 / 6 = 9.8%. 4.23. C. The coefficient of variation is the ratio of the standard deviation to the mean, which for a Poisson Distribution is:
λ / λ = 1/
λ . 1/
λ = 0.5. ⇒ λ = 4. ⇒ f(3) = 47 e-4 / 7! = 5.95%.
4.24. E. n X , the sum of n Poissons each with mean λ, is a Poisson with mean nλ. Comment: X can be non-integer, and therefore it cannot have a Poisson distribution. As n → ∞, X → a Normal distribution with mean λ and variance λ/n. 4.25. B. Over two weeks, the number of accidents is Poisson with mean 6. f(2) = e−λλ 2/2 = e-6 62 /2 = 18e- 6. 4.26. C. Prob[X ≥ 2 | X ≤ 4] = {f(2) + f(3) + f(4)}/{f(0) + f(1) + f(2) + f(3) + f(4)} = e -1(1/2 + 1/6 + 1/24)/{e-1(1 + 1 + 1/2 + 1/6 + 1/24)} = (12 + 4 + 1)/(24 + 24 + 12 + 4 + 1) = 17/65. 4.27. D. Prob[W ≤ 2] = 1 - Prob[no cars by time 2] = 1 - e- 2. 4.28. C. Prob[0 errors in 30 seconds] = e-30/10 = e-3. Prob[1 error in 30 seconds] = 3e-3. Prob[more than one error in 30 seconds] = 1 - 4e- 3. 4.29. B. Prob[0 or 1 surges] = e-24/12 + 2e-2 = 3e- 2. 4.30. C. The sum of three independent Poissons is also a Poisson, whose mean is the sum of the individual means. Thus the portfolio of three insureds has a Poisson distribution with mean .05 + .10 + .15 = .30. For a Poisson distribution with mean θ, the chance of zero claims is e−θ and that of 1 claim is θe−θ. Thus the chance of fewer than 2 claims is: (1+θ)e−θ. Thus for this portfolio, the chance of 2 or fewer claims is: (1+.3)e-.3 = 0.963.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 67
4.31. B. X + Y + Z is Poisson with λ = 3 + 1 + 4 = 8. f(0) + f(1) = e-8 + 8e-8 = 9e- 8. 4.32. C. The sum of independent Poissons is a Poisson, with a parameter the sum of the individual Poisson parameters. In this case the portfolio is Poisson with a parameter = (10)(.1) = 1. Chance of zero claims is e-1. Chance of one claim is (1)e-1. Chance of more than one claim is: 1 - (e-1 + e-1) = 0.264. 4.33. E. This is a Poisson distribution with a parameter of 4.6. The mean is therefore 4.6. The mode is the value at which f(α) is a maximum; f(4) = .188 and the mode is 4. Therefore statement 1 is true. For the Poisson the variance equals the mean, and therefore statement 2 is false. For a discrete distribution such as we have here, the median is the first value at which the distribution function is greater than or equal to .5. F(4) > 50%, and therefore the median is 4 and less than the mean of 4.6. Therefore statement 3 is true. n f(n) F(n)
0 0.010 0.010
1 0.046 0.056
2 0.106 0.163
3 0.163 0.326
4 0.188 0.513
5 0.173 0.686
6 0.132 0.818
7 0.087 0.905
8 0.050 0.955
Comment: For a Poisson with parameter λ, the mode is the largest integer in λ. In this case λ = 4.6 so the mode is 4. Items 1 and 3 can be answered by computing enough values of the density and adding them up. Alternately, since the distribution is skewed to the right (has positive skewness), both the peak of the curve and the 50% point are expected to be to the left of the mean. The median is less affected by the few extremely large values than is the mean, and therefore for curves skewed to the right the median is smaller than the mean. For curves skewed to the right, the largest single probability most commonly occurs at less than the mean, but this is not true of all such curves. 4.34. B. Assume we have R risks in total. The Poisson distribution is given by: f(n) = e−λλ n / n!, n=0,1,2,3,... Thus for n=0 we have R e−λ = 96. For n = 2 we have R e−λ λ 2 / 2 = 3. By dividing these two equations we can solve for λ = (6/96).5 = 1/4. The number of risks we expect to have 4 claims is: R e−λ λ 4 / 4! = (96)(1/4)4 / 24 = .0156.
4.35. B. cos(0) = 1. cos(π) = -1. cos(2π) = 1. cos(3π) = -1. E[cos(πx)] = Σ (-1)x f(x) = e−λ{1 - λ + λ2/2! - λ3/3! + λ4/3! - λ5/5! + ...} = e−λ{1 + (-λ) + (-λ)2/2! + (-λ)3/3! + (-λ)4/3! + (-λ)5/5! + ...} = e−λe−λ = e−2λ. For λ = ln(2), e−2λ = 2-2 = 1/4.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 68
4.36. A. If frequency is given by a Poisson and severity is independent of frequency, then the number of claims above a certain amount (in constant dollars) is also a Poisson. 4.37. B. The mode of the Poisson with mean λ is the largest integer in λ. The largest integer in n - 1/2 is n-1. Alternately, for the Poisson f(x)/ f(x-1) = λ/x. Thus f increases when λ > x and decreases for λ < x. Thus f increases for x < λ = n - .5. For integer n, x < n - .5 for x ≤ n -1. Thus the density increases up to n - 1 and decreases thereafter. Therefore the mode is n - 1. 4.38. A. We want the chance of less than 3 claims to be less than .9. For a Poisson with mean λ, the probability of 0, 1 or 2 claims is: e−λ(1 + λ + λ2 /2). Over two years we have a Poisson with mean 2λ. Thus we want e−2λ(1 + 2λ + 2λ2 ) < .9. Trying the endpoints of the given intervals we determine that the smallest such λ must be less than 0.7. Comment: In fact the smallest such λ is about .56.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 69
4.39. D. If there are more than two losses, we are not concerned about those beyond the first two. Since the sum of two independent Poissons is a Poisson, the portfolio has a Poisson frequency distribution with mean of 0.3. Therefore, the chance of zero claims is e-.3 = .7408, one claim is .3 e-.3 = .2222, and of two or more claims is 1 - .741 - .222 = .0370. .1/.3 = 1/3 of the losses are aircraft and .2/.3 = 2/3 of the losses are marine. Thus the probability of the first two events, in the case of two or more events, is divided up as (1/3)(1/3) = 1/9, 2(1/3)(2/3) = 4/9, (2/3)(2/3) = 4/9, between 2 aircraft, 1 aircraft and 1 marine, and 2 marine, using the binomial expansion for two events. ⇒ (.2222)/(1/3) = .0741 = probability of one aircraft, (.2222)(2/3) = probability of one marine, (.0370)(1/9) = .0041 = probability of 2 aircraft, (.0370)(4/9) = probability one of each type, (.0370)(4/9) = .0164 = probability of 2 marine. If there are zero claims, the insurer pays nothing. If there is one aircraft loss, the insurer pays nothing. If there is one marine loss, the insurer pays 10 million. If there are two or more events there are three possibilities for the first two events. If the first two events are aircraft, the insurer pays 10 million. If the first two events are one aircraft and one marine, the insurer pays 20 million. If the first two events are marine, the insurer pays 30 million. Events (first 2 only)
Probability
Losses from First 2 events ($ million)
Amount Paid by the Insurer ($ million)
None 1 Aircraft 1 Marine 2 Aircraft 1 Aircraft, 1 Marine 2 Marine
0.7408 0.0741 0.1481 0.0041 0.0164 0.0164
0 10 20 20 30 40
0 0 10 10 20 30
1
2.345
Thus the insurerʼs expected annual payment is $2.345 million. 4.40. The total number of claims from inexperienced drivers is Poisson with mean: (20)(.15) = 3. The total number of claims from experienced drivers is Poisson with mean: (40)(.1) = 4. The total number of claims from all drivers is Poisson with mean: 3 + 4 = 7. Prob[# claims ≤ 3] = e-7(1 + 7 + 72 /2 + 73 /6) = 8.177%. 4.41. D. f(2) =3 f(4). ⇒ e−λλ 2/2 = 3e−λλ 4/24. ⇒ λ = 2. Variance = λ = 2. 4.42. C. The finding of the three different types of coins are independent Poisson processes. Over the course of 60 minutes, Tom expects to find (.6)(.5)(60) = 18 coins worth 1 each and (.2)(.5)(60) = 6 coins worth 10 each. Tom finds 10 coins worth 5 each. The expected worth of the coins he finds is: (18)(1) + (10)(5) + (6)(10) = 128.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
4.43. C. E[(X-1)+] = E[X] - E[X
∧
HCM 10/4/12,
Page 70
1] = 1.5 - {0f(0) + 1(1 - f(0))} = .5 + f(0) = .5 + e-1.5 = .7231.
Expected Amount Paid is: 10,000E[(X-1)+] = 7231. Alternately, Expected Amount Paid is: 10,000{1f(2) + 2f(3) + 3f(4) + 4f(5) + 5f(6) + ...} = (10,000)e-1.5{1.52 /2 + (2)(1.53 /6) + (3)(1.54 /24) + (4)(1.55 /120) + (5)(1.56 /720) + ...} = 2231{1.125 + 1.125 + .6328 + .2531 + .0791 + .0203 + ...) ≅ 7200. 4.44. B. For this Lognormal Distribution, S(28,000) = 1 - Φ[ln(28000) - 10.12)/0.12] = 1 - Φ(1) = 1 - 0.8413 = 0.1587. Acceptable offers arrive via a Poisson Process at rate 2 S(28000) = (2)(.01587) = 0.3174 per month. Thus the number of acceptable offers over the first 3 months is Poisson distributed with mean (3)(0.3174) = .9522. The probability of no acceptable offers over the first 3 months is: e -0.9522 = 0.386. Alternately, the probability of no acceptable offers in a month is: e-0.3174. Probability of no acceptable offers in 3 months is: (e-0.3174)3 = e -0.9522 = 0.386. 4.45. B. E[X
∧
2] = (0)f(0) + 1f(1) + 2{1 - f(0) - f(1)} = .6e-.6 + 2{1 - e-.6 - .6e-.6} = .573.
∧ 2)2] = (0)f(0) + 1f(1) + 4{1 - f(0) - f(1)} = .6e-.6 + 4{1 - e-.6 - .6e-.6} = .8169. Var[X ∧ 2] = .8169 - .5732 = .4886. Var[1000(X ∧ 2)] = (10002 )(.4886) = 488,600. StdDev[1000(X ∧ 2)] = 488,600 = 699. E[(X
4.46. D. Trucks arrive at a Poisson rate of: (30%)(20) = 6 per hour. f(0) = e-6. f(1) = 6e-6. f(2) = 62 e-6/2. 1 - {f(0) + f(1) + f(2)} = 1 - 25e-6 = 0.938. 4.47. D. If there is a storm within the first three years, then there is ruin, since the fund would have only 65 + 30 = 95 or less. If there are two or more storms in the first ten years, then the fund is ruined. Thus survival requires no storms during the first three years and at most one storm during the next seven years. Prob[survival through 10 years] = Prob[0 storms during 3 years] Prob[0 or 1 storm during 7 years] = (e-.3)(e-.7 + .7e-.7) = 0.625. 4.48. B. Claims of $100,000 or more are Poisson with mean: (5)(1 - 0.9) = 0.5 per year. The number of large claims during 3 years is Poisson with mean: (3)(0.5) = 1.5. f(0) = e-1.5 = 0.2231.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
4.49. C. For λ = 1, E[N For λ = 2, E[N
∧
∧
HCM 10/4/12,
Page 71
1] = 0f(0) + 1(1 - f(0)) = 1 - e-1 = .6321.
2] = 0f(0) + 1f(1) + 2(1 - f(0) - f(1)) = 2e-2 + 2(1 - e-2 - 2e-2) = 1.4587.
Expected number of claims before change: .6321 + 1.4587 = 2.091. The sum of number of root canals and the number of fillings is Poisson with λ = 3. For λ = 3, E[N
∧
3] = 0f(0) + 1f(1) + 2f(2) + 3(1 - f(0) - f(1) - f(2)) =
3e-3 + (2)(9e-3/2) + 3(1 - e-3 - 3e-3 - 4.5e-3) = 2.328. Change is: 2.328 - 2.091 = 0.237. Comment: Although it is not stated, we must assume that the number of root canals and the number of fillings are independent. 4.50. D. Poisoned glasses of wine are Poisson with mean: (0.01)(2) = 0.02 per day. The probability of no poisoned glasses over 30 days is: e-(30)(0.02) = e-0.6 = 0.549. Comment: Survival corresponds to zero poisoned glasses of wine. The king can drink any number of non-poisoned glasses of wine. The poisoned and non-poisoned glasses are independent Poisson Processes. 4.51. B. After implementing a $500 deductible, only losses of size greater than 500 result in a claim payment. Prob[loss > 500] = 1 - F(500) = 1 - .25 = .75. Via thinning, large losses are Poisson with mean: (.75)(.01) = .0075. Prob[at least one large loss] = 1 - e-.0075 = 0.00747. 4.52. C. By thinning, the number of goals is Poisson with mean (30)(0.1) = 3. This Poisson has variance 3, and standard deviation
3 = 1.732.
4.53. B. Large losses are Poisson with mean: (1 - .6)(.3) = 0.12. Prob[at least one large loss] = 1 - e-.12 = 11.3%.
2013-4-1,
Frequency Distributions, §4 Poisson Dist.
HCM 10/4/12,
Page 72
4.54. D. Payouts of size one are Poisson with λ = (1/2)(5) = 2.5 per hour. Payouts of size one are Poisson with λ = (1/4)(5) = 1.25 per hour. Payouts of size one are Poisson with λ = (1/8)(5) = 0.625 per hour. Prob[0 of size 1 over 1/3 of an hour] = e-2.5/3. Prob[0 of size 2 over 1/3 of an hour] = e-1.25/3. Prob[0 of size 3 over 1/3 of an hour] = e-0.625/3. The three Poisson Processes are independent, so we can multiply the above probabilities: e-2.5/3e-1.25/3e-0.625/3 = e-1.458 = 0.233. Alternately, payouts of sizes one, two, or three are Poisson with λ = (1/2 + 1/4 + 1/8)(5) = 4.375 per hour. Prob[0 of sizes 1, 2, or 3, over 1/3 of an hour] = e-4.375/3 = 0.233. 4.55. C. Catalogs are Poisson with mean over two days of: (2)(30%)(10) = 6. Letters are Poisson with mean over two days of: (2)(20%)(10) = 4. The Poisson processes are all independent. Therefore, knowing he got 20 applications tells us nothing about the number of catalogs or letters. Prob[ at least 3 letters] = 1 - e-4 - e-4 4 - e-4 42 /2 = 0.7619. Prob[ 5 catalogs] = e-6 65 /120 = 0.1606. Prob[ at least 3 letters and exactly 5 catalogs] = (.7619)(.1606) = 12.2%. 4.56. C. The number of endorsements and cancellations is Poisson with λ = (250)(60%) = 150. Applying the normal approximation with mean and variance equal to 150: Prob[more than 156] ≅ 1 - Φ[(156.5 - 150)/ 150 ] = 1 - Φ[0.53] = 29.8%. Comment: Using the continuity correction, 156 is out and 157 is in: 156 156.5 157 |→ 4.57. D. S(50) = e-50/15 = 0.03567. Large losses are Poisson with mean: 10 S(50) = 0.3567. Over 30 days, large losses are Poisson with mean: (30)(0.3567) = 10.70. Prob[exactly 9 large losses in a 30-day period] = e-10.7 10.79 / 9! = 11.4%.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 73
Section 5, Geometric Distribution The Geometric Distribution, a special case of the Negative Binomial Distribution, will be discussed first.
Geometric Distribution Parameters: β > 0.
Support: x = 0, 1, 2, 3, ...
D. f. :
⎛ β ⎞x + 1 F(x) = 1 - ⎜ ⎟ ⎝ 1+ β ⎠
P. d. f. :
f(x) =
βx (1 + β) x + 1
f(0) = 1/ (1+β).
f(1) = β / (1 + β)2 .
f(2) = β2 / (1 + β)3 .
f(3) = β3 / (1 + β)4 .
Mean = β Variance = β(1+β)
Coefficient of Variation =
Kurtosis = 3 +
Variance / Mean = 1 + β > 1. 1 + β . β
Skewness =
1 + 2β β(1 + β)
.
6β 2 + 6 β + 1 . β (1+ β )
Mode = 0. Probability Generating Function: P(z) =
1 , z < 1 + 1/β. 1 - β(z - 1)
f(x+1)/f(x) = a + b/(x+1), a = β/(1+β), b = 0, f(0) = 1/(1+β). Moment Generating Function: M(s) = 1/{1- β(es-1)}, s < ln(1+β) - ln(β).
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 74
Using the notation in Loss Models, the Geometric Distribution is: ⎛ β ⎞x ⎜ ⎟ βx ⎝ 1+ β ⎠ f(x) = = , x = 0, 1, 2, 3, ... 1+ β (1 + β) x + 1
For example, for β = 4, f(x) = 4x/5x+1, x = 0, 1, 2, 3, ... A Geometric Distribution for β = 4: Prob. 0.2
0.15
0.1
0.05
0
5
10
15
20
x
The densities decline geometrically by a factor of β/(1+β); f(x+1)/f(x) = β/(1+β). This is similar to the Exponential Distribution, f(x+1)/f(x) = e−1/θ. The Geometric Distribution is the discrete analog of the continuous Exponential Distribution.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 75
For q = 0.3 or β = 0.7/0.3 = 2.333, the Geometric distribution is: Number of Claims 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Sum
f(x) 0.30000 0.21000 0.14700 0.10290 0.07203 0.05042 0.03529 0.02471 0.01729 0.01211 0.00847 0.00593 0.00415 0.00291 0.00203 0.00142 0.00100 0.00070 0.00049 0.00034 0.00024 0.00017 0.00012 0.00008 0.00006 0.00004
F(x) 0.30000 0.51000 0.65700 0.75990 0.83193 0.88235 0.91765 0.94235 0.95965 0.97175 0.98023 0.98616 0.99031 0.99322 0.99525 0.99668 0.99767 0.99837 0.99886 0.99920 0.99944 0.99961 0.99973 0.99981 0.99987 0.99991
Number of Claims times f(x) 0.00000 0.21000 0.29400 0.30870 0.28812 0.25211 0.21177 0.17294 0.13836 0.10895 0.08474 0.06525 0.04983 0.03779 0.02849 0.02136 0.01595 0.01186 0.00879 0.00650 0.00479 0.00352 0.00258 0.00189 0.00138 0.00101
Square of Number of Claims times f(x) 0.00000 0.21000 0.58800 0.92610 1.15248 1.26052 1.27061 1.21061 1.10684 0.98059 0.84743 0.71777 0.59794 0.49123 0.39880 0.32046 0.25523 0.20169 0.15828 0.12345 0.09575 0.07390 0.05677 0.04343 0.03311 0.02515
2.33
13.15
As computed above, the mean is about 2.33. The second moment is about 13.15, so that the variance is about 13.15 - 2.332 = 7.72. Since the Geometric has a significant tail, the terms involving the number of claims greater than 25 would have to be taken into account in order to compute a more accurate value of the variance or higher moments. Rather than taking additional terms it is better to have a general formula for the moments.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 76
The mean can be computed as follows: β ⎛ β ⎞ j+1 1+ β E[X] = ∑ Prob[X > j] = ∑ ⎜ Σ (β/(1+β))j+1 = = β. ⎟ β j=0 j = 0 ⎝ 1+ β ⎠ 1 − 1+ β ∞
∞
Thus the mean for the Geometric distribution is β. For the example, β = 2.333 = mean. The variance of the Geometric is β(1+β), which for β = 2.333 is 7.78.23 Survival Function: Note that there is a small but positive chance of any very large number of claims. Specifically for the Geometric distribution the chance of x > j is: ∞ ∞ ⎛ βi β ⎞i 1 / (1+ β)} j + 1 = {β/(1+β)}j+1. ∑ (1+ β)i + 1 = (1+ β) ∑ ⎜ (1+ β)⎟ Σ (β/(1+β))i = (1+1β) {β 1 β / (1+β) ⎝ ⎠ i=j+1 i=j+1 1 - F(x) = S(x) = {β/(1+β)}x + 1. For example, the chance of more than 19 claims is .720 = .00080, so that F(19) = 1 - 0.00080 = 0.99920, which matches the result above. Thus for a Geometric Distribution, for n > 0, the chance of at least n claims is (β/(1+β))n . The survival function decreases geometrically. The chance of 0 claims from a Geometric is: 1/(1+β) = 1 - β/(1+β) = 1 - geometric factor of decline of the survival function. Exercise: There is a 0.25 chance of 0 claims, 0.75 chance of at least one claim, 0.752 chance of at least 2 claims, 0.753 chance of at least 3 claims, etc. What distribution is this? [Solution: This is a Geometric Distribution with 1/(1+β) = 0.25, β/(1+β) = 0.75, or β = 3.] For the Geometric, F(x) = 1 - {β/(1+β)}x+1. Thus the Geometric distribution is the discrete analog of the continuous Exponential Distribution which has F(x) = 1 - e-x/θ = 1 - (exp[-1/θ])x. In each case the density function decreases by a constant multiple as x increases. For the Geometric Distribution: f(x) = {β/(1+β)}x / (1+β) , while for the Exponential Distribution: f(x) = e-x/θ /θ = (exp[-1/θ])x / θ . 23
The variance is shown in Appendix B attached to the exam. One way to get the variance as well as higher moments is via the probability generating function and factorial moments, as will be discussed subsequently.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 77
Memoryless Property: The geometric shares with the exponential distribution, the “memoryless property.”24 “Given that there are at least m claims, the probability distribution of the number of claims in excess of m does not depend on m.” In other words, if one were to truncate and shift a Geometric Distribution, then one obtains the same Geometric Distribution. Exercise: Let the number of claims be given by an Geometric Distribution with β = 1.7. Eliminate from the data all instances where there are 3 or fewer claims and subtract 4 from each of the remaining data points. (Truncate and shift at 4.) What is the resulting distribution? [Solution: Due to the memoryless property, the result is a Geometric Distribution with β = 1.7.] Generally, let f(x) be the original Geometric Distribution. Let g(x) be the truncated and shifted distribution. Take as an example, a truncation point of 4 as in the exercise. Then g(x) =
f(x + 4) βx + 4 / (1+β)x + 5 = f(x+4) / S(3) = = βx/(1+β)x+1, β 4 / (1+ β)4 1 - {f(0) + f(1)+ f(2)+ f(3)}
which is again a Geometric Distribution with the same parameter β.
Constant Force of Mortality: Another application where the Geometric Distribution arises is constant force of mortality, when one only looks at regular time intervals rather than at time continuously.25 Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year. If Jim is currently alive, what is the distribution of his curtate future lifetime? [Solution: There is a 10% chance he dies during the first year, and has a curtate future lifetime of 0. If he survives the first year, there is a 10% chance he dies during the second year. Thus there is a (0.9)(0.1) = 0.09 chance he dies during the second year, and has a curtate future lifetime of 1. If he survives the second year, which has probability 0.92 , there is a 10% chance he dies during the third year. Prob[curtate future lifetime = 2] = (0.92 )(0.1). Similarly, Prob[curtate future lifetime = n] = (0.9n )(0.1). This is a Geometric Distribution with β/(1+β) = 0.9 or β = 0.9/0.1 = 9.] 24
See Loss Models, pages 105-106. It is due to this memoryless property of the Exponential and Geometric distributions, that they have constant mean residual lives, as discussed subsequently. 25 When one has a constant force of mortality and looks at time continuously, one gets the Exponential Distribution, the continuous analog of the Geometric Distribution.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 78
In general, if there is a constant probability of death each year q, then the curtate future lifetime,26 K(x), follows a Geometric Distribution, with β = (1-q)/q = probability of continuing sequence / probability of ending sequence. Therefore, for a constant probability of death each year, q, the curtate expectation of life,27 ex, is β = (1-q)/q, the mean of this Geometric Distribution. The variance of the curtate future lifetime is: β(1+β) = {(1-q)/q}(1/q) = (1-q)/q2 . Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year. What is Jimʼs curtate expectation of life and variance of his curtate future lifetime? [Solution: Jimʼs curtate future lifetime is Geometric, with mean = β = (1 - 0.1)/0.1 = 9, and variance β(1+β) = (9)(10) = 90.] Exercise: Jim has a constant force of mortality, µ = 0.10536. What is the distribution of Jimʼs future lifetime. What is Jimʼs complete expectation of life? What is the variance of Jimʼs future lifetime? [Solution: It is Exponential, with mean θ = 1/µ = 9.49, and variance θ2 = 90.1. Comment: Jim has a 1 - e-0.10536 = 10% chance of dying each year in which he starts off alive. However, here we look at time continuously.] With a constant force of mortality: observe continuously ⇔ Exponential Distribution observe at discrete intervals ⇔ Geometric Distribution. Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year. Jimʼs estate will be paid $1 at the end of the year of his death. At a 5% annual rate of interest, what is the present value of this benefit?
26
The curtate future lifetime is the number of whole years completed prior to death. See page 54 of Actuarial Mathematics. 27 The curtate expectation of life is the expected value of the curtate future lifetime. See page 69 of Actuarial Mathematics.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 79
[Solution: If Jimʼs curtate future lifetime is n, there is a payment of 1 at time n + 1. ∞
∑ f(n) vn + 1 =
Present value =
0
∞
∑ (9n / 10n + 1) 0.9524n + 1 = (0.09524)/(1 - 0.8572) = 0.667.] 0
In general, if there is a constant probability of death each year q, the present value of $1 paid at the ∞
end of the year of death is:
βn
∑ (1+ β)n + 1 vn + 1 = 0
1 v 1 = = (1+β) 1 - v β / (1+ β) (1+ β) / v - β
1 1 = = q/(q + i).28 (1+ β)(1+ i) - β (1+ i) / q - (1- q) / q Exercise: Every year in which Jim starts off alive, he has a 10% chance of dying during that year. At a 5% annual rate of interest, what is the present value of the benefits from an annuity immediate with annual payment of 1? [Solution: If Jimʼs curtate future lifetime is n, there are n payments of 1 each at times: 1, 2, 3,.., n. The present value of these payments is: v + v2 + ... + vn = (1 - vn ) / i. ∞
∑
Present value =
f(n)(1 - vn ) / i = (1/i) {Σf(n) -
Σ (9n/10n+1) vn} =
n=0
20{1 - (0.1)/(1- 0.9/1.05)} = 6.] In general, if there is a constant probability of death each year q, the present value of the benefits from an annuity immediate with annual payment of 1: ∞
∑
f(n)(1 -
n=0
(1/i){1 -
vn )
∞
/ i= {
∑ n=0
∞
f(n) -
βn n n + 1 (v ) } /i = (1+β) n=0
∑
1/ (1+ β) } = (1/i){1 - 1/(1 + β - vβ)} = (1/i){(β - vβ)/(1 + β - vβ)} = 1 - vβ / (1+ β)
= (1/(1+i)) {β/(1 + β - vβ)} = (1/(1+i)) {1/(1/β + 1 - v)} = (1/(1+i)) {1/(q/(1-q) + 1 - v)} = (1/(1+i)) {(1-q)/(q + (1 - v)(1-q))} = (1-q)/(q(1+i) + i(1-q))} = (1-q)/(q+i).29 In the previous exercise, the present value of benefits is: (1 - 0.1)/(0.1 + 0.05) = 0.9/0.15 = 6. For i = 0, (1-q)/(q+i) becomes (1-q)/q, the mean of the Geometric Distribution of curtate future lifetimes. For q = 0, (1-q)/(q+i) becomes 1/i, the present value of a perpetuity, with the first payment one year from now. With a constant force of mortality µ, the present value of $1 paid at the time of death is: µ/(µ + δ). See page 99 of Actuarial Mathematics. 29 With a constant force of mortality µ, the present value of a life annuity paid continuously is: 1/(µ + δ). See page 136 of Actuarial Mathematics. 28
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 80
Series of Bernoulli Trials: For a series of Bernoulli trials with chance of success 0.3, the probability that there are no success in the first four trials is: (1 - 0.3)4 = 0.24. Exercise: What is the probability that there is no success in the first four trials and the fifth trial is a success? [Solution: (1 - 0.3)4 (0.3) = 0.072 = the probability of the first success occurring on the fifth trial.] In general, the chance of the first success after x failures is: (1 - 0.3)x(0.3). More generally, take a series of Bernoulli trials with chance of success q. The probability of the first success on trial x+1 is: (1-q)x q. f(x) = (1-q)x q,
x = 0, 1, 2, 3,...
This is the Geometric distribution. It is a special case of the Negative Binomial Distribution.30 Loss Models uses the notation β, where q = 1/(1+β). β = (1-q) / q = probability of a failure / probability of a success. For a series of independent identical Bernoulli trials, the chance of the first success following x failures is given by a Geometric Distribution with mean: β = chance of a failure / chance of a success. The number of trials = 1 + number of failures = 1 + Geometric. The Geometric Distribution shows up in many applications, including Markov Chains and Ruin Theory. In many contexts:
β = probability of continuing sequence / probability of ending sequence = probability of remaining in the loop / probability of leaving the loop.
30
The Geometric distribution with parameter β is the Negative Binomial Distribution with parameters β and r=1.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Problems: The following five questions all deal with a Geometric distribution with β = 0.6. 5.1 (1 point) What is the mean? (A) 0.4
(B) 0.5
(C) 0.6
(D) 2/3
(E) 1.5
5.2 (1 point) What is the variance? A. less than 1.0 B. at least 1.0 but less than 1.1 C. at least 1.1 but less than 1.2 D. at least 1.2 but less than 1.3 E. at least 1.3 5.3 (2 points) What is the chance of having 3 claims? A. less than 3% B. at least 3% but less than 4% C. at least 4% but less than 5% D. at least 5% but less than 6% E. at least 6% 5.4 (2 points) What is the mode? A. 0 B. 1 C. 2
D. 3
E. None of A, B, C, or D.
5.5 (2 points) What is the chance of having 3 claims or more? A. less than 3% B. at least 3% but less than 4% C. at least 4% but less than 5% D. at least 5% but less than 6% E. at least 6%
Page 81
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 82
5.6 (1 point) The variable N is generated by the following algorithm: (1) N = 0. (2) 25% chance of exiting. (3) N = N + 1. (4) Return to step #2. What is the variance of N? A. less than 10 B. at least 10 but less than 15 C. at least 15 but less than 20 D. at least 20 but less than 25 E. at least 25 5.7 (2 points) Use the following information:
• Assume a Rating Bureau has been making Workersʼ Compensation classification rates for a very, very long time. • Assume every year the rate for the Carpenters class is based on a credibility weighting of the indicated rate based on the latest year of data and the current rate. • Each year, the indicated rate for the Carpenters class is given 20% credibility.
• Each year, the rate for year Y, was based on the data from year Y-3 and the rate in the year Y-1. Specifically, the rate in the year 2001 is based on the data from 1998 and the rate in the year 2000. What portion of the rate in the year 2001 is based on the data from the year 1990? A. less than 1% B. at least 1% but less than 2% C. at least 2% but less than 3% D. at least 3% but less than 4% E. at least 4% 5.8 (3 points) An insurance company has stopped writing new general liability insurance policies. However, the insurer is still paying claims on previously written policies. Assume for simplicity that payments are made at the end of each quarter of a year. It is estimated that at the end of each quarter of a year the insurer pays 8% of the total amount remaining to be paid. The next payment will be made today. Let X be the total amount the insurer has remaining to pay. Let Y be the present value of the total amount the insurer has remaining to pay. If the annual rate of interest is 5%, what is the Y/X? E. 0.88 A. 0.80 B. 0.82 C. 0.84 D. 0.86
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 83
Use the following information for the next 4 questions: There is a constant force of mortality of 3%. There is an annual interest rate of 4%. 5.9 (1 point) What is the curtate expectation of life? (A) 32.0 (B) 32.2 (C) 32.4 (D) 32.6
(E) 32.8
5.10 (1 point) What is variance of the curtate future lifetime? (A) 900 (B) 1000 (C) 1100 (D) 1200 (E) 1300 5.11 (2 points) What is the actuarial present value of a life insurance which pays 100,000 at the end of the year of death? (A) 41,500 (B) 42,000 (C) 42,500 (D) 43,000 (E) 43,500 5.12 (2 points) What is the actuarial present value of an annuity immediate which pays 10,000 per year? (A) 125,000 (B) 130,000 (C) 135,000 (D) 140,000 (E) 145,000 5.13 (1 point) After each time Mark Orfe eats at a restaurant, there is 95% chance he will eat there again at some time in the future. Mark has eaten today at the Phoenicia Restaurant. What is the probability that Mark will eat at the Phoenicia Restaurant precisely 7 times in the future? A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0% 5.14 (3 points) Use the following information: • The number of days of work missed by a work related injury to a workersʼ arm is Geometrically distributed with β = 4.
• If a worker is disabled for 5 days or less, nothing is paid for his lost wages under workers compensation insurance. • If he is disabled for more than 5 days due to a work related injury, workers compensation insurance pays him his wages for all of the days he was out of work. What is the average number of days of wages reimbursed under workers compensation insurance for a work related injury to a workersʼ arm? (A) 2.2 (B) 2.4 (C) 2.6 (D) 2.8 (E) 3.0
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 84
Use the following information for the next two questions: The variable X is generated by the following algorithm: (1) X = 0. (2) Roll a fair die with six sides and call the result Y. (3) X = X + Y. (4) If Y = 6 return to step #2. (5) Exit. 5.15 (2 points) What is the mean of X? A. less than 4.0 B. at least 4.0 but less than 4.5 C. at least 4.5 but less than 5.0 D. at least 5.0 but less than 5.5 E. at least 5.5 5.16 (2 points) What is the variance of X? A. less than 8 B. at least 8 but less than 9 C. at least 9 but less than 10 D. at least 10 but less than 11 E. at least 11 5.17 (1 point) N follows a Geometric Distribution with β = 0.2. What is Prob[N = 1 | N ≤ 1]? A. 8%
B. 10%
C. 12%
D. 14%
E. 16%
5.18 (1 point) N follows a Geometric Distribution with β = 0.4. What is Prob[N = 2 | N ≥ 2]? A. 62%
B. 65%
C. 68%
D. 71%
E. 74%
5.19 (2 points) N follows a Geometric Distribution with β = 1.5. What is E[1/(N+1)]? Hint: x + x2 /2 + x3 /3 + x4 /4 + ... = -ln(1-x), for 0 < x < 1. A. 0.5
B. 0.6
C. 0.7
D. 0.8
E. 0.9
5.20 (2 points) N follows a Geometric Distribution with β = 0.8. What is E[N | N > 1]? A. 2.6
B. 2.7
C. 2.8
D. 2.9
E. 3.0
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 85
5.21 (2 points) Use the following information:
• Larry, his brother Darryl, and his other brother Darryl are playing as a three man basketball team at the school yard.
• Larry, Darryl, and Darryl have a 20% chance of winning each game, independent of any other game.
• When a teamʼs turn to play comes, they play the previous winning team. • Each time a team wins a game it plays again. • Each time a team loses a game it sits down and waits for its next chance to play. • It is currently the turn of Larry, Darryl, and Darryl to play again after sitting for a while. Let X be the number of games Larry, Darryl, and Darryl play until they sit down again. What is the variance of X? A. 0.10 B. 0.16 C. 0.20 D. 0.24 E. 0.31 Use the following information for the next two questions: N follows a Geometric Distribution with β = 1.3. Define (N - j)+ = n-j if n ≥ j, and 0 otherwise. 5.22 (2 points) Determine E[(N - 1)+]. A. 0.73
B. 0.76
C. 0.79
D. 0.82
E. 0.85
D. 0.39
E. 0.42
5.23 (2 points) Determine E[(N-2)+]. A. 0.30
B. 0.33
C. 0.36
Use the following information for the next three questions: Ethan is an unemployed worker. Ethan has a 25% probability of finding a job each week. 5.24 (2 points) What is the probability that Ethan is still unemployed after looking for a job for 6 weeks? A. 12% B. 14% C. 16% D. 18% E. 20% 5.25 (1 point) If Ethan finds a job the first week he looks, count this as being unemployed 0 weeks. If Ethan finds a job the second week he looks, count this as being unemployed 1 week, etc. What is the mean number of weeks that Ethan remains unemployed? A. 2 B. 3 C. 4 D. 5 E. 6 5.26 (1 point) What is the variance of the number of weeks that Ethan remains unemployed? A. 12 B. 13 C. 14 D. 15 E. 16
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 86
∞
5.27 (3 points) For a discrete density pk, define the entropy as: -
∑ pk ln[pk]. k=0
Determine the entropy for a Geometric Distribution as per Loss Models. Hint: Why is the mean of the Geometric Distribution β? 5.28 (2, 5/85, Q.44) (1.5 points) Let X denote the number of independent rolls of a fair die required to obtain the first "3". What is P[X ≥ 6]? A. (1/6)5 (5/6)
B. (1/6)5
C. (5/6)5 (1/6)
D. (5/6)6
E. (5/6)5
5.29 (2, 5/88, Q.22) (1.5 points) Let X be a discrete random variable with probability function P[X = x] = 2/3x for x = 1, 2, 3, . . . A. 1/4
B. 2/7
What is the probability that X is even? C. 1/3 D. 2/3 E. 3/4
5.30 (2, 5/90, Q.5) (1.7 points) A fair die is tossed until a 2 is obtained. If X is the number of trials required to obtain the first 2, what is the smallest value of x for which P[X ≤ x] ≥ 1/2? A. 2 B. 3 C. 4 D. 5 E. 6 5.31 (2, 5/92, Q.35) (1.7 points) Ten percent of all new businesses fail within the first year. The records of new businesses are examined until a business that failed within the first year is found. Let X be the total number of businesses examined which did not fail within the first year, prior to finding a business that failed within the first year. What is the probability function for X? A. 0.1(0.9x) for x = 0, 1, 2, 3,...
B. 0.9x(0.1x) for x = 1, 2, 3,...
C. 0.1x(0.9x) for x = 0, 1, 2, 3,...
D. 0.9x(0.1x) for x = 1, 2,3,...
E. 0.1(x - 1)(0.9x) for x = 2, 3, 4,...
5.32 (Course 1 Sample Exam, Q. 7) (1.9 points) As part of the underwriting process for insurance, each prospective policyholder is tested for high blood pressure. Let X represent the number of tests completed when the first person with high blood pressure is found. The expected value of X is 12.5. Calculate the probability that the sixth person tested is the first one with high blood pressure. A. 0.000 B. 0.053 C. 0.080 D. 0.316 E. 0.394 5.33 (1, 5/00, Q.36) (1.9 points) In modeling the number of claims filed by an individual under an automobile policy during a three-year period, an actuary makes the simplifying assumption that for all integers n ≥ 0, pn+1 = pn /5, where pn represents the probability that the policyholder files n claims during the period. Under this assumption, what is the probability that a policyholder files more than one claim during the period? (A) 0.04 (B) 0.16 (C) 0.20 (D) 0.80 (E) 0.96
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 87
5.34 (1, 11/01, Q.33) (1.9 points) An insurance policy on an electrical device pays a benefit of 4000 if the device fails during the first year. The amount of the benefit decreases by 1000 each successive year until it reaches 0. If the device has not failed by the beginning of any given year, the probability of failure during that year is 0.4. What is the expected benefit under this policy? (A) 2234 (B) 2400 (C) 2500 (D) 2667 (E) 2694
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 88
Solutions to Problems: 5.1. C. mean = β = 0.6. 5.2. A. variance = β(1+ β) = (0.6)(1.6) = 0.96. 5.3. B. f(x) = βx / (1+β)x+1. f(3) = (.6)3 / (1.6)3+1 = 3.30%. 5.4. A. The mode is 0, since f(0) is larger than any other value. n f(n)
0 0.6250
1 0.2344
2 0.0879
3 0.0330
4 0.0124
5 0.0046
6 0.0017
7 0.0007
Comment: Just as with the Exponential Distribution, the Geometric Distribution always has a mode of zero. 5.5. D. 1 - {f(0) + f(1) + f(2)} = 1 - (.6250 + .2344 + .0879) = 5.27%. Alternately, S(x) = (β/(1+β))x+1. S(2) = (.6/1.6)3 = 5.27%. 5.6. B. This is a loop, in which each time through there is a 25% of exiting and a 75% chance of staying in the loop. Therefore, N is Geometric with β = probability of remaining in the loop / probability of leaving the loop = .75/.25 = 3. Variance = β(1 + β) = (3)(4) = 12. 5.7. D. The weight given to the data from year 1997 in the rate for year 2000 is 0.20. Therefore, the weight given to the data from year 1997 in the rate for year 2001 is: (1 - 0.2)(0.2). Similarly, the weight given to the data from year 1996 in the rate for year 2001 is: (1-0.2)2 (0.2). The weight given to the data from year 1990 in the rate for year 2001 is: (1-0.2)8 (0.2) = 3.4%. Comment: The weights are from a geometric distribution with β = 1/Z - 1 and β/(1+β) = 1- Z. The weights are: (1-Z)n Z for n = 0,1,2,... Older years of data get less weight. This is a simplification of a real world application, as discussed in “An Example of Credibility and Shifting Risk Parameters”, by Howard C. Mahler, PCAS 1990.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 89
5.8. E. Let Z be the amount remaining to be paid prior to quarter n. Then the payment in quarter n is .08Z. This leaves .92Z remaining to be paid prior to quarter n+1. Thus the payment in quarter n+1 is (.08)(.92)Z. The payment in quarter n+1 is .92 times the payment in quarter n. The payments each quarter decline by a factor of .92. Therefore, the proportion of the total paid in each quarter is a Geometric Distribution with β/(1+β) = .92. β = .92/(1-.92) = 11.5. The payment at the end of quarter n is: X f(n) = X βn /(1+β)n+1, n = 0, 1, 2, ... (The sum of these payment is X.) The present value of the payment at the end of quarter n is: X f(n)/(1.05)n/4 = X(.9879n )βn /(1+β)n+1, n = 0, 1, 2, ... ∞
Y, the total present value is:
∞
∞
Σ X(.9879n )βn /(1+β)n+1 = (X/(1+β))Σ (.9879n )(β/(1+β))n = (X/(1+11.5)) Σ (.9879n )(.92n ) n=0
n=0
n=0
= (X/12.5)/(1-.9089) = .878X. Y/X = 0.878. Comment: In “Measuring the Interest Rate Sensitivity of Loss Reserves,” by Richard Gorvett and Stephen DʼArcy, PCAS 2000 , a geometric payment pattern is used in order to estimate Macaulay durations, modified durations, and effective durations. 5.9. E. If a person is alive at the beginning of the year, the chance they die during the next year is: 1 - e−µ = 1 - e-.03 = .02955. Therefore, the distribution of curtate future lifetimes is Geometric with mean β = (1-q)/q = .97045/.02955 = 32.84 years. Alternately, the complete expectation of life is: 1/µ = 1/.03 = 33.33. The curtate future lifetime is on average about 1/2 less; 33.33 - 1/2 = 32.83. 5.10. C. The variance of the Geometric Distribution is: β(1+β) = (32.84)(33.84) = 1111. Alternately, the future lifetime is Exponentially distributed with mean θ = 1/µ = 1/.03 = 33.33, and variance θ2 = 33.332 = 1111. Since approximately they differ by a constant, 1/2, the variance of the curtate future lifetimes is approximately that of the future lifetimes, 1111. 5.11. C. With constant probability of death, q, the present value of the insurance is: q/(q + i). (100,000)q/(q + i) = (100000)(.02955)/(.02955 + .04) = 42,487. Alternately, the present value of an insurance that pays at the moment of death is: µ/(µ+δ) = .03/(.03 + ln(1.04)) = .03/(.03 + .03922) = .43340. (100000)(.43340) = 43,340. The insurance paid at the end of the year of death is paid on average about 1/2 year later; 43340/(1.04.5) = 42,498.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 90
5.12. D. With constant probability of death, q, the present value of an annuity immediate is: (1-q)/(q +i). (10000)(1-q)/(q +i) = (10000)(1 - .02955)/(.02955 + .04) = 139,533. Alternately, the present value of an annuity that pays continuously is: 1/(µ+δ) = 1/(.03 + ln(1.04)) = 1/(.03 + .03922) = 14.4467. (10000)(14.4467) = 144,467. Discounting for another half year of interest and mortality, the present value of the annuity immediate is approximately: 144,467/((1.04.5)(1.03.5)) = 139,583. 5.13. D. There is a 95% chance Mark will return. If he returns, there is another 95% chance he will return again, etc. The chance of returning 7 times and then not returning an 8th time is: (.957 )(.05) = 3.5%. Comment: The number of future visits is a Geometric Distribution with β = probability of continuing sequence / probability of ending sequence = .95/.05 = 19. f(7) = β7 /(1+β)8 = 197 /208 = 3.5%. 5.14. C. If he is disabled for n days, then he is paid 0 if n ≤ 5, and n days of wages if n ≥ 6. Therefore, the mean number of days of wages paid is: ∞
∞
5
n=6
n=0
n=0
∑ n f(n) = ∑ n f(n) - ∑ n f(n) = E[N] - {0f(0) + 1f(1) + 2f(2) + 3f(3) + 4f(4) + 5f(5)} =
4 - {(1)(.2)(.8) + (2)(.2)(.82 ) + (3)(.2)(.83 ) + (4)(.2)(.84 ) + (5)(.2)(.85 )} = 2.62. Alternately, due to the memoryless property of the Geometric Distribution (analogous to its continuous analog the Exponential), truncated and shifted from below at 6, we get the same Geometric Distribution. Thus if only those days beyond 6 were paid for, the average nonzero payment is 4. However, in each case where we have at least 6 days of disability we pay the full length of disability which is 6 days longer, so the average nonzero payment is: 4 + 6 = 10. The probability of a nonzero payment is: 1 - {f(0) + f(1) + f(2) + f(3) + f(4) + f(5)} = 1 - {.2 + (.2)(.8) + (.2)(.82 ) + (.2)(.83 ) + (.2)(.84 ) + (.2)(.85 )} = 0.262. Thus the average payment (including zero payments) is: (0.262)(10 days) = 2.62 days. Comment: Just an exam type question, not intended as a model of the real world.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 91
5.15. B. & 5.16. D. The number of additional dies rolled beyond the first is Geometric with β = probability of remaining in the loop / probability of leaving the loop = (1/6)/(5/6) = 1/5. Let N be the number of dies rolled, then N - 1 is Geometric with β = 1/5. X = 6(N - 1) + the result of the last 6-sided die rolled. The result of the last six sided die to be rolled is equally likely to be a 1, 2, 3, 4 or 5 (it canʼt be a six or we would have rolled an additional die.) E[X] = (6)(mean of a Geometric with β = 1/5) + (average of 1,2,3,4,5) = (6)(1/5) + 3 = 4.2. Variance of the distribution equally likely to be 1, 2, 3, 4, or 5 is: (22 + 12 + 02 + 12 + 22 )/5 = 2. Var[X] = 62 (variance of a Geometric with β = 1/5) + 2 = (36)(1/5)(6/5) + 2 = 10.64. 5.17. D. Prob[N = 1 | N ≤ 1] = Prob[N = 1]/Prob[N ≤ 1] = β/(1+β)2 /{1/(1+β)) + β/(1+β))2 } = β/(1 + 2β) = .2/1.4 = 0.143. 5.18. D. Prob[N = 2 | N ≥ 2] = Prob[N = 2]/Prob[N ≥ 2] = β2/(1+β)3 /{β2/(1+β)2 } = 1/(1+β). Alternately, from the memoryless property, Prob[N = 2 | N ≥ 2] = Prob[N = 0] = 1/(1+β) = .714. ∞
∞
∞
5.19. B. E[1/(N+1)] = Σ f(n)/(n+1) = Σ f(m-1)/m = (1/β)Σ (β/(1+β))m/m = (1/β)(-ln(1 - β/(1+β)) n=0
m= 1
m= 1
= ln(1+β)/β = ln(2.5)/1.5 = 0.611. 5.20. C. E[N | N > 1]Prob[N > 1] + (1) Prob[N = 1] + (0) Prob[N = 0] = E[N] = β. E[N | N > 1] = {β - β/(1+β)2 } / {β2/(1+β)2 } = 2 + β = 2.8. 5.21. E. X is 1 + a Geometric Distribution with β = (chance of remaining in the loop)/(chance of leaving the loop) = .2/.8 = 1/4. Variance of X is: β(1+β) = (1/4)(5/4) = 5/16 = 0.3125. Comment: Prob[X = 1] = 1 - .2 = .8. Prob[X = 2] = (.2)(.8). Prob[X = 3] = (.22 )(.8). Prob[X = 4] = (.23 )(.8). Prob[X = x] = (.2x-1)(.8). While this is a series of Bernoulli trials, it ends when the team has its first failure. X is the number of trials through the first failure.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
5.22. A. E[(N - 1)+] = E[N] - E[N
∧
HCM 10/4/12,
Page 92
1] = β - Prob[N ≥ 1] = β − β/(1+β) = β2/(1+β) = 0.7348.
Alternately, E[(N-1)+] = E[(1-N)+] + E[N] - 1 = Prob[N = 0] + β - 1 = β + 1/(1+β) - 1 = 1.3 + 1/2.3 - 1 = 0.7348. Alternately, the memoryless property of the Geometric ⇒ E[(N-1)+]/Prob[N≥1] = E[N] = β. ⇒ E[(N-1)+] = β Prob[N≥1] = β β/(1+β) = β2/(1+β) = 0.7348. 5.23. E. E[(N - 2)+] = E[N] - E[N ∧ 2] = β - (Prob[N = 1] + 2 Prob[N ≥ 2]) = β - β/(1+β)2 - 2β2/(1+β)2 = {β(1+β)2 - β - 2β2}/(1+β)2 = β3/(1+β)2 = 1.33 /2.32 = 0.415. Alternately, E[(N-2)+] = E[(2-N)+] + E[N] - 2 = 2Prob[N = 0] + Prob[N = 1] + β - 2 = β + 2/(1+β) + β/(1+β)2 - 2 = 1.3 + 2/2.3 + (1.3)/2.32 - 2 = 0.415. Alternately, the memoryless property of the Geometric ⇒ E[(N-2)+]/Prob[N≥2] = E[N] = β. ⇒ E[(N-2)+] = β Prob[N≥2] = β β2/(1+β)2 = β3/(1+β)2 = 0.415. Comment: For integral j, for the Geometric, E[(N - j)+] = βj+1/(1+β)j. 5.24. D. Probability of finding a job within six weeks is: (.25){1 + .75 + .752 + .753 + .754 + .755 } = .822. 1 - .822 = 17.8%. 5.25. B. The number of weeks he remains unemployed is Geometric with β = (chance of failure) / ( chance of success) = 0.75/0.25 = 3. Mean = β = 3. 5.26. A. Variance of this Geometric is: β (1 + β) = (3)(4) = 12.
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
5.27. The mean of a Geometric Distribution is β. ⇒
HCM 10/4/12,
Page 93
∞
∑ k pk = β. k=0
pk =
-
βk (1+ β) k + 1
. ⇒ ln[pk] = k ln[β] - (k+1) ln[1 + β] = {ln[β] - ln[1 + β]} k - ln[1 + β] .
∞
∞
∞
k=0
k=0
k=0
∑ pk ln[pk] = {ln[1 + β] - ln[β]} ∑ k pk + ln[1 + β] ∑ pk = {ln[1 + β] - ln[β]} β + ln[1 + β] (1) =
(1 + β) ln[1 + β] - β ln[β]. Comment: The Shannon entropy from information theory, except there the log is to the base 2. 5.28. E. Prob[X ≥ 6] = Prob[first 5 rolls each ≠ 3] = (5/6)5 . Alternately, the number failures before the first success, X - 1, is Geometric with β = chance of failure / chance of success = (5/6)/(1/6) = 5. Prob[X ≥ 6] = Prob[# failures ≥ 5] = 1 - F(4) = {β/(1+β)}4+1 = (5/6)5 . 5.29. A. Prob[X is even] = 2/32 + 2/34 + 2/36 + ... = (2/9)/(1 - 1/9) = 1/4. Comment: X - 1 follows a Geometric Distribution with β = 2. 5.30. C. X - 1 is Geometric with β = chance of failure / chance of success = (5/6)/(1/6) = 5. Prob[X - 1 ≥ x - 1] = {β/(1+β)}x = (5/6)x. Set this equal to 1/2: 1/2 = (5/6)x. x = ln(1/2)/ln(5/6) = 3.8. The next greatest integer is 4. P[X ≤ 4] = 1 - (5/6)4 = .518 ≥ 1/2. Alternately, Prob[X = x] = Prob[X - 1 tosses ≠ 2]Prob[X = 2] = (5/6)x-1/6. X
Probability
Cumulative
1 2 3 4 5 6
0.1667 0.1389 0.1157 0.0965 0.0804 0.0670
0.1667 0.3056 0.4213 0.5177 0.5981 0.6651
5.31. A. X has a Geometric with β = chance of continuing / chance of ending = (.9)/(.1) = 9. f(x) = 9x/10x+1 = (0.1)(0.9x ), for x = 0, 1, 2, 3,...
2013-4-1,
Frequency Distributions, §5 Geometric Dist.
HCM 10/4/12,
Page 94
5.32. B. This is a series of Bernoulli trials, and X - 1 is the number of failures before the first success. Thus X - 1 is Geometric. β = E[X - 1] = 12.5 - 1 = 11.5. Prob[X = 6] = Prob[X - 1 = 5] = f(5) = β5/(1+ β)6 = 11.55 /12.56 = .0527. Alternately, Prob[person has high blood pressure] = 1/E[X] = 1/12.5 = 8%. Prob[sixth person is the first one with high blood pressure] = Prob[first five donʼt have high blood pressure] Prob[sixth has high blood pressure] = (1 - .08)5 (.08) = 0.0527. 5.33. A. The densities are declining geometrically. Therefore, this is a Geometric Distribution, with β/(1+ β) = 1/5. ⇒ β = 1/4. Prob[more than one claim] = 1 - f(0) - f(1) = 1 - 1/(1+ β) - β/(1+ β)2 = 1 - 4/5 - 4/25 = 0.04. 5.34. E. Expected Benefit = (4000)(0.4) + (3000)(0.6)(0.4) + (2000)(0.62 )(0.4) + (1000)(0.63 )(0.4) = 2694. Alternately, the benefit is 1000(4 - N)+, where N is the number of years before the device fails. N is Geometric, with 1/(1 + β) = .4. ⇒ β = 1.5. E[N ∧ 4] = 0f(0) + 1f(1) + 2f(2) + 3f(3) + 4{1 - f(0) - f(1) - f(2) - f(3)} = 4 - 4f(0) - 3f(1) - 2f(2) - f(3). Expected Benefit = 1000E[(4 - N)+] = 1000(4 - E[N ∧ 4]) = 1000{4f(0) + 3f(1) + 2f(2) + f(3)} = 1000{4(.4) + 3(.4)(.6) + 2(.4)(.62 ) + (.4)(.63 )} = 2694.
2013-4-1,
Frequency Distributions, §6 Negative Binomial Dist.
HCM 10/4/12,
Page 95
Section 6, Negative Binomial Distribution The third and final important frequency distribution is the Negative Binomial, which has the Geometric as a special case.
Negative Binomial Distribution Support: x = 0, 1, 2, 3, ...
P. d. f. :
f(2) =
r = 1 is a Geometric Distribution
F(x) = β(r, x+1 ; 1/(1+β)) =1- β( x+1, r ; β/(1+β) )
D. f. :
f(0) =
Parameters: β > 0, r ≥ 0.
f(x) =
⎛ x+ r − 1⎞ βx βx r(r + 1)...(r + x - 1) = . ⎜ ⎟ (1+ β )x + r ⎝ x ⎠ (1+ β) x + r x!
1 (1 + β)r
f(1) = 2
r (r + 1)
β
2
(1 + β)r + 2
Mean = rβ
f(3) =
Variance = rβ(1+β)
Coefficient of Variation =
Kurtosis = 3 +
Incomplete Beta Function
1+β rβ
rβ (1 + β)r + 1 3
r (r + 1) (r + 2)
β
6
(1 + β)r + 3
Variance / Mean = 1 + β > 1.
Skewness =
1 + 2β r β (1+ β)
= CV(1+2β)/(1+β).
6 β2 + 6 β + 1 . r β (1 + β )
Mode = largest integer in (r-1)β
(if (r-1)β is an integer, then both (r-1)β and (r-1)β - 1 are modes.)
Probability Generating Function: P(z) = {1- β(z-1)}-r, z < 1 + 1/β. Moment Generating Function: M(s) = {1- β(es-1)}-r, s < ln(1+β) - ln(β). f(x+1)/f(x) = a + b/(x+1), a = β/(1+β), b = (r-1)β/(1+β), f(0) = (1+β)-r.
2013-4-1,
Frequency Distributions, §6 Negative Binomial Dist.
HCM 10/4/12,
A Negative Binomial Distribution with r = 2 and β = 4: Prob. 0.08
0.06
0.04
0.02
0
5
10
15
20
25
x
30
A Negative Binomial Distribution with r = 0.5 and β = 10: Prob. 0.3 0.25 0.2 0.15 0.1 0.05
0
5
10
15
20
25
30
x
Page 96
2013-4-1,
Frequency Distributions, §6 Negative Binomial Dist.
HCM 10/4/12,
Page 97
Here is a Negative Binomial Distribution with parameters β = 2/3 and r = 8:31 Number of Claims 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
f(x) 0.0167962 0.0537477 0.0967459 0.1289945 0.1418940 0.1362182 0.1180558 0.0944446 0.0708335 0.0503705 0.0342519 0.0224194 0.0141990 0.0087378 0.0052427 0.0030757 0.0017685 0.0009987 0.0005548 0.0003037 0.0001640 0.0000875 0.0000461 0.0000241 0.0000124 0.0000064 0.0000032 0.0000016 0.0000008 0.0000004 0.0000002
Sum
1.00000
F(x) 0.01680 0.07054 0.16729 0.29628 0.43818 0.57440 0.69245 0.78690 0.85773 0.90810 0.94235 0.96477 0.97897 0.98771 0.99295 0.99603 0.99780 0.99879 0.99935 0.99965 0.99982 0.99990 0.99995 0.99997 0.99999 0.99999 1.00000 1.00000 1.00000 1.00000 1.00000
Number of Claims times f(x) 0.00000 0.05375 0.19349 0.38698 0.56758 0.68109 0.70833 0.66111 0.56667 0.45333 0.34252 0.24661 0.17039 0.11359 0.07340 0.04614 0.02830 0.01698 0.00999 0.00577 0.00328 0.00184 0.00101 0.00055 0.00030 0.00016 0.00008 0.00004 0.00002 0.00001 0.00001
Square of Number of Claims times f(x) 0.00000 0.05375 0.38698 1.16095 2.27030 3.40546 4.25001 4.62779 4.53334 4.08001 3.42519 2.71275 2.04465 1.47669 1.02757 0.69204 0.45275 0.28863 0.17977 0.10964 0.06560 0.03857 0.02232 0.01273 0.00716 0.00398 0.00218 0.00119 0.00064 0.00034 0.00018
5.33333
37.33314
For example, f(5) = {( 2/3)5 / (1+ 2/3)8+5}(12!) / {(5!)(7!)} = (0.000171993)(479,001,600)/{(120)(5040)} = 0.136. The mean is: rβ = 8(2/3) = 5.333. The variance is: 8(2/3)(1+2/3) = 8.89. The variance can also be computed as: (mean)(1+β) = 5.333(5/3) = 8.89. The variance is indeed = E[X2 ] - E[X]2 = 37.333 - 5.33332 = 8.89. According to the formula given above, the mode should be the largest integer in (r-1)β = (8-1)(2/3) = 4.67, which contains the integer 4. In fact, f(4) = 14.2% is the largest value of the probability density function. Since F(5) = 0.57 ≥ 0.5 and F(4) = 0.44 < 0.5, 5 is the median. 31
The values for the Negative Binomial probability density function in the table were computed using: f(0) = (β/(1+β))r and f(x+1) / f(x) = β(x+r) / {(x+1)(1+β)}. For example, f(12) = f(11)β(11+r) / {12(1+β)} = (0.02242)(2/3)(19) / 20 = 0.01420.
2013-4-1,
Frequency Distributions, §6 Negative Binomial Dist.
HCM 10/4/12,
Page 98
Mean and Variance of the Negative Binomial Distribution: The mean of a Geometric distribution is β and its variance is β(1+β). Since the Negative Binomial is a sum of r Geometric Distributions, it follows that the mean of the Negative Binomial is rβ and the variance of the Negative Binomial is rβ(1+β). Since β > 0, 1+ β > 1, for the Negative Binomial Distribution the variance is greater than the mean. For the Negative Binomial, the ratio of the variance to the mean is 1+β, while variance/mean = 1 for the Poisson Distribution. Thus (β)(mean) is the “extra variance” for the Negative Binomial compared to the Poisson. Non-Integer Values of r: Note that even if r is not integer, the binomial coefficient in the front of the Negative Binomial Density ⎛ x+ r − 1⎞ (x +r -1)! (x +r -1) (x + r - 2) ... (r) can be calculated as: ⎜ = . ⎟= x! (r - 1)! x! ⎝ x ⎠ For example with r = 6.2 if one wanted to compute f(4), then the binomial coefficient in front is: ⎛4 + 6.2 − 1⎞ ⎛9.2 ⎞ 9.2! (9.2) (8.2) (7.2) (6.2) = = 140.32. ⎜ ⎟ = ⎜ ⎟= 4 4! ⎝ ⎠ ⎝ 4 ⎠ 5.2! 4! Note that the numerator has 4 factors; in general it will have x factors. These four factors are: 9.2! / (9.2-4)! = 9.2!/5.2!, or if you prefer: Γ(10.2) / Γ(6.2) = (9.2)(8.2)(7.2)(6.2). As shown in Loss Models, in general one can rewrite the density of the Negative Binomial as: βx r(r + 1)...(r + x - 1) f(x) = , where there are x factors in the product in the numerator. (1+ β )x + r x! Exercise: For a Negative Binomial with parameters r = 6.2 and β = 7/3, compute f(4). [Solution: f(4) = {(9.2)(8.2)(7.2)(6.2)/4!} (7/3)4 / (1+ 7/3)6.2+4 = 0.0193.]
Negative Binomial as a Mixture of Poissons:
As discussed subsequently, when Poissons are mixed via a Gamma Distribution, the mixed distribution is always a Negative Binomial Distribution, with r = α = shape parameter of the Gamma and β = θ = scale parameter of the Gamma. The mixture of Poissons via a Gamma distribution produces a Negative Binomial Distribution and increases the variance above the mean.
Series of Bernoulli Trials:
Return to the situation that resulted in the Geometric distribution, involving a series of independent Bernoulli trials each with chance of success 1/(1 + β), and chance of failure of β/(1 + β). What is the probability of two successes and four failures in the first six trials? It is given by the Binomial Distribution: {6!/(2! 4!)} {1/(1+β)}^2 {1 - 1/(1+β)}^4 = {6!/(2! 4!)} β^4/(1+β)^6.
The chance of having the third success on the seventh trial is given by 1/(1 + β) times the above probability: {6!/(2! 4!)} β^4/(1+β)^7.
Similarly, the chance of the third success on trial x + 3 is given by 1/(1 + β) times the probability of 3 - 1 = 2 successes and x failures on the first x + 3 - 1 = x + 2 trials: {(x+2)(x+1)/2!} β^x/(1+β)^(x+3).
More generally, the chance of the rth success on trial x + r is given by 1/(1 + β) times the probability of r - 1 successes and x failures on the first x + r - 1 trials:
f(x) = {1/(1+β)} {(x+r-1)!/[x! (r-1)!]} β^x/(1+β)^(x+r-1) = {(x+r-1)!/[x! (r-1)!]} β^x/(1+β)^(x+r), x = 0, 1, 2, 3...
This is the Negative Binomial Distribution. Thus we see that one source of the Negative Binomial is the chance of experiencing failures on a series of independent Bernoulli trials prior to getting a certain number of successes.32 Note that in the derivation, 1/(1 + β) is the chance of success on each Bernoulli trial. Thus, β = {β/(1+β)} / {1/(1+β)} = chance of a failure / chance of a success. For a series of independent identical Bernoulli trials, the chance of success number r following x failures is given by a Negative Binomial Distribution with parameters β = (chance of a failure)/(chance of a success), and r.
Exercise: One has a series of independent Bernoulli trials, each with chance of success 0.3. What is the distribution of the number of failures prior to the 5th success?
[Solution: A Negative Binomial Distribution, as per Loss Models, with parameters β = 0.7/0.3 = 7/3, and r = 5.]
While this is one derivation of the Negative Binomial distribution, note that the Negative Binomial Distribution is used to model claim counts in many situations that have no relation to this derivation.
32 Even though the Negative Binomial Distribution was derived here for integer values of r, as has been discussed, the Negative Binomial Distribution is well defined for r non-integer as well.
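The Bernoulli-trial interpretation can be checked by simulation. Here is a rough Python sketch (my own, not from the text) that counts failures before the 5th success when each trial succeeds with probability 0.3; the sample mean and variance should be close to rβ = 11.67 and rβ(1+β) = 38.89 for r = 5 and β = 7/3.

```python
# Simulate the number of failures before the 5th success; chance of success 0.3.
# This should match a Negative Binomial with r = 5 and beta = 0.7/0.3 = 7/3.
import random

def failures_before_rth_success(r, p, rng):
    failures, successes = 0, 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(1)
sample = [failures_before_rth_success(5, 0.3, rng) for _ in range(100000)]
mean = sum(sample) / len(sample)
var = sum((n - mean) ** 2 for n in sample) / len(sample)
print(mean, var)   # close to r*beta = 11.67 and r*beta*(1+beta) = 38.89
```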
Negative Binomial as a Sum of Geometric Distributions: The number of claims for a Negative Binomial Distribution was modeled as the number of failures prior to getting a total of r successes on a series of independent Bernoulli trials. Instead one can add up the number of failures associated with getting a single success r times independently of each other. As seen before, each of these is given by a Geometric distribution. Therefore, obtaining r successes is the sum of r separate independent variables each involving getting a single success. Number of Failures until the third success has a Negative Binomial Distribution: r = 3, β = (1 - q)/q.
Time 0 --(Geometric, β = (1 - q)/q)--> Success #1 --(Geometric, β = (1 - q)/q)--> Success #2 --(Geometric, β = (1 - q)/q)--> Success #3
Therefore, the Negative Binomial Distribution with parameters β and r, with r integer, can be thought of as the sum of r independent Geometric distributions with parameter β.
The Negative Binomial Distribution for r = 1 is a Geometric Distribution. Since the Geometric distribution is the discrete analog of the Exponential distribution, the Negative Binomial distribution is the discrete analog of the continuous Gamma Distribution33. The parameter r in the Negative Binomial is analogous to the parameter α in the Gamma Distribution.34
(1+β)/β in the Negative Binomial Distribution is analogous to e^(1/θ) in the Gamma Distribution.
33 Recall that the Gamma Distribution is a sum of α independent Exponential Distributions, just as the Negative Binomial is the sum of r independent Geometric Distributions.
34 Note that the mean and variance of the Negative Binomial and the Gamma are proportional respectively to r and α.
Adding Negative Binomial Distributions: Since the Negative Binomial is a sum of Geometric Distributions, if one sums independent Negative Binomials with the same β, then one gets another Negative Binomial, with the same β parameter and the sum of their r parameters.35 Exercise: X is a Negative Binomial with β = 1.4 and r = 0.8. Y is a Negative Binomial with β = 1.4 and r = 2.2. Z is a Negative Binomial with β = 1.4 and r = 1.7. X, Y, and Z are independent of each other. What form does X + Y + Z have? [Solution: X + Y + Z is a Negative Binomial with β = 1.4 and r = .8 + 2.2 + 1.7 = 4.7.] If X is Negative Binomial with parameters β and r1 , and Y is Negative Binomial with parameters β and r2 , X and Y independent, then X + Y is Negative Binomial with parameters β and r1 + r2 . Specifically, the sum of n independent identically distributed Negative Binomial variables, with the same parameters β and r, is a Negative Binomial with parameters β and nr. Exercise: X is a Negative Binomial with β = 1.4 and r = 0.8. What is the form of the sum of 25 independent random draws from X? [Solution: A random draw from a Negative Binomial with β = 1.4 and r = (25)(.8) = 20.] Thus if one had 25 exposures, each of which had an independent Negative Binomial frequency process with β = 1.4 and r = 0.8, then the portfolio of 25 exposures has a Negative Binomial frequency process with β = 1.4 and r = 20.
35 This holds whether or not r is integer. This is analogous to adding independent Gammas with the same θ parameter. One obtains a Gamma, with the same θ parameter, but with the new α parameter equal to the sum of the individual α parameters.
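One way to see the addition rule numerically is to convolve the densities. The sketch below (my own check, not part of the guide) convolves the densities of two of the Negative Binomials from the exercise above and compares the result with the density of a Negative Binomial with the summed r parameter.

```python
# Numerical check: convolving NB(beta, r1) with NB(beta, r2) densities
# gives NB(beta, r1 + r2); here beta = 1.4, r1 = 0.8, r2 = 2.2.
def nb_pmf(r, beta, nmax):
    f = [1.0 / (1.0 + beta) ** r]
    for x in range(nmax):
        f.append(f[-1] * beta * (x + r) / ((x + 1) * (1.0 + beta)))
    return f

beta, r1, r2, nmax = 1.4, 0.8, 2.2, 60
f1, f2, f12 = nb_pmf(r1, beta, nmax), nb_pmf(r2, beta, nmax), nb_pmf(r1 + r2, beta, nmax)
conv = [sum(f1[j] * f2[k - j] for j in range(k + 1)) for k in range(nmax + 1)]
print(max(abs(a - b) for a, b in zip(conv, f12)))   # essentially zero
```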
Effect of Exposures: Assume one has 100 exposures with independent, identically distributed frequency distributions. If each one is Negative Binomial with parameters β and r, then so is the sum, with parameters β and 100r. If we change the number of exposures to, for example, 150, then the sum is Negative Binomial with parameters β and 150r, or 1.5 times the r parameter in the first case. In general, as the exposures change, the r parameter changes in proportion.36 Exercise: The total number of claims from a portfolio of insureds has a Negative Binomial Distribution with β = 0.2 and r = 30. If next year the portfolio has 120% of the current exposures, what is its frequency distribution? [Solution: Negative Binomial with β = 0.2 and r = (1.2)(30) = 36.]
Thinning Negative Binomial Distributions: Thinning can also be applied to the Negative Binomial Distribution.37 The β parameter of the Negative Binomial Distribution is multiplied by the thinning factor. Exercise: Claim frequency follows a Negative Binomial Distribution with parameters β = 0.20 and r = 1.5. One quarter of all claims involve attorneys. If attorney involvement is independent between different claims, what is the probability of getting two claims involving attorneys in the next year? [Solution: Claims with attorney involvement are Negative Binomial with β = (.20)(25%) = 0.05 and r = 1.5. Thus f(2) = r(r+1)β^2 / {2! (1+β)^(r+2)} = (1.5)(2.5)(.05)^2 / {2 (1.05)^3.5} = 0.395%.] Note that when thinning the parameter β is altered, while when adding the r parameter is affected. As discussed previously, if one adds two independent Negative Binomial Distributions with the same β, then the result is also a Negative Binomial Distribution, with the sum of the r parameters.
36 See Section 6.12 of Loss Models. This same result holds for a Compound Frequency Distribution, to be discussed subsequently, with a primary distribution that is Negative Binomial.
37 See Table 8.3 in Loss Models. However, unlike the Poisson case, the large and small accidents are not independent processes.
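The thinning calculation in the attorney example above is easy to reproduce. A minimal Python sketch (mine, illustrative only):

```python
# Thinning: claims are NB(r = 1.5, beta = 0.20); each claim independently
# involves an attorney with probability 0.25, so attorney claims are
# NB(r = 1.5, beta = 0.20 * 0.25 = 0.05).  Probability of exactly 2 such claims:
r, beta = 1.5, 0.20 * 0.25
f2 = (r * (r + 1) / 2.0) * beta**2 / (1.0 + beta)**(r + 2)
print(f2)   # about 0.00395, i.e. 0.395%
```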
Problems:
The following six questions all deal with a Negative Binomial distribution with parameters β = 0.4 and r = 3.
6.1 (1 point) What is the mean? A. less than .9 B. at least .9 but less than 1.0 C. at least 1.0 but less than 1.1 D. at least 1.1 but less than 1.2 E. at least 1.2
6.2 (1 point) What is the variance? A. less than 1.8 B. at least 1.8 but less than 1.9 C. at least 1.9 but less than 2.0 D. at least 2.0 but less than 2.1 E. at least 2.1
6.3 (2 points) What is the chance of having 4 claims? A. less than 3% B. at least 3% but less than 4% C. at least 4% but less than 5% D. at least 5% but less than 6% E. at least 6%
6.4 (2 points) What is the mode? A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D.
6.5 (2 points) What is the median? A. 0 B. 1 C. 2 D. 3 E. None of A, B, C, or D.
6.6 (2 points) What is the chance of having 4 claims or less? A. 90% B. 92% C. 94% D. 96% E. 98%
6.7 (2 points) Bud and Lou play a series of games. Bud has a 60% chance of winning each game. Lou has a 40% chance of winning each game. The outcome of each game is independent of any other. Let N be the number of games Bud wins prior to Lou winning 5 games. What is the variance of N? A. less than 14 B. at least 14 but less than 16 C. at least 16 but less than 18 D. at least 18 but less than 20 E. at least 20
6.8 (1 point) For a Negative Binomial distribution with β = 2/9 and r = 1.5, what is the chance of having 3 claims? A. 1% B. 2% C. 3% D. 4% E. 5%
6.9 (2 points) In baseball a team bats in an inning until it makes 3 outs. Assume each batter has a 40% chance of getting on base and a 60% chance of making an out. Then what is the chance of a team sending exactly 8 batters to the plate in an inning? (Assume no double or triple plays. Assume nobody is picked off base, caught stealing or thrown out on the bases. Assume each batterʼs chance of getting on base is independent of whether another batter got on base.) A. less than 1% B. at least 1% but less than 2% C. at least 2% but less than 3% D. at least 3% but less than 4% E. at least 4% 6.10 (1 point) Assume each exposure has a Negative Binomial frequency distribution, as per Loss Models, with β = 0.1 and r = 0.27. You insure 20,000 independent exposures. What is the frequency distribution for your portfolio? A. Negative Binomial with β = 0.1 and r = 0.27. B. Negative Binomial with β = 0.1 and r = 5400. C. Negative Binomial with β = 2000 and r = 0.27. D. Negative Binomial with β = 2000 and r = 5400. E. None of the above.
6.11 (3 points) Frequency is given by a Negative Binomial distribution with β = 1.38 and r = 3. Severity is given by a Weibull Distribution with τ = 0.3 and θ = 1000. Frequency and severity are independent. What is the chance of two losses each of size greater than $25,000? A. 1% B. 2% C. 3% D. 4% E. 5%
Use the following information for the next two questions: Six friends each have their own phone. The number of calls each friend gets per night from telemarketers is Geometric with β = 0.3. The number of calls each friend gets is independent of the others. 6.12 (2 points) Tonight, what is the probability that three of the friends get one or more calls from telemarketers, while the other three do not? A. 11% B. 14% C. 17% D. 20% E. 23% 6.13 (2 points) Tonight, what is the probability that the friends get a total of three calls from telemarketers? A. 11% B. 14% C. 17% D. 20% E. 23% 6.14 (2 points) The total number of claims from a group of 80 drivers has a Negative Binomial Distribution with β = 0.5 and r = 4. What is the probability that a group of 40 similar drivers have a total of 2 or more claims? A. 22% B. 24% C. 26% D. 28% E. 30% 6.15 (2 points) The total number of non-zero payments from a policy with a $1000 deductible follows a Negative Binomial Distribution with β = 0.8 and r = 3. The ground up losses follow an Exponential Distribution with θ = 2500. If this policy instead had a $5000 deductible, what would be the probability of having no non-zero payments? A. 56% B. 58% C. 60% D. 62% E. 64%
6.16 (3 points) The mathematician Stefan Banach smoked a pipe. In order to light his pipe, he carried a matchbox in each of two pockets. Each time he needs a match, he is equally likely to take it from either matchbox. Assume that he starts the month with two matchboxes each containing 20 matches. Eventually Banach finds that when he tries to get a match from one of his matchboxes it is empty. What is the probability that when this occurs, the other matchbox has exactly 5 matches in it? A. less than 6% B. at least 6% but less than 7% C. at least 7% but less than 8% D. at least 8% but less than 9% E. at least 9% 6.17 (2 points) Total claim counts generated from a portfolio of 400 policies follow a Negative Binomial distribution with parameters r = 3 and β = 0.4. If the portfolio increases to 500 policies, what is the probability of observing exactly 2 claims in total? A. 21% B. 23% C. 25% D. 27% E. 29% Use the following information for the next three questions: Two teams are playing against one another in a seven game series. The results of each game are independent of the others. The first team to win 4 games wins the series. 6.18 (3 points) The Flint Tropics have a 45% chance of winning each game. What is the Flint Tropics chance of winning the series? A. 33% B. 35% C. 37% D. 39% E. 41% 6.19 (3 points) The Durham Bulls have a 60% chance of winning each game. What is the Durham Bulls chance of winning the series? A. 67% B. 69% C. 71% D. 73% E. 75% 6.20 (3 points) The New York Knights have a 40% chance of winning each game. The Knights lose the first game. The opposing manager offers to split the next two games with the Knights (each team would win one of the next two games.) Should the Knights accept this offer? 6.21 (3 points) The number of losses follows a Negative Binomial distribution with r = 4 and β = 3. Sizes of loss are uniform from 0 to 15,000. There is a deductible of 1000, a maximum covered loss of 10,000, and a coinsurance of 90%. Determine the probability that there are exactly six payments of size greater than 5000. A. 9.0% B. 9.5% C. 10.0% D. 10.5% E. 11.0%
6.22 (2 points) Define (N - j)+ = N - j if N ≥ j, and 0 otherwise. N follows a Negative Binomial distribution with r = 5 and β = 0.3. Determine E[(N - 2)+]. A. 0.25 B. 0.30 C. 0.35 D. 0.40 E. 0.45
6.23 (3 points) The number of new claims the State House Insurance Company receives in a day follows a Negative Binomial Distribution with r = 5 and β = 0.8. For a claim chosen at random, on average how many other claims were also made on the same day? A. 4.0 B. 4.2 C. 4.4 D. 4.6 E. 4.8
6.24 (2, 5/83, Q.44) (1.5 points) If a fair coin is tossed repeatedly, what is the probability that the third head occurs on the nth toss? A. (n-1)/2^(n+1) B. (n-1)(n-2)/2^(n+1) C. (n-1)(n-2)/2^n D. (n-1)/2^n E. {n!/(3! (n-3)!)}/2^n
6.25 (2, 5/90, Q.45) (1.7 points) A coin is twice as likely to turn up tails as heads. If the coin is tossed independently, what is the probability that the third head occurs on the fifth trial? A. 8/81 B. 40/243 C. 16/81 D. 80/243 E. 3/5
6.26 (2, 2/96, Q.28) (1.7 points) Let X be the number of independent Bernoulli trials performed until a success occurs. Let Y be the number of independent Bernoulli trials performed until 5 successes occur. A success occurs with probability p and Var(X) = 3/4. Calculate Var(Y). A. 3/20 B. 3/(4√5) C. 3/4 D. 15/4 E. 75/4
6.27 (1, 11/01, Q.11) (1.9 points) A company takes out an insurance policy to cover accidents that occur at its manufacturing plant. The probability that one or more accidents will occur during any given month is 3/5. The number of accidents that occur in any given month is independent of the number of accidents that occur in all other months. Calculate the probability that there will be at least four months in which no accidents occur before the fourth month in which at least one accident occurs. (A) 0.01 (B) 0.12 (C) 0.23 (D) 0.29 (E) 0.41
6.28 (1, 11/01, Q.21) (1.9 points) An insurance company determines that N, the number of claims received in a week, is a random variable with P[N = n] = 1/2^(n+1), where n ≥ 0. The company also determines that the number of claims received in a given week is independent of the number of claims received in any other week. Determine the probability that exactly seven claims will be received during a given two-week period. (A) 1/256 (B) 1/128 (C) 7/512 (D) 1/64 (E) 1/32
6.29 (CAS3, 11/03, Q.18) (2.5 points) A new actuarial student analyzed the claim frequencies of a group of drivers and concluded that they were distributed according to a negative binomial distribution and that the two parameters, r and β, were equal. An experienced actuary reviewed the analysis and pointed out the following: "Yes, it is a negative binomial distribution. The r parameter is fine, but the value of the β parameter is wrong. Your parameters indicate that 1/9 of the drivers should be claim-free, but in fact, 4/9 of them are claim-free." Based on this information, calculate the variance of the corrected negative binomial distribution. A. 0.50 B. 1.00 C. 1.50 D. 2.00 E. 2.50
6.30 (CAS3, 11/04, Q.21) (2.5 points) The number of auto claims for a group of 1,000 insured drivers has a negative binomial distribution with β = 0.5 and r = 5. Determine the parameters β and r for the distribution of the number of auto claims for a group of 2,500 such individuals. A. β = 1.25 and r = 5 B. β = 0.20 and r = 5 C. β = 0.50 and r = 5 D. β = 0.20 and r = 12.5 E. β = 0.50 and r = 12.5
6.31 (CAS3, 5/05, Q.28) (2.5 points) You are given a negative binomial distribution with r = 2.5 and β = 5. For what value of k does pk take on its largest value? A. Less than 7 B. 7 C. 8 D. 9 E. 10 or more
6.32 (CAS3, 5/06, Q.32) (2.5 points) Total claim counts generated from a portfolio of 1,000 policies follow a Negative Binomial distribution with parameters r = 5 and β = 0.2. Calculate the variance in total claim counts if the portfolio increases to 2,000 policies. A. Less than 1.0 B. At least 1.0 but less than 1.5 C. At least 1.5 but less than 2.0 D. At least 2.0 but less than 2.5 E. At least 2.5
6.33 (CAS3, 11/06, Q.23) (2.5 points) An actuary has determined that the number of claims follows a negative binomial distribution with mean 3 and variance 12. Calculate the probability that the number of claims is at least 3 but less than 6. A. Less than 0.20 B. At least 0.20, but less than 0.25 C. At least 0.25, but less than 0.30 D. At least 0.30, but less than 0.35 E. At least 0.35
6.34 (CAS3, 11/06, Q.24) (2.5 points) Two independent random variables, X1 and X2, follow the negative binomial distribution with parameters (r1, β1) and (r2, β2), respectively. Under which of the following circumstances will X1 + X2 always be negative binomial? 1. r1 = r2. 2. β1 = β2. 3. The coefficients of variation of X1 and X2 are equal. A. 1 only B. 2 only C. 3 only D. 1 and 3 only E. 2 and 3 only
6.35 (CAS3, 11/06, Q.31) (2.5 points) You are given the following information for a group of policyholders:
• The frequency distribution is negative binomial with r = 3 and β = 4.
• The severity distribution is Pareto with α = 2 and θ = 2,000.
Calculate the variance of the number of payments if a $500 deductible is introduced. A. Less than 30 B. At least 30, but less than 40 C. At least 40, but less than 50 D. At least 50, but less than 60 E. At least 60
6.36 (SOA M, 11/06, Q.22 & 2009 Sample Q.283) (2.5 points) The annual number of doctor visits for each individual in a family of 4 has a geometric distribution with mean 1.5. The annual numbers of visits for the family members are mutually independent. An insurance pays 100 per doctor visit beginning with the 4th visit per family. Calculate the expected payments per year for this family. (A) 320 (B) 323 (C) 326 (D) 329 (E) 332
Solutions to Problems:
6.1. E. mean = rβ = (3)(.4) = 1.2.
6.2. A. variance = rβ(1+β) = (3)(.4)(1.4) = 1.68.
6.3. B. f(4) = {(x+r-1)!/[x! (r-1)!]} β^x/(1+β)^(x+r) = {6!/(4! 2!)} (.4)^4/(1.4)^(4+3) = 15(.0256)/(10.54) = 0.0364.
6.4. A. & 6.5. B. The mode is 0, since f(0) is larger than any other value.
n f(n) F(n)
0 0.3644 0.364
1 0.3124 0.677
2 0.1785 0.855
3 0.0850 0.940
4 0.0364 0.977
The median is 1, since F(0) < .5 and F(1) ≥ .5. Comment: Iʼve used the formulas: f(0) = 1/(1+β)^r and f(x+1)/f(x) = β(x+r)/{(x+1)(1+β)}. Just as with the Gamma Distribution, the Negative Binomial can have either a mode of zero or a positive mode. For r < 1 + 1/β, as is the case here, the mode is zero, and the Negative Binomial looks somewhat similar to an Exponential Distribution.
6.6. E. F(4) = f(0) + f(1) + f(2) + f(3) + f(4) = 97.7%.
n f(n) F(n)
0 0.3644 0.3644
1 0.3124 0.6768
2 0.1785 0.8553
3 0.0850 0.9403
4 0.0364 0.9767
Comment: Using the Incomplete Beta Function: F(4) = 1 - β(4+1, r; β/(1+β)) = 1 - β(5, 3; .4/1.4) = 1 - 0.0233 = 0.9767.
6.7. D. This is a series of Bernoulli trials. Treating Louʼs winning as a “success”, the chance of success is 40%. N is the number of failures prior to the 5th success. Therefore N has a Negative Binomial Distribution with r = 5 and β = chance of failure / chance of success = 60%/40% = 1.5. Variance is: rβ(1+β) = (5)(1.5)(2.5) = 18.75.
6.8. A. f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(1.5)(2.5)(3.5)/6} (2/9)^3 (11/9)^(-4.5) = 0.0097.
6.9. E. For the defense a batter reaching base is a failure and an out is a success. The number of batters reaching base is the number of failures prior to 3 successes for the defense. The chance of a success for the defense is .6. Therefore the number of batters who reach base is given by a Negative Binomial with r = 3 and β = (chance of failure for the defense)/(chance of success for the defense) = .4/.6 = 2/3. If 8 batters come to the plate, then 5 reach base and 3 make out. The chance of exactly 5 batters reaching base is f(5) for r = 3 and β = 2/3: {(3)(4)(5)(6)(7)/5!} β^5/(1+β)^(5+r) = (21)(.13169)/(59.537) = 0.0464. Alternately, for there to be exactly 8 batters, the last one has to make an out, and exactly two of the first 7 must make an out. Prob[2 of 7 make out] ⇔ density at 2 of Binomial Distribution with m = 7 and q = .6 ⇔ {(7)(6)/2} (.6^2)(.4^5) = .0774. Prob[8th batter makes an out] Prob[2 of 7 make an out] = (.6)(.0774) = 0.0464. Comment: Generally, one can use either a Negative Binomial Distribution or some reasoning and a Binomial Distribution in order to answer these types of questions.
6.10. B. The sum of independent Negative Binomials, each with the same β, is another Negative Binomial, with the sum of the r parameters. In this case we get a Negative Binomial with β = 0.1 and r = (.27)(20000) = 5400.
6.11. D. S(25,000) = exp(-(25000/1000)^0.3) = .0723. The number of losses greater than $25,000 is another Negative Binomial with r = 3 and β = (1.38)(.0723) = .0998. For a Negative Binomial, f(2) = {r(r+1)/2} β^2/(1+β)^(r+2) = {(3)(4)/2} (.0998^2)/(1.0998^5) = 3.71%. Comment: An example of thinning a Negative Binomial.
6.12. A. For the Geometric, f(0) = 1/(1+β) = 1/1.3. 1 - f(0) = .3/1.3. Prob[3 with 0 and 3 not with 0] = {6!/(3! 3!)} (1/1.3)^3 (.3/1.3)^3 = 0.112.
6.13. B. The total number of calls is Negative Binomial with r = 6 and β = .3. f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(6)(7)(8)/3!} (.3^3)/(1.3^9) = 0.143.
6.14. C. The frequency for the 40 drivers is Negative Binomial with parameters r = (40/80)(4) = 2 and β = 0.5. f(0) = 1/1.5^2 = 44.44%. f(1) = 2(.5/1.5^3) = 29.63%. 1 - f(0) - f(1) = 25.9%.
6.15. E. For the Exponential, S(1000) = exp[-1000/2500] = .6703. S(5000) = exp[-5000/2500] = .1353. Therefore, with the $5000 deductible, the non-zero payments are Negative Binomial with r = 3 and β = (.1353/.6703)(0.8) = .16. f(0) = 1/1.16^3 = 64%.
6.16. E. Let us assume the righthand matchbox is the one discovered to be empty. Call a “success” choosing the righthand box and a “failure” choosing the lefthand box. Then we have a series of Bernoulli trials, with chance of success 1/2. The number of “failures” prior to the 21st “success” (looking in the righthand matchbox 20 times and getting a match and once more finding no matches are left) is Negative Binomial with r = 21 and β = (chance of failure)/(chance of success) = (1/2)/(1/2) = 1. For the lefthand matchbox to then have 5 matches, we must have had 15 “failures”. Density at 15 for this Negative Binomial is: {(21)(22)...(35)/15!} (1^15)/(1 + 1)^(15+21) = 4.73%. However, it is equally likely that the lefthand matchbox is the one discovered to be out of matches. Thus we double this probability: (2)(4.73%) = 9.5%. Comment: Difficult. The famous Banach Match problem.
6.17. A. When one changes the number of exposures, the r parameter changes in proportion. For 500 policies, total claim counts follow a Negative Binomial distribution with parameters r = 3(500/400) = 3.75 and β = 0.4. f(2) = {r(r+1)/2} β^2/(1+β)^(r+2) = (3.75)(4.75)(.5)(.4^2)/(1.4^5.75) = 20.6%. Comment: Similar to CAS3, 5/06, Q.32.
6.18. D. Ignoring the fact that once a team wins four games, the final games of the series will not be played, the total number of games won out of seven by the Tropics is Binomial with q = 0.45 and m = 7. We want the sum of the densities of this Binomial from 4 to 7: 35(0.45^4)(0.55^3) + 21(0.45^5)(0.55^2) + 7(0.45^6)(0.55) + 0.45^7 = 0.2388 + 0.1172 + 0.0320 + 0.0037 = 0.3917. Alternately, the number of failures by the Tropics prior to their 4th success is Negative Binomial with r = 4 and β = .55/.45 = 11/9. For the Tropics to win the series they have to have 3 or fewer losses prior to their 4th win. The probability of this is the sum of the densities of the Negative Binomial at 0 to 3: 1/(20/9)^4 + 4(11/9)/(20/9)^5 + {(4)(5)(11/9)^2/2!}/(20/9)^6 + {(4)(5)(6)(11/9)^3/3!}/(20/9)^7 = 0.0410 + 0.0902 + 0.1240 + 0.1364 = 0.3916. Comment: The question ignores any effect of home field advantage.
6.19. C. Ignoring the fact that once a team wins four games, the final games of the series will not be played, the total number of games won out of seven by the Bulls is Binomial with q = 0.60 and m = 7. We want the sum of the densities of this Binomial from 4 to 7: 35(.6^4)(.4^3) + 21(.6^5)(.4^2) + 7(.6^6)(.4) + .6^7 = 0.2903 + 0.2613 + 0.1306 + 0.0280 = 0.7102. Alternately, the number of failures by the Bulls prior to their 4th success is Negative Binomial with r = 4 and β = .4/.6 = 2/3. For the Bulls to win the series they have to have 3 or fewer losses prior to their 4th win. The probability of this is the sum of the densities of the Negative Binomial at 0 to 3: 1/(5/3)^4 + 4(2/3)/(5/3)^5 + {(4)(5)(2/3)^2/2!}/(5/3)^6 + {(4)(5)(6)(2/3)^3/3!}/(5/3)^7 = 0.1296 + 0.2074 + 0.2074 + 0.1659 = 0.7103. Comment: According to Bill James, “A useful rule of thumb is that the advantage doubles in a seven-game series. In other words, if one team would win 51% of the games between two opponents, then they would win 52% of the seven-game series. If one team would win 55% of the games, then they would win 60% of the series.” Here is a graph of the chance of winning the seven-game series, as a function of the chance of winning each game: [Graph omitted: probability of winning the series (vertical axis) versus probability of winning each game (horizontal axis).]
6.20. If the Knights do not accept the offer, then they need to win four of six games. We want the sum of the densities from 4 to 6 of a Binomial with q = .4 and m = 6: 15(.4^4)(.6^2) + 6(.4^5)(.6) + .4^6 = 0.1382 + 0.0369 + 0.0041 = 0.1792. If the Knights accept the offer, then they need to win three of four games. We want the sum of the densities from 3 to 4 of a Binomial with q = .4 and m = 4: 4(.4^3)(.6) + .4^4 = 0.1536 + 0.0256 = 0.1792. Thus the Knights are indifferent between accepting this offer or not. Alternately, if the Knights do not accept the offer, then they need to win four of six games. The number of failures by the Knights prior to their 4th success is Negative Binomial with r = 4 and β = .6/.4 = 1.5. The Knights win the series if they have 2 or fewer failures: 1/2.5^4 + 4(1.5)/2.5^5 + {(4)(5)(1.5)^2/2!}/2.5^6 = 0.0256 + 0.0614 + 0.0922 = 0.1792. If the Knights accept the offer, then they need to win three of four games. The number of failures by the Knights prior to their 3rd success is Negative Binomial with r = 3 and β = .6/.4 = 1.5. The Knights win the series if they have 1 or fewer failures: 1/2.5^3 + 3(1.5)/2.5^4 = 0.0640 + 0.1152 = 0.1792. Thus the Knights are indifferent between accepting this offer or not. Comment: A comparison of their chances of winning the series as a function of their chance of winning a game, accepting the offer (dashed) and not accepting the offer (solid): [Graph omitted.]
The Knights should accept the offer if their chance of winning each game is less than 40%.
6.21. C. A payment is of size greater than 5000 if the loss is of size greater than: 5000/.9 + 1000 = 6556. The probability of a loss of size greater than 6556 is: 1 - 6556/15000 = 56.3%. The large losses are Negative Binomial with r = 4 and β = (56.3%)(3) = 1.69. f(6) = {r(r+1)(r+2)(r+3)(r+4)(r+5)/6!} β^6/(1+β)^(r+6) = {(4)(5)(6)(7)(8)(9)/720} (1.69^6)/(2.69^10) = 9.9%. Comment: An example of thinning a Negative Binomial.
6.22. C. f(0) = 1/1.3^5 = 0.2693. f(1) = (5)(0.3)/1.3^6 = 0.3108. E[N] = 0f(0) + 1f(1) + 2f(2) + 3f(3) + 4f(4) + 5f(5) + ... E[(N - 2)+] = 1f(3) + 2f(4) + 3f(5) + ... E[N] - E[(N - 2)+] = f(1) + 2f(2) + 2f(3) + 2f(4) + 2f(5) + ... = f(1) + 2{1 - f(0) - f(1)} = 2 - 2f(0) - f(1). E[(N - 2)+] = E[N] - {2 - 2f(0) - f(1)} = (5)(.3) - {2 - (2)(.2693) - .3108} = 0.3494. Alternately, E[N ∧ 2] = 0f(0) + 1f(1) + 2{1 - f(0) - f(1)} = 1.1506. E[(N - 2)+] = E[N] - E[N ∧ 2] = (5)(0.3) - 1.1506 = 0.3494.
Alternately, E[(N - 2)+] = E[(2-N)+] + E[N] - 2 = 2f(0) + f(1) + (5)(0.3) - 2 = 0.3494. Comment: See the section on Limited Expected Values in “Mahlerʼs Guide to Fitting Loss Distributions.” 6.23. E. Let n be the number of claims made on a day. The probability that the claim picked is on a day of size n is proportional to the product of the number of claims on that day and the proportion of days of that size: n f(n). Thus, Prob[claim is from a day with n claims] = n f(n) / Σ n f(n) = n f(n) / E[N]. For n > 0, the number of other claims on the same day is n - 1.
Average number of other claims is:
(1/E[N]) Σ n f(n) (n - 1) = (1/E[N]) Σ (n^2 - n) f(n) = {E[N^2] - E[N]}/E[N] = E[N^2]/E[N] - 1 = {Var[N] + E[N]^2}/E[N] - 1 = Var[N]/E[N] + E[N] - 1 = 1 + β + rβ - 1 = (r + 1)β = (6)(0.8) = 4.8.
Comment: The average day has four claims; on the average day there are three other claims. However, a claim chosen at random is more likely to be from a day that had a lot of claims.
6.24. B. This is a Negative Binomial with r = 3, β = chance of failure / chance of success = 1, and x = number of failures = n - 3. f(x) = {r(r+1)...(r+x-1)/x!} β^x/(1+β)^(r+x) = {(3)(4)...(x+2)/x!}/2^(3+x) = (x+1)(x+2)/2^(4+x). f(n-3) = (n-2)(n-1)/2^(n+1). Alternately, for the third head to occur on the nth toss, for n ≥ 3, we have to have had two heads out of the first n-1 tosses, which has probability {(n-1)(n-2)/2}/2^(n-1) = (n-2)(n-1)/2^n, and a head on the nth toss, which has probability 1/2. Thus the total probability is: (n-2)(n-1)/2^(n+1).
6.25. A. The number of tails before the third head is Negative Binomial, with r = 3 and β = chance of failure / chance of success = chance of tail / chance of head = 2. Prob[third head occurs on the fifth trial] = Prob[2 tails when we get the 3rd head] = f(2) = {r(r+1)/2} β^2/(1+β)^(r+2) = (6)(4)/3^5 = 8/81. Alternately, we need 2 heads and 2 tails out of the first 4 tosses, and then a head on the fifth toss: {4!/(2! 2!)}(1/3)^2 (2/3)^2 (1/3) = 8/81.
6.26. D. X - 1 is Geometric with β = chance of failure / chance of success = (1 - p)/p = 1/p - 1. Therefore, 3/4 = Var(X) = Var(X - 1) = β(1 + β) = (1/p - 1)(1/p). 0.75p^2 + p - 1 = 0. ⇒ p = {-1 + √(1 + 3)}/1.5 = 2/3. β = 3/2 - 1 = 1/2. Y - 5 is Negative Binomial with r = 5 and β = 1/2. Var[Y - 5] = Var[Y] = (5)(1/2)(3/2) = 15/4. Alternately, once one has gotten the first success, the number of additional trials until the second success is independent of and has the same distribution as X, the number of additional trials until the first success. ⇒ Y = X + X + X + X + X. ⇒ Var[Y] = 5Var[X] = (5)(3/4) = 15/4.
6.27. D. Define a “success” as a month in which at least one accident occurs. We have a series of independent Bernoulli trials, and we stop upon the fourth success. The number of failures before the fourth success is Negative Binomial with r = 4 and β = chance of failure / chance of success = (2/5)/(3/5) = 2/3. f(0) = 1/(1 + 2/3)^4 = .1296. f(1) = 4(2/3)/(5/3)^5 = .20736. f(2) = {(4)(5)/2!}(2/3)^2/(5/3)^6 = .20736. f(3) = {(4)(5)(6)/3!}(2/3)^3/(5/3)^7 = .165888. Prob[at least 4 failures] = 1 - (.1296 + .20736 + .20736 + .165888) = .289792. Alternately, instead define a “success” as a month in which no accident occurs. We have a series of independent Bernoulli trials, and we stop upon the fourth success. The number of failures before the fourth success is Negative Binomial with r = 4 and β = chance of failure / chance of success = (3/5)/(2/5) = 1.5. f(0) = 1/(1 + 1.5)^4 = .0256. f(1) = (4)(1.5)/2.5^5 = .06144. f(2) = {(4)(5)/2!}(1.5^2)/2.5^6 = .09216. f(3) = {(4)(5)(6)/3!}(1.5^3)/2.5^7 = .110592. The event we want will occur if at the time of the fourth success, the fourth month in which no accidents occur, there have been fewer than four failures, in other words fewer than four months in which at least one accident occurs. Prob[fewer than 4 failures] = .0256 + .06144 + .09216 + .110592 = .289792.
6.28. D. The number of claims in a week is Geometric with β/(1+β) = 1/2. ⇒ β = 1. The sum of two independent Geometrics is a Negative Binomial with r = 2 and β = 1. f(7) = {(2)(3)(4)(5)(6)(7)(8)/7!} β^7/(1+β)^9 = 1/64.
6.29. C. For the studentʼs Negative Binomial, r = β: f(0) = 1/(1+β)^r = 1/(1+r)^r = 1/9. ⇒ r = 2. For the corrected Negative Binomial, r = 2 and: f(0) = 1/(1+β)^r = 1/(1+β)^2 = 4/9. ⇒ β = .5. Variance of the corrected Negative Binomial = rβ(1+β) = (2)(.5)(1.5) = 1.5.
6.30. E. For a Negative Binomial distribution, as the exposures change we get another Negative Binomial; the r parameter changes in proportion, while β remains the same. The new r = (2500/1000)(5) = 12.5. β = 0.5 and r = 12.5.
6.31. B. For a Negative Binomial, a = β/(1 + β) = 5/6, and b = (r - 1)β/(1 + β) = (1.5)(5/6) = 5/4. f(x)/f(x-1) = a + b/x = 5/6 + (5/4)/x, x = 1, 2, 3, ... To find the mode, where the density is largest, find when this ratio is greater than 1. 5/6 + (5/4)/x = 1. ⇒ x/6 = 5/4. ⇒ x = 7.5. So f(7)/f(6) > 1 while f(8)/f(7) < 1, and 7 is the mode. Comment: f(6) = .0556878. f(7) = .0563507. f(8) = .0557637.
6.32. D. Doubling the exposures multiplies r by 2. For 2000 policies, total claim counts follow a Negative Binomial distribution with parameters r = 10 and β = 0.2. Variance = rβ(1+β) = (10)(.2)(1.2) = 2.4. Alternately, for 1000 policies, the variance of total claim counts is: (5)(.2)(1.2) = 1.2. 2000 policies ⇔ 1000 policies + 1000 policies. ⇒ For 2000 policies, the variance of total claim counts is: 1.2 + 1.2 = 2.4. Comment: When one adds independent Negative Binomial Distributions with the same β, one gets another Negative Binomial Distribution with the sum of the r parameters. When one changes the number of exposures, the r parameter changes in proportion.
6.33. B. rβ = 3. rβ(1+β) = 12. ⇒ 1 + β = 12/3 = 4. ⇒ β = 3. ⇒ r = 1. f(3) + f(4) + f(5) = 3^3/4^4 + 3^4/4^5 + 3^5/4^6 = 0.244. Comment: We have fit via the Method of Moments. Since r = 1, this is a Geometric Distribution.
6.34. B. 1. False. 2. True. 3. CV = √[rβ(1+β)]/(rβ) = √[(1+β)/(rβ)]. False. Comment: For the Negative Binomial, P(z) = 1/{1 - β(z-1)}^r. The p.g.f. of the sum of two independent variables is the product of their p.g.f.s: 1/({1 - β1(z-1)}^r1 {1 - β2(z-1)}^r2). This has the same form as a Negative Binomial if and only if β1 = β2.
6.35. A. For the Pareto, S(500) = (2/2.5)^2 = 0.64. Thus the number of losses of size greater than 500 is Negative Binomial with r = 3 and β = (.64)(4) = 2.56. The variance of the number of large losses is: (3)(2.56)(3.56) = 27.34.
6.36. D. The total number of visits is the sum of 4 independent, identically distributed Geometric Distributions, which is a Negative Binomial with r = 4 and β = 1.5. f(0) = 1/2.5^4 = .0256. f(1) = (4)(1.5)/2.5^5 = .06144. f(2) = {(4)(5)/2}(1.5^2)/2.5^6 = .09216. E[N ∧ 3] = 0f(0) + 1f(1) + 2f(2) + 3{1 - f(0) - f(1) - f(2)} = 2.708. E[(N-3)+] = E[N] - E[N ∧ 3] = (4)(1.5) - 2.708 = 3.292. 100E[(N-3)+] = 329.2. Alternately, E[(N-3)+] = E[(3-N)+] + E[N] - 3 = 3f(0) + 2f(1) + f(2) + (4)(1.5) - 3 = 3.292. Comment: See the section on Limited Expected Values in “Mahlerʼs Guide to Fitting Loss Distributions.”
Section 7, Normal Approximation
This section will go over important information that Loss Models assumes the reader already knows concerning the Normal Distribution and its use to approximate frequency distributions. These ideas are important for practical applications of frequency distributions.38 The Binomial Distribution with parameters q and m is the sum of m independent Bernoulli trials, each with parameter q. The Poisson Distribution with λ integer is the sum of λ independent Poisson variables each with mean of one. The Negative Binomial Distribution with parameters β and r, with r integer, is the sum of r independent Geometric distributions each with parameter β. Thus by the Central Limit Theorem, each of these distributions can be approximated by a Normal Distribution with the same mean and variance. For the Binomial as m → ∞, for the Poisson as λ → ∞, and for the Negative Binomial as r → ∞, the distribution approaches a Normal39. The approximation is quite good for large values of the relevant parameter, but not very good for extremely small values. For example, here is the graph of a Binomial Distribution with q = 0.4 and m = 30. It has mean (30)(0.4) = 12 and variance (30)(0.4)(0.6) = 7.2. Also shown is a Normal Distribution with µ = 12 and σ = √7.2 = 2.683.
[Graph omitted: the Binomial probabilities for q = 0.4 and m = 30, together with the approximating Normal density.]
38 These ideas also underlay Classical Credibility.
39 In fact as discussed in a subsequent section, the Binomial and the Negative Binomial each approach a Poisson which in turn approaches a Normal.
Here is the graph of a Poisson Distribution with λ = 10, and the approximating Normal Distribution with µ = 10 and σ = √10 = 3.162: [Graph omitted.]
Here is the graph of a Negative Binomial Distribution with β = 0.5 and r = 20, with mean (20)(0.5) = 10 and variance (20)(0.5)(1.5) = 15, and the approximating Normal Distribution with µ = 10 and σ = √15 = 3.873: [Graph omitted.]
A typical use of the Normal Approximation would be to find the probability of observing a certain range of claims. For example, given a certain distribution, what is the probability of at least 10 and no more than 20 claims? Exercise: Given a Binomial with parameters q = 0.3 and m = 10, what is the chance of observing 1 or 2 claims? [Solution: 10(0.3^1)(0.7^9) + 45(0.3^2)(0.7^8) = 0.1211 + 0.2335 = 0.3546.] In this case one could compute the exact answer as the sum of only two terms. Nevertheless, let us illustrate how the Normal Approximation could be used in this case. The Binomial distribution with q = 0.3 and m = 10 has a mean of: (0.3)(10) = 3, and a variance of: (10)(0.3)(0.7) = 2.1. This Binomial Distribution can be approximated by a Normal Distribution with mean of 3 and variance of 2.1, as shown below: [Graph omitted: the Binomial probabilities as rectangles of width one, together with the approximating Normal density.]
Prob[1 claim] = the area of a rectangle of width one and height f(1) = .1211. Prob[2 claims] = the area of a rectangle of width one and height f(2) = .2335. The chance of either one or two claims is the sum of these two rectangles; this is approximated by the area under this Normal Distribution, with mean 3 and variance 2.1, from 1 - .5 = .5 to 2 + .5 = 2.5. Prob[1 or 2 claims] ≅ Φ[(2.5 - 3)/√2.1] - Φ[(.5 - 3)/√2.1] = Φ[-.345] - Φ[-1.725] = .365 - .042 = 0.323. Note that in order to get the probability for two values on the discrete Binomial Distribution, one has to cover an interval of length two on the real line for the continuous Normal Distribution. We subtracted 1/2 from the lower end of 1 and added 1/2 to the upper end of 2. This is called the “continuity correction”.
Below, I have zoomed in on the relevant part of the previous diagram: [Graph omitted: the rectangles for 1 and 2 claims together with the Normal density from 0.5 to 2.5; the discrepancy regions are labeled A, B, C, and D.]
It should make it clear why the continuity correction is needed. In this case the chance of having 1 or 2 claims is equal to the area under the two rectangles, which is not close to the area under the Normal from 1 to 2, but is approximated by the area under the Normal from 0.5 to 2.5. In order to use the Normal Approximation, one must translate to the so called “Standard” Normal Distribution40. In this case, we therefore need to standardize the variables by subtracting the mean of 3 and dividing by the standard deviation of √2.1 = 1.449. In this case,
0.5 ↔ (0.5 - 3)/1.449 = -1.725, while 2.5 ↔ (2.5 - 3)/1.449 = -0.345. Thus, the chance of observing either 1 or 2 claims is approximately: Φ[-0.345] - Φ[-1.725] = 0.365 - 0.042 = 0.323. This compares to the exact result of .3546 calculated above. The diagram above shows why the approximation was too small in this particular case41. Area A is within the first rectangle, but not under the Normal Distribution. Area B is not within the first rectangle, but is under the Normal Distribution. Area C is within the second rectangle, but not under the Normal Distribution. Area D is not within the second rectangle, but is under the Normal Distribution. Normal Approximation minus Exact Result = (Area B - Area A) + (Area D - Area C). While there was no advantage to using the Normal approximation in this example, it saves a lot of time when trying to deal with many terms.
40 Attached to the exam and shown below.
41 The approximation gets better as the mean of the Binomial gets larger. The error can be either positive or negative.
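For readers who want to check such approximations numerically, here is a small Python sketch (my own, not part of the guide) comparing the exact Binomial probability of 1 or 2 claims with the Normal Approximation using the continuity correction; math.erf is used for the Standard Normal distribution function.

```python
# Exact Binomial probability of 1 or 2 claims (q = 0.3, m = 10) versus the
# Normal Approximation with the continuity correction described above.
from math import comb, erf, sqrt

def binom_pmf(k, m, q):
    return comb(m, k) * q**k * (1 - q)**(m - k)

def Phi(z):                      # Standard Normal distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

m, q = 10, 0.3
mu, sigma = m * q, sqrt(m * q * (1 - q))
exact = binom_pmf(1, m, q) + binom_pmf(2, m, q)
approx = Phi((2.5 - mu) / sigma) - Phi((0.5 - mu) / sigma)
print(exact, approx)             # about 0.3546 and 0.323
```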
In general, let µ be the mean of the frequency distribution, while σ is the standard deviation of the frequency distribution; then the chance of observing at least i claims and not more than j claims is approximately: Φ[{(j + 0.5) - µ}/σ] - Φ[{(i - 0.5) - µ}/σ].
Exercise: Use the Normal Approximation in order to estimate the probability of observing at least 10 claims but no more than 18 claims from a Negative Binomial Distribution with parameters β = 2/3 and r = 20.
[Solution: Mean = rβ = 13.33 and variance = rβ(1+β) = 22.22. Prob[at least 10 claims but no more than 18 claims] ≅ Φ[(18.5 - 13.33)/√22.22] - Φ[(9.5 - 13.33)/√22.22] = Φ[1.097] - Φ[-0.813] = 0.864 - 0.208 = 0.656.
Comment: The exact answer is 0.648.]
Here is a graph of the Normal Approximation used in this exercise: [Graph omitted.]
The continuity correction in this case: at least 10 claims but no more than 18 claims
↔ 10 - 1/2 = 9.5 to 18 + 1/2 = 18.5 on the Normal Distribution.
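The general formula above is easy to wrap in a small helper. The following Python sketch (mine, for illustration) applies it to the Negative Binomial exercise with β = 2/3 and r = 20.

```python
# Normal Approximation to Prob[i <= # claims <= j] with the continuity correction:
# Phi[((j + 0.5) - mu)/sigma] - Phi[((i - 0.5) - mu)/sigma].
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_between(i, j, mu, sigma):
    return Phi((j + 0.5 - mu) / sigma) - Phi((i - 0.5 - mu) / sigma)

# Negative Binomial with beta = 2/3 and r = 20: mean 13.33, variance 22.22.
print(prob_between(10, 18, 20 * 2/3, sqrt(20 * (2/3) * (5/3))))   # about 0.656
```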
Note that Prob[10 ≤ # claims ≤ 18] = Prob[9 < # claims < 19]. Thus one must carefully check the wording, to distinguish between open and closed intervals. Prob[9 < # claims < 19] = Prob[10 ≤ # claims ≤ 18] ≅ Φ[{18.5 - µ}/σ] - Φ[{9.5 - µ}/σ]. One should use the continuity correction whenever one is using the Normal Distribution in order to approximate the probability associated with a discrete distribution. Do not use the continuity correction when one is using the Normal Distribution in order to approximate continuous distributions, such as aggregate distributions42 or the Gamma Distribution.
Exercise: Use the Normal Approximation in order to estimate the probability of observing more than 15 claims from a Poisson Distribution with λ = 10.
[Solution: Mean = variance = 10. Prob[# claims > 15] = 1 - Prob[# claims ≤ 15] ≅ 1 - Φ[(15.5 - 10)/√10] = 1 - Φ[1.739] = 1 - .9590 = 4.10%.
Comment: The exact answer is 4.87%.]
The area under the Normal Distribution and to the right of the vertical line at 15.5 is the approximation used in this exercise: [Graph omitted.]
42 See “Mahlerʼs Guide to Aggregate Distributions.”
Diagrams:
Some of you will find the following simple diagrams useful when applying the Normal Approximation to discrete distributions.
More than 15 claims ⇔ At least 16 claims ⇔ 16 claims or more:
15 . . . 15.5 |→ 16 . . .
Prob[More than 15 claims] ≅ 1 - Φ[(15.5 - µ)/σ].
Exercise: For a frequency distribution with mean 14 and standard deviation 2, using the Normal Approximation, what is the probability of at least 16 claims?
[Solution: Prob[At least 16 claims] = Prob[More than 15 claims] ≅ 1 - Φ[(15.5 - µ)/σ] = 1 - Φ[(15.5 - 14)/2] = 1 - Φ[0.75] = 1 - 0.7734 = 22.66%.]
Less than 12 claims ⇔ At most 11 claims ⇔ 11 claims or less:
. . . 11 ←| 11.5 . . . 12
Prob[Less than 12 claims] ≅ Φ[(11.5 - µ)/σ].
Exercise: For a frequency distribution with mean 10 and standard deviation 4, using the Normal Approximation, what is the probability of at most 11 claims?
[Solution: Prob[At most 11 claims] = Prob[Less than 12 claims] ≅ Φ[(11.5 - µ)/σ] = Φ[(11.5 - 10)/4] = Φ[0.375] = 64.6%.]
At least 10 claims and at most 13 claims ⇔ More than 9 claims and less than 14 claims:
9 . . . 9.5 |→ 10 . . . 11 . . . 12 . . . 13 ←| 13.5 . . . 14
Prob[At least 10 claims and at most 13 claims] ≅ Φ[(13.5 - µ)/σ] - Φ[(9.5 - µ)/σ].
Exercise: For a frequency distribution with mean 10 and standard deviation 4, using the Normal Approximation, what is the probability of more than 9 claims and less than 14 claims?
[Solution: Prob[more than 9 claims and less than 14 claims] = Prob[At least 10 claims and at most 13 claims] ≅ Φ[(13.5 - µ)/σ] - Φ[(9.5 - µ)/σ] = Φ[(13.5 - 10)/4] - Φ[(9.5 - 10)/4] = Φ[0.875] - Φ[-0.125] = 0.809 - 0.450 = 35.9%.]
Confidence Intervals: One can use the lower portion of the Normal Distribution table in order to get confidence intervals. For example, in order to get a 95% confidence interval, one allows 2.5% probability on either tail. Φ(1.96) = (1 + 95%)/2 = 97.5%. Thus 95% of the probability on the Standard Normal Distribution is between -1.96 and 1.96:
[Diagram omitted: Standard Normal density with 2.5% of probability in each tail, beyond -1.96 and 1.96.]
Thus a 95% confidence interval for a Normal would be: mean ± 1.960 standard deviations. Similarly, since Φ(1.645) = (1 + 90%)/2 = 95%, a 90% confidence interval is: mean ± 1.645 standard deviations.
[Diagram omitted: Standard Normal density with 5% of probability in each tail, beyond -1.645 and 1.645.]
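If a statistics library is available, the same z-values can be read off from the Normal quantile function. A minimal sketch (my own, assuming scipy is installed):

```python
# z-values for two-sided confidence intervals: find y with Phi(y) = (1 + P)/2.
from scipy.stats import norm

for P in (0.90, 0.95, 0.99):
    print(P, norm.ppf((1 + P) / 2))   # about 1.645, 1.960, 2.576
```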
Normal Distribution:
The Normal Distribution is a bell-shaped symmetric distribution. Its two parameters are its mean µ and its standard deviation σ.
f(x) = exp[-(x - µ)²/(2σ²)] / {σ √(2π)}, -∞ < x < ∞.
The sum of two independent Normal Distributions is also a Normal Distribution, with the sum of the means and variances. If X is normally distributed, then so is aX + b, but with mean aµ + b and standard deviation aσ. If one standardizes a normally distributed variable by subtracting µ and dividing by σ, then one obtains a Standard Normal with mean 0 and standard deviation of 1.
A Normal Distribution with µ = 10 and σ = 5: [Graph omitted.]
The density of the Standard Normal is denoted by φ(x) = exp[-x²/2] / √(2π), -∞ < x < ∞.43
The corresponding distribution function is denoted by Φ(x).
Φ(x) ≅ 1 - φ(x){0.4361836t - 0.1201676t² + 0.9372980t³}, where t = 1/(1 + 0.33267x).44
43 As shown near the bottom of the first page of the Tables for Exam 4/C.
44 See pages 103-104 of Simulation by Ross or 26.2.16 in Handbook of Mathematical Functions.
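The polynomial approximation quoted above is straightforward to implement. Here is a short Python sketch (mine, not from the guide) that codes it for x ≥ 0 and compares it with an exact value computed via math.erf.

```python
# Polynomial approximation to Phi(x) quoted above (Abramowitz & Stegun 26.2.16),
# compared against an exact value from math.erf.
from math import exp, erf, sqrt, pi

def phi(x):                      # Standard Normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi_approx(x):               # valid for x >= 0
    t = 1.0 / (1.0 + 0.33267 * x)
    return 1.0 - phi(x) * (0.4361836 * t - 0.1201676 * t**2 + 0.9372980 * t**3)

x = 1.96
print(Phi_approx(x), 0.5 * (1 + erf(x / sqrt(2))))   # both about 0.975
```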
Normal Distribution
Support: ∞ > x > -∞
Parameters: ∞ > µ > -∞ (location parameter), σ > 0 (scale parameter)
D. f.: F(x) = Φ[(x - µ)/σ]
P. d. f.: f(x) = φ[(x - µ)/σ]/σ = exp[-(x - µ)²/(2σ²)] / {σ √(2π)}. φ(x) = exp[-x²/2]/√(2π).
Mean = µ. Variance = σ².
Central Moments: E[(X - µ)^n] = σ^n n!/{2^(n/2) (n/2)!} for n even, n ≥ 2; E[(X - µ)^n] = 0 for n odd, n ≥ 1.
Coefficient of Variation = Standard Deviation / Mean = σ/µ. Skewness = 0 (distribution is symmetric). Kurtosis = 3.
Mode = µ. Median = µ.
Limited Expected Value Function: E[X ∧ x] = µΦ[(x - µ)/σ] - σ exp[-(x - µ)²/(2σ²)]/√(2π) + x {1 - Φ[(x - µ)/σ]}.
Excess Ratio: R(x) = {1 - x/µ}{1 - Φ[(x - µ)/σ]} + (σ/µ) exp[-(x - µ)²/(2σ²)]/√(2π).
Mean Residual Life: e(x) = µ - x + σ exp[-(x - µ)²/(2σ²)] / [{1 - Φ[(x - µ)/σ]} √(2π)].
Derivatives of d.f.: ∂F(x)/∂µ = -φ[(x - µ)/σ]/σ. ∂F(x)/∂σ = -φ[(x - µ)/σ](x - µ)/σ².
Method of Moments: µ = µ1′, σ = (µ2′ - µ1′²)^0.5.
Percentile Matching: Set gi = Φ⁻¹(pi); then σ = (x1 - x2)/(g1 - g2), µ = x1 - σg1.
Method of Maximum Likelihood: Same as Method of Moments.
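As an illustration of the percentile matching recipe above, here is a small Python sketch (my own; the percentile values 16.41 and 6.63 are made-up inputs chosen to correspond to µ = 10 and σ = 5, and scipy is assumed to be available for the Normal quantile function).

```python
# Percentile matching for a Normal: given two percentiles (p1, x1) and (p2, x2),
# set g_i = Phi^{-1}(p_i); then sigma = (x1 - x2)/(g1 - g2) and mu = x1 - sigma*g1.
from scipy.stats import norm

p1, x1 = 0.90, 16.41   # hypothetical: the 90th percentile is 16.41
p2, x2 = 0.25, 6.63    # hypothetical: the 25th percentile is 6.63
g1, g2 = norm.ppf(p1), norm.ppf(p2)
sigma = (x1 - x2) / (g1 - g2)
mu = x1 - sigma * g1
print(mu, sigma)       # about 10 and 5
```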
Using the Normal Table: When using the normal distribution, choose the nearest z-value to find the probability, or if the probability is given, choose the nearest z-value. No interpolation should be used. Example: If the given z-value is 0.759, and you need to find Pr(Z < 0.759) from the normal distribution table, then choose the probability value for z-value = 0.76; Pr(Z < 0.76) = 0.7764. When using the Normal Approximation to a discrete distribution, use the continuity correction.45 When using the top portion of the table, use the symmetry of the Standard Normal Distribution around zero: Φ[-x] = 1 - Φ[x]. For example, Φ[-0.4] = 1 - Φ[0.4] = 1 - 0.6554 = 0.3446. The bottom portion of the table can be used to get confidence intervals. To cover a confidence interval of probability P, find y such that Φ[y] = (1 + P)/2. For example, in order to get a 95% confidence interval, find y such that Φ[y] = 97.5%. Thus, y = 1.960. [-1.960, 1.960] covers 95% probability on a Standard Normal Distribution.
45 The instructions for Exam 4/C from the SOA/CAS.
Normal Distribution Table
Entries represent the area under the standardized normal distribution from -∞ to z, Pr(Z < z). The value of z to the first decimal place is given in the left column. The second decimal is given in the top row.

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3    0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4    0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5    0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8    0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9    0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0    0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1    0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2    0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3    0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4    0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5    0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6    0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7    0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8    0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9    0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0    0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1    0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2    0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3    0.9995  0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4    0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.5    0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998
3.6    0.9998  0.9998  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.7    0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.8    0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.9    1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

Values of z for selected values of Pr(Z < z):
z          0.842   1.036   1.282   1.645   1.960   2.326   2.576
Pr(Z < z)  0.800   0.850   0.900   0.950   0.975   0.990   0.995
Problems: 7.1 (2 points) You roll 1000 6-sided dice. What is the chance of observing exactly 167 sixes? (Use the Normal Approximation.) A. less than 2.5% B. at least 2.5% but less than 3.0% C. at least 3.0% but less than 3.5% D. at least 3.5% but less than 4.0% E. at least 4.0% 7.2 (2 points) You roll 1000 6-sided dice. What is the chance of observing 150 or more sixes but less than or equal to 180 sixes? (Use the Normal Approximation.) A. less than 78% B. at least 78% but less than 79% C. at least 79% but less than 80% D. at least 80% but less than 81% E. at least 81% 7.3 (2 points) You conduct 100 independent Bernoulli Trials, each with chance of success 1/4. What is the chance of observing a total of at least 16 but not more than 20 successes? (Use the Normal Approximation.) A. less than 11% B. at least 11% but less than 12% C. at least 12% but less than 13% D. at least 13% but less than 14% E. at least 14% 7.4 (2 points) One observes 10,000 independent lives, each of which has a 2% chance of death over the coming year. What is the chance of observing 205 or more deaths? (Use the Normal Approximation.) A. less than 36% B. at least 36% but less than 37% C. at least 37% but less than 38% D. at least 38% but less than 39% E. at least 39%
7.5 (2 points) The number of claims in a year is given by a Poisson distribution with parameter λ = 400. What is the probability of observing at least 420 but no more than 440 claims over the next year? (Use the Normal Approximation.) A. less than 11% B. at least 11% but less than 12% C. at least 12% but less than 13% D. at least 13% but less than 14% E. at least 14% Use the following information in the next three questions: The Few States Insurance Company writes insurance in the states of Taxachusetts, Florgia and Calizonia. Claims frequency for Few States Insurance in each state is Poisson, with expected claims per year of 400 in Taxachusetts, 500 in Florgia and 1000 in Calizonia. The claim frequencies in the three states are independent. 7.6 (2 points) What is the chance of Few States Insurance having a total of more than 1950 claims next year? (Use the Normal Approximation.) A. less than 10% B. at least 10% but less than 11% C. at least 11% but less than 12% D. at least 12% but less than 13% E. at least 13% 7.7 (3 points) What is the chance that Few States Insurance has more claims next year from Taxachusetts and Florgia combined than from Calizonia? (Use the Normal Approximation.) A. less than 1.0% B. at least 1.0% but less than 1.2% C. at least 1.2% but less than 1.4% D. at least 1.4% but less than 1.6% E. at least 1.6% 7.8 (3 points) Define a large claim as one larger than $10,000. Assume that 30% of claims are large in Taxachusetts, 25% in Florgia and 20% in Calizonia. Which of the following is an approximate 90% confidence interval for the number of large claims observed by Few States Insurance over the next year? Frequency and severity are independent. (Use the Normal Approximation.) A. [390, 500] B. [395, 495] C. [400, 490] D. [405, 485] E. [410, 480]
7.9 (2 points) A six-sided die is rolled five times. Using the Central Limit Theorem, what is the estimated probability of obtaining a total of 20 on the five rolls?
A. less than 9.0% B. at least 9% but less than 9.5% C. at least 9.5% but less than 10% D. at least 10% but less than 10.5% E. at least 10.5%
7.10 (2 points) The number of claims in a year is given by the negative binomial distribution:
P[X = x] = C(9999 + x, x) (0.6^10000) (0.4^x), x = 0, 1, 2, 3...
Using the Central Limit Theorem, what is the estimated probability of having 6800 or more claims in a year?
A. less than 10.5% B. at least 10.5% but less than 11% C. at least 11% but less than 11.5% D. at least 11.5% but less than 12% E. at least 12%
7.11 (2 points) In order to estimate 1 - Φ(4), use the formula:
Φ(x) ≅ 1 - φ(x){0.4361836t - 0.1201676t² + 0.9372980t³}, where t = 1/(1 + 0.33267x).
A. less than 0.0020% B. at least 0.0020% but less than 0.0025% C. at least 0.0025% but less than 0.0030% D. at least 0.0030% but less than 0.0035% E. at least 0.0035%
7.12 (2 points) You are given the following:
• The New York Yankees baseball team plays 162 games.
• Assume the Yankees have an a priori chance of winning each game of 65%.
• Assume the results of the games are independent of each other.
What is the chance of the Yankees winning 114 or more games? (Use the Normal Approximation.) A. less than 6% B. at least 6% but less than 7% C. at least 7% but less than 8% D. at least 8% but less than 9% E. at least 9%
7.13 (2 points) You are given the following:
• Sue takes an actuarial exam with 40 multiple choice questions, each of equal value. • Sue knows the answers to 13 questions and answers them correctly. • Sue guesses at random on the remaining 27 questions, with a 1/5 chance of getting each such question correct, with each question independent of the others. If 22 correct answers are needed to pass the exam, what is the probability that Sue passed her exam? Use the Normal Approximation. A. 4% B. 5% C. 6% D. 7% E. 8% 7.14 (3 points) You are given the following:
• The New York Yankees baseball team plays 162 games, 81 at home and 81 on the road.
• The Yankees have an a priori chance of winning each home game of 80%.
• The Yankees have an a priori chance of winning each road game of 50%.
• Assume the results of the games are independent of each other.
What is the chance of the Yankees winning 114 or more games? (Use the Normal Approximation.) A. less than 6% B. at least 6% but less than 7% C. at least 7% but less than 8% D. at least 8% but less than 9% E. at least 9% 7.15 (2 points) You are given the following:
• Lucky Tom takes an actuarial exam with 40 multiple choice questions, each of equal value. • Lucky Tom knows absolutely nothing about the material being tested. • Lucky Tom guesses at random on each question, with a 40% chance of getting each question correct, independent of the others. If 24 correct answers are needed to pass the exam, what is the probability that Lucky Tom passed his exam? Use the Normal Approximation. A. 0.4% B. 0.5% C. 0.6% D. 0.7% E. 0.8% 7.16 (4 points) X has a Normal Distribution with mean µ and standard deviation σ. Determine the expected value of |x|.
7.17 (4, 5/86, Q.48) (2 points) Assume an insurer has 400 claims drawn independently from a distribution with mean 500 and variance 10,000. Assuming that the Central Limit Theorem applies, find M such that the probability of the sum of these claims being less than or equal to M is approximately 99%. In which of the following intervals is M? A. Less than 202,000 B. At least 202,000, but less than 203,000 C. At least 203,000, but less than 204,000 D. At least 204,000, but less than 205,000 E. 205,000 or more 7.18 (4, 5/86, Q.51) (1 point) Suppose X has a Poisson distribution with mean q. Let Φ be the (Cumulative) Standard Normal Distribution. Which of the following is an approximation for Prob(1 ≤ x ≤ 4) for sufficiently large q? A. Φ[(4 - q) /
√q] - Φ[(1 - q) / √q]
B. Φ[(4.5 - q) / √q] - Φ[(0.5 - q) / √q]
C. Φ[(1.5 - q) / √q] - Φ[(3.5 - q) / √q]
D. Φ[(3.5 - q) / √q] - Φ[(1.5 - q) / √q]
E. Φ[(4 - q) / q] - Φ[(1 - q) / q] 7.19 (4, 5/87, Q.51) (2 points) Suppose that the number of claims for an individual policy during a year has a Poisson distribution with mean 0.01. What is the probability that there will be 5, 6, or 7 claims from 400 identical policies in one year, assuming a normal approximation? A. Less than 0.30 B. At least 0.30, but less than 0.35 C. At least 0.35, but less than 0.40 D. At least 0.40, but less than 0.45 E. 0.45 or more. 7.20 (4, 5/88, Q.46) (1 point) A random variable X is normally distributed with mean 4.8 and variance 4. The probability that X lies between 3.6 and 7.2 is Φ(b) - Φ(a) where Φ is the distribution function of the unit normal variable. What are a and b, respectively? A. 0.6, 1.2 B. 0.6, -0.3 C. -0.3, 0.6 D. -0.6, 1.2 E. None A, B, C, or D. 7.21 (4, 5/88, Q.49) (1 point) An unbiased coin is tossed 20 times. Using the normal approximation, what is the probability of obtaining at least 8 heads? The cumulative unit normal distribution is denoted by Φ(x). A. Φ(-1.118)
B. Φ(-.671)
C. 1 - Φ(-.447)
D. Φ(.671)
E. Φ(1.118)
7.22 (4, 5/90, Q.25) (1 point) Suppose the distribution of claim amounts is normal with a mean of $1,500. If the probability that a claim exceeds $5,000 is .015, in what range is the standard deviation, σ, of the distribution? A. σ < 1,600 B. 1,600 ≤ σ < 1,625 C. 1,625 ≤ σ < 1,650 D. 1,650 ≤ σ < 1,675 E. σ ≥ 1,675 7.23 (4, 5/90, Q.36) (2 points) The number of claims for each insured written by the Homogeneous Insurance Company follows a Poisson process with a mean of .16. The company has 100 independent insureds. Let p be the probability that the company has more than 12 claims and less than 20 claims. In what range does p fall? You may use the normal approximation. A. p < 0.61 B. 0.61 < p < 0.63 C. 0.63 < p < 0.65 D. 0.65 < p < 0.67 E. 0.67 < p 7.24 (4, 5/91, Q.29) (2 points) A sample of 1,000 policies yields an estimated claim frequency of 0.210. Assuming the number of claims for each policy has a Poisson distribution, use the Normal Approximation to find a 95% confidence interval for this estimate. A. (0.198, 0.225) B. (0.191, 0.232) C. (0.183, 0.240) D. (0.173, 0.251) E. (0.161, 0.264) 7.25 (4B, 5/92, Q.5) (2 points) You are given the following information: • Number of large claims follows a Poisson distribution. • Exposures are constant and there are no inflationary effects. • In the past 5 years, the following number of large claims has occurred: 12, 15, 19, 11, 18 Estimate the probability that more than 25 large claims occur in one year. (The Poisson distribution should be approximated by the normal distribution.) A. Less than .002 B. At least .002 but less than .003 C. At least .003 but less than .004 D. At least .004 but less than .005 E. At least .005
7.26 (4B, 11/92, Q.13) (2 points) You are given the following information: • The occurrence of hurricanes in a given year has a Poisson distribution. For the last 10 years, the following number of hurricanes has occurred: 2, 4, 3, 8, 2, 7, 6, 3, 5, 2 Using the normal approximation to the Poisson, determine the probability of more than 10 hurricanes occurring in a single year. A. Less than 0.0005 B. At least 0.0005 but less than 0.0025 C. At least 0.0025 but less than 0.0045 D. At least 0.0045 but less than 0.0065 E. At least 0.0065 •
7.27 (4B, 5/94, Q.20) (2 points) The occurrence of tornadoes in a given year is assumed to follow a binomial distribution with parameters m = 50 and q = 0.60. Using the Normal approximation to the binomial, determine the probability that at least 25 and at most 40 tornadoes occur in a given year. A. Less than 0.80 B. At least 0.80, but less than 0.85 C. At least 0.85, but less than 0.90 D. At least 0.90, but less than 0.95 E. At least 0.95 7.28 (5A, 11/94, Q.35) (1.5 points) An insurance contract was priced with the following assumptions: Claim frequency is Poisson with mean 0.01. All claims are of size $5000. Premiums are 110% of expected losses. The company requires a 99% probability of not having losses exceed premiums. (3/4 point) a. What is the minimum number of policies that the company must write given the above surplus requirement? (3/4 point) b. After the rate has been established, it was discovered that the claim severity assumption was incorrect and that the claim severity should be 5% greater than originally assumed. Now, what is the minimum number of policies that the company must write given the above surplus requirement?
7.29 (4B, 11/96, Q.31) (2 points) You are given the following: • A portfolio consists of 1,600 independent risks. • For each risk the probability of at least one claim is 0.5. Using the Central Limit Theorem, determine the approximate probability that the number of risks in the portfolio with at least one claim will be greater than 850. A. Less than 0.01 B. At least 0.01, but less than 0.05 C. At least 0.05, but less than 0.10 D. At least 0.10, but less than 0.20 E. At least 0.20 7.30 (4B, 11/97, Q.1) (2 points) You are given the following: • A portfolio consists of 10,000 identical and independent risks. • The number of claims per year for each risk follows a Poisson distribution with mean λ. • During the latest year, 1000 claims have been observed for the entire portfolio. Determine the lower bound of a symmetric 95% confidence interval for λ. A. Less than 0.0825 B. At least 0.0825, but less than 0.0875 C. At least 0.0875, but less than 0.0925 D. At least 0.0925, but less than 0.0975 E. At least 0.0975 7.31 (IOA 101, 9/00, Q.3) (1.5 points) The number of claims arising in a period of one month from a group of policies can be modeled by a Poisson distribution with mean 24. Using the Normal Approximation, determine the probability that fewer than 20 claims arise in a particular month. 7.32 (IOA 101, 4/01, Q.4) (1.5 points) For a certain type of policy the probability that a policyholder will make a claim in a year is 0.001. If a random sample of 10,000 policyholders is selected, using the Normal Approximation, calculate an approximate value for the probability that not more than 5 will make a claim next year.
Solutions to Problems:
7.1. C. This is Binomial with q = 1/6 and m = 1000. Mean = 1000/6 = 166.66.
Standard Deviation = √[(1000)(5/6)(1/6)] = 11.785.
Φ[(167.5 - 166.66)/11.785] - Φ[(166.5 - 166.66)/11.785] = Φ[0.07] - Φ[-0.01] = 0.5279 - 0.4960 = 0.0319.
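For readers who want to double-check such calculations, here is a short Python sketch added as an illustration (it is not an exam technique). It recomputes solution 7.1 using only the standard library; because it evaluates Φ at unrounded z-values, both the approximation and the exact Binomial probability come out near 0.034, slightly different from the 0.0319 produced by the rounded table values.

# Added illustration: continuity-corrected Normal Approximation vs. the exact Binomial.
from math import comb, erf, sqrt

def phi(x):
    # Standard Normal cumulative distribution function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

m, q = 1000, 1/6
mean = m * q                       # 166.67
sd = sqrt(m * q * (1 - q))         # 11.785

# Normal Approximation: P(N = 167) ~ Phi((167.5 - mean)/sd) - Phi((166.5 - mean)/sd).
approx = phi((167.5 - mean) / sd) - phi((166.5 - mean) / sd)

# Exact Binomial probability for comparison.
exact = comb(1000, 167) * q**167 * (1 - q)**833

print(round(approx, 4), round(exact, 4))   # both about 0.034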
7.2. D. This is Binomial with q = 1/6 and m =1000. Mean = 1000 /6 = 166.66. Standard Deviation =
√[(1000)(5/6)(1/6)] = 11.785. The interval from 150 to 180 corresponds on
the Standard Normal to the interval from {(149.5-166.66)/11.785} to {(180.5-166.66)/11.785}. Therefore the desired probability is: Φ((180.5-166.66)/11.785) - Φ((149.5-166.66)/11.785) = Φ(1.17) - Φ( -1.46) = .8790 - .0721 = 0.8069. Comment: The exact answer is 0.8080, so the Normal Approximation is quite good. 7.3. D. This is the Binomial Distribution with q =.25 and m = 100. Therefore the mean is (100)(.25) = 25. The Variance is: (100)(.25)(.75) = 18.75 and the Standard Deviation is: Therefore the desired probability is:
√18.75 = 4.330.
Φ[(20.5-25)/4.330) - Φ((15.5-25)/4.330] = Φ(-1.04) - Φ(-2.19) = .1492 - .0143 = 0.1349. Comment: The exact answer is .1377, so the Normal Approximation is okay. 7.4. C. Binomial Distribution with mean = 200 and variance = (10,000)(.02)(1-.02) = 196. Standard deviation = 14. Chance of 205 or more claims = 1 - chance of 204 claims or less ≅ 1 - Φ((204.5-200)/14) = 1 - Φ(.32) =1 - .6255 = 0.3745. 7.5. E. Mean = 400 = variance. Standard deviation = 20. Φ((440.5-400)/20) - Φ((419.5-400)/20) = Φ(2.03) - Φ(0.98) =.9788 - .8365 = 0.1423. 7.6. D. The total claims follow a Poisson Distribution with mean 400 + 500 + 1000 = 1900, since independent Poisson variables add. This has a variance equal to the mean of 1900 and therefore a standard deviation of
√1900 = 43.59.
Prob[more than 1950 claims] ≅ 1 - Φ((1950.5-1900)/43.59) = 1 - Φ(1.16) = 1- .8770 = 0.123.
7.7. B. The number of claims in Taxachusetts and Florgia is given by a Poisson with mean 400 + 500 = 900. (Since the sum of independent Poisson variables is a Poisson.) This is approximated by a Normal distribution with a mean of 900 and variance of 900. The number of claims in Calizonia is approximated by the Normal distribution with mean 1000 and variance of 1000. The difference between the number of claims in Calizonia and the sum of the claims in Taxachusetts and Florgia is therefore approximately a Normal Distribution with mean = 1000 - 900 = 100 and variance = 1000 + 900 = 1900. More claims next year from Taxachusetts and Florgia combined than from Calizonia ⇔ (# in Calizonia) - (# in Taxachusetts + # in Florgia) < 0 ⇔ (# in Calizonia) - (# in Taxachusetts + # in Florgia) ≤ -1. The probability of this is approximately: Φ({(0 - .5) -100} /
√1900) = Φ(-2.31) = .0104.
Comment: The sum of independent Normal variables is a Normal. If X is Normal, then so is -X, so the difference of Normal variables is also Normal. Also E[X - Y] = E[X] - E[Y]. For X and Y independent variables Var[X - Y] = Var[X] + Var[Y]. 7.8. E. The number of large claims in Taxachusetts is Poisson with mean (30%)(400) = 120. (This is the concept of “thinning” a Poisson.) Similarly the number of large claims in Florgia and Calizonia are Poisson with means of 125 and 200 respectively. Thus the large claims from all three states is Poisson with mean = 120 + 125 + 200 = 445. (The sum of independent Poisson variables is a Poisson.) This Poisson is approximated by a Normal with a mean of 445 and a variance of 445. The standard deviation is
√445 = 21.10. Φ(1.645) = 0.95, and thus a 90% confidence interval ≅
the mean ± 1.645 standard deviations, which in this case is about 445 ± (1.645)(21.10) = 410.3 to 479.7. Thus [410, 480] covers a little more than 90% of the probability. 7.9. A. For a six-sided die the mean is 3.5 and the variance is 35/12. For 5 such dice the mean is: (5)(3.5) = 17.5 and the variance is: (5)(35/12) = 175/12. The standard deviation = 3.819. Thus the interval from 19.5 to 20.5 corresponds to (19.5-17.5)/3.819 = .524 to (20.5-17.5)/3.819 = .786 on the standard unit normal distribution. Using the Standard Normal Table, this has a probability of Φ(.79) - Φ(.52) = .7852 - .6985 = 0.0867. 7.10. A. A Negative Binomial distribution with β = 2/3 and r =10000. Mean = rβ= (10000)(2/3) = 6666.66. Variance = mean(1+ β) = 11111.11 Standard Deviation = 105.4. 1 - Φ((6799.5-6666.66)/105.4) = 1 - Φ(1.26) = 1- .8962 = 10.38% Comment: You have to recognize that this is an alternate way of writing the Negative Binomial Distribution. In the tables attached to the exam, f(x) = {r(r+1)..(r+x-1)/x!} β x / (1+β)x+r. The factor β/(1 + β) is taken to the power x. Thus for the form of the distribution in this question, β/ (1 + β) = 0.4, and solving β = 2/3. Then line up with the formula in Appendix B and note that r = 10,000.
7.11. D. t = 1/(1+(.33267)(4)) = .42906.
φ(4) = exp(-4²/2)/√(2π) = 0.00033546/2.5066 = 0.00013383.
1 - Φ(4) ≅ (0.00013383){0.4361836(0.42906) - 0.1201676(0.42906²) + 0.9372980(0.42906³)} = (0.00013383)(0.23906) = 0.00003199.
Comment: The exact answer is 1 - Φ(4) = 0.000031672.
7.12. D. The number of games won is a binomial with m = 162 and q = 0.65. The mean is: (162)(0.65) = 105.3. The variance is: (162)(0.65)(1 - 0.65) = 36.855. The standard deviation is √36.855 = 6.07. Thus the chance of 114 or more wins is about:
1 - Φ((113.5 - 105.3)/6.07) = 1 - Φ(1.35) = 1 - 0.9115 = 8.85%. Comment: The exact probability is 8.72%.
7.13. D. The number of correct guesses is Binomial with parameters m = 27 and q = 1/5, with mean: (1/5)(27) = 5.4 and variance: (1/5)(4/5)(27) = 4.32.
Therefore, Prob(# correct guesses ≥ 9) ≅ 1 - Φ[(8.5 - 5.4)/√4.32] = 1 - Φ(1.49) = 6.81%.
Comment: Any resemblance between the situation in this question and actual exams is coincidental. The exact answer is in terms of an incomplete Beta Function: 1 - β(19, 9, 0.8) = 7.4%. If Sue knows c questions, c ≤ 22, then her chance of passing is: 1 - β(19, 22 - c, 0.8), as displayed below:
[Graph: her probability of passing as a function of c, for c from roughly 5 to 20, on a vertical scale from 0 to 1.]
7.14. C. The number of home games won is a binomial with m = 81 and q = 0.8. The mean is: (81)(0.8) = 64.8 and the variance is: (81)(0.8)(1 - 0.8) = 12.96. The number of road games won is a binomial with m = 81 and q = 0.5. The mean is: (81)(0.5) = 40.5 and the variance is: (81)(0.5)(1 - 0.5) = 20.25. The number of home and road wins are independent random variables, thus the variance of their sum is the sum of their variances: 12.96 + 20.25 = 33.21. The standard deviation is √33.21 = 5.76. The mean number of wins is: 64.8 + 40.5 = 105.3. Thus the chance of 114 or more wins is about: 1 - Φ((113.5 - 105.3)/5.76) = 1 - Φ(1.42) = 1 - 0.9228 = 7.78%.
Comment: The exact probability is 7.62%, obtained by convoluting two binomial distributions.
7.15. E. The number of questions Lucky Tom guesses correctly is Binomial with mean (0.4)(40) = 16, and variance (40)(0.4)(0.6) = 9.6. The probability he guesses 24 or more correctly is approximately: 1 - Φ[(23.5 - 16)/√9.6] = 1 - Φ(2.42) = 1 - 0.9922 = 0.78%.
Comment: The exact answer is 0.834177%. An ordinary person would only have a 20% chance of randomly guessing correctly on each question. Therefore, their chance of passing would be approximately: 1 - Φ[(23.5 - 8)/√6.4] = 1 - Φ(6.13) = 4.5 x 10^-10.
7.16. Let y = (x - µ)/σ. Then y follows a Standard Normal Distribution with mean 0 and standard deviation 1, with density φ(y) = exp[-y²/2]/√(2π). x = σy + µ.
Expected value of |x| = Expected value of |σy + µ| = ∫ from -∞ to ∞ of |σy + µ| φ(y) dy
= -∫ from -∞ to -µ/σ of (σy + µ) φ(y) dy + ∫ from -µ/σ to ∞ of (σy + µ) φ(y) dy
= -σ ∫ from -∞ to -µ/σ of y φ(y) dy - µΦ(-µ/σ) + σ ∫ from -µ/σ to ∞ of y φ(y) dy + µ{1 - Φ(-µ/σ)}.
Since an antiderivative of y φ(y) is -φ(y), each of the two remaining integrals contributes σφ(-µ/σ) = σ exp[-µ²/(2σ²)]/√(2π). Therefore,
E[|x|] = µ{1 - 2Φ(-µ/σ)} + σ√(2/π) exp[-µ²/(2σ²)].
Comment: For a Standard Normal, with µ = 0 and σ = 1, E[|X|] = √(2/π).
7.17. D. The sum of 400 claims has a mean of (400)(500) = 200,000 and a variance of (400)(10,000). Thus the standard deviation of the sum is (20)(100) = 2000. In order to standardize the variables one subtracts the mean and divides by the standard deviation; thus standardizing M gives: (M - 200,000)/2000. We wish the probability of the sum of the claims being less than or equal to M to be 99%. For the standard Normal Distribution, Φ(2.327) = 0.99. Setting 2.327 = (M - 200,000)/2000, we get M = 200,000 + (2.327)(2000) = 204,654.
7.18. B. Φ[(4.5 - q)/√q] - Φ[(0.5 - q)/√q].
7.19. C. The portfolio has a mean of (400)(.01) = 4. Since each policy has a variance of .01 and they should be assumed to be independent, the variance of the portfolio is (400)(.01) = 4. Thus the probability of 5, 6 or 7 claims is approximately: Φ[(7.5 - 4)/2] - Φ[(4.5 - 4)/2] = Φ[1.75] - Φ[.25] = 0.9599 - 0.5987 = 0.3612.
7.20. D. The standard deviation of the Normal is √4 = 2. Thus 3.6 corresponds to (3.6 - 4.8)/2 = -0.6, while 7.2 corresponds to (7.2 - 4.8)/2 = 1.2.
7.21. E. The distribution is Binomial with q = .5 and m = 20. That has mean (20)(.5) = 10 and variance (20)(.5)(1 - .5) = 5. The chance of obtaining 8 or more heads is approximately: 1 - Φ[(7.5 - 10)/√5] = 1 - Φ(-1.118) = 1 - {1 - Φ(1.118)} = Φ(1.118).
7.22. B. The chance that a claim exceeds 5000 is 1 - Φ((5000 - 1500)/σ) = .015. Thus Φ(3500/σ) = .985. Consulting the Standard Normal Distribution, Φ(2.17) = .985, therefore 3500/σ = 2.17. σ = 3500/2.17 = 1613.
7.23. B. The sum of independent Poisson variables is a Poisson. The mean number of claims is (100)(.16) = 16. Since for a Poisson the mean and variance are equal, the variance is also 16. The standard deviation is 4. The probability is: Φ((19.5 - 16)/4) - Φ((12.5 - 16)/4) = Φ(0.87) - Φ(-0.88) = 0.8078 - 0.1894 = 0.6184.
Comment: More than 12 claims (greater than or equal to 13 claims) corresponds to 12.5 due to the "continuity correction".
7.24. C. The observed number of claims is (1000)(.210) = 210. Since for the Poisson Distribution the variance is equal to the mean, the estimated variance for the sum is also 210. The standard deviation is
√210 = 14.49. Using the Normal Approximation, an approximate 95% confidence
interval is ± 1.96 standard deviations. (Φ(1.96) = .975.) Therefore a 95% confidence interval for the number of claims from 1000 policies is 210 ± (1.96)(14.49) = 210 ± 28.4. A 95% confidence interval for the claim frequency is: 0.210 ± 0.028. Alternately, the standard deviation for the estimated frequency declines as the square root of the number of policies used to estimate it: 0.210 / 1000 = .458 / 31.62 = 0.01449. Thus a 95% confidence interval for the claim frequency is: 0.210 ± (1.96)(0.01449) = 0.210 ± 0.028. Alternately, one can be a little more “precise” and let λ be the Poisson frequency. Then the standard deviation is:
√(λ/1000), and the 95% confidence interval has λ within 1.96 standard deviations of 0.210:
-1.96√(λ/1000) ≤ (0.210 - λ) ≤ 1.96√(λ/1000). We can solve for the boundaries of this interval:
1.96²λ/1000 = (0.210 - λ)². ⇒ λ² - 0.4238416λ + 0.0441 = 0.
λ = {0.4238416 ± √[0.4238416² - (4)(1)(0.0441)]} / {(2)(1)} = 0.2119 ± 0.0285.
Thus the boundaries are 0.2119 - 0.0285 = 0.183 and 0.2119 + 0.0285 = 0.240. Comment: One needs to assume that the policies have independent claim frequencies. The sum of independent Poisson variables is again a Poisson. 7.25. C. The average number of large claims observed per year is: (12+15+19+11+18)/5 = 15. Thus we estimate that the Poisson has a mean of 15 and thus a variance of 15. Thus Prob(N > 25) ≅ 1 - Φ[(25.5 - 15)/
√15] = 1 - Φ(2.71) ≅ 1 - 0.9966 = 0.0034.
7.26. B. The observed mean is 42 / 10 = 4.2. Assume a Poisson with mean of 4.2 and therefore variance of 4.2. Using the “continuity correction”, more than 10 on the discrete Poisson, (11, 12, 13, ...) will correspond to more than 10.5 on the continuous Normal Distribution. With a mean of 4.2 and a standard deviation of
√4.2 = 2.05, 10.5 is "standardized" to:
(10.5 - 4.2) / 2.05 = 3.07. Thus P(N > 10) ≅ 1 - Φ(3.07) = 1 - .9989 = 0.0011. 7.27. D. The mean of the Binomial is mq = (50)(.6) = 30. The variance of the Binomial is mq(1-q) = (50)(.6)(1-.6) = 12. Thus the standard deviation is
√12 = 3.464.
Φ[(40.5 - 30) / 3.464] - Φ[(24.5 - 30) / 3.464] = Φ(3.03) - Φ(-1.59) = 0.9988 - 0.0559 = 0.9429.
7.28. (a) With N policies, the mean aggregate loss = N(.01)(5000) and the variance of aggregate losses = N(.01)(50002 ). Thus premiums are 1.1N(.01)(5000). The 99th percentile of the Standard Normal Distribution is 2.326. Thus we want: Premiums - Expected Losses = 2.326(standard deviation of aggregate losses). 0.1N(0.01)(5000) = 2.326 N(0.01)(50002 ) . Therefore, N = (2.326/0.1)2 /0.01 = 54,103. (b) With a severity of (1.05)(5000) = 5250 and N policies, the mean aggregate loss = N(.01)(5250), and the variance of aggregate losses = N(.01)(52502 ). Premiums are still: 1.1N(0.01)(5000). Therefore, we want: 1.1N(0.01)(5000) - N(0.01)(5250) = 2.326 N(0.01)(52502 ) . 2.5N = 1221.15 N . N = 238,593. 7.29. A. The mean is 800 while the variance is: (0.5)(1 - 0.5)(1600) = 400. Thus the standard deviation is 20. Using the continuity correction, more than 850 corresponds on the continuous Normal Distribution to: 1 - Φ[(850.5-800)/20] = 1 - Φ(2.53) = 0.0057. 7.30. D. The observed frequency is 1000/10000 = .1, which is the point estimate for λ. Since for a Poisson the variance is equal to the mean, the estimated variance for a single insured of its observed frequency is .1. For the sum of 10,000 identical insureds, the variance is divided by 10000; thus the variance of the observed frequency is: 0.1/10,000 = 1/100,000. The standard deviation is
√(1/100,000) = 0.00316. Using the Normal Approximation, ±1.96 standard deviations
would produce an approximate 95% confidence interval: 0.1 ± (1.96)(0.00316) = 0.1 ± 0.0062 = [0.0938, 0.1062].
7.31. Prob[N < 20] ≅ Φ[(19.5 - 24)/√24] = Φ[-0.92] = 17.88%.
7.32. The Binomial has mean: (0.001)(10,000) = 10, and variance: (0.001)(0.999)(10,000) = 9.99.
Prob[N ≤ 5] ≅ Φ[(5.5 - 10)/√9.99] = Φ[-1.42] = 7.78%. Comment: The exact answer is 0.06699.
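Normal Approximations such as the one in 7.31 can also be checked against the exact distribution. Below is a minimal standard-library Python sketch added as an illustration; for a Poisson with mean 24, the approximation and the exact probability of fewer than 20 claims are both about 0.18.

# Added illustration: Normal Approximation vs. exact Poisson probability for 7.31.
from math import exp, erf, sqrt, factorial

def phi(x):
    # Standard Normal cumulative distribution function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

lam = 24
approx = phi((19.5 - lam) / sqrt(lam))                                   # continuity correction
exact = sum(exp(-lam) * lam**n / factorial(n) for n in range(20))        # P[N <= 19]
print(round(approx, 4), round(exact, 4))                                 # both roughly 0.18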
Section 8, Skewness Skewness is one measure of the shape of a distribution.46 For example, take the following frequency distribution:
Number of   Probability Density   Probability x   Probability x Square   Probability x Cube
Claims      Function              # of Claims     of # of Claims         of # of Claims
0           0.1                   0               0                      0.0
1           0.2                   0.2             0.2                    0.2
2           0                     0               0                      0.0
3           0.1                   0.3             0.9                    2.7
4           0                     0               0                      0.0
5           0                     0               0                      0.0
6           0.1                   0.6             3.6                    21.6
7           0                     0               0                      0.0
8           0                     0               0                      0.0
9           0.1                   0.9             8.1                    72.9
10          0.3                   3               30                     300.0
11          0.1                   1.1             12.1                   133.1
Sum         1                     6.1             54.9                   530.5
E[X] = 1st moment about the origin = 6.1.
E[X²] = 2nd moment about the origin = 54.9.
E[X³] = 3rd moment about the origin = 530.5.
Variance ≡ 2nd Central Moment = E[X²] - E[X]² = 54.9 - 6.1² = 17.69. Standard Deviation = √17.69 = 4.206.
3rd Central Moment ≡ E[(X - E[X])³] = E[X³ - 3X²E[X] + 3XE[X]² - E[X]³] = E[X³] - 3E[X]E[X²] + 2E[X]³ = 530.5 - (3)(6.1)(54.9) + (2)(6.1³) = -20.2.
(Coefficient of) Skewness ≡ Third Central Moment / STDDEV³ = -20.2/4.206³ = -0.27.
The third central moment: µ3 ≡ E[(X - E[X])³] = E[X³] - 3E[X]E[X²] + 2E[X]³.
The (coefficient of) skewness is defined as the 3rd central moment divided by the cube of the standard deviation: E[(X - E[X])³]/STDDEV³.
46 The coefficient of variation and kurtosis are others. See "Mahlerʼs Guide to Loss Distributions."
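As a quick numerical check of the example above, added here purely as an illustration, the same moments and skewness can be computed in a few lines of Python:

# Added illustration: moments and skewness of the discrete example above.
probs = {0: 0.1, 1: 0.2, 3: 0.1, 6: 0.1, 9: 0.1, 10: 0.3, 11: 0.1}

e1 = sum(p * n for n, p in probs.items())        # E[X]   = 6.1
e2 = sum(p * n**2 for n, p in probs.items())     # E[X^2] = 54.9
e3 = sum(p * n**3 for n, p in probs.items())     # E[X^3] = 530.5

variance = e2 - e1**2                            # 17.69
third_central = e3 - 3 * e1 * e2 + 2 * e1**3     # about -20.2
skewness = third_central / variance**1.5         # about -0.27
print(e1, variance, skewness)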
In the above example, the skewness is -0.27. A negative skewness indicates a curve skewed to the left. The Binomial Distribution for q > 0.5 is skewed to the left. For example, here is a Binomial Distribution with m = 10 and q = 0.7:
[Bar chart of the densities at 0 through 10; the distribution is skewed to the left.]
In contrast, the Binomial Distribution for q < 0.5 has positive skewness and is skewed to the right. For example, here is a Binomial Distribution with m = 10 and q = 0.2:
[Bar chart of the densities at 0 through 10; the distribution is skewed to the right.]
The Poisson Distribution, the Negative Binomial Distribution (including the special case of the Geometric Distribution), as well as most size of loss distributions, are skewed to the right; they have a small but significant probability of very large values. For example, here is a Geometric Distribution with β = 2:47
[Bar chart of the densities at 0 through 10; the distribution is skewed to the right.]
47 Even though only densities up to 10 are shown, the Geometric Distribution has support from zero to infinity.
As another example of a distribution skewed to the right, here is a Poisson Distribution with λ = 3:48
[Bar chart of the densities at 0 through 10; the distribution is skewed to the right.]
For the Poisson distribution the skewness is positive and therefore the distribution is skewed to the right. However, as λ gets very large, the skewness of a Poisson approaches zero; in fact the Poisson approaches a Normal Distribution.49 For example, here is a Poisson Distribution with λ = 30:
[Bar chart of the densities at 0 through 60; the distribution is nearly symmetric around 30.]
48 Even though only densities up to 10 are shown, the Poisson Distribution has support from zero to infinity.
49 This follows from the Central Limit Theorem and the fact that for integral N, a Poisson with parameter N is the sum of N independent variables each with a Poisson distribution with a parameter of unity. The Normal Distribution is symmetric and therefore has zero skewness.
A symmetric distribution has zero skewness. Therefore, the Binomial Distribution for q = 0.5 and the Normal Distribution each have zero skewness. For example, here is a Binomial Distribution with m = 10 and q = 0.5:
[Bar chart of the densities at 0 through 10; the distribution is symmetric around 5.]
Binomial Distribution:

For a Binomial Distribution with m = 5 and q = 0.1:

Number of   Probability Density   Probability x   Probability x Square   Probability x Cube
Claims      Function              # of Claims     of # of Claims         of # of Claims
0           59.049%               0.00000         0.00000                0.00000
1           32.805%               0.32805         0.32805                0.32805
2           7.290%                0.14580         0.29160                0.58320
3           0.810%                0.02430         0.07290                0.21870
4           0.045%                0.00180         0.00720                0.02880
5           0.001%                0.00005         0.00025                0.00125
Sum         1                     0.50000         0.70000                1.16000

The mean is: 0.5 = (5)(0.1) = mq.
The variance is: E[X²] - E[X]² = 0.7 - 0.5² = 0.45 = (5)(0.1)(0.9) = mq(1-q).
The skewness is: {E[X³] - 3 E[X] E[X²] + 2 E[X]³}/σ³ = {1.16 - (3)(0.7)(0.5) + 2(0.5³)}/0.45^(3/2) = 1.1925 = 0.8/√0.45 = (1 - 2q)/√[mq(1-q)].
For a Binomial Distribution, the skewness is: (1 - 2q)/√[mq(1-q)].
Binomial Distribution with q < 1/2 ⇔ positive skewness ⇔ skewed to the right.
Binomial Distribution q = 1/2 ⇔ symmetric ⇒ zero skewness.
Binomial Distribution q > 1/2 ⇔ negative skewness ⇔ skewed to the left.
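The Binomial skewness formula can also be verified numerically. The following Python sketch, added as an illustration, recomputes the skewness of the m = 5, q = 0.1 example directly from the densities and compares it with (1 - 2q)/√[mq(1-q)]:

# Added illustration: Binomial skewness from raw moments vs. the closed-form formula.
from math import comb, sqrt

def binomial_skewness(m, q):
    probs = [comb(m, n) * q**n * (1 - q)**(m - n) for n in range(m + 1)]
    e1 = sum(p * n for n, p in enumerate(probs))
    e2 = sum(p * n**2 for n, p in enumerate(probs))
    e3 = sum(p * n**3 for n, p in enumerate(probs))
    variance = e2 - e1**2
    return (e3 - 3 * e1 * e2 + 2 * e1**3) / variance**1.5

m, q = 5, 0.1
print(binomial_skewness(m, q))                # about 1.1926
print((1 - 2*q) / sqrt(m * q * (1 - q)))      # the same value from the formula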
Poisson Distribution: For a Poisson distribution with λ = 2.5:
Number of Claims 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Probability Density Function 0.08208500 0.20521250 0.25651562 0.21376302 0.13360189 0.06680094 0.02783373 0.00994062 0.00310644 0.00086290 0.00021573 0.00004903 0.00001021 0.00000196 0.00000035 0.00000006 0.00000001 0.00000000 0.00000000 0.00000000 0.00000000
Probability x # of Claims 0.00000000 0.20521250 0.51303124 0.64128905 0.53440754 0.33400471 0.16700236 0.06958432 0.02485154 0.00776611 0.00215725 0.00053931 0.00012257 0.00002554 0.00000491 0.00000088 0.00000015 0.00000002 0.00000000 0.00000000 0.00000000
Probability x Square of # of Claims 0.00000000 0.20521250 1.02606248 1.92386716 2.13763017 1.67002357 1.00201414 0.48709021 0.19881233 0.06989496 0.02157252 0.00593244 0.00147085 0.00033196 0.00006875 0.00001315 0.00000234 0.00000039 0.00000006 0.00000001 0.00000000
Probability x Cube of # of Claims 0.00000000 0.20521250 2.05212497 5.77160147 8.55052069 8.35011786 6.01208486 3.40963146 1.59049864 0.62905464 0.21572518 0.06525687 0.01765024 0.00431553 0.00096250 0.00019731 0.00003741 0.00000660 0.00000109 0.00000017 0.00000002
Sum
1.00000000
2.50000000
8.75000000
36.87500000
(The Distribution Function column of the above table is: 0.08208500, 0.28729750, 0.54381312, 0.75757613, 0.89117802, 0.95797896, 0.98581269, 0.99575330, 0.99885975, 0.99972265, 0.99993837, 0.99998740, 0.99999762, 0.99999958, 0.99999993, 0.99999999, and 1.00000000 for n ≥ 16.)

The mean is: 2.5 = λ. The variance is: E[X²] - E[X]² = 8.75 - 2.5² = 2.5 = λ.
The coefficient of variation = √variance / mean = √λ/λ = 1/√λ.
The skewness is: {E[X³] - 3 E[X] E[X²] + 2 E[X]³}/σ³ = {36.875 - (3)(2.5)(8.75) + 2(2.5³)}/2.5^(3/2) = 1/√2.5.
For the Poisson Distribution, the skewness is: 1/√λ. For the Poisson, Skewness = CV.
Poisson Distribution ⇔ positive skewness ⇔ skewed to the right.
Negative Binomial Distribution: For a Negative Binomial distribution with r = 3 and β = 0.4: Number of Claims 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Probability Density Function 0.36443149 0.31236985 0.17849705 0.08499860 0.03642797 0.01457119 0.00555093 0.00203912 0.00072826 0.00025431 0.00008719 0.00002944 0.00000981 0.00000324 0.00000106 0.00000034 0.00000011 0.00000004 0.00000001 0.00000000 0.00000000
Probability x # of Claims 0.00000000 0.31236985 0.35699411 0.25499579 0.14571188 0.07285594 0.03330557 0.01427382 0.00582605 0.00228880 0.00087193 0.00032386 0.00011777 0.00004206 0.00001479 0.00000513 0.00000176 0.00000060 0.00000020 0.00000007 0.00000002
Probability x Square of # of Claims 0.00000000 0.31236985 0.71398822 0.76498738 0.58284753 0.36427970 0.19983344 0.09991672 0.04660838 0.02059924 0.00871926 0.00356244 0.00141320 0.00054677 0.00020706 0.00007697 0.00002815 0.00001015 0.00000361 0.00000127 0.00000044
Sum
1.00000000
1.19999999
3.11999977
Probability x Cube of # of Claims 0.00000000 0.31236985 1.42797644 2.29496213 2.33139010 1.82139852 1.19900062 0.69941703 0.37286706 0.18539316 0.08719255 0.03918682 0.01695839 0.00710805 0.00289887 0.00115454 0.00045038 0.00017251 0.00006501 0.00002414 0.00000885 10.79999502
Distribution Function 0.36443149 0.67680133 0.85529839 0.94029699 0.97672496 0.99129614 0.99684707 0.99888619 0.99961445 0.99986876 0.99995595 0.99998539 0.99999520 0.99999844 0.99999950 0.99999984 0.99999995 0.99999998 0.99999999 1.00000000 1.00000000
The mean is: 1.2 = (3)(0.4) = rβ. The variance is: E[X²] - E[X]² = 3.12 - 1.2² = 1.68 = (3)(0.4)(1.4) = rβ(1+β).
The third central moment is: E[X³] - 3 E[X] E[X²] + 2 E[X]³ = 10.8 - (3)(1.2)(3.12) + 2(1.2³) = 3.024 = (1.8)(3)(0.4)(1.4) = (1 + 2β)rβ(1+β).50
The skewness is: 3.024/1.68^(3/2) = 1.389 = 1.8/√[(3)(0.4)(1.4)] = (1 + 2β)/√[rβ(1+β)].
For the Negative Binomial Distribution, the skewness is: (1 + 2β)/√[rβ(1+β)].
Negative Binomial Distribution ⇔ positive skewness ⇔ skewed to the right.
50 For the Negative Binomial Distribution, 3σ² - 2µ + 2(σ² - µ)²/µ = 3rβ(1+β) - 2rβ + 2{rβ(1+β) - rβ}²/(rβ) = rβ + 3rβ² + 2rβ³ = rβ(1+β)(1+2β) = third central moment. This property of the Negative Binomial is discussed in Section 6.9 of Loss Models, not on the syllabus.
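The following Python sketch, added as an illustration, checks both the Negative Binomial skewness formula and the third-central-moment relationship in the footnote for r = 3 and β = 0.4. Since r is an integer here, the densities are written with a binomial coefficient, and the sums are truncated at n = 200, where the remaining probability is negligible.

# Added illustration: Negative Binomial (r = 3, beta = 0.4) moments, skewness,
# and the footnote identity for the third central moment.
from math import comb, sqrt

r, beta = 3, 0.4
probs = [comb(n + r - 1, n) * (beta / (1 + beta))**n / (1 + beta)**r for n in range(200)]

e1 = sum(p * n for n, p in enumerate(probs))
e2 = sum(p * n**2 for n, p in enumerate(probs))
e3 = sum(p * n**3 for n, p in enumerate(probs))
variance = e2 - e1**2
third = e3 - 3 * e1 * e2 + 2 * e1**3

print(third)                                              # about 3.024
print(3*variance - 2*e1 + 2*(variance - e1)**2 / e1)      # the same value, via the identity
print(third / variance**1.5,                              # skewness, about 1.389
      (1 + 2*beta) / sqrt(r * beta * (1 + beta)))         # matches the closed-form formula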
Problems: 8.1 (3 points) What is the skewness of the following frequency distribution? Number of Claims Probability 0 0.02 1 0.04 2 0.14 3 0.31 4 0.36 5 0.13 A. less than -1.0 B. at least -1.0 but less than -0.5 C. at least -0.5 but less than 0 D. at least 0 but less than 0.5 E. at least 0.5 8.2 (2 points) A distribution has first moment = m, second moment about the origin = m + m2 , and third moment about the origin = m + 3m2 + m3 . Which of the following is the skewness of this distribution? A. m
B. m^0.5
C. 1
D. m^-0.5
E. m^-1
8.3 (3 points) The number of claims filed by a commercial auto insured as the result of at-fault accidents caused by its drivers is shown below: Year Claims 2002 7 2001 3 2000 5 1999 10 1998 5 Calculate the skewness of the empirical distribution of the number of claims per year. A. Less than 0.50 B. At least 0.50, but less than 0.75 C. At least 0.75, but less than 1.00 D. At least 1.00, but less than 1.25 E. At least 1.25
8.4 (4 points) You are given the following distribution of the number of claims on 100,000 motor vehicle comprehensive polices: Number of claims Observed number of policies 0 88,585 1 10,577 2 779 3 54 4 4 5 1 6 or more 0 Calculate the skewness of this distribution. A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0 8.5 (4, 5/87, Q.33) (1 point) There are 1000 insurance policies in force for one year. The results are as follows: Number of Claims Policies 0 800 1 130 2 50 3 20 1000 Which of the following statements are true? 1. The mean of this distribution is 0.29. 2. The variance of this distribution is at least 0.45. 3. The skewness of this distribution is negative. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3
8.6 (CAS3, 5/04, Q.28) (2.5 points) A pizza delivery company has purchased an automobile liability policy for its delivery drivers from the same insurance company for the past five years. The number of claims filed by the pizza delivery company as the result of at-fault accidents caused by its drivers is shown below: Year Claims 2002 4 2001 1 2000 3 1999 2 1998 15 Calculate the skewness of the empirical distribution of the number of claims per year. A. Less than 0.50 B. At least 0.50, but less than 0.75 C. At least 0.75, but less than 1.00 D. At least 1.00, but less than 1.25 E. At least 1.25
Solutions to Problems:
8.1. B. Variance = E[X²] - E[X]² = 12.4 - 3.34² = 1.244. Standard Deviation = √1.244 = 1.116.
Skewness = {E[X³] - (3 E[X] E[X²]) + (2 E[X]³)}/STDDEV³ = {48.82 - (3)(3.34)(12.4) + (2)(3.34³)}/(1.116³) = -0.65.

Number of   Probability Density   Probability x   Probability x Square   Probability x Cube
Claims      Function              # of Claims     of # of Claims         of # of Claims
0           2%                    0               0                      0.0
1           4%                    0.04            0.04                   0.0
2           14%                   0.28            0.56                   1.1
3           31%                   0.93            2.79                   8.4
4           36%                   1.44            5.76                   23.0
5           13%                   0.65            3.25                   16.2
Sum         1                     3.34            12.4                   48.82
8.2. D. σ² = µ2′ - µ1′² = (m + m²) - m² = m.
Skewness = {µ3′ - (3 µ1′ µ2′) + (2 µ1′³)}/σ³ = {(m + 3m² + m³) - 3(m + m²)m + 2m³}/m^(3/2) = m^-0.5.
Comment: The moments are those of the Poisson Distribution with mean m.
8.3. B. E[X] = (7 + 3 + 5 + 10 + 5)/5 = 6. E[X²] = (7² + 3² + 5² + 10² + 5²)/5 = 41.6. Var[X] = 41.6 - 6² = 5.6.
E[X³] = (7³ + 3³ + 5³ + 10³ + 5³)/5 = 324.
Skewness = {E[X³] - 3 E[X²]E[X] + 2E[X]³}/Var[X]^1.5 = {324 - (3)(41.6)(6) + (2)(6³)}/5.6^1.5 = 7.2/13.25 = 0.54.
Comment: Similar to CAS3, 5/04, Q.28. The third central moment directly: (1³ + (-3)³ + (-1)³ + 4³ + (-1)³)/5 = 7.2.
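For those who like to confirm such empirical calculations, here is a minimal Python check of solution 8.3, added as an illustration; the same few lines also handle 8.6 if the data are replaced.

# Added illustration: empirical (biased, divide-by-n) skewness of the claim counts in 8.3.
data = [7, 3, 5, 10, 5]
n = len(data)
mean = sum(data) / n                                   # 6
var = sum((x - mean)**2 for x in data) / n             # 5.6
third = sum((x - mean)**3 for x in data) / n           # 7.2
print(third / var**1.5)                                # about 0.54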
8.4. E. E[X] = 12318/100000 = 0.12318. E[X²] = 14268/100000 = 0.14268.

Number of   Number of    Contribution to   Contribution to   Contribution to
Claims      Policies     First Moment      Second Moment     Third Moment
0           88,585       0                 0                 0
1           10,577       10577             10577             10577
2           779          1558              3116              6232
3           54           162               486               1458
4           4            16                64                256
5           1            5                 25                125
Total       100,000      12,318            14,268            18,648

Var[X] = 0.14268 - 0.12318² = 0.12751. E[X³] = 18648/100000 = 0.18648.
Third Central Moment = E[X³] - 3 E[X]E[X²] + 2E[X]³ = 0.18648 - (3)(0.12318)(0.14268) + (2)(0.12318³) = 0.13749.
Skewness = (Third Central Moment)/Var[X]^1.5 = 0.13749/0.12751^1.5 = 3.02.
Comment: Data taken from Table 5.9.1 in Introductory Statistics with Applications in General Insurance by Hossack, Pollard and Zehnwirth.
8.5. A. 1. True. The mean is {(0)(800) + (1)(130) + (2)(50) + (3)(20)}/1000 = 0.290.
2. False. The second moment is {(0²)(800) + (1²)(130) + (2²)(50) + (3²)(20)}/1000 = 0.510. Thus the variance = 0.510 - 0.29² = 0.4259.
3. False. The distribution is skewed to the right and thus of positive skewness. The third moment is: {(0³)(800) + (1³)(130) + (2³)(50) + (3³)(20)}/1000 = 1.070.
Therefore, skewness = {µ3′ - (3 µ1′ µ2′) + (2 µ1′³)}/STDDEV³ = {1.070 - (3)(.29)(.51) + (2)(.29³)}/.278 = 2.4 > 0.
8.6. E. E[X] = (4 + 1 + 3 + 2 + 15)/5 = 5. E[X²] = (4² + 1² + 3² + 2² + 15²)/5 = 51. Var[X] = 51 - 5² = 26.
E[X³] = (4³ + 1³ + 3³ + 2³ + 15³)/5 = 695.
Skewness = {E[X³] - 3 E[X²]E[X] + 2E[X]³}/Var[X]^1.5 = {695 - (3)(51)(5) + (2)(5³)}/26^1.5 = 1.358.
Alternately, the third central moment is: {(4 - 5)³ + (1 - 5)³ + (3 - 5)³ + (2 - 5)³ + (15 - 5)³}/5 = 180. Skewness = 180/26^1.5 = 1.358.
Section 9, Probability Generating Functions51

The Probability Generating Function, p.g.f., is useful for working with frequency distributions.52
P(z) = Expected Value of zⁿ = E[zⁿ] = Σ from n=0 to ∞ of f(n) zⁿ.
Note that as with other generating functions, there is a dummy variable, in this case z.
Exercise: Assume a distribution with 1-q chance of 0 claims and q chance of 1 claim. (This is a Bernoulli distribution with parameter q.) What is the probability generating function?
[Solution: P(z) = E[zⁿ] = (1-q)(z⁰) + q(z¹) = 1 + q(z-1).]
The Probability Generating Function of the sum of independent frequencies is the product of the individual Probability Generating Functions. Specifically, if X and Y are independent random variables, then PX+Y(z) = E[z^(x+y)] = E[z^x z^y] = E[z^x]E[z^y] = PX(z)PY(z).
Exercise: What is the probability generating function of the sum of two independent Bernoulli distributions each with parameter q?
[Solution: It is the product of the probability generating functions of each Bernoulli: {1 + q(z-1)}².
Alternately, one can compute that for the sum of the two Bernoullis there is: (1-q)² chance of zero claims, 2q(1-q) chance of 1 claim, and q² chance of 2 claims.
Thus P(z) = (1-q)²z⁰ + 2q(1-q)z¹ + q²z² = 1 - 2q + q² + 2qz - 2q²z + q²z² = 1 + 2q(z-1) + (z² - 2z + 1)q² = {1 + q(z-1)}².]
As discussed, a Binomial distribution with parameters q and m is the sum of m independent Bernoulli distributions each with parameter q. Therefore the probability generating function of a Binomial distribution is that of the Bernoulli, to the power m: {1 + q(z-1)}^m.
The probability generating functions, as well as much other useful information on each frequency distribution, are given in the tables attached to the exam.

51 See Definition 3.9 in Loss Models.
52 The Probability Generating Function is similar to the Moment Generating Function: M(z) = E[e^(zn)]. See "Mahlerʼs Guide to Aggregate Distributions." They are related via P(z) = M(ln(z)). They share many properties. Loss Models uses the Probability Generating Function when dealing with frequency distributions.
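As a small added illustration of the product rule for p.g.f.s, the following Python sketch evaluates E[z^(X+Y)] for the sum of two independent Bernoullis directly from its distribution and compares it with the product of the individual p.g.f.s at an arbitrarily chosen test point z = 1.7:

# Added illustration: p.g.f. of a sum of independent variables = product of p.g.f.s.
q, z = 0.3, 1.7    # z is an arbitrary test point

def pgf_bernoulli(z, q):
    return (1 - q) + q * z          # = 1 + q(z - 1)

# Distribution of the sum of two independent Bernoullis: 0, 1, or 2 claims.
f_sum = {0: (1 - q)**2, 1: 2 * q * (1 - q), 2: q**2}
pgf_sum = sum(p * z**n for n, p in f_sum.items())

print(pgf_sum)                      # E[z^(X+Y)] computed from the convolution
print(pgf_bernoulli(z, q)**2)       # product of the two p.g.f.s; identical
print((1 + q*(z - 1))**2)           # the Binomial (m = 2) form given in the text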
Densities:

The distribution determines the probability generating function and vice versa. Given a p.g.f., one can obtain the probabilities by repeated differentiation as follows:
f(n) = {dⁿP(z)/dzⁿ evaluated at z = 0} / n!.
f(0) = P(0).53 f(1) = P'(0). f(2) = P''(0)/2. f(3) = P'''(0)/6. f(4) = P''''(0)/24.
Exercise: Given the probability generating function: e^(λ(z-1)), what is the probability of three claims?
[Solution: P(z) = e^(λ(z-1)) = e^(λz)e^(-λ). P'(z) = λe^(λz)e^(-λ). P''(z) = λ²e^(λz)e^(-λ). P'''(z) = λ³e^(λz)e^(-λ).
f(3) = {d³P(z)/dz³ at z = 0}/3! = λ³e^(-λ)/3!.
Note that this is the p.g.f. of a Poisson Distribution with parameter lambda, and this is indeed the probability of 3 claims for a Poisson.]
Alternately, the probability of n claims is the coefficient of zⁿ in the p.g.f. So for example, given the probability generating function:
e^(λ(z-1)) = e^(λz)e^(-λ) = e^(-λ) Σ from i=0 to ∞ of (λz)^i/i! = Σ from i=0 to ∞ of {e^(-λ)λ^i/i!} z^i.
Thus for this p.g.f., f(i) = e^(-λ)λ^i/i!, which is the density function of a Poisson distribution.

Mean:

P(z) = Σ from n=0 to ∞ of f(n)zⁿ. ⇒ P(1) = Σ from n=0 to ∞ of f(n) = 1.
P'(z) = Σ from n=1 to ∞ of f(n)nz^(n-1). ⇒ P'(1) = Σ from n=1 to ∞ of f(n)n = Σ from n=0 to ∞ of f(n)n = Mean.
P'(1) = Mean.54

53 As z → 0, zⁿ → 0 for n > 0. Therefore, P(z) = Σ f(n)zⁿ → f(0) as z → 0.
54 This is a special case of a result discussed subsequently in the section on factorial moments.
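The differentiation procedure above is easy to automate. Here is a short sketch using the sympy library (assumed to be available; this is an added illustration, not part of the original text) that recovers the Poisson densities from P(z) = exp[λ(z-1)] and confirms that P'(1) equals the mean:

# Added illustration: densities from a p.g.f. by repeated differentiation, using sympy.
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
P = sp.exp(lam * (z - 1))

for n in range(4):
    deriv = P if n == 0 else sp.diff(P, z, n)
    f_n = deriv.subs(z, 0) / sp.factorial(n)
    print(n, sp.simplify(f_n))          # exp(-lam) * lam**n / n!

print(sp.diff(P, z).subs(z, 1))         # P'(1) = lam, the mean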
Proof of Results for Adding Distributions: One can use the probability generating function, in order to determine the results of adding Poisson, Binomial, or Negative Binomial Distributions. Assume one has two independent Poisson Distributions with means λ1 and λ2. The p.g.f. of a Poisson is P(z) = eλ(z-1). The p.g.f. of the sum of these two Poisson Distributions is the product of the p.g.f.s of the two Poisson Distributions: exp[λ1(z-1)]exp[λ2(z-1)] = exp[(λ1 + λ2)(z-1)]. This is the p.g.f. of a Poisson Distribution with mean λ1 + λ 2. In general, the sum of two independent Poisson Distributions is also Poisson with mean equal to the sum of the means. Similarly, assume we are summing two independent Binomial Distributions with parameters m1 and q, and m2 and q. The p.g.f. of a Binomial is P(z) = {1 + q(z-1)}m. The p.g.f. of the sum is: {1 + q(z-1)}m1 {1 + q(z-1)}m2 = {1 + q(z-1)}m1 + m2. This is the p.g.f. of a Binomial Distribution with parameters m1 + m2 and q. In general, the sum of two independent Binomial Distributions with the same q parameter is also Binomial with parameters m1 + m2 and q. Assume we are summing two independent Negative Binomial Distributions with parameters r1 and β, and r2 and β. The p.g.f. of the Negative Binomial is P(z) = {1 - β(z-1)}-r. The p.g.f. of the sum is: {1 - β(z-1)}-r1 {1 - β(z-1)}-r2 = {1 - β(z-1)}-(r1 + r2). This is the p.g.f. of a Negative Binomial Distribution with parameters r1 + r2 and β. In general, the sum of two independent Negative Binomial Distributions with the same β parameter is also Negative Binomial with parameters r1 + r2 and β.
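The same conclusion can also be checked numerically without generating functions: convolving the densities of two independent Poissons reproduces the densities of a Poisson with the summed mean. A minimal Python sketch follows, added as an illustration, with arbitrarily chosen means 1.3 and 2.2:

# Added illustration: the convolution of Poisson(1.3) and Poisson(2.2) densities
# equals the Poisson(3.5) densities.
from math import exp, factorial

def poisson_pmf(lam, n):
    return exp(-lam) * lam**n / factorial(n)

lam1, lam2 = 1.3, 2.2
for n in range(6):
    convolution = sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) for k in range(n + 1))
    print(n, round(convolution, 6), round(poisson_pmf(lam1 + lam2, n), 6))   # identical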
Infinite Divisibility:55 If a distribution is infinitely divisible, then if one takes the probability generating function to any positive power, one gets the probability generating function of another member of the same family of distributions.56 For example, for the Poisson P(z) = eλ(z-1). If we take this p.g.f. to the power ρ > 0, P(z)ρ = eρλ(z-1), which is the p.g.f. of a Poisson with mean ρλ. The p.g.f. of a sum of r independent identically distributed variables, is the individual p.g.f. to the power r. Since for the Geometric P(z) = 1/{1- β(z-1)}, for the Negative Binomial distribution: P(z) = {1- β(z-1)}-r, for r > 0, β > 0. Exercise: P(z) = {1- β(z-1)}-r, for r > 0, β > 0. Is the corresponding distribution infinitely divisible? [Solution: P(z)ρ = {1- β(z-1)}-ρr. Which is of the same form, but with ρr rather than r. Thus the corresponding Negative Binomial Distribution is infinitely divisible.] Infinitely divisible distributions include: Poisson, Negative Binomial, Compound Poisson, Compound Negative Binomial, Normal, Gamma, and Inverse Gaussian.57 Exercise: P(z) = {1 + q(z-1)}m, for m a positive integer and 0 < q < 1. Is the corresponding distribution infinitely divisible? [Solution: P(z)ρ = {1 + q(z-1)}ρm. Which is of the same form, but with ρm rather than m. However, unless ρ is integral, ρm is not. Thus the corresponding distribution is not infinitely divisible. This is a Binomial Distribution. While Binomials can be added up, they can not be divided into pieces smaller than a Bernoulli Distribution.] If a distribution is infinitely divisible, and one adds up independent identically distributed random variables, then one gets a member of the same family. As has been discussed this is the case for the Poisson and for the Negative Binomial.
55 See Definition 6.17 of Loss Models, not on the syllabus, and in Section 9.2 of Loss Models, on the syllabus.
56 One can work with either the probability generating function, the moment generating function, or the characteristic function.
57 Compound Distributions will be discussed in a subsequent section.
In particular infinitely divisible distributions are preserved under a change of exposure.58 One can find a distribution of the same type such that when one adds up independent identical copies they add up to the original distribution. Exercise: Find a Poisson Distribution, such that the sum of 5 independent identical copies will be a Poisson Distribution with λ = 3.5. [Solution: A Poisson Distribution with λ = 3.5/5 = 0.7.] Exercise: Find a Negative Binomial Distribution, such that the sum of 8 independent identical copies will be a Negative Binomial Distribution with β = .37 and r = 1.2. [Solution: A Negative Binomial Distribution with β = .37 and r = 1.2/8 = .15.]
Distribution          Probability Generating Function, P(z)59     Infinitely Divisible
Binomial              {1 + q(z-1)}^m                               No60
Poisson               e^(λ(z-1))                                   Yes
Negative Binomial     {1 - β(z-1)}^(-r), z < 1 + 1/β               Yes

58 See Section 6.12 of Loss Models.
59 As shown in Appendix B, attached to the exam.
60 Since m is an integer.
A(z):61

Let a_n = Prob[X > n]. Define A(z) = Σ from n=0 to ∞ of a_n zⁿ.
(1 - z)A(z) = Σ from n=0 to ∞ of a_n zⁿ - Σ from n=0 to ∞ of a_n z^(n+1) = a_0 - Σ from n=1 to ∞ of (a_(n-1) - a_n)zⁿ = 1 - p_0 - Σ from n=1 to ∞ of p_n zⁿ = 1 - P(z).
Thus A(z) = {1 - P(z)}/(1 - z).
P(1) = 1. Therefore, as z → 1, A(z) = {P(z) - 1}/(z - 1) → P'(1) = E[X]. Thus Σ from n=0 to ∞ of a_n = A(1) = E[X].
This is analogous to the result that the mean is the integral of the survival function from 0 to ∞.
For example, for a Geometric Distribution, P(z) = 1/{1 - β(z-1)}.
Thus A(z) = {1 - P(z)}/(1 - z) = [-β(z-1)/{1 - β(z-1)}]/(1 - z) = β/(1 + β - βz). ⇒ A(1) = β = mean.
Now in general, 1/(1 - x/c) = 1 + (x/c) + (x/c)² + (x/c)³ + (x/c)⁴ + ....
Thus A(z) = β/(1 + β - βz) = {β/(1+β)} / {1 - zβ/(1+β)}
= {β/(1+β)} {1 + zβ/(1+β) + (zβ/(1+β))² + (zβ/(1+β))³ + (zβ/(1+β))⁴ + ....}
= β/(1+β) + z{β/(1+β)}² + z²{β/(1+β)}³ + z³{β/(1+β)}⁴ + z⁴{β/(1+β)}⁵ + ...
Thus matching up coefficients of zⁿ, we have: a_n = {β/(1+β)}^(n+1).
Thus for the Geometric, Prob[X > n] = {β/(1+β)}^(n+1), a result that has been discussed previously.

61 See Exercise 6.34 in Loss Models.
Problems: 9.1 (2 points) The number of claims, N, made on an insurance portfolio follows the following distribution: n Pr(N=n) 0 0.35 1 0.25 2 0.20 3 0.15 4 0.05 What is the Probability Generating Function, P(z)? A. 1 + 0.65z + 0.4z2 + 0.2z3 + 0.05z4 B. 0.35 + 0.6z + 0.8z2 + 0.95z3 + z4 C. 0.35 + 0.25z + 0.2z2 + 0.15z3 + 0.05z4 D. 0.65 + 0.75z + 0.8z2 + 0.85z3 + 0.95z4 E. None of A, B, C, or D. 9.2 (1 point) For a Poisson Distribution with λ = 0.3, what is the Probability Generating Function at 5? A. less than 3 B. at least 3 but less than 4 C. at least 4 but less than 5 D. at least 5 but less than 6 E. at least 6 9.3 (1 point) Which of the following distributions is not infinitely divisible? A. Binomial B. Poisson C. Negative Binomial D. Normal E. Gamma
9.4 (3 points) Given the Probability Generating Function, P(z) = (e^(0.4z) - 1)/(e^(0.4) - 1), what is the density at 3 for the corresponding frequency distribution?
A. 1/2% B. 1% C. 2% D. 3% E. 4%
9.5 (5 points) You are given the following data on the number of runs scored during half innings of major league baseball games from 1980 to 1998: Runs Number of Occurrences 0 518,228 1 105,070 2 47,936 3 21,673 4 9736 5 4033 6 1689 7 639 8 274 9 107 10 36 11 25 12 5 13 7 14 1 15 0 16 1 Total 709,460 WIth the aid of computer, from z = -2.5 to z = 2.5, graph P(z) the probability generating function of the empirical model corresponding to this data. 9.6 (2 points) A variable B has probability generating function P(z) = 0.8z2 + 0.2z4 . A variable C has probability generating function P(z) = 0.7z +0.3z5 . B and C are independent. What is the probability generating function of B + C. A. 1.5z3 + 0.9z5 + 1.1z7 + 0.5z9 B. 0.25z3 + 0.25z5 + 0.25z7 + 0.25z9 C. 0.06z3 + 0.24z5 + 0.14z7 + 0.56z9 D. 0.56z3 + 0.14z5 + 0.24z7 + 0.06z9 E. None of A, B, C, or D. 9.7 (1 point) Given the Probability Generating Function, P(z) = 0.5z + 0.3z2 + 0.2z4 , what is the density at 2 for the corresponding frequency distribution? A. 0.2 B. 0.3 C. 0.4 D. 0.5 E. 0.6
9.8 (1 point) For a Binomial Distribution with m = 4 and q = 0.7, what is the Probability Generating Function at 10? A. less than 1000 B. at least 1000 but less than 1500 C. at least 1500 but less than 2000 D. at least 2000 but less than 2500 E. at least 2500 9.9 (1 point) N follows a Poisson Distribution with λ = 5.6. Determine E[3N]. A. 10,000
B. 25,000
C. 50,000
D. 75,000
E. 100,000
9.10 (7 points) A frequency distribution has P(z) = 1 - (1-z)^(-r), where r is a parameter between 0 and -1.
(a) (3 points) Determine the density at 0, 1, 2, 3, etc.
(b) (1 point) Determine the mean.
(c) (2 points) Let a_n = Prob[x > n]. Define A(z) = Σ from n=0 to ∞ of a_n zⁿ.
Show that in general, A(z) = {1 - P(z)}/(1 - z).
(d) (2 points) Using the result in part (c), show that for this distribution, a_n = C(n+r, n) = (r+1)(r+2)....(r+n)/n!.
9.11 (2, 2/96, Q.15) (1.7 points) Let X1 ,..., Xn be independent Poisson random variables with n
expectations λ1, . . . , λn , respectively. Let Y = ∑ cXi , where c is a constant. i =1
Determine the probability generating function of Y. n
A. exp[(zc + z2 c2 /2) ∑ λ i ] i=1
n
B. exp[(zc - 1) ∑ λ i ] i=1
n
n
i=1
i=1
C. exp[zc ∑ λ i + (z2 c2 /2) ∑ λ i2 ] n
D. exp[(zc - 1) ∑ λ i ] i=1
n
E. (zc - 1)n ∏ λ i i =1
9.12 (IOA 101, 4/00, Q.10) (4.5 points) Under a particular model for the evolution of the size of a population over time, the probability generating function of Xt , the size at time t, is given by: P(z) = {z + λt(1 - z)}/{1 + λt(1 - z)}, λ > 0. If the population dies out, it remains in this extinct state for ever. (i) (2.25 points) Determine the expected size of the population at time t. (ii) (1.5 points) Determine the probability that the population has become extinct by time t. (iii) (0.75 points) Comment briefly on the future prospects for the population. 9.13 (IOA 101, 9/01, Q.2) (1.5 points) Let X1 and X2 be independent Poisson random variables with respective means µ1 and µ2. Determine the probability generating function of X1 + X2 and hence state the distribution of X1 + X2 .
Solutions to Problems: 9.1. C. P(z) = E[zn ] = (.35)(z0 ) + (.25)(z1 ) + (.20)(z2 ) + (.15)(z3 ) + (.05)(z4 ) = 0.35 + 0.25z + 0.2z2 + 0.15z3 + 0.05z4 . 9.2. B. As shown in the Appendix B attached to the exam, for a Poisson Distribution P(z) = eλ(z-1). P(5) = e4λ = e1.2 = 3.32. 9.3. A. The Binomial is not infinitely divisible. Comment: In the Binomial m is an integer. For m =1 one has a Bernoulli. One can not divide a Bernoulli into smaller pieces. 9.4. C. P(z) = (e0.4z - 1)/(e0.4 - 1). Pʼ(z) = .4e0.4z/(e0.4 - 1). Pʼʼ(z) = .16e0.4z/(e0.4 - 1). Pʼʼʼ(z) = .064e0.4z/(e0.4 - 1). f(3) = (d3 P(z) / dz3 )z=0 / 3! = (.064/(e0.4 - 1))/6 = 2.17%. Comment: This is a zero-truncated Poisson Distribution with λ = 0.4, not on the syllabus. 9.5. P(z) = {518,228 + 105,070 z + 47,936 z2 + ... + z16} / 709,460. Here is a graph of P(z): PGF
[Graph of P(z) for z from -2.5 to 2.5, on a vertical scale from roughly 0.5 to 3.0.]
Comment: For example, P(-2) = 0.599825, and P(2) = 2.73582.
9.6. D. The probability generating function of a sum of independent variables is the product of the probability generating functions.
P_{B+C}(z) = P_B(z) P_C(z) = (0.8z^2 + 0.2z^4)(0.7z + 0.3z^5) = 0.56z^3 + 0.14z^5 + 0.24z^7 + 0.06z^9.
Alternately, B has 80% probability of being 2 and 20% probability of being 4. C has 70% probability of being 1 and 30% probability of being 5. Therefore, B + C has:
(80%)(70%) = 56% chance of being 2 + 1 = 3,
(20%)(70%) = 14% chance of being 4 + 1 = 5,
(80%)(30%) = 24% chance of being 2 + 5 = 7, and
(20%)(30%) = 6% chance of being 4 + 5 = 9.
⇒ P_{B+C}(z) = 0.56z^3 + 0.14z^5 + 0.24z^7 + 0.06z^9.
Comment: An example of a convolution.

9.7. B. P(z) = E[z^n] = Σ f(n) z^n, so f(2) is the coefficient of z^2. Thus f(2) = 0.3.
Alternately, P(z) = 0.5z + 0.3z^2 + 0.2z^4. P'(z) = 0.5 + 0.6z + 0.8z^3. P''(z) = 0.6 + 2.4z^2.
f(2) = (d^2 P(z)/dz^2 at z = 0) / 2! = 0.6/2 = 0.3.

9.8. E. As shown in the Appendix B attached to the exam, for a Binomial Distribution P(z) = {1 + q(z-1)}^m = {1 + (0.7)(z-1)}^4. P(10) = {1 + (0.7)(9)}^4 = 2840.

9.9. D. The p.g.f. of the Poisson Distribution is: P(z) = e^(λ(z-1)) = e^(5.6(z-1)). E[3^N] = P(3) = e^(5.6(3-1)) = e^11.2 = 73,130.
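As a numerical cross-check of solutions 9.8 and 9.9, here is a short Python sketch (my addition, not part of the original guide); it sums f(n) z^n directly and compares the result to the closed-form p.g.f. The truncation point of the Poisson sum is my own choice.

```python
from math import exp, factorial, comb

# 9.9: Poisson with lambda = 5.6; E[3^N] = P(3) = exp(lambda*(3 - 1)).
lam = 5.6
brute = sum(exp(-lam) * lam**n / factorial(n) * 3**n for n in range(100))
print(brute, exp(lam * 2))          # both approximately 73,130 -> answer D

# 9.8: Binomial with m = 4, q = 0.7; P(10) = {1 + q(10 - 1)}^4.
m, q = 4, 0.7
brute = sum(comb(m, n) * q**n * (1 - q)**(m - n) * 10**n for n in range(m + 1))
print(brute, (1 + q * 9)**m)        # both 2839.82 -> answer E
```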
9.10. (a) f(0) = P(0) = 1 - (1-0)^(-r) = 0.
P'(z) = -r(1-z)^(-(r+1)). f(1) = P'(0) = -r.
P''(z) = -r(r+1)(1-z)^(-(r+2)). f(2) = P''(0)/2 = -r(r+1)/2.
P'''(z) = -r(r+1)(r+2)(1-z)^(-(r+3)). f(3) = P'''(0)/3! = -r(r+1)(r+2)/6.
f(x) = -r(r+1)...(r+x-1)/x! = -Γ[x+r] / {Γ[x+1] Γ[r]}, x = 1, 2, 3, ...
(b) Mean = P'(1) = infinity. The densities go to zero too slowly; thus there is no finite mean.
(c) (1 - z) A(z) = Σ_{n=0}^∞ a_n z^n - Σ_{n=0}^∞ a_n z^(n+1) = a_0 - Σ_{n=1}^∞ (a_{n-1} - a_n) z^n = 1 - p_0 - Σ_{n=1}^∞ p_n z^n = 1 - P(z).
Thus A(z) = {1 - P(z)} / (1 - z).
(d) Thus for this distribution, A(z) = (1-z)^(-r) / (1-z) = (1-z)^(-(r+1)) = Σ_{n=0}^∞ C(n+r, n) z^n, from the Taylor series.
Thus since A(z) = Σ_{n=0}^∞ a_n z^n, a_n = C(n+r, n) = (r+1)(r+2)...(r+n)/n!.
Comment: This is called a Sibuya frequency distribution. It is the limit of an Extended Zero-Truncated Negative Binomial Distribution, as β → ∞. See Exercises 6.13, 6.34, and 8.32 in Loss Models.
For r = -0.7, the densities at 1 through 10 are: 0.7, 0.105, 0.0455, 0.0261625, 0.0172672, 0.0123749, 0.00936954, 0.00737851, 0.00598479, 0.00496738.
For r = -0.7, Prob[N > 10] = (0.3)(1.3)...(9.3) / 10! = 0.065995.
P(1) = 1. Therefore, as z → 1, A(z) = {1 - P(z)} / (1 - z) → P'(1) = E[X].
Thus Σ_{n=0}^∞ a_n = A(1) = E[X].
This is analogous to the result that the mean is the integral of the survival function from 0 to ∞.
For this distribution, Mean = A(1) = (1 - 1)^(-(r+1)) = ∞.
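A small numerical sketch (my addition; plain Python, helper names are mine) of the Sibuya formulas just derived for r = -0.7; it reproduces the densities and the tail probability quoted in the comment.

```python
from math import factorial

r = -0.7

def density(x):
    # f(x) = -r(r+1)...(r+x-1)/x!, for x = 1, 2, 3, ...
    prod = 1.0
    for k in range(x):
        prod *= r + k
    return -prod / factorial(x)

def tail(n):
    # a_n = Prob[N > n] = (r+1)(r+2)...(r+n)/n!
    prod = 1.0
    for k in range(1, n + 1):
        prod *= r + k
    return prod / factorial(n)

print([round(density(x), 7) for x in range(1, 11)])   # 0.7, 0.105, 0.0455, ...
print(tail(10))                                       # approximately 0.065995
print(1 - sum(density(x) for x in range(1, 11)))      # same value, since f(0) = 0
```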
9.11. D. For each Poisson, the probability generating function is: P(z) = exp[λ_i(z-1)].
Multiplying a variable by a constant: P_cX(z) = E[z^(cX)] = E[(z^c)^X] = P_X(z^c). For each Poisson times c, the p.g.f. is: exp[λ_i(z^c - 1)].
The p.g.f. of the sum of independent variables is the product of the p.g.f.s: P_Y(z) = exp[(z^c - 1) Σ_{i=1}^n λ_i].
Comment: Multiplying a Poisson variable by a constant does not result in another Poisson; rather it results in what is called an Over-Dispersed Poisson Distribution. Since Var[cX]/E[cX] = c Var[X]/E[X], for a constant c > 1, the Over-Dispersed Poisson Distribution has a variance greater than its mean. See for example "A Primer on the Exponential Family of Distributions," by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program.

9.12. (i) P'(z) = ({1 - λt}{1 + λt(1-z)} + λt{z + λt(1-z)}) / {1 + λt(1-z)}^2 = 1 / {1 + λt(1-z)}^2.
E[X] = P'(1) = 1. The expected size of the population is 1 regardless of time.
(ii) f(0) = P(0) = λt/(1 + λt). This is the probability of extinction by time t.
The probability of survival to time t is: 1 - λt/(1 + λt) = 1/(1 + λt) = (1/λ)/{(1/λ) + t}, the survival function of a Pareto Distribution with α = 1 and θ = 1/λ.
(iii) As t approaches infinity, the probability of survival approaches zero.
Comment: P''(z) = 2λt/{1 + λt(1-z)}^3. E[X(X-1)] = P''(1) = 2λt.
⇒ E[X^2] = E[X] + 2λt = 1 + 2λt. ⇒ Var[X] = 1 + 2λt - 1^2 = 2λt.

9.13. P_1(z) = exp[µ_1(z-1)]. P_2(z) = exp[µ_2(z-1)]. Since X_1 and X_2 are independent, the probability generating function of X_1 + X_2 is:
P_1(z)P_2(z) = exp[µ_1(z-1) + µ_2(z-1)] = exp[(µ_1 + µ_2)(z-1)].
This is the probability generating function of a Poisson with mean µ_1 + µ_2, which must therefore be the distribution of X_1 + X_2.
Section 10, Factorial Moments

When working with frequency distributions, in addition to moments around the origin and central moments, one sometimes uses factorial moments. The nth factorial moment is the expected value of the product of the n factors X(X-1)...(X+1-n):
µ_(n) = E[X(X-1) ... (X+1-n)].62
So for example, µ_(1) = E[X], µ_(2) = E[X(X-1)], µ_(3) = E[X(X-1)(X-2)].

Exercise: What is the second factorial moment of a Binomial Distribution with parameters m = 4 and q = 0.3?
[Solution: The density function is: f(0) = 0.7^4, f(1) = (4)(0.7^3)(0.3), f(2) = (6)(0.7^2)(0.3^2), f(3) = (4)(0.7)(0.3)^3, f(4) = 0.3^4.
E[X(X-1)] = (0)(-1)f(0) + (1)(0)f(1) + (2)(1)f(2) + (3)(2)f(3) + (4)(3)f(4)
= (12)(0.7^2)(0.3^2) + (24)(0.7)(0.3^3) + 12(0.3^4) = 12(0.3^2){(0.7^2) + (2)(0.7)(0.3) + (0.3^2)} = (12)(0.3^2)(0.7 + 0.3)^2 = (12)(0.3^2) = 1.08.]

The factorial moments are related to the moments about the origin as follows:63
µ_(1) = µ_1' = µ
µ_(2) = µ_2' - µ_1'
µ_(3) = µ_3' - 3µ_2' + 2µ_1'
µ_(4) = µ_4' - 6µ_3' + 11µ_2' - 6µ_1'

The moments about the origin are related to the factorial moments as follows:
µ_1' = µ_(1) = µ
µ_2' = µ_(2) + µ_(1)
µ_3' = µ_(3) + 3µ_(2) + µ_(1)
µ_4' = µ_(4) + 6µ_(3) + 7µ_(2) + µ_(1)

Note that one can use the factorial moments to compute the variance, etc.

62 See the first page of Appendix B of Loss Models.
63 Moments about the origin are sometimes referred to as "raw moments."
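For readers who like to verify such exercises numerically, here is a brief Python sketch (my addition, not from the text) that recomputes the second factorial moment of the Binomial with m = 4 and q = 0.3 by direct summation and then applies the relations above.

```python
from math import comb

m, q = 4, 0.3
f = [comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]

mu_fact_2 = sum(x * (x - 1) * fx for x, fx in enumerate(f))   # E[X(X-1)]
mean = sum(x * fx for x, fx in enumerate(f))                  # mq = 1.2
second_raw = mu_fact_2 + mean                                 # mu_2' = mu_(2) + mu_(1)

print(mu_fact_2)               # 1.08
print(second_raw - mean**2)    # variance = mq(1-q) = 0.84
```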
For example, for a Binomial Distribution with m = 4 and q = 0.3, the mean is mq = 1.2, while the second factorial moment was computed to be 1.08. Thus the second moment around the origin is µ_2' = µ_(2) + µ_(1) = µ_(2) + µ = 1.08 + 1.2 = 2.28. Thus the variance is 2.28 - 1.2^2 = 0.84. This in fact equals mq(1-q) = (4)(0.3)(0.7) = 0.84.
In general the variance (the second central moment) is related to the factorial moments as follows:
variance = µ_2 = µ_2' - µ_1'^2 = µ_(2) + µ_(1) - µ_(1)^2.

Using the Probability Generating Function to Get Factorial Moments:

One can use the Probability Generating Function to get the factorial moments. To get the nth factorial moment, one differentiates the p.g.f. n times and sets z = 1:
µ_(n) = (d^n P(z) / dz^n) evaluated at z = 1 = P^(n)(1).
So for example, µ_(1) = E[X] = P'(1), and µ_(2) = E[X(X-1)] = P''(1).64

Exercise: Given that the p.g.f. of a Poisson Distribution is e^(λ(z-1)), determine its first four factorial moments.
[Solution: P(z) = e^(λ(z-1)) = e^(λz) e^(-λ). P'(z) = λe^(λz) e^(-λ). P''(z) = λ^2 e^(λz) e^(-λ). P'''(z) = λ^3 e^(λz) e^(-λ). P''''(z) = λ^4 e^(λz) e^(-λ).
µ_(1) = P'(1) = λe^λ e^(-λ) = λ. µ_(2) = P''(1) = λ^2 e^λ e^(-λ) = λ^2. µ_(3) = P'''(1) = λ^3. µ_(4) = P''''(1) = λ^4.
Comment: For the Poisson Distribution, µ_(n) = λ^n.]

Exercise: Using the first four factorial moments of a Poisson Distribution, determine the first four moments of a Poisson Distribution.
[Solution: µ_1' = µ_(1) = λ. µ_2' = µ_(2) + µ_(1) = λ^2 + λ. µ_3' = µ_(3) + 3µ_(2) + µ_(1) = λ^3 + 3λ^2 + λ. µ_4' = µ_(4) + 6µ_(3) + 7µ_(2) + µ_(1) = λ^4 + 6λ^3 + 7λ^2 + λ.]

64 Exercise 6.1 in Loss Models.
Exercise: Using the first four moments of a Poisson Distribution, determine its coefficient of variation, skewness, and kurtosis.
[Solution: variance = µ_2' - µ_1'^2 = λ^2 + λ - λ^2 = λ.
Coefficient of variation = standard deviation / mean = √λ / λ = 1/√λ.
Third central moment = µ_3' - 3µ_1'µ_2' + 2µ_1'^3 = λ^3 + 3λ^2 + λ - 3λ(λ^2 + λ) + 2λ^3 = λ.
Skewness = third central moment / variance^1.5 = λ/λ^1.5 = 1/√λ.
Fourth central moment = µ_4' - 4µ_1'µ_3' + 6µ_1'^2 µ_2' - 3µ_1'^4 = λ^4 + 6λ^3 + 7λ^2 + λ - 4λ(λ^3 + 3λ^2 + λ) + 6λ^2(λ^2 + λ) - 3λ^4 = 3λ^2 + λ.
Kurtosis = fourth central moment / variance^2 = (3λ^2 + λ)/λ^2 = 3 + 1/λ.
Comment: While there is a possibility you might use the skewness of the Poisson Distribution, you are extremely unlikely to ever use the kurtosis of the Poisson Distribution! Kurtosis is discussed in "Mahlerʼs Guide to Loss Distributions." As lambda approaches infinity, the kurtosis of a Poisson approaches 3, that of a Normal Distribution. As lambda approaches infinity, the Poisson approaches a Normal Distribution.]

Exercise: Derive the p.g.f. of the Geometric Distribution and use it to determine the variance.
[Solution: P(z) = E[z^n] = Σ_{n=0}^∞ f(n) z^n = Σ_{n=0}^∞ {β^n / (1+β)^(n+1)} z^n = {1/(1+β)} Σ_{n=0}^∞ {βz/(1+β)}^n
= {1/(1+β)} {1 / (1 - zβ/(1+β))} = 1 / {1 - β(z-1)}, for z < 1 + 1/β.
P'(z) = β / {1 - β(z-1)}^2. P''(z) = 2β^2 / {1 - β(z-1)}^3. µ_(1) = P'(1) = β. µ_(2) = P''(1) = 2β^2.
Thus the variance of the Geometric distribution is: µ_(2) + µ_(1) - µ_(1)^2 = 2β^2 + β - β^2 = β(1+β).]
Formulas for the (a, b, 0) class:65

One can use iteration to calculate the factorial moments of a member of the (a, b, 0) class:66
µ_(1) = (a + b)/(1-a)
µ_(n) = (an + b) µ_(n-1) / (1-a)

Exercise: Use the above formulas to compute the first three factorial moments of a Negative Binomial Distribution.
[Solution: For a Negative Binomial Distribution: a = β/(1+β) and b = (r-1)β/(1+β).
µ_(1) = (a + b)/(1 - a) = {rβ/(1+β)} / {1/(1+β)} = rβ.
µ_(2) = (2a + b)µ_(1)/(1 - a) = {(r+1)β/(1+β)} rβ / {1/(1+β)} = r(r+1)β^2.
µ_(3) = (3a + b)µ_(2)/(1 - a) = {(r+2)β/(1+β)} r(r+1)β^2 / {1/(1+β)} = r(r+1)(r+2)β^3.]
In general, the nth factorial moment of a Negative Binomial Distribution is: µ_(n) = r(r+1)...(r+n-1)β^n.

Exercise: Use the first three factorial moments to compute the first three moments about the origin of a Negative Binomial Distribution.
[Solution: µ_1' = µ_(1) = rβ. µ_2' = µ_(2) + µ_(1) = r(r+1)β^2 + rβ. µ_3' = µ_(3) + 3µ_(2) + µ_(1) = r(r+1)(r+2)β^3 + 3r(r+1)β^2 + rβ.]

Exercise: Use the first two moments about the origin of a Negative Binomial Distribution to compute its variance.
[Solution: The variance of the Negative Binomial is µ_2' - µ_1'^2 = r(r+1)β^2 + rβ - (rβ)^2 = rβ(1+β).]

Exercise: Use the first three moments about the origin of a Negative Binomial Distribution to compute its skewness.
[Solution: Third central moment = µ_3' - 3µ_1'µ_2' + 2µ_1'^3 = r(r+1)(r+2)β^3 + 3r(r+1)β^2 + rβ - (3)(rβ)(r(r+1)β^2 + rβ) + 2(rβ)^3 = 2rβ^3 + 3rβ^2 + rβ.
Variance = rβ(1+β). Therefore, skewness = (2rβ^3 + 3rβ^2 + rβ) / {rβ(1+β)}^1.5 = (1 + 2β) / √{rβ(1+β)}.]

65 The (a, b, 0) class will be discussed subsequently.
66 See Appendix B.2 of Loss Models.
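The iteration µ_(n) = (an + b)µ_(n-1)/(1-a) is easy to program. The sketch below (my addition; the function name is mine) applies it to a Negative Binomial and reproduces rβ, r(r+1)β^2, and r(r+1)(r+2)β^3.

```python
def factorial_moments_ab0(a, b, n_max):
    # mu_(1) = (a+b)/(1-a); mu_(n) = (a*n + b) * mu_(n-1) / (1-a)
    mus = [(a + b) / (1 - a)]
    for n in range(2, n_max + 1):
        mus.append((a * n + b) * mus[-1] / (1 - a))
    return mus

# Negative Binomial with r = 3, beta = 2: a = beta/(1+beta), b = (r-1)beta/(1+beta).
r, beta = 3.0, 2.0
a = beta / (1 + beta)
b = (r - 1) * beta / (1 + beta)
print(factorial_moments_ab0(a, b, 3))
# [6.0, 48.0, 480.0] = [r*beta, r*(r+1)*beta^2, r*(r+1)*(r+2)*beta^3]
```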
One can derive that for any member of the (a, b, 0) class, the variance = (a+b)/(1-a)^2.67
For example, for the Negative Binomial Distribution, a = β/(1+β) and b = (r-1)β/(1+β):
variance = (a+b)/(1-a)^2 = {rβ/(1+β)} / {1/(1+β)^2} = rβ(1+β).
The derivation is as follows:
µ_(1) = (a+b)/(1-a). µ_(2) = (2a+b)µ_(1)/(1-a) = (2a+b)(a+b)/(1-a)^2.
µ_2' = µ_(2) + µ_(1) = (2a+b)(a+b)/(1-a)^2 + (a+b)/(1-a) = (a+b+1)(a+b)/(1-a)^2.
variance = µ_2' - µ_1'^2 = (a+b+1)(a+b)/(1-a)^2 - {(a+b)/(1-a)}^2 = (a+b)/(1-a)^2.

Exercise: Use the above formula for the variance of a member of the (a, b, 0) class to compute the variance of a Binomial Distribution.
[Solution: For the Binomial, a = -q/(1-q) and b = (m+1)q/(1-q).
variance = (a+b)/(1-a)^2 = {mq/(1-q)} / {1/(1-q)}^2 = mq(1-q).]

Distribution          nth Factorial Moment
Binomial              m(m-1)...(m+1-n) q^n for n ≤ m, 0 for n > m
Poisson               λ^n
Negative Binomial     r(r+1)...(r+n-1) β^n

67 See Appendix B.2 of Loss Models.
Problems:

10.1 (2 points) The number of claims, N, made on an insurance portfolio follows the following distribution:
n     Pr(N=n)
0     0.3
1     0.3
2     0.2
3     0.1
4     0.1
What is the second factorial moment of N?
A. 1.6   B. 1.8   C. 2.0   D. 2.2   E. 2.4

10.2 (3 points) Determine the third moment of a Poisson Distribution with λ = 5.
A. less than 140
B. at least 140 but less than 160
C. at least 160 but less than 180
D. at least 180 but less than 200
E. at least 200

10.3 (2 points) The random variable X has a Binomial distribution with parameters q and m = 8. Determine the expected value of X(X-1)(X-2).
A. 512
B. 512q(q-1)(q-2)
C. q(q-1)(q-2)
D. q^3
E. None of A, B, C, or D

10.4 (2 points) You are given the following information about the probability generating function for a discrete distribution:
• P'(1) = 10
• P''(1) = 98
Calculate the variance of the distribution.
A. 8   B. 10   C. 12   D. 14   E. 16
10.5 (3 points) The random variable X has a Negative Binomial distribution with parameters β = 7/3 and r = 9. Determine the expected value of X(X-1)(X-2)(X-3).
A. less than 200,000
B. at least 200,000 but less than 300,000
C. at least 300,000 but less than 400,000
D. at least 400,000 but less than 500,000
E. at least 500,000

10.6 (3 points) Determine the third moment of a Binomial Distribution with m = 10 and q = 0.3.
A. less than 40
B. at least 40 but less than 50
C. at least 50 but less than 60
D. at least 60 but less than 70
E. at least 70

10.7 (3 points) Determine the third moment of a Negative Binomial Distribution with r = 10 and β = 3.
A. less than 36,000
B. at least 36,000 but less than 38,000
C. at least 38,000 but less than 40,000
D. at least 40,000 but less than 42,000
E. at least 42,000

10.8 (4B, 11/97, Q.21) (2 points) The random variable X has a Poisson distribution with mean λ. Determine the expected value of X(X-1)...(X-9).
A. 1
B. λ
C. λ(λ-1)...(λ-9)
D. λ^10
E. λ(λ+1)...(λ+9)

10.9 (CAS3, 11/06, Q.25) (2.5 points) You are given the following information about the probability generating function for a discrete distribution:
• P'(1) = 2
• P''(1) = 6
Calculate the variance of the distribution.
A. Less than 1.5
B. At least 1.5, but less than 2.5
C. At least 2.5, but less than 3.5
D. At least 3.5, but less than 4.5
E. At least 4.5
Solutions to Problems:

10.1. D. The 2nd factorial moment is: E[N(N-1)] = (0.3)(0)(-1) + (0.3)(1)(0) + (0.2)(2)(1) + (0.1)(3)(2) + (0.1)(4)(3) = 2.2.
10.2. E. The factorial moments for a Poisson are: λ^n.
Mean = first factorial moment = λ = 5.
Second factorial moment = 5^2 = 25 = E[X(X-1)] = E[X^2] - E[X]. ⇒ E[X^2] = 25 + 5 = 30.
Third factorial moment = 5^3 = 125 = E[X(X-1)(X-2)] = E[X^3] - 3E[X^2] + 2E[X].
⇒ E[X^3] = 125 + (3)(30) - (2)(5) = 205.
Alternately, for the Poisson P(z) = e^(λ(z-1)). P'(z) = λe^(λ(z-1)). P''(z) = λ^2 e^(λ(z-1)). P'''(z) = λ^3 e^(λ(z-1)).
Mean = first factorial moment = P'(1) = λ. Second factorial moment = P''(1) = λ^2. Third factorial moment = P'''(1) = λ^3. Proceed as before.
Alternately, the skewness of a Poisson is 1/√λ. Since the variance is λ, the third central moment is: λ^1.5 / √λ = λ.
λ = E[(X - λ)^3] = E[X^3] - 3λE[X^2] + 3λ^2 E[X] - λ^3.
⇒ E[X^3] = λ + 3λE[X^2] - 3λ^2 E[X] + λ^3 = λ + 3λ(λ + λ^2) - 3λ^2 λ + λ^3 = λ^3 + 3λ^2 + λ = 125 + 75 + 5 = 205.
Comment: One could compute enough of the densities and then calculate the third moment:

Number     Probability         Probability      Probability x        Probability x
of Claims  Density Function    x # of Claims    Square of # Claims   Cube of # Claims
 0          0.674%              0.00000           0.00000               0.00000
 1          3.369%              0.03369           0.03369               0.03369
 2          8.422%              0.16845           0.33690               0.67379
 3         14.037%              0.42112           1.26337               3.79010
 4         17.547%              0.70187           2.80748              11.22991
 5         17.547%              0.87734           4.38668              21.93342
 6         14.622%              0.87734           5.26402              31.58413
 7         10.444%              0.73111           5.11780              35.82459
 8          6.528%              0.52222           4.17779              33.42236
 9          3.627%              0.32639           2.93751              26.43761
10          1.813%              0.18133           1.81328              18.13279
11          0.824%              0.09066           0.99730              10.97034
12          0.343%              0.04121           0.49453               5.93437
13          0.132%              0.01717           0.22323               2.90193
14          0.047%              0.00660           0.09246               1.29444
15          0.016%              0.00236           0.03538               0.53070
16          0.005%              0.00079           0.01258               0.20127
17          0.001%              0.00025           0.00418               0.07101
Sum         0.99999458366       4.99990          29.99818             204.96644
10.3. E. f(x) = {8! / (x! (8-x)!)} q^x (1-q)^(8-x), for x = 0 to 8.
E[X(X-1)(X-2)] = Σ_{x=0}^{8} x(x-1)(x-2) f(x) = Σ_{x=3}^{8} x(x-1)(x-2) {8! / (x! (8-x)!)} q^x (1-q)^(8-x)
= (8)(7)(6) q^3 Σ_{x=3}^{8} {5! / ((x-3)! (8-x)!)} q^(x-3) (1-q)^(8-x) = 336 q^3 Σ_{y=0}^{5} {5! / (y! (5-y)!)} q^y (1-q)^(5-y) = 336q^3.
Alternately, the 3rd factorial moment is the 3rd derivative of the p.g.f. at z = 1.
For the Binomial: P(z) = {1 + q(z-1)}^m. dP/dz = mq{1 + q(z-1)}^(m-1). P''(z) = m(m-1)q^2 {1 + q(z-1)}^(m-2). P'''(z) = m(m-1)(m-2)q^3 {1 + q(z-1)}^(m-3).
P'''(1) = m(m-1)(m-2)q^3 = (8)(7)(6)q^3 = 336q^3.
Comment: Note that the product x(x-1)(x-2) is zero for x = 0, 1, and 2, so only terms for x ≥ 3 contribute to the sum. Then a change of variables is made: y = x - 3. The resulting sum is a sum of Binomial terms from y = 0 to 5, which sum is one, since the Binomial is a distribution, with support in this case 0 to 5. The expected value of X(X-1)(X-2) is an example of what is referred to as a factorial moment. In the case of the Binomial, the kth factorial moment for k ≤ m is: q^k m!/(m-k)! = q^k (m)(m-1)...(m-k+1). In our case we have the 3rd factorial moment (involving the product of 3 terms) equal to: q^3 (m)(m-1)(m-2).

10.4. A. 10 = P'(1) = E[N]. 98 = P''(1) = E[N(N-1)] = E[N^2] - E[N]. E[N^2] = 98 + 10 = 108.
Var[N] = E[N^2] - E[N]^2 = 108 - 10^2 = 8.
Comment: Similar to CAS3, 11/06, Q.25.
10.5. C. f(x) = {(x+8)! / (x! 8!)} (7/3)^x / (1 + 7/3)^(x+9).
E[X(X-1)(X-2)(X-3)] = Σ_{x=0}^∞ x(x-1)(x-2)(x-3) f(x) = Σ_{x=4}^∞ x(x-1)(x-2)(x-3) {(x+8)! / (x! 8!)} (7/3)^x / (1 + 7/3)^(x+9)
= (12)(11)(10)(9)(7/3)^4 Σ_{x=4}^∞ {(x+8)! / ((x-4)! 12!)} (7/3)^(x-4) / (1 + 7/3)^(x+9)
= 352,147 Σ_{y=0}^∞ {(y+12)! / (y! 12!)} (7/3)^y / (1 + 7/3)^(y+13) = 352,147.
Alternately, the 4th factorial moment is the 4th derivative of the p.g.f. at z = 1.
For the Negative Binomial: P(z) = {1 - β(z-1)}^(-r). dP/dz = rβ{1 - β(z-1)}^(-(r+1)). P''(z) = r(r+1)β^2 {1 - β(z-1)}^(-(r+2)). P'''(z) = r(r+1)(r+2)β^3 {1 - β(z-1)}^(-(r+3)). P''''(z) = r(r+1)(r+2)(r+3)β^4 {1 - β(z-1)}^(-(r+4)).
P''''(1) = r(r+1)(r+2)(r+3)β^4 = (9)(10)(11)(12)(7/3)^4 = 352,147.
Comments: Note that the product x(x-1)(x-2)(x-3) is zero for x = 0, 1, 2, and 3, so only terms for x ≥ 4 contribute to the sum. Then a change of variables is made: y = x - 4. The resulting sum is a sum of Negative Binomial terms, with β = 7/3 and r = 13, from y = 0 to infinity, which sum is one, since the Negative Binomial is a distribution with support from 0 to ∞. The expected value of X(X-1)(X-2)(X-3) is an example of a factorial moment. In the case of the Negative Binomial, the mth factorial moment is: β^m (r)(r+1)...(r+m-1). In our case we have the 4th factorial moment (involving the product of 4 terms) equal to: β^4 (r)(r+1)(r+2)(r+3), with β = 7/3 and r = 9.
10.6. B. P(z) = {1 + q(z-1)}^m = {1 + 0.3(z-1)}^10 = {0.7 + 0.3z}^10.
P'(z) = (10)(0.3){0.7 + 0.3z}^9. P''(z) = (3)(2.7){0.7 + 0.3z}^8. P'''(z) = (3)(2.7)(2.4){0.7 + 0.3z}^7.
Mean = first factorial moment = P'(1) = 3. Second factorial moment = P''(1) = (3)(2.7) = 8.1. Third factorial moment = P'''(1) = (3)(2.7)(2.4) = 19.44.
Second factorial moment = 8.1 = E[X(X-1)] = E[X^2] - E[X]. ⇒ E[X^2] = 8.1 + 3 = 11.1.
Third factorial moment = 19.44 = E[X(X-1)(X-2)] = E[X^3] - 3E[X^2] + 2E[X].
⇒ E[X^3] = 19.44 + (3)(11.1) - (2)(3) = 46.74.
Comment: E[X^2] = variance + mean^2 = 2.1 + 3^2 = 11.1. One could compute all of the densities and then calculate the third moment:

Number     Probability         Probability      Probability x        Probability x
of Claims  Density Function    x # of Claims    Square of # Claims   Cube of # Claims
 0          2.825%              0.00000           0.00000               0.00000
 1         12.106%              0.12106           0.12106               0.12106
 2         23.347%              0.46695           0.93390               1.86780
 3         26.683%              0.80048           2.40145               7.20435
 4         20.012%              0.80048           3.20194              12.80774
 5         10.292%              0.51460           2.57298              12.86492
 6          3.676%              0.22054           1.32325               7.93949
 7          0.900%              0.06301           0.44108               3.08758
 8          0.145%              0.01157           0.09259               0.74071
 9          0.014%              0.00124           0.01116               0.10044
10          0.001%              0.00006           0.00059               0.00590
Sum         1                   3.00000          11.10000              46.74000
10.7. C. P(z) = {1 - β(z-1)}^(-r) = {1 - 3(z-1)}^(-10) = (4 - 3z)^(-10).
P'(z) = (-10)(-3)(4 - 3z)^(-11). P''(z) = (30)(33)(4 - 3z)^(-12). P'''(z) = (30)(33)(36)(4 - 3z)^(-13).
Mean = first factorial moment = P'(1) = 30. Second factorial moment = P''(1) = (30)(33) = 990. Third factorial moment = P'''(1) = (30)(33)(36) = 35,640.
Second factorial moment = 990 = E[X(X-1)] = E[X^2] - E[X]. ⇒ E[X^2] = 990 + 30 = 1020.
Third factorial moment = 35,640 = E[X(X-1)(X-2)] = E[X^3] - 3E[X^2] + 2E[X].
⇒ E[X^3] = 35,640 + (3)(1020) - (2)(30) = 38,640.
Comment: E[X^2] = variance + mean^2 = (10)(3)(4) + 30^2 = 1020.
10.8. D. For a discrete distribution, the expected value of a quantity is determined by taking the sum of its product with the probability density function. In this case, the density of the Poisson is: e^(-λ) λ^x / x!, x = 0, 1, 2, ...
Thus E[X(X-1)...(X-9)] = Σ_{x=0}^∞ x(x-1)(x-2)(x-3)(x-4)(x-5)(x-6)(x-7)(x-8)(x-9) e^(-λ) λ^x / x!
= e^(-λ) λ^10 Σ_{x=10}^∞ λ^(x-10) / (x-10)! = e^(-λ) λ^10 Σ_{y=0}^∞ λ^y / y! = e^(-λ) λ^10 e^λ = λ^10.
Alternately, the 10th factorial moment is the 10th derivative of the p.g.f. at z = 1.
For the Poisson: P(z) = exp[λ(z-1)]. dP/dz = λ exp[λ(z-1)]. P''(z) = λ^2 exp[λ(z-1)]. P'''(z) = λ^3 exp[λ(z-1)]. P^(10)(z) = λ^10 exp[λ(z-1)]. P^(10)(1) = λ^10.
Comment: Note that the product x(x-1)...(x-9) is zero for x = 0, 1, ..., 9, so only terms for x ≥ 10 contribute to the sum. The expected value of X(X-1)...(X-9) is an example of a factorial moment. In the case of the Poisson, the nth factorial moment is λ to the nth power. In this case we have the 10th factorial moment (involving the product of 10 terms) equal to λ^10.

10.9. D. 2 = P'(1) = E[N]. 6 = P''(1) = E[N(N-1)] = E[N^2] - E[N]. E[N^2] = 6 + 2 = 8.
Var[N] = E[N^2] - E[N]^2 = 8 - 2^2 = 4.
Comment: P(z) = E[z^n] = Σ f(n)z^n. P'(z) = Σ n f(n) z^(n-1). P'(1) = Σ n f(n) = E[N]. P''(z) = Σ n(n-1) f(n) z^(n-2). P''(1) = Σ n(n-1) f(n) = E[N(N-1)].
Section 11, (a, b, 0) Class of Distributions

The "(a, b, 0) class of frequency distributions" consists of the three common distributions: Binomial, Poisson, and Negative Binomial.

Distribution          Mean    Variance     Variance / Mean
Binomial              mq      mq(1-q)      1 - q < 1      (Variance < Mean)
Poisson               λ       λ            1              (Variance = Mean)
Negative Binomial     rβ      rβ(1+β)      1 + β > 1      (Variance > Mean)

Distribution          Skewness
Binomial              (1 - 2q) / √{mq(1-q)}      If q < 0.5 skewed right, if q > 0.5 skewed left
Poisson               1/√λ                       Skewed to the right
Negative Binomial     (1 + 2β) / √{rβ(1+β)}      Skewed to the right

Densities and their ratios:
Binomial:            f(x) = {m! / (x! (m-x)!)} q^x (1-q)^(m-x),
                     f(x+1) = {m! / ((x+1)! (m-x-1)!)} q^(x+1) (1-q)^(m-x-1),
                     f(x+1)/f(x) = {(m-x)/(x+1)} {q/(1-q)}.
Poisson:             f(x) = λ^x e^(-λ) / x!,
                     f(x+1) = λ^(x+1) e^(-λ) / (x+1)!,
                     f(x+1)/f(x) = λ/(x+1).
Negative Binomial:   f(x) = {r(r+1)...(r+x-1) / x!} β^x / (1+β)^(x+r),
                     f(x+1) = {r(r+1)...(r+x) / (x+1)!} β^(x+1) / (1+β)^(x+r+1),
                     f(x+1)/f(x) = {β/(1+β)} {(x+r)/(x+1)}.
(a, b, 0) relationship:

For each of these three frequency distributions:
f(x+1) / f(x) = a + b/(x+1), x = 0, 1, 2, ...
where a and b depend on the parameters of the distribution:68

Distribution          a              b                  f(0)
Binomial              -q/(1-q)       (m+1)q/(1-q)       (1-q)^m
Poisson               0              λ                  e^(-λ)
Negative Binomial     β/(1+β)        (r-1)β/(1+β)       1/(1+β)^r

Loss Models writes this recursion formula equivalently as: p_k / p_{k-1} = a + b/k, k = 1, 2, 3, ...69
This relationship defines the (a, b, 0) class of frequency distributions.70
The (a, b, 0) class of frequency distributions consists of the three common distributions: Binomial, Poisson, and Negative Binomial.71 Therefore, it also includes the Bernoulli, which is a special case of the Binomial, and the Geometric, which is a special case of the Negative Binomial.
Note that a is positive for the Negative Binomial, zero for the Poisson, and negative for the Binomial.
These formulas can be useful when programming these frequency distributions into spreadsheets. One calculates f(0) and then one gets additional values of the density function via iteration:
f(x+1) = f(x){a + b/(x+1)}.
f(1) = f(0)(a + b). f(2) = f(1)(a + b/2). f(3) = f(2)(a + b/3). f(4) = f(3)(a + b/4), etc.

68 These a and b values are shown in the tables attached to the exam. This relationship is used in the Panjer Algorithm (recursive formula), a manner of computing either the aggregate loss distribution or a compound frequency distribution. For a member of the (a, b, 0) class, the values of a and b determine everything about the distribution. Given the density at zero, all of the densities would follow; however, the sum of all of the densities must be one.
69 See Definition 6.4 in Loss Models.
70 See Table 6.1 and Appendix B.2 in Loss Models. The (a, b, 0) class is distinguished from the (a, b, 1) class, to be discussed in a subsequent section, by the fact that the relationship holds starting with the density at zero, rather than possibly only starting with the density at one.
71 As stated in Loss Models, these are the only members of the (a, b, 0) class. This is proved in Lemma 6.6.1 of Insurance Risk Models, by Panjer and Willmot. Only certain combinations of a and b are acceptable. Each of the densities must be nonnegative and they must sum to one, a finite amount.
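As a sketch of the iteration described above (my addition; written in Python rather than a spreadsheet, and the helper name is mine), the function below builds the densities from f(0) using f(x+1) = f(x){a + b/(x+1)} for each of the three members, and checks that the densities sum to about 1.

```python
from math import exp

def ab0_densities(a, b, f0, n_max):
    # f(x+1) = f(x) * (a + b/(x+1)), starting from f(0) = f0
    f = [f0]
    for x in range(n_max):
        f.append(f[-1] * (a + b / (x + 1)))
    return f

lam = 2.0                                   # Poisson: a = 0, b = lambda, f(0) = e^(-lambda)
poisson = ab0_densities(0.0, lam, exp(-lam), 40)
m, q = 5, 0.1                               # Binomial: a = -q/(1-q), b = (m+1)q/(1-q), f(0) = (1-q)^m
binomial = ab0_densities(-q / (1 - q), (m + 1) * q / (1 - q), (1 - q)**m, m)
r, beta = 2.0, 3.0                          # Negative Binomial: a = beta/(1+beta), b = (r-1)beta/(1+beta)
negbin = ab0_densities(beta / (1 + beta), (r - 1) * beta / (1 + beta), (1 + beta)**(-r), 300)

for name, dens in [("Poisson", poisson), ("Binomial", binomial), ("Negative Binomial", negbin)]:
    print(name, sum(dens))                  # each is 1, up to truncation of the infinite tails
```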
Thinning and Adding:

Distribution          Thinning by factor of t     Adding n independent, identical copies
Binomial              q → tq                      m → nm
Poisson               λ → tλ                      λ → nλ
Negative Binomial     β → tβ                      r → nr

If for example, we assume 1/4 of all claims are large:

If All Claims                              Then Large Claims
Binomial m = 5, q = 0.04                   Binomial m = 5, q = 0.01
Poisson λ = 0.20                           Poisson λ = 0.05
Negative Binomial r = 2, β = 0.10          Negative Binomial r = 2, β = 0.025

In the Poisson case, small and large claims are independent Poisson Distributions.72

For X and Y independent:
X                              Y                              X + Y
Binomial(m_1, q)               Binomial(m_2, q)               Binomial(m_1 + m_2, q)
Poisson(λ_1)                   Poisson(λ_2)                   Poisson(λ_1 + λ_2)
Negative Binomial(r_1, β)      Negative Binomial(r_2, β)      Negative Binomial(r_1 + r_2, β)

If Claims each Year                        Then Claims for 6 Independent Years
Binomial m = 5, q = 0.04                   Binomial m = 30, q = 0.04
Poisson λ = 0.20                           Poisson λ = 1.20
Negative Binomial r = 2, β = 0.10          Negative Binomial r = 12, β = 0.10

72 As discussed in the section on the Gamma-Poisson Frequency Process, in the Negative Binomial case, the number of large and small claims are positively correlated. In the Binomial case, the number of large and small claims are negatively correlated.
Probability Generating Functions:

Recall that the probability generating function for a given distribution is P(z) = E[z^N].

Distribution          Probability Generating Function
Binomial              P(z) = {1 + q(z-1)}^m
Poisson               P(z) = e^(λ(z-1))
Negative Binomial     P(z) = {1 - β(z-1)}^(-r), z < 1 + 1/β

Parametric Models:

Some advantages of parametric models:
1. They summarize the information in terms of the form of the distribution and the parameter values.
2. They serve to smooth the empirical data.
3. They greatly reduce the dimensionality of the information.
In addition one can use parametric models to extrapolate beyond the largest observation. As will be discussed in a subsequent section, the behavior in the righthand tail is an important feature of any frequency distribution.

Some advantages of working with separate distributions of frequency and severity:73
1. Can obtain a deeper understanding of a variety of issues surrounding insurance.
2. Allows one to address issues of modification of an insurance contract (for example, deductibles.)
3. Frequency distributions are easy to obtain and do a good job of modeling the empirical situations.

73 See Section 6.1 of Loss Models.
Limits:

Since the probability generating function determines the distribution, one can take limits of a distribution by instead taking limits of the Probability Generating Function.
Assume one takes a limit of the probability generating function of a Binomial distribution with qm = λ as m → ∞ and q → 0:
P(z) = {1 + q(z-1)}^m = {1 + q(z-1)}^(λ/q) = [{1 + q(z-1)}^(1/q)]^λ → {e^(z-1)}^λ = e^(λ(z-1)),
where we have used the fact that as x → 0, (1 + ax)^(1/x) → e^a.
Thus the limit of the Binomial Probability Generating Function is the Poisson Probability Generating Function. Therefore, as we let q get very small in a Binomial but keep the mean constant, in the limit one approaches a Poisson with the same mean.74
For example, a Poisson (triangles) with mean 10 is compared to a Binomial (squares) with q = 1/3 and m = 30 (mean = 10, variance = 20/3):
[Graph comparing the Poisson and Binomial densities for n = 0 to 20.]
While the Binomial is shorter-tailed than the Poisson, they are not that dissimilar.

74 The limit of the probability generating function is the probability generating function of the limit of the distributions if it exists.
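The convergence can also be seen numerically. The sketch below (my addition; helper names are mine) compares the Binomial with m = 30, q = 1/3 from the graph to the Poisson with mean 10, and then repeats the comparison with m = 300, q = 1/30; the maximum difference between the densities shrinks.

```python
from math import exp, comb

def binomial_pmf(m, q):
    return [comb(m, n) * q**n * (1 - q)**(m - n) for n in range(m + 1)]

def poisson_pmf(lam, n_max):
    p = [exp(-lam)]
    for n in range(1, n_max + 1):
        p.append(p[-1] * lam / n)
    return p

lam = 10.0
for m in (30, 300):
    q = lam / m
    diff = max(abs(b - p) for b, p in zip(binomial_pmf(m, q), poisson_pmf(lam, m)))
    print(m, diff)     # the difference is much smaller for m = 300, q = 1/30
```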
Assume one takes a limit of the probability generating function of a Negative Binomial distribution with rβ = λ as r → ∞ and β → 0:
P(z) = {1 - β(z-1)}^(-r) = {1 - β(z-1)}^(-λ/β) = [{1 - β(z-1)}^(1/β)]^(-λ) → {e^(-(z-1))}^(-λ) = e^(λ(z-1)).
Thus the limit of the Negative Binomial Probability Generating Function is the Poisson Probability Generating Function. Therefore, as we let β get very close to zero in a Negative Binomial but keep the mean constant, in the limit one approaches a Poisson with the same mean.75
A Poisson (triangles) with mean 10 is compared to a Negative Binomial Distribution (squares) with r = 20 and β = 0.5 (mean = 10, variance = 15):
[Graph comparing the Poisson and Negative Binomial densities for n = 0 to 20.]
For the three distributions graphed here and previously, while the means are the same, the variances are significantly different; thus the Binomial is more concentrated around the mean while the Negative Binomial is more dispersed from the mean. Nevertheless, one can see how the three distributions are starting to resemble each other.76

75 The limit of the probability generating function is the probability generating function of the limit of the distributions if it exists.
76 They are each approximated by a Normal Distribution. While these three Normal Distributions have the same mean, they have different variances.
If the Binomial q were smaller and m larger such that the mean remained 10, for example q = 1/30 and m = 300, then the Binomial would have been much closer to the Poisson. Similarly, if in the Negative Binomial one had β closer to zero with r larger such that the mean remained 10, for example β = 1/9 and r = 90, then the Negative Binomial would have been much closer to the Poisson.
Thus the Poisson is the limit of either a series of Binomial or Negative Binomial Distributions as they "come from different sides."77 The Binomial has q go to zero; one adds up very many Bernoulli Trials, each with a very small chance of success. This approaches a constant chance of success per very small unit of time, which is a Poisson. Note that for each Binomial the mean is greater than the variance, but as q goes to zero the variance approaches the mean. For the Negative Binomial one lets β go to zero; one adds up very many Geometric distributions, each with a very small chance of a claim.78 Again this limit is a Poisson, but in this case for each Negative Binomial the variance is greater than the mean. As β goes to zero, the variance approaches the mean.
As mentioned previously, the Distribution Function of the Binomial Distribution is a form of the Incomplete Beta Function, while that of the Poisson is in the form of an Incomplete Gamma Function. As q → 0 and the Binomial approaches a Poisson, the Distribution Function of the Binomial approaches that of the Poisson. An Incomplete Gamma Function can thus be obtained as a limit of Incomplete Beta Distributions. Similarly, the Distribution Function of the Negative Binomial is a somewhat different form of the Incomplete Beta Distribution. As β → 0 and the Negative Binomial approaches a Poisson, the Distribution Function of the Negative Binomial approaches that of the Poisson. Again, an Incomplete Gamma Function can be obtained as a limit of Incomplete Beta Distributions.

77 One can also show this via the use of Stirlingʼs formula to directly calculate the limits rather than via the use of Probability Generating Functions.
78 The mean of a Geometric is β, thus as β → 0, the chance of a claim becomes very small. For the Negative Binomial, r = mean/β, so that as β → 0 for a fixed mean, r → ∞.
Modes:

The mode, where the density is largest, can be located by observing where f(x+1)/f(x) switches from being greater than 1 to being less than 1.79

Exercise: For a member of the (a, b, 0) frequency class, when is f(x+1)/f(x) greater than one, equal to one, and less than one?
[Solution: f(x+1)/f(x) = 1 when a + b/(x+1) = 1. This occurs when x = b/(1-a) - 1.
For x < b/(1-a) - 1, f(x+1)/f(x) > 1. For x > b/(1-a) - 1, f(x+1)/f(x) < 1.]

For example, for a Binomial Distribution with m = 10 and q = 0.23, a = -q/(1-q) = -0.2987 and b = (m+1)q/(1-q) = 3.2857. For x > b/(1-a) - 1 = 1.53, f(x+1)/f(x) < 1. Thus f(3) < f(2). For x < 1.53, f(x+1)/f(x) > 1. Thus f(2) > f(1). Therefore, the mode is 2.
In general, since for x < b/(1-a) - 1, f(x+1) > f(x), if c is the largest integer in b/(1-a), then f(c) > f(c-1). Since for x > b/(1-a) - 1, f(x+1) < f(x), f(c+1) < f(c). Thus c is the mode.
For a member of the (a, b, 0) class, the mode is the largest integer in b/(1-a).
If b/(1-a) is an integer, then f(b/(1-a) - 1) = f(b/(1-a)), and there are two modes.
For the Binomial Distribution, a = -q/(1-q) and b = (m+1)q/(1-q), so b/(1-a) = (m+1)q. Thus the mode is the largest integer in (m+1)q. If (m+1)q is an integer, there are two modes at: (m+1)q and (m+1)q - 1.
For the Poisson Distribution, a = 0 and b = λ, so b/(1-a) = λ. Thus the mode is the largest integer in λ. If λ is an integer, there are two modes at: λ and λ - 1.
For the Negative Binomial Distribution, a = β/(1+β) and b = (r-1)β/(1+β), so b/(1-a) = (r-1)β. Thus the mode is the largest integer in (r-1)β. If (r-1)β is an integer, there are two modes at: (r-1)β and (r-1)β - 1.
Note that in each case the mode is close to the mean.80 So one could usefully start a numerical search for the mode at the mean.

79 In general this is only a local maximum, but members of the (a, b, 0) class do not have local maxima other than the mode.
80 The means are mq, λ, and rβ.
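A quick check of the mode rule (my addition, in Python) for the Binomial example above with m = 10 and q = 0.23: the largest integer in b/(1-a) = (m+1)q agrees with a direct scan of the densities.

```python
from math import comb, floor

m, q = 10, 0.23
a = -q / (1 - q)
b = (m + 1) * q / (1 - q)

mode_rule = floor(b / (1 - a))     # largest integer in (m+1)q = 2.53, i.e. 2
f = [comb(m, x) * q**x * (1 - q)**(m - x) for x in range(m + 1)]
mode_scan = max(range(m + 1), key=lambda x: f[x])   # picks one mode if there happen to be two
print(mode_rule, mode_scan)        # both 2
```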
Moments:

Formulas for the Factorial Moments of the (a, b, 0) class have been discussed in a previous section. It can be derived from those formulas that for a member of the (a, b, 0) class:

Mean              (a + b)/(1 - a)
Second Moment     (a + b)(a + b + 1)/(1 - a)^2
Variance          (a + b)/(1 - a)^2
Third Moment      (a + b){(a + b + 1)(a + b + 2) + a - 1}/(1 - a)^3
Skewness          (a + 1)/√(a + b)

A Generalization of the (a, b, 0) Class:

The (a, b, 0) relationship is: f(x+1)/f(x) = a + {b/(x+1)}, x = 0, 1, 2, ..., or equivalently: p_k / p_{k-1} = a + b/k, k = 1, 2, 3, ...
A more general relationship is: p_k / p_{k-1} = (ak + b)/(k + c), k = 1, 2, 3, ...
If c = 0, then this would reduce to the (a, b, 0) relationship.
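These closed forms are easy to verify numerically. The sketch below (my addition) recomputes the mean, variance, and skewness of a Negative Binomial by brute force from its densities and compares them to (a+b)/(1-a), (a+b)/(1-a)^2, and (a+1)/√(a+b).

```python
r, beta = 4.0, 1.5
a = beta / (1 + beta)
b = (r - 1) * beta / (1 + beta)

# densities via the (a, b, 0) recursion, truncated far out in the tail
f = [(1 + beta)**(-r)]
for x in range(400):
    f.append(f[-1] * (a + b / (x + 1)))

mean = sum(x * p for x, p in enumerate(f))
var = sum((x - mean)**2 * p for x, p in enumerate(f))
third = sum((x - mean)**3 * p for x, p in enumerate(f))

print(mean, (a + b) / (1 - a))                     # r*beta = 6
print(var, (a + b) / (1 - a)**2)                   # r*beta*(1+beta) = 15
print(third / var**1.5, (a + 1) / (a + b)**0.5)    # (1+2*beta)/sqrt(r*beta*(1+beta))
```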
Contagion Model:81

Assume one has a claim intensity of ξ. The chance of having a claim over an extremely small period of time Δt is approximately ξΔt.82 As mentioned previously, if the claim intensity is a constant over time, then the number of claims observed over a period of time t is given by a Poisson, with parameter ξt. If the claim intensity depends on the number of claims that have occurred so far, then the frequency distribution is other than Poisson.
Given one has had x-1 claims so far, let ξ_x Δt be the chance of having the xth claim in small time period Δt. Assume ξ_x = γ + δ(x-1).
Then for δ > 0, one gets a Negative Binomial Distribution. As one observes more claims, the chance of observing another claim goes up. This is referred to as positive contagion; examples might be claims due to a contagious disease or from a very large fire. Over time period (0, t), the parameters of the Negative Binomial are r = γ/δ and β = e^(δt) - 1.
Then for δ < 0, one gets a Binomial distribution. As one observes more claims, the chance of future claims goes down. This is referred to as negative contagion. Over time period (0, t), the parameters of the Binomial are m = -γ/δ and q = 1 - e^(δt).
For δ = 0 one gets the Poisson. There is no contagion and the claim intensity is constant.
Thus the contagion model is another mathematical connection between the three common frequency distributions. We expect as δ → 0 in either the Binomial or Negative Binomial that we approach a Poisson. This is indeed the case, as discussed previously.

81 Not on the syllabus of your exam. See pages 52-53 of Mathematical Methods of Risk Theory by Buhlmann.
82 The claim intensity is analogous to the force of mortality in Life Contingencies.
As used in the Heckman-Meyers algorithm to calculate aggregate losses, the frequency distributions are parameterized in a related but somewhat different manner via their mean λ and a "contagion parameter" c:83

Distribution          λ       c
Binomial              mq      -1/m
Poisson               λ       0
Negative Binomial     rβ      1/r
HyperGeometric Distribution:

For the HyperGeometric Distribution with parameters m, n, and N, the density is:84
f(x) = C(m, x) C(N-m, n-x) / C(N, n), x = 0, 1, ..., n.
f(x+1)/f(x) = {C(m, x+1) C(N-m, n-x-1)} / {C(m, x) C(N-m, n-x)}
= {(m-x)! x! (N-m-n+x)! (n-x)!} / {(m-x-1)! (x+1)! (N-m-n+x+1)! (n-x-1)!}
= {(m-x)/(x+1)} {(n-x)/(N-m-n+x+1)}.
Thus the HyperGeometric Distribution is not a member of the (a, b, 0) family.
Mean = nm/N. Variance = nm(N-m)(N-n) / {(N-1)N^2}.

83 Not on the syllabus of your exam. See PCAS 1983 pp. 35-36, "The Calculation of Aggregate Loss Distributions from Claim Severity and Claim Count Distributions," by Phil Heckman and Glenn Meyers.
84 Not on the syllabus of your exam. See for example, A First Course in Probability, by Sheldon Ross. If we had an urn with N balls, of which m were white, and we took a sample of size n, then f(x) is the probability that x of the balls in the sample were white. For example, tests with 35 questions will be selected at random from a bank of 500 questions. Treat the 35 questions on the first randomly selected test as white balls. Then the number of white balls in a sample of size n from the 500 balls is HyperGeometric with m = 35 and N = 500. Thus the number of questions a second test of 35 questions has in common with the first test is HyperGeometric with m = 35, n = 35, and N = 500. The densities from 0 to 10 are: 0.0717862, 0.204033, 0.272988, 0.228856, 0.134993, 0.0596454, 0.0205202, 0.00564155, 0.00126226, 0.000232901, 0.000035782.
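A brief Python sketch (my addition; the function name is mine) of the HyperGeometric density for the footnoteʼs example with m = 35, n = 35, N = 500; it reproduces the listed densities and the mean nm/N.

```python
from math import comb

def hypergeometric_pmf(m, n, N, x):
    # Prob[x white balls in a sample of size n from N balls, m of them white]
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

m, n, N = 35, 35, 500
print([round(hypergeometric_pmf(m, n, N, x), 7) for x in range(11)])
# 0.0717862, 0.204033, 0.272988, ...
print(sum(x * hypergeometric_pmf(m, n, N, x) for x in range(n + 1)))   # nm/N = 2.45
```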
Problems: 11.1 (1 point) Which of the following statements are true? 1. The variance of the Negative Binomial Distribution is less than the mean. 2. The variance of the Poisson Distribution only exists for λ > 2. 3. The variance of the Binomial Distribution is greater than the mean. A. 1 B. 2 C. 3 D. 1, 2, and 3 E. None of A, B, C, or D 11.2 (1 point) A member of the (a, b, 0) class of frequency distributions has a = -2. Which of the following types of Distributions is it? A. Binomial B. Poisson C. Negative Binomial D. Logarithmic E. None A, B, C, or D. 11.3 (1 point) A member of the (a, b, 0) class of frequency distributions has a = 0.4 and b = 2. Given f(4) = 0.1505, what is f(7)? A. Less than 0.06 B. At least 0.06, but less than 0.07 C. At least 0.07, but less than 0.08 D. At least 0.08, but less than 0.09 E. At least 0.09 11.4 (2 points) X is a discrete random variable with a probability function which is a member of the (a,b,0) class of distributions. P(X = 1) = 0.0064. P(X = 2) = 0.0512. P(X = 3) = 0.2048. Calculate P(X = 4). (A) 0.37 (B) 0.38 (C) 0.39 (D) 0.40 (E) 0.41 11.5 (2 points) For a discrete probability distribution, you are given the recursion relation: f(x+1) = {
0.6/(x+1) + 1/3} f(x), x = 0, 1, 2, ...
Determine f(3). (A) 0.09 (B) 0.10
(C) 0.11
(D) 0.12
(E) 0.13
11.6 (2 points) A member of the (a, b, 0) class of frequency distributions has a = 0.4, and b = 2.8. What is the mode? A. 0 or 1 B. 2 C. 3 D. 4 E. 5 or more 11.7 (2 points) For a discrete probability distribution, you are given the recursion relation: p(x) = (-2/3 + 4/x)p(x-1), x = 1, 2,…. Determine p(3). (A) 0.19 (B) 0.20 (C) 0.21 (D) 0.22 (E) 0.23
11.8 (3 points) X is a discrete random variable with a probability function which is a member of the (a, b, 0) class of distributions. P(X = 100) = 0.0350252. P(X = 101) = 0.0329445. P(X = 102) = 0.0306836. Calculate P(X = 105). (A) .022 (B) .023 (C) .024 (D) .025 (E) .026 11.9 (2 points) X is a discrete random variable with a probability function which is a member of the (a, b, 0) class of distributions. P(X = 10) = 0.1074. P(X = 11) = 0. Calculate P(X = 6). (A) 6% (B) 7% (C) 8% (D) 9% (E) 10% 11.10 (3 points) Show that the (a, b, 0) relationship with a = -2 and b = 6 leads to a legitimate distribution while a = -2 and b = 5 does not. 11.11 (2 points) A discrete probability distribution has the following properties: (i) pk = c(-1 + 4/k)pk-1 for k = 1, 2,… (ii) p0 = 0.7. Calculate c. (A) 0.06
(B) 0.13
(C) 0.29
(D) 0.35
(E) 0.40
11.12 (3 points) Show that the (a, b, 0) relationship with a = 1 and b = -1/2 does not lead to a legitimate distribution. 11.13 (3 points) A member of the (a, b, 0) class of frequency distributions has been fit via maximum likelihood to the number of claims observed on 10,000 policies. Number of claims Number of Policies Fitted Model 0 6587 6590.79 1 2598 2586.27 2 647 656.41 3 136 136.28 4 25 25.14 5 7 4.29 6 or more 0 0.80 Determine what type of distribution has been fit and the value of the fitted parameters.
11.14 (4, 5/86, Q.50) (1 point) Which of the following statements are true? 1. For a Poisson distribution the mean and variance are equal. 2. For a binomial distribution the mean is less than the variance. 3. The negative binomial distribution is a useful model of the distribution of claim frequencies of a heterogeneous group of risks. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 11.15 (4B, 11/92, Q.21) (1 point) A portfolio of 10,000 risks yields the following: Number of Claims Number of Insureds 0 6,070 1 3,022 2 764 3 126 4 18 Based on the portfolio's sample moments, which of the following distributions provides the best fit to the portfolio's number of claims? A. Binomial B. Poisson C. Negative Binomial D. Lognormal E. Pareto 11.16 (5A, 11/94, Q.24) (1 point) Let X and Y be random variables representing the number of claims for two separate portfolios of insurance risks. You are asked to model the distributions of the number of claims using either the Poisson or Negative Binomial distributions. Given the following information about the moments of X and Y, which distribution would be the best choice for each? E[X] = 2.40 E[Y] = 3.50 E[X2 ] = 8.16 E[Y2 ] = 20.25 A. X is Poisson and Y is Negative Binomial B. X is Poisson and Y is Poisson C. X is Negative Binomial and Y is Negative Binomial D. X is Negative Binomial and Y is Poisson E. Neither distribution is appropriate for modeling numbers of claims. 11.17 (5A, 11/99, Q.39) (2 points) You are given the following information concerning the distribution, S, of the aggregate claims of a particular line of business: E[S] = $500,000 and Var[S] = 7.5 x 109 . The claim severity follows a Normal Distribution with both mean and standard deviation equal to $5,000. What conclusion can be drawn regarding the individual claim propensity of the insureds in this line of business?
11.18 (3, 5/01, Q.25 & 2009 Sample Q.108) (2.5 points) For a discrete probability distribution, you are given the recursion relation p(k) = (2/k) p(k-1), k = 1, 2,…. Determine p(4). (A) 0.07 (B) 0.08 (C) 0.09 (D) 0.10 (E) 0.11 11.19 (3, 11/02, Q.28 & 2009 Sample Q.94) (2.5 points) X is a discrete random variable with a probability function which is a member of the (a,b,0) class of distributions. You are given: (i) P(X = 0) = P(X = 1) = 0.25 (ii) P(X = 2) = 0.1875 Calculate P(X = 3). (A) 0.120 (B) 0.125 (C) 0.130 (D) 0.135 (E) 0.140 11.20 (CAS3, 5/04, Q.32) (2.5 points) Which of the following statements are true about the sums of discrete, independent random variables? 1. The sum of two Poisson random variables is always a Poisson random variable. 2. The sum of two negative binomial random variables with parameters (r, β) and (r', β') is a negative binomial random variable if r = r'. 3. The sum of two binomial random variables with parameters (m, q) and (m', q') is a binomial random variable if q = q'. A. None of 1, 2, or 3 is true. B. 1 and 2 only C. 1 and 3 only D. 2 and 3 only E. 1, 2, and 3 11.21 (CAS3, 5/05, Q.16) (2.5 points) Which of the following are true regarding sums of random variables? 1. The sum of two independent negative binomial distributions with parameters (r1 , β1) and (r2 , β2) is negative binomial if and only if r1 = r2 . 2. The sum of two independent binomial distributions with parameters (q1 , m1 ) and (q2 , m2 ) is binomial if and only if m1 = m2 . 3. The sum of two independent Poison distributions with parameters λ1 and λ2 is Poisson if and only if λ1 = λ2. A. None are true
B. 1 only
C. 2 only
D. 3 only
E. 1 and 3 only
2013-4-1,
Frequency Distributions, §11 (a, b, 0) Class
HCM 10/4/12,
11.22 (SOA M, 5/05, Q.19 & 2009 Sample Q.166) (2.5 points) A discrete probability distribution has the following properties: (i) pk = c (1 + 1/k) pk-1 for k = 1, 2,… (ii) p0 = 0.5. Calculate c. (A) 0.06
(B) 0.13
(C) 0.29
(D) 0.35
(E) 0.40
11.23 (CAS3, 5/06, Q.31) (2.5 points) N is a discrete random variable from the (a, b, 0) class of distributions. The following information is known about the distribution:
• Pr(N = 0) = 0.327680 • Pr(N = 1) = 0.327680 • Pr(N = 2) = 0.196608 • E(N) = 1.25 Based on this information, which of the following are true statements? I. Pr(N = 3) = 0.107965 II. N is from a Binomial distribution. III. N is from a Negative Binomial distribution. A. I only B. II only C. III only D. I and II E. I and III
Page 204
2013-4-1,
Frequency Distributions, §11 (a, b, 0) Class
HCM 10/4/12,
Page 205
Solutions to Problems: 11.1. E. 1. The variance of the Negative Binomial Distribution is greater than the mean. Thus Statement #1 is false. 2. The variance of the Poisson always exists (and is equal to the mean.) Thus Statement #2 is false. 3. The variance of the Binomial Distribution is less than the mean. Thus Statement #3 is false. 11.2. A. For a < 0, one has a Binomial Distribution. Comment: Since a = -q/(1-q), q = a/(a-1) = -2/(-3) = 2/3. a = 0 is a Poisson, 1> a > 0 is a Negative Binomial. The Logarithmic Distribution is not a member of the (a,b,0) class. The Logarithmic Distribution is a member of the (a,b,1) class. 11.3. B. f(x+1) = f(x) {a + b/(x+1)} = f(x){.4 + 2/(x+1)} = f(x)(.4)(x+6)/(x+1). Then proceed iteratively. For example f(5) = f(4)(.4)(10)/5 = (.1505)(.8) = .1204. n f(n)
0 0.0467
1 0.1120
2 0.1568
3 0.1672
4 0.1505
5 0.1204
6 0.0883
7 0.0605
Comment: Since 0 < a < 1 we have a Negative Binomial Distribution. r = 1 + b/a = 1+ (2/.4) = 6. β = a/(1-a) = .4/.6 = 2/3. Thus once a and b are given in fact f(4) is determined. Normally one would compute f(0) = (1+β)-r = .66 = .0467, and proceed iteratively from there. 11.4. E. For a member of the (a,b,0) class of distributions, f(x+1) / f(x) = a + {b / (x+1)}. f(2)/f(1) = a + b/2. ⇒ .0512/.0064 = 8 = a + b/2. f(3)/f(2) = a + b/3. ⇒ .2048/.0512 = 4 = a + b/3. Therefore, a = -4 and b = 24. f(4) = f(3)(a + b/4) = (.2048)(-4 + 24/4) = 0.4096. Alternately, once one solves for a and b, a < 0 ⇒ a Binomial Distribution. -4 = a = -q/(1-q). ⇒ q = .8. 24 = b = (m+1)q/(1-q). ⇒ m + 1 = 6. ⇒ m = 5. f(4) = (5)(.84 )(.2) = 0.4096. Comment: Similar to 3, 11/02, Q.28. 11.5. B. This is a member of the (a, b, 0) class of frequency distributions with a = 1/3 and b = 0.6. Since a > 0, this is a Negative Binomial, with a = β/(1+β) = 1/3, and b = (r - 1)β/(1 + β) = .6. Therefore, r - 1 = .6/(1/3) = 1.8. ⇒ r = 2.8. β = 0.5. f(3) = {(2.8)(3.8)(4.8)/3!} .53 /(1.52.8+3) = 0.1013. Comment: Similar to 3, 5/01, Q.25. f(x+1) = f(x) {a + b/(x+1)}, x = 0, 1, 2, ...
2013-4-1,
Frequency Distributions, §11 (a, b, 0) Class
HCM 10/4/12,
Page 206
11.6. D. For a member of the (a, b, 0) class, the mode is the largest integer in b/(1-a) = 2.8/(1-.4) = 4.667. Therefore, the mode is 4. Alternately, f(x+1)/f(x) = a + b/(x+1) = .4 + 2.8/(x+1). x f(x+1)/f(x)
0 3.200
1 1.800
2 1.333
3 1.100
4 0.960
5 0.867
6 0.800
Therefore, f(4) = 1.1 f(3) > f(3), but f(5) = .96f(4) < f(4). Therefore, the mode is 4. Alternately, since a > 0, this a Negative Binomial Distribution with a = β/(1+β) and b = (r-1)β/(1+β). Therefore, β = a/(1-a) = .4/.6 = 2/3 and r = b/a + 1 = 2.8/.4 + 1 = 8. The mode of a Negative Binomial is the largest integer in: (r-1)β = (7)(2/3) = 4.6667. Therefore, the mode is 4. 11.7. E. This is a member of the (a, b, 0) class of frequency distributions with a = -2/3 and b = 4. Since a < 0, this is a Binomial, with a = -q/(1-q) = -2/3, and b = (m+1)q/(1-q) = 4. Therefore, m + 1 = 4/(2/3) = 6; m = 5. q = .4. f(3) = {(5!)/((3!)(2!))}.43 .62 = 0.2304. Comment: Similar to 3, 5/01, Q.25. f(x) = f(x-1) {a + b/x}, x = 1, 2, 3, ... 11.8. B. For a member of the (a,b,0) class of distributions, f(x+1) / f(x) = a + {b / (x+1)}. f(101)/f(100) = a + b/101. ⇒ 0.0329445/0.0350252 = .940594 = a + b/101. f(102)/f(101) = a + b/102. ⇒ 0.0306836 /0.0329445 = .931372 = a + b/102. Therefore, a = 0 and b = 95.0. f(105) = f(102)(a + b/103)(a + b/104)(a + b/105) = (0.0306836)(95/103)(95/104)(95/105) = 0.0233893. Comment: Alternately, once one solves for a and b, a = 0 ⇒ a Poisson Distribution. λ = b = 95. f(105) = e-95(95105)(105!) = .0233893, difficult to calculate using most calculators. 11.9. D. The Binomial is the only member of the (a, b, 0) class with finite support. P(X = 11) = 0 and P(X = 10) > 0 ⇒ m = 10.
.1074 = P(X = 10) = q10. ⇒ q = .800.
P(X = 6) = 10!/(6!4!) (1-q)4 q6 = (210) .24 .86 = 0.088. 11.10. f(1) = f(0) (a + b). f(2) = f(1) (a + b/2). f(3) = f(2) (a + b/3). f(4) = f(3) (a + b/4), etc. For a = -2 and b = 6: f(1) = f(0) (-2 + 6) = 4 f(0). f(2) = f(1) (-2 + 6/2) = f(1). f(3) = f(2) (-2 + 6/3) = 0. f(4) = 0, etc. This is a Binomial with m = 2 and q = a/(a-1) = 2/3. f(0) = 1/9. f(1) = 4/9. f(2) = 4/9. For a = -2 and b = 5: f(1) = f(0) (-2 + 5) = 3 f(0). f(2) = f(1) (-2 + 5/2) = 1.5f(1). f(3) = f(2) (-2 + 5/3) < 0. No good! Comment: Similar to Exercise 6.3 in Loss Models. For a < 0, we require that b/a be a negative integer.
2013-4-1,
Frequency Distributions, §11 (a, b, 0) Class
HCM 10/4/12,
Page 207
11.11. B. This is the (a, b, 0) relationship, with a = -c and b = 4c. For the Binomial, a < 0. For the Poisson a = 0. For the Negative Binomial, a > 0. c must be positive, since the densities are positive, therefore, a < 0 and this is a Binomial. For the Binomial, a = -q/(1-q) and b = (m+1)q/(1-q). b = -4a. ⇒ m + 1 = 4. ⇒ m = 3. 0.7 = p0 = (1 - q)m = (1 - q)3 . ⇒ q = .1121. c = -a = q/(1-q) = .1121/.8879 = 0.126. Comment: Similar to SOA M, 5/05, Q.19. 11.12. f(1) = f(0) (1 - 1/2) = (1/2) f(0). f(2) = f(1) (1 - 1/4) = (3/4)f(1). f(3) = f(2) (1 - 1/6) = (5/6)f(2). f(4) = f(3) (1 - 1/8) = (7/8)f(3). f(5) = f(4) (1 - 1/10) = (9/10)f(4). The sum of these densities is: f(0){1 + 1/2 + (3/4)(1/2) + (5/6)(3/4)(1/2) + (7/8)(5/6)(3/4)(1/2) + (9/10)(7/8)(5/6)(3/4)(1/2) + ...} f(0){1 + 1/2 + 3/8 + 5/16 + 35/128 + 315/1280 + ...} > f(0){1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + ...}. However, the sum 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + ..., diverges. Therefore, these densities would sum to infinity. Comment: We require that a < 1. a is positive for a Negative Binomial; a = β/(1 + β) < 1. 11.13. For a member of the (a, b, 0) class f(1)/f(0) = a + b, and f(2)/f(1) = a + b/2. a + b = 2586.27/6590.79 = 0.39241. a + b/2 = 656.41/2586.27 = 0.25381.
⇒ b = 0.27720. ⇒ a = 0.11521. Looking in Appendix B in the tables attached to the exam, a is positive for the Negative Binomial. Therefore, we have a Negative Binomial. 0.11521 = a = β/(1+β).
⇒ 1/β = 1/0.11521 - 1 = 7.6798. ⇒ β = 0.1302. 0.27720 = b = (r-1) β/(1+β).
⇒ r - 1 = 0.27720/0.11521 = 2.4060. ⇒ r = 3.406. Comment: Similar to Exercise 16.21b in Loss Models. 11.14. C. 1. True. 2. False. The variance = nq(1-q) is less than the mean = nq, since q < 1. 3. True. Statement 3 is referring to the mixture of Poissons via a Gamma, which results in a Negative Binomial frequency distribution for the entire portfolio.
2013-4-1,
Frequency Distributions, §11 (a, b, 0) Class
HCM 10/4/12,
Page 208
11.15. B. The mean frequency is .5 and the variance is: .75 - .52 = .5. Number of Insureds 6070 3022 764 126 18 Average
Number of Claims 0 1 2 3 4
Square of Number of Claims 0 1 4 9 16
0.5000
0.7500
Since estimated mean = estimated variance, we expect the Poisson to provide the best fit. Comment: If the estimated mean is approximately equal to the estimated variance, then the Poisson is likely to provide a good fit. The Pareto and the LogNormal are continuous distributions not used to fit discrete frequency distributions. 11.16. A. Var[X] = E[X2 ] - E[X]2 = 8.16 - 2.402 = 2.4 = E[X], so a Poisson Distribution is a good choice for X. Var[Y] = E[Y2 ] - E[Y]2 = 20.25 - 3.502 = 8 > 3.5 = E[Y], so a Negative Binomial Distribution is a good choice for Y. 11.17. Mean frequency = $500,000/$5000 = 100. Assuming frequency and severity are independent: Var[S] = 7.5 x 109 = (100)(50002 ) + (50002 ) (Variance of the frequency). Variance of the frequency = 200. Thus if each insured has the same frequency distribution, then it has variance > mean, so it might be a Negative Binomial. Alternately, each insured could have a Poisson frequency, but with the means varying across the portfolio. In that case, the mean of mixing distribution = 100. When mixing Poisons, Variance of the mixed distribution = Mean of mixing Distribution + Variance of the mixing distribution, so the variance of the mixing distribution = 200 - 100 = 100. Comment: There are many possible other answers. 11.18. C. f(x+1)/f(x) = 2/(x+1), x = 0, 1, 2,... This is a member of the (a, b , 0) class of frequency distributions: with f(x+1)/f(x) = a + b/(x+1), for a = 0 and b = 2. Since a = 0, this is a Poisson with λ = b = 2. f(4) = e-2 24 /4! = 0.090. Alternately, let f(0) = c. Then f(1) = 2c, f(2) = 22 c/2!, f(3) = 23 c/3!, f(4) = 24 c/4!, .... 1 = Σ f(i) = Σ2ic/i! = cΣ2i/i! = c e2 . Therefore, c = e-2. f(4) = e-2 24 /4! = 0.090.
2013-4-1,
Frequency Distributions, §11 (a, b, 0) Class
HCM 10/4/12,
Page 209
11.19. B. For a member of the (a, b, 0) class of distributions, f(x+1) / f(x) = a + {b / (x+1)}. f(1)/f(0) = a + b. ⇒ .25/.25 = 1 = a + b. f(2)/f(1) = a + b/2. ⇒ .1875/.25 = .75 = a + b/2. Therefore, a = .5 and b = .5. f(3) = f(2)(a + b/3) = (.1875)(.5 + .5/3) = 0.125. Alternately, once one solves for a and b, a > 0 ⇒ a Negative Binomial Distribution. 1/2 = a = β/(1 + β). ⇒ β = 1. 1/2 = b = (r-1)β/(1 + β). ⇒ r - 1 = 1. ⇒ r = 2. f(3) = r(r + 1)(r + 2) β3/{(1 + β)r+3 3!} = (2)(3)(4)/{(25 )(6)} = 0.125. 11.20. C. 1. True. 2. False. Would be true if β = β', in which case the sum would have the sum of the r parameters. 3. True. The sum would have the sum of the m parameters. Comment: Note the requirement that the variables be independent. 11.21. A. The sum of two independent negative binomial distributions with parameters (r1 , β1) and (r2 , β2) is negative binomial if and only if β1 = β2. Statement 1 is false. The sum of two independent binomial distributions with parameters (q1 , m1 ) and (q2 , m2 ) is binomial if and only if q1 = q2 . Statement 2 is false. The sum of two independent Poison distributions with parameters λ1 and λ2 is Poisson, regardless of the values of lambda. Statement 3 is false. 11.22. C. This is the (a, b, 0) relationship, with a = c and b = c. For the Binomial, a < 0. For the Poisson a = 0. For the Negative Binomial, a > 0. c must be positive, since the densities are positive, therefore, a > 0 and this is a Negative Binomial. For the Negative Binomial, a = β/(1+β) and b = (r-1)β/(1+β). a = b. ⇒ r - 1 = 1. ⇒ r = 2. 0.5 = p0 = 1/(1+β)r = 1/(1+β)2 . ⇒ (1+β)2 = 2. ⇒ β = c = a = β/(1+β) = 0.4142/1.4142 = 0.293.
2 - 1 = 0.4142.
11.23. C. For a member of the (a, b, 0) class, f(1)/f(0) = a + b, and f(2)/f(1) = a + b/2.
Therefore, a + b = 1, and a + b/2 = 0.196608/0.327680 = 0.6. ⇒ a = 0.2 and b = 0.8.
Since a is positive, we have a Negative Binomial Distribution. Statement III is true.
f(3) = f(2)(a + b/3) = (0.196608)(0.2 + 0.8/3) = 0.0917504. Statement I is false.
Comment: 0.2 = a = β/(1+β) and 0.8 = b = (r-1)β/(1+β). ⇒ r = 5 and β = 0.25.
E[N] = rβ = (5)(0.25) = 1.25, as given.
f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(5)(6)(7)/6} 0.25^3/1.25^8 = 0.0917504.
Section 12, Accident Profiles85

Constructing an "Accident Profile" is a technique in Loss Models that can be used to decide whether data was generated by a member of the (a, b, 0) class of frequency distributions, and if so which member.

As discussed previously, the (a, b, 0) class of frequency distributions consists of the three common distributions: Binomial, Poisson, and Negative Binomial. Therefore, it also includes the Bernoulli, which is a special case of the Binomial, and the Geometric, which is a special case of the Negative Binomial.

As discussed previously, for members of the (a, b, 0) class: f(x+1) / f(x) = a + b / (x+1), where a and b depend on the parameters of the distribution:86

Distribution           a              b                 f(0)
Binomial               -q/(1-q)       (m+1)q/(1-q)      (1-q)^m
Poisson                0              λ                 e^-λ
Negative Binomial      β/(1+β)        (r-1)β/(1+β)      1/(1+β)^r
Note that a < 0 is a Binomial, a = 0 is a Poisson, and 1 > a > 0 is a Negative Binomial. For the Binomial: q = a/(a-1) = |a| / ( |a| +1). For the Negative Binomial: β = a/(1-a). For the Binomial: m = -(a+b)/a = (a + b)/ |a|.87 The Bernoulli has m =1 and b = -2a. For the Poisson: λ = b. For the Negative Binomial: r = 1 + b/a. The Geometric has r =1 and b = 0. Thus given values of a and b, one can determine which member of the (a,b,0) class one has and its parameters.
85 See the latter portion of Section 6.5 of Loss Models.
86 See Appendix B of Loss Models.
87 Since for the Binomial m is an integer, we require that b/|a| be an integer.
Accident Profile:

Also note that for a member of the (a, b, 0) class, (x+1)f(x+1)/f(x) = (x+1)a + b, so that (x+1)f(x+1)/f(x) is linear in x. It is a straight line with slope a and intercept a + b. Thus graphing (x+1)f(x+1)/f(x) can be a useful method of determining whether one of these three distributions fits the given data.88 If a straight line does seem to fit this "accident profile", then one should use a member of the (a, b, 0) class. The slope determines which of the three distributions is likely to fit: if the slope is close to zero then a Poisson, if significantly negative then a Binomial, and if significantly positive then a Negative Binomial.

For example, here is the accident profile for some data:

Number of Claims    Observed    Observed Density Function    (x+1)f(x+1)/f(x)
0                   17,649      0.73932                      0.27361
1                   4,829       0.20229                      0.45807
2                   1,106       0.04633                      0.62116
3                   229         0.00959                      0.76856
4                   44          0.00184                      1.02273
5                   9           0.00038                      2.66667
6                   4           0.00017                      1.75000
7                   1           0.00004                      8.00000
8                   1           0.00004
9&+                 0

Prior to the tail where the data thins out, (x+1)f(x+1)/f(x) approximately follows a straight line with a positive slope of about 0.2, which indicates a Negative Binomial with β/(1+β) ≅ 0.2.89 90 The intercept is rβ/(1+β), so that r ≅ 0.27 / 0.2 ≅ 1.4.91

In general, an accident profile is used to see whether data is likely to have come from a member of the (a, b, 0) class. One would do this test prior to attempting to fit a Negative Binomial, Poisson, or Binomial Distribution to the data. One starts with the hypothesis that the data was drawn from a member of the (a, b, 0) class, without specifying which one. If this hypothesis is true, the accident profile should be approximately linear.92
88 This computation is performed using the empirical densities.
89 One should not significantly rely on those ratios involving few observations.
90 Slope is: a = β/(1+β).
91 Intercept is: a + b = β/(1+β) + (r-1)β/(1+β) = rβ/(1+β).
92 Approximate, because any finite sample of data is subject to random fluctuations.
If the accident profile is "approximately" linear, then we do not reject the hypothesis, and we decide which member of the (a, b, 0) class to fit based on the slope of this line.93
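A minimal sketch of my own (not from the text) of computing an accident profile from observed claim counts follows; the data is the example above, and the variable names are my own.

```python
# Compute (x+1) f(x+1) / f(x) from observed claim counts.
observed = [17649, 4829, 1106, 229, 44, 9, 4, 1, 1, 0]

n = sum(observed)
density = [count / n for count in observed]

profile = []
for x in range(len(observed) - 1):
    if observed[x] > 0:
        profile.append((x, (x + 1) * density[x + 1] / density[x]))

for x, value in profile:
    print(x, round(value, 5))
# The early values 0.274, 0.458, 0.621, 0.769, ... increase roughly linearly
# with a positive slope, pointing toward a Negative Binomial.
```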
Comparing the Mean to the Variance:

Another way to decide which of the members of the (a, b, 0) class is most likely to fit a given set of data is to compare the sample mean and sample variance.

Binomial              Mean > Variance
Poisson               Mean = Variance94
Negative Binomial     Mean < Variance
93 There is not a numerical statistical test to perform, such as with the Chi-Square Test.
94 For data from a Poisson Distribution, the sample mean and sample variance will be approximately equal rather than equal, because any finite sample of data is subject to random fluctuations.
Problems: 12.1 (2 points) You are given the following accident data: Number of accidents Number of policies 0 91,304 1 7,586 2 955 3 133 4 18 5 3 6 1 7+ 0 Total 100,000 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1 (E) None of A, B, C, or D 12.2 (3 points) You are given the following accident data: Number of accidents Number of policies 0 860 1 2057 2 2506 3 2231 4 1279 5 643 6 276 7 101 8 41 9 4 10 2 11&+ 0 Total 10,000 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1 (E) None of the above
12.3 (3 points) You are given the following data on the number of runs scored during half innings of major league baseball games from 1980 to 1998: Runs Number of Occurrences 0 518,288 1 105,070 2 47,936 3 21,673 4 9736 5 4033 6 1689 7 639 8 274 9 107 10 36 11 25 12 5 13 7 14 1 15 0 16 1 Total 709,460 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1 (E) None of the above
12.4 (3 points) You are given the following accident data: Number of accidents Number of policies 0 820 1 1375 2 2231 3 1919 4 1397 5 1002 6 681 7 330 8 172 9 56 10 14 11 3 12&+ 0 Total 10,000 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1 (E) None of the above 12.5 (2 points) You are given the following distribution of the number of claims per policy during a one-year period for 20,000 policies. Number of claims per policy Number of Policies 0 6503 1 8199 2 4094 3 1073 4 128 5 3 6+ 0 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1 (E) None of the above
12.6 (2 points) You are given the following distribution of the number of claims on motor vehicle polices: Number of claims in a year Observed frequency 0 565,664 1 68,714 2 5,177 3 365 4 24 5 6 6 0 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial, r ≤ 1 (D) Negative Binomial, r > 1 (E) None of the above 12.7 (4, 5/00, Q.40) (2.5 points) You are given the following accident data from 1000 insurance policies: Number of accidents Number of policies 0 100 1 267 2 311 3 208 4 87 5 23 6 4 7+ 0 Total 1000 Which of the following distributions would be the most appropriate model for this data? (A) Binomial (B) Poisson (C) Negative Binomial (D) Normal (E) Gamma
12.8 (4, 11/03, Q.32 & 2009 Sample Q.25) (2.5 points) The distribution of accidents for 84 randomly selected policies is as follows: Number of Accidents Number of Policies 0 32 1 26 2 12 3 7 4 4 5 2 6 1 Total 84 Which of the following models best represents these data? (A) Negative binomial (B) Discrete uniform (C) Poisson (D) Binomial (E) Either Poisson or Binomial
Solutions to Problems:

12.1. C. Calculate (x+1)f(x+1)/f(x). Since it is approximately linear, we seem to have a member of the (a, b, 0) class. f(x+1)/f(x) = a + b/(x+1), so (x+1)f(x+1)/f(x) = a(x+1) + b = ax + a + b. The slope is positive, so a > 0 and we have a Negative Binomial. The slope a ≅ 0.17. The intercept is about 0.08. Thus a + b ≅ 0.08. Therefore, b ≅ 0.08 - 0.17 = -0.09 < 0. For the Negative Binomial, b = (r-1)β/(1+β). Thus b < 0 implies r < 1.

Number of Accidents    Observed    Observed Density    (x+1)f(x+1)/f(x)    Differences
0                      91,304      0.91304             0.083
1                      7,586       0.07586             0.252               0.169
2                      955         0.00955             0.418               0.166
3                      133         0.00133             0.541               0.124
4                      18          0.00018             0.833               0.292
5                      3           0.00003             2.000
6                      1           0.00001
7+                     0           0.00000
Comment: Similar to 4, 5/00, Q.40. Do not put much weight on the values of (x+1)f(x+1)/f(x) in the righthand tail, which can be greatly affected by random fluctuation. The first moment is 0.09988, and the second moment is 0.13002. The variance is: 0.13002 - 0.09988^2 = 0.12004, significantly greater than the mean.
12.2. B. Calculate (x+1)f(x+1)/f(x). Since it is approximately linear, we seem to have a member of the (a, b, 0) class. f(x+1)/f(x) = a + b/(x+1), so (x+1)f(x+1)/f(x) = a(x+1) + b = ax + a + b. The slope seems close to zero, until the data starts to get thin, so a ≅ 0, and therefore we assume this data probably came from a Poisson.

Number of Accidents    Observed    Observed Density    (x+1)f(x+1)/f(x)
0                      860         0.0860              2.392
1                      2,057       0.2057              2.437
2                      2,506       0.2506              2.671
3                      2,231       0.2231              2.293
4                      1,279       0.1279              2.514
5                      643         0.0643              2.575
6                      276         0.0276              2.562
7                      101         0.0101              3.248
8                      41          0.0041              0.878
9                      4           0.0004              5.000
10                     2           0.0002
Comment: Any actual data set is subject to random fluctuation, and therefore the observed slope of the accident profile will never be exactly zero. One can never distinguish between the possibility that the model was a Binomial with q small, a Poisson, or a Negative Binomial with β small. This data was simulated as 10,000 independent random draws from a Poisson with λ = 2.5.

12.3. E. Calculate (x+1)f(x+1)/f(x). Note that f(x+1)/f(x) = (number with x + 1)/(number with x). Since (x+1)f(x+1)/f(x) is not linear, we do not have a member of the (a, b, 0) class.

Number of Runs    Observed    (x+1)f(x+1)/f(x)    Differences
0                 518,228     0.203
1                 105,070     0.912               0.710
2                 47,936      1.356               0.444
3                 21,673      1.797               0.441
4                 9,736       2.071               0.274
5                 4,033       2.513               0.442
6                 1,689       2.648               0.136
7                 639         3.430               0.782
8                 274         3.515               0.084
9                 107         3.364               -0.150
10                36          7.639               4.274
11                25
Comment: At high numbers of runs, where the data starts to thin out, one would not put much reliance on the values of (x+1)f(x+1)/f(x). The data is taken from “An Analytic Model for Per-inning Scoring Distributions,” by Keith Woolner.
12.4. E. Calculate (x+1)f(x+1)/f(x). Since it does not appear to be linear, we do not seem to have a member of the (a, b, 0) class.

Number of Accidents    Observed    Observed Density    (x+1)f(x+1)/f(x)
0                      820         0.0820              1.677
1                      1,375       0.1375              3.245
2                      2,231       0.2232              2.580
3                      1,919       0.1920              2.912
4                      1,397       0.1397              3.586
5                      1,002       0.1002              4.078
6                      681         0.0681              3.392
7                      330         0.0330              4.170
8                      172         0.0172              2.930
9                      56          0.0056              2.500
10                     14          0.0014              2.357
11                     3           0.0003
12.5. A. Calculate (x+1)f(x+1)/f(x) = (x+1)(number with x + 1)/(number with x).

Number of Claims    Observed    (x+1)f(x+1)/f(x)    Differences
0                   6,503       1.261
1                   8,199       0.999               -0.262
2                   4,094       0.786               -0.212
3                   1,073       0.477               -0.309
4                   128         0.117               -0.360
5                   3
Since (x+1)f(x+1)/f(x) is approximately linear, we probably have a member of the (a, b, 0) class.
a = slope < 0. ⇒ Binomial Distribution.
Comment: The data was simulated from a Binomial Distribution with m = 5 and q = 0.2.

12.6. E. Calculate (x+1)f(x+1)/f(x). Note that f(x+1)/f(x) = (number with x + 1)/(number with x).

Number of Claims    Observed    (x+1)f(x+1)/f(x)    Differences
0                   565,664     0.121
1                   68,714      0.151               0.029
2                   5,177       0.212               0.061
3                   365         0.263               0.052
4                   24          1.250               0.987
5                   6
Even ignoring the final value, (x+1)f(x+1)/f(x) is not linear. Therefore, we do not have a member of the (a, b, 0) class. Comment: Data taken from Table 6.6.2 in Introductory Statistics with Applications in General Insurance by Hossack, Pollard and Zehnwirth. See also Table 6.5 in Loss Models.
12.7. A. Calculate (x+1)f(x+1)/f(x). Since it seems to be decreasing linearly, we seem to have a member of the (a, b, 0) class, with a < 0, which is a Binomial Distribution.

Number of Accidents    Observed    Observed Density    (x+1)f(x+1)/f(x)
0                      100         0.10000             2.67
1                      267         0.26700             2.33
2                      311         0.31100             2.01
3                      208         0.20800             1.67
4                      87          0.08700             1.32
5                      23          0.02300             1.04
6                      4           0.00400
7+                     0           0.00000
Alternately, the mean is 2, and the second moment is 5.494. Therefore, the sample variance is (1000/999)(5.494 - 2^2) = 1.495. Since the variance is significantly less than the mean, this indicates a Binomial Distribution.
Comment: One would not use a continuous distribution such as the Normal or the Gamma to model a frequency distribution. (x+1)f(x+1)/f(x) = a(x+1) + b. In this case, a ≅ -0.33. For the Binomial, a = -q/(1-q), so q ≅ 0.25. In this case, b ≅ 2.67 + 0.33 = 3.00. For the Binomial, b = (m+1)q/(1-q), so m ≅ (3/0.33) - 1 = 8.

12.8. A. Calculate (x+1)f(x+1)/f(x). For example, (3)(7/84)/(12/84) = (3)(7)/12 = 1.75.

Number of Accidents    Observed    (x+1)f(x+1)/f(x)
0                      32          0.81
1                      26          0.92
2                      12          1.75
3                      7           2.29
4                      4           2.50
5                      2           3.00
6                      1
Since this quantity seems to be increasing roughly linearly, we seem to have a member of the (a, b, 0) class, with a = slope > 0, which is a Negative Binomial Distribution.
Alternately, the mean is: 103/84 = 1.226, and the second moment is: 287/84 = 3.417. The sample variance is: (84/83)(3.417 - 1.226^2) = 1.937. Since the sample variance is significantly more than the sample mean, this indicates a Negative Binomial.
Comment: If (x+1)f(x+1)/f(x) had been approximately linear with a slope that was close to zero, then one could not distinguish between the possibility that the model was a Binomial with q small, a Poisson, or a Negative Binomial with β small. If the correct model were the discrete uniform, then we would expect the observed number of policies to be similar for each number of accidents.
Section 13, Zero-Truncated Distributions95

Frequency distributions can be constructed that have support on the positive integers, or in other words have f(0) = 0. For example, let f(x) = (e^-3 3^x / x!) / (1 - e^-3), for x = 1, 2, 3, ...

x       1         2         3         4         5         6         7
f(x)    15.719%   23.578%   23.578%   17.684%   10.610%   5.305%    2.274%
F(x)    15.719%   39.297%   62.875%   80.558%   91.169%   96.4736%  98.74718%

Exercise: Verify that the sum of f(x) = (e^-3 3^x / x!) / (1 - e^-3) for x = 1 to ∞ is unity.
[Solution: The sum of the Poisson Distribution from 0 to ∞ is 1: Σ_{x=0}^∞ e^-3 3^x / x! = 1.
Therefore, Σ_{x=1}^∞ e^-3 3^x / x! = 1 - e^-3. ⇒ Σ_{x=1}^∞ f(x) = 1.]

This is an example of a Poisson Distribution Truncated from Below at Zero, with λ = 3.
In general, if f is a distribution on 0, 1, 2, 3,..., then g(x) = f(x) / {1 - f(0)} is a distribution on 1, 2, 3, ...
This is a special case of truncation from below. The general concept of truncation of a distribution is covered in "Mahlerʼs Guide to Loss Distributions."

We have the following three examples, shown in Appendix B.3.1 of Loss Models:

Distribution          Density of the Zero-Truncated Distribution
Binomial              {m!/(x!(m-x)!)} q^x (1-q)^(m-x) / {1 - (1-q)^m},  x = 1, 2, 3,..., m
Poisson               (e^-λ λ^x / x!) / (1 - e^-λ),  x = 1, 2, 3,...
Negative Binomial     {r(r+1)...(r+x-1)/x!} β^x / {(1+β)^(x+r) [1 - 1/(1+β)^r]},  x = 1, 2, 3,...

95 See Section 6.7 in Loss Models.
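The zero-truncated Poisson table above can be reproduced with a few lines of code. This is a sketch of my own (the function name and layout are mine, not from the text):

```python
from math import exp, factorial

# Densities of a Poisson truncated from below at zero: g(x) = f(x) / (1 - f(0)).
lam = 3.0
f0 = exp(-lam)

def zero_truncated_poisson(x: int) -> float:
    return (exp(-lam) * lam**x / factorial(x)) / (1.0 - f0)

cumulative = 0.0
for x in range(1, 8):
    cumulative += zero_truncated_poisson(x)
    print(x, round(100 * zero_truncated_poisson(x), 3), round(100 * cumulative, 3))
# x = 1 gives about 15.719%, matching the table.
```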
Moments:

Exercise: For a Zero-Truncated Poisson with λ = 3, what is the mean?
[Solution: Let f(x) be the untruncated Poisson, and g(x) be the truncated distribution. Then g(x) = f(x) / {1 - f(0)}.
The mean of g = Σ_{x=1}^∞ x g(x) = Σ_{x=0}^∞ x f(x) / {1 - f(0)} = (mean of f) / {1 - f(0)} = λ / (1 - e^-λ) = 3/(1 - e^-3) = 3.157.]

In general, the moments of a zero-truncated distribution, g, are given in terms of those of the corresponding untruncated distribution, f, by:
E_g[X^n] = E_f[X^n] / {1 - f(0)}.

For example, for the Zero-Truncated Poisson the mean is: λ / (1 - e^-λ), while the second moment is: (λ + λ^2) / (1 - e^-λ).

Exercise: For a Zero-Truncated Poisson with λ = 3, what is the second moment?
[Solution: Let f(x) be the untruncated Poisson, and g(x) be the truncated distribution. Then g(x) = f(x) / {1 - f(0)}. The second moment of f is its variance plus the square of its mean = λ + λ^2. The second moment of g = (the second moment of f) / {1 - f(0)} = (λ + λ^2)/(1 - e^-λ) = (3 + 3^2)/(1 - e^-3) = 12.629.]

Thus a Zero-Truncated Poisson with λ = 3 has a variance of 12.629 - 3.157^2 = 2.66.
This matches the result of using the formula in Appendix B of Loss Models:
λ{1 - (λ+1)e^-λ} / (1 - e^-λ)^2 = (3){1 - 4e^-3} / (1 - e^-3)^2 = (3)(0.8009)/(0.9502)^2 = 2.66.

It turns out that for the Zero-Truncated Negative Binomial, the parameter r can take on values between -1 and 0, as well as the usual positive values, r > 0. This is sometimes referred to as the Extended Zero-Truncated Negative Binomial; however, provided r ≠ 0, all the same formulas apply. As r approaches zero, the Zero-Truncated Negative Binomial approaches the Logarithmic Distribution.
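The moment formulas above are easy to verify numerically. A small sketch of my own, for λ = 3:

```python
from math import exp

# Zero-truncated Poisson moments for lambda = 3.
lam = 3.0
f0 = exp(-lam)

mean_zt = lam / (1 - f0)                       # about 3.157
second_moment_zt = (lam + lam**2) / (1 - f0)   # about 12.629
variance_zt = second_moment_zt - mean_zt**2    # about 2.66

# Closed form for the variance quoted from Appendix B of Loss Models:
variance_formula = lam * (1 - (lam + 1) * exp(-lam)) / (1 - exp(-lam))**2

print(round(mean_zt, 3), round(variance_zt, 3), round(variance_formula, 3))
```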
Logarithmic Distribution:96

The Logarithmic Distribution with parameter β has support equal to the positive integers:

f(x) = {β/(1+β)}^x / {x ln(1+β)}, for x = 1, 2, 3,...

with mean: β / ln(1+β), and variance: β{1 + β - β/ln(1+β)} / ln(1+β).

a = β/(1+β).   b = -β/(1+β).

P(z) = 1 - ln[1 - β(z-1)] / ln(1+β), z < 1 + 1/β.

Exercise: Assume the number of vehicles involved in each automobile accident is given by
f(x) = 0.2^x / {x ln(1.25)}, for x = 1, 2, 3,...
Then what is the mean number of vehicles involved per automobile accident?
[Solution: This is a Logarithmic Distribution with β = 0.25. Mean = β/ln(1+β) = 0.25/ln(1.25) = 1.12.
Comment: β/(1+β) = 0.25/1.25 = 0.2.]

The density function of this Logarithmic Distribution with β = 0.25 is as follows:

x       1          2         3         4         5         6         7
f(x)    89.6284%   8.9628%   1.1950%   0.1793%   0.0287%   0.0048%   0.0008%
F(x)    89.628%    98.591%   99.786%   99.966%   99.994%   99.9990%  99.9998%
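The density table above can be reproduced as follows; this is a sketch of my own, with the function name mine:

```python
from math import log

# Logarithmic Distribution with beta = 0.25, as in the exercise above.
beta = 0.25

def logarithmic_density(x: int) -> float:
    return (beta / (1 + beta))**x / (x * log(1 + beta))

mean = beta / log(1 + beta)                        # about 1.12
print(round(mean, 3))
print([round(100 * logarithmic_density(x), 4) for x in range(1, 8)])
# f(1) is about 89.63%, matching the table.
```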
Exercise: Show that the densities of a Logarithmic Distribution sum to one.
Hint: ln[1/(1-y)] = Σ_{k=1}^∞ y^k / k, for |y| < 1.97

[Solution: Σ_{k=1}^∞ f(k) = {1/ln(1+β)} Σ_{k=1}^∞ {β/(1+β)}^k / k.
Let y = β/(1+β). Then 1/(1-y) = 1+β.
Thus Σ_{k=1}^∞ f(k) = {1/ln(1+β)} Σ_{k=1}^∞ y^k / k = ln[1/(1-y)] / ln(1+β) = ln(1+β)/ln(1+β) = 1.]

96 Sometimes called instead a Log Series Distribution.
97 Not something you need to know for your exam. This result can be derived as a Taylor series.
Exercise: Show that the limit as r → 0 of Zero-Truncated Negative Binomial Distributions, with the other parameter β fixed, is a Logarithmic Distribution with parameter β.
[Solution: For the Zero-Truncated Negative Binomial Distribution:
f(x) = {(r+x-1)! β^x / {(1+β)^(x+r) (r-1)! x!}} / {1 - 1/(1+β)^r} = {r(r+1)...(r+x-1) / x!} {β/(1+β)}^x / {(1+β)^r - 1}.
lim_{r→0} f(x) = {{β/(1+β)}^x / x!} lim_{r→0} (r+1)...(r+x-1) {r/((1+β)^r - 1)}
= {{β/(1+β)}^x / x!} (x-1)! lim_{r→0} r/{(1+β)^r - 1} = {{β/(1+β)}^x / x} lim_{r→0} 1/{ln(1+β) (1+β)^r}
= {{β/(1+β)}^x / x} / ln(1+β). Where I have used LʼHospitalʼs Rule. This is the density of a Logarithmic Distribution.
Alternately, the p.g.f. of a Zero-Truncated Negative Binomial Distribution is:
P(z) = {{1 - β(z-1)}^-r - (1+β)^-r} / {1 - (1+β)^-r}.
lim_{r→0} P(z) = lim_{r→0} {{1 - β(z-1)}^-r - (1+β)^-r} / {1 - (1+β)^-r}
= lim_{r→0} [-ln(1 - β(z-1)){1 - β(z-1)}^-r + ln(1+β)(1+β)^-r] / {ln(1+β) (1+β)^-r}
= {ln(1+β) - ln[1 - β(z-1)]} / ln(1+β) = 1 - ln[1 - β(z-1)] / ln(1+β).
Where I have used LʼHospitalʼs Rule. This is the p.g.f. of a Logarithmic Distribution.]

(a,b,1) Class:98

The (a,b,1) class of frequency distributions in Loss Models is a generalization of the (a,b,0) class. As with the (a,b,0) class, the recursion formula f(x)/f(x-1) = a + b/x applies. However, this relationship need only apply now for x ≥ 2, rather than x ≥ 1.
Members of the (a,b,1) family include: all the members of the (a,b,0) family,99 the zero-truncated versions of those distributions: Zero-Truncated Binomial, Zero-Truncated Poisson, and Extended Truncated Negative Binomial,100 and the Logarithmic Distribution. In addition, the (a,b,1) class includes the zero-modified distributions corresponding to these, to be discussed in the next section.
98 See Table 6.4 and Appendix B.3 in Loss Models.
99 Binomial, Poisson, and the Negative Binomial.
100 The Zero-Truncated Negative Binomial where in addition to r > 0, -1 < r < 0 is also allowed.
Probability Generating Functions:

The probability generating function, P(z) = E[z^N], for a zero-truncated distribution can be obtained from that for the untruncated distribution:

PT(z) = {P(z) - f(0)} / {1 - f(0)},

where P(z) is the p.g.f. for the untruncated distribution, PT(z) is the p.g.f. for the zero-truncated distribution, and f(0) is the probability at zero for the untruncated distribution.

Exercise: What is the Probability Generating Function for a Zero-Truncated Poisson Distribution?
[Solution: For the untruncated Poisson, P(z) = e^(λ(z-1)). f(0) = e^-λ.
PT(z) = {P(z) - f(0)} / {1 - f(0)} = {e^(λ(z-1)) - e^-λ} / {1 - e^-λ} = {e^(λz) - 1} / {e^λ - 1}.]

One can derive this relationship as follows:
PT(z) = Σ_{n=1}^∞ z^n g(n) = Σ_{n=1}^∞ z^n f(n) / {1 - f(0)} = {Σ_{n=0}^∞ z^n f(n) - f(0)} / {1 - f(0)} = {P(z) - f(0)} / {1 - f(0)}.

In any case, Appendix B of Loss Models displays the Probability Generating Functions for all of the Zero-Truncated Distributions.

Loss Models Notation:
p_k: the density function of the untruncated frequency distribution at k.
p_k^T: the density function of the zero-truncated frequency distribution at k.
p_k^M: the density function of the zero-modified frequency distribution at k.101

Exercise: Give a verbal description of the following terms: p_7, p_4^M, and p_6^T.
[Solution: p_7 is the density of the frequency at 7, f(7). p_4^M is the density of the zero-modified frequency at 4, f^M(4). p_6^T is the density of the zero-truncated frequency at 6, f^T(6).]

101 Zero-modified distributions will be discussed in the next section.
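The p.g.f. relationship above can be checked numerically. A sketch of my own, for the zero-truncated Poisson with λ = 3 at an arbitrary point z (the choice z = 0.6 is mine):

```python
from math import exp, factorial

# Check that PT(z) = (e^(lambda z) - 1) / (e^lambda - 1) equals sum over n of z^n g(n).
lam, z = 3.0, 0.6

pgf_closed_form = (exp(lam * z) - 1) / (exp(lam) - 1)

pgf_by_sum = sum(
    z**n * (exp(-lam) * lam**n / factorial(n)) / (1 - exp(-lam))
    for n in range(1, 60)
)

print(round(pgf_closed_form, 6), round(pgf_by_sum, 6))   # the two agree
```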
Problems: 13.1 (1 point) The number of persons injured in an accident is assumed to follow a Zero -Truncated Poisson Distribution with parameter λ = 0.3. Given an accident, what is the probability that exactly 3 persons were injured in it? A. Less than 1.0% B. At least 1.0% but less than 1.5% C. At least 1.5% but less than 2.0% D. At least 2.0% but less than 2.5% E. At least 2.5% Use the following information for the next four questions: The number of vehicles involved in an automobile accident is given by a Zero-Truncated Binomial Distribution with parameters q = 0.3 and m = 5. 13.2 (1 point) What is the mean number of vehicles involved in an accident? A. less than 1.8 B. at least 1.8 but less than 1.9 C. at least 1.9 but less than 2.0 D. at least 2.0 but less than 2.1 E. at least 2.1 13.3 (2 points) What is the variance of the number of vehicles involved in an accident? A. less than 0.5 B. at least 0.5 but less than 0.6 C. at least 0.6 but less than 0.7 D. at least 0.7 but less than 0.8 E. at least 0.8 13.4 (1 point) What is the chance of observing exactly 3 vehicles involved in an accident? A. less than 11% B. at least 11% but less than 13% C. at least 13% but less than 15% D. at least 15% but less than 17% E. at least 17% 13.5 (2 points) What is the median number of vehicles involved in an accident?? A. 1 B. 2 C. 3 D. 4 E. 5
Use the following information for the next five questions: The number of family members is given by a Zero-Truncated Negative Binomial Distribution with parameters r = 4 and β = 0.5. 13.6 (1 point) What is the mean number of family members? A. less than 2.0 B. at least 2.0 but less than 2.1 C. at least 2.1 but less than 2.2 D. at least 2.2 but less than 2.3 E. at least 2.3 13.7 (2 points) What is the variance of the number of family members? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6 13.8 (2 points) What is the chance of a family having 7 members? A. less than 1.1% B. at least 1.1% but less than 1.3% C. at least 1.3% but less than 1.5% D. at least 1.5% but less than 1.7% E. at least 1.7% 13.9 (3 points) What is the probability of a family having more than 5 members? A. less than 1% B. at least 1%, but less than 3% C. at least 3%, but less than 5% D. at least 5%, but less than 7% E. at least 7% 13.10 (1 point) What is the probability generating function?
Use the following information for the next three questions: A Logarithmic Distribution with parameter β = 2. 13.11 (1 point) What is the mean? A. less than 2.0 B. at least 2.0 but less than 2.1 C. at least 2.1 but less than 2.2 D. at least 2.2 but less than 2.3 E. at least 2.3 13.12 (2 points) What is the variance? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6 13.13 (1 point) What is the density function at 6? A. less than 1.1% B. at least 1.1% but less than 1.3% C. at least 1.3% but less than 1.5% D. at least 1.5% but less than 1.7% E. at least 1.7%
13.14 (1 point) For a Zero-Truncated Negative Binomial Distribution with parameters r = -0.6 and β = 3, what is the density function at 5? A. less than 1.1% B. at least 1.1% but less than 1.3% C. at least 1.3% but less than 1.5% D. at least 1.5% but less than 1.7% E. at least 1.7%
Use the following information for the next five questions: The number of days per hospital stay is given by a Zero-Truncated Poisson Distribution with parameter λ = 2.5. 13.15 (1 point) What is the mean number of days per hospital stay? A. less than 2.5 B. at least 2.5 but less than 2.6 C. at least 2.6 but less than 2.7 D. at least 2.7 but less than 2.8 E. at least 2.8 13.16 (2 points) What is the variance of the number of days per hospital stay? A. less than 2.2 B. at least 2.2 but less than 2.3 C. at least 2.3 but less than 2.4 D. at least 2.4 but less than 2.5 E. at least 2.5 13.17 (1 point) What is the chance that a hospital stay is 6 days? A. less than 3% B. at least 3% but less than 4% C. at least 4% but less than 5% D. at least 5% but less than 6% E. at least 6% 13.18 (2 points) What is the chance that a hospital stay is fewer than 4 days? A. less than 50% B. at least 50% but less than 60% C. at least 60% but less than 70% D. at least 70% but less than 80% E. at least 80% 13.19 (2 points) What is the mode of this frequency distribution? A. 1 B. 2 C. 3 D. 4 E. 5 13.20 (4 points) Let X follow an Exponential with mean θ. Let Y be the minimum of a random sample from X of size k. However, K in turn follows a Logarithmic Distribution with parameter β. What is the distribution function of Y?
Use the following information for the next 2 questions: • Harvey Wallbanker, the Automatic Teller Machine, works 24 hours a day, seven days a week, without a vacation or even an occasional day off. • Harvey services on average one customer every 10 minutes. • 60% of Harveyʼs customers are male and 40% are female. • The gender of a customer is independent of the gender of the previous customers. • Harveyʼs hobby is to observe patterns of customers. For example, FMF denotes a female customer, followed by a male customer, followed by a female customer. Harvey starts looking at customers who arrive after serving Pat, his most recent customer. How long does it take on average until he sees the following patterns. 13.21 (2 points) How long on average until Harvey sees “M”? 13.22 (2 points) How long on average until Harvey sees “F”?
13.23 (1 point) X and Y are independently, identically distributed Zero-Truncated Poisson Distributions, each with λ = 3. What is the probability generating function of their sum? 13.24 (3 points) Let X follow an Exponential with mean θ. Let Y be the minimum of a random sample from X of size k. However, K in turn follows a Zero-Truncated Geometric Distribution with parameter β. What is the mean of Y? Hint: The densities of a Logarithmic Distribution sum to one. A. θ / (1 + β)
B. (θ/β) ln[1 + β]
C. θ / (1 + ln[1 + β])
D. θ (1 + β)
E. None of A, B, C, or D. 13.25 (5 points) At the Hyperion Hotel, the number of days a guest stays is distributed via a zero-truncated Poisson with λ = 4. On the day they check out, each guest leaves a tip for the maid equal to $3 per day of their stay. The guest in room 666 is checking out today. What is the expected value of the tip?
13.26 (5 points) The Krusty Burger Restaurant has started a new sales promotion. With the purchase of each meal they give the customer a coupon. There are ten different coupons, each with the face of a different famous resident of Springfield. A customer is equally likely to get each type of coupon, independent of the other coupons he has gotten in the past. Once you get one coupon of each type, you can turn your 10 different coupons for a free meal. (a) Assuming a customer saves his coupons, and does not trade with anyone else, what is the mean number of meals he must buy until he gets a free meal? (b) What is the variance of the number of meals until he gets a free meal? 13.27 (Course 151 Sample Exam #1, Q.12) (1.7 points) A new business has initial capital 700 and will have annual net earnings of 1000. It faces the risk of a one time loss with the following characteristics:
• The loss occurs at the end of the year. • The year of the loss is one plus a Geometric distribution with β = 0.538. (So the loss may either occur at the end of the first year, second year, etc.)
• The size of the loss is uniformly distributed on the ten integers: 500,1000,1500, ..., 5000. Determine the probability of ruin. (A) 0.00 (B) 0.41 (C) 0.46
(D) 0.60
(E) 0.65
Solutions to Problems:

13.1. B. Let f(x) be the density of a Poisson Distribution; then the distribution truncated from below at zero is: g(x) = f(x)/(1 - f(0)). Thus for λ = 0.3, g(x) = {0.3^x e^-0.3 / x!} / {1 - e^-0.3}.
g(3) = {0.3^3 e^-0.3 / 3!} / {1 - e^-0.3} = 0.00333/0.259 = 1.3%.

13.2. B. Mean is that of the non-truncated binomial, divided by 1 - f(0): (0.3)(5) / (1 - 0.7^5) = 1.803.

13.3. D. The second moment is that of the non-truncated binomial, divided by 1 - f(0): (1.05 + 1.5^2) / (1 - 0.7^5) = 3.967. Variance = 3.967 - 1.803^2 = 0.716.
Comment: Using the formula in Appendix B of Loss Models:
Variance = mq{(1-q) - (1 - q + mq)(1-q)^m} / {1 - (1-q)^m}^2 = (5)(0.3){0.7 - (0.7 + 1.5)(0.7)^5} / {1 - (0.7)^5}^2 = (1.5)(0.3303)/0.8319^2 = 0.716.

13.4. D. For a non-truncated binomial, f(3) = 5!/{(3!)(2!)} 0.3^3 0.7^2 = 0.1323. For the zero-truncated distribution one gets the density by dividing by 1 - f(0): (0.1323) / (1 - 0.7^5) = 15.9%.

13.5. B. For a discrete distribution such as we have here, employ the convention that the median is the first value at which the distribution function is greater than or equal to 0.5.
F(1) = 0.433 < 50%, F(2) = 0.804 > 50%, and therefore the median is 2.

Number of Vehicles    Untruncated Binomial    Zero-Truncated Binomial    Cumulative Zero-Truncated Binomial
0                     16.81%
1                     36.02%                  43.29%                     43.29%
2                     30.87%                  37.11%                     80.40%
3                     13.23%                  15.90%                     96.30%
4                     2.83%                   3.41%                      99.71%
5                     0.24%                   0.29%                      100.00%
13.6. E. Mean is that of the non-truncated negative binomial, divided by 1 - f(0): (4)(0.5) / (1 - 1.5^-4) = 2 / 0.8025 = 2.49.
13.7. D. The second moment is that of the non-truncated negative binomial, divided by 1-f(0): (3+22 ) / (1-1.5-4) = 8.723. Variance = 8.723 - 2.4922 = 2.51. Comment: Using the formula in Appendix B of Loss Models: Variance = rβ{(1+β) - (1 + β + rβ)(1+β)-r} / {1-(1+β)-r}2 = (4)(.5){(1.5 - (1 + .5 + 2)(1.5-4)} / (1-1.5-4)2 = (2)(.8086)/.80252 = 2.51. The non-truncated negative binomial has mean = rβ = 2, and variance = rβ(1+β) = 3, and thus a second moment of: 3 + 22 = 7. 13.8. C. For the non-truncated negative binomial, f(7) = (4)(5)(6)(7)(8)(9)(10) .57 /((7!)(1.5)11) = 1.08%. For the zero-truncated distribution one gets the density by dividing by 1-f(0): (1.08%) / (1-1.5-4) = 1.35%. 13.9. D. The chance of more than 5 is: 1 - .9471 = 5.29%. Number Untruncated Zero-Truncated of Members Neg. Binomial Neg. Binomial 0 19.75% 1 26.34% 32.82% 2 21.95% 27.35% 3 14.63% 18.23% 4 8.54% 10.64% 5 4.55% 5.67% 6 2.28% 2.84% 7 1.08% 1.35% 8 0.50% 0.62% 9 0.22% 0.28%
Cumulative Zero-Truncated Neg. Binomial 32.82% 60.17% 78.40% 89.04% 94.71% 97.55% 98.90% 99.52% 99.79%
13.10. As shown in Appendix B of Loss Models, P(z) =
{1 - β(z -1)}- r - (1+β)- r (1.5 - 0.5z)- 4 - 1/ 1.54 1.54 / (1.5 - 0.5z)4 - 1 = = . 1 - (1+ β)- r 1 - 1/ 1.54 1.54 - 1
Alternately, for the Negative Binomial, f(0) = 1/(1+β)r = 1/1.54 , and P(z) = {1 - β(z-1)}-r = {1 - (0.5)(z - 1)}-4 = (1.5 - 0.5z)-4. PT(z) =
1.54 / (1.5 - 0.5z)4 - 1 P(z) - f(0) (1.5 - 0.5z)- 4 - 1/ 1.54 = = . 1 - f(0) 1 - 1/ 1.54 1.54 - 1
Comment: This probability generating function only exists for z < 1 + 1/β = 1 + 1/0.5 = 3. 13.11. A. Mean of the logarithmic distribution is: β/ln(1+β) = 2 / ln(3) = 1.82.
13.12. B. Variance of the logarithmic distribution is: β{1 + β − β/ln(1+β)}/ln(1+β) = 2{3 -1.82}/ ln(3) = 2.15. 13.13. C. For the logarithmic distribution, f(x) = {β/ (1+β)}x / {x ln(1+β)} f(6) = (2/3)6 / {6 ln(3)} = 1.33%. 13.14. A. For the zero-truncated Negative Binomial Distribution, f(5) = r(r+1)(r+2)(r+3)(r+4) (β/(1+β))x /{(5!)((1+β)r -1)} = (-.6)(.4)(1.4)(2.4)(3.4)(3/4)5 / {(120)(4-.6 -1) = (-2.742)(.2373) / (120)(-.5647) = .96%. Comment: Note this is an extended zero-truncated negative binomial distribution, with 0 > r > -1. The same formulas apply as when r > 0. (As r approaches zero one gets a logarithmic distribution.) For the untruncated negative binomial distribution we must have r > 0. So in this case there is no corresponding untruncated distribution. 13.15. D. Mean is that of the non-truncated Poisson, divided by 1- f(0): (2.5) / (1 - e-2.5) = 2.5/.9179 = 2.724. Comment: Note that since the probability at zero has been distributed over the positive integers, the mean is larger for the zero-truncated distribution than for the corresponding untruncated distribution. 13.16. A. The second moment is that of the non-truncated Poisson, divided by 1 - f(0): (2.5 + 2.52 ) / (1 - e-2.5) = 9.533. Variance = 9.533 - 2.7242 = 2.11. Comment: Using the formula in Appendix B of Loss Models: Variance = λ{1 - (λ+1)e−λ} /(1-e−λ)2 = (2.5){1 - 3.5e-2.5}/(1 - e-2.5)2 = (2.5)(.7127)/.91792 = 2.11. 13.17. B. For a untruncated Poisson, f(6) = (2.56 )e-2.5/6! = .0278. For the zero-truncated distribution one gets the density by dividing by 1-f(0): (.0278) / (1-e-2.5) = 3.03%.
13.18. D. One adds up the chances of 1, 2 and 3 days, and gets 73.59%. Number of Days 0 1 2 3 4 5 6 7 8
Untruncated Binomial Zero-Truncated CoefficientPoisson Poisson 8.21% 20.52% 22.36% 25.65% 27.95% 21.38% 23.29% 13.36% 14.55% 6.68% 7.28% 2.78% 3.03% 0.99% 1.08% 0.31% 0.34%
Cumulative Zero-Truncated Poisson 22.36% 50.30% 73.59% 88.14% 95.42% 98.45% 99.54% 99.88%
Comment: By definition, there is no probability of zero items for a zero-truncated distribution. 13.19. B. The mode is where the density function is greatest, 2. Number of Days 0 1 2 3 4
Untruncated Binomial Zero-Truncated CoefficientPoisson Poisson 8.21% 20.52% 22.36% 25.65% 27.95% 21.38% 23.29% 13.36% 14.55%
Comment: Unless the mode of the untruncated distribution is 0, the mode of the zero-truncated distribution is the same as that of the untruncated distribution. For example, in this case all the densities on the positive integers are increased by the same factor 1/(1 - .0821). Thus since the density at 2 was largest prior to truncation, it remains the largest after truncation at zero.
13.20. Assuming a sample of size k, then Prob[Min > y | k] = Prob[all elements of the sample > y] = (e-y/θ)k = e-yk/θ. Let pk be the Logarithmic density. ∞
Prob[Min > y] =
∑ Prob[Min >
∞
y | k] pk =
k=1
∑ (e- y / θ )k pk = EK[(e-y/θ)k]. k=1
However, the P.G.F. of a frequency distribution is defined as E[zk]. For the Logarithmic Distribution, P(z) = 1 -
ln[1 - β (z - 1)] . ln(1+β)
Therefore, taking z = e-y/θ, Prob[Min > y] = 1 -
ln[1 - β (e- y / θ - 1)] . ln(1+ β)
Thus Prob[Min ≤ y], in other words the distribution function is: F(y) =
ln[1 - β (e- y / θ - 1)] , y > 0. ln(1+ β)
Comment: The distribution of Y is called an Exponential-Logarithmic Distribution. If one lets p = 1/(1+β), then one can show that F(y) = 1 - ln[1 - (1-p) e-y/θ] / ln(p). As β approaches 0, in other words as p approaches 1, the distribution of Y approaches an Exponential Distribution. The Exponential-Logarithmic Distribution has a declining hazard rate. In general, if S(x) is the survival function of severity, Y is the minimum of a random sample from X of size k, and K in turn follows a frequency distribution with support k ≥ 1 and Probability Generating Function P(z), then F(y) = 1 - P(S(y)). 13.21. The number of customers he has to wait is a Zero-Truncated Geometric Distribution with β = chance of failure / chance of success = (1 - 0.6)/0.6 = 1/0.6 - 1. So the mean number of customers is 1/0.6 = 1.67. ⇒ 16.7 minutes on average. Comment: The mean of the Zero-Truncated Geometric Distribution is:
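The relation F(y) = 1 - P(S(y)) can be sanity-checked by simulation. This is a sketch of my own (not from the text); the parameter values θ = 10, β = 2, y = 5 are arbitrary choices of mine:

```python
import random
from math import exp, log

# Simulate the minimum of an Exponential sample whose size K is Logarithmic,
# and compare with F(y) = ln[1 - beta (e^(-y/theta) - 1)] / ln(1 + beta).
theta, beta, y = 10.0, 2.0, 5.0

def logarithmic_sample() -> int:
    # crude inversion of the Logarithmic distribution function
    u, k, cumulative = random.random(), 1, 0.0
    while True:
        cumulative += (beta / (1 + beta))**k / (k * log(1 + beta))
        if u <= cumulative:
            return k
        k += 1

random.seed(1)
trials = 100_000
hits = sum(
    min(random.expovariate(1 / theta) for _ in range(logarithmic_sample())) <= y
    for _ in range(trials)
)

closed_form = log(1 - beta * (exp(-y / theta) - 1)) / log(1 + beta)
print(round(hits / trials, 4), round(closed_form, 4))   # the two should be close
```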
β = 1 + β. 1 - 1/ (1+β)
2013-4-1,
Frequency Distributions, §13 Zero-Truncated
HCM 10/4/12,
Page 239
13.22. The number of customers he has to wait is a Zero-Truncated Geometric Distribution with β = chance of failure / chance of success = (1 - .4)/.4 = 1/.4 - 1. So the mean number of customers is 1/.4 = 2.5. ⇒ 25 minutes on average. Comment: Longer patterns can be handled via Markov Chain ideas not on the syllabus. See Example 4.20 in Probability Models by Ross. 13.23. As shown in Appendix B of Loss Models, for the zero-truncated Poisson: P(z) =
eλz - 1 e3z - 1 = z . ez - 1 e - 1
The p.g.f. for the sum of two independently, identically distributed variables is: P(z) P(z) = P(z)2 : ⎛ e3z - 1⎞ 2 . ⎝ ez - 1 ⎠ Comment: The sum of two zero-truncated distributions has a minimum of two events. Therefore, the sum of two zero-truncated Poissons is not a zero-truncated Poisson. 13.24. B. Prob[Min > y | k] = Prob[all elements of the sample > y] = (e-y/θ)k = e-yk/θ. Thus the minimum from a sample of size k, follows an Exponential Distribution with mean θ/k. Therefore, E[Y] = E[E[Y|k]] = E[θ/k] = θ E[1/k]. For a Zero-Truncated Geometric, pk = βk-1 / (1+β)k, for k = 1, 2, 3,... ∞
Thus E[1/k] = (1/β)
∑
k=1
⎛ β ⎞k ⎜ ⎟ /k . ⎝ 1+ β ⎠
⎛ β ⎞k ⎜ ⎟ ⎝ 1+ β⎠ However, for the Logarithmic: pk = , for k = 1, 2, 3,... k ln(1+ β)
∞
Therefore, since these Logarithmic densities sum to one:
∑
k=1
⎛ β ⎞k ⎜ ⎟ / k = ln(1 +β). ⎝ 1+ β ⎠
Thus E[1/k] = (1/β) ln[1 +β]. Thus E[Y] = θ E[1/k] = (θ/β) ln[1 + β].
2013-4-1,
Frequency Distributions, §13 Zero-Truncated
HCM 10/4/12,
Page 240
13.25. The probability of a stay of length k is pT k . If a stay is of length k, the probability that today is the last day is 1/k. Therefore, for an occupied room picked at random, the probability that its guest is checking out ∞
today is:
∑ pTk / k . k=1
The tip for a stay of length k is 3k. Thus, the expected tip left by the guest checking out of room 666 is: ∞
∑ 3k
∞
pT k
k=1 ∞
/ k
∑ pTk / k k=1
3
∑ pTk
3 = ∞ k=1 = ∞ . pTk / k pTk / k
∑
k=1
∑
k=1 ∞
For the zero-truncated Poisson,
∑ pTk / k = k=1
e- λ 1 - e- λ
(λ + λ2/4 + λ3/18 + λ4/96 + λ5/600 + λ6/4320 + λ7/35,280 + λ8/322,560 + λ 9/3,265,920 + λ10/36,288,000 + λ11/439,084,800 + ...) = 0.330.
Thus, the expected tip left by the guest checking out of room 666 is: 3 / 0.330 = 9.09. Alternately, the (average) tip per day is 3. 3 = (0)(probability not last day) + (average tip if last day)(probability last day). 3 = (average tip if last day)(0.333). Therefore, the average tip if it is the last day is: 3 / 0.330 = 9.09.
2013-4-1,
Frequency Distributions, §13 Zero-Truncated
HCM 10/4/12,
Page 241
13.26. (a) After the customer gets his first coupon, there is 9/10 probability that his next coupon is different. Therefore, the number of meals it takes him to get his next unique coupon after his first is a zero-truncated Geometric Distribution, with β = (probability of failure) / (probability of success) = (1/10)/(9/10) = 1/9. (Alternately, it is one plus a Geometric Distribution with β = 1/9.) Thus the mean number of meals from the first to the second unique coupon is: 1 + 1/9 = 10/9. After the customer gets his second unique coupon, there is 8/10 probability his next coupon is different than those he already has. Therefore, the number of meals it takes him to get his third unique coupon after his second is a zero-truncated Geometric Distribution, with β = (probability of failure) / (probability of success) = (2/10)/(8/10) = 2/8. Thus the mean number of meals from the second to the third unique coupon is: 1 + 2/8 = 10/8. Similarly, the number of meals it takes him to get his fourth unique coupon after his third is a zero-truncated Geometric Distribution, with β = 3/7, and mean 10/7. Proceeding in a similar manner, the means to get the remaining coupons are: 10/6 + ... + 10/1. Including one meal to get the first coupon, the mean total number of meals is: (10) (1/10 + 1/9 + 1/8 + 1/7 + 1/6 + 1/5 + 1/4 + 1/3 + 1/2 + 1/1) = 29.29. (b) It takes one meal to get the first coupon; variance is zero. The number of additional meals to get the second unique coupon is a zero-truncated Geometric Distribution, with β = 1/9 and variance: (1/9)(10/9). Similarly, the variance of the number of meals from the second to the third unique coupon is: (2/8)(10/8). The number of meals in intervals between unique coupons are independent, so their variances add. Thus, the variance of the total number of meals is: (10) (1/92 + 2/82 + 3/72 + 4/62 + 5/52 + 6/42 + 7/32 + 8/22 + 9/12 ) = 125.69. Comment: The coupon collectorʼs problem.
2013-4-1,
Frequency Distributions, §13 Zero-Truncated
HCM 10/4/12,
Page 242
13.27. D. At the end of year one the business has 1700. Thus, if the loss occurs at the end of year one, there is ruin if the size of loss is > 1700, a 70% chance. Similarly, at the end of year 2, if the loss did not occur in year 1, the business has 2700. Thus, if the loss occurs at the end of year two there is ruin if the size of loss is > 2700, a 50% chance. If the loss occurs at the end of year three there is ruin if the size of loss is > 3700, a 30% chance. If the loss occurs at the end of year four there is ruin if the size of loss is > 4700, a 10% chance. If the loss occurs in year 5 or later there is no chance of ruin. The probability of the loss being in year n is: (1/(1+β))(β/(1+β))n-1 = .65(.35n-1). A
B
C
D
Year
Probability of Loss in this year
Probability of Ruin if Loss Occurs in this year
Column B times Column C
1 2 3 4 5
0.6500 0.2275 0.0796 0.0279 0.0098
0.7 0.5 0.3 0.1 0
0.4550 0.1138 0.0239 0.0028 0.0000 0.5954
Alternately, if the loss is of size 500, 1000, or 1500 there is not ruin. If the loss is of size 2000 or 2500, then there is ruin if the loss occurs in year 1. If the loss is of size 3000 or 3500, then there is ruin if the loss occurs by year 2. If the loss is of size 4000 or 4500, then there is ruin if the loss occurs by year 3. If the loss is of size 5000, then there is ruin if the loss occurs by year 4. A
Size of Loss 500, 1000, 1500 2000 or 2500 3000 or 3500 4000 or 4500 5000
B
C
D
Year by which Probability that Probability of a loss occurs Loss Occurs by Loss of this Size for Ruin this year 0.3 0.2 0.2 0.2 0.1
none 1 2 3 4
0.000 0.650 0.877 0.957 0.985
E
Column B times Column D 0.000 0.130 0.175 0.191 0.099 0.595
Section 14, Zero-Modified Distributions102

Frequency distributions can be constructed whose densities on the positive integers are proportional to those of a well-known distribution, but with f(0) having any value between zero and one.

For example, let g(x) = {(1 - 0.25) / (1 - e^-3)} e^-3 3^x / x!, for x = 1, 2, 3, ..., and g(0) = 0.25.

Exercise: Verify that the sum of this density is in fact unity.
[Solution: The sum of the Poisson Distribution from 0 to ∞ is 1: Σ_{x=0}^∞ e^-3 3^x / x! = 1.
Therefore, Σ_{x=1}^∞ e^-3 3^x / x! = 1 - e^-3. ⇒ Σ_{x=1}^∞ g(x) = 1 - 0.25. ⇒ Σ_{x=0}^∞ g(x) = 1 - 0.25 + 0.25 = 1.]

This is just an example of a Poisson Distribution Modified at Zero, with λ = 3 and 25% probability placed at zero.

For a Zero-Modified distribution, an arbitrary amount of probability has been placed at zero. In the example above it is 25%. Loss Models uses p_0^M to denote this probability at zero. The remaining probability is spread out proportional to some well-known distribution such as the Poisson.

In general, if f is a distribution on 0, 1, 2, 3,..., and 0 < p_0^M < 1, then
g(0) = p_0^M,  g(x) = f(x) (1 - p_0^M) / {1 - f(0)}, x = 1, 2, 3,...
is a distribution on 0, 1, 2, 3, ....

Exercise: For a Poisson Distribution Modified at Zero, with λ = 3 and 25% probability placed at zero, what are the densities at 0, 1, 2, 3, and 4?
[Solution: For example, the density at 4 is: (0.75)(3^4)e^-3/{(4!)(1 - e^-3)} = 0.133.

x       0      1      2      3      4
f(x)    0.250  0.118  0.177  0.177  0.133  ]

In the case of a Zero-Modified Distribution, there is no relationship assumed between the density at zero and the other densities, other than the fact that all of the densities sum to one.
102 See Section 6.7 and Appendix B.3.2 in Loss Models.
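The densities in the exercise above can be reproduced with a short sketch of my own (the function name is mine, not from the text):

```python
from math import exp, factorial

# Zero-modified Poisson with lambda = 3 and 25% probability placed at zero.
lam, p0_modified = 3.0, 0.25
f0 = exp(-lam)

def zero_modified_poisson(x: int) -> float:
    if x == 0:
        return p0_modified
    return (1 - p0_modified) * (exp(-lam) * lam**x / factorial(x)) / (1 - f0)

print([round(zero_modified_poisson(x), 3) for x in range(5)])
# [0.25, 0.118, 0.177, 0.177, 0.133], matching the exercise.
```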
We have the following four cases:

Distribution           Zero-Modified Distribution, f(0) = p_0^M
Binomial               (1 - p_0^M) {m!/(x!(m-x)!)} q^x (1-q)^(m-x) / {1 - (1-q)^m},  x = 1, 2, 3,..., m
Poisson                (1 - p_0^M) (e^-λ λ^x / x!) / (1 - e^-λ),  x = 1, 2, 3,...
Negative Binomial103   (1 - p_0^M) {r(r+1)...(r+x-1)/x!} β^x / {(1+β)^(x+r) [1 - 1/(1+β)^r]},  x = 1, 2, 3,...
Logarithmic            (1 - p_0^M) {β/(1+β)}^x / {x ln(1+β)},  x = 1, 2, 3,...

These four zero-modified distributions complete the (a, b, 1) class of frequency distributions.104 They each follow the formula: f(x)/f(x-1) = a + b/x, for x ≥ 2.

Note that if p_0^M = 0, one has f(0) = 0 and the zero-modified distribution reduces to a zero-truncated distribution. However, even though it might be useful to think of the zero-truncated distributions as a special case of the zero-modified distributions, Loss Models restricts the term zero-modified to those cases where f(0) > 0.

Moments:

The moments of a zero-modified distribution h are given in terms of those of f by:
E_h[X^n] = (1 - p_0^M) E_f[X^n] / {1 - f(0)}.
For example, for the Zero-Modified Poisson the mean is: (1 - p_0^M) λ / (1 - e^-λ), while the second moment is: (1 - p_0^M)(λ + λ^2) / (1 - e^-λ).

103 The zero-modified version of the Negative Binomial is referred to by Loss Models as the Zero-Modified Extended Truncated Negative Binomial.
104 See Table 6.4 and Appendix B.3 in Loss Models.
Exercise: For a Zero-Modified Poisson with λ = 3 and 25% chance of zero claims, what is the mean?
[Solution: Let f(x) be the untruncated Poisson, and g(x) be the zero-modified distribution.
Then g(x) = 0.75 f(x) / {1 - f(0)}, x > 0. The mean of g is:
Σ_{x=1}^∞ x g(x) = 0.75 Σ_{x=0}^∞ x f(x) / {1 - f(0)} = 0.75 (mean of f) / {1 - f(0)} = 0.75 λ/(1 - e^-λ) = (0.75) 3/(1 - e^-3) = (0.75)(3.157) = 2.368.
Probability Generating Functions: The zero-modified distribution, can be thought of a mixture of a point mass of probability at zero and a zero-truncated distribution. The probability generating function of a mixture is the mixture of the probability generating functions. A point of probability at zero, has a probability generating function E[zn ] = E[z0 ] = 1. Therefore, the Probability generating function, P(z) = E[zN], for a zero-modified distribution can be obtained from that for zero-truncated distribution: M T PM(z) = pM 0 + (1 - p 0 ) P (z).
where PM(z) is the p.g.f. for the zero-modified distribution and PT(z) is the p.g.f. for the zero-truncated distribution, and pM 0 is the probability at zero for the zero-modified distribution. Exercise: What is the Probability Generating Function for a Zero-Modified Poisson Distribution, with 30% probability placed at zero? [Solution: For the zero-truncated Poisson. PT(z) = {eλz - 1} /{eλ - 1}. eλz - 1 M M .] PM(z) = p0 + (1 - p0 )PT(z) = 0.3 + 0.7 λ e - 1 One can derive this relationship as follows: Let g(n) be the zero modified distribution and h(n) be the zero-truncated distribution. M Then g(0) = pM 0 and g(n) = h(n) (1 - p 0 ) for n > 0. ∞
PM(z)
=
∑ n=0
zn g(n)
=
pM 0
∞
+
∑ zn (1 n=1
M M T - pM 0 ) h(n) = p 0 + (1 - p 0 ) P (z).
Thinning:105 If we take at random a fraction of the events, then we get a distribution of the same family. One parameter is altered by the thinning as per the non-zero-modified case. In addition, the probability at zero, pM 0 , is altered by thinning. Distribution
Result of thinning by a factor of t
Zero-Modified Binomial
q → tq pM 0
→
m remains the same M m m m pM 0 - (1- q) + (1- tq) - p0 (1- tq)
1 - (1- q)m
λ → tλ
Zero-Modified Poisson
pM 0
Zero-Modified Negative Binomial106
1 - e- λ
β → tβ pM 0
Zero-Modified Logarithmic
→
- λ + e - tλ - pM e - tλ pM 0 - e 0
→
r remains the same - r + (1+ tβ) - r - pM (1+ tβ)- r pM 0 - (1+ β) 0
1 - (1+ β) - r
β → tβ M ln[1+ tβ] pM 0 → 1 - (1 - p 0 ) ln[1+ β]
In each case, the new probability of zero claims is the probability generating function for the original zero-modified distribution at 1 - t, where t is the thinning factor. M For example, for the Zero-Modified Binomial, P(z) = pM 0 + (1 - p 0 ) (p.g.f. of zero-truncated) =
pM 0
+ (1 -
P(1 - t) = 105 106
pM 0) pM 0
{1 + q(z -1)}m - (1- q)m . 1 - (1- q)m
+ (1 -
pM 0)
M M {1 - qt}m - (1- q)m p0 - (1- q)m + (1- tq)m - p0 (1- tq)m = . 1 - (1- q)m 1 - (1- q)m
See Table 8.3 in Loss Models. Including the special case the zero-modified geometric.
For example, let us assume we look at only large claims, which are t of all claims. Then if we have n claims, the probability of zero large claims is: (1-t)n . Thus the probability of zero large claims is: Prob[zero claims] (1-t)0 + Prob[1 claim] (1-t)1 + Prob[2 claims] (1-t)2 + Prob[3 claims] (1-t)3 + ... E[(1-t)n ] = P(1 - t) = p.g.f. for the original distribution at 1 - t. Exercise: Show that the p.g.f. for the original zero-modified Logarithmic distribution at 1 - t matches the above result for the density at zero for the thinned distribution. M [Solution: For the Zero-Modified Logarithmic, P(z) = pM 0 + (1 - p 0 ) (p.g.f. of Logarithmic) =
ln[1 - β(z - 1)] M pM }. 0 + (1 - p 0 ) {1 ln[1+ β] M M ln[1+ tβ] P(1 - t) = pM 0 + (1 - p 0 ) {1 - ln[1 - βt] / ln[1+β]} = 1 - (1 - p 0 ) ln[1+ β] . ]
Exercise: The number of losses follows a zero-modified Poisson with λ = 2 and pM 0 = 10%. 30% of losses are large. What is the distribution of the large losses? [Solution: Large losses follow a zero-modified Poisson with λ = (30%)(2) = 0.6 and pM 0 =
0.1 - e -2 + e - 0.6 - (0.1) e - 0.6 = 0.5304.] 1 - e- 2
Exercise: The number of members per family follows a zero-truncated Negative Binomial with r = 0.5 and β = 4. It is assumed that 60% of people have first names that begin with the letters A through M, and that size of family is independent of the letters of the first names of its members. What is the distribution of the number of family members with first names that begin with the letters A through M? [Solution: The zero-truncated distribution is mathematically the same as a zero-modified distribution with pM 0 = 0. Thus the thinned distribution is a zero-modified Negative Binomial with r = 0.5, β = (60%)(4) = 2.4, and pM 0 =
0 - 5 - 0.5 + 3.4 - 0.5 - (0) (3.4 -0.5 ) = 0.1721. 1 - 5 - 0.5
Comment: While prior to thinning there is no probability of zero members, after thinning there is a probability of zero members with first names that begin with the letters A through M. Thus the thinned distribution is zero-modified rather than zero-truncated.]
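The thinning result for the zero-modified Poisson can be checked numerically, both from the displayed formula and from the p.g.f. evaluated at 1 - t. A sketch of my own for the exercise above (λ = 2, p_0^M = 10%, t = 30%):

```python
from math import exp

# Thinning a zero-modified Poisson: lambda becomes t*lambda, and the new probability
# at zero should be about 0.5304.
lam, p0_modified, t = 2.0, 0.10, 0.30

new_lam = t * lam
new_p0 = (p0_modified - exp(-lam) + exp(-new_lam) - p0_modified * exp(-new_lam)) / (1 - exp(-lam))

# Equivalently, the new probability at zero is the original p.g.f. evaluated at 1 - t:
pgf_at_1_minus_t = p0_modified + (1 - p0_modified) * (exp(lam * (1 - t)) - 1) / (exp(lam) - 1)

print(round(new_p0, 4), round(pgf_at_1_minus_t, 4))   # both about 0.5304
```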
Problems: Use the following information for the next six questions: The number of claims per year is given by a Zero-Modified Binomial Distribution with parameters q = 0.3 and m = 5, and with 15% probability of zero claims. 14.1 (1 point) What is the mean number of claims over the coming year? A. less than 1.4 B. at least 1.4 but less than 1.5 C. at least 1.5 but less than 1.6 D. at least 1.6 but less than 1.7 E. at least 1.7 14.2 (2 points) What is the variance of the number of claims per year? A. less than 0.98 B. at least 0.98 but less than 1.00 C. at least 1.00 but less than 1.02 D. at least 1.02 but less than 1.04 E. at least 1.04 14.3 (1 point) What is the chance of observing 3 claims over the coming year? A. less than 13.0% B. at least 13.0% but less than 13.4% C. at least 13.4% but less than 13.8% D. at least 13.8% but less than 14.2% E. at least 14.2% 14.4 (2 points) What is the 95th percentile of the distribution of the number of claims per year? A. 1 B. 2 C. 3 D. 4 E. 5 14.5 (2 points) What is the probability generating function at 3? A. less than 9 B. at least 9 but less than 10 C. at least 10 but less than 11 D. at least 11 but less than 12 E. at least 12 14.6 (2 points) Small claims are 70% of all claims. What is the chance of observing exactly 2 small claims over the coming year? A. 20% B. 22% C. 24% D. 26% E. 28%
Use the following information for the next five questions: The number of claims per year is given by a Zero-Modified Negative Binomial Distribution with parameters r = 4 and β = 0.5, and with 35% chance of zero claims. 14.7 (1 point) What is the mean number of claims over the coming year? A. less than 1.7 B. at least 1.7 but less than 1.8 C. at least 1.8 but less than 1.9 D. at least 1.9 but less than 2.0 E. at least 2.0 14.8 (2 points) What is the variance of the number of claims year? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6 14.9 (1 point) What is the chance of observing 7 claims over the coming year? A. less than 0.8% B. at least 0.8% but less than 1.0% C. at least 1.0% but less than 1.2% D. at least 1.2% but less than 1.4% E. at least 1.4% 14.10 (3 points) What is the probability of more than 5 claims in the coming year? A. less than 1% B. at least 1%, but less than 3% C. at least 3%, but less than 5% D. at least 5%, but less than 7% E. at least 7% 14.11 (3 points) Large claims are 40% of all claims. What is the chance of observing more than 1 large claim over the coming year? A. 10% B. 12% C. 14% D. 16% E. 18%
Use the following information for the next four questions:
The number of claims per year is given by a Zero-Modified Logarithmic Distribution with parameter β = 2, and a 25% chance of zero claims.

14.12 (1 point) What is the mean number of claims over the coming year?
A. less than 1.0  B. at least 1.0 but less than 1.1  C. at least 1.1 but less than 1.2
D. at least 1.2 but less than 1.3  E. at least 1.3

14.13 (2 points) What is the variance of the number of claims per year?
A. less than 2.0  B. at least 2.0 but less than 2.2  C. at least 2.2 but less than 2.4
D. at least 2.4 but less than 2.6  E. at least 2.6

14.14 (1 point) What is the chance of observing 6 claims over the coming year?
A. less than 1.1%  B. at least 1.1% but less than 1.3%  C. at least 1.3% but less than 1.5%
D. at least 1.5% but less than 1.7%  E. at least 1.7%

14.15 (2 points) Medium sized claims are 60% of all claims. What is the chance of observing exactly one medium sized claim over the coming year?
A. 31%  B. 33%  C. 35%  D. 37%  E. 39%
14.16 (1 point) The number of claims per year is given by a Zero-Modified Negative Binomial Distribution with parameters r = -0.6 and β = 3, and with a 20% chance of zero claims.
What is the chance of observing 5 claims over the coming year?
A. less than 0.8%  B. at least 0.8% but less than 1.0%  C. at least 1.0% but less than 1.2%
D. at least 1.2% but less than 1.4%  E. at least 1.4%
Use the following information for the next seven questions:
The number of claims per year is given by a Zero-Modified Poisson Distribution with parameter λ = 2.5, and with 30% chance of zero claims.

14.17 (1 point) What is the mean number of claims over the coming year?
A. 1.9  B. 2.0  C. 2.1  D. 2.2  E. 2.3

14.18 (2 points) What is the variance of the number of claims per year?
A. less than 2.7  B. at least 2.7 but less than 2.8  C. at least 2.8 but less than 2.9
D. at least 2.9 but less than 3.0  E. at least 3.0

14.19 (1 point) What is the chance of observing 6 claims over the coming year?
A. less than 2%  B. at least 2% but less than 3%  C. at least 3% but less than 4%
D. at least 4% but less than 5%  E. at least 5%

14.20 (1 point) What is the chance of observing 2 claims over the coming year?
A. 18%  B. 20%  C. 22%  D. 24%  E. 26%

14.21 (2 points) What is the chance of observing fewer than 4 claims over the coming year?
A. less than 70%  B. at least 70% but less than 75%  C. at least 75% but less than 80%
D. at least 80% but less than 85%  E. at least 85%

14.22 (2 points) What is the mode of this frequency distribution?
A. 0  B. 1  C. 2  D. 3  E. 4

14.23 (2 points) Large claims are 20% of all claims. What is the chance of observing exactly one large claim over the coming year?
A. 15%  B. 17%  C. 19%  D. 21%  E. 23%
14.24 (3 points) Let pk denote the probability that the number of claims equals k, for k = 0, 1, ...
If pn / pm = 2.4^(n-m) m! / n!, for m ≥ 0, n ≥ 0, then using the corresponding zero-modified claim count distribution with p0M = 0.31, calculate p3M.
(A) 16%  (B) 18%  (C) 20%  (D) 22%  (E) 24%
14.25 (3 points) The number of losses follows a zero-modified Poisson Distribution with parameters λ and p0M. Small losses are 70% of all losses. From first principles determine the probability of zero small losses.

14.26 (3 points) The following data is the number of sick days taken at a large company during the previous year.
Number of days:          0       1      2      3      4      5     6     7    8+
Number of employees:  50,122  9190  5509  3258  1944  1160  693   418   621
Is it likely that this data was drawn from a member of the (a, b, 0) class?
Is it likely that this data was drawn from a member of the (a, b, 1) class?

14.27 (3 points) For a zero-modified Poisson, p2M = 27.3%, and p3M = 12.7%.
Determine p0M.
(A) 11%  (B) 12%  (C) 13%  (D) 14%  (E) 15%
14.28 (3 points) X is a discrete random variable with a probability function which is a member of the (a, b, 1) class of distributions. pk denotes the probability that X = k.
p1 = 0.1637, p2 = 0.1754, and p3 = 0.1503. Calculate p5.
(A) 7.5%  (B) 7.7%  (C) 7.9%  (D) 8.1%  (E) 8.3%
14.29 (3, 5/00, Q.37) (2.5 points) Given:
(i) pk denotes the probability that the number of claims equals k for k = 0, 1, 2, ...
(ii) pn / pm = m! / n!, for m ≥ 0, n ≥ 0
Using the corresponding zero-modified claim count distribution with p0M = 0.1, calculate p1M.
(A) 0.1  (B) 0.3  (C) 0.5  (D) 0.7  (E) 0.9
Solutions to Problems:

14.1. C. The mean is that of the unmodified Binomial, multiplied by (1 - 0.15) and divided by 1 - f(0):
(0.3)(5)(0.85) / (1 - 0.7^5) = 1.533.

14.2. D. The second moment is that of the unmodified Binomial, multiplied by (1 - 0.15) and divided by 1 - f(0):
(1.05 + 1.5^2)(0.85) / (1 - 0.7^5) = 3.372. Variance = 3.372 - 1.533^2 = 1.022.

14.3. C. For an unmodified Binomial, f(3) = {5! / (3! 2!)} (0.3^3)(0.7^2) = 0.1323.
For the zero-modified distribution one gets the density by multiplying by (1 - 0.15) and dividing by 1 - f(0):
(0.1323)(0.85) / (1 - 0.7^5) = 13.5%.

14.4. C. The 95th percentile is that value corresponding to the distribution function being 95%. For a discrete distribution such as we have here, employ the convention that the 95th percentile is the first value at which the distribution function is greater than or equal to 0.95.
F(2) = 0.8334 < 95%, F(3) = 0.9686 ≥ 95%, and therefore the 95th percentile is 3.
Number of Claims   Unmodified Binomial   Zero-Modified Binomial   Cumulative Zero-Modified Binomial
       0                 16.81%                 15.00%                    15.00%
       1                 36.02%                 36.80%                    51.80%
       2                 30.87%                 31.54%                    83.34%
       3                 13.23%                 13.52%                    96.86%
       4                  2.83%                  2.90%                    99.75%
       5                  0.24%                  0.25%                   100.00%
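A quick numeric check of Solutions 14.1 through 14.4 (an illustrative sketch, not part of the original solutions): it builds the zero-modified Binomial densities directly from the unmodified Binomial.

from math import comb

q, m, p0M = 0.3, 5, 0.15
f = [comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)]        # unmodified Binomial
pM = [p0M] + [(1 - p0M) * f[k] / (1 - f[0]) for k in range(1, m + 1)]   # zero-modified densities

mean = sum(k * pk for k, pk in enumerate(pM))
ex2 = sum(k * k * pk for k, pk in enumerate(pM))
print(mean, ex2 - mean**2)              # about 1.533 and 1.022
cum = 0.0
for k, pk in enumerate(pM):
    cum += pk
    print(k, round(pk, 4), round(cum, 4))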
14.5. C. As shown in Appendix B of Loss Models, for the zero-truncated Binomial Distribution:
PT(z) = [{1 + q(z-1)}^m - (1-q)^m] / [1 - (1-q)^m].
⇒ PT(3) = [{1 + (0.3)(3-1)}^5 - (1-0.3)^5] / [1 - (1-0.3)^5] = 12.402.
The p.g.f. for the zero-modified distribution is: PM(z) = p0M + (1 - p0M) PT(z).
⇒ PM(3) = 0.15 + (0.85)(12.402) = 10.69.
Comment: The densities of the zero-modified distribution:
Number of Claims   Unmodified Binomial   Zero-Modified Binomial
       0                 16.807%                15.000%
       1                 36.015%                36.797%
       2                 30.870%                31.541%
       3                 13.230%                13.517%
       4                  2.835%                 2.897%
       5                  0.243%                 0.248%
PM(3) is the expected value of 3^n:
(15%)(3^0) + (36.797%)(3^1) + (31.541%)(3^2) + (13.517%)(3^3) + (2.897%)(3^4) + (0.248%)(3^5) = 10.69.

14.6. B. After thinning we get another zero-modified Binomial, with m = 5, but q = (0.7)(0.3) = 0.21, and
p0M → {p0M - (1-q)^m + (1-tq)^m - p0M (1-tq)^m} / {1 - (1-q)^m}
     = {0.15 - 0.7^5 + 0.79^5 - (0.15)(0.79^5)} / (1 - 0.7^5) = 0.2927.
The density at two of the new zero-modified Binomial is:
{(1 - 0.2927) / (1 - 0.79^5)} (10)(0.21^2)(0.79^3) = 22.21%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original zero-modified distribution at 1 - t, where t is the thinning factor.

14.7. A. The mean is that of the unmodified Negative Binomial, multiplied by (1 - 0.35) and divided by 1 - f(0):
(4)(0.5)(0.65) / (1 - 1.5^-4) = 1.3 / 0.8025 = 1.62.

14.8. E. The second moment is that of the unmodified Negative Binomial, multiplied by (1 - 0.35) and divided by 1 - f(0):
(3 + 2^2)(0.65) / (1 - 1.5^-4) = 5.67. Variance = 5.67 - 1.62^2 = 3.05.

14.9. B. For the unmodified Negative Binomial, f(7) = (4)(5)(6)(7)(8)(9)(10)(0.5^7) / {(7!)(1.5^11)} = 1.08%.
For the zero-modified distribution one gets the density by multiplying by (1 - 0.35) and dividing by 1 - f(0):
(1.08%)(0.65) / (1 - 1.5^-4) = 0.87%.
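The thinning result in Solution 14.6 above can be checked directly; the following short sketch is illustrative only.

from math import comb

q, m, p0M, t = 0.3, 5, 0.15, 0.7      # original zero-modified Binomial and thinning factor
qt = t * q                            # thinned Binomial parameter

# probability of zero for the thinned zero-modified distribution
p0M_star = (p0M - (1 - q)**m + (1 - qt)**m - p0M * (1 - qt)**m) / (1 - (1 - q)**m)

# density at 2 of the thinned zero-modified Binomial
f2 = comb(m, 2) * qt**2 * (1 - qt)**(m - 2)
pM2 = (1 - p0M_star) * f2 / (1 - (1 - qt)**m)
print(p0M_star, pM2)                  # about 0.2927 and 0.2221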
14.10. C. The chance of more than 5 claims is: 1 - 0.9656 = 3.44%.
Number of Claims   Unmodified Neg. Binomial   Zero-Modified Neg. Binomial   Cumulative Zero-Modified Neg. Binomial
       0                  19.75%                      35.00%                         35.00%
       1                  26.34%                      21.33%                         56.33%
       2                  21.95%                      17.78%                         74.11%
       3                  14.63%                      11.85%                         85.96%
       4                   8.54%                       6.91%                         92.88%
       5                   4.55%                       3.69%                         96.56%
       6                   2.28%                       1.84%                         98.41%
       7                   1.08%                       0.88%                         99.29%
       8                   0.50%                       0.40%                         99.69%
       9                   0.22%                       0.18%                         99.87%
14.11. D. After thinning we get another zero-modified Negative Binomial, with r = 4, but β = (40%)(0.5) = 0.2, and
p0M → {p0M - (1+β)^-r + (1+tβ)^-r - p0M (1+tβ)^-r} / {1 - (1+β)^-r}
     = {0.35 - 1.5^-4 + 1.2^-4 - (0.35)(1.2^-4)} / (1 - 1.5^-4) = 0.5806.
The density at one of the new zero-modified Negative Binomial is:
{(1 - 0.5806) / (1 - 1/1.2^4)} (4)(0.2)/1.2^5 = 0.2604.
Probability of more than one large claim is: 1 - 0.5806 - 0.2604 = 15.90%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original zero-modified distribution at 1 - t, where t is the thinning factor. Similar to Example 8.9 in Loss Models.

14.12. E. The mean of the Logarithmic distribution is: β/ln(1+β) = 2/ln(3) = 1.82.
For the zero-modified distribution, the mean is multiplied by 1 - 0.25: (0.75)(1.82) = 1.37.
Comment: The unmodified Logarithmic distribution has no chance of zero claims. Therefore, we need not divide by 1 - f(0) to get to the zero-modified distribution (or alternately we are dividing by 1 - 0 = 1).

14.13. C. The variance of the unmodified Logarithmic distribution is:
β{1 + β - β/ln(1+β)} / ln(1+β) = 2{3 - 1.82} / ln(3) = 2.15.
Thus the unmodified Logarithmic has a second moment of: 2.15 + 1.82^2 = 5.46.
For the zero-modified distribution, the second moment is multiplied by 1 - 0.25: (0.75)(5.46) = 4.10.
Thus the variance of the zero-modified distribution is: 4.10 - 1.37^2 = 2.22.
14.14. A. For the unmodified Logarithmic distribution, f(x) = {β/(1+β)}^x / {x ln(1+β)}.
f(6) = (2/3)^6 / {6 ln(3)} = 1.33%.
For the zero-modified distribution, the density at 6 is multiplied by 1 - 0.25: (0.75)(1.33%) = 1.00%.

14.15. D. After thinning we get another zero-modified Logarithmic, with β = (60%)(2) = 1.2, and
p0M → 1 - (1 - p0M) ln[1 + tβ]/ln[1 + β] = 1 - (0.75) ln[2.2]/ln[3] = 0.4617.
The density at one of the new zero-modified Logarithmic is:
(1 - 0.4617)(1.2) / {(2.2) ln[2.2]} = 37.23%.
Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original zero-modified distribution at 1 - t, where t is the thinning factor.

14.16. A. For the zero-truncated Negative Binomial Distribution,
f(5) = r(r+1)(r+2)(r+3)(r+4) {β/(1+β)}^5 / {(5!)((1+β)^r - 1)}
     = (-0.6)(0.4)(1.4)(2.4)(3.4)(3/4)^5 / {(120)(4^-0.6 - 1)} = (-2.742)(0.2373) / {(120)(-0.5647)} = 0.96%.
For the zero-modified distribution, multiply by 1 - 0.2: (0.8)(0.96%) = 0.77%.
Comment: This is an extended zero-truncated Negative Binomial distribution, with 0 > r > -1. The same formulas apply as when r > 0. (As r approaches zero one gets a Logarithmic distribution.) For the unmodified Negative Binomial distribution we must have r > 0. So in this case there is no corresponding unmodified distribution.

14.17. A. The mean is that of the unmodified Poisson, multiplied by (1 - 0.3) and divided by 1 - f(0):
(2.5)(0.7) / (1 - e^-2.5) = 1.907.

14.18. E. The second moment is that of the unmodified Poisson, multiplied by (1 - 0.3) and divided by 1 - f(0):
(2.5 + 2.5^2)(0.7) / (1 - e^-2.5) = 6.673. Variance = 6.673 - 1.907^2 = 3.04.

14.19. B. For an unmodified Poisson, f(6) = (2.5^6) e^-2.5 / 6! = 0.0278.
For the zero-modified distribution one gets the density by multiplying by (1 - 0.3) and dividing by 1 - f(0):
(0.0278)(0.7) / (1 - e^-2.5) = 2.12%.

14.20. B. For the unmodified Poisson, f(0) = e^-2.5 = 8.208%, and f(2) = (2.5^2) e^-2.5 / 2 = 25.652%.
The zero-modified Poisson has a density at 2 of: (25.652%)(1 - 30%) / (1 - 8.208%) = 19.56%.
14.21. D. One adds up the chances of 0, 1, 2 and 3 claims, and gets 81.5%.
Number of Claims   Unmodified Poisson   Zero-Modified Poisson   Cumulative Zero-Modified Poisson
       0                  8.21%                30.00%                   30.00%
       1                 20.52%                15.65%                   45.65%
       2                 25.65%                19.56%                   65.21%
       3                 21.38%                16.30%                   81.51%
       4                 13.36%                10.19%                   91.70%
       5                  6.68%                 5.09%                   96.80%
       6                  2.78%                 2.12%                   98.92%
       7                  0.99%                 0.76%                   99.68%
       8                  0.31%                 0.24%                   99.91%
Comment: We are given a 30% chance of zero claims. The remaining 70% is spread in proportion to the unmodified Poisson. For example, (70%)(20.52%)/(1 - 0.0821) = 15.65%, and (70%)(25.65%)/(1 - 0.0821) = 19.56%.
Unlike the zero-truncated distribution, the zero-modified distribution has a probability of zero events.

14.22. A. The mode is where the density function is greatest, which as shown in the table above is 0.
Comment: If the modes of the zero-modified and unmodified distribution are each ≠ 0, then the zero-modified distribution has the same mode as the unmodified distribution, since all the densities on the positive integers are multiplied by the same factor.
Comment: If the mode of the zero-modified and unmodified distribution are ≠ 0, then the zero-modified distribution has the same mode as the unmodified distribution, since all the densities on the positive integers are multiplied by the same factor. 14.23. E. After thinning we get another zero-modified Poisson, with λ = (20%)(2.5) = 0.5, and pM 0
→
- λ + e - tλ - pM e - tλ pM 0 - e 0
1 - e- λ
=
0.3 - e - 2.5 + e - 0.5 - (0.3) (e - 0.5 ) = 0.6999. 1 - e - 2.5
The density at one of the new zero-modified Poisson is: 1 - 0.6999 (0.5 e-0.5) = 23.13%. 1 - e - 0.5 Comment: The probability of zero claims for the thinned distribution is the p.g.f. for the original zero-modified distribution at 1 - t, where t is the thinning factor.
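A brief numeric check of the zero-modified Poisson results above (14.17 through 14.23); the sketch and its variable names are illustrative only.

import math

lam, p0M = 2.5, 0.30
f = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(20)]   # unmodified Poisson
pM = [p0M] + [(1 - p0M) * f[k] / (1 - f[0]) for k in range(1, 20)]     # zero-modified

mean = sum(k * p for k, p in enumerate(pM))
ex2 = sum(k * k * p for k, p in enumerate(pM))
print(mean, ex2 - mean**2, pM[2], sum(pM[:4]))      # about 1.907, 3.04, 0.1956, 0.815

# thinning by t = 20%: probability of zero large claims, compare to 0.6999 in 14.23
t = 0.20
p0M_star = (p0M - math.exp(-lam) + math.exp(-t * lam) - p0M * math.exp(-t * lam)) / (1 - math.exp(-lam))
print(p0M_star)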
14.24. A. f(x+1)/f(x) = 2.4 {x! / (x+1)!} = 2.4/(x+1).
Thus this is a member of the (a, b, 0) class, f(x+1)/f(x) = a + b/(x+1), with a = 0 and b = 2.4.
This is a Poisson Distribution, with λ = 2.4.
For the unmodified Poisson, the probability of more than zero claims is: 1 - e^-2.4.
After zero-modification, this probability is: 1 - 0.31 = 0.69. Thus the zero-modified distribution is:
fM(x) = {0.69 / (1 - e^-2.4)} f(x) = {0.69 / (1 - e^-2.4)} e^-2.4 (2.4^x)/x! = (2.4^x)(0.69) / {(e^2.4 - 1) x!}, x ≥ 1.
fM(3) = (2.4^3)(0.69) / {(e^2.4 - 1)(3!)} = 0.159.
# claims:                  0       1       2       3       4       5       6       7
zero-modified density:   0.31   0.1652  0.1983  0.1586  0.0952  0.0457  0.0183  0.0063
Comment: For a Poisson with λ = 2.4, f(n)/f(m) = (e^-2.4 2.4^n / n!) / (e^-2.4 2.4^m / m!) = 2.4^(n-m) m! / n!.
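The identification in Solution 14.24 can be reproduced with the (a, b, 0) recursion; the following sketch is illustrative only.

import math

a, b, p0M = 0.0, 2.4, 0.31             # a = 0 and b = 2.4 identify a Poisson with lambda = 2.4
lam = b
f = [math.exp(-lam)]
for x in range(10):                     # (a, b, 0) recursion: f(x+1) = (a + b/(x+1)) f(x)
    f.append((a + b / (x + 1)) * f[x])

scale = (1 - p0M) / (1 - f[0])          # spread the remaining 69% over the positive integers
pM = [p0M] + [scale * fx for fx in f[1:]]
print(round(pM[3], 4))                  # about 0.1586, answer (A)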
14.25. If there are n losses, then the probability that zero of them are small is 0.3^n.
Prob[0 small losses] = Prob[0 losses] + Prob[1 loss] Prob[loss is big] + Prob[2 losses] Prob[both losses are big] + ...
= p0M + {(1 - p0M)/(1 - e^-λ)} (λ e^-λ)(0.3) + {(1 - p0M)/(1 - e^-λ)} (λ^2 e^-λ / 2!)(0.3^2) + {(1 - p0M)/(1 - e^-λ)} (λ^3 e^-λ / 3!)(0.3^3) + ...
= p0M + {(1 - p0M)/(1 - e^-λ)} e^-λ {0.3λ + (0.3λ)^2/2! + (0.3λ)^3/3! + ...}
= p0M + {(1 - p0M)/(1 - e^-λ)} e^-λ {e^(0.3λ) - 1}
= p0M + {(1 - p0M)/(1 - e^-λ)} {e^(-0.7λ) - e^-λ}
= {p0M (1 - e^-λ) + (1 - p0M)(e^(-0.7λ) - e^-λ)} / (1 - e^-λ)
= {p0M - e^-λ + e^(-0.7λ) - p0M e^(-0.7λ)} / (1 - e^-λ).
Comment: Matches the general formula with t = 0.7:
p0M → {p0M - e^-λ + e^-tλ - p0M e^-tλ} / (1 - e^-λ).
The thinned distribution is also a zero-modified Poisson, with λ* = 0.7λ. The probability of zero claims for the thinned distribution is the p.g.f. for the original zero-modified distribution at 1 - t, where t is the thinning factor.
14.26. Calculate (x+1)f(x+1)/f(x) = (x+1)(number with x+1) / (number with x).
Number of Days   Observed   (x+1)f(x+1)/f(x)   Differences
      0           50,122         0.183
      1            9,190         1.199            1.016
      2            5,509         1.774            0.575
      3            3,258         2.387            0.613
      4            1,944         2.984            0.597
      5            1,160         3.584            0.601
      6              693         4.222            0.638
      7              418
      8+             621
The accident profile is not approximately linear starting at zero. Thus, this is probably not from a member of the (a, b, 0) class.
The accident profile is approximately linear starting at one. Thus, this is probably from a member of the (a, b, 1) class.
Comment: f(x+1)/f(x) = a + b/(x+1), so (x+1)f(x+1)/f(x) = a(x+1) + b = ax + a + b.
The slope is positive, so a > 0 and we have a Negative Binomial. The slope is a ≅ 0.6. The intercept is about 0.6. Thus a + b ≅ 0.6. Therefore, b ≅ 0. For the Negative Binomial, b = (r-1)β/(1+β). Thus b = 0 implies r ≅ 1. Thus the data may have been drawn from a Zero-Modified Geometric, with β ≅ 0.6.
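The accident profile in Solution 14.26 is easy to recompute; this sketch is illustrative only.

data = {0: 50122, 1: 9190, 2: 5509, 3: 3258, 4: 1944, 5: 1160, 6: 693, 7: 418}

# accident profile: (x+1) f(x+1) / f(x); roughly linear in x for an (a, b, 0) member
prev = None
for x in range(7):
    ratio = (x + 1) * data[x + 1] / data[x]
    diff = None if prev is None else round(ratio - prev, 3)
    print(x, round(ratio, 3), diff)
    prev = ratio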
14.27. E. p2M = f(2) (1 - p0M) / (1 - f(0)).  p3M = f(3) (1 - p0M) / (1 - f(0)).
Thus p3M / p2M = f(3) / f(2) = (λ^3 e^-λ / 6) / (λ^2 e^-λ / 2) = λ/3. ⇒ λ/3 = 12.7%/27.3%. ⇒ λ = 1.396.
⇒ 27.3% = {(1.396^2) e^-1.396 / 2} (1 - p0M) / (1 - e^-1.396). ⇒ p0M = 14.86%.
Comment: p1M = 39.11%.
14.28. B. Since we have a member of the (a, b, 1) class:
p2/p1 = a + b/2. ⇒ 2a + b = (2)(0.1754)/0.1637 = 2.1429.
p3/p2 = a + b/3. ⇒ 3a + b = (3)(0.1503)/0.1754 = 2.5707.
⇒ a = 0.4278. ⇒ b = 1.2873.
p4 = (a + b/4) p3 = (0.4278 + 1.2873/4)(0.1503) = 0.1127.
p5 = (a + b/5) p4 = (0.4278 + 1.2873/5)(0.1127) = 0.0772.
Comment: Based on a zero-modified Negative Binomial, with r = 4, β = 0.75, and p0M = 20%.

14.29. C. f(x+1)/f(x) = x!/(x+1)! = 1/(x+1).
Thus this is a member of the (a, b, 0) class, f(x+1)/f(x) = a + b/(x+1), with a = 0 and b = 1.
This is a Poisson Distribution, with λ = 1.
For the unmodified Poisson, the probability of more than zero claims is: 1 - e^-1.
After zero-modification, this probability is: 1 - 0.1 = 0.9. Thus the zero-modified distribution is:
fM(x) = {0.9/(1 - e^-1)} f(x) = {0.9/(1 - e^-1)} e^-1 (1^x)/x! = 0.9/{(e - 1) x!}, x ≥ 1.
fM(1) = 0.9/(e - 1) = 0.524.
# claims:                  0       1       2       3       4       5       6
zero-modified density:    0.1    0.5238  0.2619  0.0873  0.0218  0.0044  0.0007
Comment: For a Poisson with λ = 1, f(n)/f(m) = (e^-1 1^n / n!) / (e^-1 1^m / m!) = m! / n!.
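A brief numeric check of Solution 14.28 using the (a, b, 1) recursion; the sketch is illustrative only.

# Solve for a and b from the (a, b, 1) recursion p_k = (a + b/k) p_{k-1}, k >= 2,
# then extend to p5 as in 14.28.
p1, p2, p3 = 0.1637, 0.1754, 0.1503

# two linear equations: 2a + b = 2 p2/p1 and 3a + b = 3 p3/p2
a = 3 * p3 / p2 - 2 * p2 / p1
b = 2 * p2 / p1 - 2 * a

p4 = (a + b / 4) * p3
p5 = (a + b / 5) * p4
print(round(a, 4), round(b, 4), round(p4, 4), round(p5, 4))   # 0.4278, 1.2873, 0.1127, 0.0772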
Section 15, Compound Frequency Distributions107

A compound frequency distribution has a primary and a secondary distribution, each of which is a frequency distribution. The primary distribution determines how many independent random draws from the secondary distribution we sum.
For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by each taxicab is independent of the number of taxicabs that arrive and is independent of the number of passengers dropped off by any other taxicab. Then the aggregate number of passengers dropped off per minute at the Heartbreak Hotel is an example of a compound frequency distribution. It is a compound Poisson-Binomial distribution, with parameters λ = 1.3, q = 0.4, m = 5.108
The distribution function of the primary Poisson with λ = 1.3 is as follows:
Number of Claims   Probability Density Function   Cumulative Distribution Function
      0                    27.253%                          0.27253
      1                    35.429%                          0.62682
      2                    23.029%                          0.85711
      3                     9.979%                          0.95690
      4                     3.243%                          0.98934
      5                     0.843%                          0.99777
      6                     0.183%                          0.99960
So for example, there is a 3.243% chance that 4 taxicabs arrive; in which case the number of passengers dropped off is the sum of 4 independent identically distributed Binomials109, given by the secondary Binomial Distribution. There is a 27.253% chance there are no taxicabs, a 35.429% chance we take one Binomial, a 23.029% chance we sum the result of 2 independent identically distributed Binomials, etc.
107 See Section 6.8 of Loss Models, not on the syllabus. However, compound distributions are mathematically the same as aggregate distributions. See "Mahlerʼs Guide to Aggregate Distributions." Some of you may better understand the idea of compound distributions by seeing how they are simulated in "Mahlerʼs Guide to Simulation."
108 In the name of a compound distribution, the primary distribution is listed first and the secondary distribution is listed second.
109 While we happen to know that the sum of 4 independent Binomials each with q = 0.4, m = 5 is another Binomial with parameters q = 0.4, m = 20, that fact is not essential to the general concept of a compound distribution.
The secondary Binomial Distribution with q = 0.4, m = 5 is as follows:
Number of Claims   Probability Density Function   Cumulative Distribution Function
      0                     7.776%                          0.07776
      1                    25.920%                          0.33696
      2                    34.560%                          0.68256
      3                    23.040%                          0.91296
      4                     7.680%                          0.98976
      5                     1.024%                          1.00000
Thus assuming a taxicab arrives, there is a 34.560% chance that 2 passengers are dropped off.
In this example, the primary distribution determines how many taxicabs arrive, while the secondary distribution determines the number of passengers departing per taxicab. Instead, the primary distribution could be the number of envelopes arriving and the secondary distribution could be the number of claims in each envelope.110 Actuaries often use compound distributions when the primary distribution determines how many accidents there are, while for each accident the number of persons injured or number of claimants is determined by the secondary distribution.111 This particular model, while useful for comprehension, may or may not apply to any particular use of the mathematical concept of compound frequency distributions.
There are a number of methods of computing the density of compound distributions, among them the use of convolutions and the use of the Recursive Method (Panjer Algorithm).112
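Some readers may find it helpful to simulate a compound distribution to get a feel for it. The following sketch simulates the taxicab example (illustrative only; the seed and number of simulations are arbitrary choices). It uses the fact noted in footnote 109 that a sum of independent Binomials with the same q is again Binomial.

import numpy as np

rng = np.random.default_rng(seed=7)
lam, q, m, n_sims = 1.3, 0.4, 5, 10**6

cabs = rng.poisson(lam, size=n_sims)        # primary: number of taxicabs per minute
# secondary: total passengers = sum of 'cabs' independent Binomial(m, q) counts,
# which here is the same as one Binomial(m*cabs, q) draw since the Binomials share q
passengers = rng.binomial(m * cabs, q)

print(passengers.mean(), passengers.var())  # compare to the mean 2.6 and variance 6.76 in Section 16
print((passengers == 0).mean())             # estimated probability of zero passengers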
Probability Generating Function of Compound Distributions:

One can get the Probability Generating Function of a compound distribution in terms of those of its primary and secondary distributions:
p.g.f. of compound distribution = p.g.f. of primary distribution[p.g.f. of secondary distribution]
P(z) = P1[P2(z)].

110 See 3, 11/01, Q.30.
111 See 3, 5/01, Q.36.
112 Both discussed in "Mahlerʼs Guide to Aggregate Distributions," where they are applied to both compound and aggregate distributions.
Exercise: What is the Probability Generating Function of a Compound Geometric-Binomial Distribution, with β = 3, q = 0.1, and m = 2?
[Solution: The p.g.f. of the primary Geometric is: 1 / {1 - 3(z-1)} = 1 / (4 - 3z), z < 1 + 1/β = 4/3.
The p.g.f. of the secondary Binomial is: {1 + (0.1)(z-1)}^2 = (0.9 + 0.1z)^2 = 0.01z^2 + 0.18z + 0.81.
P(z) = P1[P2(z)] = 1 / {4 - 3(0.01z^2 + 0.18z + 0.81)} = -1/(0.03z^2 + 0.54z - 1.57), z < 4/3.]

Recall that for any frequency distribution, f(0) = P(0).
Therefore, for a compound distribution, c(0) = Pc(0) = P1[P2(0)] = P1[s(0)].
compound density at 0 = p.g.f. of the primary evaluated at the density at 0 of the secondary.113
For example, in the previous exercise, the density of the compound distribution at zero is its p.g.f. at z = 0: 1/1.57 = 0.637. The density at 0 of the secondary Binomial Distribution is: 0.9^2 = 0.81.
The p.g.f. of the primary distribution at 0.81 is: 1 / {4 - (3)(0.81)} = 1/1.57 = 0.637.

If one takes the p.g.f. of a compound distribution to a power ρ > 0, P(z)^ρ = P1^ρ[P2(z)].
Thus if the primary distribution is infinitely divisible, i.e., P1^ρ has the same form as P1, then P^ρ has the same form as P. If the primary distribution is infinitely divisible, then so is the compound distribution. Since the Poisson and the Negative Binomial are each infinitely divisible, so are compound distributions with a primary distribution which is either a Poisson or a Negative Binomial (including a Geometric).

Adding Compound Distributions:

For example, let us assume that taxi cabs arrive at a hotel (primary distribution) and drop people off (secondary distribution). Assume two independent Compound Poisson Distributions with the same secondary distribution. The first compound distribution represents those cabs whose drivers were born in January through June and has λ = 11, while the second compound distribution represents those cabs whose drivers were born in July through December and has λ = 9. Then the sum of the two distributions represents the passengers from all of the cabs, and is a Compound Poisson Distribution with λ = 11 + 9 = 20, and the same secondary distribution as each of the individual Compound Distributions. Note that the parameter of the primary rather than the secondary distribution was affected.

113 This is the first step of the Panjer Algorithm, discussed in "Mahlerʼs Guide to Aggregate Distributions."
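A numeric check of the Compound Geometric-Binomial exercise above (illustrative only; the function names are mine).

beta, q, m = 3.0, 0.1, 2

def pgf_geometric(z):            # primary Geometric, valid for z < 1 + 1/beta
    return 1.0 / (1.0 - beta * (z - 1.0))

def pgf_binomial(z):             # secondary Binomial
    return (1.0 + q * (z - 1.0)) ** m

def pgf_compound(z):
    return pgf_geometric(pgf_binomial(z))

print(pgf_compound(0.0))         # density at 0: about 0.637
print(pgf_geometric(0.9 ** 2))   # same thing: P1 evaluated at the secondary's density at 0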
Exercise: Let X be a Poisson-Binomial compound frequency distribution with λ = 4.3, q = 0.2, and m = 5. Let Y be a Poisson-Binomial compound frequency distribution with λ = 2.4, q = 0.2, and m = 5. X and Y are independent. What is the distribution of X + Y?
[Solution: A Poisson-Binomial Distribution with λ = 4.3 + 2.4 = 6.7, q = 0.2, and m = 5.]

The sum of two independent identically distributed Compound Poisson variables has the same form. The sum of two independent identically distributed Compound Negative Binomial variables has the same form.

Exercise: Let X be a Negative Binomial-Poisson compound frequency distribution with β = 0.7, r = 2.5, and λ = 3. What is the distributional form of the sum of two independent random draws from X?
[Solution: A Negative Binomial-Poisson with β = 0.7, r = (2)(2.5) = 5, and λ = 3.]

Exercise: Let X be a Poisson-Geometric compound frequency distribution with λ = 0.3 and β = 1.5.
What is the distributional form of the sum of twenty independent random draws from X?
[Solution: The sum of 20 independent identically distributed variables is of the same form. However, λ = (20)(0.3) = 6. We get a Poisson-Geometric compound frequency distribution with λ = 6 and β = 1.5.]

If one adds independent identically distributed Compound Binomial variables one gets the same form.

Exercise: Let X be a Binomial-Geometric compound frequency distribution with q = 0.2, m = 3, and β = 1.5. What is the distributional form of the sum of twenty independent random draws from X?
[Solution: The sum of 20 independent identically distributed Binomial variables is of the same form, with m = (20)(3) = 60. We get a Binomial-Geometric compound frequency distribution with q = 0.2, m = 60, and β = 1.5.]
Thinning Compound Distributions:

Thinning compound distributions can be done in two different manners; one manner affects the primary distribution, and the other manner affects the secondary distribution.
For example, assume that taxi cabs arrive at a hotel (primary distribution) and drop people off (secondary distribution). Then we can either select certain types of cabs or certain types of people. Depending on which we select, we affect the primary or the secondary distribution.
Assume we select only those cabs that are less than one year old (and assume age of cab is independent of the number of people dropped off and the frequency of arrival of cabs). Then this would affect the primary distribution, the number of cabs.

Exercise: Cabs arrive via a Poisson with mean 1.3. The number of people dropped off by each cab is Binomial with q = 0.2 and m = 5. The number of people dropped off per cab is independent of the number of cabs that arrive. 30% of cabs are less than a year old. The age of cabs is independent of the number of people dropped off and the frequency of arrival of cabs.
What is the distribution of the number of people dropped off by cabs less than one year old?
[Solution: Cabs less than a year old arrive via a Poisson with λ = (30%)(1.3) = 0.39. There is no effect on the number of people per cab (secondary distribution). We get a Poisson-Binomial compound frequency distribution with λ = 0.39, q = 0.2, and m = 5.]

This first manner of thinning affects the primary distribution. For example, it might occur if the primary distribution represents the number of accidents and the secondary distribution represents the number of claims. For example, assume that the number of accidents is Negative Binomial with β = 2 and r = 30, and the number of claims per accident is Binomial with q = 0.3 and m = 7. Then the total number of claims is Compound Negative Binomial-Binomial with parameters β = 2, r = 30, q = 0.3 and m = 7.
Exercise: Accidents are assigned at random to one of four claims adjusters: Jerry, George, Elaine, or Cosmo. What is the distribution of the number of claims adjusted by George?
[Solution: We are selecting at random 1/4 of the accidents. We are thinning the Negative Binomial Distribution of the number of accidents. Therefore, the number of accidents assigned to George is Negative Binomial with β = 2/4 = 0.5 and r = 30. The number of claims adjusted by George is Compound Negative Binomial-Binomial with parameters β = 0.5, r = 30, q = 0.3 and m = 7.]

Returning to the cab example, assume we select only female passengers (and gender of passenger is independent of the number of people dropped off and the frequency of arrival of cabs). Then this would affect the secondary distribution, the number of passengers.

Exercise: Cabs arrive via a Poisson with mean 1.3. The number of people dropped off by each cab is Binomial with q = 0.2 and m = 5. The number of people dropped off per cab is independent of the number of cabs that arrive. 40% of the passengers are female. The gender of passengers is independent of the number of people dropped off and the frequency of arrival of cabs.
What is the distribution of the number of females dropped off by cabs?
[Solution: The distribution of female passengers per cab is Binomial with q = (0.4)(0.2) = 0.08 and m = 5. There is no effect on the number of cabs (primary distribution). We get a Poisson-Binomial compound frequency distribution with λ = 1.3, q = 0.08, and m = 5.]

This second manner of thinning a compound distribution affects the secondary distribution. It is mathematically the same as what happens when one takes only the large claims in a frequency and severity situation, when the frequency distribution itself is compound.114 For example, suppose frequency is Poisson-Binomial with λ = 1.3, q = 0.2, and m = 5, and 40% of the claims are large. The number of large claims would be simulated by first getting a random draw from the Poisson, then simulating the appropriate number of random Binomials, and then for each claim from the Binomial there is a 40% chance of selecting it at random independent of any other claims. This is mathematically the same as thinning the Binomial. Therefore, large claims have a Poisson-Binomial compound frequency distribution with λ = 1.3, q = (0.4)(0.2) = 0.08 and m = 5.

114 This is what is considered in Section 8.6 of Loss Models.
Exercise: Let frequency be given by a Geometric-Binomial compound frequency distribution with β = 1.5, q = 0.2, and m = 3. Severity follows an Exponential Distribution with mean 1000. Frequency and severity are independent.
What is the frequency distribution of losses of size between 500 and 2000?
[Solution: The fraction of losses that are of size between 500 and 2000 is:
F(2000) - F(500) = (1 - e^(-2000/1000)) - (1 - e^(-500/1000)) = e^-0.5 - e^-2 = 0.4712.
Thus the losses of size between 500 and 2000 follow a Geometric-Binomial compound frequency distribution with β = 1.5, q = (0.4712)(0.2) = 0.0942, and m = 3.]

Proof of Some Thinning Results:115

One can use the result for the probability generating function of a compound distribution,
p.g.f. of compound distribution = p.g.f. of primary distribution[p.g.f. of secondary distribution],
in order to determine the results of thinning a Poisson, Binomial, or Negative Binomial Distribution.
Assume one has a Poisson Distribution with mean λ. Assume one selects at random 30% of the claims. This is mathematically the same as a compound distribution with a primary distribution that is Poisson with mean λ and a secondary distribution that is Bernoulli with q = 0.3.
The p.g.f. of the Poisson is P(z) = exp[λ(z-1)]. The p.g.f. of the Bernoulli is P(z) = 1 + 0.3(z-1).
The p.g.f. of the compound distribution is obtained by replacing z in the p.g.f. of the primary Poisson with the p.g.f. of the secondary Bernoulli:
P(z) = exp[λ{1 + 0.3(z-1) - 1}] = exp[(0.3λ)(z-1)].
This is the p.g.f. of a Poisson Distribution with mean 0.3λ. Thus the thinned distribution is also Poisson, with mean 0.3λ. In general, when thinning a Poisson by a factor of t, the thinned distribution is also Poisson with mean tλ.

115 See Section 8.6 of Loss Models.
Similarly, assume we are thinning a Binomial Distribution with parameters q and m. The p.g.f. of the Binomial is P(z) = {1 + q(z-1)}^m. This is mathematically the same as a compound distribution with secondary distribution a Bernoulli with mean t. The p.g.f. of this compound distribution is:
{1 + q(1 + t(z-1) - 1)}^m = {1 + tq(z-1)}^m.
This is the p.g.f. of a Binomial Distribution with parameters tq and m. In general, when thinning a Binomial by a factor of t, the thinned distribution is also Binomial with parameters tq and m.
Assume we are thinning a Negative Binomial Distribution with parameters β and r. The p.g.f. of the Negative Binomial is P(z) = {1 - β(z-1)}^-r. This is mathematically the same as a compound distribution with secondary distribution a Bernoulli with mean t. The p.g.f. of this compound distribution is:
{1 - β(1 + t(z-1) - 1)}^-r = {1 - tβ(z-1)}^-r.
This is the p.g.f. of a Negative Binomial Distribution with parameters r and tβ. In general, when thinning a Negative Binomial by a factor of t, the thinned distribution is also Negative Binomial with parameters tβ and r.116
Since thinning is mathematically the same as a compound distribution with secondary distribution a Bernoulli with mean t, and the p.g.f. of the Bernoulli is 1 - t + tz, the p.g.f. of the thinned distribution is P(1 - t + tz), where P(z) is the p.g.f. of the original distribution.
In general, P(0) = f(0). Thus the density at zero for the thinned distribution is: P(1 - t + t0) = P(1 - t).
The density of the thinned distribution at zero is the p.g.f. of the original distribution at 1 - t.117

Let us assume instead we start with a zero-modified distribution. Let P(z) be the p.g.f. of the original distribution prior to being zero-modified.
Then PZM(z) = p0M + (1 - p0M) PZT(z) = p0M + (1 - p0M) {P(z) - f(0)} / {1 - f(0)}.
Now the density at zero for the thinned version of the original distribution is: P(1 - t).
The density at zero for the thinned version of the zero-modified distribution is:
p0M* = PZM(1 - t) = p0M + (1 - p0M) {P(1 - t) - f(0)} / {1 - f(0)}.
⇒ 1 - p0M* = (1 - p0M) {1 - P(1 - t)} / {1 - f(0)}.

116 Including the special case of the Geometric Distribution.
117 This general result was discussed previously with respect to thinning zero-modified distributions.
The p.g.f. of the thinned zero-modified distribution is:
PZM(1 - t + tz) = p0M + (1 - p0M) {P(1 - t + tz) - f(0)} / {1 - f(0)}
= p0M* - (1 - p0M) {P(1 - t) - f(0)} / {1 - f(0)} + (1 - p0M) {P(1 - t + tz) - f(0)} / {1 - f(0)}
= p0M* + (1 - p0M) {P(1 - t + tz) - P(1 - t)} / {1 - f(0)}
= p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)}.
Now, {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)} =
{(p.g.f. of thinned non-modified dist.) - (density at zero of thinned non-modified dist.)} / {1 - (density at zero of thinned non-modified dist.)}.
Therefore, the form of the p.g.f. of the thinned zero-modified distribution,
p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)},
is the usual form of the p.g.f. of a zero-modified distribution, with the thinned version of the original distribution taking the place of the original distribution. Therefore, provided thinning preserves the family of the original distribution, the thinned zero-modified distribution is of the same family with p0M*, and with the other parameters as per thinning of the non-modified distribution. Specifically, as discussed before:

Distribution                          Result of thinning by a factor of t
Zero-Modified Binomial                q → tq, m remains the same
                                      p0M → {p0M - (1-q)^m + (1-tq)^m - p0M (1-tq)^m} / {1 - (1-q)^m}
Zero-Modified Poisson                 λ → tλ
                                      p0M → {p0M - e^-λ + e^-tλ - p0M e^-tλ} / {1 - e^-λ}
Zero-Modified Negative Binomial       β → tβ, r remains the same
                                      p0M → {p0M - (1+β)^-r + (1+tβ)^-r - p0M (1+tβ)^-r} / {1 - (1+β)^-r}
As discussed previously, things work similarly for a zero-modified Logarithmic. Let P(z) be the p.g.f. of the original Logarithmic distribution prior to being zero-modified.
Then PZM(z) = p0M + (1 - p0M) P(z).
Now the density at zero for the thinned version of the original distribution is: P(1 - t).
The density at zero for the thinned version of the zero-modified distribution is:
p0M* = PZM(1 - t) = p0M + (1 - p0M) P(1 - t). ⇒ 1 - p0M* = (1 - p0M) {1 - P(1 - t)}.
As before, since the p.g.f. of the secondary Bernoulli is 1 - t + tz, the p.g.f. of the thinned zero-modified distribution is:
PZM(1 - t + tz) = p0M + (1 - p0M) P(1 - t + tz) = p0M* - (1 - p0M) P(1 - t) + (1 - p0M) P(1 - t + tz)
= p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)}.
Now, {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)} =
{(p.g.f. of thinned non-modified dist.) - (density at zero of thinned non-modified dist.)} / {1 - (density at zero of thinned non-modified dist.)}.
Therefore, the form of the p.g.f. of the thinned zero-modified distribution,
p0M* + (1 - p0M*) {P(1 - t + tz) - P(1 - t)} / {1 - P(1 - t)},
is the usual form of the p.g.f. of a zero-modified distribution, with the thinned version of the original distribution taking the place of the original distribution. Therefore, since thinning results in another Logarithmic, the thinned zero-modified distribution is of the same family with p0M*, and the other parameter as per thinning of the non-modified distribution. As discussed before:

Zero-Modified Logarithmic             β → tβ
                                      p0M → 1 - (1 - p0M) ln[1 + tβ] / ln[1 + β]
Problems:

15.1 (2 points) The number of accidents is Geometric with β = 1.7. The number of claims per accident is Poisson with λ = 3.1. For the total number of claims, what is the Probability Generating Function, P(z)?
A. exp[3.1(z - 1)] / (2.7 - 1.7z)
B. 1 / {2.7 - 1.7 exp[3.1(z - 1)]}
C. exp[3.1(z - 1)] + (2.7 - 1.7z)
D. exp[3.1(z - 1.7) / (2.7 - 1.7z)]
E. None of the above

15.2 (1 point) Frequency is given by a Poisson-Binomial compound frequency distribution, with λ = 0.18, q = 0.3, and m = 3. One third of all losses are greater than $10,000. Frequency and severity are independent. What is the frequency distribution of losses of size greater than $10,000?
A. Compound Poisson-Binomial with λ = 0.18, q = 0.3, and m = 3.
B. Compound Poisson-Binomial with λ = 0.18, q = 0.1, and m = 3.
C. Compound Poisson-Binomial with λ = 0.18, q = 0.3, and m = 1.
D. Compound Poisson-Binomial with λ = 0.06, q = 0.3, and m = 3.
E. None of the above.

15.3 (1 point) X is given by a Binomial-Geometric compound frequency distribution, with q = 0.15, m = 3, and β = 2.3. Y is given by a Binomial-Geometric compound frequency distribution, with q = 0.15, m = 5, and β = 2.3. X and Y are independent. What is the distributional form of X + Y?
A. Compound Binomial-Geometric with q = 0.15, m = 4, and β = 2.3
B. Compound Binomial-Geometric with q = 0.15, m = 8, and β = 2.3
C. Compound Binomial-Geometric with q = 0.15, m = 4, and β = 4.6
D. Compound Binomial-Geometric with q = 0.15, m = 8, and β = 4.6
E. None of the above.
15.4 (2 points) A compound claims frequency model has the following properties:
(i) The primary distribution has probability generating function: P(z) = 0.2z + 0.5z^2 + 0.3z^3.
(ii) The secondary distribution has probability generating function: P(z) = exp[0.7(z - 1)].
Calculate the probability of no claims from this compound distribution.
(A) 18%  (B) 20%  (C) 22%  (D) 24%  (E) 26%

15.5 (1 point) Assume each exposure has a Poisson-Poisson compound frequency distribution, as per Loss Models, with λ1 = 0.03 and λ2 = 0.07. You insure 20,000 independent exposures. What is the frequency distribution for your portfolio?
A. Compound Poisson-Poisson with λ1 = 0.03 and λ2 = 0.07
B. Compound Poisson-Poisson with λ1 = 0.03 and λ2 = 1400
C. Compound Poisson-Poisson with λ1 = 600 and λ2 = 0.07
D. Compound Poisson-Poisson with λ1 = 600 and λ2 = 1400
E. None of the above.

15.6 (2 points) Frequency is given by a Poisson-Binomial compound frequency distribution, with parameters λ = 1.2, q = 0.1, and m = 4. What is the Probability Generating Function?
A. {1 + 0.1(z - 1)}^4
B. exp[1.2(z - 1)]
C. exp[1.2({1 + 0.1(z - 1)}^4 - 1)]
D. {1 + 0.1(exp[1.2(z - 1)] - 1)}^4
E. None of the above

15.7 (1 point) The total number of claims from a book of business with 100 exposures has a Compound Poisson-Geometric Distribution with λ = 4 and β = 0.8. Next year this book of business will have 75 exposures. Next year, what is the distribution of the total number of claims from this book of business?
A. Compound Poisson-Geometric with λ = 4 and β = 0.8.
B. Compound Poisson-Geometric with λ = 3 and β = 0.8.
C. Compound Poisson-Geometric with λ = 4 and β = 0.6.
D. Compound Poisson-Geometric with λ = 3 and β = 0.6.
E. None of the above.
15.8 (2 points) A compound claims frequency model has the following properties:
(i) The primary distribution has probability generating function: P(z) = 1 / (5 - 4z).
(ii) The secondary distribution has probability generating function: P(z) = (0.8 + 0.2z)^3.
Calculate the probability of no claims from this compound distribution.
(A) 28%  (B) 30%  (C) 32%  (D) 34%  (E) 36%

15.9 (1 point) The total number of claims from a group of 50 drivers has a Compound Negative Binomial-Poisson Distribution with β = 0.4, r = 3, and λ = 0.7. What is the distribution of the total number of claims from 500 similar drivers?
A. Compound Negative Binomial-Poisson with β = 0.4, r = 30, and λ = 0.7.
B. Compound Negative Binomial-Poisson with β = 4, r = 3, and λ = 0.7.
C. Compound Negative Binomial-Poisson with β = 0.4, r = 3, and λ = 7.
D. Compound Negative Binomial-Poisson with β = 4, r = 30, and λ = 7.
E. None of the above.

15.10 (SOA M, 11/05, Q.27 & 2009 Sample Q.208) (2.5 points) An actuary has created a compound claims frequency model with the following properties:
(i) The primary distribution is the negative binomial with probability generating function P(z) = [1 - 3(z - 1)]^-2.
(ii) The secondary distribution is the Poisson with probability generating function P(z) = exp[λ(z - 1)].
(iii) The probability of no claims equals 0.067.
Calculate λ.
(A) 0.1  (B) 0.4  (C) 1.6  (D) 2.7  (E) 3.1
Solutions to Problems:

15.1. B. P(z) = P1[P2(z)]. The p.g.f. of the primary Geometric is: 1/{1 - β(z-1)} = 1/{1 - 1.7(z-1)} = 1/(2.7 - 1.7z).
The p.g.f. of the secondary Poisson is: exp[λ(z-1)] = exp[3.1(z-1)].
Thus the p.g.f. of the compound distribution is: 1 / {2.7 - 1.7 exp[3.1(z-1)]}.
Comment: P(z) only exists for z < 1 + 1/β = 1 + 1/1.7.

15.2. B. We are taking 1/3 of the claims from the secondary Binomial. Thus the secondary distribution is Binomial with q = 0.3/3 = 0.1 and m = 3. Thus the frequency distribution of losses of size greater than $10,000 is given by a Poisson-Binomial compound frequency distribution, as per Loss Models, with λ = 0.18, q = 0.1, and m = 3.

15.3. B. Provided the secondary distributions are the same, the primary distributions add as they usually would. The sum of two independent Binomials with the same q is another Binomial with the sum of the m parameters. In this case it is a Binomial with q = 0.15 and m = 3 + 5 = 8.
X + Y is a Binomial-Geometric with q = 0.15, m = 8, and β = 2.3.
Comment: The secondary distributions determine how many claims there are per accident. The primary distributions determine how many accidents. In this case the Binomial distributions of the number of accidents add.

15.4. E. P(z) = P1[P2(z)]. Density at 0 is: P(0) = P1[P2(0)] = P1[e^-0.7] = 0.2e^-0.7 + 0.5e^-1.4 + 0.3e^-2.1 = 0.259.
Alternately, the primary distribution has 20% probability of 1, 50% probability of 2, and 30% probability of 3, while the secondary distribution is a Poisson with λ = 0.7. The density at zero of the secondary distribution is e^-0.7. Therefore, the probability of zero claims for the compound distribution is:
(0.2)(Prob 0 from secondary) + (0.5)(Prob 0 from secondary)^2 + (0.3)(Prob 0 from secondary)^3
= 0.2e^-0.7 + 0.5(e^-0.7)^2 + 0.3(e^-0.7)^3 = 0.259.

15.5. C. One adds up 20,000 independent identically distributed variables. In the case of a Compound Poisson distribution, the primary Poissons add to give another Poisson with λ1 = (20,000)(0.03) = 600. The secondary distribution stays the same. The portfolio has a compound Poisson-Poisson with λ1 = 600 and λ2 = 0.07.
15.6. C. The p.g.f. of the primary Poisson is exp[λ(z-1)] = exp[1.2(z-1)].
The p.g.f. of the secondary Binomial is {1 + q(z-1)}^m = {1 + 0.1(z-1)}^4.
Thus the p.g.f. of the compound distribution is P(z) = P1[P2(z)] = exp[1.2({1 + 0.1(z-1)}^4 - 1)].

15.7. B. Poisson-Geometric with λ = (75/100)(4) = 3 and β = 0.8.
Comment: One adjusts the primary Poisson distribution in a manner similar to that if one just had a Poisson distribution.

15.8. D. P(z) = P1[P2(z)]. Density at 0 is: P(0) = P1[P2(0)] = P1[0.8^3] = 1/{5 - 4(0.8^3)} = 0.339.
Alternately, the secondary distribution is a Binomial with m = 3 and q = 0.2. The density at zero of the secondary distribution is 0.8^3. Therefore, the probability of zero claims for the compound distribution is: P1[0.8^3] = 1/{5 - 4(0.8^3)} = 0.339.

15.9. A. Negative Binomial-Poisson with β = 0.4, r = (500/50)(3) = 30, and λ = 0.7.
Comment: One adjusts the primary Negative Binomial distribution in a manner similar to that if one just had a Negative Binomial distribution.

15.10. E. The p.g.f. of the compound distribution is the p.g.f. of the primary distribution at the p.g.f. of the secondary distribution: P(z) = [1 - 3(exp[λ(z - 1)] - 1)]^-2.
0.067 = f(0) = P(0) = [1 - 3(exp[λ(0 - 1)] - 1)]^-2 = [1 - 3(exp[-λ] - 1)]^-2.
⇒ 1 - 3(exp[-λ] - 1) = 3.8633. ⇒ exp[-λ] = 0.04555. ⇒ λ = 3.089.
Alternately, the Poisson secondary distribution at zero is e^-λ. From the first step of the Panjer Algorithm, c(0) = Pp[s(0)] = [1 - 3(e^-λ - 1)]^-2. Proceed as before.
Comment: P(z) = E[z^n] = Σ f(n)z^n. Therefore, letting z approach zero, P(0) = f(0).
The probability generating function of the Negative Binomial only exists for z < 1 + 1/β = 4/3.
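Solution 15.10 can also be checked numerically by solving for λ; the following bisection sketch is illustrative only, and the bracketing interval is my choice.

import math

def prob_zero(lam):
    # probability of no claims: [1 - 3(exp(-lam) - 1)]^-2, decreasing in lam
    return (1.0 - 3.0 * (math.exp(-lam) - 1.0)) ** -2

lo, hi = 0.01, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if prob_zero(mid) > 0.067:
        lo = mid
    else:
        hi = mid
print(round(0.5 * (lo + hi), 3))    # about 3.089, answer (E)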
Section 16, Moments of Compound Frequency Distributions118

One may find it helpful to think of the secondary distribution as taking the role of a severity distribution in the calculation of aggregate losses.119 Since the situations are mathematically equivalent, many of the techniques and formulas that apply to aggregate losses apply to compound frequency distributions. For example, the same formulas for the mean, variance and skewness apply.120

Mean of Compound Dist. = (Mean of Primary Dist.)(Mean of Secondary Dist.)

Variance of Compound Dist. = (Mean of Primary Dist.)(Variance of Secondary Dist.)
                             + (Mean of Secondary Dist.)^2 (Variance of Primary Dist.)

Skewness of Compound Dist. =
{(Mean of Primary Dist.)(Variance of Secondary Dist.)^(3/2)(Skewness of Secondary Dist.)
 + 3(Variance of Primary Dist.)(Mean of Secondary Dist.)(Variance of Secondary Dist.)
 + (Variance of Primary Dist.)^(3/2)(Skewness of Primary Dist.)(Mean of Secondary Dist.)^3}
/ (Variance of Compound Dist.)^(3/2)

Exercise: What are the mean and variance of a compound Poisson-Binomial distribution, with parameters λ = 1.3, q = 0.4, m = 5?
[Solution: The mean and variance of the primary Poisson Distribution are both 1.3.
The mean and variance of the secondary Binomial Distribution are (0.4)(5) = 2 and (0.4)(0.6)(5) = 1.2.
Thus the mean of the compound distribution is: (1.3)(2) = 2.6.
The variance of the compound distribution is: (1.3)(1.2) + (2^2)(1.3) = 6.76.]
118 See Section 6.8 of Loss Models, not on the syllabus. However, since compound distributions are mathematically the same as aggregate distributions, I believe that a majority of the questions in this section would be legitimate questions for your exam. Compound frequency distributions used to be on the syllabus.
119 In the case of aggregate losses, the frequency distribution determines how many independent identically distributed severity variables we will sum.
120 The secondary distribution takes the place of the severity, while the primary distribution takes the place of the frequency, in the formulas involving aggregate losses. σagg^2 = µF σS^2 + µS^2 σF^2. See "Mahlerʼs Guide to Aggregate Distributions."
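The mean and variance formulas above are easy to apply in code; this small helper is illustrative only, not from the text.

def compound_mean_var(mean_p, var_p, mean_s, var_s):
    # mean and variance of a compound distribution from the moments of its
    # primary and secondary distributions
    mean_c = mean_p * mean_s
    var_c = mean_p * var_s + mean_s**2 * var_p
    return mean_c, var_c

# primary Poisson(1.3): mean = variance = 1.3; secondary Binomial(m=5, q=0.4): mean 2, variance 1.2
print(compound_mean_var(1.3, 1.3, 2.0, 1.2))   # (2.6, 6.76)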
Thus in the case of the Heartbreak Hotel example in the previous section, on average 2.6 passengers are dropped off per minute.121 The variance of the number of passengers dropped off is 6.76.

Poisson Primary Distribution:

In the case of a Poisson primary distribution with mean λ, the variance of the compound distribution can be rewritten as:
λ(Variance of Secondary Dist.) + (Mean of Secondary Dist.)^2 λ = λ(Variance of Secondary Dist. + Mean of Secondary Dist.^2) = λ(2nd moment of Secondary Distribution).
It also turns out that the third central moment of a compound Poisson distribution = λ(3rd moment of Secondary Distribution).

For a Compound Poisson Distribution:
Mean = λ(mean of Secondary Distribution).
Variance = λ(2nd moment of Secondary Distribution).
3rd central moment = λ(3rd moment of Secondary Distribution).
Skewness = λ^-0.5 (3rd moment of Secondary Dist.) / (2nd moment of Secondary Dist.)^1.5.122

Exercise: The number of accidents follows a Poisson Distribution with λ = 0.04.
Each accident generates 1, 2 or 3 claimants with probabilities 60%, 30%, and 10%.
Determine the mean, variance, and skewness of the total number of claimants.
[Solution: The secondary distribution has mean 1.5, second moment 2.7, and third moment 5.7.
Thus the mean number of claimants is: (0.04)(1.5) = 0.06.
The variance of the number of claimants is: (0.04)(2.7) = 0.108.
The skewness of the number of claimants is: (0.04^-0.5)(5.7)/(2.7^1.5) = 6.42.]
121 The number of taxicabs that arrive per minute at the Heartbreak Hotel is Poisson with mean 1.3. The number of passengers dropped off at the hotel by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by each taxicab is independent of the number of taxicabs that arrive and is independent of the number of passengers dropped off by any other taxicab.
122 Skewness = (third central moment) / Variance^1.5.
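A quick numeric check of the claimants exercise above, using the compound Poisson moment formulas; the sketch is illustrative only.

lam = 0.04
sec = {1: 0.60, 2: 0.30, 3: 0.10}             # claimants per accident

m1 = sum(k * p for k, p in sec.items())       # 1.5
m2 = sum(k**2 * p for k, p in sec.items())    # 2.7
m3 = sum(k**3 * p for k, p in sec.items())    # 5.7

mean = lam * m1
var = lam * m2
skew = lam * m3 / var**1.5                     # equals lam^-0.5 * m3 / m2^1.5
print(mean, var, round(skew, 2))               # 0.06, 0.108, 6.42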
Problems:

16.1 (1 point) For a compound distribution:
Mean of primary distribution = 15. Standard Deviation of primary distribution = 3.
Mean of secondary distribution = 10. Standard Deviation of secondary distribution = 4.
What is the standard deviation of the compound distribution?
A. 26  B. 28  C. 30  D. 32  E. 34

16.2 (2 points) The number of accidents follows a Poisson distribution with mean 10 per month. Each accident generates 1, 2, or 3 claimants with probabilities 40%, 40%, 20%, respectively.
Calculate the variance in the total number of claimants in a year.
A. 250  B. 300  C. 350  D. 400  E. 450

Use the following information for the next 3 questions:
The number of customers per minute is Geometric with β = 1.7.
The number of items sold to each customer is Poisson with λ = 3.1.
The number of items sold per customer is independent of the number of customers.

16.3 (1 point) What is the mean?
A. less than 5.0  B. at least 5.0 but less than 5.5  C. at least 5.5 but less than 6.0
D. at least 6.0 but less than 6.5  E. at least 6.5

16.4 (1 point) What is the variance?
A. less than 50  B. at least 50 but less than 51  C. at least 51 but less than 52
D. at least 52 but less than 53  E. at least 53

16.5 (2 points) What is the chance that more than 4 items are sold during the next minute? Use the Normal Approximation.
A. 46%  B. 48%  C. 50%  D. 52%  E. 54%
16.6 (3 points) A dam is proposed for a river which is currently used for salmon breeding. You have modeled:
(i) For each hour the dam is opened the number of female salmon that will pass through and reach the breeding grounds has a distribution with mean 50 and variance 100.
(ii) The number of eggs released by each female salmon has a distribution with mean of 3000 and variance of 1 million.
(iii) The number of female salmon going through the dam each hour it is open and the numbers of eggs released by the female salmon are independent.
Using the normal approximation for the aggregate number of eggs released, determine the least number of whole hours the dam should be left open so the probability that 2 million eggs will be released is greater than 99.5%.
(A) 14  (B) 15  (C) 16  (D) 17  (E) 18

16.7 (3 points) The claims department of an insurance company receives envelopes with claims for insurance coverage at a Poisson rate of λ = 7 envelopes per day. For any period of time, the number of envelopes and the numbers of claims in the envelopes are independent. The numbers of claims in the envelopes have the following distribution:
Number of Claims   Probability
       1              0.60
       2              0.30
       3              0.10
Using the normal approximation, calculate the 99th percentile of the number of claims received in 5 days.
(A) 73  (B) 75  (C) 77  (D) 79  (E) 81

16.8 (3 points) The number of persons using an ATM per hour has a Negative Binomial Distribution with β = 2 and r = 13. Each hour is independent of the others. The number of transactions per person has the following distribution:
Number of Transactions   Probability
         1                  0.30
         2                  0.40
         3                  0.20
         4                  0.10
Using the normal approximation, calculate the 80th percentile of the number of transactions in 5 hours.
A. 300  B. 305  C. 310  D. 315  E. 320
Use the following information for the next 3 questions:
• The number of automobile accidents follows a Negative Binomial distribution with β = 0.6 and r = 100.
• For each automobile accident the number of claimants with bodily injury follows a Binomial Distribution with q = 0.1 and m = 4.
• The number of claimants with bodily injury is independent between accidents.

16.9 (2 points) Calculate the variance in the total number of claimants.
(A) 33  (B) 34  (C) 35  (D) 36  (E) 37

16.10 (1 point) What is the probability that there are 20 or fewer claimants in total?
(A) 22%  (B) 24%  (C) 26%  (D) 28%  (E) 30%

16.11 (3 points) The amount of the payment to each claimant follows a Gamma Distribution with α = 3 and θ = 4000. The amounts of payments to different claimants are independent of each other and are independent of the number of claimants.
What is the probability that the aggregate payment exceeds 300,000?
(A) 44%  (B) 46%  (C) 48%  (D) 50%  (E) 52%

16.12 (3 points) The number of batters per half-inning of a baseball game is: 3 + a Negative Binomial Distribution with β = 1 and r = 1.4. The number of pitches thrown per batter is: 1 + a Negative Binomial Distribution with β = 1.5 and r = 1.8.
What is the probability of more than 30 pitches in a half-inning? Use the normal approximation with continuity correction.
A. 1/2%  B. 1%  C. 2%  D. 3%  E. 4%

16.13 (3 points) The number of taxicabs that arrive per minute at the Gotham City Railroad Station is Poisson with mean 5.6. The number of passengers dropped off at the station by each taxicab is Binomial with q = 0.3 and m = 4. The number of passengers dropped off by each taxicab is independent of the number of taxicabs that arrive and is independent of the number of passengers dropped off by any other taxicab.
Using the normal approximation for the aggregate passengers dropped off, determine the least number of whole minutes one must observe in order that the probability that at least 1000 passengers will be dropped off is greater than 90%.
A. 155  B. 156  C. 157  D. 158  E. 159
16.14 (4 points) At a storefront legal clinic, the number of lawyers who volunteer to provide legal aid to the poor on any day is uniformly distributed on the integers 1 through 4. The number of hours each lawyer volunteers on a given day is Binomial with q = 0.6 and m = 7. The number of clients that can be served by a given lawyer per hour is a Poisson distribution with mean 5.
Determine the probability that 40 or more clients can be served in a day at this storefront law clinic, using the normal approximation.
(A) 69%  (B) 71%  (C) 73%  (D) 75%  (E) 77%

Use the following information for the next 3 questions:
The number of persons entering a library per minute is Poisson with λ = 1.2.
The number of books returned per person is Binomial with q = 0.1 and m = 4.
The number of books returned per person is independent of the number of persons.

16.15 (1 point) What is the mean number of books returned per minute?
A. less than 0.5  B. at least 0.6 but less than 0.7  C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9  E. at least 0.9

16.16 (1 point) What is the variance of the number of books returned per minute?
A. less than 0.6  B. at least 0.6 but less than 0.7  C. at least 0.7 but less than 0.8
D. at least 0.8 but less than 0.9  E. at least 0.9

16.17 (1 point) What is the probability of observing more than two books returned in the next minute? Use the Normal Approximation.
A. less than 0.6%  B. at least 0.6% but less than 0.7%  C. at least 0.7% but less than 0.8%
D. at least 0.8% but less than 0.9%  E. at least 0.9%
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 283 16.18 (2 points) Yosemite Sam is panning for gold. The number of pans with gold nuggets he finds per day is Poisson with mean 3. The number of nuggets per such pan is: 1, 5, or 25, with probabilities: 90%, 9%, and 1% respectively. The number of pans and the number of nuggets per pan are independent. Using the normal approximation with continuity correction, what is the probability that the number of nuggets found by Sam over the next ten days is less than 30? (A) Φ(-1.2)
(B) Φ(-1.1)
(C) Φ(-1.0)
(D) Φ(-0.9)
(E) Φ(-0.8)
16.19 (3, 11/00, Q.2 & 2009 Sample Q.112) (2.5 points) In a clinic, physicians volunteer their time on a daily basis to provide care to those who are not eligible to obtain care otherwise. The number of physicians who volunteer in any day is uniformly distributed on the integers 1 through 5. The number of patients that can be served by a given physician has a Poisson distribution with mean 30. Determine the probability that 120 or more patients can be served in a day at the clinic, using the normal approximation with continuity correction. (A) 1 - Φ(0.68)
(B) 1 - Φ(0.72)
(C) 1 - Φ(0.93)
(D) 1 - Φ(3.13)
(E) 1 - Φ(3.16)
16.20 (3, 5/01, Q.16 & 2009 Sample Q.106) (2.5 points) A dam is proposed for a river which is currently used for salmon breeding. You have modeled: (i) For each hour the dam is opened the number of salmon that will pass through and reach the breeding grounds has a distribution with mean 100 and variance 900. (ii) The number of eggs released by each salmon has a distribution with mean of 5 and variance of 5. (iii) The number of salmon going through the dam each hour it is open and the numbers of eggs released by the salmon are independent. Using the normal approximation for the aggregate number of eggs released, determine the least number of whole hours the dam should be left open so the probability that 10,000 eggs will be released is greater than 95%. (A) 20 (B) 23 (C) 26 (D) 29 (E) 32 16.21 (3, 5/01, Q.36 & 2009 Sample Q.111) (2.5 points) The number of accidents follows a Poisson distribution with mean 12. Each accident generates 1, 2, or 3 claimants with probabilities 1/2, 1/3, 1/6, respectively. Calculate the variance in the total number of claimants. (A) 20 (B) 25 (C) 30 (D) 35 (E) 40
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 284 16.22 (3, 11/01, Q.30) (2.5 points) The claims department of an insurance company receives envelopes with claims for insurance coverage at a Poisson rate of λ = 50 envelopes per week. For any period of time, the number of envelopes and the numbers of claims in the envelopes are independent. The numbers of claims in the envelopes have the following distribution: Number of Claims Probability 1 0.20 2 0.25 3 0.40 4 0.15 Using the normal approximation, calculate the 90th percentile of the number of claims received in 13 weeks. (A) 1690 (B) 1710
(C) 1730
(D) 1750
(E) 1770
16.23 (3, 11/02, Q.27 & 2009 Sample Q.93) (2.5 points) At the beginning of each round of a game of chance the player pays 12.5. The player then rolls one die with outcome N. The player then rolls N dice and wins an amount equal to the total of the numbers showing on the N dice. All dice have 6 sides and are fair. Using the normal approximation, calculate the probability that a player starting with 15,000 will have at least 15,000 after 1000 rounds. (A) 0.01 (B) 0.04 (C) 0.06 (D) 0.09 (E) 0.12 16.24 (CAS3, 5/04, Q.26) (2.5 points) On Time Shuttle Service has one plane that travels from Appleton to Zebrashire and back each day. Flights are delayed at a Poisson rate of two per month. Each passenger on a delayed flight is compensated $100. The numbers of passengers on each flight are independent and distributed with mean 30 and standard deviation 50. (You may assume that all months are 30 days long and that years are 360 days long.) Calculate the standard deviation of the annual compensation for delayed flights. A. Less than $25,000 B. At least $25,000, but less than $50,000 C. At least $50,000, but less than $75,000 D. At least $75,000, but less than $100,000 E. At least $100,000
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 285 16.25 (SOA M, 11/05, Q.18 & 2009 Sample Q.205) (2.5 points) In a CCRC, residents start each month in one of the following three states: Independent Living (State #1), Temporarily in a Health Center (State #2) or Permanently in a Health Center (State #3). Transitions between states occur at the end of the month. If a resident receives physical therapy, the number of sessions that the resident receives in a month has a geometric distribution with a mean which depends on the state in which the resident begins the month. The numbers of sessions received are independent. The number in each state at the beginning of a given month, the probability of needing physical therapy in the month, and the mean number of sessions received for residents receiving therapy are displayed in the following table: State# Number in state Probability of needing therapy Mean number of visits 1 400 0.2 2 2 300 0.5 15 3 200 0.3 9 Using the normal approximation for the aggregate distribution, calculate the probability that more than 3000 physical therapy sessions will be required for the given month. (A) 0.21 (B) 0.27 (C) 0.34 (D) 0.42 (E) 0.50 16.26 (SOA M, 11/05, Q.39 & 2009 Sample Q.213) (2.5 points) For an insurance portfolio: (i) The number of claims has the probability distribution n pn 0 0.1 1 0.4 2 0.3 3 0.2 (ii) Each claim amount has a Poisson distribution with mean 3; and (iii) The number of claims and claim amounts are mutually independent. Calculate the variance of aggregate claims. (A) 4.8 (B) 6.4 (C) 8.0 (D) 10.2 (E) 12.4
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 286 16.27 (CAS3, 5/06, Q.35) (2.5 points) The following information is known about a consumer electronics store:
• The number of people who make some type of purchase follows a Poisson distribution with a mean of 100 per day.
• The number of televisions bought by a purchasing customer follows a Negative Binomial distribution with parameters r = 1.1 and β = 1.0. Using the normal approximation, calculate the minimum number of televisions the store must have in its inventory at the beginning of each day to ensure that the probability of its inventory being depleted during that day is no more than 1.0%. A. Fewer than 138 B. At least 138, but fewer than 143 C. At least 143, but fewer than 148 D. At least 148, but fewer than 153 E. At least 153 16.28 (SOA M, 11/06, Q.30 & 2009 Sample Q.285) (2.5 points) You are the producer for the television show Actuarial Idol. Each year, 1000 actuarial clubs audition for the show. The probability of a club being accepted is 0.20. The number of members of an accepted club has a distribution with mean 20 and variance 20. Club acceptances and the numbers of club members are mutually independent. Your annual budget for persons appearing on the show equals 10 times the expected number of persons plus 10 times the standard deviation of the number of persons. Calculate your annual budget for persons appearing on the show. (A) 42,600 (B) 44,200 (C) 45,800 (D) 47,400 (E) 49,000
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 287
Solutions to Problems:
16.1. E. Standard deviation of the compound distribution is: √[(15)(4²) + (10²)(3²)] = √1140 = 33.8.
16.2. E. The frequency over a year is Poisson with mean: (12)(10) = 120 accidents. Second moment of the secondary distribution is: (40%)(1²) + (40%)(2²) + (20%)(3²) = 3.8. Variance of compound distribution is: (120)(3.8) = 456. Comment: Similar to 3, 5/01, Q.36.
16.3. B. The mean of the primary Geometric Distribution is 1.7. The mean of the secondary Poisson Distribution is 3.1. Thus the mean of the compound distribution is: (1.7)(3.1) = 5.27.
16.4. A. The mean of the primary Geometric Distribution is 1.7. The mean of the secondary Poisson Distribution is 3.1. The variance of the primary Geometric is: (1.7)(1 + 1.7) = 4.59. The variance of the secondary Poisson Distribution is 3.1. The variance of the compound distribution is: (1.7)(3.1) + (3.1²)(4.59) = 49.38. Comment: The variance of the compound distribution is large compared to its mean. A very large number of items can result if there are a large number of customers from the Geometric combined with some of those customers buying a large number of items from the Poisson. Compound distributions tend to have relatively heavy tails.
16.5. E. From the previous solutions, the mean of the compound distribution is 5.27, and the variance of the compound distribution is 49.38. Thus the standard deviation is 7.03. 1 - Φ[(4.5 - 5.27)/7.03] = 1 - Φ(-0.11) = Φ(0.11) = 0.5438.
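The following is a brief simulation sketch, added here as an illustration and not part of the original study guide; it checks the compound-distribution moment formulas used in solutions 16.3 and 16.4 (Geometric primary with mean β = 1.7, Poisson secondary with mean 3.1). The variable names and the number of simulations are arbitrary choices for the example.

import numpy as np

rng = np.random.default_rng(0)
beta, lam, n_sims = 1.7, 3.1, 200_000

# Loss Models Geometric on 0, 1, 2, ... with mean beta; numpy's geometric starts at 1, so subtract 1.
n_customers = rng.geometric(1.0 / (1.0 + beta), size=n_sims) - 1
# For each simulated number of customers, draw that many Poisson(3.1) counts and sum them.
total_items = np.array([rng.poisson(lam, size=n).sum() for n in n_customers])

print(total_items.mean())  # close to (1.7)(3.1) = 5.27
print(total_items.var())   # close to (1.7)(3.1) + (3.1^2)(4.59) = 49.38

The simulated variance being much larger than the simulated mean illustrates the comment above about the relatively heavy tail of a compound distribution.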
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 288
16.6. C. Over y hours, the number of salmon has mean 50y and variance 100y.
The mean aggregate number of eggs is: (50y)(3000) = 150000y.
The standard deviation of the aggregate number of eggs is: √[(50y)(1000²) + (3000²)(100y)] = 30822√y.
Thus the probability that the aggregate number of eggs is < 2 million is approximately: Φ[(1999999.5 - 150000y) / (30822√y)].
Since Φ(2.576) = 0.995, this probability will be 1/2% if: (1999999.5 - 150000y) / (30822√y) = -2.576
⇒ 150000y - 79397√y - 1999999.5 = 0.
√y = {79397 ± √[79397² + (4)(150000)(1999999.5)]} / {(2)(150000)} = 0.2647 ± 3.6611.
√y = 3.926. ⇒ y = 15.4. The smallest whole number of hours is therefore 16.
Alternately, try the given choices and stop when (Mean - 2 million)/StdDev > 2.576.
Hours    Mean         Standard Deviation    # of Standard Deviations
14       2,100,000    115,325               0.867
15       2,250,000    119,373               2.094
16       2,400,000    123,288               3.244
17       2,550,000    127,082               4.328
18       2,700,000    130,767               5.353
Comment: Similar to 3, 5/01, Q.16. Note that since the variance over one hour is 100, the variance of the number of salmon over two hours is: (2)(100) = 200. Number of salmon over two hours = number over the first hour + number over the second hour.
⇒ Var[Number over two hours] = Var[number over first hour] + Var[number over second hour] = 2 Var[number over an hour]. We are adding independent random variables, rather than multiplying an individual variable by a constant. 16.7. B. The mean frequency over 5 days is: (7)(5) = 35. Mean number of claims per envelope is: (60%)(1) + (30%)(2) + (10%)(3) = 1.5. Mean of compound distribution is: (35)(1.5) = 52.5. Second moment of number of claims per envelope is: (60%)(12 ) + (30%)(22 ) + (10%)(32 ) = 2.7. Variance of compound distribution is: (35)(2.7) = 94.5. 99th percentile ≅ mean + (2.326)(standard deviations) = 52.5 + (2.326) 94.5 = 75.1. Comment: Similar to 3, 11/01, Q.30.
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 289 16.8. C. The number of persons has mean: (13)(2) = 26, and variance: (13)(2)(2 + 1) = 78. The number of transactions per person has mean: (30%)(1) + (40%)(2) + (20%)(3) + (10%)(4) = 2.1, second moment: (30%)(12 ) + (40%)(22 ) + (20%)(32 ) + (10%)(42 ) = 5.3, and variance: 5.3 - 2.12 = 0.89. The number of transactions in an hour has mean: (26)(2.1) = 54.6, and variance: (26)(.89) + (2.12 )(78) = 367.12. The number of transactions in 5 hours has mean: (5)(54.6) = 273, and variance: (5)(367.12) = 1835.6. Φ(0.842) = 80%. 80th percentile ≅ 273 + (0.842) 1835.6 = 309.1. 16.9. E. Mean of the Primary Negative Binomial = (100)(0.6) = 60. Variance of the Primary Negative Binomial = (100)(0.6)(1.6) = 96. Mean of the Secondary Binomial = (4)(0.1) = 0.4. Variance of the Secondary Binomial = (4)(0.1)(0.9) = .36. Variance of the Compound Distribution = (60)(.36) + (0.42 )(96) = 36.96. 16.10. D. Mean of the Compound Distribution = (60) (0.4) = 24. Prob[# claimants ≤ 20] ≅ Φ[(20.5 - 24)/ 36.96 ] = Φ(-0.58) = 1 - 0.7190 = 28.1%. 16.11. A. Mean Frequency: 24. Variance of Frequency: 36.96. Mean Severity: (3)(4000) = 12,000. Variance of Severity: (3)(40002 ) = 48,000,000. Mean Aggregate Loss = (24)(12000) = 288,000. Variance of the Aggregate Loss = (24)(48,000,000) + (12,0002 )(36.96) = 6474 million. Prob[Aggregate loss > 300000] ≅ 1 - Φ((300000 - 288000)/ 6474 million n) = 1 - Φ(0.15) = 1 - 0.5596 = 44%. 16.12. E. The number of batters has mean: 3 + (1.4)(1) = 4.4, and variance: (1.4)(1)(1 + 1) = 2.8. The number of pitches per batter has mean: 1 + (1.8)(1.5) = 3.7, and variance: (1.8)(1.5)(1 + 1.5) = 6.75. The number of pitches per half-inning has mean: (4.4)(3.7) = 16.28, and variance: (4.4)(6.75) + (3.72 )(2.8) = 68.032. Prob[# pitches > 30] ≅ 1 - Φ[(30.5 - 16.28)/ 68.032 ] = 1 - Φ(1.72) = 4.27%.
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 290
16.13. D. Over y minutes, the number of taxicabs has mean 5.6y and variance 5.6y. The passengers per cab has mean: (0.3)(4) = 1.2, and variance: (0.3)(1 - 0.3)(4) = 0.84. The mean aggregate number of passengers is: (5.6y)(1.2) = 6.72y. The standard deviation of the aggregate number of passengers is: √[(5.6y)(0.84) + (1.2²)(5.6y)] = 3.573√y. Thus the probability that the aggregate number of passengers is ≥ 1000 is approximately: 1 - Φ[(999.5 - 6.72y)/(3.573√y)]. Since Φ(1.282) = 0.90, this probability will be greater than 90% if: (Mean - 999.5)/StdDev = (6.72y - 999.5)/(3.573√y) > 1.282. Try the given choices and stop when (Mean - 999.5)/StdDev > 1.282.
Minutes    Mean       Standard Deviation    # of Standard Deviations
155        1,041.6    44.48                 0.946
156        1,048.3    44.63                 1.094
157        1,055.0    44.77                 1.241
158        1,061.8    44.91                 1.386
159        1,068.5    45.05                 1.531
The smallest whole number of minutes is therefore 158. 16.14. A. The mean number of lawyers is: 2.5 and the variance is: {(1 - 2.5)2 + (2 - 2.5)2 + (3 - 2.5)2 + (4 - 2.5)2 }/4 = 1.25. The mean number of hours per lawyer is: (7)(.6) = 4.2 and the variance is: (7)(.4)(.6) = 1.68. Therefore, the total number of hours volunteered per day has mean: (2.5)(4.2) = 10.5 and variance: (2.5)(1.68) + (4.22 )(1.25) = 26.25. The number of clients per hour has mean 5 and variance 5. Therefore, the total number of clients per day has mean: (5)(10.5) = 52.5, and variance: (10.5)(5) + (52 )(26.25) = 708.75. Prob[# clients ≥ 40] ≅ 1 - Φ[(39.5 - 52.5)/ 708.75 ] = 1 - Φ(-.49) = 68.79%. Alternately, the mean number of clients per lawyer is: (4.2)(5) = 21 with variance: (4.2)(5) + (52 )(1.68) = 63. Therefore, the total number of clients per day has mean: (2.5)(21) = 52.5 and variance: (2.5)(63) + (212 )(1.25) = 708.75. Proceed as before. Comment: Similar to 3, 11/00, Q.2. 16.15. A. The mean of the primary Poisson Distribution is 1.2. The mean of the secondary Binomial Distribution is: (4)(.1) = .4. Thus the mean of the compound distribution is: (1.2)(.4) = 0.48.
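As an illustration (not part of the original solution), the "smallest whole number of periods" search in solution 16.13 can be written as a short loop; the function name and the starting point are arbitrary.

import math

def least_whole_minutes(target=999.5, z=1.282):
    # Aggregate passengers over t minutes has mean 6.72t and variance 12.768t,
    # so require (mean - 999.5)/sd > 1.282 for P(at least 1000) > 90%.
    t = 1
    while True:
        mean = 6.72 * t
        sd = math.sqrt(12.768 * t)
        if (mean - target) / sd > z:
            return t
        t += 1

print(least_whole_minutes())  # 158, matching choice D

The same loop structure applies to 16.6 and 16.20, with the appropriate mean and variance per period and the appropriate value of z.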
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 291 16.16. B. The mean of the primary Poisson Distribution is 1.2. The mean of the secondary Binomial Distribution is: (4)(0.1) = 0.4. The variance of the primary Poisson Distribution is 1.2. The variance of the secondary Binomial Distribution is: (4)(0.1)(.9) = 0.36. The variance of the compound distribution is: (1.2)(0.36) + (0.4)2 (1.2) = 0.624. 16.17. A. The compound distribution has mean of .48 and variance of .624. Prob[# books > 2] ≅ 1 - Φ[(2.5 - 0.48)/ 0.624 ] = 1 - Φ(2.56) = 1 - 0.9948 = 0.0052. 16.18. B. The mean number of nuggets per pan is: (90%)(1) + (9%)(5) + (1%)(25) = 1.6. 2nd moment of the number of nuggets per pan is: (90%)(12 ) + (9%)(52 ) + (1%)(252 ) = 9.4. Mean aggregate over 10 days is: (10)(3)(1.6) = 48. Variance of aggregate over 10 days is: (10)(3)(9.4) = 282. Prob[aggregate < 30] ≅ Φ[(29.5 - 48)/ 282 ] = Φ(-1.10) = 13.57%. 16.19. A. This is a compound frequency distribution with a primary distribution that is discrete and uniform on 1 through 5 and with secondary distribution which is Poisson with λ = 30. The primary distribution has mean of 3 and second moment of: (12 + 22 + 32 + 42 + 52 )/5 = 11. Thus the primary distribution has variance: 11 - 32 = 2. Mean of the Compound Dist. = (Mean of Primary Dist.)(Mean of Secondary Dist.) = (3)(30) = 90. Variance of the Compound Distribution = (Mean of Primary Dist.)(Variance of Secondary Dist.) + (Mean of Secondary Dist.)2 (Variance of Primary Dist.) = (3)(30) + (302 )(2) = 1890. Probability of 120 or more patients ≅ 1 - Φ[(119.5 - 90)/ 1890 ] = 1 - Φ(0.68).
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 292
16.20. B. Over y hours, the number of salmon has mean 100y and variance 900y.
The mean aggregate number of eggs is: (100y)(5) = 500y.
The variance of the aggregate number of eggs is: (100y)(5) + (5²)(900y) = 23,000y.
Thus the probability that the aggregate number of eggs is < 10,000 is approximately: Φ[(9999.5 - 500y)/√(23,000y)].
Since Φ(1.645) = 0.95, this probability will be 5% if: (9999.5 - 500y)/√(23,000y) = -1.645
⇒ 500y - 249.48√y - 9999.5 = 0.
√y = {249.48 ± √[249.48² + (4)(500)(9999.5)]} / {(2)(500)} = 0.24948 ± 4.479.
√y = 4.729. ⇒ y = 22.3. The smallest whole number of hours is therefore 23.
Alternately, calculate the probability for each of the number of hours in the choices.
Hours    Mean      Variance    Probability of at least 10,000 eggs
20       10,000    460,000     1 - Φ[(9999.5 - 10,000)/√460,000] = 1 - Φ(-0.0007) = 50.0%
23       11,500    529,000     1 - Φ[(9999.5 - 11,500)/√529,000] = 1 - Φ(-2.063) = 98.0%
26       13,000    598,000     1 - Φ[(9999.5 - 13,000)/√598,000] = 1 - Φ(-3.880) = 99.995%
Thus 20 hours is not enough and 23 hours is enough so that the probability is greater than 95%. Comment: The number of salmon acts as the primary distribution, and the number of eggs per salmon as the secondary distribution. This exam question should have been worded better. They intended to say “so the probability that at least 10,000 eggs will be released is greater than 95%.” The probability of exactly 10,000 eggs being released is very small. 16.21. E. The second moment of the number of claimants per accident is: (1/2)(12 ) + (1/3(22 ) + (1/6)(32 ) = 3.333. The variance of a Compound Poisson Distribution is: λ(2nd moment of the secondary distribution) = (12)(3.333) = 40. Alternately, thinning the original Poisson, those accidents with 1, 2, or 3 claimants are independent Poissons. Their means are: (1/2)(12) = 6, (1/3)(12) = 4, and (1/6)(12) = 2. Number of accidents with 3 claimants is Poisson with mean 2 ⇒ The variance of the number of accidents with 3 claimants is 2. Number of claimants for those accidents with 3 claimants = (3)(# of accidents with 3 claimants) ⇒ The variance of the # of claimants for those accidents with 3 claimants is: (32 )(2). Due to independence, the variances of the three processes add: (12 )(6) + (22 )(4) + (32 )(2) = 40.
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 293 16.22. B. Mean # claims / envelope = (1)(0.2) + (2)(0.25) + (3)(0.4) + (4)(0.15) = 2.5. 2nd moment # claims / envelope = (12 )(0.2) + (22 )(0.25) + (32 )(0.4) + (42 )(0.15) = 7.2. Over 13 weeks, the number of envelopes is Poisson with mean: (13)(50) = 650. Mean of the compound distribution = (650)(2.5) = 1625. Variance of the aggregate number of claims = Variance of a compound Poisson distribution = (mean primary Poisson distribution)(2nd moment of the secondary distribution) = (650)(7.2) = 4680. Φ(1.282) = 0.90. Estimated 90th percentile = 1625 + 1.282
√4680 = 1713.
16.23. E. The amount won per a round of the game is a compound frequency distribution. Primary distribution (determining how many dice are rolled) is a six-sided die, uniform and discrete on 1 through 6, with mean 3.5, second moment (12 + 22 + 32 + 42 +52 + 62 )/6 = 91/6, and variance 91/6 - 3.52 = 35/12. Secondary distribution is also a six-sided die, with mean 3.5 and variance 35/12. Mean of the compound distribution is: (3.5)(3.5) = 12.25. Variance of the compound distribution is: (3.5)(35/12) + (3.52 )(35/12) = 45.94. Therefore, the net result of a round has mean 12.25 - 12.5 = -0.25, and variance 45.94. 1000 rounds have a net result with mean -250 and variance 45,940. Prob[net result ≥ 0] ≅ 1 - Φ((-0.5 + 250)/ 45,940 ) = 1 - Φ(1.16) = 1 - 0.8770 = 0.1220. 16.24. B. The total number of delayed passengers is a compound frequency distribution, with primary distribution the number of delayed flights, and the secondary distribution the number of passengers on a flight. The number of flights delayed per year is Poisson with mean: (2)(12) = 24. The second moment of the secondary distribution is: 502 + 302 = 3400. The variance of the number of passengers delayed per year is: (24)(3400) = 81,600. The standard deviation of the number of passengers delayed per year is:
√81,600 = 285.66.
The standard deviation of the annual compensation is: (100)( 285.66) = 28,566.
2013-4-1, Frequency Distributions, §16 Moments of Comp. Dists. HCM 10/4/12, Page 294 16.25. D. The mean number of sessions is: (400)(0.2)(2) + (300)(0.5)(15) + (200)(0.3)(9) = 2950. For a single resident we have a Bernoulli primary (whether the resident need therapy) and a geometric secondary (how many visits). This has variance: (mean of primary)(variance of second.) + (mean second.)2 (var. of primary) = qβ(1 + β) + β2q(1 - q). For a resident in state 1, the variance of the number of visits is: (0.2)(2)(3) + (32 )(0.2)(1 - 0.8) = 1.84. For state 2, the variance of the number of visits is: (0.5)(15)(16) + (152 )(0.5)(1 - 0.5) = 176.25. For state 3, the variance of the number of visits is: (0.3)(9)(10) + (92 )(0.3)(1 - 0.3) = 44.01. The sum of the visits from 400 residents in state 1, 300 in state 2, and 200 in state 3, has variance: (400)(1.84) + (300)(176.25) + (200)(44.01) = 62,413. Prob[sessions > 3000] ≅ 1 - Φ[(3000.5 - 2950)/ 62413 ] = 1 - Φ[0.20] = 0.4207. 16.26. E. Primary distribution has mean: (0)(.1) + (1)(.4) + (2)(.3) + (3)(.2) = 1.6, second moment: (02 )(.1) + (12 )(.4) + (22 )(.3) + (32 )(.2) = 3.4, and variance: 3.4 - 1.62 = 0.84. The secondary distribution has mean 3 and variance 3. The compound distribution has variance: (1.6)(3) + (32 )(0.84) = 12.36. 16.27. E. Mean = (mean primary)(mean secondary) = (100)(1.1)(1.0) = 110. Variance = (mean primary)(variance of secondary) + (mean secondary)2 (variance of primary) = (100)(1.1)(1)(1 + 1) + {(1.1)(1.0)}2 (100) = 341. Φ(2.326) = 0.99. 99th percentile: 110 + 2.326 341 = 152.95. Need at least 153 televisions. 16.28. A. The primary distribution is Binomial with m = 1000 and q = .2, with mean 200 and variance 160. The mean of the compound distribution is: (200)(20) = 4000. The variance of the compound distribution is: (200)(20) + (202 )(160) = 68,000. Annual budget is: 10(4000 +
√68,000) = 42,608.
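A small sketch (added for illustration, not from the guide) of the normal-approximation percentile calculation in solution 16.27; the variable names are arbitrary and Φ(2.326) = 0.99 is taken from the normal table.

import math

lam, r, beta = 100.0, 1.1, 1.0                                   # Poisson primary, Negative Binomial secondary
mean = lam * r * beta                                            # 110
variance = lam * r * beta * (1 + beta) + (r * beta)**2 * lam     # 341
print(math.ceil(mean + 2.326 * math.sqrt(variance)))             # 153 televisions, answer E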
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 295
Section 17, Mixed Frequency Distributions
One can mix frequency models together by taking a weighted average of different frequency models. This can involve either a discrete mixture of several different frequency distributions or a continuous mixture over a portfolio as a parameter varies. For example, one could mix together Poisson Distributions with different means.123
Discrete Mixtures:
Assume there are four types of risks, each with claim frequency given by a Poisson distribution:
Type         Average Annual Claim Frequency    A Priori Probability
Excellent    1                                 40%
Good         2                                 30%
Bad          3                                 20%
Ugly         4                                 10%
Recall that for a Poisson Distribution with parameter λ the chance of having n claims is given by: f(n) = λⁿ e−λ / n!. So for example for an Ugly risk with λ = 4, the chance of n claims is: 4ⁿ e−4 / n!. For an Ugly risk the chance of 6 claims is: 4⁶ e−4 / 6! = 10.4%. Similarly, the chances of 6 claims for Excellent, Good, or Bad risks are: 0.05%, 1.20%, and 5.04% respectively. If we have a risk but do not know what type it is, we weight together the 4 different chances of having 6 claims, using the a priori probabilities of each type of risk, in order to get the chance of having 6 claims: (0.4)(0.05%) + (0.3)(1.20%) + (0.2)(5.04%) + (0.1)(10.42%) = 2.43%. The table below displays similar values for other numbers of claims. The probabilities in the final column represent the assumed distribution of the number of claims for the entire portfolio of risks.124 This “probability for all risks” is the mixed distribution. While the mixed distribution is easily computed by weighting together the four Poisson distributions, it is not itself a Poisson nor any other well-known distribution. 123
The parameter of a Poisson is its mean. While one can mix together other frequency distributions, for example Binomials or Negative Binomials, you are most likely to be asked about mixing Poissons. (It is unclear what if anything they will ask on this subject.) 124 Prior to any observations. The effect of observations will be discussed in “Mahlerʼs Guide to Buhlmann Credibility” and “Mahlerʼs Guide to Conjugate Priors.”
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 296
Number of    Probability for    Probability for    Probability for    Probability for    Probability for
Claims       Excellent Risks    Good Risks         Bad Risks          Ugly Risks         All Risks
0            0.3679             0.1353             0.0498             0.0183             0.1995
1            0.3679             0.2707             0.1494             0.0733             0.2656
2            0.1839             0.2707             0.2240             0.1465             0.2142
3            0.0613             0.1804             0.2240             0.1954             0.1430
4            0.0153             0.0902             0.1680             0.1954             0.0863
5            0.0031             0.0361             0.1008             0.1563             0.0478
6            0.0005             0.0120             0.0504             0.1042             0.0243
7            0.0001             0.0034             0.0216             0.0595             0.0113
8            0.0000             0.0009             0.0081             0.0298             0.0049
9            0.0000             0.0002             0.0027             0.0132             0.0019
10           0.0000             0.0000             0.0008             0.0053             0.0007
11           0.0000             0.0000             0.0002             0.0019             0.0002
12           0.0000             0.0000             0.0001             0.0006             0.0001
13           0.0000             0.0000             0.0000             0.0002             0.0000
14           0.0000             0.0000             0.0000             0.0001             0.0000
SUM          1.0000             1.0000             1.0000             1.0000             1.0000
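The table can be reproduced with a few lines of code; this sketch is an added illustration (not part of the original text), and the function names are arbitrary.

from math import exp, factorial

types = [(0.40, 1), (0.30, 2), (0.20, 3), (0.10, 4)]  # (a priori probability, lambda)

def poisson_pmf(n, lam):
    return lam**n * exp(-lam) / factorial(n)

def mixed_pmf(n):
    # weight the four Poisson densities by the a priori probabilities
    return sum(w * poisson_pmf(n, lam) for w, lam in types)

print(round(mixed_pmf(6), 4))                      # 0.0243, the 2.43% chance of 6 claims computed above
print([round(mixed_pmf(n), 4) for n in range(5)])  # 0.1995, 0.2656, 0.2142, 0.1430, 0.0863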
The density function of the mixed distribution is the mixture of the density functions for specific values of the parameter that is mixed.
Moments of Mixed Distributions:
The overall (a priori) mean can be computed in either one of two ways. First one can weight together the means for each type of risk, using their (a priori) probabilities: (0.4)(1) + (0.3)(2) + (0.2)(3) + (0.1)(4) = 2. Alternately, one can compute the mean of the mixed distribution: (0)(0.1995) + (1)(0.2656) + (2)(0.2142) + ... = 2. In either case, the mean of this mixed distribution is 2. The mean of a mixed distribution is the mixture of the means for specific values of the parameter λ: E[X] = E_λ[E[X | λ]].
One can calculate the second moment of a mixture in a similar manner.
Exercise: What is the second moment of a Poisson distribution with λ = 3?
[Solution: Second Moment = Variance + Mean² = 3 + 3² = 12.]
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 297
In general, the second moment of a mixture is the mixture of the second moments. In the case of this mixture, the second moment is: (0.4)(2) + (0.3)(6) + (0.2)(12) + (0.1)(20) = 7. One can verify this second moment, by working directly with the mixed distribution:
Number of Claims    Square of # of Claims    Probability for All Risks
0                   0                        0.1995
1                   1                        0.2656
2                   4                        0.2142
3                   9                        0.1430
4                   16                       0.0863
5                   25                       0.0478
6                   36                       0.0243
7                   49                       0.0113
8                   64                       0.0049
9                   81                       0.0019
10                  100                      0.0007
11                  121                      0.0002
12                  144                      0.0001
13                  169                      0.0000
14                  196                      0.0000
Probability-weighted Average:    2.000 (number of claims), 7.000 (square of number of claims)
Exercise: What is the variance of this mixed distribution?
[Solution: 7 - 2² = 3.]
First one mixes the moments, and then computes the variance of the mixture from its first and second moments.125 In general, the nth moment of a mixed distribution is the mixture of the nth moments for specific values of the parameter λ: E[Xⁿ] = E_λ[E[Xⁿ | λ]].126 There is nothing unique about assuming four types of risks. If one had assumed, for example, 100 different types of risks with mean frequencies from 0.1 to 10, there would have been no change in the conceptual complexity of the situation, although the computational complexity would have been increased. This discrete example can be extended to a continuous case. 125
As discussed in “Mahlerʼs Guide to Buhlmann Credibility,” one can split the variance of a mixed distribution into two pieces, the Expected Value of the Process Variance and the Variance of the Hypothetical Means. 126 Third and higher moments are more likely to be asked about for Loss Distributions. Mixtures of Loss Distributions are discussed in “Mahlerʼs Guide to Loss Distributions.”
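A short check of these moment calculations, added for illustration and not part of the original text; it mixes the Poisson moments and also works directly with the mixed probabilities.

from math import exp, factorial

types = [(0.40, 1), (0.30, 2), (0.20, 3), (0.10, 4)]  # (a priori probability, lambda)

# Mix the moments: E[X] = E_lambda[lambda], E[X^2] = E_lambda[lambda + lambda^2].
mean = sum(w * lam for w, lam in types)               # 2.0
second = sum(w * (lam + lam**2) for w, lam in types)  # 7.0

# Or compute the same moments directly from the mixed distribution.
def mixed_pmf(n):
    return sum(w * lam**n * exp(-lam) / factorial(n) for w, lam in types)

mean_direct = sum(n * mixed_pmf(n) for n in range(60))
second_direct = sum(n * n * mixed_pmf(n) for n in range(60))

print(mean, second, second - mean**2)                  # 2.0 7.0 3.0
print(round(mean_direct, 3), round(second_direct, 3))  # 2.0 7.0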
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 298 Continuous Mixtures: We have seen how one can mix a discrete number of Poisson Distributions.127 For a continuous mixture, the mixed distribution is given as the integral of the product of the distribution of the parameter λ times the Poisson density function given λ.128 g(x) =
∫ f(x; λ) u(λ) dλ.
The density function of the mixed distribution is the mixture of the density functions for specific values of the parameter that is mixed.
Exercise: The claim count N for an individual insured has a Poisson distribution with mean λ. λ is uniformly distributed between 0.3 and 0.8. Find the probability that a randomly selected insured will have one claim.
[Solution: For the Poisson Distribution, f(1 | λ) = λe−λ.
(1/0.5) ∫ λe−λ dλ (from λ = 0.3 to λ = 0.8) = (2)[-λe−λ - e−λ] (evaluated from λ = 0.3 to λ = 0.8) = (2)(1.3e−0.3 - 1.8e−0.8) = 30.85%.]
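A quick numerical check of this exercise, added as an illustration (not part of the original text); it integrates the Poisson probability of one claim against the uniform density on (0.3, 0.8) with a simple midpoint rule.

from math import exp

def f1_given_lambda(lam):
    return lam * exp(-lam)   # Poisson probability of exactly 1 claim, given lambda

steps = 100_000
width = (0.8 - 0.3) / steps
# midpoint rule: integrate f1 against the uniform density 1/0.5 over (0.3, 0.8)
prob = sum(f1_given_lambda(0.3 + (i + 0.5) * width) * (1 / 0.5) * width for i in range(steps))
print(round(prob, 4))  # 0.3085, matching the 30.85% above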
Continuous mixtures can be performed of either frequency distributions or loss distributions.129 Such a continuous mixture is called a Mixture Distribution.130 Mixture Distribution ⇔ Continuous Mixture of Models. Mixture Distributions can be created from other frequency distributions than the Poisson. For example, if f is a Binomial with fixed m, one could mix on the parameter q: g(x) =
∫ f(x; q) u(q) dq.
For example, if f is a Negative Binomial with fixed r, one could mix on the parameter β: g(x) =
∫ f(x; β) u(β) dβ.
If f is a Negative Binomial with fixed r, one could instead mix on the parameter p = 1/(1+β).
127 One can mix other frequency distributions besides the Poisson.
128 The very important Gamma-Poisson situation is discussed in a subsequent section.
129 See the section on Continuous Mixtures of Models in “Mahlerʼs Guide to Loss Distributions”.
130 See Section 5.2.4 of Loss Models.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 299 Moments of Continuous Mixtures: As in the case of discrete mixtures, the nth moment of a continuous mixture is the mixture of the nth moments for specific values of the parameter λ: E[Xn ] = Eλ [E[Xn | λ]]. Exercise: What is the mean for a mixture of Poissons? [Solution: For a given value of lambda, the mean of a Poisson Distribution is λ . We need to weight
these first moments together via the density of lambda u(λ): ∫ λ u(λ) dλ = mean of u.]
If for example, λ were uniformly distributed from 0.1 to 0.5, then the mean of the mixed distribution would be 0.3. In general, the mean of a mixture of Poissons is the mean of the mixing distribution.131 For the case of a mixture of Poissons via a Gamma Distribution with parameters α and θ, the mean of the mixed distribution is that of the Gamma, αθ.132
Exercise: What is the Second Moment for Poissons mixed via a Gamma Distribution with parameters α and θ?
[Solution: For a given value of lambda, the second moment of a Poisson Distribution is λ + λ². We need to weight these second moments together via the density of lambda: λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α).
Second Moment = ∫ (λ + λ²) λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α) dλ (from λ = 0 to ∞) = {θ^(-α)/Γ(α)} ∫ (λ^α + λ^(α+1)) e^(-λ/θ) dλ (from λ = 0 to ∞)
= {θ^(-α)/Γ(α)} {Γ(α+1) θ^(α+1) + Γ(α+2) θ^(α+2)} = αθ + α(α+1)θ².]
Since the mean of the mixed distribution is that of the Gamma, αθ, the variance of the mixed distribution is: αθ + α(α+1)θ² - (αθ)² = αθ + αθ². As will be discussed, the mixed distribution is a Negative Binomial Distribution, with r = α and β = θ. Thus the variance of the mixed distribution is: αθ + αθ² = rβ + rβ² = rβ(1+β), which is in fact the variance of a Negative Binomial Distribution.
131 This result will hold whenever the parameter being mixed is the mean, as it was in the case of the Poisson.
132 The Gamma-Poisson will be discussed in a subsequent section.
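A numerical illustration of the Gamma-Poisson result, added here and not part of the original text; α = 3 and θ = 0.5 are arbitrary values chosen only for the comparison, and the mixed probabilities are computed by brute-force numerical integration.

from math import exp, gamma, factorial

alpha, theta = 3.0, 0.5

def gamma_density(lam):
    return lam**(alpha - 1) * exp(-lam / theta) / (gamma(alpha) * theta**alpha)

def mixed_pmf(k, steps=200_000, upper=40.0):
    # integrate the Poisson density against the Gamma mixing density
    width = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        total += lam**k * exp(-lam) / factorial(k) * gamma_density(lam) * width
    return total

def negative_binomial_pmf(k, r=alpha, beta=theta):
    return gamma(k + r) / (gamma(r) * factorial(k)) * (1 / (1 + beta))**r * (beta / (1 + beta))**k

for k in range(5):
    print(k, round(mixed_pmf(k), 6), round(negative_binomial_pmf(k), 6))  # the two columns agree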
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 300
Factorial Moments of Mixed Distributions:
The nth factorial moment of a mixed distribution is the mixture of the nth factorial moments for specific values of the parameter ζ: E[(X)(X-1) ... (X-n+1)] = E_ζ[E[(X)(X-1) ... (X-n+1) | ζ]].
When we are mixing Poissons, the factorial moments of the mixed distribution have a simple form.
nth factorial moment of mixed Poisson = E[(X)(X-1) ... (X-n+1)] = E_λ[E[(X)(X-1) ... (X-n+1) | λ]] = E_λ[nth factorial moment of Poisson] = E_λ[λⁿ] = nth moment of the mixing distribution.133
Exercise: Given Poissons are mixed via a distribution u(θ), what are the mean and variance of the mixed distribution?
[Solution: The mean of the mixed distribution = first factorial moment = mean of the mixing distribution. The second moment of the mixed distribution = second factorial moment + first factorial moment = second moment of the mixing distribution + mean of the mixing distribution. Variance of the mixed distribution = second moment of mixed distribution - (mean of mixed distribution)² = second moment of the mixing distribution + mean of the mixing distribution - (mean of the mixing distribution)² = Variance of the mixing distribution + Mean of the mixing distribution.]
When mixing Poissons, Mean of the Mixed Distribution = Mean of the Mixing Distribution, and the Variance of the Mixed Distribution = Variance of the Mixing Distribution + Mean of the Mixing Distribution. Therefore, for a mixture of Poissons, the variance of the mixed distribution is always greater than the mean of the mixed distribution. For example, for a Gamma mixing distribution, the variance of the mixed Poisson is: Variance of the Gamma + Mean of the Gamma = αθ² + αθ.
133
See equation 8.24 in Insurance Risk Models by Panjer & Willmot.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 301 Probability Generating Functions of Mixed Distributions: The Probability Generating Function of the mixed distribution, is the mixture of the probability generating functions for specific values of the parameter λ:
P(z) =
∫ P(z; λ) u(λ) dλ.
Exercise: What is the Probability Generating Function for Poissons mixed via a Gamma Distribution with parameters α and θ?
[Solution: For a given value of lambda, the p.g.f. of a Poisson Distribution is e^(λ(z-1)). We need to weight these Probability Generating Functions together via the density of lambda: λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α).
P(z) = ∫ e^(λ(z-1)) λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α) dλ (from λ = 0 to ∞) = {θ^(-α)/Γ(α)} ∫ λ^(α-1) e^(-λ(1/θ + 1 - z)) dλ (from λ = 0 to ∞)
= {θ^(-α)/Γ(α)} {Γ(α) (1/θ + 1 - z)^(-α)} = {1 - θ(z-1)}^(-α).]
This is the p.g.f. of a Negative Binomial Distribution with r = α and β = θ. This is one way to establish that when Poissons are mixed via a Gamma Distribution, the mixed distribution is always a Negative Binomial Distribution, with r = α = shape parameter of the Gamma and β = θ = scale parameter of the Gamma.134
134 The Gamma-Poisson frequency process is the subject of an important subsequent section.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 302 Mixing Poissons:135 In the very important case of mixing Poisson frequency distributions, the p.g.f. of the mixed distribution can be put in terms of the Moment Generating Function of the mixing distribution of λ. The Moment Generating Function of a distribution is defined as: MX(t) = E[ext].136 For a mixture of Poissons: Pmixed distribution(z) = Emixing distribution of λ[PPoisson(z)] = Emixing distribution of λ[exp[λ(z - 1)] = M mixing distribution of λ(z - 1). Thus when mixing Poissons, Pmixed distribution(z) = Mmixing distribution of λ(z - 1).137 Exercise: Apply the above formula for probability generating functions to Poissons mixed via a Gamma Distribution. [Solution: The m.g.f. of a Gamma Distribution with parameters α and θ is: (1 - θt)−α. Therefore, the p.g.f. of the mixed distribution is: M mixing distribution(z - 1) = (1 - θ(z - 1))−α. Comment: This is the p.g.f. of a Negative Binomial Distribution, with r = α and β = θ. Therefore, the mixture of Poissons via a Gamma, with parameters α and θ, is a Negative Binomial Distribution, with r = α and β = θ.] M X(t) = EX[ext] = EX[exp[t]x] = PX[et]. Therefore, when mixing Poissons: M mixed distribution(t) = Pmixed distribution(et) = Mmixing distribution of λ(et - 1). Exercise: Apply the above formula for moment generating formulas to Poissons mixed via an Inverse Gaussian Distribution with parameters µ and θ. [Solution: The m.g.f. of an Inverse Gaussian Distribution with parameters µ and θ is: exp[(θ / µ) (1 -
√(1 - 2µ²t / θ) )].
Therefore, the moment generating function of the mixed distribution is: M mixing distribution of λ(et - 1) = exp[(θ / µ) {1 135
√(1 - 2µ²(e^t − 1) / θ) }]. ]
See Section 6.10.2 of Loss Models, not on the syllabus.
136 See Definition 3.9 in Loss Models and “Mahlerʼs Guide to Aggregate Distributions.” The moment generating functions of loss distributions are shown in Appendix B, when they exist.
137 See Equation 6.45 in Loss Models.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 303 Exercise: The p.g.f. of the Zero-Truncated Negative Binomial Distribution is: {1 - β(z - 1)} - r - (1 + β)- r {1 - β(z - 1)} - r - 1 P(z) = = 1 + , z < 1 +1/β. 1 - (1 + β)- r 1 - (1 + β)- r What is the moment generating function of a compound Poisson-Extended Truncated Negative Binomial Distribution, with parameters λ = (θ/µ){(1 + 2µ2/θ).5 - 1}, r = -1/2, and β = 2µ2/θ? [Solution: The p.g.f. of a Poisson Distribution with parameter λ is: P(z) = eλ(z - 1). For a compound distribution, the m.g.f can be written in terms of the p.g.f. of the primary distribution and m.g.f. of the secondary distribution: M compound dist.(t) = Pprimary [Msecondary[t]] = Pprimary [Psecondary[et]] = exp[λ{P secondary[et] - 1}] = exp[λ
{1 - β(et - 1)} - r - 1 ]= 1 - (1 + β)- r
⎡ (θ / µ) { 1 + 2 µ 2 / θ -1} { 1 - 2 (µ2 / θ)(et - 1) - 1} ⎤ exp ⎢ ⎥= ⎣ ⎦ 1 - 1 + 2 µ2 / θ exp[(θ / µ) {1 -
1 - 2 µ2 (et − 1) / θ }] .
Comment: This is the same as the m.g.f. of Poissons mixed via an Inverse Gaussian Distribution with parameters µ and θ.] Since their moment generating functions are equal, if a Poisson is mixed by an Inverse Gaussian as per Loss Models, with parameters µ and θ, then the mixed distribution is a compound Poisson-Extended Truncated Negative Binomial Distribution as per Loss Models, θ with parameters: λ = ( 1 + 2 µ2 / θ - 1) , r = -1/2, and β = 2µ2/θ.138 µ
This is an example of a general result:139 If one mixes Poissons and the mixing distribution is infinitely divisible,140 then the resulting mixed distribution can also be written as a compound Poisson distribution, with a unique secondary distribution. The Inverse Gaussian Mixing Distribution was infinitely divisible and the result of mixing the Poissons was a Compound Poisson Distribution with a particular Extended Truncated Negative Binomial Distribution as a secondary distribution. 138
See Example 6.26 in Loss Models. See Theorem 6.20 in Loss Models. 140 As discussed previously, if a distribution is infinitely divisible, then if one takes the probability generating function to any positive power, one gets the probability generating function of another member of the same family of distributions. Examples of infinitely divisible distributions include: Poisson, Negative Binomial, Compound Poisson, Compound Negative Binomial, Normal, Gamma, and Inverse Gaussian. 139
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 304 Another example is mixing Poissons via a Gamma. The Gamma is infinitely divisible, and therefore the mixed distribution can be written as a compound distribution. As discussed previously, the mixed distribution is a Negative Binomial. It turns out that the Negative Binomial can also be written as a Compound Poisson with a logarithmic secondary distribution. Exercise: The logarithmic frequency distribution has: ⎛ β ⎞x ⎜ ⎟ ln[1 - β(z - 1)] ⎝ 1+ β ⎠ f(x) = , x = 1, 2, 3,... P(z) = 1 , z < 1 + 1/β. x ln(1+β) ln[1+ β] Determine the probability generating function of a Compound Poisson with a logarithmic secondary distribution. [Solution: Pcompound distribution(z) = Pprimary [Psecondary[z]] = exp[λ{P secondary[z] - 1}] = exp[-λ
ln[1 - β(z - 1)] -λ ] = exp[ ln[1 - β(z - 1)]] ln[1+ β] ln[1+β]
= {1 - β(z - 1)}−λ/ln[1 + β].] The p.g.f. of the Negative Binomial is: P(z) = {1 - β(z -1)}-r. This is the same form as the probability generating function obtained in the exercise, with r = λ/ln[1+β] and β = β. Therefore, a Compound Poisson with a logarithmic secondary distribution is a Negative Binomial Distribution with parameters r = λ/ln[1 + β] and β = β.141 Mixing versus Adding: The number of accidents Alice has is Poisson with mean 3%. The number of accidents Bob has is Poisson with mean 5%. The number of accidents Alice and Bob have are independent. Exercise: Determine the probability that Alice and Bob have a total of two accidents. [Solution: Their total number of accidents is Poisson with mean 8%. 0.082 e-0.08 / 2 = 0.30%. Comment: An example of adding two Poisson variables.] Exercise: We choose either Alice or Bob at random. Determine the probability that the chosen person has two accidents. [Solution: (50%)(0.032 e-0.03 / 2) + (50%)(0.052 e-0.05 / 2) = 0.081%. Comment: A 50%-50% mixture of two Poisson Distributions with means 3% and 5%. Mixing is different than adding.] 141
See Example 6.14 in Loss Models, not on the syllabus.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 305 Problems: Use the following information for the next three questions: Each insuredʼs claim frequency follows a Poisson process. There are three types of insureds as follows: Type A Priori Probability Mean Annual Claim Frequency (Poisson Parameter) A 60% 1 B 30% 2 C 10% 3 17.1 (1 point) What is the chance of a single individual having 4 claims in a year? A. less than 0.03 B. at least 0.03 but less than 0.04 C. at least 0.04 but less than 0.05 D. at least 0.05 but less than 0.06 E. at least 0.06 17.2 (1 point) What is the mean of this mixed distribution? A. 1.1 B. 1.2 C. 1.3 D. 1.4 E. 1.5 17.3 (2 points) What is the variance of this mixed distribution? A. less than 2.0 B. at least 2.0 but less than 2.1 C. at least 2.1 but less than 2.2 D. at least 2.2 but less than 2.3 E. at least 2.3
17.4 (7 points) Each insured has its annual number of claims given by a Geometric Distribution with mean β. Across a portfolio of insureds, β is distributed as follows: π(β) = 3/(1+β)4 , 0 < β. (a) Determine the algebraic form of the density of this mixed distribution. (b) List the first several values of this mixed density. (c) Determine the mean of this mixed distribution. (d) Determine the variance of this mixed distribution.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 306 17.5 (1 point) Each insuredʼs claim frequency follows a Binomial Distribution, with m = 5. There are three types of insureds as follows: Type A Priori Probability Binomial Parameter q A 60% 0.1 B 30% 0.2 C 10% 0.3 What is the chance of a single individual having 3 claims in a year? A. less than 0.03 B. at least 0.03 but less than 0.04 C. at least 0.04 but less than 0.05 D. at least 0.05 but less than 0.06 E. at least 0.06 Use the following information for the following four questions:
• The claim count N for an individual insured has a Poisson distribution with mean λ. • λ is uniformly distributed between 0 and 4. 17.6 (2 points) Find the probability that a randomly selected insured will have no claims. A. Less than 0.22 B. At least 0.22 but less than 0.24 C. At least 0.24 but less than 0.26 D. At least 0.26 but less than 0.28 E. At least 0.28 17.7 (2 points) Find the probability that a randomly selected insured will have one claim. A. Less than 0.22 B. At least 0.22 but less than 0.24 C. At least 0.24 but less than 0.26 D. At least 0.26 but less than 0.28 E. At least 0.28 17.8 (1 point) What is the mean claim frequency? 17.9 (1 point) What is the variance of the mixed frequency distribution?
17.10 (4 points) For a given value of q, the number of claims is Binomial with parameters m and q. However, m is distributed via a Negative Binomial with parameters r and β. What is the mixed distribution of the number of claims?
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 307 Use the following information for the next 3 questions: Assume that given q, the number of claims observed for one risk in m trials is given by a Binomial distribution with mean mq and variance mq(1-q). Also assume that the parameter q varies between 0 and 1 for the different risks, with q following a Beta distribution: Γ(a + b) a-1 ab a g(q) = q (1-q)b-1, with mean and variance . 2 Γ(a) Γ(b) (a + b) (a+ b +1) a +b 17.11 (2 points) What is the unconditional mean frequency? m a ab A. B. m C. m a +b a +b a +b D. m
a (a + b) (a +b +1)
E. m
a (a + b)2 (a + b +1)
17.12 (4 points) What is the unconditional variance? a a A. m2 B. m2 (a + b) (a +b +1) a +b D. m(m+a+b)
ab (a + b)2 (a + b +1)
E. m(m+a+b)
C. m2
ab (a + b) (a +b +1)
ab (a + b) (a +b +1) (a + b + 2)
17.13 (4 points) If a = 2 and b = 4, then what is the probability of observing 5 claims in 7 trials for an individual insured? A. less than 0.068 B. at least 0.068 but less than 0.070 C. at least 0.070 but less than 0.072 D. at least 0.072 but less than 0.074 E. at least 0.074
17.14 (2 points) Each insuredʼs claim frequency follows a Negative Binomial Distribution, with r = 0.8. There are two types of insureds as follows:
Type    A Priori Probability    β
A       70%                     0.2
B       30%                     0.5
What is the chance of an insured picked at random having 1 claim next year?
A. 13% B. 14% C. 15% D. 16% E. 17%
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 308 17.15 (3 points) For a given value of q, the number of claims is Binomial with parameters m and q. However, m is distributed via a Poisson with mean λ. What is the mixed distribution of the number of claims? Use the following information for the next two questions: The number of claims a particular policyholder makes in a year follows a distribution with parameter p: f(x) = p(1-p)x, x = 0, 1, 2, .... The values of the parameter p for the individual policyholders in a portfolio follow a Beta Distribution, with parameters a = 4, b = 5, and θ = 1: g(p) = 280 p3 (1-p)4 , 0 ≤ p ≤ 1. 17.16 (2 points) What is the a priori mean annual claim frequency for the portfolio? A. less than 1.5 B. at least 1.5 but less than 1.6 C. at least 1.6 but less than 1.7 D. at least 1.7 but less than 1.8 E. at least 1.8 17.17 (3 points) For an insured picked at random from this portfolio, what is the probability of observing 2 claims next year? A. 9% B. 10% C. 11% D. 12% E. 13%
Use the following information for the next 2 questions: (i) An individual insured has an annual claim frequency that follow a Poisson distribution with mean λ. (ii) Across the portfolio of insureds, the parameter λ has probability density function: Π(λ) = (0.8)(40e−40λ) + (0.2)(10e−10λ). 17.18 (1 point) What is the expected annual frequency? (A) 3.6% (B) 3.7% (C) 3.8% (D) 3.9% (E) 4.0% 17.19 (2 points) For an insured picked at random, what is the probability that he will have at least one claim in the coming year? (A) 3.6% (B) 3.7% (C) 3.8% (D) 3.9% (E) 4.0%
17.20 (4 points) For a given value of q, the number of claims is Binomial with parameters m and q. However, m is distributed via a Binomial with parameters 5 and 0.1. What is the mixed distribution of the number of claims?
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 309 Use the following information for the next four questions: For a given value of q, the number of claims is Binomial distributed with parameters m = 3 and q. In turn q is distributed uniformly from 0 to 0.4. 17.21 (2 points) What is the chance that zero claims are observed? A. Less than 0.52 B. At least 0.52 but less than 0.53 C. At least 0.53 but less than 0.54 D. At least 0.54 but less than 0.55 E. At least 0.55 17.22 (2 points) What is the chance that one claim is observed? A. Less than 0.32 B. At least 0.32 but less than 0.33 C. At least 0.33 but less than 0.34 D. At least 0.34 but less than 0.35 E. At least 0.35 17.23 (2 points) What is the chance that two claims are observed? A. Less than 0.12 B. At least 0.12 but less than 0.13 C. At least 0.13 but less than 0.14 D. At least 0.14 but less than 0.15 E. At least 0.15 17.24 (2 points) What is the chance that three claims are observed? A. Less than 0.01 B. At least 0.01 but less than 0.02 C. At least 0.02 but less than 0.03 D. At least 0.03 but less than 0.04 E. At least 0.04
17.25 (2 points) For students at a certain college, 40% do not own cars and do not drive. For the rest of the students, their accident frequency is Poisson with λ = 0.07. Let T = the total number of accidents for a group of 100 students picked at random. What is the variance of T? A. 4.0 B. 4.1 C. 4.2 D. 4.3 E. 4.4
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 310 Use the following information for the next 7 questions: On his daily walk, Clumsy Klem loses coins at a Poisson rate. At random, on half the days, Klem loses coins at a rate of 0.2 per minute. On the other half of the days, Klem loses coins at a rate of 0.6 per minute. The rate on any day is independent of the rate on any other day. 17.26 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the sixth minute of todayʼs walk. (A) 0.21 (B) 0.23 (C) 0.25 (D) 0.27 (E) 0.29 17.27 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the first two minutes of todayʼs walk. A. Less than 32% B. At least 32%, but less than 34% C. At least 34%, but less than 36% D. At least 36%, but less than 38% E. At least 38% 17.28 (2 points) Let A = the number of coins that Clumsy Klem loses during the first minute of todayʼs walk. Let B = the number of coins that Clumsy Klem loses during the first minute of tomorrowʼs walk. Calculate Prob[A + B = 1]. (A) 0.30 (B) 0.32 (C) 0.34 (D) 0.36 (E) 0.38 17.29 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the third minute of todayʼs walk and exactly one coin during the fifth minute of todayʼs walk. (A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09 17.30 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the third minute of todayʼs walk and exactly one coin during the fifth minute of tomorrowʼs walk. (A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09 17.31 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the first four minutes of todayʼs walk and exactly one coin during the first four minutes of tomorrowʼs walk. A. Less than 8.5% B. At least 8.5%, but less than 9.0% C. At least 9.0%, but less than 9.5% D. At least 9.5%, but less than 10.0% E. At least 10.0% 17.32 (3 points) Calculate the probability that Clumsy Klem loses exactly one coin during the first 2 minutes of todayʼs walk, and exactly two coins during the following 3 minutes of todayʼs walk. (A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 311 Use the following information for the next two questions: Each insured has its accident frequency given by a Poisson Distribution with mean λ. For a portfolio of insureds, λ is distributed as follows on the interval from a to b: (d +1) λd f(λ) = d + 1 , 0 ≤ a ≤ λ ≤ b ≤ ∞. b - ad + 1 17.33 (2 points) If the parameter d = -1/2, and if a = 0.2 and b = 0.6, what is the mean frequency? A. less than 0.35 B. at least 0.35 but less than 0.36 C. at least 0.36 but less than 0.37 D. at least 0.37 but less than 0.38 E. at least 0.38 17.34 (2 points) If the parameter d = -1/2, and if a = 0.2 and b = 0.6, what is the variance of the frequency? A. less than 0.39 B. at least 0.39 but less than 0.40 C. at least 0.40 but less than 0.41 D. at least 0.41 but less than 0.42 E. at least 0.42 17.35 (3 points) Let X be a 50%-50% weighting of two Binomial Distributions. The first Binomial has parameters m = 6 and q = 0.8. The second Binomial has parameters m = 6 and q unknown. For what value of q, does the mean of X equal the variance of X? A. 0.3 B. 0.4 C. 0.5 D. 0.6 E. 0.7 Use the following information for the next 2 questions: (i) Claim counts for individual insureds follow a Poisson distribution. (ii) Half of the insureds have expected annual claim frequency of 4%. (iii) The other half of the insureds have expected annual claim frequency of 10%. 17.36 (1 point) An insured is picked at random. What is the probability that this insured has more than 1 claim next year? (A) 0.21% (B) 0.23% (C) 0.25% (D) 0.27% (E) 0.29% 17.37 (1 point) A large group of such insured is observed for one year. What is the variance of the distribution of the number of claims observed for individuals? (A) 0.070 (B) 0.071 (C) 0.072 (D) 0.073 (E) 0.074
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 312
Use the following information for the next three questions:
An insurance company sells two types of policies with the following characteristics:
Type of Policy    Proportion of Total Policies    Annual Claim Frequency
I                 25%                             Poisson with λ = 0.25
II                75%                             Poisson with λ = 0.50
17.38 (1 point) What is the probability that an insured picked at random will have no claims next year? A. 50% B. 55% C. 60% D. 65% E. 70% 17.39 (1 point) What is the probability that an insured picked at random will have one claim next year? A. less than 30% B. at least 30% but less than 35% C. at least 35% but less than 40% D. at least 40% but less than 45% E. at least 45% 17.40 (1 point) What is the probability that an insured picked at random will have two claims next year? A. 4% B. 6% C. 8% D. 10% E. 12%
17.41 (3 points) The Spiders sports team will play a best of 3 games playoff series. They have an 80% chance to win each home game and only a 40% chance to win each road game. The results of each game are independent of the results of any other game. It has yet to be determined whether one or two of the three games will be home games for the Spiders, but you assume these two possibilities are equally likely. What is the chance that the Spiders win their playoff series? A. 63% B. 64% C. 65% D. 66% E. 67% 17.42 (4 points) The number of claims is modeled as a two point mixture of Poisson Distributions, with weight p to a Poisson with mean λ1 and weight (1-p) to a Poisson with mean λ2. (a) For the mixture, determine the ratio of the variance to the mean as a function of λ1, λ2, and p. (b) With the aid of a computer, for λ1 = 10% and λ2 = 20%, graph this ratio as a function of p for 0 ≤ p ≤ 1.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 313 Use the following information for the next five questions: For a given value of q, the number of claims is Binomial distributed with parameters m = 4 and q. In turn q is distributed from 0 to 0.6 via: π(q) =
(2500/99) q² (1-q).
17.43 (3 points) What is the chance that zero claims are observed? A. 12% B. 14% C. 16% D. 18% E. 20% 17.44 (3 points) What is the chance that one claim is observed? A. 26% B. 26% C. 28% D. 30% E. 32% 17.45 (3 points) What is the chance that two claims are observed? A. 26% B. 26% C. 28% D. 30% E. 32% 17.46 (2 points) What is the chance that three claims are observed? A. 19% B. 21% C. 23% D. 25% E. 27% 17.47 (2 points) What is the chance that four claims are observed? A. 3% B. 4% C. 5% D. 6% E. 7%
17.48 (3 points) Use the following information:
• There are two types of insurance policies. • Three quarters are low risk policies, while the remaining one quarter are high risk policies. • The annual claims from each type of policy are Poisson. • The mean number of claims from a high risk policy is 0.4. • The variance of the mixed distribution of the number of claims is 0.2575. Determine the mean annual claims from a low risk policy. A. 12% B. 14% C. 16% D. 18% E. 20%
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 314
17.49 (4, 11/82, Q.48) (3 points) Let f(x|θ) = frequency distribution for a particular risk having parameter θ.
f(x|θ) = θ(1-θ)^x, where θ is in the interval [p, 1], p is a fixed value such that 0 < p < 1, and x is a non-negative integer.
g(θ) = distribution of θ within a given class of risks.
g(θ) = -1/{θ ln(p)}, for p ≤ θ ≤ 1.
Find the frequency distribution for the class of risks.
A. -(x+1)(1-p)^x p^2 / ln(p)
B. -p^(x+1) / {(x+1) ln(p)}
C. -(1-p)^(x+1) / {(x+1) ln(p)}
D. -(x+1) p^x / ln(p)
E. None of A, B, C, or D.
17.50 (2, 5/88, Q.33) (1.5 points) Let X have a binomial distribution with parameters m and q, and let the conditional distribution of Y given X = x be Poisson with mean x.
What is the variance of Y?
A. x   B. mq   C. mq(1 - q)   D. mq^2   E. mq(2 - q)
17.51 (4, 5/88, Q.32) (2 points) Let N be the random variable which represents the number of claims observed in a one year period.
N is Poisson distributed with a probability density function with parameter θ: P[N = n | θ] = e^-θ θ^n/n!, n = 0, 1, 2, ...
The probability of observing no claims in a year is less than 0.450.
Which of the following describe possible probability distributions for θ?
1. θ is uniformly distributed on (0, 2).
2. The probability density function of θ is f(θ) = e^-θ for θ > 0.
3. P[θ = 1] = 1 and P[θ ≠ 1] = 0.
A. 1   B. 2   C. 3   D. 1, 2   E. 1, 3
17.52 (3, 11/00, Q.13 & 2009 Sample Q.114) (2.5 points) A claim count distribution can be expressed as a mixed Poisson distribution. The mean of the Poisson distribution is uniformly distributed over the interval [0, 5]. Calculate the probability that there are 2 or more claims. (A) 0.61 (B) 0.66 (C) 0.71 (D) 0.76 (E) 0.81
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 315
17.53 (SOA3, 11/04, Q.32 & 2009 Sample Q.130) (2.5 points) Bob is a carnival operator of a game in which a player receives a prize worth W = 2^N if the player has N successes, N = 0, 1, 2, 3,…
Bob models the probability of success for a player as follows:
(i) N has a Poisson distribution with mean Λ.
(ii) Λ has a uniform distribution on the interval (0, 4).
Calculate E[W].
(A) 5   (B) 7   (C) 9   (D) 11   (E) 13
17.54 (CAS3, 11/06, Q.19) (2.5 points) In 2006, annual claim frequency follows a negative binomial distribution with parameters β and r. β follows a uniform distribution on the interval (0, 2) and r = 4. Calculate the probability that there is at least 1 claim in 2006. A. Less than 0.85 B. At least 0.85, but less than 0.88 C. At least 0.88, but less than 0.91 D. At least 0.91, but less than 0.94 E. At least 0.94 17.55 (SOA M, 11/06, Q.39 & 2009 Sample Q.288) (2.5 points) The random variable N has a mixed distribution: (i) With probability p, N has a binomial distribution with q = 0.5 and m = 2. (ii) With probability 1 - p, N has a binomial distribution with q = 0.5 and m = 4. Which of the following is a correct expression for Prob(N = 2)? (A) 0.125p2 (B) 0.375 + 0.125p (C) 0.375 + 0.125p2 (D) 0.375 - 0.125p2 (E) 0.375 - 0.125p
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 316
Solutions to Problems:
17.1. D. Chance of observing 4 accidents is θ^4 e^-θ / 24.
Weight the chances of observing 4 accidents by the a priori probability of θ.
Type      A Priori Probability   Poisson Parameter   Chance of 4 Claims
A         0.6                    1                   0.0153
B         0.3                    2                   0.0902
C         0.1                    3                   0.1680
Average                                              0.053
17.2. E. (60%)(1) + (30%)(2) + (10%)(3) = 1.5. 17.3. A. For a Type A insured, the second moment is: variance + mean2 = 1 + 12 = 2. For a Type B insured, the second moment is: variance + mean2 = 2 + 22 = 6. For a Type C insured, the second moment is: variance + mean2 = 3 + 32 = 12. The second moment of the mixture is: (60%)(2) + (30%)(6) + (10%)(12) = 4.2. The variance of the mixture is: 4.2 - 1.52 = 1.95. Alternately, the Expected Value of the Process Variance is: (60%)(1) + (30%)(2) + (10%)(3) = 1.5. The Variance of the Hypothetical Means is: (60%)(1 - 1.5)2 + (30%)(2 - 1.5)2 + (10%)(3 - 1.5)2 = 0.45. Total Variance = EPV + VHM = 1.5 + 0.45 = 1.95. Comment: For the mixed distribution, the variance is greater than the mean.
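For students who like to verify such mixture calculations with a computer, here is a short Python sketch (illustrative only, not part of the original solutions) that reproduces the answers to 17.1 through 17.3:

from scipy.stats import poisson

weights = [0.6, 0.3, 0.1]      # a priori probabilities of Types A, B, C
lambdas = [1.0, 2.0, 3.0]      # Poisson means by type

# 17.1: probability of exactly 4 claims for an insured picked at random
p4 = sum(w * poisson.pmf(4, lam) for w, lam in zip(weights, lambdas))

# 17.2: mean of the mixture = weighted average of the means
mean = sum(w * lam for w, lam in zip(weights, lambdas))

# 17.3: variance = second moment of the mixture minus mean^2,
# where each Poisson's second moment is lambda + lambda^2
second = sum(w * (lam + lam**2) for w, lam in zip(weights, lambdas))
var = second - mean**2

print(round(p4, 3), mean, round(var, 2))   # 0.053  1.5  1.95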
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 317
17.4. (a) For the Geometric distribution, f(x) = β^x/(1+β)^(x+1).
For the mixed distribution, f(x) = ∫0^∞ f(x; β) π(β) dβ = ∫0^∞ {β^x/(1+β)^(x+1)} {3/(1+β)^4} dβ = 3 ∫0^1 u^3 (1-u)^x du,
where u = 1/(1+β), 1 - u = β/(1+β), and du = -1/(1+β)^2 dβ.
This integral is of the Beta variety; its value of Γ(x+1) Γ(3+1) / Γ(x+1+3+1) follows from the fact that the density of a Beta Distribution integrates to one over its support.
Therefore, f(x) = (3){Γ(x+1) Γ(3+1) / Γ(x+1+3+1)} = (3)(x!)(3!)/(x+4)! = 18 / {(x+1)(x+2)(x+3)(x+4)}.
(b) The densities from 0 to 20 are: 3/4, 3/20, 1/20, 3/140, 3/280, 1/168, 1/280, 1/440, 1/660, 3/2860, 3/4004, 1/1820, 3/7280, 3/9520, 1/4080, 1/5168, 1/6460, 1/7980, 3/29260, 3/35420, 1/14168.
(c) The mean of this mixed distribution is:
∫0^∞ β π(β) dβ = ∫0^∞ β {3/(1+β)^4} dβ = 3 ∫0^1 u (1-u) du = (3)(1/2 - 1/3) = 1/2.
(d) The second moment of a Geometric is: variance + mean^2 = β(1+β) + β^2 = β + 2β^2.
∫0^∞ β^2 π(β) dβ = ∫0^∞ β^2 {3/(1+β)^4} dβ = 3 ∫0^1 (1-u)^2 du = 3/3 = 1.
Therefore, the second moment of this mixed distribution is: 1/2 + (2)(1) = 2.5. The variance of this mixed distribution is: 2.5 - 0.52 = 2.25. Comment: This is a Yule Distribution as discussed In Example 6.22 of Loss Models, with a = 3.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 318
17.5. B. Chance of observing 3 claims is 10q^3(1-q)^2.
Weight the chances of observing 3 claims by the a priori probability of q.
Type      A Priori Probability   q Parameter   Chance of 3 Claims
A         0.6                    0.1           0.0081
B         0.3                    0.2           0.0512
C         0.1                    0.3           0.1323
Average                                        0.033
17.6. C. The chance of no claims for a Poisson is: e^-λ. We average over the possible values of λ:
(1/4) ∫0^4 e^-λ dλ = (1/4)(-e^-λ)] evaluated from λ=0 to λ=4 = (1/4)(1 - e^-4) = 0.245.
17.7. B. The chance of one claim for a Poisson is: λe^-λ. We average over the possible values of λ:
(1/4) ∫0^4 λ e^-λ dλ = (1/4)(-λe^-λ - e^-λ)] evaluated from λ=0 to λ=4 = (1/4)(1 - 5e^-4) = 0.227.
Comment: The densities of this mixed distribution from 0 to 9: 0.245421, 0.227105, 0.190474, 0.141632, 0.0927908, 0.0537174, 0.0276685, 0.0127834, 0.00534086, 0.00203306. 17.8. E[λ] = (0 + 4)/2 = 2. 17.9. The second moment of a Poisson is: variance + mean2 = λ + λ2. E[λ + λ2] = E[λ] + E[λ2] = mean of uniform distribution + second moment of uniform distribution = 2 + {22 + (4 - 0)2 /12} = 2 + 4 + 1.333 = 7.333. variance = second moment - mean2 = 7.333 - 22 = 3.333.
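The continuous mixture in 17.6 through 17.9 can be checked the same way, by numerically integrating the Poisson probabilities against the uniform density of λ on [0, 4]. A minimal Python sketch, assuming scipy is available:

from math import exp, factorial
from scipy.integrate import quad

def mixed_pmf(n, lo=0.0, hi=4.0):
    """Probability of n claims when lambda is uniform on [lo, hi]."""
    density = 1.0 / (hi - lo)
    integrand = lambda lam: density * exp(-lam) * lam**n / factorial(n)
    return quad(integrand, lo, hi)[0]

print(round(mixed_pmf(0), 3))   # 0.245  (17.6)
print(round(mixed_pmf(1), 3))   # 0.227  (17.7)

# 17.8 and 17.9: mean and variance of the mixture
mean = 2.0                            # E[lambda] for uniform(0, 4)
second_lam = 2.0**2 + 4.0**2 / 12.0   # E[lambda^2]
var = (mean + second_lam) - mean**2   # E[lambda] + E[lambda^2] - mean^2
print(round(var, 3))            # 3.333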
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 319
17.10. The p.g.f. of each Binomial is: {1 + q(z-1)}^m.
The p.g.f. of the mixture is the mixture of the p.g.f.s: Pmixture[z] = Σ f(m){1 + q(z-1)}^m = p.g.f. of f at: 1 + q(z-1).
However, f(m) is Negative Binomial, with p.g.f.: {1 - β(z-1)}^-r.
Therefore, Pmixture[z] = {1 - β(1 + q(z-1) - 1)}^-r = {1 - βq(z-1)}^-r.
However, this is the p.g.f. of a Negative Binomial Distribution with parameters r and qβ, which is therefore the mixed distribution.
Alternately, the mixed distribution at k is:
Σ_{m=k to ∞} Prob[k | m] Prob[m] = Σ_{m=k to ∞} {m!/[(m-k)! k!]} q^k (1-q)^(m-k) {(r+m-1)!/[(r-1)! m!]} β^m/(1+β)^(r+m)
= {(r+k-1)! q^k β^k / [(1+β)^(r+k) (r-1)! k!]} Σ_{m=k to ∞} {(r+m-1)!/[(m-k)! (r+k-1)!]} {(1-q)β/(1+β)}^(m-k)
= {(r+k-1)! q^k β^k / [(1+β)^(r+k) (r-1)! k!]} Σ_{n=0 to ∞} {(r+k+n-1)!/[n! (r+k-1)!]} {(1-q)β/(1+β)}^n
= {(r+k-1)! q^k β^k / [(1+β)^(r+k) (r-1)! k!]} {1/[1 - (1-q)β/(1+β)]}^(r+k)
= {(r+k-1)!/[(r-1)! k!]} q^k β^k (1+β)^(r+k) / {(1+qβ)^(r+k) (1+β)^(r+k)} = {(r+k-1)!/[(r-1)! k!]} (qβ)^k / (1+qβ)^(r+k).
This is a Negative Binomial Distribution with parameters r and qβ.
Comment: The sum was simplified using the fact that the Negative Binomial densities sum to 1:
1 = Σ_{i=0 to ∞} {(s+i-1)!/[i! (s-1)!]} α^i/(1+α)^(s+i)
⇒ Σ_{i=0 to ∞} {(s+i-1)!/[i! (s-1)!]} {α/(1+α)}^i = (1+α)^s
⇒ Σ_{i=0 to ∞} {(s+i-1)!/[i! (s-1)!]} γ^i = 1/(1-γ)^s, where γ = α/(1+α). ⇒ α = γ/(1-γ). ⇒ 1+α = 1/(1-γ).
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 320
17.11. B. The conditional mean given q is: mq.
The unconditional mean can be obtained by integrating the conditional means versus the distribution of q:
E[X] = ∫0^1 E[X | q] g(q) dq = ∫0^1 mq {Γ(a+b)/[Γ(a)Γ(b)]} q^(a-1) (1-q)^(b-1) dq
= m {Γ(a+b)/[Γ(a)Γ(b)]} ∫0^1 q^a (1-q)^(b-1) dq = m {Γ(a+b)/[Γ(a)Γ(b)]} {Γ(a+1)Γ(b)/Γ(a+b+1)}
= m Γ(a+1)Γ(a+b) / {Γ(a)Γ(a+b+1)} = ma/(a+b).
Alternately, E[X] = ∫0^1 E[X | q] g(q) dq = m ∫0^1 q g(q) dq = m (mean of Beta Distribution) = ma/(a+b).
Comment: The Beta distribution with θ = 1 has density from 0 to 1 of: {Γ(a+b)/[Γ(a)Γ(b)]} x^(a-1) (1-x)^(b-1).
Therefore, the integral from zero to one of x^(a-1) (1-x)^(b-1) is: Γ(a)Γ(b)/Γ(a+b).
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 321
17.12. D. The conditional variance given q is: mq(1-q) = mq - mq^2.
Thus the conditional second moment given q is: mq - mq^2 + (mq)^2 = mq + (m^2 - m)q^2.
The unconditional second moment can be obtained by integrating the conditional second moments versus the distribution of q:
E[X^2] = ∫0^1 E[X^2 | q] g(q) dq = ∫0^1 {mq + (m^2 - m)q^2} {Γ(a+b)/[Γ(a)Γ(b)]} q^(a-1)(1-q)^(b-1) dq
= m {Γ(a+b)/[Γ(a)Γ(b)]} ∫0^1 q^a (1-q)^(b-1) dq + (m^2 - m) {Γ(a+b)/[Γ(a)Γ(b)]} ∫0^1 q^(a+1) (1-q)^(b-1) dq
= m {Γ(a+b)/[Γ(a)Γ(b)]}{Γ(a+1)Γ(b)/Γ(a+b+1)} + (m^2 - m){Γ(a+b)/[Γ(a)Γ(b)]}{Γ(a+2)Γ(b)/Γ(a+b+2)}
= m {Γ(a+1)/Γ(a)}{Γ(a+b)/Γ(a+b+1)} + (m^2 - m){Γ(a+2)/Γ(a)}{Γ(a+b)/Γ(a+b+2)}
= ma/(a+b) + (m^2 - m)a(a+1)/{(a+b)(a+b+1)}.
Since the mean is ma/(a+b), the variance is:
ma/(a+b) + (m^2 - m)a(a+1)/{(a+b)(a+b+1)} - m^2 a^2/(a+b)^2
= {ma/[(a+b)^2 (a+b+1)]}{(a+b+1)(a+b) + (m-1)(a+1)(a+b) - ma(a+b+1)}
= {ma/[(a+b)^2 (a+b+1)]}{ab + b^2 + mb} = m(m+a+b)ab/{(a+b)^2 (a+b+1)}.
Alternately,
E[X^2] = ∫0^1 E[X^2 | q] g(q) dq = m ∫0^1 q g(q) dq + (m^2 - m) ∫0^1 q^2 g(q) dq =
m(mean of Beta Distribution) + (m2 - m) (second moment of the Beta Distribution) = ma/(a+b) + (m2 - m)( ab/{(a+b+1) (a+b)2 } + a2 /(a + b)2) = ma/(a+b) + (m2 - m)( a/{(a+b+1) (a+b)2 })(b + a2 + ab + a) = ma/(a+b) + (m2 - m)a(a+1)/{(a+b)(a+b+1)}. Then proceed as before. Comment: This is an example of the Beta-Binomial Conjugate Prior Process. See “Mahlerʼs Guide to Conjugate Priors.” The unconditional distribution is sometimes called a “Beta-Binomial” Distribution. See Example 6.21 in Loss Models or Kendallʼs Advanced Theory of Statistics by Stuart and Ord.
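As a numerical check on the Beta-Binomial mean and variance derived in 17.11 and 17.12, the following Python sketch compares the closed-form formulas against direct integration over the Beta density; the parameter values a = 2, b = 4, m = 7 are merely illustrative.

from scipy.integrate import quad
from scipy.stats import beta as beta_dist

a, b, m = 2.0, 4.0, 7.0   # arbitrary illustrative parameters

# Conditional moments of a Binomial(m, q): mean mq, second moment mq + (m^2 - m)q^2
mean_num = quad(lambda q: m*q * beta_dist.pdf(q, a, b), 0, 1)[0]
second_num = quad(lambda q: (m*q + (m*m - m)*q*q) * beta_dist.pdf(q, a, b), 0, 1)[0]
var_num = second_num - mean_num**2

mean_formula = m*a/(a + b)
var_formula = m*(m + a + b)*a*b / ((a + b)**2 * (a + b + 1))

print(round(mean_num, 4), round(mean_formula, 4))   # both 2.3333
print(round(var_num, 4), round(var_formula, 4))     # both about 2.8889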
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 322
17.13. E. The probability density of q is a Beta Distribution with parameters a and b:
{(a+b-1)!/[(a-1)!(b-1)!]} q^(a-1)(1-q)^(b-1) = {Γ(a+b)/[Γ(a)Γ(b)]} q^(a-1)(1-q)^(b-1).
One can compute the unconditional density at n ≤ 7 via integration:
f(n) = ∫0^1 f(n | q) {Γ(a+b)/[Γ(a)Γ(b)]} q^(a-1)(1-q)^(b-1) dq
= ∫0^1 {Γ(a+b)/[Γ(a)Γ(b)]} {m!/[n!(m-n)!]} q^n (1-q)^(m-n) q^(a-1)(1-q)^(b-1) dq
= {Γ(a+b)/[Γ(a)Γ(b)]}{Γ(m+1)/[Γ(n+1)Γ(m+1-n)]} ∫0^1 q^(a+n-1)(1-q)^(b+m-n-1) dq
= {Γ(a+b)/[Γ(a)Γ(b)]}{Γ(m+1)/[Γ(n+1)Γ(m+1-n)]}{Γ(a+n)Γ(b+m-n)/Γ(a+b+m)}
= Γ(a+b)Γ(m+1)Γ(a+n)Γ(b+m-n) / {Γ(a)Γ(b)Γ(n+1)Γ(m+1-n)Γ(a+b+m)}.
For n = 5, a = 2, b = 4, and m = 7:
f(5) = Γ(6)Γ(8)Γ(7)Γ(6) / {Γ(2)Γ(4)Γ(6)Γ(3)Γ(13)} = 5! 7! 6! 5! / {1! 3! 5! 2! 12!} = 0.07576.
Comment: Beyond what you are likely to be asked on your exam.
The probability of observing each number of claims in 7 trials is as follows:
n:     0        1        2        3        4        5        6        7
f(n):  0.15152  0.21212  0.21212  0.17677  0.12626  0.07576  0.03535  0.01010
F(n):  0.15152  0.36364  0.57576  0.75253  0.87879  0.95455  0.98990  1.00000
This is an example of the “Binomial-Beta” distribution with: a = 2, b = 4, and m = 7.
17.14. B. For a Negative Binomial Distribution, f(1) = rβ / (1+β)^(r+1).
For Type A: f(1) = (0.8)(0.2)/(1.2^1.8) = 11.52%.
For Type B: f(1) = (0.8)(0.5)/(1.5^1.8) = 19.28%.
(70%)(11.52%) + (30%)(19.28%) = 13.85%.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 323
17.15. The p.g.f. of each Binomial is: {1 + q(z-1)}^m.
The p.g.f. of the mixture is the mixture of the p.g.f.s: Pmixture[z] = Σ f(m){1 + q(z-1)}^m = p.g.f. of f at: 1 + q(z-1).
However, f(m) is Poisson, with p.g.f.: exp[λ(z-1)]. Therefore, Pmixture[z] = exp[λ{1 + q(z-1) - 1}] = exp[λq(z-1)].
However, this is the p.g.f. of a Poisson Distribution with mean qλ, which is therefore the mixed distribution.
Alternately, the mixed distribution at k is:
Σ_{m=k to ∞} Prob[k | m] Prob[m] = Σ_{m=k to ∞} {m!/[(m-k)!k!]} q^k (1-q)^(m-k) e^-λ λ^m/m! = {q^k e^-λ λ^k/k!} Σ_{m=k to ∞} (1-q)^(m-k) λ^(m-k)/(m-k)!
= {q^k e^-λ λ^k/k!} Σ_{n=0 to ∞} (1-q)^n λ^n/n! = {q^k e^-λ λ^k/k!} exp[(1-q)λ] = (qλ)^k e^-λq/k!.
This is a Poisson Distribution with mean qλ.
17.16. C. This is a Geometric Distribution (a Negative Binomial with r = 1), parameterized somewhat differently than in Loss Models, with p = 1/(1+β).
Therefore for a given value of p the mean is: µ(p) = β = (1-p)/p.
In order to get the average mean over the whole portfolio we need to take the integral of µ(p) g(p) dp.
∫0^1 µ(p) g(p) dp = ∫0^1 {(1-p)/p} 280 p^3 (1-p)^4 dp = 280 ∫0^1 p^2 (1-p)^5 dp = 280 Γ(3)Γ(6)/Γ(3+6)
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 324 17.17. D. The probability density of p is a Beta Distribution with parameters a and b: {Γ(a+b) / Γ(a)Γ(b)} pa-1(1-p)b-1. One can compute the unconditional density at n via integration: 1
∫
f(n) = f(n | p) {Γ(a+b) / Γ(a)Γ(b)} pa-1(1-p)b-1 dp = 0 1
∫
{Γ(a+b) / Γ(a)Γ(b)} p(1-p)x p a-1(1-p)b-1dp = 0 1
{Γ(a+b) / Γ(a)Γ(b)}
∫ p a(1-p)b+n-1dp =
0
{Γ(a+b) / Γ(a)Γ(b)}{Γ(a+1) Γ(b+n) / Γ(a+b+n+1)} = a Γ(a+b) Γ(b+n) / {Γ(b) Γ(a+b+n+1)}. For a = 4, b = 5: f(n) = 4 Γ(9) Γ(5+n) / {Γ(5) Γ(10+n)} = 4 8! (n+4)! / {4! (n+9)!} = 6720 (n+4)! / (n+9)!. f(2) = 6720 6! / 11! = 12.1%. Comment: The Beta distribution with θ = 1 has density from 0 to 1 of: {Γ(a+b) / Γ(a)Γ(b)}xa-1(1-x)b-1. Therefore, the integral from zero to of xa-1(1-x)b-1 is: Γ(a)Γ(b)/ Γ(a+b). This is an example of a Generalized Waring Distribution, with r = 1, a = 4 and b = 5. See Example 6.22 in Loss Models. The probabilities of observing 0 to 20 claims is as follows: 0.444444, 0.222222, 0.121212, 0.0707071, 0.043512, 0.027972, 0.018648, 0.0128205, 0.00904977, 0.00653595, 0.00481596, 0.00361197, 0.00275198, 0.00212653, 0.00166424, 0.00131752, 0.00105402, 0.000851323, 0.00069367, 0.000569801, 0.000471559. Since the densities must add to unity: ∞
∞
1 = Σ a Γ(a+b) Γ(b+n) / {Γ(b) Γ(a+b+n+1)}. ⇒ Σ Γ(b+n) / Γ(a+b+n+1) = Γ(b)/{a Γ(a+b)}. n =0
n =0
17.18. E. E[λ] = the mean of the prior mixed exponential = weighted average of the means of the two exponential distributions = (.8)(1/40) + (.2)(1/10) = 4.0%.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 325 17.19. C. Given λ, f(0) = e−λ. ∞
∞
∞
∫ f(0; λ) Π(λ) dλ = 32∫ e−41λ dλ + 2∫ e−11λ dλ = (32/41) + (2/11) = 0.9623. 0
0
0
Prob[at least one claim] = 1 - 0.9623 = 3.77%. 17.20. The p.g.f. of each Binomial is: {1 + q(z-1)}m. The p.g.f. of the mixture is the mixture of the p.g.f.s: Pmixture[z] = Σ f(m){1 + q(z-1)}m = p.g.f. of f at: 1 + q(z-1). However, f(m) is Binomial with parameters 5 and 0.1, with p.g.f.: {1 + .1(z-1)}5 . Therefore, Pmixture[z] = {1 + .1(1 + q(z-1) -1)}5 = {1 + .1q(z-1)}5 . However, this is the p.g.f. of a Binomial Distribution with parameters 5 and .1q, which is therefore the mixed distribution. Alternately, the mixed distribution at k ≤ 5 is: ∞
∞
Σ Prob[k | m]Prob[m] = Σ m!/{(m-k)!k!} qk (1-q)m-k 5!/{(5-m)!m!} .1m .95-m m=k
m=k ∞
= qk .1k 5!/{(5-k)!k!}Σ(5-k)!/{(m-k)!(5-m)!} (1-q)m-k.1m-k.95-m = m=k ∞
= qk .1k 5!/{(5-k)!k!}Σ(5-k)!/{n!(5- k - n)!} (1-q)n .1n .95- k - n = n=0
= qk .1k 5!/{(5-k)!k!} {(1-q)(.1) + .9}5- k = 5!/{(5-k)!k!} (.1q)k (1 - .1q)5- k. This is a Binomial Distribution with parameters 5 and .1q. Comment: The sum was simplified using the Binomial expansion: m
(x + y)m = Σ xiy m-i m!/{i!(m-i)!}. i=0
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 326 17.21. D. Given q, we have a Binomial with parameters m = 3 and q. The chance that we observe zero claims is: (1-q)3 . The distribution of q is uniform: π(q) = 2.5 for 0 ≤ q ≤ 0.4. .4
f(0) =
.4
q = .4
∫ f(0 | q) π(q) dq = ∫(1 - q)3 (2.5) dq = (-2.5/4)(1 - q)4 ] = (-.625)(.64 - 14) = 0.544.
0
0
q=0
17.22. B. Given q, we have a Binomial with parameters m = 3 and q. The chance that we observe one claim is: 3q(1-q)2 = 3q - 6q2 + 3q3 . .4
P(c=1) =
.4
q= .4
∫ P(c=1 | q) f(q) dq = ∫( 3q - 6q2 + 3q3) (2.5)dq = (2.5)(1.5q2 - 2q3 +.75q4) ]
0
0
q= 0
= (2.5)(.24 - .128 + .0192) = 0.328. 17.23. A. Given q, we have a Binomial with parameters m = 3 and q. The chance that we observe two claims is: 3q2 (1-q) = 3q2 - 3 q3 . .4
P(c=2) =
.4
q= .4
∫ P(c=2 | q) f(q) dq = ∫(3q2 - 3 q3) (2.5)dq = (2.5)(q3 -.75q4) ]
0
0
q= 0
= (2.5)(.064 - .0192) = 0.112. 17.24. B. Given q, we have a Binomial with parameters m = 3 and q. The chance that we observe three claims is: q3 . .4
P(c=3) = 0
.4
q= .4
∫ P(c=3 | q) f(q) dq = ∫( q3) (2.5)dq = (2.5)(q4/4) ] = (2.5)(.0064) = 0.016. 0
q= 0
Comment: Since we have a Binomial with m = 3, the only possibilities are 0, 1, 2 or 3 claims. Therefore, the probabilities for 0, 1, 2 and 3 claims (calculated in this and the prior three questions) add to one: 0.544 + 0.328 + 0.112 + 0.016 = 1.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 327 17.25. D. This is a 40%-60% mixture of zero and a Poisson with λ = .07. The second moment of the Poisson is: variance + mean2 = .07 + .072 = .0749. The mean of the mixture is: (40%)(0) + (60%)(.07) = .042. The second moment of the mixture is: (40%)(0) + (60%)(.0749) = .04494. The variance of the mixture is: .04494 - .0422 = .0432, per student. For a group of 100 students the variance is: (100)(.0432) = 4.32. 17.26. C. For λ = 0.2, f(1) = .2e-.2 = 0.1638. For λ = 0.6, f(1) = .6e-.6 = 0.3293. Prob[1 coin] = (.5)(.1638) + (.5)(.3293) = 24.65%. 17.27. A. Over two minutes, the mean is either 0.4: f(1) = .4e-.4 = 0.2681, or the mean is 1.2: f(1) = 1.2e-1.2 = 0.3614. Prob[1 coin] = (.5)(.2681) + (.5)(.3614) = 31.48%. 17.28. C. Prob[0 coins during a minute] = (.5)e-.2 + (.5)e-.6 = 0.6838. Prob[1 coin during a minute] = (.5).2e-.2 + (.5).6e-.6 = 0.2465. Prob[A + B = 1] = Prob[A= 0]Prob[B] + Prob[A = 1]Prob[B = 0] = (2)(.6838)(.2465) = 33.71%. Comment: Since the minutes are on different days, their lambdas are picked independently. 17.29. C. Prob[1 coin during third minute and 1 coin during fifth minute | λ = 0.2] = (.2e-.2)(.2e-.2) = 0.0268. Prob[1 coin during third minute and 1 coin during fifth minute | λ = 0.6] = (.6e-.6)(.6e-.6) = 0.1084. (.5)(.0268) + (.5)(.1084) = 6.76%. Comment: Since the minutes are on the same day, they have the same λ, whichever it is. 17.30. B. Prob[1 coin during a minute] = (.5).2e-.2 + (.5).6e-.6 = 0.2465. Since the minutes are on different days, their lambdas are picked independently. Prob[1 coin during 1 minute today and 1 coin during 1 minute tomorrow] = Prob[1 coin during a minute] Prob[1 coin during a minute] = 0.24652 = 6.08%. 17.31. A. Prob[1 coin during 4 minutes] = (.5).8e-.8 + (.5)2.4e-2.4 = 0.2866. Since the time intervals are on different days, their lambdas are picked independently. Prob[1 coin during 4 minutes today and 1 coin during 4 minutes tomorrow] = Prob[1 coin during 4 minutes] Prob[1 coin during 4 minutes] = 0.28662 = 8.33%.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 328 17.32. B. Prob[1 coin during two minute and 2 coins during following 3 minutes | λ = 0.2] = (.4e-.4)(.62 e-.6/2) = 0.0265. Prob[1 coin during two minute and 2 coins during following 3 minutes | λ = 0.6] = (1.2e-1.2)(1.82 e-1.8/2) = 0.0968. (.5)(.0265) + (.5)(.0968) = 6.17%. b
λ=b
b
∫λ f(λ) dλ = ∫λ (d+1) λd / {bd+1 - ad+1} dλ = {(d+1)/{bd+1 - ad+1}} λd+2 / (d+2) ] =
17.33. E. a
λ=a
a
{(d+1)/ (d+2)}{bd+2 - ad+2 } / {bd+1 - ad+1} = (.5/1.5){.61.5 - .21.5) / (.6.5 - .2.5) = 0.3821. b
λ=b
b
∫λ2 f(λ) dλ = ∫λ2 (d+1) λd / {bd+1 - ad+1} dλ = {(d+1)/{bd+1 - ad+1}} λd+3 / (d+3) ] =
17.34. B. a
a
λ=a
{(d+1)/ (d+3)}{bd+3 - ad+3 } / {bd+1 - ad+1} = (.5/2.5){.62.5 - .22.5) / (.6.5 - .2.5) = .15943. For fixed λ, the second moment of a Poisson is: λ + λ2. Therefore, the second moment of the mixture is: E[λ] + E[λ2] = .3821 + .15943 = .5415. Therefore, the variance of the mixture is: .5415 - .38212 = 0.3955. Alternately, Variance[λ] = Second Moment[λ] - Mean[λ]2 = .15943 - .38212 = .0134. The variance of frequency for a mixture of Poissons is: E[λ] + Var[λ] = .3821 + .0134 = 0.3955. 17.35. A. E[X] = (.5)(6)(.8) + (.5)(6)q = 2.4 + 3q. The second moment of a Binomial is: mq(1 - q) + (mq)2 = mq - mq2 + m2 q2 . E[X2 ] = (.5){(6)(.8) - (6)(.82 ) + (62 )(.82 )} + (.5){6q - 6q2 + 36q2 } = 12 + 3q + 15q2 . Var[X] = 12 + 3q + 15q2 - (2.4 + 3q)2 = 6.24 - 11.4q + 6q2 . E[X] = Var[X]. ⇒ 2.4 + 3q = 6.24 - 11.4q + 6q2 . ⇒ 6q2 - 14.4q + 3.84 = 0. q = {14.4 ±
14.42 - (4)(6)(3.84) } / 12 = {14.4 ± 10.7331} / 12 = 2.094 or 0.3056.
Comment: 0 ≤ q ≤ 1. When one mixes distributions, the variance increases. As discussed in “Mahlerʼs Guide to Buhlmann Credibility,” Var[X] = E[Var[X | q]] + Var[E[X | q]] ≥ E[Var[X | q]]. Since for a Binomial Distribution, the variance is less than the mean, for a mixture of Binomial Distributions, the variance can be either less than, greater than, or equal to the mean.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 329 17.36. D. For λ = .04, Prob[more than 1 claim] = 1 - e-.04 - .04 e-.04 = .00077898. For λ = .10, Prob[more than 1 claim] = 1 - e-.10 - .10 e-.10 = .00467884. Prob[more than 1 claim] = (.5)(.00077898) + (.5)(.00467884) = 0.273%. 17.37. B. For λ = .04, the mean is .04 and the second moment is: λ + λ2 = .04 + .042 = .0416. For λ = .10, the mean is .10 and the second moment is: λ + λ2 = .10 + .102 = .11. Therefore, the mean of the mixture is: (.5)(.04) + (.5)(.10) = 0.07, and the second moment of the mixture is: (.5)(.0416) + (.5)(.11) = 0.0758. The variance of the mixed distribution is: 0.0758 - 0.072 = 0.0709. Alternately, Variance[λ] = Second Moment[λ] - Mean[λ]2 = (.5)(.042 ) + (.5)(.12 ) - .072 = .0009. The variance of frequency for a mixture of Poissons is: Expected Value of the Process Variance + Variance of the Hypothetical Means = E[λ] + Var[λ] = .07 + .0009 = 0.0709. 17.38. D. (25%)(e-.25) + (75%)(e-.5) = 65.0%. 17.39. A. (25%)(.25 e-.25) + (75%)(.5 e-.5) = 27.6%. 17.40. B. (25%)(.252 e-.25/2) + (75%)(.52 e-.5/2) = 6.3%. 17.41. D. If there is one home game and two road games, then the distributions of road wins is: 2 @ 16%, 1 @ 48%, 0 @ 36%. Thus the chance of winning at least 2 games is: Prob[win 2 road] + Prob[win 1 road] Prob[win one home] = 16% + (48%)(80%) = 0.544. If instead there is one road game and two home games, then the distributions of home wins is: 2 @ 64%, 1 @ 32%, 0 @ 4%. Thus the chance of winning at least 2 games is: Prob[win 2 home] + Prob[win one home] Prob[win 1 road] = 64% + (32%)(40%) = 0.768. Thus the chance the Spiders win the series is: (50%)(0.544) + (50%)(0.768) = 65.6%. Comment: This is a 50%-50% mixture of two situations. (Each situation has its own distribution of games won.) While in professional sports there is a home filed advantage, it is not usually this big. Note that for m = 3 and q = (0.8 + 0.4)/2 = 0.6, the probability of at least two wins is: 0.63 + (3)(0.62 )(0.4) = 0.648 ≠ 0.656.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 330 17.42. The mean of the mixture is: p λ1 + (1 - p)λ2. The second moment of a Poisson is: variance + mean2 = λ + λ2. Therefore, the second moment of the mixture is: p (λ1 + λ12) + (1 - p)(λ2 + λ22). Variance of the mixture is: p (λ1 + λ12) + (1 - p)(λ2 + λ22) - {p λ1 + (1 - p)λ2}2 . For the mixture, the ratio of the variance to the mean is: {p (λ 1 + λ12) + (1 - p)(λ2 + λ22)}/{p λ1 + (1 - p)λ2} - {p λ1 + (1 - p)λ2}. For λ1 = 10% and λ2 = 20%, the ratio of the variance to the mean is: {p .11 + (1 - p).24} / {p .1 + (1 - p).2} - {p .1 + (1 - p).2} = (0.24 - 0.13p) / (0.2 - 0.1p) - (0.2 - 0.1p). Here is a graph of the ratio of the variance to the mean as a function of p: Ratio 1.0175 1.015 1.0125 1.01 1.0075 1.005 1.0025 0.2
0.4
0.6
0.8
1
p
Comment: For either p = 0 or p = 1, this ratio is 1. For either p = 0 or p = 1, we have a single Poisson and the mean is equal to the variance. For 0 < p < 1, mixing increases the variance, and the variance of the mixture is greater than its mean. For example, for p = 80%, (0.24 - 0.13p) / (0.2 - 0.1p) - (0.2 - 0.1p) = 1.013.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 331 17.43. B. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe zero claims is: (1-q)4 . The distribution of q is: π(q) = 0.6
f(0) =
2500 99
∫0
2500 f(0 | q) π(q) dq = 99
2500 2 q (1-q). 99
0.6
∫0 q2(1- q)5 dq =
0.6
∫0 q2 - 5q3 + 10q4 - 10q5 + 5 q6 - q7 dq =
2500 {0.63 /3 - (5)(0.64 )/4 + (10)(0.65 )/5 - (10)(0.66 )/6 + (5)(0.67 )/7 - 0.68 /8} = 14.28%. 99 17.44. D. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe one claim is: 4q(1-q)3 . The distribution of q is: π(q) = 0.6
f(1) =
∫0
10,000 99
2500 f(1 | q) π(q) dq = (4) 99
2500 2 q (1-q). 99
0.6
∫0 q3(1- q)4 dq =
0.6
∫0 q3 - 4q4 + 6q5 - 4q6 + q7 dq =
10,000 {0.64 /4 - (4)(0.65 )/5 + (6)(0.66 )/6 - (4) (0.67 )/7 + 0.68 /8} = 29.81%. 99 17.45. E. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe two claims is: 6q2 (1-q)2 . The distribution of q is: π(q) = 0.6
f(2) =
∫0
2500 f(2 | q) π(q) dq = (6) 99
0.6
∫0
2500 2 q (1-q). 99
5000 q4 (1- q)3 dq = 33
5000 {0.65 /5 - (3)(0.66 )/6 + (3)(0.67 )/7 - 0.68 /8} = 32.15%. 33
0.6
∫0 q4 - 3q5 + 3q6 - q7 dq =
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 332 17.46. A. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe three claims is: 4q3 (1-q). The distribution of q is: π(q) = 0.6
f(3) =
∫0
2500 f(3 | q) π(q) dq = (4) 99
0.6
∫0
2500 2 q (1-q). 99
10,000 q5 (1- q)2 dq = 99
0.6
∫0 q5 - 2q6 + q7 dq =
10,000 {0.66 /6 - (2)(0.67 )/7 + 0.68 /8} = 18.96%. 99 17.47. C. Given q, we have a Binomial with parameters m = 4 and q. The chance that we observe four claims is: q4 . The distribution of q is: π(q) = 0.6
f(4) =
∫0
2500 f(4 | q) π(q) dq = 99
0.6
∫0
2500 2 q (1-q). 99
2500 q6 (1- q) dq = 99
0.6
∫0 q6 - q7 dq =
2500 {0.67 /7 - 0.68 /8} = 4.80%. 99 Comment: Since we have a Binomial with m = 4, the only possibilities are 0, 1, 2, 3 or 4 claims. Therefore, the probabilities for 0, 1, 2, 3, and 4 claims must add to one: 14.28% + 29.81% + 32.15% + 18.96% + 4.80% = 1. 17.48. E. Let x be the mean for the low risk policies. The mean of the mixture is: (3/4)x + 0.4/4 = 0.75x + 0.1 The second of the mixture is the mixture of the second moments: (3/4)(x + x2 ) + (0.4 + 0.42 )/4 = 0.75x2 + 0.75x + 0.14. Thus the variance of the mixture is: 0.75x2 + 0.75x + 0.14 - (0.75x + 0.1)2 = 0.1875x2 + 0.6x + 0.13. Thus, 0.2575 = 0.1875x2 + 0.6x + 0.13. ⇒ 0.1875x2 + 0.6x - 0.1275 = 0. ⇒ x=
-0.6 ±
0.62 - (4)(0.1875)(-0.1275) = 0.20, taking the positive root. (2)(0.1875)
Comment: You can try the choices and see which one works.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 333 17.49. C. The frequency distribution for the class = 1
∫p
θ=1
1
(1- θ)x + 1 ⎤ x f(x | θ) g(θ) dθ = - (1- θ) / ln(p) dθ = (x +1) ln(p) ⎥⎦
∫p
x+1
-(1- p) = . (x + 1) ln(p)
θ=p
Comment: 4,11/82, Q.48, rewritten. Note that f(x|θ) is a geometric distribution. The mixed frequency distribution for the class is a logarithmic distribution, with β = 1/p -1 and x+1 running from 1 to infinity (so that f(0) is the logarithmic distribution at 1, f(1) is the logarithmic at 2, etc. The support of the logarithmic is 1,2,3,...) 17.50. E. Var[Y] = EX[VAR[Y | X]] + VARX[E[Y | X]] = EX[x] + VARX[x] = mq + mq(1 - q) = mq(2 - q). Comment: Total Variance = Expected Value of the Process Variance + Variance of the Hypothetical Means. See “Mahlerʼs Guide to Buhlmann Credibility.” 17.51. E. P(Y=0) =
∫ P(Y = 0 | θ) f(θ)dθ = ∫ e−θ f(θ) dθ.
For the first case, f(θ) = 1/2, for 0 ≤ θ ≤ 2 2
P(Y=0) =
∫0 e-θ / 2
dθ = (1 - e-2)/2 = 0.432.
For the second case, f(θ) = e−θ, for θ > 0 and ∞
P(Y=0) =
∫0 e-2θ dθ = 1/2.
For the third case, P(Y=0) = e-1 = 0.368. In the first and third cases P(Y=0) < 0.45. Comment: Three separate problems in which you need to calculate P(Y=0) given three different distributions of θ. 17.52. A. The chance of zero or one claim for a Poisson distribution is: e−λ + λe−λ. We average over the possible values of λ: 5
Prob(0 or 1 claim) = (1/5)
∫0
e- λ
+
λe - λ
dλ = (1/5)
(-2e- λ
-
λ=5 λ λe )
]
= (1/5) (2 - 7e-5) = 0.391.
λ=0
Probability that there are 2 or more claims = 1 - Prob(0 or 1 claim) = 1 - 0.391 = 0.609.
2013-4-1, Frequency Distributions, §17 Mixed Distributions HCM 10/4/12, Page 334 17.53. E. P(z) ≡ E[zN]. The p.g.f. of the Poisson Distribution is: P(z) = eλ(z-1). Therefore, for the Poisson, E[zN] = eλ(z-1). E[2N | λ] = P(2) = eλ(2-1) = eλ. 4
E[W] =
∫0
4
E[2N
| λ] (1/ 4) dλ = (1/4)
∫0 eλ dλ = (1/4)(e4 - 1) = 13.4.
17.54. A. For a Negative Binomial with r = 4, f(0) = 1/(1+β)4 . 2
Prob[0 claims] =
∫
β=2
1/ (1+ β) 4 (1/ 2) dβ
= -1/{6(1 + β)3 }
0
]
β=0
= (1/6)(1 - 33 ) = .1605.
Prob[at least 1 claim] = 1 - .1605 = 0.8395. 17.55. E. For q = 0.5 and m = 2, f(2) = .52 = .25. ⎛4⎞ For q = 0.5 and m = 4, f(2) = ⎜ ⎟ (.52 )(.52 ) = .375. ⎝2⎠ Probability that the mixed distribution is 2 is: p(.25) + (1 - p)(.375) = 0.375 - 0.125p. Comment: The solution cannot involve p2 , eliminating choices A, C, and D.
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 335
Section 18, Gamma Function142 The quantity xα−1e-x is finite for x ≥ 0 and α ≥ 1. Since it declines quickly to zero as x approaches infinity, its integral from zero to ∞ exists. This is the much studied and tabulated (complete) Gamma Function. Γ(α) =
∞
∞
0
0
∫ tα - 1 e - t dt = θ−α
∫ tα - 1e - t / θ
dt , for α ≥ 0 , θ ≥ 0.
We prove the equality of these two integrals, by making the change of variables x = t/θ: ∞
∞
∫0 tα - 1e- t dt = ∫0 (x / θ)α - 1 e- x/ θ dx/ θ Γ(α) = (α −1) !
∞
= θ−α
∫0 xα - 1 e- x/ θ dx .
Γ(α) = (α-1) Γ(α−1).
Γ(1) = 1. Γ(2) = 1. Γ(3) = 2. Γ(4) = 6. Γ(5) = 24. Γ(6) = 120. Γ(7) = 720. Γ(8) = 5040. One does not need to know how to compute the complete Gamma Function for noninteger alpha. Many computer programs will give values of the complete Gamma Function.
Γ(1/2) =
π
Γ(3/2) = 0.5 π
For α ≥ 10: lnΓ(α) ≅ (α - 0.5) lnα - α + +
Γ(-1/2) = -2 π
Γ(-3/2) = (4/3) π .
ln(2 π ) 1 1 1 1 + + 2 12 α 360α 3 1260α 5 1680α 7
691 3617 1 1 + . 143 9 11 13 15 360,360 α 122,400 α 1188α 156 α
For α < 10 use the recursion relationship Γ(α) = (α−1) Γ(α−1). The Gamma function is undefined at the negative integers and zero. For large α: Γ(α) ≅ e-α α α−1/2
2 π , which is Sterlingʼs formula.144
The ratios of two Gamma functions with arguments that differ by an integer can be computed in terms of a product of factors, just as one would with a ratio of factorials.
142
See Appendix A of Loss Models. Also see the Handbook of Mathematical Functions, by M. Abramowitz, et. al. See Appendix A of Loss Models, and the Handbook of Mathematical Functions, by M. Abramowitz, et. al. 144 See the Handbook of Mathematical Functions, by M. Abramowitz, et. al. 143
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 336
Exercise: What is Γ(7) / Γ(4)? [Solution: Γ(7) / Γ(4) = 6! / 3! = (6)(5)(4) = 120.] Exercise: What is Γ(7.2) / Γ(4.2)? [Solution: Γ(7.2) / Γ(4.2) = 6.2! / 3.2! = (6.2)(5.2)(4.2) = 135.4.] Note that even when the arguments are not integer, the ratio still involves a product of factors. The solution of the last exercise depended on the fact that 7.2 - 4.2 = 3 is an integer. Integrals involving e−x and powers of x can be written in terms of the Gamma function: ∞
∫0
tα - 1e - t / θ
dt =
Γ(α) θα, or for integer n:
∞
∫ tn e- c t
dt = n! / cn+1.
0
Exercise: What is the integral from 0 to ∞ of: t3 e-t/10? [Solution: With α = 4 and θ = 10, this integral is: Γ(4) 104 = (6)(10000) = 60,000.] This formula for “gamma-type” integrals is very useful for working with anything involving the Gamma distribution, for example the Gamma-Poisson process. It follows from the definition of the Gamma function and a change of variables. The Gamma density in the Appendix of Loss Models is: θ−α xα−1 e−x/θ / Γ(α). Since this probability density function must integrate to unity, the above formula for gamma-type integrals follows. This is a useful way to remember this formula on the exam. Incomplete Gamma Function: As shown in Appendix A of Loss Models, the Incomplete Gamma Function is defined as: Γ(α ; x) =
x
∫ tα - 1 e- t
dt / Γ(α).
0
Γ(α ; 0) = 0. Γ(α ; ∞) = Γ(α)/Γ(α) = 1. As discussed below, the Incomplete Gamma Function with the introduction of a scale parameter θ is the Gamma Distribution.
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 337
Exercise: Via integration by parts, put Γ(2 ; x) in terms of Exponentials and powers of x. [Solution: Γ(2 ; x) =
x
∫t
e- t
0
dt / Γ(2) =
x
∫t
e- t
dt =
e-t
0
-
t =x t e t]
= 1 - e-x - xe-x.]
t =0
One can prove via integration by parts that Γ(α ; x) = Γ(α-1 ; x) - xα-1 e-x / Γ(α).145 This recursion formula for integer alpha is: Γ(n ; x) = Γ(n-1 ; x) - xn-1 e-x /(n-1)!. Combined with the fact that Γ(1 ; x) = ∫ e−t dt = 1 - e-x, this leads to the following formula for the Incomplete Gamma for positive integer alpha:146
Γ(n ; x) = 1 -
n−1 i x
∑
i =0
e- x . i!
Integrals Involving Exponentials times Powers: One can use the incomplete Gamma Function to handle integrals involving te-t/θ. x
∫
t e- t / θ dt =
0
x
∫ t e - t / θ dt
x/θ
∫
θs e-s θds = θ2∫ se-s ds = θ2Γ(2 ; x/θ)Γ(2) = θ2{1 - e-x/θ - (x/θ)e-x/θ}.
0
= θ2 {1 - e-x/θ - (x/θ)e-x/θ }.
0
Exercise: What is the integral from 0 to 3.4 of: te-t/10? [Solution: (102 ) {1 - e-3.4/10 - (3.4/10)e-3.4/10} = 4.62.] Such integrals can also be done via integration by parts, or as discussed below using the formula for the present value of a continuously increasing annuity, or one can make use of the formula for the Limited Expected Value of an Exponential Distribution:147 145
See for example, Formula 6.5.13 in the Handbook of Mathematical Functions, by Abramowitz, et. al. See Theorem A.1 in Appendix A of Loss Models. One can also establish this result by computing the waiting time until the nth claim for a Poisson Process, as shown in “Mahlerʼs Guide to Stochastic Processes,” on another exam. 147 See Appendix A of Loss Models. 146
2013-4-1, Frequency Distributions, §18 Gamma Function x
∫
x
t e- t / θ dt = θ ∫ t e- t / θ / θ dt = θ{E[X
0
HCM 10/4/12,
Page 338
∧ x] - xS(x)} =
0
θ{θ(1 - e-x/θ) - xe-x/θ} = θ2{1 - e-x/θ - (x/θ)e-x/θ}. When the upper limit is infinity, the integral simplifies: ∞
∫ t e - t dt = θ2.
148
0
In a similar manner, one can use the incomplete Gamma Function to handle integrals involving tn e-t/θ, for n integer: x
∫
tn
e- t/θ
dt
= θn+1
0
x /θ
∫
n
sn
e -s
ds
= θn+1Γ(n+1; x/θ)Γ(n+1)
= n!
θn+1{1
-
i
∑ x ei!
-x
}.
i =0
0
Exercise: What is the integral from 0 to 3.4 of: t3 e-t/10? x
[Solution:
∫ 0
t3 e - t / θ
dt = θ4
x /θ
∫
s3 e -s d s = θ4Γ(4 ; x/θ)Γ(4) =
0
6θ4{1 - e-x/θ - (x/θ)e-x/θ - (x/θ)2 e-x/θ/2 - (x/θ)3 e-x/θ/6}. For θ = 10 and x = 3.4, this is: 60000{1 - e-0.34 - 0.34e-0.34 - 0.342 e-0.34/2 - 0.343 e-0.34/6} = 25.49.]
If one divided by θ, then the integrand would be t times the density of an Exponential Distribution. Therefore, the given integral is θ(mean of an Exponential Distribution) = θ2.
148
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 339
Continuously Increasing Annuities: The present value of a continuously increasing annuity of term n, with force of interest δ, is:149
(I a)n = (a n − ne
− nδ
)/δ
where the present value of a continuous annuity of term n, with force of interest δ, is:
a n = (1− e
-nδ
)/ δ
However, the present value of a continuously increasing annuity can also be written as the integral from 0 to n of te-tδ. Therefore, n
∫ t e− tδ dt
= {(1-e-nδ)/δ - ne-nδ}/δ = (1-e-nδ)/δ2 - ne-nδ/δ.
0
Those who remember the formula for the present value of an increasing continuous annuity will find writing such integrals involving te-tδ in terms of increasing annuities to be faster than doing integration by parts. Exercise: What is the integral from 0 to 3.4 of: te-t/10? [Solution: {(1-e-3.4/10)/0.1 - (3.4)e-3.4/10}/0.1 = (2.882 - 2.420)/0.1 = 4.62. Comment: Matches the answer gotten above using Incomplete Gamma Functions. 4.62 is the present value of a continuously increasing annuity with term 3.4 years and force of interest 10%.]
149
See for example, The Theory of Interest by Kellison.
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 340
Gamma Distribution:150 The Gamma Distribution can be defined in terms of the Incomplete Gamma Function, F(x) = Γ(α ; x/ θ). Note that Γ(α; ∞) = Γ(α) / Γ(α) = 1 and Γ(α; 0) = 0, so we have as required for a distribution function F(∞) = 1 and F(0) = 0. f(x) =
(x / θ)α e - x / θ xα− 1 e - x / θ = , x > ∞. x Γ(α ) θα Γ (a)
Exercise: What is the mean of a Gamma Distribution? ∞ ∞
[Solution:
∫
x f(x) dx =
0
∞
∫
x
0
xα-1
∫ xα e - x/ θ dx
e - x/ θ
dx = 0
θα Γ(α)
θα Γ(α)
=
Γ(α+ 1) θα + 1 Γ(α+ 1) = θ = αθ.] θ α Γ(α) Γ(α)
=
Γ(α+ n) θ α+ n Γ(α+ n) n = θ θα Γ(α ) Γ(α)
Exercise: What is the nth moment of a Gamma Distribution? [Solution: ∞ ∞
∞
∫ xn f(x) dx = ∫ xn 0
0
xα- 1
e - x/ θ
θ α Γ(α)
∫ xn + α − 1 e - x/ θ dx
dx = 0
θα Γ(α)
= (α+n-1)(α+n-2)....(α) θn . Comment: This is the formula shown in Appendix A of Loss Models.] Exercise: What is the 3rd moment of a Gamma Distribution with α = 5 and θ = 2.5? [Solution: (α+n-1)(α+n-2)....(α)θn = (5+3-1)(5+3-2)(5)(2.53 ) = 3281.25.] Since the Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with shape parameter of ν/2 and (inverse) scale parameter of 1/2 : χ2ν (x) = Γ(ν/2 ; x/2). Therefore, one can look up values of the Incomplete Gamma Function (for half integer or integer values of α) by using the cumulative values of the Chi-Square Distribution. For example, Γ(6;10) = the Chi-Square Distribution for 2 x 6 = 12 degrees of freedom at a value of 2 x 10 = 20. For the Chi-Square with 12 d.f. there is a 0.067 chance of a value greater than 20, so the value of the distribution function is: χ212 (20) = 1 - 0.067 = 0.933 = Γ(6;10) . 150
See “Mahlerʼs Guide to Loss Distributions.”
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 341
Inverse Gamma Distribution:151 By employing the change of variables y = 1/x, integrals involving e−1/x and powers of 1/x can be written in terms of the Gamma function: ∞
∫ t - (α + 1) e - θ / t
dt = Γ(α) θ−α.
0
The Inverse Gamma Distribution can be defined in terms of the Incomplete Gamma Function, F(x) = 1 - Γ[α ; (θ/x)]. θα e - θ / x The density of the Inverse Gamma is: α + 1 , for 0 < x < ∞. x Γ[α] A good way to remember the result for integrals from zero to infinity of powers of 1/x times Exponentials of 1/x, is that the density of the Inverse Gamma Distribution must integrate to unity.
151
See “Mahlerʼs Guide to Loss Distributions.” Appendix A of Loss Models.
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 342
Problems: 18.1 (1 point) What is the value of the integral from zero to infinity of: x5 e-8x? A. less than 0.0004 B. at least 0.0004 but less than 0.0005 C. at least 0.0005 but less than 0.0006 D. at least 0.0006 but less than 0.0007 E. at least 0.0007 18.2 (1 point) What is the density at x = 8 of the Gamma distribution with parameters α = 3 and θ = 10? A. less than 0.012 B. at least 0.012 but less than 0.013 C. at least 0.013 but less than 0.014 D. at least 0.014 but less than 0.015 E. at least 0.015 ∞
18.3 (1 point) Determine
∫ x- 6 e - 4 / x dx . 0
A. less than 0.02 B. at least 0.02 but less than 0.03 C. at least 0.03 but less than 0.04 D. at least 0.04 but less than 0.05 E. at least 0.05 18.4 (2 points) What is the integral from 6.3 to 8.4 of x2 e-x / 2? Hint: Use the Chi-Square table. A. less than 0.01 B. at least 0.01 but less than 0.03 C. at least 0.03 but less than 0.05 D. at least 0.05 but less than 0.07 E. at least 0.07 18.5 (2 points) What is the integral from 4 to 8 of: xe-x/5? A. 7
B. 8
C. 9
D. 10
E. 11
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 343
Use the following information for the next 3 questions: Define the following distribution function in terms of the Incomplete Gamma Function: F(x) = Γ[α ; ln(x)/θ], 1 < x. 18.6 (2 points) What is the probability density function corresponding this distribution function? θα xα e - θ / x A. Γ[α] ln[x]α -1 B. α 1+ 1/ θ Γ[α] θ x C.
θα xα + 1 e - θ / x Γ[α]
ln[x]α D. α 1+ 1/ θ Γ[α] θ x E. None of the above 18.7 (2 points) What is the mean of this distribution? A. θ/(α−1) B. θ/α C. θ(α−1) D. θα E. None of the above 18.8 (3 points) If α = 5 and θ = 1/7, what is the 3rd moment of this distribution? A. less than 12 B. at least 12 but less than 13 C. at least 13 but less than 14 D. at least 14 but less than 15 E. at least 15
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 344
Solutions to Problems: 18.1. B. Γ(5+1) / 85+1 = 5! / 86 = 0.000458. 18.2. D. θ−α xα−1 e−x /θ / Γ(α) = (10-3) 82 e-0.8 / Γ(3) = 0.0144. 18.3. B. The density of the Inverse Gamma is: θα e−θ/x /{xα+1 Γ(α)}, 0 < x < ∞. Since this density integrates to one, x−(α+1) e−θ/x integrates to θ−αΓ(α). Thus taking α = 5 and θ = 4, x-6 e-4/x integrates to: 4-5 Γ(5) = 24 / 45 = 0.0234. Comment: Alternately, one can make the change of variables y = 1/x. 18.4. C. The integrand is that of the Incomplete Gamma Function for α = 3: xα−1e-x / Γ(α) = x2 e-x /2. Thus the integral is: Γ(3; 8.4) − Γ(3; 6.3). Since the Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with shape parameter of ν/2 and scale parameter of 2 : χ26 (x) = Γ(3 ; x/2). Looking up the Chi-Square Distribution for 6 degrees of freedom, the distribution function is 99% at 16.8 and 95% at 12.6. 99% = Γ(3; 16.8/2) = Γ(3; 8.4), and 95% = Γ(3; 12.6/2) = Γ(3; 6.3). Thus Γ(3; 8.4) − Γ(3; 6.3) = 0.99 - 0.95 = 0.04. Comment: The particular integral can be done via repeated integration by parts. One gets: -e-x {(x2 /2) + x + 1}. Evaluating at the limits of 8.4 and 6.3 gives the same result. 18.5. A.
x
∫ te-t/θ dt = θ2{1 - e-x/θ - (x/θ)e-x/θ}.
Set θ = 5.
0 8
8
8
x=8
∫ te-t/5 dt = ∫ te-t/5 dt - ∫ te-t/5 dt = (52){1 - e-x/5 - (x/5)e-x/5}] = 4
0
4
x=4
(25){e-.8 + (.8)e-.8 - e-1.6 - (1.6)e-1.6} = 7.10. Comment: Can also be done using integration by parts or the increasing annuity technique.
2013-4-1, Frequency Distributions, §18 Gamma Function
HCM 10/4/12,
Page 345
18.6. B. Let y = ln(x)/θ. If y follows a Gamma Distribution with parameters α and 1, then x follows a LogGamma Distribution with parameters α and θ. If y follows a Gamma Distribution with parameters α and 1, then f(y) = yα−1 e−y / Γ(α). Then the density of x is given by: f(y)(dy/dx) = {(ln(x)/θ)α−1 exp(- ln(x)/θ) / Γ(α)} /(xθ) = θ−α{ln(x)}α−1 / {x1+1/θ Γ(α)}. Comment: This is called the LogGamma Distribution and bears the same relationship to the Gamma Distribution as the LogNormal bears to the Normal Distribution. Note that the support for the LogGamma is 1 to ∞, since when y = 0, x = exp(0θ) = 1. 18.7. E. ∞
∞
∫ xf(x)dx = ∫ θ−α{ln(x)}α−1 / {x1/θ Γ(α) } dx. 1
1
Let y = ln(x)/θ, and thus x = exp(θ y), dx = exp(θy)θdy, then the integral for the first moment is: ∞
∞
∫ θ−α{θ y}α−1 {exp(θy)θdy }/{exp(y) Γ(α)} = ∫ yα −1 exp[-y(1-θ)] dy/ Γ(α) = (1-θ)−α . 0
0
18.8. E. The formula for the nth moment is derived as follows: ∞
∞
∞
∫ xnf(x)dx = ∫ xn θ−α{ln(x)}α−1dx /{x1+1/θ Γ(α)} = ∫θ−α{ln(x)}α−1 xn-(1+1/θ)dx / Γ(α)} 1
1
1
Let y = ln(x)/θ, and thus x = exp(θ y), dx = exp(θy)θdy, then the integral for the nth moment is: ∞
∞
∫ θ−α{yθ}α−1 exp({n-(1+1/θ)}yθ){θ exp(y/θ)dy}/Γ(α) = ∫ yα−1exp[-y(1-nθ)]dy/ Γ(α) 0
0
= (1-nθ)−α , nθ < 1. Thus the 3rd moment with α = 5 and θ = 1/7 is: (1-nθ)−α = (1-3/7)-5 = 16.41. Comment: One could plug in n = 3 and the value of the parameters at any stage in the computation. I have chosen to do so at the very end.
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 346
Section 19, Gamma-Poisson Frequency Process152 The single most important specific example of mixing frequency distributions, is mixing Poisson Frequency Distributions via a Gamma Distribution. Each insured in a portfolio is assumed to have a Poisson distribution with mean λ. Across the portfolio, λ is assumed to be distributed via a Gamma Distribution. Due to the mathematical properties of the Gamma and Poisson there are some specific relationships. For example, as will be discussed, the mixed distribution is a Negative Binomial Distribution. Prior Distribution: The number of claims a particular policyholder makes in a year is assumed to be Poisson with mean λ. For example, the chance of having 6 claims is given by: λ6 e−λ / 6! Assume the λ values of the portfolio of policyholders are Gamma distributed with α = 3 and θ = 2/3, and therefore probability density function:153 f(λ) = 1.6875 λ2 e−1.5λ
λ ≥ 0.
This prior Gamma Distribution of Poisson parameters is displayed below:
0.4
0.3
0.2
0.1
1
152
2
3
4
5
6
Poisson Parameter
Section 6.3 of Loss Models. Additional aspects of the Gamma-Poisson are discussed in “Mahlerʼs Guide to Conjugate Priors.” 153 For the Gamma Distribution, f(x) = θ−αxα−1 e- x/θ/ Γ(α).
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 347
The Prior Distribution Function is given in terms of the Incomplete Gamma Function: F(λ) = Γ(3; 1.5λ). So for example, the a priori chance that the µ value lies between 4 and 5 is: F(5) - F(4) = Γ(3; 7.5) - Γ(3; 6) = 0.9797 - 0.9380 = 0.0417. Graphically, this is the area between 4 and 5 and under the prior Gamma. Mixed Distribution: If we have a risk and do not know what type it is, in order to get the chance of having 6 claims, one would weight together the chances of having 6 claims, using the a priori probabilities and integrating from zero to infinity:154 ∞
∞ 6 -λ ∞ λ 6 e- λ λ e 2 e- 1.5λ dλ = 0.00234375 f(λ) dλ = 1.6875 λ ∫ 6! ∫ 6! ∫ λ8 e- 2.5λ dλ . 0 0 0
This integral can be written in terms of the (complete) Gamma function: ∞
∫ λα − 1 e- λ / θ
dλ = Γ(α) θα.
0
∞
Thus
∫ λ8 e- 2.5λ dλ = Γ(9) 2.5-9 = (8!) (0.4)9 ≅ 10.57. 0
Thus the chance of having 6 claims ≅ (0.00234375) (10.57) ≅ 2.5%. More generally, if the distribution of Poisson parameters λ is given by a Gamma distribution f(λ) = θ−αλ α−1 e− λ/θ/ Γ(α), and we compute the chance of having n accidents by integrating from zero to infinity: ∞
∞ 6 -λ ∞ λ n e- λ λ e λ α − 1 e- λ / θ 1 f(λ) dλ = dλ = λ n + α − 1 e- λ (1 + 1 / θ) dλ = ∫ n! ∫ 6! ∫ α α θ Γ(α) n! θ Γ(α) 0 0 0
θn θn 1 Γ(n + α) Γ(n+ α) α(α + 1)...(α + n -1) = = . Γ(α) n! θn + α (1 + 1/ θ)n + α (1 + θ)n + α n! θα Γ(α) (1 + 1/ θ)n + α n! The mixed distribution is in the form of the Negative Binomial distribution with parameters r = α and β = θ: Probability of n accidents =
βx r(r +1)...(r + x - 1) . (1+ β) x + r x!
Note the way both the Gamma and the Poisson have factors involving powers of λ and e−λ and these similar factors combine in the product. 154
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 348
For the specific case dealt with previously: n = 6, α = 3 and θ = 2/3. Therefore, the mixed Negative Binomial Distribution has parameters r = α = 3 and β = θ = 2/3. Thus the chance of having 6 claims is:
(2 / 3)6 (3)(4)(5)(6)(7)(8) = 2.477%. (1 + 2 / 3)6 + 3 6!
This is the same result as calculated above. This mixed Negative Binomial Distribution is displayed below, through 10 claims: 0.25 0.2 0.15 0.1 0.05
0
1
2
3
4
5
6
7
8
9
10
On the exam, one should not go through the calculation above. Rather remember that the mixed distribution is a Negative Binomial.
When Poissons are mixed via a Gamma Distribution, the mixed distribution is always a Negative Binomial Distribution, with r = α = shape parameter of the Gamma and β = θ = scale parameter of the Gamma. r goes with alpha, beta rhymes with theta.
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 349
Note that the overall (a priori) mean can be computed in either one of two ways. First one can weight together the means for each type of risk, using the a priori probabilities. This is E[λ] = the mean of the prior Gamma = αθ = 3(2/3) = 2. Alternately, one can compute the mean of the mixed distribution: the mean of a Negative Binomial is rβ = 3(2/3) = 2. Of course the two results match. Exponential-Poisson:155 It is important to note that the Exponential distribution is a special case of the Gamma distribution, for α = 1.
For the important special case α = 1, we have an Exponential distribution of λ: f(λ) = e−λ/θ/θ, λ ≥ 0. The mixed distribution is a Negative Binomial Distribution with r = 1 and β = θ. For the Exponential-Poisson, the mixed distribution is a Geometric Distribution with β = θ. Mixed Distribution for the Gamma-Poisson, When Observing Several Years of Data: One can observe for a period of time longer than a year. If an insured has a Poisson parameter of λ for each individual year, with λ the same for each year, and the years are independent, then for example one has a Poisson parameter of 7λ for 7 years. The chances of such an insured having a given number of claims over 7 years is given by a Poisson with parameter 7λ. For a portfolio of insureds, each of its Poisson parameters is multiplied by 7. This is mathematically just like inflation. If before their each being multiplied by 7, the Poisson parameters follow a Gamma distribution with parameter α and θ, then after being multiplied by 7 they follow a Gamma with parameters α and 7θ.156 Thus the mixed distribution for 7 years of data is given by a Negative Binomial with parameters r = α and β = 7θ.
155
See for example 3/11/01, Q.27. Under uniform inflation, the scale parameter of the Gamma Distribution is multiplied by the inflation factor. See “Mahlerʼs Guide to Loss Distributions.” 156
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 350
In general, if one observes a Gamma-Poisson situation for Y years, and each insuredʼs Poisson parameter does not change over time, then the distribution of Poisson parameters for Y years is given by a Gamma Distribution with parameters α and Yθ, and the mixed distribution for Y years of data is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ.157 Exercise: Assume that the number of claims in a year for each insured has a Poisson Distribution with mean λ. The distribution of λ over the portfolio of insureds is a Gamma Distribution with parameters α = 3 and θ = 0.01. What is the mean annual claim frequency for the portfolio of insureds? [Solution: The mean annual claims frequency = mean of the (prior) Gamma = αθ = (3)(0.01) = 3%.] Exercise: Assume that the number of claims in a year for each insured has a Poisson Distribution with mean λ. For each insured, λ does not change over time. For each insured, the numbers of claims in one year is independent of the number of claims in another year. The distribution of λ over the portfolio of insureds is a Gamma Distribution with parameters α = 3 and θ = 0.01. An insured is picked at random and observed for 9 years. What is the chance of observing exactly 4 claims from this insured? [Solution: The mixed distribution for 9 years of data is given by a Negative Binomial Distribution with parameters r = α = 3 and β = Yθ = (9)(0.01) = 0.09. f(4) =
(4 + 3 -1)! 4! 2!
0.094 = 0.054%.] (1 + 0.09)3 + 4
If Lois has a low expected annual claim frequency, for example 2%, then over 9 years she has a Poisson Distribution with mean 18%. Her chance of having 4 claims during these nine years is: 0.184 e-0.18/ 24 = 0.004%. If Hi has a very high expected annual claim frequency, for example 20%, then over 9 years he has a Poisson Distribution with mean 180%. His chance of having 4 claims during these nine years is: 1.84 e-1.8/ 24 = 7.23%. Drivers such as Lois with a low λ in one year are assumed to have the same low λ every year. Such good drivers have an extremely small chance of having four claims in 9 years. 157
“Each insuredʼs Poisson parameter does not change over time.” If Alanʼs lambda is 4% this year, it is 4% next year, and every year. Similarly, if Bonnieʼs lambda is 3% this year , then it is 3% every year. Unless stated otherwise, on the exam assume lambda does not vary over time.
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 351
Drivers such as Hi with a very high λ in one year are assumed to have the same high λ every year. Such drivers have a significant chance of having four claims in 9 years. It is such very bad drivers which contribute significantly to the 0.054% probability of four claims in 9 years for an insured picked at random. This situation in which for a given insured λ is the same over time, contrasts with that in which λ changes randomly each year. Exercise: Assume that the number of claims in a year for each insured has a Poisson Distribution with mean λ. For each insured, λ changes each year at random; the λ in one year is independent of the λ in another year. The distribution of λ is a Gamma Distribution with parameters α = 3 and θ = 0.01. An insured is picked at random and observed for 9 years. What is the chance of observing exactly 4 claims from this insured? [Solution: The mixed distribution for 1 year of data is given by a Negative Binomial Distribution with parameters r = α = 3 and β = θ = 0.01. Over 9 years, we get a sum of 9 independent Negative Binomials, with r = (9)(3) = 27 and β = 0.01. f(4) =
(4 + 27 -1)! 4! 26!
0.014 = 0.00020.] (1 + 0.01)27 + 4
This is different than the Gamma-Poisson process in which we assume that the lambda for an individual insured is the same each year. For the Gamma-Poisson the β parameter is multiplied by Y, while here the r parameter is multiplied by Y. This situation in which instead λ changes each year is mathematically the same as if we assume an insured each year has a Negative Binomial Distribution. For example, assume an insured has a Negative Binomial with parameters r and β. Assume the numbers of claims in one year is independent of the number of claims in another year. Then over Y years, we add up Y independent identically distributed Negative Binomials; over Y years, the frequency distribution for this insured is Negative Binomial with parameters Yr and β. Exercise: Assume that the number of claims in a year for an insured has a Negative Binomial Distribution with parameters r = 3 and β = 0.01. What is the mean annual claim frequency? [Solution: rβ = (3)(0.01) = 3%.]
2013-4-1, Frequency Distributions, §19 Gamma-Poisson
HCM 10/4/12,
Page 352
Exercise: Assume that the number of claims in a year for an insured has a Negative Binomial Distribution with parameters r = 3 and β = 0.01. The numbers of claims in one year is independent of the number of claims in another year. What is the chance of observing exactly 4 claims over 9 years from this insured? [Solution: Over 9 years, the frequency distribution for this insured is Negative Binomial with parameters r = (9)(3) = 27 and β = 0.01. f(4) =
(4 + 27 -1)! 4! 26!
0.014 = 0.00020.] (1 + 0.01)27 + 4
Even though both situations had a 3% mean annual claim frequency, the probability of observing 4 claims over 9 years was higher in the Gamma-Poisson situation with λ the same each year for a given insured, than when we assumed λ changed each year or equivalently an insured had the same Negative Binomial Distribution each year. In the Gamma-Poisson situation with λ the same each year for a given insured, we were more likely to see extreme results such as 4 claims in 9 years, since there is a small probability of picking at random an insured with a high expected annual claim frequency, such as Hi with λ = 20%. Thinning a Negative Binomial Distribution: Since the Gamma-Poisson is one source of the Negative Binomial Distribution, it can be used to aid our understanding of the Negative Binomial Distribution. For example, assume we have a Negative Binomial Distribution with r = 4 and β = 2. We can think of that as resulting from a mixture of Poisson Distributions, with λ distributed via a Gamma Distribution with α = 4 and θ = 2.158 Assume frequency and severity are independent, and that 30% of losses are “large.” Then for each insured, his large losses are Poisson with mean .3λ. If λ is distributed via a Gamma with α = 4 and θ = 2, then 0.3λ is distributed via a Gamma with α = 4 and θ = (0.3)(2) = 0.6.159 The large losses are a Gamma-Poisson Process, and therefore, across the whole portfolio, the distribution of large losses is Negative Binomial, with r = 4 and β = 0.6.
158
While this may not be real world situation that the Negative Binomial is modeling, since the results are mathematically identical, we can assume it is for the purpose of deriving general mathematical results. 159 When a variable is Gamma Distributed, then a constant times that variable is also Gamma Distributed, with the same shape parameter, but with the scale parameter multiplied by that constant. See the discussion of uniform inflation in ”Mahlerʼs Guide to Loss Distributions.”
In this manner one can show, as has been discussed previously, that if losses are Negative Binomial with parameters r and β, and we take a fraction t of all the losses in a manner independent of frequency, then these selected losses are Negative Binomial with parameters r and tβ.160
Returning to the example, the small losses for an individual insured are Poisson with mean 0.7λ. Since λ is Gamma distributed, 0.7λ is distributed via a Gamma with α = 4 and θ = (0.7)(2) = 1.4. Therefore, across the whole portfolio, the distribution of small losses is Negative Binomial, with r = 4 and β = 1.4. Thus, as in the Poisson situation, the overall process has been thinned into two similar processes. However, unlike the Poisson case, these two Negative Binomials are not independent. If, for example, we observe a lot of large losses, such as 5, it is more likely that the observation came from an insured with a large λ. This implies we are more likely to also have observed a higher than average number of small losses. The number of large losses and the number of small losses are positively correlated.161
Correlation of Number of Small and Large Losses, Negative Binomial:
Assume the number of losses follows a Negative Binomial Distribution with parameters r and β, and that a fraction t of all the losses are “large.” As previously, assume each insured is Poisson with mean λ, and λ is distributed via a Gamma with α = r and θ = β. Then the number of large losses is a Gamma-Poisson with α = r and θ = tβ. Posterior to observing L large losses, the distribution of the mean frequency for large losses is Gamma with α = r + L and 1/θ = 1/(tβ) + 1 ⇒ θ = tβ/(1 + tβ).162 Since the mean frequency of large losses is t times the mean frequency, posterior to observing L large losses, the distribution of the mean frequency is Gamma with α = r + L and θ = β/(1 + tβ). Therefore, given we have observed L large losses, the small losses are Gamma-Poisson with α = r + L and θ = (1-t)β/(1 + tβ).
160 This can be derived via probability generating functions. See Example 8.8 in Loss Models.
161 In the case of thinning a Binomial, the number of large and small losses would be negatively correlated.
162 See “Mahlerʼs Guide to Conjugate Priors.”
One computes the correlation between the number of small losses, S, and the number of large losses, L, as follows:
E[LS] = EL[E[LS | L]] = EL[L E[S | L]] = EL[L (r + L) (1-t)β / (1 + tβ)]
= {(1-t)β / (1 + tβ)} {r EL[L] + EL[L^2]} = {(1-t)β / (1 + tβ)} {r(rtβ) + rtβ(1 + tβ) + (rtβ)^2} = (1-t)tβ^2 r(1+r).163
Cov[L, S] = E[LS] - E[L]E[S] = (1-t)tβ^2 r(1+r) - (rtβ)(r(1-t)β) = β^2 r t(1-t).
Corr[L, S] = β^2 r t(1-t) / √[rtβ(1+tβ) r(1-t)β{1 + (1-t)β}] = 1 / √[{1 + 1/(tβ)} {1 + 1/((1-t)β)}] > 0.
For example, assume we have a Negative Binomial Distribution with r = 4 and β = 2. Assume frequency and severity are independent, and that 30% of losses are “large.” Then the number of large losses is Negative Binomial with r = 4 and β = 0.6, and the number of small losses is Negative Binomial with r = 4 and β = 1.4. The correlation of the number of large and small losses is:
1 / √[{1 + 1/(tβ)} {1 + 1/((1-t)β)}] = 1 / √[(1 + 1/0.6)(1 + 1/1.4)] = 0.468.
163 Large losses are Negative Binomial with parameters r and tβ. Thus, EL[L^2] = Var[L] + E[L]^2 = rtβ(1 + tβ) + (rtβ)^2.
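The closed-form correlation can also be checked against simulation. A minimal sketch, assuming Python with numpy (seed and sample size are arbitrary), using the example above with r = 4, β = 2, and t = 0.3:

    import numpy as np

    rng = np.random.default_rng(2)
    r, beta, t, trials = 4.0, 2.0, 0.3, 500_000

    lam = rng.gamma(r, beta, trials)           # Gamma with alpha = r and theta = beta
    n = rng.poisson(lam)                       # total losses
    large = rng.binomial(n, t)                 # large losses
    small = n - large                          # small losses

    simulated = np.corrcoef(large, small)[0, 1]
    formula = 1 / np.sqrt((1 + 1/(t*beta)) * (1 + 1/((1 - t)*beta)))
    print(simulated, formula)                  # both should be near 0.468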
Problems: Use the following information to answer the next 2 questions: The number of claims a particular insured makes in a year is Poisson with mean λ. λ for a particular insured remains the same each year. The values of the Poisson parameter λ (for annual claim frequency) for the insureds in a portfolio follow a Gamma distribution, with parameters α = 3 and θ = 1/12. 19.1 (2 points) What is the chance that an insured picked at random from the portfolio will have no claims over the next three years? A. less than 35% B. at least 35% but less than 40% C. at least 40% but less than 45% D. at least 45% but less than 50% E. at least 50% 19.2 (2 points) What is the chance that an insured picked at random from the portfolio will have one claim over the next three years? A. less than 35% B. at least 35% but less than 40% C. at least 40% but less than 45% D. at least 45% but less than 50% E. at least 50%
19.3 (2 points) The distribution of the annual number of claims for an insured chosen at random is modeled by the negative binomial distribution with mean 0.6 and variance 0.9. The number of claims for each individual insured has a Poisson distribution and the means of these Poisson distributions are gamma distributed over the population of insureds. Calculate the variance of this gamma distribution. (A) 0.20 (B) 0.25 (C) 0.30 (D) 0.35 (E) 0.40
19.4 (2 points) The number of claims a particular policyholder makes in a year has a Poisson distribution with mean µ. The µ-values for policyholders follow a gamma distribution with variance equal to 0.3. The resulting distribution of policyholders by number of claims is a Negative Binomial with parameters r and β such that the variance is equal to 0.7. What is the value of r(1+β)?
A. less than 0.90
B. at least 0.90 but less than 0.95
C. at least 0.95 but less than 1.00
D. at least 1.00 but less than 1.05
E. at least 1.05
Use the following information for the next 3 questions:
Assume that the number of claims for an individual insured is given by a Poisson distribution with mean (annual) claim frequency λ and variance λ. Also assume that the parameter λ varies for the different insureds, with λ following a Gamma distribution: g(λ) = θ^(-α) λ^(α-1) e^(-λ/θ) / Γ(α), for 0 < λ < ∞, with mean αθ and variance αθ^2.
19.5 (2 points) An insured is picked at random and observed for one year. What is the chance of observing 2 claims?
A. αθ^2 / (1+θ)^(α+2)
B. α(α+1)θ^2 / (1+θ)^(α+2)
C. α(α+1)θ^2 / {2(1+θ)^(α+2)}
D. α^2(α+1)θ^2 / {6(1+θ)^(α+2)}
E. α^2(α+1)(α+2)θ^2 / {6(1+θ)^(α+2)}
19.6 (2 points) What is the unconditional mean frequency?
A. αθ
B. (α-1)θ
C. α(α-1)θ^2
D. α(α-1)θ^2
E. α(α-1)(α+1)θ^2/2
19.7 (3 points) What is the unconditional variance?
A. αθ^2
B. αθ + αθ^2
C. αθ + α^2θ^2
D. α^2θ^2
E. α(α+1)θ
Use the following information for the next 8 questions: As he walks, Clumsy Klem loses coins at a Poisson rate. The Poisson rate, expressed in coins per minute, is constant during any one day, but varies from day to day according to a gamma distribution with mean 0.2 and variance 0.016. The denominations of coins are randomly distributed: 50% of the coins are worth 5; 30% of the coins are worth 10; and 20% of the coins are worth 25. 19.8 (2 points) Calculate the probability that Clumsy Klem loses exactly one coin during the tenth minute of todayʼs walk. (A) 0.09 (B) 0.11
(C) 0.13
(D) 0.15
(E) 0.17
19.9 (3 points) Calculate the probability that Clumsy Klem loses exactly two coins during the first 10 minutes of todayʼs walk. (A) 0.12 (B) 0.14
(C) 0.16
(D) 0.18
(E) 0.20
19.10 (4 points) Calculate the probability that the worth of the coins Clumsy Klem loses during his one-hour walk today is greater than 300. A. 1% B. 3% C. 5% D. 7% E. 9% 19.11 (2 points) Calculate the probability that the sum of the worth of the coins Clumsy Klem loses during his one-hour walks each day for the next 5 days is greater than 900. A. 1% B. 3% C. 5% D. 7% E. 9% 19.12 (2 points) During the first 10 minutes of todayʼs walk, what is the chance that Clumsy Klem loses exactly one coin of worth 5, and possibly coins of other denominations? A. 31% B. 33% C. 35% D. 37% E. 39% 19.13 (3 points) During the first 10 minutes of todayʼs walk, what is the chance that Clumsy Klem loses exactly one coin of worth 5, and no coins of other denominations? A. 11.6% B. 12.0% C. 12.4% D. 12.8% E. 13.2% 19.14 (3 points) Let A be the number of coins Clumsy Klem loses during the first minute of his walk today. Let B be the number of coins Clumsy Klem loses during the first minute of his walk tomorrow. What is the probability that A + B = 3? A. 0.2% B. 0.4% C. 0.6% D. 0.8% E. 1.0% 19.15 (3 points) Let A be the number of coins Clumsy Klem loses during the first minute of his walk today. Let B be the number of coins Clumsy Klem loses during the first minute of his walk tomorrow. Let C be the number of coins Clumsy Klem loses during the first minute of his walk the day after tomorrow. What is the probability that A + B + C = 2? A. 8% B. 10% C. 12% D. 14% E. 16%
19.16 (2 points) For an insurance portfolio the distribution of the number of claims a particular policyholder makes in a year is Poisson with mean λ. The λ-values of the policyholders follow the Gamma distribution, with parameters α = 4, and θ = 1/9. The probability that a policyholder chosen at random will experience x claims is given by which of the following?
A. {(x + 3)! / (x! 3!)} 0.9^4 0.1^x
B. {(x + 3)! / (x! 3!)} 0.1^4 0.9^x
C. {(x + 8)! / (x! 8!)} 0.75^4 0.25^x
D. {(x + 8)! / (x! 8!)} 0.25^4 0.75^x
E. None of A, B, C, or D. 19.17 (2 points) The number of claims a particular policyholder makes in a year has a Poisson distribution with mean λ. The λ-values for policyholders follow a Gamma distribution. This Gamma Distribution has a variance equal to one quarter that of the resulting Negative Binomial distribution of policyholders by number of claims. What is the value of the β parameter of this Negative Binomial Distribution? A. 1/6
B. 1/5
C. 1/4
D. 1/3
E. Can not be determined
19.18 (1 point) Use the following information: • The random variable representing the number of claims for a single policyholder follows a Poisson distribution. • For a portfolio of policyholders, the Poisson parameters follow a Gamma distribution representing the heterogeneity of risks within that portfolio. • The random variable representing the number of claims in a year of a policyholder, chosen at random, follows a Negative Binomial distribution with parameters: r = 4 and β = 3/17. Determine the variance of the Gamma distribution. (A) 0.110 (B) 0.115 (C) 0.120 (D) 0.125
(E) 0.130
19.19 (2 points) Tom will generate via simulation 100,000 values of the random variable X as follows: (i) He will generate the observed value λ from a distribution with density λe−λ/1.4 /1.96. (ii) He then generates x from the Poisson distribution with mean λ. (iii) He repeats the process 99,999 more times: first generating a value λ, then generating x from the Poisson distribution with mean λ. Calculate the expected number of Tomʼs 100,000 simulated values of X that are 6. (A) 4200 (B) 4400 (C) 4600 (D) 4800 (E) 5000 19.20 (2 points) In the previous question, let V = the variance of a single simulated set of 100,000 values. What is the expected value of V? A. 0 B. 2.8 C. 3.92 D. 5.6 E. 6.72 19.21 (2 points) Dick will generate via simulation 100,000 values of the random variable X as follows: (i) He will generate the observed value λ from a distribution with density λ e−λ/1.4 /1.96. (ii) He will then generate 100,000 independent values from the Poisson distribution with mean λ. Calculate the expected number of Dickʼs 100,000 simulated values of X that are 6. (A) 4200 (B) 4400 (C) 4600 (D) 4800 (E) 5000 19.22 (2 points) In the previous question, let V = the variance of a single simulated set of 100,000 values. What is the expected value of V? A. 0 B. 2.8 C. 3.92 D. 5.6 E. 6.72 19.23 (1 point) Harry will generate via simulation 100,000 values of the random variable X as follows: (i) He will generate the observed value λ from a distribution with density λ e−λ/1.4 /1.96. (ii) He then generates x from the Poisson distribution with mean λ. (iii) He will then copy 99,999 times this value of x. Calculate the expected number of Harryʼs 100,000 simulated values of X that are 6. (A) 4200 (B) 4400 (C) 4600 (D) 4800 (E) 5000 19.24 (1 point) In the previous question, let V = the variance of a single simulated set of 100,000 values. What is the expected value of V? A. 0 B. 2.8 C. 3.92 D. 5.6 E. 6.72
Use the following information for the next 7 questions:
• The number of vehicles arriving at an amusement park per day is Poisson with mean λ.
• λ varies from day to day via a Gamma Distribution with α = 40 and θ = 10.
• The value of λ on one day is independent of the value of λ on another day.
• The number of people leaving each vehicle is: 1 + a Negative Binomial Distribution with r = 1.6 and β = 6.
• The amount of money spent at the amusement park by each person is LogNormal with µ = 5 and σ = 0.8.
19.25 (1 point) What is the variance of the number of vehicles that will show up tomorrow at the amusement park? A. 4,000 B. 4,400 C. 4,800 D. 5,200 E. 5,600 19.26 (1 point) What is the variance of the number of vehicles that will show up over the next 7 days at the amusement park? A. 25,000 B. 27,000 C. 29,000 D. 31,000 E. 33,000 19.27 (2 points) What is the variance of the number of people that will show up tomorrow at the amusement park? A. 480,000 B. 490,000 C. 500,000 D. 510,000 E. 520,000 19.28 (1 point) What is the variance of the number of people that will show up over the next 7 days at the amusement park? A. 2.8 million B. 3.0 million C. 3.2 million D. 3.4 million E. 3.6 million 19.29 (3 points) What is the standard deviation of the money spent tomorrow at the amusement park? A. 150,000 B. 160,000 C. 170,000 D. 180,000 E. 190,000 19.30 (1 point) What is the standard deviation of the money spent over the next 7 days at the amusement park? A. 360,000 B. 370,000 C. 380,000 D. 390,000 E. 400,000 19.31 (2 points) You simulate the amount of the money spent over the next 7 days at the amusement park. You run this simulation a total of 1000 times. How many runs do you expect in which less than 5 million is spent? A. 1 B. 2 C. 3 D. 4 E. 5
Use the following information for the next 6 questions:
• For each individual driver, the number of accidents in a year follows a Poisson Distribution.
• For each individual driver, the mean of their Poisson Distribution, λ, is the same each year.
• For each individual driver, the number of accidents each year is independent of other years.
• The numbers of accidents for different drivers are independent.
• λ varies between drivers via a Gamma Distribution with mean 0.08 and variance 0.0032.
• Moe, Larry, and Curly are each drivers.
19.32 (2 points) What is the probability that Moe has exactly one accident next year?
A. 6.9% B. 7.1% C. 7.3% D. 7.5% E. 7.7%
19.33 (2 points) What is the probability that Larry has exactly 2 accidents over the next 3 years?
A. 2.25% B. 2.50% C. 2.75% D. 3.00% E. 3.25%
19.34 (2 points) What is the probability that Moe, Larry, and Curly have a total of exactly 2 accidents during the next year?
A. 2.25% B. 2.50% C. 2.75% D. 3.00% E. 3.25%
19.35 (2 points) What is the probability that Moe, Larry, and Curly have a total of exactly 3 accidents during the next four years?
A. 5.2% B. 5.4% C. 5.6% D. 5.8% E. 6.0%
19.36 (3 points) What is the probability that Moe has no accidents next year, Larry has exactly one accident over the next two years, and Curly has exactly two accidents over the next three years?
A. 0.3% B. 0.4% C. 0.5% D. 0.6% E. 0.7%
19.37 (9 points) Let M = the number of accidents Moe has next year. Let L = the number of accidents Larry has over the next two years. Let C = the number of accidents Curly has over the next three years. Determine the probability that M + L + C = 3.
A. 0.9% B. 1.1% C. 1.3% D. 1.5% E. 1.7%
Use the following information to answer the next 3 questions: The number of claims a particular policyholder makes in a year is Poisson. The values of the Poisson parameter (for annual claim frequency) for the individual policyholders in a portfolio of 10,000 follow a Gamma distribution, with parameters α = 4 and θ = 0.1. You observe this portfolio for one year and divide it into three groups based on how many claims you observe for each policyholder: Group A: Those with no claims. Group B: Those with one claim. Group C: Those with two or more claims. 19.38 (1 point) What is the expected size of Group A? (A) 6200 (B) 6400 (C) 6600 (D) 6800 (E) 7000 19.39 (1 point) What is the expected size of Group B? (A) 2400 (B) 2500 (C) 2600 (D) 2700 (E) 2800 19.40 (1 point) What is the expected size of Group C? (A) 630 (B) 650 (C) 670 (D) 690 (E) 710
19.41 (3 points) The claims from a particular insured in a time period t are Poisson with mean λt. The values of λ for the individual insureds in a portfolio follow a Gamma distribution, with parameters α = 3 and θ = 0.02. For an insured picked at random what is the average wait until the first claim? A. 17 B. 19 C. 21 D. 23 E. 25 19.42 (2 points) Use the following information:
• Frequency for an individual is a 80-20 mixture of two Poissons with means λ and 3λ. • The distribution of λ is Exponential with a mean of 0.1. For an insured picked at random, what is the probability of seeing two claims? A. 1.2% B. 1.3% C. 1.4% D. 1.5% E. 1.6% 19.43 (2 points) Claim frequency follows a Poisson distribution with parameter λ. λ is distributed according to: g(λ) = 25 λ e-5λ. Determine the probability that there will be at least 2 claims during the next year. A. 5% B. 7% C. 9% D. 11% E. 13%
Use the following information for the next two questions: • 60% of claims are small.
• 40% of claims are large. • The annual number of claims from a particular insured is Poisson with mean λ. • λ is distributed across a group of insureds via a Gamma with α = 2 and θ = 0.5. • You pick an insured at random and observe for one year. 19.44 (2 points) What is the variance of the number of small claims? A. 0.78 B. 0.80 C. 0.82 D. 0.84 E. 0.86 19.45 (2 points) What is the variance of the number of large claims? A. 0.40 B. 0.42 C. 0.44 D. 0.46 E. 0.48
19.46 (4B, 11/96, Q.15) (2 points) You are given the following:
• The number of claims for a single policyholder follows a Poisson distribution with mean λ.
• λ follows a gamma distribution.
• The number of claims for a policyholder chosen at random follows a distribution with mean 0.10 and variance 0.15.
Determine the variance of the gamma distribution.
A. 0.05 B. 0.10 C. 0.15 D. 0.25 E. 0.30
19.47 (4B, 11/96, Q.26) (2 points) You are given the following:
• The probability that a single insured will produce 0 claims during the next exposure period is e^(-λ).
• λ varies by insured and follows a distribution with density function f(λ) = 36λe^(-6λ), 0 < λ < ∞.
Determine the probability that a randomly selected insured will produce 0 claims during the next exposure period.
A. Less than 0.72
B. At least 0.72, but less than 0.77
C. At least 0.77, but less than 0.82
D. At least 0.82, but less than 0.87
E. At least 0.87
19.48 (Course 3 Sample Exam, Q.12) The annual number of accidents for an individual driver has a Poisson distribution with mean λ. The Poisson means, λ, of a heterogeneous population of drivers have a gamma distribution with mean 0.1 and variance 0.01. Calculate the probability that a driver selected at random from the population will have 2 or more accidents in one year. A. 1/121 B. 1/110 C. 1/100 D. 1/90 E. 1/81 19.49 (3, 5/00, Q.4) (2.5 points) You are given: (i) The claim count N has a Poisson distribution with mean Λ . (ii) Λ has a gamma distribution with mean 1 and variance 2. Calculate the probability that N = 1. (A) 0.19 (B) 0.24 (C) 0.31
(D) 0.34
(E) 0.37
19.50 (3, 5/01, Q.3 & 2009 Sample Q.104) (2.5 points) Glen is practicing his simulation skills. He generates 1000 values of the random variable X as follows: (i) He generates the observed value λ from the gamma distribution with α = 2 and θ =1 (hence with mean 2 and variance 2). (ii) He then generates x from the Poisson distribution with mean λ. (iii) He repeats the process 999 more times: first generating a value λ, then generating x from the Poisson distribution with mean λ. (iv) The repetitions are mutually independent. Calculate the expected number of times that his simulated value of X is 3. (A) 75 (B) 100 (C) 125 (D) 150 (E) 175 19.51 (3, 5/01, Q.15 & 2009 Sample Q.105) (2.5 points) An actuary for an automobile insurance company determines that the distribution of the annual number of claims for an insured chosen at random is modeled by the negative binomial distribution with mean 0.2 and variance 0.4. The number of claims for each individual insured has a Poisson distribution and the means of these Poisson distributions are gamma distributed over the population of insureds. Calculate the variance of this gamma distribution. (A) 0.20 (B) 0.25 (C) 0.30 (D) 0.35 (E) 0.40
19.52 (3, 11/01, Q.27) (2.5 points) On his walk to work, Lucky Tom finds coins on the ground at a Poisson rate. The Poisson rate, expressed in coins per minute, is constant during any one day, but varies from day to day according to a gamma distribution with mean 2 and variance 4. Calculate the probability that Lucky Tom finds exactly one coin during the sixth minute of todayʼs walk. (A) 0.22 (B) 0.24 (C) 0.26 (D) 0.28 (E) 0.30 19.53 (2 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin during the first two minutes of todayʼs walk. (A) 0.12 (B) 0.14 (C) 0.16 (D) 0.18 (E) 0.20 19.54 (3 points) In 3, 11/01, Q.27, let A = the number of coins that Lucky Tom finds during the first minute of todayʼs walk. Let B = the number of coins that Lucky Tom finds during the first minute of tomorrowʼs walk. Calculate Prob[A + B = 1]. (A) 0.09 (B) 0.11 (C) 0.13 (D) 0.15 (E) 0.17 19.55 (3 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin during the third minute of todayʼs walk and exactly one coin during the fifth minute of todayʼs walk. A. Less than 4.5% B. At least 4.5%, but less than 5.0% C. At least 5.0%, but less than 5.5% D. At least 5.5%, but less than 6.0% E. At least 6.0% 19.56 (3 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin during the first minute of todayʼs walk, exactly two coins during the second minute of todayʼs walk, and exactly three coins during the third minute of todayʼs walk. A. Less than 0.2% B. At least 0.2%, but less than 0.3% C. At least 0.3%, but less than 0.4% D. At least 0.4%, but less than 0.5% E. At least 0.5% 19.57 (2 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin during the first minute of todayʼs walk and exactly one coin during the fifth minute of tomorrowʼs walk. (A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09 19.58 (2 points) In 3, 11/01, Q.27, calculate the probability that Lucky Tom finds exactly one coin during the first three minutes of todayʼs walk and exactly one coin during the first three minutes of tomorrowʼs walk. (E) 0.025 (A) 0.005 (B) 0.010 (C) 0.015 (D) 0.020
19.59 (3, 11/02, Q.5 & 2009 Sample Q.90) (2.5 points) Actuaries have modeled auto windshield claim frequencies. They have concluded that the number of windshield claims filed per year per driver follows the Poisson distribution with parameter λ, where λ follows the gamma distribution with mean 3 and variance 3. Calculate the probability that a driver selected at random will file no more than 1 windshield claim next year.
(A) 0.15 (B) 0.19 (C) 0.20 (D) 0.24 (E) 0.31
19.60 (CAS3, 11/03, Q.15) (2.5 points) Two actuaries are simulating the number of automobile claims for a book of business. For the population that they are studying:
i) The claim frequency for each individual driver has a Poisson distribution.
ii) The means of the Poisson distributions are distributed as a random variable, Λ.
iii) Λ has a gamma distribution.
In the first actuary's simulation, a driver is selected and one year's experience is generated. This process of selecting a driver and simulating one year is repeated N times. In the second actuary's simulation, a driver is selected and N years of experience are generated for that driver.
Which of the following is/are true?
I. The ratio of the number of claims the first actuary simulates to the number of claims the second actuary simulates should tend towards 1 as N tends to infinity.
II. The ratio of the number of claims the first actuary simulates to the number of claims the second actuary simulates will equal 1, provided that the same uniform random numbers are used.
III. When the variances of the two sequences of claim counts are compared, the first actuary's sequence will have a smaller variance because more random numbers are used in computing it.
A. I only B. I and II only C. I and III only D. II and III only E. None of I, II, or III is true
19.61 (CAS3, 5/05, Q.10) (2.5 points) Low Risk Insurance Company provides liability coverage to a population of 1,000 private passenger automobile drivers. The number of claims during a given year from this population is Poisson distributed. If a driver is selected at random from this population, his expected number of claims per year is a random variable with a Gamma distribution such that α = 2 and θ = 1. Calculate the probability that a driver selected at random will not have a claim during the year. A. 11.1% B. 13.5% C. 25.0% D. 33.3% E. 50.0% 19.62 (2 points) In CAS3, 5/05, Q.10, what is the probability that at most 265 of these 1000 drivers will not have a claim during the year? A. 75% B. 78% C. 81% D. 84% E. 87% 19.63 (2 points) In CAS3, 5/05, Q.10, what is the probability that these 1000 drivers will have a total of more than 2020 claims during the year? A. 31% B. 33% C. 35% D. 37% E. 39% 19.64 (4 points) In CAS3, 5/05, Q.10, let A be the number of these 1000 drivers that have one claim during the year and B be the number of these 1000 drivers that have two claims during the year. Determine the correlation of A and B. A. -0.32 B. -0.30 C. -0.28 D. -0.26 E. -0.24
Solutions to Problems:
19.1. E. The Poisson parameters over three years are three times those on an annual basis. Therefore they are given by a Gamma distribution with α = 3 and θ = 3/12 = 1/4. (The mean frequency is now 3/4 per three years rather than 3/12 = 1/4 on an annual basis. It might be helpful to recall that θ is the scale parameter for the Gamma Distribution.) The mixed distribution is a Negative Binomial, with parameters r = α = 3 and β = θ = 1/4. f(0) = 1/(1+β)^r = 1/1.25^3 = 0.512.
Comment: Over one year, the mixed distribution is Negative Binomial, with parameters r = α = 3 and β = θ = 1/12. Thus for a driver picked at random, the probability of no claims next year is: 1/(1 + 1/12)^3 = 0.7865. One might then be tempted to think that the probability of no claims over the next three years for a driver picked at random is: 0.7865^3 = 0.4865. However, drivers with a low λ in one year are assumed to have the same low λ every year. Such good drivers have a large chance of having 0 claims in 3 years. Drivers with a high λ in one year are assumed to have the same high λ every year. Such drivers have a smaller chance of having 0 claims in 3 years. As discussed in “Mahlerʼs Guide to Conjugate Priors,” a driver who has no claims the first year has a posterior distribution of lambda that is Gamma, but with α = 3 + 0 = 3, and 1/θ = 12 + 1 = 13. Therefore for a driver with no claims in year one, the mixed distribution in year two is Negative Binomial with parameters r = α = 3 and β = θ = 1/13. Thus for a driver with no claims in year one, the probability of no claims in year two is: 1/(1 + 1/13)^3 = 0.8007. A driver who has no claims in the first two years has a posterior distribution of lambda that is Gamma, but with α = 3 + 0 = 3, and 1/θ = 12 + 2 = 14. Therefore for a driver with no claims in the first two years, the mixed distribution in year three is Negative Binomial with parameters r = α = 3 and β = θ = 1/14. Thus for a driver with no claims in the first two years, the probability of no claims in year three is: 1/(1 + 1/14)^3 = 0.8130.
Prob[0 claims in three years] = (0.7865)(0.8007)(0.8130) = 0.512 ≠ 0.4865.
19.2. A. From the previous solution, f(1) = rβ/(1+β)^(r+1) = (3)(1/4)/1.25^4 = 0.3072.
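A quick numerical check of 19.1 and 19.2, as a sketch assuming Python with SciPy (scipy.stats.nbinom takes n = r and p = 1/(1+β)):

    from scipy.stats import nbinom

    alpha, theta, years = 3, 1/12, 3
    r, beta = alpha, years * theta     # over three years, theta (and so beta) is multiplied by 3
    p = 1 / (1 + beta)
    print(nbinom.pmf(0, r, p))         # 0.512
    print(nbinom.pmf(1, r, p))         # 0.3072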
19.3. C. The mean of the Negative Binomial is rβ = 0.6, while the variance is rβ(1+β) = 0.9. Therefore, 1 + β = 0.9/0.6 = 1.5, and β = 0.5. Therefore r = 1.2. For a Gamma-Poisson, α = r = 1.2 and θ = β = 0.5. Therefore, the variance of the Gamma Distribution is: αθ^2 = (1.2)(0.5^2) = 0.3.
Comment: Similar to 3, 5/01, Q.15.
19.4. B. For the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to: mean of the Gamma + variance of the Gamma. Thus mean of Gamma + 0.3 = 0.7. Therefore, mean of Gamma = 0.4 = αθ. Variance of Gamma = 0.3 = αθ^2. Therefore, θ = 0.3/0.4 = 3/4. α = 0.4/θ = 0.5333. r = α = 0.5333 and β = θ = 3/4. r(1+β) = (0.5333)(7/4) = 0.933.
19.5. C. The conditional chance of 2 claims given λ is e^(-λ) λ^2 / 2. The unconditional chance can be obtained by integrating the conditional chances versus the distribution of λ:
f(2) = ∫₀^∞ f(2 | λ) g(λ) dλ = ∫₀^∞ e^(-λ) λ^2 θ^(-α) λ^(α-1) e^(-λ/θ) / {2Γ(α)} dλ = {θ^(-α)/2Γ(α)} ∫₀^∞ λ^(α+1) e^(-(1 + 1/θ)λ) dλ
= {θ^(-α)/2Γ(α)} Γ(α+2) / (1 + 1/θ)^(α+2) = α(α+1)θ^2 / {2(1+θ)^(α+2)}.
Comment: The mixed distribution is a Negative Binomial with r = α and β = θ. f(x) = {(r+x-1)! / (x! (r-1)!)} β^x / (1+β)^(x+r). f(2) = α(α+1)θ^2 / {2(1+θ)^(α+2)}.
19.6. A. The conditional mean given λ is: λ. The unconditional mean can be obtained by integrating the conditional means versus the distribution of λ:
E[X] = ∫₀^∞ E[X | λ] g(λ) dλ = ∫₀^∞ λ θ^(-α) λ^(α-1) e^(-λ/θ) / Γ(α) dλ = {θ^(-α)/Γ(α)} ∫₀^∞ λ^α e^(-λ/θ) dλ = {θ^(-α)/Γ(α)} Γ(α+1) θ^(α+1) = αθ.
Alternately, E[X] = ∫₀^∞ E[X | λ] g(λ) dλ = ∫₀^∞ λ g(λ) dλ = Mean of the Gamma Distribution = αθ.
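The closed form for the mixed density can be confirmed by numerically integrating the Poisson density against the Gamma density. A minimal sketch, assuming Python with SciPy; the values α = 3 and θ = 0.25 are chosen only for illustration:

    import numpy as np
    from scipy import integrate
    from scipy.stats import gamma, poisson

    a, th = 3.0, 0.25    # illustrative alpha and theta

    def integrand(lam):
        # Poisson chance of 2 claims given lambda, weighted by the Gamma density of lambda.
        return poisson.pmf(2, lam) * gamma.pdf(lam, a, scale=th)

    numeric, _ = integrate.quad(integrand, 0, np.inf)
    closed_form = a * (a + 1) * th**2 / (2 * (1 + th)**(a + 2))
    print(numeric, closed_form)    # the two should agree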
19.7. B. The conditional mean given λ is: λ. The conditional variance given λ is: λ. Thus the conditional second moment given λ is: λ + λ^2. The unconditional second moment can be obtained by integrating the conditional second moments versus the distribution of λ:
E[X^2] = ∫₀^∞ E[X^2 | λ] g(λ) dλ = ∫₀^∞ {λ + λ^2} θ^(-α) λ^(α-1) e^(-λ/θ) / Γ(α) dλ
= {θ^(-α)/Γ(α)} ∫₀^∞ λ^α e^(-λ/θ) dλ + {θ^(-α)/Γ(α)} ∫₀^∞ λ^(α+1) e^(-λ/θ) dλ
= {θ^(-α)/Γ(α)} Γ(α+1) θ^(α+1) + {θ^(-α)/Γ(α)} Γ(α+2) θ^(α+2) = αθ + α(α+1)θ^2.
Since the mean is αθ, the variance is: αθ + α(α+1)θ^2 - α^2θ^2 = αθ + αθ^2.
Comment: Note that one integrates the conditional second moments in order to obtain the unconditional second moment. If instead one integrated the conditional variance one would obtain the Expected Value of the Process Variance (in this case αθ), which is only one piece of the total unconditional variance. One would need to also add the Variance of the Hypothetical Means (which in this case is αθ^2), in order to obtain the total variance of αθ + αθ^2. The mixed distribution is a Negative Binomial with r = α and β = θ. It has variance: rβ(1+β) = αθ + αθ^2.
19.8. D. For the Gamma, mean = αθ = 0.2, and variance = αθ^2 = 0.016. Thus θ = 0.016/0.2 = 0.08 and α = 2.5. This is a Gamma-Poisson, with mixed distribution a Negative Binomial, with r = α = 2.5 and β = θ = 0.08. f(1) = rβ/(1+β)^(1+r) = (2.5)(0.08)/(1.08)^3.5 = 0.153.
Comment: Similar to 3, 11/01, Q.27.
19.9. E. Over 10 minutes, the rate of loss is Poisson, with 10 times that for one minute. λ has a Gamma distribution with α = 2.5 and θ = 0.08 ⇒ 10λ has a Gamma distribution with α = 2.5, and θ = (10)(0.08) = 0.8. The mixed distribution is a Negative Binomial, with r = α = 2.5 and β = θ = 0.8. f(2) = {r(r+1)/2} β^2/(1+β)^(2+r) = {(2.5)(3.5)/2} 0.8^2/(1.8)^4.5 = 0.199.
19.10. B. Mean value of a coin is: (50%)(5) + (30%)(10) + (20%)(25) = 10.5. Second moment of the value of a coin is: (50%)(5^2) + (30%)(10^2) + (20%)(25^2) = 167.5. Over 60 minutes, the rate of loss is Poisson, with 60 times that for one minute. λ has a Gamma distribution with α = 2.5 and θ = 0.08 ⇒ 60λ has a Gamma distribution with α = 2.5 and θ = (60)(0.08) = 4.8. The mixed distribution is a Negative Binomial, with r = α = 2.5 and β = θ = 4.8. Therefore, the mean number of coins is: rβ = (2.5)(4.8) = 12, and the variance of the number of coins is: rβ(1+β) = (2.5)(4.8)(5.8) = 69.6. The mean worth is: (10.5)(12) = 126. Variance of worth is: (12)(167.5 - 10.5^2) + (10.5^2)(69.6) = 8360.4.
Prob[worth > 300] ≅ 1 - Φ[(300.5 - 126)/√8360.4] = 1 - Φ[1.91] = 2.81%.
Klem loses money in units of 5 cents or more. Therefore, if he loses more than 300, he loses 305 or more. Thus it might be better to approximate the probability as: 1 - Φ[(304.5 - 126)/√8360.4] = 1 - Φ[1.95] = 2.56%. Along this same line of thinking, one could instead approximate the probability by taking the probability from 302.5 to infinity: 1 - Φ[(302.5 - 126)/√8360.4] = 1 - Φ[1.93] = 2.68%.
19.11. E. From the previous solution, for a day chosen at random, the worth has mean 126 and variance 8360.4. The worth over five days is the sum of 5 independent variables; the sum of 5 days has mean: (5)(126) = 630 and variance: (5)(8360.4) = 41,802.
Prob[worth > 900] ≅ 1 - Φ[(900.5 - 630)/√41,802] = 1 - Φ[1.32] = 9.34%.
Klem loses money in units of 5 cents or more. Therefore, if he loses more than 900, he loses 905 or more. It might be better to approximate the probability as: Prob[worth > 900] = Prob[worth ≥ 905] ≅ 1 - Φ[(904.5 - 630)/√41,802] = 1 - Φ[1.34] = 9.01%. One might have instead approximated as: 1 - Φ[(902.5 - 630)/√41,802] = 1 - Φ[1.33] = 9.18%.
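The arithmetic in 19.10 can be reproduced as follows. A minimal sketch, assuming Python with SciPy:

    from math import sqrt
    from scipy.stats import norm

    mean_coin, second_moment = 10.5, 167.5        # per-coin worth
    var_coin = second_moment - mean_coin**2       # 57.25

    r, beta = 2.5, 4.8                            # number of coins lost in 60 minutes
    mean_n, var_n = r*beta, r*beta*(1 + beta)     # 12 and 69.6

    mean_worth = mean_n * mean_coin               # 126
    var_worth = mean_n * var_coin + mean_coin**2 * var_n
    print(var_worth)                                              # about 8360.4
    print(1 - norm.cdf((300.5 - mean_worth) / sqrt(var_worth)))   # about 0.028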
19.12. A. 50% of the coins are worth 5, so if the overall process is Poisson with mean λ, then losing coins of worth 5 is Poisson with mean 0.5λ. Over 10 minutes it is Poisson with mean 5λ. λ has a Gamma distribution with α = 2.5 and θ = 0.08 ⇒ 5λ has a Gamma distribution with α = 2.5 and θ = (5)(0.08) = 0.4. The mixed distribution is a Negative Binomial, with r = α = 2.5 and β = θ = 0.4. f(1) = rβ/(1 + β)^(r+1) = (2.5)(0.4)/(1.4^3.5) = 30.8%.
19.13. D. Losing coins of worth 5, 10, and 25 are three independent Poisson Processes. Over 10 minutes losing coins of worth 5 is Poisson with mean 5λ. Over 10 minutes losing coins of worth 10 is Poisson with mean 3λ. Over 10 minutes losing coins of worth 25 is Poisson with mean 2λ.
Prob[1 coin @ 5] Prob[0 coins @ 10] Prob[0 coins @ 25] = 5λe^(-5λ) e^(-3λ) e^(-2λ) = 5λe^(-10λ).
λ has a Gamma distribution with α = 2.5 and θ = 0.08. 1/θ = 12.5. ⇒ f(λ) = 12.5^2.5 λ^1.5 e^(-12.5λ)/Γ(2.5).
∫₀^∞ 5λe^(-10λ) f(λ) dλ = ∫₀^∞ 5λe^(-10λ) 12.5^2.5 λ^1.5 e^(-12.5λ)/Γ(2.5) dλ = {(5)12.5^2.5/Γ(2.5)} ∫₀^∞ λ^2.5 e^(-22.5λ) dλ
= {(5)12.5^2.5/Γ(2.5)} {Γ(3.5)/22.5^3.5} = (5)(2.5)12.5^2.5/22.5^3.5 = 12.8%.
Comment: While given lambda each Poisson Process is independent, the mixed Negative Binomials are not independent, since each day we use the same lambda (appropriately thinned) for each denomination of coin. From the previous solution, the probability of one coin worth 5 is 30.8%. The distribution of coins worth 10 is Negative Binomial with r = 2.5 and β = (3)(0.08) = 0.24. Therefore, the chance of seeing no coins worth 10 is: 1/1.24^2.5 = 58.4%. The distribution of coins worth 25 is Negative Binomial with r = 2.5 and β = (2)(0.08) = 0.16. Therefore, the chance of seeing no coins worth 25 is: 1/1.16^2.5 = 69.0%. However, (30.8%)(58.4%)(69.0%) = 12.4% ≠ 12.8%, the correct solution. One cannot multiply the three probabilities together, because the three events are not independent. The three probabilities each depend on the same lambda value for the given day.
19.14. E. A is Poisson with mean λA, where λA is a random draw from a Gamma Distribution with α = 2.5 and θ = 0.08. B is Poisson with mean λB, where λB is a random draw from a Gamma Distribution with α = 2.5 and θ = 0.08. Since A and B are from walks on different days, λA and λB are independent random draws from the same Gamma. Thus λA + λB is from a Gamma Distribution with α = 2.5 + 2.5 = 5 and θ = 0.08. Thus A + B is from a Negative Binomial Distribution with r = 5 and β = 0.08. The density at 3 of this Negative Binomial Distribution is: {(5)(6)(7)/3!} 0.08^3/1.08^8 = 0.97%.
Alternately, A and B are independent Negative Binomials, each with r = 2.5 and β = 0.08. Thus A + B is a Negative Binomial Distribution with r = 5 and β = 0.08. Proceed as before.
Alternately, for A and B the densities for each are: f(0) = 1/(1+β)^r = 1/1.08^2.5 = 0.825, f(1) = rβ/(1+β)^(1+r) = (2.5)(0.08)/1.08^3.5 = 0.153, f(2) = {r(r+1)/2} β^2/(1+β)^(2+r) = {(2.5)(3.5)/2} 0.08^2/1.08^4.5 = 0.0198, f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(2.5)(3.5)(4.5)/6} 0.08^3/1.08^5.5 = 0.00220.
Prob[A + B = 3] = Prob[A=0]Prob[B=3] + Prob[A=1]Prob[B=2] + Prob[A=2]Prob[B=1] + Prob[A=3]Prob[B=0] = (0.825)(0.00220) + (0.153)(0.0198) + (0.0198)(0.153) + (0.00220)(0.825) = 0.97%.
Comment: For two independent Gamma Distributions with the same θ: Gamma(α1, θ) + Gamma(α2, θ) = Gamma(α1 + α2, θ).
19.15. B. λA + λB + λC is from a Gamma Distribution with α = (3)(2.5) = 7.5 and θ = 0.08. Thus A + B + C is from a Negative Binomial Distribution with r = 7.5 and β = 0.08. The density at 2 of this Negative Binomial Distribution is: {(7.5)(8.5)/2!} 0.08^2/1.08^9.5 = 9.8%.
19.16. A. Mixing a Poisson via a Gamma leads to a Negative Binomial overall frequency distribution. The Negative Binomial has parameters r = α = 4 and β = θ = 1/9. f(x) = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r) = {(4)(5)...(x + 3)/x!} (1/9)^x / (10/9)^(x+4) = {(x+3)! / (x! 3!)} 0.9^4 0.1^x.
19.17. D. For the Gamma-Poisson, β = θ and r = α. Therefore, the variance of the Gamma = αθ^2 = rβ^2. Total Variance = Variance of the mixed Negative Binomial = rβ(1+β). Thus for the Gamma-Poisson we have: (Variance of the Gamma)/(Variance of the Negative Binomial) = β/(1+β) = 1/{1 + 1/β}. Thus in this case 1/{1 + 1/β} = 0.25. ⇒ β = 1/3.
19.18. D. The parameters of the Gamma can be gotten from those of the Negative Binomial: α = r = 4, θ = β = 3/17. Then the Variance of the Gamma = αθ^2 = 0.125.
Alternately, the variance of the Gamma is the Variance of the Hypothetical Means = Total Variance - Expected Value of the Process Variance = Variance of the Negative Binomial - Mean of the Gamma = Variance of the Negative Binomial - Mean of the Negative Binomial = rβ(1+β) - rβ = rβ^2 = (4)(3/17)^2 = 0.125.
19.19. D. This is a Gamma-Poisson with α = 2 and θ = 1.4. The mixed distribution is Negative Binomial with r = α = 2, and β = θ = 1.4. For a Negative Binomial Distribution, f(6) = {(r)(r+1)(r+2)(r+3)(r+4)(r+5)/6!} β^6/(1+β)^(r+6) = {(2)(3)(4)(5)(6)(7)/720}(1.4^6)/(2.4^8) = 0.04788. Thus we expect (100,000)(0.04788) = 4788 out of 100,000 simulated values to be 6.
Comment: Similar to 3, 5/01, Q.3. One need know nothing about simulation in order to answer these questions.
19.20. E. Each year is a random draw from a different Poisson with unknown λ. The simulated set consists of random draws, each from a different Poisson Distribution. Thus each simulated set is a mixed distribution for a Gamma-Poisson, a Negative Binomial Distribution with r = α = 2, and β = θ = 1.4. E[V] = variance of this Negative Binomial = (2)(1.4)(1 + 1.4) = 6.72.
Alternately, the Expected Value of the Process Variance is: E[Var | λ] = E[λ] = αθ = (2)(1.4) = 2.8. The Variance of the Hypothetical Means is: Var[E[N | λ]] = Var[λ] = αθ^2 = (2)(1.4^2) = 3.92. Total Variance is: EPV + VHM = 2.8 + 3.92 = 6.72.
Comment: Difficult! In other words, Var[X] = E[Var[X|Y]] + Var[E[X|Y]].
19.21. D. This is a Gamma-Poisson with α = 2 and θ = 1.4. The mixed distribution is Negative Binomial with r = α = 2, and β = θ = 1.4. For this Negative Binomial Distribution, 100,000 f(6) = 4788.
19.22. B. Each year is a random draw from the same Poisson with unknown λ. The simulated set is from this Poisson Distribution with mean λ. V = λ. E[V] = E[λ] = mean of the Gamma = αθ = (2)(1.4) = 2.8.
Comment: Difficult! What Tom did was simulate one year each from 100,000 randomly selected insureds. What Dick did was pick a random insured and simulate 100,000 years for that insured; each year is an independent random draw from the same Poisson distribution with unknown λ. The two situations are different, even though they have the same mean. In Dickʼs case there is no variance associated with the selection of the parameter lambda; the only variance is associated with the variance of the Poisson Distribution. In Tomʼs case there is variance associated with the selection of the parameter lambda as well as variance associated with the variance of the Poisson Distribution.
19.23. D. This is a Gamma-Poisson with α = 2 and θ = 1.4. The mixed distribution is Negative Binomial with r = α = 2, and β = θ = 1.4. For this Negative Binomial Distribution, 100,000 f(6) = 4788.
19.24. A. Since all 100,000 values in the simulated set are the same, V = 0. E[V] = 0.
Comment: Contrast Tom, Dick, and Harryʼs simulations. Even though they all have the same mean, they are simulating somewhat different situations.
19.25. B. The number of vehicles is Negative Binomial with r = α = 40 and β = θ = 10. It has variance: rβ(1 + β) = (40)(10)(11) = 4400.
19.26. D. This is the sum of 7 independent variables, each with variance 4400. (7)(4400) = 30,800.
Comment: Although λ is constant on any given day, it varies from day to day. A day picked at random is a Negative Binomial with r = 40 and β = 10. The sum of seven independent Negative Binomials is a Negative Binomial with r = (7)(40) = 280 and β = 10. This has variance: (280)(10)(11) = 30,800. If instead λ had been the same for a whole week, the answer would have changed. In that case, one would get a Negative Binomial with r = 40 and β = (7)(10) = 70, with variance: (40)(70)(71) = 198,800.
19.27. E. The mean number of people per vehicle is: 1 + (1.6)(6) = 10.6. The variance of the people per vehicle is: (1.6)(6)(1 + 6) = 67.2. Variance of the number of people is: (400)(67.2) + (10.6^2)(4400) = 521,264.
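The contrast between Tomʼs and Dickʼs simulations can be seen directly. A minimal sketch, assuming Python with numpy (the seed is arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, theta, n = 2.0, 1.4, 100_000

    # Tom: a fresh lambda for every simulated value.
    tom = rng.poisson(rng.gamma(alpha, theta, n))
    print(tom.var())          # near (2)(1.4)(2.4) = 6.72

    # Dick: one lambda, then 100,000 values from that single Poisson.
    lam = rng.gamma(alpha, theta)
    dick = rng.poisson(lam, n)
    print(lam, dick.var())    # the variance is near the single drawn lambda; on average 2.8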
19.28. E. This is the sum of 7 independent variables. (7)(521,264) = 3,648,848.
19.29. A. The number of people has mean: (400)(10.6) = 4240, and variance: 521,264. The LogNormal has mean: exp[5 + 0.8^2/2] = 204.38, second moment: exp[(2)(5) + (2)(0.8^2)] = 79,221, and variance: 79,221 - 204.38^2 = 37,450. Variance of the money spent: (4240)(37,450) + (204.38^2)(521,264) = 21,933 million. √(21,933 million) = 148,098.
19.30. D. This is the sum of 7 independent variables, with variance: (7)(21,933 million) = 153,531 million. √(153,531 million) = 391,830.
19.31. C. The mean amount spent per day is: (4240)(204.38) = 866,571. Over 7 days the mean amount spent is: (7)(866,571) = 6,065,997, with variance 153,531 million. Prob[amount spent < 5 million] ≅ Φ[(5 million - 6.0660 million)/√(153,531 million)] = Φ(-2.72) = 0.33%. So we expect: (1000)(0.33%) = 3 such runs.
19.32. B. For the Gamma, mean = αθ = 0.08, and variance = αθ^2 = 0.0032. Thus θ = 0.04 and α = 2. This is a Gamma-Poisson, with mixed distribution a Negative Binomial with r = α = 2 and β = θ = 0.04.
f(1) = rβ/(1 + β)^(r+1) = (2)(0.04)/(1 + 0.04)^3 = 7.11%.
Comment: The fact that it is the next year rather than some other year is irrelevant.
19.33. C. For one year, each insuredʼs mean is λ, and λ is distributed via a Gamma with θ = 0.04 and α = 2. Over three years, each insuredʼs mean is 3λ, and 3λ is distributed via a Gamma with θ = (3)(0.04) = 0.12 and α = 2. This is a Gamma-Poisson, with mixed distribution a Negative Binomial with r = α = 2 and β = θ = 0.12.
f(2) = {r(r + 1)/2} β^2/(1 + β)^(r+2) = {(2)(3)/2} 0.12^2/(1 + 0.12)^4 = 2.75%.
19.34. B. For one year, each insuredʼs mean is λ, and λ is distributed via a Gamma with θ = 0.04 and α = 2. This is a Gamma-Poisson, with mixed distribution a Negative Binomial with r = α = 2 and β = θ = 0.04. We add up three individual independent drivers and we get a Negative Binomial with r = (3)(2) = 6 and β = 0.04.
f(2) = {r(r + 1)/2} β^2/(1 + β)^(r+2) = {(6)(7)/2} 0.04^2/(1 + 0.04)^8 = 2.46%.
Comment: The Negative Binomial Distributions here and in the previous solution have the same mean; however, the densities are not the same.
[Figure: graph of the ratio of the densities of the Negative Binomial in the previous solution to those of the Negative Binomial here, plotted against n for n = 0 to 5.]
19.35. E. For one year, each insuredʼs mean is λ, and λ is distributed via a Gamma with θ = 0.04 and α = 2. Over four years, each insuredʼs mean is 4λ, and 4λ is distributed via a Gamma with θ = (4)(0.04) = 0.16 and α = 2. This is a Gamma-Poisson, with mixed distribution a Negative Binomial with r = α = 2 and β = θ = 0.16. We add up three individual independent drivers and we get a Negative Binomial with r = (3)(2) = 6 and β = 0.16.
f(3) = {r(r + 1)(r + 2)/6} β^3/(1 + β)^(r+3) = {(6)(7)(8)/6} 0.16^3/(1 + 0.16)^9 = 6.03%.
19.36. A. The number of accidents Moe has over one year is Negative Binomial with r = α = 2 and β = θ = 0.04. f(0) = 1/(1 + β)^r = 1/(1 + 0.04)^2 = 0.9246.
The number of accidents Larry has over two years is Negative Binomial with r = α = 2 and β = 2θ = 0.08. f(1) = rβ/(1 + β)^(r+1) = (2)(0.08)/(1 + 0.08)^3 = 0.1270.
The number of accidents Curly has over three years is Negative Binomial with r = α = 2 and β = 3θ = 0.12. f(2) = {r(r + 1)/2} β^2/(1 + β)^(r+2) = {(2)(3)/2} 0.12^2/(1 + 0.12)^4 = 0.0275.
Prob[Moe = 0, Larry = 1, and Curly = 2] = (0.9246)(0.1270)(0.0275) = 0.32%.
19.37. D. The number of accidents Moe has over one year is Negative Binomial with r = α = 2 and β = θ = 0.04. f(0) = 0.9246. f(1) = 0.0711. f(2) = 0.0041. f(3) = 0.0002.
The number of accidents Larry has over two years is Negative Binomial with r = α = 2 and β = 2θ = 0.08. f(0) = 0.8573. f(1) = 0.1270. f(2) = 0.0141. f(3) = 0.0014.
The number of accidents Curly has over three years is Negative Binomial with r = α = 2 and β = 3θ = 0.12. f(0) = 0.7972. f(1) = 0.1708. f(2) = 0.0275. f(3) = 0.0039.
We need to list all of the possibilities: Prob[M = 0, L = 0, C = 3] + Prob[M = 0, L = 1, C = 2] + Prob[M = 0, L = 2, C = 1] + Prob[M = 0, L = 3, C = 0] + Prob[M = 1, L = 0, C = 2] + Prob[M = 1, L = 1, C = 1] + Prob[M = 1, L = 2, C = 0] + Prob[M = 2, L = 0, C = 1] + Prob[M = 2, L = 1, C = 0] + Prob[M = 3, L = 0, C = 0]
= (0.9246) {(0.8573)(0.0039) + (0.1270)(0.0275) + (0.0141)(0.1708) + (0.0014)(0.7972)} + (0.0711) {(0.8573)(0.0275) + (0.1270)(0.1708) + (0.0141)(0.7972)} + (0.0041) {(0.8573)(0.1708) + (0.1270)(0.7972)} + (0.0002)(0.8573)(0.7972) = 1.475%.
Comment: Adding up the three independent drivers, M + L + C does not follow a Negative Binomial, since the betas are not the same. Note that the solution to the previous question is one of the possibilities here.
19.38. D, 19.39. B, & 19.40. D. The mixed distribution is a Negative Binomial with r = α = 4 and β = θ = 0.1. f(0) = (1+β)^(-r) = 1.1^(-4) = 0.6830. Expected size of group A: 6830. f(1) = rβ(1+β)^(-(r+1)) = (4)(0.1)(1.1^(-5)) = 0.2484. Expected size of group B: 2484. Expected size of group C: 10,000 - (6830 + 2484) = 686.
19.41. E. For an individual insured, the probability of no claims by time t is the density at zero of a Poisson Distribution with mean λt: exp[-λt]. In other words, the probability the first claim occurs by time t is: 1 - exp[-λt]. This is an Exponential Distribution with mean 1/λ. Thus, for an individual the average wait until the first claim is 1/λ. (This is a general result for Poisson Processes.) For a Gamma Distribution, E[X^(-1)] = θ^(-1) Γ(α-1)/Γ(α) = 1/{θ(α-1)}, α > 1. Lambda is Gamma Distributed, thus E[1/λ] = 1/{θ(α-1)} = 1/{(0.02)(3 - 1)} = 25.
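The 19.37 convolution can also be done by brute force over all combinations with M + L + C = 3. A minimal sketch, assuming Python with SciPy:

    import numpy as np
    from scipy.special import gammaln

    def nb_pmf(k, r, beta):
        # Negative Binomial density with parameters r and beta.
        return np.exp(gammaln(r + k) - gammaln(r) - gammaln(k + 1)) * beta**k / (1 + beta)**(r + k)

    r = 2.0
    betas = [0.04, 0.08, 0.12]   # Moe over 1 year, Larry over 2 years, Curly over 3 years
    total = 0.0
    for m in range(4):
        for l in range(4 - m):
            c = 3 - m - l
            total += nb_pmf(m, r, betas[0]) * nb_pmf(l, r, betas[1]) * nb_pmf(c, r, betas[2])
    print(total)   # about 0.0147, matching 1.475%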
19.42. C. There is an 80% chance we get a random draw from the Poisson with mean λ. In that case, we have a Gamma-Poisson with α = 1 and θ = 0.1. The mixed distribution is Negative Binomial with r = 1 and β = 0.1. f(2) = 0.1^2/1.1^3 = 0.751%.
There is a 20% chance we get a random draw from the Poisson with mean 3λ. 3λ follows an Exponential with mean 0.3. We have a Gamma-Poisson with α = 1 and θ = 0.3. The mixed distribution is Negative Binomial with r = 1 and β = 0.3. f(2) = 0.3^2/1.3^3 = 4.096%.
Thus the overall probability of two claims is: (0.8)(0.751%) + (0.2)(4.096%) = 1.420%.
19.43. B. This is a Gamma-Poisson with α = 2 and θ = 1/5. Thus the mixed distribution is Negative Binomial with r = 2 and β = 1/5. For this Negative Binomial: f(0) = 1/(1 + 1/5)^2 = 25/36. f(1) = (2)(1/5)/(1 + 1/5)^3 = 25/108. Probability of at least 2 claims is: 1 - 25/36 - 25/108 = 8/108 = 2/27 = 7.41%.
19.44. A. The mixed distribution is Negative Binomial with r = 2 and β = 0.5. Thinning, small claims are Negative Binomial with r = 2 and β = (60%)(0.5) = 0.3. Variance of the number of small claims is: (2)(0.3)(1.3) = 0.78.
Alternately, for each insured, the number of small claims is Poisson with mean 0.6λ. 0.6λ follows a Gamma Distribution with α = 2 and θ = (0.6)(0.5) = 0.3. Thus the mixed distribution for small claims is Negative Binomial with r = 2 and β = 0.3. Variance of the number of small claims is: (2)(0.3)(1.3) = 0.78.
19.45. E. The mixed distribution is Negative Binomial with r = 2 and β = 0.5. Thinning, large claims are Negative Binomial with r = 2 and β = (40%)(0.5) = 0.2. Variance of the number of large claims is: (2)(0.2)(1.2) = 0.48.
Alternately, for each insured, the number of large claims is Poisson with mean 0.4λ. 0.4λ follows a Gamma Distribution with α = 2 and θ = (0.4)(0.5) = 0.2. Thus the mixed distribution for large claims is Negative Binomial with r = 2 and β = 0.2. Variance of the number of large claims is: (2)(0.2)(1.2) = 0.48.
Comment: The number of small and large claims is positively correlated. The distribution of claims of all sizes is Negative Binomial with r = 2 and β = 0.5; it has a variance of: (2)(0.5)(1.5) = 1.5 > 1.26 = 0.78 + 0.48.
19.46. A. For the Gamma-Poisson, the mixed distribution is a Negative Binomial with mean rβ and variance rβ(1+β). Thus we have rβ = 0.1 and 0.15/0.1 = 1+β. Thus β = 0.5, and r = 0.1/0.5 = 0.2. The parameters of the Gamma follow from those of the Negative Binomial: α = r = 0.2 and θ = β = 0.5. The variance of the Gamma is αθ^2 = 0.05.
Alternately, the total variance is 0.15. For the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to: mean of the Gamma + variance of the Gamma. Therefore, the variance of the Gamma = 0.15 - 0.10 = 0.05.
19.47. B. ∫₀^∞ e^(-λ) f(λ) dλ = ∫₀^∞ e^(-λ) 36λe^(-6λ) dλ = 36 ∫₀^∞ λe^(-7λ) dλ = (36)(Γ(2)/7^2) = (36)(1/49) = 0.735.
Alternately, assume that the frequency for a single insured is given by a Poisson with a mean of λ. (This is consistent with the given information that the chance of 0 claims is e^(-λ).) In that case one would have a Gamma-Poisson process and the mixed distribution is a Negative Binomial. The given Gamma distribution of λ has α = 2 and θ = 1/6. The mixed Negative Binomial has r = α = 2 and β = θ = 1/6, and f(0) = (1+β)^(-r) = (1 + 1/6)^(-2) = 36/49.
Comment: Note that while the situation described is consistent with a Gamma-Poisson, it need not be a Gamma-Poisson.
19.48. A. One can solve for the parameters of the Gamma: αθ = 0.1, and αθ^2 = 0.01, therefore θ = 0.1 and α = 1. The mixed distribution is a Negative Binomial with parameters r = α = 1 and β = θ = 0.1, a Geometric Distribution. f(0) = 1/(1+β) = 1/1.1 = 10/11. f(1) = β/(1+β)^2 = 0.1/1.1^2 = 10/121. The chance of 2 or more accidents is: 1 - f(0) - f(1) = 1 - 10/11 - 10/121 = 1/121.
19.49. A. mean of Gamma = αθ = 1 and variance of Gamma = αθ2 = 2. Therefore, θ = 2 and α = 1/2. The mixed distribution is a Negative Binomial with r = α = 1/2 and β = θ = 2. f(1) = rβ/(1+β)1+r = (1/2)(2)/(33/2) = 0.192. Alternately, f(1) = ∞
∞
∫0 f(1 | λ) g(λ) dλ = ∫0 λ 1 Γ(1/ 2)
2
e- λ
(λ / 2)1/ 2 e- λ / 2 1 dλ = λ Γ(1/ 2) Γ(1/ 2)
Γ(3/2) (2/3)3/2 = 2
∞
2
∫0 λ1/ 2 e - 3λ / 2 dλ =
Γ(3 / 2) -3/2 3 = 2(1/2) 3-3/2 = 0.192. Γ(1/ 2)
19.50. C. This is a Gamma-Poisson with α = 2 and θ = 1. The mixed distribution is Negative Binomial with r = α = 2 , and β = θ = 1. For a Negative Binomial Distribution, f(3) = {(r)(r+1)(r+2)/3!}β3/(1+β)r+3 = {(2)(3)(4)/6}(13 )/(25 ) = 1/8. Thus we expect (1000)(1/8) = 125 out of 1000 simulated values to be 3. 19.51. A. The mean of the Negative Binomial is rβ = 0.2, while the variance is rβ(1+β) = 0.4. Therefore, 1 + β = 2 ⇒ β = 1 and r = 0.2. For a Gamma-Poisson, α = r = 0.2 and θ = β = 1. Therefore, the variance of the Gamma Distribution is: αθ2 = (0.2)(12 ) = 0.2. Alternately, for the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to: mean of the Gamma + variance of the Gamma. Variance of the Gamma = Variance of the Negative Binomial - Mean of the Gamma = Variance of the Negative Binomial - Overall Mean = Variance of the Negative Binomial - Mean of the Negative Binomial = 0.4 - 0.2 = 0.2.
19.52. A. For the Gamma, mean = αθ = 2, and variance = αθ2 = 4. Thus θ = 2 and α = 1. This is a Gamma-Poisson, with mixed distribution a Negative Binomial, with r = α = 1 and β = θ = 2. This is a Geometric with f(1) = β/(1+β)2 = 2/(1+2)2 = 2/9 = 0.222. Alternately, λ is distributed via an Exponential with mean 2, f(λ) = e−λ/2/2. Prob[1 claim] = ∫Prob[1 claim | λ] f(λ) dλ = ∫λe−λe−λ/2/2 dλ = ∞
∫
(1/2) λe−3λ/2 dλ = (1/2) (2/3)2 Γ(2) = (1/2)(4/9)(1!) = 2/9 = 0.222. 0
Alternately, for the Gamma-Poisson, the variance of the mixed Negative Binomial = total variance = E[Var[N | λ]] + Var[E[N | λ]] = E[λ] + Var[λ] = mean of the Gamma + variance of the Gamma = 2 + 4 = 6. The mean of the mixed Negative Binomial = overall mean = E[λ] = mean of the Gamma = 2. Therefore, rβ = 2 and rβ(1+β) = 6. ⇒ r =1 and β = 2. f(1) = β/(1+β)2 = 2/(1+2)2 = 2/9 = 0.222. Comment: The fact that it is the sixth rather than some other minute is irrelevant. 19.53. C. Over two minutes (on the same day) we have a Poisson with mean 2λ. λ ∼ Gamma(α, θ) = Gamma (1, 2). 2λ ∼ Gamma(α, 2θ) = Gamma (1, 4), as per inflation. Mixed Distribution is Negative Binomial, with r = α = 1 and β = θ = 4. f(1) = β/(1 + β)2 = 4/(1 + 4)2 = 16%. Comment: If one multiplies a Gamma variable by a constant, one gets another Gamma with the same alpha and with the new theta equal to that constant times the original theta.
19.54. D. A ∼ Negative Binomial with r = 1 and β = 2. B ∼ Negative Binomial with r = 1 and β = 2. A + B ∼ Negative Binomial with r = 2 and β = 2. f(1) = r β / (1 + β)1+r = (2)(2) / (1 + 2)3 = 14.8%. Alternately, the number of coins found in the minutes are independent Poissons with means λ1 and λ 2 . Total number found is Poisson with mean λ1 + λ2 . λ 1 + λ2 ∼ Gamma(2α, θ) = Gamma (2, 2). Mixed Negative Binomial has r = 2 and β = 2. Proceed as before. Alternately, P[A + B = 1] = P[A = 1]P[B = 0] + P[A = 0]P[B = 1] = (2/9)(1/3) + (1/3)(2/9) = 14.8%. Comment: The sum of two independent Gamma variables with the same theta, is another Gamma with the same theta and with the new alpha equal to the sum of the alphas. 19.55. E. Prob[1 coin during minute 3 | λ] = λe−λ. Prob[1 coin during minute 5 | λ] = λe−λ. The Gamma has θ = 2 and α = 1, an Exponential. π(λ) = e−λ/2/2. Prob[1 coin during minute 3 and 1 coin during minute 5] =
∫Prob[1 coin during minute 3 | λ] Prob[1 coin during minute 5 | λ] π(λ) dλ = ∞
∫ (λe −λ) (λe−λ ) (e−λ/ 2 / 2) dλ = 0
∞
∫ λ2 e− 2.5λ / 2 dλ = Γ(3) (1/2.5)3 / 2 = (1/2)(2/2.53) =
6.4%.
0
Comment: It is true that Prob[1 coin during minute 3] = Prob[1 coin during minute 5] = 2/9. (2/9)(2/9) = 4.94%. However, since the two probabilities both depend on the same lambda, they are not independent.
19.56. D. Prob[1 coin during minute 1 | λ] = λe^(−λ). Prob[2 coins during minute 2 | λ] = λ^2 e^(−λ)/2.
Prob[3 coins during minute 3 | λ] = λ^3 e^(−λ)/6. The Gamma has θ = 2 and α = 1, an Exponential. π(λ) = e^(−λ/2)/2.
Prob[1 coin during minute 1, 2 coins during minute 2, and 3 coins during minute 3]
= ∫ Prob[1 coin minute 1 | λ] Prob[2 coins minute 2 | λ] Prob[3 coins minute 3 | λ] π(λ) dλ
= ∫_0^∞ (λe^(−λ)) (λ^2 e^(−λ)/2) (λ^3 e^(−λ)/6) (e^(−λ/2)/2) dλ = ∫_0^∞ λ^6 e^(−3.5λ)/24 dλ = Γ(7) (1/3.5)^7 / 24 = (720/24)/3.5^7 = 0.466%.
Comment: Prob[1 coin during minute 1] = 2/9. Prob[2 coins during minute 2] = 4/27. Prob[3 coins during minute 3] = 8/81.
(2/9)(4/27)(8/81) = 0.325%. However, since the three counts depend on the same lambda, they are not independent.

19.57. A. From a previous solution, for one minute, the mixed distribution is Geometric with β = 2.
f(1) = β/(1+β)^2 = 2/(1+2)^2 = 2/9 = 0.2222. Since the minutes are on different days, their lambdas are picked independently.
Prob[1 coin during 1 minute today and 1 coin during 1 minute tomorrow] = Prob[1 coin during a minute] Prob[1 coin during a minute] = 0.2222^2 = 4.94%.

19.58. C. Over three minutes (on the same day) we have a Poisson with mean 3λ.
λ ∼ Gamma(α, θ) = Gamma(1, 2). 3λ ∼ Gamma(α, 3θ) = Gamma(1, 6).
Mixed Distribution is Negative Binomial, with r = α = 1 and β = θ = 6. f(1) = β/(1 + β)^2 = 6/(1 + 6)^2 = 0.1224.
Since the time intervals are on different days, their lambdas are picked independently.
Prob[1 coin during 3 minutes today and 1 coin during 3 minutes tomorrow] = Prob[1 coin during 3 minutes] Prob[1 coin during 3 minutes] = 0.1224^2 = 1.50%.

19.59. E. Gamma has mean = αθ = 3 and variance = αθ^2 = 3 ⇒ θ = 1 and α = 3.
The Negative Binomial mixed distribution has r = α = 3 and β = θ = 1.
f(0) = 1/(1+β)^3 = 1/8. f(1) = rβ/(1+β)^4 = 3/16. F(1) = 1/8 + 3/16 = 5/16 = 0.3125.
19.60. E. Assume the prior Gamma, used by both actuaries, has parameters α and θ.
The first actuary is simulating N drivers from a Gamma-Poisson frequency process. The number of claims from a random driver is Negative Binomial with r = α and β = θ. The total number of claims is a sum of N independent, identically distributed Negative Binomials, which is Negative Binomial with parameters r = Nα and β = θ.
The second actuary is simulating N years for a single driver. An individual who is Poisson with mean λ, over N years is Poisson with mean Nλ.
I. The Negative Binomial Distribution simulated by the first actuary has mean Nαθ. The Poisson simulated by the second actuary has mean Nλ, where λ depends on which driver the second actuary has picked at random. There is no reason why the mean number of claims simulated by the two actuaries should be the same. Thus statement I is not true.
II. The number of claims simulated will usually be different, since they are from two different distributions. Thus statement II is not true.
III. The first actuaryʼs Negative Binomial has variance αθ(1 + θ). The second actuaryʼs simulated sequence has an expected variance of λ, where λ depends on which driver the second actuary has picked at random. The expected variance for the second actuaryʼs simulated sequence could be higher or lower than the first actuaryʼs, depending on which driver he has picked. Thus statement III is not true.

19.61. C. Gamma-Poisson. The mixed distribution is Negative Binomial with r = α = 2 and β = θ = 1.
f(0) = 1/(1 + β)^r = 1/(1 + 1)^2 = 1/4.

19.62. E. From the previous solution, the probability that each driver does not have a claim is 1/4. Thus for 1000 independent drivers, the number of drivers with no claims is Binomial with m = 1000 and q = 1/4. This Binomial has mean mq = 250, and variance mq(1 - q) = 187.5.
Using the Normal Approximation with continuity correction, Prob[At most 265 claim-free drivers] ≅ Φ[(265.5 - 250)/√187.5] = Φ[1.13] = 87.08%.
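The continuity-corrected Normal approximation in 19.62 is easy to reproduce numerically. The sketch below is only an illustration using the quantities already stated in the solution; it relies on nothing beyond the standard library error function.

    import math

    def std_normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    m, q = 1000, 0.25
    mean = m * q                          # 250
    var = m * q * (1 - q)                 # 187.5
    z = (265.5 - mean) / math.sqrt(var)   # continuity correction: 265 + 0.5
    print(std_normal_cdf(z))              # about 0.871, matching Phi[1.13] = 87.08%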
19.63. D. The distribution of number of claims from a single driver is Negative Binomial with r = 2 and β = 1. The distribution of the sum of 1000 independent drivers is Negative Binomial with r = (1000)(2) = 2000 and β = 1. This Negative Binomial has mean rβ = 2000, and variance rβ(1 + β) = 4000.
Using the Normal Approximation with continuity correction, Prob[more than 2020 claims] ≅ 1 - Φ[(2020.5 - 2000)/√4000] = 1 - Φ[0.32] = 37.45%.
Alternately, the mean of the sum of 1000 independent drivers is 1000 times the mean of a single driver: (1000)(2) = 2000. The variance of the sum of 1000 independent drivers is 1000 times the variance of a single driver: (1000)(2)(1)(1+1) = 4000. Proceed as before.

19.64. C. The distribution of number of claims from a single driver is Negative Binomial with r = 2 and β = 1.
f(0) = 1/4. f(1) = rβ/(1 + β)^(r+1) = (2)(1)/(1 + 1)^3 = 1/4. f(2) = {r(r + 1)/2} β^2/(1 + β)^(r+2) = {(2)(3)/2}(1^2)/(1 + 1)^4 = 3/16.
The number of drivers with given numbers of claims is a multinomial distribution, with parameters 1000, f(0), f(1), f(2), ... = 1000, 1/4, 1/4, 3/16, ....
The covariance of the number of drivers with 1 claim and the number with 2 claims is: -(1000)(1/4)(3/16) = -46.875.
The variance of the number of drivers with 1 claim is: (1000)(1/4)(1 - 1/4) = 187.5.
The variance of the number of drivers with 2 claims is: (1000)(3/16)(1 - 3/16) = 152.34.
The correlation of the number of drivers with 1 claim and the number with 2 claims is: -46.875/√[(187.5)(152.34)] = -0.277.
Comment: Well beyond what you are likely to be asked on your exam! The multinomial distribution is discussed in A First Course in Probability by Ross.
The correlation is: -√[ f(1) f(2) / ({1 - f(1)}{1 - f(2)}) ] = -1/√13 = -0.277.
Section 20, Tails of Frequency Distributions

Actuaries are sometimes interested in the behavior of a frequency distribution as the number of claims gets very large.164 The question of interest is how quickly the density and survival function go to zero as x approaches infinity. If the density and survival function go to zero more slowly, one describes that as a "heavier-tailed distribution." Those frequency distributions which are heavier-tailed than the Geometric distribution are often considered to have heavy tails, while those lighter-tailed than the Geometric are considered to have light tails.165 There are a number of general methods by which one can distinguish which distribution or empirical data set has the heavier tail.

Lighter-tailed distributions have more moments that exist. For the frequency distributions on the exam all of the moments exist. Nevertheless, the three common frequency distributions differ in their tail behavior. Since the Binomial has finite support, f(x) = 0 for x > m, it is very light-tailed. The Negative Binomial has its variance greater than its mean, so the Negative Binomial is heavier-tailed than the Poisson, which has its variance equal to its mean.

From lightest to heaviest tailed, the frequency distributions in the (a,b,0) class are: Binomial, Poisson, Negative Binomial r > 1, Geometric, Negative Binomial r < 1.

Skewness:
The larger the skewness, the heavier-tailed the distribution. The Binomial distribution for q > 0.5 is skewed to the left (has negative skewness). The Binomial distribution for q < 0.5, the Poisson distribution, and the Negative Binomial distribution are skewed to the right (have positive skewness); they have a few very large values and many smaller values. A symmetric distribution has zero skewness. Therefore, the Binomial Distribution for q = 0.5 has zero skewness.

Mean Residual Lives / Mean Excess Loss:
As with loss distributions one can define the concept of the mean residual life. The Mean Residual Life, e(x), is defined as: e(x) = (average number of claims for those insureds with more than x claims) - x. Thus we only count those insureds with more than x claims and only that part of each number of claims greater than x.166 Heavier-tailed distributions have their mean residual life increase to infinity, while lighter-tailed distributions have their mean residual life approach a constant or decline to zero.
164 Actuaries are more commonly concerned with the tail behavior of loss distributions, as discussed in “Mahlerʼs Guide to Loss Distributions.”
165 See Section 6.3 of Loss Models.
166 Thus the Mean Residual Life is the mean of the frequency distribution truncated and shifted at x.
One complication is that for discrete distributions this definition is discontinuous at the integers. For example, assume we are interested in the mean residual life at 3. As we take the limit from below we include those insureds with 3 claims in our average; as we approach 3 from above, we donʼt include insureds with 3 claims in our average. Define e(3-) as the limit as x approaches 3 from below of e(x). Similarly, one can define e(3+) as the limit as x approaches 3 from above of e(x). Then it turns out that e(0-) = mean, in analogy to the situation for continuous loss distributions. For purposes of comparing tail behavior of frequency distributions, one can use either e(x-) or e(x+). I will use the former, since the results using e(x-) are directly comparable to those for the continuous size of loss distributions.
At integer values of x:
e(x-) = Σ_{i=x}^∞ (i - x) f(i) / Σ_{i=x}^∞ f(i) = Σ_{i=x}^∞ (i - x) f(i) / S(x-1).
One can compute the mean residual life for the Geometric Distribution, letting q = β/(1+β) and thus 1 - q = 1/(1+β):
e(x-) S(x-1) = Σ_{i=x+1}^∞ (i - x) f(i) = Σ_{i=x+1}^∞ (i - x) β^i/(1+β)^(i+1) = {1/(1+β)} Σ_{i=x+1}^∞ (i - x) q^i
= (1 - q) {Σ_{i=x+1}^∞ q^i + Σ_{i=x+2}^∞ q^i + Σ_{i=x+3}^∞ q^i + ...} = (1 - q) {q^(x+1)/(1 - q) + q^(x+2)/(1 - q) + q^(x+3)/(1 - q) + ...}
= q^(x+1) + q^(x+2) + q^(x+3) + ... = q^(x+1)/(1 - q) = {β/(1+β)}^(x+1) (1+β) = β^(x+1)/(1+β)^x.
In a previous section, the survival function for the Geometric distribution was computed as: S(x) = {β/(1+β)}^(x+1). Therefore, S(x-1) = {β/(1+β)}^x.
Thus e(x-) = {β^(x+1)/(1+β)^x} / {β/(1+β)}^x = β.
The mean residual life for the Geometric distribution is constant.167 As discussed previously, the Geometric distribution is the discrete analog of the Exponential distribution, which also has a constant mean residual life.168
167 e(x-) = β = E[X].
168 The Exponential and Geometric distributions have constant mean residual lives due to their memoryless property, as discussed in Section 6.3 of Loss Models.
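The constancy of e(x-) for the Geometric can be seen numerically. The short Python sketch below (my own illustration, with β = 2 chosen arbitrarily) computes e(x-) directly from the definition; the sum is truncated at 400 terms, which is more than enough for convergence.

    beta = 2.0
    f = lambda i: beta**i / (1 + beta)**(i + 1)     # Geometric density

    for x in (1, 5, 10):
        num = sum((i - x) * f(i) for i in range(x, 400))
        den = sum(f(i) for i in range(x, 400))
        print(x, num / den)     # each ratio is 2.0 = beta, up to truncation error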
As discussed previously, the Negative Binomial is the discrete analog of the Gamma Distribution. The tail behavior of the Negative Binomial is analogous to that of the Gamma.169 The mean residual life for a Negative Binomial goes to a constant. For r < 1, e(x-) increases to β, the mean of the corresponding Geometric, while for r > 1, e(x-) decreases to β as x approaches infinity. For r = 1, one has the Geometric Distribution with e(x-) constant.

Using the relation between the Poisson Distribution and the Incomplete Gamma Function, it turns out that for the Poisson e(x-) = (λ - x) + λ^x e^(−λ) / {Γ(x) Γ(x; λ)}.170 The mean residual life e(x-) for the Poisson Distribution declines to zero as x approaches infinity.171 172 This is another way of seeing that the Poisson has a lighter tail than the Negative Binomial Distribution.

Summary:

Here are the common frequency distributions, arranged from lightest to heaviest righthand tail:

Frequency Distribution                   Skewness    Righthand Tail Behavior              Tail Similar to
Binomial, q > 0.5                        negative    Finite Support
Binomial, q = 0.5                        zero        Finite Support
Binomial, q < 0.5                        positive    Finite Support
Poisson                                  positive    e(x-) → 0, approximately as 1/x      Normal Distribution
Negative Binomial, r > 1                 positive    e(x-) decreases to β                 Gamma, α > 1
Geometric (Negative Binomial, r = 1)     positive    e(x-) constant = β                   Exponential (Gamma, α = 1)
Negative Binomial, r < 1                 positive    e(x-) increases to β                 Gamma, α < 1

169 See “Mahlerʼs Guide to Loss Distributions”, for a discussion of the mean residual life for the Gamma and other size of loss distributions. For a Gamma Distribution with α > 1, e(x) decreases towards a horizontal asymptote θ. For a Gamma Distribution with α < 1, e(x) increases towards a horizontal asymptote θ.
170 For the Poisson F(x) = 1 - Γ(x+1; λ).
171 It turns out that e(x-) ≅ λ/x for very large x. This is similar to the tail behavior for the Normal Distribution. While e(x-) declines to zero, e(x+) for the Poisson Distribution declines to one as x approaches infinity.
172 This follows from the fact that the Poisson is a limit of Negative Binomial Distributions. For a sequence of Negative Binomial distributions with rβ = λ as r → ∞ (and β → 0), in the limit one approaches a Poisson Distribution with mean λ. The tails of each Negative Binomial have e(x-) decreasing to β as x approaches infinity. As β → 0, the limits of e(x-) → 0.
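The contrast in the “Righthand Tail Behavior” column can also be illustrated numerically. The sketch below is my own illustration with arbitrarily chosen parameters (Poisson λ = 3 versus Negative Binomial r = 2, β = 1.5, which has the same mean 3); it computes e(x-) directly from truncated sums of the densities.

    import math

    def poisson_pmf(lam, upper):
        probs = [math.exp(-lam)]
        for i in range(1, upper):
            probs.append(probs[-1] * lam / i)
        return probs

    def neg_bin_pmf(r, beta, upper):
        probs = [(1.0 + beta) ** -r]
        for i in range(1, upper):
            probs.append(probs[-1] * (r + i - 1) / i * beta / (1.0 + beta))
        return probs

    def e_minus(probs, x):
        num = sum((i - x) * p for i, p in enumerate(probs) if i >= x)
        den = sum(p for i, p in enumerate(probs) if i >= x)
        return num / den

    poisson = poisson_pmf(3.0, 200)
    neg_bin = neg_bin_pmf(2.0, 1.5, 200)      # same mean: r * beta = 3
    for x in (5, 10, 20):
        print(x, round(e_minus(poisson, x), 3), round(e_minus(neg_bin, x), 3))

As x grows, the Poisson column shrinks toward zero while the Negative Binomial column settles near β = 1.5, as the table indicates.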
Skewness and Kurtosis of the Poisson versus the Negative Binomial:173

The Poisson has skewness: 1/√λ.
The Negative Binomial has skewness: (1 + 2β)/√[rβ(1 + β)].
Therefore, if a Poisson and Negative Binomial have the same mean, λ = rβ, then the ratio of the skewness of the Negative Binomial to that of the Poisson is: (1 + 2β)/√(1 + β) > 1.

The Poisson has kurtosis: 3 + 1/λ.
The Negative Binomial has kurtosis: 3 + (6β^2 + 6β + 1)/{rβ(1+β)}.
Therefore, if a Poisson and Negative Binomial have the same mean, λ = rβ, then the ratio of the kurtosis minus 3 of the Negative Binomial to that of the Poisson is:174 (6β^2 + 6β + 1)/(1 + β) > 1.

Tails of Compound Distributions:

Compound frequency distributions can have longer tails than either their primary or secondary distribution. If the primary distribution is the number of accidents, and the secondary distribution is the number of claims, then one can have a large number of claims either due to a large number of accidents, or an accident with a large number of claims, or a combination of the two. Thus there is more chance for an unusually large number of claims. Generally the longer-tailed the primary distribution and the longer-tailed the secondary distribution, the longer-tailed the compound distribution. The skewness of a compound distribution can be rather large.

173 See “The Negative Binomial and Poisson Distributions Compared,” by Leroy J. Simon, PCAS 1960.
174 The kurtosis minus 3 is sometimes called the excess.
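The skewness and excess-kurtosis ratios quoted above are easy to verify for a particular choice of parameters. The snippet below is only a check of mine (r = 4, β = 0.5, so the common mean is λ = 2); it confirms that both ratios match the closed-form expressions.

    import math

    r, beta = 4.0, 0.5
    lam = r * beta
    poisson_skew = 1 / math.sqrt(lam)
    nb_skew = (1 + 2 * beta) / math.sqrt(r * beta * (1 + beta))
    print(nb_skew / poisson_skew, (1 + 2 * beta) / math.sqrt(1 + beta))   # both about 1.633

    poisson_excess = 1 / lam
    nb_excess = (6 * beta**2 + 6 * beta + 1) / (r * beta * (1 + beta))
    print(nb_excess / poisson_excess, (6 * beta**2 + 6 * beta + 1) / (1 + beta))   # both about 3.667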
Tails of Mixed Distributions:

Mixed distributions can also have long tails. For example, the Gamma Mixture of Poissons is a Negative Binomial, with a longer tail than the Poisson. As with compound distributions, with mixed distributions there is more chance for an unusually large number of claims. One can either have an unusually large number of claims for a typical value of the parameter, have an unusual value of the parameter which corresponds to a large expected claim frequency, or a combination of the two. Generally the longer-tailed the distribution type being mixed and the longer-tailed the mixing distribution, the longer-tailed the mixed distribution.

Tails of Aggregate Loss Distributions:

Actuaries commonly look at the combination of frequency and severity. This is termed the aggregate loss distribution. The tail behavior of this aggregate distribution is determined by the behavior of the heavier-tailed of the frequency and severity distributions.175 Since the common frequency distributions have tails that are similar to the Gamma Distribution or lighter, and the common severity distributions for Casualty Insurance have tails at least as heavy as the Gamma, actuaries working on liability or workers compensation insurance are usually most concerned with the heaviness of the tail of the severity distribution. It is the rare extremely large claims that are then of concern. However, natural catastrophes such as hurricanes or earthquakes can be examples where a large number of claims can be the concern.176 (Tens of thousands of homeowners claims, even limited to for example 1/4 million dollars each, can add up to a lot of money!) In that case the tail of the frequency distribution could be heavier than a Negative Binomial.
175 See for example Panjer & Willmot, Insurance Risk Models.
176 Natural catastrophes are now commonly modeled using simulation models that incorporate the science of the particular physical phenomenon and the particular distribution of insured exposures.
Problems:

20.1 (1 point) Which of the following frequency distributions have positive skewness?
1. Negative Binomial Distribution with r = 3, β = 0.4.
2. Poisson Distribution with λ = 0.7.
3. Binomial Distribution with m = 3, q = 0.7.
A. 1, 2 only
B. 1, 3 only
C. 2, 3 only
D. 1, 2, and 3
E. The correct answer is not given by A, B, C, or D.
Use the following information for the next five questions:
Five friends: Oleg Puller, Minnie Van, Bob Alou, Louis Liu, and Shelly Fish, are discussing studying for their next actuarial exam. Theyʼve counted 10,000 pages worth of readings and agree that on average they expect to find about 2000 “important ideas”. However, they are debating how many of these pages there are expected to be with 3 or more important ideas.

20.2 (2 points) Oleg assumes the important ideas are distributed as a Binomial with q = 0.04 and m = 5.
How many pages should Oleg expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.3 (2 points) Minnie assumes the important ideas are distributed as a Poisson with λ = 0.20.
How many pages should Minnie expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80
20.4 (2 points) Bob assumes the important ideas are distributed as a Negative Binomial with β = 0.1 and r = 2.
How many pages should Bob expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.5 (3 points) Louis assumes the important ideas are distributed as a compound Poisson-Poisson distribution, with λ1 = 1 and λ2 = 0.2.
How many pages should Louis expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.6 (3 points) Shelly assumes the important ideas are distributed as a compound Poisson-Poisson distribution, with λ1 = 0.2 and λ2 = 1.
How many pages should Shelly expect to find with 3 or more important ideas?
A. Less than 10
B. At least 10 but less than 20
C. At least 20 but less than 40
D. At least 40 but less than 80
E. At least 80

20.7 (3 points) Define Riemannʼs zeta function as: ζ(s) = Σ_{k=1}^∞ 1/k^s, s > 1.
Let the zeta distribution be: f(x) = 1/{ζ(ρ+1) x^(ρ+1)}, x = 1, 2, 3, ..., ρ > 0.
Determine the moments of the zeta distribution.

20.8 (4B, 5/99, Q.29) (2 points) A Bernoulli distribution, a Poisson distribution, and a uniform distribution each has mean 0.8.
Rank their skewness from smallest to largest.
A. Bernoulli, uniform, Poisson
B. Poisson, Bernoulli, uniform
C. Poisson, uniform, Bernoulli
D. uniform, Bernoulli, Poisson
E. uniform, Poisson, Bernoulli
Solutions to Problems:

20.1. A. 1. True. The skewness of any Negative Binomial Distribution is positive.
2. True. The skewness of any Poisson Distribution is positive.
3. False. The skewness of the Binomial Distribution depends on the value of q. For q > 0.5, the skewness is negative.

20.2. A. f(x) = {5!/((x!)(5-x)!)} 0.04^x 0.96^(5-x). One needs to sum the chances of having x = 0, 1, and 2:
n f(n) F(n)
0 0.81537 0.81537
1 0.16987 0.98524
2 0.01416 0.99940
Thus the chance of 3 or more important ideas is: 1 - 0.99940 = 0.00060. Thus we expect: (10000)(0.00060) = 6.0 such pages.

20.3. B. f(x) = e^(-0.2) 0.2^x / x!. One needs to sum the chances of having x = 0, 1, and 2:
n f(n) F(n)
0 0.81873 0.81873
1 0.16375 0.98248
2 0.01637 0.99885
Thus the chance of 3 or more important ideas is: 1 - 0.99885 = 0.00115. Thus we expect: (10000)(0.00115) = 11.5 such pages.

20.4. C. f(x) = {(x+2-1)!/((x!)(2-1)!)} (0.1)^x/(1.1)^(x+2) = (x+1)(10/11)^2 (1/11)^x. One needs to sum the chances of having x = 0, 1, and 2:
n f(n) F(n)
0 0.82645 0.82645
1 0.15026 0.97671
2 0.02049 0.99720
Thus the chance of 3 or more important ideas is: 1 - 0.99720 = 0.00280. Thus we expect: (10000)(0.00280) = 28.0 such pages.
Comment: Note that the distributions of important ideas in these three questions all have a mean of 0.2. Since the Negative Binomial has the longest tail, it has the largest expected number of pages with lots of important ideas. Since the Binomial has the shortest tail, it has the smallest expected number of pages with lots of important ideas.
20.5. D. For the Primary Poisson a = 0 and b = λ1 = 1. The secondary Poisson has density at zero of e^(-0.2) = 0.8187.
The p.g.f. of the Primary Poisson is P(z) = e^(z-1). The density of the compound distribution at zero is the p.g.f. of the primary distribution at 0.8187: e^(0.8187-1) = 0.83421.
The densities of the secondary Poisson Distribution with λ = 0.2 are:
n:    0, 1, 2, 3, 4, 5
s(n): 0.818731, 0.163746, 0.016375, 0.001092, 0.000055, 0.000002
Use the Panjer Algorithm, c(x) = {1/(1 - a s(0))} Σ_{j=1}^x (a + jb/x) s(j) c(x-j) = (1/x) Σ_{j=1}^x j s(j) c(x-j).
c(1) = (1/1)(1) s(1) c(0) = (0.16375)(0.83421) = 0.13660.
c(2) = (1/2) {(1) s(1) c(1) + (2) s(2) c(0)} = (1/2){(0.16375)(0.13660) + (2)(0.01638)(0.83421)} = 0.02485.
Thus c(0) + c(1) + c(2) = 0.83421 + 0.13660 + 0.02485 = 0.99566.
Thus the chance of 3 or more important ideas is: 1 - 0.99566 = 0.00434. Thus we expect: (10000)(0.00434) = 43.4 such pages.
Comment: The Panjer Algorithm (recursive method) is discussed in “Mahlerʼs Guide to Aggregate Distributions.”
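The recursion used in this solution (and in 20.6 below) is mechanical enough that a short program is a useful check. The Python sketch below is only an illustration of the Panjer recursion for a compound Poisson-Poisson; the function name and arguments are my own choices, not notation from the text.

    import math

    def compound_poisson_poisson(lambda1, lambda2, max_n):
        s = [math.exp(-lambda2)]                       # secondary densities s(0), s(1), ...
        for k in range(1, max_n + 1):
            s.append(s[-1] * lambda2 / k)
        c = [math.exp(lambda1 * (s[0] - 1.0))]         # c(0) = p.g.f. of the primary at s(0)
        a, b = 0.0, lambda1                            # (a, b) values for the Poisson primary
        for x in range(1, max_n + 1):
            c.append(sum((a + j * b / x) * s[j] * c[x - j] for j in range(1, x + 1))
                     / (1.0 - a * s[0]))
        return c

    c = compound_poisson_poisson(1.0, 0.2, 2)
    print(c, 1.0 - sum(c))      # c(0), c(1), c(2) as in 20.5; the tail probability is about 0.00434

Running it with arguments (0.2, 1.0, 2) instead reproduces the densities and the 0.01912 tail probability of 20.6.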
20.6. E. For the Primary Poisson a = 0 and b = λ1 = 0.2. The secondary Poisson has density at zero of e^(-1) = 0.3679.
The p.g.f. of the Primary Poisson is P(z) = e^(0.2(z-1)). The density of the compound distribution at zero is the p.g.f. of the primary distribution at 0.3679: e^(0.2(0.3679-1)) = 0.88124.
The densities of the secondary Poisson Distribution with λ = 1 are:
n:    0, 1, 2, 3, 4, 5
s(n): 0.367879, 0.367879, 0.183940, 0.061313, 0.015328, 0.003066
Use the Panjer Algorithm, c(x) = {1/(1 - a s(0))} Σ_{j=1}^x (a + jb/x) s(j) c(x-j) = (0.2/x) Σ_{j=1}^x j s(j) c(x-j).
c(1) = (0.2/1)(1) s(1) c(0) = (0.2)(0.3679)(0.88124) = 0.06484.
c(2) = (0.2/2) {(1) s(1) c(1) + (2) s(2) c(0)} = (0.1){(0.3679)(0.06484) + (2)(0.1839)(0.88124)} = 0.03480.
Thus c(0) + c(1) + c(2) = 0.88124 + 0.06484 + 0.03480 = 0.98088.
Thus the chance of 3 or more important ideas is: 1 - 0.98088 = 0.01912. Thus we expect: (10000)(0.01912) = 191.2 such pages.
Comment: This Poisson-Poisson has a mean of 0.2, but an even longer tail than the previous Poisson-Poisson, which has the same mean. Note that it has a variance of (0.2)(1) + (1)^2(0.2) = 0.40, while the previous Poisson-Poisson has a variance of (1)(0.2) + (0.2)^2(1) = 0.24. The Negative Binomial has a variance of (2)(0.1)(1.1) = 0.22. The variance of the Poisson is 0.20. The variance of the Binomial is (5)(0.04)(0.96) = 0.192.

20.7. E[X^n] = Σ_{x=1}^∞ x^n {1/x^(ρ+1)}/ζ(ρ+1) = Σ_{x=1}^∞ {1/x^(ρ+1-n)}/ζ(ρ+1) = ζ(ρ+1-n)/ζ(ρ+1), n < ρ.
Comment: You are extremely unlikely to be asked about the zeta distribution! The zeta distribution is discrete and has a heavy righthand tail similar to a Pareto Distribution or a Single Parameter Pareto Distribution, with only some of its moments existing. The zeta distribution is mentioned in Exercise 16.33 in Loss Models.
ζ(2) = π^2/6. ζ(4) = π^4/90. See the Handbook of Mathematical Functions.
20.8. A. The uniform distribution is symmetric, so it has a skewness of zero. The Poisson has a positive skewness. The Bernoulli has a negative skewness for q = 0.8 > 0.5.
Comment: For the Poisson with mean µ, the skewness is 1/√µ.
For the Bernoulli, the skewness is: (1 - 2q)/√[q(1 - q)] = {1 - (2)(0.8)}/√[(0.8)(1 - 0.8)] = -1.5.
If one computes for this Bernoulli the third central moment, E[(X - 0.8)^3] = 0.2(0 - 0.8)^3 + 0.8(1 - 0.8)^3 = -0.096.
Thus the skewness is: -0.096/{(0.8)(1 - 0.8)}^1.5 = -1.5.
Section 21, Important Formulas and Ideas

Here are what I believe are the most important formulas and ideas from this study guide to know for the exam.

Basic Concepts (Section 2)
The mean is the average or expected value of the random variable.
The mode is the point at which the density function reaches its maximum.
The median, the 50th percentile, is the first value at which the distribution function is ≥ 0.5.
The 100p-th percentile is the first value at which the distribution function is ≥ p.
Variance = second central moment = E[(X - E[X])^2] = E[X^2] - E[X]^2.
Standard Deviation = Square Root of Variance.

Binomial Distribution (Section 3)
f(x) = {m!/(x! (m-x)!)} q^x (1 - q)^(m-x), 0 ≤ x ≤ m.
Mean = mq. Variance = mq(1-q).
Probability Generating Function: P(z) = {1 + q(z-1)}^m.
The Binomial Distribution for m = 1 is a Bernoulli Distribution.
If X is Binomial with parameters q and m1, and Y is Binomial with parameters q and m2, X and Y independent, then X + Y is Binomial with parameters q and m1 + m2.

Poisson Distribution (Section 4)
f(x) = λ^x e^(−λ)/x!, x ≥ 0.
Mean = λ. Variance = λ.
Probability Generating Function: P(z) = e^(λ(z-1)), λ > 0.
A Poisson is characterized by a constant independent claim intensity and vice versa.
The sum of two independent variables each of which is Poisson with parameters λ1 and λ2 is also Poisson, with parameter λ1 + λ2.
If frequency is given by a Poisson and severity is independent of frequency, then the number of claims above a certain amount (in constant dollars) is also a Poisson.
Geometric Distribution (Section 5)
f(x) = β^x/(1 + β)^(x+1).
Mean = β. Variance = β(1+β).
Probability Generating Function: P(z) = 1/{1 - β(z-1)}.
For a Geometric Distribution, for n > 0, the chance of at least n claims is: {β/(1+β)}^n.
For a series of independent identical Bernoulli trials, the chance of the first success following x failures is given by a Geometric Distribution with mean β = (chance of a failure)/(chance of a success).

Negative Binomial Distribution (Section 6)
f(x) = {r(r + 1)...(r + x - 1)/x!} β^x/(1+β)^(x+r).
Mean = rβ. Variance = rβ(1+β).
The Negative Binomial for r = 1 is a Geometric Distribution.
The Negative Binomial Distribution with parameters β and r, with r integer, can be thought of as the sum of r independent Geometric distributions with parameter β.
If X is Negative Binomial with parameters β and r1, and Y is Negative Binomial with parameters β and r2, X and Y independent, then X + Y is Negative Binomial with parameters β and r1 + r2.
For a series of independent identical Bernoulli trials, the chance of success number r following x failures is given by a Negative Binomial Distribution with parameters r and β = (chance of a failure)/(chance of a success).
Normal Approximation (Section 7)
In general, let µ be the mean of the frequency distribution, while σ is the standard deviation of the frequency distribution; then the chance of observing at least i claims and not more than j claims is approximately:
Φ[{(j + 0.5) - µ}/σ] - Φ[{(i - 0.5) - µ}/σ].

Normal Distribution
F(x) = Φ((x−µ)/σ).
f(x) = φ((x−µ)/σ)/σ = exp[-(x - µ)^2/(2σ^2)]/{σ√(2π)}, -∞ < x < ∞.
φ(x) = exp[-x^2/2]/√(2π), -∞ < x < ∞.
Mean = µ. Variance = σ^2.
Skewness = 0 (distribution is symmetric). Kurtosis = 3.
Skewness (Section 8)
Skewness = third central moment/STDDEV^3 = E[(X - E[X])^3]/STDDEV^3 = {E[X^3] - 3 E[X] E[X^2] + 2 E[X]^3}/Variance^(3/2).
A symmetric distribution has zero skewness.
Binomial Distribution with q < 1/2 ⇔ positive skewness ⇔ skewed to the right.
Binomial Distribution q = 1/2 ⇔ symmetric ⇒ zero skewness.
Binomial Distribution q > 1/2 ⇔ negative skewness ⇔ skewed to the left.
Poisson and Negative Binomial have positive skewness.

Probability Generating Function (Section 9)
Probability Generating Function, p.g.f.: P(z) = Expected Value of z^n = E[z^n] = Σ_{n=0}^∞ f(n) z^n.
The Probability Generating Function of the sum of independent frequencies is the product of the individual Probability Generating Functions. The distribution determines the probability generating function and vice versa.
f(n) = (d^n P(z)/dz^n evaluated at z = 0)/n!. f(0) = P(0). Pʼ(1) = Mean.
If a distribution is infinitely divisible, then if one takes the probability generating function to any positive power, one gets the probability generating function of another member of the same family of distributions. Examples of infinitely divisible distributions include: Poisson, Negative Binomial, Compound Poisson, Compound Negative Binomial, Normal, Gamma.

Factorial Moments (Section 10)
nth factorial moment = µ(n) = E[X(X-1)...(X+1-n)]. µ(n) = d^n P(z)/dz^n evaluated at z = 1.
Pʼ(1) = E[X]. Pʼʼ(1) = E[X(X-1)].
(a, b, 0) Class of Distributions (Section 11)
For each of these three frequency distributions: f(x+1)/f(x) = a + b/(x+1), x = 0, 1, ..., where a and b depend on the parameters of the distribution:

Distribution          a              b
Binomial              -q/(1-q)       (m+1)q/(1-q)
Poisson               0              λ
Negative Binomial     β/(1+β)        (r-1)β/(1+β)

Distribution          Mean    Variance
Binomial              mq      mq(1-q)
Poisson               λ       λ
Negative Binomial     rβ      rβ(1+β)

Distribution          f(0)           Variance Over Mean
Binomial              (1-q)^m        1-q < 1 (Variance < Mean)
Poisson               e^(−λ)         1 (Variance = Mean)
Negative Binomial     1/(1+β)^r      1+β > 1 (Variance > Mean)

Distribution          Thinning by factor of t    Adding n independent, identical copies
Binomial              q → tq                     m → nm
Poisson               λ → tλ                     λ → nλ
Negative Binomial     β → tβ                     r → nr
r → nr
2013-4-1,
Frequency Distributions, §21 Important Ideas
HCM 10/4/12,
Page 403
For X and Y independent: X Binomial(q, m1 )
Y Binomial(q, m2 )
X+Y Binomial(q, m1 + m2 )
Poisson(λ 1)
Poisson(λ 2)
Poisson(λ 1 + λ2)
Negative Binomial(β, r1 )
Negative Bin.(β, r2 )
Negative Bin.(β, r1 + r2 )
Accident Profiles (Section 12) For the Binomial, Poisson and Negative Binomial Distributions: (x+1) f(x+1) / f(x) = a(x + 1) + b, where a and b depend on the parameters of the distribution. a < 0 for the Binomial, a = 0 for the Poisson, and a > 0 for the Negative Binomial Distribution. Thus if data is drawn from one of these three distributions, then we expect (x+1) f(x+1) / f(x) for this data to be approximately linear with slope a; the sign of the slope, and thus the sign of a, distinguishes between these three distributions of the (a, b, 0) class. Zero-Truncated Distributions (Section 13) In general if f is a distribution on 0,1,2,3,..., then g(x) = f(x) / {1 - f(0)} is a distribution on 1,2,3, .... We have the following three examples: Distribution Density of the Zero-Truncated Distribution
Binomial
Poisson
Negative Binomial
m! qx (1- q)m - x x! (m - x)! 1 - (1- q)m e- λ λx / x! 1 - e- λ
x = 1, 2, 3,... , m
x = 1, 2, 3,...
βx r(r +1)...(r + x - 1) (1+ β)x + r x! x = 1, 2, 3,... 1 - 1/ (1+ β)r
The moments of a zero-truncated distribution, g, are given in terms of those of the corresponding untruncated distribution, f, by:
Eg [Xn ] = Ef[Xn ] / {1 - f(0)}.
⎛ β ⎞x ⎜ ⎟ ⎝ 1+ β ⎠ The Logarithmic Distribution has support equal to the positive integers: f(x) = . x ln(1+β)
2013-4-1,
Frequency Distributions, §21 Important Ideas
HCM 10/4/12,
Page 404
The (a,b,1) class of frequency distributions is a generalization of the (a,b,0) class. As with the (a,b,0) class, the recursion formula: f(x)/f(x-1) = a + b/x applies. However, it need only apply now for x ≥ 2, rather than x ≥ 1. Members of the (a,b,1) family include: all the members of the (a,b,0) family, the zero-truncated versions of those distributions: Zero-Truncated Binomial, Zero-Truncated Poisson, Extended Truncated Negative Binomial, and the Logarithmic Distribution. In addition the (a,b,1) class includes the zero-modified distributions corresponding to these. Zero-Modified Distributions (Section 14) If f is a distribution on 0,1,2,3,..., and 0 < pM 0 < 1, M then g(0) = pM 0 , g(x) = f(x){1 - p 0 } / {1 - f(0)}, x=1, 2 , 3..., is a distribution on 0, 1, 2, 3, ....
The moments of a zero-modified distribution g are given in terms of those of f by n Eg [Xn ] = (1- pM 0 ) Ef[X ] / {1 - f(0)}.
Compound Frequency Distributions (Section 15) A compound frequency distribution has a primary and secondary distribution, each of which is a frequency distribution. The primary distribution determines how many independent random draws from the secondary distribution we sum. p.g.f. of compound distribution = p.g.f. of primary dist.[p.g.f. of secondary dist.] P(z) = P1 [P2 (z)]. compound density at 0 = p.g.f. of the primary at the density at 0 of the secondary. Moments of Compound Distributions (Section 16) Mean of Compound Dist. = (Mean of Primary Dist.)(Mean of Sec. Dist.) Variance of Compound Dist. = (Mean of Primary Dist.)(Var. of Sec. Dist.) + (Mean of Secondary Dist.)2 (Variance of Primary Dist.) In the case of a Poisson primary distribution with mean λ, the variance of the compound distribution could be rewritten as: λ(2nd moment of Second. Dist.). The third central moment of a compound Poisson distribution = λ(3rd moment of Sec. Dist.).
Mixed Frequency Distributions (Section 17) The density function of the mixed distribution, is the mixture of the density function for specific values of the parameter that is mixed. The nth moment of a mixed distribution is the mixture of the nth moments. First one mixes the moments, and then computes the variance of the mixture from its first and second moments. The Probability Generating Function of the mixed distribution, is the mixture of the probability generating functions for specific values of the parameter. For a mixture of Poissons, the variance is always greater than the mean. Gamma Function (Section 18) The (complete) Gamma Function is defined as: Γ(α) =
∞
∞
0
0
∫ tα - 1 e - t dt = θ−α
∫ tα - 1e - t / θ
Γ(α) = (α-1)Γ(α-1)
Γ(α) = (α -1)! ∞
dt , for α ≥ 0 , θ ≥ 0.
∫ tα - 1 e - t / θ dt
= Γ(α) θα.
0
The Incomplete Gamma Function is defined as: Γ(α ; x) =
x
∫ tα - 1 e- t
0
dt / Γ(α).
Gamma-Poisson Frequency Process (Section 19) If one mixes Poissons via a Gamma, then the mixed distribution is in the form of the Negative Binomial distribution with r = α and β = θ. If one mixes Poissons via a Gamma Distribution with parameters α and θ, then over a period of length Y, the mixed distribution is Negative Binomial with r = α and β = Yθ. For the Gamma-Poisson, the variance of the mixed Negative Binomial is equal to: mean of the Gamma + variance of the Gamma. Var[X] = E[Var[X | λ]] + Var[E[X | λ]].
Mixing increases the variance.
Tails of Frequency Distributions (Section 20) From lightest to heaviest tailed, the frequency distribution in the (a,b,0) class are: Binomial, Poisson, Negative Binomial r > 1, Geometric, Negative Binomial r < 1.
Mahlerʼs Guide to
Loss Distributions Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-2 Howard Mahler
[email protected] www.howardmahler.com/Teaching
2013-4-2,
Loss Distributions,
HCM 10/8/12,
Page 1
Mahlerʼs Guide to Loss Distributions Copyright 2013 by Howard C. Mahler. The Loss Distributions concepts in Loss Models, by Klugman, Panjer, and WiIlmot, are demonstrated.1 Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (and sections whose titles are in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Highly Recommended problems are double underlined. Recommended problems are underlined. Solutions to the problems in each section are at the end of that section. Note that problems include both some written by me and some from past exams.2 The latter are copyright by the Casualty Actuarial Society and SOA and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. Greek letters used in Loss Models: α = alpha, β = beta, γ = gamma, θ = theta, λ = lambda, µ = mu, σ = sigma, τ = tau β = beta, used for the Beta and incomplete Beta functions. Γ = Gamma, used for the Gamma and incomplete Gamma functions. Φ = Phi, used for the Normal distribution. φ = phi, used for the Normal density function.
Π = Pi is used for the continued product just as Σ = Sigma is used for the continued sum
1
In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam questions, but it will not be specifically tested. 2 In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus.
Loss Distributions,
2013-4-2,
Page 2
HCM 10/8/12,
Loss Distributions as per Loss Models
Distribution Name
Distribution Function
Probability Density Function
Exponential
1 - e-x/θ
e-x/θ / θ
⎛ θ⎞ α 1 - ⎜ ⎟ , x > θ. ⎝ x⎠
Single Parameter Pareto
⎡ ⎛ x ⎞ τ⎤ ⎛ x ⎞τ τ ⎜ ⎟ exp ⎢-⎜ ⎟ ⎥ ⎝ θ⎠ ⎣ ⎝ θ⎠ ⎦ x
⎡ ⎛ x ⎞ τ⎤ 1 - exp⎢-⎜ ⎟ ⎥ ⎣ ⎝ θ⎠ ⎦
Weibull
(x / θ)α e- x/ θ x Γ(α)
Γ[α ; x/θ]
Gamma
LogNormal
⎡ ln(x) − µ ⎤ Φ⎢ ⎥⎦ ⎣ σ
Pareto
⎛ θ ⎞α 1 - ⎜ ⎟ ⎝ θ + x⎠
⎛x ⎞ Inverse Gaussian Φ ⎜ − 1⎟ ⎝µ ⎠
[
Inverse Gamma
α θα ,x>θ xα + 1
θ x
]
[
exp -
− µ)2 2σ2 x σ 2π
]
α θα (θ + x)α + 1
⎛x ⎞ + e2θ / µ Φ − ⎜ + 1⎟ ⎝µ ⎠
1 - Γ[α ; θ/x]
( ln(x)
x α -1 e - x / θ = θ α Γ(α)
[
θ x
]
⎛x θ⎜ ⎝µ θ exp 2x 2π x1 .5
θα e - θ / x xα + 1 Γ[α]
[
⎞2 1⎟ ⎠
]
Loss Distributions,
2013-4-2,
Page 3
HCM 10/8/12,
Moments of Loss Distributions as per Loss Models
Distribution Name
Mean
Variance
Moments
Exponential
θ
θ2
Single Parameter Pareto
αθ α −1
α θ2 (α − 1)2 (α − 2)
Weibull
θ Γ[1 + 1/τ]
Gamma
αθ
n! θn α θn , α> n α−n
θ2 {Γ([1 + 2/τ] − Γ[1 + 1/τ]2 }
θn Γ[1 + n/τ]
αθ2 θn
n−1
∏(α + i) = θn (α)...(α + n -1) = θn i=0
LogNormal
Pareto
exp[µ + σ2/2] θ α −1
exp[2µ + σ2] (exp[σ2] - 1) α θ2 (α − 1)2 (α − 2)
Γ[α + n] Γ[α]
exp[nµ + n2 σ2/2]
n
n! θn
∏ (α − i)
=
n! θ n , α> n (α − 1)...(α − n)
i=1
Inverse Gaussian
Inverse Gamma
µ
θ α −1
µ3/θ
eθ/µ
θ2 (α − 1)2 (α − 2)
n
2θ n µ Kn - 1/2 (θ/µ) µπ
θn
∏ (α − i) i=1
=
θn ,α>n (α − 1)...(α − n)
Loss Distributions,
2013-4-2,
A
B
C
D
E
F
G
H
I
HCM 10/8/12,
Section #
Pages
Section Name
1 2 3 4 5 6 7 8 9 10
9-10 11-27 28-47 48-57 58-62 63-64 67-71 72-78 79-86 87-99
11 12 13 14 15 16 17 18 19 20
100 101-114 115-127 128-139 140-157 158-170 171-180 181-212 213-218 219-223
Grouped Data Working with Grouped Data Uniform Distribution Statistics of Grouped Data Policy Provisions Truncated Data Censored Data Average Sizes Percentiles Definitions
21 22 23 24 25 26 27 28 29 30
224-228 229-247 248-262 263-320 321-336 337-349 350-363 364-371 372-384 385-410
Parameters of Distributions
31 32 33 34 35 36 37 38 39 40
411-468 469-506 507-538 539-562 563-580 581-666 667-731 732-784 785-825 826-849
Limited Expected Values Limited Higher Moments Mean Excess Loss Hazard Rate Loss Elimination Ratios and Excess Ratios The Effects of Inflation Lee Diagrams N-Point Mixtures of Models Continuous Mixtures of Models Spliced Models
41 42 43
850-853 854-873 874-893
Relationship to Life Contingencies Gini Coefficient Important Ideas & Formulas
Ungrouped Data Statistics of Ungrouped Data Coefficient of Variation, Skewness, and Kurtosis Empirical Distribution Function Limited Losses Losses Eliminated Excess Losses Mean Excess Loss Layers of Loss Average Size of Losses in an Interval
Exponential Distribution Single Parameter Pareto Distribution Common Two Parameter Distributions Other Two Parameter Distributions Three Parameter Distributions Beta Function and Distribution Transformed Beta Distribution Producing Additional Distributions Tails of Loss Distributions
Page 4
Loss Distributions,
2013-4-2,
HCM 10/8/12,
Exam 3/M Questions by Section of this Study Aid Section 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Sample
11/00
5/01
11/01
11/02
35
33 31
5
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
5/00
37 8
17 18
24
25
27 21
30
41-42
37
6
10
28 17
28
The CAS/SOA did not release the 5/02 and 5/03 exams.
Page 5
Loss Distributions,
2013-4-2, Sec.
CAS 3 11/03
SOA 3 11/03
5
39
CAS 3 5/04
CAS 3 11/04
HCM 10/8/12,
SOA 3 11/04
CAS 3 5/05
Page 6 SOA M 5/05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
26
35
32
22
17
34
20 21
25
35
16 21
3 28
7 24
19
7 27
4 30
9
29 17, 29, 34 33
20 23 18
33 30 28 29
Starting in 11/03, the CAS and SOA gave separate exams. The SOA did not release its 5/04 exam.
18 34 10
Loss Distributions,
2013-4-2, Sec.
CAS 3 11/05
SOA M 11/05
CAS 3 5/06
Page 7
HCM 10/8/12,
CAS 3 11/06
SOA M 11/06
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
26
6
20 8
19 22 10 11
25, 36
27 14
37
13
38 10, 11, 16
20, 31
29 21, 33
28
32
32 17, 20 35
The SOA did not release its 5/06 exam.
26, 39 28
30 20 18
Loss Distributions,
2013-4-2,
HCM 10/8/12,
Page 8
Course 4 Exam Questions by Section of this Study Aid3 Section Sample
5/00
11/00
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
5/01
11/01 11/02 11/03 11/04
5/05
3 36
7 2
6
7
18
26 39
37
13
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams. 3
5/07
3
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
11/05 11/06
Questions on more advanced ideas are in “Mahlerʼs Guide to Fitting Loss Distributions”.
13
2013-4-2,
Loss Distributions, §1 Ungrouped Data
HCM 10/8/12,
Page 9
Section 1, Ungrouped Data There are 130 losses of sizes: 300 400 2,800 4,500 4,900 5,000 7,700 9,600 10,400 10,600 11,200 11,400 12,200 12,900 13,400 14,100 15,500 19,300 19,400 22,100 24,800 29,600 32,200 32,500 33,700 34,300
37,300 39,500 39,900 41,200 42,800 45,900 49,200 54,600 56,700 57,200 57,500 59,100 60,800 62,500 63,600 66,400 66,900 68,100 68,900 71,100 72,100 79,900 80,700 83,200 84,500 84,600
86,600 88,600 91,700 96,600 96,900 106,800 107,800 111,900 113,000 113,200 115,000 117,100 119,300 122,000 123,100 126,600 127,300 127,600 127,900 128,000 131,300 132,900 134,300 134,700 135,800 146,100
150,300 171,800 173,200 177,700 183,000 183,300 190,100 209,400 212,900 225,100 226,600 233,200 234,200 244,900 253,400 261,300 261,800 273,300 276,200 284,300 316,300 322,600 343,400 350,700 395,800 406,900
423,200 437,900 442,700 457,800 463,000 469,300 469,600 544,300 552,700 566,700 571,800 596,500 737,700 766,100 846,100 852,700 920,300 981,100 988,300 1,078,800 1,117,600 1,546,800 2,211,000 2,229,700 3,961,000 4,802,200
Each individual value is shown, rather than the data being grouped into intervals. The type of data shown here is called individual or ungrouped data. Some students will find it helpful to put this data set on a computer and follow along with the computations in the study guide to the best of their ability.4 The best way to learn is by doing.
4
Even this data set is far bigger than would be presented on an exam. In many actual applications, there are many thousands of claims, but such a large data set is very difficult to present in a Study Aid. It is important to realize that with modern computers, actuaries routinely deal with such large data sets. There are other situations where all that is available is a small data set such as presented here.
2013-4-2,
Loss Distributions, §1 Ungrouped Data
HCM 10/8/12,
Page 10
This ungrouped data set is used in many examples throughout this study guide: 300, 400, 2800, 4500, 4900, 5000, 7700, 9600, 10400, 10600, 11200, 11400, 12200, 12900, 13400, 14100, 15500, 19300, 19400, 22100, 24800, 29600, 32200, 32500, 33700, 34300, 37300, 39500, 39900, 41200, 42800, 45900, 49200, 54600, 56700, 57200, 57500, 59100, 60800, 62500, 63600, 66400, 66900, 68100, 68900, 71100, 72100, 79900, 80700, 83200, 84500, 84600, 86600, 88600, 91700, 96600, 96900, 106800, 107800, 111900, 113000, 113200, 115000, 117100, 119300, 122000, 123100, 126600, 127300, 127600, 127900, 128000, 131300, 132900, 134300, 134700, 135800, 146100, 150300, 171800, 173200, 177700, 183000, 183300, 190100, 209400, 212900, 225100, 226600, 233200, 234200, 244900, 253400, 261300, 261800, 273300, 276200, 284300, 316300, 322600, 343400, 350700, 395800, 406900, 423200, 437900, 442700, 457800, 463000, 469300, 469600, 544300, 552700, 566700, 571800, 596500, 737700, 766100, 846100, 852700, 920300, 981100, 988300, 1078800, 1117600, 1546800, 2211000, 2229700, 3961000, 4802200
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 11
Section 2, Statistics of Ungrouped Data

For the ungrouped data in Section 1:
Average of X = E[X] = 1st moment = Σ x_i/n = 40647700/130 ≅ 312,674.6.
Average of X^2 = E[X^2] = 2nd moment about the origin = Σ x_i^2/n = 4.9284598 x 10^11.
(empirical) Mean = X̄ = 312,674.6.
(empirical) Variance = E[X^2] - E[X]^2 = 3.9508 x 10^11.
(empirical) Standard Deviation = Square Root of Variance = 6.286 x 10^5.
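For students who put the data set on a computer, the quantities above take only a few lines to reproduce. The sketch below is an illustration of mine; the short `losses` list is a hypothetical stand-in and should be replaced with the 130 values from Section 1.

    losses = [300, 400, 2800, 4500]          # stand-in; use the full 130-claim data set

    n = len(losses)
    mean = sum(losses) / n
    second_moment = sum(x * x for x in losses) / n
    empirical_variance = second_moment - mean ** 2          # biased version: divides by n
    sample_variance = empirical_variance * n / (n - 1)      # unbiased version: divides by n - 1
    std_dev = empirical_variance ** 0.5
    print(mean, empirical_variance, sample_variance, std_dev)

With the full data set this reproduces the empirical mean of about 312,674.6, the empirical variance of about 3.9508 x 10^11, and the sample variance of about 3.9814 x 10^11 discussed later in this section.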
Mean: The mean is the average or expected value of the random variable. Mean of the variable X = E[X]. Empirical Mean of a sample of random draws from X = X . The mean of the data allows you to set the scale for the fitted distribution. In general means add: E[X + Y] = E[X] + E[Y]. Also multiplying a variable by a constant multiplies the mean by the same constant; E[kX] = kE[X]. The mean is a linear operator, E[aX + bY] = aE[X] + bE[Y]. Mode: The mean differs from the mode which represents the value most likely to occur. For a continuous distribution function the mode is the point at which the density function reaches its maximum. For the empirical data in Section 1 there is no clear mode5. For discrete distributions, for example frequency distributions, the mode has the same definition but is easier to pick out. If one multiplies all the claims by a constant, the mode is multiplied by that same constant. Median: The median is that value such that half of the claims are on either side. At the median the distribution function is 0.5. The median is the 50th percentile. For a discrete loss distribution, one may linearly interpolate in order to estimate the median. If one multiplies all the claims by a constant, the median is multiplied by that same constant.
5
One would expect a curve fit to this data to have a mode much smaller than the mean.
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 12
The sample median for the data in Section 1 is about $121 thousand.6 This is much less than the sample mean of about $313 thousand. While the mean can be affected greatly by a few large claims, the median is affected equally by each claim size. For a continuous7 distribution with positive skewness typically: mean > median > mode (alphabetical order.) The situation is reversed for negative skewness. Also usually the median is closer to the mean than to the mode (just as it is in the dictionary.)8
Variance: The variance is the expected value of the squared difference of the variable and its mean. The variance is the second central moment. Var[X] = E[(X - E[X])2 ] = E[X2 ] - E[X]2 = second moment minus the square of the first moment. For the Ungrouped Data in Section 1, we calculate the empirical variance as: (1/N)Σ(Xi - X )2 = E[X2 ] - E[X]2 = 4.92845 x 1011 - 312674.62 = 3.9508 x 1011. Thus if X is in dollars, then Var[X] is in dollars squared. Multiplying a variable by a constant multiplies the variance by the square of that constant; Var[kX] = k2 Var[X]. In particular, Var[-X] = Var[X]. Exercise: Var[X] = 6. What is Var[3X]? [Solution: Var[3X] = 32 Var[X] = (9)(6) = 54.] For independent random variables the variances add.9 If X and Y are independent, then Var [X + Y] = Var [X] + Var [Y]. Also If X and Y are independent, then Var[aX + bY] = a2 Var[X] + b2 Var[Y]. In particular, Var [X - Y] = Var[X] + Var[Y], for X and Y independent. 6
The 65th out 130 claims is $119,300 and the 66th claim is $122,000. As discussed in “Mahlerʼs Guide to Fitting Loss Distributions”, a point estimate for the median would be at the (.5)(1+130) = 65.5th claim. So one would linearly interpolate half way between the 65th and 66th claim to get a point estimate of the median of: (.5)(119300) +(.5) (122000) = $120,650. 7 For frequency distributions the relationship may be different due to the fact that only certain discrete values can appear as the mode or median. 8 See page 49 of Kendallʼs Advanced Theory of Statistics, Volume 1 (1994) by Stuart & Ord. 9 In general Var[X+Y] = Var[X] + Var[Y] + 2Cov[X,Y], where Cov[X,Y] = E[XY] - E[X]E[Y] = covariance of X and Y. For X and Y independent E[XY] = E[X]E[Y] and Cov[X,Y] = 0.
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 13
Exercise: Var[X] = 6. Var[Y] = 8. X and Y are independent. What is Var[3X + 10Y]? [Solution: Var[3X + 10Y] = 9Var[X] + 100Var[Y] = 54 + 800 = 854.] Note that if X and Y are independent and identically distributed, then Var[X1 + X2 ] = 2 Var [X]. Adding up n such variables gives a variable with variance = nVar[X]. Exercise: Var[X] = 6. What is Var[X1 + X2 + X3 ]? [Solution: Var[X1 + X2 + X3 ] = Var[X] + Var[X] + Var[X] = 3Var[X] = (3)(6) = 18.] Averaging consists of summing n random draws, and then dividing by n. Averaging n such variables gives a variable with variance: Var[(1/n)ΣXi] = Var[ΣXi] / n2 = n Var[X] / n2 = Var[X] / n. Thus the sample mean has a variance that is inversely proportional to the number of points. Therefore, the sample mean has a standard deviation that is inversely proportional to the square root of the number of points. Exercise: Var[X] = 6. What is the variance of the average of 100 independent random draws from X? [Solution: Var[X] / n = 6/100 = .06.] While variances add for independent variables, more generally: Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y]. Exercise: Var[X] = 6, Var[Y] = 8, and Cov[X, Y] = 5. What is Var[X + Y]? [Solution: Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y] = 6 + 8 + (2)(5) = 24.]
Covariances and Correlations: The Covariance of two variables X and Y is defined by: Cov[X, Y] = E[XY] - E[X]E[Y]. Exercise: E[X] = 3, E[Y] = 5, and E[XY] = 25. What is the covariance of X and Y? [Solution: Cov[X,Y] = E[XY] - E[X]E[Y] = 25 - (3)(5) = 10.] Since Cov[X,X] = E[X2 ] - E[X]E[X] = Var[X], the covariance is a generalization of the variance.
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 14
Covariances have the following useful properties: Cov[X, aY] = aCov[X, Y]. Cov[X, Y] = Cov[Y, X]. Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z]. The Correlation of two random variables is defined in terms of their covariances: Cov[X ,Y] Corr[X, Y] = . Var[X] Var[Y] Exercise: Var[X] = 6, Var[Y] = 8, and Cov[X, Y] = 5. What is Corr[X, Y]? Cov[X ,Y] 5 [Solution: Corr[X,Y] = = = 0.72.] (6) (8) Var[X] Var[Y] The correlation is always between -1 and +1. Corr[X, Y] = Corr[Y, X] Corr[X, X] = 1 Corr[X, -X] = -1 ⎧Corr[X, Y] if a > 0 ⎪ Corr[X, aY] = ⎨ 0 if a = 0 ⎪-Corr[X, Y] if a < 0 ⎩ Corr[X, aX] = 1 if a > 0 Two variables that are proportional with a positive proportionality constant are perfectly correlated and have a correlation of one. Closely related variables, such as height and weight, have a correlation close to but less than one. Unrelated variables have a correlation near zero. Inversely related variables, such as the average temperature and use of heating oil, are negatively correlated. Standard Deviation: The standard deviation is the square root of the variance. If X is in dollars, then the standard deviation of X is also in dollars. STDDEV[kX] = kSTDDEV[X]. Exercise: Var[X] = 16. Var[Y] = 9. X and Y are independent. What is the standard deviation of X + Y? [Solution: Var[X + Y] = 16 + 9 = 25. StdDev[X + Y] = 25 = 5. Comment: Standard deviations do not add. In the exercise, 4 + 3 ≠ 5.]
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 15
Exercise: Var[X] = 16. What is the standard deviation of the average of 100 independent random draws from X? [Solution: variance of the average = Var[X] / n = 16/100 = .16. standard deviation of the average = Alternately, StdDev[X] =
0.16 = 0.4.
16 = 4.
standard deviation of the average = StdDev[X] /
n = 4/10 = 0.4.]
Sample Variance: Sample Mean = ΣXi / N = X .
∑ (Xi Note that the variance as calculated above,
- X )2
N
, is a biased estimator of the variance of the
distribution from which this data set was drawn. The sample variance is an unbiased estimator of the variance of the distribution from which a data set was drawn:10
Sample variance ≡
∑ (Xi
- X )2
N - 1
.
For the Ungrouped Data in Section 1, we calculate the sample variance as:
∑ (Xi
- X )2
N - 1
= 3.9814 x 1011.
For 130 data points, the sample variance is the empirical variance multiplied by: N / (N - 1) = 130/129.
10
Bias is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
Problems: Use the following information for the next two questions: Prob[ X = 1] = 70%, Prob[X = 5] = 20%, Prob[X =10] = 10%. 2.1 (1 point) What is the mean of X? A. less than 3.0 B. at least 3.0 but less than 3.5 C. at least 3.5 but less than 4.0 D. at least 4.0 but less than 4.5 E. at least 4.5 2.2 (1 point) What is the variance of X? A. less than 6 B. at least 6 but less than 7 C. at least 7 but less than 8 D. at least 8 but less than 9 E. at least 9 Use the following data set for the next two questions: 4, 7, 13, 20. 2.3 (1 point) What is the mean? A. 8
B. 9
C. 10
D. 11
E. 12
2.4 (1 point) What is the sample variance? A. 30
B. 35
C. 40
D. 45
2.5 (1 point) You are given the following: • Let X be a random variable X. • Y is defined to be X/2. Determine the correlation coefficient of X and Y. A. 0.00 B. 0.25 C. 0.50 D. 0.75
E. 50
E. 1.00
2.6 (2 points) X and Y are two independent variables. E[X] = 3. Var[X] = 5. E[Y] = 6. Var[Y] = 2. Let Z = XY. Determine the standard deviation of Z. A. 14 B. 16 C. 18 D. 20 E. 22
HCM 10/8/12,
Page 16
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 17
2.7 (1 point) Let X1 , X2 , ..., X20, be 20 independent, identically distributed random variables, each of which has variance of 17. If one estimates the mean by the average of the 20 observed values, what is the variance of this estimate? A. less than .6 B. at least .6 but less than .7 C. at least .7 but less than .8 D. at least .8 but less than .9 E. at least .9 2.8 (1 point) Let X and Y be independent random variables. Which of the following statements are true? 1. If Z is the sum of X and Y, the variance of Z is the sum of the variance of X and the variance of Y. 2. If Z is the difference between X and Y, the variance of Z is the difference between the variance of X and the variance of Y. 3. If Z is the product of X and Y, then the expected value of Z is the product of the expected values of X and Y. A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A ,B, C, or D Use the following information for the next two questions: Height of Husband (inches): 66 68 69 71 Height of Wife (inches): 64 63 67 65
73 69
2.9 (2 points) What is the covariance of the heights of husbands and wives? A. 2 B. 3 C. 4 D. 5 E. 6 2.10 (2 points) What is the correlation of the heights of husbands and wives? A. 70% B. 75% C. 80% D. 85% E. 90% 2.11 (1 point) Let x and y be two independent random draws from a continuous distribution. Demonstrate that the mean squared difference between x and y is twice the variance of the distribution. 2.12 (2, 5/83, Q.24) (1.5 points) Let the random variable X have the density function f(x) = kx for 0 < x < 2 / k . If the mode of this distribution is at x = then what is the median of this distribution? A.
2 /6
B. 1/4
C.
2 /4
D.
2 /24
E. 1/2
2 /4,
2013-4-2,
Loss Distributions, §2 Stats Ungrouped Data
HCM 10/8/12,
Page 18
2.13 (2, 5/83, Q.30) (1.5 point) Below are shown the probability density functions of two symmetric bounded distributions with the same median.
Which of the following statements about the means and standard deviations of the two distributions are true? A. µII > µI and σII = σI
B. µII > µI and σII > σI
C. µI = µII and σII < σI
D. µI = µII and σI < σII
E. Cannot be determined from the given information
2.14 (2, 5/83, Q.49) (1.5 point) Let X and Y be random variables with Var(X) = 4, Var(Y) = 9, and Var(X - Y) = 16. What is Cov(X, Y)? A. -3/2 B. -1/2 C. 1/2 D. 3/2 E. 13/16 2.15 (2, 5/85, Q.5) (1.5 points) Let X and Y be random variables with variances 2 and 3, respectively, and covariance -1. Which of the following random variables has the smallest variance? A. 2X + Y B. 2X - Y C. 3X - Y D. 4X E. 3Y 2.16 (2, 5/85, Q.11) (1.5 points) Let X and Y be independent random variables, each with density f(t) = 1/(2θ) for -θ < t < θ. If Var(XY) = 64/9, then what is θ? A. 1
B. 2
C. 4√3/3
D. 2√2
E. 8√3/3
2.17 (2, 5/85, Q.47) (1.5 points) Let X be a random variable with finite variance. If Y = 15 - X, then determine Corr[X, (X + Y)X]. A. -1 B. 0 C. 1/15 D. 1 E. Cannot be determined from the information given. 2.18 (4, 5/86, Q.31) (1 point) Which of the following statements are true about the distribution of a random variable X? 1. If X is discrete, the value of X which occurs most frequently is the mode. 2. If X is continuous, the expected value of X is equal to the mode of X. 3. The median of X is the value of X which divides the distribution in half. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3
2.19 (4, 5/86, Q.32) (1 point) Let X, Y and Z be random variables. Which of the following statements are true? 1. The variance of X is the second moment about the origin of X. 2. If Z is the product of X and Y, then the expected value of Z is the product of the expected values of X and Y. 3. The expected value of X is equal to the expectation over all possible values of Y, of the conditional expectation of X given Y. A. 2 B. 3 C. 1, 2 D. 1, 3 E. 2, 3 2.20 (2, 5/88, Q.43) (1.5 points) X, Y, and Z have means 1, 2, and 3, respectively, and variances 4, 5, and 9, respectively. The covariance of X and Y is 2, the covariance of X and Z is 3, and the covariance of Y and Z is 1. What are the mean and variance, respectively, of the random variable 3X + 2Y - Z? A. 4 and 31 B. 4 and 65 C. 4 and 67 D. 14 and 13 E. 14 and 65 2.21 (4, 5/89, Q.26) (1 point) If the random variables X and Y are not independent, which of the following equations will still be true? 1. E(X + Y) = E(X) + E(Y) 2. E(XY) = E(X) E(Y) 3. Var (X + Y) = Var (X) + Var (Y) A. 1 B. 2 C. 1, 2 D. 1, 3 E. None of the above 2.22 (2, 5/90, Q.18) (1.7 points) Let X be a continuous random variable with density function f(x) = x(4 - x)/9 for 0 < x < 3. What is the mode of X? A. 4/9 B. 1 C. 1/2 D. 7/4 E. 2 2.23 (2, 5/92, Q.2) (1.7 points) Let X be a random variable such that E(X) = 2, E(X3 ) = 9, and E[(X - 2)3 ] = 0. What is Var(X)? A. 1/6 B. 13/6 C. 25/6
D. 49/6
E. 17/2
2.24 (4B, 5/93, Q.9) (1 point) If X and Y are independent random variables, which of the following statements are true?
1. Var[X + Y] = Var[X] + Var[Y]
2. Var[X - Y] = Var[X] + Var[Y]
3. Var[aX + bY] = a²E[X²] - a(E[X])² + b²E[Y²] - b(E[Y])²
A. 1   B. 1,2   C. 1,3   D. 2,3   E. 1,2,3
2.25 (4B, 5/94, Q.5) (2 points) Two honest, six-sided dice are rolled, and the results D1 and D2 are observed. Let S = D1 + D2 . Which of the following are true concerning the conditional distribution of D1 given that S<6? 1. The mean is less than the median. 2. The mode is less than the mean. 3. The probability that D1 = 2 is 1/3. A. 2
B. 3
C. 1, 2
D. 2, 3
E. None of A, B, C, or D
2.26 (Course 160 Sample Exam #3, 1994, Q.1) (1.9 points) You are given: (i) T is the failure time random variable. (ii) f(t) = {(10 - t)/10}9 for 0 < t ≤ 10. Calculate the ratio of the mean of T to the median of T. (A) 0.67 (B) 0.74 (C) 1.00 (D) 1.36 (E) 1.49 2.27 (2, 2/96, Q.25) (1.7 points) The sum of the sample mean and median of ten distinct data points is equal to 20. The largest data point is equal to 15. Calculate the sum of the sample mean and median if the largest data point were replaced by 25. A. 20 B. 21 C. 22 D. 30 E. 31 2.28 (Course 1 Sample Exam, Q.9) (1.9 points) The distribution of loss due to fire damage to a warehouse is: Amount of Loss Probability 0 0.900 500 0.060 1,000 0.030 10,000 0.008 50,000 0.001 100,000 0.001 Given that a loss is greater than zero, calculate the expected amount of the loss. A. 290 B. 322 C. 1,704 D. 2,900 E. 32,222
2.29 (1, 5/00, Q.8) (1.9 points) A probability distribution of the claim sizes for an auto insurance policy is given in the table below: Claim Size Probability 20 0.15 30 0.10 40 0.05 50 0.20 60 0.10 70 0.10 80 0.30 What percentage of the claims are within one standard deviation of the mean claim size? (A) 45% (B) 55% (C) 68% (D) 85% (E) 100% 2.30 (IOA 101, 9/00, Q.2) (3 points) Consider a random sample of 47 white-collar workers and a random sample of 24 blue-collar workers from the workforce of a large company. The mean salary for the sample of white-collar workers is 28,470 and the standard deviation is 4,270; whereas the mean salary for the sample of blue-collar workers is 21,420 and the standard deviation is 3,020. Calculate the mean and the standard deviation of the salaries in the combined sample of 71 employees. 2.31 (1, 11/00, Q.1) (1.9 points) A recent study indicates that the annual cost of maintaining and repairing a car in a town in Ontario averages 200 with a variance of 260. If a tax of 20% is introduced on all items associated with the maintenance and repair of cars (i.e., everything is made 20% more expensive), what will be the variance of the annual cost of maintaining and repairing a car? (A) 208 (B) 260 (C) 270 (D) 312 (E) 374 2.32 (1, 11/00, Q.38) (1.9 points) The profit for a new product is given by Z = 3X - Y - 5. X and Y are independent random variables with Var(X) = 1 and Var(Y) = 2. What is the variance of Z? (A) 1 (B) 5 (C) 7 (D) 11 (E) 16 2.33 (1, 11/01, Q.7) (1.9 points) Let X denote the size of a surgical claim and let Y denote the size of the associated hospital claim. An actuary is using a model in which E(X) = 5, E(X2 ) = 27.4, E(Y) = 7, E(Y2 ) = 51.4, and Var(X+Y) = 8. Let C1 = X+Y denote the size of the combined claims before the application of a 20% surcharge on the hospital portion of the claim, and let C2 denote the size of the combined claims after the application of that surcharge. Calculate Cov(C1 , C2 ). (A) 8.80 (B) 9.60 (C) 9.76 (D) 11.52 (E) 12.32
2.34 (1, 5/03, Q.15) (2.5 points) An insurance policy pays a total medical benefit consisting of two parts for each claim. Let X represent the part of the benefit that is paid to the surgeon, and let Y represent the part that is paid to the hospital. The variance of X is 5000, the variance of Y is 10,000, and the variance of the total benefit, X + Y, is 17,000. Due to increasing medical costs, the company that issues the policy decides to increase X by a flat amount of 100 per claim and to increase Y by 10% per claim. Calculate the variance of the total benefit after these revisions have been made. (A) 18,200 (B) 18,800 (C) 19,300 (D) 19,520 (E) 20,670
Solutions to Problems: 2.1. A. E[X] = (70%)(1) + (20%)(5) + (10%)(10) = 2.7. 2.2. D. E[X2 ] = (70%)(12 ) + (20%)(52 ) + (10%)(102 ) = 15.7. Var[X] = 15.7 - 2.72 = 8.41. Alternately, Var[X] = (70%)(1 - 2.7)2 + (20%)(5 - 2.7)2 + (10%)(10 - 2.7)2 = 8.41. 2.3. D. mean = (4 + 7 + 13 + 20)/4 = 11. 2.4. E. sample variance = {(4 - 11)2 + (7 - 11)2 + (13 - 11)2 + (20 - 11)2 }/(4 -1) = 50. 2.5. E. Var[Y] = Var[X/2] = Var[X]/4. Cov[X, Y] = Cov[X, X/2] =Cov[X, X]/2 = Var[X]/2. Therefore, Corr[X, Y] = (Var[X]/2) / Var[X] Var[X]/ 4 = 1. Comments: Two variables that are proportional with a positive proportionality constant are perfectly correlated and have a correlation of one. 2.6. A. E[Z] = E[X]E[Y] = (3)(6) = 18. E[Z2 ] = E[X2 ]E[Y2 ] = (5 + 32 )(2 + 62 ) = 532. Var[Z] = 532 - 182 = 208. StdDev[Z] =
√208 = 14.4.
2.7. D. The estimated mean is: (1/20) Σ xi, summed over i = 1 to 20. Therefore,
Var(mean) = Var[Σ xi/20] = (1/20²) Σ Var(xi) = (20)(1/20²) Var(x) = 17/20 = 0.85.
Comment: Since the xi are independent, Var(x1 +x2 ) = Var(x1 )+Var(x2 ). Since they are identically distributed Var(x1 )= Var(x2 ). Since Var(aY) = a2 Var(Y), Var(x1 /20) = (1/202 )Var(x1 ). Note that as the number of observations n increases, the variance of the mean decreases as 1/n. 2.8. E. 1. True, since X and Y are independent. 2. False. In general VAR[X - Y] = VAR[X] + VAR[Y] - 2COV[X, Y]. When X and Y are independent, Cov[X, Y] = 0 and therefore, VAR[X - Y] = VAR[X] + VAR[Y]. 3. True. In general E[XY] ≠ E[X]E[Y], although this is true if X and Y are independent.
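The 1/n behavior noted in this comment can be checked by a small simulation, sketched below in Python (my own illustration; any distribution with variance 17 would do, a Normal is used here only for convenience):

import random

def variance_of_sample_mean(n=20, var=17.0, trials=100000):
    # Draw n values with variance var, record their mean, and estimate
    # the variance of that mean over many trials.
    sd = var ** 0.5
    means = [sum(random.gauss(0.0, sd) for _ in range(n)) / n for _ in range(trials)]
    grand_mean = sum(means) / trials
    return sum((m - grand_mean) ** 2 for m in means) / trials

print(variance_of_sample_mean())  # close to 17/20 = 0.85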
2.9. C. Let H = husbands heights and W = wives heights. E[H] = 69.4. E[W] = 65.6. E[HW] = {(66)(64) + (68)(63) + (69)(67) + (71)(65) + (73)(69)}/5 = 4556.6. Cov[H, W] = 4556.6 - (69.4)(65.6) = 3.96. 2.10. B. Var[H] = {(66 -69.4)2 + (68 -69.4)2 + (69 -69.4)2 + (71 -69.4)2 + (73 -69.4)2 }/5 = 5.84. Var[W] = {(64 -65.6)2 + (63 -65.6)2 + (67 -65.6)2 + (65 -65.6)2 + (69 -65.6)2 }/5 = 4.64. Corr[H, W] = Cov[H, W]/ Var[H]Var[W] = 3.96/ (5.84)(4.64) } = 0.761. 2.11. E[(X - Y)2 ] = E[X2 - 2XY + Y2 ] = E[X2 ] - 2E[XY] + E[Y2 ] = E[X2 ] - 2 E[X]E[Y] + E[X2 ] = 2E[X2 ] - 2E[X]2 = 2 Var[X]. 2.12. B. The mode is where the density is largest, which in this case is at the righthand endpoint of the support,
√(2/k).
√(2/k) = √2/4. ⇒ √k = 4. ⇒ k = 16.
f(x) = 16x. F(x) = 8x2 . At the median, F(x) = 0.5. ⇒ 8x2 = 0.5. ⇒ x = 1/4. 2.13. D. The means are each equal to the medians, since the distributions are symmetric. The two medians are equal, therefore so are the means. The second central moment of distribution II is larger than that of distribution I, since distribution II is more dispersed around its mean. σII2 > σI2. ⇒ σII > σI. 2.14. A. 16 = Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y). Cov(X, Y) = -(16 - 4 - 9)/2 = -3/2. 2.15. A. Var[2X + Y] = (4)(2) + 3 + (2)(2)(-1) = 7. Var[2X - Y] = (4)(2) + 3 - (2)(2)(-1) = 15. Var[3X - Y] = (9)(2) + 3 - (2)(3)(-1) = 27. Var[4X] = (16)(2) = 32. Var[3Y] = (9)(3) = 27. 2X + Y has the smallest variance. 2.16. D. E[X] = E[Y] = 0. Var[X] = Var[Y] = (2θ)2 /12 = θ2/3. E[X2 ] = E[Y2 ] = θ2/3. Var[XY] = E[(XY)2 ] - E[XY]2 = E[X2 ] E[Y2 ] - E[X]2 E[Y]2 = (θ2/3)(θ2/3) - 0 = θ4/9. θ4/9 = 64/9. ⇒ θ =
√8 = 2√2.
2.17. D. X + Y = 15. Corr[X, (X + Y)X] = Corr[X, 15X] = 1. Comment: Two variables that are proportional with a positive constant have a correlation of 1. 2.18. C. 1. True. 2. False. The expected value of X is the mean. Usually the mean and the mode are not equal. 3. True.
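A quick numerical check of Solutions 2.9 and 2.10 above, in Python (my own sketch; the covariance and variances use divisor N, matching the solutions):

def covariance(xs, ys):
    # Biased covariance: divide by N.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

husbands = [66, 68, 69, 71, 73]
wives = [64, 63, 67, 65, 69]
print(covariance(husbands, wives))   # 3.96
print(correlation(husbands, wives))  # about 0.761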
2.19. B. 1. False. The variance is the second central moment: VAR[X] = E[(X-E[X])2] = E[X2] - E[X]2. The second moment around the origin is E[X2]. 2. False. COVAR[X,Y] = E[XY] - E[X]E[Y], so statement 2 only holds when the covariance of X and Y is zero. (This is true if X and Y are independent.) 3. True. E[X] = EY[E[X|Y]]. 2.20. C. E[3X + 2Y - Z] = 3E[X] + 2E[Y] - E[Z] = (3)(1) + (2)(2) - 3 = 4. Var[3X + 2Y - Z] = 9Var[X] + 4Var[Y] + Var[Z] + 12Cov[X, Y] - 6Cov[X, Z] - 4Cov[Y, Z] = (9)(4) + (4)(5) + 9 + (12)(2) - (6)(3) - (4)(1) = 67. 2.21. A. Means always add, so statement 1 is true. E[XY] =E[X]E[Y] + COVAR[X,Y], therefore E[XY] = E[X]E[Y] if and only if the covariance of X and Y is zero. Thus statement 2 is not true in general. In general VAR[X+Y] = VAR[X] + VAR[Y] + 2 COVAR[X,Y]. If X and Y are independent then their covariance is zero, and statement 3 would hold. However, statement 3 is not true in general. 2.22. E. fʼ(x) = (4 - x)/9 - x/9 = 0. ⇒ x = 2. f(2) = 4/9. Check endpoints: f(0) = 0, f(3) = 1/3. 2.23. A. 0 = E[(X - 2)3 ] = E[X3 ] - 6E[X2 ] + 12E[X] - 8. ⇒ 9 - 6E[X2 ] + (12)(2) - 8 = 0.
⇒ E[X²] = 25/6. ⇒ Var(X) = E[X²] - E[X]² = 25/6 - 2² = 1/6.
2.24. B. 1. True. 2. True. 3. For X and Y Independent, Var[aX + bY] = a²Var[X] + b²Var[Y] = a²E[X²] - a²E[X]² + b²E[Y²] - b²E[Y]², therefore Statement #3 is False.
2.25. A. When S < 6 we have the following equally likely possibilities (x marks a possible value of D2), along with the conditional density function of D1 given that S < 6:
D1    D2: 1  2  3  4    Possibilities    Conditional density of D1 given S < 6
1         x  x  x  x    4                4/10
2         x  x  x       3                3/10
3         x  x          2                2/10
4         x             1                1/10
The mean of the conditional density function of D1 given that S<6 is:
(.4)(1) + (.3)(2) + (.2)(3) + (.1)(4) = 2. The median is equal to 2, since the Distribution Function at 2 is .7 ≥ .5, but at 1 it is .4 < .5. The mode is 1, since that is the value at which the density is a maximum. 1. F, 2. T, 3. F.
2.26. D. S(t) = {(10 - t)/10}¹⁰. E[T] = ∫₀¹⁰ S(t) dt = 10/11 = 0.9091.
Set 0.5 = S(t) = {(10 - t)/10}10. ⇒ Median = 10(1 - 0.50.1) = 0.6697. Mean/Median = 0.9091/0.6697 = 1.357. 2.27. B. The sample median remains the same, while the sample mean is increased by (25 - 15)/10 = 1. The sum of the sample mean and median is now: 20 + 1 = 21. 2.28. D. {(500)(0.060) + (1,000)(0.030) + (10,000)(0.008) + (50,000)(0.001) + (100,000)(0.001)}/(.06 + .03 + .008 + .001 + .001) = 29000/.10 = 2900. 2.29. A. mean = (20)(0.15) + (30)(0.10) + (40)(0.05)+ (50)(0.20) + (60)(0.10) + (70)(0.10) + (80)(0.30) = 55. second moment = (202 )(0.15) + (302 )(0.10) + (402 )(0.05)+ (502 )(0.20) + (602 )(0.10) + (702 )(0.10) + (802 )(0.30) = 3500. standard deviation = 3500 - 552 = 21.79. Prob[within one standard deviation of the mean] = Prob[33.21 ≤ X ≤ 76.79] = .05 + .20 + .10 + .10 = 45%. 2.30. Total is: (47)(28470) + (24)(21420) = 1,852,170. Overall mean is: 1,852,170/(47 + 24) = 26,087. The overall second moment is: {(47)(4,2702 + 284702 ) + (24)(3,0202 + 214202 )}/(47 + 24) = 706,800,730. Overall variance is: 706,800,730 - 26,0872 = 26,269,161. Overall standard deviation is:
√26,269,161 = 5125.
2.31. E. When one multiplies a variable by a constant, in this case 1.2, one multiplies the variance by the square of that constant, in this case 1.2² = 1.44. (1.44)(260) = 374.4.
2.32. D. Var[Z] = Var[3X - Y - 5] = 9Var[X] + Var[Y] = (9)(1) + 2 = 11.
2.33. A. Var[X] = 27.4 - 5² = 2.4. Var[Y] = 51.4 - 7² = 2.4.
Cov[X, Y] = (Var[X+Y] - Var[X] - Var[Y])/2 = (8 - 2.4 - 2.4)/2 = 1.6.
Cov[C1, C2] = Cov[X + Y, X + 1.2Y] = Cov[X, X] + Cov[X, 1.2Y] + Cov[Y, X] + Cov[Y, 1.2Y]
= Var[X] + 1.2 Cov[X, Y] + Cov[Y, X] + 1.2 Cov[Y, Y] = Var[X] + 1.2 Var[Y] + 2.2 Cov[X, Y] = 2.4 + (1.2)(2.4) + (2.2)(1.6) = 8.8.
2.34. C. Cov[X, Y] = (Var[X + Y] - Var[X] - Var[Y])/2 = 1000. Adding a flat amount of 100 does not affect the variance. Var[X + 1.10Y] = Var[X] + 1.21Var[Y] + 2(1.1)Cov[X, Y] = 5000 + (1.21)(10000) + (2.2)(1000) = 19,300.
Section 3, Coefficient of Variation, Skewness, and Kurtosis
The coefficient of variation, skewness, and kurtosis all help to describe the shape of a size of loss distribution. For the ungrouped data in Section 1:
Average of X³ = E[X³] = 3rd moment about the origin = Σ xi³ / n = 1.600225 x 10¹⁸.
Average of X⁴ = E[X⁴] = 4th moment about the origin = Σ xi⁴ / n = 6.465278 x 10²⁴.
Coefficient of Variation = Standard Deviation / Mean = 6.286 x 10⁵ / 312675 = 2.01.
(Coefficient of) Skewness = γ1 = E[(X - E[X])³] / STDDEV³ = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³ = 4.83.
Kurtosis = γ2 = E[(X - E[X])⁴] / Variance² = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² = 30.3.
Coefficient of Variation:
The coefficient of variation (CV) = standard deviation / mean. The coefficient of variation measures how dispersed the sizes of loss are around their mean. The larger the coefficient of variation the more dispersed the distribution. The coefficient of variation helps describe the shape of the distribution.
Exercise: Let 5 and 77 be the first two moments (around the origin) of a distribution.
What is the coefficient of variation of this distribution?
[Solution: Variance = 77 - 5² = 52. CV = √52 / 5 = 1.44.]
Since if X is in dollars then both the standard deviation and the mean are in dollars, the coefficient of variation is a dimensionless quantity; i.e., it is a pure number which is not in any particular currency. Thus the coefficient of variation of X is unaffected if X is multiplied by a constant.
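A brief Python sketch of the coefficient of variation (my own function, using the biased variance with divisor N as in the exercises of this guide) illustrates that multiplying every loss by a constant leaves the CV unchanged; the five losses $300, $600, $1,200, $1,500, $2,800 from the exercises are used:

def coefficient_of_variation(data):
    # CV = standard deviation / mean, with the biased variance (divide by N).
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return variance ** 0.5 / mean

losses = [300, 600, 1200, 1500, 2800]
print(coefficient_of_variation(losses))                    # about 0.680
print(coefficient_of_variation([10 * x for x in losses]))  # same value: scale invariant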
When adding two independent random variables: CV[X + Y] = √(Var[X] + Var[Y]) / (E[X] + E[Y]).
In particular, when adding two independent identically distributed random variables: CV[X + X] = CV[X] / √2.
So if one adds up more and more independent identically distributed random variables, then the coefficient of variation declines towards zero.11 The following formula for unity plus the square of the coefficient of variation follows directly from the definition Coefficient of Variation. C V2 = Variance / E[X]2 = (E[X2 ] - E[X]2 ) / E[X]2 = (E[X2 ] / E[X]2 ) - 1. Thus, 1 + CV2 = E[X2 ] / E[X]2 = 2nd moment divided by the square of the mean. Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800. Determine the empirical coefficient of variation. [Solution: X = (300 + 600 + 1200 + 1500 + 2800)/5 = 1280. Variance = {(300 - 1280)2 + (600 - 1280)2 + (1200 - 1280)2 + (1500 - 1280)2 + (2800 - 1280)2 }/5 = 757,600. CV =
√757,600 / 1280 = 0.680.
Alternately, 2nd moment is: (3002 + 6002 + 12002 + 15002 + 28002 )/5 = 2,396,000. 1 + CV2 = 2,396,000 / 12802 = 1.4624. ⇒ CV = 0.680. Comment: Note the use of the biased estimator of the variance rather than the sample variance. The CV would be the same using each of the losses divided by 100.] Skewness: The (coefficient of) skewness is defined as the 3rd central moment divided by the cube of the standard deviation: Skewness = γ1 =
E[(X - E[X])³] / STDDEV³.
The third central moment can be written in terms of moments around the origin:
E[(X - E[X])³] = E[X³ - 3X²E[X] + 3XE[X]² - E[X]³] = E[X³] - 3 E[X] E[X²] + 3E[X]E[X]² - E[X]³ = E[X³] - 3 E[X] E[X²] + 2 E[X]³.
E[(X - E[X])³] = E[X³] - 3 E[X] E[X²] + 2 E[X]³.
Exercise: Let 5, 77, 812 be the first three moments (around the origin) of a distribution.
What is the skewness of this distribution?
[Solution: Variance = 77 - 5² = 52. Third central moment = E[X³] - 3 E[X] E[X²] + 2 E[X]³ = 812 - (3)(5)(77) + 2(5³) = -93.
Skewness = Third central moment / Variance^1.5 = -93/52^1.5 = -0.248.]
11 This is the fundamental idea behind the usefulness of Credibility.
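A short Python sketch of this moment formula (my own function name), reproducing the exercise above with moments 5, 77, and 812:

def skewness_from_moments(m1, m2, m3):
    # m1, m2, m3 are moments about the origin: E[X], E[X^2], E[X^3].
    variance = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return third_central / variance ** 1.5

print(skewness_from_moments(5, 77, 812))  # about -0.248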
The skewness helps to describe the shape of the distribution. Typically size of loss distributions have positive skewness, such as the following Pareto Distribution with mode of zero:12
[Graph: Pareto density with mode at zero, plotted for x from 0 to 100,000.]
Or the following LogNormal Distribution with a positive mode:
[Graph: LogNormal density with a positive mode, plotted for x from 0 to 100,000.]
Positive skewness ⇔ skewed to the right. There is a significant probability of very large results.
A symmetric distribution has zero skewness.13 While a symmetric curve has a skewness of zero, the converse is not true.
12 The Pareto Distribution is discussed in the subsequent section on Common Two Parameter Distributions.
13 For example, the Normal distribution has zero skewness.
The following Weibull Distribution has skewness of zero, but it is not symmetric:14
[Graph: Weibull density with skewness of zero, plotted for x from 0 to 100,000.]
The following Weibull Distribution has negative skewness and is skewed to the left:15
[Graph: Weibull density with negative skewness, plotted for x from 0 to 100,000.]
14 With τ = 3.60235. See the Section on Common Two Parameter Distributions.
15 With τ = 6. The skewness depends on the value of the shape parameter tau. For τ > 3.60235, the Weibull has negative skewness. For τ < 3.60235, the Weibull has positive skewness.
If X is in dollars, both the third central moments of X and the cube of the standard deviation are in dollars cubed. Therefore the skewness is a dimensionless quantity; i.e., it is a pure number which is not in any particular currency. Thus the skewness of X is unaffected if X is multiplied by a positive constant. However, Skew[-X] = -Skew[X].16 Thus if X has positive skewness then -X has negative skewness.17 Exercise: The skewness of a random variable X is 3.5. What is the skewness of 1.1X? [Solution: 3.5. The skewness is unaffected when a variable is multiplied by a positive constant. Comment: This could be due to the impact of 10% inflation.] Exercise: The skewness of a random variable X is 3.5. What is the skewness of -1.1X? [Solution: -3.5. The skewness is multiplied by -1 when a variable is multiplied by a negative constant.] The numerator and the denominator of the skewness both involve central moments. The numerator is the third central moment, while the denominator is the second central moment taken to the 3/2 power. Therefore they are unaffected by the addition or subtraction of a constant. Therefore, the skewness of X + c is the same as the skewness of X. Translating a curve to the left or the right does not change its shape; specifically it does not change its skewness. Exercise: The skewness of a random variable X is 3.5. What is the skewness of 10X + 7? [Solution: 3.5. The skewness is unaffected when a variable is multiplied by a positive constant. Also, the skewness is unaffected when a constant is added.] Note that skewnesses do not add. However, since third central moments of independent variables do add, one can derive useful formulas.18
16 The numerator of the skewness is negative of what it was, but the denominator is unaffected since the standard deviation is never negative by definition.
17 If X is skewed to the right, then -X, which is X reflected in the Y-Axis, is skewed to the left.
18 For X and Y independent the 2nd and 3rd central moments add; the 4th central moment and higher central moments do not add. Cumulants of independent variables add and the 2nd and 3rd central moments are equal to the 2nd and 3rd cumulants. See for example, Practical Risk Theory for Actuaries, by Daykin, Pentikainen and Pesonen.
For X and Y independent: 3rd central moment of X+Y = 3rd central moment of X + 3rd central moment of Y = Skew[X]Var[X]^1.5 + Skew[Y]Var[Y]^1.5.
Thus for X and Y independent:
Skew[X + Y] = {Skew[X] Var[X]^1.5 + Skew[Y] Var[Y]^1.5} / (Var[X] + Var[Y])^1.5.
In particular, when adding two independent identically distributed random variables, Skew[X + X] = Skew[X]/√2. As we add more identically distributed random variables the skewness goes to zero; the sum goes to a symmetric distribution.19
The coefficient of variation and the skewness are useful summary statistics that describe the shape of the distribution. They give you an idea of which type of distribution is likely to fit. Note that the Coefficient of Variation and Skewness do not depend on the scale parameter if any.
Most size of loss distributions have a positive skewness (skewed to the right), with a few very large claims and many smaller claims. The more of the total dollars of loss represented by the rare large claims, the more skewed the distribution.20
Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800.
Determine the empirical coefficient of skewness.
[Solution: From a previous exercise X̄ = 1280, and the variance = 757,600.
3rd central moment = {(300 - 1280)³ + (600 - 1280)³ + (1200 - 1280)³ + (1500 - 1280)³ + (2800 - 1280)³}/5 = 453,264,000.
Skewness = 453,264,000 / 757,600^1.5 = 0.687.
Comment: We have again used the “biased” estimate of the variance rather than the sample variance. The skewness would be the same using each of the losses divided by 100.]
19 Note the relation to the central limit theorem, where a sum of standardized identical distributions goes to a symmetric normal distribution.
20 As discussed subsequently, this situation is referred to as a heavy-tailed distribution.
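The empirical skewness in the exercise above can be reproduced with a few lines of Python (my own sketch; the biased variance with divisor N is used, as in the text):

def empirical_skewness(data):
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n       # biased variance
    third_central = sum((x - mean) ** 3 for x in data) / n
    return third_central / variance ** 1.5

print(empirical_skewness([300, 600, 1200, 1500, 2800]))  # about 0.687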
Kurtosis: The kurtosis is defined as the fourth central moment divided by the square of the variance.
Kurtosis = E[(X - E[X])⁴] / Variance².
Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800. Determine the empirical kurtosis. [Solution: From a previous exercise X = 1280, and the variance = 757,600. 4th central moment = {(300 - 1280)4 + (600 - 1280)4 + (1200 - 1280)4 + (1500 - 1280)4 + (2800 - 1280)4 }/5 = 1,295,302,720,000. Kurtosis = 1,295,302,720,000/ 757,6002 = 2.257. Comment: The kurtosis would be the same using each of the losses divided by 100.] As with the skewness, the kurtosis is a dimensionless quantity, which describes the shape of the distribution.21 Thus the kurtosis is unaffected when a variable is multiplied by a (non-zero) constant. Since the fourth central moment is always non-negative, so is the kurtosis. Large kurtosis ⇔ a heavy-tailed distribution. Exercise: The kurtosis of a random variable X is 3.5. What is the kurtosis of 1.1X? [Solution: 3.5. The kurtosis is unaffected when a variable is multiplied by a constant. Comment: This exercise could be referring to the impact of 10% inflation.] Exercise: The kurtosis of a random variable X is 3.5. What is the kurtosis of -1.1X? [Solution: 3.5. The kurtosis is unaffected when a variable is multiplied by a non-zero constant. Comment: Remember that the kurtosis is always positive.] The numerator and the denominator of the kurtosis both involve central moments. The numerator is the fourth central moment, while the denominator is the second central moment squared. Therefore they are unaffected by the addition or subtraction of a constant; the kurtosis of X + c is the same as the kurtosis of X. Translating a curve to the left or the right does not change its shape; it does not change its kurtosis.
21 Both the numerator and denominator are in dollars to the fourth power.
Exercise: The kurtosis of a random variable X is 3.5. What is the kurtosis of 10X + 7? [Solution: 3.5. The kurtosis is unaffected when a variable is multiplied by a non-zero constant. Also, the kurtosis is unaffected when a constant is added.] If X is a Normal Distribution, then (X-µ)/σ is a Standard Normal. X = σ(Standard Normal + µ/σ) = a constant times (Standard Normal plus another constant.) Thus all Normal Distributions have the same kurtosis as a Standard Normal. It turns out that, all Normal Distributions have a kurtosis of 3. Distributions with a kurtosis less than 3 are lighter-tailed than a Normal Distribution. Distributions with a kurtosis more than 3 are heavier-tailed than a Normal Distribution; they have their densities go to zero more slowly as x approaches infinity than a Normal. Most size of loss distributions encountered in practice have a kurtosis greater than 3. For example, the kurtosis of a Gamma Distribution with shape parameter α is: 3 + 6/α. Exercise: What is the 4th central moment in terms of moments around the origin? [Solution: The 4th central moment is: E[(X - E[X])4 ] = E[X4 - 4E[X]X3 + 6E[X]2 X2 - 4E[X]3 X + E[X]4 ] = E[X4 ] - 4E[X]E[X3 ] + 6E[X]2 E[X2 ] - 4E[X]3 E[X] + E[X]4 = E[X4 ] - 4E[X]E[X3 ] + 6E[X]2 E[X2 ] - 3E[X]4 .] Thus we have the formula for the Kurtosis in terms of moments around the origin: Kurtosis = γ2 =
E[(X - E[X])⁴] / Variance² = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance².
The empirical kurtosis of the ungrouped data in Section 1 is:
{E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² =
{6.465278 x 10²⁴ - (4)(312,674.6)(1.600225 x 10¹⁸) + (6)(312,674.6²)(4.9284598 x 10¹¹) - (3)(312,674.6⁴)} / (3.9508 x 10¹¹)² = 30.3.
It should be noted that empirical estimates of the kurtosis are subject to large estimation errors, since the empirical kurtosis is very heavily affected by the absence or presence of a few large claims.
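The same formula can be applied directly to a sample by averaging powers of the observations; the Python sketch below (my own, using the biased variance) reproduces the kurtosis of 2.257 computed earlier for the five losses $300, $600, $1,200, $1,500, and $2,800:

def empirical_kurtosis(data):
    n = len(data)
    e1 = sum(data) / n
    e2 = sum(x ** 2 for x in data) / n
    e3 = sum(x ** 3 for x in data) / n
    e4 = sum(x ** 4 for x in data) / n
    variance = e2 - e1 ** 2                                 # biased variance
    fourth_central = e4 - 4 * e1 * e3 + 6 * e1 ** 2 * e2 - 3 * e1 ** 4
    return fourth_central / variance ** 2

print(empirical_kurtosis([300, 600, 1200, 1500, 2800]))  # about 2.257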
Exercise: Let 5, 77, 812, 10423 be the first four moments (around the origin) of a distribution.
What is the kurtosis of this distribution? [Solution: Variance = 77 - 5² = 52.
Kurtosis = {E[X⁴] - 4 E[X] E[X³] + 6 E[X]² E[X²] - 3 E[X]⁴} / Variance² =
{10423 - (4)(5)(812) + (6)(52 )(77) - (3)(54 ) }/522 = 1.43.] Jensenʼs Inequality states that for a convex function f, E[f(X)] ≥ f(E[X]).22 f(x) = x2 is an example of a convex function; its second derivative is positive. Therefore, by Jensenʼs Inequality, E[X2 ] ≥ E[X]2 .23 Letting X = Y2 , we therefore have that E[Y4 ] ≥ E[Y2 ]2 . The fourth moment is greater than or equal to the square of the second moment. Letting X = (Y - µY)2 , we therefore have that E[(Y - µY)4 ] ≥ E[(Y - µY)2 ]2 . The fourth central moment is greater than or equal to the square of the variance. Therefore, the Kurtosis is always greater than or equal to one. In fact, Kurtosis ≥ 1 + Skewness2 .24 Exercise: Let Prob[X = -10] = 50% and Prob[X = 10] = 50%. Determine the skewness and kurtosis of X. [Solution: Since this distribution is symmetric around its mean of 0, skewness = 0. Variance = 102 = 100. Fourth Central Moment = 104 . Kurtosis = 104 /1002 = 1.] Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800. Determine the empirical kurtosis. [Solution: From a previous exercise X = 1280, and the variance = 757,600. 4th central moment = {(300 - 1280)4 + (600 - 1280)4 + (1200 - 1280)4 + (1500 - 1280)4 + (2800 - 1280)4 }/5 = 1,295,302,720,000. Kurtosis = 1,295,302,720,000/ 757,6002 = 2.257.] When computing the empirical coefficient of variation, skewness, or kurtosis, we use the biased estimate of the variance, with n in the denominator, rather than the sample variance. We do so since everyone else does.25 22
See for example Actuarial Mathematics. This also follows from the fact that the variance is never negative. 24 See Exercise 3.19 in Volume I of Kendallʼs Advanced Theory of Statistics. 25 See for example, 4, 5/01, Q.3. 23
Problems: 3.1 (1 point) A size of loss distribution has moments as follows: First moment = 3, Second moment = 50, Third Moment = 2000. Determine the skewness. A. less than 6 B. at least 6 but less than 6.2 C. at least 6.2 but less than 6.4 D. at least 6.4 but less than 6.6 E. at least 6.6 Use the following information for the next 4 questions: E[X] = 5, E[X2 ] = 42.8571, E[X3 ] = 584.184, E[X4 ] = 11,503.3. 3.2 (1 point) What is the variance of X? A. less than 17 B. at least 17 but less than 18 C. at least 18 but less than 19 D. at least 19 but less than 20 E. at least 20 3.3 (1 point) What is the coefficient of variation of X? A. less than 0.6 B. at least 0.6 but less than 0.7 C. at least 0.7 but less than 0.8 D. at least 0.8 but less than 0.9 E. at least 0.9 3.4 (2 points) What is the skewness of X? A. less than 2.4 B. at least 2.4 but less than 2.5 C. at least 2.5 but less than 2.6 D. at least 2.6 but less than 2.7 E. at least 2.7 3.5 (3 points) What is the kurtosis of X? A. less than 10 B. at least 10 but less than 11 C. at least 11 but less than 12 D. at least 12 but less than 13 E. at least 13
3.6 (1 point) Let X be a random variable. Which of the following statements are true? 1. A measure of skewness of X is E[X3 ] / Var[X]3/2. 2. The measure of skewness is positive if X has a heavy tail to the right. 3. If X is given by Standard Unit Normal, then X has kurtosis equal to one. A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D 3.7 (3 points) There are 10,000 claims observed as follows: Size of Claim Number of Claims 100 9000 200 800 300 170 400 30 Which of the following statements are true? 1. The mean of this distribution is 112.3. 2. The variance of this distribution is 14210. 3. The skewness of this distribution is positive. A. None of 1, 2, or 3 B. 1 C. 2 D. 3 E. None of A, B, C, or D Use the following information for the next three questions: There are five losses of sizes: 5, 10, 20, 50, 100. 3.8 (2 points) What is the empirical coefficient of variation? A. 0.90 B. 0.95 C. 1.00 D. 1.05 E. 1.1 3.9 (2 points) What is the empirical coefficient of skewness? A. 0.8 B. 0.9 C. 1.0 D. 1.2 E. 1.4 3.10 (2 points) What is the empirical kurtosis? A. 1.1 B. 1.4 C. 1.7 D. 2.0
E. 2.3
3.11 (3 points) f(x) = 2x, 0 < x < 1. Determine the skewness. A. -0.6 B. -0.3 C. 0 D. 0.3
E. 0.6
3.12 (3 points) f(x) = 1, 0 < x < 1. Determine the skewness. A. -0.6 B. -0.3 C. 0 D. 0.3
E. 0.6
3.13 (3 points) f(x) = 2(1 - x), 0 < x < 1. Determine the skewness. A. -0.6 B. -0.3 C. 0 D. 0.3
E. 0.6
3.14 (4, 5/86, Q.33) (1 point) Which of the following statements are true about the random variable X? 1. If X is given by a unit normal distribution, then X has its measure of skewness equal to one. 2. A measure of the skewness of X is E[X3 ]/(VAR[X])3 . 3. The measure of skewness of X is positive if X has a heavy tail to the right. A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3 3.15 (4, 5/88, Q.30) (1 point) Let X be a random variable with mean m, and let ar denote the rth moment of X about the origin. Which of the following statements are true? 1. m = a1 2. The third central moment is equivalent to a3 + 3a2 - 2a1 3 . 3. The variance of X is the second central moment of X. A. 1 B. 2 C. 2, 3 D. 1, 3
E. 1, 2 and 3
3.16 (4, 5/89, Q.27) (1 point) There are 30 claims for a total of $180,000. Given the following claim size distribution, calculate the coefficient of skewness. Claim Size ( $000 ) Number of Claims 2 2 4 6 6 12 8 10 A. Less than -.6 B. At least -.6, but less than -.2 C. At least -.2, but less than .2 D. At least .2, but less than .6 E. .6 or more 3.17 (2 points) In the previous question, determine the kurtosis. A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0
3.18 (4B, 5/93, Q.34) (1 point) Claim severity has the following distribution: Claim Size Probability $100 0.05 $200 0.20 $300 0.50 $400 0.20 $500 0.05 Determine the distribution's measure of skewness. A. -0.25 B. 0.00 C. 0.15 D. 0.35 E. Cannot be determined 3.19 (2 points) In the previous question, determine the kurtosis. A. 1.0 B. 1.5 C. 2.0 D. 2.5 E. 3.0 3.20 (4B, 5/95, Q.28) (2 points) You are given the following: • For any random variable X with finite first three moments, the skewness of the distribution of X is denoted Sk(X). • X and Y are independent, identically distributed random variables with mean = 0 and finite second and third moments. Which of the following statements must be true? 1. 2Sk(X) = Sk(2X) 2. -Sk(Y) = Sk(-Y) 3. |Sk(X)| ≥ |Sk(X+Y)| A. 2 B. 3 C. 1,2 D. 2,3 E. None of A, B, C, or D 3.21 (4B, 5/97, Q.21) (2 points) You are given the following: • Both the mean and the coefficient of variation of a particular distribution are 2.
• The third moment of this distribution about the origin is 136.
Determine the skewness of this distribution.
Hint: The skewness of a distribution is defined to be the third central moment divided by the cube of the standard deviation.
A. 1/4   B. 1/2   C. 1   D. 4   E. 17
3.22 (4B, 11/99, Q.29) (2 points) You are given the following:
• A is a random variable with mean 5 and coefficient of variation 1.
• B is a random variable with mean 5 and coefficient of variation 1.
• C is a random variable with mean 20 and coefficient of variation 1/2.
• A, B, and C are independent.
• X = A + B
• Y = A + C
Determine the correlation coefficient between X and Y.
A. -2/√10   B. -1/√10   C. 0   D. 1/√10   E. 2/√10
3.23 (4, 5/01, Q.3) (2.5 points) You are given the following times of first claim for five randomly selected auto insurance policies observed from time t = 0: 1, 2, 3, 4, 5. Calculate the kurtosis of this sample. (A) 0.0 (B) 0.5 (C) 1.7 (D) 3.4 (E) 6.8 3.24 (4, 11/06, Q.3 & 2009 Sample Q.248) (2.9 points) You are given a random sample of 10 claims consisting of two claims of 400, seven claims of 800, and one claim of 1600. Determine the empirical skewness coefficient. (A) Less than 1.0 (B) At least 1.0, but less than 1.5 (C) At least 1.5, but less than 2.0 (D) At least 2.0, but less than 2.5 (E) At least 2.5
Solutions to Problems:
3.1. B. Stand. deviation = √41 = 6.403. Skewness = (2000 - 450 + 54) / 262.5 = 6.1.
3.2. B. Variance = 42.8571 - 52 = 17.8571. Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and θ = 7. The variance for an Inverse Gaussian is: µ3/θ = 125 / 7 = 17.8571. 3.3. D. CV =
17.8571 / 5 = 0.845.
Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and θ = 7. The coefficient of variation for an Inverse Gaussian is:
3.4. C. Skewness =
√(µ/θ) = √(5/7) = 0.845.
E[X3] - 3 E[X] E[X2] + 2 E[X] 3 = STDDEV 3
{584.184 - (3)(5)(42.8571) + (2)(125)}/ 17.85711.5 = 191.3275 / 75.4599 = 2.535. Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and θ = 7. The skewness for an Inverse Gaussian is 3 µ /θ = 3 5 /7 = 2.535.
3.5. E. Kurtosis =
E[X4] - 4 E[X] E[X3] - 6 E[X]2 E[X2] - 3 E[X]4 = Variance2
{11503.3 -(4)(5)(584.184) + (6)(25)(42.8571) - (3)(625)}/17.85712 = 4373.19/318.88 = 13.71. Comment: The given moments are for an Inverse Gaussian Distribution with parameters µ = 5 and θ = 7. The kurtosis for an Inverse Gaussian is 3 + 15µ/θ = 96/7 = 13.71. 3.6. C. 1. The numerator of the skewness should be the third central moment: E[(X - E[X])3 ] = E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 . Thus Statement 1 is not true in general. 2. Statement 2 is true. A good example is the Pareto Distribution. 3. The Normal Distribution has a kurtosis of three, thus Statement #3 is not true.
2013-4-2,
Loss Distributions, §3 CV Skewness Kurtosis HCM 10/8/12, Page 43
3.7. E. The mean is: 1,123,000 / 10000 = 112.3. So Statement #1 is true. The second moment is: 142,100,000 / 10,000 = 14,210, thus the variance is: 14210 - 11.232 = 1598.7. Thus Statement #2 is false. A
B
C
D
E
Size of Claim Number of Claims Col.A x Col.B Sq. of Col.A x Col.B Cube of Col.A x Col.B 100 9000 900,000 90,000,000 9,000,000,000 200 800 160,000 32,000,000 6,400,000,000 300 170 51,000 15,300,000 4,590,000,000 400 30 12,000 4,800,000 1,920,000,000 10000
1,123,000
142,100,000
21,910,000,000
E[X] = 1,123,000 / 10,000 = 112.3. E[X2 ] = 142,100,000 / 10,000 = 14,210. E[X3 ] = 21,910,000,000 / 10,000 = 2,191,000. STDDEV = Skewness =
1598.7 = 39.98.
E[X3] - 3 E[X] E[X2] + 2 E[X] 3 3 = STDDEV 3
{2,191,000 - (3)(112.3)( 14,210) + (2)(112.3)3 )} / 39.983 = 3.7. Thus Statement #3 is true. 3.8. B. X = (5 + 10 + 20 + 50 + 100)/5 = 37. Variance = {(5 - 37)2 + (10 - 37)2 + (20 - 37)2 + (50 - 37)2 + (100 - 37)2 }/5 = 1236. CV = 1236 / 37 = 0.950. Comment: Note the use of the biased estimator of the variance rather than the sample variance. 3.9. B. Third Central Moment = {(5 - 37)3 + (10 - 37)3 + (20 - 37)3 + (50 - 37)3 + (100 - 37)3 }/5 = 38,976. Skewness = 38,976 / 12361.5 = 0.897.
3.10. E. 4th Central Moment = {(5 - 37)4 + (10 - 37)4 + (20 - 37)4 + (50 - 37)4 + (100 - 37)4 }/5 = 3,489,012. Kurtosis = 3,489,012 / 12362 = 2.284. 1
1
1
0
0
0
3.11. A. E[X] = ∫ xf(x)dx = 2/3. E[X2 ] = ∫ x2 f(x)dx = 1/2. E[X3 ] = ∫ x3 f(x)dx = 2/5. variance = (1/2) - (2/3)2 = 1/18. third central moment = E[X3 ] - 3E[X]E[X2 ] + 2E[X]3 = 2/5 - (3)(2/3)(1/2) + 2(2/3)3 = -0.0074074. Skewness = -0.0074074/(1/18)1.5 = -0.5657. Comment: A Beta Distribution with a = 2, b = 1, and θ = 1. Skewness = 2 (b - a)
a + b + 1 / {(a + b + 2) a b } = -2 4 / {5 2 } = -0.5657.
2013-4-2,
Loss Distributions, §3 CV Skewness Kurtosis HCM 10/8/12, Page 44
3.12. C. The distribution is symmetric around its mean of 1/2. ⇒ The skewness is 0. 1
1
1
0
0
0
E[X] = ∫ xf(x)dx = 1/2. E[X2 ] = ∫ x2 f(x)dx = 1/3. E[X3 ] = ∫ x3 f(x)dx = 1/4. variance = (1/3) - (1/2)2 = 1/12. third central moment = E[X3 ] - 3E[X]E[X2 ] + 2E[X]3 = 1/4 - (3)(1/2)(1/3) + 2(1/2)3 = 0. Skewness = 0/(1/12)1.5 = 0. 1
1
0
0
3.13. E. E[X] = ∫ xf(x)dx = 1 - 2/3 = 1/3. E[X2 ] = ∫ x2 f(x)dx = 2/3 - 1/2 = 1/6. 1
variance = (1/6) - (1/3)2 = 1/18. E[X3 ] = ∫ x3 f(x)dx = 1/2 - 2/5 = 1/10. 0
3rd central moment = E[X3 ] - 3E[X]E[X2 ] + 2E[X]3 = 1/10 - (3)(1/3)(1/6) + 2(1/3)3 = 0.0074074. Skewness = 0.0074074/(1/18)1.5 = 0.5657. Comment: A Beta Distribution with a = 1, b = 2, and θ = 1. Skewness = 2 (b - a)
a + b + 1 / {(a + b + 2) a b } = 2 4 / {5 2 } = 0.5657.
3.14. C. 1. False . The Normal Distribution is symmetric with a skewness of zero. 2. False. The numerator should be the third central moment, while the denominator should be the standard deviation cubed. SKEW[X] = E[(X-E[X])3 ]/(VAR[X])3/2 . 3. True. 3.15. D. Statement one is true, the mean is the 1st moment around the origin: E[X]. Statement 2 is false. The 3rd central moment = E[(X-m)3 ] = a3 - 3a2 a1 + 2a1 3 ; Statement 3 is true, the variance is the 2nd central moment: E[(X-m)2]. Comment: In the 3rd central moment each term must be in dollars cubed.
2013-4-2,
Loss Distributions, §3 CV Skewness Kurtosis HCM 10/8/12, Page 45
3.16. B. & 3.17. D. Calculate the moments: Number of Claims
Size of Claim
Square of Size of Claim
Cube of Size of Claim
2 6 12 10
2 4 6 8
4 16 36 64
8 64 216 512
Average
6.000
39.200
270.400
E[X] = 6 , E[X2 ] = 39.2, E[X3 ] = {(2)(23 ) +(6)(43 ) +(12)(63 ) +(10)(83 ) }/(2+6+10+12) = 270.4. Variance = E[X2 ] - E[X]2 = 39.2 - 62 = 3.2. (Coefficient of) Skewness = {E[X3 ] - (3 E[X] E[X2 ]) + (2 E[X]3 )} / STDDEV3 = {(270.4) - (3)(6)(39.2) +(2)(6)3 } / (3.2)3/2 = -3.2 / 5.724 = -0.56. Alternately, Third central moment = {2(2 - 6)3 + 6(4 - 6)3 + 12(6 - 6)3 + 10(8 - 6)3 }/30 = -3.2. Skewness = -3.2/ (3.2)3/2 = -0.56. Fourth central moment = {2(2 - 6)4 + 6(4 - 6)4 + 12(6 - 6)4 + 10(8 - 6)4 }/30 = 25.6. Kurtosis = (Fourth central moment)/Variance2 = 25.6/3.22 = 2.5. Comment: The distribution is skewed to the left and therefore has a negative skewness. 3.18. B. A symmetric distribution has zero skewness. 3.19. E. Calculate the moments: Probability
Size of Claim ($00)
Square of Size of Claim
5% 20% 50% 20% 5%
1 2 3 4 5
1 4 9 16 25
Average
3.0
9.8
E[X] = 3 , E[X2 ] = 9.8. Variance = E[X2 ] - E[X]2 = 9.8 - 32 = 0.8. 4th central moment = (.05)(1 - 3)4 + (.2)(2 - 3)4 + (.5)(3 - 3)4 + (.2)(4 - 3)4 + (.05)(5 - 3)4 = 2. Kurtosis = (Fourth central moment)/Variance2 = 2/0.82 = 3.125. Comment: The kurtosis does not depend on the scale. So dividing all of the claim sizes by 100 makes the arithmetic easier, but does not affect the answer.
2013-4-2,
Loss Distributions, §3 CV Skewness Kurtosis HCM 10/8/12, Page 46
3.20. D. Statement 1 is false. The skewness is a dimensionless quantity; i.e., it is a pure number which is not in any particular currency. Thus the skewness of X is unaffected if X is multiplied by a positive constant. In this specific case both the 3rd central moment and the cube of the standard deviation are multiplied by 23 = 8. Therefore the skewness which is their ratio is unaffected. Statement 2 is true. The skewness is defined as the 3rd central moment divided by the cube of the standard deviation. The former is multiplied by -1 since by definition the third central moment is E[(XE[X])3 ]. Alternately, recall that the third central moment = µ3 ′ - (3 µ1 ′ µ2 ′) + (2 µ1 ′3), each of whose terms is multiplied by -1. (The odd powered moments around the origin are each multiplied by -1, while the even powered moments are unaffected.) The cube of the standard deviation is unaffected since the standard deviation is always positive. Statement 3 is true. Skewnesses do not add. However, since third central moments of independent variables do add, for X and Y independent, 3rd central moment of X+Y = 3rd central moment of X + 3rd central moment of Y = Skew[X]Var[X]1.5 + Skew[Y]Var[Y]1.5. Thus for X and Y independent, Skew[X+Y] = {Skew[X]Var[X]1.5 + Skew[Y]Var[Y]1.5} / {Var[X] + Var [Y]}1.5. In particular, when adding two independent identically distributed random variables, Skew[X + Y] = Skew[X] / √2 ≤ Skew [X]. Comment: Long and difficult. Tests important concepts. Statement 2 says that if X is skewed to the right, then -X is skewed to the left by the same amount. 3.21. B. We are given E[X3 ] = 136, E[X] = 2 and CV = σ/ E[X] = 2. Therefore σ = 4. Therefore E[X2 ] = σ2 + E[X]2 = 42 + 22 = 20. Skewness = {E[X3 ] - (3 E[X] E[X2 ]) + (2 E[X]3 )} / σ3 = {136 - (3)(20)(2) + 2(23 )} / 43 = 32/64 = 1/2. 3.22. D. Var[A] = {(mean)(CV)}2 = 25. Var[B] = {(5)(1)}2 = 25. Var[C] = {(20)(1/2)}2 = 100. Var[X] = Var[A] + Var[B] = 25 + 25 = 50, since A and B are independent. Var[Y] = Var[A] + Var[C] = 25 + 100 = 125, since A and C are independent. Cov[X, Y] = Cov[A+B, A+C] = Cov[A, A] + Cov[A, C] + Cov[B, A] + Cov[B, C] = Var[A] + 0 + 0 + 0 = 25. Corr[X , Y] = Cov[X , Y] /
Var[X] Var[Y] = 25 /
(50)(125) = 1/
10 .
Comment: Since A, B, and C are independent, Cov[A, C] = Cov[B, A] = Cov[B, C] = 0. 3.23. C. Mean = (1 + 2 + 3 + 4 +5)/5 = 3. Variance = 2nd central moment = {(1-3)2 + (2-3)2 + (3-3)2 + (4-3)2 + (5-3)2 }/5 = 2. 4th central moment = {(1-3)4 + (2-3)4 + (3-3)4 + (4-3)4 + (5-3)4 }/5 = 34/5. Kurtosis = the fourth central moment divided by the variance squared = (34/5)/22 = 1.7. Comment: We use the biased estimator of the variance rather than the sample variance.
2013-4-2,
Loss Distributions, §3 CV Skewness Kurtosis HCM 10/8/12, Page 47
3.24. B. E[X] = {(2)(400) + (7)(800) + 1600}/10 = 800. E[X2 ] = {(2)(4002 ) + (7)(8002 ) + 16002 }/10 = 736,000. E[X3 ] = {(2)(4003 ) + (7)(8003 ) + 16003 }/10 = 780,800,000. Variance is: 736,000 - 8002 = 96,000. Third Central Moment is: 780,800,000 - (3)(736,000)(800) + (2)(8003 ) = 38,400,000. Skewness is: 38,400,000/96,0001.5 = 1.291. Alternately, Third Central Moment is: {(2)(400 - 800)3 + (7)(800 - 800)3 + (1600 - 800)3 }/10 = 38,400,000. Proceed as before. Comment: If one divide all of the claim sizes by 100, then the skewness is unaffected. Note that the denominator is not based on using the sample variance.
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 48
Section 4, Empirical Distribution Function This section will discuss the Distribution and Survival Functions. Cumulative Distribution Function: For the Cumulative Distribution Function, F(x) = Prob[X ≤ x]. Various Distribution Functions are listed in Appendix A attached to your exam. For example, for the Exponential Distribution, F(x) = 1 - e-x/θ.26 Exercise: What is the value at 3 of an Exponential Distribution with θ = 2. [Solution: F(x) = 1 - e-x/θ. F(3) = 1- e-3/2 = .777.] Fʼ(x) = f(x) ≥ 0. 0 ≤ F(x) ≤ 1, nondecreasing, right-continuous, starts at 0 and ends at 1.27 Here is graph of the Exponential Distribution with θ = 2: 1 0.8 0.6 0.4 0.2
2
26
4
6
8
10
See Appendix A in the tables attached to the exam. The Exponential Distribution will be discussed in detail in a subsequent section. 27 As x approaches y from above, F(x) approaches F(y). F would not be continuous at a jump discontinuity, but would still be right continuous. See Section 2.2 of Loss Models.
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 49
Survival Function: Similarly, we can define the Survival Function, S(x) = 1 - F(x) = Prob[X > x]. Sʼ(x) = -f(x) ≤ 0. 0 ≤ S(x) ≤ 1, nonincreasing, right-continuous, starts at 1 and ends at 0.28 S(x) = 1 - F(x) = Prob[X > x] = the Survival Function = the tail probability of the Distribution Function F. For example, for the Exponential Distribution, S(x) = 1 - F(x) = 1 - (1- e-x/θ) = e-x/θ. Here is graph of the Survival Function of an Exponential with θ = 2: 1 0.8 0.6 0.4 0.2
2
4
6
8
10
Exercise: What is S(5) for a Pareto Distribution29 with α = 2 and θ = 3? [Solution: F(x) = 1 - {θ/(x+θ)}α. S(x) = {θ/(x+θ)}α. S(5) = {3/(3+5)}2 = 9/64 = 14.1%.] In many situations you may find that the survival function is easier for you to use than the distribution function. Whenever a formula has S(x), one can always use 1 - F(x) instead, and vice-versa.
28
See Definition 2.4 in Loss Models. See Appendix A in the tables attached to the exam. The Pareto Distribution will be discussed in a subsequent section. 29
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 50
Empirical Model: The Empirical Model: probability of 1/(# data points) is assigned to each observed value.30 For the ungrouped data set in Section 1, the corresponding empirical model has density of 1/130 at each of the 130 data points: p(300) = 1/130, p(400) = 1/130, ..., p(4802200) = 1/130. Exercise: The following observations: 17, 16, 16, 19 are taken from a random sample. What is the probability function (pdf) of the corresponding empirical model? [Solution: p(16) = 1/2, p(17) = 1/4, p(19) = 1/4.] Empirical Distribution Function: The Empirical Model is the density that corresponds to the Empirical Distribution Function: Fn (x) = (# data points ≤ x)/(total # of data points). The Empirical Distribution Function at x, is the observed number of claims less than or equal to x divided by the total number of claims observed. At each observed claim size the Empirical Distribution Function has a jump discontinuity. For example for the ungrouped data in Section 1, just prior to 37,300 the Empirical Distribution Function is 26/130 = .2000, while at 37,300 it is 27/130 = .2077. Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800. What is the Empirical Distribution Function? [Solution: Fn (x) is: 0 for x < 300, 1/5 for 300 ≤ x < 600, 2/5 for 600 ≤ x < 1200, 3/5 for 1200 ≤ x < 1500, 4/5 for 1500 ≤ x < 2800, 1 for x ≥ 2800.]
30
See Definition 3.2 in Loss Models.
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 51
Here is a graph of this Empirical Distribution Function: Probability 1.0
o
0.8
o
0.6
o
0.4
0.2
o
o 300 600
1200 1500
2800
x
The empirical distribution function is constant on intervals, with jumps up of 1/5 at each of the five observed points. For example, it is 1/5 at 599.99999 but 2/5 at 600. Mean and Variance of the Empirical Distribution Function: Assume the losses are drawn from a Distribution Function F(x). Then each observed loss has a chance of F(x) of being less than or equal to x. Thus the number of losses observed less than or equal to x is a sum of N independent Bernoulli trials with chance of success F(x). Thus if one has a sample of N losses, the number of losses observed less than or equal to x is Binomially distributed with parameters N and F(x). Therefore, the Empirical Distribution Function is (1/N) times a Binomial Distribution with parameters N and F(x). Therefore, the Empirical Distribution Function has mean of F(x) and a variance of: F(x){1-F(x)}/N. Exercise: Assume 130 losses are independently drawn from an Exponential Distribution: F(x) = 1 - e-x/300,000. Then what is the distribution of the number of losses less than or equal to 100,000? [Solution: The number of losses observed less than or equal to 100,000 is Binomially distributed with parameters 130, 1-e-1/3 = 0.283.]
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 52
Exercise: Assume 130 losses are independently drawn from an Exponential Distribution: F(x) = 1 - e-x/300000. Then what is the variance of the number of losses less than or equal to 100,000? [Solution: The number of losses observed less than or equal to 100,000 is Binomially distributed with parameters 130, 1-e-1/3 = 0.283. Thus it has a variance of: (130)(0.283)(1 - 0.283) = 26.38. ] Exercise: 130 losses are independently drawn from an Exponential Distribution: F(x) = 1 - e-x/300000. What is the distribution of the empirical distribution function at 100,000? [Solution: The number of losses observed less than or equal to 100,000 is Binomially distributed with parameters 130, .283. The empirical distribution function at 100,000, Fn (100000), is the percentage of losses ≤ 100,000. Thus the empirical distribution function at 100,000 is (1/130) times a Binomial with parameters 130 and .283.] Exercise: 130 losses are independently drawn from an Exponential Distribution: F(x) = 1 - e-x/300000. What is the variance of the percentage of losses less than or equal to 100,000? [Solution: Fn (100000) is (1/130) times a Binomial with parameters 130 and .283. Thus it has a variance of (1/130)2 (130)(0.283)(1 - 0.283) = 0.00156. ] As the number of losses, N, increases, the variance of the estimate of the distribution decreases as 1/N. All other things being equal, the variance of the empirical distribution function is largest when trying to estimate the middle of the distribution rather than either of the tails31. Empirical Survival Function: The Empirical Survival Function is: 1 - Empirical Distribution Function. Empirical Distribution Function at x is: (# losses ≤ x)/(total # of losses).32 Empirical Survival Function at x is: (# losses > x)/(total # of losses). Exercise: One observes losses of sizes: $300, $600, $1,200, $1,500, and $2,800. What are the empirical distribution function and survival function at 1000? [Solution: Fn (1000) = (# losses ≤ 1000) / (# losses) = 2/5. S n (1000) = (# losses > 1000) / (# losses) = 3/5.] 31 32
F(x){1-F(x)} is largest for F(x) ≅ 1/2. However, small differences in the tail probabilities can be important. More generally, the empirical distribution function is: (# observations ≤ x) / (total # of observations).
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 53
For 300, 600, 1,200, 1,500, and 2,800, here is a graph of the Empirical Survival Function: Probability 1.0
o
o
0.8
o
0.6
o
0.4
o
0.2
300
600
1200 1500
2800
x
The empirical survival function is constant on intervals, with jumps down of 1/5 at each of the five observed points. For example, it is 4/5 at 599.99999 but 3/5 at 600. Exercise: Determine the area under this empirical survival function. [Solution: (1)(300) + (.8)(300) + (.6)(600) + (.4)(300) + (.2)(1300) = 1280.] The sample mean, X = (300 + 600 + 1200 +1500 + 2800)/5 = 1280. The sample mean is equal to the integral of the empirical survival function. As will be discussed in a subsequent section, the mean is equal to the integral of the survival function, for those cases where the support of the survival function starts at zero.
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 54
Problems: 4.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62. What is the empirical survival function at 30? A. 1/6 B. 1/3 C. 1/2 D. 2/3
E. 5/6
4.2 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140. What is the empirical distribution function at 50? A. 20% B. 30% C. 40% D. 50% E. 60% 4.3 (2 points) F(200) = 0.9, F(d) = 0.25, and 200
∫ x f(x) dx = 75. d 200
∫ F(x) dx + d = 150. d
Determine d. A. 60 B. 70
C. 80
D. 90
E. 100
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 55
Use the following information for the next two questions: You are given the following graph of an empirical distribution function: Probability 1
0.6 0.4
0 0
7
11
19
Size
4.4 (1 point) Determine the mean of the data. (A) Less than 10 (B) At least 10, but less than 11 (C) At least 11, but less than 12 (D) At least 12, but less than 13 (E) At least 13
∑ (Xi 4.5 (1 point) For this data, determine the biased estimator of the variance,
- X )2
N
.
A) Less than 26 (B) At least 26, but less than 28 (C) At least 28, but less than 30 (D) At least 30, but less than 32 (E) At least 34
4.6 (CAS9, 11/99, Q.16) (1 point) Which of the following can cause distortions in a loss claim size distribution derived from empirical data? 1. Claim values tend to cluster around target values, such as $5,000 or $10,000. 2. Individual clams may come from policies with different policy limits. 3. Final individual claim sizes are not always known. A. 1 B. 2 C. 3 D. 1, 2 E. 1, 2, 3
2013-4-2,
Loss Distributions, §4 Empirical Dist. Function HCM 10/8/12, Page 56
4.7 (IOA 101, 9/01, Q.7) (3.75 points) The probability density function of a random variable X is given by f(x) = kx(1 - ax2 ), 0 ≤ x ≤ 1, where k and a are positive constants. (i) (2.25 points) Show that a ≤ 1, and determine the value of k in terms of a. (ii) (1.5 points) For the case a = 1, determine the mean of X.
Solutions to Problems:

4.1. B. S(30) = (# losses > 30)/(# losses) = 2/6 = 1/3.

4.2. C. There are 2 losses of size ≤ 50. Empirical distribution function at 50 is: 2/5 = 0.4.

4.3. A. By integration by parts: ∫_d^200 F(x) dx = xF(x)]_d^200 - ∫_d^200 x f(x) dx = (200)F(200) - dF(d) - 75 = (200)(.9) - .25d - 75 = 105 - .25d.
⇒ 105 - .25d + d = 150. ⇒ d = 45/.75 = 60.

4.4. D. From the empirical distribution function, 40% of the data is 7, 60% - 40% = 20% of the data is 11, and 100% - 60% = 40% of the data is 19.
The mean is: (40%)(7) + (20%)(11) + (40%)(19) = 12.6.
Comment: If the data set was of size five, then it was: 7, 7, 11, 19, 19. The mean is: 63/5 = 12.6.

4.5. C. From the empirical distribution function, 40% of the data is 7, 60% - 40% = 20% of the data is 11, and 100% - 60% = 40% of the data is 19.
The mean is: (40%)(7) + (20%)(11) + (40%)(19) = 12.6.
The second moment is: (40%)(7²) + (20%)(11²) + (40%)(19²) = 188.2.
Σ(Xi - X)² / N = 188.2 - 12.6² = 29.44.
4.6. E. All of these are true. Item #3 is referring to the time between when the insurer knows about a claim and sets up a reserve, and when the claim is paid and closed.

4.7. (i) f(x) ≥ 0. ⇒ 1 - ax² ≥ 0, 0 ≤ x ≤ 1. ⇒ a ≤ 1.
Integral from 0 to 1 of f(x) = k(x - ax³) is: k(1/2 - a/4). Setting this integral equal to one: k(1/2 - a/4) = 1. ⇒ k = 4/(2 - a).
(ii) k = 4/(2 - a) = 4/(2 - 1) = 4. f(x) = 4x - 4x³. The integral from zero to one of x f(x) = 4x² - 4x⁴ is: 4/3 - 4/5 = 8/15.
Section 5, Limited Losses

The next few sections will introduce a number of related ideas: the Limited Loss Variable, Limited Expected Value, Losses Eliminated, Loss Elimination Ratio, Excess Losses, Excess Ratio, Excess Loss Variable, Mean Residual Life/ Mean Excess Loss, and Hazard Rate/ Failure Rate.

X ∧ 1000 ≡ Minimum of X and 1000 = Limited Loss Variable.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is X ∧ 1000?
[Solution: X ∧ 1000 = $300, $600, $1000, $1000, $1000.]

If the insured had a policy with a $1000 policy limit (and no deductible), then the insurer would pay $300, $600, $1000, $1000, and $1000, for a total of $3900 for these five losses.

The Limited Loss Variable33 corresponding to a limit L ⇔ X ∧ L ⇔ censored from above at L ⇔ right censored at L34 ⇔ the payments with a policy limit L (and no deductible) ⇔ X for X < L, L for X ≥ L.

Limited Expected Value:

Limited Expected Value at 1000 = E[X ∧ 1000] = an average over all sizes of loss of the minimum of 1000 and the size of loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the (empirical) limited expected value at $1000?
[Solution: E[X ∧ 1000] = (300 + 600 + 1000 + 1000 + 1000)/5 = 3900/5 = 780.]

In this case, the insurer pays 3900 on 5 losses or an average of 780 per loss.
The mean of the limited loss variable corresponding to L = E[X ∧ L] = the average payment per loss with a policy limit of L.
Since E[X ∧ L] ≡ E[Min[X, L]] = average of numbers each ≤ L, E[X ∧ L] ≤ L.
Since E[X ∧ L] ≡ E[Min[X, L]] = average of numbers each ≤ X, E[X ∧ L] ≤ E[X].

33 See Definition 3.6 in Loss Models.
34 Censoring will be discussed in a subsequent section.
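As a small illustration I have added (not part of the syllabus), the limited loss variable and the empirical limited expected value are one line each in Python:

losses = [300, 600, 1200, 1500, 2800]
L = 1000

limited = [min(x, L) for x in losses]        # X ∧ 1000 for each loss
print(limited)                               # [300, 600, 1000, 1000, 1000]
print(sum(limited) / len(limited))           # E[X ∧ 1000] = 780.0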
Exercise: For the ungrouped data in Section 1, what is the Limited Expected Value at $10,000?
[Solution: E[X ∧ 10000] is an average over all sizes of loss of the minimum of $10,000 and the size of loss. So the first 8 losses in Section 1 would all enter into the average at their total size, while the remaining 122 losses all enter at $10,000.
E[X ∧ 10000] = (35200 + (122)(10000)) / 130 = $1,255,200 / 130 = $9655.4.]
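The split of the empirical limited expected value into small-loss and large-loss contributions is easy to check; a minimal Python sketch using the counts and the $35,200 small-loss total quoted above:

small_loss_total = 35200                            # sum of the 8 losses below $10,000
n_large, limit, n = 122, 10000, 130

# Small losses enter at their full size; the 122 large losses each enter at the limit.
lev = (small_loss_total + n_large * limit) / n
print(lev)                                          # 9655.38..., i.e. about $9,655.4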
The limited expected value can be written as the sum of the contributions of the small losses and the large losses. The (theoretical) Limited Expected Value (LEV), E[X ∧ L], would be written for a continuous size of loss distribution as two pieces:

E[X ∧ L] = ∫_0^L x f(x) dx + L S(L) = contribution of small losses + contribution of large losses.

The first piece represents the contribution of losses up to L in size, while the second piece represents the contribution of those losses larger than L. The smaller losses each contribute their size, while the larger losses each contribute L to the average.

For example, for the Exponential Distribution:

E[X ∧ L] = ∫_0^L x e^(-x/θ)/θ dx + L e^(-L/θ) = [-x e^(-x/θ) - θ e^(-x/θ)]_{x=0}^{x=L} + L e^(-L/θ) = θ(1 - e^(-L/θ)).35

35 See Appendix A of Loss Models and the tables attached to the exam.
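A numerical cross-check of this closed form (my own sketch, using scipy for the integral of the small-loss piece):

from math import exp
from scipy.integrate import quad

theta, L = 100.0, 70.0
f = lambda x: exp(-x / theta) / theta                  # Exponential density
small, _ = quad(lambda x: x * f(x), 0, L)              # contribution of small losses
large = L * exp(-L / theta)                            # contribution of large losses: L S(L)
print(small + large, theta * (1 - exp(-L / theta)))    # both are about 50.34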
Problems: 5.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62. What is the Limited Expected Value, for a Limit of 25? A. 15 B. 16 C. 17 D. 18 E. 19 5.2 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140. What is the Limited Expected Value at 50? A. 10 B. 20 C. 30 D. 40
E. 50
Use the following information for the next two questions: Frequency is Poisson with λ = 20. E[X] = $10,000. E[X ∧ 25,000] = $8000. 5.3 (1 point) If there is no policy limit, what is the expected aggregate annual loss? 5.4 (1 point) If there is a 25,000 policy limit, what is the expected aggregate annual loss?
5.5 (2 points) For an insurance policy, you are given:
(i) The policy limit is 100,000 per loss, with no deductible.
(ii) Expected aggregate losses are 1,000,000 annually.
(iii) The number of losses follows a Poisson distribution.
(iv) The claim severity distribution has: S(50,000) = 10%. S(100,000) = 4%. E[X ∧ 50,000] = 28,000. E[X ∧ 100,000] = 32,000.
(v) Frequency and severity are independent.
Determine the probability that no losses will exceed 50,000 during the next year.
(A) 3.0% (B) 3.5% (C) 4.0% (D) 4.5% (E) 5.0%

5.6 (1 point) E[X ∧ 5000] = 3200.
Size                 Number of Losses     Dollars of Loss
0 to 5000            170                  ???
5001 to 25,000       60                   700,000
over 25,000          20                   ???
Determine E[X ∧ 25,000].
(A) 5600 (B) 5800 (C) 6000 (D) 6200 (E) 6400
5.7 (4, 11/01, Q.36) (2.5 points) For an insurance policy, you are given: (i) The policy limit is 1,000,000 per loss, with no deductible. (ii) Expected aggregate losses are 2,000,000 annually. (iii) The number of losses exceeding 500,000 follows a Poisson distribution. (iv) The claim severity distribution has Pr(Loss > 500,000) = 0.0106 E[min(Loss; 500,000)] = 20,133 E[min(Loss; 1,000,000)] = 23,759 Determine the probability that no losses will exceed 500,000 during 5 years. (A) 0.01 (B) 0.02 (C) 0.03 (D) 0.04 (E) 0.05
Solutions to Problems:

5.1. B. E[X ∧ 25] = (3 + 8 + 13 + 22 + 25 + 25) / 6 = 96 / 6 = 16.

5.2. D. E[X ∧ 50] = (15 + 35 + 50 + 50 + 50)/5 = 40.

5.3. (20)($10000) = $200,000.

5.4. (20)($8000) = $160,000.

5.5. D. 1,000,000 = expected annual aggregate loss = (mean frequency) E[X ∧ 100,000] = (mean frequency)(32,000). ⇒ mean frequency = 1 million / 32,000 = 31.25 losses per year.
The expected number of losses exceeding 50,000 is: (31.25)S(50,000) = 3.125.
The large losses are Poisson; the chance of having zero of them is: e-3.125 = 4.4%.
Comment: Similar to 4, 11/01, Q.36.

5.6. E. E[X ∧ 25,000] = E[X ∧ 5000] + (contribution above 5000 from medium claims) + (contribution above 5000 from large claims) = 3200 + {700,000 - (60)(5000)} / 250 + (20)(25,000 - 5000) / 250 = 6400.
Alternately, let y be the dollars of loss on losses of size 0 to 5000. Then, 3200 = E[X ∧ 5000] = {y + (5000)(60 + 20)} / 250. ⇒ y = 400,000.
E[X ∧ 25,000] = {400,000 + 700,000 + (20)(25,000)} / 250 = 6400.
Comment: Each loss of size more than 25,000 contributes an additional 20,000 to E[X ∧ 25,000], compared to E[X ∧ 5000]. Each loss of size 5001 to 25,000 contributes an additional x - 5000 to E[X ∧ 25,000], compared to E[X ∧ 5000].

5.7. A. 2,000,000 = expected annual aggregate loss = (mean frequency) E[X ∧ 1 million] = (mean frequency)(23,759). Therefore, mean frequency = 2 million / 23,759 = 84.18 per year.
Assuming frequency and severity are independent, the expected number of losses exceeding 1/2 million is: (84.18)(.0106) = .892 per year. Over 5 years we expect (5)(.892) = 4.461 losses > 1/2 million.
Since we are told these losses are Poisson Distributed, the chance of having zero of them is: e-4.461 = 0.012.
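The arithmetic of solution 5.7 can be traced in a few lines of Python (an added illustration; the figures are those given in the problem):

from math import exp

aggregate, lev_1m, p_big = 2_000_000, 23_759, 0.0106
freq = aggregate / lev_1m          # mean number of losses per year = 84.18
big_per_year = freq * p_big        # expected losses > 500,000 per year = 0.892
lam = 5 * big_per_year             # Poisson mean over 5 years = 4.461
print(exp(-lam))                   # probability of no such losses, about 0.012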
Section 6, Losses Eliminated

Assume an (ordinary) deductible of $10,000 and the ground up36 loss sizes from Section 1. Then the insurer would pay nothing for the first 8 losses, each of which is less than the $10,000 deductible. For the ninth loss of size $10,400, the insurer would pay $400 while the insured would have to absorb $10,000. For a loss of $37,300, the insurer would pay: $37,300 - $10,000 = $27,300. Similarly, for each of the larger losses $10,000 is eliminated, from the point of view of the insurer.

The total dollars of loss eliminated is computed by summing up the sizes of loss for all losses less than the deductible amount of $10,000, and adding to that the sum of $10,000 per each loss greater than or equal to $10,000. In this case the losses eliminated are: $35,200 + (122)($10,000) = $1,255,200. Note that the Empirical Losses Eliminated are a continuous function of the deductible amount; a small increase in the deductible amount produces a corresponding small increase in the empirical losses eliminated.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How many dollars of loss are eliminated by a deductible of $1000?
[Solution: $300 + $600 + (3)($1000) = $3900.]

Let N be the total number of losses. Then the Losses Eliminated by a deductible d would be written for a continuous size of loss distribution as the sum of the same two pieces, the contribution of the small losses plus the contribution of the large losses:

N ∫_0^d x f(x) dx + N d S(d).
The first piece is the sum of losses less than d. (We have multiplied by the total number of losses since f(x) is normalized to integrate to unity.) The second piece is the number of losses greater than d times d per such loss. Note that the losses eliminated are just the number of losses times the Limited Expected Value. Losses Eliminated by deductible d are: N E[X ∧ d].
36 By “ground up” I mean the economic loss to the insured, prior to the impact of any deductible.
Loss Elimination Ratio:

The total losses in Section 1 are $40,647,700. Therefore, the $1,255,200 of losses eliminated by a deductible of size $10,000 represent $1,255,200 / $40,647,700 = 3.09% of the total losses. This corresponds to an empirical Loss Elimination Ratio (LER) at 10,000 of 3.09%.

Loss Elimination Ratio at d = LER(d) = (Losses Eliminated by a deductible of size d) / (Total Losses).

In general the LER is the ratio of the losses eliminated to the total losses. Since its numerator is continuous while its denominator is independent of the deductible amount, the empirical loss elimination ratio is a continuous function of the deductible amount.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the (empirical) loss elimination ratio at $1000?
[Solution: $3900 of losses are eliminated out of a total of 300 + 600 + 1200 + 1500 + 2800 = 6400. Therefore, LER(1000) = 3900/6400 = 60.9%.]

The loss elimination ratio at x can be written as:
LER(x) = (dollars of loss limited by x) / (total losses) = {(dollars of loss limited by x) / N} / {(total losses) / N} = E[X ∧ x] / Mean.

LER(x) = E[X ∧ x] / E[X].
For example, for the ungrouped data in Section 1, E[X ∧ 10000] is equal to the losses eliminated by a deductible of 10,000: $1,255,200, divided by the total number of losses 130. E[X ∧ 10000] = 1,255,200 / 130 = 9655.4. The mean is the total losses of $40,647,700 divided by 130. E[X] = 40,647,700/130 = 312,675. Therefore, LER(10000) = E[X ∧ 10000] / E[X] = 9655.4 / 312675 = 3.09%, matching the previous computation of LER(10000).
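A two-line Python check of LER(x) = E[X ∧ x] / E[X] on the five-loss example (added for illustration):

losses = [300, 600, 1200, 1500, 2800]
d = 1000
lev = sum(min(x, d) for x in losses) / len(losses)    # E[X ∧ 1000] = 780
mean = sum(losses) / len(losses)                      # E[X] = 1280
print(lev / mean)                                     # LER(1000) = 0.609...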
Problems: 6.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62. What is the Loss Elimination Ratio, for a deductible of 10? A. less than 0.37 B. at least 0.37 but less than 0.39 C. at least 0.39 but less than 0.41 D. at least 0.41 but less than 0.43 E. at least 0.43 6.2 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140. What is the Loss Elimination Ratio at 50? A. 54% B. 57% C. 60% D. 63% E. 66% 6.3 (2 points) You observe the following payments on 6 losses with no deductible applied: $200, $300, $400, $800, $900, $1,600. Let A be the loss elimination ratio (LER) for a $500 deductible. Let B be the loss elimination ratio (LER) for a $1000 deductible. Determine B - A. A. 30% B. 35% C. 40% D. 45% E. 50% 6.4 (4, 5/89, Q.57) (1 point) Given the following payments on 6 losses, calculate the loss elimination ratio (LER) for a $300 deductible (assume the paid losses had no deductible applied). Paid Losses: $200, $300, $400, $800, $900, $1,600. A. LER < .40 B. .40 < LER ≤ .41 C. .41 < LER < .42 D. .42 < LER ≤ .43 E. .43 < LER 6.5 (CAS5, 5/03, Q.38) (3 points) Given the information below, calculate the loss elimination ratio for ABC Company's collision coverage in State X at a $250 deductible. Show all work.
• ABC insures 5,000 cars at a $250 deductible with the following fully credible data on the collision claims: Paid losses are $1,000,000 per year. The average number of claims per year is 500.
• A fully credible study found that in State X: The average number of car accidents per year involving collision damage was 10,000. The average number of vehicles was 67,000.
• Assume ABC Company's expected ground-up claims frequency is equal to that of State X. • Assume the average size of accidents that fall below the deductible is $150.
Solutions to Problems:

6.1. A. LER(10) = Losses Eliminated / Total Losses = (3 + 8 + 10 + 10 + 10 + 10) / (3 + 8 + 13 + 22 + 35 + 62) = 51 / 143 = 0.357.

6.2. B. E[X] = (15 + 35 + 70 + 90 + 140)/5 = 70. LER(50) = E[X ∧ 50]/E[X] = 40/70 = 0.571.
6.3. A. The Losses Eliminated for a $500 deductible are: 200 + 300 + 400 + (3)(500) = 2400. The total losses are 4200. Thus LER(500) = Losses Eliminated / Total Losses = 2400/4200 = .571. Losses Eliminated for a $1000 deductible are: 200 + 300 + 400 + 800 + 900 + 1000 = 3600. Thus LER(1000) = Losses Eliminated / Total Losses = 3600/4200 = .857. LER(1000) - LER(500) = .857 - .571 = 0.286. 6.4. B. The Losses Eliminated are: (200)+(300)+(4)(300) = 1700. The total losses are 4200. Thus the LER = Losses Eliminated / Total Losses = 1700/4200 = 0.405. 6.5. Accident Frequency for State X is: 10,000/67,000 = 14.925%. For 5000 cars, expect: (14.925%)(5000) = 746.3 accidents. There were 500 claims, in other words 500 accidents of size greater than the $250 deductible. Thus we infer: 746.3 - 500 = 246.3 small accidents. These small accidents had average size $150, for a total of: (246.3)($150) = $36,945. Deductible eliminates $250 for each large accident, for a total of: ($250)(500) = $125,000. Losses eliminated = $36,945 + $125,000 = $161,945. Total losses = losses eliminated + losses paid = $161,945 + $1,000,000 = $1,161,945. LER at $250 = Losses Eliminated / Total Losses = $161,945 / $1,161,945 = 13.9%. Alternately, frequency of loss = 10,000/67,000 = 14.925%. Frequency of claims (accidents of size > 250) = 500/5000 = 10%. S(250) = 10%/14.925% = .6700. F(250) = 1 - S(250) = .3300. Average size of accidents that fall below the deductible = average size of small accidents = $150 = {E[X ∧ 250] - 250S(250)}/F(250) = {E[X ∧ 250] - ($250)(.67)}/.33.
⇒ E[X ∧ 250] = (.33)($150) + (.67)($250) = $217. Average payment per non-zero payment = $1,000,000/500 = $2000 = (E[X] - E[X ∧ 250])/S(250) = (E[X] - E[X ∧ 250])/.67.
⇒ E[X] - E[X ∧ 250] = $1340. ⇒ E[X] = $1340 + $217 = $1557.
LER(250) = E[X ∧ 250] / E[X] = $217/$1557 = 13.9%.
Section 7, Excess Losses

The dollars of loss excess of $10,000 per loss are also of interest. These are precisely the dollars of loss not eliminated by a deductible of size $10,000. For the ungrouped data in Section 1, the losses excess of $10,000 are $40,647,700 - $1,255,200 = $39,392,500.

(X - d)+ ≡ 0 when X ≤ d, X - d when X > d.37
(X - d)+ is the amount paid to an insured with a deductible of d. The insurer pays nothing if X ≤ d, and pays X - d if X > d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is (X - 1000)+?
[Solution: 0, 0, $200, $500, and $1800.]

(X - d)+ is referred to as the “left censored and shifted variable” at d.38
(X - d)+ ⇔ left censored and shifted variable at d ⇔ 0 when X ≤ d, X - d when X > d ⇔ the amounts paid to an insured with a deductible of d ⇔ payments per loss, including when the insured is paid nothing due to the deductible of d ⇔ amount paid per loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is E[(X - 1000)+]?
[Solution: (0 + 0 + $200 + $500 + $1800)/5 = $500.]

The expected losses excess of 10,000 per loss would be written for a continuous size of loss distribution as:

E[(X - 10000)+] = Losses Excess of 10,000 per loss = ∫_{10,000}^∞ (x - 10,000) f(x) dx.
Note that we only integrate over those losses greater than $10,000 in size, since smaller losses contribute nothing to the excess losses. Also larger losses only contribute the amount by which each exceeds $10,000.

37 The “+” refers to taking the variable X - d when it is positive, and otherwise setting the result equal to zero.
38 Censoring will be discussed in a subsequent section. See Definition 3.5 in Loss Models.
E[(X - 10000)+] = ∫_{10,000}^∞ (x - 10,000) f(x) dx = ∫_{10,000}^∞ x f(x) dx - 10,000 ∫_{10,000}^∞ f(x) dx
= ∫_0^∞ x f(x) dx - {∫_0^{10,000} x f(x) dx + 10,000 S(10,000)} = E[X] - E[X ∧ 10000].
Losses Excess of L per loss = E[(X - L)+] = E[X] - E[X ∧ L]. Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. Show that E[(X - 1000)+] = E[X] - E[X ∧ 1000]. [Solution: E[X] - E[X ∧ 1000] = 1280 - 780 = 500 = E[(X - 1000)+].]
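A short Python sketch (my own illustration) confirming E[(X - d)+] = E[X] - E[X ∧ d] at a couple of deductibles for the same five losses:

losses = [300, 600, 1200, 1500, 2800]

def expected_excess(data, d):
    # E[(X - d)+]: the average payment per loss under a deductible of d
    return sum(max(x - d, 0) for x in data) / len(data)

def lev(data, d):
    # E[X ∧ d]: the average loss limited to d
    return sum(min(x, d) for x in data) / len(data)

mean = sum(losses) / len(losses)
for d in (500, 1000):
    print(d, expected_excess(losses, d), mean - lev(losses, d))   # the last two columns agree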
Exercise: For an Exponential Distribution with θ = 100, what is E[(X - 70)+]? [Solution: E[(X - 70)+] = E[X] - E[X ∧ 70] = 100 - 100(1 - e-70/100) = 49.7.] Excess Ratio:39 The Excess Ratio is the losses excess of the given limit divided by the total losses. Excess Ratio at x = R(x) ≡ (losses excess of x)/(total losses) = E[(X - x)+ ] / E[X] = (E[X] - E[X ∧ x]) / E[X]. Therefore, for the data in Section 1, the empirical Excess Ratio, R(10,000) = (40,647,700 - 1,255,200) / 40,647,700 = 96.91%. Note that: R(10,000) = 96.91% = 1 - 3.09% = 1 - LER(10,000). R(10,000) = (losses excess of 10,000) / (total losses) = N E[(X - 10,000)+] / (N E[X]) = (E[X] - E[X ∧ 10,000]) / E[X] = 1 - E[X ∧ 10,000] / E[X] = 1 - LER(10,000). R(x) = 1 - LER(x) = 1 - E[X ∧ x] / E[X]. Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the (empirical) excess ratio at $1000? [Solution: R(1000) = (200 + 500 + 1800)/6400 = 39.1% = 1 - 60.9% = 1 - LER(1000) = 1 - E[X ∧ 1000] / E[X] = 1 - 780/1280.] 39
Loss Models does not use the commonly used term Excess Ratio. However, this important concept may help you to understand and answer questions. Since the Excess Ratio is just one minus the Loss Elimination Ratio, one can always work with the Loss Elimination Ratio instead of the Excess Ratio.
One can also write the Excess Ratio in terms of integrals as:

R(L) = {∫_L^∞ (x - L) f(x) dx} / {∫_0^∞ x f(x) dx} = {∫_L^∞ x f(x) dx - L S(L)} / {∫_0^∞ x f(x) dx}.
However, in order to compute the excess ratio or loss elimination ratio, it is usually faster to use the formulas in Appendix A of Loss Models for the Mean and Limited Expected Value. Exercise: For an Exponential Distribution with θ = 100, what is R(70)? [Solution: R(70) = 1 - E[X ∧ 70]/E[X] = 1 - 100(1 - e-70/100)/100 = 49.7%.] Total Losses = Limited Losses + Excess Losses: Exercise: For a loss of size 6 and a loss of size 15, list X ∧ 10, (X-10)+, and (X ∧ 10) + (X-10)+. [Solution: X X ∧ 10 (X-10)+ (X ∧ 10) + (X-10)+ 6 15 In general, X = (X
6 10
∧
0 5
6 15]
d) + (X - d)+ .
In other words, buying two policies, one with a policy limit of 1000 (and no deductible), and another with a deductible of 1000 (and no policy limit), provides the same coverage as a single policy with no deductible or policy limit. A deductible of 1000 caps the policyholderʼs payments at 1000, so from his point of view the 1000 deductible acts as a limit. The policyholderʼs retained loss is: X ∧ 1000. The insurerʼs payment to the policyholder is: (X - 1000)+ . Together they total to the loss, X. A deductible from one point of view is a policy limit from another point of view.40 Remember the losses eliminated by a deductible of size 1000 are E[X ∧ 1000], the same expression as the losses paid under a policy with limit of size 1000 (and no deductible). X = (X
∧
d) + (X - d)+. ⇒ E[X] = E[X
∧
d] + E[(X - d)+]. ⇒ E[(X - d)+ ] = E[X] - E[X
∧
d].
Expected Excess = Expected Total Losses - Expected Limited Losses. 40
An insurer who buys reinsurance with a per claim deductible of 1 million, has capped its retained losses at 1 million per claim. In that sense the 1 million deductible from the point of view of the reinsurer acts as if the insurer had sold policies with a 1 million policy limit from the point of view of the insurer.
2013-4-2,
Loss Distributions, §7 Excess Losses
HCM 10/8/12,
Problems: 7.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62. What is the Excess Ratio, excess of 30? A. less than .16 B. at least .16 but less than .19 C. at least .19 but less than .22 D. at least .22 but less than .25 E. at least .25 7.2 (2 points) Determine the excess ratio at $200,000. Frequency Dollar of Losses Amount 40% $5,000 20% $10,000 15% $25,000 10% $50,000 5% $100,000 4% $250,000 3% $500,000 2% $1,000,000 1% $2,000,000 7.3 (1 point) X is 70 with probability 40% and 700 with probability 60%. Determine E[(X - 100)+ ]. A. less than 345 B. at least 345 but less than 350 C. at least 350 but less than 355 D. at least 355 but less than 360 E. at least 360 7.4 (1 point) X is 5 with probability 80% and 25 with probability 20%. If E[(X - d)+ ] = 3, determine d. A. 4
B. 6
C. 8
D. 10
E. 12
Page 70
2013-4-2,
Loss Distributions, §7 Excess Losses
HCM 10/8/12,
Page 71
Solutions to Problems: 7.1. E. R(30) = (dollars excess of 30) / (total dollars) = (5 + 32) / (3 + 8 + 13 + 22 + 35 + 62) = 37 / 143 = 0.259. 7.2. Excess Ratio = expected excess losses / expected total losses = 45000/82750 = 54.4%. Probability
Amount
Product
Excess of 200000
Product
0.4 0.2 0.15 0.1 0.05 0.04 0.03 0.02 0.01
$5,000 $10,000 $25,000 $50,000 $100,000 $250,000 $500,000 $1,000,000 $2,000,000
$2,000 $2,000 $3,750 $5,000 $5,000 $10,000 $15,000 $20,000 $20,000
$0 $0 $0 $0 $0 $50,000 $300,000 $800,000 $1,800,000
$0 $0 $0 $0 $0 $2,000 $9,000 $16,000 $18,000
$82,750
$45,000
7.3. E. (70 - 100)+ = 0. (700 - 100)+ = 600. E[(X - 100)+ ] = (40%)(0) + (60%)(600) = 360. 7.4. D. E[(X - 5)+ ] = (0)(80%) + (25 - 5)(20%) = 4 > 3. ⇒ d must be greater than 5. Therefore, E[(X - d)+ ] = (.2)(25 - d) = 3. ⇒ d = 10.
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 72
Section 8, Mean Excess Loss The Excess Loss Variable for d is defined for X > d as X-d and is undefined for X ≤ d.41 Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the Excess Loss Variable for $1000? [Solution: undefined, undefined, $200, $500, $1,800.] Excess Loss Variable for d ⇔ the nonzero payments excess of a deductible of d
⇔ X - d for X > d ⇔ truncated from below at d and shifted42 ⇔ amount paid per (non-zero) payment. The Excess Loss Variable at d, which could be called the left truncated and shifted variable at d, is similar to (X - d)+ , the left censored and shifted variable at d. However, the Excess Loss Variable at d is undefined for X ≤ d, while in contrast (X - d)+ is zero for X ≤ d. Excess Loss Variable ⇔ undefined X ≤ d ⇔ amount paid per (non-zero) payment. (X - d)+ ⇔ 0 for X ≤ d ⇔ amount paid per loss. Exercise: An insured has four losses of size: 700, 3500, 16,000 and 40,000. What are the excess loss variable at 5000, the left censored and shifted variable at 5000, and the limited loss variable at 5000? [Solution: Excess Loss Variable at 5000: 11,000 and 35,000, corresponding to the last two losses. (It is not defined for the first two losses of size less than 5000.) Left censored and shifted variable at 5000: 0, 0, 11,000 and 35,000. Limited Loss Variable at 5000: 700, 3500, 5000, 5000.]
41 42
See Definition 3.4 in Loss Models. Truncation will be discussed in a subsequent section.
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 73
Mean Residual Life / Mean Excess Loss: The mean of the excess loss variable for d = the mean excess loss, e(d) = (Losses Excess of d) / (number of losses > d) = (E[X] - E[ X ∧ d])/S(d) = the average payment per (nonzero) payment with a deductible of d. Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the (empirical) mean excess loss at $1000? [Solution: e(1000) = ($200 + $500 + $1,800)/3 = $833.33.] Note that the first step in computing e(1000) is to ignore the two losses that “died before 1000.” Then one computes the average “lifetime beyond 1000” for the 3 remaining losses.43 In this situation, on a policy with a $1000 deductible, the insurer would make 3 (non-zero) payments totaling $2500, for an average (non-zero) payment of $833.33. The Mean Residual Life or Mean Excess Loss44 at x, e(x), is defined as the average dollars of loss above x on losses of size exceeding x. For the ungrouped data in Section 1, there are 122 losses of size greater than $10,000 and they have (40647700 - 1255200) dollars of loss above $10,000.45 Therefore, e(10,000) = $39,392,500 / 122 = $322,889. This can also be written as: mean - E[X ∧ 10,000] $312,674.6 - $9655.4 = = $322,889. S(10,000) 122 / 130
e(x) =
E[X] − E[X ∧x] . S(x)
Note that the empirical mean excess loss is discontinuous. While the excess losses in the numerator are continuous, the empirical survival function in the denominator is discontinuous. The denominator has a jump discontinuity at every observed claim size.
43
In Life Contingencies, this is how one computes the mean residual life. See Actuarial Mathematics. See Definition 3.4 in Loss Models. 45 Note that only losses which exceed the limit even enter into the computation; we ignore small losses. Thus the denominator in this case is 122 rather than 130. 44
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 74
The (theoretical) mean excess loss would be written for a continuous size of loss distribution as: ∞ ⎛L ⎞ ∞ ⎜ ⎟ x f(x) dx x f(x) dx + L S(L) ∫ ∫ x f(x) dx - L S(L) ⎜∫ ⎟ ⎝0 ⎠ 0 L e(L) = = . S(L) S(L)
The numerator of e(L) is the losses eliminated divided by the total number of losses; this is equal to the excess ratio R(x) times the mean. Thus e(x) = R(x) mean / S(x). Specifically, for the ungrouped data in Section 1, e(10000) = (96.91%)(312674.6 ) / (122 / 130) ≈ 322,883. One can also write e(L) as: ∞
∫ x f(x) dx
e(L) =
L
S(L)
-L=
dollars on losses of size > L - L. # of losses of size > L
Thus, e(x) = (average size of those losses of size greater than x) - x.
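As an added illustration, the mean excess loss for the five losses used earlier can be computed either from its definition or from (E[X] - E[X ∧ d]) / S(d); a minimal Python sketch:

losses = [300, 600, 1200, 1500, 2800]
d = 1000
over = [x - d for x in losses if x > d]            # excess loss variable at 1000: 200, 500, 1800
e_direct = sum(over) / len(over)                   # 833.33
mean = sum(losses) / len(losses)
lev = sum(min(x, d) for x in losses) / len(losses)
s = sum(x > d for x in losses) / len(losses)       # S(1000) = 3/5
print(e_direct, (mean - lev) / s)                  # both 833.33...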
Summary of Related Ideas: Loss Elimination Ratio at x = LER(x) = E[X ∧ x] / E[X]. Excess Ratio at x = R(x) =
E[X] − E[X ∧x] E[X ∧x] =1= 1 - LER(x). E[X] E[X]
Mean Residual Life at x = Mean Excess Loss at x = e(x) =
E[X] − E[X ∧x] . S(x)
On the exam, one wants to avoid doing integrals if at all possible. Therefore, one should use the formulas for the Limited Expected Value, E[X ∧ x], in Appendix A of Loss Models whenever possible. Those who are graphically oriented may find the Section on Lee Diagrams helps them to understand these concepts.
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 75
Exercise: For the data in Section 1, what are the Limited Expected Value, Loss Elimination Ratio, the Excess Ratio, and the mean excess loss, all at $25,000? [Solution: The Limited Expected Value at $25,000 is: {(sum of losses < $25,000) + ($25,000)(# losses > $25,000) } / (# losses) = ($232500)+ ($25,000)(109) = $2,957,500 / 130 = $22,750. The Loss Elimination Ratio at $25,000 is: E[X ∧ 25000] / mean = $22,750 / $312674.6 = 7.28%. The Excess Ratio at $25,000 = R(25000) = 1 - LER(25000) = 1 - 7.28% = 92.72%. The mean excess loss at $25,000 = e(25000) = (mean - E[X ∧ 25000]) / S(25000) = ($312674.6 - $22,750) / (109 / 130) = $345,782.]
Hazard Rate/ Failure Rate: The failure rate, force of mortality, or hazard rate, is defined as: h(x) = f(x)/S(x), x≥0. For a given age x, the hazard rate is the density of the deaths, divided by the number of people still alive at age x. The hazard rate determines the survival (distribution) function and vice versa:
[
S(x) = exp -
-
x
∫ h(t) dt ].
0
d ln[S(x)] = h(x). dx
As will be discussed in a subsequent section, the limit as x approaches infinity of e(x) is equal to the the limit as x approaches infinity of 1/h(x). These behaviors will be used to distinguish the tails of distributions. Exercise: For the data in Section 1, estimate the empirical hazard rate at $25,000. [Solution: There is no unique estimate of the hazard rate at 25,000. However, there are 109 claims greater than 25,000 and 3 claims within 5000 of 25,000. Thus the density at 25,000 is about 3/5000, while the empirical survival function is 109/130. Therefore, h(25000) ≅ (3/5000)/(109/130) = 0.0007.]
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 76
Problems: 8.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62. What is the empirical mean excess loss at 20? A. less than 16 B. at least 16 but less than 18 C. at least 18 but less than 20 D. at least 20 but less than 22 E. at least 22 8.2 (2 points) Match the concepts. 1. Limited Loss Variable at 20 2. Excess Loss Variable at 20 3. (X - 20)+ A. 1a, 2b, 3c
B. 1a, 2c, 3b
a. 3, 8, 13, 20, 20, 20. b. 2, 15, 42. c. 0, 0, 0, 2, 15, 42. C. 1b, 2a, 3c
D. 1b, 2c, 3a
E. 1c, 2b, 3a
8.3 (2 points) The random variable for a loss, X, has the following characteristics: Limited Expected Value at x x F(x) 0 0.0 0 500 0.3 360 1000 0.9 670 5000 1.0 770 Calculate the mean excess loss for a deductible of 500. A. less than 600 B. at least 600 but less than 625 C. at least 625 but less than 650 D. at least 650 but less than 675 E. at least 675 8.4 (2 points) You are given S(60) = .50, S(70) = .40, and e(70) = 13. Assuming the survival function is a straight line between ages 60 and 70, estimate e(67). A. 14.8 B. 15.0 C. 15.2 D. 15.4 E. 15.6 8.5 (1 point) You observe 5 losses of sizes: 15, 35, 70, 90, 140. What is the Mean Excess Loss at 50? A. 10 B. 20 C. 30 D. 40 E. 50
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 77
8.6 (4, 5/88, Q.60) (1 point) What is the empirical mean residual life at x = 4 given the following sample of total lifetimes: 3, 2, 5, 8, 10, 1, 6, 9. A. Less than 1.5 B. At least 1.5, but less than 2.5 C. At least 2.5, but less than 3.5 D. 3.5 or more E. Cannot be determined from the information given 8.7 (4B, 5/93, Q.25) (2 points) The following random sample has been observed: 2.0, 10.3, 4.8, 16.4, 21.6, 3.7, 21.4, 34.4 Calculate the value of the empirical mean excess loss function for x = 8. A. less than 7.00 B. at least 7.00 but less than 9.00 C. at least 9.00 but less than 11.00 D. at least 11.00 but less than 13.00 E. at least 13.00 8.8 (4B, 11/94, Q.16) (1 point) A random sample of auto glass claims has yielded the following five observed claim amounts: 100, 125, 200, 250, 300. What is the value of the empirical mean excess loss function at x = 150? A. 75 B. 100 C. 200 D. 225 E. 250 8.9 (3, 11/01, Q.35 & 2009 Sample Q.101) (2.5 points) The random variable for a loss, X, has the following characteristics: Limited Expected Value at x x F(x) 0 0.0 0 100 0.2 91 200 0.6 153 1000 1.0 331 Calculate the mean excess loss for a deductible of 100. (A) 250 (B) 300 (C) 350 (D) 400 (E) 450
2013-4-2,
Loss Distributions, §8 Mean Excess Loss
HCM 10/8/12,
Page 78
Solutions to Problems: 8.1. C. e(20) = (dollars excess of 20) / (# claims greater than 20) = (2 + 15 + 42) / 3 = 59 /3 = 19.7. 8.2. A. Limited Loss Variable at 20, limit each large loss to 20: 3, 8, 13, 20, 20, 20. Excess Loss Variable at 20: 2, 15, 42, corresponding to 20 subtracted from each of the last 3 losses. It is not defined for the first 3 losses, each of size less than 20. (X - 20)+ is 0 for X ≤ 20, and X - 20 for X > 20: 0, 0, 0, 2, 15, 42. 8.3. A. F(5000) = 1 ⇒ E[X] = E[X
∧
5000] = 770.
e(500) = (E[X] - E[X ∧ 500])/S(500) = (770 - 360)/(1 - .3) = 586. Comment: Similar to 3, 11/01, Q.35. 8.4. B. Years excess of 70 = S(70)e(70) = (.4)(13) = 5.2. S(70) = .40, S(69) ≅ .41, S(68) ≅ .42, S(67) ≅ .43. Years lived between ages 67 and 70 ≅ .425 + .415 + .405 = 1.245. e(67) = (years excess of 67)/S(67) ≅ (5.2 + 1.245)/.43 = 15.0. 8.5. E. e(50) = (20 + 40 + 90)/3 = 50. Alternately, e(50) = (E[X] - E[X ∧ 50])/S(50) = (70 - 40)/(1 - 0.4) = 50. 8.6. D. We ignore all claims of size 4 or less. Each of the 5 claims greater than 4 contributes the amount by which it exceeds 4. The empirical mean excess loss at x=4 is: {(5-4) + (8-4) + (10-4) + (6-4) + (9-4)} / 5 = 18/5 = 3.6. 8.7. D. To compute the mean excess loss at 8, we only look at accidents greater than 8. There are 5 such accidents, and we compute the average amount by which they exceed 8: e(8) = (2.3 +8.4 + 13.6 + 13.4 + 26.4) / 5 = 64.1 / 5 = 12.82. 8.8. B. Add up the dollars excess of 150 and divide by the 3 claims of size exceeding 150. e(150) = (50 + 100 + 150) / 3 = 100. 8.9. B. F(1000) = 1 ⇒ E[X] = E[X e(100) = (E[X] - E[X
∧
∧
1000] = 331.
100])/S(100) = (331 - 91)/(1 - 0.2) = 300.
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 79
Section 9, Layers of Loss Actuaries, particularly those working with reinsurance, often look at the losses in a layer. The following diagram shows how the Layer of Loss between $10,000 and $25,000 relates to three specific claims of size: $30,000, $16,000 and $7,000. The claim of size $30,000 contributes to the layer $15,000, the width of the layer, since it is larger than the upper boundary of the layer. The claim of size $16,000 contributes to the layer $16,000 $10,000 = $6,000; since it is between the two boundaries of the layer it contributes its size minus the lower boundary of the layer. The claim of size $7,000 contributes nothing to the layer, since it is smaller than the lower boundary of the layer. $30,000
$25,000
$16,000
$10,000 $7,000
For example, for the data in Section 1 the losses in the layer between $10,000 and $25,000 are calculated in three pieces. The 8 losses smaller than $10,000 contribute nothing to this layer. The 13 losses between $10,000 and $25,000 each contribute their value minus $10,000. This sums to $67,300. The remaining 109 losses which are bigger than the upper limit of the interval at $25,000, each contribute the width of the interval, $25,000 - $10,000 = $15,000. Thus the total losses in the layer between $10,000 and $25,000 are: 0 + $67,300 + (109)($15,000) = $1,702,300. This is $1,702,300 / $40,647,700 = 4.19% of the total losses.
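The contribution of a single loss x to the layer from d to u is min(x, u) - min(x, d); here is a short Python sketch (added for illustration) reproducing the three claims in the diagram above:

def layer(x, d, u):
    # dollars of the loss x that fall in the layer from d to u
    return min(x, u) - min(x, d)

for x in [30000, 16000, 7000]:
    print(x, layer(x, 10000, 25000))   # 15,000, 6,000, and 0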
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 80
Exercise: For the data in Section 1, what is the percentage of total losses in the layer from $25,000 to $50,000? [Solution: $2,583,100 / $40,647,700 = 6.35%.] For a continuous size of loss distribution, the percentage of losses in the layer from $10,000 to $25,000 would be written as: 25,000
∫
(x - 10,000) f(x) dx + S(25,000) (25,000 -10,000)
10,000
.
∞
∫x
f(x) dx
0
The percentage of losses in a layer can be rewritten in terms of limited expected values. The percentage of losses in the layer from $10,000 to $25,000 is: (E[X ∧ 25000] - E[X ∧ 10000]) / mean = (22,750- 9655.4) / 312,675 = 4.19%. This can also be written in terms of the Loss Elimination Ratios: LER(25000) - LER(10000) = 7.28% - 3.09% = 4.19%. This can also be written in terms of the Excess Ratios (with the order reversed): R(10000) - R(25000) = 96.91% - 92.72% = 4.19%. The percentage of losses in the layer from d to u = u
∫ (x -
d
d) f(x) dx + S(u) (u - d) =
∞
∫x
f(x) dx
E[X ∧ u] − E[X ∧ d] = LER(u) - LER(d) = R(d) - R(u). E[X]
0
Layer Average Severity for the layer from d to u = The mean losses in the layer from d to u = E[X ∧ u] - E[X ∧ d] = {LER(u) - LER(d)} E[X] = {R(d) - R(u)} E[X]. The Layer from d to u can be thought of as either: (Layer from 0 to u) - (Layer from 0 to d) or (Layer from d to ∞) - (Layer from u to ∞).
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 81
For example, the Layer from 10 to 25 can be thought of as either: (Layer from 0 to 25) - (Layer from 0 to 10) or (Layer from 10 to ∞) - (Layer from 25 to ∞): ∞
R(25)
25 R(10)
LER(25)
10 LER(10) 0
The percentage of losses in the layer from 10 to 25 is: LER(25) - LER(10) = R(10) - R(25). Those who are graphically oriented may find that my Section on Lee Diagrams helps them to understand these concepts.
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 82
Problems: 9.1 (1 point) Insureds suffer six losses of sizes: 3, 8, 13, 22, 35, 62. What is the percentage of total losses in the layer from 10 to 25? A. less than 29% B. at least 29% but less than 30% C. at least 30% but less than 31% D. at least 31% but less than 32% E. at least 32% 9.2 (1 point) Four accidents occur of sizes: $230,000, $810,000, $1,170,000, and $2,570,000. A reinsurer is responsible for the layer of loss from $500,000 to $1,500,000 ($1 million excess of 1/2 million.) How much does the reinsurer pay as a result of these four accidents? A. $1.7 million B. $1.8 million C. $1.9 million D. $2.0 million E. $2.1 million Use the following information for the next two questions: • A reinsurer expects 50 accidents per year from a certain book of business. • Limited Expected Values for this book of business are estimated to be: E[X ∧ $1 million] = $300,000 E[X ∧ $4 million] = $375,000 E[X ∧ $5 million] = $390,000 E[X ∧ $9 million] = $420,000 E[X ∧ $10 million] = $425,000 9.3 (1 point) If the reinsurer were responsible for the layer of loss from $1 million to $5 million ($4 million excess of $1 million), how much does the reinsurer expect to pay per year as a result of accidents from this book of business? A. $4.0 million B. $4.5 million C. $5.0 million D. $5.5 million E. $6.0 million 9.4 (1 point) Let A be the amount the reinsurer would expect to pay per year as a result of accidents from this book of business, if the reinsurer were responsible for the layer of loss from $1 million to $5 million ($4 million excess of $1 million). Let B be the amount the reinsurer would expect to pay per year as a result of accidents from this book of business, if the reinsurer were instead responsible for the layer of loss from $1 million to $10 million ($9 million excess of $1 million). What is the ratio of B/A? A. 1.30 B. 1.35 C. 1.40 D. 1.45 E. 1.50
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 83
Use the following information for the next four questions: • A reinsurer expects 50 accidents per year from a certain book of business. • The average size of accident from this book of business is estimated as $450,000. • Excess Ratios (Unity minus the Loss Elimination Ratio) for this book of business are: R($1 million) = 0.100 R($4 million) = 0.025 R($5 million) = 0.015 R($9 million) = 0.006 R($10 million) = 0.005 9.5 (1 point) What is the percentage of total losses in the layer from $1 million to $5 million? A. less than 6% B. at least 6% but less than 7% C. at least 7% but less than 8% D. at least 8% but less than 9% E. at least 9% 9.6 (1 point) If the reinsurer were responsible for the layer of loss from $1 million to $5 million ($4 million excess of $1 million), how much does the reinsurer expect to pay per year as a result of accidents from this book of business? A. less than $1 million B. at least $1 million but less than $2 million C. at least $2 million but less than $3 million D. at least $3 million but less than $4 million E. at least $4 million 9.7 (1 point) What is the percentage of total losses in the layer from $1 million to $10 million? A. less than 6% B. at least 6% but less than 7% C. at least 7% but less than 8% D. at least 8% but less than 9% E. at least 9% 9.8 (1 point) Let A be the amount the reinsurer would expect to pay per year as a result of accidents from this book of business, if the reinsurer were responsible for the layer of loss from $1 million to $5 million ($4 million excess of $1 million). Let B be the amount the reinsurer would expect to pay per year as a result of accidents from this book of business, if the reinsurer were instead responsible for the layer of loss from $1 million to $10 million ($9 million excess of $1 million). What is the ratio of B/A? A. 1.1 B. 1.2 C. 1.3 D. 1.4 E. 1.5
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 84
9.9 (2 points) Assume you have Pareto distribution with α = 5 and θ = $1000. What percentage of total losses are represented by the layer from $500 to $2000? A. less than 16% B. at least 16% but less than 17% C. at least 17% but less than 18% D. at least 18% but less than 19% E. at least 19% 9.10 (1 point) There are seven losses of sizes: 2, 5, 8, 11, 13, 21, 32. What is the percentage of total losses in the layer from 5 to 15? A. 35% B. 40% C. 45% D. 50% E. 55% 9.11. Use the following information: • Limited Expected Values for Security Blanket Insurance are estimated to be: E[X ∧ 100,000] = 40,000 E[X ∧ 200,000] = 50,000 E[X ∧ 300,000] = 57,000 E[X ∧ 400,000] = 61,000 E[X ∧ 500,000] = 63,000 • Security Blanket Insurance buys reinsurance from Plantagenet Reinsurance. Let A be the amount Plantagenet would expect to pay per year as a result of accidents from Security Blanket, if the reinsurance had a deductible of 100,000, maximum covered loss of 300,000, and a coinsurance factor of 90%. Let B be the amount Plantagenet would expect to pay per year as a result of accidents from Security Blanket, if the reinsurance had a deductible of 100,000, a maximum covered loss of 400,000, and a coinsurance factor of 80%. What is the ratio of B/A? (A) 1.05 (B) 1.10 (C) 1.15 (D) 1.20 (E) 1.25 9.12 (CAS5, 5/07, Q.9) (1 point) Using the table below, what is the formula for the loss elimination ratio at deductible D? Loss Limit Number of Losses Total Loss Amount D and Below N1 L1 Over D N2 L2 Total N1+N2 L1+L2 A. 1 - [L1 + L2 - (N1)(D)] / [L1 + L2] B. 1 - [L1 + (N2)(D)] / [L1 + L2] C. 1 - [L2 - (N2)(D)] / [L1 + L2] D. [L2 + (N2)(D)] / [L1 +(N2)(D)] E. [L1 + (N1)(D)] / [L1]
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 85
Solutions to Problems: 9.1. D. (losses in layer from 10 to 25) / total losses = (0+0+3+12+15+15) / (3+8+13+22+35+62) = 45 / 143 = .315. 9.2. D. The accidents of sizes $230,000, $810,000, $1,170,000, and $2,570,000 contribute to the layer of loss from $500,000 to $1,500,000: 0 + 310,000 + 670,000 + 1,000,000 = $1,980,000. 9.3. B. (50)(E[X ∧ $5 million] - E[X ∧ $1 million]) = (50)(390,000 - 300,000) = $4.5 million. 9.4. C. A = (50)(E[X ∧ $5 million] - E[X ∧ $1 million]) = (50)(390,000 - 300,000) = $4.5 million. B = (50)(E[X ∧ $10 million] - E[X ∧ $1 million]) = (50)(425,000 - 300,000) = $6.25 million. B/A = 6.25 / 4.5 = 1.389. Comment: One can solve this problem without knowing that 50 accidents are expected per year, since 50 multiplies both the numerator and denominator. The ratio between two layers of loss depends on the severity distribution, not the frequency distribution. 9.5. D. R($1 million) - R($5 million) = 0.100 - 0.015 = 0.085. 9.6. B. The annual losses from the layer from $1 million to $5 million = (number of accidents per year)(mean accident){R($1 million) - R($5 million)} = (50)($450,000){R($1 million) - R($5 million)} = ($22.5 million){.100 - .015} = $1,912,500. Alternately, the total expected losses are: (# of accidents per year)(mean accident) = (50)($450,000) = $22,500,000. (.085)($22,500,000) = $1,912,500. 9.7. E. R($1 million) - R($10 million) = 0.100 - 0.005 = 0.095. 9.8. A. B/A = {R($1 million) - R($10 million)} / {R($1 million) - R($5 million)} = (0.100 - 0.005) / (0.100 - 0.015) = 0.095 / 0.085 = 1.12. Comment: A = (number of accidents per year)(mean accident){R($1 million) - R($5 million)}. B = (number of accidents per year)(mean accident){R($1 million) - R($10 million)}.
2013-4-2,
Loss Distributions, §9 Layers of Loss
HCM 10/8/12,
Page 86
9.9. D. Use the formula given in Appendix A of Loss Models for the Limited Expected Value of the Pareto, E[X ∧ x] = {θ/(α−1)}{1−(θ/(θ+x))α−1}. Percentage of Losses in the Layer $500 to $2000 = ( E[X ∧ 2000] - E[X ∧ 500]) / mean = (246.9 - 200.6)/250 = 18.5%. Alternately, use the formula given in a subsequent section for the Excess Ratio of the Pareto, R(x) ={θ/(θ+x)}α−1. Percentage of Losses in the Layer $500 to $2000 = R(500) - R(2000) = 19.75% - 1.23% = 18.5%. 9.10. B. Loss: 2 5 8 11 13 21 32 Contribution to Layer from 5 to 15: 0 0 3 6 8 10 10 (0 + 0 + 3 + 6 + 8 + 10 + 10) / (2 + 5 + 8 + 11 + 13 + 21 + 32) = 37/92 = 40.2%. 9.11. B. A = (0.9)(E[X ∧ 300,000] - E[X ∧ 100,000]) = (.9)(57,000 - 40,000) = 15,300. B = (.8)(E[X ∧ 400,000] - E[X ∧ 100,000]) = (.8)(61,000 - 40,000) = 16,800. B/A = 16,800 / 15,300 = 1.098. Comment: Both A and B have been calculated per accident. Their ratio does not depend on the expected number of accidents. 9.12. C. The losses eliminated are: L1 + (N2)(D). Loss Elimination Ratio is: {L1 + (N2)(D)} / (L1 + L2) = 1 - {L2 - (N2)(D)}/(L1 + L2). Alternately, each loss of size less than D contributes nothing to the excess losses. Each loss of size x > D, contributes x - D to the excess losses. Therefore, the excess losses = L2 - (N2)(D). Excess Ratio = (Excess Losses)/(Total Losses) = {L2 - (N2)(D)}/(L1 + L2). Loss Elimination Ratio = 1 - Excess Ratio = 1 - {L2 - (N2)(D)}/(L1 + L2).
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 87
Section 10, Average Size of Losses in an Interval One might want to know the average size of those losses between $10,000 and $25,000 in size. For the ungrouped data in Section 1, this is calculated as: sum of losses of size between $10,000 & $25,000 = $197,300 / 13 = $15,177. # losses between $10,000 & $25,000 Exercise: For the data in Section 1, what is the average size of loss for those losses of size from $25,000 to $50,000? [Solution: (29600 + 32200 + 32500 + 33700 + 34300 + 37300 + 39500 + 39900 + 41200 + 42800 + 45900 + 49200) / 12 = $458,100 / 12 = $38,175. Comment: The answer had to be between 25,000 and 50,000.] Note that this concept differs from a layer of loss. Here we are ignoring all losses other than those in a certain size category. In contrast, losses of all sizes contribute to each layer. Exercise: An insured has losses of sizes: $300, $600, $1200, $1500, and $2800. Determine the losses in the layer from $500 to $2500. [Solution: The loss of size 300 contributes nothing. The loss of size 600 contributes 100. The loss size 1200 contributes 700. The loss of 1500 contributes 1000. The loss of 2800 contributes the width of the layer or 2000. 0 + 100 + 700 + 1000 + 2000 = 3800.] Exercise: An insured has losses of sizes: $300, $600, $1200, $1500, and $2800. Determine the sum of those losses of size from $500 to $2500. [Solution: 600 + 1200 + 1500 = 3300. Comment: The average size of these three losses is: 3300/3 = 1100.] For a discrete size of loss distribution, the dollars from those losses of size ≤ 10,000 is:
Σ xi Prob[X = xi]. xi ≤10000
For a continuous size of loss distribution, the dollars from those losses of size ≤ 10,000 is:46 10,000
∫
x f(x) dx = E[X ∧ 10,000] - 10,000 S(10,000).
0 46
The limited expected value = contribution of small losses + contribution of large losses. Therefore, contribution of small losses = limited expected value - contribution of large losses.
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 88
For a continuous size of loss distribution, the average size of loss for those losses of size less than or equal to 10,000 is: 10,000
∫
x f(x) dx
0
F(10,000)
=
E[X∧10,000] - 10,000 S(10,000) . F(10,000)
Exercise: For an Exponential Distribution with mean = 50,000, what is the average size of those losses of size less than or equal to 10,000? [Solution: E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ 10000] = 50000 (1 - e-1/5) = 9063. S(x) = e-x/θ. S(10000) = e-1/5 = .8187. {E[X ∧ 10000] - 10000S(10000)}/F(10000) = 4832.] For a continuous size of loss distribution the average size of loss for those losses of size between 10,000 and 25,000 would be written as: 25,000
∫
25,000
∫
x f(x) dx
10,000
F(25,000) - F(10,000)
=
10,000
x f(x) dx -
0
∫
x f(x) dx
0
F(25,000) - F(10,000)
=
{E[X∧ 25000] - 25000 S(25000)} - {E[X ∧10000] - 10000 S(10000)} . F(25000) - F(10000) Exercise: For an Exponential Distribution with mean = 50,000, what is the average size of those losses of size between 10,000 and 25,000? [Solution: E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ 10000] = 50000 (1 - e-1/5) = 9063. E[X ∧ 25000] = 50000 (1 - e-1/2) = 19,673. S(x) = e-x/θ. S(10000) = e-1/5 = 0.8187. S(25000) = e-1/2 = 0.6065 ({E[X ∧ 25000] - 25000S(25000)} - {E[X ∧ 10000] - 10000S(10000)}) / {F(25000) - F(10000)} = ({19,673 - (25,000)(0.6065)} - {9063 - (10000)(0.8187)}) / {0.3935 - 0.1813} = 17,127.] In general, the average size of loss for those losses of size between a and b is: {E[X ∧ b] - b S(b)} - {E[X∧ a] - a S(a)} . F(b) - F(a) The numerator is the dollars per loss contributed by the losses of size a to b = (contribution of losses of size 0 to b) minus (contribution of losses of size 0 to a). The denominator is the percent of losses of size a to b = (percent of losses of size 0 to b) minus (percent of losses of size 0 to a).
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 89
For an Exponential with θ = 50,000, here are the average sizes for various size intervals: Bottom
Top
0 10,000 25,000 50,000 100,000 250,000
10,000 25,000 50,000 100,000 250,000 Infinity
9,063 19,673 31,606 43,233 49,663 50,000
S(Top)
Average Size
81.9% 60.7% 36.8% 13.5% 0.7% 0.0%
4,833 17,126 36,463 70,901 142,141 300,000
For a Pareto Distribution, S(x) = (θ/(θ+x))α, and E[X ∧ x] = {θ/(α-1)}{1 - (θ/(θ+x))α−1}. A Pareto Distribution with α = 3 and θ = 100,000, has a mean of: θ/(α-1) = 50,000. For this Pareto Distribution, here are the average sizes for various size intervals: Bottom
Top
0 10,000 25,000 50,000 100,000 250,000
10,000 25,000 50,000 100,000 250,000 Infinity
8,678 18,000 27,778 37,500 45,918 50,000
S(Top)
Average Size
75.1% 51.2% 29.6% 12.5% 2.3% 0.0%
4,683 16,863 35,989 70,270 148,387 425,000
Notice the difference between the results for the Pareto and the Exponential Distributions. Proportion of Dollars of Loss From Losses of a Given Size: Another quantity of interest, is the percentage of the total losses from losses in a certain size interval. Proportional of Total Losses from Losses in the Interval [a, b] is: b
∫ x f(x) dx
a
E[X]
=
{E[X ∧ b] - b S(b)} - {E[X∧ a] - a S(a)} . E[X]
Exercise: For an Exponential Distribution with mean = 50,000, what percentage of the total dollars of those losses come from losses of size between 10,000 and 25,000? [Solution: E[X ∧ x] = θ (1 - e-x/θ). E[X ∧ 10000] = 50000 (1 - e-1/5) = 9063. E[X ∧ 25000] = 50000 (1 - e-1/2) = 19,673. S(x) = e-x/θ. S(10000) = e-1/5 = 0.8187. S(25000) = e-1/2 = 0.6065 ({E[X ∧ 25000] - 25000S(25000)} - {E[X ∧ 10000] - 10000S(10000)}) / E[X] = ({19,673 - (25,000)(0.6065)} - {9063 - (10,000)(.8187)}) / 50,000 = 7.3%.]
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 90
For an Exponential with θ = 50,000, here are the percentages for various size intervals: Bottom
Top
0 10,000 25,000 50,000 100,000 250,000
10,000 25,000 50,000 100,000 250,000 Infinity
9,063 19,673 31,606 43,233 49,663 50,000
S(Top)
Percentage of Total Losses
81.9% 60.7% 36.8% 13.5% 0.7% 0.0%
1.8% 7.3% 17.4% 33.0% 36.6% 4.0%
A Pareto Distribution with α = 3 and θ = 100,000, here are the percentages for various size intervals: Bottom
Top
0 10,000 25,000 50,000 100,000 250,000
10,000 25,000 50,000 100,000 250,000 Infinity
8,678 18,000 27,778 37,500 45,918 50,000
S(Top)
Percentage of Total Losses
75.1% 51.2% 29.6% 12.5% 2.3% 0.0%
2.3% 8.1% 15.5% 24.1% 30.2% 19.8%
Notice the difference between the results for the Pareto and the Exponential Distributions.
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 91
Problems: For each of the following three problems, assume you have a Pareto distribution with parameters α = 5 and θ = $1000. 10.1 (2 points) What is the average size of those losses less than $500 in size? A. less than $160 B. at least $160 but less than $170 C. at least $170 but less than $180 D. at least $180 but less than $190 E. at least $190 10.2 (2 points) What is the average size of those losses greater than $500 in size but less than $2000? A. less than $800 B. at least $800 but less than $825 C. at least $825 but less than $850 D. at least $850 but less than $875 E. at least $875 10.3 (2 points) Assume you expect 100 losses per year. What is the expected dollars of loss paid on those losses greater than $500 in size but less than $2000? A. less than $10,500 B. at least $10,500 but less than $11,000 C. at least $11,000 but less than $11,500 D. at least $11,500 but less than $12,000 E. at least $12,000 10.4 (2 points) You are given the following: • A sample of 5,000 losses contains 1800 that are no greater than $100, 2500 that are greater than $100 but no greater than $1000, and 700 that are greater than $1000. • The empirical limited expected value function for this sample evaluated at $100 is $73. • The empirical limited expected value function for this sample evaluated at $1000 is $450. Determine the total amount of the 2500 losses that are greater than $100 but no greater than $1000. A. Less than $1.50 million B. At least $1.50 million, but less than $1.52 million C. At least $1.52 million, but less than $1.54 million D. At least $1.54 million, but less than $1.56 million E. At least $1.56 million
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 92
10.5 (3 points) Severity is LogNormal with µ = 5 and σ = 3. What is the average size of those losses greater than 20,000 in size but less than 35,000? A. less than 25,000 B. at least 25,000 but less than 27,000 C. at least 27,000 but less than 29,000 D. at least 29,000 but less than 31,000 E. at least 31,000 10.6 (2 points) You are given the following: x F(x) E[X ∧ x] $20,000 0.75 $7050 $30,000 0.80 $9340 Determine the average size of those losses of size between $20,000 and $30,000. A. Less than $23,500 B. At least $23,500, but less than $24,500 C. At least $24,500, but less than $25,500 D. At least $25,500, but less than $26,500 E. At least $26,500 10.7 (2 points) You are given the following: • A sample of 3,000 losses contains 2100 that are no greater than $1,000, 830 that are greater than $1,000 but no greater than $5,000, and 70 that are greater than $5,000. • The total amount of the 830 losses that are greater than $1,000 but no greater than $5,000 is $1,600,000. • The empirical limited expected value function for this sample evaluated at $1,000 is $560. Determine the empirical limited expected value function for this sample evaluated at $5,000. A. Less than $905 B. At least $905, but less than $915 C. At least $915, but less than $925 D. At least $925, but less than $935 E. At least $935 10.8 (2 points) The random variable for a loss, X, has the following characteristics: Limited Expected Value at x x F(x) 0 0.0 0 100 0.2 91 200 0.6 153 1000 1.0 331 Calculate the average size of those losses of size greater than 100 but less than 200. (A) 140 (B) 145 (C) 150 (D) 155 (E) 160
2013-4-2,
Loss Distributions, §10 Avg. Size in Interval
HCM 10/8/12,
Page 93
10.9 (160, 5/88, Q.5) (2.1 points) A population experiences mortality consistent with an exponential distribution with θ = 10. Calculate the average fraction of the interval (x, x+3] lived by those who die during the interval. (A) (1 + e-0.1 + e-0.2 - 3e-0.3) e / {6(1 - e-0.3)} (B) (1 + e-0.1 + e-0.2 - 3e-0.3) / {3(1 - e-0.3)} (C) 1/3 (D) (13 - 10e-0.3) / {3(1 - e-0.3)} (E) (10 - 13e-0.3) /{3(1 - e-0.3)} 10.10 (4B, 5/92, Q.23) (2 points) You are given the following information: A large risk has a lognormal claim size distribution with parameters µ = 8.443 and σ = 1.239. The insurance agent for the risk settles all claims under $5,000. (Claims of $5,000 or more are settled by the insurer, not the agent.) Determine the expected value of a claim settled by the insurance agent. A. Less than 500 B. At least 500 but less than 1,000 C. At least 1,000 but less than 1,500 D. At least 1,500 but less than 2,000 E. At least 2,000 10.11 (4B, 5/93, Q.33) (3 points) The distribution for claim severity follows a Single Parameter Pareto distribution of the following form: f(x) = (3/1000)(x/1000)-4, x > 1000 Determine the average size of a claim between $10,000 and $100,000, given that the claim is between $10,000 and $100,000. A. Less than $18,000 B. At least $18,000 but less than $28,000 C. At least $28,000 but less than $38,000 D. At least $38,000 but less than $48,000 E. At least $48,000
10.12 (4B, 5/99, Q.10) (2 points) You are given the following: • One hundred claims greater than 3,000 have been recorded as follows: Interval Number of Claims (3,000, 5,000] 6 (5,000, 10,000] 29 (10,000, 25,000] 39 (25,000, ∞) 26 • Claims of 3,000 or less have not been recorded. • The null hypothesis, H0 , is that claim sizes follow a Pareto distribution, with parameters α = 2 and θ = 25,000 . If H0 is true, determine the expected claim size for claims in the interval (25,000, ∞). A. 12,500
B. 25,000
C. 50,000
D. 75,000
E. 100,000
10.13 (4B, 11/99, Q.1) (2 points) You are given the following: • Losses follow a distribution (prior to the application of any deductible) with mean 2,000. • The loss elimination ratio (LER) at a deductible of 1,000 is 0.30. • 60 percent of the losses (in number) are less than the deductible of 1,000. Determine the average size of a loss that is less than the deductible of 1,000. A. Less than 350 B. At least 350, but less than 550 C. At least 550, but less than 750 D. At least 750, but less than 950 E. At least 950
Solutions to Problems:

10.1. A. The Limited Expected Value of the Pareto: E[X ∧ x] = {θ/(α−1)} {1 − (θ/(θ+x))^(α−1)}.
∫_0^500 x f(x) dx = E[X ∧ 500] − 500 S(500) = 200.6 − (500)(1000/1500)^5 = 200.6 − 65.8 = 134.8.
Average size of claim = 134.8 / F(500) = 134.8 / 0.869 = $155.

10.2. B. ∫_500^2000 x f(x) dx = ∫_0^2000 x f(x) dx − ∫_0^500 x f(x) dx =
{E[X ∧ 2000] − 2000 S(2000)} − {E[X ∧ 500] − 500 S(500)} = {246.9 − (2000)(1000/3000)^5} − {200.6 − (500)(1000/1500)^5} = 238.7 − 134.8 = 103.9.
Average Size of Claim = 103.9 / {F(2000) − F(500)} = 103.9 / (0.996 − 0.869) = $818.

10.3. A. 100 ∫_500^2000 x f(x) dx = 100 ∫_0^2000 x f(x) dx − 100 ∫_0^500 x f(x) dx =
100 [{E[X ∧ 2000] − 2000 S(2000)} − {E[X ∧ 500] − 500 S(500)}] = 100 [{246.9 − (2000)(1000/3000)^5} − {200.6 − (500)(1000/1500)^5}] = 100 (238.7 − 134.8) = $10,390.
Alternately, one expects 100 {F(2000) − F(500)} = 100 (0.996 − 0.869) = 12.7 such claims per year, with an average size of $818, based on the previous problem. Thus the expected dollars of loss on these claims = (12.7)($818) = $10,389.
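As a quick numerical check of 10.1-10.3 (not part of the original solutions), the same quantities can be computed directly from the Pareto limited expected value and survival function. This is an illustrative Python sketch; the function names are my own.

```python
# Pareto with alpha = 5, theta = 1000: average severity within an interval.
alpha, theta = 5.0, 1000.0

def S(x):    # survival function
    return (theta / (theta + x)) ** alpha

def lev(x):  # limited expected value E[X ^ x]
    return (theta / (alpha - 1.0)) * (1.0 - (theta / (theta + x)) ** (alpha - 1.0))

def avg_size_in_interval(a, b):
    """Average size of those losses with a < X <= b."""
    dollars = (lev(b) - b * S(b)) - (lev(a) - a * S(a))
    probability = S(a) - S(b)
    return dollars / probability

print(avg_size_in_interval(0, 500))      # about 155   (10.1)
print(avg_size_in_interval(500, 2000))   # about 818   (10.2)
# 10.3: expected dollars on 100 losses per year that fall in (500, 2000]:
print(100 * ((lev(2000) - 2000 * S(2000)) - (lev(500) - 500 * S(500))))  # about 10,390
```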
10.4. B. The average size of those claims of size between 100 and 1,000 equals : ({E[X ∧ 1000] - 1000S(1000)} - {E[X ∧ 100] - 100S(100)})/{F(1000)-F(100)} = {(450 - (1000)(700/5000)) - (73 - (100)(3200/5000)}/{(4300/5000)-(1800/5000)} = (310 - 9) / .5 = $602. Thus these 2500 claims total: (2500)($602) = $1,505,000. Alternately, (Losses Limited to $100) / (Number of Claims) = E[X ∧ 100] = $73. Since there are 5000 claims, Losses Limited to $100 = ($73)(5000) = $365,000. Now there are: 2500 + 700 = 3200 claims greater than $100 in size. Since these claims contribute $100 each to the losses limited to $100, they contribute a total of: (3200)($100) = $320,000. Losses limited to $100 = (losses on Claims ≤$100) + (contribution of claims >$100). Thus losses on Claims ≤ $100 is: $365,000 - $320,000 = $45,000. (Losses Limited to $1000) / (Number of Claims) = E[X ∧ 1000] = $450. Since there are 5000 claims, Losses Limited to $1000 = ($450)(5000) = $2,250,000. Now there are 700 claims greater than $1000 in size. Since these claims contribute $1000 each to the losses limited to $1000, they contribute a total of: (700)($1000) = $700,000. Losses limited to $1000 = (losses on Claims ≤$1000)+(contribution of Claims >$1000). Thus losses on Claims ≤$1000 = $2,250,000 - $700,000 = $1,550,000. The total amount of the claims that are greater than $100 but no greater than $1000 is: (losses on Claims ≤$1000) - (losses on Claims ≤$100) = $1,550,000 - $45,000 = $1,505,000. 10.5. B. F(x) = Φ[(lnx − µ)/σ]. F(20000) = Φ[1.63]. F(35000) = Φ[1.82]. E[X
∧ x] = exp(µ + σ²/2) Φ[(ln x − µ − σ²)/σ] + x {1 − Φ[(ln x − µ)/σ]}.
E[X ∧ x] − x S(x) = exp(µ + σ²/2) Φ[(ln x − µ − σ²)/σ].
E[X ∧ 20,000] − (20,000) S(20,000) = (e^9.5) Φ[(ln 20000 − 5 − 3²)/3] = 13,360 Φ[−1.37].
E[X ∧ 35,000] − (35,000) S(35,000) = (e^9.5) Φ[(ln 35000 − 5 − 3²)/3] = 13,360 Φ[−1.18].
The average size of claims of size between $20,000 and $35,000 is:
({E[X ∧ 35000] − 35000 S(35000)} − {E[X ∧ 20000] − 20000 S(20000)}) / {F(35000) − F(20000)}
= 13,360 {Φ[−1.18] − Φ[−1.37]} / {Φ[1.82] − Φ[1.63]} = 13,360 (0.1190 − 0.0853)/(0.9656 − 0.9484) = 26,176.

10.6. D. The average size of those claims of size between 20,000 and 30,000 equals:
({E[X ∧ 30000] − 30000 S(30000)} − {E[X ∧ 20000] − 20000 S(20000)}) / {F(30000) − F(20000)}
= {(9340 − (0.2)(30000)) − (7050 − (0.25)(20000))} / (0.80 − 0.75) = (3340 − 2050)/0.05 = $25,800.
10.7. B. (Losses Limited to $1000) / (Number of Claims) = E[X ∧ 1000] = $560. Since there are 3000 claims, Losses Limited to $1000 = ($560)(3000) = $1,680,000. Now there are 830+70 = 900 claims greater than $1000 in size. Since these claims contribute $1000 each to the losses limited to $1000, they contribute a total of (900)($1000) = $900,000. Losses limited to $1000 = (losses on Claims ≤ $1000) + (contribution of claims > $1000). Thus the losses on claims ≤ $1000 = $1,680,000 - $900,000 = $780,000. Now the losses on claims ≤ $5000 = (losses on claims ≤$1000) + (losses on claims > $1000 and ≤ $5000) = $780,000 + ($1,600,000) = $2,380,000. Finally, the losses limited to $5000 = (the losses on claims ≤ $5000) + (Number of Claims > $5000)($5000) = $2,380,000 + (70)($5000) = $2,730,000. E[X ∧ 5000] = (Losses limited to $5000)/(Total Number of Claims) = $2,730,000 / 3000 = $910. Alternately, the average size of those claims of size between 1,000 and 5,000 equals : ({E[X ∧ 5000] - 5000S(5000)} - {E[X ∧ 1000] - 1000S(1000)})/{F(5000)-F(1000)}. We are given that: S(1000) = 900/3000 = 0.30, S(5000) = 70/3000 = 0.0233, E[X ∧ 1000] = 560. The observed average size of those claims of size 1000 to 5000 is: 1,600,000/ 830 = 1927.7 Setting the observed average size of those claims of size 1000 to 5000 equal to the above formula for the same quantity: 1927.7 = ({E[X ∧ 5000] - 5000S(5000)} - {E[X ∧ 1000] - 1000S(1000)})/{F(5000)-F(1000)} = ({E[X ∧ 5000] - 5000(0.0233)} - {560 - 1000(0.30))/{0.9767 - 0.70} . Solving, E[X ∧ 5000] = (1927.7)(0.2767) + 116.5 + 560 - 300 = $910. 10.8. D. Average size of losses between 100 and 200 is: ({E[X ∧ 200] - 200S(200)} - {E[X ∧ 100] - 100S(100)})/(F(200) - F(100)) = ({153 - (200)(1 - 0.6)} - {91 - (100)(1 - 0.2)})/(0.6 - 0.2) = (73 - 11)/0.4 = 155. Comment: Same information as in 3, 11/01, Q.35.
10.9. E. The average size for losses of size between x and x + 3 is:
({E[X ∧ x+3] − (x+3) S(x+3)} − {E[X ∧ x] − x S(x)}) / {F(x+3) − F(x)}
= {10(1 − e^(−(x+3)/10)) − (x+3)e^(−(x+3)/10) − 10(1 − e^(−x/10)) + xe^(−x/10)} / (e^(−x/10) − e^(−(x+3)/10))
= {10e^(−x/10) − 13e^(−(x+3)/10) − xe^(−(x+3)/10) + xe^(−x/10)} / {e^(−x/10)(1 − e^(−0.3))}
= (10 − 13e^(−0.3) − xe^(−0.3) + x)/(1 − e^(−0.3)) = (10 − 13e^(−0.3))/(1 − e^(−0.3)) + x.
The average fraction of the interval (x, x+3] lived by those who die during the interval
= {(the average size for losses of size between x and x + 3) − x}/(x + 3 − x) = (10 − 13e^(−0.3)) / {3(1 − e^(−0.3))}.
Alternately, the fraction for someone who dies at age x + t is: t/3. Average fraction is:
∫_0^3 (t/3) e^(−(x+t)/10)/10 dt / ∫_0^3 e^(−(x+t)/10)/10 dt
= {(e^(−x/10)/3)(−te^(−t/10) − 10e^(−t/10))]_(t=0)^(t=3)} / {e^(−x/10)(−e^(−t/10))]_(t=0)^(t=3)} = (10 − 13e^(−0.3)) / {3(1 − e^(−0.3))}.

10.10. E. One is asked for the average size of those claims of size less than 5000. This is:
∫_0^5000 x f(x) dx / F(5000) = {E[X ∧ 5000] − 5000(1 − F(5000))} / F(5000).
For this LogNormal Distribution: F(5000) = Φ[{ln(x) − µ}/σ] = Φ[{ln(5000) − 8.443}/1.239] = Φ[0.060] = 0.5239.
E[X ∧ 5000] = exp(µ + σ²/2) Φ[(ln x − µ − σ²)/σ] + x {1 − Φ[(ln x − µ)/σ]}
= exp(8.443 + 1.239²/2) Φ[(ln 5000 − 8.443 − 1.239²)/1.239] + 5000 (1 − 0.5239) = 10002 Φ[−1.18] + 2381 = (10002)(0.1190) + 2381 = 3571.
⇒ {E[X ∧ 5000] − 5000(1 − F(5000))} / F(5000) = (3571 − 2381)/0.5239 = 2271.
10.11. A. F(x) = 1 − (x/1000)^(−3), x > 1000. S(10,000) = 0.001. S(100,000) = 0.000001.
The average size of claim between 10,000 and 100,000 is the ratio of the dollars of loss on such claims to the number of such claims:
∫_10,000^100,000 x f(x) dx / {F(100,000) − F(10,000)} = (3 × 10^9) ∫_10,000^100,000 x^(−3) dx / 0.000999
= (3.003 × 10^12)(1/2)(10,000^(−2) − 100,000^(−2)) = (1.5015)(10,000 − 100) = 14,865.
Comment: One can get the Distribution Function either by integrating the density function from 1000 to x or by recognizing that this is a Single Parameter Pareto Distribution. Note that in this case, as is common for a distribution skewed to the right, the average size of claim is near the left end of the interval rather than near the middle.

10.12. D. F(x) = 1 − (θ/(θ+x))^α. S(25000) = 1 − F(25000) = {25000/(25000+25000)}² = 1/4.
E[X ∧ x] = {θ/(α−1)}{1 − (θ/(θ+x))^(α−1)}. E[X ∧ 25000] = {25000/(2−1)}{1 − (25000/(25000+25000))^(2−1)} = 25000(1/2) = 12,500.
The expected claim size for claims in the interval (25,000, ∞) = (E[X] − {E[X ∧ 25000] − 25000 S(25000)})/S(25000) = (25000 − (12500 − (1/4)(25000)))/(1/4) = (18750)(4) = 75,000.
Alternately, the average payment the insurer would make excess of 25,000, per non-zero such payment, is {E[X] − E[X ∧ 25000]}/S(25000) = 50,000. Then the expected claim size for claims in the interval (25,000, ∞) is this portion excess of 25,000 plus an additional 25,000 per large claim: 50,000 + 25,000 = 75,000.

10.13. A. LER(1000) = 0.30. E[X] = 2000. F(1000) = 0.60. E[X ∧ 1000] = LER(1000) E[X] = (0.30)(2000) = 600.
The average size of those losses less than 1000 is: {E[X ∧ 1000] − 1000 S(1000)}/F(1000) = {600 − (1000)(1 − 0.6)}/0.6 = (600 − 400)/0.6 = 333.33.
Section 11, Grouped Data

Unlike the ungrouped data in Section 1, often one is called upon to work with data grouped into intervals.47 In this example, both the number of losses in each interval and the dollars of loss on those losses are shown. Sometimes the latter information is missing or sometimes additional information may be available.

Interval ($000)   Number of Losses   Total of Losses in the Interval ($000)
0-5               2208               5,974
5-10              2247               16,725
10-15             1701               21,071
15-20             1220               21,127
20-25             799                17,880
25-50             1481               50,115
50-75             254                15,303
75-100            57                 4,893
100-∞             33                 4,295
SUM               10,000             157,383
The estimated mean is $15,738. As will be seen, in some cases one has to deal with grouped data in a somewhat different manner than ungrouped data. With modern computing power, the actuary is usually better off working with the data in an ungrouped format if available. The grouping process discards valuable information. The wider the intervals, the worse is the loss of information.
47
Note that in this example, for simplicity I have not made a big deal over whether for example the 10-15 interval includes 15 or not. In many real world applications, in which claims cluster at round numbers, that can be important.
Section 12, Working with Grouped Data

For the Grouped Data in Section 11, it is relatively easy to compute the various items discussed previously.

The Limited Expected Value can be computed provided the limit is the upper boundary of one of the intervals. One sums the losses for all the intervals below the limit, and adds the product of the limit times the number of claims greater than the limit. For example, the Limited Expected Value at $25,000 is:
{82,777 thousand + (25 thousand)(1825 claims)} / (10,000 claims) = 12.84 thousand.

The Loss Elimination Ratio, LER(x), can be computed provided x is a boundary of an interval. LER(x) = E[X ∧ x] / mean. So LER(25000) = 12.84 / 15.74 = 81.6%.

The Excess Ratio, R(x), can be computed provided x is a boundary of an interval. The excess losses are the sum of the losses for intervals above x, minus the product of x and the number of claims greater than x. For example, the losses excess of $75 thousand are: (4893 + 4295) − (57 + 33)(75) = 2438 thousand. R(75,000) = 2438 / 157,383 = 1.5%. Also, R(x) = 1 − LER(x) = 1 − E[X ∧ x] / mean.

The mean excess loss, e(x), can be computed provided x is a boundary of an interval. e(x) is the losses excess of x, divided by the number of claims greater than x. So for example, using the excess losses computed above, e(75,000) = 2,438,000 / (57 + 33) = 27.1 thousand. e(x) can also be computed using e(x) = R(x) mean / S(x).
Exercise: Compute the Limited Expected Values, Loss Elimination Ratios, Excess Ratios, and Mean Residual Lives for the Grouped Data in Section 11.
[Solution: All amounts are in $000.

x          # claims in interval   # claims > x   Loss in Interval   Cumulative Losses   E[X ∧ x]   LER(x)   R(x)     e(x)
0                                 10,000                            0                   0          0        100.0%   15.7
5          2208                   7,792          5,974              5,974               4.5        28.6%    71.4%    14.4
10         2247                   5,545          16,725             22,699              7.8        49.7%    50.3%    14.3
15         1701                   3,844          21,071             43,770              10.1       64.4%    35.6%    14.6
20         1220                   2,624          21,127             64,897              11.7       74.6%    25.4%    15.2
25         799                    1,825          17,880             82,777              12.8       81.6%    18.4%    15.9
50         1481                   344            50,115             132,892             15.0       95.4%    4.6%     21.2
75         254                    90             15,303             148,195             15.5       98.5%    1.5%     27.1
100        57                     33             4,893              153,088             15.6       99.4%    0.6%     30.2
Infinity   33                     0              4,295              157,383             15.7       100%     0.0%
Total      10,000                                157,383
For example, the numerator of the limited expected value at 20,000 is: contribution of small losses + contribution of large losses = $5,974,000 + $16,725,000 + $21,071,000 + $21,127,000 + ($20,000)(799 + 1481 + 254 + 57 + 33) = $64,897,000 + ($20,000)(2624) = $117,377,000. E[X ∧ 20,000] = $117,377,000/10,000 = $11,738. LER(20,000) = E[X ∧ 20,000]/E[X] = $11,738/$15,738 = 74.6%. e(20,000) = (E[X] - E[X ∧ 20,000]) / S(20,000) = ($15,738 - $11,738) / (2624/10,000) = $15,244.]
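For readers who want to reproduce the grouped-data calculations above, here is an illustrative Python sketch (not from the Study Guide; the layout and names are my own) that computes E[X ∧ x], LER(x), R(x), and e(x) at any interval boundary of the Section 11 data.

```python
# Empirical statistics at interval boundaries, Section 11 grouped data (amounts in $000).
tops   = [5, 10, 15, 20, 25, 50, 75, 100, float("inf")]
counts = [2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33]
losses = [5974, 16725, 21071, 21127, 17880, 50115, 15303, 4893, 4295]

n_total = sum(counts)              # 10,000 claims
total_loss = sum(losses)           # 157,383 thousand
mean = total_loss / n_total        # about 15.74 thousand

def stats_at(x):
    """E[X ^ x], LER(x), R(x), e(x), for x an upper interval boundary."""
    small = sum(l for t, l in zip(tops, losses) if t <= x)   # losses on claims below x
    n_big = sum(c for t, c in zip(tops, counts) if t > x)    # number of claims above x
    lev = (small + x * n_big) / n_total
    ler = lev / mean
    e_x = (total_loss - small - x * n_big) / n_big           # mean excess loss
    return lev, ler, 1.0 - ler, e_x

print(stats_at(25))   # LEV ~ 12.84, LER ~ 81.6%, R ~ 18.4%, e ~ 15.9
print(stats_at(75))   # LEV ~ 15.5,  LER ~ 98.5%, R ~ 1.5%,  e ~ 27.1
```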
Problems:

Use the following information to answer the next four questions:
Range ($)   # of claims   Loss
0-100       6300          $300,000
100-200     2350          $350,000
200-300     850           $200,000
300-400     320           $100,000
400-500     110           $50,000
over 500    70            $50,000
Total       10000         $1,050,000
12.1 (1 point) What is the Loss Elimination Ratio, for a deductible of $200? A. less than .83 B. at least .83 but less than .85 C. at least .85 but less than .87 D. at least .87 but less than .89 E. at least .89 12.2 (1 point) What is the Limited Expected Value, for a Limit of $300? A. less than $95 B. at least $95 but less than $105 C. at least $105 but less than $115 D. at least $115 but less than $125 E. at least $125 12.3 (1 point) What is the empirical mean excess loss at $400? A. less than $140 B. at least $140 but less than $150 C. at least $150 but less than $160 D. at least $160 but less than $170 E. at least $170 12.4 (1 point) What is the Excess Ratio, excess of $500? A. less than 1.5% B. at least 1.5% but less than 1.6% C. at least 1.6% but less than 1.7% D. at least 1.7% but less than 1.8% E. at least 1.8%
12.5 (1 point) Calculate the loss elimination ratio for a $500 deductible.
Loss Size       Number of Claims   Total Amount of Loss
$0-249          5,000              $1,125,000
250-499         2,250              765,000
500-999         950                640,000
1,000-2,499     575                610,000
2500 or more    200                890,000
Total           8,975              $4,030,000
A. Less than 55.0%
B. At least 55.0%, but less than 60.0%
C. At least 60.0%, but less than 65.0%
D. At least 65.0%, but less than 70.0%
E. 70.0% or more
12.6 (2 points) An individual health insurance policy will pay: • None of the first $500 of annual medical costs. • 80% of the next $2500 of annual medical costs. • 100% of annual medical costs excess of $3000. Annual Medical Costs Frequency Average Amount $0 20% $1-$500 20% $300 $501-$1000 10% $800 $1001-$1500 10% $1250 $1501-$2000 10% $1700 $2001-$2500 10% $2150 $2501-$3000 10% $2600 over $3000 10% $4500 What is the average annual amount paid by this policy? A. $800 B. $810 C. $820 D. $830 E. $840
12.7 (3 points) Use the following information:
Range ($)            # of claims   Loss ($000)
0                    2370          0
1-10,000             1496          4,500
10,001-25,000        365           6,437
25,001-100,000       267           13,933
100,001-300,000      99            16,488
300,001-1,000,000    15            7,207
Over 1,000,000       1             2,050
Total                4613          50,615
Determine the loss elimination ratios at 10,000, 25,000, 100,000, 300,000, and 1 million.

12.8 (2 points) You are given the following data on sizes of loss:
Range         # of claims   Loss
0-99          29            1000
100-199       38            6000
200-299       13            3000
300-399       9             3000
400-499       7             3000
500 or more   4             4000
Total         100           20,000
Determine the empirical limited expected value E[X ∧ 300].
A. 145
B. 150
C. 155
D. 160
E. 165

12.9 (3 points) Use the following information:
Range ($)            # of claims   Loss ($000)
0                    2711          0
1-10,000             1124          3,082
10,001-50,000        372           7,851
50,001-100,000       83            5,422
100,001-300,000      51            7,607
300,001-1,000,000    5             2,050
Over 1,000,000       2             3,000
Total                4348          29,012
Determine the loss elimination ratios at 10,000, 50,000, 100,000, 300,000, and 1 million.
12.10 (3 points) Using the following data, determine the mean excess loss at the endpoints of the intervals.
Interval             Number of claims   Dollars of Loss
$1 - 10,000          1,600              $16,900,000
$10,001 - 30,000     600                $14,000,000
$30,001 - 100,000    250                $12,500,000
$100,001 - 500,000   48                 $5,500,000
Over $500,000        2                  $1,100,000
Total                2,500              $50,000,000
12.11 (4B, 5/96, Q.22) (2 points) Forty (40) observed losses have been recorded in thousands of dollars and are grouped as follows: Interval Number of Total Losses ($000) Losses ($000) (1, 4/3) 16 20 [4/3, 2) 10 15 [2, 4) 10 35 [4, ∞) 4 20 Determine the empirical limited expected value function evaluated at 2 (thousand). A. Less than 0.5 B. At least 0.5, but less than 1.0 C. At least 1.0, but less than 1.5 D. At least 1.5, but less than 2.0 E. At least 2.0 12.12 (4B, 11/97, Q.8) (2 points) You are given the following: • A sample of 2,000 claims contains 1,700 that are no greater than $6,000, 30 that are greater than $6,000 but no greater than $7,000, and 270 that are greater than $7,000. • The total amount of the 30 claims that are greater than $6,000 but no greater than $7,000 is $200,000. • The empirical limited expected value function for this sample evaluated at $6,000 is $1,810. Determine the empirical limited expected value function for this sample evaluated at $7,000. A. Less than $1,900 B. At least $1,900, but less than $1,925 C. At least $1,925, but less than $1,950 D. At least $1,950, but less than $1,975 E. At least $1,975
12.13 (Course 4 Sample Exam 2000, Q.7) Summary statistics of 100 losses are:
Interval         Number of Losses   Sum       Sum of Squares
(0, 2000]        39                 38,065    52,170,078
(2000, 4000]     22                 63,816    194,241,387
(4000, 8000]     17                 96,447    572,753,313
(8000, 15000]    12                 137,595   1,628,670,023
(15,000, ∞)      10                 331,831   17,906,839,238
Total            100                667,754   20,354,674,039
Determine the empirical limited expected value E[X ∧ 15,000].
12.14 (CAS5, 5/00, Q.27) (1 point) Calculate the excess ratio at $100. (The excess ratio is one minus the loss elimination ratio.) Loss Size ($) Number of Claims Amount of Losses ($) 0-50 600 21,000 51- 100 500 37,500 101 -250 400 70,000 251 - 500 300 120,000 501-1000 200 150,000 Over 1000 100 200,000 Total 2,100 598,500 A. Less than 0.700 B. At least 0.700, but less than 0.725 C. At least 0.725, but less than 0.750 D. At least 0.750, but less than 0.775 E. At least 0.775
12.15 (CAS5, 5/02, Q.14) (1 point) Based on the following full-coverage loss experience, calculate the excess ratio at $500. (The excess ratio is one minus the loss elimination ratio.) Loss Size ($) Number of Claims Amount of Losses ($) 0 - 100 1,100 77,000 101 - 250 800 148,000 251 - 500 500 180,000 501 -1000 350 245,000 1001 - 2000 200 300,000 Over 2000 50 150,000 Total 3,000 1,100,000 A. Less than 0.250 B. At least 0.250, but less than 0.350 C. At least 0.350, but less than 0.450 D. At least 0.450, but less than 0,550 E. At least 0.550 12.16 (CAS3, 11/04, Q.26) (2.5 points) A sample of size 2,000 is distributed as follows: Range Count 0≤X≤6,000 1,700 6,000
• The sum of the 30 observations between 6,000 and 7,000 is 200,000. • For the empirical distribution X, E(X ∧ 6,000) = 1,810. Determine E(X ∧ 7,000). A. Less than 1,910 B. At least 1,910, but less than 1,930 C. At least 1,930, but less than 1,950 D. At least 1,950, but less than 1,970 E. At least 1,970
12.17 (CAS5, 5/05, Q.19) (1 point) Given the following information, calculate the loss elimination ratio at a $500 deductible. Loss Amount Claim Count Total Loss Below $500 150 $15,000 $500 6 $3,000 Over $500 16 $22,000 A. Less than 0.4 B. At least 0.4, but less than 0.5 C. At least 0.5, but less than 0.6 D. At least 0.6, but less than 0.7 E. At least 0.7
Solutions to Problems: 12.1. D. LER(200) = Losses Eliminated / Total Losses = {(300,000+350,000) + (200)(850+320+110+70)} / 1,050,000 = $920,000 / $1,050,000 = 0.876. Alternately, E[X ∧ 200] = {(300,000+350,000) + (200)(850 +320+110+70)} /10000 = $92. Mean = $105. Thus, LER(200) = E[X ∧ 200] / mean = $92 / $105= 0.876. 12.2. B. E[X ∧ 300] = {(300,000+350,000+200,000) + (300)(320+110+70)} / 10000 = 100. 12.3. C. e(400) = (dollars on claims excess of 400) / (# claims greater than 400) - 400 = {(50000 + 50000) / (110 + 70)} - 400 = 555.56 - 400 = $155.56. Alternately, e(400) = { mean - E[X ∧ 400] } / {1 - F(400)} = ($105 - $102.2) / .018 = $155.56. 12.4. A. R(500) = (dollars excess of 500) / (total dollars) = {50000 - (70)(500)} / 1,050,000 = 0.0143. Alternately, R(500) = 1 - E[X ∧ 500] / mean = 1 - $103.5 / $105 = 0.0143. 12.5. D. Losses eliminated = 1,125,000 + 765,000 + (500)(950 + 575 + 200) = 2,752,500. Loss Elimination Ratio = 2,752,500/4,030,000 = 68.3%. 12.6. D. 80% of the layer from 500 to 3000 is paid, plus 100% of the layer excess of 3000. Annual Cost
Frequency   Average Cost   Layer 500 to 3000   Layer excess of 3000   Paid
$0              20%                   $0      $0      $0
$1 to $500      20%    $300           $0      $0      $0
$501 to $1000   10%    $800           $300    $0      $240
$1001 to $1500  10%    $1250          $750    $0      $600
$1501 to $2000  10%    $1700          $1200   $0      $960
$2001 to $2500  10%    $2150          $1650   $0      $1320
$2501 to $3000  10%    $2600          $2100   $0      $1680
over $3000      10%    $4500          $2500   $1500   $3500
Average amount paid: $830.
12.7. The losses eliminated by a deductible of size 10,000 are in thousands: 4500 + 10(365 + 267 + 99 + 15 + 1) = 11,970. LER(10,000) = 11970/50615 = 23.65%.
All amounts in $000:
Top of Interval x   # claims in Interval   # claims > x   Loss in Interval   Cumulative Losses   Losses Eliminated at x   LER(x)
0                   2370                   2243           0                  0
10                  1496                   747            4,500              4,500               11,970                   23.65%
25                  365                    382            6,437              10,937              20,487                   40.48%
100                 267                    115            13,933             24,870              36,370                   71.86%
300                 99                     16             16,488             41,358              46,158                   91.19%
1000                15                     1              7,207              48,565              49,565                   97.93%
Infinity            1                      0              2,050              50,615
Total               4613                                  50,615
Comment: Data taken from AIA Closed Claim Study (1974) in Table IV of “Estimating Pure Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976. Finger calculates excess ratios, which are one minus the loss elimination ratios. 12.8. D. The contribution to the numerator of E[X ∧ 300] from the claims of size less than 300 is the sum of their losses: 1000 + 6000 + 3000 = 10,000. Each claim of size 300 or more contributes 300 to the numerator of E[X ∧ 300]; the sum of their contributions is: (300)(9 + 7 + 4) = 6000. E[X ∧ 300] = (10,000 + 6000)/100 = 160.
12.9. The losses eliminated by a deductible of size 100,000 are in thousands: 3082 + 7851 + 5422 + 100(51 + 5 + 2) = 22,155. LER(100,000) = 22155/29012 = 76.36%.
All amounts in $000:
Top of Interval x   # claims in Interval   # claims > x   Loss in Interval   Cumulative Losses   Losses Eliminated at x   LER(x)
0                   2711                   1637           0                  0
10                  1124                   513            3,082              3,082               8,212                    28.31%
50                  372                    141            7,851              10,933              17,983                   61.98%
100                 83                     58             5,422              16,355              22,155                   76.36%
300                 51                     7              7,607              23,962              26,062                   89.83%
1000                5                      2              2,050              26,012              28,012                   96.55%
Infinity            2                      0              3,000              29,012
Total               4348                                  29,012
Comment: Data taken from NAIC Closed Claim Study (1975) in Table VII of “Estimating Pure Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976. Finger calculates excess ratios, which are one minus the loss elimination ratios.
12.10. e(0) = E[X] = $50,000,000 / 2500 = $20,000. e(10,000) = ($14,000,000 + $12,500,000 + $5,500,000 + $1,100,000) / (600 + 250 + 48 + 2) - 10,000 = $26,778. e(30,000) = ($12,500,000 + $5,500,000 + $1,100,000) / (250 + 48 + 2) - 30,000 = $33,667. e(100,000) = ($5,500,000 + $1,100,000) / (48 + 2) - 100,000 = $32,000. e(500,000) = $1,100,000 / 2 - 500,000 = $50,000. 12.11. D. Include in the Numerator the small losses at their reported value, while limiting the large losses to 2 (thousand) each. The Denominator is the total number of claims. Therefore, E[X ∧ 2] = (20+15+ (10)(2)+ (4)(2))/(16 +10+10+4) = 63/40 = 1.575. 12.12. D. (Losses Limited to $6000) / (Number of Claims) = E[X ∧ 6000] = $1810. Since there are 2000 claims, Losses Limited to $6000 = ($1810)(2000) = $3,620,000. Now there are 30+270 = 300 claims greater than $6000 in size. Since these claims contribute $6000 each to the losses limited to $6000, they contribute a total of (300)($6000) = $1,800,000. Losses limited to $6000 = (losses on claims ≤$6000)+(contribution of claims >$6000). Thus the losses on claims ≤ $6000 = $3,620,000 - $1,800,000 = $1,820,000. Now the losses on claims ≤ $7000 = (losses on claims ≤6000) + (losses on claims > $6000 and ≤ $7000) = $1,820,000 + ($200,000) = $2,020,000. Finally, the losses limited to $7000 = (the losses on claims ≤ $7000) + (Number of Claims > $7000)($7000) = $2,020,000 + (270)($7000) = $3,910,000. E[X ∧ 7000] = (Losses limited to $7000)/(Total Number of Claims) = $3,910,000 / 2000 = $1955. Alternately, the average size of those claims of size between 6,000 and 7,000 equals : ({E[X ∧ 7000] - 7000S(7000)} - {E[X ∧ 6000] - 6000S(6000)})/{F(7000)-F(6000)}. We are given that: S(6000) = 300/2000 = .15, S(7000) = 270/2000 = .135, E[X ∧ 6000] = 1810. The observed average size of those claims of size 6000 to 7000 is: 200000/ 30 = 6666.7. Setting the observed average size of those claims of size 6000 to 7000 equal to the above formula for the same quantity: 6666.7 = ({E[X ∧ 7000] - 7000S(7000)} - {E[X ∧ 6000] - 6000S(6000)})/{F(7000)-F(6000)} = ({E[X ∧ 7000] - 7000(.135)} - {1810 - 6000(.15))/{.865 - .85} . Solving, E[X ∧ 7000] = (6666.7)(.015) + 945 + 1810 - 900 = $1955. Comment: While Lee Diagrams are not on the syllabus, this question may also be answered via a Lee Diagram, not to scale, as follows:
[Lee Diagram, not to scale: claim sizes $6000 and $7000 are marked on the vertical axis, cumulative probabilities 0.850, 0.865, and 1.000 on the horizontal axis, and the regions are labeled A, B, C, D, and E as referenced below.]
A + B + C = E[X ∧ 6000] = $1810 D + B = (Losses on claims of size 6000 to 7000) / (total number of claims) = $200,000 / 2000 = $100. B = ($6000)(30/2000) = ($6000)(.865 - .850) = $90. Therefore, D = (D+B) - B = $100 - $90 = $10. E = ($7000-$6000)(270/2000) = ($1000)(1-.865) = $135. E[X ∧ 7000] = A + B + C + D + E = $1810 + $10 + $135 = $1955. 12.13. The limited losses are: (dollars from small losses) + (15000)(number of large losses) = (667,754 - 331,831) + (15000)(10) = 485,923. E[X ∧ 15,000] = losses limited to 15000)/(number of losses) = 485,923/100 = 4859. 12.14. C. E[X ∧ 100] = {(21000 + 37500) + (100)(400+ 300 + 200 + 100)} / 2100 = 158,500/2100 = 75.48. Mean = 598,500/2100 = 285. Excess Ratio at 100: R(100) = 1 - E[X ∧ 100]/E[X] = 1 - 75.48/285 = 73.5%. Alternately, the losses excess of 100, are contributed by the last four intervals: 70,000 - (400)(100) + 120,000 - (300)(100) + 150,000 - (200)(100) + 200,000 - (100)(100) = 30,000 + 90,000 + 130,000 + 190,000 = 440,000. Excess Ratio at 100: Losses Excess of $100 / Total Losses = 440/598.5 = 73.5%.
12.15. C. E[X ∧ 500] = {(77000 + 148000 + 180000) + (500)(350 + 200 + 50)} / 3000 = 705000/3000 = 235.
Mean = 1,100,000/3000 = 366.67. Excess Ratio at 500: R(500) = 1 − E[X ∧ 500]/E[X] = 1 − 235/366.67 = 35.9%.
Alternately, the losses excess of 500 are contributed by the last three intervals:
245,000 − (350)(500) + 300,000 − (200)(500) + 150,000 − (50)(500) = 395,000.
Excess Ratio at 500: Losses Excess of $500 / Total Losses = 395/1100 = 35.9%.
Comments: Here is the calculation of the excess ratios at various amounts:
Top of Interval x   # claims in Interval   # claims > x   Loss in Interval   Cumulative Losses   E[X ∧ x]   R(x)
0                                          3,000                             0                   0          100.0%
100                 1,100                  1,900          77,000             77,000              89.0       75.7%
250                 800                    1,100          148,000            225,000             166.7      54.5%
500                 500                    600            180,000            405,000             235.0      35.9%
1000                350                    250            245,000            650,000             300.0      18.2%
2000                200                    50             300,000            950,000             350.0      4.5%
Infinity            50                     0              150,000            1,100,000           366.7      0.0%
Total               3,000                                 1,100,000
The Loss Elimination Ratio at $500 is: 1 − 35.9% = 64.1% = 235/366.67 = E[X ∧ 500]/E[X].

12.16. D. (X ∧ 7,000) − (X ∧ 6,000) = 0 for X ≤ 6,000; X − 6,000 for 6,000 < X ≤ 7,000; 1,000 for 7,000 < X.
For the 30 observations between 6,000 and 7,000: Σ(xi − 6000) = Σxi − (30)(6000) = 200,000 − 180,000 = 20,000.
The 270 observations greater than 7,000 contribute to this difference: (270)(1000) = 270,000.
⇒ Σ{(xi ∧ 7,000) − (xi ∧ 6,000)} = 20,000 + 270,000 = 290,000.
⇒ E(X ∧ 7,000) − E(X ∧ 6,000) = 290,000/(1700 + 30 + 270) = 145.
⇒ E(X ∧ 7,000) = E(X ∧ 6,000) + 145 = 1810 + 145 = 1955.
Alternately, 1810 = E(X ∧ 6,000) = {sum of small losses + (300)(6000)}/2000.
⇒ Sum of losses of size less than 6000 is: (1810)(2000) − (300)(6000) = 1,820,000.
E(X ∧ 7,000) = {1,820,000 + 200,000 + (7000)(270)}/2000 = 1955.
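The bookkeeping behind 12.12 and 12.16 (shifting an empirical limited expected value from one limit to a higher one) is short enough to verify directly. A minimal sketch, assuming only the figures given in those problems; the variable names are mine.

```python
# From E[X ^ 6000] to E[X ^ 7000] for the sample of 2,000 claims.
n = 2000                        # total number of claims
n_mid, sum_mid = 30, 200_000    # claims in (6000, 7000] and their total dollars
n_big = 270                     # claims above 7000
lev_6000 = 1810

# Each claim in (6000, 7000] adds (x - 6000) to the limited losses;
# each claim above 7000 adds the full layer width of 1000.
increase = (sum_mid - 6000 * n_mid) + 1000 * n_big
lev_7000 = lev_6000 + increase / n
print(lev_7000)                 # 1955.0
```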
12.17. D. Total Losses: 15,000 + 3000 + 22000 = 40,000. Losses eliminated: 15,000 + (6 + 16)(500) = 26,000. Loss Elimination Ratio: 26/40 = 0.65.
Section 13, Uniform Distribution If losses are uniformly distributed on an interval [a, b], then it is equally likely that a loss is anywhere in that interval. The probability of a loss being in any subinterval is proportional to the width of that subinterval. Exercise: Losses are uniformly distributed on the interval [3, 7]. What is the probability that a loss chosen at random will be in the interval [3.6, 3.8]? [Solution: (3.8 - 3.6)/(7 - 3) = .05.]
Uniform Distribution

Support: a ≤ x ≤ b                    Parameters: None

D.f.:     F(x) = (x − a) / (b − a)

P.d.f.:   f(x) = 1 / (b − a)

Moments:  E[X^n] = (b^(n+1) − a^(n+1)) / {(b − a)(n + 1)}

Mean = (b + a)/2               Variance = (b − a)²/12

Coefficient of Variation = Standard Deviation / Mean = (b − a) / {(b + a)√3}

Skewness = 0          Kurtosis = 9/5          Median = (b + a)/2

Limited Expected Value Function:
E[X ∧ x] = (2xb − a² − x²) / {2(b − a)}, for a ≤ x ≤ b
E[(X ∧ x)^n] = {(n + 1) x^n b − a^(n+1) − n x^(n+1)} / {(n + 1)(b − a)}, for a ≤ x ≤ b

e(x) = (b − x)/2, for a ≤ x < b
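The closed-form expressions above are easy to verify numerically. The sketch below (my own illustration, not from the text) compares the limited-moment formula against a brute-force average over a fine grid of the support.

```python
# Check the uniform-distribution formulas on [a, b] against direct numerical averaging.
a, b = 3.0, 7.0

def lev_formula(x, n=1):
    # E[(X ^ x)^n] = {(n+1) x^n b - a^(n+1) - n x^(n+1)} / {(n+1)(b - a)}, for a <= x <= b
    return ((n + 1) * x**n * b - a**(n + 1) - n * x**(n + 1)) / ((n + 1) * (b - a))

def lev_numeric(x, n=1, steps=100_000):
    # Average min(t, x)^n over a fine grid of the uniform support [a, b].
    total = sum(min(a + (k + 0.5) * (b - a) / steps, x) ** n for k in range(steps))
    return total / steps

print(lev_formula(6), lev_numeric(6))          # both about 4.875
print((b + a) / 2, (b - a) ** 2 / 12)          # mean 5, variance 1.333
print(lev_formula(6, n=2), lev_numeric(6, 2))  # limited second moment, about 24.75
```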
Exercise: Assume losses are uniformly distributed on the interval [20, 25]. What is mean, second moment., third moment, fourth moment, variance, skewness and kurtosis? [Solution: The mean is: (20+25)/2 = 22.5. The second moment is: (b3 - a3 ) / {(b-a)(3)} = (253 - 203 ) / {(25-20)(3)} = 508.3333. The third moment is: (b4 - a4 ) / {(b-a)(4)} = (254 - 204 ) / {(25-20)(4)} = 11531.25. The fourth moment is: (b5 - a5 ) / {(b-a)(5)} = (255 - 205 ) / {(25-20)(5)} = 262,625. Therefore the variance is: 508.333 - 22.52 = 2.08 = (25-20)2 / 12. The skewness is: {11531.025 - (3)(508.333)(22.5) + 2(22.53 )} / 2.081.5 = 0. The kurtosis is: {262,625 - (4)(11531.25)(22.5) + (6)(508.3333)(22.52 ) - 3(22.54 )} / 2.082 = 1.8. Comment: The skewness of a uniform distribution is always zero, since it is symmetric. The kurtosis of a uniform distribution is always 9/5.]
Discrete Uniform Distribution: The uniform distribution discussed above is a continuous distribution. It is different than a distribution uniform and discrete on integers. For example, assume a distribution uniform and discrete on the integers from 10 to 13 inclusive: f(10) = 1/4, f(11) = 1/4, f(12) = 1/4, and f(13) = 1/4. It has mean of: (10 + 11 + 12 + 13)/4 = 11.5 = (10 + 13)/2. It has variance of: {(10 - 11.5)2 + (11 - 11.5)2 + (12 - 11.5)2 + (13 - 11.5)2 }/ 4 = 1.25. In general, for a distribution uniform and discrete on the integers from i to j inclusive: Mean = (i + j)/2 Variance = {(j + 1 - i)2 - 1}/12. For i = 10 and j = 13, the variance = {(13 + 1 - 10)2 - 1}/12 = 15/12 = 1.25, matching the previous result. Note that the variance formula is somewhat different for the discrete case than the continuous case.48 Exercise: What is the variance of a six-sided die? [Solution: Uniform and discrete from 1 to 6: variance = {(6 + 1 - 1)2 - 1}/12 = 35/12.] The variance of an S-sided die is: (S2 - 1)/12. 48
I would not memorize the formula for the variance in the discrete case.
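A two-line check of the discrete-uniform variance formula (an illustration of mine, not from the text):

```python
# Variance of a uniform distribution on the integers i, ..., j versus {(j + 1 - i)^2 - 1}/12.
def discrete_uniform_variance(i, j):
    values = range(i, j + 1)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(discrete_uniform_variance(10, 13), ((13 + 1 - 10) ** 2 - 1) / 12)  # 1.25, 1.25
print(discrete_uniform_variance(1, 6), (6 * 6 - 1) / 12)                 # 35/12 for a six-sided die
```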
Problems: Use the following information for the next 8 questions: The size of claims is uniformly distributed on the interval from 3 to 7. 13.1 (1 point) What is the probability density function at 6? A. 0.20
B. 0.25
C. 0.30
D. 0.35
13.2 (1 point) What is the distribution function at 6? A. 0.60 B. 0.65 C. 0.70 D. 0.75
E. 0.40
E. 0.80
13.3 (1 point) What is the mean of the severity distribution? A. less than 4 B. at least 4 but less than 5 C. at least 5 but less than 6 D. at least 6 but less than 7 E. at least 7 13.4 (1 point) What is the variance of the severity distribution? A. less than 1.4 B. at least 1.4 but less than 1.5 C. at least 1.5 but less than 1.6 D. at least 1.6 but less than 1.7 E. at least 1.7 13.5 (2 points) What is the limited expected value at 6? A. less than 4.5 B. at least 4.5 but less than 4.6 C. at least 4.6 but less than 4.7 D. at least 4.7 but less than 4.8 E. at least 4.8 13.6 (1 point) What is the excess ratio at 6? A. less than 2.0% B. at least 2.0% but less than 2.1% C. at least 2.1% but less than 2.2% D. at least 2.2% but less than 2.3% E. at least 2.3%
13.7 (1 point) What is the skewness of the severity distribution? (A) -1.0 (B) -0.5 (C) 0 (D) 0.5 (E) 1.0 13.8 (1 point) What is the mean excess loss at 4, e(4)? (A) 1.0 (B) 1.5 (C) 2.0 (D) 2.5 (E) 3.0 13.9 (2 points) Losses for a coverage are uniformly distributed on the interval 0 to $10,000. What is the Loss Elimination Ratio for a deductible of $1000? A. less than 0.16 B. at least 0.16 but less than 0.18 C. at least 0.18 but less than 0.20 D. at least 0.20 but less than 0.22 E. at least 0.22 13.10 (3 points) X is uniformly distributed on the interval 0 to 10,000. Determine the covariance of X ∧ 1000 and (X - 1000)+. A. less than 200,000 B. at least 200,000 but less than 210,000 C. at least 210,000 but less than 220,000 D. at least 220,000 but less than 230,000 E. at least 230,000 13.11 (3 points) X is uniformly distributed on the interval 0 to 10,000. Determine the correlation of X ∧ 1000 and (X - 1000)+. A. less than 0.3 B. at least 0.3 but less than 0.4 C. at least 0.4 but less than 0.5 D. at least 0.5 but less than 0.6 E. at least 0.6 13.12 (3 points) X is uniform on [0, 20]. Y is uniform on [0, 30]. X and Y are independent. Z is the maximum of X and Y. Determine E[Z]. A. less than 16.0 B. at least 16.0 but less than 16.5 C. at least 16.5 but less than 17.0 D. at least 17.0 but less than 17.5 E. at least 17.5
13.13 (1 point) X and Y are independent. X has a uniform distribution on 0 to 100. Y has a uniform distribution on 0 to ω. eY(40) = ex(40) - 5. What is ω? A. 90
B. 95
C. 100
D. 105
E. 110
13.14 (2, 5/83, Q.13) (1.5 points) A box is to be constructed so that its height is 10 inches and its base is X inches by X inches. If X has a uniform distribution over the interval (2, 8), then what is the expected volume of the box in cubic inches? A. 80.0 B. 250.0 C. 252.5 D. 255.0 E. 280.0 13.15 (160, 11/86, Q.5) (2.1 points) A population has a survival density function f(x) = 0.01, 0 < x < 100. Determine the probability that a life now aged 60 will live longer than a life now aged 50. (A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5 13.16 (2, 5/88, Q.40) (1.5 points) Let X be a random variable with a uniform distribution on the interval (1, a) where a > 1. If E(X) = 6 Var(X), then what is a? A. 2
B. 3
C. 3 2
D. 7
E. 8
13.17 (4B, 11/95, Q.28) (2 points) Two numbers are drawn independently from a uniform distribution on [0,1]. What is the variance of their product? A. 1/144 B. 3/144 C. 4/144 D. 7/144 E. 9/144 13.18 (Course 160 Sample Exam #1, 1999, Q.4) (1.9 points) A cohort of eight fruit flies is hatched at time t = 0. You are given: (i) The survival distribution for fruit flies is known to be uniform over (0, 10]. (ii) Deaths are observed at times 1, 1, 2, 4, 5, 6, 6 and 7. (ill) y is the number of fruit flies from the cohort observed to survive past e(0). For any future cohort of eight fruit flies, determine the probability that exactly y will survive beyond e(0). (A) 0.22 (B) 0.23 (C) 0.25 (D) 0.27 (E) 0.28 13.19 (Course 1 Sample Exam, Q.35) (1.9 points) Suppose the remaining lifetimes of a husband and wife are independent and uniformly distributed on the interval [0, 40]. An insurance company offers two products to married couples: One which pays when the husband dies; and One which pays when both the husband and wife have died. Calculate the covariance of the two payment times. A. 0.0 B. 44.4 C. 66.7 D. 200.0 E. 466.7
13.20 (1, 5/00, Q.38) (1.9 points) An insurance policy is written to cover a loss, X, where X has a uniform distribution on [0, 1000]. At what level must a deductible be set in order for the expected payment to be 25% of what it would be with no deductible? (A) 250 (B) 375 (C) 500 (D) 625 (E) 750 13.21 (1, 11/01, Q.28) (1.9 points) Two insurers provide bids on an insurance policy to a large company. The bids must be between 2000 and 2200. The company decides to accept the lower bid if the two bids differ by 20 or more. Otherwise, the company will consider the two bids further. Assume that the two bids are independent and are both uniformly distributed on the interval from 2000 to 2200. Determine the probability that the company considers the two bids further. (A) 0.10 (B) 0.19 (C) 0.20 (D) 0.41 (E) 0.60 13.22 (1, 11/01, Q.29) (1.9 points) The owner of an automobile insures it against damage by purchasing an insurance policy with a deductible of 250. In the event that the automobile is damaged, repair costs can be modeled by a uniform random variable on the interval (0, 1500). Determine the standard deviation of the insurance payment in the event that the automobile is damaged. (A) 361 (B) 403 (C) 433 (D) 464 (E) 521 13.23 (3, 11/02, Q.33) (2.5 points) XYZ Co. has just purchased two new tools with independent future lifetimes. Each tool has its own distinct De Moivre survival pattern. One tool has a 10-year maximum lifetime and the other a 7-year maximum lifetime. Calculate the expected time until both tools have failed. (A) 5.0 (B) 5.2 (C) 5.4 (D) 5.6 (E) 5.8 13.24 (2 points) In the previous question, calculate the expected time until at least one of the two tools has failed. (A) 2.6 (B) 2.7 (C) 2.8 (D) 2.9 (E) 3.0 13.25 (CAS3, 11/03, Q.5) (2.5 points) Given: i) Mortality follows De Moivre's Law. ii) eº 20 = 30. Calculate q20. A. 1/60
B. 1/70
C. 1/80
D. 1/90
E. 1/100
13.26 (SOA3, 11/03, Q.39) (2.5 points) You are given: (i) Mortality follows DeMoivreʼs law with ω = 105. (ii) (45) and (65) have independent future lifetimes. Calculate eº ____ . 45:65 (A) 33
(B) 34
(C) 35
(D) 36
(E) 37
Solutions to Problems:

13.1. B. f(x) = 1/(7 − 3) = 0.25, for 3 < x < 7.

13.2. D. F(x) = (x − 3)/(7 − 3), for 3 < x < 7. F(6) = (6 − 3)/(7 − 3) = 0.75.

13.3. C. Mean = (3 + 7)/2 = 5.

13.4. A. Second moment = ∫_3^7 x² (1/4) dx = x³/12 ]_3^7 = (343 − 27)/12 = 26.33.
Mean = (7 + 3)/2 = 5. Thus variance = 26.33 − 5² = 1.33.
Alternately, one can use the general formula for the variance of a uniform distribution on [a, b]: Variance = (b − a)²/12 = (7 − 3)²/12 = 1.333.

13.5. E. E[X ∧ 6] = ∫_3^6 x (1/4) dx + 6{1 − F(6)} = (36 − 9)/8 + (6)(1/4) = 4.875.
Alternately, one can use the general formula for E[X ∧ x] of a uniform distribution on [a, b]: E[X ∧ x] = (2xb − a² − x²) / {2(b − a)}. E[X ∧ 6] = {(2)(6)(7) − 3² − 6²} / {2(7 − 3)} = 4.875.

13.6. E. The excess ratio R(6) = 1 − E[X ∧ 6]/E[X] = 1 − 4.875/5 = 2.5%.

13.7. C. Since the uniform distribution is symmetric, the skewness is zero.

13.8. B. The losses of size larger than 4 are uniform from 4 to 7. The amount by which they exceed 4 is uniformly distributed from 0 to 3. e(4) = average amount by which those losses of size greater than 4 exceed 4 = (0 + 3)/2 = 1.5.
Comment: For a uniform distribution from a to b, e(x) = (b − x)/2, for a ≤ x < b.

13.9. C. The overall mean is (10000 + 0)/2 = 5000.
E[X ∧ 1000] = ∫_0^1000 (x/10000) dx + 1000(9000/10000) = 50 + 900 = 950. LER(1000) = 950/5000 = 0.19.
13.10. B. E[X ∧ 1000] = (1/10)(500) + (9/10)(1000) = 950. E[(X - 1000)+] = E[X] - E[X ∧ 1000] = 5000 - 950 = 4050.
(X ∧ 1000)(X - 1000)+ is 0 for x ≤ 1000 and (1000)(x - 1000) for x > 1000. 10000
∫
E[(X ∧ 1000)(X - 1000)+] = (1000)(x - 1000)/10000 dx = 4,050,000. 1000
Cov[(X ∧ 1000) , (X - 1000)+] = E[(X ∧ 1000)(X - 1000)+] - E[X ∧ 1000]E[(X - 1000)+] = 4,050,000 - (950)(4050) = 202,500. 13.11. C.
1000
∫
10000
∫
E[(X ∧ 1000)2 ] = x2 /10000 dx + 10002 /10000 dx = 33,333 + 900,000 = 933,333. 0
1000
Var[X ∧ 1000] = 933,333 - 9502 = 30,833. 10000
E[(X - 1000)+2 ] =
∫(x-1000)2/10000 dx = 24,300,000.
1000
Var[(X - 1000)+] = 24,300,000 - 40502 = 7,897,500. Corr[(X ∧ 1000) , (X - 1000)+] = Cov[(X ∧ 1000) , (X - 1000)+]/ Var[X ∧ 1000] Var[(X - 1000)+] = 202,500/ (30,833)(7,897,500) = 0.410. 13.12. D. Prob[Z ≤ z] = Prob[X ≤ z]Prob[Y ≤ z]: (z/20)(z/30) for z ≤ 20, and z/30 for 20 ≤ z ≤ 30. S(z) = 1 - z2 /600 for z ≤ 20, 1 - z/30 for 20 ≤ z ≤ 30, and 0 for z ≥ 30.
∫
20
30
∫
∫
z=20
z=30
Mean = S(z)dz = 1 - z2 /600 dz + 1 - z/30 dz = z - z3 /1800] + z - z2 /60] = 0
20
z=0
z=20
20 - 4.444 + 10 - 8.333 = 17.22. Alternately, f(t) = z/300 for z ≤ 20, 1/30 for 20 ≤ z ≤ 30, and 0 for z ≥ 30. 20
∫
∫
30
Mean = z f(z)dt = z2 /300 dz + 0
∫ z/30 dt = 8.888 + 8.333 = 17.22.
20
Comment: Similar to 3, 11/02, Q.33. 13.13. A. ex(40) = (100 - 40)/2 = 30. ⇒ 30 - 5 = 25 = eY(40) = (ω - 40)/2. ⇒ ω = 90.
8
13.14. E. E[10X2 ] =
∫ 10x2 / 6 dx = 10(83 - 23)/18 = 280. 2
13.15. D. The future life time of the life aged 60 is uniform from 0 to 40. The future lifetime of the life aged 50 is uniform from 0 to 50. If the age 60 dies at time = t, then the probability it lived longer 40
than the life aged 50 is: t/50.
∫ (t / 50) / 40 dt = 0.4. 0
13.16. B. E[X] = (1 + a)/2. Var[X] = (a - 1)2 /12. (1 + a)/2 = 6(a - 1)2 /12.
⇒ (1 + a) = (a - 1)2 . ⇒ a2 - 3a = 0. ⇒ a = 3. 13.17. D. For X and Y independent: E[XY] = E[X]E[Y], and E[X2 Y 2 ] = E[X2 ]E[Y2 ]. Var[XY] = E[(XY)2 ] - E2 [XY] = E[X2 Y 2 ] - {E[X]E[Y]}2 = E[X2 ]E[Y2 ] - E[X]2 E[Y]2 . For the uniform distribution on [0,1], E[X] = 1/2, E[X2 ] = integral from 0 to 1 of f(x)x2 dx = 1/3. Therefore, Var[XY] = (1/3)(1/3) - (1/4)(1/4) = 1/9 - 1/16 = (16 - 9) /144 = 7 /144. 13.18. A. e(0) = mean = (0 + 10)/2 = 5. Three flies survive past 5, so y = 5. For the uniform, probability 3 out of 8 survive past time 5 is: {(8!)/(3!5!)} .53 .55 = 0.21875. 13.19. C. Let X be the time of death of the husband. Let Y be the time of death of the wife. The first policy pays at maximum[X]. The second policy pays at maximum[X, Y]. E[X] = 20. Prob[Max[X, Y] ≤ t] = Prob[X ≤ t and Y ≤ t] = Prob[X ≤ t] Prob[Y ≤ t] = (t/40)(t/40) = t2 /1600. 40
E[Max[X, Y]] =
∫ 1 - t2/1600 dt = 40 - 13.333 = 26.667. 0
X Max[X, Y] = X2 if X ≥ Y and XY if X < Y. E[X Max[X, Y] | X = x] = (x2 )Prob[Y ≤ x] + xE[Y | Y > x]Prob[Y > x] = (x2 )(x/40) + x{(x + 40)/2}(1 - x/40) = x3 /80 + 20x. 40
40
∫ E[X Max[X, Y] | X = x] f(x) dx = ∫ (x3/80 + 20x)(1/40) dx = 600.
E[X Max[X, Y] ] = 0
Cov[X, Max[X, Y]] = 600 - (20)(26.667) = 66.66.
0
13.20. C. Expected payment to be 25% of what it would be with no deductible. ⇔ 75% of losses eliminated. .75 = LER[d] = E[X
∧
d]/E[X] = {(d/2)(d/1000) + d(1 - d/1000)}/500. ⇒ 375 = d - d2 /2000.
⇒ d2 - 2000d + 750,000 = 0. ⇒ d = {2000 ± 4,000,000 - 3,000,000 }/2 = 500. Comment: The other root of 1500 is greater than 1000 and thus not a solution to the question. 13.21. B. The two variables are within 20 of each other when they are in the southwest to northeast strip. Area of strip = 2002 - 1802 /2 - 1802 /2 = 7600. 7600/2002 = 19%. 2200
2020 2000 13.22. B. Mean payment is: (1/6)(0) + (5/6)(1250/2) = 520.83. 1500
∫
Second moment of payment is: (1/6)(02 ) + (x-250)2 / 1500 dx = 434028. 250
Variance of payment is: 434028 - 520.832 = 162764. Standard Deviation of payment is:
162,764 = 403.
Comment: I have included the probability of a payment of zero due to the deductible.
13.23. E. X is uniform on [0, 10] and Y is uniform on [0, 7]. Probability that both are dead by time t is: (t/10)(t/7) for t ≤ 7, and t/10 for 7 ≤ t ≤ 10. The corresponding survival function S(t) = 1 - t2 /70, t ≤ 7, 1 - t/10 for 7 ≤ t ≤ 10, 0 for t ≥ 10. 7
∫
10
∫
t=7
t=10
∫
Mean = S(t)dt = 1 - t2 /70 dt + 1 - t/10 dt = t - t3 /210] + t - t2 /20] = 5.82. 0
7
t=0
t=7
Alternately, the corresponding density function f(t) = t/35, t ≤ 7, 1/10 for 7 ≤ t ≤ 10, 0 for t ≥ 10. 7
∫
∫
10
t=7
t=10
∫
Mean = t f(t)dt = t2 /35 dt + t/10 dt = t3 /105] + t2 /20] = 3.267 + 5 - 2.45 = 5.82. 0
7
t=0
t=7
Comment: Last survivor status is discussed in Section 9.4 of Actuarial Mathematics. One could instead use equation 9.5.4 in Actuarial Mathematics: e° xy = e° x + e° y - e° xy . 13.24. B. X is uniform on [0, 10] and Y is uniform on [0, 7]. Probability that X has not failed by time t is: 1 - t/10, for t ≤ 10. Probability that Y has not failed by time t is: 1 - t/7, for t ≤ 7. Probability that neither tool has failed by time t is: (1 - t/10)(1 - t/7) for t ≤ 7. Corresponding survival function, S(t) = (1 - t/10)(1 - t/7) = 1 - .24286t + t2 /70, for t ≤ 7. 7
∫
t=7
∫ 1 - .24286t + t2/70 dt = t - .12143t2 + t3/210] = 2.68.
Mean = S(t)dt = 0
t=0
One could instead use equation 9.5.4 in Actuarial Mathematics: e° xy = e° x + e° y - e° xy , and the solution to the previous question: e° xy = e° x + e° y - e° xy = 5 + 3.5 - 5.82 = 2.68. 13.25. A. De Moivre's Law ⇒ the age of death for a life aged 20 is uniform from 20 to ω. 30 = e° 20 = the average future lifetime = (ω - 20)/2 = ω/2 - 10. ⇒ ω = 80. q20 = {F(21) - F(20)}/S(20) = {21/80 - 20/80}/(60/80) = 1/60.
13.26. B. The life aged 45 has a future lifetime uniform from 0 to 60, e° 45 = 30. The life aged 65 has a future lifetime uniform from 0 to 40, e° = 20. 65
For t < 40, Prob[both alive] = S(t) = (1 - t/60)(1 - t/40) = 1 - .04167t + t2 /2400. 40
40
∫
∫1 - .04167t + t2/2400 dt = t - .02083t2 + t3/7200 ] = 15.56.
e° 45:65 = S(t)dt = 0
t = 40
0
e° 4 5 :6 5 = e° 45 + e° 65 - e° 45:65 = 30 + 20 - 15.56 = 34.44.
t=0
Section 14, Statistics of Grouped Data

Since the grouped data in Section 11 displays the losses in each interval, one estimates the mean as the total losses divided by the total number of claims, 157,383,000 / 10,000 = 15,738. If the losses were not given one could estimate the mean by assuming that each claim in an interval was at the center of the interval.49 If one were given additional information, such as the sum of squares of the claim sizes, then one could directly compute the second moment and variance.

Exercise: Summary statistics of 100 losses are:
Interval         Number of Losses   Sum       Sum of Squares
(0, 2000]        39                 38,065    52,170,078
(2000, 4000]     22                 63,816    194,241,387
(4000, 8000]     17                 96,447    572,753,313
(8000, 15000]    12                 137,595   1,628,670,023
(15,000, ∞)      10                 331,831   17,906,839,238
Total            100                667,754   20,354,674,039
Estimate the mean, second moment and variance. [Solution: The observed mean is: 667,754/100 = 6677.54. The observed second moment is: (sum of squared loss sizes)/(number of losses) = 20,354,674,039/100 = 203,546,740.39. The observed variance is: 203,546,740.39 - 6677.542 = 158,957,200. Comment: In this case, when it comes to the first two moments, we have enough information to proceed in exactly the same manner as if we had ungrouped data.50] More generally, we can estimate moments by assuming the losses are uniformly distributed on each interval.
49
In the case of skewed distributions this will lead to an underestimate of the mean. Also, one would have to guess what value to assign to the claims in a final interval [c,∞). 50 See Course 4 Sample Examination, Q.8.
Exercise: Given the following grouped data, assuming the losses in each interval are uniformly distributed, calculate the mean, second moment, third moment and fourth moment.
0-10    6
10-20   11
20-25   3
[Solution: For each interval [a, b], the nth moment is (b^(n+1) − a^(n+1)) / {(b − a)(n + 1)}. (Those for the interval [20, 25] match those calculated in the previous exercise.) Then we weight together the moments for each interval by the number of claims observed in each interval.
Lower Endpoint   Upper Endpoint   Number of Claims   First Moment   Second Moment   Third Moment   Fourth Moment
0                10               6                  5.00           33.33           250.00         2,000.00
10               20               11                 15.00          233.33          3,750.00       62,000.00
20               25               3                  22.50          508.33          11,531.25      262,625.00
Total                             20                 13.12          214.58          3,867.19       74,093.75
For example, {(33.33)(6) + (233.33)(11) + (508.33)(3)} / 20 = 214.58. Thus the estimated mean, second moment, third moment and fourth moment are : 13.12, 214.58, 3867.19, 74093.75.] As long as the final interval in which there are claims has a finite upper endpoint, this technique can be applied to estimate the moments of any grouped data set. The estimates of second and higher moments may be poor when the intervals are wide and/or the distribution is highly skewed. These estimates of the moments can then be used to estimate the variance, CV, skewness and kurtosis. Exercise: Given the following grouped data, assuming the losses in each interval are uniformly distributed, calculate the variance, CV, skewness and kurtosis. 0 -10 6 10-20 11 20-25 3 [Solution: From the solution to the previous exercise, the estimated mean, second moment, third moment and fourth moment are : 13.12, 214.58, 3867.19, 74093.75. Therefore the variance is: 214.58 - 13.122 = 42.45. CV = 42.45.5 / 13.12 = .497. The skewness is: {3867.19 - (3)(214.58)(13.12) + 2(13.123 )} / 42.451.5 = -.226. The kurtosis is: {74,093.75- (4)(3867.19)(13.12) + (6)(214.58)(13.122 ) - 3(13.124 )} / 42.452 = 2.15.]
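The moment calculation in the solution above is mechanical enough to script. A sketch under the same uniform-within-interval assumption (the names are my own):

```python
# Raw moments of grouped data, assuming losses are uniformly distributed within each interval.
intervals = [(0, 10, 6), (10, 20, 11), (20, 25, 3)]   # (lower, upper, number of claims)

def grouped_moment(n):
    total_claims = sum(c for _, _, c in intervals)
    weighted = sum(c * (b ** (n + 1) - a ** (n + 1)) / ((b - a) * (n + 1))
                   for a, b, c in intervals)
    return weighted / total_claims

for n in (1, 2, 3, 4):
    print(n, round(grouped_moment(n), 2))   # 13.12, 214.58, 3867.19, 74093.75
```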
Estimating Statistics of Grouped Data When Given the Losses in Each Interval:

While the syllabus does not discuss how to estimate higher moments for grouped data, when given the losses in each interval, here is an example of how one can estimate the variance of grouped data. First one can compute the between interval variance by assuming that all claims in an interval are at the average. Clearly this underestimates the total variance, because it ignores the variance within intervals. For narrow intervals this will not produce a major error. For a uniform distribution over an interval from a to b, the variance is (b − a)²/12. Thus one can estimate the within interval variance by computing the weighted average squared width of the intervals, and dividing by 12. The total variance can be estimated by adding the between interval variance to the within interval variance. As an illustrative example, for the Grouped Data in Section 11 the variance could be estimated as follows (severities in $000):

A: Interval   B: # claims   C: Loss    D: Severity   E: Square of Severity   F: Col. B x Col. E   G: Square of Interval Width   H: Col. B x Col. G
0-5           2208          5,974      2.7           7.3                     16,163               25                            55,200
5-10          2247          16,725     7.4           55.4                    124,488              25                            56,175
10-15         1701          21,071     12.4          153.4                   261,015              25                            42,525
15-20         1220          21,127     17.3          299.9                   365,861              25                            30,500
20-25         799           17,880     22.4          500.8                   400,118              25                            19,975
25-50         1481          50,115     33.8          1145.1                  1,695,823            625                           925,625
50-75         254           15,303     60.2          3629.8                  921,976              625                           158,750
75-100        57            4,893      85.8          7368.9                  420,025              625                           35,625
over 100      33            4,295      130.2         16,939.4                559,001              10,000                        330,000
Total         10,000        157,383
Weighted averages per claim: Column F (squared severity) = 476; Column H (squared interval width) = 165.
First the between variance is estimated as the difference between the average squared severity minus the square of the average severity : Estimated Variance Between Intervals = 476 million - (15,7382) = 228 million. Next the within variance is estimated by calculating the average squared width of interval. For the over $100,000 interval we select an equivalent width of about $100,000 (This is based on the average severity for this interval being $130,000 only 30 thousand more than the lower bound of the interval; therefore, this is not a very heavier-tailed distribution. For heavier-tailed distributions, the rare large claims can contribute a significant portion of the overall mean and an even more significant portion of the variance.) Estimated Variance Within Intervals = 165 million / 12 = 14 million.
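The between/within decomposition just described can be reproduced with a short script. This is a sketch of my own, using the same equivalent width of 100 (in $000) for the open interval; amounts are in $000, so the variances are in ($000)².

```python
# Between-interval plus within-interval estimate of the variance, Section 11 data ($000).
counts  = [2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33]
losses  = [5974, 16725, 21071, 21127, 17880, 50115, 15303, 4893, 4295]
widths2 = [25, 25, 25, 25, 25, 625, 625, 625, 10000]   # squared interval widths

n = sum(counts)
severities = [l / c for l, c in zip(losses, counts)]    # average severity in each interval
mean = sum(losses) / n                                  # about 15.74

between = sum(c * s * s for c, s in zip(counts, severities)) / n - mean * mean
within  = (sum(c * w for c, w in zip(counts, widths2)) / n) / 12
print(between, within, between + within)   # roughly 229 + 14 = 242, matching the text up to rounding
```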
Then the estimated total variance is the sum of the between and within variances: Estimated Variance = 228 million + 14 million = 242 million. The estimated coefficient of variation is the estimated standard deviation divided by the estimated mean. Estimated Coefficient of Variation = 15.6 / 15.7 = 0.99. While the estimated variance and thus the estimated coefficient of variation is dependent to some extent on exactly how one corrects for the grouping, particularly how one deals with the last interval, for this grouped data the coefficient of variation is clearly close to one. (The adjustment for the within interval variance did not have much effect due to the width of the intervals and the relatively light tail of this loss distribution.) Similarly, using the average values for each interval, one can estimate the third moment and thus the skewness. Taking the weighted average of the cubes of the severities for each interval, using the number of claims in each interval as the weight, gives an estimate for the third moment of the grouped data in Section 11: m3 = 2.41 x 1013. However, by comparing for a uniform distribution the integral over an interval of x3 versus the cube of the average severity, one can derive a correction term. This correction term which should be added to the previous estimate of the third moment is: (square of the width of each interval) (mean severity for each interval) / 4, taking the weighted average over all intervals, using the number of claims as the weights. For the grouped data in Section 11, one gets a correction term of 0.22 x 1013, where the last interval has been assigned an equivalent width of 100,000. Adding in the correction term gives an estimate of: m3 = 2.63 x 1013. Using the formula Skewness = {m3 - (3 m1 m2 ) + (2 m1 3 )} / STDDEV3 , and the estimates m1 = 1.57 x 104 , m2 = variance + mean2 = 4.88 x 108 , standard deviation = 1.56 x 104 , gives an estimate for the skewness of: 1.10 / 0.38 = 2.9. Exercise: What is the estimate of the skewness, if in the correction to the third moment one assumes an equivalent width of 150,000 rather than 100,000? [Solution: Estimated skewness is 3.2 rather than 2.9.] The estimated skewness is even more affected than the estimated coefficient of variation by the loss of information inherent in the grouping of data. However, it is clear that the skewness for the grouped data in Section 11 is somewhere around three.51 51
One could use the Single Parameter Pareto Distribution in order to sharpen the estimate of the contribution of the last interval. This can be particularly useful when dealing with more highly skewed distributions.
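As a numerical companion to the grouping corrections described above, here is a minimal Python sketch (not from the text). The interval boundaries, claim counts, and interval mean severities are hypothetical stand-ins rather than the grouped data of Section 11, and the equivalent width assigned to the last interval is likewise an assumption.

```python
# Sketch: moments from grouped data, with corrections for grouping.
# Hypothetical grouped data: interval boundaries in dollars, number of
# claims in each interval, and the mean severity observed in each interval.
bounds = [(0, 5000), (5000, 10000), (10000, 25000), (25000, 125000)]
counts = [40, 30, 20, 10]
means  = [2700.0, 7300.0, 16000.0, 52000.0]

n = sum(counts)
m1 = sum(c * m for c, m in zip(counts, means)) / n          # estimated mean

# Between-interval piece: average squared severity minus square of the mean.
avg_sq = sum(c * m**2 for c, m in zip(counts, means)) / n
between = avg_sq - m1**2

# Within-interval piece: a uniform distribution on an interval of width w
# has variance w^2/12, so add the claim-weighted average of width^2/12.
within = sum(c * (b - a)**2 / 12 for (a, b), c in zip(bounds, counts)) / n

variance = between + within
cv = variance**0.5 / m1

# Third moment with the correction term (width^2)(interval mean)/4 described above.
m3 = sum(c * m**3 for c, m in zip(counts, means)) / n
m3 += sum(c * (b - a)**2 * m / 4 for (a, b), c, m in zip(bounds, counts, means)) / n

m2 = variance + m1**2
skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / variance**1.5
print(m1, variance, cv, skew)
```

The within-interval term and the third-moment correction are exactly the uniform-within-interval adjustments derived above; only the input numbers are invented for the sketch.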
Problems:
Use the following information for the next 6 questions:
• You are given the following grouped data:
Interval        Number of Losses
0 - 1                 16
1 - 5                 54
5 - 25                23
25 - 100               7
• There are no reported losses of size greater than 100.
• Assume the losses in each interval are uniformly distributed.
14.1 (1 point) Estimate the mean.
A. less than 10
B. at least 10 but less than 11
C. at least 11 but less than 12
D. at least 12 but less than 13
E. at least 13
14.2 (3 points) Estimate the variance.
A. 260   B. 270   C. 280   D. 290   E. 300
14.3 (3 points) Estimate the skewness.
A. less than 3.5
B. at least 3.5 but less than 4.0
C. at least 4.0 but less than 4.5
D. at least 4.5 but less than 5.0
E. at least 5.0
14.4 (3 points) Estimate the kurtosis.
A. less than 8
B. at least 8 but less than 10
C. at least 10 but less than 12
D. at least 12 but less than 14
E. at least 14
14.5 (2 points) Estimate the limited expected value at 50, E[X ∧ 50]. A. less than 7.0 B. at least 7.0 but less than 7.5 C. at least 7.5 but less than 8.0 D. at least 8.0 but less than 8.5 E. at least 8.5 14.6 (3 points) Estimate the limited second moment at 50, E[(X ∧ 50)2 ]. (A) 195 (B) 200 (C) 205 (D) 210 (E) 215
14.7 (3 points) You are given the following grouped data:
Claim Size        Number of Claims
0 to 10                  82
10 to 25                133
25 to 50                 65
50 to 100                20
There are no reported losses of size greater than 100.
Assume a uniform distribution of claim sizes within each interval.
Estimate the second raw moment of the claim size distribution.
A. less than 500
B. at least 500 but less than 600
C. at least 600 but less than 700
D. at least 700 but less than 800
E. at least 800
14.8 (3, 11/00, Q.31 & 2009 Sample Q.117) (2.5 points) For an industrywide study of patients admitted to hospitals for treatment of cardiovascular illness in 1998, you are given:
(i)
Duration In Days    Number of Patients Remaining Hospitalized
0                        4,386,000
5                        1,461,554
10                         486,739
15                         161,801
20                          53,488
25                          17,384
30                           5,349
35                           1,337
40                               0
(ii) Discharges from the hospital are uniformly distributed between the durations shown in the table.
Calculate the mean residual time remaining hospitalized, in days, for a patient who has been hospitalized for 21 days.
(A) 4.4   (B) 4.9   (C) 5.3   (D) 5.8   (E) 6.3
14.9 (2 points) In the previous question, 3, 11/00, Q.31, what is the Excess Ratio at 21 days?
(Excess Ratio = 1 - Loss Elimination Ratio.)
(A) 0.5%   (B) 0.7%   (C) 0.9%   (D) 1.1%   (E) 1.3%
14.10 (4, 11/01, Q.2 & 2009 Sample Q. 58) (2.5 points) You are given:
Claim Size        Number of Claims
0-25                     30
25-50                    32
50-100                   20
100-200                   8
Assume a uniform distribution of claim sizes within each interval.
Estimate the second raw moment of the claim size distribution.
(A) Less than 3300
(B) At least 3300, but less than 3500
(C) At least 3500, but less than 3700
(D) At least 3700, but less than 3900
(E) At least 3900
14.11 (4, 5/07, Q.7) (2.5 points) You are given:
(i)
Claim Size        Number of Claims
(0, 50]                  30
(50, 100]                36
(100, 200]               18
(200, 400]               16
(ii) Claim sizes within each interval are uniformly distributed.
(iii) The second moment of the uniform distribution on (a, b] is (b³ - a³) / {3(b - a)}.
Estimate E[(X ∧ 350)²], the second moment of the claim size distribution subject to a limit of 350.
(A) 18,362   (B) 18,950   (C) 20,237   (D) 20,662   (E) 20,750
14.12 (2 points) In the previous question, estimate Var[X ∧ 350].
Solutions to Problems:
14.1. A. For each interval [a, b], the nth moment is: (bⁿ⁺¹ - aⁿ⁺¹) / {(b - a)(n + 1)}. Then we weight together the moments for each interval by the number of claims observed in each interval.
Lower      Upper      Number      First      Second      Third          Fourth
Endpoint   Endpoint   of Claims   Moment     Moment      Moment         Moment
0          1          16          0.50       0.33        0.25           0.20
1          5          54          3.00       10.33       39.00          156.20
5          25         23          15.00      258.33      4,875.00       97,625.00
25         100        7           62.50      4,375.00    332,031.25     26,640,625.00
Total                 100         9.525      371.30      24,384.54      1,887,381.88
For example, {(0.33)(16) + (10.33)(54) + (258.33)(23) + (4375)(7)} / 100 = 371.3.
Thus the estimated mean, second moment, third moment and fourth moment are: 9.525, 371.3, 24,384.54, and 1,887,381.88.
14.2. C. The estimated variance = 371.3 - 9.525² = 280.6.
14.3. A. The estimated skewness = {µ3′ - 3 µ1′ µ2′ + 2 µ1′³} / Variance^1.5 = {24,384.54 - (3)(9.525)(371.3) + (2)(9.525³)} / 280.6^1.5 = 3.30.
14.4. E. The estimated kurtosis = {µ4′ - 4 µ1′ µ3′ + 6 µ1′² µ2′ - 3 µ1′⁴} / Variance² = {1,887,382 - (4)(9.525)(24,384.54) + (6)(9.525²)(371.3) - (3)(9.525⁴)} / 280.6² = 14.4.
14.5. D. & 14.6. E. There are 7 losses in the interval 25 to 100, so we assume 7/3 losses in the interval 25 to 50 and 14/3 losses in the interval 50 to 100. The losses of size 50 or more contribute 50 to E[X ∧ 50]. The losses of size 50 or more contribute 50² to E[(X ∧ 50)²].
Lower      Upper      # Claims    1st Limited      2nd Limited
Endpoint   Endpoint               Moment at 50     Moment at 50
0          1          16          0.50             0.33
1          5          54          3.00             10.33
5          25         23          15.00            258.33
25         50         2.333       37.50            1,458.33
50         100        4.667       50               2,500
Total                 100         8.358            215.74
E[X ∧ 50] = {(0.5)(16) + (3)(54) + (15)(23) + (37.5)(7/3) + (50)(14/3)} / 100 = 8.358.
E[(X ∧ 50)²] = {(0.33)(16) + (10.33)(54) + (258.33)(23) + (1,458.33)(7/3) + (2500)(14/3)} / 100 = 215.74.
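For readers who like to verify the arithmetic, a short Python sketch reproduces solutions 14.1-14.6 from the grouped data under the uniform-within-interval assumption. It is only a numerical check, not part of the exam-style solution.

```python
# Check of solutions 14.1 - 14.6: moments of grouped data, assuming a
# uniform distribution of losses within each interval.
intervals = [(0, 1, 16), (1, 5, 54), (5, 25, 23), (25, 100, 7)]
n = sum(c for _, _, c in intervals)

def uniform_moment(a, b, k):
    # kth raw moment of a uniform distribution on [a, b]
    return (b**(k + 1) - a**(k + 1)) / ((b - a) * (k + 1))

raw = [sum(c * uniform_moment(a, b, k) for a, b, c in intervals) / n for k in range(1, 5)]
mean, second, third, fourth = raw
var = second - mean**2
skew = (third - 3 * mean * second + 2 * mean**3) / var**1.5
kurt = (fourth - 4 * mean * third + 6 * mean**2 * second - 3 * mean**4) / var**2
print(mean, var, skew, kurt)          # about 9.525, 280.6, 3.30, 14.4

def uniform_limited_moment(a, b, k, u):
    # kth moment of Min(X, u) for X uniform on [a, b]
    if u >= b:
        return uniform_moment(a, b, k)
    if u <= a:
        return u**k
    frac = (u - a) / (b - a)          # portion of the interval below u
    return frac * uniform_moment(a, u, k) + (1 - frac) * u**k

e1 = sum(c * uniform_limited_moment(a, b, 1, 50) for a, b, c in intervals) / n
e2 = sum(c * uniform_limited_moment(a, b, 2, 50) for a, b, c in intervals) / n
print(e1, e2)                          # about 8.358 and 215.7
```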
14.7. E. For each interval [a, b], we assume the losses are uniformly distributed, and therefore the nth moment is: (bⁿ⁺¹ - aⁿ⁺¹) / {(b - a)(n + 1)}. The second moment is: (b³ - a³) / {3(b - a)}.
For example, for the second interval: (25³ - 10³) / {(3)(25 - 10)} = 325.
Then we weight together the moments for each interval by the number of claims observed in each interval: {(82)(33.3) + (133)(325) + (65)(1458.3) + (20)(5833.3)} / 300 = 858.
Lower      Upper      Number      First      Second
Endpoint   Endpoint   of Claims   Moment     Moment
0          10         82          5          33.3
10         25         133         18         325.0
25         50         65          38         1,458.3
50         100        20          75         5,833.3
Total                 300         22.25      858.1
Comment: Estimated variance = 858.1 - 22.25² = 363.
14.8. A. Since discharges are uniform from 20 to 25, there are assumed to be: (4/5)(53,488 - 17,384) = 28,883.2 discharges from 21 to 25.
For a discharge at time t > 21, the contribution to the excess of 21 is: t - 21.
For the interval [25, 30], t is assumed uniform on [25, 30], with mean (25 + 30)/2 = 27.5. Thus the average contribution to the excess of 21 from the interval [25, 30] is: E[t - 21] = E[t] - 21 = 27.5 - 21 = 6.5.
If discharges are uniformly distributed on [a, b], with a > 21, then the average contribution to the time excess of 21 from those patients discharged between a and b is: (b + a)/2 - 21.
For each interval [a, b], the contribution to the time excess of 21 is: (# who are discharged between a and b)(average contribution to the excess).
Bottom of    Top of      Average        Number         Contribution to Time
Interval     Interval    Contribution   Discharged     Excess of 21
21           25          2              28,883.2       57,766.4
25           30          6.5            12,035.0       78,227.5
30           35          11.5           4,012.0        46,138.0
35           40          16.5           1,337.0        22,060.5
Sum                                     46,267.2       204,192.4
e(21) = (total time excess of 21)/(# patients staying more than 21 days) = 204192.4 days / 46267.2 = 4.4 days. Comment: Can also be answered using Actuarial Mathematics. The number discharged between 30 and 35, is: (the number remaining at time 30) - (number remaining at time 35) = 5349 - 1337 = 4012.
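A brief Python sketch, an optional check using the durations and patient counts from the question and the uniform-discharge assumption, reproduces the 4.4 days:

```python
# Check of solution 14.8: mean residual time at 21 days.
durations = [0, 5, 10, 15, 20, 25, 30, 35, 40]
remaining = [4386000, 1461554, 486739, 161801, 53488, 17384, 5349, 1337, 0]
discharges = [remaining[i] - remaining[i + 1] for i in range(len(remaining) - 1)]

t = 21.0
excess = 0.0     # total days in excess of 21
n_excess = 0.0   # number of patients discharged after day 21
for (a, b), d in zip(zip(durations, durations[1:]), discharges):
    if b <= t:
        continue
    if a < t:                      # only the portion of the band beyond day 21 counts
        d = d * (b - t) / (b - a)  # uniform discharges within the band
        a = t
    excess += d * ((a + b) / 2 - t)
    n_excess += d

print(excess, n_excess, excess / n_excess)   # about 204,192 / 46,267 = 4.4 days
```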
14.9. C. From the previous solution, the total time excess of 21 days is 204,192 days. Assuming uniformity on each interval, one can calculate the total number of days as 21,903,260:
Bottom of    Top of      Average    Number        Contribution to
Interval     Interval               Discharged    Total Time
0            5           2.5        2,924,446     7,311,115
5            10          7.5        974,815       7,311,112
10           15          12.5       324,938       4,061,725
15           20          17.5       108,313       1,895,478
20           25          22.5       36,104        812,340
25           30          27.5       12,035        330,962
30           35          32.5       4,012         130,390
35           40          37.5       1,337         50,138
Sum                                 4,386,000     21,903,260
R(21) = (time excess of 21)/(total time) = 204,192/21,903,260 = 0.93%.
14.10. E. The second moment for a uniform distribution on [a, b] is: the integral from a to b of x²/(b - a) dx = (b³ - a³) / {3(b - a)}.
Weight together the 2nd moments for each interval:
Lower      Upper      Number      First      Second
Endpoint   Endpoint   of Claims   Moment     Moment
0          25         30          12.50      208.33
25         50         32          37.50      1,458.33
50         100        20          75.00      5,833.33
100        200        8           150.00     23,333.33
Total                 90          47.500     3,958.33
{(30)(208.33) + (32)(1458.33) + (20)(5833.33) + (8)(23,333.33)} / 90 = 3958.33.
Comment: The estimated variance is: 3958.33 - 47.5² = 1702.08.
14.11. E. Since we assume a uniform distribution from 200 to 400, we assume 12 of the 16 claims in this interval are from 200 to 350, while the remaining 4 claims are from 350 to 400.
Interval       Second Moment of Uniform Distribution               Number of Claims
0 to 50        (50³ - 0³) / {(3)(50 - 0)} = 833.33                        30
50 to 100      (100³ - 50³) / {(3)(100 - 50)} = 5833.33                   36
100 to 200     (200³ - 100³) / {(3)(200 - 100)} = 23,333.33               18
200 to 350     (350³ - 200³) / {(3)(350 - 200)} = 77,500                  12
The 4 claims of size greater than 350 each contribute 350² to the numerator of E[(X ∧ 350)²].
E[(X ∧ 350)²] = {(833.33)(30) + (5833.33)(36) + (23,333.33)(18) + (77,500)(12) + (350²)(4)} / (30 + 36 + 18 + 12 + 4) = 2,075,000/100 = 20,750.
14.12. Again, we assume 12 of the 16 claims in the final interval are from 200 to 350, while the remaining 4 claims are from 350 to 400.
E[X ∧ 350] = {(25)(30) + (75)(36) + (150)(18) + (275)(12) + (350)(4)} / (30 + 36 + 18 + 12 + 4) = 10,850/100 = 108.5.
Var[X ∧ 350] = E[(X ∧ 350)²] - E[X ∧ 350]² = 20,750 - 108.5² = 8977.75.
Section 15, Policy Provisions
Insurance policies may have various provisions which determine the amount paid, such as deductibles, maximum covered losses, and coinsurance clauses.
(Ordinary) Deductible:
An ordinary deductible is a provision which states that when the loss is less than or equal to the deductible there is no payment, and when the loss exceeds the deductible the amount paid is the loss less the deductible.52 Unless specifically stated otherwise, assume a deductible is ordinary.
Unless stated otherwise assume the deductible operates per loss. In actual applications, deductibles can apply per claim, per person, per accident, per occurrence, per event, per location, per annual aggregate, etc.53
Exercise: An insured suffers losses of size: $3000, $8000 and $17,000. If the insured has a $5000 (ordinary) deductible, what does the insurer pay for each loss?
[Solution: Nothing, $8000 - $5000 = $3000, and $17,000 - $5000 = $12,000.]
[Graph: the payment as a function of the loss, under an ordinary deductible of 5000; the payment is 0 up to a loss of 5000, then increases linearly, reaching 20,000 at a loss of 25,000.]
52 See Definition 8.1 in Loss Models.
53 An annual aggregate deductible is discussed in the section on Stop Loss Premiums in “Mahlerʼs Guide to Aggregate Losses”.
Maximum Covered Loss:54
Maximum Covered Loss ⇔ u ⇔ size of loss above which no additional payments are made ⇔ censorship point from above. Exercise: An insured suffers losses of size: $2,000, $13,000, $38,000. If the insured has a $25000 maximum covered loss, what does the insurer pay for each loss? [Solution: $2,000, $13,000, $25,000.] Most insurance policies have a maximum covered loss or equivalent. For example, a liability policy with a $100,000 per occurrence limit would pay at most $100,000 in losses from any single occurrence, regardless of the total losses suffered by any claimants. An automobile collision policy will never pay more than the total value of the covered automobile minus any deductible, thus it has an implicit maximum covered loss. An exception is a Workersʼ Compensation policy, which provides unlimited medical coverage to injured workers.55 Coinsurance: A coinsurance factor is the proportion of any loss that is paid by the insurer after any other modifications (such as deductibles or limits) have been applied. A coinsurance is a provision which states that a coinsurance factor is to be applied. For example, a policy might have a 80% coinsurance factor. Then the insurer pays 80% of what it would have paid in the absence of the coinsurance factor.
54 See Section 8.5 of Loss Models. Professor Klugman made up the term “maximum covered loss.”
55 While benefits for lost wages are frequently also unlimited, since they are based on a formula in the specific workersʼ compensation law, which includes a maximum weekly benefit, there is an implicit maximum benefit for lost wages, assuming a maximum possible lifetime.
Policy Limit:56
Policy Limit ⇔ maximum possible payment on a single claim.
Policy Limit = c (u - d), where c = coinsurance factor, u = maximum covered loss, and d = deductible.
If c = 90%, d = 1000, and u = 5000, then the policy limit = (90%)(5000 - 1000) = 3600; if a loss is of size 5000 or greater, the insurer pays 3600.
With a coinsurance factor, deductible, and Policy Limit L: u = d + L/c. In the above example, 1000 + 3600/0.9 = 5000.
With no deductible and no coinsurance, the policy limit is the same as the maximum covered loss.
Exercise: An insured has a policy with a $25,000 maximum covered loss, $5000 deductible, and an 80% coinsurance factor. The insured suffers losses of: $5000, $15,000, $38,000. How much does the insurer pay?
[Solution: Nothing for the loss of $5000. (0.8)(15,000 - 5000) = $8000 for the loss of $15,000. For the loss of $38,000, first the insurer limits the loss to $25,000. Then it reduces the loss by the $5,000 deductible: $25,000 - $5,000 = $20,000. Then the 80% coinsurance factor is applied: (80%)($20,000) = $16,000.
Comment: The maximum possible amount paid for any loss, $16,000, is the policy limit.]
If an insured with a policy with a $25,000 maximum covered loss, $5000 deductible, and a coinsurance factor of 80% suffers a loss of size x, then the insurer pays:
0, if x ≤ $5000
0.8(x - 5000), if $5000 < x ≤ $25,000
$16,000, if x ≥ $25,000
More generally, if an insured who has a policy with a maximum covered loss of u, a deductible of d, and a coinsurance factor of c suffers a loss of size x, then the insurer pays:
0, if x ≤ d
c(x - d), if d < x ≤ u
c(u - d), if x ≥ u
56 See Section 8.5 of Loss Models. This definition of a policy limit differs from that used by many actuaries.
If an insured has a policy with a policy limit of L, a deductible of d, and a coinsurance factor of c, and suffers a loss of size x, then the insurer pays:
0, if x ≤ d
c(x - d), if d < x ≤ d + L/c
L, if x ≥ d + L/c
Exercise: There is a deductible of $10,000, policy limit of $100,000, and a coinsurance factor of 90%. Let Xi be the individual loss amount of the ith claim and Yi be the claim payment of the ith claim. What is the relationship between Xi and Yi?
[Solution: The maximum covered loss, u = 10,000 + 100,000/0.9 = $121,111.
Yi = 0, for Xi ≤ 10,000
Yi = 0.90(Xi - 10,000), for 10,000 < Xi ≤ 121,111
Yi = 100,000, for Xi > 121,111.]
Order of Operations:
If one has a deductible, maximum covered loss, and a coinsurance, then on this exam, in order to determine the amount paid on a loss, the order of operations is:
1. Limit the size of loss to the maximum covered loss.
2. Subtract the deductible. If the result is negative, set the payment equal to zero.
3. Multiply by the coinsurance factor.
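A minimal sketch of this order of operations in Python (illustrative only; the function name and defaults are my own):

```python
def payment(x, d=0.0, u=float("inf"), c=1.0):
    """Insurer's payment on a loss x: limit the loss to the maximum covered
    loss u, subtract the deductible d (not below zero), then apply the
    coinsurance factor c."""
    return c * max(min(x, u) - d, 0.0)

# The exercise above: u = 25,000, d = 5,000, c = 80%.
for x in (5000, 15000, 38000):
    print(x, payment(x, d=5000, u=25000, c=0.8))   # 0, 8000, 16000

# With a policy limit L instead of a maximum covered loss, first set u = d + L/c.
```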
Franchise Deductible:57 Besides an ordinary deductible, there is the “franchise deductible.” Unless specifically stated otherwise, assume a deductible is ordinary. Under a franchise deductible the insurer pays nothing if the loss is less than the deductible amount, but ignores the deductible if the loss is greater than the deductible amount. Exercise: An insured suffers losses of size: $3000, $8000 and $17,000. If the insured has a $5000 franchise deductible, what does the insurer pay for each loss? [Solution: Nothing, $8000, and $17,000.]
57 In Definition 8.2 in Loss Models.
Under a franchise deductible with deductible amount d, if the insured has a loss of size x, then the insurer pays:
0, if x ≤ d
x, if x > d
Thus data from a policy with a franchise deductible is truncated from below at the deductible amount.58
Therefore under a franchise deductible, the average nonzero payment is: e(d) + d = {E[X] - E[X ∧ d]}/S(d) + d.59
The average cost per loss is: (average nonzero payment)(chance of nonzero payment) = {(E[X] - E[X ∧ d])/S(d) + d} S(d) = (E[X] - E[X ∧ d]) + d S(d).60
[Graph: the payment as a function of the loss, under a franchise deductible of 5000; the payment is 0 up to a loss of 5000 and then equals the full loss, reaching 25,000 at a loss of 25,000.]
58 See the next section for a discussion of truncation from below (left truncation).
59 See Theorem 8.3 in Loss Models.
60 See Theorem 8.3 in Loss Models.
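As an optional check of these two formulas, here is a small simulation sketch; the exponential severity with mean 10,000 is an assumed, illustrative distribution, not one used in the text.

```python
import math, random

# Check of the franchise-deductible formulas: average nonzero payment = e(d) + d,
# and average cost per loss = E[X] - E[X ^ d] + d S(d),
# for an assumed exponential severity with mean 10,000.
random.seed(1)
theta, d = 10000.0, 5000.0
losses = [random.expovariate(1 / theta) for _ in range(200000)]
payments = [x if x > d else 0.0 for x in losses]      # franchise deductible payments
nonzero = [p for p in payments if p > 0]

S_d = math.exp(-d / theta)
e_d = theta                                           # exponential is memoryless: e(d) = theta
E_X = theta
E_X_d = theta * (1 - math.exp(-d / theta))            # E[X ^ d] for the exponential

print(sum(nonzero) / len(nonzero), e_d + d)                   # average nonzero payment
print(sum(payments) / len(payments), E_X - E_X_d + d * S_d)   # average cost per loss
```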
Definitions of Loss and Payment Random Variables:61
ground-up loss:62 Losses prior to the impact of any deductible or maximum covered loss; the full economic value of the loss suffered by the insured regardless of how much the insurer is required to pay in light of any deductible, maximum covered loss, coinsurance, etc.
amount paid per payment: Undefined when there is no payment due to a deductible or other policy provision. Otherwise it is the amount paid by the insurer. Thus for example, data truncated and shifted from below consists of the amounts paid per payment.
amount paid per loss: Defined as zero when the insured suffers a loss but there is no payment due to a deductible or other policy provision. Otherwise it is the amount paid by the insurer.
The per loss variable is: 0 if X ≤ d, X if X > d. The per payment variable is: undefined if X ≤ d, X if X > d.
Loss Models uses the notation YL for the per loss variable and YP for the per payment variable.
Unless stated otherwise, assume a distribution from Appendix A of Loss Models will be used to model ground-up losses, prior to the effects of any coverage modifications. The effects on distributions of coverage modifications will be discussed in subsequent sections.
61 See Section 8.2 of Loss Models.
62 Sometimes referred to as ground-up unlimited losses.
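As a small illustration (my own, not from Loss Models), the two variables can be tabulated for an ordinary deductible of d = 1000, where the amount paid on a loss x, when there is a payment, is x - d. It also shows the familiar link: mean per loss = (mean per payment) x (probability a loss produces a payment). The losses below are hypothetical.

```python
# Illustration of the per-loss and per-payment variables under an ordinary deductible.
d = 1000
losses = [300, 600, 1200, 1500, 2800]                 # hypothetical ground-up losses

per_loss = [max(x - d, 0) for x in losses]            # zero when no payment is made
per_payment = [x - d for x in losses if x > d]        # defined only when a payment is made

mean_per_loss = sum(per_loss) / len(per_loss)
mean_per_payment = sum(per_payment) / len(per_payment)
prob_payment = len(per_payment) / len(losses)

print(mean_per_loss, mean_per_payment * prob_payment)  # both 500
```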
Problems: Use the following information for the next 3 questions: The ABC Bookstore has an insurance policy with a $100,000 maximum covered loss, $20,000 per loss deductible, and a 90% coinsurance factor. During the year, ABC Bookstore suffers three losses of sizes: $17,000, $60,000 and $234,000. 15.1 (1 point) How much does the insurer pay in total? A. less than $95,000 B. at least $95,000 but less than $100,000 C. at least $100,000 but less than $105,000 D. at least $105,000 but less than $110,000 E. at least $110,000 15.2 (1 point) What is the amount paid per loss? A. less than $35,000 B. at least $35,000 but less than $40,000 C. at least $40,000 but less than $45,000 D. at least $45,000 but less than $50,000 E. at least $50,000 15.3 (1 point) What is the amount paid per payment? A. less than $35,000 B. at least $35,000 but less than $40,000 C. at least $40,000 but less than $45,000 D. at least $45,000 but less than $50,000 E. at least $50,000
15.4 (2 points) The size of loss is uniform on [0, 400]. Policy A has an ordinary deductible of 100. Policy B has a franchise deductible of 100. What is the ratio of the expected losses paid under Policy B to the expected losses paid under Policy A? A. 7/6 B. 5/4 C. 4/3 D. 3/2 E. 5/3
15.5 (1 point) An insured suffers 4 losses of size: $2500, $7700, $10,100, and $23,200. The insured has a $10,000 franchise deductible. How much does the insurer pay in total? A. less than 32,500 B. at least 32,500 but less than 33,000 C. at least 33,000 but less than 33,500 D. at least 33,500 but less than 34,000 E. at least 34,000 Use the following information for the next 3 questions: An insurance policy has a deductible of 10,000, policy limit of 100,000, and a coinsurance factor of 80%. (The policy limit is the maximum possible payment by the insurer on a single loss.) During the year, the insured suffers six losses of sizes: 3000, 8000, 14,000, 80,000, 120,000, and 200,000. 15.6 (2 points) How much does the insurer pay in total? A. less than 235,000 B. at least 235,000 but less than 240,000 C. at least 240,000 but less than 245,000 D. at least 245,000 but less than 250,000 E. at least 250,000 15.7 (1 point) What is the amount paid per loss? A. less than 45,000 B. at least 45,000 but less than 50,000 C. at least 50,000 but less than 55,000 D. at least 55,000 but less than 60,000 E. at least 60,000 15.8 (1 point) What is the amount paid per payment? A. less than 45,000 B. at least 45,000 but less than 50,000 C. at least 50,000 but less than 55,000 D. at least 55,000 but less than 60,000 E. at least 60,000
Use the following size of loss distribution for the next 2 questions: Size of Loss Probability 100 70% 1000 20% 10,000 10% 15.9 (2 points) If there is an ordinary deductible of 500, what is the coefficient of variation of the nonzero payments? A. less than 1.0 B. at least 1.0 but less than 1.1 C. at least 1.1 but less than 1.2 D. at least 1.2 but less than 1.3 E. at least 1.3 15.10 (2 points) If there is a franchise deductible of 500, what is the coefficient of variation of the nonzero payments? A. less than 1.0 B. at least 1.0 but less than 1.1 C. at least 1.1 but less than 1.2 D. at least 1.2 but less than 1.3 E. at least 1.3 Use the following information for the next four questions: The Mockingbird Tequila Company buys insurance from the Atticus Insurance Company, with a deductible of $5000, maximum covered loss of $250,000, and coinsurance factor of 90%. Atticus Insurance Company buys reinsurance from the Finch Reinsurance Company. Finch will pay Atticus for the portion of any payment in excess of $100,000. Let X be an individual loss amount suffered by the Mockingbird Tequila Company. 15.11 (2 points) Let Y be the amount retained by the Mockingbird Tequila Company. What is the relationship between X and Y? 15.12 (2 points) Let Y be the amount paid by the Atticus Insurance Company to the Mockingbird Tequila Company, prior to the impact of reinsurance. What is the relationship between X and Y? 15.13 (2 points) Let Y be the payment made by the Finch Reinsurance Company to the Atticus Insurance Company. What is the relationship between X and Y? 15.14 (2 points) Let Y be the net amount paid by the Atticus Insurance Company after the impact of reinsurance. What is the relationship between X and Y?
15.15 (2 points) Assume a loss of size x. Policy A calculates the payment based on limiting to a maximum of 10,000, then subtracting a deductible of 1000, and then applying a coinsurance factor of 90%. Policy B instead calculates the payment based on subtracting a deductible of 1000, then limiting it to a maximum of 10,000, and then applying a coinsurance factor of 90%. What is the difference in payments between that under Policy A and Policy B? 15.16 (CAS6, 5/94, Q.21) (1 point) Last year, an insured in a group medical plan incurred charges of $600. This year, the same medical care resulted in a charge of $660. The group comprehensive medical care plan provides 80% payment after a $100 deductible. Determine the increase in the insured's retention under his or her comprehensive medical care plan. A. Less than 7.0% B. At least 7.0% but less than 9.0% C. At least 9.0% but less than 11.0% D. At least 11.0% but less than 13.0% E. 13.0% or more 15.17 (CAS6, 5/96, Q.41) (2 points) You are given the following full coverage experience: Loss Size Number of Claims Amount of Loss $ 0-99 1,400 $76,000 $100-249 400 $80,000 $250-499 200 $84,000 $500-999 100 $85,000 $1,000 or more 50 $125,000 Total 2,150 $450,000 (a) (1 point) Calculate the expected percentage reduction in losses for a $250 ordinary deductible. (b) (1 point) Calculate the expected percentage reduction in losses for a $250 franchise deductible.
Use the following information for the next two questions:
Full Coverage Loss Data
Loss Size          Number of Claims    Amount of Loss
0 - 250                 1,500              375,000
250 - 500               1,000              450,000
500 - 750                 750              487,500
750 - 1,000               500              400,000
1,000 - 1,500             250              312,500
1,500 or more             100              300,000
Total                   4,100            2,325,000
15.18 (CAS6, 5/97, Q.32a) (1 point) Calculate the percentage reduction in the loss costs for a $500 franchise deductible compared to full coverage.
15.19 (1 point) Calculate the percentage reduction in the loss costs for a $500 ordinary deductible compared to full coverage.
15.20 (CAS9 11/97, Q.36a) (2 points) An insured is trying to decide which type of policy to purchase:
• A policy with a franchise deductible of $50 will cost her $8 more than a policy with a straight deductible of $50.
• A policy with a franchise deductible of $100 will cost her $10 more than a policy with a straight deductible of $100.
An expected ground-up claim frequency of 1.000 is assumed for each of the policies described above.
Calculate the probability that the insured will suffer a loss between $50 and $100. Show all work.
Use the following information for the next two questions: Loss Size Number of Claims Total Amount of Loss $0-249 5,000 $1,125,000 250-499 2,250 765,000 500-999 950 640,000 1,000-2,499 575 610,000 2500 or more 200 890,000 Total 8,975 $4,030,000 15.21 (CAS6, 5/98, Q.7) (1 point) Calculate the percentage reduction in loss costs caused by the introduction of a $500 franchise deductible. Assume there is no adverse selection or padding of claims to reach the deductible. A. Less than 25.0% B. At least 25.0%, but less than 40.0% C. At least 40.0%, but less than 55.0% D. At least 55.0%, but less than 70.0% E. 70.0% or more 15.22 (1 point) Calculate the percentage reduction in loss costs caused by the introduction of a $500 ordinary deductible. Assume there is no adverse selection or padding of claims to reach the deductible. 15.23 (1, 5/03, Q.25) (2.5 points) An insurance policy pays for a random loss X subject to a deductible of C, where 0 < C < 1. The loss amount is modeled as a continuous random variable with density function f(x) = 2x for 0 < x < 1. Given a random loss X, the probability that the insurance payment is less than 0.5 is equal to 0.64. Calculate C. (A) 0.1 (B) 0.3 (C) 0.4 (D) 0.6 (E) 0.8 15.24 (CAS5, 5/03, Q.9) (1 point) An insured has a catastrophic health insurance policy with a $1,500 deductible and a 75% coinsurance clause. The policy has a $3,000 maximum retention. If the insured incurs a $10,000 loss, what amount of the loss must the insurer pay? Note: I have rewritten this past exam question in order to match the current syllabus.
15.25 (CAS3, 5/04, Q.35) (2.5 points) The XYZ Insurance Company sells property insurance policies with a deductible of $5,000, policy limit of $500,000, and a coinsurance factor of 80%. Let Xi be the individual loss amount of the ith claim and Yi be the claim payment of the ith claim.
Which of the following represents the relationship between Xi and Yi?
A. Yi = 0 for Xi ≤ 5,000;  Yi = 0.80(Xi - 5,000) for 5,000 < Xi ≤ 625,000;  Yi = 500,000 for Xi > 625,000
B. Yi = 0 for Xi ≤ 4,000;  Yi = 0.80(Xi - 4,000) for 4,000 < Xi ≤ 500,000;  Yi = 500,000 for Xi > 500,000
C. Yi = 0 for Xi ≤ 5,000;  Yi = 0.80(Xi - 5,000) for 5,000 < Xi ≤ 630,000;  Yi = 500,000 for Xi > 630,000
D. Yi = 0 for Xi ≤ 6,250;  Yi = 0.80(Xi - 6,250) for 6,250 < Xi ≤ 631,250;  Yi = 500,000 for Xi > 631,250
E. Yi = 0 for Xi ≤ 5,000;  Yi = 0.80(Xi - 5,000) for 5,000 < Xi ≤ 505,000;  Yi = 500,000 for Xi > 505,000
15.26 (SOA M, 5/05, Q.32 & 2009 Sample Q.168) (2.5 points) For an insurance:
(i) Losses can be 100, 200, or 300 with respective probabilities 0.2, 0.2, and 0.6.
(ii) The insurance has an ordinary deductible of 150 per loss.
(iii) YP is the claim payment per payment random variable.
Calculate Var(YP).
(A) 1500   (B) 1875   (C) 2250   (D) 2625   (E) 3000
Solutions to Problems: 15.1. D. First the insurer limits each loss to $100,000: 17, 60, 100. Then it reduces each loss by the $20,000 deductible: 0, 40, 80. Then the 90% coinsurance factor is applied: 0, 36, 72. The insurer pays a total of 0 + 36 + 72 = $108 thousand. 15.2. B. There are three losses and 108,000 in total is paid: $108,000/3 = $36,000. 15.3. E. There are two (non-zero) payments and 108,000 in total is paid: $108,000/2 = $54,000. 15.4. E. Under Policy A one pays x - 100 for x > 100. 3/4 of the losses are greater than 100, and those losses have average size (100 + 400)/2 = 250. Thus under Policy A the expected payment per loss is: (3/4)(250 -100) = 112.5 Under Policy B, one pays x for x > 100. Thus the expected payment per loss is: (3/4)(250) = 187.5. Ratio is: 187.5/112.5 = 5/3. 15.5. C. The insurer pays: 0 + 0 + $10,100 + $23,200 = $33,300. 15.6. D. Subtract the deductible: 0, 0, 4000, 70,000, 110,000, 190,000. Multiply by the coinsurance factor: 0, 0, 3200, 56,000, 88,000, 152,000. Limit each payment to 100,000: 0, 0, 3200, 56,000, 88,000, 100,000. 0 + 0 + 3200 + 56,000 + 88,000 + 100,000 = 247,200. Alternately, the maximum covered loss is: 10000 + 100000/.8 = 135,000. Limit each loss to the maximum covered loss: 3000, 8000, 14,000, 80,000, 120,000, 135,000. Subtract the deductible: 0, 0, 4000, 70,000, 110,000, and 125,000. Multiply by the coinsurance factor: 0, 0, 3200, 56,000, 88,000, and 100,000. 0 + 0 + 3200 + 56,000 + 88,000 + 100,000 = 247,200. 15.7. A. 247,200/6 = 41,200. 15.8. E. 247,200/4 = 61,800. 15.9. D. The nonzero payments are: 500@2/3 and 9500@1/3. Mean = (2/3)(500) + (1/3)(9500) = 3500. 2nd moment = (2/3)(5002 ) + (1/3)(95002 ) = 30,250,000. variance = 30,250,000 - 35002 = 18,000,000. CV =
√18,000,000 / 3500 = 1.212.
15.10. B. The nonzero payments are: 1000@2/3 and 10,000@1/3. Mean = (2/3)(1000) + (1/3)(10000) = 4000. 2nd moment = (2/3)(10002 ) + (1/3)(100002 ) = 34,000,000. variance = 34,000,000 - 40002 = 18,000,000. CV =
√18,000,000 / 4000 = 1.061.
15.11. Mockingbird retains all of any loss less than $5000. For a loss of size greater than $5000, it retains $5000 plus 10% of the portion above $5000. Mockingbird retains the portion of any loss above the maximum covered loss of $250,000. Y = X, for X ≤ 5000. Y = 5000 + (0.1)(X - 5000) = 4500 + 0.1X, for 5000 ≤ X ≤ 250,000. Y = 4500 + (0.1)(250,000) + (X - 250,000) = X - 220,500, for 250,000 ≤ X. Comment: The maximum amount that Atticus Insurance Company retains on any loss is: (.9)(250,000 - 5000) = 220,500. Therefore, for a loss X of size greater than 250,000, Mockingbird retains X - 220,500. 15.12. Atticus Insurance pays nothing for a loss less than $5000. For a loss of size greater than $5000, Atticus Insurance pays 90% of the portion above $5000. For a loss of size 250,000, Atticus Insurance pays: (.9)(250,000 - 5000) = 220,500. Atticus Insurance pays no more for a loss larger than the maximum covered loss of $250,000. Y = 0, for X ≤ 5000. Y = (0.9)(X - 5000) = 0.9X - 4500, for 5000 ≤ X ≤ 250,000. Y = 220,500, for 250,000 ≤ X. Comment: The amount retained by Mockingbird, plus the amount paid by Atticus to Mockingbird, equals the total loss. 15.13. Finch Reinsurance pays something when the loss results in a payment by Atticus of more than $100,000. Solve for the loss that results in a payment of $100,000: 100000 = (0.9)(X - 5000). ⇒ x = 116,111. Y = 0, for X ≤ 116,111. Y = (0.9)(X - 116,111) = 0.9X - 104,500, for 116,111 < X ≤ 250,000. Y = 120,500, for 250,000 ≤ X. 15.14. For a loss greater than 116,111, Atticus pays 100,000 net of reinsurance. Y = 0, for X ≤ 5000. Y = (0.9)(X - 5000) = 0.9X - 4500, for 5000 ≤ X ≤ 116,111. Y = 100,000, for 116,111 < X.
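The four relationships in solutions 15.11-15.14 can be checked numerically with a short sketch (function and variable names are my own):

```python
def layers(x, d=5000.0, u=250000.0, c=0.9, reins_attach=100000.0):
    """Split a ground-up loss x among the insured, the insurer, and the
    reinsurer for the Mockingbird / Atticus / Finch example."""
    gross = c * max(min(x, u) - d, 0.0)          # Atticus, before reinsurance (15.12)
    retained_by_insured = x - gross              # Mockingbird (15.11)
    ceded = max(gross - reins_attach, 0.0)       # Finch (15.13)
    net = gross - ceded                          # Atticus, net of reinsurance (15.14)
    return retained_by_insured, gross, ceded, net

for x in (3000.0, 50000.0, 150000.0, 300000.0):
    print(x, layers(x))
```

For example, at x = 150,000 this gives 19,500 retained, 130,500 gross, 30,500 ceded, and 100,000 net, matching the piecewise formulas above.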
15.15. Policy A: (0.9)(Min[x, 10,000] - 1000)+ = (0.9)(Min[x - 1000, 9000])+ = (Min[0.9x - 900, 8100])+.
Policy B: (0.9) Min[(x - 1000)+, 10,000] = Min[(0.9x - 900)+, 9000] = (Min[0.9x - 900, 9000])+.
Policy A - Policy B = (Min[0.9x - 900, 8100])+ - (Min[0.9x - 900, 9000])+.
If x ≥ 11,000, then this difference is: 8100 - 9000 = -900.
If 11,000 > x > 10,000, then this difference is: 8100 - (0.9x - 900) = 9000 - 0.9x.
If 10,000 ≥ x > 1000, then this difference is: (0.9x - 900) - (0.9x - 900) = 0.
If x ≤ 1000, then this difference is: 0 - 0 = 0.
Comment: Policy A follows the order of operations you should follow on your exam, unless specifically stated otherwise.
[Graph: the difference in payments, Policy A minus Policy B, as a function of the loss x; it is 0 up to x = 10,000, decreases linearly to -900 at x = 11,000, and remains -900 thereafter.]
I would attack this type of problem by just trying various values for x. Here are some examples:
x           Payment Under A    Payment Under B    Difference
12,000           8100               9000              -900
10,300           8100               8370              -270
8000             6300               6300                 0
700                 0                  0                 0
15.16. A. Last year, the insured gets paid: (80%)(600 - 100) = 400. Insured retains: 600 - 400 = 200.
This year, the insured gets paid: (80%)(660 - 100) = 448. Insured retains: 660 - 448 = 212.
Increase in the insured's retention is: 212 / 200 - 1 = 6.0%.
15.17. (a) Losses eliminated: 76,000 + 80,000 + (250)(200 + 100 + 50) = $243,500. $243,500/$450,000 = 0.541 = 54.1% reduction in expected losses. (b) Under the franchise deductible, we pay the whole loss for a loss of size greater than 250. Losses eliminated: 76,000 + 80,000 = $156,000. $156,000/$450,000 = 0.347 = 34.7% reduction in expected losses. 15.18. (375,000 + 450,000) / 2,325,000 = 35.5%. 15.19. {375,000 + 450,000 + (500)(750 + 500 + 250 + 100)} / 2,325,000 = 69.9%. 15.20. When a payment is made, the $50 franchise deductible pays $50 more than the $50 straight deductible. Therefore, $8 = (1.000) S(50) ($50). ⇒ S(50) = 16%. When a payment is made, the $100 franchise deductible pays $100 more than the $100 straight deductible. Therefore, $10 = (1.000) S(100) ($100). ⇒ S(100) = 10%. The probability that the insured will suffer a loss between $50 and $100 is: S(50) - S(100) = 16% - 10% = 6%. 15.21. C. (1125000 + 765000)/4030000 = 46.9%. 15.22. {1125000 + 765000 + (500)(950 + 575 + 200)}/4030000 = 68.3%. 15.23. B. F(x) = x2 . 0.64 = Prob[payment < 0.5] = Prob[X - C < 0.5] = Prob[X < 0.5 + C] = (0.5 + C)2 .
⇒ C = 0.8 - 0.5 = 0.3. 15.24. Without the maximum retention the insurer would pay: (10,000 - 1500)(0.75) = 6375. In that case the insured would retain: 10,000 - 6375 = 3625. However, the insured retains at most 3000, so the insurer pays: 10,000 - 3000 = 7000. 15.25. C. The policy limit is: c(u-d), where u is the maximum covered loss. Therefore, u = d + (policy limit)/c = 5000 + 500,000/0.8 = 630,000. Therefore the payment is: 0 if X ≤ 5,000, 0.80(X - 5,000) if 5,000 < X ≤ 630,000, and 500,000 if X > 630,000. Comment: For a loss of size 3000 nothing is paid. For a loss of size 100,000, the payment is: .8(100,000 - 5000) = 76,000. For a loss of size 700,000, the payment would be: .8(700,000 - 5000) = 556,000, except that the maximum payment is the policy limit of 500,000. Increasing the size of loss above the maximum covered loss of 630,000, results in no increase in payment beyond 500,000.
15.26. B. Prob[YP = 50] = 0.2/(0.2 + 0.6) = 1/4. Prob[YP = 150] = 0.6/(0.2 + 0.6) = 3/4.
E[YP] = (50)(1/4) + (150)(3/4) = 125. E[(YP)²] = (50²)(1/4) + (150²)(3/4) = 17,500.
Var[YP] = 17,500 - 125² = 1875.
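A compact check of this calculation (same discrete distribution as in the problem):

```python
# Check of solution 15.26: per-payment variable under an ordinary deductible of 150.
losses, probs = [100, 200, 300], [0.2, 0.2, 0.6]
pay = [(max(x - 150, 0), p) for x, p in zip(losses, probs) if x > 150]
total = sum(p for _, p in pay)                 # probability that a loss produces a payment
e1 = sum(y * p for y, p in pay) / total
e2 = sum(y * y * p for y, p in pay) / total
print(e1, e2 - e1 ** 2)                        # 125 and 1875
```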
Section 16, Truncated Data The ungrouped data in Section 1 is assumed to be ground-up (first dollar) unlimited losses, on all loss events that occurred. By “first dollar”, I mean that we start counting from the first dollar of economic loss, in other words as if there were no deductible. By “unlimited” I mean we count every dollar of economic loss, as if there were no maximum covered loss. Sometimes some of this information is not reported, most commonly due to a deductible and/or maximum covered loss. There are four such situations likely to come up on your exam, each of which has two names: left truncation ⇔ truncation from below left truncation and shifting ⇔ truncation and shifting from below left censoring and shifting ⇔ censoring and shifting from below right censoring ⇔ censoring from above. In the following, ground-up, unlimited losses are assumed to have distribution function F(x). G(x) is what one would see after the effects of either a deductible or a maximum covered loss. Left Truncation / Truncation from Below:63 Left Truncation ⇔ Truncation from Below at d ⇔ deductible d and record size of loss for size > d. For example, the same data as in Section 1, but left truncated or truncated from below at $10,000, would have no information on the first eight losses, each of which resulted in less than $10,000 of loss. The actuary would not even know that there were eight such losses.64 The same information would be reported as shown in Section 1 on the remaining 122 large losses. When data is truncated from below at the value d, losses of size less than d are not in the reported data base.65 This generally occurs when there is an (ordinary) deductible of size d, and the insurer records the amount of loss to the insured. Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is left truncated / truncated from below at $1000? [Solution: $1,200, $1,500 and $2,800 . Comment: The two smaller losses are never reported to the insurer.] 63
The terms “left truncation” and “truncation from below” are synonymous.
64 This would commonly occur in the case of a $10,000 deductible.
65 Note that the Mean Excess Loss, e(x), is unaffected by truncation from below at d, provided x > d.
The distribution function and the probability density functions are revised as follows:
G(x) = {F(x) - F(d)} / S(d), x > d
g(x) = f(x) / S(d), x > d
x ⇔ the size of loss.
Thus the data truncated from below has a distribution function which is zero at d and 1 at infinity. The revised probability density function has been divided by the original chance of having a loss of size greater than d. Thus for the revised p.d.f. the probability from d to infinity integrates to unity as it should.
Note that G(x) = {F(x) - F(d)} / S(d) = {S(d) - S(x)} / S(d) = 1 - S(x)/S(d), x > d. 1 - G(x) = S(x)/S(d).
The revised survival function after truncation from below is the survival function prior to truncation divided by the survival function at the truncation point.
Both data truncated from below and the mean excess loss exclude the smaller losses. In order to compute the mean excess loss, we would take the average of the losses greater than d, and then subtract d. Therefore, the average size of the data truncated from below at d, is d plus the mean excess loss at d, e(d) + d.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average size of the data reported, if it is truncated from below at $1000?
[Solution: ($1,200 + $1,500 + $2,800)/3 = 1833.33 = 1000 + 833.33 = 1000 + e(1000).]
Franchise Deductible:
Under a franchise deductible the insurer pays nothing if the loss is less than the deductible amount, but ignores the deductible if the loss is greater than the deductible amount. If the deductible amount is d and the insured has a loss of size x, then the insurer pays:
0, if x ≤ d
x, if x > d
Thus data from a policy with a franchise deductible is truncated from below at the deductible amount.
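A quick simulation sketch of left truncation (the exponential severity is an assumed, illustrative distribution): the average of the left-truncated data should be d + e(d), and the truncated distribution function should be {F(x) - F(d)}/S(d).

```python
import math, random

# Left truncation at d: the recorded data are the losses above d, unshifted.
# For an exponential severity with mean theta, e(d) = theta, so the
# truncated mean should be d + theta.
random.seed(2)
theta, d = 1000.0, 400.0
losses = [random.expovariate(1 / theta) for _ in range(200000)]
truncated = [x for x in losses if x > d]

print(sum(truncated) / len(truncated), d + theta)      # both about 1400

# The truncated distribution function G(x) = {F(x) - F(d)} / S(d), checked at x = 1000.
F = lambda x: 1 - math.exp(-x / theta)
G_1000 = (F(1000) - F(d)) / (1 - F(d))
empirical = sum(1 for x in truncated if x <= 1000) / len(truncated)
print(G_1000, empirical)
```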
Left Truncation and Shifting / Truncation and Shifting from Below:
Left Truncation & Shifting at d ⇔ Truncation & Shifting from Below at d ⇔ Excess Loss Variable ⇔ deductible d and record non-zero payment.
If the data in Section 1 were truncated and shifted from below at $10,000, the data on these remaining 122 losses would have each amount reduced by $10,000. For the ninth loss with $10,400 of loss, $400 would be reported.66
When data is truncated and shifted from below at the value d, losses of size less than d are not in the reported data base, and larger losses have their reported values reduced by d. This generally occurs when there is an (ordinary) deductible of size d, and the insurer records the amount of payment to the insured.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is truncated and shifted from below at $1000?
[Solution: $200, $500 and $1,800.]
The distribution, survival, and the probability density functions are revised as follows:
G(x) = {F(x + d) - F(d)} / S(d), x > 0. New survival function = 1 - G(x) = S(x + d)/S(d), x > 0
g(x) = f(x + d) / S(d), x > 0
x ⇔ the size of (non-zero) payment. x + d ⇔ the size of loss.
As discussed previously, the Excess Loss Variable for d is defined for X > d as X-d and is undefined for X ≤ d, which is the same as the effect of truncating and shifting from below at d. Exercise: Prior to a deductible, losses are Weibull with τ = 2 and θ = 1000. What is the probability density function of the excess loss variable corresponding to d = 500? [Solution: For the Weibull, F(x) = 1 - exp(-(x/θ)τ) and f(x) = τ xτ−1 exp(-(x/θ)τ) / θτ. S(500) = exp[-(500/1000)2 ] = .7788. Let Y be the truncated and shifted variable. Then, g(y) = f(500 + y)/S(500) = (y+500) exp[-((y+500)/1000)2 ] / 389,400, y > 0.] The average size of the data truncated and shifted from below at d, is the mean excess loss (mean residual life) at d, e(d).
66 With a $10,000 deductible, the insurer would pay $400 while the insured would have to absorb $10,000.
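An optional numerical check of the Weibull exercise above, comparing a simulation of the truncated and shifted data to e(500) obtained by numerically integrating the survival function (the grid and sample size are arbitrary choices):

```python
import math, random

# Check of the Weibull exercise: tau = 2, theta = 1000, deductible d = 500.
tau, theta, d = 2.0, 1000.0, 500.0
S = lambda x: math.exp(-((x / theta) ** tau))

# e(d) = integral of S(x) from d to infinity, divided by S(d); midpoint sum on a grid.
xs = [d + (i + 0.5) * 10.0 for i in range(1000)]
integral = sum(S(x) * 10.0 for x in xs)
e_d = integral / S(d)

# Simulate losses, truncate and shift at d, and compare the average payment to e(d).
random.seed(0)
losses = [theta * (-math.log(random.random())) ** (1 / tau) for _ in range(200000)]
shifted = [x - d for x in losses if x > d]
print(e_d, sum(shifted) / len(shifted))        # both roughly 546
```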
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average size of the data reported, if it is truncated and shifted from below at $1000? [Solution: ($200+ $500 + $1,800) /3 = 833.33 = e(1000).] Complete Expectation of Life:67 These ideas are mathematically equivalent to ideas discussed in Life Contingencies. The expected future lifetime for an age x is e° x , the complete expectation of life. e° x is the mean residual life (mean excess loss) at x, e(x). e° x is the mean of the lives truncated and shifted from below at x. Exercise: Three people die at ages: 55, 70, 80. Calculate e° 65 . [Solution: Truncate and shift the data at 65; eliminate any ages ≤ 65 and subtract 65: 70 - 65 = 5, 80 - 65 = 15. Average the truncated and shifted data: e° 65 = (5 + 15)/2 = 10.] The survival function at age x + t for the data truncated and shifted at x is: S(x+t)/S(x) = tp x. As will be discussed in a subsequent section, one can get the mean by integrating the survival function. e°
x = mean of the data truncated and shifted at x = ∫ tpx dt, integrated from 0 to ∞.68
67 See Actuarial Mathematics.
68 See equation 3.5.2 in Actuarial Mathematics.
Right Truncation / Truncation from Above:69 In the case of the data in Section 1, right truncated or truncated from above at $25,000, there would be no information on the 109 losses larger than $25,000. Truncation from above contrasts with data censored from above at $25,000, which would have each of the 109 large losses in Section 1 reported as being $25,000 or more.70 When data is right truncated or truncated from above at the value L, losses of size greater than L are not in the reported data base.71 Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is truncated from above at $1000? [Solution: $300 and $600.] The distribution function and the probability density functions are revised as follows: G(x) = F(x) / F(L), x ≤ L g(x) = f(x) / F(L), x ≤ L The average size of the data truncated from above at L, is the average size of losses from 0 to L, E[X ∧ L] - L S(L) . F(L) Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average size of the data reported, if it is truncated from above at $1000? [Solution: ($300 + $600)/2 = $450.]
69
See Definition 14.1 in Loss Models. Under censoring from above, one would not know the total size of loss for each of these large losses. This is quite common for data reporting when there are maximum covered losses. 71 Right truncation can happen when insufficient time has elapsed to receive all of the data. For example, one might be doing a mortality study based on death records, which would exclude from the data anyone who has yet to die. Right truncation can also occur when looking at claim count development. One might not have data beyond a given “report” and fit a distribution function (truncated from above) to the available claim counts by report. See for example, “Estimation of the Distribution of Report Lags by the Method of Maximum Likelihood,” by Edward W. Weisner, PCAS 1978. 70
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
Page 163
Truncation from Both Above and Below: When data is both truncated from above at L and truncated from below at the value d, losses of size greater than L or less than or equal to d are not in the reported data base. Exercise: Events occur at times: 3, 6, 12, 15, 24, and 28. How is this data reported, if it is truncated from below at 10 and truncated from above at 20? [Solution: 12 and 15. Comment: One starts observing at time 10 and stops observing at time 20.] The distribution function and the probability density functions are revised as follows: G(x) =
F(x) - F(d) , d
g(x) =
f(x) , d
Note that whenever we have truncation, the probability remaining after truncation is the denominator of the altered density and distribution functions. The average size of the data truncated from below at d and truncated from above at L, is the {E[X ∧L] - L S(L)} - {E[X ∧ d] - d S(d)} average size of losses from d to L, . F(L) - F(d) Exercise: Events occur at times: 3, 6, 12, 15, 24, and 28. What is the average size of the data reported, if it is truncated from below at 10 and truncated from above at 20? [Solution: (12 + 15)/2 = 13.5.]
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
Page 164
Problems: Use the following information for the next 4 questions: There are 6 losses: 100, 400, 700, 800, 1200, and 2300. 16.1 (1 point) If these losses are truncated from below (left truncated) at 500, what appears in the data base? 16.2 (1 point) If these losses are truncated and shifted from below (left truncated and shifted) at 500, what appears in the data base? 16.3 (1 point) If these losses are truncated from above (right truncated) at 1000, what appears in the data base? 16.4 (1 point) If these losses are truncated from below (left truncated) at 500 and truncated from above (right truncated) at 1000, what appears in the data base. Use the following information for the next 2 questions: Losses are uniformly distributed from 0 to 1000. 16.5 (1 point) What is the distribution function for these losses left truncated at 100? 16.6 (1 point) What is the distribution function for these losses left truncated and shifted at 100?
16.7 (1 point) Assume that claims follow a distribution, F(x) = 1 -
1 . {1 + (x / θ)γ}α
Which of the following represents the distribution function for the data truncated from below at d? ⎛ θ γ + d γ⎞ α A. ⎜ x >d ⎟ ⎝ θ γ + x γ⎠ ⎛ θ γ + d γ⎞ α B. 1 - ⎜ ⎟ x >d ⎝ θ γ + x γ⎠ ⎛ θγ ⎞ α C. ⎜ x >d ⎟ ⎝ θ γ + x γ⎠ ⎛ θγ ⎞ α D. 1 - ⎜ ⎟ ⎝ θ γ + x γ⎠
x >d
E. None of the above.
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
Page 165
16.8 (1 point) The report lag for claims is assumed to be exponentially distributed: F(x) = 1 - e-λx , where x is the delay in reporting. What is the probability density function for data truncated from above at 3? A. 3e-3x
B. λe-3λ
C. e-λx / (1 - e-3λ)
D. λe-λx / (1 - e-3λ)
E. λe-λ(x-3)
Use the following information for the next three questions:
• Losses follow a Distribution Function F(x) = 1 - {2000 / (2000+x)}3 . 16.9 (1 point) If the reported data is truncated from below at 400, what is the Density Function at 1000? A. less than 0.0004 B. at least 0.0004 but less than 0.0005 C. at least 0.0005 but less than 0.0006 D. at least 0.0006 but less than 0.0007 E. at least 0.0007 16.10 (1 point) If the reported data is truncated from above at 2000, what is the Density Function at 1000? A. less than 0.0004 B. at least 0.0004 but less than 0.0005 C. at least 0.0005 but less than 0.0006 D. at least 0.0006 but less than 0.0007 E. at least 0.0007 16.11 (1 point) If the reported data is truncated from below at 400 and truncated from above at 2000, what is the Density Function at 1000? A. less than 0.0004 B. at least 0.0004 but less than 0.0005 C. at least 0.0005 but less than 0.0006 D. at least 0.0006 but less than 0.0007 E. at least 0.0007 Use the following information for the next 2 questions: There are 3 losses: 800, 2500, 7000. 16.12 (1 point) If these losses are truncated from below (left truncated) at 1000, what appears in the data base? 16.13 (1 point) If these losses are truncated and shifted from below (left truncated and shifted) at 1000, what appears in the data base?
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
Page 166
Use the following information for the next 3 questions: The probability density function is: f(x) = x/50, 0 ≤ x ≤ 10. 16.14 (2 points) Determine the mean of this distribution left truncated at 4. A. 7.3 B. 7.4 C. 7.5 D. 7.6 E. 7.7 16.15 (2 points) Determine the median of this distribution left truncated at 4 A. 7.3 B. 7.4 C. 7.5 D. 7.6 E. 7.7 16.16 (2 points) Determine the variance of this distribution left truncated at 4 A. 2.8 B. 3.0 C. 3.2 D. 3.4 E. 3.6
16.17 (4, 5/85, Q.56) (2 points) Let f be the probability density function of x, and let F be the distribution function of x. Which of the following expressions represent the probability density function of x truncated and shifted from below at d? ⎧ 0, x ≤ d A. ⎨ ⎩f(x) / {1 - F(d)}, d < x ⎧ 0, x ≤ 0 B. ⎨ ⎩f(x) / {1 - F(d)}, 0 < x ⎧ 0, x ≤ d C. ⎨ ⎩f(x - d) / {1 - F(d)}, d < x ⎧ 0, x ≤ - d D. ⎨ ⎩f(x + d) / {1 - F(d)}, - d < x ⎧ 0, x ≤ 0 E. ⎨ ⎩f(x + d) / {1 - F(d)}, 0 < x
2013-4-2,
Loss Distributions, §16 Truncated Data
16.18 (4B, 11/92, Q.3) (1 point) You are given the following: • Based on observed data truncated from above at $10,000, the probability of a claim exceeding $3,000 is 0.30. • Based on the underlying distribution of losses, the probability of a claim exceeding $10,000 is 0.02. Determine the probability that a claim exceeds $3,000. A. Less than 0.28 B. At least 0.28 but less than 0.30 C. At least 0.30 but less than 0.32 D. At least 0.32 but less than 0.34 E. At least 0.34
HCM 10/8/12,
Page 167
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
Page 168
Solutions to Problems: 16.1. If these losses are truncated from below (left truncated) at 500, the two small losses do not appear: 700, 800, 1200, 2300. 16.2. If these losses are truncated and shifted from below at 500, the two small losses do not appear and the other losses have 500 subtracted from them: 200, 300, 700, 1800. Comment: The (non-zero) payments with a $500 deductible. 16.3. If these losses are truncated from above (right truncated) at 1000, the two large losses do not appear: 100, 400, 700, 800. 16.4. Neither the two small losses nor the two large losses appear: 700, 800. 16.5. Losses of size less than 100 do not appear. G(x) = (x - 100)/900 for 100 < x < 1000. Alternately, F(x) = x/1000, 0 < x < 1000 and G(x) = {F(x) - F(100)}/S(100) = {(x/1000) - 100/1000)}/(1 - 100/1000) = (x - 100)/900 for 100 < x < 1000. 16.6. Losses of size less than 100 do not appear. We record the payment amount with a $100 deductible. G(x) = x/900 for 0 < x < 900. Alternately, F(x) = x/1000, 0 < x < 1000 and G(x) = {F(x + 100) - F(100)}/S(100) = {((x + 100)/1000) - 100/1000)}/(1 - 100/1000) = x/900 for 0 < x < 900. Comment: A uniform distribution from 0 to 900. 16.7. B. The new distribution function is for x > d: {F(x) - F(d)} / {1 - F(d)} = [{θγ / (θγ + dγ)}α − {θγ / (θγ + xγ)}α] / {θγ / (θγ + dγ)}α = 1- {(θγ + dγ) / (θ γ + xγ)}α. Comment: A Burr Distribution. 16.8. D. The Distribution Function of the data truncated from above at 3 is G(x) = F(x)/F(3) = (1 - e-λx) / (1 - e-3λ). The density function is g(x) = Gʼ(x) = λe-λx / (1 - e-3λ). 16.9. C. Prior to truncation the density function is: f(x) = (3)(2000)3 / (2000+x)4 . After truncation from below at 400, the density function is: g(x) = f(x) / {1 - F(400)} = f(x) / {2000 / (2000+400)}3 = f(x) / 0.5787. f(1000) = (3)(2000)3 / (2000+1000)4 = 0.000296. g(1000) = 0.000296 / 0.5787 = 0.00051.
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
Page 169
16.10. A. Prior to truncation the density function is: f(x) = (3)(2000)3 / (2000+x)4 . After truncation from above at 2000, the density function is: g(x) = f(x) / F(2000) = f(x) / {1 - (2000 / (2000+2000))3 } = f(x) / 0.875. f(1000) = (3)(2000)3 / (2000 +1000)4 = 0.000296. g(1000) = 0.000296 / 0.875 = 0.00034. 16.11. D. Prior to truncation the density function is: f(x) = (3)(2000)3 / (2000+x)4 . After truncation from below at 400 and from above at 2000, the density function is: g(x) = f(x) / {F(2000)-F(400)}. F(2000) = 0.875. F(400) = 0.4213. f(1000) = (3)(20003 ) / (2000+1000)4 = 0.000296. g(1000) = 0.000296 / (0.875 - 0.4213) = 0.00065. 16.12. If these losses are truncated from below (left truncated) at 1000, then losses of size less than or equal to 1000 do not appear. The data base is: 2500, 7000. 16.13. If these losses are truncated and shifted from below at 1000, then losses of size less than or equal to 1000 do not appear, and the other losses have 1000 subtracted from them. The data base is: 1500, 6000. Comment: The (non-zero) payments with a $1000 deductible. 16.14. B. & 16.15. D. & 16.16. A. For the original distribution: F(4) = 42 /100 = 0.16. Therefore, the density left truncated at 4 is: (x/50)/(1 - 0.16) = x/42, 4 ≤ x ≤ 10. 10
The mean of the truncated distribution is:
∫4 (x / 42) x dx = (103 - 43)/126 = 7.429.
For x > 4, by integrating the truncated density, the truncated distribution is: (x2 - 42 ) / 84. Set the truncated distribution equal to 50%: 0.5 = (x2 - 42 ) / 84. ⇒ x = 7.616. 10
The second moment of the truncated distribution is:
∫4 (x / 42) x2 dx = (104 - 44)/168 = 58.
The variance of the truncated distribution is: 58 - 7.4292 = 2.81. Comment: The density right truncated at 4 is: (x/50)/0.16 = x/8, 0 ≤ x ≤ 4. 16.17. E. The p.d.f. is: 0 for x ≤ 0, and f(x+d)/[1-F(d)] for 0 < x Comment: Choice A is the p.d.f for the data truncated from below at d, but not shifted.
2013-4-2,
Loss Distributions, §16 Truncated Data
HCM 10/8/12,
16.18. C. P(x≤3000 | x ≤ 10000) = P(x≤3000) / P(x≤10000). Thus, 1 - 0.3 = P(x≤3000) / (1 - 0.02). P(x≤3000) = (1 - 0.3)(0.98) = 0.686. P(x>3000) = 1 - 0.686 = 0.314. Alternately, let F(x) be the distribution of the untruncated losses. Let G(x) be the distribution of the losses truncated from above at 10,000. Then G(x) = F(x) / F(10,000), for x ≤ 10000. We are given that 1 - G(3000) = 0.3, and that 1 - F(10,000) = 0.02. Thus F(10,000) = 0.98. Also, 0.7 = G(3000) = F(3000) / F(10,000) = F(3000) / 0.98. Thus F(3000) = (0.7)(0.98) = 0.686. 1 - F(3000) = 1 - 0.686 = 0.314.
Page 170
2013-4-2,
Loss Distributions, §17 Censored Data
HCM 10/8/12,
Page 171
Section 17, Censored Data

Censoring is somewhat different than truncation. With truncation we do not know of the existence of certain losses. With censoring we do not know the size of certain losses. The most important example of censoring is due to the effect of a maximum covered loss.

Right Censored / Censored from Above:72

Right Censored at u ⇔ Censored from Above at u ⇔ X ∧ u ⇔ Min[X, u] ⇔ there is a Maximum Covered Loss u, and we donʼt know the exact size of a loss when it is ≥ u.

When data is right censored or censored from above at the value u, losses of size more than u are recorded in the data base as u. This generally occurs when there is a maximum covered loss of size u. When a loss (covered by insurance) is larger than the maximum covered loss, the insurer pays the maximum covered loss (if there is no deductible) and may neither know nor care how much bigger the loss is than the maximum covered loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is censored from above at $1000?
[Solution: $300, $600, $1000, $1000, $1000.]

The revised Distribution Function under censoring from above at u is:
G(x) = F(x) for x < u, and G(u) = 1.
g(x) = f(x) for x < u, with a point mass of probability S(u) at x = u.
The data censored from above at u is the limited loss variable, X ∧ u ≡ Min[X, u], discussed previously. The average size of the data censored from above at u is the Limited Expected Value at u, E[X ∧ u].
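As an illustration, a minimal Python sketch (my own, for checking purposes only) that applies right censoring to the exercise data and recovers the empirical limited expected value:

```python
# Right censoring at u: each loss is recorded as min(x, u).
# Censoring the exercise data at u = 1000 gives an average of 780 = E[X ^ 1000].

losses = [300, 600, 1200, 1500, 2800]
u = 1000

censored = [min(x, u) for x in losses]                   # [300, 600, 1000, 1000, 1000]
limited_expected_value = sum(censored) / len(censored)   # 780.0

print(censored, limited_expected_value)
```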
72 The terms “right censored” and “censored from above” are synonymous. “From the right” refers to a graph with the size of loss along the x-axis, with the large values on the righthand side. “From above” uses similar terminology as “higher layers of loss.” “From above” is how the effect of a maximum covered loss looks in a Lee Diagram.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average size of the data reported, if it is censored from above at $1000?
[Solution: ($300 + $600 + $1000 + $1000 + $1000)/5 = $780 = E[X ∧ 1000].]

Truncation from Below and Censoring from Above:

When data is subject to both a maximum covered loss and a deductible, and one records the loss by the insured, then the data is censored from above and truncated from below. For example, with a deductible of $1000 and a maximum covered loss of $25,000:

Loss Size      As Recorded after Truncation from Below at 1000 and Censoring from Above at 25000
600            Not recorded
4500           4500
37000          25000

For truncation from below at d and censoring from above at u,73 the data are recorded as follows:

Loss by Insured      Amount Recorded by Insurer
x ≤ d                Not Recorded
d < x < u            x
x ≥ u                u
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is truncated from below at $1000 and censored from above at $2000?
[Solution: $1200, $1500, $2000.]

The revised Distribution Function under censoring from above at u and truncation from below at d is:
G(x) = {F(x) - F(d)} / S(d) for d < x < u, and G(u) = 1.
g(x) = f(x) / S(d) for d < x < u, with a point mass of probability S(u)/S(d) at x = u.
73 For the example, u = $25,000 and d = $1000.
The total losses of the data censored from above at u and truncated from below at d are the losses in the layer from d to u, plus d times the number of losses in the data set; the probability that a loss enters the data set is S(d). Therefore, the average size of the data censored from above at u and truncated from below at d is:
{(E[X ∧ u] - E[X ∧ d]) + d S(d)} / S(d) = (E[X ∧ u] - E[X ∧ d]) / S(d) + d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average size of the data reported, if it is truncated from below at $1000 and censored from above at $2000?
[Solution: ($1200 + $1500 + $2000)/3 = $1567.
Comment: (E[X ∧ 2000] - E[X ∧ 1000])/S(1000) + 1000 = (1120 - 780)/0.6 + 1000 = $1567.]

Truncation and Shifting from Below and Censoring from Above:

When data is subject to both a maximum covered loss and a deductible, and one records the amount paid by the insurer, then the data is censored from above and truncated and shifted from below. For example, with a deductible of $1000 and a maximum covered loss of $25,000:

Loss Size      As Recorded after Truncation and Shifting from Below at 1000 and Censoring from Above at 25000
600            Not recorded
4500           3500
37000          24000

For truncation and shifting from below at d and censoring from above at u, the data are recorded as follows:

Loss by Insured      Amount Recorded by Insurer
x ≤ d                Not Recorded
d < x < u            x - d
x ≥ u                u - d
The revised Distribution Function under censoring from above at u and truncation and shifting from below at d is:
G(x) = {F(x + d) - F(d)} / S(d) for 0 < x < u - d, and G(u - d) = 1.
g(x) = f(x + d) / S(d) for 0 < x < u - d, with a point mass of probability S(u)/S(d) at x = u - d.
x ⇔ the size of (non-zero) payment. x + d ⇔ the size of loss.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is truncated and shifted from below at $1000 and censored from above at $2000?
[Solution: $200, $500, $1000.]

The total losses of the data censored from above at u and truncated and shifted from below at d are the losses in the layer from d to u. The probability that a loss enters the data base is S(d). Therefore the average size of the data censored from above at u and truncated and shifted from below at d is:
(E[X ∧ u] - E[X ∧ d]) / S(d).

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average size of the data reported, if it is truncated and shifted from below at $1000 and censored from above at $2000?
[Solution: ($200 + $500 + $1000)/3 = $567.
Comment: (E[X ∧ 2000] - E[X ∧ 1000])/S(1000) = (1120 - 780)/0.6 = $567.]
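The two exercises above can be reproduced with a few lines of Python (an illustrative sketch of my own, with d and u as in the exercises):

```python
# With deductible d and maximum covered loss u, losses of size <= d drop out of the data.
# Recording the loss gives data truncated from below at d and censored from above at u;
# recording the payment gives data truncated and shifted from below at d and censored from above at u.

losses = [300, 600, 1200, 1500, 2800]
d, u = 1000, 2000

recorded_losses = [min(x, u) for x in losses if x > d]        # [1200, 1500, 2000]
recorded_payments = [min(x, u) - d for x in losses if x > d]  # [200, 500, 1000]

avg_per_recorded_loss = sum(recorded_losses) / len(recorded_losses)        # about 1567
avg_per_nonzero_payment = sum(recorded_payments) / len(recorded_payments)  # about 567

print(avg_per_recorded_loss, avg_per_nonzero_payment)
```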
Left Censored and Shifted / Censored and Shifted from Below: For example, the same data as in Section 1, but left censored and shifted or censored and shifted from below at $10,000, would have each of the 8 small losses in Section 1 reported as resulting in no payment in the presence of a $10,000 deductible, but we would not know their exact sizes. For the remaining 122 large losses, the payment of $10,000 less than their size would be reported.
left censored and shifted variable74 at d ⇔ (X - d)+ ⇔ 0 when X ≤ d, X - d when X > d
⇔ the amounts paid to an insured with a deductible of d
⇔ payments per loss, including when the insured is paid nothing due to the deductible of d
⇔ amount paid per loss.

When data is left censored and shifted at the value d, losses of size less than d are recorded in the data base as 0. Losses of size x > d are recorded as x - d. What appears in the data base is (X - d)+.

The revised Distribution Function under left censoring and shifting at d is:
G(x) = F(x + d), x ≥ 0.
g(x) has a point mass of probability F(d) at x = 0, and g(x) = f(x + d) for x > 0.
x ⇔ the size of payment. x + d ⇔ the size of loss.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is left censored and shifted at $1000?
[Solution: $0, $0, $200, $500 and $1,800.]

The mean of the left censored at d and shifted variable = the average payment per loss with a deductible of d = E[X] - E[X ∧ d] ⇔ the layer from d to ∞.

E[(X - d)+] = E[X] - E[X ∧ d] = ∫_d^∞ (x - d) f(x) dx.

More generally, E[(X - d)+ⁿ] = ∫_d^∞ (x - d)ⁿ f(x) dx.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average of the data reported, if it is left censored and shifted at $1000?
[Solution: (0 + 0 + $200 + $500 + $1800)/5 = $500.
E[X] - E[X ∧ 1000] = $1280 - $780 = $500 = average payment per loss.
In contrast, the average payment per non-zero payment is: ($200 + $500 + $1800)/3 = $833.33.]

74 Discussed previously. See Definition 3.5 in Loss Models.
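A short sketch (mine, for illustration) contrasting the payment per loss with the payment per non-zero payment under left censoring and shifting:

```python
# Left censored and shifted at d: every loss is recorded, but as the payment (x - d)+,
# so small losses show up as zeros.  The average of (X - d)+ equals E[X] - E[X ^ d].

losses = [300, 600, 1200, 1500, 2800]
d = 1000

payments_per_loss = [max(x - d, 0) for x in losses]   # [0, 0, 200, 500, 1800]

avg_per_loss = sum(payments_per_loss) / len(payments_per_loss)   # 500 = 1280 - 780
nonzero = [p for p in payments_per_loss if p > 0]
avg_per_payment = sum(nonzero) / len(nonzero)                    # about 833.33

print(payments_per_loss, avg_per_loss, avg_per_payment)
```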
Left Censored / Censored from Below:75

Sometimes data is censored from below, so that one only knows how many small values there are, but does not know their exact sizes.76 For example, an actuary might have access to detailed information on all Workersʼ Compensation losses of size greater than $2000, including the size of each such loss, but might only know how many losses there were of size less than or equal to $2000. Such data has been censored from below at $2000.

For example, the same data as in Section 1, but censored from below at $10,000, would have each of the 8 small losses in Section 1 reported as being $10,000 or less. The same information would be reported as shown in Section 1 on the remaining 122 losses.

When data is censored from below at the value d, losses of size less than d are recorded in the data base as d.

The revised Distribution Function under censoring from below at d is:
G(x) = 0 for x < d, and G(x) = F(x) for x ≥ d.
g(x) has a point mass of probability F(d) at x = d (arising from the losses of size less than d), and g(x) = f(x) for x ≥ d.

Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. How is this data reported, if it is censored from below at $1000?
[Solution: $1000, $1000, $1,200, $1,500 and $2,800.]

The average size of the data censored from below at d is: (E[X] - E[X ∧ d]) + d.
The losses are those in the layer from d to ∞, plus d per loss.
Exercise: An insured has losses of sizes: $300, $600, $1,200, $1,500 and $2,800. What is the average of the data reported, if it is censored from below at $1000? [Solution: ($1000 + $1000 + $1200 + $1500 + $2800) / 5 = $1500. Comment: (E[X ] - E[X ∧ 1000]) + 1000 = (1280-780) + 1000 = 1500. ]
75 See Definition 14.1 in Loss Models.
76 See for example, 4, 11/06, Q.5.
Problems:

Use the following information for the next 4 questions:
Losses are uniformly distributed from 0 to 1000.

17.1 (1 point) What is the distribution function for these losses left censored at 100?
17.2 (1 point) What is the distribution function for these losses left censored and shifted at 100?
17.3 (1 point) What is the distribution function for these losses right censored at 800?
17.4 (1 point) What is the distribution function for these losses left truncated and shifted at 100 and right censored at 800?
Use the following information for the next 4 questions:
There are 6 losses: 100, 400, 700, 800, 1200, and 2300.

17.5 (1 point) If these losses are left censored (censored from below) at 500, what appears in the data base?
17.6 (1 point) If these losses are left censored and shifted at 500, what appears in the data base?
17.7 (1 point) If these losses are censored from above (right censored) at 2000, what appears in the data base?
17.8 (1 point) If these losses are left truncated and shifted at 500 and right censored at 2000, what appears in the data base?
17.9 (1 point) There are five accidents with losses equal to: $500, $2500, $4000, $6000, and $8000. Which of the following statements are true regarding the reporting of this data?
1. If the data is censored from above at $5000, then the data is reported as: $500, $2500, $4000.
2. If the data is truncated from below at $1000, then the data is reported as: $2500, $4000, $6000, $8000.
3. If the data is truncated and shifted at $1000, then the data is reported as: $1500, $3000, $5000, $7000.
A. 1, 2  B. 1, 3  C. 2, 3  D. 1, 2, 3  E. None of A, B, C or D
17.10 (2 points) It can take many years for a Medical Malpractice claim to be reported to an insurer and can take many more years to be closed, in other words resolved. You are studying how long it takes Medical Malpractice claims to be reported to your insurer. You have data on incidents that occurred two years ago and how long they took to be reported. You are also studying how long it takes for Medical Malpractice claims to be closed once they are reported. You have data on all incidents that were reported two years ago and how long it took to close those that are not still open. For each of these two sets of data, state whether it is truncated and/or censored and briefly explain why.
Solutions to Problems:

17.1. We only know that small losses are of size at most 100. G(100) = 0.1; G(x) = x/1000 for 100 < x < 1000.
17.2. G(0) = 0.1; G(x) = (x + 100)/1000 for 0 < x < 900.
17.3. All losses are limited to 800. G(x) = x/1000 for 0 < x < 800; G(800) = 1.
17.4. Losses less than 100 do not appear. Other losses are limited to 800 and then have 100 subtracted. G(x) = x/900 for 0 < x < 700; G(700) = 1.
Alternately, F(x) = x/1000 for 0 < x < 1000, and G(x) = {F(x + 100) - F(100)}/S(100) = {(x + 100)/1000 - 100/1000}/(1 - 100/1000) = x/900 for 0 < x < 800 - 100 = 700; G(700) = 1.
17.5. The two smaller losses appear as 500: 500, 500, 700, 800, 1200, 2300.
17.6. (X - 500)+ = 0, 0, 200, 300, 700, 1800.
Comment: The amounts the insured receives with a $500 deductible.
17.7. The large loss is limited to 2000: 100, 400, 700, 800, 1200, 2000.
Comment: Payments with a 2000 maximum covered loss. Right censored observations might be indicated with a plus as follows: 100, 400, 700, 800, 1200, 2000+. The 2000 corresponds to a loss of 2000 or more.
17.8. The two small losses do not appear; the other losses are limited to 2000 and then have 500 subtracted: 200, 300, 700, 1500.
Comment: Payments with a 500 deductible and 2000 maximum covered loss. Apply the maximum covered loss first and then the deductible; therefore, apply the censorship first and then the truncation.
17.9. C. If the data is censored by a $5000 limit, then the data is reported as: $500, $2500, $4000, $5000, $5000. Statement 1 would be true if it referred to truncation from above rather than censoring. Under censoring, the size of large accidents is limited in the reported data to the maximum covered loss. Under truncation from above, the large accidents do not even make it into the reported data. Statements 2 and 3 are each true.
17.10. The data on incidents that occurred two years ago is truncated from above at two years. Those incidents, if any, that will take more than 2 years to be reported are not yet in our data base. We donʼt know how many of them there may be, nor how long they will take to be reported.
The data on claims that were reported two years ago is censored from above at two years. Those claims that are still open will, we know, eventually be closed. However, while we know each of them will take more than 2 years to close, we donʼt know exactly how long it will take.
Section 18, Average Sizes

For each of the different types of data there are corresponding average sizes. The most important cases involve a deductible and/or a maximum covered loss; one should know well the average payment per loss and the average payment per (non-zero) payment.

Average Amount Paid per Loss:

Exercise: When there is a deductible of size 1000, a maximum covered loss of 25,000, and thus a policy limit of 25,000 - 1000 = 24,000, what is the average amount paid per loss?
[Solution: The average amount paid per loss is the average losses in the layer from 1000 to 25,000: E[X ∧ 25000] - E[X ∧ 1000].]

Situation                                                  Average Amount Paid per Loss
No Maximum Covered Loss, No Deductible                     E[X]
Maximum Covered Loss u, No Deductible                      E[X ∧ u]
No Maximum Covered Loss, (ordinary) Deductible d           E[X] - E[X ∧ d]
Maximum Covered Loss u, (ordinary) Deductible d            E[X ∧ u] - E[X ∧ d]

Recalling that E[X ∧ ∞] = E[X] and E[X ∧ 0] = 0, we have a single formula that covers all four situations:
With Maximum Covered Loss of u and an (ordinary) deductible of d, the average amount paid by the insurer per loss is: E[X ∧ u] - E[X ∧ d].

Note that the average payment per loss is just the layer from d to u. As discussed previously, this layer can also be expressed as:
(layer from d to ∞) - (layer from u to ∞) = E[(X - d)+] - E[(X - u)+] = {E[X] - E[X ∧ d]} - {E[X] - E[X ∧ u]} = E[X ∧ u] - E[X ∧ d].

Average Amount Paid per Non-Zero Payment:

Exercise: What is the average non-zero payment when there is a deductible of size 1000 and no maximum covered loss?
[Solution: The average non-zero payment when there is a deductible of size 1000 is the ratio of the losses excess of 1000, E[X] - E[X ∧ 1000], to the probability of a loss greater than 1000, S(1000). Thus the expected non-zero payment is: (E[X] - E[X ∧ 1000]) / S(1000).]
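A minimal Python sketch (my own, assuming a uniform severity on [0, 20,000] purely for illustration) of the layer formula, per loss and per payment; the same numbers reappear in problems 18.18 and 18.19 later in this section:

```python
# Layer average for a uniform severity on [0, w]:
# E[X ^ x] = x - x^2/(2w) for x <= w, and S(x) = 1 - x/w.
# The per-loss average payment is E[X ^ u] - E[X ^ d]; dividing by S(d) gives the per-payment average.

w = 20000.0           # assumed uniform upper bound
d, u = 1000.0, 15000.0

def lev(x):           # limited expected value E[X ^ x] for the uniform on [0, w]
    return x - x * x / (2 * w)

def survival(x):
    return 1 - x / w

per_loss = lev(u) - lev(d)             # 8400
per_payment = per_loss / survival(d)   # about 8842

print(per_loss, per_payment)
```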
In the case of a deductible, some losses to the insured are too small to result in a payment by the insurer. Thus there are fewer non-zero payments than losses. In order to convert the average amount paid per loss to the average amount paid per non-zero payment, one needs to divide by S(d).

With Maximum Covered Loss of u and an (ordinary) deductible of d, the average amount paid by the insurer per non-zero payment to the insured is: (E[X ∧ u] - E[X ∧ d]) / S(d).
If u = ∞, in other words there is no maximum covered loss, then this is e(d).

Coinsurance Factor:

For example, an insurance policy might have an 80% coinsurance factor. Then the insurer pays 80% of what it would have paid in the absence of the coinsurance factor. Thus the average payment, either per loss or per non-zero payment, would be multiplied by 80%. In general, a coinsurance factor of c multiplies the average payment, either per loss or per non-zero payment, by c.

With Maximum Covered Loss of u, an (ordinary) deductible of d, and a coinsurance factor of c, the average amount paid by the insurer per loss by the insured is: c (E[X ∧ u] - E[X ∧ d]).
With Maximum Covered Loss of u, an (ordinary) deductible of d, and a coinsurance factor of c, the average amount paid by the insurer per non-zero payment to the insured is: c (E[X ∧ u] - E[X ∧ d]) / S(d).

Exercise: Prior to the application of any coverage modifications, losses follow a Pareto Distribution, as per Loss Models, with parameters α = 3 and θ = 20,000. An insured has a policy with a $100,000 maximum covered loss, a $5000 deductible, and a 90% coinsurance factor. Thus the policy limit is: (0.9)(100,000 - 5000) = 85,500. Determine the average amount per non-zero payment.
[Solution: For the Pareto Distribution, as shown in Appendix A of Loss Models:
S(x) = {θ/(θ + x)}^α. E[X ∧ x] = {θ/(α - 1)} {1 - (θ/(θ + x))^(α-1)}.
S(5000) = (20/25)³ = 0.512. E[X ∧ 5000] = 10,000 {1 - (20/25)²} = 3600.
E[X ∧ 100,000] = 10,000 {1 - (20/120)²} = 9722.
(90%) (E[X ∧ 100,000] - E[X ∧ 5000]) / S(5000) = (90%)(9722 - 3600)/0.512 = $10,761.]
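The Pareto exercise above is easy to check by computer; here is an illustrative Python sketch (the function names are mine):

```python
# Check of the Pareto exercise: alpha = 3, theta = 20,000,
# deductible 5000, maximum covered loss 100,000, coinsurance 90%.

alpha, theta = 3.0, 20000.0
d, u, c = 5000.0, 100000.0, 0.90

def S(x):    # Pareto survival function
    return (theta / (theta + x)) ** alpha

def lev(x):  # Pareto limited expected value, per Appendix A of Loss Models
    return theta / (alpha - 1) * (1 - (theta / (theta + x)) ** (alpha - 1))

per_payment = c * (lev(u) - lev(d)) / S(d)
print(per_payment)   # about 10,762; the exercise, using rounded LEVs, gets $10,761
```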
Average Sizes for the Different Types of Data Sets:

Type of Data                                                Average Size
Ground-up, Total Limits                                     E[X]
Censored from Above at u                                    E[X ∧ u]
Truncated from Below at d                                   e(d) + d = (E[X] - E[X ∧ d])/S(d) + d
Truncated and Shifted from Below at d                       e(d) = (E[X] - E[X ∧ d])/S(d)
Truncated from Above at L                                   (E[X ∧ L] - L S(L)) / F(L)
Censored from Below at d                                    (E[X] - E[X ∧ d]) + d
Left Censored and Shifted at d                              E[(X - d)+] = E[X] - E[X ∧ d]
Censored from Above at u and
  Truncated from Below at d                                 (E[X ∧ u] - E[X ∧ d])/S(d) + d
Censored from Above at u and
  Truncated and Shifted from Below at d                     (E[X ∧ u] - E[X ∧ d])/S(d)
Truncated from Above at L and
  Truncated from Below at d                                 ({E[X ∧ L] - L S(L)} - {E[X ∧ d] - d S(d)}) / {F(L) - F(d)}
Truncated from Above at L and
  Truncated and Shifted from Below at d                     ({E[X ∧ L] - L S(L)} - {E[X ∧ d] - d S(d)}) / {F(L) - F(d)} - d
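Several of the entries in this table can be expressed as one-line functions of E[X ∧ x] and S(x). The sketch below (mine, for illustration) uses the limited expected values from the 12-question set later in this section as inputs and reproduces the corresponding answers:

```python
# A few rows of the table as functions of E[X ^ x] and S(x), using E[X ^ 5000] = 3600,
# E[X ^ 25000] = 8025, E[X] = 10,000, S(5000) = 0.512 and S(25000) = 0.088 as inputs.

EX = 10000.0
lev = {5000: 3600.0, 25000: 8025.0}
S = {5000: 0.512, 25000: 0.088}
d, u = 5000, 25000

truncated_below          = (EX - lev[d]) / S[d] + d        # 17,500
truncated_shifted_below  = (EX - lev[d]) / S[d]            # 12,500
censored_above           = lev[u]                          # 8,025
censored_below           = (EX - lev[d]) + d               # 11,400
left_censored_shifted    = EX - lev[d]                     # 6,400
censored_above_truncated_below         = (lev[u] - lev[d]) / S[d] + d   # about 13,643
censored_above_truncated_shifted_below = (lev[u] - lev[d]) / S[d]       # about 8,643

print(truncated_below, censored_above_truncated_below)
```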
Problems:

Use the following information for the next 5 questions:
There are five losses of size: $2000, $6000, $12,000, $27,000, and $48,000.

18.1 (1 point) With a $5000 deductible, what is the average payment per loss?
A. less than 12,000  B. at least 12,000 but less than 13,000  C. at least 13,000 but less than 14,000  D. at least 14,000 but less than 15,000  E. at least 15,000

18.2 (1 point) With a $5000 deductible, what is the average payment per (non-zero) payment?
A. less than 17,000  B. at least 17,000 but less than 18,000  C. at least 18,000 but less than 19,000  D. at least 19,000 but less than 20,000  E. at least 20,000

18.3 (1 point) With a $25,000 policy limit, what is the average payment?
A. less than 14,000  B. at least 14,000 but less than 15,000  C. at least 15,000 but less than 16,000  D. at least 16,000 but less than 17,000  E. at least 17,000

18.4 (1 point) With a $5000 deductible and a $25,000 maximum covered loss, what is the average payment per loss?
A. less than 10,000  B. at least 10,000 but less than 11,000  C. at least 11,000 but less than 12,000  D. at least 12,000 but less than 13,000  E. at least 13,000

18.5 (1 point) With a $5000 deductible and a $25,000 maximum covered loss, what is the average payment per (non-zero) payment?
A. less than 11,000  B. at least 11,000 but less than 12,000  C. at least 12,000 but less than 13,000  D. at least 13,000 but less than 14,000  E. at least 14,000
Use the following information for the next 5 questions:
Loss Size (x)    F(x)     E[X ∧ x]
10,000           0.325    8,418
25,000           0.599    16,198
50,000           0.784    23,544
100,000          0.906    30,668
250,000          0.978    37,507
∞                1.000    41,982

18.6 (1 point) With a 25,000 deductible, determine E[YL].
A. less than 22,000  B. at least 22,000 but less than 23,000  C. at least 23,000 but less than 24,000  D. at least 24,000 but less than 25,000  E. at least 25,000

18.7 (1 point) With a 25,000 deductible, determine E[YP].
A. less than 62,000  B. at least 62,000 but less than 63,000  C. at least 63,000 but less than 64,000  D. at least 64,000 but less than 65,000  E. at least 65,000

18.8 (1 point) With a 100,000 policy limit, what is the average payment?
A. less than 27,000  B. at least 27,000 but less than 28,000  C. at least 28,000 but less than 29,000  D. at least 29,000 but less than 30,000  E. at least 30,000

18.9 (1 point) With a 25,000 deductible and a 100,000 maximum covered loss, what is the average payment per loss?
A. less than 15,000  B. at least 15,000 but less than 16,000  C. at least 16,000 but less than 17,000  D. at least 17,000 but less than 18,000  E. at least 18,000

18.10 (1 point) With a 25,000 deductible and a 100,000 maximum covered loss, determine E[YP].
A. 32,000  B. 33,000  C. 34,000  D. 35,000  E. 36,000
Use the following information for the next five questions:
There are six accidents of size: $800, $2100, $3300, $4600, $6100, and $8900.

18.11 (1 point) If the reported data is truncated from below at $1000, what is the average size of claim in the reported data?
A. 4800  B. 4900  C. 5000  D. 5100  E. 5200

18.12 (1 point) If the reported data is truncated and shifted from below at $1000, what is the average size of claim in the reported data?
A. less than 3900  B. at least 3900 but less than 4000  C. at least 4000 but less than 4100  D. at least 4100 but less than 4200  E. at least 4200

18.13 (1 point) If the reported data is left censored and shifted at $1000, what is the average size of claim in the reported data?
A. less than 3700  B. at least 3700 but less than 3800  C. at least 3800 but less than 3900  D. at least 3900 but less than 4000  E. at least 4000

18.14 (1 point) If the reported data is censored from above at $5000, what is the average size of claim in the reported data?
A. less than 3400  B. at least 3400 but less than 3500  C. at least 3500 but less than 3600  D. at least 3600 but less than 3700  E. at least 3700

18.15 (1 point) If the reported data is truncated from above at $5000, what is the average size of claim in the reported data?
A. less than 2300  B. at least 2300 but less than 2400  C. at least 2400 but less than 2500  D. at least 2500 but less than 2600  E. at least 2600
Use the following information for the next four questions:
Losses follow a uniform distribution from 0 to 20,000.

18.16 (2 points) If there is a deductible of 1000, what is the average payment by the insurer per loss?
A. less than 8800  B. at least 8800 but less than 8900  C. at least 8900 but less than 9000  D. at least 9000 but less than 9100  E. at least 9100

18.17 (2 points) If there is a policy limit of 15,000, what is the average payment by the insurer per loss?
A. less than 9300  B. at least 9300 but less than 9400  C. at least 9400 but less than 9500  D. at least 9500 but less than 9600  E. at least 9600

18.18 (2 points) There is a maximum covered loss of 15,000 and a deductible of 1000. What is the average payment by the insurer per loss? (Include situations where the insurer pays nothing.)
A. less than 8200  B. at least 8200 but less than 8300  C. at least 8300 but less than 8400  D. at least 8400 but less than 8500  E. at least 8500

18.19 (2 points) There is a maximum covered loss of 15,000 and a deductible of 1000. What is the average value of a non-zero payment by the insurer?
A. less than 8700  B. at least 8700 but less than 8800  C. at least 8800 but less than 8900  D. at least 8900 but less than 9000  E. at least 9000
18.20 (2 points) An insurance policy has a maximum covered loss of 2000 and a deductible of 100.
For the ground up unlimited losses: F(100) = 0.20, F(2000) = 0.97, and ∫_100^2000 x f(x) dx = 400.
What is the average payment per loss?
A. 360  B. 380  C. 400  D. 420  E. 440

18.21 (1 point) A policy has a policy limit of 50,000 and deductible of 1000.
What is the expected payment per loss?
A. E[X ∧ 49,000] - E[X ∧ 1000]
B. E[X ∧ 50,000] - E[X ∧ 1000]
C. E[X ∧ 51,000] - E[X ∧ 1000]
D. E[X ∧ 49,000] - E[X ∧ 1000] + 1000
E. E[X ∧ 51,000] - 1000

18.22 (2 points) You are given:
• In the absence of a deductible the average loss is 15,900.
• With a 10,000 deductible, the average amount paid per loss is 7,800.
• With a 10,000 deductible, the average amount paid per nonzero payment is 13,300.
What is the average of those losses of size less than 10,000?
(A) 5000  (B) 5200  (C) 5400  (D) 5600  (E) 5800

18.23 (1 point) E[(X - 1000)+] = 3500. E[(X - 25,000)+] = 500.
There is a 1000 deductible and a 25,000 maximum covered loss. Determine the average payment per loss.
Use the following information for the next two questions: • Flushing Reinsurance reinsures a certain book of business. • Limited Expected Values for this book of business are estimated to be: E[X ∧ $1 million] = $300,000 E[X ∧ $4 million] = $375,000 E[X ∧ $5 million] = $390,000 E[X ∧ $9 million] = $420,000 E[X ∧ $10 million] = $425,000 • The survival functions, S(x) = 1 - F(x), for this book of business are estimated to be: S($1 million) = 3.50% S($4 million) = 1.70% S($5 million) = 1.30% S($9 million) = 0.55% S($10 million) = 0.45% • Flushing Reinsurance makes a nonzero payment, y, on this book of business. 18.24 (1 point) If Flushing Reinsurance were responsible for the layer of loss from $1 million to $5 million ($4 million excess of $1 million), what is the expected value of y? A. less than $1 million B. at least $1 million but less than $2 million C. at least $2 million but less than $3 million D. at least $3 million but less than $4 million E. at least $4 million 18.25 (1 point) If Flushing Reinsurance were responsible for the layer of loss from $1 million to $10 million ($9 million excess of $1 million), what is the expected value of y? A. less than $1 million B. at least $1 million but less than $2 million C. at least $2 million but less than $3 million D. at least $3 million but less than $4 million E. at least $4 million
18.26 (3 points) Losses are distributed uniformly from 0 to ω. There is a deductible of size d < ω. Determine the variance of the payment per loss.
Use the following information for the next 12 questions:
• The distribution of losses suffered by insureds is estimated to have the following limited expected values:
E[X ∧ 5,000] = 3,600    E[X ∧ 20,000] = 7,500    E[X ∧ 25,000] = 8,025    E[X ∧ ∞] = 10,000
• The survival function, S(x), for the distribution of losses suffered by insureds is estimated to have the following values:
S(5,000) = 51.2%    S(20,000) = 12.5%    S(25,000) = 8.8%

18.27 (1 point) What is the average loss suffered by the insureds?
A. 9,600  B. 9,700  C. 9,800  D. 9,900  E. 10,000

18.28 (1 point) What is the average size of data truncated from above at 25,000?
A. less than 6,300  B. at least 6,300 but less than 6,400  C. at least 6,400 but less than 6,500  D. at least 6,500 but less than 6,600  E. at least 6,600

18.29 (1 point) What is the average size of data truncated and shifted from below at 5000?
A. 12,500  B. 12,600  C. 12,700  D. 12,800  E. 12,900

18.30 (1 point) What is the average size of data censored from above at 25,000?
A. 7800  B. 7900  C. 8000  D. 8100  E. 8200

18.31 (1 point) What is the average size of data censored from below at 5,000?
A. less than 10,700  B. at least 10,700 but less than 10,800  C. at least 10,800 but less than 10,900  D. at least 10,900 but less than 11,000  E. at least 11,000

18.32 (1 point) What is the average size of data left censored and shifted at 5,000?
A. 6200  B. 6300  C. 6400  D. 6500  E. 6600
18.33 (2 points) What is the average size of data truncated from below at 5,000 and truncated from above at 25,000?
A. less than 11,200  B. at least 11,200 but less than 11,300  C. at least 11,300 but less than 11,400  D. at least 11,400 but less than 11,500  E. at least 11,500

18.34 (1 point) What is the average size of data truncated from below at 5,000 and censored from above at 25,000?
A. less than 13,500  B. at least 13,500 but less than 13,600  C. at least 13,600 but less than 13,700  D. at least 13,700 but less than 13,800  E. at least 13,800

18.35 (2 points) What is the average size of data censored from below at 5,000 and censored from above at 25,000?
A. 9100  B. 9200  C. 9300  D. 9400  E. 9500

18.36 (2 points) What is the average size of data truncated and shifted from below at 5,000 and truncated from above at 25,000?
A. less than 6,100  B. at least 6,100 but less than 6,200  C. at least 6,200 but less than 6,300  D. at least 6,300 but less than 6,400  E. at least 6,400

18.37 (2 points) What is the average size of data truncated and shifted from below at 5,000 and censored from above at 25,000?
A. less than 8,700  B. at least 8,700 but less than 8,800  C. at least 8,800 but less than 8,900  D. at least 8,900 but less than 9,000  E. at least 9,000

18.38 (1 point) What is the average size of data truncated from below at 5000?
A. 17,000  B. 17,500  C. 18,000  D. 18,500  E. 19,000
18.39 (2 points) The size of loss distribution has the following characteristics:
(i) S(100) = 0.65.
(ii) E[X | X > 100] = 345.
There is an ordinary deductible of 100 per loss. Determine the average payment per loss.
(A) 160  (B) 165  (C) 170  (D) 175  (E) 180

18.40 (3 points) A business has obtained two separate insurance policies that together provide full coverage. You are given:
(i) The average ground-up loss is 27,000.
(ii) Policy B has no deductible and a maximum covered loss of 25,000.
(iii) Policy A has an ordinary deductible of 25,000 with no maximum covered loss.
(iv) Under policy A, the expected amount paid per loss is 10,000.
(v) Under policy A, the expected amount paid per payment is 22,000.
Given that a loss less than or equal to 25,000 has occurred, what is the expected payment under policy B?
A. Less than 11,000  B. At least 11,000, but less than 12,000  C. At least 12,000, but less than 13,000  D. At least 13,000, but less than 14,000  E. At least 14,000

18.41 (2 points) X is the size of loss prior to the effects of any policy provisions. Given the following information, calculate the average payment per loss under a policy with a 1000 deductible and a 25,000 maximum covered loss.
x         e(x)        F(x)
1000      30,000      72.7%
25,000    980,000     99.7%
A. 4250  B. 4500  C. 4750  D. 5000  E. 5250

18.42 (1 point) For a certain policy, in order to determine the payment on a claim, first the deductible of 500 is applied, and then the payment is capped at 10,000. What is the expected payment per loss?
A. E[X ∧ 10,000] - E[X ∧ 500]
B. E[X ∧ 10,500] - E[X ∧ 500]
C. E[X ∧ 10,000] - E[X ∧ 500] + 500
D. E[X ∧ 10,500] - E[X ∧ 500] + 500
E. None of A, B, C, or D
18.43 (4B, 5/92, Q.20) (1 point) Accidents for a coverage are uniformly distributed on the interval 0 to $5,000. An insurer sells a policy for the coverage which has a $500 deductible.
Determine the insurer's expected payment per loss.
A. $1,575  B. $2,000  C. $2,025  D. $2,475  E. $2,500

18.44 (4B, 5/95, Q.22) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = 1000 and α = 2.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
• For each loss that occurs, the insurer's payment is equal to the entire amount of the loss if the loss is greater than 100. The insurer makes no payment if the loss is less than or equal to 100.
Determine the insurer's expected annual payments.
A. Less than 8,000  B. At least 8,000, but less than 9,000  C. At least 9,000, but less than 9,500  D. At least 9,500, but less than 9,900  E. At least 9,900

18.45 (4B, 11/95, Q.13 & 4B, 5/98 Q.9) (3 points) You are given the following:
• Losses follow a uniform distribution on the interval from 0 to 50,000.
• There is a maximum covered loss of 25,000 per loss and a deductible of 5,000 per loss.
• The insurer applies the maximum covered loss prior to applying the deductible (i.e., the insurerʼs maximum payment is 20,000 per loss).
• The insurer makes a nonzero payment p.
Determine the expected value of p.
A. Less than 15,000  B. At least 15,000, but less than 17,000  C. At least 17,000, but less than 19,000  D. At least 19,000, but less than 21,000  E. At least 21,000
18.46 (CAS9, 11/97, Q.40a) (2.5 points) You are the large accounts actuary for Pacific International Group, and you have a risk with a $1 million limit. The facultative underwriters from AnyRe have indicated that they are willing to reinsure the following layers:
from $100,000 to $200,000 ($100,000 excess of $100,000)
from $200,000 to $500,000 ($300,000 excess of $200,000)
from $500,000 to $1 million ($500,000 excess of $500,000).
You have gathered the following information:
Limit         E[X ∧ x]    F(x)
100,000       58,175      0.603
200,000       89,629      0.748
500,000       139,699     0.885
1,000,000     179,602     0.943
Expected frequency = 100 claims.
Calculate the frequency, severity, and expected losses for each of the facultative layers. Show all work.

18.47 (4B, 11/98, Q.12) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with cumulative distribution function and limited expected values as follows:
Loss Size (x)    F(x)    E[X ∧ x]
10,000           0.60    6,000
15,000           0.70    7,700
22,500           0.80    9,500
∞                1.00    20,000
• There is a deductible of 15,000 per loss and no maximum covered loss.
• The insurer makes a nonzero payment p.
Determine the expected value of p.
A. Less than 15,000  B. At least 15,000, but less than 30,000  C. At least 30,000, but less than 45,000  D. At least 45,000, but less than 60,000  E. At least 60,000
18.48 (4B, 5/99, Q.7) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with cumulative distribution function and limited expected values as follows:
Loss Size (x)    F(x)    E[X ∧ x]
10,000           0.60    6,000
15,000           0.70    7,700
22,500           0.80    9,500
32,500           0.90    11,000
∞                1.00    20,000
• There is a deductible of 10,000 per loss and no maximum covered loss.
• The insurer makes a payment on a loss only if the loss exceeds the deductible.
The deductible is raised so that half the number of losses exceed the new deductible compared to the old deductible of 10,000.
Determine the percentage change in the expected size of a nonzero payment made by the insurer.
A. Less than -37.5%  B. At least -37.5%, but less than -12.5%  C. At least -12.5%, but less than 12.5%  D. At least 12.5%, but less than 37.5%  E. At least 37.5%

18.49 (Course 3 Sample Exam, Q.5) You are given the following:
• The probability density function for the amount of a single loss is f(x) = 0.01(1 - q + 0.01qx) e^(-0.01x), x > 0.
• If an ordinary deductible of 100 is imposed, the expected payment (given that a payment is made) is 125.
Determine the expected payment (given that a payment is made) if the deductible is increased to 200.
18.50 (4, 5/00, Q.6) (2.5 points) A jewelry store has obtained two separate insurance policies that together provide full coverage. You are given: (i) The average ground-up loss is 11,100. (ii) Policy A has an ordinary deductible of 5,000 with no maximum covered loss (iii) Under policy A, the expected amount paid per loss is 6,500. (iv) Under policy A, the expected amount paid per payment is 10,000. (v) Policy B has no deductible and a maximum covered loss of 5,000. Given that a loss less than or equal to 5,000 has occurred, what is the expected payment under policy B? (A) Less than 2,500 (B) At least 2,500, but less than 3,000 (C) At least 3,000, but less than 3,500 (D) At least 3,500, but less than 4,000 (E) At least 4,000 18.51 (4, 11/00, Q.18) (2.5 points) A jewelry store has obtained two separate insurance policies that together provide full coverage. You are given: (i) The average ground-up loss is 11,100. (ii) Policy A has an ordinary deductible of 5,000 with no maximum covered loss. (iii) Under policy A, the expected amount paid per loss is 6,500. (iv) Under policy A, the expected amount paid per payment is 10,000. (v) Policy B has no deductible and a maximum covered loss of 5,000. Given that a loss has occurred, determine the probability that the payment under policy B is 5,000. (A) Less than 0.3 (B) At least 0.3, but less than 0.4 (C) At least 0.4, but less than 0.5 (D) At least 0.5, but less than 0.6 (E) At least 0.6
18.52 (CAS3, 11/03, Q.22) (2.5 points) The severity distribution function of claims data for automobile property damage coverage for Le Behemoth Insurance Company is given by an exponential distribution, F(x).
F(x) = 1 - exp(-x/5000).
To improve the profitability of this portfolio of policies, Le Behemoth institutes the following policy modifications:
i) It imposes a per-claim deductible of 500.
ii) It imposes a per-claim limit of 25,000. (The maximum paid per claim is 25,000 - 500 = 24,500.)
Previously, there was no deductible and no limit.
Calculate the average savings per (old) claim if the new deductible and policy limit had been in place.
A. 490  B. 500  C. 510  D. 520  E. 530

18.53 (SOA M, 11/05, Q.26 & 2009 Sample Q.207) (2.5 points) For an insurance:
(i) Losses have density function fX(x) = 0.02x for 0 < x < 10, and fX(x) = 0 elsewhere.
(ii) The insurance has an ordinary deductible of 4 per loss.
(iii) YP is the claim payment per payment random variable.
Calculate E[YP].
(A) 2.9  (B) 3.0  (C) 3.2  (D) 3.3  (E) 3.4

18.54 (SOA M, 11/06, Q.6 & 2009 Sample Q.279) (2.5 points) Loss amounts have the distribution function:
F(x) = (x/100)² for 0 ≤ x ≤ 100, and F(x) = 1 for 100 < x.
An insurance pays 80% of the amount of the loss in excess of an ordinary deductible of 20, subject to a maximum payment of 60 per loss.
Calculate the conditional expected claim payment, given that a payment has been made.
(A) 37  (B) 39  (C) 43  (D) 47  (E) 49
18.55 (CAS5, 5/07, Q.47) (2.0 points) You are given the following information:
Claim    Ground-up Uncensored Loss Amount
A        $250,000
B        $300,000
C        $450,000
D        $750,000
E        $1,200,000
F        $2,500,000
G        $4,000,000
H        $7,500,000
I        $9,000,000
J        $15,000,000
a. (1.25 points) Calculate the ratio of the limited expected value at $5 million to the limited expected value at $1 million.
b. (0.75 points) Calculate the average payment per payment with a deductible of $1 million and a maximum covered loss of $5 million.
Comment: I have reworded this exam question in order to match the syllabus of your exam.
Solutions to Problems:

18.1. D. The payments are: 0, 1000, 7000, 22,000 and 43,000.
Average payment per loss is: (0 + 1000 + 7000 + 22,000 + 43,000)/5 = 14,600.
18.2. C. Average payment per payment is: (1000 + 7000 + 22,000 + 43,000)/4 = 18,250.
18.3. B. The payments are: $2000, $6000, $12,000, $25,000, and $25,000.
Average payment is: (2000 + 6000 + 12,000 + 25,000 + 25,000)/5 = 14,000.
18.4. A. The payments are: 0, 1000, 7000, 20,000 and 20,000.
Average payment per loss is: (0 + 1000 + 7000 + 20,000 + 20,000)/5 = 9,600.
Alternately, E[X ∧ 5000] = (2000 + 5000 + 5000 + 5000 + 5000)/5 = 4400.
E[X ∧ 25000] = (2000 + 6000 + 12,000 + 25,000 + 25,000)/5 = 14,000.
E[X ∧ 25000] - E[X ∧ 5000] = 14,000 - 4400 = 9,600.
Comment: The layer from 5000 to 25,000.
18.5. C. Average payment per payment is: (1000 + 7000 + 20,000 + 20,000)/4 = 12,000.
Alternately, (E[X ∧ 25000] - E[X ∧ 5000])/S(5000) = (14,000 - 4400)/0.8 = 12,000.
18.6. E. E[X] - E[X ∧ 25000] = 41,982 - 16,198 = 25,784.
Comment: Based on a LogNormal distribution with µ = 9.8 and σ = 1.3.
18.7. D. Average payment per payment is: (E[X] - E[X ∧ 25000]) / S(25000) = (41,982 - 16,198) / (1 - 0.599) = 64,299.
18.8. E. E[X ∧ 100000] = 30,668.
18.9. A. E[X ∧ 100000] - E[X ∧ 25000] = 30,668 - 16,198 = 14,470.
Comment: The layer from 25,000 to 100,000.
18.10. E. (E[X ∧ 100000] - E[X ∧ 25000])/S(25000) = (30,668 - 16,198)/(1 - 0.599) = 36,085.
18.11. C. ($2100 + $3300 + $4600 + $6100 + $8900) / 5 = 5000.
18.12. C. ($1100 + $2300 + $3600 + $5100 + $7900) / 5 = 4000.
Alternately, one can subtract 1000 from the solution to the previous question.
18.13. A. ($0 + $1100 + $2300 + $3600 + $5100 + $7900) / 6 = 3333.
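Solutions 18.4 and 18.5 can be reached two equivalent ways: average the capped-and-shifted payments directly, or use empirical limited expected values and E[X ∧ u] - E[X ∧ d]. A small Python sketch of my own showing both routes agree:

```python
# Two routes to 18.4 and 18.5 for the five-loss data with d = 5000, u = 25,000.

losses = [2000, 6000, 12000, 27000, 48000]
d, u = 5000, 25000
n = len(losses)

payments = [max(min(x, u) - d, 0) for x in losses]      # [0, 1000, 7000, 20000, 20000]
per_loss = sum(payments) / n                            # 9,600

def empirical_lev(limit):
    return sum(min(x, limit) for x in losses) / n

per_loss_via_lev = empirical_lev(u) - empirical_lev(d)  # 14,000 - 4,400 = 9,600
s_d = sum(1 for x in losses if x > d) / n               # empirical S(5000) = 0.8
per_payment = per_loss / s_d                            # 12,000

print(per_loss, per_loss_via_lev, per_payment)
```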
18.14. B. ($800 + $2100 + $3300 + $4600 + $5000 + $5000) / 6 = 3467.
18.15. E. ($800 + $2100 + $3300 + $4600) / 4 = 2700.
18.16. D. For this uniform distribution, f(x) = 1/20,000 for 0 ≤ x ≤ 20,000.
The payment by the insurer depends as follows on the size of loss x:
Size of Loss              Insurerʼs Payment
x ≤ 1000                  0
1000 ≤ x ≤ 20,000         x - 1000
We need to compute the average dollars paid by the insurer per loss:
∫_1000^20000 (x - 1000) f(x) dx = ∫_1000^20000 {(x - 1000)/20,000} dx = (0.5){(x - 1000)²/20,000} evaluated from 1000 to 20,000 = 9025.
18.17. B. f(x) = 1/20,000 for 0 ≤ x ≤ 20,000.
Size of Loss              Insurerʼs Payment
x ≤ 15,000                x
15,000 ≤ x ≤ 20,000       15,000
We need to compute the average dollars paid by the insurer per loss, the sum of two terms corresponding to 0 ≤ x ≤ 15,000 and 15,000 ≤ x ≤ 20,000:
∫_0^15000 x f(x) dx + 15,000 {1 - F(15,000)} = ∫_0^15000 (x/20,000) dx + (15,000)(1 - 0.75)
= {x²/40,000} evaluated from 0 to 15,000 + 3750 = 5625 + 3750 = 9375.
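The two integrals above are easy to verify numerically; a quick Python sketch (mine, with a simple midpoint rule) for 18.16 and 18.17:

```python
# Numeric check of 18.16 and 18.17 for the uniform distribution on [0, 20,000].

def integrate(f, a, b, n=200000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

density = 1.0 / 20000.0

# 18.16: deductible of 1000, no limit
per_loss_deductible = integrate(lambda x: (x - 1000) * density, 1000, 20000)       # about 9025

# 18.17: policy limit of 15,000, no deductible
per_loss_limit = integrate(lambda x: x * density, 0, 15000) + 15000 * (1 - 0.75)   # about 9375

print(round(per_loss_deductible), round(per_loss_limit))
```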
18.18. D. For this uniform distribution, f(x) = 1/20,000 for 0 ≤ x ≤ 20,000.
The payment by the insurer depends as follows on the size of loss x:
Size of Loss              Insurerʼs Payment
x ≤ 1000                  0
1000 ≤ x ≤ 15,000         x - 1000
x ≥ 15,000                14,000
We compute the average dollars paid by the insurer per loss as the sum of two terms, corresponding to 1000 ≤ x ≤ 15,000 and 15,000 ≤ x ≤ 20,000:
∫_1000^15000 (x - 1000) f(x) dx + 14,000 {1 - F(15,000)} = ∫_1000^15000 {(x - 1000)/20,000} dx + 14,000 (1 - 0.75)
= {(x - 1000)²/40,000} evaluated from 1000 to 15,000 + 3500 = 4900 + 3500 = 8400.
18.19. C. We need to compute the ratio of two quantities: the average dollars paid by the insurer per loss, and the probability that a loss will result in a non-zero payment. The latter is the chance that x > 1000, which is, for the uniform distribution: 1 - (1000/20,000) = 0.95. The former is the solution to the previous question: 8400.
Therefore, the average non-zero payment is: 8400 / 0.95 = 8842.
Comment: Similar to 4B, 11/95, Q.13.
18.20. B. Average payment per loss = E[X ∧ 2000] - E[X ∧ 100] =
{∫_0^2000 x f(x) dx + 2000 S(2000)} - {∫_0^100 x f(x) dx + 100 S(100)} =
∫_100^2000 x f(x) dx + 2000 S(2000) - 100 S(100) = 400 + (2000)(1 - 0.97) - (100)(1 - 0.20) = 380.
Comment: Can be done via a Lee Diagram. 18.21. C. Policy limit = maximum covered loss - deductible. Thus the maximum covered loss = 51,000. Expected payment per loss = E[X ∧ u] - E[X ∧ d] = E[X ∧ 51,000] - E[X ∧ 1000].
18.22. C. E[X] = 15,900. With a 10,000 deductible, the average amount paid per loss = E[X] - E[X ∧ 10000] = 7800. Therefore, E[X ∧ 10000] = 15,900 - 7800 = 8,100.
With a 10,000 deductible, the average amount paid per nonzero payment = (E[X] - E[X ∧ 10000])/S(10000) = 13,300. Therefore, S(10000) = 7800/13,300 = 0.5865.
Average loss of size between 0 and 10,000 = ∫_0^10000 x f(x) dx / F(10000) = {E[X ∧ 10000] - 10,000 S(10000)}/F(10000) = {8100 - (10,000)(0.5865)}/(1 - 0.5865) = 2235/0.4135 = 5405.
18.23. E[(X - 1000)+] - E[(X - 25,000)+] = {E[X] - E[X ∧ 1000]} - {E[X] - E[X ∧ 25,000]} = E[X ∧ 25,000] - E[X ∧ 1000] = average payment per loss.
Thus, the average payment per loss is: 3500 - 500 = 3000.
18.24. C. The reinsurer pays the dollars in the layer of loss from $1 million to $5 million, which are: E[X ∧ $5 million] - E[X ∧ $1 million]. The number of nonzero payments is 1 - F($1 million) = S($1 million). Thus the average nonzero payment is: (E[X ∧ $5 million] - E[X ∧ $1 million]) / S($1 million) = (390,000 - 300,000)/0.035 = 2,571,429.
18.25. D. The reinsurer pays the dollars in the layer of loss from $1 million to $10 million, which are: E[X ∧ $10 million] - E[X ∧ $1 million]. The number of nonzero payments is 1 - F($1 million) = S($1 million). Thus the average nonzero payment is: (E[X ∧ $10 million] - E[X ∧ $1 million]) / S($1 million) = (425,000 - 300,000)/0.035 = 3,571,429.
18.26. The payment per loss of size x is: 0 for x ≤ d, and x - d for x > d.
Mean payment per loss is: ∫_d^ω (x - d)(1/ω) dx = (ω - d)²/(2ω).
Second moment of the payment per loss is: ∫_d^ω (x - d)²(1/ω) dx = (ω - d)³/(3ω).
Thus, the variance of the payment per loss is: (ω - d)³/(3ω) - (ω - d)⁴/(2ω)² = (ω - d)³(ω + 3d)/(12ω²).
18.27. E. E[X] = E[X ∧ ∞] = 10,000.
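The closed-form variance in 18.26 can be sanity-checked by simulation. A quick Monte Carlo sketch of my own, with illustrative values of ω and d chosen only for the check:

```python
# Monte Carlo check of 18.26: for losses uniform on (0, w) and a deductible d,
# Var[(X - d)+] = (w - d)^3 (w + 3d) / (12 w^2).  The values of w and d are illustrative.

import random

w, d = 10000.0, 2000.0
random.seed(1)

payments = [max(random.uniform(0, w) - d, 0) for _ in range(200000)]
mean = sum(payments) / len(payments)
sim_var = sum((p - mean) ** 2 for p in payments) / len(payments)

formula_var = (w - d) ** 3 * (w + 3 * d) / (12 * w ** 2)

print(sim_var, formula_var)   # the two should agree closely
```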
18.28. B. {E[X ∧ 25000] - 25,000 S(25000)} / F(25000) = {8025 - (25,000)(0.088)} / (1 - 0.088) = 6387.
18.29. A. e(5000) = {E[X] - E[X ∧ 5000]}/S(5000) = (10,000 - 3600)/0.512 = 12,500.
18.30. C. E[X ∧ 25000] = 8025.
18.31. E. The small losses are each recorded at 5000. Subtracting 5000 from every recorded loss, we would get the layer from 5000 to ∞. Thus the average loss is: (E[X] - E[X ∧ 5000]) + 5000 = (10,000 - 3600) + 5000 = 11,400.
18.32. C. E[(X - 5000)+] = E[X] - E[X ∧ 5000] = 10,000 - 3600 = 6,400.
18.33. B. This is the average size of losses in the interval from 5000 to 25,000:
({E[X ∧ 25000] - 25,000 S(25000)} - {E[X ∧ 5000] - 5000 S(5000)}) / {F(25000) - F(5000)} = {(8025 - (25,000)(0.088)) - (3600 - (5000)(0.512))}/(0.512 - 0.088) = (5825 - 1040)/0.424 = 11,285.
18.34. C. {E[X ∧ 25000] - E[X ∧ 5000]}/S(5000) + 5000 = (8025 - 3600)/0.512 + 5000 = 13,643.
18.35. D. The losses are 5000 per loss plus the layer of losses from 5000 to 25,000. Thus the average loss is: (E[X ∧ 25000] - E[X ∧ 5000]) + 5000 = (8025 - 3600) + 5000 = 9425.
Alternately, the average size of loss is reduced, compared to data just censored from below, by E[X] - E[X ∧ 25000] = 10,000 - 8025 = 1975. Since from a previous solution the average size of data censored from below at 5,000 is 11,400, the solution to this question is: 11,400 - 1975 = 9425.
18.36. C. Using a previous solution, where there was no shifting, this is 5000 less than the average size of data truncated from below at 5,000 and truncated from above at 25,000: 11,285 - 5000 = 6,285.
Alternately, the dollars of loss are those for the layer from 5,000 to 25,000, less the width of the layer times the number of losses greater than 25,000: {E[X ∧ 25000] - E[X ∧ 5000]} - (20,000) S(25000). The number of losses in the database is: F(25000) - F(5000) = S(5000) - S(25000). Thus the average size is: {(8025 - 3600) - (20,000)(0.088)} / (0.512 - 0.088) = 6,285.
18.37. A. {E[X ∧ 25000] - E[X ∧ 5000]}/S(5000) = (8025 - 3600)/0.512 = 8,643.
18.38. B. e(5000) + 5000 = {E[X] - E[X ∧ 5000]}/S(5000) + 5000 = (10,000 - 3600)/0.512 + 5000 = 17,500.
18.39. A. E[X | X > 100] = ∫_100^∞ x f(x) dx / S(100). ⇒ ∫_100^∞ x f(x) dx = S(100) E[X | X > 100] = (0.65)(345) = 224.25.
With a deductible of 100 per loss, the average payment per loss is:
∫_100^∞ (x - 100) f(x) dx = ∫_100^∞ x f(x) dx - 100 ∫_100^∞ f(x) dx = 224.25 - (100)(0.65) = 159.25.
Alternately, average payment per loss = S(100) (average payment per payment) = S(100) E[X - 100 | X > 100] = S(100) {E[X | X > 100] - 100} = (0.65)(345 - 100) = 159.25.
18.40. A. Average ground-up loss = E[X] = 27,000.
Under policy A, average amount paid per loss = E[X] - E[X ∧ 25000] = 10,000. Therefore, E[X ∧ 25000] = 27,000 - 10,000 = 17,000.
Under policy A, average amount paid per payment = (E[X] - E[X ∧ 25000])/S(25000) = 22,000. Therefore, S(25000) = 10,000/22,000 = 0.4545.
Given that a loss less than or equal to 25,000 has occurred, the expected payment under policy B = average loss of size between 0 and 25,000 =
∫_0^25000 x f(x) dx / F(25000) = {E[X ∧ 25000] - 25,000 S(25000)}/F(25000) = {17,000 - (25,000)(0.4545)} / (1 - 0.4545) = 10,335.
Comment: Similar to 4, 5/00, Q.6.
18.41. E. E[(X - 1000)+] = e(1000) S(1000) = (30,000)(1 - 0.727) = 8190.
E[(X - 25,000)+] = e(25,000) S(25,000) = (980,000)(1 - 0.997) = 2940.
The average payment per loss is: E[X ∧ 25,000] - E[X ∧ 1000] = E[(X - 1000)+] - E[(X - 25,000)+] = 8190 - 2940 = 5250.
Loss Distributions, §18 Average Sizes
HCM 10/8/12,
Page 205
18.42. B. Let X be the size of loss. If for example x = 11,000, then the payment is: Min[10,500, 10,000] = 10,000. If for example x = 10,200, then the payment is: Min[9700, 10,000] = 9700. If for example x = 7000, then the payment is: Min[6500, 10,000] = 6500. If for example x = 300, then the payment is: Min[0, 10,000] = 0. ⎧ 10,000, x ≥ 10,500 ⎪ payment = ⎨x - 500, 500 < x < 10,500 . ⎪ 0, x ≤ 500 ⎩ Thus the average payment per loss is: 10,500
∫ 500
(x - 500) f(x) dx + 10,000 S(10,500) =
10,500
∫ 500
10,500
x f(x) dx - 500
∫ 500
f(x) dx + 10,000 S(10,500) =
E[X ∧ 10,500] - 10,500 S(10,500) - {E[X ∧ 500] - 500 S(500)} - 500{F(10,500) - F(500)} + 10,000 S(10,500) = E[X ∧ 10,500] - 500 S(10,500) - E[X ∧ 500] + 500 S(500) - 500 F(10,500) + 500 F(500) = E[X ∧ 10,500] - E[X ∧ 500] + 500{F(500) + S(500) - F(10,500) - S(10,500)} = E[X ∧ 10,500] - E[X ∧ 500] + (500)(1 - 1) = E[X ∧ 10,500] - E[X ∧ 500]. Alternately, 10,000 = maximum payment = Policy limit = maximum covered loss - deductible. Thus the maximum covered loss = 10,500. Expected payment per loss = E[X ∧ u] - E[X ∧ d] = E[X ∧ 10,500] - E[X ∧ 500]. Comment: As mentioned in the section on Policy Provisions, the default on the exam is to apply the maximum covered loss first and then apply the deductible. What is done in this question is mathematically the same as first applying a maximum covered loss of 10,500 and then applying a deductible of 500. 18.43. C. For an accident that does not exceed $500 the insurer pays nothing. For an accident of size x > 500, the insurer pays x - 500. The density function for x is f(x) = 1/5000 for 0≤x≤5000. Thus the insurerʼs average payment per accident is: 5000
5000
∫ (x-500) f(x) dx = ∫ (x-500) (1/5000) dx = x = 500
x = 500
5000
(x-500)2 (1/10000)
] = 2025.
x = 500
2013-4-2,
Loss Distributions, §18 Average Sizes
HCM 10/8/12,
Page 206
18.44. E. ∞
Expected amount paid per loss =
∞
100
∫100 x f(x) dx = ∫0 x f(x) dx - ∫0 x f(x) dx =
Mean - {E[X ∧ 100] - 100S(100)}. S(100) = {θ/(θ+100)}2 = (1000/1100)2 = 0.8264. E[X ∧ 100] = {θ/(α−1)}{1−(θ/(θ+100))α−1} = {1000/(2-1)} { 1- (1000/1100)2-1} = 90.90. Mean = θ/(α−1) = 1000. Therefore, expected amount paid per loss is: 1000 - {90.90 - 82.64} = 991.74. Expect 10 losses per year, so the average cost per year is: (10)(991.7) = $9917. Alternately the expected cost per year of 10 losses is: ∞
10
∞
∫100 x f(x) dx = (10)(2)(10002) 100∫ x (1000 + x)- 3 dx =
107 {-x
x =∞ (1000 + x)- 2 }
]
x = 100
∞
+ 107
∫100 (1000 + x)- 2 dx = 107 (100/11002 + 1/1100} = 9917.
Alternately, the average severity per loss > $ 100 is: 100 + e(100) = 100 + (θ+100)/(α -1) = 1100 + 100 = $1200. Expected number of losses > $100 = 10S(100) = 8.2645. Expected annual payment = $1200(8.2645) = $9917. Comment: Almost all questions involve the ordinary deductible, in which for a loss X larger than d, X - d is paid. For these situations the average payment per loss is: E[X] - E[X ∧ d]. Instead, here for a large loss the whole amount is paid. This is a franchise deductible, as discussed in the section on Policy Provisions. In this case, the average payment per loss is d S(d) more than for the ordinary deductible or: E[X] - E[X ∧ d] + d S(d). One can either compute the expected total amount paid per year by an insurer either as (average payment insured receives per loss)(expected losses the insured has per year) or as (average payment insurer makes per non-zero payment)(expected non-zero payments the insurer makes per year). The former is ($991.7)(10) = $9917; the latter is ($1200)(8.2645) = $9917. Thus whether one looks at it from the point of view of the insurer or the insured, one gets the same result.
2013-4-2,
Loss Distributions, §18 Average Sizes
HCM 10/8/12,
Page 207
18.45. B. For this uniform distribution, f(x) = 1/50000 for 0≤x≤50000. The payment by the insurer depends as follows on the size of loss x: Insurerʼs Payment 0 x ≤ 5000 x - 5000 5000 ≤ x ≤ 25000 20000 x ≥ 25000 We need to compute the ratio of two quantities, the average dollars paid by the insurer per loss and the probability that a loss will result in a nonzero payment. The latter is the chance that x > 5000, which is: 1 - (5000/50000) = 0.9. The former is the sum of two terms corresponding to 5000 ≤ x ≤ 25000 and x > 25000: 25000
∫ (x-5000)f(x) dx
25000
+ 20000{1 - F(25000)} = ∫ {(x-5000)/50000} dx + 20000{1 - .5} =
x = 5000
x = 5000 x = 25000
{(x-5000)2 /100000} ]
+ 10000 = 4000 + 10000 = 14000.
x = 5000
Thus the average nonzero payment by the insurer is: 14000 / 0.9 = 15,556. Alternately, S(x) = 1 - x/50000, x < 50000. The average payment per (nonzero) payment is: (E[X ∧ L] - E[X ∧ d])/S(d) = (E[X ∧ 25000] - E[X ∧ 5000])/S(5000) = 25000
25000
∫ S(x) dx / S(5000) = ∫ {1 - x/50000} dx / 0.9 = (20000 - 6250 + 250)/0.9 = 15,556. x = 5000
x = 5000
18.46. For the layer from $100,000 to $200,000, the expected number of payments is: 100 S(100,000) = 39.7. The expected losses are: (100) (E[X ∧ 200,000] - E[X ∧ 100,000]) = $3,145,400. The average payment per payment in the layer is: 3,145,400/39.7 = $79,229. For the layer from $200,000 to $500,000, the expected number of payments is: 100 S(200,000) = 25.2. The expected losses are: (100) (E[X ∧ 500,000] - E[X ∧ 200,000]) = $5,007,000. The average payment per payment in the layer is: 5,007,000/25.2 = $198,690. For the layer from $500,000 to $1,000,000, the expected number of payments is: 100 S(500,000) = 11.5. The expected losses are: (100) (E[X ∧ 1,000,000] - E[X ∧ 500,000]) = $3,990,300. The average payment per payment in the layer is: 5,007,000/11.5 = $346,983.
2013-4-2,
Loss Distributions, §18 Average Sizes
HCM 10/8/12,
Page 208
18.47. C. The insurer pays the dollars of loss excess of $15,000, which are: E[X] - E[X ∧ 15000] = E[X ∧ ∞] - E[X ∧ 15000]. The number of non-zero payments is 1 - F(15000). Thus the average nonzero payment is: (E[X] - E[X ∧ 15000])/ (1 - F(15000)) = (20,000 - 7700)/(1-.7) = 12300 / 0.3 = 41,000. 18.48. E. Since 40% of the losses exceed a deductible of 10,000 and half of 40% is 20%, the new deductible is 22,500 which is exceeded by 20% of the losses. In other words, S(22,500) = 20% = 40%/ 2 = S(10,000) / 2. For a deductible of size d, the expected size of a nonzero payment made by the insurer is (E[X] E[X ∧ d])/{ 1 - F(d) } = e(d) = the mean excess loss at d. e(10,000) = (20,000 - 6000) / (1-.6) = 35,000. e(22,500) = (20,000 - 9500) / (1-.8) = 52,500. 52,500 / 35,000 = 1.5 or a 50% increase. Comment: One can do the problem without using the specific numbers in the Loss Size column.
2013-4-2,
Loss Distributions, §18 Average Sizes
HCM 10/8/12,
Page 209
18.49. For this density, the survival function is:
S(x) = ∫_x^∞ 0.01(1 - q + 0.01qt) e^(-0.01t) dt = [-e^(-0.01t) - 0.01qt e^(-0.01t)] evaluated from t = x to ∞ = e^(-0.01x) + 0.01qx e^(-0.01x).
We will also need integrals of t f(t):
∫_x^∞ t f(t) dt = 0.01 ∫_x^∞ {(1-q) t e^(-0.01t) + 0.01q t^2 e^(-0.01t)} dt
= [-0.01 e^(-0.01t) {(1-q)(100t + 100^2) + q(t^2 + (2)(100)t + (2)100^2)}] evaluated from t = x to ∞
= e^(-0.01x) {(1-q)(x + 100) + q(0.01x^2 + 2x + 200)} = e^(-0.01x) {(x + 100) + q(0.01x^2 + x + 100)}.
First, given q, calculate the average value of a non-zero payment, given a deductible of 100. We need to compute:
∫_100^∞ (t - 100) f(t) dt / S(100) = ∫_100^∞ t f(t) dt / S(100) - 100
= {e^(-1) (200 + 300q)} / {(1 + q) e^(-1)} - 100 = (200 + 300q)/(1 + q) - 100.
Setting this equal to 125, one can solve for q: (200 + 300q)/(1 + q) - 100 = 125. 225(1 + q) = 200 + 300q. ⇒ q = 25/75 = 1/3.
Now the average non-zero payment, given a deductible of 200, is:
∫_200^∞ t f(t) dt / S(200) - 200 = {e^(-2) (300 + 700/3)} / {(1 + 2/3) e^(-2)} - 200 = (1600/3)/(5/3) - 200 = 320 - 200 = 120.
Alternately, the given density is a mixture of an Exponential with θ = 100, given weight 1 - q, and a Gamma Distribution with parameters α = 2 and θ = 100, given weight q.
The mean for the Exponential Distribution is 100. The mean for this Gamma Distribution is (2)(100) = 200.
Thus, the mean for the mixed distribution is: (1-q)(100) + 200q = 100 + 100q.
For this Exponential Distribution, E[X ∧ x] = 100(1 - e^(-0.01x)).
For this Gamma Distribution, E[X ∧ x] = 200 Γ(3; 0.01x) + x {1 - Γ(2; 0.01x)}.
Making use of Theorem A.1 in Appendix A of Loss Models, Γ(3; 0.01x) = 1 - e^(-0.01x){1 + 0.01x + 0.00005x^2} and
Γ(2; 0.01x) = 1 - e^(-0.01x){1 + 0.01x}.
Therefore, for this Gamma Distribution, E[X ∧ x] = 200 - e^(-0.01x){200 + 2x + 0.01x^2} + e^(-0.01x){x + 0.01x^2} = 200 - e^(-0.01x)(200 + x).
Thus, for the mixed distribution, E[X ∧ x] = q{200 - e^(-0.01x)(200 + x)} + (1-q)100(1 - e^(-0.01x)) = 100(1 - e^(-0.01x)) + q(100 - 100e^(-0.01x) - xe^(-0.01x)).
For this Exponential Distribution, S(x) = e^(-0.01x). For this Gamma Distribution, S(x) = 1 - Γ(2; 0.01x) = e^(-0.01x){1 + 0.01x}.
Thus for the mixed distribution, S(x) = q{e^(-0.01x)(1 + 0.01x)} + (1-q)e^(-0.01x) = e^(-0.01x) + 0.01qx e^(-0.01x).
The expected non-zero payment given a deductible of size x is:
(E[X] - E[X ∧ x])/S(x) = {100e^(-0.01x) + q(100+x)e^(-0.01x)} / {e^(-0.01x) + 0.01qx e^(-0.01x)} = {100 + q(100+x)}/{1 + 0.01xq}.
Thus for a deductible of 100, the average non-zero payment is: (100 + 200q)/(1 + q).
Setting this equal to 125 and solving for q: 125 = (100 + 200q)/(1 + q). ⇒ q = 25/75 = 1/3.
Thus for a deductible of 200, the average non-zero payment is: (100 + 300/3)/(1 + 2/3) = 200/(5/3) = 120.
18.50. D. Average ground-up loss = E[X] = 11,100.
Under policy A, average amount paid per loss = E[X] - E[X ∧ 5000] = 6500. Therefore, E[X ∧ 5000] = 11100 - 6500 = 4600.
Under policy A, average amount paid per payment = (E[X] - E[X ∧ 5000])/S(5000) = 10000. Therefore, S(5000) = 6500/10000 = 0.65.
Given that a loss less than or equal to 5,000 has occurred, the expected payment under policy B = average loss of size between 0 and 5000 =
∫_0^5000 x f(x) dx / F(5000) = {E[X ∧ 5000] - 5000 S(5000)}/F(5000) = {4600 - (5000)(0.65)}/(1 - 0.65) = 1350/0.35 = 3857.
Comment: F, S, f, the mean, and the Limited Expected Value, are all for the ground-up unlimited losses of the jewelry store, whether or not it has insurance.
18.51. E. Under policy A, with an ordinary deductible of 5,000 and no maximum covered loss, the expected amount paid per loss is: E[X] - E[X ∧ 5000] = 6,500.
Under policy A, the expected amount paid per payment is: (E[X] - E[X ∧ 5000])/S(5000) = 10,000. Therefore, S(5000) = 6500/10000 = 0.65.
Given that a loss has occurred, the payment under policy B, with no deductible and a policy limit of 5,000, is 5,000 if and only if the original loss is 5000 or more. The probability of this is S(5000) = 0.65.
18.52. C. An Exponential Distribution with θ = 5000. E[X ∧ x] = θ(1 - e^(-x/θ)) = 5000(1 - e^(-x/5000)).
E[X ∧ 500] = 5000(1 - e^(-0.1)) = 475.8. E[X ∧ 25000] = 5000(1 - e^(-5)) = 4966.3. E[X] = θ = 5000.
Average payment per loss before: E[X] = 5000. Average payment per loss after: E[X ∧ 25000] - E[X ∧ 500] = 4966.3 - 475.8 = 4490.5.
Average savings per loss: 5000 - 4490.5 = 509.5.
18.53. E. By integrating f(x), F(x) = 0.01x^2, 0 < x < 10. S(4) = 1 - (0.01)(4^2) = 0.84.
E[X] = ∫_0^10 S(x) dx = ∫_0^10 (1 - 0.01x^2) dx = [x - 0.01x^3/3] from x = 0 to x = 10 = 6.667.
E[X ∧ 4] = ∫_0^4 S(x) dx = ∫_0^4 (1 - 0.01x^2) dx = [x - 0.01x^3/3] from x = 0 to x = 4 = 3.787.
E[YP] = (E[X] - E[X ∧ 4])/S(4) = (6.667 - 3.787)/0.84 = 3.43.
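A short numeric check of 18.53, assuming Python with NumPy (not something the study guide itself uses); it reproduces E[X], E[X ∧ 4], and E[YP] by numerical integration of the survival function.

```python
# Numeric check of 18.53: F(x) = 0.01 x^2 on (0, 10), deductible of 4.
import numpy as np

def S(x):
    return 1.0 - 0.01 * x**2

xs_full = np.linspace(0.0, 10.0, 100001)
xs_ded = np.linspace(0.0, 4.0, 100001)

mean = np.trapz(S(xs_full), xs_full)      # E[X] = integral of S(x) from 0 to 10
lim_ev_4 = np.trapz(S(xs_ded), xs_ded)    # E[X ^ 4] = integral of S(x) from 0 to 4
e_yp = (mean - lim_ev_4) / S(4.0)         # average payment per (non-zero) payment

print(round(mean, 3), round(lim_ev_4, 3), round(e_yp, 2))  # about 6.667, 3.787, 3.43
```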
18.54. B. Let m be the maximum covered loss. 60 = 0.8(m - 20). ⇒ m = 95. The insurance pays 80% of the layer from 20 to 95.
Expected payment per loss is: 0.8 ∫_20^95 S(x) dx = 0.8 ∫_20^95 (1 - x^2/10000) dx = 0.8(75 - 28.31) = 37.35.
Expected payment per payment is: 37.35/S(20) = 37.35/(1 - 0.2^2) = 38.91.
Alternately, f(x) = x/5000, 0 ≤ x ≤ 100.
E[X ∧ 95] = ∫_0^95 x f(x) dx + 95 S(95) = 57.16 + (95)(1 - 0.95^2) = 66.42.
E[X ∧ 20] = ∫_0^20 x f(x) dx + 20 S(20) = 0.533 + (20)(1 - 0.20^2) = 19.73.
E[YP] = 0.8 {E[X ∧ 95] - E[X ∧ 20]} / S(20) = 0.8 (66.42 - 19.73) / 0.96 = 38.91.
18.55. a. E[X ∧ 1 million] = 7,750,000/10 = $775,000. E[X ∧ 5 million] = 24,450,000/10 = $2,445,000.
$2,445,000/$775,000 = 3.155.
Claim   Loss          Limited to 1 million   Limited to 5 million
A       $250,000      $250,000               $250,000
B       $300,000      $300,000               $300,000
C       $450,000      $450,000               $450,000
D       $750,000      $750,000               $750,000
E       $1,200,000    $1,000,000             $1,200,000
F       $2,500,000    $1,000,000             $2,500,000
G       $4,000,000    $1,000,000             $4,000,000
H       $7,500,000    $1,000,000             $5,000,000
I       $9,000,000    $1,000,000             $5,000,000
J       $15,000,000   $1,000,000             $5,000,000
Sum     $40,950,000   $7,750,000             $24,450,000
b. With a deductible of $1 million there are 6 non-zero payments out of 10 losses. Average payment per payment is: ($2,445,000 - $775,000)/0.6 = $2,783,333. Alternately, the six non-zero payments are in millions: 0.2, 1.5, 3, 4, 4, 4. (0.2 + 1.5 + 3 + 4 + 4 + 4)/6 = 16.7 million/6 = $2,783,333. Comment: The solution to part a is one way to determine the $5 million increased limit factor for a basic limit of $1 million.
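The table lends itself to a quick computational check. The sketch below (plain Python, an assumption of this write-up rather than part of the guide) reproduces both the increased limit factor of part (a) and the average payment per payment of part (b) from the ten listed losses.

```python
# Reproducing the arithmetic of 18.55 from the ten listed losses.
losses = [250_000, 300_000, 450_000, 750_000, 1_200_000,
          2_500_000, 4_000_000, 7_500_000, 9_000_000, 15_000_000]

def limited_mean(data, limit):
    return sum(min(x, limit) for x in data) / len(data)

e_1m = limited_mean(losses, 1_000_000)   # 775,000
e_5m = limited_mean(losses, 5_000_000)   # 2,445,000
ilf = e_5m / e_1m                        # about 3.155

# Part b: average payment per (non-zero) payment with a $1 million deductible
# and a $5 million maximum covered loss.
payments = [min(x, 5_000_000) - 1_000_000 for x in losses if x > 1_000_000]
avg_per_payment = sum(payments) / len(payments)   # about 2,783,333

print(round(ilf, 3), round(avg_per_payment))
```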
Section 19, Percentiles
The 80th percentile is the place where the distribution function is 0.80. For a continuous distribution, the 100pth percentile is the first value at which F(x) = p.
Exercise: Let F(x) = 1 - e^(-x/10). Find the 75th percentile of this distribution.
[Solution: 0.75 = 1 - e^(-x/10). ⇒ x = -10 ln(1 - 0.75) = 13.86. Comment: Check: 1 - e^(-13.86/10) = 1 - 0.250 = 0.75.]
The Value at Risk, VaRp, is defined as the 100pth percentile.77 In Appendix A of the Tables attached to the exam, there are formulas for VaRp(X) for many of the distributions: Exponential, Pareto, Single Parameter Pareto, Weibull, Loglogistic, Inverse Pareto, Inverse Weibull, Burr, Inverse Burr, Inverse Exponential, Paralogistic, Inverse Paralogistic. One can use these formulas for VaRp in order to determine percentiles. For example, for the Exponential Distribution as shown in Appendix A: VaRp(X) = -θ ln(1-p). Thus in the previous exercise, VaR0.75 = (-10) ln(0.25) = 13.86.
The 50th percentile is the median, the place where the distribution function is 0.50. The 25th percentile is the lower or first quartile. The 50th percentile is the middle or second quartile. The 75th percentile is the upper or third quartile.78
Exercise: What is the 90th percentile of a Weibull Distribution with parameters τ = 3 and θ = 1000?
[Solution: F(x) = 1 - exp[-(x/θ)^τ] = 1 - exp[-(x/1000)^3]. Set F(x) = 0.90 and solve for x. 0.90 = 1 - exp[-(x/1000)^3]. -ln[0.10] = (x/1000)^3. x = 1000 ln[10]^(1/3) = 1321.
Alternately, as shown in Appendix A: VaRp(X) = θ {-ln(1-p)}^(1/τ). VaR0.90 = (1000) {-ln(0.1)}^(1/3) = 1321.]
77 As discussed in “Mahlerʼs Guide to Risk Measures.”
78 The difference between the 75th and 25th percentiles is called the interquartile range.
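As a sketch, the VaR formulas quoted above translate directly into code; the Python helper functions below are illustrative (the function names are not from the Tables), and they reproduce the two exercises.

```python
from math import log

# Percentiles via the VaR formulas quoted above (sketch).
def var_exponential(p, theta):
    return -theta * log(1.0 - p)

def var_weibull(p, theta, tau):
    return theta * (-log(1.0 - p)) ** (1.0 / tau)

def var_pareto(p, theta, alpha):
    return theta * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

print(round(var_exponential(0.75, 10), 2))   # 13.86, as in the first exercise
print(round(var_weibull(0.90, 1000, 3)))     # 1321, as in the second exercise
```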
Percentiles of Discrete Distributions:
A more precise mathematical definition also covers situations other than the continuous loss distributions:79 The 100pth percentile of a distribution F(x) is any number πp such that: F(πp-) ≤ p ≤ F(πp), where F(y-) is the limit of F(x) as x approaches y from below.
Exercise: Let a distribution be such that there is a 30% chance of a loss of $100, a 50% chance of a loss of $200, and a 20% chance of a loss of $500. Determine the 70th and 80th percentiles of this distribution.
[Solution: F(100) = 0.3, F(200) = 0.8. Since F(x) = 0.3 for 100 ≤ x < 200, F(200-) = 0.3. Thus, F(200-) ≤ 0.7 ≤ F(200), so that π0.70 = 200. 200 is the first value at which F(x) > 0.7. F(x) = 0.8 for 200 ≤ x < 500, so that 200 ≤ π0.80 ≤ 500.
Comment: Since there is a value at which F(x) = 0.8, there is no unique value of the 80th percentile for this discrete distribution. For example, F(200-) = 0.3 ≤ 0.8 = F(200), F(300-) = 0.8 = F(300), and F(500-) = 0.8 ≤ 1.0 = F(500). Thus each of 200, 300 and 500 satisfies the definition of the 80th percentile. In this case I would use 200 as the 80th percentile.]
For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p.
Quantiles:
The 95th percentile is also referred to as Q0.95, the 95% quantile.
25th percentile ⇔ Q0.25 ⇔ 25% quantile ⇔ first quartile.
50th percentile ⇔ Q0.50 ⇔ 50% quantile ⇔ median.
75th percentile ⇔ Q0.75 ⇔ 75% quantile ⇔ third quartile.
90th percentile ⇔ Q0.90 ⇔ 90% quantile.
99th percentile ⇔ Q0.99 ⇔ 99% quantile.
79 Definition 3.7 at page 34 of Loss Models.
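A minimal sketch of the convention "take the 100pth percentile as the first value at which F(x) ≥ p" for a discrete distribution, applied to the exercise above; the function name is hypothetical.

```python
# Sketch: percentile of a discrete distribution, using the convention that the
# 100p-th percentile is the first value at which F(x) >= p.
def discrete_percentile(values_probs, p):
    cumulative = 0.0
    for value, prob in sorted(values_probs):
        cumulative += prob
        if cumulative >= p:
            return value
    return None

dist = [(100, 0.30), (200, 0.50), (500, 0.20)]
print(discrete_percentile(dist, 0.70))  # 200
print(discrete_percentile(dist, 0.80))  # 200 (one of several values satisfying the definition)
```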
Problems:
19.1 (1 point) What is the 90th percentile of a Pareto Distribution with parameters α = 3 and θ = 100?
A. less than 120
B. at least 120 but less than 125
C. at least 125 but less than 130
D. at least 130 but less than 135
E. at least 135
19.2 (1 point) Severity is Exponential. What is the ratio of the 95th percentile to the median?
A. 4.1 B. 4.3 C. 4.5 D. 4.7 E. 4.9
19.3 (1 point) What is the 80th percentile of a Weibull Distribution with parameters τ = 2 and θ = 100?
A. less than 120
B. at least 120 but less than 125
C. at least 125 but less than 130
D. at least 130 but less than 135
E. at least 135
19.4 (2, 5/85, Q.4) (1.5 points) Let the continuous random variable X have the density function f(x) as shown in the figure below:
[Figure: the density f(x) rises linearly from the point (0, 0) to the point (a, 1/4).]
What is the 25th percentile of the distribution of X?
A. 2 B. 4 C. 8 D. 16 E. 32
19.5 (160, 11/86, Q.7) (2.1 points) You are given:
(i) F(x) = 1 - (θ/x)^a where a > 0, θ > 0, x > θ.
(ii) The 90th percentile is (2)(10^(1/4)).
(iii) The 99th percentile is (2)(100^(1/4)).
Determine the median of the distribution.
(A) 2^(1/4)   (B) 2(2^(1/4))   (C) √2   (D) 2√2   (E) 4√2
19.6 (2, 5/92, Q.10) (1.7 points) Let Y be a continuous random variable with cumulative distribution function F(y) = 1 - exp[-(y - a)^2/2], y > a, where a is a constant.
What is the 75th percentile of Y?
A. F(0.75)   B. a - √(2 ln(4/3))   C. a + √(2 ln(4/3))   D. a - 2√(ln(2))   E. a + 2√(ln(2))
19.7 (IOA 101, 9/01, Q.1) (1.5 points) Data were collected on 100 consecutive days for the number of claims, x, arising from a group of policies. This resulted in the following frequency distribution:
x: 0, 1, 2, 3, 4, ≥5
f: 14, 25, 26, 18, 12, 5
Calculate the median, 25th percentile, and 75th percentile for these data.
Solutions to Problems:
19.1. A. F(x) = 1 - {100/(100+x)}^3. Set F(x) = 0.90 and solve for x. 0.90 = 1 - {100/(100+x)}^3. ⇒ x = 100 {(1 - 0.9)^(-1/3) - 1} = 115.4.
Alternately, for the Pareto Distribution: VaRp(X) = θ [(1-p)^(-1/α) - 1]. VaR0.9 = (100) {(0.1)^(-1/3) - 1} = 115.4.
Comment: Check. F(115.4) = 1 - {100/(100 + 115.4)}^3 = 0.900.
19.2. B. F(x) = 1 - e^(-x/θ). Set F(x) = 0.5 to find the median. median = -θ ln(1 - 0.5) = 0.693θ.
Set F(x) = 0.95 to find the 95th percentile. 95th percentile = -θ ln(1 - 0.95) = 2.996θ.
Ratio of the 95th percentile to the median = 2.996θ/0.693θ = 4.32.
Alternately, for the Exponential Distribution as shown in Appendix A: VaRp(X) = -θ ln(1-p). VaR0.95 / VaR0.5 = ln(1 - 0.95)/ln(1 - 0.5) = 4.32.
19.3. C. F(x) = 1 - exp[-(x/100)^2]. Set F(x) = 0.80 and solve for x. 0.80 = 1 - exp[-(x/100)^2]. ⇒ ln(0.2) = -(x/100)^2. ⇒ x = 100 {-ln(0.2)}^(1/2) = 126.9.
Alternately, as shown in Appendix A: VaRp(X) = θ {-ln(1-p)}^(1/τ). VaR0.80 = (100) {-ln(0.2)}^(1/2) = 126.9.
Comment: Check. F(126.9) = 1 - exp[-(126.9/100)^2] = 0.800.
19.4. B. The area under the density must be 1. (1/2)(a)(1/4) = 1. ⇒ a = 8.
The 25th percentile is where F(x) = 0.25, or where 1/4 of the area below the density is to the left. This occurs at: a/2 = 8/2 = 4.
19.5. B. 0.90 = F[2(10^(1/4))] = 1 - (θ/{2(10^(1/4))})^a. ⇒ -ln10 = a lnθ - a ln2 - (a/4) ln10.
0.99 = F[2(100^(1/4))] = 1 - (θ/{2(100^(1/4))})^a. ⇒ -2 ln10 = a lnθ - a ln2 - (a/2) ln10.
Subtracting the two equations: ln10 = (a/4) ln(10). ⇒ a = 4. ⇒ θ = 2.
0.5 = F(x) = 1 - (2/x)^4. ⇒ x = 2(2^(1/4)).
Alternately, this is a Single Parameter Pareto Distribution. As shown in Appendix A: VaRp(X) = θ (1 - p)^(-1/α).
Therefore, (2)(10^(1/4)) = θ (0.1)^(-1/a), and (2)(100^(1/4)) = θ (0.01)^(-1/a).
Dividing the second equation by the first equation: 10^(1/4) = (0.1)^(-1/a). ⇒ a = 4. ⇒ θ = 2.
19.6. E. Set F(x) = 0.75. exp[-(y - a)^2/2] = 0.25. ⇒ -(y - a)^2/2 = -ln(4). ⇒ (y - a)^2 = 4 ln(2). ⇒ y = a + 2√(ln(2)).
19.7. For a discrete distribution, the median is the first place the Distribution Function is at least 50%.
x: 0, 1, 2, 3, 4
F: 0.14, 0.39, 0.65, 0.83, 0.95
Thus the median is 2. Similarly, the 25th percentile is 1 and the 75th percentile is 3.
Section 20, Definitions
Those definitions that are not discussed elsewhere are included here.
Cumulative Distribution Function and Survival Function:80
Cumulative Distribution Function of X ⇔ cdf of X ⇔ Distribution Function of X ⇔ F(x) = Prob[X ≤ x].
The distribution function is defined on the real line and satisfies:
1. 0 ≤ F(x) ≤ 1.
2. F(x) is nondecreasing; F(x) ≤ F(y) for x < y.
3. F(x) is right continuous; the limit of F(x + ε) as ε → 0+ is F(x).
4. F(-∞) = 0 and F(∞) = 1; the limit of F(x) as x → -∞ is 0, and the limit of F(x) as x → ∞ is 1.
Most theoretical size of loss distributions, such as the Exponential, F(x) = 1 - e^(-x/θ), are continuous, increasing functions, with F(0) = 0.
The survival function, S(x) = 1 - F(x) = Prob[X > x]. 0 ≤ S(x) ≤ 1. S(x) is nonincreasing. S(x) is right continuous, since F(x) is. S(-∞) = 1 and S(∞) = 0.
For the Exponential, S(x) = e^(-x/θ) is a continuous, decreasing function, with S(0) = 1.
Discrete, Continuous, and Mixed Random Variables:81
Discrete Random Variable ⇔ support is finite or countable
Continuous Random Variable ⇔ support is an interval or a union of intervals
Mixed Random Variable82 ⇔ combination of discrete and continuous random variables
Examples of Discrete Random Variables: Frequency Distributions, Made-up Loss Distributions such as 70% chance of 100 and 30% chance of 500.
80 See Definitions 2.1 and 2.4 in Loss Models.
81 See Definition 2.3 in Loss Models. The support is the set of input values for the distribution; for a loss distribution it is the set of possible sizes of loss. The set of integers is countable. The set of real numbers is not countable.
82 This is a different use of the term mixed than for n-point mixtures of Loss Distributions; in an n-point mixture one weights together n individual distributions in order to create a new distribution.
Examples of Continuous Random Variables: Loss Distributions in Appendix A of Loss Models such as the Exponential, Pareto, Gamma, LogNormal, Weibull, etc. Examples of Mixed Random Variables: Loss Distributions censored from above (censored from the right) by a maximum covered loss; there is a point mass of probability at the maximum covered loss. Probability Function ⇔ Probability Mass Function ⇔ probability density function for a discrete distribution or a point mass of probability for a mixed random variable ⇔ p X(x) = Prob[X = x].83 Loss Events, etc.: A loss event or claim is an incident in which someone suffers damages which result in an economic loss. For example, an insured driver may damage his car in an accident, an insured business may have its factory damaged by fire, someone with health insurance may enter the hospital, etc. Each of these is a loss event. On this exam, do not distinguish between the illness or accident that caused the insured to enter the hospital, the entry into the hospital or the bill from the hospital; each or all of these combined could be considered the loss event. Quite often I will refer to a loss event as an accident. However, loss events can involve a death, illness, natural event, etc., with no accident involved. On this exam, the term “claim” is not distinguished from a “loss event”. However, in common usage the term claim is reserved for those situations where someone actually contacts the insurer and asks for a payment. So for example, if an insured with a $5000 deductible suffered $1000 of otherwise covered damage to his home, he would probably not even bother to inform the insurer. This is a loss event, but in common usage it would not be called a claim. One common definition of a claim is: “A claim is a demand for payment by an insured or by an allegedly injured third party under the terms and conditions of an insurance contract.” 84 The same mathematical methods can be applied to self-insureds. On this exam, do not distinguish between first party and third party claims. For example, if Bob is driving his car and it hits Sueʼs car, then Sue may make a claim against Bobʼs insurer. Under liability insurance, there is no requirement that a loss event or claim be made by an insured. For example, a customer may slip and fall in a store and sue the store. The storeʼs insurer may have to pay the customer. 83 84
83 See Definition 2.6 in Loss Models.
84 Foundations of Casualty Actuarial Science, Chapter 2.
The loss, size of loss, or severity, is the dollar amount of damage as a result of a loss event. The loss may be zero. If an insured suffers $20,000 of damage and the insured has a $5,000 deductible, then the insurer would only pay the insured $15,000. However, the (size of) loss is $20,000, the amount of damage suffered by the insured. If the insured suffered only $1000 of damage, then the insurer would pay nothing due to the $5000 deductible, but the (size of) loss is $1000. A payment event is an incident in which someone receives a (non-zero) payment as a result of a loss event covered by an insurance contract. The amount paid is the actual dollar amount paid as a result of a loss event or a payment event. If it is as the result of a loss event, the amount paid may be zero. So if an insuredʼs home suffers $20,000 of damage and the insured has a $5,000 deductible, then the insurer would pay the insured $15,000. The amount paid is $15,000. If the insured suffered $1000 damage, then the insurer would pay nothing due to the $5000 deductible. The amount paid is 0. If an injured worker makes a claim for Workers Compensation benefits, but it is found that the injury is not work related, then the amount paid may be zero. If a doctor is sued for medical malpractice, quite often the claim is closed without payment; the amount paid can be zero. The allocated loss adjustment expense (ALAE) is the amount of expense incurred directly as a result of a loss event. Loss adjustment expenses (LAE) which can be directly related to a specific claim are classified as ALAE, while those that can not are classified as unallocated loss adjustment expense (ULAE).85 Examples of ALAE are fees for defense attorneys, expert witnesses for the defense, medical evaluations, court costs, laboratory and x-ray costs, etc. Quite often claims closed without (loss) payment, will have a positive amount of ALAE, sometimes a very large amount. Note that any loss payment is not a part of the ALAE and vice versa. A loss distribution is the probability distribution of either the loss or the amount paid from a loss event or of the amount paid from a payment event. The distribution may or may not exclude payments of zero and may or may not include ALAE.
85 For specific lines of insurance the distinction between ALAE and ULAE may depend on the statistical plan or reporting requirement.
Loss distributions can be discrete. For example, a 20% chance of $100 and an 80% chance of $500. The loss distributions in the Appendix A of Loss Models, such as the Exponential Distribution or the Pareto Distribution, are all continuous and all exclude the chance of a loss of size zero. Thus in order to model losses for lines of insurance with many claims closed without payment one would have to include a point mass of probability at zero. On Liability lines of insurance, losses and ALAE are often reported together, so they are frequently modeled together. Frequency, Severity, and Exposure: The frequency is the number of losses or number of payments random variable. Its expected value is called the mean frequency. Unless indicated otherwise the frequency is for one exposure unit. Frequency distributions are discrete with support on all or part of the non-negative integers. They can be “made-up”, or can be named distributions, such as the Poisson or Binomial Distributions.86 The severity can be either the loss or amount paid random variable. Its expected value is called the mean severity. Severity and frequency together determine the aggregate amount paid by an insurer. The number of claims or loss events determines how many times we take random draws from the severity variable. Note that first we determine the number of losses from the frequency distribution and then we determine the size of each loss. Thus frequency and severity do not enter into the aggregate loss distribution in a symmetric manner. This will be seen for example when we calculate the variance of the aggregate losses.87 The exposure base is the basic unit of measurement upon which premiums are determined. For example, insuring one automobile for one year is a car-year of exposure. Insuring $100 dollars of payroll in Workersʼ Compensation is the unit of exposure. So if the rate for carpenters in State X were $4 per $100 of payroll, insuring the “Close to You Carpenters”, which paid its carpenters $250,000 per year in total, would cost: (250,000/100)($4) = $10,000 per year.88
86 See “Mahlerʼs Guide to Frequency Distributions,” and Appendix B of Loss Models.
87 See “Mahlerʼs Guide to Aggregate Distributions.”
88 This is a simplified example.
Some Definitions from Joint Principles of Actuarial Science:89 Phenomena ⇔ occurrences that can be observed Experiment ⇔ observation of a given phenomena under specified conditions Event ⇔ set of one or more possible outcomes Stochastic phenomenon ⇔ more than one possible outcome contingent event ⇔ outcome of a stochastic phenomenon; more than one possible outcome probability ⇔ measure of the likelihood of an event, on a scale from 0 to 1 random variable ⇔ function that assigns a numerical value to every possible outcome Data Dependent Distributions: Data-Dependent Distributions ⇔ complexity ≥ that of the data; complexity increases as the sample size increases.90 The most important example of a data-dependent distribution is the Empirical Distribution Function. Another example is Kernel Smoothing. As discussed previously, the empirical model assigns probability 1/n to each of n observed values. For example, with the following observations: 81, 157, 213, the probability function (pdf) of the corresponding empirical model is: p(81) = 1/3, p(157) = 1/3, p(213) = 1/3. As discussed previously, the corresponding Empirical Distribution Function is: F(x) = 0 for x < 81, F(x) = 1/3 for 81 ≤ x < 157, F(x) = 2/3 for 157 ≤ x < 213, F(x) = 1 for 213 ≤ x. In a Kernel Smoothing Model, the data is smoothed using a “kernel” function.91
89 See Section 2.1 of Loss Models.
90 See Definition 4.7 in Loss Models.
91 Kernel smoothing is covered in “Mahlerʼs Guide to Fitting Loss Distributions.”
Section 21, Parameters of Distributions
The probability that a loss is less than or equal to x is given by a distribution function F(x). For a given type of distribution, in addition to the size of loss x, F(x) depends on what are called parameters. Each type of distribution has a set of parameters, which enter into F(x) and its derivative f(x) in a manner particular to that type of distribution.
For example, the Pareto Distribution has a set of two parameters α and θ. For fixed α and θ, F(x) is a function of x. For α = 2 and θ = 10, the Pareto Distribution is given by:
F(x) = 1 - {θ/(θ + x)}^α = 1 - (1 + x/10)^(-2).
The Pareto Distribution is an example of a parametric distribution.92 The parameter(s) tell you which member of the family one has. For example, if one has a Pareto Distribution with parameters α = 2 and θ = 10, the Distribution is completely described. In addition one needs to know the support of the distributions in the family. For example, all Pareto Distributions have support x > 0. Finally, one needs to know the set from which the parameter or parameters may be drawn. For the Pareto Distribution, α > 0 and θ > 0.
It is useful to group distributions based on how many parameters they have. Those in Appendix A of Loss Models have one, two, three, or even occasionally four parameters. For example, the Exponential has one parameter, the Gamma has two parameters, while the Transformed Gamma has three parameters.
It is also useful to divide parameters into those that are related to the scale of the distribution and those that are related to its shape. A scale parameter is a parameter which divides x everywhere it appears in the distribution function. For example, θ is a scale parameter for the Pareto distribution: F(x) = 1 - {θ/(θ + x)}^α = 1 - (1 + x/θ)^(-α). θ normalizes the scale of x, so that one standard distribution can fit otherwise similar data sets. Thus for example, different units of measurement (dollars, yen, pounds, marks, etc.) or the effects of inflation can be easily accommodated by changing the scale parameter. A scale parameter will appear to the nth power in the formula for the nth moment of the distribution. Thus, the nth moment of the Pareto has a factor of θ^n.
92 See Definition 4.1 in Loss Models.
Also, a scale parameter will not appear in the coefficient of variation, the skewness, or kurtosis. The coefficient of variation, the skewness, and the kurtosis each measure the shape of the distribution. Thus a change in scale should not affect the shape of the fitted Pareto; θ does not appear in the coefficient of variation or the skewness of the Pareto. However, α does, and thus alpha is called a shape parameter.
A parameter such as µ in the Normal distribution is referred to as a location parameter. Altering µ shifts the whole distribution to the left or right.
Gamma Distribution, an Example:
For the Gamma Distribution, α is the shape parameter and θ is the scale parameter. Here are graphs of four different Gamma Distributions, each with a mean of 200:
[Figures: densities of four Gamma Distributions with mean 200: an Exponential (alpha = 1), alpha = 2, alpha = 4, and alpha = 10, each plotted for x from 0 to 600.]
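A small sketch (plain Python, an assumption of this write-up) of the point the graphs illustrate: holding the mean at 200 while varying α changes the shape, while rescaling θ alone leaves the coefficient of variation and skewness unchanged.

```python
# Sketch: for a Gamma Distribution, mean = alpha * theta, while the
# coefficient of variation (1/sqrt(alpha)) and skewness (2/sqrt(alpha))
# depend only on the shape parameter alpha, not on the scale parameter theta.
from math import sqrt

def gamma_summary(alpha, theta):
    mean = alpha * theta
    cv = 1.0 / sqrt(alpha)
    skewness = 2.0 / sqrt(alpha)
    return mean, cv, skewness

for alpha in (1, 2, 4, 10):
    theta = 200.0 / alpha              # keep the mean at 200, as in the graphs
    print(alpha, gamma_summary(alpha, theta))

# Multiplying all losses by 10 (for example, a change of currency) only rescales theta:
print(gamma_summary(4, 50.0))          # mean 200
print(gamma_summary(4, 500.0))         # mean 2000, same CV and skewness
```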
Advantages of Parametric Estimation:93
Parametric estimation has the following advantages:
1. Accuracy. Maximum likelihood estimators of parameters have good properties.
2. Inferences can be made beyond the population that generated the data. For example, even if the largest observed loss is 100, we can estimate the chance that the next observed loss will have a size greater than 100.
3. Parsimony. Distributions can be summarized using only a few parameters.
4. Allows Hypothesis Tests.94
5. Scale parameters/families. Allow one to more easily handle the effects of (uniform) inflation.
Some Desirable Properties of Size of Loss Distributions:95
As a model for claim sizes, in order to be practical, a distribution should have the following desirable characteristics:
1. The estimate of the mean should be efficient and reasonably easy to use.
2. A confidence interval about the mean should be calculable.
3. All moments of the distribution function should exist.
4. The characteristic function can be written in closed form.
As will be seen, some distributions such as the Pareto do not have all of their moments. While this does not prevent their use, it does indicate some caution must be exercised. For the LogNormal, the characteristic function can not be written in closed form.
93 See Section 2.6 of the first edition of Loss Models.
94 For example through the use of the Chi-Square Statistic or the Kolmogorov-Smirnov Statistic, discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
95 See “Estimating Pure Premiums by Layer - An Approach,” by Robert J. Finger, Discussion by Lee R. Steeneck, PCAS 1976, not on the syllabus.
Problems:
21.1 (1 point) Which of the following are true with respect to the application in Property/Casualty Insurance of theoretical size of loss distributions?
1. When data is extensive, theoretical distributions are not essential.
2. Inferences can be made beyond the population that generated the data.
3. The inconvenience restricts their use to unusual circumstances.
A. 1 B. 2 C. 3 D. 2, 3 E. None of A, B, C, or D
21.2 (4B, 5/95, Q.29) (1 point) Which of the following are reasons for the importance of theoretical distributions?
1. They permit calculations to be made without the formulation of a model.
2. They are completely summarized by a small number of parameters.
3. Their convenience for mathematical manipulation allows for the development of useful theoretical results.
A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3
Solutions to Problems: 21.1. B. 1. F. Even when data are extensive theoretical distributions may be essential, depending on the question to be answered. For example, theoretical distributions may be essential to estimate the tail of the distribution. 2. T. 3. F. 21.2. D. 1. F. 2. T. 3. T.
Section 22, Exponential Distribution This single parameter distribution is extremely simple to work with and thus appears in many exam questions. In most actual applications, the Exponential doesnʼt provide enough flexibility to fit the data. Thus, it is much more common to use the Gamma or Weibull Distributions, of which the exponential is a special case96. Following is a summary of the Exponential Distribution. Exponential Distribution Support: x > 0
Parameters: θ > 0 ( scale parameter)
D. f. :
F(x) = 1 - e-x/θ
P. d. f. :
f(x) = e-x/θ / θ
Moments: E[Xn] = n! θn Mean = θ Variance = θ2 Coefficient of Variation = Standard Deviation / Mean = 1 Skewness = 2 Kurtosis = 9 Median = θ ln(2)
Mode = 0
Limited Expected Value Function: E[X ∧ x] = θ (1- e-x/θ) R(x) = Excess Ratio = e-x/θ e(x) = Mean Excess Loss = θ
Derivatives of d.f.:
∂F(x)/∂θ = -(x/θ^2) e^(-x/θ)
Method of Moments: θ = X . Percentile Matching: θ = -x1 / ln(1-p1 ) Method of Maximum Likelihood: θ = X = Σ xi / n, same as method of moments. 96
For a shape parameter of unity, either the Gamma or the Weibull distributions reduce to the Exponential.
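The summary above can be collected into a few lines of code; the sketch below (Python, with an illustrative θ = 10 chosen for this sketch) is only a restatement of those formulas.

```python
from math import exp, log

# Sketch: key Exponential Distribution quantities for one illustrative theta.
theta = 10.0

mean = theta
variance = theta ** 2
median = theta * log(2.0)

def lev(x):                    # limited expected value E[X ^ x]
    return theta * (1.0 - exp(-x / theta))

def excess_ratio(x):           # R(x)
    return exp(-x / theta)

def mean_excess_loss(x):       # e(x), constant for the Exponential
    return theta

print(mean, variance, round(median, 3), round(lev(5), 3), round(excess_ratio(5), 3))
```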
All Exponential Distributions have the same shape. The Coefficient of Variation is always 1 and the skewness is always 2. Hereʼs a graph of the density function of an Exponential. All Exponential Distributions look the same except for the scale; in this case the mean is 10. Also note that while Iʼve only shown x ≤ 50, the density is positive for all x > 0.
[Figure: the density f(x) of an Exponential Distribution with mean 10, declining from 0.10 at x = 0, plotted for x from 0 to 50.]
The Exponential Distribution has a constant Mean Excess Loss and therefore a constant hazard rate; it is the only continuous distribution with this “memoryless” property.
Exercise: Losses prior to any deductible follow an Exponential Distribution with θ = 8. A policy has a deductible of size 5. What is the distribution of non-zero payments under that policy?
[Solution: After truncating and shifting by d: G(x) = 1 - S(x + d)/S(d) = 1 - S(x + 5)/S(5) = 1 - e^(-(x + 5)/8)/e^(-5/8) = 1 - e^(-x/8).
Comment: This is an Exponential Distribution with θ = 8.]
When an Exponential Distribution is truncated and shifted from below, one gets the same Exponential Distribution, due to its memoryless property. On any exam question involving an Exponential Distribution, check whether its memoryless property helps to answer the question.97 97
97 See for example, 3, 11/00, Q.21.
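A simulation sketch of the memoryless property (Python's random module; the seed and sample size are arbitrary choices of this sketch): payments in excess of a deductible of 5, for Exponential losses with θ = 8, should again average about 8.

```python
import random

# Simulation sketch of the memoryless property of the Exponential.
random.seed(1)
theta, deductible = 8.0, 5.0
losses = [random.expovariate(1.0 / theta) for _ in range(200_000)]
payments = [x - deductible for x in losses if x > deductible]

print(round(sum(payments) / len(payments), 2))   # close to 8, the original theta
```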
Integrals Involving the Density of the Exponential Distribution:
Let f(x) = e^(-x/θ)/θ, the density of an Exponential Distribution with mean θ.
∫_0^x t^n f(t) dt = ∫_0^x t^n e^(-t/θ)/θ dt = ∫_0^(x/θ) s^n θ^n e^(-s) ds = θ^n Γ(n+1; x/θ) Γ(n+1).98
Γ(n+1; x) = 1 - Σ_{i=0}^{n} x^i e^(-x)/i!.   Γ(n+1) = n!.
Therefore, ∫_0^x t^n f(t) dt = θ^n (1 - Σ_{i=0}^{n} (x/θ)^i e^(-x/θ)/i!) n!.
∫_0^x t f(t) dt = θ {1 - e^(-x/θ) - (x/θ)e^(-x/θ)} 1! = θ - (θ + x)e^(-x/θ).
∫_0^x t^2 f(t) dt = θ^2 {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)^2 e^(-x/θ)/2} 2! = 2θ^2 - (2θ^2 + 2θx + x^2)e^(-x/θ).
∫_0^x t^3 f(t) dt = θ^3 {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)^2 e^(-x/θ)/2 - (x/θ)^3 e^(-x/θ)/6} 3! = 6θ^3 - (6θ^3 + 6θ^2 x + 3θx^2 + x^3)e^(-x/θ).
Inverse Exponential Distribution:
If X follows an Exponential Distribution, then 1/X follows an Inverse Exponential Distribution.
F(x) = e^(-θ/x).   f(x) = θ e^(-θ/x)/x^2, x > 0.
The Inverse Exponential Distribution is very heavy-tailed. It fails to have a finite mean, let alone any higher moments!
98 The Incomplete Gamma Function is discussed in “Mahlerʼs Guide to Frequency Distributions.”
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 232
Problems: Use the following information for the next eight questions: Let X be an exponentially distributed random variable, the probability density function of which is: f(x) = 8 e-8x, x ≥ 0 22.1 (1 point) Which of the following is the mean of X? A. less than 0.06 B. at least 0.06 but less than 0.08 C. at least 0.08 but less than 0.10 D. at least 0.10 but less than 0.12 E. at least 0.12 22.2 (1 point) Which of the following is the median of X? A. less than 0.06 B. at least 0.06 but less than 0.08 C. at least 0.08 but less than 0.10 D. at least 0.10 but less than 0.12 E. at least 0.12 22.3 (1 point) Which of the following is the mode of X? A. less than 0.06 B. at least 0.06 but less than 0.08 C. at least 0.08 but less than 0.10 D. at least 0.10 but less than 0.12 E. at least 0.12 22.4 (1 point) What is the chance that X is greater than 0.3? A. less than 0.06 B. at least 0.06 but less than 0.08 C. at least 0.08 but less than 0.10 D. at least 0.10 but less than 0.12 E. at least 0.12
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 233
22.5 (1 point) What is the variance of X? A. less than 0.015 B. at least 0.015 but less than 0.016 C. at least 0.016 but less than 0.017 D. at least 0.017 but less than 0.018 E. at least 0.018 22.6 (1 point) What is the coefficient of variation of X? A. less than 0.5 B. at least 0.5 but less than 0.7 C. at least 0.7 but less than 0.9 D. at least 0.9 but less than 1.1 E. at least 1.1 22.7 (2 points) What is the skewness of X? A. less than 0 B. at least 0 but less than 1 C. at least 1 but less than 2 D. at least 2 but less than 3 E. at least 3 22.8 (3 points) What is the kurtosis of X? A. less than 3 B. at least 3 but less than 5 C. at least 5 but less than 7 D. at least 7 but less than 9 E. at least 9
22.9 (1 point) Prior to the application of any deductible, losses follow an Exponential Distribution with θ = 135. If there is a deductible of 25, what is the density of non-zero payments at 60? A. B. C. D. E.
less than 0.0045 at least 0.0045 but less than 0.0050 at least 0.0050 but less than 0.0055 at least 0.0055 but less than 0.0060 at least 0.0060
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 234
22.10 (2 points) You are given the following: • Claim sizes follow an exponential distribution with density function f(x) = 0.1 e-0.1x , 0 < x < ∞. • You observe 8 claims. • The number of claims and claim sizes are independent. Determine the probability that the largest of these claim is less than 17. A. less than 80% B. at least 80% but less than 85% C. at least 85% but less than 90% D. at least 90% but less than 95% E. at least 95% 22.11 (1 point) What is F(3), for an Inverse Exponential Distribution, with θ = 10? A. less than 3% B. at least 3% but less than 5% C. at least 5% but less than 7% D. at least 7% but less than 9% E. at least 9% 22.12 (2 points) You are given:
•
Future lifetimes follow an Exponential distribution with a mean of θ.
• •
The force of interest is δ.
A whole life insurance policy pays 1 upon death. What is the actuarial present value of this insurance? (A) e−δθ (B) 1 / (1 + δθ) (C) e−2δθ (D) 1 / (1 + δθ)2 (E) None of A, B, C, or D. 22.13 (1 point) Prior to the application of any deductible, losses follow an Exponential Distribution with θ = 25. If there is a deductible of 5, what is the variance of the non-zero payments? A. B. C. D. E.
less than 600 at least 600 but less than 650 at least 650 but less than 700 at least 700 but less than 750 at least 750
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 235
22.14 (2 points) Prior to the application of any deductible, losses follow an Exponential Distribution with θ = 31. There is a deductible of 10. What is the variance of amount paid by the insurer for one loss, including the possibility that the amount paid is zero? A. less than 900 B. at least 900 but less than 950 C. at least 950 but less than 1000 D. at least 1000 but less than 1050 E. at least 1050 22.15 (2 points) Size of loss is Exponential with mean θ. Y is the minimum of N losses. What is the distribution of Y? 22.16 (2 points) You are given: • A claimant receives payments at a rate of 1 paid continuously while disabled.
• •
Payments start immediately.
•
The force of interest is δ.
The length of disability follows an Exponential distribution with a mean of θ.
At the time of disability, what is the actuarial present value of these payments? (A) 1 / (δ + θ)
(B) 1 / (1 + δθ)
(C) θ / (δ + θ)
(D) θ / (1 + δθ)
(E) None of A, B, C, or D.
22.17 (2 points) You are given the following graph of the density of an Exponential Distribution. Prob. 0.1 0.08 0.06 0.04 0.02
10
20
30
40
50
What is the third moment of this Exponential Distribution? A. 1000 B. 2000 C. 4000 D. 6000 E. 8000
x
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 236
22.18 (3 points) Belle Chimes and Leif Blower are engaged to be married. The cost of their wedding will be 110,000. They will receive 200 gifts at their wedding. The size of each gift has distribution: F(x) = 1 - exp[-(x - 100)/500], x > 100. What is the probability that the total value of the gifts will not exceed the cost of their wedding? A. 6% B. 8% C. 10% D. 12% E. 14% 22.19 (5 points) Define the quartiles as the 25th, 50th, and 75th percentiles. Define the interquartile range as the difference between the third and first quartiles, in other words as the 75th percentile minus the 25th percentile. Determine the interquartile range for an Exponential Distribution. Define the Quartile Skewness Coefficient as: (3rd quartile - 2nd quartile) - (2nd quartile - 1st quartile) . 3rd quartile - 1st quartile Determine the Quartile Skewness Coefficient for an Exponential Distribution, and compare it to the skewness. 22.20 (3 points) Define the Mean Absolute Deviation as: E[ |X - E[X]| ]. Determine the Mean Absolute Deviation for an Exponential Distribution. 22.21 (160, 11/86, Q.9) (2.1 points) X1 and X2 are independent random variables each with Exponential distributions. The expected value of X1 is 9.5. The variance of X2 is 2.25. Determine the probability that X1 < X2 . (A) 2/19
(B) 3/22
(C) 3/19
(D) 3/16
(E) 2/3
22.22 (4, 5/87, Q.32) (1 point) Let X be an exponentially distributed random variable, the probability density function of which is: f(x) = 10 exp(-10x), where x ≥ 0. Which of the following statements regarding the mode and median of X is true? A. The median of X is 0; the mode is 1/2. B. The median of X is (ln 2) / 10; the mode of X is 0. C. The median of X is 1/2; the mode of X does not exist. D. The median of X is 1/2; the mode of X is 0. E. The median of X is 1/10; and the mode of X is (ln 2) /10. 22.23 (2, 5/90, Q.11) (1.7 points) Let X be a continuous random variable with density function f(x) = λe−λx for x > 0. If the median of this distribution is 1/3, then what is λ? A. (1/3) In(1/2)
B. (1/3) In(2)
C. 2 In(3/2)
D. 3 In(2)
E. 3
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 237
22.24 (160, 5/90, Q.3) (2.1 points) You are given: (i) Tu is the failure time random variable assuming the uniform distribution from 0 to ω. (ii) Te is the failure time random variable assuming the exponential distribution. (iii) Var(Tu ) = 3Var(Te ). (iv) f(te ) / S(te ) = 0.5. Calculate the uniform distribution parameter ω. (A) 3
(B) 4
(C) 8
(D) 12
(E) 15
22.25 (2, 2/96, Q.40) (1.7 points) Let X1 , . . . , X100 be a random sample from an exponential distribution with mean 1/2. 100
Determine the approximate value of P[ ∑ Xi > 57] using the Central Limit Theorem. i =1
A. 0.08
B. 0.16
C. 0.31
D. 0.38
E. 0.46
22.26 (2, 2/96, Q.41) (1.7 points) Let X be a continuous random variable with density function f(x) = e-x/2/2 for x > 0. Determine the 25th percentile of the distribution of X. A. In(4/9) B. ln(16/9) C. ln(4) D. 2 E. ln(16) 22.27 (Course 151 Sample Exam #2, Q.24) (2.5 points) An insurer's portfolio consists of a single possible claim. You are given: (i) the claim amount is uniformly distributed over (100, 500). (ii) the probability that the claim occurs after time t is e-0.1t, t > 0. (iii) the claim time and amount are independent. (iv) the insurer's initial surplus is 20. (v) premium income is received continuously at the rate of 40 per annum. Determine the probability of ruin (not having enough money to pay the claim.) (A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7 22.28 (Course 160 Sample Exam #2, 1996, Q.1) (1.9 points) You are given: (i) Two independent random variables X1 and X2 have exponential distributions with means θ1 and θ2, respectively. (ii) Y = X1 X2 . Determine E[Y]. 1 1 (A) + θ1 θ2
(B)
θ1 θ2 θ1 + θ 2
(C) θ1 + θ2
(D)
1 θ1 θ2
(E) θ1θ2
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 238
22.29 (4B, 11/99, Q.17) (2 points) Claim sizes follow a distribution with density function f(x) = e-x, 0 < x < ∞. Determine the probability that the second claim observed will be more than twice as large as the first claim observed. A. e-3
B. e-2
C. 1/3
D. e-1
E. 1/2
22.30 (Course 1 Sample Exam, Q.23) (1.9 points) The value, v, of an appliance is based on the number of years since purchase, t, as follows: v(t) = e(7- 0.2t). If the appliance fails within seven years of purchase, a warranty pays the owner the value of the appliance. After seven years, the warranty pays nothing. The time until failure of the appliance has an exponential distribution with mean 10. Calculate the expected payment from the warranty. A. 98.70 B. 109.66 C. 270.43 D. 320.78 E. 352.16 22.31 (1, 5/00, Q.3) (1.9 points) The lifetime of a printer costing 200 is exponentially distributed with mean 2 years. The manufacturer agrees to pay a full refund to a buyer if the printer fails during the first year following its purchase, and a one-half refund if it fails during the second year. If the manufacturer sells 100 printers, how much should it expect to pay in refunds? (A) 6,321 (B) 7,358 (C) 7,869 (D) 10,256 (E) 12,642 22.32 (1, 5/00, Q.18) (1.9 points) An insurance policy reimburses dental expense, X, up to a maximum benefit of 250. The probability density function for X is: c e-0.004x for x > 0, where c is a constant. Calculate the median benefit for this policy. (A) 161 (B) 165 (C) 173 (D) 182 (E) 250 22.33 (1, 11/00, Q.9) (1.9 points) An insurance company sells an auto insurance policy that covers losses incurred by a policyholder, subject to a deductible of 100. Losses incurred follow an exponential distribution with mean 300. What is the 95th percentile of actual losses that exceed the deductible? (A) 600 (B) 700 (C) 800 (D) 900 (E) 1000 22.34 (1, 11/00, Q.14) (1.9 points) A piece of equipment is being insured against early failure. The time from purchase until failure of the equipment is exponentially distributed with mean 10 years. The insurance will pay an amount x if the equipment fails during the first year, and it will pay 0.5x if failure occurs during the second or third year. If failure occurs after the first three years, no payment will be made. At what level must x be set if the expected payment made under this insurance is to be 1000? (A) 3858 (B) 4449 (C) 5382 (D) 5644 (E) 7235
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 239
22.35 (1, 5/01, Q.20) (1.9 points) A device that continuously measures and records seismic activity is placed in a remote region. The time, T, to failure of this device is exponentially distributed with mean 3 years. Since the device will not be monitored during its first two years of service, the time to discovery of its failure is X = max(T, 2). Determine E[X] . (A) 2 + e-6/3
(B) 2 - 2e-2/3 + 5e-4/3
(C) 3
(D) 2 + 3e-2/3
(E) 5
22.36 (1, 5/01, Q.32) (1.9 points) A company has two electric generators. The time until failure for each generator follows an exponential distribution with mean 10. The company will begin using the second generator immediately after the first one fails. What is the variance of the total time that the generators produce electricity? (A) 10 (B) 20 (C) 50 (D) 100 (E) 200 22.37 (1, 5/03, Q.4) (2.5 points) The time to failure of a component in an electronic device has an exponential distribution with a median of four hours. Calculate the probability that the component will work without failing for at least five hours. (A) 0.07 (B) 0.29 (C) 0.38 (D) 0.42 (E) 0.57 22.38 (CAS3, 11/03, Q.17) (2.5 points) Losses have an Inverse Exponential distribution. The mode is 10,000. Calculate the median. A. Less than 10,000 B. At least 10,000, but less than 15,000 C. At least 15,000, but less than 20,000 D. At least 20,000, but less than 25,000 E. At least 25,000 22.39 (SOA3, 11/03, Q.34 & 2009 Sample Q.89) (2.5 points) You are given: (i) Losses follow an exponential distribution with the same mean in all years. (ii) The loss elimination ratio this year is 70%. (iii) The ordinary deductible for the coming year is 4/3 of the current deductible. Compute the loss elimination ratio for the coming year. (A) 70% (B) 75% (C) 80% (D) 85% (E) 90% 22.40 (CAS3, 5/04, Q.20) (2.5 points) Losses have an exponential distribution with a mean of 1,000. There is a deductible of 500. The insurer wants to double the loss elimination ratio. Determine the new deductible that achieves this. A. 219 B. 693 C. 1,046 D. 1,193 E. 1,546
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 240
22.41 (CAS3, 11/05, Q.20) (2.5 points) Losses follow an exponential distribution with parameter θ. For a deductible of 100, the expected payment per loss is 2,000. Which of the following represents the expected payment per loss for a deductible of 500? A. θ B. θ(1 - e-500/θ) C. 2,000 e-400/θ D. 2,000 e-5/θ E. 2,000 (1 - e-500/θ) / (1 - e-100/θ) 22.42 (4, 11/06, Q.26 & 2009 Sample Q.269) (2.9 points) The random variables X1 , X2 , ... , Xn , are independent and identically distributed with probability density function f(x) = e-x/θ/θ, x ≥ 0. Determine E[ X 2 ]. (A)
n+ 1 2 θ n
(B)
n+ 1 2 θ n2
(C)
θ2 n
(D)
θ2 n
(E) θ2
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 241
Solutions to Problems: 22.1. E. An exponential with θ = 1/8; mean = θ = 0 .125. 22.2. C. An exponential with θ = 1/8; F(x) = 1 - e-x/θ = 1 - e-8x. At the median: F(0.5)= 0.5 = 1 - e-8x. ⇒ x = -ln(.5)/8 = 0.0866. 22.3. A. The mode of the exponential is always zero. (The density 8e-8x decreases for x > 0 and thus attains its maximum at x=0.) 22.4. C. An exponential with 1/θ = 8 ; F(x) = 1 - e -x/θ = 1 - e-8x . 1- F(0.3) = e-(8)(.3) = e-2.4 = 0.0907. 22.5. B. An exponential with 1/θ = 8 ; variance = θ2 = 0.015625. 22.6. D. An exponential always has a coefficient of variation of 1. The C.V. = standard deviation / mean = (0.015625)0.5 / 0.125 = 1. 22.7. D. An exponential always has skewness of 2. Specifically the moments are: µ1 = (1!) θ1 = 1/8 = .125. µ2 ′ = (2!) θ2 = 2 / 82 = .03125. µ3 ′ = (3!) θ3 = 6 / 83 = .01172. Standard Deviation = (.03125 - .1252 ).5 = .125. Skewness = {µ3 ′ - (3 µ1 µ2 ′) + (2 µ1 3 )} / STDDEV3 = {.01172 - (3)(.125)(.03125)) + (2)(.125)3 )} / (.125)3 = .0039075 / .001953 = 2.00. 22.8. E. An exponential always has kurtosis of 9. Specifically the moments are: µ1 = (1! ) θ1 = θ. µ2 ′ = (2!) θ2 = 2θ2 . µ3 ′ = (3!) θ3 = 6θ3 . µ4 ′ = (4!) θ4 = 24 θ4 . Standard Deviation = (2θ2 - θ2 ).5 = θ. µ4 = µ4 ′ - (4 µ1 µ3 ′) + (6 µ1 2 µ2 ′) - 3µ1 4 = 24 θ4 - (4)(θ)(6θ3 ) + (6)(θ2 )(2θ2 ) - (3)θ4 = 9θ4 . kurtosis = µ4 / STDDEV4 = 9θ4 / θ4 = 9. 22.9. B. After truncating and shifting from below, one gets the same Exponential Distribution with θ = 135, due to its memoryless property. The density is: e-x/135/135, which at x = 60 is: e-60/135/135 = 0.00475.
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 242
22.10. A. For this exponential distribution, F(x) = 1 - e-.1x. F(17) = 1- e-(.1)(17) = .817. The chance that all eight claims will be less than or equal to 17, is: F(17)8 = .8178 = 19.9%. Comment: This is an example of an order statistic. The maximum of the 8 claims is less than or equal to 17 if and only if each of the 8 claims is less than or equal to 17. 22.11. B. F(x) = e-θ/x. F(3) = e-10/3 = 0.036. 22.12. B. The probability of death at time t, is the density of the Exponential Distribution: f(t) = e-t/θ /θ. The present value of a payment of one at time t is e−δt . Therefore, the actuarial present value of this insurance is: ∞
∞
∫ e−δt e-t/θ /θ dt = (1/θ) ∫ e−(δ+1/θ)t dt = (1/θ) /(δ + 1/θ) = 1/(1 + δθ). 0
0
22.13. B. After truncating and shifting from below, one gets the same Exponential Distribution with θ = 25, due to its memoryless property. The variance is θ2 = 252 = 625. 22.14. A. After truncating and shifting from below, one gets the same Exponential Distribution with θ = 31, due to its memoryless property. Thus the nonzero payments are Exponential with θ = 31, with mean 31 and variance 312 . The probability of a nonzero payment is the probability that a loss is greater than the deductible of 10; S(10) = e-10/31 = .7243. Thus the payments of the insurer can be thought of as an aggregate distribution, with Bernoulli frequency with mean .7243 and Exponential severity with mean 31. The variance of this aggregate distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) = (.7243)(312 ) + (31)2 {(.7243)(1-.7243)} = 888. Comment: Similar to 3, 11/00, Q.21. 22.15. The survival function of Y is: Prob[all N losses > y] = S(y)N = (e-x/θ)N = e-xN/θ. The distribution of Y is Exponential with mean θ/N.
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 243
22.16. D. Given a disability of length t, the present value of an annuity certain is: (1-e-δt)/δ. The expected present value is the average of this over all t: ∞
∞
∞
∫{(1-e-δt)/δ}f(t) dt = ∫{(1-e-δt)/δ}e-t/θ/θ dt = (1/δ)∫e-t/θ/θ - e-t(δ + 1/θ)/θ dt = 0
0
0
(1/δ){1 - (1/(δ + 1/θ))/θ} = (1/δ){1 - (1/( 1 + δθ) } = θ/( 1 + δθ). 22.17. D. For an Exponential, f(x) = e-x/θ/θ. f(0) = 1/θ. Thus 1/10 = 1/θ. ⇒ θ = 10. Third moment is: 6θ3 = 6000. 22.18. B. Let Y = X - 100. Then Y is Exponential with mean θ = 500. E[X] = E[Y] + 100 = 500 + 100 = 600. Var[X] = Var[Y] = 5002 = 250,000. The mean total value of gifts is: (200)(600) = 120,000. The variance of the total value of gifts is: (200)(250,000) = 50,000,000. Prob[gifts ≤ 110,000] ≅ Φ[(110,000 - 120,000)/ 50,000,000 ] = Φ[-1.41] = 7.9%. Comment: The distribution of the size of gifts is a Shifted Exponential. 22.19. 0.25 = 1 - exp[-Q0.25 / θ]. ⇒ Q0.25 = θ ln[4/3]. 0.5 = 1 - exp[-Q0.5 / θ]. ⇒ Q0.5 = θ ln[2]. 0.75 = 1 - exp[-Q0.75 / θ]. ⇒ Q0.75 = θ ln[4]. Interquartile range = Q0.75 - Q0.25 = θ ln[4] - θ ln[4/3] = θ ln[3] = 1.0986θ. Quartile Skewness Coefficient =
(θ ln(4) - θ ln(2)) - (θ ln(2) - θ ln(4 / 3)) = ln(4/3) / ln(3) = 0.262. θ ln(3)
The skewness of any Exponential Distribution is 2. Specifically, the third central moment is: E[X3 ] - 3E[X2 ]E[X] + 2E[X]3 = 6θ3 - (3)(2θ2)(θ) + 2θ3 = 2θ3. The variance is θ2. Thus the skewness is: 2θ3 / (θ2)3/2 = 2. Comment: The first quartile is also called the lower quartile, while the 3rd quartile is also called the upper quartile. The Quartile Skewness Coefficient as applied to a small sample of data would be a robust estimator of the skewness of the distribution from which the data was drawn; it would be not be significantly effected by unusual values in the sample, in other words by outliers.
2013-4-2,
Loss Distributions, §22 Exponential
∞
22.20.
θ
Page 244
∞
∫0 | x - θ | f(x) dx = ∫0 (θ - x) e- x / θ / θ dx + ∫θ (x - θ) e- x / θ / θ dx =
θ
θ
HCM 10/8/12,
∞
∞
θ
∫0 e - x / θ / θ dx - θ ∫θ e - x / θ / θ dx + ∫θ x e - x / θ / θ dx - ∫0 x e - x / θ / θ dx =
θ (1 -
e-1)
-
θ e-1
x= ∞
+ (-x exp[-x / θ] - θ exp[-x / θ])
]
x =θ
x =θ
- (-x exp[-x / θ] - θ exp[-x / θ])
]
=
x= 0
θ (1 - 2e-1) + 2θe-1 + (2θe-1 - θ) = 2θ e- 1 = 0.7358 θ. 22.21. B. θ22 = 2.25. ⇒ θ2 = 1.25. Given X1 = t, Prob[X2 > t] = e-t/1.5. ∞ Prob[X2 > X1] = ∫ e− t /1.5 e− t /9.5 / 9.5 dt = (1/9.5)/(1/1.5 + 1/9.5) = 3/22. 0
Comment: This is mathematically equivalent to two independent Poisson Processes, with λ 1 = 1/9.5 and λ2 = 1/1.5. The probability of observing an event from the first process before the second process is: λ1/(λ1 + λ2) = (1/9.5)/(1/9.5 + 1/1.5) = 3/22. See “Mahlerʼs Guide to Stochastic Processes,” on CAS Exam 3L. 22.22. B. The median is where F(x) = .5. F(x) = 1 - e-10x. Therefore solving for x, the median = -ln(.5) / 10 = ln(2) / 10. The mode is that point where f(x) is largest. Since f(x) declines for x ≥0, f(x) is at its maximum at x = 0. Therefore, the mode is zero. 22.23. D. F(x) = 1 - e−λx. F(1/3) = 0.5. ⇒ 0.5 = e−λ/3. ⇒ λ = 3 In(2). 22.24. D. For the Exponential, the hazard rate is given as 0.5, and therefore θ = 1/0.5 = 2. Variance of the Exponential is: θ2 = 4. Variance of the Uniform is: ω2/12. ω2/12 = (3)(4). ⇒ ω = 12. 22.25. A. The sum of 100 Exponential distributions has mean (100)(1/2) = 50, and variance 100
(100)(1/22 ) = 25. P[ ∑ Xi > 57] ≅ 1 - Φ[(57 - 55)/5] = 1 - Φ[1.4] = 0.0808. i =1
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 245
22.26. B. .25 = F(x) = 1 - e-x/2. ⇒ x = -2ln(.75) = 2ln(4/3) = ln(16/9). Comment: F(ln(16/9)) = 1 - exp[-ln(16/9)/2] = 1 -
9 / 16 = 1 - 3/4 = 1/4.
22.27. C. If the loss occurs prior to t = 2, then since the insurer has less than 100 in assets, the probability of ruin is 100%. If the loss occurs subsequent to t = 12, then since the insurer has assets of more than 500, the probability of ruin is 0. If the loss occurs at time 2 ≤ t ≤ 12, then the insurer has assets of 20 + 40t, and the probability of ruin is: {500 - (20 + 40t)}/400 = (12 - t)/10. Adding up the different situations, using that the time of loss is exponentially distributed, the probability of ruin is: 12
12
∫
∫
F(2) + f(t)(12 - t)/10 dt + 0 = (1 - e-.2) + .1e-.1t(12 - t)/10 dt = 2
2 12
.181 + (12/10)(-e-.1t)
12
12
] + .01 ∫te-.1t dt = .181 + 2
.621 - .01(te-.1t + e-.1t) ] = .802 - .320 = 0.482.
2
2
Alternately, if the loss is of size 100, there is ruin if the loss occurs at time t < 2, which has probability: 1- e-.2. If the loss is of size 500, there is ruin if the loss occurs at time t < 12, which as probability: 1 - e-1.2. If the loss is of size x, then there is ruin if the loss occurs prior to time (x - 20)/40, since at t = (x - 20)/40 the assets are: 20 + 40(x-20)/40 = x. Thus for a loss of size x, the probability of ruin is: 1 - exp[-.1(x-20)/40] = 1 - e-(x-20)/400. The losses are uniformly distributed from 100 to 500, so the overall probability of ruin is: 500
500
∫ (1- e-(x-20)/400)(1/400)dx = x/400 - e-(x-20)/400 ] = (1.2 - .2) - ( e-1.2 - e-.2) = 0.482. 100
100
22.28. E. E[X1 X2 ] = E[X1 ]E[X2 ] = θ1θ2. 22.29. C. Given that the first claim is of size x, the probability that the second will be more than twice as large is 1 - F(2x) = S(2x) = e-2x. The overall average probability is: ∞
∫
∞
∞
∫
∫
x=0
x=0
(Probability given x) f(x) dx = e-2x e-x dx = e-3x dx = 1/3. x=0
2013-4-2,
Loss Distributions, §22 Exponential
HCM 10/8/12,
Page 246
22.30. D. The density of the time of failure is: f(t) = e^(-t/10)/10.
Expected payment is: ∫_0^7 v(t) f(t) dt = ∫_0^7 e^(7 - 0.2t) e^(-t/10)/10 dt = 0.1 e^7 ∫_0^7 e^(-0.3t) dt = 0.1 e^7 (1 - e^(-2.1))/0.3 = 320.78.
22.31. D. Prob[fails during the first year] = F(1) = 1 - e^(-1/2) = 0.3935. Prob[fails during the second year] = F(2) - F(1) = e^(-1/2) - e^(-2/2) = 0.2386. Expected Cost = 100{(200)(0.3935) + (100)(0.2386)} = 10,256.
22.32. C. This is an Exponential with 1/θ = 0.004. θ = 250. The median of this Exponential is: 250 ln(2) = 173.3, which since it is less than 250 is also the median benefit.
22.33. E. By the memoryless property of the Exponential Distribution, the non-zero payments excess of a deductible are also an Exponential Distribution with mean 300. Thus the 95th percentile of the nonzero payments is: -300 ln(1 - 0.95) = 899. Adding back the 100 deductible, the 95th percentile of the losses that exceed the deductible is: 999.
22.34. D. Prob[fails during the first year] = F(1) = 1 - e^(-1/10) = 0.09516. Prob[fails during the second or third year] = F(3) - F(1) = e^(-1/10) - e^(-3/10) = 0.16402. Expected Cost = 0.09516x + 0.16402x/2 = 1000. ⇒ x = 5644.
22.35. D. max(T, 2) + min(T, 2) = T + 2. E[max(T, 2)] = E[T + 2] - E[min(T, 2)] = E[T] + 2 - E[T ∧ 2] = 3 + 2 - 3(1 - e^(-2/3)) = 2 + 3e^(-2/3).
22.36. E. Each Exponential has variance 102 = 100. The variances of independent variables add: 100 + 100 = 200. Comment: The total time is Gamma with α = 2, θ = 10, and variance (2)(102 ) = 200. 22.37. D. Median = 4 ⇒ .5 = 1 - e-4/θ. ⇒ θ = 5.771. S(5) = e-5/5.771 = 0.421. 22.38. E. The mode of the Inverse Exponential is θ/2. θ/2 = 10000 ⇒ θ = 20000. To get the median: F(x) = e-θ/x = e-20000/x = .5. ⇒ x = 28,854. Comment: One could derive the mode: f(x) = θe−θ/x/x2 . fʼ(x) = -2f(x)/x + f(x)θ/x2 = 0. ⇒ x = θ/2.
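These last two answers are quick to confirm numerically; the snippet below is my own check, not part of the solutions.

import math

# 22.37: Exponential with median 4, so theta = 4/ln(2); survival at 5.
theta = 4 / math.log(2)
print(theta, math.exp(-5 / theta))     # about 5.771 and 0.421

# 22.38: Inverse Exponential with mode theta/2 = 10,000, so theta = 20,000;
# the median solves exp(-theta/x) = 0.5, i.e. x = theta/ln(2).
print(20000 / math.log(2))             # about 28,854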
22.39. C. LER(x) = E[X ∧ x]/E[X] = θ(1 - e^(-x/θ))/θ = 1 - e^(-x/θ).
LER(d) = 1 - e^(-d/θ) = 0.7. ⇒ d = 1.204θ. LER(4d/3) = LER(1.605θ) = 1 - e^(-1.605θ/θ) = 1 - e^(-1.605) = 80.0%.
22.40. E. For the Exponential, LER[x] = E[X ∧ x]/E[X] = 1 - e^(-x/θ).
1 - e^(-500/1000) = 0.3935. We want: 1 - e^(-d/1000) = (2)(0.3935) = 0.7869. ⇒ d = 1546.
22.41. C. Due to the memoryless property of the Exponential, the expected payment per payment is θ, regardless of the deductible. Therefore, for a deductible of d, the expected payment per loss is: S(d)θ = θe^(-d/θ). Thus 2000 = θe^(-100/θ). ⇒ θ = 2000e^(100/θ). Therefore, the expected payment per loss for a deductible of 500 is: θe^(-d/θ) = 2000e^(100/θ) e^(-500/θ) = 2,000 e^(-400/θ).
Alternately, the expected payment per loss is: E[X] - E[X ∧ d] = θ - θ(1 - e^(-d/θ)) = θe^(-d/θ). Proceed as before.
Comment: One could solve numerically for θ with result θ = 2098. 22.42. A. Xi is Exponential. ΣXi is Gamma with α = n and θ. X = ΣXi/n is Gamma with α = n and θ/n. Therefore X has 2nd moment: (θ/n)2 (n)(n + 1) = {(n+1)/n} θ2. Alternately, E[ X 2 ] = Var[ X ] + E[ X ]2 = Var[X]/n + E[X]2 = θ2/n + θ2 = {(n+1)/n} θ2. Alternately, for i = j, E[XiXj] = E[X2 ] = 2θ2. For i ≠ j, E[XiXj] = E[Xi] E[Xj] = E[X]2 = θ2. E[ X 2 ] = E[ΣXi/n ΣXj/n] = Σ Σ E[XiXj]/n2 = {(n)(2θ2) + (n2 - n)θ2}/n2 = {(n+1)/n} θ2.
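A quick simulation (my own check, not from the exam solution) illustrates the 22.42 result E[X̄²] = {(n+1)/n}θ² for the mean of n independent Exponentials; the choices n = 5 and θ = 100 are arbitrary.

import random

def second_moment_of_sample_mean(n=5, theta=100.0, trials=200_000, seed=7):
    """Average of Xbar^2 over many samples of n Exponential(theta) draws."""
    random.seed(seed)
    total = 0.0
    for _ in range(trials):
        xbar = sum(random.expovariate(1.0 / theta) for _ in range(n)) / n
        total += xbar * xbar
    return total / trials

print(second_moment_of_sample_mean())      # simulated value
print((5 + 1) / 5 * 100.0**2)              # formula: {(n+1)/n} theta^2 = 12,000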
Section 23, Single Parameter Pareto Distribution
The Single Parameter Pareto Distribution is described in Appendix A.4.1.4 of Loss Models. It is not the same as the Pareto distribution described in Appendix A.2.3.1 of Loss Models.99 The Single Parameter Pareto applies to a size of claim distribution above a lower limit θ > 0.100
F(x) = 1 - (θ/x)^α, x > θ. Note that F(θ) = 0.
f(x) = αθ^α / x^(α+1), x > θ.
Since this single parameter distribution is simple to work with it is very widely used by actuaries in actual applications involving excess losses or layers of loss.101 It also has appeared in many past exam questions.
Exercise: What is the limited expected value for the Single Parameter Pareto Distribution?
[Solution: E[X ∧ x] = ∫_θ^x y f(y) dy + x S(x) = ∫_θ^x αθ^α/y^α dy + x(θ/x)^α
= αθ^α {x^(1-α) - θ^(1-α)}/(1 - α) + θ^α x^(1-α) = αθ/(α - 1) - θ^α/{(α - 1)x^(α-1)}.
Comment: In Appendix A of Loss Models, E[(X ∧ x)^k] = αθ^k/(α - k) - kθ^α/{(α - k)x^(α-k)}.
For k = 1, E[X ∧ x] = αθ/(α - 1) - θ^α/{(α - 1)x^(α-1)}, matching the above formula.]
99 If one takes F(x) = 1 - {(β+θ)/(x+θ)}^α for x > β, then one gets a distribution function of which the “Pareto” and “Single Parameter Pareto” are each special cases. One gets the former “Pareto” for β = 0 and the latter “Single Parameter Pareto” for θ = 0.
100 The Single Parameter Pareto is designed to work directly with data truncated from below at θ. See “Mahlerʼs Guide to Fitting Loss Distributions.”
101 See “A Practical Guide to the Single Parameter Pareto Distribution,” by Stephen W. Philbrick, PCAS LXXII, 1985, pp. 44.
Exercise: Using the formula for the Limited Expected Value, what is the mean excess loss, e(x), for the Single Parameter Pareto Distribution?
[Solution: e(x) = {E[X] - E[X ∧ x]}/S(x) = [αθ/(α - 1) - {αθ/(α - 1) - θ^α/((α - 1)x^(α-1))}] / (θ/x)^α = x/(α - 1).]
Single Parameter Pareto Distribution
Support: x > θ          Parameters: α > 0 (shape parameter)
D. f.:  F(x) = 1 - (θ/x)^α
P. d. f.:  f(x) = αθ^α/x^(α+1) = (α/θ)(θ/x)^(α+1)
Moments: E[X^n] = αθ^n/(α - n), α > n
Mean = αθ/(α - 1), α > 1
Variance = αθ²/{(α - 1)²(α - 2)}, α > 2
Coefficient of Variation = 1/√{α(α - 2)}, α > 2
Skewness = {2(α + 1)/(α - 3)} √{(α - 2)/α}, α > 3
Mode = θ          Median = θ 2^(1/α)
Limited Expected Value Function: E[X ∧ x] = αθ/(α - 1) - θ^α/{(α - 1)x^(α-1)}, α > 1
R(x) = Excess Ratio = (1/α)(x/θ)^(1-α), α > 1, x > θ
e(x) = Mean Excess Loss = x/(α - 1), α > 1
Derivative of the d.f. with respect to α: ∂F(x)/∂α = -(θ/x)^α ln(x/θ)
Method of Moments: α = m1/(m1 - θ)
Percentile Matching: α = -ln(1 - p1)/ln(x1/θ)
Method of Maximum Likelihood: α = N / Σ ln[xi/θ]
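As a rough numerical companion to these formulas (this sketch and its function names are my own, not from the text), the following computes the quantities most often needed in the problems; θ = 9 and α = 2.5 match the distribution used in the problems of this section.

def spp_survival(x, alpha, theta):
    """S(x) = (theta/x)^alpha for x > theta, Single Parameter Pareto."""
    return (theta / x) ** alpha

def spp_mean(alpha, theta):
    return alpha * theta / (alpha - 1)              # requires alpha > 1

def spp_limited_ev(x, alpha, theta):
    """E[X ∧ x] = alpha*theta/(alpha-1) - theta^alpha / {(alpha-1) x^(alpha-1)}."""
    return alpha * theta / (alpha - 1) - theta**alpha / ((alpha - 1) * x**(alpha - 1))

def spp_excess_ratio(x, alpha, theta):
    return (x / theta) ** (1 - alpha) / alpha

print(spp_mean(2.5, 9))               # 15
print(spp_limited_ev(20, 2.5, 9))     # about 13.19
print(spp_excess_ratio(20, 2.5, 9))   # about 0.121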
Probability density function of a Single Parameter Pareto with θ = 1000 and α = 2:
[Graph: the density equals α/θ = 0.002 at x = θ = 1000 and declines toward zero; shown for x up to 5000.]
Problems: Use the following information for the next nine questions: X has the probability density function:
f(x) = 607.5 x-3.5, x ≥ 9.
23.1 (1 point) Which of the following is the mean of X? A. less than 12 B. at least 12 but less than 14 C. at least 14 but less than 16 D. at least 16 but less than 18 E. at least 18 23.2 (1 point) Which of the following is the median of X? A. less than 12 B. at least 12 but less than 14 C. at least 14 but less than 16 D. at least 16 but less than 18 E. at least 18 23.3 (1 point) Which of the of following is the mode of X? A. less than 10 B. at least 10 but less than 11 C. at least 11 but less than 12 D. at least 12 but less than 13 E. at least 13 23.4 (1 point) What is the chance that X is greater than 30? A. less than 1% B. at least 1% but less than 2% C. at least 2% but less than 3% D. at least 3% but less than 4% E. at least 4% 23.5 (2 points) What is the variance of X? A. less than 170 B. at least 170 but less than 190 C. at least 190 but less than 210 D. at least 210 but less than 230 E. at least 230
23.6 (1 point) What is the coefficient of variation of X? A. less than 1 B. at least 1 but less than 2 C. at least 2 but less than 3 D. at least 3 E. Can not be determined 23.7 (2 points) What is the skewness of X? A. less than 0 B. at least 0 but less than 2 C. at least 2 but less than 4 D. at least 4 E. Can not be determined 23.8 (3 points) What is the Limited Expected Value at 20? A. less than 8 B. at least 8 but less than 10 C. at least 10 but less than 12 D. at least 12 but less than 14 E. at least 14 23.9 (2 points) What is the Excess Ratio at 20? A. less than 9% B. at least 9% but less than 10% C. at least 10% but less than 11% D. at least 11% but less than 12% E. at least 12%
23.10 (3 points) The large losses for Pickwick Insurance are given by X: f(x) = 607.5 x-3.5, x ≥ 9. Pickwick Insurance expects 65 such large losses per year. Pickwick Insurance reinsures the layer of loss from 20 to 30 with Global Reinsurance. How much does Global Reinsurance expect to pay per year for losses from Pickwick Insurance? A. less than 20 B. at least 20 but less than 30 C. at least 30 but less than 40 D. at least 40 but less than 50 E. at least 50
23.11 (2 points) You are modeling the distribution of the diameters of those meteors that have diameters greater than 1 meter and that hit the atmosphere of the Earth. If you use a Single Parameter Pareto Distribution for the model, what are the possible reasonable values of α and θ? 23.12 (2 points) X follows a Single Parameter Pareto Distribution. What is the expected value of ln(X/θ)? A. 1/(α - 1)
B. 1/α
C. θ/(α - 1)
D. θ/α
E. αθ/(α - 1)
23.13 (2 points) You are given the following graph of a Single Parameter Pareto Distribution.
[Graph: the density starts at 0.06 at x = 50 and declines; the x-axis runs from 50 to 200.]
What is the variance of this Single Parameter Pareto Distribution?
A. less than 1600
B. at least 1600 but less than 1800
C. at least 1800 but less than 2000
D. at least 2000 but less than 2200
E. at least 2200
23.14 (3 points) The Pareto principle, named after economist Vilfredo Pareto, states that for many phenomena, 80% of the consequences stem from 20% of the causes. If the size of loss follows a Single Parameter Pareto Distribution, for what value of α is it the case that 80% of the aggregate losses are expected to come from the largest 20% of the loss events? A. less than 1.1 B. at least 1.1 but less than 1.2 C. at least 1.2 but less than 1.3 D. at least 1.3 but less than 1.4 E. at least 1.4 23.15 (4, 5/86, Q.60) (1 point) Given a Single Parameter Pareto distribution F(x; c, α) = 1 - (c/x)α, for x > c, for a random variable x representing large losses. Which of the distribution functions shown below represents the distribution function of x truncated from below at d, d > c? A. 1 - {(c - d)/x}α
x>c-d
B. 1 - {(c + d)/x}α x > c + d
C. 1 - (d/x)α
x>d
D. 1 - {(d - c)/x}α x > d - c
E. 1 - {d/(x - d)}α
x>d
23.16 (4, 5/89, Q.25) (1 point) The distribution function of the random variable X is F(x) = 1 - 1/x2 , x ≥ 1. Which of the following are true about the mean, median, and mode of X? A. mode < mean < median B. mode < median < mean C. mean < mode < median D. median < mean and the mode is undefined E. None of the above 23.17 (4, 5/90, Q.30) (1 point) Losses, denoted by T, have the probability density function: f(T) = 4 T-5 for 1 ≤ T < ∞ . What is the coefficient of variation of T? A. 1/8
B. 1/4
C. √2/4
D. 3/4
E. 3√2/4
23.18 (4, 5/90, Q.31) (1 point) Losses, denoted by T, have the probability density function: f(T) = 4 T-5 for 1 ≤ T < ∞ . What is the coefficient of skewness of T? A. 5
B. 5√2
C. 20/27
D. 5/9
E. 5√2/9
23.19 (4, 5/90, Q.33) (1 point) Losses, denoted by T, have the probability density function: f(T) = 4 T-5 for 1 ≤ T < ∞. What is the actual 95th percentile of T? A. less than 2.25 B. at least 2.25 but less than 2.50 C. at least 2.50 but less than 2.75 D. at least 2.75 but less than 3.00 E. at least 3.00 23.20 (4B, 5/92, Q.15) (2 points) Determine the coefficient of variation of the claim severity distribution f(x) = (5/2) x-7/2 , x > 1. A. Less than 0.70 B. At least 0.70 but less than 0.85 C. At least 0.85 but less than 1.00 D. At least 1.00 but less than 1.15 E. At least 1.15 23.21 (1, 5/00, Q.34) (1.9 points) An insurance policy reimburses a loss up to a benefit limit of 10. The policyholderʼs loss, Y, follows a distribution with density function: f(y) = 2/y3 , y > 1. What is the expected value of the benefit paid under the insurance policy? (A) 1.0 (B) 1.3 (C) 1.8 (D) 1.9 (E) 2.0 23.22 (1, 11/00, Q.25) (1.9 points) A manufacturerʼs annual losses follow a distribution with density function f(x) = 2.5 (0.62.5) / x3.5, x > 0.6. To cover its losses, the manufacturer purchases an insurance policy with an annual deductible of 2. What is the mean of the manufacturerʼs annual losses not paid by the insurance policy? (A) 0.84 (B) 0.88 (C) 0.93 (D) 0.95 (E) 1.00 23.23 (1, 5/01, Q.39) (1.9 points) An insurance company insures a large number of homes. The insured value, X, of a randomly selected home is assumed to follow a distribution with density function f(x) = 3x-4, x > 1. Given that a randomly selected home is insured for at least 1.5, what is the probability that it is insured for less than 2? (A) 0.578 (B) 0.684 (C) 0.704 (D) 0.829 (E) 0.875
23.24 (3, 11/01, Q.37 & 2009 Sample Q.103) (2.5 points) For watches produced by a certain manufacturer: (i) Lifetimes follow a single-parameter Pareto distribution with α > 1 and θ = 4. (ii) The expected lifetime of a watch is 8 years. Calculate the probability that the lifetime of a watch is at least 6 years. (A) 0.44 (B) 0.50 (C) 0.56 (D) 0.61 (E) 0.67 23.25 (1, 5/03, Q.22) (2.5 points) An insurer's annual weather-related loss, X, is a random variable with density function f(x) = 2.5 2002.5/x3.5, x > 200. Calculate the difference between the 30th and 70th percentiles of X . (A) 35 (B) 93 (C) 124 (D) 231 (E) 298
Solutions to Problems: 23.1. C. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. Mean = {α / (α - 1)} θ = (2.5 / 1.5)(9) = 15. Alternately, one can integrate xf(x) from 9 to ∞. 23.2. A. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. F(x) = 1 - (x / θ)-α = 1 - (x / 9)-2.5. At the median we want F(x) =.5: .5 = (x / 9)-2.5. Therefore, x = (9) .5-1/2.5 = 11.9. 23.3. A. The mode of the Single Parameter Pareto Distribution is always θ which in this case is 9. (The density decreases for x >θ and thus attains its maximum at x=θ.) 23.4. E. F(x) = 1 - (x / θ)-α = 1 - (x / 9)-2.5. 1 - F(30) = (30/9)-2.5 = 0.049. 23.5. B. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. Variance = (α / [ (α - 2) (α − 1)2 ]) θ2 = (92 )(2.5) / [(.5)(1.52 )] = 180. Alternately, one can compute the second moment as the integral of x2 f(x) from 9 to ∞. ∞
∞
∞
∞
∫ x2 f(x) dx = ∫ x2 607.5 x-3.5 dx = 607.5 ∫ x-1.5 dx = (-607.5/.5) x-.5 ] = 1215 / 3 = 405. 9
9
9
9
Thus the variance is 405 - 152 = 405 - 225 = 180. 23.6. A. From the solutions to previous questions, the mean is 15 and the variance is 180, the coefficient of variation is:
√180 / 15 = 0.894.
Alternately, the coefficient of variation is 1/√{α(α - 2)} = 1/√{(2.5)(2.5 - 2)} = 0.894. 23.7. E. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. The skewness only exists for α > 3, therefore in this case the skewness does not exist. (If one tries to calculate the third moment by taking the integral of x³ f(x) from 9 to ∞, one gets infinity due to evaluating x^0.5 at ∞.)
23.8. D. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. E[X ∧ x] = θ [{α − (x/θ)1−α} / (α − 1)] . E[X ∧ 20] = 9[ {2.5 -(20/9)-1.5}/1.5] = 13.19. Alternately, one can compute the integral of xf(x) from θ to x, from 9 to 20: 20
20
20
∫ x f(x) dx = ∫ x 607.5 9
20
x-3.5 dx = 607.5 ∫ x-2.5 dx = (-607.5/1.5) x-1.5 ]
9
9
9
= -405 (.01118 - .03704) = 10.473. E[X ∧ 20] is the sum of the above integral plus 20{1 - F(20)} = 110.47 + 20(9/20)2.5 = 10.473 + 2.717 = 13.19. 23.9. E. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. R(x) = Excess Ratio = (1/α) (x/θ)1−α . R((20) = (1/2.5) (20/9)-1.5 = 12.1%. Alternately, one can compute the integral of 1-F(x) = S(x) from 20 to ∞: ∞
∫ (x/9)-2.5 20
∞
∞
dx = 92.5 ∫ x-2.5 dx = (-243) x-1.5/1.5 ] = (-162) (0 - 0.01118) = 1.811. 20
20
This integral is the losses excess of 20. R(20) is the ratio of the above integral to the mean. Mean = (α / (α − 1))θ = (2.5 / 1.5) (9) = 15. Thus R(20) = 1.811 / 15 = 12.1%. Alternately, using previous solutions, R(20) = 1 - E[X ∧ 20]/E[X] = 1 - 13.19/15 = 12.1%. 23.10. E. Single Parameter Pareto Distribution with θ = 9 and α = 2.5. E[X ∧ x] = θ [{α − (x/θ)1−α} / (α − 1)] . E[X ∧ 30] = 9[ {2.5 -(30/9)-1.5}/1.5] = 14.01. E[X ∧ 20] = 9[ {2.5 -(20/9)-1.5}/1.5] = 13.19. 65 large losses expected per year, so that the Reinsurer expects in the layer from 30 to 20: 65{ E[X ∧ 30] - E[X ∧ 20] } = (65)(14.01-13.19) = 53. Comment: The Reinsurer only expects to make payments on (65)S(20) = (65)(20/9)-2.5 = 8.8 losses per year. Of these (65)S(30) = (65)(30/9)-2.5 =3.2 are expected to exceed the upper limit of the layer; on such claims the reinsurer pays the width of the layer or 10.
2013-4-2,
Loss Distributions, §23 Single Parameter Pareto
HCM 10/8/12,
Page 259
23.11. Since the data is truncated from below at 1 meter, one takes θ = 1 meter. The volume of a meteor is proportional to d3 . Therefore, assuming some reasonable average density, the mass of a meteor is also proportional to d3 . The average mass of meteors hitting the Earth should be finite. Therefore, the distribution of d should have a finite third moment. Therefore, α > 3. Comment: Beyond what you are likely to be asked on your exam. However, it is important to know that the Single Parameter Pareto does not have all of its moments. 23.12. B. Let y = α ln(x/θ). x = θey/α. dx = θey/α dy/α. ∞
∞
∞
∫ln(x/θ) α θα / x(α+1) dx = ∫y θα / (θey/α)α+1 θey/α dy/α = (1/α) ∫ y e-y dy = Γ(2)/α = 1/α. θ
0
0
α θα 23.13. C. For the Single Parameter Pareto, f(x) = α + 1 , x > θ. x Since the graph starts at 50, θ = 50. f(θ) = α/θ. Therefore, 0.06 = α/θ. ⇒ α = (50)(0.06) = 3. E[Xn ] =
α θn , α > n. E[X] = (3)(50)/2 = 75. E[X2 ] = (3)(502 )/1 = 7500. α−n
Var[X] = 7500 - 752 = 1875.
2013-4-2,
Loss Distributions, §23 Single Parameter Pareto
HCM 10/8/12,
Page 260
23.14. B. The scale parameter θ will drop out of the analysis, so for convenience take θ = 1. We want 20% of the aggregate losses to come from the smallest 80% of the loss events. E[X] = θα/(α - 1) = α/(α - 1). The 80th percentile is such that 0.2 = S(x) = 1/xα. ⇒ x = 51/α. Dollars from those losses of size less than x = 51/α is: E[X
∧
x] - xS(x) = α/(α - 1) - 1/{51/α(α - 1)} - 51/α(0.2) = {α/(α - 1)}(1 - 51/α/5).
The portion of the total dollars from those loss of size less than 80th percentile is the above divided by the mean: 1 - 51/α/5. Thus we want: 0.2 = 1 - 51/α/5. ⇒ 4 = 51/α. ⇒ α = 1.161. Comment: For θ = 1 and α = 1.161, E[X] = 1.161/0.161 = 7.211, the 80th percentile is: 51/1.161 = 4, and E[X ∧ 4] = 1.161/0.161 - 1/{(0.161)(4.161)} = 2.242. Dollars from those loss of size less than 4 is: 2.242 - (0.2)(4) = 1.442. 1.442/ 7.211 = 0.20. 23.15. C. The distribution function for the data truncated from below at d is: G(x) = (F(x)-F(d)/(1-F(d)) for x >d. In this case G(x) = ((c/d)α - (c/x)α) / (c/d)α = 1 - (d/x)α for x >d. 23.16. B. The density f(x) = 2 x−3, x ≥ 1. Since the density declines for all x ≥ 1, it has its maximum at x =1. The mode is 1. The mean is the integral from 1 to ∞ of xf(x) which is ∞
∞
∫ 2x−2 dx = -2/x ] x=1
= 2. Thus the mean = 2.
x=1
The median is such that F(x) = .5. Thus .5 = 1 - 1/x2 . median =
2 = 1.414.
Comment: A Single Parameter Pareto Distribution, with α = 2 and θ = 1. Mean = (α / (α − 1)) θ = (2/1)(1) = 2. Mode = θ = 1. Median = θ 21/α = 21/2 = 1.414. For a continuous distribution with positive skewness, such as the Single Parameter Pareto Distribution, typically: mean > median > mode (alphabetical order.)
2013-4-2,
Loss Distributions, §23 Single Parameter Pareto
23.17. C.
∞
∞
∞
HCM 10/8/12,
Page 261
x=∞
∫ xf(x)dx = ∫ x(4x-5)dx = 4∫ x-4 dx = -4x-3 /3 ] = 4/3.
mean = 1
1
1
∞
second moment =
x=1
∞
∞
x=∞
∫ x2f(x)dx = ∫ x2(4x-5)dx = 4∫ x-3 dx = -4x-2 /2 ] = 2.
1
1
1
x=1
Thus the variance is: 2 - (4/3)2 = 2/9. The standard deviation is:
2 / 3.
Coefficient of Variation = Standard Deviation / Mean = ( 2 / 3) / (4/3) =
2 / 4.
Comment: A Single Parameter Pareto Distribution with α = 4 and θ = 1. The CV = (α(α−2))-.5 = 8-.5 = 2 / 4. 23.18. B.
∞
third moment =
∞
∞
∞
∫ x3f(x)dx = ∫ x3(4x-5)dx = 4∫ x-2 dx = -4x-1 /1 ] = 4.
1
1
1
1
Thus the skewness = {µ3 ′ - (3 µ1 ′ µ2 ′) + (2 µ1 ′3)} / STDDEV3 = [4 - {(3) (4/3)(2)} + {(2)(4/3)3 }] / ( 2 / 3)3 = {128/27 - 4} / {2 2 / 27} = 20 / (2 2 ) = 5 2 . 23.19. f(T) = 4 T-5 for T ≥ 1, so that taking the integral F(T) = 1 - T-4 for T ≥ 1. At the 95th percentile 0.95 = F(T) = 1 - T-4. Therefore T = (1/.05)1/4 = 2.115. 23.20. C. The mean is
∞
∞
∞
∫ x f(x) dx = ∫ x (5/2) x-7/2 dx = -{(5/2)/(3/2)}x-3/2 ] = 5/3
x=1
x=1
x=1
∞
The 2nd moment is
∞
∫ x2 (5/2) x-7/2 dx = -{(5/2)/(1/2)}x-1/2 ] = 5
x=1
x=1
Thus the variance = 5 - (5/3)2 = 2.22. The coefficient of variation is:
2.22 / (5/3) = 0.894.
Comment: This is a Single Parameter Pareto Distribution, with parameters θ =1 and α = 5/2 = 2.5. It has coefficient of variation equal to:
1 = α(α − 2)
1 (2.5)(2.5 − 2)
= 0.894.
2013-4-2,
Loss Distributions, §23 Single Parameter Pareto
HCM 10/8/12,
Page 262
23.21. D. The density of a Single Parameter Pareto Distribution is: αθα / xα+1, x > θ. This is a Single Parameter Pareto Distribution with α = 2 and θ = 1. E[X
∧
x] = αθ/ (α-1) - θα/{(α-1)xα−1}. E[X
∧
10] = 2 - 1/10 = 1.9.
23.22. C. The mean annual losses not paid by the insurance policy is E[X
∧
2].
This is a Single Parameter Pareto Distribution with α = 2.5 and θ = 0.6. E[X
∧
x] = αθ/ (α-1) - θα/{(α-1)xα−1}. E[X
∧
2] = (2.5)(0.6)/1.5 - (0.62.5)/{(1.5)21.5} = 0.9343.
23.23. A. F(x) = 1 - 1/x3 , x > 1. Prob[X < 2 | X > 1.5] = (F(2) - F(1.5))/S(1.5) = (7/8 - .7037)/.2963 = 0.578.
23.24. A. For the Single Parameter Pareto Distribution, E[X] = αθ/(α-1). Therefore, 8 = 4α/(α-1) ⇒ α = 2. S(x) = (θ/x)α. S(6) = (4/6)2 = 4/9 = 0.444.
23.25. B. F(x) = 1 - (200/x)2.5, x > 200. F(Π30) = .3. ⇒ Π30 = 200(1/.7)1/2.5 = 230.7. F(Π70) = .7. ⇒ Π70 = 200(1/.3)1/2.5 = 323.7. Π70 - Π30 = 323.7 - 230.7 = 93.
Section 24, Common Two Parameter Distributions For the exam it is important for the student to become familiar with the material in Appendix A of Loss Models.102 Here are the four most commonly used Distributions with Two Parameters: Pareto, Gamma, LogNormal, and Weibull.103 Pareto: α is a shape parameter and θ is a scale parameter. Notice the factor of θn in the moments. The Pareto is a heavy-tailed distribution. Higher moments may not exist. The coefficient of variation (when it exists) is always greater than one; the standard deviation is always greater than the mean.104 The skewness for the Pareto is always greater than twice the coefficient of variation. ⎛ θ ⎞α F(x) = 1 - ⎜ ⎟ = 1 - (1 + x / θ)−α. ⎝θ + x⎠
f(x) = αθ^α/(θ + x)^(α+1) = (α/θ)(1 + x/θ)^-(α + 1)
Mean = θ/(α − 1), α > 1
E[X²] = 2θ²/{(α − 1)(α − 2)}, α > 2
Variance = αθ²/{(α − 1)²(α − 2)}, α > 2
E[X^n] = n! θ^n / {(α − 1)(α − 2)...(α − n)}, α > n
Coefficient of Variation = √{α/(α − 2)}, α > 2
Skewness = 2 √{(α − 2)/α} (α + 1)/(α − 3), α > 3
Mode = 0.          Median = θ(2^(1/α) − 1).
E[X ∧ x] = {θ/(α − 1)}{1 − (θ/(θ + x))^(α−1)}, α > 1
Loss Elimination Ratio = 1 − (1 + x/θ)^(1−α), α > 1.102
Mean Excess Loss = (θ + x)/(α − 1), α > 1
Excess Ratio = (1 + x/θ)^(1−α), α > 1
There are a few other distributions used by actuaries than those listed there, and the distributions are sometimes parameterized in a different manner. 103 In my opinion. See a subsequent section for additional two parameter distributions in Loss Models. 104 This fact is also equivalent to the fact that for the Pareto E[X2 ] > 2 E[X]2 .
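As a quick numerical companion to these formulas (my own sketch; the helper names are mine, and the α = 3, θ = 60 values are the ones used in the graph that follows), the basic two-parameter Pareto quantities are easy to compute:

def pareto_sf(x, alpha, theta):
    """S(x) = {theta/(theta + x)}^alpha for the two-parameter Pareto."""
    return (theta / (theta + x)) ** alpha

def pareto_mean(alpha, theta):
    return theta / (alpha - 1)                        # alpha > 1

def pareto_limited_ev(x, alpha, theta):
    """E[X ∧ x] = {theta/(alpha-1)} {1 - (theta/(theta+x))^(alpha-1)}."""
    return theta / (alpha - 1) * (1 - (theta / (theta + x)) ** (alpha - 1))

def pareto_mean_excess(x, alpha, theta):
    return (theta + x) / (alpha - 1)                  # e(x), alpha > 1

alpha, theta = 3.0, 60.0
print(pareto_mean(alpha, theta))              # 30
print(pareto_limited_ev(100, alpha, theta))   # limited expected value at 100
print(pareto_mean_excess(100, alpha, theta))  # 80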
Hereʼs a graph of the density function of a Pareto Distribution with α = 3 and θ = 60:
[Graph: f(x) declines from f(0) = α/θ = 0.05 toward zero; shown for x from 0 to 100.]
Exercise: Losses prior to any deductible follow a Pareto Distribution with parameters α = 1.7 and θ = 30. A policy has a deductible of size 10. What is the distribution of non-zero payments under that policy?
[Solution: After truncating and shifting by d, G(x) = 1 - S(x+d)/S(d) = 1 - S(x+10)/S(10) = 1 - {30/(30 + x + 10)}^1.7 / {30/(30 + 10)}^1.7 = 1 - {40/(40 + x)}^1.7.
Comment: This is a Pareto Distribution with α = 1.7 and θ = 40.]
If losses prior to any deductible follow a Pareto Distribution with parameters α and θ, then after truncating and shifting from below by a deductible of size d:
G(x) = 1 - S(x+d)/S(d) = 1 - {θ/(θ + x + d)}^α / {θ/(θ + d)}^α = 1 - {(θ + d)/(θ + d + x)}^α.
If losses prior to any deductible follow a Pareto Distribution with parameters α and θ, then after truncating and shifting from below by a deductible of size d, one gets another Pareto Distribution, but with parameters α and θ + d.105
The form of an Exponential Distribution is also preserved under truncation and shifting from below. While for the Exponential the parameter remains the same, for the Pareto the θ parameter becomes θ + d.
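This preservation property is easy to verify numerically. The snippet below is my own illustration (not from the text); it compares the survival function of the non-zero payments after a deductible d with that of a Pareto with parameters α and θ + d, using the α = 1.7, θ = 30, d = 10 values from the exercise above. The two printed columns should be identical.

def pareto_sf(x, alpha, theta):
    return (theta / (theta + x)) ** alpha

alpha, theta, d = 1.7, 30.0, 10.0
for x in (5, 25, 100, 400):
    truncated_shifted = pareto_sf(x + d, alpha, theta) / pareto_sf(d, alpha, theta)
    shifted_parameter = pareto_sf(x, alpha, theta + d)
    print(x, round(truncated_shifted, 6), round(shifted_parameter, 6))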
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 265 Exercise: Losses prior to any deductible follow a Pareto Distribution with parameters α = 1.7 and θ = 30. What is the mean non-zero payment under a policy that has a deductible of size 10? [Solution: The non-zero payments follow a Pareto Distribution with α = 1.7 and θ = 40, with a mean of 40/(1.7-1) = 57.1. Alternately, the mean of the data truncated and shifted from below is the mean excess loss for the original Pareto Distribution. e(x) = (θ+x)/(α-1). e(10) = (30 + 10)/(1.7-1) = 57.1.] Gamma:106 107 α is a shape parameter and θ is a scale parameter.108 Note the factors of θ in the moments. For α = 1, the Gamma is an Exponential Distribution. The Gamma always has well-defined moments and is thus not as heavy-tailed as other distributions such as the Pareto. The sum of two independent random variables each of which follows a Gamma distribution with the same scale parameter, is also a Gamma distribution; it has a shape parameter equal to the sum of the two shape parameters and the same scale parameter. Specifically the sum of n independent identically distributed variables which are Gamma with parameters α and θ is a Gamma distribution with parameters nα and θ. The Gamma is infinitely divisible; if X follows a Gamma, then given any n >1 we can find a random variable Y which also follows a Gamma, such that adding up n independent version of Y gives X. Take n independent copies of a Gamma with parameters α/n and θ. Their sum is a Gamma with parameters α and θ. For a positive integral shape parameter, α = m, the Gamma distribution is the sum of m independent variables each of which follows an Exponential distribution. Thus for α = 1, we get an Exponential. The sum of two independent, identically distributed Exponential variables follows a Gamma Distribution with α = 2. As α approaches infinity the Gamma approaches a Normal distribution by the Central Limit Theorem. The Gamma has variance equal to αθ2, the sum of α identical independent exponential distributions each with variance θ2. 106
The incomplete Gamma Function, which underlies the Gamma Distribution, is covered in “Mahlerʼs Guide to Frequency Distributions.” 107 The Gamma Distribution is sometimes called a Pearson Type III Distribution. 108 In Actuarial Mathematics θ is replaced by 1/β.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 266 For α = ν /2, the Gamma is a Chi-Square distribution with ν degrees of freedom.109 A Chi-Square distribution with ν degrees of freedom, is in turn the sum of ν squares of independent unit Normal Distributions Thus the Gamma is a sum of ν = 2α squares of unit Normal Distributions. Also note that the series given in the Theorem A.1 in Appendix A of Loss Models for the Incomplete Gamma function, for α = m, is the sum of Poisson probabilities for the number of events greater than or equal to m. F(x) = Γ(α; x/θ)
Mean = αθ          Variance = αθ²
f(x) = (x/θ)^α e^(-x/θ) / {x Γ(α)} = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}.
E[X^k] = θ^k α(α + 1)...(α + k − 1), for k a positive integer.
E[X^k] = θ^k Γ(α + k)/Γ(α), k > −α.
Mode = θ(α − 1), α > 1. Mode = 0, α ≤ 1.
Points of inflection: θ{α − 1 ± √(α − 1)}, α > 2; θ{α − 1 + √(α − 1)}, 2 ≥ α > 1.
Coefficient of Variation = 1/√α          Skewness = 2/√α = 2CV.          Kurtosis = 3 + 6/α = 3 + 6CV².
109
The Chi-Square with ν degrees of freedom is a Gamma with α = ν /2, θ = 2, mean ν, and variance 2ν.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 267 Hereʼs a graph of the density function of a Gamma Distribution with α = 3 and θ = 10: Prob. 0.025 0.02 0.015 0.01 0.005
20
40
60
80
100
Size
For α = 3, the Gamma is a peaked curve, skewed to the right .110 Note that while Iʼve only shown x ≤ 100, the density is positive for all x > 0. Note that for α ≤ 1, rather than a peaked curve, we get a curve with mode of zero.111 Note that for very large alpha, one would get a curve much less skewed to the right.
110 111
This general description applies to the densities of most Loss Distributions. For alpha =1, one gets an Exponential Distribution, with mode of zero.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 268 LogGamma:112 If ln[X] follows a Gamma Distribution, then X follows a LogGamma Distribution. The distribution function of the LogGamma Γ[a ; ln(x)/θ] is just that of the Gamma Γ(α ; x/θ) with ln(x) in place of x. In order to derive the density of the LogGamma, one can differentiate the distribution function, but must remember to use the chain rule and take into account the change of variables. Let y = ln(x) . F(x) = Γ(α ; ln(x)/θ) = Γ(α ; y/θ). Therefore, f(x) = dF/dx = (dF/dy)(dy/dx) = (density function of Gamma in terms of y) (1/x) = {θ−αy α−1 e−y/θ / Γ(α)} / x = θ−α{ln(x)}α−1 e−ln(x)/θ /Γ(α)} / x = θ−α {ln(x/θ)}α−1 / {x1+1/θ Γ(α) } , x > 1. α is a shape parameter and θ is not exactly a scale parameter. For very large α the distribution approaches a LogNormal distribution, (just as the Gamma approaches the Normal distribution.) For α = 1, ln[X] follows an Exponential Distribution, and one gets a Single Parameter Distribution. If one were to graph the size of loss distribution, but have the x-axis (the size of loss) on a logarithmic scale, then the size of loss distribution would look much less skewed. If ln(x) followed a Gamma, then x itself follows a LogGamma distribution. The LogGamma is much more skewed than the Gamma distribution. The product of independent LogGamma variables with the same θ parameter is a LogGamma with the same θ parameter and the sum of the individual α parameters.113
112
The LogGamma is not in Appendix A of Loss Models, and is extremely unlikely to be asked about on your exam. For more information on the LogGamma see for example Loss Distributions by Hogg & Klugman. 113 This follows from the fact that the sum of two independent random variables each of which follows a Gamma distribution with the same scale parameter, is also a Gamma distribution; it has a shape parameter equal to the sum of the two shape parameters and the same scale parameter.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 269 LogNormal: If one were to graph the size of loss distribution, but have the x-axis (the size of loss) on a logarithmic scale, then the size of loss distribution would be much less skewed. If ln(x) follows a (symmetric) Normal, then x itself follows a LogNormal.114 The product of a series of independent LogNormal variables is also LogNormal.115 The only condition necessary to produce a LogNormal Distribution is that the amount of an observed value be the product of a large number of factors, each of which is independent of the size of any other factor.116 Please note that µ is not the mean of the LogNormal nor is σ the standard deviation. Rather µ is the mean and σ is the standard deviation of the Normal Distribution of the logs of the claim sizes. σ is a shape parameter; note the way the CV and skewness only depend on σ. As parameterized in Loss Models, the LogNormal Distribution does not have a scale parameter. However, we can rewrite the Distribution Function: F(x) = Φ[{ln(x)−µ} / σ] = Φ[{ln(x)−ln(eµ)} / σ] = Φ[{ln(x/eµ)} / σ] . Thus since everywhere x appears in the distribution function it is divided by eµ, eµ would be the scale parameter for the LogNormal. Thus if reparameterized in this way, the LogNormal Distribution would have a scale parameter. Note the way that (eµ)n appears in the formula for the moments of the LogNormal, another sign that eµ would be the scale parameter, if one parameterized the distribution differently. The LogNormal Distribution can also be used to model stock prices.117
⎡ ln(x) − µ ⎤ F(x) = Φ⎢ ⎥⎦ ⎣ σ
[
exp f(x) =
( ln(x)
− µ)2 2σ2 x σ 2π
]
Mean = exp[µ + σ2/2] Second moment = exp[2µ + 2σ2] 114
E[Xn ] = exp[nµ + n2 σ2/2] .
The LogNormal is less skewed than the LogGamma distribution, (because the Normal distribution is less skewed than the Gamma distribution.) 115 Since the sum of independent Normal variables is also a Normal. 116 Quoted from “Sampling Theory in Casualty Insurance,” by Arthur L. Bailey, PCAS 1942 and 1943. 117 See Derivative Markets by McDonald, not on the syllabus of this exam.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 270 Variance = exp(2µ + σ2) {exp(σ2) -1}
Skewness =
Coefficient of Variation = exp(σ2 ) - 1
exp(3σ2) - 3 exp(σ2) + 2 = (3 + CV2 ) CV. {exp(σ2 ) - 1}1.5
Mode = exp(µ − σ2)
Median = exp(µ)
The relationships between the Gamma, Normal, LogGamma, and LogNormal Distributions are shown below:118 Gamma
α →∞
⇒
⇓
y = ln(x)
Normal
⇓
y = ln(x)
LogGamma
⇒
α →∞
LogNormal
Exercise: A LogNormal Distribution has parameters µ = 5 and σ = 1.5. What are the mean and variance? [Solution: Mean = exp(µ+.5σ2) = exp(5 +(1.52 )/2) = 457.145. Second Moment = exp(2µ+2σ2) = exp[10 + (2)(1.52 )] = 1,982,759. Variance = 1,982,759 - 457.1452 = 1,773,777.] The formula for the moments of a LogNormal Distribution follows from the formula for its mean. If X is LogNormal with parameters µ and σ, then ln(X) is Normal with the same parameters. Therefore, n ln(X) is Normal with parameters nµ and nσ. Therefore, exp[n ln(X)] = Xn is LogNormal with parameters nµ and nσ. Therefore, E[Xn ] = exp[nµ + (nσ)2 /2] = exp[nµ + n2 σ2/2]. Exercise: For a LogNormal Distribution with µ = 8.0064 and σ = 0.6368, what are the mean, median and mode? [Solution: Mean = exp(µ + .5 σ2) = 3674. Median = exp(µ) = 3000. Mode = exp(µ − σ2) = 2000.] 118
A summary of the Normal Distribution appears in “Mahlerʼs Guide to Frequency Distributions.”
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 271 Here is a graph of this LogNormal Distribution, with µ = 8.0064 and σ = 0.6368: density
Mode
0.00025 Median 0.00020 Mean 0.00015
0.00010
0.00005
2000
3000 3674
x
The LogNormal is a heavy-tailed distribution, yet all of its (positive) moments exist.119 Its mode (place where the density is largest) is less than it median (place where the distribution function is 50%), which in turn is less than its mean (average). As σ increases, the LogNormal gets heavier-tailed. For a LogNormal Distribution with σ = 2, what is the ratio of the mean to the median? [Solution: Mean / Median = exp(µ + σ2/2) / exp(µ) = exp(σ2/2) = e2 = 7.39.] For a LogNormal Distribution with σ = 2, what is the ratio of the median to the mode? [Solution: Median / Mode = exp(µ) / exp(µ − σ2) = exp(σ2) = e4 = 54.6.] For a LogNormal Distribution with σ = 2, what is the probability of a loss of size less than the mean? [Solution: Mean = exp(µ + .5 σ2). F(mean) = Φ[(ln(mean) - µ)/σ)] = Φ[(µ + .5 σ2 - µ)/σ)] = Φ[.5 σ] = Φ[1] = 84.13%.] 119
See the section on tails of distributions. The LogNormal is the distribution with the heaviest tail such that all of its moments exist.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 272 Weibull: τ is a shape parameter, while θ is a scale parameter. τ = 1 is the Exponential Distribution.
⎡ ⎛ x ⎞ τ⎤ F(x) = 1 - exp⎢-⎜ ⎟ ⎥ ⎣ ⎝ θ⎠ ⎦
⎡ ⎛ x ⎞ τ⎤ ⎛ x ⎞τ τ ⎜ ⎟ exp ⎢-⎜ ⎟ ⎥ ⎛ x ⎞τ ⎝ θ⎠ τ ⎣ ⎝ θ⎠ ⎦ f(x) = = τ x τ− 1 exp -⎜ ⎟ . ⎝ θ⎠ x θ
[ ]
⎛ τ - 1⎞ 1 / τ The mode of a Weibull is: θ ⎜ ⎟ for τ > 1, and 0 for τ ≤ 1. ⎝ τ ⎠ The Weibull Distribution is a generalization of the Exponential. One applies a “power transformation” to the size of loss and gets a new more general distribution with two parameters from the Exponential Distribution with one parameter. So where x/θ appears in the Exponential Distribution, (x/θ)τ appears in the Weibull Distribution. ⎛ x ⎞τ S(x) = exp -⎜ ⎟ . Note that for large τ, the righthand tail can decrease very quickly since x is taken ⎝ θ⎠
[ ]
to a power in the exponential.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 273 Here is a graph of a Weibull Distribution with θ = 100 and τ = 2 (solid), compared to an Exponential Distribution with θ = 100 (dashed): Prob. 0.01
0.008
0.006
0.004
0.002
100
200
300
400
500
x
For τ sufficiently large the Weibull has a negative skewness. For τ less than 1 the Mean Excess Loss increases, but for τ greater than 1 the Mean Excess Loss decreases. For τ = 1 you get the Exponential Distribution, with constant Mean Excess Loss. For large x the Mean Excess Loss is proportional to x1−τ. The mean of the Weibull is θ Γ(1+ 1/τ). For τ = 1, we have an Exponential with mean θ Γ(2) = θ. For example, for τ = 1/3, the mean is: θ Γ(4) = 6θ. For example for τ = 2, the mean is: θ Γ(3/2) = θ
π /2 = 0.8862 θ.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 274 Here is a graph of the mean divided by θ, as function of the shape parameter of the Weibull Distribution, τ: mean over theta 1.3 1.2 1.1 1.0 0.9
1
2
3
4
The τth moment of a Weibull with shape parameter τ is: θτ Γ(1 + τ/τ) = θτ Γ(2) = θτ. For example, the third moment of a Weibull Distribution with τ = 3 is θ3.
5
tau
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 275 Problems: 24.1 (1 point) Which of the following statements are true? 1.
The mean of a LogNormal distribution, exists for all values of σ > 0.
2.
The mean of a Pareto distribution, exists for all values of α > 1.
3.
The variance of a Weibull distribution, exists for all values of τ > 1.
A. 1
B. 2
C. 1, 2
D. 2, 3
E. 1, 2, 3
24.2 (1 point) Given a Gamma Distribution with a coefficient of variation of 0.5, what is the value of the parameter α? A. 1
B. 2
C. 3
D. 4
E. Cannot be determined.
24.3 (2 points) Given a Gamma Distribution with a coefficient of variation of 0.5 and skewness of 1, what is the value of the parameter θ? A. 1
B. 2
C. 3
D. 4
E. Cannot be determined.
24.4 (1 point) Given a Weibull Distribution with parameters θ = 100,000 and τ = 0.2, what is the survival function at 25,000? A. 43% B. 44% C. 45%
D. 46%
E. 47%
24.5 (1 point) Given a Weibull Distribution with parameters θ = 100,000 and τ = 0.2, what is the mean? A. less than 10 million B. at least 10 million but less than 15 million C. at least 15 million but less than 20 million D. at least 20 million but less than 25 million E. at least 25 million 24.6 (1 point) Given a Weibull Distribution with parameters θ = 100,000 and τ = 0.2, what is the median? A. less than 10 thousand B. at least 10 thousand but less than 15 thousand C. at least 15 thousand but less than 20 thousand D. at least 20 thousand but less than 25 thousand E. at least 25 thousand
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 276 24.7 (3 points) X follows a Gamma Distribution with α = 4 and θ = 10. Y follows a Pareto Distribution with α = 3 and θ = 10. X and Y are independent and Z = XY. What is the variance of Z? A. 120,000 B. 140,000 C. 160,000 D. 180,000
E. 200,000
24.8 (1 point) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million, what is the distribution function at 20 million? A. less than 60% B. at least 60% but less than 65% C. at least 65% but less than 70% D. at least 70% but less than 75% E. at least 75% 24.9 (1 point) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million, what is the mean? A. less than 10 million B. at least 10 million but less than 15 million C. at least 15 million but less than 20 million D. at least 20 million but less than 25 million E. at least 25 million 24.10 (1 point) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million, what is the median? A. less than 10 million B. at least 10 million but less than 15 million C. at least 15 million but less than 20 million D. at least 20 million but less than 25 million E. at least 25 million 24.11 (2 points) Given a Pareto Distribution with parameters α = 2.5 and θ = 34 million, what is the standard deviation? A. less than 50 million B. at least 50 million but less than 55 million C. at least 55 million but less than 60 million D. at least 60 million but less than 65 million E. at least 65 million
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 277 24.12 (2 points) The times of reporting of claims follows a Weibull Distribution with τ = 1.5 and θ = 4. If 172 claims have been reported by time 5, estimate how many additional claims will be reported in the future. A. 56 B. 58 C. 60 D. 62 E. 64 24.13 (1 point) X1 , X2 , X3 , X4 are a sample of size four from an Exponential Distribution with mean 100. What is the mode of X1 + X2 + X3 + X4 ? A. 0
B. 100
C. 200
D. 300
E. 400
24.14 (2 points) You are given the following:
•
V is distributed according to an Gamma Distribution with parameters α = 3 and θ = 50.
•
W is distributed according to an Gamma Distribution with parameters α = 5 and θ = 50.
•
X is distributed according to an Gamma Distribution with parameters α = 9 and θ = 50.
•
V, W, and X are independent.
Which of the following is the distribution of Y = V + W + X? A. Gamma with α = 17 and θ = 50
B. Gamma with α = 17 and θ = 150
C. Gamma with α = 135 and θ = 50
D. Gamma with α = 135 and θ = 150
E. None of the above. 24.15 (3 points) A large sample of claims has an observed average claim size of $2,000 with a variance of 5 million. Assuming the claim severity distribution to be LogNormal, estimate the probability that a particular claim exceeds $3,500. A. less than 0.14 B. at least 0.14 but less than 0.18 C. at least 0.18 but less than 0.22 D. at least 0.22 but less than 0.26 E. at least 0.26 24.16 (1 point) Which of the following are Exponential Distributions? 1. The Gamma Distribution as α approaches infinity. 2. The Gamma Distribution with α = 1. 3. The Weibull Distribution with τ = 1. A. 1, 2
B. 1, 3
C. 2, 3
D. 1, 2, 3
E. None of A, B, C, or D
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 278 24.17 (2 points) Claims are assumed to follow a Gamma Distribution, with α = 5 and θ = 1000. What is the probability that a claim exceeds 8,000? (Use the Normal Approximation.) A. less than 4% B. at least 4% but less than 6% C. at least 6% but less than 8% D. at least 8% but less than 10% E. at least 10% 24.18 (1 point) What is the mode of a Pareto Distribution, with α = 2 and θ = 800? A. B. C. D. E.
less than 700 at least 700 but less than 800 at least 800 but less than 900 at least 900 but less than 1000 at least 1000
24.19 (2 points) The claim sizes at first report follow a LogNormal Distribution, with µ = 10 and σ = 2.5. The amount by which a claim develops from first report to ultimate also follows a LogNormal Distribution, but with µ = 0.1 and σ = 0.5. Assume that there are no new claims reported after first report, and that the distribution of development factors is independent of size of claim. What is the probability that a claim chosen at random is more than 1 million at ultimate? A. less than 4% B. at least 4% but less than 6% C. at least 6% but less than 8% D. at least 8% but less than 10% E. at least 10% 24.20 (2 points) X follows a Gamma Distribution, with α = 5 and θ = 1/1000. What is the expected value of 1/X? A. less than 200 B. at least 200 but less than 220 C. at least 220 but less than 240 D. at least 240 but less than 260 E. at least 260
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 279 24.21 (2 points) X follows a LogNormal Distribution, with µ = 6 and σ = 2.5. What is the expected value of 1/X2 ? A. less than 1.4 B. at least 1.4 but less than 1.6 C. at least 1.6 but less than 1.8 D. at least 1.8 but less than 2.0 E. at least 2.0 24.22 (1 point) What is the mean of a Pareto Distribution, with α = 4 and θ = 3000? A. B. C. D. E.
less than 700 at least 700 but less than 800 at least 800 but less than 900 at least 900 but less than 1000 at least 1000
24.23 (2 points) What is the coefficient of variation of a Pareto Distribution, with α = 4 and θ = 3000? A. B. C. D. E.
less than 1.4 at least 1.4 but less than 1.6 at least 1.6 but less than 1.8 at least 1.8 but less than 2.0 at least 2.0
24.24 (2 points) What is the coefficient of skewness of a Pareto Distribution, with α = 4 and θ = 3000? A. B. C. D. E.
less than 1 at least 1 but less than 3 at least 3 but less than 5 at least 5 but less than 7 at least 7
24.25 (3 points) Y is the sum of 90 independent values drawn from a Weibull Distribution with τ = 1/2. Using the Normal Approximation, estimate the probability that Y > 1.2 E[Y]. A. 5%
B. 10%
C. 15%
D. 20%
E. 25%
24.26 (2 points) For a LogNormal Distribution, the ratio of the 99th percentile to the 95th percentile is 3.4. Determine the σ parameter. A. 1.8
B. 1.9
C. 2.0
D. 2.1
E. 2.2
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 280 24.27 (3 points) You are given: • A claimant receives payments at a rate of 1 paid continuously while disabled.
• •
Payments start immediately.
•
The force of interest is δ.
The length of disability follows a Gamma distribution with parameters α and θ.
At the time of disability, what is the actuarial present value of these payments? 1 - (δ + θ) - α 1 - (1 + δθ)- α θα θ (A) (B) (C) (D) δ δ 1 + δθ (1 + δθ)α (E) None of A, B, C, or D. 24.28 (1 point) What is the median of a LogNormal Distribution, with µ = 4.2 and σ = 1.8? A. B. C. D. E.
less than 70 at least 70 but less than 75 at least 75 but less than 80 at least 80 but less than 85 at least 85
24.29 (2 points) For a LogNormal Distribution, with µ = 4.2 and σ = 1.8, what is the probability that a claim is less than the mean? A. less than 70% B. at least 70% but less than 75% C. at least 75% but less than 80% D. at least 80% but less than 85% E. at least 85% 24.30 (3 points) Calculate the fourth central moment of a Gamma Distribution with parameters α and θ. A. θ4
B. θ4 6α
C. θ4{3α 2 +6α}
D. θ4{α3 + 3α 2 +6α}
E. θ4{α4 - 3α3 + 3α 2 +6α}
24.31 (1 point) What is the kurtosis of a Gamma Distribution with parameters α and θ? Hint: Use the solution to the previous question. A. 3
B. 3 + 6/α
C. 3 + 6θ/α
D. 3 + 6θ/α + 3θ2/α2
E. None of A, B, C, or D.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 281 24.32 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the mean? A. 18
B. 19
C. 20
D. 21
E. 22
24.33 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the median? A. less than 18 B. at least 18 but less than 19 C. at least 19 but less than 20 D. at least 20 but less than 21 E. at least 21 24.34 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the mode? A. less than 18 B. at least 18 but less than 19 C. at least 19 but less than 20 D. at least 20 but less than 21 E. at least 21 24.35 (1 point) For a LogNormal Distribution, with µ = 3 and σ = 0.4, what is the survival function at 50? A. 1.1%
B. 1.2%
C. 1.3%
D. 1.4%
E. 1.5%
24.36 (2 points) You are given:
•
Future lifetimes follow a Gamma distribution with parameters α and θ.
• •
The force of interest is δ.
A whole life insurance policy pays 1 upon death. What is the actuarial present value of this insurance? 1 δ δ (A) (B) (C) α α (δ + θ) (1 + δθ) (δ + θ)α
(D)
θ (1 + δθ)α
(E) None of A, B, C, or D. 24.37 (2 points) Prior to the application of any deductible, losses follow a Pareto Distribution with α = 3.2 and θ = 135. If there is a deductible of 25, what is the density of non-zero payments at 60? A. B. C. D. E.
less than 0.0045 at least 0.0045 but less than 0.0050 at least 0.0050 but less than 0.0055 at least 0.0055 but less than 0.0060 at least 0.0060
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 282 Use the following information for the next three questions: X is Normally distributed with mean 4 and standard deviation 0.8. 24.38 (1 point) What is the mean of eX? A. 69
B. 71
C. 73
D. 75
E. 77
24.39 (2 points) What is the standard deviation of eX? A. 69 B. 71 C. 73 D. 75 E. 77 24.40 (2 points) What is the 10th percentile of eX? A. 20 B. 25 C. 30 D. 35
E. 40
24.41 (2 points) A company has two electric generators. The time until failure for each generator follows an exponential distribution with mean 10. The company will begin using the second generator immediately after the first one fails. What is the probability that both generators have failed by time 30? A. 70% B. 75% C. 80% D. 85% E. 90% 24.42 (3 points) For a LogNormal Distribution, with parameters µ and σ, what is the value of the ratio
mean - mode ? mean - median
A.
1 - exp(-σ 2) 1 - exp(-0.5 σ 2)
B.
1 - exp(-1.5 σ 2) 1 - exp(-0.5 σ 2)
D.
1 - exp(-0.5 σ 2) 1 - exp(-σ 2)
E.
1 - exp(-σ 2) exp(-σ 2) - exp(-0.5 σ 2)
C.
1 - exp(-1.5 σ 2) 1 - exp(-σ 2)
24.43 (3 points) You are given: Future lifetimes follow a Weibull distribution with τ = 2 and θ = 30.
• • •
The force of interest is .04.
A whole life insurance policy pays $1 million upon death. Calculate the actuarial present value of this insurance. ∞
Hint:
∫ exp(-x2) dx =
π {1 - Φ(b 2 )}
b
(A) $325,000
(B) $350,000
(C) $375,000
(D) $400,000
(E) $425,000
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 283 24.44 (3 points) The following three LogNormal Distributions have been fit to three different sets of claim size data for professional liability insurance: Physicians
µ = 7.8616
σ2 = 3.1311
Surgeons
µ = 8.0562
σ2 = 2.8601
Hospitals
µ = 7.4799
σ2 = 3.1988
Compare their means and coefficients of variation. 24.45 (2 points) The data represented by the following histogram is most likely to follow which of the following distributions? A. Normal
B. Exponential
C. Gamma, α > 1
D. Pareto
E. Single Parameter Pareto
24.46 (3 points) Prior to the application of any deductible, losses follow a Pareto Distribution with α = 2.5 and θ = 47. There is a deductible of 10. What is the variance of amount paid by the insurer for one loss, including the possibility that the amount paid is zero? A. 4400 B. 4500 C. 4600 D. 4700 E. 4800 24.47 (3 points) Size of loss is Exponential with mean 4. Three losses occur. What is the probability that the sum of these three losses is greater than 20? A. less than 8% B. at least 8% but less than 10% C. at least 10% but less than 12% D. at least 12% but less than 14% E. at least 14%
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 284 Use the following information for the next 4 questions: Prior to the effect of any deductible, the losses follow a Pareto Distribution with parameters α = 2.5 and θ = 24. 24.48 (2 points) If the insured has an ordinary deductible of size 15, what is the average payment by the insurer per non-zero payment? A. 18 B. 20 C. 22
D. 24
E. 26
24.49 (2 points) If the insured has a franchise deductible of size 15, what is the average payment by the insurer per non-zero payment? A. 37 B. 39 C. 41 D. 43 E. 45 24.50 (2 points) If the insured has an ordinary deductible of size 10, what is the average payment by the insurer per loss? A. 9.5 B. 10.0 C. 10.5 D. 11.0 E. 11.5 24.51 (2 points) If the insured has an franchise deductible of size 10, and there are 73 losses expected per year, what are the insurer's expected annual payments? A. 850 B. 900 C. 950 D. 1000 E. 1050 24.52 (1 point) Given a Weibull Distribution with parameters θ = 1000 and τ = 1.5, what is the mode? A. less than 470 B. at least 470 but less than 480 C. at least 480 but less than 490 D. at least 490 but less than 500 E. at least 500 24.53 (2 points) Size of loss is LogNormal with µ = 7 and σ = 1.6. One has a sample of 10 independent losses: X1 , X2 , ..., X10. Let Y be their geometric average, Y = (X1 X2 ... X10)1/10. Determine the expected value of Y. A. 1100 B. 1150 C. 1200
D. 1250
E. 1300
24.54 (2 points) Determine the variance of a Weibull Distribution with parameters θ = 9 and τ = 4. Some values of the gamma function are: Γ(0.25) = 3.62561, Γ( 0.5) = 1.77245, Γ( 0.75) =1.22542, Γ(1) = 1, Γ(1.25) = 0.90640, Γ(1.5) = 0.88623, Γ(1.75) = 0.91906, Γ(2) = 1. A. 3
B. 5
C. 7
D. 9
E. 11
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 285 24.55 (4 points) Demonstrate that for a Gamma Distribution with α ≥ 1, (mean - mode)/(standard dev.) = (skewness)(kurtosis + 3)/(10 kurtosis - 12 skewness2 - 18). 24.56 (4 points) For a three-year term insurance on a randomly chosen member of a population: (i) 1/4 of the population are smokers and 3/4 are nonsmokers. (ii) The future lifetimes follow a Weibull distribution with: τ = 2 and θ = 15 for smokers τ = 2 and θ = 20 for nonsmokers (iii) The death benefit is 100,000 payable at the end of the year of death. (iv) i = 0.06 Calculate the actuarial present value of this insurance. (A) 2000 (B) 2100 (C) 2200 (D) 2300 (E) 2400 24.57 (2 points) X1 , X2 , X3 is a sample of size three from a Gamma Distribution with α = 5 and mean of 100. What is the mode of X1 + X2 + X3 ? A. 200
B. 220
C. 240
D. 260
E. 280
24.58 (3 points) X follows a LogNormal Distribution, with µ = 3 and σ2 = 2. Y also follows a LogNormal Distribution, but with µ = 4 and σ2 = 1.5. X and Y are independent. Z = XY. What is the standard deviation of Z? A. 35,000 B. 36,000 C. 37,000 D. 38,000 E. 39,000 24.59 (2 points) You are given the following: • The Tan Teen Insurance Company is set up solely to jointly insure 7 independent lives.
• Each life has a future lifetime which follows a Weibull Distribution with τ = 2 and θ = 30. • Tan Teen starts with assets of 5, which grow continuously at 10% per year. Thus the assets at time t are: 5(1.10)t.
• Tan Teen pays 50 upon the death of the last survivor. Calculate the probability of ruin of Tan Teen. A. Less than 0.3% B. At least 0.3% but less than 0.4% C. At least 0.4% but less than 0.5% D. At least 0.5% but less than 0.6% E. At least 0.6%
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 286 24.60 (3 points) For medical malpractice insurance, the size of economic damages follows a LogNormal Distribution with a coefficient of variation of 5.75. What is the probability that an economic damage exceeds the mean? A. 11%
B. 13%
C. 15%
D. 17%
E. 19%
24.61 (2 points) You are given: (i) The frequency distribution for the number of losses for a policy with no deductible is Binomial with m = 50 and q = 0.3. (ii) Loss amounts for this policy follow the Pareto distribution with θ = 2000 and α = 4. Determine the expected number of payments when a deductible of 500 is applied. (A) Less than 5 (B) At least 5, but less than 7 (C) At least 7, but less than 9 (D) At least 9, but less than 11 (E) At least 11 24.62 (2 points) The ratio of the median to the mode of a LogNormal Distribution is 5.4. What is the second parameter, σ, of this LogNormal? A. 0.9
B. 1.0
C. 1.1
D. 1.2
E. 1.3
24.63 (2 points) Define the quartiles as the 25th, 50th, and 75th percentiles. Define the interquartile range as the difference between the third and first quartiles, in other words as the 75th percentile minus the 25th percentile. Determine the interquartile range for a Pareto Distribution with α = 2 and θ = 1000. (A) Less than 825 (B) At least 825, but less than 850 (C) At least 850, but less than 875 (D) At least 875, but less than 900 (E) At least 900
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 287 24.64 (3 points) A random sample of size 20 is drawn from a LogNormal Distribution with µ = 3 and σ = 2. Determine E[ X 2 ]. (A) 50,000
(B) 60,000
(C) 70,000
(D) 80,000
(E) 90,000
24.65 (2, 5/83, Q.7) (1.5 point) W has LogNormal Distribution with parameters µ = 1 and σ = 2. Which of the following random variables has a uniform distribution on [0, 1]? A. In[(Φ[W] - 1)/4]
B. Φ[(ln[W] - 1)/4]
C. Φ[(ln[W] - 1)/2] D. Φ[ln[(W - 1)/2]] E. In[(Φ[W] - 1)/2]
24.66 (2, 5/85, Q.46) (1.5 point) Let X be a continuous random variable with density function f(x) = ax2 e-bx for x ≥ 0, where a > 0 and b > 0. What is the mode of X? A. 0 B. 2 C. 2/b D. b/2 E. ∞ 24.67 (4, 5/86, Q.49) (1 point) Which of the following statements are true? 1. If X is normally distributed, then ln(X) is lognormally distributed. 2. The tail of a Pareto distribution does not approach zero as fast as does the tail of a lognormal distribution. 3. The mean of a Pareto distribution exists for all values of its parameters. A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3 24.68 (4, 5/87, Q.48) (1 point) Which of the following are true? 1. The sum of 25 independent identically distributed random variables from a markedly skewed distribution has an approximate normal distribution. 2. The sum of two independent normal random variables is a normal random variable. 3. A random variable X is lognormally distributed if X = ln(Z) where Z is normally distributed. A. 1 B. 2 C. 3 D. 2, 3 E. None of the above. 24.69 (160, 5/88, Q.4) (2.1 points) A survival function is defined by: f(t) = k (t/β2) e-t/β; t > 0, β > 0. Determine k. (A) 1 / β4
(B) 1 / β2
(C) 1
(D) β2
(E) β4
24.70 (4, 5/88, Q.47) (1 point) Which of the following statements are true? 1. The LogNormal distribution is symmetric. 2. The tail of the Pareto distribution fades to zero more slowly than does that of a LogNormal, for large enough x. 3. An important application of the binomial distribution is in connection with the distribution of claim frequencies when the risks are not homogeneous. A. 1 B. 2 C. 1, 2 D. 2, 3 E. 1, 2, 3
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 288 24.71 (4, 5/89, Q.42) (1 point) Which of the following are true? 1. If the independent random variables X and Y are Poisson variables, then Z = X + Y is a Poisson random variable. 2. If X1 ,,....Xn are n independent unit normal variables, then the sum Z = X1 +..+ Xn is normally distributed with mean µ = n and standard deviation σ = n. 3. If X is normally distributed then Y = lnX has a LogNormal distribution. A. 1 B. 3 C. 1, 2 D. 1, 3 E. 1, 2, 3 24.72 (4, 5/89, Q.44) (2 points) The severities of individual claims are Pareto distributed, with parameters α = 8/3 and θ = 8,000. Using the Central Limit Theorem, what is the probability that the sum of 100 independent claims will exceed 600,000? A. Less than 0.025 B. At least 0.025, but less than 0.075 C. At least 0.075, but less than 0.125 D. At least 0.125, but less than 0.175 E. 0.175 or more 24.73 (4, 5/89, Q.56) (1 point) The random variable X has a Pareto distribution F(x) = 1 - {100 / (100+x)}2 , for x > 0. Which of the following distribution functions represents the distribution function of X truncated from below at 100? A. 1 - {200 / (100+x)}2 , x > 100 B. 1 - (100/x)2 , x > 100 C. 1 - 1 / (x - 100)2 , x > 100 D. 1 - {200 / (200+x)}2 , x > 0 E. 1 - {100 / (x - 100)}2 , x > 0
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 289 24.74 (4, 5/90, Q.50) (2 points) The underlying claim severity distribution for the ABC Insurance Company is lognormal with parameters µ = 7 and σ2 = 10. The company only records losses that are less than $50,000. Let X be the random variable representing all losses with cumulative distribution function FX(x) and Y be the random variable representing the company's recorded losses with cumulative distribution function FY(x). Then for x ≤ $50,000 FY(x) = A FX(x) where A is the necessary adjustment factor. In what range does A fall? A. A < 0.70 B. 0.70 ≤ A < 0.90 C. 0.90 ≤ A < 1.10 D. 1.10 ≤ A < 1.30 E. 1.30 ≤ A 24.75 (4B, 11/92, Q.17) (1 point) You are given the following: • X1 and X2 are independent, identically distributed random variables.
• X = X1 + X2
Which of the following are true? 1. If X1 , X2 have Poisson distributions with mean µ, then X has a Poisson distribution with mean 2µ. 2. If X1 , X2 have gamma distributions with parameters α and θ, then X has a gamma distribution with parameters 2α and 2θ. 3. If X1 , X2 have standard normal distributions, then X has a normal distribution with mean 0 and variance 2. A. 1 only B. 2 only
C. 1, 3 only
D. 2, 3 only
E. 1, 2, 3
24.76 (4B, 11/92, Q.31) (2 points) The severity distribution of individual claims is gamma with parameters α = 5 and θ = 1000. Use the Central Limit Theorem to determine the probability that the sum of 100 independent claims exceeds $525,000. A. Less than 0.05 B. At least 0.05 but less than 0.10 C. At least 0.10 but less than 0.15 D. At least 0.15 but less than 0.20 E. At least 0.20
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 290 24.77 (4B, 11/93, Q.19) (2 points) A random variable X is distributed lognormally with parameters µ = 0 and σ = 1. Determine the probability that X lies within one standard deviation of the mean. A. Less than 0.65 B. At least 0.65, but less than 0.75 C. At least 0.75, but less than 0.85 D. At least 0.85, but less than 0.95 E. At least 0.95 24.78 (4B, 11/93, Q.28) (3 points) You are given the following: • X1 , X2 , and X3 are random variables representing the amount of an individual claim. • The first and second moments for X1 , X2 , and X3 are: E[X1 ] = 1. E[X1 2 ] = 1.5.
E[X2 ] = 0.5. E[X2 2 ] = 0.5.
E[X3 ] = 0.5. E[X3 2 ] = 1.5.
For which of the random variables X1 , X2 , and X3 is it appropriate to use a Pareto distribution? A. X1
B. X2
C. X3
D. X1 , X3
E. None of A, B, C or D
24.79 (4B, 5/94, Q.6) (2 points) You are given the following: •
Losses follow a Weibull distribution with parameters θ = 20 and τ = 1.0.
•
A random sample of losses is collected, but the sample data is truncated from below by a deductible of 10. Determine the probability that an observation from the sample data is at most 25. A. Less than 0.50 B. At least 0.50, but less than 0.60 C. At least 0.60, but less than 0.70 D. At least 0.70, but less than 0.80 E. At least 0.80 24.80 (4B, 11/94, Q.1) (1 point) You are given the following: X is a random variable representing size of loss. Y = ln(x) is a random variable having a normal distribution with a mean of 6.503 and standard deviation of 1.500. Determine the probability that X is greater than $1,000. A. Less than 0.300 B. At least 0.300, but less than 0.325 C. At least 0.325, but less than 0.350 D. At least 0.350, but less than 0.375 E. At least 0.375
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 291 24.81 (4B, 11/94, Q.10) (2 points) You are given the following: A portfolio consists of 10 independent risks. The distribution of annual losses (x is in $ millions) for each risk is given by a Gamma Distribution f(x) = θ−αxα−1 e−x/θ / Γ(α), x > 0, with θ = 1 and α = 0.1. Determine the probability that the portfolio has aggregate annual losses greater than $1.0 million. A. Less than 20% B. At least 20%, but less than 40% C. At least 40%, but less than 60% D. At least 60%, but less than 80% E. At least 80% 24.82 (4B, 11/94, Q.17) (2 points) You are given the following: Losses follow a Weibull distribution with parameters θ = 20 and τ = 1.0. For each loss that occurs, the insurerʼs payment is equal to the amount of the loss truncated and shifted by a deductible of 10. If the insurer makes a payment, what is the probability that an insurerʼs payment is less than or equal to 25? A. Less than 0.65 B. At least 0.65, but less than 0.70 C. At least 0.70, but less than 0.75 D. At least 0.75, but less than 0.80 E. At least 0.80 24.83 (4B, 11/95, Q.8) (2 points) Losses follow a Weibull distribution, with parameters θ (unknown) and τ = 0.5. Determine the ratio of the mean to the median. A. Less than 2.0 B. At least 2.0, but less than 3.0 C. At least 3.0, but less than 4.0 D. At least 4.0 E. Cannot be determined from the given information. 24.84 (4B, 11/96, Q.8) (1 point) The random variable Y is the sum of two independent and identically distributed random variables, X1 and X2 . Which of the following statements are true? 1. If X1 and X2 have Poisson distributions, then Y must have a Poisson distribution. 2. If X1 and X2 have gamma distributions, then Y must have a gamma distribution. 3. If X1 and X2 have lognormal distributions, then Y must have a lognormal distribution. A. 1
B. 2
C. 3
D. 1, 2
E. 1, 3
Use the following information for the next two questions: • A portfolio consists of 16 independent risks.
• For each risk, losses follow a Gamma distribution, with parameters θ = 250 and α = 1.
24.85 (4B, 5/97, Q.6) (2 points) Without using the Central Limit Theorem, determine the probability that the aggregate losses for the entire portfolio will exceed 6,000. A. 1 - Γ(1; 1)
B. 1 - Γ(1; 24)
C. 1 - Γ(1; 384)
D. 1 - Γ(16; 24)
E. 1 - Γ(16; 384)
24.86 (4B, 5/97, Q.7) (2 points) Using the Central Limit Theorem, determine the approximate probability that the aggregate losses for the entire portfolio will exceed 6,000. A. Less than 0.0125 B. At least 0.0125, but less than 0.0250 C. At least 0.0250, but less than 0.0375 D. At least 0.0375, but less than 0.0500 E. At least 0.0500 24.87 (4B, 5/97, Q.16) (1 point) You are given the following:
• The random variable X has a Weibull distribution, with parameters θ = 625 and τ = 0.5. • Z is defined to be 0.25X. Determine the correlation coefficient of X and Z. A. 0.00 B. 0.25 C. 0.50 D. 0.75
E. 1.00
24.88 (4B, 11/98, Q.27) (2 points) Determine the skewness of a gamma distribution with a coefficient of variation of 1. Hint: The skewness of a distribution is defined to be the third central moment divided by the cube of the standard deviation. A. 0 B. 1 C. 2 D. 4 E. 6 24.89 (4B, 5/99. Q.1) (1 point) Which of the following inequalities is true for a Pareto distribution with a finite mean? A. Mean < Median < Mode B. Mean < Mode < Median C. Median < Mode < Mean D. Mode < Mean < Median E. Mode < Median < Mean 24.90 (Course 160 Sample Exam #1, 1999, Q.1) (1.9 points) For a laser operated gene splicer, you are given: (i) It has a Weibull survival model with parameters θ =
√2 and τ = 2.
(ii) It was operational at time t = 1. (iii) It failed prior to time t = 4. Calculate the probability that the splicer failed between times t = 2 and t = 3. (A) 0.2046 (B) 0.2047 (C) 0.2048 (D) 0.2049 (E) 0.2050
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 293 24.91 (1, 5/00, Q.7) (1.9 points) An insurance companyʼs monthly claims are modeled by a continuous, positive random variable X, whose probability density function is proportional to (1 + x)4 , where 0 < x < ∞. Determine the companyʼs expected monthly claims. (A) 1/6 (B) 1/3 (C) 1/2 (D) 1
(E) 3
24.92 (3, 5/00, Q.8) (2.5 points) For a two-year term insurance on a randomly chosen member of a population: (i) 1/3 of the population are smokers and 2/3 are nonsmokers. (ii) The future lifetimes follow a Weibull distribution with: τ = 2 and θ = 1.5 for smokers τ = 2 and θ = 2.0 for nonsmokers (iii) The death benefit is 100,000 payable at the end of the year of death. (iv) i = 0.05 Calculate the actuarial present value of this insurance. (A) 64,100 (B) 64,300 (C) 64,600 (D) 64,900 (E) 65,100 24.93 (3, 5/01, Q.24) (2.5 points) For a disability insurance claim: (i) The claimant will receive payments at the rate of 20,000 per year, payable continuously as long as she remains disabled. (ii) The length of the payment period in years is a random variable with the gamma distribution with parameters α = 2 and θ = 1. (iii) Payments begin immediately. (iv) δ = 0.05 Calculate the actuarial present value of the disability payments at the time of disability. (A) 36,400 (B) 37,200 (C) 38,100 (D) 39,200 (E) 40,000 24.94 (1, 11/01, Q.35) (1.9 points) Auto claim amounts, in thousands, are modeled by a random variable with density function f(x) = xe-x for x ≥ 0. The company expects to pay 100 claims if there is no deductible. How many claims does the company expect to pay if the company decides to introduce a deductible of 1000? (A) 26
(B) 37
(C) 50
(D) 63
(E) 74
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 294 24.95 (CAS3, 5/04, Q.21) (2.5 points) Auto liability losses for a group of insureds (Group R) follow a Pareto distribution with α = 2 and θ = 2,000. Losses from a second group (Group S) follow a Pareto distribution with α = 2 and θ = 3,000. Group R has an ordinary deductible of 500, while Group S has a franchise deductible of 200. Calculate the amount that the expected cost per payment for Group S exceeds that for Group R. A. Less than 350 B. At least 350, but less than 650 C. At least 650, but less than 950 D. At least 950, but less than 1,250 E. At least 1,250 24.96 (CAS3, 11/04, Q.25) (2.5 points) Let X be the random variable representing the aggregate losses for an insured. X follows a gamma distribution with mean of $1 million and coefficient of variation 1. An insurance policy pays for aggregate losses that exceed twice the expected value of X. Calculate the expected loss for the policy. A. Less than $100,000 B. At least $100,000, but less than $200,000 C. At least $200,000, but less than $300,000 D. At least $300,000, but less than $400,000 E. At least $400,000 24.97 (CAS3, 5/05, Q.35) (2.5 points) An insurance company offers two types of policies, Type Q and Type R. Type Q has no deductible, but a policy limit of 3,000. Type R has no limit, but an ordinary deductible of d. Losses follow a Pareto distribution with θ = 2,000 and α = 3. Calculate the deductible, d, such that both policy types have the same expected cost per loss. A. Less than 50 B. At least 50, but less than 100 C. At least 100, but less than 150 D. At least 150, but less than 200 E. 200 or more 24.98 (SOA M, 11/05, Q.8) (2.5 points) A Mars probe has two batteries. Once a battery is activated, its future lifetime is exponential with mean 1 year. The first battery is activated when the probe lands on Mars. The second battery is activated when the first fails. Battery lifetimes after activation are independent. The probe transmits data until both batteries have failed. Calculate the probability that the probe is transmitting data three years after landing. (A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 295 24.99 (CAS3, 5/06, Q.25) (2.5 points) Calculate the skewness of a Pareto distribution with α = 4 and θ = 1,000. A. Less than 2 B. At least 2, but less than 4 C. At least 4, but less than 6 D. At least 6, but less than 8 E. At least 8 24.100 (CAS3, 5/06, Q.36) (2.5 points) The following information is available for a collective risk model:
• X is a random variable representing the size of each loss. • X follows a Gamma Distribution with α = 2 and θ = 100. • N is a random variable representing the number of claims. • S is a random variable representing aggregate losses. • S = X1 + ... + XN. Calculate the mode of S when N = 5. A. Less than 950 B. At least 950 but less than 1050 C. At least 1050 but less than 1150 D. At least 1150 but less than 1250 E. At least 1250 24.101 (4, 5/07, Q.39) (2.5 points) You are given: (i) The frequency distribution for the number of losses for a policy with no deductible is negative binomial with r = 3 and β = 5. (ii) Loss amounts for this policy follow the Weibull distribution with θ = 1000 and τ = 0.3. Determine the expected number of payments when a deductible of 200 is applied. (A) Less than 5 (B) At least 5, but less than 7 (C) At least 7, but less than 9 (D) At least 9, but less than 11 (E) At least 11 24.102 (2 points) In the previous question, 4, 5/07, Q.39, determine the variance of the number of payments when a deductible of 200 is applied. A. 20 B. 30 C. 40 D. 50 E. 60
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 296 Solutions to Problems: 24.1. E. 1. True. All of the moments of a LogNormal exist. 2. True. (For the Pareto, the nth moment exists if n > α.) 3. True. All of the moments of a Weibull exist. 24.2. D. For the Gamma Distribution, the mean is αθ, while the variance is αθ2 . Thus the coefficient of variation is: (variance.5) / mean = { αθ2 }.5 /{αθ} = 1 / α.5. Thus for the Gamma Distribution, α = 1/CV2 . Thus α = 1 / (0.5)2 = 4. 24.3. E. Both the coefficient of variation and the skewness do not depend on the scale parameter θ. Therefore θ can not be determined from the given information. For the Gamma Distribution, the coefficient of variation = 1/ α and Skewness = 2/ α . 24.4. E. S(25,000) = exp[-(25,000/100,000)0.2] = 0.469. 24.5. B. The mean of the Weibull is θΓ(1+ 1 /τ) = (100,000)Γ(1+ 1/0.2) = (100,000)Γ(6) = 5! (100,000) = (120) (100,000) = 12 million. 24.6. C. The median of the Weibull is such that .5 = F(m) = 1 - exp(-(m/θ)τ). Thus, -(m/θ)τ = ln .5. m = θ(-ln.5)1/τ = θ(.693147)1/τ = (100,000).6931475 = 16,000. Comment: Note that the median for this Weibull is much smaller than the mean, a symptom of a distribution skewed to the right (positively skewed.)
24.7. C. In general, for X and Y two independent variables: E[XY] = E[X]E[Y] and E[X^2 Y^2] = E[X^2]E[Y^2]. Var[XY] = E[(XY)^2] - E[XY]^2 = E[X^2 Y^2] - {E[X]E[Y]}^2 = E[X^2]E[Y^2] - E[X]^2 E[Y]^2.
For the Pareto Distribution the moments are: E[X^n] = n! θ^n / {(α−1)(α−2)...(α−n)}, α > n.
Therefore, E[Y] = θ/(α−1) = 10/2 = 5, and E[Y^2] = 2θ^2/{(α−1)(α−2)} = (2)(10^2)/{(2)(1)} = 100.
For the Gamma Distribution the moments are: E[X^n] = θ^n (α)(α+1)...(α+n−1).
Therefore, E[X] = αθ = 40, and E[X2 ] = α(α+1)θ2 = (4)(5)/(102 ) = 2000. Therefore, VAR[Z] =(2000)(100) - {(40)(5)}2 = 160,000. 24.8. C. F(20 million) = 1 - {34/(34 + 20)}2.5 = 68.5%. 24.9. D. The mean of the Pareto is θ / (α-1) = 34 million / 1.5 = 22.667 million. 24.10. B. The median of the Pareto is such that .5 = F(m) = 1 - {θ/(θ+m)}α. Thus (θ+m)/θ = 0.5-1/α, and m = θ{0.5-1/α - 1} = 34 million{0.5-1/2.5 - 1} = 10.86 million. Comment: Note the connection to Simulation by the Method of Inversion and to fitting via Percentile Matching. 24.11. B. For the Pareto Distribution the moments are: n
E[X^n] = n! θ^n / {(α−1)(α−2)...(α−n)}, α > n.
Therefore putting everything in units of 1 million, E[X] = θ/(α−1) = 34/1.5 = 22.67 and E[X^2] = 2θ^2/{(α−1)(α−2)} = (2)(34^2)/{(1.5)(0.5)} = 3083. Thus Var[X] = 3083 - (22.67)^2 = 2569. Therefore, the standard deviation of X is √2569 = 50.7 (million).
Alternately, for the Pareto the variance = θ^2 α / {(α−2)(α−1)^2} = (34^2)(10^12)(2.5) / {(0.5)(1.5^2)} = 2569 × 10^12. Therefore, the standard deviation is: √(2569 × 10^12) = 50.7 million. Comment: Note that the coefficient of variation = standard deviation / mean = 50.7 / 22.67 = 2.24 = √5 = √(2.5/0.5) = √(α / (α − 2)).
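Comment: as a quick cross-check of the Pareto moment formulas used in 24.9 to 24.11, here is an illustrative Python sketch (the helper pareto_moment is mine, not from the syllabus; it assumes Python 3.8 or later):
from math import factorial, prod, sqrt

def pareto_moment(n, alpha, theta):
    # n-th raw moment of the two-parameter Pareto: n! theta^n / ((alpha-1)...(alpha-n)), valid for alpha > n
    return factorial(n) * theta ** n / prod(alpha - i for i in range(1, n + 1))

alpha, theta = 2.5, 34.0                      # theta in millions, as in 24.9-24.11
mean = pareto_moment(1, alpha, theta)         # about 22.67
second = pareto_moment(2, alpha, theta)       # about 3083
sd = sqrt(second - mean ** 2)                 # about 50.7
print(mean, sd, sd / mean)                    # CV = sqrt(alpha/(alpha-2)) = sqrt(5), about 2.236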
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 298 24.12. A. F(5) = 1 - exp[-(5/4)1.5] = .753. Expect 172/.753 = 228 claims ultimately reported. Expect 228 - 172 = 56 claims reported in the future. Comment: We are dealing with time rather than size of loss. F(5) = the expected percentage of claims reported by time 5. F(5) (expected total number of claims) = expected number of claims reported by time 5. expected total number of claims ≅ (number of claims reported by time 5)/F(5). 24.13. D. X1 + X2 + X3 + X4 is a Gamma Distribution with α = 4 and θ = 100. As shown in the Appendix attached to the exam, for α > 1 its mode is: θ(α - 1) = 300. 24.14. A. The sum of independent random variables each of which follows a Gamma distribution with the same scale parameter, is also a Gamma distribution; it has a shape parameter equal to the sum of the shape parameters and the same scale parameter. Thus Y is Gamma with α = 3 + 5 + 9 = 17, and θ = 50. 24.15. B. Mean of LogNormal is exp(µ + .5σ2 ). Second Moment of the LogNormal is exp(2µ + 2σ2 ). Therefore the variance is: exp(2µ + 2σ2 ) - exp(2µ + σ2 ) = exp(2µ + σ2 ) {1+exp(σ2 )}. CV2 = Variance / Mean2 = exp(σ2 ) - 1. σ = {ln(1 + CV2 )}.5 = {ln(1 + 5/4)}0.5 = 0.9005. µ = ln(Mean) - 0.5σ2 = ln(2000) - 0.5(.90052 ) = 7.1955. Chance that a claim exceeds $3,500 is 1 -F(3500) = 1 -Φ({ln(3500) - µ} / σ) = 1 - Φ({8.1605 - 7.1955} / 0.9005) = 1- Φ(1.07) = 1- 0.8577 = 0.1423. 24.16. C. The Gamma Distribution as α approaches infinity is a Normal Distribution. The Gamma Distribution with α = 1 is an Exponential Distribution. The Weibull Distribution with τ = 1 is an Exponential Distribution. 24.17. D. For the Gamma Distribution: Mean = αθ = 5000, Variance = αθ2 = 5 million. Thus the Standard Deviation is
√(5 million) = 2236. Thus the chance of a claim exceeding 8,000 is
approximately: 1 - Φ((8000 - 5000)/2236) = 1 - Φ(1.34) = 1 - 0.9099 = 9.01%. Comment: When applying the Normal Approximation to a continuous distribution, there is no “continuity correction” such as is applied as when approximating a discrete distribution. (In any case, here it would make no significant difference.) The Gamma approaches a Normal as α approaches infinity. In this case, the exact answer is given via an incomplete Gamma Function: 1 - Γ(α; x/θ) = 1 - Γ(5; 8) = 1 - .900368 = 9.9632%, gotten via computer.
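Comment: the gap between the Normal approximation and the exact Gamma tail in 24.17 can be checked numerically; an illustrative sketch, assuming scipy is available:
from scipy.stats import gamma, norm

alpha, theta = 5, 1000
mean, sd = alpha * theta, (alpha * theta ** 2) ** 0.5       # 5000 and about 2236
print(norm.sf(8000, loc=mean, scale=sd))                    # Normal approximation, about 0.090
print(gamma.sf(8000, a=alpha, scale=theta))                 # exact 1 - Gamma(5; 8), about 0.0996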
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 299 24.18. A. The mode of any Pareto Distribution is 0. 24.19. C. Let X be the losses at first report, let Y be the loss development factor and let Z = XY be the losses at ultimate. Ln(X) and ln(Y) are two independent Normal variables. Therefore, ln(Z) = ln(XY) = ln(X) + ln(Y) is a Normal variable. Therefore, Z is a LogNormal variable. ln(Z) = ln(X) + ln(Y) has mean equal to the sum of the means of ln(X) and ln(Y) and variance equal to the sum of the variances of ln(X) and ln(Y). Therefore ln(Z) has parameters µ = 10 + 0.1 = 10.1 and σ2 = 2.52 + 0.52 = 6.5, and therefore so does Z. For a LogNormal Distribution F(z) = Φ[{ln(z) − µ} / σ]. In this case with µ = 10.1 and σ = 2.55, F(1,000,000) = Φ[{ln(1,000,000) - 10.1} / 2.55] = Φ[1.46] = .9279. 1 - F(1,000,000) = 1 - 0.9279 = 0.0721. Comment: The product of two independent LogNormal variables is also LogNormal. Note that the variances add, not the σ parameters themselves. ∞
24.20. D. E[1/X] = ∫ f(x)/x dx (from 0 to ∞) = {θ^(−α)/Γ(α)} ∫ x^(α−2) e^(−x/θ) dx (from 0 to ∞) =
{θ−α / Γ(α)} / {θ−(α−1) / Γ(α−1)} = 1/ {θ(α−1)} = 1000/(5-1) = 250. Alternately, the moments of the Gamma Distribution are E[Xn ] = θn Γ(α+n) / Γ(α). This formula works for n positive or negative. Therefore for n = -1, α = 5 and θ = 1/1000: E[1/X] = 1000 Γ(4) / Γ(5) = 1000 (3!)/(4!) = 1000 / 4 = 250. Alternately, if X follows a Gamma, then Z = 1/X has Distribution F(z) = 1 - Γ(α; 1/ θz) = 1 - Γ(5; 1000/ z), which is an Inverse Gamma, with scale parameter 1000 and α = 5. The Inverse Gamma has Mean = θ/(α-1) = 1000 / (5-1) = 250. Comment: Note that theta for the Gamma of 1/1000 becomes one over theta for the Inverse Gamma, which has theta equal to 1000.
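Comment: a quick numerical check of 24.20 by integration, as an illustrative sketch (assumes scipy; the upper limit 1 is effectively infinity for this tightly concentrated Gamma):
from scipy.stats import gamma
from scipy.integrate import quad

alpha, theta = 5, 1 / 1000
value, _ = quad(lambda x: gamma.pdf(x, a=alpha, scale=theta) / x, 0, 1)
print(value)            # about 250 = 1/(theta (alpha - 1))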
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 300 24.21. C. The moments of the LogNormal Distribution are E[Xn ] = exp[nµ + 0.5 n2 σ2]. Therefore for n = -2, with µ = 6 and σ = 2.5, E[1/X2 ] = exp[-12 + (2) 2.52] = e0.5 = 1.65. Alternately, if lnX follows a Normal, then if Z = 1/X, lnZ = - lnX also follows a Normal ( but with mean of -6 and standard deviation of 2.5.) Therefore, Z follows a LogNormal with µ = -6 and σ = 2.5. Then one can apply the formula for moments of the LogNormal in order to get the second moment of Z: E[1/X2 ] = E[Z2 ] = exp[-12 + (2) 2.52] = e0.5 = 1.65. Alternately, if Y = 1/X2 , then lnY = -2lnX also follows a Normal but with mean = (-2)(6) = -12 and standard deviation equal to ( |-2| )(2.5) = (2)(2.5) = 5. Thus Y follows a LogNormal with µ = -12 and σ = 5. Thus E[1/X2 ] = E[Y] = exp[µ +.5σ2] = exp[-12 + (1/2) 52] = e0.5 = 1.65. 24.22. E. For a Pareto, mean = θ/(α-1) = 3000 / (4-1) = 1000. 24.23. B. For a Pareto, coefficient of variation =
√(α / (α - 2)) = √2 = 1.414.
Alternately, for the Pareto Distribution the moments are: E[X^n] = n! θ^n / {(α−1)(α−2)...(α−n)}, α > n.
E[X] = θ/(α−1) = 1000. E[X^2] = 2θ^2/{(α−1)(α−2)} = 3,000,000. σ = √(E[X^2] - E[X]^2) = 1414.2.
coefficient of variation = σ/E[X] = 1414.2 / 1000 = 1.414. Comment: The coefficient of variation does not depend on the scale parameter θ.
24.24. E. For a Pareto, skewness = 2{(α+1)/(α−3)} √((α − 2)/α) = 2(5)/√2 = 7.071. Alternately, for the Pareto Distribution the moments are: E[X^n] = n! θ^n / {(α−1)(α−2)...(α−n)}, α > n.
E[X] = θ/(α−1) = 1000. E[X^2] = 2θ^2/{(α−1)(α−2)} = 3,000,000. σ = √(E[X^2] - E[X]^2) = 1414.2.
E[X3 ] = 6θ3/{(α−1)(α−2)(α-3)} = 27,000,000,000. skewness = {E[X3 ] - 3E[X]E[X2 ] + 2E[X]3 }/σ3 = 7.07. Comment: The skewness does not depend on the scale parameter θ. Notice the large positive skewness, which is typical for a heavier-tailed distribution such as the Pareto, when its skewness exists (for α > 3.)
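Comment: the same skewness falls out of the raw moments directly; an illustrative Python sketch (plain standard library):
from math import sqrt

alpha, theta = 4, 1000
m1 = theta / (alpha - 1)
m2 = 2 * theta ** 2 / ((alpha - 1) * (alpha - 2))
m3 = 6 * theta ** 3 / ((alpha - 1) * (alpha - 2) * (alpha - 3))
variance = m2 - m1 ** 2
print((m3 - 3 * m1 * m2 + 2 * m1 ** 3) / variance ** 1.5)      # about 7.07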
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 301 24.25 .D. The mean of the Weibull is: θ Γ(1 + 1/τ) = θ Γ(1 + 2) = θ Γ(3) = θ 2! = 2θ. The second moment of the Weibull is: θ2 Γ(1 + 2/τ) = (θ2)Γ(1 + 4) = (θ2)Γ(5) = (θ2)4! = 24θ2. The variance of the Weibull is: 24θ2 - (2θ)2 = 20θ2. Y has a mean of: (90)(2θ) = 180θ, and a variance of: (90)(20θ2) = 1800θ2. Prob[X > 1.2 E[X]] = 1 - Φ[(216θ - 180θ) /
√(1800 θ^2)] = 1 - Φ[0.85] = 19.77%.
24.26. A. The 99th percentile is: exp[µ + 2.326σ]. The 95th percentile is: exp[µ + 1.645σ]. 3.4 = exp[µ + 2.326σ] / exp[µ + 1.645σ] = exp[0.681σ]. ⇒ σ = 1.797.
24.27. B. Given a disability of length t, the present value of an annuity certain is: (1 - e^(-δt))/δ. The expected present value is the average of this over all t:
∫ {(1 - e^(-δt))/δ} f(t) dt (from 0 to ∞) = (1/δ){1 - ∫ e^(-δt) e^(-t/θ) t^(α-1)/(θ^α Γ(α)) dt (from 0 to ∞)}
= (1/δ){1 - (1/(θ^α Γ(α))) ∫ e^(-t(δ + 1/θ)) t^(α-1) dt (from 0 to ∞)} = (1/δ){1 - (1/(θ^α Γ(α))) Γ(α) (δ + 1/θ)^(-α)}
= (1/δ){1 - 1/{θ^α (δ + 1/θ)^α}} = {1 - (1 + δθ)^(-α)}/δ.
Comment: Similar to 3, 5/01, Q.24. I used the fact that ∫ t^(α-1) e^(-t/θ) dt (from 0 to ∞) = Γ(α) θ^α ⇔ the Gamma density integrates to 1 over its support.
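Comment: the closed form {1 - (1 + δθ)^(-α)}/δ can be checked numerically; an illustrative sketch assuming scipy, using the 20,000 per year, α = 2, θ = 1, δ = 0.05 values of the similar exam question (see 24.93):
from math import exp
from scipy.stats import gamma
from scipy.integrate import quad

B, alpha, theta, delta = 20000, 2, 1, 0.05
closed_form = B * (1 - (1 + delta * theta) ** (-alpha)) / delta              # about 37,188
numeric, _ = quad(lambda t: B * (1 - exp(-delta * t)) / delta * gamma.pdf(t, a=alpha, scale=theta), 0, 200)
print(closed_form, numeric)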
24.28. A. The median is where the Distribution Function is 0.5. Φ[ {ln(x)−µ} / σ] = 0.5. Therefore, {ln(x)−µ} / σ = 0. x = eµ = e4.2 = 66.7.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 302 24.29. D. The mean is exp[µ + .5σ2]. The Distribution Function at the mean is: Φ[{ln(exp[µ + 0.5σ2]) - µ} / σ] = Φ[{(µ + 0.5σ2)) - µ} / σ] = Φ[σ/2] = Φ[0.9] = 0.8159. Comment: For a heavier-tailed distribution, thereʼs only a small chance that a claim is greater than the mean; a few large claims contribute a great deal to the mean. The mean is: exp[µ + .5σ2] = exp[4.2 + (1/2)(1.82 )] = exp[5.82] = 336.972. F(336.972) = Φ[{ln(336.972) - 4.2}/1.8] = Φ[0.9] = 0.8159. 24.30. C. For the Gamma Distribution, the moments about the origin are: E[Xn ] = θn Γ(α+n)/Γ(α). E[X] = θα. E[X2 ]= θ2α(α+1). E[X3 ] = θ3α(α+1)(α+2). E[X4 ] = θ4α(α+1)(α+2)(α+3). Fourth Central Moment = E[X4 ] - 4E[X]E[X3 ] + 6E[X]2 E[X2 ] - 3E[X]4 = θ4α {(α+1)(α+2)(α+3) - 4α(α+1)(α+2) + 6αα(α+1) - 3α3} = θ4α {α3 + 6α2 +11α + 6 - 4α3 -12α2 - 8α + 6α3 + 6α2 - 3α3} = θ4 {3α 2 + 6α}. 24.31. B. From the previous solution, for the Gamma Distribution, Fourth Central Moment = θ4{3α 2 + 6α}. Variance = θ2α. Kurtosis = Fourth Central Moment/ Variance2 = 3 + 6/α. Comment: Note that the scale parameter, θ, does not appear in the kurtosis, which is a dimensionless quantity. Also note that the kurtosis of a Gamma is always larger than that of a Normal Distribution, which is 3. The Gamma has a heavier tail than the Normal. 24.32. E. For the LogNormal, the mean is: exp(µ +.5 σ2) = exp(3.08) = 21.8. 24.33. D. The median is that point where F(x) = 0.5. Thus Φ[{ln(x) − µ} / σ] = 0.5. ⇒ 0 = {ln(x) - µ} / σ. Thus ln(x) = µ, or the median = eµ = e3 = 20.1. 24.34. A. The mode is that point where f(x) is a maximum. For the LogNormal: f(x) = exp[-.5 ({ln(x) − µ} / σ)2] /{xσ
√(2π)}.
fʼ(x) = -f(x)/x - f(x) ({ln(x) − µ} / σ) /xσ. Thus fʼ(x) =0 for ({ln(x) − µ} / σ2) = -1. ⇒ mode = exp(µ − σ2) = exp(2.84) = 17.1.
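Comment: the mean, median, and mode of 24.32 to 24.34 are easy to confirm; an illustrative Python sketch, assuming scipy's lognorm parameterization (s = σ, scale = exp(µ)):
from math import exp
from scipy.stats import lognorm

mu, sigma = 3.0, 0.4
dist = lognorm(s=sigma, scale=exp(mu))
print(dist.mean())              # exp(mu + sigma^2/2), about 21.8
print(dist.median())            # exp(mu), about 20.1
print(exp(mu - sigma ** 2))     # mode, about 17.1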
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 303 24.35. A. S(50) = 1 - Φ[(ln(50) - 3)/.4] = 1 - Φ[2.28] = 1 - 0.9887 = 1.13%. 24.36. E. The probability of death at time t, is the density of the Gamma Distribution: f(t) = e-t/θtα−1/{θαΓ(α)}. The present value of a payment of one at time t is e−δt . Therefore, the actuarial present value of this insurance is: ∞
∫ e^(-δt) e^(-t/θ) t^(α-1)/(θ^α Γ(α)) dt (from 0 to ∞) = (1/(θ^α Γ(α))) ∫ e^(-(δ + 1/θ)t) t^(α-1) dt (from 0 to ∞)
= (1/(θαΓ(α))) Γ(α)(δ + 1/θ)−α = 1/(1 + δθ)α. Comment: The Gamma Distribution is too heavy-tailed to be a good model of future lifetimes. 24.37. C. After truncating and shifting from below by a deductible of size d, one gets another Pareto Distribution, but with parameters α and θ + d, in this case 3.2 and 135 + 25 = 160. This has density of: (αθα)(θ + x) − (α + 1) = (3.2)(1603.2)(160+x)-4.2. Plugging in x = 60 one gets: (3.2)(1603.2)(160+60)-4.2 = 0.00525. Alternately, after truncating and shifting by 25, G(x) = 1 - S(x+25)/S(25) = 1 - {(135/(135 + x + 25))3.2} / {(135/(135 +25))3.2} = 1 - (160/(160 + x))3.2. This is a Pareto Distribution with α = 3.2 and θ = 160. Proceed as before. Alternately, after truncating and shifting by 25, g(x) = f(x+25)/S(25) = (3.2)(1353.2)(135 + x + 25) -4.2 / {(135/(135 +25))3.2} = (3.2)(1603.2)(160+x)-4.2. h(60) = (3.2)(1603.2)(220)-4.2 = 0.00525. 24.38. D. & 24.39. B. eX is LogNormal with µ = 4 and σ = 0.8. Mean of LogNormal = E[eX] = exp[µ + σ2/2] = exp[4 + 0.82 /2] = 75.189. E[(eX)2 ] = E[e2X] = Second moment of LogNormal = exp[2µ + 2σ2] = exp[(2)(4 + 0.82 )] = 10,721.4. Variance of LogNormal = 10,721.4 - 75.1892 = 5068.0. Standard Deviation of LogNormal =
√5068.0 = 71.2.
Alternately, for the LogNormal, 1 + CV2 = E[X2 ]/E[X]2 = exp[σ2] = exp[0.82 ] = 1.8965.
⇒ CV = 0.9468. ⇒ Standard Deviation of LogNormal = (0.9468)(75.189) = 71.2.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 304 24.40. A. eX is LogNormal with µ = 4 and σ = 0.8. 0.1 = Φ[{ln(x) - 4}/0.8]. ⇒ -1.282 = {ln(x) - 4} / 0.8. ⇒ x = 19.58. Alternately, the 10th percentile of the Normal Distribution is: 4 - (1.282)(0.8) = 2.9744. Thus, the 10th percentile of eX is: e2.9744 = 19.58. 24.41. C. The sum of two identically distributed Exponential Distributions is a Gamma Distribution with α = 2. The density of a Gamma Distribution with α = 2 and θ = 10 is: f(t) = (t/10)2 e-t/10 /{t Γ(2)} = .01 t e-t/10. 30
Prob[T ≤ 30] = ∫ 0.01 t e^(-t/10) dt (from 0 to 30) = [-(t/10) e^(-t/10) - e^(-t/10)] (evaluated from t = 0 to t = 30) = 1 - 4e^(-3) = 80.1%.
Alternately, Prob[T ≤ 30] = Γ[2; 30/10] = 1 - (30/10) e-30/10 - e-30/10 = 80.1%. 24.42. B. For the LogNormal: mean is exp[µ +.5 σ2], median = eµ, mode = exp(µ−σ2). (mean - mode)/(mean - median) = {exp(µ +.5 σ2) - exp(µ−σ2)} / {exp(µ +.5 σ2) - eµ} = {1 - exp(-1.5σ2)} / {1 - exp(-0.5σ2)}. Comment: Note that for the LogNormal, mean > median > mode (alphabetical order) . This is typical for a continuous distribution with positive skewness. (The situation is reversed for negative skewness.) Also note that the median is closer to the mean than to the mode (just as it is in the dictionary.) Also, note that as σ goes to zero, this ratio goes to 1.5/.5 = 3. (For curves with “mild” skewness, it is reasonable to approximate this ratio by 3, according to Kendallʼs Advanced Theory of Statistics.)
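Comment: an illustrative one-line check of 24.41, assuming scipy:
from scipy.stats import gamma
print(gamma.cdf(30, a=2, scale=10))     # 1 - 4 exp(-3), about 0.801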
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 305 24.43. D. The probability of death at time t, is the density of the Weibull Distribution: f(t) = τ(t/θ)τ exp(-(t/θ)τ) /t = 2t exp(-(t/30)2 ) /302. The present value of a payment of one at time t is e−δt = e-0.04t. Therefore, the actuarial present value of this insurance in million of dollars is: ∞
∫ e^(-0.04t) t exp(-(t/30)^2)/450 dt (from 0 to ∞) = (1/450) ∫ t exp(-{(t/30)^2 + 0.04t + 0.6^2 - 0.36}) dt (from 0 to ∞)
= (1/450) e^0.36 ∫ t exp(-(t/30 + 0.6)^2) dt (from 0 to ∞) = (1/450) e^0.36 ∫ (30x - 18) exp(-x^2) 30 dx (from x = 0.6 to ∞)
= e^0.36 {∫ 2x exp(-x^2) dx (from 0.6 to ∞) - (6/5) ∫ exp(-x^2) dx (from 0.6 to ∞)}
= e^0.36 {[-exp(-x^2)] (from x = 0.6 to ∞) - (6/5) √π {1 - Φ(0.6 √2)}} = 1 - e^0.36 (6/5) √π {1 - Φ(0.8489)} =
1 - (1.433)(1.2)(1.772)(1 - 0.8019) = 1 - 0.604 = 0.396. Where Iʼve used the change of variables x = (t/30) + 0.6 and made use of the hint. Thus the actuarial present value of this insurance is: ($1 million)(0.396) = $396,000.
24.44. Mean = exp[µ + σ^2/2]. Physicians: exp[7.8616 + 3.1311/2] = 12,421. Surgeons: exp[8.0562 + 2.8601/2] = 13,177. Hospitals: exp[7.4799 + 3.1988/2] = 8772. Hospitals have the smallest mean, while Surgeons have the largest mean. 1 + CV^2 = E[X^2] / E[X]^2. CV = √(E[X^2] / E[X]^2 - 1) = √(exp[2µ + 2σ^2] / exp[µ + σ^2/2]^2 - 1) = √(exp[σ^2] - 1).
Physicians: √(exp[3.1311] - 1) = 4.680. Surgeons: √(exp[2.8601] - 1) = 4.057. Hospitals: √(exp[3.1988] - 1) = 4.848.
Surgeons have the smallest CV, while Hospitals have the largest CV. Comment: Taken from Table 4 of Sheldon Rosenbergʼs discussion of “On the Theory of Increased Limits and Excess of Loss Pricing”, PCAS 1977. Based on data from Policy Year 1972.
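Comment: the 24.43 integral is also easy to confirm by numerical integration; an illustrative sketch assuming scipy (the upper limit 500 stands in for infinity):
from math import exp
from scipy.stats import weibull_min
from scipy.integrate import quad

apv, _ = quad(lambda t: exp(-0.04 * t) * weibull_min.pdf(t, c=2, scale=30), 0, 500)
print(apv)      # about 0.396, i.e. roughly $396,000 on a $1 million benefit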
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 306 24.45. C. The Normal distribution is symmetric. The Exponential and Pareto each have a mode of zero. The single Parameter Pareto has support x > θ > 0 and mode = θ. If it were a Single Parameter Pareto, then the biggest density (the mode) would be where the support of the density starts. This is not the case for the given histogram. Thus none of these are similar to the histogram. The Gamma for α > 1 has a mode > 0. Comment: This histogram is from a Gamma Distribution with α = 4. 24.46. E. After truncating and shifting from below, one gets a Pareto Distribution with α = 2.5 and θ = 47 + 10 = 57. Thus the nonzero payments are Pareto with α = 2.5 and θ = 57. This has mean: θ/(α - 1) = 57/1.5 = 38, second moment: 2θ2 / {(α - 1)(α - 2)} = 8664, and variance: 8664 - 382 = 7220. The probability of a nonzero payment is the probability that a loss is greater than the deductible of 10; for the original Pareto, S(10) = {47/(47+10)}2.5 = 0.617. Thus the payments of the insurer can be thought of as an aggregate distribution, with Bernoulli frequency with mean 0.617 and Pareto severity with α = 2.5 and θ = 57. The variance of this aggregate distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)2 (Var. Freq.) = (.617)(7220) + (38)2 {(0.617)(1 - 0.617)} = 4796. Comment: Similar to 3, 11/00, Q.21. 24.47. D. The sum of three identically distributed Exponential Distributions is a Gamma Distribution with α = 3. The density of a Gamma Distribution with α = 3 and θ = 4 is: f(x) = (x/4)3 e-x/4 / {x Γ(3)} = x2 e-x/4/128. ∞
Prob[X > 20] = ∫ x^2 e^(-x/4)/128 dx (from 20 to ∞) = [-(x/4)^2 e^(-x/4)/2 - (x/4) e^(-x/4) - e^(-x/4)] (evaluated from x = 20 to x = ∞) = 18.5 e^(-5) = 12.5%.
Alternately, Prob[X > 20] = 1 - Γ[3; 20/4] = 52 e-5/2 + 5 e-5 + e-5 = 18.5e-5 = 12.5%. 24.48. E. For an ordinary deductible, the average payment per non-zero payment is: {E[X] - E[X ∧ d]}/ S(d) = {E[X] - E[X ∧ 15]}/ S(15) = {(24/(2.5-1)) - (24/(2.5-1))(1 - {1 + 15/24}-1.5)} {1 + 15/24}2.5 = 26.
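Comment: an illustrative one-line check of 24.47, assuming scipy:
from scipy.stats import gamma
print(gamma.sf(20, a=3, scale=4))       # 18.5 exp(-5), about 0.125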
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 307 24.49. C. For a franchise deductible, one has data truncated from below, and the average payment per non-zero payment is: {E[X] - E[X ∧ d] + S(d)d}/ S(d) = e(d) + d = (15+24)/(2.5-1) + 15 = 41. Alternately, the numerator is the integral from 15 to infinity of x f(x), which is: E[X] - E[X ∧ 15] + S(15)15 = 16 - 8.276 + (.2971)(15) = 12.180. The denominator is S(15) = 0.2971. Thus the average payment per non-zero payment is: 12.180/0.2971 = 41.00. Alternately, each non-zero payment is 15 more than with an ordinary deductible, thus using the previous solution: 15 + 26 = 41. Comment: For the Pareto Distribution e(d) = (d+θ)/(α-1). 24.50. A. For an ordinary deductible, the average payment per loss is: E[X] - E[X ∧ d] =
E[X] - E[X ∧ 10] = {(24/(2.5-1)) - (24/(2.5-1))(1 - {1 + 10/24}-1.5) = 16{1 + 10/24}-1.5 = 9.49. 24.51. D. For a franchise deductible, the average payment per loss is the same as that for an ordinary deductible, except d is added to each nonzero payment. Average payment per loss = E[X] - E[X ∧ 10] + (10)(Probability of nonzero payment) = 9.49 + (10){24/(24 + 10)}2.5 = 13.68. (13.68)(73) = 999. 24.52. C. As shown in the Appendix attached to the exam, for τ > 1 its mode is: θ{(τ - 1)/τ}1/τ = (1000){.5/1.5)1/1.5 = 481. 24.53. D. lnY = (X1 + X2 + ... + X10)/10, which is the average of 10 independent, identically distributed Normals, which is therefore another Normal with the same mean of 7 and a standard deviation of 1.6/ 10 . Therefore Y is LogNormal with µ = 7 and σ = 1.6/ 10 . Using the formula for the mean of a LogNormal, E[Y] = exp[7 + (1.6/ 10 )2 /2] = e7.128 = 1246. Comment: In general, the expected value of the Geometric average of n independent, identically distributed LogNormals is: exp[µ + σ2/(2n)]. As n → ∞, this approaches the median of eµ. Note that while the expected value of lnY is µ, it is not true that E[Y] = exp[E[lnY]] = eµ. 24.54. B. E[X] = (9)Γ[1 + 1/4] = (9)(0.90640) = 8.1576. E[X2 ] = (92 )Γ[1 + 2/4] = (92 )(0.88623) = 71.7846. Variance = 71.7846 - 8.15762 = 5.238.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 308 24.55. mean = αθ. mode = (α−1)θ. standard deviation = θ α . (mean - mode)/(standard deviation) = {αθ − (α−1)θ}/(θ α ) = 1/ α . Skewness = 2/ α . Kurtosis = 3 + 6/α. (skewness)(kurtosis + 3)/(10 kurtosis - 12 skewness2 - 18) = (2/ α )(3 + 6/α + 3)/(30 + 60/α - 48/α - 18) = (2/ α )(6 + 6/α)/(12 -12/α) = 1/ α . Comment: This is true in general for any of the members of the Pearson family of distributions. See equation 6.6 in Volume I of Kendallʼs Advanced Theory of Statistics. 24.56. D. The survival function for the Weibull is: S(t) = exp(-(t/θ)τ). τ = 2 and θ = 15 for smokers: S(t) = exp(-(t/15)2 ). S(1) = 0.9956. S(2) = 0.9824. S(3) = 0.9608. τ = 2 and θ = 20 for nonsmokers: S(t) = exp(-(t/20)2 ). S(1) = 0.9975. S(2) = 0.9900. S(3) = 0.9778. Assume for example a total of 400,000 people alive initially. Then, one fourth or 100,000 are smokers, and three fourths or 300,000 are nonsmokers. # of Years
Year | Smoker Prob. of Survival | # Smokers Surviving | # Smoker Deaths | Non-Smoker Prob. of Survival | # Non-Smokers Surviving | # Non-Smoker Deaths | Total # Deaths
0 | 1 | 100,000 | - | 1 | 300,000 | - | -
1 | 0.9956 | 99,557 | 443 | 0.9975 | 299,251 | 749 | 1,193
2 | 0.9824 | 98,238 | 1,319 | 0.9900 | 297,015 | 2,236 | 3,555
3 | 0.9608 | 96,079 | 2,159 | 0.9778 | 293,325 | 3,690 | 5,849
For example, the number of smokers who survive through year one is 99,557, while the number who survive through year two is 98,238. Therefore, 99,557 - 98,238 = 1319 smokers are expected to die during year two. For 400,000 insureds, the actuarial present value of the payments is: (100,000) {1193/1.06 + 3555/1.062 + 5849/1.063 } = 920,034,223. The actuarial present value of this insurance is: 920,034,223 / 400,000 = 2300. Comment: Similar to 3, 5/00, Q.8. 24.57. E. θ = Mean/α = 100/5 = 20. The sum is Gamma with α = (5)(3) = 15 and θ = 20. Mode = θ(α - 1) = (20)(14) = 280. Comment: Similar to CAS3, 5/06, Q.36.
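Comment: the table and the final answer of 24.56 can be rebuilt from the two Weibull survival functions; an illustrative Python sketch (plain standard library; the function name S is mine):
from math import exp

def S(t, theta):                         # Weibull survival with tau = 2
    return exp(-(t / theta) ** 2)

smokers, nonsmokers, v = 100_000, 300_000, 1 / 1.06
apv = 0.0
for year in (1, 2, 3):
    deaths = (smokers * (S(year - 1, 15) - S(year, 15))
              + nonsmokers * (S(year - 1, 20) - S(year, 20)))
    apv += 100_000 * deaths * v ** year
print(apv / 400_000)                     # about 2300 per insured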
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 309 24.58. B. Ln(X) and ln(Y) are two independent Normal variables. Therefore, ln(Z) = ln(XY) = ln(X) + ln(Y) is a Normal variable. Therefore, Z is a LogNormal variable. ln(Z) = ln(X) + ln(Y) has mean equal to the sum of the means of ln(X) and ln(Y) and variance equal to the sum of the variances of ln(X) and ln(Y). Therefore, ln(Z) has parameters µ = 3 + 4 = 7 and σ2 = 2 + 1.5 = 3.5, and therefore so does Z. For the LogNormal Distribution, variance = exp(2µ + σ2) (exp( σ2) -1). Thus VAR[Z] = exp(14 + 3.5) (exp( 3.5) -1) = (39.82 million)(32.12) = 1.279 billion. ⇒ Standard deviation of Z is
√(1.279 billion) = 35,763.
Alternately, in general for X and Y two independent variables: E[XY] = E[X]E[Y] and E[X2 Y 2 ] = E[X2 ]E[Y2 ]. VAR[XY] = E[(XY)2 ] - E[XY]2 = E[X2 Y 2 ] - {E[X]E[Y]}2 = E[X2 ]E[Y2 ] - E[X]2 E[Y]2 . E[X] = exp(µ + .5 σ2) = e4 , E[Y] = e4.75 , E[X2 ] = exp(2µ + 2 σ2) = e10, E[Y2 ] = e11. Therefore, Var[Z] = Var[XY] = (e10)(e11) - ( e4 )2 (e4.75)2 = e21 - e17.5 = 1.279 billion. Therefore, the standard deviation of Z is: 1.279 billion = 35,763. Comment: In general the product of independent LogNormal variables is a LogNormal with the sum of the individual µ and σ2 . 24.59. D. If the payment is made before the assets have grown to 50, then there is ruin. 50 = 5(1.10)t, implies t = 24.16. The probability each person has died by time 24.16 is given by the Weibull Distribution: 1 - exp[-(24.16/30)2 ] = 0.477. The probability that all 7 persons are dead by time 24.16 is: 0.4777 = 0.56%. Comment: Similar to 3, 5/00, Q.6. 24.60. D. 1 + CV2 = E[X2 ]/E[X]2 = exp[2µ + 2σ2] / exp[µ + σ2/2]2 = exp[σ2]. Thus, 1 + 5.752 = exp[σ2]. ⇒ σ = 1.88. For a LogNormal Distribution, the probability that a value is greater than the mean is: 1 - F[exp[µ + σ2/2]] = 1 - Φ[(ln[exp[µ + σ2/2]] - µ) / σ] = 1 - Φ[σ/2] = 1 - Φ[0.94] = 17.36%.
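Comment: quick checks of 24.59 and 24.60; an illustrative sketch assuming scipy for the Normal distribution function:
from math import exp, log, sqrt
from scipy.stats import norm

t_star = log(50 / 5) / log(1.10)                  # assets reach 50 at about t = 24.16
p_dead = 1 - exp(-(t_star / 30) ** 2)             # Weibull(tau = 2, theta = 30)
print(p_dead ** 7)                                # 24.59: about 0.0056

sigma = sqrt(log(1 + 5.75 ** 2))
print(norm.sf(sigma / 2))                         # 24.60: about 0.174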
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 310 24.61. B. For the Pareto, S(500) = {2000/(2000 + 500)}4 = 0.4096. The mean number of losses is: mq = (50)(0.3) = 15. The expected number of (non-zero) payments is: (0.4096)(15) = 6.144. Comment: Similar to 4, 5/07, Q. 39. As discussed in “Mahlerʼs Guide to Frequency Distributions,” the number of (non-zero) payments is Binomial with m = 5 and q = (0.4096)(0.3) = 0.12288. Therefore, the variance of the number of payments is: (50)(0.12288)(1 - 0.12288) = 5.39. 24.62. E. The Median is where the distribution function is 0.5. 0.5 = Φ[(ln(x) - µ)/σ]. ⇒ 0 = (ln(x) - µ)/σ. ⇒ x = exp[µ]. As shown in Appendix A attached to the exam, Mode = exp[µ − σ2]. Therefore, Median/Mode = exp[µ]/exp[µ − σ2] = exp[σ2] = 5.4. ⇒ σ = 1.30. 24.63. B. For the Pareto Distribution, VaRp [X] = θ {(1-p)-1/α - 1}. Q 0.25 = VaR0.25[X] = θ {(0.75)-1/α - 1} = (1000) {(0.75)-1/2 - 1} = 154.7. Q 0.75 = VaR0.75[X] = θ {(0.25)-1/α - 1} = (1000) {(0.25)-1/2 - 1} = 1000. Interquartile range = Q0.75 - Q0.25 = 1000 - 154.7 = 845.3. Comment: For a Pareto Distribution with θ = 1000, here is the interquartile range as a function of α: Interquartile Range 4000
[Figure: interquartile range (vertical axis, roughly 1000 to 4000) plotted against alpha (horizontal axis, 1 to 5).]
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 311 24.64. D. E[X] = exp[3 + 22 /2] = 148.41. E[X2 ] = exp[(2)(3) + (2)(22 )] = 1,202,604. Var[X] = 1,202,604 - 148.412 = 1,180,579. E[ X ] = E[X] = 148.41. Var[ X ] = Var[X]/20 = 1,180,579/20 = 59,029. E[ X 2 ] = Var[ X ] + E[ X ]2 = 59,029 + 148.412 = 81,054. Comment: Similar to 4, 11/06, Q.26. 24.65. C. ln(W) has a Normal Distribution with parameters µ = 1 and σ = 2. (ln(W) - 1))/2 has a Normal Distribution with parameters µ = 0 and σ = 1; in other words, (ln(W) - 1))/2 follows a standard unit Normal Distribution, with distribution function Φ. Therefore, Φ[(ln(W) - 1))/2] is uniform on [0, 1]. Comment: This forms the basis of simulating a LogNormal Distribution. If X follows the distribution F, then F(X) is uniform on [0, 1]. 24.66. C. Find where the density is a maximum. 0 = fʼ(x) = 2ax e-bx - bax2 e-bx.
⇒ 2 = bx. ⇒ x = 2/b. Comment: A Gamma Distribution with α = 3 and θ = 1/b. For α > 1, Mode = θ(α - 1) = 2/b. In order to integrate to one, the density must go to zero as x goes to infinity. Therefore, the mode is never infinity. 24.67. B. 1. False. A random variable Y is lognormally distributed if ln(Y) is normally distributed. If X is normally distributed, then exp(X) is lognormally distributed. 2. True. The LogNormal has a lighter tail than the Pareto. One way to see this is that all the moments of the LogNormal exist, while the moments of the Pareto only exist for n > α. Another is that the mean residual life of the LogNormal goes to infinity less than linearly, while that of the Pareto increases linearly. 3. False. The mean of a Pareto only exists for α > 1. 24.68. B. 1. Assume one is summing n independent identically distributed random variables. According to the Central Limit Theorem as n approaches ∞, this sum approaches a Normal Distribution. Precisely how large n must be in order for the Normal Approximation to be reasonable depends on the shape of the distribution and the definition of what is reasonable. Precisely how large n must be would depend on details about the distribution, however a large skewness would require a larger n. Statement 1 is not true. 2. True. 3. False. A random variable X is lognormally distributed if ln(X) = Z, where Z is normally distributed. 24.69. C. This is Gamma Distribution with α = 2 and θ = β. k = 1/Γ(α) = 1/Γ(2) = 1/1! = 1.
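Comment: the 24.64 result follows directly from E[X̄^2] = Var[X̄] + E[X̄]^2; an illustrative Python sketch (plain standard library):
from math import exp

mu, sigma, n = 3.0, 2.0, 20
EX = exp(mu + sigma ** 2 / 2)
VarX = exp(2 * mu + 2 * sigma ** 2) - EX ** 2
print(VarX / n + EX ** 2)        # E[Xbar^2], about 81,000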
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 312 24.70. B. 1. False. The Normal is symmetric while the LogNormal is skewed to the right. 2. True. The LogNormal has a lighter tail than the Pareto. One way to see this is that the mean residual life of the LogNormal goes to infinity less than linearly, while that of the Pareto increases linearly. 3. False. This is an important application of the Negative Binomial Distribution. The Negative Binomial is the mixed distribution for the Gamma-Poisson process. 24.71. A. 1. T. 2. F. While the sum of independent Normal distributions is also Normal, Statement #2 is false for two reasons. First, since each unit Normal has variance of 1, the sum of n independent unit Normals has a variance of n and standard deviation of n . Second, since each unit Normal has mean of 0, the sum of n independent unit Normals has a mean of 0. 3. F. If ln(Y) is normally distributed, then Y has a LogNormal Distribution. Equivalently, exp(X) is LogNormally distributed if X is Normally distributed. 24.72. C. The variance (for a single claim ) of the Pareto Distribution is: θ2α / { (α−2)(α−1)2 } = 92,160,000. The sum of 100 independent claims has 100 times this variance or 9,216,000,000. The standard deviation is therefore
√9,216,000,000 = 96,000.
The mean of a single claim from the Pareto is θ/(α−1) = 4800; for 100 claims the mean is 480,000. Thus standardizing the variable 600,000 corresponds to: (600,000 - 480,000)/96,000 = 1.25. Thus the chance of the sum being greater than 600,000 is approximately: 1 - Φ(1.25) = 1 - .8944 = 0.1056. Comment: The second moment is: 2θ2 / { (α−2)(α−1) } = 115,200,000. Thus the variance = 115,200,000 - 48002 = 92,160,000. 24.73. A. If G(x) is the distribution function truncated from below at d, then G(x) = (F(x)-F(d))/(1-F(d)) for x >d. In this case, G(x) = (F(x)-F(100))/ (1-F(100)), for x>100. G(x) = {(100/200)2 - (100/(100+x))2 }/{(100/200)2 } = 1 - (200/(100+x))2 , for x>100. 24.74. D. For Truncation from above at a limit L, FY(x) = FX(x) / FX(L), for x≤ L. Thus in this case with L = 50000, A = 1/ FX(50000 ) = 1/ Φ[{ln(50000) - µ} / σ] = 1/ Φ[{10.82 - 7} /
√10] = 1/Φ[1.208] = 1/0.8865 = 1.128.
24.75. C. 1. True. 2. False. X has a Gamma Distribution with parameters 2α and θ. 3. True. The Standard Normal has a mean of 0 and a variance of 1. The means add and variances add. Thus X has mean of 0 + 0 = 0 and variance 1 + 1 = 2.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 313 24.76. C. For the Gamma Distribution the mean is αθ and the variance is αθ2 . Thus we have per claim a mean of 5000 and a variance of 5,000,000. For the sum of 100 independent claims, one has 100 times the mean and variance. Thus the sum has a mean of 500,000 and a variance of 500 million. Thus the sum has a standard deviation of 22,361. $525,000 exceeds this mean by 25000 / 22,361 = 1.118 standard deviations. Thus the chance of the sum exceeding 525,000 is approximately: 1 - Φ(1.12) = 1 - 0.8686 = 0.1314. Comment: Alternately one can use the fact that the sum of 100 identical independent Gamma Distributions has a Gamma Distribution with parameters 100α and θ. Note that when applying the Normal Approximation to a continuous distribution such as the Gamma, there is no need for the “continuity correction” that is applied in the case of a discrete frequency distribution. 24.77. D. For this LogNormal Distribution the moments are E[X] = exp(µ + .5σ2) = e.5 = 1.649. E[X2] = exp(2µ + 2σ2) = e2. Thus the standard deviation is (e2 - e).5 = 2.161. Thus the interval within one standard deviation of the mean is 1.649 ± 2.161. But for the LogNormal distribution x > 0, so we are interested in the probability of x < (1.649 + 2.161) = 3.810. F(3.810) = Φ((ln(3.810) - µ ) / σ) = Φ(1.34) = 0.9099. Comment: Note that this differs from the probability of being within one standard deviation of the mean for the normal distribution, which is .682. Also note that the parameter σ is not the standard deviation of the LogNormal Distribution. Finally, note that the LogNormal Distribution is not symmetric, and thus one has to compute the distribution function separately at each of the two points. F(1.649 - 2.161) = 0 in this case, since the support of the LogNormal Distribution is x >0, so that F(x) = 0 for x ≤0. If instead we were asked the probability of being within a half of a standard deviation of the mean, this would be: F(1.649 + 1.080) - F(1.649 - 1.080) = Φ(1.00) - Φ(-.56) = .8413 - (1 - .7123) = .5536. 24.78. C. When α > 2 and therefore the first two moments exist for the Pareto: E[X2 ] / E[X]2 = {2θ2 /(α-1)(α-2)} / {θ/(α-1)}2 = 2(α-1) / (α-2) > 2. Thus we check the ratio E[X2 ] / E[X]2 , to see whether it is greater than 2. For X1 this ratio is 1.5. For X2 this ratio is 2. For X3 this ratio is: (1.5) / 0.52 = 6. Thus since this ratio is only greater than 2 for X3 , we conclude only for X3 could one “use a Pareto.” Comment: This is an initial test that is useful in some real world applications. Note that 1 + CV2 = E[X2] / E2[X]. Thus E[X2] / E2[X] > 2 is equivalent to CV2 > 1. Thus this fact is equivalent to the fact that for the Pareto distribution, the coefficient of variation (when it exists) is always greater than 1; the standard deviation is always greater than the mean.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 314 24.79. B. The original distribution function: F(x) = 1 - e -0.05x. The data truncated from below at 10, has distribution function: {F(x) - F(10)} / {1 - F(10)} = {e -0.5 - e -0.05x} / e -0.5 = 1 - e 0.5 -0.05x , x > 10 For x = 25, this has a value of 1 - e0.5 -1.25 = 0.53. Comment: The Weibull Distribution for a shape parameter of 1 is an Exponential. 24.80. E. X > 1000 corresponds to Y > 6.908. Converting to the Standard Normal distribution: (6.908 - 6.503) / 1.5 = 0.270. Using the Standard Normal Table, the chance of being less than or equal to .27 is .6064, so the chance of being more is 1 - .6064 = 0.3936. Comment: S(1000) for a LogNormal Distribution, with µ = 6.503 and σ = 1.500. 24.81. B. The sum of 10 independent identically distributed Gammas is a Gamma, with the same scale parameter and 10 times the original shape parameter. Thus the new Gamma has shape parameter, α = 0.1 x 10 = 1, while θ = 1. A Gamma with a shape parameter of 1 is an exponential distribution. F(x) = 1 - exp(-x). 1 - F(1) = exp(-1) = 36.8%. 24.82. C. If the insurer makes a payment, the probability that an insurerʼs payment is less than or equal to 25, is in terms of the original Weibull Distribution: {F(35) - F(10)} / {1 - F(10)} = {(1 - e-1.75) - (1 - e-.5)} / {1 - (1-e-.5)} = 0.713. 24.83. D. The mean of the Weibull for τ = .5 is: θ Γ(1+ 1 /.5) = θ Γ(3) = θ (2!) = 2θ. The median of the Weibull for τ = .5 is such that .5 = F(m) = 1 - exp(-(m/θ).5). Thus, -(m/θ).5 = ln0.5 ⇒ m = θ (-ln.5)2 = .4805 θ. ⇒ mean / median = 2 / .4805 = 4.162. Comment: The ratio is independent of θ, since both the median and the mean are multiplied by the scale parameter θ. 24.84. D. 1. True. 2. True. 3. False. Comment: The sum of independent Normal Distributions is a Normal. The product of independent LogNormal Distributions is a LogNormal. 24.85. D. The sum of 16 independent risks each with a Gamma Distribution is again a Gamma Distribution, with the same scale parameter θ and new shape parameter 16α. In this case, the aggregate losses have a Gamma Distribution with θ = 250 and α = (16)(1) = 16. Thus the tail probability at 6000 is: 1 - F(6000) = 1 - Γ(α; 6000/θ) = 1 - Γ(16; 6000/ 250) = 1 - Γ(16; 24). Comment: 1 - Γ(16; 24) = 0.0344.
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 315 24.86. B. The aggregate losses have a Gamma Distribution with θ = 250 and α = 16. The mean is αθ = 4000 and variance is αθ2 = 1,000,000. Alternately, each risk has a mean of 1/.004 = 250, and a variance of 1/0.0042 = 62500. The means and variances add for the sum of independent risks. Thus the aggregate losses have a mean of (16)(250) = 4000 and a variance of (16)(62,500) = 1,000,000. In any case, the standard deviation is 1000 and the survival function at 6000 is approximately: 1- Φ[(6000-4000)/1000] = 1 - Φ(2) = 1 - 0.9773 = 0.0227. Comment: Note that in approximating the continuous Gamma Distribution no continuity correction is required. Note that the result from using the Normal Approximation here is less than the exact result of .0344. Due to the skewness of the Gamma Distribution, the Normal Approximation underestimated the tail probability; the Gamma has a heavier tail than the Normal. 24.87. E. Var[Z] = Var[0.25X] = 0.252 Var[X]. Covar[X,Z] = Covar[X,.025X] = 0.25Covar[X, X] = 0.25 Var[X]. Therefore, Corr[X, Z] = 0.25 Var[X] / Var[X] 0.252 Var[X] = 1. Comments: Two variables that are proportional with a positive proportionality constant are perfectly correlated and have a correlation of one. 24.88. C. The Gamma Distribution has skewness twice its coefficient of variation. Thus the skewness is (2)(1) = 2. Comment: The given Gamma is an Exponential with α = 1, CV = 1, and skewness = 2. The skewness would be calculated as follows. For the Gamma, E[X] = αθ, E[X2 ] = α(α+1)θ2 , E[X3 ] = α(α+1)(α+2)θ3 . The variance is: E[X2 ] - E[X]2 = α(α+1)θ2 - α2θ2 = αθ2 . Thus the CV =
√(αθ^2) / (αθ) = 1/√α.
The third central moment = E[X^3] - 3 E[X] E[X^2] + 2 E[X]^3 = α(α+1)(α+2)θ^3 - 3{α(α+1)θ^2}αθ + 2α^3 θ^3 = 2αθ^3. Thus the skewness = 2αθ^3 / {αθ^2}^1.5 = 2/√α = twice the CV.
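Comment: an illustrative one-line check of 24.88, assuming scipy (CV = 1 forces α = 1, an Exponential):
from scipy.stats import gamma
print(gamma.stats(a=1.0, moments='s'))    # skewness 2.0, i.e. twice the CV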
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 316 24.89. E. The mean of the Pareto is θ/(α−1), for α > 1. The mode is zero, since f(x) = (αθα)(θ + x)−(α + 1), x > 0, which decreases as x increases, so that the density is largest at zero. The median is where F(x) = .5. Therefore, .5 = 1 - (θ/(θ+x))α, thus median = θ(21/α - 1). Mean
= θ/(α−1). Median = θ(2^(1/α) - 1). Mode = 0.
Thus the mode is smallest. On the exam, just pick values for α and θ and see what happens. For example, for α = 3 and θ = 10, the mean is 10/(3-1) = 5 while the median is smaller at 10(21/3 -1) = 2.6. One can show that for the Pareto the mean is greater than the median, since 1/(α-1) > 21/α - 1, for α > 1. This inequality is equivalent to: 1 + 1/ (α-1) = α/(α-1) > 21/α ⇔ (α-1)/α = 1 − 1/α < 2-1/α. Let β = 1/α, then this inequality is equivalent to: 1 < β + 1/2β, for 1 > β > 0. At β = 0, the right hand expression is 1; its derivative is: 1 + ln(1/2)/2β > 0. Thus the right hand expression is indeed greater than 1 for β > 0. This in turn implies that the mean is greater than the median. Comment: For a continuous distribution with positive skewness typically: mean > median > mode (alphabetical order.) Since the mean is an average of the claim sizes, it is more heavily impacted by the rare large claim than the median; therefore, the Pareto with a heavy tail, has its mean greater than it median. 24.90. D. S(t) = exp[-(t/ 2 )2 ] = exp[-t2 /2]. Prob[2 < t < 3 | 1 < t < 4] = {S(2) - S(3)} / {S(1) - S(4)} = (e-2 - e-4.5) / (e-0.5 - e-8) = 0.205. 24.91. C. The density of a Pareto Distribution is f(x) = (αθα)(θ + x)-(α + 1), 0 < x < ∞. Thus this density is a Pareto Distribution with α = 3 and θ = 1. It has mean: θ/(α-1) = 1/(3 - 1) = 1/2.
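Comment: a quick check of 24.90; an illustrative Python sketch (plain standard library; S is my shorthand for the Weibull survival function with θ = √2, τ = 2):
from math import exp

S = lambda t: exp(-t ** 2 / 2)
print((S(2) - S(3)) / (S(1) - S(4)))      # about 0.2049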
2013-4-2, Loss Distributions, §24 Common 2 Parameter Dists. HCM 10/8/12, Page 317 24.92. C. The survival function for the Weibull is: S(t) = exp(-(t/θ)τ). τ = 2 and θ = 1.5 for smokers: S(t) = exp(-(t/1.5)2 ). S(1) = e-.4444 = .6412. S(2) = e-1.7778 = .1690. τ = 2 and θ = 2.0 for nonsmokers: S(t) = exp(-(t/2)2 ). S(1) = e-0.25 = 0.7788. S(2) = e-1 = 0.3679. Assume for example a total of 30,000 people alive initially. Then, one third or 10,000 are smokers and two thirds or 20,000 are nonsmokers. # of Years
Smoker Prob. of Survival
# Smokers Surviving
# Smoker Deaths
0 1 2
1 0.6412 0.1690
10,000 6,412 1,690
3,588 4,722
Non-Smoker # # Prob. of Non-Smoker Non-Smoker Survival Surviving Deaths 1 0.7788 0.3679
20,000 15,576 7,358
4,424 8,218
Total # Deaths 8,012 12,940
For example, the number of nonsmokers who survive through year one is 15,576, while the number who survive through year two is 7358. Therefore, 15,576 - 7,358 = 8,218 nonsmokers are expected to die during year two. For 30,000 insureds, the actuarial present value of the payments is: (100,000) {8012/1.05 + 12940/1.052 } = 1,936,743,764. Therefore, the actuarial present value of this insurance is: 1,936,743,764 / 30,000 = 64,558. Comment: Overall: S(1) = (1/3)(.6412) + (2/3)(.7788) = .7329 = (6412 + 15576)/30000.
24.93. B. Assuming the payments are made for a period of time t, the present value is an annuity certain: 20,000(1 - e^(-δt))/δ = 20,000(1 - e^(-0.05t))/0.05. For the Gamma Distribution, f(t) = θ^(-α) t^(α-1) e^(-t/θ) / Γ(α) = 1^(-2) t^(2-1) e^(-t/1) / Γ(2) = t e^-t. The actuarial present value is:
∫ f(t) (Present value | t) dt = 20,000 ∫ f(t) (1 - e^(-0.05t))/0.05 dt = 400,000(1 - ∫ f(t) e^(-0.05t) dt)
= 400,000(1 - ∫0 to ∞ t e^-t e^(-0.05t) dt) = 400,000(1 - ∫0 to ∞ t e^(-1.05t) dt) = 400,000(1 - 1.05^-2) = 37,188.
Comment: I have used the fact about "Gamma type integrals": ∫0 to ∞ t^(α-1) e^(-t/θ) dt = Γ(α)θ^α, for α = 2 and θ = 1/1.05.
For a length of disability with a Gamma Distribution and rate of payment of B, the actuarial present value is: B(1 - (1 + θδ)^(-α))/δ. In this case, B = 20,000, α = 2, θ = 1, and δ = 0.05, and the actuarial present value is: (20,000)(1 - (1 + (1)(0.05))^-2)/0.05 = 37,188. The mean length of disability is: αθ = 2. 20,000 times an annuity certain of length 2 has present value: 20,000(1 - e^(-(0.05)(2)))/0.05 = 38,065 ≠ 37,188.
24.94. E. S(1) = ∫1 to ∞ x e^-x dx = [-x e^-x - e^-x] evaluated from x = 1 to x = ∞ = 2e^-1 = 0.7358.
After the introduction of the deductible, expect to pay: (100)(0.7358) = 74 claims.
Comment: A Gamma Distribution with α = 2 and θ = 1. F(1) = Γ[2; 1/1] = 1 - e^-1 - 1e^-1 = 0.2642.
24.95. C. For ordinary deductible d, expected payment per payment = (E[X] - E[X ∧ d])/S(d) = ({θ/(α-1)} - {θ/(α-1)}{1 - (θ/(θ+d))^(α-1)})/(θ/(θ+d))^α = {θ/(α-1)}/(θ/(θ+d)) = (θ+d)/(α-1) = θ + d, since here α = 2.
For Group R, the expected payment per payment is: 2000 + 500 = 2500. For Group S, the expected payment per payment with an ordinary deductible of 200 would be: 3000 + 200 = 3200. However, with a franchise deductible each payment is 200 more, for an average payment per payment of: 3200 + 200 = 3400. 3400 - 2500 = 900.
Comment: For an ordinary deductible d, the expected payment per payment is e(d). For the Pareto distribution, e(x) = (θ + x)/(α - 1). In this case, e(d) = (θ + d)/(2 - 1) = θ + d.
24.96. B. For a Gamma Distribution, CV = √(αθ^2)/(αθ) = 1/√α. CV = 1 ⇒ α = 1.
Therefore, this is an Exponential Distribution, with θ = 1 million. E[(X - 2 million)+] = E[X] - E[X ∧ 2 million] = 1 million - (1 million)(1 - exp[-(2/1)]) = $135,335.
Comment: One can also compute E[(X - 2 million)+] by integrating the survival function from 2 million to infinity, or by remembering that for an Exponential Distribution, R(x) = e^(-x/θ). The fact that this is an aggregate distribution does not change the mathematics of using an Exponential Distribution. It would be uncommon for aggregate losses to follow an Exponential.
24.97. D. Expected cost per loss for policy Q is: E[X ∧ 3000] = {2000/(3-1)}{1 - (2000/(2000+3000))^2} = 840. Expected cost per loss for policy R is: E[X] - E[X ∧ d] = 2000/(3-1) - {2000/(3-1)}{1 - (2000/(2000+d))^2} = 1000{2000/(2000+d)}^2. Set the two expected costs equal: 1000(2000/(2000+d))^2 = 840. ⇒ 2000 + d = 2000 √(1000/840) = 2182. ⇒ d = 182.
24.98. D. The sum of two independent, identically distributed Exponentials is a Gamma Distribution, with α = 2 and the same θ. Thus the time the probe continues to transmit has a Gamma Distribution with α = 2 and θ = 1. This Gamma has density f(t) = t e^-t.
S(t) = ∫t to ∞ f(s) ds = ∫t to ∞ s e^-s ds = e^-t + t e^-t. S(3) = 4e^-3 = 0.199.
Alternately, independent, identically distributed exponential interarrival times implies a Poisson Process. Therefore, if we had an infinite number of batteries, over three years the number of failures is Poisson with mean 3. We are interested in the probability of fewer than 2 failures, which is: e^-3 + 3e^-3 = 4e^-3 = 0.199.
Comment: One could work out the convolution from first principles by doing the appropriate integral: f*f(t) = ∫0 to t f(x) f(t - x) dx. Alternately, the distribution function is ∫0 to t F(x) f(t - x) dx.
24.99. D. E[X] = θ/(α-1) = 333.333. E[X^2] = 2θ^2/{(α-1)(α-2)} = 333,333. Var(X) = 333,333 - 333.333^2 = 222,222. E[X^3] = 6θ^3/{(α-1)(α-2)(α-3)} = 1,000,000,000. Third Central Moment is: E[X^3] - 3E[X]E[X^2] + 2E[X]^3 = 1,000,000,000 - (3)(333.333)(333,333) + 2(333.333^3) = 740,741,185. Skewness = 740,741,185/222,222^1.5 = 7.071.
Alternately, for the Pareto, skewness is: 2{(α+1)/(α-3)} √{(α-2)/α} = {(2)(5)/1} √(2/4) = 7.071.
Comment: Since the skewness does not depend on the scale parameter, one could take θ = 1. E[X] = 1/3. E[X^2] = 1/3. Var(X) = 2/9. E[X^3] = 1. Skewness = {1 - 3(1/3)(1/3) + 2(1/3)^3} / (2/9)^1.5 = 7.071.
24.100. A. The sum of 5 independent, identically distributed Gamma Distributions is another Gamma, with the same θ = 100, and α = (5)(2) = 10. Mode = θ(α - 1) = (100)(10 - 1) = 900.
24.101. C. For the Weibull, S(200) = exp[-(200/1000)^0.3] = 0.5395. The mean number of losses is: rβ = (3)(5) = 15. The expected number of (non-zero) payments is: (0.5395)(15) = 8.09.
24.102. B. The number of (non-zero) payments is Negative Binomial with r = 3 and β = (0.5395)(5) = 2.6975. Therefore, the variance of the number of payments is: (3)(2.6975)(1 + 2.6975) = 29.92.
Comment: See "Mahlerʼs Guide to Frequency Distributions."
Section 25, Other Two Parameter Distributions120

In Loss Models, there are other Distributions with 2 parameters, which are much less important to know than the common distributions discussed previously. The Inverse Gamma is the most important of these other distributions. With the exception of the Inverse Gaussian, all of the remaining distributions have scale parameter θ, one shape parameter, and are heavy-tailed, with only some moments that exist.121 Just as with the Pareto Distribution discussed previously, the LogLogistic, Inverse Pareto, ParaLogistic and Inverse ParaLogistic are special cases of the 4 parameter Transformed Beta Distribution, discussed subsequently. The Inverse Gamma and Inverse Weibull are special cases of the 3 parameter Inverse Transformed Gamma, discussed subsequently.

LogLogistic:122

The LogLogistic Distribution is a special case of a Burr Distribution, for α = 1. It has scale parameter θ and shape parameter γ.

F(x) = (x/θ)^γ / {1 + (x/θ)^γ} = 1 / {1 + (θ/x)^γ}.        f(x) = γ x^(γ-1) / [θ^γ {1 + (x/θ)^γ}^2].

The nth moment only exists for n < γ.

Inverse Pareto:123

If X follows a Pareto with parameters α and 1, then θ/X follows an Inverse Pareto with parameters τ = α and θ. The Inverse Pareto is so heavy-tailed that it has no (finite) mean nor higher moments. It has scale parameter θ and shape parameter τ.

F(x) = {x/(x + θ)}^τ = (1 + θ/x)^(-τ).        f(x) = τθ x^(τ-1) / (x + θ)^(τ+1).

120 See the previous section for what I believe are more commonly used two parameter distributions.
121 While their names sound similar, the Inverse Gaussian and Inverse Gamma are completely different distributions.
122 See Appendix A of Loss Models.
123 See Appendix A of Loss Models.
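As a small sketch (mine, not part of the guide), the LogLogistic and Inverse Pareto formulas above can be evaluated directly; the parameter values chosen here match problems 25.6 and 25.8 later in this section:

```python
# Hedged check of the LogLogistic CDF and Inverse Pareto density formulas above.
def loglogistic_cdf(x, theta, gamma_):
    return (x / theta)**gamma_ / (1.0 + (x / theta)**gamma_)

def inverse_pareto_pdf(x, theta, tau):
    return tau * theta * x**(tau - 1) / (x + theta)**(tau + 1)

print(1 - loglogistic_cdf(1500, theta=1000, gamma_=4))   # S(1500) = 0.165 (problem 25.6)
print(inverse_pareto_pdf(7, theta=10, tau=3))            # f(7) = 0.0176 (problem 25.8)
```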
Inverse Gaussian:

The Inverse Gaussian distribution has a tail behavior not very different than a Gamma Distribution. It has two parameters µ and θ, neither of which is exactly either a shape or a scale parameter.

f(x) = √{θ/(2π)} x^(-1.5) exp[-θ(x/µ - 1)^2 / (2x)] = √{θ/(2π)} x^(-1.5) exp[-(θx/(2µ^2) - θ/µ + θ/(2x))].

F(x) = Φ[√(θ/x) (x/µ - 1)] + e^(2θ/µ) Φ[-√(θ/x) (x/µ + 1)]
     = Φ[√θ (√x/µ - 1/√x)] + e^(2θ/µ) Φ[-√θ (√x/µ + 1/√x)].

Mean = µ        Variance = µ^3/θ        Coefficient of Variation = √(µ/θ)

Skewness = 3√(µ/θ) = 3CV.        Kurtosis = 3 + 15µ/θ = 3 + 15CV^2.

Thus the skewness for the Inverse Gaussian distribution is always three times the coefficient of variation.124 Thus the Inverse Gaussian is likely to fit well only to data sets for which this is true. Multiplying a variable that follows an Inverse Gaussian by a constant gives another Inverse Gaussian.125 The Inverse Gaussian is infinitely divisible.126 If X follows an Inverse Gaussian, then given any n > 1 we can find a random variable Y which also follows an Inverse Gaussian, such that adding up n independent versions of Y gives X.

124 For the Gamma, the skewness was twice the coefficient of variation.
125 Thus the Inverse Gaussian is preserved under Uniform Inflation. It is a "scale distribution". See Loss Models, Definition 4.2.
126 The Gamma Distribution is also infinitely divisible.
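A hedged numerical sketch (mine) of the Inverse Gaussian density and distribution function above; the values µ = 7, θ = 4 and µ = 5, θ = 7 are the ones used in problems 25.15 and 25.16 later in this section:

```python
from math import sqrt, exp, pi
from scipy.stats import norm

def inv_gaussian_cdf(x, mu, theta):
    z, y = x / mu - 1.0, x / mu + 1.0
    return norm.cdf(z * sqrt(theta / x)) + exp(2 * theta / mu) * norm.cdf(-y * sqrt(theta / x))

def inv_gaussian_pdf(x, mu, theta):
    return sqrt(theta / (2 * pi)) * x**-1.5 * exp(-theta * (x / mu - 1)**2 / (2 * x))

print(inv_gaussian_cdf(9, mu=7, theta=4))    # about 0.78 (solution 25.15 gets 0.777 with rounded Phi values)
print(inv_gaussian_pdf(16, mu=5, theta=7))   # about 0.0057 (solution 25.16)
```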
If X follows an Inverse Gaussian with parameters µ1 and θ1, and Y follows an Inverse Gaussian with parameters µ2 and θ2, and if µ1^2/θ1 = µ2^2/θ2, then for X and Y independent, X+Y also follows an Inverse Gaussian, but with parameters µ3 = µ1 + µ2 and θ3 such that µ3^2/θ3 = µ1^2/θ1 = µ2^2/θ2.127 Thus an Inverse Gaussian can be thought of as a sum of other independent identically distributed Inverse Gaussian distributions. Therefore, keeping β = µ^2/θ fixed, as µ gets larger and larger, the Inverse Gaussian is the sum of more and more identical copies of an Inverse Gaussian with parameters 1 and 1/β. Thus keeping µ^2/θ fixed and letting µ go to infinity, an Inverse Gaussian approaches a Normal distribution.

It can be shown that if X follows an Inverse Gaussian with parameters µ and θ, and if Y = θ(X-µ)^2 / (µ^2 X), then Y follows a Chi-Square Distribution with one degree of freedom.128 In other words, Y has the same distribution as the square of a unit Normal Distribution.

Density of the Inverse Gaussian Distribution:

Exercise: Determine the derivative with respect to x of: Φ[√θ (x^(1/2)/µ - x^(-1/2))].
[Solution: Φ[y] is the cumulative distribution function of a Standard Normal. Its derivative with respect to y is the density of a Standard Normal: dΦ[y]/dy = φ(y) = (1/√(2π)) exp[-y^2/2]. Therefore, dΦ[y]/dx = (dy/dx)(1/√(2π)) exp[-y^2/2].
dΦ[√θ (x^(1/2)/µ - x^(-1/2))]/dx = √θ {(1/2)x^(-1/2)/µ + (1/2)x^(-3/2)} (1/√(2π)) exp[-θ(x^(1/2)/µ - x^(-1/2))^2 / 2]
= (1/2)(1/√(2π)) √θ (x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)].]

127 One can verify that the means and the variances add, as they must for the sum of any two independent variables. See Insurance Risk Models by Panjer & Willmot, page 115. This is a somewhat more complicated version of the similar result for a Gamma: the sum of two independent Gammas with the same scale parameter is a Gamma with the same scale parameter and the sum of the shape parameters.
128 See for example page 116 of Risk Models by Panjer and Willmot, or page 412 of Volume 1 of Kendallʼs Advanced Theory of Statistics, by Stuart and Ord.
Exercise: Determine the derivative with respect to x of: e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))].
[Solution: dΦ[y]/dx = (dy/dx)(1/√(2π)) exp[-y^2/2].
d{e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))]}/dx = e^(2θ/µ) √θ {(-1/2)x^(-1/2)/µ + (1/2)x^(-3/2)} (1/√(2π)) exp[-θ(x^(1/2)/µ + x^(-1/2))^2 / 2]
= (1/2)(1/√(2π)) √θ (-x^(-1/2)/µ + x^(-3/2)) e^(2θ/µ) exp[-(θ/2)(x/µ^2 + 2/µ + 1/x)]
= (1/2)(1/√(2π)) √θ (-x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)].]

Exercise: Given the Distribution Function F(x) = Φ[√θ (x^(1/2)/µ - x^(-1/2))] + e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))], determine the density function f(x).
[Solution: dF/dx = (1/2)(1/√(2π)) √θ (x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)] + (1/2)(1/√(2π)) √θ (-x^(-1/2)/µ + x^(-3/2)) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)]
= (1/√(2π)) √θ x^(-3/2) exp[-(θ/2)(x/µ^2 - 2/µ + 1/x)] = √{θ/(2π)} exp[-(θx/(2µ^2) - θ/µ + θ/(2x))] / x^1.5.]
Thus we have been able to confirm that the stated Distribution Function of an Inverse Gaussian Distribution does in fact correspond to the stated density function of an Inverse Gaussian.

Moments of the Inverse Gaussian Distribution:

Exercise: For an Inverse Gaussian, set up the integral in order to compute the nth moment.
[Solution: E[X^n] = ∫0 to ∞ f(x) x^n dx = ∫0 to ∞ √{θ/(2π)} x^(n - 1.5) exp[-θx/(2µ^2) + θ/µ - θ/(2x)] dx
= e^(θ/µ) √{θ/(2π)} ∫0 to ∞ x^(n - 1.5) exp[-θx/(2µ^2) - θ/(2x)] dx.]

This integral is in a form that gives a modified Bessel Function of the third kind.129

∫0 to ∞ x^(ν-1) exp[-βx^p - γx^(-p)] dx = (2/p) (γ/β)^(ν/(2p)) K_(ν/p)(2√(βγ)).

129 See Insurance Risk Models by Panjer and Willmot or formula 3.478.4 in Table of Integrals, Series, and Products, by Gradshetyen and Ryzhik.
Therefore, using the above formula with p = 1, ν = n - 1/2, β = θ/(2µ^2), and γ = θ/2,
E[X^n] = e^(θ/µ) √{θ/(2π)} (2) (µ^2)^((n - 0.5)/2) K_(n - 1/2)(θ/µ) = e^(θ/µ) √{2θ/(µπ)} µ^n K_(n - 1/2)(θ/µ).130

Exercise: What is the 4th moment of an Inverse Gaussian with parameters µ = 5 and θ = 7?
[Solution: E[X^4] = e^(θ/µ) √{2θ/(µπ)} µ^4 K_3.5(θ/µ) = e^1.4 √(2.8/π) 5^4 K_3.5(1.4) = (4.0552)(0.94407)(625)(4.80757) = 11,503.3.
Comment: Where one has to look up in a table or use a software package to compute the value of K_3.5(1.4) = 4.80757, the modified Bessel Function of the third kind.131]

ParaLogistic:132

The ParaLogistic Distribution is a special case of the Burr Distribution, with its two shape parameters equal, for α = γ. This is unlike most of the other named special cases, in which a parameter is set equal to one. This general idea of setting two shape parameters equal can be used to produce additional special cases. The only other time this is done in Loss Models is to produce the Inverse ParaLogistic Distribution. The ParaLogistic Distribution has scale parameter θ and shape parameter α.

F(x) = 1 - {1/(1 + (x/θ)^α)}^α.        f(x) = α^2 x^(α-1) / [θ^α {1 + (x/θ)^α}^(α+1)].

The nth moment only exists for n < α^2. This follows from the fact that moments of the Transformed Beta Distribution only exist for n < αγ, with in this case α = γ.

130 The formula for the cumulants of the Inverse Gaussian Distribution is more tractable than that for the moments. The nth cumulant for n ≥ 2 is: µ^(2n-1) (2n-3)! / {θ^(n-1) (n-2)! 2^(n-2)}. See for example, Kendallʼs Advanced Theory of Statistics, Volume I, by Stuart and Ord. Using this formula for the cumulants, one can obtain the formulas for the skewness and the kurtosis listed above.
131 I used Mathematica.
132 See Appendix A of Loss Models.
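As a hedged numerical check (mine) of the Inverse Gaussian moment formula above, E[X^n] = e^(θ/µ) √{2θ/(µπ)} µ^n K_(n-1/2)(θ/µ), using scipy's modified Bessel function kv; the 4th-moment exercise with µ = 5 and θ = 7 is reproduced:

```python
from math import exp, sqrt, pi
from scipy.special import kv   # modified Bessel function K_nu(z)

def inv_gaussian_moment(n, mu, theta):
    return exp(theta / mu) * sqrt(2 * theta / (mu * pi)) * mu**n * kv(n - 0.5, theta / mu)

print(inv_gaussian_moment(1, 5, 7))   # the mean, which should come out to mu = 5
print(inv_gaussian_moment(4, 5, 7))   # about 11,503, as in the exercise above
```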
Inverse ParaLogistic:133

The Inverse ParaLogistic Distribution is a special case of the Inverse Burr Distribution, with its two shape parameters equal, for τ = γ. If X follows a ParaLogistic with parameters α and 1, then θ/X follows an Inverse ParaLogistic with parameters τ = α and θ. The Inverse ParaLogistic Distribution has scale parameter θ and shape parameter τ.

F(x) = {(x/θ)^τ / (1 + (x/θ)^τ)}^τ = {1 + (θ/x)^τ}^(-τ).        f(x) = τ^2 (x/θ)^(τ^2) / [x {1 + (x/θ)^τ}^(τ+1)].

The nth moment only exists for n < τ.

Inverse Weibull:134

If X follows a Weibull Distribution with parameters 1 and τ, then θ/X follows an Inverse Weibull with parameters θ and τ. The Inverse Weibull is heavier-tailed than the Weibull; the moments of the Inverse Weibull only exist for n < τ, while the Weibull has all of its (positive) moments exist. The Inverse Weibull has scale parameter θ and shape parameter τ. The Inverse Weibull Distribution is a special case of the Inverse Transformed Gamma Distribution with α = 1.

F(x) = exp[-(θ/x)^τ].        f(x) = τ θ^τ exp[-(θ/x)^τ] / x^(τ+1).

133 See Appendix A of Loss Models.
134 See Appendix A in Loss Models.
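A quick check (mine) of the Inverse Weibull moments, E[X^k] = θ^k Γ(1 - k/τ) for k < τ, reproducing the variance asked for in problem 25.12 later in this section (τ = 5, θ = 20):

```python
from scipy.special import gamma

tau, theta = 5.0, 20.0
m1 = theta * gamma(1 - 1/tau)       # 20 Gamma(0.8) = 23.285
m2 = theta**2 * gamma(1 - 2/tau)    # 400 Gamma(0.6) = 595.68
print(m1, m2 - m1**2)               # variance about 53.5
```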
Inverse Gamma:135 136

If X follows a Gamma Distribution with parameters α and 1, then θ/X follows an Inverse Gamma Distribution with parameters α and θ. (Thus this distribution is no more complicated conceptually than the Gamma Distribution.) α is the shape parameter and θ is the scale parameter. The Inverse Gamma Distribution is a special case of the Inverse Transformed Gamma Distribution with τ = 1.

The Distribution Function is: F(x) = 1 - Γ(α; θ/x), and the density function is: f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]}.

If X follows an Inverse Gamma Distribution with parameters α and θ, then 1/X has distribution function Γ(α; θx), which is a Gamma Distribution with parameters α and 1/θ.

Note that the density has a negative power of x times an exponential of 1/x. This is how one recognizes an Inverse Gamma density.137 The scale parameter θ is divided by x in the exponential. The negative power of x has an absolute value one more than the shape parameter α.

Exercise: A probability density function is proportional to e^(-11/x) x^(-2.5). What distribution is this?
[Solution: This is an Inverse Gamma Distribution with α = 1.5 and θ = 11.
Comment: The proportionality constant in front of the density is 11^1.5 / Γ(1.5) = 36.48/0.8862 = 41.16. There is no requirement that α be an integer. If α is non-integral then one needs access to a software package that computes the (complete) Gamma Function.]

The Distribution Function is related to that of a Gamma Distribution: Γ(α; x/θ). If x/θ follows an Inverse Gamma Distribution with scale parameter of one, then θ/x follows a Gamma Distribution with a scale parameter of one.

The Inverse Gamma is heavy-tailed, as can be seen by the lack of the existence of certain moments.138 The nth moment of an Inverse Gamma only exists for n < α.

135 See Appendix A of Loss Models.
136 An Inverse Gamma Distribution is sometimes called a Pearson Type V Distribution.
137 The Gamma density has an exponential of x times x to a power.
138 In the extreme tail its behavior is similar to that of a Pareto distribution with the same shape parameter α.
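A hedged sketch (mine) of recognizing and normalizing an Inverse Gamma density, using the kernel e^(-11/x) x^(-2.5) from the exercise above, where θ = 11 and α + 1 = 2.5 so α = 1.5:

```python
from scipy.special import gamma
from scipy.integrate import quad
import numpy as np

theta, alpha = 11.0, 1.5
constant = theta**alpha / gamma(alpha)                  # about 41.16, as in the comment above
pdf = lambda x: constant * np.exp(-theta / x) * x**(-(alpha + 1))
print(constant, quad(pdf, 0, np.inf)[0])                # the normalized density integrates to about 1
```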
Note that the Inverse Gamma density function integrates to unity from zero to infinity.139

∫0 to ∞ e^(-θ/x) / x^(α+1) dx = Γ(α) / θ^α, α > 0.

This fact will be useful for working with the Inverse Gamma Distribution. For example, one can compute the moments of the Inverse Gamma Distribution:

E[X^n] = ∫0 to ∞ x^n f(x) dx = ∫0 to ∞ x^n θ^α e^(-θ/x) / {Γ(α) x^(α+1)} dx = {θ^α/Γ(α)} ∫0 to ∞ e^(-θ/x) x^(-(α+1-n)) dx
= {θ^α/Γ(α)} Γ(α - n) / θ^(α-n) = θ^n Γ(α - n) / Γ(α), α - n > 0.

Alternately, the moments of the Inverse Gamma also follow from the moments of the Gamma Distribution, which are E[X^n] = θ^n Γ(α+n) / Γ(α).140 If X follows a Gamma with unity scale parameter, then Z = θ/X has an Inverse Gamma Distribution, with parameters α and θ. Thus the Inverse Gamma has moments: E[Z^n] = E[(θ/X)^n] = θ^n E[X^(-n)] = θ^n Γ(α-n) / Γ(α), α > n.

Specifically, the mean of the Inverse Gamma = E[θ/X] = θ Γ(α-1) / Γ(α) = θ/(α-1).

As will be discussed in a subsequent section, the Inverse Gamma Distribution can be used to mix together Exponential Distributions.

For the Inverse Gamma Distribution: {Skewness(Kurtosis + 3)}^2 = 4 (4 Kurtosis - 3 Skewness^2) (2 Kurtosis - 3 Skewness^2 - 6).141

139 This follows from substituting y = 1/x in the definition of the Gamma Function. Remember it via the fact that all probability density functions integrate to unity over their support.
140 This formula works for n positive or negative, provided n > -α.
141 See page 216 of Volume I of Kendallʼs Advanced Theory of Statistics. Type V of the Pearson system of distributions is the Inverse Gamma.
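As a small verification (mine) of the moment formula just derived, E[X^n] = θ^n Γ(α - n)/Γ(α), compared against direct numerical integration of the Inverse Gamma density:

```python
from scipy.special import gamma
from scipy.integrate import quad
import numpy as np

alpha, theta, n = 5.0, 10.0, 2
formula = theta**n * gamma(alpha - n) / gamma(alpha)     # 100 Gamma(3)/Gamma(5) = 8.333
pdf = lambda x: theta**alpha * np.exp(-theta / x) / (gamma(alpha) * x**(alpha + 1))
numeric = quad(lambda x: x**n * pdf(x), 0, np.inf)[0]
print(formula, numeric)   # both about 8.33 (the second moment used in solution 25.3)
```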
Inverse Gamma Distribution

Support: x > 0        Parameters: α > 0 (shape parameter), θ > 0 (scale parameter)

D. f.:    F(x) = 1 - Γ(α; θ/x)

P. d. f.:    f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]} = (θ/x)^α e^(-θ/x) / {x Γ[α]}

Moments: E[X^k] = θ^k / ∏(i = 1 to k) (α - i) = θ^k / {(α - 1)...(α - k)}, α > k.

Mean = θ/(α - 1), α > 1

Second Moment = θ^2 / {(α - 1)(α - 2)}, α > 2

Variance = θ^2 / {(α - 1)^2 (α - 2)}, α > 2

Mode = θ/(α + 1)

Coefficient of Variation = Standard Deviation / Mean = 1/√(α - 2), α > 2.

Skewness = 4√(α - 2) / (α - 3), α > 3.

Kurtosis = 3(α - 2)(α + 5) / {(α - 3)(α - 4)}, α > 4.

Limited Expected Value: E[X ∧ x] = {θ/(α - 1)}{1 - Γ[α-1; θ/x]} + x Γ[α; θ/x]

R(x) = Excess Ratio = Γ[α-1; θ/x] - (α-1)(x/θ) Γ[α; θ/x].

e(x) = Mean Excess Loss = {θ/(α - 1)} Γ[α-1; θ/x] / Γ[α; θ/x] - x, α > 1

X ∼ Gamma(α, 1) ⇔ θ/X ∼ Inverse Gamma(α, θ).
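A hedged sketch (mine) of the summary formulas above, using scipy's regularized incomplete gamma function gammainc(a, y) for Γ[a; y]; the parameters α = 5, θ = 10 match the problem set that follows:

```python
from scipy.special import gammainc

alpha, theta = 5.0, 10.0
mean     = theta / (alpha - 1)                       # 2.5
variance = theta**2 / ((alpha - 1)**2 * (alpha - 2)) # 2.083
mode     = theta / (alpha + 1)                       # 1.667

def lev(x):
    # E[X ^ x] = theta/(alpha-1) {1 - Gamma[alpha-1; theta/x]} + x Gamma[alpha; theta/x]
    return mean * (1 - gammainc(alpha - 1, theta / x)) + x * gammainc(alpha, theta / x)

print(mean, variance, mode)   # matches solutions 25.2 - 25.4 below
print(lev(2.0))               # limited expected value at 2
```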
Problems: Use the following information for the next 4 questions: You have an Inverse Gamma Distribution with parameters α = 5 and θ = 10. 25.1 (1 point) What is the density function at x = 0.7? A. less than 0.01 B. at least 0.01 but less than 0.02 C. at least 0.02 but less than 0.03 D. at least 0.03 but less than 0.04 E. at least 0.04 25.2 (1 point) What is the mean? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6 25.3 (1 point) What is the variance? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6 25.4 (1 point) What is the mode?
25.5 (1 point) What is the integral from zero to infinity of 1,000,000 e^(-15/x) x^(-7)? A. less than 8 B. at least 8 but less than 9 C. at least 9 but less than 10 D. at least 10 but less than 11 E. at least 11
25.6 (1 point) Sizes of loss are assumed to follow a LogLogistic Distribution, with γ = 4 and θ = 1000. What is the probability that a loss exceeds 1500? A. B. C. D. E.
less than 13% at least 13% but less than 14% at least 14% but less than 15% at least 15% but less than 16% at least 16%
25.7 (1 point) Losses follow a LogLogistic Distribution with γ = 4 and θ = 1000. What is the mode? 25.8 (1 point) For an Inverse Pareto Distribution, with τ = 3 and θ = 10, what is the probability density function at 7? A. less than 1.7% B. at least 1.7% but less than 1.8% C. at least 1.8% but less than 1.9% D. at least 1.9% but less than 2.0% E. at least 2.0% 25.9 (1 point) For an Inverse Pareto Distribution, with τ = 6 and θ = 80, what is the mode? 25.10 (2 points) For a ParaLogistic Distribution, with α = 4 and θ = 100, what is the mean? You may use the facts that: Γ(1/4) = 3.6256, Γ(1/2) = 1.7725, and Γ(3/4) = 1.2254. A. 59
B. 61
C. 63
D. 65
E. 67
25.11 (1 point) For an Inverse ParaLogistic Distribution, with τ = 2.3 and θ = 720, what is F(2000)? A. B. C. D. E.
less than 82% at least 82% but less than 83% at least 83% but less than 84% at least 84% but less than 85% at least 85%
25.12 (2 points) For an Inverse Weibull Distribution, with τ = 5 and θ = 20, what is the variance? You may use: Γ(0.2) = 4.59084 , Γ(0.4) = 2.21816 , Γ(0.6) = 1.48919 , and Γ(0.8) = 1.16423. A. B. C. D. E.
less than 55 at least 55 but less than 60 at least 60 but less than 65 at least 65 but less than 70 at least 70
25.13 (1 point) For an Inverse Weibull Distribution, with τ = 5 and θ = 20, what is the mode?
25.14 (2 points) f(x) =
12,500 e^(-10/x) / (3 x^6), x > 0.
You take a sample of size 20. Using the Normal Approximation, determine the probability that the sample mean is greater than 3. A. 2% B. 3% C.4% D. 5% E. 6% 25.15 (2 points) What is the Distribution Function at 9 of an Inverse Gaussian Distribution, with parameters µ = 7 and θ = 4? A. B. C. D. E.
less than 70% at least 70% but less than 75% at least 75% but less than 80% at least 80% but less than 85% at least 85%
25.16 (2 points) What is the Probability Density Function at 16 of an Inverse Gaussian Distribution, with parameters µ = 5 and θ = 7? A. B. C. D. E.
less than 0.0040 at least 0.0040 but less than 0.0045 at least 0.0045 but less than 0.0050 at least 0.0050 but less than 0.0055 at least 0.0055
Use the following information for the next two questions. Let W = 0.7X + 0.3Y, where X and Y are independent random variables. X follows a Gamma distribution with α = 3 and θ = 10. Y follows a Inverse Gamma distribution with α = 12 and θ = 300. 25.17 (2 points) What is the mean of W? A. less than 25 B. at least 26 but less than 27 C. at least 27 but less than 28 D. at least 28 but less than 29 E. at least 29 25.18 (3 points) What is the variance of W? A. less than 145 B. at least 145 but less than 150 C. at least 150 but less than 155 D. at least 155 but less than 160 E. at least 160
Solutions to Problems:
25.1. C. f(x) = θ^α e^(-θ/x) / {Γ(α) x^(α+1)}. f(0.7) = 10^5 e^(-10/0.7) / {Γ(5) 0.7^6} = 0.0221.
Comment: Using the formulas in Appendix A of Loss Models, f(x) = (θ/x)^α e^(-θ/x) / {x Γ(α)}. f(0.7) = (10/0.7)^5 e^(-10/0.7) / {0.7 Γ(5)} = 0.0221.
25.2. D. Mean = θ/(α-1) = 10/(5-1) = 2.5.
25.3. B. Variance = θ^2 / {(α-1)^2 (α-2)} = 10^2 / {4^2 (3)} = 2.0833. Alternately, the second moment is: θ^2 / {(α-1)(α-2)} = 10^2 / {(4)(3)} = 8.333. Thus the variance = 8.333 - 2.5^2 = 2.083.
25.4. Mode = θ/(α+1) = 10/6 = 1.67.
25.5. D. ∫0 to ∞ x^(-(α+1)) e^(-θ/x) dx = Γ(α) / θ^α.
Letting θ = 15 and α = 6, the integral from zero to infinity of e^(-15/x) x^(-7) is: Γ(6)/15^6 = 120/11,390,625 = 0.0000105. Thus the integral of 1 million times that is: 10.5.
Comment: e^(-15/x) x^(-7) is proportional to the density of an Inverse Gamma Distribution with θ = 15 and α = 6. Thus its integral from zero to infinity is the inverse of the constant in front of the Inverse Gamma density, since the density itself must integrate to unity. Alternately, one could let y = 15/x and convert the integral to a complete Gamma Function.
25.6. E. F(x) = (x/θ)^γ / (1 + (x/θ)^γ). S(x) = 1/(1 + (x/θ)^γ) = 1/(1 + 1.5^4) = 0.165.
25.7. For γ > 1, mode = θ {(γ-1)/(γ+1)}^(1/γ) = 1000 (3/5)^(1/4) = 880.
25.8. B. f(x) = τθ x^(τ-1) / (x+θ)^(τ+1). f(7) = (3)(10)(7^2)/(7+10)^4 = 0.0176.
25.9. For τ > 1, mode = θ(τ - 1)/2 = 80(5/2) = 200.
25.10. E. Γ(1.25) = (1/4)Γ(0.25) = (1/4)(3.6256) = 0.9064. Γ(3.75) = (2.75)(1.75)(0.75)Γ(0.75) = (3.6094)(1.2254) = 4.423. E[X] = θ Γ(1 + 1/α) Γ(α - 1/α)/Γ(α) = 100 Γ(1.25) Γ(3.75)/Γ(4) = (100)(0.9064)(4.423)/6 = 66.8.
25.11. A. F(x) = {(x/θ)^τ / (1 + (x/θ)^τ)}^τ = (1 + (θ/x)^τ)^(-τ). F(2000) = (1 + (720/2000)^2.3)^(-2.3) = 0.811.
25.12. A. E[X] = θ Γ(1 - 1/τ) = 20 Γ(0.8) = (20)(1.16423) = 23.285. E[X^2] = θ^2 Γ(1 - 2/τ) = 400 Γ(0.6) = (400)(1.48919) = 595.68. Variance = 595.68 - 23.285^2 = 53.5.
25.13. Mode = θ {τ/(τ+1)}^(1/τ) = (20)(5/6)^(1/5) = 19.3.
25.14. E. The given density is that of an Inverse Gamma, with α = 5 and θ = 10. Mean = θ/(α-1) = 10/4 = 2.5. Second Moment = θ^2 / {(α-1)(α-2)} = 100/12. Variance = 25/3 - 2.5^2 = 2.0833.
Thus the variance of the sample mean is: 2.0833/20 = 0.10417. S(3) ≅ 1 - Φ[(3 - 2.5)/√0.10417] = 1 - Φ[1.55] = 6.06%.
25.15. C. For the Inverse Gaussian, F(x) = Φ[√θ (x^(1/2)/µ - x^(-1/2))] + e^(2θ/µ) Φ[-√θ (x^(1/2)/µ + x^(-1/2))]. With µ = 7 and θ = 4, F(9) = Φ[2(3/7 - 1/3)] + e^(8/7) Φ[-2(3/7 + 1/3)] = Φ[0.19] + (3.1357)Φ[-1.52] = 0.5753 + (3.1357)(0.0643) = 0.777.
Comment: Using the formulas in Appendix A of Loss Models, z = (x-µ)/µ = (9-7)/7 = 2/7. y = (x+µ)/µ = (9+7)/7 = 16/7. F(x) = Φ[z √(θ/x)] + e^(2θ/µ) Φ[-y √(θ/x)]. F(9) = Φ[(2/7)√(4/9)] + e^(8/7) Φ[-(16/7)√(4/9)] = Φ[0.19] + (3.1357)Φ[-1.52] = 0.777.
25.16. E. For the Inverse Gaussian, f(x) = √{θ/(2π)} x^(-1.5) exp[-θ(x/µ - 1)^2 / (2x)].
With µ = 5 and θ = 7, f(16) = √{7/(2π)} 16^(-1.5) exp[-7(11/5)^2/32] = (1.0556)(1/64) e^(-1.0588) = 0.0057.
25.17. E. Mean of Gamma is αθ = 30. Mean of Inverse Gamma = θ/(α-1) = 300/11 = 27.3. Mean of W = 0.7(30) + 0.3(27.3) = 29.2.
25.18. C. Second moment for Gamma is α(α+1)θ^2 = 1200. Variance of Gamma = 1200 - 30^2 = 300 = αθ^2. Second moment for Inverse Gamma is θ^2 / {(α-1)(α-2)} = 818.2. Variance of Inverse Gamma = 818.2 - 27.3^2 = 72.9. Therefore, the Variance of W = (0.7^2)(300) + (0.3^2)(72.9) = 153.6.
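A quick numeric restatement (mine) of solutions 25.17-25.18, working in exact moments rather than the rounded intermediate values used above:

```python
# W = 0.7 X + 0.3 Y, X ~ Gamma(alpha=3, theta=10), Y ~ Inverse Gamma(alpha=12, theta=300), independent.
ax, tx = 3.0, 10.0
ay, ty = 12.0, 300.0

mean_x, var_x = ax * tx, ax * tx**2
mean_y = ty / (ay - 1)
var_y = ty**2 / ((ay - 1)**2 * (ay - 2))

print(0.7 * mean_x + 0.3 * mean_y)          # about 29.2
print(0.7**2 * var_x + 0.3**2 * var_y)      # about 153.7 (solution 25.18 gets 153.6 with rounding)
```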
Section 26, Three Parameter Distributions

There are five Three Parameter Distributions in Loss Models: Transformed Gamma, Inverse Transformed Gamma, Burr, Inverse Burr, and Generalized Pareto, each a generalization of one or more of the two parameter distributions. The extra parameter provides extra flexibility, which potentially allows a closer fit to data.142 You are unlikely to be asked questions involving the 3 parameter distributions.

Transformed Gamma Distribution:143

Transformed Gamma with τ = 1 is the Gamma. Transformed Gamma with α = 1 is the Weibull.

F(x) = Γ[α; (x/θ)^τ].        f(x) = τ x^(τα - 1) exp[-(x/θ)^τ] / {θ^(τα) Γ(α)}.

θ is the scale parameter for the Transformed Gamma. α is a shape parameter in the same way it is for the Gamma. τ is a shape parameter, as for the Weibull. The relationships between the Exponential, Weibull, Gamma, and Transformed Gamma Distributions are shown below:

                    power transformation
    Exponential  ----------------------->  Weibull
        ^ α = 1                                ^ α = 1
    Gamma        ----------------------->  Transformed Gamma
                    power transformation

Mean = θ Γ(α + 1/τ) / Γ(α).        Variance = θ^2 {Γ(α)Γ(α + 2/τ) - Γ^2(α + 1/τ)} / Γ(α)^2.

142 The potential additional accuracy comes at the cost of extra complexity.
143 See Appendix A and Figure 5.3 in Loss Models.
Therefore: 1 + CV^2 = Γ(α)Γ(α + 2/τ) / Γ^2(α + 1/τ).
It turns out that: (Skewness)(CV^3) + 3CV^2 + 1 = Γ^2(α)Γ(α + 3/τ) / Γ^3(α + 1/τ).144

The Transformed Gamma Distribution is defined in terms of the Incomplete Gamma Function, F(x) = Γ[α; (x/θ)^τ]. Thus using the change of variables y = (x/θ)^τ, the density of a Transformed Gamma Distribution can be derived from that for a Gamma Distribution.

Exercise: Derive the density of the Transformed Gamma Distribution from that of the Gamma Distribution.
[Solution: Let y = (x/θ)^τ. If y follows a Gamma Distribution with parameters α and 1, then x follows a Transformed Gamma Distribution with parameters α, θ, and τ. If y follows a Gamma Distribution with parameters α and 1, then f(y) = y^(α-1) e^-y / Γ(α). Then the density of x is given by: f(y)(dy/dx) = {((x/θ)^τ)^(α-1) exp[-(x/θ)^τ] / Γ(α)} {τ θ^(-τ) x^(τ-1)} = θ^(-τα) τ x^(τα-1) exp[-(x/θ)^τ] / Γ(α), as shown in Appendix A of Loss Models.]

Exercise: What is the mean of a Transformed Gamma Distribution?
[Solution: ∫0 to ∞ x f(x) dx = ∫0 to ∞ x θ^(-τα) τ x^(τα-1) exp[-(x/θ)^τ] / Γ(α) dx = ∫0 to ∞ θ^(-τα) τ x^(τα) exp[-(x/θ)^τ] dx / Γ(α).
Let y = (x/θ)^τ, and thus x = θ y^(1/τ), dx = θ y^(1/τ - 1) dy/τ; then the integral for the first moment is:
∫0 to ∞ θ^(-τα) τ (y^α θ^(τα)) exp[-y] {θ y^(1/τ - 1) dy/τ} / Γ(α) = ∫0 to ∞ θ y^(α + 1/τ - 1) exp[-y] dy / Γ(α) = θ Γ(α + 1/τ) / Γ(α).]

144 See Venter, "Transformed Beta and Gamma Distributions and Aggregate Losses," PCAS 1983. These relations can be used to apply the method of moments to the Transformed Gamma Distribution. Also an appropriate mixing of Transformed Gammas via a Gamma produces a Transformed Beta Distribution. The mixing of Exponentials via a Gamma to produce a Pareto is just a special case of this more general result.
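As a hedged numerical sketch (mine) of the Transformed Gamma moment formula, E[X^n] = θ^n Γ(α + n/τ)/Γ(α), checked against direct integration of the density; the parameters α = 5, θ = 2.5, τ = 1.5 are those of the 3rd-moment exercise further below:

```python
from scipy.special import gamma
from scipy.integrate import quad
import numpy as np

alpha, theta, tau = 5.0, 2.5, 1.5
pdf = lambda x: tau * x**(tau*alpha - 1) * np.exp(-(x/theta)**tau) / (theta**(tau*alpha) * gamma(alpha))

for n in (1, 3):
    formula = theta**n * gamma(alpha + n/tau) / gamma(alpha)
    numeric = quad(lambda x: x**n * pdf(x), 0, np.inf)[0]
    print(n, formula, numeric)     # n = 3 gives 468.75, as in the exercise below
```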
Exercise: What is the nth moment of a Transformed Gamma Distribution?
[Solution: ∫0 to ∞ x^n f(x) dx = ∫0 to ∞ x^n θ^(-τα) τ x^(τα-1) exp[-(x/θ)^τ] dx / Γ(α) = ∫0 to ∞ θ^(-τα) τ x^(τα+n-1) exp[-(x/θ)^τ] dx / Γ(α).
Let y = (x/θ)^τ, and thus x = θ y^(1/τ), dx = θ y^(1/τ - 1) dy/τ; then the integral for the nth moment is:
∫0 to ∞ θ^(-τα) τ (y^(α + (n-1)/τ) θ^(τα + n - 1)) exp[-y] {θ y^(1/τ - 1) dy/τ} / Γ(α) = ∫0 to ∞ θ^n y^(α + n/τ - 1) exp[-y] dy / Γ(α)
= θ^n Γ(α + n/τ) / Γ(α). This is the formula shown in Appendix A of Loss Models.]

Exercise: What is the 3rd moment of a Transformed Gamma Distribution with α = 5, θ = 2.5, and τ = 1.5?
[Solution: θ^n Γ(α + n/τ)/Γ(α) = 2.5^3 Γ(5 + 3/1.5)/Γ(5) = 2.5^3 {Γ(7)/Γ(5)} = 2.5^3 (6)(5) = 468.75.]

Limit of Transformed Gamma Distributions:

One can obtain a LogNormal Distribution as an appropriate limit of Transformed Gamma Distributions.145 First note that Γ[α; y] is a Gamma Distribution with scale parameter of 1, and thus mean and variance of α. As α gets large, this Gamma Distribution approaches a Normal Distribution. Thus for large α, Γ[α; y] ≅ Φ[(y - α)/√α].

Now take a limit of Transformed Gamma Distributions as τ goes to zero, while α = (1 + τµ)/(τ^2 σ^2) and θ^τ = τ^2 σ^2, where µ and σ are selected constants.146 As τ goes to zero, both α and θ go to infinity. For each Transformed Gamma we have F(x) = Γ[α; y], with y = (x/θ)^τ = x^τ/(τ^2 σ^2).

(y - α)/√α = {x^τ/(τ^2 σ^2) - (1 + τµ)/(τ^2 σ^2)} / {√(1 + τµ)/(τσ)} = {x^τ/√(1 + τµ) - √(1 + τµ)} / (τσ).

As tau goes to zero, both the numerator and denominator go to zero. To get the limit we use LʼHospitalʼs rule and differentiate both the numerator and denominator with respect to τ:

limit as τ → 0 of {x^τ/√(1 + τµ) - √(1 + τµ)}/(τσ) = limit as τ → 0 of {ln(x) x^τ/√(1 + τµ) - µx^τ/{2(1 + τµ)^1.5} - µ/{2√(1 + τµ)}}/σ
= {ln(x) - µ/2 - µ/2}/σ = {ln(x) - µ}/σ.

145 See Section 5.3.3 of Loss Models.
146 Mu and sigma will turn out to be the parameters of the limiting LogNormal Distribution.
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 340
Thus as alpha gets big and tau get small we have Γ[α; y] ≅ Φ[(y - α) /
α ] ≅ Φ[{ln(x) - µ}/ σ],
which is a LogNormal Distribution. Thus one can obtain a LogNormal Distribution as an appropriate limit of Transformed Gamma Distributions.147 Inverse Transformed Gamma:148 If x follows a Transformed Gamma Distribution, then 1/x follows an Inverse Transformed Gamma Distribution. Inverse Transformed Gamma with τ = 1 is the Inverse Gamma. Inverse Transformed Gamma with α = 1 is the Inverse Weibull.
F(x) = 1 -
⎛ θ ⎞τ τ θτα exp -⎜ ⎟ ⎝ x⎠ f(x) = . τα + 1 x Γ[α]
[ ]
Γ[α ; (θ/x)τ].
θ is the scale parameter for the Inverse Transformed Gamma. α is a shape parameter in the same way it is for the Inverse Gamma. τ is a shape parameter, as for the Inverse Weibull. The relationships between the Inverse Exponential, Inverse Weibull, Inverse Gamma, and Inverse Transformed Gamma Distributions are shown below: power transformation Inverse Exponential α =1
⇒
⇑
Inverse Gamma
⇒
Inverse Weibull
⇑
α =1
Inverse Transformed Gamma
power transformation The Inverse Transformed Gamma, and its special cases, are heavy-tailed distributions. The nth moment only exists is n < ατ. 147 148
One can also obtain a LogNormal Distribution as an appropriate limit of Inverse Transformed Gamma Distributions. See Appendix A and Figure 5.3 of Loss Models.
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 341
The mean excess loss increases approximately linearly for large x. In the same way that the Transformed Gamma distribution is a generalization of the Gamma Distribution, the Inverse Transformed Gamma is a Generalization of the Inverse Gamma. y = xτ Gamma
⇓
⇒
y = 1/x Inverse Gamma
Transformed Gamma
⇓
y = 1/x
⇒
Inverse Transformed Gamma
y = xτ
Generalized Pareto:149 Generalized Pareto with τ = 1 is the Pareto. Generalized Pareto with α = 1 is the Inverse Pareto. F(x) = β[τ, α ; x/(θ+x)].
f(x) =
Γ[α + τ] θα xτ -1 . Γ[α ] Γ[τ] (θ + x)α + τ
θ is the scale parameter for the Generalized Pareto. α is a shape parameter in the same way it is for the Pareto. τ is an additional shape parameter. While the Pareto may be obtained by mixing Exponential Distributions via a Gamma Distribution, the Generalized Pareto can be obtained by mixing Gamma Distributions via a Gamma Distribution. Thus in the same way that a Gamma is a generalization of the Exponential, so is the Generalized Pareto of the Pareto. If x follows a Generalized Pareto, then so does 1/x.150 Specifically, using the fact that β(a,b; x) = 1 - β(b,a; 1-x), 1 - F(1/x) = 1 - β[τ, α ; (1/x)/(θ+1/x)] = 1 - β[τ, α ; (1/θ)/(1/θ +x)] = β[α, τ; 1- (1/θ)/(1/θ +x)] = β[α, τ; x/(1/θ +x)]. Thus if x follows a Generalized Pareto with parameters α, θ, τ, then 1/x follows a Generalized Pareto, but with parameters τ, 1/θ, α. 149
As discussed in “Mahlerʼs Guide to Statistics”, for CAS Exam 3L, the F-Distribution is a special case of the Generalized Pareto. The Generalized Pareto is sometimes called a Pearson Type VI Distribution. 150 Which is why there is not an “Inverse Generalized Pareto Distribution.”
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 342
Burr Distribution: Burr with γ = 1 is the Pareto. Burr with α = 1 is the LogLogistic.
Burr with α = γ is the ParaLogistic.
⎛ ⎞α 1 F(x) = 1 - ⎜ γ⎟ ⎝ 1 + (x / θ) ⎠
f(x) =
α γ xγ −1 θγ
⎛ ⎞ α+ 1 1 ⎜ γ⎟ ⎝ 1 + (x / θ) ⎠
θ is the scale parameter for the Burr distribution. α is a shape parameter in the same way it is for the Pareto. γ is an additional shape parameter. The Burr is obtained from the Pareto by introducing a power transformation; if xγ follows a Pareto Distribution, then x follows a Burr Distribution. If x follows a Burr Distribution with parameters α, θ, and γ, then (x/θ)γ follows a Pareto Distribution with shape parameter of α and scale parameter of 1. While the Pareto may be obtained by mixing Exponential Distributions via a Gamma Distribution, the Burr can be obtained by mixing Weibull Distributions via a Gamma Distribution. Thus in the same way that a Weibull is a generalization of the Exponential, so is the Burr of the Pareto. Inverse Burr: If x follows a Burr Distribution, then 1/x follows an Inverse Burr Distribution. Inverse Burr with γ = 1 is the Inverse Pareto. Inverse Burr with τ = 1 is the LogLogistic. Inverse Burr with α = γ is the Inverse ParaLogistic. ⎛ (x / θ) γ ⎞ τ γ −τ F(x) = ⎜ γ ⎟ = {1 + (θ /x) } ⎝ 1 + (x / θ) ⎠
f(x) =
⎞τ+1 τ γ x γτ − 1 ⎛ 1 ⎜ γ⎟ θγτ ⎝ 1 + (x / θ) ⎠
θ is the scale parameter for the Burr distribution. α is a shape parameter in the same way it is for the Inverse Pareto. γ is an additional shape parameter. The Inverse Burr is obtained from the Inverse Pareto by introducing a power transformation; if xγ follows a Inverse Pareto Distribution, then x follows a Inverse Burr Distribution.
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 343
While the Inverse Pareto may be obtained by mixing Inverse Exponential Distributions via a Gamma Distribution, the Inverse Burr can be obtained by mixing Inverse Weibull Distributions via a Gamma Distribution. Thus in the same way that an Inverse Weibull is a generalization of the Inverse Exponential, so is the Inverse Burr of the Inverse Pareto. Log-t Distribution:151 The log-t distribution has the same relationship to the t distribution, as does the LogNormal Distribution to the Standard Normal Distribution.152 If Y has a t distribution with r degrees of freedom, then exp(σY + µ) has a log-t distribution, with parameters r, µ and σ. In other words, if X has a log-t distribution with parameters r, µ and σ, then (ln(X) - µ)/σ has a t distribution, with r degrees of freedom. Exercise: What is the distribution function at 1.725 for a t distribution with 20 degrees of freedom? [Solution: Using a t table, at 1.725 there is a total of 10% in the left and right hand tails. Therefore, at 1.725 the distribution function is 95%.] Exercise: What is the distribution function at 11.59 for a log-t distribution with parameters r = 20, µ = -1, and σ = 2? [Solution: {ln(11.59) - (-1)}/2 = 1.725. Thus we want the distribution function at 1.725 of a t distribution with 20 degrees of freedom. This is 95%.] As with the t-distribution, the distribution function of the log-t distribution can be written in terms of incomplete beta functions, while the density involves complete gamma functions. Since the t-distribution is heavier tailed than the Normal Distribution, the log-t distribution is heavier tailed than the LogNormal Distribution. In fact, none of the moments of the log-t distribution exist!
151 152
See Appendix A in Loss Models. The (Studentʼs) t distribution is discussed briefly in the next section.
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 344
Problems: 26.1 (1 point) The Generalized Pareto Distribution is a generalization of which of the following Distributions? A. Pareto B. Gamma
C. Weibull
D. LogNormal
E. Inverse Gaussian
26.2 (2 points) For f(x) = (10-10) x9 exp[-(.01x2 )] / 12 what is the value of integral from zero to infinity of x2 f(x) dx? Hint: For the Transformed Gamma distribution, f(x) = τ(x/θ)τα exp(-(x/θ)τ) / {x Γ(α)}, and E[Xn ] = θn Γ(α + n/τ)/Γ(α). A. less than 450 B. at least 450 but less than 470 C. at least 470 but less than 490 D. at least 490 but less than 510 E. at least 510 26.3 (1 point) Match the Distributions. 1. Transformed Gamma with α =1.
a. Pareto
2. Transformed Gamma with τ =1.
b. Weibull
3. Burr with γ =1.
c. Gamma
A. 1a, 2b, 3c
B. 1a, 2c, 3b
C. 1b, 2a, 3c
D. 1b, 2c, 3a
26.4 (2 points) You are given the following: •
Claim sizes follow a Burr Distribution with parameters α = 3, θ = 32, and γ = 2.
• You observe 11 claims. • The number of claims and claim sizes are independent. Determine the probability that the smallest of these claim is greater than 7. A. less than 20% B. at least 20% but less than 25% C. at least 25% but less than 30% D. at least 30% but less than 35% E. at least 35%
E. 1c, 2b, 3a
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 345
26.5 (2 points) Which of the following is an expression for the variance of a Generalized Pareto Distribution, with α > 2? A. θ2 α / {(α-τ)2 (α-2)} B. θ2 τ (α+τ-1) / {(α-1)2 (α-2)} C. θ2 (α+τ-1) / {(α-τ)(α-1)(α-2)} D. θ2 τ α / {(α-τ)2 (α- (τ+1))} E. None of the above. 26.6 (1 point) Claim sizes follow an Inverse Burr Distribution with parameters τ = 3, θ = 100, and γ = 4. Determine F(183). A. less than 60% B. at least 60% but less than 65% C. at least 65% but less than 70% D. at least 70% but less than 75% E. at least 75% 26.7 (1 point) What is the second moment of an Inverse Transformed Gamma Distribution with parameters α = 5, θ = 15, τ = 2? A. less than 20 B. at least 20 but less than 30 C. at least 30 but less than 40 D. at least 40 but less than 50 E. at least 50 26.8 (1 point) Match the Distributions. 1. Inverse Transformed Gamma with α = 1.
a. LogLogistic
2. Generalized Pareto with α = 1.
b. Inverse Weibull
3. Inverse Burr with τ = 1.
c. Inverse Pareto
A. 1a, 2b, 3c
B. 1a, 2c, 3b
C. 1b, 2a, 3c
D. 1b, 2c, 3a
E. 1c, 2b, 3a
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 346
26.9 (1 point) What is the mode of a Transformed Gamma distribution, with α = 4.85, θ = 813, and τ = 0.301? A. less than 2000 B. at least 2000 but less than 2500 C. at least 2500 but less than 3000 D. at least 3000 but less than 3500 E. at least 3500 26.10 (1 point) What is the mode of a Generalized Pareto distribution, with α = 2.5, θ = 100, and τ = 1.3? 26.11 (1 point) What is the mode of an Inverse Burr distribution, with τ = 0.8, θ = 1000, and γ = 1.5? 26.12 (1 point) What is the mode of an Inverse Transformed Gamma distribution, with α = 3.7, θ = 200, and τ = 0.6? 26.13 (1 point) What is the median of a Burr distribution, with α = 1.85, θ = 273,700, and γ = 0.97? A. less than 115,000 B. at least 115,000 but less than 120,000 C. at least 120,000 but less than 125,000 D. at least 125,000 but less than 130,000 E. at least 130,000 26.14 (2 points) X follows a Burr Distribution with parameters α = 6, θ = 20, and γ = 0.5. Let X be the average of a random sample of size 200. Use the Central Limit Theorem to find c such that Prob( X ≤ c) = 0.90. A. 2.4 B. 2.5 C. 2.6 D. 2.7 E. 2.8 26.15 (4B, 5/97, Q.24) (2 points) The random variable X has the density function f(x) = 4x/(1+x2 )3 , 0 < x < ∞ . Determine the mode of X. A. 0 B. Greater than 0, but less than 0.25 C. At least 0.25, but less than 0.50 D. At least 0.50, but less than 0.75 E. At least 0.75
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 347
Solutions to Problems: 26.1. A. The Generalized Pareto Distribution is a generalization of the Pareto Distribution, with three rather than two parameters. Comment: Questions canʼt get any easier than this! For τ = 1, the Generalized Pareto is a Pareto. For α = 1, the Generalized Pareto is an Inverse Pareto. 26.2. D. This is a Transformed Gamma distribution, with α = 5, θ = 10, and τ = 2. The integral is the second moment, which for the Transformed Gamma is: θ2Γ(α + 2/τ) / Γ(α) = 100 Γ(5 + 1) / Γ(5) = (100)(5) = 500. 26.3. D. 1. Transformed Gamma with α =1 is a Weibull. 2. Transformed Gamma with τ = 1 is a Gamma. 3. Burr with γ = 1 is a Pareto. 26.4. B. S(7) = 1 - F(7) = (1/(1+(7/32)2 ))3 = .8692. The smallest of these 11 claims is greater than 7, if and only if all the 11 claims are greater than 7. The chance of this is: S(7)11 = 0.869211 = 21.4%. Comment: This is an example of an order statistic. 26.5. B. The Generalized Pareto Distribution has for α > n, moments: θn Γ(α − n)Γ(τ + n)/Γ(α)Γ(τ). The first moment is: θΓ(α − 1)Γ(τ + 1)/Γ(α)Γ(τ) = θ τ /(α-1). The second moment is: θ2Γ(α − 2)Γ(τ + 2)/Γ(α)Γ(τ) = θ2τ(τ+1) /{(α-1)(α-2)}. Thus the variance is: θ2τ(τ+1) /{(α-1)(α-2)} - {θτ /(α-1)}2 = {θ2τ/{(α-1)2 (α-2)}}{(τ+1)(α-1) - τ(α-2)} = θ2 τ (α+τ-1) /{(α-1)2 (α-2)}. Comment: For τ =1, this is a Pareto, with variance: θ2α / {(α−2)(α−1)2}, for α > 2. 26.6. E. F(x) = (1 + (θ/x)γ)−τ . F(183) = (1 + (.5464)4 )-3 = 0.774. 26.7. E. E[X2 ] = θ2 Γ(α - 2/τ) / Γ(α) = 225 Γ(4) / Γ(5) = 225/4 = 56.25. 26.8. D. 1. b, 2.c, 3. a.
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 348
26.9. D. For τα = 1.46 > 1, the mode = θ{(ατ −1)/τ}1/τ = (813)(.45985/.301)1/0.301 = 3323. Comment: f(x) = τ(x/θ)τα exp(-(x/θ)τ) / {x Γ(α)}. ln f(x) = ln(τ) + (τα - 1)ln(x) - τα ln(θ) - (x/θ)τ -ln(Γ(α)). The mode is where the density is largest, and therefore where ln f(x) is largest. d ln f(x) / dx = (τα - 1)/x - τxτ−1/θτ. Setting d ln f(x) / dx = 0, (τα - 1)/x = τxτ−1/θτ . Therefore, x = θ{(τα −1)/τ}1/τ = (813){1.528)3.322 = 3323. 26.10. For τ > 1, mode = θ(τ - 1)/(α+1) = (100)(0.3)/3.5 = 8.57. Comment: The formula for the mode is shown in Appendix A attached to the exam. 26.11. For τγ > 1, mode = θ {(τγ - 1) / (γ+1)}1/γ = (1000)(0.2/2.5)1/1.5 = 185.7. 26.12. Mode = θ {τ / (ατ + 1)}1/τ = (200)(0.6/3.22)1/0.6 = 12.16. 26.13. C. F(x) = 1 - (1/(1+(x/θ)γ))α. Therefore, .5 = 1 - (1/(1+(x/273700).97))1.85. 1.455 = 1 + x.97 /188,000.
Therefore, x = 121.5 thousand.
26.14. E. E[X] = θ Γ[1 + 1/γ] Γ[α - 1/γ] / Γ[α] = (20) Γ[3] Γ[4] / Γ[6] = (20) (2!) (3!) / (5!) = 2. E[X2 ] = θ2 Γ[1 + 2/γ] Γ[α - 2/γ] / Γ[α] = (202 ) Γ[5] Γ[2] / Γ[6] = (202 ) (4!) (1!) / (5!) = 80. Var[X] = 80 - 22 = 76. Var[ X ] = 76/200 = 0.38. E[ X ] = E[X] = 2. Using the Normal Approximation, the 90th percentile is at: 2 + 1.282 0.38 = 2.790.
2013-4-2,
Loss Distributions, §26 Three Parameter Distribs.
HCM 10/8/12,
Page 349
26.15. C. The mode is where the density is at a maximum. We check the end points, but f(0) = 0 and f(∞) = 0, so the maximum is in between where f′(x) = 0. f′(x) = {4(1+x2 )3 - (4x)(6x)(1+x2 )2 }/(1+x2 )6 . Setting the derivative equal to zero, 4(1+x2 )3 - (4x)(6x)(1+x2 )2 = 0. Therefore 4(1+x2 ) = 24x2 . x2 = 1/5. x = 1/ 5 = 0.447. Comment: One can confirm that the mode is approximately .45 numerically: x
f(x)
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6
0.000 0.199 0.388 0.561 0.711 0.834 0.927 0.990 1.025 1.035 1.024 0.996 0.954
This is a Burr Distribution with α = 2, θ = 1, and γ = 2. As shown in Appendix A of Loss Models, the mode is for γ > 1: θ {(γ-1) / (αγ +1)}1/γ = (1/5)1/2.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 350
Section 27, Beta Function and Distribution The quantity xa-1(1-x)b-1 for a > 0, b > 0, has a finite integral from 0 to 1. This integral is called the (complete) Beta Function. The value of this integral clearly depends on the choices of the (a - 1)! (b- 1)! Γ(a) Γ(b) parameters a and b.153 This integral is: = . (a + b - 1)! Γ(a + b) The Complete Beta Function is a combination of three Complete Gamma Functions: β[a,b] =
1
∫ xa - 1 (1- x)b - 1 dx =
0
(a - 1)! (b- 1)! Γ(a) Γ(b) = . (a + b - 1)! Γ(a + b)
Note that β(a, b) = β(b, a). Exercise: What is the integral from zero to 1 of x3 (1-x)6 ? [Solution: β(4, 7) = Γ(4)Γ(7) / Γ(4+7) = 3! 6! / 10! = 1/840 = 0.001190.] One can turn the complete Beta Function into a distribution on the interval [0,1] in a manner similar to how the Gamma Distribution was created from the (complete) Gamma Function on [0,∞]. Let: x
F(x) = {(a+b-1)! / (a-1)! (b-1)!}
∫
x
ta-1(1-t)b-1 dt = {Γ(a+b) / Γ(a)Γ(b)}
0
∫
ta-1(1-t)b-1 dt.
0
x
F(x) =
∫ ta - 1 (1- t)b - 1 dt / β[a, b] ≡ β(a , b; x).
0
f(x) = {(a+b-1)! / (a-1)! (b-1)!} xa-1(1-x)b-1, 0≤ x ≤1. Then the distribution function is zero at x = 0 and one at x = 1. The latter follows from the fact that we have divided by the value of the integral from 0 to 1 of ta-1(1-t)b-1. This corresponds to the form of the incomplete Beta function shown in the Appendix A of Loss Models. F(x) = β(a, b; x), 0 ≤ x ≤ 1.154 This two parameter distribution is a special case of what Loss Models calls the Beta distribution, for θ = 1; the Beta Distribution in Loss Models is F(x) = β(a, b; x/θ), 0 ≤ x ≤ θ.
153
The results have been tabulated and this function is widely used in many applications. See for example the Handbook of Mathematical Functions, by Abramowitz, et. al. 154 This distribution is sometimes called a Beta Distribution of the first kind, or a Pearson Type I Distribution.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 351
The following relationship, is sometimes useful: β(a, b; x) = 1 - β(b, a; 1 - x).
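A hedged numerical sketch (mine) of the complete and incomplete Beta functions discussed above, using scipy, where beta(a, b) is the complete Beta function and betainc(a, b, x) is β(a, b; x):

```python
from scipy.special import beta, betainc

print(beta(4, 7))                                 # 3! 6!/10! = 1/840, as in the exercise above
print(betainc(4, 7, 0.3) + betainc(7, 4, 0.7))    # beta(a, b; x) + beta(b, a; 1-x) = 1
```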
β(a,b;x) has mean:
a a (a +1) ab , second moment: , and variance: . 2 a + b (a + b) (a + b + 1) (a + b) (a+ b +1)
The mean is between zero and one; for b < a the mean is greater than 0.5. For a fixed ratio of a/b, the mean is constant and for a and b large β(a,b; x) approaches a Normal Distribution. As a or b get larger the variance decreases. For either a or b extremely large, virtually all the probability is concentrated at the mean. Here are various Beta Distributions (with θ = 1):
5
a = 1 and b = 5
a = 2 and b = 4 2.0
4 1.5 3 1.0
2
0.5
1 0.2
0.4
0.6
0.8
1.0
0.2
0.4
0.6
0.8
1.0
a = 5 and b = 1
a = 4 and b = 2
5
2.0
4 1.5
3
1.0
2
0.5
1 0.2
0.4
0.6
0.8
1.0
0.2
0.4
0.6
0.8
1.0
For a > b the Beta Distribution is skewed to the left. For a < b it is skewed to the right. For a = b it is symmetric. For a ≤ 1, the Mode = 0. For b ≤ 1, the Mode = 1. β(a,b;x), the Beta distribution for θ = 1 is closely connected to the Binomial Distribution. The Binomial parameter q varies from zero to one, the same domain as the Incomplete Beta Function.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 352
The Beta density is proportional to the chance of success to the power a-1, times the chance of failure to the power b-1. The constant in front of the Beta density is (a+b-1) times the binomial coefficient for (a+b-2) and a-1. The Incomplete Beta Function is a conjugate prior distribution for the Binomial. The Incomplete Beta Function for integer parameters can be used to compute the sum of terms from the Binomial Distribution.155 Summary of Beta Distribution: Support: 0 ≤ x ≤ θ
Parameters: a > 0 (shape parameter), b > 0 (shape parameter) θ > 0 (similar to a scale parameter, determines the support)
(a + b - 1)! x/ θ a - 1 F(x) = β(a, b ; x/θ) ≡ ∫ t (1- t)b - 1 dt . (a - 1)! (b- 1)! 0 f(x) = =
1 Γ(a + b) (x/θ)a (1 - x/θ)b-1 / x = (x/θ)a (1 - x/θ)b-1 / x β(a, b) Γ(a) Γ(b) (a + b - 1)! (x/θ)a-1 (1 - x/θ)b-1 / θ, 0 ≤ x ≤ θ. (a - 1)! (b- 1)!
For a = 1, b = 1, the Beta Distribution is the uniform distribution from [0, θ]. Γ(a + b) Γ(a + n) (a + b - 1)! (a + n - 1)! E[Xn] = θn = θn Γ(a + b + n) Γ(a) (a + b + n - 1)! (a - 1)! = θn
Mean = θ
a(a + 1) ... (a + n - 1) . (a + b)(a + b + 1) ... (a + b + n - 1) a a+b
E[X2 ] = θ2
a (a +1) (a + b) (a + b + 1)
Variance = θ2
ab (a+ b +1)
(a + b)2
Coefficient of Variation = Standard Deviation / Mean = (b / {a (a+b+1)}).5 Skewness = 2 (b - a)
Mode = θ
a - 1 a + b - 2
a + b + 1 (a + b + 2) a b for a > 1 and b > 1
Limited Expected Value = E[X ∧ x] = θ{a/(a+b)}β(a+1, b; x/θ) + x(1-β(b, a; x/θ)). 155
See ”Mahlerʼs Guide to Frequency Distributions.” On the exam you should either compute the sum of binomial terms directly or via the Normal Approximation. Note that the use of the Beta Distribution is an exact result, not an approximation. See for example the Handbook of Mathematical Functions, by Abramowitz, et. al.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 353
Beta Distribution for a = 3, b = 3, and θ = 1: 1.75 1.5 1.25 1 0.75 0.5 0.25 0.2
0.4
0.6
0.8
1
Since the density of the Beta Distribution integrates to one over its support: θ
∫
(t / θ)a - 1 (1 - t / θ)b - 1 dt = θ β(a, b). This is sometimes called a Beta type integral.
0
Uniform Distribution: The Uniform Distribution from 0 to θ is a Beta Distribution with a = 1 and b = 1. Specifically, DeMoivreʼs Law is a Beta Distribution with a = 1, b = 1, and θ = ω. The future lifetime of a life aged x under DeMoivreʼs Law is a Beta Distribution with a = 1, b = 1, and θ = ω - x. Modified DeMoivreʼs Law: The Modified DeMoivreʼs Law has S(x) = (1 - x/ω)α, 0 ≤ x ≤ ω. α = 1 is DeMoivreʼs Law. The Modified DeMoivreʼs Law is a Beta Distribution with a = 1, b = α, and θ = ω. The future lifetime of a life aged x under the Modified DeMoivreʼs Law is a Beta Distribution with a = 1, b = α, and θ = ω - x. Generalized Beta Distribution: Appendix A of Loss Models also shows the Generalized Beta Distribution, which is obtained from the Beta Distribution via a power transformation. F(x) = β(a, b ; (x/θ)τ). For τ = 1, the Generalized Beta Distribution reduces to the Beta Distribution.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 354
Studentʼs-t Distribution: The Studentʼs-t Distribution from Statistics can be written in terms of the Incomplete Beta Function. If U is a Unit Normal variable and χ2 follows a chi-square distribution with ν degrees of freedom, then U/
χ2 / ν follows a Students-t distribution, ν degrees of freedom.156
For parameter ν, the density is: f(x) = {Γ((ν+1)/2)/ (Γ(ν/2)
πν )} (1+x2 /ν)-(ν+1)/2 = (1+x2 /ν)-(ν+1)/2 / β(ν/2,1/2), -∞ < x < ∞.
The Distribution Function is:157 F(x) = β[ν/2, 1/2; ν/(ν+x2 )] /2 for x ≤ 0, F(x) = 1 - β[ν/2, 1/2; ν/(ν+x2 )] /2 for x ≥ 0. F-Distribution: The F-Distribution (variance ratio distribution) from Statistics can be written in terms of the Incomplete Beta Function.158 If χ12 follows a chi-square distribution with ν1 degrees of freedom and χ 22 follows a chi-square distribution with ν2 degrees, then
(χ12/ν1 ) / (χ22/ν2 ) follows an F-distribution, with ν1 and ν2 degrees of freedom.159 The Distribution Function is:160 F(x) = β[ν1 /2, ν2 /2; ν1 x / (ν2 +ν1 x)] = 1- β[ν2 /2, ν1 /2; ν2 /(ν2 +ν1 x)], x > 0. Conversely one can get the Incomplete Beta Function from the F-Distribution, provided 2a and 2b are integral: β(a,b; x) = F[bx/ a(1-x)], where F is the F-Distribution with 2a and 2b degrees of freedom.
A chi-square distribution with ν degrees of freedom is the sum of ν squares of independent Unit Normals. See Section 26.7 of the Handbook of Mathematical Functions by Abramowitz, et. al. 158 The F-Distribution is a form of what is sometimes called a Beta Distribution of the Second Kind. 159 For example, χ1 2 could be the estimated variance from a sample of ν1 drawn from a Normal Distribution and χ2 2 156 157
could be the estimated variance from an independent sample of ν2 drawn from a Normal Distribution with the same variance as the first. The variance-ratio test uses the F-Distribution to test the hypothesis that the two Normal Distributions have the same variance. See the syllabus of CAS Exam 3L. 160 See Section 26.6 of the Handbook of Mathematical Functions by Abramowitz, et. al.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 355
Relation of the Beta to the Gamma Distribution: The Complete Beta Function is a combination of three Complete Gamma Functions: 1
β[a,b] =
(b- 1)! Γ(a) Γ(b) = . ∫ xa - 1 (1- x)b - 1 dx = (a(a- 1)! + b - 1)! Γ(a + b)
0
In addition, there are other connections between the Gamma and Beta. If X is a random draw from a Gamma Distribution with shape parameter α and scale parameter θ, and Y is a random draw from a Gamma Distribution with shape parameter β and same scale parameter θ, then Z = X / (X+Y) is a random draw from a Beta Distribution with parameters α, β, and scale parameter 1. In addition, the Gamma Distribution can be obtained as the limit of an appropriately chosen sequence of Beta Distributions. In order to demonstrate this relationship weʼll use Sterlingʼs formula for the Gamma of large arguments as well as the fact that the limit as z → ∞ of (1 + c/z)z is ec.161 Let β(a,b; x) be a Beta Distribution. Let y be a chosen constant. Let x go to zero such that at the same time the second parameter b of the Beta Distribution goes to infinity while the relationship b = 1 + y/x holds. Then the integral that enters into the definition of β(a,b; x) is: x
x
∫ ta-1(1-t)b-1 dt = ∫ ta-1(1-t)y/x dt 0
0
Change variables by taking s = ty /x, then since t = sx/y, the above integral is: y
y
y
∫(sx/y)a-1(1-sx/y)y/x (x/y)ds = (x/y)a ∫sa-1{1-s/(y/x)}y/x ds = (x/y)a ∫sa-1e-s ds 0
0
0
Since for small x , y/x is large and therefore {1 - s/(y/x)}y/x ≅ e-s Meanwhile, the constant in front of the integral that enters into the definition of β(a,b; x) is:
Γ(a+b) / { Γ(a)Γ(b)} = Γ(a){ Γ(a+ 1+ y/x) / Γ(1 + y/x)}. 161
The latter fact follows from taking the limit of ln{(1 + c/z)z} = z ln(1 + c/z) ≅ z (c/z - (c/z)2 /2 ) = c - c/ 2z2 ≅ c.
2013-4-2,
Loss Distributions, §27 Beta Function
HCM 10/8/12,
Page 356
For very small x the argument of the Complete Gamma Function is very large. Stirlingʼs Formula says for large z: Γ(z) ≅ e^(-z) z^(z-0.5) √(2π). Thus for small x:
Γ(a + 1 + y/x) / Γ(1 + y/x) ≅ e^-(a+1+y/x) (a + 1 + y/x)^(a+0.5+y/x) √(2π) / {e^-(1+y/x) (1 + y/x)^(0.5+y/x) √(2π)}
= e^(-a) (a + 1 + y/x)^a {(a + 1 + y/x) / (1 + y/x)}^(0.5+y/x) ≅ e^(-a) (y/x)^a {1 + ax/y}^(y/x) ≅ e^(-a) (y/x)^a e^a = (y/x)^a.
Thus for large b, Γ(a+b) / {Γ(a)Γ(b)} ≅ (y/x)^a / Γ(a).
Putting the two pieces of the Incomplete Beta Function together:
β(a,b; x) = Γ(a+b) / {Γ(a)Γ(b)} ∫_0^x t^(a-1) (1-t)^(b-1) dt ≅ {(y/x)^a / Γ(a)} (x/y)^a ∫_0^y s^(a-1) e^(-s) ds = {1 / Γ(a)} ∫_0^y s^(a-1) e^(-s) ds = Γ(a; y).
Thus the Incomplete Gamma Function Γ(a; y) has been obtained as a limit of an appropriate sequence of Incomplete Beta Functions β(a, b; x), with b = 1 + y/x, as x goes to zero.
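The limit just derived can also be checked numerically. The following is a minimal sketch (not part of the Study Guide) using scipy's regularized incomplete beta and gamma functions; a = 2.5 and y = 1.7 are arbitrary choices.

```python
# Numerical check that beta(a, b; x) -> Gamma(a; y) as x -> 0, with b = 1 + y/x.
from scipy.special import betainc, gammainc

a, y = 2.5, 1.7
for x in [0.1, 0.01, 0.001]:
    b = 1 + y / x
    print(x, betainc(a, b, x))        # approaches the value printed below
print("Gamma(a; y) =", gammainc(a, y))
```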
Problems:
Use the following information for the next four questions:
X follows a Beta Distribution with a = 3, b = 8, and θ = 1.
27.1 (1 point) What is the density function at x = 0.6?
A. 0.05  B. 0.1  C. 0.2  D. 0.3  E. 0.4
27.2 (1 point) What is the mode?
A. less than 0.20  B. at least 0.20 but less than 0.25  C. at least 0.25 but less than 0.30  D. at least 0.30 but less than 0.35  E. at least 0.35
27.3 (1 point) What is the mean?
A. less than 0.20  B. at least 0.20 but less than 0.25  C. at least 0.25 but less than 0.30  D. at least 0.30 but less than 0.35  E. at least 0.35
27.4 (1 point) What is the variance?
A. less than 0.02  B. at least 0.02 but less than 0.03  C. at least 0.03 but less than 0.04  D. at least 0.04 but less than 0.05  E. at least 0.05
27.5 (2 points) For a Beta Distribution with parameters a = 0.5, b = 0.5, and θ = 1, what is the mode?
A. 1/4  B. 1/3  C. 1/5  D. 3/4  E. None of A, B, C, or D
27.6 (2 points) For a Generalized Beta Distribution with τ = 2, determine the second moment.
A. θ² a / (a + b)
B. θ² b / (a + b)
C. θ² ab / {(a + b) (a + b + 1)}
D. θ² a (a + 1) / {(a + b) (a + b + 1)}
E. θ² b (b + 1) / {(a + b) (a + b + 1)}
Use the following information for the next 2 questions:
• While partially disabled, an injured workerʼs “lost earnings capacity” is defined based on the wage he could now earn as: (preinjury wage - postinjury wage) / (preinjury wage) = lost wages / preinjury wage.
• Assume an injured workerʼs “lost earnings capacity” is distributed via a Beta Distribution with parameters a = 3, b = 2, and θ = 1.
• While partially disabled, Workersʼ Compensation weekly benefits are 70% times the workerʼs lost wages, limited to 56% of the workerʼs preinjury wage.
27.7 (1 point) What is the average lost earnings capacity of injured workers?
A. less than 0.53  B. at least 0.53 but less than 0.56  C. at least 0.56 but less than 0.59  D. at least 0.59 but less than 0.62  E. at least 0.62
27.8 (3 points) Where β(a,b; x) is the Incomplete Beta Function as defined in Appendix A of Loss Models, what is the average ratio of weekly partial disability benefits to preinjury weekly wage?
Hint: Use the formula for E[X ∧ x] for the Beta Distribution, in Appendix A of Loss Models.
A. 0.42β(4, 2; 0.2) + 0.56β(2, 3; 0.8)
B. 0.56β(4, 2; 0.8) + 0.42β(2, 3; 0.2)
C. 0.42β(4, 2; 0.8) + 0.56β(2, 3; 0.2)
D. 0.56β(4, 2; 0.2) + 0.42β(2, 3; 0.8)
E. None of the above.
27.9 (2 points) On an exam, the grades of students are distributed via a Beta Distribution with a = 4, b = 1, and θ = 100. A grade of 70 or more passes. What is the average grade of a student who passes this exam? A. 82 B. 84 C. 86 D. 88 E. 90
27.10 (2, 5/83, Q.12) (1.5 points) Let X have the density function f(x) = Γ(α+β) x^(α-1) (1-x)^(β-1) / {Γ(α) Γ(β)}, for 0 < x < 1, where α > 0 and β > 0. If β = 6 and α = 5, what is the expected value of (1 - X)^(-4)?
A. 42  B. 63  C. 210  D. 252  E. 315
Solutions to Problems:
27.1. C. f(x) = {(a+b-1)! / ((a-1)! (b-1)!)} (x/θ)^(a-1) {1 - (x/θ)}^(b-1) / θ = {10! / (2! 7!)} (0.6²) (0.4⁷) = 0.212.
27.2. B. f(x) = {10! / (2! 7!)} x² (1-x)⁷ = 360 x² (1-x)⁷. fʼ(x) = 720 x (1-x)⁷ - 2520 x² (1-x)⁶.
Setting fʼ(x) = 0: 720(1 - x) = 2520x. ⇒ x = 0.222.
Comment: For a > 1 and b > 1, the mode is: θ(a - 1) / (a + b - 2) = (1)(3 - 1) / (3 + 8 - 2) = 2/9.
[Graph of the density of this Beta Distribution omitted.]
27.3. C. Mean = θa/(a+b) = 3/11 = 0.2727.
27.4. A. Second moment = θ² a(a + 1) / {(a + b)(a + b + 1)} = (3)(4) / {(11)(12)} = 0.09091. Variance = 0.09091 - 0.2727² = 0.0165.
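For those who like to check answers by machine, here is a minimal sketch (not part of the Study Guide) confirming Solutions 27.1 through 27.4 with scipy:

```python
# Beta Distribution with a = 3, b = 8, theta = 1.
from scipy.stats import beta

a, b = 3, 8
print(beta.pdf(0.6, a, b))        # 27.1: about 0.212
print((a - 1) / (a + b - 2))      # 27.2: mode = 2/9, about 0.222
print(beta.mean(a, b))            # 27.3: 3/11, about 0.2727
print(beta.var(a, b))             # 27.4: about 0.0165
```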
27.5. E. f(x) is proportional to: x^(-1/2) (1 - x)^(-1/2).
fʼ(x) is proportional to: (-1/2) x^(-3/2) (1 - x)^(-1/2) + (1/2) x^(-1/2) (1 - x)^(-3/2).
Setting this equal to zero and solving, x = 1/2. At x = 1/2, x^(-1/2) (1 - x)^(-1/2) = 2.
As x → 0, x^(-1/2) (1 - x)^(-1/2) → ∞. As x → 1, x^(-1/2) (1 - x)^(-1/2) → ∞.
Thus, checking the endpoints of the support, the modes are at 0 and 1.
Comment: 1/2 is a minimum for this density, f(x) = {Γ(1/2 + 1/2) / (Γ(1/2) Γ(1/2))} x^(-1/2) (1 - x)^(-1/2) = x^(-1/2) (1 - x)^(-1/2) / π, 0 < x < 1. [Graph of this density omitted.]
27.6. A. E[X^k] = θ^k Γ[a + b] Γ[a + k/τ] / {Γ[a] Γ[a + b + k/τ]}.
E[X²] = θ² Γ[a + b] Γ[a + 2/2] / {Γ[a] Γ[a + b + 2/2]} = θ² Γ[a + b] Γ[a + 1] / {Γ[a] Γ[a + b + 1]} = θ² a / (a + b).
27.7. D. The mean of a Beta Distribution is θa/(a+b) = 3/(3+2) = 0.6.
27.8. C. Let y be the injured workerʼs lost earnings capacity. Let w be the workerʼs pre-injury wage. Then the lost wages are yw. Thus the benefits are 0.7yw, limited to 0.56w. Thus the ratio of weekly partial disability benefits to pre-injury weekly wage, r, is: 0.7y limited to 0.56. For y ≥ 0.56/0.7 = 0.8, r = 0.56; for y ≤ 0.8, r = 0.7y. To get the average r, we need to integrate with respect to f(y) dy for y = 0 to 1, dividing into two pieces depending on whether y is less than or greater than 0.8:
∫_0^0.8 (0.7y) f(y) dy + ∫_0.8^1 (0.56) f(y) dy = 0.7 {∫_0^0.8 y f(y) dy + 0.8 S(0.8)} = 0.7 E[Y ∧ 0.8].
Thus the average ratio r has been written in terms of the Limited Expected Value; for the Beta Distribution, E[X ∧ x] = θ{a/(a+b)} β(a+1, b; x/θ) + x{1 - β(a, b; x/θ)}. Thus for θ = 1, a = 3 and b = 2:
0.7 E[X ∧ 0.8] = 0.7 {(3/(3+2)) β(4, 2; 0.8) + 0.8 (1 - β(3, 2; 0.8))} = 0.42 β(4, 2; 0.8) + 0.56 β(2, 3; 0.2).
Comments: 0.42 β(4, 2; 0.8) + 0.56 β(2, 3; 0.2) = (0.42)(0.7373) + (0.56)(0.1808) = 0.411. Since β(a, b; 1-x) = 1 - β(b, a; x), the solution could also be written as: 0.42 β(4, 2; 0.8) + 0.56 {1 - β(3, 2; 0.8)}.
27.9. D. f(x) = x³ / 25,000,000.
S(70) = ∫_70^100 x³ dx / 25,000,000 = 0.7599. ∫_70^100 x f(x) dx = ∫_70^100 x⁴ dx / 25,000,000 = 66.5544.
The average grade of a student who passes this exam is: 66.5544 / 0.7599 = 87.58. Comment: The average grade of all students is: (100)(4)/(4 + 1) = 80.
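Here is a minimal sketch (not part of the Study Guide) confirming Solutions 27.8 and 27.9 by direct numerical integration with scipy:

```python
from scipy.integrate import quad
from scipy.stats import beta
from scipy.special import betainc

# 27.8: E[min(0.7Y, 0.56)] for Y ~ Beta(3, 2), versus the closed form in answer C.
direct = quad(lambda y: min(0.7 * y, 0.56) * beta.pdf(y, 3, 2), 0, 1)[0]
closed = 0.42 * betainc(4, 2, 0.8) + 0.56 * betainc(2, 3, 0.2)
print(direct, closed)                               # both about 0.411

# 27.9: average grade above 70 for grades with density x^3 / 25,000,000 on (0, 100).
f = lambda x: x**3 / 25_000_000
s70 = quad(f, 70, 100)[0]
print(quad(lambda x: x * f(x), 70, 100)[0] / s70)   # about 87.6
```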
27.10. A. E[(1 - X)^(-4)] = {Γ(11) / (Γ(5) Γ(6))} ∫_0^1 x⁴ (1 - x)⁵ (1 - x)^(-4) dx = {Γ(11) / (Γ(5) Γ(6))} ∫_0^1 x⁴ (1 - x) dx = {Γ(11) / (Γ(5) Γ(6))} {Γ(5) Γ(2) / Γ(7)} = (10)(9)(8)(7) / {(5)(4)(3)(2)} = 42.
Alternately, let y = 1 - x. Then y has a Beta Distribution with a = 6, b = 5, and θ = 1. E[(1 - X)^(-4)] = E[Y^(-4)] = Γ(6 + 5) Γ(6 - 4) / {Γ(6) Γ(6 + 5 - 4)} = Γ(11) Γ(2) / {Γ(6) Γ(7)} = (10!)(1!) / {(5!)(6!)} = (10)(9)(8)(7) / {(5)(4)(3)(2)} = 42.
Comment: X has a Beta Distribution with a = 5, b = 6, and θ = 1.
Section 28, Transformed Beta Distribution
You are extremely unlikely to be asked questions on your exam involving the 4 parameter Transformed Beta Distribution.
Define the Transformed Beta Distribution on (0, ∞) in terms of the Incomplete Beta Function:
F(x) = β[τ, α; x^γ / (x^γ + θ^γ)] = 1 - β[α, τ; θ^γ / (x^γ + θ^γ)].
f(x) = {Γ(α+τ) / (Γ(τ) Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ).
As shown on page 70 and in Appendix A of Loss Models, the Pareto, Generalized Pareto, Burr, LogLogistic, and other distributions are special cases of the Transformed Beta Distribution. In this parameterization, θ acts as a scale parameter, since everywhere x appears in F(x) it is divided by θ. γ is a power transformation parameter as in the LogLogistic and Burr Distributions. α is a shape parameter as in the Pareto. τ is another shape parameter as in the Generalized Pareto or Inverse Pareto. With four parameters the Transformed Beta Distribution has a lot of flexibility to fit different data sets.
For 0 < x < ∞, x^γ / (x^γ + θ^γ) is between 0 and 1, the domain of the Incomplete Beta Function. Thus this transformation allows one to get a size of loss distribution from the Incomplete Beta Function, which only has a domain from zero to one.
The moments of the Transformed Beta Distribution are: E[X^n] = θ^n Γ(α - n/γ) Γ(τ + n/γ) / {Γ(α) Γ(τ)}, for -τγ < n < αγ. Only some moments exist; if αγ > 1 the mean exists, if αγ > 2 the second moment exists, etc.
Mean = θ Γ(α - 1/γ) Γ(τ + 1/γ) / {Γ(α) Γ(τ)}, αγ > 1.
Provided αγ > 1, the mean excess loss exists and increases to infinity approximately linearly; for large x, e(x) ≅ x / (αγ - 1). This tail behavior carries over to special cases such as: the Burr, Generalized Pareto, Pareto, LogLogistic, ParaLogistic, Inverse Burr, Inverse Pareto, and Inverse ParaLogistic. All have mean excess losses, when they exist, that increase approximately linearly for large x.162
162 The mean excess loss of a Pareto, when it exists, is linear in x: e(x) = (x + θ) / (α - 1).
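The moment formula above can be confirmed by integrating the density directly. Below is a minimal sketch (not part of the Study Guide); the parameter values are arbitrary choices with αγ > n.

```python
# Integrate the Transformed Beta density numerically and compare to the moment formula
# E[X^n] = theta^n Gamma(alpha - n/gamma) Gamma(tau + n/gamma) / {Gamma(alpha) Gamma(tau)}.
import numpy as np
from scipy.special import gamma as G
from scipy.integrate import quad

alpha, theta, g, tau, n = 3.0, 10.0, 2.0, 1.5, 1

def f(x):
    return (G(alpha + tau) / (G(tau) * G(alpha))) * g * theta**(-g * tau) \
           * x**(g * tau - 1) * (1 + (x / theta)**g)**(-(alpha + tau))

moment_numeric = quad(lambda x: x**n * f(x), 0, np.inf)[0]
moment_formula = theta**n * G(alpha - n / g) * G(tau + n / g) / (G(alpha) * G(tau))
print(moment_numeric, moment_formula)   # the two values agree
```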
Special Cases of the Transformed Beta:163
By setting any one of the three shape parameters equal to one,164 the Transformed Beta becomes a three parameter distribution.
For τ = 1, the Transformed Beta is a Burr. For γ = 1, the Transformed Beta is a Generalized Pareto. For α = 1, the Transformed Beta is an Inverse Burr.
In turn, one can fix one of the remaining shape parameters in one of these three parameter distributions and obtain a two parameter distribution, each of which is also a special case of the Transformed Beta Distribution.
For γ = 1, the Burr is a Pareto. For α = 1, the Burr is a LogLogistic. For α = γ, the Burr is a ParaLogistic.
For τ = 1, the Generalized Pareto is a Pareto. For α = 1, the Generalized Pareto is an Inverse Pareto.
For γ = 1, the Inverse Burr is an Inverse Pareto. For τ = 1, the Inverse Burr is a LogLogistic. For τ = γ, the Inverse Burr is an Inverse ParaLogistic.
Limiting Cases of the Transformed Beta Distribution:165
By taking appropriate limits one can obtain additional distributions from the Transformed Beta Distribution. Examples include the Transformed Gamma, Inverse Transformed Gamma and the LogNormal Distributions. The first example is that the Transformed Gamma Distribution is a limiting case of the Transformed Beta. In order to demonstrate this relationship weʼll use Stirlingʼs formula for the Gamma of large arguments, which says that for large z: Γ(z) ≅ e^(-z) z^(z-0.5) √(2π).
163 See Figure 5.2 in Loss Models.
164 One can fix any of the parameters at any positive value, but for example the three parameter distribution that results from fixing α = 2 does not have a name, because it does not come up as often in applications. Fixing the scale parameter is much less common for practical applications than fixing one or more shape parameters.
165 See Figure 5.4 in Loss Models.
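One of the special cases listed above is easy to confirm numerically: a Burr with γ = 1 is a Pareto. Below is a minimal sketch (not part of the Study Guide) using scipy, whose burr12 corresponds to the Loss Models Burr with c = γ and d = α, and whose lomax corresponds to the Loss Models Pareto; α = 3 and θ = 200 are arbitrary choices.

```python
import numpy as np
from scipy.stats import burr12, lomax

alpha, theta = 3.0, 200.0
x = np.array([50.0, 100.0, 500.0, 1000.0])
print(burr12.pdf(x, 1, alpha, scale=theta))   # Burr with gamma = 1 (shape args c = 1, d = alpha)
print(lomax.pdf(x, alpha, scale=theta))       # Pareto with the same alpha and theta
```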
Weʼll also use the fact that the limit as z → ∞ of (1 + c/z)^z is e^c. The latter fact follows from taking the limit of ln{(1 + c/z)^z} = z ln(1 + c/z) ≅ z {c/z - (c/z)²/2} = c - c²/(2z) ≅ c.
Exercise: Use Stirlingʼs Formula to approximate Γ(a+b) / {Γ(a)Γ(b)}, for very large a.
[Solution: Stirlingʼs Formula says for large z: Γ(z) ≅ e^(-z) z^(z-0.5) √(2π). Thus for large a:
Γ(a+b) / Γ(a) ≅ e^-(a+b) (a+b)^(a+b-0.5) √(2π) / {e^(-a) a^(a-0.5) √(2π)} = e^(-b) (1 + b/a)^a (a+b)^b √{a/(a+b)} ≅ e^(-b) e^b a^b = a^b.
Thus for large a, Γ(a+b) / {Γ(a)Γ(b)} ≅ a^b / Γ(b).]
A Transformed Beta Distribution with parameters α, θ, γ and τ has density f(x) = {Γ(α+τ) / (Γ(τ)Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ). Weʼll take limits of the Gamma Functions in front and the rest of the density as two separate pieces and then put the results together.
Set q = θ/α^(1/γ) and let α go to infinity, while holding q constant.166 Given the chosen constant q, then θ = q α^(1/γ). Then the density of the Transformed Beta, other than the Gamma Functions, is:
γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ) = γ q^(-γτ) α^(-τ) x^(γτ-1) {1 + (x/q)^γ/α}^-(α+τ)
= γ q^(-γτ) α^(-τ) x^(γτ-1) {1 + (x/q)^γ/α}^(-τ) / {1 + (x/q)^γ/α}^α ≅ γ q^(-γτ) α^(-τ) x^(γτ-1) / exp{(x/q)^γ} = γ q^(-γτ) α^(-τ) x^(γτ-1) exp{-(x/q)^γ},
where Iʼve used the fact that the limit as z → ∞ of (1 + c/z)^z is e^c, with z = α and c = (x/q)^γ.
Meanwhile, the Gammas in front of the density are: Γ(α+τ) / {Γ(τ)Γ(α)}. As shown in the above exercise, for large α, Γ(α+τ) / {Γ(τ)Γ(α)} ≅ α^τ / Γ(τ).
Putting the two pieces of the density of the Transformed Beta Distribution together:
f(x) = {Γ(α+τ) / (Γ(τ)Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ) ≅ {α^τ / Γ(τ)} γ q^(-γτ) α^(-τ) x^(γτ-1) exp{-(x/q)^γ} = γ q^(-γτ) x^(γτ-1) exp[-(x/q)^γ] / Γ(τ).
This is the density of a Transformed Gamma Distribution, with scale parameter q, with what is normally the α parameter given as τ, and what is normally the τ parameter given as γ.
166 q will turn out to be the scale parameter of the limiting Transformed Gamma Distribution.
Thus the Transformed Gamma Distribution has been obtained as a limit of a series of Transformed Beta Distributions, with q = θ/α^(1/γ), letting α go to infinity while holding q constant. Then q is the scale parameter of the limiting Transformed Gamma. The τ parameter of the limiting Transformed Gamma is the γ parameter of the Transformed Beta. The α parameter of the limiting Transformed Gamma is the τ parameter of the Transformed Beta.167
In terms of Distribution Functions: lim (α→∞) β[τ, α; x^γ / (x^γ + α q^γ)] = Γ[τ; (x/q)^γ].
For the Transformed Beta Distribution, and its special cases, as alpha approaches infinity, in the limit we get a Transformed Gamma, and its special cases:
Transformed Beta → Transformed Gamma
Generalized Pareto → Gamma
Burr → Weibull
Pareto → Exponential
Note that in each case, since we have taken the limit as alpha approaches infinity, the limiting distribution has one fewer shape parameter than the distribution whose limit we are taking.
Exercise: What is the limit of a Pareto Distribution, as α goes to infinity while θ = 100α?
[Solution: This is a special case of the limit of a Transformed Beta. Using the above result, in this case, the limit is an Exponential Distribution with scale parameter 100. Alternately, for the Pareto S(x) = 1/{1 + (x/θ)}^α = 1/{1 + (x/100)/α}^α. As α approaches infinity, S(x) approaches 1/exp(x/100) = e^(-x/100). This is an Exponential Distribution with mean 100.]
Exercise: What is the limit of a Burr Distribution, with γ = 3, as α goes to infinity while θ = 25α^(1/3)?
[Solution: This is a special case of the limit of a Transformed Beta. Using the above result, in this case, the limit is a Weibull Distribution with scale parameter 25 and τ = 3. Alternately, for the Burr with γ = 3, S(x) = 1/{1 + (x/θ)³}^α = 1/{1 + (x/25)³/α}^α. As α approaches infinity, S(x) approaches 1/exp{(x/25)³} = exp{-(x/25)³}. This is a Weibull Distribution with scale parameter 25 and τ = 3.]
167 The fact that the tau parameter of the limiting Transformed Gamma Distribution is not the same tau as that of the Transformed Beta Distribution is due to the manner in which Loss Models has chosen to parametrize both distributions.
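The first exercise above can be confirmed numerically. Below is a minimal sketch (not part of the Study Guide); x = 250 is an arbitrary choice.

```python
# Pareto with theta = 100 alpha approaches an Exponential with mean 100 as alpha grows.
import numpy as np

x = 250.0
for alpha in [10, 100, 10000]:
    theta = 100.0 * alpha
    print(alpha, (theta / (theta + x))**alpha)   # Pareto survival function S(x)
print("limit", np.exp(-x / 100.0))               # Exponential survival function
```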
Similarly, for the Transformed Beta Distribution, and its special cases, as tau approaches infinity, in the limit we get an Inverse Transformed Gamma, and its special cases:
Transformed Beta → Inverse Transformed Gamma
Generalized Pareto → Inverse Gamma
Inverse Burr → Inverse Weibull
Inverse Pareto → Inverse Exponential
A Transformed Beta Distribution with parameters α, θ, γ and τ has density f(x) = {Γ(α+τ) / (Γ(τ)Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ).
Set q = θτ^(1/γ) and let τ go to infinity, while holding q constant. Given the chosen constant q, then θ = q/τ^(1/γ). Then the density of the Transformed Beta, other than the Gamma Functions, is:
γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ) = γ θ^(-γτ) x^(γτ-1) (x/θ)^-γ(α+τ) {1 + (θ/x)^γ}^-(α+τ)
= γ θ^(γα) x^-(γα+1) {1 + (θ/x)^γ}^(-τ) {1 + (θ/x)^γ}^(-α) = γ q^(γα) τ^(-α) x^-(γα+1) {1 + (q/x)^γ/τ}^(-τ) {1 + (q/x)^γ/τ}^(-α)
≅ γ q^(γα) τ^(-α) x^-(γα+1) exp{-(q/x)^γ} (1) = γ q^(γα) τ^(-α) x^-(γα+1) exp{-(q/x)^γ}.168
As shown previously in an exercise, for large τ, Γ(α+τ) / {Γ(τ)Γ(α)} ≅ τ^α / Γ(α).
Putting the two pieces of the density of the Transformed Beta Distribution together:
f(x) = {Γ(α+τ) / (Γ(τ)Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ) ≅ {τ^α / Γ(α)} γ q^(γα) τ^(-α) x^-(γα+1) exp{-(q/x)^γ} = γ q^(γα) x^-(γα+1) exp{-(q/x)^γ} / Γ(α).
This is the density of an Inverse Transformed Gamma Distribution, with scale parameter q, with the usual α parameter, and what is normally the τ parameter given as γ.
Thus the Inverse Transformed Gamma Distribution has been obtained as a limit of a series of Transformed Beta Distributions, with q = θτ^(1/γ), letting τ go to infinity while holding q constant. Then q is the scale parameter of the limiting Inverse Transformed Gamma. The τ parameter of the limiting Inverse Transformed Gamma is the γ parameter of the Transformed Beta. The α parameter of the limiting Inverse Transformed Gamma is the α parameter of the Transformed Beta.
In terms of Distribution Functions: lim (τ→∞) β[τ, α; x^γ / (x^γ + q^γ/τ)] = 1 - Γ[α; (q/x)^γ].
168 Where Iʼve used the fact that the limit as z → ∞ of (1 + c/z)^(-z) is e^(-c), with z = τ and c = (q/x)^γ.
Note that this result could have been obtained from the previous one: lim (α→∞) β[τ, α; x^γ / (x^γ + α q^γ)] = Γ[τ; (x/q)^γ]. Since β[a, b; x] = 1 - β[b, a; 1-x],
lim (τ→∞) β[τ, α; x^γ / (x^γ + q^γ/τ)] = lim (τ→∞) 1 - β[α, τ; (q^γ/τ) / (x^γ + q^γ/τ)] = lim (τ→∞) 1 - β[α, τ; q^γ / (q^γ + τ x^γ)] = 1 - Γ[α; (q/x)^γ].
Exercise: What is the limit of a Generalized Pareto Distribution, with α = 7, as τ goes to infinity while θ = 33/τ?
[Solution: This is a special case of the limit of a Transformed Beta. Using the above result, in this case, the limit is an Inverse Gamma Distribution with scale parameter 33 and α = 7.
Alternately, for the Generalized Pareto Distribution with α = 7, f(x) = {Γ(7+τ) / (Γ(τ)Γ(7))} θ^(-τ) x^(τ-1) {1 + (x/θ)}^-(7+τ) = {Γ(7+τ) / (Γ(τ)Γ(7))} 33^(-τ) τ^τ x^(τ-1) {1 + τ(x/33)}^-(7+τ)
= {Γ(7+τ) / (Γ(τ)Γ(7))} 33^(-τ) τ^τ x^(τ-1) {τ(x/33)}^(-τ) {1 + (33/x)/τ}^(-τ) {1 + τ(x/33)}^(-7) = {Γ(7+τ) / (Γ(τ)Γ(7))} x^(-1) {1 + (33/x)/τ}^(-τ) {1 + τ(x/33)}^(-7).
As τ approaches infinity, this approaches: {τ⁷/Γ(7)} x^(-1) exp(-33/x) {τ(x/33)}^(-7) = 33⁷ x^(-8) exp(-33/x) / Γ(7). This is the density of an Inverse Gamma Distribution with scale parameter 33 and α = 7.]
Exercise: What is the limit of an Inverse Burr Distribution, with γ = 4, as τ goes to infinity while θ = 13/τ^(1/4)?
[Solution: This is a special case of the limit of a Transformed Beta. Using the above result, in this case, the limit is an Inverse Weibull Distribution with scale parameter 13 and τ = 4.
Alternately, for the Inverse Burr with γ = 4, F(x) = 1/{1 + (θ/x)⁴}^τ = 1/{1 + (13/x)⁴/τ}^τ. As τ approaches infinity, F(x) approaches 1/exp{(13/x)⁴} = exp{-(13/x)⁴}. This is an Inverse Weibull Distribution with scale parameter 13 and τ = 4.]
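The last exercise above can be confirmed numerically. Below is a minimal sketch (not part of the Study Guide); x = 20 is an arbitrary choice.

```python
# Inverse Burr with gamma = 4 and theta = 13 / tau^(1/4) approaches an Inverse Weibull
# with theta = 13 and tau = 4.  Compare the distribution functions at x = 20.
import numpy as np

x = 20.0
for tau in [10, 100, 10000]:
    theta = 13.0 / tau**0.25
    print(tau, (1 + (theta / x)**4)**(-tau))   # Inverse Burr F(x)
print("limit", np.exp(-(13.0 / x)**4))         # Inverse Weibull F(x)
```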
Problems:
28.1 (1 point) Which of the following are special cases of the Transformed Beta Distribution?
1. Beta Distribution
2. ParaLogistic Distribution
3. Inverse Gaussian Distribution
A. None of 1, 2 or 3  B. 1  C. 2  D. 3  E. None of A, B, C, or D
28.2 (1 point) Which of the following can be obtained as limits of Transformed Beta Distributions?
1. Weibull Distribution
2. Inverse Gamma Distribution
3. Single Parameter Pareto Distribution
A. None of 1, 2 or 3  B. 1  C. 2  D. 3  E. None of A, B, C, or D
28.3 (2 points) Calculate the density function at 14, f(14), for a Transformed Beta Distribution with α = 3, θ = 10, τ = 4 and γ = 6.
Hint: f(x) = {Γ(α+τ) / (Γ(τ)Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ).
A. less than 0.02  B. at least 0.02 but less than 0.03  C. at least 0.03 but less than 0.04  D. at least 0.04 but less than 0.05  E. at least 0.05
28.4 (1 point) Match the Distributions:
1. Pareto            a. Transformed Beta with α = 1 and τ = 1
2. LogLogistic       b. Transformed Beta with τ = 1 and γ = 1
3. Inverse Pareto    c. Transformed Beta with α = 1 and γ = 1
A. 1a, 2b, 3c  B. 1a, 2c, 3b  C. 1b, 2a, 3c  D. 1b, 2c, 3a  E. 1c, 2b, 3a
28.5 (2 points) What is the limit of Inverse Pareto Distributions, as τ goes to infinity while θ = 7/τ? A. An Exponential Distribution, with scale parameter 7. B. An Exponential Distribution, with scale parameter 1/7. C. An Inverse Exponential Distribution, with scale parameter 7. D. An Inverse Exponential Distribution, with scale parameter 1/7. E. None of the above.
Solutions to Problems:
28.1. C. 1. While the Transformed Beta Distribution can be derived from the Beta Distribution, the Beta has support [0, 1], while the Transformed Beta Distribution has support x > 0. The Beta is not a special case of a Transformed Beta. 2. The ParaLogistic Distribution is a special case of a Transformed Beta, with τ = 1 and γ = α. 3. The Inverse Gaussian Distribution is not a special case of a Transformed Beta Distribution.
28.2. E. 1. Yes. By taking appropriate limits of Burr Distributions, Transformed Beta Distributions with τ = 1, one can obtain a Weibull Distribution. 2. Yes. By taking appropriate limits of Generalized Pareto Distributions, Transformed Beta Distributions with γ = 1, one can obtain an Inverse Gamma Distribution. 3. No. The Single Parameter Pareto has support x > θ, while the Transformed Beta Distribution has support x > 0.
28.3. B. f(x) = {Γ(α+τ) / (Γ(τ)Γ(α))} γ (x/θ)^(γτ) {1 + (x/θ)^γ}^-(α+τ) / x.
f(14) = {Γ(7) / (Γ(4)Γ(3))} (6) (1.4)²⁴ {1 + (1.4)⁶}^(-7) / 14 = (60)(6)(3214.2) / {(3,284,565)(14)} = 0.025.
28.4. C. 1b, 2a, 3c.
28.5. C. This is a special case of the limit of a Transformed Beta. The limit is an Inverse Transformed Gamma with α = 1 and γ = 1 and scale parameter 7. That is an Inverse Exponential with scale parameter 7. Alternately, for the Inverse Pareto, F(x) = {x/(x+θ)}^τ = (1 + θ/x)^(-τ) = {1 + (7/x)/τ}^(-τ). As τ approaches infinity, F(x) approaches 1/exp(7/x) = exp(-7/x). This is an Inverse Exponential with scale parameter 7.
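Here is a minimal sketch (not part of the Study Guide) confirming Solution 28.3 with scipy:

```python
# Transformed Beta density with alpha = 3, theta = 10, tau = 4, gamma = 6, at x = 14.
from scipy.special import gamma as G

alpha, theta, tau, g, x = 3.0, 10.0, 4.0, 6.0, 14.0
f = (G(alpha + tau) / (G(tau) * G(alpha))) * g * theta**(-g * tau) \
    * x**(g * tau - 1) * (1 + (x / theta)**g)**(-(alpha + tau))
print(f)   # about 0.025, answer B
```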
Section 29, Producing Additional Distributions
Given a light-tailed distribution, one can produce a more heavy-tailed distribution by looking at the inverse of x. Let G(x) = 1 - F(1/x).169 For example, if F is a Gamma Distribution, then G is an Inverse Gamma Distribution. For a Gamma Distribution with θ = 1, F(x) = Γ[α; x]. Letting y = θ/x, G(y) = 1 - F(x) = 1 - F(θ/y) = 1 - Γ[α; θ/y], the Inverse Gamma Distribution.
Given a Distribution, one can produce another distribution by adding up independent identical copies. For example, adding up α independent Exponential Distributions gives a Gamma Distribution. As α approaches infinity one approaches a very light-tailed Normal Distribution.
One can get a more heavy-tailed distribution by the change of variables y = ln(x). Let G(x) = F[ln(x)]. For example, if F is the Normal Distribution, then G is the heavier-tailed LogNormal Distribution. Loss Models refers to this as "exponentiating", since if y = ln(x), then x = e^y.
One can get new distributions by the change of variables y = x^(1/τ). Loss Models refers to this as "raising to a power". Let G(x) = F[x^(1/τ)], τ > 0. For example, if F is the Exponential Distribution with mean θ, then G is a Weibull Distribution, with scale parameter θ^τ. For τ > 1 the Weibull Distribution has a lighter tail than the Exponential Distribution. For τ < 1 the Weibull Distribution has a heavier tail than the Exponential Distribution. For τ > 0, Loss Models refers to the new distribution as transformed, for example Transformed Gamma versus Gamma.170
If τ < 0, then G(x) = 1 - F[x^(1/τ)]. For τ < 0, this is called the inverse transformed distribution, such as the Inverse Transformed Gamma versus the Gamma. This can be usefully thought of as two separate changes of variables: raising to a positive power and inverting. For the special case τ = -1, this is the inverse distribution as discussed previously, such as the Inverse Gamma versus the Gamma.
169 We need to subtract from one, so that G(0) = 0 and G(∞) = 1.
170 However, some distributions retain their special names. For example, the Weibull is not called the transformed Exponential, nor is the Burr called the transformed Pareto.
Exercise: X is Exponential with mean 10. Determine the form of the distribution of Y = X³.
[Solution: F(x) = 1 - exp[-x/10]. Y = X³. X = Y^(1/3). F_Y(y) = F_X[y^(1/3)] = 1 - exp[-y^(1/3)/10] = 1 - exp[-(y/1000)^(1/3)]. A Weibull Distribution with θ = 1000 and τ = 1/3.
Alternately, F_Y(y) = Prob[Y ≤ y] = Prob[X³ ≤ y] = Prob[X ≤ y^(1/3)] = 1 - exp[-y^(1/3)/10] = 1 - exp[-(y/1000)^(1/3)]. A Weibull Distribution with θ = 1000 and τ = 1/3.
Alternately, f(x) = exp[-x/10]/10. f_Y(y) = f_X[y^(1/3)] |dx/dy| = {exp[-y^(1/3)/10]/10} (1/3) y^(-2/3) = (1/3) (y/1000)^(1/3) exp[-(y/1000)^(1/3)] / y. A Weibull Distribution with θ = 1000 and τ = 1/3.
Comment: A change of variables as in calculus class.]
One can get additional distributions as a mixture of distributions. As will be discussed in a subsequent section, the Pareto can be obtained as a mixture of an Exponential by an Inverse Gamma.171 Usually such mixing produces a heavier tail; the Pareto has a heavier tail than an Exponential. The Negative Binomial, which can be obtained as a mixture of Poissons via a Gamma, has a heavier tail than the Poisson. Loss Models refers to this as Mixing.
Another method of getting new distributions is to weight together two or more existing distributions. Such mixed distributions, referred to by Loss Models as n-point or two-point mixtures, are discussed in a subsequent section.172
One can get additional distributions as a ratio of independent variables, each of which follows a known distribution. For example, an F-distribution is a ratio of Chi-Squares.173 As a special case, the Pareto can be obtained as a ratio of an Exponential variable and a Gamma variable.174 The Beta Distribution can be obtained as a combination of two Gammas.175 Generally the distributions so obtained have heavier tails.
171 Loss Models Example 5.4 shows that mixing an exponential via an inverse exponential yields a Pareto Distribution. This is just a special case of the Inverse Gamma-Exponential, with mixed distribution a Pareto. Example 5.6 shows that mixing an Inverse Weibull via a Transformed Gamma with the same τ parameter gives an Inverse Burr Distribution.
172 Loss Models, Section 4.2.3.
173 The F-Distribution from Statistics is related to the Generalized Pareto Distribution.
174 See p. 47, Loss Distributions by Hogg & Klugman.
175 If X is a random draw from a Gamma Distribution with shape parameter α and scale parameter θ, and Y is a random draw from a Gamma Distribution with shape parameter β and scale parameter θ, then Z = X / (X+Y) is a random draw from a Beta Distribution with parameters α and β.
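Footnote 175 is easy to confirm by simulation. Below is a minimal sketch (not part of the Study Guide); the shape parameters 2 and 5 and the common scale of 10 are arbitrary choices.

```python
# If X ~ Gamma(alpha) and Y ~ Gamma(beta) are independent with a common scale,
# then Z = X / (X + Y) is Beta(alpha, beta).
import numpy as np
from scipy.stats import gamma, beta, kstest

rng = np.random.default_rng(0)
alpha_, beta_ = 2.0, 5.0
x = gamma.rvs(alpha_, scale=10, size=100_000, random_state=rng)
y = gamma.rvs(beta_, scale=10, size=100_000, random_state=rng)
z = x / (x + y)
print(kstest(z, beta(alpha_, beta_).cdf))   # large p-value: consistent with Beta(2, 5)
```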
Finally, one can introduce a scale parameter. If one had the distribution F(x) = 1 - e^(-x), one can create a family of distributions by substituting x/θ everywhere x appears; θ is now a scale parameter: F(x) = 1 - e^(-x/θ). Introducing a scale parameter does not affect either the tail behavior or the shape of the distribution. Loss Models refers to this as "multiplying by a constant".
Exercise: X is uniform from 0 to 0.1. Y = √(10/X) - 10. Determine the distribution of Y.
[Solution: F(x) = 10x, 0 ≤ x ≤ 0.1. X = 10/(Y + 10)². x = 0 ⇔ y = ∞. x = 0.1 ⇔ y = 0. Small Y ⇔ Large X. ⇒ Need to take 1 - F(x). F_Y(y) = 1 - F_X[10/(y + 10)²] = 1 - 10²/(y + 10)². A Pareto Distribution with α = 2 and θ = 10.
Alternately, F_Y(y) = Prob[Y ≤ y] = Prob[√(10/X) - 10 ≤ y] = Prob[√(10/X) ≤ y + 10] = Prob[10/X ≤ (y + 10)²] = Prob[X ≥ 10/(y + 10)²] = 1 - Prob[X ≤ 10/(y + 10)²] = 1 - (10){10/(y + 10)²} = 1 - 10²/(y + 10)². F_Y(y) = 1 - 10²/(y + 10)², a Pareto Distribution with α = 2 and θ = 10.
Alternately, f(x) = 10, 0 ≤ x ≤ 0.1. f_Y(y) = f_X[10/(y + 10)²] |dx/dy| = (10) 20/(y + 10)³ = (2) 10²/(y + 10)³. A Pareto Distribution with α = 2 and θ = 10.]
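The exercise above can be confirmed by simulation. Below is a minimal sketch (not part of the Study Guide); scipy's lomax is the Loss Models Pareto.

```python
# Simulate Y = sqrt(10/X) - 10 with X uniform on (0, 0.1) and compare to Pareto(2, 10).
import numpy as np
from scipy.stats import lomax, kstest

rng = np.random.default_rng(0)
x = rng.uniform(0, 0.1, size=100_000)
y = np.sqrt(10 / x) - 10
print(kstest(y, lomax(2, scale=10).cdf))   # large p-value: consistent with Pareto(2, 10)
```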
Percentiles: A one-to-one monotone transformation, such as ln(x), e^x, or x², preserves the percentiles, including the median. For example, the median of a Normal Distribution is µ, which implies that the median of a LogNormal Distribution is e^µ.
Problems:
29.1 (4 points) X follows a Standard Normal Distribution, with mean zero and standard deviation of 1. Y = 1/X.
(a) (1 point) What is the density of y?
(b) (2 points) Graph the density of y.
(c) (1 point) What is E[Y]?
29.2 (1 point) X follows an Exponential Distribution with mean 1. Let Y = θ exp[X/α]. What is the distribution of Y?
29.3 (3 points) X follows a Weibull Distribution with parameters θ and τ. Let Y = -ln[X]. What are the algebraic forms of the distribution and density functions of Y?
29.4 (3 points) ln(X) follows a LogNormal Distribution with µ = 1.3 and σ = 0.4. Determine the density function of X.
29.5 (3 points) X follows a Standard Normal Distribution, with mean 0 and standard deviation of 1. Let Y^τ = X², for τ > 0. Determine the form of the Distribution of Y.
29.6 (2 points) Let X follow the density f(x) = e^(-x) / (1 + e^(-x))², -∞ < x < ∞. Let Y = θ e^(X/γ), for θ > 0, γ > 0. Determine the form of the distribution of Y.
29.7 (2 points) Let X be a uniform distribution from 0 to 1. Let Y = θ ln[(1 - p) / (1 - p^X)], for θ > 0, 1 > p > 0. Determine the form of the distribution function of Y.
29.8 (2 points) X follows an Exponential Distribution with hazard rate λ. Let Y = exp[-δX]. What is the distribution of Y?
29.9 (3 points) X is uniform on (0, 1). Y is uniform on (0, √x).
(a) What is the distribution of Y?
(b) Determine E[Y].
29.10 (1 point) X follows an Exponential Distribution with mean 10. Let Y = 1/X. What is the distribution of Y?
29.11 (4 points) You are given the following:
• The random variable X has a Normal Distribution, with mean zero and standard deviation σ.
• The random variable Y also has a Normal Distribution, with mean zero and standard deviation σ.
• X and Y are independent.
• R² = X² + Y².
Determine the form of the distribution of R.
Hint: The sum of squares of ν independent Standard Normals is a Chi-Square Distribution with ν degrees of freedom, which is a Gamma Distribution with α = ν/2 and θ = 2.
29.12 (2, 5/90, Q.36) (1.7 points) If Y is uniformly distributed on the interval (0, 1) and if Z = -a ln(1 - Y) for some a > 0, then to which of the following families of distributions does Z belong?
A. Pareto  B. LogNormal  C. Normal  D. Exponential  E. Uniform
29.13 (4B, 11/97, Q.11) (2 points) You are given the following:
• The random variable X has a Pareto distribution, with parameters θ and α.
• Y is defined to be ln(1 + X/θ).
Determine the form of the distribution of Y.
A. Negative Binomial  B. Exponential  C. Pareto  D. Lognormal  E. Normal
29.14 (IOA 101, 4/00, Q.13) (3.75 points) Suppose that the distribution of a physical coefficient, X, can be modeled using a uniform distribution on (0, 1). A researcher is interested in the distribution of Y, an adjusted form of the reciprocal of the coefficient, where Y = (1/X) - 1.
(i) (2.25 points) Determine the probability density function of Y.
(ii) (1.5 points) Determine the mean of Y.
29.15 (1, 11/01, Q.13) (1.9 points) An actuary models the lifetime of a device using the random variable Y = 10X^0.8, where X is an exponential random variable with mean 1 year. Determine the probability density function f(y), for y > 0, of the random variable Y.
(A) 10 y^0.8 exp[-8y^(-0.2)]
(B) 8 y^(-0.2) exp[-10y^0.8]
(C) 8 y^(-0.2) exp[-(0.1y)^1.25]
(D) (0.1y)^1.25 exp[-1.25(0.1y)^0.25]
(E) 0.125 (0.1y)^0.25 exp[-(0.1y)^1.25]
29.16 (CAS3, 11/05, Q.19) (2.5 points) Claim size, X, follows a Pareto distribution with parameters α and θ. A transformed distribution, Y, is created such that Y = X^(1/τ). Which of the following is the probability density function of Y?
A. τθy^(τ-1) / (y + θ)^(τ+1)
B. αθ^α τ y^(τ-1) / (y^τ + θ)^(α+1)
C. θα^θ / (y + α)^(θ+1)
D. ατ(y/θ)^τ / {y[1 + (y/θ)^τ]^(α+1)}
E. αθ^α / (y^τ + θ)^(α+1)
29.17 (CAS3, 5/06, Q.27) (2.5 points) The following information is available regarding the random variables X and Y:
• X follows a Pareto Distribution with α = 2 and θ = 100.
• Y = ln[1 + (X/θ)]
Calculate the variance of Y.
A. Less than 0.10  B. At least 0.10, but less than 0.20  C. At least 0.20, but less than 0.30  D. At least 0.30, but less than 0.40  E. At least 0.40
Solutions to Problems:
29.1. (a) f(x) = exp[-0.5x²] / √(2π). x = 1/y. dx/dy = -1/y². g(y) = f(x) |dx/dy| = {exp[-0.5/y²] / y²} / √(2π), -∞ < y < ∞.
(b) This density is zero at zero, is symmetric, and has maximums at ±1/√2. [Graph of the density omitted.]
(c) E[Y] = 2 ∫_0^∞ {exp[-0.5/y²] / y} / √(2π) dy. Now as y → ∞, exp[-0.5/y²] → e⁰ = 1.
Therefore, for large values of y, the integrand is basically 1/y, which has no finite integral since ln[∞] = ∞. Therefore, the first moment of Y does not exist. Comment: This is an Inverse Normal Distribution, none of whose positive moments exist. 29.2. F(x) = 1 - exp[-x]. y = θ exp[x/α]. ⇒ exp[x/α] = y/θ. ⇒ exp[x] = (y/θ)α. Substituting into F(x), F(y) = 1 - (θ/y)α, a Single Parameter Pareto Distribution. Comment: While x goes from 0 to ∞, y goes from θ exp[0/α] = θ to ∞. In general, if ln[Y] follows a Gamma, then Y follows what is called a LogGamma. A special case is when ln[Y] is Exponential with mean α. Then Y follows a LogExponential, which is just a Single Parameter Pareto Distribution with θ = 1.
2013-4-2,
Loss Distributions, §29 Additional Distributions
HCM 10/8/12,
Page 379
29.3. F(x) = 1 - exp[-(x/θ)τ], x > 0. y = -ln[x]. ⇒ x = e-y. x = 0 ⇔ y = ∞, and x = ∞ ⇔ y = -∞. Since large x corresponds to small y, we need to substitute into S(x) rather than F(x). Substituting into S(x), F(y) = exp[-(e-y/θ)τ] = exp[-e-τy/θτ], -∞ < y < ∞. Differentiating, f(y) = exp[-e-τy/θτ] τ e-τy/θτ, -∞ < y < ∞. Alternately, for the Weibull Distribution, f(x) = τxτ−1 exp(-(x/θ)τ) / θτ. f(y) = f(x) |dx/dy| = {τ e-y(τ−1) exp(-(e-y/θ)τ) / θτ} e-y = exp[-e-τy/θτ] τ e-τy/θτ, -∞ < y < ∞. Comment: This distribution is sometimes called the Gumbel Distribution. For τ = 1 and θ = 1, F(y) = exp[-e-y], and f(x) = exp[-y - e-y], -∞ < y < ∞, which is a form of what is called the Extreme Value Distribution, the Fisher-Tippet Type I Distribution, or the Doubly Exponential Distribution. 29.4. The LogNormal Distribution has support starting at 0, so we want ln(x) > 0. ⇒ x > 1. F(x) = LogNormal Distribution at ln(x): Φ[{ln(ln(x)) - 1.3}/.4]. f(x) = φ[{ln(ln(x)) - 1.3}/.4] d ln(ln(x))/dx = {exp[-{ln(ln(x)) - 1.3}2 /(2 .42 )]/ (.4
2 π ) } / {x ln(x)} =
exp[-3.125{ln(ln(x)) - 1.3}2 ]/ {0.4 x ln(x) 2π }, x > 1. Comment: Beyond what you are likely to be asked on your exam. Just as the LogNormal Distribution has a much heavier righthand tail than the Normal Distribution, the “LogLogNormal” Distribution has a much heavier righthand tail than the LogNormal Distribution. 29.5. f(x) = exp[-x2 /2]/ 2 π . X = Yτ/2. Since x is symmetric around zero, but x2 ≥ 0, we need to double the density of x. g(y) = 2 f(x) |dx/dy| = 2{exp[-yτ/2]/ 2 π }(τ/2)yτ/2 − 1 = τ yτ/2 − 1 exp[-yτ/2] /
2π .
The density of a Transformed Gamma Distribution is: f(x) = τ xτα−1 exp[-xτ/ θτ] /{θτα Γ(α)}. Matching parameters, τα = τ/2, and 2 = θτ. The density of y is a Transformed Gamma Distribution with parameters α = 1/2, τ, and θ = 21/τ. Comment: θτα Γ(α) = (θτ)α Γ(1/2) = 21/2
π = 2π .
If τ = 1, then Y has a Gamma Distribution with α = 1/2 and θ = 2, which is a Chi-Square Distribution with one degree of freedom.
2013-4-2,
Loss Distributions, §29 Additional Distributions
HCM 10/8/12,
Page 380
29.6. By integration, F(x) = 1/(1 + e-x) = ex / (1 + ex), -∞ < x < ∞. γ
ex = (y / θ) . Therefore, FY(y) =
γ
(y / θ) γ , y > 0. This is a Loglogistic Distribution. 1 + (y / θ)
Comment: The original distribution is called a Logistic Distribution. The Loglogistic has a similar relationship to the Logistic Distribution, as the LogNormal has to the Normal.
2013-4-2,
Loss Distributions, §29 Additional Distributions
29.7. y/θ = ln[
HCM 10/8/12,
Page 381
1 - p 1 - p ]. ⇒ ey/θ = . ⇒ (1-p) e-y/θ = 1 - px. ⇒ px = 1 - (1-p) e-y/θ. ⇒ x 1 - p 1 - px
x ln(p) = ln[1 - (1-p) e-y/θ]. ⇒ x = ln[1 - (1-p) e-y/θ] / ln(p). For x = 1, y = 0, while as x approaches zero, y approaches infinity. Since X is uniform, FX(x) = x. ⇒ FY(y) = 1 - ln[1 - (1-p) e-y/θ ] / ln(p), y > 0. Comment: The density is fY(y) =
-e- y / θ (1- p) - y/ θ
{1 - (1- p)e
} θ ln[p]
, y > 0.
The distribution of Y is called an Exponential-Logarithmic Distribution. As p approaches 1, the distribution of Y approaches an Exponential Distribution. The Exponential-Logarithmic Distribution has a declining hazard rate. If frequency follows a Logarithmic Distribution, and severity is Exponential, then the minimum of the claim sizes follows an Exponential-Logarithmic Distribution. Here is a graph comparing the density of an Exponential with mean 100 and an Exponential-Logarithmic Distribution with p = 0.2 and θ = 100: density 0.025
0.020 Exponential-Logarithmic 0.015
0.010
Exponential
0.005
50
100
150
200
250
300
size
2013-4-2,
Loss Distributions, §29 Additional Distributions
HCM 10/8/12,
Page 382
29.8. F(x) = 1 - exp[-λx]. X = -ln[Y]/δ. When x is big y is small and vice-versa. As x goes from zero to infinity, y goes from 1 to 0. Therefore, we get the distribution function of Y by plugging into the survival function of X: F(y) = exp[-λ (-ln[y]/δ)] = yλ/δ, 0 < y < 1. ⇒ f(y) = (λ/δ) yλ/δ - 1, 0 < y < 1. Y follows a Beta Distribution, with parameters θ = 1, a = λ/δ, and b = 1. Comment: If X is the future lifetime, and δ is the force of interest, then Y is the present value of a life insurance that pays 1. The actuarial present value of this insurance is: a λ/δ λ E[Y] = θ = = . a + b λ/δ + 1 λ + δ As discussed in Life Contingencies, if the distribution of future lifetimes is Exponential with hazard λ rate λ, then is the actuarial present value of a life insurance that pays 1. λ + δ 29.9. a. F[y | x] = y/ x for 0 ≤ y ≤
x . F[y | x] = 1 for y >
x.
⎧ 1 for 0 ≤ x ≤ y2 In other words, F[y | x] = ⎨ . ⎩y / x for y2 ≤ x ≤ 1 y2
Thus, F[y] =
1
∫0 1 dx + y∫2 y /
x =1
x dx =
y2
+ 2y x
]
= y2 + 2y - 2y2 = 2y - y2 , 0 ≤ y ≤ 1.
x = y2 1
b. S(y) = 1 + y2 - 2y. E[Y] =
∫0 1 + y2 - 2y dy = 1 + 1/3 - 1 = 1/3.
Alternately, f(x) = 2 - 2y, 0 ≤ y ≤ 1. This is a Beta Distribution with a = 1, b = 2, and θ = 1. E[Y] = θ a / (a+ b) = 1/3. 29.10. F(x) = 1 - e-x/10. Let G be the distribution function of Y. G(y) = 1 - F(x) = 1 - F(1/y) = exp[-0.1/y]. This is an Inverse Exponential Distribution with θ = 0.1. Comment: We need to subtract from one, so that G(0) = 0 and G(∞) = 1.
2013-4-2,
Loss Distributions, §29 Additional Distributions
HCM 10/8/12,
Page 383
29.11. For σ = 1, R2 is the sum of two unit Normals, and thus a Chi-Square with 2 degrees of freedom, which is an Exponential Distribution with θ = 2. Now R = (R2 )1/2, so we have a power transformation, and thus R is Weibull with τ = 2. Specifically, the survival function of R is: S(r) = survival function of R2 = exp[-r2 /2]. Now if σ ≠ 1, we just have a scale transformation, and r is divided by σ wherever r appears in the survival function: r2 ⎛ r ⎞2 S(r) = exp[] = exp[- ⎜ ⎟ ]. ⎝σ 2⎠ 2 σ2 R follows a Weibull Distribution with τ = 2 and θ = σ
2.
Comment: This is called a Rayleigh Distribution. 29.12. D. F(z) = Prob[Z ≤ z] = Prob[-a ln(1 - Y) ≤ z] = Prob[ln( 1 - Y) ≥ -z/a] = Prob[1 - Y ≥ e-z/a] = Prob[1 - e-z/a ≥ Y] = 1 - e-z/a. An Exponential Distribution with θ = a. Comment: For Y uniform on [0, 1], Prob[Y ≤ y] = y. This is the basis of one way to simulate an Exponential Distribution. Z = -a ln(Y), also follows an Exponential Distribution with θ = a, which is the basis of another way to simulate an Exponential Distribution. 29.13. B. If y = ln(1+ x/θ), then dy/dx = (1/θ) / (1+x/θ) = 1/(θ+x). Note that ey = 1+ x/θ. g(y) = f(x) / |dy/dx| = {(αθα)(θ + x)−(α + 1)}/ (1/(θ+x)) = α(1 + x/θ)−α = α(ey)−α = αe−αy. Thus y is distributed as per an Exponential. Comment: See for example page 107 of Insurance Risk Models by Panjer and Willmot, not on the Syllabus. 29.14. (i) F(x) = x, 0 < x < 1. y = 1/x - 1. ⇒ x = 1/(1 + y). ⇒ F(y) = 1 - 1/(1+y), 0 < y < ∞.
⇒ f(y) = 1/(1+y)2 , 0 < y < ∞. Alternately, f(x) = 1, 0 < x < 1. dy/dx= -1/x2 . f(y) = f(x)/(|dy/dx|) = 1/(1+y)2 . When x is 0, y is ∞, and when x = 1, y is 0. ⇒ f(y) = 1/(1+y)2 , 0 < y < ∞. (ii) Y follows a Pareto Distribution with α = 1 (and θ = 1), and therefore the mean does not exist. Alternately, E[Y] is the integral from 0 to ∞ of y/(1+y)2 , which does not exist, since for large y the integrand acts like 1/y.
2013-4-2,
Loss Distributions, §29 Additional Distributions
HCM 10/8/12,
Page 384
29.15. E. S(x) = exp[-x]. y = 10x0.8. ⇒ x = (y/10)1.25. ⇒ S(y) = exp[-(y/10)1.25]. f(y) = 1.25 y.25 exp[-(y/10)1.25] / 101.25 = 0.125 (0.1y). 2 5 exp[-(0.1y)1 . 2 5] . Comment: Y follows a Weibull Distribution with τ = 1.25 and θ = 10. 29.16. B. Y= X1/τ. x = yτ. dx/dy = τ yτ−1. f(x) = αθα/(x + θ)α+1. f(y) = dF/dy = dF/dx dx/dy = {αθα/(x + θ)α+1} τ yτ−1 = {αθα/(yτ + θ)α+1} τ yτ−1 = αθατyτ−1/(yτ + θ)α+1. Alternately, F(x) = 1 - {θ/(x + θ)}α. x = yτ. ⇒ F(y) = 1 - {θ/(yτ + θ)}α. Differentiating with respect to y, f(y) = αθατyτ−1/(yτ + θ)α+1. Comment: Basically, just a change of variables from calculus. The result is a Burr Distribution, but with a somewhat different treatment of the scale parameter than in Loss Models. If τ = 1, one should just get the density of the original Pareto. This is not the case for choices A and C, eliminating them. While it is not obvious, choice D does pass this test. 29.17. C. Y = ln(1 + (X/θ)). ⇔ X = θ(eY - 1) = 100(eY - 1). F(x) = 1 - {100/(100 + x)}2 . F(y) = 1 - {100/(100 + 100(ey - 1))}2 = 1 - e-2y. Thus Y follows an Exponential Distribution with θ = 1/2, and variance θ2 = 1/4.
2013-4-2,
Loss Distributions, §30 Tails
HCM 10/8/12,
Page 385
Section 30, Tails of Loss Distributions
Actuaries are often interested in the behavior of a size of loss distribution as the size of claim gets very large. The question of interest is how quickly the right-hand tail probability, as quantified in the survival function S(x) = 1 - F(x), goes to zero as x approaches infinity. If the tail probability goes to zero slowly, one describes that as a "heavy-tailed distribution." For example, for the Pareto distribution S(x) = {θ/(θ+x)}^α, which goes to zero as per x^(-α). If the tail probability goes to zero quickly, then one describes the distribution as "light-tailed". For the Exponential Distribution, S(x) = e^(-x/θ), which goes to zero very quickly as x → ∞.
The heavier tailed distribution will have both its density and its survival function go to zero more slowly as x approaches infinity. For example, for a Pareto, f(x) = αθ^α / (θ + x)^(α+1), which goes to zero more slowly than the density of an Exponential Distribution, f(x) = e^(-x/θ) / θ.
For example, here is a comparison starting at 300 of the Survival Function of an Exponential Distribution with θ = 100 versus that of a Pareto Distribution with θ = 200, α = 3, and mean of 100:
[Graph of S(x) for the Exponential and the Pareto, from x = 300 to 1000, omitted.]
The Pareto with a heavier righthand tail has its Survival Function go to zero more slowly as x approaches infinity, than the Exponential. The Exponential has less probability in its righthand tail than the Pareto. The Exponential has a lighter righthand tail than the Pareto.
Exercise: Compare S(1000) for the Exponential Distribution with θ = 100 versus that of a Pareto Distribution with θ = 200, α = 3, and mean 100.
[Solution: For the Exponential, S(1000) = e^(-1000/100) = 0.00454%. For the Pareto, S(1000) = (200/1200)³ = 0.46296%.
Comment: The Pareto Distribution has a much higher probability of a loss of size greater than 1000 than does the Exponential Distribution.]
Exercise: What are the mean and second moment of a Pareto Distribution with parameters α = 3 and θ = 10?
[Solution: The mean is: θ/(α-1) = 10/2 = 5. The second moment is: 2θ² / {(α-1)(α-2)} = 200/2 = 100.]
Exercise: What are the mean and second moment of a LogNormal Distribution with parameters µ = 0.9163 and σ = 1.1774?
[Solution: The mean is: exp(µ + σ²/2) = exp(1.6094) = 5. The second moment is: exp(2µ + 2σ²) = exp(4.605) = 100.]
Thus a Pareto Distribution with parameters α = 3 and θ = 10 and a LogNormal Distribution with parameters µ = 0.9163 and σ = 1.1774 have the same mean and second moment, and therefore the same variance. However, while their first two moments match, the Pareto has a heavier tail. This can be seen by calculating the density functions for some large values of x.
Exercise: What are f(10), f(100), f(1000) and f(10,000) for a Pareto Distribution with parameters α = 3 and θ = 10?
[Solution: For a Pareto, f(x) = αθ^α / (θ + x)^(α+1). So that f(10) = 3000/20⁴ = 0.01875, f(100) = 2.05 x 10^(-5), f(1000) = 2.88 x 10^(-9), f(10,000) = 2.99 x 10^(-13).]
Exercise: What are f(10), f(100), f(1000) and f(10,000) for a LogNormal Distribution with parameters µ = 0.9163 and σ = 1.1774?
[Solution: For a LogNormal, f(x) = exp[-(ln(x) - µ)² / (2σ²)] / {x σ √(2π)}, so that f(10) = 0.0169, f(100) = 2.50 x 10^(-5), f(1000) = 8.07 x 10^(-10), f(10,000) = 5.68 x 10^(-16).]
x        Pareto (θ = 10, α = 3) Density    LogNormal (µ = 0.9163, σ = 1.1774) Density
10       1.87e-2                           1.69e-2
100      2.05e-5                           2.50e-5
1000     2.88e-9                           8.07e-10
10000    2.99e-13                          5.68e-16
While at 10 and 100 the two densities are similar, by the time we get to 1000, the LogNormal Density has started to go to zero more quickly. This LogNormal has a lighter tail than this Pareto. In general any LogNormal has a lighter tail than any Pareto Distribution.
For the LogNormal, ln f(x) = -0.5 {(ln(x) - µ)/σ}² - ln(x) - ln(σ) - ln(2π)/2. For very large x this is approximately: -0.5 ln(x)²/σ². For the Pareto, ln f(x) = ln(α) + α ln(θ) - (α+1) ln(θ + x). For very large x this is approximately: -(α+1) ln(x). Since the square of ln(x) eventually gets much bigger than ln(x), the log density of the LogNormal (eventually) goes to minus infinity faster than that of the Pareto. In other words, for very large x, the density of the LogNormal goes to zero more quickly than the Pareto. The LogNormal is lighter-tailed than the Pareto.
There are a number of methods by which one can distinguish which distribution or empirical data set has the heavier tail. Light-tailed distributions have more moments that exist. For example, the Gamma Distribution has all of its (positive) moments exist. Heavy-tailed distributions do not have all of their higher moments exist. For example, for the Pareto, only those moments for n < α exist. In general, computing the nth moment involves integrating x^n f(x) with upper limit of infinity. Thus if f(x) goes to zero as x^(-m) as x approaches infinity, then the integrand is x^(n-m) for large x; thus the nth moment only exists if n - m < -1, in other words if m > n + 1. The nth moment will only exist if f(x) goes to zero faster than x^(-(n+1)).
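The density comparison above can be reproduced with scipy. Below is a minimal sketch (not part of the Study Guide); lomax is the Loss Models Pareto, and lognorm with s = σ and scale = e^µ is the LogNormal.

```python
# Pareto(alpha = 3, theta = 10) versus LogNormal(mu = 0.9163, sigma = 1.1774),
# which have the same mean and second moment, at increasingly large x.
import numpy as np
from scipy.stats import lomax, lognorm

pareto = lomax(3, scale=10)
logn = lognorm(s=1.1774, scale=np.exp(0.9163))
for x in [10, 100, 1000, 10000]:
    print(x, pareto.pdf(x), logn.pdf(x))   # the LogNormal density falls off faster
```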
2013-4-2,
Loss Distributions, §30 Tails
For example, the Burr Distribution has f(x) =
HCM 10/8/12,
Page 388
⎞ α+ 1 α γ xγ −1 ⎛ 1 go to zero as per ⎜ γ⎟ θγ ⎝ 1 + (x / θ) ⎠
x(γ−1) − γ(α+1) = x−(γα +1), so the nth moment exists only if αγ > n. For example, a Burr Distribution with α = 2.2 and γ = 0.8 has a first moment but fails to have a second moment, since αγ = 1.76 ≤ 2. If it exists, the larger the coefficient of variation, the heavier-tailed the distribution. For example, for the Pareto with α > 2, the Coefficient of Variation =
α , which increases as α approaches 2. α - 2
Thus as α decreases, the tail of the Pareto gets heavier. Skewness: Similarly, when it exists, the larger the skewness, the heavier the tail of distribution. The Normal Distribution is symmetric and thus has a skewness of zero. For the common size of loss distributions, the skewness is usually positive when it exists. The Gamma, Pareto and LogNormal all have positive skewness. For small τ the Weibull has positive skewness, but has negative skewness for large enough τ. The Gamma Distribution has skewness of 2 / α , which is always positive. The skewness of the Pareto Distribution does not exist for α ≤ 3. For α > 3, the Pareto skewness is: 2
α +1 α−3
α−2 > 0. α
For the LogNormal Distribution the skewness =
exp(3σ2) - 3 exp(σ2) + 2 . {exp(σ2 ) - 1}1.5
The denominator is positive, since exp( σ2) > 1 for σ2 > 0. The numerator is positive since it can be written as y3 - 3 y + 2, for y = exp( σ2) > 1. The derivative is 3y2 - 3 > 0 for y > 1. At y =1 this denominator is zero, thus for y >1 this denominator is positive. Thus the skewness of the LogNormal is positive.
2013-4-2,
Loss Distributions, §30 Tails
HCM 10/8/12,
Page 389
For the Weibull Distribution the skewness is:
{Γ(1+ 3/τ) − 3Γ(1+ 1/τ) Γ(1+ 2/τ) + 2(Γ(1+ 1/τ))3 } / {Γ(1+ 2/τ) −(Γ(1+ 1/τ)2 )}1.5. Note that the skewness depends on the shape parameter τ but not on the scale parameter θ. Note that for very large tau, the skewness of the Weibull is approximately -1.5 / tau. For large tau the skewness is negative, but goes to zero as tau goes to infinity. The Weibull has positive skewness for τ < 3.6 and a negative skewness for τ > 3.6. Mean Excess Loss (Mean Residual Lives): Heavy-tailed distributions have mean excess losses (mean residual lives), e(x) that increase to infinity as x approaches infinity.176 For example, for the Pareto the mean excess loss increases linearly. Light-tailed distributions have mean excess losses (mean residual lives) that increase slowly or decrease as x approaches infinity. For example, the Exponential Distribution has a constant mean excess loss, e(x) = θ. Hazard Rate (Force of Mortality):177 The hazard rate / force of mortality is defined as: h(x) = f(x) / S(x). If the force of mortality is large, then the chance of being alive at large ages very quickly goes to zero. If the hazard rate (force of mortality) is large, then the density drops off quickly to zero. Thus if the hazard rate is increasing, the tail is light. Conversely, if the hazard rate decreases as x approaches infinity, then the tail is heavy. The hazard rate for an Exponential Distribution is constant, h(x) = 1/θ. Relation of the Tail to the Exponential
Hazard Rate
Mean Residual Life
Heavier
Decreasing
Increasing
Lighter
Increasing
Decreasing
176 177
Mean Excess Losses (Mean Residual Lives) are discussed further in a subsequent section. Hazard rates are discussed further in a subsequent section.
2013-4-2,
Loss Distributions, §30 Tails
HCM 10/8/12,
Page 390
Heavier vs. Lighter Tails: Heavier or lighter tail is a comparative concept; there are no strict definitions of heavy-tailed and light-tailed: Heavier Tailed
Lighter Tailed
f(x) goes to zero more slowly
f(x) goes to zero more quickly
Few Moments exist
All (positive) moments exist
Larger Coefficient of Variation178
Smaller Coefficient of Variation
Higher Skewness179
Lower Skewness180
e(x) Increases to Infinity181
e(x) goes to a constant182
Decreasing Hazard Rate
Increasing Hazard Rate
178
Very heavy tailed distributions may not even have a (finite) coefficient of variation. Very heavy tailed distributions may not even have a (finite) skewness. 180 Very light tailed distributions may have a negative skewness 181 The faster the mean excess loss increases to infinity the more heavy the tail. 182 For very light-tailed distributions (such as the Weibull with τ > 1) the mean excess loss may go to zero as x approaches infinity. 179
2013-4-2,
Loss Distributions, §30 Tails
HCM 10/8/12,
Page 391
Here is a list of loss distributions, arranged in increasing heaviness of the tail:183
Distribution Normal
Mean Excess Loss Do All Positive (Mean Residual Life) Moments Exist decreases to zero approximately as 1/x Yes
Weibull for τ > 1
decreases to zero less quickly than 1/x
Yes
Trans. Gamma for τ > 1
decreases to zero less quickly than 1/x
Yes
Gamma for α > 1184
decreases to a constant
Yes
Exponential
constant
Yes
Gamma for α < 1
increases to a constant
Yes
Inverse Gaussian
increases to a constant
Yes
Weibull for τ < 1
increases to infinity less than linearly
Yes
Trans. Gamma for τ < 1
increases to infinity less than linearly
Yes
LogNormal
increases to infinity just less than linearly
Yes185
Pareto
increases to infinity linearly
No
Single Parameter Pareto
increases to infinity linearly
No
Burr
increases to infinity linearly
No
Generalized Pareto
increases to infinity linearly
No
Inverse Gamma
increases to infinity linearly
No
Inverse Trans. Gamma
increases to infinity linearly
No
183
The Pareto, Single Parameter Pareto, Burr, Generalized Pareto, Inverse Transformed Gamma and Inverse Gamma all have tails that are not very different. The Gamma and Inverse Gaussian have tails that are not very different. The Weibull and Transformed Gamma have tails that are not very different. 184 The Gamma Distribution with α < 1 is heavier tailed than the Exponential (α = 1). The Gamma Distribution with α > 1 is lighter tailed than the Exponential (α = 1). One way to remember which one is heavier than an Exponential, is that as α → ∞, the Gamma Distribution is a sum of many independent identically distributed Exponentials, which approaches a Normal Distribution. The Normal Distribution is lighter tailed, and therefore so is a Gamma Distribution for α > 1. 185 While the moments exist for the LogNormal, the Moment Generating Function does not.
2013-4-2,
Loss Distributions, §30 Tails
HCM 10/8/12,
Page 392
Comparing Tails: There is an analytic technique one can use to more precisely compare the tails of distributions. One takes the limit as x approaches infinity of the ratios of the densities.186 Exercise: What is the limit as x approaches infinity of the ratio of the density of a Pareto Distribution with parameters α and θ to the density of a Burr Distribution with parameters, α, θ and γ. [Solution: For the Pareto f(x) = (αθα)(θ + x) − (α + 1) For the Burr, (using g to distinguish it from the Pareto), g(x) = αγ(x/θ)γ(1+(x/θ)γ)−(α + 1) /x lim f(x) / g(x) = lim (αθα)(θ + x)−(α + 1) / {αγ(x/θ)γ(1+(x/θ)γ)−(α + 1) /x} = x →∞
x →∞
lim θαx−(α + 1) / {(γxγ−1/θγ)θγ(α + 1)x−γ(α + 1)} = lim θα−γα x α(γ−1) / γ. x →∞
x →∞
For γ >1 the limit is infinity. For γ < 1 the limit is zero. For γ =1 the limit is one; for γ =1, the Burr Distribution is a Pareto.] Let f(x) and g(x) be the two densities, then if: lim f(x) / g(x) = ∞, f has a heavier tail than g. x →∞
lim f(x) / g(x) = 0, f has a lighter tail than g. x →∞
lim f(x) / g(x) = positive constant, f has a similar tail to g. x →∞
Exercise: Compare the tails of Pareto Distribution with parameters α and θ, and a Burr Distribution with parameters, α, θ and γ. [Solution: The comparison depends on the γ, the second shape parameter of the Burr. For γ >1, the Pareto has a heavier tail than the Burr. For γ < 1, the Pareto has a lighter tail than the Burr. For γ = 1, the Burr is equal to the Pareto, thus they have similar, in fact identical tails.]
186
See Loss Models, Section 3.4.2.
2013-4-2,
Loss Distributions, §30 Tails
HCM 10/8/12,
Page 393
Note, Loss Models uses the notation f(x) ~ g(x), x→∞, when lim_{x→∞} f(x)/g(x) = 1.
Two distributions have similar tails if f(x) ~ c g(x), x→∞, for some constant c > 0.
Instead of taking the limit of the ratio of densities, one can equivalently take the limit of the ratio of the survival functions.187
Exercise: What is the limit as x approaches infinity of the ratio of the Survival Function of a Pareto Distribution with parameters α and θ to the Survival Function of a Burr Distribution with parameters α, θ and γ?
[Solution: For the Pareto, S(x) = θ^α (θ + x)^-α = (1 + x/θ)^-α. For the Burr (using T to distinguish it from the Pareto), T(x) = {1 + (x/θ)^γ}^-α.
lim_{x→∞} S(x)/T(x) = lim_{x→∞} [{1 + (x/θ)^γ} / (1 + x/θ)]^α = lim_{x→∞} {(x/θ)^(γ-1)}^α = lim_{x→∞} θ^(α-γα) x^(α(γ-1)).
For γ > 1 the limit is infinity. For γ < 1 the limit is zero. For γ = 1 the limit is one; for γ = 1, the Burr Distribution is a Pareto.]
Therefore the comparison of the tails of the Burr and Pareto depends on the value of γ, the second shape parameter of the Burr. For γ > 1 the Burr has a lighter tail than the Pareto. For γ < 1 the Burr has a heavier tail than the Pareto. For γ = 1, the Burr is equal to the Pareto, thus they have similar, in fact identical, tails.
This makes sense, since for γ > 1, x^γ increases more quickly than x. Thus a Burr with γ = 2 has x² in the denominator of its survival function, where the Pareto only has x. Thus the survival function of a Burr with γ = 2 goes to zero more quickly than that of the Pareto, indicating it is lighter-tailed than the Pareto. The reverse is true if γ = 1/2; then the Burr has √x in the denominator of its survival function, where the Pareto has x.
This same technique can also be used to compare the tails of distributions from the same family.
187
The derivative of the survival function is minus the density. Since as x approaches infinity, S(x) approaches zero, one can apply L'Hospital's Rule. Let the two densities be f and g. Let the two survival functions be S and T. Limit as x approaches infinity of S(x)/T(x) = limit x approaches infinity of S'(x)/T'(x) = limit x approaches infinity of - f(x)/(- g(x)) = limit x approaches infinity of f(x)/g(x).
Exercise: The first Distribution is a Gamma with parameters α and θ. The second Distribution is a Gamma with parameters a and q. Which distribution has the heavier tail?
[Solution: The density of the first Gamma is: f1(x) ~ x^(α-1) exp(-x/θ). The density of the second Gamma is: f2(x) ~ x^(a-1) exp(-x/q). f1(x)/f2(x) ~ x^(α-a) exp[x(1/q - 1/θ)].
If 1/q - 1/θ > 0, then the limit of f1(x)/f2(x) as x approaches infinity is ∞.
If 1/q - 1/θ < 0, then the limit of f1(x)/f2(x) as x approaches infinity is 0.
If 1/q - 1/θ = 0, then the limit of f1(x)/f2(x) as x approaches infinity is ∞ if α > a and 0 if a > α.
Thus we have that:
If θ > q, then the first Gamma is heavier-tailed. If θ < q, then the second Gamma is heavier-tailed.
If θ = q and α > a, then the first Gamma is heavier-tailed. If θ = q and α < a, then the second Gamma is heavier-tailed.
Multiplicative constants such as Γ(α) or θ^-α, which appear in the density, have been ignored since they will not affect whether the limit of the ratio of densities goes to zero or infinity.]
Thus we see that the tails of two Gammas, while not very different, are not precisely similar.188 Whichever Gamma has the larger scale parameter is heavier-tailed. If they have the same scale parameter, whichever Gamma has the smaller shape parameter is heavier-tailed.189
Inverse Gaussian Distribution vs. Gamma Distribution:
The skewness of the Inverse Gaussian Distribution, 3√(µ/θ), is always three times its coefficient of variation, √(µ/θ). In contrast, the Gamma Distribution has its skewness, 2/√α, always twice its coefficient of variation, 1/√α. Thus if a Gamma and an Inverse Gaussian have the same mean and variance, the Inverse Gaussian has the larger skewness, and the heavier tail.
A data set for which a Gamma is a good candidate usually also has an Inverse Gaussian as a good candidate. The fits of the two types of curves differ largely based on the relative magnitude of the skewness of the data set compared to its coefficient of variation. For data sets with less volume, there may be no way statistically to distinguish the fits.
188
Using the precise mathematical definitions in Loss Models. Casualty actuaries rarely use this concept to compare the tails of two Gammas. It would be more common to compare a Gamma to let's say a LogNormal. (A LogNormal Distribution has a significantly heavier-tail than a Gamma Distribution.) 189 If they have the same scale parameters and the same shape parameters, then the two Gammas are identical and have the same tail.
Tails of the Transformed Beta Distribution:
Since many other distributions are special cases of the Transformed Beta Distribution,190 it is useful to know its tail behavior. The density of the Transformed Beta Distribution is:
{Γ(α+τ) / (Γ(τ) Γ(α))} γ θ^(-γτ) x^(γτ-1) {1 + (x/θ)^γ}^-(α+τ).
For large x the density acts as x^(γτ-1) / x^(γ(α+τ)) = 1/x^(γα+1). If we multiply by x^n we get x^(n-γα-1); if we then integrate to infinity, we get a finite answer provided n - γα - 1 < -1. Thus the nth moment exists for n < γα. The larger the product γα, the more moments exist and the lighter the (righthand) tail.
The α shape parameter is that of a Pareto. The γ shape parameter is the power transform by which the Burr is obtained from the Pareto. Their product, γα, determines the (righthand) tail behavior of the Transformed Beta Distribution and its special cases. Provided αγ > 1, the mean excess loss exists and increases to infinity approximately linearly; for large x, e(x) ≈ x / (αγ - 1). This tail behavior carries over to special cases such as: the Burr, Generalized Pareto, Pareto, LogLogistic, ParaLogistic, Inverse Burr, Inverse Pareto, and Inverse ParaLogistic. All have mean excess losses, when they exist, that increase approximately linearly for large x.191
One can examine the behavior of the left hand tail,192 as x approaches zero, in a similar manner. For small x the density acts as x^(γτ-1). If we multiply by x^(-n) we get x^(γτ-1-n); if we then integrate to zero, we get a finite answer provided γτ - 1 - n > -1. Thus the negative nth moment exists for γτ > n. Thus the behavior of the left hand tail is determined by the product of the two shape parameters of the Inverse Burr Distribution.
Thus we see that of the three shape parameters of the Transformed Beta, τ (one more than the power to which x is taken in the Incomplete Beta Function, i.e., the first parameter of the Incomplete Beta Function) affects the left hand tail, α (the shape parameter of the Pareto) affects the righthand tail, and γ (the power transform parameter of the Burr and LogLogistic) affects both tails.
190
See Figure 5.2 in Loss Models. The mean excess loss of a Pareto, when it exists, is linear in x. e(x) = (x+θ)/(α-1). 192 Since casualty actuaries are chiefly concerned with the behavior of loss distributions in the righthand tail, as x approaches infinity, assume that unless specified otherwise, "tail behavior" refers to the behavior in the righthand tail as x approaches infinity. 191
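To see this tail behavior concretely, here is a minimal Python sketch (not from the study guide; it assumes scipy is available, the Burr parameters α = 3, γ = 2, θ = 1000 are arbitrary choices of mine, as are the function names) that checks numerically that the mean excess loss of a Burr grows roughly like x/(αγ - 1):

from scipy.integrate import quad

def burr_sf(x, alpha, theta, gamma):
    """Survival function of the Burr Distribution."""
    return (1.0 / (1.0 + (x / theta)**gamma))**alpha

alpha, theta, gamma = 3.0, 1000.0, 2.0   # alpha*gamma = 6 > 1, so e(x) exists
for x in [5e3, 2e4, 1e5]:
    # e(x) = (integral of S(t) from x to infinity) / S(x)
    tail, _ = quad(burr_sf, x, float("inf"), args=(alpha, theta, gamma))
    e_x = tail / burr_sf(x, alpha, theta, gamma)
    print(x, e_x, x / (alpha * gamma - 1))   # e(x) is close to x/(alpha*gamma - 1) for large x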
An Example of Distributions fit to Hurricane Data:
Hogg & Klugman in Loss Distributions show the results of fitting different distributions to a set of hurricane data.193 The hurricane data set is truncated from below at $5 million and consists of 35 storms with total losses adjusted to 1981 levels of more than $5 million.194 This serves as a good example of the importance of the tails of the distribution to practical applications.
Here are the parameters of different distributions fit by Hogg & Klugman via maximum likelihood, as well as their means, coefficients of variation (when they exist) and their skewnesses (when they exist):

Distribution   Parameters                                    Mean ($ million)   Coeff. of Variation   Skewness
Weibull        θ = 88,588,730, τ = 0.51907                   166                2.12                  6.11
LogNormal      µ = 17.953, σ = 1.6028                        226                3.47                  52.26
Pareto         α = 1.1569, θ = 73,674,000                    470                N.D.                  N.D.
Burr           α = 3.7697, θ = 585,453,983, γ = 0.65994      197                3.75                  N.D.
Gen. Pareto    α = 2.8330, θ = 862,660,000, τ = 0.33292      157                2.79                  N.D.
It is interesting to compare the tails of the different distributions by comparing the estimated probabilities of a storm greater than $1 billion or $5 billion:

Distribution   Probability of storm    Probability of storm    Probability of storm
               > $5 million            > $1 billion            > $5 billion
Weibull        79.86%                  2.9637%                 0.0300%
LogNormal      94.26%                  4.1959%                 0.3142%
Pareto         92.68%                  4.5069%                 0.7475%
Burr           85.28%                  3.5529%                 0.2122%
Gen. Pareto    43.56%                  0.0142%                 0.0002%

Estimated Annual Frequency of Hurricanes Greater than:
Distribution   $1 billion    $5 billion
Weibull        4.0591%       0.0410%
LogNormal      4.8686%       0.3646%
Pareto         5.3185%       0.8821%
Burr           4.5567%       0.2722%
Gen. Pareto    0.0358%       0.0004%
The lighter-tailed Weibull produces a much lower estimate of the chance of a huge hurricane than a heavier-tailed distribution such as the Pareto. The estimate from the LogNormal, which is heavier-tailed than a Weibull, but lighter-tailed than a Pareto, is somewhere in between.
193
Loss Distributions was formerly on the Part 4B exam syllabus. The data is shown in Table 4.1 of Loss Distributions, and Table 13.8 of Loss Models. In millions of dollars, the trended hurricane sizes are: 6.766, 7.123, 10.562, 14.474, 15.351, 16.983, 18.383, 19.030, 25.304, 29.112, 30.146, 33.727, 40.596, 41.409, 47.905, 49.397, 52.600, 59.917, 63.123, 77.809, 102.942, 103.217, 123.680, 140.136, 192.013, 198.446, 227.338, 329.511, 361.200, 421.680, 513.586, 545.778, 750.389, 863.881, 1638.000. 194
There were 35 hurricanes greater than 5 million in constant 1981 dollars observed in 32 years. Thus one could estimate the frequency of such hurricanes as 35/32 = 1.09 per year. Then using the curves fit to the data truncated from below one could estimate the frequency of hurricanes greater than size x as: (1.09) S(x) / S(5 million). For example, for the Pareto Distribution the estimated annual frequency of hurricanes greater than 5 billion in 1981 dollars is: (1.09)(0.7475%)/(92.68%) = 0.8821%. This is a mean return time of: 1/0.8821% = 113 years. The return times estimated using the other curves are much longer:

Distribution   Mean Return Time (Years)
               storm > $1 billion    storm > $5 billion
Weibull        25                    2,438
LogNormal      21                    274
Pareto         19                    113
Burr           22                    367
Gen. Pareto    2,796                 229,068
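The Pareto line of this table can be reproduced with a few lines of Python (a sketch of mine, not part of the study guide; only the fitted parameters above are used):

def pareto_sf(x, alpha, theta):
    """Survival function of the Pareto Distribution."""
    return (theta / (theta + x))**alpha

alpha, theta = 1.1569, 73_674_000     # maximum likelihood fit from the table above
storms_per_year = 35 / 32             # 35 storms > $5 million observed in 32 years

freq_5b = storms_per_year * pareto_sf(5e9, alpha, theta) / pareto_sf(5e6, alpha, theta)
print(freq_5b)        # about 0.0088 hurricanes per year, i.e. 0.88%
print(1 / freq_5b)    # mean return time of roughly 113 years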
It is interesting to note that even the most heavy-tailed of these curves would seem with twenty-twenty hindsight to have underestimated the chance of large hurricanes such as Hurricane Andrew.195 The small amount of data does not allow a good estimate of the extreme tail; the observation of just one very large hurricane would have significantly changed the results. Also due to changing or cyclic weather patterns and the increase in homes near the coast, this may just not be an appropriate technique to apply to this particular problem. The preferred technique currently is to simulate possible hurricanes using meteorological data and estimate the likely damage using exposure data on the location and characteristics of insured homes combined with engineering and physics data.196
195
The losses in Hogg & Klugman are adjusted to 1981 levels. Hurricane Andrew in 8/92 with nearly $16 billion in insured losses, probably exceeded $7 billion dollars in loss in 1981 dollars. It is generally believed that Hurricanes that produce such severe losses have a much shorter average return time than a century. 196 See for example, "A Formal Approach to Catastrophe Risk Assessment in Management", by Karen M. Clark, PCAS 1986, or "Use of Computer Models to Estimate Loss Costs," by Michael A. Walters and Francois Morin, PCAS 1997.
Coefficient of Variation versus Skewness, Two Parameter Distributions:
For the following two-parameter distributions: Pareto, LogNormal, Gamma and Weibull, the Coefficient of Variation and Skewness depend on a single shape parameter. The shape parameters are: α for the Pareto, σ for the LogNormal, α for the Gamma, and τ for the Weibull. Values are tabulated below:

Shape      Pareto              LogNormal                Gamma              Weibull
Param.     C.V.     Skew       C.V.       Skew          C.V.     Skew      C.V.     Skew
0.2        N.A.     N.A.       0.202      0.056         2.236    4.472     15.843   190.1
0.4        N.A.     N.A.       0.417      0.355         1.581    3.162     3.141    11.35
0.6        N.A.     N.A.       0.658      1.207         1.291    2.582     1.758    4.593
0.8        N.A.     N.A.       0.947      3.399         1.118    2.236     1.261    2.815
1          N.A.     N.A.       1.311      9.282         1.000    2.000     1.000    2.000
1.2        N.A.     N.A.       1.795      26.840        0.913    1.826     0.837    1.521
1.4        N.A.     N.A.       2.470      87.219        0.845    1.690     0.724    1.198
1.6        N.A.     N.A.       3.455      331           0.791    1.581     0.640    0.962
1.8        N.A.     N.A.       4.953      1503          0.745    1.491     0.575    0.779
2          N.A.     N.A.       7.321      8208          0.707    1.414     0.523    0.631
2.2        3.317    N.A.       11.201     53948         0.674    1.348     0.480    0.509
2.4        2.449    N.A.       17.786     426061        0.645    1.291     0.444    0.405
2.6        2.082    N.A.       29.354     4036409       0.620    1.240     0.413    0.315
2.8        1.871    N.A.       50.391     4.6e+7        0.598    1.195     0.387    0.237
3          1.732    N.A.       90.012     6.2e+8        0.577    1.155     0.363    0.168
3.2        1.633    25.720     167        1.0e+10       0.559    1.118     0.343    0.106
3.4        1.558    14.117     324        2.0e+11       0.542    1.085     0.325    0.051
3.6        1.500    10.222     652        4.6e+12       0.527    1.054     0.309    0.001
3.8        1.453    8.259      1366       1.3e+14       0.513    1.026     0.294    -0.045
4          1.414    7.071      2981       4.3e+15       0.500    1.000     0.281    -0.087
5          1.291    4.648      268337     2.7e+24       0.447    0.894     0.229    -0.254
6          1.225    3.810      6.6e+7     1.5e+35       0.408    0.816     0.194    -0.373
7          1.183    3.381      4.4e+10    7.6e+47       0.378    0.756     0.168    -0.463
8          1.155    3.118      7.9e+13    3.5e+62       0.354    0.707     0.148    -0.534
9          1.134    2.940      3.9e+17    1.4e+79       0.333    0.667     0.133    -0.591
10         1.118    2.811      5.2e+21    5.2e+97       0.316    0.632     0.120    -0.638
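Entries of this table can be checked directly from the moment formulas; for example, the following Python sketch (mine, not part of the study guide) reproduces the Gamma columns, which depend only on α, and the Weibull columns, which depend only on τ:

from math import gamma as G, sqrt

def gamma_cv_skew(alpha):
    """CV and skewness of the Gamma Distribution, functions of the shape parameter alpha only."""
    return 1 / sqrt(alpha), 2 / sqrt(alpha)

def weibull_cv_skew(tau):
    """CV and skewness of the Weibull Distribution, functions of the shape parameter tau only."""
    m1, m2, m3 = G(1 + 1/tau), G(1 + 2/tau), G(1 + 3/tau)   # moments of a Weibull with theta = 1
    var = m2 - m1**2
    skew = (m3 - 3*m1*m2 + 2*m1**3) / var**1.5
    return sqrt(var) / m1, skew

print(gamma_cv_skew(2))      # (0.707, 1.414), matching the alpha = 2 row
print(weibull_cv_skew(0.6))  # roughly (1.76, 4.59), matching the tau = 0.6 row
print(weibull_cv_skew(4))    # the Weibull skewness is slightly negative for tau = 4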
As mentioned previously, for the Gamma Distribution the skewness is twice the CV:197
[Figure: skewness versus CV for the Gamma Distribution; the branch with α < 1 has CV > 1 and skewness > 2, the branch with α > 1 has CV < 1 and skewness < 2, and the Exponential is marked E at CV = 1, skewness = 2.]
For α > 1 the Gamma Distribution is lighter-tailed than an Exponential, and has CV < 1 and skewness < 2. Conversely, for α < 1 the Gamma Distribution is heavier-tailed than an Exponential, and has CV > 1 and skewness > 2. The Exponential Distribution (α = 1), shown above as E, has CV = 1 and skewness = 2.
For the Pareto Distribution the skewness is more than twice the CV, when they exist. For the Pareto, the CV > 1 and the skewness > 2:
[Figure: skewness versus CV for the Pareto and Gamma Distributions; the Pareto curve lies above the Gamma curve, with the Exponential marked E at CV = 1, skewness = 2.]
197 For the Inverse Gaussian, the Skewness is three times the CV.
As α goes to infinity, the Pareto approaches the Exponential, which has CV = 1 and skewness = 2. As α approaches 3, the skewness approaches infinity.
Here is a similar graph for the LogNormal Distribution versus the Gamma Distribution:
[Figure: skewness versus CV for the LogNormal and Gamma Distributions, with the Exponential marked E; the LogNormal curve lies above the Gamma curve.]
For the LogNormal, as σ approaches zero, the coefficient of variation and skewness each approach zero. For σ = 1, CV = 1.311 and the skewness = 9.282. As σ approaches infinity, both the skewness and CV approach infinity.
Finally, here is a similar graph for the Weibull Distribution versus the Gamma Distribution:
[Figure: skewness versus CV for the Weibull (showing the τ < 1 and τ > 1 branches) and the Gamma Distribution, with the Exponential marked E.]
For τ > 1 the Weibull Distribution is lighter-tailed than an Exponential, and has CV < 1 and skewness < 2. Conversely, for τ < 1 the Weibull Distribution is heavier-tailed than an Exponential, and has CV > 1 and skewness > 2. The Exponential Distribution (τ = 1), shown above as E, has CV = 1 and skewness = 2. The CV is positive by definition. The skewness is positive for curves skewed to the right and negative for curves skewed to the left. The Pareto, LogNormal and Gamma all have positive Skewness. The Weibull has positive skewness for tau < 3.60235 and negative skewness for tau > 3.60235.
Existence of Moment Generating Functions:198
The moment generating function for a continuous loss distribution is given by:199
M(t) = ∫_0^∞ f(x) e^(xt) dx = E[e^(xt)].
For example, for the Gamma Distribution: M(t) = (1 - θt)^-α, for t < 1/θ.
The moments of the function can be obtained as the derivatives of the moment generating function at zero. Thus if the Moment Generating Function exists (within an interval around zero), then so do all the moments. However, the converse is not true.
The moment generating function, when it exists, can be written as a power series in t:200
M(t) = Σ_{n=0}^∞ E[X^n] t^n / n!.
198
See “Mahlerʼs Guide to Aggregate Distributions.” See also Definition 12.2.2 in Actuarial Mathematics or Definition 3.9 in Loss Models. 199 With support from zero to infinity. In general the integral goes over the support of the probability distribution. 200 This is just the usual Taylor Series, substituting in the moments for the derivatives at zero of the Moment Generating Function.
In order for the moment generating function to converge (in an interval around zero), the moments E[X^n] may not grow too quickly as n gets large. This is yet another way to distinguish lighter and heavier tailed distributions. Those with Moment Generating Functions are lighter-tailed than those without Moment Generating Functions. Thus the Weibull for τ > 1, whose m.g.f. exists, is lighter-tailed than the Weibull with τ < 1, whose m.g.f. does not. The Transformed Gamma has the same behavior as the Weibull; for τ > 1 the Moment Generating Function exists and the distribution is lighter-tailed than for τ < 1, for which the Moment Generating Function does not exist. (For τ = 1, one gets a Gamma, for which the Moment Generating Function exists.) The LogNormal Distribution has its moments increase rapidly and thus it does not have a Moment Generating Function. The LogNormal is the heaviest-tailed of those distributions which have all their moments.
Problems:
30.1 (2 points) You are given the following information on three (3) size of loss distributions:
Distribution    Coefficient of Variation    Skewness
I               2                           3
II              1.22                        3.81
III             1                           2
Which of these three loss distributions can not be a Gamma Distribution?
A. I    B. II    C. III    D. I, II, III    E. None of A, B, C, or D
30.2 (1 point) Which of the following distributions always have positive skewness?
1. Weibull    2. Normal    3. Gamma
A. None of 1, 2, or 3    B. 1    C. 2    D. 3    E. None of A, B, C, or D
30.3 (2 points) Which of the following statements is true?
1. For the Pareto Distribution, the standard deviation (when it exists) is always greater than the mean.
2. For the Pareto Distribution, the skewness (when it exists) is always greater than twice the coefficient of variation.
3. For the LogNormal distribution, f(x) goes to zero more quickly as x approaches infinity than for the Transformed Gamma distribution.
Hint: For the Transformed Gamma distribution, f(x) = τ(x/θ)^(τα) exp[-(x/θ)^τ] / {x Γ(α)}.
A. 1, 2    B. 1, 3    C. 2, 3    D. 1, 2, 3    E. None of A, B, C, or D
30.4 (2 points) Rank the tails of the following three distributions, from lightest to heaviest: 1. Weibull with τ = 0.5 and θ = 10. 2. Weibull with τ = 1 and θ = 100. 3. Weibull with τ = 2 and θ = 1000. A. 1, 2, 3
B. 2,1, 3
C. 1, 3, 2
D. 3, 2, 1
E. None of A, B, C or D
30.5 (3 points) Rank the tails of the following three distributions, from lightest to heaviest: 1. Gamma with α = 0.7 and θ = 10. 2. Inverse Gaussian with µ = 3 and θ = 4 . 3. Inverse Gaussian with µ = 5 and θ = 2 . A. 1, 2, 3
B. 2, 1, 3
C. 1, 3, 2
D. 3, 2, 1
E. None of A, B, C or D
30.6 (1 point) Rank the tails of the following three distributions, from lightest to heaviest: 1. Exponential 2. Lognormal 3. Single Parameter Pareto A. 1, 2, 3 B. 2, 1, 3 C. 1, 3, 2
D. 3, 2, 1
E. None of A, B, C or D
30.7 (1 point) The Inverse Exponential Distribution has a righthand tail similar to which of the following distributions? A. Lognormal
B. Pareto α = 1
C. Pareto α = 2
D. Weibull τ < 1
E. Weibull τ > 1
30.8 (3 points) You are given the following: • Claim sizes for Risk A follow a Exponential distribution, with mean 400. • Claim sizes for Risk B follow a Gamma distribution, with parameters θ = 200, α = 2. • r is the ratio of the proportion of Risk B's claims (in number) that exceed d to the proportion of Risk A's claims (in number) that exceed d. Determine the limit of r as d goes to infinity. A. 0 B. 1/2 C. 1 D. 2 E. ∞ 30.9 (4B, 11/92, Q.2) (1 point) Which of the following are true? 1. The random variable X has a lognormal distribution with parameters µ and σ, if Y = ex has a normal distribution with mean µ and standard deviation σ. 2. The lognormal and Pareto distributions are positively skewed. 3. The lognormal distribution generally has greater probability in the tail than the Pareto distribution. A. 1 only B. 2 only C. 1, 3 only D. 2, 3 only E. 1, 2, 3 30.10 (4B, 11/93, Q.21) (1 point) Which of the following statements are true for statistical distributions? 1. Linear combinations of independent normal random variables are also normal. 2. The lognormal distribution is often useful as a model for claim size distribution because it is positively skewed. 3. The Pareto probability density function tapers away to zero much more slowly than the lognormal probability density function. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3
30.11 (4B, 11/99, Q.19) (2 points) You are given the following: • Claim sizes for Risk A follow a Pareto distribution, with parameters θ = 10,000 and α = 2. • Claim sizes for Risk B follow a Burr distribution, F(x) = 1 - {1/(1+(x/θ)γ)}α, with parameters θ = 141.42, α = 2, and γ = 2. • r is the ratio of the proportion of Risk A's claims (in number) that exceed d to the proportion of Risk B's claims (in number) that exceed d. Determine the limit of r as d goes to infinity. A. 0 B. 1 C. 2 D. 4 E. ∞ 30.12 (CAS3, 11/03, Q.16) (2.5 points) Which of the following is/are true, based on the existence of moments test? I. The Loglogistic Distribution has a heavier tail than the Gamma Distribution. ll. The Paralogistic Distribution has a heavier tail than the Lognormal Distribution. Ill. The Inverse Exponential has a heavier tail than the Exponential Distribution. A. I only B. I and II only C. I and Ill only D. II and III only E. I, ll, and Ill
Solutions to Problems:
30.1. E. Distributions I and II canʼt be a Gamma, for which the skewness = twice the coefficient of variation.
30.2. D. The Normal is symmetric, so it has skewness of zero. The Gamma has skewness of 2/√α > 0. The Weibull has either a positive or negative skewness, depending on the value of τ.
30.3. A. 1. True. This is the same as saying the CV > 1 for the Pareto. 2. True.
3. False. For the LogNormal, ln f(x) = -0.5 {(ln(x) − µ)/σ}² - ln(x) - ln(σ) - ln(2π)/2. For very large x this is approximately: -0.5 (ln x)²/σ². For the Transformed Gamma, ln f(x) = ln(τ) + (τα - 1)ln(x) - τα ln(θ) - (x/θ)^τ - ln Γ(α). For very large x this is approximately: -(x/θ)^τ. Thus the log density of the LogNormal goes to minus infinity more slowly than that of the Transformed Gamma. Therefore the density function of the LogNormal goes to zero less quickly as x approaches infinity than that of the Transformed Gamma. The LogNormal has a heavier tail than the Transformed Gamma.
30.4. D. The three survival functions are: S1(x) = exp[-(x/10)^0.5], S2(x) = exp(-x/100), S3(x) = exp[-(x/1000)²].
S1(x)/S2(x) = exp[x/100 - (x/10)^0.5]. The limit as x approaches infinity of S1(x)/S2(x) is ∞, since x increases more quickly than √x. Thus the first Weibull is heavier-tailed than the second. Similarly, the limit as x approaches infinity of S2(x)/S3(x) is ∞, since x² increases more quickly than x. Thus the second Weibull is heavier-tailed than the third.
Alternately, just calculate the densities or log densities for an extremely large value of x, for example 1 billion = 10^9. (The log densities are more convenient to work with; the ordering of the densities and the log densities are the same.) For the Weibull, f(x) = τ(x/θ)^τ exp(-(x/θ)^τ)/x. ln f(x) = ln(τ) + τ ln(x/θ) - ln(x) - (x/θ)^τ.
For the first Weibull, ln f(1 billion) = ln(0.5) + (0.5)ln(100 million) - ln(1 billion) - √(100 million) ≅ -10,000.
For the second Weibull, ln f(1 billion) = ln(1) + (1)ln(10 million) - ln(1 billion) - 10 million ≅ -10,000,000.
For the third Weibull, ln f(1 billion) = ln(2) + (2)ln(1 million) - ln(1 billion) - (1 million)² ≅ -1,000,000,000,000.
Thus f(1 billion) is much larger for the first Weibull than the second Weibull, while f(1 billion) is much larger for the second Weibull than the third Weibull. Thus the third Weibull has the lightest tail, while the first Weibull has the heaviest tail.
Comment: For the Weibull, the smaller the shape parameter τ, the heavier the tail. The values of the scale parameter θ have no effect on the heaviness of the tail. However, by changing the scale, the third Weibull with θ = 1000 does take longer before its density falls below the others than if it instead had θ = 1. The (right) tail behavior refers to the behavior as x approaches infinity, thus how long it takes the density to get smaller does not affect which has a lighter tail. While the third Weibull might be lighter-tailed, for some practical applications with a maximum covered loss you may be uninterested in the large values of x at which its density is smaller than the others.
30.5. B. The three density functions are:
f1(x) = 10^(-0.7) x^(-0.3) exp(-x/10) / Γ(0.7),
f2(x) = √(4/(2πx³)) exp[-4(x/3 - 1)²/(2x)] = √(2/π) x^(-1.5) exp(-2x/9 + 4/3 - 2/x),
f3(x) = √(2/(2πx³)) exp[-2(x/5 - 1)²/(2x)] = (1/√π) x^(-1.5) exp(-x/25 + 2/5 - 1/x).
We will take the limit as x approaches infinity of the ratios of these densities, ignoring any annoying multiplicative constants such as 10^(-0.7)/Γ(0.7) or √(2/π) e^(4/3).
f1 (x)/f2 (x) ~ x-0.3 exp(-x/10) / x-1.5 exp(-2x/9 - 2/x) = x1.2 exp(0.122x + 2/x). The limit as x approaches infinity of f1 (x)/f2 (x) is ∞. Thus the Gamma is heavier-tailed than the first Inverse Gaussian with µ = 3 and θ = 4. f1 (x)/f3 (x) ~ x-0.3 exp(-x/10) / x-1.5 exp(-x/25 - 1/x) = x1.2 exp(-0.06x + 1/x ). The limit as x approaches infinity of f1 (x)/f3 (x) is 0, since exp(-0.06x) goes to zero very quickly. Thus the Gamma is lighter-tailed than the second Inverse Gaussian with µ = 5 and θ = 2. Thus the second Inverse Gaussian has the heaviest tail, followed by the Gamma, followed by the first Inverse Gaussian. Comment: In general the Inverse Gaussian and the Gamma have somewhat similar tails; they both have their mean residual lives go to a constant as x approaches infinity. Which is heavier-tailed depends on the particular parameters of the distributions. Let's assume we have a Gamma with shape parameter α and scale parameter β ( using beta rather than using theta which is also a parameter of the Inverse Gaussian,) and Inverse Gaussian with parameters µ and θ. Then the density of the Gamma f1 (x) ~ xα-1 exp(-x/β) Then the density of the Inverse Gaussian f2 (x) ~ x-1.5 exp[-xθ/(2µ2) - θ/(2x)]. f1 (x)/f2 (x) ~ xα+.5 exp[x(θ/(2µ2) - 1/β) + θ/ 2x]. If θ/ 2µ2 > 1/β, then the limit as x approaches infinity of f1 (x)/f2 (x) is ∞, and the Gamma is heavier-tailed than the Inverse Gaussian. If θ/ 2µ2 < 1/β, then the limit as x approaches infinity of f1 (x)/f2 (x) is 0, and the Gamma is lighter-tailed than the Inverse Gaussian. If θ/ 2µ2 = 1/β, then f1 (x)/f2 (x) ~ xα+.5 exp[θ/(2x)], and the limit as x approaches infinity of f1 (x)/f2 (x) is ∞, and the Gamma is heavier-tailed than the Inverse Gaussian. 30.6. A. The Single Parameter Pareto does not have all of its moments, and thus is heavier tailed than the other two. The Lognormal has an increasing mean excess loss, while that for the Exponential is constant. Thus the Lognormal is heavier tailed than the Exponential. Comment: The Single Parameter Pareto has a tail similar to that of the Pareto.
30.7. B. The Inverse Exponential does not have a mean and neither does the Pareto for α = 1. More specifically, the density of the Inverse Exponential is: θ e^(−θ/x)/x², which is approximately θ/x² for large x, while the density of the Pareto for α = 1 is: θ/(x+θ)², which is also approximately θ/x² for large x.
Comment: The Inverse Gamma Distribution has a tail similar to that of the Pareto Distribution with the same shape parameter α. The Inverse Exponential is the Inverse Gamma for α = 1.
30.8. A. SA(d) = e^(-d/400). fB(x) = x e^(-x/200)/40,000.
SB(d) = ∫_d^∞ x e^(-x/200)/40,000 dx = (1/40,000) [-200x e^(-x/200) - 40,000 e^(-x/200)] evaluated from x = d to x = ∞
= (1 + d/200) e^(-d/200).
r = SB(d) / SA(d) = (1 + d/200)/ e0.0025d. As d goes to infinity the denominator increases faster than the numerator; thus as d goes to infinity, r goes to zero. Comment: Similar to 4B, 11/99, Q.19. 30.9. B. 1. False. Ln(X) has a Normal distribution if X has a LogNormal distribution. 2. True. The skewness of the Pareto does not exist for α ≤ 3. For α > 3, the Pareto skewness is: 2{(α+1)/(α−3)} (α - 2) / α > 0. LogNormal Skewness = ( exp(3σ2) - 3 exp(σ2) + 2 ) / (exp(σ2) -1)1.5. The denominator is positive, since exp(σ2) > 1 for σ2 > 0. The numerator is positive since it can be written as: y 3 - 3 y + 2, for y = exp(σ2) > 1. (The derivative of y3 - 3 y + 2 is 3y2 - 3 , which is positive for y >1. At y =1, y3 - 3 y + 2 is zero, thus for y >1 it is positive.) Since the numerator and denominator are both positive, so is the skewness. 3. False. The Pareto is heavier-tailed than the Lognormal distribution. This can be seen by a comparison of the mean residual lives. That of the lognormal increases less than linearly, while the mean residual life of the Pareto increases linearly. Another way to see this is that all of the moments of the LogNormal distribution exist, while higher moments of the Pareto distribution do not exist. Comments: The LogNormal and the Pareto distributions are both heavy-tailed and heavy-tailed distributions have positive skewness, (are skewed to the right.) These are statements that practicing actuaries should know. 30.10. E. 1. True. 2. True. 3. True. Comment: Statement 3 is another way of saying that the Pareto has a heavier tail than the LogNormal.
30.11. E. SA(d) = (10000/(10000+d))2 . SB(d) = (1/(1+(d/141.42)2 ))2 =(20000/(20000+d2 ))2 . r = SA(d) / SB(d) = {(20000+d2 )/ 2(10000+d)}2 . As d goes to infinity the numerator increases faster than the denominator; thus as d goes to infinity, r goes to infinity. Comment: For γ > 1, the Burr Distribution has a lighter tail than the Pareto Distribution, while for γ < 1, the Burr Distribution would have a heavier tail than the Pareto Distribution with the same α. 30.12. E. I. The Loglogistic does not have all its moments, while the Gamma does.
⇒ The Loglogistic Distribution has a heavier tail than the Gamma Distribution. II. The Paralogistic does not have all its moments, while the Lognormal does.
⇒ The Paralogistic Distribution has a heavier tail than the Lognormal Distribution. III. The Inverse Exponential does not have all its moments, while the Exponential does.
⇒ The Inverse Exponential Distribution has a heavier tail than the Exponential Distribution.
Section 31, Limited Expected Values
As discussed previously, the Limited Expected Value E[X ∧ x] is the average size of loss with all losses limited to a given maximum size x. Thus the Limited Expected Value, E[X ∧ x], is the mean of the data censored from above at x.
The Limited Expected Value is closely related to other important quantities: the Loss Elimination Ratio, the Mean Excess Loss, and the Excess (Pure Premium) Ratio. The Limited Expected Value can be used to price Increased Limit Factors. The ratio of losses expected for an increased limit L, compared to a basic limit B, is E[X ∧ L] / E[X ∧ B].
The Limited Expected Value is generally the sum of two pieces. Each loss of size less than or equal to u contributes its own size, while each loss greater than u contributes just u to the average.
For a discrete distribution: E[X ∧ u] = Σ_{xi ≤ u} xi Prob[X = xi] + u Prob[X > u].
For a continuous distribution: E[X ∧ u] = ∫_0^u t f(t) dt + u S(u).
Rather than calculating this integral, make use of Appendix A of Loss Models, which has formulas for the limited expected value for each distribution.201
For example, the formula for the Limited Expected Value of the Pareto is:202
E[X ∧ x] = {θ/(α - 1)} {1 - (θ/(θ + x))^(α-1)}, α ≠ 1.
Exercise: For a Pareto with α = 4 and θ = 1000, compute E[X], E[X ∧ 500] and E[X ∧ 5000].
[Solution: E[X] = θ/(α - 1) = 1000/3 = 333.33. E[X ∧ x] = (1000/3) {1 - (1000/(1000 + x))³}.
E[X ∧ 500] = 234.57. E[X ∧ 5000] = 331.79.]
201
In some cases the formula for the Limited Expected Value (Limited Expected First Moment) is not given. In those cases, one takes k = 1, in the formula for the Limited Expected Moments. 202 For α =1, E[X ∧ x] = -θ ln(θ/(θ+x)).
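The Pareto limited expected values in the exercise above are easy to reproduce; here is a minimal Python sketch (mine, not from the study guide):

def pareto_lev(x, alpha, theta):
    """E[X ∧ x] for the Pareto Distribution (alpha != 1), per Appendix A of Loss Models."""
    return theta / (alpha - 1) * (1 - (theta / (theta + x))**(alpha - 1))

alpha, theta = 4, 1000
print(pareto_lev(500, alpha, theta))    # 234.57
print(pareto_lev(5000, alpha, theta))   # 331.79
print(theta / (alpha - 1))              # the mean, E[X ∧ infinity] = 333.33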
Here are the formulas for the limited expected value for some distributions:

Distribution               Limited Expected Value, E[X ∧ x]
Exponential                θ (1 - e^(-x/θ))
Pareto                     {θ/(α - 1)} {1 - (θ/(θ + x))^(α-1)}, α ≠ 1
LogNormal                  exp(µ + σ²/2) Φ[(ln(x) - µ - σ²)/σ] + x {1 - Φ[(ln(x) - µ)/σ]}
Gamma                      α θ Γ[α+1; x/θ] + x {1 - Γ[α; x/θ]}
Weibull                    θ Γ(1 + 1/τ) Γ[1 + 1/τ; (x/θ)^τ] + x exp(-(x/θ)^τ)
Single Parameter Pareto    αθ/(α - 1) - θ^α / {(α - 1) x^(α-1)} = θ {α - (θ/x)^(α-1)} / (α - 1), x ≥ θ

Exercise: For a LogNormal Distribution with µ = 9.28 and σ = 0.916, determine E[X ∧ 25000].
[Solution: E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 25000] = exp(9.6995) Φ[0.01] + 25000 {1 - Φ[0.92]} = (16310)(0.5040) + (25000)(1 - 0.8212) = 12,705.]
Relationship to the LER, Excess Ratio, and Mean Excess Loss:
The following relationships hold between the mean, the Limited Expected Value E[X ∧ x], the Excess Ratio R(x), the Mean Excess Loss e(x), and the Loss Elimination Ratio LER(x):
mean = E[X ∧ ∞].
e(x) = {mean - E[X ∧ x]} / S(x).
R(x) = 1 - {E[X ∧ x] / mean} = 1 - LER(x).
R(x) = e(x) S(x) / mean.
LER(x) = E[X ∧ x] / mean.
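The LogNormal limited expected value is also straightforward to compute; the following Python sketch (mine, assuming scipy is available) reproduces the exercise above:

from math import exp, log
from scipy.stats import norm

def lognormal_lev(x, mu, sigma):
    """E[X ∧ x] for the LogNormal Distribution, per Appendix A of Loss Models."""
    return (exp(mu + sigma**2 / 2) * norm.cdf((log(x) - mu - sigma**2) / sigma)
            + x * norm.sf((log(x) - mu) / sigma))

print(lognormal_lev(25000, 9.28, 0.916))
# about 12,650; the 12,705 in the text reflects rounded values of the Normal table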
Layer Average Severity:
The Limited Expected Value can be useful when dealing with layers of loss. For example, suppose we are estimating the expected value (per loss) of the layer of loss greater than $1 million but less than $5 million.203 Note here we are taking the average over all losses, including those that are too small to contribute to the layer. This Layer Average Severity is equal to the Expected Value Limited to $5 million minus the Expected Value Limited to $1 million.
Layer Average Severity = E[X ∧ top of Layer] - E[X ∧ bottom of Layer].
The Layer Average Severity is the insurerʼs average payment per loss to an insured, when there is a deductible of size equal to the bottom of the layer and a maximum covered loss equal to the top of the layer. Loss Models refers to this as the expected payment per loss.
expected payment per loss = average amount paid per loss = E[X ∧ Maximum Covered Loss] - E[X ∧ Deductible Amount].
Exercise: Losses follow a Pareto with α = 4 and θ = 1000. There is a deductible of 500 and a maximum covered loss of 5000. What is the insurerʼs average payment per loss?
[Solution: From previous solutions: E[X ∧ 500] = 234.57. E[X ∧ 5000] = 331.79.
The Layer Average Severity = E[X ∧ 5000] - E[X ∧ 500] = 331.79 - 234.57 ≈ 97.20.]
Each small loss, x ≤ d, contributes nothing to a layer from d to u. Each medium size loss, d < x ≤ u, contributes x - d to a layer from d to u. Each large loss, u < x, contributes u - d to a layer from d to u.
Therefore, Layer Average Severity = ∫_d^u (t − d) f(t) dt + S(u) (u - d).
Average Non-zero Payment: Besides the average amount paid per loss to the insured, one can also calculate the average amount paid per non-zero payment by the insurer. Loss Models refers to this as the expected payment per payment.204 With a deductible, there are many instances where the insured suffers a small loss, but the insurer makes no payment. Therefore, if the denominator only includes those situations where the insurer makes a non-zero payment, the average will be bigger. The average payment per payment is greater than or equal to the average payment per loss. 203 204
This might be useful for pricing a reinsurance contract. See pages 180-183 of Loss Models.
Exercise: Losses follow a Pareto with α = 4 and θ = 1000. There is a deductible of 500, and a maximum covered loss of 5000. What is the average payment per non-zero payment by the insurer? [Solution: From the previous solution the average payment per loss to the insured is 97.20. However, the insurer only makes a payment S(500) = 19.75% of the time the insured has a loss. Thus the average per non-zero payment by the insurer is: 97.20 / 0.1975 = 492.08.] expected payment per payment =
{E[X ∧ Maximum Covered Loss] - E[X ∧ Deductible]} / S(Deductible).
Coinsurance: Sometimes, the insurer will only pay a percentage of the amount it would otherwise pay.205 As discussed previously, this is referred to as a coinsurance clause. For example with a 90% coinsurance factor, after the application of any maximum covered loss and/or deductible, the insurer would only pay 90% of what it would pay in the absence of the coinsurance clause. Exercise: Losses follow a Pareto with α = 4 and θ = 1000. There is a deductible of 500, a maximum covered loss of 5000, and a coinsurance factor of 80%. What is the insurerʼs average payment per loss to the insured? [Solution: From a previous solution in the absence of the coinsurance factor, the average payment is 97.20. With the coinsurance clause each payment is multiplied by 0.8, so the average is: (0.8)(97.20) = 77.76.] In general each payment is multiplied by the coinsurance factor, thus so is the average. This is just a special case of multiplying a variable by a constant. The nth moment is multiplied by the constant to the nth power. The variance is therefore multiplied by the square of the coinsurance factor. Exercise: Losses follow a Pareto with α = 4 and θ = 1000. There is a deductible of 500, a maximum covered loss of 5000, and a coinsurance factor of 80%. What is the insurerʼs average payment per non-zero payment by the insurer? [Solution: From a previous solution the average payment per loss to the insured is 77.76. However, the insurer only makes a payment S(500) = 19.75% of the time the insured has a loss. Thus the average per non-zero payment by the insurer is: 77.76 / 0.1975 = 393.66.]
205
For example, coinsurance clauses are sometimes used in Health Insurance, Homeowners Insurance, or Reinsurance.
Formulas for Average Payments:
These are the general formulas:206
Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, the average payment per (non-zero) payment by the insurer is:
c {E[X ∧ u] - E[X ∧ d]} / S(d), which reduces to c e(d) when there is no maximum covered loss (u = ∞).
Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, the insurerʼs average payment per loss to the insured is:
c (E[X ∧ u] - E[X ∧ d]).
The insurerʼs average payment per loss to the insured is the Layer of Loss between the Deductible Amount d and the Maximum Covered Loss u, E[X ∧ u] - E[X ∧ d], all multiplied by the coinsurance factor. The average per non-zero payment by the insurer is the insurerʼs average payment per loss to the insured divided by the ratio of the number of non-zero payments by the insurer to the number of losses by the insured, S(d).
Limited Expected Value as an Integral of the Survival Function:
The Limited Expected Value can be written as an integral of the Survival Function, S(x) = 1 - F(x):
E[X ∧ x] = ∫_0^x t f(t) dt + x S(x).
Using integration by parts and the fact that an antiderivative of f(x) is -S(x):207
E[X ∧ x] = {-x S(x) + ∫_0^x S(t) dt} + x S(x) = ∫_0^x S(t) dt.
Thus the Limited Expected Value can be written as an integral of the Survival Function from 0 to the limit, for a distribution with support starting at zero.208
206
See Theorem 8.7 in Loss Models. More general formulas that include the effects of inflation will be discussed in a subsequent section. 207 Note that the derivative of S(x) is dS(x) /dx = d(1-F(x) / dx = - f(x). Thus an indefinite integral of f(x) is -S(x) = F(x) - 1. (There is always an arbitrary constant in an indefinite integral.) 208 Thus this formula does not apply to the Single Parameter Pareto Distribution. For the Single Parameter Pareto Distribution with support starting at θ, E[X ∧ x] = θ + integral from θ to x of S(t). More generally, E[X ∧ x] is the sum of the integral from -∞ to 0 of -F(t) and the integral from 0 to x of S(t). See Equation 3.9 in Loss Models.
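Here is a minimal Python sketch of the average-payment formulas (mine, not part of the study guide), applied to the running Pareto example with α = 4 and θ = 1000; the output matches the values worked out above up to rounding:

def pareto_lev(x, alpha, theta):
    return theta / (alpha - 1) * (1 - (theta / (theta + x))**(alpha - 1))

def pareto_sf(x, alpha, theta):
    return (theta / (theta + x))**alpha

def avg_payments(d, u, c, lev, sf):
    """Average payment per loss and per (non-zero) payment,
    given deductible d, maximum covered loss u, and coinsurance factor c."""
    per_loss = c * (lev(u) - lev(d))
    per_payment = per_loss / sf(d)
    return per_loss, per_payment

lev = lambda x: pareto_lev(x, 4, 1000)
sf = lambda x: pareto_sf(x, 4, 1000)
print(avg_payments(500, 5000, 1.0, lev, sf))   # about (97.2, 492)
print(avg_payments(500, 5000, 0.8, lev, sf))   # about (77.8, 394) with an 80% coinsurance factor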
E[X ∧ x] = ∫_0^x S(t) dt.
Since the mean is E[X ∧ ∞], it follows that the mean can be written as an integral of the Survival Function from 0 to infinity,209 for a distribution with support starting at zero.210
E[X] = ∫_0^∞ S(t) dt.
The losses in the layer from a to b are given as a difference of Limited Expected Values:
E[X ∧ b] - E[X ∧ a] = ∫_a^b S(t) dt.
Thus the Losses in a Layer can be written as an integral of the Survival Function from the bottom of the Layer to the top of the Layer.211
Expected Amount by Which Aggregate Claims are Less than a Given Value:
The amount by which X is less than y is defined as (y - X)+: 0 if X > y and y - X if X ≤ y.
For example, (10 - 2)+ = 8, while (10 - 15)+ = 0.

x     10 - X    (10 - X)+    X ∧ 10    (10 - X)+ + (X ∧ 10)
2     8         8            2         10
7     3         3            7         10
15    -5        0            10        10
So we see that (10 - x)+ + (x ∧ 10) = 10, regardless of x.
209
See formula 3.5.2 in Actuarial Mathematics. Do not apply this formula to a Single Parameter Pareto Distribution. For a continuous distribution with support on (a, b), the mean is: a + the integral from a to b of S(x). For the Single Parameter Pareto Distribution with support (θ, ∞), E[X] = θ + integral from θ to ∞ of S(x). 211 These are the key ideas behind Lee Diagrams, discussed subsequently. 210
In general, (y - X)+ + (X ∧ y) = y. ⇒ E[(y - X)+] + E[X ∧ y] = y. ⇒ E[(y - X)+] = y - E[X ∧ y].
More generally, the expected amount by which losses are less than y is:
E[(y - X)+] = ∫_0^y (y − x) f(x) dx = y ∫_0^y f(x) dx - ∫_0^y x f(x) dx = y F(y) - {E[X ∧ y] - y S(y)} = y - E[X ∧ y].
Therefore, the expected amount by which losses are less than y is: E[(y - X)+] = y - E[X ∧ y].
This can also be seen via a Lee Diagram, as discussed in a subsequent section.
The expected amount by which aggregate losses are less than a given amount is sometimes called the “savings.”212
For example, assume policyholder dividends are 1/3 of the amount by which that policyholderʼs aggregate annual claims are less than 1000. Let L be aggregate annual claims. Then:
Policyholder Dividend = (1000 - L)/3 if L < 1000, and 0 if L ≥ 1000.
Then the expected policyholder dividend is one third times the average amount by which aggregate claims are less than 1000. Therefore, the expected dividend is: (1000 - E[L ∧ 1000])/3.
Exercise: The aggregate annual claims for a policyholder follow the following discrete distribution: Prob[X = 200] = 30%, Prob[X = 500] = 40%, Prob[X = 2000] = 20%, Prob[X = 5000] = 10%. Policyholder dividends are 1/4 of the amount by which that policyholderʼs aggregate annual claims are less than 1000 (no dividend is paid if annual claims exceed 1000). Determine the expected policyholder dividend.
[Solution: E[X ∧ 1000] = (0.3)(200) + (0.4)(500) + (0.3)(1000) = 560. Therefore, the expected amount by which aggregate annual claims are less than 1000 is: 1000 - E[X ∧ 1000] = 1000 - 560 = 440. Expected policyholder dividend is: 440/4 = 110.
Alternately, if aggregate claims are 200, then the dividend is: (1000 - 200)/4 = 200. If aggregate claims are 500, then the dividend is: (1000 - 500)/4 = 125. If aggregate claims are 2000 or 5000, then no dividend is paid. Expected dividend (including those cases where no dividend is paid) is: (0.3)(200) + (0.4)(125) = 110.]
212
Insurance Savings as used in Retrospective Rating is discussed for example in Gillam and Snader, “Fundamentals of Individual Risk Rating.”
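Both routes to the answer of 110 in the exercise above can be checked with a short Python sketch (mine, not from the study guide):

# Expected policyholder dividend of (1/4)(1000 - L)+ for the discrete aggregate distribution above.
claims = {200: 0.30, 500: 0.40, 2000: 0.20, 5000: 0.10}

lev_1000 = sum(min(l, 1000) * p for l, p in claims.items())   # E[L ∧ 1000] = 560
expected_dividend = (1000 - lev_1000) / 4                     # (1000 - 560)/4 = 110

# Direct check, averaging the dividend (1000 - L)+ / 4 over the distribution:
direct = sum(max(1000 - l, 0) / 4 * p for l, p in claims.items())
print(lev_1000, expected_dividend, direct)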
Use the formula E[(y - X)+] = y - E[X ∧ y] if the distribution is continuous rather than discrete.
Exercise: Assume aggregate annual claims for a policyholder are LogNormally distributed, with µ = 4 and σ = 2.5.213 Policyholder dividends are 1/3 of the amount by which that policyholderʼs aggregate annual claims are less than 1000. No dividend is paid if annual claims exceed 1000. What are the expected policyholder dividends?
[Solution: For the LogNormal distribution, E[X ∧ x] = exp(µ + σ²/2) Φ[(ln x − µ − σ²)/σ] + x {1 - Φ[(ln x − µ)/σ]}.
E[X ∧ 1000] = exp(7.125) Φ(-1.337) + (1000){1 - Φ(1.163)} = (1242.6)(0.0906) + (1000)(1 - 0.8776) = 235.
Therefore, the expected dividend is (1000 - E[X ∧ 1000])/3 = 255.]
Sometimes, the dividend or bonus is stated in terms of the loss ratio, which is losses divided by premiums. In this case, the same technique can be used to determine the average dividend or bonus.
Exercise: An insurance agent will receive a bonus if his loss ratio is less than 75%. The agent will receive a percentage of earned premium equal to 1/5 of the difference between 75% and his loss ratio. The agent receives no bonus if his loss ratio is greater than 75%. His earned premium is 10 million. His incurred losses are distributed according to a Pareto distribution with α = 2.5 and θ = 12 million. Calculate the expected value of his bonus.
[Solution: A loss ratio of 75% corresponds to (0.75)(10 million) = $7.5 million in losses. If his losses are L, his loss ratio is L/10 million. If L < 7.5 million, his bonus is: (1/5)(0.75 - L/10 million)(10 million) = (1/5)(7.5 million - L). Therefore, his bonus is 1/5 the amount by which his losses are less than $7.5 million.
For the Pareto distribution, E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))^(α-1)}.
Therefore, E[X ∧ 7.5 million] = (12 million/1.5){1 - (12/19.5)^1.5} = 4.138 million.
Therefore, the expected bonus is: (1/5){7.5 million - 4.138 million} = 672 thousand.]
E[(X-d)+] versus E[(d - X)+]:
E[(X-d)+] = E[X] - E[X ∧ d], is the expected losses excess of d.
E[(d-X)+] = d - E[X ∧ d], is the expected amount by which losses are less than d.
Therefore, E[(X-d)+] - E[(d-X)+] = E[X] - d = E[X - d].
213
Note that we are applying the mathematical concept of a limited expected value to the distribution of aggregate losses in the same manner as was done to a distribution of sizes of loss. Aggregate distributions are discussed further in “Mahlerʼs Guide to Aggregate Distributions.”
In fact, (X-d)+ - (d-X)+ = (x - d, if x ≥ d) - (d - x, if x < d) = X - d.
Exercise: For a Poisson distribution, determine E[(N-1)+].
[Solution: E[N ∧ 1] = 0 f(0) + 1 Prob[N ≥ 1] = Prob[N ≥ 1].
E[(N-1)+] = E[N] - E[N ∧ 1] = λ - Prob[N ≥ 1] = λ + e^−λ - 1.
Alternately, E[(N-1)+] = E[(1-N)+] + E[N] - 1 = Prob[N = 0] + λ - 1 = e^−λ + λ - 1.]
Exercise: In baseball a team bats in an inning until it makes 3 outs. In the fifth inning of today's game, each batter for the Bad News Bears has a 45% chance of walking and a 55% chance of striking out, independent of any other batter. What is the expected number of runs the Bad News Bears will score in the fifth inning? (If there are three men on base, a walk forces in a run. Assume no wild pitches, passed balls, etc. Assume nobody steals any bases, is picked off base, etc.)
[Solution: Treat a walk as a failure for the defense, and striking out as a success for the defense. An inning ends when there are three successes. The number of walks (failures) is Negative Binomial with r = 3 and β = (chance of failure)/(chance of success) = 0.45/0.55 = 9/11.
f(0) = 1/(1 + β)^r = (11/20)³ = 0.1664. f(1) = rβ/(1 + β)^(r+1) = (3)(9/11)(11/20)⁴ = 0.2246.
f(2) = {(r)(r+1)/2} β²/(1 + β)^(r+2) = (3)(4/2)(9/11)²(11/20)⁵ = 0.2021.
E[N ∧ 3] = 0 f(0) + 1 f(1) + 2 f(2) + 3{1 - f(0) - f(1) - f(2)} = 0.2246 + (2)(0.2021) + (3)(0.4069) = 1.8495.
If there are 3 or fewer walks in the inning, they score no runs. With a total of 4 walks they score 1 run, with a total of 5 walks they score 2 runs, etc. Number of runs scored = (N - 3)+.
Expected number of runs scored = E[(N - 3)+] = E[N] - E[N ∧ 3] = (3)(9/11) - 1.8495 = 0.605.]
Average Size of Losses in an Interval:
As discussed previously, the Limited Expected Value is generally the sum of two pieces. Each loss of size less than x contributes its own size, while each loss greater than or equal to x contributes just x to the average:
E[X ∧ x] = ∫_0^x y f(y) dy + x S(x).
This formula can be rewritten to put the integral in terms of the limited expected value E[X ∧ x] and the survival function S(x), both of which are given in Appendix A of Loss Models:
∫_0^x y f(y) dy = E[X ∧ x] - x S(x).
This integral represents the dollars of loss on losses of size 0 to x. Dividing by the probability of such claims, F(x), would give the average size of such losses. Dividing instead by the mean would give the percentage of losses represented by those losses.
The dollars of loss represented by the losses in an interval from a to b is just the difference of two integrals of the type we have been discussing:
∫_0^b y f(y) dy - ∫_0^a y f(y) dy = {E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}.
Dividing by F(b) - F(a) would give the average size of loss for losses in this interval.
Average Size of Losses in the Interval [a, b] = [{E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}] / {F(b) - F(a)}.
Exercise: For a LogNormal Distribution, with parameters µ = 8 and σ = 3, what is the average size of those losses with sizes between $1 million and $5 million? [Solution: For the LogNormal: F(5 million) = Φ[{ln(5 million)−µ} / σ] = Φ[{ln(5 million)-8} / 3] = Φ[(15.425-8)/3] = Φ[2.475] = 0.9933. F(1 million) = Φ[{ln(1 million)-8} / 3] = Φ[1.939] = 0.9737. E[X ∧ 5 million] = exp(µ + σ2/2)Φ[(ln(5 mil) - µ - σ2)/σ] + (5 mil){1 − Φ[(ln(5 mil) - µ)/σ]} = exp(µ + σ2/2)Φ[(ln(5 mil) - µ - σ2)/σ] + (5 mil) {1 - Φ[(ln(5 mil)- µ)/σ]} = (268,337)Φ[-0.525] +(5,000,000)(1- Φ[2.475]) = (268,337)(0.2998) +(5,000,000)(0.0067) = 113,679. E[X ∧ 1 million] = exp(µ + σ2/2)Φ[(ln(1 mil) - µ - σ2)/σ] + (1 mil) {1 - Φ[(ln(1 mil)- µ)/σ]} = (268,337)Φ[-1.061] +(1,000,000)(1- Φ[1.939]) = (268,337)(0.1444) +(1,000,000)(0.0263) = 65,048. Thus, the average size of loss for those losses of size between $1 million and $5 million is: ({E[X ∧ 5m ] - (5m) S(5m)} - {E[X ∧ 1m] - (1m)S(1m)}) / {F(5m) - F(1m)} = ({113,679 - (5,000,000)(0.0067)} - {65,048 - (1,000,000)(0.0263)}) / (0.9933 -0.9737) = 41,700/0.0196 = $2.13 million. Comment: Note that the average size of loss is not at the midpoint of the interval, which is $3 million. In the case of the LogNormal, E[X ∧ x ] - xS(x) = exp(µ + σ2/2)Φ[(ln(x)− µ − σ2)/σ]. Thus, the mean loss size for the interval a to b is: exp(µ + σ2/2){Φ[(lnb - µ - σ2)/σ]-Φ[(lna - µ - σ2)/σ]} / {Φ[(lnb - µ)/σ] -Φ[(lna - µ)/σ]}, which would have saved some computation in this case. ]
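The computation in the exercise above can be reproduced with a short Python sketch (mine, assuming scipy is available); the result agrees with the text up to rounding:

from math import exp, log
from scipy.stats import norm

mu, sigma = 8.0, 3.0

def lev(x):   # E[X ∧ x] for the LogNormal
    return (exp(mu + sigma**2 / 2) * norm.cdf((log(x) - mu - sigma**2) / sigma)
            + x * norm.sf((log(x) - mu) / sigma))

def F(x):     # distribution function of the LogNormal
    return norm.cdf((log(x) - mu) / sigma)

a, b = 1e6, 5e6
dollars = (lev(b) - b * (1 - F(b))) - (lev(a) - a * (1 - F(a)))
avg_size = dollars / (F(b) - F(a))
share = dollars / exp(mu + sigma**2 / 2)
print(avg_size, share)   # roughly $2.13 million, and about 15.5% of total losses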
Dividing instead by the mean would give the percentage of dollars of total losses represented by those claims.
Proportion of Total Losses from Losses in the Interval [a, b] = [{E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}] / E[X].
Exercise: For a LogNormal Distribution, with parameters µ = 8 and σ = 3, what percentage of the total losses are from those losses with sizes between $1 million and $5 million?
[Solution: E[X] = exp(µ + σ²/2) = e^12.5 = 268,337. From a previous solution, {E[X ∧ 5m] - (5m) S(5m)} - {E[X ∧ 1m] - (1m) S(1m)} = 41,700. 41,700/268,337 = 15.5%.
Comment: In the case of the LogNormal, the percentage of losses from losses of size a to b = exp(µ + σ²/2){Φ[(ln b - µ - σ²)/σ] - Φ[(ln a - µ - σ²)/σ]} / exp(µ + σ²/2) = Φ[(ln b - µ - σ²)/σ] - Φ[(ln a - µ - σ²)/σ].]
Questions about the losses in an interval have to be distinguished from those about layers of loss. For example, the losses in the layer from $100,000 to $1 million are part of the dollars from losses of size greater than $100,000. Each loss of size between $100,000 and $1 million contributes its size minus $100,000 to this layer, while those of size greater than $1 million contribute the width of the layer, $900,000, to this layer.214
214
See the earlier section on Layers of Loss.
Payments Subject to a Minimum:215
Assume a disabled worker is paid his weekly wage, subject to a minimum payment of 300.216 Let X be a worker's weekly wage. Then, while he is unable to work, he is paid Max[X, 300].
Min[X, 300] + Max[X, 300] = X + 300.
Therefore, E[Max[X, 300]] = 300 + E[X] - E[Min[X, 300]] = 300 + E[X] - E[X ∧ 300].
Let Y = amount the worker is paid = Max[X, 300]. Then Y - 300 = 0 if X ≤ 300, and Y - 300 = X - 300 if X > 300. Therefore, E[Y - 300] = E[(X - 300)+] = E[X] - E[X ∧ 300].
⇒ E[Y] = 300 + E[X] - E[X ∧ 300], matching the previous result.
Exercise: Weekly wages are distributed as follows: 200 @ 20%, 300 @ 30%, 400 @ 30%, 500 @ 10%, 1000 @ 10%. Determine the average weekly payment to a worker who is disabled.
[Solution: E[X] = (20%)(200) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(1000) = 400.
E[X ∧ 300] = (20%)(200) + (30%)(300) + (30%)(300) + (10%)(300) + (10%)(300) = 280.
300 + E[X] - E[X ∧ 300] = 300 + 400 - 280 = 420.
Alternately, one can list all of the possibilities:
Wage    Payment    Probability
200     300        20%
300     300        30%
400     400        30%
500     500        10%
1000    1000       10%
(20%)(300) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(1000) = 420.]
Another way to look at this is that the average payment is: mean wage + (the average amount by which the wage is less than 300) = E[X] + (300 - E[X ∧ 300]) = 300 + E[X] - E[X ∧ 300], matching the previous result.
In general, E[Max[X, a]] = a + E[X] - E[X ∧ a].
215 See for example, SOA M, 11/06, Q. 20.
216 This is a very simplified version of benefits under Workers Compensation.
Payments Subject to both a Minimum and a Maximum:217
Assume a disabled worker is paid his weekly wage, subject to a minimum payment of 300, and a maximum payment of 700.218 Let X be a worker's weekly wage. Then, while he is unable to work, he is paid Min[Max[X, 300], 700].
Let Y = amount the worker is paid = Min[Max[X, 300], 700]. Then Y - 300 = 0 if X ≤ 300, Y - 300 = X - 300 if 300 < X < 700, and Y - 300 = 400 if X ≥ 700. Therefore, E[Y - 300] = the layer from 300 to 700 = E[X ∧ 700] - E[X ∧ 300].
⇒ E[Y] = 300 + E[X ∧ 700] - E[X ∧ 300].
Exercise: Weekly wages are distributed as follows: 200 @ 20%, 300 @ 30%, 400 @ 30%, 500 @ 10%, 1000 @ 10%. Determine the average weekly payment to a worker who is disabled.
[Solution: E[X ∧ 300] = (20%)(200) + (80%)(300) = 280.
E[X ∧ 700] = (20%)(200) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(700) = 370.
300 + E[X ∧ 700] - E[X ∧ 300] = 300 + 370 - 280 = 390.
Alternately, one can list all of the possibilities:
Wage    Payment    Probability
200     300        20%
300     300        30%
400     400        30%
500     500        10%
1000    700        10%
(20%)(300) + (30%)(300) + (30%)(400) + (10%)(500) + (10%)(700) = 390.]
Another way to arrive at the same result is that the average payment is: mean wage + (average amount by which the wage is less than 300) - (layer above 700) = E[X] + (300 - E[X ∧ 300]) - (E[X] - E[X ∧ 700]) = 300 + E[X ∧ 700] - E[X ∧ 300], matching the previous result.
In general, E[Min[Max[X, a], b]] = a + E[X ∧ b] - E[X ∧ a].
We note that if b = ∞, in other words the payments are not subject to a maximum, this reduces to the result previously discussed for that case, E[Max[X, a]] = a + E[X] - E[X ∧ a]. If instead a = 0, in other words the payment is not subject to a minimum, this reduces to E[Min[X , b]] = E[X ∧ b], which is the definition of the limited expected value. 217
This mathematics is a simplified version of the premium calculation under a Retrospectively Rated Policy. See for example, “Individual Risk Rating” by Margaret Tiller Sherwood, in Foundations of Casualty Actuarial Science. 218 This is a simplified version of benefits under Workers Compensation.
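Both of these formulas are easy to verify for the discrete wage distribution used above; here is a minimal Python sketch (mine, not from the study guide):

# The discrete weekly-wage example above: 200 @ 20%, 300 @ 30%, 400 @ 30%, 500 @ 10%, 1000 @ 10%.
wages = {200: 0.20, 300: 0.30, 400: 0.30, 500: 0.10, 1000: 0.10}

mean = sum(w * p for w, p in wages.items())
lev = lambda y: sum(min(w, y) * p for w, p in wages.items())   # E[X ∧ y]

print(300 + mean - lev(300))        # E[Max[X, 300]] = 420 (minimum payment only)
print(300 + lev(700) - lev(300))    # E[Min[Max[X, 300], 700]] = 390 (minimum and maximum)

# Brute-force check directly from the definition of the payment:
print(sum(min(max(w, 300), 700) * p for w, p in wages.items()))   # also 390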
Problems: 31.1 (1 point) You are given the following: • The size of loss distribution is given by f(x) = 2e-2x, x > 0 • Under a basic limits policy, individual losses are capped at 1. • The expected annual claim frequency is 13. What are the expected annual total loss payments on a basic limits policy? A. less than 5.0 B. at least 5.0 but less than 5.5 C. at least 5.5 but less than 6.0 D. at least 6.0 but less than 6.5 E. at least 6.5 Use the following information for the next 7 questions. Assume the unlimited losses follow a LogNormal Distribution with parameters µ = 7 and σ = 3. Assume an average of 200 losses per year. 31.2 (1 point) What is the total cost expected per year? A. less than $19 million B. at least $19 million but less than $20 million C. at least $20 million but less than $21 million D. at least $21 million but less than $22 million E. at least $22 million 31.3 (2 points) If the insurer pays no more than $1 million per loss, what is the insurerʼs total cost expected per year? A. less than $7 million B. at least $7 million but less than $8 million C. at least $8 million but less than $9 million D. at least $9 million but less than $10 million E. at least $10 million 31.4 (2 points) If the insurer pays no more than $5 million per loss, what is the insurerʼs total cost expected per year? A. less than $7 million B. at least $7 million but less than $8 million C. at least $8 million but less than $9 million D. at least $9 million but less than $10 million E. at least $10 million
31.5 (1 point) What are the dollars in the layer from $1 million to $5 million expected per year? A. less than $3.5 million B. at least $3.5 million but less than $3.7 million C. at least $3.7 million but less than $3.9 million D. at least $3.9 million but less than $4.1 million E. at least $4.1 million 31.6 (1 point) What are the total dollars excess of $5 million per loss expected per year? A. less than $7 million B. at least $7 million but less than $8 million C. at least $8 million but less than $9 million D. at least $9 million but less than $10 million E. at least $10 million 31.7 (2 points) What is the average size of loss for those losses between $1 million and $5 million in size? A. less than $1.4 million B. at least $1.4 million but less than $1.7 million C. at least $1.7 million but less than $2.0 million D. at least $2.0 million but less than $2.3 million E. at least $2.3 million 31.8 (1 point) What is the expected total cost per year of those losses between $1 million and $5 million in size? A. less than $3.5 million B. at least $3.5 million but less than $3.7 million C. at least $3.7 million but less than $3.9 million D. at least $3.9 million but less than $4.1 million E. at least $4.1 million
31.9 (2 points) A Pareto Distribution with parameters α = 2.5 and θ = $15,000 appears to be a good fit to liability claims. What is the expected average size of loss for a policy issued with a $250,000 limit of liability? A. less than 9200 B. at least 9200 but less than 9400 C. at least 9400 but less than 9600 D. at least 9600 but less than 9800 E. at least 9800
Use the following information for the next 4 questions:
• The weekly wages for workers in a state follow a Pareto Distribution with α = 4 and θ = 1800.
• Injured workers are paid weekly benefits equal to 2/3 of their pre-injury average weekly wage, but subject to a maximum benefit of the state average weekly wage and a minimum benefit of 1/4 of the state average weekly wage.
• Injured workers have the same wage distribution as all workers.
• The duration of payments is independent of the workerʼs wage.
31.10 (1 point) What is the state average weekly wage? A. less than $500 B. at least $500 but less than $530 C. at least $530 but less than $560 D. at least $560 but less than $590 E. at least $590 31.11 (2 points) For a Pareto Distribution with parameters α = 4 and θ = 1800, what is E[X ∧ 900]? A. less than $400 B. at least $400 but less than $430 C. at least $430 but less than $460 D. at least $460 but less than $490 E. at least $490 31.12 (2 points) For a Pareto Distribution with parameters α = 4 and θ = 1800, what is E[X ∧ 225]? A. less than $100 B. at least $100 but less than $130 C. at least $130 but less than $160 D. at least $160 but less than $190 E. at least $190 31.13 (3 points) What is the average weekly benefit received by injured workers? A. less than $300 B. at least $300 but less than $320 C. at least $320 but less than $340 D. at least $340 but less than $360 E. at least $360 Hint: Use the solutions to the previous three questions.
Use the following information for the next 15 questions: Losses follow an Exponential Distribution with θ = 10,000. 31.14 (1 point) What is the average loss? A. less than 8500 B. at least 8500 but less than 9000 C. at least 9000 but less than 9500 D. at least 9500 but less than 10,000 E. at least 10,000 31.15 (1 point) Assuming a 25,000 policy limit, what is the average payment by the insurer? A. less than 9000 B. at least 9000 but less than 9100 C. at least 9100 but less than 9200 D. at least 9200 but less than 9300 E. at least 9300 31.16 (1 point) Assuming a 1000 deductible (with no maximum covered loss), what is the average payment per loss? A. less than 9000 B. at least 9000 but less than 9100 C. at least 9100 but less than 9200 D. at least 9200 but less than 9300 E. at least 9300 31.17 (1 point) Assuming a 1000 deductible (with no maximum covered loss), what is the average payment per non-zero payment by the insurer? A. less than 8500 B. at least 8500 but less than 9000 C. at least 9000 but less than 9500 D. at least 9500 but less than 10,000 E. at least 10,000 31.18 (1 point) Assuming a 1000 deductible and a 25,000 maximum covered loss, what is the average payment per loss? A. less than 8500 B. at least 8500 but less than 9000 C. at least 9000 but less than 9500 D. at least 9500 but less than 10,000 E. at least 10,000
31.19 (1 point) Assuming a 1000 deductible and a 25,000 maximum covered loss, what is the average payment per (non-zero) payment by the insurer? A. less than 9000 B. at least 9000 but less than 9100 C. at least 9100 but less than 9200 D. at least 9200 but less than 9300 E. at least 9300 31.20 (1 point) Assuming a 75% coinsurance factor (with no deductible or maximum covered loss), what is the average payment by the insurer? A. less than 6700 B. at least 6700 but less than 6800 C. at least 6800 but less than 6900 D. at least 6900 but less than 7000 E. at least 7000 31.21 (1 point) Assuming a 75% coinsurance factor and a 1000 deductible (with no maximum covered loss), what is the average payment per loss? A. less than 6700 B. at least 6700 but less than 6800 C. at least 6800 but less than 6900 D. at least 6900 but less than 7000 E. at least 7000 31.22 (1 point) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum covered loss, what is the average payment per non-zero payment by the insurer? A. less than 6700 B. at least 6700 but less than 6800 C. at least 6800 but less than 6900 D. at least 6900 but less than 7000 E. at least 7000 31.23 (2 points) What is the average size of the losses in the interval from 1000 to 25000? Assume no deductible, no maximum covered loss, and no coinsurance factor. A. less than 7500 B. at least 7500 but less than 8000 C. at least 8000 but less than 8500 D. at least 8500 but less than 9000 E. at least 9000
31.24 (2 points) What is the proportion of total dollars of loss from the losses in the interval from 1000 to 25000? Assume no deductible, no maximum covered loss, and no coinsurance factor. A. less than 74% B. at least 74% but less than 76% C. at least 76% but less than 78% D. at least 78% but less than 80% E. at least 80% 31.25 (3 points) Assuming a 1000 deductible, what is the average size of the insurerʼs payments for those payments greater than 500 and at most 4000? A. less than 2100 B. at least 2100 but less than 2130 C. at least 2130 but less than 2160 D. at least 2160 but less than 2190 E. at least 2190 31.26 (3 points) Assuming a 75% coinsurance factor, and a 1000 deductible, what is the average size of the insurerʼs payments for those payments greater than 500 and at most 4000? A. less than 2100 B. at least 2100 but less than 2130 C. at least 2130 but less than 2160 D. at least 2160 but less than 2190 E. at least 2190 31.27 (4 points) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum covered loss, what is the average size of the insurerʼs payments for those payments greater than 15,000 and at most 19,000? A. less than 17,400 B. at least 17,400 but less than 17,500 C. at least 17,500 but less than 17,600 D. at least 17,600 but less than 17,700 E. at least 17,700 31.28 (1 point) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum covered loss, what is the mean of the insurerʼs payments per loss? A. less than 4000 B. at least 4000 but less than 5000 C. at least 5000 but less than 6000 D. at least 6000 but less than 7000 E. at least 7000
31.29 (3 points) You are given the following information about a policyholder:
• His loss ratio is calculated as incurred losses divided by earned premium. • He will receive a policyholder dividend as a percentage of earned premium equal to 1/4 of the difference between 60% and his loss ratio. • He receives no policyholder dividend if his loss ratio is greater than 60%.
• His earned premium is 40,000. • His incurred losses are distributed via a LogNormal Distribution, with µ = 6 and σ = 3. Calculate the expected value of his policyholder dividend. (A) 4800 (B) 5000 (C) 5200 (D) 5400 (E) 5600 Use the following information for the next two questions: • In the state of Minnehaha, each town is responsible for its snow removal.
• However, a state fund shares the cost if a town has a lot of snow during a winter. • In exchange, a town is required to pay into this state fund when it has a winter with a small amount of snow. • Let x be the number of inches of snow a town has during a winter.
• If x < 20, then the town pays the state fund c(20 - x), where c varies by town.
• If x > 50, then the state fund pays the town c(x - 50).
• c = 1000 for the town of Frostbite Falls.
31.30 (3 points) The number of inches of snow the town of Frostbite Falls has per winter is equally likely to be: 8, 10, 16, 21, 35, 57, 70, or 90. What is the expected net amount the state fund pays Frostbite Falls (expected amount state fund pays town minus expected amount town pays the state fund) per winter? A. 3000 B. 3500 C. 4000 D. 4500 E. 5000
31.31 (5 points) The number of inches of snow the town of Frostbite Falls has per winter is LogNormal, with µ = 2.4 and σ = 1.5. What is the expected net amount the state fund pays Frostbite Falls (expected amount state fund pays town minus expected amount town pays the state fund) per winter? A. 7000 B. 7500 C. 8000 D. 8500 E. 9000
31.32 (2 points) N follows a Poisson Distribution, with λ = 2.5. Determine E[(N - 3)+]. A. 0.2
B. 0.3
C. 0.4
D. 0.5
E. 0.6
31.33 (2 points) The lifetime of batteries is Exponential with mean 6. Batteries are sold for $100 each. If a battery lasts less than 2 years, the manufacturer will pay the purchaser the pro rata share of the purchase price. For example if the battery lasts only 1.5 years, the manufacturer will pay the purchaser (100)(2 - 1.5)/2 = 25. What is the expected amount paid by the manufacturer per battery sold? (A) 11 (B) 13 (C) 15 (D) 17 (E) 19 31.34 (4 points) XYZ Insurance Company writes insurance in a state with a catastrophe fund for hurricanes. For any hurricane on which XYZ has more than $30 million in losses in this state, the Catastrophe Fund will pay XYZ 75% of its hurricane losses above $30 million, subject to a maximum payment from the fund of $90 million. The amount XYZ pays in this state on a hurricane that hits this state is distributed via a LogNormal Distribution, with µ = 15 and σ = 2. What is expected value of the amount XYZ will receive from the Catastrophe Fund due to the next hurricane to hit this state? (A) 4 million (B) 5 million (C) 6 million (D) 7 million (E) 8 million
Use the following information for the next two questions:
• Losses follow a Pareto Distribution, with parameters α = 5 and θ = 40,000.
• Three losses are expected each year.
• For each loss less than or equal to 5,000, the insurer makes no payment.
31.35 (2 points) You are given the following: For each loss greater than 5,000, the insurer pays the amount of the loss up to the maximum covered loss of 25,000, less a 5000 deductible. (Thus for a loss of 7000 the insurer pays 2000; for a loss of 80,000 the insurer pays 20,000.) Determine the insurer's expected annual payments. A. Less than 7,500 B. At least 7,500, but less than 12,500 C. At least 12,500, but less than 17,500 D. At least 17,500, but less than 22,500 E. At least 22,500 31.36 (2 points) For each loss greater than 5,000, the insurer pays the entire amount of the loss up to the maximum covered loss of 25,000. Determine the insurer's expected annual payments. A. Less than 7,500 B. At least 7,500, but less than 12,500 C. At least 12,500, but less than 17,500 D. At least 17,500, but less than 22,500 E. At least 22,500
31.37 (2 points) Losses follow an Exponential Distribution with θ = 20,000. Calculate the percent of expected losses within the layer 5,000 to 50,000. A. Less than 50% B. At least 50%, but less than 55% C. At least 55%, but less than 60% D. At least 60%, but less than 65% E. At least 65% 31.38 (4 points) Losses follow a LogNormal Distribution with µ = 9.4 and σ = 1. Calculate the percent of expected losses within the layer 5,000 to 50,000. A. Less than 50% B. At least 50%, but less than 55% C. At least 55%, but less than 60% D. At least 60%, but less than 65% E. At least 65% 31.39 (3 points) Losses follow a Pareto Distribution with α = 3 and θ = 40,000. Calculate the percent of expected losses within the layer 5,000 to 50,000. A. Less than 50% B. At least 50%, but less than 55% C. At least 55%, but less than 60% D. At least 60%, but less than 65% E. At least 65% 31.40 (3 points) N follows a Geometric Distribution, with β = 2.5. Determine E[(N - 3)+]. A. 0.9
B. 1.0
C. 1.1
D. 1.2
E. 1.3
31.41 (3 points) Losses follow a Pareto Distribution with α = 3 and θ = 12,000. Policy A has a deductible of 3000. Policy B has a maximum covered loss of u. The average payment per loss under Policy A is equal to that under Policy B. Determine u. A. 4000 B. 5000 C. 6000 D. 7000 E. 8000 31.42 (1 point) X is 5 with probability 80% and 25 with probability 20%. If E[(y - X)+ ] = 8, determine y. A. 10
B. 15
C. 20
D. 25
E. 30
31.43 (2 points) X is Exponential with θ = 2. Y is equal to 1 - X is X < 1, and Y is 0 if X ≥ 1. What is the expected value of Y? A. 0.15 B. 0.17 C. 0.19
D. 0.21
E. 0.23
31.44 (3 points) Let R be the weekly wage for a worker compared to the statewide average. R follows a LogNormal Distribution with σ = 0.4. Determine the percentage of overall wages earned by workers whose weekly wage is less than twice the statewide average. A. 88% B. 90% C. 92% D. 94% E. 96% 31.45 (2 points) You observe the following 35 losses: 6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34, 40, 41, 48, 49, 53, 60, 63, 78, 85, 103, 124, 140, 192, 198, 227, 330, 361, 421, 514, 546, 750, 864, 1638. What is the (empirical) Limited Expected Value at 50? A. less than 38 B. at least 38 but less than 39 C. at least 39 but less than 40 D. at least 40 but less than 41 E. at least 41 31.46 (2 points) Alexʼs pay is based on the annual profit made by his employer. Alex is paid 2% of the profit, subject to a minimum payment of 100. The annual profits for Alexʼs company, X, follow a distribution F(x). Which of the following represents Alexʼs expected payment? A. 100F(100) + E[X]/50 - E[X ∧ 100] B. 100F(5000) + E[X]/50 - E[X ∧ 5000]/50 C. 100 + .02(E[X] - E[X ∧ 5000]) D. .02(E[X ∧ 5000] - E[X ∧ 100]) + 100S(5000) E. None of A, B, C, or D 31.47 (2 points) In the previous question, assume F(x) = 1 - {20,000/(20,000 + x)}3 . Determine Alexʼs expected payment. A. 200 B. 230 C. 260 D. 290 E. 320 31.48 (2 points) The size of losses follows a Gamma distribution with parameters α = 3, θ = 100. What is the limited expected value at 500, E[X ∧ 500] ? Hint: Use Theorem A.1 in Appendix A of Loss Models: j=n-1
Γ(n; x) = 1 - Σ from j = 0 to n-1 of x^j e^(-x) / j!, for n a positive integer.
A. less than 275 B. at least 275 but less than 280 C. at least 280 but less than 285 D. at least 285 but less than 290 E. at least 290
31.49 (2 points) Donald Adams owns the Get Smart Insurance Agency. Let L be the annual losses from the insurance policies that Donʼs agency writes for the Control Insurance Company. L follows a Single Parameter Pareto distribution with α = 3 and θ = 100,000. Don gets a bonus from the Control Insurance Company calculated as (170,000 - L)/4 if this quantity is positive and 0 otherwise. Calculate Donʼs expected bonus. A Less than 10,000 B. At least 10,000, but less than 12,000 C. At least 12,000, but less than 14,000 D. At least 14,000, but less than 16,000 E. At least 16,000 31.50 (1 point) In the previous question, calculate the expected value of Donʼs bonus conditional on his bonus being positive. 31.51 (1 point) X follows the density f(x), with support from 0 to infinity. 1000
∫ from 0 to 1000 of f(x) dx = 0.87175.   ∫ from 0 to 1000 of x f(x) dx = 350.61.
Determine E[X ∧ 1000]. A. Less than 480 B. At least 480, but less than 490 C. At least 490, but less than 500 D. At least 500, but less than 510 E. At least 510 31.52 (3 points) The size of loss is modeled by a two parameter Pareto distribution with θ = 5000 and α = 3. An insurance has the following provisions: (i) It pays 75% of the first 2000 of any loss. (ii) It pays 90% of any portion of a loss that is greater than 10,000. Calculate the average payment per loss. A Less than 1050 B. At least 1050, but less than 1100 C. At least 1100, but less than 1150 D. At least 1150, but less than 1200 E. At least 1200
31.53 (3 points) The mean number of minutes used per month by owners of cell phones varies between owners via a Single Parameter Pareto Distribution with α = 1.5 and θ = 20. The Telly Savalas Phone Company is planning to sell a new unlimited calling plan. Only those whose current average usage is greater than the overall average will sign up for the plan. In addition, those who sign up will use on average 50% more minutes than currently. What is the expected number of minutes used per month under the new plan? A. 150 B. 180 C. 210 D. 240 E. 270 31.54 (2 points) Define the first moment distribution, G(x), as the percentage of total loss dollars that come from those losses of size less than x. If the size of loss distribution follows a LogNormal Distribution, with parameters µ and σ, determine the form of the first moment distribution. 31.55 (3 points) Define the quartiles as the 25th, 50th, and 75th percentiles. Define the trimmed mean as the average of those values between the first (lower) quartile and the third (upper) quartile. Determine the trimmed mean for an Exponential Distribution. 31.56 (2 points) For a Pareto Distribution with α = 1, derive the formula for the Limited Expected Value that is shown in Appendix A of Loss Models, attached to the exam. 31.57 (4 points) The value of a Property Claims Service (PCS) index is determined by the catastrophe losses for the insurance industry in a certain region of the country over a certain period of time. Each $100 million of catastrophe losses corresponds to one point on the index. A 100/150 call spread would pay: (200) {(S - 150)+ - (S - 100)+}, ⎧0 if x < 0 where S is the value of the PCS index at expiration and X+ = ⎨ . ⎩ x if x ≥ 0 You assume that the catastrophe losses in a certain region follow a LogNormal Distribution with parameters µ = 20 and σ = 2. What is the expected payment on a 100/150 call spread on the PCS Index for this region? A. 200 B. 300 C. 400 D. 500 E. 600
31.58 (3 points) Define the quantile Qα to be such that F[Qα] = α. For α between 0 and 1/2, compute the Windsorized mean by: 1. Replace all values below Qα by Qα. 2. Replace all values above Q1−α by Q1−α. 3. Take the average. Determine the algebraic form of the Windsorized mean for an Exponential Distribution. 31.59 (4 points) Define πp as the pth percentile. Define the trimmed mean as the average of those values between π1-p and πp . For p = 95%, determine the algebraic form of the trimmed mean for a Pareto Distribution. 31.60 (4, 5/88, Q.61) (3 points) Losses for a given line of insurance are distributed according to the probability density function f(x) = 0.015 - .0001x, 0 < x < 100. An insurer has issued policies each with a deductible of 10 for this line. On these policies, what is the average expected payment by the insurer per non-zero payment by the insurer? A. Less than 30 B. At least 30, but less than 35 C. At least 35, but less than 40 D. At least 40, but less than 45 E. 45 or more 31.61 (4, 5/90, Q.53) (2 points) Loss Models defines two functions: 1. the limited expected value function, E[X ∧ x] and 2. the Mean Excess Loss function, e(x) If F(x) = Pr{X ≤ x} and the expected value of X is denoted by E[X], then which of the following equations expresses the relationship between E[X ∧ x] and e(x)? A. E[X ∧ x] = E[X] - e(x) / {1- F(x)} B. E[X ∧ x] = E[X] - e(x) C. E[X ∧ x] = E[X] - e(x)(1 - F(x)) D. E[X ∧ x] = E[X](1 - F(x)) - e(x) E. None of the above
31.62 (4B, 11/93, Q.16) (1 point) Which of the following statements are true regarding loss distribution models? 1. For small samples, method of moments estimators have smaller variances than maximum likelihood estimators. 2. The limited expected value function evaluated at any point d > 0 equals E [X ∧ d] =
∫ from 0 to d of x fX(x) dx + d{1 - FX(d)}, where fX(x) and FX(x) are the probability density
and distribution functions, respectively, of the loss random variable X. 3. A consideration in model selection is agreement between the empirical and fitted limited expected value functions. A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 31.63 (4B, 5/95, Q.22) (2 points) You are given the following: • Losses follow a Pareto distribution, with parameters θ = 1000 and α = 2. • 10 losses are expected each year. • The number of losses and the individual loss amounts are independent. • For each loss that occurs, the insurer's payment is equal to the entire amount of the loss if the loss is greater than 100. The insurer makes no payment if the loss is less than or equal to 100. Determine the insurer's expected annual payments. A. Less than 8,000 B. At least 8,000, but less than 9,000 C. At least 9,000, but less than 9,500 D. At least 9,500, but less than 9,900 E. At least 9,900 31.64 (4B, 11/98, Q.28) (2 points) You are given the following: • • • •
Losses follow a lognormal distribution, with parameters µ = 10 and σ = 1.
One loss is expected each year. For each loss less than or equal to 50,000, the insurer makes no payment. For each loss greater than 50,000, the insurer pays the entire amount of the loss up to the maximum covered loss of 100,000. Determine the insurer's expected annual payments. A. Less than 7,500 B. At least 7,500, but less than 12,500 C. At least 12,500, but less than 17,500 D. At least 17,500, but less than 22,500 E. At least 22,500
31.65 (3, 5/00, Q.25) (2.5 points) An insurance agent will receive a bonus if his loss ratio is less than 70%. You are given: (i) His loss ratio is calculated as incurred losses divided by earned premium on his block of business. (ii) The agent will receive a percentage of earned premium equal to 1/3 of the difference between 70% and his loss ratio. (iii) The agent receives no bonus if his loss ratio is greater than 70%. (iv) His earned premium is 500,000. (v) His incurred losses are distributed according to the Pareto distribution: F(x) = 1 - {600,000 / (x + 600,000)}3 , x > 0. Calculate the expected value of his bonus. (A) 16,700 (B) 31,500 (C) 48,300 (D) 50,000 (E) 56,600 31.66 (3, 11/00, Q.27 & 2009 Sample Q.116) (2.5 points) Total hospital claims for a health plan were previously modeled by a two-parameter Pareto distribution with α = 2 and θ = 500. The health plan begins to provide financial incentives to physicians by paying a bonus of 50% of the amount by which total hospital claims are less than 500. No bonus is paid if total claims exceed 500. Total hospital claims for the health plan are now modeled by a new Pareto distribution with α = 2 and θ = K . The expected claims plus the expected bonus under the revised model equals expected claims under the previous model. Calculate K. (A) 250 (B) 300 (C) 350 (D) 400 (E) 450 31.67 (3, 11/02, Q.37 & 2009 Sample Q.96) (2.5 points) Insurance agent Hunt N. Quotum will receive no annual bonus if the ratio of incurred losses to earned premiums for his book of business is 60% or more for the year. If the ratio is less than 60%, Huntʼs bonus will be a percentage of his earned premium equal to 15% of the difference between his ratio and 60%. Huntʼs annual earned premium is 800,000. Incurred losses are distributed according to the Pareto distribution, with θ = 500,000 and α = 2. Calculate the expected value of Huntʼs bonus. (A) 13,000
(B) 17,000
(C) 24,000
(D) 29,000
(E) 35,000
31.68 (1 point) In the previous question, (3, 11/02, Q. 37), calculate the expected value of Huntʼs bonus, given that Hunt receives a (positive) bonus. (A) 46,000 (B) 48,000 (C) 50,000 (D) 52,000 (E) 54,000 31.69 (CAS3, 11/03, Q.21) (2.5 points) The cumulative loss distribution for a risk is F(x) = 1 - 106 / (x + 103 )2 . Calculate the percent of expected losses within the layer 1,000 to 10,000. A. 10% B. 12% C. 17% D. 34% E. 41%
31.70 (SOA3, 11/03, Q.3 & 2009 Sample Q.84) (2.5 points) A health plan implements an incentive to physicians to control hospitalization under which the physicians will be paid a bonus B equal to c times the amount by which total hospital claims are under 400 (0 ≤ c ≤ 1). The effect the incentive plan will have on underlying hospital claims is modeled by assuming that the new total hospital claims will follow a two-parameter Pareto distribution with α = 2 and θ = 300. E(B) = 100. Calculate c. (A) 0.44
(B) 0.48
(C) 0.52
(D) 0.56
(E) 0.60
31.71 (SOA3, 11/04, Q.7 & 2009 Sample Q.123) (2.5 points) Annual prescription drug costs are modeled by a two-parameter Pareto distribution with θ = 2000 and α = 2. A prescription drug plan pays annual drug costs for an insured member subject to the following provisions: (i) The insured pays 100% of costs up to the ordinary annual deductible of 250. (ii) The insured then pays 25% of the costs between 250 and 2250. (iii) The insured pays 100% of the costs above 2250 until the insured has paid 3600 in total. (iv) The insured then pays 5% of the remaining costs. Determine the expected annual plan payment. (A) 1120 (B) 1140 (C) 1160 (D) 1180 (E) 1200 31.72 (CAS3, 11/05, Q.22) (2.5 points) An insurance agent gets a bonus based on the underlying losses, L, from his book of business. L follows a Pareto distribution with parameters α = 3 and θ = 600,000. His bonus, B, is calculated as (650,000 - L)/3 if this quantity is positive and 0 otherwise. Calculate his expected bonus. A Less than 100,000 B. At least 100,000, but less than 120,000 C. At least 120,000, but less than 140,000 D. At least 140,000, but less than 160,000 E. At least 160,000 31.73 (SOA M, 11/05, Q.14) (2.5 points) You are given: (i) T is the future lifetime random variable. (ii) µ(t) = µ, t ≥ 0. (iii) Var[T] = 100. Calculate E[T ∧ 10]. (A) 2.6 (B) 5.4
(C) 6.3
(D) 9.5
(E) 10.0
31.74 (CAS3, 5/06, Q.37) (2.5 points) Between 9 am and 3 pm Big National Bank employs 2 tellers to service customer transactions. The time it takes Teller X to complete each transaction follows an exponential distribution with a mean of 10 minutes. Transaction times for Teller Y follow an exponential distribution with a mean of 15 minutes. Both Teller X and Teller Y are continuously busy while the bank is open. On average every third customer transaction is a deposit and the amount of the deposit follows a Pareto distribution with parameter α = 3 and θ = $5000. Each transaction that involves a deposit of at least $7500 is handled by the branch manager. Calculate the expected total deposits made through the tellers each day. A. Less than $31,000 B. At least $31,000, but less than $32,500 C. At least $32,500, but less than $35,000 D. At least $35,000, but less than $37,500 E. At least $37,500 31.75 (SOA M, 11/06, Q.20 & 2009 Sample Q.281) (2.5 points) For a special investment product, you are given: (i) All deposits are credited with 75% of the annual equity index return, subject to a minimum guaranteed crediting rate of 3%. (ii) The annual equity index return is normally distributed with a mean of 8% and a standard deviation of 16%. (iii) For a random variable X which has a normal distribution with mean µ and standard deviation σ, you are given the following limited expected values: E[X
∧ 3%] and E[X ∧ 4%]:
                E[X ∧ 3%]                 E[X ∧ 4%]
             µ = 6%    µ = 8%          µ = 6%    µ = 8%
σ = 12%      -0.43%     0.31%           0.15%     0.95%
σ = 16%      -1.99%    -1.19%          -1.43%    -0.58%
Calculate the expected annual crediting rate. (A) 8.9% (B) 9.4% (C) 10.7% (D) 11.0%
(E) 11.6%
31.76 (SOA M, 11/06, Q.31 & 2009 Sample Q.286) (2.5 points) Michael is a professional stuntman who performs dangerous motorcycle jumps at extreme sports events around the world. The annual cost of repairs to his motorcycle is modeled by a two parameter Pareto distribution with θ = 5000 and α = 2. An insurance reimburses Michaelʼs motorcycle repair costs subject to the following provisions: (i) Michael pays an annual ordinary deductible of 1000 each year. (ii) Michael pays 20% of repair costs between 1000 and 6000 each year. (iii) Michael pays 100% of the annual repair costs above 6000 until Michael has paid 10,000 in out-of-pocket repair costs each year. (iv) Michael pays 10% of the remaining repair costs each year. Calculate the expected annual insurance reimbursement. (A) 2300 (B) 2500 (C) 2700 (D) 2900 (E) 3100
Solutions to Problems: 31.1. C. The distribution is an Exponential Distribution with θ = 1/2. For the Exponential Distribution E[X ∧ x] = θ (1 - e-x/θ). The average size of the capped losses is: E[X ∧ 1] = (1/2)(1 - e-2) = .432. Thus the expected annual total loss payments on a basic limits policy are: (13)(.432) = 5.62. Alternately, one can use the relation between the mean excess loss and the Limited Expected Value: e(x) = { mean - E[X ∧ x] } / {1 - F(x)}, therefore E[X ∧ x] = mean - e(x){1 - F(x)}. For the Exponential Distribution, the mean excess loss is a constant = θ = mean. Therefore E[X ∧ x] = mean - e(x){1 - F(x)} = θ − θ(e-x/ θ). Proceed as before. 31.2. B. mean = exp(µ + σ2/2) = 98,716. Therefore, with 200 claims expected per year, the expected total cost per year is: (200)(98716) = $19.74 million. 31.3. A. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}. E[X ∧ 1 mill.] = exp(7 + 9/2)Φ[(ln(1000000) - 7 - 9 )/3] + (1000000){1 - Φ[(ln(1000000) − 7)/3]} = (98,716)Φ[-.73] + (100000)(1−Φ[2.27]) = (98,716)(1 - 0.7673) + (100,000)(1 - 0.9884) = 22,971 + 11,600 = 34,571. With a limit of $1 million per claim and 200 claims expected per year, the expected total cost per year is: 200 E[X ∧ 1 million] = (200)(34,571) = $6.91 million. 31.4. E. E[X ∧ 5 million] = exp(7 + 9/2)Φ[(ln(5000000) - 7 - 9 )/3] + (5000000) {1 − Φ[(ln(5000000) − 7)/3]} = (98716)Φ[-.19] + (500000)(1−Φ[2.81]) = (98716)(1 - .5753) + (500,000)(1- .9975) = 41,925 + 12,500 = 54,425. 200 E[X ∧ 5 million] = $10.88 million. 31.5. D. The dollars in the layer from $1 million to $5 million is the difference between the dollars limited to $5 million and the dollars limited to $1 million. Using the answers to the two previous questions: $10.88 million - $6.91 million = $3.97 million. Comment: In terms of the limited expected values and the expected number of losses N, the dollars in the layer from $1 million to $5 million equals: N{E[X ∧ 5 million] - E[X ∧ 1 million]}. In this case N = 200.
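For readers who like to verify such computations, here is a short Python sketch (my own check, not part of the original solutions; it uses the exact normal CDF rather than the rounded tables above) of the LogNormal limited expected values that drive solutions 31.2 through 31.6:

    # LogNormal limited expected values with mu = 7, sigma = 3, 200 losses per year.
    from math import exp, log, sqrt, erf

    def phi(z):                      # standard normal CDF
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    def lognormal_lev(x, mu, sigma):
        """E[X ^ x] for a LogNormal Distribution."""
        mean = exp(mu + sigma**2 / 2)
        return mean * phi((log(x) - mu - sigma**2) / sigma) + x * (1 - phi((log(x) - mu) / sigma))

    mu, sigma, n = 7.0, 3.0, 200
    lev1 = lognormal_lev(1e6, mu, sigma)     # E[X ^ 1 million], about 34,571
    lev5 = lognormal_lev(5e6, mu, sigma)     # E[X ^ 5 million]
    print(n * lev1 / 1e6)                    # close to the $6.91 million of 31.3
    print(n * (lev5 - lev1) / 1e6)           # close to the $3.97 million layer of 31.5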
31.6. C. The dollars excess of $5 million per loss is the difference between the total cost and the cost limited to $5 million per loss. Using the answers to two prior questions: $19.74 million - $10.88 million = $8.86 million. Comment: The dollars excess of $5 million per losses equals: N{E[X ∧ ∞] - E[X ∧ 5 million]} = N{mean - E[X ∧ 5 million]}. In this case N = 200 losses. 31.7. D. First calculate the dollars of loss on these losses per total number of losses: {E[X ∧ 5 million] - 5 million S(5 million)} - {E[X ∧ 1 million] - 1 million S(1 million)} = {54,425 - (5 million)(1-.9975)} - {34,517 - (1 million)(1-.9884)} = 41,925 - 22,917 = $19,008. Then divide by the probability of a loss being of this size: F(5 million) - F(1 million) = Φ[(ln(5000000) - 7)/3] - Φ[(ln(1000000) - 7)/3] = Φ[2.81] − Φ[2.27] = (.9975 - .9884) = .0091. $19,008 / .0091 = $2.09 million. 31.8. C. Either one can calculate the expected number of losses of this size per year as (200){F(5 million) - F(1 million)} = (200){0.9975 - 0.9884} = 1.8 and multiply by the average size calculated in the previous question. (1.8)($2.09 million) = $3.8 million. Alternately, one can multiply the expected number of losses per year times the dollars on these losses per loss calculated in a previous question: (200)($19,008) = $3.8 million. 31.9. E. For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)}{1−(θ/(θ+x))α−1}. E[X ∧ 250000] = {15000/1.5}{1-(15000/(15000+250000))2.5-1} = 10000{.9865} = 9865. 31.10. E. The mean of a Pareto is: θ/(α-1) = 1800/3 = 600. 31.11. B. For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)}{1 − (θ/(θ+x))α−1}. E[X ∧ 900] = {1800/3}{1-(1800/(1800+900))3 } = 422.22. 31.12. D. For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)}{1 − (θ/(θ+x))α−1}. E[X ∧ 225] = {1800/3}{1-(1800/(1800+225))3} = 178.60.
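A similar sketch (again my own check, not part of the original text) for the Pareto limited expected values used in 31.11 through 31.13; pareto_lev is just the formula quoted above:

    # Two-parameter Pareto limited expected values, alpha = 4, theta = 1800.
    def pareto_lev(x, alpha, theta):
        """E[X ^ x] = {theta/(alpha-1)} {1 - (theta/(theta+x))^(alpha-1)}."""
        return theta / (alpha - 1) * (1 - (theta / (theta + x)) ** (alpha - 1))

    alpha, theta = 4, 1800
    e900 = pareto_lev(900, alpha, theta)     # about 422.22 (31.11)
    e225 = pareto_lev(225, alpha, theta)     # about 178.60 (31.12)
    benefit = 150 + (2/3) * (e900 - e225)    # about 312.4, the answer to 31.13
    print(e900, e225, benefit)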
31.13. B. The average weekly wage is $600, from a previous solution. Thus the maximum benefit is $600, while the minimum benefit is $600/4 = $150. These correspond to pre-injury wages of $600/(2/3) = $900 and $150/(2/3) = $225 respectively. (If a worker's pre-injury wage is more than $900 his benefit is only $600. If his pre-injury wage is less than $225, his benefit is still $150.) Let x be the workerʼs pre-injury wage; then the workerʼs benefits are: $150 if x ≤ $225, 2x/3 if $225 ≤ x ≤ $900, $600 if x ≥ $900. Thus the average benefit is made up of three terms (low, medium, and high wages):
150 F(225) + (2/3) ∫ from 225 to 900 of x f(x) dx + 600 S(900).
∫ from 225 to 900 of x f(x) dx = ∫ from 0 to 900 of x f(x) dx - ∫ from 0 to 225 of x f(x) dx = E[X ∧ 900] - 900S(900) - {E[X ∧ 225] - 225S(225)}.
Thus the average benefit is: 150F(225) + 150S(225) + 600S(900) - 600S(900) + (2/3)(E[X ∧ 900] - E[X ∧ 225]) = 150 + (2/3)(E[X ∧ 900] -E[X ∧ 225]) = 150 + (2/3)(422.22 - 178.60) = 312.41. Alternately, the benefits can be described as: 150 + (2/3)(layer of wages between 900 and 225) = 150 + (2/3)(E[X ∧ 900] -E[X ∧ 225]). Comment: Extremely unlikely to be asked on the exam. Relates to the calculation of Law Amendment Factors used in Workersʼ Compensation Ratemaking. Geometrically oriented students may benefit by reviewing the subsection on payments subject to both a minimum and a maximum in the subsequent section on Lee Diagrams. 31.14. E. E[X] = θ = 10,000. 31.15. C. E[X ∧ 25000] = 10,000 (1-e-25000/10000) = 9179. 31.16. B. E[X ∧ x] = θ (1 - e-x/θ) = 10,000 (1 - e-1000/10000) = 952. E[X] - E[X ∧ 1000] = 10,000 - 952 = 9048. 31.17. E. {E[X] - E[X ∧ 1000]}/S(1000) = (10,000 - 952)/.9048 = 10,000. Alternately, the average size of the data truncated and shifted from below is the mean excess loss. For the Exponential e(x) = θ = 10,000. Comment: For the Exponential, {E[X] - E[X ∧ x]}/S(x) = θ. Thus for the Exponential Distribution, in the absence of any maximum covered loss, the average size of the insurerʼs payment per non-zero payment by the insurer does not depend on the deductible amount; the mean excess loss is constant for the Exponential Distribution.
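The Exponential facts used in 31.14 through 31.17 can be confirmed the same way; a minimal sketch (my own, assuming only the formulas quoted above, with θ = 10,000):

    from math import exp

    theta = 10_000
    def lev(x):               # E[X ^ x] = theta (1 - e^(-x/theta))
        return theta * (1 - exp(-x / theta))
    def S(x):                 # survival function
        return exp(-x / theta)

    print(lev(25_000))                          # about 9,179 (31.15)
    print(theta - lev(1_000))                   # about 9,048 (31.16)
    print((theta - lev(1_000)) / S(1_000))      # 10,000: the mean excess loss is constant (31.17)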
31.18. A. E[X ∧ 25000] = 10,000 (1-e-25000/10000) = 9179. E[X ∧ 25000] - E[X ∧ 1000] = 9179 - 952 = 8,227. 31.19. B. {E[X ∧ 25000] - E[X ∧ 1000]}/S(1000) = (9179 - 952)/.9048 = 9093. 31.20. E. Each payment is 75% of the insuredʼs loss, so the average is: (.75)E[X] = (.75)(10,000) = 7500. 31.21. B. Each payment is 75% of the what it would have been without any coinsurance, so the average is (.75)(E[X] - E[X ∧ 1000]) = (.75)(10,000 - 952) = 6786. 31.22. C. Each payment is 75% of the what it would have been without any coinsurance, so the average is (.75)(E[X ∧ 25000] - E[X ∧ 1000])/ S(1000) = (.75)(9179 - 952)/.9048 = 6819. 31.23. D. Average Size of Losses in the Interval [1000, 25000] = {E[X ∧ 25000] - 25000S(25000) - (E[X ∧ 1000] - 1000S(1000))}/ {F(25000) - F(1000)} = {9179 - 25000(.08208) - (952 - (1000)(.9048))}/(.9179 - .0952) = 7080/.8227 = 8606. 31.24. A. {E[X ∧ 25000] - 25000S(25000) - (E[X ∧ 1000] - 1000S(1000))}/E[X] = {9179 - 25000(.08208) - (952 - (1000)(.9048))}/ 10,000 = 7080/10,000 = 70.8%. 31.25. C. The payments from 500 to 4000 correspond to losses of size between 1500 and 5000. These losses have average size: {E[X ∧ 5000] - 5000S(5000) - (E[X ∧ 1500] - 1500S(1500))}/ {F(5000) - F(1500)} = {3935 - 5000(.6065) - (1393 - (1500)(.8607))}/(.3935 - .1393) = 3148. The average size of the payments is 1000 less: 3148 - 1000 = 2148. 31.26. B. The payments from 500 to 4000 correspond to losses of size between 1000+(500/.75) = 1667 and 1000+(4000/.75) = 6333. These losses have average size: {E[X ∧ 6333] - 6333S(6333) - (E[X ∧ 1667] - 1667S(1667))}/ {F(6333) - F(1667)} = {4692 - 6333(.5308) - (1535 - (1667)(.8465))}/(.4692 - .1535) = 3820. The average size of the payments is 1000 less then multiplied by .75: (3820 - 1000)(.75) = 2115.
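As an illustration (not part of the original text), the per-loss and per-payment averages with a deductible, maximum covered loss, and coinsurance factor, as used in 31.22 and 31.28, can be checked as follows:

    from math import exp

    theta = 10_000
    def lev(x):
        return theta * (1 - exp(-x / theta))
    def S(x):
        return exp(-x / theta)

    d, u, c = 1_000, 25_000, 0.75            # deductible, maximum covered loss, coinsurance
    per_loss = c * (lev(u) - lev(d))         # about 6,170 (31.28)
    per_payment = per_loss / S(d)            # about 6,819 (31.22)
    print(round(per_loss), round(per_payment))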
31.27. B. The most the insurer will pay is: (0.75)(25,000 - 1000) = 18,000. For any loss of size greater than or equal to 25,000 the insurer pays 18,000. Let X be the size of loss. Then the payment is 18,000 if X ≥ 25,000, and (0.75)(X - 1000) if 25,000 > X > 1000. A payment of 15,000 corresponds to a loss of: (15,000/0.75) + 1000 = 21,000. Thus the dollars of payments greater than 15,000 and at most 19,000 is the payments on losses of size greater than 21,000, which we split into two pieces:
∫ from 21,000 to 25,000 of 0.75(x - 1000) f(x) dx + 18,000 S(25000) =
0.75 ∫ from 21,000 to 25,000 of x f(x) dx - 750{F(25000) - F(21000)} + 18,000 S(25000) =
0.75 {E[X ∧ 25000] - 25000S(25000) - (E[X ∧ 21000] - 21000S(21000)} + 750{S(25000) - S(21000)} + 18,000S(25000) = 0.75E[X ∧ 25000] - 0.75E[X ∧ 21000] + 15,750S(21000) - 18,750S(25000) - 750S(21000) + 750S(25000) + 18,000S(25000) = 0.75E[X ∧ 25000] - 0.75E[X ∧ 21000] + 15,000S(21000) = 0.75(9179.2) - (0.75)(8775.4) +15000(0.12246) = 2139.8. In order to get the average size we need to divide the payments by the percentage of the number of losses represented by losses greater than 21,000, S(21,000) = 0.12246: 2139.8/ 0.12246 = 17,473. Comment: Long and difficult. In this case it may be easier to calculate the integral of xf(x)dx, rather than put it in terms of the Limited Expected Values and Survival Functions. 31.28. D. 0.75{E[X ∧ 25000] - E[X ∧ 1000]} = (0.75)(9179 - 952) = 6170.
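Since 31.27 is long, a brute-force numerical integration (my own rough check, not the intended exam technique) is a useful sanity test of the 17,473 answer:

    from math import exp

    theta, d, u, c = 10_000.0, 1_000.0, 25_000.0, 0.75

    def payment(x):
        return c * (min(x, u) - d) if x > d else 0.0

    # integrate payment * f(x) over losses whose payment is in (15,000, 19,000]
    step, num, prob, x = 1.0, 0.0, 0.0, 0.0
    while x < 300_000:                        # far enough into the Exponential tail
        f = exp(-x / theta) / theta * step    # probability of this small slice
        p = payment(x)
        if 15_000 < p <= 19_000:
            num += p * f
            prob += f
        x += step
    print(num / prob)                         # about 17,470; answer B in 31.27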
31.29. B. A loss ratio of 60% corresponds to (.6)(40000) = $24,000 in losses. If his losses are x, and x < 24,000, then he gets a dividend of (1/4)(24,000 - x). The expected dividend is:
∫ from 0 to 24000 of (1/4)(24000 - x) f(x) dx = (1/4){24000 F(24000) - (E[X ∧ 24000] - 24000 S(24000))} = (1/4){24000 - E[X ∧ 24000]}. For a LogNormal Distribution,
E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}. Therefore, E[X ∧ 24000] = exp(6 + 9/2)Φ[(ln24000 - 6 - 9)/3] + 24000 {1 − Φ[(ln24000 - 6)/3]} = (36316)Φ[-1.64] + (24000){1 - Φ[1.36]} = (36,316)(1 - .9495) + (24,000)(1 - .9131) = 3920. Therefore, his expected dividend is: (1/4)(24000 - 3920) = $5020. 31.30. E. If x < 20, then Frostbite Falls pays the state fund 1000(20 - x). The expected amount by which x is less than 20, (the “savings” at 20), is: 20 - E[X ∧ 20]. E[X ∧ 20] = (8 + 10 + 16 + 100)/8 = 16.75. Therefore, the expected amount paid by the town to the state fund per winter is: (1000)(20 - E[X ∧ 20]) = 3250. If x > 50, then the state fund pays Frostbite Falls 1000(x - 50). The expected amount by which x is more than 50, (the inches of snow excess of 50), is: E[X] - E[X ∧ 50]. E[X] = (8 + 10 + 16 + 21 + 35 + 57 + 70 + 90)/8 = 38.375. E[X ∧ 50] = (8 + 10 + 16 + 21 + 35 + 150)/8 = 30. Therefore, the expected amount paid by the state fund to the town per winter is: (1000)(E[X] - E[X ∧ 50]) = (1000)(38.375 - 30) = 8375. Expected amount state fund pays town minus expected amount town pays the state fund is: 8375 - 3250 = 5125. Alternately, one can list what happens in each possible situation: Snow
Paid by State
8 → -12,000;  10 → -10,000;  16 → -4,000;  21 → 0;  35 → 0;  57 → 7,000;  70 → 20,000;  90 → 40,000
Average: 5,125
Comment: (12 + 10 + 4)/8 = 3.250 = 20 - E[X ∧ 20]. (7 + 20 + 40)/8 = 8.375 = E[X] - E[X ∧ 50]. A very simplified example of retrospective rating. See for example, “Individual Risk Rating,” by Margaret Tiller Sherwood in Foundations of Casualty Actuarial Science.
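A few lines of Python (my own check, assuming the eight outcomes are equally likely as stated in the problem) reproduce the 5,125 net payment:

    snow = [8, 10, 16, 21, 35, 57, 70, 90]
    c = 1000

    def lev(limit):                        # empirical E[X ^ limit]
        return sum(min(x, limit) for x in snow) / len(snow)

    mean = sum(snow) / len(snow)
    town_pays_fund = c * (20 - lev(20))            # 3250
    fund_pays_town = c * (mean - lev(50))          # 8375
    print(fund_pays_town - town_pays_fund)         # 5125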
31.31. A. If x < 20, then Frostbite Falls pays the state fund 1000(20 - x). The expected amount by which x is less than 20, (the “savings” at 20), is: 20 - E[X ∧ 20]. E[X
∧
20] = exp(µ + σ2/2)Φ[(ln20 − µ − σ2)/σ] + 20{1 - Φ[(ln20 − µ)/σ]} =
(33.954)Φ[-1.10] + (20){1 - Φ[.40]) = (33.954)(.1357) + (20)(1 - .6554) = 11.50. Therefore, the expected amount paid by the town to the state fund per winter is: (1000)(20 - E[X ∧ 20]) = 8500. If x > 50, then the state fund pays Frostbite Falls 1000(x - 50). The expected amount by which x is more than 50, (the inches of snow excess of 50), is: E[X] - E[X ∧ 50]. E[X] = exp(µ + σ2/2) = 33.954. E[X
∧
50] = exp(µ + σ2/2)Φ[(ln50 − µ − σ2)/σ] + 50{1 - Φ[(ln50 − µ)/σ]} =
(33.954)Φ[-.49] + (50){1 - Φ[1.01]} = (33.954)(.3121) + (50)(1 - .8438) = 18.41. Therefore, the expected amount paid by the state fund to the town per winter is: (1000)(E[X] - E[X ∧ 50]) = (1000)(33.954 - 18.41) = 15,544. Expected amount state fund pays town minus expected amount town pays the state fund is: 15,544 - 8500 = 7,044. Comment: In the following Lee Diagram, other than the constant c = 1000, the expected amount paid by the town to the state (when there is little snow) corresponds to Area A, below a horizontal line at 20 and above the curve. Other than the constant c = 1000, the expected amount paid by the state to the town (when there is a lot of snow) corresponds to Area B, above a horizontal line at 50 and below the curve.
[Lee Diagram omitted: probability on the horizontal axis from 0 to 1, inches of snow on the vertical axis up to 200; Area A lies below the horizontal line at 20 and above the curve, Area B lies above the horizontal line at 50 and below the curve.]
31.32. C. E[(N-3)+] = E[N] - E[N ∧ 3] = λ - (Prob[N = 1] + 2Prob[N = 2] + 3Prob[N ≥ 3]) = λ - λe−λ - λ2e−λ - (3)(1 - e−λ - λe−λ - λ2e−λ/2) = λ + 3e−λ + 2λe−λ + λ2e−λ/2 - 3 = 2.5 + 3e-2.5 + 2(2.5)λe-2.5 + (2.52 )e-2.5/2 - 3 = 0.413. Alternately, E[(N-3)+] = E[(3-N)+] + E[N] - 3 = 3Prob[N = 0] + 2Prob[N = 1] + Prob[N = 2] + λ - 3 = λ + 3e−λ + 2λe−λ + λ2e−λ/2 - 3 = 0.413. 31.33. C. The expected amount by which lifetimes are less than 2 is: 2 - E[X ∧ 2] = 2 - (6)(1 - e-2/6) = .2992. The expected amount paid per battery is: (100)(.2992/2) = 14.96. 31.34. B. For the LogNormal Distribution, E[X
∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}.
E[X ∧ 30 million] = exp(15 + 22 /2)Φ[(ln30000000 - 15 - 22 )/2] + 30000000 {1 - Φ[(ln30000000 - 15)/2]} = 24154953 Φ[-0.89] + 30000000 {1 - Φ[1.11]} = (24,154,953)(.1867) + (30,000,000)(1 - .8665) = 8.51 million. E[X ∧ 150 million] = exp(15 + 22 /2)Φ[(ln150000000 - 15 - 22 )/2] + 150000000 {1 - Φ[(ln150000000 - 15)/2]} = 24154953 Φ[-.09] + 150000000 {1 - Φ[1.91]} = (24,154,953)(.4641) + (150,000,000)(1 - .9719) = 15.43 million. The maximum payment of $90 million correspond to a loss by XYZ of: 30 + 90/.75 = 150 million. Therefore the average payment to XYZ per hurricane is: .75 (E[X ∧ 150 million] - E[X ∧ 30 million]) = (.75)(15.43 - 8.51) = 5.2 million. Comment: The portion of hurricanes on which XYZ receives non-zero payments is: S(30 million) = 1 - Φ[(ln30000000 - 15)/2] = 1 - Φ[1.11] = .1335. Therefore, the average payment per nonzero payment is: (.75)(15.43 - 8.51) /.1335 = 38.9 million. A very simplified version of the Florida Hurricane Catastrophe Fund. 31.35. C. Per loss, the insurer would pay the layer from 5,000 to 25,000, which is: E[X ∧ 25,000] - E[X ∧ 5,000]. For the Pareto: E[X ∧ x] = {θ/(α−1)}{1−(θ/(θ+x))α−1} = 10000 {1 - (40000/(40000+x))4 } . E[X ∧ 25,000] = 10000 {1 - (40/65)4 } = 8566. E[X ∧ 5,000] = 10000 {1 - (40/45)4 } = 3757. E[X ∧ 25,000] - E[X ∧ 5,000] = 8566 - 3757 = 4809. Three losses expected per year, thus the insurerʼs expected payment is: (3)(4809) = 14,427.
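The layer in 31.35 is a one-liner to confirm numerically; a minimal sketch (my own, not part of the original solution):

    # Pareto alpha = 5, theta = 40,000; layer from 5,000 to 25,000; 3 losses per year.
    alpha, theta = 5, 40_000
    def lev(x):
        return theta / (alpha - 1) * (1 - (theta / (theta + x)) ** (alpha - 1))

    print(3 * (lev(25_000) - lev(5_000)))   # about 14,426, matching answer C (14,427 above)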
31.36. E. Without the feature that the insurer pays the entire loss (up to 25,000) for each loss greater than 5,000, the insurer would pay the layer from 5,000 to 25,000, which is: E[X ∧ 25,000] - E[X ∧ 5,000]. As calculated in the solution to the previous question, E[X ∧ 25,000] - E[X ∧ 5,000] = 8566 - 3757 = 4809. However, that extra provision adds 5,000 per large loss, or 5,000(1 - F(5000)) = 5,000(θ/(θ+5000))^α = 5000 (40/45)^5 = 2775. Thus per loss the insurer pays: 5,000(1 - F(5000)) + E[X ∧ 25,000] - E[X ∧ 5,000] = 2775 + 4809 = 7584. There are three losses expected per year, thus the insurerʼs expected payment is: (3)(7584) = 22,752.
31.37. E. The expected losses within the layer 5,000 to 50,000 is: ∫ from 5000 to 50000 of S(x) dx = ∫ from 5000 to 50000 of e^(-x/20000) dx = 13934.
The percent of expected losses within the layer 5,000 to 50,000 is: 13934/20000 = 69.7%. Alternately, for the Exponential Distribution, LER(x) = 1 - e-x/θ. LER(50000) - LER(5000) = e-5000/20000 - e-50000/20000 = e-0.25 - e-2.5 = 69.7%. 31.38. D. E[X] = exp(µ + σ2/2) = e9.9 = 19930. E[X
∧ 5000] = exp(µ + σ2/2)Φ[(ln5000 − µ − σ2)/σ] + 5000{1 - Φ[(ln5000 − µ)/σ]} = (19930)Φ[-1.88] + (5000){1 - Φ[-.88]} = (19930)(.0301) + (5000)(.8106) = 4653.
E[X ∧ 50000] = exp(µ + σ2/2)Φ[(ln50000 − µ − σ2)/σ] + 50000{1 - Φ[(ln50000 − µ)/σ]} = (19930)Φ[.42] + (50000){1 - Φ[1.42]} = (19930)(.6628) + (50000)(1 - .9222) = 17100.
The percent of expected losses within the layer 5,000 to 50,000 is: (E[X ∧ 50,000] - E[X ∧ 5000])/E[X] = (17100 - 4653)/19930 = 62.5%.
31.39. C. E[X ∧ x] = {θ/(α−1)}{1 - (θ/(θ + x))^(α−1)} = 20000{1 - (40000/(40000 + x))^2}.
E[X ∧ 5000] = 4198. E[X ∧ 50,000] = 16,049. E[X] = θ/(α−1) = 20000.
The percent of expected losses within the layer 5,000 to 50,000 is: (E[X ∧ 50,000] - E[X ∧ 5000])/E[X] = (16049 - 4198)/20000 = 59.3%.
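The layer percentages in 31.37 through 31.39 all follow the same pattern; here is a small sketch (my own check) for the Pareto case of 31.39:

    # Percent of expected losses in the layer 5,000 to 50,000; Pareto alpha = 3, theta = 40,000.
    alpha, theta = 3, 40_000
    def lev(x):
        return theta / (alpha - 1) * (1 - (theta / (theta + x)) ** (alpha - 1))

    mean = theta / (alpha - 1)
    print((lev(50_000) - lev(5_000)) / mean)   # about 0.593, i.e. 59.3%, answer C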
31.40. A. E[(N - 3)+] = E[N] - E[N ∧ 3] = β - (Prob[N = 1] + 2Prob[N = 2] + 3Prob[N ≥ 3]) = β - β/(1+β)2 - 2β2/(1+β)3 - 3β3/(1+β)3 = {β(1+β)3 - β(1+β) - 2β2 - 3β3}/(1+β)3 = β4/(1+β)3 = 2.54 /3.53 = 0.911. Alternately, E[(N-3)+] = E[(3-N)+] + E[N] - 3 = 3Prob[N = 0] + 2Prob[N = 1] + Prob[N = 2] + β - 3 = β + 3/(1+β) + 2β/(1+β)2 + β2/(1+β)3 - 3 = 2.5 + 3/3.5 + (2)(2.5)/3.52 + 2.52 /3.53 - 3 = 0.911. Alternately, the Geometric shares the memoryless property of the Exponential ⇒ E[(N-3)+]/Prob[N≥3] = E[N] = β. ⇒ E[(N-3)+] = β Prob[N≥3] = β β3/(1+β)3 = β4/(1+β)3 = 0.911. Comment: For integral j, for the Geometric, E[(N - j)+] = βj+1/(1+β)j. 31.41. E. For Policy A the average payment per loss is: E[X] - E[X ∧ 3000] = θ/(α-1) - (θ/(α-1)){1 - (θ/(θ+3000))α−1} = 6000(12/15)2 = 6000(.64). For Policy B the average payment per loss is: E[X ∧ u] = (θ/(α-1)){1 - (θ/(θ+u))α−1} = 6000{1 - (12000/(12000+u))2 }. Setting this equal to 6000(.64): 6000(.64) = 6000{1 - (12000/(12000+u))2 } ⇒ (12000/(12000+u))2 = .36. ⇒ u = 8000. 31.42. B. E[(25 - X)+ ] = (25 - 5)(80%) + (0)(20%) = 16 > 8. ⇒ y must be less than 25. Therefore, E[(y - X)+ ] = (0.8)(y - 5) = 8. ⇒ d = 15. 31.43. D. E[Y] = E[(1 - X)+] = 1 - E[X
∧ 1] = 1 - θ(1 - e^(-1/θ)) = 1 - 2(1 - e^(-1/2)) = 0.213.
Alternately, E[Y] = ∫ from x = 0 to x = 1 of (1 - x) e^(-x/2)/2 dx = [-e^(-x/2) + x e^(-x/2) + 2e^(-x/2)] evaluated from x = 0 to x = 1 = 2e^(-1/2) - 1 = 0.213.
31.44. D. Since by definition E[R] = 1, the LogNormal Distribution has mean of 1. exp[µ + σ2/2] = 1. ⇒ µ = -σ2/2 = -0.08. Percentage of overall wages earned by workers with R < 2 is: {E[X ∧ 2] - 2S(2)}/E[X] = Φ[(ln2 − µ − σ2)/σ] = Φ[(ln2 + 0.08 - 0.4^2)/0.4] = Φ[1.53] = 93.7%.
Comment: Such wage tables are used to price the impact of changes in the laws governing Workers Compensation Benefits.
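The Geometric result in 31.40, and the comment's closed form E[(N - j)+] = β^(j+1)/(1+β)^j, can be checked by brute force; a minimal sketch of my own (the summation cutoff of 300 is an arbitrary value chosen large enough for convergence):

    beta, j = 2.5, 3
    def geom(k):                                   # Geometric probability function
        return beta**k / (1 + beta)**(k + 1)

    brute = sum(max(k - j, 0) * geom(k) for k in range(300))
    closed_form = beta**(j + 1) / (1 + beta)**j
    print(brute, closed_form)                      # both about 0.911, answer A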
31.45. B. Each loss below 50 is counted as its size, while each of the 19 losses ≥ 50 counts as 50. E[X ∧ 50] = {6 + 7+ 11+ 14 + 15+17+ 18 + 19 + 25+ 29 + 30 + 34 + 40 + 41 + 48 + 49 + (19)(50)} / 35 = (403 + 950)/35 = 38.66. 31.46. C. 100/2% = 5000. If the profit is less then 5000, then Alex gets 100. Thus we want: E[Max[.02 X, 100]] = .02 E[max[X, 5000]]. ∞
E[Max[X, 5000]] = 5000 F(5000) + ∫ from 5000 to ∞ of x f(x) dx = 5000 F(5000) + ∫ from 0 to ∞ of x f(x) dx - ∫ from 0 to 5000 of x f(x) dx
= 5000F(5000) + E[X] - {E[X ∧ 5000] - 5000S(5000)} = 5000 + E[X] - E[X ∧ 5000].
.02 E[Max[X, 5000]] = 100 + 0.02(E[X] - E[X ∧ 5000]).
Alternately, let Y = max[X, 5000]. Y - 5000 = 0 if X ≤ 5000, and Y - 5000 = X - 5000 if X > 5000. Therefore, E[Y - 5000] = E[(X - 5000)+] = E[X] - E[X ∧ 5000].
⇒ E[Y] = 5000 + E[X] - E[X ∧ 5000]. Expected value of Alexʼs pay is: .02E[Y] = 100 + .02(E[X] - E[X ∧ 5000]).
Comment: Similar to SOA M, 11/06, Q.20.
31.47. B. F is a Pareto Distribution with α = 3 and θ = 20,000. E[X] = 20,000/(3 - 1) = 10,000. E[X ∧ 5000] = (10,000)(1 - {20,000/(20,000 + 5000)}2 ) = 3600. Alexʼs expected payment is: 100 + .02(E[X] - E[X ∧ 5000]) = 100 + (.02)(10000 - 3600) = 228. 31.48. C. Γ[3 ; 5] = 1 - e-5(1 + 5 + 52 /2) = .875. Γ[4 ; 5] = 1 - e-5(1 + 5 + 52 /2 +53 /6) = .735. For the Gamma Distribution, E[X
∧
500] = (αθ)Γ[α+1 ; 500/θ] + 500 {1 - Γ[α ; 500/θ]} =
300Γ[4 ; 5] + 500{1 - Γ[3 ; 5]} = (300)(.735) +(500)(1 - .875) = 283. 31.49. A. E[X ∧ x] = αθ/(α - 1) - θ3/{(α - 1)xα−1}. E[L ∧ 170,000] = (3)(100,000)/(3 - 1) - 1000003 /{(3 - 1) 1700003-1} = 132,699. E[(170,000 - L)+] = 170,000 - E[L ∧ 170000] = 170,000 - 132,699 = 37,301. E[Bonus] = E[(170,000 - L)+/4] = 37,301/4 = 9325. Comment: Similar to CAS3, 11/05, Q.22.
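A short sketch (my own, not part of the original solution) of the Single Parameter Pareto bonus calculation in 31.49, using the limited expected value formula quoted there:

    # Single Parameter Pareto, alpha = 3, theta = 100,000; bonus = (170,000 - L)/4 if positive.
    alpha, theta, cap = 3, 100_000, 170_000

    def sp_pareto_lev(x):
        """E[X ^ x] for the Single Parameter Pareto, x >= theta."""
        return alpha * theta / (alpha - 1) - theta**alpha / ((alpha - 1) * x**(alpha - 1))

    expected_bonus = (cap - sp_pareto_lev(cap)) / 4
    print(expected_bonus)    # about 9,325, answer A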
31.50. His bonus is positive when L < 170,000. ⎛ 100,000 ⎞ 3 F(170,000) = 1 - ⎜ ⎟ = 0.79646. ⎝ 170,000 ⎠ E[Bonus | Bonus > 0] = E[Bonus] / Prob[Bonus > 0] = E[Bonus] / F(170,000) = 9325 / 0.79646 = 11,708.
31.51. A. E[X
∧
1000
1000] =
∫
x f(x) dx + 1000 S(1000) = 350.61 + (1000)(1 - 0.87175.) =
0
478.86. Comment: Based on a LogNormal Distribution with µ = 6.0 and σ = 0.8. 31.52. D. For this Pareto Distribution, E[X E[X
∧
2000] = 1224. E[X
∧
∧
x] = (5000/2){1 - 50002 /(5000 + x)2 } .
10000] = 2222. E[X] = θ/(α-1) = 2500.
The average payment per loss is: (75%) E[X ∧ 2000] + (90%)(E[X] - E[X
∧
10000]) = (75%)(1224) + (90%)(2500 - 2222) = 1168.
31.53. E. The mean of the Single Parameter Pareto is: αθ/(α - 1) = (1.5)(20)/(1.5 - 1) = 60. Thus we want the average size of loss for those losses of size greater than 60. E[X ∧ x] = αθ/(α - 1) - θα / {xα−1 (α - 1)}. E[X ∧ 60] = (1.5)(20)/(1.5 - 1) - 201.5 / {601.5-1 (1.5 - 1)} = 36.906. Average size of loss for those losses of size greater than 60 is: {E[X] - (E[X ∧ 60] - 60S(60))}/S(60) = (60 - 36.906)/{(20/60)1.5} + 60 = 180. Taking into account the 50% increase: (1.5)(180) = 270. Alternately, the average size of those losses of size greater than 60 is: ∞ ∞
∫
60
∫
x f(x) dx / S(60) = 60
x 1.5 201.5 x - 2.5 dx (20 / 60)1.5
∞
= (1.5) (601.5)
∫ x - 1.5 dx
60
= (1.5)(601.5) (2)(60-0.5) = 180. Taking into account the 50% increase: (1.5)(180) = 270.
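The 31.53 answer can also be reproduced with the closed-form formulas used in the solution; a minimal sketch (my own check):

    # Single Parameter Pareto, alpha = 1.5, theta = 20; average usage of those above the mean.
    alpha, theta = 1.5, 20.0
    mean = alpha * theta / (alpha - 1)                 # overall average of 60 minutes

    def lev(x):     # E[X ^ x] = alpha*theta/(alpha-1) - theta^alpha / {x^(alpha-1) (alpha-1)}
        return alpha * theta / (alpha - 1) - theta**alpha / (x**(alpha - 1) * (alpha - 1))

    def S(x):       # survival function for x >= theta
        return (theta / x) ** alpha

    avg_above_mean = (mean - (lev(mean) - mean * S(mean))) / S(mean)   # 180
    print(1.5 * avg_above_mean)                                        # 270, answer E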
31.54. The contribution of the small losses, those losses of size less than x is: E[X ∧ x] - x S(x). E[X ∧ x] - x S(x) The percentage of loss dollars from those losses of size less than x is: . E[X] For the LogNormal Distribution, E[X
∧
x] - x S(x) = exp[µ + σ2/2] Φ[
Thus for the LogNormal Distribution, G(x) =
ln[x] - µ - σ 2 ]. σ
E[X ∧ x] - x S(x) ln[x] - µ - σ 2 = Φ[ ]. E[X] σ
G(x) is also LogNormal with parameters: µ + σ2, and σ.
31.55. 0.25 = 1 - exp[-Q0.25/θ]. ⇒ Q0.25 = θ ln[4/3]. 0.75 = 1 - exp[-Q0.75/θ]. ⇒ Q0.75 = θ ln[4].
∫ from x = θ ln[4/3] to x = θ ln[4] of x exp[-x/θ]/θ dx = [-x exp[-x/θ] - θ exp[-x/θ]] evaluated between those limits = θ ln[4/3] (3/4) + θ (3/4) - θ ln[4]/4 - θ/4 = θ{1/2 + ln[4]/2 - ln[3] 3/4}.
One half of the total probability is between the first and third quartile.
Trimmed Mean = θ{1/2 + ln[4]/2 - ln[3] 3/4} / (1/2) = θ{1 + ln[4] - ln[3] 3/2} = 0.7384 θ.
Alternately, E[X ∧ x] = θ (1 - e^(-x/θ)). E[X ∧ Q0.25] = 0.25 θ. E[X ∧ Q0.75] = 0.75 θ.
The average size of those losses of size between Q0.25 and Q0.75 is:
[{E[X ∧ Q0.75] - Q0.75 S(Q0.75)} - {E[X ∧ Q0.25] - Q0.25 S(Q0.25)}] / [F(Q0.75) - F(Q0.25)]
{0.75θ - (ln[4]θ)(0.25)} - {0.25θ - (ln[4 / 3]θ)(0.75)} θ {0.5 + 0.5 ln[4] - 0.75 ln[3]} = = 0.75 - 0.25 1/2 θ{1 + ln[4] - ln[3] 3/2} = 0.7384 θ. Comment: Here the trimmed mean excludes 25% probability in each tail. One could instead for example exclude 10% probability in each tail The trimmed mean could be applied to a small set of data in order to estimate the mean of the distribution from which the data was drawn. For a symmetric distribution such as a Normal Distribution, the trimmed mean would be an unbiased estimator of the mean. If instead you assumed the data was from a skewed distribution such as an Exponential, then the trimmed mean would be a biased estimator of the mean. If the data was drawn from an Exponential, then the trimmed mean divided by 0.7384 would be an unbiased estimator of the mean. The trimmed mean would be a robust estimator; it would not be significantly affected by unusual values in the sample. In contrast, the sample mean can be significantly affected by one unusually large value in the sample.
31.56. For α = 1, S(x) = θ/(θ + x).
E[X ∧ x] = ∫ from 0 to x of S(t) dt = ∫ from 0 to x of θ/(θ + t) dt = θ ln(θ + t) evaluated from t = 0 to t = x = θ ln(θ+x) - θ ln(θ) = -θ ln[θ/(θ + x)].
∫
Comment: The mean only exists if α > 1. However, since the values entering its computation are limited, the limited expected value exists as long as α > 0. 31.57. D. (S - 150)+ - (S - 100)+ is the amount in the layer from 100 to 150 on the index. This is 1/(100 million) times the layer from 10 billion to 15 billion on catastrophe losses. (100 million times 150 is 15 billion.) Thus the payment on the spread is 1/500,000 times the layer from 10 billion to 15 billion on catastrophe losses. ⎡ ln(x) − µ − σ2 ⎤ ⎡ ln(x) − µ ⎤ For the LogNormal, E[X ∧ x] = exp(µ + σ2/2) Φ ⎢ + x {1 Φ ⎥⎦ ⎢⎣ ⎥⎦ } σ σ ⎣ E[X ∧ 10 billion] = ln[10 billion] - 20 - 22 ln[10 billion] - 20 exp[20 + 22 /2] Φ[ ] + (10 billion) {1 - Φ[ ]} = 2 2 (3.5849 billion) Φ[-0.49] + (10 billion) {1 - Φ[1.51]} = (3.5849 billion) (0.3121) + (10 billion) {1 - 0.9345} = 1.774 billion. E[X ∧ 15 billion] = ln[15 billion] - 20 - 22 ln[15 billion] - 20 exp[20 + 22 /2] Φ[ ] + (15 billion) {1 - Φ[ ]} = 2 2 (3.5849 billion) Φ[-0.28] + (15 billion) {1 - Φ[1.72]} = (3.5849 billion) (0.3897) + (15 billion) {1 - 0.9573} = 2.038 billion. E[X ∧ 15 billion] - E[X ∧ 10 billion] = (2.038 billion - 1.774 billion) / 500,000 = 528. 500,000 Comment: Not intended as a realistic model of catastrophe losses. Catastrophe losses would be from hurricanes, earthquakes, etc. An insurer could hedge its catastrophe risk by buying a lot of these or similar call spreads. An insurer who owned many of these call spreads, would be paid money in the event of a lot of catastrophe losses in this region for the insurance industry. This should offset to some extent the insurerʼs own losses due to these catastrophes, in a manner somewhat similar to reinsurance. 528 is the amount expected to be paid by someone who sold one of these calls (in other words owned a put.) The probability of paying anything is low, but this person who sold a call could pay up to a maximum of: (200)(50) = 10,000.
31.58. The small values each contribute Qα. Their total contribution is αQα. The large values each contribute Q1−α. Their total contribution is αQ1−α . The medium values each contribute their value x. Q1-α
Their total contribution is:
∫ Q
x f(x) dx =
α
E[X ∧ Q1−α] - Q1−α S(Q1−α) - {E[X ∧ Qα] - Qα S(Qα)} = E[X ∧ Q1−α] - αQ1−α - {E[X ∧ Qα] - (1-α)Qα}. Thus adding up the three contributions, the Windsorized mean is: αQ α + αQ 1−α + E[X ∧ Q1−α] - αQ1−α - {E[X ∧ Qα] - (1-α)Qα} = E[X ∧ Q1−α] - E[X ∧ Qα] + Qα. For the Exponential, Qα = -θ ln(1 - α). E[X ∧ x] = θ (1 - e-x/θ).
Q 1−α = -θ ln(α).
E[X ∧ Qα] = θα.
E[X ∧ Q1−α] = θ(1-α).
Thus the Windsorized mean is: θ(1-α) - θα - θ ln(1 - α) = θ {1 - 2α - ln(1-α)}. Comment: The trimmed mean excludes probability in each tail. In contrast, the Windsorized mean substitutes for extreme values the corresponding quantile. The Windsorized mean could be applied to a small set of data in order to estimate the mean of the distribution from which the data was drawn. For example if α = 10%, then all values below the 10th percentile are replaced by the 10th percentile, and all values above the 90th percentile are replaced by the 90th percentile, prior to taking an average. For a symmetric distribution such as a Normal Distribution, the Windsorized mean would be an unbiased estimator of the mean. If instead you assumed the data was from a skewed distribution such as an Exponential, then the Windsorized mean would be a biased estimator of the mean. The Windsorized mean would be a robust estimator; it would not be significantly affected by unusual values in the sample. In contrast, the sample mean can be significantly affected by one unusually large value in the sample. For the Exponential, here is a graph of the Windsorized mean divided by the mean, in other words the Windsorized mean for θ = 1, as a function of alpha:
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Wind. Mean 1.0 0.9 0.8 0.7 0.6
alpha 0.1 0.2 0.3 0.4 0.5 As alpha increases, we are substituting for more of the values in the tails.
Page 458
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 459
31.59. Using the formula for VaR for the Pareto, πp = θ {(1-p)-1/α - 1}. π 0.05 = θ {0.95-1/α - 1}. E[X ∧ x] =
π 0.95 = θ {0.05-1/α - 1}.
⎛ θ ⎞ α− 1⎫ θ ⎧ ⎨1 - ⎜ ⎟ ⎬ , α ≠ 1. ⎝ θ + x⎠ α −1 ⎩ ⎭
θ ⎧ 1 θ ⎛ ⎞ α −1⎫ {1 - 0.951- 1/ α } . E[X ∧ π0.05] = = ⎨1 - ⎝ ⎬ 1/ α ⎠ α −1 ⎩ α − 1 0.95 ⎭ E[X ∧ π0.95] =
θ ⎧ 1 θ ⎛ ⎞ α −1⎫ {1 - 0.051- 1/ α } . = ⎨1 - ⎝ ⎬ 1/ α ⎠ α −1 ⎩ α − 1 0.05 ⎭
The trimmed mean, the average size of those losses of size between π0.05 and π0.95 is: {E[X ∧ π 0.95 ] - π 0.95 S(π0.95)} - {E[X ∧ π0.05 ] - π 0.05 S(π0.05 )} = F(π 0.95 ) - F(π 0.05) {
θ (0.951-1/α - 0.051-1/α) + 0.95 π0.05 - 0.05 π0.95} / 0.9 = α −1
θ{
1 (0.951-1/α - 0.051-1/α) + 0.951-1/α - 0.051-1/α - 0.9} / 0.9 = α −1
α (0.951- 1/ α - 0.051- 1/ α ) θ{ - 1}, α ≠ 1. (0.9) (α -1) For α = 1, E[X ∧ x] = - θ ln[
θ ]. θ+x
π 0.05 = θ {0.95-1 - 1} = θ/19.
π 0.95 = θ {0.05-1 - 1} = 19θ.
E[X ∧ π0.05] = θ ln(20/19).
E[X ∧ π0.95] = θ ln(20).
Therefore, the trimmed mean is: θ {ln(20) - ln(20/19) + 0.95/19 - (0.05)(19)} / 0.9 = 2.2716 θ. Comment: Here the trimmed mean excludes 5% probability in each tail. One could instead for example exclude 10% probability in each tail. Even though we have excluded an equal probability in each tail, for the positively skewed Pareto Distribution, the trimmed mean is less than the mean. As α approaches 1, the mean approaches infinity, while the trimmed mean approaches 2.2716 θ. Here is a graph of the ratio of the trimmed mean to the mean:
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 460
Trimmed Mean over Mean 0.8 0.7 0.6 0.5 0.4 0.3 0.2 2
4
6
8
10
alpha
For example, for α = 3, the trimmed mean is 0.384436θ , while the mean is θ/2; for α = 3 the ratio of the trimmed mean to the mean is 0.768872. 31.60. C. For a loss of size x, the insurer pays 0 if x < 10, and x - 10 if 100 ≥ x ≥ 10. (There are no losses greater than 100.) The average payment, excluding from the average small losses on which the insurer makes no payment is: 100
100
100
100
∫(x-10)f(x) dx / ∫f(x) dx = ∫(x-10)(.015-.0001x) dx / ∫.015-.0001x dx = 32.4 / .855 = 37.9. 10
10
10
10
∞
Alternately, S(10) =
∫f(x) dx = ∫.015-.0001x dx = .855.
10 ∞
E[X] =
100
10 100
∫xf(x) dx = ∫x(.015-.0001x) dx = 41.67.
0
0 10
E[X
∧
∫
10
∫
10 ] = xf(x) dx + 10S(10) = x(.015-.0001x) dx = .72 + 8.55 = 9.27. 0
0
Average payment per payment is: (E[X] - E[X
∧
10 ])/S(10) = (41.67 - 9.27)/.855 = 37.9.
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 461
31.61. C. e(x) = (losses excess of x)/(claims excess of x) = (E[X] - E[X ∧ x]) / S(x). Therefore, E[X ∧ x] = E[X] - e(x){1 - F(x)}. 31.62. D. 1. False (not true), For small samples, either of the two methods may have smaller variance. For large samples, the method of maximum likelihood has the smallest variance. 2. True. This is the definition of the Limited Expected Value. 3. True. 31.63. E.
∞
Expected amount paid per loss =
∞
100
∫ x f(x) dx = ∫ x f(x) dx - ∫ x f(x) dx =
100
0
0
Mean - {E[X ∧ 100] - 100(1-F(100)}. 1-F(100) = (θ/(θ+100))2 = (1000/1100)2 = .8264. E[X ∧ 100] = {θ/(α−1)} {1 - (θ/(θ+100))α−1} = {1000/(2-1)} {1 - (1000/1100)2-1} = 90.90. Mean = θ/(α−1) = 1000. Therefore, Expected amount paid per loss = 1000 - {90.90 - 82.64} = 991.74. Expect 10 losses per year, so the average cost per year is: (10)(991.5) = $9915. Alternately, the expected cost per year of 10 losses is: ∞
10
∞
∫ x f(x) dx = (10)(2)(10002)∫ x (1000+x)-3 dx =
100
100 x=∞
∞
∫
107 {-x (1000+x)-2} ] + 107 (1000+x)-2 dx = 107 {100/11002 + 1/1100} = 9917. x=100
100
Alternately, the average severity per loss > $ 100 is: 100 + e(100) = 100 + (θ+100)/(α -1) = 1100 + 100 = $1200. Expected number of losses > $100 = 10(1-F(100)) = 8.2645. Expected annual payment = $1200(8.2645) = $9917. Comment: This is the franchise deductible.
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 462
31.64. C. For the LogNormal: E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}. E[X ∧ 100,000] = exp(10 + 12/2)Φ[(ln100000 − 10 − 12 )/1] + 100000{1 - Φ[(ln100000 - 10)/1]} = e10.5Φ(.51) + 100000(1 - Φ(1.51)) = 36,316(.6950) + 100,000(1 - .9345) = 31,790. E[X ∧ x] - xS(x) = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ]. E[X ∧ 50,000] - 50,000S(50000) = e10.5Φ[(ln50000 - 10 - 12 )/1] = 36,316 Φ(-.18) = 36,316(.4286) = 15,565. Without the feature that the insurer pays the entire loss (up to $100,000) for each loss greater than $50,000, the insurer would pay the layer from 50,000 to 100,000, which is E[X ∧ 100,000] - E[X ∧ 50,000]. That extra provision adds 50,000 per large loss, or 50,000 S(50000). Thus the insurer pays: 50,000S(50000) + E[X ∧ 100,000] - E[X ∧ 50,000] = E[X ∧ 100,000] - {E[X ∧ 50,000] - 50,000S(50000)} = 31,790 - 15,565 = 16,225. Alternately, the insurer pays for all dollars of loss in the layer less than $100,000, except it pays nothing for losses of size less than $50,000. The former is: E[X ∧ 100,000]; the latter is: E[X ∧ 50,000]-50,000S(50000). Thus the insurer pays: E[X ∧ 100,000] - {E[X ∧ 50,000] - 50,000S(50000)}. Proceed as above. Alternately, the insurer pays all dollars for losses greater than $50,000, except it pays nothing in the layer above $100,000. The former is: E[X] - {E[X ∧ 50,000]-50,000(S(50000)}; the latter is: E[X] - E[X ∧ 100,000]. Thus subtracting the two values the insurer pays: E[X ∧ 100,000] - { E[X ∧ 50,000] - 50,000(S(50000) }. Proceed as above. Alternately, the insurer pays all dollars for losses greater than $50,000 and less than $100,000, and pays $100,000 per loss greater than $100,000. The former is: {E[X ∧ 100,000]-100,000(1-F(100000))} - {E[X ∧ 50,000]-50,000S(50000)}; the latter is: 100,000S(100,000). Thus adding the two contributions the insurer pays: E[X ∧ 100,000] - { E[X ∧ 50,000] - 50,000S(50000)}. Proceed as above.
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 463
31.65. E. A loss ratio of 70% corresponds to (.7)(500000) = $350,000 in losses. If the losses are x, and x < 350,000, then the agent gets a bonus of (1/3)(350,000 - x). On the other hand, if x ≥ 350,000, then the bonus is zero. Therefore the expected bonus is: 350000
∫
(1/3) (350000 -x) f(x)dx = (1/3)(350000) 0
350000
350000
∫ f(x)dx - (1/3) ∫ xf(x)dx = 0
0
(1/3)(350000)F(350000) - (1/3){E[X ∧ 350000] - 350000S(350000)} = (1/3){350000 - (E[X ∧ 350000] }. The distribution of losses is Pareto with α = 3 and θ = 600,000. Therefore, E[X ∧ 350000] = (θ/(α-1)) {1 - (θ/(θ + 350000))α−1} = (600000/2)(1 - (600/950)2 ) = 180,332. Therefore, the expected bonus is: (1/3) (350000 - 180,332) = 56,556. Alternately, the expected amount by which losses are less than y is: y - E[L ∧ y]. Therefore, expected bonus = (1/3)(expected amount by which losses are less than 350000) = E[(350000 - L)+)]/3 = (1/3)(350000 - E[L ∧ 350000]). Proceed as before. Alternately, his losses must be less than 350,000 to receive a bonus. S(350,000) = (600/(350 + 600))3 = .25193 = probability that he receives no bonus. The mean of the "small" losses (< 350,000) = {E[L ∧ 350000] - 350000S(350000)}/F(350000) = (180,332 - (350,000)(.25193))/(1 - .25193) = 123,192. 123,192 / 500,000 = 24.638%, is the expected loss ratio when he gets a bonus. Therefore, the expected bonus when he gets a bonus is: 500,000(70% - 24.638%)/3 = 75,603. His expected overall bonus is: (1 - .25193)(75,603) + (.25193)(0) = 56,556. Comment: Note that since if x ≥ 350, 000 the bonus is zero, we only integrate from zero to 350,000. Therefore, it is not the case that E[Bonus] = (1/3)(350,000 - E[X]). 31.66. C. Let total dollars of claims be A. Let B = the Bonus. Then B = (500-A)/2 if A < 500 and 0 if A ≥ 500. Let y = A if A < 500 and 500 if A ≥ 500. Then E[y] = E[A ∧ 500]. 2B + y = 500, regardless of A. Therefore 2E[B] + E[y] = 500. Therefore E[B] = (500 - E[A ∧ 500])/2 = 250 - E[A ∧ 500]/2. For the Pareto Distribution, E[X] = θ/(α-1), and E[X ∧ x] = {θ/(α-1)}{1- (θ/(x+θ))α−1}. For the revised model, E[A ∧ 500] = K{1- (K/(500+K))} = 500K / (500 + K). Thus for the revised model, E[B] = 250 - 250K/(500 + K) = 125,000/(500 + K). Expected aggregate claims under the revised model are: K/(2-1) = K. Expected aggregate claims under the previous model are: 500/(2-1) = 500. So we are given that: K + 125,000/(500 + K) = 500. 500K + K2 + 125000 = 250000 + 500K. K2 = 125000. K = 353. Comment: The expected amount by which claims are less than 500 is: E[(500 - A)+)] = 500 - E[A ∧ 500].
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 464
31.67. E. A loss ratio of 60% corresponds to: (60%)(800000) = 480,000 in losses. For the Pareto distribution, E[X
∧
x] = {θ/(α−1)}{1−(θ/(θ+x))α−1}.
E[X ∧ 480000] = {(500000)/(2-1)}(1 - (500000/(480000 + 500000))2 - 1) = 244,898. If losses are less than 480,000 a bonus is paid. Bonus = (15%)(amount by which losses < 480,000). Expected bonus = (15%)E[(480000 - L)+] = (15%){480,000 - E[L
∧
480000]} = (15%)(480000 - 244898) = 35,265.
31.68. B. From the previous solution, his expected bonus is: 35,265. He gets a bonus when the aggregate losses are less than 480,000. The probability of this is: F(480,000) = 1 - {500/(500 + 480)}2 = .73969. Expected value of Huntʼs bonus, given that Hunt receives a (positive bonus) is: 35,265/.73969 = 47,675. Comment: This question asks about an analog to the expected payment per (non-zero) payment, while the exam question asks about an analog to the expected payment per loss. In this question we only consider situations where the bonus is positive, while the exam question includes those situations where the bonus is zero. 31.69. E. The expected losses within the layer 1,000 to 10,000 is: 10000
10000
x=10000
∫S(x) dx = ∫106/(x + 103)2 dx = -106/(x + 103)] = 106(1/2000 - 1/11000) = 1000(1/2 - 1/11). 1000
1000
x=1000
∞
∞
∫
x=∞
∫
E[X] = S(x) dx = 106 /(x + 103 )2 dx = -106 /(x + 103 )] = 1000. 0
0
x=0
Therefore the percent of expected losses within the layer 1,000 to 10,000 is: 1000(1/2 - 1/11)/1000 = 1/2 - 1/11 = 40.9%. Alternately, this is a Pareto Distribution with α = 2 and θ = 1000. E[X
∧
x] = {θ/(α−1)}{1 - (θ/(θ + x))α−1} = 1000{1 - 1000/(1000 + x)} = 1000x/(1000 + x).
E[X
∧
1000] = 500. E[X
∧
10,000] = 909. E[X] = θ/(α−1) = 1000.
The percent of expected losses within the layer 1,000 to 10,000 is: (E[X ∧ 10,000] - E[X ∧ 1000])/E[X] = (909 - 500)/1000 = 40.9%. 31.70. A. E[X
∧
x] = {θ/(α-1)}{1 - (θ/(θ+x))α−1}. E[X
100 = E[B] = c(400 - E[X
∧
∧
400] = 300(1 - 3/7) = 171.43.
400]) = c(400 - 171.43) = c228.57. ⇒ c = 100/228.57 = 0.4375.
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 465
31.71. C. At 5100 in loss, the insured pays: 250 + (25%)(2250 - 250) + (5100 - 2250) = 3600. ⇒ For annual losses > 5100, the insured pays 5% of the amount > 5100. ⇒ The insurer pays: 75% of the layer from 250 to 2250, 0% of the layer 2250 to 5100, and 95% of the layer from 5100 to ∞.
∧ E[X ∧ E[X ∧ E[X ∧ E[X
x] = {θ/(α-1)}{1 - (θ/(θ+x))α−1} = (2000){1 - 2000/(2000 + x)} = 2000x/(2000 + x). 250] = (2000)(250)/(2000 + 250) = 222. 2250] = (2000)(2250)/(2000 + 2250) = 1059. 5100] = (2000)(5100)/(2000 + 5100) = 1437.
E[X] = θ/(α-1) = 2000/(2 - 1) = 2000. The expected annual plan payment: (75%)(E[X ∧ 2250] - E[X ∧ 250]) + (95%)(E[X] - E[X ∧ 5100]) = (75%)(1059 - 222) + (95%)(2000 - 1437) = 1163. Comment: Provisions are similar to those in the 2006 Medicare Prescription Drug Program. Here is a detailed breakdown of the layers of loss: Layer Expected Losses in Layer Insured Share Insurer Share 5100 to ∞
563
5%
95%
2250 to 5100
378
100%
0%
250 to 2250
837
25%
75%
0 to 250
222
100%
0%
Total
2000
E[X] - E[X ∧ 5100] = 2000 - 1437 = 563. E[X ∧ 5100] - E[X ∧ 2250] = 1437 - 1059 = 378. E[X ∧ 2250] - E[X ∧ 250] = 1059 - 222 = 837. E[X ∧ 250] = 222. For example, for an annual loss of 1000, insured pays: 250 + (25%)(1000 - 250) = 437.5, and insurer pays: (75%)(1000 - 250) = 562.5. For an annual loss of 4000, insured pays: 250 + (25%)(2250 - 250) + (4000 - 2250) = 2500, and insurer pays: (75%)(2250 - 250) = 1500. For an annual loss of 8000, insured pays: 250 + (25%)(2250 - 250) + (5100 - 2250) + (5%)(8000 - 5100) = 3745, and insurer pays: (75%)(2250 - 250) + (95%)(8000 - 5100) = 4255. 31.72. C. For this Pareto, E[L ∧ 650000] = {600000/(3 - 1)}{1 - (600000/(650000 + 600000)2 } = 230,880. E[(650,000 - L)+] = 650,000 - E[L ∧ 650000] = 650,000 - 230,880 = 419,120. E[Bonus] = E[(650,000 - L)+/3] = 419,120/3 = 139,707.
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 466
31.73. C. A constant force of mortality is an Exponential Distribution. Variance = θ2 = 100. ⇒ θ = 10. E[T
∧
10] = θ(1 - e-10/θ) = (10)(1 - e-1) = 6.32.
31.74. B. Teller X completes on average 6 transactions per hour, while Teller Y completes on average 4 transactions per hour. (6)(6 + 4) = 60 transactions by tellers expected in total. 1/3 of all transactions are deposits, and therefore we expect 20 deposits. Expected number of deposits handled by tellers: 20 F(7500). Average size of those deposit of size less than 7500 is: {E[X ∧ 7500] - 7500S(7500)}/F(7500). Expected total deposits made through the tellers each day: 20{E[X ∧ 7500] - 7500S(7500)7500]} = 20{(5000/2)(1 - (5/12.5)2 ) - 7500(5/12.5)3 } = (20){2100 - (7500)(.064)} = 32,400. Comment: While the above is the intended solution of the CAS, it is not what I would have done to solve this poorly worded exam question. Let y be total number of deposits expected per day. Then we expect S(7500)y deposits to be handled by the manager, and F(7500)y deposits to be handled by the tellers. Expect 60 - F(7500)y non-deposits to be handled by the tellers. 1/3 of all transactions are deposits, presumably including those handled by the manager. {60 + S(7500)y}/3 = y. ⇒ y = 60/{3 - S(7500)}. Expected number of deposits handled by tellers: F(7500)y = F(7500) 60/{3 - S(7500)}. Multiply by the average size of those deposit of size less than 7500: (F(7500) 60/{3 - S(7500)}) {E[X ∧ 7500] - 7500S(7500)}/F(7500) = 60{E[X ∧ 7500] - 7500S(7500)}/(3 - S(7500)) = (60){2100 - (7500)(.064)}/(3 - .064) = 33,106. Resulting in a different answer than the intended solution.
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 467
31.75. B. 3%/75% = 4%. If the index return is less then 4%, then the depositor gets 3%. Thus we want: E[Max[.75 X, 3%]] = 75% E[max[X, 4%]]. ∞
E[max[X, 4%]] = 4F(4) +
∫
4
∞
xf(x)dx = 4F(4) +
∫
0
4
xf(x)dx -
∫ xf(x)dx 0
= 4F(4) + E[X] - {E[X ∧ 4] - 4S(4)} = 4 + E[X] - E[X ∧ 4] = 4 + 8 - (-0.58) = 12.58. 75% E[max[X, 4%]] = (75%)(12.58%) = 9.43%. Alternately, let Y = max[X, 4]. Then Y - 4 = 0 if X ≤ 4, and Y - 4 = X - 4 if X > 4. Therefore, E[Y - 4] = E[(X - 4)+] = E[X] - E[X ∧ 4].
⇒ E[Y] = 4 + E[X] - E[X ∧ 4] = 4 + 8 - (-0.58) = 12.58. (75%)(12.58%) = 9.43%. Alternately, as discussed in “Mahlerʼs Guide to Risk Measures” for the Normal Distribution: TVaRp [X] = µ + σ φ[zp ] / (1 - p). We are interested in the tail value at risk for a 4% interest rate. For the Normal with mean 8% and standard deviation 16%, 4% corresponds to: zp = (4% - 8%) / 16% = -0.25. ⇒ p = 1 - 0.5987 = 0.4013. Therefore, TVaR = 0.08 + 0.16 {exp[-(-0.25)2 /2] / 2 π } / 0.5987 = 0.1833. Now 40.13% of the time the return on the equity index is less than 4%, while the remaining 59.87% of the time the return is greater than 4%. Therefore, the expected annual crediting rate is: (75%) {(40.13%)(4%) + (59.87%)(0.1833)} = 9.43%. Given the table of limited expected values, this alternate solution is harder. Comment: In general, Min[X, 4] + Max[X, 4] = X + 4. Therefore, E[Max[X, 4]] = E[X] + 4 - E[X ∧ 4].
2013-4-2,
Loss Distributions, §31 Limited Expected Values
HCM 10/8/12,
Page 468
31.76. C. Let X be such that Michael just has paid 10,000 in out-of-pocket repair costs: 10000 = 1000 + (20%)(6000 - 1000) + (X - 6000). ⇒ X = 14,000. Thus the insurance pays 80% of the layer from 1000 to 6000, plus 90% of the layer above 14,000. For this Pareto Distribution, E[X ∧ x] = 5000{1 - 5000/(5000 + x)} = 5000x/(5000 + x). E[X
∧
1000] = 833. E[X
∧
6000] = 2727. E[X
∧
14000] = 3684. E[X] = θ/(α-1) = 5000.
Expected annual payment by the insurer is: 80%(E[X ∧ 6000] - E[X ∧ 1000]) + 90%(E[X] - E[X
∧
14000]) =
80%(2727 - 833) + 90%(5000 - 3684) = 2700. Comment: Similar to SOA3, 11/04, Q.7. Here is a detailed breakdown of the layers of loss: Layer Expected Losses in Layer Michaelʼs Share
Insurer Share
14,000 to ∞
5000 - 3684 = 1316
10%
90%
6000 to 14,000
3684 - 2727 = 957
100%
0%
1000 to 6000
2727 - 833 = 1894
20%
80%
100%
0%
0 to 1000 Total
833 5000
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 469
Section 32, Limited Higher Moments One can get limited higher moments in a manner parallel to the limited expected value. Just as the limited expected value at u, E[X ∧ u] , is the first moment of data limited to u, the limited second moment, E[(X ∧ u)2 ], is the second moment of the data limited to u. First limit the losses, then square, then take the expected value.
Exercise: Prob[X = 2] = 70%, and Prob[X = 9] = 30%. Determine E[X ∧ 5] and E[(X ∧ 5)2 ]. [Solution: E[X ∧ 5] = (70%)(2) + (30%)(5) = 2.9. E[(X ∧ 5)2 ] = (70%)(22 ) + (30%)(52 ) = 10.3. Comment: Var[X ∧ 5] = 10.3 - 2.92 = 1.89.] As with the limited expected value, one can write the limited second moment as a contribution of small losses plus a contribution of large losses: E[(X ∧ u)2 ] =
u
∫ t2 f(t) dt + S(u) u2.
0
The losses of size larger than u, each contribute u2 , while the losses of size u or less, each contribute their size squared. E[(X ∧ u)2 ] can be computed by integration in the same manner as the moments and Limited Expected Values. As shown in Appendix A attached to the exam, here are the formulas for the limited higher moments for some distributions:219 E[(X ∧ x)n ]
Distribution
n! θn Γ(n+1 ; x/θ) + xn e-x/θ
Exponential Pareto
{n! θn Γ(α−n) / Γ(α)} β[n+1, α−n ; x/(θ+x)] + xn (θ/(θ+x))α
Gamma
{θn Γ(α+n) Γ(α+n; x/θ) / Γ(α)} + xn {1- Γ(α; x/θ) }
LogNormal
⎡ ln(x) − µ − n σ2 ⎤ ⎡ ln(x) − µ ⎤ exp[nµ +.5 n2 σ2] Φ⎢ + xn {1- Φ⎢ ⎥ ⎥⎦ } ⎣ ⎦ ⎣ σ σ
Weibull
θn Γ(1 + n/τ) Γ(1 +n/τ ; (x/θ)τ) + xn exp[-(x/θ)τ]
Single Parameter Pareto
α θn n θα , x ≥ θ. α - n (α - n) x α - n
219
The formula for the limited moments of the Pareto involving Incomplete Beta Functions, reduces to the formula shown subsequently for n=2. However, it requires integration by parts and a lot of algebraic manipulation.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 470
One obtains the Limited Expected Value by setting n = 1, while one obtains the limited second moment for n = 2.220 Distribution
E[(X ∧ x)2 ]
Exponential
2θ2 - 2θ2e-x/θ - 2θxe-x/θ
Single Parameter Pareto
α θ2 2 θα , x ≥ θ. α - 2 (α - 2) x α - 2
Pareto
2 θ2 {1 - (1 + x / θ)1 - α (1 + (α -1)x / θ)} (α − 1) (α − 2)
LogNormal
exp[2µ + 2σ2] Φ
[ ln(x) − σµ −
2σ2
] + x2 {1 - Φ[ ln(x)σ − µ]}
Exercise: For a LogNormal Distribution with µ = 7 and σ = 0.5, what is the E[(X ∧ 1000)2 ]? [Solution: E[(X ∧ 1000)2 ] = e14.5 Φ[{ln(1000) - 7.5} / 0.5] + 10002 {1 - Φ[{ln(1000) - 7} / 0.5] } = 1,982,759 Φ[-1.184] + 1,000,000{1 - Φ[-0.184]} = 1,982,759 (0.1182) + 1,000,000(1 - 0.4270) = 807,362.] Generally, E[(X ∧ u)2 ] is less than E[X2 ]. For low censorship points u or more skewed distributions the difference can be quite substantial. For example, in the above exercise, E[X2 ] = exp[2µ + 2σ2] = e14.5 = 1,982,759, while E[(X ∧ 1000)2 ] = 807,362. Gamma Distribution: For the Gamma Distribution: E[(X ∧ x)2 ] = θ2 α(α+1) Γ(α+2; x/θ)+ x2 {1- Γ(α; x/θ)}. Using Theorem A.1 in Appendix A of Loss Models, Γ(3; x/θ) = 1 - e-x/θ - (x/θ)e-x/θ - (x/θ)2 e-x/θ /2. Also Γ(1; x/θ) = 1 - e-x/θ. Thus for the Exponential, which is a Gamma for α = 1, E[(X ∧ x)2 ] = 2θ2 Γ(3; x/θ) + x2 {1 - Γ(α; x/θ)} = 2θ2{1 - e-x/θ - (x/θ)e-x/θ - (x/θ)2 e-x/θ /2} + x2 e-x/θ = 2θ2 - 2θ2e-x/θ - 2θxe-x/θ. 220
The limited second moments of a Exponential and Pareto are not shown in Loss Models in these forms, but as shown below these formulas are correct.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 471
Exercise: For an Exponential Distribution with θ = 10, what is the E[(X ∧ 30)2 ]? [Solution: E[(X ∧ x)2 ] = 2θ2 - 2θ2e-x/θ - 2θxe-x/θ. E[(X ∧ 30)2 ] = 200 - 200e-3 - 600e-3 = 160.2.] Second Limited Moment in Terms of Other Quantities of Interest: It is sometimes useful to write E[(X ∧ u)2 ] in terms of the Survival Function, the Excess Ratio R(x) or the Loss Elimination Ratio LER(x) as follows. Using integration by parts and the fact that the integral of f(x) is -S(x): E[(X ∧ u)2 ] =
u
t= u 2 t
0
t= 0
∫ t2 f(t) dt + S(u) u2 = -S(t) ]
u
+
u
∫0 S(t) 2t dt + S(u)u2 = ∫0 S(t) 2t dt .
In particular, for u = ∞, one can write the second moment as twice the integral of the survival function times x:221 ∞
E[X2 ] = 2
∫0 S(t) t dt. u
More generally, E[(X ∧ u)n ] =
∫0 n tn - 1S(t) dt .
222
∞
For u = ∞, E[Xn ] =
∫0 n tn - 1S(t) dt .
Using integration by parts and the fact that the integral of S(t) is LER(x) µ : ∞
E[(X ∧ u)2 ] = 2
221
t =u
u
∫0 S(t) t dt = 2xµ LER(x)t ]= 0 - 2µ ∫0 LER(t) dt .
See formula 3.5.3 in Actuarial Mathematics. Recall that the mean can be written as an integral of the survival function. One can proceed in the same manner to get higher moments in terms of integrals of the survival function times a power of x. 222 The form shown here is true for distributions with support x > 0. More generally, the nth limited moment is the sum of an integral from -∞ to 0 of - n tn-1F(t) and an integral from 0 to L n tn-1S(t). See Equation 3.9 in Loss Models.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments u
E[(X ∧ u)2 ] = 2µ {LER(u)u -
HCM 10/8/12,
Page 472
u
∫0 LER(t) dt } = 2µ { ∫0 R(t) dt - R(u)u}.
So for example for the Pareto distribution: µ = θ / (α−1), R(x) = {θ/(θ+x)}α−1. u
∫0 R(t) dt =
-θα - 1 /
{(α - 2)(θ+ t)α - 2
t= u
}
]
= {θα−1/(α−2)}{θ2−α - (θ+u)2−α}.
t= 0
E[(X ∧ u)2 ] = {2 θα / (α−1)(α−2)}{θ2−α - (θ+u)2−α − (α−2)u(θ+u)1−α}. E[(X ∧ u)2 ] = {2 θ2 / (α−1)(α−2)}{1 - (1 + u/ θ)2−α - (α−2)(u/ θ)(1 + u/ θ)1−α}. E[(X ∧ u)2 ] = E[X2 ] {1 - (1 + u/ θ)1−α[1 + (α−1)u/ θ]}. ∞
Letting u go to infinity, it follows that: E[X2 ] = 2 E[X] ∞
∫0
⇒ R(x) dx =
∫0 R(x) dx .
E[X2 ] . 2 E[X]
∞
∫ S(y) dy
Now the excess ratio R(x) is:
∞
Therefore,
expected excess losses = x mean E[X]
∞
∫ ∫
x=0 y= x
∞
S(y) dy dx = E[X]
∫ 0
R(x) dx = E[X]
.
E[X2 ] = E[X2 ]/2. 2 E[X]
⎛ ∞ ⎞2 f(x) ⎜ I will use the above result to show that the variance is equal to S(y) dy⎟ dx . ⎜ ⎟ S(x)2 x=0 ⎝ y = x ⎠ ∞
∫ ∫
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 473
⎛ ∞ ⎞2 f(x) ⎜ Using integration by parts, let u = S(y) dy⎟ and dv = dx. ⎜ ⎟ S(x)2 ⎝ y =x ⎠
∫
∞
du = 2
S(y) dy (-S(x)). ∫ y =x
v = 1/S(x). x =∞
⎤ ⎛ ∞ ⎞2 ⎛ ∞ ⎞2 f(x) 1 ⎥ ⎜ Therefore, S(y) dy⎟ dx = ⎜ S(y) dy⎟ 2 ⎜ ⎟ S(x) ⎜ ⎟ S(x) ⎥ ⎥⎦ x=0 ⎝ y = x ⎠ ⎝ y =x ⎠ ∞
∫ ∫
∫
+2
∞
∫x=0 y=∫x S(y) dy dx .
x =0
∞
Now as was shown previously, E[X2 ] = 2
∞
∫0 t S(t) dt . ∞
Therefore, if there is a finite second moment,
∫0 t S(t) dt must be finite.
If in the extreme righthand tail S(t) ~ 1/t2 , then this integral would be infinite. Therefore, in the extreme right hand tail, S(t) must go down faster than 1/t2 . ∞
If in the extreme righthand tail S(t) ~ 1/t2+ε , then
S(y) dy / ∫ y =x
S(x) ~ (1/x1+ε) x1+ε/2 = 1/xε/2.
Therefore, if in the extreme right hand tail, S(t) goes down faster than 1/t2 , ∞
then
S(y) dy / ∫ y =x
S(x) approaches zero as x approaches infinity.
⎛ ∞ ⎞2 1 Therefore, as x approaches infinity, ⎜ S(y) dy⎟ approaches zero. ⎜ ⎟ S(x) ⎝ y =x ⎠ ∞ ⎛ ∞ ⎞2 f(x) ⎜ Thus, S(y) dy⎟ dx = -E[X]2 / S(0) + 2 E[X2 ]/2 = E[X]2 - E[X]2 = Var[X]. ⎜ ⎟ S(x)2 x=0 ⎝ y = x ⎠
∫
∫ ∫
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 474
Pareto Distribution: ∞
As discussed previously,
∫0
R(x) dx =
E[X2 ] 223 . 2 E[X]
For example, for the Pareto Distribution, R(x) = {θ/(θ+x)}α−1. ∞
∫0
∞
R(x) dx = θα−1
∫0
(θ + x)α - 1 dx = θα−1 θα−2/(α - 2) = θ / (α-2) =
2 θ2 / {(α -1)(α - 2)} E[X2 ] = . 2 θ / (α - 1) 2 E[X]
As discussed previously, for the Pareto Distribution, E[(X ∧ x)2 ] = E[X2 ] {1 - (1 + x / θ)1 - α (1 + (α - 1)x / θ)} .224 Exercise: For a Pareto with α = 4 and θ = 1000, compute E[X2 ], E[(X ∧ 500)2 ] and E[(X ∧ 5000)2 ]. [Solution: E[X2 ] =
2 θ2 = 333,333, (α − 1) (α − 2)
E[(X ∧ 500)2 ]= E[X2 ] {1 - (1 + 500/θ)1−α(1 + (α−1)500/ θ)} = 86,420, and E[(X ∧ 5000)2 ] = E[X2 ] {1 - (1 + 5000/θ)1−α(1 + (α−1)5000/ θ)} = 308,642.] The limited higher moments can also be used to calculate the variance, coefficient of variation, and skewness of losses subject to a maximum covered loss. Exercise: For a Pareto Distribution with α = 4 and θ = 1000, and for a maximum covered loss of 5000, compute the variance and coefficient of variation (per single loss.) [Solution: From a previous solutions E[X ∧ 5000] = 331.79 and E[(X ∧ 5000)2 ] = 308,642. Thus the variance is: 308,642 - 331.792 = 198,557. Thus the CV is: 198,557 / 331.79 = 1.34.]
223 224
Where R(x) is the excess ratio, and the distribution has support starting at zero. While this formula was derived above, it is not in the Appendix attached to the exam.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 475
Second Moment of a Layer of Loss: Also, one can use the Limited Second Moment to calculate the second moment of a layer of loss.225 The second moment of the layer from d to u is:226 E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d {E[X ∧ u] - E[X ∧ d]}. Exercise: For a Pareto with α = 4 and θ = 1000, compute the second moment of the layer from 500 to 5000. [Solution: From the solutions to previous exercises, E[X ∧ 500] = 234.57, E[X ∧ 5000] = 331.79, E[(X ∧ 500)2 ] = 86,420, and E[(X ∧ 5000)2 ] = 308,642. The second moment of the layer from 500 to 5000 is:
E[(X ∧ 5000)2 ] - E[(X ∧ 500)2 ] - 2(500){E[X ∧ 5000] - E[X ∧ 500]} = 308,642 - 86,420 - (1000)(331.79 - 234.57) = 125,002. Comment: Note this is the second moment per loss, including those losses that do not penetrate the layer, in the same way that E[X ∧ 5000] - E[X ∧ 500] is the first moment of the layer per loss.] Derivation of the Second Moment of a Layer of Loss: For the layer from d to u, the medium size losses contribute x - d, while the large losses contribute the width of the interval u - d. Therefore, the second moment of the layer from d to u is: u
∫d (x -
u
d)2 f(x)
dx + (u - d)2 S(u) =
u
∫0
∫d (x2 - 2dx + d2) f(x) dx + u2S(u) - 2duS(u) + d2S(u) =
d
x2
f(x) dx + u2 S(u) -
∫0
u
x2
∫d
f(x) dx - d2 S(d) + d2 S(d) - 2d x f(x) dx
+ d2 {F(u) - F(d)} - 2duS(u) + d2 S(u) =
225
Recall that the expected value of a Layer of Loss is the difference of the Limited Expected Value at the top of the layer and the Limited Expected Value at the bottom of the layer. For second and higher moments the relationships are more complicated. 226 See Theorem 8.8 in Loss Models. Here we are referring to the second moment of the per loss variable; similar to the average payment per loss, we are including those times a small loss contributes zero to the layer.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 476
E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - uS(u) - (E[X ∧ d] - dS(d))} + d2 {S(d) + F(u) - F(d)} - 2duS(u) + d2 S(u) = E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - (E[X ∧ d]} + d{2uS(u) - 2dS(d) + dS(d) + dF(u) - dF(d) -2uS(u) + dS(u)} = E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - E[X ∧ d]} + d{d(F(u) + S(u)) - d(F(d) + S(d)} = E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - E[X ∧ d]} + d(d - d) = E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - E[X ∧ d]}. Variance of a Layer of Loss: Given the first and second moments layer of loss, one can compute the variance and the coefficient of variation of a layer of loss. Exercise: For a Pareto with α = 4 and θ = 1000, compute the variance of the losses in the layer from 500 to 5000. [Solution: From the previous exercise, the second moment is 125,002 and the mean is E[X ∧ 5000] - E[X ∧ 500] = 331.79 - 234.57 = 97.22. The variance of the layer from 500 to 5000 is: 125,002 - 97.222 = 115,550.] Exercise: For a Pareto with α = 4 and θ = 1000, compute the coefficient of variation of the losses in the layer from 500 to 5000. [Solution: From the previous exercise, the variance of the layer from 500 to 5000 is: 125,002 - 97.222 = 115,550 and the mean is 97.22. Thus the CV = 115,550.5 / 97.22 = 3.5.] Using the formulas for the first and seconds moments of a layer of loss, the variance of the layer of losses from d to u is: E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d {E[X ∧ u] - E[X ∧ d]} - {E[X ∧ u] - E[X ∧ d]}2 . Since the average payment per loss under a maximum covered loss of u and a deductible of d is the layer from d to u, this provides a formula for the variance of the average payment per loss under a maximum covered loss of u and a deductible of d.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 477
Exercise: Assume losses are given by a LogNormal Distribution with µ = 8 and σ = 0.7. An insured has a deductible of 1000, and a maximum covered loss of 10,000. What is the expected amount paid per loss? [Solution: For the LogNormal Distribution: E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}. E[X ∧ 1000] = e8.245 Φ[{ln(1000) - 8 - 0.49} / 0.7] + 1000{1 - Φ[{ln(1000) - 8} / 0.7] } = 3808.54 Φ[-2.260] + 1000{1 - Φ[-1.560]} = 3808.54 (0.0119) + 1000(1 - 0.0594) = 986. E[X ∧ 10000] = e8.245 Φ[{ln(10000) - 8 - 0.49} / 0.7] + 10,000{1 - Φ[{ln(10000) - 8} / 0.7] } = 3808.54 Φ[1.029] + 10000{1 - Φ[1.729]} = 3808.54(0.8483) + 10000(1 - 0.9581) = 3650. E[X ∧ 10000] - E[X ∧ 1000] = 3650 - 986 = 2664.] Exercise: Assume losses are given by a LogNormal Distribution with µ = 8 and σ = 0.7. An insured has a deductible of 1000, and a maximum covered loss of 10,000. What is the variance of the amount paid per loss? [Solution: For the LogNormal Distribution: E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) − (µ + 2σ2)} / σ] + x2 {1 - Φ[{ln(x) − µ} / σ]}. E[(X ∧ 1000)2 ] = e16.98 Φ[{ln(1000) - 8.98} / 0.7] + 10002 {1 - Φ[{ln(1000) - 8} / 0.7] } = 23,676,652 Φ[-2.960] + 1,000,000{1 - Φ[-1.560]} = 23,676,652 (0.0015) + 1,000,000(1 - 0.0594) = 976,115. E[(X ∧ 10000)2 ] = e16.98 Φ[{ln(10000) - 8.98} / 0.7] + 100002 {1 - Φ[{ln(10000) - 8} / 0.7] } = 23,676,652 Φ[0.329] + 100,000,000{1 - Φ[1.729]} = 23,676,652 (0.6289) + 100,000,000(1 - 0.9581) = 19,080,246. E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - (E[X ∧ d]} - {E[X ∧ u] - (E[X ∧ d]}2 = 19,080,246 - 976,115 - (2000)(2664) - 26642 = 5.68 million. Comment: It would take too long to perform all of these computations for a single exam question!] If one has a coinsurance factor of c, then each payment is multiplied by c, therefore the variance is multiplied by c2 . Exercise: Assume losses are given by a LogNormal Distribution with µ = 8 and σ = 0.7. An insured has a deductible of 1000, a maximum covered loss of 10,000, and a coinsurance factor of 80%. What is the variance of the amount paid per loss? [Solution: (0.82 )(5.68 million) = 3.64 million.]
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 478
Variance of Non-Zero Payments: Exercise: For a Pareto with α = 4 and θ = 1000, compute the average non-zero payment given a deductible of 500 and a maximum covered loss of 5000. [Solution: (E[X ∧ 5000] - E[X ∧ 500]) / S(500)= (331.79 - 234.57) / 0.1975 = 492.2. ] One can compute the second moment of the non-zero payments in a manner similar to the second moment of the payments per loss. Given a deductible of d and a maximum covered loss of u, the 2nd moment of the non-zero payments is:227 u
∫(x - d)2 (f(x)/S(d)) dx + (u - d)2S(u)/S(d) = (2nd moment of the payments per loss) / S(d) = d
{E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d(E[X ∧ u] - E[X ∧ d])} / S(d). So just as with the first moment, the second moment of the non-zero payments has S(d) in the denominator. If one has a coinsurance factor of c, then the second moment is multiplied by c2 . Exercise: For a Pareto with α = 4 and θ = 1000, compute the second moment of the non-zero payments given a deductible of 500 and a maximum covered loss of 5000. [Solution: {E[(X ∧ 5000)2 ] - E[(X ∧ 500)2 ] - 2(500){E[X ∧ 5000] - E[X ∧ 500]}}/S(500) = {308,642- 86,420 - (1000)(331.79 - 234.57)} / (1000/1500)4 = 125,002/0.1975309 = 632,823.] Thus given a deductible of d and a maximum covered loss of u the variance of the non-zero payments is: {E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d(E[X ∧ u] - E[X ∧ d])} / S(d) - {(E[X ∧ u] - E[X ∧ d]) / S(d)}2 . Exercise: For a Pareto with α = 4 and θ = 1000, compute the variance of the non-zero payments given a deductible of 500 and a maximum covered loss of 5000. [Solution: From the solutions to previous exercises, variance = 632,823 - 492.22 = 390,562.] If one has a coinsurance factor of c, then each payment is multiplied by c, therefore the variance is multiplied by c2 . Exercise: For a Pareto with α = 4 and θ = 1000, compute the variance of the non-zero payments given a deductible of 500, a maximum covered loss of 5000, and a coinsurance factor of 85%. [Solution: (0.852 )(390,562) = 282,181.] 227
Note that the density is f(x)/S(d) from d to u, and has a point mass of S(u)/S(d) at u.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 479
Problems: Use the following information for the next 7 questions. Assume the unlimited losses follow a LogNormal Distribution with parameters µ = 10 and σ = 1.5. Assume an average of 10,000 losses per year. For simplicity, ignore any variation in costs due to variations in the number of losses per year. 32.1 (2 points) What is the coefficient of variation of the total cost expected per year? A. less than 0.014 B. at least 0.014 but less than 0.018 C. at least 0.018 but less than 0.022 D. at least 0.022 but less than 0.026 E. at least 0.026 32.2 (3 points) If the insurer pays no more than $250,000 per loss, what is the coefficient of variation of the insurerʼs total cost expected per year? A. less than 0.014 B. at least 0.014 but less than 0.018 C. at least 0.018 but less than 0.022 D. at least 0.022 but less than 0.026 E. at least 0.026 32.3 (3 points) If the insurer pays no more than $1,000,000 per loss, what is the coefficient of variation of the insurerʼs total cost expected per year? A. less than 0.014 B. at least 0.014 but less than 0.018 C. at least 0.018 but less than 0.022 D. at least 0.022 but less than 0.026 E. at least 0.026 32.4 (1 point) If the insurer pays the layer from $250,000 to $1 million per loss, what are the insurerʼs total costs expected per year? A. less than $135 million B. at least $135 million but less than $140 million C. at least $140 million but less than $145 million D. at least $145 million but less than $150 million E. at least $150 million
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 480
32.5 (3 points) If the insurer pays the layer from $250,000 to $1 million per loss, what is the coefficient of variation of the insurerʼs total cost expected per year? A. less than 0.03 B. at least 0.03 but less than 0.05 C. at least 0.05 but less than 0.07 D. at least 0.07 but less than 0.09 E. at least 0.09 32.6 (3 points) What is the coefficient of skewness of the total cost expected per year? A. less than 0.15 B. at least 0.15 but less than 0.20 C. at least 0.20 but less than 0.25 D. at least 0.25 but less than 0.30 E. at least 0.30 32.7 (3 points) If the insurer pays no more than $250,000 per loss, what is the coefficient of skewness of the insurerʼs total cost expected per year? A. less than 0.015 B. at least 0.015 but less than 0.020 C. at least 0.020 but less than 0.025 D. at least 0.025 but less than 0.030 E. at least 0.030
Use the following information for the next 7 questions Losses follow an Exponential Distribution with θ = 10,000. 32.8 (1 point) What is the variance of losses? A. less than 105 million B. at least 105 million but less than 110 million C. at least 110 million but less than 115 million D. at least 115 million but less than 120 million E. at least 120 million 32.9 (2 points) Assuming a 25,000 policy limit, what is the variance of payments by the insurer? A. less than 35 million B. at least 40 million but less than 45 million C. at least 45 million but less than 50 million D. at least 50 million but less than 55 million E. at least 55 million
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 481
32.10 (3 points) Assuming a 1000 deductible (with no maximum covered loss), what is the variance of the payments per loss? A. less than 95 million B. at least 95 million but less than 100 million C. at least 100 million but less than 105 million D. at least 105 million but less than 110 million E. at least 110 million 32.11 (2 points) Assuming a 1000 deductible (with no maximum covered loss), what is the variance of non-zero payments by the insurer? A. less than 95 million B. at least 95 million but less than 100 million C. at least 100 million but less than 105 million D. at least 105 million but less than 110 million E. at least 110 million 32.12 (3 points) Assuming a 1000 deductible and a 25,000 maximum covered loss, what is the variance of the payments per loss? A. less than 55 million B. at least 55 million but less than 56 million C. at least 56 million but less than 57 million D. at least 57 million but less than 58 million E. at least 58 million 32.13 (3 points) Assuming a 1000 deductible and a 25,000 maximum covered loss, what is the variance of the non-zero payments by the insurer? A. less than 53 million B. at least 53 million but less than 54 million C. at least 54 million but less than 55 million D. at least 55 million but less than 56 million E. at least 56 million 32.14 (2 points) Assuming a 75% coinsurance factor, a 1000 deductible and a 25,000 maximum covered loss, what is the variance of the insurerʼs payments per loss? A. less than 15 million B. at least 15 million but less than 20 million C. at least 20 million but less than 25 million D. at least 25 million but less than 30 million E. at least 30 million
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 482
32.15 (2 points) Let X be the result of rolling a fair six-sided die, with the numbers 1 through 6 on its faces. Calculate Var(X (A) 1.1 (B) 1.2
∧
4). (C) 1.3
(D) 1.4
(E) 1.5
32.16 (3 points) The size of loss distribution has the following characteristics: (i) E[X] = 245. (ii) E[X ∧ 100] = 85. (iii) S(100) = 0.65. (iv) E[X2 | X > 100] = 250,000. There is an ordinary deductible of 100 per loss. Calculate the second moment of the payment per loss. (A) 116,000 (B) 118,000 (C)120,000
(D) 122,000
(E) 124,000
Use the following information for the next four questions:
• •
Losses follow a LogNormal Distribution with parameters µ = 9.7 and σ = 0.8. The insured has a deductible of 10,000, maximum covered loss of 50,000, and a coinsurance factor of 90%.
32.17 (3 points) What is the average payment per loss? A. less than 7,000 B. at least 7,000 but less than 8,000 C. at least 8,000 but less than 9,000 D. at least 9,000 but less than 10,000 E. at least 10,000 32.18 (2 points) What is E[(X ∧ 50,000)2 ]? A. less than 600 million B. at least 600 million but less than 700 million C. at least 700 million but less than 800 million D. at least 800 million but less than 900 million E. at least 900 million 32.19 (2 points) What is E[(X ∧ 10,000)2 ]? A. less than 80 million B. at least 80 million but less than 90 million C. at least 90 million but less than 100 million D. at least 100 million but less than 110 million E. at least 110 million
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
32.20 (2 points) What is the variance of the payment per loss? A. less than 110 million B. at least 110 million but less than 120 million C. at least 120 million but less than 130 million D. at least 130 million but less than 140 million E. at least 140 million
32.21 (3 points) You are given: Claim Size Number of Claims 0-25 30 25-50 32 50-100 20 100-200 8 Assume a uniform distribution of claim sizes within each interval. Estimate E[(X ∧ 80)2 ]. (A) 2300 (B) 2400
(C) 2500
(D) 2600
(E) 2700
Use the following information for the next 5 questions: • Losses are uniform from 0 to 40. 32.22 (1 point) What is E[X ∧ 10]? A. 8.5 B. 8.75 C. 9.0
D. 9.25
E. 9.5
32.23 (1 point) What is E[X ∧ 25]? A. 15 B. 16 C. 17
D. 18
E. 19
32.24 (2 points) What is E[(X ∧ 10)2 ]? A. 79 B. 80 C. 81
D. 82
E. 83
32.25 (2 points) What is E[(X ∧ 25)2 ]? A. 360 B. 365 C. 370
D. 375
E. 380
32.26 (2 points) What is the variance of the layer of loss from 10 to 25? A. 37 B. 39 C. 41 D. 43 E. 45
Page 483
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Use the following information for the next 7 questions. Assume the following discrete size of loss distribution: 10 50% 50 30% 100 20% 32.27 (2 points) What is the coefficient of variation of this size of loss distribution? A. less than 0.6 B. at least 0.6 but less than 0.8 C. at least 0.8 but less than 1.0 D. at least 1.0 but less than 1.2 E. at least 1.2 32.28 (3 points) What is the coefficient of skewness of this size of loss distribution? A. less than 0 B. at least 0 but less than 0.2 C. at least 0.2 but less than 0.4 D. at least 0.4 but less than 0.6 E. at least 0.6 32.29 (2 points) If the insurer pays no more than 25 per loss, what is the coefficient of variation of the distribution of the size of payments? A. less than 0.6 B. at least 0.6 but less than 0.8 C. at least 0.8 but less than 1.0 D. at least 1.0 but less than 1.2 E. at least 1.2 32.30 (2 points) If the insurer pays no more than 60 per loss, what is the coefficient of variation of the distribution of the size of payments? A. less than 0.6 B. at least 0.6 but less than 0.8 C. at least 0.8 but less than 1.0 D. at least 1.0 but less than 1.2 E. at least 1.2
Page 484
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
32.31 (3 points) If the insurer pays no more than 60 per loss, what is the coefficient of skewness of the distribution of the size of payments? A. less than 0 B. at least 0 but less than 0.2 C. at least 0.2 but less than 0.4 D. at least 0.4 but less than 0.6 E. at least 0.6 32.32 (1 point) If the insurer pays the layer from 30 to 70 per loss, what is the insurerʼs expected payment per loss? A. 10 B. 12 C. 14 D. 16 E. 18 32.33 (2 points) If the insurer pays the layer from 30 to 70 per loss, what is the coefficient of variation of the insurerʼs payments per loss? A. less than 0.6 B. at least 0.6 but less than 0.8 C. at least 0.8 but less than 1.0 D. at least 1.0 but less than 1.2 E. at least 1.2 32.34 (2 points) X follows the density f(x), with support from 0 to infinity. 500
∫
500
f(x) dx = 0.685.
0
∫
0
500
x f(x) dx = 217.
∫
x2 f(x) dx = 76,616.
0
Determine the variance of the limited loss variable with u = 500, X ∧ 500. A. 14,000 B. 15,000 C. 16,000 D. 17,000 E. 18,000 32.35 (4 points) The size of loss follows a Single Parameter Pareto Distribution with α = 3 and θ = 200. A policy has a deductible of 250, a maximum covered loss of 1000, and a coinsurance of 80%. Determine the variance of YP, the per payment variable. A. 12,000 B. 13,000 C. 14,000 D. 15,000 E. 16,000 32.36 (3 points) The loss severity random variable X follows an exponential distribution with mean θ. Determine the coefficient of variation of the excess loss variable Y = max(X - d, 0).
Page 485
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 486
32.37 (21 points) Let U be the losses in the layer from a to b. Let V be the losses in the layer from c to d. a < b ≤ c < d. (i) (3 points) Determine an expression for the covariance of U and V in terms of Limited Expected Values. (ii) (2 points) For an Exponential Distribution with mean of 8, determine the covariance of the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity. (iii) (3 points) For an Exponential Distribution with mean of 8, determine the variance of the losses in the layer from 0 to 10. Hint: For the Exponential, E[(X ∧ x)2 ] = 2θ2 - 2θ2e-x/θ - 2θxe-x/θ. (iv) (3 points) For an Exponential Distribution with mean of 8, determine the variance of the losses in the layer from 10 to infinity. (v) (1 point) For an Exponential Distribution with mean of 8, determine the correlation of the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity. (vi) (2 points) For a Pareto Distribution with α = 3 and θ = 16, determine the covariance of the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity. (vii) (3 points) For a Pareto Distribution with α = 3 and θ = 16, determine the variance of the losses in the layer from 0 to 10. Hint: For the Pareto, E[(X ∧ x)2 ] =
2 θ2 {1 - (1 + x/θ)1−α (1 + (α-1)x/θ)}. (α - 1) (α - 2)
(viii) (3 points) For a Pareto Distribution with α = 3 and θ = 16, determine the variance of the losses in the layer from 10 to infinity. (ix) (1 point) For a Pareto Distribution with α = 3 and θ = 16, determine the correlation of the losses in the layer from 0 to 10 and the losses in the layer from 10 to infinity. 32.38 (14 points) Let X be the price of a stock at time 1. X is distributed via a LogNormal Distribution with µ = 4 and σ = 0.3. Let Y be the payoff on a one-year 70-strike European Call on this stock. Y = 0 if X ≤ 70, and Y = X - 70 if X > 70. (i) (1 point) What is the mean of X? (ii) (2 points) What is the variance of X? (iii) (3 points) What is the mean of Y? (iv) (4 points) What is the variance of Y? (v) (3 points) What is the covariance of X and Y? (vi) (1 point) What is the correlation of X and Y?
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 487
Use the following information for the next 2 questions: Limit Limited Expected Value Limited Second Moment 100,000 55,556 4444 million 250,000 80,247 12,346 million 500,000 91,837 20,408 million 1,000,000 97,222 27,778 million 32.39 (2 points) Determine the coefficient of variation of the layer of loss from 100,000 to 500,000. (A) Less than 2 (B) At least 2, but less than 3 (C) At least 3, but less than 4 (D) At least 4, but less than 5 (E) At least 5 32.40 (2 points) Determine the coefficient of variation of the layer of loss from 250,000 to 1 million. (A) Less than 2 (B) At least 2, but less than 3 (C) At least 3, but less than 4 (D) At least 4, but less than 5 (E) At least 5
Use the following information for the next 2 questions: • Losses are uniform from 2 to 20.
• There is a deductible of 5. 32.41 (1 point) Determine the variance of YP, the per-payment variable. A. 17 B. 18 C. 19 D. 20 E. 21 32.42 (3 points) Determine the variance of YL , the per-loss variable. A. 35 B. 37 C. 39 D. 41 E. 43
32.43 (3 points) The loss severity random variable X follows the pareto distribution with α = 5 and θ = 400. Determine the coefficient of variation of the excess loss variable Y = max(X - 300, 0). (A) 6.5 (B) 7.0 (C) 7.5 (D) 8.0 (E) 8.5
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 488
32.44 (3, 11/00, Q.21 & 2009 Sample Q.115) (2.5 points) A claim severity distribution is exponential with mean 1000. An insurance company will pay the amount of each claim in excess of a deductible of 100. Calculate the variance of the amount paid by the insurance company for one claim, including the possibility that the amount paid is 0. (A) 810,000 (B) 860,000 (C) 900,000 (D) 990,000 (E) 1,000,000 32.45 (1, 5/01, Q.35) (1.9 points) The warranty on a machine specifies that it will be replaced at failure or age 4, whichever occurs first. The machineʼs age at failure, X, has density function 1/5 for 0 < x < 5. Let Y be the age of the machine at the time of replacement. Determine the variance of Y. (A) 1.3 (B) 1.4 (C) 1.7 (D) 2.1 (E) 7.5 32.46 (4, 11/03, Q.37 & 2009 Sample Q.28) (2.5 points) You are given: Claim Size (X) Number of Claims (0, 25] 25 (25, 50] 28 (50, 100] 15 (100, 200] 6 Assume a uniform distribution of claim sizes within each interval. Estimate E(X2 ) - E[(X ∧ 150)2 ]. (A) Less than 200 (B) At least 200, but less than 300 (C) At least 300, but less than 400 (D) At least 400, but less than 500 (E) At least 500 32.47 (SOA3, 11/03, Q.28) (2.5 points) For (x): (i) K is the curtate future lifetime random variable. (ii) qx+k = 0.1(k + 1), k = 0, 1, 2,…, 9 Calculate Var(K ∧ 3). (A) 1.1 (B) 1.2
(C) 1.3
(D) 1.4
(E) 1.5
32.48 (4, 5/07, Q.13) (2.5 points) The loss severity random variable X follows the exponential distribution with mean 10,000. Determine the coefficient of variation of the excess loss variable Y = max(X - 30,000, 0). (A) 1.0 (B) 3.0 (C) 6.3 (D) 9.0 (E) 39.2
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 489
Solutions to Problems: 32.1. E. For the sum of N independent losses, both the Variance and the mean are N times that for a single loss. The standard deviation is multiplied by
N.
Thus the CV, which is the ratio of the standard deviation to the mean, is divided by
N.
Per loss, mean = exp(µ + σ2/2) = e11.125, and second moment is exp(2µ + 2σ2) = e24.5. Therefore for a single loss, CV =
E[X2] / E[X]2 - 1 = e24.5 / e22.25 - 1 = e2.25 -1 = 2.91.
For 10,000 losses we divide by
10,000 = 100, thus the CV for the total losses is 0.0291.
32.2. A. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}. E[X ∧ 250,000] = exp(11.125)Φ[(ln(250,000) - 10 - 2.25)/1.5] + (250,000){1 − Φ[(ln(250,000) − 10)/1.5]} = (67,846)Φ[.12] + (250,000)(1 − Φ[1.62]) = (67,846)(.5478) + (250,000)(1 - .9474) = 50.3 thousand. E[(X ∧ L)2 ] = exp[2µ + 2σ2]Φ[{ln(L) − (µ+ 2σ2)} / σ] + L2 {1 - Φ[{ln(L) − µ} / σ]}. E[(X ∧ 250,000)2 ] = exp(24.5)Φ[-1.38] + 6.25 x 1010{1 - Φ[1.62]} = (4.367 x 1010)(.0838) + (6.25 x 1010)(1 - .9474) = 6.95 x 109 . Therefore for a single loss Coefficient of Variation = E[(X ∧ 250,000)2 ]/ E[X ∧ 250,000]2 - 1 = 6.95 x 109 / 2.53 x 109 - 1 = 1.32. For 10,000 losses we divide by
10,000 = 100, thus the CV is 0.0132.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 490
32.3. C. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}. E[X ∧ 1,000,000] = exp(11.125)Φ[(ln(1,000,000) - 10 - 2.25 )/1.5] + (1,000,000) {1 - Φ[(ln(1,000,000) − 10)/1.5]} = (67,846)Φ[1.04] + (1,000,000)(1−Φ[2.54]) = (67,846)(.8508) + (1,000,000)(1 - .9945) = 63.2 thousand. E[(X ∧ L)2 ] = exp[2µ + 2σ2]Φ[{ln(L) − (µ + 2σ2)} / σ] + L2 {1- Φ[{ln(L) − µ} / σ] }. E[(X ∧ 1,000,000)2 ] = exp(24.5)Φ[-.46] + (1012){1 − Φ[2.54]} = (4.367 x 1010)(.3228) + (1012){1 - .9945} = 1.960 x 1010. Therefore Coefficient of Variation =
E[(X ∧ 250,000)2 ]/ E[X ∧ 250,000]2 - 1 =
1.960 x 1010 / 3.99 x 109 - 1 = 1.98. For 10,000 losses we divide by
10,000 = 100, thus the CV is 0.0198.
Comment: The Coefficient of Variation of the Limited Losses is less than that of the unlimited losses. The CV of the losses limited to 250,000 is lower than that of the losses limited to $1 million. 32.4. A. Since the insurer expects 10,000 losses per year, the expected dollars in the layer from 250,000 to $1 million are: 10,000{E[X ∧ 1 million] - E[X ∧ 250,000]} = 10,000(63.2 thousand - 50.3 thousand) = 129 million. 32.5. C. The mean for the layer is: E[X ∧ 1 million] - E[X ∧ 250,000] = 63.2 thousand - 50.3 thousand = 13.1 thousand. The second moment for the layer is: E[(X ∧ 1 million)2 ] - E[(X ∧ 250,000)2 ] - 2(250000)(E[X ∧ 1 million] - E[X ∧ 250,000]) = 1.960 x 1010 - 6.95 x 109 - 6.55 x 109 = 6.10 x 109 . Therefore for a single loss, Coefficient of Variation = For 10,000 losses we divide by
6.10 x 109 / 1.72 x 108 - 1 = 5.9.
10,000 = 100, thus the CV is 0.059.
Comment: The CV of a layer depends on how high the layer is, the width of the layer, as well as the loss distribution. A higher layer usually has a larger CV.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 491
32.6. E. For the sum of N independent losses, both the Variance and the Third Central Moment are N times that for a single loss. (For the sum of independent random variables, second and third central moments each add.) Thus the skewness, which is the ratio of the Third Central Moment to the variance to the 3/2 power, is divided by
N.
Per loss, mean = exp(µ + σ2/2) = e11.125, second moment is: exp(2µ + 2σ2) = e24.5, and third moment is: exp(3µ + 4.5σ2) = e40.125. Therefore, the variance is: e24.5 - e22.25 = 3.907 x 1010. The third central moment is: e40.125 - 3e24.5e11.125 + 2e33.375 = 2.620 x 1017. Thus for one loss the skewness is: 2.585 x 1017 / (3.907 x 1010)1.5 = 33.5. For 10,000 losses we divide by
10,000 = 100, thus the skewness for the total losses is 0.335.
32.7. B. E[X ∧ 250,000] = exp(11.125)Φ[(ln(250,000) - 10 - 2.25 )/1.5] + (250,000) {1 − Φ[(ln(250,000) − 10)/1.5]} = 50.3 thousand. E[(X ∧ 250,000)2 ] = exp(24.5)Φ[-1.38] + .6.25 x 1010{1 - Φ[1.62]} = 6.95 x 109 . Thus the variance of a limited loss is 6.95 x 109 - 2.53 x 109 = 4.42 x 109 . E[(X ∧ L)3 ] = exp[3µ + 4.5 σ2]Φ[{ln(L) − (µ+ 3σ2)} / σ] + L3 {1- Φ[{ln(L) − µ} / σ] } E[(X ∧ 250,000)3 ] = e40.125Φ[-2.88] + 1.5625 x 1016 {1- Φ[1.62] } = 1.355 x 1015. The third central moment is: 1.355 x 1015 - 3(50.3 thousand)(6.95 x 109 ) + 2( 50.3 thousand)3 = 5.6 x 1014. Thus for one loss the skewness is: 5.6 x 1014 / ( 4.42 x 109 )1.5 = 1.9. For 10,000 losses we divide by
10,000 = 100; thus the skewness for the total losses is 0.019.
Comment: The skewness of the limited losses is much smaller than that of the unlimited losses. Rare very large losses have a large impact on the skewness of the unlimited losses. 32.8. A. θ2 = 100 million. 32.9. E. The second moment is E[(X ∧ x)2 ] = 2θ2 Γ(3; x/θ) + x2 e-x/θ. E[(X ∧ 25000)2 ] = 200,000,000 Γ(3; 2.5) + 625,000,000e-2.5 = 200 million{1- e-2.5(1+2.5+2.52 /2)}+51.30 million = 142.54 million. variance = 142.54 million - 91792 = 58.28 million.
2013-4-2,
Loss Distributions, §32 Limited Higher Moments
HCM 10/8/12,
Page 492
32.10. B. The second moment is: E[X²] - E[(X ∧ 1000)²] - (2)(1000){E[X] - E[X ∧ 1000]}.
E[X] - E[X ∧ 1000] = 10,000 - 952 = 9048. E[X²] = 2θ² = 200 million.
E[(X ∧ 1000)²] = 200,000,000 Γ(3; 0.1) + 1,000,000 e^(-0.1) = 200 million {[1 - e^(-0.1)(1 + 0.1 + 0.1²/2)] + 0.005e^(-0.1)} = 0.936 million.
The second moment is: 200 million - 0.936 million - (2000)(9048) = 180.97 million.
Variance = 180.97 million - 9048² = 99.1 million.
32.11. C. The second moment is: {∫ from 1000 to ∞ of f(x)(x - 1000)² dx} / S(1000)
= (2nd moment of payment per loss) / S(1000) = 180.97 million / e^(-0.1) = 180.97 million / 0.9048 = 200.00 million.
Variance = 200.00 million - 10,000² = 100 million.
Comment: Due to the memoryless property of the Exponential, the variance is the same as in the absence of a deductible.
32.12. D. E[X ∧ 25000] = 9179. E[X ∧ 1000] = 952.
E[(X ∧ 25000)²] = 142.54 million. E[(X ∧ 1000)²] = 0.936 million.
Second moment of the layer of loss = E[(X ∧ 25000)²] - E[(X ∧ 1000)²] - (2)(1000){E[X ∧ 25000] - E[X ∧ 1000]} = 142.54 million - 0.936 million - (2000)(9179 - 952) = 125.15 million.
Variance = 125.15 million - (9179 - 952)² = 57.46 million.
32.13. D. The second moment is: {∫ from 1000 to 25,000 of f(x)(x - 1000)² dx + 24000² S(25000)} / S(1000)
= (2nd moment of payment per loss) / S(1000) = 125.15 million / e^(-0.1) = 125.15 million / 0.9048 = 138.32 million.
The mean is: 9093. Variance = 138.32 million - 9093² = 55.63 million.
32.14. E. The second moment is: 0.75² {E[(X ∧ 25000)²] - E[(X ∧ 1000)²] - (2)(1000){E[X ∧ 25000] - E[X ∧ 1000]}} = 0.5625{142.54 million - 0.936 million - (2000)(9179 - 952)} = 70.40 million.
Variance = 70.40 million - 6170² = 32.33 million.
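The limited moments used in 32.10 through 32.14 can be reproduced by numerical integration; a minimal Python sketch, assuming (as in the problems) an Exponential severity with θ = 10,000:

from math import exp
from scipy import integrate

theta, d, u = 10000.0, 1000.0, 25000.0
f = lambda x: exp(-x / theta) / theta
S = lambda x: exp(-x / theta)

lim1 = lambda L: integrate.quad(lambda x: x * f(x), 0, L)[0] + L * S(L)          # E[X ^ L]
lim2 = lambda L: integrate.quad(lambda x: x * x * f(x), 0, L)[0] + L * L * S(L)  # E[(X ^ L)^2]

# 32.12: second moment and variance of the layer from 1000 to 25,000, per loss
second_layer = lim2(u) - lim2(d) - 2 * d * (lim1(u) - lim1(d))
var_layer = second_layer - (lim1(u) - lim1(d)) ** 2
print(second_layer / 1e6, var_layer / 1e6)   # about 125.1 and 57.5 (millions)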
32.15. C. E[X ∧ 4] = (1/6)(1) + (1/6)(2) + (1/6)(3) + (3/6)(4) = 3.
E[(X ∧ 4)²] = (1/6)(1²) + (1/6)(2²) + (1/6)(3²) + (3/6)(4²) = 10.33.
Var(X ∧ 4) = 10.33 - 3² = 1.33.
32.16. E. E[X² | X > 100] = {∫ from 100 to ∞ of x² f(x) dx} / S(100). ⇒ ∫ from 100 to ∞ of x² f(x) dx = S(100) E[X² | X > 100] = (0.65)(250,000) = 162,500.
∫ from 100 to ∞ of x f(x) dx = ∫ from 0 to ∞ of x f(x) dx - ∫ from 0 to 100 of x f(x) dx = E[X] - {E[X ∧ 100] - 100 S(100)} = 245 - {85 - (100)(0.65)} = 225.
∫ from 100 to ∞ of f(x) dx = S(100) = 0.65.
With a deductible of 100 per loss, the second moment of the payment per loss is:
∫ from 100 to ∞ of (x - 100)² f(x) dx = ∫ x² f(x) dx - 200 ∫ x f(x) dx + 10000 ∫ f(x) dx (each from 100 to ∞)
= 162,500 - (200)(225) + (10000)(0.65) = 124,000.
Comment: Similar to SOA M, 5/05, Q.17, in "Mahlerʼs Guide to Aggregate Distributions."
The variance of the payment per loss is: 124,000 - (245 - 85)² = 98,400.
With a deductible of d, the second moment of the payment per loss is:
E[X² | X > d] S(d) - 2d(E[X] - {E[X ∧ d] - dS(d)}) + d² S(d) = E[X² | X > d] S(d) - 2d E[X] + 2d E[X ∧ d] - d² S(d).
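Plugging the quantities given in 32.16 into the formula just derived confirms the arithmetic; a short Python sketch (the four inputs are the values stated in the problem):

# Second moment of the payment per loss with deductible d, via
# E[X^2 | X > d] S(d) - 2d E[X] + 2d E[X ^ d] - d^2 S(d).
S_d, EX, EX_d, EX2_given = 0.65, 245.0, 85.0, 250000.0
d = 100.0
second = EX2_given * S_d - 2 * d * EX + 2 * d * EX_d - d * d * S_d
print(second)   # 124,000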
32.17. E. E[X ∧ x] = exp(µ + σ²/2)Φ[(ln x − µ − σ²)/σ] + x {1 - Φ[(ln x − µ)/σ]}.
E[X ∧ 50000] = exp(10.02)Φ[(ln(50,000) - 9.7 - 0.64)/0.8] + (50,000){1 − Φ[(ln(50,000) − 9.7)/0.8]} = (22,471)Φ[0.60] + (50,000){1 - Φ[1.40]} = (22,471)(0.7257) + (50,000)(1 - 0.9192) = 20,347.
E[X ∧ 10000] = exp(10.02)Φ[(ln(10,000) - 9.7 - 0.64)/0.8] + (10,000){1 - Φ[(ln(10,000) − 9.7)/0.8]} = (22,471)Φ[-1.41] + (10,000){1 - Φ[-0.61]} = (22,471)(0.0793) + (10,000)(0.7291) = 9073.
The average payment per loss is: (0.9)(E[X ∧ 50000] - E[X ∧ 10000]) = (0.9)(20,347 - 9073) = 10,147.
32.18. B. E[(X ∧ x)²] = exp[2µ + 2σ²]Φ[{ln(x) − (µ + 2σ²)}/σ] + x² {1 - Φ[{ln(x) − µ}/σ]}.
E[(X ∧ 50,000)²] = exp[20.68]Φ[{ln(50000) - 10.98}/0.8] + 50000² {1 - Φ[{ln(50000) − 9.7}/0.8]} = 957,656,776 Φ[-0.20] + 2,500,000,000 {1 - Φ[1.40]} = (957,656,776)(0.4207) + (2,500,000,000)(1 - 0.9192) = 604.9 million.
32.19. B. E[(X ∧ 10,000)²] = exp[20.68]Φ[{ln(10000) - 10.98}/0.8] + 10000² {1 - Φ[{ln(10000) - 9.7}/0.8]} = 957,656,776 Φ[-2.21] + 100,000,000 {1 - Φ[-0.61]} = (957,656,776)(0.0136) + (100,000,000)(0.7291) = 85.9 million.
32.20. D. c² {E[(X ∧ L)²] - E[(X ∧ d)²] - 2d{E[X ∧ L] - E[X ∧ d]} - {E[X ∧ L] - E[X ∧ d]}²}
= 0.9² {E[(X ∧ 50000)²] - E[(X ∧ 10000)²] - 2(10000){E[X ∧ 50000] - E[X ∧ 10000]} - {E[X ∧ 50000] - E[X ∧ 10000]}²}
= 0.81{604.8 million - 85.9 million - 20,000(20,345 - 9073) - (20,345 - 9073)²} = (0.81)(166.4 million) = 134.8 million.
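For 32.17 through 32.19, the LogNormal limited moments can be evaluated directly; a minimal Python sketch using scipy's standard normal distribution function in place of the printed normal table, with µ = 9.7 and σ = 0.8 as in the problems (so the results differ from the above only by rounding):

from math import exp, log
from scipy.stats import norm

mu, sigma = 9.7, 0.8

def lim1(x):   # E[X ^ x] for the LogNormal
    return exp(mu + sigma**2 / 2) * norm.cdf((log(x) - mu - sigma**2) / sigma) \
           + x * (1 - norm.cdf((log(x) - mu) / sigma))

def lim2(x):   # E[(X ^ x)^2] for the LogNormal
    return exp(2 * mu + 2 * sigma**2) * norm.cdf((log(x) - mu - 2 * sigma**2) / sigma) \
           + x**2 * (1 - norm.cdf((log(x) - mu) / sigma))

print(lim1(50000), lim1(10000))   # close to 20,347 and 9,073
print(lim2(50000), lim2(10000))   # close to 605 million and 86 million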
32.21. A. For a uniform distribution on (a, b), E[X²] = (b³ - a³)/{3(b - a)}. For those intervals above 80, E[(X ∧ 80)²] = 80² = 6400. We need to divide the interval from 50 to 100 into two pieces. For losses uniform on 50 to 100, 2/5 are expected to be greater than 80, so we assign (2/5)(20) = 8 to the interval 80 to 100 and the remaining 12 to the interval 50 to 80.

Bottom of Interval   Top of Interval   Number of Claims   Expected 2nd Moment Limited to 80
0                    25                30                 208
25                   50                32                 1458
50                   80                12                 4300
80                   100               8                  6400
100                  200               8                  6400
                                       Average            2299
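A small Python sketch of the grouped calculation in 32.21; the interval boundaries and claim counts are those in the table, and losses are assumed to be uniformly distributed within each interval:

def second_moment_limited(a, b, cap):
    # E[(X ^ cap)^2] for a loss uniformly distributed on (a, b)
    if b <= cap:
        return (b**3 - a**3) / (3.0 * (b - a))
    if a >= cap:
        return float(cap) ** 2
    below = (cap**3 - a**3) / (3.0 * (cap - a))   # uniform piece from a up to the cap
    return ((cap - a) * below + (b - cap) * cap**2) / (b - a)

intervals = [(0, 25, 30), (25, 50, 32), (50, 100, 20), (100, 200, 8)]
claims = sum(n for _, _, n in intervals)
average = sum(n * second_moment_limited(a, b, 80) for a, b, n in intervals) / claims
print(average)    # about 2299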
32.22. B., 32.23. C., 32.24. E., 32.25. B., 32.26. C.
E[X ∧ 10] = ∫ from 0 to 10 of x/40 dx + (3/4)(10) = 8.75.
E[X ∧ 25] = ∫ from 0 to 25 of x/40 dx + (15/40)(25) = 17.1875.
E[(X ∧ 10)²] = ∫ from 0 to 10 of x²/40 dx + (3/4)(10²) = 83.333.
E[(X ∧ 25)²] = ∫ from 0 to 25 of x²/40 dx + (15/40)(25²) = 364.583.
Layer Average Severity is: E[X ∧ 25] - E[X ∧ 10] = 17.1875 - 8.75 = 8.4375.
2nd moment of the layer = E[(X ∧ 25)²] - E[(X ∧ 10)²] - (2)(10)(E[X ∧ 25] - E[X ∧ 10]) = 364.583 - 83.333 - (2)(10)(8.4375) = 112.5.
Variance of the layer = 112.5 - 8.4375² = 41.31.
Alternately, the contribution to the layer from each small loss is 0, from each medium loss is x - 10, and from each large loss is 15. Thus the second moment of the layer is:
∫ from 10 to 25 of (x - 10)²/40 dx + (15/40)(15²) = 28.125 + 84.375 = 112.5. Proceed as before.
32.27. C. Mean = (50%)(10) + (30%)(50) + (20%)(100) = 40.
Second Moment = (50%)(10²) + (30%)(50²) + (20%)(100²) = 2800. Variance = 2800 - 40² = 1200.
Coefficient of variation = √1200 / 40 = 0.866.
32.28. E. Third Central Moment = (50%)(10 - 40)³ + (30%)(50 - 40)³ + (20%)(100 - 40)³ = 30,000.
Skewness = Third Central Moment / Variance^1.5 = 30,000/1200^1.5 = 0.722.
32.29. A. Mean = (50%)(10) + (30%)(25) + (20%)(25) = 17.5.
Second Moment = (50%)(10²) + (30%)(25²) + (20%)(25²) = 362.5. Variance = 362.5 - 17.5² = 56.25.
Coefficient of variation = √56.25 / 17.5 = 0.429.
32.30. B. Mean = (50%)(10) + (30%)(50) + (20%)(60) = 32.
Second Moment = (50%)(10²) + (30%)(50²) + (20%)(60²) = 1520. Variance = 1520 - 32² = 496.
Coefficient of variation = √496 / 32 = 0.696.
32.31. B. Third Central Moment = (50%)(10 - 32)³ + (30%)(50 - 32)³ + (20%)(60 - 32)³ = 816.
Skewness = Third Central Moment / Variance^1.5 = 816/496^1.5 = 0.074.
32.32. C. (50%)(0) + (30%)(20) + (20%)(40) = 14.
32.33. D. Second Moment = (50%)(0²) + (30%)(20²) + (20%)(40²) = 440. Variance = 440 - 14² = 244.
Coefficient of variation = √244 / 14 = 1.116.
32.34. B. E[X ∧ 500] = ∫ from 0 to 500 of x f(x) dx + 500 S(500) = 217 + (500)(1 - 0.685) = 374.5.
E[(X ∧ 500)²] = ∫ from 0 to 500 of x² f(x) dx + 500² S(500) = 76,616 + (500²)(1 - 0.685) = 155,366.
Var[X ∧ 500] = E[(X ∧ 500)²] - E[X ∧ 500]² = 155,366 - 374.5² = 15,116.
Comment: Based on a Gamma Distribution with α = 4.3 and θ = 100.
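The discrete calculations in 32.27 through 32.31 are quick to verify; a minimal Python sketch with the sizes and probabilities given in those problems:

def summary(sizes, probs):
    mean = sum(p * x for x, p in zip(sizes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(sizes, probs))
    third = sum(p * (x - mean) ** 3 for x, p in zip(sizes, probs))
    return mean, var ** 0.5 / mean, third / var ** 1.5     # mean, CV, skewness

print(summary([10, 50, 100], [0.5, 0.3, 0.2]))   # 40, 0.866, 0.722 (32.27 and 32.28)
print(summary([10, 50, 60], [0.5, 0.3, 0.2]))    # 32, 0.696, 0.074 (32.30 and 32.31)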
32.35. C. From the Tables attached to the exam, for the Single Parameter Pareto, for x ≥ θ:
E[X ∧ x] = αθ/(α - 1) - θ^α / {(α - 1) x^(α-1)}.
E[(X ∧ x)²] = αθ²/(α - 2) - 2θ^α / {(α - 2) x^(α-2)}.
Thus E[X ∧ 250] = (3)(200)/2 - 200³/{(2)(250²)} = 236. E[X ∧ 1000] = (3)(200)/2 - 200³/{(2)(1000²)} = 296.
S(250) = (200/250)³ = 0.512. Thus the mean payment per payment is: (80%)(296 - 236)/0.512 = 93.75.
E[(X ∧ 250)²] = (3)(200²)/1 - (2)(200³)/250 = 56,000. E[(X ∧ 1000)²] = (3)(200²)/1 - (2)(200³)/1000 = 104,000.
Thus the second moment of the non-zero payments is: (80%)²{104,000 - 56,000 - (2)(250)(296 - 236)}/0.512 = 22,500.
Thus the variance of the non-zero payments is: 22,500 - 93.75² = 13,711.
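A minimal Python sketch of 32.35, using the Single Parameter Pareto limited-moment formulas quoted above, with α = 3, θ = 200, an 80% coinsurance factor, a 250 deductible, and a 1000 maximum covered loss as in the problem:

alpha, theta, c, d, u = 3.0, 200.0, 0.8, 250.0, 1000.0

lim1 = lambda x: alpha * theta / (alpha - 1) - theta**alpha / ((alpha - 1) * x**(alpha - 1))
lim2 = lambda x: alpha * theta**2 / (alpha - 2) - 2 * theta**alpha / ((alpha - 2) * x**(alpha - 2))
S = lambda x: (theta / x) ** alpha

mean_pp = c * (lim1(u) - lim1(d)) / S(d)                                    # payment per payment
second_pp = c**2 * (lim2(u) - lim2(d) - 2 * d * (lim1(u) - lim1(d))) / S(d)
print(mean_pp, second_pp - mean_pp**2)    # 93.75 and about 13,711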
32.36. The probability that a loss exceeds d is: e^(-d/θ). The non-zero payments excess of a deductible d are the same as the original Exponential. Zero payments contribute nothing to the aggregate amount paid. One can think of Y = (X - d)+ as the aggregate that results from a Bernoulli frequency with q = e^(-d/θ), and an Exponential severity with mean θ. This has a variance of: (mean of Bernoulli)(var. of Expon.) + (mean of Expon.)²(var. of Bernoulli) = (q)(θ²) + (θ²)(q)(1 - q) = (θ²)(q)(2 - q). Mean aggregate is: qθ.
Coefficient of variation is: √[(θ²)(q)(2 - q)] / (qθ) = √(2/q - 1) = √(2e^(d/θ) - 1).
Comment: Similar to 4, 5/07, Q.13.
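The closed form in 32.36 is easy to confirm numerically; a minimal sketch in Python with an arbitrary θ and d:

from math import exp, sqrt

theta, d = 1000.0, 500.0
q = exp(-d / theta)                        # probability of a non-zero payment
mean = q * theta
var = theta**2 * q * (2 - q)               # Bernoulli frequency, Exponential severity
print(sqrt(var) / mean, sqrt(2 * exp(d / theta) - 1))   # the two expressions agree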
32.37. (i) E[U] = E[X ∧ b] - E[X ∧ a]. E[V] = E[X ∧ d] - E[X ∧ c].
For x < c, V = 0. For c < x < d, U = b - a, and V = x - c. For d < x, U = b - a, and V = d - c. Therefore,
E[UV] = ∫ from c to d of (b − a)(x − c) f(x) dx + (b - a)(d - c) S(d) = (b - a){∫ from c to d of (x − c) f(x) dx + (d - c) S(d)}
= (b - a)(expected losses in the layer from c to d) = (b - a){E[X ∧ d] - E[X ∧ c]}.
Cov[U, V] = E[UV] - E[U]E[V] = {b - a - E[X ∧ b] + E[X ∧ a]}{E[X ∧ d] - E[X ∧ c]}.
Cov[U, V] = (width of the lower interval minus the layer average severity of the lower interval)(layer average severity of the upper interval).
Cov[U, V] = {E[(b - X)+] - E[(a - X)+]}{E[X ∧ d] - E[X ∧ c]}.
(ii) E[X ∧ 10] = 8(1 - e^(-10/8)) = 5.708.
Using the result from part (i) with a = 0, b = c = 10, and d = ∞:
Covariance = {10 - E[X ∧ 10]}{E[X] - E[X ∧ 10]} = (10 - 5.708)(8 - 5.708) = 9.84.
(iii) Mean of the layer from 0 to 10 is: E[X ∧ 10] = 5.708.
Second Moment of the layer from 0 to 10 is: E[(X ∧ 10)²] = 2(8²) - 2(8²)e^(-10/8) - 2(8)(10)e^(-10/8) = 45.487.
Variance of the layer from 0 to 10 is: 45.487 - 5.708² = 12.906.
(iv) Mean of the layer from 10 to ∞ is: E[X] - E[X ∧ 10] = 8 - 5.708 = 2.292. E[X²] = 2θ² = 2(8²) = 128.
Second Moment of the layer from 10 to ∞ is: E[X²] - E[(X ∧ 10)²] - (2)(10)(E[X] - E[X ∧ 10]) = 128 - 45.487 - (20)(2.292) = 36.673.
Variance of the layer from 10 to ∞ is: 36.673 - 2.292² = 31.420.
(v) Correlation of the layer from 0 to 10 and the layer from 10 to infinity is: 9.84/√[(12.906)(31.420)] = 0.489.
(vi) E[X ∧ 10] = {θ/(α-1)}{1 - (θ/(θ+x))^(α−1)} = (16/2){1 - (16/26)²} = 4.970.
Using the result from part (i) with a = 0, b = c = 10, and d = ∞:
Covariance = {10 - E[X ∧ 10]}{E[X] - E[X ∧ 10]} = (10 - 4.970)(8 - 4.970) = 15.24.
(vii) Mean of the layer from 0 to 10 is: E[X ∧ 10] = 4.970.
Second Moment of the layer from 0 to 10 is: E[(X ∧ 10)²] = 2(16²){1 - (1 + 10/16)^(-2)(1 + (2)(10)/16)}/{(2)(1)} = 37.870.
Variance of the layer from 0 to 10 is: 37.870 - 4.970² = 13.169.
(viii) Mean of the layer from 10 to ∞ is: E[X] - E[X ∧ 10] = 8 - 4.970 = 3.030. E[X²] = 2(16²)/{(2)(1)} = 256.
Second Moment of the layer from 10 to ∞ is: E[X²] - E[(X ∧ 10)²] - (2)(10)(E[X] - E[X ∧ 10]) = 256 - 37.870 - (20)(3.030) = 157.530.
Variance of the layer from 10 to ∞ is: 157.530 - 3.030² = 148.35.
(ix) Correlation of the layer from 0 to 10 and the layer from 10 to infinity is: 15.24/√[(13.169)(148.35)] = 0.345.
Comment: Since a larger loss contributes more to both layers than a smaller loss, the losses in the layers are positively correlated.
32.38. (i) E[X] = exp[4 + 0.3²/2] = 57.11.
(ii) E[X²] = exp[(2)(4) + (2)(0.3²)] = 3568.85. VAR[X] = 3568.85 - 57.11² = 307.3.
(iii) E[Y] = 0 Prob[X < 70] + E[X - 70 | X > 70] Prob[X > 70] = E[X - 70 | X > 70] Prob[X > 70] = e(70) S(70) = E[X] - E[X ∧ 70].
E[X ∧ 70] = exp[4 + 0.3²/2] Φ[(ln(70) - 4 - 0.3²)/0.3] + 70 {1 - Φ[(ln(70) - 4)/0.3]} = 57.11 Φ[0.53] + (70){1 - Φ[0.83]} = (57.11)(0.7019) + (70)(1 - 0.7967) = 54.32.
Thus E[Y] = E[X] - E[X ∧ 70] = 57.11 - 54.32 = 2.79.
(iv) E[Y²] = 0 Prob[X < 70] + E[(X - 70)² | X > 70] Prob[X > 70] = (second moment of the layer from 70 to infinity) = E[X²] - E[(X ∧ 70)²] - (2)(70){E[X] - E[X ∧ 70]}.
For the LogNormal Distribution the second limited moment is:
E[(X ∧ x)²] = exp[2µ + 2σ²] Φ[(ln(x) − µ − 2σ²)/σ] + x² {1 - Φ[(ln(x) − µ)/σ]}.
E[(X ∧ 70)²] = exp[(2)(4) + (2)(0.3²)] Φ[(ln(70) - 4 - (2)(0.3²))/0.3] + 70² {1 - Φ[(ln(70) - 4)/0.3]} = 3568.85 Φ[0.23] + 4900 {1 - Φ[0.83]} = (3568.85)(0.5910) + (4900)(1 - 0.7967) = 3105.36.
Thus, E[Y²] = E[X²] - E[(X ∧ 70)²] - (2)(70){E[X] - E[X ∧ 70]} = 3568.85 - 3105.36 - (140)(57.11 - 54.32) = 72.89.
VAR[Y] = E[Y²] - E[Y]² = 72.89 - 2.79² = 65.11.
(v) E[XY] = E[X(X - 70) | X > 70] Prob[X > 70] = E[(70)(X - 70) + (X - 70)(X - 70) | X > 70] Prob[X > 70]
= 70 E[X - 70 | X > 70] Prob[X > 70] + E[(X - 70)² | X > 70] Prob[X > 70] = 70 E[Y] + E[Y²] = (70)(2.79) + 72.89 = 268.19.
Cov[X, Y] = E[XY] - E[X]E[Y] = 268.19 - (57.11)(2.79) = 108.85.
(vi) Corr[X, Y] = Cov[X, Y] / √(VAR[X] VAR[Y]) = 108.85 / √[(307.3)(65.11)] = 0.77.
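The covariance and correlation in 32.38 can also be checked by simulation (not part of the original solution); a minimal Python sketch drawing from the LogNormal with µ = 4 and σ = 0.3:

import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=4.0, sigma=0.3, size=2_000_000)
y = np.maximum(x - 70.0, 0.0)              # the payment excess of 70 per loss
print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])   # roughly 109 and 0.77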
32.39. B. The first moment of the layer is: 91,837 - 55,556 = 36,281.
The second moment of the layer is: 20,408 million - 4444 million - (2)(100,000)(36,281) = 8707.8 million.
Variance of the layer is: 8707.8 million - 36,281² = 7392 million. Standard Deviation of the layer is: 85,977.
CV of the layer is: 85,977/36,281 = 2.37.
Comment: Based on a Pareto Distribution with α = 3 and θ = 200,000.
32.40. D. The first moment of the layer is: 97,222 - 80,247 = 16,975.
The second moment of the layer is: 27,778 million - 12,346 million - (2)(250,000)(16,975) = 6944.5 million.
1 + CV² = 6944.5 million / 16,975² = 24.100. ⇒ CV = 4.81.
32.41. C. The non-zero payments are uniform from 5 to 20, with variance: (20 - 5)²/12 = 18.75.
32.42. B. The non-zero payments are uniform from 5 to 20, with mean: 12.5, variance: (20 - 5)²/12 = 18.75, and second moment: 18.75 + 12.5² = 175. The probability of a non-zero payment is: 15/18 = 5/6.
Thus YL is a two-point mixture of a uniform distribution from 5 to 20 and a distribution that is always zero, with weights 5/6 and 1/6.
The mean of the mixture is: (5/6)(12.5) + (1/6)(0) = 10.417. The second moment of the mixture is: (5/6)(175) + (1/6)(0²) = 145.83.
The variance of this mixture is: 145.83 - 10.417² = 37.3.
Alternately, YL can be thought of as a compound distribution, with Bernoulli frequency with mean 5/6 and Uniform severity from 5 to 20. The variance of this compound distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)²(Var. Freq.) = (5/6)(18.75) + (12.5)²{(5/6)(1/6)} = 37.3.
32.43. A. The probability that a loss exceeds 300 is: (4/7)⁵ = 0.06093.
The losses truncated and shifted from below at 300 are also a Pareto Distribution, but with α = 5 and θ = 400 + 300 = 700.
One can think of Y = (X - 300)+ as the aggregate that results from a Bernoulli frequency with q = 0.06093, and a Pareto severity with α = 5 and θ = 700. This Pareto has mean: 700/4 = 175, second moment: (2)(700²)/{(4)(3)} = 81,667, and variance: 81,667 - 175² = 51,042.
This has a variance of: (mean of Bernoulli)(var. of Pareto) + (mean of Pareto)²(var. of Bernoulli) = (0.06093)(51,042) + (175²)(0.06093)(1 - 0.06093) = 4862.
Mean aggregate is: (0.06093)(175) = 10.663. Coefficient of variation is: √4862 / 10.663 = 6.54.
Alternately, this is mathematically equivalent to a two point mixture, with 0.06093 weight to a Pareto with α = 5 and θ = 700 (the non-zero payments) and (1 - 0.06093) weight to a distribution that is always zero.
The mean is: (0.06093)(175) + (1 - 0.06093)(0) = 10.663.
The second moment is the weighted average of the two second moments: (0.06093)(81,667) + (1 - 0.06093)(0) = 4976.
Therefore, 1 + CV² = 4976/10.663² = 43.76. ⇒ CV = 6.54.
Comment: Similar to 4, 5/07, Q.13, which involves an Exponential rather than a Pareto.
32.44. D. An Exponential distribution truncated and shifted from below is the same Exponential Distribution, due to the memoryless property of the Exponential. Thus the nonzero payments are Exponential with mean 1000. The probability of a nonzero payment is the probability that a loss is greater than the deductible of 100; S(100) = e^(-100/1000) = 0.90484. Thus the payments of the insurer can be thought of as a compound distribution, with Bernoulli frequency with mean 0.90484 and Exponential severity with mean 1000.
The variance of this compound distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)²(Var. Freq.) = (0.90484)(1000²) + (1000)²{(0.90484)(0.09516)} = 990,945.
Equivalently, the payments of the insurer in this case are a two point mixture of an Exponential with mean 1000 and a distribution that is always zero, with weights 0.90484 and 0.09516. This has a first moment of: (1000)(0.90484) + (0.09516)(0) = 904.84, and a second moment of: {(2)(1000²)}(0.90484) + (0.09516)(0²) = 1,809,680. Thus the variance is: 1,809,680 - 904.84² = 990,945.
Alternately, for the Exponential Distribution, E[X] = θ = 1000, and E[X²] = 2θ² = 2 million.
For the Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)). E[X ∧ 100] = 1000(1 - e^(-100/1000)) = 95.16.
For the Exponential, E[(X ∧ x)^n] = n! θ^n Γ(n+1; x/θ) + x^n e^(-x/θ).
E[(X ∧ 100)²] = (2)1000² Γ(3; 100/1000) + 100² e^(-100/1000).
According to Theorem A.1 in Loss Models, for integral α, the incomplete Gamma function Γ(α; y) is 1 minus the first α densities of a Poisson Distribution with mean y. Γ(3; y) = 1 - e^(-y)(1 + y + y²/2). Γ(3; 0.1) = 1 - e^(-0.1)(1 + 0.1 + 0.1²/2) = 0.0001546.
Therefore, E[(X ∧ 100)²] = (2 million)(0.0001546) + 10000e^(-0.1) = 9357.
The first moment of the layer from 100 to ∞ is: E[X] - E[X ∧ 100] = 1000 - 95.16 = 904.84.
The second moment of the layer from 100 to ∞ is: E[X²] - E[(X ∧ 100)²] - (2)(100)(E[X] - E[X ∧ 100]) = 2,000,000 - 9357 - (200)(904.84) = 1,809,675.
Therefore, the variance of the layer from 100 to ∞ is: 1,809,675 - 904.84² = 990,940.
Alternately, one can work directly with the integrals, using integration by parts.
The first moment of the layer from 100 to ∞ is:
∫ from 100 to ∞ of (x - 100)e^(-x/1000)/1000 dx = ∫ from 100 to ∞ of x e^(-x/1000)/1000 dx - (1/10) ∫ from 100 to ∞ of e^(-x/1000) dx
= [-x e^(-x/1000) - 1000e^(-x/1000)] evaluated from x = 100 to ∞, minus 100e^(-0.1) = 100e^(-0.1) + 1000e^(-0.1) - 100e^(-0.1) = 904.84.
The second moment of the layer from 100 to ∞ is:
∫ from 100 to ∞ of (x - 100)² e^(-x/1000)/1000 dx = ∫ x² e^(-x/1000)/1000 dx - ∫ x e^(-x/1000)/5 dx + 10 ∫ e^(-x/1000) dx (each from 100 to ∞)
= [-x² e^(-x/1000) - 2000x e^(-x/1000) - 2,000,000e^(-x/1000)] evaluated from x = 100 to ∞, plus [200x e^(-x/1000) + 200,000e^(-x/1000)] evaluated from x = 100 to ∞, plus 10,000e^(-0.1)
= e^(-0.1){10,000 + 200,000 + 2,000,000 - 20,000 - 200,000 + 10,000} = 2,000,000e^(-0.1) = 1,809,675.
Therefore, the variance of the layer from 100 to ∞ is: 1,809,675 - 904.84² = 990,940.
Comment: Very long and difficult, unless one uses the memoryless property of the Exponential Distribution.
32.45. C. E[Y] = (2)(4/5) + 4(1/5) = 2.4.
E[Y²] = ∫ from 0 to 4 of x²/5 dx + 4²(1/5) = 64/15 + 16/5 = 7.4667.
Var[Y] = 7.4667 - 2.4² = 1.7067.
32.46. C. For X ≤ 150, X = X ∧ 150.
So the only contribution to E[X²] - E[(X ∧ 150)²] comes from any losses of size > 150.
Losses uniform on (100, 200] ⇒ expect 3 claims greater than 150, out of a total of 74.
Uniform on (150, 200], E[X²] = variance + mean² = 50²/12 + 175² = 30,833.
On (150, 200], each loss is at least 150, and therefore E[(X ∧ 150)²] = 150² = 22,500.
E[X²] - E[(X ∧ 150)²] = Prob[X > 150] E[X² - (X ∧ 150)² | X uniform on (150, 200]] = (3/74)(30,833 - 22,500) = 338.
32.47. A. q_x = 0.1. q_{x+1} = 0.2. q_{x+2} = 0.3.
Prob[K = 0] = Prob[Die 1st Year] = q_x = 0.1.
Prob[K = 1] = P[Alive @ x+1] P[Die 2nd Year | Alive @ x+1] = (1 - q_x) q_{x+1} = (0.9)(0.2) = 0.18.
Prob[K = 2] = P[Alive @ x+2] P[Die 3rd Year | Alive @ x+2] = (1 - q_x)(1 - q_{x+1}) q_{x+2} = (0.9)(0.8)(0.3) = 0.216.
Prob[K ≥ 3] = 1 - (0.1 + 0.18 + 0.216) = 0.504.

K      Prob     K ∧ 3    (K ∧ 3)²
0      0.1      0        0
1      0.18     1        1
2      0.216    2        4
≥3     0.504    3        9
Avg.            2.124    5.580

E[K ∧ 3] = (0.1)(0) + (0.18)(1) + (0.216)(2) + (0.504)(3) = 2.124.
E[(K ∧ 3)²] = (0.1)(0) + (0.18)(1) + (0.216)(4) + (0.504)(9) = 5.580.
Var(K ∧ 3) = E[(K ∧ 3)²] - E[K ∧ 3]² = 5.580 - 2.124² = 1.069.
32.48. C. The probability that a loss exceeds 30,000 is: e^(-30000/10000) = 0.049787.
The losses truncated and shifted from below at 30,000 are the same as the original Exponential.
One can think of Y = (X - 30000)+ as the aggregate that results from a Bernoulli frequency with q = 0.049787, and an Exponential severity with mean 10,000.
This has a variance of: (mean of Bernoulli)(var. of Expon.) + (mean of Expon.)²(var. of Bernoulli) = (0.049787)(10000²) + (10000²)(0.049787)(1 - 0.049787) = 9,710,096.
Mean aggregate is: (0.049787)(10000) = 497.87.
Coefficient of variation is: √9,710,096 / 497.87 = 6.26.
Alternately, this is mathematically equivalent to a two point mixture, with 0.049787 weight to an Exponential with mean 10,000 (the non-zero payments) and (1 - 0.049787) weight to a distribution that is always zero.
The mean is: (0.049787)(10,000) + (1 - 0.049787)(0) = 497.87.
The second moment is the weighted average of the two second moments: (0.049787)(2)(10,000²) + (1 - 0.049787)(0) = 9,957,414.
Therefore, 1 + CV² = 9,957,414/497.87² = 40.17. ⇒ CV = 6.26.
Alternately, E[Y²] = E[(X - 30000)² | X > 30000] Prob[X > 30000] = (Second moment of an Exponential Distribution with θ = 10000) e^(-30000/10000) = (2)(10000²)(0.049787) = 9,957,414. Proceed as before.
Alternately, E[Y] = ∫ from 30000 to ∞ of (x − 30000) exp[−x/10000]/10000 dx = ∫ from 0 to ∞ of y exp[−(y + 30000)/10000]/10000 dy = e^(-3) ∫ from 0 to ∞ of y exp[−y/10000]/10000 dy = e^(-3)(10000) = 497.87.
E[Y²] = ∫ from 30000 to ∞ of (x − 30000)² exp[−x/10000]/10000 dx = ∫ from 0 to ∞ of y² exp[−(y + 30000)/10000]/10000 dy = e^(-3) ∫ from 0 to ∞ of y² exp[−y/10000]/10000 dy = e^(-3)(2)(10000²) = 9,957,414. Proceed as before.
Section 33, Mean Excess Loss
As discussed previously, the Mean Excess Loss or Mean Residual Life (complete expectation of life), e(x), is defined as the mean value of those losses greater than size x, where each loss is reduced by x. Thus one only includes those losses greater than size x, and only that part of each such loss greater than x.
e(x) = E[X - x | X > x] = {∫ from x to ∞ of (t - x) f(t) dt} / S(x) = {∫ from x to ∞ of t f(t) dt} / S(x) - x.
The Mean Excess Loss at d, e(d) = average payment per payment with a deductible d.
The Mean Excess Loss is the mean of the loss distribution truncated and shifted at x:
e(x) = (average size of those losses greater in size than x) - x.
Therefore, the average size of those losses greater in size than x = e(x) + x.
On the exam, usually the easiest way to compute the Mean Excess Loss for a distribution is to use the formulas for the Limited Expected Value in Appendix A of Loss Models, and the identity:
e(x) = {E[X] − E[X ∧ x]} / S(x).
Therefore, e(0) = mean, provided the distribution has support x > 0.²²⁸
Exercise: E[X ∧ $1 million] = $234,109. E[X] = $342,222. S($1 million) = 0.06119. Determine e($1 million).
[Solution: e($1 million) = (342,222 - 234,109) / 0.06119 = $1.767 million.]
228
Thus e(0) = mean, with the notable exception of the Single Parameter Pareto.
Formulas for the Mean Excess Loss for Various Distributions:

Distribution: Mean Excess Loss, e(x)
Exponential: θ
Pareto: (θ + x)/(α − 1), α > 1
LogNormal: exp(µ + σ²/2) {1 − Φ[(ln(x) − µ − σ²)/σ]} / {1 − Φ[(ln(x) − µ)/σ]} − x
Gamma: αθ {1 − Γ(α + 1; x/θ)} / {1 − Γ(α; x/θ)} − x
Weibull: θ Γ(1 + 1/τ) {1 − Γ(1 + 1/τ; (x/θ)^τ)} exp[(x/θ)^τ] − x
Single Parameter Pareto: x/(α − 1)
Inverse Gaussian: µ {Φ[√(θ/x)(1 − x/µ)] + e^(2θ/µ) Φ[−√(θ/x)(x/µ + 1)]} / {Φ[√(θ/x)(1 − x/µ)] − e^(2θ/µ) Φ[−√(θ/x)(x/µ + 1)]} − x
Burr: {θ Γ(α − 1/γ) Γ(1 + 1/γ) / Γ(α)} β[α − 1/γ, 1 + 1/γ; 1/(1 + (x/θ)^γ)] (1 + (x/θ)^γ)^α − x, αγ > 1
Trans. Gamma: θ {Γ(α + 1/τ)/Γ(α)} {1 − Γ(α + 1/τ; (x/θ)^τ)} / {1 − Γ[α; (x/θ)^τ]} − x
Gen. Pareto: {θτ/(α − 1)} β[α − 1, τ + 1; θ/(θ + x)] / β[α, τ; θ/(θ + x)] − x, α > 1
Normal: σ φ[(x − µ)/σ] / {1 − Φ[(x − µ)/σ]} + µ − x

It should be noted that for heavier-tailed distributions, just as with the mean, the Mean Excess Loss only exists for certain values of the parameters. Otherwise it is infinite.
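Any entry in this table can be verified against the identity e(x) = {E[X] − E[X ∧ x]}/S(x) by numerical integration; a minimal Python sketch for the Pareto row, with arbitrary parameters α = 3 and θ = 100:

import numpy as np
from scipy import integrate

def e_numeric(pdf, sf, x):
    # e(x) = E[X - x | X > x]: integrate (t - x) f(t) over (x, infinity), then divide by S(x)
    return integrate.quad(lambda t: (t - x) * pdf(t), x, np.inf)[0] / sf(x)

alpha, theta = 3.0, 100.0
pareto_pdf = lambda t: alpha * theta**alpha / (theta + t) ** (alpha + 1)
pareto_sf = lambda t: (theta / (theta + t)) ** alpha
print(e_numeric(pareto_pdf, pareto_sf, 250.0), (theta + 250.0) / (alpha - 1))   # both about 175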
For example, for the Pareto for α ≤ 1, the mean excess loss is infinite or does not exist.
The Exponential distribution is the only distribution with a constant Mean Excess Loss. If F(x) represents the distribution of the ages of death, then e(x) is the (remaining) life expectancy of a person of age x. A constant Mean Excess Loss is independent of age and is equivalent to a force of mortality that is independent of age.
Exercise: For a Pareto with α = 4 and θ = 1000, determine e(800).
[Solution: E[X] = θ/(α−1) = 333.3333. E[X ∧ 800] = {θ/(α−1)}{1 − (θ/(θ + 800))^(α−1)} = 276.1774.
S(800) = {θ/(θ + 800)}^α = (1/1.8)⁴ = 0.09526. e(800) = (333.3333 - 276.1774)/0.09526 = 600.
Alternately, e(x) = (θ + x)/(α - 1) = (1000 + 800)/(4 - 1) = 600.]
Mean Excess Loss in terms of the Survival Function:
The Mean Excess Loss can be written in terms of the survival function, S(x) = 1 - F(x).
By definition, e(x) is the ratio of loss dollars excess of x divided by S(x).
e(x) = {∫ from x to ∞ of (t − x) f(t) dt} / S(x) = {∫ from x to ∞ of t f(t) dt - S(x)x} / S(x).
Using integration by parts and the fact that the integral of f(x) is -S(x):²²⁹
e(x) = {S(x)x + ∫ from x to ∞ of S(t) dt - S(x)x} / S(x).
e(x) = {∫ from x to ∞ of S(t) dt} / S(x).
So the Mean Excess Loss at x is the integral of the survival function from x to infinity divided by the survival function at x.²³⁰
229 Note that the derivative of S(x) is d(1 - F(x))/dx = -f(x). Remember there is an arbitrary constant for indefinite integrals. Thus the indefinite integral of f(x) is either -F(x) or S(x) = 1 - F(x).
230 The Mean Excess Loss as defined here is the same as the complete expectation of life as defined in Life Contingencies. The formula given here is equivalent to formula 3.5.2 in Actuarial Mathematics by Bowers et. al., pp. 62-63. _s p_x = S(x+s)/S(x), and therefore: e°_x = ∫ from 0 to ∞ of _s p_x ds.
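A minimal Python sketch of this identity, assuming an Exponential with θ = 10, for which e(x) should be the constant θ:

from math import exp
from scipy import integrate

theta, x = 10.0, 7.0
S = lambda t: exp(-t / theta)
e_x = integrate.quad(S, x, 200.0)[0] / S(x)   # an upper limit of 200 is effectively infinity here
print(e_x)    # about 10, the constant mean excess loss of the Exponential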
For example, for the Pareto Distribution, S(x) = θα (θ+x)−α. Therefore, e(x) = {θα (θ+x)1−α / (α−1)} / {θα (θ+x)−α} = (θ + x) / (α − 1). This matches the formula given above for the Mean Excess Loss of the Pareto Distribution. Behavior of e(x) in the Righthand Tail: Here is a table of the behavior of the Mean Excess Loss as the loss size approaches infinity for some distributions: Distribution
Behavior of e(x) as x→∞
For Extremely Large x
Exponential
constant
e(x) = θ
Single Par. Pareto
increases linearly
e(x) = x / (α-1)
Pareto
increases linearly
e(x) = (θ+x) / (α−1)
LogNormal
increases to infinity less than linearly
e(x) ≈ x σ2 / ln(x)
Gamma, α>1
decreases towards a horizontal asymptote
e(x)→ θ
Gamma, α<1
increases towards a horizontal asymptote231
e(x)→ θ
Inverse Gaussian
increases to a constant
e(x)→ 2µ2/θ
Weibull, τ>1
decreases to zero
e(x) ≈ θτ x1 − τ / τ
Weibull, τ<1
increases to infinity less than linearly
e(x) ≈ θτ x1 − τ / τ
Trans. Gamma, τ>1
decreases to zero
e(x) ≈ θτ x1 − τ / τ
Trans. Gamma, τ<1
increases to infinity less than linearly
e(x) ≈ θτ x1 − τ / τ
Burr
increases to infinity approximately linearly
e(x) ≈ x / (αγ−1)
Gen. Pareto
increases to infinity approximately linearly
e(x) ≈ x / (α−1)
Inv. Trans. Gamma
increases to infinity approximately linearly
e(x) ≈ x / (ατ−1)
Normal
decreases to zero approximately as 1/x
e(x) ≈ σ2 / (x-µ)
231
For the Gamma Distribution for large x, e(x) ≅ θ − (α-1)/x .
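The qualitative behaviors in the table are easy to reproduce numerically; a minimal Python sketch for the two Weibull rows, with an arbitrary θ = 1000, computing e(x) as the integral of the survival function divided by S(x):

import numpy as np
from scipy import integrate

def e_weibull(x, theta, tau):
    S = lambda t: np.exp(-((t / theta) ** tau))
    return integrate.quad(S, x, np.inf)[0] / S(x)

for tau in (0.5, 2.0):
    print(tau, [round(float(e_weibull(x, 1000.0, tau)), 1) for x in (1000, 2000, 3000)])
# tau = 0.5: e(x) increases with x; tau = 2.0: e(x) decreases toward zero.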
Recall that the mean and thus the Mean Excess Loss fails to exist for: Pareto with α ≤ 1, Inverse Transformed Gamma with ατ ≤ 1, Generalized Pareto with α ≤ 1, and Burr with αγ ≤ 1. Also the Gamma with α = 1 and the Weibull with τ = 1 are Exponential distributions, and thus have constant Mean Excess Loss. The Transformed Gamma with τ = 1 is a Gamma distribution, and thus in this case the behavior of the Mean Excess Loss depends on whether alpha is greater than, less than, or equal to one.
For the LogNormal, e(x) approaches its asymptotic behavior very slowly. Thus the formula derived above, e(x) ≈ x/{(ln(x) − µ)/σ² - 1}, will provide a somewhat better approximation than the formula e(x) ≈ xσ²/ln(x), until one reaches truly immense loss sizes.
Those curves with heavier tails have the Mean Excess Loss increase with x. Comparing the Mean Excess Loss provides useful information on the fit of the curves to the data. Small differences in the tail of the distributions that may not have been evident in the graphs of the Distribution Function are made evident by graphing the Mean Excess Loss. I have found the Mean Excess Loss particularly useful at distinguishing between the tails of the different distributions when using them to estimate Excess Ratios.
Below is shown for various Gamma distributions the behavior of the Mean Excess Loss as the loss size increases. For α = 1, the Exponential Distribution has a constant mean excess loss equal to θ, in this case 1000. For α > 1, the mean excess loss decreases to θ. For a Gamma Distribution with α < 1, the mean excess loss increases to θ. The tail of a Gamma Distribution is similar to that of an Exponential Distribution with the same θ.
[Graph: e(x) versus x for a Gamma Distribution with alpha > 1 and one with alpha < 1, each approaching the horizontal asymptote e(x) = 1000.]
For the Weibull with τ = 1, the Exponential Distribution has a constant mean excess loss equal to θ, in this case 1000. For τ > 1, the mean excess loss decreases to 0. For a Weibull Distribution with τ < 1, the mean excess loss increases to infinity less than linearly.
[Graph: e(x) versus x for a Weibull Distribution with tau < 1 (increasing) and one with tau > 1 (decreasing to zero).]
The Pareto and LogNormal Distribution each have heavy tails. However, the Pareto Distribution has its mean excess loss increase linearly, while that of the LogNormal increases slightly less than linearly. Thus the Pareto has a heavier (righthand) tail than the LogNormal, which in turn has a heavier tail than the Weibull.²³²
[Graph: mean excess loss e(x) versus x for the Pareto, LogNormal, and Weibull (tau < 1) Distributions.]
All three distributions have mean residual lives that increase to infinity. Note that it takes a while for the mean residual life of the Pareto to become larger than that of the LogNormal.²³³
232 The mean excess losses are graphed for a Weibull Distribution with θ = 500 and τ = 1/2, a LogNormal Distribution with µ = 5.5 and σ = 1.678, and a Pareto Distribution with α = 2 and θ = 1000. All three distributions have a mean of 1000.
233 In this case, the Pareto has a larger mean residual life for loss sizes 15 times the mean and greater.
The mean residual life of an Inverse Gaussian increases to a constant; e(x) → 2µ²/θ. Thus the Inverse Gaussian has a tail that is somewhat similar to a Gamma Distribution.
Here is e(x) for an Inverse Gaussian with µ = 1000 and θ = 500:
[Graph: e(x) versus x for this Inverse Gaussian, increasing towards its asymptote.]
Here is e(x) for an Inverse Gaussian with µ = 1000 and θ = 2500:
[Graph: e(x) versus x for this Inverse Gaussian, first decreasing and then increasing.]
In this case, e(x) initially decreases and then increases towards: (2)(1000²)/2500 = 800.
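A minimal Python sketch of this behavior, computing e(x) for the Inverse Gaussian with µ = 1000 and θ = 2500 from its distribution function (the Loss Models parameterization is assumed) by numerical integration of the survival function:

import numpy as np
from scipy import integrate
from scipy.stats import norm

mu, theta = 1000.0, 2500.0

def S(x):
    z = np.sqrt(theta / x)
    # survival function of the Inverse Gaussian, Loss Models parameterization
    return 1.0 - (norm.cdf((x / mu - 1.0) * z) + np.exp(2 * theta / mu) * norm.cdf(-(x / mu + 1.0) * z))

e = lambda x: integrate.quad(S, x, np.inf)[0] / S(x)
print([round(float(e(x)), 1) for x in (1000.0, 4000.0, 10000.0)])   # approaching 2*mu**2/theta = 800 for large x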
Determining the Tail Behavior of the Mean Excess Loss:
The fact that the Mean Excess Loss is the integral of S(t) from x to infinity divided by S(x) is the basis for a method of determining the behavior of e(x) as x approaches infinity. One applies LʼHospitalʼs Rule twice.
lim (x→∞) e(x) = lim (x→∞) {∫ from x to ∞ of S(t) dt}/S(x) = lim (x→∞) S(x)/f(x) = lim (x→∞) -f(x)/f´(x).
For example, for the Gamma distribution: f(x) = θ^(−α) x^(α−1) e^(−x/θ)/Γ(α).
fʼ(x) = (α−1)θ^(−α) x^(α−2) e^(−x/θ)/Γ(α) − θ^(−(α+1)) x^(α−1) e^(−x/θ)/Γ(α).
lim (x→∞) e(x) = lim (x→∞) -f(x)/f´(x) = lim (x→∞) 1/{1/θ − (α−1)/x} = θ.
When the Mean Excess Loss increases to infinity, it may be useful to look at the limit of x/e(x). Again one applies LʼHospitalʼs Rule twice.
lim (x→∞) x/e(x) = lim (x→∞) xS(x)/{∫ from x to ∞ of S(t) dt} = lim (x→∞) {-f(x)x + S(x)}/{-S(x)} = lim (x→∞) {-f(x) - xf´(x) - f(x)}/f(x) = lim (x→∞) {-xf´(x)/f(x)} - 2.
For example, for the LogNormal distribution: f(x) = ξ/x, where ξ = exp[-0.5({ln(x) − µ}/σ)²]/{σ√(2π)}.
f´(x) = -ξ/x² - {(ln(x) − µ)/(xσ²)}(ξ/x).
lim (x→∞) x/e(x) = lim (x→∞) -xf´(x)/f(x) - 2 = lim (x→∞) {1 + (ln(x) − µ)/σ²} - 2 ≈ ln(x)/σ².
Thus for the LogNormal distribution the Mean Excess Loss increases to infinity, but a little less quickly than linearly: e(x) ≈ x/{(ln(x) − µ)/σ² - 1} ≈ xσ²/ln(x).
Loss Distributions, §33 Mean Excess Loss
HCM 10/8/12,
Page 516
As another example, for the Burr distribution: f(x) = αγxγ−1θ−γ(1+(x/θ)γ)−(α + 1).
f´(x) = (γ−1)f(x)/x -(α+1)γ xγ−1 f(x) /(1+(x/θ)γ).
lim x/e(x) = lim -xf´(x)/ f(x) - 2 = lim -(γ−1) +(α+1)γ xγθ−γ /(1+(x/θ)γ) - 2 = −γ+1+ αγ+γ −2 = αγ−1. x→∞
x→∞
x→∞
The Mean Excess Loss for the Burr Distribution increases to infinity approximately linearly: e(x) ≈ x / (αγ−1), provided αγ > 1. Moments, CV, and the Mean Excess Loss: When the relevant moments are finite and the distribution has support x > 0, then one can compute the moments of the distribution in terms of the mean excess loss, e(x).234 We have E[X] = e(0).235 We will now show how to write the second moment in terms of an integral of the mean excess loss and the survival function. As shown in a previous section: ∞
E[X2 ] = 2
∫ S(t) t dt
t=0
Note that the integral of S(t)/E[X] from x to infinity is the excess ratio, R(x), and thus Rʼ(x) = -S(x)/E[X]. Using this fact and integration by parts: ∞
∞
∫
∞
] + 2∫R(t) dt
E[X2 ]/E[X] = 2 t S(t)/E[X] dt = -2t R(t) t=0
t=0
t=0
For a finite second moment, tR(t) goes to zero as x goes to infinity, therefore:236 ∞
∞
∫
∫
t=0
t=0
E[X2 ] = 2E[X] R(t) dt = 2 S(t)e(t) dt.
234
Since e(x) determines the distribution, it follows that e(x) determines the moments if they exist. The numerator of e(0) is the losses excess of zero, i.e. all the losses. The denominator of e(0) is the number of losses larger than 0, i.e., the total number of losses. The support of the distribution has been assumed to be x > 0. 236 I have used the fact that E[X]R(t) = S(t)e(t). Both are the losses excess of t. 235
2013-4-2,
Loss Distributions, §33 Mean Excess Loss
HCM 10/8/12,
Page 517
Exercise: For a Pareto Distribution, what is the integral from zero to infinity of S(x)e(x)? [Solution: S(x) = (1+x/θ)−α. e(x) = (x+θ)/(α-1) = (1+ x/θ)θ/(α-1). S(x)e(x) = (1+x/θ)−(α−1) θ/(α-1). ∞
∞
∞
∫S(t)e(t) dt = θ/(α-1) ∫(1+t/θ)−(α−1) dt = {θ/(α-1)} θ(α−2)(1+t/θ)−(α−2) ] = θ2/{(α-1)(α−2)}. t=0
t=0
t=0
Comment: The integral is one half of the second moment for a Pareto, consistent with the above result.] Assume that the first two moments are finite and the distribution has support x > 0, and e(x) > e(0) = E[X] for all x. Then: ∞
∞
∞
∫
∫
∫
t=0
t=0
t=0
E[X2 ] = 2 S(t)e(t) dt > 2 S(t)E[X] dt = 2E[X] S(t) dt = 2E[X]E[X].
E[X2 ] > 2E2 [X]. ⇒ E[X2 ] / E2 [X] > 2. ⇒ CV2 = E[X2 ]/E2 [X] - 1 > 1. ⇒ CV > 1. When the first two moments are finite and the distribution has support x>0, then if e(x) > e(0) = E[X] for all x, then the coefficient of variation is greater than one.237 Note that e(x) > e(0), if e(x) is monotonically increasing. Examples where this result applies are the Gamma α < 1, Weibull τ < 1, Transformed Gamma with α < 1 and τ < 1, and the Pareto.238 In each case CV > 1. Note that each of these distributions is heavier-tailed than an Exponential, which has CV = 1. While in the tail e(x) for the LogNormal approaches infinity, it is not necessarily true for the LogNormal that e(x) > e(0) for all x. The mean excess loss of the LogNormal can decrease before it finally increases as per x/ln(x) in the tail. For example, here is a graph of the Mean Excess Loss for a LogNormal with µ = 1 and σ = 0.5:
237 238
See Section 3.4.5 in Loss Models. For α > 2, so that the CV of the Pareto exists.
e(x) 7 6 5 4 3 2
20
40
60
In any case, the CV of the LogNormal is: CV < 1 for σ <
80
100
x
exp(σ2 ) - 1. Thus for the LogNormal,
ln(2) ≅ .82, while CV > 1 for σ >
ln(2) ≅ 0.82.
When the first two moments are finite and the distribution has support x > 0, then if e(x) < e(0) = E[X] for all x, then CV < 1. Note that e(x) < e(0), if e(x) is monotonically decreasing. Examples where this result applies are the Gamma α > 1, Weibull τ > 1, and the Transformed Gamma with α > 1 and τ > 1. In each case CV < 1. Note that each of these distributions is lighter-tailed than an Exponential, which has CV = 1. One can get similar results to those above for higher moments. Exercise: Assuming the relevant moments are finite and the distribution has support x>0, express the integral from zero to infinity of S(x)xn , in terms of moments. [Solution: One applies integration by parts and the fact that dS(x)/dx = - f(x): ∞
∞
∞
∫ S(t)tndt = S(t) tn+1 /(n+1) ] + ∫ f(t)tn+1 /(n+1)dt = E[Xn+1]/(n+1). t=0
t=0
t=0
Where Iʼve used the fact, that if the n+1st moment is finite, then S(x)xn+1 must go to zero as x approaches infinity. Comment: For n = 0 one gets the result that the mean is the integral of the survival function from zero to infinity. For n = 1 one gets the result used above, that the integral of xS(x) from zero to infinity is half of the second moment.]
2013-4-2,
Loss Distributions, §33 Mean Excess Loss
HCM 10/8/12,
Page 519
Exercise: Assuming the relevant moments are finite and the distribution has support x>0, express the integral from zero to infinity of S(x)e(x)xn , in terms of moments. [Solution: S(x)e(x) = R(x)E[X]. Then one applies integration by parts, differentiating R(t) and integrating tn . Since the integral of S(t)/E[X] from x to infinity is R(x), the derivative of R(x) is -S(x)/E[X]. ∞
∞
∞
∞
∫ S(t)e(t)tndt = E[X] ∫ R(t)tndt = E[X] {R(t) tn+1 /(n+1) ] + ∫ (S(t)/E[X]) tn+1 /(n+1) dt } = t=0
t=0
t=0
t=0
∞
∫
(1/(n+1)) S(t) tn+1 dt = (1/(n+1))E[Xn+2]/(n+2) = E[Xn+2]/{(n+1)(n+2)}.] t=0
Where Iʼve used the result of the previous exercise and the fact, that if the n+2nd moment is finite, then R(x)xn+1 must go to zero as x approaches infinity.] Thus we can express moments, when they exist, either as integrals of S(t)e(t) times powers of t, integrals of R(t) times powers of t, or as S(t) times powers of t. Assuming the relevant moments are finite and the distribution has support x>0, then if e(x) > e(0) = E[X] for all x, we have for n ≥ 1: ∞
E[Xn+1]/{n(n+1)} =
∞
∫ S(t)e(t)tn-1dt > E[X] ∫ S(t)tn-1dt = E[X]E[Xn]/n.
t=0
t=0
Thus if e(x) > e(0) for all x, E[Xn+1] > (n+1)E[X]E[Xn ], n≥1. For n = 1 we get a previous result: E[X2 ] > 2E2 [X]. For n = 2 we get: E[X3 ] > 3E[X]E[X2 ]. Conversely, if e(x) < e(0) for all x, E[Xn+1] < (n+1)E[X]E[Xn ], n≥1.
Equilibrium Distribution: Given that X follows a distribution with survival function SX, for x > 0, then Loss Models defines the density of the corresponding “equilibrium distribution” as:239 g(y) = SX(y) / E[X], y > 0. Exercise: Demonstrate that the above is actually a probability density function. [Solution: SX(y) / E[X] ≥ 0. ∞
∫
∞
∫
(SX(y) / E[X]) dy = (1/E[X]) SX(y) dy = E[X]/ E[X] = 1.
0
]
0
Exercise: If severity is Exponential with mean θ = 10, what is the density of the corresponding equilibrium distribution? [Solution: g(y) = SX(y) / E[X] = exp(-y/10)/ 10.] In general, if the severity is Exponential, then the corresponding equilibrium distribution is also Exponential with the same mean. Exercise: If severity is Pareto, with α = 5 and θ = 1000, what is the corresponding equilibrium distribution? [Solution: g(y) = SX(y) / E[X] = (1 + y/1000)-5 / 250. This is the density of another Pareto Distribution, but with α = 4 and θ = 1000.] The distribution function of the corresponding equilibrium distribution is the loss elimination ratio of the severity distribution: y
y
∫
G(y) = (SX(y) / E[X]) dy = (1/E[X]) 0
∫ SX(y) dy = E[X ∧ y]/ E[X] = LER(y).
0
Therefore the survival function of the corresponding equilibrium distribution is the excess ratio of the severity distribution.
239
See Equation 3.20 in Loss Models.
For example, if severity is Pareto, the excess ratio, R(x) = {θ/(θ+x)}α−1, which is the survival function for a Pareto with the same scale parameter and a shape parameter one less. Thus if severity is Pareto, (with α > 1), then the distribution of the corresponding equilibrium distribution is also Pareto, with the same scale parameter and shape parameter of α - 1. The mean of the corresponding equilibrium distribution is:240 ∞
∫
∞
y (SX(y) / E[X]) dy = (1/E[X])
0
∫ y SX(y) dy = E[X2] / {2E[X]}.
0
The second moment of the corresponding equilibrium distribution is: ∞
∞
∫ y2 (SX(y) / E[X]) dy = (1/E[X]) ∫ y2 SX(y) dy = E[X3] / {3E[X]}. 0
0
Exercise: If severity is Pareto, with α = 5 and θ = 1000, what are the mean and variance of the corresponding equilibrium distribution? [Solution: The mean of the Pareto is: 1000/4 = 250. The second moment of the Pareto is: 2(10002 )/{(5-1)(5-2)} = 166,667. The third moment of the Pareto is: 6(10003 )/{(5-1)(5-2)(5-3)} = 250 million. The mean of the corresponding equilibrium distribution is: E[X2 ]/ {2E[X]} = 166,667 / 500 = 333.33. The second moment of the corresponding equilibrium distribution is: E[X3 ] / {3E[X]} = 250 million/ 750 = 333,333. Thus the variance of the corresponding equilibrium distribution is: 333,333 - 333.332 = 222,222. Alternately, the corresponding equilibrium distribution is a Pareto Distribution, but with α = 4 and θ = 1000. This has mean: 1000/3 = 333.33, second moment: 2(10002 )/{(3)(2)} = 333,333, and variance: 333,333 - 333.332 = 222,222.] The hazard rate of the corresponding equilibrium distribution is: density of the corresponding equilibrium distribution S(x) / E[X] S(x) = = = survival function the corresponding equilibrium distribution R(x) E[X] R(x) S(x) = 1 / e(x). expected losses excess of x The hazard rate of the corresponding equilibrium distribution is the inverse of the mean excess loss. 240
See Section 3.4.5 in Loss Models.
Curtate Expectation of Life: The mean excess loss is mathematically equivalent to what is called the complete expectation of life, e° x ⇔ e(x). Exercise: Five individuals live 53.2, 66.3, 70.8, 81.0, and 83.5 years. What is the observed e(60) = e°60 for this group? [Solution: (6.3 + 10.8 + 21 + 23.5)/4 = 15.4.] If instead we ignore any fractions of a year lived, then we get what is called the curtate expectation of life, ex. Exercise: Five individuals live 53.2, 66.3, 70.8, 81.0, and 83.5 years. What is the observed e60 for this group? [Solution: (6 + 10 + 21 + 23)/4 = 15.0.] Since we are ignoring any fractions of a year lived, ex ≤ e° x . On average we are ignoring about 1/2 year of life, therefore, ex ≅ e° x - 1/2. Just as we can write e(x) = e° x in terms of an integral of the Survival Function:241 ∞
e° x = ∫ S(t)dt / S(x) x
one can write the curtate expectation of life in terms of a summation of Survival Functions:242 ex =
∞
∑ S(t) / S(x).
t = x+1 ∞
e0 = ∑ S(t) . t=1
Exercise: Determine ex for an Exponential Distribution with mean θ. [Solution: ex =
∞
∞
∞
t = x+1
t=1
t=1
∑ S(t) / S(x) = ∑ e-(x + t)/ θ / e-x/ θ = ∑ e-t / θ = e−1/θ/(1 - e−1/θ) = 1/(e1/θ - 1).
Comment: ex = 1/(e1/θ - 1) ≅ 1/{1/θ + 1/(2θ2)} = θ/{1 + 1/(2θ)} ≅ θ{1 - 1/(2θ)} = θ - 1/2.] 241 242
See equation 3.5.2 in Actuarial Mathematics, with tp x = S(x+t)/S(x). See equation 3.5.7 in Actuarial Mathematics, with kp x = S(x+k)/S(x).
For example, for θ = 10, ex = 1/(e0.1 - 1) = 9.508. This compares to e° x = θ = 10. Exercise: Determine e0 for a Pareto Distribution with θ = 1 and α = 2. [Solution: e0 = S(1) + S(2) + S(3) + ... = (1/2)2 + (1/3)2 + (1/4)2 + (1/5)2 + ... = π2/6 - 1 = 0.645. Comment: e(0) = E[X] = θ/(α - 1) = 1.]
Problems: 33.1 (2 points) Assume you have a Pareto distribution with α = 5 and θ = $1000. What is the Mean Excess Loss at $2000? A. less than $500 B. at least $500 but less than $600 C. at least $600 but less than $700 D. at least $700 but less than $800 E. at least $800 33.2 (1 point) Assume you have a distribution F(x) = 1 - e-x/666. What is the Mean Excess Loss at $10,000? A. less than $500 B. at least $500 but less than $600 C. at least $600 but less than $700 D. at least $700 but less than $800 E. at least $800 33.3 (3 points) The random variables X and Y have joint density function f(x, y) = 60,000,000 x exp(-10x2 ) / (100 + y)4 , 0 < x < ∞, 0 < y < ∞. Determine the Mean Excess Loss function for the marginal distribution of Y evaluated at Y = 1000. A. less than 200 B. at least 200 but less than 300 C. at least 300 but less than 400 D. at least 400 but less than 500 E. at least 500 33.4 (1 point) Which of the following distributions would be most useful for modeling the age at death of humans? A. Gamma B. Inverse Gaussian C. LogNormal D. Pareto E. Weibull 33.5 (1 point) Given the following empirical mean excess losses for 500 claims: x 0 5 10 15 25 50 100 150 200 250 500 1000 e(x) 15.6 16.7 17.1 17.4 17.6 18.0 18.2 18.3 18.3 18.4 18.5 18.5 Which of the following distributions would be most useful for modeling this data? A. Gamma with α > 1
B. Gamma with α < 1
D. Weibull with τ > 1
E. Weibull with τ < 1
C. Pareto
33.6 (2 points) You have a Pareto distribution with parameters α and θ. If e(1000)/e(100) = 2.5, what is θ? A. 100
B. 200
C. 300
D. 400
E. 500
For the following three questions, assume you have a LogNormal distribution with parameters µ = 11.6, σ = 1.60. 33.7 (3 points) What is the Mean Excess Loss at $100,000? A. less than $500,000 B. at least $500,000 but less than $600,000 C. at least $600,000 but less than $700,000 D. at least $700,000 but less than $800,000 E. at least $800,000 33.8 (1 point) What is the average size of those losses greater than $100,000? A. less than $500,000 B. at least $500,000 but less than $600,000 C. at least $600,000 but less than $700,000 D. at least $700,000 but less than $800,000 E. at least $800,000 33.9 (2 points) What percent of the total loss dollars are represented by those losses greater than $100,000? A. less than 0.91 B. at least 0.91 but less than 0.92 C. at least 0.92 but less than 0.93 D. at least 0.93 but less than 0.94 E. at least 0.94
33.10 (2 points) You observe the following 35 losses: 6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34, 40, 41, 48, 49, 53, 60, 63, 78, 85, 103, 124, 140, 192, 198, 227, 330, 361, 421, 514, 546, 750, 864, 1638. What is the (empirical) Mean Excess Loss at 500? A. less than 350 B. at least 350 but less than 360 C. at least 360 but less than 370 D. at least 370 but less than 380 E. at least 380
Use the following information for the next two questions:
• The annual frequency of ground up losses is Negative Binomial with r = 4 and β = 1.3. • The sizes of ground up losses follow a Pareto Distribution with α = 3 and θ = 5000. • There is a franchise deductible of 1000. 33.11 (2 points) Determine the insurerʼs average payment per nonzero payment. (A) 2500 (B) 3000 (C) 3500 (D) 4000 (E) 4500 33.12 (2 points) Determine the insurer's expected annual payments. (A) 8,000 (B) 9,000 (C) 10,000 (D) 11,000 (E) 12,000 33.13 (3 points) F is a continuous size of loss distribution on (0, ∞). LER(x) is the corresponding loss elimination ratio at x. Which of the following are true? A. F(x) ≥ LER(x) for all x > 0. B. F(x) ≥ LER(x) for all x > c, for some c > 0. C. F(x) ≥ LER(x) for all x > 0, if and only if e(x) ≥ e(0). D. F(x) ≥ LER(x) for all x > 0, if and only F is an Exponential Distribution. E. None of A, B, C or D is true.
33.14 (2 points) The size of loss follows an Exponential Distribution with θ = 5. The largest integer contained in each loss is the amount paid for that loss. For example, a loss of size 3.68 results in a payment of 3. What is the expected payment? A. 4.46 B. 4.48 C. 4.50 D. 4.52 E. 4.54 33.15 (2 points) Losses follow a Single Parameter Pareto distribution with θ = 1000 and α > 1. Determine the ratio of the Mean Excess Loss function at x = 3000 to the Mean Excess Loss function at x = 2000. A. 1 B. 4/3 C. 3/2 D. 2 E. Cannot be determined from the given information. 33.16 (3 points) For a Gamma Distribution with α = 2, what is the behavior of the mean excess loss e(x) as x approaches infinity?
33.17 (4, 5/86, Q.58) (2 points) For a certain machine part, the Mean Excess Loss e(x) varies as follows with the age (x) of the part: Age x e(x) 5 months 12.3 months 10 18.6 20 34.3 50 69.1 Which of the following continuous distributions best fits this pattern of Mean Excess Loss? A. Exponential
B. Gamma (α > 1)
D. Weibull (τ > 1)
E. Normal
C. Pareto
33.18 (160, 5/87, Q.1) (2.1 points) You are given the following survival function: S(x) = (b - x/a)1/2, 0 ≤ x ≤ k. The median age is 75. Determine e(75). (A) 8.3 (B) 12.5 (C) 16.7 (D) 20.0 (E) 33.3 33.19 (4B, 5/92, Q.14) (1 point) Which of the following statements are true about the Mean Excess Loss function e(x)? 1. If e(x) increases linearly as x increases, this suggests that a Pareto model may be appropriate. 2. If e(x) decreases as x increases, this suggests that a Weibull model may be appropriate. 3. If e(x) remains constant as x increases, this suggests that an exponential model may be appropriate. A. 1 only B. 2 only C. 1 and 3 only D. 2 and 3 only E. 1, 2, and 3 33.20 (4B, 5/93, Q.24) (2 points) The underlying distribution function is assumed to be the following: F(x) = 1 - e-x/10, x ≥ 0 Calculate the value of the Mean Excess Loss function e(x), for x = 8. A. less than 7.00 B. at least 7.00 but less than 9.00 C. at least 9.00 but less than 11.00 D. at least 11.00 but less than 13.00 E. at least 13.00 33.21 (4B, 5/94, Q.4) (2 points) You are given the following information from an unknown size of loss distribution for random variable X: Size k ($000s) 1 3 5 7 9 Count of X ≥ k 180 118 75 50 34 Sum of X ≥ k 990 882 713 576 459 If you are using the empirical Mean Excess Loss function to help you select a distributional family for fitting the empirical data, which of the following distributional families should you attempt to fit first? A. Pareto B. Gamma C. Exponential D. Weibull E. Lognormal
33.22 (4B, 5/95, Q.21) (3 points) Losses follow a Pareto distribution, with parameters θ and α > 1. Determine the ratio of the Mean Excess Loss function at x = 2θ to the Mean Excess Loss function at x = θ. A. 1/2 B. 1 C. 3/2 D. 2 E. Cannot be determined from the given information. 33.23 (4B, 11/96, Q.22) (2 points) The random variable X has the density function f(x) = e-x/λ/λ , 0 < x < ∞, λ > 0. Determine e(λ), the Mean Excess Loss function evaluated at λ. B. λ
A. 1
D. λ/e
C. 1/λ
E. e/λ
33.24 (4B, 5/97, Q.13) (1 point) Which of the following statements are true? 1. Empirical Mean Excess Loss functions are continuous. 2. The Mean Excess Loss function of an exponential distribution is constant. 3. If it exists, the Mean Excess Loss function of a Pareto distribution is decreasing. A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 33.25 (4B, 5/98, Q.3) (3 points) The random variables X and Y have joint density function f(x, y) = exp(-2x - y/2) 0 < x < ∞, 0 < y < ∞. Determine the Mean Excess Loss function for the marginal distribution of X evaluated at X = 4. A. 1/4 B. 1/2 C. 1 D. 2 E. 4 33.26 (4B, 11/98, Q.6) (2 points) Loss sizes follow a Pareto distribution, with parameters α = 0.5 and θ = 10,000. Determine the Mean Excess Loss at 10,000. A. 5,000
B. 10,000
C. 20,000
D. 40,000
E. ∞
33.27 (4B, 11/99, Q.25) (2 points) You are given the following: • The random variable X follows a Pareto distribution, as per Loss Models, with parameters θ = 100 and α = 2. • The mean excess loss function, eX(k), is defined to be E[X - k I X ≥ k]. Determine the range of eX(k) over its domain of [0, ∞ ). A. [0, 100]
B. [0, ∞)
C. 100
D. [100, ∞)
E. ∞
33.28 (4B, 11/99, Q.27) (2 points) You are given the following: • The random variable X follows a Pareto distribution, as per Loss Models, with parameters θ = 100 and α = 2 . • The mean excess loss function, eX(k), is defined to be E[X - k I X ≥ k]. Z = min(X, 500). Determine the range of eZ(k) over its domain of [0, 500]. A. [0, 150]
B. [0, ∞)
C. [100, 150]
D. [100, ∞)
E. [150, ∞)
33.29 (SOA3, 11/04, Q.24) (2.5 points) The future lifetime of (0) follows a two-parameter Pareto distribution with θ = 50 and α = 3. Calculate e° 20 . (A) 5
(B) 15
(C) 25
(D) 35
(E) 45
33.30 (CAS3, 5/05, Q.4) (2.5 points) Well-Traveled Insurance Company sells a travel insurance policy that reimburses travelers for any expenses incurred for a planned vacation that is canceled because of airline bankruptcies. Individual claims follow a Pareto distribution with α = 2 and θ = 500. Because of financial difficulties in the airline industry, Well-Traveled imposes a limit of $1,000 on each claim. If a policyholder's planned vacation is canceled due to airline bankruptcies and he or she has incurred more than $1,000 in expenses, what is the expected non-reimbursed amount of the claim? A. Less than $500 B. At least $500, but less than $1,000 C. At least $1,000, but less than $1,500 D. At least $1,500, but less than $2,000 E. $2,000 or more 33.31 (SOA M, 5/05, Q.9 & 2009 Sample Q.162) (2.5 points) A loss, X, follows a 2-parameter Pareto distribution with α = 2 and unspecified parameter θ. You are given: E[X - 100 | X > 100] = (5/3) E[X - 50 | X > 50]. Calculate E[X - 150 | X > 150]. (A) 150 (B) 175 (C) 200 (D) 225
(E) 250
33.32 (CAS3, 11/05, Q.10) (2.5 points) You are given the survival function s(x) as described below:
• s(x) = 1 - x/40 for 0 ≤ x ≤ 40. • s(x) is zero elsewhere. Calculate e° 25, the complete expectation of life at age 25. A. Less than 7.7 B. At least 7.7 , but less than 8.2 C. At least 8.2, but less than 8.7 D. At least 8.7, but less than 9.2 E. At least 9.2 33.33 (CAS3, 5/06, Q.38) (2.5 points) The number of calls arriving at a customer service center follows a Poisson distribution with λ = 100 per hour. The length of each call follows an exponential distribution with an expected length of 4 minutes. There is a $3 charge for the first minute or any fraction thereof and a charge of $1 per minute for each additional minute or fraction thereof. Determine the total expected charges in a single hour. A. Less than $375 B. At least $375, but less than $500 C. At least $500, but less than $625 D. At least $625, but less than $750 E. At least $750
Solutions to Problems: 33.1. D. e(2000) = {mean - E[X ∧ 2000]} / S(2000) = (250 - 246.9) / .00411 = $754 Alternately, for the Pareto, e(x) = (θ + x) / (α - 1) = 3000 / 4 = $750. Alternately, a Pareto truncated and shifted at 2000, is another Pareto with α = 5 and θ = 1000 + 2000 = 3000. e(2000) is the mean of this new Pareto: 3000/(5 - 1) = $750. Alternately, e(2000) =
∞
∞
0
0
∫ tp2000 dt = ∫ S(2000 + t) / S(2000) dt =
∞
∫ (2000 + θ)α / (2000 + θ + t)α dt = (2000 + θ) / (α - 1) = 3000 / 4 = $750. 0
33.2. C. For the exponential distribution the mean excess loss is a constant; it is equal to the mean. The mean in this case is θ = $666. 33.3. E. X and Y are independent since the support doesnʼt depend on x or y and the density can be factored into a product of terms each just involving x and y. f(x, y) = 60,000,000 x exp(-10x2 ) / (100 + y)4 = {20 x exp(-10x2 )} {3000000 / (100 + y)4 }. The former is the density of a Weibull Distribution with θ = 1/ 10 and τ = 2. The latter is the density of a Pareto Distribution with α = 3 and θ = 100. When one integrates from x = 0 to ∞ in order to get the marginal distribution of y, one is left with just a Pareto, since the Weibull integrates to unity and the Pareto is independent of x. Thus the marginal distribution is just a Pareto, with parameters α = 3 and θ = 100. Thus e(y) = (θ + y)/(α - 1) = (100 + y)/(3 - 1). e(1000) = 1100/2 = 550. 33.4. E. Of these distributions, only the Weibull (for τ >1) has mean residual lives decline to zero. The Weibull (for τ >1) has the force of mortality increase as the age approaches infinity, as is observed for humans. The other distributions have the force of mortality decline or approach a positive constant as the age increases. 33.5. B. The empirical mean residual lives seem to be increasing towards a limit of about 18.5 as x approaches infinity. This is the behavior of a Gamma with alpha less than 1. The other distributions given all exhibit different behaviors than this.
2013-4-2,
Loss Distributions, §33 Mean Excess Loss
33.6. E. For the Pareto Distribution: e(x) = (E[X] - E[X
∧
HCM 10/8/12,
Page 532
x])/S(x) = (x + θ)/(α - 1).
e(1000)/e(100) = (1000 + θ)/(100 + θ) = 2.5. ⇒ θ = 500. Comment: One can not determine α from the given information. 33.7. C. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}. E[X ∧ 100000] = exp(12.88)Φ[-1.65] + (100000) {1 - Φ[-.05]} = (392,385)(1 - .9505) + (100000)(.5199) = 71,413. For the LogNormal, E[X] = exp(µ + σ2/2) = exp(12.88) = 392,385. For the LogNormal, F(x) = Φ[{ln(x) − µ} / σ]. F(100000) = Φ[-.05] = (1 - .5199). Therefore, e(100000) = {E[X] - E[X ∧ 100000] }/{1 - F(100000)} = (392,385 - 71,413) / .5199 ≅ $617,000. Alternately, for the LogNormal distribution, e(x) = exp(µ + σ2/2){1 − Φ[(lnx − µ − σ2)/σ] / {1 − Φ[(lnx − µ)/σ]} − x. For µ = 11.6, σ = 1.60, e(100000) = exp(12.88)(1 − Φ[-1.65]) / {1 − Φ[-.05]} - 100000 = (392,385)(.9505)/(.5199) - (100000) = $617 thousand. 33.8. D. The size of those claims greater than $100,000 = $100,000 + e(100000). But from the previous question e(100000) ≅ $617,000. Therefore, the solution ≅ $717,000. 33.9. E. Use the results from the previous two questions. F(100000) = Φ[-.05] = (1 -.5199). Thus, S(100000) = .5199. E[X] = exp(µ + σ2/2) = exp(12.88) = 392,385. Percent of the total loss dollars represented by those losses greater than $100,000 = S(100000)}{size of those claims greater than $100,000}/E[X] = (.5199)(717000)/392,385 = 0.95. Alternately, the losses represented by the small losses are: E[X ∧ 100000] - S(100000)(100000) = 71,413 − 51,990 = 19,423. Divide by the mean of 392,385 and get .049 of the losses are from small claims. Thus the percentage of losses from large claims is: 1 - .049 = 0.95. 33.10. C. Each claim above 500 contributes its excess above 500 and then divide by the number of claims greater than 500. e(500) = {14 + 46+250+364+1138}/ 5 = 362.4.
33.11. D. With a franchise deductible the insurer pays the full value of every large loss and pays nothing for small losses. Therefore, the Pareto Distribution has been truncated from below at 1000. The mean of a distribution truncated and shifted from below at 1000 is e(1000) ⇒ the mean of a distribution truncated from below at 1000 is: e(1000) + 1000. For a Pareto Distribution e(x) = (E[X] - E[X ∧ x])/S(x) = (x + θ)/(α - 1).
e(1000) = (1000 + 5000)/(3 - 1) = 3000. e(1000) + 1000 = 4000.
33.12. E. For the Pareto Distribution, S(1000) = {5000/(5000 + 1000)}^3 = 0.5787. Mean frequency = rβ = (4)(1.3) = 5.2. Expected # of nonzero payments = (0.5787)(5.2) = 3.009. From the previous solution, the average nonzero payment is 4000. Expected annual payments = (3.009)(4000) = 12,036. Alternately, with a franchise deductible of 1000 the payment is 1000 more than that for an ordinary deductible for each large loss, and thus the average payment per loss is: E[X] - E[X ∧ 1000] + 1000 S(1000) = (5000/2) - (5000/2){1 - (5000/6000)^2} + (1000)(5000/6000)^3 = 2315. Expected annual payments = (5.2)(2315) = 12,038.
33.13. C. F(x) - LER(x) = 1 - S(x) - {1 - S(x)e(x)/E[X]} = {S(x)/E[X]}{e(x) - E[X]} = {S(x)/E[X]}{e(x) - e(0)}. Therefore, F(x) ≥ LER(x) ⇔ e(x) ≥ e(0). Alternately, e(x)/e(0) = {E[(X - x)+]/S(x)}/E[X] = {E[(X - x)+]/E[X]}/S(x) = R(x)/S(x). Therefore, e(x) ≥ e(0) ⇔ R(x) ≥ S(x) ⇔ LER(x) = 1 - R(x) ≤ 1 - S(x) = F(x). Comment: For an Exponential Distribution, e(x) = e(0) = θ, and therefore F(x) = LER(x). For a Pareto Distribution with α > 1, e(x) increases linearly, and therefore F(x) > LER(x).
33.14. D. The expected payment is the curtate expectation of life at zero.
e0 = Σ_{t=1}^{∞} S(t) = S(1) + S(2) + S(3) + ... = e^(-1/5) + e^(-2/5) + e^(-3/5) + ... = e^(-1/5)/(1 - e^(-1/5)) = 1/(e^0.2 - 1) = 4.517.
Comment: Approximately 1/2 less than the mean of 5.
33.15. C. e(x) = {E[X] - E[X ∧ x]}/S(x) = (αθ/(α−1) - {αθ/(α−1) − θ^α/((α−1)x^(α−1))}) / (θ/x)^α = x/(α-1).
e(3000)/e(2000) = 3000/2000 = 3/2. Comment: Similar to 4B, 5/95, Q.21.
33.16. The value of the scale parameter θ does not affect the behavior, for simplicity set θ = 1. f(x) = x e^(-x), x > 0.
S(x) = ∫_x^∞ f(t) dt = x e^(-x) + e^(-x).
e(x) = ∫_x^∞ S(t) dt / S(x) = (x e^(-x) + 2e^(-x)) / (x e^(-x) + e^(-x)) = 1 + 1/(1+x).
Thus, as x approaches infinity, e(x) decreases to a constant. Comment: In this case the limit of e(x) is one, while in general it is θ. In general, for α > 1, e(x) decreases to a constant, while h(x) increases to a constant. For α < 1, e(x) increases to a constant, while h(x) decreases to a constant. For α = 1, we have an Exponential, and e(x) and h(x) are each constant. For α = 2 and θ = 1, h(x) = f(x) / S(x) = x e-x / (x e-x + e-x) = x / (x + 1). 33.17. C. The mean residual life increases approximately linearly, which indicates a Pareto. Comment: The Pareto has a mean residual life that increases linearly. The Exponential has a constant mean residual life. For a Gamma with α > 1 the mean residual life decreases towards a horizontal asymptote. For a Weibull with τ > 1 the mean residual life decreases to zero. For a Normal Distribution the mean residual life decreases to zero. 33.18. C. We want S(0) = 1. b = 1. ⇒ We want S(k) = 0. ⇒ k = a. 0.5 = S(75) = (1 - 75/a)1/2. ⇒ a = 100. 100
e(75) = ∫_75^100 S(x) dx / S(75) = ∫_75^100 (1 - x/100)^(1/2) dx / 0.5 = (25/3)/0.5 = 16.7.
33.19. E. 1. T. Mean Residual Life of the Pareto Distribution increases linearly. 2. T. The Weibull Distribution for τ > 1 has the mean residual life decrease (to zero.) 3. T. The mean residual life for the Exponential Distribution is constant. 33.20. C. For the Exponential Distribution, e(x) = mean = θ = 10.
33.21. C. The empirical mean residual life is calculated as: e(k) = ($ excess of k) / (# claims > k) = {($ on claims > k) / (# claims > k)} - k = (average size of those claims of size greater than k) - k.
Size k ($000)    # claims ≥ k    Sum of X ≥ k    Average size of claims > k    e(k)
1                180             990             5.500                         4.500
3                118             882             7.475                         4.475
5                 75             713             9.507                         4.507
7                 50             576             11.520                        4.520
9                 34             459             13.500                        4.500
Since the mean residual life is approximately constant, one would attempt first to fit an exponential distribution, since it has a constant mean residual life. 33.22. C. For the Pareto, e(x) = (x+θ)/(α−1). e(2θ) = 3θ / (α-1). e(θ) = 2θ / (α-1). e(2θ) / e(θ) = 3/2. Comment: If one doesnʼt remember the formula for the mean residual life of the Pareto, it is a longer question. In that case, one can compute: e(x) = (mean - E[X ∧ x]) / (1 -F(x)). 33.23. B. The mean residual life of the Exponential is a constant equal to its mean, here λ. 33.24. A. 1. False. The empirical mean residual life is the ratio of the observed losses excess of the limit divided by the number of observed claims greater than the limit. While the numerator is continuous, the denominator is not. For example, assume you observe 3 claims of sizes 2, 6 and 20. Then e(5.999) = {(20-5.999) + (6-5.999)}/2 = 7.001, while e(6.001) = (20-6.001)/1 = 13.999. The limit of e(x) as x approaches 6 from below is 7, while the limit of e(x) as x approaches 6 from above is 14. Thus the empirical mean residual life is discontinuous at 6. 2. True. 3. False. For the Pareto Distribution, the mean residual life increases (linearly). Comment: A function is continuous at a point x, if and only if the limits approaching x from below and above both exist and are each equal to the value of the function at x. The empirical mean residual life is discontinuous at points at which there are observed claims, since so are the Empirical Distribution Function and the tail probability. In contrast, the empirical Excess Ratio and empirical Limited Expected Value are continuous. The numerator of the Excess Ratio is the observed losses excess of the limit; the denominator is the total observed losses. This numerator is continuous, while this denominator is independent of x. Thus the empirical Excess Ratio is continuous. The numerator of the empirical Limited Expected Value is the observed losses limited by the limit; the denominator is the total number of observed claims. This numerator is continuous, while this denominator is independent of x. Thus the empirical Limited Expected Value is continuous.
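A brief sketch (not from the study guide) of the empirical calculation in 33.21 above, using only the tabulated claim counts and loss totals (in $000):

```python
# Empirical mean residual life e(k) = (sum of claims of size >= k)/(number of such claims) - k.
data = {1: (180, 990), 3: (118, 882), 5: (75, 713), 7: (50, 576), 9: (34, 459)}

for k, (n_claims, total) in data.items():
    e_k = total / n_claims - k
    print(k, round(e_k, 3))   # roughly constant near 4.5, suggesting an Exponential fit
```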
33.25. B. The marginal distribution of X is obtained by integrating with respect to y:
f(x) = ∫_{y=0}^{∞} exp(-2x - y/2) dy = e^(-2x) ∫_{y=0}^{∞} e^(-y/2) dy = e^(-2x) (-2e^(-y/2)) ]_{y=0}^{∞} = 2e^(-2x).
Thus the marginal distribution is an Exponential with a mean of 1/2. It has a mean residual life of 1/2, regardless of x.
33.26. E. The mean excess loss for the Pareto only exists for α > 1. For α ≤ 1 the relevant integral is infinite.
e(x) = ∫_x^∞ t f(t) dt / S(x) - x.
For a Pareto with α = 0.5, ∫_x^∞ t f(t) dt = ∫_x^∞ t (0.5 θ^0.5)(θ + t)^(-1.5) dt.
For large t, the integrand is proportional to t t^(-1.5) = t^(-0.5), whose integral approaches infinity as the upper limit of the integral approaches infinity. (The integral of t^(-0.5) is 2t^(0.5).)
Alternately, e(x) = (E[X] - E[X ∧ x]) / S(x). The limited expected value E[X ∧ x] is finite (it is less than x), as is S(x). However, for α ≤ 1, the mean E[X] (does not exist or) is infinite. Therefore, so is the mean excess loss.
Comment: While choice E is the best of those available, in my opinion a better answer might have been that the mean excess loss does not exist.
33.27. D. For the Pareto Distribution e(k) = (k+θ)/(α-1) = k + 100. Therefore, as k goes from zero to infinity, e(k) goes from 100 to infinity.
33.28. A. Z is for data censored at 500, corresponding to a maximum covered loss of 500. eZ(k) = (dollars of loss excess of k) / S(k) = (E[X ∧ 500] - E[X ∧ k])/ S(k). E[X ∧ x] = {θ/(α−1)}{1−(θ/(θ+x))α−1}, for the Pareto. Thus (E[X ∧ 500] - E[X ∧ k])/ S(k) = {100/(100+k) - 100/600} {(100+k)/100}2 = (100 + k)(600 - (100 + k))/600 = (100 + k)(500 - k)/600. eZ(0) = 83.33. eZ(500) = 0. Setting the derivative equal to zero: (400-2k)/ 600 = 0. k = 200. eZ(200) = 150. Thus the maximum over the interval is 150, while the minimum is 0. Therefore, as k goes from zero to 500, eZ(k) is in the interval [0, 150]. 33.29. D. E[X] = θ/(α-1) = 50/(3-1) = 25. E[X
∧
20] = {θ/(α-1)}{1 - (θ/(θ+x))α−1} = (25){1 - (50/(50 + 20))2 } = 12.245.
S(20) = (50/(50 + 20))3 = 0.3644. e(20) = (E[X] - E[X ∧ 20])/S(20) = (25 - 12.245)/.3644 = 35. Alternately, for the Pareto, e(x) = (x + θ)/(α - 1). e(20) = (20 + 50)/(3 - 1) = 35. 33.30. D. Given someone incurred more than $1,000 in expenses, the expected non-reimbursed amount of the claim is the mean residual life at e(1000). For the Pareto, e(x) = (x + θ)/(α - 1). e(1000) = (1000 + 500)/(2 - 1) = 1500. Alternately, (E[X] - E[X
∧
1000])/S(1000) = {500 - 500(1 - 500/1500)}/(500/1500)2 = 1500.
Alternately, a Pareto truncated and shifted from below is another Pareto, with parameters α and θ + d. Therefore, the unreimbursed amounts follow a Pareto Distribution with parameters α = 2 and θ = 500 + 1000 = 1500, with mean 1500/(2 - 1) = 1500. 33.31. B. e(d) = E[X - d | X > d] = (E[X] - E[X
∧
d])/S(d) = {θ - θ(1 - θ/(θ + d))}/{θ/(θ + d)}2 = θ + d.
The given equation states e(100) = (5/3)e(50). ⇒ 100 + θ = (5/3)(50 + θ). ⇒ θ = 25. E[X - 150 | X > 150] = e(150) = 150 + 25 = 175. Comment: A Pareto truncated and shifted from below is another Pareto, with parameters α and θ + d. ⇒ e(x) = (x + θ)/(α - 1).
33.32. A. e(25) = ∫_25^40 S(x) dx / S(25) = ∫_25^40 (1 - x/40) dx / (1 - 25/40) = 2.8125 / 0.375 = 7.5.
Alternately, the given survival function is a uniform distribution on 0 to 40. At age 25, the future lifetime is uniform from 0 to 15, with an average of 7.5.
Comment: DeMoivreʼs Law with ω = 40.
33.33. D. The charge per call of length t is: 3 + 1(if t > 1) + 1(if t > 2) + 1(if t > 3) + 1(if t > 4) + ...
The expected charge per call is: 3 + S(1) + S(2) + S(3) + ... = 3 + e^(-1/4) + e^(-2/4) + e^(-3/4) + ... = 3 + e^(-1/4)/(1 - e^(-1/4)) = 6.521. (100)(6.521) = 652.1.
Comment: Ignore the possibility that a call lasts exactly an integer, since the Exponential is a continuous distribution. Then the cost of a call is: 3 + curtate lifetime of a call. For example, if the call lasted 4.6 minutes, the cost is: 3 + 4 = 7. The expected cost of a call is: 3 + curtate expected lifetime of a call.
e0 = Σ_{t=1}^{∞} S(t) = Σ_{t=1}^{∞} e^(-t/4) = e^(-1/4)/(1 - e^(-1/4)) = 1/(e^(1/4) - 1) = 3.521. (100)(3 + 3.521) = 652.1.
e0 ≅ e(0) - 1/2 = E[X] - 1/2 = 4 - 1/2 = 3.5. (100)(3 + 3.5) = 650, close to the exact answer.
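As an illustrative check of 33.33 (assuming, as in the problem, an Exponential call length with mean 4 minutes and a charge of 3 plus 1 per completed minute), the series can be summed directly:

```python
# Check of 33.33: expected charge per call = 3 + sum over t >= 1 of S(t), with S(t) = exp(-t/4).
import math

expected_charge = 3 + sum(math.exp(-t / 4) for t in range(1, 2000))  # series converges quickly
print(round(100 * expected_charge, 1))                                # about 652.1 for 100 calls

# Closed form: 3 + exp(-1/4)/(1 - exp(-1/4)) = 3 + 1/(exp(1/4) - 1)
print(round(100 * (3 + 1 / (math.exp(0.25) - 1)), 1))
```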
Section 34, Hazard Rate
The hazard rate, force of mortality, or failure rate, is defined as: h(x) = f(x) / S(x), x ≥ 0.
h(x) can be thought of as the failure rate of machine parts. The hazard rate can also be interpreted as the force of mortality = probability of death / chance of being alive.243 For a given age x, h(x) is the density of the deaths, divided by the number of people still alive.
Exercise: F(x) = 1 - e^(-x/10). What is the hazard rate?
[Solution: h(x) = f(x)/S(x) = (e^(-x/10)/10)/e^(-x/10) = 1/10.]
The hazard rate determines the survival (distribution) function and vice versa.
d ln(S(x))/dx = {dS(x)/dx} / S(x) = -f(x)/S(x) = -h(x). Thus h(x) = -d ln(S(x))/dx.
S(x) = exp[-∫_0^x h(t) dt].244
S(x) = exp[-H(x)], where H(x) = ∫_0^x h(t) dt.245
Note that h(x) = f(x)/S(x) ≥ 0. S(∞) = exp[-H(∞)] = 0 ⇔ H(∞) = ∫_0^∞ h(t) dt = ∞.
If h(x) ≥ 0, then H(x) is nondecreasing and therefore, S(x) = exp[-H(x)] is nonincreasing. H(x) usually increases, while S(x) decreases, although H(x) and S(x) can be constant on an interval. Since H(0) = 0, S(0) = exp[-0] = 1. A function h(x) defined for x > 0 is a legitimate hazard rate, in other words it corresponds to a legitimate survival function, if and only if h(x) ≥ 0 and the integral of h(x) from 0 to infinity is infinite.
243 This is equation 3.2.13 in Actuarial Mathematics by Bowers et. al.
244 The lower limit of the integral should be the lower end of the support of the distribution.
245 H is called the cumulative hazard rate. See “Mahlerʼs Guide to Survival Analysis.”
As in Life Contingencies, one can write the distribution function and the density function in terms of the force of mortality h(t):246
F(x) = 1 - exp[-∫_0^x h(t) dt].
f(x) = h(x) exp[-∫_0^x h(t) dt].
Exercise: h(x) = 1/10. What is the distribution function?
[Solution: F(x) = 1 - e^(-x/10), an Exponential Distribution with θ = 10.]
h constant ⇔ the Exponential Distribution, with constant hazard rate of 1/θ = 1/mean. The Exponential is the only continuous distribution with a constant hazard rate, and therefore constant mean excess loss.
The Force of Mortality for various distributions is given below:
Distribution               Force of Mortality or Hazard Rate      Behavior as x approaches ∞
Exponential                1/θ                                    h(x) constant
Weibull                    τ x^(τ-1) / θ^τ                        τ < 1: h(x) → 0.  τ > 1: h(x) → ∞.
Pareto                     α / (θ + x)                            h(x) → 0, as x → ∞
Burr247                    α γ x^(γ-1) / (θ^γ + x^γ)              h(x) → 0, as x → ∞
Single Parameter Pareto    α/x                                    h(x) → 0, as x → ∞
Gompertzʼs Law248          B c^x                                  h(x) → ∞, as x → ∞
Makehamʼs Law249           A + B c^x                              h(x) → ∞, as x → ∞
246 See equation 3.2.14 in Actuarial Mathematics. np_x = chance of living n years for those who have reached age x = {1 - F(x+n)} / {1 - F(x)} = S(x+n) / S(x) = exp(- integral from x to x+n of µt).
247 The Loglogistic is a special case of the Burr with α = 1.
248 As per Life Contingencies. See Actuarial Mathematics Section 3.7.
Exercise: h(x) = 3/(10 + x). What is the distribution function?
[Solution: F(x) = 1 - exp[-∫_0^x 3/(10 + t) dt] = 1 - exp[-3{ln(10 + x) - ln(10)}] = 1 - {10/(10 + x)}^3.
A Pareto Distribution with α = 3 and θ = 10.]
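A minimal numerical sketch (mine, not part of the study guide) of the exercise above: integrating the given hazard rate and exponentiating recovers the Pareto distribution function, as claimed.

```python
# Recover F(x) = 1 - exp(-H(x)) from the hazard rate h(x) = 3/(10 + x), and compare to the Pareto.
import math
from scipy.integrate import quad

h = lambda t: 3.0 / (10.0 + t)

def F_from_hazard(x):
    H, _ = quad(h, 0.0, x)      # cumulative hazard H(x) = integral of h from 0 to x
    return 1.0 - math.exp(-H)

for x in (5.0, 20.0, 100.0):
    pareto_F = 1.0 - (10.0 / (10.0 + x)) ** 3   # Pareto with alpha = 3, theta = 10
    print(x, round(F_from_hazard(x), 6), round(pareto_F, 6))   # the two columns agree
```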
Relationship to Other Items of Interest:
One can obtain the Mean Excess Loss from the Hazard Rate:
S(t) / S(x) = exp(-∫_{s=0}^{t} h(s) ds) / exp(-∫_{s=0}^{x} h(s) ds) = exp(∫_{s=0}^{x} h(s) ds - ∫_{s=0}^{t} h(s) ds) = exp(∫_{s=t}^{x} h(s) ds).
e(x) = ∫_x^∞ S(t) dt / S(x) = ∫_{t=x}^{∞} S(t)/S(x) dt = ∫_{t=x}^{∞} exp(∫_{s=t}^{x} h(s) ds) dt.
e(x) = ∫_{t=x}^{∞} exp(H(x) - H(t)) dt = exp[H(x)] ∫_x^∞ exp(-H(t)) dt, where H(x) = ∫_0^x h(t) dt.250
Exercise: Given a hazard rate of h(x) = 4 / (100+x), what is the mean excess loss, e(x)?
[Solution: H(x) = ∫_0^x h(t) dt = 4 ln(100+x).
e(x) = (100+x)^4 ∫_{t=x}^{∞} 1/(100+t)^4 dt = -(100+x)^4 / {3(100+t)^3} ]_{t=x}^{∞} = (100+x)/3.
Comment: This is a Pareto Distribution with α = 4 and θ = 100; e(x) = (θ+x) / (α−1), h(x) = α / (θ + x).]
250 H is called the cumulative hazard rate and is used in Survival Analysis. S(x) = Exp[-H(x)].
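A short numerical sketch (my own check, using the hazard rate from the exercise above) of e(x) = exp[H(x)] ∫_x^∞ exp[-H(t)] dt:

```python
# Mean excess loss from the hazard rate h(t) = 4/(100 + t): e(x) = exp(H(x)) * integral_x^inf exp(-H(t)) dt.
import math
from scipy.integrate import quad

H = lambda x: 4.0 * math.log(100.0 + x)     # a convenient antiderivative of h(t) = 4/(100 + t)

def e_from_hazard(x):
    tail, _ = quad(lambda t: math.exp(-H(t)), x, float("inf"))
    return math.exp(H(x)) * tail

for x in (0.0, 50.0, 200.0):
    print(x, round(e_from_hazard(x), 3), round((100.0 + x) / 3.0, 3))   # matches (100 + x)/3
```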
One can obtain the Hazard Rate from the Mean Excess Loss as follows:
e(x) = {∫_x^∞ S(t) dt} / S(x).
Thus e′(x) = {-S^2(x) + f(x) ∫_x^∞ S(t) dt} / S^2(x) = -1 + f(x) e(x) / S(x) = -1 + e(x) h(x).
Thus h(x) = {1 + e′(x)} / e(x).
Exercise: Given a mean excess loss of e(x) = (100+x) / 3, what is the hazard rate, h(x)?
[Solution: e′(x) = 1/3. h(x) = {1 + e′(x)} / e(x) = (4/3){3/(100+x)} = 4/(100+x).
Comment: This is a Pareto Distribution with α = 4 and θ = 100. It has e(x) = (θ+x) / (α−1) and h(x) = α / (θ + x).]
Finally, one can obtain the Survival Function from the Mean Excess Loss as follows:
h(x) = {1 + e′(x)} / e(x).
H(x) = ∫_0^x h(t) dt = ∫_0^x {1/e(t) + e′(t)/e(t)} dt = ∫_0^x 1/e(t) dt + ln(e(x)/e(0)).
S(x) = exp[-H(x)] = {e(0)/e(x)} exp[-∫_0^x 1/e(t) dt].
For example, for the Pareto, e(x) = (θ + x) / (α - 1).
∫_0^x 1/e(t) dt = (α - 1) ∫_0^x 1/(θ + t) dt = (α - 1) ln((θ + x)/θ).
S(x) = [{θ/(α - 1)} / {(θ + x)/(α - 1)}] {(θ + x)/θ}^(-(α - 1)) = {θ/(θ + x)}^α.
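A sketch (mine, using the Pareto example just derived) of recovering the survival function from the mean excess loss via S(x) = {e(0)/e(x)} exp[-∫_0^x dt/e(t)]:

```python
# Recover S(x) from e(x) = (theta + x)/(alpha - 1) and compare to the Pareto survival function.
import math
from scipy.integrate import quad

alpha, theta = 4.0, 100.0
e = lambda x: (theta + x) / (alpha - 1.0)           # mean excess loss of the Pareto

def S_from_mrl(x):
    integral, _ = quad(lambda t: 1.0 / e(t), 0.0, x)
    return (e(0.0) / e(x)) * math.exp(-integral)

for x in (10.0, 100.0, 500.0):
    print(x, round(S_from_mrl(x), 6), round((theta / (theta + x)) ** alpha, 6))   # agree
```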
Tail Behavior of the Hazard Rate:
As was derived above, the limit as x approaches infinity of e(x) is equal to the limit as x approaches infinity of S(x) / f(x) = 1/h(x).
lim_{x→∞} e(x) = lim_{x→∞} 1/h(x).
Thus an increasing mean excess loss, e(x), is equivalent to a decreasing hazard or failure rate, h(x), and vice versa. Since the force of mortality for the Pareto, α / (θ+x), decreases with age, the Mean Excess Loss increases with age.251 The quicker the hazard rate declines, the faster the Mean Excess Loss increases and the heavier the tail of the distribution. For the Weibull, if τ > 1 then the hazard rate, τ x^(τ−1)/θ^τ, increases and thus the Mean Excess Loss decreases. For the Weibull with τ < 1, the hazard rate decreases and thus the Mean Excess Loss increases.
Lighter Tail:   e(x) decreases,   h(x) increases.
Heavier Tail:   e(x) increases,   h(x) decreases.
251 Unlike the situation for mortality of humans. For Gompertzʼs or Makehamʼs Law with B > 0 and c > 0, the force of mortality increases with age, so the Mean Excess Loss decreases with age. For the Pareto, if α ≤ 1, then the force of mortality is sufficiently small so that there exists no mean; for α ≤ 1 the mean lifetime is infinite.
Problems: 34.1 (1 point) You are given the following three loss distributions. 1. Gamma α = 1.5, θ = 0.5 2. LogNormal µ = 0.1, σ = 0.6 3. Weibull θ = 1.4, τ = 0.8 For which of these distributions does the hazard rate increase? A. 1 B. 2 C. 3 D. 1,2,3 E. None of A, B, C, or D
Use the following information for the next 5 questions: e(x) = 72 - 0.8x, 0 < x < 90.
34.2 (2 points) What is the force of mortality at 50?
A. less than 0.003
B. at least 0.003 but less than 0.004
C. at least 0.004 but less than 0.005
D. at least 0.005 but less than 0.006
E. at least 0.006
34.3 (3 points) What is the Survival Function at 60?
A. 72%  B. 74%  C. 76%  D. 78%  E. 80%
34.4 (2 points) What is 50p30?
A. 62%  B. 64%  C. 66%  D. 68%  E. 70%
34.5 (1 point) What is the mean lifetime?
A. 72  B. 74  C. 76  D. 78  E. 80
34.6 (2 points) What is the probability density function at 40?
A. less than 0.002
B. at least 0.002 but less than 0.003
C. at least 0.003 but less than 0.004
D. at least 0.004 but less than 0.005
E. at least 0.005
34.7 (2 points) For a LogNormal distribution with parameters µ = 11.6, σ = 1.60, what is the hazard rate at $100,000? A. less than 4 x 10-6 B. at least 4 x 10-6 but less than 5 x 10-6 C. at least 5 x 10-6 but less than 6 x 10-6 D. at least 6 x 10-6 but less than 7 x 10-6 E. at least 7 x 10-6 34.8 (2 points) The hazard rate h(x) = 0.002 + 1.1x / 10,000, x > 0. What is S(50)? (A) 0.76 (B) 0.78 (C) 0.80 (D) 0.82 (E) 0.84 34.9 (1 point) If the hazard rate of a certain machine part is a constant 0.10 for t > 0, what is the Mean Excess Loss at age 25? A. less than 10 B. at least 10 but less than 15 C. at least 15 but less than 20 D. at least 20 but less than 25 E. at least 25 34.10 (2 points) Losses follow a Weibull Distribution with θ = 25 and τ = 1.7. What is the hazard rate at 100? A. less than 0.05 B. at least 0.05 but less than 0.10 C. at least 0.10 but less than 0.15 D. at least 0.15 but less than 0.20 E. at least 0.20 34.11 (2 points) For a loss distribution where x ≥ 10, you are given: i) The hazard rate function: h(x) = z/x, for x ≥ 10. ii) A value of the survival function: S(20) = .015625. Calculate z. A. 2 B. 3 C. 4 D. 5 E. 6 34.12 (2 points) For a loss distribution where x ≥ 0, you are given: i) The hazard rate function: h(x) = z x2 , for x ≥ 0. ii) A value of the distribution function: F(5) = 0.1175. Calculate z. A. 0.002 B. 0.003 C. 0.004 D. 0.005
E. 0.006
Use the following information for the next four questions: Ground up losses follow a Weibull Distribution with τ = 2 and θ = 10. 34.13 (3 points) There is an ordinary deductible of 5. What is the hazard rate of the per loss variable? 34.14 (3 points) There is an ordinary deductible of 5. What is the hazard rate of the per payment variable? 34.15 (3 points) There is a franchise deductible of 5. What is the hazard rate of the per loss variable? 34.16 (3 points) There is a franchise deductible of 5. What is the hazard rate of the per payment variable? 34.17 (3 points) X follows a Gamma Distribution with parameters α = 3 and θ. Determine the form of the hazard rate h(x). What is the behavior of h(x) as x approaches infinity? 34.18 (2 points) The hazard rate h(x) = 4/(100 + x), x > 0. What is S(50)? (A) 0.18
(B) 0.20
(C) 0.22
(D) 0.24
(E) 0.26
34.19 (2 points) Determine the hazard rate at 300 for a Loglogistic Distribution with γ = 2 and θ = 100. (A) 0.005
(B) 0.006
(C) 0.007
(D) 0.008
(E) 0.009
34.20 (1 point) F(x) is a Pareto Distribution. If the hazard rate h(x) is doubled for all x, what is the new distribution function? 34.21 (2 points) You are using a Weibull Distribution to model the length of time workers remain unemployed. Briefly discuss the implications of different values of the parameter τ.
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 547
34.22 (2 points) Robots can fail due to two independent decrements: Internal and External. (Internal includes normal wear and tear. External includes accidents.) Assuming no external events, a robotʼs time until failure is given by a Pareto DIstribution with α = 2 and θ = 10. Assuming no internal events, a robotʼs time until failure is given by a Pareto DIstribution with α = 4 and θ = 10. At time = 5, what is the hazard rate of the robotʼs time until failure? A. 0.35 B. 0.40 C. 0.45 D. 0.50 E. 0.55 34.23 (1 point) F(x) is a Weibull Distribution. If the hazard rate h(x) is doubled for all x, what is the new distribution function? 34.24 (160, 11/87, Q.7) (2.1 points) Which of the following are true for all values of x > 0? S(x) - S(x +1) I. For every exponential survival model h(x) = . x+1
∫
S(t)dt
x
II. For every survival model f(x) ≤ h(x). III. For every survival model f(x) ≤ f(x + 1). (A) I and II only (B) I and III only (C) II and III only (D) I, II and III (E) The correct answer is not given by (A), (B), (C), or (D). 34.25 (160, 11/87, Q.8) (2.1 points) The force of mortality for a survival distribution is given by: 1 h(x) = , 0 < x < 100. Determine e(64). 2 (100 - x) (A) 16
(B)18
(C) 20
(D) 22
(E) 24
34.26 (160, 11/87, Q.15) (2.1 points) For a Weibull distribution as per Loss Models, the hazard rate at the median age is 0.05. Determine the median age. (A) τ ln(2)
(B) τ ln(20)
(C) 20τ ln(2)
(D) 2ln(τ)
(E) 2τ ln(20)
34.27 (160, 11/88, Q.2) (2.1 points) A survival model is represented by the following probability density function: f(t) = (0.1)(25 - t)-1/2; 0 ≤ t ≤ 25. Calculate the hazard rate at 20. (A) 0.05 (B) 0.10 (C) 0.15 (D) 0.20 (E) 0.25
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 548
34.28 (160, 11/89, Q.1) (2.1 points) For a survival model, you are given: (i) The hazard rate is h(t) = 2/(w - t), 0 ≤ t < w. (ii) T is the random variable denoting time of failure. Calculate Var(T). (A) w2 /18
(B) w2 /12
(C) w2 /9
(D) w2 /6
(E) w2 /3
34.29 (160, 11/89, Q.2) (2.1 points) S(x) = 0.1(100 - x)1/2, 0 ≤ x ≤ 100. Calculate the hazard rate at 84. (A) 1/32 (B) 1/24 (C) 1/16 (D) 1/8 (E) 1/4 34.30 (160, 5/90, Q.4) (2.1 points) You are given that y is the median age for the survival function S(x) = 1 - (x/100)2 , 0 ≤ x ≤ 100. Calculate the hazard rate at y. (A) 0.013 (B) 0.014 (C) 0.025 (D) 0.026 (E) 0.028 34.31 (Course 160 Sample Exam #1, 1996, Q.2) (1.9 points) X has a uniform distribution from 0 to 10. Y = 4X2 . Calculate the hazard rate of Y at 4. (A) 0.007 (B) 0.014 (C) 0.021 (D) 0.059
(E) 0.111
34.32 (Course 160 Sample Exam #2, 1996, Q.2) (1.9 points) You are given: 1 (i) A survival model has a hazard rate h(x) = , 0 ≤ x ≤ ω. 3 (ω - x) (ii) The median age is 63. Calculate the mean residual life at 63, e(63). (A) 4.5 (B) 6.8 (C) 7.9 (D) 9.0
(E) 13.5
34.33 (Course 160 Sample Exam #3, 1997, Q.1) (1.9 points) You are given: (i) For a Weibull distribution with parameters θ and τ, the median age is 22. (ii) At the median age, the value of the Hazard Rate Function is 1.26. Calculate τ. (A) 37
(B) 38
(C) 39
(D) 40
(E) 41
34.34 (Course 160 Sample Exam #1, 1999, Q.19) (1.9 points) Losses follow a Loglogistic Distribution, with parameters γ = 3 and θ = 0.1984. For what value of x is the hazard rate, h(x), a maximum? (A) 0.18 (B) 0.20 (C) 0.22 (D) 0.25 (E) 0.28
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 549
34.35 (Course 160 Sample Exam #1, 1999, Q.20) (1.9 points) A sample of 10 batteries in continuous use is observed until all batteries fail. You are given: (i) The times to failure (in hours) are 14.1, 21.3, 23.2, 26.2, 29.8, 31.3, 35.7, 39.4, 39.2, 45.3. (ii) The composite hazard rate function for these batteries is defined by h(t) = λ, 0 ≤ t < 27.9, h(t) = λ + β(t - 27.9)2 , t ≥ 27.9. (iii) S(15) = 0.7634, S(30) = 0.5788. Calculate the absolute difference between the cumulative hazard rate at 34, H(34), based on the assumed hazard rate, and the cumulative hazard rate at 34, HO(34), based on the observed data. (A) 0.03 (B) 0.06 (C) 0.08 D) 0.11 (E) 0.14 34.36 (CAS3, 11/03, Q.19) (2.5 points) For a loss distribution where x ≥ 2, you are given: i) The hazard rate function: h(x) = z2 / (2x), for x ≥ 2. ii) A value of the distribution function: F(5) = 0.84. Calculate z. A. 2 B. 3 C. 4 D. 5 E. 6 34.37 (CAS3, 11/04, Q.7) (2.5 points) Which of the following formulas could serve as a force of mortality? 1. µx = BCx,
B > 0, C > 1
2. µx = a(b+x)-1,
a > 0, b > 0
3. µx = (1+x)-3,
x≥0
A. 1 only
B. 2 only
C. 3 only
D. 1 and 2 only
E. 1 and 3 only
34.38 (CAS3, 11/04, Q.27) (2.5 points) You are given:
• X has density f(x), where f(x) = 500,000 / x3 , for x > 500 (single-parameter Pareto with α = 2). • Y has density g(y), where g(y) = y e-y/500 / 250,000 (gamma with α = 2 and θ = 500). Which of the following are true? 1. X has an increasing mean residual life function. 2. Y has an increasing hazard rate. 3. X has a heavier tail than Y based on the hazard rate test. A. 1 only. B. 2 only. C. 3 only. D. 2 and 3 only. Note: I have rewritten this exam question.
E. All of 1, 2, and 3.
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 550
34.39 (CAS3, 5/05, Q.30) (2.5 points) Acme Products will offer a warranty on their products for x years, where x is the largest integer for which there is no more than a 1% probability of product failure. Acme introduces a new product with a hazard function for failure at time t of 0.002t. Calculate the length of the warranty that Acme will offer on this new product. A. Less than 3 years B. 3 years C. 4 years D. 5 years E. 6 or more years 34.40 (CAS3, 11/05, Q.11) (2.5 points) Individuals with Flapping Gum Disease are known to have a constant force of mortality µ. Historically, 10% will die within 20 years. A new, more serious strain of the disease has surfaced with a constant force of mortality equal to 2µ. Calculate the probability of death in the next 20 years for an individual with this new strain. A. 17% B. 18% C. 19% D. 20% E. 21% 34.41 (SOA M, 11/05, Q.13) (2.5 points) The actuarial department for the SharpPoint Corporation models the lifetime of pencil sharpeners from purchase using a generalized DeMoivre model with s(x) = (1 - x/ω)α, for α > 0 and 0 < x ≤ ω. A senior actuary examining mortality tables for pencil sharpeners has determined that the original value of α must change. You are given: (i) The new complete expectation of life at purchase is half what it was previously. (ii) The new force of mortality for pencil sharpeners is 2.25 times the previous force of mortality for all durations. (iii) ω remains the same. Calculate the original value of α. (A) 1
(B) 2
(C) 3
(D) 4
(E) 5
34.42 (CAS3, 5/06, Q.10) (2.5 points) The force of mortality is given as: µ(x) = 2 / (110 - x), for 0 ≤ x < 110. Calculate the expected future lifetime for a life aged 30. A. Less than 20 B. At least 20, but less than 30 C. At least 30, but less than 40 D. At least 40, but less than 50 E. At least 50
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 551
34.43 (CAS3, 5/06, Q.11) (2.5 points) Eastern Digital uses a single machine to manufacture digital widgets. The machine was purchased 10 years ago and will be used continuously until it fails. The failure rate of the machine, u(x), is defined as: u(x) = x2 / 4000, for x ≤ 4000 , where x is the number of years since purchase. Calculate the probability that the machine will fail between years 12 and 14, given that the machine has not failed during the first 10 years. A. Less than 1.5% B. At least 1.5%, but less than 3.5% C. At least 3.5%, but less than 5.5% D. At least 5.5%, but less than 7.5% E. At least 7.5% 34.44 (CAS3, 5/06, Q.16) (2.5 points) The force of mortality is given as: µ(x) = 1 / (100 - x), for 0 ≤ x < 100. Calculate the probability that exactly one of the lives (40) and (50) will survive 10 years. A. 9/30 B. 10/30 C. 19/30 D. 20/30 E. 29/30
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 552
Solutions to Problems: 34.1. A. For a Gamma with α > 1, hazard rate increases (toward a horizontal asymptote given by an exponential.) For α >1 the Gamma is lighter-tailed than an Exponential. For a LogNormal the hazard rate decreases. For a Weibull with τ < 1, the hazard rate decreases; for τ <1 the Weibull is heavier-tailed than an Exponential. Alternately, the hazard rate increases if and only if the mean excess loss decreases. For a Gamma with α > 1, mean excess loss decreases (toward a horizontal asymptote given by an exponential.) For a LogNormal mean excess loss increases. For a Weibull with τ < 1, mean excess loss increases. 34.2. E. h(x) = {1 + e′(x)} / e(x) = (1 - .8)/(72 - .8x) = 1/(360 - 4x). h(50) = 0.00625. x
∫
x
t=x
∫
34.3. C. S(x) = exp[- λ(t) dt] = exp[- 1/(360 - 4t) dt] = exp[ln(360 - 4t)/4] 0
0
]= t=0
exp[ln(360 - 4x)/4 - ln(360)/4] = exp[(1/4)ln(1 - x/90)] = (1 - x/90)1/4. S(60) = 0.760. 34.4. B. 50p 30 = probability that a life aged 30 lives at least 50 more years = S(80)/S(30) = (1/9)1/4 / (2/3)1/4 = 0.639. 34.5. A. mean lifetime = e(0) = 72. Alternately, 90
∫
E[X] = S(t) dt = 0
90
90
∫ (1 - x/90)1/4 dt = -72(1 - x/90)5/4 ] = 72. 0
0
34.6. D. f(x) = -dS(x)/dx = (1 - x/90)-3/4 /360. f(40) = 0.0043. 34.7. B. For the LogNormal, F(x) = Φ[{ln(x) − µ} / σ]. F(100000) = Φ[-.0544] = (1 - .5217). Therefore, S(100,000) = .5217. f(x) = exp[-.5 ({ln(x)−µ} /σ)2] /{xσ 2 π }. f(100,000) = exp[-.5({ln(100000)-11.6}/1.6)2 )]/{ 160000 2 π } = 2.490 x 10-6. h(100000) = f(100000) / S(100000) = 2.490 x 10-6 / .5217 = 4.77 x 10- 6.
2013-4-2,
Loss Distributions, §34 Hazard Rate
34.8. C.
x
∫
HCM 10/8/12,
Page 553
x
∫
S(x) = exp(- λ(t) dt) = exp(- .002 + 1.1t/10000 dt) = exp(-.002x + (1 - 1.1x)/(10000 ln(1.1) ). 0
0
S(50) = exp(-.1 - .1221) = 0.80. Comment: This is an example of Makehamʼs Law of mortality. 34.9. B. A constant rate of hazard implies an Exponential Distribution, with θ = 1 / the hazard rate. The mean excess loss is θ at all ages. Thus the mean excess loss at age 25 (or any other age) is: 1 / 0.10 = 10. Comment: Note, one can write down the equation: hazard rate = chance of failure / probability of still working = F´(x) / {1-F(x)} = S´(x) / S(x) = .10 and solve the resulting differential equation: S´(x) = .1 S(x), for S(x) = e-.1x or F(x) = 1 - e-.1x. 34.10. D. h(x) = f(x) / S(x) = {τ(x/θ)τ exp(-(x/θ)τ) /x}/ exp(-(x/θ)τ) = τxτ−1/θτ. h(100) = (1.7)(100.7)/(251.7) = 0.179. 34.11. E.
x
∫
S(x) = exp(- h(t) dt) = exp[-z{ln(x) - ln(10)}] = (10/x)z, for x ≥ 10. 10
0.015625 = S(20) = (1/2)z. ⇒ z = 6. Comment: S(x) = (10/x)6 , for x ≥ 10. A Single Parameter Pareto, with α = 6 and θ = 10. Similar to CAS3, 11/03, Q.19. 34.12. B.
x
∫
S(x) = exp(- h(t) dt) = exp[-z x3 /3] , for x ≥ 0. 0
0.8825 = S(5) = exp[-z 53 /3]. ⇒ z = -ln(.8825) 3/125 = 0.0030. Comment: S(x) = exp[-(x/10)3 ], for x ≥ 0. A Weibull Distribution, with θ = 10 and τ = 3.
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 554
34.13. For the Weibull, f(x) = τ(x/θ)τ exp(-(x/θ)τ) / x = x exp(-(x/10)2 )/ 50. S(x) = exp(-(x/θ)τ) = exp(-(x/10)2 ). h(x) = f(x)/S(x) = x/50. The per loss variable Y is 0 for x ≤ 5, and is X - 5 for x > 5. Y has a point mass of F(5) at 0. Thus fY(y) is undefined at zero. fY(y) = fX(y+5) for y > 0. S Y(0) = SX(5). SY(y) = hY(y) undefined at zero.
SX(y+5) for y > 0. hY(y) = fY(y)/SY(y) = fX(y+5)/SX(y+5) = hX(y+5) = (y+5)/50 for y > 0.
Comment: Similar to Example 8.1 in Loss Models. Loss Models uses the notation YP for the per payment variable and YL for the per loss variable. 34.14. The per payment variable Y is undefined for x ≤ 5, and is X - 5 for x > 5. fY(y) = fX(y+5)/SX(5) for y > 0. SY(y) = SX(y+5)/SX(5) for y > 0. hY(y) = fY(y)/SY(y) = fX(y+5)/SX(y+5) = hX(y+5) = (y+5)/50 for y > 0. 34.15. The per loss variable Y is 0 for x ≤ 5, and is X for x > 5. Y has a point mass of FX(5) at 0. Thus fY(y) is undefined at zero. fY(y) = 0 for 0 < y ≤ 5. fY(y) = fX(y) for y > 5. SY(y) = SX(5) for 0 < y ≤ 5. SY(y) = SX(x) for y > 5. hY(y) undefined at zero. hY(y) = 0 for 0 < y ≤ 5. hY(y) = fY(y)/SY(y) = fX(x)/SX(x) = hX(y) = y/50 for y > 5. 34.16. The per payment variable Y is undefined for x ≤ 5, and is X for x > 5. fY(y) = fX(x)/SX(5) for y > 5. SY(y) = SX(x)/SX(5) for y > 5. hY(y) = fY(y)/SY(y) = fX(y)/SX(y) = hX(y) = y/50 for y > 5. Comment: Similar to Example 8.2 in Loss Models. 34.17. f(x) = 0.5 x2 e-x/θ / θ3. S(x) = 1 - Γ(3 ; x/θ) = e-x/θ + (x/θ)e-x/θ + (x/θ)2 e-x/θ/2. h(x) = f(x)/S(x) = x2 /(2θ3 + 2θ2x + θx2 ). h(x) = 1/(2θ3/x2 + 2θ2/x + θ), which increases to 1/θ as x approaches infinity. Comment: I have used Theorem A.1 in Appendix A of Loss Models, in order to write out the incomplete Gamma Function for an integer parameter. One can also verify that dS(x)/dx = -f(x). A Gamma Distribution for α > 1 is lighter tailed than an Exponential (α = 1), and the hazard rate increases to 1/θ, while the mean excess loss decreases to θ.
2013-4-2,
Loss Distributions, §34 Hazard Rate x
x
∫0
34.18. B. S(x) = exp[ - λ( t) dt ] = exp[ -
exp[-4{ln(100+x) - ln(100)}] =
HCM 10/8/12,
Page 555
∫0 100 + t dt ] = 4
4 ⎛ 100 ⎞ . S(50) = (100/150)4 = 0.198. ⎝ 100 + x ⎠
Comment: This is a Pareto Distribution, with α = 4 and θ = 100. 34.19. B. F(x) = (x/θ)γ / {1 + (x/θ)γ}. F(300) = 32 / (1 + 32 ) = 0.9. S(300) = 0.1. f(x) = γ (x/θ)γ / (x{1 + (x/θ)γ}2 ). f(300) = (2) (32 ) / {(300)(1 + 32 )2 } = 0.0006. h(300) = f(300)/S(300) = 0.0006/0.1 = 0.006. Comment: For the Loglogistic, h(x) = f(x)/S(x) = γ xγ−1 θ−γ / {1+ (x/θ)γ}. For γ = 2 and θ = 100: h(x) = 0.0002 x / {1 + (x/100)2 }. The hazard rate increases and then decreases: hazard rate 0.010
0.008
0.006
0.004
0.002
200
400
600
34.20. For the Pareto the hazard rate is: h(x) = f(x) / S(x) =
800
x 1000
α 2α . ⇒ 2 h(x) = . θ + x θ + x
This is the hazard rate for another Pareto Distribution with parameters 2α and θ.
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 556
34.21. If τ = 1, then we have an Exponential with constant hazard rate. The probability of the period of unemployment ending is independent of how long the worker has been out of work. If τ < 1, then we have decreasing hazard rate. As a worker remains out of work for longer periods of time, his chance of finding a job declines. This could be due to exhaustion of possible employment opportunities or the worker becoming discouraged. If τ > 1, then we have increasing hazard rate. As a worker remains out of work for longer periods of time, his chance of going back to work increases. This could be due to worry about exhaustion of unemployment benefits or the worker becoming more willing to settle for a less than desired job. Comment: A Weibull with τ < 1 has a heavier righthand tail than an Exponential. A Weibull with τ > 1 has a lighter righthand tail than an Exponential.
34.22. B. For the Pareto Distribution, h(x) = f(x)/S(x) = Thus h1 (x) =
α . θ + x
2 4 , and h2 (x) = . 10 + x 10 + x
Since the decrements are independent, the hazard rates add, and h(x) =
6 . 10 + x
h(5) = 6/15 = 0.4. Alternately, the probability of surviving past age x is the product of the probabilities of surviving both of the independent decrements: ⎛ 10 ⎞ 2 ⎛ 10 ⎞ 4 ⎛ 10 ⎞ 6 S(x) = S1 (x) S2 (x) = = . ⎝ 10 + x⎠ ⎝ 10 + x⎠ ⎝ 10 + x⎠ This is a Pareto DIstribution with α = 6 and θ = 10. ⇒ h(x) =
6 . ⇒ h(5) = 6/15 = 0.4. 10 + x
34.23. For the Weibull the hazard rate is: h(x) = f(x) / S(x) = 2 h(x) = 2
τ xτ − 1 .⇒ θτ
τ x τ− 1 τ x τ− 1 = . θτ (θ / 21/ τ )τ
This is the hazard rate for another Weibull Distribution with parameters τ and θ / 21/τ. Comment: If τ = 1 we have an Exponential, and if the hazard rate is doubled then the mean is halved.
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 557 x+1
34.24. A. For the Exponential, h(x) = 1/θ and S(x) = e-x/θ. {S(x) - S(x+1)}/ ∫ S(t)dt = x
{e-x/θ - e-(x+1)/θ}/{e-x/θ/θ - e-(x+1)/θ/θ} = 1/θ. Statement I is true. h(x) = f(x)/S(x) ≥ f(x), since S(x) ≤ 1. Therefore, Statement II is true. While the density must go to zero as x approaches infinity, the density can either increase or decrease over short periods. Statement III is not true. x+1
Comment: {S(x) - S(x+1)}/ ∫ S(t)dt = mx = central death rate. x
See page 70 of Actuarial Mathematics. x
34.25. E. S(x) = exp[- ∫ h(t)dt ] = exp[ln(100-x)/2 - ln(100)/2] = 0
1 - x / 100 .
100
e(64) = ∫ S(t)dt /S(64) = (200/3)(1 - 64/100)3/2 / (1 - 64/100)1/2 = 24. 64
34.26. C. For the Weibull, S(x) = exp(-(x/θ)τ), and f(x) = τxτ−1exp(-(x/θ)τ)/θτ.
⇒ h(x) = f(x)/S(m) = τxτ−1/θτ. Let m be the median. S(m) = 0.5. ⇒ lnS(m) = -(m/θ)τ = -ln(2). h(m) = 0.05. ⇒ τmτ−1/θτ = 0.05. Dividing the two equations: m/τ = ln(2)/0.05. ⇒ m = 20τ ln(2). 34.27. B. Integrating f(t) from t to 25, S(t) = (0.2)(25 - t)1/2. h(t) = f(t)/S(t) = 1/(50 - 2t). h(20) = 1/10. 34.28. A. h(t) = 2/(w - t). H(t) = ∫ 0t h(x)dx = 2ln(w) - 2ln(2 - t). S(t) = exp[-H(t)] = {(w - t)/w}2 = (1 - t/w)2 . f(t) = 2(1 - t/w)/w. = 2/w - 2t/w2 . ∫ 0t x f(x )dx = w/3. ∫ 0t x2 f(x)dx = w2 /6. Var(T) = w2 /6 - (w/3)2 = w 2 /18. 34.29. A. S(x) = 0.1(100 - x)1/2. f(x) = 0.05(100 - x)-1/2. h(x) = f(x)/S(x) = 0.5/(100 - x). h(84) = 0.5/16 = 1/32. 34.30. E. 0.5 = (y/100)2 . ⇒ y = 70.71. f(x) = x/50,000. f(y) = f(70.71) = 70.71/50,000 = 0.01414. h(y) = f(y)/S(y) = 0.01414/.5 = 0.0283.
2013-4-2,
Loss Distributions, §34 Hazard Rate
34.31. B. Y = 4X2 . ⇒ X =
HCM 10/8/12,
Page 558
Y / 2.
S X(x) = 1 - x/10, 0 ≤ x ≤ 10. ⇒ SY(y) = 1 -
y / 20, 0 ≤ y ≤ 400.
fY(y) = 1/(40 y ). fY(4) = 1/80. SY(4) = .9. hY(4) = (1/80)/0.9 = 1/72 = 0.0139. 34.32. B. H(t) = ∫ 0t h(x)dx = ln(ω)/3 - ln(ω - t)/3. S(x) = exp[-H(t)] = exp[ln(ω - x)/3 - ln(ω)/3] = (ω - x)1/3/ω1/3 = (1 - x/ω)1/3. Median age is 63. ⇒ .5 = S(63) = (1 - 63/ω)1/3. ⇒ ω = 63/.875 = 72. ⇒ S(t) = (1 - t/72)1/3. 72
e(63) = ∫ S(t) dt / S(63) = {(3/4)(72)(1 - 63/72)4/3}/0.5 = 6.75. 63
34.33. D. .5 = S(22) = exp[-(22/θ)τ]. ⇒ .69315 = (22/θ)τ. h(x) = f(x)/S(x) = τxτ−1/θτ. We are given: 1.26 = h(22) = τ22τ−1/θτ. Dividing the two equations: τ/22 = 1.8178. ⇒ τ = 40. Comment: θ = 22.2. 34.34. D. S(x) = 1/(1 + (x/θ)γ). f(x) = γxγ−1 θ−γ / (1+ (x/θ)γ)2 . h(x) = f(x)/S(x) = γxγ−1 θ−γ / (1+ (x/θ)γ) = {(3)x2 /(.19843 )}/{1 + (x/.1984)3 } = 3x2 /(.00781 + x3 ). 0 = hʼ(x) = {6x(.00781 + x3 ) - (3x2 )(3x2 )}/(.00781 + x3 )2 . ⇒ x3 = 0.01562. ⇒ x = 0.25. Comment: Here is a graph of h(x) = 3x2 / (0.00781 + x3 ): hazard rate 8
6
4
2
0.5
1.0
1.5
2.0
2.5
3.0
x
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 559
34.35. E. For t < 27.9, H(t) = ∫ 0t h(x)dx = λt. S(t) = e−λt. S(15) = 0.7634. ⇒ λ = 0.018. t For t ≥ 27.9, H(t) = H(27.9) + ∫ 27.9 h(x)dx = (27.9)(.018) + (t - 27.9)(.018) + β(t - 27.9)3 /3.
S(t) = exp[-.018t - β(t - 27.9)3 /3]. 0.5788 = S(30) = exp[-(.018)(30) - β(30 - 27.9)3 /3].
⇒ (.018)(30) + β(30 - 27.9)3 /3 = .5468. ⇒ β = .00222. H(34) = -lnS(34) = (0.018)(34) + (0.00222)(34 - 27.9)3 /3 = 0.780. For the observed data, SO(34) = 4/10 = 0.4. HO(34) = -ln(.4) = 0.916. |H(34) - HO(34)| = 0.916 - 0.780 = 0.14. 34.36. A.
x
∫
2 S(x) = exp(- h(t) dt) = exp[- (z2 /2){ln(x) - ln(2)}] = (x / 2)-z / 2 , for x ≥ 2.
2 2 0.16 = S(5) = 2.5-z / 2 . ⇒ ln 0.16 = (-z2 /2) ln 2.5. ⇒ z2 = -2 ln 0.16 / ln 2.5 = 4. ⇒ z = 2.
Comment: S(x) = (2/x)2 , for x ≥ 2. A Single Parameter Pareto, with α = 2 and θ = 2. 34.37. D. µx corresponds to a legitimate survival function, if and only if it is nonnegative and its integral from 0 to infinity is infinite. All the candidates are nonnegative. ∞
x =∞
∫
B Cx = B lnC Cx ] = ∞, since C>1. 0
x =0 ∞
x =∞
∫
a(b+x)-1 = a ln(b+x) ] = ∞. 0
x =0
∞
x =∞
∫
(1+x)-3 = -(1+x)-2/2 ] = 1/2 ≠ ∞. 0
x =0
Thus 1 and 2 could serve as a force of mortality. Comment: #1 is Gompertz Law. #2 is a Pareto Distribution with a = α and b = θ. The Pareto is not useful for modeling human lives, since its force of mortality decreases to zero as x approaches infinity.
2013-4-2,
Loss Distributions, §34 Hazard Rate
HCM 10/8/12,
Page 560
34.38. E. 1. Single Parameter Pareto is heavy tailed with an increasing mean residual life. 2. A Gamma with α > 1 is lighter tailed than an Exponential; it has a decreasing mean residual life and an increasing hazard rate. 3. Single Parameter Pareto has a heavier tail than a Gamma. Comments: The mean residual life of a Single Parameter Pareto increases linearly as x goes to infinity, e(x) = x/(α-1). The hazard rate of a Single Parameter Pareto goes to zero as x goes to infinity, h(x) = α/x. For this Gamma Distribution, h(x) = f(x)/S(x) = {y e-y/500/ 250,000}/{1 - Γ[2 ; y/500]} = {y e-y/500/ 250,000}/{e-y/500 + (y/500)e-y/500} = 1/(250000/y + 500), where I have used Theorem A.1 to write out the incomplete Gamma function for integer parameter. h(x) increases to 1/500 = 1/θ as x approaches infinity. A Single Parameter Pareto, which has a decreasing hazard rate, has a heavier righthand tail than a Gamma, which has an increasing hazard rate. A Gamma with α = 1 is an Exponential with constant hazard rate. For α integer, a Gamma is a sum of α independent, identically distributed Exponentials. Therefore, as α → ∞, the Gamma Distribution approaches a Normal Distribution. The Normal Distribution is very light-tailed and has an increasing hazard rate. This is one way to remember that for α > 1, the Gamma Distribution has an increasing hazard rate. For α < 1, the Gamma Distribution has a decreasing hazard rate.
∫
34.39. B. h(t) = 0.002t. H(t) = h(t) dt = 0.001t2 . S(t) = exp[-H(t)] = exp[-0.001t2 ]. We want F(t) ≤ 1%. F(3) = 1 - exp[-0.009] = .009 ≤ 1%, so 3 is OK. F(4) = 1 - exp[-0.016] = .016 > 1%, so 4 is not OK. Comment: A Weibull Distribution with τ = 2. 99% = S(t) = exp[-0.001t2 ]. ⇒ t = 3.17. 34.40. C. For a constant force of mortality (hazard rate) one has an Exponential Distribution. For the original strain of the disease: 10% = 1 - e-20µ. ⇒ µ = 0.005268. For the new strain, the probability of death in the next 20 years is: 1 - exp[-(20)(2µ)] = 1 - exp[-(20)(2)(0.005268)] = 1 - e-0.21072 = 19.0%. Alternately, for twice the hazard rate, the survival function is squared. For the original strain, S(20) = 1 - 0.010 = 0.90. For the new strain, S(20) = 0.92 = 0.81. For the new strain, the probability of death in the next 20 years is: 1 - 0.81 = 19%.
2013-4-2,
Loss Distributions, §34 Hazard Rate
34.41. D.
HCM 10/8/12,
Page 561
ω
∞
∫
∫
e(x) = S(t) dt / S(x) = (1 - t/ω)α dt / (1 - x/ω)α = {ω(1 - x/ω)α/(α + 1)}/ (1 - x/ω)α x
x
= (ω - x)/(α + 1). ⇒ e(0) = ω/(α + 1). By differentiating, f(x) = -dS(x)/dx = α(1 - x/ω)α−1/ω. h(x) = f(x)/S(x) = {α(1 - x/ω)α−1/ω}/(1 - x/ω)α = α/(ω - x). Let α be the original value and αʼ be the new value of this parameter. From bullet i: ω/(αʼ + 1) = .5ω/(α + 1). ⇒ αʼ = 2α + 1. From bullet ii: αʼ/(ω - x) = 2.25α/(ω - x). ⇒ αʼ = 2.25α. Therefore, 2.25α = 2α + 1. ⇒ α = 4. Alternately, H(x) = -ln S(x) = -α ln(1 - x/ω). h(x) = d H(x) / dx = (α/ω)/(1 - x/ω) = α/(ω - x). Proceed as before. Comment: If α = 1, then one has DeMoivreʼs Law, the uniform distribution. A Modified DeMoivre model has α times the hazard rate of DeMoivreʼs Law for all ages. 34.42. B.
x
x
∫
∫
H(x) = µ(t) dt = 2/(110 - t) dt = -2{ln(110 - x) - ln(110)}. 0
0
S(x) = exp[-H(t)] = {(110 - x)/110}2 = (1 - x/100)2 , for 0 ≤ x < 110. 110
110
∫
∫
e(30) = S(t) dt / S(30) = (1 - x/100)2 dt /(1 - 30/100)2 = (110/3)(1 - 30/110)3 /(1 - 30/100)2 = 30
30
(110 - 30)/3 = 26.67. Comment: Generalized DeMoivreʼs Law with ω = 110 and α = 2. µ(x) = α/(ω - x), 0 ≤ x < ω. e(x) = (ω - x)/(α+1) = (110 - x)/3. The remaining lifetime at age 30 is a Beta Distribution with a = 1, b = α = 2, and θ = ω - 30 = 80.
2013-4-2,
Loss Distributions, §34 Hazard Rate
34.43. E.
x
HCM 10/8/12,
Page 562
x
∫
∫
H(x) = u(x) dt = t2 /4000 dt = x3 /12,000. 0
0
S(x) = exp[-H(t)] = exp[- x3 /12,000], for 0 ≤ x < 4000 . S(10) = .9200. S(12) = .8659. S(14) = .7956. Prob[fail between 12 and 14 | survive until 10] = {S(12) - S(14)}/S(10) = (.8659 - .7956)/.9200 = 0.0764. Comment: Without the restriction, x ≤
4000 , this would be a Weibull Distribution with τ = 3.
34.44. A. x
∫
H(x) = µ(t) dt = ln(100) - ln(100 - x). S(x) = exp[-H(x)] = (100 - x)/100. 0
Prob[life aged 40 survives at least 10 years] = S(50)/S(40) = .5/.6 = 5/6. Prob[life aged 50 survives at least 10 years] = S(60)/S(50) = .4/.5 = 4/5. Prob[exactly one survives 10 years] = (5/6)(1 - 4/5) + (1 - 5/6)(4/5) = 9/30. Comment: DeMoivreʼs Law with ω = 100.
Section 35, Loss Elimination Ratios and Excess Ratios
As discussed previously, the Loss Elimination Ratio (LER) is defined as the ratio of the losses eliminated by a deductible to the total losses prior to imposition of the deductible. The losses eliminated by a deductible d are E[X ∧ d], the Limited Expected Value at d.252
LER(x) = E[X ∧ x] / E[X].
The excess ratio R(x), is defined as the ratio of loss dollars excess of x divided by the total loss dollars.253 It is the complement of the Loss Elimination Ratio; they sum to unity.
R(x) = {E[X] - E[X ∧ x]} / E[X] = 1 - E[X ∧ x] / E[X] = 1 - LER(x).
Using the formulas in Appendix A of Loss Models for the Limited Expected Value, one can use the relationship R(x) = 1 - E[X ∧ x] / E[X] to compute the Excess Ratio.
For various distributions, here are the resulting formulas for the excess ratios, R(x):
Distribution               Excess Ratio, R(x)
Exponential                e^(-x/θ)
Pareto                     {θ/(θ + x)}^(α-1), α > 1
LogNormal                  1 - Φ[(ln(x) - µ - σ^2)/σ] - x {1 - Φ[(ln(x) - µ)/σ]} / exp[µ + σ^2/2]
Gamma                      1 - Γ(α+1 ; x/θ) - x{1 - Γ(α ; x/θ)} / (αθ)
Weibull                    1 - Γ[1 + 1/τ ; (x/θ)^τ] - (x/θ) exp(-(x/θ)^τ) / Γ[1 + 1/τ]
Single Parameter Pareto    (1/α) (x/θ)^(1-α), α > 1, x > θ
252 The losses eliminated are paid by the insured rather than the insurer. The insured would generally pay less for its insurance in exchange for accepting a deductible. By estimating the percentage of losses eliminated the insurer can price how much of a credit to give the insured for selecting various deductibles. How the LER is used to price deductibles is beyond the scope of this exam, but generally the higher the loss elimination ratio, the greater the deductible credit.
253 The excess ratio is used by actuaries to price reinsurance, workers compensation excess loss factors, etc.
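The table entries can be verified numerically. Below is a small sketch (not from the study guide; the parameter values are arbitrary illustrations) that compares the closed-form Pareto excess ratio against direct integration of the survival function.

```python
# Excess ratio R(x) = integral_x^inf S(t) dt / E[X], checked against the Pareto closed form.
from scipy.integrate import quad

alpha, theta = 3.0, 2000.0
S = lambda t: (theta / (theta + t)) ** alpha    # Pareto survival function
mean = theta / (alpha - 1.0)                    # Pareto mean, alpha > 1

def R_numeric(x):
    tail, _ = quad(S, x, float("inf"))
    return tail / mean

for x in (500.0, 2000.0, 10000.0):
    print(x, round(R_numeric(x), 5), round((theta / (theta + x)) ** (alpha - 1.0), 5))
```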
Recall that the mean and thus the Excess Ratio fails to exist for: Pareto with α ≤ 1, Generalized Pareto with α ≤ 1, and Burr with αγ ≤ 1. Except where the formula could be simplified, there is a term in the Excess Ratio which is: -x(S(x)) / mean.254 Due to the computational length, exam questions involving the computation of Loss Elimination or Excess Ratios are most likely to involve the Exponential, Pareto, Single Parameter Pareto, or LogNormal Distributions. Exercise: Compute the excess ratios at $1 million and $5 million for a Pareto with parameters α = 1.702 and θ = 240,151. [Solution: For the Pareto R(x) = {θ/(θ+x)}α−1. R($1 million) = (240,151/1,240,151)0.702 = 0.316. R($5 million) = (240,151/5,240,151)0.702 = 0.115.] Since LER(x) = 1 - R(x), one can use the formulas for the Excess Ratio to get the Loss Elimination Ratio and vice-versa. Exercise: Compute the loss elimination ratio at 10,000 for the Pareto with parameters: α = 1.702 and θ = 240,151. [Solution: For the Pareto, R(x) = {θ/(θ+x)}α−1. Therefore, LER(x) = 1- {θ/(θ+x)}α−1. LER(10,000) = 1 - (240,151/250,151)0.702 = 2.8%. Comment: One could get the same result by using LER(x) = E[X ∧ x] / mean.] Loss Elimination Ratio and Excess Ratio in Terms of the Survival Function: As discussed previously, for a distribution with support starting at zero, the Limited Expected Value can be written as an integral of the Survival Function from 0 to the limit: E[X ∧ x] =
∫_0^x S(t) dt.
LER(x) = E[X ∧ x] / E[X], therefore:
LER(x) = ∫_0^x S(t) dt / E[X] = ∫_0^x S(t) dt / ∫_0^∞ S(t) dt.
254 This term comes from the second part of -E[X ∧ x] in the numerator, -xS(x). For example, for the Gamma Distribution, the excess ratio has a term -xλ{1 - Γ(α ; λx)}/α = -xS(x)/(α/λ) = -xS(x)/mean.
Thus, for a distribution with support starting at zero, the Loss Elimination Ratio is the integral from zero to the limit of S(x) divided by the mean. Since R(x) = 1 - LER(x) = (E[X] - E[X ∧ x]) / E[X], the Excess Ratio can be written as:
R(x) = ∫_x^∞ S(t) dt / E[X] = ∫_x^∞ S(t) dt / ∫_0^∞ S(t) dt.
So the excess ratio is the integral of the survival function from the limit to infinity, divided by the mean.255
For example, for the Pareto Distribution, S(x) = θ^α (θ+x)^(-α). So that:
R(x) = {θ^α (θ + x)^(1-α) / (α - 1)} / {θ / (α - 1)} = {θ/(θ+x)}^(α-1).
This matches the formula given above for the Excess Ratio of the Pareto Distribution.
Since LER(x) = ∫_0^x S(t) dt / E[X], we have d LER(x)/dx = S(x)/E[X].
Since S(x)/E[X] ≥ 0, the loss elimination ratio is an increasing function of x.256
d LER(x)/dx = S(x)/E[X]. ⇒ d²LER(x)/dx² = -f(x)/E[X].
Since f(x)/E[X] ≥ 0, the loss elimination ratio is a concave downwards function of x.
The loss elimination ratio as a function of x is increasing, concave downwards, and approaches one as x approaches infinity.
255
This result is used extremely often by property/casualty actuaries. See for example, “The Mathematics of Excess of Loss Coverage and Retrospective Rating -- A Graphical Approach,” by Y.S. Lee, PCAS LXXV, 1988. 256 If S(x) = 0, in other words there is no possibility of a loss of size greater than x, then the loss elimination is a constant 1, and therefore, more precisely the loss elimination is nondecreasing.
For example, here is a graph of the loss elimination ratio for a Pareto Distribution with parameters α = 1.702 and θ = 240,151:257
[Graph: loss elimination ratio, LER (vertical axis, from 0 to about 0.8), versus size of loss in millions (horizontal axis, from 1 to 5).]
Since the loss elimination ratio is increasing and concave downwards, the excess ratio is decreasing and concave upwards (convex).
For a distribution with support starting at zero: d LER(x)/dx = S(x)/E[X]. ⇒ d LER(0)/dx = 1/E[X]. ⇒ S(x) = {d LER(x)/dx} / {d LER(0)/dx}.
Therefore, the loss elimination ratio function determines the distribution function, as well as vice-versa.
Layers of Loss:
As discussed previously, layers can be thought of in terms of the difference of loss elimination ratios or the difference of excess ratios in the opposite order.
Exercise: Compute the percent of losses in the layer from $1 million to $5 million for a Pareto Distribution with parameters α = 1.702 and θ = 240,151.
[Solution: For this Pareto Distribution, R($1 million) - R($5 million) = 0.316 - 0.115 = 0.201. Thus for this Pareto, 20.1% of the losses are in the layer from $1 million to $5 million.]
As x approaches infinity, the loss elimination ratio approaches one. In this case it approaches the limit slowly.
2013-4-2,
Loss Distributions, §35 Loss Elimination Ratio
HCM 10/8/12,
Page 567
Problems: 35.1 (2 points) Assume you have a Pareto distribution with α = 5 and θ = $1000. What is the Loss Elimination Ratio for $500? A. less than 78% B. at least 78% but less than 79% C. at least 79% but less than 80% D. at least 80% but less than 81% E. at least 81% 35.2 (2 points) Assume you have Pareto distribution with α = 5 and θ = $1000. What is the Excess Ratio for $2000? A. less than 1% B. at least 1% but less than 2% C. at least 2% but less than 3% D. at least 3% but less than 4% E. at least 4% 35.3 (3 points) You observe the following 35 losses: 6, 7, 11, 14, 15, 17, 18, 19, 25, 29, 30, 34, 40, 41, 48, 49, 53, 60, 63, 78, 85, 103, 124, 140, 192, 198, 227, 330, 361, 421, 514 ,546, 750, 864, 1638. What is the (empirical) Loss Elimination Ratio at 50? A. less than 0.14 B. at least 0.14 but less than 0.16 C. at least 0.16 but less than 0.18 D. at least 0.18 but less than 0.20 E. at least 0.20 35.4 (2 points) The size of losses follows a LogNormal distribution with parameters µ = 11 and σ = 2.5. What is the Excess Ratio for 100 million? A. less than 5% B. at least 5% but less than 10% C. at least 10% but less than 15% D. at least 15% but less than 20% E. at least 20%
2013-4-2,
Loss Distributions, §35 Loss Elimination Ratio
HCM 10/8/12,
Page 568
35.5 (2 points) The size of losses follows a Gamma distribution with parameters α = 3, θ = 100,000. What is the excess ratio for 500,000? Hint: Use Theorem A.1 in Appendix A of Loss Models: j=n-1
Γ(n; x) = 1 -
∑ xj e- x / j! , for n a positive integer. j=0
A. less than 5.6% B. at least 5.6% but less than 5.8% C. at least 5.8% but less than 6.0% D. at least 6.0% but less than 6.2% E. at least 6.2% 35.6 (2 points) The size of losses follows a LogNormal distribution with parameters µ = 10, σ = 3. What is the Loss Elimination Ratio for 7 million? A. less than 10% B. at least 10% but less than 15% C. at least 15% but less than 20% D. at least 20% but less than 25% E. at least 25% Use the following information for the next two questions: •
Accident sizes for Risk 1 follow an Exponential distribution, with mean θ.
•
Accident sizes for Risk 2 follow an Exponential distribution, with mean 1.2θ.
• •
The insurer pays all losses in excess of a deductible of d. 10 accidents are expected for each risk each year.
35.7 (1 point) Determine the expected amount of annual losses paid by the insurer for Risk 1. A. 10dθ
B. 10 / (dθ)
C. 10θ
D. 10θe-d/θ
E. 10e-d/θ
35.8 (1 point) Determine the limit as d goes to infinity of the ratio of the expected amount of annual losses paid by the insurer for Risk 2 to the expected amount of annual losses paid by the insurer for Risk 1. A. 0 B. 1/1.2 C. 1 D. 1.2 E. ∞
2013-4-2,
Loss Distributions, §35 Loss Elimination Ratio
HCM 10/8/12,
Page 569
35.9 (1 point) You have the following estimates of integrals of the Survival Function. 1000
∫
0
S(x) dx ≅ 400.
∞
∫
S(x) dx ≅ 2300.
1000
Estimate the Loss Elimination Ratio at 1000. A. less than 15% B. at least 15% but less than 16% C. at least 16% but less than 17% D. at least 17% but less than 18% E. at least 18% 35.10 (4 points) For a LogNormal distribution with coefficient of variation equal to 3, what is the Loss Elimination Ratio at twice the mean? A. less than 50% B. at least 50% but less than 55% C. at least 55% but less than 60% D. at least 60% but less than 65% E. at least 65% 35.11 (3 points) The loss elimination ratio at 1 ≥ x ≥ 0 is:
ln[a + bx] - ln[a +1] , 1 > b > 0, a > 0. ln[a + b] - ln[a + 1]
Determine the form of the distribution function. 35.12 (4, 5/86, Q.59) (2 points) Assume that losses follow the probability density function f(x) = x/18 for 0 ≤ x ≤ 6 with f(x) = 0 otherwise. What is the loss elimination ratio (LER) for a deductible of 2? A. Less than .35 B. At least .35, but less than .40 C. At least .40, but less than .45 D. At least .45, but less than .50 E. .50 or more. 35.13 (4, 5/87, Q.57) (2 points) Losses are distributed with a probability density function f(x) = 2/x3 , 1 < x < ∞. What is the expected loss eliminated by a deductible of d = 5? A. Less than .5 B. At least .5, but less than 1 C. At least 1, but less than 1.5 D. At least 1.5, but less than 2 E. 2 or more.
2013-4-2,
Loss Distributions, §35 Loss Elimination Ratio
HCM 10/8/12,
Page 570
35.14 (4B, 5/92, Q.25) (2 points) You are given the following information: • Deductible $250
• • •
Expected size of loss with no deductible
$2,500
Probability of a loss exceeding deductible
Mean Excess Loss value of the deductible Determine the loss elimination ratio. A. Less than 0.035 B. At least 0.035 but less than 0.070 C. At least 0.070 but less than 0.105 D. At least 0.105 but less than 0.140 E. At least 0.140
0.95 $2,375
35.15 (4B, 11/92, Q.18) (2 points) You are given the following information:
• Deductible, d: $500
• Expected value limited to d, E[X ∧ d]: $465
• Probability of a loss exceeding deductible, 1 - F(d): 0.86
• Mean Excess Loss value of the deductible, e(d): $5,250
Determine the loss elimination ratio.
A. Less than 0.035 B. At least 0.035 but less than 0.055 C. At least 0.055 but less than 0.075 D. At least 0.075 but less than 0.095 E. At least 0.095
35.16 (4B, 5/94, Q.10) (2 points) You are given the following: •
The amount of a single loss has a Pareto distribution with parameters α = 2 and θ = 2000.
Calculate the Loss Elimination Ratio (LER) for a $500 deductible. A. Less than 0.18 B. At least 0.18, but less than 0.23 C. At least 0.23, but less than 0.28 D. At least 0.28, but less than 0.33 E. At least 0.33
35.17 (4B, 5/96, Q.9 & Course 3 Sample Exam, Q.17) (2 points) You are given the following:
• Losses follow a lognormal distribution, with parameters µ = 7 and σ = 2.
• There is a deductible of 2,000.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
Determine the loss elimination ratio (LER) for the deductible.
A. Less than 0.10 B. At least 0.10, but less than 0.15 C. At least 0.15, but less than 0.20 D. At least 0.20, but less than 0.25 E. At least 0.25
35.18 (4B, 11/96, Q.13) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = k and α = 2, where k is a constant.
• There is a deductible of 2k.
What is the loss elimination ratio (LER)?
A. 1/3 B. 1/2 C. 2/3 D. 4/5 E. 1
35.19 (4B, 5/97, Q.19) (3 points) You are given the following:
• Losses follow a distribution with density function f(x) = (1/1000) e^(-x/1000), 0 < x < ∞.
• There is a deductible of 500.
• 10 losses are expected to exceed the deductible each year.
Determine the amount to which the deductible would have to be raised to double the loss elimination ratio (LER).
A. Less than 550 B. At least 550, but less than 850 C. At least 850, but less than 1,150 D. At least 1,150, but less than 1,450 E. At least 1,450
Use the following information for the next two questions:
• Loss sizes for Risk 1 follow a Pareto distribution, with parameters θ and α, α > 2.
• Loss sizes for Risk 2 follow a Pareto distribution, with parameters θ and 0.8α, α > 2.
• The insurer pays losses in excess of a deductible of k.
• 1 loss is expected for each risk each year.
35.20 (4B, 11/97, Q.22) (2 points) Determine the expected amount of annual losses paid by the insurer for Risk 1.
A. (θ + k)/(α - 1)   B. θ^α / (θ + k)^α   C. α θ^α / (θ + k)^(α + 1)   D. θ^(α + 1) / {(α - 1)(θ + k)^α}   E. θ^α / {(α - 1)(θ + k)^(α - 1)}
35.21 (4B, 11/97, Q.23) (1 point) Determine the limit of the ratio of the expected amount of annual losses paid by the insurer for Risk 2 to the expected amount of annual losses paid by the insurer for Risk 1 as k goes to infinity. A. 0 B. 0.8 C. 1 D. 1.25 E. ∞ 35.22 (4B, 5/99, Q.20) (2 points) Losses follow a lognormal distribution, with parameters µ = 6.9078 and σ = 1.5174. Determine the ratio of the loss elimination ratio (LER) at 10,000 to the loss elimination ratio (LER) at 1,000. A. Less than 2 B. At least 2, but less than 4 C. At least 4, but less than 6 D. At least 6, but less than 8 E. At least 8
35.23 (SOA3, 11/03, Q.29 & 2009 Sample Q.87) (2.5 points) The graph of the density function for losses is:
[Graph: f(x) = 0.010 for loss amounts x from 0 to 80, then decreasing linearly to 0 at x = 120; the horizontal axis is the loss amount x.]
Calculate the loss elimination ratio for an ordinary deductible of 20. (A) 0.20 (B) 0.24 (C) 0.28 (D) 0.32 (E) 0.36 35.24 (SOA M, 11/06, Q.29 & 2009 Sample Q.284) (2.5 points) A risk has a loss amount which has a Poisson distribution with mean 3. An insurance covers the risk with an ordinary deductible of 2. An alternative insurance replaces the deductible with coinsurance α, which is the proportion of the loss paid by the insurance, so that the expected insurance cost remains the same. Calculate α. (A) 0.22
(B) 0.27
(C) 0.32
(D) 0.37
(E) 0.42
Solutions to Problems: 35.1. D. LER(x) = { E[X ∧ x] / mean } , for the Pareto: mean = θ/(α−1) = $250 and E[X ∧ x] = {θ/(α−1)}{1−(θ/θ+x)α−1} = $200.6. LER(x) = 200.6/250 = 80.24% 35.2. B. Excess Ratio = 1- { E[X ∧ x] / mean } = {θ/(θ +x)}α−1 = 1.23%. Comment: E[X ∧ x] = 246.925, mean = 250. 35.3. D. E[X ∧ 50] = { 6 + 7 + 11+ 14 +15 + 17 + 18 + 19 + 25+ 29+ 30+ 34 + 40 + 41+ 48 + 49 + (19)(50)} /35 = (403 + 950)/35 = 38.66. E[X] = {6 + 7 + 11 +14 + 15 + 17 + 18 +19 + 25 + 29 + 30 + 34 + 40 + 41+ 48 + 49 + 53 + 60 + 63 + 78 + 85 + 103 + 124 + 140 + 192 + 198 + 227 + 330 + 361 + 421 + 514 + 546 + 750 + 864 + 1638}/35 = 7150 /35 = 204.29. LER(50) = E[X ∧ 50] / E[X] = 38.66 / 204.29 = 0.189. 35.4. E. mean = exp(µ + σ2/2) = exp(11 + 2.52 /2) = 1362729. E{X ∧ x] = exp(µ + σ2/2)Φ[(lnx - µ − σ2)/σ] + x {1 − Φ[(lnx - µ)/σ]}. E{X ∧ 100 million] = 1362729Φ[(18.421 - 11 - 2.52 )/2.5] − (100 million){1 - Φ[(18.421 - 11)/2.5]} = 1362729Φ[.47] − (100 million){1 - Φ[2.97]} = 1,362,729(.6808) - (100 million){1 - 0.9985) = 1,077,745. Then R(100 million) = 1 - 1,077,745 / 1,362,729 = 1 - .791 = 20.9%. 35.5. B. Γ[3 ; 5] = 1 - e-5(1 + 5 + 52 /2) = 0.875. Γ[4 ; 5] = 1 - e-5(1 + 5 + 52 /2 +53 /6) = 0.735. For Gamma E[X] = αθ = 300,000. E[X
∧ 500,000] = (αθ) Γ[α+1; 500,000/θ] + 500,000 (1 - Γ[α; 500,000/θ]) = 300,000 Γ[4; 5] + 500,000 (1 - Γ[3; 5]) = 283,000.
Excess Ratio = 1 - E[X ∧ 500,000] / E[X] = 1 - 283,000/300,000 = 1 - 0.943 = 5.7%.
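A quick numerical check of 35.5 (my own sketch, not part of the solution), using the hint's identity for Γ(n; x) with integer n:

```python
from math import exp, factorial

def incomplete_gamma(n, x):
    # Gamma(n; x) for integer n, per the hint: 1 - sum_{j=0}^{n-1} x^j e^(-x) / j!
    return 1.0 - sum(x**j * exp(-x) / factorial(j) for j in range(n))

alpha, theta, u = 3, 100_000, 500_000
mean = alpha * theta
lim_ev = mean * incomplete_gamma(alpha + 1, u / theta) + u * (1 - incomplete_gamma(alpha, u / theta))
print(1 - lim_ev / mean)   # excess ratio, about 0.057 (answer B)
```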
35.6. D. mean = exp(µ + σ2/2) = exp(10 + 32 /2) = 1,982,759. E{X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]}. E{X ∧ 7 million] = 1982759Φ[(15.761 - 10 - 32 )/3] + (7 million){1 - Φ[(15.761 - 10)/3]} = 1,982,759 Φ[-1.08] + (7 million){1 - Φ[1.92]} = 1,982,759(0.1401) + (7 million)(1 - 0.9726) = 469,585. Then LER(7 million) = E{X ∧ 7 million]/ E[X] = 469,585 / 1,982,759 = 0.237.
35.7. D. The expected amount paid by the insurer is: 10{E[X] - E[X ∧ d]} = 10{θ - θ(1−e-d/θ)} = 10 θ e-d/θ . Alternately, per claim the losses excess of the limit d are: e(k)(1-F(k)) = θ e-d/θ . Thus for 10 claims we expect the insurer to pay: 10 θ e-d/θ . Alternately, per claim the losses excess of the limit k are: R(k)E[X] = e-d/θ θ = θ e-d/θ. Thus for 10 claims we expect the insurer to pay: 10 θ e-d/θ . 35.8. E. Using the solution to the previous question, the expected amount paid by the insurer for Risk 1 is: 10θ e-d/θ. Similarly, the expected amount paid by the insurer for Risk 2 is: 12θ e-d/1.2θ. Therefore, the ratio of the expected amount of annual losses paid by the insurer for Risk 2 to the expected amount of annual losses paid by the insurer for Risk 1 is: {12θ e-d/1.2θ} / {10θ e-d/θ} = 1.2 e.167d/θ. As d goes to infinity this goes to infinity. 35.9. A.
LER(1000) = ∫_0^1000 S(t) dt / ∫_0^∞ S(t) dt ≅ 400/(400 + 2300) = 14.8%.
Comment: The estimated mean is 400+2300 = 2700. The estimated limited expected value at 1000 is 400.
35.10. D. 1 + CV² = E[X²]/E[X]² = exp[2µ + 2σ²]/exp[µ + σ²/2]² = exp[σ²]. 1 + 3² = exp[σ²]. ⇒ σ = √(ln 10) = 1.5174.
LER[x] = E[X ∧ x] / mean = {exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}} / exp(µ + σ2/2) = Φ[(lnx − µ)/σ - σ] + (x/mean) {1 − Φ[(lnx − µ)/σ]}. x = 2 mean. ⇒ ln(x) = ln(2) + µ + σ2/2.
⇒ (ln x − µ)/σ = ln(2)/σ + σ/2 = 0.69315/1.5174 + 1.5174/2 = 1.2155.
LER[x] = Φ[1.2155 - 1.5174] + (2){1 - Φ[1.2155]} = Φ[-0.30] + 2(1 - Φ[1.22]) = 0.3821 + 2(1 - 0.8888) = 60.5%.
Comment: See Table I in “Estimating Pure Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976. Finger calculates excess ratios, which are one minus the loss elimination ratios.
[Graph: Loss Elimination Ratio as a function of the ratio of the limit to the mean (0 to 5), for LogNormal Distributions with CV = 1, 2, and 3.]
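If you want to reproduce 35.10 numerically, here is a short Python sketch (my own, for illustration); it uses the fact that for a LogNormal the LER at a multiple k of the mean does not depend on µ:

```python
from math import log, sqrt, erf

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

cv, k = 3.0, 2.0                      # coefficient of variation, multiple of the mean
sigma = sqrt(log(1 + cv**2))          # sigma = sqrt(ln 10) = 1.5174
z = log(k) / sigma + sigma / 2        # (ln x - mu)/sigma when x = k * mean
ler = norm_cdf(z - sigma) + k * (1 - norm_cdf(z))
print(ler)                            # about 0.605 (answer D)
```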
35.11. LER(x) = {∫_0^x S(t) dt} / E[X]. ⇒ dLER(x)/dx = S(x)/E[X]. ⇒ dLER(0)/dx = 1/E[X].
⇒ S(x) = {dLER(x)/dx} / {dLER(0)/dx}. ⇒ F(x) = 1 - {dLER(x)/dx} / {dLER(0)/dx}.
Here dLER(x)/dx = {1/(ln[a + b] - ln[a + 1])} ln[b] b^x / (a + b^x), and dLER(0)/dx = {1/(ln[a + b] - ln[a + 1])} ln[b] / (a + 1).
⇒ F(x) = 1 - (a + 1) b^x / (a + b^x), 0 ≤ x < 1.
S(1-) = (a + 1)b / (a + b) > 0. Thus there is a point mass of probability at 1 of size: (a + 1)b / (a + b).
Comment: Note that F(1-) = a(1 - b) / (a + b) < 1. This is a member of the MBBEFD Distribution Class, not on the syllabus of your exam. See “Swiss Re Exposure Curves and the MBBEFD Distribution Class,” by Stefan Bernegger, ASTIN Bulletin, Vol. 27, No. 1, May 1997, pp. 99-111, on the syllabus of CAS Exam 8.
[Graph: the loss elimination ratio for b = 0.2 and a = 3, as a function of the size x from 0 to 1. As it should, the LER is increasing, concave downwards, and approaches 1 as x approaches 1.]
[Graph: the Survival Function for b = 0.2 and a = 3, as a function of the size x from 0 to 1.] There is a point mass of probability at x = 1 of size: (a + 1)b / (a + b) = (4)(0.2)/3.2 = 25%.
35.12. D. F(x) = x²/36.
LER(2) = {∫_0^2 x f(x) dx + 2(1 - F(2))} / ∫_0^6 x f(x) dx = {2³/54 + 2(1 - 2²/36)} / {6³/54} = 1.926 / 4 = 0.481.
35.13. D. Integrating f(x) from 1 to x, F(x) = 1 - 1/x². A deductible of d eliminates the whole loss for small losses and d per large loss. The expected losses eliminated by a deductible of d are:
∫_1^d x f(x) dx + d(1 - F(d)) = ∫_1^d 2x^(-2) dx + d(1/d²) = [-2/x]_1^d + 1/d = (2 - 2/d) + 1/d = 2 - 1/d.
For d =5, the expected losses eliminated are: 2 - 1/5 = 1.8. Comment: A Single Parameter Pareto with α = 2 and θ = 1. 35.14 . C. e(x) = {mean - E[X ∧ x]}/(1 - F(x)). Therefore: 2375 = (2500 - E[X ∧ x])/(.95). Thus E[X ∧ x] = 243.75. Then, LER(x) = E[X ∧ x] / E[X] = 243.75 / 2500 = 0.0975. Alternately, LER(x) = 1 - e(x) {1 - F(x)}/E[x] = 1 - (2375)(1 - .05)/2500 = 0.0975. 35.15. D. LER(d) = E[x ∧ d] / Mean. e(d) = (Mean - E[X ∧ d]) / S(d). Therefore, 5250 = (Mean - 465) / 0.86. Therefore Mean = 4515 + 465 = 4980. Thus LER(d) = 465/4980 = 0.093. Comment: One does not use the information that the deductible amount is $500. However, note that E[x ∧ d] ≤ d, as it should.
35.16. B. For the Pareto distribution LER(x) = 1 - (1 + x / θ)1 − α. For α = 2 and θ = 2000, LER(500) = 1 - 1/1.25 = 0.2. Alternately, E[X ∧ x] = {θ/(α−1)}{1−(θ/θ+x)α−1} and E[X ∧ 500] = (2000)(1 - .8) = 400. The mean is θ / (α-1) = 2000. LER(x) = E[X ∧ x] / mean = 400 / 2000 = 0.2. 35.17. B. Mean = exp(µ + .5 σ2) = exp(7 + 2) = e9 = 8103. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]} E[X ∧ 2000] = 8103Φ[(ln2000 − 7 - 4)/2] + 2000 {1 − Φ[(ln2000 − 7)/2]} = 8103 Φ(-1.7) + 2000{1 - Φ(.3)} = (8103)(1-.9554) +(2000)(1 - .6179) = 361 +764 = 1125. LER(2000) = E[X ∧ 2000] / Mean = 1125 / 8103 = 0.139. 35.18. C. For the Pareto Distribution, the Loss Elimination Ratio is: 1 - (θ/(θ+x))α−1. For θ = k and α = 2, LER(x) = 1 - (k/(k+x)) = x / (k+x). Thus LER(2k) = 2k/ 3k = 2/3. Comment: If one does not remember the formula for the LER for the Pareto, one can use the formula for the limited expected value and the fact that LER(x) = E[X ∧ x] / E[X]. 35.19. E. For the Exponential Distribution: LER(x) = 1 - e-x/θ. For θ = 1000, LER(500) = 1- e-.5 = .3935. In order to double the LER, then (2)(.3935) = 1 - e-x/1000. Thus e-x/1000 = 0.214. ⇒ x = -1000 ln(.213) = 1546. Comment: For the Exponential, e(x) = θ, and therefore R(x) = e(x) S(x) / mean = (θ)(e-x/ θ)/(θ) = e-x/ θ. Thus LER(X) = 1- R(x) = 1 - e-x/ θ. 35.20. E. Since there is 1 loss expected per risk per year, the expected amount paid by the insurer is: E[X] - E[X ∧ k] = θ/(α−1) − {θ/(α−1)}{1 - θα−1/(θ+k)α−1} = {θ/(α−1)}θα−1/(θ + k)α−1 = θα/{(α−1)(θ + k)α−1}. Alternately, the losses excess of the limit k are e(k)(1-F(k)) = {(k + θ)/(α−1)} θα/(θ+k)α = θα/{(α−1)(θ + k)α−1}. Alternately, the losses excess of the limit k are R(k)E[X] = {θ/(θ + k)}α−1{θ/(α−1)} = θα/{(α−1)(θ + k)α−1}.
35.21. E. Using the solution to the prior question, but with 0.8α rather than α, the expected amount of annual losses paid by the insurer for Risk 2 is: θ^(0.8α) / {(0.8α - 1)(θ + k)^(0.8α - 1)}. That for Risk 1 is: θ^α / {(α - 1)(θ + k)^(α - 1)}.
The ratio is: {(α - 1)/(0.8α - 1)} (θ + k)^(0.2α) / θ^(0.2α). As k goes to infinity, this ratio goes to infinity.
Comment: The loss distribution of Risk 2 has a heavier tail than that of Risk 1. The pricing of very large deductibles is very sensitive to the value of the Pareto shape parameter, α.
35.22. B. LER = E[X ∧ x] / E[X]. ⇒ LER(10,000) / LER(1000) = E[X ∧ 10,000] / E[X ∧ 1000].
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln x - µ - σ²)/σ] + x {1 - Φ[(ln x - µ)/σ]}.
E[X ∧ 10,000] = e^8.059 Φ[0] + 10,000 {1 - Φ[1.52]} = (3162)(0.5) + 10,000(1 - 0.9357) = 2224.
E[X ∧ 1000] = e^8.059 Φ[-1.52] + 1000 {1 - Φ[0]} = (3162)(0.0643) + 1000(0.5) = 703.
E[X ∧ 10,000] / E[X ∧ 1000] = 2224 / 703 = 3.16.
35.23. E. F(20) = (20)(.010) = 0.2. S(20) = 1 - 0.2 = 0.8. f(x) = .01, 0 ≤ x ≤ 80. From 80 to 120 the graph is linear, and it is 0 at 120 and .010 at 80.
⇒ f(x) = (0.01)(120 - x)/40 = 0.03 - 0.00025x, 80 ≤ x ≤ 120.
E[X] = ∫_0^80 0.01x dx + ∫_80^120 x(0.03 - 0.00025x) dx = [0.01x²/2]_0^80 + [0.03x²/2]_80^120 - [0.00025x³/3]_80^120 = 50.67.
E[X ∧ 20] = ∫_0^20 0.01x dx + 20 S(20) = 2 + (20)(0.8) = 18.
LER(20) = E[X ∧ 20]/E[X] = 18 / 50.67 = 35.5%.
35.24. E. E[X ∧ 2] = 0·f(0) + 1·f(1) + 2{1 - f(0) - f(1)} = 2 - 2f(0) - f(1) = 2 - 2e^(-3) - 3e^(-3) = 2 - 5e^(-3).
Loss Elimination Ratio = E[X ∧ 2]/E[X] = (2 - 5e^(-3))/3 = 0.584.
Setting the expected insurance costs equal, α E[X] = E[X] - E[X ∧ 2], so α = 1 - LER = 1 - 0.584 = 0.416.
Comment: α is set equal to the excess ratio for a deductible of 2.
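A two-line check of 35.24 (my own sketch, not part of the solution):

```python
from math import exp

lam = 3.0                                                                 # Poisson mean
e_x_cap_2 = 1 * lam * exp(-lam) + 2 * (1 - exp(-lam) - lam * exp(-lam))   # E[X ^ 2]
print(1 - e_x_cap_2 / lam)                                                # alpha = 1 - LER, about 0.416 (answer E)
```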
Section 36, The Effects of Inflation Inflation is a very important consideration when pricing Health Insurance and Property/Casualty Insurance. Important ideas include the effect of inflation when there is a maximum covered loss and/or deductible, in particular the effect on the average payment per loss and the average payment per payment, the effect on other quantities of interest, and the effect on size of loss distributions. Memorize the formulas for the average sizes of payment including inflation, discussed in this section! On this exam, we deal with the effects of uniform inflation, meaning that a single inflation factor is applied to all sizes of loss.258 For example, if there is 5% inflation from 1999 to the year 2000, we assume that a loss of size x in 1999 would have been of size 1.05x if it had instead occurred in the year 2000. Effect of a Maximum Covered Loss: Exercise: You are given the following: • For 1999 the amount of a single loss has the following distribution: Amount Probability $500 20% $1,000 30% $5,000 25% $10,000 15% $25,000 10% • An insurer pays all losses after applying a $10,000 maximum covered loss to each loss. • Inflation of 5% impacts all losses uniformly from 1999 to 2000. Assuming no change in the maximum covered loss, what is the inflationary impact on dollars paid by the insurer in the year 2000 as compared to the dollars the insurer paid in 1999? [Solution: One computes the average amount paid by the insurer per loss in each year: Probability
            1999 Loss    1999 Insurer Payment    2000 Loss    2000 Insurer Payment
0.20        500          500                     525          525
0.30        1,000        1,000                   1,050        1,050
0.25        5,000        5,000                   5,250        5,250
0.15        10,000       10,000                  10,500       10,000
0.10        25,000       10,000                  26,250       10,000
Average     5,650.00     4,150.00                5,932.50     4,232.50
4232.50 / 4150 = 1.020, therefore the insurerʼs payments increased 2.0%.] 258
Over a few years inflation can often be assumed to be approximately uniform by size of loss. However, over longer periods of time the larger losses often increase at a different rate than the smaller losses.
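The arithmetic in this exercise is easy to put on a computer. Here is a minimal Python sketch (my own illustration, not from the text); the same helper with min(x·factor, limit) replaced by max(x·factor − d, 0) reproduces the deductible exercise that follows:

```python
probs  = [0.20, 0.30, 0.25, 0.15, 0.10]
losses = [500, 1_000, 5_000, 10_000, 25_000]

def avg_limited_payment(factor, limit):
    # average payment per loss when every loss is multiplied by `factor`
    # and then capped at the maximum covered loss `limit`
    return sum(p * min(x * factor, limit) for p, x in zip(probs, losses))

paid_1999 = avg_limited_payment(1.00, 10_000)   # 4150.00
paid_2000 = avg_limited_payment(1.05, 10_000)   # 4232.50
print(paid_2000 / paid_1999)                    # about 1.020, a 2.0% increase
```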
Inflation on the limited losses is 2%, less than that of the total losses. Prior to the application of the maximum covered loss, the average loss increased by the overall inflation rate of 5%, from 5650 to 5932.5. In general, for a fixed limit, limited losses increase more slowly than the overall rate of inflation. Effect of a Deductible: Exercise: You are given the following: • For 1999 the amount of a single loss has the following distribution: Amount Probability $500 20% $1,000 30% $5,000 25% $10,000 15% $25,000 10% • An insurer pays all losses after applying a $1000 deductible to each loss. • Inflation of 5% impacts all losses uniformly from 1999 to 2000. Assuming no change in the deductible, what is the inflationary impact on dollars paid by the insurer in the year 2000 as compared to the dollars the insurer paid in 1999? [Solution: One computes the average amount paid by the insurer per loss in each year: Probability
            1999 Loss    1999 Insurer Payment    2000 Loss    2000 Insurer Payment
0.20        500          0                       525          0
0.30        1,000        0                       1,050        50
0.25        5,000        4,000                   5,250        4,250
0.15        10,000       9,000                   10,500       9,500
0.10        25,000       24,000                  26,250       25,250
Average     5,650.00     4,750.00                5,932.50     5,027.50
5027.5 / 4750 = 1.058, therefore the insurerʼs payments increased 5.8%.] Inflation on the losses excess of the deductible is 5.8%, greater than that of the total losses. Prior to the application of the deductible, the average loss increased by the overall inflation rate of 5%, from 5650 to 5932.5. In general, for a fixed deductible, losses paid excess of the deductible increase more quickly than the overall rate of inflation. The Loss Elimination Ratio in 1999 is: (5650 - 4750) / 5650 = 15.9%. The Loss Elimination Ratio in 2000 is: (5932.5 -5027.5) / 5932.5 = 15.3%. In general, under uniform inflation for a fixed deductible amount the LER declines. The effect of a fixed deductible decreases over time.
Similarly, under uniform inflation the Excess Ratio over a fixed amount increases.259 If a reinsurer were selling reinsurance excess of a fixed limit such as $1 million, then over time the losses paid by the reinsurer would be expected to increase faster than the overall rate of inflation, in some cases much faster.
Limited Losses increase slower than the total losses. Excess Losses increase faster than total losses. Limited Losses plus Excess Losses = Total Losses.
Graphical Examples:
Assume for example that losses follow a Pareto Distribution with α = 3 and θ = 5000 in the earlier year.260 Assume that there is 10% inflation and the same limit in both years. Then the increase in limited losses as a function of the limit is:
[Graph: the rate of increase (%) in limited losses as a function of the limit, for limits up to 50,000; the vertical axis runs from 6% to 9%.]
As the limit increases, so does the rate of inflation. For no limit the rate is 10%.
259 See 3, 11/00, Q.42.
260 How to work with the Pareto and other continuous size of loss distributions under uniform inflation will be discussed subsequently.
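The curve in the graph above is easy to recompute. Here is a brief Python sketch (my own, under the stated Pareto assumptions) of the rate of increase in limited losses at a few fixed limits:

```python
alpha, theta, r = 3.0, 5000.0, 0.10

def lev_pareto(x, t):
    # E[X ^ x] for a Pareto with shape alpha and scale t
    return (t / (alpha - 1)) * (1 - (t / (t + x)) ** (alpha - 1))

for limit in [10_000, 20_000, 30_000, 40_000, 50_000]:
    before = lev_pareto(limit, theta)
    after = lev_pareto(limit, theta * (1 + r))           # later year: theta -> theta(1+r)
    print(limit, round(100 * (after / before - 1), 1))   # increases toward 10% as the limit grows
```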
If instead there were a fixed deductible, then the increase in losses paid excess of the deductible as a function of the deductible is:
[Graph: the rate of increase (%) in losses excess of the deductible as a function of the deductible, for deductibles up to 10,000; the vertical axis runs from 12% to 24%.]
For no deductible the rate of inflation is 10%. As the size of the deductible increases, the losses excess of the deductible becomes “more excess”, and the rate of inflation increases. Effect of a Maximum Covered Loss and Deductible, Layers of Loss: Exercise: You are given the following: • For 1999 the amount of a single loss has the following distribution: Amount Probability $500 20% $1,000 30% $5,000 25% $10,000 15% $25,000 10% • An insurer pays all losses after applying a $10,000 maximum covered loss and then a $1000 deductible to each loss. • Inflation of 5% impacts all loss uniformly from 1999 to 2000. Assuming no change in the deductible or maximum covered loss, what is the inflationary impact on dollars paid by the insurer in the year 2000 as compared to 1999?
[Solution: One computes the average amount paid by the insurer per loss in each year: Probability
            1999 Loss    1999 Insurer Payment    2000 Loss    2000 Insurer Payment
0.20        500          0                       525          0
0.30        1,000        0                       1,050        50
0.25        5,000        4,000                   5,250        4,250
0.15        10,000       9,000                   10,500       9,000
0.10        25,000       9,000                   26,250       9,000
Average     5,650.00     3,250.00                5,932.50     3,327.50
3327.5 / 3250 = 1.024, therefore the insurerʼs payments increased 2.4%.]
In this case, the layer of loss from 1000 to 10,000 increased more slowly than the overall rate of inflation. However, there were two competing effects. The deductible made the rate of increase larger, while the maximum covered loss made the rate of increase smaller. Which effect dominates depends on the particulars of a given situation.
For example, for the ungrouped data in Section 1, the dollars of loss in various layers are as follows for the original data and the revised data after 50% uniform inflation:
LAYER ($ million)          Dollars of Loss ($000)
Bottom    Top         Original Data    Revised Data    Ratio
0         0.5         24277            30174           1.24
0.5       1           6424             10239           1.59
1         2           4743             8433            1.78
2         3           2441             4320            1.77
3         4           1961             2661            1.36
4         5           802              2000            2.49
5         6           0                1942            infinite
6         7           0                1000            infinite
7         8           0                203             infinite
0         infinity    40648            60972           1.50
We observe that the inflation rate for higher layers is usually higher than the uniform inflation rate, but not always. For the layer from 3 to 4 million dollars the losses increase by 36%, which is less than the overall rate of 50%. A layer (other than the very first) “gains” dollars as loss sizes increase and are thereby pushed above the bottom of the layer. For example, a loss of size 0.8 million would contribute nothing to the layer from 1 to 2 million prior to inflation, while after 50% inflation it would be of size 1.2 million, and would contribute 0.2 million. In addition, losses which were less than the top of the layer and more than the bottom of the layer, now contribute more dollars to the layer. For example, a loss of size 1.1 million would contribute 0.1 million to the layer from 1 to 2 million prior to inflation, while after 50% inflation it would be of size 1.65 million, and would contribute 0.65 million to this layer. Either of these two types of increases can be very big compared to the dollars that were in the layer prior to the effects of inflation.
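The layer arithmetic described here can be written as a one-line function; the following short sketch (my own) reproduces the examples in this discussion, in $ millions:

```python
def layer_contribution(x, bottom, top):
    # dollars a single loss of size x contributes to the layer from bottom to top
    return min(x, top) - min(x, bottom)

for size in [0.8, 1.1, 3.0]:
    print(size, layer_contribution(size, 1, 2), layer_contribution(1.5 * size, 1, 2))
# 0.8: 0 before, 0.2 after.  1.1: 0.1 before, 0.65 after.  3.0: 1.0 both before and after.
```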
On the other hand, a loss whose size was bigger than the top of a given layer, contributes no more to that layer no matter how much it grows. For example, a loss of size 3 million would contribute 1 million to the layer from 1 to 2 million prior to inflation, while after 50% inflation it would be of size 4.5 million, and would still contribute 1 million. A loss of size 3 million has already contributed the width of the layer, and that is all that any single loss can contribute to that layer. So for such losses there is no increase to this layer. Thus for an empirical sample of losses, how inflation impacts a particular layer depends how the varying effects from the various sizes of losses contribute to the combined effect.
Manners of Expressing the Amount of Inflation: There are a number of different ways to express the amount of inflation: 1. State the total amount of inflation from the earlier year to the later year. 2. Give a constant annual inflation rate. 3. Give the different amounts of inflation during each annual period between the earlier and later year. 4. Give the value of some consumer price index in the earlier and later year. In all cases, you want to determine the total inflation factor, (1+r), to get from the earlier year to the later year. For example, from the year 2001 to 2004, inflation might be: 1. A total of 15%; 1 + r = 1.15. 2. 4% per year; 1 + r = (1.04)3 = 1.125. 3. 7% between 2001 and 2002, 4% between 2002 and 2003, and 5% between 2003 and 2004; 1 + r = (1.07)(1.04)(1.05) = 1.168. 4. The CPI (Consumer Price Index) was 327 in 2001 and is 366 in 2004; 1 + r = 366/327 = 1.119.
Moments, etc.: If one multiplies all of the loss sizes by 1.1, then the mean is also multiplied by 1.1. E[1.1X] = 1.1 E[X]. Since each loss is multiplied by the inflation factor, (1+r), so are the Mean, Mode and Median of the distribution. Any percentile of the distribution is also multiplied by (1+r); in fact this is the definition of inflation uniform by size of loss. Any quantity in dollars is expected to be multiplied by the inflation factor, 1+ r. If one multiplies all of the loss sizes by 1.1, then the second moment is multiplied by 1.12 . E[(1.1X)2 ] = 1.12 E[X2 ]. E[{(1+r)X}n ] = (1+r)n E[Xn ]. In general, under uniform inflation the nth moment is multiplied by (1+r)n . Exercise: In 2003 the mean loss is 100 and the second moment is 50,000. Between 2003 and 2004 there is 5% inflation. What is the variance of the losses in 2004? [Solution: In 2004, the mean is: (1.05)(100) = 105, and the second moment is: (1.052 )(50000) = 55,125. Thus in 2004, the variance is: 55,125 - 1052 = 44,100.] The variance in 2003 was 50,000 - 1002 = 40,000. The variance increased by a factor of: 44,100/40,000 = 1.1025 = 1.052 = (1+r)2 . Var[(1+r)X] = E[{(1+r)X}2 ] - E[(1+r)X]2 = (1+r)2 E[X2 ] - (1+r)2 E[X]2 = (1+r)2 Var[X]. Under uniform inflation, the Variance is multiplied by (1+r)2 . Any quantity in dollars squared is expected to the multiplied by the square of the inflation factor, (1+r)2 . Since the Variance is multiplied by (1+r)2 , the Standard Deviation is multiplied by (1+r).
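A small numerical illustration of these scaling rules (my own sketch), using the 2003 figures from the exercise above:

```python
r = 0.05
mean_2003, second_2003 = 100.0, 50_000.0
var_2003 = second_2003 - mean_2003**2            # 40,000

mean_2004 = (1 + r) * mean_2003                  # 105: dollar quantities scale by (1+r)
second_2004 = (1 + r)**2 * second_2003           # 55,125: dollars squared scale by (1+r)^2
var_2004 = second_2004 - mean_2004**2            # 44,100
print(var_2004, (1 + r)**2 * var_2003)           # both 44,100, so the variance scales by (1+r)^2
```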
CV, Skewness, and Kurtosis: Exercise: In 2003 the mean loss is 100 and the second moment is 50,000. Between 2003 and 2004 there is 5% inflation. What is the coefficient of variation of the losses in 2004? [Solution: In 2004, the mean is 105, and the standard deviation is
√44,100 = 210.
The Coefficient of Variation is: 210 / 105 = 2.] In this case, the CV for 2003 is:
√40,000 / 100 = 2. Thus the coefficient of variation remained the
same. CV = standard deviation / mean, and in general both the numerator and denominator are multiplied by (1+r), and therefore the CV remains the same. Skewness = (3rd central moment)/ standard deviation3 . Both the numerator and denominator are in dollars cubed, and under uniform inflation they are each multiplied (1+r)3 . Thus the skewness is unaffected by uniform inflation. Kurtosis = (4th central moment)/ standard deviation4 . Both the numerator and denominator are in dollars to the fourth power, and under uniform inflation they are each multiplied (1+r)4 . Thus the kurtosis is unaffected by uniform inflation. The Coefficient of Variation, the Skewness, and the Kurtosis are each unaffected by uniform inflation. Each is a dimensionless quantity, which helps to describe the shape of a distribution and is independent of the scale of the distribution.
Limited Expected Values: As discussed previously, losses limited by a fixed limit increase slower than the rate of inflation. For example, if the expected value limited to $1 million is $300,000 in the prior year, then after uniform inflation of 10%, the expected value limited to $1 million is less than $330,000 in the later year. Exercise: You are given the following: • For 1999 the amount of a single loss has the following distribution: Amount Probability $500 20% $1,000 30% $5,000 25% $10,000 15% $25,000 10% • Inflation of 5% impacts all losses uniformly from 1999 to 2000. An insurer pays all losses after applying a maximum covered loss to each loss. The maximum covered loss in 1999 is $10,000. The maximum covered loss in 2000 is $10,500, 5% more than that in 1999. What is the inflationary impact on dollars paid by the insurer in the year 2000 as compared to the dollars the insurer paid in 1999? [Solution: One computes the average amount paid by the insurer per loss in each year: Probability
            1999 Loss    1999 Insurer Payment    2000 Loss    2000 Insurer Payment
0.20        500          500                     525          525
0.30        1,000        1,000                   1,050        1,050
0.25        5,000        5,000                   5,250        5,250
0.15        10,000       10,000                  10,500       10,500
0.10        25,000       10,000                  26,250       10,500
Average     5,650.00     4,150.00                5,932.50     4,357.50
4357.50 / 4150 = 1.050, therefore the insurerʼs payments increased 5.0%.] On exam questions, the maximum covered loss would usually be the same in the two years. In that case, as discussed previously, the insurerʼs payments would increase at 2%, less than the overall rate of inflation. In this exercise, instead the maximum covered loss was increased in order to keep up with inflation. The result was that the insurerʼs payments, the limited expected value, increased at the overall rate of inflation. Provided the limit keeps up with inflation, the Limited Expected Value is multiplied by the inflation factor.261 If we increase the limit at the rate of inflation, then the Limited Expected Value, which is in dollars, also keeps up with inflation. 261
As discussed previously, if rather than being increased in order to keep up with inflation the limit is kept fixed, then the limited losses increase slower than the overall rate of inflation.
Exercise: The expected value limited to $1 million is $300,000 in the 2007. There is 10% uniform inflation between 2007 and 2008. What is the expected value limited to $1.1 million in 2008? [Solution: Since the limit kept up with inflation, ($300,000)(1.1) = $330,000.] Proof of the Result for Limited Expected Values: The Limited Expected Value is effected for two reasons by uniform inflation. Each of the losses entering into its computation is multiplied by (1+r), but in addition the relative effect of the limit has been effected. Due to the combination of these two effects it turns out that if Z = (1+r) X, then E[Z ∧ u(1+r)] = (1+r) E[X ∧ u]. In terms of the definition of the Limited Expected Value: E[Z ∧ u(1+r)] =
∫_0^(u(1+r)) z fZ(z) dz + {SZ(u(1+r))} {u(1+r)} = (1+r) ∫_0^u x fX(x) dx + {SX(u)} {u(1+r)} = (1+r) E[X ∧ u].
Where we have applied the change of variables, z = (1+r) x and thus FZ(L(1+r)) = FX(L), and fX(x) dx = fZ(z) dz. We have shown that E[(1+r)X ∧ u(1+r)] = (1+r) E[X ∧ u]. The left hand side is the Limited Expected Value in the later year, with a limit of u(1+r); we have adjusted u, the limit in the prior year, in order to keep up for inflation via the factor 1+r. This yields the Limited Expected Value in the prior year, except multiplied by the inflation factor to put them in terms of the subsequent yearʼs dollars, which is the right hand side.
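The identity E[(1+r)X ∧ u(1+r)] = (1+r) E[X ∧ u] is easy to confirm numerically; here is a tiny sketch (mine) using an Exponential, for which E[X ∧ x] = θ(1 − e^(−x/θ)):

```python
from math import exp

theta, u, r = 1000.0, 2500.0, 0.10

def lev_expo(x, t):
    return t * (1 - exp(-x / t))     # E[X ^ x] for an Exponential with mean t

lhs = lev_expo(u * (1 + r), theta * (1 + r))   # later year: limit and theta both inflated
rhs = (1 + r) * lev_expo(u, theta)
print(lhs, rhs)                                # identical
```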
Mean Excess Loss: Exercise: You are given the following: • For 1999 the amount of a single loss has the following distribution: Amount Probability $500 20% $1,000 30% $5,000 25% $10,000 15% $25,000 10% • Inflation of 5% impacts all losses uniformly from 1999 to 2000. Compute the mean excess loss at $3000 in 1999. Compute the mean excess loss at $3000 in 2000. Compute the mean excess loss at $3150 in 2000. [Solution: In 1999, e(3000) = {(2000)(.25) + (7000)(.15) + (22,000)(.1)}/(.25 + .15 + .1) = 7500. In 2000, e(3000) = {(5250 - 3000)(.25) + (10,500 - 3000)(.15) + (26,250 - 3000)(.1)}/.5 = 8025. In 2000, e(3150) = {(5250 - 3150)(.25) + (10,500 - 3150)(.15) + (26,250 - 3150)(.1)}/.5 = 7875.] In this case, if the limit is increased for inflation, from 3000 to (1.05)(3000) = $3150 in 2000, then the mean excess loss increases by the rate of inflation; (1.05)(7500) = 7875. The mean excess loss in the later year is multiplied by the inflation factor, provided the limit has been adjusted to keep up with inflation. Exercise: The mean excess loss beyond $1 million is $3 million in 2007. There is 10% uniform inflation between 2007 and 2008. What is the mean excess loss beyond $1.1 million in 2008? [Solution: Since the limit kept up with inflation, ($3 million)(1.1) = $3.3 million.] If the limit is fixed, then the behavior of the mean excess loss, depends on the particular size of loss distribution.262 Proof of the Result for the Mean Excess Loss: The Mean Excess Loss or Mean Residual Life at L in the prior year is given by eX(L) = {E[X] - E[X ∧ L]} / S(L). Letting Z = (1+r)X, the mean excess loss at L(1+r) in the latter year is given by eZ(L(1+r)) = { E[Z] - E[Z ∧ L(1+r)] } / SZ(L(1+r)) = {(1+r)E[X] - (1+r)E[X ∧ L] } / {SX(L)} = (1+r)eX(L). 262
As was discussed in a previous section, different distributions have different behaviors of the mean excess loss as a function of the limit.
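For the discrete example above, the mean excess loss is a short computation; this sketch (my own) reproduces e(3000) = 7500 in 1999 and e(3150) = 7875 in 2000:

```python
probs  = [0.20, 0.30, 0.25, 0.15, 0.10]
losses = [500, 1_000, 5_000, 10_000, 25_000]

def mean_excess(d, factor=1.0):
    # e(d) when every loss has been multiplied by `factor`
    num = sum(p * (x * factor - d) for p, x in zip(probs, losses) if x * factor > d)
    den = sum(p for p, x in zip(probs, losses) if x * factor > d)
    return num / den

print(mean_excess(3000))            # 7500 in 1999
print(mean_excess(3150, 1.05))      # 7875 = (1.05)(7500) in 2000
```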
Loss Elimination Ratio: As discussed previously, for a fixed deductible, the Loss Elimination Ratio declines under uniform inflation. For example, if the LER(1000) = 13% in the prior year, then after uniform inflation, LER(1000) is less than 13% in the latter year. Exercise: You are given the following: • For 1999 the amount of a single loss has the following distribution: Amount Probability $500 20% $1,000 30% $5,000 25% $10,000 15% $25,000 10% • Inflation of 5% impacts all losses uniformly from 1999 to 2000. An insurer pays all losses after applying a deductible to each loss. The deductible in 1999 is $1000. The deductible in 2000 is $1050, 5% more than that in 1999. Compare the loss elimination ratio in the year 2000 to that in the year 1999. [Solution: One computes the average amount paid by the insurer per loss in each year: Probability
            1999 Loss    1999 Insurer Payment    2000 Loss    2000 Insurer Payment
0.20        500          0                       525          0
0.30        1,000        0                       1,050        0
0.25        5,000        4,000                   5,250        4,200
0.15        10,000       9,000                   10,500       9,450
0.10        25,000       24,000                  26,250       25,200
Average     5,650.00     4,750.00                5,932.50     4,987.50
The Loss Elimination Ratio in 1999 is: 1 - 4750/5650 = 15.9%. The Loss Elimination Ratio in 2000 is: 1 - 4987.5/5932.5 = 15.9%. Comment: 4987.50 / 4750 = 1.050, therefore the insurerʼs payments increased 5.0%.] On exam questions, the deductible would usually be the same in the two years. In that case, as discussed previously, the loss elimination ratio would decrease from 15.9% to 15.3%. In this exercise, instead the deductible was increased in order to keep up with inflation. The result was that the insurerʼs payments increased at the overall rate of inflation, and the loss elimination ratio stayed the same. The Loss Elimination Ratio in the later year is unaffected by uniform inflation, provided the deductible has been adjusted to keep up with inflation.263 263
As discussed above, for a fixed deductible the Loss Elimination Ratio decreases under uniform inflation.
Exercise: The Loss Elimination Ratio for a deductible of $1000 is 13% in 2007. There is 10% uniform inflation between 2007 and 2008. What is the Loss Elimination ratio for a deductible of $1100 in 2008? [Solution: Since the deductible keeps up with inflation, the Loss Elimination Ratio is the same in 2008 as in 2007, 13%.] Since the Excess Ratio is just unity minus the LER, the Excess Ratio in the latter year is unaffected by uniform inflation, provided the limit has been adjusted to keep up with inflation.264 Proof of the Result for Loss Elimination Ratios: The Loss Elimination Ratio at d in the prior year is given by LERX(d) = E[X ∧ d] / E[X]. Letting Z = (1+r)X, the Loss Elimination Ratio at d(1+r) in the latter year is given by LERZ(d(1+r)) = E[Z
∧ d(1+r)] / E[Z] = (1+r) E[X ∧ d] / {(1+r) E[X]} = E[X ∧ d] / E[X] = LERX(d).
Using Theoretical Loss Distributions: It would also make sense to use continuous distributions, obtained perhaps from fitting to a data set, in order to estimate the impact of inflation. We could apply a factor of 1+r to every loss in the data set and then fit a distribution to the altered data. In most cases, it would be a waste of time fitting new distributions to the data modified by the uniform effects of inflation. For most size of loss distributions, after uniform inflation one gets the same type of distribution with the scale parameter revised by the inflation factor. For example, for a Pareto Distribution with parameters α = 1.702 and θ = 240,151, under uniform inflation of 50% one would get another Pareto Distribution with parameters: α = 1.702, θ = (1.5)(240,151) = 360,360.265 Behavior of Specific Distributions under Uniform Inflation of (1+r): For the Pareto, θ becomes θ(1+r). The Burr and Generalized Pareto have the same behavior. Not coincidentally, for these distributions the mean is proportional to θ. As discussed in a previous section, theta is the scale parameter for these distributions; everywhere x appears in the Distribution Function it is divided by θ. In general under inflation, scale parameters are transformed under inflation by being multiplied by (1+r). For the Pareto the shape parameter α remains the same. For the Burr the shape parameters α and γ remain the same. For the Generalized Pareto the shape parameters α and τ remain the same. 264 265
264 As discussed above, for a fixed limit the Excess Ratio increases under uniform inflation.
265 Prior to inflation, this is the Pareto fit by maximum likelihood to the ungrouped data in Section 1.
Similarly, for the Gamma, and Weibull, θ becomes θ(1+r). The Transformed Gamma has the same behavior. As parameterized in Loss Models, theta is the scale parameter for the Gamma, Weibull, and Transformed Gamma distributions. For the Gamma the shape parameter α remains the same. For the Weibull the shape parameter τ remains the same. For the Transformed Gamma the shape parameters α and τ remain the same. Since the Exponential is a special case of the Gamma, for the Exponential θ becomes θ(1+r), under uniform inflation of 1+r. Exercise: In 2001 losses follow a Gamma Distribution with parameters α = 2 and θ = 100. There is 10% inflation in total between 2001 and 2004. What is loss distribution in 2004? [Solution: Gamma with α = 2 and θ = (1.1)(100) = 110.] The behavior of the LogNormal under uniform inflation is explained by noting that multiplying each loss by a factor of (1+r) is the same as adding a constant amount ln(1+r) to the log of each loss. Adding a constant amount to a Normal distribution, gives another Normal Distribution, with the same variance but with the mean shifted. µʼ = µ + ln(1+r), and σʼ = σ. X ~ LogNormal(µ , σ). ⇔ ln(X) ~ Normal(µ , σ). ⇒ ln[(1+r)X] = ln(X) + ln(1+r) ~ Normal(µ , σ) + ln(1+r) = Normal(µ + ln(1+r), σ). ⇔ (1+r)X ~ LogNormal(µ + ln(1+r), σ). Thus under uniform inflation for the LogNormal, µ becomes µ + ln(1+r). The other parameter, σ, remains the same. The behavior of the LogNormal under uniform inflation can also be explained by the fact that eµ is the scale parameter and σ is a shape parameter. Therefore, eµ is multiplied by (1+r); eµ becomes eµ(1+r) = eµ+ln(1+r). Therefore, µ becomes µ + ln(1+r). Exercise: In 2001 losses follow a LogNormal Distribution with parameters µ = 5 and σ = 2. There is 10% inflation in total between 2001 and 2004. What is loss distribution in 2004? [Solution: LogNormal with µ = 5 + ln(1.1) = 5.095, and σ = 2.] Note that in each case, the behavior of the parameters under uniform inflation depends on the particular way in which the distribution is parameterized. For example, in Loss Models the Exponential distribution is given as: F(x) = 1 - e-x/θ. Thus in this parameterization of the Exponential, θ acts as a scale parameter, and under uniform inflation θ becomes θ(1+r). This contrasts with the parameterization of the Exponential in Actuarial Mathematics, F(x) = 1 - e-λx, where 1/λ acts as a scale parameter, and under uniform inflation λ becomes λ/(1+r). θ ⇔ 1/λ.
Conveniently, most of the distributions in Loss Models have a scale parameter, which is multiplied by (1+r), while the shape parameters are unaffected. Exceptions are the LogNormal Distribution and the Inverse Gaussian.266 Note that all of the members of “Transformed Beta Family” all act similarly.267 The scale parameter θ is multiplied by the inflation factor and all of the shape parameters remain the same. All of the members of the “Transformed Gamma Family” all act similarly.268 The scale parameter θ is multiplied by the inflation factor and all of the shape parameters remain the same. Distribution
Parameters Prior to Inflation → Parameters After Inflation
Pareto: α, θ → α, θ(1+r)
Generalized Pareto: α, θ, τ → α, θ(1+r), τ
Burr: α, θ, γ → α, θ(1+r), γ
Inverse Burr: τ, θ, γ → τ, θ(1+r), γ
Transformed Beta: α, θ, γ, τ → α, θ(1+r), γ, τ
Inverse Pareto: τ, θ → τ, θ(1+r)
Loglogistic: γ, θ → γ, θ(1+r)
Paralogistic: α, θ → α, θ(1+r)
Inverse Paralogistic: τ, θ → τ, θ(1+r)
Exponential: θ → θ(1+r)
Gamma: α, θ → α, θ(1+r)
Inverse Gamma: α, θ → α, θ(1+r)
Weibull: θ, τ → θ(1+r), τ
Inverse Weibull: θ, τ → θ(1+r), τ
Trans. Gamma: α, θ, τ → α, θ(1+r), τ
Inv. Trans. Gamma: α, θ, τ → α, θ(1+r), τ
266 This is discussed along with the behavior under uniform inflation of the LogNormal and Inverse Gaussian, in Appendix A of Loss Models. However it is not included in the Tables attached to the exam.
267 See Figure 5.2 and Appendix A of Loss Models.
268 See Figure 5.3 and Appendix A of Loss Models.
Normal: µ, σ → µ(1+r), σ(1+r)
LogNormal: µ, σ → µ + ln(1+r), σ
Inverse Gaussian: µ, θ → µ(1+r), θ(1+r)
Single Par. Pareto: α, θ → α, θ(1+r)
Uniform Distribution: a, b → a(1+r), b(1+r)
Beta Distribution: a, b, θ → a, b, θ(1+r)
Generalized Beta Dist.: a, b, θ, τ → a, b, θ(1+r), τ
Note that all the distributions in the above table are preserved under uniform inflation. After uniform inflation we get the same type of distributions, but some or all of the parameters have changed. If X follows a type of distribution implies that cX, for any c > 0, also follows the same type of distribution, then that is defined as a scale family. So for example, the Inverse Gaussian is a scale family of distributions, even though it does not have scale parameter. If X follows an Inverse Gaussian, then Y = cX also follows an Inverse Gaussian. Any distribution with a scale parameter is a scale family. In order to compute the effects of uniform inflation on a loss distribution, one can adjust the parameters as in the above table. Then one can work with the loss distribution revised by inflation in the same manner one would work with any loss distribution. Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and θ = 240,151. Losses increase uniformly by 50%. What are the means prior to and subsequent to inflation? [Solution: For the Pareto Distribution, E[X] = θ/(α−1). Prior to inflation, E[X] = 240,151 / 0.702 = 342,095. After inflation, with parameters α = 1.702 and θ = (1.5)(240,151) = 360,227. After inflation, E[X] = 360,227 / 0.702 = 513,143. Alternately, inflation increases the mean by 50% to: (1.5)(342,095) = 513,143.]
Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and θ = 240,151. Losses increase uniformly by 50%. What are the limited expected values at 1 million prior to and subsequent to inflation?
[Solution: For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)} {1 − (θ/(θ+x))^(α−1)}.
Prior to inflation, E[X ∧ 1 million] = (240,151 / 0.702) {1 − (240,151/1,240,151)^0.702} = 234,044.
After inflation, E[X ∧ 1 million] = (360,227 / 0.702) {1 − (360,227/1,360,227)^0.702} = 311,232.]
Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and θ = 240,151. Losses increase uniformly by 50%. Excess Ratio = 1 - LER. What are the excess ratios at 1 million prior to and subsequent to inflation?
[Solution: Excess ratio = R(x) = (E[X] − E[X ∧ x]) / E[X] = 1 − E[X ∧ x] / E[X].
Prior to inflation, R(1 million) = 1 − 234,044 / 342,095 = 31.6%.
After inflation, R(1 million) = 1 − 311,232 / 513,143 = 39.3%.
Comment: As expected, for a fixed limit the Excess Ratio increases under uniform inflation. For the Pareto the excess ratio is given by R(x) = (θ/(θ+x))^(α−1).]
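These Pareto computations are convenient to script; here is a brief sketch (my own) that reproduces the limited expected values and excess ratios above:

```python
alpha, theta, r = 1.702, 240_151.0, 0.50

def lev_pareto(x, t):
    return (t / (alpha - 1)) * (1 - (t / (t + x)) ** (alpha - 1))

x = 1_000_000.0
mean_before, mean_after = theta / (alpha - 1), theta * (1 + r) / (alpha - 1)
lev_before = lev_pareto(x, theta)               # about 234,044
lev_after = lev_pareto(x, theta * (1 + r))      # about 311,232
print(1 - lev_before / mean_before)             # excess ratio, about 31.6%
print(1 - lev_after / mean_after)               # excess ratio, about 39.3%
```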
Behavior in General of Distributions under Uniform Inflation of (1+r):
For distributions in general, including those not discussed in Loss Models, one can determine the behavior under uniform inflation as follows. One makes the change of variables Z = (1+r) X. For the Distribution Function one just sets FZ(z) = FX(x); one substitutes x = z / (1+r). Alternately, for the density function fZ(z) = fX(x) / (1+r).269
For example, for the Normal Distribution f(x) = exp[-(x − µ)²/(2σ²)] / {σ √(2π)}. Under uniform inflation, x = z/(1+r) and
fZ(z) = fX(x) / (1+r) = exp[-(z/(1+r) − µ)²/(2σ²)] / {(1+r) σ √(2π)} = exp[-{z − µ(1+r)}²/(2{σ(1+r)}²)] / {(1+r) σ √(2π)}.
This is a Normal density function with sigma and mu each multiplied by (1+r). Thus under inflation, for the Normal µ becomes µ(1+r) and σ becomes σ(1+r). The location parameter µ has been multiplied by the inflation factor, as has the scale parameter σ. 269
Under change of variables you need to divide by dz/dx = 1+ r, since fZ(z) = dF/dz = (dF/dx) / (dz/dx) = fX(x) / (1+r).
Alternately, the Distribution Function for the Normal is Φ[(x-µ)/σ]. Therefore, FZ(z) = FX(x) = Φ[(x-µ)/σ] = Φ[({z/(1+r)}-µ)/σ] = Φ[{z- µ(1+r)}/{σ(1+r)}]. This is the Distribution Function for a Normal with sigma and mu each multiplied by (1+r), which matches the previous result. Exercise: What is the behavior under inflation of the distribution function: F(x) = xa / (xa + ba), x > 0. [Solution: Under uniform inflation, FZ(z) = FX(x) = xa / (xa + ba) = {z/(1+r)}a / ({z/(1+r)}a + ba) = za/(za + {b(1+r)}a). This is the same type of distribution, where b has become b (1+r). The scale parameter b has been multiplied by the inflation factor (1+r). Alternately, one can work with the density function f(x) = a ba xa-1 / (xa + ba)2 = (a/b)(x/b)a-1 / (1+ (x/b)a)2 . Then under uniform inflation: x = z/(1+r) and fZ(z) = fX(x) / (1+r) = (a/b)(x/b)a-1 / {(1+r)(1+ (x/b)a)2 } = (a / {b (1+r)})(z / {b (1+r)} )a-1 / { (1+ (z / {b (1+r)})a)2 }, which is same type of density, where b has become b(1+r), as was shown previously. Alternately, you can recognize b is a scale parameter, since F(x) = (x/b)a / {(x/b)a + 1}. Or alternately, you can recognize that this is Loglogistic Distribution with a = γ and b = θ. ] Exercise: What is the behavior under uniform inflation of the density function: ⎛ x ⎞2 θ ⎜ − 1⎟ ⎝µ ⎠ exp θ 2x f(x) = . 2π x1.5
[Solution: The given density is f(x) = √(θ/(2π)) x^(-1.5) exp[-θ(x/µ − 1)²/(2x)].
In general one substitutes x = z / (1+r), and for the density function fZ(z) = fX(x) / (1+r):
fZ(z) = fX(x)/(1+r) = √(θ/(2π)) {z/(1+r)}^(-1.5) exp[-θ(z/{µ(1+r)} − 1)² / (2z/(1+r))] / (1+r) = √(θ(1+r)/(2π)) z^(-1.5) exp[-θ(1+r) (z/{µ(1+r)} − 1)² / (2z)].
This is of the same form, but with parameters (1+r)µ and (1+r)θ, rather than µ and θ.]
Thus we have shown that under uniform inflation for the Inverse Gaussian Distribution µ and θ become (1+r)µ and (1+r)θ.
Behavior of the Domain of Distributions: For all of the distributions discussed so far, the domain has been 0 to ∞. For the Single Parameter Pareto distribution the domain is x > θ. Under uniform inflation the domain becomes x > (1+r)θ. In general, the domain [a, b] becomes under uniform inflation [(1+r)a, (1+r)b]. If a = 0, multiplying by 1+r has no effect; if b = ∞, multiplying by 1+r has no effect. So for distributions like the Gamma, the domain remains (0, ∞) after uniform inflation. For the Single Parameter Pareto F(x) = 1 - (x / θ)−α, x > θ, under uniform inflation α is unaffected and θ becomes (1+r)θ. The uniform distribution on [a , b] becomes under uniform inflation the uniform distribution on [a(1+r) , b(1+r)]. Working in Either the Earlier or Later Year: Exercise: Losses prior to inflation follow a Pareto Distribution with parameters α = 1.702 and θ = 240,151. Losses increase uniformly by 50%. What is the average contribution per loss to the layer from 1 million to 5 million, both prior to and subsequent to inflation? ⎛ θ ⎞ α− 1⎫ θ ⎧ [Solution: For the Pareto Distribution, E[X ∧ x] = ⎨1 - ⎜ ⎟ ⎬. ⎝ θ + x⎠ α −1 ⎩ ⎭ Prior to inflation, E[X ∧ 5 million] = (240,151 / 0.702){1 - (240,151/5,240,151).702} = 302,896. After inflation, losses follow a Pareto with α =1.702 and θ = (1.5)(240,151) = 360,227, and E[X ∧ 5 million] = (360,227 / 0.702){1 - (360,227/5,360,227).702} = 436,041. Prior to inflation, the average loss contributes: E[X ∧ 5 million] - E[X ∧ 1 million] = 302,896 - 234,044 = 68,852, to this layer. After inflation, the average loss contributes: 436,041 - 311,232 = 124,809, to this layer.] The contribution to this layer has increased by 82%, in this case more than the overall rate of inflation. There are two alternative ways to solve many problems involving inflation. In the above solution, one adjusts the size of loss distribution in the earlier year to the later year based on the amount of inflation. Then one calculates the quantity of interest in the later year. However, there is an alternative, which many people will prefer. Instead one calculates the quantity of interest in the earlier year at its deflated value, and then adjusts it to the later year for the effects of inflation. Hereʼs how this alternate method works for this example.
A limit of 1 million in the later year corresponds to a limit of 1 million /1.5 = 666,667 in the earlier year. Similarly, a limit of 5 million in the later year corresponds to 5 million /1.5 = 3,333,333 in the earlier year. Using the Pareto in the earlier year, with α =1.702 and θ = 240,151, E[X ∧ 666,667] = (240,151 / 0.702){1 - (240,151/906,818)0.702} = 207,488, and E[X ∧ 3,333,333] = (240,151 / 0.702){1 - (240,151/3,573,484)0.702} = 290,694. In terms of the earlier year dollars, the contribution to the layer is: 290,694 - 207,488 = 83,206. However, one has to inflate back up to the level of the later year: (1.5)(83,206) =124,809, matching the previous solution. This type of question can also be answered using the formula discussed subsequently for the average payment per loss. This formula for the average payment per loss is just an application of the technique of working in the earlier year, by deflating limits and deductibles. However, this technique of working in the earlier year is more general, and also applies to other quantities of interest, such as the survival function. Exercise: Losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5. Between 2003 and 2009 there is a total of 35% inflation. Determine the percentage of the total number of losses in 2009 that would be expected to exceed a deductible of 1000. [Solution: The losses in year 2009 follow a LogNormal Distribution with parameters µ = 3 + ln(1.35) = 3.300 and σ = 5. Thus in 2009, S(1000) = 1 - F(1000) = 1 - Φ[{ln(1000) - 3.300} / 5] = 1 - Φ[0.72] = 1 - 0.7642 = 0.2358. Alternately, we first deflate to 2003. A deductible of 1000 in 2009 is equivalent to a deductible of 1000/1.35 = 740.74 in 2003. The losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5. Thus in 2003, S(740.74) = 1 - F(740.47) = 1 - Φ[{ln(740.47) - 3}/ 5] = 1 - Φ[0.72] = 1 - 0.7642 = 0.2358.] Of course both methods of solution produce the same answer. One can work either in terms of 2003 or 2009 dollars. In this case, the survival function is a dimensionless quantity. However, when working with quantities in dollars, such as the limited expected value, if one works in the earlier year, in this case 2003, one has to remember to reinflate the final answer back to the later year, in this case 2009.
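Both methods are easy to compare side by side. A short sketch (my own) computing the layer from 1 to 5 million both ways:

```python
alpha, theta, r = 1.702, 240_151.0, 0.50

def lev_pareto(x, t):
    return (t / (alpha - 1)) * (1 - (t / (t + x)) ** (alpha - 1))

# Method 1: inflate the distribution (theta -> 1.5 theta) and work in the later year.
later_year = lev_pareto(5e6, theta * 1.5) - lev_pareto(1e6, theta * 1.5)
# Method 2: deflate the limits to the earlier year, then reinflate the answer.
earlier_year = 1.5 * (lev_pareto(5e6 / 1.5, theta) - lev_pareto(1e6 / 1.5, theta))
print(round(later_year), round(earlier_year))   # both about 124,809
```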
Formulas for Average Payments: The ideas discussed above can be put in terms of formulas:270 Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the insurerʼs average payment per loss in the later year is:
(1+r) c {E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]}.
Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the average payment per (non-zero) payment by the insurer in the later year is:
(1+r) c {E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]} / S(d/(1+r)).
In each case we have deflated the Maximum Covered Loss and the Deductible back to the earlier year, computed the average payment in the earlier year, and then reinflated back to the later year. Important special cases are: d = 0 ⇔ no deductible, L = ∞ ⇔ no maximum covered loss, c = 1 ⇔ no coinsurance, r = 0 ⇔ no inflation or prior to the effects of inflation. For example, assume losses in 2001 follow an Exponential distribution with θ = 1000. There is a total of 10% inflation between 2001 and 2004. In 2004 there is a deductible of 500, a maximum covered loss of 5000, and a coinsurance factor of 80%. Then the average payment per (non-zero) payment in 2004 is computed as follows, using that for the Exponential Distribution, E[X
∧ x] = θ(1 - e^(-x/θ)).
Take d = 500, u = 5000, c = 0.8, and r = 0.1. average payment per (non-zero) payment in 2004 = (1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) / S(d/(1+r)) = (1.1)(0.8)(E[X ∧ 4545] - E[X ∧ 455]) / S(455) = (0.88){1000(1 - e-4545/1000) - 1000(1 - e-455/1000)}/e-455/1000 = (0.88)(989 - 366)/0.634 = 865. Note that all computations use the original Exponential Distribution in 2001, with θ = 1000. 270
See Theorem 8.7 in Loss Models.
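Here is the Exponential example above written out as a short Python sketch (my own), applying the per-loss and per-payment formulas directly:

```python
from math import exp

theta = 1000.0                      # Exponential in 2001
d, u, c, r = 500.0, 5000.0, 0.80, 0.10

def lev(x):
    return theta * (1 - exp(-x / theta))               # E[X ^ x] for the Exponential

per_loss = (1 + r) * c * (lev(u / (1 + r)) - lev(d / (1 + r)))
per_payment = per_loss / exp(-(d / (1 + r)) / theta)   # divide by S(d/(1+r))
print(round(per_payment))                              # about 865
```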
Exercise: For a LogNormal Distribution with parameters µ = 3 and σ = 5, determine E[X ∧ 100,000], E[X ∧ 1,000,000], E[X ∧ 74,074], and E[X ∧ 740,740]. [Solution: E[X ∧ 100000] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]} = exp(3 + 25/2)Φ[(ln(100000) - 3 - 25)/5] + (100000){1 - Φ[{ln(100000) - 3} / 5]} = 5,389,670Φ[-3.30] + (740,740){1 - Φ[1.70]} = 5,389,670(.0005) + (100000)(1 - .9554) = 7155. E[X ∧ 1,000,000] = exp(3 + 25/2)Φ[(ln(1000000) - 3 - 25)/5] + (1000000){1 - Φ[{ln(1000000) - 3} / 5] } = 5,389,670Φ[-2.84] + (1000000){1 - Φ[2.16]} = 5,389,670(.0023) + (1000000)(1 - 0.9846) = 27,796. E[X ∧ 74,074] = exp(3 + 25/2)Φ[(ln(74074) - 3 - 25)/5] + (74074){1 - Φ[{ln(74074) - 3} / 5]) = 5,389,670Φ[-3.36] + (74074){1 - Φ[1.64]} = 5,389,670(.0004) + (74074)(1 - 0.9495) = 5897. E[X ∧ 740,740] = exp(3 + 25/2)Φ[(ln(740,740) - 3 - 25)/5] + (740,740){1 - Φ[{ln(740,740) - 3}/ 5]} = 5,389,670Φ[-2.90] + (740,740){1 - Φ[2.10]} = 5,389,670(.0019) + (740,740)(1 - 0.9821) = 23,500.] Exercise: Losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5. Between 2003 and 2009 there is a total of 35% inflation. In 2009 there is a deductible of $100,000 and a maximum covered loss of $1 million. Determine the increase between 2003 and 2009 in the insurerʼs average payment per loss to the insured. [Solution: In 2003, take r = 0, d = 100,000, u = 1 million, and c = 1. Average payment per loss = E[X ∧ 1 million] - E[X ∧ 100,000] = 27,796 - 7155 = 20,641. In 2009, take r = 0.35, d = 100,000, u = 1 million, and c = 1. Average payment per loss = 1.35 (E[X ∧ 1 million/1.35] - E[X ∧ 100000/1.35]) = 1.35 (E[X ∧ 740,740] - E[X ∧ 74074]) = 1.35 (23,500 - 5897) = 23,764. The increase is: 23,764/20,641 - 1 = 15%. Comment: Using a computer, the exact answer without rounding is: 23,554/20,481 - 1 = 15.0%. Using the formula in order to get the average payment per loss in 2009 is equivalent to deflating to 2003, working in the year 2003, and then reinflating to the year 2009. The 2009 LogNormal has parameters µ = 3 + ln(1.45) = 3.300 and σ = 5. For this LogNormal, E[X ∧ 100,000] = exp(3.3 + 25/2)Φ[(ln(100000) - 3.3 - 25)/5] + (100000){1 - Φ[{ln(100000) - 3.3} / 5]} = 7,275,332Φ[-3.36] + (100,000){1 - Φ[1.64]} = 7,275,332(.0004) + (100000)(1 - 0.9495) = 7960. For this LogNormal, E[X ∧ 1,000,000] = exp(3.3 + 25/2)Φ[(ln(1000000) - 3.3 - 25)/5] + (1000000){1 - Φ[{ln(1000000) - 3.3} / 5]} = 7,275,332Φ[-2.90] + (100,0000){1 - Φ[2.10]} = 7,275,332(.0019) + (1000000)(1 - 0.9821) = 31,723. 31,723 - 7960 = 23,763, matching the 23,764 obtained above except for rounding.]
Exercise: Losses in 2003 follow a LogNormal Distribution with parameters µ = 3 and σ = 5. Between 2003 and 2009 there is a total of 35% inflation. In 2009 there is a deductible of $100,000 and a maximum covered loss of $1 million. Determine the increase between 2003 and 2009 in the insurerʼs average payment per (non-zero) payment to the insured.
[Solution: In 2003, take r = 0, d = 100,000, u = 1 million, and c = 1.
Average payment per non-zero payment = (E[X ∧ 1 million] - E[X ∧ 100,000])/S(100,000)
= (27,796 - 7155)/{1 - Φ[(ln(100000) - 3)/5]} = 20,641/{1 - Φ[1.70]} = 20,641/0.0446 = 462,803.
In 2009, take r = 0.35, d = 100,000, u = 1 million, and c = 1.
Average payment per non-zero payment = 1.35(E[X ∧ 1 million/1.35] - E[X ∧ 100,000/1.35])/S(100,000/1.35)
= 1.35(E[X ∧ 740,740] - E[X ∧ 74,074])/S(74,074) = 1.35(23,500 - 5897)/{1 - Φ[1.64]} = 23,764/0.0505 = 470,574.
The increase is: 470,574/462,803 - 1 = 1.7%.
Comment: Using a computer, the exact answer without rounding is: 468,852/462,085 - 1 = 1.5%.]

Formulas for Second Moments:271

We have previously discussed second moments of layers. We can incorporate inflation in a manner similar to the formulas for first moments. However, since the second moment is in dollars squared, we reinflate back by multiplying by (1+r)². Also, we multiply by the coinsurance factor squared.

Given uniform inflation with inflation factor 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the second moment of the insurerʼs payment per loss in the later year is:
(1+r)² c² {E[(X ∧ u/(1+r))²] - E[(X ∧ d/(1+r))²] - 2(d/(1+r))(E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)])}.

Given uniform inflation with inflation factor 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the second moment of the insurerʼs payment per (non-zero) payment in the later year is:
(1+r)² c² {E[(X ∧ u/(1+r))²] - E[(X ∧ d/(1+r))²] - 2(d/(1+r))(E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)])} / S(d/(1+r)).

One can combine the formulas for the first and second moments in order to calculate the variance.
271 See Theorem 8.8 in Loss Models. If r = 0, these reduce to formulas previously discussed.
Exercise: Losses in 2005 follow a Single Parameter Pareto Distribution with α = 3 and θ = 200. Between 2005 and 2010 there is a total of 25% inflation. In 2010 there is a deductible of 300, a maximum covered loss of 900, and a coinsurance of 90%. In 2010, determine the variance of YP, the per payment variable.
[Solution: From the Tables attached to the exam, for the Single Parameter Pareto, for x ≥ θ:
E[X ∧ x] = αθ/(α - 1) - θ^α / {(α - 1) x^(α - 1)}.
E[(X ∧ x)²] = αθ²/(α - 2) - 2θ^α / {(α - 2) x^(α - 2)}.
Thus E[X ∧ 300/1.25] = E[X ∧ 240] = (3)(200)/2 - 200³/{(2)(240²)} = 230.556.
E[X ∧ 900/1.25] = E[X ∧ 720] = (3)(200)/2 - 200³/{(2)(720²)} = 292.284.
S(300/1.25) = S(240) = (200/240)³ = 0.5787.
Thus the mean payment per payment is: (1.25)(90%)(292.284 - 230.556)/0.5787 = 120.00.
E[(X ∧ 240)²] = (3)(200²)/1 - (2)(200³)/240 = 53,333. E[(X ∧ 720)²] = (3)(200²)/1 - (2)(200³)/720 = 97,778.
Since the second moment is in dollars squared, we multiply by the square of the coinsurance factor, and the square of the inflation factor. Thus the second moment of the non-zero payments is:
(1.25²)(90%)²{97,778 - 53,333 - (2)(240)(292.284 - 230.556)}/0.5787 = 32,402.
Thus the variance of the non-zero payments is: 32,402 - 120.00² = 18,002.
Alternately, work with the 2010 Single Parameter Pareto with α = 3, and θ = (200)(1.25) = 250.
E[X ∧ 300] = (3)(250)/2 - 250³/{(2)(300²)} = 288.194. E[X ∧ 900] = (3)(250)/2 - 250³/{(2)(900²)} = 365.355.
S(300) = (250/300)³ = 0.5787.
Thus the mean payment per payment is: (90%)(365.355 - 288.194)/0.5787 = 120.00.
E[(X ∧ 300)²] = (3)(250²)/1 - (2)(250³)/300 = 83,333. E[(X ∧ 900)²] = (3)(250²)/1 - (2)(250³)/900 = 152,778.
Since the second moment is in dollars squared, we multiply by the square of the coinsurance factor. Thus the second moment of the non-zero payments is:
(90%)²{152,778 - 83,333 - (2)(300)(365.355 - 288.194)}/0.5787 = 32,401.
Thus the variance of the non-zero payments is: 32,401 - 120.00² = 18,001.]
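A short Python sketch (illustrative only) that reproduces this exercise using the tabulated Single Parameter Pareto moments; the small differences from the figures above are intermediate rounding:

def spp_lim_ev(alpha, theta, x):
    # E[X ^ x] for a Single Parameter Pareto, x >= theta.
    return alpha * theta / (alpha - 1) - theta ** alpha / ((alpha - 1) * x ** (alpha - 1))

def spp_lim_second(alpha, theta, x):
    # E[(X ^ x)^2] for a Single Parameter Pareto, x >= theta.
    return alpha * theta ** 2 / (alpha - 2) - 2 * theta ** alpha / ((alpha - 2) * x ** (alpha - 2))

def spp_survival(alpha, theta, x):
    return (theta / x) ** alpha

alpha, theta = 3, 200                 # 2005 distribution
d, u, c, r = 300, 900, 0.9, 0.25      # 2010 deductible, maximum covered loss, coinsurance, inflation
d0, u0 = d / (1 + r), u / (1 + r)     # deflated back to 2005
first = (1 + r) * c * (spp_lim_ev(alpha, theta, u0) - spp_lim_ev(alpha, theta, d0)) / spp_survival(alpha, theta, d0)
second = ((1 + r) ** 2 * c ** 2 * (spp_lim_second(alpha, theta, u0) - spp_lim_second(alpha, theta, d0)
          - 2 * d0 * (spp_lim_ev(alpha, theta, u0) - spp_lim_ev(alpha, theta, d0))) / spp_survival(alpha, theta, d0))
print(first, second - first ** 2)     # 120.0 and 18,000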
Mixed Distributions:272

If one has a mixed distribution, then under uniform inflation each of the component distributions acts as it would under uniform inflation.
Exercise: The size of loss distribution is: F(x) = 0.7{1 - e^(-x/130)} + 0.3{1 - (250/(250+x))²}. After uniform inflation of 20%, what is the size of loss distribution?
[Solution: After uniform inflation of 20%, we get another Exponential Distribution, but with θ = (1.2)(130) = 156: 1 - e^(-x/156). After uniform inflation of 20%, we get another Pareto Distribution, but with α = 2 and θ = (1.2)(250) = 300: 1 - {300/(300+x)}². Therefore, the mixed distribution becomes: 0.7{1 - e^(-x/156)} + 0.3{1 - (300/(300+x))²}.]
272 Mixed Distributions are discussed in a subsequent section.
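As a quick numerical illustration of this fact, the Python sketch below (using the 70%/30% Exponential-Pareto mixture from the exercise) checks that scaling every loss by 1.2 gives the same survival function as the mixture with each scale parameter multiplied by 1.2:

import math

def surv_2005(x):
    # Mixture: 70% Exponential(theta = 130), 30% Pareto(alpha = 2, theta = 250).
    return 0.7 * math.exp(-x / 130) + 0.3 * (250 / (250 + x)) ** 2

def surv_inflated(x):
    # Same mixture after 20% uniform inflation: scale parameters 156 and 300.
    return 0.7 * math.exp(-x / 156) + 0.3 * (300 / (300 + x)) ** 2

# Uniform inflation by 1.2 means S_new(x) = S_old(x / 1.2); the two columns should agree.
for x in (100, 500, 2000):
    print(x, round(surv_2005(x / 1.2), 6), round(surv_inflated(x), 6))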
Non-Uniform Rates of Inflation by Size of Loss:

On the exam, inflation is assumed to be uniform by size of loss. What would one expect to see if, for example, large losses were inflating at a higher rate than smaller losses? Then we would expect, for example, the 90th percentile to increase at a faster rate than the median.273
Exercise: In 2001 the losses follow a Pareto Distribution with parameters α = 3 and θ = 1000. In 2004 the losses follow a Pareto Distribution with parameters α = 2.5 and θ = 1100. What is the increase from 2001 to 2004 in the median (50th percentile)? Also, what is the increase from 2001 to 2004 in the 90th percentile?
[Solution: For the Pareto, at the 90th percentile: 0.9 = 1 - {θ/(θ+x)}^α. ⇒ x = θ{10^(1/α) - 1}.
In 2001 the 90th percentile is: 1000{10^(1/3) - 1} = 1154. In 2004 the 90th percentile is: 1100{10^(1/2.5) - 1} = 1663.
For the Pareto, the median is: θ{2^(1/α) - 1}. In 2001 the median is: 1000{2^(1/3) - 1} = 260. In 2004 the median is: 1100{2^(1/2.5) - 1} = 351.
The median increased by: (351/260) - 1 = 35.0%, while the 90th percentile increased by: (1663/1154) - 1 = 44.1%.]
In this case, the 90th percentile increased more than the median did. The shape parameter of the Pareto decreased, resulting in a heavier-tailed distribution in 2004 than in 2001. If the higher percentiles increase at a different rate than the lower percentiles, then inflation is not uniform by size of loss. When inflation is uniform by size of loss, all percentiles increase at the same rate.274
273 If the larger losses are inflating at a lower rate than the smaller losses then the situation is reversed and the higher percentiles will inflate more slowly than the lower percentiles. Which situation applies may be determined by graphing selected percentiles over time, with the size of loss on a log scale. In practical applications this analysis would be complicated by taking into account the impacts of any deductible and/or maximum covered loss.
274 In practical situations, the estimated rates of increase in different percentiles based on data will differ somewhat, even if the underlying inflation is uniform by size of loss.
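A small Python sketch (illustrative) that computes the Pareto percentiles used in the exercise above:

def pareto_percentile(alpha, theta, p):
    # Solve S(x) = {theta/(theta + x)}^alpha = 1 - p for x.
    return theta * ((1 - p) ** (-1 / alpha) - 1)

# 2001: alpha = 3, theta = 1000.  2004: alpha = 2.5, theta = 1100.
for p in (0.50, 0.90):
    x01 = pareto_percentile(3, 1000, p)
    x04 = pareto_percentile(2.5, 1100, p)
    print(p, round(x01), round(x04), round(x04 / x01 - 1, 3))
# The median rises about 35%, while the 90th percentile rises about 44%.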
Fixed Exchange Rates of Currency:275

Finally, it is useful to note that the mathematics of changes in currency is the same as that for inflation. Thus if loss sizes are expressed in dollars and you wish to convert to some other currency, one multiplies each loss size by the appropriate exchange rate. Assuming each loss is paid at (approximately) the same time, one can apply (approximately) the same exchange rate to each loss. This is mathematically identical to applying the same inflation factor under uniform inflation.
If the exchange rate is 80 yen per dollar, then the Loss Elimination Ratio at 80,000 yen is the same as that at $1000.
Exercise: The Limited Expected Value at $1000 is $600. The exchange rate is 80 yen per dollar. Determine the Limited Expected Value at 80,000 yen.
[Solution: 80,000 yen ⇔ $1000. Limited Expected Value at 80,000 yen is: (600)(80) = 48,000 yen.]
The Coefficient of Variation, Skewness, and Kurtosis, which describe the shape of the size of loss distribution, are unaffected by converting to yen.
Exercise: The size of loss distribution in dollars is Gamma with α = 3 and θ = 2000. The exchange rate is 0.80 euros per dollar. Determine the size of loss distribution in euros.
[Solution: Gamma with α = 3 and θ = (0.80)(2000) = 1600.
Comment: The mean in euros is: (3)(1600) = €4800. The mean in dollars is: (3)(2000) = $6000 ⇔ (0.8)(6000) = €4800.
0.80 euros per dollar ⇔ 1.25 dollars per euro. Going from euros to dollars would be mathematically equivalent to 25% inflation. Going from dollars to euros is mathematically equivalent to deflating back to the earlier year from the later year with 25% inflation. $6000/1.25 = €4800.]
275 See CAS3, 5/06, Q.26.
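In code, converting a parametric size of loss distribution between currencies is the same one-line operation as applying uniform inflation; a minimal Python sketch (the function name is mine):

def gamma_params_converted(alpha, theta, exchange_rate):
    # The scale parameter theta is multiplied by the exchange rate; alpha is unchanged.
    return alpha, theta * exchange_rate

print(gamma_params_converted(3, 2000, 0.80))  # (3, 1600); mean (3)(1600) = 4800 euros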
Problems:

36.1 (1 point) The size of losses in 1994 follow a Pareto Distribution, with parameters α = 3, θ = 5000. Assume that inflation uniformly increases the size of losses between 1994 and 1997 by 20%. What is the average size of loss in 1997?
A. 2500  B. 3000  C. 3500  D. 4000  E. 4500

36.2 (1 point) The size of losses in 2004 follow an Exponential Distribution: F(x) = 1 - e^(-x/θ), with θ = 200. Assume that inflation uniformly increases the size of losses between 2004 and 2009 by 3% per year. What is the variance of the loss distribution in 2009?
A. 48,000  B. 50,000  C. 52,000  D. 54,000  E. 56,000

36.3 (2 points) The size of losses in 1992 follows a Burr Distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, with parameters α = 2, θ = 19,307, γ = 0.7. Assume that inflation uniformly increases the size of losses between 1992 and 1996 by 30%. What is the probability of a loss being greater than 10,000 in 1996?
A. 39%  B. 41%  C. 43%  D. 45%  E. 47%

36.4 (1 point) The size of losses in 1994 follow a Gamma Distribution, with parameters α = 2, θ = 100. Assume that inflation uniformly increases the size of losses between 1994 and 1996 by 10%. What are the parameters of the loss distribution in 1996?
A. α = 2, θ = 100  B. α = 2, θ = 110  C. α = 2, θ = 90.9  D. α = 2, θ = 82.6  E. None of A, B, C, or D.

36.5 (2 points) The size of losses in 1995 follow a Pareto Distribution, with α = 1.5, θ = 15000. Assume that inflation uniformly increases the size of losses between 1995 and 1999 by 25%. In 1999, what is the average size of the non-zero payments excess of a deductible of 25,000?
A. 72,000  B. 76,000  C. 80,000  D. 84,000  E. 88,000

36.6 (2 points) The size of losses in 1992 follow the density function f(x) = 2.5x^(-2) for 2 < x < 10. Assume that inflation uniformly increases the size of losses between 1992 and 1996 by 20%. What is the probability of a loss being greater than 6 in 1996?
A. 23%  B. 25%  C. 27%  D. 29%  E. 31%
Use the following information for the next 4 questions:
A size of loss distribution has been fit to certain data in terms of dollars. The loss sizes have been converted to yen. Assume the exchange rate is 80 yen per dollar.

36.7 (1 point) In terms of dollars the sizes of loss are given by a Loglogistic, with parameters γ = 4, and θ = 100. Which of the following are the parameters of the distribution in terms of yen?
A. γ = 4, and θ = 100  B. γ = 320, and θ = 100  C. γ = 4, and θ = 8000
D. γ = 320, and θ = 8000  E. None of A, B, C, or D.

36.8 (1 point) In terms of dollars the sizes of loss are given by a LogNormal Distribution, with parameters µ = 10 and σ = 3. Which of the following are the parameters of the distribution in terms of yen?
A. µ = 10 and σ = 3  B. µ = 800 and σ = 3  C. µ = 10 and σ = 240
D. µ = 800 and σ = 240  E. None of A, B, C, or D.

36.9 (1 point) In terms of dollars the sizes of loss are given by a Weibull Distribution, with parameters θ = 625 and τ = 0.5. Which of the following are the parameters of the distribution in terms of yen?
A. θ = 625 and τ = 0.5  B. θ = 69.9 and τ = 0.5  C. θ = 5,590 and τ = 0.5
D. θ = 50,000 and τ = 0.5  E. None of A, B, C, or D.
36.10 (1 point) In terms of dollars the sizes of loss are given by a Paralogistic Distribution, with α = 4, θ = 100. Which of the following are the parameters of the distribution in terms of yen?
A. α = 4, and θ = 100  B. α = 320, and θ = 100  C. α = 4, and θ = 8000
D. α = 320, and θ = 8000  E. None of A, B, C, or D.
36.11 (1 point) The size of losses in 1994 follows a distribution F(x) = Γ[α; λ ln(x)], x > 1, with parameters α = 40, λ = 10. Assume that inflation uniformly increases the size of losses between 1994 and 1996 by 10%. What are the parameters of the loss distribution in 1996?
A. α = 40, λ = 10  B. α = 40, λ = 9.1  C. α = 40, λ = 11
D. α = 40, λ = 12.1  E. None of A, B, C, or D.

36.12 (1 point) X1, X2, ..., X50 are independent, identically distributed variables, each with an Exponential Distribution with mean 800. What is the distribution of X, their average?
36.13 (1 point) Assume that inflation uniformly increases the size of losses between 1994 and 1996 by 10%. Which of the following statements is true regarding the size of loss distribution?
1. If the skewness in 1994 is 10, then the skewness in 1996 is 13.31.
2. If the 70th percentile in 1994 is $10,000, then the 70th percentile in 1996 is $11,000.
3. If in 1994 the Loss Elimination Ratio for a deductible of $1000 is 10%, then in 1996 the Loss Elimination Ratio for a deductible of $1100 is 11%.
A. 1  B. 2  C. 3  D. 1, 2, 3  E. None of A, B, C, or D

36.14 (3 points) The size of losses in 1995 follow the density function: f(x) = 375x² e^(-10x) + 20x³ exp(-20x⁴). Assume that inflation uniformly increases the size of losses between 1995 and 1999 by 25%. Which of the following is the density in 1999?
A. 468.75x² e^(-12.5x) + 16x³ exp(-16x⁴)
B. 192x² e^(-8x) + 8.192x³ exp(-8.192x⁴)
C. 468.75x² e^(-12.5x) + 8.192x³ exp(-8.192x⁴)
D. 192x² e^(-8x) + 16x³ exp(-16x⁴)
E. None of the above.
36.15 (3 points) You are given the following:
• Losses follow a distribution with density function f(x) = exp[-0.5 {(ln(x) - 7)/3}²] / {3x √(2π)}, 0 < x < ∞.
• There is a deductible of 1000.
• 173 losses are expected to exceed the deductible each year.
Determine the expected number of losses that would exceed the deductible each year if all loss amounts increased by 40%, but the deductible remained at 1000.
A. Less than 175
B. At least 175, but less than 180
C. At least 180, but less than 185
D. At least 185, but less than 190
E. At least 190

36.16 (3 points) Losses in the year 2001 have a Pareto Distribution with parameters α = 3 and θ = 40. Losses are uniformly 6% higher in the year 2002 than in the year 2001. In both 2001 and 2002, an insurance policy has a deductible of 5 and a maximum covered loss of 25. What is the ratio of expected payments in 2002 over expected payments in the year 2001?
(A) 104%  (B) 106%  (C) 108%  (D) 110%  (E) 112%
36.17 (2 points) You are given the following:
• 1000 observed losses occurring in 1993 for a group of risks have been recorded and are grouped as follows:
Interval          Number of Losses
(0, 100]          341
(100, 500]        202
(500, 1000]       131
(1000, 5000]      151
(5000, 10000]     146
(10000, ∞)        29
• Inflation of 8% per year affects all losses uniformly from 1993 to 2002.
What is the expected proportion of losses for this group of risks that will be greater than 1000 in the year 2002?
A. 38%  B. 40%  C. 42%  D. 44%  E. 46%
36.18 (2 points) The probability density function of losses in 1996 is:
f(x) = µ exp[-(x - µ)²/(2βx)] / √(2βπx³), x > 0.
Between 1996 and 2001 there is a total of 20% inflation. What is the density function in 2001?
A. Of the same form, but with parameters 1.2µ and β, rather than µ and β.
B. Of the same form, but with parameters µ and 1.2β, rather than µ and β.
C. Of the same form, but with parameters 1.2µ and 1.2β, rather than µ and β.
D. Of the same form, but with parameters µ/1.2 and β, rather than µ and β.
E. Of the same form, but with parameters µ and β/1.2, rather than µ and β.

Use the following information for the next two questions:
• The losses in 1998 prior to any deductible follow a Distribution: F(x) = 1 - e^(-x/5000).
• Assume that losses increase uniformly by 40% between 1998 and 2007.
• In 1998, an insurer pays for losses excess of a 1000 deductible.

36.19 (2 points) If in 2007 this insurer pays for losses excess of a 1000 deductible, what is the increase between 1998 and 2007 in the dollars of losses that this insurer expects to pay?
A. 44%  B. 46%  C. 48%  D. 50%  E. 52%

36.20 (2 points) If in 2007 this insurer pays for losses excess of a 1400 deductible, what is the increase between 1998 and 2007 in the dollars of losses that this insurer expects to pay?
A. 38%  B. 40%  C. 42%  D. 44%  E. 46%
36.21 (3 points) You are given the following:
• In 1990, losses follow a LogNormal Distribution, with parameters µ = 3 and σ.
• Between 1990 and 1999 there is uniform inflation at an annual rate of 4%.
• In 1990, 5% of the losses exceed the mean of the losses in 1999.
Determine σ.
A. 0.960 or 2.960
B. 0.645 or 2.645
C. 0.546 or 3.374
D. 0.231 or 3.059
E. None of the above
36.22 (3 points) You are given the following:
• The losses in 1995 follow a Weibull Distribution with parameters θ = 1 and τ = 0.3. • A relevant Consumer Price Index (CPI) is 170.3 in 1995 and 206.8 in 2001. • Assume that losses increase uniformly by an amount based on the increase in the CPI. What is the increase between 1995 and 2001 in the expected number of losses exceeding a 1000 deductible? A. 45% B. 48% C. 51% D. 54% E. 57% 36.23 (3 points) You are given the following:
• The losses in 1994 follow a LogNormal Distribution, with parameters µ = 3 and σ = 4. • Assume that losses increase by 5% from 1994 to 1995, 3% from 1995 to 1996, 7% from 1996 to 1997, and 6% from 1997 to 1998.
• In both 1994 and 1998, an insurer sells policies with a $25,000 maximum covered loss. What is the increase due to inflation between 1994 and 1998 in the dollars of losses that the insurer expects to pay? A. 9% B. 10% C. 11% D. 12% E. 13% 36.24 (3 points) You are given the following:
•
The losses in 1994 follow a Distribution: F(x) = 1 - (100000/x)³ for x > $100,000.
•
Assume that inflation is a total of 20% from 1994 to 1999.
•
In each year, a reinsurer pays for the layer of loss from $500 thousand to $2 million.
What is the increase due to inflation between 1994 and 1999 in the dollars that the reinsurer expects to pay? A. 67% B. 69% C. 71% D. 73% E. 75%
36.25 (2 points) You are given the following:
•
The losses in 2001 follow an Inverse Gaussian Distribution, with parameters µ = 3 and θ =10.
•
There is uniform inflation from 2001 to 2009 at an annual rate of 3%.
What is the variance of the distribution of losses in 2009?
A. Less than 2
B. At least 2, but less than 3
C. At least 3, but less than 4
D. At least 4, but less than 5
E. At least 5

36.26 (1 point) In the year 2002 the size of loss distribution is a Pareto with α = 3 and θ = 5000. During the year 2002 what is the median of those losses of size greater than 10,000?
A. 13,900  B. 14,000  C. 14,100  D. 14,200  E. 14,300

36.27 (2 points) In the year 2002 the size of loss distribution is a Pareto with α = 3 and θ = 5000. You expect a total of 15% inflation between the years 2002 and 2006. During the year 2006 what is the median of those losses of size greater than 10,000?
A. 13,900  B. 14,000  C. 14,100  D. 14,200  E. 14,300

36.28 (1 point) The size of losses in 1992 follow the density function f(x) = 1000e^(-1000x). Assume that inflation uniformly increases the size of losses between 1992 and 1998 by 25%. Which of the following is the density in 1998?
A. 800e^(-800x)
B. 1250e^(-1250x)
C. 17841 x^0.5 e^(-1000x)
D. 1000e^(-1000x)
E. None of the above.

36.29 (2 points) You are given the following:
• For 2003 the amount of a single loss has the following distribution:
Amount      Probability
$1,000      1/6
$2,000      1/3
$5,000      1/3
$10,000     1/6
• An insurer pays all losses after applying a $2000 deductible to each loss.
• Inflation of 4% per year impacts all claims uniformly from 2003 to 2006.
Assuming no change in the deductible, what is the inflationary impact on losses paid by the insurer in 2006 as compared to the losses the insurer paid in 2003?
A. 9%  B. 12%  C. 15%  D. 18%  E. 21%
36.30 (2 points) You are given the following:
• The size of loss distribution in 2007 is LogNormal Distribution with µ = 5 and σ = 0.7.
• Assume that losses increase by 4% per year from 2007 to 2010.
What is the second moment of the size of loss distribution in 2010?
A. Less than 70,000
B. At least 70,000, but less than 75,000
C. At least 75,000, but less than 80,000
D. At least 80,000, but less than 85,000
E. At least 85,000

36.31 (3 points) In 2005 sizes of loss follow a distribution F(x), with survival function S(x) and density f(x). Between 2005 and 2008 there is a total of 10% inflation. In 2008 there is a deductible of 1000. Which of the following does not represent the expected payment per loss in 2008?
A. 1.1 E[X] - 1.1 E[X ∧ 909]
B. 1.1 ∫_909^∞ x f(x) dx - 1000 ∫_909^∞ f(x) dx
C. 1.1 ∫_909^∞ S(x) dx
D. 1.1 ∫_909^∞ (x - 1000) f(x) dx
E. 1.1 ∫_909^∞ x f(x) dx + 1.1 ∫_0^909 {x f(x) - S(x)} dx
36.32 (3 points) For Actuaries Professional Liability insurance, severity follows a Pareto Distribution with α = 2 and θ = 500,000. Excess of loss reinsurance covers the layer from R to $1 million. Annual unlimited ground up inflation is 10% per year. Determine R, less than $1 million, such that the annual loss trend for the reinsured layer is exactly equal to the overall inflation rate of 10%.
36.33 (3 points) In 2011, the claim severity distribution is exponential with mean 5000. In 2013, an insurance company will pay the amount of each claim in excess of a deductible of 1000. There is a total of 8% inflation between 2011 and 2013. In 2013, calculate the variance of the amount paid by the insurance company for one claim, including the possibility that the amount paid is 0.
(A) 24 million  (B) 26 million  (C) 28 million  (D) 30 million  (E) 32 million

36.34 (3 points) In 2005 sizes of loss follow a certain distribution, and you are given the following selected values of the distribution function and limited expected value:
x       F(x)     Limited Expected Value at x
3000    0.502    2172
3500    0.549    2409
4000    0.590    2624
4500    0.624    2820
5000    0.655    3000
5500    0.681    3166
6000    0.705    3319
6500    0.726    3462
7000    0.744    3594
Between 2005 and 2010 there is a total of 25% inflation. In both 2005 and 2010 there is a deductible of 5000. In 2010 the average payment per payment is 15% more than it was in 2005. Determine E[X] in 2005.
A. 5000  B. 5500  C. 6000  D. 6500  E. 7000
Use the following information for the next 2 questions:
• In 2010, losses follow a Pareto Distribution with α = 5 and θ = 40. • There is a total of 25% inflation between 2010 and 2015. • In 2015, there is a deductible of 10. 36.35 (2 points) In 2015, determine the variance of YP, the per-payment variable. A. 300
B. 325
C. 350
D. 375
E. 400
36.36 (3 points) In 2015, determine the variance of YL , the per-loss variable. A. 160
B. 180
C. 200
D. 220
E. 240
Use the following information for the next four questions:
• Losses in 2002 follow a LogNormal Distribution with parameters µ = 9.7 and σ = 0.8. • In 2007, the insured has a deductible of 10,000, maximum covered loss of 50,000, and a coinsurance factor of 90%. • Inflation is 3% per year from 2002 to 2007. 36.37 (3 points) In 2007, what is the average payment per loss? A. less than 12,100 B. at least 12,100 but less than 12,200 C. at least 12,200 but less than 12,300 D. at least 12,300 but less than 12,400 E. at least 12,400 36.38 (1 point) In 2007, what is the average payment per non-zero payment? A. less than 15,500 B. at least 15,500 but less than 15,600 C. at least 15,600 but less than 15,700 D. at least 15,700 but less than 15,800 E. at least 15,800 36.39 (3 points) In 2007, what is the standard deviation of YL , the per-loss variable? A. less than 12,100 B. at least 12,100 but less than 12,200 C. at least 12,200 but less than 12,300 D. at least 12,300 but less than 12,400 E. at least 12,400 36.40 (1 point) In 2007, what is the standard deviation of YP, the per-payment variable? A. less than 12,100 B. at least 12,100 but less than 12,200 C. at least 12,200 but less than 12,300 D. at least 12,300 but less than 12,400 E. at least 12,400
36.41 (3 points) In 2011, losses prior to the effect of a deductible follow a Pareto Distribution with α = 2 and θ = 250. There is deductible of 100 in both 2011 and 2016. The ratio of the expected aggregate payments in 2016 to 2011 is 1.26. Determine the total amount of inflation between 2011 and 2016. A. 19% B. 20% C. 21% D. 22% E. 23% 36.42 (4, 5/86, Q.61 & 4, 5/87, Q.59) (1 point) Let there be a 10% rate of inflation over the period of concern. Let X be the uninflated losses and Z be the inflated losses. Let Fx be the distribution function (d.f.) of X, and fx be the probability density function (p.d.f.) of X. Similarly, let Fz and fz be the d.f. and p.d.f. of Z. Then which of the following statements are true? 1. fz(Z) = fx(Z / 1.1) 2. If Fx is a Pareto, then Fz is also a Pareto. 3. If Fx is a LogNormal, then Fz is also a LogNormal. A. 2
B. 3
C. 1, 2
D. 1, 3
E. 2, 3
36.43 (4, 5/89, Q.58) (2 points) The random variable X with distribution function Fx(x) is distributed according to the Burr distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, with parameters α > 0, θ > 0, and γ > 0. If Z = (1 + r)X where r is an inflation rate over some period of concern, find the parameters for the distribution function Fz(z) of the random variable z.
A. α, θ, γ
B. α(1+r), θ, γ
C. α, θ(1+r), γ
D. α, θ, γ(1+r)γ
E. None of the above
36.44 (4, 5/90, Q.37) (2 points) Liability claim severity follows a Pareto distribution with a mean of $25,000 and parameter α = 3. If inflation increases all claims by 20%, the probability of a claim exceeding $100,000 increases by: A. less than 0.02 B. at least 0.02 but less than 0.03 C. at least 0.03 but less than 0.04 D. at least 0.04 but less than 0.05 E. at least 0.05
36.45 (4, 5/91, Q.27) (3 points) The Pareto distribution with parameters θ = 12,500 and α = 2 appears to be a good fit to 1985 policy year liability claims. Assume that inflation has been a constant 10% per year. What is the estimated claim severity for a policy issued in 1992 with a $200,000 limit of liability?
A. Less than 22,000
B. At least 22,000 but less than 23,000
C. At least 23,000 but less than 24,000
D. At least 24,000 but less than 25,000
E. At least 25,000

36.46 (4, 5/91, Q.44) (2 points) Inflation often requires one to modify the parameters of a distribution fitted to historical data. If inflation has been at the same rate for all sizes of loss, which of the sets of parameters shown in Column C would be correct? The form of the distributions is as given in the Appendix A of Loss Models.
(A) Distribution Family    (B) Distribution Function                                           (C) Parameters of z = (1+r)(x)
1. Inverse Gaussian        Φ[(x/µ - 1)√(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1)√(θ/x)]                   µ, θ(1+r)
2. Generalized Pareto      β[τ, α; x/(θ+x)]                                                     α, θ/(1+r), τ
3. Weibull                 1 - exp[-(x/θ)^τ]                                                    θ(1+r), τ
A. 1  B. 2  C. 3  D. 1, 2, 3  E. None of the above

36.47 (4B, 5/92, Q.7) (2 points) The random variable X for claim amounts with distribution function Fx(x) is distributed according to the Erlang distribution with parameters b and c. The density function for X is as follows: f(x) = (x/b)^(c-1) e^(-x/b) / {b (c-1)!}; x > 0, b > 0, c > 1. Inflation of 100r% acts uniformly over a one year period. Determine the distribution function Fz(Z) of the random variable Z = (1+r)X.
A. Erlang with parameters b and c(1+r)
B. Erlang with parameters b(1+r) and c
C. Erlang with parameters b/(1+r) and c
D. Erlang with parameters b/(1+r) and c/(1+r)
E. No longer an Erlang distribution
36.48 (4B, 11/92, Q.20) (3 points) Claim severity follows a Burr distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, with parameters α = 3, γ = 0.5 and θ. The mean is 10,000. If inflation increases all claims uniformly by 44%, determine the probability of a claim exceeding $40,000 after inflation.
Hint: The nth moment of a Burr Distribution is: θ^n Γ(1 + n/γ) Γ(α − n/γ) / Γ(α), αγ > n.
A. Less than 0.01
B. At least 0.01 but less than 0.03
C. At least 0.03 but less than 0.05
D. At least 0.05 but less than 0.07
E. At least 0.07

36.49 (4B, 5/93, Q.11) (1 point) You are given the following:
• The underlying distribution for 1992 losses is given by a lognormal distribution with parameters µ = 17.953 and σ = 1.6028.
• Inflation of 10% impacts all claims uniformly the next year.
What is the underlying loss distribution after one year of inflation?
A. lognormal with µ´ = 19.748 and σ´ = 1.6028
B. lognormal with µ´ = 18.048 and σ´ = 1.6028
C. lognormal with µ´ = 17.953 and σ´ = 1.7631
D. lognormal with µ´ = 17.953 and σ´ = 1.4571
E. no longer a lognormal distribution

36.50 (4B, 5/93, Q.12) (3 points) You are given the following:
• The underlying distribution for 1992 losses is given by f(x) = e^(-x), x > 0, where losses are expressed in millions of dollars.
• Inflation of 10% impacts all claims uniformly from 1992 to 1993.
• Under a basic limits policy, individual losses are capped at $1.0 (million).
What is the inflation rate from 1992 to 1993 on the capped losses?
A. less than 2%
B. at least 2% but less than 3%
C. at least 3% but less than 4%
D. at least 4% but less than 5%
E. at least 5%
36.51 (4B, 5/93, Q.28) (3 points) You are given the following: • The underlying loss distribution function for a certain line of business in 1991 is: F(x) = 1 - x-5, x > 1. • From 1991 to 1992, 10% inflation impacts all claims uniformly. Determine the 1992 Loss Elimination Ratio for a deductible of 1.2. A. Less than 0.850 B. At least 0.850 but less than 0.870 C. At least 0.870 but less than 0.890 D. At least 0.890 but less than 0.910 E. At least 0.910 36.52 (4B, 11/93, Q.5) (3 points) You are given the following: • The underlying distribution for 1993 losses is given by f(x) = e-x, x > 0, where losses are expressed in millions of dollars. • Inflation of 5% impacts all claims uniformly from 1993 to 1994. • Under a basic limits policy, individual losses are capped at $1.0 million in each year. What is the inflation rate from 1993 to 1994 on the capped losses? A. Less than 1.5% B. At least 1.5%, but less than 2.5% C. At least 2.5%, but less than 3.5% D. At least 3.5%, but less than 4.5% E. At least 4.5% 36.53 (4B, 11/93, Q.15) (3 points) You are given the following: • X is the random variable for claim severity with probability distribution function F(x). • During the next year, uniform inflation of r% impacts all claims. Which of the following are true of the random variable Z = X(1+r), the claim severity one year later? 1. The coefficient of variation for Z equals (1+r) times the coefficient of variation for X. 2. For all values of d > 0, the mean excess loss of Z at d(1+r) equals (1+r) times the mean excess loss of X at d. 3. For all values of d > 0, the limited expected value of Z at d equals (1+r) times the limited expected value of X at d. A. 2 B. 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C or D
36.54 (4B, 11/93, Q.27) (3 points) You are given the following: • Losses for 1991 are uniformly distributed on [0, 10,000]. • Inflation of 5% impacts all losses uniformly from 1991 to 1992 and from 1992 to 1993 (5% each year). Determine the 1993 Loss Elimination Ratio for a deductible of $500. A. Less than 0.085 B. At least 0.085, but less than 0.090 C. At least 0.090, but less than 0.095 D. At least 0.095, but less than 0.100 E. At least 0.100 36.55 (4B, 5/94, Q.16) (1 point) You are given the following: • Losses in 1993 follow the density function f(x) = 3x-4, x ≥ 1, where x = losses in millions of dollars. • Inflation of 10% impacts all claims uniformly from 1993 to 1994. Determine the probability that losses in 1994 exceed $2.2 million. A. Less than 0.05 B. At least 0.05, but less than 0.10 C. At least 0.10, but less than 0.15 D. At least 0.15, but less than 0.20 E. At least 0.20 36.56 (4B, 5/94, Q.21) (2 points) You are given the following: • For 1993 the amount of a single claim has the following distribution: Amount Probability $1,000 1/6 $2,000 1/6 $3,000 1/6 $4,000 1/6 $5,000 1/6 $6,000 1/6 • An insurer pays all losses AFTER applying a $1,500 deductible to each loss. • Inflation of 5% impacts all claims uniformly from 1993 to 1994. Assuming no change in the deductible, what is the inflationary impact on losses paid by the insurer in 1994 as compared to the losses the insurer paid in 1993? A. Less than 5.5% B. At least 5.5%, but less than 6.5% C. At least 6.5%, but less than 7.5% D. At least 7.5%, but less than 8.5% E. At least 8.5%
36.57 (4B, 5/94, Q.24) (3 points) You are given the following:
• X is a random variable for 1993 losses, having the density function f(x) = 0.1e^(-0.1x), x > 0.
• Inflation of 10% impacts all losses uniformly from 1993 to 1994.
• For 1994, a deductible, d, is applied to all losses.
• P is a random variable representing payments of losses truncated and shifted by the deductible amount.
Determine the value of the cumulative distribution function at p = 5, FP(5), in 1994.
A. 1 - e^(-0.1(5+d)/1.1)
B. {e^(-0.1(5/1.1)) - e^(-0.1(5+d)/1.1)} / {1 - e^(-0.1(5/1.1))}
C. 0
D. At least 0.25, but less than 0.35
E. At least 0.35, but less than 0.45

36.58 (4B, 11/94, Q.8) (3 points) You are given the following: In 1993, an insurance companyʼs underlying loss distribution for an individual claim amount is a lognormal distribution with parameters µ = 10.0 and σ² = 5.0. From 1993 to 1994, an inflation rate of 10% impacts all claims uniformly. In 1994, the insurance company purchases excess-of-loss reinsurance that caps the insurerʼs loss at $2,000,000 for any individual claim. Determine the insurerʼs 1994 expected net claim amount for a single claim after application of the $2,000,000 reinsurance cap.
A. Less than $150,000
B. At least $150,000, but less than $175,000
C. At least $175,000, but less than $200,000
D. At least $200,000, but less than $225,000
E. At least $225,000

36.59 (4B, 11/94, Q.28) (2 points) You are given the following: In 1993, the claim amounts for a certain line of business were normally distributed with mean µ = 1000 and variance σ² = 10,000: f(x) = exp[-(x - µ)²/(2σ²)] / (σ√(2π)).
Inflation of 5% impacted all claims uniformly from 1993 to 1994. What is the distribution for claim amounts in 1994?
A. No longer a normal distribution
B. Normal with µ = 1000.0 and σ = 102.5
C. Normal with µ = 1000.0 and σ = 105.0
D. Normal with µ = 1050.0 and σ = 102.5
E. Normal with µ = 1050.0 and σ = 105.0
36.60 (Course 160 Sample Exam #3, 1994, Q.2) (1.9 points) You are given: (i) The random variable X has an exponential distribution. (ii) px = 0.95, for all x. (iii) Y = 2X. (iv) fY(Y) is the probability density function of the random variable Y. Calculate fY(1). (A) 0.000
(B) 0.025
(C) 0.050
(D) 0.075
(E) 0.100
36.61 (4B, 5/95, Q.6) (3 points) You are given the following: • For 1994, loss sizes follow a uniform distribution on [0, 2500]. • In 1994, the insurer pays 100% of all losses. • Inflation of 3.0% impacts all losses uniformly from 1994 to 1995. • In 1995, a deductible of $100 is applied to all losses. Determine the Loss Elimination Ratio (L.E.R.) of the $100 deductible on 1995 losses. A. Less than 7.3% B. At least 7.3%, but less than 7.5% C. At least 7.5%, but less than 7.7% D. At least 7.7%, but less than 7.9% E. At least 7.9% 36.62 (4B, 5/95, Q.23) (2 points) You are given the following: • Losses follow a Pareto distribution, with parameters θ = 1000 and α = 2. • 10 losses are expected each year. • The number of losses and the individual loss amounts are independent. • For each loss that occurs, the insurer's payment is equal to the entire amount of the loss if the loss is greater than 100. The insurer makes no payment if the loss is less than or equal to 100. Determine the insurer's expected number of annual payments if all loss amounts increased uniformly by 10%. A. Less than 7.9 B. At least 7.9, but less than 8.1 C. At least 8.1, but less than 8.3 D. At least 8.3, but less than 8.5 E. At least 8.5
36.63 (4B, 11/95, Q.6) (2 points) You are given the following:
• In 1994, losses follow a Pareto distribution, with parameters θ = 500 and α = 1.5.
• Inflation of 5% impacts all losses uniformly from 1994 to 1995.
What is the median of the portion of the 1995 loss distribution above 200?
A. Less than 600
B. At least 600, but less than 620
C. At least 620, but less than 640
D. At least 640, but less than 660
E. At least 660

36.64 (4B, 5/96, Q.10 & Course 3 Sample Exam, Q.18) (2 points) You are given the following:
• Losses follow a lognormal distribution, with parameters µ = 7 and σ = 2.
• There is a deductible of 2,000.
• 10 losses are expected each year.
• The number of losses and the individual loss amounts are independent.
Determine the expected number of annual losses that exceed the deductible if all loss amounts increased uniformly by 20%, but the deductible remained the same.
A. Less than 4.0
B. At least 4.0, but less than 5.0
C. At least 5.0, but less than 6.0
D. At least 6.0, but less than 7.0
E. At least 7.0

36.65 (4B, 11/96, Q.1) (1 point) Using the information in the following table, determine the total amount of losses from 1994 and 1995 in 1996 dollars.
Year    Actual Losses    Cost Index
1994    10,000,000       0.8
1995    9,000,000        0.9
1996    ---              1.0
A. Less than 16,000,000
B. At least 16,000,000, but less than 18,000,000
C. At least 18,000,000, but less than 20,000,000
D. At least 20,000,000, but less than 22,000,000
E. At least 22,000,000
36.66 (4B, 11/96, Q.14) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ = k and α = 2, where k is a constant.
• There is a deductible of 2k.
Over a period of time, inflation has uniformly affected all losses, causing them to double, but the deductible remains the same. What is the new loss elimination ratio (LER)?
A. 1/6  B. 1/3  C. 2/5  D. 1/2  E. 2/3

36.67 (4B, 11/96, Q.25) (1 point) The random variable X has a lognormal distribution, with parameters µ and σ. If the random variable Y is equal to 1.10X what is the distribution of Y?
A. Lognormal with parameters 1.10µ and σ
B. Lognormal with parameters µ and 1.10σ
C. Lognormal with parameters µ + ln1.10 and σ
D. Lognormal with parameters µ and σ + ln1.10
E. Not lognormal

36.68 (4B, 5/97, Q.17) (2 points) You are given the following:
• The random variable X has a Weibull distribution, with parameters θ = 625 and τ = 0.5.
• Z is defined to be 0.25X.
Determine the distribution of Z.
A. Weibull with parameters θ = 10,000 and τ = 0.5
B. Weibull with parameters θ = 2500 and τ = 0.5
C. Weibull with parameters θ = 156.25 and τ = 0.5
D. Weibull with parameters θ = 39.06 and τ = 0.5
E. Not Weibull

36.69 (4B, 5/97, Q.20) (2 points) You are given the following:
• Losses follow a distribution with density function f(x) = (1/1000) e^(-x/1000), 0 < x < ∞.
• There is a deductible of 500.
• 10 losses are expected to exceed the deductible each year.
Determine the expected number of losses that would exceed the deductible each year if all loss amounts doubled, but the deductible remained at 500.
A. Less than 10
B. At least 10, but less than 12
C. At least 12, but less than 14
D. At least 14, but less than 16
E. At least 16
36.70 (4B, 11/97, Q.4) (1 point) You are given the following:
• The random variable X has a distribution that is a mixture of a Burr distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, with parameters θ = 1,000, α = 1 and γ = 2, and a Pareto distribution, with parameters θ = 1,000 and α = 1.
• Each of the two distributions in the mixture has equal weight.
Y is defined to be 1.10 X, which is also a mixture of a Burr distribution and a Pareto distribution. Determine θ for the Burr distribution in this mixture.
A. Less than 32
B. At least 32, but less than 33
C. At least 33, but less than 34
D. At least 34, but less than 35
E. At least 35

36.71 (4B, 11/97, Q.26) (3 points) You are given the following:
• In 1996, losses follow a lognormal distribution, with parameters µ and σ.
• In 1997, losses follow a lognormal distribution with parameters µ + ln k and σ, where k is greater than 1.
• In 1996, 100p% of the losses exceed the mean of the losses in 1997.
Determine σ. Note: zp is the 100pth percentile of a normal distribution with mean 0 and variance 1.
A. 2 ln k
B. -zp ± √(zp² - 2 ln k)
C. zp ± √(zp² - 2 ln k)
D. -zp ± √(zp² - 2 ln k)
E. zp ± √(zp² - 2 ln k)
36.72 (4B, 5/98, Q.25) (2 points) You are given the following:
• 100 observed claims occurring in 1995 for a group of risks have been recorded and are grouped as follows:
Interval        Number of Claims
(0, 250)        36
[250, 300)      6
[300, 350)      3
[350, 400)      5
[400, 450)      5
[450, 500)      0
[500, 600)      5
[600, 700)      5
[700, 800)      6
[800, 900)      1
[900, 1000)     3
[1000, ∞)       25
• Inflation of 10% per year affects all claims uniformly from 1995 to 1998.
Using the above information, determine a range for the expected proportion of claims for this group of risks that will be greater than 500 in 1998.
A. Between 35% and 40%
B. Between 40% and 45%
C. Between 45% and 50%
D. Between 50% and 55%
E. Between 55% and 60%

36.73 (4B, 11/98, Q.13) (2 points) You are given the following:
• Losses follow a distribution (prior to the application of any deductible) with cumulative distribution function and limited expected values as follows:
Loss Size (x)    F(x)    E[X ∧ x]
10,000           0.60    6,000
15,000           0.70    7,700
22,500           0.80    9,500
∞                1.00    20,000
• There is a deductible of 15,000 per loss and no maximum covered loss.
• The insurer makes a nonzero payment p.
After several years of inflation, all losses have increased in size by 50%, but the deductible has remained the same. Determine the expected value of p.
A. Less than 15,000
B. At least 15,000, but less than 30,000
C. At least 30,000, but less than 45,000
D. At least 45,000, but less than 60,000
E. At least 60,000
36.74 (4B, 5/99, Q.17) (2 points) You are given are following: •
In 1998, claim sizes follow a Pareto distribution, with parameters θ (unknown) and α = 2.
Inflation of 6% affects all claims uniformly from 1998 to 1999. r is the ratio of the proportion of claims that exceed d in 1999 to the proportion of claims that exceed d in 1998. Determine the limit of r as d goes to infinity. A. Less than 1.05 B. At least 1.05, but less than 1.10 C. At least 1.10, but less than 1.15 D. At least 1.15, but less than 1.20 E. At least 1.20 • •
36.75 (4B, 5/99, Q.21) (2 points) Losses follow a lognormal distribution,with parameters µ = 6.9078 and σ = 1.5174. Determine the percentage increase in the number of losses that exceed 1,000 that would result if all losses increased in value by 10%. A. Less than 2% B. At least 2%, but less than 4% C. At least 4%, but less than 6% D. At least 6%, but less than 8% E. At least 8% 36.76 (CAS6, 5/99, Q.39) (2 points) Use the information shown below to determine the one-year severity trend for the loss amounts in the following three layers of loss: $0 - $50 $50 - $100 $100 - $200
• Losses occur in multiples of $40, with equal probability, up to $200, i.e., if a loss occurs, it has an equal chance of being $40, $80, $120, $160, or $200.
• For the next year, the severity trend will uniformly increase all losses by 10%. 36.77 (4B, 11/99, Q.26) (1 point) You are given the following: • The random variable X follows a Pareto distribution, as per Loss Models, with parameters θ = 100 and α = 2 . • The mean excess loss function, eX(k), is defined to be E[X - k I X ≥ k]. • Y = 1.10 X. Determine the range of the function eY(k)/eX(k) over its domain of [0, ∞ ). A. (1, 1.10]
B. (1, ∞)
C. 1.10
D. [1.10, ∞)
E. ∞
36.78 (CAS9, 11/99, Q.38) (1.5 points) Assume a ground-up claim frequency of 0.05. Based on the following claim size distribution, answer the following questions. Show all work. Claim size (d) Fx(d) E(X ∧ d) $909 0.075 $870 $1,000 0.090 $945 $1,100 0.100 $1,040 Unlimited 1.000 $10,000 a. (1 point) For a $1,000 franchise deductible, what is the frequency of payments and the average payment per payment? b. (0.5 point) Assuming a constant annual inflation rate of 10% across all loss amounts, what is the pure premium one year later if there is a $1,000 franchise deductible? 36.79 (Course 151 Sample Exam #1, Q.7) (1.7 points) For a certain insurance, individual losses in 1994 were uniformly distributed over (0,1000). A deductible of 100 is applied to each loss. In 1995, individual losses have increased 5%, and are still uniformly distributed. A deductible of 100 is still applied to each loss. Determine the percentage increase in the standard deviation of amount paid. (A) 5.00% (B) 5.25% (C) 5.50% (D) 5.75% (E) 6.00% 36.80 (Course 1 Sample Exam, Q.17) (1.9 points) An actuary is reviewing a study she performed on the size of claims made ten years ago under homeowners insurance policies. In her study, she concluded that the size of claims followed an exponential distribution and that the probability that a claim would be less than $1,000 was 0.250. The actuary feels that the conclusions she reached in her study are still valid today with one exception: every claim made today would be twice the size of a similar claim made ten years ago as a result of inflation. Calculate the probability that the size of a claim made today is less than $1,000. A. 0.063 B. 0.125 C. 0.134 D. 0.163 E. 0.250 36.81 (3, 5/00, Q.30) (2.5 points) X is a random variable for a loss. Losses in the year 2000 have a distribution such that: E[X ∧ d] = -0.025d2 + 1.475d - 2.25, d = 10, 11, 12,..., 26 Losses are uniformly 10% higher in 2001. An insurance policy reimburses 100% of losses subject to a deductible of 11 up to a maximum reimbursement of 11. Calculate the ratio of expected reimbursements in 2001 over expected reimbursements in the year 2000. (A) 110.0% (B) 110.5% (C) 111.0% (D) 111.5% (E) 112.0%
Use the following information for the next two questions:
An insurer has excess-of-loss reinsurance on auto insurance. You are given:
(i) Total expected losses in the year 2001 are 10,000,000.
(ii) In the year 2001 individual losses have a Pareto distribution with F(x) = 1 - {2000/(2000 + x)}², x > 0.
(iii) Reinsurance will pay the excess of each loss over 3000.
(iv) Each year, the reinsurer is paid a ceded premium, Cyear, equal to 110% of the expected losses covered by the reinsurance.
(v) Individual losses increase 5% each year due to inflation.
(vi) The frequency distribution does not change.

36.82 (3, 11/00, Q.41 & 2009 Sample Q.119) (1.25 points) Calculate C2001.
(A) 2,200,000  (B) 3,300,000  (C) 4,400,000  (D) 5,500,000  (E) 6,600,000

36.83 (3, 11/00, Q.42 & 2009 Sample Q.120) (1.25 points) Calculate C2002 / C2001.
(A) 1.04  (B) 1.05  (C) 1.06  (D) 1.07  (E) 1.08

36.84 (3, 11/01, Q.6 & 2009 Sample Q.97) (2.5 points) A group dental policy has a negative binomial claim count distribution with mean 300 and variance 800. Ground-up severity is given by the following table:
Severity    Probability
40          0.25
80          0.25
120         0.25
200         0.25
You expect severity to increase 50% with no change in frequency. You decide to impose a per claim deductible of 100. Calculate the expected total claim payment after these changes.
(A) Less than 18,000
(B) At least 18,000, but less than 20,000
(C) At least 20,000, but less than 22,000
(D) At least 22,000, but less than 24,000
(E) At least 24,000
36.85 (CAS3, 5/04, Q.17) (2.5 points) Payfast Auto insures sub-standard drivers.
• Each driver has the same non-zero probability of having an accident. • Each accident does damage that is exponentially distributed with θ = 200. • There is a $100 per accident deductible and insureds only "report" claims that are larger than the deductible.
• Next year each individual accident will cost 20% more. • Next year Payfast will insure 10% more drivers. What will be the percentage increase in the number of “reported” claims next year? A. Less than 15% B. At least 15%, but less than 20% C. At least 20%, but less than 25% D. At least 25%, but less than 30% E. At least 30% 36.86 (CAS3, 5/04, Q.29) (2.5 points) Claim sizes this year are described by a 2-parameter Pareto distribution with parameters θ = 1,500 and α = 4. What is the expected claim size per loss next year after 20% inflation and the introduction of a $100 deductible? A. Less than $490 B. At least $490, but less than $500 C. At least $500, but less than $510 D. At least $510, but less than $520 E. At least $520 36.87 (CAS3, 5/04, Q.34) (2.5 points) Claim severities are modeled using a continuous distribution and inflation impacts claims uniformly at an annual rate of i. Which of the following are true statements regarding the distribution of claim severities after the effect of inflation? 1. An Exponential distribution will have scale parameter (1+i)θ. 2. A 2-parameter Pareto distribution will have scale parameters (1+i)α and (1+i)θ. 3. A Paralogistic distribution will have scale parameter θ /(1+i). A. 1 only
B. 3 only
C. 1 and 2 only
D. 2 and 3 only
E. 1, 2, and 3
36.88 (CAS3, 11/04, Q.33) (2.5 points) Losses for a line of insurance follow a Pareto distribution with θ = 2,000 and α = 2. An insurer sells policies that pay 100% of each loss up to $5,000. The next year the insurer changes the policy terms so that it will pay 80% of each loss after applying a $100 deductible. The $5,000 limit continues to apply to the original loss amount. That is, the insurer will pay 80% of the loss amount between $100 and $5,000. Inflation will be 4%. Calculate the decrease in the insurer's expected payment per loss.
A. Less than 23%
B. At least 23%, but less than 24%
C. At least 24%, but less than 25%
D. At least 25%, but less than 26%
E. At least 26%

36.89 (SOA3, 11/04, Q.18 & 2009 Sample Q.127) (2.5 points) Losses in 2003 follow a two-parameter Pareto distribution with α = 2 and θ = 5. Losses in 2004 are uniformly 20% higher than in 2003. An insurance covers each loss subject to an ordinary deductible of 10. Calculate the Loss Elimination Ratio in 2004.
(A) 5/9  (B) 5/8  (C) 2/3  (D) 3/4  (E) 4/5

36.90 (CAS3, 11/05, Q.21) (2.5 points) Losses during the current year follow a Pareto distribution with α = 2 and θ = 400,000. Annual inflation is 10%. Calculate the ratio of the expected proportion of claims that will exceed $750,000 next year to the proportion of claims that exceed $750,000 this year.
A. Less than 1.105
B. At least 1.105, but less than 1.115
C. At least 1.115, but less than 1.125
D. At least 1.125, but less than 1.135
E. At least 1.135

36.91 (CAS3, 11/05, Q.33) (2.5 points) In year 2005, claim amounts have the following Pareto distribution: F(x) = 1 - {800/(800 + x)}³. The annual inflation rate is 8%. A franchise deductible of 300 will be implemented in 2006. Calculate the loss elimination ratio of the franchise deductible.
A. Less than 0.15
B. At least 0.15, but less than 0.20
C. At least 0.20, but less than 0.25
D. At least 0.25, but less than 0.30
E. At least 0.30
36.92 (SOA M, 11/05, Q.28 & 2009 Sample Q.209) (2.5 points) In 2005 a risk has a two-parameter Pareto distribution with α = 2 and θ = 3000. In 2006 losses inflate by 20%. An insurance on the risk has a deductible of 600 in each year. Pi, the premium in year i, equals 1.2 times the expected claims. The risk is reinsured with a deductible that stays the same in each year. Ri, the reinsurance premium in year i, equals 1.1 times the expected reinsured claims. R2005/P2005 = 0.55.
Calculate R2006/P2006.
(A) 0.46  (B) 0.52  (C) 0.55  (D) 0.58  (E) 0.66

36.93 (CAS3, 5/06, Q.26) (2.5 points) The aggregate losses of Eiffel Auto Insurance are denoted in euro currency and follow a Lognormal distribution with µ = 8 and σ = 2. Given that 1 euro = 1.3 dollars, which set of lognormal parameters describes the distribution of Eiffelʼs losses in dollars?
A. µ = 6.15, σ = 2.26
B. µ = 7.74, σ = 2.00
C. µ = 8.00, σ = 2.60
D. µ = 8.26, σ = 2.00
E. µ = 10.40, σ = 2.60
36.94 (CAS3, 5/06, Q.39) (2.5 points) Prior to the application of any deductible, aggregate claim counts during 2005 followed a Poisson distribution with λ = 14. Similarly, individual claim sizes followed a Pareto distribution with α = 3 and θ = 1000. Annual severity inflation is 10%. If all policies have a $250 ordinary deductible in 2005 and 2006, calculate the expected increase in the number of claims that will exceed the deductible in 2006. A. Fewer than 0.41 claims B. At least 0.41, but fewer than 0.45 C. At least 0.45, but fewer than 0.49 D. At least 0.49, but fewer than 0.53 E. At least 0.53 36.95 (CAS3, 11/06, Q.30) (2.5 points) An insurance company offers two policies. Policy R has no deductible and no limit. Policy S has a deductible of $500 and a limit of $3,000; that is, the company will pay the loss amount between $500 and $3,000. In year t, severity follows a Pareto distribution with parameters α = 4 and θ = 3,000. The annual inflation rate is 6%. Calculate the difference in expected cost per loss between policies R and S in year t+4. A. Less than $500 B. At least $500, but less than $550 C. At least $550, but less than $600 D. At least $600, but less than $650 E. At least $650
36.96 (CAS5, 5/07, Q.46) (2.0 points) You are given the following information:
Claim    Ground-up Uncensored Loss Amount
A        $35,000
B        125,000
C        180,000
D        206,000
E        97,000
If all claims experience an annual ground-up severity trend of 8.0%, calculate the effective trend in the layer from $100,000 to $200,000 ($100,000 in excess of $100,000). Show all work.
Solutions to Problems:

36.1. B. For the Pareto, the new theta is the old theta multiplied by the inflation factor of 1.2. Thus the new theta = (1.2)(5000) = 6000. Alpha is unaffected. The average size of claim for the Pareto is: θ/(α - 1). In 1997, this is: 6000/(3 - 1) = 3000.
Alternately, the mean in 1994 is 5000/(3 - 1) = 2500. The mean increases by the inflation factor of 1.2; therefore the mean in 1997 is (1.2)(2500) = 3000.

36.2. D. The inflation factor is (1.03)^5 = 1.1593. For the Exponential, the new θ is the old θ multiplied by the inflation factor. Thus the new θ is: (200)(1.1593) = 231.86. The variance for the Exponential Distribution is θ², which in 2009 is: 231.86² = 53,758.
Alternately, the variance in 2004 is: 200² = 40,000. The variance increases by the square of the inflation factor; therefore the variance in 2009 is: (1.1593²)(40,000) = 53,759.

36.3. C. For a Burr, the new theta = θ(1+r) = 19,307(1.3) = 25,099. (Alpha and gamma are unaffected.) Thus in 1996, 1 - F(10,000) = {1/(1 + (10000/25099)^0.7)}² = 43.0%.
Alternately, $10,000 in 1996 corresponds to $10,000/1.3 = $7692 in 1992. Then in 1992, 1 - F(7692) = {1/(1 + (7692/19307)^0.7)}² = 43.0%.

36.4. B. For the Gamma Distribution, θ is multiplied by the inflation factor of 1.1, while α is unaffected. Thus the parameters in 1996 are: α = 2, θ = 110.

36.5. E. For the Pareto, the new theta is the old theta multiplied by the inflation factor of 1.25. Thus the new theta = (1.25)(15000) = 18,750. Alpha is unaffected. The average size of claim for data truncated and shifted at 25,000 in 1999 is the mean excess loss, e(25000), in 1999. For the Pareto e(x) = (x + θ)/(α - 1). In 1999, e(25000) = (25000 + 18750)/(1.5 - 1) = 87,500.
Alternately, $25,000 in 1999 corresponds to $25,000/1.25 = $20,000 in 1995. The average size of claim for data truncated and shifted at 20,000 in 1995 is the mean excess loss, e(20000), in 1995. For the Pareto e(x) = (x + θ)/(α - 1). In 1995, e(20,000) = (20000 + 15000)/(1.5 - 1) = 70,000. However, we need to inflate this back up to get the average size in 1999 dollars: (70,000)(1.25) = 87,500.
Comment: The alternate solution uses the fact that the effect of a deductible keeps up with inflation provided the limit keeps up with inflation, or equivalently if the limit keeps up with inflation, then the mean excess loss increases by the inflation rate.
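A two-line Python check of solution 36.5 (illustrative; the helper name is mine), showing that deflating the deductible and then reinflating the answer matches working directly in 1999:

def pareto_mean_excess(alpha, theta, d):
    # e(d) = (d + theta)/(alpha - 1) for a Pareto with alpha > 1.
    return (d + theta) / (alpha - 1)

print(pareto_mean_excess(1.5, 1.25 * 15000, 25000))         # 87,500, working in 1999
print(1.25 * pareto_mean_excess(1.5, 15000, 25000 / 1.25))  # 87,500, deflate then reinflate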
36.6. B. Integrating the density function, FX(x) = (2.5)(1/2 - 1/x), 2
36.10. C. For the Paralogistic, θ is multiplied by 80 (the inflation factor), while the other parameter α is unaffected. 36.11. E. In 1996 x becomes 1.1x. z = 1.1x. ln(z) = ln(1.1x) = ln(x) + ln(1.1). Thus in 1996 the distribution function is F(z) = Γ[α; λln(z)] = Γ[α; λ ln(x) + λln(1.1)]. This is not of the same form, so the answer is none of the above. Comment: This is called the LogGamma Distribution. If ln(x) follows a Gamma Distribution, then x follows a LogGamma Distribution. Under uniform inflation, ln(x) becomes ln(x) + ln(1+r). If you add a constant amount to a Gamma distribution, then you no longer have a Gamma distribution. Which is why under uniform inflation a LogGamma distribution is not reproduced. 36.12. The sum of 50 independent, identically distributed Exponentials each with θ = 800 is Gamma with α = 50 and θ = 800. The average is 1/50 times the sum, and has a Gamma Distribution with α = 50 and θ = 800/50 = 16.
36.13. B. 1. F. The skewness is unaffected by uniform inflation. (The numerator of the skewness is the third central moment which would be multiplied by 1.13 .) 2. T. Since each claim size is increased by 10%, the place where 70% of the claims are less and 30% are more is also increased by 10%. Under uniform inflation, each percentile is increased by the inflation factor. 3. F. Under uniform inflation, if the deductible increases to keep up with inflation, then the Loss Elimination Ratio is unaffected. So in 1996 the Loss Elimination Ratio at $1100 is 10% not 11%. 36.14. B. This is a mixed Gamma-Weibull Distribution. The Gamma has parameters α = 3 and θ = 1/10, and density: θ−αxα−1 e−x/θ /Γ(α) = 500x2 e-10x. The Weibull has parameters θ =1/ (201/4) and τ = 4, and density: τ(x/θ)τ exp(-(x/θ)τ) /x = 80x3 exp(-20x4 ). In the mixed distribution, the Gamma is given a weight of .75, while the Weibull is given a weight of .25. Note that (0.75)(500) =375, and (0.25)(80) = 20. Under uniform inflation of 25%, the Gamma has parameters: α = 3 and θ = (1/10)1.25 = 1/8, and density: θ−αxα−1 e−x/θ /Γ(α) = 256x2 e-8x. Under uniform inflation of 25%, the Weibull has parameters: θ = 1.25/ (201/4). θ-1/4= 20/(1.25)4 = 8.192 and τ = 4, and density: τ(x/θ)τ exp(-(x/θ)τ) / x = 32.768x3 exp(-8.192x4 ). Therefore, the mixed distribution after inflation has a density of: (.75){256x2 e-8x} + (.25){32.768x3 exp(-8.192x4 )} = 192x2 e-8x + 8.192x3 exp(-8.192x4 ). Comment: For a mixed distribution under uniform inflation, the weights are unaffected, while the each separate distribution is affected as usual. 36.15. D. This is a LogNormal Distribution with parameters (prior to inflation) of µ = 7 and σ = 3. Thus posterior to inflation of 40%, one has a LogNormal Distribution with parameters of µ = 7 + ln(1.4) = 7.336 and σ = 3. For the LogNormal, S(x) = 1 - Φ((ln(x) - µ)/σ). Prior to inflation, S(1000) = 1 - Φ((ln(x) - µ)/σ) = 1 - Φ((ln(1000) - 7)/3) = 1 - Φ(-.031) = Φ(.03) = .5120. After inflation, S(1000) = 1 - Φ[(ln(x) - µ)/σ] = 1 - Φ[(ln(1000) - 7.336)/3] = 1 - Φ(-.143) = Φ(.14) = .5557. Prior to inflation, 173 losses are expected to exceed the deductible each year. The survival function increased from .5120 to .5557 after inflation. Thus after inflation one expects to exceed the deductible per year: 173(.5557)/.5120 = 187.8 claims. Alternately, a limit of 1000 after inflation is equivalent to 1000/1.4 = 714.29 prior to inflation. Thus the tail probability after inflation at 1000 is the same as the tail probability at 714.29 prior to inflation. Prior to inflation, 1 - F(714.29) = 1 - Φ((ln(x) - µ)/σ) = 1 - Φ((ln(714.3) - 7)/3) = 1 - Φ(-.14) = Φ(.14) = .5557. Proceed as before. Comment: The expected number of claims over a fixed deductible increases under uniform inflation.
36.16. A. For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)}{1 − (θ/(θ+x))^(α−1)} = 20{1 - (1 + x/40)^-2}.
E[X ∧ 5/1.06] = E[X ∧ 4.717] = 20{1 - (1 + 4.717/40)^-2} = 3.997.
E[X ∧ 5] = 20{1 - (1 + 5/40)^-2} = 4.198.
E[X ∧ 25/1.06] = E[X ∧ 23.585] = 20{1 - (1 + 23.585/40)^-2} = 12.085.
E[X ∧ 25] = 20{1 - (1 + 25/40)^-2} = 12.426.
In 2001 the expected payments are: E[X ∧ 25] - E[X ∧ 5] = 12.426 - 4.198 = 8.228. A deductible of 5 and maximum covered loss of 25 in the year 2002, when deflated back to the year 2001, correspond to a deductible of: 5/1.06 = 4.717, and a maximum covered loss of: 25/1.06 = 23.585. Therefore, reinflating back to the year 2002, the expected payments in the year 2002 are: (1.06)(E[X ∧ 23.585] - E[X ∧ 4.717]) = (1.06)(12.085 - 3.997) = 8.573. The ratio of expected payments in 2002 over the expected payments in the year 2001 is: 8.573/8.228 = 1.042. Alternately, the insurerʼs average payment per loss is: (1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]). c = 100%, u = 25, d = 5. r = .06 for the year 2002 and r = 0 for the year 2001. Then proceed as previously.
36.17. E. Inflation is 8% per year for 9 years, thus the inflation factor is 1.08^9 = 1.999. Thus 1000 in the year 2002 is equivalent to 1000/1.999 = 500 in 1993. There are 457 claims excess of 500 in 1993; this is 457/1000 = 45.7%. Comment: Note the substantial increase in the proportion of claims over a fixed limit. In 1993 there are 32.6% of the claims excess of 1000, while in 2002 there are 45.7%.
36.18. C. In general one substitutes x = z/(1+r), and for the density function fZ(z) = fX(x)/(1+r). (Recall that under a change of variables you need to divide by dz/dx = 1+r, since dF/dz = (dF/dx)/(dz/dx).) In this case, 1 + r = 1.2. Thus,
fZ(z) = fX(x)/1.2 = µ exp[-(x-µ)²/(2βx)] / {1.2 √(2βπx³)}
= µ exp[-({z/1.2}-µ)²/(2β{z/1.2})] / {1.2 √(2βπ{z/1.2}³)}
= (1.2µ) exp[-(z - 1.2µ)²/{2(1.2β)z}] / √(2(1.2β)πz³).
This is of the same form, but with parameters 1.2µ and 1.2β, rather than µ and β. Comment: This is an Inverse Gaussian Distribution. Let β = µ²/θ and one has the parameterization in Loss Models, with parameters µ and θ. Since under uniform inflation, for the Inverse Gaussian each of µ and θ is multiplied by the inflation factor, so is β = µ²/θ.
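As a numerical cross-check of Solution 36.16, the following short Python sketch (my own illustration, with a hypothetical helper name) evaluates the Pareto limited expected values with α = 3 and θ = 40 and reproduces the 4.2% increase in expected payments.

def lim_ev_pareto(x, alpha=3.0, theta=40.0):
    # E[X ^ x] for the Pareto: {theta/(alpha-1)}{1 - (theta/(theta+x))^(alpha-1)}
    return theta / (alpha - 1) * (1 - (theta / (theta + x))**(alpha - 1))

pay_2001 = lim_ev_pareto(25) - lim_ev_pareto(5)
pay_2002 = 1.06 * (lim_ev_pareto(25 / 1.06) - lim_ev_pareto(5 / 1.06))
print(pay_2001, pay_2002, pay_2002 / pay_2001)   # about 8.23, 8.57, and a ratio of about 1.042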
36.19. C. This is an Exponential Distribution with θ = 5000. ⇒ new θ = (5000)(1.4) = 7000. For the Exponential Distribution, E[X ∧ x] = θ(1-e−x/θ). The mean is θ. The losses excess of a limit are proportional to E[X] - E[X ∧ x] = θe−x/θ. In 1998 this is for x = 1000: 5000e-1000/5000 = 4094. In 2007 this is for x = 1000: 7000e-1000/7000 = 6068. The increase is: (6068/4094) - 1 = 48.2%. Comment: In general excess losses over a fixed limit, increase faster than the rate of inflation. Note that E[X] - E[X ∧ x] = S(x)e(x) = R(x)E[X] = θe−x/θ. 36.20. B. This is an Exponential Distribution with θ = 5000. Therefore, the new theta = (5000) 1.4 = 7000. For the Exponential Distribution, E[X ∧ x] = θ(1-e−x/θ). The mean is θ. The losses excess of a limit are proportional to E[X] - E[X ∧ x] = θe−x/θ. In 1998 this is for x = 1000: 5000e-1000/5000 = 5000e-.2. In 2007 this is for x = 1400: 7000e-1400/7000 = 7000e-.2. The increase is: (7000/5000) - 1 = 40.0%. Comment: If the limit keeps up with inflation, then excess losses increase at the rate of inflation. 36.21. D. The inflation factor from 1990 to 1999 is: (1.04)9 = 1.423. Thus the parameters of the 1999 LogNormal are: 3 + ln(1.423) and σ. Therefore, the mean of the 1999 LogNormal is: Mean99 = exp(3 + ln(1.423) + σ2 /2) = 1.423 exp(3 + σ2 /2) = 1.423 Mean90. Therefore, (ln(Mean99) = 3 + ln(1.423) + σ2 /2. F90(Mean99) = Φ[(ln(Mean99)) - µ) / σ] = Φ( (3+ ln 1.423 + σ2 /2 - 3) / σ) = Φ( (ln 1.423 + σ2 /2) / σ). We are given that in 1990 5% of the losses exceed the mean of the losses in 1999. Thus, F90(Mean99) = .95. Therefore, Φ( (ln 1.423 + σ2 /2) / σ) = .95. Φ(1.645) = .95. ⇒(ln 1.423 + σ2 /2) / σ = 1.645. ⇒ σ2 /2 -1.645σ + ln 1.423 = 0.⇒ σ = 1.645 ±
√(1.645² - 2 ln 1.423) = 1.645 ± √2.000 = 0.231 or 3.059.
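Solutions 36.19 and 36.20 illustrate the general rule that excess losses over a fixed limit grow faster than inflation, while excess losses over an indexed limit grow exactly at the rate of inflation. A minimal Python sketch of that comparison (my own helper name, not from the guide):

from math import exp

def excess_losses(theta, d):
    # Exponential: E[X] - E[X ^ d] = theta * exp(-d/theta)
    return theta * exp(-d / theta)

print(excess_losses(7000, 1000) / excess_losses(5000, 1000) - 1)   # about 0.482 (36.19: fixed limit)
print(excess_losses(7000, 1400) / excess_losses(5000, 1000) - 1)   # 0.400 (36.20: limit indexed with inflation)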
36.22. E. The inflation factor from 1995 to 2001 is 206.8/170.3 = 1.214. For the Weibull Distribution, θ is multiplied by 1+ r, while τ is unaffected. Thus in 2001 the new θ is: (1) (1.214) = 1.214. The survival function of the Weibull is S(x) = exp(-(x/θ)τ). In 1995, S(10000) = exp(-(1000.3)) = .000355. In 2001, S(10000) = exp(-(1000/1.214).3) = .000556. The ratio of survival functions is: .000556/.000355 = 1.57 or a 57% increase in the expected number of claims excess of the deductible. Comment: Generally for the Weibull, the ratio of the survival functions at x is: exp[-(x/ (1+r)θ)τ] / exp[-(x/θ)τ] = exp[(x/θ)τ {1 - 1/(1+r)}]. 36.23. B. The inflation factor from 1994 to 1998 is: (1.05)(1.03)(1.07)(1.06) = 1.2266. For the LogNormal Distribution, µ has ln(1+r) added, while σ is unaffected. Thus in 1998 the new µ is: 3 + ln(1.2266) = 3.2042. The Limited Expected Value for the LogNormal Distribution is: E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]} In 1994, E[X ∧ 25000] = e11Φ[(ln(25000) - 19)/4] + 25000{1 - Φ[(ln((25000) - 3)/4]} = 59874Φ[-2.22] + 25000{1 - Φ[1.78]} = 59,874(1 - .9868) + 25,000(1 - .9625) = 1728. In 1998, E[X ∧ 25000] = e11.2042Φ[(ln(25000) - 19.2042)/4] + 25000{1 - Φ[(ln((25000) - 3.2043)/4]} = 73438Φ[-2.27] + 25000{1 - Φ[1.73]} = 73438(1 - .9884) + 25,000{1 - .9582) = 1897. The ratio of Limited Expected Values is: 1897/1728 = 1.098 or a 9.8% increase in the expected dollars of claims between 1994 and 1998. Alternately, in 1994, E[X ∧ 20382] = e11Φ[(ln(20382) - 19)/4] + 20382{1 - Φ[(ln((20382) - 3)/4]} = 59874Φ[-2.27] + 20382{1 - Φ[1.73]} = 59,874(1 - .9884) + 20382(1 - .9582) = 1547. In 1998 the average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]) = 1.2266 E[X ∧ 25000/1.2266] = 1.2266 E[X ∧ 20382] = (1.2266)(1547) = 1898. Proceed as before. Comment: For a fixed limit, basic limit losses increase at less than the overall rate of inflation. Here unlimited losses increase 22.7%, but limited losses increase only 9.8%. When using the formula for the average payment per loss, use the original LogNormal for 1994.
36.24. D. In general a layer of loss is proportional to the integral of the survival function. In 1994, S(x) = 1015 x-3. The integral from 500,000 to 2,000,000 of S(x) is: 1015( 500000-2 - 2000000-2)/2 = 1875. In 1999, the Distribution Function is gotten by substituting x = z/1.2. F(z) = 1 - (100000/(z/1.2))3 = 1 - (120000/z)3 for z > $120,000. Thus in 1999, the integral from 500,000 to 2,000,000 of the survival function is: (120000)3 ( 500000-2 - 2000000-2)/2 = 3240. 3240 / 1875 = 1.728, representing a 72.8% increase. Alternately, this is a Single Parameter Pareto Distribution, with α = 3 and θ = 100,000. Under uniform inflation of 20%, θ becomes 120,000 while α is unaffected. E[X ∧ x] = θ [{α − (x/θ)1−α} / (α − 1)] for α > 1. Then the layer from 500,000 to 2,000,000 is proportional to: E[X ∧ 2000000] -E[X ∧ 500000] = (θ/(α−1)){(α −(2000000/θ)1−α) - (α −(500000/θ)1−α)} = (θ/(α − 1)){(500000/θ)1−α −(2000000/θ)1−α} . In 1994, E[X ∧ 2000000] -E[X ∧ 500000] = (100000/2){5-2 - 20-2) = 1875. In 1999, E[X ∧ 2000000] -E[X ∧ 500000] = (120000/2){4.1667-2 - 16.6667-2) = 3240. 3240 / 1875 = 1.728, representing a 72.8% increase. Comment: As shown in “A Practical Guide to the Single Parameter Pareto Distribution,” by Stephen W. Philbrick, PCAS LXXII, 1985, pp. 44, for the Single Parameter Pareto Distribution, a layer of losses is multiplied by (1+r)α. In this case 1.23 = 1.728. 36.25. D. The total inflation factor is (1.03)8 = 1.2668. Under uniform inflation both parameters of the Inverse Gaussian are multiplied by 1 + r = 1.2668. Thus in 2009 the parameters are: µ = 3(1.26668) = 3.8003 and θ = 10(1.2668) = 12.668. Thus the variance in 2009 is: µ3 / θ = 3.80033 / 12.668 = 4.33. Alternately, the variance in 2001 is: µ3 / θ = 33 / 10 = 2.7. Under uniform inflation, the variance is multiplied by (1+r)2 . Thus in 2009 the variance is: (2.7)(1.26682 ) = 4.33. 36.26. A. S(10000) = (5000/(5000 + 10000))3 = 1/27. Thus we want x such that S(x) = (1/2)(1/27) = 1/54. (5000/(5000 + x))3 = 1/54 ⇒ x = 5000(541/3 - 1) = 13,899.
36.27. C. During the year 2006, the losses are Pareto with α = 3 and θ = (1.15)(5000) = 5750. S(10000) = {5750/(5750 + 10000)}³ = 0.04866. Thus we want x such that S(x) = (1/2)(0.04866) = 0.02433. (5750/(5750 + x))³ = 0.02433 ⇒ x = 5750(1/0.02433^(1/3) - 1) = 14,094.
36.28. A. The new theta = (1/1000)(1.25) = 1/800. Thus in 1998 the density is: e^(-x/θ)/θ = 800e^(-800x).
36.29. E. The inflation factor is 1.04³ = 1.1249.
Probability   2003 Loss Amount   2003 Insurer Payment   2006 Loss Amount   2006 Insurer Payment
0.1667        1000               0                      1124.9             0.0
0.3333        2000               0                      2249.7             249.7
0.3333        5000               3000                   5624.3             3624.3
0.1667        10000              8000                   11248.6            9248.6
Average       4166.7             2333.3                 4686.9             2832.8
2832.8 / 2333.3 = 1.214, therefore the insurerʼs expected payments increased 21.4%. Comment: Similar to 4B, 5/94, Q.21.
36.30. B. The second moment of the LogNormal in 2007 is exp[(2)(5) + (2)(0.7²)] = 58,689. The second moment increases by the square of the inflation factor: (1.04³)²(58,689) = 74,260. Alternately, the LogNormal in 2010 has parameters of: µ = 5 + ln[1.04³] = 5.1177, and σ = 0.7. The second moment of the LogNormal in 2010 is exp[(2)(5.1177) + (2)(0.7²)] = 74,265.
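The table in Solution 36.29 can be rebuilt directly from the loss sizes. Here is a small Python sketch (an illustration of mine, not part of the guide) that recomputes the 21.4% increase in the insurer's expected payments over a 2000 deductible.

infl = 1.04**3
losses = [1000, 2000, 5000, 10000]
probs = [1/6, 1/3, 1/3, 1/6]

pay_2003 = sum(p * max(x - 2000, 0) for p, x in zip(probs, losses))
pay_2006 = sum(p * max(x * infl - 2000, 0) for p, x in zip(probs, losses))
print(pay_2006 / pay_2003)   # about 1.214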
36.31. D. Deflating, the 1000 deductible in 2008 is equivalent to a deductible of 1000/1.1 = 909 in 2005. Work in 2005 and then reinflate back up to the 2008 level by multiplying by 1.1.
Average payment per loss is: 1.1{E[X] - E[X ∧ 909]} = 1.1{∫_0^∞ x f(x) dx - ∫_0^909 x f(x) dx - 909 S(909)} = 1.1 ∫_909^∞ x f(x) dx - 1000 ∫_909^∞ f(x) dx.
Average payment per loss is: 1.1{E[X] - E[X ∧ 909]} = 1.1{∫_0^∞ S(x) dx - ∫_0^909 S(x) dx} = 1.1 ∫_909^∞ S(x) dx.
Average payment per loss is: 1.1 E[(X - 909)+] = 1.1 ∫_909^∞ (x - 909) f(x) dx.
Average payment per loss is: 1.1{E[X] - E[X ∧ 909]} = 1.1 ∫_0^∞ x f(x) dx - 1.1 ∫_0^909 S(x) dx = 1.1 ∫_909^∞ x f(x) dx + 1.1 ∫_0^909 {x f(x) - S(x)} dx.
36.32. For convenience put everything in millions of dollars. Prior to inflation, E[X ∧ x] = 0.5 - 0.5²/(0.5 + x). Thus prior to inflation the average payment per loss is: E[X ∧ 1] - E[X ∧ R] = 0.5²/(0.5 + R) - 0.5²/(0.5 + 1) = 0.5²/(0.5 + R) - 0.166667. After inflation the average payment per loss is: 1.1(E[X ∧ 1/1.1] - E[X ∧ R/1.1]) = (1.1)(0.5²)/(0.5 + R/1.1) - (1.1)(0.5²)/(0.5 + 1/1.1) = 0.55²/(0.55 + R) - 0.195161. Setting the ratio of the two average payments per loss equal to 1.1: (1.1){0.5²/(0.5 + R) - 0.166667} = 0.55²/(0.55 + R) - 0.195161. ⇒ 0.011827(0.5 + R)(0.55 + R) + (1.1)(0.5²)(0.55 + R) - 0.55²(0.5 + R) = 0. ⇒ 0.011827R² - 0.015082R + 0.0032524 = 0.
R = {0.015082 ± √(0.015082² - (4)(0.011827)(0.0032524))} / {(2)(0.011827)} = 0.6376 ± 0.3627.
R = 0.275 or 1.000. ⇒ R = $275,000. Comment: A rewritten version of CAS9, 11/99, Q.39.
36.33. C. During 2013, the losses follow an Exponential with mean: (1.08)(5000) = 5400. An Exponential distribution truncated and shifted from below is the same Exponential Distribution, due to the memoryless property of the Exponential. Thus the nonzero payments are Exponential with mean 5400. The probability of a nonzero payment is the probability that a loss is greater than the deductible of 1000; S(1000) = e^(-1000/5400) = 0.8310. Thus the payments of the insurer can be thought of as a compound distribution, with Bernoulli frequency with mean 0.8310 and Exponential severity with mean 5400. The variance of this compound distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)²(Var. Freq.) = (0.8310)(5400²) + (5400)²{(0.8310)(1 - 0.8310)} = 28.3 million. Equivalently, the payments of the insurer in this case are a two point mixture of an Exponential with mean 5400 and a distribution that is always zero, with weights 0.8310 and 0.1690. This has a first moment of: (5400)(0.8310) + (0)(0.1690) = 4487.4, and a second moment of: {(2)(5400²)}(0.8310) + (0²)(0.1690) = 48,463,920. Thus the variance is: 48,463,920 - 4487.4² = 28.3 million. Comment: Similar to 3, 11/00, Q.21, which does not include inflation.
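The compound (Bernoulli frequency, Exponential severity) variance in Solution 36.33 can be checked in a few lines. A minimal Python sketch, using my own variable names:

from math import exp

theta = 1.08 * 5000                       # Exponential mean after 8% inflation
p = exp(-1000 / theta)                    # probability a loss exceeds the 1000 deductible
mean_sev, var_sev = theta, theta**2       # memoryless: nonzero payments are Exponential(5400)
var_payment = p * var_sev + mean_sev**2 * p * (1 - p)
print(var_payment)                        # about 28.3 million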
36.34. E. In 2005, the average payment per payment is: (E[X] - E[X ∧ 5000])/S(5000) = (E[X] - 3000)/(1 - .655) = 2.8986E[X] - 8695.7. After 25% inflation, the average payment per payment is: 1.25(E[X] - E[X ∧ 5000/1.25])/S(5000/1.25) = 1.25(E[X] - E[X ∧ 4000])/(1 - F(4000)) = 1.25(E[X] - 2624)/(1 - .590) = 3.0488E[X] - 8000. Set 1.15(2.8986E[X] - 8695.7) = 3.0488E[X] - 8000. ⇒ E[X] = 2000/.2846 = 7027.
36.35. D. In 2015, the losses are Pareto with α = 5 and θ = (1.25)(40) = 50. With a deductible of 10, the non-zero payments are Pareto with α = 5 and θ = 50 + 10 = 60. The mean of this Pareto is: 60/4 = 15. The second moment of this Pareto is: (2)(60²)/{(5 - 1)(5 - 2)} = 600. The variance of this Pareto is: 600 - 15² = 375.
36.36. C. The non-zero payments are Pareto with α = 5 and θ = 50 + 10 = 60, with mean: 15, second moment: 600, and variance: 600 - 15² = 375. The probability of a non-zero payment is the survival function at 10 of the original Pareto: (50/(50 + 10))^5 = 0.4019. Thus YL is a two-point mixture of a Pareto distribution α = 5 and θ = 60, and a distribution that is always zero, with weights 0.4019 and 0.5981. The mean of the mixture is: (0.4019)(15) + (0.5981)(0) = 6.029. The second moment of the mixture is: (0.4019)(600) + (0.5981)(0²) = 241.14. The variance of this mixture is: 241.14 - 6.029² = 205. Alternately, YL can be thought of as a compound distribution, with Bernoulli frequency with mean 0.4019 and Pareto distribution α = 5 and θ = 60. The variance of this compound distribution is: (Mean Freq.)(Var. Sev.) + (Mean Sev.)²(Var. Freq.) = (0.4019)(375) + (15)²{(0.4019)(0.5981)} = 205.
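The per-loss variance in Solution 36.36 follows the same mixture logic. A short Python sketch of the calculation (illustrative only; the names are mine):

alpha, theta = 5.0, 60.0                          # nonzero payments: Pareto(alpha = 5, theta = 60)
p = (50.0 / 60.0)**5                              # probability of a nonzero payment
m1 = theta / (alpha - 1)                          # mean of the Pareto: 15
m2 = 2 * theta**2 / ((alpha - 1) * (alpha - 2))   # second moment: 600
print(p * m2 - (p * m1)**2)                       # about 205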
36.37. B. inflation factor = (1+r) = 1.035 = 1.1593. coinsurance factor = c = .90. Maximum Covered Loss = u = 50,000. Deductible amount = d = 10,000. L/(1+r) = 43,130. d/(1+r) = 8626. E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]} E[X ∧ u/(1+r)] = E[X ∧ 43,130] = exp(10.02)Φ[(ln(43,130) - 9.7 - .64 )/.8] + (43,130) {1 − Φ[(ln(43,130) − 9.7)/.8]} = (22,471)Φ[.41] + (43,130){1 - Φ[1.21]} = (22,471)(.6591) + (43,130){1 - .8869} = 19,689. E[X ∧ d/(1+r)] = E[X ∧ 8626] = exp(10.02)Φ[(ln(8626) - 9.7 - 0.64 )/0.8] + (8626) {1 - Φ[(ln(8626) - 9.7)/0.8]} = (22,471)Φ[-1.60] + (8626) {1 - Φ[-.80]} = (22,471)(.0548) + (8626) {.7881} = 8030. The average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]) = (1.1593)(.9)(19,689 - 8030) = 12,165. Comment: In 2002, the average payment per loss is: (.9)(E[X ∧ 50000] - E[X ∧ 10000]) ≅ (.9)(20,345 - 9073) = 10,145. Thus it increased from 2002 to 2007 by: 12165/10145 - 1 = 19.9%. The maximum covered loss would cause the increase to be less than the rate of inflation of 15.9%, while the deductible would cause it to be greater. In this case the deductible had a bigger impact than the maximum covered loss on the rate of increase. When using the formula for the average payment for loss, use the parameters of the original LogNormal for 2002. This formula is equivalent to deflating the 2007 values back to 2002, working in 2002, and then reinflating back up to 2007. One could instead inflate the LogNormal to 2007 and work in 2007. 36.38. A. Inflation factor = (1+r) = 1.035 = 1.1593. coinsurance factor = c = 0.90. Maximum Covered Loss = u = 50,000. Deductible amount = d = 10,000. u/(1+r) = 43,130. d/(1+r) = 8626. S(d/(1+r)) = 1 - Φ[(ln(8626) -9.7)/.8] = 1 - Φ[-.80] = 0.7881. The average payment per non-zero payment is: (1+r)c(E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) / S(d/(1+r)) = (1.1593)(0.9)(19,689 - 8030)/0.7881 = 15,435. Comment: The average payment per non-zero payment is: average payment per loss / S(d/(1+r)) = 12,165 /0.7881 = 15,436.
36.39. D. E[(X ∧ x)²] = exp[2µ + 2σ²] Φ[(ln(x) − µ − 2σ²)/σ] + x²{1 - Φ[(ln(x) − µ)/σ]}.
E[(X ∧ u/(1+r))²] = E[(X ∧ 43,130)²] = exp(20.68)Φ[{ln(43,130) - 9.7 - (2)(0.8²)}/0.8] + (43,130)²{1 - Φ[(ln(43,130) − 9.7)/0.8]} = (e^20.68)Φ[-0.39] + (43,130²){1 - Φ[1.21]} = (e^20.68)(0.3483) + (43,130²)(1 - 0.8869) = 543,940,124.
E[(X ∧ d/(1+r))²] = E[(X ∧ 8626)²] = exp(20.68)Φ[{ln(8626) - 9.7 - (2)(0.8²)}/0.8] + (8626²){1 - Φ[(ln(8626) - 9.7)/0.8]} = (e^20.68)Φ[-2.40] + (8626²){1 - Φ[-.80]} = (e^20.68)(0.0082) + (8626²)(0.7881) = 66,493,633.
From previous solutions: E[X ∧ 43,130] = 19,689, E[X ∧ 8626] = 8030, S(8626) = 0.7881. Thus the second moment of the per-loss variable is: (1.1593²)(90%²){543,940,124 - 66,493,633 - (2)(8626)(19,689 - 8030)} = 300,791,874. From a previous solution, the average payment per loss is 12,165. Thus the variance of the per-loss variable is: 300,791,874 - 12,165² = 152,804,649. The standard deviation of the per-loss variable is 12,361. Comment: One could instead inflate the LogNormal to 2007 and work in 2007. The 2007 LogNormal has parameters µ = 9.7 + ln[1.03^5] = 9.848, and σ = 0.8.
36.40. A. The second moment of the per-payment variable is: (second moment of the per-loss variable)/S(d/(1+r)) = 300,791,874/0.7881 = 381,667,141. From a previous solution, the average payment per payment is 15,435. Thus the variance of the per-payment variable is: 381,667,141 - 15,435² = 143,427,916. The standard deviation of the per-payment variable is 11,976.
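Solutions 36.37-36.40 chain together several LogNormal limited-moment evaluations. The sketch below (my own helper functions, using the error function for the standard normal CDF) reproduces the per-loss and per-payment means; the guide's figures differ slightly because it rounds the normal-table values.

from math import exp, log, erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def lim_moment(x, k=1, mu=9.7, sig=0.8):
    # k-th limited moment E[(X ^ x)^k] of the LogNormal
    return exp(k * mu + k * k * sig * sig / 2) * Phi((log(x) - mu - k * sig * sig) / sig) + \
           x**k * (1 - Phi((log(x) - mu) / sig))

r, c, u, d = 1.03**5 - 1, 0.9, 50000, 10000
ud, dd = u / (1 + r), d / (1 + r)
per_loss = (1 + r) * c * (lim_moment(ud) - lim_moment(dd))
print(per_loss)                                        # about 12,165 (36.37)
print(per_loss / (1 - Phi((log(dd) - 9.7) / 0.8)))     # about 15,455 (36.38 gets 15,435 with table rounding)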
36.41. B. In 2016 the ground up losses are Pareto with α = 2 and θ = (1+r)250. The average payment per loss is: E[X] - E[X ∧ 100]. For a Pareto with α = 2: E[X] - E[X ∧ 100] = θ - θ{1 - θ/(θ + 100)} = θ²/(θ + 100). In 2011, E[X] - E[X ∧ 100] = 250²/(250 + 100) = 178.57. In 2016, E[X] - E[X ∧ 100] = {250(1+r)}²/{250(1+r) + 100}. Setting this ratio equal to 1.26: (1.26)(178.57) = {250(1+r)}²/{250(1+r) + 100}. ⇒ (225)(250)(1+r) + 22,500 = 62,500(1 + 2r + r²). ⇒ r² + 1.1r - 0.26 = 0. ⇒ r = 0.2, taking the positive root of the quadratic equation. In other words, there is a total of 20% inflation between 2011 and 2016. Comment: The mean ground up loss increases by 20%, but the losses excess of the deductible increase at a faster rate of 26%. The average payment per loss in 2011 is: 250²/(250 + 100) = 178.57. The average payment per loss in 2016 is: 300²/(300 + 100) = 225.00. Their ratio is: 225.00/178.57 = 1.260. The average payment per payment is: e(100) = (θ + 100)/(α - 1) = θ + 100. In 2011, e(100) = 250 + 100 = 350. In 2016, e(100) = (1.2)(250) + 100 = 400. Their ratio is: 400/350 = 1.143.
36.42. E. 1. False. FZ(z) = FX(z/1.1), so that fZ(z) = fX(z/1.1)/1.1. 2. True. 3. True.
36.43. C. For the Burr distribution θ is transformed by inflation to θ(1+r). This follows from the fact that θ is the scale parameter for the Burr distribution. The shape parameters α and γ remain the same. From first principles, one makes the change of variables Z = (1+r)X. For the Distribution Function one just sets FZ(z) = FX(x); one substitutes x = z/(1+r). FZ(z) = FX(x) = 1 - (1/(1 + (x/θ)^γ))^α = 1 - (1/(1 + (z/{(1+r)θ})^γ))^α. This is a Burr Distribution with parameters: α, θ(1+r), and γ.
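For Solution 36.41, once the per-loss formula θ²/(θ + 100) is in hand, the 26% growth can be confirmed directly. A one-function Python sketch (my own illustration):

def per_loss(theta, d=100.0):
    # Pareto with alpha = 2: E[X] - E[X ^ d] = theta**2 / (theta + d)
    return theta**2 / (theta + d)

print(per_loss(1.2 * 250) / per_loss(250))   # 1.26, i.e. the excess payments grow 26% on 20% inflation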
36.44. A. The mean of a Pareto is θ /(α−1). Therefore, θ = (α−1)(mean) = (3-1)(25,000) = 50,000. Prior to the impact of inflation: 1 - F(100000) = {50000 / (50000 +100000)}α = 1/33 = .0370. Under uniform inflation for the Pareto, θ is multiplied by 1.2 and α is unchanged. Thus the new θ is (50000)(1.2) = 60000. Thus after inflation: 1 - F(100000) = {θ / (θ + 100000)}α = (6/16)3 = 0.0527. The increase is .0527- .0370 = 0.0157. 36.45. A. Inflation has been 10% per year for 7 years. Thus the inflation factor is 1.17 = 1.949. Under uniform inflation, the Pareto has θ increase to (1+r)θ, while α remains the same. Thus in 1992 the Pareto Distribution has parameters: 2 and (12500)(1.949) = 24,363. For the Pareto E[X ∧ x] = {θ/(α−1)}{1 - (θ/(θ+x))α−1}. Thus in 1992, E[X ∧ 200000] = {24363/(2-1)}{1-(24363/(24363+200000))2-1} = 21,717. Alternately, the 200,000 limit in 1992 corresponds to 200,000 / 1.949 = 102,617 limit in 1985. In 1985, E[X ∧ 102,617] = {12500/(2-1)}{1-(12500/(12500+102,617))2-1} = 11143. In order to inflate to 1992, multiply by 1.949: (1.949)(11143) = 21,718. 36.46. C. 1. False. For the Inverse Gaussian, both µ and θ are multiplied by 1+r. 2. False. θ becomes θ(1+r). (The Generalized Pareto acts like the Pareto under inflation. The scale parameter is multiplied by the inflation factor.) 3. True. 36.47. B. Since b divides x everywhere that x appears in the density function, b is a scale parameter. Therefore, under uniform inflation we get a Erlang Distribution with b multiplied by (1+r). Alternately, one can substitute for x = z / (1+r). For the density function fZ(z) = fX(x) / (1+r). (Recall that that under change of variables you need to divide by dz/dx = 1+ r, since dF/dy = (dF /dx) / (dy/dx).) Thus f(z) = (z/(1+r)b)c-1 e-z/(1+r)b / (1+r){ b (c-1)! }, which is an Erlang Distribution with b multiplied by (1+r) and with c unchanged. Comment: The Erlang Distribution is a special case of the Gamma Distribution, with c integral. c ⇔ α, and b ⇔ θ.
36.48. D. The mean of a Burr distribution is: θΓ(1 + 1/γ)Γ(α − 1/γ)/Γ(α) = θΓ(1+2)Γ(3−2)/Γ(3) = θΓ(3)Γ(1)/Γ(3) = θ. Under uniform inflation the mean increases from 10,000 to (10,000)(1.44) = 14,400. After inflation, the chance that a claim exceeds $40,000 is: S(40000) = {1/(1 + (40000/θ)^γ)}^α = {1/(1 + (40000/14400)^0.5)}³ = 0.0527. Alternately, one can compute the chance of exceeding 40000/1.44 = 27,778 prior to inflation: S(27778) = {1/(1 + (27778/10000)^0.5)}³ = 0.0527.
36.49. B. Under uniform inflation for the LogNormal we get another LogNormal, but µ becomes µ + ln(1+r) while σ stays the same. Thus in this case µ' = 17.953 + ln(1.1) = 18.048, while σ remains 1.6028.
36.50. C. For the Exponential Distribution E[X ∧ x] = θ(1 - e^(-x/θ)). During 1992 the distribution is an Exponential Distribution with θ = 1, and the average value of the capped losses is E[X ∧ 1] = 1 - e^(-1) = .632. During 1993 the distribution is an Exponential Distribution with θ = 1.1. Thus in 1993, E[X ∧ 1] = 1.1{1 - e^(-1/1.1)} = .6568.
The increase in capped losses between 1993 and 1992 is .6568 / .6321 = 1.039. Comments: The rate of inflation of 3.9% for the capped losses with a fixed limit is less than the overall rate of inflation of 10%.
36.51. B. Prior to inflation in 1991, F(x) = 1 - x^-5, x > 1. After inflation in 1992, F(x) = 1 - (x/1.1)^-5, x > 1.1. f(x) = 5(1.1^5)x^-6. LER(1.2) = E[X ∧ 1.2]/E[X].
E[X ∧ 1.2] = ∫_1.1^1.2 x f(x) dx + (1.2)S(1.2) = (1.1^5)(5/4){(1.1^-4) - (1.2^-4)} + (1.2)(1.2/1.1)^-5 = 0.404 + 0.777 = 1.181.
E[X] = ∫_1.1^∞ x f(x) dx = (1.1^5)(5/4)(1.1^-4) = 1.375.
LER(1.2) = E[X ∧ 1.2]/E[X] = 1.181/1.375 = 0.859.
Comment: Remember that under uniform inflation the domain of the Distribution Function also changes; in 1992 x > 1.1. This is a Single Parameter Pareto with α = 5 and θ = 1.1. E[X ∧ x] = θ[{α − (x/θ)^(1−α)}/(α − 1)]. E[X ∧ 1.2] = 1.1[{5 - (1.2/1.1)^(1-5)}/(5 - 1)] = 1.181. E[X] = θα/(α − 1) = 1.1(5/4) = 1.375. LER(x) = 1 - (1/α)(x/θ)^(1−α). LER(1.2) = 1 - (1/5)(1.2/1.1)^-4 = 1 - (.2)(.7061) = 0.859. Note that one could instead deflate the 1.2 deductible in 1992 to a 1.2/1.1 = 1.0909 deductible in 1991 and then work with the 1991 distribution function.
36.52. B. The distribution for the 1993 losses is an exponential distribution F(x) = 1 - e^-x. In order to convert into 1994 dollars, the parameter of 1 is multiplied by 1 plus the inflation rate of 5%; thus the revised parameter is 1.05. The capped losses, which are given by the Limited Expected Value, are for the exponential: E[X ∧ x] = θ(1 - e^(-x/θ)). Thus in 1993 the losses capped at 1 ($ million) are E[X ∧ 1] = 1 - e^-1 = 0.6321. In 1994, with θ = 1.05, E[X ∧ 1] = (1 - e^(-0.9524))(1.05) = 0.6449. The increase in capped losses is: 0.6449/0.6321 = 1.019, or 1.9% inflation. Alternately, rather than working with the 1994 distribution one can translate everything back to 1993 dollars and use the 1993 distribution. In 1993 dollars the 1994 limit of 1 is only 1/1.05 = 0.9524. Thus the capped losses in 1994 are, in 1993 dollars, E[X ∧ 0.9524] = 1 - e^(-0.9524). In 1994 dollars the 1994 capped losses are therefore 1.05 E[X ∧ 0.9524] = 0.6449. The solution is therefore 0.6449/0.6321 = 1.019, or 1.9% inflation.
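The Single Parameter Pareto formulas quoted in the comment to Solution 36.51 give the same 0.859 directly. A minimal Python check (illustrative, with my own variable names):

alpha, theta = 5.0, 1.1                       # 1992 distribution after 10% inflation
lim_ev = theta * (alpha - (1.2 / theta)**(1 - alpha)) / (alpha - 1)   # E[X ^ 1.2]
mean = theta * alpha / (alpha - 1)
print(lim_ev / mean)                          # about 0.859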
36.53. A. Statement 1 is false. In fact the coefficient of variation, as well as the skewness, is a dimensionless quantity which is unaffected by a change in scale and is therefore unchanged under uniform inflation. Specifically in this case the new mean is the prior mean times (1+r), and the new variance is the prior variance times (1+r)². Therefore, the new coefficient of variation = new standard deviation / new mean = (1+r)(prior standard deviation) / {(1+r)(prior mean)} = prior standard deviation / prior mean = prior coefficient of variation.
Statement 3 is false. In fact, E[Z ∧ d(1+r)] = (1+r)E[X ∧ d]. The left hand side is the Limited Expected Value in the later year, with a limit of d(1+r); we have adjusted d, the limit in the prior year, in order to keep up with inflation via the factor 1+r. This yields the Limited Expected Value in the prior year, except multiplied by the inflation factor to put it in terms of the subsequent year dollars, which is the right hand side. For example, if the expected value limited to $1 million is $300,000 in the prior year, then after uniform inflation of 10%, the expected value limited to $1.1 million is $330,000 in the later year. In terms of the definition of the Limited Expected Value:
E[Z ∧ d(1+r)] = ∫_0^(d(1+r)) z fZ(z) dz + SZ(d(1+r)) d(1+r) = ∫_0^d (1+r)x fX(x) dx + SX(d) d(1+r) = (1+r)E[X ∧ d],
where we have applied the change of variables z = (1+r)x, and thus FZ(d(1+r)) = FX(d) and fX(x) dx = fZ(z) dz.
Statement 2 is true. The mean residual life at d in the prior year is given by eX(d) = {mean of X - E[X ∧ d]}/{1 - FX(d)}. Similarly, the mean residual life at d(1+r) in the later year is given by eZ(d(1+r)) = {mean of Z - E[Z ∧ d(1+r)]}/{1 - FZ(d(1+r))} = {(1+r)(mean of X) - (1+r)E[X ∧ d]}/{1 - FX(d)} = (1+r)eX(d). Thus the mean residual life in the later year is multiplied by the inflation factor of (1+r), provided the limit has been adjusted to keep up with inflation. For example, if the mean residual life beyond $1 million is $3 million in the prior year, then after uniform inflation of 10%, the mean residual life beyond $1.1 million is $3.3 million in the subsequent year.
36.54. B. Losses uniform on [0, 10000] in 1991 become uniform on [0, (1.05²)(10000)] = [0, 11025] in 1993.
LER(500) = {∫_0^500 x f(x) dx + (1 - F(500))(500)} / ∫_0^11025 x f(x) dx.
We have f(x) = 1/11025 for 0 ≤ x ≤ 11025. F(500) = 500/11025 = 0.04535. Thus, LER(500) = {(1/11025)(500²)/2 + (1 - .04535)(500)} / (11025/2) = 0.0886.
Alternately, the LER(500) in 1993 is the LER(500/1.1025) = LER(453.51) in 1991. In 1991: E[X ∧ 453.51] = ∫_0^453.51 (x/10000) dx + S(453.51)(453.51) = 10.28 + 432.96 = 443.24. Mean in 1991 = 10000/2 = 5000. In 1991: LER(453.51) = E[X ∧ 453.51]/mean = 443.24/5000 = .0886.
36.55. C. F(x) = 1 - x^-3, x ≥ 1, in 1993 dollars. A loss exceeding $2.2 million in 1994 dollars is equivalent to a loss exceeding $2.2 million / 1.1 = $2 million in 1993 dollars. The probability of the latter is: 1 - F(2) = 2^-3 = 1/8 = 0.125. Alternately, the distribution function in 1994 dollars is: G(x) = 1 - (x/1.1)^-3, x ≥ 1.1. Therefore, 1 - G(2.2) = (2.2/1.1)^-3 = 1/8 = 0.125. Comment: Single Parameter Pareto Distribution.
36.56. D.
Probability   1993 Loss Amount   1993 Insurer Payment   1994 Loss Amount   1994 Insurer Payment
0.1667        1000               0                      1050               0
0.1667        2000               500                    2100               600
0.1667        3000               1500                   3150               1650
0.1667        4000               2500                   4200               2700
0.1667        5000               3500                   5250               3750
0.1667        6000               4500                   6300               4800
Average       3500.00            2083.33                3675.00            2250
2250 / 2083 = 1.080, therefore the insurerʼs payments increased 8%. Comment: Inflation on the losses excess of the deductible is greater than that of the ground up losses.
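The discrete table in Solution 36.56 amounts to applying a 1500 deductible before and after a 5% trend. A short Python sketch of the same arithmetic (my own illustration):

losses = [1000, 2000, 3000, 4000, 5000, 6000]
pay_93 = sum(max(x - 1500, 0) for x in losses) / 6
pay_94 = sum(max(1.05 * x - 1500, 0) for x in losses) / 6
print(pay_94 / pay_93)   # 1.08: payments over the deductible trend faster than the 5% ground-up trend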
36.57. E. The distribution for the 1993 losses is an exponential distribution F(x) = 1 - e^(-.1x). In order to convert into 1994 dollars, the parameter of 1/.1 is multiplied by 1 plus the inflation rate of 10%; thus the revised parameter is 1.1/.1 = 1/.0909. Thus the 1994 distribution function is G(x) = 1 - e^(-.0909x), where x is in 1994 dollars. The next step is to write down (in 1994 dollars) the Truncated and Shifted distribution function for a deductible of d: FP(x) = {G(x+d) - G(d)}/{1 - G(d)} = {e^(-0.0909d) - e^(-0.0909(x+d))}/e^(-0.0909d) = 1 - e^(-0.0909x). FP(5) = 1 - e^(-(0.0909)(5)) = 0.3653. Alternately, $5 in 1994 dollars corresponds to $5/1.1 = $4.545 in 1993 dollars. In 1993, the Truncated and Shifted distribution function for a deductible of d is: G(x) = {F(x+d) - F(d)}/{1 - F(d)} = {e^(-0.1d) - e^(-0.1(x+d))}/e^(-0.1d) = 1 - e^(-0.1x). G(4.545) = 1 - e^(-0.1(4.545)) = 0.3653. Comment: Involves two separate questions: how to adjust for the effects of inflation and how to adjust for the effects of truncated and shifted data. Note that for the exponential distribution, after truncating and shifting, the new distribution function does not depend on the deductible amount d.
36.58. B. Under uniform inflation, the parameters of a LogNormal become: µ′ = µ + ln(1.1) = 10 + .09531 = 10.09531, σ′ = σ = √5. Using the formula for the limited expected value of the LogNormal: E[X ∧ $2,000,000] = exp(10.09531 + 5/2)Φ[(ln(2,000,000) - 10.09531 - 5)/√5] + (2,000,000){1 - Φ[(ln(2,000,000) - 10.09531)/√5]} = 295,171Φ[-.26] + (2,000,000)(1 - Φ[1.97]) = (295,171)(.3974) + (2,000,000)(.0244) = $166 thousand. Alternately, using the original LogNormal Distribution, the average payment per loss in 1994 is: 1.1 E1993[X ∧ 2 million / 1.1] = 1.1 E1993[X ∧ 1,818,182] = 1.1{exp(10 + 5/2)Φ[(ln(1,818,182) - 10 - 5)/√5] + (1,818,182)(1 - Φ[(ln(1,818,182) - 10)/√5])} = 1.1{268,337Φ[-.26] + (1,818,182)(1 - Φ[1.97])} = (1.1){(268,337)(.3974) + (1,818,182)(.0244)} = $166 thousand.
36.59. E. One can put y = 1.05x, where y is the claim size in 1994 and x is the claim size in 1993. Then let g(y) be the p.d.f. for y; g(y)dy = f(x)dx = f(y/1.05) dy/1.05. g(y) = exp(-.5[{(y/1.05) - 1000}/100]²) / {√(2π)(100)(1.05)} = exp(-.5{(y - 1050)/105}²) / {√(2π)(105)}. This is again a Normal Distribution with both µ and σ multiplied by the inflation factor of 1.05. Comment: As is true in general, under uniform inflation, both the mean and the standard deviation have been multiplied by the inflation factor of 1.05. Assuming you remember that a Normal Distribution is reproduced under uniform inflation, you can use this general result to arrive at the solution to this particular problem, since for the Normal, µ is the mean and σ is the standard deviation.
36.60. B. 0.95 = p0 = S(1) = e^(-1/θ). ⇒ θ = -1/ln(0.95) = 19.5. Y is also Exponential with twice the mean of X: (2)(19.5) = 39. fY(1) = e^(-1/39)/39 = 0.0250.
36.61. C. If the losses are uniformly distributed on [0, 2500] in 1994, then they are uniform on [0, 2575] in 1995. (Each boundary is multiplied by the inflation factor of 1.03.)
LER(100) = {∫_0^100 x f(x) dx + (1 - F(100))(100)} / ∫_0^∞ x f(x) dx = {∫_0^100 x (1/2575) dx + (1 - (100/2575))(100)} / ∫_0^2575 x (1/2575) dx = {(1/2575)(100²)/2 + 100 - (1/2575)(100²)} / (2575/2) = 7.62%.
Alternately, $100 in 1995 is equivalent to 100/1.03 = $97.09 in 1994. In 1994:
LER(97.09) = {∫_0^97.09 x f(x) dx + (1 - F(97.09))(97.09)} / ∫_0^∞ x f(x) dx = {∫_0^97.09 x (1/2500) dx + (1 - (97.09/2500))(97.09)} / ∫_0^2500 x (1/2500) dx = {(1/2500)(97.09²)/2 + 97.09 - (1/2500)(97.09²)} / (2500/2) = 7.62%.
36.62. D. Under uniform inflation the parameters of the Pareto become 2 and 1000(1.1) = 1100. The expected number of insurer payments is 10 losses per year times the percent of losses greater than 100: 10{1 - F(100)} = 10{1100/(1100 + 100)}² = 8.40. Alternately, after inflation the $100 deductible is equivalent to 100/1.1 = 90.91. For the original Pareto with α = 2 and θ = 1000, 10{1 - F(90.91)} = 10{1000/1090.91}² = 8.40.
36.63. C. Under uniform inflation the scale parameter θ of the Pareto is multiplied by the inflation factor, while the shape parameter α remains the same. Therefore the size of loss distribution in 1995 has parameters: θ = (500)(1.05) = 525, α = 1.5. F(x) = 1 - {525/(525+x)}1.5 . The distribution function of the data truncated from below at 200 is: G(x) = {F(m) - F(200)} / {1-F(200)} = {(525/(525+200))1.5 - (525/(525+x))1.5} / (525/(525+200))1.5 = 1 - (725/(525+x))1.5. At the median m of the distribution truncated from below G(m) = .5. Therefore, 1 - (725/(525+m))1.5 = .5. (725/(525+m))1.5 = .5. Thus {(525+m)/725} = 21/1.5 = 1.587. Solving, m = (725)(1.587) - 525 = 626. 36.64. B. After 20% uniform inflation, the parameters of the LogNormal are: µ′ = µ + ln(1+r) = 7 + ln (1.2) = 7.18, while σ is unchanged at 2. F(2000) = Φ[{ln(2000) − 7.18} / 2] = Φ[.21] = .5832. Thus the expected number of claims per year greater than 2000 is: 10{1 - F(2000)} = (10)(1 - .5832) = 4.17. Alternately, one can deflate the deductible amount of 2000, which is then 2000/ 1.2 = 1667, and use the original LogNormal Distribution. The expected number of claims per year greater than 1667 in the original year is: 10(1 - F(1667)) = (10)(1 - Φ[{ln(1667) - 7}/ 2]) = (10)(1 - Φ[.21]) = (10)(1 - .5832) = 4.17. Comment: Prior to inflation, the expected number of claims per year greater than 2000 is: 10(1 - F(2000)) = (10)(1 - Φ[{ln(2000) - 7} / 2]) = (10)(1 - Φ[.30]) = 3.82. 36.65. E. (10 million / 0.8) + (9 million / 0.9) = 22.5 million. Comment: Prior to working with observed losses, they are commonly brought to one common level of inflation. 36.66. D. For the Pareto Distribution, LER(x) = E[X ∧ x] / E[X] = 1 - (θ/(θ+x))α−1. In the later year, losses have doubled, so the scale parameter of the Pareto has doubled, so θ = 2k, rather than k. For θ = 2k and α = 2, LER(x) = 1 - (2k/(2k+x)) = x / (2k + x). Thus LER(2k) = 2k/ 4k = 1/2.
36.67. C. The behavior of the LogNormal under uniform inflation is explained by noting that multiplying each claim by a factor of 1.1 is the same as adding a constant amount ln(1.1) to the log of each claim. (For the LogNormal, the log of the sizes of the claims follow a Normal distribution.) Adding a constant amount to a normal distribution, gives another normal distribution, with the same variance but with the mean shifted. Thus under uniform inflation for the LogNormal, µ becomes µ + ln(1.1). The parameter σ remains the same. 36.68. C. An inflation factor of .25 applied to a Weibull Distribution, gives another Weibull with scale parameter: (0.25)θ = (.25)(625) = 156.25, while the shape parameter τ is unaffected. Thus Z is a Weibull with parameters θ = 156.25 and τ = 0.5. 36.69. C. For the Exponential Distribution,under uniform inflation θ is multiplied by the inflation factor. In this case, the inflation factor is 2, so the new theta is (1000)(2) = 2000. Prior to inflation the percent of losses that exceed the deductible of 500 is: e-500/1000 = e-.5 = .6065. After inflation the percent of losses that exceed the deductible of 500 is: e-500/2000 = e-.25 = .7788. Thus the number of losses that exceed the deductible increased by a factor of .7788/.6065 = 1.284. Since there were 10 losses expected prior to inflation, there are (10)(1.284) = 12.8 claims expected to exceed the 500 deductible after inflation. Comment: One can also do this question by deflating the 500 deductible to 250. Prior to inflation, S(250) = e-250/1000 = e-.25 and S(500) = e-500/1000 = e-.5. Thus if 10 claims are expected to exceed 500, then there a total of 10/e-.5 claims. Thus the number of clams expected to exceed 250 is: (10e.5)(e-.25) = 10e.25 = 12.8. 36.70. D. Under uniform inflation, for the Burr, theta is multiplied by (1+r), thus theta becomes: 1.1 1000 = 34.8. Comment: For a mixed distribution, under uniform inflation each of the individual distributions is transformed just as it would be if an individual distribution. In this case, the Pareto has new parameters α = 1 and θ = (1000)(1.1) = 1100, while the Burr has new parameters α = 1, θ = (1.1) 1000 = 1210, and γ = 2. The weights applied to the distributions remain the same.
36.71. B. The mean of the 1997 LogNormal is: exp((µ+ ln k) + σ2 /2). F96[Mean97] = Φ[(ln(Mean97) - µ) / σ] = Φ[(µ+ ln k + σ2 /2 - µ) / σ] = Φ[(ln k + σ2 /2) / σ]. Since we are given that in 1996 100p% of the losses exceed the mean of the losses in 1997, 1 - F96[Mean97] = p. Thus F96[Mean97] = 1 - p. Thus Φ[(ln k + σ2 /2) / σ] = 1 - p. Since the Normal Distribution is symmetric, Φ[-(ln k + σ2 /2) / σ] = p. Thus by the definition of zp , -(ln k + σ2 /2) / σ = zp . Therefore, σ2 /2 + zp σ + ln k = 0. ⇒ σ = -zp ± zp2 - 2 ln k . Comment: The 1997 distribution is the result of applying uniform inflation, with an inflation factor of k, to the 1996 distribution. Thus the mean of the 1997 distribution is: k exp(µ+ σ2 /2), k times the mean of the 1996 distribution. One could take for example p = .05, in which case zp = -1.645, and then solve for σ in that particular case. 36.72. D. At 10% per year for three years, the inflation factor is 1.13 = 1.331. Thus greater than 500 in 1998 corresponds to greater than 500/1.331 = 376 in 1995. At least 45 and at most 50 claims are less than 376 in 1995. Therefore, between 50% and 55% of the total of 100 claims are greater than 376 in 1995. Therefore, between 50% and 55% of the total of 100 claims are greater than 500 in 1998. Comment: One could linearly interpolate that about 52% or 53% of the claims are greater than 500 in 1998. 36.73. D. Deflate the 15,000 deductible in the later year back to the prior year: 15,000/1.5 = 10,000. In the prior year, the average non-zero payment is: (E[X] - E[X ∧ 10000]) / S(10000) = (20,000 - 6000) / (1 - 0.6) = 14000 / 0.4 = 35,000. Inflating to the subsequent year: (1.5)(35,000) = 52,500. Comment: If the limit keeps up with inflation, so does the mean residual life. 36.74. C. In 1999, one has a Pareto with parameters 2 and 1.06θ. S 1999(d) = {1.06θ / (1.06θ + d)}2 . S1998(d) = {θ / (θ + d)}2 . r = S1999(d) / S1998(d) = 1.062 {(θ + d) / (1.06θ + d)}2 = 1.1236 {(1+ θ/d ) / (1+ 1.06θ/d)}2 As d goes to infinity, r goes to 1.1236. Comment: Alternately, S1999(d) = S1998(d/1.06) = {θ / (θ + d/1.06)}2 .
36.75. C. After 10% inflation, the survival function at 1000 is what it was originally at 1000/1.1 = 909.09. S(1000) = 1 - F(1000) = 1 - Φ[{ln(1000) − µ}/σ] = 1 - Φ[0] = 0.5. S(909.09) = 1 - Φ[{ln(909.09) − µ}/σ] = 1 - Φ[-.06] = 0.5239. S(909.09)/S(1000) = 0.5239/0.5 = 1.048. An increase of 4.8%. Comment: After inflation one has a LogNormal with µ = 6.9078 + ln(1.1), σ = 1.5174.
36.76. Assume for simplicity that the expected frequency is 5. ⇔ One loss of each size.
Contribution to each layer:
Loss    0-50    50-100    100-200    200-∞    Total
40      40      0         0          0        40
80      50      30        0          0        80
120     50      50        20         0        120
160     50      50        60         0        160
200     50      50        100        0        200
Total   240     180       180        0        600
For the next year, increase each size of loss by 10%:
Loss    0-50    50-100    100-200    200-∞    Total
44      44      0         0          0        44
88      50      38        0          0        88
132     50      50        32         0        132
176     50      50        76         0        176
220     50      50        100        20       220
Total   244     188       208        20       660
Trend for layer from 0 to 50 is: 244/240 - 1 = 1.7%. Trend for layer from 50 to 100 is: 188/180 - 1 = 4.4%. Trend for layer from 100 to 200 is: 208/180 - 1 = 15.6%. Comment: The limited losses in the layer from 0 to 50 increase slower than the overall rate of inflation 10%, while the excess losses in the layer from 200 to ∞ increase faster. The losses in middle layers, such as 50 to 100 and 100 to 200, can increase either slower or faster than the overall rate of inflation, depending on the particulars of the situation. 36.77. A. Y follows a Pareto Distribution with parameters: α = 2 and θ = (1.10)(100) = 110. Thus eY(k) = (k+θ)/(α-1) = k + 110. eX(k) = k + 100. eY(k) / eX(k) = (k+110) / (k+100) = 1 + 10/(k+110). Therefore, as k goes from zero to infinity, eY(k) / eX(k) goes from 1.1 to 1.
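The layer arithmetic in Solution 36.76 (and in Solution 36.96 below) can be automated with a small layer function. An illustrative Python sketch, not part of the original guide:

def layer(x, a, b):
    # dollars of the loss x falling in the layer from a to b
    return min(max(x - a, 0.0), b - a)

losses = [40, 80, 120, 160, 200]
for a, b in [(0, 50), (50, 100), (100, 200)]:
    before = sum(layer(x, a, b) for x in losses)
    after = sum(layer(1.1 * x, a, b) for x in losses)
    print((a, b), after / before - 1)   # about 1.7%, 4.4%, 15.6%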
36.78. a. The frequency of claims exceeding the franchise deductible is: {1 - F(1000)} 0.05 = (1 - 0.09) (0.05) = 4.55%. The average payment per payment for an ordinary deductible of 1000 is: (E[X] - E[X ∧ 1000]) / {1 - F(1000)} = (10,000 - 945) / (1 - 0.09) = 9950.55. The average payments per payment for a franchise deductible of 1000 is 1000 more: $10,950.55. b. After inflation, the average payment per payment for an ordinary deductible of 1000 is: (1.1)(E[X] - E[X ∧ 909]) / {1 - F(909)} = (1.1)(10,000 - 870) / (1 - 0.075) = 10,857.30. The average payments per payment for a franchise deductible of 1000 is 1000 more: $11,857.30. Subsequent to inflation, the frequency of payments is: (1 - 0.075) (0.05) = 4.625%, and the pure premium is: (4.625%)($11,857.30) = $548.40. Comment: Prior to inflation, the pure premium is: (4.55%)($10,950.55) = $498.25.
36.79. B. During 1994, there is a 100/1000 = 10% chance that nothing is paid. If there is a non-zero payment, it is uniformly distributed on (0, 900). Thus the mean amount paid is: 90%(450) = 405. The second moment of the amount paid is: (90%)(900²)/3 = 243,000. Thus in 1994, the standard deviation of the amount paid is: √(243,000 - 405²) = 281.02. In 1995, the losses are uniformly distributed from (0, 1050). During 1995, there is a 100/1050 chance that nothing is paid. If there is a non-zero payment it is uniformly distributed on (0, 950). Thus the mean amount paid is: (950/1050)(950/2) = 429.76. The second moment of the amount paid is: (950/1050)(950²)/3 = 272,182.5. Thus in 1995, the standard deviation of the amount paid is: √(272,182.5 - 429.76²) = 295.78. % increase in the standard deviation of amount paid is: 295.78/281.02 - 1 = 5.25%.
Alternately, the variance of the average payment per loss under a maximum covered loss of u and a deductible of d is: E[(X ∧ u)²] - E[(X ∧ d)²] - 2d{E[X ∧ u] - E[X ∧ d]} - {E[X ∧ u] - E[X ∧ d]}². With no maximum covered loss (u = ∞), this is: E[X²] - E[(X ∧ d)²] - 2d{E[X] - E[X ∧ d]} - {E[X] - E[X ∧ d]}². For the uniform distribution on (a, b), the limited moments are, for a ≤ x ≤ b:
E[(X ∧ x)^n] = ∫_a^x y^n/(b-a) dy + x^n S(x) = (x^(n+1) - a^(n+1))/{(n+1)(b-a)} + x^n(b-x)/(b-a) = {(n+1)x^n b - a^(n+1) - n x^(n+1)}/{(n+1)(b-a)}.
In 1994, a = 0, b = 1000, and d = 100. E[X ∧ 100] = {(2)(100)(1000) - 100²}/2000 = 95. E[(X ∧ 100)²] = {(3)(100²)(1000) - (2)100³}/3000 = 9333.33. The variance of the average payment per loss is: 1000²/3 - 9333.33 - (2)(100)(500 - 95) - (500 - 95)² = 78,975. Similarly, in 1995, E[X ∧ 100] = {(2)(100)(1050) - 100²}/{(2)(1050)} = 95.238. E[(X ∧ 100)²] = {(3)(100²)(1050) - (2)100³}/{(3)(1050)} = 9365.08. In 1995, the variance of the average payment per loss is: 1050²/3 - 9365.08 - (2)(100)(525 - 95.238) - (525 - 95.238)² = 87,487. % increase in the standard deviation of amount paid is: √(87,487/78,975) - 1 = 5.25%.
Comment: The second moment of the uniform distribution on (a, b) is: (b³ - a³)/{3(b-a)}. When a = 0, this is b²/3. The amount paid is a mixture of two distributions, one always zero and the other uniform. For example, in 1994, the amount paid is a 10%-90% mixture of a distribution that is always zero and a uniform distribution on (0, 900). The second moments of these distributions are zero and 900²/3 = 270,000. Thus the second moment of the amount paid is: (10%)(0) + (90%)(270,000) = 243,000.
36.80. C. Prior to inflation, 0.250 = F(1000) = 1 - e^(-1000/θ). ⇒ θ = 3476. After inflation, θ = (2)(3476) = 6952. F(1000) = 1 - e^(-1000/6952) = 0.134.
36.81. D. E[X ∧ 10] = -0.025(10²) + (1.475)(10) - 2.25 = 10.
E[X ∧ 11] = -0.025(11²) + (1.475)(11) - 2.25 = 10.95.
E[X ∧ 20] = -0.025(20²) + (1.475)(20) - 2.25 = 17.25.
E[X ∧ 22] = -0.025(22²) + (1.475)(22) - 2.25 = 18.10.
In 2000, there is a deductible of 11 and a maximum covered loss of 22, so the expected payments are: E[X ∧ 22] - E[X ∧ 11] = 18.10 - 10.95 = 7.15. A deductible of 11 and maximum covered loss of 22 in the year 2001, when deflated back to the year 2000, correspond to a deductible of 11/1.1 = 10 and a maximum covered loss of 22/1.1 = 20. Therefore, reinflating back to the year 2001, the expected payments in the year 2001 are: (1.1)(E[X ∧ 20] - E[X ∧ 10]) = (1.1)(17.25 - 10) = 7.975. The ratio of expected payments in 2001 over the expected payments in the year 2000 is: 7.975/7.15 = 1.115. Alternately, the insurerʼs average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]). c = 100%, L = 22, d = 11. r = .1 for the year 2001 and r = 0 for the year 2000. Then proceed as previously.
36.82. C. The expected frequency is: 10,000,000/2000 = 5000. E[X ∧ x] = {θ/(α-1)}{1 - (θ/(x+θ))^(α−1)} = (2000){1 - 2000/(x + 2000)} = 2000x/(x + 2000). E[X] - E[X ∧ 3000] = 2000 - 1200 = 800. ⇒ Expected losses paid by reinsurer: (5000)(800) = 4 million. The ceded premium is: (1.1)(4 million) = 4.4 million. Alternately, the Excess Ratio, R(x) = 1 - E[X ∧ x]/E[X]. For the Pareto, E[X] = θ/(α-1) and E[X ∧ x] = {θ/(α-1)}{1 - (θ/(x+θ))^(α−1)}. Therefore R(x) = (θ/(x+θ))^(α−1). In this case, θ = 2000 and α = 2, so R(x) = 2000/(x+2000). R(3000) = 40%. The expected excess losses are: (40%)(10,000,000) = 4 million. The ceded premium is: (1.1)(4 million) = 4.4 million.
36.83. E. In 2001 the Pareto has θ = 2000 and α = 2, so in 2002 the Pareto has parameters θ = (1.05)(2000) = 2100 and α = 2. In 2002, R(3000) = 2100/(3000+2100) = 41.18%. The expected losses in 2002 are: (1.05)(10 million) = 10.5 million. The expected excess losses in 2002 are: (41.18%)(10.5 million) = 4.324 million. The ceded premium in 2002 is: (1.1)(4.324 million) = 4.756 million. C 2002 / C2001 = 4.756/4.4 = 1.08. Alternately, the excess ratio at 3000 in 2002 is the same as the excess ratio at 3000/1.05 = 2857 in 2001. In 2001, R(2857) = 2000/(2857+2000) = 41.18%. Proceed as before. Alternately, the average payment per loss is: (1+r) c (E[X ∧ L/(1+r)] - E[X ∧ d/(1+r)]). In 2001 this is: E[X] - E[X ∧ 3000]. In 2002 this is: (1.05)(E[X] - E[X ∧ 3000/1.05]). C 2002 / C2001 = (avg. payment per loss in 2002)/(average payment per loss in 2001) = (1.05)(E[X] - E[X ∧ 2857])/(E[X] - E[X ∧ 3000]) = (1.05)(2000 - 1176)/(2000 - 1200) = 1.08. Comment: Over a fixed limit the excess losses increase more quickly than the overall inflation rate of 5%. 36.84. D. After severity increases by 50%: Probability Severity Payment with 100 deductible 0.25 60 0 0.25 120 20 0.25 180 80 0.25 300 200 Average payment per loss: (0 + 20 + 80 + 200)/4 = 75. Expected total payment = (300)(75) = 22,500 Comment: Expect 300 payments: 75@0, 75@ 20, 75@80, and 75@ 200, for a total of 22,500. 36.85. B. Prior to inflation, S(100) = e-100/200 = 0.6065. After inflation, severity is Exponential with θ = (1.2)(200) = 240. S(100) = e-100/240 = .6592. Percentage increase in the number of reported claims next year: (1.1)(.6592/.6065) - 1 = 19.6%. 36.86. D. After inflation, severity is Pareto with θ = (1.2)(1500) = 1800, and α = 4. Expected payment per loss: E[X] - E[X
∧ 100] = {θ/(α-1)} - {θ/(α-1)}{1 - (θ/(θ+100))^(α−1)} = {1800/(4 - 1)}(1800/(1800 + 100))^(4-1) = 510.16. Alternately, the average payment per loss in the later year is: (1+r) c (E[X ∧ u/(1+r)] - E[X ∧ d/(1+r)]) = (1.2)(1)(E[X] - E[X ∧ 100/1.2]) = 1.2{500 - 500(1 - (1500/1583.33)³)} = 510.16.
36.87. A. 1. True. 2. False. Should be α and (1+i)θ. In any case, α is a shape not a scale parameter. 3. False. Should be (1+i)θ; the scale parameter is multiplied by one plus the rate of inflation.
36.88. B. First year: E[X ∧ 5000] = (2000/(2-1)){1 - (2000/(2000 + 5000))^(2-1)} = 1429. Next year, d = 100, u = 5000, c = 80% and r = 4%, and the average payment per loss is: (1.04)(80%){E[X ∧ 5000/1.04] - E[X ∧ 100/1.04]} = 0.832{E[X ∧ 4807.7] - E[X ∧ 96.15]} = (.832)(2000){2000/(2000 + 96.15) - 2000/(2000 + 4807.7)} = 1099. Reduction is: 1 - 1099/1429 = 23.1%.
36.89. B. In 2004 losses follow a Pareto with α = 2 and θ = (1.2)(5) = 6. E[X] = 6/(2 - 1) = 6. E[X ∧ 10] = {θ/(α-1)}{1 - (θ/(θ+x))^(α−1)} = (6){1 - 6/(6 + 10)} = 3.75. LER(10) = E[X ∧ 10]/E[X] = 3.75/6 = .625 = 5/8.
36.90. D. Sthisyear(750000) = {400000/(400000 + 750000)}² = .12098. After inflation, the losses follow a Pareto distribution with α = 2 and θ = (1.1)(400,000) = 440,000. Snextyear(750000) = {440000/(440000 + 750000)}² = .13671. Snextyear(750000)/Sthisyear(750000) = .13671/.12098 = 1.130. Alternately, one can calculate the survival function at 750,000 next year by deflating the 750,000 to this year. 750000/1.1 = 681,818. Snextyear(750000) = Sthisyear(681,818) = {400000/(400000 + 681,818)}² = 0.13671. Proceed as before.
36.91. B. This is a Pareto Distribution with α = 3 and θ = 800. In 2006, losses follow a Pareto Distribution with α = 3 and θ = (1.08)(800) = 864. With a franchise deductible, the total amount of those losses of size 300 or less is eliminated, while the full amount of all losses of size greater than 300 is paid.
Losses eliminated = ∫_0^300 x f(x) dx = E[X ∧ 300] - 300 S(300) = (864/2){1 - (864/(300 + 864))²} - (300){864/(300 + 864)}³ = 71.30. Loss elimination ratio = Losses eliminated / mean = 71.30/(864/2) = 16.5%. Alternately, the expected payment per loss with an ordinary deductible would be: E[X] - E[X ∧ 300] = (864/2) - (864/2){1 - (864/(300 + 864))²} = 238.02. With the franchise deductible one pays 300 more on each loss of size exceeding 300 than under the ordinary deductible: 238.02 + 300 S(300) = 238.02 + (300){864/(300 + 864)}³ = 360.71. E[X] = 864/2 = 432. Loss Elimination Ratio is: 1 - 360.71/432 = 16.5%.
36.92. D. For a Pareto Distribution, E[X] - E[X ∧ d] = θ/(α−1) - {θ/(α−1)}{1 - (θ/(d + θ))^(α−1)} = θ^α/{(α-1)(d + θ)^(α−1)}. The premium in each year is: 1.2(E[X] - E[X ∧ 600]) = 1.2θ^α/{(α-1)(600 + θ)^(α−1)}. If the reinsurance covers the layer excess of d in ground up loss, then the reinsurance premium is: 1.1(E[X] - E[X ∧ d]) = 1.1θ^α/{(α-1)(d + θ)^(α−1)}. In 2005, the Pareto has α = 2 and θ = 3000. R2005 = .55 P2005. ⇒ (1.1)3000²/(d + 3000) = (.55)(1.2)3000²/3600. ⇒ (.66)(d + 3000) = (1.1)(3600). ⇒ d = 3000. In 2006, the losses follow a Pareto Distribution with α = 2 and θ = (3000)(1.2) = 3600. P2006 = (1.2)3600²/4200. R2006 = (1.1)3600²/6600. R2006/P2006 = (1.1)(4200)/{(1.2)(6600)} = 0.583. Comment: The higher layer increases more due to inflation, and therefore the ratio of R/P has to increase, thereby eliminating choices A, B, and C. One could have instead described the reinsurance as covering the layer excess of 600 + d in ground up loss, in which case d = 2400 and one obtains the same final answer. P2005 = 3000. R2005 = 1650. P2006 = 3703. R2006 = 2160.
36.93. D. The losses in dollars are all 30% bigger. For example, 10,000 euros ⇔ 13,000 dollars. This is mathematically the same as 30% uniform inflation. We get another LogNormal with σ = 2 the same, and µʼ = µ + ln(1+r) = 8 + ln(1.3) = 8.26. Comment: The mean in dollars should be 1.3 times the mean in euros. The mean in euros is: exp[8 + 2²/2] = 22,026. For the choices we get means of: exp[6.15 + 2.26²/2] = 6026, exp[7.74 + 2²/2] = 16,984, exp[8 + 2.6²/2] = 87,553, exp[8.26 + 2²/2] = 28,567, and exp[10.4 + 2.6²/2] = 965,113. Eliminating all but choice D.
36.94. A. In 2005, S(250) = (1000/1250)³ = .512. In 2006, the Pareto distribution has α = 3 and θ = (1.1)(1000) = 1100. In 2006, S(250) = (1100/1350)³ = .54097. Increase in expected number of claims: (14)(.54097 - .512) = 0.406 claims. Alternately, deflate 250 in 2006 back to 2005: 250/1.1 = 227.27. In 2005, S(227.27) = (1000/1227.27)³ = .54097. Proceed as before. Comment: We make no use of the fact that frequency is Poisson.
36.95. D. In 4 years, severity is Pareto with parameters α = 4 and θ = (1.06⁴)(3000) = 3787.
Under Policy R, the expected cost per loss is the mean: 3787/(4 - 1) = 1262.
Under Policy S, the expected cost per loss is: E[X ∧ 3000] - E[X ∧ 500] = {θ/(α−1)}{(θ/(θ+500))^(α−1) − (θ/(θ+3000))^(α−1)} = (1262){(3787/4287)³ - (3787/6787)³} = 651.
Difference is: 1262 - 651 = 611.
Alternately, under Policy S, the expected cost per loss, using the Pareto for year t, is:
(1.06⁴){E[X ∧ 3000/1.06⁴] - E[X ∧ 500/1.06⁴]} = (1.2625){E[X ∧ 2376] - E[X ∧ 396]}
= (1.2625)[{3000/(4-1)}{1 - (3000/5376)^(4-1)} - {3000/(4-1)}{1 - (3000/3396)^(4-1)}] = (1.2625){826 - 311} = 650.
Difference is: 1262 - 650 = 612.
Comment: This exam question should have said: “Policy S has a deductible of $500 and a maximum covered loss of $3,000.”
36.96. Compare the contributions to the layer from 100,000 to 200,000 before and after inflation:

Claim     Loss        Contribution to Layer    Inflated Loss    Contribution to Layer
A          35,000             0                    37,800                0
B         125,000        25,000                   135,000           35,000
C         180,000        80,000                   194,400           94,400
D         206,000       100,000                   222,480          100,000
E          97,000             0                   104,760            4,760
Total                   205,000                                    234,160

234,160/205,000 = 1.142. 14.2% effective trend on this layer.
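The layer arithmetic in 36.96 is easy to reproduce; the following is a small sketch of mine (the losses and the 8% trend are taken from the problem):

```python
# Sketch of the layer-trend calculation in 36.96.
losses = [35_000, 125_000, 180_000, 206_000, 97_000]
layer = lambda x: min(max(x - 100_000, 0), 100_000)   # layer from 100,000 to 200,000

before = sum(layer(x) for x in losses)
after = sum(layer(1.08 * x) for x in losses)
print(before, after, after / before)    # 205000, 234160, about 1.142
```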
Section 37, Lee Diagrams

A number of important loss distribution concepts can be displayed graphically. Important concepts will be illustrated using this graphical approach of Lee.276 While this material is not on your exam, graphically oriented students may benefit from looking at this material. You may find that it helps you to remember formulas that are used on your exam.
Below is shown a conventional graph of a Pareto Distribution with α = 4 and θ = 2400:
Exercise: For a Pareto Distribution with α = 4 and θ = 2400, what is F(1000)?
[Solution: F(x) = 1 - {θ/(θ + x)}^α. F(1000) = 1 - (2400/3400)⁴ = 0.752.]
In the conventional graph, the x-axis corresponds to size of loss, while the y-axis corresponds to probability. Thus for example, the above graph of a Pareto includes the point (1000, 0.752). In contrast, “Lee Diagrams” have the x-axis correspond to probability, while the y-axis corresponds to size of loss. 276
“The Mathematics of Excess of Loss Coverage and Retrospective Rating --- A Graphical Approach”, by Y.S. Lee, PCAS LXXV, 1988. Currently on the syllabus of CAS Exam 9. Lee cites “A Practical Guide to the Single Parameter Pareto Distribution”, by Steven Philbrick, PCAS 1985. Philbrick in turn points out the similarity to the treatment of Insurance Charges and Savings in “Fundamentals of Individual Risk Rating and Related Topics”, by Richard Snader.
Here is the Lee Diagram of a Pareto Distribution with α = 4 and θ = 2400:
For example, since F(1000) = .752, the point (.752, 1000) is on the curve. Note the way that the probability approaches a vertical asymptote of unity as the claim size increases towards infinity.277
Advantages of this representation of Loss Distributions include the intuitively appealing features:278
1. The mean is the area under the curve.279
2. A loss limit is represented by a horizontal line, and excess losses lie above the line.
3. Losses eliminated by a deductible lie below the horizontal line represented by the deductible amount.
4. After the application of a trend factor, the new loss distribution function lies above the prior distribution.
277 F(x) → 1, as x → ∞.
278 “A Practical Guide to the Single Parameter Pareto Distribution”, by Steven Philbrick, PCAS 1985.
279 As discussed below. In a conventional graph, the area under a distribution function is infinite, the area under the survival function is the mean, and the area under the density is one.
Means: One can compute the mean of this size of loss distribution as the integral from 0 to ∞ of y f(y) dy. As shown below, this is represented by summing narrow vertical strips, each of width f(y) dy and each of height y, the size of loss.280
[Lee Diagram: size of loss (0 to 5000) on the vertical axis, probability (0 to 1) on the horizontal axis; a narrow vertical strip of width f(y) dy and height y under the curve.]
Summing over all y gives the area under the curve. Thus the mean is the area under the curve.281
280
y = size of loss. x = F(y) = Prob[Y ≤ y]. f(y) = dF/dy = dx/dy. dx = (dx/dy)dy = f(y)dy. The width of each vertical strip is dx = f(y)dy. 281 In this case, the mean = θ/(α-1) = 2400/3 = 800.
Alternately, as shown below, one can compute the area under the curve by summing narrow horizontal strips, each of height dy and each of width 1 - F(y) = S(y), the survival function.282 Summing over all y gives the area under the curve.
[Lee Diagram: size of loss (0 to 5000) on the vertical axis, probability (0 to 1) on the horizontal axis; a narrow horizontal strip of height dy and width S(y) under the curve.]
The two integrals can be gotten from each other via integration by parts:
∫_0^∞ S(y) dy = ∫_0^∞ y f(y) dy = mean claim size.
Thus the area under the curve can be computed in either of two ways. The mean is either the integral of the survival function, via summing horizontal strips, or the integral of y f(y), via summing vertical strips. In the Pareto example:
∫_0^∞ {2400/(y + 2400)}⁴ dy = 800 = ∫_0^∞ y (4) 2400⁴/(y + 2400)⁵ dy = mean claim size.
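A numerical check of this equality, assuming NumPy and SciPy are available (this sketch is mine, not part of the Study Guide):

```python
# Numerical check that both integrals give the Pareto mean of 800.
import numpy as np
from scipy.integrate import quad

alpha, theta = 4.0, 2400.0
S = lambda y: (theta / (theta + y))**alpha                       # survival function
f = lambda y: alpha * theta**alpha / (y + theta)**(alpha + 1)    # density

horizontal, _ = quad(S, 0, np.inf)                  # sum of horizontal strips
vertical, _ = quad(lambda y: y * f(y), 0, np.inf)   # sum of vertical strips
print(horizontal, vertical)                         # both are 800 = theta/(alpha - 1)
```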
282 Each horizontal strip goes from x = F(y) to x = 1, and thus has width 1 - F(y) = S(y).
Mean vs. Median: The Lee Diagram below illustrates why, for loss distributions skewed to the right, i.e., with positive coefficient of skewness, the mean is usually greater than the median.283
The area under the curve is the mean; since the width is one, the average height of the curve is the mean. On the other hand, the median is the height at which the curve reaches a probability of one half, which in this case is less than the average height of the curve. The diagram would be similar for most distributions significantly skewed to the right, for which the median is less than the mean.
283
For distributions with skewness close to zero, the mean is usually close to the median. (For symmetric distributions, the mean equals the median.) Almost all loss distributions encountered in practical applications by casualty actuaries have substantial positive skewness, with the median significantly less than the mean.
Limited Expected Value: A loss of size less than 1000 contributes its size to the Limited Expected Value at 1000, E[X ∧ 1000]. A loss of size greater than or equal to 1000 contributes 1000 to E[X ∧ 1000]. These contributions to E[X ∧ 1000] correspond to vertical strips:
[Lee Diagram: size of loss (0 to 5000) on the vertical axis, probability (0 to 1) on the horizontal axis; a vertical strip of width f(500) dy and height 500 under the curve, with the horizontal line at 1000.]
The contribution to E[X ∧ 1000] of the small losses, as a sum of vertical strips, is the integral from 0 to 1000 of y f(y) dy. The contribution to E[X ∧ 1000] of the large losses is the area of the rectangle of height 1000 and width S(1000): 1000 S(1000).
These 2 pieces correspond to the 2 terms:
E[X ∧ 1000] = ∫_0^1000 y f(y) dy + 1000 S(1000).
Adding up these two types of contributions, E[X ∧ 1000] corresponds to the area under both the distribution curve and the horizontal line y = 1000:
In general, E[X ∧ L] ⇔ the area below the curve and also below the horizontal line at L.
Summing horizontal strips, the Limited Expected Value also is equal to the integral of the survival function from zero to the limit:
[Lee Diagram: size of loss (0 to 5000) on the vertical axis, probability (0 to 1) on the horizontal axis; a horizontal strip of height dy and width S(y), below the line at 1000.]
E[X ∧ 1000] = ∫_0^1000 S(t) dt.
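The two decompositions of E[X ∧ 1000] can be verified numerically. A small sketch of my own, assuming SciPy is available:

```python
# The two decompositions of E[X ^ 1000] for the Pareto with alpha = 4, theta = 2400.
from scipy.integrate import quad

alpha, theta = 4.0, 2400.0
S = lambda y: (theta / (theta + y))**alpha
f = lambda y: alpha * theta**alpha / (y + theta)**(alpha + 1)

small, _ = quad(lambda y: y * f(y), 0, 1000)    # vertical strips from the small losses
lev_vertical = small + 1000 * S(1000)           # plus the rectangle 1000 S(1000)
lev_horizontal, _ = quad(S, 0, 1000)            # horizontal strips: integral of S from 0 to 1000
print(lev_vertical, lev_horizontal)             # both about 518.6
```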
Losses Eliminated: A loss of size less than 1000 is totally eliminated by a deductible of 1000. For a loss of size greater than or equal to 1000, 1000 is eliminated by a deductible of size 1000. Summing these contributions as vertical strips, as shown below, the losses eliminated by a 1000 deductible correspond to the area under both the curve and y = 1000.
Losses Eliminated by 1000 Deductible ⇔ area under both the curve and y = 1000 ⇔ E[X ∧ 1000].
In general, the losses eliminated by a deductible d ⇔ the area below the curve and also below the horizontal line at d.
Loss Elimination Ratio (LER) = losses eliminated / total losses. The Loss Elimination Ratio is represented by the ratio of the area under both the curve and y = 1000 to the total area under the curve.284 One can calculate the area corresponding to the losses eliminated either by summing horizontal strips of width S(t) or by summing vertical strips of height t limited to x. Therefore:
LER(x) = ∫_0^x S(t) dt / µ = {∫_0^x t f(t) dt + x S(x)} / µ = E[X ∧ x] / E[X].
284 In the case of a Pareto with α = 4 and θ = 2400, the Loss Elimination Ratio at 1000 is 518.62 / 800 = 64.8%.
Excess Losses: A loss of size y > 1000 contributes y - 1000 to the losses excess of a 1000 deductible. Losses of size less than or equal to 1000 contribute nothing. Summing these contributions, as shown below, the losses excess of a 1000 deductible correspond to the area under the distribution curve but above y = 1000.
This area under the distribution curve but above y = 1000 is the losses excess of 1000, E[(X - 1000)+], the numerator of the Excess Ratio.285 The denominator of the Excess Ratio is the mean, the whole area under the distribution curve. Thus the Excess Ratio, R(1000) = E[(X - 1000)+] / E[X], corresponds to the ratio of the area under the curve but above y = 1000, to the total area under the curve.286 Since the Losses Eliminated by a 1000 deductible and the Losses Excess of 1000 sum to the total losses, E[(X - 1000)+] = Losses Excess of 1000 = total losses - losses limited to 1000 ⇔ E[X] - E[X ∧ 1000]. LER(1000) + R(1000) = 1.
285 In the case of a Pareto with α = 4 and θ = 2400, the area under the curve and above the line y = 1000 is: θ^α (θ+y)^(1−α) / (α−1) = 2400⁴ / {(3400³)(3)} = 281.38.
286 In the case of a Pareto with α = 4 and θ = 2400, the Excess Ratio at 1000 is: 281.38 / 800 = 35.2% = (2400/3400)³ = {θ/(θ+x)}^(α−1). R(1000) = 1 - LER(1000) = 1 - .648 = .352.
E[(X - 1000)+] = ∫_1000^∞ S(t) dt = ∫_1000^∞ (t - 1000) f(t) dt.
The first integral ⇔ summing of horizontal strips. The second integral ⇔ summing of vertical strips. E[(X - 1000)+] ⇔ area under the distribution curve but above y = 1000.
In general, the losses excess of u ⇔ the area below the curve and also above the horizontal line at u.
As was done before, one can divide the limited losses into two Areas, A and B.
Area A = ∫_0^1000 t f(t) dt = 271.
Area B = (1000)S(1000) = (1000)(.248) = 248. Area A + Area B = E[X ∧ 1000] = 519. Area C = E[X] - E[X ∧ 1000] = 800 - 519 = 281. Excess Ratio at 1000 = C / (A + B + C) = 281/ 800 = 35%.
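A hedged numerical sketch of mine reproducing Areas A, B, and C for this Pareto (assuming SciPy is available):

```python
# Areas A, B, C for the Pareto with alpha = 4, theta = 2400.
import numpy as np
from scipy.integrate import quad

alpha, theta = 4.0, 2400.0
S = lambda y: (theta / (theta + y))**alpha
f = lambda y: alpha * theta**alpha / (y + theta)**(alpha + 1)
mean = theta / (alpha - 1)                       # 800

A, _ = quad(lambda t: t * f(t), 0, 1000)         # about 270
B = 1000 * S(1000)                               # about 248
C, _ = quad(S, 1000, np.inf)                     # about 281
print(A, B, C, C / mean)                         # excess ratio at 1000 is about 0.35
```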
Mean Excess Loss (Mean Residual Life): The Mean Excess Loss can be written as the excess losses divided by the survival function:
e(x) = ∫_x^∞ S(t) dt / S(x) = ∫_x^∞ (t - x) f(t) dt / S(x).
For example, the losses excess of a 1000 deductible correspond to the area under the distribution curve but above y = 1000:
[Lee Diagram: size of loss (0 to 5000) on the vertical axis, probability on the horizontal axis; the region labeled “Losses Excess of 1000” lies above the line at 1000 and under the curve, with width S(1000) = 0.248.]
This area under the distribution curve but above y = 1000 is the numerator of the Mean Excess Loss.287 The first integral above corresponds to the summing of horizontal strips. The second integral above corresponds to the summing of vertical strips. The denominator of the Mean Excess Loss is the survival function S(1000) = .248, which is the width along the horizontal axis of the area corresponding to the numerator. Thus the Mean Excess Loss, e(1000), corresponds to the average height of the area under the curve but above y = 1000.288 For example, in this case that average height is e(1000) = 1133. However, since the curve extends vertically to infinity, it is difficult to use this type of diagram to distinguish the mean excess loss, particularly for heavy-tailed distributions such as the Pareto.
287 The numerator of the Mean Excess Loss is the same as the numerator of the Excess Ratio. In the case of a Pareto with α = 4 and θ = 2400, the area under the curve and above the line y = 1000 is: θ^α (θ+y)^(1−α) / (α−1) = 2400⁴ / {(3400³)(3)} = 281.38.
288 In the case of a Pareto with α = 4 and θ = 2400, the Mean Excess Loss at 1000 is 281.38 / .2483 = 1133, since S(x) = {θ/(θ+x)}^α = (2400/3400)⁴ = .2483. Alternately, for the Pareto e(x) = (θ+x)/(α−1) = 3400/3 = 1133.
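The mean excess loss at 1000 can also be checked numerically; the following sketch is mine, not the Study Guide's:

```python
# Mean excess loss at 1000 for the Pareto with alpha = 4, theta = 2400.
import numpy as np
from scipy.integrate import quad

alpha, theta = 4.0, 2400.0
S = lambda t: (theta / (theta + t))**alpha

excess, _ = quad(S, 1000, np.inf)             # numerator: losses excess of 1000, about 281.4
e_1000 = excess / S(1000)                     # divide by the width S(1000) = 0.248
print(e_1000, (theta + 1000) / (alpha - 1))   # both 1133.3; the Pareto closed form is e(x) = (theta + x)/(alpha - 1)
```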
Layers of Loss: Below is shown the effect of imposing a 3000 maximum covered loss:
Amount paid with a 3000 maximum covered loss ⇔ the losses censored from above at 3000
⇔ E[X ∧ 3000] ⇔ the area under both the curve and the horizontal line y = 3000. The Layer of Loss between 1000 and 3000 would be those dollars paid in the presence of both a 1000 deductible and a 3000 maximum covered loss. As shown below, the layer of losses from 1000 to 3000 is the area under the curve but between the horizontal lines y = 1000 and y = 3000 ⇔ Area B.
Area A ⇔ Losses excess of 3000 ⇔ Losses not paid due to 3000 maximum covered loss. Area C ⇔ Losses limited to 1000 ⇔ Losses not paid due to 1000 deductible. Area B ⇔ Layer from 1000 to 3000
⇔ Losses paid with 1000 deductible and 3000 maximum covered loss.
Summing horizontal strips, Area B can be thought of as the integral of the survival function from 1000 to 3000:
Layer of losses from 1000 to 3000 = ∫_1000^3000 S(t) dt = E[X ∧ 3000] - E[X ∧ 1000].
This area is also equal to the difference of two limited expected values: the area below the curve and y = 3000, E[X ∧ 3000], minus the area below the curve and y = 1000, E[X ∧ 1000].
In general, the layer from d to u ⇔ the area under the curve but also between the horizontal line at d and the horizontal line at u.
Summing vertical strips, this same area can also be expressed as the sum of an integral and the area of a rectangle:
[Lee Diagram: size of loss (0 to 5000) on the vertical axis, probability on the horizontal axis; a vertical strip of width f(t) dt and height t - 1000 between the lines at 1000 and 3000.]
Layer of losses from 1000 to 3000 = ∫_1000^3000 (t - 1000) f(t) dt + (3000 - 1000) S(3000).
For the Pareto example, losses excess of 3000 = E[X] - E[X ∧ 3000] = 2400/3 - (2400/3){1 - (2400/(2400 + 3000))³} = 800 - 729.767 = 70.233.
Losses limited to 1000 = E[X ∧ 1000] = (2400/3){1 - (2400/(2400 + 1000))³} = 518.624.
Layer from 1000 to 3000 = E[X ∧ 3000] - E[X ∧ 1000] = 729.767 - 518.624 = 211.143.
(Losses limited to 1000) + (Layer from 1000 to 3000) + (Losses excess of 3000) = 518.624 + 211.143 + 70.233 = 800 = Mean ⇔ total losses.
Alternately, the layer from 1000 to 3000 = ∫_1000^3000 (t - 1000) f(t) dt + (3000 - 1000) S(3000)
= ∫_1000^3000 (t - 1000) (4) 2400⁴/(t + 2400)⁵ dt + (2000){2400/(3000 + 2400)}⁴ = 133.106 + 78.037 = 211.143.
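A short numerical check of this two-term decomposition (my sketch, assuming SciPy is available):

```python
# Two-term decomposition of the layer from 1000 to 3000 for the Pareto with alpha = 4, theta = 2400.
from scipy.integrate import quad

alpha, theta = 4.0, 2400.0
S = lambda t: (theta / (theta + t))**alpha
f = lambda t: alpha * theta**alpha / (t + theta)**(alpha + 1)
lev = lambda x: (theta / (alpha - 1)) * (1 - (theta / (theta + x))**(alpha - 1))

strip, _ = quad(lambda t: (t - 1000) * f(t), 1000, 3000)   # about 133.1
rectangle = 2000 * S(3000)                                 # about 78.0
print(strip + rectangle, lev(3000) - lev(1000))            # both about 211.1
```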
The area below the Pareto distribution curve, which equals the mean of 800, can be divided into four pieces, where the layer from 1000 to 3000 has been divided into the two terms calculated above:
Exercise: For a Pareto Distribution with α = 4 and θ = 2400, determine the expected losses in the following layers: 0 to 500, 500 to 1000, 1000 to 1500, and 1500 to 2000.
[Solution: E[X ∧ x] = {θ/(α−1)}{1 - (θ/(θ + x))^(α−1)} = 800{1 - (2400/(2400 + x))³}.
E[X ∧ 500] = 347. E[X ∧ 1000] = 519. E[X ∧ 1500] = 614. E[X ∧ 2000] = 670.
Expected Losses in layer from 0 to 500: E[X ∧ 500] = 347.
Expected Losses in layer from 500 to 1000: E[X ∧ 1000] - E[X ∧ 500] = 519 - 347 = 172.
Expected Losses in layer from 1000 to 1500: E[X ∧ 1500] - E[X ∧ 1000] = 614 - 519 = 95.
Expected Losses in layer from 1500 to 2000: E[X ∧ 2000] - E[X ∧ 1500] = 670 - 614 = 56.]
For a given width, lower layers of loss are larger than higher layers of loss.289 This is illustrated in the following Lee Diagram, and checked numerically in the sketch after the footnote below:
[Lee Diagram: size of loss (0 to 3000) on the vertical axis, probability on the horizontal axis; Areas A, B, C, and D are the layers 0-500, 500-1000, 1000-1500, and 1500-2000 under the curve.]
Area A > Area B > Area C > Area D.
289
Therefore, incremental increased limits factors decrease as the limit increases. These ideas are discussed in “On the Theory of Increased Limits and Excess of Loss Pricing,” by Robert Miccolis, with discussion by Sheldon Rosenberg, PCAS 1997.
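The decreasing layer sizes in the exercise above can be confirmed with a few lines of code; this sketch is mine:

```python
# Expected losses by layer for the Pareto with alpha = 4, theta = 2400.
alpha, theta = 4.0, 2400.0
lev = lambda x: (theta / (alpha - 1)) * (1 - (theta / (theta + x))**(alpha - 1))

limits = [0, 500, 1000, 1500, 2000]
for lo, hi in zip(limits, limits[1:]):
    # each 500-wide layer is smaller than the one below it
    # (about 347, 172, 95, and 57 here; the exercise's 56 comes from rounding the LEVs first)
    print(lo, hi, round(lev(hi) - lev(lo), 1))
```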
Various Formulas for a Layer of Loss:290
In the above diagram, the layer from a to b is: Area F + Area G. There are various different formulas for the layer from a to b.

Algebraic Expression for Layer from a to b             Corresponding Areas on the Lee Diagram
E[X ∧ b] - E[X ∧ a]                                     (C + D + E + F + G) - (C + D + E)
E[(X - a)+] - E[(X - b)+]                               (F + G + H) - H
∫_a^b (y - a) f(y) dy + (b - a) S(b)                    F + G
∫_a^b y f(y) dy + (b - a) S(b) - a{F(b) - F(a)}         (D + F) + G - D
∫_a^b y f(y) dy + b S(b) - a S(a)                       (D + F) + (E + G) - (D + E)
290
See pages 58-59 of “The Mathematics of Excess of Loss Coverage and Retrospective Rating --- A Graphical Approach” by Y.S. Lee, PCAS LXXV, 1988.
Average Sizes of Losses in an Interval: As shown below, the dollars of loss on losses of size less than 1000 correspond to the area under the curve and to the left of the vertical line x = F(1000) = .7517:
Summing up vertical strips, this left hand area corresponds to the integral of y f(y) from 0 to 1000. As discussed previously, this area is the contribution of the small losses, one of the two pieces making up E[X ∧ 1000]. The other piece of E[X ∧ 1000] was the contribution of the large losses, 1000 S(1000). Thus the dollars of loss on losses of size less than 1000 can also be calculated as E[X ∧ 1000] - 1000 S(1000) or as the difference of the corresponding areas. In this case, the area below the curve and to the left of x = F(1000) is:
∫_0^1000 y f(y) dy = E[X ∧ 1000] - 1000 S(1000) = 518.62 - 248.27 = 270.35.291
Therefore, the average size of the losses of size less than 1000 is: 270.35 / F(1000) = 270.35 / .75173 = 359.64. In general, the losses of size a to b ⇔ the area below the curve and also between the vertical line at a and the vertical line at b.
In the case of a Pareto with α = 4 and θ = 2400, E[X ∧ 1000] = (2400/3){1 - (2400/3400)3 } = 518.62, and S(1000) = (2400/3400)4 = .24827. 291
As shown below, the dollars of loss on losses of size between 1000 and 2000 correspond to the area under the curve and between the vertical lines x = F(1000) = 0.7517 and x = F(2000) = 0.9115:
Summing up vertical strips, this area corresponds to the integral of y f(y) from 1000 to 2000. This area between the two vertical lines can also be thought of as the difference between the area to the left of x = F(2000) and that to the left of x = F(1000). In other words, the dollars of loss on losses between 1000 and 2000 are the difference between the dollars of loss on losses of size less than 2000 and the dollars of loss on losses of size less than 1000. In this case, the area below the curve and to the left of x = F(2000) is: E[X ∧ 2000] - 2000 S(2000) = 670.17 - (2000)(.088519) = 493.13. The area below the curve and to the left of x = F(1000) is: E[X ∧ 1000] - 1000 S(1000) = 518.62 - 248.27 = 270.35. The area between the vertical lines is the difference:
∫_1000^2000 y f(y) dy = ∫_0^2000 y f(y) dy - ∫_0^1000 y f(y) dy = 493.13 - 270.35 = 222.78.
The average size of the losses of size between 1000 and 2000 is: 222.78 / {F(2000) -F(1000)} = 222.78 / (0.9115 - 0.7517) = 1394. The numerator is the area between the vertical lines at F(2000) and F(1000) and below the curve, while the denominator is the width of this area. The ratio is the average height of this area, the average size of the losses of size between 1000 and 2000.
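A numerical sketch of mine for this average severity in an interval, assuming SciPy is available:

```python
# Average size of losses between 1000 and 2000 for the Pareto with alpha = 4, theta = 2400.
from scipy.integrate import quad

alpha, theta = 4.0, 2400.0
F = lambda x: 1 - (theta / (theta + x))**alpha
f = lambda x: alpha * theta**alpha / (x + theta)**(alpha + 1)

dollars, _ = quad(lambda y: y * f(y), 1000, 2000)     # about 222.8
print(dollars / (F(2000) - F(1000)))                  # about 1394
```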
Expected Amount by Which Losses are Less than a Given Value: Assume we take 1000 - x for x ≤ 1000 and 0 for x > 1000. This is the amount by which X is less than 1000, or (1000 - X)+. As shown below for various x < 1000, (1000 - X)+ is the height of a vertical line from the curve to 1000, or from x up to 1000:
The expected amount by which X is less than 1000, E[(1000 - X)+], is Area A below:
Area B = E[X ∧ 1000]. Area A + Area B = area of a rectangle of width 1 and height 1000 = 1000.
E[(1000 - X)+] = the expected amount by which losses are less than 1000 = Area A = 1000 - Area B = 1000 - E[X ∧ 1000].
We can also write Area A as a sum of horizontal strips: ∫_0^1000 F(x) dx.
In general, E[(d - X)+] = ∫_0^d F(x) dx = ∫_0^d {1 − S(x)} dx = d - E[X ∧ d].
E[(X - d)+] versus E[(d - X)+]: In the following Lee Diagram, Area A = E[(1000 - X)+] and Area C = E[(X - 1000)+].
Area A + Area B = a rectangle of height 1000 and width 1. Therefore, A + B = 1000. B = 1000 - A = 1000 - E[(1000 - X)+]. Area B + Area C = area under the curve. Therefore, B + C = E[X]. B = E[X] - C = E[X] - E[(X - 1000)+]. Therefore, 1000 - E[(1000 - X)+] = E[X] - E[(X - 1000)+]. Therefore, E[(X - 1000)+] - E[(1000 - X)+] = E[X] - 1000 = E[X - 1000]. In general, E[(X-d)+] - E[(d-X)+] = E[X] - d = E[X - d].
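This identity is easy to verify for the Pareto example at d = 1000; the following check is mine, not part of the original text:

```python
# Check E[(X-d)+] - E[(d-X)+] = E[X] - d at d = 1000 for the Pareto with alpha = 4, theta = 2400.
alpha, theta = 4.0, 2400.0
mean = theta / (alpha - 1)                                                  # 800
lev = lambda x: (theta / (alpha - 1)) * (1 - (theta / (theta + x))**(alpha - 1))

d = 1000
excess = mean - lev(d)        # E[(X-d)+], about 281.4
shortfall = d - lev(d)        # E[(d-X)+], about 481.4
print(excess - shortfall, mean - d)     # both -200
```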
Payments Subject to a Minimum:
Max[X, 1000] = A + B + C = (A + B) + C = 1000 + (E[X] - E[X ∧ 1000]).
Max[X, 1000] = A + B + C = A + (B + C) = (1000 - E[X ∧ 1000]) + E[X].
Payments Subject to both a Minimum and a Maximum:
Min[Max[X, 1000], 3000] = A + B + C = (A + B) + C = 1000 + (E[X ∧ 3000] - E[X ∧ 1000]). Min[Max[X, 1000], 3000] = A + B + C = A + (B + C) = (1000 - E[X ∧ 1000]) + E[X ∧ 3000].
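Both payment formulas can be evaluated directly for the Pareto with α = 4 and θ = 2400; a small sketch of mine:

```python
# Payments subject to a minimum, and to both a minimum and a maximum.
alpha, theta = 4.0, 2400.0
mean = theta / (alpha - 1)
lev = lambda x: (theta / (alpha - 1)) * (1 - (theta / (theta + x))**(alpha - 1))

print(1000 + mean - lev(1000))          # E[Max[X, 1000]], about 1281
print(1000 + lev(3000) - lev(1000))     # E[Min[Max[X, 1000], 3000]], about 1211
```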
Inflation: After 50% uniform inflation is applied to the original Pareto distribution, with α = 4 and θ = 2400, the revised distribution is also a Pareto with α = 4 but with θ = (1.5)(2400) = 3600. Here are the original Pareto (solid) and the Pareto after inflation (dashed):
The increase in the losses due to inflation corresponds to the area between the distribution curves. The total area under the new curve is: (1.5)(800) = 3600 / (4-1) = 1200. The area under the old curve is 800. The increase in losses is the difference = 1200 - 800 = (.5)(800) = 400. The increase in losses is 50% from 800 to 1200.
The losses excess of 1000 are above the horizontal line at 1000. Prior to inflation the excess losses are below the original Pareto (solid), Area E. After inflation the excess losses are below the revised Pareto (dashed), Areas D + E:
Area D represents the increase in excess losses due to inflation, while Area A represents the increase in limited losses due to inflation. Note that the excess losses have increased more quickly (as a percent) than the total losses, while the losses limited to 1000 have increased less quickly (as a percent) than the total losses. The loss excess of 1000 for a Pareto with α = 4 and θ = 3600 is: 1200 - E[X ∧ 1000] = 1200 - 624.80 = 575.20, Areas D + E above. The loss excess of 1000 for a Pareto with α = 4 and θ = 2400 is: 800 - E[X ∧ 1000] = 800 - 518.62 = 281.38, Area E above. Thus under uniform inflation of 50%, in this case the losses excess of 1000 have increased by 104.4%, from 281.38 to 575.20. The increase in excess losses is Area D above, 575.20 - 281.38 = 293.82. The loss limited to 1000 for a Pareto with α = 4 and θ = 3600 is: E[X ∧ 1000] = 624.80. The loss limited to 1000 for a Pareto with α = 4 and θ = 2400 is: E[X ∧ 1000] = 518.62. Thus under uniform inflation of 50%, in this case the losses limited to 1000 have increased by only 20.5%, from 518.62 to 624.80. The increase in limited losses is Area A above, 624.80 - 518.62 = 106.18. The total losses increase from 800 to 1200; Area A + Area D = 293.82 + 106.18 = 400.
Another version of this same Lee Diagram, showing the numerical areas:
% Increase in Limited Losses is: 106/519 = 20% < 50%. % Increase in Excess Losses is: 294/281 = 105% > 50%. % Increase in Total Losses is: (106 + 294)/ (519 + 281) = 400/800 = 50%.
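These percentages can be reproduced with the Pareto limited expected value formula; the sketch below is mine:

```python
# Limited and excess losses at 1000, before and after 50% uniform inflation.
def lev(x, alpha, theta):
    return (theta / (alpha - 1)) * (1 - (theta / (theta + x))**(alpha - 1))

alpha = 4.0
for theta in (2400.0, 3600.0):                 # before and after 50% inflation
    mean = theta / (alpha - 1)
    limited = lev(1000, alpha, theta)
    print(theta, round(limited, 1), round(mean - limited, 1))
# limited losses: about 518.6 -> 624.8 (+20.5%); excess losses: about 281.4 -> 575.2 (+104.4%)
```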
In the earlier year, the losses limited to 2000 are below the horizontal line at 2000, and below the solid Pareto, Area A in the Lee Diagram below. In the later year, the losses limited to 3000 are below the horizontal line at 3000, and below the dotted Pareto, Areas A + B + C in the Lee Diagram below. Every loss in combined Areas A + B + C is exactly 1.5 times the height of a corresponding loss in Area A.
Showing that E_later year[X ∧ 3000] = 1.5 E_earlier year[X ∧ 3000/1.5] = 1.5 E_earlier year[X ∧ 2000].
Call Options:292 The expected payoff on a European Call on a stock is equal to E[(ST - K)+], where ST is the future price of the stock at expiration of the option, time T. Let F(x) be the distribution of the future stock price at time T.293 E[(X - K)+] is the expected losses excess of K, and corresponds to the area above the horizontal line at height K and also below the curve graphing F(x) in the following Lee Diagram:
As K increases, the area above the horizontal line at height K decreases; in other words, the value of the call decreases as K increases.
292 Not on the syllabus of this exam. See “Mahlerʼs Guide to Financial Economics.”
293 A common model is that the future prices of a stock follow a LogNormal Distribution.
For an increase in K of ΔK, the value of the call decreases by Area A in the following Lee Diagram:
[Lee Diagram: stock price on the vertical axis, probability (0 to 1) on the horizontal axis; Area A lies between the horizontal lines at K and K + ΔK, bounded on the left by the curve.]
The absolute change in the value of the call, Area A, is smaller than a rectangle of height ΔK and width 1 - F(K). Thus Area A is smaller than ΔK {1 - F(K)} ≤ ΔK. Thus a change of ΔK in the strike price results in an absolute change in the value of the call option smaller than ΔK. The following Lee Diagram shows the effect of raising the strike price by fixed amounts.
The successive absolute changes in the value of the call are represented by Areas A, B, C, and D. We see that the absolute changes in the value of the call get smaller as the strike price increases.
Put Options:294 The expected payoff of a European put is E[(K - ST)+], where ST is the future price of the stock at expiration of the option, time T. Let F(x) be the distribution of the future stock price at time T. Then this expected payoff corresponds to Area P below the horizontal line at height K and also above the curve graphing F(x) in the following Lee Diagram:
As K increases, the area below the horizontal line at height K increases; in other words, the value of the put increases as K increases.
294
Not on the syllabus of this exam. See “Mahlerʼs Guide to Financial Economics.”
For an increase in K of ΔK, the value of the put increases by Area A in the following Lee Diagram:
[Lee Diagram: stock price on the vertical axis, probability (0 to 1) on the horizontal axis; Area A lies between the horizontal lines at K and K + ΔK, bounded on the right by the curve.]
The change in the value of the put, Area A, is smaller than a rectangle of height ΔK and width F(K). Thus Area A is smaller than ΔK F(K) ≤ ΔK. Thus a change of ΔK in the strike price results in a change in the value of the put option smaller than ΔK. The following Lee Diagram shows the effect of raising the strike price by fixed amounts.
The successive changes in the value of the put are represented by Areas A, B, C, and D. We see that the changes in the value of the put get larger as the strike price increases.
Tail Value at Risk (TVaR):295 The Tail Value at Risk of a loss distribution is defined as: TVaRp ≡ E[X | X > πp], where the percentile πp is such that F(πp) = p.
Exercise: For a Pareto Distribution with α = 4 and θ = 2400, determine π0.90.
[Solution: .90 = 1 - {2400/(2400 + x)}⁴. ⇒ x = 1868.]
TVaR.90 is the average size of those losses of size greater than π0.90 = 1868. The denominator of TVaR.90 is: 1 - 0.90 = 0.10. The numerator of TVaR.90 is Area A + Area B in the following Lee Diagram:
Therefore, TVaR.90 is the average height of Areas A + B. Area A has height π0.90 = 1868. Area B is the expected losses excess of π0.90 = 1868. The average height of Area B is the mean excess loss, e(1868) = e(π0.90). Therefore, TVaR.90 = π0.90 + e(π0.90). In general, TVaRp = πp + e(πp ). 295
See “Mahlerʼs Guide to Risk Measures.”
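A short check of TVaR at p = 0.90 for this Pareto (my sketch, using the closed-form Pareto percentile and mean excess loss):

```python
# TVaR at p = 0.90 for the Pareto with alpha = 4, theta = 2400.
alpha, theta = 4.0, 2400.0

pi_90 = theta * ((1 - 0.90)**(-1 / alpha) - 1)   # percentile: about 1868
e_pi = (theta + pi_90) / (alpha - 1)             # Pareto mean excess loss at the percentile: about 1423
print(pi_90, pi_90 + e_pi)                       # TVaR(0.90) is about 3291
```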
Problems: Use the following information for the next 26 questions: The size of loss distribution F(x), with corresponding survival function S(x) and density f(x), is shown in the following diagram, with probability along the horizontal axis and size of loss along the vertical axis. Express each of the stated quantities algebraically in terms of the six labeled areas in the diagram: α, β, γ, δ, ε, η.
η
5
ε γ 2
0
β
α 0
F(2)
δ F(5) 1
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
37.1 (1 point) E[X]. 37.2 (1 point) Losses from claims of size less than 2. 37.3 (1 point) Portion of total losses in the layer from 0 to 2. 2
37.4 (1 point)
∫ x dF(x)
+ 2{1 - F(2)}.
0
37.5 (1 point) Portion of total losses from claims of size more than 2. 37.6 (1 point) Portion of total losses in the layer from 0 to 5. 37.7 (1 point) E[X ∧ 5]. 37.8 (1 point) Portion of total losses from claims of size less than 5. 37.9 (1 point) Portion of total losses in the layer from 2 to 5. 37.10 (1 point) R(2) = excess ratio at 2 = 1 - LER(2). ∞
37.11 (1 point)
∫ x dF(x) . 5
37.12 (1 point) Portion of total losses in the layer from 2 to ∞. 37.13 (1 point) LER(5) = loss elimination ratio at 5. 37.14 (1 point) 2(F(5)-F(2)). 37.15 (1 point) Portion of total losses from claims of size between 2 and 5. ∞
37.16 (1 point)
∫ S(t) dt . 5
37.17 (1 point) LER(2) = loss elimination ratio at 2.
Page 700
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 701
∞
37.18 (1 point)
∫ (t - 5) f(t) dt . 5
37.19 (1 point) 2S(5). 37.20 (1 point) R(5) = excess ratio at 5 = 1 - LER(5). 37.21 (1 point) e(5) = mean excess loss at 5. 5
37.22 (1 point)
∫ (t - 2) f(t) dt . 2
37.23 (1 point) Losses in the layer from 5 to ∞. 5
37.24 (1 point)
∫ (1 - F(t)) dt . 2
37.25 (1 point) 3S(5). 37.26 (1 point) e(2) = mean excess loss at 2.
37.27 (2 points) Using Leeʼs “The Mathematics of Excess of Loss Coverages and Retrospective Rating -- A Graphical Approach,” show graphically why the limited expected value increases at a decreasing rate as the limit is increased. Label all axes and explain your reasoning in a brief paragraph. 37.28 (2 points) Losses follow an Exponential Distribution with mean 500. Using Leeʼs “The Mathematics of Excess of Loss Coverages and Retrospective Rating -A Graphical Approach,” draw a graph to show the expected losses from those losses of size 400 to 800. Label all axes.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 702
Use the following information for the next five questions: Prior to the effects of any maximum covered loss or deductible, losses follow a Weibull Distribution with θ = 300 and τ = 1/2. Using Leeʼs “The Mathematics of Excess of Loss Coverages and Retrospective Rating -- A Graphical Approach,” draw a graph to show the expected payments. Label all axes. 37.29 (1 point) With no deductible and no maximum covered loss. 37.30 (1 point) With a 500 deductible and no maximum covered loss. 37.31 (1 point) With no deductible and a 1500 maximum covered loss. 37.32 (1 point) With a 500 deductible and a 1500 maximum covered loss. 37.33 (1 point) With a 500 franchise deductible and no maximum covered loss. 37.34 (2 points) You are given the following graph of the cumulative loss distribution: 2500 2000 1500 B 1000 500
A
0.63
• Size of the area labeled A = 377. • Size of the area labeled B = 139. Calculate the loss elimination ratio at 1000. A. Less than 0.6 B. At least 0.6, but less than 0.7 C. At least 0.7, but less than 0.8 D. At least 0.8, but less than 0.9 E. At least 0.9
1
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 703
37.35 (3 points) The following graph is of the cumulative loss distribution in 2001: x
7500
P 5000
Q 3333
R 1500
T
1000 667
U V 1
There is a total of 50% inflation between 2001 and 2008. A policy in 2008 has a 1000 deductible and a 5000 maximum covered loss. Which of the following represents the expected size of loss under this policy? A. 1.5(Q + R + T) B. 1.5(R + T + U) C. (P + Q + R)/1.5 D. (Q + R + T)/1.5 E. None of A, B, C, or D.
Prob.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 704
37.36 (2 points) The size of loss distribution is shown in the following diagram, with probability along the horizontal axis and size of loss along the vertical axis. Which of the following represents the expected losses under a policy with a franchise deductible of 2 and a maximum covered loss of 5? A. γ + ε
B. α + β + γ
C. δ + ε + η
D. β + γ + δ + ε
E. β + γ + δ + ε + η
η
5
ε γ 2
0
β
α 0
F(2)
δ F(5) 1
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 705
37.37 (2 points) The following graph shows the distribution function, F(x), of loss severities. x P
6250 Q
5000
R
4000 T
1
F(x)
A policy has a 5000 maximum covered loss and an 80% coinsurance. Which of the following represents the expected size of loss under this policy? A. Q + R B. R + T C. Q + R + T D. P + Q + R + T E. None of A, B, C, or D.
37.38 (3 points) For a Pareto Distribution with α = 4 and θ = 3, draw a Lee Diagram, showing the curtate expectation of life, e0 .
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 706
For the next nine questions, use the following graph of the cumulative loss distribution: 2000
1500
1000
B 500 A 0.323 0.762 1 Size of the area labeled A = 107. Size of the area labeled B = 98. The mean size of loss is 750. Calculate the following items: 37.39 (1 point) The average size of loss for those losses of size less than 500. 37.40 (1 point) The loss elimination ratio at 500. 37.41 (1 point) The average payment per loss with a deductible of 500 and a maximum covered loss of 1000. 37.42 (1 point) The mean excess loss at 500. 37.43 (1 point) The average size of loss for those losses of size between 500 and 1000. 37.44 (1 point) The loss elimination ratio at 1000. 37.45 (1 point) The average payment per payment with a deductible of 500 and a maximum covered loss of 1000. 37.46 (1 point) The mean excess loss at 1000. 37.47 (1 point) The average size of loss for those losses of size greater than 1000.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 707
37.48 (3 points) You are given the following graph of cumulative distribution functions. Size
L P
L/(1+r)
1
Prob.
The thicker curve is the cumulative distribution function for the size of loss in a later year, F(x), with corresponding Survival Function S(x) and density f(x). There is total inflation of r between this later year and an earlier year. The thinner curve is the cumulative distribution function for the size of loss in this earlier year. Which of the following is an expression for Area P? L
A.
∫
S(x) dx - S(L)Lr.
L/(1+r) L
B.
∫
x f(x) dx - L{S(L/(1+r)) - S(L)}.
L/(1+r) L
C. (1+r)
∫
S(x) dx - S(L)Lr/(1+r).
L/(1+r) L
D.
∫
x f(x) dx - L{F(L) - F(L/(1+r))}/(1+r).
L/(1+r)
E. None of the above.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 708
37.49 (CAS9, 11/99, Q.34) (2 points) Using Lee's "The Mathematics of Excess of Loss Coverages and Retrospective Rating - A Graphical Approach," answer the following: a. (0.5 point) If the total limits inflation rate is 6%, describe why the inflation rate for the basic limits coverage is lower than 6%. b. (1 point) Use Lee to graphically justify your answer. c. (0.5 point) What are the two major reasons why the inflation rate in the excess layer is greater than the total limits inflation rate?
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 709
37.50 (5, 5/03, Q.14) (1 point) ∞
Given E[x] =
∫ x f(x) dx = $152,500
0
and the following graph of the cumulative loss distribution, F(x), as a function of the size of loss, x, calculate the excess ratio at $100,000.
• Size of the area labeled Y = $12,500
Loss Size (x)
$100,000
Y 0.20
1.00 F(x) = Cumulative Claim Frequency
A. Less than 0.3 B. At least 0.3, but less than 0.5 C. At least 0.5, but less than 0.7 D. At least 0.7, but less than 0.9 E. At least 0.9
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 710
37.51 (CAS3, 11/03, Q.20) (2.5 points) Let X be the size-of-loss random variable with cumulative distribution function F(x) as shown below:
Which expression(s) below equal(s) the expected loss in the shaded region? ∞
I.
∫ x dF(x)
K
K
II. E(x) -
∫ x dF(x) 0
∞
Ill.
∫ [1
- F(x)] dx
K
A. I only B. II only C. Ill only D. I and Ill only E. II and Ill only
K[1-F(K)]
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 711
37.52 (CAS3, 11/03, Q.23) (2.5 points) F(x) is the cumulative distribution function for the size-of-loss variable, X. P, Q, R, S, T, and U represent the areas of the respective regions. What is the expected value of the insurance payment on a policy with a deductible of "DED" and a limit of "LIM"? (For clarity, that is a policy that pays its first dollar of loss for a loss of DED + 1 and its last dollar of loss for a loss of LIM.)
A. Q
B. Q+R
C. Q+T
D. Q+R+T+U
E. S+T+U
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 712
37.53 (CAS3, 5/04, Q.33) (2.5 points) F(x) is the cumulative distribution function for the size-of-loss variable, X. P, Q, R, S, T, and U represent the areas of the respective regions. What is the expected value of the savings to the insurance company of implementing a franchise deductible of “DED" and a limit of “LIM" to a policy that previously had no deductible and no limit? (For clarity, that is a policy that pays its first dollar of loss for a loss of DED + 1 and its last dollar of loss for a loss of LIM.)
A. S
B. S+P
C. S+Q+P
D. S+P+R+U
E. S+T+U+P
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 713
37.54 (CAS3, 11/04, Q.30) (2.5 points) Let X be a random variable representing an amount of loss. Define the cumulative distribution function F(x) as F(x) = Pr(X≤x).
Determine which of the following formulas represents the shaded area. b
A.
∫ x dF(x) + a - b + aF(b) - bF(a) a b
B.
∫ x dF(x) + a - b + aF(a) - bF(b) a
b
C.
∫ x dF(x) - a + b + aF(b) - bF(a) a b
D.
∫ x dF(x) - a + b + aF(a) - bF(b) a b
E.
∫ x dF(x) - a + b - aF(a) + bF(b) a
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 714
37.55 (CAS3, 5/06, Q.28) (2.5 points) The following graph shows the distribution function, F(x), of loss severities in 2005.
x
P 1.1 D Q
D S
D/1.1
T
R U
0 Loss severities are expected to increase 10% in 2006 due to inflation. A deductible, D, applies to each claim in 2005 and 2006. Which of the following represents the expected size of loss in 2006? A. P B. 1.1P C. 1.1(P+Q+R) D. P+Q+R+S+T+U E. 1.1(P+Q+R+S+T+U)
1
F(x)
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 715
Solutions to Problems: 37.1. α+β+γ+δ+ε+η, the mean is the area under the distribution curve. 37.2. α, the result of summing vertical strips under the curve from zero to F(2). 37.3. (α+β+δ) / (α+β+γ+δ+ε+η) = E[X ∧ 2] / E[X]. 37.4. E[X ∧ 2] is: α+β+δ, the area under the curve and the horizontal line at 2. 37.5. (β+γ+δ+ε+η) / ( α+β+γ+δ+ε+η) = 1 - α / ( α+β+γ+δ+ε+η). 37.6. (α+β+γ+δ+ε) / ( α+β+γ+δ+ε+η) = 1 - η / ( α+β+γ+δ+ε+η) = E[X ∧ 2] / E[X]. 37.7. α+β+δ+γ+ε, the area under the curve and the horizontal line at 5. 37.8. (α+β+γ)/( α+β+γ+δ+ε+η), the numerator is the result of summing vertical strips under the curve from zero to F(5), while the denominator is the total area under the curve. 37.9. (γ+ε) / ( α+β+γ+δ+ε+η) = ( E[X ∧ 5] - E[X ∧ 2]) / E[X]. 37.10. (γ+ε+η) / ( α+β+γ+δ+ε+η) = (E[X] - E[X ∧ 2]) / E[X]. 37.11. Losses from claims of size more than 5: δ+ε+η. 37.12. (γ+ε+η) / ( α+β+γ+δ+ε+η) = (E[X] - E[X ∧ 2]) / E[X]. 37.13. (α+β+γ+δ+ε) / ( α+β+γ+δ+ε+η) = 1 - η / ( α+β+γ+δ+ε+η) = E[X ∧ 5] / E[X]. 37.14. β, a rectangle of height 2 and width F(5)-F(2). 37.15. (β+γ) / ( α+β+γ+δ+ε+η), the numerator is the result of summing vertical strips under the curve from F(2) to F(5), while the denominator is the total area under the curve.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 716
37.16. η = E[X] - E[X ∧ 5], the sum of horizontal strips of length S(t) = 1-F(t) between the horizontal lines at 5 and ∞. 37.17. (α+β+δ) / ( α+β+γ+δ+ε+η) = E[X ∧ 2] / E[X]. 37.18. η, the sum of vertical strips of height t-5 between the vertical lines at F(5) and 1. 37.19. δ, a rectangle of height 2 and width S(5). 37.20. η / ( α+β+γ+δ+ε+η) = (E[X] - E[X ∧ 5]) / E[X]. 37.21. e(5) = η /S(5). But, δ = 2S(5) and ε = 3S(5). Thus e(5) = 3η/ε or 2η/δ. 37.22. γ, the sum of vertical strips of height t-2 between the vertical lines at F(2) and F(5). 37.23. η = E[(X - 5)+] = E[X] - E[X ∧ 5]. 37.24. γ+ε = E[X ∧ 5] - E[X ∧ 2], the sum of horizontal strips of length 1-F(t) between the horizontal lines at 2 and 5. 37.25. ε, a rectangle of height 5 -2 and width 1-F(5) = S(5). 37.26. e(2) = losses excess of 2 / S(2) = (γ+ε+η) /S(2). But, β+δ = 2S(2). ⇒ e(2) = 2(γ+ε+η)/(β+δ).
2013-4-2,
Loss Distributions, §37 Lee Diagrams
37.27. In the Lee diagram below, E[X
∧
HCM 10/8/12,
Page 717
1000] = Area A.
E[X ∧ 2000] = Area A + Area B. Therefore, Area B = E[X ∧ 2000] - E[X ∧ 1000]. Area B is the increase in the limited expected value due to increasing the limit from 1000 to 2000. Similarly, Area C is the increase in the limited expected value due to increasing the limit by another 1000. Area C < Area B, and therefore the increase in the limited expected value is less. In general the areas of a given height get smaller as one moves up the diagram, as the curve moves closer to the righthand asymptote. Therefore, the rate of increase of the limited expected value decreases. Comment: The diagram was based on a Pareto Distribution with α = 4 and θ = 2400.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 718
37.28. If y = size of loss and x = probability, then for an Exponential with θ = 500, x = 1 - exp[-y/500]. Therefore, y = -500 ln[1 - x]. loss of size 400 ⇔ probability = 1 - e-.8 = 0.551. loss of size 800 ⇔ probability = 1 - e-1.6 = 0.798. Losses of size 400 to 800 corresponds to the area below the curve and between vertical lines at 0.551 and 0.798: Size of Loss
800
400
Losses of size 400 to 800 0.2
0.4
0.6
0.8
Probability 1.0
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 719
37.29. If y = size of loss and x = probability, then for a Weibull with θ = 300 and τ = 1/2, x = 1 - exp[-(y/300).5]. Therefore, y = 300 ln[1 -x]2 . Lee Diagram:
Comment: One has to stop graphing at some size of loss, unless one has infinite graph paper! In this case, I only graphed up to 2500.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 720
37.30. The payments with a 500 deductible and no maximum covered loss are represented by the area above the line at height 500 and to the right of the curve:
37.31. The payments with no deductible and a 1500 maximum covered loss are represented by the area below the line at height 1500, and to the right of the curve:
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 721
37.32. The payments with a 500 deductible and a 1500 maximum covered loss are represented by the area above the line at height 500, below the line at height 1500, and to the right of the curve:
37.33. Under a 500 franchise deductible, nothing is paid on a loss of size 500 or less, and the whole loss is paid for a loss of size greater than 500. The payments with a 500 franchise deductible and no maximum covered loss are the losses of size greater than 500, represented by the area to the right of the vertical line at F(500) and below the curve:
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 722
37.34. D. Area C is a rectangle with height 1000 and width (1 - .63) = .37, with area 370. 2500 2000 1500
B
1000 500
A
C
0.63 1 Expected losses limited to 1000 = Area A + Area C = 377 + 370 = 747. E[X] = Area A + Area B + Area C = 377 + 139 + 370 = 886. Loss elimination ratio at 1000 = E[X ∧ 1000]/E[X] = 747/886 = 84.3%. Comment: Similar to 5, 5/03, Q.14. The Lee Diagram was based on a Weibull Distribution with θ = 1000 and τ = 2. 37.35. B. Deflate 5000 from 2008 to 2001, where it is equivalent to: 5000/1.5 = 3333. Deflate 1000 from 2008 to 2001, where it is equivalent to: 1000/1.5 = 667. Then the average expected loss in 2001 is the area between horizontal lines at 667 and 3333, and under the curve: R+T+U. In order to get the expected size of loss in 2008, reinflate back up to the 2008 level, by multiplying by 1.5: 1.5(R + T + U). Comment: Similar to CAS3, 5/06, Q.28. 37.36. D. A policy with a franchise deductible of 2 pays the full amount of all losses of size greater than 2. This is the area under the curve and to the right of the vertical line at 2: β + γ + δ + ε + η. However, there is also a maximum covered loss of 5, which means the policy does not pay for the portion of any loss greater than 5, which eliminates area η, above the horizontal line at 5. Therefore the expected payments are: β + γ + δ + ε. Comment: Similar to CAS3, 5/04, Q.33. 37.37. E. Prior to the effect of the coinsurance, the expected size of loss is below the line at 5000 and below the curve: R + T. We multiply by 80% before paying under this policy. The expected size of loss is: 0.8(R + T).
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 723
37.38. The curtate expectation of life, e0 , is the sum of a series of rectangles, each with height 1, with areas: S(1) = (3/4)4 , S(2) = (3/5)4 , S(3) = (3/6)4 , etc. The first six of these rectangles are shown below: Size of Loss
6
5
4
3
2
1
Prob. 0.2
0.4
0.6
Comment: e0 < e(0) = E[X] = area under the curve. The curtate expectation of life is discussed in Actuarial Mathematics.
0.8
1
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 724
37.39. We are given Area A = 107 and Area B is 98. We can get the areas of three rectangles. One rectangle has width: .762 - .323 = .439, height 500, and area: (.439)(500) = 219.5. Two rectangles have width: 1 - .762 = .238, height 500, and area: (.238)(500) = 119. The total area under the curve is equal to the mean, given as 750. Therefore, the area under the curve and above the horizontal line at 1000 is: 750 - (107 + 219.5 + 98 + 119 + 119) = 87.5. 2000
1500
87.5
1000 98
119
219.5
119
500 107
0.323 0.762 The average size of loss for those losses of size less than 500 is: (dollars from losses of size less than 500)/F(500) = 107/.323 = 331.
1
Comment: The Lee Diagram was based on a Gamma Distribution with α = 3 and θ = 250. 37.40. LER(500) = E[X ∧ 500]/E[X] = (107 + 219.5 + 119)/750 = 59.4%. 37.41. The average payment per loss with a deductible of 500 and maximum covered loss of 1000 is: layer from 500 to 1000 ⇔ the area under the curve and between the horizontal lines at 500 and 1000 ⇔ 98 + 119 = 217. 37.42. e(500) = (losses excess of 500)/S(500) = (98 + 119 + 87.5)/(1 - .323) = 450. 37.43. The average size of loss for those losses of size between 500 and 1000 is: (dollars from losses of size between 500 and 1000)/{F(1000) - F(500)} = (219.5 + 98)/(.762 - .323) = 723.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 725
37.44. The loss elimination ratio at 1000 = E[X ∧ 1000]/E[X] = (107 + 219.5 + 119 + 98 + 119)/750 = 88.3%. Alternately, excess ratio at 1000 is: 87.5/750 = 11.7%. ⇒ LER(1000) = 1 - 11.7% = 88.3%. 37.45. The average payment per (non-zero) payment with a deductible of 500 and maximum covered loss of 1000 is: (average payment per loss)/S(500) = 217/(1 - .323) = 321. 37.46. e(1000) = (losses excess of 1000)/S(1000) = 87.5/(1 - .762) = 368. 37.47. The average size of loss for those losses of size greater than 1000 is: (dollars from losses of size > 1000)/S(1000) = (87.5 + 119 + 119)/(1 - .762) = 1368. 37.48. D. The rectangle below Area P plus Area P, represent those losses of size between L/(1+r) and L. Size
L P
L/(1+r)
1 L
⇒ Area P + Rectangle =
∫ x f(x) dx. L/(1+r)
The rectangle has height: L/(1+r), and width: F(L) - F(L/(1+r)). L
⇒ Area P =
∫ x f(x) dx - {F(L) - F(L/(1+r))}L/(1+r).
L/(1+r)
Prob.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 726
Comment: Area P can be written in other ways as shown below:
Size
L P
L/(1+r)
1
Prob.
Area P plus the rectangle to the left of Area P, represents the layer of loss from L/(1+r) to L. L
⇒ Area P + Rectangle =
∫ S(x) dx.
L/(1+r)
The rectangle has height: L - L/(1+r), and width: S(L). L
⇒ Area P =
L
∫ S(x) dx - S(L){L - L/(1+r)} = ∫ S(x) dx - S(L) L r/(1+r).
L/(1+r)
L/(1+r)
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 727
37.49. a. Some losses will hit the basic limit before they get the total increase from inflation, while some were already at the basic limit so inflation wonʼt increase them. For example, if the basic limit is $100,000, then a loss of $125,000 will still contribute $100,000 to the basic limit after inflation. A loss of $98,000 will increase to $103,880 after inflation, and would then contribute $100,000 to the basic limit, an increase of only 100/98 - 1 = 2.04% in that contribution, less than 6%. b. Let L be the basic limit. The solid curve refers to the losses prior to inflation, while the dashed curve refers to the losses after inflation:
The expected excess losses prior to inflation are Area D. The increase in the expected excess losses due to inflation is Area C. The expected basic limit losses prior to inflation are Area B. The increase in the expected basic limit losses due to inflation is Area A. Area C is larger compared to Area D, than is Area A compared to Area B. Therefore, the basic limit losses increase slower due to inflation than do the excess losses. Since the unlimited ground up losses are the sum of the excess and basic limit losses, the basic limit losses increase slower due to inflation than the unlimited ground up losses.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 728
Looked at from a somewhat different point of view, those losses in Area M will have their contributions to the basic limit increase at a rate less than 6%, while those losses in Area N, will not have their contribution to the basic limit increases at all due to inflation.
c. All losses that were already in the excess layer receive the full increase from inflation. Some losses that did not contribute to the excess layer will, after inflation, contribute something to the excess layer. Comment: In order to allow one to see what is going on, the Lee Diagrams in my solution are based on much more than 6% inflation.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 729
37.50. B. Area Y + Area A + Area B = E[x] = $152,500. Area A is a rectangle with height $100,000 and width .8, with area $80,000. Area Y is given as $12,500. Expected losses excess of $100,000 = Area B = $152,500 - $12,500 - $80,000 = $60,000. Excess ratio at $100,000 = (Expected losses excess of $100,000)/E[X] = $60,000 / $152,500 = 0.393.
Loss Size (x)
B $100,000
Y
A 0.20
1.00 F(x) = Cumulative Claim Frequency
Comment: Loss Elimination Ratio at $100,000 is: 1 - .393 = .607. Not one of the usual size of loss distributions encountered in casualty actuarial work.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 730
37.51. E. The shaded area represents the losses excess of K = E[(X-K)+] = E[X] - E[X K
∫
∞
∞
∫
∧
K] =
∞
∫
∫
E[X] - { x f(x) dx + KS(K)} = x f(x) dx - k S(k) = (x-K) f(x) dx = S(x) dx. 0
K
K
K
Since S(x) = 1 - F(x) and f(x) dx = dF(x), statements II and III are true. Statement I is false; it would be true if the integrand were x-K rather than x. 37.52. B. The layer of loss from DED to LIM is the area under the curve and between the horizontal lines at DED and LIM: Q + R. 37.53. B. Under a franchise deductible one does not pay any losses of size less than DED, but pays the whole of any loss of size greater than DED. Due to the franchise deductible, one saves S, the area corresponding to the losses of size less than or equal to DED. Due to the limit, one pays at most LIM for any loss, so one saves P, the area representing the expected amount excess of LIM. The expected savings are: S + P. Comment: Similar to CAS3, 11/03, Q.23, except here there is a franchise deductible rather than an ordinary deductible. The affect of the franchise deducible is to left truncate the data at DED, which removes area S.
2013-4-2,
Loss Distributions, §37 Lee Diagrams
HCM 10/8/12,
Page 731
37.54. D. Label some of the areas in the Lee Diagram as follows:
D = (b-a)S(b). E = a{F(b) - F(a)} b
∫
C + E = losses on losses of size between a and b = x dF(x). a
Shaded Area = C + D = (C + E) + D - E = b
b
∫x dF(x) + (b-a)S(b) - a{F(b) - F(a)} = ∫x dF(x) - a + b + aF(a) - bF(b). a
a
Alternately, the shaded area is the layer from a to b: b
E[X
∧
b] - E[X
∧
a] = 0
a
b
∫x dF(x) + bS(b) - ∫x dF(x) - aS(a) = ∫x dF(x) - a + b + aF(a) - bF(b). 0
a
37.55. E. Deflate D from 2006 to 2005, where it is equivalent to a deductible of: D/1.1. Then the average expected loss in 2005 is the area above the deductible, D/1.1, and under the curve: P+Q+R+S+T+U. In order to get the expected size of loss in 2006, reinflate back up to the 2006 level, by multiplying by 1.1: 1.1(P+Q+R+S+T+U).
2013-4-2,
Loss Distributions, §38 N-Point Mixtures
HCM 10/8/12,
Page 732
Section 38, N-Point Mixtures of Models

Mixing models is a technique that provides a greater variety of loss distributions. Such mixed distributions are referred to by Loss Models as n-point or two-point mixtures.296
2-point mixtures: For example, let A be a Pareto Distribution with parameters α = 2.5 and θ = 10, while B is a LogNormal Distribution with parameters µ = 0.5 and σ = 0.8. Let p = 0.10, the weight for the Pareto Distribution. If we let G(x) = pA(x) + (1-p)B(x) = 0.1 A(x) + 0.9 B(x), then G is a Distribution Function since G(0) = 0 and G(∞) = 1. G is a mixed “Pareto-LogNormal” Distribution. Here are the individual distribution functions, as well as that of this mixed distribution:

Limit    Pareto Distribution Function    LogNormal Distribution Function    Mixed Pareto-LogNormal Distribution Function
0.5              0.1148                          0.0679                             0.0726
1                0.2120                          0.2660                             0.2606
2.5              0.4276                          0.6986                             0.6715
5                0.6371                          0.9172                             0.8892
10               0.8232                          0.9879                             0.9714
25               0.9564                          0.9997                             0.9954
50               0.9887                          1.0000                             0.9989
100              0.9975                          1.0000                             0.9998

For example, 0.9954 = (0.1)(0.9564) + (0.9)(0.9997). For the mixed distribution, the chance of a claim greater than 25 is: 0.46% = (0.1)(4.36%) + (0.9)(.03%). The Distribution Function and the Survival Function of the mixture are mixtures of the individual Distribution and Survival Functions. Also we see from the above table that, for example, the 89th percentile of this mixed Pareto-LogNormal Distribution is a little more than 5, since F(5) = 0.8892. In general, one can take a weighted average of any two Distribution Functions: G(x) = p A(x) + (1-p)B(x). Such a Distribution Function H, called a 2-point mixture of models, will generally have properties that are a mixture of those of A and B.
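As an illustration (my own sketch, not from the text), the mixed distribution function column can be reproduced with SciPy, assuming its lomax and lognorm parameterizations match the Pareto and LogNormal used here:

```python
# Two-point mixture of a Pareto (alpha = 2.5, theta = 10) and a LogNormal (mu = 0.5, sigma = 0.8).
import numpy as np
from scipy import stats

pareto = stats.lomax(c=2.5, scale=10)                 # Loss Models Pareto is SciPy's lomax
lognormal = stats.lognorm(s=0.8, scale=np.exp(0.5))   # sigma = 0.8, mu = 0.5
p = 0.10

for limit in [0.5, 1, 2.5, 5, 10, 25, 50, 100]:
    G = p * pareto.cdf(limit) + (1 - p) * lognormal.cdf(limit)
    print(limit, round(G, 4))      # matches the mixed column of the table above
```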
296
Loss Models, Section 4.2.3.
2013-4-2,
Loss Distributions, §38 N-Point Mixtures
HCM 10/8/12,
Page 733
One can create a very large number of possible combinations by choosing various types of distributions for A and B. The mixture will have a number of parameters equal to the sum of the number of parameters of the two distributions A and B, plus one more for p, the weighting parameter. The mixed Pareto-LogNormal Distribution discussed above has 2 + 2 + 1 = 5 parameters: α, θ, µ, σ, and p.

Densities:

The density of the mixture is the derivative of its Distribution Function. Therefore, the density of the mixture is the mixture of the densities.

Exercise: What is the density at 5 of this mixed Pareto-LogNormal Distribution?
[Solution: The density of the Pareto is: f(x) = α θ^α / (θ + x)^(α + 1) = (2.5)(10^2.5)(10 + x)^-3.5.
f(5) = 790.57/15^3.5 = 0.06048.
The density of the LogNormal is: f(x) = exp[-(ln(x) − µ)² / (2σ²)] / {x σ √(2π)} = exp[-(ln(x) − 0.5)² / {2(0.8²)}] / {x (0.8) √(2π)}.
f(5) = exp[-0.5 ({ln(5) - 0.5}/0.8)²] / {(5)(0.8)√(2π)} = 0.03813.
The density for the mixed distribution is: (0.1)(0.06048) + (0.9)(0.03813) = 0.0404.]

Moments:

Moments of the mixed distribution are the weighted average of the moments of the individual distributions: EH[X^n] = p EA[X^n] + (1-p) EB[X^n].
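As a quick numerical check of the density exercise above (before working through the moments), here is a minimal sketch; the helper names are illustrative assumptions, not from the text:

```python
# Minimal sketch: the density of the mixture is the mixture of the
# densities, so g(5) = (0.1) f_Pareto(5) + (0.9) f_LogNormal(5) = 0.0404.
from math import exp, log, pi, sqrt

def pareto_pdf(x, alpha=2.5, theta=10.0):
    return alpha * theta**alpha / (theta + x) ** (alpha + 1)

def lognormal_pdf(x, mu=0.5, sigma=0.8):
    return exp(-((log(x) - mu) ** 2) / (2 * sigma**2)) / (x * sigma * sqrt(2 * pi))

g5 = 0.1 * pareto_pdf(5) + 0.9 * lognormal_pdf(5)
print(round(g5, 4))   # 0.0404
```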
EH[X] = p EA[X] + (1-p) EB[X].

For example, the mean of the above mixed Pareto-LogNormal Distribution is:
(0.1)(mean of the Pareto) + (0.9)(mean of the LogNormal).

Exercise: What are the first and second moments of the Pareto Distribution with parameters α = 2.5 and θ = 10?
[Solution: For the Pareto, the mean is: θ/(α−1) = 10/1.5 = 6.667, while the second moment is: 2θ² / {(α − 1)(α − 2)} = 200 / {(1.5)(0.5)} = 266.67.]
Exercise: What are the first and second moments of the LogNormal Distribution with parameters µ = 0.5 and σ = 0.8?
[Solution: For the LogNormal, the mean is: exp[µ + 0.5σ²] = e^0.82 = 2.270, while the second moment is: exp[2µ + 2σ²] = e^2.28 = 9.777.]

Thus the mean of the mixed Pareto-LogNormal is: (0.1)(6.667) + (0.9)(2.270) = 2.71.

We also note that of the total loss dollars represented by the mixed distribution, (0.1)(6.667) = 0.667 come from the underlying Pareto, while (0.9)(2.270) = 2.04 come from the underlying LogNormal. Thus 0.667/2.71 = 25% of the total losses come from the underlying Pareto, while the remaining 2.04/2.71 = 75% come from the underlying LogNormal. In general, p EA[X] / {p EA[X] + (1-p) EB[X]} represents the portion of the total losses for the mixed distribution that come from the first of the individual distributions.

For a 2-point mixture of A and B with weights p and 1-p:
E[X] = E[X | A] Prob[A] + E[X | B] Prob[B] = p(mean of A) + (1-p)(mean of B).
E[X²] = E[X² | A] Prob[A] + E[X² | B] Prob[B] = p(2nd moment of A) + (1-p)(2nd moment of B).
EH[X²] = p EA[X²] + (1-p) EB[X²].

The second moment of this mixed distribution is the weighted average of the second moments of the two individual distributions: (0.1)(266.67) + (0.9)(9.777) = 35.47.
The moments of the mixture are the mixture of the moments.

Thus the variance of this mixed distribution is: 35.47 - 2.71² = 28.13. First one gets the moments of the mixture, and then one gets the variance of the mixture. One does not weight together the individual variances.

Coefficient of Variation:

One can now get the coefficient of variation of this mixture from its mean and variance. The Coefficient of Variation of this mixture is: √28.13 / 2.71 = 1.96. The C.V. of the mixed distribution is between the C.V. of the Pareto at 2.24 and that of the LogNormal at 0.95. The mixed distribution has a heavier tail than the LogNormal and a lighter tail than the Pareto.
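Here is a minimal sketch of the same calculation, assuming the standard moment formulas for the Pareto and LogNormal; the variable names are my own:

```python
# Minimal sketch: moments of the mixture are the mixture of the moments,
# but the variance and CV must then be computed from those mixed moments.
from math import exp, sqrt

p = 0.10
alpha, theta = 2.5, 10.0       # Pareto parameters
mu, sigma = 0.5, 0.8           # LogNormal parameters

pareto_m1 = theta / (alpha - 1)                       # 6.667
pareto_m2 = 2 * theta**2 / ((alpha - 1) * (alpha - 2))  # 266.67
logn_m1 = exp(mu + sigma**2 / 2)                      # 2.270
logn_m2 = exp(2 * mu + 2 * sigma**2)                  # 9.777

mix_m1 = p * pareto_m1 + (1 - p) * logn_m1            # about 2.71
mix_m2 = p * pareto_m2 + (1 - p) * logn_m2            # about 35.47
mix_var = mix_m2 - mix_m1**2                          # about 28.1
print(round(mix_m1, 2), round(mix_var, 2), round(sqrt(mix_var) / mix_m1, 2))
```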
Limited Moments:

Limited Moments of the mixed distribution are the weighted average of the limited moments of the individual distributions: EH[(X ∧ x)^n] = p EA[(X ∧ x)^n] + (1-p) EB[(X ∧ x)^n].

EH[X ∧ x] = p EA[X ∧ x] + (1-p) EB[X ∧ x].
For example, the limited expected value of the mixed Pareto-LogNormal Distribution is: (0.1)(LEV of the Pareto) + (0.9)(LEV of the LogNormal).

Exercise: What is E[X ∧ 4] for the Pareto Distribution with parameters α = 2.5 and θ = 10?
[Solution: For the Pareto, E[X ∧ x] = {θ/(α−1)} {1 - (θ/(θ + x))^(α-1)}.
E[X ∧ 4] = (10/1.5) {1 - (10/14)^1.5} = 2.642.]

Exercise: Compute E[X ∧ 4] for the LogNormal with parameters µ = 0.5 and σ = 0.8.
[Solution: E[X ∧ x] = exp(µ + σ²/2) Φ[(ln(x) − µ − σ²)/σ] + x {1 - Φ[(ln(x) − µ)/σ]}.
E[X ∧ 4] = e^0.82 Φ[0.3079] + 4 {1 - Φ[1.1079]} = (2.2705)(0.6209) + (4)(1 - 0.8660) = 1.946.]

Thus for the mixed Pareto-LogNormal, E[X ∧ 4] = (0.1)(2.642) + (0.9)(1.946) = 2.016.

Quantities that are Mixtures:

Thus there are a number of quantities which are mixtures:
The Distribution Function of the mixture is the mixture of the Distribution Functions.
The Survival Function of the mixture is the mixture of the Survival Functions.
The density of the mixture is the mixture of the densities.
The moments of the mixture are the mixture of the moments.
The limited moments of the mixture are the mixture of the limited moments.
E[(X - d)+] = E[X] - E[X ∧ d] is also a mixture.
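Returning to the limited expected value exercises above, here is a minimal sketch of E[X ∧ 4] for the mixture; the helper names are illustrative, and Φ is again approximated via the error function:

```python
# Minimal sketch: the limited expected value of the mixture is the mixture
# of the limited expected values, E[X ^ 4] = 0.1(2.642) + 0.9(1.946) = 2.016.
from math import erf, exp, log, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pareto_lev(x, alpha=2.5, theta=10.0):
    return theta / (alpha - 1) * (1.0 - (theta / (theta + x)) ** (alpha - 1))

def lognormal_lev(x, mu=0.5, sigma=0.8):
    return (exp(mu + sigma**2 / 2) * phi((log(x) - mu - sigma**2) / sigma)
            + x * (1.0 - phi((log(x) - mu) / sigma)))

print(round(0.1 * pareto_lev(4) + 0.9 * lognormal_lev(4), 3))   # 2.016
```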
As discussed already, the variance of the mixture is not the mixture of the variances. Rather, first one gets the moments of the mixture, and then one gets the variance. The coefficient of variation of the mixture is not the mixture of the coefficients of variation. Rather, first one computes the moments of the mixture. Similarly, the skewness of the mixture is not the mixture of the skewnesses. Rather, first one computes the moments of the mixture. Then one gets the third central moment of the mixture and divides it by the standard deviation of the mixture cubed.

A number of other quantities of interest, such as the hazard rate, mean residual life, Loss Elimination Ratio, Excess Ratio, and percent of losses in a layer, have to be computed from their components for the mixture, as will be discussed.

Hazard Rates:

h(x) = f(x)/S(x). Therefore, in order to get the hazard rate for a mixture, one computes the density and the survival function for that mixture.
As computed previously, for this mixture f(5) = (0.1)(0.06048) + (0.9)(0.03813) = 0.0404.
As computed previously, for this mixture F(5) = (0.1)(0.6371) + (0.9)(0.9172) = 0.8892. S(5) = 1 - 0.8892 = 0.1108.
For this mixture, h(5) = 0.0404/0.1108 = 0.3646.
For the Pareto, h(5) = 0.06048 / (1 - 0.6371) = 0.1667. For the LogNormal, h(5) = 0.03813 / (1 - 0.9172) = 0.4605.
(0.1)(0.1667) + (0.9)(0.4605) = 0.4311 ≠ 0.3646.
The hazard rate of the mixture is not equal to the mixture of the hazard rates.

Excess Ratios:

Here is how one calculates the Excess Ratio for this mixed distribution at 10, RG(10). The numerator is the loss dollars excess of 10. For the Pareto this is the excess ratio of the Pareto at 10 times the mean for the Pareto: RA(10)EA[X]; for the LogNormal it is: RB(10)EB[X]. Thus the numerator of RG(10) is: pEA[X]RA(10) + (1 - p)EB[X]RB(10).
Exercise: What is the Excess Ratio at 10 of the Pareto Distribution with α = 2.5 and θ = 10?
[Solution: The excess ratio for the Pareto is: R(x) = {θ/(θ + x)}^(α−1).
RA(10) = {10/(10 + 10)}^1.5 = 0.3536.]

Exercise: What is the Excess Ratio at 10 of the LogNormal Distribution with parameters µ = 0.5 and σ = 0.8?
[Solution: The excess ratio for the LogNormal is:
R(x) = 1 - Φ[(ln(x) − µ − σ²)/σ] - x {1 − Φ[(ln(x) − µ)/σ]} / exp[µ + σ²/2].
RB(10) = 1 - Φ[(ln10 - 0.5 - 0.8²)/0.8] - 10 {1 - Φ[(ln10 - 0.5)/0.8]} / exp(0.5 + 0.8²/2) = 0.0197.]

Thus for the mixed distribution the excess ratio at 10 is:
RG(10) = {pEA[X]RA(10) + (1 - p)EB[X]RB(10)} / EH[X]
= {pEA[X]RA(10) + (1 - p)EB[X]RB(10)} / {p EA[X] + (1 - p) EB[X]}
= {(0.1)(6.667)(0.3536) + (0.9)(2.27)(0.0197)} / 2.71 = 10.2%.

At each limit, the Excess Ratio for the mixed distribution is a weighted average of the individual excess ratios, with weights: pEA[X] = (0.1)(6.667), and (1-p)EB[X] = (0.9)(2.27). Here are the Excess Ratios computed at different limits:

Limit   Pareto Excess Ratio   LogNormal Excess Ratio   Mixed Pareto-LogNormal Excess Ratio
  1     86.68%                59.96%                   66.53%
  2.5   71.55%                27.83%                   38.59%
  5     54.43%                 9.64%                   20.66%
 10     35.36%                 1.97%                   10.18%
 25     15.27%                 0.10%                    3.83%
 50      6.80%                 0.00%                    1.68%
We note that the excess ratio for the LogNormal declines much more quickly than that of the Pareto. The excess ratio for the mixed distribution is somewhere in between. In general, the Excess Ratio for the mixed distribution is a weighted average of the individual excess ratios, with weights pEA[X] and (1 - p)EB[X]: RG(x) = {pEA[X]RA(x) + (1 - p)EB[X]RB(x)} / EH[X] = {pEA[X]RA(x) + (1 - p)EB[X]RB(x)}/{pEA[X] + (1 - p)EB[X]}.
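Here is a minimal sketch of this weighted-average excess ratio; the helper names are my own, and Φ is approximated via the error function:

```python
# Minimal sketch: the excess ratio of the mixture is a weighted average of
# the individual excess ratios, with weights p E_A[X] and (1-p) E_B[X].
from math import erf, exp, log, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p, alpha, theta, mu, sigma = 0.10, 2.5, 10.0, 0.5, 0.8
mean_par = theta / (alpha - 1)          # 6.667
mean_ln = exp(mu + sigma**2 / 2)        # 2.270

def r_pareto(x):
    return (theta / (theta + x)) ** (alpha - 1)

def r_lognormal(x):
    return (1.0 - phi((log(x) - mu - sigma**2) / sigma)
            - x * (1.0 - phi((log(x) - mu) / sigma)) / mean_ln)

def r_mixture(x):
    num = p * mean_par * r_pareto(x) + (1 - p) * mean_ln * r_lognormal(x)
    return num / (p * mean_par + (1 - p) * mean_ln)

print(round(r_mixture(10), 4))   # about 0.102, matching the 10.18% in the table
```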
Layers of Losses:

One can compute the percent of total dollars in a layer by taking the difference of the excess ratios. For example, for the Pareto the layer from 5 to 10 represents: 54.43% - 35.36% = 19.07% of the Paretoʼs total dollars of loss. For the LogNormal the layer from 5 to 10 represents only: 9.64% - 1.97% = 7.67% of its total dollars of loss. Using the excess ratios computed for the mixed distribution, for the mixed distribution the layer from 5 to 10 represents: 20.66% - 10.18% = 10.48% of its total dollars of loss.

One can divide the losses for a layer for the mixed distribution into those from the Pareto and those from the LogNormal. The contribution from the Pareto to the mixed distribution for the layer from 5 to 10 is: (0.1)(6.667)(54.43% - 35.36%) = 0.1271. The contribution from the LogNormal to the mixed distribution for the layer from 5 to 10 is: (0.9)(2.270)(9.64% - 1.97%) = 0.1566. The mixed distribution has losses in the layer from 5 to 10 of: (2.71)(20.66% - 10.18%) = 0.284 = 0.1271 + 0.1566. Thus for the layer from 5 to 10, about 45% of the losses for the mixed distribution come from the Pareto while the remaining 55% come from the LogNormal.297

One can perform similar calculations for other layers:

Bottom     Top        Losses in Layer         Losses in Layer            Portion Contributed   Portion Contributed
of Layer   of Layer   Contributed by Pareto   Contributed by LogNormal   by Pareto             by LogNormal
  0          1        0.0888                  0.8180                      9.8%                 90.2%
  1          2.5      0.1008                  0.6564                     13.3%                 86.7%
  2.5        5        0.1141                  0.3716                     23.5%                 76.5%
  5         10        0.1272                  0.1567                     44.8%                 55.2%
 10         25        0.1339                  0.0382                     77.8%                 22.2%
 25         50        0.0565                  0.0020                     96.7%                  3.3%
 50          ∞        0.0454                  0.0001                     99.8%                  0.2%
We note that for this mixed distribution the losses for lower layers come mainly from the LogNormal Distribution, while those for the higher layers come mainly from the Pareto Distribution. In that sense the LogNormal is modeling the behavior of the smaller claims, while the Pareto is modeling the behavior of the larger claims. This is typical of a mixture of models; a lighter-tailed distribution mostly models the behavior of the smaller losses, while a heavier-tailed distribution mostly models the behavior of the larger losses.
297 0.127 / 0.284 = 45%, and 0.157 / 0.284 = 55%.
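Continuing the excess-ratio sketch above (it reuses p, mean_par, mean_ln, r_pareto and r_lognormal from that sketch), the split of a layer between the two components can be checked the same way:

```python
# Minimal sketch: losses in the layer from a to b, split by component,
# using differences of excess ratios times p E_A[X] and (1-p) E_B[X].
def layer_contributions(a, b):
    from_pareto = p * mean_par * (r_pareto(a) - r_pareto(b))
    from_lognormal = (1 - p) * mean_ln * (r_lognormal(a) - r_lognormal(b))
    return from_pareto, from_lognormal

print(layer_contributions(5, 10))   # about (0.127, 0.157), as in the table
```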
Fitting to Data:

One can fit size of loss data to mixed models using the same techniques as for other size of loss distributions. However, due to the generally larger number of parameters there are often practical difficulties. For example, if one attempted to use maximum likelihood to fit a mixed Pareto-LogNormal Distribution to data, one would be fitting 5 parameters. Thus numerical algorithms would be searching through 5-dimensional space and might take a long time. Thus it is important in practical applications to have a good starting point. Sometimes one fits each of the individual distributions to the given data and uses these results to help pick a starting value. Sometimes, keeping one or more of the parameters fixed while fitting the others will help to determine a good starting place. Often one can use a prior yearʼs result or a result from a similar data set to help choose a starting value. Sometimes one just has to try a few different starting values before finding one that seems to work.

Sometimes when using numerical methods, one has problems getting p, the weight parameter, to stay between 0 and 1 as desired. One could in such cases reparameterize by letting p = e^b / (e^b + 1); see the sketch below.

n-Point Mixtures:

In general, one can weight together any number of distributions, rather than just two.298 While I have illustrated two-point mixtures, there is no reason why one could not use three-point, four-point, etc. mixtures. The quantities of interest can be calculated in a manner parallel to that used here for the two-point distributions. Besides those situations where all the distributions are of different types, some or all of the individual distributions can be of the same type. For example, the Distribution:
(0.2)(1 - exp(-x/10)) + (0.5)(1 - exp(-x/25)) + (0.3)(1 - exp(-x/100)) = 1 - 0.2 exp(-x/10) - 0.5 exp(-x/25) - 0.3 exp(-x/100),
is a three-point mixture of Exponential Distributions, with means of 10, 25, and 100 respectively. An n-point mixture of Exponential Distributions would have 2n - 1 parameters: n means of Exponential Distributions and n - 1 weighting parameters. For example, a three-point mixture of Exponential Distributions has 3 means and 2 weights, for a total of 5 parameters.
298 Of course one must be careful of introducing too many parameters. The “principle of parsimony” applies; one should use the minimum number of parameters necessary. One can always improve the fit to a given data set by adding parameters, but the resulting model is often less useful.
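Here is a minimal sketch of the reparameterization of the weight mentioned above; the function names are my own:

```python
# Minimal sketch: optimize over an unconstrained b, while
# p = e^b / (e^b + 1) automatically stays strictly between 0 and 1.
from math import exp, log

def p_from_b(b):
    return exp(b) / (exp(b) + 1.0)

def b_from_p(p):
    return log(p / (1.0 - p))   # inverse, useful for choosing a starting value

print(p_from_b(b_from_p(0.10)))   # 0.10
```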
Variable Mixtures:

Variable-Component Mixture Distribution ⇔ weighted average of an unknown number of distributions ⇔ F(x) = Σ wi Fi(x), with Σ wi = 1.299

Variable Mixture ⇔ weighted average of an unknown number of distributions of the same family but differing parameters ⇔ F(x) = Σ wi Fi(x), with each Fi of the same family and Σ wi = 1.

For example, a variable mixture of Exponentials would have: F(x) = Σ wi (1 - exp[-x/θi]) = 1 - Σ wi exp[-x/θi], with Σ wi = 1. The key difference from an n-point mixture of Exponentials is that in the variable mixture, the number of Exponentials weighted together is unknown and is a parameter to be determined.300

Variable-Component Mixture Distributions, and their special case, Variable Mixtures, are called semi-parametric, since they share some of the properties of both parametric and nonparametric distributions.301
299 See Definition 4.6 in Loss Models.
300 For example, one can fit a variable mixture of Exponentials via maximum likelihood. See “Modeling Losses with the Mixed Exponential Distribution”, by Clive L. Keatinge, PCAS 1999.
301 The Exponential and LogNormal are examples of parametric distributions. The empirical distribution function is an example of a nonparametric distribution.
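Here is a minimal sketch of evaluating such a mixture of Exponentials for an arbitrary list of weights and means; the function name is my own:

```python
# Minimal sketch: F(x) = 1 - sum_i w_i exp(-x / theta_i), where the number
# of components is itself something to be determined when fitting.
from math import exp

def mixture_of_exponentials_cdf(x, weights, thetas):
    assert abs(sum(weights) - 1.0) < 1e-9
    return 1.0 - sum(w * exp(-x / t) for w, t in zip(weights, thetas))

# the three-point mixture of Exponentials used earlier in this section
print(mixture_of_exponentials_cdf(15, [0.2, 0.5, 0.3], [10, 25, 100]))  # about 0.42
```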
Summary:

The mixture of models can be useful when more flexibility is desired to fit size of loss data.302 It has been useful for a number of practical applications.303 One can perform either n-point mixtures or continuous mixing.304 One can use the same techniques to mix together frequency models.

Sometimes the mixture of models is just a mathematical device with no physical significance. However, sometimes the mixture directly models a feature of the real world. For example, sometimes a mixture is useful when a population is divided into two sub-populations, such as smoker and nonsmoker. Mixtures can also be useful when the data results from different perils. For example, for Homeowners Insurance it might be useful to fit Theft, Wind, Fire, and Liability losses each to a separate size of loss distribution. This is an example where one might weight together 4 different distributions. Mixtures will come up again in Buhlmann Credibility.305
302 See for example “Methods of Fitting Distributions to Insurance Loss Data”, by Charles C. Hewitt and Benjamin Lefkowitz, PCAS 1979.
303 For example, Insurance Services Office used mixed Pareto-Pareto models to calculate Increased Limits Factors. ISO has switched to an n-point mixture of Exponential Distributions. See “Modeling Losses with the Mixed Exponential Distribution”, PCAS 1999, by Clive Keatinge. The Massachusetts Workersʼ Compensation Rating Bureau has used mixed Pareto-Exponential models to calculate Excess Loss Factors. See “Workersʼ Compensation Excess Ratios, an Alternative Method,” PCAS 1998, by Howard C. Mahler.
304 To be discussed in the next section.
305 See “Mahlerʼs Guide to Buhlmann Credibility and Bayesian Analysis.”
Problems:

Use the following information for the next 22 questions:
• V follows a Pareto Distribution, with parameters α = 4, θ = 10.
• W follows an Exponential Distribution: F(w) = 1 - exp(-w/0.8).
• Y is a two-point mixture of V and W, with 5% weight to the Pareto Distribution and 95% weight to the Exponential Distribution.
38.1 (1 point) For V, what is the chance of a claim greater than 2? A. less than 49% B. at least 49% but less than 50% C. at least 50% but less than 51% D. at least 51% but less than 52% E. at least 52% 38.2 (1 point) What is the mean of V? A. less than 3.1 B. at least 3.1 but less than 3.2 C. at least 3.2 but less than 3.3 D. at least 3.3 but less than 3.4 E. at least 3.4 38.3 (1 point) What is the second moment of V? A. less than 34 B. at least 34 but less than 35 C. at least 35 but less than 36 D. at least 36 but less than 37 E. at least 37 38.4 (1 point) What is the coefficient of variation of V? A. less than 1.3 B. at least 1.3 but less than 1.4 C. at least 1.4 but less than 1.5 D. at least 1.5 but less than 1.6 E. at least 1.6
38.5 (1 point) What is the third moment of V? A. less than 1000 B. at least 1000 but less than 1001 C. at least 1001 but less than 1002 D. at least 1002 but less than 1003 E. at least 1003 38.6 (2 points) What is the skewness of V? A. less than 7.0 B. at least 7.0 but less than 7.1 C. at least 7.1 but less than 7.2 D. at least 7.2 but less than 7.3 E. at least 7.3 38.7 (2 points) What is the excess ratio at 5 of V? A. less than 28% B. at least 28% but less than 29% C. at least 29% but less than 30% D. at least 30% but less than 31% E. at least 31% 38.8 (1 point) For W, what is the chance of a claim greater than 2? A. less than 8% B. at least 8% but less than 9% C. at least 9% but less than 10% D. at least 10% but less than 11% E. at least 11% 38.9 (1 point) What is the mean of W? A. less than 0.5 B. at least 0.5 but less than 0.6 C. at least 0.6 but less than 0.7 D. at least 0.7 but less than 0.8 E. at least 0.8
38.10 (1 point) What is the second moment of W? A. less than 1.3 B. at least 1.3 but less than 1.4 C. at least 1.4 but less than 1.5 D. at least 1.5 but less than 1.6 E. at least 1.6 38.11 (1 point) What is the coefficient of variation of W? A. less than 0.7 B. at least 0.7 but less than 0.8 C. at least 0.8 but less than 0.9 D. at least 0.9 but less than 1.00 E. at least 1.0 38.12 (1 point) What is the third moment of W? A. less than 2.7 B. at least 2.7 but less than 2.8 C. at least 2.8 but less than 2.9 D. at least 2.9 but less than 3.0 E. at least 3.0 38.13 (1 point) What is the skewness of W? A. less than 1.8 B. at least 1.8 but less than 1.9 C. at least 1.9 but less than 2.0 D. at least 2.0 but less than 2.1 E. at least 2.1 38.14 (1 point) What is the Excess Ratio at 5 of W? A. less than 0.16% B. at least 0.16% but less than 0.18% C. at least 0.18% but less than 0.20% D. at least 0.20% but less than 0.22% E. at least 0.22% 38.15 (1 point) For Y, what is the chance of a claim greater than 2? A. less than 8% B. at least 8% but less than 9% C. at least 9% but less than 10% D. at least 10% but less than 11% E. at least 11%
38.16 (1 point) What is the mean of Y? A. less than 0.9 B. at least 0.9 but less than 1.0 C. at least 1.0 but less than 1.1 D. at least 1.1 but less than 1.2 E. at least 1.2 38.17 (1 point) What is the second moment of Y? A. less than 2.8 B. at least 2.8 but less than 2.9 C. at least 2.9 but less than 3.0 D. at least 3.0 but less than 3.1 E. at least 3.1 38.18 (1 point) What is the coefficient of variation of Y? A. less than 1.4 B. at least 1.4 but less than 1.5 C. at least 1.5 but less than 1.6 D. at least 1.6 but less than 1.7 E. at least 1.7 38.19 (1 point) What is the third moment of Y? A. 50 B. 51 C. 52 D. 53
E. 54
38.20 (2 points) What is the skewness of Y? A. less than 14 B. at least 14 but less than 15 C. at least 15 but less than 16 D. at least 16 but less than 17 E. at least 17 38.21 (2 points) What is the excess ratio at 5 of Y? A. less than 4% B. at least 4% but less than 5% C. at least 5% but less than 6% D. at least 6% but less than 7% E. at least 7% 38.22 (4 points) What is the mean excess loss at 2 of Y? A. 0.8 B. 1.0 C. 1.2 D. 1.4 E. 1.6
Use the following information for the next three questions:
The random variable X has the density function:
f(x) = 0.4 exp(-x/1128)/1128 + 0.6 exp(-x/5915)/5915, 0 < x < ∞.
38.23 (2 points) Determine the variance of X.
(A) 21 million (B) 23 million (C) 25 million (D) 27 million (E) 29 million
38.24 (2 points) Determine E[X ∧ 2000].
(A) 1250 (B) 1300 (C) 1350 (D) 1400 (E) 1450
38.25 (1 point) Determine E[(X - 2000)+].
(A) 2600 (B) 2650 (C) 2700 (D) 2750 (E) 2800
38.26 (2 points) You are given the following: • The random variable X has a distribution that is a mixture of a Pareto distribution with parameters θ = 1000 and α = 1, and another Pareto distribution, but with parameters θ = 100 and α = 1. • The first Pareto is given a weight of 0.3 and the second Pareto a weight of 0.7. Determine the 20th percentile of X. A. Less than 15 B. At least 15, but less than 25 C. At least 25, but less than 35 D. At least 35, but less than 45 E. At least 45 Use the following information for the next two questions: Medical losses are Poisson with λ = 2. The size of medical losses are uniform from 0 to 2000. Dental losses are Poisson with λ = 1. The size of dental losses is uniform from 0 to 500. A policy, with an ordinary deductible of 200, covers both medical and dental losses. 38.27 (2 points) Determine the average payment per loss for this policy. (A) 570 (B) 580 (C) 590 (D) 600 (E) 610 38.28 (1 point) Determine the average payment per payment for this policy. (A) 700 (B) 710 (C) 720 (D) 730 (E) 740
Use the following information for the next 14 questions: F(x) = (0.2)(1 - e-x/10) + (0.5)(1 - e-x/25) + (0.3)(1 - e-x/100). 38.29 (1 point) What is the probability that x is more than 15? A. 56% B. 58% C. 60% D. 62% E. 64% 38.30 (1 point) What is the mean? A. Less than 20 B. At least 20, but less than 30 C. At least 30, but less than 40 D. At least 40, but less than 50 E. At least 50 38.31 (3 points) What is the median? A. Less than 19 B. At least 19, but less than 20 C. At least 20, but less than 21 D. At least 21, but less than 22 E. At least 22 38.32 (1 point) What is the mode? (A) 0 (B) 10 (C) 25
(D) 100
(E) None of A, B, C, D
38.33 (1 point) What is the second moment? A. Less than 3000 B. At least 3000, but less than 4000 C. At least 4000, but less than 5000 D. At least 5000, but less than 6000 E. At least 6000 38.34 (1 point) What is the coefficient of variation? (A) 0.7 (B) 0.9 (C) 1.1 (D) 1.3 38.35 (2 points) What is the third moment? A. Less than 1.0 million B. At least 1.0 million, but less than 1.3 million C. At least 1.3 million, but less than 1.6 million D. At least 1.6 million, but less than 1.9 million E. At least 1.9 million
(E) 1.5
38.36 (2 points) What is the skewness? A. Less than 2.0 B. At least 2.0, but less than 2.5 C. At least 2.5, but less than 3.0 D. At least 3.0, but less than 3.5 E. At least 3.5 38.37 (2 points) What is the hazard rate at 50, h(50)? A. Less than 0.020 B. At least 0.020, but less than 0.025 C. At least 0.025, but less than 0.030 D. At least 0.030, but less than 0.035 E. At least 0.035 38.38 (2 points) What is the Limited Expected Value at 20, E[X ∧ 20]? A. Less than 11 B. At least 11, but less than 13 C. At least 13, but less than 15 D. At least 15, but less than 17 E. At least 17 38.39 (2 points) What is the Loss Elimination Ratio at 15? A. Less than 26% B. At least 26%, but less than 29% C. At least 29%, but less than 32% D. At least 32%, but less than 35% E. At least 35% 38.40 (2 points) What is the Excess Ratio at 75? A. Less than 26% B. At least 26%, but less than 29% C. At least 29%, but less than 32% D. At least 32%, but less than 35% E. At least 35%
38.41 (4 points) What is the Limited Second Moment at 30, E[(X ∧ 30)²]?
Hint: Use Theorem A.1 in Appendix A of Loss Models:
Γ(n; x) = 1 - Σ_{j=0}^{n-1} x^j e^(-x) / j!, for n a positive integer.
A. Less than 430 B. At least 440, but less than 450 C. At least 450, but less than 460 D. At least 460, but less than 470 E. At least 470 38.42 (3 points) What is the Mean Excess Loss at 50? A. Less than 78 B. At least 78, but less than 82 C. At least 82, but less than 86 D. At least 86, but less than 90 E. At least 90
38.43 (3 points) With the aid of a computer, graph a two-point mixture of a Gamma Distribution with α = 4 and θ = 3 and a Gamma Distribution with α = 2 and θ = 10, with 60% weight to the first distribution and 40% weight to the second distribution.
38.44 (2 points) 40% of lives follow DeMoivreʼs Law with ω = 80. The other 60% of lives follow DeMoivreʼs Law with ω = 100. A life is picked at random. If the life survives to at least age 70, what is its expected age at death?
A. 80 B. 81 C. 82 D. 83 E. 84
38.45 (2 points) You are the consulting actuary to a group of venture capitalists financing a search for pirate gold. Itʼs a risky undertaking: with probability 0.80, no treasure will be found, and thus the outcome is 0. The rewards are high: with probability 0.20 treasure will be found. The outcome, if treasure is found, is uniformly distributed on [1000, 5000]. Calculate the variance of the distribution of outcomes.
(A) 1.3 million (B) 1.4 million (C) 1.5 million (D) 1.6 million (E) 1.7 million
38.46 (3 points) With the aid of a computer, graph a two-point mixture of a Gamma Distribution with α = 4 and θ = 3 and a Gamma Distribution with α = 6 and θ = 10, with 30% weight to the first distribution and 70% weight to the second distribution.
38.47 ( 3 points) The distribution of a loss, X, is a two-point mixture: (i) With probability 0.7, X has an Exponential distribution with θ = 100. (ii) With probability 0.3, X has an Exponential distribution with θ = 200. If a loss is of size greater than 50, what is its expected size? A. 180 B. 185 C. 190 D. 195 E. 200 38.48 (3 points) In 2002 losses follow the following density: f(x) = 0.7 exp(-x/1000)/1000 + 0.3 exp(-x/5000)/5000, 0 < x < ∞. Losses uniformly increase by 8% between 2002 and 2004. In 2004 a policy has a 3000 maximum covered loss. In 2004 what is the average payment per loss? A. 1320 B. 1340 C. 1360 D. 1380 E. 1400 Use the following information for the next 3 questions: Risk Type Number of Risks Size of Loss Distribution I
600
Single Parameter Pareto, θ = 10, α = 4
II
400
Single Parameter Pareto, θ = 10, α = 3
38.49 (2 points) You independently simulate a single loss for each risk. Let S be the sum of these 1000 amounts. You repeat this process many times. What is the variance of S? A. Less than 42,000 B. At least 42,000, but less than 43,000 C. At least 43,000, but less than 44,000 D. At least 44,000, but less than 45,000 E. At least 45,000 38.50 (2 points) A risk is selected at random from one of the 1000 risks. You simulate a single loss for this risk. This risk is replaced, and a new risk is selected at random from one of the 1000 risks. You simulate a single loss for this new risk. You repeat this process many times, each time picking a new risk at random. What is the variance of the outcomes? A. 42 B. 43 C. 44 D. 45 E. 46 38.51 (2 points) A risk is selected at random from one of the 1000 risks. You simulate a single loss for this risk. You then simulate another loss for this same risk. You repeat this process many times. What is the expected variance of the outcomes? A. 42 B. 43 C. 44 D. 45 E. 46
2013-4-2,
Loss Distributions, §38 N-Point Mixtures
HCM 10/8/12,
Page 751
Use the following information for the next two questions:
• Bob is an overworked underwriter. • Applications arrive at his desk. • Each application has a 1/3 chance of being a “bad” risk and a 2/3 chance of being a “good” risk. • Since Bob is overworked, each time he gets an application he flips a fair coin. • If it comes up heads, he accepts the application without looking at it. • If the coin comes up tails, he accepts the application if and only if it is a “good” risk. • The expected profit on a “good” risk is 300 with variance 10,000. • The expected profit on a “bad” risk is -100 with variance 90,000. 38.52 (2 points) Calculate the variance of the profit per applicant. A. 50,000 B. 51,000 C. 52,000 D. 53,000 E. 54,000 38.53 (2 points) Calculate the variance of the profit per applicant that Bob accepts. A. Less than 50,000 B. At least 50,000, but less than 51,000 C. At least 51,000, but less than 52,000 D. At least 52,000, but less than 53,000 E. At least 53,000
38.54 (4 points) You are given the following information for Homeowners Insurance:
• 10% of losses are due to Wind. • 30% of losses are due to Fire. • 20% of losses are due to Liability. • 40% of losses are due to All Other Perils. • Losses due to Wind follow a LogNormal distribution with µ = 10 and σ = 0.7. • Losses due to Fire follow a Gamma distribution with α = 2 and θ = 10,000. • Losses due to Liability follow a Pareto distribution with α = 5 and θ = 200,000. • Losses due to All Other Perils follow an Exponential distribution with θ = 5,000. Determine the standard deviation of the size of loss for Homeowners Insurance. A. 15,000 B. 20,000 C. 25,000 D. 30,000 E. 35,000
2013-4-2,
Loss Distributions, §38 N-Point Mixtures
HCM 10/8/12,
Page 752
Use the following information for the next 3 questions: X follows a two-point mixture of LogNormal Distributions. The first LogNormal is given weight 65%, and has parameters µ = 8 and σ = 0.5. The second LogNormal is given weight 35%, and has parameters µ = 9 and σ = 0.3. 38.55 (1 point) Determine E[X]. A. Less than 4400 B. At least 4400, but less than 4600 C. At least 4600, but less than 4800 D. At least 4800, but less than 5000 E. At least 5000 38.56 (1 point) Determine E[X2 ]. A. Less than 35 million B. At least 35 million, but less than 40 million C. At least 40 million, but less than 45 million D. At least 45 million, but less than 50 million E. At least 50 million 38.57 (1 point) Determine 1 / E[1/X]. A. Less than 3200 B. At least 3200, but less than 3400 C. At least 3400, but less than 3600 D. At least 3600, but less than 3800 E. At least 3800
38.58 (4 points) On an exam, the grades of Good students are distributed via a Beta Distribution with a = 6, b = 2 and θ = 100. On this exam, the grades of Bad students are distributed via a Beta Distribution with a = 3, b = 2 and θ = 100. 3/4 of students are good, while 1/4 of students are bad. A grade of 65 or more passes. What is the expected grade of a student who fails this exam? A. Less than 50 B. At least 50, but less than 51 C. At least 51, but less than 52 D. At least 52, but less than 53 E. At least 53
2013-4-2,
Loss Distributions, §38 N-Point Mixtures
HCM 10/8/12,
Page 753
Use the following information for the next four questions: R is the annual return on a stock. At random, half of the time R is a random draw from a Normal Distribution with µ = 8% and σ = 20%. The other half of the time, R is a random draw from a Normal Distribution with µ = 11% and σ = 30%. Hint: The third moment of a Normal Distribution is: µ3 + 3µσ2. The fourth moment of a Normal Distribution is: µ4 + 6µ2σ2 + 3σ4. 38.59 (1 point) What is the mean of R? A. 8.5% B. 9% C. 9.5% D. 10.0%
E. 10.5%
38.60 (2 points) What is the standard deviation of R? A. Less than 25% B. At least 25%, but less than 27% C. At least 27%, but less than 29% D. At least 29%, but less than 31% E. 31% or more 38.61 (3 points) What is the skewness of R? A. Less than -0.10 B. At least -0.10, but less than -0.05 C. At least -0.05, but less than 0.05 D. At least 0.05, but less than 0.10 E. 0.10 or more 38.62 (4 points) What is the kurtosis of R? A. 3.0 B. 3.2 C. 3.4 D. 3.6
E. 3.8
38.63 (5 points) For a mixture of two distributions with the same coefficient of variation, compare the coefficient of variation of the mixture with that of the components. 38.64 (2, 5/85, Q.15) (1.5 points) If X is a random variable with density function f(x) = 1.4e-2x + 0.9e-3x for x ≥ 0. Determine E(X). A. 9/20 B. 5/6 C. 1 D. 230/126
E. 23/10
38.65 (160, 11/86, Q.1) (2.1 points) Three populations have constant forces of mortality 0.01, 0.02, and 0.04, respectively. For a group of newborns, one-third from each population, determine the complete expectation of future lifetime at age 50. (A) 8.3 (8) 24.3 (C) 50.0 (D) 58.3 (E) 74.3
2013-4-2,
Loss Distributions, §38 N-Point Mixtures
HCM 10/8/12,
Page 754
38.66 (160, 11/86, Q.4) (2.1 points) For a certain population, you are given: (i) At any point in time, equal numbers of males and females are born. (ii) The mean and variance of the lifetime distribution for males at birth are 60 and 200, respectively. (iii) The mean and variance of the lifetime distribution for females at birth are 80 and 300, respectively. Determine the variance of the lifetime distribution for the population. (A) 150 (B) 200 (C) 250 (D) 300 (E) 350 38.67 (4B, 5/93, Q.20) (1 point) Which of the following statements are true? 1. With an n-point mixture of models, the large number of parameters that need to be estimated may be a problem. 2. Starting with several distributions, the two-point mixture of models leads to many more pairs of distributions. 3. A potential computational problem with the mixture of models is that estimation of p in the equation F(x) = pF1 (x) + (1-p)F2 (x) via iterative numerical techniques, A. 1
may lead to a value of p outside the interval from 0 to 1. B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3
38.68 (4B, 11/97, Q.3) (2 points) You are given the following: • The random variable X has a distribution that is a mixture of a Burr distribution, ⎛ ⎞α 1 F(x) = 1 - ⎜ γ ⎟ , with parameters θ = 1000 , α = 1 and γ = 2, ⎝ 1 + (x / θ) ⎠ and a Pareto distribution, with parameters θ = 1,000 and α = 1. • Each of the two distributions in the mixture has equal weight. Determine the median of X. A. Less than 5 B. At least 5, but less than 50 C. At least 50, but less than 500 D. At least 500, but less than 5,000 E. At least 5,000
38.69 (4B, 5/98, Q.16) (2 points) You are given the following: • X1 is a mixture of a random variable with a uniform distribution on [0,2]
•
and a random variable with a uniform distribution on [1, 3]. (Each distribution in the mixture has positive weight.) X2 is the sum of a random variable with a uniform distribution on [0,2]
•
and a random variable with a uniform distribution on [1, 3]. X3 is a random variable that has a normal distribution that is right censored at 1.
Match X1 , X2 , and X3 with the following descriptions: 1. Continuous distribution function and continuous density function 2. Continuous distribution function and discontinuous density function 3. Discontinuous distribution function A. X1 :1, X2 :2, X3 :3 B. X1 :1, X2 :3, X3 :2 C. X1 :2, X2 :1, X3 :3 D. X1 :2, X2 :3, X3 :1
E. X1 :3, X2 :1, X3 :2
38.70 (4B, 11/98 Q.8) (2 points) You are given the following: • A portfolio consists of 75 liability risks and 25 property risks. • The risks have identical claim count distributions. •
Loss sizes for liability risks follow a Pareto distribution, with parameters θ = 300 and α = 4.
•
Loss sizes for property risks follow a Pareto distribution, with parameters θ = 1,000 and α = 3.
Determine the variance of the claim size distribution for this portfolio for a single claim. A. Less than 150,000 B. At least 150,000, but less than 225,000 C. At least 225,000, but less than 300,000 D. At least 300,000, but less than 375,000 E. At least 375,000 38.71 (Course 151 Sample Exam #2, Q.5) (0.8 points) You are given S = S1 + S2 , where S1 and S2 are independent and have compound Poisson distributions with the following characteristics: (i) λ1 = 2 and λ2 = 3 (ii)
x
p 1 (x)
p 2 (x)
1 0.6 0.1 2 0.4 0.3 3 0.0 0.5 4 0.0 0.1 Determine the variance of individual claim amounts for S. (A) 0.83 (B) 0.87 (C) 0.91 (D) 0.95 (E) 0.99
38.72 (Course 1 Sample Exam. Q.24) (1.9 points) An automobile insurance company divides its policyholders into two groups: good drivers and bad drivers. For the good drivers, the amount of an average claim is 1400, with a variance of 40,000. For the bad drivers, the amount of an average claim is 2000, with a variance of 250,000. Sixty percent of the policyholders are classified as good drivers. Calculate the variance of the amount of a claim for a policyholder. A. 124,000 B. 145,000 C. 166,000 D. 210,400 E. 235,000 38.73 (Course 3 Sample Exam, Q.10) An insurance company is negotiating to settle a liability claim. If a settlement is not reached, the claim will be decided in the courts 3 years from now. You are given: • There is a 50% probability that the courts will require the insurance company to make a payment. The amount of the payment, if there is one, has a lognormal distribution with mean 10 and standard deviation 20. • In either case, if the claim is not settled now, the insurance company will have to pay 5 in legal expenses, which will be paid when the claim is decided, 3 years from now. • The most that the insurance company is willing to pay to settle the claim is the expected present value of the claim and legal expenses plus 0.02 times the variance of the present value. • Present values are calculated using i = 0.04. Calculate the insurance company's maximum settlement value for this claim. A. 8.89 B. 9.93 C. 12.45 D. 12.89 E. 13.53 38.74 (IOA 101, 9/00, Q.8) (4.5 points) Claims on a certain class of policy are classified as being of two types, I and II. Past experience has shown that: 25% of claims are of type I and 75% are of type II; Type I claim amounts have mean 500 and standard deviation 100; Type II claim amounts have mean 300 and standard deviation 70. Calculate the mean and the standard deviation of the claim amounts on this class of policy. 38.75 (1, 5/01, Q.17) (1.9 points) An auto insurance company insures an automobile worth 15,000 for one year under a policy with a 1,000 deductible. During the policy year there is a 0.04 chance of partial damage to the car and a 0.02 chance of a total loss of the car. If there is partial damage to the car, the amount X of damage (in thousands) follows a distribution with density function f(x) = 0.5003 e-x/2, 0 < x <15. What is the expected claim payment? (A) 320 (B) 328 (C) 352 (D) 380
(E) 540
38.76 (3, 11/01, Q.28 & 2009 Sample Q.100) (2.5 points) The unlimited severity distribution for claim amounts under an auto liability insurance policy is given by the cumulative distribution: F(x) = 1 - 0.8e-0.02x - 0.2e-0.001x, x ≥ 0. The insurance policy pays amounts up to a limit of 1000 per claim. Calculate the expected payment under this policy for one claim. (A) 57 (B) 108 (C) 166 (D) 205 (E) 240 38.77 (4, 11/02, Q.13) (2.5 points) Losses come from an equally weighted mixture of an exponential distribution with mean m1 , and an exponential distribution with mean m2 . Determine the least upper bound for the coefficient of variation of this distribution. (A) 1
(B) √2 (C) √3 (D) 2 (E) √5
38.78 (SOA3, 11/03, Q.18) (2.5 points) A population has 30% who are smokers with a constant force of mortality 0.2 and 70% who are non-smokers with a constant force of mortality 0.1. Calculate the 75th percentile of the distribution of the future lifetime of an individual selected at random from this population. (A) 10.7 (B) 11.0 (C) 11.2 (D) 11.6 (E) 11.8 38.79 (CAS3, 11/04, Q.28) (2.5 points) A large retailer of personal computers issues a Warranty contract with each computer that it sells. The warranty covers any cost to repair or replace a defective computer within the first 30 days of purchase. 40% of all claims are easily resolved with minor technical help and do not involve any cost to replace or repair. If a claim involves some cost to replace or repair, the claim size is distributed as a Weibull with parameters τ = 1/2 and θ = 30. Which of the following statements are true? 1. The expected cost of a claim is $60. 2. The survival function at $60 is 0.243. 3. The hazard rate at $60 is 0.012. A. 1 only. B. 2 only. C. 3 only. D. 1 and 2 only.
E. 2 and 3 only.
38.80 (CAS3, 11/04, Q.29) (2.5 points) High-Roller Insurance Company insures the cost of injuries to the employees of ACME Dynamite Manufacturing, Inc.
• 30% of injuries are "Fatal" and the rest are "Permanent Total" (PT). • There are no other injury types. • Fatal injuries follow a log-logistic distribution with θ = 400 and γ = 2. • PT injuries follow a log-logistic distribution with θ = 600 and γ = 2. • There is a $750 deductible per injury. Calculate the probability that an injury will result in a claim to High-Roller. A. Less than 30% B. At least 30%, but less than 35% C. At least 35%, but less than 40% D. At least 40%, but less than 45% E. 45% or more 38.81 (SOA M, 5/05, Q.34 & 2009 Sample Q.169) (2.5 points) The distribution of a loss, X, is a two-point mixture: (i) With probability 0.8, X has a two-parameter Pareto distribution with α = 2 and θ = 100. (ii) With probability 0.2, X has a two-parameter Pareto distribution with α = 4 and θ = 3000. Calculate Pr(X ≤ 200). (A) 0.76 (B) 0.79
(C) 0.82
(D) 0.85
(E) 0.88
38.82 (CAS3, 11/05, Q.32) (2.5 points) For a certain insurance company, 60% of claims have a normal distribution with mean 5,000 and variance 1,000,000. The remaining 40% have a normal distribution with mean 4,000 and variance 1,000,000. Calculate the probability that a randomly selected claim exceeds 6,000. A Less than 0.10 B. At least 0.10, but less than 0.15 C. At least 0.15, but less than 0.20 D. At least 0.20, but less than 0.25 E. At least 0.25 38.83 (SOA M, 11/05, Q.32) (2.5 points) For a group of lives aged 30, containing an equal number of smokers and non-smokers, you are given: (i) For non-smokers, µn (x) = 0.08, x ≥ 30. (ii) For smokers, µs(x) = 0.16, x ≥ 30. Calculate q80 for a life randomly selected from those surviving to age 80. (A) 0.078
(B) 0.086
(C) 0.095
(D) 0.104
(E) 0.112
38.84 (CAS3, 11/06, Q.20) (2.5 points) An insurance company sells hospitalization reimbursement insurance. You are given:
• Benefit payment for a standard hospital stay follows a lognormal distribution with µ = 7 and σ = 2. • Benefit payment for a hospital stay due to an accident is twice as much as a standard benefit. • 25% of all hospitalizations are for accidental causes. Calculate the probability that a benefit payment will exceed $15,000. A. Less than 0.12 B. At least 0.12, but less than 0.14 C. At least 0.14, but less than 0.16 D. At least 0.16, but less than 0.18 E. At least 0.18 38.85 (IOA, CT8, 4/10, Q.9) (7.5 points) An asset is worth 100 at the start of the year and is funded by a senior loan and a junior loan of 50 each. The loans are due to be repaid at the end of the year; the senior one with annual interest at 6% and the junior one with annual interest at 8%. Interest is paid on the loans only if the asset sustains no losses. Any losses of up to 50 sustained by the asset reduce the amount returned to the investor in the junior loan by the amount of the loss. Any losses of more than 50 mean that the investor in the junior loan gets 0 and the amount returned to the investor in the senior loan is reduced by the excess of the loss over 50. The probability that the asset sustains a loss is 0.25. The size of a loss, L, if there is one, follows a uniform distribution between 0 and 100. (i) ( 6 points) (a) Calculate the variance of the distribution of amounts paid back to the investors in the junior loan. (b) Calculate the variance of the distribution of amounts paid back to the investors in the senior loan. (ii) (1.5 points) Calculate the probabilities for the investors in the junior and senior loans, that they get paid back less than the original amounts of their loans.
38.86 (IOA CT8, 9/10, Q.1) (5.25 points) An investor holds an asset that produces a random rate of return, R, over the course of a year. The distribution of this rate of return is a mixture of Normal distributions: R has a Normal distribution with a mean of 0% and standard deviation of 10% with probability 0.8 and a Normal distribution with a mean of 30% and a standard deviation of 10% with a probability of 0.2. S is the normally distributed random rate of return on another asset that has the same mean and variance as R. (i) (2.25 points) Calculate the mean and variance of R. (ii) (3 points) Calculate the following probabilities for R and for S: (a) probability of a rate of return less than 0%. (b) probability of a rate of return less than -10%.
Solutions to Problems:

38.1. A. The chance of a claim greater than 2 is: 1 - F(2) = {θ/(θ+2)}^α = (10/12)^4 = 0.4823.
38.2. D. The mean of a Pareto is: θ/(α−1) = 10/3 = 3.333.
38.3. A. The second moment of a Pareto is: 2θ²/{(α−1)(α−2)} = 200/6 = 33.333.
38.4. C. The variance is: 33.333 - 3.333² = 22.22. Thus the CV = √22.22 / 3.333 = 1.414.
Comment: For the Pareto the CV is: √{α / (α - 2)} = √(4/2) = √2 = 1.414.
38.5. B. The third moment of a Pareto is : 6θ3 /{(α−1)(α−2)(α−3)} = 6000/6 = 1000.
38.6. B. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³
= {1000 - (3)(3.333)(33.333) + (2)(3.333)³} / (22.22)^1.5 = 7.07.
Comment: For the Pareto, Skewness = 2{(α+1)/(α−3)} √{(α - 2) / α} = (2){(5)/(1)} √(2/4) = 7.07.
38.7. C. The excess ratio for the Pareto is: {θ/(θ+x)}α−1 = (10/15)3 = 0.2963. 38.8. B. The chance of a claim greater than 2 is: 1 - F(2) = e-2/δ = e-2/0.8 = 0.0821. 38.9. E. The mean of the Exponential Distribution is: δ = 0.8. 38.10. A. The second moment of the Exponential is: 2δ2 = 1.28. 38.11. E. The variance = 1.28 - 0.82 = 0.64. Thus the standard deviation = 0.8. The CV = standard deviation divided by the mean = 0.8 / 0.8 = 1. 38.12. E. The third moment of the exponential is 6δ3 = 3.072.
38.13. D. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³
= {3.072 - (3)(0.8)(1.28) + (2)(0.8)³} / (0.64)^1.5 = 2.
Comment: The C.V. of the exponential distribution is always 1, while the skewness is always 2.
38.14. C. The excess ratio for the Exponential is: e^(-x/δ) = e^(-5/0.8) = 0.00193.
38.15. D. The chance of a claim greater than 2 is: (0.05)(0.4823) + (0.95)(0.0821) = 0.102.
38.16. B. The mean is a weighted average of the individual means: (.05)(3.333) + (.95)(.8) = 0.9267.
38.17. B. The second moment is a weighted average of the individual second moments: (.05)(33.333) + (.95)(1.28) = 2.883.
38.18. C. The variance is: 2.883 - .927² = 2.024. Thus the CV = √2.024 / .927 = 1.535.
38.19. D. The third moment is a weighted average of the individual third moments: (.05)(1000) + (.95)( 3.072) = 52.918.
38.20. D. Skewness = {E[X³] - 3 E[X] E[X²] + 2 E[X]³} / STDDEV³
= {52.918 - (3)(.927)(2.883) + (2)(.927)³} / (2.024)^1.5 = 16.15.
38.21. C. The excess ratio for the mixed distribution is the weighted average of the individual excess ratios, using as the weights the means times p or 1-p:
{(.05)(3.333)(.2963) + (.95)(.8)(.00193)} / {(.05)(3.333) + (.95)(.8)} = 0.05085 / 0.9267 = 0.0549.
Comment: Almost certainly beyond what will be asked on the exam.
38.22. E. For the Pareto, S(2) = {θ/(θ+2)}^α = (10/12)^4 = 0.4823. For the Exponential, S(2) = exp[-2/θ] = exp[-2/.8] = 0.0821.
For the mixture, S(2) = (5%)(0.4823) + (95%)(0.0821) = 0.1021.
For the Pareto, the expected losses excess of 2 are: E[X] - E[X ∧ 2] = θ/(α−1) − {θ/(α−1)}{1 − (θ/(θ+2))^(α−1)} = (10/3)(10/12)³ = 1.9290.
For the Exponential, the expected losses excess of 2 are: E[X] - E[X ∧ 2] = θ − θ{1 - exp[-2/θ]} = θ exp[-2/θ] = .8 exp[-2/.8] = 0.0657.
For the mixture, the expected losses excess of 2 are: (5%)(1.9290) + (95%)(0.0657) = 0.1589.
For the mixture, e(2) = 0.1589/0.1021 = 1.556.
Comment: For the Exponential e(2) = θ = 0.8. For the Pareto, e(2) = (2 + θ)/(α - 1) = (2 + 10)/(4 - 1) = 4.
The mean excess loss of the mixture is not equal to the mixture of the mean excess losses: (5%)(4) + (95%)(0.8) = 0.96 ≠ 1.556.
[Graph of the mean excess loss of the mixture as a function of x omitted.]
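Not part of the original solution, here is a minimal sketch checking e(2) for this mixture; the variable names are my own:

```python
# Minimal sketch: e(2) for Y, a 5%/95% mixture of a Pareto(alpha=4, theta=10)
# and an Exponential with mean 0.8, built from the mixed survival function
# and the mixed expected losses excess of 2.
from math import exp

alpha, theta_p, theta_e, p = 4.0, 10.0, 0.8, 0.05

s2 = p * (theta_p / (theta_p + 2)) ** alpha + (1 - p) * exp(-2 / theta_e)
excess = (p * (theta_p / (alpha - 1)) * (theta_p / (theta_p + 2)) ** (alpha - 1)
          + (1 - p) * theta_e * exp(-2 / theta_e))
print(round(excess / s2, 3))   # about 1.556
```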
38.23. D. The mean of each Exponential is: θ. The second moment of each Exponential is: 2θ².
The mean and second moment of the mixed distribution are the weighted average of those of the individual distributions.
Therefore, the mixed distribution has mean: .4θ1 + .6θ2 = (.4)(1128) + (.6)(5915) = 4000, and second moment: 2(.4θ1² + .6θ2²) = 2{(.4)(1128²) + (.6)(5915²)} = 43,002,577.
Variance = 43,002,577 - 4000² = 27.0 million.
38.24. D. For the Exponential Distribution, E[X ∧ x] = θ(1 - exp(-x/θ)).
The given distribution is a 40%-60% mixture of two Exponentials, with means 1128 and 5915. Therefore, the limited expected value at 2000 is a weighted average of the LEVs for the individual Exponentials.
E[X ∧ 2000] = (.4){(1128)(1 - e^(-2000/1128))} + (.6){(5915)(1 - e^(-2000/5915))} = 1393.
Comment: Similar to 3, 11/01, Q.28.
38.25. A. E[(X - 2000)+] = E[X] - E[X ∧ 2000] = 4000 - 1393 = 2607.
Alternately, for each Exponential, E[(X - 2000)+] = ∫_2000^∞ S(x) dx = ∫_2000^∞ e^(-x/θ) dx = θ e^(-2000/θ).
For θ = 1128, E[(X - 2000)+] = 1128 e^(-2000/1128) = 191.6.
For θ = 5915, E[(X - 2000)+] = 5915 e^(-2000/5915) = 4218.0.
For the mixture, E[(X - 2000)+] = (.4)(191.6) + (.6)(4218.0) = 2607.
38.26. D. Let F be the mixed distribution, then: F(x) = (.3){1 - 1000/(1000+x)} + (.7){1 - 100/(100+x)}.
The 20th percentile is that x such that F(x) = .2. Thus the 20th percentile of the mixed distribution is the value of x such that: .2 = (.3){1 - 1000/(1000+x)} + (.7){1 - 100/(100+x)}.
Thus .8 = 300/(1000+x) + 70/(100+x). Thus .8x² + 510x - 20000 = 0.
Thus x = {-510 + √(510² + (4)(0.8)(20,000))} / {(2)(0.8)} = 37.1.
38.27. A. E[X] = (2/3)(1000) + (1/3)(250) = 750.
E[X ∧ 200] = (2/3){(.1)(100) + (.9)(200)} + (1/3){(.4)(100) + (.6)(200)} = (2/3)(190) + (1/3)(160) = 180.
E[(X - 200)+] = E[X] - E[X ∧ 200] = 750 - 180 = 570.
Alternately, for each uniform from 0 to b, E[(X - 200)+] = ∫_200^b S(x) dx = ∫_200^b (1 - x/b) dx = (b - 200) - (b/2 - 20000/b) = b/2 + 20000/b - 200.
For b = 2000, E[(X - 200)+] = 810. For b = 500, E[(X - 200)+] = 90. For the mixture, E[(X - 200)+] = (2/3)(810) + (1/3)(90) = 570. Comment: Mathematically the same as a mixture of two uniform distributions, with weight 2/(2 + 1) = 2/3 to the first uniform distribution.
38.28. B. For the mixture, S(200) = (2/3)(.9) + (1/3)(.6) = .8.
Average payment per payment = E[(X - 200)+]/S(200) = 570/.8 = 712.5.
Alternately, nonzero payments for medical have mean frequency of: (.9)(2) = 1.8. Nonzero payments for medical are uniform from 0 to 1800 with mean 900. Nonzero payments for dental have mean frequency of: (.6)(1) = .6. Nonzero payments for dental are uniform from 0 to 300 with mean 150.
{(1.8)(900) + (.6)(150)} / (1.8 + .6) = 712.5.
38.29. B. S(x) = 1 - F(x) = 0.2 exp(-x/10) + 0.5 exp(-x/25) + 0.3 exp(-x/100).
S(15) = 0.2 e^(-15/10) + 0.5 e^(-15/25) + 0.3 e^(-15/100) = 57.7%.
38.30. D. The mean of the mixed distribution is a weighted average of the mean of each Exponential Distribution: (0.2)(10) + (0.5)(25) + (0.3)(100) = 44.5.
Comment: This is a 3-point mixture of Exponential Distributions, with means of 10, 25, and 100 respectively.
38.31. B. Set F(x) = 1 - 0.2 exp(-x/10) - 0.5 exp(-x/25) - 0.3 exp(-x/100) equal to 0.5. One can calculate the distribution function at the endpoints of the intervals and determine that the median is between 19 and 20. (Solving numerically, median = 19.81.)

x     1 - Exp(-x/10)   1 - Exp(-x/25)   1 - Exp(-x/100)   Mixed Distribution
19    0.850            0.532            0.173             0.488
20    0.865            0.551            0.181             0.503
21    0.878            0.568            0.189             0.516
22    0.889            0.585            0.197             0.530
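Not part of the original solution, here is a minimal sketch of the "solving numerically" step, using simple bisection between the bracketing values in the table above:

```python
# Minimal sketch: solve F(x) = 0.5 by bisection for the 3-point mixture
# of Exponentials with means 10, 25, and 100.
from math import exp

def F(x):
    return 1.0 - (0.2 * exp(-x / 10) + 0.5 * exp(-x / 25) + 0.3 * exp(-x / 100))

lo, hi = 19.0, 20.0            # F(19) < 0.5 < F(20), from the table above
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if F(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print(round(0.5 * (lo + hi), 2))   # about 19.81
```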
38.32. A. The mode of every Exponential Distribution is 0, thus so is that of the Mixed Exponential.
Comment: If the individual distributions of a mixture have different modes, then in general it would be difficult to calculate the mode algebraically. One could do so by graphing the mixed density and seeing where it reaches a maximum. [Graph of the mixed density omitted.]
38.33. E. Each Exponential Distribution has a second moment of 2θ². The second moment of the mixture is a weighted average of the individual second moments: (0.2)(2)(10²) + (0.5)(2)(25²) + (0.3)(2)(100²) = 6665.
38.34. E. Variance = 6665 - 44.5² = 4684.75. CV = √4684.75 / 44.5 = 1.54.
Comment: Note that while the CV of every Exponential is 1, the CV of a mixed exponential is always greater than one.
38.35. D. Each Exponential Distribution has a third moment of 6θ³. The third moment of the mixture is a weighted average of the individual third moments: (0.2)(6)(10³) + (0.5)(6)(25³) + (0.3)(6)(100³) = 1,848,078.
38.36. E. Skewness = {1,848,078 - (3)(44.50)(6665) + (2)(44.5³)} / 4684.75^1.5 = 3.54.
Comment: Note that while the skewness of every Exponential is 2, the skewness of a mixed exponential is always greater than 2.
38.37. A. f(x) = (0.2)(e^(-x/10)/10) + (0.5)(e^(-x/25)/25) + (0.3)(e^(-x/100)/100). f(50) = 0.00466.
S(x) = 0.2 e^(-x/10) + 0.5 e^(-x/25) + 0.3 e^(-x/100). S(50) = 0.25097.
h(50) = f(50)/S(50) = 0.00466 / 0.25097 = 0.0186.
Comment: Note that while the hazard rate of each Exponential Distribution is independent of x, that is not true for the Mixed Exponential Distribution.
38.38. C. For each individual Exponential, the Limited Expected Value is: θ(1-e-x/θ). The Limited Expected Value of the mixture is a weighted average of the individual Limited Expected Values: (.2)(10)(1-e-20/10) + (.5)(25)(1-e-20/25) + (.3)(100)(1-e-20/100) = 14.05. 38.39. A. E[X ∧ 15] = (.2)(10)(1-e-15/10) + (.5)(25)(1-e-15/25) + (.3)(100)(1-e-15/100) = 11.37. E[X] = 44.5. LER(15) = E[X ∧ 15] / E[X] = 11.37 / 44.5 = 25.6%. 38.40. D. E[X ∧ 75] = (.2)(10)(1-e-75/10) + (.5)(25)(1-e-75/25) + (.3)(100)(1-e-75/100) = 29.71. E[X] = 44.5. R(75) = 1 - E[X ∧ 75] / E[X] = 29.71 / 44.5 = 33.2%. 38.41. D. For each individual Exponential, the Limited Second Moment is: 2θ2Γ(3;x/θ) + x2 e-x/θ. Using Theorem A.1 in Appendix A of Loss Models, Γ(3;x/θ) = 1 - e-x/θ - (x/θ)e-x/θ - (x/θ)2 e-x/θ /2. Thus E[(X ∧ x)2 ] =2θ2Γ(3; x/θ) + x2 e-x/θ = 2θ2 - 2θ2e-x/θ - 2xθe-x/θ = 2θ{θ - (θ+x)e-x/θ} . For θ = 10, E[(X ∧ 30)2 ] = 20{10 - 40e-3} = 160.17. For θ = 25, E[(X ∧ 30)2 ] = 50{25 - 55e-1.2} = 421.72. For θ = 100, E[(X ∧ 30)2 ] = 200{100 - 130e-0.3} = 738.72. The Limited Second Moment of the mixture is a weighted average of the individual Limited Second Moments: (0.2)(160.17) + (0.5)(421.72) + (0.3)(738.72) = 464.51. Comment: Difficult. One can compute the integral in the limited second moment of the Exponential Distribution by repeated use of integration by parts. 38.42. B. E[X ∧ 50] = (0.2)(10)(1-e-50/10) + (0.5)(25)(1-e-50/25) + (0.3)(100)(1-e-50/100) = 24.60. E[X] = 44.5. S(x) = 0.2e-x/10 + 0.5e-x/25 + 0.3e-x/100. S(50) = 0.25097. e(50) = (E[X] - E[X ∧ 50]) / S(50) = (44.5-24.60)/.25097 = 79.3. Comment: Note that while for each Exponential Distribution e(x) is independent of x, that is not true for the Mixed Exponential Distribution. For the Mixed Distribution, the Mean Excess Loss increases with x, towards the largest mean of the individual Exponentials. For example, in this case e(200) = 99.7. The tail behavior of the mixed exponential is that of the individual exponential with the largest mean.
38.43. In this case, the mixed distribution is unimodal. [Graph of the density of the mixture omitted.]
38.44. D. Prob[ω = 80 | surviving to least 70] = Prob[surviving to least 70 | ω = 80] Prob[ω = 80] / Prob[surviving to least 70] = (10/80)(0.4) / {(10/80)(0.4) + (30/100)(0.6)} = 21.7%. Prob[ω = 100 | surviving to least 70] = (30/100)(.6)/{(10/80)(.4) + (30/100)(.6)} = 78.3%. Therefore, expected age at death is: (21.7%)(70 + 80)/2 + (78.3%)(70 + 100)/2 = 82.8. Comment: DeMoivreʼs Law is uniform from 0 to ω. 38.45. E. This can be thought of as a two point mixture between a severity that is always zero and a uniform distribution on [1000, 5000]. Mean = (80%)(0) + (20%)(3000) = 600. 2nd moment of uniform on [1000, 5000] is: (50003 - 10003 )/{(3)(5000 - 1000)} = 10,333,333. Second moment of mixture = (80%)(0) + (20%)(10,333,333) = 2,066,667. Variance of mixture = 2,066,667 - 6002 = 1,706,667. Alternately, this can be thought of as a Bernoulli Frequency with q = .2 and a uniform severity. Variance of Aggregate = (mean freq.)(var. of severity) + (mean severity)2 (variance of freq.) = (0.2)(40002 /12) + 30002(0.2)(0.8) = 1,706,667. Comment: This can also be thought of as a two component splice between a point mass at 0 of 80% and a uniform distribution with weight 20%.
38.46. f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ(α). With α = 4 and θ = 3, f(x) = x³ e^(-x/3) / 486:
[Figure: this Gamma density plotted for x from 0 to 40; vertical axis from 0 to 0.07.]
With α = 6 and θ = 10, f(x) = x⁵ e^(-x/10) / 120,000,000:
[Figure: this Gamma density plotted for x from 0 to 140; vertical axis from 0 to 0.015.]
With 30% weight to the first distribution and 70% weight to the second distribution:
[Figure: the mixed density plotted for x from 0 to 140; vertical axis from 0 to 0.020.]
Comment: In this case, the mixed distribution is bimodal. Two-point mixtures of distributions, each unimodal, can be either unimodal or bimodal. In this example, with 3% weight to the first Gamma and 97% weight to the second Gamma, the mixture would have been unimodal.
38.47. B. E[X | X > 50] = ∫[50 to ∞] {0.7 f1(x) + 0.3 f2(x)} x dx / ∫[50 to ∞] {0.7 f1(x) + 0.3 f2(x)} dx
= {0.7 ∫[50 to ∞] f1(x) x dx + 0.3 ∫[50 to ∞] f2(x) x dx} / {0.7 ∫[50 to ∞] f1(x) dx + 0.3 ∫[50 to ∞] f2(x) dx}
= {(0.7) S1(50) (50 + e1(50)) + (0.3) S2(50) (50 + e2(50))} / {(0.7) S1(50) + (0.3) S2(50)}
= {(0.7) e^(-50/100) (50 + 100) + (0.3) e^(-50/200) (50 + 200)} / {(0.7) e^(-50/100) + (0.3) e^(-50/200)} = 122.096/0.6582 = 185.5.
Alternately, for the mixture, S(50) = (0.7) e^(-50/100) + (0.3) e^(-50/200) = 0.6582.
E[X] = (0.7)(100) + (0.3)(200) = 130. E[X ∧ 50] = (0.7){(100)(1 - e^(-50/100))} + (0.3){(200)(1 - e^(-50/200))} = 40.815.
Average size of those losses greater than 50 is: {E[X] - (E[X ∧ 50] - 50 S(50))}/S(50) = {130 - (40.815 - (50)(0.6582))}/0.6582 = 185.5.
Comment: e(x) = ∫[x to ∞] f(t) (t - x) dt / S(x). ⇒ e(x) S(x) = ∫[x to ∞] f(t) t dt - x S(x). ⇒ ∫[x to ∞] f(t) t dt = S(x) {x + e(x)}.
38.48. E. In 2002 the losses are a 70%-30% mixture of two Exponentials with means 1000 and 5000. In 2004 the losses are a 70%-30% mixture of two Exponentials with means 1080 and 5400. For the Exponential, E[X ∧ x] = θ(1 - e^(-x/θ)).
For θ = 1080, E[X ∧ 3000] = (1080)(1 - e^(-3000/1080)) = 1012.8.
For θ = 5400, E[X ∧ 3000] = (5400)(1 - e^(-3000/5400)) = 2301.7.
In 2004 the average payment per loss is: (0.7)(1012.8) + (0.3)(2301.7) = 1399.5.
38.49. C. For Type I, the Single Parameter Pareto has mean αθ/(α − 1) = (4)(10)/3 = 13.3333, second moment αθ²/(α − 2) = (4)(100)/2 = 200, and variance 200 - 13.3333² = 22.222. For Type II, the Single Parameter Pareto has mean αθ/(α − 1) = (3)(10)/2 = 15, second moment αθ²/(α − 2) = (3)(100)/1 = 300, and variance 300 - 15² = 75. The sum of 600 risks of Type I and 400 risks of Type II has variance: (600)(22.222) + (400)(75) = 43,333.
Comment: We know exactly how many of each type we have, rather than picking a certain number of risks at random.
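As a rough self-check (not in the original), the 38.48 answer can be reproduced in a few lines of Python; the function name lev_exponential is my own label for E[X ∧ d] of an Exponential:

import math

def lev_exponential(theta, d):
    """Limited expected value E[X ^ d] for an Exponential with mean theta."""
    return theta * (1 - math.exp(-d / theta))

# 38.48: in 2004 losses are a 70%-30% mixture of Exponentials with means 1080 and 5400.
weights = [0.7, 0.3]
means_2004 = [1080.0, 5400.0]
limit = 3000.0
avg_payment_per_loss = sum(w * lev_exponential(m, limit) for w, m in zip(weights, means_2004))
print(round(avg_payment_per_loss, 1))   # about 1399.5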
38.50. C. This is a 60%-40% mixture of the two Single Parameter Pareto Distributions. The mixture has mean: (0.6)(13.3333) + (0.4)(15) = 14. The mixture has second moment: (0.6)(200) + (0.4)(300) = 240. The mixture has variance: 240 - 142 = 44. 38.51. B. If the risk is of Type I, then the variance of outcomes is 22.222. If the risk is of Type II, then the variance of outcomes is 75. The expected value of this variance is: (0.6)(22.222) + (0.4)(75) = 43.333. Comment: This is the Expected Value of the Process Variance. The Variance of the Hypothetical Means is: (0.6)(13.3333 - 14)2 + (0.4)(15- 14)2 = .667. Expected Value of the Process Variance + Variance of the Hypothetical Means = 43.333 + .667 = 44 = Total Variance. Note that the variance of the sum of one loss from each of the 1000 risks is: (1000)(43.333) = 43,333, the solution to a previous question. 38.52. A. Profit per applicant is a mixed distribution, with 50% weight to heads and 50% weight to tails. These are each in turn mixed distributions. Heads is 2/3 weight to good and 1/3 weight to bad, with mean: (2/3)(300) + (1/3)(-100) = 166.67, and with second moment: (2/3)(10000 + 3002 ) + (1/3)(90000 + 1002 ) = 100,000. Tails is 2/3 weight to good and 1/3 weight to zero, with mean: (2/3)(300) + (1/3)(0) = 200, and with second moment: (2/3)(10000 + 3002 ) + (1/3)(02 ) = 66,667. The overall mean profit is: (50%)(166.67) + (50%)(200) = 183.33. The overall second moment of profit is: (50%)(100,000) + (50%)(66,667) = 83,333. The variance of the profit per applicant is: 83333 - 183.332 = 49,722. Comment: Information taken from 3, 11/02, Q.15. 38.53. C. Of the original applicants: (50%)(2/3) = 1/3 are heads and good, (50%)(1/3) = 1/6 are heads and bad, (50%)(2/3) = 1/3 are tails and good, (50%)(1/3) = 1/6 are tails and bad. Bob accepts the first three types, 5/6 of the total. Thus profit per accepted applicant is a mixed distribution, with (1/3 + 1/3)/(5/6) = 80% weight to good and (1/6)/(5/6) = 20% weight to bad. The mean profit is: (80%)(300) + (20%)(-100) = 220. The second moment of profit is: (80%)(10000 + 3002 ) + (20%)(90000 + 1002 ) = 100,000. The variance of the profit per accepted applicant is: 100000 - 2202 = 51,600.
38.54. E. The LogNormal has mean: exp[10 + 0.72 /2] = 28,141, and second moment: exp[(2)(10) + (2)(0.72 )] = 1,292,701,433. The Gamma has mean: (2)(10,000) = 20,000, and second moment: (2)(2+1)(10,0002 ) = 600,000,000. The Pareto has mean: 200,000/(5 - 1) = 50,000, and second moment: (2)200,0002 /{(5 - 1)(5 - 2)} = 6,666,666,667. The Exponential has mean: 5000, and second moment: (2)(50002 ) = 50,000,000. The mixed distribution has mean: (.1)(28,141) + (.3)(20,000) + (.2)(50,000) + (.4)(5000) = 20,814. The mixed distribution has second moment: (.1)(1,292,701,433) + (.3)(600,000,000) + (.2)(6,666,666,667) + (.4)(50,000,000) = 1662.60 million. Standard deviation of the mixed distribution:
√(1662.60 million - 20,814²) = 35,062.
Comment: Not intended to be a realistic model of Homeowners Insurance. For example, the size of loss distribution would depend on the value of the insured home. For wind and fire, there would be point masses of probability at the value of the insured home. The mix of losses by peril would depend on the location of the insured home. 38.55. E. E[X] = (.65)exp[8 + .52 /2] + (.35)exp[9 + .32 /2] = (.65)(3378) + (.35)(8476) = 5162. 38.56. B. E[X2 ] = (.65)exp[(2)(8) + (2)(.52 )] + (.35)exp[(2)(9) + (2)(.32 )] = (.65)(14,650,719) + (.35)(78,609,255) = 37,036,207. 38.57. C. For the LogNormal, E[X-1] = exp[-µ + σ2/2]. E[X-1] = (.65)exp[-8 + .52 /2] + (.35)exp[-9 + .32 /2] = (.65)(.0003801) + (.35)(.00012909) = 0.00029225. 1/E[1/X] = 1/.00029225 = 3422.
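If you want to verify the 38.54 arithmetic, a hedged sketch in Python follows; each pair below is my own shorthand for (mean, second moment) of the corresponding component:

import math

# 38.54: weights and (mean, second moment) of each component of the mixture.
lognormal = (math.exp(10 + 0.7**2 / 2), math.exp(2 * 10 + 2 * 0.7**2))
gamma = (2 * 10000, 2 * 3 * 10000**2)
pareto = (200000 / 4, 2 * 200000**2 / (4 * 3))
exponential = (5000, 2 * 5000**2)

weights = [0.1, 0.3, 0.2, 0.4]
components = [lognormal, gamma, pareto, exponential]

mean = sum(w * c[0] for w, c in zip(weights, components))     # about 20,814
second = sum(w * c[1] for w, c in zip(weights, components))   # about 1662.6 million
sd = math.sqrt(second - mean**2)                              # about 35,062
print(round(mean), round(sd))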
38.58. A. For good students, f(x) = (6 + 2 - 1)!/{(6-1)!(2-1)!} (x/100)⁶ (1 - x/100)^(2-1) / x = 42 x⁵ (1 - x/100)/10¹², 0 ≤ x ≤ 100.
F(65) = (42/10¹²) ∫[0 to 65] (x⁵ - x⁶/100) dx = 0.2338.
∫[0 to 65] x f(x) dx = (42/10¹²) ∫[0 to 65] (x⁶ - x⁷/100) dx = 12.685.
Average grade for those good students who fail: 12.685/0.2338 = 54.26.
For bad students, f(x) = (3 + 2 - 1)!/{(3-1)!(2-1)!} (x/100)³ (1 - x/100)^(2-1) / x = 12 x² (1 - x/100)/10⁶, 0 ≤ x ≤ 100.
F(65) = (12/10⁶) ∫[0 to 65] (x² - x³/100) dx = 0.5630.
∫[0 to 65] x f(x) dx = (12/10⁶) ∫[0 to 65] (x³ - x⁴/100) dx = 25.705.
Average grade for those bad students who fail: 25.705/0.5630 = 45.66.
Prob[Good | failed] = Prob[fail | Good] Prob[Good] / Prob[fail] = (0.2338)(0.75)/{(0.2338)(0.75) + (0.5630)(0.25)} = 55.47%.
Expected grade of a student who fails: (0.5547)(54.26) + (1 - 0.5547)(45.66) = 50.4.
Comment: The distribution of grades is given as continuous, so we integrate from 0 to 65.
38.59. C. & 38.60. B. & 38.61. D. & 38.62. C. Mean = (0.5)(8%) + (0.5)(11%) = 9.5%.
For each Normal, its second moment is equal to: σ² + µ². Second Moment of R is: (0.5)(0.2² + 0.08²) + (0.5)(0.3² + 0.11²) = 0.07425.
Variance of R is: 0.07425 - 0.095² = 0.0652. Standard deviation of R is: √0.0652 = 0.255.
Third Moment of the first Normal is: .083 + (3)(.08)(.22 ) = .01011. Third Moment of the second Normal is: .113 + (3)(.11)(.32 ) = .03103. Third Moment of R is: (.5)(.01011) + (.5)(.03103) = 0.02057. Third Central Moment of R is: 0.02057 - (3)(.095)(.07425) + (2)(.0953 ) = 0.001124. Skewness of R is: 0.001124/0.06521.5 = 0.0675. Fourth Moment of the first Normal is: .084 + (6)(.082 )(.22 ) + (3)(.24 ) = .006377. Fourth Moment of the second Normal is: .114 + (6)(.112 )(.32 ) + (3)(.34 ) = .030980. Fourth Moment of R is: (.5)(.006377) + (.5)(.030980) = 0.01868. Fourth Central Moment of R is: 0.01868 - (4)(.095)(0.02057) + (6)(.0952 )(.07425) - (3)(.0954 ) = 0.01464. Kurtosis of R is: 0.01464/0.06522 = 3.44. Comment: Note that each Normal has a kurtosis of 3, yet the mixture has a kurtosis greater than 3. Mixtures tend to have heavier tails.
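A small Python check (my own, not from the text) of 38.59-38.62, using the raw-moment formulas for a Normal quoted in the solution:

# 38.59-38.62: 50%-50% mixture of Normal(0.08, 0.2^2) and Normal(0.11, 0.3^2).
# Raw moments of a Normal(mu, sigma^2): E[X^2] = sigma^2 + mu^2,
# E[X^3] = mu^3 + 3 mu sigma^2, E[X^4] = mu^4 + 6 mu^2 sigma^2 + 3 sigma^4.
params = [(0.08, 0.2), (0.11, 0.3)]
weights = [0.5, 0.5]

m1 = sum(w * mu for w, (mu, s) in zip(weights, params))
m2 = sum(w * (s**2 + mu**2) for w, (mu, s) in zip(weights, params))
m3 = sum(w * (mu**3 + 3 * mu * s**2) for w, (mu, s) in zip(weights, params))
m4 = sum(w * (mu**4 + 6 * mu**2 * s**2 + 3 * s**4) for w, (mu, s) in zip(weights, params))

var = m2 - m1**2
skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / var**1.5
kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
# prints roughly 0.095, 0.255, 0.0675, 3.44
print(round(m1, 3), round(var**0.5, 3), round(skew, 4), round(kurt, 2))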
38.63. Let the CV of each component be c. Let µ1/µ2 = r. Then the mean of the mixture is: p µ1 + (1-p) µ2 = µ2{r p + 1 - p}. σ1 = cµ1 = crµ2. σ2 = cµ2.
The second moment of the mixture is: p{σ1² + µ1²} + (1-p){σ2² + µ2²} = p{c²r²µ2² + r²µ2²} + (1-p){c²µ2² + µ2²} = µ2²(1 + c²)(pr² + 1 - p).
For the mixture: 1 + CV² = E[X²]/E[X]² = µ2²(1 + c²)(pr² + 1 - p) / (µ2²{r p + 1 - p}²) = (1 + c²)(pr² + 1 - p)/{r p + 1 - p}².
The relationship of the CV of the mixture to that of each component, c, depends on the ratio: (1 + CV²)/(1 + c²) = (pr² + 1 - p)/{r p + 1 - p}².
If this ratio is one, then CV = c. If this key ratio is greater than one, then CV > c. If p = 0 or p = 1, then this key ratio is one; in this case we really do not have a mixture. For r = 1, this key ratio is one. Thus if the two components have the same mean, then the CV of the mixture is equal to the CV of each component.
For p fixed, 0 < p < 1, take the derivative with respect to r of this key ratio:
{2pr(rp + 1 - p)² - (pr² + 1 - p)(2)(rp + 1 - p)p}/{r p + 1 - p}⁴ = 2p{r(rp + 1 - p) - (pr² + 1 - p)}/{r p + 1 - p}³ = 2p(1-p)(r - 1)/{r p + 1 - p}³.
Since p > 0, 1 - p > 0, and the denominator of this derivative is positive, the sign of the derivative depends on r - 1. For r < 1 this derivative is negative, and for r > 1 this derivative is positive. Thus for p fixed, the minimum of the key ratio occurs for r = 1. Thus if the two components have the same mean, then the CV of the mixture is equal to the CV of each component. However, if the two components have different means, in other words r ≠ 1, then the CV of the mixture is greater than the CV of each component.
Alternately, the variance of the mixture = EPV + VHM. If the means of the two components are equal, then the VHM = 0, and the variance of the mixture = EPV. If the means of the two components are not equal, then the VHM > 0, and the variance of the mixture is greater than the EPV. The EPV = pσ1² + (1 - p)σ2² = pc²r²µ2² + (1-p)c²µ2² = c²µ2²(pr² + 1 - p).
Thus if the means of the two components differ, in other words if r ≠ 1,
CV² of mixture = (Variance of mixture)/(mean of mixture)² > EPV/(mean of mixture)² = c²µ2²(pr² + 1 - p) / (µ2²{r p + 1 - p}²).
Thus for r ≠ 1, (CV of mixture)² / c² > (pr² + 1 - p)/{r p + 1 - p}².
As before, we can show that this key ratio is greater than one when r ≠ 1. Thus if the two components have different means, in other words r ≠ 1, then the CV of the mixture is greater than the CV of each component.
38.64. A. f(x) = (0.7)(2e^(-2x)) + (0.3)(3e^(-3x)), a 70%-30% mixture of two Exponentials with means 1/2 and 1/3. E[X] = (0.7)(1/2) + (0.3)(1/3) = 0.45 = 9/20.
38.65. E. A mixture of three Exponentials with means 100, 50, and 25, and equal weights. For each Exponential, E[X] - E[X ∧ 50] = θe^(-50/θ), and S(50) = e^(-50/θ).
e(50) = (E[X] - E[X ∧ 50]) / S(50) = (100e-.5/3 + 50e-1/3 + 25e-2/3) / (e-.5/3 + e-1/3 + e-2/3) = 27.4768 / 0.369915 = 74.28. 38.66. E. Mean = (60 + 80)/2 = 70. Second Moment = {(200 + 602 ) + (300 + 802 )}/2 = 5250. Variance = 5250 - 702 = 350. Alternately, Expected Value of the Process Variance = (200 + 300)/2 = 250. Variance of the Hypothetical Means = {(60 -70)2 + (80 - 70)2 }/2 = 100. Total Variance = EPV + VHM = 250 + 100 = 350. 38.67. E. 1. True. 2. True. 3. True. 38.68. C. Let F be a mixed distribution: then F(x) = pA(x) + (1-p)B(x). The median is that x such that F(x) = .5. Thus the median of the mixed distribution is the value of x such that: .5 = pA(x) + (1-p)B(x). In this case, p = .5, A is a Burr and B is a Pareto. Substituting into the equation for the median: .5 = (.5){1-(1000/(1000+x2 ))} + (.5){1-1000/(1000+x)}. Thus 1 = 1000/(1000+x2 ) + 1000/(1000+x). Thus x3 = 1,000,000. Thus x =100. Comment: Check: for x = 100, (.5){1-(1000/(1000+1002 ))} + (.5){1-1000/(1000+100)} = (.5)(1-1000/11000) + (.5)(1-1000/1100) = (.5)(.9090) + (.5)(.0909) = .4545 + .0455 = .5. Note that the median of the Burr is 31.62, while the median of the Pareto is 1000. The (weighted) average of the medians is: (.5)(31.62) + (.5)(1000) = 515.8, which is not equal to the median of the mixed distribution.
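The 38.68 median can also be found numerically; the following Python bisection (an illustration I added, not the author's method) recovers x = 100:

# 38.68: solve F(x) = 0.5 for the 50%-50% mixture of the Burr and the Pareto by bisection.
def mixture_cdf(x):
    burr = 1 - 1000 / (1000 + x**2)
    pareto = 1 - 1000 / (1000 + x)
    return 0.5 * burr + 0.5 * pareto

lo, hi = 0.0, 1000.0
for _ in range(60):                      # F is increasing in x, so bisection converges
    mid = (lo + hi) / 2
    if mixture_cdf(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print(round((lo + hi) / 2, 4))           # 100.0, matching x^3 = 1,000,000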
38.69. C. The uniform distribution on [0, 2] has density function: 0 for x < 0; 1/2 for 0 ≤ x ≤ 2; 0 for x > 2.
38.70. C. For mixed distributions the moments are weighted averages of the moments of the individual distributions. For a Pareto the first moment is θ/(α−1), which in these cases are 100 and 500. For a Pareto the second moment is 2θ²/{(α−1)(α−2)}, which in these cases are 30,000 and 1,000,000. Thus the first moment of the mixed distribution is: (0.75)(100) + (0.25)(500) = 200. The second moment of the mixed distribution is: (0.75)(30,000) + (0.25)(1,000,000) = 272,500. The variance of the mixed distribution is: 272,500 - 200² = 232,500.
Alternately, the variance of a Pareto is αθ²/{(α−1)²(α−2)}. Thus the process variances are: Var[X | Liability] = 4(300²)/{(3²)(2)} = 20,000 and Var[X | Property] = 3(1000²)/{(2²)(1)} = 750,000. Thus the Expected Value of the Process Variance = (0.75)(20,000) + (0.25)(750,000) = 202,500. Since the mean of a Pareto is θ/(α−1), the hypothetical means are: E[X | Liability] = 300/3 = 100 and E[X | Property] = 1000/2 = 500. The overall mean is: (0.75)(100) + (0.25)(500) = 200. Thus the Variance of the Hypothetical Means is: (0.75)(100 - 200)² + (0.25)(500 - 200)² = 30,000. Thus for the whole portfolio the Total Variance = EPV + VHM = 202,500 + 30,000 = 232,500.

Type of Risk | A Priori Chance of This Type | Hypothetical Mean | Second Moment | Process Variance | Square of Hypothetical Mean
Liability | 0.750 | 100 | 30,000 | 20,000 | 10,000
Property | 0.250 | 500 | 1,000,000 | 750,000 | 250,000
Overall | | 200 | | 202,500 | 70,000

VHM = 70,000 - 200² = 30,000. Total Variance = 202,500 + 30,000 = 232,500.
Comment: The variance of a mixed distribution is not the weighted average of the individual variances. The statement that the risks have identical claim count distributions allows one to weight the two distributions with weights 75% and 25%. For example, if instead liability risks had twice the mean claim frequency of property risks, then (2)(0.75)/{(2)(0.75) + (1)(0.25)} = 85.7% of the claims would come from liability risks. Therefore, in that case one would instead weight the two distributions together using weights of 85.7% and 14.3%, as follows.

Type of Risk | A Priori Chance of This Type | Relative Claim Frequency | Chance of a Claim from this Type | Hypothetical Mean | Process Variance | Square of Hypothetical Mean
Liability | 0.750 | 2 | 0.857 | 100 | 20,000 | 10,000
Property | 0.250 | 1 | 0.143 | 500 | 750,000 | 250,000
Overall | | | | 157 | 124,286 | 44,286

VHM = 44,286 - 157² = 19,637. Total Variance = 124,286 + 19,637 = 143,923.
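The decomposition Total Variance = EPV + VHM in 38.70 is easy to confirm numerically; this Python fragment (mine, not the original's) does so for the 75%-25% weights:

# 38.70: variance of the 75%-25% mixture of the two Paretos, computed two ways.
weights = [0.75, 0.25]
means = [100, 500]                # Pareto means theta/(alpha - 1)
seconds = [30000, 1000000]        # Pareto second moments 2 theta^2 / {(alpha-1)(alpha-2)}

mean = sum(w * m for w, m in zip(weights, means))                        # 200
second = sum(w * s for w, s in zip(weights, seconds))                    # 272,500
total_variance = second - mean**2                                        # 232,500

epv = sum(w * (s - m**2) for w, m, s in zip(weights, means, seconds))    # 202,500
vhm = sum(w * (m - mean)**2 for w, m in zip(weights, means))             # 30,000
print(total_variance, epv + vhm)  # both 232,500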
38.71. A. This question asks for the variance of individual claim amounts. For a claim picked at random, it has a 2/(2+3) = 40% chance of coming from the first severity distribution and a 60% chance of coming from the second severity distribution. This is mathematically the same as a two-point mixture. The first distribution has a mean of 1.4 and a second moment of 2.2, while the second distribution has a mean of 2.6 and a second moment of 7.4. The mixed distribution has a weighted average of the moments; it has a mean of: (2/5)(1.4) + (3/5)(2.6) = 2.12, and a second moment of: (2/5)(2.2) + (3/5)(7.4) = 5.32. The variance of the mixed severity distribution is: 5.32 - 2.12² = 0.83.
Alternately, one computes the combined severity distribution by weighting the individual distributions together, using their mean frequencies of 2 and 3 as weights.

x | p1(x) | p2(x) | combined severity | contribution to first moment | contribution to second moment
1 | 0.6 | 0.1 | 0.30 | 0.30 | 0.30
2 | 0.4 | 0.3 | 0.34 | 0.68 | 1.36
3 | 0 | 0.5 | 0.30 | 0.90 | 2.70
4 | 0 | 0.1 | 0.06 | 0.24 | 0.96
Total | | | 1.00 | 2.12 | 5.32
The variance of the combined severity is: 5.32 - 2.122 = .83. Comment: In “Mahlerʼs Guide to Aggregate Distributions”, the very similar Course 151 Sample Exam #2, Q.4 asks instead for the variance of S. The variance of each compound Poisson is its mean frequency times the second moment of its severity. Since the two compound Poissons are independent, their variances add to get the variance of S. 38.72. D. Mean of the mixture is: (0.6)(1400) + (0.4)(2000) = 1640. Second of the mixture is: (0.6)(40,000 + 14002 ) + (0.4)(250,000+ 20002 ) = 2,900,000. Variance of the mixture is: 2,900,000 - 16402 = 210,400. Alternately, Expected Value of the Process Variance = (0.6)(40,000) + (0.4)(250,000) = 124,000. Variance of the Hypothetical Means = (0.6)(1400 - 1640)2 + (0.4)(2000 - 1640)2 = 86,400. Total Variance = EPV + VHM = 124,000 + 86,400 = 210,400.
38.73. C. Since all payments take place 3 years from now and we used an interest rate of 4%, present values are taken by dividing by 1.043 = 1.125. The mean claim payment is (50%)(0) + (50%)(10) = 5. The insurer's mean payment plus legal expense is: 5 + 5 = 10, with present value: 10/1.125 = 8.89. The payment of 5 in claims expense is fixed, so that it does not affect the variance. The second moment of the LogNormal is: variance + mean2 = 202 + 102 = 500. The claims payment is a 50%-50% mixture of zero and a LogNormal Distribution. Therefore its second moment is a weighted average of the second moments: (.5)(0) + (.5)(500) = 250. Thus the variance of the claim payments is: 250 - 52 = 225. The variance of the present value is: 225/1.1252 = 177.78. Therefore, the expected present value of the claim and legal expenses plus 0.02 times the variance of the present value is: 8.89 + (.02)(177.78) = 12.45. Comment: Since the time until payment and the interest rate are both fixed, the present values are easy to take. The present value is gotten by dividing by 1.125, so the variance of the present value is divided by 1.1252 . There is no interest rate risk or timing risk in this simplified example. We do not make any use of the fact that the amount of payment specifically follows a LogNormal Distribution. 38.74. Overall mean is: (25%)(500) + (75%)(300) = 350. Second moment for Type I is: 1002 + 5002 = 260,000. Second moment for Type II is: 702 + 3002 = 94,900. Overall second moment is: (25%)(260,000) + (75%)(94,900) = 136,175. Overall variance is: 136,175 - 3502 = 13,675. Overall standard deviation is:
√13,675 = 116.9.
Alternately, E[Var | Type] = (25%)(1002 ) + (75%)(702 ) = 6175. Var[Mean | Type] = (.25)(500 - 350)2 + (.75)(300 - 350)2 = 7500. Overall variance is: E[Var | Type] + Var[Mean | Type] = 6175 + 7500 = 13,675. Overall standard deviation is:
√13,675 = 116.9.
38.75. B. Put into thousands, the density is: f(x) = 0.0005003 e^(-x/2000), 0 < x < 15000.
Expected payment is: (0.02)(15000 - 1000) + (0.04) ∫[1000 to 15000] (x - 1000)(0.0005003) e^(-x/2000) dx
= 280 + 0.000020012 ∫[1000 to 15000] x e^(-x/2000) dx - 0.020012 ∫[1000 to 15000] e^(-x/2000) dx
= 280 + 0.000020012 {-2000x e^(-x/2000) - 2000² e^(-x/2000)} evaluated from x = 1000 to x = 15000 - 0.020012 {-2000 e^(-x/2000)} evaluated from x = 1000 to x = 15000
= 280 + 23.9 + 48.5 - 24.3 = 328.
38.76. C. For the Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)). The given distribution is an 80%-20% mixture of two Exponentials, with means 50 and 1000: F(x) = 0.8(1 - e^(-x/50)) + 0.2(1 - e^(-x/1000)). Therefore, the limited expected value at 1000 is a weighted average of the LEVs for the individual Exponentials. E[X ∧ 1000] = (0.8){(50)(1 - e^(-1000/50))} + (0.2){(1000)(1 - e^(-1000/1000))} = 166.4.
Alternately, E[X ∧ 1000] = ∫[0 to 1000] S(x) dx = ∫[0 to 1000] {0.8e^(-0.02x) + 0.2e^(-0.001x)} dx = 40(1 - e^(-20)) + (200)(1 - e^(-1)) = 166.4.
38.77. C. E[X] = (m1 + m2)/2. The second moment of a mixture is the mixture of the second moments: E[X²] = (2m1² + 2m2²)/2 = m1² + m2². 1 + CV² = E[X²]/E[X]² = 4(m1² + m2²)/(m1 + m2)² = 4{1 - 2m1m2/(m1 + m2)²} ≤ 4.
⇒ CV² ≤ 3. ⇒ CV ≤ √3.
Comment: The CV is largest when m1 and m2 are significantly different. If m1 = m2 = m, then CV² is: (4)(1 - 2m²/(2m)²) - 1 = 1; we would have a single Exponential Distribution with CV = 1. If we let r = m2/m1, then CV² = 4(m1² + m2²)/(m1 + m2)² - 1 = 4(1 + r²)/(1 + r)² - 1. This is maximized as either r→0 or r→∞, and CV² → 3 or CV → √3.
38.78. D. The future lifetime of smokers is Exponential with mean: 1/.2 = 5. The future lifetime of non-smokers is Exponential: F(t) = 1 - e-.1t. The future lifetime for an individual selected at random is a mixed Exponential: F(t) = (.3)(1 - e-.2t) + (.7)(1 - e-.1t) = 1 - .3e-.2t - .7e-.1t. Want .75 = F(t) = 1 - .3e-.2t - .7e-.1t. Let y = e-.1t. ⇒ .3y2 + .7y - .25 = 0. ⇒ y = 0.315. ⇒ t = 11.56. Comment: Constant force of mortality λ ⇔ Exponential Distribution with mean 1/λ. 38.79. C. The mean of the Weibull is: 30 Γ[1 + 1/(1/2)] = 30 Γ(3) = (30)(2!) = 60. Thus the expected cost of a claim is: (40%)(0) + (60%)(60) = 36. #1 is false. The survival function at $60 is: (60%)(Survival Function at 60 for the Weibull) = (60%)exp[-(60/30)1/2] = 0.1459. #2 is false. The density at 60 is: (40%)(0) + (60%){exp[-(60/30)1/2](1/2)(60/30)1/2/60} = .001719. h(60) = .001719/.1459 = 0.0118. # 3 is true. Comment: For the Weibull, h(x) = τxτ−1/θτ. h(60) = (1/2)(60-1/2)/301/2 = 0.0118. For the mixture, h(60) = {(60%)(f(60) of the Weibull)}/{(60%)(S(60) of the Weibull)} = hazard rate at 60 for the Weibull. Since in this case, one of the components of the mixture is zero, it does not effect the hazard rate. For example, if one mixed a Weibull and a Pareto, they would each affect the numerator and denominator of the hazard rate of the mixture. 38.80. B. S(750) = (30%){1/(1 + (750/400)2 )} + (70%){1/(1 + (750/600)2 )} = .3396. Comment: The Survival Function of the mixture is the mixture of the Survival Functions. For the loglogistic distribution, F(x) = (x/θ)γ/{1 + (x/θ)γ}. S(x) = 1/{1 + (x/θ)γ}. 38.81. A. For the first Pareto, F(200) = 1 - {100/(100 + 200)}2 = .8889. For the second Pareto, F(200) = 1 - {3000/(3000 + 200)}4 = .2275. Pr(X ≤ 200) = (0.8)(0.8889) + (0.2)(0.2275) = 0.757. 38.82. B. F(6000) = (0.6)Φ[(6000 - 5000)/ 1,000,000 ] + (0.4)Φ[(6000 - 4000)/ 1,000,000 ] = (0.6)Φ[1] + (0.4)Φ[2] = (0.6)(0.8413) + (0.4)(0.9772) = 0.8957. S(6000) = 1 - 0.8957 = 10.43%. Comment: A two-point mixture of Normal Distributions.
38.83. A. Nonsmokers have an Exponential Survival Function beyond age 30: S(x)/S(30) = exp[-0.08(x - 30)]. Similarly, for smokers, S(x)/S(30) = exp[-0.16(x - 30)]. Since we have a 50-50 mixture starting at age 30: S(x)/S(30) = 0.5 exp[-0.08(x - 30)] + 0.5 exp[-0.16(x - 30)].
S(80)/S(30) = 0.5 exp[-(0.08)(50)] + 0.5 exp[-(0.16)(50)] = 0.009326. S(81)/S(30) = 0.5 exp[-(0.08)(51)] + 0.5 exp[-(0.16)(51)] = 0.008597.
p80 = S(81)/S(80) = {S(81)/S(30)}/{S(80)/S(30)} = 0.008597/0.009326 = 0.9218. q80 = 1 - 0.9218 = 0.0782.
Alternately, assume that there are for example originally 2,000,000 individuals, 1,000,000 nonsmokers and 1,000,000 smokers, alive at age 30. Then the expected nonsmokers alive at age 80 is: 1,000,000 exp[-(0.08)(80 - 30)] = 18,316. The expected nonsmokers alive at age 81 is: 1,000,000 exp[-(0.08)(81 - 30)] = 16,907. The expected smokers alive at age 80 is: 1,000,000 exp[-(0.16)(80 - 30)] = 335. The expected smokers alive at age 81 is: 1,000,000 exp[-(0.16)(81 - 30)] = 286. In total we expect 18,316 + 335 = 18,651 alive at age 80 and 16,907 + 286 = 17,193 alive at age 81. Therefore, q80 = (18,651 - 17,193)/18,651 = 0.0782.
Comment: Since the expected number of smokers alive by age 80 is so small, q80 is close to that for nonsmokers: 1 - exp[-(0.08)(51)]/exp[-(0.08)(50)] = 1 - e^(-0.08) = 0.0769.
38.84. A. For the first LogNormal, F(15000) = Φ[(ln(15000) - 7)/2] = Φ[1.31] = 0.9049. The second LogNormal has µ = 7 + ln(2) = 7.693 and σ = 2, and F(15000) = Φ[(ln(15000) - 7.693)/2] = Φ[0.96] = 0.8315. For the mixed distribution: S(15000) = 1 - {(0.75)(0.9049) + (0.25)(0.8315)} = 0.113.
Alternately, Prob[accident ≤ 15000] = Prob[standard stay ≤ 15000/2 = 7500] = Φ[(ln(7500) - 7)/2] = Φ[0.96] = 0.8315. Proceed as before.
38.85. (i) (a) Let J be the amount paid back to the investor in the junior loan. If the asset does not sustain a loss then they get paid the 50 plus 8% interest, or 54. If a loss is sustained they get no interest. If the loss is more than 50 they get paid nothing. If the loss is less than 50, they get paid 50 minus the loss. Let U(0, 50) be uniform from 0 to 50.
Then J is: 54 with probability 75%; 0 with probability 12.5%; 50 - U(0, 50) with probability 12.5%. Since 50 - U(0, 50) is itself uniform from 0 to 50, J is: 54 with probability 75%; 0 with probability 12.5%; U(0, 50) with probability 12.5%.
J is a mixture. E[J] = (0.75)(54) + (0.125)(0) + (0.125)(25) = 43.625. E[J²] = (0.75)(54²) + (0.125)(0²) + (0.125)(50²/12 + 25²) = 2291. Var[J] = 2291 - 43.625² = 388.
(b) Let S be the amount paid back to the investor in the senior loan. If the asset does not sustain a loss then they get paid the 50 plus 6% interest, or 53. If a loss is sustained they get no interest. If the loss is less than 50, they get paid 50. If the loss L is more than 50 they get paid: 50 - (L - 50) = 100 - L.
Thus S is: 53 with probability 75%; 50 with probability 12.5%; 100 - U(50, 100) with probability 12.5%; in other words: 53 with probability 75%; 50 with probability 12.5%; U(0, 50) with probability 12.5%.
S is a mixture. E[S] = (0.75)(53) + (0.125)(50) + (0.125)(25) = 49.125. E[S²] = (0.75)(53²) + (0.125)(50²) + (0.125)(50²/12 + 25²) = 2523. Var[S] = 2523 - 49.125² = 110.
(ii) Prob(J < 50) = 0.25. Prob(S < 50) = 0.125.
Comment: The variance of a uniform distribution from 0 to 50 is 50²/12. Thus the second moment of a uniform distribution from 0 to 50 is: 50²/12 + 25².
38.86. (a) E[R] = (0.8)(0) + (0.2)(30%) = 6%. E[R2 ] = (0.8)(0.12 + 02 ) + (0.2)(0.12 + 0.32 ) = 0.028. Var[R] = 0.028 - 0.062 = 0.0244. (b) Prob[R < 0] = 0.8 Φ[(0 - 0)/0.1] + 0.2 Φ[(0 - 0.3)/0.1] = (0.8) Φ[0] + (0.2) Φ[-3] = (0.8)(0.5) + (0.2)(0.0013) = 0.40026. Prob[S < 0] = Φ[(0 - 0.06)/ 0.0244 ] = Φ[-0.38] = 0.3520. Prob[R < -0.1] = 0.8 Φ[(-0.1 - 0)/0.1] + 0.2 Φ[(-0.1 - 0.3)/0.1] = (0.8) Φ[-1] + (0.2) Φ[-4] = (0.8)(0.1587) + (0.2)(0) = 0.12696. Prob[S < -0.1] = Φ[(-0.1 - 0.06)/ 0.0244 ] = Φ[-1.02] = 0.1539. Comment: Even though R and S have the same mean and variance they have different probabilities in the lefthand tail. F(0) is greater for R than S, while F(-0.1) is greater for S than R.
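The point of the 38.86 comment, that R and S share a mean and variance but not their tail probabilities, can be illustrated with a few lines of Python (my own sketch, using the error function for the Normal CDF):

import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# 38.86: R is an 80%-20% mixture of Normal(0, 0.1^2) and Normal(0.30, 0.1^2);
# S is a single Normal with the same mean 6% and variance 0.0244.
def prob_R_below(x):
    return 0.8 * normal_cdf(x, 0.0, 0.1) + 0.2 * normal_cdf(x, 0.3, 0.1)

def prob_S_below(x):
    return normal_cdf(x, 0.06, math.sqrt(0.0244))

for threshold in (0.0, -0.1):
    print(threshold, round(prob_R_below(threshold), 4), round(prob_S_below(threshold), 4))
# F(0): R about 0.400 versus S about 0.350; F(-0.1): R about 0.127 versus S about 0.153.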
Section 39, Continuous Mixtures of Models306
Discrete mixtures can be extended to a continuous case, such as the Inverse Gamma - Exponential situation to be discussed below. Instead of an n-point mixture, one can take a continuous mixture of severity distributions. Mixture Distribution ⇔ Continuous Mixture of Models. Continuous mixtures can be performed of either frequency distributions307 or loss distributions.
For example, assume that each individual's future lifetime is exponentially distributed with mean 1/λ, and over the population, λ is uniformly distributed over (0.05, 0.15): u(λ) = 1/0.1 = 10, 0.05 ≤ λ ≤ 0.15. Then the probability that a person picked at random lives more than 20 years is:
S(20) = ∫[0.05 to 0.15] S(20; λ) u(λ) dλ = ∫[0.05 to 0.15] e^(-20λ) (10) dλ = (10/20)(e^(-1) - e^(-3)) = 15.9%.
The density at 20 of this mixture distribution is:
∫[0.05 to 0.15] f(20; λ) u(λ) dλ = ∫[0.05 to 0.15] λ e^(-20λ) (10) dλ = (10){-λ e^(-20λ)/20 - e^(-20λ)/400} evaluated from λ = 0.05 to λ = 0.15 = 0.0134.
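A quick numerical version of this uniform-mixing example (added as an illustration; the midpoint rule and step count are arbitrary choices of mine):

import math

# Numerical check of the uniform-lambda example: S(20) and the density at 20,
# integrating over lambda ~ Uniform(0.05, 0.15) with a midpoint rule.
n = 100000
width = 0.10 / n
survival, density = 0.0, 0.0
for i in range(n):
    lam = 0.05 + (i + 0.5) * width
    survival += math.exp(-20 * lam) * 10 * width          # S(20; lambda) u(lambda)
    density += lam * math.exp(-20 * lam) * 10 * width     # f(20; lambda) u(lambda)
print(round(survival, 3), round(density, 4))              # about 0.159 and 0.0134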
In general, one takes a mixture of the density functions for specific values of the parameter ζ, via some mixing distribution π(ζ):
g(x) = ∫ f(x; ζ) π(ζ) dζ.
For example, in the case where the severity is Exponential and the mixing distribution of their means is Inverse Gamma, we get the Inverse Gamma - Exponential process.
306 See Section 5.2.4 of Loss Models.
307 See the sections on Mixed Frequency Distributions and the Gamma-Poisson in “Mahlerʼs Guide to Frequency Distributions”.
Inverse Gamma-Exponential:
The sizes of loss for a particular policyholder are assumed to be Exponential with mean δ. Given δ, the distribution function of the size of loss is 1 - e^(-x/δ), while the density of the size of loss distribution is (1/δ)e^(-x/δ). The mean of this Exponential is δ and its variance is δ². Note that I have used δ rather than θ, so as not to confuse the scale parameter of the Exponential with that of the Inverse Gamma, which is θ.
So for example, the density of a loss being of size 8 is (1/δ)e^(-8/δ). If δ = 2 this density is: (1/2)e^(-4) = 0.009, while if δ = 20 this density is: (1/20)e^(-0.4) = 0.034.
Assume that the values of δ across a portfolio of policyholders are given by an Inverse Gamma distribution with α = 6 and θ = 15, with probability density function:
π(δ) = θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]} = 94,921.875 e^(-15/δ) / δ⁷, 0 < δ < ∞.308
Note that this distribution has a mean of: θ/(α−1) = 15/(6-1) = 3.
If we have a policyholder and do not know its expected mean severity, in order to get the density of the next loss being of size 8, one would weight together the densities of having a loss of size 8 given δ, using the a priori probabilities of δ, π(δ) = 94,921.875 e^(-15/δ) / δ⁷, and integrating from zero to infinity:
g(8) = ∫[0 to ∞] (e^(-8/δ)/δ) π(δ) dδ = ∫[0 to ∞] (e^(-8/δ)/δ) (94,921.875 e^(-15/δ) / δ⁷) dδ = 94,921.875 ∫[0 to ∞] e^(-23/δ) / δ⁸ dδ = 94,921.875 (6!) / (23⁷) = 0.0201.
Where we have used the fact that the density of the Inverse Gamma Distribution integrates to unity over its support, and therefore: ∫[0 to ∞] e^(-θ/x) / x^(α+1) dx = Γ[α]/θ^α = (α − 1)!/θ^α.
308 The Inverse Gamma Distribution has density: f(x) = θ^α e^(-θ/x) / {Γ(α) x^(α+1)}. In this case, the constant in front is: θ^α/Γ(α) = 15⁶/Γ(6) = 11,390,625/120 = 94,921.875.
More generally, if the distribution of Exponential means δ is given by an Inverse Gamma distribution π(δ) = θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]}, then we compute the density of having a claim of size x by integrating from zero to infinity:
g(x) = ∫[0 to ∞] (e^(-x/δ)/δ) π(δ) dδ = ∫[0 to ∞] (e^(-x/δ)/δ) (θ^α e^(-θ/δ) / {δ^(α+1) Γ[α]}) dδ = (θ^α/Γ[α]) ∫[0 to ∞] e^(-(θ + x)/δ) / δ^(α+2) dδ
= (θ^α/Γ[α]) Γ[α + 1]/(θ + x)^(α+1) = α θ^α / (θ + x)^(α+1).309
Thus the (prior) mixed distribution is in the form of the Pareto distribution. Note that the shape parameter and scale parameter of the mixed Pareto distribution are the same as those of the Inverse Gamma distribution. For the specific example: α = 6 and θ = 15. Thus the mixed Pareto has g(x) = 6(15⁶)(15 + x)^(-7). g(8) = 6(15⁶)(23)^(-7) = 0.0201, matching the previous result.
For the Inverse Gamma-Exponential the (prior) mixed distribution is always a Pareto, with α = shape parameter of the (prior) Inverse Gamma and θ = scale parameter of the (prior) Inverse Gamma.310
Note that for the particular case we get a mixed Pareto distribution with parameters of α = 6 and θ = 15, which has a mean of 15/(6-1) = 3, which matches the result obtained above. Note that the formula for the mean of an Inverse Gamma and a Pareto are both θ/(α−1).
Exercise: Each insured has an Exponential severity with mean δ. The values of δ are distributed via an Inverse Gamma with parameters α = 2.3 and θ = 1200. An insured is picked at random. What is the probability that its next claim will be greater than 1000?
[Solution: The mixed distribution is a Pareto with parameters α = 2.3 and θ = 1200.
S(1000) = {θ/(θ + x)}^α = {1200/(1000 + 1200)}^2.3 = 24.8%.]
309 Both the Exponential and the Inverse Gamma have terms involving powers of e^(-1/δ) and 1/δ.
310 See Example 5.4 in Loss Models. See also 4B, 11/93, Q.26.
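The claim that the Inverse Gamma-Exponential mixture is a Pareto can be spot-checked numerically; the Python sketch below (my own, with an arbitrary truncation of the mixing integral at δ = 400) compares the mixing integral at x = 8 with the Pareto density:

import math

# Mixing an Exponential severity over an Inverse Gamma (alpha = 6, theta = 15):
# compare the mixture density at x = 8 with the closed-form Pareto density.
alpha, theta, x = 6.0, 15.0, 8.0

n = 400000
upper = 400.0                      # the integrand is negligible beyond this point
width = upper / n
mixed = 0.0
const = theta**alpha / math.gamma(alpha)
for i in range(n):
    d = (i + 0.5) * width
    inv_gamma = const * math.exp(-theta / d) / d**(alpha + 1)
    mixed += (math.exp(-x / d) / d) * inv_gamma * width

pareto = alpha * theta**alpha / (theta + x)**(alpha + 1)
print(round(mixed, 4), round(pareto, 4))   # both about 0.0201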
Hazard Rates of Exponentials Distributed via a Gamma:311 If the hazard rate of the Exponential, λ, is distributed via a Gamma(α, θ), then the mean 1/λ is distributed via an Inverse Gamma(α, 1/θ), and therefore the mixed distribution is Pareto. If the Gamma has parameters α and θ, then the mixed Pareto has parameters α and 1/θ.
311 See for example, SOA M, 11/05, Q.17.
Relationship of Inverse Gamma-Exponential to the Gamma-Poisson: If δ, the mean of each Exponential, follows an Inverse Gamma Distribution with parameters α and θ, F(δ) = 1 - Γ[α, θ/δ]. If λ = 1/δ, then F(λ) = Γ[α, θλ], and λ follows a Gamma with parameters α and 1/θ. This is mathematically the same as Exponential interarrival times each with mean 1/λ, or a Poisson Process with intensity λ. Prob[X > x] ⇔ Prob[Waiting time to 1st claim > x] = Prob[no claims by time x]. From time 0 to x we have a Poisson Frequency with mean xλ. xλ has a Gamma Distribution with parameters α and x/θ. This is mathematically a Gamma-Poisson, with mixed distribution that is Negative Binomial with r = α and β = x/θ. Prob[X > x] ⇔ Prob[no claims by time x] = f(0) = 1/(1 + x/θ)α = θα/(θ + x)α. This is the survival function at x of a Pareto Distribution, with parameters α and θ, as obtained previously. Exercise: Each insured has an Exponential severity with mean δ. The values of δ are distributed via an Inverse Gamma with parameters α = 2.3 and θ = 1200. An insured is picked at random. What is the probability that the sum of his next 3 claims will be greater than 6000? [Solution: Prob[sum of 3 claims > 6000] ⇔ Prob[Waiting time to 3rd claim > 6000] = Prob[at most 2 claims by time 6000]. The mixed distribution is Negative Binomial with r = α = 2.3, and β = x/θ = 6000/1200 = 5. Prob[at most 2 claims by time 6000] = f(0) + f(1) + f(2) = 1/62.3 + (2.3)5/63.3 + {(2.3)(3.3)/2}52 /64.3 = 9.01%. Alternately, the sum of 3 independent Exponential Claims is a Gamma with α = 3 and θ = δ. As listed subsequently, the mixture of a Gamma by an Inverse Gamma is a Generalized Pareto Distribution with parameters, α = 2.3, θ = 1200, and τ = 3. F(x) = β[τ, α; x/(x + θ)]. S(6000) = 1 - β[3, 2.3; 6000/(6000 + 1200)] = 1 - β[3, 2.3; 1/1.2] = β[2.3, 3, 1 - 1/1.2] =
β[2.3, 3, 1/6]. Using a computer, β[2.3, 3, 1/6] = 9.01%. Comment: As shown in “Mahlerʼs Guide to Frequency Distributions,” the distribution function of a Negative Binomial is: F(x) = β[r, x+1; 1/(1+β)]. In this case, F(2) = β[2.3, 3; 1/6]. Thus, one can compute β[a, b; x], for b integer, as a sum of Negative Binomial densities.]
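The Negative Binomial computation in the exercise above can be reproduced as follows (an added sketch; the helper neg_binomial_density is my own name):

# Prob[sum of the next 3 claims > 6000] for the Inverse Gamma-Exponential with
# alpha = 2.3, theta = 1200, via the Negative Binomial with r = 2.3 and beta = 6000/1200.
r, beta = 2.3, 6000 / 1200

def neg_binomial_density(k):
    coeff = 1.0
    for j in range(k):                      # r (r+1) ... (r+k-1) / k!
        coeff *= (r + j) / (j + 1)
    return coeff * beta**k / (1 + beta)**(r + k)

prob_at_most_2 = sum(neg_binomial_density(k) for k in range(3))
print(round(prob_at_most_2, 4))             # about 0.0901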
Moments of Mixed Distributions:
The nth moment of a mixed distribution is the mixture of the nth moments for specific values of the parameter ζ: E[X^n] = Eζ[E[X^n | ζ]].
Exercise: What is the mean for a mixture of Exponentials, mixed on the mean δ?
[Solution: For a given value of δ, the mean of an Exponential Distribution is δ. We need to weight these first moments together via the density of delta, π(δ): ∫ δ π(δ) dδ = mean of π(δ), the distribution of δ.]
Thus the mean of a mixture of Exponentials is the mean of the mixing distribution. This result will hold whenever the parameter being mixed is the mean, as it was in the case of the Exponential. For the case of a mixture of Exponentials via an Inverse Gamma Distribution with parameters α and θ, the mean of the mixed distribution is that of the Inverse Gamma, θ/(α−1).
Exercise: What is the Second Moment of Exponentials, mixed on the mean δ?
[Solution: For a given value of δ, the second moment of an Exponential Distribution is 2δ². We need to weight these second moments together via the density of delta, π(δ): ∫ 2δ² π(δ) dδ = 2(second moment of π(δ), the distribution of δ).]
Exercise: What is the variance of Exponentials mixed on the mean δ via an Inverse Gamma Distribution, as per Loss Models, with parameters α and θ?
[Solution: The second moment of the mixed distribution is: 2(second moment of the Inverse Gamma) = 2θ²/{(α − 1)(α − 2)}. The mean of the mixed distribution is the mean of the Inverse Gamma: θ/(α−1). Thus the variance of the mixed distribution is: 2θ²/{(α − 1)(α − 2)} - {θ/(α − 1)}² = αθ²/{(α − 1)²(α − 2)}.
Comment: The mixed distribution is a Pareto and this is indeed its variance.]
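To make the last exercise concrete (this check and the values α = 4, θ = 300 are my own, not from the text), the two expressions for the variance agree:

# Variance of the Exponential-Inverse Gamma mixture equals the Pareto variance formula.
alpha, theta = 4.0, 300.0     # illustrative values, not from the text

mean = theta / (alpha - 1)
second = 2 * theta**2 / ((alpha - 1) * (alpha - 2))
variance_from_moments = second - mean**2
pareto_variance = alpha * theta**2 / ((alpha - 1)**2 * (alpha - 2))
print(variance_from_moments, pareto_variance)   # identical: 20,000 and 20,000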
Normal-Normal:
The sizes of claims a particular policyholder makes are assumed to be Normal with mean m and known fixed variance s².312 Given m, the distribution function of the size of loss is: Φ[(x - m)/s], while the density of the size of loss distribution is: φ[(x - m)/s] = exp[-(x - m)²/(2s²)] / (s√(2π)).
So for example, if s = 3, then the probability density of a claim being of size 8 is: exp[-(8 - m)²/18] / (3√(2π)).
If m = 2 this density is: exp[-2]/(3√(2π)) = 0.018, while if m = 20 this density is: exp[-8]/(3√(2π)) = 0.000045.
Assume that the values of m are given by another Normal Distribution with mean 7 and standard deviation of 2, with probability density function:313
π(m) = exp[-(m - 7)²/8] / (2√(2π)), -∞ < m < ∞.
Note that 7, the mean of this distribution, is the a priori mean claim severity.
312 Note Iʼve used roman letters for the parameters of the Normal likelihood, in order to distinguish them from those of the Normal distribution of parameters discussed below.
313 There is a very small but positive chance that the mean severity will be negative.
Below is displayed this distribution of hypothetical mean severities:314
[Figure: the Normal density of the hypothetical mean severities, plotted for mean severities from 0 to 14; vertical axis from 0 to 0.20.]
If we have a risk and do not know what type it is, in order to get the chance of the next claim being of size 8, one would weight together the chances of having a claim of size 8 given m: (8 - m)2 exp[] 18 , using the a priori probabilities of m: 3 2π (m - 7)2 ] 8 , and integrating from minus infinity to infinity: 2 2π
exp[π(m) =
∞
∫
-∞
2 2 (8 - m)2 ∞ exp[- (8 - m) ] exp[- (m - 7 ) ] ] 18 18 8 π(m) dm = dm = 3 2π 3 2π 2 2π
exp[-
1 6 2π
1 6 2π 314
∫
-∞
∞
∫
-∞ ∞
∫ -∞
(8 - m)2 (m - 7 )2 ∞ exp[-{ + }] 1 exp[- {13m2 - 190m + 697} / 72 ] 18 8 dm = dm = 6 2π 2π 2π
∫
-∞
exp[- {m2 - (190 / 13)m + (95 / 13)2 + 697 / 13 - (95 / 13) 2} / (72 / 13) ] 2π
dm =
Note that there is a small probability that a hypothetical mean is negative. When this situation is discussed further in “Mahlerʼs Guide to Conjugate Priors,” this will be called the prior distribution of hypothetical mean severities.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
exp[(-36 / 132 )/ (72 / 13)] 6 2π
∞
∫ -∞
exp[- (m - 95 / 13)2 / {(2) (6 / 13)2 } ] 2π
HCM 10/8/12,
Page 793
dm =
exp[-1/ 26] exp[-1/ 26] (6/ 13 ) = = 0.1065. 6 2π 13 2π
Where we have used the fact that a Normal Density integrates to unity:315 ∞
∫ -∞
exp[- (m - 95 / 13)2 / {(2) (6 / 13)2 } ] 2π
dm = 6/ 13 .
More generally, for the Normal-Normal, the mixed distribution is another Normal, with mean equal to that of the Normal distribution of parameters, and variance equal to the sum of the variances of the Normal distribution of parameters and the Normal likelihood.316 For the specific case dealt with previously: s = 3, µ = 7, and σ = 2, the mixed distribution has a Normal Distribution with a mean of 7 and variance of: 32 + 22 = 13. (x - 7)2 ] 26 . 13 2π
exp[Thus the chance of having a claim of size x is:
For x = 8 this chance is:
exp[-1/ 26] = 0.1065. 13 2 π
This is the same result as calculated above.
315
With mean of 95/13 and standard deviation of 6/ 13 . The Expected Value of the Process Variance is the variance of the Normal Likelihood, the Variance of the Hypothetical Means is the variance of the Normal distribution of parameters, and the total variance is the variance of the mixed distribution. Thus this relationship follows from the general fact that the total variance is the sum of the EPV and VHM. See “Mahlerʼs Guide to Buhlmann Credibility.” 316
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 794
Derivation of the Mixed Distribution for the Normal-Normal: The sizes of loss for a particular policyholder is assumed to be Normal with mean m and known fixed variance s2 . Given m, the density of the size of loss distribution is: φ[
x-m ]= s
(x - m)2 ] 2s2 . s 2π
exp[-
The distribution of hypothetical means m is given by another Normal Distribution with mean µ and variance σ2 : π(m) =
(m - µ)2 ] 2σ2 , -∞ < m < ∞. σ 2π
exp[-
We compute the mixed density at x, the chance of having a claim of size x, by integrating from minus infinity to infinity: 2 2 2 ∞ exp[- (x - m) ] ∞ exp[- (x - m) ] exp[- (m - µ) ] 2s2 2s2 2σ 2 π(m) dm = dm = s 2π s 2π σ 2π
∫
∫
-∞
-∞
1 s σ 2π
1 s σ 2π
∫
(x - m)2 (m - µ)2 exp[-{ + }] 2s2 2σ2 dm = 2π
∞
exp[-
∞
-∞
∫
-∞
(s2 + σ 2 )m2 - (xσ 2 + µs2 )2m + x2 σ2 + µ 2s2 ] 2s2 σ2 dm = 2π
s2 σ 2 xσ2 + µs2 x2σ 2 + µ2 s2 Let ξ2 = 2 , ν = , and δ = . s + σ2 s2 + σ 2 s2 + σ 2 Then the above integral is equal to: 1 s σ 2π
∞
∫ -∞
exp[-
m2 - 2νm + δ ] 2ξ2 2π
dm =
1 s σ 2π
∞
∫ -∞
exp[-
m2 - 2νm + ν2 - ν2 + δ ] 2ξ 2 2π
dm =
2013-4-2,
1 s σ 2π
exp[-
Loss Distributions, §39 Continuous Mixtures
exp[-
ν2 - δ ] 2ξ2
s2 + σ 2
2π
ν2 - δ ] 2ξ 2
∞
∫
exp[-
-∞
(m - ν)2 ] 2ξ2 2π
dm =
1 s σ 2π
exp[-
HCM 10/8/12,
Page 795
ν2 - δ ]ξ= 2ξ 2
.
Where we have used the fact that a Normal Density integrates to unity:317 ∞
∫ -∞
exp[-
(m - ν)2 ] 2ξ2
ξ
dm = 1.
2π
Note that ν2 - δ =
x2σ 4 + 2xµσ2 s2 + µ 2 s4 - {x2s2 σ 2 + x2 σ4 + µ 2s4 + µ 2 σ2 s2} = (s2 + σ 2)2
2xµσ2 s2 - x 2s2 σ 2 - µ 2 σ 2s2 (x - µ)2 σ2 s2 = . (s2 + σ 2 )2 (s2 + σ2 )2
Thus,
ν2 - δ (x - µ)2 σ2 s2 s2 + σ 2 (x - µ)2 = = . ξ2 s2 σ 2 (s2 + σ2 )2 s2 + σ 2
Thus the mixed distribution can be put back in terms of x, s, µ, and σ: exp[-
ν2 - δ ] 2ξ2
s2 + σ 2
2π
exp[=
(x - µ)2 ] 2(s2 + σ 2)
s2 + σ 2
2π
.
This is a Normal Distribution with mean µ and variance s2 + σ2 . Thus if the likelihood is a Normal Distribution with variance s2 (fixed and known), and the distribution of the hypothetical means of the likelihood is also a Normal, but with mean µ and variance σ2, then the mixed distribution is yet a third Normal Distribution with mean µ and variance s2 + σ2 . The mean of the likelihood is what is varying among the insureds in the portfolio. Therefore, the mean of the mixed distribution is equal to that of the prior distribution, in this case µ. 317
With mean of ν and standard deviation of ξ.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 796
Other Mixtures: There are many other examples of continuous mixtures of severity distributions. Here are some examples.318 In each case the scale parameter is being mixed, with the other parameters in the severity distribution held fixed. Severity
Mixing Distribution
Mixed Distribution
Exponential
Inverse Gamma: α, θ
Pareto: α, θ
Inverse
Gamma: α, θ
Inverse Pareto: τ = α, θ
Exponential Weibull, τ = t
Inverse Transformed
Burr: α, θ, γ = t
Gamma: α, θ, τ = t Inverse Burr: τ = α, θ, γ = t
Inverse
Transformed
Weibull, τ = t
Gamma: α, θ, τ = t
Gamma, α = a
Inverse Gamma: α, θ
Generalized Pareto: α, θ, τ = a
Inverse Gamma, α = a
Exponential: θ
Pareto: α = a, θ
Inverse
Generalized
Gamma, α = a
Gamma: α, θ
Pareto: α = a, θ, τ = α
Transformed
Inverse Transformed
Transformed Beta:
Gamma, α = a, τ = t
Gamma: α, θ,τ = t
α, θ, γ = t, τ = a
Inverse Transformed
Transformed
Transformed Beta:
Gamma, α = a, τ = t
Gamma: α, θ,τ = t
α = a, θ, γ = t, τ = α
318
See the problems for illustrations of some of these additional examples. Example 5.6 in Loss Models shows that mixing an Inverse Weibull via a Transformed Gamma gives an Inverse Burr.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 797
For example, assume that the amount of an individual claim has an Inverse Gamma distribution with shape parameter α fixed and scale parameter q (rather than θ to avoid later confusion.) The parameter q is distributed via an Exponential Distribution with mean µ. For the Inverse Gamma, f(x | q) = qα e-q/x / {Γ[α] xα+1}. For the Exponential, π(q) = e-q/µ / µ. f(x) =
∫
f(x | q) π(q) dq =
∞
∞ qα e- q / x e-q /µ dq = ∫ Γ[α] xα+ 1 µ ∫ qα e- q(1/ x + 1/ µ) dq / {µ Γ[α] xα+1} 0 0
= {Γ[α+1]/(1/x + 1/µ)α+1} / ({µ Γ[α] xα+1}) = α µα/(x + µ)α+1. This is the density of a Pareto Distribution with parameters α and θ = µ. This is an example of an Exponential-Inverse Gamma, an Inverse Gamma Severity with shape parameter α, with its scale parameter mixed via an Exponential.319 The mixture is a Pareto Distribution, with shape parameter equal to that of the Inverse Gamma severity, and scale parameter equal to the mean of the Exponential mixing distribution. Exercise: The severity for each insured is an Inverse Gamma Distribution with parameters α = 3 and q. Over the portfolio, q varies via an Exponential Distribution with mean 500. What is the severity distribution for the portfolio as a whole? [Solution: The mixed distribution is a Pareto Distribution with parameters α = 3, θ = 500.] Exercise: In the previous exercise, what is the probability that a claim picked at random will be greater than 400? [Solution: S(400) = {500/(400 + 500)}3 = 17.1%.] Exercise: In the previous exercise, what is the expected size of a claim picked at random? [Solution: Mean of a Pareto with α = 3 and θ = 500 is: 500/(3 - 1) = 250. Alternately, the mean of each Inverse Gamma is: E[X | q] = q/(3 - 1) = q/2. E[X] = Eq [E[X | q]] = Eq [q/2] = Eq [q]/2 = (mean of the Exponential Dist.)/2 = 500/ 2 = 250.] Exercise: The severity for each insured is a Transformed Gamma Distribution with parameters
α = 3.9, q, and τ = 5. Over the portfolio, q varies via an Inverse Transformed Gamma Distribution with parameters α = 2.4, θ = 17, and τ = 5. What is the severity distribution for the portfolio as a whole? [Solution: Using the above chart, the mixed distribution is a Transformed Beta Distribution with parameters α = 2.4, θ = 17, γ = 5, and τ = 3.9.] 319
This differs from the more common Inverse Gamma-Exponential discussed previously, in which we have an Exponential severity, whose mean is mixed via the Inverse Gamma.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 798
Frailty Models:320 A particular type of mixed model are called Frailty models. They involve a particular form of the hazard rate. x
∫0
Recall that the hazard rate, h(x) = f(x) / S(x). Also S(x) = exp[-H(x)], where H(x) = h(t) dt . Assume h(x | λ) = λ a(x), where λ is a parameter which varies across the portfolio. a(x) is some function of x, and let A(x) =
∫ a(x) dx . Then H(x | λ) = λ A(x).
S(x | λ) = exp[-λ A(x)]. S(x) = Eλ[S(x | λ)] = Eλ[exp[-λ A(x)]] = Mλ[-A(x)], where Mλ is the moment generating function of the distribution of λ.321 322 For an Exponential Distribution, the hazard rate is constant and equal to one over the mean. Thus if each individual has an Exponential Distribution, a(x) = 1, and λ = 1/θ. A(x) = x, and S(x) = Mλ[-x]. We have already discussed mixtures like this. For example, λ could be distributed uniformly from 0 to 2.323 In that case, the general mathematical structure does not help very much. However, let us assume each individual is Exponential and that λ is Gamma Distributed with parameters α and β. The Gamma has moment generating function M(t) = (1 - βt)−α.324 Therefore, S(x) = (1 + βx)−α. This is a Pareto Distribution with parameters α and θ = 1/β.325 This is mathematically equivalent to the Inverse Gamma-Exponential discussed previously. If λ is Gamma Distributed with parameters α and β, then the means of the Exponentials, δ = 1/λ are distributed via an Inverse Gamma with parameters α and 1/β. The mixed distribution is Pareto with parameters α and θ = 1/β. 320
Section 5.2.5 in Loss Models. The definition of the moment generating function is My(t) = Ey[exp[yt]]. See “Mahlerʼs Guide to Aggregate Distributions.” 322 The survival function of the mixture is the mixture of the survival functions. 323 See 3, 5/01, Q.28. 324 See Appendix A in the tables attached to the exam. 325 For the Pareto, S(x) = (1 + x/θ)−α. 321
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 799
Exercise: What is the hazard rate for a Weibull Distribution? [Solution: h(x) = f(x)/S(x) = {τ(x/θ)τ exp(-(x/θ)τ) / x}/exp(-(x/θ)τ) = τ xτ−1 θ−τ.] Therefore, we can put the Weibull for fixed τ into the form of a frailty model; h(x) = λ a(x), by taking a(x) = τ xτ−1 and λ = θ−τ. A(x) = xτ. Therefore, if each insured is Weibull with fixed τ, with λ = θ−τ, then S(x) = Mλ[-A(x)] = Mλ[-xτ]. Exercise: Each insured has a Weibull Distribution with τ fixed. λ = θ−τ is Gamma distributed with parameters α and β. What is the form of the mixed distribution?326 [Solution: The Gamma has moment generating function M(t) = (1 - βt)−α.327 Therefore, S(x) = (1 + βxτ)−α. This is a Burr Distribution with parameters α, θ = 1/β1/τ, and γ = τ. Comment: The Burr Distribution has S(x) = (1 + (x /θ)γ)−α.] If in this case α = 1, then λ has an Exponential Distribution, and the mixed distribution is a Loglogistic, a special case of the Burr for α = 1.328 If instead τ = 1, then as discussed previously the mixed distribution is a Pareto, a special case of the Burr. In general for a frailty model, f(x | λ) = -dS(x | λ)/dx = λ a(x) exp[-λ A(x)]. Therefore, f(x) = Eλ[ λ a(x) exp[-λ A(x)]] = a(x) Eλ[ λ exp[-λ A(x)]] = a(x) Mλʼ[-A(x)], where Mλʼ is the derivative of the moment generating function of the distribution of λ.329 330 For example, in the previous exercise, M(t) = (1 - βt)−α, and Mʼ(t) = αβ (1 - βt)−(α+1). f(x) = a(x) Mλʼ[-A(x)] = τ xτ−1 αβ (1 - βxτ)−(α+1). The density of a Burr Distribution is: (αγ θ−γ xγ−1)(1 + (x /θ)γ)−(α + 1). This is indeed the density of a Burr Distribution with parameters α, θ = 1/β1/τ, and γ = τ. 326
See Example 5.7 in Loss Models. This result is mathematically equivalent to mixing the scale parameter of a Weibull via an Inverse Transformed Gamma, resulting in a Burr, one of the examples listed previously. 327 See Appendix A in the tables attached to the exam. 328 See Exercise 5.13 in Loss Models. 329 Ey[y exp[yt]] = Myʼ(t). See “Mahlerʼs Guide to Aggregate Distributions.” 330 The density function of the mixture is the mixture of the density functions.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 800
For a frailty model, h(x) = f(x)/S(x) = a(x) Mλʼ[-A(x)] / Mλ[-A(x)] = a(x) d ln Mλ[-A(x)] / dλ.331 Defining the cumulant generating function as ψX(t) = ln MX(t) = ln E[etx], then h(x) = a(x) ψλʼ(-A(x)), where ψʼ is the derivative of the cumulant generating function. For example in the previous exercise, M(t) = (1 - βt)−α, and ψ(t) = ln M(t) = -α ln (1 - βt).
ψʼ(t) = αβ /(1 - βt). h(x) = a(x) ψλʼ(-A(x)) = τ xτ−1 αβ /(1 + βxτ). Exercise: What is the hazard rate for a Burr Distribution? [Solution: h(x) = f(x)/S(x) = {(αγ θ−γ xγ−1)(1 + (x/θ)γ)−(α + 1)}/{1 + (x/θ)γ}−α = (αγ θ−γ xγ−1)/(1 + (x/θ)γ).] Thus the above h(x) is indeed the hazard rate of a Burr Distribution with parameters α,
θ = 1/β1/τ, and γ = τ.
331
The hazard rate of the mixture is not the mixture of the hazard rates.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 801
Problems: Use the following information for the next 3 questions: Assume that the size of claims for an individual insured is given by an Exponential Distribution: f(x) = e−x/δ /δ, with mean δ and variance δ2. Also assume that the parameter δ varies for the different insureds, with δ following an θα e- θ / δ Inverse Gamma distribution: g(δ) = α + 1 , for 0< δ < ∞. δ Γ[α ] 39.1 (2 points) An insured is picked at random and is observed until it has a loss. What is the probability density function at 400 for the size of this loss? 1 θα α θα θα α θα A. B. C. D. E. (θ + 400)α (θ + 400)α (θ + 400)α (θ + 400)α+ 1 (θ + 400)α+ 1 39.2 (2 points) What is the unconditional mean severity? θ θ α - 1 α A. B. C. D. E. None of A, B, C, or D α - 1 α θ θ 39.3 (3 points) What is the unconditional variance? A.
θ2 α - 1
B.
2 α θ2 α - 1
C.
α θ2 (α - 1) (α - 2)
D.
α θ2 (α - 1)2 (α - 2)
E.
2 α θ2 (α - 1)2 (α - 2)
39.4 (3 points) The severity distribution of each risk in a portfolio is given by a Weibull Distribution, with parameters τ = 1/3 and θ, with θ varying over the portfolio via an Inverse Transformed Gamma Distribution: g(θ) = A. Burr
(72.5 / 3) exp[-7 / θ1/ 3 ] θ11/ 6 Γ(2.5)
B. Generalized Pareto
. What is the mixed distribution? C. Inverse Burr
D. LogLogistic
E. ParaLogistic
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 802
39.5 (3 points) You are given the following: • The amount of an individual claim, X, follows an exponential distribution function with probability density function: f(x | λ) = λ e-λx, x, λ > 0. •
The parameter λ, follows a Gamma distribution with probability density function p(λ) = (4/3) λ4 e-2λ, λ > 0.
Determine the unconditional probability that x > 7. A. 0.00038 B. 0.00042 C. 0.00046
D. 0.00050
E. 0.00054
39.6 (2 points) Consider the following frailty model: • h(x | λ) = λ a(x). • a(x) = 4 x3 . • Λ follows an Exponential Distribution with mean 0.007. Determine S(6). A. 7% B. 8%
C. 9%
D. 10%
E. 11%
39.7 (3 points) The future lifetimes of a certain population consisting of 1000 people is modeled as follows: (i) Each individual's future lifetime is exponentially distributed with constant hazard rate λ. (ii) Over the population, λ is uniformly distributed over (0.01, 0.11). For this population, all of whom are alive at time 0, calculate the number of deaths expected between times 3 and 5. (A) 75 (B) 80 (C) 85 (D) 90 (E) 95 39.8 (2 points) You are given the following: • The number of miles that an individual car is driven during a year is given by an Exponential Distribution with mean µ. • µ differs between cars. • µ is distributed via an Inverse Gamma Distribution with parameters α = 3 and θ = 25,000. What is the probability that a car chosen at random will be driven more than 20,000 miles during the next year? (A) 9% (B) 11% (C) 13% (D) 15% (E) 17%
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 803
39.9 (2 points) You are given the following: • The IQs of actuaries are normally distributed with mean 135 and standard deviation 10. • Each actuaryʼs score on an IQ test is normally distributed around his true IQ, with standard deviation of 15. What is the probability that Abbie the actuary scores between 145 and 155 on his IQ test? A. 10% B. 12% C. 14% D. 16% E. 18% 39.10 (2 points) Consider the following frailty model: • SX|Λ(x | λ) = e−λx. • Λ follows a Gamma Distribution with α = 3 and θ = 0.01. Determine S(250). A. Less than 3% B. At least 3%, but less than 4% C. At least 4%, but less than 5% D. At least 5%, but less than 6% E. At least 6% 39.11 (3 points) The severity distribution of each risk in a portfolio is given by an Inverse Weibull Distribution, F(x) = exp[-(q/x)4 ], with q varying over the portfolio via a Transformed Gamma Distribution with parameters α = 1.3, θ = 11, and τ = 4. What is the probability that the next loss will be of size less than 10? ⎛ x⎞ τ τ xτα -1 exp -⎜ ⎟ ⎝ θ⎠ Hint: For a Transformed Gamma Distribution, f(x) = τα θ Γ(α)
[ ]
A. Less than 32% B. At least 32%, but less than 34% C. At least 34%, but less than 36% D. At least 36%, but less than 38% E. At least 38%
.
2013-4-2,
Loss Distributions, §39 Continuous Mixtures
HCM 10/8/12,
Page 804
39.12 (3 points) You are given the following:
• The amount of an individual loss in 2002 follows an exponential distribution with mean $3000.
• Between 2002 and 2007, losses will be multiplied by an inflation factor.
• You are uncertain of what the inflation factor between 2002 and 2007 will be, but you estimate that it will be a random draw from an Inverse Gamma Distribution with parameters α = 4 and θ = 3.5.
Estimate the probability that a loss in 2007 exceeds $5500.
A. Less than 18%
B. At least 18%, but less than 19%
C. At least 19%, but less than 20%
D. At least 20%, but less than 21%
E. At least 21%

Use the information on the following frailty model for the next two questions:
• Each insured has a survival function that is Exponential with hazard rate λ.
• The hazard rate varies across the portfolio via an Inverse Gaussian Distribution with µ = 0.015 and θ = 0.005.

39.13 (3 points) Determine S(65).
A. 56%  B. 58%  C. 60%  D. 62%  E. 64%
39.14 (2 points) For the mixture, what is the hazard rate at 40?
A. 0.0060  B. 0.0070  C. 0.0080  D. 0.0090  E. 0.0100
39.15 (4 points) You are given the following:
• The amount of an individual claim has an Inverse Gamma distribution with shape parameter α = 4 and scale parameter q (rather than θ, to avoid later confusion).
• The parameter q is distributed via an Exponential Distribution with mean 100.
What is the probability that a claim picked at random will be of size greater than 15?
A. Less than 50%
B. At least 50%, but less than 55%
C. At least 55%, but less than 60%
D. At least 60%, but less than 65%
E. At least 65%
39.16 (3 points) You are given the following:
• X is a Normal Distribution with mean zero and variance v.
• v is distributed via an Inverse Gamma Distribution with α = 10 and θ = 10.
Determine the form of the mixed distribution.

39.17 (2 points) You are given the following:
• The amount of an individual claim has an exponential distribution given by: p(y) = (1/δ) e^(-y/δ), y > 0, δ > 0.
• The parameter δ has a probability density function given by: f(δ) = (4000/δ^4) e^(-20/δ), δ > 0.
Determine the variance of the claim severity distribution.
A. 150  B. 200  C. 250  D. 300  E. 350

39.18 (3 points) Consider the following frailty model:
• h(x | λ) = λ a(x).
• a(x) = 0.004 / √(1 + 0.008x).
• Λ follows a Gamma Distribution with α = 6 and θ = 1.
Determine S(11).
A. 70%  B. 72%  C. 74%  D. 76%  E. 78%
39.19 (3 points) You are given the following:
• The amount of an individual loss this year follows an Exponential Distribution with mean $8000.
• Between this year and next year, losses will be multiplied by an inflation factor.
• The inflation factor follows an Inverse Gamma Distribution with parameters α = 2.5 and θ = 1.6.
Estimate the probability that a loss next year exceeds $10,000.
A. Less than 21%
B. At least 21%, but less than 22%
C. At least 22%, but less than 23%
D. At least 23%, but less than 24%
E. At least 24%
39.20 (2 points) Severity is LogNormal with parameters µ and 0.3. µ varies across the portfolio via a Normal Distribution with parameters 5 and 0.4.
What is the probability that a loss chosen at random exceeds 200?
(A) 25%  (B) 27%  (C) 29%  (D) 31%  (E) 33%

Use the following information for the next two questions:
For each class, sizes of loss are Exponential with mean µ. Across a group of classes, µ varies via an Inverse Gamma Distribution with parameters α = 3 and θ = 1000.

39.21 (2 points) For a class picked at random, what is the expected value of the loss elimination ratio at 500?
A. 50%  B. 55%  C. 60%  D. 65%  E. 70%

39.22 (6 points) What is the correlation across classes of the loss elimination ratio at 500 and the loss elimination ratio at 200?
A. Less than 96%
B. At least 96%, but less than 97%
C. At least 97%, but less than 98%
D. At least 98%, but less than 99%
E. At least 99%
39.23 (4B, 5/93, Q.19) (2 points) You are given the following:
• The amount of an individual claim has an exponential distribution given by: p(y) = (1/µ) e^(-y/µ), y > 0, µ > 0.
• The parameter µ has a probability density function given by: f(µ) = (400/µ^3) e^(-20/µ), µ > 0.
Determine the mean of the claim severity distribution.
A. 10  B. 20  C. 200  D. 2000  E. 4000
39.24 (4B, 11/93, Q.26) (3 points) You are given the following:
• The amount of an individual claim, Y, follows an exponential distribution with probability density function f(y | δ) = (1/δ) e^(-y/δ), y, δ > 0.
• The conditional mean and variance of Y given δ are E[Y | δ] = δ and Var[Y | δ] = δ^2.
• The mean claim amount, δ, follows an Inverse Gamma distribution with density function p(δ) = 4 e^(-2/δ) / δ^4, δ > 0.
Determine the unconditional density of Y at y = 3.
A. Less than 0.01
B. At least 0.01, but less than 0.02
C. At least 0.02, but less than 0.04
D. At least 0.04, but less than 0.08
E. At least 0.08

39.25 (3, 5/00, Q.17) (2.5 points) The future lifetimes of a certain population can be modeled as follows:
(i) Each individual's future lifetime is exponentially distributed with constant hazard rate θ.
(ii) Over the population, θ is uniformly distributed over (1, 11).
Calculate the probability of surviving to time 0.5, for an individual randomly selected at time 0.
(A) 0.05  (B) 0.06  (C) 0.09  (D) 0.11  (E) 0.12

39.26 (3, 5/01, Q.28) (2.5 points) For a population of individuals, you are given:
(i) Each individual has a constant force of mortality.
(ii) The forces of mortality are uniformly distributed over the interval (0, 2).
Calculate the probability that an individual drawn at random from this population dies within one year.
(A) 0.37  (B) 0.43  (C) 0.50  (D) 0.57  (E) 0.63

39.27 (SOA M, 5/05, Q.10 & 2009 Sample Q.163) The scores on the final exam in Ms. Bʼs Latin class have a normal distribution with mean θ and standard deviation equal to 8. θ is a random variable with a normal distribution with mean equal to 75 and standard deviation equal to 6.
Each year, Ms. B chooses a student at random and pays the student 1 times the studentʼs score. However, if the student fails the exam (score ≤ 65), then there is no payment.
Calculate the conditional probability that the payment is less than 90, given that there is a payment.
(A) 0.77  (B) 0.85  (C) 0.88  (D) 0.92  (E) 1.00
39.28 (SOA M, 11/05, Q.17 & 2009 Sample Q.204) (2.5 points) The length of time, in years, that a person will remember an actuarial statistic is modeled by an exponential distribution with mean 1/Y. In a certain population, Y has a gamma distribution with α = θ = 2.
Calculate the probability that a person drawn at random from this population will remember an actuarial statistic less than 1/2 year.
(A) 0.125  (B) 0.250  (C) 0.500  (D) 0.750  (E) 0.875

39.29 (SOA M, 11/05, Q.20) (2.5 points) For a group of lives age x, you are given:
(i) Each member of the group has a constant force of mortality that is drawn from the uniform distribution on [0.01, 0.02].
(ii) δ = 0.01.
For a member selected at random from this group, calculate the actuarial present value of a continuous lifetime annuity of 1 per year.
(A) 40.0  (B) 40.5  (C) 41.1  (D) 41.7  (E) 42.3
Solutions to Problems:

39.1. E. The conditional probability of a loss of size 400 given δ is: e^(-400/δ)/δ.
The unconditional probability can be obtained by integrating the conditional probabilities versus the distribution of δ:
f(400) = ∫_0^∞ f(400 | δ) g(δ) dδ = ∫_0^∞ {e^(-400/δ)/δ} θ^α δ^(-(α+1)) e^(-θ/δ)/Γ(α) dδ
= {θ^α/Γ(α)} ∫_0^∞ δ^(-(α+2)) e^(-(400+θ)/δ) dδ = {θ^α/Γ(α)} Γ(α+1)/(400 + θ)^(α+1) = α θ^α/(θ + 400)^(α+1).
39.2. A. The conditional mean given δ is: δ. The unconditional mean can be obtained by integrating the conditional means versus the distribution of δ:
E[X] = ∫_0^∞ E[X | δ] g(δ) dδ = ∫_0^∞ δ θ^α δ^(-(α+1)) e^(-θ/δ)/Γ(α) dδ = {θ^α/Γ(α)} ∫_0^∞ δ^(-α) e^(-θ/δ) dδ
= {θ^α/Γ(α)} Γ(α-1)/θ^(α-1) = θ/(α-1).
Comment: The mean of a Pareto Distribution; the mixed distribution is a Pareto with scale parameter θ and shape parameter α.
39.3. D. The conditional mean given δ is δ. The conditional variance given δ is δ^2. Thus the conditional second moment given δ is: δ^2 + δ^2 = 2δ^2.
The unconditional second moment can be obtained by integrating the conditional second moments versus the distribution of δ:
E[X^2] = ∫_0^∞ E[X^2 | δ] g(δ) dδ = ∫_0^∞ {2δ^2} θ^α δ^(-(α+1)) e^(-θ/δ)/Γ(α) dδ = {2θ^α/Γ(α)} ∫_0^∞ δ^(-(α-1)) e^(-θ/δ) dδ
= {2θ^α/Γ(α)} Γ(α-2)/θ^(α-2) = 2θ^2/{(α-1)(α-2)}.
Since the mean is θ/(α-1), the variance is: 2θ^2/{(α-1)(α-2)} - θ^2/(α-1)^2 = (θ^2/{(α-1)^2 (α-2)}){2(α-1) - (α-2)} = θ^2 α/{(α-2)(α-1)^2}.
Comment: The variance of a Pareto Distribution.
39.4. A. The Weibull has density τ(x/θ)^τ exp(-(x/θ)^τ)/x = x^(-2/3) θ^(-1/3) exp(-x^(1/3)/θ^(1/3))/3.
The density of the mixed distribution is obtained by integrating the Weibull density times g(θ):
∫_0^∞ f(x) g(θ) dθ = ∫_0^∞ {x^(-2/3) θ^(-1/3) exp(-x^(1/3)/θ^(1/3))/3} 7^2.5 (1/3) exp(-7/θ^(1/3))/{θ^(11/6) Γ(2.5)} dθ
= {7^2.5 x^(-2/3)/(9 Γ(2.5))} ∫_0^∞ θ^(-13/6) exp(-(7 + x^(1/3))/θ^(1/3)) dθ.
Make the change of variables y = (7 + x^(1/3))/θ^(1/3), so that θ = (7 + x^(1/3))^3 y^(-3) and dθ = -3(7 + x^(1/3))^3 y^(-4) dy:
= {7^2.5 x^(-2/3)/(3 Γ(2.5))} ∫_0^∞ (7 + x^(1/3))^(-13/2) y^(13/2) exp(-y) (7 + x^(1/3))^3 y^(-4) dy
= {7^2.5 x^(-2/3) (7 + x^(1/3))^(-7/2)/(3 Γ(2.5))} ∫_0^∞ y^(5/2) exp(-y) dy = {7^2.5 x^(-2/3) (7 + x^(1/3))^(-7/2)/(3 Γ(2.5))} Γ(3.5)
= (2.5)(1/3)(1/7) x^(-2/3) (1 + (x/343)^(1/3))^(-7/2).
This is the density of a Burr Distribution, αγ(x/θ)^γ (1 + (x/θ)^γ)^(-(α + 1))/x, with parameters: α = 2.5, θ = 343 = 7^3, and γ = 1/3.
Comment: In general, if one mixes a Weibull with τ = t fixed, with its scale parameter varying via an Inverse Transformed Gamma Distribution with parameters α, θ, and τ = t, then the mixed distribution is a Burr with parameters α, θ, and γ = t. This Inverse Transformed Gamma Distribution has parameters: α = 2.5, θ = 343 = 7^3, and τ = 1/3.
39.5. E. Although it is not obvious, this is the Exponential - Inverse Gamma. The mean of the exponential is δ = 1/λ, and δ follows an Inverse Gamma.
Since dλ/dδ = -1/δ^2, g(δ) = p(λ) |dλ/dδ| = (4/3) δ^(-4) e^(-2/δ)/δ^2 = (4/3) δ^(-6) e^(-2/δ).
This Inverse Gamma has parameters α = 5 and θ = 2. The (prior) marginal distribution is a Pareto, with α = 5 and θ = 2.
Therefore 1 - F(x) = {2/(2 + x)}^5. 1 - F(7) = (2/9)^5 = 0.00054.
Alternately, one can compute the unconditional distribution function at x = 7 via integration:
1 - F(7) = ∫_0^∞ {1 - F(7 | λ)} p(λ) dλ = ∫_0^∞ exp(-7λ) (4/3) λ^4 e^(-2λ) dλ = (4/3) ∫_0^∞ λ^4 e^(-9λ) dλ.
This is the same type of integral as the Gamma Function, thus: 1 - F(7) = (4/3) Γ(5)/9^5 = (4/3)(24)/9^5 = 0.00054.
Alternately, one can compute the unconditional distribution function at x via integration:
1 - F(x) = ∫_0^∞ {1 - F(x | λ)} p(λ) dλ = ∫_0^∞ exp(-xλ) (4/3) λ^4 e^(-2λ) dλ = (4/3) ∫_0^∞ λ^4 e^(-(2+x)λ) dλ.
This is the same type of integral as the Gamma Function, thus: 1 - F(x) = (4/3) Γ(5)/(2 + x)^5.
Therefore, 1 - F(7) = (4/3)(24)/9^5 = 32/9^5 = 0.00054.
Comment: If one recognizes this as a Pareto with scale parameter of 2 and shape parameter of 5, then one can determine the constant by looking in Appendix A of Loss Models without doing the Gamma integral.
39.6. D. A(x) = ∫_0^x a(t) dt = x^4. S(x) = Mλ[-A(x)].
The moment generating function of this Exponential Distribution is: M(t) = 1/(1 - 0.007t).
Thus S(x) = 1/(1 + 0.007x^4). Thus, S(6) = 1/{1 + 0.007(6^4)} = 9.9%.
Comment: See Exercise 5.13 in Loss Models. The mixture is Loglogistic.
For the Weibull as per Loss Models, h(x) = τ x^(τ-1) θ^(-τ). Therefore, we can put the Weibull for fixed τ into the form of a frailty model, h(x) = λ a(x), by taking a(x) = τ x^(τ-1) and λ = θ^(-τ). A(x) = x^τ.
Here τ = 4, a(x) = 4x^3, and λ = θ^(-4). Thus for a given value of theta or lambda, S(x | λ) = exp[-(x/θ)^4] = exp[-λx^4]. Therefore, S(6 | λ) = exp[-λ 6^4] = exp[-1296λ].
Thus, S(6) = ∫_0^∞ exp[-1296λ] exp[-λ/0.007]/0.007 dλ = ∫_0^∞ exp[-1438.86λ] dλ/0.007 = (1/1438.86)/0.007 = 9.9%.

39.7. D. The hazard rate for an Exponential is one over its mean. Therefore, the survival function is S(t; λ) = e^(-λt). Mixing over the different values of λ:
S(t) = ∫_0.01^0.11 S(t; λ) f(λ) dλ = ∫_0.01^0.11 e^(-tλ) (1/0.1) dλ = (-10/t) e^(-tλ) ]_0.01^0.11 = (10/t)(e^(-0.01t) - e^(-0.11t)).
S(3) = (10/3)(e^(-0.03) - e^(-0.33)) = 0.8384. S(5) = (10/5)(e^(-0.05) - e^(-0.55)) = 0.7486.
The number of deaths expected between time 3 and time 5 is: (1000){S(3) - S(5)} = (1000)(0.8384 - 0.7486) = 89.8.
Comment: Similar to 3, 5/00, Q.17.
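The uniform-mixing integral in 39.7 can also be checked numerically. The following is a minimal sketch, not part of the study guide, assuming Python with SciPy is available; it simply averages the conditional survival functions over λ.

```python
import math
from scipy.integrate import quad

def S(t):
    # Mixture survival function: average of exp(-lam * t) over lam ~ Uniform(0.01, 0.11)
    value, _ = quad(lambda lam: math.exp(-lam * t) / 0.10, 0.01, 0.11)
    return value

expected_deaths = 1000 * (S(3) - S(5))
print(round(S(3), 4), round(S(5), 4), round(expected_deaths, 1))  # ~0.8384, ~0.7486, ~89.8
```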
39.8. E. For this Inverse Gamma-Exponential, the mixed distribution is a Pareto with α = 3 and θ = 25,000.
S(20000) = {25000/(25000 + 20000)}^3 = (5/9)^3 = 17.1%.

39.9. D. If the severity is Normal with fixed variance s^2, and the mixing distribution of their means is also Normal with mean µ and variance σ^2, then the mixed distribution is another Normal, with mean µ and variance: s^2 + σ^2.
In this case, the mixed distribution is Normal with mean 135 and variance: 15^2 + 10^2 = 325.
Prob[145 ≤ score ≤ 155] = Φ[(155 - 135)/√325] - Φ[(145 - 135)/√325] = Φ[1.109] - Φ[0.555] = 0.8663 - 0.7106 = 0.156.

39.10. A. λ is the hazard rate of each Exponential, one over the mean. We are mixing λ via a Gamma. Therefore, the mixed distribution is Pareto with parameters α = 3 and θ = 1/0.01 = 100. (This is mathematically the same as the Inverse Gamma-Exponential.)
Thus, S(250) = {100/(100 + 250)}^3 = 2.3%.
Alternately, h(x | λ) = λ a(x). For the Exponential, h(x) = λ. Thus, this is a frailty model with a(x) = 1 and A(x) = x. S(x) = Mλ[-A(x)].
The moment generating function of this Gamma Distribution is: M(t) = {1/(1 - 0.01t)}^3.
Thus S(x) = {1/(1 + 0.01x)}^3 = {100/(100 + x)}^3. Thus, S(250) = {100/(100 + 250)}^3 = 2.3%.
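As a numerical cross-check of this Gamma-Exponential (frailty) result, one can mix the conditional survival function over the Gamma directly. This sketch is not from the study guide; it assumes Python with SciPy.

```python
import math
from scipy.integrate import quad
from scipy.stats import gamma

alpha, theta = 3, 0.01   # Gamma distribution of the hazard rate lambda
x = 250

# Mixture survival: integrate exp(-lambda * x) against the Gamma density of lambda
mixed, _ = quad(lambda lam: math.exp(-lam * x) * gamma.pdf(lam, a=alpha, scale=theta), 0, math.inf)
pareto = (1 / (1 + theta * x)) ** alpha    # closed form: Pareto with alpha = 3, theta = 100
print(round(mixed, 4), round(pareto, 4))   # both ~0.0233
```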
39.11. A. The Inverse Weibull has density: 4 x^(-5) q^4 exp(-(q/x)^4).
The density of q is that of the Transformed Gamma Distribution: τ(q/θ)^(τα) exp(-(q/θ)^τ)/{q Γ(α)} = 4 q^4.2 11^(-5.2) exp(-(q/11)^4)/Γ(1.3).
The density of the mixed distribution is obtained by integrating the Inverse Weibull density times the density of q:
∫_0^∞ 4 x^(-5) q^4 exp(-(q/x)^4) 4 q^4.2 11^(-5.2) exp(-(q/11)^4)/Γ(1.3) dq = {(16) 11^(-5.2) x^(-5)/Γ(1.3)} ∫_0^∞ q^8.2 exp(-q^4 (11^(-4) + x^(-4))) dq.
Make the change of variables y = q^4 (11^(-4) + x^(-4)), so that q = (11^(-4) + x^(-4))^(-1/4) y^(1/4) and dq = (1/4)(11^(-4) + x^(-4))^(-1/4) y^(-3/4) dy:
= {(16) 11^(-5.2) x^(-5)/Γ(1.3)} ∫_0^∞ (11^(-4) + x^(-4))^(-2.05) y^2.05 exp(-y) (1/4)(11^(-4) + x^(-4))^(-1/4) y^(-3/4) dy
= {(4) 11^(-5.2) x^(-5) (11^(-4) + x^(-4))^(-2.3)/Γ(1.3)} ∫_0^∞ y^1.3 exp(-y) dy
= {(4) 11^(-5.2) x^(-5) (11^(-4) + x^(-4))^(-2.3)/Γ(1.3)} Γ(2.3) = (4)(1.3) 11^(-5.2) x^(-5) (11^(-4) + x^(-4))^(-2.3)
= (4)(1.3) 11^(-5.2) x^(-5) x^9.2 (1 + (x/11)^4)^(-2.3) = (4)(1.3) (x/11)^5.2 (1 + (x/11)^4)^(-2.3)/x.
This is the density of an Inverse Burr Distribution, τγ(x/θ)^(γτ) (1 + (x/θ)^γ)^(-(τ + 1))/x, with parameters τ = 1.3, θ = 11, and γ = 4.
Therefore, the mixed distribution is: F(x) = {(x/θ)^γ/(1 + (x/θ)^γ)}^τ = {1 + (11/x)^4}^(-1.3).
F(10) = {1 + (11/10)^4}^(-1.3) = 31.0%.
Comment: In general, if one mixes an Inverse Weibull with τ = t fixed, with its scale parameter varying via a Transformed Gamma Distribution with parameters α, θ, and τ = t, then the mixed distribution is an Inverse Burr with parameters τ = α, θ, and γ = t.
For each Inverse Weibull, F(10) = exp(-(q/10)^4). One could instead average F(10) for each Inverse Weibull, in order to get F(10) for the mixed distribution:
∫_0^∞ exp(-(q/10)^4) 4 q^4.2 11^(-5.2) exp(-(q/11)^4)/Γ(1.3) dq = {4 (11^(-5.2))/Γ(1.3)} ∫_0^∞ exp(-0.0001683 q^4) q^4.2 dq
= {(11^(-5.2))/Γ(1.3)} ∫_0^∞ exp(-0.0001683 y) y^0.3 dy = {(11^(-5.2))/Γ(1.3)} Γ(1.3) (0.0001683)^(-1.3) = 31.0%, where y = q^4.
39.12. B. Let the inflation factor be y. Then given y, in the year 2007 the losses have an Exponential Distribution with mean 3000y. Let z = 3000y. Then since y follows an Inverse Gamma with parameters α = 4 and scale parameter θ = 3.5, z follows an Inverse Gamma with parameters α = 4 and θ = (3000)(3.5) = 10,500. Thus in the year 2007, we have a mixture of Exponentials each with mean z, with z following an Inverse Gamma. This is the (same mathematics as the) Inverse Gamma-Exponential. For the Inverse Gamma-Exponential the mixed distribution is a Pareto, with α = shape parameter of the Inverse Gamma and θ = scale parameter of the Inverse Gamma. In this case the mixed distribution is a Pareto with α = 4 and θ = 10,500.
For this Pareto, S(5500) = {1 + (5500/10500)}^(-4) = 18.5%.
Comment: This is an example of “parameter uncertainty.” We assume that the loss distribution in year 2007 will also be an Exponential; we just are currently uncertain of its parameter.

39.13. B. h(x | λ) = λ a(x). For an Exponential, a(x) = 1, and A(x) = ∫_0^x a(t) dt = x.
The moment generating function of this Inverse Gaussian Distribution is:
M(t) = exp[(θ/µ)(1 - √(1 - 2tµ^2/θ))] = exp[(1/3)(1 - √(1 - 0.09t))].
S(x) = Mλ[-A(x)] = exp[(1/3)(1 - √(1 + 0.09x))].
Thus, S(65) = exp[(1/3)(1 - √(1 + (0.09)(65)))] = 58%.
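The Inverse Gaussian frailty answer can also be verified by mixing exp(-65λ) over the Inverse Gaussian numerically. The sketch below is not part of the guide; it assumes Python with SciPy and writes out the Inverse Gaussian density in the (µ, θ) parameterization used above.

```python
import math
from scipy.integrate import quad

mu, theta = 0.015, 0.005
x = 65

def ig_pdf(lam):
    # Inverse Gaussian density: sqrt(theta / (2 pi lam^3)) exp(-theta (lam - mu)^2 / (2 mu^2 lam))
    if lam <= 0:
        return 0.0
    return math.sqrt(theta / (2 * math.pi * lam ** 3)) * math.exp(-theta * (lam - mu) ** 2 / (2 * mu ** 2 * lam))

mixed, _ = quad(lambda lam: math.exp(-x * lam) * ig_pdf(lam), 0, math.inf)
closed = math.exp((theta / mu) * (1 - math.sqrt(1 + 2 * mu ** 2 * x / theta)))  # S(x) = M[-x]
print(round(mixed, 4), round(closed, 4))   # both ~0.583
```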
39.14. B. From the previous solution, S(x) = exp[(1/3)(1 - √(1 + 0.09x))].
S(40) = exp[(1/3)(1 - √(1 + (0.09)(40)))] = 0.683.
Differentiating, f(x) = exp[(1/3)(1 - √(1 + 0.09x))] (1/3)(0.09)(1/2)/√(1 + 0.09x).
f(40) = exp[(1/3)(1 - √(1 + (0.09)(40)))] 0.015/√(1 + (0.09)(40)) = 0.00478.
h(40) = f(40)/S(40) = 0.00478/0.683 = 0.0070.
Alternately, for a frailty model, h(x) = a(x) {d ln Mλ(t)/dt, evaluated at t = -A(x)}.
M(t) = exp[(θ/µ)(1 - √(1 - 2tµ^2/θ))] = exp[(1/3)(1 - √(1 - 0.09t))]. ln M(t) = (1/3)(1 - √(1 - 0.09t)).
d ln M(t)/dt = (1/3)(0.09)(1/2)/√(1 - 0.09t).
h(x | λ) = λ a(x). For an Exponential, a(x) = 1, and A(x) = ∫_0^x a(t) dt = x.
h(x) = (1)(1/3)(0.09)(1/2)/√(1 + 0.09x). h(40) = 0.015/√(1 + (0.09)(40)) = 0.0070.
39.15. C. For the Inverse Gamma, f(x | q) = q^α e^(-q/x)/{Γ[α] x^(α+1)} = q^4 e^(-q/x)/{6x^5}. For the Exponential, u(q) = e^(-q/100)/100.
f(x) = ∫_0^∞ f(x | q) u(q) dq = ∫_0^∞ {q^4 e^(-q/x)/(6x^5)} {e^(-q/100)/100} dq = ∫_0^∞ q^4 e^(-q(1/x + 1/100)) dq/(600x^5)
= {Γ[5]/(1/x + 1/100)^5}/(600x^5) = {(24) 100^5 x^5/(x + 100)^5}/(600x^5) = (4) 100^4/(x + 100)^5.
This is the density of a Pareto Distribution with parameters α = 4 and θ = 100.
Therefore, F(x) = 1 - {θ/(x + θ)}^α = 1 - {100/(x + 100)}^4. S(15) = (100/115)^4 = 57.2%.
Comment: An example of an Exponential-Inverse Gamma.
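This mixture can also be checked by numerical integration. A minimal sketch, not from the study guide, assuming Python with SciPy (scipy.stats.invgamma uses the same shape/scale convention as the Inverse Gamma here):

```python
from scipy.integrate import quad
from scipy.stats import invgamma, expon

x = 15
# Mix the Inverse Gamma (alpha = 4, scale = q) survival at 15 over q ~ Exponential(mean 100)
mixed, _ = quad(lambda q: invgamma.sf(x, a=4, scale=q) * expon.pdf(q, scale=100), 0, float("inf"))
pareto_sf = (100 / (100 + x)) ** 4        # closed form: Pareto with alpha = 4, theta = 100
print(round(mixed, 4), round(pareto_sf, 4))   # both ~0.572
```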
39.16. For a Normal Distribution with mean zero and variance v: f(x | v) = exp[-x^2/(2v)]/√(2πv).
An Inverse Gamma with α = 10 and θ = 10 has density: g(v) = 10^10 e^(-10/v) v^(-11)/Γ(10), v > 0.
The mixed distribution has density:
∫_0^∞ {exp[-x^2/(2v)]/√(2πv)} 10^10 e^(-10/v) v^(-11)/Γ(10) dv = {10^10/(9! √(2π))} ∫_0^∞ exp[-(10 + x^2/2)/v] v^(-11.5) dv
= {10^10/(Γ(10) √(2π))} Γ(10.5)/(10 + x^2/2)^10.5 = (1 + x^2/20)^(-10.5) Γ(10.5)/{Γ(10) Γ(1/2) √20}.
This is a Studentʼs t distribution with 20 degrees of freedom.
Comment: Difficult! Since the Inverse Gamma Density integrates to one over its support, ∫_0^∞ exp[-θ/x] x^(-(α+1)) dx = Γ(α)/θ^α. Also, Γ(1/2) = √π.
A Studentʼs t distribution with ν degrees of freedom has density: f(x) = 1/{(1 + x^2/ν)^((ν+1)/2) β[ν/2, 1/2] ν^0.5}, where β[ν/2, 1/2] = Γ(1/2) Γ(ν/2)/Γ((ν+1)/2).

39.17. D. The mixed distribution is a Pareto with shape parameter α = 3 and scale parameter θ = 20, with variance: (2)(20^2)/{(3-1)(3-2)} - {20/(3-1)}^2 = 400 - 100 = 300.
Alternately, Var[X | δ] = Variance[Exponential Distribution with mean δ] = δ^2. f(δ) is an Inverse Gamma Distribution, with θ = 20 and α = 3.
E[Var[X | δ]] = E[δ^2] = second moment of the Inverse Gamma = 20^2/{(3-1)(3-2)} = 200.
Var[E[X | δ]] = Var[δ] = variance of the Inverse Gamma = second moment of the Inverse Gamma - (mean of the Inverse Gamma)^2 = 200 - {20/(3-1)}^2 = 100.
Var[X] = E[Var[X | δ]] + Var[E[X | δ]] = 200 + 100 = 300.
39.18. E. A(x) = ∫_0^x a(t) dt = √(1 + 0.008x) - 1.
The moment generating function of a Gamma Distribution with α = 6 and θ = 1 is: M(t) = {1/(1 - t)}^6.
S(x) = Mλ[-A(x)] = {1/(1 + A(x))}^6 = {1/√(1 + 0.008x)}^6 = {1/(1 + 0.008x)}^3.
Thus, S(11) = {1/(1 + (0.008)(11))}^3 = 77.6%.
Comment: See Exercise 5.14 in Loss Models. The mixture is a Pareto Distribution with α = 3, and θ = 1/0.008 = 125.

39.19. D. Let the inflation factor be y. Then given y, in the next year the losses have an Exponential Distribution with mean 8000y. Let z = 8000y. Then since y follows an Inverse Gamma with parameters α = 2.5 and scale parameter θ = 1.6, z follows an Inverse Gamma with parameters α = 2.5 and θ = (8000)(1.6) = 12,800. Thus next year, we have a mixture of Exponentials each with mean z, with z following an Inverse Gamma. This is the (same mathematics as the) Inverse Gamma-Exponential. For the Inverse Gamma-Exponential the mixed distribution is a Pareto, with α = shape parameter of the Inverse Gamma, and θ = scale parameter of the Inverse Gamma. In this case the mixed distribution is a Pareto with α = 2.5 and θ = 12,800.
For this Pareto Distribution, S(10,000) = {1 + (10000/12800)}^(-2.5) = 23.6%.

39.20. B. ln[x] follows a Normal with parameters µ and 0.3. Therefore, we are mixing a Normal with fixed variance via another Normal. Therefore, the mixture of ln[x] is Normal with parameters 5 and √(0.3^2 + 0.4^2) = 0.5. Thus the mixture of x is LogNormal with parameters 5 and 0.5.
S(200) = 1 - Φ[{ln(200) - 5}/0.5] = 1 - Φ[0.60] = 27.43%.
39.21. E. & 39.22. B. For the Exponential, the loss elimination ratio is equal to the distribution function: LER(x) = 1 - e^(-x/µ).
The mixed distribution of the size of loss is Pareto with the same parameters, α = 3 and θ = 1000.
Eµ[LER(x)] = Eµ[1 - e^(-x/µ)] = Eµ[F(x; µ)] = ∫ π[µ] F(x; µ) dµ = distribution function of the mixture = distribution function of the Pareto = 1 - {1000/(x + 1000)}^3.
Eµ[LER(500)] = 1 - (10/15)^3 = 0.70370. Eµ[LER(200)] = 1 - (10/12)^3 = 0.42130.
Eµ[LER(x) LER(y)] = Eµ[(1 - e^(-x/µ))(1 - e^(-y/µ))] = Eµ[1 - e^(-x/µ) - e^(-y/µ) + e^(-(x+y)/µ)]
= 1 - Eµ[S(x; µ)] - Eµ[S(y; µ)] + Eµ[S(x + y; µ)]
= 1 - {1000/(x + 1000)}^3 - {1000/(y + 1000)}^3 + {1000/(x + y + 1000)}^3.
Eµ[LER(200) LER(500)] = 1 - (10/12)^3 - (10/15)^3 + (10/17)^3 = 0.32854.
Covµ[LER(200), LER(500)] = Eµ[LER(200) LER(500)] - Eµ[LER(200)] Eµ[LER(500)] = 0.32854 - (0.42130)(0.70370) = 0.03207.
Eµ[LER(x)^2] = Eµ[LER(x) LER(x)] = 1 - 2{1000/(x + 1000)}^3 + {1000/(2x + 1000)}^3.
Eµ[LER(200)^2] = 1 - (2)(10/12)^3 + (10/14)^3 = 0.20702. Varµ[LER(200)] = 0.20702 - 0.42130^2 = 0.02953.
Eµ[LER(500)^2] = 1 - (2)(10/15)^3 + (10/20)^3 = 0.53241. Varµ[LER(500)] = 0.53241 - 0.70370^2 = 0.03722.
Corrµ[LER(200), LER(500)] = 0.03207/√{(0.02953)(0.03722)} = 96.73%.
Comment: The loss elimination ratios for deductibles of somewhat similar sizes are highly correlated across classes. For a practical example for excess ratios, see Tables 3 and 4 in “NCCIʼs 2007 Hazard Group Mapping,” by John P. Robertson, Variance, Vol. 3, Issue 2, 2009, not on the syllabus of this exam.
39.23. B. f(µ) is an Inverse Gamma Distribution, with θ = 20 and α = 2. p(y) is an Exponential Distribution with E[Y | µ] = µ.
Therefore the mean severity = Eµ[E[Y | µ]] = Eµ[µ] = ∫ µ f(µ) dµ = mean of the Inverse Gamma = θ/(α - 1) = 20/(2 - 1) = 20.
Alternately, the mixed distribution is a Pareto with shape parameter α = 2 and scale parameter θ = 20. Therefore this Pareto has mean 20/(2 - 1) = 20.
Comment: One can do the relevant integral via the substitution x = 1/µ, dx = -dµ/µ^2:
∫ µ f(µ) dµ = ∫ µ (400/µ^3) e^(-20/µ) dµ = 400 ∫ e^(-20/µ) dµ/µ^2 = 400 ∫ e^(-20x) dx = 400/20 = 20.
39.24. C. This is an Exponential mixed via an Inverse Gamma. The Inverse Gamma has parameters α = 3 and θ = 2.
Therefore the (prior) mixed distribution is a Pareto, with α = 3 and θ = 2. Thus f(x) = (3)(2^3)(2 + x)^(-4). f(3) = (3)(8)/5^4 = 0.0384.
Alternately, one can compute the unconditional density at y = 3 via integration:
f(3) = ∫_0^∞ f(3 | δ) p(δ) dδ = ∫_0^∞ (1/δ) exp(-3/δ) (4/δ^4) exp(-2/δ) dδ = ∫_0^∞ 4 δ^(-5) exp(-5/δ) dδ.
Let x = 5/δ and dx = (-5/δ^2) dδ in the integral:
f(3) = (4/5^4) ∫_0^∞ x^3 exp(-x) dx = (4/625) Γ(4) = (4/625)(3!) = (4/625)(6) = 0.0384.
Alternately, one can compute the unconditional density at y via integration:
f(y) = ∫_0^∞ (1/δ) exp(-y/δ) (4/δ^4) exp(-2/δ) dδ = ∫_0^∞ 4 δ^(-5) exp(-(2 + y)/δ) dδ.
Let x = (2 + y)/δ and dx = -((2 + y)/δ^2) dδ in the integral:
f(y) = {4/(2 + y)^4} ∫_0^∞ x^3 exp(-x) dx = {4/(2 + y)^4} Γ(4) = {4/(2 + y)^4}(3!) = 24(2 + y)^(-4). f(3) = 0.0384.
Comment: If one recognizes this as a Pareto with θ = 2 and α = 3, then one can determine the constant by looking in Appendix A of Loss Models, rather than doing the Gamma integral.
39.25. E. The hazard rate for an Exponential is one over its mean. The mean is 1/θ, not θ. The survival function is S(t; θ) = e^(-θt). S(0.5; θ) = e^(-0.5θ). Mixing over the different values of θ:
S(0.5) = ∫_1^11 S(0.5; θ) f(θ) dθ = ∫_1^11 e^(-0.5θ) (1/10) dθ = (-1/5) e^(-0.5θ) ]_1^11 = (e^(-0.5) - e^(-5.5))/5 = (0.607 - 0.004)/5 = 0.12.
Comment: The mean future lifetime given θ is 1/θ. The overall mean future lifetime is:
∫_1^11 (1/θ) f(θ) dθ = ∫_1^11 (1/θ)(1/10) dθ = ln(θ)/10 ]_1^11 = 0.24.

39.26. D. For a constant force of mortality, λ, the distribution function is Exponential: F(t | λ) = 1 - e^(-λt). F(1 | λ) = 1 - e^(-λ).
The forces of mortality are uniformly distributed over the interval (0, 2). ⇒ π(λ) = 1/2, 0 ≤ λ ≤ 2. Taking the average over the values of λ:
F(1) = ∫_0^2 F(1 | λ) π(λ) dλ = ∫_0^2 (1 - e^(-λ))/2 dλ = 1 - (1 - e^(-2))(1/2) = 0.568.
Alternately, one can work with the means θ = 1/λ, which is harder. λ is uniform from 0 to 2.
⇒ The distribution function of λ is: Fλ(λ) = λ/2, 0 ≤ λ ≤ 2.
⇒ The distribution function of θ is: Fθ(θ) = 1 - Fλ(1/θ) = 1 - 1/(2θ), 1/2 ≤ θ < ∞.
⇒ The density function of θ is: 1/(2θ^2), 1/2 ≤ θ < ∞.
Given θ, the probability of death by time 1 is: 1 - e^(-1/θ). Taking the average over the values of θ:
F(1) = ∫_1/2^∞ (1 - e^(-1/θ))/(2θ^2) dθ = 1 - ∫_1/2^∞ e^(-1/θ)/(2θ^2) dθ = 1 - (1/2) e^(-1/θ) ]_1/2^∞ = 1 - (1 - e^(-2))/2 = 0.568.
Comment: F(1 | λ = 1) = 1 - e^(-1) = 0.632. Thus choices A and B are unlikely to be correct.
39.27. D. If the severity is Normal with fixed variance s^2, and the mixing distribution of their means is also Normal with mean µ and variance σ^2, then the mixed distribution is another Normal, with mean µ and variance: s^2 + σ^2.
In this case, the mixed distribution is Normal with mean 75 and variance: 8^2 + 6^2 = 100.
Prob[there is a payment] = Prob[Score > 65] = 1 - Φ[(65 - 75)/10] = 1 - Φ[-1] = 0.8413.
Prob[90 > Score > 65] = Φ[(90 - 75)/10] - Φ[(65 - 75)/10] = Φ[1.5] - Φ[-1] = 0.9332 - 0.1587 = 0.7745.
Prob[payment < 90 | payment > 0] = Prob[90 > Score > 65 | Score > 65] = Prob[90 > Score > 65]/Prob[Score > 65] = 0.7745/0.8413 = 0.9206.

39.28. D. Y is Gamma with α = θ = 2. Therefore, F(y) = Γ[2; y/2]. Let the mean of each Exponential Distribution be δ = 1/y.
Then F(δ) = 1 - Γ[2; (1/2)/δ]. Therefore, δ has an Inverse Gamma Distribution with α = 2 and θ = 1/2.
This is an Inverse Gamma - Exponential with mixed distribution a Pareto with α = 2 and θ = 1/2.
F(x) = 1 - {θ/(x + θ)}^α = 1 - {0.5/(x + 0.5)}^2. F(1/2) = 1 - (0.5/1)^2 = 0.75.
Alternately, Prob[T > t | y] = e^(-yt). Prob[T > t] = ∫ e^(-yt) f(y) dy = MY[-t].
The moment generating function of a Gamma Distribution is: 1/(1 - θt)^α. Therefore, the moment generating function of Y is: 1/(1 - 2t)^2.
Prob[T > 1/2] = MY[-1/2] = 1/2^2 = 1/4. Prob[T < 1/2] = 1 - 1/4 = 3/4.
Alternately, f(y) = y e^(-y/2)/(Γ(2) 2^2) = y e^(-y/2)/4. Therefore, the mixed distribution is:
F(x) = ∫_0^∞ (1 - e^(-xy)) y e^(-y/2)/4 dy = 1 - (1/4) ∫_0^∞ y e^(-y(x + 0.5)) dy = 1 - 0.25/(x + 0.5)^2.
F(1/2) = 1 - 0.25 = 0.75.
Alternately, the length of time until the forgetting is analogous to the time until the first claim. This time is Exponential with mean 1/Y and is mathematically the same as a Poisson Process with intensity Y. Since Y has a Gamma Distribution, this is mathematically the same as a Gamma-Poisson. Remembering less than 1/2 year is analogous to at least one claim by time 1/2. Over 1/2 year, Y has a Gamma Distribution with α = 2 and instead θ = 2/2 = 1. The mixed distribution is Negative Binomial, with r = α = 2 and β = θ = 1. 1 - f(0) = 1 - 1/(1 + 1)^2 = 3/4.
39.29. B. The present value of a continuous annuity of length t is: (1 - e^(-δt))/δ. Given constant force of mortality λ, the lifetimes are exponential with density f(t) = λ e^(-λt).
For fixed λ, APV = ∫_0^∞ {(1 - e^(-δt))/δ} λ e^(-λt) dt = {1 - λ/(λ + δ)}/δ = 1/(λ + δ) = 1/(λ + 0.01).
λ in turn is uniform from 0.01 to 0.02 with density 100.
Mixing over λ, Actuarial Present Value = ∫_0.01^0.02 {1/(λ + 0.01)} 100 dλ = 100 ln(3/2) = 40.55.
Section 40, Spliced Models332

A spliced model allows one to have different behaviors for different sizes of loss. For example, as discussed below, one could splice together an Exponential Distribution for small losses and a Pareto Distribution for large losses. This would differ from a two-point mixture of an Exponential and Pareto, in which each distribution would contribute its density to all sizes of loss.

A Simple Example of a Splice:

Assume f(x) = 0.01 for 0 < x < 10, and f(x) = 0.009 e^0.1 e^(-x/100) for x > 10.
Exercise: Show that this f(x) is a density.
[Solution: f(x) ≥ 0.
∫_0^∞ f(x) dx = ∫_0^10 0.01 dx + ∫_10^∞ 0.009 e^0.1 e^(-x/100) dx = 0.1 + 0.9 e^0.1 e^(-10/100) = 0.1 + 0.9 = 1.]
[Graph of this density: constant at 0.010 on (0, 10), dropping to 0.009 at 10 and then declining exponentially.]
We note that this density is discontinuous at 10. This is an example of a 2-component spliced model. From 0 to 10 it is proportional to a uniform density and above 10 it is proportional to an Exponential density.
332 See Section 5.2.6 of Loss Models.
Two-Component Splices:

In general a 2-component spliced model would have density: f(x) = w1 f1(x) on (a1, b1) and f(x) = w2 f2(x) on (a2, b2), where f1(x) is a density with support (a1, b1), f2(x) is a density with support (a2, b2), and w1 + w2 = 1.
In the example, f1(x) = 1/10 on (0, 10), f2(x) = e^0.1 e^(-x/100)/100 on (10, ∞), w1 = 0.1, and w2 = 0.9.
f1 is the uniform distribution on (0, 10). f2 is proportional to an Exponential with θ = 100. In order to make f2 a density on (10, ∞), we have divided by S(10) = e^(-10/100) = e^(-0.1).333

A Splice of an Exponential and a Pareto:

Assume an Exponential with θ = 50. On the interval (0, 100) this would have probability F(100) = 1 - e^(-2) = 0.8647. In order to turn this into a density on (0, 100), we would divide this Exponential density by 0.8647: (e^(-x/50)/50)/0.8647 = 0.02313 e^(-x/50). This integrates to one from 0 to 100.
Assume a Pareto Distribution with α = 3 and θ = 200, with density (3)(200^3)/(200 + x)^4 = 0.015/(1 + x/200)^4. On the interval (100, ∞) this would have probability S(100) = {θ/(θ + x)}^α = (200/300)^3 = 8/27. In order to turn this into a density on (100, ∞), we would multiply by 27/8: (27/8){0.015/(1 + x/200)^4} = 0.050625/(1 + x/200)^4. This integrates to one from 100 to ∞.
So we would have f1(x) = 0.02313 e^(-x/50) on (0, 100), and f2(x) = 0.050625/(1 + x/200)^4 on (100, ∞).
We could use any weights w1 and w2 as long as they add to one, so that the spliced density will integrate to one. If we took for example, w1 = 70% and w2 = 30%, then the spliced density would be:
(0.7)(0.02313 e^(-x/50)) = 0.01619 e^(-x/50) on (0, 100), and (0.3)(0.050625/(1 + x/200)^4) = 0.0151875/(1 + x/200)^4 on (100, ∞).
333 This is how one alters the density in the case of truncation from below.
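A minimal numerical sketch of this construction, not part of the study guide and assuming Python with SciPy: build the Exponential[50] / Pareto[3, 200] splice with weights 70% and 30%, and confirm that it integrates to one.

```python
import math
from scipy.integrate import quad

def splice_pdf(x):
    if x < 100:
        # w1 times the Exponential(theta = 50) density restricted to (0, 100)
        return 0.7 * (math.exp(-x / 50) / 50) / (1 - math.exp(-2))
    # w2 times the Pareto(alpha = 3, theta = 200) density restricted to (100, infinity)
    return 0.3 * (3 * 200**3 / (200 + x)**4) / (200 / 300)**3

total = quad(splice_pdf, 0, 100)[0] + quad(splice_pdf, 100, math.inf)[0]
print(round(total, 6))                                        # 1.0
print(round(splice_pdf(50), 5), round(splice_pdf(150), 5))    # ~0.00596, ~0.00162
```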
This 2-component spliced density looks as follows:
[Graph of the spliced density, with a discontinuity at 100.]
It is not continuous at 100. This spliced density is:
0.01619 e^(-x/50) on (0, 100), and 0.0151875/(1 + x/200)^4 on (100, ∞)
⇔ (0.7)(0.02313 e^(-x/50)) on (0, 100), and (0.3)(0.050625/(1 + x/200)^4) on (100, ∞)
⇔ (0.7)(e^(-x/50)/50)/(1 - e^(-100/50)) on (0, 100), and (0.3){(3) 200^3/(200 + x)^4}/{(200/300)^3} on (100, ∞)
⇔ (0.8096)(Exponential[50]) on (0, 100), and (1.0125)(Pareto[3, 200]) on (100, ∞).
Exercise: What is the distribution function of this splice at 80?
[Solution: (0.8096)(Exponential Distribution function at 80) = (0.8096)(1 - e^(-80/50)) = 0.6461.]
Exercise: What is the survival function of the splice at 300?
[Solution: (1.0125)(Pareto Survival function at 300) = (1.0125){200/(200 + 300)}^3 = 0.0648.]
In general, it is easier to work with the distribution function of the first component of the splice below the breakpoint and the survival function of the second component above the breakpoint.
Note that at the breakpoint of 100, the distribution function of the splice is:
(0.8096)(Exponential Distribution function at 100) = (0.8096)(1 - e^(-100/50)) = 0.700
= 1 - (1.0125)(Pareto Survival function at 100) = 1 - (1.0125){200/(200 + 100)}^3.
Assume we had originally written the splice as:334
c1 Exponential[50] on (0, 100), and c2 Pareto[3, 200] on (100, ∞).
Since we chose weights of 70% & 30%, we want:
70% = ∫_0^100 c1 Exponential[50] dx = c1 (1 - e^(-100/50)). ⇒ c1 = 70%/(1 - e^(-100/50)) = 0.8096.
30% = ∫_100^∞ c2 Pareto[3, 200] dx = c2 {200/(200 + 100)}^3 = 0.29630 c2. ⇒ c2 = 30%/0.29630 = 1.0125.
Therefore, as shown previously, the spliced density can be written as:
(0.8096)(Exponential[50]) on (0, 100), and (1.0125)(Pareto[3, 200]) on (100, ∞).
[Graph of this spliced density, as above.]
334 As discussed, while this is mathematically equivalent, this is not the manner in which the splice would be written in Loss Models, which uses f1, f2, w1, and w2.
Continuity:

With appropriate values of the weights, a splice will be continuous at its breakpoint.
Exercise: Choose w1 and w2 so that the above spliced density would be continuous at 100.
[Solution: f1(100) = 0.02313 e^(-100/50) = 0.00313. f2(100) = 0.050625/(1 + 100/200)^4 = 0.01.
In order to be continuous at 100, we need w1 f1(100) = w2 f2(100) = (1 - w1) f2(100).
w1 = f2(100)/{f1(100) + f2(100)} = 0.01/(0.00313 + 0.01) = 0.762. w2 = 1 - w1 = 0.238.]
If we take f(x): (0.762)(0.02313 e^(-x/50)) = 0.01763 e^(-x/50) on (0, 100), and (0.238){0.050625/(1 + x/200)^4} = 0.01205/(1 + x/200)^4 on (100, ∞), then f(x) is continuous:
[Graph of the continuous spliced density.]
The density of an Exponential Distribution with mean 50 is: e^(-x/50)/50.
The density of a Pareto Distribution with α = 3 and θ = 200 is: (3)(200^3)/(200 + x)^4 = (3/200)/(1 + x/200)^4.
(0.01763)(50) = 0.8815. (0.01205)(200/3) = 0.8033.
Therefore, similar to the noncontinuous splice, we could rewrite this continuous splice as:
(0.8815)(Exponential[50]) on (0, 100), and (0.8033)(Pareto[3, 200]) on (100, ∞).
In general, a 2-component spliced density will be continuous at the breakpoint b, provided the weights are inversely proportional to the component densities at the breakpoint:
w1 = f2(b)/{f1(b) + f2(b)}, w2 = f1(b)/{f1(b) + f2(b)}.335
335 While this spliced density will be continuous at the breakpoint, it will not be differentiable at the breakpoint.
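A minimal sketch of computing the continuity weights for the Exponential[50] / Pareto[3, 200] splice at the breakpoint b = 100; this is not from the study guide and assumes plain Python.

```python
import math

b = 100
# Conditional component densities at the breakpoint
f1_b = (math.exp(-b / 50) / 50) / (1 - math.exp(-2))             # Exponential(50) restricted to (0, 100)
f2_b = (3 / 200) / (1 + b / 200) ** 4 / (200 / 300) ** 3         # Pareto(3, 200) restricted to (100, inf)

# Weights inversely proportional to the component densities at b
w1 = f2_b / (f1_b + f2_b)
w2 = 1 - w1
print(round(w1, 3), round(w2, 3))                   # ~0.762, ~0.238
print(round(w1 * f1_b, 5), round(w2 * f2_b, 5))     # equal, so the spliced density is continuous at b
```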
Moments:

One could compute the moments of a spliced density by integrating x^n f(x). For example, the mean of this continuous 2-component spliced density is:
∫_0^100 x 0.01763 e^(-x/50) dx + ∫_100^∞ x 0.01205/(1 + x/200)^4 dx.
Exercise: Compute the integral ∫_0^100 x 0.01763 e^(-x/50) dx.
[Solution: ∫_0^100 x 0.01763 e^(-x/50) dx = {-50 x (0.01763) e^(-x/50) - 50^2 (0.01763) e^(-x/50)} ]_0^100 = 26.18.
Alternately, as discussed previously, the first component of the continuous splice is: (0.8815) Exponential[50], on (0, 100).
Now ∫_0^100 x fexp(x) dx = E[X ∧ 100] - 100 Sexp(100) = 50(1 - e^(-100/50)) - 100 e^(-100/50) = 29.700.
Thus ∫_0^100 x 0.01763 e^(-x/50) dx = 0.8815 ∫_0^100 x fexp(x) dx = (0.8815)(29.700) = 26.18.
Comment: ∫ x e^(-x/θ) dx = -θ x e^(-x/θ) - θ^2 e^(-x/θ).]
Exercise: Compute the integral ∫_100^∞ x 0.01205/(1 + x/200)^4 dx.
[Solution: One can use integration by parts.
∫_100^∞ x 0.01205/(1 + x/200)^4 dx = 0.01205 {(-200/3) x/(1 + x/200)^3} ]_100^∞ + (200/3)(0.01205) ∫_100^∞ dx/(1 + x/200)^3
= (0.01205)(20,000/3){1/1.5^3 + 1/1.5^2} = 59.51.
Alternately, as discussed previously, the second component of the continuous splice is: (0.8033) Pareto[3, 200], on (100, ∞).
Now ∫_100^∞ x fPareto(x) dx = ∫_100^∞ (x - 100) fPareto(x) dx + 100 ∫_100^∞ fPareto(x) dx = ePareto(100) SPareto(100) + 100 SPareto(100)
= {(100 + 200)/(3 - 1) + 100} {200/(200 + 100)}^3 = 74.074.
Thus ∫_100^∞ x 0.01205/(1 + x/200)^4 dx = 0.8033 ∫_100^∞ x fPareto(x) dx = (0.8033)(74.074) = 59.51.
Comment: e(x) = ∫_x^∞ (t - x) f(t) dt / S(x). For a Pareto Distribution, e(x) = (x + θ)/(α - 1).]
Thus the mean of this continuous splice is:
∫_0^100 x 0.01763 e^(-x/50) dx + ∫_100^∞ x 0.01205/(1 + x/200)^4 dx = 26.18 + 59.51 = 85.69.
HCM 10/8/12,
Page 833
More generally, assume we have a splice which is w1 h1 (x) on (0, b) and w2 h2 (x) on (b, ∞), where h1 (x) = f1 (x) / F1 (b) and h2 (x) = f2 (x) / S2 (b). Then the mean of this spliced density is: ∞
b
∫0 x w1 h1(x) dx + ∫b
w1 x w2 h2 (x) dx = F1(b)
b
∫0
w1 w2 { E[X1 ∧ b] - bS1 (b)} + {E[X2 ] F1(b) S2 (b)
w2 x f1(x) dx + S2 (b)
∞
∫b x f2(x) dx =
b
∫0 x f2(x) dx } =
w1 w2 { E[X1 ∧ b] + bF1 (b) - b} + {E[X2 ] + bS2 (b) - E[X2 ∧ b]} = F1(b) S2 (b) w1 w2 { E[X1 ∧ b] - b} + bw1 + bw2 + {E[X2 ] - E[X2 ∧ b]} = F1(b) S2 (b)
b+
w1 w2 { E[X1 ∧ b] - b} + {E[X2 ] - E[X2 ∧ b]}. F1(b) S2 (b)
For the example of the continuous splice, b = 100, F1 (100) = 1 - e-100/50 = 0.8647, E[X1 ∧ b] = 50(1 - e-100/50) = 43.235,336 S 2 (b) = (200/300)3 = 8/27, E[X2 ] = 200/(3-1) = 100, E[X2 ∧ b] = 100(1 - (2/3)2 ) = 55.556.337 Therefore, for w1 = 0.762 and w2 = 0.238, the mean is: 100 + (0.762/0.8647)(43.235 - 100) + (0.238)(27/8)(100 - 55.556) = 85.68, matching the previous result subject to rounding. n-Component Splices: In addition to 2-component splices, one can have 3-components, 4-components, etc. In a three component splice there are three intervals and the spliced density is: w1 f1 (x) on (a1 , b1 ), w2 f2 (x) on (a2 , b2 ), and w3 f3 (x) on (a3 , b3 ), where f1 (x) is a density with support (a1 , b1 ), f2 (x) is a density with support (a2 , b2 ), f3 (x) is a density with support (a3 , b3 ), and w1 + w2 + w3 = 1. Previously, when working with grouped data we had discussed assuming a uniform distribution on each interval. This is an example of an n-component splice, with n equal to the number of intervals for the grouped data, and with each component of the splice uniform. 336 337
For the Exponential Distribution, E[X ∧ d] = θ(1 - e-d/θ).
For the Pareto Distribution, E[X ∧ d] = (θ/(α−1))(1 - (θ/(d+θ))α-1).
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 834
Using the Empirical Distribution: One common use of splicing, is to use the Empirical Distribution function (or a smoothed version of it) for small losses, and some parametric distribution to model large losses.338 For example, take the ungrouped data in Section 1. We could model the losses of size less than 100,000 using the Empirical Distribution Function, and use a Pareto Distribution to model the losses of size greater than 100,000. There are 57 out of 130 losses of size less than 100,000. Therefore, the Empirical Distribution Function at 100,000 is: 57/130 = .4385. A Pareto Distribution with α = 2 and θ = 298,977, has F(100000) = 1 - (298,977/398,977)2 = .4385, matching the Empirical Distribution Function. Thus one could splice together this Pareto Distribution from 100,000 to ∞, and the Empirical Distribution Function from 0 to 100,000. Here is what this spliced survival function looks like: 1.0
0.8
Empirical
0.6
0.4
Pareto
0.2
100000
338
500000
900000
A variation of this technique is used in “Workersʼ Compensation Excess Ratios, an Alternative Method,” by Howard C. Mahler, PCAS 1998.
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 835
Using a Kernel Smoothed Density:339 Rather than use the Empirical Distribution, one could use a kernel smoothed version of the Empirical Distribution. For example, one could splice together the same Pareto Distribution above 100,000, and below 100,000 the kernel smoothed density for the ungrouped data in Section 1, using a uniform kernel with a bandwidth of 5000. Here is one million times this spliced density: 10
8
Kernel Smoothed
6
4 Pareto 2
100000
339
500000
Kernel Smoothing is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
900000
2013-4-2,
Loss Distributions, §40 Splices
Problems: Use the following information for the next 11 questions: f(x) = 0.12 for x ≤ 5, and f(x) = 0.06595e-x/10 for x > 5. 40.1 (1 point) What is the Distribution Function at 10? A. less than 0.60 B. at least 0.60 but less than 0.65 C. at least 0.65 but less than 0.70 D. at least 0.70 but less than 0.75 E. at least 0.75 40.2 (3 points) What is the mean? A. less than 5 B. at least 5 but less than 6 C. at least 6 but less than 7 D. at least 7 but less than 8 E. at least 8 40.3 (4 points) What is the variance? A. less than 70 B. at least 70 but less than 75 C. at least 75 but less than 80 D. at least 80 but less than 85 E. at least 85 40.4 (5 points) What is the skewness? A. less than 2.6 B. at least 2.6 but less than 2.7 C. at least 2.7 but less than 2.8 D. at least 2.8 but less than 2.9 E. at least 2.9 40.5 (3 points) What is E[X ∧ 3]? A. less than 2.2 B. at least 2.2 but less than 2.3 C. at least 2.3 but less than 2.4 D. at least 2.4 but less than 2.5 E. at least 2.5
HCM 10/8/12,
Page 836
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 837
40.6 (3 points) What is the loss elimination ratio at 5? A. less than 40% B. at least 40% but less than 45% C. at least 45% but less than 50% D. at least 50% but less than 55% E. at least 55% 40.7 (3 points) What is E[(X-20)+]? A. less than 0.75 B. at least 0.75 but less than 0.80 C. at least 0.80 but less than 0.85 D. at least 0.85 but less than 0.90 E. at least 0.90 40.8 (2 points) What is e(20)? A.7 B. 8 C. 9
D. 10
E. 11
40.9 (1 point) What is the median? A. 4.2 B. 4.4 C. 4.6
D. 4.8
E. 5.0
40.10 (2 points) What is the 90th percentile? A. 18 B. 19 C. 20 D. 21
E. 22
40.11 (3 points) The size of loss for the Peregrin Insurance Company follows the given f(x). The average annual frequency is 138. Peregrin Insurance buys reinsurance from the Meriadoc Reinsurance Company for 5 excess of 15. How much does Meriadoc expect to pay per year for losses from Peregrin Insurance? A. 60 B. 70 C. 80 D. 90 E. 100
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 838
Use the following information for the next 3 questions: One has a two component splice, which is proportional to an Exponential Distribution with mean 3 for loss sizes less than 5, and is proportional to a Pareto Distribution with α = 4 and θ = 60 for loss sizes greater than 5. The splice is continuous at 5. 40.12 (2 points) What is the distribution function of the splice at 2? A. less than 20% B. at least 20% but less than 25% C. at least 25% but less than 30% D. at least 30% but less than 35% E. at least 35% 40.13 (2 points) What is the survival function of the splice at 10? A. less than 35% B. at least 35% but less than 40% C. at least 40% but less than 45% D. at least 45% but less than 50% E. at least 50% 40.14 (3 points) What is the mean of this splice? A. less than 14.0 B. at least 14.0 but less than 14.5 C. at least 14.5 but less than 15.0 D. at least 15.0 but less than 15.5 E. at least 15.5
40.15 (4 points) In 2008, the size of monthly pension payments for a group of retired municipal employees follows a Single Parameter Pareto Distribution, with α = 2 and θ = $1000. The city announces that for 2009, there will be a 5% cost of living adjustment (COLA.) However. the COLA will only apply to the first $2000 in monthly payments. What is the probability density function of the size of monthly pension payments in 2009?
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 839
40.16 (3 points) You are given the following grouped data: Range # of claims loss 0-1 6300 3000 1-2 2350 3500 2-3 850 2000 3-4 320 1000 4-5 110 500 over 5 70 500 10,000 10,500 What is the mean of a 2-component splice between the empirical distribution below 4 and an Exponential with θ = 1.5? A. less than 1.045 B. at least 1.045 but less than 1.050 C. at least 1.050 but less than 1.055 D. at least 1.055 but less than 1.060 E. at least 1.060 Use the following information for the next 2 questions: ⎧ 617,400 ,0 < x ≤ 4 ⎪⎪218 (10 + x) 4 f(x) = ⎨ . 3920 ⎪ ,x > 4 ⎪⎩ 25 (10 + x)3
40.17 (2 points) Determine the probability that X is greater than 2. A. 54% B. 56% C. 58% D. 60% E. 62% 40.18 (4 points) Determine E[X]. A. 4 B. 5 C. 6
D. 7
E. 8
40.19 (SOA M, 11/05, Q.35 & 2009 Sample Q.211) (2.5 points) An actuary for a medical device manufacturer initially models the failure time for a particular device with an exponential distribution with mean 4 years. This distribution is replaced with a spliced model whose density function: (i) is uniform over [0, 3] (ii) is proportional to the initial modeled density function after 3 years (iii) is continuous Calculate the probability of failure in the first 3 years under the revised distribution. (A) 0.43 (B) 0.45 (C) 0.47 (D) 0.49 (E) 0.51
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 840
40.20 (CAS3, 11/06, Q.18) (2.5 points) A loss distribution is a two-component spliced model using a Weibull distribution with θ1 = 1,500 and τ = 1 for losses up to $4,000, and a Pareto distribution with θ2 = 12,000 and α = 2 for losses $4,000 and greater. The probability that losses are less than $4,000 is 0.60. Calculate the probability that losses are less than $25,000. A. Less than 0.900 B. At least 0.900, but less than 0.925 C. At least 0.925, but less than 0.950 D. At least 0.950, but less than 0.975 E. At least 0.975
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 841
Solutions to Problems: 5
10
∫
∫
40.1. E. F(10) = .12 dx + .06595e-x/10 dx = (5)(.12) + .6595(e-5/10 - e-10/10) = 0.757. 0
5 ∞
∫
Alternately, S(10) = .06595e-x/10 dx = .6595e-10/10 = .243. F(10) = 1 - .243 = 0.757. 10 5
40.2. D. mean =
∞
∫ .12 x dx + ∫ .06595e-x/10 x dx =
0
5
x=5
.06 x2
x=∞
] - .06595 {10xe-x/10 + 100e-x/10} ] = 1.5 + 6.0 = 7.5.
x=0
x=5 5
∞
∫
∫
40.3. C. 2nd moment = .12 x2 dx + .06595e-x/10 x2 dx = 0
5
x=5
.04x3
x=∞
] - .06595{10x2e-x/10 + 200xe-x/10 + 2000e-x/10} ] = 5 + 130 = 135.
x=0
x=5
Variance = 135 - 7.52 = 78.75. 5
∞
∫
∫
40.4. A. 3rd moment = .12 x3 dx + .06595e-x/10 x3 dx = 0 x=5
.03 x4
5 x=∞
] - .06595 {10x3e-x/10 + 300x2e-x/10 + 6000xe-x/10 + 60000e-x/10} ] x=0
x=5
= 18.75 + 3950 = 3968.75. Skewness = (3968.5 - (3)(7.5)(135) + (2)(7.53 ))/(78.751.5) = 2.54.
2013-4-2,
Loss Distributions, §40 Splices 3
HCM 10/8/12,
Page 842
x=3
∫
40.5. D. E[X ∧ 3] = .12 x dx + 3S(3) = .06 x2 0
] - (3)(1 - (.12)(3)) = .54 + 1.92 = 2.46. x=0
5
x=5
∫
40.6. C. E[X ∧ 5] = .12 x dx + 5S(5) = .06 x2 0
] - (5){1 - (.12)(5)} = 1.5 + 2 = 3.5. x=0
E[X ∧ 5]/E[X] = 3.5/7.5 = 46.7%. Alternately, since for x > 5, the density is proportional to an Exponential, f(x) = 0.06595e-x/10 for x > 5, S(x) = 0.6595e-x/10 for x > 5. ∞
The layer from 5 to infinity is:
∫
0.6595 e - x / 10 dx = 4.00.
5
The loss elimination ratio at 5 is: 1 - 4/7.5 = 46.7%. Comment: 1 - (0.12)(5) = S(5) = 0.6595e-5/10 = 0.400. 5
20
∫
∫
40.7. D. E[X ∧ 20] = .12 x dx + .06595e-x/10 x dx + 20S(20) = 0
5
x=5
.06 x2
x=20
] - .06595 {10xe-x/10 + 100e-x/10} ] + (20)(.06595)(10e-2)
x=0
x=5
= 1.5 + 3.323 + 1.785 = 6.608. E[(X-20)+] = E[X] - E[X ∧ 20] = 7.5 - 6.608 = 0.892. ∞
x=∞
∫
Alternately, E[(X-20)+] = .06595e-x/10(x - 20) dx = .06595 {-10xe-x/10 + 100e-x/10} ] = 0.8925. 20
x=20 ∞
∫
∞
x=∞
∫
Alternately, E[(X-20)+] = S(x) dx = .6595e-x/10 dx = -6.595e-x/10 ] = 0.8925. 20
20
x=20
40.8. D. Beyond 5 the density is proportional to an Exponential density with mean 10, and therefore, beyond 5, the mean residual life is a constant 10. Alternately, S(20) = (.06595)(10e-2) = .08925. e(20) = E[(X-20)+] /S(20) = .8925/.08925 = 10.0.
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 843
40.9. A. f(x) = 0.12 for x ≤ 5. ⇒ F(5) = 0.60. ⇒ median = 0.5 / 0.12 = 4.167. 40.10. B. f(x) = 0.12 for x ≤ 5. ⇒ F(5) = 0.6. f(x) = 0.06595e-x/10 for x > 5. x
⇒ F(x) = 0.6 +
∫5 0.06595 e- t / 10 dt = 0.6 + 0.6595e-5/10 - 0.6595e-x/10, x > 5.
Require that: 0.9 = 0.6 + 0.6595e-5/10 - 0.6595e-x/10. ⇒ e-x/10 = 0.15164. ⇒ x = 18.86. 20
20
∫
∫
x=20
40.11. C. E[X ∧ 20] - E[X ∧ 15] = S(x) dx = .6595e-x/10 dx = -6.595e-x/10 15
15
] = .579.
x=15
Meriadoc reinsures the layer from 15 to 20, so it expects to pay: (138)(.579) = 79.9. 40.12. C. Let the splice be: a(Exponential), x < 5, and b(Pareto), x > 5. The splice must integrate to unity from 0 to ∞: 1 = a(Exponential Distribution at 5) + b(1 - Pareto Distribution at 5) ⇒ 1 = a(1 - e-5/3) + b(60/65)4 ⇒ 1 = 0.8111a + 0.7260b. The density of the Exponential is: e-x/3/3. f(5) = e-5/3/3 = 0.06296. The density of the Pareto is: (4)(604 )/(x+60)5 . f(5) = (4)(604 )/(655 ) = 0.04468 Also in order for the splice to be continuous at 5: a(Exponential density @ 5) = b(Pareto density @5) ⇒ a(0.06296) = b(0.04468).
⇒ b = 1.4091a. ⇒ 1 = 0.8111a + 0.7260(1.4091a). ⇒ a = 0.545. ⇒ the distribution function at 2 is: .545(Exponential Distribution at 2) = 0.545(1 - e-2/3) = 0.265.
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 844
40.13. C. Continuing the previous solution, b = 1.4091a = .768.
⇒ the survival function at 10 is: .768(Pareto survival function at 10) = .768(60/70)4 = 0.415. Alternately, the distribution function at 10 is: 5
0.545
∫0
e- x / 3 dx + 0.768 3
10
∫5
(4)(604) dx = (x + 60)5
0.545 (Exponential distribution function at 5) + 0.768 (Pareto distribution function at 10 - Pareto distribution function at 5) = 0.545 (1 - e-5/3) + 0.768{(1 - (60/70)4 ) - (1 - (60/65)4 )} = (0.545)(0.811) + (0.768)(0.186) = 0.585. Therefore, the survival function at 10 is: 1 - 0.585 = 0.415. 5
e- x / 3 40.14. E. mean = 0.545 x dx + 0.768 3
∫0
∞
∫5
x
(4)(604 ) dx = (x + 60)5
The first integral is for an Exponential Distribution: E[X ∧ 5] - 5S(5) = 3(1 - e-5/3) - 5e-5/3 = 1.49. 5
The second integral is for an Pareto Distribution: E[X] -
∫0 x fPareto(x) dx =
E[X] - {E[X ∧ 5] - 5S(5)} = E[X] - E[X ∧ 5] + 5S(5) = (60/3)(60/65)3 + (5)(60/65)4 = 19.36. Thus the mean of the splice is: (0.545)(1.49) + (0.768)(19.36) = 15.68. Alternately, the second integral is for an Pareto: ∞
∞
∞
∫5 x fPareto(x) dx = ∫5 (x- 5) fPareto(x) dx + 5 ∫5 fPareto(x) dx = SPareto(5) ePareto(5) + 5 SPareto(5) = (60/65)4
5 + 60 + (5)(60/65)4 = 19.36. Proceed as before. 4 - 1 ∞
Comment: e(x) =
x + θ
∫x (t - x) f(t) dt / S(x). For a Pareto Distribution, e(x) = α - 1 .
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 845
40.15. In 2008, S(2000) = (1000/2000)2 = 1/4. For a Single Parameter Pareto Distribution, f(x) = αθα / xα+1, x > θ. For those whose payments are less than $2000 per month, the payment is multiplied by 1.05. Thus in 2009 they follow a Single Parameter Pareto with α = 2 and θ = $1050. The density is proportional to: f(x) = 2 (10502 ) / x3 , x > 1050. For those whose payments are $2000 or more per month, their payment is increased by (2000)(5%) = 100. S(x) = {1000/(x-100)}2 , x > 2100. The density is proportional to: f(x) = 2 (10002 ) / (x - 100)3 . In 2009, the density is a splice, with 3/4 weight to the first component and 1/4 weight to the second component. Someone with $2000 in 2008, will get $2100 in 2009; $2100 is the breakpoint of the splice. f(x) = 2 (10502 ) / x3 , x > 1050, would integrate from 1050 to 2100 to the distribution function of at 2100 of a Single Parameter Pareto with α = 2 and θ = $1050, 1 - (1050/2100)2 = 3/4. This is the desired weight for the first component, so this is OK. f(x) = 2 (10002 ) / (x - 100)3 , would integrate from 2100 to ∞: (10002 ) / (2100 - 100)2 = 1/4. This is the desired weight for the second component, so this is OK. The probability density function of the size of monthly pension payments in 2009 is a splice: ⎧ 2 (10502 ) / x3 , 2100 > x > 1050 f(x) = ⎨ . ⎩ 2 (10002 ) / (x - 100) 3, x > 2100 Comment: Coming from the left, f(2100) = 2 (10502 ) / 21003 = 1/4200. Coming from the right, f(2100) = 2 (10002 ) / (2100 - 100)3 = 1/4000. Thus the density of this splice is not (quite) continuous at the breakpoint of 2100. A graph of this splice: f(x)
0.0015 0.0010 0.0005
1050
2100
x
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 846
40.16. B. The empirical survival function at 4 is: (110 + 70)/10000 = .018. Above 4 the splice is proportional to an Exponential with survival function e-x/1.5. Let w be the weight applied to this Exponential. Matching S(4), set 0.018 = we-4/1.5. w = 0.259. The contribution to the mean from the losses of size less than 4 is: (3000 + 3500 + 2000 + 1000)/10,000 = 0.9500. The contribution to the mean from the losses of size greater than 4 is: ∞
∞
∫
.259 x e-x/1.5/1.5 dx = -.259(x e-x/1.5 + 1.5e-x/1.5) ] = 0.0990. 4
4
mean = 0.9500 + 0.0990 = 1.0490. 3
40.17. D. F(3) =
∫0
617,400 617,400 1 1 dx = { } = 0.3977. (3) (103 ) (3) (123 ) 218 (10 + x) 4 218
S(3) = 1 - 0.3977 = 0.6023. Comment: Via integration, one can determine that the first component of the splice has a total probability of 3/5, while the second component of the splice has a total probability of 2/5.
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 847
40.18. E. One can use integration by parts. x= 4 4 4 -x ⎤ x 1 4 1 1 dx = + dx = + 3 ⎥ 4 3 2 3 (10 + x) 3 (10 + x) 3 (10 + x) (3) (14 ) (2) (3) (14 ) (2) (3) (102 ) ⎦ 0 0 x= 0
∫
∫
= 0.00033042. 4
Thus
∞
∫4
∫0
x
617,400 617,400 dx = 0.00033042 = 0.9358. 218 (10 + x)4 218 x= ∞
x -x dx = (10 + x)3 2 (10 + x)2
]
x= 4 ∞
Therefore,
∫4
x
4
Thus, E[X] =
∫0
∞
+
∫
4 1 1 dx = + = 9/196. 2 (2) (142 ) (2) (14) 2 (10 + x) 4
3920 3920 dx = (9/196) = 7.2000. 25 25 (10 + x)3 617,400 x dx + 218 (10 + x)4
∞
∫4 x 25 (10 + x)3 dx = 0.9358 + 7.2000 = 8.1358. 3920
Alternately, each component of the splice is proportional to the density of a Pareto Distribution. The density of a Pareto with α = 3 and θ = 10 is: Thus the first component of the splice is:
(3) (103 ) . (10 + x)4
617,400 1 2058 Pareto[3, 10] = Pareto[3, 10]. 3000 2180 218
4
Now
∫0 x fPareto(x) dx = E[X ∧ 4] - 4 S(4) = (10/2) {1 - (10/14)2} - (4)(10/14)3 = 0.99125. 4
Therefore,
∫0 x 218 (10 + x)4 dx = 2180 0.99125 = 0.9358. 617,400
2058
The density of a Pareto with α = 2 and θ = 10 is: Thus the second component of the splice is:
(2) (102 ) . (10 + x)4
3920 1 Pareto[2, 10] = 0.784 Pareto[2, 10]. 25 200
2013-4-2,
Loss Distributions, §40 Splices
∞
Now
∞
HCM 10/8/12,
Page 848
∞
∫4 x fPareto(x) dx = ∫4 (x - 4) fPareto(x) dx + ∫4 4 fPareto(x) dx = e(4) S(4) + 4 S(4)
= (14/1) (10/14)2 + (4) (10/14)2 = 9.1837. ∞
Therefore,
∫4 x 25 (10 + x)3 dx = (0.784) (9.1837) = 7.2000. 3920
4
Thus, E[X] =
∫0
617,400 x dx + 218 (10 + x)4
∞
∫4 x 25 (10 + x)3 dx = 0.9358 + 7.2000 = 8.1358. 3920
∞
Comment: e(x) =
x + θ
∫x (t - x) f(t) dt / S(x). For a Pareto Distribution, e(x) = α - 1 .
40.19. A. A uniform on [0, 3] has density of 1/3. On the interval 3 to ∞, we want something proportional to an Exponential with θ = 4. From 3 to ∞ this Exponential density would integrate to S(3) = e-3/4. Therefore, something proportional that would integrate to one is: .25e-x/4/e-3/4 = .25e-(x-3)/4. Thus the density of the splice is: w(1/3) from 0 to 3, and (1 - w).25e-(x-3)/4 from 3 to ∞. In order to be continuous, the two densities must match at 3: w(1/3) = (1 - w)0.25e-(3-3)/4. ⇒ 4w = 3(1 - w). ⇒ w = 3/7 = 0.429. Probability of failure in the first 3 years is the integral of the splice from 0 to 3: w = 0.429.
2013-4-2,
Loss Distributions, §40 Splices
HCM 10/8/12,
Page 849
40.20. C. For the Pareto, S(4000) = {12/(12 + 4)}2 = 9/16. The portion of the splice above $4000 totals 1 - .6 = 40% probability. Therefore, the portion of the splice above 4000 is: (.4) Pareto[2, 12,000] / (9/16). For the Pareto, S(25000) = {12/(12 + 25)}2 = 0.1052. Therefore, for the splice the probability that losses are greater than $25,000 is: (.4)(0.1052)/(9/16) = 0.0748. 1 - 0.0748 = 0.9252. Comment: A Weibull with τ = 1 is an Exponential. It is easier to calculate S(25,000), and then F(25,000) = 1 - S(25,000). When working above the breakpoint of $4000, work with the Pareto. If working below the breakpoint of $4000, work with the Weibull. It might have been better if the exam question had read instead: “A loss distribution is a two-component spliced model using a density proportional to a Weibull distribution with θ1 = 1,500 and τ = 1 for losses up to $4,000, and a density proportional to a Pareto distribution with θ2 = 12,000 and α = 2 for losses $4,000 and greater.” The density above 4000 is proportional to a Pareto. The original Pareto integrates to 9/16 from 4000 to infinity. In order to get the density from 4000 to infinity to integrate to the desired 40%, we need to multiply the density of the original Pareto by 40%/(9/16).
Section 41, Relationship to Life Contingencies

Many of the ideas discussed with respect to Loss Distributions apply to Life Contingencies and vice-versa. For example, as discussed previously, the mean residual life (complete expectation of life) and the mean excess loss are mathematically equivalent. Similarly, as discussed previously, the hazard rate and force of mortality are two names for the same thing. One can relate the notation used in Life Contingencies to that used in Loss Distributions.

ps and qs:340

The probability of survival past time 70 + 10 = 80, given survival past time 70, is 10p70.
The probability of failing at or before time 70 + 10 = 80, given survival past time 70, is 10q70.
10p70 + 10q70 = 1.
In general, y-xpx ≡ Prob[Survival past y | Survival past x] = S(y)/S(x).
y-xqx ≡ Prob[Not Surviving past y | Survival past x] = {S(x) - S(y)}/S(x) = 1 - y-xpx.
Also px ≡ 1px = Prob[Survival past x+1 | Survival past x] = S(x+1)/S(x).
qx ≡ 1qx = Prob[Death within one year | Survival past x] = 1 - S(x+1)/S(x).
Exercise: Estimate 100p50 and 300q100, given the following 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: S(50) = 8/10. S(150) = 3/10. 100p50 = S(150)/S(50) = (3/10)/(8/10) = 3/8.
S(100) = 4/10. S(400) = 1/10. 300q100 = 1 - S(400)/S(100) = 3/4.]
t|uqx ≡ Prob[x+t < time of death ≤ x+t+u | Survival past x] = {S(x+t) - S(x+t+u)}/S(x).
Note that t is the time delay, while u is the length of the interval whose probability we measure.
Exercise: In the previous exercise, estimate 100|200q70.
[Solution: 100|200q70 = {S(170) - S(370)}/S(70) = {(3/10) - (1/10)}/(6/10) = 1/3.]
340 See Section 3.2.2 of Actuarial Mathematics.
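The two exercises above can be reproduced with a few lines of Python (not part of the original study guide; only the standard library is used):

    data = [22, 35, 52, 69, 86, 90, 111, 254, 362, 746]
    S = lambda x: sum(1 for v in data if v > x) / len(data)   # empirical survival function

    print(S(150) / S(50))             # 100p50 = 3/8
    print(1 - S(400) / S(100))        # 300q100 = 3/4
    print((S(170) - S(370)) / S(70))  # 100|200q70 = 1/3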
Variance of ps and qs:341

With the 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746, the estimate of 100p50 = S(150)/S(50) = (3/10)/(8/10) = 3/8 = (number > 150)/(number > 50).
Conditional on having 8 values greater than 50, the number of values greater than 150 is Binomial with m = 8 and q = 100p50, and variance: 8 100p50 (1 - 100p50) = 8 100p50 100q50.
However, given 8 values greater than 50, 100p50 = (number > 150)/8.
Thus, Var[100p50 | S(50) = 8/10] = 8 100p50 100q50 / 8^2 = 100p50 100q50 / 8 = (3/8)(5/8)/8 = (3)(5)/8^3.
Let nx ≡ number of values greater than x. Then by the above reasoning, y-xpx = ny/nx, and Var[y-xpx | nx] = ny(nx - ny)/nx^3.
Since y-xqx = 1 - y-xpx, Var[y-xqx | nx] = Var[y-xpx | nx] = ny(nx - ny)/nx^3.
Exercise: Estimate Var[30q70 | 6 values greater than 70], given the following 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: 30p70 = S(100)/S(70) = (4/10)/(6/10) = 2/3. 30q70 = 1/3.
Var[30q70 | 6 values greater than 70] = 30p70 30q70 / 6 = (2/3)(1/3)/6 = 1/27.
Alternately, Var[30q70 | n70 = 6] = n100(n70 - n100)/n70^3 = (4)(6 - 4)/6^3 = 1/27.]

Central Death Rate:342

The central death rate, mx = (Probability of dying from age x to age x + 1) / (expected years lived from x to x + 1)
= {S(x) - S(x+1)} / ∫_x^(x+1) S(t) dt
= (Probability of loss of size x to x + 1) / (layer of loss from x to x + 1).
nmx = (Probability of dying from age x to age x + n) / (expected years lived from x to x + n)
= {S(x) - S(x+n)} / ∫_x^(x+n) S(t) dt
= (Probability of loss of size x to x + n) / (layer of loss from x to x + n).
341 See Example 14.5 in Loss Models.
342 See page 70 of Actuarial Mathematics.
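The variance formula Var[y-xqx | nx] = ny(nx - ny)/nx^3 can be checked on the same data set (not part of the original study guide):

    data = [22, 35, 52, 69, 86, 90, 111, 254, 362, 746]
    n70 = sum(1 for v in data if v > 70)     # 6 values greater than 70
    n100 = sum(1 for v in data if v > 100)   # 4 values greater than 100

    print(n100 * (n70 - n100) / n70**3)      # Var[30q70 | n70 = 6] = 1/27 = 0.0370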
Problems:

Use the following information for the next 5 questions:
Mortality follows a Weibull Distribution with θ = 70 and τ = 4.

41.1 (1 point) Determine q60.
A. 0.028  B. 0.030  C. 0.032  D. 0.034  E. 0.036

41.2 (1 point) Determine p80.
A. 0.90  B. 0.92  C. 0.94  D. 0.96  E. 0.98

41.3 (1 point) Determine 10q65.
A. 0.42  B. 0.44  C. 0.46  D. 0.48  E. 0.50

41.4 (1 point) Determine 13p74.
A. 0.28  B. 0.30  C. 0.32  D. 0.34  E. 0.36

41.5 (2 points) Determine 10|5q62.
A. 0.18  B. 0.20  C. 0.22  D. 0.24  E. 0.26

41.6 (165, 5/87, Q.9) (2.1 points) Mortality follows a Weibull Distribution with parameters θ and τ.
q0 = 0.09516. q1 = 0.25918. Determine q2.
A. 0.37  B. 0.39  C. 0.41  D. 0.43  E. 0.45

41.7 (CAS3, 11/07, Q.30) (2.5 points) Survival follows a Weibull Distribution. Given the following:
• µ(x) = kx^2, k > 0, x ≥ 0 defines the hazard rate function.
• 3q2 = 0.68963.
Calculate 2|q2.
Solutions to Problems:

41.1. E. q60 = 1 - p60 = 1 - S(61)/S(60) = 1 - exp[-(61/70)^4]/exp[-(60/70)^4] = 1 - e^(-0.0369) = 0.0362.

41.2. B. p80 = S(81)/S(80) = exp[-(81/70)^4]/exp[-(80/70)^4] = e^(-0.0869) = 0.917.

41.3. B. 10q65 = 1 - 10p65 = 1 - S(75)/S(65) = 1 - exp[-(75/70)^4]/exp[-(65/70)^4] = 1 - e^(-0.5743) = 0.437.

41.4. C. 13p74 = S(87)/S(74) = exp[-(87/70)^4]/exp[-(74/70)^4] = e^(-1.1372) = 0.321.

41.5. A. 10|5q62 = {S(72) - S(77)}/S(62) = {exp[-(72/70)^4] - exp[-(77/70)^4]}/exp[-(62/70)^4] = (0.32652 - 0.23129)/0.54041 = 0.176.

41.6. B. S(x) = exp[-(x/θ)^τ]. q0 = {S(0) - S(1)}/S(0) = 1 - exp[-1/θ^τ]. ⇒ exp[-1/θ^τ] = 1 - 0.09516 = 0.90484.
q1 = {S(1) - S(2)}/S(1) = 1 - exp[-(2/θ)^τ]/exp[-1/θ^τ].
⇒ exp[-(2/θ)^τ]/exp[-1/θ^τ] = 1 - 0.25918 = 0.74082. ⇒ exp[-(2/θ)^τ] = (0.90484)(0.74082) = 0.67032.
Therefore, 1/θ^τ = -ln(0.90484) = 0.100, and (2/θ)^τ = -ln(0.67032) = 0.400.
Dividing the two equations: 2^τ = 4. ⇒ τ = 2. ⇒ θ = √10.
S(x) = exp[-x^2/10]. q2 = 1 - S(3)/S(2) = 1 - e^(-0.9)/e^(-0.4) = 1 - e^(-0.5) = 0.3935.

41.7. D. H(x) = ∫_0^x h(t) dt = k x^3/3. S(x) = exp[-H(x)] = exp[-k x^3/3].
0.68963 = 3q2 = 1 - S(5)/S(2) = 1 - exp[-125k/3]/exp[-8k/3].
⇒ 0.31037 = exp[-117k/3]. ⇒ k = 0.03. ⇒ S(x) = exp[-x^3/100].
2|q2 = {S(4) - S(5)}/S(2) = (e^(-0.64) - e^(-1.25))/e^(-0.08) = 0.261.
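As a quick check (not part of the original study guide), the first five solutions can be reproduced in Python using only the standard library:

    from math import exp

    S = lambda x: exp(-(x / 70.0)**4)   # Weibull survival with theta = 70, tau = 4

    print(1 - S(61) / S(60))        # 41.1: q60 = 0.036
    print(S(81) / S(80))            # 41.2: p80 = 0.917
    print(1 - S(75) / S(65))        # 41.3: 10q65 = 0.437
    print(S(87) / S(74))            # 41.4: 13p74 = 0.321
    print((S(72) - S(77)) / S(62))  # 41.5: 10|5q62 = 0.176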
Section 42, Gini Coefficient343

The Gini Coefficient or coefficient of concentration is a concept that comes up for example in economics, when looking at the distribution of incomes. This section will discuss the Gini coefficient and relate it to the Relative Mean Difference.
The Gini coefficient is a measure of inequality. For example, if all of the individuals in a group have the same income, then the Gini coefficient is zero. As incomes of the individuals in a group became more and more unequal, the Gini coefficient would increase towards a value of 1. The Gini coefficient has found application in many different fields of study.

Mean Difference:

Define the mean difference as the average absolute difference between two random draws from a distribution.
Mean Difference = ∫∫ |x - y| f(x) f(y) dx dy, where the double integral is taken over the support of f.
For example, for a uniform distribution from 0 to 10:
Mean Difference = ∫_0^10 ∫_0^10 |x - y| (1/10)(1/10) dx dy
= (1/100) ∫_0^10 ∫_(x=y)^10 (x - y) dx dy + (1/100) ∫_0^10 ∫_(x=0)^y (y - x) dx dy
= (1/100) ∫_0^10 (50 - 10y + y^2/2) dy + (1/100) ∫_0^10 y^2/2 dy
= (1/100)(500 - 500 + 1000/6) + (1/100)(1000/6) = 10/3.
In a similar manner, in general for the continuous uniform distribution, the mean difference is: (width)/3.344
343 Not on the syllabus of your exam.
344 For a sample of size two from a uniform, the expected value of the minimum is the bottom of the interval plus (width)/3, while the expected value of the maximum is the top of the interval - (width)/3. Thus the expected absolute difference is (width)/3. This is discussed in order statistics, on the Syllabus of Exam CAS 3L.
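A simulation (not part of the original study guide) confirms the result for the uniform distribution; only the Python standard library is used:

    import random

    random.seed(1)
    n = 10**6
    total = sum(abs(random.uniform(0, 10) - random.uniform(0, 10)) for _ in range(n))
    print(total / n)   # about 3.33 = 10/3, the mean difference of a uniform on [0, 10]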
Exercise: Compute the mean difference for an Exponential Distribution.
[Solution: Mean difference = ∫∫ |x - y| (e^(-x/θ)/θ)(e^(-y/θ)/θ) dx dy
= (1/θ^2) ∫_0^∞ ∫_(x=0)^y (y - x) e^(-x/θ) dx e^(-y/θ) dy + (1/θ^2) ∫_0^∞ ∫_(x=y)^∞ (x - y) e^(-x/θ) dx e^(-y/θ) dy
= (1/θ) ∫_0^∞ e^(-y/θ) {y(1 - e^(-y/θ)) + θe^(-y/θ) + ye^(-y/θ) - θ} dy + (1/θ) ∫_0^∞ e^(-y/θ) {θe^(-y/θ) + ye^(-y/θ) - ye^(-y/θ)} dy
= ∫_0^∞ y e^(-y/θ)/θ + 2e^(-2y/θ) - e^(-y/θ) dy = θ + θ - θ = θ.
Alternately, by symmetry the contributions from when x > y and when y > x must be equal.
Thus, the mean difference is: (2)(1/θ^2) ∫_0^∞ ∫_(x=0)^y (y - x) e^(-x/θ) dx e^(-y/θ) dy
= (2/θ) ∫_0^∞ e^(-y/θ) {y(1 - e^(-y/θ)) + θe^(-y/θ) + ye^(-y/θ) - θ} dy
= 2 ∫_0^∞ y e^(-y/θ)/θ + e^(-2y/θ) - e^(-y/θ) dy = (2)(θ + θ/2 - θ) = θ.
Comment: ∫ x e^(-x/θ) dx = -θ(x + θ)e^(-x/θ).
For a sample of size two from an Exponential Distribution, the expected value of the minimum is θ/2, while the expected value of the maximum is 3θ/2. Therefore, the expected value of the difference is θ.]
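A similar simulation (not part of the original study guide) confirms that the mean difference of an Exponential equals θ:

    import random

    random.seed(1)
    theta, n = 5.0, 10**6
    total = sum(abs(random.expovariate(1 / theta) - random.expovariate(1 / theta))
                for _ in range(n))
    print(total / n)   # about 5.0 = theta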
Mean Relative Difference:

The mean relative difference of a distribution is defined as: (mean difference)/(mean).
For the uniform distribution, the mean relative difference is: {(width)/3}/{(width)/2} = 2/3.
For the Exponential Distribution, the mean relative difference is: θ/θ = 1.
Exercise: Derive the form of the Mean Relative Difference for a Pareto Distribution.
Hint: ∫ x/(x + θ)^a dx = -{x(a-1) + θ}/{(a-1)(a-2)(x + θ)^(a-1)}.
[Solution: For α > 1, E[X] = θ/(α-1). f(x) = αθ^α/(θ + x)^(α+1).
Mean difference = ∫∫ |x - y| {αθ^α/(θ + x)^(α+1)}{αθ^α/(θ + y)^(α+1)} dx dy.
By symmetry the contributions from when x > y and when y > x must be equal.
Therefore, mean difference = 2α^2 θ^(2α) ∫_(y=0)^∞ ∫_(x=y)^∞ (x - y)/{(θ + x)^(α+1) (θ + y)^(α+1)} dx dy.
Now using the hint: ∫_(x=y)^∞ x/(θ + x)^(α+1) dx = {yα + θ}/{α(α-1)(θ + y)^α}.
∫_(x=y)^∞ 1/(θ + x)^(α+1) dx = 1/{α(θ + y)^α}.
Therefore, ∫_(x=y)^∞ (x - y)/(θ + x)^(α+1) dx = {yα + θ}/{α(α-1)(θ + y)^α} - y/{α(θ + y)^α} = 1/{α(α-1)(θ + y)^(α-1)}.
Thus, mean difference = {2αθ^(2α)/(α-1)} ∫_0^∞ 1/(θ + y)^(2α) dy = {2αθ^(2α)/(α-1)} {1/[(2α-1)θ^(2α-1)]} = 2αθ/{(α-1)(2α-1)}.
E[X] = θ/(α-1). Thus, the mean relative difference is: 2α/(2α-1), α > 1.]
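A simulation (not part of the original study guide) of the Pareto with α = 3 and θ = 10 confirms the mean relative difference of 2α/(2α - 1) = 1.2; only the Python standard library is used:

    import random

    random.seed(1)
    alpha, theta, n = 3.0, 10.0, 10**6
    # Inverse transform for the Pareto: x = theta * (u^(-1/alpha) - 1), u uniform on (0, 1].
    draw = lambda: theta * ((1.0 - random.random())**(-1.0 / alpha) - 1.0)
    total = sum(abs(draw() - draw()) for _ in range(n))
    mean = theta / (alpha - 1.0)
    print((total / n) / mean)   # about 1.2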
Lorenz Curve:

Assume that the incomes in a country follow a distribution function F(x).345 Then F(x) is the percentage of people with incomes less than x.
The income earned by such people is: ∫_0^x t f(t) dt = E[X ∧ x] - x S(x).
The percentage of total income earned by such people is: {∫_0^x y f(y) dy}/E[X] = {E[X ∧ x] - x S(x)}/E[X].
Define G(x) = {∫_0^x y f(y) dy}/E[X] = {E[X ∧ x] - x S(x)}/E[X].346
For example, assume an Exponential Distribution. Then F(x) = 1 - e^(-x/θ).
G(x) = {E[X ∧ x] - x S(x)}/E[X] = {θ(1 - e^(-x/θ)) - x e^(-x/θ)}/θ = 1 - e^(-x/θ) - (x/θ)e^(-x/θ).
Let t = F(x) = 1 - e^(-x/θ). Therefore, x/θ = -ln(1 - t).347
Then, G(t) = t - {-ln(1-t)}(1-t) = t + (1-t) ln(1-t).
345 Of course, the mathematics applies regardless of what is being modeled. The distribution of incomes is just the most common context.
346 This is not standard notation. I have just used G to have some notation.
347 This is just the VaR formula for the Exponential Distribution.
Then we can graph G as a function of F:
[Graph: the Lorenz curve for the Exponential Distribution, G(x) plotted against F(x), each running from 0 to 1.]
This curve is referred to as the Lorenz curve or the concentration curve. Since F(0) = 0 = G(0) and F(∞) = 1 = G(∞), the Lorenz curve passes through the points (0, 0) and (1, 1). Usually one would also include in the graph the 45° reference line connecting (0, 0) and (1, 1), as shown below:
[Graph: the Lorenz curve together with the 45° reference line; % of people on the horizontal axis, % of income on the vertical axis.]
G(t) = G[F(x)] = {∫_0^x y f(y) dy}/E[X].
dG/dt = (dG/dx)/(dF/dx) = {x f(x)/E[X]}/f(x) = x/E[X] > 0.
d^2G/dt^2 = {1/E[X]} dx/dF = 1/{E[X] f(x)} > 0.
Thus, in the above graph, as well as in general, the Lorenz curve is increasing and concave up. The Lorenz curve is below the 45° reference line, except at the endpoints where they are equal.
The vertical distance between the Lorenz curve and the 45° comparison line is: F - G.
Thus, this vertical distance is a maximum when: 0 = dF/dF - dG/dF. ⇒ dG/dF = 1. ⇒ x/E[X] = 1. ⇒ x = E[X].
Thus the vertical distance between the Lorenz curve and the 45° comparison line is a maximum at the mean income.
Exercise: If incomes follow an Exponential Distribution, what is this maximum vertical distance between the Lorenz curve and the 45° comparison line?
[Solution: The maximum occurs when x = θ. F(x) = 1 - e^(-x/θ). From previously, G(x) = 1 - e^(-x/θ) - (x/θ)e^(-x/θ).
F - G = (x/θ)e^(-x/θ). At x = θ, this is: e^(-1) = 0.3679.]
Exercise: Determine the form of the Lorenz Curve, if the distribution of incomes follows a Pareto Distribution, with α > 1.
[Solution: F(x) = 1 - {θ/(θ + x)}^α. E[X] = θ/(α-1). E[X ∧ x] = {θ/(α-1)}{1 - (θ/(θ + x))^(α-1)}.
G(x) = {E[X ∧ x] - x S(x)}/E[X] = 1 - {θ/(θ + x)}^(α-1) - (α-1)(x/θ) S(x).
Let t = F(x) = 1 - {θ/(θ + x)}^α. ⇒ {θ/(θ + x)}^α = S(x) = 1 - t. Also, x/θ = (1 - t)^(-1/α) - 1.348
Therefore, G(t) = 1 - (1 - t)^((α-1)/α) - (α-1){(1 - t)^(-1/α) - 1}(1 - t) = t + α - tα - α(1 - t)^(1-1/α), 0 ≤ t ≤ 1.
Comment: G(0) = α - α = 0. G(1) = 1 + α - α - 0 = 1.]
Here is a graph comparing the Lorenz curves for Paretos with α = 2 and α = 5:
[Graph: Lorenz curves for the Pareto with α = 5 and with α = 2; % of people on the horizontal axis, % of income on the vertical axis; the α = 2 curve lies below the α = 5 curve.]
348 This is just the VaR formula for the Pareto Distribution.
The Pareto with α = 2 has a heavier righthand tail than the Pareto with α = 5. If incomes follow a Pareto with α = 2, then there are more extremely high incomes compared to the mean, than if incomes follow a Pareto with α = 5. In other words, if α = 2, then income is more concentrated in the high income individuals than if α = 5.349
The Lorenz curve for α = 2 is below that for α = 5. In general, the lower curve corresponds to a higher concentration of income. In other words, a higher concentration of income corresponds to a smaller area under the Lorenz curve. Equivalently, a higher concentration of income corresponds to a larger area between the Lorenz curve and the 45° reference line.

Gini Coefficient:

This correspondence between areas on the graph of the Lorenz curve and the concentration of income is the idea behind the Gini Coefficient. Let us label the areas in the graph of a Lorenz Curve, in this case for an Exponential Distribution:
[Graph: Lorenz curve for an Exponential Distribution, with area A between the 45° reference line and the Lorenz curve, and area B below the Lorenz curve; % of people on the horizontal axis, % of income on the vertical axis.]
Gini Coefficient = Area A / (Area A + Area B).
349 An Exponential Distribution has a lighter righthand tail than either Pareto. Thus if income followed an Exponential, it would be less concentrated than if it followed any Pareto.
However, Area A + Area B add up to a triangle with area 1/2.
Therefore, Gini Coefficient = Area A / (Area A + Area B) = 2A = 1 - 2B.
For the Exponential Distribution, the Lorenz curve was: G(t) = t + (1-t) ln(1-t).
Thus, Area B = area under the Lorenz curve = ∫_0^1 t + (1-t) ln(1-t) dt = 1/2 + ∫_0^1 s ln(s) ds.
Applying integration by parts, ∫_0^1 s ln(s) ds = (s^2/2) ln(s) ]_(s=0)^(s=1) - ∫_0^1 (s^2/2)(1/s) ds = 0 - 1/4 = -1/4.
Thus Area B = 1/2 - 1/4 = 1/4.
Therefore, for the Exponential Distribution, the Gini Coefficient is: 1 - (2)(1/4) = 1/2.
Recall that for the Exponential Distribution, the mean relative difference was 1.
As will be shown subsequently, in general, Gini Coefficient = (mean relative difference)/2.
Therefore, for the Uniform Distribution, the Gini Coefficient is: (1/2)(2/3) = 1/3.
Similarly, for the Pareto Distribution, the Gini Coefficient is: (1/2){2α/(2α - 1)} = α/(2α - 1), α > 1.
We note that the Uniform with the lightest righthand tail of the three has the smallest Gini coefficient, while the Pareto with the heaviest righthand tail of the three has the largest Gini coefficient. Among Pareto Distributions, the smaller alpha, the heavier the righthand tail, and the larger the Gini Coefficient.350 The more concentrated the income is among the higher earners, the larger the Gini coefficient.
350 As alpha approaches one, the Gini coefficient approaches one.
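These Gini coefficients can also be checked (not part of the original study guide) by integrating the Lorenz curves numerically; the sketch assumes SciPy is available:

    from math import log
    from scipy import integrate

    G_expon = lambda t: t + (1 - t) * log(1 - t) if t < 1 else 1.0
    B = integrate.quad(G_expon, 0, 1)[0]
    print(1 - 2 * B)   # about 0.50 for the Exponential

    alpha = 2.0
    G_pareto = lambda t: t + alpha - t * alpha - alpha * (1 - t)**(1 - 1 / alpha)
    B = integrate.quad(G_pareto, 0, 1)[0]
    print(1 - 2 * B, alpha / (2 * alpha - 1))   # both about 0.667 for the Pareto with alpha = 2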
LogNormal Distribution:

For the LogNormal Distribution: E[X] = exp[µ + σ^2/2].
E[X ∧ x] = exp[µ + σ^2/2] Φ[(ln(x) - µ - σ^2)/σ] + x {1 - Φ[(ln(x) - µ)/σ]} = E[X] Φ[(ln(x) - µ - σ^2)/σ] + x S(x).
Therefore, G(x) = {E[X ∧ x] - x S(x)}/E[X] = Φ[(ln(x) - µ - σ^2)/σ] = Φ[(ln(x) - µ)/σ - σ].
Let t = F(x) = Φ[(ln(x) - µ)/σ].
Then the Lorenz Curve is: G(t) = Φ[Φ^(-1)[t] - σ].
For example, here is a graph of the Lorenz curves for LogNormal Distributions with σ = 1 and σ = 2:
[Graph: Lorenz curves for the LogNormal with σ = 1 and with σ = 2; % of people on the horizontal axis, % of income on the vertical axis; the σ = 2 curve lies below the σ = 1 curve.]
As derived subsequently, for a LogNormal Distribution, the Gini Coefficient is: 2Φ[σ/√2] - 1.
Here is a graph of the Gini Coefficient as a function of sigma:
[Graph: Gini Coefficient, from 0 to 1, as a function of sigma, from 0 to 5.]
As sigma increases, the LogNormal has a heavier tail, and the Gini Coefficient increases towards 1.
The mean relative difference is twice the Gini Coefficient: 4Φ[σ/√2] - 2.
Derivation of the Gini Coefficient for the LogNormal Distribution:

In order to compute the Gini Coefficient, we need to compute area B.
[Graph: Lorenz curve with area A above it and area B below it; % of people on the horizontal axis, % of income on the vertical axis.]
B = ∫_0^1 G(t) dt = ∫_0^1 Φ[Φ^(-1)[t] - σ] dt.
Let y = Φ^(-1)[t]. Then t = Φ[y]. dt = φ[y] dy.
B = ∫_(-∞)^∞ Φ[y - σ] φ[y] dy.
Now B is some function of σ.
B(σ) = ∫_(-∞)^∞ Φ[y - σ] φ[y] dy.
B(0) = ∫_(-∞)^∞ Φ[y] φ[y] dy = Φ[y]^2/2 ]_(y=-∞)^(y=∞) = 1/2.
B(σ) = ∫_(-∞)^∞ Φ[y - σ] φ[y] dy. Taking the derivative of B with respect to sigma:
Bʼ(σ) = -∫_(-∞)^∞ φ[y - σ] φ[y] dy = -{1/(2π)} ∫_(-∞)^∞ exp[-(y - σ)^2/2] exp[-y^2/2] dy
= -{1/(2π)} exp[-σ^2/2] ∫_(-∞)^∞ exp[-(2y^2 - 2σy)/2] dy
= -{1/(2π)} exp[-σ^2/4] ∫_(-∞)^∞ exp[-{(√2 y)^2 - 2(√2 y)(σ/√2) + (σ/√2)^2}/2] dy
= -{1/(2π)} exp[-σ^2/4] ∫_(-∞)^∞ exp[-{√2 y - σ/√2}^2/2] dy.
Let x = √2 y - σ/√2. ⇒ dy = dx/√2.
Bʼ(σ) = -{1/(2π)} exp[-σ^2/4] ∫_(-∞)^∞ exp[-x^2/2]/√2 dx = -{1/(2√π)} exp[-σ^2/4].351
Now assume that B(σ) = c - Φ[σ/√2], for some constant c.
Then Bʼ(σ) = -φ[σ/√2]/√2 = -{1/√(2π)} exp[-(σ/√2)^2/2]/√2 = -{1/(2√π)} exp[-σ^2/4], matching above.
Therefore, we have shown that B(σ) = c - Φ[σ/√2]. However, B(0) = 1/2. ⇒ 1/2 = c - 1/2. ⇒ c = 1. ⇒ B(σ) = 1 - Φ[σ/√2].352
Thus the Gini Coefficient is: 1 - 2B = 2Φ[σ/√2] - 1.
351 Where I have used the fact that the density of the Standard Normal integrates to one over its support from -∞ to ∞.
352 In general, ∫_(-∞)^∞ Φ[a + by] φ[y] dy = Φ[a/√(1 + b^2)]. For a list of similar integrals, see http://en.wikipedia.org/wiki/List_of_integrals_of_Gaussian_functions
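The closed form can be checked numerically (not part of the original study guide) by integrating the LogNormal Lorenz curve; the sketch assumes SciPy is available:

    from math import sqrt
    from scipy import integrate
    from scipy.stats import norm

    sigma = 1.5
    lorenz = lambda t: norm.cdf(norm.ppf(t) - sigma)
    B = integrate.quad(lorenz, 0, 1)[0]
    print(1 - 2 * B)                          # about 0.711
    print(2 * norm.cdf(sigma / sqrt(2)) - 1)  # about 0.711, the closed form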
Proof of the Relationship Between the Gini Index and the Mean Relative Difference:353

I will prove that: Gini Coefficient = (mean relative difference)/2.
As a first step, let us look at a graph of the Lorenz Curve with areas labeled:
[Graph: Lorenz curve with area C above the 45° reference line, area A between the reference line and the Lorenz curve, and area B below the Lorenz curve; % of people on the horizontal axis, % of income on the vertical axis.]
A + B = 1/2 = C.
B is the area under the Lorenz curve: ∫ G dF.
Area B is the area between the Lorenz curve and the horizontal axis. We can instead look at:
C + A = area between the Lorenz curve and the vertical axis = ∫ F dG.
Therefore, we have that: ∫ F dG - ∫ G dF = C + A - B = 1/2 + A - (1/2 - A) = 2A.
⇒ Area A = (1/2){∫ F dG - ∫ G dF}.
⇒ Gini Coefficient = Area A / (Area A + Area B) = 2A = ∫ F dG - ∫ G dF.
353 Based on Section 2.25 of Volume I of Kendallʼs Advanced Theory of Statistics, not on the syllabus.
Recall that G(x) = {∫_0^x y f(y) dy}/E[X]. ⇒ dG = x f(x) dx/E[X].
Therefore, Gini Coefficient = ∫ F dG - ∫ G dF = (1/E[X]) ∫_0^∞ F(s) s f(s) ds - ∫_0^∞ G(s) f(s) ds
= (1/E[X]) ∫_0^∞ ∫_0^s s f(t) dt f(s) ds - (1/E[X]) ∫_0^∞ ∫_0^s t f(t) dt f(s) ds
= (1/E[X]) ∫_0^∞ ∫_0^s (s - t) f(t) dt f(s) ds.
∫_0^∞ ∫_0^s (s - t) f(t) dt f(s) ds is the contribution to the mean difference from when s > t.
By symmetry it is equal to the contribution to the mean difference from when t > s.
Therefore, 2 ∫_0^∞ ∫_0^s (s - t) f(t) dt f(s) ds = mean difference.
⇒ Gini Coefficient = {(mean difference)/2}/E[X] = (mean relative difference)/2.
Problems:

42.1 (15 points) The distribution of incomes follows a Single Parameter Pareto Distribution, α > 1.
a. (3 points) Determine the mean relative difference.
b. (3 points) Determine the form of the Lorenz curve.
c. (3 points) With the aid of a computer, draw and compare the Lorenz curves for α = 1.5 and α = 3.
d. (3 points) Use the form of the Lorenz curve to compute the Gini coefficient.
e. (3 points) If the Gini coefficient is 0.47, what percent of total income is earned by the top 1% of earners?

42.2 (5 points) For a Gamma Distribution with α = 2, determine the mean relative difference.
Hint: Calculate the contribution to the mean difference from when x < y.
∫ x e^(-x/θ) dx = -x e^(-x/θ) θ - e^(-x/θ) θ^2. ∫ x^2 e^(-x/θ) dx = -x^2 e^(-x/θ) θ - 2x e^(-x/θ) θ^2 - 2e^(-x/θ) θ^3.
Solutions to Problems:
42.1. a. f(x) = αθ^α/x^(α+1), x > θ.
The contribution to the mean difference from when x > y is:
∫_(y=θ)^∞ ∫_(x=y)^∞ (x - y) {αθ^α/x^(α+1)} dx {αθ^α/y^(α+1)} dy.
Now ∫_(x=y)^∞ (x - y)/x^(α+1) dx = {1/(α-1) - 1/α} y^(1-α).
Therefore, the contribution is: α^2 θ^(2α) {1/(α-1) - 1/α} ∫_θ^∞ 1/y^(2α) dy
= α^2 θ^(2α) {1/[α(α-1)]} {1/[(2α-1)θ^(2α-1)]} = αθ/{(α-1)(2α-1)}.
By symmetry this is equal to the contribution to the mean difference from when y > x.
Therefore, the mean difference is: 2αθ/{(α-1)(2α-1)}.
E[X] = αθ/(α-1), α > 1.
Therefore, the mean relative difference is: 2/(2α-1), α > 1.
b. G(x) = {∫_θ^x y f(y) dy}/E[X] = {∫_θ^x αθ^α/y^α dy}/{αθ/(α-1)} = (α-1)θ^(α-1) ∫_θ^x 1/y^α dy = 1 - θ^(α-1)/x^(α-1), x > θ.
Now let t = F(x) = 1 - θ^α/x^α, x > θ. ⇒ θ/x = (1-t)^(1/α).
Then G(t) = 1 - (1-t)^(1-1/α), 0 ≤ t ≤ 1.
c. For α = 1.5, G(t) = 1 - (1-t)^(1/3), 0 ≤ t ≤ 1. For α = 3, G(t) = 1 - (1-t)^(2/3), 0 ≤ t ≤ 1.
Here is a graph of these two Lorenz curves:
[Graph: Lorenz curves for α = 3 and for α = 1.5; % of people on the horizontal axis, % of income on the vertical axis; the α = 1.5 curve lies below the α = 3 curve.]
The Lorenz curve for α = 1.5 is below that for α = 3. The incomes are more concentrated for α = 1.5 than for α = 3.
d. The Lorenz curve is: G(t) = 1 - (1-t)^(1-1/α), 0 ≤ t ≤ 1.
Integrating, the area under the Lorenz curve is: B = 1 - 1/(2 - 1/α) = 1 - α/(2α-1) = (α-1)/(2α-1).
The Gini coefficient is: 1 - 2B = 1 - 2(α-1)/(2α-1) = 1/(2α-1), α > 1.
Note that the Gini Coefficient = (mean relative difference)/2 = (1/2){2/(2α-1)} = 1/(2α-1).
e. 0.47 = 1/(2α-1). ⇒ α = 1.564. E[X] = θα/(α-1) = 2.773θ.
The 99th percentile is: θ(1 - 0.99)^(-1/1.564) = 19.00θ.
The income earned by the top 1% is: ∫_(19θ)^∞ x {1.564 θ^1.564/x^2.564} dx = (1.564/0.564) θ^1.564/(19θ)^0.564 = 0.527θ.
Thus the percentage of total income earned by the top 1% is: 0.527θ/(2.773θ) = 19.0%.
Comment: For the Single Parameter Pareto the mean relative difference is 2/(2α-1) and the Gini coefficient is 1/(2α-1); for the two-parameter Pareto the corresponding results are 2α/(2α-1) and α/(2α-1). The distribution of incomes in the United States has a Gini coefficient of about 0.47.
For a sample of size two from a Single Parameter Pareto Distribution with α > 1, it turns out that:
E[Min] = 2αθ/(2α-1). E[Max] = 2α^2 θ/{(α-1)(2α-1)}.
Therefore, the mean difference is: 2α^2 θ/{(α-1)(2α-1)} - 2αθ/(2α-1) = 2θα/{(α-1)(2α-1)}.
Since E[X] = αθ/(α-1), the mean relative difference is 2/(2α-1).
42.2. f(x) = x^(α-1) e^(-x/θ)/{θ^α Γ(α)} = x e^(-x/θ)/θ^2.
The contribution to the mean difference from when x < y is:
(1/θ^4) ∫_0^∞ ∫_(y=x)^∞ (y - x) y e^(-y/θ) dy x e^(-x/θ) dx
= (1/θ^4) ∫_0^∞ {∫_x^∞ y^2 e^(-y/θ) dy - x ∫_x^∞ y e^(-y/θ) dy} x e^(-x/θ) dx
= (1/θ^4) ∫_0^∞ {(x^2 θ + 2xθ^2 + 2θ^3) e^(-x/θ) - x(xθ + θ^2) e^(-x/θ)} x e^(-x/θ) dx
= (1/θ^4) ∫_0^∞ (x^2 θ^2 + 2x θ^3) e^(-2x/θ) dx = (1/θ^4){θ^2 (2)(θ/2)^3 + 2θ^3 (θ/2)^2} = 3θ/4.
By symmetry this is equal to the contribution to the mean difference from when x > y.
Therefore, the mean difference is: 3θ/2. E[X] = αθ = 2θ.
Therefore, the mean relative difference is: (3θ/2)/(2θ) = 3/4.
4
6
8
10
alpha
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 874
Section 43, Important Ideas & Formulas Here are what I believe are the most important formulas and ideas from this study guide to know for the exam. Statistics Ungrouped Data (Section 2): Average of X
= 1st moment = E[X].
Average of X2 = 2nd moment about the origin = E[X2 ]. Mean = E[X]. Mode = the value most likely to occur. Median = the value at which the distribution function is 50% = 50th percentile. Variance = second central moment = E[(X - E[X])2 ] = E[X2 ] - E[X]2 . Standard Deviation = Variance . Var[kX] = k2 Var[X]. For independent random variables the variances add. The average of n independent, identically distributed variables has a variance of Var[X] / n. Var[X+Y] = Var[X] + Var[Y] + 2Cov[X,Y]. Cov[X,Y] = E[XY] - E[X]E[Y].
Corr[X, Y] = Cov[X ,Y] /
Var[X]Var[Y] .
Sample Mean = ΣXi / N = X . The sample variance is an unbiased estimator of the variance of the distribution from which a data set
∑ (Xi was drawn: Sample variance ≡
- X )2
N - 1
.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 875
Coefficient of Variation and Skewness (Section 3): Coefficient of Variation (CV) = Standard Deviation / Mean. 1 + CV2 = E[X2 ] / E2 [X] = 2nd moment divided by the square of the mean Average of X3 = 3rd moment about the origin = E[X3 ]. Third Central Moment = E[(X - E[X])3 ] = E[X3 ] - 3 E[X] E[X2 ] + 2 E[X]3 .
Skewness = γ1 =
Kurtosis =
E[(X - E[X]) 3] . A symmetric distribution has zero skewness. STDDEV 3
E[(X - E[X])4 ] E[X4] - 4 E[X] E[X3] + 6 E[X]2 E[X2] - 3 E[X]4 = Variance2 Variance2
.
When computing the empirical coefficient of variation, skewness, or kurtosis, we use the biased estimate of the variance, with n in the denominator, rather than the sample variance.
Empirical Distribution Function (Section 4):
The Empirical Distribution Function at x: (# of losses ≤ x)/(# losses).
The Empirical Distribution Function has mean of F(x) and a variance of: F(x){1 - F(x)}/N.
S(x) = 1 - F(x) = the Survival Function.
Limited Losses (Section 5):
X ∧ L ≡ Minimum of x and L = Limited Loss Variable.
The Limited Expected Value at L = E[X ∧ L] = E[Minimum[L, x]].
E[X ∧ L] = ∫_0^L x f(x) dx + L S(L) = contribution of small losses + contribution of large losses.
mean = E[X ∧ ∞]. E[X ∧ x] ≤ x. E[X ∧ x] ≤ mean.
Losses Eliminated (Section 6): N = the total number of accidents or loss events. d
Losses Eliminated by a deductible of size d = N
∫
x f(x) dx + N d S(d) = N E[X
∧
d].
0
Loss Elimination Ratio (LER) =
LER(x) =
Losses Eliminated by a deductible of size d . Total Losses
E[X∧ x] . E[X]
Excess Losses (Section 7): (X - d)+ ≡ 0 when X ≤ d, X - d when X > d ⇔ left censored and shifted variable at d ⇔ amounts paid to insured with a deductible of d. Excess Ratio = R(x) = (Losses Excess of x) / (total losses) = E[(X - d)+]/E[X]. R(x) = 1 - LER(x) = 1 - { E[X ∧ x] / mean }.
Total Losses = Limited Losses + Excess Losses: X = (X E[(X - d)+] = E[X] - E[(X
∧
∧
d) + (X - d)+.
d)].
Mean Excess Loss (Section 8): Excess Loss Variable for d ≡ X - d for X > d, undefined for X ≤ d ⇔ the nonzero payments excess of deductible d. Mean Residual Life or Mean Excess Loss = e(x) = the average dollars of loss above x on losses of size exceeding x. e(x) =
E[X] − E[X ∧x] . S(x)
e(x) = (average size of those claims of size greater than x) - x. Failure rate, force of mortality, or hazard rate = h(x) = f(x)/S(x) = - d ln(S(x)) / dx .
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 877
Layers of Loss (Section 9): The percentage of losses in the layer from d to u = u
∫ (x -
d
d) f(x) dx + S(u) (u - d) =
∞
∫x
f(x) dx
E[X ∧ u] − E[X ∧ d] = LER(u) - LER(d) = R(d) - R(u). E[X]
0
Layer Average Severity (LAS) for the layer from d to u = The mean losses in the layer from d to u = E[X ∧ u] - E[X ∧ d] = {LER(u) - LER(d)} E[X] = {R(d) - R(u)} E[X]. Average Size of Losses in an Interval (Section 10): The average size of loss for those losses of size between a and b is: b
∫ x f(x) dx
a
F(b) - F(a)
=
{E[X ∧b] - b S(b)} - {E[X ∧a] - a S(a)} . F(b) - F(a)
Proportional of Total Losses from Losses in the Interval [a, b] is: {E[X ∧b] - b S(b)} - {E[X ∧a] - a S(a)} . E[X] Working with Grouped Data (Section 12): For Grouped Data, if one is given the dollars of loss for claims in each interval, then one can compute E[X
∧ x], LER(x), R(x), and e(x), provided x is an endpoint of an interval.
Uniform Distribution (Section 13): Support: a ≤ x ≤ b
Parameters: None
D. f. : F(x) = (x-a) / (b-a)
P. d. f. : f(x) = 1/ (b-a)
bn + 1 - an + 1 Moments: E[Xn] = (b - a) (n + 1) Mean = (b+a)/2
Variance = (b-a)2 /12
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 878
Statistics of Grouped Data (Section 14): One can estimate moments of Grouped Data by assuming the losses are uniformly distributed on each interval and then weighting together the moments for each interval by the number of claims observed in each interval. Policy Provisions (Section 15): An ordinary deductible is a provision which states that when the loss is less than or equal to the deductible, there is no payment and when the loss exceeds the deductible, the amount paid is the loss less the deductible. The Maximum Covered Loss is the size of loss above which no additional payments are made. A coinsurance factor is the proportion of any loss that is paid by the insurer after any other modifications (such as deductibles or limits) have been applied. A coinsurance is a provision which states that a coinsurance factor is to be applied. The order to operations is: 1. Limit the size of loss to the maximum covered loss. 2. Subtract the deductible. If the result is negative, set the payment equal to zero. 3. Multiply by the coinsurance factor. A policy limit is maximum possible payment on a single claim. Policy Limit = c(u - d). Maximum Covered Loss = u = d + (Policy Limit)/c. With no deductible and no coinsurance, the policy limit ⇔ the maximum covered loss. Under a franchise deductible the insurer pays nothing if the loss is less than the deductible amount, but ignores the deductible if the loss is > the deductible amount. Name ground-up loss
Description Losses prior to the impact of any deductible or maximum covered loss; the full economic value of the loss suffered by the insured regardless of how much the insurer is required to pay in light of any deductible, maximum covered loss, coinsurance, etc.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 879
Truncated Data (Section 16): Ground-up, unlimited losses have distribution function F(x). G(x) is what one would see after the effects of either a deductible or maximum covered loss. Left Truncated ⇔ Truncation from Below at d ⇔ deduct. d & record size of loss when size > d. F(x) - F(d) G(x) = ,x>d 1 - G(x) = S(x) / S(d), x > d S(d) g(x) = f(x) / S(d), x > d
x ⇔ the size of loss.
Truncation & Shifting from Below at d ⇔ deductible d & record non-zero payment ⇔ amount paid per (non-zero) payment. F(x + d) - F(d) , x > 0. g(x) = f(x+d) / S(d), x > 0 S(d) x ⇔ the size of (non-zero) payment. x+d ⇔ the size of loss. G(x) =
When data is truncated from above at the value L, claims of size greater than L are not in the reported data base. G(x) = F(x) / F(L), x ≤ L g(x) = f(x) / F(L), x ≤ L. Censored Data (Section 17): Right Censored ⇔ Censored from Above at u ⇔ Maximum Covered Loss u & donʼt know exact size of loss, when ≥ u. ⎧F(x) x < u G(x) = ⎨ x = u ⎩1 ⎧ f(x) g(x) = ⎨ ⎩point mass of probability S(u)
x < u x = u
The revised Distribution Function and density under censoring from above at u and truncation from below at d is: ⎧F(x) - F(d) ⎪ d < x < u G(x) = ⎨ S(d) ⎪⎩ 1 x = u d < x < u ⎧f(x) / S(d) g(x) = ⎨ ⎩ point mass of probability S(u)/ S(d) x = u
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 880
Left Censored and Shifted at d ⇔ (X - d)+ ⇔ losses excess of d ⇔ 0 when X ≤ d, X - d when X > d ⇔ amounts paid to insured with a deductible of d ⇔ payments per loss, including when the insured is paid nothing due to the deductible ⇔ amount paid per loss. G(0) = F(d) ; G(x) = F(x+d), x > 0.
g(0) point mass of F(d) ; g(x) = f(x+d), x > 0.
Average Sizes (Section 18): Type of Data Ground-up, Total Limits Censored from Above at u Truncated from Below at d Truncated and Shifted from Below at d Left Censored and Shifted
Average Size E[X] E[X ∧ u] e(d) + d = {E[X] - E[X ∧ d]}/S(d) + d e(d) = {E[X] - E[X ∧ d]}/S(d) E[(X - d)+] = E[X] - E[X ∧ d]
Censored from Above at u and Truncated and Shifted from Below at d
{E[X
∧
u] - E[X
∧
d]} / S(d)
With Maximum Covered Loss of u and an (ordinary) deductible of d, the average amount paid by the insurer per loss is: E[X ∧ u] - E[X ∧ d]. With Maximum Covered Loss of u and an (ordinary) deductible of d, the average amount paid by the insurer per non-zero payment to the insured is: E[X ∧ u] - E[X ∧ d] = e(d). S(d) A coinsurance factor of c, multiplies the average payment, either per loss or per non-zero payment by c. Percentiles (Section 19): For a continuous distribution, the 100pth percentile is the first value at which F(x) = p. For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p. Definitions (Section 20): A loss event or claim is an incident in which an insured or group of insureds suffers damages which are potentially covered by their insurance contract. The loss is the dollar amount of damage suffered by an insured or group of insureds as a result of a loss event. The loss may be zero.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 881
A payment event is an incident in which an insured or group of insureds receives a payment as a result of a loss event covered by their insurance contract. The amount paid is the actual dollar amount paid to the policyholder(s) as a result of a loss event or a payment event. If it is as the result of a loss event, the amount paid may be zero. A loss distribution is the probability distribution of either the loss or the amount paid from a loss event or of the amount paid from a payment event. The severity can be either the loss or amount paid random variable. The exposure base is the basic unit of measurement upon which premiums are determined. The frequency is the number of losses or number of payments random variable. Parameters of Distributions (Section 21): For a given type of distribution, in addition to the size of loss x, F(x) depends on what are called parameters. The numerical values of the parameter(s) distinguish among the members of a parametric family of distributions. It is useful to group families of distributions based on how many parameters they have. A scale parameter is a parameter which divides x everywhere it appears in the distribution function. A scale parameter will appear to the nth power in the formula for the nth moment of the distribution. A shape parameter affects the shape of the distribution and appears in the coefficient of variation and the skewness. Exponential Distribution (Section 22): Support: x > 0
Parameters: θ > 0 ( scale parameter)
F(x) = 1 - e-x/θ
f(x) = e-x/θ / θ
Mean = θ
Variance = θ2
Coefficient of Variation = 1.
Skewness = 2.
2nd moment = 2θ2
e(x) = Mean Excess Loss = θ
When an Exponential Distribution is truncated and shifted from below, one gets the same Exponential Distribution, due to its memoryless property.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 882
Single Parameter Pareto Distribution (Section 23): Support: x > θ
Parameters: α > 0 (shape parameter)
⎛θ⎞ α F(x) = 1 - ⎜ ⎟ ⎝x⎠
α θα f(x) = α + 1 x
Mean =
αθ , α > 1. α −1
Common Two Parameter Distributions (Section 24): Pareto: α is a shape parameter and θ is a scale parameter. The Pareto is a heavy-tailed distribution. Higher moments may not exist. ⎛ θ ⎞α α θα θ F(x) = 1 - ⎜ ⎟ = 1 - (1 + x / θ)−α f(x) = Mean = , α > 1. α + 1 ⎝θ + x⎠ α −1 (θ + x) E[Xn ] =
n! θ n , α > n. (α − 1)...(α − n)
Mean Excess Loss =
θ+ x , α > 1. α −1
If losses prior to any deductible follow a Pareto Distribution with parameters α and θ, then after truncating and shifting from below by a deductible of size d, one gets another Pareto Distribution, but with parameters α and θ + d. Gamma: α is a shape parameter and θ is a scale parameter. Note the factors of θ in the moments. For α = 1 you get the Exponential. The sum of n independent identically distributed variables which are Gamma with parameters α and θ is a Gamma distribution with parameters nα and θ. For α = a positive integer, the Gamma distribution is the sum of α independent variables each of which follows an Exponential distribution. x α -1 e - x / θ θ α Γ(α)
F(x) = Γ(α; x/θ)
f(x) =
Mean = αθ
Variance = αθ2
E[Xn ] = θn (α )...(α + n -1) .
The skewness for the Gamma distribution is always twice the coefficient of variation. LogNormal: If ln(x) follows a Normal, then x itself follows a LogNormal. ⎡ ln(x) − µ ⎤ F(x) = Φ⎢ ⎥⎦ f(x) = ⎣ σ Mean = exp[µ + .5 σ2]
[(
exp -
ln(x) − µ)2 2σ2 x σ 2π
]
Second Moment = exp[2µ + 2σ2]
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 883
Weibull: τ is a shape parameter, while θ is a scale parameter. ⎡ ⎛ x ⎞ τ⎤ F(x) = 1 - exp⎢-⎜ ⎟ ⎥ ⎣ ⎝ θ⎠ ⎦
f(x) =
⎛ x ⎞τ τ τ− 1 x exp ⎜ ⎟ ⎝ θ⎠ θτ
[ ]
For τ = 1 you get the Exponential Distribution. Other Two Parameter Distributions (Section 25): Mean = µ
Inverse Gaussian:
⎛x ⎞ F(x) = Φ ⎜ − 1⎟ ⎝µ ⎠
[
θ x
]
Variance = µ3 / θ
⎛x ⎞ + e2θ / µ Φ − ⎜ + 1⎟ ⎝µ ⎠
(x / θ)γ LogLogistic: F(x) = . 1 + (x / θ)γ
[
θ x
]
.
f(x) =
⎛ x ⎞2 θ ⎜ − 1⎟ ⎝µ ⎠ θ exp 2x 1.5 2π x
[
]
.
γ xγ − 1 f(x) = γ . θ (1 + (x / θ) γ)2
Inverse Gamma : If X follows a Gamma Distribution with parameters α and 1, then θ/x follows an Inverse Gamma Distribution with parameters τ = α and θ. α is the shape parameter and θ is the scale parameter. The Inverse Gamma is heavy-tailed. θα e - θ / x F(x) = 1 - Γ(α ; θ/x) f(x) = α + 1 x Γ[α] Mean =
θ , α > 1. α −1
E[Xn ] =
θn , α > n. (α − 1)...(α − n)
Producing Additional Distributions (Section 29): Introduce a scale parameter by "multiplying by a constant". Let G(x) = 1 - F(1/x). One gets the Inverse Gamma from the Gamma. Let G(x) = F(ln(x)). One gets the LogNormal from the Normal by “exponentiating.” Add up independent identical copies. One gets the Gamma from the Exponential. Let G(x) = F(xτ). One gets a Weibull from the Exponential by "raising to a power." One can get a new distribution as a continuous mixture of distributions. The Pareto can be obtained as a mixture of Exponentials via an Inverse Gamma. Another method of getting new distributions is via two-point or n-point mixtures.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 884
Tails of Loss Distributions (Section 30): If S(x) goes to zero slowly as x approaches ∞, this is a "heavy-tailed distribution." The righthand tail is thick. If S(x) goes to zero quickly as x approaches ∞, this is a "light-tailed distribution." The righthand tail is thin. The Pareto Distribution is heavy-tailed. The Exponential distribution is light-tailed. The Pareto Distribution is heavier-tailed than the LogNormal Distribution. The Gamma, Pareto and LogNormal all have positive skewness. Heavier Tailed
Lighter Tailed
f(x) goes to zero more slowly Few Moments exist Larger Coefficient of Variation Higher Skewness e(x) Increases to Infinity Decreasing Hazard Rate
f(x) goes to zero more quickly All (positive) moments exist Smaller Coefficient of Variation Lower Skewness e(x) goes to a constant Increasing Hazard Rate
Here is a list of some loss distributions, arranged in increasing heaviness of the tail: Distribution Mean Excess Loss All Moments Exist Weibull for τ > 1
decreases to zero less quickly than 1/x
Yes
Gamma for α > 1
decreases to a constant
Yes
Exponential
constant
Yes
Gamma for α < 1
increases to a constant
Yes
Inverse Gaussian
increases to a constant
Yes
Weibull for τ < 1
increases to infinity less than linearly
Yes
LogNormal Pareto
increases to infinity just less than linearly increases to infinity linearly
Yes No
Let f(x) and g(x) be the two densities, then if: lim f(x) / g(x) = ∞, f has a heavier tail than g x →∞
lim f(x) / g(x) = 0, f has a lighter tail than g x →∞
lim f(x) / g(x) = positive constant, f has a similar tail to g. x →∞
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 885
Limited Expected Values (Section 31): E[X ∧ x] =
x
∫ t f(t) dt + xS(x).
0
Rather than calculating this integral, make use of Appendix A of Loss Models, which has formulas for the limited expected value for each distribution. mean = E[X ∧ infinity].
e(x) = { mean - E[X ∧ x] } / S(x). LER(x) = E[X ∧ x] / mean. Layer Average Severity = E[X ∧ top of Layer] - E[X ∧ bottom of layer]. Expected Losses Excess of d: E[(X - d)+] = E[X] - E[X ∧ d]. Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then the average payment per non-zero payment by the insurer is: E[X ∧ u] - E[X ∧ d] c . S(d) Given Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then the insurerʼs average payment per loss to the insured is: c (E[X ∧ u] - E[X ∧ d]). E[X ∧ x] =
∞
x
∫
S(t) dt .
E[X] =
0
∫
S(t) dt .
0
The Losses in a Layer can be written as an integral of the Survival Function from the bottom of the Layer to the top of the Layer: E[X ∧ b] - E[X ∧ a] =
b
∫ S(t) dt . a
The expected amount by which losses are less than d is: E[(d - X)+ ] = d - E[X ∧ d]. E[Max[X, a]] = a + E[X] - E[X
∧
E[Min[Max[X , a] , b]] = a + E[X
a].
∧
b] - E[X
∧
a].
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 886
Limited Higher Moments (Section 32): E[(X ∧ u)2 ] =
u
∫ t2 f(t) dt + S(u) u2
0
The second moment of the average payment per loss under a Maximum Covered Loss u and a deductible of d = the second moment of the layer from d to u is: E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d {E[X ∧ u] - E[X ∧ d]}. Given a deductible of d and a Maximum Covered Loss u, the second moment of the non-zero payments is : (2nd moment of the payments per loss)/S(d). If one has a coinsurance factor of c, then each payment is multiplied by c, therefore the second moment and the variance are each multiplied by c2 . Mean Excess Loss (Mean Residual Life) (Section 33): ∞
∞
x
x
∫ (t - x) f(t) dt
e(x) = E[X - x | X > x] =
S(x)
∫ t f(t) dt
=
S(x)
∞
- x = ∫ S(t) dt / S(x) . x
e(x) = { mean - E[X ∧ x] } / S(x). e(d) = average payment per payment with a deductible d. It should be noted that for heavier-tailed distributions, just as with the mean, the Mean Excess Loss only exists for certain values of the parameters. Otherwise it is infinite. Distribution
Behavior of e(x) as x→∞
Exponential
constant
Pareto
increases linearly
LogNormal
increases to infinity less than linearly
Gamma, α>1
decreases towards a horizontal asymptote
Gamma, α<1
increases towards a horizontal asymptote
Weibull, τ>1
decreases to zero
Weibull, τ<1
increases to infinity less than linearly
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 887
Hazard Rate (Section 34): The Hazard Rate, force of mortality, or failure rate, is defined as: h(x) = f(x)/S(x), x≥0. h(x) = -d ln(S(x)) / dx x
S(x) = exp[-H(x)], where H(x) =
∫ h(t) dt .
0
h(x) defined for x > 0 is a legitimate hazard rate, if and only if h(x) ≥ 0 and the integral of h(x) from 0 to infinity is infinite. For the Exponential, h(x) = 1/θ = constant. 1 . x→∞ h(x)
lim e(x) = lim
x→∞
Loss Elimination Ratios and Excess Ratios (Section 35): Loss Elimination Ratio = LER(x) = E[X ∧ x] / mean = 1 - R(x) Excess Ratio = R(x) = (mean - E[X ∧ x]) / mean = 1 - { E[X ∧ x] / mean } = 1 - LER(x).
LER(x)=
x ∫ 0
S(t) dt E[X]
=
x ∫ 0 ∞
S(t) dt
∫ S(t) dt
.
0
The percent of losses in a layer can be computed either as the difference in Loss Elimination Ratios or the difference of Excess Ratios in the opposite order.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 888
The Effects of Inflation (Section 36): Uniform Inflation ⇔ Every size of loss increases by a factor of 1+r. Under uniform inflation, for a fixed limit the excess ratio increases and for a fixed deductible amount the loss elimination ratio declines. In order to keep up with inflation either the deductible or the limit must be increased at the rate of inflation, rather than being held constant. Under uniform inflation the dollars limited by a fixed limit increase slower than the overall rate of inflation. Under uniform inflation the dollars excess of a fixed limit increase faster than the overall rate of inflation. Limited Losses plus Excess Losses = Total Losses. Common ways to express the amount of inflation: 1. State the total amount of inflation from the earlier year to the later year. 2. Give a constant annual inflation rate. 3. Give the different amounts of inflation during each annual period between the earlier and later year. 4. Give the value of some consumer price index in the earlier and later year. In all cases you want to determine the total inflation factor, (1+r), to get from the earlier year to the later year. The Mean, Mode, Median, and the Standard Deviation are each multiplied by (1+r). The Variance is multiplied by (1+r)2 . The nth moment is multiplied by (1+r)n . The Coefficient of Variation, the Skewness, and the Kurtosis are each unaffected by uniform inflation. Provided the limit keeps up with inflation, the Limited Expected Value, in dollars, is multiplied by the inflation factor. In the later year, the mean excess loss, in dollars, is multiplied by the inflation factor, provided the limit has been adjusted to keep up with inflation. The Loss Elimination Ratio, dimensionless, is unaffected by uniform inflation, provided the deductible has been adjusted to keep up with inflation. In the later year, the Excess Ratio, dimensionless, is unaffected by uniform inflation, provided the limit has been adjusted to keep up with inflation. Most of the size of loss distributions are scale families; under uniform inflation one gets the same type of distribution. If there is a scale parameter, it is revised by the inflation factor. For the Pareto, Single Parameter Pareto, Gamma, Weibull, and Exponential Distributions, θ becomes θ(1+r). Under uniform inflation for the LogNormal, µ becomes µ + ln(1+r).
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 889
For distributions in general, one can determine the behavior under uniform inflation as follows. One makes the change of variables Z = (1+r) X. For the Distribution Function one just sets FZ(z) = FX(x); one substitutes for x = z / (1+r). Alternately, for the density function fZ(z) = fX(x) / (1+r). The domain [a, b] becomes under uniform inflation [(1+r)a, (1+r)b]. The uniform distribution on [a, b] becomes under uniform inflation the uniform distribution on [a(1+r), b(1+r)]. There are two alternative ways to solve many problems involving uniform inflation: 1. Adjust the size of loss distribution in the earlier year to the later year based on the amount of inflation. Then calculate the quantity of interest in the later year. 2. Calculate the quantity of interest in the earlier year at its deflated value, and then adjust it to the later year for the effects of inflation. Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the insurerʼs average payment per loss in the later year is:
(1+ r) c { E[X ∧
u d ] - E[X ∧ ] }. 1+ r 1+ r
Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the average payment per (non-zero) payment by the insurer in the later year is:
(1+r) c
E[X ∧
u d ] - E[X ∧ ] 1+ r 1+ r . ⎛ d ⎞ S⎜ ⎟ ⎝ 1+ r ⎠
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 890
Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the second moment of the insurerʼs payment per loss in the later year is: (1+r)2 c2 { E[(X ∧
u 2 d 2 d u d ) ] - E[(X ∧ ) ]-2 ( E[X ∧ ] - E[X ∧ ])}. 1+r 1+ r 1+ r 1+ r 1+ r
Given uniform inflation, with inflation factor of 1+r, Deductible Amount d, Maximum Covered Loss u, and coinsurance factor c, then in terms of the values in the earlier year, the average payment per (non-zero) payment by the insurer in the later year is:
(1+r)2 c2
E[(X ∧
u 2 d 2 d u d ) ] - E[(X ∧ ) ] - 2 {E[X ∧ ] - E[X ∧ ]} 1+r 1+ r 1+r 1+r 1+r . ⎛ d ⎞ S ⎝ 1+r ⎠
If one has a mixed distribution, then under uniform inflation each of the component distributions acts as it would under uniform inflation. Lee Diagrams (Section 37): Put the size of loss on the y-axis and probability on the x-axis. The mean is the area under the curve. Layers of loss correspond to horizontal strips. Restricting attention to only certain sizes of loss corresponds to vertical strips.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 891
N-Point Mixtures of Models (Section 38): Mixing models is a technique that provides a greater variety of loss distributions. One can take a weighted average of any two Distribution Functions: G(x) = pA(x) + (1-p)B(x). This is called a 2-point mixture of models. The Distribution Function of the mixture is the mixture of the Distribution Functions. The Survival Function of the mixture is the mixture of the Survival Functions. The density of the mixture is the mixture of the densities.
The mean of the mixture is the mixture of the means. The moment of the mixture is the mixture of the moments: E H [Xn ] = p EA [Xn ] + (1-p) EB [Xn ]. Limited Moments of the mixed distribution are the weighted average of the limited moments of the individual distributions: EH[X ∧ x] = p EA[X ∧ x] + (1-p) EB[X ∧ x]. In general, one can weight together any number of distributions, rather than just two. These are called n-point mixtures. Sometimes the mixture of models is just a mathematical device with no physical significance. However, it can also be useful when the data results from different perils. Variable Mixture ⇔ weighted average of unknown # of distributions of the same family but differing parameters ⇔ F(x) = Σwi Fi(x), with each Fi of the same family & Σwi = 1.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 892
Continuous Mixtures of Models (Section 39): Mixture Distribution ⇔ Continuous Mixture of Models. One takes a mixture of the density functions for specific values of the parameter ζ via some mixing distribution u: g(x) =
∫ f(x; ζ) u(ζ) dζ .
The nth moment of a mixed distribution is the mixture of the nth moments for specific values of the parameter ζ: E[Xn ] = Eζ[E[Xn | ζ]]. If the severity is Exponential and the mixing distribution of their means is Inverse Gamma, then the mixed distribution is a Pareto, with α = shape parameter of the Inverse Gamma and θ = scale parameter of the Inverse Gamma. If the hazard rate of the Exponential, λ, is distributed via a Gamma(α, θ), then the mean 1/λ is distributed via an Inverse Gamma(α, 1/θ), and therefore the mixed distribution is Pareto. If the Gamma has parameters α and θ, then the mixed Pareto has parameters α and 1/θ. If the severity is Normal with fixed variance s2 , and the mixing distribution of their means is also Normal with mean µ and variance σ2, then the mixed distribution is another Normal, with mean µ and variance: s2 + σ2. In a Frailty Model, the hazard rate is of the form: h(x | λ) = λ a(x), where λ is a parameter which varies across the portfolio, and a(x) is some function of x. x
Let A(x) = ∫ a(t) dt . 0
Then S(x) = Mλ[-A(x)]. For an Exponential Distribution: a(x) = 1, and A(x) = x. For a Weibull Distribution: λ = θ−τ, a(x) = τxτ−1, and A(x) = xτ.
2013-4-2,
Loss Distributions, §43 Important Ideas
HCM 10/8/12,
Page 893
Spliced Models (Section 40): A 2-component spliced model has: f(x) = w1 f1 (x) on (a1 , b1 ) and f(x) = w2 f2 (x) on (a2 , b2 ), where f1 (x) is a density with support (a1 , b1 ), f2 (x) is a density with support (a2 , b2 ), and w1 + w2 = 1. A 2-component spliced density will be continuous at the breakpoint b, provided the weights are inversely proportional to the component densities at the breakpoint: w1 = f2 (b) / {f1 (b) + f2 (b)}, w2 = f1 (b) / {f1 (b) + f2 (b)}. Relationship to Life Contingencies (Section 41): y-xp x ≡
Prob[Survival past y | Survival past x] = S(y)/S(x).
y-xq x ≡
Prob[Not Surviving past y | Survival past x] = {S(x) - S(y)}/S(x) = 1 - y-xp x.
p x ≡ 1 p x = Prob[Survival past x+1 | Survival past x] = S(x+1)/S(x). qx ≡ 1 qx = Prob[Death within one year | Survival past x] = 1 - S(x+1)/S(x). t|uq x
≡ Prob[x+t < time of death ≤ x+t+u | Survival past x] = {S(x+t) - S(x+t+u)}/S(x).
Mahlerʼs Guide to
Aggregate Distributions Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-3 Howard Mahler
[email protected] www.howardmahler.com/Teaching
2013-4-3,
Aggregate Distributions,
HCM 10/23/12,
Page 1
Mahlerʼs Guide to Aggregate Distributions Copyright 2013 by Howard C. Mahler. The Aggregate Distribution concepts in Loss Models are demonstrated. Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (and sections whose titles are in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Highly Recommended problems are double underlined. Recommended problems are underlined. Solutions to problems are given at the end of each section.1 Section #
Pages
1 2 3 4 5 6 7 8 9 10 11 12
3-28 29-59 60-74 75-108 109-198 199-210 211-242 243-252 253-274 275-286 287-323 324-329
A
B
C D
1
Section Name Introduction Convolutions Using Convolutions Generating Functions
Moments of Aggregate Losses Individual Risk Model Recursive Method / Panjer Algorithm Recursive Method / Panjer Algorithm, Advanced Discretization Analytic Results Stop Loss Premiums Important Formulas & Ideas
Note that problems include both some written by me and some from past exams. The latter are copyright by the CAS and/or SOA and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS/SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In some cases Iʼve rewritten past exam questions in order to match the notation in the current Syllabus. In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam questions, but it will not be specifically tested.
2013-4-3,
Aggregate Distributions,
HCM 10/23/12,
Page 2
Past Exam Questions by Section of this Study Aid2

[The chart here in the original lists, for each section of this study aid, the relevant past exam questions from the Course 3 Sample Exam, the joint Course 3 exams of 5/00 through 11/02, and the separate CAS 3 and SOA 3/M exams of 11/03 through 5/07; its tabular layout could not be reproduced.]

The CAS/SOA did not release the 5/02 and 5/03 exams. The SOA did not release its 5/04 and 5/06 exams. From 5/00 to 5/03, the Course 3 Exam was jointly administered by the CAS and SOA. Starting in 11/03, the CAS and SOA gave separate exams. The CAS/SOA did not release the 11/07 and subsequent Exam 4/C exams.

2 Excluding any questions that are no longer on the syllabus.
Section 1, Introduction

Aggregate losses are the total dollars of loss. Aggregate losses are the product of: the number of exposures, frequency per exposure, and severity.

Aggregate Losses = (# of Exposures) (# of Claims / # of Exposures) ($ of Loss / # of Claims) = (Exposures) (Frequency) (Severity).

If one is not given the frequency per exposure, but is rather just given the frequency for the whole number of exposures,3 whatever they are for the particular situation, then Aggregate Losses = (Frequency) (Severity).

Definitions:

The Aggregate Loss is the total dollars of loss for an insured or set of insureds. If not stated otherwise, the period of time is one year. For example, during 1999 the MT Trucking Company may have had $152,000 in aggregate losses on its commercial automobile collision insurance policy. All of the trucking firms insured by the Fly-by-Night Insurance Company may have had $16.1 million in aggregate losses for collision.

The dollars of aggregate losses are determined by how many losses there are and the severity of each one.

Exercise: During 1998 MT Trucking suffered three collision losses for $8,000, $13,500, and $22,000. What are its aggregate losses?
[Solution: $8,000 + $13,500 + $22,000 = $43,500.]

The Aggregate Payment is the total dollars paid by an insurer on an insurance policy or set of insurance policies. If not stated otherwise, the period of time is one year.

Exercise: During 1998 MT Trucking suffered three collision losses for $8,000, $13,500, and $22,000. MT Trucking has a $10,000 per claim deductible on its policy with the Fly-by-Night Insurance Company. What are the aggregate payments by Fly-by-Night?
[Solution: $0 + $3,500 + $12,000 = $15,500.]
3 For example, the expected number of claims from a large commercial insured is 27.3 per year, or the number of Homeownerʼs claims expected by XYZ Insurer in the State of Florigia is 12,310.
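The two MT Trucking exercises above can be checked with a few lines of Python; this sketch just sums the given losses and applies the per-claim deductible loss by loss.

# Sketch of the MT Trucking exercises: aggregate losses are the sum of the
# individual losses; aggregate payments apply the per-claim deductible to each loss.
losses = [8_000, 13_500, 22_000]   # 1998 collision losses
deductible = 10_000                # per claim deductible

aggregate_losses = sum(losses)
aggregate_payments = sum(max(x - deductible, 0) for x in losses)

print(aggregate_losses)    # 43500
print(aggregate_payments)  # 15500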
Loss Models uses many different terms for each of the important concepts: aggregate losses, frequency and severity.4
aggregate losses ⇔ aggregate loss random variable ⇔ total loss random variable ⇔ aggregate payments ⇔ total payments ⇔ S.
frequency ⇔ frequency distribution ⇔ number of claims ⇔ claim count distribution ⇔ claim count random variable ⇔ N.
severity ⇔ severity distribution ⇔ single loss random variable ⇔ individual loss random variable ⇔ loss random variable ⇔ X.

Collective Risk Model:

There are two different types of risk models discussed in Loss Models in order to calculate aggregate losses or payments. The collective risk model adds up the individual losses.5 Frequency is independent of severity and the sizes of loss are independent, identically distributed variables. Exam questions almost always involve the collective risk model.
1. Conditional on the number of losses, the sizes of loss are independent, identically distributed variables.
2. The size of loss distribution is independent of the number of losses.
3. The distribution of the number of claims is independent of the sizes of loss.

For example, one might look at the aggregate losses incurred this year by Few States Insurance Company on all of its Private Passenger Automobile Bodily Injury Liability policies in the State of West Carolina. Under a collective risk model one might model the number of losses via a Negative Binomial and the size of loss via a Weibull Distribution. In such a model one is not modeling what happens on each individual policy.6

If we have 1000 independent, identical policies, then the mean of the sum of the aggregate losses is 1000 times the mean of the aggregate loss for one policy. Likewise, the variance of the sum of the aggregate losses is 1000 times the variance of the aggregate loss for one policy.

4 This does not seem to add any value for the reader.
5 See Definition 9.1 in Loss Models.
6 In any given year, almost all of these policies would have no Bodily Injury Liability claim.
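The collective risk model described above is easy to simulate. The following Python sketch uses a Negative Binomial frequency and a Weibull severity, as in the Few States example; the particular parameter values are assumptions chosen only for illustration. Note that numpyʼs parameterizations differ from those in Loss Models, so the code translates them: negative_binomial(r, 1/(1+β)) has mean rβ, and θ times a standard Weibull(τ) draw has the Loss Models Weibull distribution.

# Simulation sketch of a collective risk model: Negative Binomial claim counts,
# Weibull claim sizes, frequency independent of severity.
# Parameter values are illustrative only, not from the text.
import numpy as np

rng = np.random.default_rng(seed=1)

r, beta = 3, 4.0            # Negative Binomial: mean = r * beta = 12
tau, theta = 0.5, 50_000.0  # Weibull shape and scale (Loss Models parameterization)

def simulate_aggregate(n_years: int) -> np.ndarray:
    """One simulated aggregate loss per year."""
    counts = rng.negative_binomial(r, 1.0 / (1.0 + beta), size=n_years)
    totals = np.empty(n_years)
    for i, n in enumerate(counts):
        # numpy's weibull(a) is the standard Weibull; multiply by theta to rescale.
        totals[i] = (theta * rng.weibull(tau, size=n)).sum()
    return totals

agg = simulate_aggregate(100_000)
# Weibull mean = theta * Gamma(1 + 1/tau) = 2 * theta = 100,000,
# so the simulated mean should be roughly 12 * 100,000 = 1,200,000.
print(agg.mean())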
Individual Risk Model:

In contrast, the individual risk model adds up the amount paid on each insurance policy.7 The amounts paid on the different insurance policies are assumed to be independent of each other.8 For example, we have 10 life insurance policies with a death benefit of $100,000, and 5 with a death benefit of $250,000. Each policy could have a different mortality rate. Then one adds the modeled payments on these policies. This is an example of an individual risk model.

Advantages of Analyzing Frequency and Severity Separately:9

Loss Models lists seven advantages of separately analyzing frequency and severity, which allow a more accurate and flexible model of aggregate losses:
1. The number of claims changes as the volume of business changes. Aggregate frequency may change due to changes in exposures. For example, if exposures are in car years, the expected frequency per car year might stay the same, while the number of car years insured increases somewhat. Then the expected total frequency would increase. For example, if the frequency per car year were 3%, and the insured car years increased from 100,000 to 110,000, then the expected number of losses would increase from 3000 to 3300.10
2. The effects of inflation can be incorporated.
3. One can adjust the severity distribution for changes in deductibles, maximum covered loss, etc.

7 See Definition 9.2 in Loss Models.
8 Unless specifically told to, do not assume that the amounts of loss on the different policies are identically distributed. For example, the different policies might represent the different employees under a group life or health contract. Each employee might have different amounts of coverage and/or frequencies.
9 See page 201 of Loss Models.
10 In spite of what Loss Models says, there is no significant advantage to looking at frequency and severity separately in this case. One could just multiply expected aggregate losses by 10% in this example. Nor is this example likely to be a big concern financially, if as usual the premiums collected increase in proportion to the increase in the number of car years. Nor is this situation relatively hard to keep track of and/or predict. It should be noted that insurers make significant efforts to keep track of the volume of business they are insuring. In most cases it is directly related to collecting appropriate premiums. Of far greater concern is when the expected frequency per car year changes significantly. For example, if the expected frequency per car year went from 3.0% to 3.3%, this would also increase the expected total number of losses, but without any automatic increase in premiums collected. This might have occurred when speed limits were increased from 55 m.p.h. or when lawyers were first allowed to advertise on T.V. Being able to separately adjust historical frequency and severity for the expected impacts of such changes would be an advantage.
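Here is a sketch of the individual risk model example above: ten policies with a $100,000 death benefit and five with a $250,000 death benefit, each policy with its own mortality rate. The mortality rates used here are hypothetical, chosen only so the sketch runs.

# Individual risk model sketch: sum the modeled payment on each policy separately.
# Benefits follow the example in the text; the mortality rates are assumptions.
import numpy as np

rng = np.random.default_rng(seed=7)

benefits = np.array([100_000.0] * 10 + [250_000.0] * 5)
q = np.array([0.01] * 10 + [0.02] * 5)   # assumed one-year mortality rates

# Exact mean and variance of the aggregate payment, summing over independent policies.
mean_agg = (benefits * q).sum()
var_agg = (benefits**2 * q * (1.0 - q)).sum()
print(mean_agg, var_agg)

# Simulation of one year: each policy pays its full benefit if that life dies.
deaths = rng.random(benefits.size) < q
print(benefits[deaths].sum())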
4. One can adjust frequency for changes in deductibles.
5. One can appropriately combine data from policies with different deductibles and maximum covered losses into a single severity distribution.
6. One can create consistent models for the insurer, insured, and reinsurer.
7. One can analyze the tail of the aggregate losses by separately analyzing the tails of the frequency and severity.11

A separate analysis allows an actuary to estimate the parameters of the frequency and severity from separate sources of information.12

Deductibles and Maximum Covered Losses:13

Just as when dealing with loss distributions, aggregate losses may represent somewhat different mathematical and real world quantities. They may relate to the total economic loss, i.e., no deductible and no maximum covered loss. They may relate to the amount paid by an insurer after deductibles and/or maximum covered losses.14 They may relate to the amount paid by the insured due to a deductible and/or other policy modifications. They may relate to the amount the insurer pays net of reinsurance. They may relate to the amount paid by a reinsurer.

In order to get the aggregate losses in these different situations, one has to adjust the severity distribution and then add up the number of payments or losses.15 One can either add up the non-zero payments or one can add up all the payments, including zero payments. One needs to be careful to use the corresponding frequency and severity distributions.

For example, assume that the number of losses is Poisson with λ = 3, and severity is Exponential with θ = 2000. Frequency and severity are independent.

11 See "Mahlerʼs Guide to Frequency Distributions" and "Mahler's Guide to Loss Distributions." For most casualty lines, the tail behavior of the aggregate losses is determined by that of the severity distribution rather than the frequency distribution.
12 This can be either an advantage or a disadvantage. Using inconsistent data sources or models may produce a nonsensical estimate of aggregate losses.
13 See Section 8.6 of Loss Models and "Mahlerʼs Guide to Loss Distributions."
14 Of course in practical applications, we may have a combination of different deductibles, maximum covered losses, coinsurance clauses, reinsurance agreements, etc. However, the concept is still the same.
15 See "Mahlerʼs Guide to Loss Distributions." Policy provisions that deal directly with the aggregate losses, such as the aggregate deductible for stop loss insurance to be discussed subsequently, must be applied to the aggregate losses at the end.
If there is a deductible of 1000, then the (non-zero) payments are also Exponential with θ = 2000.16 S(1000) = e^(-1000/2000) = 60.65%; this is the proportion of losses that result in (non-zero) payments. Therefore the aggregate payments can be modeled as either:
1. Number of (non-zero) payments is Poisson with λ = (0.6065)(3) = 1.8195, and the size of (non-zero) payments is also Exponential with θ = 2000.17
2. Number of losses is Poisson with λ = 3, and the size of payments per loss is a two-point mixture, with a 39.35% chance of zero and a 60.65% chance of an Exponential with θ = 2000.
Average aggregate payments are: (1.8195)(2000) = 3639 = (3){(0)(0.3935) + (2000)(0.6065)}.

Exercise: Frequency is Negative Binomial with β = 4 and r = 3. Severity is Pareto with α = 2 and θ = 10,000. Frequency and severity are independent.
What are the average aggregate losses?
The insurer buys $25,000 per claim reinsurance; the reinsurer will pay the portion of each claim greater than $25,000. What are the insurerʼs aggregate annual losses after reinsurance?
How would one model the insurerʼs aggregate losses after reinsurance?
[Solution: Average frequency = (4)(3) = 12. Average severity = 10000/(2-1) = 10,000.
Average aggregate losses (prior to reinsurance) = (12)(10,000) = 120,000.
After reinsurance, the average severity is: E[X ∧ 25000] = {(10000)/(2-1)}{1 - (1 + 25/10)^-(2-1)} = 7143.
Average aggregate losses, after reinsurance = (12)(7143) = 85,716.
After reinsurance, the frequency distribution is the same, while the severity distribution is censored at 25,000: G(x) = 1 - {1 + (x/10000)}^-2, for x < 25000; G(25,000) = 1.]

16 Due to the memoryless property of the Exponential Distribution. See "Mahlerʼs Guide to Loss Distributions."
17 When one thins a Poisson, one gets another Poisson. See "Mahlerʼs Guide to Frequency Distributions."
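A quick numerical check of the two equivalent models above, for the Poisson λ = 3, Exponential θ = 2000 example with a 1000 deductible:

# Check of the example above: the expected aggregate payment is the same whether
# we count only non-zero payments or count all losses with a mixed payment per loss.
import math

lam, theta, d = 3.0, 2000.0, 1000.0
s_d = math.exp(-d / theta)     # S(1000): probability a loss pierces the deductible

# Model 1: non-zero payments only; by the memoryless property they are Exponential(theta).
mean_1 = (lam * s_d) * theta

# Model 2: all losses; payment per loss is 0 with prob F(d), Exponential(theta) with prob S(d).
mean_2 = lam * (0.0 * (1.0 - s_d) + theta * s_d)

print(mean_1, mean_2)   # both about 3639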
Exercise: Frequency is Negative Binomial with β = 4 and r = 3. Severity is Pareto with α = 2 and θ = 10,000. Frequency and severity are independent. The insurer buys $25,000 per claim reinsurance.
What is the average aggregate loss for the reinsurer? How would one model the reinsurerʼs aggregate losses?
[Solution: From the previous exercise, the reinsurer pays on average 120,000 - 85,716 = 34,284.
Alternately, the distribution function of the Pareto at $25,000 is: 1 - (1 + 2.5)^-2 = 0.918367. Thus only 8.1633% of the insurerʼs losses lead to a non-zero payment by the reinsurer. Thus the reinsurer sees a frequency distribution for non-zero payments which is Negative Binomial with β = (4)(8.1633%) = 0.32653, r = 3, and mean (0.32653)(3) = 0.97959.18
The severity distribution for non-zero payments by the reinsurer is truncated and shifted from below at 25,000:
G(x) = {F(x + 25000) - F(25000)}/S(25000) = {(1 + 2.5)^-2 - (1 + (x + 25000)/10000)^-2} / (1 + 2.5)^-2
= 1 - (3.5^2){3.5 + (x/10000)}^-2 = 1 - {1 + (x/35,000)}^-2.
Thus the distribution of the reinsurerʼs non-zero payments is Pareto with α = 2 and θ = 35,000, and mean 35000/(2-1) = 35,000.19
Thus the reinsurerʼs expected aggregate losses are: (0.97959)(35,000) = 34,286.
Alternately, we can model the reinsurer including its zero payments. In that case, the frequency distribution is the original Negative Binomial with β = 4 and r = 3. The severity would be a 91.8367% and 8.1633% mixture of zero and a Pareto with α = 2 and θ = 35,000; G(0) = 0.918367, and G(x) = 1 - (0.081633){1 + (x/35000)}^-2, for x > 0.]

Model Choices:

Severity distributions that are members of scale families have the advantage that they are easy to adjust for inflation and/or changes in currency.20 Infinitely divisible frequency distributions have the advantage that they are easy to adjust for changes in level of exposures and/or time period.21 Loss Models therefore recommends the use of infinitely divisible frequency distributions, unless there is a specific reason not to do so.22

18 If one thins a Negative Binomial variable one gets the same form, but with β multiplied by the thinning factor. See "Mahlerʼs Guide to Frequency Distributions."
19 The mean excess loss for a Pareto is e(x) = (θ + x)/(α - 1). e(25000) = (10,000 + 25,000)/(2 - 1) = 35,000. See "Mahlerʼs Guide to Loss Distributions."
20 See "Mahlerʼs Guide to Loss Distributions." If X is a member of a scale family, then for any c > 0, cX is also a member of that family.
21 See "Mahlerʼs Guide to Frequency Distributions." Infinitely divisible frequency distributions include the Poisson and Negative Binomial. Compound distributions with a primary distribution that is infinitely divisible are also infinitely divisible.
22 For example, the Binomial may be appropriate when there is some maximum possible number of claims.
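A numerical check of the two reinsurance exercises above; small differences in the last digits versus the text are due to rounding in the text.

# Check of the reinsurance exercises: Negative Binomial frequency (r = 3, beta = 4),
# Pareto severity (alpha = 2, theta = 10,000), $25,000 per-claim reinsurance.
alpha, theta = 2.0, 10_000.0
mean_freq = 3 * 4                      # r * beta = 12
d = 25_000.0

mean_x = theta / (alpha - 1.0)                                                     # E[X] = 10,000
lev_d = (theta / (alpha - 1.0)) * (1.0 - (theta / (theta + d)) ** (alpha - 1.0))   # E[X ^ 25,000]

insurer = mean_freq * lev_d                 # about 85,714
reinsurer = mean_freq * (mean_x - lev_d)    # about 34,286

# Alternately, thin the frequency to the claims piercing 25,000 and use the mean excess loss.
s_d = (1.0 + d / theta) ** -alpha           # S(25,000) = 0.081633
reinsurer_alt = (mean_freq * s_d) * ((theta + d) / (alpha - 1.0))   # 0.97959 * 35,000

print(insurer, reinsurer, reinsurer_alt)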
Problems:

1.1 (1 point) According to Loss Models, which of the following are advantages of the separation of the aggregate loss process into frequency and severity?
1. Allows the actuary to estimate the parameters of the frequency and severity from separate sources of information.
2. Allows the actuary to adjust for inflation.
3. Allows the actuary to adjust for the effects of deductibles.
A. 1, 2  B. 1, 3  C. 2, 3  D. 1, 2, 3  E. None of A, B, C, or D.

Use the following information for the next two questions:
• Frequency is Poisson with λ = 130, prior to the effect of any deductible.
• Loss amounts have a LogNormal Distribution with µ = 6.5 and σ = 1.3, prior to the effect of any deductible.
• Frequency and loss amounts are independent.

1.2 (1 point) Calculate the expected aggregate amount paid by the insurer.
(A) 160,000  (B) 170,000  (C) 180,000  (D) 190,000  (E) 200,000

1.3 (3 points) Calculate the expected aggregate amount paid by the insurer, if there is a deductible of 500 per loss.
(A) 140,000  (B) 150,000  (C) 160,000  (D) 170,000  (E) 180,000

1.4 (1 point) According to Loss Models, which of the following are advantages of the use of infinitely divisible frequency distributions?
1. The type of distribution selected does not depend on whether one is working with months or years.
2. Allows the actuary to retain the same type of distribution after adjusting for inflation.
3. Allows the actuary to adjust for changes in exposure levels while using the same type of distribution.
A. 1, 2  B. 1, 3  C. 2, 3  D. 1, 2, 3  E. None of A, B, C, or D.
1.5 (1 point) Aggregate losses for a portfolio of policies are modeled as follows:
(i) The number of losses before any coverage modifications follows a distribution with mean 30.
(ii) The severity of each loss before any coverage modifications is uniformly distributed between 0 and 1000.
The insurer would like to model the impact of imposing an ordinary deductible of 100 on each loss and reimbursing only 80% of each loss in excess of the deductible. It is assumed that the coverage modifications will not affect the loss distribution. The insurer models its claims with modified frequency and severity distributions. The modified claim amount is uniformly distributed on the interval [0, 720].
Determine the mean of the modified frequency distribution.
(A) 3  (B) 21.6  (C) 24  (D) 27  (E) 30

1.6 (3 points) The amounts of loss have a Pareto Distribution with α = 4 and θ = 3000, prior to any maximum covered loss or deductible. Frequency is Negative Binomial with r = 32 and β = 0.5, prior to any maximum covered loss or deductible. If there is a 1000 deductible and 5000 maximum covered loss, what is the expected aggregate amount paid by the insurer?
(A) 6000  (B) 6500  (C) 7000  (D) 7500  (E) 8000

1.7 (3 points) The Boxborough Box Company owns three factories. It buys insurance to protect itself against major repair costs. Profit = 45 less the sum of insurance premiums and retained major repair costs. The Boxborough Box Company will pay a dividend equal to half of the profit, if it is positive.
You are given:
(i) Major repair costs at the factories are independent.
(ii) The distribution of major repair costs for each factory is:
k      Prob(k)
0      0.6
20     0.3
50     0.1
(iii) At each factory, the insurance policy pays the major repair costs in excess of that factoryʼs ordinary deductible of 10.
(iv) The insurance premium is 25.
Calculate the expected dividend.
(A) 3.9  (B) 4.0  (C) 4.1  (D) 4.2  (E) 4.3
1.8 (2 points) Lucky Tom always gives money to one beggar on his walk to work and money to another beggar on his walk home from work. There is a 3/4 chance he gives a beggar $1 and a 1/4 chance he gives a beggar $10. However, 1/8 of the time Tom has to stay late at work and takes a cab home rather than walk. What is the 90th percentile of the amount of money Tom gives away on a work day?
A. 1  B. 2  C. 10  D. 11  E. 20

1.9 (2 points) A restaurant has tables that seat two people. 30% of the time one person eats at a table, while 70% of the time two people eat at a table. After eating their meal, tips are left. 50% of the time the tip at a table is $1 per person, 40% of the time the tip at a table is $2 per person, and 10% of the time the tip at a table is $3 per person. What is the 70th percentile of the distribution of tips left at a table?
A. 2  B. 3  C. 4  D. 5  E. 6

1.10 (2 points) A coffee shop has tables that seat two people. 30% of the time one person sits at a table, while 70% of the time two people sit at a table. There are a variety of beverages which cost either $1, $2, or $3. Each person buys a beverage, independently of anyone else. 50% of the time the beverage costs $1, 40% of the time the beverage costs $2, and 10% of the time the beverage costs $3. Determine the probability that the total cost of beverages at a table is either $2, $3, or $4.
A. 78%  B. 79%  C. 80%  D. 81%  E. 82%

Use the following information for the next four questions:
The losses for the Mockingbird Tequila Company have a Poisson frequency distribution with λ = 5 and a Weibull severity distribution with τ = 1/2 and θ = 50,000. The Mockingbird Tequila Company buys insurance from the Atticus Insurance Company, with a deductible of $5000, maximum covered loss of $250,000, and coinsurance factor of 90%. The Atticus Insurance Company buys reinsurance from the Finch Reinsurance Company. Finch will pay Atticus for the portion of any payment in excess of $100,000.

1.11 (3 points) Construct a model for the aggregate payments retained by the Mockingbird Tequila Company.
1.12 (3 points) Construct a model for the aggregate payments made by Atticus Insurance Company to the Mockingbird Tequila Company, prior to the impact of reinsurance.
1.13 (3 points) Construct a model for the aggregate payments made by the Finch Reinsurance Company to the Atticus Insurance Company.
1.14 (3 points) Construct a model for the aggregate payments made by the Atticus Insurance Company to the Mockingbird Tequila Company net of reinsurance.
1.15 (2 points) The number of claims is Binomial with m = 2 and q = 0.1. The size of claims is Normal with µ = 1500 and σ = 400. Frequency and severity are independent. Determine the probability that the aggregate loss is greater than 2000.
A. Less than 1.0%
B. At least 1.0%, but less than 1.5%
C. At least 1.5%, but less than 2.0%
D. At least 2.0%, but less than 2.5%
E. 2.5% or more

1.16 (5A, 11/95, Q.21) (1 point) Which of the following assumptions are made in the collective risk model?
1. The individual claim amounts are identically distributed random variables.
2. The distribution of the aggregate losses generated by the portfolio is continuous.
3. The number of claims and the individual claim amounts are mutually independent.
A. 1  B. 2  C. 1, 3  D. 2, 3  E. 1, 2, 3

1.17 (Course 151 Sample Exam #1, Q.15) (1.7 points) An insurer issues a portfolio of 100 automobile insurance policies. Of these 100 policies, one-half have a deductible of 10 and the other half have a deductible of zero. The insurance policy pays the amount of damage in excess of the deductible subject to a maximum of 125 per accident. Assume:
(i) the number of automobile accidents per year per policy has a Poisson distribution with mean 0.03
(ii) given that an accident occurs, the amount of vehicle damage has the distribution:
x      p(x)
30     1/3
150    1/3
200    1/3
Compute the total amount of claims the insurer expects to pay in a single year.
(A) 270  (B) 275  (C) 280  (D) 285  (E) 290

1.18 (Course 151 Sample Exam #1, Q.19) (2.5 points) SA and SB are independent random variables and each has a compound Poisson distribution. You are given:
(i) λA = 3, λB = 1
(ii) The severity distribution of SA is: pA(1) = 1.0.
(iii) The severity distribution of SB is: pB(1) = pB(2) = 0.5.
(iv) S = SA + SB
Determine FS(2).
(A) 0.12  (B) 0.14  (C) 0.16  (D) 0.18  (E) 0.20
1.19 (Course 151 Sample Exam #2, Q.21) (1.7 points) Aggregate claims, X, is uniformly distributed over (0, 20). Complete insurance is available with a premium of 11.6. If X is less than k, a dividend of k - X is payable. Determine k such that the expected cost of this insurance is equal to the expected claims without insurance. (A) 2 (B) 4 (C) 6 (D) 8 (E) 10 1.20 (Course 151 Sample Exam #3, Q.18) (1.7 points) Aggregate claims has a compound Poisson distribution with: (i) λ = 1.0 (ii) severity distribution: p(1) = p(2) = 0.5 For a premium of 4.0, an insurer will pay total claims and a dividend equal to the excess of 75% of premium over claims. Determine the expected dividend. (A) 1.5 (B) 1.7 (C) 2.0 (D) 2.5 (E) 2.7 1.21 (5A, 11/96, Q.37) (2 points) Claims arising from a particular insurance policy have a compound Poisson distribution. The expected number of claims is five. The claim amount density function is given by P(X = 1,000) = 0.8 and P(X = 5,000) = 0.2 Compute the probability that losses from this policy will total 6,000. 1.22 (5A, 11/99, Q.24) (1 point) Which of the following are true regarding collective risk models? A. If we combine insurance portfolios, where the aggregate claims of each of the portfolios have compound Poisson Distributions and are mutually independent, then the aggregate claims for the combined portfolio will also have a compound Poisson Distribution. B. When the variance of the number of claims exceeds its mean, the Poisson distribution is appropriate. C. If the claim amount distribution is continuous, it can be concluded that the distribution of the aggregate claims is continuous. D. A Normal Distribution is usually the best approximation to the aggregate claim distribution. E. All of the above are false.
1.23 (Course 1 Sample Exam, Q.10) (1.9 points) An insurance policy covers the two employees of ABC Company, Bob and Carol. The policy will reimburse ABC for no more than one loss per employee in a year. It reimburses the full amount of the loss up to an annual company-wide maximum of 8000. The probability of an employee incurring a loss in a year is 40%. The probability that an employee incurs a loss is independent of the other employeeʼs losses. The amount of each loss is uniformly distributed on [1000, 5000]. Given that Bob has incurred a loss in excess of 2000, determine the probability that losses will exceed reimbursements. A. 1/20 B. 1/15 C. 1/10 D. 1/8 E. 1/6 Note: The original exam question has been rewritten. 1.24 (IOA 101, 4/01, Q.8) (5.25 points) Consider two independent lives A and B. The probabilities that A and B die within a specified period are 0.1 and 0.2 respectively. If A dies you lose 50,000, whether or not B dies. If B dies you lose 30,000, whether or not A dies. (i) (3 points) Calculate the mean and standard deviation of your total losses in the period. (ii) (2.25 points) Calculate your expected loss within the period, given that one, and only one, of A and B dies. 1.25 (3, 5/01, Q.26 & 2009 Sample Q.109) (2.5 points) A company insures a fleet of vehicles. Aggregate losses have a compound Poisson distribution. The expected number of losses is 20. Loss amounts, regardless of vehicle type, have exponential distribution with θ = 200. In order to reduce the cost of the insurance, two modifications are to be made: (i) a certain type of vehicle will not be insured. It is estimated that this will reduce loss frequency by 20%. (ii) a deductible of 100 per loss will be imposed. Calculate the expected aggregate amount paid by the insurer after the modifications. (A) 1600 (B) 1940 (C) 2520 (D) 3200 (E) 3880 1.26 (1, 11/01, Q.16) (1.9 points) Let S denote the total annual claim amount for an insured. There is a probability of 1/2 that S = 0. There is a probability of 1/3 that S is exponentially distributed with mean 5. There is a probability of 1/6 that S is exponentially distributed with mean 8. Determine the probability that 4 < S < 8. (A) 0.04 (B) 0.08 (C) 0.12 (D) 0.24 (E) 0.25 Note: This past exam question has been rewritten.
1.27 (3, 11/01, Q.36 & 2009 Sample Q.102) (2.5 points) WidgetsRUs owns two factories. It buys insurance to protect itself against major repair costs. Profit equals revenues, less the sum of insurance premiums, retained major repair costs, and all other expenses. WidgetsRUs will pay a dividend equal to the profit, if it is positive.
You are given:
(i) Combined revenue for the two factories is 3.
(ii) Major repair costs at the factories are independent.
(iii) The distribution of major repair costs for each factory is:
k      Prob(k)
0      0.4
1      0.3
2      0.2
3      0.1
(iv) At each factory, the insurance policy pays the major repair costs in excess of that factoryʼs ordinary deductible of 1. The insurance premium is 110% of the expected claims.
(v) All other expenses are 15% of revenues.
Calculate the expected dividend.
(A) 0.43  (B) 0.47  (C) 0.51  (D) 0.55  (E) 0.59

1.28 (SOA3, 11/03, Q.19 & 2009 Sample Q.86) (2.5 points) Aggregate losses for a portfolio of policies are modeled as follows:
(i) The number of losses before any coverage modifications follows a Poisson distribution with mean λ.
(ii) The severity of each loss before any coverage modifications is uniformly distributed between 0 and b.
The insurer would like to model the impact of imposing an ordinary deductible, d (0 < d < b), on each loss and reimbursing only a percentage, c (0 < c < 1), of each loss in excess of the deductible. It is assumed that the coverage modifications will not affect the loss distribution. The insurer models its claims with modified frequency and severity distributions. The modified claim amount is uniformly distributed on the interval [0, c(b - d)].
Determine the mean of the modified frequency distribution.
(A) λ  (B) λc  (C) λd/b  (D) λ(b-d)/b  (E) λc(b-d)/b
1.29 (SOA3, 11/04, Q.17 & 2009 Sample Q.126) (2.5 points) The number of annual losses has a Poisson distribution with a mean of 5. The size of each loss has a two-parameter Pareto distribution with θ = 10 and α = 2.5. An insurance for the losses has an ordinary deductible of 5 per loss. Calculate the expected value of the aggregate annual payments for this insurance. (A) 8 (B) 13 (C) 18 (D) 23 (E) 28
1.30 (CAS3, 5/05, Q.6) (2.5 points) For a portfolio of 2,500 policies, claim frequency is 10% per year and severity is distributed uniformly between 0 and 1,000. Each policy is independent and has no deductible. Calculate the reduction in expected annual aggregate payments, if a deductible of $200 per claim is imposed on the portfolio of policies.
A. Less than $46,000
B. At least $46,000, but less than $47,000
C. At least $47,000, but less than $48,000
D. At least $48,000, but less than $49,000
E. $49,000 or more

1.31 (SOA M, 11/06, Q.40 & 2009 Sample Q.289) (2.5 points) A compound Poisson distribution has λ = 5 and claim amount distribution as follows:
x       p(x)
100     0.80
500     0.16
1000    0.04
Calculate the probability that aggregate claims will be exactly 600.
(A) 0.022  (B) 0.038  (C) 0.049  (D) 0.060  (E) 0.070
Solutions to Problems:

1.1. D. 1. True. 2. True. May also adjust for expected changes in frequency. 3. True.

1.2. E. E[X] = exp(µ + σ^2/2) = exp(6.5 + 1.3^2/2) = 1548. Mean aggregate loss = (130)(1548) = 201,240.

1.3. B. E[X] = exp(µ + σ^2/2) = e^7.345 = 1548.
E[X ∧ 500] = exp(µ + σ^2/2) Φ[(ln500 − µ − σ^2)/σ] + 500{1 - Φ[(ln500 − µ)/σ]} = 1548 Φ[-1.52] + 500{1 - Φ[-0.22]} = (1548)(0.0643) + (500)(0.5871) = 393.
The mean payment per loss is: E[X] - E[X ∧ 500] = 1548 - 393 = 1155.
Mean aggregate loss = (130)(1155) = 150,150.
Alternately, the nonzero payments are Poisson with mean: (130)S(500). The average payment per nonzero payment is: (E[X] - E[X ∧ 500])/S(500). Therefore, the mean aggregate loss is: (130)S(500)(E[X] - E[X ∧ 500])/S(500) = (130)(E[X] - E[X ∧ 500]) = (130)(1155) = 150,150.

1.4. B. 1. True. For example, if the frequency over one year is Negative Binomial with parameters β = 0.3 and r = 2, then (assuming the months are independent) the frequency is Negative Binomial over one month, with parameters β = 0.3 and r = 2/12. This follows from the form of the Probability Generating Function, which is: P(z) = {1 - β(z-1)}^-r.
2. False. The frequency distribution has no effect on this.
3. True.

1.5. D. For the uniform distribution on (0, 1000), S(100) = 90%. The distribution of non-zero payments has mean: (90%)(30) = 27.
Comment: Similar to SOA3, 11/03, Q.19. The mean aggregate payment after modifications is: (27)(720/2) = 9720.
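For those who want to check solutions 1.2 and 1.3 numerically, here is a short Python sketch using the LogNormal limited expected value formula used in the solution; the Standard Normal distribution function is built from math.erf, so no outside packages are needed.

# Numerical check of solutions 1.2 and 1.3: LogNormal(mu = 6.5, sigma = 1.3),
# Poisson frequency 130, ordinary deductible of 500 per loss.
import math

def Phi(x: float) -> float:
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sigma, lam, d = 6.5, 1.3, 130.0, 500.0

mean_x = math.exp(mu + sigma**2 / 2.0)
lev_d = (math.exp(mu + sigma**2 / 2.0) * Phi((math.log(d) - mu - sigma**2) / sigma)
         + d * (1.0 - Phi((math.log(d) - mu) / sigma)))

print(lam * mean_x)             # expected aggregate with no deductible, about 201,200
print(lam * (mean_x - lev_d))   # expected aggregate paid with the 500 deductible, about 150,100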
1.6. A. The mean frequency is: (32)(0.5) = 16.
The mean payment per loss is: E[X ∧ 5000] - E[X ∧ 1000] = {θ/(α−1)}{1 − (θ/(θ+5000))^(α−1)} - {θ/(α−1)}{1 − (θ/(θ+1000))^(α−1)} = (3000/3){(3000/4000)^3 - (3000/8000)^3} = (1000)(0.4219 - 0.0527) = 369.2.
Mean aggregate loss = (16)(369.2) = 5907.
Alternately, the nonzero payments have mean frequency: (16)S(1000). The average payment per nonzero payment is: (E[X ∧ 5000] - E[X ∧ 1000])/S(1000). Therefore, the mean aggregate loss is: (16)S(1000)(E[X ∧ 5000] - E[X ∧ 1000])/S(1000) = (16)(E[X ∧ 5000] - E[X ∧ 1000]) = (16)(947.3 - 578.1) = 5907.

1.7. E. Each factory has retained costs that are 0 60% of the time and 10 40% of the time.
Profit = 45 - 25 - retained costs = 20 - retained costs.
Prob[retained cost = 0] = Prob[all 3 factories have 0 retained costs] = 0.6^3 = 0.216.
Prob[retained cost = 10] = Prob[2 factories have 0 retained costs and the other has 10] = (3)(0.6^2)(0.4) = 0.432.
Probability   Retained Cost   Profit   Dividend
0.216         0               20       10
0.432         10              10       5
0.288         20              0        0
0.064         30              -10      0
Average       12              8        4.32
Comment: Similar to 3, 11/01, Q.36.

1.8. D. There is a 7/8 chance he donates to two beggars and a 1/8 chance he donates to one beggar.
Prob[Agg = 1] = (1/8)(3/4) = 3/32. Prob[Agg = 2] = (7/8)(3/4)^2 = 63/128.
Prob[Agg = 10] = (1/8)(1/4) = 1/32. Prob[Agg = 11] = (7/8)(2)(3/4)(1/4) = 42/128.
Prob[Agg = 20] = (7/8)(1/4)^2 = 7/128.
Prob[Agg ≤ 10] = 79/128 = 61.7% < 90%. Prob[Agg ≤ 11] = 121/128 = 94.5% ≥ 90%.
The 90th percentile is 11.
Comment: The 90th percentile is the first value such that F ≥ 90%.

1.9. C. Prob[Agg = 1] = (30%)(50%) = 15%. Prob[Agg = 2] = (30%)(40%) + (70%)(50%) = 47%.
Prob[Agg = 3] = (30%)(10%) = 3%. Prob[Agg = 4] = (70%)(40%) = 28%. Prob[Agg = 6] = (70%)(10%) = 7%.
Prob[Agg ≤ 3] = 65% < 70%. Prob[Agg ≤ 4] = 93% ≥ 70%. The 70th percentile is 4.
1.10. B. Prob[Agg = 2] = Prob[1 person] Prob[drink = $2] + Prob[2 persons] Prob[both $2] = (30%)(40%) + (70%)(50%)2 = 29.5%. Prob[Agg = 3] = Prob[1 person] Prob[drink = $3] + Prob[2 persons] Prob[one at $1 and one at $2] = (30%)(10%) + (70%)(2)(50%)(40%) = 31%. Prob[Agg = 4] = Prob[2 persons] Prob[each $2] + Prob[2 persons] Prob[one at $1 and one at $3] = (70%)(40%)2 + (70%)(2)(50%)(10%) = 18.2%. Prob[Agg = 2] + Prob[Agg = 3] + Prob[Agg = 4] = 29.5% + 31% + 18.2% = 78.7%. Comment: Prob[Agg = 1] = (30%)(50%) = 15%. Prob[Agg = 5] = (70%)(2)(40%)(10%) = 5.6%. Prob[Agg = 6] = (70%)(10%)2 = 0.7%. 1.11. Mockingbird retains all of any loss less than $5000. For a loss of size greater than $5000, it retains $5000 plus 10% of the portion above $5000. Mockingbird retains the portion of any loss above the maximum covered loss of $250,000. Let X be the size of loss and Y be the amount retained. Let F be the Weibull Distribution of X and G be the distribution of Y. y = x, for x ≤ 5000. y = 5000 + (0.1)(x - 5000) = 4500 + 0.1x, for 5000 ≤ x ≤ 250,000. Therefore, x = 10y - 45000, for 5000 ≤ y ≤ 29,500. y = 4500 + (.1)(250000) + (x - 250000) = x - 220,500, for 250,000 ≤ x. Therefore, x = y + 220,500, for 29,500 ≤ y. G(y) = F(y) = 1 - exp[-(y/50000)1/2], for y ≤ 5000. G(y) = F(10y - 45000) = 1 - exp[-((10y - 45000)/50000)1/2] = 1 - exp[-(y/5000 - 0.9)1/2], 5000 ≤ y ≤ 29,500. G(y) = F(y + 220,500) = 1 - exp[-((y + 220,500)/50000)1/2] = 1 - exp[-(y/50000 + 4.41)1/2], 29,500 ≤ y. The number of losses is Poisson with λ = 5.
1.12. Let X be the size of loss and Y be the amount paid on that loss. Let F be the Weibull Distribution of X and G be the distribution of Y. Atticus Insurance pays nothing for a loss less than $5000. For a loss of size greater than $5000, Atticus Insurance pays 90% of the portion above $5000. For a loss of size 250,000, Atticus Insurance pays: (0.9)(250,000 - 5000) = 220,500. Atticus Insurance pays no more for a loss larger than the maximum covered loss of $250,000. y = 0, for x ≤ 5000. y = (0.9)(x - 5000) = 0.9x - 4500, for 5000 ≤ x ≤ 250,000. Therefore, x = (y + 4500)/.9, y < 220,500. y = 220,500, for 250,000 ≤ x. G(0) = F(5000) = 1 - exp[-(5000/50000)1/2] = .2711. G(y) = F[(y + 4500)/0.9] = 1 - exp[-((y + 4500)/45000)1/2], 0 < y < 220,500. G(220,500) = 1. The number of losses is Poisson with λ = 5. Alternately, let Y be the non-zero payments by Atticus Insurance. Then G(y) = {F((y + 4500)/.9) - F(5000)} / S(5000) = {1 - exp[-((y + 4500)/45000)1/2] - .2711} / 0.7289 = 1 - exp[-((y + 4500)/45000)1/2] / 0.7289, 0 < y < 220,500. G(220,500) = 1. The number of non-zero payments is Poisson with λ = (0.7289)(5) = 3.6445.
1.13. Finch Reinsurance pays something when the loss results in a payment by Atticus of more than $100,000. Solve for the loss that results in a payment of $100,000: 100000 = (.9)(x - 5000). ⇒ x = 116,111. Let X be the size of loss and Y be the amount paid by Finch Reinsurance. Let F be the Weibull Distribution of X and G be the distribution of Y. y = 0, for x ≤ 116,111. y = (.9)(x - 116,111) = .9x - 104,500, for 116,111 < x ≤ 250,000. Therefore, x = (y + 104500)/.9, for 0 < y < 120,500. y = 120,500, for 250,000 ≤ x. G(0) = F(116111) = 1 - exp[-(116111/50000)1/2] = .7821. G(y) = F((y + 104500)/.9) = 1 - exp[-((y + 104500)/45000)1/2], for 0 < y < 120,500. G(120500) = 1. The number of losses is Poisson with λ = 5. Alternately, let Y be the non-zero payments by Finch. Then G(y) = {F((y + 104500)/.9) - F(116111)}/S(116111) = {1 - exp[-((y + 104500)/45000)1/2] - 0.7821}/0.2179 = 1 - exp[-((y + 104500)/45000)1/2]/0.2179, for 0 < y < 120,500. G(120500) = 1. The number of non-zero payments by Finch is Poisson with λ = (.2179)(5) = 1.0895. 1.14. Let X be the size of loss and Y be the amount paid on that loss net of reinsurance. Let F be the Weibull Distribution of X and G be the distribution of Y. For a loss greater than 116,111, Atticus Insurance pays 100,000 net of reinsurance. y = 0, for x ≤ 5000. y = (0.9)(x - 5000) = 0.9x - 4500, for 5000 ≤ x ≤ 116,111. y = 100,000, for 116,111 < x. G(0) = F(5000) = 1 - exp[-(5000/50000)1/2] = 0.2711. G(y) = F[(y + 4500)/0.9] = 1 - exp[-((y + 4500)/45000)1/2], 0 < y < 100,000. G(100,000) = 1. The number of losses is Poisson with λ = 5. Alternately, let Y be the non-zero payments by Atticus Insurance net of reinsurance. Then G(y) = {F((y + 4500)/.9) - F(5000)}/S(5000) = {1 - exp[-((y + 4500)/45000)1/2] - .2711}/.7289 = 1 - exp[-((y + 4500)/45000)1/2]/0.7289, 0 < y < 100,000. G(100,000) = 1. The number of non-zero payments is Poisson with λ = (0.7289)(5) = 3.6445.
1.15. E. Prob[N = 0] = 0.9^2 = 0.81. Prob[N = 1] = (2)(0.1)(0.9) = 0.18. Prob[N = 2] = 0.1^2 = 0.01.
Let X be the size of a single loss. Prob[X > 2000] = 1 - Φ[(2000 - 1500)/400] = 1 - Φ[1.25] = 0.1056.
X + X is also Normal, but with µ = 3000 and σ = 400√2 = 565.685.
Prob[X + X > 2000] = 1 - Φ[(2000 - 3000)/565.685] = 1 - Φ[-1.77] = 0.9616.
Prob[aggregate > 2000] = Prob[N = 1] Prob[X > 2000] + Prob[N = 2] Prob[X + X > 2000] = (0.18)(0.1056) + (0.01)(0.9616) = 2.86%.
Comment: Do not use the Normal Approximation.

1.16. C. 1. True.
2. False. The aggregate losses are continuous if the severity distribution is continuous and there is no probability of zero claims. The aggregate losses are discrete if the severity distribution is discrete.
3. True.

1.17. B. The expected number of accidents is: (0.03)(100) = 3.
The mean payment with no deductible is: (30 + 125 + 125)/3 = 93.333.
The mean payment with a deductible of 10 is: (20 + 125 + 125)/3 = 90.
The overall mean payment is: (1/2)(93.333) + (1/2)(90) = 91.667.
Therefore, the mean aggregate loss is: (3)(91.667) = 275.
Comment: The maximum amount paid is 125; in one case there is no deductible and a maximum covered loss of 125, while in the other case there is a deductible of 10 and a maximum covered loss of 135.
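Solution 1.15 can also be checked by simulation; the following sketch draws a Binomial number of claims and Normal claim sizes and estimates the probability directly.

# Monte Carlo check of solution 1.15: Binomial(m = 2, q = 0.1) claim counts,
# Normal(1500, 400) claim sizes, probability the aggregate exceeds 2000 (about 2.86%).
import numpy as np

rng = np.random.default_rng(seed=42)
trials = 1_000_000

n = rng.binomial(2, 0.1, size=trials)
# Draw two potential claim sizes per trial and keep the first n of them.
x = rng.normal(1500.0, 400.0, size=(trials, 2))
agg = np.where(n >= 1, x[:, 0], 0.0) + np.where(n == 2, x[:, 1], 0.0)

print((agg > 2000.0).mean())   # close to 0.0286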
1.18. E. Since pA(1) = 1, SA is just a Poisson Distribution with mean 3.
Prob(SA = 0) = e^-3. Prob(SA = 1) = 3e^-3. Prob(SA = 2) = (9/2)e^-3.
For B the chance of 0 claims is e^-1, the chance of one claim is e^-1, and the chance of two claims is: e^-1/2.
Prob(SB = 0) = Prob(zero claims) = e^-1.
Prob(SB = 1) = Prob(one claim)Prob(claim amount = 1) = e^-1/2.
Prob(SB = 2) = Prob(one claim)Prob(amount = 2) + Prob(two claims)Prob(both amounts = 1) = e^-1/2 + (e^-1/2)(1/4) = 5e^-1/8.
Prob(S ≤ 2) = Prob(SA = 0)Prob(SB ≤ 2) + Prob(SA = 1)Prob(SB ≤ 1) + Prob(SA = 2)Prob(SB = 0)
= (e^-3)(17e^-1/8) + (3e^-3)(12e^-1/8) + (9e^-3/2)(e^-1) = 89e^-4/8 = 0.2038.
Alternately, the combined process is the sum of two independent compound Poisson processes, so it in turn is a compound Poisson process. It has claims of sizes 1 and 2.
The expected number of claims of size 1 is: (3)(1) + (1)(0.5) = 3.5. The expected number of claims of size 2 is: (3)(0) + (1)(0.5) = 0.5.
The small claims (those of size 1) and the large claims (those of size 2) form independent Poisson Processes.
Prob(S ≤ 2) = Prob(no claims of size 1)Prob(0 or 1 claims of size 2) + Prob(1 claim of size 1)Prob(no claims of size 2) + Prob(2 claims of size 1)Prob(no claims of size 2)
= (e^-3.5)(e^-0.5 + 0.5e^-0.5) + (3.5e^-3.5)(e^-0.5) + (3.5^2 e^-3.5/2)(e^-0.5) = 11.125e^-4 = 0.2038.
Comment: One could use either the Panjer algorithm or convolutions in order to compute the distribution of SB.

1.19. D. The expected aggregate losses are: (0 + 20)/2 = 10.
The expected dividend is: ∫ from 0 to k of (k - x)(1/20) dx = [-(k - x)^2/40] evaluated from x = 0 to x = k = k^2/40.
Setting as stated, premiums - expected dividends = expected aggregate losses: 11.6 - k^2/40 = 10. Therefore, k = 8.
1.20. B. Let A be the aggregate claims. 75% of premiums is 3.
If A ≥ 3, the dividend is 0; if A ≤ 3, the dividend is 3 - A. Thus the dividend is: 3 - Minimum(3, A).
Therefore, the expected dividend is 3 - E[A ∧ 3].
Prob(A = 0) = Prob(0 claims) = e^-1. Prob(A = 1) = Prob(1 claim)Prob(claim size = 1) = (e^-1)(0.5).
Prob(A = 2) = Prob(1 claim)Prob(claim size = 2) + Prob(2 claims)Prob(claim sizes are both 1) = (e^-1)(0.5) + (e^-1/2)(0.5^2) = 0.625e^-1.
Prob(A ≥ 3) = 1 - 2.125e^-1.
Thus E[A ∧ 3] = (0)(e^-1) + (1)(0.5e^-1) + (2)(0.625e^-1) + (3)(1 - 2.125e^-1) = 3 - 4.625e^-1.
Therefore the expected dividend is: 3 - E[A ∧ 3] = 4.625e^-1 = 1.70.
Alternately, if A = 0 the dividend is 3, if A = 1 the dividend is 2, if A = 2 the dividend is 1, and if A ≥ 3, the dividend is zero.
Therefore, the expected dividend is: (3)(e^-1) + (2)(0.5e^-1) + (1)(0.625e^-1) + (0)(1 - 2.125e^-1) = 4.625e^-1 = 1.70.

1.21. For the aggregate losses to be 6000, either there are two claims with one of size 1000 and one of size 5000, or there are six claims each of size 1000.
The probability is: (2)(0.8)(0.2)(5^2 e^-5/2!) + (0.8^6)(5^6 e^-5/6!) = 6.53%.
Alternately, thin the Poisson Distribution. The number of claims of size 1000 is Poisson with λ = 4. The number of claims of size 5000 is Poisson with λ = 1. The numbers of small and large claims are independent.
For the aggregate losses to be 6000, either there are two claims with one of size 1000 and one of size 5000, or there are six claims each of size 1000.
The probability is: Prob[1 claim of size 1000]Prob[1 claim of size 5000] + Prob[6 claims of size 1000]Prob[no claim of size 5000] = (4e^-4)(e^-1) + (4^6 e^-4/6!)(e^-1) = 6.53%.
Comment: One could use either convolution or the Panjer Algorithm (recursive method), but they would take longer in this case.

1.22. A. The sum of independent Compound Poissons is also Compound Poisson, so Statement A is True.
When the variance of the number of claims equals its mean, the Poisson distribution is appropriate, so Statement B is False.
If there is a chance of no claims, then there is an extra point mass of probability at zero in the aggregate distribution, and the distribution of aggregate losses is not continuous at zero, so Statement C is not True.
When, as is common, the distribution of aggregate losses is significantly skewed, the Normal Distribution is not the best approximation, so Statement D is not True.
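The thinning idea used in solutions 1.21 and 1.31 can be automated: with a discrete severity, the counts of each claim size are independent Poissons, so an exact point probability of the aggregate is a short enumeration. The helper below is a sketch written for this guide (the function names are mine, not from the text).

# Exact point probability of a compound Poisson aggregate with a discrete severity,
# by thinning into independent Poisson counts per claim size and enumerating combinations.
from itertools import product
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

def prob_aggregate_equals(target: int, lam: float, severity: dict) -> float:
    lams = {size: lam * p for size, p in severity.items()}   # thinned Poisson means
    sizes = list(lams)
    max_counts = [target // s for s in sizes]
    total = 0.0
    for counts in product(*(range(m + 1) for m in max_counts)):
        if sum(c * s for c, s in zip(counts, sizes)) == target:
            prob = 1.0
            for c, s in zip(counts, sizes):
                prob *= poisson_pmf(c, lams[s])
            total += prob
    return total

# 5A, 11/96, Q.37: lambda = 5, sizes 1000 (80%) and 5000 (20%); answer about 6.53%.
print(prob_aggregate_equals(6000, 5, {1000: 0.8, 5000: 0.2}))
# SOA M, 11/06, Q.40: lambda = 5, sizes 100/500/1000; answer about 0.0599.
print(prob_aggregate_equals(600, 5, {100: 0.80, 500: 0.16, 1000: 0.04}))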
1.23. B. The only way that reimbursements can be greater than 8000 is if both employees have a claim, and they sum to more than 8000.
Let x be the size of Bobʼs claim and y be the size of Carolʼs claim; then x + y > 8000 ⇔ y > 8000 - x.
Prob[y > 8000 - x] = {5000 - (8000 - x)}/4000 = (x - 3000)/4000, x > 3000.
Given Bob has had a claim of size > 2000, f(x) = 1/(5000 - 2000) = 1/3000.
Prob[reimbursements > 8000 | Bob has a claim > 2000] = Prob[Carol has a claim] Prob[x + y > 8000]
= 0.4 ∫ from 2000 to 5000 of (Prob[y > 8000 - x]/3000) dx = (4/30,000) ∫ from 3000 to 5000 of {(x - 3000)/4000} dx
= (2000^2/2)/30,000,000 = 1/15.

1.24. (i) A has mean: (0.1)(50,000) = 5000, and variance: (50,000^2)(0.1)(0.9) = 225 million.
B has mean: (0.2)(30,000) = 6000, and variance: (30,000^2)(0.2)(0.8) = 144 million.
The total has mean: 5000 + 6000 = 11,000, and variance: 225 million + 144 million = 369 million.
The standard deviation of the total is: √(369 million) = 19,209.
(ii) Prob[A and not B] = (0.1)(0.8) = 0.08. Prob[B and not A] = (0.2)(0.9) = 0.18.
{(0.08)(50,000) + (0.18)(30,000)}/(0.08 + 0.18) = 36,154.

1.25. B. After the modifications, the mean frequency is (80%)(20) = 16.
The mean payment per loss is: E[X] - E[X ∧ 100] = θ - θ(1 - e^(-100/θ)) = (200)e^(-100/200) = 121.31.
After the modifications, the mean aggregate loss is: (16)(121.31) = 1941.
Alternately, given a loss, the probability of a non-zero payment given a deductible of size 100 is: S(100) = e^(-100/200) = 0.6065.
Thus the mean frequency of non-zero payments is: (0.6065)(16) = 9.704.
Due to the memoryless property of the Exponential, the non-zero payments excess of the deductible are also exponentially distributed with θ = 200.
Thus the mean aggregate loss is: (9.704)(200) = 1941.

1.26. C. Prob(4 < S < 8) = (1/3)Prob[4 < S < 8 | θ = 5] + (1/6)Prob[4 < S < 8 | θ = 8]
= (1/3)(e^(-4/5) - e^(-8/5)) + (1/6)(e^(-4/8) - e^(-8/8)) = 0.122.
1.27. E. E[X] = (0.4)(0) + (0.3)(1) + (0.2)(2) + (0.1)(3) = 1.
E[X ∧ 1] = (0.4)(0) + (0.3)(1) + (0.2)(1) + (0.1)(1) = 0.6.
Expected losses paid by the insurer per factory = E[X] - E[X ∧ 1] = 1 - 0.6 = 0.4.
Insurance Premium = (110%)(2 factories)(0.4 per factory) = 0.88.
Profit = 3 - (0.88 + retained costs + (0.15)(3)) = 1.67 - retained costs.
For each factory independently, the retained costs are either zero 40% of the time, or one 60% of the time.
Therefore, total retained costs are: zero 16% of the time, one 48% of the time, and two 36% of the time.
Probability   Retained Losses   Profit   Dividend
16%           0                 1.67     1.67
48%           1                 0.67     0.67
36%           2                 -0.33    0
Average       1.20              0.47     0.589
Expected Dividend = (0.16)(1.67) + (0.48)(0.67) + (0.36)(0) = 0.589.
Alternately, expected losses paid by the insurer per factory = E[(X-1)+] = (0.4)(0) + (0.3)(0) + (0.2)(2 - 1) + (0.1)(3 - 1) = 0.4. Proceed as before.
Alternately, the dividends are the amount by which the retained costs are less than: revenues - insurance premiums - all other expenses = 3 - 0.88 - 0.45 = 1.67.
Expected Dividend = expected amount by which retained costs are less than 1.67 = 1.67 - E[retained costs ∧ 1.67].
Probability   Retained Costs   Retained Costs Limited to 1.67
16%           0                0
48%           1                1
36%           2                1.67
E[retained costs ∧ 1.67] = (0.16)(0) + (0.48)(1) + (0.36)(1.67) = 1.081.
Expected Dividend = 1.67 - 1.081 = 0.589.
Comment: Note that since no dividend is paid when the profit is negative, the average dividend is not equal to the average profit. E[Profit] = 1.67 - E[retained costs] = 1.67 - 1.20 = 0.47 ≠ 0.589.
1.28. D. For the uniform distribution on (0, b), S(d) = (b-d)/b, for d < b.
The frequency distribution of non-zero payments is Poisson with mean: S(d)λ = λ(b-d)/b.
Comment: The severity distribution has been truncated and shifted from below, which would have been uniform on [0, b - d], and then all the values were multiplied by c ⇒ uniform on [0, c(b - d)].
The mean aggregate payment per year after the modifications is: {λ(b-d)/b}{c(b-d)/2} = λc(b-d)^2/(2b).
For example, assume b = 5000, so that the severity of each loss before any coverage modifications is uniformly distributed between 0 and 5000, d = 3000, and c = 90%. Then 40% of the losses exceed the deductible of 3000. Thus the modified frequency is Poisson with mean: 0.4λ = λ(5000 - 3000)/5000. The modified severity is uniform from 0 to (90%)(5000 - 3000) = 1800. The mean aggregate payment per year after the modifications is: (0.4λ)(1800/2) = 360λ.

1.29. C. E[X ∧ 5] = {θ/(α-1)}{1 - (θ/(θ+x))^(α−1)} = (10/1.5)(1 - (10/15)^1.5) = 3.038.
E[X] = θ/(α-1) = 10/1.5 = 6.667.
Expected aggregate: (5)(6.667 - 3.038) = 18.1.
Alternately, the expected number of (nonzero) payments is: (5)S(5) = (5)(10/15)^2.5 = 1.81.
The average payment per (nonzero) payment is, for the Pareto Distribution: e(5) = (5 + θ)/(α - 1) = (5 + 10)/(2.5 - 1) = 10.
Expected aggregate loss is: (10)(1.81) = 18.1.

1.30. A. Prior to the deductible, we expect: (2500)(10%) = 250 losses.
The expected losses eliminated by the deductible are: 250/5 = 50 losses eliminated entirely, averaging 100 each, and (250)(4/5) = 200 losses with 200 each eliminated.
(50)(100) + (200)(200) = 45,000.
Alternately, the losses eliminated are: (expected number of losses) E[X ∧ 200] = (250){∫ from 0 to 200 of (1/1000) x dx + 200(4/5)} = (250){(200^2/2)/1000 + 160} = (250)(180) = 45,000.
1.31. D. One can either have six claims each of size 100, or two claims with one of size 100 and the other of size 500, in either order.
Prob[n = 6] Prob[x = 100]^6 + Prob[n = 2] 2 Prob[x = 100] Prob[x = 500] = (5^6 e^-5/6!)(0.8^6) + (5^2 e^-5/2!)(2)(0.8)(0.16) = (0.1462)(0.2621) + (0.0842)(0.2560) = 0.0599.
Alternately, the claims of size 100 are Poisson with λ = (0.8)(5) = 4, the claims of size 500 are Poisson with λ = (0.16)(5) = 0.8, and the claims of size 1000 are Poisson with λ = (0.04)(5) = 0.2, and the three processes are independent.
Prob[6 @ 100]Prob[0 @ 500]Prob[0 @ 1000] + Prob[1 @ 100]Prob[1 @ 500]Prob[0 @ 1000] = (4^6 e^-4/6!)(e^-0.8)(e^-0.2) + (4e^-4)(0.8e^-0.8)(e^-0.2) = 0.0599.
Comment: One could instead use the Panjer Algorithm, but that would be much longer.
Section 2, Convolutions23 Quite often one has to deal with the sum of two independent variables. One way to do so is via the so-called convolution formula.24 You are unlikely to be tested directly on convolutions on your exam. As will be discussed in the next section, convolutions can be useful for computing either aggregate distributions or compound distributions. Six-sided Dice: If one has a variable with density f, then the convolution of f with itself, f*f, is the density of the sum of two such independent, identically distributed variables. Exercise: Let f be a distribution which has 1/6 chance of a 1, 2, 3, 4, 5, or 6. One can think of this as the result of rolling a six-sided die. What is f*f at 4? [Solution: Prob[X1 + X2 = 4] = Prob[X1 = 1] Prob[X2 = 3] + Prob[X1 = 2] Prob[X2 = 2] + Prob[X1 = 3] Prob[X2 = 1] = (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) = 3/36.] One can think of f*f as the sum of the rolls of two six-sided dice. Then f*f*f = f*3 can be thought of as the sum of the rolls of three six-sided dice.
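The dice exercise above can be checked by direct enumeration; this short sketch builds f*f exactly using Fractions.

# Direct enumeration of the dice exercise: f*f is the density of the sum of two
# independent six-sided dice, so (f*f)(4) = 3/36.
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

two_dice = {}
for a, pa in die.items():
    for b, pb in die.items():
        two_dice[a + b] = two_dice.get(a + b, 0) + pa * pb

print(two_dice[4])             # 1/12, i.e. 3/36
print(sum(two_dice.values()))  # 1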
Adding Variables versus Multiplying by a Constant:

Let X be the result of rolling a six-sided die:
Prob[X = 1] = Prob[X = 2] = Prob[X = 3] = Prob[X = 4] = Prob[X = 5] = Prob[X = 6] = 1/6.

Exercise: What are the mean and variance of X?
[Solution: The mean is 3.5, the second moment is: (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 = 91/6, and the variance is: 91/6 - 3.5^2 = 35/12.]

Then X + X is the sum of rolling two dice.

Exercise: What is the distribution of X + X?
[Solution:
Result:  2     3     4     5     6     7     8     9     10    11    12
Prob.:   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36 ]

23 See Section 9.3 of Loss Models. Also see Section 2.3 of Actuarial Mathematics, not on the Syllabus, or An Introduction to Probability Theory and Its Applications by William Feller.
24 Another way is to work with Moment Generating Functions or other generating functions.
In general, X + X means the sum of two independent, identically distributed variables. In contrast, 2X means twice the value of a single variable. In this example, 2X is twice the result of rolling a single die.

Exercise: What is the distribution of 2X?
[Solution:
Result:  2    4    6    8    10   12
Prob.:   1/6  1/6  1/6  1/6  1/6  1/6 ]

We note that the distributions of 2X and X + X are different. For example X + X can be 3, while 2X cannot.

Exercise: What are the mean and variance of X + X?
[Solution: Mean = (2)(3.5) = 7. Variance = (2)(35/12) = 35/6.]
The variances of independent variables add.

Exercise: What are the mean and variance of 2X?
[Solution: Mean = (2)(3.5) = 7. Variance = (2^2)(35/12) = 35/3.]
Multiplying a variable by a constant multiplies the variance by the square of that constant.

Note that while X + X and 2X each have mean 2E[X], they have different variances.
Var[X + X] = 2 Var[X]. Var[2X] = 4 Var[X].

Convoluting Discrete Distributions:

Let X have a discrete distribution such that f(0) = 0.7, f(1) = 0.3. Let Y have a discrete distribution such that g(0) = 0.9, g(1) = 0.1. If X and Y are independent, we can calculate the distribution of Z = X + Y as follows:
If X is 0 and Y is 0, then Z = 0. This has a probability of (0.7)(0.9) = 0.63.
If X is 0 and Y is 1, then Z = 1. This has a probability of (0.7)(0.1) = 0.07.
If X is 1 and Y is 0, then Z = 1. This has a probability of (0.3)(0.9) = 0.27.
If X is 1 and Y is 1, then Z = 2. This has a probability of (0.3)(0.1) = 0.03.
Thus Z has a 63% chance of being 0, a 34% chance of being 1, and a 3% chance of being 2.
One can put this solution in terms of formulas. Let z be the outcome, Z = X + Y. Let f(x) be the density of X. Let g(y) be the density of Y. Let h(z) be the density of Z. Then since y = z - x and x = z - y, one can sum over the possible outcomes in either of two ways:
h(z) = Σ_x f(x) g(z - x) = Σ_y f(z - y) g(y).

Exercise: Use the above formulas to calculate h(1).
[Solution: h(1) = Σ over x = 0 to 1 of f(x) g(1 - x) = f(0)g(1) + f(1)g(0) = (0.7)(0.1) + (0.3)(0.9) = 0.34, or
h(1) = Σ over y = 0 to 1 of f(1 - y) g(y) = f(1)g(0) + f(0)g(1) = (0.3)(0.9) + (0.7)(0.1) = 0.34.]

One could arrange this type of calculation in a spreadsheet:
x        0      1      Sum
f(x)     0.7    0.3    1
1-x      1      0
g(1-x)   0.1    0.9    1
Product  0.07   0.27   0.34

One has to write g in the reverse order, so as to line up the appropriate entries. Then one takes products and sums them. Let us see how this works for a more complicated case.
Exercise: Let X have a discrete distribution such that f(2) = 0.3, f(3) = 0.4, f(4) = 0.1, and f(5) = 0.2. Let Y have a discrete distribution such that g(0) = 0.5, g(1) = 0.2, g(2) = 0, and g(3) = 0.3. X and Y are independent. Z = X + Y. Calculate the density at 4 of Z.
[Solution: Since we want x + y = 4, we put f(3) next to g(1), etc.
x        1     2     3     4     5     Sum
f(x)           0.3   0.4   0.1   0.2   1
4-x      3     2     1     0
g(4-x)   0.3   0     0.2   0.5         1
Product  0     0     0.08  0.05  0     0.13

Alternately, we can list f(4-x):
x        -1    0     1     2     3     Sum
g(x)           0.5   0.2   0     0.3   1
4-x      5     4     3     2     1
f(4-x)   0.2   0.1   0.4   0.3         1
Product  0     0.05  0.08  0     0     0.13
Alternately, list the possible ways the two variables can add to 4:
X = 2 and Y = 2, with probability: (0.3)(0) = 0,
X = 3 and Y = 1, with probability: (0.4)(0.2) = 0.08,
X = 4 and Y = 0, with probability: (0.1)(0.5) = 0.05.
The density at 4 of X + Y is the sum of these probabilities: 0 + 0.08 + 0.05 = 0.13.
Comment: f*g = g*f.]

In a similar manner we can calculate the whole distribution of Z:
z      h(z)
2 0.15
3 0.26
4 0.13
5 0.21
6 0.16
7 0.03
8 0.06
Note that the probabilities sum to one: 0.15 + 0.26 + 0.13 + 0.21 + 0.16 + 0.03 + 0.06 = 1. This is one good way to check the calculation of a convolution. One could arrange this whole calculation in spreadsheet form as follows:
The possible sums of X and Y are:
                  X = 2   X = 3   X = 4   X = 5
Y = 0             2       3       4       5
Y = 1             3       4       5       6
Y = 2             4       5       6       7
Y = 3             5       6       7       8

With the corresponding probabilities (Prob[X] times Prob[Y]):
                  Prob[X]:
                  0.3     0.4     0.1     0.2
Y = 0 (0.5)       15%     20%     5%      10%
Y = 1 (0.2)       6%      8%      2%      4%
Y = 2 (0)         0%      0%      0%      0%
Y = 3 (0.3)       9%      12%     3%      6%
Then adding up probabilities: sum = 2: 15%. sum = 3: 20% + 6% = 26%. sum = 4: 5% + 8% + 0% = 13%. sum = 5: 10% + 2% + 0% + 9%= 21%.
sum = 6: 4% + 0% + 12% = 16%. sum = 7: 0% + 3% = 3%. sum = 8: 6%.
Convoluting Three or More Variables:

If one wants to add up three numbers, one can sum the first two and then add in the third number. For example 3 + 5 + 12 = (3 + 5) + 12 = 8 + 12 = 20. Similarly, if one wants to add three variables one can sum the first two and then add in the third variable. In terms of convolutions, one can first convolute the first two densities and then convolute this result with the third density.
Continuing the previous example, once one has the distribution of Z = X + Y, then we could compute the densities of X + Y + Y = Z + Y by performing another convolution. For example, here is how one could compute the density of X + Y + Y = Z + Y at 6:

x      h(x)    6-x    g(6-x)    Product
2      0.15    4                0
3      0.26    3      0.3       0.078
4      0.13    2      0         0
5      0.21    1      0.2       0.042
6      0.16    0      0.5       0.080
7      0.03                     0
8      0.06                     0
Sum    1              1         0.200
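As a check, the repeated convolutions above can be reproduced with a few lines of Python (a sketch, not from the original text; it assumes the numpy package is available):

import numpy as np

f = np.array([0.3, 0.4, 0.1, 0.2])  # density of X on 2, 3, 4, 5
g = np.array([0.5, 0.2, 0.0, 0.3])  # density of Y on 0, 1, 2, 3

h = np.convolve(f, g)               # density of Z = X + Y; its support starts at 2 + 0 = 2
print(h)                            # 0.15, 0.26, 0.13, 0.21, 0.16, 0.03, 0.06 for z = 2, ..., 8

h2 = np.convolve(h, g)              # density of Z + Y = X + Y + Y; support also starts at 2
print(h2[6 - 2])                    # density at 6 is 0.2, matching the spreadsheet above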
Notation: We use the notation h = f * g, for the convolution of f and g. Repeated convolutions are indicated by powers, in the same manner as is continued multiplication. f * f = f*2 . f*f*f = f*3 . Sum of 2 independent, identically distributed variables: f* f = f*2 . Sum of 3 independent, identically distributed variables: f* f*f = f*3 . Both Loss Models and Actuarial Mathematics employ the convention that f*0 (x) = 1 if x = 0, and 0 otherwise.25
f*1 (x) = f(x).
Similar notation is used for the distribution of the sum of two independent variables. If X follows F and Y follows G, then the distribution of X + Y is H = F * G.
H(z) = Σx F(x) g(z-x) = Σy f(z-y) G(y) = Σx f(x) G(z-x) = Σy F(z-y) g(y).
Again repeated convolutions are indicated by a power: F*F*F = F*3. Both Loss Models and Actuarial Mathematics employ the convention that F*0(x) = 0 if x < 0 and 1 if x ≥ 0; F*0 has a jump discontinuity at 0. F*1(x) = F(x).

Properties of Convolutions:

The convolution operator is commutative and associative. f*g = g*f. (f*g)*h = f*(g*h).
Note that the moment generating function of the convolution f*g is the product of the moment generating functions of f and g: Mf*g = Mf Mg. This follows from the fact that the moment generating function for a sum of independent variables is the product of the moment generating functions of each of the variables. Thus if one takes the sum of n independent identically distributed variables, the Moment Generating Function is taken to the power n. Similarly, the Probability Generating Function of the convolution f*g is the product of the Probability Generating Functions of f and g: Pf*g = Pf Pg.
25 This will be used when one writes aggregate or compound distributions in terms of convolutions.
This follows from the fact that the Probability Generating Function for a sum of independent variables is the product of the Probability Generating Functions of each of the variables. Thus if one takes the sum of n independent identically distributed variables, the Probability Generating Function is taken to the power n.

Convolution of Continuous Distributions:

The same convolution formulas apply if one is dealing with continuous rather than discrete distributions, with integration taking the place of summation. One can get the density function for the sum of two independent variables X + Y:
h(z) = ∫ f(x) g(z-x) dx = ∫ f(z-y) g(y) dy.
Similar formulas apply to get the Distribution Function of X + Y:
H(z) = ∫ f(x) G(z-x) dx = ∫ F(z-y) g(y) dy = ∫ F(x) g(z-x) dx = ∫ f(z-y) G(y) dy.
For example, let X have a uniform distribution on [1, 4], while Y has a uniform distribution on [7, 12]. If X and Y are independent and Z = X + Y, here is how one can use the convolution formulas to compute the density of Z. In this case f(x) = 1/3 for 1 < x < 4 and g(y) = 1/5 for 7 < y < 12. Thus:
h(z) = ∫ f(x) g(z-x) dx = ∫ (1/3) g(z-x) dx = (1/15) Length[{7 < z-x < 12} and {1 < x < 4}].
Length[{7 < z-x < 12} and {1 < x < 4}] = Length[{z-12 < x < z-7} and {1 < x < 4}].
If z < 8, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 0.
If 8 ≤ z ≤ 11, then Length[{z-12 < x < z-7} and {1 < x < 4}] = z - 8.
If 11 ≤ z ≤ 13, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 3.
If 13 ≤ z ≤ 16, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 16 - z.
If 16 < z, then Length[{z-12 < x < z-7} and {1 < x < 4}] = 0.
Thus the density of the sum is:
0 for z ≤ 8,
(z-8)/15 for 8 ≤ z ≤ 11,
3/15 for 11 ≤ z ≤ 13,
(16-z)/15 for 13 ≤ z ≤ 16,
0 for 16 ≤ z.
A graph of this density: a trapezoid, rising linearly from 0 at z = 8 up to 3/15 at z = 11, flat out to z = 13, and falling linearly back to 0 at z = 16.
For example, the density of the sum at 10 is:
∫ f(x) g(10-x) dx = ∫ (1/3) g(10-x) dx = (1/15) Length[{7 < 10-x < 12} and {1 < x < 4}] = (1/15) Length[{1 < x < 3}] = (1/15)(2) = 2/15.
Note the convolution is in this case a continuous density function. Generally the convolution will behave “better” than the original distributions, so convolution serves as a smoothing operator.26

Exercise: X has density f(x) = e^(-x), x > 0. Y has density g(y) = e^(-y), y > 0. If X and Y are independent, use the convolution formula to calculate the density function for their sum.
[Solution: The density of X + Y is a Gamma Distribution with α = 2 and θ = 1.
f*g(z) = ∫x=0 to ∞ f(x) g(z-x) dx = ∫x=0 to z e^(-x) e^(-(z-x)) dx = e^(-z) ∫x=0 to z dx = z e^(-z).
Note that the integral extends only over the domain of g, so that z-x > 0, or x < z.]

Exercise: X has a Gamma Distribution with α = 1 and θ = 0.1. Y has a Gamma Distribution with α = 5 and θ = 0.1. If X and Y are independent, use the convolution formula to calculate the density function for their sum.
[Solution: The density of X + Y is a Gamma Distribution with α = 1 + 5 = 6 and θ = 0.1.
f*g(z) = ∫y=0 to ∞ f(z-y) g(y) dy = ∫y=0 to z (10 e^(-10(z-y))) (10^5 y^4 e^(-10y) / 4!) dy = 10^6 e^(-10z) ∫y=0 to z y^4/4! dy = 10^6 e^(-10z) z^5 / 5!.
Note that the integral extends only over the domain of f, so that z-y > 0, or y < z.]
26 See An Introduction to Probability Theory and Its Applications by William Feller.
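These continuous convolutions can also be checked numerically. Here is a brief sketch (not part of the original text; it assumes the scipy package is available) for the first exercise above:

import numpy as np
from scipy import integrate

def f(x):
    return np.exp(-x) * (x > 0)          # Exponential density with theta = 1

z = 1.0
density, _ = integrate.quad(lambda x: f(x) * f(z - x), 0, z)
print(density)                           # about 0.3679
print(z * np.exp(-z))                    # z e^(-z) at z = 1, the same value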
Thus we have shown that adding an independent Exponential to a Gamma with the same scale parameter, θ, increases the shape parameter of the Gamma, α, by 1. In general, the sum of two independent Gammas with the same scale parameter θ is another Gamma, with the same θ and the sum of the two alphas.
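A quick simulation makes this closure property concrete; the following sketch (not from the original text; it assumes numpy is available) adds a Gamma with α = 1 to an independent Gamma with α = 5, both with θ = 0.1, and checks the mean and variance against a Gamma with α = 6 and θ = 0.1:

import numpy as np

rng = np.random.default_rng(0)
theta = 0.1
total = rng.gamma(shape=1, scale=theta, size=10**6) + rng.gamma(shape=5, scale=theta, size=10**6)

print(total.mean(), total.var())   # close to alpha * theta = 0.6 and alpha * theta^2 = 0.06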
Problems: Use the following information for the next 10 questions: Let X have density: f(0) = 0.6, f(1) = .3, and f(2) = 0.1. Let Y have density: g(0) = 0.2, g(1) = .5, and g(2) = 0.3. 2.1 (1 point) What is f*f at 2? A. Less than 0.15 B. At least 0.15 but less than 0.20 C. At least 0.20 but less than 0.25 D. At least 0.25 but less than 0.30 E. At least 0.30 2.2 (1 point) What is the cumulative distribution function of X + X at 2? A. Less than 0.80 B. At least 0.80 but less than 0.85 C. At least 0.85 but less than 0.90 D. At least 0.90 but less than 0.95 E. At least 0.95 2.3 (2 points) What is f*3 = f*f*f at 3? A. Less than 0.12 B. At least 0.12 but less than 0.14 C. At least 0.14 but less than 0.16 D. At least 0.16 but less than 0.18 E. At least 0.18 2.4 (3 points) What is f*4 = f*f*f*f at 5? A. Less than 0.04 B. At least 0.04 but less than 0.06 C. At least 0.06 but less than 0.08 D. At least 0.08 but less than 0.10 E. At least 0.10 2.5 (1 point) What is g*g at 3? A. Less than 0.20 B. At least 0.20 but less than 0.25 C. At least 0.25 but less than 0.30 D. At least 0.30 but less than 0.35 E. At least 0.35
2.6 (1 point) What is the cumulative distribution function of Y + Y at 1? A. Less than 0.10 B. At least 0.10 but less than 0.15 C. At least 0.15 but less than 0.20 D. At least 0.20 but less than 0.25 E. At least 0.25 2.7 (1 point) What is g*3 = g*g*g at 4? A. Less than 0.25 B. At least 0.25 but less than 0.30 C. At least 0.30 but less than 0.35 D. At least 0.35 but less than 0.40 E. At least 0.40 2.8 (1 point) What is g*4 = g*g*g*g at 3? A. Less than 0.10 B. At least 0.10 but less than 0.15 C. At least 0.15 but less than 0.20 D. At least 0.20 but less than 0.25 E. At least 0.25 2.9 (1 point) What is f*g at 3? A. Less than 0.15 B. At least 0.15 but less than 0.20 C. At least 0.20 but less than 0.25 D. At least 0.25 but less than 0.30 E. At least 0.30 2.10 (1 point) What is the cumulative distribution function of X + Y at 2? A. Less than 0.70 B. At least 0.70 but less than 0.75 C. At least 0.75 but less than 0.80 D. At least 0.80 but less than 0.85 E. At least 0.85
2.11 (1 point) Which of the following are true? 1. If f has a Gamma Distribution with α = 3 and θ = 7, then f*10 has a Gamma Distribution with α = 30 and θ = 70. 2. If f has a Normal Distribution with µ = 3 and σ = 7, then f*10 has a Normal Distribution with µ = 30 and σ = 70. 3. If f has a Negative Binomial Distribution with β = 3 and r = 7, then f*10 has a Negative Binomial Distribution with β = 30 and r = 70. A. 1, 2
B. 1, 3
C. 2, 3
D. 1, 2, 3
E. None of A, B, C, or D.
2.12 (2 points) The severity distribution is: f(1) = 40%, f(2) = 50% and f(3) = 10%. There are three claims. What is the chance they sum to 6? A. Less than 0.24 B. At least 0.24 but less than 0.25 C. At least 0.25 but less than 0.26 D. At least 0.26 but less than 0.27 E. At least 0.27 2.13 (3 points) The waiting time, x, from the date of an accident to the date of its report to an insurance company is exponential with mean 1.7 years. The waiting time, y, in years, from the beginning of an accident year to the date of an accident is a random variable with density f(y) = 0.9 + 0.2y, 0 ≤ y ≤ 1. Assume x and y are independent. What is the expected portion of the total number of accidents for an accident year reported to the insurance company by one half year after the end of the accident year? A. Less than 0.41 B. At least 0.41 but less than 0.42 C. At least 0.42 but less than 0.43 D. At least 0.43 but less than 0.44 E. At least 0.44 2.14 (2 points) Let f(x) = 0.02(10 - x), 0 ≤ x ≤ 10. What is the density of X + X at 7? A. Less than 0.07 B. At least 0.07 but less than 0.08 C. At least 0.08 but less than 0.09 D. At least 0.09 but less than 0.10 E. At least 0.10
Use the following information for the next two questions: X is the sum of a random variable with a uniform distribution on [1, 6] and an independent random variable with a uniform distribution on [3, 11]. 2.15 (1 point) What is the density of X at 14? A. Less than 0.08 B. At least 0.08 but less than 0.09 C. At least 0.09 but less than 0.10 D. At least 0.10 but less than 0.11 E. At least 0.11 2.16 (2 point) What is the Distribution Function of X at 14? A. Less than 0.87 B. At least 0.87 but less than 0.88 C. At least 0.88 but less than 0.89 D. At least 0.89 but less than 0.90 E. At least 0.90 2.17 (2 points) X follows an Exponential Distribution with mean 7. Y follows an Exponential Distribution with mean 17. X and Y are independent. What is the density of Z = X + Y? A. e-z/12 / 12 B. ze-z/12 / 144 C. ( e-z/17 - e-z/7) / 10 D. ( e-z/17 + e-z/7) / 24 E. None of the above 2.18 (3 points) Tom, Dick, and Harry are actuaries working on the same project. Each actuary performs his calculations with no intermediate rounding. Each result is a large number, which the actuary rounds to the nearest integer. If without any rounding Tomʼs and Dickʼs results would sum to Harryʼs, what is the probability that they do so after rounding? A. 1/2 B. 3/5 C. 2/3 D. 3/4 E. 7/8 2.19 (2 points) Let X be the results of rolling a 4-sided die (1, 2, 3 or 4), and let Y be the result of rolling a 6-sided die. X and Y are independent. What is the distribution of X + Y? 2.20 (2 points) The density function for X is f(1) = .2, f(3) = .5, f(4) = .3. The density function for Y is g(0) = .1, g(2) = .7, g(3) = .2. X and Y are independent. Z = X + Y. What is the density of Z at 4? A. 7% B. 10% C. 13% D. 16% E. 19%
Use the following information for the next three questions:
• The Durham Bulls and Toledo Mud Hens baseball teams will play a series of games against each other.
• Each game will be played either in Durham or Toledo. • Each team has a 55% chance of winning and 45% chance of losing any game at home. • The outcome of each game is independent of the outcome of any other game. • In the series, the Durham Bulls will play one more game at home than the Toledo Mud Hens. 2.21 (2 points) If the series consists of 3 games, what is probability that the Durham Bulls win the series; in other words win more games than their opponents the Toledo Mud Hens? 2.22 (3 points) If the series consists of 5 games, what is probability that the Durham Bulls win the series; in other words win more games than their opponents the Toledo Mud Hens? 2.23 (5 points) If the series consists of 7 games, what is probability that the Durham Bulls win the series; in other words win more games than their opponents the Toledo Mud Hens?
2.24 (1 point) Which of the following are true? 1. If f has a Binomial Distribution with m = 2 and q = 0.07, then f*5 has a Binomial Distribution with m = 10 and q = 0.07. 2. If f has a Pareto Distribution with α = 4 and θ = 8, then f*5 has a Pareto Distribution with α = 40 and θ = 8. 3. If f has a Poisson Distribution with λ = 0.2, then f*5 has a Poisson with λ = 1. A. 1, 2
B. 1, 3
C. 2, 3
D. 1, 2, 3
E. None of A, B, C, or D.
2.25 (4B, 5/85, Q.49) (2 points) The waiting time, x, in years, from the date of an accident to the date of its report to an insurance company is a random variable with probability density function (p.d.f.) f(x), 0 < x < ∞. The waiting time, y, in years, from the beginning of an accident year to the date of an accident is a random variable with p.d.f. g(y), 0 < y < 1. Assuming x and y are independent, which of the following expressions represents the expected proportion of the total number of accidents for an accident year reported to the insurance company by the end of the accident year? F(x), 0 < x < ∞, and G(y), 0 < y < 1 represent respectively the distribution functions of x and y. 1
∫ f(t) G(1-t) dt
A.
0 1
∫ f(t) G(t) dt
B. 0
1
C.
∫ f(t) g(1-t) dt 0 1
∫ F(t) G(t) dt
D. 0
1
∫ F(t) G(1-t) dt
E. 0
2.26 (5A, 11/94, Q.21) (1 point) Let S = X1 + X2 , where X1 and X2 are independent random variables with distribution functions defined below: X F1 (X) F2 (X) 0 0.3 0.6 1 0.4 0.8 2 0.6 1.0 3 0.7 4 1.0 Calculate Pr(S≤ 2). A . Less than 0.25 B. At least 0.25, but less than 0.35 C. At least 0.35, but less than 0.45 D. At least 0.45, but less than 0.55 E. Greater than or equal to 0.55 2.27 (5A, 11/94, Q.23) (1 point) X1 , X2 , X3 ,and X4 are independent random variables for a Gamma distribution with the parameters α = 2.2 and θ = 0.2. If S = X1 + X2 + X3 + X4 , then what is the distribution function for S? A. Gamma distribution with the parameters α = 8.8 and θ = 0.8. B. Gamma distribution with the parameters α = 8.8 and θ = 0.2. C. Gamma distribution with the parameters α = 2.2 and θ = 0.8. D. Gamma distribution with the parameters α = 2.2 and θ = 0.2. E. None of the above 2.28 (5A, 5/95, Q.19) (1 point) Assume S = X1 + X2 + ... + XN, where X1 , X2 , ... XN are identically distributed and N, X1 , X2 , ... XN are mutually independent random variables. Which of the following statements are true? 1. If the distribution of the Xiʼs is continuous and the Prob(N = 0) > 0, the distribution of S will be continuous. 2. If the distribution of the Xiʼs is normal, then the nth convolution of the Xiʼs is normal. 3. If the distribution of the Xiʼs is exponential, then the nth convolution of the Xiʼs is exponential. A. 1
B. 2
C. 1, 2
D. 2, 3
E. 1, 2, 3
2.29 (5A, 11/95, Q.19) (1 point) Let S = X1 + X2 , where X1 and X2 are independent random variables with the following distribution functions: X F1 (X) F2 (X) 0 0.5 0.3 1 0.8 0.6 2 1 1 What is the probability that S > 2? A. Less than 0.20 B. At least 0.20 but less than 0.40 C. At least 0.40 but less than 0.60 D. At least 0.60 but less than 0.80 E. At least 0.80 2.30 (5A, 11/97, Q.22) (1 point) The following information is given regarding three mutually independent random variables: x f1 (x) f2 (x) f3 (x) 0 0.5 0.2 0.1 1 0.4 0.2 0.9 2 0.1 0.2 3 0.2 4 0.2 If S = x1 + x2 + x3 , calculate the probability that S = 5. A. Less than 0.10 B. At least 0.10, but less than 0.15 C. At least 0.15, but less than 0.20 D. At least 0.20, but less than 0.25 E. 0.25 or more 2.31 (5A, 11/98, Q.24) (1 point) Assume that S = X1 +X2 +X3 +...+XN where X1 , X2 , X3 , ...XN are identically distributed and N, X1 , X2 , X3 , ... XN are mutually independent random variables. Which of the following statements is true? 1. If the distribution of the Xi's is continuous and the Pr[N=0] > 0, the distribution of S will be continuous. 2. The nth convolution of a normal distribution with parameters µ and σ is also normal with mean nµ and variance nσ2. 3. If the individual claim amount distribution is discrete, the distribution of S is also discrete. A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A, B, C, or D
2.32 (5A, 5/99, Q.37) (2 points) ABC Insurance Company writes liability coverage with one maximum covered loss of $90,000 offered to all insureds. Only two types of losses to the insurer arise out of this coverage: (1) total limits plus expenses: $100,000 (2) loss expenses only: $50,000 You are given the following distribution of aggregate losses that applies in years when the insurer faces 2 claims. x f(x) 100,000 90.25% 150,000 9.5% 200,000 0.25% If, next year, the insurer faces 3 claims, what is the likelihood that the aggregate losses will exceed $150,000? 2.33 (5A, 11/99, Q.23) (1 point) X1 , X2 , X3 are mutually independent random variables with probability functions as follows: x f1 (X) f2 (X) 0 1 2 3 S = X1
0.9 0.5 0.1 0.3 0.0 0.2 0.0 0.0 + X2 + X3 . Find fS(2).
f3 (X) 0.25 0.25 0.25 0.25
A. Less than 0.25 B. At least 0.25 but less than 0.26 C. At least 0.26 but less than 0.27 D. At least 0.27 but less than 0.28 E. At least 0.28 2.34 (Course 151 Sample Exam #1, Q.8) (1.7 points) Aggregate claims S = X1 + X2 + X3 , where X1 , X2 and X3 are mutually independent random variables with probability functions as follows: x f1 (x) f2 (x) f3 (x) 0 0.6 1 0.4 2 0.0 3 0.0 4 0.0 You are given FS(4) = 0.6. Determine p. (A) 0.0 (B) 0.1
p 0.3 0.0 0.0 0.7-p
0.0 0.5 0.5 0.0 0.0
(C) 0.2
(D) 0.3
(E) 0.4
2.35 (Course 151 Sample Exam #3, Q.11) (1.7 points) S = X1 + X2 + X3 where X1 , X2 and X3 are independent random variables distributed as follows: x
f1 (X)
f2 (X)
0 0.2 0 1 0.3 0 2 0.5 p 3 0.0 1-p 4 0.0 0 You are given FS(4) = 0.43. Determine p. (A) 0.1 (B) 0.2
(C) 0.3
f3 (X) 0.5 0.5 0.0 0.0 0.0
(D) 0.4
(E) 0.5
2.36 (1, 11/01, Q.37) (1.9 points) A device containing two key components fails when, and only when, both components fail. The lifetimes, T1 and T2 , of these components are independent with common density function f(t) = e-t, t > 0. The cost, X, of operating the device until failure is 2T1 + T2 . Which of the following is the density function of X for x > 0? (A) e-x/2 - e-x
(B) 2(e-x/2 - e-x)
(C) x2 e-x/2
(D) e-x/2/2
(E) e-x/3/3
Solutions to Problems: 2.1. C. x 0 1 2
f(x) 0.6 0.3 0.1
f(2-x) 0.1 0.3 0.6
Product 0.06 0.09 0.06
Sum
1
1
0.21
Comment: f*f(0) = f(0) f(0) = (0.6)(0.6) = 0.36. f*f(1) = f(0) f(1) + f(1) f(0) = (0.6)(0.3) + (0.3)(0.6) = 0.36. f*f(3) = f(1) f(2) + f(2) f(1) = (0.1)(0.3) + (0.3)(0.1) = 0.06. f*f(4) = f(2)f(2) = (0.1)(0.1) = 0.01. 2.2. D. x 0 1 2
F(x) 0.6 0.9 1
Sum
f(2-x) 0.1 0.3 0.6
Product 0.06 0.27 0.6
1
0.93
Comment: Alternately, f*f(0) + f*f(1) + f*f(2) = 0.36 + 0.36 + 0.21 = 0.93. 2.3. B. One can use the fact that f*f*f = (f*f)*f. x -1 0 1 2 3 4
f*f(x)
f(3-x)
Product
0.36 0.36 0.21 0.06 0.01
0.1 0.3 0.6
0.036 0.063 0.036
Sum
1
1
0.135
2.4. A. One can use the fact that f*4 = (f*f)*(f*f).

x          0       1        2        3        4        5
f*f(x)     0.36    0.36     0.21     0.06     0.01
f*f(5-x)           0.01     0.06     0.21     0.36     0.36
Product    0       0.0036   0.0126   0.0126   0.0036   0
Sum of the products: f*4(5) = 0.0324.
2.5. D. x 0 1 2 3
g(x) 0.2 0.5 0.3
g(3-x) 0.3 0.5 0.2
Sum
Product 0 0.15 0.15 0 0.3
2.6. D. x 0 1 2
G(x) 0.2 0.7 1
g(1-x) 0.5 0.2
Sum
Product 0.1 0.14 0 0.24
2.7. B. x 0 1 2 3 4
g*g(x) 0.04 0.2 0.37 0.3 0.09
g(4-x)
0.3 0.5 0.2
Sum
Product 0 0 0.111 0.15 0.018 0.279
2.8. C. One can use the fact that g*4 = (g*g)*(g*g).

x           0       1       2       3       4
g*g(x)      0.04    0.2     0.37    0.3     0.09
g*g(3-x)    0.3     0.37    0.2     0.04
Product     0.012   0.074   0.074   0.012   0
Sum of the products: g*4(3) = 0.172.
2.9. A. x 0 1 2 Sum
f(x) 0.6 0.3 0.1
g(3-x) 0.3 0.5
Product 0 0.09 0.05 0.14
2.10. D. One can calculate the answer in either of two ways. x 0 1 2
F(x) 0.6 0.9 1
g(2-x) 0.3 0.5 0.2
Product 0.18 0.45 0.2
Sum
0.83
x 0 1 2
G(x) 0.2 0.7 1
f(2-x) 0.1 0.3 0.6
Product 0.02 0.21 0.6
Sum
0.83
2.11. E. 1. False, the sum of 10 independent Gammas is a Gamma with parameters 10α and θ. 2. False. The variances add, so that the new variance is 10σ2 and the new standard deviation is σ 10 , not 10σ. The sum of 10 independent Normals is a Normal with parameters 10µ and σ 10 . 3. False. The sum of 10 independent Negative Binomials is a Negative Binomial with parameters β and 10r. 2.12. B. First one can compute f*f. f*f(2) = 0.16. f*f(3) = 0.40. f*f(4) = (0.4)(0.1) + (0.5)(0.5) + (0.1)(0.4) = 0.33. f*f(5) = 0.10. f*f(6) = 0.01. Then use the fact that f*f*f = (f*f)*f. x 2 3 4 5 6
f*f(x) 0.16 0.4 0.33 0.1 0.01
f(6-x)
Sum
1
1
Product 0 0.04 0.165 0.04 0
0.1 0.5 0.4
0.245
Comment: The mean of f is 0.4 + 1 + 0.3 = 1.7. If one computes f*f*f and computes the mean, one gets 5.1 = (3) (1.7) The mean of the sum of 3 claims is three times the mean of a single claim. x 3 4 5 6 7 8 9
f*f*f(x) 0.064 0.24 0.348 0.245 0.087 0.015 0.001
Product 0.192 0.96 1.74 1.47 0.609 0.12 0.009
Sum
1
5.1
2.13. D. For date of accident 0 < y < 1, the expected portion of accidents reported by time 1.5 is: 1 - e^(-(1.5-y)/1.7). We can integrate over the dates of accident:
H(1.5) = ∫0 to 1 G(1.5-y) f(y) dy = ∫0 to 1 (1 - e^(-(1.5-y)/1.7)) (0.9 + 0.2y) dy
= ∫0 to 1 {0.9 - 0.9 e^(-(1.5-y)/1.7) + 0.2y - 0.2y e^(-(1.5-y)/1.7)} dy
= {0.9y - 1.53 e^(-(1.5-y)/1.7) + 0.1y^2 - 0.34y e^(-(1.5-y)/1.7) + 0.578 e^(-(1.5-y)/1.7)}, evaluated from y = 0 to y = 1,
= 0.9 - 0.5070 + 0.1 - 0.2534 + 0.1915 = 0.4311.
Comment: One could instead use: H(1.5) = ∫ g(x) F(1.5-x) dx.
2.14. E. Now f(7-x) > 0 for 0 ≤ 7-x < 10, which implies -3 < x ≤ 7. In addition f(x) > 0 when 0 ≤ x < 10. Thus f(x)f(7-x) > 0 when 0 ≤ x ≤ 7.
f*f(7) = ∫0 to 7 (0.02)(10 - x) (0.02){10 - (7 - x)} dx = 0.0004 ∫0 to 7 (30 + 7x - x^2) dx = (0.0004) {210 + (7/2)(49) - (343/3)} = 0.1069.
2.15. A. Let f(y) = 1/5 for 1 < y < 6 and g(z) = 1/8 for 3 < z < 11; then the density of the sum is:
h(14) = ∫ f(y) g(14 - y) dy = ∫3 to 6 (1/5)(1/8) dy = 3/40 = 0.075.
Comment: The integrand is zero unless 1 ≤ y ≤ 6 and 3 ≤ 14-y ≤ 11. Therefore, we only integrate from y = 3 to y = 6.
2.16. C. Let F(y) = (y-1)/5 for 1 ≤ y ≤ 6, F(y) = 1 for y > 6, and g(z) = 1/8 for 3 ≤ z ≤ 11; then the distribution of the sum is:
H(14) = ∫ F(14 - z) g(z) dz = ∫3 to 8 (1/8) dz + ∫8 to 11 (13 - z)/40 dz = 5/8 + 21/80 = 0.8875.
Alternately, X has a density function:
(x - 4)/40 for 4 ≤ x ≤ 9,
5/40 for 9 ≤ x ≤ 12,
(17 - x)/40 for 12 ≤ x ≤ 17.
Thus F(14) = ∫4 to 9 (x - 4)/40 dx + ∫9 to 12 1/8 dx + ∫12 to 14 (17 - x)/40 dx = 0.3125 + 0.375 + 0.200 = 0.8875.
Comment: In general, if X is the sum of two uniform variables on (a, b) and (c, d), with d - c ≥ b - a, then X has a density:
(x - (a+c))/{(b-a)(d-c)} for a + c ≤ x ≤ b + c,
(b-a)/{(b-a)(d-c)} for b + c ≤ x ≤ a + d,
(b + d - x)/{(b-a)(d-c)} for a + d ≤ x ≤ b + d.
2.17. C. f(x) = e^(-x/7)/7. g(y) = e^(-y/17)/17. Using the convolution formula:
h(z) = ∫0 to z f(t) g(z - t) dt = ∫0 to z (e^(-t/7)/7)(e^(-(z-t)/17)/17) dt = (e^(-z/17)/119) ∫0 to z e^(-0.0840336t) dt
= (e^(-z/17)/119)(e^(-0.0840336z) - 1)/(-0.0840336) = (e^(-z/17) - e^(-z/7))/10.
Alternately, the Moment Generating Functions are: 1/(1 - 7t) and 1/(1 - 17t). Their product is:
1/{(1 - 7t)(1 - 17t)} = (17/10)/(1 - 17t) - (7/10)/(1 - 7t).
This is 17/10 times the m.g.f. of an Exponential with mean 17, minus 7/10 times the m.g.f. of an Exponential with mean 7. Thus the density of Z is:
(17/10)(e^(-z/17)/17) - (7/10)(e^(-z/7)/7) = (e^(-z/17) - e^(-z/7))/10.
Comment: In general, the sum of two independent Exponentials with different means θ1 and θ2 has a density of: {exp(-z/θ1) - exp(-z/θ2)}/(θ1 - θ2). If θ1 = θ2, then one would instead get a Gamma Distribution with parameters α = 2 and θ = θ1 = θ2. In this case with differing means, the density is closely approximated by a Gamma Distribution with α = 2 and θ = (7+17)/2 = 12, but it is not a Gamma Distribution. The sum of n independent Exponentials with different means θi has density: Σi θi^(n-2) exp(-z/θi) / Πj≠i (θi - θj), for n ≥ 2, θi ≠ θj.
See Example 2.3.3 in Actuarial Mathematics, not on the Syllabus.
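As an aside (not part of the original solution), the closed form can be confirmed with a short numerical convolution in Python, assuming scipy is available:

from math import exp
from scipy import integrate

z = 10.0
conv, _ = integrate.quad(lambda x: (exp(-x/7)/7) * (exp(-(z - x)/17)/17), 0, z)
print(conv)                              # numerical convolution of the two Exponential densities
print((exp(-z/17) - exp(-z/7)) / 10)     # the closed form above; the two values agree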
2.18. D. Let x = Tomʼs unrounded result - Tomʼs rounded result. Then x is uniformly distributed from -0.5 to 0.5. Let y = Dickʼs unrounded result - Dickʼs rounded result. Then y is uniformly distributed from -0.5 to 0.5. Then z = x + y has a triangle distribution: f(z) = 1 + z for -1 ≤ z ≤ 0, and f(z) = 1 - z for 1 ≥ z ≥ 0. F(z) = (1 + z)2 /2 for -1 ≤ z ≤ 0, and F(z) = 1 - (1 - z)2 /2 for 1 ≥ z ≥ 0. The sum of the rounded results equals the rounding of the sums provided z is between -.5 and +.5. The probability that -0.5 < z < 0.5 is: F(0.5) - F(-0.5) = 7/8 - 1/8 = 3/4. Alternately, let t = decimal portion of Tomʼs unrounded result and d = decimal portion of Dickʼs unrounded result. Divide into cases: t < 1/2, d < 1/2, and 0 ≤ t + d < 1/2; OK; Prob = 1/8. t < 1/2, d < 1/2, and 1 > t + d ≥ 1/2; not OK; Prob = 1/8. t < 1/2, d ≥ 1/2; OK; Prob = 1/4. t ≥ 1/2, d < 1/2; OK; Prob = 1/4. t ≥ 1/2, d ≥ 1/2, and 1 ≤ t + d < 3/2; not OK; Prob = 1/8. t ≥ 1/2, d ≥ 1/2, and 2 ≥ t + d ≥ 3/2; OK; Prob = 1/8. Total probability where OK is: 1/8 + 1/4 + 1/4 + 1/8 = 3/4. 2.19. The possible results range from 2 to 10. There are (4)(6) = 24 equally likely results. For example, a 4 can result from 1, 3; 2, 2; or 3, 1. Thus the chance of a 4 is: 3/24. A 7 can result from 1, 6; 2; 5; 3, 4; or 4, 3. Thus the chance of a 7 is 4/24. The probability density function is: Result Number of Chances Probability
2 1 0.042
3 2 0.083
4 3 0.125
5 4 0.167
6 4 0.167
7 4 0.167
8 3 0.125
9 2 0.083
10 1 0.042
2.20. A. h(4) = Σx f(x) g(4-x) = f(1)g(3) + f(3)g(1) + f(4)g(0) = (0.2)(0.2) + (0.5)(0) + (0.3)(0.1) = 0.07.
Or h(4) = Σy f(4-y) g(y) = f(4)g(0) + f(2)g(2) + f(1)g(3) = (0.3)(0.1) + (0)(0.7) + (0.2)(0.2) = 0.07.
One can arrange this calculation in a spreadsheet:

x      f(x)    g(4-x)    f(x)g(4-x)
1      0.2     0.2       0.04
2      0       0.7       0
3      0.5     0         0
4      0.3     0.1       0.03
Sum                      0.07

Comment: Similarly, h(5) = Σx f(x)g(5-x) = f(3)g(2) = (0.5)(0.7) = 0.35.
h(6) = Σx f(x)g(6-x) = (0.5)(0.2) + (0.3)(0.7) = 0.31.
The entire convolution of f and g is shown below:

z       1      2     3      4      5      6      7
h(z)    0.02   0     0.19   0.07   0.35   0.31   0.06
Note that being a distribution, h = f*g sums to 1. 2.21. The number of home games the Durham Bulls win is Binomial with m = 2 and q = 0.55: f(0) = 0.452 = 0.2025. f(1) = (2)(0.55)(0.45) = 0.4950. f(2) = 0.552 = 0.3025. The number of road games the Durham Bulls win is Binomial with m = 1 and q = 0.45. Prob[3 wins] = Prob[2 home wins]Prob[1 road win] = (0.3025)(0.45) = 0.1361. Prob[2 wins] = Prob[2 home wins]Prob[0 road win] + Prob[1 home wins]Prob[1 road win] = (0.3025)(0.55) + (0.4950)(0.45) = 0.3891. Prob[at least 2 wins] = 0.1361 + 0.3891 = 52.52%. Comment: The total number of games they win is the convolution of the two Binomials. In a three game championship series, if one team won the first 2 games, then the final game would not be played. However, this does not affect the answer to the question.
2.22. The number of home games the Durham Bulls win is Binomial with m = 3 and q = .55: f(0) = 0.453 = 0.0911. f(1) = (3)(0.55)(0.452 ) = 0.3341. f(2) = (3)(0.552 )(0.45) = 0.4084. f(3) = 0.553 = 0.1664. The number of road games the Durham Bulls win is Binomial with m = 2 and q = 0.45: g(0) = 0.552 = 0.3025. g(1) = (2)(0.45)(0.55) = 0.4950. g(2) = 0.452 = 0.2025. Prob[5 wins] = Prob[3 home wins]Prob[2 road wins] = (0.1664)(0.2025) = 0.0337. Prob[4 wins] = Prob[3 home wins]Prob[1 road win] + Prob[2 home wins]Prob[2 road wins] = (0.1664)(0.4950) + (0.4084)(0.2025) = 0.1651. Prob[3 wins] = Prob[3 home wins]Prob[0 road win] + Prob[2 home wins]Prob[1 road win] + Prob[1 home win]Prob[2 road wins] = (0.1664)(0.2025) + (0.4084)(0.4950) + (0.3341)(0.2025) = 0.3201. Prob[at least 3 wins] = 0.0337 + 0.1651 + 0.3201 = 51.89%. Comment: The longer the series, the less advantage the Durham Bulls get from the extra home game. 2.23. The number of home games the Durham Bulls win is Binomial with m = 4 and q = .55: f(0) = 0.454 = 0.0410. f(1) = (4)(0.55)(0.453 ) = 0.2005. f(2) = (6)(0.552 )0(.452 ) = 0.3675. f(3) = (4)(.553 )(0.45) = 0.2995. f(4) = 0.554 = 0.0915. The number of road games the Durham Bulls win is Binomial with m = 3 and q = 0.45: g(0) = 0.553 = 0.1664. g(1) = (3)(0.552 )(0.45) = 0.4084. g(2) = (3)(0.55)(0.452 ) = 0.3341. g(3) = 0.453 = 0.0911. Prob[7 wins] = Prob[4 home wins]Prob[3 road wins] = (0.0915)(0.0911) = 0.0083. Prob[6 wins] = Prob[4 home wins]Prob[2 road wins] + Prob[3 home wins]Prob[3 road wins] = (.0915)(0.3341) + (.2995)(0.0911) = 0.0579. Prob[5 wins] = Prob[4 home wins]Prob[1 road win] + Prob[3 home wins]Prob[2 road wins] + Prob[2 home wins]Prob[3 road wins] = (0.0915)(0.4084) + (0.2995)(0.3341) + (.03675)(0.0911) = 0.1709. Prob[4 wins] = Prob[4 home wins]Prob[0 road win] + Prob[3 home wins]Prob[1 road win] + Prob[2 home wins]Prob[2 road wins] + Prob[1 home wins]Prob[3 road wins] = (0.0915)(0.1664) + (0.2995)(0.4084)) + (0.3675)(0.3341) + (0.2005)(0.0911) = 0.2786. Prob[at least 4 wins] = 0.0083 + 0.0579 + 0.1709 + 0.2786 = 51.57%. Comment: The probabilities for the number of games won by the Durham Bulls are: 0.68%, 5.01%, 15.67%, 27.06%, 27.86% , 17.09%, 5.79%, 0.83%.
2.24. B. The sum of 5 independent, identically distributed Binomials is another Binomial with q the same and m multiplied by 5. Statement #1 is true. The sum of independent, identically distributed Paretos is not another Pareto. The sum of 5 independent, identically distributed Poissons is another Poisson with λ multiplied by 5. Statement #3 is true. 2.25. A. We can add up over all possible reporting delays the chance that for a given reporting delay the accident date is such that the accident will be reported by the end of the accident year. For a given reporting delay x, the accident will be reported by the end of the accident year (time = 1) if and only if the accident date is ≤ 1 - x. This only holds for x≤1; if the reporting delay is greater than 1, then the accident can not be reported before the end of the accident year, regardless of the accident date. The chance that the accident date is ≤ 1-x is: G(1-x). So the chance that we have an reporting delay of x and that the accident is reported by the end of the accident year is the product: f(x)G(1-x), since X and Y are given to be independent. Integrating over all reporting delays less than or equal to 1, we get the chance that the accident is reported by the end of the accident year: 1
∫0 f(x) G(1- x) dx . Comment: More generally, if x is the time from the accident date to the reporting date and y is the time from the beginning of the accident year to the accident date, then the time from the beginning of the accident year to the reporting date is x + y. An accident is reported by time z from the beginning of the accident year is x + y ≤ z. So the distribution of Z is the distribution of the sum of X and Y. If X and Y are independent, then the probability density function of their sum is given by the convolution formula:
∫ f(x) g(z - x) dx = ∫ f(z - y) g(y) dy . The distribution function is given by: z
z=∞
∞
f(x) g(z - x) dz dx = f(x) G(z - x) dx . ∫ ∫ ∫ x=-∞ z=-∞ -∞ In this case we are asked for the chance that the time between the beginning of the accident year and the date of reporting is less than 1 (year) so we want the chance that z = x+y ≤1. Since in this case 0 < x < ∞, the integral only goes at most from 0 to infinity. Also since in this case 0 < y < 1, we have 0 < (z -x) < 1 so that (z-1) < x < z. Thus the integral goes at most from z-1 to z. Thus the chance that the accident is reported by z from the beginning of the accident year is for z > 0: z
f(x) G(1- x) dx . ∫ max[0, z-1] Thus the chance that the accident is reported by the end of the accident year (z=1) is:
1
∫0 f(x) G(1- x) dx . An important actuarial idea, very unlikely to be asked about on this exam. Alternately to the intended solution, we can add up over all possible accident dates the chance that for a given accident date the reporting delay is less than or equal to the time until the end of the accident year. For a given accident date y, the time until the end of the accident year is 1-y. The chance that the reporting delay is ≤ 1-y is: F(1-y). So the chance that we have an accident date of y and that it is reported by the end of the accident year is the product g(y)F(1-y), since X and Y are given to be independent. Integrating over all possible accident dates (0
∫0 g(y) F(1- y) dy . Convolutions can generally be computed in either of these two alternate forms. 2.26. D. S > 2 when: X2 = 0 and X1 > 2, X2 = 1 and X1 > 1, X2 = 2 and X1 > 0. This has probability: (0.6)(1 - 0.6) + (0.2)(1 - 0.4) + (0.2)(1 - 0.3) = 0.50. Pr(S≤ 2) = 1 - 0.5 = 0.5. Alternately, FS(2) = Σ f1 (x)F2 (2-x) = (0.3)(1) + (0.1)(0.8) + (0.2)(0.6) = 0.5. FS(2) = Σ f1 (2-x)F2 (x) = (0.2)(0.6) + (0.1)(0.8) + (0.3)(0.1) = 0.5. FS(2) = Σ F1 (x)f2 (2-x) = (0.3)(0.2) + (0.4)(0.2) + (0.6)(0.6) + (0.7)(0) + (1)(0) = 0.5. FS(2) = Σ F1 (2-x)f2 (x) = (0.6)(0.6) + (0.4)(0.2) + (0.3)(0.2) = 0.5. 2.27. B. The sum of 4 independent, identical Gamma Distributions is another Gamma Distribution with the same θ parameter and 4 times the α parameter, in this case: α = 8.8 and θ = 0.2. 2.28. B. 1. False. There will be a point mass of probability at zero. 2. True. 3. False. The nth convolution of an Exponential is a Gamma with shape parameter α = n. 2.29. B. S > 2 if X1 = 1 and X2 ≥ 2, or X1 = 2 and X2 ≥ 1. This has probability: (0.3)(0.4) + (0.2)(0.7) = 0.26. Comment: Iʼve used the formula: (F*G)(z) = Σ f(x) G(z-x).
2.30. C. The ways in which S can be 5 are: (0, 4, 1), (1, 4, 0), (1, 3, 1), (2, 3, 0), (2, 2, 1) with probability: (0.5)(0.2)(0.9) + (0.4)(0.2)(0.1) + (0.4)(0.2)(0.9) + (0.1)(0.2)(0.1) + (0.1)(0.2)(0.9) = 0.19. Alternately, f1 (x)*f3 (x) = .05 @ 0, .49 @ 1, .37 @ 2, and .09 @ 3. f1 *f3 *f2 (5) = (0.49)(0.2) + (0.37)(0.2) + (0.09)(0.2) = 0.19. 2.31. E. If there is a chance of no claims, then there is an extra point mass of probability at zero in the aggregate distribution, and the distribution of aggregate losses is not continuous at zero, so Statement #1 is False. The sum of n independent Normal Distributions is also Normal, with n times the mean and n times the variance, so statement #2 is true. Statement #3 is True. 2.32. Based on the aggregate distribution with two losses, there is a
90.25% = 0.95 chance of a
$50,000 loss and a 0.25% = 0.05 chance of a $100,000 loss. The aggregate distribution with three claims is that for two claims convoluted with that for one claim; it has density at $150,000 of (0.95)(0.9025) = 0.857375. With 3 claims the aggregate distribution is ≥ $150,000, so the chance of exceeding $150,000 is: 1 - 0.857375 = 14.2625%. 2.33. A. f1 *f2 is: (0.9)(0.5) = 0.45@0, (0.9)(0.3) + (0.1)(0.5) = 0.32 @1, (0.9)(0.2) + (0.1)(0.3) = 0.21@2, and (0.1)(0.2) = 0.02 @ 3. fS = f1 *f2 *f3 is: (0.45)(0.25) = 0.1125 @ 0, (0.45)(0.25) + (0.32)(0.25) = 0.1925 @ 1, (0.45)(0.25) + (0.32)(0.25) + (0.21)(0.25) = 0.245 @ 2, (0.45)(0.25) + (0.32)(0.25) + (0.21)(0.25) + (0.02)(0.25) = 0.25 @ 3, (0.32)(0.25) + (0.21)(0.25) + (0.02)(0.25) = 0.1375 @ 4, (0.21)(0.25) + (0.02)(0.25) = 0.0575 @ 5, and (0.02)(0.25) = 0.005 @ 6. 2.34. D. We are given that the chance that S is greater than 4 is: 1- 0.6 = 0.4. Since the sum of severities one and three is either 1, 2 or 3, and since the second severity is either 0, 1, or 4, S is greater than 4 if and only if the second severity is 4. Thus 0.7 - p = 0.4. p = 0.3. Alternately, we can compute the distribution function of S, FS, via convolution. First convolute F1 and f3 . F1 * f3 is: (.6)(.5) = .3 @ 1, (.6)(.5) + (1)(.5) = .8 @ 2, and (1)(.5) = (1)(.5) = 1 @ 3. Next convolute by f2 . FS = F1 * f3 * f2 is: .3p @ 1, .8p + (.3)(.3) @ 2, p + (.3)(.8) + (0)(.3) @ 3, p + (.3)(1) +(0)(.8) + 0(.3) @ 4, p + .3 + 0 + 0 + (.7-p)(.3) @ 5, p + .3 + (.7-p)(.8) @ 6, and 1 @ 7. We are given FS(4) = 0.6, so that 0.6 = p + (.3)(1) +(0)(.8) + 0(.3). Therefore p = .3. Comment: In order to compute FS, one can do the convolutions in any order. I did it in the order I found easiest. In general, FX+Y(z) =
∑ FX(x) fY(z - x) = ∑ FY(y) fX(z - y) = ∑ FX(z - y) fY(y) = ∑ FY(z - x) fX(x) . x
y
y
x
2.35. B. f1 *f3 is: (0.2)(0.5) = .1@0, (0.2)(0.5) + (0.3)(0.5) = 0.25 @1, (0.3)(0.5) + (0.5)(0.5) = 0.4@2, and (0.5)(0.5) = 0.25 @ 3. fS = f2 *f1 *f3 is: 0.1p @2, 0.1(1 - p) + 0.25p @3, and (0.25)(1-p) + (0.4)(p) @4. Since S≥ 2, FS(4) = fS(2) + fS(3) +fS(4) = 0.1p + 0.1(1 - p) + 0.25p + (0.25)(1-p) + (0.4)(p) = 0.35 + 0.4p. Setting FS(4) = 0.43: 0.35 + 0.4p = 0.43. ⇒ p = 0.2. Comment: Although it is not needed to solve the problem: f2 *f1 *f3 is: 0.25p + 0.4(1-p) @5, and 0.25(1-p) @6. One can verify that the density of S sums to one. 2.36. A. T1 is Exponential with mean 1. When we multiply by 2, we get another Exponential with mean 2. Let 2T1 = U. Then U is Exponential with θ = 2. Density of U: e-u/2/2. X = U + V, where V = T2 . Density of V is: e-v = e-(x-u). x
Density of X at x = ∫u=0 to x (e^(-u/2)/2) e^(-(x-u)) du = e^(-x) ∫0 to x e^(u/2)/2 du = (e^(-x))(e^(x/2) - 1) = e^(-x/2) - e^(-x), x > 0.
Comment: The sum of an Exponential with θ = 2 and an Exponential with θ = 1, is not a Gamma Distribution.
Section 3, Using Convolutions

Convolutions can be useful for computing either aggregate distributions or compound distributions.27

Aggregate Distributions:

Exercise: Frequency is given by a Poisson with mean 7. Severity is given by an Exponential with mean 1000. Frequency and Severity are independent. Write the Distribution Function for the aggregate losses.
[Solution: The chance of n claims is e^(-7) 7^n / n!. If one has n claims, then the Distribution of Aggregate Losses is the sum of n independent Exponentials, or a Gamma with parameters α = n and θ = 1000. Let FA(x) be the Distribution of Aggregate Losses.
FA(x) = Σ (Probability of n claims)(Aggregate Distribution given n claims) = Σ (e^(-7) 7^n / n!) Γ(n; x/1000).]
We note that each Gamma Distribution was the nth convolution of the Exponential. Each term of the sum is the density of the frequency distribution at n times the nth convolution of the severity distribution.
More generally, if frequency is FN, severity is FX, frequency and severity are independent, and aggregate losses are FA, then:
FA(x) = Σn=0 to ∞ fN(n) FX*n(x).
fA(x) = Σn=0 to ∞ fN(n) fX*n(x). Recalling that f*0(0) ≡ 1.
If one has discrete severity distributions, one can employ these formulas to directly calculate the distribution of aggregate losses.28 27
The same mathematics applies to aggregate distributions (independent frequency and severity) and compound distributions. 28 If the severity is continuous, as will be discussed in a subsequent section, then one could approximate it by a discrete distribution.
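Here is a brief Python sketch of this formula for the Poisson-Exponential exercise above (not part of the original text; it assumes scipy is available, and the infinite sum is truncated at n_max terms):

from scipy.stats import poisson, gamma

def aggregate_cdf(x, lam=7.0, theta=1000.0, n_max=200):
    # n = 0 term: F*0(x) = 1 for x >= 0 (the point mass of probability at zero aggregate losses)
    total = poisson.pmf(0, lam) * (1.0 if x >= 0 else 0.0)
    for n in range(1, n_max + 1):
        # the n-fold convolution of an Exponential is a Gamma with alpha = n and the same theta
        total += poisson.pmf(n, lam) * gamma.cdf(x, a=n, scale=theta)
    return total

print(aggregate_cdf(10000))   # the probability that aggregate losses are at most 10,000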
An Example with a Discrete Severity:

Exercise: Let a discrete severity distribution be: f(10) = 0.4, f(20) = 0.5, f(30) = 0.1. What is f*f?
[Solution: List the possible ways the two variables can add to 20: 10 and 10, with probability: (0.4)(0.4) = 0.16. f*f(20) = 0.16.
List the possible ways the two variables can add to 30: 10 and 20, with probability: (0.4)(0.5) = 0.20, or 20 and 10, with probability: (0.5)(0.4) = 0.20. f*f(30) = 0.20 + 0.20 = 0.40.
List the possible ways the two variables can add to 40: 10 and 30, with probability: (0.4)(0.1) = 0.04, or 20 and 20, with probability: (0.5)(0.5) = 0.25, or 30 and 10, with probability: (0.1)(0.4) = 0.04. f*f(40) = 0.04 + 0.25 + 0.04 = 0.33.
List the possible ways the two variables can add to 50: 20 and 30, with probability: (0.5)(0.1) = 0.05, or 30 and 20, with probability: (0.1)(0.5) = 0.05. f*f(50) = 0.05 + 0.05 = 0.10.
List the possible ways the two variables can add to 60: 30 and 30, with probability: (0.1)(0.1) = 0.01. f*f(60) = 0.01.]

Exercise: For the distribution in the previous exercise, what is f*f*f?
[Solution: One can use the fact that f*f*f = (f*f)*f.
f*f*f(30) = 0.064. f*f*f(40) = 0.240. f*f*f(50) = 0.348. f*f*f(60) = 0.245. f*f*f(70) = 0.087. f*f*f(80) = 0.015. f*f*f(90) = 0.001.
For example, the spreadsheet calculations for 50, 60, and 70:

x       f*f(x)   50-x   f(50-x)   Product
10               40               0
20      0.16     30     0.1       0.016
30      0.40     20     0.5       0.200
40      0.33     10     0.4       0.132
50      0.10     0                0
60      0.01                      0
Sum     1               1         0.348

x       f*f(x)   60-x   f(60-x)   Product
10               50               0
20      0.16     40               0
30      0.40     30     0.1       0.040
40      0.33     20     0.5       0.165
50      0.10     10     0.4       0.040
60      0.01     0                0
Sum     1               1         0.245

x       f*f(x)   70-x   f(70-x)   Product
10               60               0
20      0.16     50               0
30      0.40     40               0
40      0.33     30     0.1       0.033
50      0.10     20     0.5       0.050
60      0.01     10     0.4       0.004
Sum     1               1         0.087]
Exercise: Let frequency be Binomial with parameters m = 3 and q = 0.2. Let the severity have a discrete distribution such that f(10) = 0.4, f(20) = 0.5, f(30) = 0.1. Calculate the distribution of aggregate losses, using the convolutions calculated in the previous exercises.
[Solution: Recall that f*0(0) ≡ 1. The Binomial densities are: 0.512 at n = 0, 0.384 at n = 1, 0.096 at n = 2, and 0.008 at n = 3.

x     f*0    f      f*f    f*f*f   Aggregate Density
0     1                            0.512000
10           0.4                   0.153600
20           0.5    0.16           0.207360
30           0.1    0.40   0.064   0.077312
40                  0.33   0.240   0.033600
50                  0.10   0.348   0.012384
60                  0.01   0.245   0.002920
70                         0.087   0.000696
80                         0.015   0.000120
90                         0.001   0.000008
Sum   1      1      1      1       1
The aggregate density at 30 is: (0.384)(0.1) + (0.096)(0.40) + (0.008)(0.064) = 0.077312.
Comment: Since the Binomial Distribution and severity distribution have finite support, so does the aggregate distribution. In this case the aggregate losses can only take on the values 0 through 90.]
In general, when the frequency distribution is Binomial, there are only a finite number of terms in the sum used to get the aggregate density via convolutions:
fA(x) = Σn=0 to m {m! / [n! (m-n)!]} q^n (1-q)^(m-n) fX*n(x).
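The same table can be produced with a few lines of Python (a sketch, not from the original text; it assumes numpy and scipy are available):

import numpy as np
from scipy.stats import binom

sev = np.zeros(31)
sev[[10, 20, 30]] = [0.4, 0.5, 0.1]      # severity density, indexed by loss size 0, 1, ..., 30

m, q = 3, 0.2
agg = np.zeros(1)                        # aggregate density, to be built up
conv = np.array([1.0])                   # f*0: a point mass at 0
for n in range(m + 1):
    term = binom.pmf(n, m, q) * conv
    agg = np.pad(agg, (0, len(term) - len(agg))) + term
    conv = np.convolve(conv, sev)        # next convolution power of the severity

print(agg[[0, 10, 20, 30]])              # 0.512, 0.1536, 0.20736, 0.077312, as in the table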
Density of a Compound Distribution in terms of Convolutions:

For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by each taxicab is independent of the number of taxicabs that arrive and is independent of the number of passengers dropped off by any other taxicab. Then the aggregate number of passengers dropped off per minute at the Heartbreak Hotel is a compound Poisson-Binomial distribution, with parameters λ = 1.3, q = 0.4, m = 5.
Let the primary distribution be p and the secondary distribution be s, and let c be the compound distribution. Then we can write the density of c in terms of a weighted average of convolutions of s. For example, assume we have 4 taxis. Then the distribution of the number of people is given by the sum of 4 independent variables, each distributed as per the secondary distribution, s. This sum is distributed as the four-fold convolution of s: s*s*s*s = s*4. The chance of having four taxis is the density of the primary distribution at 4: p(4). Thus this possibility contributes p(4)s*4 to the compound distribution c. The possibility of n taxis contributes p(n)s*n to the compound distribution c. Therefore, the compound distribution is the sum of such terms:29
c(x) = Σn=0 to ∞ p(n) s*n(x).
Compound Distribution = Sum over n of: (Density of primary distribution at n)(n-fold convolution of secondary distribution)
29
See page 207 in Loss Models. The same formula holds for the distribution of aggregate losses, where severity takes the place of the secondary distribution.
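A sketch of this sum in Python for the taxicab example (not part of the original text; numpy and scipy are assumed, and the Poisson sum is truncated at n_max taxis):

import numpy as np
from scipy.stats import poisson, binom

lam, q, m = 1.3, 0.4, 5
n_max = 30                                  # enough taxis for the omitted probability to be negligible

c = np.zeros(n_max * m + 1)                 # compound density at 0, 1, 2, ...
s_n = np.array([1.0])                       # s*0: a point mass at 0
for n in range(n_max + 1):
    c[:len(s_n)] += poisson.pmf(n, lam) * s_n
    s_n = np.convolve(s_n, binom.pmf(np.arange(m + 1), m, q))   # s*(n+1)

print(c[:4])   # about 0.3015, 0.1016, 0.1526, 0.1379, matching the spreadsheet calculation later in this section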
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 64
Exercise: What is the four-fold convolution of a Binomial distribution, with parameters q = 0.4, m = 5. [Solution: The sum of 4 independent Binomials each with parameters q = 0.4, m = 5 is a Binomial with parameters q = 0.4, m = (4)(5) = 20.] The n-fold convolution of a Binomial distribution, with parameters q = 0.4, m = 5 is a Binomial with (5n)! parameters q = 0.4, m = 5n. It has density at x of: 0.4x 0.65n-x. x! (5n - x)! Exercise: Write a formula for the density of a compound Poisson-Binomial distribution, with parameters λ = 1.3, q = 0.4, m = 5. [Solution: c(x) = Σ p(n)s*
∞
n (x)
=
∑
n=0
e- 1.3 1.3n (5n)! 0.4 x 0.6n - x .] (x!) (5n - x)! n!
One could perform this calculation in a spreadsheet as follows:
2013-4-3, n Poisson
x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Sum
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 65
0 0.27253
1 0.35429
2 0.23029
3 0.099792
4 0.032432
5 0.008432
Binomial m=0
Binomial m=5
Binomial m=10
Binomial m=15
Binomial m=20
Binomial m=25
1
0.07776 0.25920 0.34560 0.23040 0.07680 0.01024
0.00605 0.04031 0.12093 0.21499 0.25082 0.20066 0.11148 0.04247 0.01062 0.00157 0.00010
0.000470 0.004702 0.021942 0.063388 0.126776 0.185938 0.206598 0.177084 0.118056 0.061214 0.024486 0.007420 0.001649 0.000254 0.000024 0.000001
0.000037 0.000487 0.003087 0.012350 0.034991 0.074647 0.124412 0.165882 0.179706 0.159738 0.117142 0.070995 0.035497 0.014563 0.004854 0.001294 0.000270 0.000042 0.000005 0.000000 0.000000
0.000003 0.000047 0.000379 0.001937 0.007104 0.019891 0.044203 0.079986 0.119980 0.151086 0.161158 0.146507 0.113950 0.075967 0.043410 0.021222 0.008843 0.003121 0.000925 0.000227 0.000045 0.000007 0.000001 0.000000 0.000000 0.000000
Compound PoissonBinomial 0.301522089 0.101600875 0.152585482 0.137881306 0.098817323 0.070981212 0.050696418 0.033505760 0.021065986 0.012925619 0.007625757 0.004278394 0.002276687 0.001138213 0.000525897 0.000221047 0.000083312 0.000027689 0.000007950 0.000001926 0.000000383 0.000000061 0.000000007 0.000000001 0.000000000 0.000000000
1
1
1
1
1
1
0.997769395
For example, the density at 2 of the compound distribution is calculated as: (0.27253)(0) + (0.35429)(0.34560) + (0.23029)(0.12093) + (0.099792)(0.021942) + (0.032432)(0.003087) + (0.008432)(0.000379) = 0.1526. Thus, there is a 15.26% chance that two passengers will be dropped off at the Heartbreak Hotel during the next minute. Note that by not including the chance of more than 5 taxicabs in our spreadsheet, we have allowed the calculation to fit in a finite sized spreadsheet, but have also left out some possibilities.30
30
As can be seen, the computed compound densities only add to .998 < 1. The approximate compound densities at x < 10 are fairly accurate; for larger x one would need a bigger spreadsheet.
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 66
Practical Issues: When one has a frequency with infinite support and a discrete severity, while these calculations of the aggregate distribution via convolutions are straightforward to perform on a computer, they can get rather lengthy.31 Also if the severity distribution has a positive density at zero, then each summation contains an infinite number of terms.32 When the frequency or primary distribution is a member of the (a, b, 0) class or (a, b, 1) class, aggregate and compound distributions can also be computed via the Panjer Algorithm (Recursive Method), to be discussed in a subsequent section. The Panjer Algorithm avoids some of these practical issues.33
31
As stated in Section 9.5 of Loss Models, in order to compute the aggregate distribution up to n using convolutions, the number of calculations goes up as n3 . 32 One can get around this difficulty when the frequency distribution can be “thinned”. 33 As stated in Section 9.5 of Loss Models, in order to compute the aggregate distribution up to n using the Panjer Algorithm, the number of calculations goes up as n2 .
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 67
Problems: 3.1 (3 points) There are either one, two or three claims, with probabilities of 60%, 30% and 10%, respectively. Each claim is of size $100 or $200, with probabilities of 80% and 20% respectively, independent of the size of any other claim. Frequency and severity are independent. Calculate the aggregate distribution. 3.2 (2 points) The number of claims in a period has a Geometric distribution with mean 2. The amount of each claim X follows P(X = x) = 0.50, x = 1, 2. The number of claims and the claim amounts are independent. S is the aggregate claim amount in the period. Calculate Fs(3). (A) 0.66
(B) 0.67
(C) 0.68
(D) 0.69
(E) 0.70
3.3 (3 points) The number of accidents per year follows a Binomial distribution with m = 2 and q = 0.7. The number of claims per accident is Geometric with β = 1. The number of claims for each accident is independent of the number of claims for any other accident and of the total number of accidents. Calculate the probability of 2 or fewer claims in a year. A. Less than 80% B. At least 80% but less than 82% C. At least 82% but less than 84% D. At least 84% but less than 86% E. At least 86% 3.4 (3 points) For a certain company, losses follow a Poisson frequency distribution with mean 2 per year, and the amount of a loss is 1, 2, or 3, each with probability 1/3. Loss amounts are independent of the number of losses, and of each other. What is the probability of 4 in annual aggregate losses? A. 7% B. 8% C. 9% D. 10% E. 11% 3.5 (8 points) The number of accidents is either 0, 1, 2, or 3 with probabilities 50%, 20%, 20%, and 10% respectively. The number of claims per accident is 0, 1, 2 or 3 with probabilities 30%, 40%, 20%, and 10% respectively. Calculate the distribution of the total number of claims.
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 68
3.6 (5A, 11/94, Q.37) (2 points) Let N = number of claims and S = X1 + X2 + ... + XN. Suppose S has a compound Poisson distribution with Poisson parameter λ = 0.6. The only possible individual claim amounts are $2,000, $5,000, and $10,000 with probabilities 0.6, 0.3, and 0.1, respectively. Calculate Prob[S ≤ $7000 | N ≤ 2]. 3.7 (CAS3, 5/04, Q.37) (2.5 points) An insurance portfolio produces N claims with the following distribution: n P(N = n) 0 0.1 1 0.5 2 0.4 Individual claim amounts have the following distribution: x fX(x) 0 0.7 10 0.2 20 0.1 Individual claim amounts and claim counts are independent. Calculate the probability that the ratio of aggregate claim amounts to expected aggregate claim amounts will exceed 4. A. Less than 3% B. At least 3%, but less than 7% C. At least 7%, but less than 11% D. At least 11%, but less than 15% E. At least 15%
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 69
Solutions to Problems: 3.1. The severity distribution is: f(100) = .8 and f(200) = .2. f*f is: 200@64%, 300@32%, 400@4%, since theYpossible sums of two claims are: 100 200
100
200
200 300
300 400
with the corresponding probabilities:
0.8 0.2
0.8
0.2
64% 16%
16% 4%
f*f *f = f*(f*f) is:
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%, since the possible sums of three claims are: Y 100 200
200
300
400
300 400
400 500
500 600
0.64
0.32
0.04
51.2% 12.8%
25.6% 6.4%
3.2% 0.8%
with the corresponding probabilities: 0.8 0.2
The aggregate distribution is Σ Prob(N = n) f*n . n
0 0.00
1 0.60
2 0.30
3 0.10
x
f*0
f
f*f
f*f*f
0 100 200 300 400 500 600
1
Sum
1
0.8 0.2
1
0.64 0.32 0.04
1
Aggregate Distribution
0.512 0.384 0.096 0.008
0.0000 0.4800 0.3120 0.1472 0.0504 0.0096 0.0008
1
1.0000
For example, the probability that the aggregate distribution is 300 is: (.3)(.32) + (.1)(.512) = 14.72%. The aggregate distribution is: 100@48%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%. Comment: One could instead use semi-organized reasoning. For example, the aggregate can be 300 if either one has 2 claims of sizes 100 and 200, or one has 3 claims each of size 100. This has probability of: (30%)(2)(80%)(20%) + (10%)(80%)(80%)(80%) = 14.72%.
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 70
3.2. C. For the Geometric with β = 2: f(0) = 1/3, f(1) = 2f(0)/3 = 2/9, f(2) = 2f(1)/3 = 4/27, f(3) = 2f(2)/3 = 8/81. The ways in which the aggregate is ≤ 3: 0 claims: 1/3 = .3333. 1 claim: 2/9 = .2222. 2 claims of sizes 1 & 1, 1 & 2, or 2 & 1: (3/4)(4/27) = 1/9 = .1111. 3 claims of sizes 1 & 1 & 1: (1/8)(8/81) =1/81 = .0123. Distribution of aggregate at 3 is: .3333 + .2222 + .1111 + .0123 = 0.679. Alternately, using convolutions: n Geometric
0
1
2
3
0.3333
0.2222
0.1481
0.0988
x
f*0
f
f*f
f*f*f
0 1 2 3
1
0.50 0.50
0.250 0.500
0.1250
Aggregate Distribution
0.3333 0.1111 0.1481 0.0864
Distribution of aggregate at 3 is: .3333 + .1111 + .1481 + .0864 = 0.679. Comment: Similar to but easier than 3, 11/02, Q.36. One could also use the Panjer Algorithm (Recursive Method). 3.3. A. For the Binomial with m = 2 and q = .7: f(0) = .32 = .09, f(1) = (2)(.3)(.7) = .42, f(2) = 0.72 = .49. For a Geometric with β = 1, f(0) = 1/2, f(1) = 1/4, f(2) = 1/8. The number of claims with 2 accidents is the sum of two independent Geometrics, which is a Negative Binomial with r = 2 and β =1, with: f(0) = 1/(1+ β)r = 1/4. f(1) = rβ/(1+ β)r+1 = 1/4. f(2) = {r(r+1)/2}β2 /(1+ β)r+2 = 3/16. Using convolutions: n Poisson
0
1
2
0.09
0.42
0.49
x
f*0
f
f*f
0 1 2
1
0.5000 0.2500 0.1250
0.2500 0.2500 0.1875
Compound Distribution 0.4225 0.2275 0.1444
Prob[2 or fewer claims] = .4225 + .2275 + .1444 = 0.7944. Comment: One could instead use the Panjer Algorithm (Recursive Method).
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
3.4. E. For the Poisson with λ = 2: f(0) = e-2 = 0.1353, f(1) = 2f(0) = 0.2707, f(2) = 2f(1)/2 = 0.2707, f(3) = 2f(2)/3 = 0.1804, f(4) = 2f(3)/4 = 0.0902. Using convolutions: n Poisson
0
1
2
3
4
0.1353
0.2707
0.2707
0.1804
0.0902
x
f*0
f
f*f
f*f*f
f*f*f*f
0 1 2 3 4
1
0.3333 0.3333 0.3333
0.1111 0.2222 0.3333
0.0370 0.1111
0.0123
Aggregate Distribution 0.1353 0.0902 0.1203 0.1571 0.1114
Prob[Aggregate = 4] = Prob[2 claims]Prob[2 claims sum to 4] + Prob[3 claims]Prob[3 claims sum to 4] + Prob[4 claims]Prob[4 claims sum to 4] = (0.2707)(0.3333) + (0.1804)(0.1111) + (0.0902)(0.0123) = 0.1114. Comment: One could instead use the Panjer Algorithm (Recursive Method).
Page 71
2013-4-3,
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Page 72
3.5. The possible sums of the numbers of claims for 2 accidents is: 0 1 2 3
0
1
2
3
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
0.3
0.4
0.2
0.1
9% 12% 6% 3%
12% 16% 8% 4%
6% 8% 4% 2%
3% 4% 2% 1%
With the corresponding Probabilities ofprobabilities: 0.3 0.4 0.2 0.1
f*f = 0@9%, 1@24%, 2@28%,3@22%, 4@12%, 5@4%, 6@1%. The possible sums of the numbers of claims for 3 accidents is: 0 1 2 3 4 5 6
0
1
2
3
0 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
0.3
0.4
0.2
0.1
2.7% 7.2% 8.4% 6.6% 3.6% 1.2% 0.3%
3.6% 9.6% 11.2% 8.8% 4.8% 1.6% 0.4%
1.8% 4.8% 5.6% 4.4% 2.4% 0.8% 0.2%
0.9% 2.4% 2.8% 2.2% 1.2% 0.4% 0.1%
With the corresponding Probabilities ofprobabilities: 0.09 0.24 0.28 0.22 0.12 0.04 0.01
f*f*f =
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected]%,
[email protected],
[email protected]%.
2013-4-3, n
Aggregate Distributions §3 Using Convolutions,
HCM 10/23/12,
Compound Distribution
0
1
2
3
0.5
0.2
0.2
0.1
x
f*0
f
f*f
f*f*f
0 1 2 3 4 5 6 7 8
1
0.30 0.40 0.20 0.10
0.09 0.24 0.28 0.22 0.12 0.04 0.01
0.027 0.108 0.198 0.235 0.204 0.132 0.065 0.024 0.006 0.001
0.5807 0.1388 0.1158 0.0875 0.0444 0.0212 0.0085 0.0024 0.0006 0.0001
1
1
1
1
1
9 sum
For example, (0.2)(0.1) + (0.2)(0.22) + (0.1)(0.235) = 0.0875.
3.6. f*f(4) = Prob[1st claim = 2]Prob[2nd claim = 2] = 0.6² = 0.36.
f*f(7) = Prob[1st claim = 2]Prob[2nd claim = 5] + Prob[1st claim = 5]Prob[2nd claim = 2] = (2)(0.6)(0.3) = 0.36.
f*f(10) = Prob[1st claim = 5]Prob[2nd claim = 5] = 0.3² = 0.09.
The aggregate distribution is Σ Prob(N = n) f*n. Given that N ≤ 2, we need only calculate the first three terms of that sum.

n:        0        1        2
Poisson:  0.5488   0.3293   0.0988

x     f*0   f      f*f    Aggregate Distribution
0     1                   0.5488
1                         0.0000
2           0.6           0.1976
3                         0.0000
4                  0.36   0.0356
5           0.3           0.0988
6                         0.0000
7                  0.36   0.0356
8                         0.0000
9                         0.0000
10          0.1    0.09   0.0418
Sum                       0.9581

The probability that N ≤ 2 and the aggregate losses are ≤ 7 is: 0.5488 + 0.1976 + 0.0356 + 0.0988 + 0.0356 = 0.9164.
The probability that N ≤ 2 is 0.5488 + 0.3293 + 0.0988 = 0.9769.
Thus Prob[S ≤ $7000 | N ≤ 2] = Prob[S ≤ $7000 and N ≤ 2] / Prob[N ≤ 2] = 0.9164 / 0.9769 = 0.938.
Comment: If there are more than 3 claims, the aggregate losses are > 7. The chance of three claims all of size 2 is (e^(-0.6) 0.6³/6)(0.6³) = 0.0043. Thus the unconditional probability that S ≤ 7 is 0.9164 + 0.0043 = 0.9207.
3.7. A. Mean Frequency is 1.3. Mean severity is 4. Mean Aggregate is: (1.3)(4) = 5.2.
Prob[Agg > (4)(5.2) = 20.8] = Prob[Agg ≥ 30]. The aggregate is ≥ 30 if there are two claims of sizes: 10 and 20, 20 and 10, or 20 and 20.
Prob[Agg ≥ 30] = (0.4){(2)(0.2)(0.1) + 0.1²} = 2%.
Section 4, Generating Functions34

There are a number of different generating functions, with similar properties. On this exam, the Probability Generating Function (p.g.f.)35 is used for working with frequency distributions. On this exam, the Moment Generating Function (m.g.f.) and the Probability Generating Function are used for working with aggregate distributions. Other generating functions include: the Characteristic Function, the Laplace Transform, and the Cumulant Generating Function.

Name                               Symbol    Formula
Probability Generating Function    PX(t)     E[t^x] = MX(ln(t))
Moment Generating Function         MX(t)     E[e^(tx)] = PX(e^t)
Characteristic Function            ϕX(t)     E[e^(itx)] = MX(it)
Laplace Transform                  LX(t)     E[e^(-tx)]
Cumulant Generating Function       ψX(t)     ln MX(t) = ln E[e^(tx)]

Moment Generating Functions:36

The moment generating function is defined as MX(t) = E[e^(tx)]. The moment generating function for a continuous loss distribution with support from 0 to ∞ is given by:37

M(t) = E[e^(xt)] = ∫0^∞ f(x) e^(xt) dx.

34 See Section 3.3 of Loss Models. Also see page 38 of Actuarial Mathematics, not on the Syllabus.
35 See “Mahlerʼs Guide to Frequency Distributions.”
36 Moment Generating Functions are used in the study of Aggregate Distributions and Continuous Time Ruin Theory. Continuous Time Ruin Theory is not on the syllabus. See either Chapter 13 of Actuarial Mathematics or Chapter 11 of Loss Models.
37 In general the integral goes over the support of the probability distribution. In the case of discrete distributions, one substitutes summation for integration.
For example, for the Exponential distribution:
M(t) = ∫0^∞ f(x) e^(xt) dx = ∫0^∞ {e^(-x/θ)/θ} e^(xt) dx = (1/θ) e^(x(t - 1/θ)) / (t - 1/θ), evaluated from x = 0 to ∞, = 1/(1 - θt), for t < 1/θ.

Exercise: What is the moment generating function for a uniform distribution on [3, 8]?
[Solution: M(0) = E[e^(0x)] = E[1] = 1.
M(t) = E[e^(xt)] = ∫3^8 (1/5) e^(xt) dx = (1/5) e^(xt) / t, evaluated from x = 3 to x = 8, = (e^(8t) - e^(3t))/(5t), for t ≠ 0.]

The Moment Generating Functions of severity distributions, when they exist, are given in Appendix A of Loss Models. The Probability Generating Functions of frequency distributions are given in Appendix B of Loss Models. M(t) = P(e^t).
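As a quick sanity check (not part of the original text), the uniform-on-[3, 8] m.g.f. just derived can be compared against a simulated value of E[e^(tX)]; the value t = 0.2 is an arbitrary choice.

    import math, random

    random.seed(1)
    t = 0.2
    n = 200_000
    sim = sum(math.exp(t * random.uniform(3.0, 8.0)) for _ in range(n)) / n
    closed_form = (math.exp(8 * t) - math.exp(3 * t)) / (5 * t)
    print(round(sim, 3), round(closed_form, 3))    # both should be near 3.13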
Table of Moment Generating Functions

Distribution38        Parameters   Moment Generating Function            Support of M.G.F.
Uniform on [a, b]     a, b         (e^(bt) - e^(at)) / {t(b - a)}39
Normal                µ, σ         exp[µt + σ²t²/2]
Exponential           θ            1/(1 - θt)                            t < 1/θ
Gamma                 α, θ         (1 - θt)^(-α)                         t < 1/θ
Inverse Gaussian      µ, θ         exp[(θ/µ)(1 - √(1 - 2tµ²/θ))]         t < θ/(2µ²)
Bernoulli             q            qe^t + 1 - q
Binomial              q, m         (qe^t + 1 - q)^m
Poisson               λ            exp[λ(e^t - 1)]
Geometric             β            1/{1 - β(e^t - 1)}                    e^t < (1+β)/β
Negative Binomial     r, β         {1 - β(e^t - 1)}^(-r)                 e^t < (1+β)/β

Exercise: Assume X is Normally Distributed with parameters µ = 2 and σ = 3. What is E[e^(tX)]?
[Solution: If X is Normally distributed with parameters µ = 2 and σ = 3, then tX is Normally distributed with parameters µ = 2t and σ = 3t. ⇒ e^(tX) is LogNormally distributed with µ = 2t and σ = 3t.
E[e^(tX)] = mean of a LogNormal Distribution = exp[2t + (3t)²/2] = exp[2t + 4.5t²].]
X is Normal(µ, σ) ⇒ tX is Normal(tµ, tσ) ⇒ e^(tX) is LogNormal(tµ, tσ).
X is Normal(µ, σ) ⇒ MX(t) ≡ E[e^(xt)] = mean of LogNormal(tµ, tσ) = exp[µt + σ²t²/2].40
The Moment Generating Function of a Normal Distribution is: M(t) = exp[tµ + t²σ²/2].

38 As per Loss Models.
39 M(0) = 1.
40 The mean of a LogNormal Distribution is: exp[(first parameter) + (second parameter)²/2].
Discrete Distributions:

For a discrete distribution, we substitute summation for integration. For example, for the Poisson Distribution the m.g.f. is determined as follows:
M(t) = E[e^(xt)] = Σx=0..∞ (e^(-λ) λ^x / x!) e^(tx) = e^(-λ) Σx=0..∞ (λe^t)^x / x! = e^(-λ) exp[λe^t] = exp[λ(e^t - 1)].

Exercise: Severity is 300 with probability 60% and 700 with probability 40%. What is the moment generating function for severity?
[Solution: M(t) = E[e^(xt)] = 0.6e^(300t) + 0.4e^(700t).]

Relation of Moment and Probability Generating Functions:

M(t) = E[e^(xt)] = E[(e^t)^x] = P(e^t). Thus one can write the Moment Generating Function in terms of the Probability Generating Function, M(t) = P(e^t).41
For example, for the Poisson Distribution, P(t) = exp[λ(t-1)]. Therefore, M(t) = P(e^t) = exp[λ(e^t - 1)].
On the other hand, if one knows the Moment Generating Function, one can get the Probability Generating Function as: P(t) = M(ln(t)).

Exercise: What is the Moment Generating Function of a Negative Binomial Distribution as per Loss Models?
[Solution: As shown in Appendix B.2.1.4 of Loss Models, for the Negative Binomial Distribution: P(t) = {1 - β(t - 1)}^(-r). Thus M(t) = P(e^t) = {1 - β(e^t - 1)}^(-r).
Comment: Instead, one could calculate E[e^(xt)] for a Geometric Distribution as 1/{1 - β(e^t - 1)}, and since a Negative Binomial is a sum of r independent Geometrics, M(t) = {1 - β(e^t - 1)}^(-r).]

41 The Probability Generating Functions of frequency distributions are given in Appendix B of Loss Models. The Moment Generating Functions of severity distributions, when they exist, are given in Appendix A of Loss Models.
Properties of Moment Generating Functions:

The Moment Generating Function has useful properties. For example, for X and Y independent variables, the moment generating function of their sum is:
MX+Y(t) = E[e^((X+Y)t)] = E[e^(Xt) e^(Yt)] = E[e^(Xt)] E[e^(Yt)] = MX(t) MY(t).
The moment generating function of the sum of two independent variables is the product of their moment generating functions: MX+Y(t) = MX(t) MY(t).

Exercise: X and Y are each Exponential with mean 25. X and Y are independent. What is the m.g.f. of their sum?
[Solution: X and Y each have m.g.f.: 1/(1 - 25t). Thus the m.g.f. of their sum is: 1/(1 - 25t)².
Comment: This is the m.g.f. of a Gamma Distribution with θ = 25 and α = 2.]

Exercise: X follows an Inverse Gaussian Distribution with µ = 10 and θ = 8. Y follows an Inverse Gaussian Distribution with µ = 5 and θ = 2. X and Y are independent. What is the m.g.f. of their sum?
[Solution: The m.g.f. of an Inverse Gaussian Distribution is: M(t) = exp[(θ/µ){1 - √(1 - 2µ²t/θ)}]. Therefore, MX(t) = exp[0.8{1 - √(1 - 25t)}] and MY(t) = exp[0.4{1 - √(1 - 25t)}]. MX+Y(t) = MX(t) MY(t) = exp[1.2{1 - √(1 - 25t)}].
Comment: This is the m.g.f. of another Inverse Gaussian with µ = 15 and θ = 18.]

Since the moment generating function of the sum of two independent variables is the product of their moment generating functions, the Moment Generating Function converts convolution into multiplication: Mf*g = Mf Mg.
The Moment Generating Function for a sum of independent variables is the product of the Moment Generating Functions of each of the variables. Of particular importance for working with aggregate losses, the sum of n independent, identically distributed variables has the Moment Generating Function taken to the power n. The m.g.f. of f*n is the nth power of the m.g.f. of f.42

42 Using characteristic functions rather than Moment Generating Functions, this is the key idea behind the Heckman-Meyers algorithm, not on the Syllabus. The Robertson algorithm, not on the Syllabus, relies on the similar properties of the Fast Fourier Transform. See Section 9.8 of Loss Models.
For example, MX+X+X(t) = MX(t)³. Thus the Moment Generating Function for a sum of 3 independent, identical, Exponential variables is M(t) = {1/(1 - tθ)}³, for t < 1/θ, the moment generating function of a Gamma Distribution with α = 3.

Exercise: What is the m.g.f. for the sum of 5 independent Exponential Distributions, each with mean 17?
[Solution: M(t) = {1/(1 - 17t)}⁵.]

Adding a constant to a variable multiplies the m.g.f. by e to the power of that constant times t: MX+b(t) = E[e^((x+b)t)] = e^(bt) E[e^(xt)] = e^(bt) MX(t).
Multiplying a variable by a constant gives an m.g.f. that is the original m.g.f. at t times that constant: McX(t) = E[e^(cxt)] = MX(ct).

Exercise: There is uniform inflation of 4% between 2001 and 2002. What is the m.g.f. of the severity distribution in 2002 in terms of that in 2001?
[Solution: y = 1.04x, therefore, M2002(t) = M2001(1.04t).]

For example, if losses in 2001 follow a Gamma Distribution with α = 2 and θ = 1000, then in 2001 M(t) = (1 - 1000t)^(-2). If there is uniform inflation of 4% between 2001 and 2002, then in 2002 the m.g.f. is: {1 - 1000(1.04)t}^(-2) = (1 - 1040t)^(-2), which is that of a Gamma Distribution with α = 2 and θ = 1040.

Exercise: What is the m.g.f. for the average of 5 independent Exponential Distributions, each with mean 17?
[Solution: Their average is their sum multiplied by 1/5. MY/5(t) = MY(t/5). Therefore, the m.g.f. of the average is the m.g.f. of the sum at t/5: {1/(1 - 17t/5)}⁵.]

In general, the Moment Generating Function of the average of n independent identically distributed variables is the nth power of the Moment Generating Function at t/n. For example, the average of n independent Geometric Distributions, each with parameter β and each with Moment Generating Function (1 - β(e^t - 1))^(-1), has m.g.f.: (1 - β(e^(t/n) - 1))^(-n).
In addition, the moment generating function determines the distribution, and vice-versa. Therefore, one can take limits of a distribution by instead taking limits of the Moment Generating Function.
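The product rule for sums can also be checked numerically. Below is a small simulation sketch (not from the original text): two independent Exponentials with mean 25 are added, and E[e^(tZ)] is compared with 1/(1 - 25t)², the Gamma m.g.f. noted in the exercise above; t = 0.01 is an arbitrary choice within the region of convergence.

    import math, random

    random.seed(7)
    t, n = 0.01, 300_000
    sim = sum(math.exp(t * (random.expovariate(1/25) + random.expovariate(1/25)))
              for _ in range(n)) / n
    print(round(sim, 3), round(1 / (1 - 25 * t)**2, 3))    # both should be near 1.78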
Exercise: Use moment generating functions to take the limit of Negative Binomial Distributions, such that rβ = 7 as β → 0.
[Solution: The moment generating function of a Negative Binomial is: (1 - β(e^t - 1))^(-r), which for r = 7/β is: (1 - β(e^t - 1))^(-7/β).
ln[(1 - β(e^t - 1))^(-7/β)] = -(7/β) ln(1 - β(e^t - 1)) ≅ -(7/β)(-β(e^t - 1)) = 7(e^t - 1).
Thus the limit as β → 0 of the m.g.f. is exp[7(e^t - 1)], which is the m.g.f. of a Poisson with mean 7. Thus the limit of these Negative Binomials is a Poisson with the same mean.]

Moments and the Moment Generating Function:

The moments of the distribution can be obtained as the derivatives of the moment generating function at zero, by reversing the order of integration and the taking of the derivative.
M′(s) = ∫ f(x) x e^(xs) dx. M′(0) = ∫ f(x) x dx = E[X].
M′′(s) = ∫ f(x) x² e^(xs) dx. M′′(0) = ∫ f(x) x² dx = E[X²].

M(0) = E[X⁰] = 1
M′(0) = E[X]
M′′(0) = E[X²]
M′′′(0) = E[X³]
M^(n)(0) = E[X^n]

For example, for the Gamma Distribution: M(t) = (1 - tθ)^(-α).
M′(t) = θα(1 - tθ)^(-(α+1)). M′(0) = αθ = mean.
M′′(t) = θ²α(α+1)(1 - tθ)^(-(α+2)). M′′(0) = θ²α(α+1) = second moment.
M′′′(t) = θ³α(α+1)(α+2)(1 - tθ)^(-(α+3)). M′′′(0) = θ³α(α+1)(α+2) = third moment.
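The derivative formulas above lend themselves to a quick numerical check (not part of the original text): approximate M′(0) and M′′(0) by finite differences for a Gamma m.g.f. with arbitrarily chosen α = 3 and θ = 100, and compare with αθ = 300 and α(α+1)θ² = 120,000.

    alpha, theta, h = 3.0, 100.0, 1e-5

    def M(t):
        return (1.0 - theta * t) ** (-alpha)

    mean = (M(h) - M(-h)) / (2 * h)                  # central difference for M'(0)
    second = (M(h) - 2 * M(0.0) + M(-h)) / h**2      # central difference for M''(0)
    print(round(mean, 1), round(second, 0))          # expect about 300.0 and about 120000.0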
Exercise: A distribution has m.g.f. M(t) = exp[11t + 27t²]. What are the mean and variance of this distribution?
[Solution: M′(t) = M(t)(11 + 54t). Mean = M′(0) = 11. M′′(t) = M′(t)(11 + 54t) + 54M(t). 2nd moment = M′′(0) = (11)(11) + 54 = 175. Variance = 175 - 11² = 54.]

Moment Generating Function as a Power Series:

One way to remember the relationship between the m.g.f. and the moments is to expand the exponential into a power series:
M(t) = E[e^(xt)] = E[Σn=0..∞ (xt)^n / n!] = Σn=0..∞ E[X^n] t^n / n! = Σ (nth moment) t^n / n!.
So the nth moment of the distribution is the term multiplying t^n / n! in the power series representation of its m.g.f., M(t).
For example, the power series for 1/(1 - y) is Σn=0..∞ y^n, while the m.g.f. of an Exponential Distribution is: M(t) = 1/(1 - θt). Therefore,
M(t) = 1/(1 - θt) = Σn=0..∞ (θt)^n = Σn=0..∞ n! θ^n t^n / n!.
Therefore, the nth moment of an Exponential is n! θ^n.
When one differentiates n times the power series for M(t), the first n terms vanish, dⁿ(t^n / n!)/dtⁿ = 1, and the remaining terms all still have powers of t, which will vanish when we set t = 0. Therefore,
M^(n)(0) = dⁿ[Σi=0..∞ E[X^i] t^i / i!]/dtⁿ at t equal to zero = nth moment.
The Moments of a Negative Binomial Distribution:

As discussed previously, the moment generating function of a Negative Binomial Distribution is M(t) = P(e^t) = {1 - β(e^t - 1)}^(-r).

Exercise: Using its moment generating function, determine the first four moments of a Negative Binomial Distribution.
[Solution: Mʼ(t) = rβe^t / {1 - β(e^t - 1)}^(r+1). First Moment = Mʼ(0) = rβ.
Mʼʼ(t) = Mʼ(t) + r(r+1)β²e^(2t) / {1 - β(e^t - 1)}^(r+2). Second Moment = Mʼʼ(0) = rβ + r(r+1)β².
Mʼʼʼ(t) = Mʼʼ(t) + 2r(r+1)β²e^(2t) / {1 - β(e^t - 1)}^(r+2) + r(r+1)(r+2)β³e^(3t) / {1 - β(e^t - 1)}^(r+3).
Third Moment = Mʼʼʼ(0) = rβ + 3r(r+1)β² + r(r+1)(r+2)β³.
Mʼʼʼʼ(t) = Mʼʼʼ(t) + 4r(r+1)β²e^(2t) / {1 - β(e^t - 1)}^(r+2) + 2r(r+1)(r+2)β³e^(3t) / {1 - β(e^t - 1)}^(r+3) + 3r(r+1)(r+2)β³e^(3t) / {1 - β(e^t - 1)}^(r+3) + r(r+1)(r+2)(r+3)β⁴e^(4t) / {1 - β(e^t - 1)}^(r+4).
Fourth Moment = Mʼʼʼʼ(0) = rβ + 3r(r+1)β² + r(r+1)(r+2)β³ + 4r(r+1)β² + 2r(r+1)(r+2)β³ + 3r(r+1)(r+2)β³ + r(r+1)(r+2)(r+3)β⁴ = rβ + 7r(r+1)β² + 6r(r+1)(r+2)β³ + r(r+1)(r+2)(r+3)β⁴.]

Exercise: Determine the CV, skewness, and kurtosis of a Negative Binomial Distribution.
[Solution: Variance = rβ + r(r+1)β² - (rβ)² = rβ(1 + β). Coefficient of Variation = √{rβ(1 + β)} / (rβ) = √{(1 + β)/(rβ)}.
Third Central Moment = E[X³] - 3E[X]E[X²] + 2E[X]³ = rβ + 3r(r+1)β² + r(r+1)(r+2)β³ - 3(rβ){rβ + r(r+1)β²} + 2(rβ)³ = r(β + 3β² + 2β³) = rβ(1 + β)(1 + 2β).
Skewness = rβ(1 + β)(1 + 2β) / {rβ(1 + β)}^1.5 = (1 + 2β) / √{rβ(1 + β)}.
Fourth Central Moment = E[X⁴] - 4E[X]E[X³] + 6E[X]²E[X²] - 3E[X]⁴ = rβ + 7r(r+1)β² + 6r(r+1)(r+2)β³ + r(r+1)(r+2)(r+3)β⁴ - 4(rβ){rβ + 3r(r+1)β² + r(r+1)(r+2)β³} + 6(rβ)²{rβ + r(r+1)β²} - 3(rβ)⁴ = rβ{1 + 7β + 12β² + 6β³ + 3rβ(1 + β)²}.
Kurtosis = rβ{1 + 7β + 12β² + 6β³ + 3rβ(1 + β)²} / {rβ(1 + β)}² = 3 + {6β² + 6β + 1}/{(1 + β)rβ}.]
Therefore, for the Negative Binomial Distribution:
Skewness = {3 Variance - 2 mean + 2(Variance - mean)²/mean} / Variance^1.5.43

43 See equation 6.7.8 in Risk Theory by Panjer and Willmot.
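The skewness formula just derived can be verified by brute force (not part of the original text): sum the Negative Binomial probability function directly and compare. The parameters r = 2.5 and β = 1.5 are arbitrary choices.

    import math

    r, beta = 2.5, 1.5

    def nb_pf(k):
        # P(N = k) = Gamma(r+k) / (Gamma(r) k!) * beta^k / (1+beta)^(r+k), computed in log space
        logp = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                + k * math.log(beta) - (r + k) * math.log(1 + beta))
        return math.exp(logp)

    mean = sum(k * nb_pf(k) for k in range(400))
    m2 = sum(k**2 * nb_pf(k) for k in range(400))
    m3 = sum(k**3 * nb_pf(k) for k in range(400))
    var = m2 - mean**2
    skew = (m3 - 3 * mean * m2 + 2 * mean**3) / var**1.5
    print(round(skew, 6))                                               # from the probabilities
    print(round((1 + 2*beta) / math.sqrt(r * beta * (1 + beta)), 6))    # from the formula above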
Calculating the Moment Generating Function of an Inverse Gaussian Distribution:

Exercise: What is the integral ∫0^∞ x^(-3/2) exp[-(a²x + b²/x)] dx?
Hint: An Inverse Gaussian density integrates to unity from zero to infinity.
[Solution: The density of an Inverse Gaussian with parameters µ and θ is:
f(x) = √{θ/(2π)} x^(-1.5) exp[-θ(x/µ - 1)²/(2x)] = √{θ/(2π)} x^(-1.5) exp[-θx/(2µ²) + θ/µ - θ/(2x)].
Let a² = θ/(2µ²) and b² = θ/2, then θ = 2b² and µ = b/a. Then
f(x) = √(b²/π) x^(-1.5) exp[-a²x + 2ba - b²/x] = e^(2ba) (b/√π) x^(-1.5) exp[-a²x - b²/x].
Since this integrates to unity: (e^(2ba) b/√π) ∫0^∞ x^(-1.5) exp[-a²x - b²/x] dx = 1. ⇒
∫0^∞ x^(-3/2) exp[-(a²x + b²/x)] dx = √π e^(-2ba) / b.
This is a special case of a Modified Bessel Function of the Third Kind, K-.5. See for example, Appendix C of Insurance Risk Models by Panjer & Willmot.]

Exercise: Calculate the Moment Generating Function for an Inverse Gaussian Distribution with parameters µ and θ.
Hint: Use the result of the previous exercise.
[Solution: The Moment Generating Function is the expected value of e^(zx).
M(z) = ∫ e^(zx) f(x) dx = ∫ e^(zx) √{θ/(2π)} x^(-1.5) exp[-θx/(2µ²) + θ/µ - θ/(2x)] dx
= e^(θ/µ) √{θ/(2π)} ∫ x^(-1.5) exp[-{θ/(2µ²) - z}x - θ/(2x)] dx = e^(θ/µ) √{θ/(2π)} {√π e^(-2ba) / b}
= e^(θ/µ) exp[-(θ/µ)√(1 - 2zµ²/θ)] = exp[(θ/µ)(1 - √(1 - 2zµ²/θ))].
Where we have used the result of the previous exercise with a² = θ/(2µ²) - z and b² = θ/2. The former requires that z < θ/(2µ²).
Note that ba = √(b²a²) = √{(θ/2)(θ/(2µ²) - z)} = (θ/(2µ))√(1 - 2zµ²/θ).]
An Example, Calculating the Skewness of an Inverse Gaussian Distribution:

Let's see how the Moment Generating Function could be used to determine the skewness of an Inverse Gaussian Distribution. We can use the Moment Generating Function of the Inverse Gaussian to calculate its moments.44
M(t) = exp[(θ/µ){1 - √(1 - 2tµ²/θ)}]. M(0) = 1.
M'(t) = M(t) µ/√(1 - 2tµ²/θ). M'(0) = µ = mean.
M''(t) = M'(t) µ/√(1 - 2tµ²/θ) + M(t)(µ³/θ)(1 - 2tµ²/θ)^(-3/2). M''(0) = µ² + µ³/θ = second moment. Variance = µ³/θ.
M'''(t) = M''(t) µ/√(1 - 2tµ²/θ) + 2M'(t)(µ³/θ)(1 - 2tµ²/θ)^(-3/2) + M(t)(3µ⁵/θ²)(1 - 2tµ²/θ)^(-5/2).
M'''(0) = (µ² + µ³/θ)µ + 2µ(µ³/θ) + 3µ⁵/θ² = µ³ + 3(µ⁴/θ)(1 + µ/θ) = third moment.

Exercise: What is the coefficient of variation of an Inverse Gaussian Distribution with µ and θ?
[Solution: √Variance / Mean = (µ^1.5/θ^0.5)/µ = √(µ/θ).]

Exercise: What is the skewness of an Inverse Gaussian Distribution with parameters µ and θ?
[Solution: Third Central Moment = µ³ + 3(µ⁴/θ)(1 + µ/θ) - 3µ(µ² + µ³/θ) + 2µ³ = 3µ⁵/θ².
Skewness = Third Central Moment / (Variance)^1.5 = (3µ⁵/θ²)/(µ³/θ)^1.5 = 3√(µ/θ).]

Thus we see that the skewness of the Inverse Gaussian Distribution, 3√(µ/θ), is always three times its coefficient of variation, √(µ/θ). In contrast, for the Gamma Distribution its skewness of 2/√α is always twice its coefficient of variation of 1/√α.

Existence of Moment Generating Functions:

Moment Generating Functions only exist for distributions all of whose moments exist. Thus for example, the Pareto does not have all of its moments, so that its Moment Generating Function does not exist. If a distribution is short tailed enough for all of its moments to exist, then its moment generating function may or may not exist. While the LogNormal Distribution has all of its moments exist, its Moment Generating Function does not exist.45 The Moment Generating Function of the Transformed Gamma, or its special case the Weibull, only exists if τ ≥ 1.46

44 One can instead use the cumulant generating function, ln M(z), to get the cumulants. See for example, Kendall's Advanced Theory of Statistics, page 412.
45 The LogNormal is the heaviest-tailed distribution all of whose moments exist.
46 While if τ > 1, the m.g.f. of the Transformed Gamma exists, the m.g.f. is not a well known function.
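If symbolic software is available, the Inverse Gaussian derivatives computed above can be checked mechanically. The following sympy sketch is not part of the original text and assumes sympy is installed; it recovers the variance and skewness from the m.g.f.

    import sympy as sp

    t, mu, theta = sp.symbols('t mu theta', positive=True)
    M = sp.exp((theta / mu) * (1 - sp.sqrt(1 - 2 * t * mu**2 / theta)))

    m1 = sp.diff(M, t, 1).subs(t, 0)
    m2 = sp.diff(M, t, 2).subs(t, 0)
    m3 = sp.diff(M, t, 3).subs(t, 0)

    variance = sp.simplify(m2 - m1**2)
    skewness = sp.simplify((m3 - 3*m1*m2 + 2*m1**3) / variance**sp.Rational(3, 2))
    print(variance)    # expect mu**3/theta (possibly in an equivalent form)
    print(skewness)    # expect 3*sqrt(mu/theta) (possibly in an equivalent form)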
The moments of the distribution can be obtained as the derivatives of the moment generating function at zero. Thus if the Moment Generating Function exists (within an interval around zero) then so do all the moments. However the converse is not true.
As discussed previously, the moment generating function when it exists can be written as a power series in t, where E[X^n] is the nth moment about the origin of the distribution:
M(t) = Σn=0..∞ E[X^n] t^n / n!.
In order for the moment generating function to converge, in an interval around zero, the moments E[X^n] may not grow too quickly as n gets large. For example, the LogNormal Distribution has moments: E[X^n] = exp[nµ + 0.5n²σ²] = exp[nµ] exp[0.5n²σ²]. Thus in this case E[X^n] t^n / n! = exp[nµ] exp[0.5n²σ²] t^n / n!. Using Stirling's Formula, for large n, n! increases approximately as: n^n e^(-n). Thus E[X^n] t^n / n! increases approximately as: exp[nµ] exp[0.5n²σ²] t^n e^n / n^n. Thus as n increases, ln[E[X^n] t^n / n!] increases approximately as: 0.5σ²n² + n ln(t) - n ln(n) = n{0.5σ²n + ln(t) - ln(n)}. Since n increases more quickly than ln(n), this expression approaches infinity as n approaches infinity. Thus so does E[X^n] t^n / n!. Since the terms of the power series go to infinity, the sum does not converge.47 Thus for the LogNormal Distribution the Moment Generating Function fails to exist.
In general, the Moment Generating Function of a distribution exists if and only if the distribution has a tail which is "exponentially bounded."48 A distribution is exponentially bounded if for some K > 0, c > 0 and for all x: 1 - F(x) ≤ Ke^(-cx). In other words, the survival function has to decline at least exponentially.
For example, for the Weibull Distribution the survival function is exp(-cx^τ). For τ > 1 this survival function declines faster than e^(-x) and thus the Weibull is exponentially bounded. For τ < 1 this survival function declines slower than e^(-x) and thus the Weibull is not exponentially bounded.49 Thus the Weibull for τ > 1 has a Moment Generating Function, while for τ < 1 it does not.50

47 In fact, in order for the power series to converge the terms have to decline faster than 1/n.
48 See page 186 of Adventures in Stochastic Processes by Sidney L. Resnick.
49 For τ = 1 one has an Exponential Distribution which is exponentially bounded and its m.g.f. exists.
50 The Transformed Gamma has the same behavior as the Weibull; for τ > 1 the Moment Generating Function exists and the distribution is lighter-tailed than τ < 1 for which the Moment Generating Function does not exist. For a Transformed Gamma with τ = 1, one gets a Gamma, for which the m.g.f. exists.
Characteristic Function:

The Characteristic Function is defined as ϕX(z) = E[e^(izx)] = MX(iz) = PX(e^(iz)).51
The Characteristic Function has the advantage that it exists for all z. However, Characteristic Functions involve complex variables:
ϕX(z) = E[e^(izx)] = E[cos(zx) + i sin(zx)] = E[cos(zx)] + i E[sin(zx)].
One can obtain most of the same useful results using either Moment Generating Functions, Probability Generating Functions, or Characteristic Functions.

Cumulant Generating Function:

The Cumulant Generating Function is defined as the natural log of the Moment Generating Function.52 ψX(t) = ln MX(t) = ln E[e^(tx)].
The cumulants are then obtained from the derivatives of the Cumulant Generating Function at zero. The first cumulant is the mean. ψʼ(0) = E[X]. The 2nd and 3rd cumulants are equal to the 2nd and 3rd central moments.53 Thus one can obtain the variance as ψʼʼ(0). ψʼʼ(0) = Var[X].54
d²(ln MX(t))/dt² at t = 0 = Var[X].
d³(ln MX(t))/dt³ at t = 0 = 3rd central moment of X.
Cumulants of independent variables add. Thus for X and Y independent the 2nd and 3rd central moments add.55

Exercise: What is the cumulant generating function of an Inverse Gaussian Distribution?
[Solution: M(t) = exp[(θ/µ){1 - √(1 - 2µ²t/θ)}]. ψ(t) = ln M(t) = (θ/µ){1 - √(1 - 2µ²t/θ)}.]

Exercise: Use the cumulant generating function to determine the variance of an Inverse Gaussian Distribution.
[Solution: ψʼ(t) = (θ/µ)(µ²/θ)(1 - 2tµ²/θ)^(-0.5) = µ(1 - 2tµ²/θ)^(-0.5). Mean = ψʼ(0) = µ.
ψʼʼ(t) = µ(µ²/θ)(1 - 2tµ²/θ)^(-1.5). Variance = ψʼʼ(0) = µ³/θ.]

51 See Definition 6.18 and Theorem 6.19 in Loss Models, not on the syllabus.
52 See Kendall's Advanced Theory of Statistics Volume 1, by Stuart and Ord or Practical Risk Theory for Actuaries, by Daykin, Pentikainen and Pesonen.
53 The fourth and higher cumulants are not equal to the central moments.
54 See pages 387 and 403 of Actuarial Mathematics.
55 The 4th central moment and higher central moments do not add.
Exercise: Use the cumulant generating function to determine the skewness of an Inverse Gaussian Distribution.
[Solution: ψʼʼʼ(t) = (µ³/θ)(3µ²/θ)(1 - 2tµ²/θ)^(-2.5). Third Central Moment = ψʼʼʼ(0) = 3µ⁵/θ².
Skewness = Third Central Moment / Variance^1.5 = (3µ⁵/θ²)/(µ³/θ)^1.5 = 3√(µ/θ).]

Aggregate Distributions:

Generating functions are useful for working with the distribution of aggregate losses, when frequency and severity are independent. Let A be Aggregate Losses, X be severity and N be frequency; then the Moment Generating Function of the Aggregate Losses can be written in terms of the p.g.f. of the frequency and m.g.f. of the severity:
MA(t) = E[exp[tA]] = Σ E[exp[tA] | N = n] Prob(N = n) = Σ {E[exp[tx1]] ... E[exp[txn]]} Prob(N = n)
= Σ MX(t)^n Prob(N = n) = EN[MX(t)^N] = PN(MX(t)).

MA(t) = PN(MX(t)) = MN(ln(MX(t))).

Exercise: Frequency is given by a Poisson with mean 7. Frequency and severity are independent. What is the Moment Generating Function for the aggregate losses?
[Solution: As shown in Appendix B.2.1.1 of Loss Models, PN(z) = e^(λ(z-1)) = exp[7(z - 1)].
MA(t) = PN(MX(t)) = exp[7(MX(t) - 1)].]

In general, for any Compound Poisson distribution, MA(t) = exp[λ(MX(t) - 1)].

Exercise: Frequency is given by a Poisson with mean 7. Severity is given by an Exponential with mean 1000. Frequency and severity are independent. What is the Moment Generating Function for the aggregate losses?
[Solution: For the Exponential, MX(t) = 1/(1 - θt) = 1/(1 - 1000t), t < 0.001.
MA(t) = exp[λ(MX(t) - 1)] = exp[7{1/(1 - 1000t) - 1}] = e^((7000t)/(1 - 1000t)), t < 0.001.]

The p.g.f. of the Negative Binomial Distribution is: [1 - β(z-1)]^(-r). Thus for any Compound Negative Binomial distribution, MA(t) = [1 - β(MX(t) - 1)]^(-r), for MX(t) < 1 + 1/β.
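The compound Poisson m.g.f. above, MA(t) = exp[7000t/(1 - 1000t)], can be checked by simulation. The sketch below (not part of the original text) draws Poisson(7) claim counts and Exponential(mean 1000) claim sizes; t = 0.0002 is an arbitrary choice inside the region of convergence.

    import math, random

    random.seed(3)
    t, n_sims = 0.0002, 200_000
    total = 0.0
    for _ in range(n_sims):
        # Poisson(7) count via exponential interarrival times
        n, clock = 0, random.expovariate(1.0)
        while clock < 7.0:
            n += 1
            clock += random.expovariate(1.0)
        agg = sum(random.expovariate(1/1000) for _ in range(n))
        total += math.exp(t * agg)
    print(round(total / n_sims, 2), round(math.exp(7000*t / (1 - 1000*t)), 2))   # both near 5.75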
The probability generating function of the Aggregate Losses can be written in terms of the p.g.f. of the frequency and p.g.f. of the severity:56 PA(t) = PN(PX(t)).

Exercise: What is the p.g.f. of an Exponential distribution?
[Solution: The m.g.f. of the Exponential distribution is: 1/(1 - θt). In general P(t) = M(ln(t)). Thus for an Exponential distribution P(t) = 1/(1 - θ ln(t)).]

Exercise: Frequency is given by a Poisson with mean 7. Severity is given by an Exponential with mean 1000. Frequency and severity are independent. What is the Probability Generating Function for the aggregate losses?
[Solution: The p.g.f. of the Exponential distribution is: P(t) = 1/(1 - 1000 ln(t)), while that for the Poisson is: P(z) = e^(7(z-1)). Thus the p.g.f. of aggregate losses is: exp[7{1/(1 - 1000 ln(t)) - 1}] = exp[7000 ln(t)/(1 - 1000 ln(t))] = t^(7000/(1 - 1000 ln(t))).]

Recall that a compound frequency distribution is mathematically equivalent to an aggregate distribution. Therefore, for a compound frequency distribution, PN(z) = P1(P2(z)).

Exercise: What is the p.g.f. of a compound Geometric-Poisson frequency distribution?
[Solution: For the Geometric primary distribution: P1(t) = 1/{1 - β(t - 1)}. For the Poisson secondary distribution: P2(t) = exp[λ(t - 1)].
PN(t) = P1(P2(t)) = 1/{1 - β(exp[λ(t - 1)] - 1)}.]

Exercise: Frequency is a compound Geometric-Poisson distribution, with β = 3 and λ = 7. Severity is Exponential with mean 10. Frequency and severity are independent. What is the p.g.f. of aggregate losses?
[Solution: The p.g.f. for the frequency is: PN(t) = 1/(1 - 3(exp[7(t - 1)] - 1)). The p.g.f. for the Exponential severity is: PX(t) = 1/(1 - 10 ln(t)).
PA(t) = PN(PX(t)) = 1/(1 - 3(exp[7{1/(1 - 10 ln(t)) - 1}] - 1)) = 1/{1 - 3(exp[70 ln(t)/(1 - 10 ln(t))] - 1)} = 1/(4 - 3t^(70/(1 - 10 ln(t)))).]

In general, for a compound frequency distribution and an independent severity: PA(t) = PN[PX(t)] = P1[P2(PX(t))].

56 This is the same result as for compound frequency distributions; the mathematics are identical. See “Mahlerʼs Guide to Frequency Distributions.”
The Laplace Transform for the aggregate distribution is: LA(z) = E[e^(-zA)] = PN[LX(z)].57
The Characteristic Function for the aggregate distribution is: ϕA(z) = E[e^(izA)] = PN[ϕX(z)].

Mixtures:

Exercise: Assume one has a two point mixture of distributions: H(x) = pF(x) + (1-p)G(x). What is the Moment Generating Function of H?
[Solution: MH(t) = EH[e^(xt)] = pEF[e^(xt)] + (1-p)EG[e^(xt)] = pMF(t) + (1-p)MG(t).]

Thus the Moment Generating Function of a mixture is a mixture of the Moment Generating Functions.58 In particular (1/4) + (3/4){1/(1 - 40t)} is the m.g.f. of a 2-point mixture of a point mass at zero and an Exponential distribution with mean 40, with weights 1/4 and 3/4.

Exercise: What is the m.g.f. of a 60%-40% mixture of Exponentials with means of 3 and 7?
[Solution: 0.6/(1 - 3t) + 0.4/(1 - 7t).]

One can apply these same ideas to continuous mixtures. For example, assume the frequency of each insured is Poisson with parameter λ, with the λ parameters varying across the portfolio via a Gamma distribution with parameters α and θ.59 Then the Moment Generating Function of the frequency distribution for the whole portfolio is the mixture of the individual Moment Generating Functions. For a given value of λ, the Poisson has an m.g.f. of exp[λ(e^t - 1)]. The Gamma density of λ is: f(λ) = λ^(α-1) e^(-λ/θ) θ^(-α) / Γ(α).
∫ exp[λ(e^t - 1)] λ^(α-1) e^(-λ/θ) θ^(-α)/Γ(α) dλ = {θ^(-α)/Γ(α)} ∫ λ^(α-1) exp[-λ(1 + 1/θ - e^t)] dλ = {θ^(-α)/Γ(α)} (1 + 1/θ - e^t)^(-α) Γ(α) = (θ + 1 - θe^t)^(-α) = {1 - θ(e^t - 1)}^(-α).
This is the m.g.f. of a Negative Binomial Distribution, with r = α and β = θ. Therefore, the mixture of Poissons via a Gamma, with parameters α and θ, is a Negative Binomial Distribution, with r = α and β = θ.60

57 See page 205 of Loss Models.
58 This applies equally well to n-point mixtures and continuous mixtures of distributions.
59 This is the well known Gamma-Poisson frequency process.
60 This same result was derived using Probability Generating Functions in “Mahlerʼs Guide to Frequency Distributions.”
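The Gamma mixture of Poissons can be checked by simulation as well. The following sketch (not part of the original text) uses arbitrarily chosen α = 3 and θ = 2 and compares the simulated claim-count frequencies with the Negative Binomial with r = 3 and β = 2.

    import math, random
    from collections import Counter

    random.seed(11)
    n_sims = 200_000
    counts = Counter()
    for _ in range(n_sims):
        lam = random.gammavariate(3, 2)            # Gamma with alpha = 3, theta = 2
        n, clock = 0, random.expovariate(1.0)      # Poisson(lam) via interarrival times
        while clock < lam:
            n += 1
            clock += random.expovariate(1.0)
        counts[n] += 1

    def nb(k, r=3, beta=2):
        return math.comb(r + k - 1, k) * beta**k / (1 + beta)**(r + k)

    for k in range(4):
        print(k, round(counts[k] / n_sims, 4), round(nb(k), 4))   # simulated vs. Negative Binomial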
Policy Modifications:

Let F(x) be the ground up severity distribution. Let PNumberLoss be the probability generating function of the number of losses. Assume there is a deductible, d. Then the expected number of (non-zero) payments is less than the expected number of losses.
The number of (non-zero) payments can be thought of as coming from a compound process. First one generates a random number of losses. Then each loss has S(d) chance of being a nonzero payment, independent of any other loss. This is mathematically equivalent to a compound distribution with secondary distribution that is Bernoulli with q = S(d).61
The probability generating function of this Bernoulli is P(z) = 1 + S(d)(z - 1) = F(d) + S(d)z. Therefore, the probability generating function of this compound situation is: PNumberPayments(z) = PNumberLoss(F(d) + S(d)z).62
With a deductible, the severity distribution is altered.63 The per loss variable is zero with probability F(d) and GPerLoss(y) = F(y + d) for y > 0.
Therefore, MPerLoss(t) = E[e^(ty)] = F(d)e^(t·0) + ∫0^∞ e^(ty) f(y + d) dy = F(d) + ∫0^∞ e^(ty) f(y + d) dy.
The distribution of the (non-zero) payments has been truncated and shifted from below at d. GPerPayment(y) = {F(y + d) - F(d)}/S(d) for y > 0. gPerPayment(y) = f(y + d)/S(d) for y > 0.
Therefore, MPerPayment(t) = E[e^(ty)] = ∫0^∞ e^(ty) f(y + d)/S(d) dy.
Therefore, MPerLoss(t) = F(d) + S(d) MPerPayment(t).64
As discussed previously, the aggregate distribution can be thought of either in terms of the per loss variable or the per payment variable.

61 This same mathematical idea was used in proving thinning results in “Mahlerʼs Guide to Frequency Distributions.”
62 See Section 8.6 and Equation 9.31 in Loss Models.
63 See “Mahlerʼs Guide to Loss Distributions.”
64 See Equation 9.30 in Loss Models.
Therefore, MAggregate(t) = PNumberLoss(MPerLoss(t)), and MAggregate(t) = PNumberPayments(MPerPayment(t)).
PNumberPayments(MPerPayment(t)) = PNumberLoss[F(d) + S(d)MPerPayment(t)] = PNumberLoss(MPerLoss(t)), confirming that these two versions of MAggregate(t) are equal.
One can compute the moment generating function after the effects of coverage modifications.

Exercise: Prior to the effects of a deductible, the sizes of loss follow an Exponential distribution with mean 8000. For a deductible of 1000, determine the moment generating function for the size of the non-zero payments by the insurer.
[Solution: MPerPayment(t) = E[e^(ty)] = ∫0^∞ e^(ty) f(y + d)/S(d) dy = ∫0^∞ e^(ty) {e^(-(y+1000)/8000)/8000} / e^(-1000/8000) dy
= ∫0^∞ exp[-y(1/8000 - t)] dy / 8000 = {1/(1/8000 - t)}/8000 = 1/(1 - 8000t), for t < 1/8000.
Comment: Due to the memoryless property of the Exponential, the non-zero payments are also an Exponential distribution with mean 8000.]

Exercise: Prior to the effects of a deductible, the sizes of loss follow an Exponential distribution with mean 8000. For a deductible of 1000, determine the moment generating function of the payment per loss variable.
[Solution: MPerLoss(t) = F(d) + S(d) MPerPayment(t) = 1 - e^(-1000/8000) + e^(-1000/8000)/(1 - 8000t), for t < 1/8000.]
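The relation MPerLoss(t) = F(d) + S(d) MPerPayment(t) in the exercises above can also be checked by simulation. The sketch below (not part of the original text) uses the same Exponential(mean 8000) losses and 1000 deductible; t = 0.00005 is an arbitrary choice with t < 1/8000.

    import math, random

    random.seed(5)
    t, d, n = 0.00005, 1000.0, 300_000
    per_loss = (max(random.expovariate(1/8000) - d, 0.0) for _ in range(n))
    sim = sum(math.exp(t * y) for y in per_loss) / n
    s_d = math.exp(-d / 8000)
    closed_form = (1 - s_d) + s_d / (1 - 8000 * t)
    print(round(sim, 3), round(closed_form, 3))    # both should be near 1.59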
Exercise: Prior to the effects of a maximum covered loss, the sizes of loss follow an Exponential distribution with mean 8000. For a maximum covered loss of 25,000, determine the moment generating function for the size of payments by the insurer.
[Solution: For the data censored from above at 25,000, there is a density of: e^(-x/8000)/8000 for x < 25,000, and a point mass of probability of e^(-25000/8000) = e^(-3.125) at 25,000. Therefore,
M(t) = E[e^(xt)] = ∫0^25000 {e^(-x/8000)/8000} e^(xt) dx + e^(-3.125) e^(25000t)
= (1 - e^(25000t - 3.125))/(1 - 8000t) + e^(25000t - 3.125) = {1 - 8000t e^(25000(t - 1/8000))}/(1 - 8000t).]
Problems:

4.1 (1 point) Let f(2) = 0.3, f(5) = 0.6, f(11) = 0.1. Let M be the moment generating function of f. What is M(0.4)?
A. Less than 11
B. At least 11 but less than 12
C. At least 12 but less than 13
D. At least 13 but less than 14
E. At least 14

4.2 (2 points) An aggregate loss distribution is defined by Prob(N = n) = 2^n / (n! e²), n = 0, 1, 2,... and f(x) = 62.5x² e^(-5x), x > 0. What is the Moment Generating Function of the distribution of aggregate losses?
A. exp[2{1/(1 - 5t)³ - 1}]
B. exp[2{1/(1 - 5t) - 1}]
C. exp[2{1/(1 - t/5)³ - 1}]
D. exp[2{1/(1 - t/5) - 1}]
E. None of A, B, C, or D.

4.3 (2 points) Frequency is given by a Binomial Distribution with m = 10 and q = 0.3. The size of losses are either 100 or 250, with probability 80% and 20% respectively. What is the Moment Generating Function for Aggregate Losses at 0.01?
A. Less than 1600
B. At least 1600 but less than 1700
C. At least 1700 but less than 1800
D. At least 1800 but less than 1900
E. At least 1900

4.4 (2 points) Y follows a Gamma Distribution with α = 3 and θ = 100. Z = 40 + Y. What is the Moment Generating Function of Z?
A. (1 - 140t)^(-3)
B. (1 - 140e^(40t))^(-3)
C. e^(40t) (1 - 100t)^(-3)
D. (1 - 100t e^(40t))^(-3)
E. None of A, B, C, or D.
Use the following information for the next five questions:
Assume the m.g.f. of a distribution is: M(t) = exp[10{1 - √(1 - 0.6t)}].

4.5 (1 point) What is the mean of this distribution?
A. 2.8   B. 2.9   C. 3.0   D. 3.1   E. 3.2

4.6 (2 points) What is the variance of this distribution?
A. Less than 0.7
B. At least 0.7 but less than 0.8
C. At least 0.8 but less than 0.9
D. At least 0.9 but less than 1.0
E. At least 1.0

4.7 (3 points) What is the skewness of this distribution?
A. Less than 0.7
B. At least 0.7 but less than 0.8
C. At least 0.8 but less than 0.9
D. At least 0.9 but less than 1.0
E. At least 1.0

4.8 (1 point) After uniform inflation of 20%, what is the Moment Generating Function?
A. exp[12{1 - √(1 - 0.6t)}]
B. exp[12{1 - √(1 - 0.72t)}]
C. exp[10{1 - √(1 - 0.6t)}]
D. exp[10{1 - √(1 - 0.72t)}]
E. None of A, B, C, or D.

4.9 (1 point) X and Y each follow this distribution. X and Y are independent. Z = X + Y.
What is the Moment Generating Function of Z?
A. exp[10{1 - √(1 - 0.6t)}]
B. exp[40{1 - √(1 - 0.6t)}]
C. exp[10{1 - √(1 - 1.2t)}]
D. exp[40{1 - √(1 - 1.2t)}]
E. None of A, B, C, or D.
4.10 (3 points) The distribution of aggregate losses is compound Poisson with λ = 5. The Moment Generating Function of Aggregate Losses is: M(t) = exp[5/(1 - 7t)³ - 5]. What is the second moment of the severity distribution?
A. Less than 550
B. At least 550 but less than 600
C. At least 600 but less than 650
D. At least 650 but less than 700
E. At least 700

4.11 (2 points) What is the integral ∫0^∞ x^(-3/2) exp[-(a²x + b²/x)] dx?
Hint: make use of the fact that the density of an Inverse Gaussian Distribution integrates to unity from zero to infinity.
A. √π e^(-ba) / b
B. √π e^(-2ba) / b
C. √π a e^(-ba) / b
D. √π a e^(-2ba) / b
E. None of A, B, C, or D.

4.12 (2 points) Calculate the Moment Generating Function for an Inverse Gaussian Distribution with parameters µ and θ.
Hint: Use the result of the previous problem.
A. exp[(θ/µ){1 - √(1 - tµ/θ)}], t ≤ θ/µ.
B. exp[(θ/µ){1 - √(1 - 2tµ/θ)}], t ≤ θ/(2µ).
C. exp[(θ/µ){1 - √(1 - tµ²/θ)}], t ≤ θ/µ².
D. exp[(θ/µ){1 - √(1 - 2tµ²/θ)}], t ≤ θ/(2µ²).
E. None of A, B, C, or D.
4.13 (2 points ) Frequency is given by a Negative Binomial Distribution with r = 3 and β = 1.2. The size of losses are uniformly distributed on (8, 23). What is the Moment Generating Function for Aggregate Losses? A. t3 / {1.2t - 0.08(e23t - e8t)}3 B. t3 / {2.2t - 0.08(e23t - e8t)}3 C. t3 / {1.2t - (e23t - e8t)/75}3 D. t3 / {2.2t - (e23t - e8t)/75}3 E. None of A, B, C, or D. 4.14 (2 points) The probability of snowfall in any day in January is 20%. If it snows during a day, the amount of snowfall in inches that day is Gamma distributed with α = 3 and θ = 1.7. Each day is independent of the others. What is the Moment Generating Function for the amount of snow during January? A. 0.2 + 0.8(1 - 1.7t)-93 B. {0.2 + 0.8(1 - 1.7t)-3}31 C. 0.8 + 0.2(1 - 1.7t)-93 D. {0.8 + 0.2(1 - 1.7t)-3}31 E. None of A, B, C, or D. 4.15 (3 points) The number of people in a group arriving at The Restaurant at the End of the Universe is Logarithmic with β = 3. The number of groups arriving per hour is Poisson with mean 10. Determine the distribution of the total number of people arriving in an hour. 4.16 (2, 5/83, Q.35) (1.5 points) Let X be a continuous random variable with density function b e-bX for x > 0, where b > 0. If M(t) is the moment-generating function of X, then M(-6b) is which of the following? A. 1/7 B. 1/5 C. 1/(7b) D. 1/(5b) E. +∞ 4.17 (2, 5/83, Q.37) (1.5 points) Let X have the probability density function f(x) = (8/9)x/9, for x = 0, 1, 2, . . . What is the moment-generating function of X? A. 1/(9 - 8et)
B. 9/(9 - 8et)
C. 1/(8et)
D. 9/(8et)
E. 9 - 8et
4.18 (2, 5/85, Q.13) (1.5 points) Let the random variable X have moment-generating function M(t) = 1 / (1 - t)2 , for t < 1. Find E(X3 ). A. -24 B. 0 C. 1/4 D. 24 E. Cannot be determined from the information given.
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 98
4.19 (4B, 5/85, Q.47) (3 points) Given that M'(t) = r M(t) / [1- (1 - p)et], where M(t) represents the moment generating function of a distribution. Which of the following represents, respectively, the mean and variance of this distribution? A. mean = r/p
variance = r/p2
B. mean = r(1-p)/p
variance = r/p2
C. mean = r/p
variance = r(1-p)/p2
D. mean = r(1-p)/p E. None of the above
variance = r(1-p)/p2
4.20 (2, 5/88, Q.25) (1.5 points) Let the random variable X have moment generating function M(t) = exp[3t + t2 ]. What is E(X2 )? A. 1 B. 2 C. 3
D. 9
E. 11
4.21 (2, 5/90, Q.12 and Course 1 Sample Exam, Q.26) (1.7 points) Let X be a random variable with moment-generating function M(t) = {(2 + et)/3}9 for -∞ < t < ∞. What is the variance of X? A. 2 B. 3 C. 8 D. 9 E. 11 4.22 (5A, 5/95, Q.21) (1 point) Which of the following are true for the moment generating function M S(t) for the aggregate claims distribution S = X1 + X2 +...+ XN? 1. If the Xiʼs are independent and identically distributed with m.g.f. MX(t), and the number of claims, N, is fixed, then MS(t) = MX(t)N. 2. If the Xiʼs are independent and identically distributed with m.g.f. MX(t), and the number of claims, N has m.g.f. MN(t), then MS(t) = MN[exp(MX(t))]. 3. If the Xiʼs are independent and identically distributed with m.g.f. MX(t), and N is Poisson distributed, then MS(t) = exp[λ(MX(t) - 1)]. A. 1
B. 3
C. 1, 2
D. 1, 3
E. 2, 3
4.23 (2, 2/96, Q.30) (1.7 points) Let X and Y be two independent random variables with moment generating functions MX(t) = exp[t2 + 2t], MY(t) = exp[3t2 + t]. Determine the moment generating function of X + 2Y. A. exp[t2 + 2t] + 2exp[3t2 + t]
B. exp[t2 + 2t] + exp[12t2 + 2t]
C. exp[7t2 + 4t]
D. 2exp[4t2 + 3t]
E. exp[13t2 + 4t]
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 99
4.24 (5A, 5/96, Q.22) (1 point) Let M(t) denote the moment generating function of a claim amount distribution. The number of claims distribution is Poisson with a moment generating function exp[λ(exp(t)-1)]. What is the moment generating function of the compound Poisson Distribution? A. λ(M(t) -1)
B. exp[λexp[M(t) -1]]
D. exp[λ[M(t) -1]]
E. None of A, B, C, D
C. λexp[M(t) -1]
4.25 (Course 151 Sample Exam #1, Q.9) (1.7 points) For S = X1 + X2 + ... + XN: (i) X1 , X2 ... each has an exponential distribution with mean 1/β. (ii) the random variables N, X1 , X2 ,... are mutually independent. (iii) N has a Poisson distribution with mean 1.0. (iv) MS(1.0) = 3.0. Determine β. (A) 1.9
(B) 2.0
(C) 2.1
(D) 2.2
(E) 2.3
4.26 (1, 5/00, Q.35) (1.9 points) A company insures homes in three cities, J, K, and L. Since sufficient distance separates the cities, it is reasonable to assume that the losses occurring in these cities are independent. The moment generating functions for the loss distributions of the cities are: M J(t) = (1 - 2t)-3.
M K(t) = (1 - 2t)-2.5.
M L (t) = (1 - 2t)-4.5.
Let X represent the combined losses from the three cities. Calculate E(X3 ). (A) 1,320 (B) 2,082 (C) 5,760 (D) 8,000 (E) 10,560 4.27 (IOA 101, 9/00, Q.9) (3.75 points) The size of a claim, X, which arises under a certain type of insurance contract, is to be modeled using a gamma random variable with parameters α and θ (both > 0) such that the moment generating function of X is given by M(t) = (1 - θt)−α, t < 1/θ. By using the cumulant generating function of X, or otherwise, show that the coefficient of skewness of the distribution of X is given by 2/ α . 4.28 (1, 11/00, Q.11) (1.9 points) An actuary determines that the claim size for a certain class of accidents is a random variable, X, with moment generating function M X(t) = 1 / (1 - 2500t)4 . Determine the standard deviation of the claim size for this class of accidents. (A) 1,340 (B) 5,000 (C) 8,660 (D) 10,000 (E) 11,180
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 100
4.29 (1, 11/00, Q.27) (1.9 points) Let X1 , X2 , X3 be a random sample from a discrete distribution with probability function p(0) = 1/3 and p(1) = 2/3. Determine the moment generating function, M(t), of Y = X1 X2 X3 . (A) 19/27 + 8et/27 (B) 1 + 2et (C) (1/3 + 2et/3)3 (D) 1/27 + 8e3t/27 (E) 1/3 + 2e3t/3 4.30 (IOA 101, 4/01, Q.5) (2.25 points) Show that the probability generating function for a Binomial Distribution with parameters m and q is P(z) = (1 - q + qz)m. Deduce the moment generating function. 4.31 (IOA 101, 4/01, Q.6) (2.25 points) Let X have a normal distribution with mean µ and standard deviation σ, and let the ith cumulant of the distribution of X be denoted κi. Given that the moment generating function of X is M(t) = exp[µt + σ2t2 /2], determine the values of κ2, κ3, and κ4. 4.32 (IOA 101, 4/01, Q.7) (1.5 points) The number of policies (N) in a portfolio at any one time is modeled as a Poisson random variable with mean 10. The number of claims (Xi) arising on a policy is also modeled as a Poisson random variable with mean 2, independently for each policy and independent of N. N
Determine the moment generating function for the total number of claims, ∑ Xi , i=1
arising for the portfolio of policies. 4.33 (1, 5/03, Q.39) (2.5 points) X and Y are independent random variables with common moment generating function M(t) = exp[t2 /2]. Let W = X + Y and Z = Y - X. Determine the joint moment generating function, M(t1 , t2 ), of W and Z. (A) exp[2t1 2 + 2t2 2 ]
(B) exp[(t1 - t2 )2 ]
(D) exp[2t1 t2 ]
(E) exp[t1 2 + t2 2 ]
(C) exp[(t1 + t2 )2 ]
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 101
Solutions to Problems: 4.1. D. M(t) = E[ext] = .3e2t + .6e5t + .1e11t. M(.4) = 0.3e.8 + 0.6e2 + 0.1e4.4 = 0.6676 + 4.4334 + 8.1451 = 13.25. 4.2. C. Frequency is Poisson, with λ = 2 and p.g.f.: P(z) = exp(2(z-1)). Severity is Gamma with α = 3 and θ = 1/5 and m.g.f.: 1/(1 - t/5)3 . M A(t) = PN(MX(t)) = exp[2{1/(1-t/5)3 - 1}]. 4.3. A. The p.g.f. of the Binomial Frequency is: P(z) = (1+.3(z-1))10. The m.g.f. for severity is: M(t) = .8 e100t + .2e250t. MA(t) = PN(MX(t)) = (1 + .3( .8 e100t + .2e250t - 1))10. M A(.01) = (1 + .3( .8 e1 + .2e2.5 - 1))10 = 1540. 4.4. C. The m.g.f. of Y is MY(t) = (1-100t)-3. E[ezt] = E[e(y+40)t] = E[eyt e40t] = e40t E[eyt]. Therefore, the m.g.f of Z is: MZ(t) = e40t MY(t) = e4 0 t (1-100t)- 3. 4.5. C. M(t) = exp[10{1 -
1- 0.6t }]. Mʼ(t) = M(t) 3 / 1- 0.6t . mean = Mʼ(0) = 3.
4.6. D. M(t) = exp[10{1 -
1- 0.6t }]. Mʼ(t) = M(t) 3 / 1- 0.6t . Mean = Mʼ(0) = 3.
Mʼʼ(t) = Mʼ(t) 3 / 1- 0.6t + M(t) 0.9 / (1 - 0.6t)1.5. Second moment = Mʼʼ(0) = (3)(3) + 0.9 = 9.9. Variance = 9.9 - 32 = 0.9. 4.7. D. Mʼʼ(t) = Mʼ(t) 3 / 1- 0.6t + M(t) .9 / (1- 0.6t)1.5. M'''(t) = M''(t)3/ 1- 0.6t + 2M'(t)(.9)(1 - .6t)-3/2 + M(t)(.81)(1 - .6t)-5/2. Mʼʼ(0) = 9.9. M'''(0) = (9.9)(3) + (2)(3)(0.9) + (1)(0.81) = 35.91 = third moment. Third central moment = 35.91 - (3)(9.9)(3) + 2(33 ) = 0.81. Thus the skewness = 0.81 /(0.91.5) = 0.949. Comment: This is an Inverse Gaussian Distribution with µ = 3 and θ = 30, with mean 3, variance 33 /30 = 0.9, and coefficient of variation: 0.9 / 3 = 0.3162. The skewness of an Inverse Gaussian is three times the CV: (3)(0.3162) = 0.949 = 3 3 / 30 = 3 µ / θ .
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 102
4.8. D. In general McX(t) = E[etcx] = MX(ct). In this case we have multiplied x by 1.20, so the new m.g.f is exp[10{1 -
1 - 0.6(1.2t)}] = exp[10{1 -
(1 - 0.72t) }].
Alternately, prior to inflation this is an Inverse Gaussian Distribution with µ = 3 and θ = 30. Under uniform inflation, both parameters are multiplied by the inflation factor, so after inflation we have µ = 3.6 and θ = 36. The m.g.f. of an Inverse Gaussian is: exp[(θ/µ){1 -
1 - 2µ2 t / θ )}]. After inflation this is: exp[10{1 -
4.9. E. M X+Y(t) = MX(t) MY(t) = exp[10{1 exp[20{1 -
(1 - 0.72t) }].
1 - 0.6t }] exp[10{1 -
1 - 0.6t }] =
1 - 0.6t }].
Comment: This is the moment generating function of another Inverse Gaussian, but with µ = 6 and θ = 120, rather than µ = 3 and θ = 30. In general, the sum of two independent identically distributed Inverse Gaussian Distributions with parameters µ and θ, is another Inverse Gaussian Distribution, but with parameters 2µ and 4θ. 4.10. B. For a Compound Poisson, MA(t) = exp(λ(MX(t)-1)). Thus MX(t) = 1/(1-7t)3 . This is the moment Generating Function of a Gamma Distribution, with parameters α = 3 and θ = 7. Thus it has a mean of: (3)(7) = 21, a variance of: 3(72 ) = 147, and second moment of: 147 + 212 = 588. Alternately, MXʼ(t) = 21/(1-7t)4 . MXʼʼ(t) = 588/(1-7t)4 . second moment of the severity = MXʼʼ(0) = 588. Alternately, Mʼ(t) = 105M(t)(1-7t)-4. Mʼ(0) = 105 = mean of aggregate losses. Mʼʼ(t) = 2940M(t)(1-7t)-5 + 105Mʼ(t)(1-7t)-4. Mʼʼ(0) = 2940 + (105)(105) = 13965 = 2nd moment of the aggregate losses. Variance of the aggregate losses = 13965 - 1052 = 2940. For a compound Poisson, variance of the aggregate losses = λ (2nd moment of severity). Therefore, 2nd moment of severity = 2940/5 = 588. Comment: One could use the Cumulant Generating Function, which is defined as the natural log of the Moment Generating Function. ψ(t) = ln M(t) = ln {exp(5/(1-7t)3 - 5))} = 5/(1-7t)3 - 5. ψʼ(t) = 105/(1-7t)4 . ψʼʼ(t) = 2940/(1-7t)5 . Variance = ψʼʼ(0) = 2940.
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 103
4.11. B. The density of an Inverse Gaussian with parameters µ and θ is: f(x) =
θ -1.5 x exp(-θ(x/µ -1)2 / (2x)) = 2π
θ -1.5 x exp(-θx/ 2µ2 + θ/µ - θ/ 2x). 2π
Let a2 = θ/ 2µ2 and b2 = θ/ 2, then θ = 2b2 and µ = b/a. b -1.5 b2 -1.5 x exp(-a2 + 2ba - b2 /x) = e2ba x exp(-a2 - b2 /x). π π
Then f(x) =
b Since this integrates to unity: e2ba π
∞
∫0 x - 1.5 exp(-a2 - b2 / x) dx = 1. ⇒
∞
∫0 x - 1.5 exp(-a2 - b2 / x) dx =
π e- 2 b a / b.
Comment: This is a special case of a Modified Bessel Function of the Third Kind, K-.5. See for example, Appendix C of Insurance Risk Models by Panjer & Willmot. 4.12. D. The Moment Generating Function is the expected value of etx. ∞
∞
M(t) =
∫0 etx f(x) dx = ∫0 etx
eθ/µ
θ 2π
θ - 1.5 x exp[-θx / (2µ2) + θ / µ - θ / (2x)] dx = 2π
∞
∫0 x - 1.5 exp[-{θ / (2µ2) - t}x - θ / (2x)] dx.
Provided t ≤ θ/ 2µ2 , let a2 = θ/ 2µ2 - t, and b2 = θ/ 2. Then the integral is of the type in the previous problem and has a value of: π e-2ba / b. Therefore, M(t) = eθ/µ
θ { π e-2ba / b} = 2π
eθ/µ exp[-(θ/µ) 1 - 2tµ2 / θ ] = exp[(θ/µ) {1 -
1 - 2tµ 2 / θ }].
We required that t ≤ θ/ (2µ2 ), so that a2 ≥ 0; M(t) only exists for t ≤ θ / (2µ2 ). Comment: Note that ba =
b2 a2 = (θ / 2) (
θ - t) = {θ/ (2µ)} 2µ2
1 - 2tµ2 / θ .
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 104
4.13. B. The p.g.f. of the Negative Binomial Frequency is: P(z) = (1 - 1.2(z-1))-3 = (2.2 - 1.2z)-3. The m.g.f. for the uniform severity is: M(t) = (exp(23t) - exp(8t))/ (15t). M A (t) = PN(MX(t)) = {2.2 - 1.2((exp(23t) - exp(8t))/(15t) }-3 = {2.2 - .08((exp(23t) - exp(8t))/t)}-3 = t3 /{2.2t - 0.08(e2 3 t - e8 t) }3 . 4.14. D. The frequency for a single day is Bernoulli, with q = .2 and p.g.f. P(z) = .8 + .2z. The Gamma severity has m.g.f. M(t) = (1- 1.7t)-3. Thus the m.g.f of the aggregate losses for one day is P(M(t)) = .8 +.2(1- 1.7t)-3. The m.g.f for 31 independent days is the 31st power of that for a single day, {0.8 + 0.2(1 - 1.7t)- 3}3 1. Alternately, the frequency for 31 days is Binomial with m = 31, q = .2, and p.g.f. P(z) = (.8 + .2z)31. Thus the m.g.f. for the aggregate losses is P(M(t)) = (.8 +.2(1- 1.7t)-3)31. 4.15. This is a compound distribution with primary distribution a Poisson and secondary distribution a Logarithmic. Alternately, it is a Compound Poisson with severity Logarithmic. ln[1 - β (z -1)] PLogarithmic(z) = 1 . ln[1 + β] PPoisson(z) = exp[λ(z-1)]. PAggregate(z) = PPoisson[PLogarithmic(z)] = PPoisson[1 -
ln[1 - β (z -1)] ln[1 - β (z -1)] ] = exp[-λ ] ln[1 + β] ln[1 + β]
= exp[ln[1 - β(z-1)]]-λ /ln(1+β) = {1 - β(z-1)}-λ /ln(1+β). PNegativeBinomial(z) = {1 - β(z - 1)}-r. Thus the aggregate distribution is Negative Binomial, with r = λ/ln(1+β) = 10/ln(4) = 7.213. The aggregate number of people is Negative Binomial with r = 7.213 and β = 3. Comment: Beyond what you should be asked on your exam. See Example 6.14 in Loss Models, not on the syllabus, or Section 6.8 in Insurance Risk Models by Panjer and Willmot, not on the syllabus. The mean of the Logarithmic Distribution is 3/ln(4). (Mean number of groups) (Average Size of Groups) = (10){3/ln(4)} = 30/ln(4) = {10/ln(4)}(3) = Mean of the Negative Binomial. The variance of the Logarithmic Distribution is: 3{4 - 3/ln(4)} / ln(4). (Mean number of groups) (Variance of Size of Groups) + (Average Size of Groups)2 (Variance of Number of Groups) = (10)3{4 - 3/ln(4)}/ln(4) + {3/ln(4)}2 (10) = 120/ln(4) - 90/{ln(4)}2 + 90/{ln(4)}2 = 120/ln(4) = {10/ln(4)}(3)(4) = Variance of the Negative Binomial.
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 105
4.16. A. An Exponential with θ = 1/b. M(t) = 1/(1 - θt) = 1/(1 - t/b). M(-6b) = 1/(1 + 6) = 1/7. 4.17. A. A Geometric Distribution with β = 8. P(z) = 1/{1 - β(z-1)} = 1/(9 - 8z). M(t) = P(et) = 1/(9 - 8et ). 4.18. D. Mʼʼʼ(t) = 24/(1 - t)5 . E[X3 ] = Mʼʼʼ(0) = 24. Alternately, this is a Gamma Distribution, with α = 2, θ = 1, and E[X3 ] = θ3α(α+1)(α+2) = 24. 4.19. C. The moment generating function is always unity at zero; M(t) = E[etX], M(0) = E[1] = 1. The mean is Mʼ(0) = M(0)r/p = r/p. Mʼʼ(t) = d (r M(t) / [1-(1-p)et] / dt = r{Mʼ(t)(1-(1-p)et) + M(t)(1-p)et}/ {1-(1-p)et}2 . The second moment is Mʼʼ(0) = r{pMʼ(0) + M(0)(1-p)} / p2 = r{r +(1-p)} / p2 . Therefore, the variance = r{r +(1-p)} / p2 - (r/p)2 = r(1-p)/p2 . Comment: This is the Moment Generating Function of the number of Bernoulli trials, each with chance of success p, it takes until r successes. The derivation of the Negative Binomial Distribution involves the number of failures rather than the number of trials. See “Mahlerʼs Guide to Frequency Distributions.” Thus the variable here is: r + (a Negative Binomial with parameters r and β = (1-p)/p). This variable has mean: r + (r)(1-p)/p = r/p and variance: rβ(1+β) = r (1-p)/p2 . Note that the m.g.f of this variable is: M(t) = ert (m.g.f of a Negative Binomial) = ert (1 - β(et -1))-r = ert p r(1 - (1-p)(et))-r. Mʼ(t) = rert p r(1 - (1-p)(et))-r + r(1-p)(et)ert p r(1 - (1-p)(et))-(r+1) = r M(t) + r(1-p)et M(t) /[1- (1-p)et] = rM(t){ 1- (1-p)et + (1-p)et}/[1- (1-p)et] = rM(t)/[1- (1-p)et]. 4.20. E. Mʼ(t) = (3 + 2t)M(t). Mʼʼ(t) = 2M(t) + (3 + 2t)2 M(t). E[X2 ] = Mʼʼ(0) = 2 + 9 = 11. Comment: A Normal Distribution with mean 3 and variance 2. E[X2 ] = 32 + 2 = 11.
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 106
4.21. A. Mʼ(t) = 9et(2 + et)8 /39 . E[X] = Mʼ(0) = 3. Mʼʼ(t) = 72et(2 + et)7 /39 + 9et(2 + et)8 /39 . E[X2 ] = Mʼʼ(0) = 8 + 3 = 11. Variance = 11 - 32 = 2. Alternately, lnM(t) = 9 ln(2 + et) - 9 ln(3). d(lnM(t))/dt = 9et/(2 + et). d2 (lnM(t))/dt2 = 9et/(2 + et) - 9e2t/(2 + et)2 . Var[X] = d2 (ln MX(t)) / dt2 | t =0 = 3 - 1 = 2. Alternately, M(t) = {(2 + et)/3}9 ⇒ P(z) = {(2 + z)/3}9 = (1 + (z - 1)/3)9 . This is the probability generating function of a Binomial Distribution with q = 1/3 and m = 9. Variance = mq(1-q) = (9)(1/3)(1 - 1/3) = 2. 4.22. D. 1. True. 2. False. MS(t) = PN(MX(t)) = MN(ln(MX(t))). 3. True. For a Poisson distribution, PN(z) = exp(λ(z - 1)). For a Compound Poisson, MS(t) = PN(MX(t)) = exp(λ(MX(t)-1)). 4.23. E. M 2Y(t) = MY(2t) = exp[12t2 + 2t]. MX+2Y(t) = MX(t)M2Y(t) = exp[13t2 + 4t]. Comment: X is Normal with mean 2 and variance 2. Y is Normal with mean 1 and variance 6. 2Y is Normal with mean 2 and variance 24. X + 2Y is Normal with mean 4 and variance 26. For a Normal, M(t) = exp[µt + σ2t2 /2]. 4.24. D. For a Compound Poisson, MA(t) = MN(ln(MX(t))) = exp(λ(MX(t)-1)). 4.25. A. The moment generating function of aggregate losses can be written in terms of those of the frequency and severity: MN(ln(MX(t)) = PN(MX(t)). For a Poisson Distribution, the probability generating function is eλ(z-1). For an Exponential Distribution with mean 1/β, the moment generating function is 1/(1- t/β) = β/(β-t). Therefore, the moment generating function of the aggregate losses is: exp[λ(MX(t)-1)] = exp[λ(β/(β-t)-1)] = exp[λt/(β-t)]. In this case λ = 1, so MS(t) = exp[t/(β-t)]. We are given MS(1) = 3, so that 3 = exp[1/(β-1)]. Therefore, β = 1+ 1/ln(3) = 1.91. 4.26. E. MX(t) = MJ(t) MK(t) ML (t) = (1 - 2t)-10. M Xʼ(t) = 20 (1 - 2t)-11. MXʼʼ(t) = 440 (1 - 2t)-12. MXʼʼʼ(t) = 10560 (1 - 2t)-13. E[X3 ] = MXʼʼʼ(0) = 10,560. Alternately, the three distributions are each Gamma with θ = 2, and α = 3, 2.5, and 4.5. Therefore, their sum is Gamma with θ = 2, and α = 3 + 2.5 + 4.5 = 10. E[X3 ] = θ3 α(α+1)(α+2) = (8)(10)(11)(12) = 10,560.
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 107
4.27. ψ(t) = ln M(t) = -α ln (1 - θt). ψʼ(t) = αθ / (1 - θt). ψʼʼ(t) = αθ2 / (1 - θt)2 . ψʼʼʼ(t) = 2αθ3 / (1 - θt)3 .
Var[X] = ψʼʼ(0) = αθ2 .
Third Central Moment of X = ψʼʼʼ(0) = 2αθ3 .
Skewness = 2αθ3 /(αθ2 )1.5 = 2/ α .
Comment: One could instead get the moments from the Appendix attached to the exam or use the Moment Generating Function to get the moments. Then the Third Central Moment is: E[X3 ] - 3 E[X] E[X2 ] + 2E[X]3 . 4.28. B. MX(t) = (1 - 2500t)-4. MXʼ(t) = 10,000 (1 - 2500t)-5. E[X] = MXʼ(0) = 10,000. M Xʼʼ(t) = 125 million (1 - 2500t)-6. E[X2 ] = MXʼʼ(0) = 125 million. Standard deviation = 125 million - 100002 = 5000. Alternately, ln MX(t) = -4ln(1 - 2500t). d ln MX(t)/ dt = 10000/(1 - 2500t). d2 ln MX(t)/ dt2 = 25,000,000/(1- 2500t)2 . Var[X] = d2 ln MX(0)/ dt2 = 25,000,000. Standard deviation = 5000. Alternately, the distribution is Gamma with θ = 2500 and α = 4. Variance = αθ2 = (4)(25002 ). Standard deviation = (2)(2500) = 5000. 4.29. A. Prob[Y = 1] = Prob[X1 = X2 = X3 = 1] = (2/3)3 = 8/27. Prob[Y = 0] = 1 - 8/27 = 19/27. M Y(t) = E[eyt] = Prob[Y = 0] e0t + Prob[Y = 1] e1t = 19/27 + 8et /27. 4.30. P(z) = E[zn ] = Σ f(n) zn . For a Bernoulli, P(z) = f(0)z0 + f(1)z1 = 1 - q + qz. The Binomial is the sum of m independent, identically distributed Bernoullis, and therefore has P(z) = (1 - q + qz)m. M(t) = P(et) = (1 - q + qet)m. 4.31. The cumulant generating function is ψ(t) = ln M(t) = µt + σ2t2 /2. ψʼ(t) = µ + σ2t. κ1 = ψʼ(0) = µ = mean. ψʼʼ(t) = σ2. κ2 = ψʼʼ(0) = σ2 = variance. ψʼʼʼ(t) = 0. κ3 = ψʼʼʼ(0) = 0 = third central moment. ψʼʼʼʼ(t) = 0. κ4 = ψʼʼʼʼ(0) = 0. Comment: κi is the coefficient of ti/i! in the cumulant generating function.
2013-4-3,
Aggregate Distributions §4 Generating Functions,
HCM 10/23/12,
Page 108
4.32. For the Poisson Distribution, P(z) = exp[λ(z - 1)]. ⇒ M(t) = exp[λ(et - 1)]. Pprimary(z) = exp[10(z - 1)]. Msecondary(t) = exp[2(et - 1)]. M compound(t) = Pprimary[Msecondary(t)] = exp[10 {exp[2(et - 1)] - 1}]. Alternately, Pcompound(z) = Pprimary[Psecondary(z)] = exp[10 {exp[2(z - 1)] - 1}]. Then let z = et, in order to get the moment generating function of the compound distribution. 4.33. E. The joint moment generating function of W and Z: M(t1 , t2 ) ≡ E[exp[t1 w + t2 z]] = E[exp[t1 (x+y) + t2 (y-x)]] = E[exp[y(t1 + t2 ) + x(t1 - t2 )]] = E[exp[y(t1 + t2 )]] E[exp[x(t1 - t2 )]] = MY(t1 + t2 ) MX(t1 - t2 ) = exp[(t1 + t2 )2 /2] exp[(t1 - t2 )2 /2] = exp[t1 2 /2 + t2 2 /2 + t1 t2 ] exp[t1 2 /2 + t2 2 /2 - t1 t2 ] = exp[t1 2 + t2 2 ]. Comment: Beyond what you should be asked on your exam. X and Y are two independent unit Normals, each with mean 0 and standard deviation 1. E[W] = 0. Var[W] = Var[X] + Var[Y] = 1 + 1 = 2. E[Z] = 0. Var[Z] = Var[X] + Var[Y] = 1 + 1 = 2. Cov[W, Z] = Cov[X + Y, Y - X] = -Var[X] + Var[Y] + Cov[X, Y] - Cov[X, Y] = -1 + 1 = 0. Corr[W, Z] = 0. W and Z are bivariate Normal, with µW = 0, σW2 = 2, µZ = 0, σZ2 = 2, ρ = 0. For a bivariate Normal, M(t1 , t2 ) = exp[µ1t1 + µ2t2 + σ12t1 2 /2 + σ22t2 2 /2 + ρσ1σ2t1 t2 ]. See for example, Introduction to Probability Models, by Ross.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 109
Section 5, Moments of Aggregate Losses If one is not given the frequency per exposure, but is rather just given the frequency for the whole number of exposures, whatever they are for the particular situation, then Losses = (Frequency) (Severity). Thus Mean Aggregate Loss = (Mean Frequency) (Mean Severity). Exercise: The number of claims is given by a Negative Binomial Distribution with r = 4.3 and β = 3.1. The size of claims is given by a Pareto Distribution with α = 1.7 and θ = 1400. What is the expected aggregate loss? [Solution: The mean frequency is rβ = (4.3)(3.1) = 13.33. The mean severity is θ/(α-1) = 1400 / 0.7 = 2000. The expected aggregate loss is: (13.33)(2000) = 26,660.] Since they depend on both the number of claims and the size of claims, aggregate losses have more reasons to vary than do either frequency or severity individually. Random fluctuation occurs when one rolls dice, spins spinners, picks balls from urns, etc. The observed result varies from time period to time period due to random chance. This is also true for the aggregate losses observed for a collection of insureds. The variance of the observed aggregate losses that occurs due to random fluctuation is referred to as the process variance. That is what will be discussed here.65
Independent Frequency and Severity: You are given the following: • The number of claims for a single exposure period is given by a Binomial Distribution with q = 0.3 and m = 2. • The size of the claim will be 50, with probability 80%, or 100, with probability 20%. • Frequency and severity are independent.
65
The process variance is distinguished from the variance of the hypothetical means as discussed in “Mahlerʼs Guide to Buhlmann Credibility.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 110
Exercise: Determine the variance of the aggregate losses. [Solution: List the possibilities and compute the first two moments: Situation 0 claims 1 claim @ 50 1 claim @ 100 2 claims @ 50 each 2 claims: 1 @ 50 & 1 @ 100 2 claims @ 100 each
Probability 49.00% 33.60% 8.40% 5.76% 2.88% 0.36%
Overall
100.0%
Aggregate Loss Square of Aggregate Loss 0 0 50 2500 100 10000 100 10000 150 22500 200 40000 36
3048
For example, the probability of 2 claims is: 0.32 = 9%. We divide this 9% among the possible claim sizes: 50 and 50 @ (0.8)(0.8) = 64%, 50 and 100 @ (0.8)(0.2) = 16%, 100 and 50 @ (0.2)(0.8) = 16%, 100 and 100 @ (0.2)(0.2) = 4%. (9%)(64%) = 5.76%, (9%)(16% + 16%) = 2.88%, (9%)(4%) = 0.36%. One takes the weighted average over all the possibilities. The average Pure Premium is 36. The second moment of the Pure Premium is 3048. Therefore, the variance of the pure premium is: 3048 - 362 = 1752.] In this case, since frequency and severity are independent one can make use of the following formula:66 (Process) Variance of Aggregate Loss = (Mean Freq.) (Variance of Severity) + (Mean Severity)2 (Variance of Freq.)
σAgg2 = µF σX 2 + µX 2 σF2. Memorize this formula for the variance of the aggregate losses when frequency and severity are independent. Note that each of the two terms has a mean and a variance, one from frequency and one from severity. Each term is in dollars squared; that is one way to remember that the mean severity (which is in dollars) enters as a square while that for mean frequency (which is not in dollars) does not. In the above example, the mean frequency is mq = 0.6 and the variance of the frequency is: mq(1 - q) = (2)(0.3)(0.7) = 0.42. The average severity is 60 and the variance of the severity is: (0.8)(102 ) + (0.2)(402 ) = 400. The process variance of the aggregate losses is: (0.6)(400) + (602 )(0.42) = 1752, which matches the result calculated previously. 66
See equation 9.9 in Loss Models. Note Loss Models uses S for aggregate losses rather than A, and N for frequency rather than F. I have used X for severity in order to follow Loss Models. This formula can also be used to compute the process variance of the pure premium, when frequency and severity are independent.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 111
One can rewrite the formula for the process variance of the aggregate losses in terms of coefficients of variation by dividing both sides by the square of the mean aggregate loss:67 CVA2 = CVF2 + CVX2 / µF. In the example above, CVF2 = (2)(0.3)(0.7) / {(2)(0.3)}2 = 1.167, CVX2 = 400 / 602 = 1.111, and therefore CVA2 = 1.167 + 1.111/0.6 = 1.352 = 1752 / 362 . Thus the square of the coefficient of variation of the aggregate losses is the sum of the CV2 for frequency and the CV2 for severity divided by the mean frequency. An Example of Dependent Frequency and Severity: On both the exam and in practical applications, frequency and severity are usually independent. However, here is an example in which frequency and severity are dependent. Assume that you are given the following: • The number of claims for a single exposure period will be either 0, 1, or 2: Number of Claims Probability 0 60% 1 30% 2 10% • If only one claim is incurred, the size of the claim will be 50, with probability 80%; or 100, with probability 20%. • If two claims are incurred, the size of each claim, independent of the other, will be 50, with probability 50%; or 100, with probability 50%. How would one determine the variance of the aggregate losses? First list the aggregate losses and probability of each of the possible outcomes. If there is no claim (60% chance) then the aggregate loss is zero. If there is one claim, then the aggregate loss is either 50 with (30%)(80%) = 24% chance, or 100 with (30%)(20%) = 6% chance. If there are two claims then there are three possibilities. There is a (10%)(25%) = 2.5% chance that there are two claims each of size 50 with an aggregate loss of 100. There is a (10%)(50%) = 5% chance that there are two claims one of size 50 and one of size 100 with an aggregate loss of 150. There is a (10%)(25%) = 2.5% chance that there are two claims each of size 100 with an aggregate loss of 200. Next, the first and second moments can be calculated by listing the aggregate losses for all the possible outcomes and taking the weighted average using the probabilities as weights of either the aggregate loss or its square. 67
The mean of the aggregate losses is the product of the mean frequency and the mean severity.
2013-4-3,
Aggregate Distributions §5 Moments,
Situation 0 claims 1 claim @ 50 1 claim @ 100 2 claims @ 50 each 2 claims: 1 @ 50 & 1 @ 100 2 claims @ 100 each
Probability 60.0% 24.0% 6.0% 2.5% 5.0% 2.5%
Overall
100.0%
HCM 10/23/12,
Page 112
Aggregate Loss Square of Aggregate Loss 0 0 50 2500 100 10000 100 10000 150 22500 200 40000 33
3575
One takes the weighted average over all the possibilities. The average aggregate loss is 33. The second moment of the aggregate losses is 3575. Therefore, the variance of the aggregate losses is: 3575 - 332 = 2486. Note that the frequency and severity are not independent in this case. Rather the severity distribution depends on the number of claims. For example, the average severity if there is 1 claim is 60, while the average severity if there are 2 claims is 75. In general, one can calculate the variance of the aggregate losses in the above manner from the second and first moments. The variance is: the second moment - (first moment)2 . The first and second moments can be calculated by listing the aggregate losses for all the possible outcomes and taking the weighted average, applying the probabilities as weights to either the aggregate loss or its square. In continuous cases, this will involve taking integrals, rather than sums. Policies of Different Types: Let us assume we have a portfolio consisting of two types of policies:
Type A B
Number of Policies 10 20
Mean Aggregate Loss per Policy 6 9
Variance of Aggregate Loss per Policy 3 4
Assuming the results of each policy are independent, then the mean aggregate loss for the portfolio is: (10)(6) + (20)(9) = 240. The variance of aggregate loss for the portfolio is: (10)(3) + (20)(4) = 110.68 For independent policies, the means and variances of the aggregate losses for each policy add. The sum of the aggregate losses from two independent policies has the sum of the means and variances of the aggregate losses for each policy. 68
Since we are given the variance of aggregate losses, there is no need to compute the variance of aggregate losses from the mean frequency, variance of frequency, mean severity, and variance of severity.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 113
Exercise: Compare the coefficient of variation of aggregate losses in the above example to that if one had instead 100 policies of Type A and 200 polices of type B. [Solution: For the original example, CV =
110 / 240 = 0.043.
For the new example, CV = 1100 / 2400 = 0.0138. Comment: Note that as we have more policies, all other things being equal, the coefficient of variation goes down.] Exercise: For each of the two cases in the previous exercise, using the Normal Approximation estimate the probability that the aggregate losses will be at least 5% more than their mean. [Solution: For the original example, Prob[Agg. > 252] ≅ 1 - Φ[(252 - 240)/ 110 ] = 1 - Φ[1.144] = 12.6%. For the new example, Prob[Agg. > 2520] ≅ 1 - Φ[(2520 - 2400)/ 1100 ] = 1 - Φ[3.618] = 0.015%.] For a larger portfolio, all else being equal, there is less chance of an extreme outcome in a given year measured as a percentage of the mean.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 114
Derivation of the Formula for the Process Variance of the Aggregate Losses: The above formula for the process variance of the aggregate losses for independent frequency and severity is a special case of the formula that also underlies analysis of variance: Var(Y) = EZ[VARY(Y | Z)] + VARZ(EY[Y | Z]), where Z and Y are any random variables. Letting Y be the aggregate losses, A, and Z be the number of claims, N, in the above formula gives: Var(Agg) = EN[VARA(A | N)] + VARN(EA[A | N]) = EN[NσX2 ] + VARN(µXN) = EN[N]σX2 + µX2 VARN(N) = µF σX2 + µX2 σF2 . Where we have used the assumption that the frequency and severity are independent and the facts: • For a fixed number of claims N, the variance of the aggregate losses is the variance of the sum of N independent identically distributed variables each with variance σX2 . (Since frequency and severity are assumed independent, σX2 is the same for each value of N.) Such variances add so that VARA(A | N) = NσX2 . •
For a fixed number of claims N, for frequency and severity independent the expected value of the aggregate losses is N times the mean severity: EA[A | N] = µXN.
•
Since with respect to N the variance of the severity acts as a constant : EN[NσX2 ] = σX2 EN[N] = µF σX2 .
•
Since with respect to N the mean of the severity acts as a constant : VARN(µXN) = µX2 VARN(N) = µX2 σF2 .
Letʼs apply this derivation to the previous example. You were given the following: • For a given risk, the number of claims is given by a Binomial Distribution with q = 0.3 and m = 2. • The size of the claim will be 50, with probability 80%, or 100, with probability 20%. • frequency and severity are independent. There are only three possible values of N: N=0, N=1, or N=2. If N = 0 then A = 0. If N = 1 then either A = 50 with 80% chance or A = 100 with 20% chance. If N = 2 then A = 100 with 64% chance, A = 150 with with 32% chance or A = 200 with 4% chance.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 115
We then get : N
Probability
0 1 2
49% 42% 9%
Mean
Mean A Given N 0 60 120 36
Square of Mean Second Moment of A Given N of A Given N 0 0 3600 4000 14400 15200 2808
Var of A Given N 0 400 800 240
For example, given two claims, the second moment of the aggregate losses is: (64%)(1002 ) + (32%)(1502 ) + (4%)(2002 ) = 15,200. Thus given two claims the variance of the aggregate losses is: 15,200 - 1202 = 800. Thus EN[VARA(A | N)] = 240, and VARN(EA[A | N]) = 2808 - 362 = 1512. Thus the variance of the aggregate losses is EN[VARA(A | N)] + VARN(EA[A | N]) = 240 + 1512 = 1752, which matches the result calculated above. The (total) process variance of the aggregate losses has been split into two pieces. The first piece calculated as 240, is the expected value over the possible numbers of claims of the process variance of the aggregate losses for fixed N. The second piece calculated as 1512, is the variance over the possible numbers of the claims of the mean aggregate loss for fixed N. Poisson Frequency: Assume you are given the following: • For a given risk, the number of claims for a year is Poisson with mean 7. • The size of the claim will be 50, with probability 80%, or 100, with probability 20%. • frequency and severity are independent. Exercise: Determine the variance of the aggregate losses for this risk. [Solution: µF = σF2 = 7. µX = 60. σX2 = 400.
σA2 = µF σX2 + µX2 σF2 = (7)(602 ) + (400)(7) = 28,000.] In the case of a Poisson Frequency with independent frequency and severity the formula for the process variance of the aggregate losses simplifies. Since µF = σF2 :
σA2 = µF σX2 + µX2 σF2 = µF(σX2 + µX2 ) = µF(2nd moment of the severity). The variance of a Compound Poisson is: λ (2nd moment of severity).
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 116
In the example above, the second moment of the severity is: (0.8)(502 ) + (0.2)(1002 ) = 4000. Thus σA2 = µF(2nd moment of the severity) = (7)(4000) = 28,000. As a final example, assume you are given the following: • For a given risk, the number of claims for a year is Poisson with mean 3645. •
The severity distribution is LogNormal, with parameters µ = 5 and σ = 1.5.
•
Frequency and severity are independent.
Exercise: Determine the variance of the aggregate losses for this risk. [Solution: The second moment of the severity = exp(2µ + 2σ2 ) = exp(14.5) = 1,982,759. Thus σA2 = λ(2nd moment of the severity) = (3645)(1,982,759) = 7.22716 x 109 .] Formula in Terms of Moments of Severity: It may sometimes be useful to rewrite the variance of the aggregate loss in terms of the first and second moments of the severity:
σA2 = µFσX2 + µX2 σF2 = µF(E[X2 ] - E[X]2 ) + E[X]2 σF2 = µFE[X2 ] + E[X]2 (σF2 - µF). For a Poisson frequency distribution the final term is zero, σA2 = λE[X2 ]. For a Negative Binomial frequency distribution, σA2 = rβE[X2 ] + E[X]2 rβ2. For the Binomial frequency distribution, σA2 = mqE[X2 ] - E[X]2 mq2 . Normal Approximation: For frequency and severity independent, for large numbers of expected claims, the observed aggregate losses are approximately normally distributed. The more skewed the severity distribution, the higher the expected frequency has to be for the Normal Approximation to produce worthwhile results.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 117
For example, continuing the example above, the mean Poisson frequency is 3645, and the mean severity = exp[µ + 0.5σ2 ] = exp(6.125) = 457.14. Thus the mean aggregate losses is: (3645)(457.14) = 1,666,292. One could ask what the chance of the observed aggregate losses being between 1.4997 million and 1.8329 million. Since the variance of the aggregate losses is 7.22716 x 109 , the standard deviation of the aggregate losses is 85,013. Thus the probability of the observed aggregate losses being within ±10% of 1.6663 million is approximately: Φ[(1.8329 - 1.6663 million) / 85,013] - Φ[(1.4997 - 1.6663 million) / 85,013] = Φ[1.96] - Φ[-1.96] = 0.975 - (1 - 0.975) = 95%. LogNormal Approximation: When one has other than a large number of expected claims, the distribution of aggregate losses typically has a significantly positive skewness. Therefore, it makes sense to approximate the aggregate losses with a distribution that also has a positive skewness.69 Loss Models illustrates how to use a LogNormal Distribution to approximate the Aggregate Distribution.70 One applies the method of moments to fit a LogNormal Distribution with the same mean and variance as the Aggregate Losses. Exercise: For a given risk, the number of claims for a year is Negative Binomial with β = 3.2 and r = 14. The severity distribution is Pareto with parameters α = 2.5 and θ = 10. Frequency and severity are independent. Determine the mean and variance of the aggregate losses. [Solution: The mean frequency is: (3.2)(14) = 44.8. The variance of frequency is: (3.2)(1 + 3.2)(14) = 188.16. The mean severity is 10/(2.5 - 1) = 6.667. The second moment of severity is: 2θ2 / {(α-1)(α-2)} = 200 / {(0.5)(1.5)} = 266.67. The variance of the severity is: 266.67 - 6.6672 = 222.22. Thus the mean of the aggregate losses is: (44.8)(6.667) = 298.7. The variance of the aggregate losses is: (44.8)(222.22) + (6.6672 )(188.16) = 18,319.]
69
The Normal Distribution being symmetric has zero skewness. See Example 9.4 of Loss Models. Actuarial Mathematics, at pages 388-389 not on the Syllabus, demonstrates how to use a “translated Gamma Distribution.” “Approximations of the Aggregate Loss Distribution,” by Papush, Patrik, and Podgaits, CAS Forum Winter 2001, recommends that if one uses a 2 parameter distribution, one use the Gamma Distribution. Loss Models mentions that one could match more than the first two moments by using distributions with more than two parameters. 70
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 118
Exercise: Fit a LogNormal Distribution to the aggregate losses in the previous exercise, by matching the first two moments. [Solution: mean = 298.7. second moment = 18,319 + 298.72 = 107,541. Matching the mean of the LogNormal and the data: exp(µ + 0.5 σ2) = 298.7. Matching the second moment of the LogNormal and the data: exp(2µ + 2σ2) = 107,541. Divide the second equation by the square of the first equation: exp(2µ + 2σ2) / exp(2µ + σ2) = exp(σ2) = 1.205.
⇒ σ = 0.1867 = 0.432. ⇒ µ = ln(298.7) - σ2/2 = 5.606. Comment: The Method of Moments as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”] Then one can use the approximating LogNormal Distribution to answer questions about the aggregate losses. Exercise: Using the LogNormal approximation, estimate is the probability that the aggregate losses are less than 500? [Solution: Φ[(ln(500) - 5.606)/0.432] = Φ[1.41] = 0.9207.] Exercise: Using the LogNormal approximation, estimate is the probability that the aggregate losses are between 200 and 500? [Solution: Φ[(ln(500)-5.606)/0.432] - Φ[(ln(200)-5.606)/0.432] = Φ[1.41] - Φ[-0.71] = 0.9207 - 0.2389 = 0.6818.] Higher Moments of the Aggregate Losses: When frequency and severity are independent, just as one can write the variance or coefficient of variation of the aggregate loss distribution in terms of quantities involving frequency and severity, one can write higher moments in this manner. For example, the third central moment of the aggregate losses can be written as:71 third central moment of the aggregate losses = (mean frequency)(3rd central moment of severity) + 3(variance of frequency)(mean severity)(variance of severity) + (mean severity)3 (3rd central moment of frequency). Note that each term is in dollars cubed. 71
See Equation 9.9 in Loss Models. Also, see either Actuarial Mathematics or Practical Risk Theory for Actuaries, by Daykin, et. al. As shown in the latter, one can derive this formula via the cumulant generating function.
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 119
This formula can be written in terms of skewnesses as follows: E[A - E[A]]3 = µF σX3 γX + 3 σF2 µXσX2 + σF3 γFµX3 . Therefore, the skewness of the aggregate losses is: γA = {µF σX3 γX + 3 σF2 µXσX2 + σF3 γFµX3 } / σA3 . This can also be written in terms of coefficients of variation rather than variances: γA =
(CVX3 γX / µF2 ) + (3 CVF2 CVX2 / µF) + CVF3 γF . CVA3
Exercise: If the frequency is Negative Binomial with r = 27 and β = 7/3, then what are the mean, coefficient of variation and skewness? [Solution: µF = rβ = 63, CVF =
1 + β = 0.2300, and γF = (1+2β) / βr
(1+ β)βr = 0.391.]
Exercise: If the severity is given by a Pareto Distribution with α = 4 and θ = 3, then what are the mean, coefficient of variation and skewness? [Solution: E[X] = θ/(α-1) = 1. E[X2 ] = 2θ2/{(α-1)(α-2)} = 3. E[X3 ] = 6θ2/{(α-1)(α-2)(α-3)} = 27.
µX = E[X] = 1. Var[X] = E[X2 ] - E[X]2 = 2. CVX = 2 /1 = 1.414. 3rd central moment = E[X3 ] - 3E[X]E[X2 ] + 2E[X]3 = 20. γX = 20/21.5 = 7.071.] Exercise: The frequency is Negative Binomial with r = 27 and β = 7/3. The severity is Pareto with α = 4 and θ = 3. Frequency and severity are independent. What are the mean, coefficient of variation and skewness of the aggregate losses? [Solution: µA = µFµX = 63, CVA2 = CVF2 + CVX2 / µF = 0.0846 and therefore CVA = 0.291, and γA =
(CVX3 γX / µF2 ) + (3 CVF2 CVX2 / µF) + CVF3 γF = CVA3
{(1.4143 )(7.071)/632 + 3(1.4142 )(0.232 )/63 + (0.233 )(0.391)} / 0.2913 = 0.0148/0.0246 = 0.60. Note that the variance of the aggregate losses in this case is:
σA2 = µF σX2 + µX2 σF2 = (63)(2) + (1)(210) = 336. Comment: Actuarial Mathematics in Table 12.5.1, not on the Syllabus, has the following formula for the third central moment of a Compound Negative Binomial: r{βE[X3 ] + 3 β2E[X]E[X2 ] + 2 β3E[X]3 }. In this case, this formula gives a third central moment of: (207){(7/3)(27) + (3)(7/3)2 (1)(3) + (2)(7/3)3 (13 ) = 3710. Thus the skewness is: 3710 / (3361.5) = 0.60.]
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 120
As the expected number of claims increases, the skewness of the aggregate losses → 0, making the Normal Approximation better. As discussed above, when the skewness of the aggregate losses is significant, one can approximate the aggregate losses via a LogNormal.72 For the Poisson Distribution the mean = λ, the variance = λ, and the skewness is 1/ λ .73 Therefore, for a Poisson Frequency, σF3 γF = λ1.5/ λ = λ, and the third central moment of the aggregate losses is: E[A - E[A] ]3 = µF σX3 γX + 3 σF2 µXσX2 + σF3 γFµX3 = λ (third central moment of severity) + 3λ µXσX2 + λ µX3 = λ {E[X3 ] - 3µXE[X2 ] - 2µX3 + 3µX (E[X2 ] - µX2 ) + µX3 } = λ (third moment of the severity). The Third Central Moment of a Compound Poisson Distribution is: (mean frequency) (third moment of the severity). For a Poisson Frequency, the variance of the aggregate losses is: λ(2nd moment of severity). Therefore, skewness of a compound Poisson = (third moment of the severity) / { λ (2nd moment of severity)1.5}. Exercise: Frequency is Poisson with mean 3.1. Severity is discrete with: P[X=100] = 2/3, P[X=500] = 1/6, and P[X=1000] = 1/6. Frequency and Severity are independent. What is the skewness of the distribution of aggregate losses? [Solution: The second moment of the severity is: (2/3)(1002 ) + (1/6)(5002 ) + (1/6)(10002 ) = 215,000. The third moment of the severity is: (2/3)(1003 ) + (1/6)(5003 ) + (1/6)(10003 ) = 188,166,667. The skewness of a compound Poisson = (third moment of the severity) / { λ (2nd moment of severity)1.5} = 188,166,667/ { 3.1 (215000)1.5} = 1.072. Comment: Since the skewness is a dimensionless quantity that does not depend on the scale, we would have gotten the same answer if we had instead worked with a severity distribution with all of the amounts divided by 100: P[X=1] = 2/3, P[X=5] = 1/6, and P[X=10] = 1/6.]
The Kurtosis of a Compound Poisson Distribution is: 3 +
72
E[X4 ] . E[X2] 2 λ
As discussed in Actuarial Mathematics, when the skewness of the aggregate losses is significant, one can approximate with a translated Gamma Distribution rather than a Normal Distribution. 73 See “Mahlerʼs Guide to Frequency Distributions.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 121
Per Claim Deductibles:74 If frequency and severity are independent, then the aggregate losses depend on the number of losses greater than the deductible amount and the size of the losses truncated and shifted by the deductible amount. If frequency is Poisson with parameter λ, then the losses larger than d are also Poisson, but with parameter S(d)λ.75 Exercise: Frequency is Poisson with λ = 10. If 19.75% of losses are large, what is the frequency distribution of large losses? [Solution: It is Poisson with λ = (10)(19.75%) = 1.975.] If frequency is Negative Binomial with parameters β and r, then the losses larger than d are also Negative Binomial, but with parameter S(d)β and r.76 Exercise: Frequency is Negative Binomial with r = 2.4 and β = 1.1. If 77.88% of losses are large, what is the frequency distribution of large losses? [Solution: It is Negative Binomial with r = 2.4 and β = (1.1)(0.7788) = 0.8567.] One can then look at the non-zero payments by the insurer. Their sizes are distributed as the original distribution truncated and shifted by d.77 The mean of the non-zero payments = the mean of the severity distribution truncated and shifted by d: {E[X] - (E[X ∧ d]} / S(d). Exercise: For a Pareto with α = 4 and θ = 1000, compute the mean of the non-zero payments given a deductible of 500. [Solution: For the Pareto E[X] = θ/(α-1) = 1000/ 3 = 333.33. The limited expected value is E[X ∧ 500] = {θ/(α−1)} {1−(θ/(θ+500))α−1} = 234.57. S(500) = 1/(1+500/1000)4 = 0.1975. (E[X] - E[X ∧ 500])/S(500) = (333.33 - 234.57)/0.1975 = 500. Alternately, the mean of the data truncated and shifted at 500 is the mean excess loss at 500. For the Pareto, e(x) = (x+θ)/(α-1). e(500) = (500+1000)/(4-1) = 500. Alternately, the distribution truncated and shifted (from below) at 500 is G(x) = {F(x+500) - F(500)}/S(500) = {(1+500/1000)-4 - (1+(x+500)/1000)-4}/(1+500/1000)-4 = 1 - (1+x/1500)-4}. This is a Pareto with α = 4 and θ = 1500, and mean 1500/(4-1) = 500.] 74
A per claim deductible operates on each loss individually. This should be distinguished from an aggregate deductible which applies to the aggregate losses, as discussed below in the section on stop loss premiums. 75 Where S(d) is the survival function of the severity distribution prior to the impact of the deductible. 76 See “Mahlerʼs Guide to Frequency Distributions.” 77 See “Mahlerʼs Guide to Loss Distributions.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 122
Thus a Pareto truncated and shifted at d, is another Pareto with parameters α and θ+d. Therefore the above Pareto distribution truncated and shifted at 500, with parameters 4 and 1000 + 500 = 1500, has a variance of (15002 )(4)/{(4-2)(4-1)2 } = 500,000. The Exponential distribution has a similar nice property. An Exponential Distribution truncated and shifted at d, is another Exponential with the same mean. Thus if one has an Exponential with θ = 2000, and one truncates and shifts at 500 (or any other value) one gets another exponential with θ = 2000. Thus the mean of the severity truncated and shifted is 2000, and the variance is: 20002 = 4,000,000 For any severity distribution, given a deductible of d, the variance of the non-zero payments = the variance of the severity distribution truncated and shifted by d is:78 {E[X2 ] - E[(X ∧ d)2 ] - 2d{E[X] - (E[X ∧ d]}}/S(d) - {{E[X] - (E[X ∧ d]}/S(d)}2 . Exercise: For a Pareto with α = 4 and θ = 1000, use the above formula to compute the variance of the non-zero payments given a deductible of 500. [Solution: E[X2 ] = (10002 )2/((4-1)(4-2) = 333,333. E[(X ∧ 500)2 ]= E[X2 ] {1 - (1+ 500/ θ)1−α[1+ (α-1)500/ θ]} = 86,420. E[X ∧ 500] = 234.57. E[X] = θ/(α-1) = 1000/ 3 = 333.33. S(500) = 0.1975. {E[X2 ] - E[(X ∧ d)2 ] - 2d{E[X] - (E[X ∧ d]}}/S(d) - {{E[X] - (E[X ∧ d]}/S(d)}2 = {333,333 - 86420 - (2)(500(333.33 - 234.57)}/0.1975 - (500)2 = 500,000.] We can then combine the frequency and severity after the effects of the per claim deductible, in order to work with the aggregate losses. Exercise: Frequency is Poisson with λ = 10. Severity is Pareto with α = 4 and θ = 1000. Severity and frequency are independent. There is a per claim deductible of 500. What are the mean and variance of the aggregate losses excess of the deductible? [Solution: The frequency of non-zero payments is Poisson with mean (0.1975)(10) = 1.975. The severity of non-zero payments is Pareto with α = 4 and θ = 1500. The mean aggregate loss is (1.975)(1500/3) = 987.5. The variance of this compound Poisson is: 1.975 (2nd moment of Pareto) = (1.975)(15002 2/ (4-1)(4-2)) = 1,481,250.]
78
See “Mahlerʼs Guide to Loss Distributions.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 123
Exercise: Frequency is Negative Binomial with r = 2.4 and β = 1.1. Severity is Exponential with θ = 2000. Severity and frequency are independent. There is a per claim deductible of 500. What is the mean and variance of the aggregate losses excess of the deductible? [Solution: S(500) = e-500/2000 = 0.7788. The frequency of non-zero payments is: Negative Binomial with r = 2.4 and β = (1.1)(0.7788) = 0.8567. The mean frequency of non-zero payments is: (2.4)(0.8567) = 2.056. The variance of the number of non-zero payments is: (2.4)(0.8567)(1.8567) = 3.818. The severity of non-zero payments is Exponential with θ = 2000. The mean non-zero payment is 2000. The variance of size of non-zero payments is 20002 = 4 million. The mean of the aggregate losses excess of the deductible is: (2.056)(2000) = 4112. The variance of the aggregate losses excess of the deductible is: (2.056)(4 million) + (3.818)(20002 ) = 23.5 million. Comment: See Course 3 Sample Exam, Q.20. ] Maximum Covered Losses:79 Assume the severity follows a LogNormal Distribution with parameters µ = 8 and σ = 2. Assume frequency is Poisson with λ = 100. The mean severity is: exp(µ + σ2/2) = e10 = 22026.47, and the mean aggregate losses are: 2,202,647. The second moment of the severity is: exp(2µ + 2σ2) = e24 = 26.49 billion. Thus the variance of the aggregate losses is: (100)(26.49 billion) = 2649 billion. Exercise: If there is a $250,000 maximum covered loss, what are the mean and variance of the aggregate losses paid by the insurer? [Solution: E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x{1 - Φ[(lnx − µ)/σ]}. E[X ∧ 250,000] = e10 Φ[(ln(250,000) - 8 - 4 )/2] + (250,000){1 - Φ[(ln(250,000) − 8)/2]} = (22026)Φ[.2146] + (250,000)(1−Φ[2.2146]) = (22026)(0.5850) + (250,000)(1- 0.9866) = 16235. E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) − (µ+ 2σ2)} / σ] + x2 {1- Φ[{ln(x) − µ} / σ] }. E[(X ∧ 250,000)2 ] = exp(24)Φ[-1.7854] + 62.5 billion{1-Φ[2.2146]} = (26.49 billion) (0.03710) + (62.5 billion) (1- 0.9866) = 1.820 billion. The frequency is unaffected by the maximum covered loss. Thus the mean aggregate losses are (100)(16,235) = 1.62 million. The variance of the aggregate losses is: (100)(1.820 billion) = 182.0 billion.] 79
See “Mahlerʼs Guide to Loss Distributions.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 124
We note how the maximum covered loss reduces both the mean and variance of the aggregate losses. By cutting off the effect of the heavy tail of the severity, the maximum covered loss has a significant impact on the variance of the aggregate losses. In general one can do similar calculations for any severity distribution and frequency distribution. The moments of the severity are calculated using the limited expected moments, as shown in Appendix A of Loss Models, while the frequency is unaffected by the maximum covered loss. Maximum Covered Losses and Deductibles: If one has both a maximum covered loss u and a per claim deductible d, then the severity is the layer of loss between d and u, while the frequency is the same as that in the presence of just the deductible. The first moment of the nonzero payments is: {E[X ∧ u] - (E[X ∧ d]}}/S(d), while the second moment of the nonzero payments is: {E[(X ∧ u)2 ] - E[(X ∧ d)2 ] - 2d{E[X ∧ u] - (E[X ∧ d]}}/S(d).80
Exercise: If in the previous exercise there were a per claim deductible of $50,000 and a maximum covered loss of $250,000, what would be the mean and variance of the aggregate losses paid by the insurer? [Solution: E[X ∧ 50,000] = 10,078. E[X ∧ 250,000] = 16,235. E[(X ∧ 250,000)2 ] = 1.820 billion. E[(X ∧ 50,000)2 ] = .323 billion. S(50000) = 1 - Φ((ln(50000)-8)/2) = 1 - Φ(1.410) = 1 - 0.9207 = 0.0793. The first moment of the nonzero payments is: (16235 - 10078)/0.0793 = 77,642. The second moment of the nonzero payments is: (1.820 billion - 0.323 billion - (2)(50000)(16235 - 10078))/0.0793 = 11.11 billion. The frequency of nonzero payments is Poisson with λ = (100)(0.0793) = 7.93. Thus the mean aggregate losses are: (7.93)(77642) = 616 thousand. The variance of the aggregate losses is: (7.93)(11.11 billion) = 88.1 billion.] We can see how such calculations involving aggregate losses in the presence of both a deductible and a maximum covered loss, can quickly become too time consuming for exam conditions.
80
See “Mahlerʼs Guide to Loss Distributions.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 125
Compound Frequency Distributions:81 For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by each taxicab is independent of the number of taxicabs that arrive and is independent of the number of passengers dropped off by any other taxicab. Then the aggregate number of passengers dropped off per minute at the Heartbreak Hotel is an example of a compound frequency distribution. Compound distributions are mathematically equivalent to aggregate distributions, with a discrete severity distribution. Poisson ⇔ Frequency. Binomial ⇔ Severity. Thus although compound distributions are not on the syllabus, on your exam, one could describe the above situation as a collective risk model with Poisson frequency and Binomial Severity. Aggregate Distribution
Compound Frequency Distribution
Frequency
⇔
Primary (# of cabs)
Severity
⇔
Secondary (# of passengers per cab)
σC 2 = µf σs 2 + µs 2 σf 2 . f ⇔ frequency or first (primary) s ⇔ severity or secondary.
81
Discussed more extensively in “Mahlerʼs Guide to Frequency Distributions.”
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 126
Process Covariances:82 Assume a claims process in which frequency and severity are independent of each other, and the claim sizes are mutually independent random variables with a common distribution.83 Then let each claim be divided into two pieces in a well-defined manner not dependent on the number of claims. For convenience, we refer to these two pieces as primary and excess.84 Let:
Tp = Total Primary Losses
Te = Total Excess Losses
Xp = Primary Severity
Xe = Excess Severity
N = Frequency Then as will be proved subsequently: COV[Tp , Te ] = E[N] COV[Xp ,Xe ] + VAR[N] E[Xp ] E[Xe ].85 Exercise: Assume severity follows an Exponential Distribution with θ = 10,000. The first 5000 of each claim is considered primary losses. Xp = Primary Severity = X ∧ 5000. Excess of 5000 is considered excess losses. Xe = Excess Severity = (X - 5000)+. Determine the covariance of Xp and Xe . Hint:
∫ x e - x / θ / θ dx = -x e-x/θ - θ e-x/θ. ∫ x2 e- x / θ / θ dx = -x2 e-x/θ - 2xθ e-x/θ - 2θ2 e-x/θ.
[Solution: E[Xp ] = E[X ∧ 5000] = (10,000) (1 - e-5000/10,000) = 3935. E[Xe ] = 10,000 - 3935 = 6065. ∞
5000
E[Xp Xe ] = ∞
=
∫0
x 0 e - x / 10000 dx +
x (x - 5000) e- x / 10000 dx ∫ 5000
∞
x2 e- x / 10000 dx - 5000 x e - x / 10000 dx ∫ ∫ 5000 5000
= {(50002 ) +(2)(5000)(10,000) + (2)(10,0002 )} e-0.5 - (5000){5000 + 10,000}e-0.5 = 151.633 million. Cov[Xp , Xe ] = E[Xp Xe ] - E[Xp ]E[Xe ] = 151.633 million - (3935)(6065) = 127.767 million.] 82
Beyond what you will be asked on your exam. In other words, assume the usual collective risk model. 84 In a single-split experience rating plan, the first $5000 of each claim might be primary and anything over $5000 would contribute to the excess losses. 85 This is a generalization of the formula we had for the process variance of aggregate losses. See Appendix A of Howard Mahlerʼs discussion of Glenn Meyersʼ “An Analysis of Experience Rating”, PCAS 1987. 83
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 127
Exercise: Assume severity follows an Exponential Distribution with θ = 10,000. The first 5000 of each claim is primary losses. Excess of 5000 is excess losses. Frequency is Negative Binomial with r = 3 and β = 0.4. Determine the covariance of the total primary and total excess losses. [Solution: COV[Tp , Te ] = E[N] COV[Xp ,Xe ] + VAR[N] E[Xp ] E[Xe ] = (3)(0.4) (127.761 million) + (3)(0.4)(1.4) (3935)(6065) = 193.408 million.] Proof of the Result for Process Covariances: The total primary losses Tp is the sum of the individual primary portions of claims Xp (i), where i runs from 1 to N, the number of claims. Similarly, Te is a sum of Xe (i). Since N is a random variable, both frequency and severity contribute to the covariance of Tp and Te . To compute the covariance of Tp , and Te , begin by calculating E[Tp Te | N=n]. n ⎡ n ⎢ Fix the number of claims n and find E Xe(i) Xp(i) ⎢ ⎣ i=1 i=1
∑
∑
⎤ ⎥. ⎥ ⎦
Expanding the product yields n2 terms of the form Xp (i) Xe (j). From the definition of covariance, when i = j the expected value of the term is: E[Xp (i) Xe (i)] = COV[Xp (i), Xe (i)] + E[Xp (i)] E[Xe (i)]. Otherwise, for i ≠ j, X(i) and X(j) are independent and E[Xp (i) Xe (j)] = E[Xp (i)] E[Xe (j)]. n ⎡ n ⎢ Thus E Xe(i) Xp(i) ⎢ ⎣ i=1 i=1
∑
∑
⎤ ⎥ = n COV[Xp , Xe ] + n2 E[Xp ] E[Xe ]. ⎥ ⎦
Now, by general considerations of conditional expectations: E[Tp Te ] = EN[ E[Tp Te | N=n] ]. Thus, taking the expected value of the above equation with respect to N gives: E[Tp Te ] = E[N] COV[Xp , Xe ] + E[N2 ] E[Xp ] E[Xe ] = E[N] COV[Xp , Xe ] + {Var[N] + E[N]2 } E[Xp ] E[Xe ]. COV[Tp , Te ] = E[Tp Te ] - E[Tp ] E[Te ] = E[N] COV[Xp , Xe ] + {Var[N] + E[N]2 } E[Xp ] E[Xe ] - E[N] E[Xp ] E[N] E[Xe ] = E[N] COV[Xp ,Xe ] + VAR[N] E[Xp ] E[Xe ].
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 128
Problems: 5.1 (1 point) You are given the following: • mean frequency = 13 • variance of the frequency = 37 • mean severity = 300 • variance of the severity = 200,000 • frequency and severity are independent What is the variance of the aggregate losses? A. Less than 5 million B. At least 5 million but less than 6 million C. At least 6 million but less than 7 million D. At least 7 million but less than 8 million E. At least 8 million 5.2 (2 points) A six-sided die is used to determine whether or not there is a claim. Each side of the die is marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. Two sides are marked with a 0 and four sides with a 1. In addition, there is a spinner representing claim severity. The spinner has three areas marked 2, 5 and 14. The probabilities for each claim size are: Claim Size Probability 2 20% 5 50% 14 30% The die is rolled and if a claim occurs, the spinner is spun. What is the variance for a single trial of this risk process? A. Less than 24 B. At least 24 but less than 25 C. At least 25 but less than 26 D. At least 26 but less than 27 E. At least 27 5.3 (2 points) You are given the following: • Number of claims for an insured follows a Poisson distribution with mean 0.25. • The amount of a single claim has a uniform distribution on [0, 5000] • Number of claims and claim severity are independent. Determine the variance of the aggregate losses for this insured. A. Less than 2.1 million B. At least 2.1 million but less than 2.2 million C. At least 2.2 million but less than 2.3 million D. At least 2.3 million but less than 2.4 million E. At least 2.4 million
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 129
5.4 (3 points) You are given the following: • For a given risk, the number of claims for a single exposure period will be 1, with probability 4/5; or 2, with probability 1/5. • If only one claim is incurred, the size of the claim will be 50, with probability 3/4; or 200, with probability 1/4. • If two claims are incurred, the size of each claim, independent of the other, will be 50, with probability 60%; or 150, with probability 40%. Determine the variance of the aggregate losses for this risk. A. Less than 4,000 B. At least 4,000, but less than 4,500 C. At least 4,500, but less than 5,000 D. At least 5,000, but less than 5,500 E. At least 5,500 5.5 (2 points) A large urn has many balls of three different kinds in the following proportions: Type of Ball: Proportion Red 70% Green $50 20% Green $200 10% The risk process is as follows: 1. Set the aggregate losses equal to zero. 2. Draw a ball from the urn. 3. If the ball is Red then Exit, otherwise continue to step 4. 4. If the ball is Green add the amount shown to the aggregate losses and return to step 2. Determine the process variance of the aggregate losses for a single trial of this risk process. A. Less than 9,000 B. At least 9,000 but less than 10,000 C. At least 10,000 but less than 11,000 D. At least 11,000 but less than 12,000 E. At least 12,000 5.6 (3 points) Assume there are 3 types of risks. Whether or not there is a claim is determined by whether a six-sided die comes up with a zero or a one, with a one indicating a claim. If a claim occurs then its size is determined by a spinner. Type Number of die faces with a 1 rather than a 0 Claim Size Spinner I 2 $100 70%, $200 30% II 3 $100 50%, $200 50% III 4 $100 30%, $200 70% Determine the variance of aggregate annual losses for a portfolio of 300 risks, consisting of 100 risks of each type. A. 1.9 million B. 2.0 million C. 2.1 million D. 2.2 million E. 2.3 million
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 130
Use the following information for the next 5 questions: • •
Number of claims follows a Poisson distribution with mean of 5. Claim severity is independent of the number of claims and has the following probability density function: f(x) = 3.5 x-4.5, x >1.
5.7 (2 points) Determine the variance of the aggregate losses. A. Less than 11.0 B. At least 11.0 but less than 11.3 C. At least 11.3 but less than 11.6 D. At least 11.6 but less than 11.9 E. At least 11.9 5.8 (2 points) Using the Normal Approximation, estimate the probability that the aggregate losses will exceed 11. A. 10% B. 12% C. 14% D. 16% E. 18% 5.9 (2 points) Approximating with a LogNormal Distribution, estimate the probability that the aggregate losses will exceed 11. A. Less than 12% B. At least 12%, but less than 14% C. At least 14%, but less than 16% D. At least 16%, but less than 18% E. At least 18% 5.10 (2 points) Determine the skewness of the aggregate losses. A. Less than 0.4 B. At least 0.4 but less than 0.6 C. At least 0.6 but less than 0.8 D. At least 0.8 but less than 1.0 E. At least 1.0 5.11 (2 points) Determine the variance of the aggregate losses, if there is a maximum covered loss of 5. A. Less than 11.0 B. At least 11.0 but less than 11.3 C. At least 11.3 but less than 11.6 D. At least 11.6 but less than 11.9 E. At least 11.9
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 131
Use the following information for the next two questions: •
Number of claims follows a Poisson distribution with mean µ.
•
The amount of a single claim has an exponential distribution given by: f(x) = e-x/θ / θ , x > 0, θ > 0
•
Number of claims and claim severity distributions are independent.
5.12 (2 points ) Determine the variance of the aggregate losses. A. µθ
B. µθ2
C. 2µθ
D. 2µθ2
E. None of A, B, C, or D.
5.13 (2 points ) Determine the skewness of the aggregate losses. 1 θ 3 3θ A. B. C. D. E. None of A, B, C, or D. 2µ 2µ 2µ 2µ 5.14 (1 point) You are given the following:
•
The frequency distribution follows the Poisson process with mean 3.
•
The second moment about the origin for the severity distribution is 200.
•
Frequency and Severity are independent.
What is the variance of the aggregate losses? A. 400 B. 450 C. 500 D. 550
E. 600
5.15 (3 points) The number of accidents any particular automobile has during a year is Poisson with mean 0.03. The damage to an automobile due any single accident is uniformly distributed over the interval from 0 to 3000. Using the Normal Approximation, what is the minimum number of independent automobiles that must be insured so that the probability that the aggregate annual losses exceed 160% of expected is at most 5%? A. 295 B. 305 C. 315 D. 325 E. 335 5.16 (2 points) You are given the following:
• For baseball player Don, the number of official at bats in a season is Poisson with λ = 600. • For Don the probabilities of the following types of hits per official at bat are: Single 22%, Double 4%, Triple 1%, and Home Run 5%. • Donʼs contract provides him incentives of: $2000 per single, $4000 per double, $6000 per triple and $8000 per home run ($2000 per base.) What is the chance that next year Don will earn at most $700,000 from his incentives? Use the Normal Approximation. A. 84% B. 86% C. 88% D. 90% E. 92%
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 132
Use the following information for the next five questions: There are three types of risks. For each type of risk, the frequency and severity are independent. Type Frequency Distribution Severity Distribution Ι
Binomial: m = 8, q = 0.4
Pareto: α = 4, θ = 1000
ΙΙ
Poisson: λ = 3
LogNormal: µ = 7, σ = 0.5
ΙΙΙ
Negative Binomial: r = 3, β = 2
Gamma: α = 3, θ = 200
5.17 ( 2 points) For a risk of Type Ι, what is the variance of the aggregate losses? A. Less than 0.5 million B. At least 0.5 million but less than 0.6 million C. At least 0.6 million but less than 0.7 million D. At least 0.7 million but less than 0.8 million E. At least 0.8 million 5.18 ( 2 points) For a risk of Type ΙΙ, what is the variance of the aggregate losses? A. Less than 5.7 million B. At least 5.7 million but less than 5.8 million C. At least 5.8 million but less than 5.9 million D. At least 5.9 million but less than 6.0 million E. At least 6.0 million 5.19 ( 2 points) For a risk of Type ΙΙΙ, what is the variance of the aggregate losses? A. Less than 7.0 million B. At least 7.0 million but less than 7.5 million C. At least 7.5 million but less than 8.0 million D. At least 8.0 million but less than 8.5 million E. At least 8.5 million 5.20 ( 2 points) Assume one has a portfolio made up of 55 risks of Type Ι, 35 risks of Type ΙΙ, and 10 risks of Type ΙΙΙ. Each risk in the portfolio is independent of all the others. For this portfolio, what is the variance of the aggregate losses? A. 310 million
B. 320 million
C. 330 million
D. 340 million
E. 350 million
5.21 (5 points) For a risk of Type ΙΙΙ, what is the skewness of the aggregate losses? A. Less than 1.0 B. At least 1.0 but less than 1.1 C. At least 1.1 but less than 1.2 D. At least 1.2 but less than 1.3 E. At least 1.3
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 133
Use the following information for the next 6 questions:
• The severity distribution is an Exponential distribution with θ = 5000, prior to the impact of any deductible or maximum covered loss.
• The number of losses follows a Poisson distribution with λ = 2.4, prior to the impact of any deductible.
• Frequency and severity are independent. 5.22 (1 point) What are the mean aggregate losses excess of a 1000 per claim deductible? A. 9,600 B. 9,800 C. 10,000 D. 10,200 E. 10,400 5.23 (2 points) What is the standard deviation of the aggregate losses excess of a 1000 per claim deductible? A. 8,700 B. 9,000 C. 9,300 D. 9,600 E. 9,900 5.24 (1 point) What are the mean aggregate losses if there is a 10,000 maximum covered loss and no deductible? A. Less than 9,800 B. At least 9,800, but less than 10,000 C. At least 10,000, but less than 10,200 D. At least 10,200, but less than 10,400 E. At least 10,400 5.25 (2 points) What is the standard deviation of the aggregate losses if there is a 10,000 maximum covered loss? A. Less than 6,000 B. At least 6,000, but less than 7,000 C. At least 7,000, but less than 8,000 D. At least 8,000, but less than 9,000 E. At least 9,000 5.26 (2 points) What are the mean aggregate losses if there is both a 1000 per claim deductible and a 10,000 maximum covered loss? A. 7,800 B. 8,000 C. 8,200 D. 8,400 E. 8,600 5.27 (3 points) What is the standard deviation of the aggregate losses if there is both a 1000 per claim deductible and a 10,000 maximum covered loss? A. Less than 6,500 B. At least 6,500, but less than 7,000 C. At least 7,000, but less than 7,500 D. At least 7,500, but less than 8,000 E. At least 8,000
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 134
Use the following size of loss data from ABC Insurance for the next 6 questions: Range number of losses 0-1 60 1-3 30 3-5 20 5-10 10 120 Assume a uniform distribution of loss sizes within each interval. In addition there are 5 losses of size greater than 10: 12, 15, 17, 20, 30. 5.28 (2 points) Calculate the mean. A. 2.3 B. 2.5 C. 2.7
D. 2.9
E. 3.1
5.29 (3 points) Calculate the variance. A. less than 15.5 B. at least 15.5 but less than 16.0 C. at least 16.0 but less than 16.5 D. at least 16.5 but less than 17.0 E. at least 17.0 5.30 (2 points) Calculate e(7). A. less than 6.0 B. at least 6.0 but less than 6.5 C. at least 6.5 but less than 7.0 D. at least 7.0 but less than 7.5 E. at least 7.5 5.31 (2 points) The annual number of losses for ABC Insurance is Poisson with mean 40. What is the coefficient of variation of its aggregate annual losses? (A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7 5.32 (3 points) The annual number of losses for ABC Insurance is Poisson with mean 40. ABC Insurance buys reinsurance for the layer 10 excess of 5 (the layer from 5 to 15). How much does the reinsurer expect to pay per year due to losses by ABC Insurance? (A) 19 (B) 20 (C) 21 (D) 22 (E) 23 5.33 (3 points) The annual number of losses for ABC Insurance is Poisson with mean 40. ABC Insurance buys reinsurance for the layer 10 excess of 5. What is the coefficient of variation of the annual payment by the reinsurer due to losses by ABC Insurance? (A) 0.6 (B) 0.7 (C) 0.8 (D) 0.9 (E) 1.0
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 135
5.34 (3 points) You are given the following:
• The severity distribution is a Pareto distribution with α = 3.2 and θ = 20,000, prior to the impact of any deductible.
• The number of losses follows a Negative Binomial with r = 4.1 and β = 2.8, prior to the impact of any deductible.
• Frequency and severity are independent. • There is a 50,000 per claim deductible. What is the chance that the aggregate losses excess of the deductible are greater than 15,000? Use the Normal Approximation. A. Less than 20% B. At least 20%, but less than 25% C. At least 25%, but less than 30% D. At least 30%, but less than 35% E. At least 35% Use the following information for the next two questions:
• The claim frequency for each policy is Poisson. • The expected mean frequencies differ across the portfolio of policies. • The mean frequencies are Gamma Distributed across the portfolio with α = 5 and θ = 0.4. • Claim severity has a mean of 20 and a variance of 300. • Claim frequency and severity are independent. 5.35 (3 points) If an insurer has sold 200 independent policies, determine the probability that the aggregate loss for the portfolio will exceed 110% of the expected loss. Use the Normal Approximation. A. 7.5% B. 8.5% C. 9.5% D. 10.5% E. 11.5% 5.36 (2 points) Determine the minimum number of independent policies that would have to be sold so that the probability that the aggregate loss for the portfolio will exceed 110% of the expected loss does not exceed 1%. Use the Normal Approximation. A. 400 B. 500 C. 600 D. 700 E. 800 5.37 (2 points) Frequency has mean 10 and variance 20. Severity has mean 1000 and variance 200,000. Severities are independent of each other and of the number of claims. Let σ be the standard deviation of the aggregate losses. Let σʼ be the standard deviation of the aggregate losses, given that 8 claims have occurred. Calculate σ/σʼ. (A) 2.9
(B) 3.1
(C) 3.3
(D) 3.5
(E) 3.7
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 136
5.38 (3 points) Use the following information:
• An insurer issues a policy that pays for hospital stays. • Each hospital stay results in room charges and other charges. • Total room charges for a hospital stay have mean 5000 and standard deviation 8000. • Total other charges for a hospital stay have mean 2000 and standard deviation 3000. • The correlation between total room charges and total other charges for a hospital stay is 0.6. • The insurer reimburses 100% for room charges and 75% for other charges. • The number of annual admission to the hospital is Binomial with m = 4 and q = 0.1. Determine the standard deviation of the insurer's annual aggregate payments for this policy. (A) 5500 (B) 6000 (C) 6500 (D) 7000 (E) 7500 5.39 (3 points) Use the following information:
• An insurer issues a policy that pays for loss plus loss adjustment expense. • Losses follow a Gamma Distribution with α = 4 and θ = 1000. • Loss Adjustment Expenses follow a Gamma Distribution with α = 3 and θ = 200. • The correlation between loss and loss adjustment expense is 0.8. • The number of annual claims is Poisson with λ = 0.6. Determine the standard deviation of the insurer's annual aggregate payments for this policy. (A) 3800 (B) 3900 (C) 4000 (D) 4100 (E) 4200 5.40 (2 points) For aggregate claims A, you are given: ∞
(i) fA(x) =
∑ p* n (x) 3n e - 3 /
n!
n=0
(ii)
x p(x) 1 0.5 2 0.3 3 0.2 Determine Var[A]. (A) 7.5 (B) 8.5
(C) 9.5
(D) 10.5
(E) 11.5
5.41 (3 points) Aggregate Losses have a mean of 100 and a variance of 90,000. Approximating the aggregate distribution by a LogNormal Distribution, estimate the probability that the aggregate losses are greater than 2000. (A) 0.1% (B) 0.2% (C) 0.3% (D) 0.4% (E) 0.5%
2013-4-3,
Aggregate Distributions §5 Moments,
HCM 10/23/12,
Page 137
5.42 (3 points) Use the following information: • The number of claims follows a Poisson distribution with a mean of 7. •
Claim severity has a Pareto Distribution with α = 3 and θ = 100.
• Frequency and severity are independent. Approximating the aggregate losses by a LogNormal Distribution, estimate the probability that the aggregate losses will exceed 1000. A. 1% B. 2% C. 3% D. 4% E. 5% 5.43 (2 points) The number of losses is Poisson with mean λ. The ground up distribution of size of loss is Exponential with mean θ. Frequency and severity are independent. Let B be the variance of aggregate payments if there is a deductible b. Let C be the variance of aggregate payments if there is a deductible c > b. Determine the ratio of C/B. When is this ratio less than one, equal to one, and greater than one? 5.44 (2 points) Frequency is Poisson with λ = 3. The size of loss distribution is Exponential with θ = 400. Frequency and severity are independent. There is an ordinary deductible of 500. Calculate the variance of the aggregate payments excess of the deductible. A. Less than 250,000 B. At least 250,000, but less than 260,000 C. At least 260,000, but less than 270,000 D. At least 270,000, but less than 280,000 E. 280,000 or more 5.45 (3 points) The number of losses is Poisson with mean λ. The ground up distribution of size of loss is Pareto with parameters α > 2, and θ. Frequency and severity are independent. Let B be the variance of aggregate payments if there is a deductible b. Let C be the variance of aggregate payments if there is a deductible c > b. Determine the ratio of C/B. When is this ratio less than one, equal to one, and greater than one?
Use the following information for the next six questions:
The distribution of aggregate losses has a mean of 20 and a variance of 100.

5.46 (1 point) Approximate the distribution of aggregate losses by a Normal Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.47 (3 points) Approximate the distribution of aggregate losses by a LogNormal Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
A. 1.5% B. 2.0% C. 2.5% D. 3.0% E. 3.5%

5.48 (3 points) Approximate the distribution of aggregate losses by a Gamma Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
Hint: Γ[n; λ] = 1 - Σ_{i=0}^{n-1} e^-λ λ^i / i!
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.49 (4 points) Approximate the distribution of aggregate losses by an Inverse Gaussian Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
Use the following approximation for x > 3: 1 - Φ[x] ≅ (exp[-x²/2] / √(2π)) (1/x - 1/x³ + 3/x⁵ - 15/x⁷).
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%
5.50 (4 points) If Y follows a Poisson Distribution with parameter λ, then for c > 0, cY follows an “Over-dispersed Poisson” Distribution with parameters c and λ.
Approximate the distribution of aggregate losses by an Over-dispersed Poisson Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%

5.51 (4 points) Approximate the distribution of aggregate losses by an Inverse Gamma Distribution with mean and variance equal to that of the aggregate losses.
Estimate the probability that the aggregate losses are greater than 42.
Hint: Γ[n; λ] = 1 - Σ_{i=0}^{n-1} e^-λ λ^i / i!
A. Less than 2.0%
B. At least 2.0%, but less than 2.5%
C. At least 2.5%, but less than 3.0%
D. At least 3.0%, but less than 3.5%
E. At least 3.5%
5.52 (2 points) The frequency for each insurance policy is Poisson with mean 2. The cost per loss has mean 5 and standard deviation 12. The number of losses and their sizes are all mutually independent. Determine the minimum number of independent policies that would have to be sold so that the probability that the aggregate loss for the portfolio will exceed 115% of the expected loss does not exceed 2.5%. Use the Normal Approximation. (A) 500 (B) 600 (C) 700 (D) 800 (E) 900 5.53 (3 points) Frequency is Negative Binomial with r = 4 and β = 3. The size of loss distribution is Exponential with θ = 1700. Frequency and severity are independent. There is an ordinary deductible of 1000. Calculate the variance of the aggregate payments excess of the deductible. A. 40 million B. 50 million C. 60 million D. 70 million
E. 80 million
Use the following information for the next two questions:
An insurance company sold policies as follows:
Number of    Policy     Probability of
Policies     Maximum    Claim Per Policy
10,000       25         3%
15,000       50         5%
You are given:
(i) The claim amount for each policy is uniformly distributed between 0 and the policy maximum.
(ii) The probability of more than one claim per policy is 0.
(iii) Claim occurrences are independent.

5.54 (2 points) What is the variance of aggregate losses?
A. Less than 650,000
B. At least 650,000, but less than 700,000
C. At least 700,000, but less than 750,000
D. At least 750,000, but less than 800,000
E. 800,000 or more

5.55 (1 point) What is the probability that aggregate losses are greater than 24,000? Use the Normal Approximation.
A. Less than 4%
B. At least 4%, but less than 5%
C. At least 5%, but less than 6%
D. At least 6%, but less than 7%
E. At least 7%
5.56 (3 points) The number of Property Damage Liability claims is Poisson with mean λ. The size of Property Damage Liability claims has mean 10 and standard deviation 15. The number of Bodily Injury Liability claims is Poisson with mean λ/3. The size of Bodily Injury Liability claims has mean 24 and standard deviation 60. Let P = the 90th percentile of the aggregate distribution of Property Damage Liability. Let B = the 90th percentile of the aggregate distribution of Bodily Injury Liability. B/P = 1.061. Using the Normal Approximation, determine λ. A. 60
B. 70
C. 80
D. 90
E. 100
5.57 (5 points) Use the following information:
• You are given five years of observed aggregate losses:
  Year                          2006    2007    2008    2009    2010
  Aggregate Loss ($ million)    31      38      36      41      41
• Frequency is Poisson with mean 3000.
• Severity follows a Pareto Distribution.
• Frequency and severity are independent.
• Inflation is 4% per year.
Using the method of moments to fit the aggregate distribution to the data, estimate the probability that an individual loss will be of size greater than $20,000 in 2012.
A. Less than 5%
B. At least 5%, but less than 10%
C. At least 10%, but less than 15%
D. At least 15%, but less than 20%
E. At least 20%

5.58 (3 points) Let S be the aggregate loss and N be the number of claims.
Given the following information, determine the variance of S.
N    Probability    E[S | N]    E[S² | N]
0    20%            0           0
1    40%            100         50,000
2    30%            250         150,000
3    10%            400         300,000
A. 60,000 B. 65,000 C. 70,000 D. 75,000 E. 80,000
5.59 (3 points) Frequency is Binomial with m = 5 and q = 0.4. Severity is LogNormal with µ = 6 and σ = 0.3. Frequency and severity are independent. Using the Normal Approximation, estimate the probability that the aggregate losses are greater than 150% of their mean. A. 20% B. 25% C. 30% D. 35% E. 40%
5.60 (3 points) Use the following information:
• Annual claim occurrences follow a Zero-Modified Negative Binomial Distribution with p0M = 40%, r = 2, and β = 0.4.
• Each claim amount follows a Gamma Distribution with α = 3 and θ = 500.
• Claim occurrences and amounts are independent.
Determine the variance of aggregate annual losses.
A. 3.2 million B. 3.4 million C. 3.6 million D. 3.8 million E. 4.0 million
5.61 (2 points) You are given six years of aggregate losses: 111, 106, 98, 120, 107, 113.
Use the sample variance together with the Normal Approximation, in order to estimate the probability that the aggregate losses next year are less than 100.
A. 9% B. 10% C. 11% D. 12% E. 13%

5.62 (3 points) The number of losses per year has a Poisson distribution with a mean of 0.35. There are three types of claims:
Type of Claim    Mean Frequency    Mean Severity    Coefficient of Variation of Severity
I                0.20              100              5
II               0.10              200              4
III              0.05              300              3
The number of claims of one type is independent of the number of claims of the other types.
Determine the variance of the distribution of annual aggregate losses.
(A) 150,000 (B) 165,000 (C) 180,000 (D) 195,000 (E) 210,000

5.63 (2 points) The distribution of the number of claims is:
n    f(n)
1    40%
2    30%
3    20%
4    10%
The natural logarithms of the sizes of claims are Normally distributed with mean 6 and variance 0.7.
Determine the variance of the distribution of annual aggregate losses.
(A) 1.0 million (B) 1.1 million (C) 1.2 million (D) 1.3 million (E) 1.4 million

5.64 (2 points) X has the following distribution: Prob[X = 0] = 20%, Prob[X = 1] = 30%, Prob[X = 2] = 50%.
Y is the sum of X independent Normal random variables, each with mean 3 and variance 5.
What is the variance of Y?
A. 8 B. 9 C. 10 D. 11 E. 12
5.65 (4 points) For liability insurance, the number of accidents per year is Poisson with mean 10%.
The number of claimants per accident follows a zero-truncated Binomial Distribution with m = 4 and q = 0.2.
The size of each claim follows a Gamma Distribution with α = 3 and θ = 10,000.
Determine the coefficient of variation of the aggregate annual losses.
A. 3.2 B. 3.4 C. 3.6 D. 3.8 E. 4.0

5.66 (5 points) The Spring & Sommers Company has 2000 employees. Spring & Sommers provides a generous disability program for its employees.
A disabled employee is paid 2/3 of his or her weekly salary. The company self-insures the first 5 weeks of any disability and has an insurance policy that will cover any disability payments beyond 5 weeks.
Occurrences of disability among employees are independent of one another. Assume that an employee can suffer at most one disability per year.
Disabilities have duration:
1 week             30%
2 weeks            20%
3 weeks            10%
4 weeks            10%
5 or more weeks    30%
There are two types of employees:
Type    Number of Employees    Weekly Salary    Annual Probability of a Disability
1       1500                   600              5%
2       500                    900              8%
Determine the coefficient of variation of the distribution of total annual payments Spring & Sommers pays its employees for disabilities, excluding any amounts paid by the insurance policy.
(A) 0.10 (B) 0.15 (C) 0.20 (D) 0.25 (E) 0.30

5.67 (3 points) Use the following information:
• Annual claim occurrences follow a Zero-Modified Poisson Distribution with p0M = 25% and λ = 0.1.
• Each claim amount follows a LogNormal Distribution with µ = 8 and σ = 0.6.
• Claim occurrences and amounts are independent.
Determine the variance of aggregate annual losses.
A. 6.0 million B. 6.5 million C. 7.0 million D. 7.5 million E. 8.0 million
5.68 (4, 5/85, Q.35) (2 points) Suppose x is a claim-size variable which is gamma distributed with probability density function:
f(x) = a^r x^(r-1) e^(-ax) / Γ(r), where a and r are > 0 and x > 0, mean = r/a, variance = r/a².
Let T = total losses = Σ_{i=1}^{N} xi, where N is a positive integer.
Assume the number of claims is independent of their amounts.
If E[N] = λ and Var[N] = 2λ, which of the following is the variance of T?
A. λr(2r + λ)/a²
B. 2λ + r/a²
C. 2λr/a²
D. λr(2r + 1)/a²
E. None of the above

5.69 (4, 5/87, Q.44) (1 point) Let Xi be independent, identically distributed claim-size variables which are gamma-distributed, with parameters α and θ.
Let T = total losses = Σ_{i=1}^{N} Xi, where N is a positive integer.
Assume the number of claims is independent of their amounts.
If E[N] = m, Var[N] = 3m, which of the following is the variance of T?
A. 3m + αθ²
B. mα(3α + 1)θ²
C. mα(3α + m)θ²
D. 3mαθ²
E. None of A, B, C, or D.

5.70 (4, 5/90, Q.43) (2 points) Let N be a random variable for the claim count with:
Pr{N = 4} = 1/4, Pr{N = 5} = 1/2, Pr{N = 6} = 1/4.
Let X be a random variable for claim severity with probability density function f(x) = 3x^-4, for 1 ≤ x < ∞.
Find the coefficient of variation, R, of the aggregate loss distribution, assuming that claim severity and frequency are independent.
A. R < 0.35
B. 0.35 ≤ R < 0.50
C. 0.50 ≤ R < 0.65
D. 0.65 ≤ R < 0.70
E. 0.70 ≤ R
5.71 (Course 151 Sample Exam #1, Q.10) (1.7 points) For aggregate claims S, you are given:
(i) fS(x) = Σ_{n=0}^{∞} p*n(x) (n+2 choose n) (0.6)³ (0.4)^n
(ii)
x    p(x)
1    0.3
2    0.6
3    0.1
Determine Var[S].
(A) 7.5 (B) 8.5 (C) 9.5 (D) 10.5 (E) 11.5
5.72 (Course 151 Sample Exam #1, Q.18) (2.5 points) The policies of a building insurance company are classified according to the location of the building insured:
         Number of Policies    Claim     Claim
Region   in Region             Amount    Probability
A        20                    300       0.01
B        10                    500       0.02
C        5                     600       0.03
D        15                    500       0.02
E        18                    100       0.01
There is at most one claim per policy and if there is a claim it is for the stated amount.
Using the normal approximation, relative security loadings are computed for each region such that the probability that the total claims for the region do not exceed the premiums collected from policies in that region is 0.95.
The relative security loading is defined as: (premiums / expected losses) - 1.
Which region pays the largest relative security loading?
(A) A (B) B (C) C (D) D (E) E
5.73 (Course 151 Sample Exam #2, Q.13) (1.7 points) For aggregate claims S = Σ_{i=1}^{N} Xi, you are given:
(i) Xi has distribution:
    x    p(x)
    1    p
    2    1 - p
(ii) Λ is a Poisson random variable with parameter 1/p
(iii) given Λ = λ, N is Poisson with parameter λ
(iv) the number of claims and claim amounts are mutually independent
(v) Var(S) = 19/2.
Determine p.
(A) 1/6 (B) 1/5 (C) 1/4 (D) 1/3 (E) 1/2

5.74 (Course 151 Sample Exam #2, Q.14) (1.7 points) For an insured portfolio, you are given:
(i) the number of claims has a Geometric distribution with β = 1/3
(ii) individual claim amounts can take values of 3, 4 or 5 with equal probability
(iii) the number of claims and claim amounts are independent
(iv) the premium charged equals expected aggregate claims plus the variance of aggregate claims.
Determine the exact probability that aggregate claims exceed the premium.
(A) 0.01 (B) 0.03 (C) 0.05 (D) 0.07 (E) 0.09

5.75 (Course 151 Sample Exam #2, Q.16) (1.7 points) Let S be the aggregate claims for a collection of insurance policies. You are given:
The size of claims has mean E[X] and second moment E[X²].
G is the premium with relative security loading η, (premiums / expected losses) - 1.
S has a compound Poisson distribution with parameter λ.
R = S/G (the loss ratio).
Which of the following is an expression for Var(R)?
(A) E[X²] / {E[X] (1 + η)}
(B) E[X²] / {λ E[X]² (1 + η)}
(C) E[X²] / {λ² E[X]² (1 + η)}
(D) E[X²] / {λ E[X]² (1 + η)²}
(E) E[X²]² / {λ E[X]² (1 + η)²}
5.76 (Course 151 Sample Exam #3, Q.8) (1.7 points) An insurer has a portfolio of 40 independent policies. For each policy you are given:
• The probability of a claim is 1/8 and there is at most one claim per policy.
• The benefit amount given that there is a claim has an Inverse Gaussian distribution with µ = 400 and θ = 8000.
Using the Normal approximation, determine the probability that the total claims for the portfolio are greater than 2900.
(A) 0.03 (B) 0.06 (C) 0.09 (D) 0.12 (E) 0.15

5.77 (Course 151 Sample Exam #3, Q.10) (1.7 points) An insurance company is selling policies to individuals with independent future lifetimes and identical mortality profiles.
For each individual, the probability of death by all causes is 0.10 and the probability of death due to accident is 0.01.
Each insurance policy pays the following benefits: 10 for accidental death, 1 for non-accidental death.
The company wishes to have at least a 95% probability that premiums with a relative security loading of 0.20 are adequate to cover claims.
The relative security loading is: (premiums / expected losses) - 1.
Using the normal approximation, determine the minimum number of policies that must be sold.
(A) 1793 (B) 1975 (C) 2043 (D) 2545 (E) 2804

5.78 (5A, 11/94, Q.20) (1 point) The probability of a loss in a given period is 0.01. The probability of more than one loss in a given period is 0.
Given that a loss occurs, the damage is assumed to be uniformly distributed over the interval from 0 to 10,000.
What is the variance of the aggregate loss experience within the given time period?
A. Less than 200,000
B. At least 200,000, but less than 250,000
C. At least 250,000, but less than 300,000
D. At least 300,000, but less than 350,000
E. Greater than or equal to 350,000

5.79 (5A, 5/95, Q.36) (2 points) Suppose S is a compound Poisson distribution of aggregate claims with a mean number of claims = 3 for a collection of insurance policies over a single premium period.
The first and second moments of the individual claim amount distribution are 100 and 15,000 respectively.
The aggregate premium was determined by applying a relative security loading, (premiums / expected losses) - 1, of 0.1 to the expected aggregate claim amount and by ignoring expenses.
Determine the mean and variance of the loss ratio.
5.80 (5A, 5/96, Q.36) The XYZ Insurance Company insures 500 risks. For each risk, there is a 10% probability of having a claim, but no more than one claim is possible. The individual claim amount distribution is given by f(x) = 0.001exp(-x/1000), for x > 0. Assume that the risks are independent. a. (1.5 points) What is the expectation and standard deviation of S = X1 + X2 + ... + X500 where Xi is the loss on insured unit i? b. (1 point) Assuming no expenses, using the Normal Approximation, estimate the premium per risk necessary so that there is a 95% chance that the premiums are sufficient to pay the resulting claims. 5.81 (5A, 5/97, Q.39) (2 points) For a one-year term life insurance policy, suppose the insurer agrees to pay a fixed amount if the insured dies. You are given the following information regarding the binomial claim distribution for this policy: E[x] = 30 Var[x] = 29,100. Calculate the amount of the death payment and the probability that the insured will die within the next year. 5.82 (5A, 5/98, Q.37) (2 points) For a collection of homeowners policies, assume: i) S represents the aggregate claim amount for the entire collection of policies. ii) G is the aggregate premium collected. iii) G = 1.2 E(S) iv) The distribution for the number of claims is Poisson with λ = 5. v) The claim amounts are identically distributed random variables that are uniform over the interval (0,10). vi) The number of claims and the claim amounts are mutually independent. Find the variance of the loss ratio, S/G. 5.83 (5A, 11/98, Q.23) (1 point) The distribution of aggregate claims, S, is compound Poisson with λ = 3. Individual claim amounts are distributed as follows: x p(x) 1 0.40 2 0.20 3 0.40 Which of the following is the closest to the normal approximation of Pr[S > 9]? A. 8% B. 11% C. 14% D. 17% E. 20%
5.84 (5A, 5/99, Q.24) (1 point) You are given the following information concerning the claim severity, X, and the annual aggregate amount of claims, S:
E[X] = 50,000. Var[X] = 500,000,000. Var[S] = 30,000,000.
Assume that the claim sizes (X1, X2, ...) are identically distributed random variables and that the claim sizes and the number of claims are mutually independent.
Assume that the number of claims (N) follows a Poisson distribution.
What is the likelihood that there will be at least one claim next year?
A. Less than 5%
B. At least 5%, but less than 50%
C. At least 50%, but less than 95%
D. At least 95%
E. Cannot be determined from the above information.

5.85 (5A, 5/99, Q.38) (2.5 points) For a particular line of business, the aggregate claim amount S follows a compound Poisson distribution.
The aggregate number of claims N has a mean of 350. The dollar amount of each individual claim, xi, i = 1,..., N, is uniformly distributed over the interval from 0 to 1000.
Assume that N and the Xi are mutually independent random variables.
Using the Normal Approximation, calculate the probability that S > 180,000.

5.86 (5A, 11/99, Q.38) (3 points) Use the following information:
• An insurer has a portfolio of 14,000 insured properties as shown below.
  Property Value    Number of Properties
  $20,000           3000
  $35,000           4000
  $60,000           5000
  $75,000           2000
• The annual probability of a claim for each of the insured properties is 0.04.
• Each property is independent of the others.
• Assume only total losses are possible.
• In order to reduce risk, the insurer buys reinsurance with a retention of $30,000 on each property. (For example, in the case of a loss of $75,000, the insurer would pay $30,000, while the reinsurer would pay $45,000.)
• The annual reinsurance premium is set at 125% of the expected excess annual claims.
Calculate the probability that the total cost (retained claims plus reinsurance cost) of insuring the properties will exceed $28,650,000 in any year. Use the Normal Approximation.
5.87 (Course 3 Sample Exam, Q.20) You are given: • An insuredʼs claim severity distribution is described by an exponential distribution: F(x) = 1 - e-x/1000.
• The insuredʼs number of claims is described by a negative binomial distribution with: β = 2 and r = 2.
• A 500 per claim deductible is in effect. Calculate the standard deviation of the aggregate losses in excess of the deductible. A. Less than 2000 B. At least 2000 but less than 3000 C. At least 3000 but less than 4000 D. At least 4000 but less than 5000 E. At least 5000 5.88 (Course 3 Sample Exam, Q.25) For aggregate losses S = X1 + X2 + . . . + XN, you are given:
• N has a Poisson distribution with mean 500.
• X1, X2, ... have mean 100 and variance 100.
• N, X1, X2, ... are mutually independent.
You are also given:
• For a portfolio of insurance policies, the loss ratio is the ratio of the aggregate losses to aggregate premiums collected.
• The premium collected is 1.1 times the expected aggregate losses.
Using the normal approximation to the compound Poisson distribution, calculate the probability that the loss ratio exceeds 0.95.

5.89 (IOA 101, 4/00, Q.2) (2.25 points) Insurance policies providing car insurance are such that the sizes of claims are normally distributed with mean 1,870 and standard deviation 610.
In one month 50 claims are made. Assuming that claims are independent, calculate the probability that the total of the claim sizes is more than 100,000.

5.90 (3, 5/00, Q.16) (2.5 points) You are given:
                     Mean      Standard Deviation
Number of Claims     8         3
Individual Losses    10,000    3,937
Using the normal approximation, determine the probability that the aggregate loss will exceed 150% of the expected loss.
(A) Φ(1.25) (B) Φ(1.5) (C) 1 - Φ(1.25) (D) 1 - Φ(1.5) (E) 1.5 Φ(1)
5.91 (3, 5/00, Q.19) (2.5 points) An insurance company sold 300 fire insurance policies as follows:
Number of    Policy     Probability of
Policies     Maximum    Claim Per Policy
100          400        0.05
200          300        0.06
You are given:
(i) The claim amount for each policy is uniformly distributed between 0 and the policy maximum.
(ii) The probability of more than one claim per policy is 0.
(iii) Claim occurrences are independent.
Calculate the variance of the aggregate claims.
(A) 150,000 (B) 300,000 (C) 450,000 (D) 600,000 (E) 750,000

5.92 (3, 11/00, Q.8 & 2009 Sample Q.113) (2.5 points) The number of claims, N, made on an insurance portfolio follows the following distribution:
n    Pr(N=n)
0    0.7
2    0.2
3    0.1
If a claim occurs, the benefit is 0 or 10 with probability 0.8 and 0.2, respectively.
The number of claims and the benefit for each claim are independent.
Calculate the probability that aggregate benefits will exceed expected benefits by more than 2 standard deviations.
(A) 0.02 (B) 0.05 (C) 0.07 (D) 0.09 (E) 0.12

5.93 (3, 11/00, Q.32 & 2009 Sample Q.118) (2.5 points) For an individual over 65:
(i) The number of pharmacy claims is a Poisson random variable with mean 25.
(ii) The amount of each pharmacy claim is uniformly distributed between 5 and 95.
(iii) The amounts of the claims and the number of claims are mutually independent.
Determine the probability that aggregate claims for this individual will exceed 2000 using the normal approximation.
(A) 1 - Φ(1.33) (B) 1 - Φ(1.66) (C) 1 - Φ(2.33) (D) 1 - Φ(2.66) (E) 1 - Φ(3.33)
5.94 (3, 5/01, Q.29 & 2009 Sample Q.110) (2.5 points) You are the producer of a television quiz show that gives cash prizes.
The number of prizes, N, and prize amounts, X, have the following distributions:
n    Pr(N = n)        x       Pr(X = x)
1    0.8              0       0.2
2    0.2              100     0.7
                      1000    0.1
Your budget for prizes equals the expected prizes plus the standard deviation of prizes.
Calculate your budget.
(A) 306 (B) 316 (C) 416 (D) 510 (E) 518
5.95 (3, 11/01, Q.7 & 2009 Sample Q.98) (2.5 points) You own a fancy light bulb factory. Your workforce is a bit clumsy – they keep dropping boxes of light bulbs. The boxes have varying numbers of light bulbs in them, and when dropped, the entire box is destroyed. You are given: Expected number of boxes dropped per month: 50 Variance of the number of boxes dropped per month: 100 Expected value per box: 200 Variance of the value per box: 400 You pay your employees a bonus if the value of light bulbs destroyed in a month is less than 8000. Assuming independence and using the normal approximation, calculate the probability that you will pay your employees a bonus next month. (A) 0.16 (B) 0.19 (C) 0.23 (D) 0.27 (E) 0.31 5.96 (3, 11/02, Q.6 & 2009 Sample Q.91) (2.5 points) The number of auto vandalism claims reported per month at Sunny Daze Insurance Company (SDIC) has mean 110 and variance 750. Individual losses have mean 1101 and standard deviation 70. The number of claims and the amounts of individual losses are independent. Using the normal approximation, calculate the probability that SDICʼs aggregate auto vandalism losses reported for a month will be less than 100,000. (A) 0.24 (B) 0.31 (C) 0.36 (D) 0.39 (E) 0.49 5.97 (CAS3, 11/03, Q.24) (2.5 points) Zoom Buy Tire Store, a nationwide chain of retail tire stores, sells 2,000,000 tires per year of various sizes and models. Zoom Buy offers the following road hazard warranty: "If a tire sold by us is irreparably damaged in the first year after purchase, we'll replace it free, regardless of the cause." The average annual cost of honoring this warranty is $10,000,000, with a standard deviation of $40,000. Individual claim counts follow a binomial distribution, and the average cost to replace a tire is $100. All tires are equally likely to fail in the first year, and tire failures are independent. Calculate the standard deviation of the replacement cost per tire. A. Less than $60 B. At least $60, but less than $65 C. At least $65, but less than $70 D. At least $70, but less than $75 E. At least $75
5.98 (CAS3, 11/03, Q.25) (2.5 points) Daily claim counts are modeled by the negative binomial distribution with mean 8 and variance 15. Severities have mean 100 and variance 40,000. Severities are independent of each other and of the number of claims. Let σ be the standard deviation of a day's aggregate losses. On a certain day, 13 claims occurred, but you have no knowledge of their severities. Let σʼ be the standard deviation of that day's aggregate losses, given that 13 claims occurred. Calculate σ/σʼ - 1. A. Less than -7.5% B. At least -7.5%, but less than 0 C. 0 D. More than 0, but less than 7.5% E. At least 7.5% 5.99 (SOA3, 11/03, Q.4 & 2009 Sample Q.85) (2.5 points) Computer maintenance costs for a department are modeled as follows: (i) The distribution of the number of maintenance calls each machine will need in a year is Poisson with mean 3. (ii) The cost for a maintenance call has mean 80 and standard deviation 200. (iii) The number of maintenance calls and the costs of the maintenance calls are all mutually independent. The department must buy a maintenance contract to cover repairs if there is at least a 10% probability that aggregate maintenance costs in a given year will exceed 120% of the expected costs. Using the normal approximation for the distribution of the aggregate maintenance costs, calculate the minimum number of computers needed to avoid purchasing a maintenance contract. (A) 80 (B) 90 (C) 100 (D) 110 (E) 120 5.100 (SOA3, 11/03, Q.33 & 2009 Sample Q.88) (2.5 points) A towing company provides all towing services to members of the City Automobile Club. You are given: (i) Towing Distance Towing Cost Frequency 0-9.99 miles 80 50% 10-29.99 miles 100 40% 30+ miles 160 10% (ii) The automobile owner must pay 10% of the cost and the remainder is paid by the City Automobile Club. (iii) The number of towings has a Poisson distribution with mean of 1000 per year. (iv) The number of towings and the costs of individual towings are all mutually independent. Using the normal approximation for the distribution of aggregate towing costs, calculate the probability that the City Automobile Club pays more than 90,000 in any given year. (A) 3% (B) 10% (C) 50% (D) 90% (E) 97%
5.101 (CAS3, 5/04, Q.19) (2.5 points) A company has a machine that occasionally breaks down. An insurer offers a warranty for this machine. The number of breakdowns and their costs are independent. The number of breakdowns each year is given by the following distribution: # of breakdowns Probability 0 50% 1 20% 2 20% 3 10% The cost of each breakdown is given by the following distribution: Cost Probability 1,000 50% 2,000 10% 3,000 10% 5,000 30% To reduce costs, the insurer imposes a per claim deductible of 1,000. Compute the standard deviation of the insurer's losses for this year. A. 1,359 B. 2,280 C. 2,919 D. 3,092 E. 3,434 5.102 (CAS3, 5/04, Q.22) (2.5 points) An actuary determines that claim counts follow a negative binomial distribution with unknown β and r. It is also determined that individual claim amounts are independent and identically distributed with mean 700 and variance 1,300. Aggregate losses have mean 48,000 and variance 80 million. Calculate the values for β and r. A. β = 1.20, r = 57.19 B. β = 1.38, r = 49.75 C. β = 2.38, r = 28.83 D. β = 1,663.81, r = 0.04 E. β = 1,664.81, r = 0.04
5.103 (CAS3, 5/04, Q.38) (2.5 points) You are asked to price a Workers' Compensation policy for a large employer. The employer wants to buy a policy from your company with an aggregate limit of 150% of total expected loss. You know the distribution for aggregate claims is Lognormal. You are also provided with the following: Mean Standard Deviation Number of claims 50 12 Amount of individual loss 4,500 3,000 Calculate the probability that the aggregate loss will exceed the aggregate limit. A. Less than 3.5% B. At least 3.5%, but less than 4.5% C. At least 4.5%, but less than 5.5% D. At least 5.5%, but less than 6.5% E. At least 6.5% 5.104 (CAS3, 5/04, Q.39) (2.5 points) PQR Re provides reinsurance to Telecom Insurance Company. PQR agrees to pay Telecom for all losses resulting from "events", subject to a $500 per event deductible. For providing this coverage, PQR receives a premium of $250. Use a Poisson distribution with mean equal to 0.15 for the frequency of events. Event severity is from the following distribution: Loss Probability 250 0.10 500 0.25 750 0.30 1,000 0.25 1,250 0.05 1,500 0.05
• i = 0% Using the normal approximation to PQR's annual aggregate losses on this contract, what is the probability that PQR will payout more than it receives? A. Less than 12% B. At least 12%, but less than 13% C. At least 13%, but less than 14% D. At least 14%, but less than 15% E. 15% or more
5.105 (CAS3, 11/04, Q.31) (2.5 points) The mean annual number of claims is 103 for a group of 10,000 insureds. The individual losses have an observed mean and standard deviation of 6,382 and 1,781, respectively. The standard deviation of the aggregate claims is 22,874. Calculate the standard deviation for the annual number of claims. A. 1.47 B. 2.17 C.4.72 D. 21.73 E. 47.23 5.106 (CAS3, 11/04, Q.32) (2.5 points) An insurance policy provides full coverage for the aggregate losses of the Widget Factory. The number of claims for the Widget Factory follows a negative binomial distribution with mean 25 and coefficient of variation 1.2. The severity distribution is given by a lognormal distribution with mean 10,000 and coefficient of variation 3. To control losses, the insurer proposes that the Widget Factory pay 20% of the cost of each loss. Calculate the reduction in the 95th percentile of the normal approximation of the insurer's loss. A. Less than 5% B. At least 5%, but less than 15% C. At least 15%, but less than 25% D. At least 25%, but less than 35% E. At least 35% 5.107 (SOA3, 11/04, Q.15 & 2009 Sample Q.125) (2.5 points) Two types of insurance claims are made to an insurance company. For each type, the number of claims follows a Poisson distribution and the amount of each claim is uniformly distributed as follows: Type of Claim
       Poisson Parameter λ for Number of Claims    Range of Each Claim Amount
I      12                                          (0, 1)
II     4                                           (0, 5)
The numbers of claims of the two types are independent and the claim amounts and claim numbers are independent.
Calculate the normal approximation to the probability that the total of claim amounts exceeds 18.
(A) 0.37 (B) 0.39 (C) 0.41 (D) 0.43 (E) 0.45
5.108 (CAS3, 5/05, Q.8) (2.5 points) An insurance company increases the per claim deductible of all automobile policies from $300 to $500. The mean payment and standard deviation of claim severity are shown below. Deductible Mean Payment Standard Deviation $300 1,000 256 $500 1,500 678 The claims frequency is Poisson distributed both before and after the change of deductible. The probability of no claim increases by 30%, and the probability of having exactly one claim decreases by 10%. Calculate the percentage increase in the variance of the aggregate claims. A. Less than 30% B. At least 30%, but less than 50% C. At least 50%, but less than 70% D. At least 70%, but less than 90% E. 90% or more 5.109 (CAS3, 5/05, Q.9) (2.5 points) Annual losses for the New Widget Factory can be modeled using a Poisson frequency model with mean of 100 and an exponential severity model with mean of $10,000. An insurance company agrees to provide coverage for that portion of any individual loss that exceeds $25,000. Calculate the standard deviation of the insurer's annual aggregate claim payments. A. Less than $36,000 B. At least $36,000, but less than $37,000 C. At least $37,000, but less than $38,000 D. At least $38,000, but less than $39,000 E. $39,000 or more 5.110 (CAS3, 5/05, Q.40) (2.5 points) An insurance company has two independent portfolios. In Portfolio A, claims occur with a Poisson frequency of 2 per week and severities are distributed as a Pareto with mean 1,000 and standard deviation 2,000. In Portfolio B, claims occur with a Poisson frequency of 1 per week and severities are distributed as a log-normal with mean 2,000 and standard deviation 4,000. Determine the standard deviation of the combined losses for the next week. A. Less than 5,500 B. At least 5,500, but less than 5,600 C. At least 5,600, but less than 5,700 D. At least 5,700, but less than 5,800 E. 5,800 or more
5.111 (SOA M, 5/05, Q.17 & 2009 Sample Q.164) (2.5 points) For a collective risk model the number of losses has a Poisson distribution with λ = 20. The common distribution of the individual losses has the following characteristics: (i) E[X] = 70 (ii) E[X ∧ 30] = 25 (iii) Pr(X > 30) = 0.75 (iv) E[X2 | X > 30] = 9000 An insurance covers aggregate losses subject to an ordinary deductible of 30 per loss. Calculate the variance of the aggregate payments of the insurance. (A) 54,000 (B) 67,500 (C) 81,000 (D) 94,500 (E) 108,000 Note: This past exam question has been rewritten. 5.112 (SOA M, 5/05, Q.31 & 2009 Sample Q.167) (2.5 points) The repair costs for boats in a marina have the following characteristics: Boat Number of Probability that Mean of repair Variance of repair type boats repair is needed cost given a repair cost given a repair Power boats 100 0.3 300 10,000 Sailboats 300 0.1 1000 400,000 Luxury yachts 50 0.6 5000 2,000,000 At most one repair is required per boat each year. The marina budgets an amount, Y, equal to the aggregate mean repair costs plus the standard deviation of the aggregate repair costs. Calculate Y. (A) 200,000 (B) 210,000 (C) 220,000 (D) 230,000 (E) 240,000 5.113 (SOA M, 5/05, Q.40 & 2009 Sample Q.171) (2.5 points) For aggregate losses, S: (i) The number of losses has a negative binomial distribution with mean 3 and variance 3.6. (ii) The common distribution of the independent individual loss amounts is uniform from 0 to 20. Calculate the 95th percentile of the distribution of S as approximated by the normal distribution. (A) 61 (B) 63 (C) 65 (D) 67 (E) 69
5.114 (CAS3, 11/05, Q.30) (2.5 points) On January 1, 2005, Dreamland Insurance sold 10,000 insurance policies that pay $100 for each day 2005 that a policyholder is in the hospital. The following assumptions were used in pricing the policies:
• The probability that a given policyholder will be hospitalized during the year is 0.05. No policyholder will be hospitalized more than one time during the year.
• If a policyholder is hospitalized, the number of days spent in the hospital follows a lognormal distribution with µ = 1.039 and σ = 0.833. Using the normal approximation, calculate the premium per policy such that there is a 90% probability that total premiums will exceed total losses. A. Less than 21.20 B. At least 21.20, but less than 21.50 C. At least 21.50, but less than 21.80 D. At least 21.80, but less than 22.10 E. At least 22.10 5.115 (CAS3, 11/05, Q.34) (2.5 points) Claim frequency follows a Poisson process with rate of 10 per year. Claim severity is exponentially distributed with mean 2,000. The method of moments is used to estimate the parameters of a lognormal distribution for the aggregate losses. Using the lognormal approximation, calculate the probability that annual aggregate losses exceed 105% of expected annual losses. A. Less than 34.5% B. At least 34.5%, but less than 35.5% C. At least 35.5%, but less than 36.5% D. At least 36.5%, but less than 37.5% E. At least 37.5%
5.116 (SOA M, 11/05, Q.34 & 2009 Sample Q.210) (2.5 points) Each life within a group medical expense policy has loss amounts which follow a compound Poisson process with λ = 0.16. Given a loss, the probability that it is for Disease 1 is 1/16. Loss amount distributions have the following parameters: Mean per loss Standard Deviation per loss Disease 1 5 50 Other diseases 10 20 Premiums for a group of 100 independent lives are set at a level such that the probability (using the normal approximation to the distribution for aggregate losses) that aggregate losses for the group will exceed aggregate premiums for the group is 0.24. A vaccine which will eliminate Disease 1 and costs 0.15 per person has been discovered. Define: A = the aggregate premium assuming that no one obtains the vaccine, and B = the aggregate premium assuming that everyone obtains the vaccine and the cost of the vaccine is a covered loss. Calculate A/B. (A) 0.94 (B) 0.97 (C) 1.00 (D) 1.03 (E) 1.06 5.117 (SOA M, 11/05, Q.38 & 2009 Sample Q.212) (2.5 points) For an insurance: (i) The number of losses per year has a Poisson distribution with λ = 10. (ii) Loss amounts are uniformly distributed on (0, 10). (iii) Loss amounts and the number of losses are mutually independent. (iv) There is an ordinary deductible of 4 per loss. Calculate the variance of aggregate payments in a year. (A) 36 (B) 48 (C) 72 (D) 96 (E) 120 5.118 (SOA M, 11/05, Q.40) (2.5 points) Lucky Tom deposits the coins he finds on the way to work according to a Poisson process with a mean of 22 deposits per month. 5% of the time, Tom deposits coins worth a total of 10. 15% of the time, Tom deposits coins worth a total of 5. 80% of the time, Tom deposits coins worth a total of 1. The amounts deposited are independent, and are independent of the number of deposits. Calculate the variance in the total of the monthly deposits. (A) 180 (B) 210 (C) 240 (D) 270 (E) 300
5.119 (CAS3, 11/06, Q.29) (2.5 points) Frequency of losses follows a binomial distribution with parameters m = 1,000 and q = 0.3. Severity follows a Pareto distribution with parameters α = 3 and θ = 500. Calculate the standard deviation of the aggregate losses. A. Less than 7,000 B. At least 7,000, but less than 7,500 C. At least 7,500, but less than 8,000 D. At least 8,000, but less than 8,500 E. At least 8,500 5.120 (SOA M, 11/06, Q.21 & 2009 Sample Q.282) (2.5 points) Aggregate losses are modeled as follows: (i) The number of losses has a Poisson distribution with λ = 3. (ii) The amount of each loss has a Burr (Burr Type XII, Singh-Maddala) distribution with α = 3, θ = 2, and γ = 1. (iii) The number of losses and the amounts of the losses are mutually independent. Calculate the variance of aggregate losses. (A) 12 (B) 14 (C) 16 (D) 18 (E) 20 5.121 (SOA M, 11/06, Q.32 & 2009 Sample Q.287) (2.5 points) For an aggregate loss distribution S: (i) The number of claims has a negative binomial distribution with r = 16 and β = 6. (ii) The claim amounts are uniformly distributed on the interval (0, 8). (iii) The number of claims and claim amounts are mutually independent. Using the normal approximation for aggregate losses, calculate the premium such that the probability that aggregate losses will exceed the premium is 5%. 5.122 (4, 5/07, Q.17) (2.5 points) You are given: (i) Aggregate losses follow a compound model. (ii) The claim count random variable has mean 100 and standard deviation 25. (iii) The single-loss random variable has mean 20,000 and standard deviation 5000. Determine the normal approximation to the probability that aggregate claims exceed 150% of expected costs. (A) 0.023 (B) 0.056 (C) 0.079 (D) 0.092 (E) 0.159
Solutions to Problems:

5.1. B. σA² = µF σS² + µS² σF² = (13)(200,000) + (300²)(37) = 5,930,000.

5.2. C. Frequency is Bernoulli with q = 2/3, with mean = 2/3 and variance = (2/3)(1/3) = 2/9.
Mean severity = 7.1, variance of severity = 72.1 - 7.1² = 21.69.
Thus σA² = µF σS² + µS² σF² = (2/3)(21.69) + (7.1²)(2/9) = 25.66.
For the severity the mean and the variance are computed as follows:
Probability    Size of Claim    Square of Size of Claim
20%            2                4
50%            5                25
30%            14               196
Mean           7.1              72.1
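The moment formula in solutions 5.1 and 5.2 is easy to check numerically. Below is a minimal Python sketch (my own, not part of the Study Guide) that packages the formula for independent frequency and severity; the inputs shown are the moments quoted in solution 5.1.

    def aggregate_moments(mean_freq, var_freq, mean_sev, var_sev):
        """Mean and variance of aggregate losses when frequency and
        severity are independent."""
        mean_agg = mean_freq * mean_sev
        var_agg = mean_freq * var_sev + mean_sev**2 * var_freq
        return mean_agg, var_agg

    # Solution 5.1: mean freq 13, var freq 37, mean sev 300, var sev 200,000.
    print(aggregate_moments(13, 37, 300, 200000))   # variance 5,930,000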
5.3. A. Since the frequency and severity are independent, the variance of the aggregate losses =
(mean frequency)(variance of severity) + (mean severity)²(variance of frequency) =
0.25 {(variance of severity) + (mean severity)²} = 0.25 (2nd moment of the severity) =
(0.25 / 5000) ∫_0^5000 x² dx = (0.25 / 5000)(5000)³ / 3 = 2,083,333.
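As a quick sanity check of solution 5.3 (a sketch of mine, not from the Guide), the solution factors out 0.25 because the mean and variance of the frequency are both 0.25, as for a Poisson; the severity is uniform on (0, 5000), so its second moment is 5000²/3.

    mean_freq = 0.25                       # also the variance of the frequency
    second_moment_sev = 5000**2 / 3        # uniform(0, 5000)
    print(mean_freq * second_moment_sev)   # 2,083,333.3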
5.4. E. The average aggregate loss is 106. The second moment of the aggregate losses is 16,940.
Therefore, the variance = 16,940 - 106² = 5704.
Situation                        Probability    Aggregate Loss    Square of Aggregate Loss
1 claim @ 50                     60.0%          50                2500
1 claim @ 200                    20.0%          200               40,000
2 claims @ 50 each               7.2%           100               10,000
2 claims: 1 @ 50 & 1 @ 150       9.6%           200               40,000
2 claims @ 150 each              3.2%           300               90,000
Overall                          100.0%         106               16,940
For example, the chance of 2 claims with one of size 50 and one of size 150 is the chance of having two claims times the chance given two claims that one will be 50 and the other 150: (0.2){(2)(0.6)(0.4)} = 9.6%. In that case the aggregate loss is 50 + 150 = 200.
One takes the weighted average over all the possibilities.
Comment: Note that the frequency and severity are not independent.
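Because frequency and severity are not independent here, the moments have to come from enumerating outcomes rather than from the usual formula. The short Python sketch below (my own check, not Mahler's) reproduces the table in solution 5.4.

    outcomes = [          # (probability, aggregate loss)
        (0.600,  50),     # 1 claim of 50
        (0.200, 200),     # 1 claim of 200
        (0.072, 100),     # 2 claims of 50 each
        (0.096, 200),     # 2 claims: one 50 and one 150
        (0.032, 300),     # 2 claims of 150 each
    ]
    mean = sum(p * x for p, x in outcomes)
    second = sum(p * x * x for p, x in outcomes)
    print(mean, second, second - mean**2)   # 106.0, 16940.0, 5704.0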
5.5. A. Severity and frequency are independent. Frequency is Geometric with β = (1 - 0.7)/0.7 = 0.4286.
The mean frequency is β = 0.4286, variance is β(1+β) = 0.6122.
The mean severity is 100. The variance of the severity = (2/3)(50 - 100)² + (1/3)(200 - 100)² = 5000.
σA² = µF σS² + µS² σF² = (0.4286)(5000) + (100²)(0.6122) = 8265.
Comment: The chance of 0 claims is 0.7 = 1/(1+β), the chance of 1 claim is (0.3)(0.7), the chance of 2 claims is (0.3²)(0.7), etc.

5.6. A. Use for each type the formula: variance of aggregate = µf σs² + µs² σf².
For example for Type I: µf = 1/3, σf² = (1/3)(2/3) (Bernoulli claim frequency), µs = 130, σs² = (70%)(30²) + (30%)(70²) = 2100;
Type I variance of aggregate = (1/3)(2100) + (2/9)(130²) = 4456.
Type of   A Priori Chance   Mean     Variance    Mean        Variance of   Process Variance
Risk      of Risk           Freq.    of Freq.    Severity    Severity      of Agg.
I         0.333             0.333    0.222       130         2100          4456
II        0.333             0.500    0.250       150         2500          6875
III       0.333             0.667    0.222       170         2100          7822
Variance for the portfolio is: (100)(4456) + (100)(6875) + (100)(7822) = 1,915,300.

5.7. D. Since the frequency and severity are independent, the variance of the aggregate losses =
(mean frequency)(variance of severity) + (mean severity)²(variance of frequency)
= 5 {(variance of severity) + (mean severity)²} = 5 (2nd moment of the severity).
Second moment of the severity is: ∫_1^∞ x² (3.5 x^-4.5) dx = 2.333.
Therefore variance of aggregate losses = (5)(2.333) = 11.67.
Comment: The Severity is a Single Parameter Pareto Distribution.
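The integral in solution 5.7 can be verified numerically. A small sketch (mine, not from the Guide), assuming frequency is Poisson with mean 5 and the Single Parameter Pareto density 3.5 x^-4.5 on (1, ∞):

    from scipy.integrate import quad

    second_moment, _ = quad(lambda x: x**2 * 3.5 * x**-4.5, 1, float("inf"))
    print(second_moment)        # about 2.333 = 3.5/1.5
    print(5 * second_moment)    # about 11.67, the variance of aggregate losses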
5.8. B. The mean severity is: ∫_1^∞ x (3.5 x^-4.5) dx = 1.4.
Thus the mean aggregate loss is (5)(1.4) = 7.
From the solution to the prior question, the variance of the aggregate losses is: 11.667.
Thus the standard deviation of the aggregate losses is √11.667 = 3.416.
To apply the Normal Approximation we subtract the mean and divide by the standard deviation.
The probability that the total losses will exceed 11 is approximately:
1 - Φ[(11 - 7)/3.416] = 1 - Φ(1.17) = 12.1%.

5.9. A. Mean = exp(µ + σ²/2) = 7. 2nd moment = exp(2µ + 2σ²) = 11.67 + 7² = 60.67.
Dividing the 2nd equation by the square of the 1st: exp(2µ + 2σ²)/exp(2µ + σ²) = 60.67/7² ⇒ exp(σ²) = 1.238 ⇒ σ = √ln(1.238) = 0.4621.
µ = ln(7) - σ²/2 = 1.839. 1 - Φ((ln(11) - 1.839)/0.4621) = 1 - Φ(1.21) = 11.31%.
Comment: Below shown as dots is the aggregate distribution approximated via simulation of 10,000 years, the Normal Approximation shown as the dotted line, and the LogNormal Approximation shown as the solid line:
[Graph comparing the simulated aggregate distribution, the Normal Approximation, and the LogNormal Approximation.]
Here is a similar graph of the righthand tail:
[Graph of the righthand tails of the three distributions.]
As shown above, the Normal Distribution (dashed) has a lighter righthand tail than the LogNormal Distribution (solid), with the aggregate distribution (dots) somewhere in between.
For example, S(20) = 0.6% for the LogNormal, while S(20) = 0.007% for the Normal. For the simulation, S(20) = 0.205%, less than the LogNormal, but more than the Normal.

5.10. D. The severity is a Single Parameter Pareto with α = 3.5 and θ = 1.
It has second moment of: (3.5)(1²)/(3.5 - 2) = 2.333, and third moment of: (3.5)(1³)/(3.5 - 3) = 7.
The Third Central Moment of a Compound Poisson Distribution is: (mean frequency)(third moment of the severity) = (5)(7) = 35.
Variance of the aggregate losses is: λ(2nd moment of severity) = (5)(2.333) = 11.67.
Therefore, skewness = 35/11.67^1.5 = 0.878.
Alternately, skewness of a compound Poisson = (third moment of the severity)/{√λ (2nd moment of severity)^1.5} = 7/{√5 (2.333^1.5)} = 0.878.

5.11. B. Since the frequency and severity are independent, and frequency is Poisson with mean 5, the process variance of the aggregate losses = (5)(2nd moment of the severity).
The severity distribution is Single Parameter Pareto. F(x) = 1 - x^-3.5, prior to the effects of the maximum covered loss.
The 2nd moment of the severity after the maximum covered loss is:
∫_1^5 x² (3.5 x^-4.5) dx + (5²)S(5) = -(3.5/1.5) x^-1.5 ]_{x=1}^{x=5} + (25)(5^-3.5) = 2.125 + 0.089 = 2.214.
Therefore, the variance of the aggregate losses = (5)(2.214) = 11.07.
Comment: For a Single Parameter Pareto Distribution, E[(X ∧ x)²] = αθ²/(α - 2) - 2θ²/{(α - 2)x^(α-2)} =
(3.5)(1²)/(3.5 - 2) - (2)(1²)/{(3.5 - 2)5^(3.5-2)} = 2.333 - 0.119 = 2.214.
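Solution 5.11 censors the severity at the maximum covered loss before taking the second moment. A short numerical sketch (mine, not Mahler's), with α = 3.5, θ = 1 and censoring point u = 5:

    from scipy.integrate import quad

    alpha, u = 3.5, 5.0
    integral, _ = quad(lambda x: x**2 * alpha * x**(-alpha - 1), 1, u)
    censored_second = integral + u**2 * u**-alpha   # add u^2 * S(u); S(u) = u^-alpha
    print(censored_second)                          # about 2.214
    print(5 * censored_second)                      # about 11.07, variance of aggregate losses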
5.12. D. σA² = µF σS² + µS² σF² = µ(θ²) + (θ)²µ = 2µθ².

5.13. C. The third moment of the Exponential severity is 6θ³.
The Third Central Moment of a Compound Poisson Distribution is: (mean frequency)(third moment of the severity) = 6µθ³.
From the solution to the previous question, the variance of the aggregate losses is 2µθ².
Therefore, the skewness of the aggregate losses is: 6µθ³/{2µθ²}^1.5 = 3/√(2µ).

5.14. E. For a Poisson frequency with severity and frequency independent, the process variance of the aggregate losses = µF(2nd moment of the severity) = (3)(200) = 600.

5.15. E. The mean aggregate loss for N automobiles is: N(0.03)(3000/2) = 45N.
Second moment of the uniform distribution from 0 to 3000 is: 3000²/12 + 1500² = 2,250,000.
Variance of aggregate loss for N automobiles is: N(0.03)(2,250,000) = 90,000N.
Prob[aggregate ≤ 160% of expected] = Φ[(0.6)(45N)/√(90,000N)] = Φ[0.09√N].
We want this probability to be at least 95% ⇒ 0.09√N ≥ 1.645 ⇒ N ≥ 334.1.
Comment: Similar to SOA3, 11/03, Q.4.

5.16. C. This is a compound Poisson. In units of bases, the mean severity is:
(1)(0.22) + (2)(0.04) + (3)(0.01) + (4)(0.05) = 0.530. (This is Donʼs expected slugging percentage.)
The second moment of the severity is: (1)(0.22) + (4)(0.04) + (9)(0.01) + (16)(0.05) = 1.27.
Thus the variance of the aggregate losses is (600)(1.27) = 762.
The mean of the aggregate losses is: (600)(0.530) = 318 bases.
The chance that Don will have at most $700,000 in incentives, is the chance that he has no more than 350 total bases:
Φ((350.5 - 318)/√762) = Φ(1.177) = 88.0%.

5.17. E. For the Binomial frequency: mean = mq = 3.2, variance = mq(1-q) = 1.92.
For the Pareto severity: mean = θ/(α-1) = 333.333, second moment = 2θ²/{(α-1)(α-2)} = 333,333, variance = 333,333 - 333.333² = 222,222.
Since the frequency and severity are independent:
σA² = µF σS² + µS² σF² = (3.2)(222,222) + (333.333²)(1.92) = 924,443.
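Solution 5.15 inverts the Normal Approximation to find the smallest acceptable number of policies. A minimal sketch of that search (my own; the 1.645 is the 95th percentile of the standard Normal used in the solution):

    from math import ceil

    z = 1.645                       # 95th percentile of the standard Normal
    n = ceil((z / 0.09)**2)         # smallest N with 0.09 * sqrt(N) >= 1.645
    print(n)                        # 335, since the solution gives N >= 334.1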
5.18. D. The 2nd moment of a LogNormal Distribution is: exp(2µ + 2σ²) = exp[2(7) + 2(0.5²)] = exp(14.5) = 1,982,759.
Since the frequency is Poisson and the frequency and severity are independent:
σA² = (mean frequency)(2nd moment of the severity) = (3)(1,982,759) = 5,948,278.

5.19. B. For the Negative Binomial frequency: mean = rβ = 6, variance = rβ(1+β) = 18.
For the Gamma severity: mean = αθ = 600, variance = αθ² = 120,000.
Since the frequency and severity are independent:
σA² = µF σS² + µS² σF² = (6)(120,000) + (600²)(18) = 7,200,000.

5.20. C. The variances of independent risks add.
(55)(924,443) + (35)(5,948,278) + (10)(7,200,000) = 331 million.

5.21. D. For the Negative Binomial, mean = rβ = 6, variance = rβ(1+β) = 18, skewness = (1 + 2β)/√{(1+β)(rβ)} = 5/√18 = 1.1785.
For the Gamma severity: mean = αθ = 600, variance = αθ² = 120,000, skewness = 2/√α = 1.1547.
From a previous solution, σA² = 7,200,000. Since the frequency and severity are independent:
γA = {µF σX³ γX + 3 σF² µX σX² + σF³ γF µX³}/σA³ =
{(6)(120,000^1.5)(1.1547) + (3)(18)(600)(120,000) + (18^1.5)(1.1785)(600³)}/7,200,000^1.5 = 1.222.
Comment: Well beyond what you should be asked on your exam! The skewness is a dimensionless quantity, which does not depend on the scale.
Therefore, we would have gotten the same answer for the skewness if we had set the scale parameter of the Gamma, θ = 1, including in the calculation of σA².

5.22. B. S(1000) = e^(-1000/5000) = 0.8187. The frequency of non-zero payments is Poisson with mean: (2.4)(0.8187) = 1.965.
The severity distribution truncated and shifted at 1000 is also an exponential with mean 5000.
The mean aggregate losses excess of the deductible is (1.965)(5000) = 9825.
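The deductible argument in solutions 5.22 and 5.23 (thin the Poisson frequency by S(d), and use the memoryless property of the Exponential) is easy to reproduce. A sketch of mine, not from the Guide, with λ = 2.4, θ = 5000 and d = 1000 as in those solutions:

    from math import exp, sqrt

    lam, theta, d = 2.4, 5000.0, 1000.0
    lam_excess = lam * exp(-d / theta)       # 2.4 * 0.8187 = 1.965 non-zero payments
    mean_excess = lam_excess * theta         # 9825   (solution 5.22)
    var_excess = lam_excess * 2 * theta**2   # Poisson: lambda' * E[payment^2]
    print(mean_excess, sqrt(var_excess))     # 9825 and about 9912  (solution 5.23)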
5.23. E. The frequency of non-zero payments is Poisson with λ = 1.965. The severity of non-zero payments is an Exponential distribution with θ = 5000, with second moment 2(5000²).
Thus the variance of the aggregate losses excess of the deductible is: (1.965)(2)(5000²).
The standard deviation is: 5000√3.93 = 9912.

5.24. D. For the Exponential: E[X ∧ x] = θ(1 - e^(-x/θ)).
E[X ∧ 10,000] = (5000)(1 - e^(-10000/5000)) = 4323.
Thus the mean aggregate losses are: (2.4)(4323) = 10,376.

5.25. D. For the Exponential: E[(X ∧ x)²] = 2θ²Γ[3; x/θ] + x²e^(-x/θ).
E[(X ∧ 10,000)²] = 2(5000²)Γ[3; 2] + (100 million)e^(-2) = (50 million)(0.3233) + 13.52 million = 29.7 million.
Thus the variance of the aggregate losses is: (2.4)(29.7 million) = 71.3 million, for a standard deviation of 8443.
Comment: Using Theorem A.1 in Loss Models: Γ[3; 2] = 1 - e^(-2){1 + 2 + 2²/2} = 0.3233.

5.26. C. S(1000) = e^(-1000/5000) = 0.8187. The frequency of non-zero payments is Poisson with mean (2.4)(0.8187) = 1.965.
The severity distribution truncated and shifted at 1000 is also an Exponential with mean 5000.
The maximum covered loss reduces the maximum payment to: 10,000 - 1,000 = 9,000.
For the Exponential: E[X ∧ x] = θ(1 - e^(-x/θ)). Thus, the average non-zero payment is: E[X ∧ 9,000] = (5000)(1 - e^(-9000/5000)) = 4174.
Alternately, the average non-zero payment is: (E[X ∧ 10,000] - E[X ∧ 1,000])/S(1000) = (4323 - 906)/0.8187 = 4174.
Thus the mean aggregate losses are: (1.965)(4174) = 8202.

5.27. C. The frequency of non-zero payments is Poisson with λ = 1.965. The severity of non-zero payments is an Exponential distribution with θ = 5000, censored at 10,000 - 1000 = 9000, with second moment
E[(X ∧ 9,000)²] = 2(5000²)Γ[3; 9000/5000] + (9000²)e^(-9000/5000) = (50 million)(0.2694) + 13.39 million = 26.86 million.
Thus the variance of the aggregate losses is: (1.965)(26.86 million) = 52.78 million, for a standard deviation of 7265.
Comment: Using Theorem A.1 in Loss Models: Γ[3; 1.8] = 1 - e^(-1.8){1 + 1.8 + 1.8²/2} = 0.2694.
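The deductible-plus-limit case in solutions 5.26 and 5.27 combines the thinned Poisson frequency with the limited second moment of the Exponential. A short check of mine (not from the Guide), using the same Theorem A.1 identity Γ[3; y] = 1 - e^(-y)(1 + y + y²/2):

    from math import exp, sqrt

    theta, u, lam = 5000.0, 9000.0, 1.965      # payment limit u = 10,000 - 1,000
    y = u / theta
    gamma3 = 1 - exp(-y) * (1 + y + y**2 / 2)  # 0.2694
    second_moment = 2 * theta**2 * gamma3 + u**2 * exp(-y)
    print(second_moment)                       # about 26.86 million
    print(sqrt(lam * second_moment))           # about 7265, as in solution 5.27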
5.28. C. For each interval [a,b], the first moment is (a+b)/2. Lower Endpoint 0 1 3 5
Upper Endpoint 1 3 5 10
Number of Losses 60 30 20 10
First Moment 0.50 2.00 4.00 7.50
Contribution to the Mean 30.00 60.00 80.00 75.00 245
Mean = ((60)(.5) + (30)(2) + (20)(4) + (10)(7.5) + 12 + 15 + 17 + 20 + 30)/125 = 2.71. 5.29. D. For each interval [a,b], the second moment is: (b3 - a3 )/(3(b-a)). Lower Endpoint 0 1 3 5
Upper Endpoint 1 3 5 10
Number of Losses 60 30 20 10
Second Moment 0.33 4.33 16.33 58.33
Contribution to 2nd Moment 20.00 130.00 326.67 583.33 1,060
We add to these contributions, those of each of the large losses; 2nd moment = {(60)(.33) + (30)(4.33) + (20)(16.33) + (10)(58.33) + 122 + 152 + 172 + 202 + 302 }/125 = 24.14. Variance = 24.14 - 2.712 = 16.8. 5.30. B. In the interval 5 to 10, 3/5 of the losses are assumed to be of size greater than 7. There are (3/5)(10) = 6 such losses of average size (7 + 10)/2 = 8.5. Thus they contribute (6)(8.5 - 7) = 9 to the layer excess of 7. The 5 large losses contribute: 5 + 8 + 10 + 13 + 23 = 59. e(7) = (9 + 59)/(6 + 5) = 6.2. 5.31. A. Mean aggregate loss is: (40)(2.71) = 108.4. Variance of aggregate loss is: (40)(2nd moment of severity) = (40)(24.14) = 965.5. CV of aggregate loss is:
√965.5 / 108.4 = 0.29.
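The grouped-data moments in solutions 5.28, 5.29 and 5.31 can be reproduced with a few lines of Python (my own sketch, not Mahler's). It assumes, as the solutions do, that losses within an interval [a, b] are uniform and that the frequency of the 40 losses is Poisson, so the aggregate variance is 40 times the second moment of severity.

    from math import sqrt

    intervals = [(0, 1, 60), (1, 3, 30), (3, 5, 20), (5, 10, 10)]  # (a, b, count)
    large = [12, 15, 17, 20, 30]                                   # individual large losses
    n = sum(c for _, _, c in intervals) + len(large)               # 125 losses

    mean = (sum(c * (a + b) / 2 for a, b, c in intervals) + sum(large)) / n
    second = (sum(c * (b**3 - a**3) / (3 * (b - a)) for a, b, c in intervals)
              + sum(x * x for x in large)) / n
    print(mean, second)                      # about 2.71 and 24.14
    print(sqrt(40 * second) / (40 * mean))   # CV of the aggregate, about 0.29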
5.32. E. In the interval 5 to 10, there are 10 loses of average size 7.5. Thus they contribute (10)(7.5 - 5) = 25 to the layer from 5 to 15. The 5 individual large losses contribute: (12 - 5) + (15 - 5) + 10 + 10 + 10 = 47. The payment per loss is: (25 + 47)/125 = 0.576. For 40 losses the reinsurer expects to pay: (40)(0.576) = 23.0.
5.33. A. The contributions to this layer from the losses in interval [5, 10], are uniform on [0, 5]; the second moment is: (53 - 03 )/(3(5-0)) = 8.333. The second moment of the ceded losses is: ((10)(8.333) + (12 - 5)2 + (15 - 5)2 + 102 + 102 + 102 )/125 = 4.259. Variance of aggregate ceded losses = (4.259)(40) = 170.4. CV of aggregate ceded losses =
√170.4 / 23.0 = 0.57.
5.34. E. S(50000) = {(20000/(20000 + 50000)}3.2 = (2/7)3.2 = .01815. Therefore, the frequency of non-zero payments is: Negative Binomial with r = 4.1 and β = (2.8)(0.01815) = 0.05082. The mean frequency is (4.1)(.05082) = 0.2084. The variance of the frequency is: (4.1)(.05082)(1.0508) = 0.2190. Truncating and shifting from below produces another Pareto; the severity of non-zero payments is also Pareto with α = 3.2 and θ = 20000 + 50000 = 70000. This Pareto has mean 70000/2.2 = 31,818 and variance (3.2)(700002 )/(1.2 (2.22 )) = 2700 million. Thus the variance of the aggregate losses excess of the deductible is: (0.2084)(2700 million) + (0.2190)(318182 ) = 784.3 million. The standard deviation is: 28.0 thousand. The mean the aggregate losses excess of the deductible is: (.2084)(31818) = 6631. Thus the chance that the aggregate losses excess of the deductible are greater than 15,000 is approximately: 1 - Φ[(15,000 - 6631)/ 28,000] = 1 - Φ[0.30] = 38.2%. 5.35. B. We are mixing Poisson frequencies via a Gamma, therefore frequency for the portfolio is a Negative Binomial with r = α = 5 and β = θ = 0.4, per policy, with mean: (5)(.4) = 2, and variance: (5)(.4)(1.4) = 2.8. The mean loss per policy is: (2)(20) = 40. The variance of the loss per policy is: (2)(300) + (202 )(2.8) = 1720. For 200 independent policies, Mean Aggregate Loss = (200)(40) = 8000.
⇒ 110% of mean aggregate loss is 8800. Variance of Aggregate Loss = (200)(1720) = 344,000. Prob(Aggregate Loss > 1.1 mean) = Prob(Aggregate Loss > 8800) ≅ 1 - Φ[(8800 - 8000)/√344,000] = 1 - Φ(1.36) = 8.7%.
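A short Python sketch of the Normal Approximation used in 5.35, assuming the Negative Binomial frequency (r = 5, β = 0.4) and the severity moments given in that solution. This is only a numerical check, not something you would do on the exam.

```python
# Sketch: normal approximation to aggregate losses for 200 independent policies,
# each with Negative Binomial frequency (r = 5, beta = 0.4) and severity mean 20, variance 300.
from math import erf, sqrt

def phi(x):  # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

r, beta = 5, 0.4
freq_mean, freq_var = r * beta, r * beta * (1 + beta)       # 2 and 2.8
sev_mean, sev_var = 20, 300

policy_mean = freq_mean * sev_mean                           # 40
policy_var = freq_mean * sev_var + sev_mean**2 * freq_var    # 1720

n = 200
agg_mean, agg_var = n * policy_mean, n * policy_var          # 8000 and 344,000
prob = 1 - phi((1.1 * agg_mean - agg_mean) / sqrt(agg_var))
print(prob)   # about 8.6%; the solution rounds the z-value to 1.36 and gets 8.7%
```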
5.36. C. For N independent policies, Mean Aggregate Loss = 40N, and Variance of Aggregate Loss = 1720N. Prob(Aggregate Loss > 1.1 mean) ≅ 1 - Φ[(0.1)(40N)/√(1720N)] = 1 - Φ(0.09645√N). We want this probability to be at most 1% ⇒ 0.09645√N ≥ 2.326 ⇒ N ≥ 582. Comment: Similar to SOA3, 11/03 Q.4. 5.37. E. µf = 10. σf² = 20. µs = 1000. σs² = 200,000. Variance of the aggregate: µfσs² + µs²σf² = (10)(200,000) + (1000²)(20) = 22,000,000.
⇒ σ = 4690. Now if we know that there have been 8 claims, then the aggregate is the sum of 8 independent, identically distributed severities. ⇒ Var[Aggregate] = 8 Var[Severity] = (8)(200,000) = 1,600,000. ⇒ σʼ =
√1,600,000 = 1265. σ/σʼ = 4690/1265 = 3.7.
Comment: Similar to CAS3, 11/03, Q.25. 5.38. D. Mean severity is: 5000 + (2000)(0.75) = 6500. Let X = room charges, Y = other charges, Z = payment. Z = X + 0.75Y. Var[Z] = Var[X + .75Y] = Var[X] + .752 Var[Y] + (2)(.75)Cov[X, Y] = 80002 + (.752 )(30002 ) + (2)(.75){(.6)(8000)(3000)} = 90.66 million. Variance of Aggregate = (Mean Freq.)(Variance of Sev.) + (Mean Severity)2 (Var. of Freq.) = (.4)(90.66 million) + (65002 )(.36) = 51.47 million. Standard Deviation of Aggregate =
√(51.47 million) = 7174.
5.39. C. Mean loss is: (4)(1000) = 4000. Variance of loss is: (4)(1000²) = 4 million. Mean loss adjustment expense is: (3)(200) = 600. Variance of loss adjustment expense is: (3)(200²) = 0.12 million. Var[Loss + LAE] = Var[Loss] + Var[LAE] + 2Cov[Loss, LAE] = 4 million + 0.12 million + (2)(0.8)√((4 million)(0.12 million)) = 5.2285 million. Variance of Aggregate = (Mean Freq.)(Variance of Sev.) + (Mean Severity)²(Var. of Freq.) = (.6)(5.2285 million) + (4600²)(.6) = 15.83 million. Standard Deviation of Aggregate = √(15.83 million) = 3979.
5.40. D. One has to recognize this as a compound Poisson, with p(x) the severity, and frequency 3n e-3/n!. Frequency is Poisson with λ = 3. The second moment of the severity is: (.5)(12 ) + (.3)(22 ) + (.2)(32 ) = 3.5. The variance of aggregate losses is: (3)(3.5) = 10.5. Comment: Similar to Course 151 Sample Exam #1, Q.10. 5.41. C. Matching the mean of the LogNormal and the aggregate distribution: exp(µ + .5 σ2) = 100. Matching the second moments: exp(2µ + 2σ2) = 90,000 + 1002 = 100,000. Divide the second equation by the square of the first equation: exp(2µ + 2σ2)/exp(2µ + σ2) = exp(σ2) = 10.
⇒ σ = √ln(10) = 1.517. ⇒ µ = ln(100) - σ²/2 = 3.455. Prob[agg. > 2000] ≅ 1 - F(2000) = 1 - Φ[(ln(2000) - 3.455)/1.517] = 1 - Φ[2.73] = 0.0032. 5.42. C. The severity has: mean = θ/(α-1) = 50, and second moment = 2θ²/{(α-1)(α-2)} = 10,000. The mean aggregate loss is: (7)(50) = 350. Since the frequency and severity are independent, and frequency is Poisson, the variance of the aggregate losses = (mean frequency)(2nd moment of the severity) = (7)(10,000) = 70,000. For the LogNormal Distribution the mean is exp[µ + .5σ²], while the second moment is exp[2µ + 2σ²]. Matching the first 2 moments of the aggregate losses to that of the LogNormal Distribution: exp[µ + .5σ²] = 350 and exp[2µ + 2σ²] = 70,000 + 350² = 192,500. We can solve by dividing the square of the 1st equation into the 2nd equation: exp[σ²] = 192,500/350² = 1.571. Thus σ = .672 and thus µ = 5.632. Therefore the probability that the total losses will exceed 1000 is approximately: 1 - Φ[(ln(1000) - 5.632)/0.672] = 1 - Φ[1.90] = 2.9%.
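The LogNormal moment matching in 5.41 and 5.42 is easy to check numerically. A minimal Python sketch, assuming the aggregate mean of 350 and variance of 70,000 from 5.42:

```python
# Sketch: matching a LogNormal to the aggregate mean and variance, as in solution 5.42.
from math import erf, log, sqrt

def phi(x):  # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

agg_mean, agg_var = 350.0, 70000.0
# For a LogNormal matched by moments, exp(sigma^2) = 1 + variance/mean^2.
sigma = sqrt(log(1 + agg_var / agg_mean**2))    # about 0.672
mu = log(agg_mean) - sigma**2 / 2               # about 5.632
prob = 1 - phi((log(1000) - mu) / sigma)        # about 2.9%
print(mu, sigma, prob)
```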
5.43. Due to the memoryless property of the Exponential, the payments excess of a deductible follow the same Exponential Distribution as the ground up losses. Thus the second moment of (non-zero) payments is 2θ². The number of (non-zero) payments with a deductible b is Poisson with mean: λS(b) = λe^(-b/θ). Therefore, with deductible b, B = variance of aggregate payments = λe^(-b/θ)2θ². With deductible c, C = variance of aggregate payments = λe^(-c/θ)2θ². C/B = e^((b-c)/θ). Since c > b, this ratio is less than one. Comment: In the case of an Exponential severity, the variance of aggregate payments decreases as the deductible increases. Similar to CAS3, 5/05, Q.9.
5.44. D. Due to the memoryless property of the Exponential, the payments excess of a deductible follow the same Exponential Distribution as the ground up losses. Thus the second moment of (non-zero) payments is: (2)(400²) = 320,000. The number of (non-zero) payments is Poisson with mean: 3e^(-500/400) = 0.85951. Therefore, variance of aggregate payments = (0.85951)(320,000) = 275,045.
Alternately, for the Exponential Distribution, E[X] = θ = 400, and E[X²] = 2θ² = 320,000.
For the Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)). E[X ∧ 500] = 400(1 - e^(-500/400)) = 285.40.
For the Exponential, E[(X ∧ x)^n] = n! θ^n Γ(n+1; x/θ) + x^n e^(-x/θ).
E[(X ∧ 500)²] = (2)(400²)Γ(3; 500/400) + (500²)e^(-500/400).
According to Theorem A.1 in Loss Models, for integral α, the incomplete Gamma function Γ(α; y) is 1 minus the first α densities of a Poisson Distribution with mean y. Γ(3; y) = 1 - e^(-y)(1 + y + y²/2). Γ(3; 1.25) = 1 - e^(-1.25)(1 + 1.25 + 1.25²/2) = 0.13153.
Therefore, E[(X ∧ 500)²] = (320,000)(0.13153) + 250,000e^(-1.25) = 113,716.
The first moment of the layer from 500 to ∞ is: E[X] - E[X ∧ 500] = 400 - 285.40 = 114.60.
Second moment of the layer from 500 to ∞ is: E[X²] - E[(X ∧ 500)²] - (2)(500)(E[X] - E[X ∧ 500]) = 320,000 - 113,716 - (1000)(114.60) = 91,684.
The number of losses is Poisson with mean 3. Thus the variance of the aggregate payments excess of the deductible is: (3)(91,684) = 275,052.
Alternately, one can work directly with the integrals, using integration by parts. The second moment of the layer from 500 to ∞ is:
∫_500^∞ (x - 500)² e^(-x/400)/400 dx
= ∫_500^∞ x² e^(-x/400)/400 dx - 2.5 ∫_500^∞ x e^(-x/400) dx + 625 ∫_500^∞ e^(-x/400) dx
= [-x² e^(-x/400) - 800x e^(-x/400) - 320,000 e^(-x/400)] evaluated from x = 500 to x = ∞
+ [1000x e^(-x/400) + 400,000 e^(-x/400)] evaluated from x = 500 to x = ∞, plus 250,000 e^(-1.25)
= e^(-1.25){250,000 + 400,000 + 320,000 - 500,000 - 400,000 + 250,000} = 91,682.
The variance of the aggregate payments excess of the deductible is: (3)(91,682) = 275,046.
Comment: Similar to CAS3, 5/05, Q.9.
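If you want to confirm the integration by parts in 5.44 without redoing the calculus, a crude numerical integration gives the same layer second moment. A rough Python sketch (the step size and upper limit are arbitrary choices, not part of the solution):

```python
# Sketch: brute-force numerical check of the layer second moment in solution 5.44,
# integrating (x - 500)^2 times the Exponential(theta = 400) density by the midpoint rule.
from math import exp

theta, d = 400.0, 500.0
step, upper = 0.1, 20000.0   # crude midpoint rule, carried far enough into the tail

second_moment_layer = 0.0
x = d + step / 2
while x < upper:
    second_moment_layer += (x - d)**2 * exp(-x / theta) / theta * step
    x += step

poisson_mean = 3
print(second_moment_layer)                 # about 91,680
print(poisson_mean * second_moment_layer)  # about 275,000, the variance of aggregate payments
```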
5.45. The payments excess of a deductible follow a Pareto Distribution with parameters α and θ + d. Thus the second moment of (non-zero) payments is 2(θ + d)2/{(α - 1)(α - 2)}. The number of (non-zero) payments with a deductible b is Poisson with mean: λS(b) = λ{θ/(θ + b)}α. Therefore, with deductible b, B = λ{θ/(θ + b)}α2(θ + b)2/{(α - 1)(α - 2)}. With deductible c, C = variance of aggregate payments = λ{θ/(θ + c)}α2(θ + c)2/{(α - 1)(α - 2)}. C/B = {(θ + b)/(θ + c)}α−2. Since α > 2 and c > b, this ratio is less than one. Comment: As α approaches 2, the ratio C/B approaches one. For α ≤ 2, the second moment of the Pareto does not exist, and neither does the variance of aggregate payments. Here the variance of the aggregate payments decreases as the deductible increases. In CAS3, 5/05, Q.8, the variance of aggregate payments increases as the deductible increases. 5.46. A. S(42) ≅ 1 - Φ[(42 - 20)/10] = 1 - Φ[2.2] = 1 - .9861 = 1.39%. 5.47. E. exp[µ + σ2/2] = 20. exp[2µ + 2σ2] = 100 + 202 = 500. ⇒ exp[σ2] = 500/202 = 1.25.
⇒ σ = 0.4724. ⇒ µ = 2.8842. S(42) ≅ 1 - Φ[(ln(42) - 2.8842)/.4724] = 1 - Φ[1.81] = 3.51%. 5.48. D. αθ = 20. αθ2 = 100. ⇒ θ = 5. ⇒ α = 4. S(42) ≅ 1 - Γ[4; 42/5] = 1 - Γ[4; 8.4] = e-8.4(1 + 8.4 + 8.42 /2 + 8.43 /6) = 3.23%. Comment: An example of the method of moments.
5.49. E. µ = 20. µ³/θ = 100. ⇒ θ = 80.
S(42) ≅ 1 - Φ[(42/20 - 1)√(80/42)] - exp[(2)(80)/20] Φ[-(42/20 + 1)√(80/42)] = 1 - Φ[1.52] - e^8 Φ[-4.278].
Φ[-4.278] = 1 - Φ[4.278] ≅ {exp[-4.278²/2]/√(2π)}(1/4.278 - 1/4.278³ + 3/4.278⁵ - 15/4.278⁷) = 9.423 × 10^-9.
S(42) ≅ 1 - 0.9357 - (2981)(9.423 × 10^-9) = 0.0643 - 0.0281 = 3.62%.
Comment: An example of the method of moments.
5.50. B. The mean of c times a Poisson is cλ. The variance of c times a Poisson is c²λ.
cλ = 20. c²λ = 100. ⇒ c = 5. ⇒ λ = 4. 5N > 42 ⇔ N > 42/5 = 8.2.
S(42) ≅ 1 - e^-4(1 + 4 + 4²/2 + 4³/6 + 4⁴/4! + 4⁵/5! + 4⁶/6! + 4⁷/7! + 4⁸/8!) = 2.14%.
Comment: Well beyond what you are likely to be asked on your exam! Since Var[cX]/E[cX] = cVar[X]/E[X], for c > 1, the Over-dispersed Poisson Distribution has a variance greater than its mean. See for example “A Primer on the Exponential Family of Distributions,” by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program.
5.51. D. θ/(α - 1) = 20. θ²/{(α - 1)(α - 2)} = 100 + 20² = 500. ⇒ (α - 1)/(α - 2) = 1.25
⇒ α = 6. ⇒ θ = 100. S(42) ≅ Γ[6; 100/42] = Γ[6; 2.381] = 1 - e^-2.381(1 + 2.381 + 2.381²/2 + 2.381³/6 + 2.381⁴/24 + 2.381⁵/120) = 3.45%.
Comment: Beyond what you are likely to be asked on your exam. An example of the method of moments. Which distribution is used to approximate the Aggregate Distribution can make a very significant difference! From lightest to heaviest righthand tail, the approximating distributions are: Normal, Over-dispersed Poisson, Gamma, Inverse Gaussian, LogNormal, Inverse Gamma. Here is a table of 1/S(x) for various sizes x:

x      Normal     O.D. Poisson   Gamma     Inv. Gaussian   LogNormal   Inv. Gamma
40     43.96      46.81          23.60     21.87           22.60       23.80
50     740.8      352.1          96.75     68.21           67.63       60.37
60     31,574     3653           436.3     212.4           192.0       136.9
70     3.5e+6     50,171         2109      657.3           515.9       283.4
80     1.0e+9     882,744        10,736    2019            1315        544.0
90     7.8e+11    1.9e+7         59,947    6152            3194        981.9
100    1.5e+15    5.2e+8         312,137   18,623          7423        1683
5.52. B. Mean aggregate per policy: (2)(5) = 10. Variance of aggregate per policy: λ(2nd moment of severity) = (2)(12² + 5²) = 338. For N policies, mean is: 10N, and the variance is: 338N. 115% of the mean is: 11.5N. Prob[Aggregate > 115% of mean] ≅ 1 - Φ[(11.5N - 10N)/√(338N)] = 1 - Φ[0.08159√N]. This probability ≤ 2.5% ⇔ Φ[0.08159√N] ≥ 97.5%. Φ[1.960] = 0.975. ⇒ We want 0.08159√N ≥ 1.960. ⇒ N ≥ 577.1. Comment: Similar to SOA3, 11/03, Q.4. 5.53. D. Due to the memoryless property of the Exponential, the payments excess of a deductible follow the same Exponential Distribution as the ground up losses. The size of payments has mean 1700, and variance 1700² = 2.89 million. For the original Exponential, S(1000) = exp[-1000/1700] = 0.5553. Thus the number of (non-zero) payments is Negative Binomial with r = 4, and β = (0.5553)(3) = 1.666. The number of payments has mean: (4)(1.666) = 6.664, and variance: (4)(1.666)(2.666) = 17.766. Therefore, the variance of aggregate payments is: (6.664)(2.89 million) + (1700²)(17.766) = 70.6 million. 5.54. B. & 5.55. A. Mean aggregate = (10000)(.03)(12.5) + (15000)(.05)(25) = 22,500. Policy Type one has a mean severity of 12.5 and a variance of the severity of (25 - 0)²/12 = 52.083. Policy Type one has a mean frequency of .03 and a variance of the frequency of (.03)(.97) = .0291. Thus, a single policy of type one has a variance of aggregate losses of: (.03)(52.083) + (12.5²)(.0291) = 6.109. Policy Type two has a mean severity of 25 and a variance of the severity of (50 - 0)²/12 = 208.333. Policy Type two has a mean frequency of .05 and a variance of the frequency of (.05)(.95) = .0475. Thus, a single policy of type two has a variance of aggregate losses of: (.05)(208.333) + (25²)(.0475) = 40.104. Therefore, the variance of the aggregate losses of 10000 independent policies of type one and 15000 policies of type two is: (10000)(6.109) + (15000)(40.104) = 662,650. Standard Deviation of aggregate losses is: 814.037. Prob[Aggregate > 24,000] ≅ 1 - Φ[(24,000 - 22,500)/814.037] = 1 - Φ[1.84] = 3.3%. Comment: Similar to 3, 5/00, Q.19.
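A small Python sketch of the "how many policies" calculation in 5.52, assuming the per-policy Poisson mean of 2 and the severity mean of 5 and standard deviation of 12 given in that solution:

```python
# Sketch: solving for the number of policies needed so that
# Prob[Aggregate > 115% of its mean] is at most 2.5%, as in solution 5.52.
from math import sqrt, ceil

lam, sev_mean, sev_sd = 2, 5, 12
policy_mean = lam * sev_mean                    # 10
policy_var = lam * (sev_sd**2 + sev_mean**2)    # lambda times the severity second moment = 338

z = 1.960   # 97.5th percentile of the standard normal
# Require 0.15 * policy_mean * N >= z * sqrt(policy_var * N),
# i.e. sqrt(N) >= z * sqrt(policy_var) / (0.15 * policy_mean).
n = (z * sqrt(policy_var) / (0.15 * policy_mean))**2
print(n, ceil(n))   # about 577.1, so at least 578 policies
```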
5.56. C. The aggregate distribution of Property Damage Liability has mean 10λ, and variance λ(15² + 10²) = 325λ. Φ[1.282] = 90%. Therefore, P ≅ 10λ + 1.282√(325λ) = 10λ + 23.11√λ. The aggregate distribution of Bodily Injury Liability has mean 24λ/3 = 8λ, and variance (λ/3)(24² + 60²) = 1392λ. Therefore, B ≅ 8λ + 1.282√(1392λ) = 8λ + 47.83√λ. B/P = {8λ + 47.83√λ}/{10λ + 23.11√λ} = 1.061.
⇒ 8√λ + 47.83 = 10.61√λ + 24.52. ⇒ √λ = 8.93. ⇒ λ = 79.7.
5.57. D. First inflate all of the aggregate losses to the 2012 level:
(1.04^6)(31,000,000) = 39,224,890.
(1.04^5)(38,000,000) = 46,232,811.
(1.04^4)(36,000,000) = 42,114,908.
(1.04^3)(41,000,000) = 46,119,424.
(1.04^2)(41,000,000) = 44,345,600.
Next we calculate the mean and the second moment of the inflated losses:
Mean = 43.6075 million. Second Moment = 1908.65 × 10^12.
The mean of the aggregate distribution is: λ(first moment of severity) = 3000 θ/(α - 1).
The variance of the aggregate distribution is: λ(second moment of severity) = 3000 · 2θ²/{(α - 1)(α - 2)}.
Matching the theoretical and empirical moments:
43.6075 million = 3000 θ/(α - 1). ⇒ θ = 14,539(α - 1).
1908.65 × 10^12 - (43.6075 million)² = 3000 · 2θ²/{(α - 1)(α - 2)}. ⇒ θ² = 1173 million (α - 1)(α - 2).
Dividing the second equation by the square of the first: 1 = 5.549(α - 2)/(α - 1). ⇒ α = 2.220.
⇒ θ = 17,738. ⇒ S(20,000) = {17,738/(17,738 + 20,000)}^2.220 = 18.7%.
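The trending and method-of-moments fit in 5.57 is easy to mis-key; here is a minimal Python sketch that reproduces it from the five annual aggregates, assuming the 4% annual inflation and Poisson frequency of 3000 stated in the solution:

```python
# Sketch: trend five years of aggregate losses to the 2012 level, then fit a Pareto
# severity by matching the aggregate mean and variance (Poisson frequency of 3000).
from statistics import mean

losses = [31e6, 38e6, 36e6, 41e6, 41e6]
trended = [x * 1.04**(6 - i) for i, x in enumerate(losses)]   # 1.04^6 down to 1.04^2

m1 = mean(trended)                      # about 43.61 million
m2 = mean(x**2 for x in trended)        # about 1908.65 x 10^12
var = m2 - m1**2

lam = 3000
# m1 = lam*theta/(alpha - 1) and var = lam*2*theta^2/((alpha - 1)(alpha - 2))
k = (var / (2 * lam)) / (m1 / lam)**2   # equals (alpha - 1)/(alpha - 2)
alpha = (2 * k - 1) / (k - 1)
theta = (m1 / lam) * (alpha - 1)
s_20000 = (theta / (theta + 20000))**alpha
print(alpha, theta, s_20000)            # about 2.22, 17,700, and 18.7%
```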
5.58. C. For example, Var[S | N = 3] = 300,000 - 160,000 = 140,000.
E_N[Var[S | N]] = (20%)(0) + (40%)(40,000) + (30%)(87,500) + (10%)(140,000) = 56,250.

N      Probability   Mean of S   Square of Mean   Second Moment   Var of S
                     Given N     of S Given N     of S Given N    Given N
0      20%           0           0                0               0
1      40%           100         10,000           50,000          40,000
2      30%           250         62,500           150,000         87,500
3      10%           400         160,000          300,000         140,000
Mean                 155         38,750                           56,250
Var_N[E[S | N]] = 38,750 - 155² = 14,725. Thus the variance of the aggregate losses is: E_N[Var[S | N]] + Var_N[E[S | N]] = 56,250 + 14,725 = 70,975. Comment: We have not assumed that frequency and severity are independent. The mathematics here is similar to that for the EPV and VHM, used in Buhlmann Credibility. 5.59. A. The Binomial has a mean of: (5)(.4) = 2, and a variance of: (5)(.4)(.6) = 1.2. The LogNormal distribution has a mean of: exp[6 + 0.3²/2] = 422, a second moment of: exp[(2)(6) + (2)(0.3²)] = 194,853, and variance of: 194,853 - 422² = 16,769. The aggregate losses have a mean of: (2)(422) = 844. The aggregate losses have a variance of: (2)(16,769) + (422²)(1.2) = 247,239. Prob[Aggregate > (1.5)(844)] ≅ 1 - Φ[(.5)(844)/√247,239] = 1 - Φ[0.85] = 19.77%. 5.60. B. The density at zero for the non-modified Negative Binomial is: 1/1.4² = 0.5102. The mean of the zero-modified Negative Binomial is: (1 - 0.4)(0.8)/(1 - 0.5102) = 0.9800. The second moment of the zero-modified Negative Binomial is: (1 - 0.4){(2)(0.4)(1.4) + 0.8²}/(1 - 0.5102) = 2.1560. Thus the variance of the zero-modified Negative Binomial is: 2.1560 - 0.9800² = 1.1956. The mean of the Gamma is: (3)(500) = 1500. The variance of the Gamma is: (3)(500²) = 750,000. Thus the variance of the annual aggregate loss is: (0.9800)(750,000) + (1500²)(1.1956) = 3,425,100. 5.61. C. The sample mean is 109.167. The sample variance is 54.967. Prob[Aggregate < 100] = Φ[(100 - 109.167)/√54.967] = Φ[-1.24] = 1 - 0.8925 = 10.75%.
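The decomposition used in 5.58 can be verified directly from the conditional moments. A minimal Python sketch, with the probabilities and conditional moments taken from the table above:

```python
# Sketch: conditional-variance decomposition of solution 5.58,
# Var[S] = E_N[Var(S | N)] + Var_N(E[S | N]).
rows = [
    # (probability of N, E[S | N], E[S^2 | N])
    (0.20,   0,       0),
    (0.40, 100,  50_000),
    (0.30, 250, 150_000),
    (0.10, 400, 300_000),
]

epv = sum(p * (m2 - m1**2) for p, m1, m2 in rows)               # 56,250
mean_of_means = sum(p * m1 for p, m1, _ in rows)                # 155
vhm = sum(p * m1**2 for p, m1, _ in rows) - mean_of_means**2    # 14,725
print(epv + vhm)                                                # 70,975
```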
5.62. B. By thinning, each type of claim is Poisson. For each type, variance of aggregate is: λ (second moment of severity) = λ (mean2 ) (1 + CV2 ). Variance of Type I: (0.20) (1002 ) (1 + 52 ) = 52,000. Variance of Type II: (0.10) (2002 ) (1 + 42 ) = 68,000. Variance of Type III: (0.05) (3002 ) (1 + 32 ) = 45,000. The variance of the distribution of annual aggregate losses is: 52,000 + 68,000 + 45,000 = 165,000. Alternately, severity is a mixture, with weights: 20/35, 10/35, and 5/35. The second moment of the mixture is the mixture of the second moments: (4/7) (1002 ) (1 + 52 ) + (2/7) (2002 ) (1 + 42 ) + (1/7) (3002 ) (1 + 32 ) = 471,429. The variance of the distribution of annual aggregate losses is: (0.35)(471,429) = 165,000. 5.63. A. Severity is LogNormal with µ = 6 and σ2 = 0.7. Mean severity is: exp[6 + 0.7/2] = 572.5. Second moment of severity is: exp[(2)(6) + (2)(0.7)] = 660,003. Variance of severity is: 660,003 - 572.52 = 332,247. Mean frequency is: (40%)(1) + (30%)(2) + (20%)(3) + (10%)(4) = 2. Second Moment of frequency is: (40%)(12 ) + (30%)(22 ) + (20%)(32 ) + (10%)(42 ) = 5. Variance of frequency is: 5 - 22 = 1. The variance of the distribution of annual aggregate losses is: (2)(332,247) + (572.52 )(1) = 992,250. 5.64. E. X is the discrete frequency, severity is Normal; Y is the aggregate loss. E[X] = 1.3. E[X2 ] = 2.3. Var[X] = 2.3 - 1.32 = 0.61. Var[Y] = (1.3) (5) + (32 ) (0.61) = 11.99. Alternately, this is a mixture: with probability 20% Y is 0, with probability 30% Y is Normal with mean 3 and variance 5, with probability 50% Y is Normal with mean 6 and variance 10. Thus E[Y] = (0.2)(0) + (0.3)(3) + (0.5)(6) = 3.9. E[Y2 ] = (0.2)(0) + (0.3)(5 + 32 ) + (0.5)(10 + 62 ) = 27.2. Var[Y] = 27.2 - 3.92 = 11.99.
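A one-line Python check of the thinning argument in 5.62, using the stated claim rates, mean severities, and coefficients of variation:

```python
# Sketch: each claim type is Poisson by thinning, so each contributes
# lambda * (second moment of its severity) to the aggregate variance.
types = [
    # (lambda, mean severity, coefficient of variation of severity)
    (0.20, 100, 5),
    (0.10, 200, 4),
    (0.05, 300, 3),
]

total_var = sum(lam * m**2 * (1 + cv**2) for lam, m, cv in types)
print(total_var)   # 165,000
```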
5.65. D. The Gamma has a mean of 30,000, and a variance of: (3)(10,0002 ) = 300 million. The mean of the zero-truncated Binomial is: (4)(0.2) / (1 - 0.84 ) = 1.355. Thus the mean number of claimants is: (0.1)(1.355) = 0.1355. Thus the mean annual aggregate loss is: (0.1355)(30,000) = 4065. The second moment of the non-truncated Binomial is: (4)(0.2)(0.8) + {(4)(0.2)}2 = 1.28. The second moment of the zero-truncated Binomial is: 1.28 / (1 - 0.84 ) = 2.168. The annual number of claimants follows a compound Poisson zero-truncated Binomial Distribution, in other words as if there were a Poisson Frequency and a zero-truncated Binomial severity. Thus the variance of the number of claimants is: (0.1)(2.168) = 0.2168. Thus the variance of the annual aggregate loss is: (0.1355)(300 million) + (30,0002 )(0.2168) = 235.77 million. CV of aggregate loss is:
√(235.77 million) / 4065 = 3.78.
5.66. A. For the portion paid by Spring & Sommers, the mean disability is: (0.3)(1) + (0.2)(2) + (0.1)(3) + (0.1)(4) + (0.3)(5) = 2.9 weeks. Second moment is: (0.3)(12 ) + (0.2)(22 ) + (0.1)(32 ) + (0.1)(42 ) + (0.3)(52 ) = 11.1. Variance is: 11.1 - 2.92 = 2.69. The number of disabilities from Type 1 are Binomial with m = 1500 and q = 5%. The mean severity is: (2/3)(600)(2.9) = 1160. The variance of severity is: (4002 ) (2.69) = 430,400. For Type 1, the mean aggregate is: (5%)(1500) (1160) = 87,000. The variance aggregate is: (5%)(1500)(430,400) + (11602 )(1500)(0.05)(0.95) = 128,154,000. The number of disabilities from Type 2 are Binomial with m = 500 and q = 8%. The mean severity is: (2/3)(900)(2.9) = 1740. The variance of severity is: (6002 ) (2.69) = 968,400. For Type 2, the mean aggregate is: (8%)(500) (1740) = 69,600. The variance aggregate is: (8%)(500)(968,400) + (17402 )(500)(0.08)(0.92) = 150,151,680. The total mean aggregate is: 87,000 + 69,600 = 156,600. The variance of total aggregate is: 128,154,000 + 150,151,680 = 278,305,680. The coefficient of variation of the distribution of total annual payments is: 278,305,680 / 156,600 = 0.1065.
5.67. D. The mean of the zero-modified Poisson is: (1 - 0.25)(0.1) / (1 - e-0.1) = 0.7881. The second moment of the zero-modified Poisson is: (1 - 0.25)(0.1 + 0.12 ) / (1 - e-0.1) = 0.8669. Thus the variance of the zero-modified Poisson is: 0.8669 - 0.78812 = 0.2458. The mean of the LogNormal is: exp[8 + 0.62 /2] = 3569. The second moment of the LogNormal is: exp[(2)(8) + (2)(0.62 )] = 18,255,921. Thus the variance of the LogNormal is: 18,255,921 - 35692 = 5,518,160. Thus the variance of the annual aggregate loss is: (0.7881)(5,518,160) + (35692 )(0.2458) = 7,479,804. 5.68. D. For frequency independent of severity, the process variance of the aggregate losses is given by: (Mean Freq.)(Variance of Severity) + (Mean Severity)2 (Variance of Freq.) = λ(r/a2) + (r/a)2(2λ) = λr(2r + 1) / a2 . 5.69. B. Let X be the claim sizes, then VAR[T] = E[N]VAR[X] + E[X]2VAR[N] = m(αθ2) + (αθ)2(3m) = m α (3α + 1)θ2. 5.70. A. The mean frequency = (1/4)(4) + (1/2)(5) + (1/4)(6) = 5. The second moment of the frequency = (1/4)(42 ) + (1/2)(52 ) + (1/4)(62 ) = 25.5. The variance of the frequency = 25.5 - 52 = .5. The severity distribution is a Single Parameter Pareto with θ = 1 and α = 3. mean = (α / (α − 1))θ = 3/2. 2nd moment = (α /(α − 2))θ2 = 3. variance = 3 - 9/4 = 3/4. The variance of the aggregate losses = (5)(3/4) + (3/2)2 (.5) = 4.875. The mean of the aggregate loss is: (mean frequency)(mean severity) = (5)(3/2) = 7.5. The coefficient of variation of the aggregate loss =
√4.875 / 7.5 = 0.294.
5.71. E. One has to recognize this as a compound Negative Binomial, with p(x) the severity, and frequency density: (n+2 choose n)(0.6)³(0.4)ⁿ. Frequency is Negative Binomial with r = 3 and β/(1+β) = 0.4, so that β = 2/3. The mean frequency is: rβ = 2, and the variance of the frequency is: rβ(1+β) = 10/3. The mean severity is: (0.3)(1) + (0.6)(2) + (0.1)(3) = 1.8. The second moment of the severity is: (0.3)(1²) + (0.6)(2²) + (0.1)(3²) = 3.6. Thus the variance of the severity is: 3.6 - 1.8² = .36. The variance of aggregate losses is: (0.36)(2) + (1.8²)(10/3) = 11.52.
Comment: Where p is the density of the severity, f is the density of the frequency, and frequency and severity are independent, then the density of aggregate losses is: Σ_{n=0}^∞ p*ⁿ(x) f(n).
You should recognize that this is the convolution form of writing an aggregate distribution. In the frequency density, you have a geometric decline factor of 0.4. So 0.4 looks like β/(1+β) in a Geometric Distribution. However, we also have the binomial coefficients in front, which is one way of writing a Negative Binomial: (n+2 choose n) = (n+2 choose 2) = (3)(4)...(n+2)/n! ⇔ r(r+1)...(r+k-1)/k!. This is the form of the Negative Binomial density in Loss Models, with r = 3. There are only a few frequency distributions in the Appendix, so when you see something like this, there are only a few choices to try to match things up. It is more common for them to just say frequency is Negative Binomial or whatever.
5.72. E. In each case the premium is the mean aggregate loss plus 1.645 standard deviations, since Φ(1.645) = .95. Thus the relative security loading is 1.645 standard deviations / mean. Let A be the fixed amount of the claim, let p be the probability of a claim, and N be the number of policies. Since we are told that each policy has either zero or one claim, the number of claims for N policies is Binomial with parameters p and N. Therefore, the mean aggregate losses is: NpA. The variance of aggregate losses is: N(p)(1-p)A². Thus the relative security loading is: (1.645)√(N(p)(1-p)A²)/(NpA) = 1.645√((1-p)/(Np)).
So the largest relative security loading corresponds to the largest value of (1-p)/(Np). As shown below this occurs for region E.

Region   N     p      (1-p)/(Np)   Relative Security Loading
A        300   0.01   0.330        0.94
B        500   0.02   0.098        0.51
C        600   0.03   0.054        0.38
D        500   0.02   0.098        0.51
E        100   0.01   0.990        1.64
5.73. E. This is a mixed Poisson-Poisson frequency. For a given value of λ, the first moment of a Poisson frequency is λ. Thus for the mixed frequency, the first moment is E[λ] = 1/p. For a given value of λ, the second moment of a Poisson frequency is λ + λ2. Thus for the mixed frequency, the second moment is: E[λ + λ2] = E[λ] + E[ λ2] = 1/p + (second moment of a Poisson with mean 1/p) = 1/p + (1/p + 1/p2 ) = 2/p + 1/p2 . Thus the variance of the mixed frequency distribution is: 2/p + 1/p2 - 1/p2 = 2/p. The mean severity is: p + (2)(1-p) = 2 - p. The second moment of the severity is: p + (4)(1-p) = 4 - 3p. Thus the variance of the severity is: 4 - 3p - (2-p)2 = p - p2 . Variance of aggregate losses is: (variance of severity)(mean frequency) + (mean severity)2 (variance of frequency) = (p-p2 )(1/p) + (2-p)2 (2/p) = p - 7 + 8/p. Setting this equal to the given 19/2, p - 7 + 8/p = 19/2. Therefore, 2p2 - 33p + 16 = 0. p = (33 ±
√(33² - (4)(2)(16)) )/4 = (33 ± 31)/4 = 1/2 or 16.
However, in order to have a legitimate severity distribution, we must have 0 ≤ p ≤ 1. Therefore, p = 1/2.
5.74. B. The mean frequency is β = 1/3. The mean severity is 4. Thus the mean aggregate loss is (1/3)(4) = 4/3. The second moment of the severity is (9 + 16 + 25)/3 = 50/3. Thus the variance of the severity is: 50/3 - 4² = 2/3. The variance of the frequency is: β(1+β) = (1/3)(4/3) = 4/9. Thus the variance of aggregate losses is: (1/3)(2/3) + (4/9)(4²) = 66/9. Thus the premium is: 4/3 + 66/9 = 78/9 = 8.667. The aggregate losses do not exceed the premiums if: there are 0 claims, there is 1 claim, and sometimes when there are 2 claims. The probability of 0 claims is: 1/(1+β) = .75. The probability of 1 claim is: β/(1+β)² = .1875. The probability of 2 claims is: β²/(1+β)³ = .046875. If there are two claims, the aggregate losses are < 8.667 if the claims are of sizes: 3,3; 3,4; 3,5; 4,3; 4,4; 5,3. This is 6 out of 9 equally likely possibilities, when there are two claims. Therefore, the probability that the aggregate losses exceed the premiums is: 1 - {0.75 + 0.1875 + (0.046875)(6/9)} = 1 - 0.96875 = 0.03125.
Comment: In spreadsheet form, the probability that the aggregate losses do not exceed the premiums is calculated as 0.96875:

Number of Claims   Frequency Distribution   Probability that Aggregate Losses ≤ Premiums   Product
0                  0.75000                  1.00000                                        0.75000
1                  0.18750                  1.00000                                        0.18750
2                  0.04688                  0.66667                                        0.03125
3                  0.01172                  0.00000                                        0.00000
Total                                                                                      0.96875
5.75. D. G = E[S](1+η) = λE[X](1+η). Var[S] = λE[X2 ]. Var[R] = Var[S/G] = Var[S]/G2 = λE[X2 ]/(λE[X](1+η))2 = E[X2 ]/{λE[X]2 (1+η)2 }. Comment: The premiums G does not vary, so we can treat G like a constant; G comes out of the variance as a square. S is a compound Poisson, so its variance is the mean frequency times the second moment of the severity. 5.76. E. Frequency is Binomial, with mean = (40)(1/8) = 5 and variance = (40)(1/8)(7/8) = 35/8. The mean severity is µ = 400. The variance of the severity is: µ3/θ = 4003 /8000 = 8000. Thus the mean aggregate loss is: (5)(400) = 2000 and the variance of aggregate losses is: (4002 )(35/8) + (5)(8000) = 740,000. Thus the probability that the total dollars of claims for the portfolio are greater than 2900 is approximately: 1 - Φ[(2900 - 2000)/ 740,000 ] = 1 - Φ[1.05] = 1 - 0.852 = 0.147.
5.77. B. Mean severity is: (0.9)(1) + (0.1)(10) = 1.9. Second moment of the severity is: (0.9)(12 ) + (0.1)(102 ) = 10.9. Variance of the severity is: 10.9 - 1.92 = 7.29. Mean frequency is 0.10. Variance of frequency is: (0.10)(.90) =0 .09. Mean aggregate loss is: N(.01)(1.9) = 0.019N. Variance of aggregate losses is: N{(0.10)(7.29) + (0.09)(1.92 )} = 1.0539N. A 95% probability corresponds to 1.645 standard deviations greater than the mean, since Φ(1.645) = 0.95. Thus, safety loading = .2(mean aggregate loss) = 1.645 (standard deviations). Thus, (.2)(0.019N) = 1.645 1.0539N . Solving, N = 1.6452 (1.0539)/.0382 = 1975 policies. Comment: If one knows classical credibility, one can do this problem as follows. P = 95%, but since one performs only a one-sided test in this case y = 1.645. k = ±20%. The CV2 of the severity is: 7.29/1.92 = 2.019. The standard for full credibility is: (y/k)2 (σF2/µF + CVS2 ) = (1.645/.2)2 (.09/.1 + 2.019) = (67.65)(2.919) = 197.5 claims. This corresponds to 197.5/.1 = 1975 policies. 5.78. D. Mean Freq = .01. Variance of freq. = (0.01)(0.99) = 0.0099. Mean Severity = 5000. Variance of severity = (10000-0)2 /12 = 8,333,333. Variance of Aggregate Losses = (.01)(8333333) + (.0099)(50002 ) = 330,833. 5.79. The mean of the aggregate losses = (3)(100) = 300. Since the frequency is Poisson, the variance of aggregate losses = (mean frequency)(second moment of the severity) = (3)(15000) = 45,000. Premiums = (300)(1.1) = 330. Mean Loss Ratio = 300/330 = 0.91. Var(Loss Ratio) = Var(Loss/Premium) = Var(Loss)/3302 = 45,000/3302 = 0.41. 5.80. a. E[S] = (# risks)(mean frequency)(mean severity) = (500)(.1)(1000) = 50,000. Var[S] = (# risks){(mean frequency)(var. of sev.) + (mean severity)2 (var. freq.)} = (500){(0.1)(10002 ) + (10002 )0(.1)(0.9)} = 95,000,000. StdDev{S] =
√95,000,000 = 9746.
b. So that there is 95% chance that the premiums are sufficient to pay the resulting claims, the aggregate premiums = mean + 1.645(StdDev) = 50000 + (1.645)(9746) = 66,033. Premium per risk = 66033/500 = 132. 5.81. Let the death benefit be b and the probability of death be q. Then 30 = E[x] = bq and 29100 = Var[x] = q(1-q)b2 . Thus 29100/302 = (1-q)/q. ⇒ q = 0.03. ⇒ b = 1000.
5.82. Premium = 1.2E[S] = 1.2(5)(0 +10)/2 = 30. Var[S] = (mean freq)(second moment of severity) = (5)(102 /3) = 166.67. Var[Loss Ratio] = Var[S/30] = Var[S]/900 = 166.67/900 = 0.185. Alternately, G = 1.2 E[N]E[X], Var[S] = E[N]Var[X] + E[X]2 Var[N]. Var[S/G] = Var[S]/G2 = {E[N]Var[X] + E[X]2 Var[N]}/ {1.2 E[N]E[X]}2 = {Var[X]/ (E[N]E[X]2 ) + Var[N]/E[N]2 } / 1.44 = {(100/12)/((5)(25)) +(5)/(25)} / 1.44 = .185. 5.83. D. Mean aggregate loss = (3){(.4)(1) + (.2)(2) + (.4)(3)} = 6. For a compound Poisson, variance of aggregate losses = (mean frequency)(second moment of severity) = (3){(.4)(12 ) + (.2)(22 ) + (.4)(32 )} = 14.4. Since the severity is discrete, one should use the continuity correction. Pr[S > 9] ≅ 1 - Φ[(9.5 - 6)/ 14.4 ] = 1 - Φ(0.92) = 1 - 0.8212 = 17.88%. 5.84. A. Since frequency is Poisson, Var[S] = (mean frequency)(second moment of the severity). 30,000,000 = λ (500,000,000 + 50,0002 ). λ = 1/100. Prob(N≥1) = 1 - Prob(N = 0) = 1 - e−λ = 1 - e-.01 = 0.995%. 5.85. Mean Aggregate Loss = (350)(500) = 175,000. Since frequency is Poisson, Variance of Aggregate Loss = (350)(2nd moment of severity) = (350)(10002 /3) = 116.67 million. Prob (S > 180,000) ≅ 1 - Φ[(180,000 - 175,000)/ 116.67 million] = 1 - Φ[0.46] = 32.3%. Comment: The second moment of the uniform distribution (a , b) is: (b3 - a3 )/(3(b-a)).
5.86. The expected excess annual claims are: (0)(.04)(3000) + (5000)(.04)(4000) + (30000)(.04)(5000) + (45000)(.04)(2000) = 10.4 million. Therefore, the reinsurance cost is: (125%)(10.4 million) = 13 million. The expected retained annual claims are: (20000)(.04)(3000) + (30000)(.04)(4000) + (30000)(.04)(5000) + (30000)(.04)(2000) = 15.6 million. The variance of the retained annual claims are: (200002 )(.04)(.96)(3000) + (300002 )(.04)(.96)(4000) + (300002 )(.04)(.96)(5000) + (300002 )(.04)(.96)(2000) = 4.2624 x 1011. The total cost (retained claims plus reinsurance cost) of insuring the properties has mean 15.6 million + 13 million = 28.6 million and variance 4.2624 x 1011. Probability that the total cost exceeds $28,650,000
≅ 1 - Φ[(28.65 million - 28.6 million) / 4.2624 x 1011 ] = 1 - Φ[0.08] = 46.8%. Comment: The insurerʼs cost for reinsurance does not depend on the insurerʼs actual losses in a year; rather it is fixed and has a variance of zero. 5.87. B. S(500) = e-500/1000 = .6065. The frequency distribution of losses of size greater than 500 is also a Negative Binomial Distribution, but with β = (.6065)(2) = 1.2131 and r = 2. Therefore, the frequency of non-zero payments has mean: (2)(1.2131) = 2.4262 and variance: (2)(1.2131)(1+1.2131) = 5.369. When one truncates and shifts an Exponential Distribution, one gets the same distribution, due to the memoryless property of the Exponential. Therefore, the severity distribution of payments on losses of size greater than 500 is also an Exponential Distribution with θ = 1000. The aggregate losses excess of the deductible, which are the sum of the non-zero payments, have a variance of: (mean freq.)(var. of sev.) + (mean sev. 2)(var. of freq) = (2.4262)(10002 ) + (10002 )(5.269) = (7.796)(10002 ). Thus the standard deviation of total payments is: (1000) 7.796 = 2792. Comment: The mean of the aggregate losses excess of the deductible is: (2.4262)(1000) = 2426. 5.88. The expected aggregate losses are (500)(100) = 50,000. Thus the premium is: (1.1)(50,000) = 55,000. If the loss ratio exceeds .95 then the aggregate losses exceed (.95)(55,000) = 52250. The variance of the aggregate losses is: 500(100+1002 ) = 5,050,000. Thus the chance that the losses exceed 52,250 is about: 1 - Φ[(52,250 - 50,000) /
√5,050,000 ] = 1 - Φ(1.00) = 1 - 0.8413 = 0.159.
Comment: For a Compound Poisson Distribution, the variance of the aggregate losses = (mean frequency)(2nd moment of the severity).
5.89. Mean aggregate is: (50)(1870) = 93,500. Variance of aggregate is: (50)(6102 ) = 18,605,000. Prob[Agg > 100,000] ≅ 1 - Φ[(100,000 - 93,500) /
√18,605,000 ] = 1 - Φ[1.51] = 0.0655.
Comment: We know how many claims there were, and therefore the variance of frequency is 0. 5.90. C. The mean aggregate losses are: (8)(10,000) = 80,000. σagg2 = µfreq σsev2 + µsev2 σfreq2 = (8)(39372 ) + (10000)2 (32 ) = 1,023,999,752. The probability that the aggregate losses will exceed (1.5)(80,000) = 120,000 is approximately: 1 - Φ((120,000 - 80,000) /
√1,023,999,752 ) = 1 - Φ(1.25).
Comment: Short and easy. 1 - Φ(1.25) = 10.6%. 5.91. D. Policy Type one has a mean severity of 200 and a variance of the severity of (400-0)2 /12 = 13,333. Policy Type one has a mean frequency of 0.05 and a variance of the frequency of (0.05)(0.95) = 0.0475. Thus, a single policy of type one has a variance of aggregate losses of: (0.05)(13,333) + (2002 )(0.0475) = 2567. Policy Type two has a mean severity of 150 and a variance of the severity of (300-0)2 /12 = 7500. Policy Type two has a mean frequency of 0.06 and a variance of the frequency of (0.06)(0.94) = 0.0564. Thus, a single policy of type two has a variance of aggregate losses of: (0.06)(7500) + (1502 )(.0564) = 1719. Therefore, the variance of the aggregate losses of 100 independent policies of type one and 200 policies of type two is: (100)(2567) + (200)(1719) = 600,500. Comment: Frequency is Bernoulli. Severity is uniform.
5.92. E. Mean frequency = (0)(0.7) + (2)(0.2) + (3)(0.1) = .7. Second moment of the frequency = (02 )(0.7) + (22 )(0.2) + (32 )(0.1) = 1.7. Variance of the frequency = 1.7 - 0.72 = 1.21. Mean severity = (0)(0.8) + (10)(0.2) = 2. Second moment of the severity = (02 )(0.8) + (102 )(0.2) = 20. Variance of the severity = 20 - 22 = 16. Mean aggregate loss = (.7)(2) = 1.4. Variance of the aggregate losses = (.7)(16) + (22 )(1.21) = 16.04. Mean + 2 standard deviations = 1.4 + 2 16.04 = 9.41. The aggregate benefits are greater than 9.41 if and only if there is at least one non-zero claim. The probability of no non-zero claims is: .7 + (.2)(.82 ) + (.1)(.83 ) = 0.8792. Thus the probability of at least one non-zero claim is: 1 - 0.8792 = 0.1208. Comment: If one were to inappropriately use the Normal Approximation, the probability that aggregate benefits will exceed expected benefits by more than 2 standard deviations is: 1 - Φ(2) = 1 - 0.9772 = 0.023. The fact that the magic phrase "use the Normal Approximation" did not appear in this question, might make one think. One usually relies on the Normal Approximation when the expected number of claims is large. In this case one has very few expected claims. Therefore, one should not rush to use the Normal Approximation. 5.93. D. For the uniform distribution on [5, 95], E[X] = (5+95)/2 = 50, Var[X] = (95 - 5)2 /12 = 675. E[X2 ] = 675 + 502 = 3175. Therefore, the aggregate claims have mean of: (25)(50) = 1250 and variance of: (25)(3175) = 79,375. Thus, Prob(aggregate claims > 2000) ≅ 1 - Φ[(2000 - 1250) /
√79,375 ] = 1 - Φ(2.662).
5.94. E. Mean frequency is: (.8)(1) + (.2)(2) = 1.2. Variance of the frequency is: (.8)(12 ) + (.2)(22 ) - 1.22 = .16. Mean severity is: (.2)(0) + (.7)(100) + (.1)(1000) = 170. Second moment of the severity is: (.2)(02 ) + (.7)(1002 ) + (.1)(10002 ) = 107,000. Variance of the severity is: 107000 - 1702 = 78,100. Mean aggregate loss: (1.2)(170) = 204. Variance of aggregate loss is: (1.2)(78,100) + (1702 )(.16) = 98,344. mean plus the standard deviation = 204 +
√98,344 = 518.
Comment: The frequency is 1 plus a Bernoulli Distribution with q = .2. Therefore, it has mean: 1 + .2 = 1.2, and variance: (.2)(1 - .2) = .16. 5.95. A. mean aggregate loss = (50)(200) = 10,000. Variance of aggregate loss = (50)(400) + (2002 )(100) = 4,020,000. Prob[aggregate < 8000] ≅ Φ[(8000 - 10000) /
√4,020,000 ] = Φ[-1.00] = 15.9%.
5.96. A. Mean of aggregate = (110)(1101) = 121,110. Variance of aggregate = (110)(702 ) + (11012 )(750) = 909,689,750. Prob[aggregate < 100,000] ≅ Φ[(100000 - 121,110) /
√909,689,750 ] =
Φ[-0.70] = 1 - 0.7580 = 0.2420. 5.97. E. The average number of tires repaired per year is: $10,000,000/$100 = 100,000. There are 2,000,000 tires sold per year, so q = 100,000/2,000,000 = .05. µf = mq = (2000000)(.05) = 100,000. σf2 = mq(1-q) = (2000000)(.05)(.95) = 95,000. µs = 100. We are given that the variance of the aggregate is: 400002 = µfσs2 + µs2 σf2 . ⇒ 1,600,000,000 = 100,000σs2 + (1002 )(95,000). ⇒ σs2 = 6500. ⇒ σs = 80.62. 5.98. B. µf = 8. σf2 = 15. µs = 100. σs2 = 40,000. Variance of the aggregate: µfσs2 + µs2 σf2 = (8)(40,000) + (1002 )(15) = 470,000. ⇒ σ = 685.57. Now if we know that there have been 13 claims, then the aggregate is the sum of 13 independent, identically distributed severities. ⇒ Var[Aggregate] = 13 Var[Severity] = (13)(40,000) = 520,000. ⇒ σʼ =
520,000 = 721.11. σ/σʼ - 1 = 685.57/721.11 - 1 = -4.93%.
Alternately, if we know that there have been 13 claims, µf = 13, σf2 = 0, and the variance of the aggregate is: (13)(40,000) + (1002 )(0) = 520,000. Proceed as before. 5.99. C. Mean aggregate per computer: (3)(80) = 240. Variance of aggregate per computer: λ(2nd moment of severity) = (3)(2002 + 802 ) = 139,200. For N computers, mean is: 240N, and the variance is: 139,200N. 120% of the mean is: 288N. Prob[Aggregate > 120% of mean] ≅ 1 - Φ[(288N - 240N) /
139,200N ] = 1 - Φ[0.12865 N ].
This probability < 10%. ⇔ Φ[0.12865 N ] > 90%.. Φ[1.282] = 0.90. ⇒ We want 0.12865 N > 1.282. ⇒ N > 99.3. Alternately, for classical credibility, we might want a probability of 90% of being within ± 20%. However, here we are only interested in avoiding +20%, a one-sided rather than two-sided test. For 10% probability in one tail, for the Standard Normal Distribution, y = 1.282. n0 = (1.282/.2)2 = 41.088. Severity has a CV of: 200/80 = 2.5. Number of claims needed for full credibility of aggregate losses: (1 + 2.52 )(41.088) = 297.89. However, the number of computers corresponds to the number of exposures. Thus we need to divide by the mean frequency of 3: 297.89/3 = 99.3.
5.100. B. Mean severity is: (50%)(80) + (40%)(100) + (10%)(160) = 96. Second Moment of severity is: (50%)(802 ) + (40%)(1002 ) + (10%)(1602 ) = 9760. Mean Aggregate is: (1000)(96) = 96,000. Variance of Aggregate is: (1000)(9760) = 9,760,000. Prob[Club pays > 90,000] = Prob[Aggregate > 100,000] ≅ 1 - Φ[(100000 - 96,000) /
√9,760,000 ] = 1 - Φ[1.28] = 10.0%.
Comment: One could instead work with the payments, which are 90% of the losses. 5.101. B. Frequency has mean of 0.9, second moment of 1.9, and variance of 1.09. Severity reduced by the 1000 deductible has mean of: (50%)(0) + (10%)(1000) + (10%)(2000) + (30%)(4000) = 1500, second moment of: (50%)(02 ) + (10%)(10002 ) + (10%)(20002 ) + (30%)(40002 ) = 5.3 million, and variance of: 5.3 million - 15002 = 3.05 million. σA2 = (.9)(3.05 million) + (15002 )(1.09) = 5,197,500. σA = 2280. Alternately, for the original severity distribution: E[X] = (50%)(1000) + (10%)(2000) + (10%)(3000) + (30%)(5000) = 2500. E[X ∧ 1000] = (50%)(1000) + (10%)(1000) + (10%)(1000) + (30%)(1000) = 1000. E[X2 ] = (50%)(10002 ) + (10%)(20002 ) + (10%)(30002 ) + (30%)(50002 ) = 9.3 million. E[(X ∧ 1000)2 ] = (50%)(10002 ) + (10%)(10002 ) + (10%)(10002 ) + (30%)(10002 ) = 1 million. First moment of the layer from 1000 to ∞ is: E[X] - E[X ∧ 1000] = 2500 - 1000 = 1500. Second moment of the layer from 1000 to ∞ is: E[(X ∧ ∞)2 ] - E[(X ∧ 1000)2 ] - (2)(1000){E[X ∧ ∞] - E[X ∧ 1000]} = E[X2 ] - E[(X ∧ 1000)2 ] - (2000){E[X] - E[X ∧ 1000]} = 9.3 million - 1 million - (2000)(2500 - 1000) = 5.3 million. Proceed as before. 5.102. B. Frequency has mean rβ and variance rβ(1+β). Mean Aggregate = rβ700. ⇒ 48000 = rβ700. ⇒ rβ = 68.571. Variance of Aggregate = rβ1300 + 7002 rβ(1+β) = 491300rβ + 490000rβ2.
⇒ 80,000,000 = 491300rβ + 490000rβ2 = (491300)(68.571) + (490000)(68.571)β. ⇒ β = 1.378. ⇒ r = 49.76.
5.103. B. Mean of Aggregate: (50)(4500) = 225,000. Variance of Aggregate: (50)(30002 ) + (45002 )(122 ) = 3366 million. Set Mean of Lognormal equal to that of the aggregate: exp[µ + σ2/2] = 225,000. Set Second Moment of Lognormal equal to that of the aggregate: exp[2µ + 2σ2] = 225,0002 + 3366 million = 5399.1 million. Divide the second equation by the square of the first: exp[σ2] = 1.0665.
⇒ σ = .2537. ⇒ µ = 12.292. S((1.5)(225000)) = S(337500) = 1 - Φ[(ln(337500) - 12.292)/.2537] = 1 - Φ[1.72] = 4.27%. 5.104. A. After the deductible, the severity has a mean of: (0.35)(0) + (0.3)(250) + (0.25)(500) + (0.05)(750) + (0.05)(1000) = 287.5. After the deductible, the severity has a second moment of (0.35)(02 ) + (0.3)(2502 ) + (0.25)(5002 ) + (0.05)(7502 ) + (0.05)(10002 ) = 159,375. Average Aggregate: (0.15)(287.5) = 43.125. Variance of Aggregate: (0.15)(159,375) = 23,906. Prob[Aggregate > 250] ≅ 1 - Φ[(250 - 43.125)/ 23,906 ] = 1 - Φ[1.34] = 9.01%. 5.105. B. σA2 = µF σX2 + µX2 σF2 . 228742 = (103)(17812 ) + 62822 σF2 . ⇒ σF = 2.197. Comment: Sometimes the answer given by the exam committee, in this case 2.17, will not match the exact answer, 2.20 in this case. Very annoying! In this case, you might have checked your work once, but then put down B as the best choice and move on. 5.106. C. With a coinsurance factor of 80%, each payment is each loss times 0.8. When we multiply a variable by a constant, the mean and standard deviation are each multiplied by that constant. The 95th percentile of the normal approximation is: mean + (1.645)(standard deviation). Thus it is also multiplied by .8. The reduction is 20%. Alternately, before the reduction, µA = (25)(10,000) = 250,000, and σA2 = µF σX2 + µX2 σF2 = (25){(3)(10000)}2 + (10000)2 {(1.2)(25)}2 = 112,500 million. mean + (1.645)(standard deviation) = 250,000 + (1.645)(335,410) = 801,750. Paying 80% of each loss, multiplies the severity by .8. The mean of the lognormal is multiplied by .8 and its coefficient of variation is unaffected. µA = (25)(8000) = 200,000. σA2 = µF σX2 + µX2 σF2 = (25){(3)(8000)}2 + (8000)2 {(1.2)(25)}2 = 72,000 million. mean + (1.645)(standard deviation) = 200,000 + (1.645)(268,328) = 641,400. Reduction in the estimated 95th percentile: 1 - 641,400/801,750 = 20%.
5.107. A. For Type I, mean: (12)(1/2) = 6, variance: 12(12 )/3 = 4. For Type II, mean: (4)(2.5) = 10, variance: 4(52 )/3 = 33.33. Overall mean = (12)(1/2) + (4)(2.5) = 6 + 10 = 16. Overall variance = 4 + 33.33 = 37.33. Prob[aggregate > 18] ≅ 1 - Φ((18 - 16)/ 37.33 ) = 1 - Φ(0.33) = 1 - 0.6293 = 0.3707. Comment: For a Poisson frequency, variance of aggregate = λ(second moment of severity). The two types of claims are independent, so their variances add. 5.108. D. Let the mean frequencies be b before and c after. The probability of no claim increases by 30%. ⇒ 1.3e-b = e-c. The probability of having one claim decreases by 10%. ⇒ .9be-b = ce-c. Dividing: 0.9/1.3 = c/b. The expected aggregate before is: (10002 + 2562 )b = 1,065,536b. The expected aggregate after is: (1,5002 + 6782 )c = 2,709,684c. The ratio of “after” over “before” is: 2.543c/b = 2.543(0.9/1.3) = 1.760. ⇔ 76.0% increase. 5.109. E. Due to the memoryless property of the Exponential, the payments excess of a deductible follow the same Exponential Distribution as the ground up losses. Thus the second moment of (non-zero) payments is: 2(10,0002 ) = 200 million. The number of (non-zero) payments is Poisson with mean: 100e-25000/10000 = 8.2085. Therefore, variance of aggregate payments = (8.2085)(200 million) = 1641.7 million. Standard deviation of aggregate payments =
√(1641.7 million) = 40,518.
5.110. A. Var[A] = (2)(10002 + 20002 ) = 10 million. Var[B] = (1)(20002 + 40002 ) = 20 million. The variances of two independent portfolios add. Var[A] + Var[B] = 10 million + 20 million = 30 million. Standard deviation of the combined losses is:
√(30 million) = 5,477.
5.111. B. E[X² | X > 30] = ∫_30^∞ x² f(x) dx / S(30). ⇒ ∫_30^∞ x² f(x) dx = S(30) E[X² | X > 30] = (0.75)(9000) = 6750.
∫_30^∞ x f(x) dx = ∫_0^∞ x f(x) dx - ∫_0^30 x f(x) dx = E[X] - {E[X ∧ 30] - 30S(30)} = 70 - {25 - (30)(.75)} = 67.5.
∫_30^∞ f(x) dx = S(30) = 0.75.
With a deductible of 30 per loss, the second moment of the payment per loss is:
∫_30^∞ (x - 30)² f(x) dx = ∫_30^∞ x² f(x) dx - 60 ∫_30^∞ x f(x) dx + 900 ∫_30^∞ f(x) dx = 6750 - (60)(67.5) + (900)(0.75) = 3375.
Since frequency is Poisson, the variance of the aggregate payments is: λ(second moment of the payment per loss) = (20)(3375) = 67,500.
Alternately, e(30) = (E[X] - E[X ∧ 30])/S(30) = (70 - 25)/.75 = 60.
(X - 30)² = X² - 60X + 900 = X² - 60(X - 30) - 900.
⇒ E[(X - 30)² | X > 30] = E[X² - 60(X - 30) - 900 | X > 30] = E[X² | X > 30] - 60 E[X - 30 | X > 30] - E[900 | X > 30] = 9000 - 60 e(30) - 900 = 9000 - (60)(60) - 900 = 4500.
The number of losses of size greater than 30 is Poisson with mean: (.75)(20) = 15.
The variance of the aggregate payments is: (number of nonzero payments)(second moment of nonzero payments) = (15)(4500) = 67,500.
Comment: Difficult. In the original exam question, “number of losses, X,” should have read “number of losses, N.”
5.112. B. Mean = (100)(.3)(300) + (300)(.1)(1000) + (50)(.6)(5000) = 189,000.
Variance per power boat: (.3)(10,000) + (.3)(.7)(300²) = 21,900.
Variance per sailboat: (.1)(400,000) + (.1)(.9)(1000²) = 130,000.
Variance per luxury yacht: (.6)(2,000,000) + (.6)(.4)(5000²) = 7,200,000.
Total Variance: (100)(21,900) + (300)(130,000) + (50)(7,200,000) = 401,190,000.
Mean + standard deviation = 189,000 + √401,190,000 = 209,030.
Comment: Assume that the repair costs for one boat are independent of those of any other boat.
5.113. C. Mean = (3)(10) = 30. Variance = (3)(202 /12) + (102 )(3.6) = 460. Φ(1.645) = .95. The 95th percentile is approximately: 30 + 1.645
√460 = 65.3.
5.114. C. The primary distribution is Binomial with m = 10000 and q = 0.05, with mean: (10000)(.05) = 500, and variance: (10000)(.05)(1 - .05) = 475. The second distribution is LogNormal with mean: exp[1.039 + .8332 /2] = 3.9986, second moment: exp[2(1.039) + 2(.8332 )] = 32.0013, and variance: 32.0013 - 3.99862 = 16.012. The mean number of days is: (500)(3.9986) = 1999.3. The variance of the number of days is: (500)(16.012) + (3.99862 )(475) = 15601. The 90th percentile of the Standard Normal Distribution is 1.282. Thus the 90th percentile of the aggregate number of days is approximately: 1999.3 + 1.282 15,601 = 2159.43. This corresponds to losses of: ($100)(2159.43) = $215,943. Since this was for 10,000 policies, this corresponds to a premium per policy of $21.59. 5.115. D. The mean aggregate is: (10)(2000) = 20,000. Variance of aggregate = λ(second moment of severity) = (10)(2)(20002 ) = 80,000,000. Match the first and second moments of a LogNormal to that of the aggregate: exp[µ + σ2/2] = 20,000. exp[2µ + 2σ2] = 80,000,000 + 20,0002 = 480,000,000. Divide the second equation by the square of the first equation: exp[2µ + 2σ2]/exp[2µ + σ2] = exp[σ2] = 1.2.
⇒ σ = 0.427. ⇒ µ = 9.812. 105% of the expected annual loss is: (1.05)(20000) = 21,000. For the approximating LogNormal, S(21,000) = 1 - Φ[(ln(21000) - 9.812)/0.427] = 1 - Φ[.33] = 37.07%. Comment: We have fit a LogNormal to the aggregate losses via the method of moments.
5.116. C. The mixed severity has mean: (1/16)(5) + (15/16)(10) = 9.6875. The mixed severity has second moment: (1/16)(502 + 52 ) + (15/16)(202 + 102 ) = 626.56. Thus without the vaccine, for 100 lives the mean of the compound Poisson Process is: (100)(.16)(9.6875) = 155.0, and the variance is: (100)(.16)(626.56) = 10025. Φ[0.71] = 0.7611 ≅ 1 - 0.24. Therefore, we set the aggregate premium for 100 individuals as: 155.0 + (0.71) 10,025 = 226.1. With the vaccine, the cost for 100 individuals has mean: (100)(0.15) + (100)(0.16)(10)(15/16) = 165, and variance: (100)(.016)(15/16)(202 + 102 ) = 7500. Therefore, with the vaccine we set the aggregate premium for 100 individuals as: 165.0 + (0.71) 7500 = 226.5. A/B = 226.1/226.5 = 0.998. Alternately, one can thin the original process into two independent Poisson Processes, that for Disease 1 with λ = .16/16 = .01, and that for other diseases with λ = (.16)15/16 = .15. The first process has mean: (0.01)(5) = 0.05, and variance: (0.01)(502 + 52 ) = 25.25. The second process has mean: (0.15)(10) = 1.5, and variance: (0.15)(202 + 102 ) = 75. Without the vaccine, for 100 lives, the aggregate loss has mean: (100)(0.05 + 1.5) = 155, and variance: (100)(25.25 + 75) = 10025. With the vaccine, for 100 lives, the aggregate cost has mean: (100)(.15 + 1.5) = 165, and variance: (100)(0 + 75) = 7500. Proceed as before. Comment: The use of the vaccine increases the mean cost, but decreases the variance. This could result in either an increase or decrease in aggregate premiums, depending on the criterion used to set premiums, as well as the number of insured lives. For example, assume instead that the premiums for a group of 100 independent lives are set at a level such that the probability that aggregate losses for the group will exceed aggregate premiums for the group is 5%. Then A = 155.0 + (1.645) 10,025 = 319.7, B = 165.0 + (1.645) 7500 = 307.5, and A/B = 1.04. 5.117. C. The non-zero payments are Poisson with mean: (.6)(10) = 6. The size of the non-zero payments is uniform from 0 to 6, with mean 3 and variance: 62 /12 = 3. Variance of aggregate payments is: λ(second moment of severity) = (6)(3 + 32 ) = 72. Alternately, the payments per loss are a mixture of 40% zero and 60% uniform from 0 to 6. Therefore, the size of the payments per loss has second moment: (40%)(0) + (60%)(3 + 32 ) = 7.2. Variance of aggregate payments is: λ(second moment of severity) = (10)(7.2) = 72.
5.118. B. 2nd moment of the amount distribution is: (5%)(102 ) + (15%)(52 ) + (80%)(12 ) = 9.55. Variance of the compound Poisson Process is: (22)(9.55) = 210.1. 5.119. D. The Binomial frequency has mean: (1000)(.3) = 300, and variance: (1000)(.3)(.7) = 210. The Pareto severity has a mean of: 500/(3 -1) = 250, second moment: (2)(5002 )/{(3 -1)(3 -2)} = 250,000, and variance: 250,000 - 2502 = 187,500. Variance of Aggregate is: (300)(187,500) + (2502 )(210) = 69,375,000. Standard deviation of the aggregate losses is:
√69,375,000 = 8329.
5.120. A. E[X2 ] = θ2 Γ[1 + 2/γ] Γ[α - 2/γ]/Γ[α] = 22 Γ[1 + 2/1] Γ[3 - 2/1]/Γ[3] = (4)Γ[3]Γ[1]/Γ[3] = 4. Since frequency is Poisson, the variance of aggregate is: λ(second moment of severity) = (3)(4) = 12. Comment: A Burr Distribution with γ = 1, is a Pareto Distribution. E[X2 ] = 2θ2 / {(α-1)(α-2)} = 2(22 ) / {(3-1)(3-2)} = 4. 5.121. D. Negative Binomial has mean: rβ = 96, and variance: rβ(1 + β) = 672. Uniform severity has mean: 4, and variance 82 /12 = 16/3. Mean of Aggregate is: (96)(4) = 384. Variance of Aggregate is: (96)(16/3) + (42 )(672) = 11,264. Φ[1.645] = 95%. Premium is: 384 + 1.645 11,264 = 559. 5.122. A. Mean of the aggregate is: (100)(20,000) = 2,000,000. Variance of the aggregate is: (100)(50002 ) + (20,000)2 (252 ) = 2.525 × 1011. Prob[Aggregate > (1.5)(2,000,000)] ≅ 1 - Φ[1,000,000/
√(2.525 × 10^11) ] = 1 - Φ[2.0] = 2.3%.
Section 6, Individual Risk Model⁸⁶

In the individual risk model, the aggregate loss is the sum of the losses from different independent policies. Throughout this section we will assume that each policy has at most one claim per year. Thus frequency is Bernoulli for each policy, with the q parameters varying between policies.

Mean and Variance of the Aggregate:

Often, the claim size will be a fixed amount, bi, for policy i. In this case, the mean aggregate is the sum of the policy means: Σ qi bi. The variance of the aggregate is the sum of the policy variances: Σ (1 - qi) qi bi².⁸⁷

Exercise: There are 300 policies insuring 300 independent lives. For the first 100 policies, the probability of death this year is 2% and the death benefit is 5. For the second 100 policies, the probability of death this year is 4% and the death benefit is 20. For the third 100 policies, the probability of death this year is 6% and the death benefit is 10. Determine the mean and variance of Aggregate Losses.
[Solution: Mean = Σ ni qi bi = (100)(2%)(5) + (100)(4%)(20) + (100)(6%)(10) = 150.
Variance = Σ ni (1-qi) qi bi² = (100)(0.98)(0.02)(5²) + (100)(0.96)(0.04)(20²) + (100)(0.94)(0.06)(10²) = 2149.]

If the severity is not constant, assume the severity distribution for policy i has mean µi and variance σi². In that case, the mean aggregate is the sum of the policy means: Σ qi µi. The variance of the aggregate is the sum of the policy variances: Σ {qi σi² + (1-qi) qi µi²}.⁸⁸

For example, assume that for a particular policy the benefit for ordinary death is 5 and for accidental death is 10, and that 30% of deaths are accidental.⁸⁹ Then for this policy: µ = (70%)(5) + (30%)(10) = 6.5, and σ² = (70%)(5²) + (30%)(10²) - 6.5² = 5.25.
⁸⁶ See Sections 9.11.1 and 9.11.2 in Loss Models.
⁸⁷ Applying the usual formula for the variance of the aggregate, where the variance of a Bernoulli frequency is q(1-q) and for a given policy severity is fixed.
⁸⁸ Applying the usual formula for the variance of the aggregate, where a Bernoulli frequency has mean q and variance q(1-q).
⁸⁹ See Example 9.19 in Loss Models.
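If you prefer to see the individual risk model formulas in code, here is a minimal Python sketch (Python is not part of the syllabus) reproducing the exercise above for the 300 lives:

```python
# Sketch: individual risk model moments, assuming each policy has a Bernoulli(q)
# claim count and a fixed death benefit b.
groups = [
    # (number of lives, q, benefit)
    (100, 0.02,  5),
    (100, 0.04, 20),
    (100, 0.06, 10),
]

mean = sum(n * q * b for n, q, b in groups)                    # 150
variance = sum(n * q * (1 - q) * b**2 for n, q, b in groups)   # 2149
print(mean, variance)
```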
Parametric Approximation:

As discussed before, one could approximate the Aggregate Distribution by a Normal Distribution, a LogNormal Distribution, or some other parametric distribution.

Exercise: For the situation in the previous exercise, what is the probability that the aggregate loss will exceed 170? Use the Normal Approximation.
[Solution: Prob[A > 170] ≅ 1 - Φ[(170 - 150)/√2149] = 1 - Φ[0.43] = 33.36%.
Comment: We usually do not use the continuity correction when working with aggregate distributions. Here since everything is in units of 5, a more accurate approximation would be: 1 - Φ[(172.5 - 150)/√2149] = 1 - Φ[0.49] = 31.21%.]

Direct Calculation of the Aggregate Distribution:

When there are only 3 policies, it is not hard to calculate the aggregate distribution directly.

Policy   Probability of Death   Death Benefit
1        2%                     5
2        4%                     20
3        6%                     10

For the first policy, there is a 98% chance of an aggregate of 0 and a 2% chance of an aggregate of 5. For the second policy there is a 96% chance of 0 and a 4% chance of 20. Adding the first two policies, the combined aggregate has:⁹⁰ (98%)(96%) = 94.08% @ 0, (2%)(96%) = 1.92% @ 5, (98%)(4%) = 3.92% @ 20, and (2%)(4%) = 0.08% @ 25.

Exercise: Add in the third policy, in order to calculate the aggregate for all three policies.
[Solution: The third policy has a 94% chance of 0 and a 6% chance of 10. The combined aggregate has: (94.08%)(94%) = 88.4352% @ 0, (1.92%)(94%) = 1.8048% @ 5, (94.08%)(6%) = 5.6448% @ 10, (1.92%)(6%) = 0.1152% @ 15, (3.92%)(94%) = 3.6848% @ 20, (0.08%)(94%) = 0.0752% @ 25, (3.92%)(6%) = 0.2352% @ 30, (0.08%)(6%) = 0.0048% @ 35.]
⁹⁰ This is the same as convoluting the two aggregate distributions.
One can use this distribution to calculate the mean and variance of aggregate losses:

Aggregate   Probability   First Moment   Second Moment
0           88.4352%      0.0000         0.0000
5           1.8048%       0.0902         0.4512
10          5.6448%       0.5645         5.6448
15          0.1152%       0.0173         0.2592
20          3.6848%       0.7370         14.7392
25          0.0752%       0.0188         0.4700
30          0.2352%       0.0706         2.1168
35          0.0048%       0.0017         0.0588
Sum         1             1.5000         23.7400
The mean is 1.5 and the variance is: 23.74 - 1.5² = 21.49, matching the previous results.

Exercise: What is the probability that the aggregate will exceed 17?
[Solution: Prob[A > 17] = 3.6848% + 0.0752% + 0.2352% + 0.0048% = 4%.
Comment: A > 17 if and only if there is a claim from the second policy. The exact 4% differs significantly from the 0.04% obtained using the Normal Approximation! The Normal Approximation is poor when the total number of claims expected is only 0.12.]
Problems:

6.1 (2 points) An insurer provides life insurance for the following group of independent lives:
Number of Lives   Death Benefit   Probability of Death
2000              1               0.05
3000              5               0.04
4000              10              0.02
Using the Normal Approximation, what is the probability that the aggregate losses exceed 110% of their mean?
(A) 6.5%   (B) 7.0%   (C) 7.5%   (D) 8.0%   (E) 8.5%

Use the following information for the next two questions:
An insurer writes two classes of policies with the following distributions of losses per policy:
Class   Mean   Variance
1       10     20
2       15     40

6.2 (1 point) The insurer will write 10 independent policies, 5 of class one and 5 of class 2.
What is the variance of the aggregate losses?
A. 300   B. 320   C. 340   D. 360   E. 380

6.3 (2 points) The insurer will write 10 independent policies. The number of these policies that are class one is Binomial with m = 10 and q = 0.5.
What is the variance of the aggregate losses?
A. 300   B. 320   C. 340   D. 360   E. 380

6.4 (2 points) An insurer provides life insurance for the following group of 4 independent lives:
Life   Death Benefit   Probability of Death
A      10              0.03
B      25              0.06
C      50              0.01
D      100             0.02
What is the coefficient of variation of the aggregate losses?
A. 2.9   B. 3.1   C. 3.3   D. 3.5   E. 3.7
Use the following information for the next two questions:
An insurer provides life insurance for the following group of 400 independent lives:
Number of Lives   Death Benefit   Probability of Death
100               10              0.03
100               25              0.06
100               50              0.01
100               100             0.02

6.5 (2 points) What is the coefficient of variation of the aggregate losses?
A. 0.3   B. 0.4   C. 0.5   D. 0.6   E. 0.7

6.6 (2 points) Using the Normal Approximation, what is the probability that the aggregate losses are less than 300?
A. 20%   B. 21%   C. 22%   D. 23%   E. 24%
6.7 (2 points) An insurer provides life insurance for the following group of 4 independent lives:
Life   Death Benefit   Probability of Death
1      10              0.04
2      10              0.03
3      20              0.02
4      30              0.05
What is the probability that the aggregate losses are more than 40?
A. less than 0.09%
B. at least 0.09% but less than 0.10%
C. at least 0.10% but less than 0.11%
D. at least 0.11% but less than 0.12%
E. at least 0.12%
6.8 (Course 151 Sample Exam #1, Q.22) (2.5 points) An insurer provides one year term life insurance to a group. The benefit is 100 if death is due to accident and 10 otherwise.
The characteristics of the group are:
Gender   Number of Lives   Probability of Death (all causes)   Probability of Death (accidental causes)
Female   100               0.004                               0.0004
Male     200               0.006                               0.0012
The aggregate claims distribution is approximated using a compound Poisson distribution which equates the expected number of claims. The premium charged equals the mean plus 10% of the standard deviation of this compound Poisson distribution.
Determine the relative security loading, (premiums / expected losses) - 1.
(A) 0.10   (B) 0.13   (C) 0.16   (D) 0.19   (E) 0.22

6.9 (Course 151 Sample Exam #2, Q.20) (1.7 points) An insurer provides life insurance for the following group of independent lives:
Number of Lives   Death Benefit   Probability of Death
100               α               0.02
200               2α              0.03
Let S be the total claims. Let w be the variance of the compound Poisson distribution which approximates the distribution of S by equating the expected number of claims.
Determine the maximum value of α such that w ≤ 2500.
(A) 6.2   (B) 8.0   (C) 9.8   (D) 11.6   (E) 13.4
6.10 (Course 151 Sample Exam #2, Q.23) (2.5 points) An insurer has the following portfolio of policies:
Class   Benefit Amount   Number of Policies   Probability of a Claim
1       1                400                  0.02
2       10               100                  0.02
There is at most one claim per policy. The insurer reinsures the amount in excess of R (R > 1) per policy. The reinsurer has a reinsurance loading of 0.25. The insurer wants to minimize the probability, as determined by the normal approximation, that retained claims plus cost of reinsurance exceeds 34.
Determine R.
(A) 1.5   (B) 2.0   (C) 2.5   (D) 3.0   (E) 3.5
6.11 (Course 151 Sample Exam #3, Q.9) (1.7 points) An insurance company has a portfolio of two classes of insureds:
Class   Benefit   Probability of a Claim   Number of Insureds   Relative Security Loading
I       5         0.20                     N                    0.10
II      10        0.10                     2N                   0.05
The relative security loading is defined as: (premiums / expected losses) - 1.
Assume all claims are independent. The total of the premiums equals the 95th percentile of the normal distribution that approximates the distribution of total claims.
Determine N.
(A) 1488   (B) 1538   (C) 1588   (D) 1638   (E) 1688

6.12 (5A, 5/94, Q.37) (2 points) An insurance company has two classes of insureds as follows:
Class   Number of Insureds   Probability of 1 Claim   Claim Amount
1       200                  0.05                     2000
2       300                  0.01                     1500
There is at most one claim per insured and each insured has only one size of claim. The insurer wishes to collect an amount equal to the 95th percentile of the distribution of total claims, where each individual's share is to be proportional to the expected claim amount.
Calculate the relative security loading, (premiums / expected losses) - 1, using the Normal Approximation.

6.13 (5A, 11/97, Q.39) (2 points) A life insurance company issues 1-year term life contracts for benefit amounts of $100 and $200 to individuals with probabilities of death of .03 or .09. The following table gives the number of individuals in each of the four classes.
Class   Probability   Benefit   Number
1       .03           100       50
2       .03           200       40
3       .09           100       60
4       .09           200       50
The company wants to collect from this population an amount equal to the 95th percentile of the distribution of total claims, and it wants each individual's share of this amount to be proportional to the individual's expected claim.
Using the Normal Approximation, calculate the required relative security loading, (premiums / expected losses) - 1.
6.14 (Course 1 Sample Exam, Q.15) (1.9 points) An insurance company issues insurance contracts to two classes of independent lives, as shown below.
Class   Probability of Death   Benefit Amount   Number in Class
A       0.01                   200              500
B       0.05                   100              300
The company wants to collect an amount, in total, equal to the 95th percentile of the distribution of total claims. The company will collect an amount from each life insured that is proportional to that lifeʼs expected claim. That is, the amount for life j with expected claim E[Xj] would be kE[Xj].
Using the Normal Approximation, calculate k.
A. 1.30   B. 1.32   C. 1.34   D. 1.36   E. 1.38
Solutions to Problems:

6.1. C. Mean = Σ ni qi bi = (2000)(.05)(1) + (3000)(.04)(5) + (4000)(.02)(10) = 1500.
Variance = Σ ni (1-qi) qi bi² = (2000)(.95)(.05)(1²) + (3000)(.96)(.04)(5²) + (4000)(.98)(.02)(10²) = 10,815.
Prob[A > (1.1)(1500)] ≅ 1 - Φ[150/√10,815] = 1 - Φ[1.44] = 1 - 0.9251 = 7.49%.

6.2. A. (5)(20) + (5)(40) = 300.
Comment: Since we are given the variance of the losses per policy, there is no need to compute it from the mean frequency, variance of frequency, mean severity, and variance of severity.

6.3. D. Using analysis of variance, let n be the number of policies of class 1:
Var[A] = En[Var[A | n]] + Varn[E[A | n]] = En[(n)(20) + (10-n)(40)] + Varn[(n)(10) + (10-n)(15)]
= En[400 - 20n] + Varn[150 - 5n] = 400 - 20 En[n] + 25 Varn[n]
= 400 - (20)(10)(0.5) + (25)(10)(0.5)(1 - 0.5) = 362.5.
Comment: Similar to Exercise 9.72 in Loss Models.

6.4. E. Mean = Σ qi bi = (.03)(10) + (.06)(25) + (.01)(50) + (.02)(100) = 4.3.
Variance = Σ (1-qi) qi bi² = (.97)(.03)(10²) + (.94)(.06)(25²) + (.99)(.01)(50²) + (.98)(.02)(100²) = 258.91.
CV = √258.91 / 4.3 = 3.74.

6.5. B. Mean = Σ ni qi bi = 100{(.03)(10) + (.06)(25) + (.01)(50) + (.02)(100)} = 430.
Variance = Σ ni (1-qi) qi bi² = 100{(.97)(.03)(10²) + (.94)(.06)(25²) + (.99)(.01)(50²) + (.98)(.02)(100²)} = 25,891.
CV = √25,891 / 430 = 0.374.
Comment: With 100 policies of each type, the coefficient of variation is 1/10 of what it would have been with only one policy of each type as in the previous question.

6.6. B. From the previous solution, Mean = 430 and Variance = 25,891.
Prob[Aggregate < 300] ≅ Φ[(300 - 430)/√25,891] = Φ[-0.81] = 1 - 0.7910 = 20.9%.

6.7. C. Prob[A > 40] = Prob[A = 50] + Prob[A = 60] + Prob[A = 70] =
Prob[lives 3 and 4 die] + Prob[lives 1, 2, and 4 die] + Prob[lives 1, 3, and 4 die] + Prob[lives 2, 3, and 4 die] + Prob[lives 1, 2, 3, and 4 die] =
(.96)(.97)(.02)(.05) + (.04)(.03)(.98)(.05) + (.04)(.97)(.02)(.05) + (.96)(.03)(.02)(.05) + (.03)(.04)(.02)(.05) = 0.00106.
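As a check on the enumeration in solution 6.7, here is a short brute-force Python sketch (illustrative only); it runs through all 16 death/survival outcomes for the four lives and adds up the probabilities of those with aggregate losses above 40.

    from itertools import product

    lives = [(10, 0.04), (10, 0.03), (20, 0.02), (30, 0.05)]  # (death benefit, prob. of death)
    prob_over_40 = 0.0
    for outcome in product([0, 1], repeat=len(lives)):        # 1 indicates a death
        prob, total = 1.0, 0
        for died, (benefit, q) in zip(outcome, lives):
            prob *= q if died else (1 - q)
            total += benefit if died else 0
        if total > 40:
            prob_over_40 += prob
    print(prob_over_40)  # 0.0010588..., i.e. about 0.106%, answer C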
6.8. B. The expected number of fatal accidents is: (100)(.0004) + (200)(.0012) = .28.
The expected number of deaths (all causes) is: (100)(.004) + (200)(.006) = 1.6.
So the expected number of deaths from other than accidents is: 1.6 - .28 = 1.32.
Therefore, the mean severity is: {(1.32)(10) + (.28)(100)}/1.6 = 25.75.
The second moment of the severity is: {(1.32)(10²) + (.28)(100²)}/1.6 = 1832.5.
The mean aggregate loss is: (1.6)(25.75) = 41.2.
If this were a compound Poisson Distribution, then the variance would be:
(mean frequency)(second moment of the severity) = (1.6)(1832.5) = 2932.
The standard deviation is: √2932 = 54.14.
Thus the premium is: 41.2 + (10%)(54.14) = 46.614.
The relative security loading is: premium/(expected loss) - 1 = 46.614/41.2 - 1 = 13.1%.

6.9. C. Expected number of claims = (100)(.02) + (200)(.03) = 2 + 6 = 8.
Therefore, 2/8 of the claims are expected to have death benefit α, while 6/8 of the claims are expected to have death benefit of 2α.
The second moment of the severity is: (2/8)α² + (6/8)(2α)² = 13α²/4.
Thus the variance of aggregate losses is: (8)(13α²/4) = 26α².
Setting 2500 = 26α², α = 9.8.
6.10. B. Expected number of claims = (400)(.02) + (100)(.02) = 8 + 2 = 10.
For a retention of 10 > R > 1, the expected losses retained are: (8)(1) + (2)(R) = 8 + 2R.
Variance of retained losses = (1²)(400)(.02)(.98) + (R²)(100)(.02)(.98) = 7.84 + 1.96R².
The expected ceded losses are 2(10 - R) = 20 - 2R. Thus the cost of reinsurance is: 1.25(20 - 2R) = 25 - 2.5R.
Thus expected retained losses plus reinsurance costs are: 8 + 2R + 25 - 2.5R = 33 - .5R.
The variance of the retained losses plus reinsurance costs is that of the retained losses.
Therefore, the probability that the retained losses plus reinsurance costs exceed 34 is approximately: 1 - Φ[(34 - (33 - .5R))/√(7.84 + 1.96R²)].
This probability is minimized by maximizing (34 - (33 - .5R))/√(7.84 + 1.96R²) = (1 + .5R)/√(7.84 + 1.96R²).
Setting the derivative with respect to R equal to zero:
0 = {(.5)√(7.84 + 1.96R²) - (1 + .5R)(1/2)(2)(1.96R)/√(7.84 + 1.96R²)} / (7.84 + 1.96R²).
Therefore, (.5)(7.84 + 1.96R²) = (1 + .5R)(1.96R). ⇒ 3.92 + .98R² = 1.96R + .98R². ⇒ R = 2.
Comment: [Graph omitted: (1 + .5R)/√(7.84 + 1.96R²), plotted for 1 < R < 10.]
[Graph omitted: the approximate probability that retained losses plus reinsurance costs exceed 34, 1 - Φ[(34 - (33 - .5R))/√(7.84 + 1.96R²)], plotted for 1 < R < 10.]
This probability is minimized for R = 2. However, this probability is insensitive to R, so in this case this may not be a very practical criterion for selecting the “best” R.
6.11. A. The 95th percentile of the Normal Distribution implies that the premium = expected aggregate loss + 1.645 (standard deviations).
The expected aggregate loss is: (5)(.2)N + (10)(.1)(2N) = 3N.
The variance of aggregate losses is: (5²)(.2)(.8)N + (10²)(.1)(.9)(2N) = 22N.
The premiums = (1.1)(5)(.2)N + (1.05)(10)(.1)(2N) = 3.2N.
Setting the premiums equal to expected aggregate loss + 1.645 (standard deviations):
3.2N = 3N + (1.645)√(22N). Solving, N = 22(1.645/.2)² = 1488.

6.12. The mean loss is: (.05)(2000)(200) + (.01)(1500)(300) = 24,500.
The variance of aggregate losses is: (.05)(.95)(2000²)(200) + (.01)(.99)(1500²)(300) = 44,682,500.
The 95th percentile of aggregate losses is approximately: 24,500 + (1.645)√44,682,500 = 24,500 + 10,996.
The relative security loading is: 10,996/24,500 = 45%.

6.13. The mean loss is: (.03)(100)(50) + (.03)(200)(40) + (.09)(100)(60) + (.09)(200)(50) = 1830.
The variance of aggregate losses is: (.03)(.97)(100²)(50) + (.03)(.97)(200²)(40) + (.09)(.91)(100²)(60) + (.09)(.91)(200²)(50) = 274,050.
The 95th percentile of aggregate losses is approximately: 1830 + (1.645)√274,050 = 2691.
The security loading is: 2691 - 1830 = 861. The relative security loading is: 861/1830 = 47%.

6.14. E. The mean aggregate is: (.01)(500)(200) + (.05)(300)(100) = 2,500.
The variance of the aggregate is: (.01)(.99)(500)(200²) + (.05)(.95)(300)(100²) = 340,500.
Using the Normal Approximation, the 95th percentile of the aggregate is: 2500 + 1.645√340,500 = 3460.
k = 3460/2500 = 1.384.
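The Normal-approximation percentile calculations in solutions 6.11 through 6.14 are easy to reproduce by computer; here is a minimal Python sketch (illustrative only) for solution 6.14.

    mean = 0.01 * 500 * 200 + 0.05 * 300 * 100                          # 2500
    variance = 0.01 * 0.99 * 500 * 200**2 + 0.05 * 0.95 * 300 * 100**2  # 340,500
    k = (mean + 1.645 * variance**0.5) / mean                           # 95th percentile / mean
    print(round(k, 3))  # 1.384, answer E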
Section 7, Recursive Method / Panjer Algorithm

As discussed previously, the same mathematics apply to aggregate distributions (independent frequency and severity) and compound frequency distributions. While one could calculate the density function by brute force, tools have been developed to make it easier to work with aggregate distributions when the primary distribution has certain forms. The Panjer Algorithm, referred to in Loss Models as the “recursive method”, is one such technique.91
The first step is to calculate the density at zero of the aggregate or compound distribution.

Density of Aggregate Distribution at Zero:

For an aggregate distribution one can calculate the probability of zero claims as follows from first principles. For example, assume frequency follows a Poisson Distribution, with λ = 1.3, and severity has a 60% chance of being zero.

Exercise: What is the density at 0 of the aggregate distribution?
[Solution: There are a number of ways one can have zero aggregate. One can either have zero claims or one can have n claims, each with zero severity. Assuming there were n claims, the chance of each of them having zero severity is 0.6ⁿ. The chance of having zero claims is the density of the Poisson distribution at 0, e-1.3. Thus the chance of zero aggregate is:
e-1.3 + (1.3)e-1.3 (0.6) + (1.3²/2!)e-1.3 (0.6²) + (1.3³/3!)e-1.3 (0.6³) + (1.3⁴/4!)e-1.3 (0.6⁴) + ...
= e-1.3 Σ {(1.3)(.6)}ⁿ/n! = e-1.3 exp((1.3)(.6)) = exp(-(1.3)(1 - .6)) = e-0.52 = 0.5945, where the sum runs over n ≥ 0.
Comment: I have used the fact that Σ xⁿ/n! = exp(x), summing over n ≥ 0.]
In this exercise, we have computed that there is a 59.45% chance that there is zero aggregate. Instead one can use the following formula, the first step of the Panjer algorithm:
c(0) = Pp(s(0)),
where c is the compound or aggregate density, s is the density of the severity or secondary distribution, and Pp is the probability generating function of the frequency or primary distribution.

91 Loss Models points out that the number of computations increases as n², O(n²), rather than n³, O(n³), as for direct calculation using convolutions.
Exercise: Apply this formula to determine the density at zero of the aggregate distribution.
[Solution: For the Poisson, P(z) = exp[λ(z - 1)] = exp[1.3(z - 1)].
c(0) = Pp(s(0)) = Pp(.6) = e1.3(.6-1) = e-0.52 = 0.5945205.]

Derivation of the Formula for the Density of Aggregate Distribution at Zero:

Let the frequency or primary distribution be p, the severity or secondary distribution be s, and let c be the aggregate or compound distribution. The probability of zero aggregate is:
c(0) = p(0) + p(1)s(0) + p(2)s(0)² + p(3)s(0)³ + ... = Σ p(n) s(0)ⁿ, summing over n ≥ 0.
We note that by the definition of the Probability Generating Function, the righthand side of the above equation is the Probability Generating Function of the primary distribution at s(0).92 Therefore, the density of the compound distribution at zero is:93
c(0) = Pp(s(0)) = P.G.F. of primary distribution at (density of secondary distribution at zero).

Formulas for the Panjer Algorithm (recursive method):

Let the frequency or primary distribution be p, the severity or secondary distribution be s, and c be the aggregate or compound distribution. If the primary distribution p is a member of the (a, b, 0) class,94 then one can use the Panjer Algorithm (recursive method) in order to iteratively compute the compound density:95
c(x) = {1/(1 - a s(0))} Σj=1 to x (a + jb/x) s(j) c(x - j).
c(0) = Pp(s(0)).

92 P(z) = E[zⁿ] = Σ p(n) zⁿ.
93 See Theorem 6.14 in Loss Models. This is the source of the values given in Table D.1 in Appendix D of Loss Models.
94 f(x+1)/f(x) = a + b/(x+1), which holds for the Binomial, Poisson, and Negative Binomial Distributions.
95 Formula 9.22 in Loss Models. Note that if the primary distribution is a member of the (a, b, 1) class, then as discussed in the next section, there is a modification of this algorithm which applies.
Aggregate Distribution Example:

In order to apply the Panjer Algorithm one must have a discrete severity distribution. Thus either the original severity distribution must be discrete or one must approximate a continuous severity distribution with a discrete one.96 We will assume for simplicity that the discrete distribution has support on the nonnegative integers. If not, we can just change units to make it more convenient.

Exercise: Assume the only possible sizes of loss are 0, $1000, $2000, $3000, etc. How could one change the scale so that the support is the nonnegative integers?
[Solution: One puts everything in units of thousands of dollars instead of dollars. If f(2000) is the original density at 2000, then it is equal to s(2), the new density at 2. The new densities still sum to unity and the aggregate distribution will be in units of $1000.]

Let the frequency distribution be p,97 the discrete severity distribution be s,98 and let c be the aggregate loss distribution.99

Exercise: Let severity have density: s(0) = 60%, s(1) = 10%, s(2) = 25%, and s(3) = 5%. Frequency is Poisson with λ = 1.3. Use the Panjer Algorithm to calculate the density at 3 of the aggregate distribution.
[Solution: For the Poisson a = 0 and b = λ = 1.3. c(0) = Pp(s(0)) = Pp(.6) = e1.3(.6-1) = 0.5945205.
c(x) = {1/(1 - a s(0))} Σj=1 to x (a + jb/x) s(j) c(x - j) = (1.3/x) Σj=1 to x j s(j) c(x - j).
c(1) = (1.3/1)(1) s(1) c(0) = (1.3/1){(1)(0.1)(0.5945205)} = 0.077288.
c(2) = (1.3/2){(1)s(1)c(1) + (2)s(2)c(0)} = (1.3/2){(1)(0.1)(0.077288) + (2)(0.25)(0.5945205)} = 0.19824.
c(3) = (1.3/3){(1)(0.1)(0.19824) + (2)(0.25)(0.077288) + (3)(0.05)(0.5945205)} = 0.06398.]

By continuing iteratively in this manner, one could calculate the density for any value.100 The Panjer algorithm reduces the amount of work needed a great deal while providing exact results, provided one retains enough significant digits in the intermediate calculations.

96 There are a number of ways of performing such an approximation, as discussed in a subsequent section.
97 In general, p is the primary distribution, which for this application of the Panjer algorithm is the frequency distribution.
98 In general, s is the secondary distribution, which for this application of the Panjer algorithm is the discrete severity distribution.
99 In general, c is the compound distribution, which for this application of the Panjer algorithm is the aggregate losses.
100 In this case, the aggregate distribution out to 10 is: 0.594521, 0.0772877, 0.198243, 0.06398, 0.0380616, 0.0170385, 0.00657186, 0.00276448, 0.000994199, 0.000356408, 0.000123164. The chance of the aggregate losses being greater than 10 is: 0.0000587055.
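The recursion is straightforward to program. Below is a minimal Python sketch (illustrative only; the function name and signature are my own, not from Loss Models) that reproduces the densities listed in footnote 100.

    import math

    def panjer_densities(a, b, c0, s, n_max):
        """Panjer recursion for an (a, b, 0) primary; c0 = P_p(s(0)) is supplied by the caller."""
        c = [c0]
        for x in range(1, n_max + 1):
            total = sum((a + j * b / x) * s.get(j, 0.0) * c[x - j] for j in range(1, x + 1))
            c.append(total / (1.0 - a * s.get(0, 0.0)))
        return c

    lam = 1.3
    severity = {0: 0.60, 1: 0.10, 2: 0.25, 3: 0.05}
    c0 = math.exp(lam * (severity[0] - 1))            # P_p(s(0)) for a Poisson primary
    print(panjer_densities(0.0, lam, c0, severity, 10)[:4])
    # [0.5945..., 0.0773..., 0.1982..., 0.0640...]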
If the severity density is zero at 0, then c(0) = Pp(s(0)) = Pp(0). Now
Pp(0) = limz→0 Pp(z) = limz→0 E[zN] = limz→0 {p(0) + Σ p(n)zⁿ} = p(0), where the sum runs over n ≥ 1.
Therefore, if s(0) = 0, the probability of zero aggregate losses = c(0) = p(0) = the probability of no claims. If s(0) > 0, then there is an additional probability of zero aggregate losses, due to the contribution of situations with claims of size zero.

Thinning the Frequency or Primary Distribution:

The Panjer Algorithm directly handles situations in which there is a positive chance of a zero severity. In contrast, if one tried to apply convolution to such a situation, one would need to calculate a lot of convolutions, since one can get zero aggregate even if one has many claims. One can get around this difficulty by thinning the frequency or primary distribution.101
As in the previous example, let frequency be Poisson with λ = 1.3, and the severity have density: s(0) = 60%, s(1) = 10%, s(2) = 25%, and s(3) = 5%.

Exercise: What is the distribution of the number of claims with non-zero severity?
[Solution: Poisson with λ = (1.3)(40%) = 0.52.]

Exercise: If the severity distribution is truncated to remove the zeros, what is the resulting distribution?
[Solution: s(1) = 10%/40% = 25%, s(2) = 25%/40% = 62.5%, and s(3) = 5%/40% = 12.5%.]

Only the claims with non-zero severity contribute to the aggregate. Therefore, we can compute the aggregate distribution by using the thinned frequency, Poisson with λ = (1.3)(40%) = 0.52, and the severity distribution truncated to remove the zero claims.

Exercise: Use convolutions to calculate the aggregate distribution up to 3.
[Solution: p(3) = e-0.52 0.52³/6 = 0.01393. (s*s)[3] = (2)(0.25)(0.625) = 0.3125.
(0.30915)(0.12500) + (0.08038)(0.31250) + (0.01393)(0.01562) = 0.06398.

n    Poisson
0    0.59452
1    0.30915
2    0.08038
3    0.01393

x    s*0       s         s*s       s*s*s     Aggregate Density
0    1                                       0.594521
1              0.25000                       0.077288
2              0.62500   0.06250             0.198243
3              0.12500   0.31250   0.01562   0.063980

Comment: Matching the result obtained previously using the Panjer Algorithm.]
Thinning is discussed in “Mahlerʼs Guide to Frequency Distributions.”
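The thinning argument is also easy to verify numerically. Here is a short Python sketch (illustrative only) that drops the zero-severity claims, thins the Poisson to λ = (1.3)(40%) = 0.52, and recovers the same aggregate densities by convolution.

    import math

    lam = 1.3 * 0.4                              # frequency of claims with non-zero severity
    sev = {1: 0.25, 2: 0.625, 3: 0.125}          # severity truncated to remove the zeros

    def aggregate_density(x, n_terms=30):
        conv = {0: 1.0}                          # 0-fold convolution of the truncated severity
        total = math.exp(-lam) * conv.get(x, 0.0)
        for n in range(1, n_terms):
            new = {}
            for amt, prob in conv.items():
                for s_amt, s_prob in sev.items():
                    new[amt + s_amt] = new.get(amt + s_amt, 0.0) + prob * s_prob
            conv = new                           # now the n-fold convolution
            total += math.exp(-lam) * lam**n / math.factorial(n) * conv.get(x, 0.0)
        return total

    print([round(aggregate_density(x), 6) for x in range(4)])
    # [0.594521, 0.077288, 0.198243, 0.06398] -- matching the Panjer Algorithm results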
One can apply this same thinning technique when the frequency is any member of the (a, b, 0) class. If the original frequency is Binomial, then the non-zero claims are also Binomial with parameters m and {1 - s(0)}q. If the original frequency is Negative Binomial, then the non-zero claims are also Negative Binomial with parameters r and {1 - s(0)}β.

Compound Distribution Example:

The mathematics are the same in order to apply the Panjer Algorithm to the compound case. For example, assume the number of taxicabs that arrive per minute at the Heartbreak Hotel is Poisson with mean 1.3. In addition, assume that the number of passengers dropped off at the hotel by each taxicab is Binomial with q = 0.4 and m = 5. The number of passengers dropped off by each taxicab is independent of the number of taxicabs that arrive and is independent of the number of passengers dropped off by any other taxicab. Then the aggregate number of passengers dropped off per minute at the Heartbreak Hotel is a compound Poisson-Binomial distribution, with parameters: λ = 1.3, q = 0.4, m = 5.

Exercise: Use the Panjer Algorithm to calculate the density at 3 for this example.
[Solution: The densities of the secondary Binomial Distribution are:
j    s(j)
0    0.07776
1    0.2592
2    0.3456
3    0.2304
4    0.0768
5    0.01024
For the primary Poisson a = 0, b = λ = 1.3, and P(z) = exp[λ(z - 1)] = exp[1.3(z - 1)].
c(0) = Pp(s(0)) = Pp(0.07776) = e1.3(0.07776-1) = 0.301522.
c(x) = {1/(1 - a s(0))} Σj=1 to x (a + jb/x) s(j) c(x - j) = (1.3/x) Σj=1 to x j s(j) c(x - j).
c(1) = (1.3/1)(1) s(1) c(0) = (1.3/1){(1)(0.2592)(0.301522)} = 0.101601.
c(2) = (1.3/2){(1)(0.2592)(0.101601) + (2)(0.3456)(0.301522)} = 0.152586.
c(3) = (1.3/3){(1)(0.2592)(0.152586) + (2)(0.3456)(0.101601) + (3)(0.2304)(0.301522)} = 0.137882.]

By continuing iteratively in this manner, one could calculate the density for any value. The Panjer algorithm reduces the amount of work needed a great deal while providing exact results, provided one retains enough significant digits in the intermediate calculations.102

102 Here are the densities for the compound Poisson-Binomial distribution, with parameters λ = 1.3, q = 0.4, m = 5, calculated using the Panjer Algorithm, from 0 to 20: 0.301522, 0.101601, 0.152586, 0.137882, 0.0988196, 0.070989, 0.0507183, 0.0335563, 0.0211638, 0.0130872, 0.0078559, 0.00456369, 0.00258682, 0.00143589, 0.000779816, 0.000414857, 0.000216723, 0.000111302, 0.0000562232, 0.0000279619, 0.0000137058.
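Exactly the same recursion handles the compound case; here is a short Python sketch (illustrative only) for the Poisson-Binomial example above, which reproduces the densities listed in footnote 102.

    import math

    lam, m, q = 1.3, 5, 0.4
    s = {k: math.comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)}  # secondary Binomial
    c = [math.exp(lam * (s[0] - 1))]             # c(0) = P_p(s(0)) for a Poisson primary
    for x in range(1, 21):
        c.append((lam / x) * sum(j * s.get(j, 0.0) * c[x - j] for j in range(1, x + 1)))
    print([round(v, 6) for v in c[:4]])  # [0.301522, 0.101601, 0.152586, 0.137882]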
Preliminaries to the Proof of the Panjer Algorithm:

In order to prove the Panjer Algorithm / Recursive Method, we will use two results for convolutions.

First, s*n(x) = Prob(sum of n losses is x) = Σi=0 to ∞ Prob(first loss is i) Prob(sum of n-1 losses is x - i) = Σi=0 to ∞ s(i) s*(n-1)(x - i).
In other words, s*n = s * s*(n-1). In this case, since the severity density is assumed to have support equal to the nonnegative integers, s*(n-1)(x - i) is 0 for i > x, so the terms for i > x drop out of the summation:
s*n(x) = Σi=0 to x s(i) s*(n-1)(x - i).

Second, assume we have n independent, identically distributed losses, each with distribution s. Assume we know their sum is x > 0; then by symmetry the conditional expected value of any of these losses is x/n.
x/n = E[L1 | L1 + L2 + ... + Ln = x] = Σi=0 to ∞ i Prob[L1 = i | L1 + L2 + ... + Ln = x]
= Σi=0 to ∞ i Prob[L1 = i and L1 + L2 + ... + Ln = x] / Prob[L1 + L2 + ... + Ln = x]
= Σi=0 to ∞ i Prob[L1 = i and L2 + ... + Ln = x - i] / s*n(x)
= {1/s*n(x)} Σi=0 to ∞ i s(i) s*(n-1)(x - i) = {1/s*n(x)} Σi=1 to x i s(i) s*(n-1)(x - i).103
Therefore, s*n(x) = (n/x) Σi=1 to x i s(i) s*(n-1)(x - i).

103 Note that the term for i = 0 drops out. Since for use in the Panjer Algorithm the severity density is assumed to have support equal to the nonnegative integers, s*(n-1)(x - i) is 0 for i > x, so the terms for i > x drop out of the summation.

Exercise: Verify the above relationship for n = 2 and x = 5.
[Solution: s*2(5) = probability two losses sum to 5 = s(0)s(5) + s(1)s(4) + s(2)s(3) + s(3)s(2) + s(4)s(1) + s(5)s(0) = 2{s(0)s(5) + s(1)s(4) + s(2)s(3)}.
(n/x) Σi=1 to x i s(i) s*(n-1)(x - i) = (2/5){s(1)s(4) + 2s(2)s(3) + 3s(3)s(2) + 4s(4)s(1) + 5s(5)s(0)} = 2{s(0)s(5) + s(1)s(4) + s(2)s(3)}.]

Proof of the Panjer Algorithm:

Recall that for a member of the (a, b, 0) class of frequency distributions: f(n+1)/f(n) = a + {b/(n+1)}. Thus, f(n) = f(n-1){a + b/n}, for n > 0.
For the compound distribution to take on a value x > 0, there must be one or more losses. As discussed previously, one can write the compound distribution in terms of convolutions, and then substitute f(n) = f(n-1){a + b/n}:
c(x) = Σn=1 to ∞ f(n) s*n(x) = a Σn=1 to ∞ f(n-1) s*n(x) + b Σn=1 to ∞ f(n-1) s*n(x)/n
= a Σn=1 to ∞ f(n-1) Σi=0 to x s(i) s*(n-1)(x - i) + b Σn=1 to ∞ f(n-1) (n/x) Σi=1 to x i s(i) s*(n-1)(x - i) / n
= a Σi=0 to x s(i) Σn=1 to ∞ f(n-1) s*(n-1)(x - i) + (b/x) Σi=1 to x i s(i) Σn=1 to ∞ f(n-1) s*(n-1)(x - i)
= a Σi=0 to x s(i) c(x - i) + (b/x) Σi=1 to x i s(i) c(x - i)
= a s(0) c(x) + Σi=1 to x (a + bi/x) s(i) c(x - i).
Taking the first term to the lefthand side of the equation and solving for c(x):
c(x) = {1/(1 - a s(0))} Σi=1 to x (a + ib/x) s(i) c(x - i).
Problems: Use the following information for the next 11 questions:
• •
One has a compound Geometric distribution with β = 2.1. The discrete severity distribution is as follows: 0 25% 1 35% 2 20% 3 15% 4 5%
7.1 (1 point) What is the mean aggregate loss? A. less than 3.0 B. at least 3.0 but less than 3.5 C. at least 3.5 but less than 4.0 D. at least 4.0 but less than 4.5 E. at least 4.5 7.2 (2 points) What is the variance of the aggregate losses? A. less than 13 B. at least 13 but less than 14 C. at least 14 but less than 15 D. at least 15 but less than 16 E. at least 16 7.3 (1 point) What is the probability that the aggregate losses are zero? A. less than .37 B. at least .37 but less than .38 C. at least .38 but less than .39 D. at least .39 but less than .40 E. at least .40 7.4 (2 points) What is the probability that the aggregate losses are one? A. less than 12% B. at least 12% but less than 13% C. at least 13% but less than 14% D. at least 14% but less than 15% E. at least 15%
7.5 (2 points) What is the probability that the aggregate losses are two? A. less than 8% B. at least 8% but less than 9% C. at least 9% but less than 10% D. at least 10% but less than 11% E. at least 11% 7.6 (2 points) What is the probability that the aggregate losses are three? A. less than 9.2% B. at least 9.2% but less than 9.3% C. at least 9.3% but less than 9.4% D. at least 9.4% but less than 9.5% E. at least 9.5% 7.7 (2 points) What is the probability that the aggregate losses are four? A. less than 7.2% B. at least 7.2% but less than 7.3% C. at least 7.3% but less than 7.4% D. at least 7.4% but less than 7.5% E. at least 7.5% 7.8 (2 points) What is the probability that the aggregate losses are five? A. less than 4.7% B. at least 4.7% but less than 4.8% C. at least 4.8% but less than 4.9% D. at least 4.9% but less than 5.0% E. at least 5.0% 7.9 (2 points) What is the probability that the aggregate losses are greater than 5? Use the Normal Approximation. A. less than 10% B. at least 10% but less than 15% C. at least 15% but less than 20% D. at least 20% but less than 25% E. at least 25%
7.10 (2 points) Approximate the distribution of aggregate losses via a LogNormal Distribution, and estimate the probability that the aggregate losses are greater than 5. A. less than 10% B. at least 10% but less than 15% C. at least 15% but less than 20% D. at least 20% but less than 25% E. at least 25% 7.11 (2 points) What is the 70th percentile of the distribution of aggregate losses? A. 2 B. 3 C. 4 D. 5 E. 6 Use the following information for the next 6 questions:
• •
Frequency follows a Binomial Distribution with m = 10 and q = 0.3.
•
Frequency and Severity are independent.
The discrete severity distribution is as follows: 0 20% 1 50% 2 20% 3 10%
7.12 (1 point) What is the probability that the aggregate losses are zero? A. less than 4% B. at least 4% but less than 5% C. at least 5% but less than 6% D. at least 6% but less than 7% E. at least 7% 7.13 (2 points) What is the probability that the aggregate losses are one? A. less than 13% B. at least 13% but less than 14% C. at least 14% but less than 15% D. at least 15% but less than 16% E. at least 16% 7.14 (2 points) What is the probability that the aggregate losses are two? A. less than 16% B. at least 16% but less than 17% C. at least 17% but less than 18% D. at least 18% but less than 19% E. at least 19%
7.15 (2 points) What is the probability that the aggregate losses are three? A. less than 17% B. at least 17% but less than 18% C. at least 18% but less than 19% D. at least 19% but less than 20% E. at least 20% 7.16 (2 points) What is the probability that the aggregate losses are four? A. less than 14% B. at least 14% but less than 15% C. at least 15% but less than 16% D. at least 16% but less than 17% E. at least 17% 7.17 (2 points) What is the probability that the aggregate losses are five? A. less than 11% B. at least 11% but less than 12% C. at least 12% but less than 13% D. at least 13% but less than 14% E. at least 14%
Use the following information for the next 2 questions: The number of snowstorms each winter in Springfield is Negative Binomial with r = 5 and β = 3. The probability that a given snowstorm will close Springfield Elementary School for at least one day is 30%, independent of any other snowstorm. 7.18 (2 points) What is the probability that Springfield Elementary School will not be closed due to snow next winter? A. 2% B. 3% C. 4% D. 5% E. 6% 7.19 (2 points) What is the probability that Springfield Elementary School will be closed by exactly one snowstorm next winter? A. 10% B. 12% C. 14% D. 16% E. 18%
7.20 (2 points) The frequency distribution is a member of the (a, b , 0) class, with a = 0.75 and b = 3.75. The discrete severity distribution is: 0, 1, 2, 3 or 4 with probabilities of: 15%, 30%, 40%, 10% and 5%, respectively. The probability of the aggregate losses being 6, 7, 8 and 9 are: 0.0695986, 0.0875199, 0.107404, and 0.127617, respectively. What is the probability of the aggregate losses being 10? A. less than 14.6% B. at least 14.6% but less than 14.7% C. at least 14.7% but less than 14.8% D. at least 14.8% but less than 14.9% E. at least 14.9% Use the following information for the next 2 questions: The number of hurricanes that form in the Atlantic Ocean each year is Poisson with λ = 11. The probability that a given such hurricane will hit the continental United States is 15%, independent of any other hurricane. 7.21 (2 points) What is the probability that no hurricanes hit the continental United States next year? A. 11% B. 13% C. 15% D. 17% E. 19% 7.22 (2 points) What is the probability that exactly one hurricane will hit the continental United States next year? A. 30% B. 32% C. 34% D. 36% E. 38% Use the following information for the next 6 questions: One has a compound Geometric-Poisson distribution with parameters β = 1.7 and λ = 3.1. 7.23 (1 point) What is the density function at zero? A. less than 0.36 B. at least 0.36 but less than 0.37 C. at least 0.37 but less than 0.38 D. at least 0.38 but less than 0.39 E. at least 0.39 7.24 (2 points) What is the density function at one? A. less than 3.3% B. at least 3.3% but less than 3.4% C. at least 3.4% but less than 3.5% D. at least 3.5% but less than 3.6% E. at least 3.6%
7.25 (2 points) What is the density function at two? A. less than 5.4% B. at least 5.4% but less than 5.5% C. at least 5.5% but less than 5.6% D. at least 5.6% but less than 5.7% E. at least 5.7% 7.26 (2 points) What is the density function at three? A. less than 6.2% B. at least 6.2% but less than 6.3% C. at least 6.3% but less than 6.4% D. at least 6.4% but less than 6.5% E. at least 6.5% 7.27 (2 points) What is the density function at four? A. less than 6.0% B. at least 6.0% but less than 6.1% C. at least 6.1% but less than 6.2% D. at least 6.2% but less than 6.3% E. at least 6.3% 7.28 (2 points) What is the median? A. 2 B. 3 C. 4
D. 5
E. 6
7.29 (2 points) The frequency distribution is a member of the (a, b , 0) class, with a = -0.42857 and b = 4.71429. The discrete severity distribution is: 0, 1, 2, or 3 with probabilities of: 20%, 50%, 20% and 10% respectively. The probability of the aggregate losses being 10, 11 and 12 are: 0.00792610, 0.00364884, and 0.00157109, respectively. What is the probability of the aggregate losses being 13? A. less than 0.03% B. at least 0.03% but less than 0.04% C. at least 0.04% but less than 0.05% D. at least 0.05% but less than 0.06% E. at least 0.06%
7.30 (3 points) Frequency is given by a Poisson-Binomial compound frequency distribution, as per Loss Models, with parameters λ = 1.2, m = 4, and q = 0.1. (Frequency is Poisson with λ = 1.2, and severity is Binomial with m = 4 and q = 0.1.) What is the density function at 1? A. less than 0.20 B. at least 0.20 but less than 0.21 C. at least 0.21 but less than 0.22 D. at least 0.22 but less than 0.23 E. at least 0.23 7.31 (4 points) Assume that S has a compound Poisson distribution with λ = 2 and individual claim amounts that are 20, 30, and 50 with probabilities of 0.5, 0.3 and 0.2, respectively. Calculate Prob[S > 75]. A. 30% B. 32% C. 34% D. 36% E. 38% 7.32 (8 points) The number of crises per week faced by the superhero Underdog follows a Negative Binomial Distribution with r = 0.3 and β = 4. The number of super energy pills he requires per crisis is distributed as follows: 50% of the time it is 1, 30% of the time it is 2, and 20% of the time it is 3. What is the minimum number of super energy pills Underdog needs at the beginning of a week to be 99% certain he will not run out during the week? Use a computer to help you perform the calculations. 7.33 (5A, 11/95, Q.36) (2 points) Suppose that the aggregate loss S has a compound Poisson distribution with expected number of claims equal to 3 and the following claim amount distribution: individual claim amounts can be 1, 2 or 3 with probabilities of 0.6, 0.3, and 0.1, respectively. Calculate the probability that S = 2. 7.34 (5A, 5/98, Q.36) (2.5 points) Assume that S has a compound Poisson distribution with λ = 0.6 and individual claim amounts that are 1, 2, and 3 with probabilities of 0.25, 0.35 and 0.40, respectively. Calculate Prob[S = 1], Prob[S= 2] and Prob[S=3]. 7.35 (Course 151 Sample Exam #3, Q.12) (1.7 points) You are given: (i) S has a compound Poisson distribution with λ = 2. (ii) individual claim amounts, x, are distributed as follows: x p(x) 1 0.4 2 0.6 Determine fS(4). (A) 0.05
(B) 0.07
(C) 0.10
(D) 0.15
(E) 0.21
Use the following information for the next two questions: The frequency distribution of the number of losses in a year is geometric-Poisson with geometric primary parameter β = 3 and Poisson secondary parameter λ = 0.5. 7.36 (Course 3 Sample Exam, Q.41) Calculate the probability that the total number of losses in a year is at least 4. 7.37 (Course 3 Sample Exam, Q.42) If individual losses are all exactly 100, determine the expected aggregate losses in excess of 400. 7.38 (3, 11/02, Q.36 & 2009 Sample Q.95) (2.5 points) The number of claims in a period has a geometric distribution with mean 4. The amount of each claim X follows P(X = x) = 0.25, x = 1, 2, 3, 4. The number of claims and the claim amounts are independent. S is the aggregate claim amount in the period. Calculate Fs(3). (A) 0.27
(B) 0.29
(C) 0.31
(D) 0.33
(E) 0.35
7.39 (3 points) The number of claims in a period has a geometric distribution with mean 5. The amount of each claim X follows P(X = x) = 0.2, x = 0, 1, 2, 3, 4. The number of claims and the claim amounts are independent. S is the aggregate claim amount in the period. Calculate Fs(3). (A) 0.27
(B) 0.29
(C) 0.31
(D) 0.33
(E) 0.35
7.40 (CAS3, 5/04, Q.40) (2.5 points) XYZ Re provides reinsurance to Bigskew Insurance Company. XYZ agrees to pay Bigskew for all losses resulting from “events”, subject to:
• a $500 deductible per event and • a $100 annual aggregate deductible For providing this coverage, XYZ receives a premium of $150. Use a Poisson distribution with mean equal to 0.15 for the frequency of events. Event severity is from the following distribution: Loss Probability 250 0.10 500 0.25 800 0.30 1,000 0.25 1,250 0.05 1,500 0.05
• i = 0% What is the actual probability that XYZ will payout more than it receives? A. 8.9% B. 9.0% C. 9.1% D. 9.2% E. 9.3% 7.41 (4, 5/07, Q.8) (2.5 points) Annual aggregate losses for a dental policy follow the compound Poisson distribution with λ = 3. The distribution of individual losses is: Loss Probability 1 0.4 2 0.3 3 0.2 4 0.1 Calculate the probability that aggregate losses in one year do not exceed 3. (A) Less than 0.20 (B) At least 0.20, but less than 0.40 (C) At least 0.40, but less than 0.60 (D) At least 0.60, but less than 0.80 (E) At least 0.80
Solutions to Problems: 7.1. A. The mean severity is: (1)(.35) + (2)(.2) + (3)(.15) +(4)(.05) = 1.4. The mean aggregate losses = (2.1)(1.4) = 2.94. 7.2. D. The second moment of the severity is: (12 )(.35) + (22 )(.2) + (32 )(.15) +(42 )(.05) = 3.3. Thus the variance of the severity is: 3.3 - 1.42 = 1.34. The mean frequency is 2.1. The variance of the frequency is: (2.1)(1 + 2.1) = 6.51. The variance of the aggregate losses is: (2.1)(1.34) + (1.42 )(6.51) = 15.57. 7.3. C. The p.g.f. of the Geometric Distribution is: P(z) = (1 - 2.1(z-1))-1. c(0) = P(s(0)) = P(.25) = (1 - 2.1(.25-1))-1 = 0.38835. Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575. The only way to get an aggregate of 0 is to have no non-zero losses: 1/2.575 = 0.38835. Comment: In the alternative solution, we are trying to determine the number of losses of size other than zero. If one has one or more such loss, then the aggregate losses are positive. If one has zero such losses, then the aggregate losses are zero. We can have any number of losses of size zero without affecting the aggregate losses. 7.4. A. For the Geometric Distribution: a = β/(1+β) = 2.1/3.1 = .67742 and b = 0. 1/(1-as(0)) = 1/(1-(.67742)(.25)) = 1.20388. Use the Panjer Algorithm, x
x
x
c(x) = {1/(1-as(0))}Σ(a +jb/x)s(j)c(x-j) = 1.20388Σ .67742s(j) c(x-j) = .81553Σ s(j) c(x-j) j=1
j=1
j=1
c(1) = .81553 s(1) c(0) = (.81553)(.35)(.38835) = 0.11085. Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575, and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4. The only way to get an aggregate of 1 is to have one non-zero loss of size 1: (1.575/2.5752 ) (7/15) = 0.11085.
7.5. C. Use the Panjer Algorithm, c(2) = .81553 {s(1) c(1) + s(2)c(0)} = (.81553){(.35)(.11085) + (.20)(.38835)} = 0.09498. Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575, and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4. Ways to get an aggregate of 2: One non-zero loss of size 2: (1.575/2.5752 ) (4/15) = 0.06334. Two non-zero losses, each of size 1: (1.5752 /2.5753 ) (7/15)2 = 0.03164. Total probability: 0.06334 + 0.03164 = 0.09498. 7.6. B. Use the Panjer Algorithm, c(3) = .81553 {s(1) c(2) + s(2)c(1) +s(3)c(0)} = (.81553){(.35)(.09498) + (.20)(.11085) + (.15)(.38835)} = 0.09270. Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575, and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4. Ways to get an aggregate of 3: One non-zero loss of size 3: (1.575/2.5752 ) (3/15) = 0.04751. Two non-zero losses, one of size 1 and one of size 2 in either order: (1.5752 /2.5753 ) (2)(7/15)(4/15) = 0.03616. Three non-zero losses, each of size 1: (1.5753 /2.5754 ) (7/15)3 = 0.00903. Total probability: 0.04751 + 0.03616 + 0.00903 = 0.09270. 7.7. A. c(4) = .81553 {s(1) c(3) + s(2)c(2) +s(3)c(1) + s(4)c(0)} = (.81553){(.35)(.09270) + (.20)(.09498) + (.15)(.11085) + (.05)(.38835)} = 0.07135. Alternately, the non-zero losses are Geometric with β = (75%)(2.1) = 1.575, and severity distribution: 35/75 = 7/15 @ 1, 4/15 @2, 3/15 @3, and 1/15 @4. Ways to get an aggregate of 4: One non-zero loss of size 4: (1.575/2.5752 ) (1/15) = 0.01584. Two non-zero losses, one of size 1 and one of size 3 in either order: (1.5752 /2.5753 ) (2)(7/15)(3/15) = 0.02712. Two non-zero losses, each of size 2: (1.5752 /2.5753 ) (4/15)2 = 0.01033. Three non-zero losses, two of size 1 and one of size 2 in any order: (1.5753 /2.5754 ) (3)(7/15)2 (4/15) = 0.01548. Four non-zero losses, each of size 1: (1.5754 /2.5755 ) (7/15)4 = 0.00258. Total probability: 0.01584 + 0.02712 + 0.01033 + 0.01548 + 0.00258 = 0.07135.
7.8. E. c(5) = .81553 {s(1) c(4) + s(2)c(3) +s(3)c(2) + s(4)c(1) + s(5)c(0)} = (.81553){(.35)(.07135) + (.20)(.09270) + (.15)(.09498) + (.05)(.11085) + (0)(.38835)} = 0.05162. Comment: The aggregate distribution from 0 to 20 is: 0.38835, 0.110849, 0.094983, 0.0926988, 0.0713479, 0.0516245, 0.0415858, 0.0327984, 0.0253694, 0.0197833, 0.0154927, 0.0120898, 0.00943242, 0.00736622, 0.00575178, 0.0044901, 0.00350553, 0.00273696, 0.00213682, 0.00166827, 0.00130247. 7.9. E. From previous solutions, the mean and variance of the aggregate losses are: 2.94 and 15.57. Thus the probability that the aggregate losses are greater than 5 is approximately: 1 - Φ[(5.5 - 2.94)/ 15.57 ] = 1 - Φ[(5.5 - 2.94)/ 15.57 ] = 1 - Φ[.65] = 25.8%. Comment: Based on the previous solutions, the exact answer is 1 - .80985 = 19.0%. 7.10. B. From previous solutions, the mean and variance of the aggregate losses are: 2.94 and 15.57. The mean of a LogNormal is exp(µ + .5σ2). The second moment of a LogNormal is exp(2µ + 2σ2). Therefore set: exp(µ + .5σ2) = 2.94 and exp(2µ + 2σ2) = 15.57 + 2.942 . 1 + 15.57 / 2.942 = exp(2µ + 2σ2)/exp(2µ + σ2) = exp(σ2). σ=
ln(2.8013) = 1.015. µ = ln( 2.94 / exp(.5 (1.0152 ))) = .5634. Since the aggregate losses are
discrete, we apply a “continuity correction”; more than 5 corresponds to 5.5. The probability that the aggregate losses are greater than 5 is approximately: 1 - Φ[(ln(5.5) - .5634)/1.015] = 1 - Φ[1.12] = 13.1%. Comment: Based on the previous solutions, the exact answer is 1 - .80985 = 19.0%. 7.11. C. c(0) + c(1) + c(2) + c(3) = 0.38835 + 0.110849 + 0.094983 + 0.0926988 = .6868808 < 70%. c(0) + c(1) + c(2) + c(3) + c(4) = 0.38835 + 0.110849 + 0.094983 + 0.0926988 + 0.0713479 = .7582287 ≥ 70%. The 70th percentile is 4, the first value such that the distribution function is ≥ 70%. 7.12. D. P(z) = (1 + .3(z-1))10. c(0) = P(s(0)) = P(.2) = (1 + .3(.2-1))10 = 0.06429. Alternately, the non-zero losses are Binomial with q = (80%)(0.3) = 0.24. The only way to get an aggregate of 0 is to have no non-zero losses: (1 - 0.24)10 = 0.06429.
7.13. A. For the Binomial, a = -q/(1-q) = - 0.3/0.7 = - 0.42857. b = (m+1)q/(1-q) = 33/7 = 4.71429. x
c(x) = {1/(1-as(0))}
x
Σ (a + jb/x) s(j) c(x-j) = .92105Σ (-.42857 +4.71429 j/x) s(j) c(x-j) j=1
j=1
x
= .39474Σ (-1 + 11 j/x) s(j) c(x-j) j=1
c(1) = .39474(-1 + 11(1/1))s(1)c(0) = .39474(10)(.5)(.06429) = 0.12689. Alternately, the non-zero losses are Binomial with q = (80%)(0.3) = 0.24. The severity distribution truncated to remove the zero losses is: 5/8 @ 1, 2/8 @2, and 1/8 @3. The only way to get an aggregate of 1 is to have 1 non-zero loss of size 1: 10 (1 - 0.24)9 (.24) (5/8) = 0.12689. 7.14. B. c(2) = .39474{(-1 + 11(1/2))s(1)c(1) + (-1 + 11(2/2))s(2)c(0)} = .39474{(4.5)(.5)(.12689) + (10)(.2)(.06429)} = 0.16345. Alternately, in order to get an aggregate of 2 we have either 1 non-zero loss of size 2 or two non-zero losses each of size 1: 10 (1 - 0.24)9 (.24) (2/8) + 45 (1 - 0.24)8 (.24)2 (5/8)2 = 0.16345. 7.15. B. c(3) = .39474{(-1 + 11(1/3))s(1)c(2) + (-1 + 11(2/3))s(2)c(1) + (-1 + 11(3/3))s(3)c(0)} = .39474{(2.6667)(.5)(.16345) + (6.3333)(.2)(.12689) + (10)(.1)(.06429)} = 0.17485. Alternately, in order to get an aggregate of 3 we have either 1 non-zero loss of size 3, two non-zero losses each of sizes 1 and 2 in either order, or three non-zero losses each of size 1: 10 (1 - 0.24)9 (.24) (1/8) + 45 (1 - 0.24)8 (.24)2 (2)(5/8)(2/8) + 120 (1 - 0.24)7 (.24)3 (5/8)3 = 0.17485. 7.16. C. c(4) = .39474{(-1 + 11(1/4))s(1)c(3) + (-1 + 11(2/4))s(2)c(2) + (-1 + 11(3/4))s(3)c(1) + (-1 + 11(4/4))s(4)c(0)} = .39474{(1.75)(.5)(.17485) + (4.5)(.2)(.16345) + (7.25)(.1)(.12689) + (10)(0)(.06429)} = 0.15478. 7.17. B. c(5) = .39474{(-1 + 11(1/5))s(1)c(4) + (-1 + 11(2/5))s(2)c(3) + (-1 + 11(3/5))s(3)c(2) + (-1 + 11(4/5))s(4)c(1) + (-1 + 11(5/5))s(5)c(0)} = .39474{(1.2)(.5)(.15478) + (3.4)(.2)(.17485) + (5.6)(.1)(.16345)} = 0.11972. Comment: Note that the terms involving s(4) and s(5) drop out, since s(4) = s(5) = 0.
7.18. C. & 7.19. A. We are thinning a Negative Binomial; the snowstorms that close the school are also Negative Binomial, but with r = 5 and β = (.3)(3) = .9. f(0) = 1/(1 + .9)5 = 4.0%. f(1) = (5)(.9)/(1 + .9)6 = 9.6%. Alternately, the number of storms that close the school is a compound Negative Binomial - Bernoulli Distribution. c(0) = Pp(s(0)) = Pp(.7) = (1 - (3)(.7 - 1))-5 = 1/1.95 = 4.04%. a = β/(1+β) = 3/4. b = (r-1)β/(1+β) = 3. Using the recursive method / Panjer Algorithm: c(1) = {1/(1 - as(0))} {(a + b)s(1)c(0)} = {1/(1 - (3/4)(.7)} {(15/4)(.3)(.0404)} = 9.57%. 7.20. D. Apply the Panjer Algorithm. x
c(x) = {1/(1-as(0))}
Σ (a +jb/x) s(j) c(x-j) =
x
1.12676Σ (.75 + 3.75 j/x) s(j) c(x-j)
j=1
j=1
x
= .84507Σ (1 + 5 j/x) s(j) c(x-j) j=1
c(10) = .84507{(1+ 5(1/10))s(1)c(9) + (1+ 5(2/10))s(2)c(8) + (1+ 5(3/10))s(3)c(7) + (1+ 5(4/10))s(4)c(6)} = .84507{(1 + 5(1/10))(.3)(0.127617) + (1 + 5(2/10))(.4)(0.107404) + (1 + 5(3/10))(.1)(0.0875199) + (1+ 5(4/10))(.05)(0.0695986)} = 0.14845. Comment: Note that the terms involving s(5), s(6), etc., drop out, since in this case the severity is such that s(x) = 0, for x > 4. Frequency follows a Negative Binomial Distribution with r = 6 and β = 3. The aggregate distribution from 0 to 10 is: 0.0049961, 0.0075997, 0.0168763, 0.0250745, 0.0385867, 0.052303, 0.0695986, 0.0875199, 0.107404, 0.127617, 0.148454. 7.21. E. & 7.22. B. We are thinning a Poisson; the hurricanes that hit the continental United States are also Poisson, but with λ = (.15)(11) = 1.65. f(0) = e-1.65 = 19.2%. f(1) = 1.65e-1.65 = 31.7%. Alternately, the number of storms that hit the continental United States is a compound Poisson-Bernoulli Distribution. c(0) = Pp(s(0)) = Pp(.85) = exp[(11)(.85 - 1)] = e-1.65 = 19.205%. a = 0. b = λ = 11. Using the recursive method / Panjer Algorithm: c(1) = {1/(1 - as(0))}{(a + b)s(1)c(0)} = {1}{(11)(.15)(.19205)} = 31.688%.
7.23. D. For the Primary Geometric, P(z) = 1/{1 - β(z-1)} = 1/(2.7 - 1.7z). The secondary Poisson has density at zero of e−λ = e-3.1. The density of the compound distribution at zero is the p.g.f. of the primary distribution at e-3.1: 1/{2.7 - 1.7e-3.1} = 0.3812. 7.24. C. For the Primary Geometric, a = β/(1+β) = 1.7/2.7 = .62963 and b = 0 . The secondary Poisson has density at zero of e−λ = e-3.1 = .045049. 1/(1 - as(0)) = 1/{1 - (.62963)(.045049)} = 1.02919. Use the Panjer Algorithm, x
x
x
c(x) = {1/(1 - as(0))}Σ(a + jb/x)s(j)c(x-j) =1.02919 Σ .62963s(j) c(x-j) = .64801Σ s(j) c(x-j). j=1
j=1
j=1
c(1) = .64801 s(1) c(0) = (.64801)(.139653)(.3812) = 0.03450. Alternately, the compound distribution is one if and only if the Geometric is n ≥ 1, and of the resulting n Poissons one is 1 and the rest are 0. ∞
c(1) = Σ Prob[Geometric = n] n Prob[Poisson = 1] Prob[Poisson = 0]n-1 = n=1 ∞
∞
Σ {(1.7/2.7)n /2.7} n 3.1e-3.1 (e-3.1)n-1 = (3.1/2.7)Σ n(e-3.11.7/2.7)n = n=1
n=1
(3.1/2.7){0.0283643 + 0.0016091 + 0.0000685 + .0000026 + .0000001 + ...} = 0.03450. Comment: The densities of the secondary Poisson Distribution with λ = 3.1 are: n 0 1 2 3 4
s(n) 0.045049 0.139653 0.216461 0.223677 0.173350
The formula for the Panjer Algorithm simplifies a little since for the Geometric b = 0. 7.25. D. Use the Panjer Algorithm, c(2) = .64801 {s(1) c(1) + s(2)c(0)} = (.64801){(.139653)(.03450)+(.216461)(.3812)} = 0.05659. 7.26. E. Use the Panjer Algorithm, c(3) = .64801 {s(1) c(2) + s(2)c(1) +s(3)c(0)} = (.64801){(.139653)(.05659) +(.216461)(.03450) + (.223677)(.3812) } = 0.06521.
7.27. C. c(4) = .64801 {s(1) c(3) + s(2)c(2) +s(3)c(1) + s(4)c(0)} = (.64801){(.139653)(.06521) +(.216461)(.05659) + (.223677)(.03450) + (.173350)(.3812) } = 0.06166. 7.28. B. c(0) + c(1) + c(2) = .3812 + .03450 + .05659 = .47229 < 50%. c(0) + c(1) + c(2) + c(3) = .3812 + .03450 + .05659 + .06521 = .5375 ≥ 50%. The median is 3, the first value such that the distribution function is ≥ 50%. 7.29. E. Apply the Panjer Algorithm. x
c(x) = {1/(1-as(0))}
Σ (a +jb/x) s(j) c(x-j) = j=1
x
.92105Σ (-.42857 +4.71429 j/x) s(j) c(x-j) j=1
x
= .39474Σ (-1 + 11 j/x) s(j) c(x-j) j=1
c(13) = .39474{(-1+ 11(1/13))s(1)c(12) + (-1+ 11(2/13))s(2)c(11) + (-1+ 11(3/13))s(3)c(10)} = .39474{(-.15385)(.5)(.00157109) + (.69231)(.2)( .00364884) + (1.53846)(.1)(.00792610)} = 0.00063307. Comment: Terms involving s(4), s(5), etc., drop out, since in this case the severity is such that s(x) = 0, for x > 3. Frequency follows a Binomial Distribution with m = 10 and q = .3.
7.30. E. The secondary Binomial has density at zero of (1-q)m = .94 = .6561. The density of the compound distribution at zero is the p.g.f. of the primary Poisson distribution at .6561: exp[1.2(.6561 - 1)] = .66187. For the Primary Poisson a = 0 and b = λ = 1.2. 1/(1-as(0)) = 1. Use the Panjer Algorithm, x
x
c(x) = {1/(1 - a s(0))}Σ(a +jb/x)s(j)c(x-j) = 1.2 Σ (j/x)s(j) c(x-j) . j=1
j=1
c(1) = (1.2)(1/1) s(1) c(0) = (1.2){(4)(.93 ) (.1)}(.66187) = 0.23160. Alternately, the p.g.f. of the compound distribution is: P(z) = exp(1.2({1+ .1(z-1)}4 -1)). P(0) = exp(1.2({1+ .1(0-1)}4 -1)) = .66187. Pʼ(z) = P(z) (1.2)(4)(.1)(1+.1(z-1))3 . Pʼ(0) = P(0) (.48)(.1)(1+.1(0-1))3 = (.66187)(.48)(.93 ) = .23160. f(n) = (dn P(z) / dzn )z=0 / n!, so that f(1) = Pʼ(0) = 0.23160. Comment: Alternately, think of the Primary Poisson Distribution as the number of accidents, while the secondary Binomial represents the number of claims on each accident. The only way for the compound distribution to have be one, is if all but one accident has zero claims and the remaining accident has 1 claim. For example, the chance of 3 accidents is: 1.23 e-1.2 /3! = .086744. The chance of an accident having no claims is: .94 = .6561. The chance of an accident 1 claim is: (4)(.93 ) (.1) = .2916. Thus if one has 3 accidents, the chance that 2 accidents are for zero and 1 accident is 1 is: (3)(.2916)(.65612 ) = .37657. Thus the chance that there are 3 accidents and they sum to 1 is: (.086744)(.37657) = .03267. Summing over all the possible numbers of accidents, gives a density at one of the compound distribution of .23160: Number of Accidents
Poisson
Chance of all but one at 0 claims and one at 1 claim
Chance of 1 claim in Aggregate
0 1 2 3 4 5 6 7 8
0.30119 0.36143 0.21686 0.08674 0.02602 0.00625 0.00125 0.00021 0.00003
0.00000 0.29160 0.38264 0.37657 0.32943 0.27017 0.21271 0.16282 0.12209
0.00000 0.10539 0.08298 0.03267 0.00857 0.00169 0.00027 0.00003 0.00000
Sum
1.00000
0.23160
7.31. A. Prob[S ≤ 75] = Prob[N=0] + Prob[N=1] + Prob[N=2](1 - Prob[30,50 or 50,30 or 50,50]) + Prob[N=3] Prob[3@20, or 2@20 and 1@30]
= e^-2 + 2e^-2 + 2e^-2{1 - (2)(0.3)(0.2) - 0.2^2} + (4/3)e^-2{0.5^3 + (3)(0.5^2)(0.3)} = 5.1467e^-2 = 0.6965.
Prob[S > 75] = 1 - 0.6965 = 30.35%.
Alternately, use the Panjer Algorithm, in units of 10: For the Poisson Distribution, a = 0 and b = λ = 2.
c(0) = P(s(0)) = P(0) = e^{2(0-1)} = e^-2 = 0.135335.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (2/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (2/1)(1) s(1) c(0) = 0.
c(2) = (2/2){(1)s(1)c(1) + (2)s(2)c(0)} = {0 + (2)(0.5)(0.135335)} = 0.135335.
c(3) = (2/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} = (2/3){0 + 0 + (3)(0.3)(0.135335)} = 0.081201.
c(4) = (2/4){(1)s(1)c(3) + (2)s(2)c(2) + (3)s(3)c(1) + (4)s(4)c(0)} = 0.5{0 + (2)(0.5)(0.135335) + 0 + 0} = 0.067668.
c(5) = (2/5){(1)s(1)c(4) + (2)s(2)c(3) + (3)s(3)c(2) + (4)s(4)c(1) + (5)s(5)c(0)} = 0.4{0 + (2)(0.5)(0.081201) + (3)(0.3)(0.135335) + 0 + (5)(0.2)(0.135335)} = 0.135335.
c(6) = (2/6){(1)s(1)c(5) + (2)s(2)c(4) + (3)s(3)c(3) + (4)s(4)c(2) + (5)s(5)c(1) + (6)s(6)c(0)} = (1/3){0 + (2)(0.5)(0.067668) + (3)(0.3)(0.081201) + 0 + 0 + 0} = 0.046916.
c(7) = (2/7){(1)s(1)c(6) + (2)s(2)c(5) + (3)s(3)c(4) + (4)s(4)c(3) + (5)s(5)c(2) + (6)s(6)c(1) + (7)s(7)c(0)} = (2/7){0 + (2)(0.5)(0.135335) + (3)(0.3)(0.067668) + 0 + (5)(0.2)(0.135335) + 0 + 0} = 0.094735.
c(0) + c(1) + c(2) + c(3) + c(4) + c(5) + c(6) + c(7) = 0.696525. 1 - 0.696525 = 30.35%.
Alternately, use convolutions:

n          0        1        2        3
Poisson    0.1353   0.2707   0.2707   0.1804

x     p*0   p     p*p    p*p*p   Aggregate Density   Aggregate Distribution
0     1                          0.1353              0.1353
10                               0.0000              0.1353
20          0.5                  0.1353              0.2707
30          0.3                  0.0812              0.3519
40                0.25           0.0677              0.4195
50          0.2   0.30   0.125   0.1353              0.5549
60                0.09   0.225   0.0469              0.6018
70                0.20           0.0947              0.6965
Sum   1     1     0.84   0.35

1 - 0.6965 = 30.35%.
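The convolution approach in the table above can also be checked numerically. Below is a rough Python sketch of that calculation (my own illustration, with severity in units of 10 as in the solution); the helper name convolve is mine.

```python
from math import exp, factorial

lam = 2.0
sev = {2: 0.5, 3: 0.3, 5: 0.2}        # severity in units of 10: 20, 30, 50

def convolve(f, g, limit):
    """Convolve two discrete distributions, keeping only sums <= limit."""
    out = {}
    for x, px in f.items():
        for y, py in g.items():
            if x + y <= limit:
                out[x + y] = out.get(x + y, 0.0) + px * py
    return out

limit = 7                              # aggregate of 75 is 7.5 units, so keep sums <= 7
conv = {0: 1.0}                        # 0-fold convolution: point mass at 0
prob_le_75 = 0.0
for n in range(0, 8):                  # 4 or more claims already exceed 75
    poisson_n = exp(-lam) * lam**n / factorial(n)
    prob_le_75 += poisson_n * sum(conv.values())
    conv = convolve(conv, sev, limit)
print(round(1 - prob_le_75, 4))        # approximately 0.3035
```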
7.32. For the Negative Binomial Distribution, a = β/(1+β) = 4/5 = 0.8, b = (r-1)β/(1+β) = -0.56, and
P(z) = {1 - β(z-1)}^-r = {1 - 4(z-1)}^-0.3 = (5 - 4z)^-0.3. c(0) = Pp(s(0)) = Pp(0) = 5^-0.3 = 0.6170339.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = Σ_{j=1}^{x} (0.8 - 0.56j/x) s(j) c(x-j).
c(1) = (0.8 - 0.56) s(1) c(0) = (0.24)(0.5)(0.6170339) = 0.0740441.
c(2) = (0.8 - 0.56/2) s(1) c(1) + (0.8 - 0.56(2/2)) s(2) c(0) = (0.52)(0.5)(0.0740441) + (0.24)(0.3)(0.6170339) = 0.0636779.
c(3) = (0.8 - 0.56/3) s(1) c(2) + (0.8 - 0.56(2/3)) s(2) c(1) + (0.8 - 0.56(3/3)) s(3) c(0)
= (0.613333)(0.5)(0.0636779) + (0.426667)(0.3)(0.0740441) + (0.24)(0.2)(0.6170339) = 0.0586232.
c(4) = (0.8 - 0.56/4) s(1) c(3) + (0.8 - 0.56(2/4)) s(2) c(2) + (0.8 - 0.56(3/4)) s(3) c(1) + (0.8 - 0.56(4/4)) s(4) c(0)
= (0.66)(0.5)(0.0586232) + (0.52)(0.3)(0.0636779) + (0.38)(0.2)(0.0740441) + (0.24)(0)(0.6170339) = 0.0349068.
Continuing in this manner produces the following densities for the compound distribution from zero to twenty:
0.617034, 0.0740441, 0.0636779, 0.0586232, 0.0349067, 0.0280473, 0.0224297, 0.0173693, 0.0140905, 0.0114694, 0.00937036, 0.00773602, 0.00641437, 0.00534136, 0.00446732, 0.00374844, 0.00315457, 0.00266188, 0.00225134, 0.00190808, 0.0016202.
The corresponding distribution functions from zero to twenty are:
0.617034, 0.691078, 0.754756, 0.813379, 0.848286, 0.876333, 0.898763, 0.916132, 0.930223, 0.941692, 0.951062, 0.958798, 0.965213, 0.970554, 0.975021, 0.97877, 0.981924, 0.984586, 0.986838, 0.988746, 0.990366.
Thus Underdog requires 20 pills to be 99% certain he will not run out during the week.
Comment: This compound distribution has a long righthand tail. Therefore, one would not get the same result if one used the Normal Approximation. Note that since the secondary distribution has only three non-zero densities, each recursion involves summing at most three non-zero terms.

7.33. Using the Panjer algorithm, for the Poisson a = 0 and b = λ = 3. c(0) = P(s(0)) = P(0) = e^{3(0-1)} = e^-3 = 0.04979.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (3/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (3/1)(1) s(1) c(0) = (3/1){(1)(0.6)(0.04979)} = 0.08962.
c(2) = (3/2){(1)s(1)c(1) + (2)s(2)c(0)} = (3/2){(1)(0.6)(0.08962) + (2)(0.3)(0.04979)} = 0.1255.
Alternately, if the aggregate losses are 2, then there is either one claim of size 2 or two claims each of size 1.
This has probability: (3e^-3)(0.3) + (3^2 e^-3 / 2)(0.6^2) = 0.04481 + 0.08066 = 0.1255.
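The "Underdog" calculation in 7.32 is a natural place to let a computer accumulate the recursion until the distribution function reaches 99%. Here is a short Python sketch of that idea (mine, not from the text); the severity densities are those used in the solution.

```python
beta, r = 4.0, 0.3
s = {1: 0.5, 2: 0.3, 3: 0.2}                     # severity densities s(1), s(2), s(3), per the solution
a, b = beta / (1 + beta), (r - 1) * beta / (1 + beta)

c = [(1 + beta) ** (-r)]                         # c(0) = P(s(0)) = P(0), since s(0) = 0
cum, x = c[0], 0
while cum < 0.99:
    x += 1
    c.append(sum((a + j * b / x) * s.get(j, 0.0) * c[x - j] for j in range(1, x + 1)))
    cum += c[x]
print(x, round(cum, 6))                          # 20 and 0.990366, matching the solution
```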
7.34. Using the Panjer algorithm, for the Poisson a = 0 and b = λ = 0.6. c(0) = P(s(0)) = P(0) = e^{0.6(0-1)} = e^-0.6 = 0.54881.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (0.6/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (0.6/1)(1) s(1) c(0) = (0.6){(1)(0.25)(0.54881)} = 0.08232.
c(2) = (0.6/2){(1)s(1)c(1) + (2)s(2)c(0)} = (0.3){(1)(0.25)(0.08232) + (2)(0.35)(0.54881)} = 0.12142.
c(3) = (0.6/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} = (0.2){(1)(0.25)(0.12142) + (2)(0.35)(0.08232) + (3)(0.40)(0.54881)} = 0.14931.
Comment: For example, one could instead calculate the probability of the aggregate losses being three as:
Prob[1 claim @3] + Prob[2 claims of sizes 1 and 2] + Prob[3 claims each @1]
= (0.4)(0.6e^-0.6) + (2)(0.25)(0.35)(0.6^2 e^-0.6 / 2) + (0.25^3)(0.6^3 e^-0.6 / 6) = 0.1493.
7.35. D. For the aggregate losses to be 4, one can have either 2 claims each of size 2, 3 claims of which 2 are size 1 and one is size 2 (there are 3 ways to order the claim sizes), or 4 claims each of size one.
Thus fS(4) = (2^2 e^-2 / 2)(0.6^2) + (2^3 e^-2 / 6){(3)(0.4^2)(0.6)} + (2^4 e^-2 / 24)(0.4^4) = 1.121e^-2 = 0.152.
Alternately, use the Panjer Algorithm (recursive method): For the Poisson a = 0 and b = λ = 2.
c(0) = P(s(0)) = P(0) = e^{2(0-1)} = e^-2 = 0.13534.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (2/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (2/1)(1) s(1) c(0) = (2/1){(1)(0.4)(0.13534)} = 0.10827.
c(2) = (2/2){(1)s(1)c(1) + (2)s(2)c(0)} = (0.4)(0.10827) + (2)(0.6)(0.13534) = 0.20572.
c(3) = (2/3){s(1)c(2) + 2s(2)c(1) + 3s(3)c(0)} = (2/3){(0.4)(0.20572) + (2)(0.6)(0.10827) + 0} = 0.14147.
c(4) = (2/4){s(1)c(3) + 2s(2)c(2) + 3s(3)c(1) + 4s(4)c(0)} = (2/4){(0.4)(0.14147) + (2)(0.6)(0.20572) + 0 + 0} = 0.1517.
Alternately, weight together convolutions of the severity distribution:
(0.1353)(0) + (0.2707)(0) + (0.2707)(0.36) + (0.1804)(0.288) + (0.0902)(0.0256) = 0.1517.

n          0        1        2        3        4
Poisson    0.1353   0.2707   0.2707   0.1804   0.0902

x     p*0   p     p*p    p*p*p   p*4      Aggregate Density
0     1                                  0.135335
1           0.4                          0.108268
2           0.6   0.16                   0.205710
3                 0.48   0.064           0.141470
4                 0.36   0.288   0.0256  0.151720
5                        0.432   0.1536
6                        0.216   0.3456
7                                0.3456
8                                0.1296
Sum   1     1     1      1       1

Comment: Since we only want the density at 4, and do not need the densities at 0, 1, 2, and 3 in order to answer this question, the Panjer Algorithm involves more computation in this case.
7.36. The secondary Poisson has density at zero of e^-λ = e^-0.5 = 0.6065.
The densities of the secondary Poisson Distribution with λ = 0.5 are:

n      0        1        2        3        4        5
s(n)   0.6065   0.3033   0.0758   0.0126   0.0016   0.0002

The density of the compound distribution at zero is the p.g.f. of the primary Geometric distribution, P(z) = 1/{1 - β(z-1)}, at z = e^-0.5: 1/{4 - 3e^-0.5} = 0.4586.
For the primary Geometric, a = β/(1+β) = 3/4 = 0.75 and b = 0. 1/(1 - a s(0)) = 1/{1 - (0.75)(0.6065)} = 1.8345.
Use the Panjer Algorithm:
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 1.8345 Σ_{j=1}^{x} 0.75 s(j) c(x-j) = 1.3759 Σ_{j=1}^{x} s(j) c(x-j).
c(1) = 1.3759 s(1) c(0) = (1.3759)(0.3033)(0.4586) = 0.1914.
c(2) = 1.3759{s(1)c(1) + s(2)c(0)} = (1.3759){(0.3033)(0.1914) + (0.0758)(0.4586)} = 0.1277.
c(3) = 1.3759{s(1)c(2) + s(2)c(1) + s(3)c(0)} = (1.3759){(0.3033)(0.1277) + (0.0758)(0.1914) + (0.0126)(0.4586)} = 0.0812.
The chance of 4 or more claims in a year is: 1 - (c(0) + c(1) + c(2) + c(3)) = 1 - (0.4586 + 0.1914 + 0.1277 + 0.0812) = 0.1411.
Comment: Long! Using the Normal Approximation, one would proceed as follows.
The expected number of losses per year is: (mean of Geometric)(mean of Poisson) = (3)(0.5) = 1.5.
The variance of the compound frequency distribution is:
(mean of Poisson)^2 (variance of Geometric) + (mean of Geometric)(variance of Poisson) = λ^2 β(1+β) + βλ = 3 + 1.5 = 4.5.
Thus the chance of more than 3 losses is approximately: 1 - Φ[(3.5 - 1.5)/√4.5] = 1 - Φ[0.94] = 1 - 0.8264 = 0.1736.
Due to the skewness of the compound frequency distribution, the approximation is not particularly good.
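The comparison between the exact Panjer result and the Normal Approximation in the comment above can be reproduced with a short Python sketch (my own illustration; the variable names are not from the text).

```python
from math import exp, factorial, sqrt, erf

beta, lam = 3.0, 0.5
s = [exp(-lam) * lam**n / factorial(n) for n in range(30)]   # secondary Poisson densities
a = beta / (1 + beta)                                        # Geometric primary: a = 0.75, b = 0

c = [1.0 / (1 + beta * (1 - s[0]))]                          # c(0) = P_geometric(s(0))
for x in range(1, 4):
    c.append(sum(a * s[j] * c[x - j] for j in range(1, x + 1)) / (1 - a * s[0]))
exact = 1 - sum(c)                                           # chance of 4 or more claims

mean = beta * lam
var = beta * (1 + beta) * lam**2 + beta * lam                # variance of the compound frequency
normal = 0.5 * (1 - erf((3.5 - mean) / sqrt(2 * var)))       # Normal Approximation with continuity correction
print(round(exact, 4), round(normal, 4))                     # approximately 0.141 versus 0.173
```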
7.37. The expected number of losses per year is: (mean of Geometric)(mean of Poisson) = (3)(0.5) = 1.5.
Thus the expected annual aggregate losses are: (100)(1.5) = 150.
Since each loss is of size 100, if one has 4 or more losses, then the aggregate losses are greater than 400.
Therefore, the expected losses limited to 400 are:
0 f(0) + 100 f(1) + 200 f(2) + 300 f(3) + 400{1 - (f(0) + f(1) + f(2) + f(3))} = 100{4 - 4f(0) - 3f(1) - 2f(2) - f(3)}
= 100{4 - 4(0.4586) - 3(0.1914) - 2(0.1277) - 0.0812} = 125.48.
Therefore, the expected losses excess of 400 are: 150 - 125.48 = 24.52.
Comment: Uses the intermediate results of the previous question. Since severity is constant, this question is basically about the frequency. The question does not specify that it wants expected annual excess losses.

7.38. E. For the Geometric distribution with β = 4, P(z) = 1/{1 - β(z-1)} = 1/(5 - 4z). a = β/(1+β) = 0.8, b = 0.
Using the Panjer algorithm, c(0) = Pf(s(0)) = P(0) = 0.2.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 0.8 Σ_{j=1}^{x} s(j) c(x-j).
c(1) = 0.8 s(1) c(0) = (0.8)(1/4)(0.2) = 0.04.
c(2) = 0.8{s(1)c(1) + s(2)c(0)} = (0.8){(1/4)(0.04) + (1/4)(0.2)} = 0.048.
c(3) = 0.8{s(1)c(2) + s(2)c(1) + s(3)c(0)} = (0.8){(1/4)(0.048) + (1/4)(0.04) + (1/4)(0.2)} = 0.0576.
Distribution of aggregate at 3 is: 0.2 + 0.04 + 0.048 + 0.0576 = 0.3456.
Alternately, one can use “semi-organized reasoning.”
For the Geometric with β = 4: f(0) = 1/5 = 0.2, f(1) = 0.8 f(0) = 0.16, f(2) = 0.8 f(1) = 0.128, f(3) = 0.8 f(2) = 0.1024.
The ways in which the aggregate is ≤ 3:
0 claims: 0.2.
1 claim of size ≤ 3: (3/4)(0.16) = 0.12.
2 claims of sizes 1 & 1, 1 & 2, or 2 & 1: (3/16)(0.128) = 0.024.
3 claims of sizes 1 & 1 & 1: (1/64)(0.1024) = 0.0016.
Distribution of aggregate at 3 is: 0.2 + 0.12 + 0.024 + 0.0016 = 0.3456.
Alternately, using convolutions:

n            0       1       2       3
Geometric    0.200   0.160   0.128   0.102

x    f*0   f      f*f     f*f*f    Aggregate Density
0    1                             0.2000
1          0.25                    0.0400
2          0.25   0.062            0.0480
3          0.25   0.125   0.0156   0.0576

Distribution of aggregate at 3 is: 0.2 + 0.04 + 0.048 + 0.0576 = 0.3456.
7.39. E. One can thin the Geometric Distribution. The non-zero claims are Geometric with β = (5)(4/5) = 4.
The size distribution for the non-zero claims is: P(X = x) = 0.25, x = 1, 2, 3, 4.
Only the non-zero claims contribute to the aggregate distribution. Thus this question has the same solution as the previous question (3, 11/02, Q. 36). Distribution of aggregate at 3 is: 0.3456.
Comment: In general, thinning the frequency to only consider the non-zero claims can simplify the use of convolutions or “semi-organized reasoning”. Thinning a Binomial affects q. Thinning a Poisson affects λ. Thinning a Negative Binomial affects β.

7.40. E. For XYZ to pay out more than it receives, the aggregate has to be > 250 prior to the application of the aggregate deductible.
After the 500 per event deductible, the severity distribution is: 0 @ 35%, 300 or more @ 65%.
Thus XYZ pays out more than it receives if and only if XYZ makes at least one nonzero payment.
Nonzero payments are Poisson with mean: (65%)(0.15) = 0.0975.
Probability of at least one nonzero payment is: 1 - e^-0.0975 = 9.29%.
Alternately, for the aggregate distribution after the per event deductible, using the Panjer Algorithm,
c(0) = P(s(0)) = exp[0.15(0.35 - 1)] = 0.9071. 1 - 0.9071 = 9.29%.
Comment: The aggregate deductible applies after the per event deductible is applied.
7.41. B. Some densities of the Poisson frequency are: f(0) = e^-3 = 0.0498, f(1) = 3e^-3 = 0.1494, f(2) = 3^2 e^-3 / 2 = 0.2240, f(3) = 3^3 e^-3 / 6 = 0.2240.
Ways in which the aggregate can be less than or equal to 3:
no claims: 0.0498.
1 claim of size less than 4: (0.1494)(0.9) = 0.1345.
2 claims of sizes 1 and 1, 1 and 2, or 2 and 1: (0.2240){(0.4)(0.4) + (0.4)(0.3) + (0.3)(0.4)} = 0.0896.
3 claims each of size 1: (0.2240)(0.4^3) = 0.0143.
The sum of these probabilities is: 0.0498 + 0.1345 + 0.0896 + 0.0143 = 0.2882.
Alternately, using the Panjer algorithm, for the Poisson a = 0 and b = λ = 3. P(z) = exp[λ(z-1)].
c(0) = P(s(0)) = P(0) = e^{3(0-1)} = e^-3 = 0.04979.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (3/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (3/1)(1) s(1) c(0) = (3){(1)(0.4)(0.04979)} = 0.05975.
c(2) = (3/2){(1)s(1)c(1) + (2)s(2)c(0)} = (1.5){(1)(0.4)(0.05975) + (2)(0.3)(0.04979)} = 0.08066.
c(3) = (3/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} = (1)(0.4)(0.08066) + (2)(0.3)(0.05975) + (3)(0.2)(0.04979) = 0.09799.
The sum of the densities of the aggregate at 0, 1, 2, and 3 is: 0.04979 + 0.05975 + 0.08066 + 0.09799 = 0.2882.
Section 8, Recursive Method / Panjer Algorithm, Advanced [104]

Additional items related to the Recursive Method / Panjer Algorithm will be discussed.

Aggregate Distribution, when Frequency is a Compound Distribution: [105]

Assume frequency is a Compound Geometric-Poisson Distribution with β = 0.8 and λ = 1.3.
Let severity have density: s(0) = 60%, s(1) = 10%, s(2) = 25%, and s(3) = 5%.
Then we can use the Panjer Algorithm twice, in order to compute the Aggregate Distribution.
First we use the Panjer Algorithm to calculate the density of an aggregate distribution with a Poisson frequency with λ = 1.3, and this severity. As computed in the previous section, this aggregate distribution is: [106] 0.594521, 0.0772877, 0.198243, 0.06398, ...
These then are used as the secondary distribution in the Panjer algorithm, together with a Geometric with β = 0.8 as the primary distribution.
For the primary Geometric, P(z) = 1/{1 - β(z-1)} = 1/(1.8 - 0.8z), a = β/(1+β) = 0.8/1.8 = 0.4444444, and b = 0.
c(0) = Pp(s(0)) = 1/{1.8 - (0.8)(0.594521)} = 0.7550685. 1/{1 - a s(0)} = 1/{1 - (0.4444444)(0.594521)} = 1.359123.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 1.359123 Σ_{j=1}^{x} 0.4444444 s(j) c(x-j) = 0.604055 Σ_{j=1}^{x} s(j) c(x-j).
c(1) = 0.604055 s(1) c(0) = (0.604055)(0.0772877)(0.7550685) = 0.035251.
c(2) = 0.604055{s(1)c(1) + s(2)c(0)} = (0.604055){(0.0772877)(0.035251) + (0.198243)(0.7550685)} = 0.092065.
c(3) = 0.604055{s(1)c(2) + s(2)c(1) + s(3)c(0)} = (0.604055){(0.0772877)(0.092065) + (0.198243)(0.035251) + (0.06398)(0.7550685)} = 0.0377009.
One could calculate c(4), c(5), c(6), ..., in a similar manner.

[104] See Section 9.6 of Loss Models.
[105] See Section 9.6.1 of Loss Models, not on the syllabus.
[106] The densities at 0, 1, 2, and 3 were computed, while the densities from 4 through 10 were displayed.
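The two-stage use of the Panjer Algorithm above lends itself to a short Python sketch (my own illustration of the idea, assuming the parameter values of the example; the function names are mine).

```python
from math import exp

def panjer_poisson(lam, s, n_max):
    """Compound Poisson densities: s is a list of 'severity' densities s[0], s[1], ..."""
    c = [exp(lam * (s[0] - 1.0))]
    for x in range(1, n_max + 1):
        c.append(lam / x * sum(j * (s[j] if j < len(s) else 0.0) * c[x - j]
                               for j in range(1, x + 1)))
    return c

def panjer_geometric(beta, s, n_max):
    """Compound Geometric densities: a = beta/(1+beta), b = 0."""
    a = beta / (1 + beta)
    c = [1.0 / (1 + beta * (1 - s[0]))]          # c(0) = P_geometric(s(0))
    for x in range(1, n_max + 1):
        c.append(sum(a * s[j] * c[x - j] for j in range(1, x + 1)) / (1 - a * s[0]))
    return c

severity = [0.60, 0.10, 0.25, 0.05]
inner = panjer_poisson(1.3, severity, 10)        # Poisson frequency with the given severity
outer = panjer_geometric(0.8, inner, 3)          # Geometric primary, inner result as "severity"
print([round(p, 6) for p in outer])              # approximately [0.755069, 0.035251, 0.092065, 0.037701]
```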
Practical Issues:

Loss Models mentions a number of concerns that may arise in practical applications of the recursive method / Panjer Algorithm.
Whenever one uses recursive techniques, one has to be concerned about the propagation of rounding errors. Small errors can compound at each stage and become very significant. [107] While the chance of this occurring can be minimized by keeping as many significant digits as possible, in general the chance can not be eliminated.
In the case of the Panjer Algorithm, one is particularly concerned about the calculated right hand tail of the aggregate distribution. In the case of a Poisson or Negative Binomial frequency distribution, the relative errors in the tail of the aggregate distribution do not grow quickly; the algorithm is numerically stable. [108] However, in the case of a Binomial frequency, in rare cases the errors in the right hand tail will “blow up.” [109]
Exercise: Aggregate losses are compound Poisson with λ = 1000. There is a 5% chance that the size of a loss is zero. What is the probability that the aggregate losses are 0?
[Solution: P(A=0) = PN(fX(0)) = exp(1000(fX(0) - 1)) = exp(-950) = 2.6 x 10^-413.]
Thus for this case, the probability of the aggregate losses being zero is an extremely small number. Depending on the computer and software used, e^-950 may not be distinguishable from zero. If this value is represented as zero, then the results of the Panjer Algorithm would be complete nonsense. [110]
Exercise: Assume in the above exercise, the probability of aggregate losses at zero, c(0), is mistakenly taken as zero. What is the aggregate distribution calculated by the Panjer algorithm?
[Solution: c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j).
Thus c(1) = {1/(1 - a s(0))} (a + b/1) s(1) c(0) = 0.
Then, c(2) = {1/(1 - a s(0))} {(a + b/2) s(1) c(1) + (a + 2b/2) s(2) c(0)} = 0.
In a similar manner, the whole aggregate distribution would be calculated as zero.]

[107] This is a particular concern when one is applying the recursion formula many times. While on a typical exam question one would apply the recursion formula at most 4 times, in a practical application one could apply it thousands of times.
[108] See Section 9.6.3 in Loss Models.
[109] In this case, the calculated probabilities will alternate sign. Of course the probabilities are actually nonnegative.
[110] Of course with such a large expected frequency, it is likely that the Normal, LogNormal or other approximation to the aggregate losses may be a superior technique to using the Panjer algorithm.
So taking c(0) = 0, rather than the correct c(0) = 2.6 x 10^-413, would defeat the whole purpose of using the Panjer algorithm. Thus we see that it is very important when applying the Panjer Algorithm to such situations either to carefully distinguish between extremely small numbers and zero, or to be a little “clever” in applying the algorithm.
Exercise: Aggregate losses are compound Poisson with λ = 1000. The severity distribution is: f(0) = 5%, f(1) = 75%, and f(2) = 20%. What are the mean and variance of the aggregate losses?
[Solution: The mean of the severity is 1.15. The second moment of the severity is 1.55. Therefore, the mean of the aggregate losses is 1150 and the variance of the aggregate losses is: (1000)(1.55) = 1550.]
Thus in this case the mean of the aggregate losses minus 6 standard deviations is: 1150 - 6√1550 ≈ 914. In general, we expect there to be extremely little probability more than 6 standard deviations below the mean. [111]
One could take c(x) = 0 for x ≤ 913, and c(914) = 1; basically we start the algorithm at 914. Then when we apply the algorithm, the distribution of aggregate losses will not sum to unity, since we arbitrarily chose c(914) = 1. However, at the end we can add up all of the calculated densities and divide by the sum, in order to normalize the distribution of aggregate losses.
Exercise: Assume the aggregate losses have a mean of 100 and standard deviation of 5. Explain how you would apply the Panjer algorithm.
[Solution: One assumes there will be very little probability below 100 - (6)(5) = 70. Thus we take c(x) = 0 for x < 70, and c(70) = 1. Then we apply the Panjer algorithm starting at 70; we calculate c(71), c(72), c(73), ..., c(130). Then we sum up the probabilities. Perhaps they sum to 1,617,012. Then we would divide each of these calculated values by 1,617,012.]
Another way to solve this potential problem would be first to perform the calculation for λ = 1000/128 = 7.8125 rather than 1000. [112] Let g(x) be the result of performing the Panjer algorithm with λ = 7.8125. Then the desired distribution of aggregate losses, corresponding to λ = 1000, can be obtained as the 128-fold convolution g*128.
Note that we can “power-up” the convolutions by successively taking convolutions. For example, (g*8) * (g*8) = g*16, and then in turn (g*16) * (g*16) = g*32. In this manner we need only perform 7 convolutions in order to get the 2^7 = 128th convolution. This technique relies on the property that the sum of independent, identically distributed compound Poisson distributions is another compound Poisson distribution. [113]

[111] Φ(-6) = 9.87 x 10^-10. Loss Models, in Section 9.6.2, suggests starting at 6 standard deviations below the mean.
[112] One would pick some sufficiently large power of 2, such as for example 128.
[113] See “Mahlerʼs Guide to Frequency Distributions.”
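Here is a rough Python sketch of this "compute at λ/128, then power up by convolution" idea (my own illustration, under the assumptions of the exercise above; the helper names and the truncation point of 1500 are mine, chosen to be far above the mean plus several standard deviations).

```python
from math import exp

def panjer_poisson(lam, s, n_max):
    c = [exp(lam * (s[0] - 1.0))]
    for x in range(1, n_max + 1):
        c.append(lam / x * sum(j * (s[j] if j < len(s) else 0.0) * c[x - j]
                               for j in range(1, x + 1)))
    return c

def convolve(f, g, n_max):
    """Discrete convolution of two density lists, truncated at n_max."""
    return [sum(f[j] * g[x - j] for j in range(0, x + 1) if j < len(f) and x - j < len(g))
            for x in range(n_max + 1)]

severity = [0.05, 0.75, 0.20]
n_max = 1500
g = panjer_poisson(1000.0 / 128, severity, n_max)   # no underflow: c(0) = exp(-7.42)
for _ in range(7):                                  # 7 self-convolutions give the 2^7 = 128th power
    g = convolve(g, g, n_max)
mean = sum(x * p for x, p in enumerate(g))
print(round(sum(g), 6), round(mean, 1))             # approximately 1.0 and 1150
```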
Since the compound Negative Binomial shares the same property, one can apply a similar technique.
Exercise: Assume you have a Compound Negative Binomial with β = 20 and r = 30. How might you use the Panjer Algorithm to calculate the distribution of aggregate losses?
[Solution: One could apply the Panjer Algorithm to a Compound Negative Binomial with β = 20 and r = 30/32 = 0.9375, and then take the 32nd convolution of the result.]
For the Binomial, since the m parameter has to be integer, one has to modify the technique slightly.
Exercise: Assume you have a Compound Binomial with q = 0.6 and m = 592. How might you use the Panjer Algorithm to calculate the distribution of aggregate losses?
[Solution: One could apply the Panjer Algorithm to a Compound Binomial with q = 0.6 and m = 1, and then take the 592nd convolution of the result.
Comment: One could get the 2^9 = 512th convolution relatively quickly and then convolute that with the 80th convolution. In base 2, 592 is written as 1001010000. Therefore, in order to get the 592nd convolution, one would retain the 512th, 64th and 16th convolutions, and convolute them at the end.]
Panjer Algorithm (Recursive Method) for the (a,b,1) class:

If the frequency distribution or primary distribution, pk, is a member of the (a,b,1) class, then one can modify the Panjer Algorithm: [114] [115]

c(x) = s(x){p1 - (a+b)p0} / {1 - a s(0)} + {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j),

where p0 = frequency density at zero, and p1 = frequency density at one.
c(0) = Pp(s(0)) = p.g.f. of the frequency distribution at (density of severity distribution at zero).
If p is a member of the (a, b, 0) class, then p1 = (a+b)p0, and the first term of c(x) drops out. Thus this formula reduces to the previously discussed formula for the (a, b, 0) class.

[114] See Theorem 9.8 in Loss Models. While on the syllabus, it is very unlikely that you will be asked about this.
[115] The (a, b, 1) class of frequency distributions includes the (a, b, 0) class. For the (a, b, 1) class, the recursion relationship f(x+1) = f(x){a + b/(x+1)} need only hold for x ≥ 1, rather than x ≥ 0.
Exercise: Calculate the density at 1 for a zero-modified Negative Binomial with β = 2, r = 3, and probability at zero of 22%.
[Solution: Without the modification, f(0) = 1/(1+2)^3 = 0.037037, and f(1) = (3)(2)/(1+2)^4 = 0.074074.
Thus with the zero-modification, the density at one is: (0.074074)(1 - 0.22)/(1 - 0.037037) = 0.06.]
Exercise: What is the probability generating function for a zero-modified Negative Binomial with β = 2, r = 3, and probability at zero of 22%?
[Solution: P(z) = 0.22 + (1 - 0.22)(p.g.f. of zero-truncated Negative Binomial)
= 0.22 + (0.78){(1 - 2(z-1))^-3 - (1+2)^-3} / {1 - (1+2)^-3} = 0.22 + (0.81){(1 - 2(z-1))^-3 - 0.037037}.]
Exercise: Let severity have density: s(0) = 30%, s(1) = 60%, s(2) = 10%.
Aggregate losses are given by a compound zero-modified Negative Binomial distribution, with parameters β = 2, r = 3, and the probability at zero for the zero-modified Negative Binomial is 22%.
Use the Panjer algorithm to calculate the density at 0 of the aggregate losses.
[Solution: From the previous exercise, the zero-modified Negative Binomial has p.g.f. P(z) = 0.22 + (0.81){(1 - 2(z-1))^-3 - 0.037037}.
c(0) = Pp(s(0)) = Pp(0.3) = 0.22 + (0.81){(1 - 2(0.3-1))^-3 - 0.037037} = 0.24859.]
Exercise: Use the Panjer algorithm to calculate the density at 2 of the aggregate losses.
[Solution: For the zero-modified Negative Binomial, a = 2/(1+2) = 2/3 and b = (3-1)(2)/(1+2) = 4/3.
c(x) = s(x){p1 - (a+b)p0} / {1 - a s(0)} + {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){0.06 - (2/3 + 4/3)(0.22)}/{1 - (2/3)(0.3)} + [1/{1 - (2/3)(0.3)}] Σ_{j=1}^{x} (2/3 + 4j/(3x)) s(j) c(x-j)
= -0.475 s(x) + 0.83333 Σ_{j=1}^{x} (1 + 2j/x) s(j) c(x-j).
c(1) = -0.475 s(1) + (0.83333)(1 + (2)(1)/1) s(1) c(0) = (-0.475)(0.6) + (0.83333)(3)(0.6)(0.24859) = 0.087885.
c(2) = -0.475 s(2) + (0.83333){(1 + (2)(1)/2) s(1) c(1) + (1 + (2)(2)/2) s(2) c(0)}
= (-0.475)(0.1) + (0.83333){(2)(0.6)(0.087885) + (3)(0.1)(0.24859)} = 0.10253.
Comment: The densities out to 10 are: 0.24859, 0.087885, 0.102533, 0.102533, 0.0939881, 0.0811716, 0.0671683, 0.0538092, 0.0420268, 0.0321601, 0.0241992.]
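The (a, b, 1) version of the recursion can be checked with the following Python sketch of this zero-modified Negative Binomial example (my own illustration; the variable names are mine, and the recursion is the one stated above).

```python
beta, r = 2.0, 3.0
p0_modified = 0.22
s = [0.30, 0.60, 0.10]                        # severity densities s(0), s(1), s(2)

a = beta / (1 + beta)                         # 2/3
b = (r - 1) * beta / (1 + beta)               # 4/3
f0 = (1 + beta) ** (-r)                       # unmodified Negative Binomial density at 0
f1 = r * beta * (1 + beta) ** (-(r + 1))      # unmodified Negative Binomial density at 1
p1 = f1 * (1 - p0_modified) / (1 - f0)        # zero-modified density at 1 = 0.06

def P(z):                                     # p.g.f. of the zero-modified Negative Binomial
    return p0_modified + (1 - p0_modified) * ((1 - beta * (z - 1)) ** (-r) - f0) / (1 - f0)

c = [P(s[0])]                                 # c(0) = P(s(0)) = 0.24859
for x in range(1, 11):
    extra = (s[x] if x < len(s) else 0.0) * (p1 - (a + b) * p0_modified)
    tail = sum((a + j * b / x) * s[j] * c[x - j] for j in range(1, min(x, len(s) - 1) + 1))
    c.append((extra + tail) / (1 - a * s[0]))
print([round(v, 6) for v in c[:3]])           # approximately [0.24859, 0.087885, 0.102533]
```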
Here is a graph of the density of the aggregate losses, for aggregate losses from 0 to 20 (graph not reproduced).
Other than the large probability of zero aggregate losses, the aggregate losses look like they could be approximated by one of the size of loss distributions in Appendix A of Loss Models.

Continuous Severity Distributions:

If one has a continuous severity distribution s(x), and the frequency distribution, p, is a member of the (a, b, 1) class, [116] then one has an integral equation for the distribution of aggregate losses, c, similar to the Panjer Algorithm: [117]

c(x) = p1 s(x) + ∫_0^x (a + by/x) s(y) c(x-y) dy.

Loss Models merely states this result without using it. Instead as has been discussed, Loss Models demonstrates how one can employ the Panjer algorithm using a discrete severity distribution. One can either have started with a discrete severity distribution, or one can have approximated a continuous severity distribution by a discrete severity distribution, as will be discussed in the next section.

[116] It also holds for the members of the (a, b, 0) class, which is a subset of the (a, b, 1) class.
[117] See Theorem 9.26 in Loss Models. This is a Volterra integral equation of the second kind. See for example Appendix D of Insurance Risk Models, by Panjer and Willmot.
Problems:

Use the following information for the next 6 questions:
• Frequency follows a zero-truncated Poisson with λ = 0.8.
• For the zero-truncated Poisson, P(z) = (e^(λz) - 1) / (e^λ - 1), a = 0, and b = λ.
• Severity is discrete and takes on the following values:

Size           0     1     2     3     4
Probability    20%   40%   20%   10%   10%

• Frequency and Severity are independent.

8.1 (2 points) What is the probability that the aggregate losses are zero?
A. less than 12%  B. at least 12% but less than 13%  C. at least 13% but less than 14%  D. at least 14% but less than 15%  E. at least 15%

8.2 (2 points) What is the probability that the aggregate losses are one?
A. less than 30%  B. at least 30% but less than 31%  C. at least 31% but less than 32%  D. at least 32% but less than 33%  E. at least 33%

8.3 (2 points) What is the probability that the aggregate losses are two?
A. less than 19%  B. at least 19% but less than 20%  C. at least 20% but less than 21%  D. at least 21% but less than 22%  E. at least 22%

8.4 (2 points) What is the probability that the aggregate losses are three?
A. less than 9%  B. at least 9% but less than 10%  C. at least 10% but less than 11%  D. at least 11% but less than 12%  E. at least 12%

8.5 (3 points) What is the probability that the aggregate losses are four?
A. less than 9%  B. at least 9% but less than 10%  C. at least 10% but less than 11%  D. at least 11% but less than 12%  E. at least 12%

8.6 (3 points) What is the probability that the aggregate losses are five?
A. less than 5%  B. at least 5% but less than 6%  C. at least 6% but less than 7%  D. at least 7% but less than 8%  E. at least 8%
Solutions to Problems:

8.1. D. P(z) = (e^(λz) - 1)/(e^λ - 1) = (e^(0.8z) - 1)/(e^0.8 - 1).
c(0) = P(s(0)) = P(0.2) = (e^((0.8)(0.2)) - 1)/(e^0.8 - 1) = 0.141579.

8.2. B. For the zero-truncated Poisson, a = 0 and b = λ = 0.8. p(0) = 0. p(1) = 0.8e^-0.8/(1 - e^-0.8) = 0.652773.
c(x) = s(x){p(1) - (a+b)p(0)}/(1 - a s(0)) + {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j)
= s(x){0.652773 - (0.8)(0)}/{1 - (0)(0.2)} + [1/{1 - (0)(0.2)}] Σ_{j=1}^{x} (0 + 0.8j/x) s(j) c(x-j)
= 0.652773 s(x) + 0.8 Σ_{j=1}^{x} (j/x) s(j) c(x-j).
c(1) = 0.652773 s(1) + (0.8)(1/1) s(1) c(0) = (0.652773)(0.4) + (0.8)(1)(0.4)(0.141579) = 0.306414.

8.3. C. c(2) = 0.652773 s(2) + (0.8){(1/2)s(1)c(1) + (2/2)s(2)c(0)}
= (0.652773)(0.2) + (0.8){(1/2)(0.4)(0.306414) + (1)(0.2)(0.141579)} = 0.202233.

8.4. E. c(3) = 0.652773 s(3) + (0.8){(1/3)s(1)c(2) + (2/3)s(2)c(1) + (3/3)s(3)c(0)}
= (0.652773)(0.1) + (0.8){(1/3)(0.4)(0.202233) + (2/3)(0.2)(0.306414) + (3/3)(0.1)(0.141579)} = 0.130859.

8.5. E. c(4) = 0.652773 s(4) + (0.8){(1/4)s(1)c(3) + (2/4)s(2)c(2) + (3/4)s(3)c(1) + (4/4)s(4)c(0)}
= (0.652773)(0.1) + (0.8){(1/4)(0.4)(0.130859) + (2/4)(0.2)(0.202233) + (3/4)(0.1)(0.306414) + (4/4)(0.1)(0.141579)} = 0.121636.
8.6. A. c(5) = 0.652773 s(5) + (0.8){(1/5)s(1)c(4) + (2/5)s(2)c(3) + (3/5)s(3)c(2) + (4/5)s(4)c(1) + (5/5)s(5)c(0)}
= (0.652773)(0) + (0.8){(1/5)(0.4)(0.121636) + (2/5)(0.2)(0.130859) + (3/5)(0.1)(0.202233) + (4/5)(0.1)(0.306414) + (5/5)(0)(0.141579)} = 0.045477.
Comment: The distribution of aggregate losses from 0 to 15 is: 0.141579, 0.306414, 0.202234, 0.130859, 0.121636, 0.0454774, 0.0249329, 0.0133713, 0.00776193, 0.00303326, 0.00146421, 0.000689169, 0.000325073, 0.000126662, 0.0000556073, 0.0000237919.
Here is a graph of these densities (graph not reproduced).
Section 9, Discretization [118]

With a continuous severity distribution, in order to apply the Recursive Method / Panjer Algorithm, one would first need to approximate this continuous distribution by a discrete severity distribution. There are a number of methods one could use to do this.

Method of Rounding: [119]

Assume severity follows an Exponential distribution with θ = 100.
For example, we could use a discrete distribution g, with support 0, 20, 40, 60, 80, 100, etc.
Then we could take g(0) = F(20/2) = F(10) = 1 - e^(-10/100) = 1 - e^-0.1 = 0.095163.
We could let g(20) = F(30) - F(10) = e^-0.1 - e^-0.3 = 0.164019. [120]
Exercise: Continuing in this manner, what is g(40)?
[Solution: g(40) = F(50) - F(30) = (1 - e^(-50/100)) - (1 - e^(-30/100)) = e^-0.3 - e^-0.5 = 0.134288.]
Graphically, one can think of this procedure as balls dropping from above, with their probability horizontally following the density of this Exponential Distribution. The method of rounding is like setting up a bunch of cups, each of width the span of 20, centered at 0, 20, 40, 60, etc. (diagram not reproduced).
Then the expected percentage of balls falling in each cup is the discrete probability produced by the method of rounding. This discrete probability is placed at the center of each cup.

[118] See Section 9.6.5 of Loss Models.
[119] See Section 9.6.5.1 of Loss Models. Also called the method of mass dispersal.
[120] Loss Models actually takes g(0) = F(10) - Prob(10), g(20) = (F(30) - Prob(30)) - (F(10) - Prob(10)), etc. This makes no difference for a continuous distribution such as the Exponential. It would make a difference if there happened to be a point mass of probability at either 10 or 30. Loss Models provides no explanation for this choice of including a point mass at 30 in the discretized distribution at 40. It is unclear that this choice is preferable to instead either including a point mass at 30 in the discretized distribution at 20 or splitting it equally between 20 and 40.
We could arrange this calculation in a spreadsheet:

x     F(x+10)    g(x)          x     F(x+10)    g(x)
0     0.095163   0.095163      400   0.983427   0.003669
20    0.259182   0.164019      420   0.986431   0.003004
40    0.393469   0.134288      440   0.988891   0.002460
60    0.503415   0.109945      460   0.990905   0.002014
80    0.593430   0.090016      480   0.992553   0.001649
100   0.667129   0.073699      500   0.993903   0.001350
120   0.727468   0.060339      520   0.995008   0.001105
140   0.776870   0.049402      540   0.995913   0.000905
160   0.817316   0.040447      560   0.996654   0.000741
180   0.850431   0.033115      580   0.997261   0.000607
200   0.877544   0.027112      600   0.997757   0.000497
220   0.899741   0.022198      620   0.998164   0.000407
240   0.917915   0.018174      640   0.998497   0.000333
260   0.932794   0.014879      660   0.998769   0.000273
280   0.944977   0.012182      680   0.998992   0.000223
300   0.954951   0.009974      700   0.999175   0.000183
320   0.963117   0.008166      720   0.999324   0.000150
340   0.969803   0.006686      740   0.999447   0.000122
360   0.975276   0.005474      760   0.999547   0.000100
380   0.979758   0.004482      780   0.999629   0.000082
400   0.983427   0.003669      800   0.999696   0.000067
The discrete distribution g is the result of discretizing the continuous Exponential distribution. Note that one could continue beyond 800 in the same manner, until the probabilities got sufficiently small for a particular application. This is an example of the Method of Rounding.
For the Method of Rounding with span h, construct the discrete distribution g:
g(0) = F(h/2).
g(ih) = F(h(i + 1/2)) - F(h(i - 1/2)).
We have that for example F(30) = G(30) = 1 - e^-0.3. In this example, F and G match at 10, 30, 50, 70, etc. In general, the Distribution Functions match at all of the points halfway between the support of the discretized distribution obtained from the method of rounding. [121]
In this example, the span was 20, the spacing between the chosen discrete sizes. If one had instead taken a span of 200, there would have been one tenth as many points and the approximation would have been worse. If instead one had taken a span of 2, there would have been 10 times as many points and the approximation would have been much better.

[121] This contrasts with the method of local moment matching, to be discussed subsequently.
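The formulas above translate directly into a few lines of code. Here is a minimal Python sketch of the method of rounding (mine, not from the text); the function name discretize_rounding is illustrative.

```python
from math import exp

def discretize_rounding(cdf, span, n_points):
    """Method of rounding: g(0) = F(h/2), g(ih) = F((i+1/2)h) - F((i-1/2)h)."""
    g = [cdf(span / 2.0)]
    g += [cdf((i + 0.5) * span) - cdf((i - 0.5) * span) for i in range(1, n_points + 1)]
    return g

theta = 100.0
F = lambda x: 1.0 - exp(-x / theta)          # Exponential distribution function
g = discretize_rounding(F, 20.0, 40)         # support 0, 20, ..., 800
print([round(v, 6) for v in g[:3]])          # approximately [0.095163, 0.164019, 0.134288]
```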
Since one discretizes in order to simplify calculations, usually one wants to have fewer points, and thus a larger span. This goal conflicts with the desire to have a good approximation to the continuous distribution, which requires a smaller span. Thus in practical applications one needs to select a span that is neither too small nor too large. One can always test whether making the span smaller would materially affect your results.
One could use this discretized severity distribution obtained from the method of rounding in the Panjer Algorithm in order to approximate the aggregate distribution.
Exercise: Using the above discretized approximation to the Exponential distribution with θ = 100, and a Geometric frequency with β = 9, calculate the first four densities of the aggregate distribution via the Panjer Algorithm.
[Solution: The discretized distribution has span of 20, so we treat 20 as 1, 40 as 2, etc., for purposes of the Panjer Algorithm.
The p.g.f. of the Geometric Distribution is: P(z) = 1/{1 - 9(z-1)} = 1/(10 - 9z).
c(0) = P(s(0)) = P(0.095163) = 1/{10 - 9(0.095163)} = 0.10937.
For the Geometric Distribution: a = β/(1+β) = 9/10 = 0.9 and b = 0. 1/(1 - a s(0)) = 1/{1 - (0.9)(0.095163)} = 1.10967.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = 1.10967 Σ_{j=1}^{x} 0.9 s(j) c(x-j) = 0.98430 Σ_{j=1}^{x} s(j) c(x-j).
c(1) = 0.98430 s(1) c(0) = (0.98430)(0.164019)(0.10937) = 0.01766.
c(2) = 0.98430{s(1)c(1) + s(2)c(0)} = (0.98430){(0.164019)(0.01766) + (0.134288)(0.10937)} = 0.01731.
c(3) = 0.98430{s(1)c(2) + s(2)c(1) + s(3)c(0)} = (0.98430){(0.164019)(0.01731) + (0.134288)(0.01766) + (0.109945)(0.10937)} = 0.01695.]
Thus the approximate discrete densities of the aggregate distribution at 0, 20, 40, and 60 are: 0.10937, 0.01766, 0.01731, 0.01695.
Exercise: What is the moment generating function of aggregate losses if the severity is Exponential with θ = 100 and frequency is Geometric with β = 9?
[Solution: For the Exponential Distribution the m.g.f. is: MX(t) = (1 - 100t)^-1.
The p.g.f. of the Geometric Distribution is: P(z) = 1/{1 - 9(z-1)} = 1/(10 - 9z).
MA(t) = 1/{10 - 9(1 - 100t)^-1} = (1 - 100t) / (1 - 1000t).]
Note that (1 - 100t)/(1 - 1000t) = 0.1 + (0.9){1/(1 - 1000t)}.
This is the weighted average of the m.g.f. of a point mass at zero and the m.g.f. of an Exponential distribution with mean 1000. Therefore, the aggregate distribution is a weighted average of a point mass at zero and an Exponential distribution with mean 1000, using weights 10% and 90%. [122]
Thus the distribution function of aggregate losses for x > 0 is: C(x) = 0.1 + 0.9(1 - e^(-x/1000)) = 1 - 0.9e^(-x/1000).
One can create a discrete approximation to this aggregate distribution via the method of rounding with a span of 20. Here are the first four discrete densities:
g(0) = C(10) = 1 - 0.9e^-0.01 = 0.10896.
g(20) = C(30) - C(10) = 0.9(e^-0.01 - e^-0.03) = 0.01764.
g(40) = C(50) - C(30) = 0.9(e^-0.03 - e^-0.05) = 0.01729.
g(60) = C(70) - C(50) = 0.9(e^-0.05 - e^-0.07) = 0.01695.
This better discrete approximation to the aggregate distribution is similar to the previous approximation obtained by applying the Panjer Algorithm using the approximate severity distribution:

x     Discrete Density from Applying Panjer      Discrete Density from Applying Method of
      Algorithm to the Approximate Severity      Rounding to the Exact Aggregate Distribution
0     0.10937                                    0.10896
20    0.01766                                    0.01764
40    0.01731                                    0.01729
60    0.01695                                    0.01695

[122] This is an example of a general result discussed in my section on Analytic Results.
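The comparison in the table above can be reproduced end to end: discretize the severity, run the Geometric Panjer recursion, and compare with the exact aggregate distribution 1 - 0.9e^(-x/1000). Below is a short Python sketch of that check (my own illustration; variable names are mine).

```python
from math import exp

theta, beta, span = 100.0, 9.0, 20.0
F_sev = lambda x: 1.0 - exp(-x / theta)
# Discretized severity by the method of rounding:
s = [F_sev(span / 2)] + [F_sev((i + 0.5) * span) - F_sev((i - 0.5) * span) for i in range(1, 200)]

a = beta / (1 + beta)
c = [1.0 / (1 + beta * (1 - s[0]))]                      # Panjer with a Geometric frequency
for x in range(1, 4):
    c.append(sum(a * s[j] * c[x - j] for j in range(1, x + 1)) / (1 - a * s[0]))

C_exact = lambda x: 1.0 - 0.9 * exp(-x / 1000.0)         # exact aggregate distribution function
exact = [C_exact(10)] + [C_exact((i + 0.5) * span) - C_exact((i - 0.5) * span) for i in range(1, 4)]
print([round(v, 5) for v in c])        # approximately [0.10937, 0.01766, 0.01731, 0.01695]
print([round(v, 5) for v in exact])    # approximately [0.10896, 0.01764, 0.01729, 0.01695]
```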
Exercise: Create a discrete approximation to a Pareto Distribution with θ = 40 and α = 3, using the method of rounding with a span of 50. Stop at 1000.
[Solution: F(x) = 1 - {40/(40 + x)}^3.
For example, g(50) = F(75) - F(25) = (40/(40 + 25))^3 - (40/(40 + 75))^3 = 0.190964.
g(0) = F(25) = 0.767.
g(50) = F(75) - F(25) = 0.958 - 0.767 = 0.191.
g(100) = F(125) - F(75) = 0.986 - 0.958 = 0.028.
g(150) = F(175) - F(125) = 0.994 - 0.986 = 0.008.
g(200) = F(225) - F(175) = 0.997 - 0.994 = 0.003.
etc.

x      F(x+25)    g(x)            x      F(x+25)    g(x)
0      0.766955   0.766955        550    0.999725   0.000080
50     0.957919   0.190964        600    0.999782   0.000058
100    0.985753   0.027834        650    0.999825   0.000043
150    0.993560   0.007807        700    0.999857   0.000032
200    0.996561   0.003001        750    0.999882   0.000025
250    0.997952   0.001391        800    0.999901   0.000019
300    0.998684   0.000731        850    0.999916   0.000015
350    0.999105   0.000421        900    0.999929   0.000012
400    0.999363   0.000259        950    0.999939   0.000010
450    0.999531   0.000168        1000   0.999947   0.000008
500    0.999645   0.000114

Comment: By stopping at 1000, there is 1 - 0.999947 = 0.000053 of probability not included in the discrete approximation. One could place this additional probability at some convenient spot. For example, we could figure out where 1 - F(x) = 0.000053/2. This occurs at x = 1302. Thus one might put a probability of 0.000053 at 1300.]
The sum of the first n densities that result from the method of rounding is:
F(h/2) + {F(3h/2) - F(h/2)} + {F(5h/2) - F(3h/2)} + ... + {F(h(n + 1/2)) - F(h(n - 1/2))} = F(h(n + 1/2)).
As n goes to infinity, this sum approaches F(∞) = 1. Thus the method of rounding includes in the discrete distribution all of the probability.
Average of the Result of the Method of Rounding:

The mean of the discrete distribution that results from the method of rounding is:
0 F(h/2) + h{F(3h/2) - F(h/2)} + 2h{F(5h/2) - F(3h/2)} + 3h{F(7h/2) - F(5h/2)} + ...
= h{S(h/2) - S(3h/2)} + 2h{S(3h/2) - S(5h/2)} + 3h{S(5h/2) - S(7h/2)} + ...
= h{S(h/2) + S(3h/2) + S(5h/2) + S(7h/2) + ...} ≅ ∫_0^∞ S(x) dx = E[X].
Thus the method of rounding produces a discrete distribution with approximately the same mean as the continuous distribution we are approximating. The smaller the span, h, the better the approximation will be.
Here is a computation of the mean of the previous result of applying the method of rounding with span 20 to an Exponential distribution with θ = 100:

x     F(x+10)   g(x)     Extension      x     F(x+10)   g(x)      Extension
0     0.0952    0.0952   0.0000         400   0.98343   0.00367   1.4677
20    0.2592    0.1640   3.2804         420   0.98643   0.00300   1.2617
40    0.3935    0.1343   5.3715         440   0.98889   0.00246   1.0822
60    0.5034    0.1099   6.5967         460   0.99090   0.00201   0.9263
80    0.5934    0.0900   7.2013         480   0.99255   0.00165   0.7914
100   0.6671    0.0737   7.3699         500   0.99390   0.00135   0.6749
120   0.7275    0.0603   7.2407         520   0.99501   0.00111   0.5747
140   0.7769    0.0494   6.9162         540   0.99591   0.00090   0.4886
160   0.8173    0.0404   6.4715         560   0.99665   0.00074   0.4149
180   0.8504    0.0331   5.9607         580   0.99726   0.00061   0.3518
200   0.8775    0.0271   5.4224         600   0.99776   0.00050   0.2979
220   0.8997    0.0222   4.8835         620   0.99816   0.00041   0.2521
240   0.9179    0.0182   4.3617         640   0.99850   0.00033   0.2130
260   0.9328    0.0149   3.8687         660   0.99877   0.00027   0.1799
280   0.9450    0.0122   3.4110         680   0.99899   0.00022   0.1517
300   0.9550    0.0100   2.9922         700   0.99917   0.00018   0.1279
320   0.9631    0.0082   2.6131         720   0.99932   0.00015   0.1077
340   0.9698    0.0067   2.2732         740   0.99945   0.00012   0.0906
360   0.9753    0.0055   1.9706         760   0.99955   0.00010   0.0762
380   0.9798    0.0045   1.7030         780   0.99963   0.00008   0.0640
400   0.9834    0.0037   1.4677         800   0.99970   0.00007   0.0538
                                        Sum                       101.0249

In this case, the mean of the discrete distribution is 101, compared to 100 for the Exponential. For a longer-tailed distribution such as a Pareto, the approximation might not be this close.
Method of Local Moment Matching: [123]

The method of local moment matching is another technique for approximating a continuous distribution by a discrete distribution with a span of h. In the method of moment matching, the approximating distribution will have the same lower moments as the original distribution. In the simplest case, one requires that the means match. In a more complicated version, one could require that both the first and second moments match.
In order to have the means match, using a span of h, the approximating densities are: [124]
g(0) = 1 - E[X ∧ h]/h.
g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]} / h.
For example, for an Exponential Distribution, E[X ∧ x] = θ(1 - e^(-x/θ)).
g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]}/h = θe^(-ih/θ){e^(h/θ) + e^(-h/θ) - 2}/h.
For an Exponential Distribution with θ = 100 using a span of 20:
g(0) = 1 - E[X ∧ 20]/20 = 1 - (100)(1 - e^-0.2)/20 = 0.093654.
g(ih) = θe^(-ih/θ){e^(h/θ) + e^(-h/θ) - 2}/h = (100)e^(-i/5){e^0.2 + e^-0.2 - 2}/20 = 0.200668 e^(-i/5).
g(20) = 0.200668 e^(-1/5) = 0.164293. g(40) = 0.200668 e^(-2/5) = 0.134511.
Out to 800, the approximating distribution is: [125]
0.093654, 0.164293, 0.134511, 0.110129, 0.090166, 0.073821, 0.060440, 0.049484, 0.040514, 0.033170, 0.027157, 0.022235, 0.018204, 0.014904, 0.012203, 0.009991, 0.008180, 0.006697, 0.005483, 0.004489, 0.003675, 0.003009, 0.002464, 0.002017, 0.001651, 0.001352, 0.001107, 0.000906, 0.000742, 0.000608, 0.000497, 0.000407, 0.000333, 0.000273, 0.000223, 0.000183, 0.000150, 0.000123, 0.000100, 0.000082, 0.000067.
In general, calculating the mean matching discrete distribution requires that one calculate the limited expected value of the original distribution at each of the spanning points.

[123] See Section 9.6.5.2 of Loss Models.
[124] For a distribution with positive support, for example x > 0. Obviously, this would only be applied to a distribution with a finite mean.
[125] While this is close to the method of rounding approximation calculated previously, they differ.
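Here is a minimal Python sketch of this mean-matching discretization (mine, not from the text), applied to the Exponential example above; the function name discretize_mean_matching is illustrative. The last line checks numerically that the means approximately match, which is the point of the method.

```python
from math import exp

def discretize_mean_matching(lev, span, n_points):
    """Match means locally: g(0) = 1 - E[X ∧ h]/h, g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]}/h."""
    g = [1.0 - lev(span) / span]
    g += [(2 * lev(i * span) - lev((i - 1) * span) - lev((i + 1) * span)) / span
          for i in range(1, n_points + 1)]
    return g

theta = 100.0
lev_expo = lambda x: theta * (1.0 - exp(-x / theta))     # E[X ∧ x] for the Exponential
g = discretize_mean_matching(lev_expo, 20.0, 100)
print([round(v, 6) for v in g[:3]])                      # approximately [0.093654, 0.164293, 0.134511]
print(round(sum(i * 20.0 * gi for i, gi in enumerate(g)), 2))   # approximately 100.0, the Exponential mean
```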
Exercise: Create a discrete approximation to a Pareto Distribution with θ = 40 and α = 3, matching the mean, with a span of 10. Stop at 400.
[Solution: E[X ∧ x] = {θ/(α-1)}(1 - {θ/(x+θ)}^(α-1)) = 20(1 - {40/(x+40)}^2).

x     LEV(x)    g(x)      Extension
0     0.0000    28.000%   0.0000
10    7.2000    32.889%   3.2889
20    11.1111   15.528%   3.1057
30    13.4694   8.277%    2.4830
40    15.0000   4.812%    1.9249
50    16.0494   2.988%    1.4938
60    16.8000   1.952%    1.1715
70    17.3554   1.330%    0.9308
80    17.7778   0.937%    0.7494
90    18.1065   0.679%    0.6110
100   18.3673   0.504%    0.5041
110   18.5778   0.382%    0.4203
120   18.7500   0.295%    0.3539
130   18.8927   0.231%    0.3006
140   19.0123   0.184%    0.2574
150   19.1136   0.148%    0.2220
160   19.2000   0.121%    0.1928
170   19.2744   0.099%    0.1685
180   19.3388   0.082%    0.1480
190   19.3951   0.069%    0.1308
200   19.4444   0.058%    0.1161
210   19.4880   0.049%    0.1035
220   19.5266   0.042%    0.0927
230   19.5610   0.036%    0.0833
240   19.5918   0.031%    0.0751
250   19.6195   0.027%    0.0680
260   19.6444   0.024%    0.0617
270   19.6670   0.021%    0.0562
280   19.6875   0.018%    0.0514
290   19.7062   0.016%    0.0470
300   19.7232   0.014%    0.0432
310   19.7388   0.013%    0.0397
320   19.7531   0.011%    0.0366
330   19.7663   0.010%    0.0338
340   19.7784   0.009%    0.0313
350   19.7896   0.008%    0.0291
360   19.8000   0.008%    0.0270
370   19.8096   0.007%    0.0252
380   19.8186   0.006%    0.0235
390   19.8269   0.006%    0.0219
400   19.8347   0.005%    0.0205
410   19.8420
Sum                       19.5441

For example, g(0) = 1 - E[X ∧ 10]/10 = 1 - 7.2/10 = 28%.
g(10) = {2E[X ∧ 10] - E[X ∧ 20] - E[X ∧ 0]}/10 = {(2)(7.2) - 11.111 - 0}/10 = 32.89%.
g(20) = {2E[X ∧ 20] - E[X ∧ 30] - E[X ∧ 10]}/10 = {(2)(11.111) - 13.469 - 7.2}/10 = 15.53%.
Comment: Summing through 400, the mean of the approximating distribution is 19.544 < 20, the mean of the Pareto. The Pareto is a long-tailed distribution, and we would need to include values of the approximating distribution beyond g(400), in order to get closer to the mean.]
Relationship to Layers of the Method of Mean Matching:

Note that the numerator of g(ih) can be written as a difference of layers of loss: [126]

g(ih) = {(E[X ∧ ih] - E[X ∧ (i-1)h]) - (E[X ∧ (i+1)h] - E[X ∧ ih])} / h = { ∫_{ih-h}^{ih} S(x) dx - ∫_{ih}^{ih+h} S(x) dx } / h.
This numerator is nonnegative, since S(x) is a nonincreasing function of x. Thus all of the approximating discrete densities are nonnegative, when we match the mean. [127]

h g(ih) = {(E[X ∧ ih] - E[X ∧ (i-1)h]) - (E[X ∧ (i+1)h] - E[X ∧ ih])} = Lay_i - Lay_{i+1},

where Lay_i is the layer from (i-1)h to ih, i = 1, 2, 3, ...
The first four of these successive layers are shown on the following Lee Diagram, with Size on the vertical axis (at heights h, 2h, 3h, 4h) and Probability on the horizontal axis (diagram not reproduced). [128]

g(ih) = Lay_i/h - Lay_{i+1}/h = (average width of area i) - (average width of area i+1)
= (average contribution of S(x) to Layer i) - (average contribution of S(x) to Layer i+1).

[126] This formula works even when i = 0, since we have assumed S(x) = 1 for x ≤ 0.
[127] If one matches the first two moments, this nice property does not necessarily hold.
[128] Lee Diagrams are not on the syllabus of this exam. See “Mahlerʼs Guide to Loss Distributions”.
Demonstration that the Densities Given by the Formulas Do Match the Mean:

Σ_{i=0}^{n} g(ih) = Σ_{i=0}^{n} { ∫_{ih-h}^{ih} S(x) dx - ∫_{ih}^{ih+h} S(x) dx } / h = { ∫_{-h}^{0} S(x) dx - ∫_{nh}^{nh+h} S(x) dx } / h = 1 - { ∫_{nh}^{nh+h} S(x) dx } / h.

The final term goes to zero as n approaches ∞, since S(x) goes to zero as x approaches ∞.
Therefore, the densities of the approximating distribution do sum to 1.

Σ_{i=0}^{n} ih g(ih) = Σ_{i=1}^{n} i { ∫_{ih-h}^{ih} S(x) dx - ∫_{ih}^{ih+h} S(x) dx } = Σ_{i=0}^{n-1} (i+1) ∫_{ih}^{ih+h} S(x) dx - Σ_{i=1}^{n} i ∫_{ih}^{ih+h} S(x) dx
= Σ_{i=0}^{n-1} ∫_{ih}^{ih+h} S(x) dx - n ∫_{nh}^{nh+h} S(x) dx = ∫_{0}^{nh} S(x) dx - n ∫_{nh}^{nh+h} S(x) dx.

As n approaches infinity, the first term goes to the integral from zero to infinity of the survival function, which is the mean. Assuming the mean exists, xS(x) goes to zero as x approaches infinity. [129] Therefore, the second term goes to zero as n approaches infinity. [130]
Therefore, the mean of the discretized distribution, g, matches the mean of the original distribution.
One can rewrite the above as:
Σ_{i=0}^{n} ih g(ih) = E[X ∧ nh] - n{E[X ∧ nh+h] - E[X ∧ nh]} = (n+1) E[X ∧ nh] - n E[X ∧ nh+h].
For example, when the Pareto was approximated, the sum up to n = 40 was:
(41)E[X ∧ 400] - (40)E[X ∧ 410] = (41)(19.8347) - (40)(19.8420) = 19.54.
Another way of showing that the mean of the approximating discrete distribution matches that of the original continuous distribution:
Σ_{i=0}^{∞} ih g(ih) = Σ_{i=1}^{∞} i (Lay_i - Lay_{i+1}) = Σ_{i=1}^{∞} i Lay_i - Σ_{i=1}^{∞} i Lay_{i+1} = Σ_{i=1}^{∞} i Lay_i - Σ_{i=1}^{∞} (i-1) Lay_i = Σ_{i=1}^{∞} Lay_i = Mean.
2013-4-3,
Aggregate Distributions §9 Discretization,
HCM 10/23/12,
Page 263
Matching the First Two Moments: According to Loss Models, matching the first two moments for the discretized distribution, results in more accurate results, when for example calculating stop loss premiums. While the equations for moment matching shown in Loss Models can be written out for the case of matching the first two moments and then can be programmed on a computer, this is well beyond the level of calculations you should be expected to perform on the exam!131 Matching the first two moments, the densities of the approximating distribution are:132 2h
g(0) =
∫0 (x2 - 3hx + 2h2) f(x) dx / (2h2). ih+h
For i odd, g(ih) = -
∫ih-h {x2 - 2ihx + (i2 -1)h2} f(x) dx / h2.
For i even, g(ih) = ih
{
ih+2h
{x2 - (2i - 3)hx + (i-1)(i - 2)h2 } f(x) dx + ∫ ∫ih ih-2h
{x2 - (2i + 3)hx + (i+ 1)(i + 2)h2} f(x) dx }/(2h2 ).
Applying these formulas to an Exponential Distribution with mean 50, using a span of 10, f(x) = e-x/50/50 and h = 10: 2h
g(0) =
20
∫0 (x2 - 3hx + 2h2) f(x) dx / (2h2) = ∫0 {(x2 - 30x + 200) e- x / 50 / 50} dx / 200
= 0.0661987. 10+h
g(10) = -
20
{x2 - 2ihx + (i2 -1)h 2} f(x) dx /h2 = - {(x2 - 20x) e- x / 50 / 50} dx / 100 ∫ ∫0 10-h
= 0.219203.
131
Loss Models does not show an example of matching the first two moments. Matching the first three moments is even more complicated. 132 Derived from equations 9.28 and 9.29 in Loss Models.
2013-4-3,
Aggregate Distributions §9 Discretization,
20
g(20) = {
HCM 10/23/12,
Page 264
40
∫0 {(x2 - 10x) e- x / 50 / 50} dx + 20∫ {(x2 - 70x + 1200) e - x / 50 / 50} dx} / 200
= 0.0886528. Note these formulas involve various integrals of x2 f(x) and x f(x). Thus, one needs to be able to calculate such integrals. For an Exponential Distribution: b
∫a
b
x2 f(x) dx =
∫a
x =b
x2 e- x / θ / θ dx = -(x2 + 2xθ + 2θ2)e - x / θ ]
x=a
= (a2 + 2aθ + 2θ2)e-a/θ - (b2 + 2bθ + 2θ2)e-b/θ.
b
b
∫a x f(x) dx = ∫a x
e- x / θ / θ
dx = -(x +
x= b x / θ θ)e
]
= (a + θ)e-a/θ - (b + θ)e-b/θ.
x= a
The resulting approximating densities at 0, 10, 20, 30,..., 600 were: 0.0661987, 0.219203, 0.0886528, 0.146936, 0.0594257, 0.0984942, 0.0398343, 0.0660226, 0.0267017, 0.0442563, 0.0178987, 0.0296659, 0.0119979, 0.0198856, 0.0080424, 0.0133297, 0.00539098, 0.00893519, 0.00361368, 0.00598944, 0.00242232, 0.00401484, 0.00162373, 0.00269123, 0.00108842, 0.00180398, 0.00072959, 0.00120925, 0.000489059, 0.000810582, 0.000327826. Coverage Modifications: I have previously discussed the effect of deductibles and maximum covered losses on the aggregate distribution. Once one has the modified frequency and severity distributions, one can apply the Panjer Algorithm or other technique of estimating the aggregate losses in the usual manner. In the case of a continuous severity, one could perform the modification and discretization in either order. However, Loss Models recommends that you perform the modification first and discretization second.133
133
See Section 9.7 of Loss Models.
2013-4-3,
Aggregate Distributions §9 Discretization,
HCM 10/23/12,
Page 265
Problems: Use the following information for the next two questions: One creates a discrete approximation to a Weibull Distribution with θ = 91 and τ = 1.4, using the method of rounding with a span of 25. 9.1 (1 point) What is the density of this discrete approximation at 150? A. less than 6.0% B. at least 6.0% but less than 6.1% C. at least 6.1% but less than 6.2% D. at least 6.2% but less than 6.3% E. at least 6.3% 9.2 (1 point) For this discrete approximation, what is the probability of a loss less than or equal to 75? A. less than 54% B. at least 54% but less than 56% C. at least 56% but less than 58% D. at least 58% but less than 60% E. at least 60%
9.3 (2 points) An Exponential Distribution with θ = 70 is approximated using the method of matching means with a span of 5. What is the density of the approximating distribution at 60? A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0% 9.4 (2 points) A LogNormal Distribution with µ = 8 and σ = 2 is approximated using the method of rounding with a span of 2000. What is the density of the approximating distribution at 20,000? A. 1.3% B. 1.5% C. 1.7% D. 1.9% E. 2.1% Use the following information for the next two questions: A Pareto Distribution with θ = 1000 and α = 2 is approximated using the method of matching means with a span of 100. 9.5 (2 points) What is the density of the approximating distribution at 500? A. 4.0% B. 4.5% C. 5.0% D. 5.5% E. 6.0% 9.6 (2 points) What is the density of the approximating distribution at 0? A. 8.0% B. 8.5% C. 9.0% D. 9.5% E. 10.0%
2013-4-3,
Aggregate Distributions §9 Discretization,
HCM 10/23/12,
Page 266
9.7 (1 point) An Exponential Distribution with θ = 300 is approximated using the method of rounding with a span of 50. What is the density of the approximating distribution at 400? A. 3.0% B. 3.5% C. 4.0% D. 4.5% E. 5.0% Use the following information for the next 6 questions:
• Frequency follows a Poisson Distribution with λ = 0.8. • Severity is Exponential Distribution with θ = 3. • Frequency and Severity are independent. • The severity distribution is to be approximated via the method of rounding with a span of 1. 9.8 (2 points) What is the probability that the aggregate losses are zero? A. less than 35% B. at least 35% but less than 40% C. at least 40% but less than 45% D. at least 45% but less than 50% E. at least 50% 9.9 (2 points) What is the probability that the aggregate losses are one? A. less than 8% B. at least 8% but less than 9% C. at least 9% but less than 10% D. at least 10% but less than 11% E. at least 11% 9.10 (2 points) What is the probability that the aggregate losses are two? A. less than 8% B. at least 8% but less than 9% C. at least 9% but less than 10% D. at least 10% but less than 11% E. at least 11% 9.11 (2 points) What is the probability that the aggregate losses are three? A. less than 5% B. at least 5% but less than 6% C. at least 6% but less than 7% D. at least 7% but less than 8% E. at least 8%
2013-4-3,
Aggregate Distributions §9 Discretization,
HCM 10/23/12,
Page 267
9.12 (3 points) What is the probability that the aggregate losses are four? A. less than 5% B. at least 5% but less than 6% C. at least 6% but less than 7% D. at least 7% but less than 8% E. at least 8% 9.13 (3 points) What is the probability that the aggregate losses are five? A. less than 5% B. at least 5% but less than 6% C. at least 6% but less than 7% D. at least 7% but less than 8% E. at least 8% 9.14 (3 points) Losses follow a Pareto Distribution with θ = 100 and α = 3. There is a deductible of 5, coinsurance of 80%, and a maximum covered loss of 100. The per loss variable is approximated using the method of rounding with a span of 4. What is the density of the approximating distribution at 40? A. 2.4% B. 2.6% C. 2.8% D. 3.0% E. 3.2% Use the following information for the next two questions: A Pareto Distribution with θ = 1000 and α = 4 is approximated using the method of rounding with a span of 100. 9.15 (2 points) What is the density of the approximating distribution at 500? A. 4.7% B. 4.9% C. 5.1% D. 5.3% E. 5.5% 9.16 (1 point) What is the density of the approximating distribution at 0? A. 16.9% B. 17.1% C. 17.3% D. 17.5% E. 17.7% 9.17 (3 points) A LogNormal Distribution with µ = 7 and σ = 0.5 is approximated using the method of matching means with a span of 200. What is the density of the approximating distribution at 2000? A. 3.5% B. 3.7% C. 3.9% D. 4.1% E. 4.3% 9.18 (3 points) Losses follow a Pareto Distribution with θ = 100 and α = 3. There is a deductible of 5, coinsurance of 80%, and a maximum covered loss of 100. The per payment variable is approximated using the method of rounding with a span of 4. What is the density of the approximating distribution at 60? A. 1.7% B. 1.9% C. 2.1% D. 2.3% E. 2.5%
2013-4-3,
Aggregate Distributions §9 Discretization,
HCM 10/23/12,
Page 268
9.19 (3 points) An Exponential Distribution with θ = 100 is approximated using the method of matching means with a span of 25. Let R be the density of the approximating distribution at 0. Let S be the density of the approximating distribution at 75. What is R + S? A. 23% B. 25% C. 27% D. 29% E. 31%
2013-4-3,
Aggregate Distributions §9 Discretization,
HCM 10/23/12,
Page 269
Solutions to Problems: 9.1. E. F(x) = 1 - exp[-(x/91)1.4]. g(150) = F(150+12.5) - F(150-12.5) = exp[-(137.5/91)1.4] - exp[-(162.5/91)1.4] = .16826 - .10521 = 6.305%. Comment: Hereʼs is a table of some values of the approximating distribution: x 0 25 50 75 100 125 150 175 200 225 250 275 300
F(x+12.5) 0.060201 0.251032 0.446217 0.611931 0.739649 0.831739 0.894794 0.936158 0.962307 0.978304 0.987806 0.993298 0.996394
g(x) 0.060201 0.190831 0.195185 0.165714 0.127718 0.092090 0.063055 0.041363 0.026149 0.015998 0.009502 0.005492 0.003096
9.2. E. The distribution function of the discrete approximating distribution at 75 is:
g(0) + g(25) + g(50) + g(75) = F(12.5) + {F(37.5) - F(12.5)} + {F(62.5) - F(37.5)} + {F(87.5) - F(62.5)} = F(87.5) = 1 - exp[-(87.5/91)^1.4] = 61.2%.
Alternately, the distribution functions of the approximating distribution and of the Weibull Distribution are equal at the points midway between the span points of 50, 75, 100, etc. The distribution function of the approximating distribution at 75 = the distribution function of the approximating distribution at 87.5 = the Weibull Distribution at 87.5.
F(x) = 1 - exp[-(x/91)^1.4]. F(87.5) = 1 - exp[-(87.5/91)^1.4] = 61.2%.
Comment: The following diagram might be helpful:
x            0       25      50      75
x+12.5       12.5    37.5    62.5    87.5
F(x+12.5)    6.0%    25.1%   44.6%   61.2%
g(x)         6.0%    19.1%   19.5%   16.6%

9.3. C. E[X ∧ x] = θ(1 - e^(-x/θ)) = 70(1 - e^(-x/70)).
E[X ∧ 55] = 38.0944. E[X ∧ 60] = 40.2939. E[X ∧ 65] = 42.3418.
g(60) = {2E[X ∧ 60] - E[X ∧ 55] - E[X ∧ 65]}/5 = {(2)(40.2939) - 38.0944 - 42.3418}/5 = 3.03%.

9.4. A. g(20000) = F(21000) - F(19000) = Φ[(ln(21000) - 8)/2] - Φ[(ln(19000) - 8)/2] = Φ[.98] - Φ[.93] = .8365 - .8238 = 0.0127.
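Comment: For those who like to check such calculations numerically, here is a minimal Python sketch of the two discretization methods used in the solutions above (the method of rounding for the Weibull of 9.1, and the method of matching means for the Exponential of 9.3). This is my own illustration, not part of the syllabus; the function names are assumptions of this sketch.

import math

def weibull_cdf(x, theta=91.0, tau=1.4):
    # F(x) = 1 - exp[-(x/theta)^tau]
    return 1.0 - math.exp(-((x / theta) ** tau))

def method_of_rounding(cdf, span, n_points):
    # g(0) = F(span/2); g(k*span) = F(k*span + span/2) - F(k*span - span/2)
    g = [cdf(span / 2.0)]
    for k in range(1, n_points):
        g.append(cdf(k * span + span / 2.0) - cdf(k * span - span / 2.0))
    return g

def exponential_lev(x, theta=70.0):
    # E[X ^ x] = theta (1 - e^(-x/theta)) for the Exponential Distribution
    return theta * (1.0 - math.exp(-x / theta))

def method_of_matching_means(lev, span, k):
    # g(k*span) = {2 E[X ^ k*span] - E[X ^ (k-1)*span] - E[X ^ (k+1)*span]} / span
    return (2.0 * lev(k * span) - lev((k - 1) * span) - lev((k + 1) * span)) / span

g = method_of_rounding(weibull_cdf, span=25.0, n_points=13)
print(round(g[6], 5))                                                  # density at 150: about 0.063
print(round(method_of_matching_means(exponential_lev, 5.0, 12), 4))   # density at 60: about 0.0303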
9.5. E. E[X ∧ x] = {θ/(α-1)}{1 - (θ/(x+θ))^(α-1)} = 1000{1 - 1000/(x+1000)} = 1000x/(x+1000).
E[X ∧ 400] = 285.7. E[X ∧ 500] = 333.3. E[X ∧ 600] = 375.
g(500) = {2E[X ∧ 500] - E[X ∧ 400] - E[X ∧ 600]}/100 = {(2)(333.3) - 285.7 - 375}/100 = 5.9%.

9.6. C. E[X ∧ 100] = {1000/(2-1)}{1 - (1000/(1000 + 100))^(2-1)} = 90.91.
g(0) = 1 - E[X ∧ 100]/100 = 1 - 90.91/100 = 9.09%.

9.7. D. g(400) = F(425) - F(375) = e^(-375/300) - e^(-425/300) = 0.044.

9.8. E. P(z) = e^(λ(z-1)) = e^(.8(z-1)). The method of rounding assigns probability to zero of: F(.5) = 1 - e^(-.5/3) = 0.153518.
c(0) = P(s(0)) = P(.153518) = e^(.8(.153518-1)) = 0.508045.

9.9. C. The method of rounding assigns probability to 1 of F(1.5) - F(.5) = e^(-.5/3) - e^(-1.5/3) = .846482 - .606531 = .239951.
x     F(x+.5)     s(x)
0     0.153518    0.153518
1     0.393469    0.239951
2     0.565402    0.171932
3     0.688597    0.123195
4     0.776870    0.088273
5     0.840120    0.063250
6     0.885441    0.045321
7     0.917915    0.032474
8     0.941184    0.023269
9     0.957856    0.016673
10    0.969803    0.011946
11    0.978363    0.008560
12    0.984496    0.006134
13    0.988891    0.004395
For the Poisson, a = 0 and b = λ = 0.8.
c(x) = {1/(1 - a s(0))} Σ_{j=1 to x} (a + jb/x) s(j) c(x-j) = {1/(1 - (0)(.153518))} Σ_{j=1 to x} (0 + .8j/x) s(j) c(x-j)
= .8 Σ_{j=1 to x} (j/x) s(j) c(x-j).
c(1) = (.8)(1/1)s(1)c(0) = (.8)(1)(.239951)(.508045) = 0.097525. 9.10. A. c(2) = (.8){(1/2)s(1)c(1) + (2/2)s(2)c(0)} = (.8){(1/2)(.239951)(.097525) + (1)(.171932)(.508045)} = 0.079240.
9.11. C. c(3) = (.8){(1/3)s(1)c(2) + (2/3)s(2)c(1) + (3/3)s(3)c(0)} = (.8){(1/3)(.239951)(.079240) + (2/3)(.171932)(.097525) + (3/3)(.123195)(.508045)} = 0.064084. 9.12. B. c(4) = (.8){(1/4)s(1)c(3) + (2/4)s(2)c(2) + (3/4)s(3)c(1) + (4/4)s(4)c(0)} = (.8){(1/4)(.239951)(.064084) + (2/4)(.171932)(.079240) + (3/4)(.123195)(.097525)+ (4/4)(.088273)(.508045)} = 0.051611. 9.13. A. c(5) = (.8){(1/5)s(1)c(4) + (2/5)s(2)c(3) + (3/5)s(3)c(2) + (4/5)s(4)c(1) + (5/5)s(5)c(0) = (.8){(1/5)(.239951)(.051611) + (2/5)(.171932)(.064084) + (3/5)(.123195)(.079240) + (4/5)(.088273)(.097525) + (5/5)(.063250)(.508045)} = 0.0414097. Comment: The distribution of aggregate losses from 0 to 30 is: 0.508045, 0.0975247, 0.07924, 0.064084, 0.0516111, 0.0414099, 0.033112, 0.0263947, 0.0209802, 0.0166327, 0.0131542, 0.0103797, 0.0081733, 0.00642328, 0.0050387, 0.00394576, 0.00308486, 0.0024081, 0.00187708, 0.00146114, 0.00113589, 0.000881934, 0.000683946, 0.000529806, 0.00040996, 0.000316896, 0.000244715, 0.000188795, 0.00014552, 0.000112066, 0.00008623.
[Figure: the aggregate distribution from 0 to 30, plotted on a logarithmic scale.]
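Comment: The recursion used in Solutions 9.8 through 9.13 can be verified numerically. Below is a minimal Python sketch (my own illustration, not part of Loss Models) of the Panjer recursion for a Poisson frequency with λ = 0.8 and the Exponential severity with θ = 3 discretized by the method of rounding with a span of 1; the variable names are assumptions of this sketch.

import math

lam, theta, span, n_max = 0.8, 3.0, 1.0, 30   # Poisson frequency, Exponential severity

def F(x):                       # Exponential severity distribution function
    return 1.0 - math.exp(-x / theta)

# Discretize the severity by the method of rounding with the given span.
s = [F(span / 2.0)] + [F(k * span + span / 2.0) - F(k * span - span / 2.0)
                       for k in range(1, n_max + 1)]

# Panjer recursion for the (a, b, 0) class; for the Poisson, a = 0 and b = lambda.
a, b = 0.0, lam
c = [math.exp(lam * (s[0] - 1.0))]            # c(0) = P_N(s(0)) for a Poisson frequency
for x in range(1, n_max + 1):
    c.append(sum((a + b * j / x) * s[j] * c[x - j] for j in range(1, x + 1))
             / (1.0 - a * s[0]))

print([round(v, 6) for v in c[:6]])
# approximately [0.508045, 0.097525, 0.07924, 0.064084, 0.051611, 0.04141]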
9.14. B. Prior to the policy modifications, F(x) = 1 - {100/(100 + x)}^3. Let y be the payment per loss.
y = 0 for x ≤ 5. y = .8(x - 5) = .8x - 4, for 5 < x ≤ 100. y = (.8)(95) = 76, for 100 ≤ x.
Let H(y) be the distribution of the payments per loss. H(0) = F(5) = 1 - (100/105)^3 = .1362.
H(y) = F(x) = F((y+4)/.8) = F(1.25y + 5) = 1 - {100/(105 + 1.25y)}^3, for 0 < y < 76. H(76) = 1.
g(40) = H(42) - H(38) = {100/(105 + (1.25)(38))}^3 - {100/(105 + (1.25)(42))}^3 = 2.6%.
Comment: Note that we apply the modifications first and then discretize. g(0) = H(2) = .1950.
Note that at 76 we would include all the remaining probability: 1 - H(74) = {100/(105 + (1.25)(74))}^3 = .1298.
Here is the whole approximating distribution:
y     H(y+2)    g(y)
0     0.1950    0.1950
4     0.2977    0.1026
8     0.3836    0.0859
12    0.4560    0.0724
16    0.5175    0.0615
20    0.5701    0.0526
24    0.6153    0.0452
28    0.6544    0.0391
32    0.6884    0.0340
36    0.7180    0.0297
40    0.7440    0.0260
44    0.7670    0.0229
48    0.7872    0.0203
52    0.8052    0.0180
56    0.8212    0.0160
60    0.8355    0.0143
64    0.8483    0.0128
68    0.8598    0.0115
72    0.8702    0.0104
76              0.1298
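Comment: The "modify first, then discretize" workflow of Solution 9.14 can be sketched in a few lines of Python. This is my own illustration (not part of the original solution); H below is the distribution of the payment per loss, and the helper names are assumptions of this sketch.

def pareto_cdf(x, theta=100.0, alpha=3.0):
    return 1.0 - (theta / (theta + x)) ** alpha

def H(y):
    # Distribution of the per-loss payment y = 0.8(x - 5), capped at 0.8 * 95 = 76.
    if y < 0:
        return 0.0
    if y >= 76:
        return 1.0
    return pareto_cdf(1.25 * y + 5.0)            # invert y = 0.8x - 4

span = 4.0
g0 = H(span / 2.0)                               # mass assigned to 0
g40 = H(42.0) - H(38.0)                          # mass assigned to 40
g76 = 1.0 - H(74.0)                              # all remaining probability goes to 76
print(round(g0, 4), round(g40, 4), round(g76, 4))   # about 0.1950, 0.0260, 0.1298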
9.15. D. F(x) = 1 - {1000/(1000 + x)}^4.
g(500) = F(550) - F(450) = {1000/(1000 + 450)}^4 - {1000/(1000 + 550)}^4 = 0.053.

9.16. E. g(0) = F(50) = 1 - {1000/(1000 + 50)}^4 = 17.7%.
9.17. C. E[X ∧ x] = exp(µ + σ^2/2)Φ[(lnx − µ − σ^2)/σ] + x{1 − Φ[(lnx − µ)/σ]} = 1242.6 Φ[2 lnx - 14.5] + x{1 - Φ[2 lnx - 14]}.
E[X ∧ 1800] = 1242.6 Φ[.49] + 1800{1 - Φ[.99]} = (1242.6)(.6879) + (1800)(1 - .8389) = 1144.8.
E[X ∧ 2000] = 1242.6 Φ[.70] + 2000{1 - Φ[1.20]} = (1242.6)(.7580) + (2000)(1 - .8849) = 1172.1.
E[X ∧ 2200] = 1242.6 Φ[.89] + 2200{1 - Φ[1.39]} = (1242.6)(.8133) + (2200)(1 - .9177) = 1191.7.
g(2000) = {2E[X ∧ 2000] - E[X ∧ 1800] - E[X ∧ 2200]}/200 = {(2)(1172.1) - 1144.8 - 1191.7}/200 = 3.9%.
9.18. A. Prior to the policy modifications, F(x) = 1 - {100/(100 + x)}^3. Let y be the (non-zero) payment.
y is undefined for x ≤ 5. y = .8(x - 5) = .8x - 4, for 5 < x ≤ 100. y = (.8)(95) = 76, for 100 ≤ x.
Let H(y) be the distribution of the non-zero payments.
H(y) = {F(x) - F(5)}/S(5) = {F((y+4)/.8) - .1362}/.8638 = F(1.25y + 5)/.8638 - .1577 = {1 - (100/(105 + 1.25y))^3}/.8638 - .1577 = 1 - {100/(105 + 1.25y)}^3/.8638, for 0 < y < 76. H(76) = 1.
g(60) = H(62) - H(58) = [{100/(105 + (1.25)(58))}^3 - {100/(105 + (1.25)(62))}^3]/.8638 = 1.7%.
Comment: See the latter portion of Example 9.14 in Loss Models. Note that we apply the modifications first and then discretize. g(0) = H(2) = .0682.
Note that at 76 we would include all the remaining probability: 1 - H(74) = {100/(105 + (1.25)(74))}^3/.8638 = .1503.
Here is the whole approximating distribution:
y     H(y+2)    g(y)
0     0.0682    0.0682
4     0.1870    0.1188
8     0.2864    0.0994
12    0.3703    0.0839
16    0.4415    0.0712
20    0.5024    0.0609
24    0.5547    0.0523
28    0.5999    0.0452
32    0.6393    0.0393
36    0.6736    0.0343
40    0.7037    0.0301
44    0.7302    0.0265
48    0.7537    0.0234
52    0.7745    0.0208
56    0.7930    0.0185
60    0.8096    0.0166
64    0.8244    0.0148
68    0.8377    0.0133
72    0.8497    0.0120
76              0.1503

9.19. A. E[X ∧ x] = θ(1 - e^(-x/θ)) = 100(1 - e^(-x/100)).
E[X ∧ 25] = 22.120. E[X ∧ 50] = 39.347. E[X ∧ 75] = 52.763. E[X ∧ 100] = 63.212.
g(0) = 1 - E[X ∧ 25]/25 = 1 - 22.120/25 = 0.1152.
g(75) = {2E[X ∧ 75] - E[X ∧ 50] - E[X ∧ 100]}/25 = {(2)(52.763) - 39.347 - 63.212}/25 = 0.1187.
0.1152 + 0.1187 = 0.2339.
Section 10, Analytic Results134

In some special situations the aggregate distribution has a somewhat simpler form.

Closed Under Convolution:

If one adds two independent Gamma Distributions with the same θ, then one gets another Gamma Distribution with the sum of the α parameters: Gamma(α1, θ) + Gamma(α2, θ) = Gamma(α1 + α2, θ).
A distribution is closed under convolution if, when one adds independent identically distributed copies, one gets a member of the same family. Gamma(α, θ) + Gamma(α, θ) = Gamma(2α, θ). Thus the Gamma Distribution is closed under convolution.
Distributions that are closed under convolution include: Gamma, Inverse Gaussian, Normal, Binomial, Poisson, and Negative Binomial.
If severity is Gamma(α, θ), then the n-fold convolution fX*n is the density of a Gamma(nα, θ), for n ≥ 1. Gamma(nα, θ) has density: e^(-x/θ) x^(nα-1)/{θ^(nα) Γ(nα)}.
Thus if severity is Gamma, the aggregate distribution can be written in terms of convolutions:135
fA(x) = Σ_{n=0 to ∞} fN(n) fX*n(x) = fN(0){point mass of prob. 1 @ 0} + Σ_{n=1 to ∞} fN(n) e^(-x/θ) x^(nα-1)/{θ^(nα) Γ(nα)}.
This is particularly useful when α is an integer. α = 1 is an Exponential Distribution.
Exercise: Severity is Exponential. Write a formula for the density of the aggregate distribution.
[Solution: fA(x) = fN(0){point mass of probability 1 @ 0} + Σ_{n=1 to ∞} fN(n) e^(-x/θ) x^(n-1)/{θ^n (n-1)!}.]

134 See Section 9.4 of Loss Models.
135 Where fX*0(x) is a point mass of probability 1 at zero.
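Comment: As a quick numerical illustration of this convolution formula, here is a minimal Python sketch (my own illustration, not part of Loss Models) with assumed example values for the frequency; the function names are assumptions of this sketch.

import math

def gamma_density(x, alpha, theta):
    # density of a Gamma(alpha, theta): e^(-x/theta) x^(alpha-1) / {theta^alpha Gamma(alpha)}
    return math.exp(-x / theta) * x ** (alpha - 1) / (theta ** alpha * math.gamma(alpha))

# Assumed example: fN(0) = 0.6, fN(1) = 0.3, fN(2) = 0.1; severity Gamma(alpha = 3, theta = 10).
f_N = {0: 0.6, 1: 0.3, 2: 0.1}
alpha, theta = 3.0, 10.0

def aggregate_density(x):
    # Continuous part only; there is also a point mass of fN(0) at x = 0.
    return sum(p * gamma_density(x, n * alpha, theta) for n, p in f_N.items() if n >= 1)

print(round(aggregate_density(10.0), 5))   # about 0.00555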
Geometric-Exponential:136

One interesting special case of the collective risk model has a Geometric frequency and an Exponential severity.
Exercise: Let frequency be given by a Geometric Distribution with β = 3. Let severity be given by an Exponential with mean 10. Frequency and severity are independent. What is the moment generating function of the aggregate losses?
[Solution: For the Geometric Distribution, P(z) = 1/{1 - β(z - 1)}. For β = 3, P(z) = 1/(4 - 3z).
For the Exponential Distribution, MX(t) = 1/(1 - θt). For θ = 10, MX(t) = 1/(1 - 10t).
MAgg(t) = PN(MX(t)) = 1/{4 - 3/(1 - 10t)} = (1 - 10t)/(4 - 40t - 3) = (1 - 10t)/(1 - 40t).]
Note that (1 - 10t)/(1 - 40t) = {(1/4)(1 - 40t) + 3/4}/(1 - 40t) = (1/4) + (3/4)/(1 - 40t).
This is the weighted average of the m.g.f. of a point mass at zero and the m.g.f. of an Exponential distribution with mean 40, with weights of 1/4 and 3/4.137
Thus the combination of a Geometric frequency and an Exponential Severity gives an aggregate loss distribution that is a mixture of a point mass at zero and an Exponential distribution. In this case, there is a point mass of probability 25% at zero. SA(y) = 0.75 e^(-y/40). For example, SA(0) = 0.75, and SA(40) = 0.75e^(-1) = 0.276.
This aggregate loss distribution is discontinuous at zero.138 This will generally be the case when there is a chance of zero claims. If instead the frequency distribution has no probability at zero and the severity is a continuous distribution with support from 0 to ∞, then the aggregate losses will be a continuous distribution from 0 to ∞.139

136 See Example 9.7 in Loss Models.
137 The m.g.f. is the expected value of exp[xt]. Thus the m.g.f. of a point mass at zero is E[1] = 1. In general, the m.g.f. of a point mass at c is e^(ct). The m.g.f. of an Exponential with θ = 40 is: 1/(1 - 40t).
138 The limit approaching from below zero is not equal to the limit approaching from above zero.
139 The aggregate distribution is discontinuous when the severity distribution is discontinuous.
In general, with a Geometric frequency and an Exponential severity:
MAgg(t) = PN(MX(t)) = 1/[1 - β{MX(t) - 1}] = 1/[1 - β{1/(1 - θt) - 1}] = (1 - θt)/[1 - θt - β{1 - (1 - θt)}] = (1 - θt)/{1 - (1 + β)θt}
= (1 + β)(1 - θt)/[(1 + β){1 - (1 + β)θt}] = (1 + β - θt - βθt)/[(1 + β){1 - (1 + β)θt}]
= (1 - θt - βθt)/[(1 + β){1 - (1 + β)θt}] + β/[(1 + β){1 - (1 + β)θt}]
= 1/(1 + β) + {β/(1 + β)}/{1 - (1 + β)θt}.
This is the weighted average of the moment generating function of a point mass at zero and the moment generating function of an Exponential with mean (1+β)θ. The weights are: 1/(1 + β) and β/(1 + β).
Therefore: FAgg(0) = 1/(1 + β). FAgg(y) = 1/(1 + β) + {β/(1 + β)}(1 - e^(-y/{(1+β)θ})) = 1 - {β/(1 + β)} e^(-y/{(1+β)θ}).
This mixture is mathematically equivalent to an aggregate situation with a Bernoulli frequency with q = β/(1 + β) and an Exponential Severity with mean (1+β)θ.140 In the latter situation there is a probability of 1/(1 + β) of no claim, in which case the aggregate is 0, and a probability of β/(1 + β) of 1 claim, and thus the aggregate is an Exponential with mean (1+β)θ.

140 This general technique can be applied to a mixture of a point mass at zero and another distribution.
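Comment: The closed form above is easy to check against simulation. Here is a minimal Python sketch (my own illustration, not part of Loss Models) for the exercise above with β = 3 and θ = 10; the helper names and the simulation size are assumptions of this sketch.

import math
import random

beta, theta = 3.0, 10.0          # Geometric frequency, Exponential severity

def S_agg(y):
    # Closed form: point mass of 1/(1+beta) at 0, Exponential with mean (1+beta)*theta otherwise.
    return (beta / (1.0 + beta)) * math.exp(-y / ((1.0 + beta) * theta))

def simulate_once(rng):
    n = 0
    while rng.random() < beta / (1.0 + beta):   # draw a Geometric(beta) count
        n += 1
    return sum(rng.expovariate(1.0 / theta) for _ in range(n))

rng = random.Random(1)
trials = 200_000
empirical = sum(simulate_once(rng) > 40.0 for _ in range(trials)) / trials
print(round(S_agg(40.0), 4), round(empirical, 4))   # both about 0.276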
Negative Binomial-Exponential:141

One can generalize the previous situation to a Negative Binomial frequency with r integer. The Negative Binomial is a sum of r independent Geometrics each with β. Thus the aggregate is the sum of r independent situations as before, each of which has a Bernoulli frequency with q = β/(1 + β) and an Exponential Severity with mean (1+β)θ. Thus the aggregate is mathematically the same as a Binomial frequency with m = r and q = β/(1 + β) and an Exponential Severity with mean (1+β)θ.

Exercise: Determine the moment generating function of an aggregate distribution with a Binomial frequency with m = r and q = β/(1 + β) and an Exponential Severity with mean (1+β)θ.
[Solution: For a Binomial Distribution, P(z) = {1 + q(z-1)}^m.
For this Binomial Distribution, P(z) = {1 + (β/(1 + β))(z - 1)}^r = {(1 + β + βz - β)/(1 + β)}^r = (1 + βz)^r/(1 + β)^r.
For an Exponential Distribution with mean θ, MX(t) = 1/(1 - θt).
For this Exponential Distribution, MX(t) = 1/{1 - (1 + β)θt} = 1/(1 - θt - βθt).
MAgg(t) = PN(MX(t)) = {1 + β/(1 - θt - βθt)}^r/(1 + β)^r = {1 - θt - βθt + β}^r/[(1 + β)(1 - θt - βθt)]^r
= [(1 + β)(1 - θt)]^r/[(1 + β)(1 - θt - βθt)]^r = (1 - θt)^r/(1 - θt - βθt)^r.]
Exercise: Determine the moment generating function of an aggregate distribution with a Negative Binomial frequency and an Exponential Severity.
[Solution: For the Negative Binomial Distribution, P(z) = 1/{1 - β(z - 1)}^r.
For the Exponential Distribution, MX(t) = 1/(1 - θt).
MAgg(t) = PN(MX(t)) = 1/{1 - β(1/(1 - θt) - 1)}^r = (1 - θt)^r/{1 - θt - β(1 - (1 - θt))}^r = (1 - θt)^r/(1 - θt - βθt)^r.]
We have shown that the moment generating functions are the same; thus proving, as stated above, that with a Negative Binomial frequency with r integer, and an Exponential Severity, the aggregate is mathematically the same as a Binomial frequency with m = r and q = β/(1 + β) and an Exponential Severity with mean (1+β)θ.
In the latter situation, the frequency has finite support, and severity is Exponential, so one can write the aggregate in terms of convolutions as:
fAgg(x) = fN(0) fX*0(x) + Σ_{n=1 to m} fN(n) e^(-x/{(1+β)θ}) x^(n-1)/{(1+β)^n θ^n (n-1)!}
= {1/(1 + β)^r}{point mass of prob. 1 @ 0} + Σ_{n=1 to r} [r!/{n! (r - n)!}] [β^n/(1 + β)^r] e^(-x/{(1+β)θ}) x^(n-1)/{(1+β)^n θ^n (n-1)!}.
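Comment: This finite sum is straightforward to evaluate directly. Below is a minimal Python sketch (my own illustration, not part of Loss Models) of the continuous part of the aggregate density via the equivalent Binomial-Exponential representation; the function name and the example values are assumptions of this sketch.

import math

def negbin_exponential_aggregate_density(x, r, beta, theta):
    # Continuous part of the aggregate density for a Negative Binomial (integer r)
    # frequency and Exponential severity, using the equivalent Binomial frequency
    # with m = r, q = beta/(1+beta) and an Exponential severity with mean (1+beta)*theta.
    q = beta / (1.0 + beta)
    mean = (1.0 + beta) * theta
    total = 0.0
    for n in range(1, r + 1):
        binom = math.comb(r, n) * q ** n * (1.0 - q) ** (r - n)
        gamma_dens = math.exp(-x / mean) * x ** (n - 1) / (mean ** n * math.factorial(n - 1))
        total += binom * gamma_dens
    return total            # there is also a point mass of (1-q)^r at x = 0

print(round(negbin_exponential_aggregate_density(10.0, r=3, beta=1.4, theta=5.0), 4))  # about 0.0263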
Problems: 10.1 (1 point) Which of the following distributions is not closed under convolution? A. Binomial B. Gamma C. Inverse Gaussian D. Negative Binomial E. Pareto 10.2 (3 points) Frequency is: Prob[n = 0] = 60%, Prob[n = 1] = 30%, and Prob[n = 2] = 10%. Severity is Gamma with α = 3 and θ = 10. Frequency and severity are independent. Determine the form of the aggregate distribution. For example, what are the densities of the aggregate distribution at 10, 50, and 100? 10.3 (3 points) Calculate the density at 6 for a Compound Binomial-Poisson frequency distribution with parameters m = 4, q = 0.6, and λ = 3. A. 6%
B. 7%
C. 8%
D. 9%
E. 10%
10.4 (3 points) Frequency is Geometric with β = 1.4. Severity is Exponential with θ = 5. Frequency and severity are independent. What is the density of the aggregate distribution at 10? A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0% 10.5 (4 points) Frequency is Negative Binomial with r = 3 and β = 1.4. Severity is Exponential with θ = 5. Frequency and severity are independent. What is the density of the aggregate distribution at 10? A. 2.2% B. 2.6% C. 3.0% D. 3.4% E. 3.8% 10.6 (3 points) Frequency is Poisson with λ = 2. Severity is Exponential with θ = 10. Frequency and severity are independent. What is the density of the aggregate distribution at 30? A. 1.0% B. 1.2% C. 1.4% D. 1.6% E. 1.8% 10.7 (5A, 11/96, Q.36) The frequency distribution is Geometric with parameter β. The severity distribution is Exponential with a mean of 1. Frequency and severity are independent. (1/2 point) What is the Moment Generating Function of the frequency? (1/2 point) What is the Moment Generating Function of the severity? (1 point) What is the Moment Generating Function of the aggregate losses?
10.8 (Course 151 Sample Exam #2, Q.15) (1.7 points) Aggregate claims has a compound Poisson distribution with λ = ln(4) and individual claim amounts probability function given by:
f(x) = 2^(-x)/{x ln(2)}, x = 1, 2, 3, ....
Which of the following is true about the distribution of aggregate claims?
(A) Binomial with q = 1/2.
(B) Binomial with q = 1/4.
(C) Negative Binomial with r = 2 and β = 1.
(D) Negative Binomial with r = 4 and β = 1.
(E) Negative Binomial with r = 2 and β = 3.
Solutions to Problems:

10.1. E. The sum of two independent Pareto Distributions is not another Pareto Distribution.

10.2. The sum of two of the Gammas is also Gamma, with α = 6 and θ = 10.
Thus the aggregate distribution is: (0.6)(point mass of prob. 1 @ 0) + (0.3)Gamma(3, 10) + (0.1)Gamma(6, 10).
For y > 0, the density of the aggregate distribution is:
fA(y) = (0.3){y^2 e^(-y/10)/(10^3 Γ(3))} + (0.1){y^5 e^(-y/10)/(10^6 Γ(6))} = e^(-y/10){1,500,000y^2 + 8.3333y^5}/10^10.
fA(10) = 0.00555. fA(50) = 0.004281. fA(100) = 0.000446.
Comment: This density integrated from 0 to infinity is 0.4. The remaining 60% of the probability is in the point mass at zero, corresponding to the probability of zero claims.
10.3. E. When the Binomial primary distribution is 2, the compound distribution is the sum of two independent Poisson distributions each with λ = 3, which is Poisson with λ = 6.
Density of the compound at 6 is:
Prob[Binomial = 1] (Density at 6 of a Poisson with λ = 3) + Prob[Binomial = 2] (Density at 6 of a Poisson with λ = 6) + Prob[Binomial = 3] (Density at 6 of a Poisson with λ = 9) + Prob[Binomial = 4] (Density at 6 of a Poisson with λ = 12) =
(0.1536)(0.05041) + (0.3456)(0.16062) + (0.3456)(0.09109) + (0.1296)(0.02548) = 0.0980.
Comment: Here is the density of the compound distribution, out to 16:
n           0        1        2        3        4
Binomial    0.0256   0.1536   0.3456   0.3456   0.1296

x     f*0    λ = 3     λ = 6     λ = 9     λ = 12    Aggregate Distribution
0     1      0.04979   0.00248   0.00012   0.00001   0.034147
1            0.14936   0.01487   0.00111   0.00007   0.028475
2            0.22404   0.04462   0.00500   0.00044   0.051617
3            0.22404   0.08924   0.01499   0.00177   0.070664
4            0.16803   0.13385   0.03374   0.00531   0.084417
5            0.10082   0.16062   0.06073   0.01274   0.093636
6            0.05041   0.16062   0.09109   0.02548   0.098037
7            0.02160   0.13768   0.11712   0.04368   0.097036
8            0.00810   0.10326   0.13176   0.06552   0.090957
9            0.00270   0.06884   0.13176   0.08736   0.081063
10           0.00081   0.04130   0.11858   0.10484   0.068967
11           0.00022   0.02253   0.09702   0.11437   0.056172
12           0.00006   0.01126   0.07277   0.11437   0.043871
13           0.00001   0.00520   0.05038   0.10557   0.032891
14           0.00000   0.00223   0.03238   0.09049   0.023690
15           0.00000   0.00089   0.01943   0.07239   0.016405
16           0.00000   0.00033   0.01093   0.05429   0.010929
Sum   1      1.00000   0.99983   0.98889   0.89871   0.982974
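Comment: Here is a minimal Python sketch (my own illustration) of the mixing calculation used above for the compound Binomial-Poisson distribution; the function names are assumptions of this sketch.

import math

m, q, lam = 4, 0.6, 3.0     # primary Binomial, secondary Poisson

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def compound_density(x):
    # Sum over the primary count n; given n, the compound is Poisson with mean n*lambda.
    total = 0.0
    for n in range(0, m + 1):
        binom = math.comb(m, n) * q ** n * (1.0 - q) ** (m - n)
        secondary = 1.0 if (n == 0 and x == 0) else (0.0 if n == 0 else poisson_pmf(x, n * lam))
        total += binom * secondary
    return total

print(round(compound_density(6), 4))   # about 0.0980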
10.4. A. In general, with a Geometric frequency and an Exponential severity: M A(t) = PN(MX(t)) = 1/{1 - β(1/(1 - θt) -1)} = (1 - θt)/{1 - (1+β)θt} = (1+β)(1 - θt)/{(1+β){1 - (1+β)θt}} = (1 + β - θt - βθt)/{(1+β){1 - (1+β)θt}} = (1 - θt - βθt)/{(1+β){1 - (1+β)θt}} + β/{(1+β){1 - (1+β)θt}} = 1/(1+β) + {β/(1+β)}/{1 - (1+β)θt}. This is the weighted average of the moment generating function of a point mass at zero and the moment generating function of an Exponential with mean (1+β)θ. In this case, the point mass at zero is: 1/(1+β) = 1/2.4 = 5/12, and the Exponential with mean (1+β)θ = (2.4)(5) = 12 is given weight: {β/(1+β)} = 1.4/2.4 = 7/12. Therefore, the density of the aggregate distribution at 10 is: (7/12)e-10/12/12 = 0.0211. Comment: Similar to Example 9.7 in Loss Models. 10.5. B. The Negative Binomial is the sum of three independent Geometric Distributions with β = 1.4. In the previous solution, the aggregate was equivalent to a Bernoulli frequency with q = 7/12 and an Exponential Severity with mean 12. This is the sum of three independent versions of the previous solution, which is equivalent to a Binomial frequency with m = 3 and q = 7/12, and an Exponential Severity with mean 12. For the Binomial, Prob[n = 0] = (5/12)3 = .0723, Prob[n = 1] = (3)(7/12)(5/12)2 = .3038, Prob[n = 2] = (3)(7/12)2 (5/12) = .4253, Prob[n = 3] = (7/12)3 = .1985. When n = 1 the aggregate is Exponential with θ = 12, with density e-x/12/12. When n = 2 the aggregate is Gamma with α = 2 and θ = 12, with density x e-x/12/144. When n = 3 the aggregate is Gamma with α = 3 and θ = 12, with density x2 e-x/12/3456. Therefore, the density of the aggregate distribution at 10 is: (0.3038)e-10/12/12 + (0.4253)(10)e-10/12/144 + (0.1985)(100)e-10/12/3456 = 0.0606e-10/12 = 0.0263. Comment: Similar to Example 9.7 in Loss Models. Beyond what you are likely to be asked.
10.6. B. Since the sum of n Exponentials is a Gamma with α = n, the density of the aggregate at x > 0 is:
Σ_{n=1 to ∞} fN(n) e^(-x/θ) x^(n-1)/{θ^n (n-1)!} = Σ_{n=1 to ∞} {e^(-2) 2^n/n!} e^(-x/10) x^(n-1)/{10^n (n-1)!}
= {e^(-(2 + x/10))/x} Σ_{n=1 to ∞} (0.2x)^n/{n! (n-1)!}.
For x = 30 this is: (1/4452.395) Σ_{n=1 to ∞} 6^n/{n! (n-1)!} =
(1/4452.395){6 + 36/2 + 216/12 + 1296/144 + 7776/2880 + 46656/86400 + ...} = 0.0122.
Comment: There is a point mass of probability at zero of: e^(-λ) = e^(-2) = 13.53%.
An example of what is called a Tweedie Distribution, where more generally the severity is Gamma. The Tweedie distribution is used in Generalized Linear Models. See for example, “A Practitioners Guide to Generalized Linear Models,” by Duncan Anderson, Sholom Feldblum, Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi, or “A Primer on the Exponential Family of Distributions,” by David R. Clark and Charles A. Thayer, both in the 2004 CAS Discussion Paper Program.

10.7. As shown in Appendix B of Loss Models, for a Geometric frequency P(z) = 1/{1 - β(z-1)}.
For an Exponential with θ = 1, M(t) = 1/(1 - θt) = 1/(1 - t).
For the aggregate losses, MA(t) = PN(MX(t)) = 1/{1 - β(1/(1-t) - 1)} = (1 - t)/(1 - t - βt).
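Comment: The series in Solution 10.6 converges quickly, and is easy to sum numerically. Here is a minimal Python sketch (my own illustration, not part of the syllabus); the function name and the number of terms are assumptions of this sketch.

import math

lam, theta = 2.0, 10.0    # Poisson frequency, Exponential severity

def aggregate_density(x, terms=50):
    # f_A(x) = sum over n >= 1 of fN(n) times the Gamma(n, theta) density, for x > 0.
    total = 0.0
    for n in range(1, terms + 1):
        f_N = math.exp(-lam) * lam ** n / math.factorial(n)
        gamma_dens = math.exp(-x / theta) * x ** (n - 1) / (theta ** n * math.factorial(n - 1))
        total += f_N * gamma_dens
    return total

print(round(aggregate_density(30.0), 4))   # about 0.0122
print(round(math.exp(-lam), 4))            # point mass at 0: about 0.1353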
10.8. C. The probability generating function of the aggregate distribution can be written in terms of the p.g.f. of the frequency and severity: Paggregate(z) = Pfrequency(Pseverity(z)). The frequency is Poisson, with p.g.f. P(z) = exp[λ(z-1)] = exp[ln(4)(z-1)]. The severity has p.g.f. of P(z) = E[zx] = (1/ln(2)){z/2 + (z/2)2 /2 + (z/2)3 /3 + (z/2)4 /4 + ...} = (1/ln(2))(-ln(1 - z/2)) = (1/ln(2))ln(2/(2 - z)) = (1/ln(2))(ln(2) - ln(2 - z)) = 1 - ln(2 - z)/ln(2). Paggregate(z) = exp[ln(4)(Pseverity(z) - 1)] = exp[ln(4){- ln(2 - z)/ln(2)}] = exp[-2 ln(2-z)] = (2-z)-2. The p.g.f. of a Negative Binomial is [1 - β(z-1)]-r. Comparing probability generating functions, the aggregate losses are a Negative Binomial with r = 2 and β = 1. Comments: The severity (or secondary distribution in the compound frequency distribution) is a Logarithmic distribution as per Appendix B of Loss Models, with β = 1. Thus it has p.g.f. of P(z) = 1 - ln[1- β(z-1)]/ln(1+β) = 1 - ln(2-z)/ln(2). This is a compound Poisson-Logarithmic distribution. In general, a Compound Poisson-Logarithmic distribution with a parameters λ and β, is a Negative Binomial distribution with parameters r = λ/ln(1+β) and β. In this case r = ln(4)/ln(1+1) = 2ln(2)/ ln(2) = 2. ln(1 - y) = -y - y2 /2 - y3 /3 - y4 /4 - ..., for |y| < 1, follows from taking a Taylor Series.
Section 11, Stop Loss Premiums

Loss Models discusses stop loss insurance, in which the aggregate losses excess of an aggregate deductible are being covered.142 The stop loss premium is the expected aggregate losses excess of an aggregate deductible, or the expected cost for stop loss insurance, ignoring expenses, taxes, risk loads, etc.
For example, assume Merlinʼs Mall buys stop loss insurance from Halfmoon Insurance, such that Halfmoon will pay for any aggregate losses excess of a $100,000 aggregate deductible per year.
Exercise: If Merlinʼs Mall has aggregate losses of $302,000 in 2003, how much does Halfmoon Insurance pay?
[Solution: 302,000 - 100,000 = $202,000.
Comment: If instead Merlinʼs Mall had $75,000 in aggregate losses, Halfmoon would pay nothing.]
In many cases, the stop loss premium just involves the application to somewhat different situations of mathematical concepts that have already been discussed with respect to a per claim deductible.143 One can have either a continuous or a discrete distribution of aggregate losses.

Discrete Distributions of Aggregate Losses:

Exercise: Assume the aggregate losses in thousands of dollars for Merlinʼs Mall are approximated by the following discrete distribution:
f(50) = 0.6, f(100) = 0.2, f(150) = 0.1, f(200) = 0.05, f(250) = 0.03, f(300) = 0.02.
What is the stop loss premium, for a deductible of 100 thousand?
[Solution: For aggregate losses of: 50, 100, 150, 200, 250, and 300, the amounts paid by Halfmoon Insurance are respectively: 0, 0, 50, 100, 150, 200. Thus the expected amount paid by Halfmoon is:
(0)(0.6) + (0)(0.2) + (50)(0.1) + (100)(0.05) + (150)(0.03) + (200)(0.02) = 18.5 thousand.]
In general, for any discrete distribution, one can compute the losses excess of d, the stop loss premium for a deductible of d, by taking a sum of the payments times the density function:
E[(A - d)+] = Σ_{a > d} (a - d) f(a).

142 See Definition 9.3 in Loss Models. Stop loss insurance is mathematically identical to stop loss reinsurance. Reinsurance is protection insurance companies buy from reinsurers.
143 See “Mahlerʼs Guide to Loss Distributions.” Those who found Lee Diagrams useful for understanding excess losses for severity distributions, will probably also find them helpful here.
For example, the stop loss premium for Merlinʼs Mall with a deductible of 150 thousand is:
(50)(0.05) + (100)(0.03) + (150)(0.02) = 8.5 thousand.
Note that one could arrange this calculation in a spreadsheet as follows:
Aggregate Losses    Probability    Amount Paid by Stop Loss Insurance, Ded. of 150    Amount Paid times Probability
50                  0.6            0                                                  0
100                 0.2            0                                                  0
150                 0.1            0                                                  0
200                 0.05           50                                                 2.5
250                 0.03           100                                                3
300                 0.02           150                                                3
Sum                 1                                                                 8.5
This technique is generally applicable to stop loss premium calculations involving discrete distributions of aggregate losses.
Exercise: What is the stop loss premium for Merlinʼs Mall with a deductible of 120 thousand?
[Solution: (30)(0.1) + (80)(0.05) + (130)(0.03) + (180)(0.02) = 14.5 thousand.
Aggregate Losses    Probability    Amount Paid by Stop Loss Insurance, Ded. of 120    Amount Paid times Probability
50                  0.6            0                                                  0
100                 0.2            0                                                  0
150                 0.1            30                                                 3
200                 0.05           80                                                 4
250                 0.03           130                                                3.9
300                 0.02           180                                                3.6
Sum                 1                                                                 14.5
Note that in this case there is no probability between $100,000 and $150,000. $120,000 is 40% of the way from $100,000 to $150,000. The stop loss premium for a deductible of $120,000 is 14,500, 40% of the way from the stop loss premium for a deductible of $100,000 to the stop loss premium at a deductible of $150,000; 14,500 = (0.6)(18.5) + (0.4)(8.5).
In general, when there is no probability for the aggregate losses in an interval, the stop loss premium for deductibles in this interval can be gotten by linear interpolation.144
Exercise: What is the stop loss premium for Merlinʼs Mall with a deductible of 140 thousand?
[Solution: (0.2)(18.5) + (0.8)(8.5) = 10.5 thousand.]

144 See Theorem 9.4 in Loss Models.
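Comment: The direct summation E[(A - d)+] = Σ_{a > d} (a - d) f(a) is easy to program. Here is a minimal Python sketch (my own illustration, not part of Loss Models), using the Merlinʼs Mall distribution above; the function name is an assumption of this sketch.

# Aggregate distribution in thousands (the Merlin's Mall example above).
f = {50: 0.6, 100: 0.2, 150: 0.1, 200: 0.05, 250: 0.03, 300: 0.02}

def stop_loss_premium(d):
    # E[(A - d)+] = sum over a > d of (a - d) f(a)
    return sum((a - d) * p for a, p in f.items() if a > d)

for d in (100, 120, 140, 150):
    print(d, stop_loss_premium(d))
# 100 -> 18.5, 120 -> 14.5, 140 -> 10.5, 150 -> 8.5 (all in thousands)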
Thus for a discrete probability distribution, the excess losses and the excess ratio decline linearly over intervals in which there is no probability; the slope changes at any point which is part of the support of the distribution. For continuous distributions, the excess losses and excess ratio decline faster than linearly; the graphs of the excess losses and excess ratio are concave upwards.145
One can also calculate the stop loss premium as the mean aggregate loss minus the expected value of the aggregate loss limited to d: E[(A - d)+] = E[A] - E[A ∧ d].
For the prior example, the mean aggregate loss is:
(50)(0.6) + (100)(0.2) + (150)(0.1) + (200)(0.05) + (250)(0.03) + (300)(0.02) = 88.5 thousand.
One would calculate the expected value of the aggregate loss limited to 150 as follows:
A                   B              C                                D
Aggregate Losses    Probability    Aggregate Loss Limited to 150    Product of Col. B & Col. C
50                  0.6            50                               30
100                 0.2            100                              20
150                 0.1            150                              15
200                 0.05           150                              7.5
250                 0.03           150                              4.5
300                 0.02           150                              3
Sum                 1                                               80
E[(A - 150)+] = E[A] - E[A ∧ 150] = 88.5 - 80 = 8.5.
Recursion Formula:

When one has a discrete distribution with support spaced at regular intervals, then Loss Models presents a systematic way to calculate the excess losses. As above, assume the aggregate losses in thousands of dollars for Merlinʼs Mall are approximated by the following discrete distribution:
f(50) = 0.6, f(100) = 0.2, f(150) = 0.1, f(200) = 0.05, f(250) = 0.03, f(300) = 0.02.
In this case, the density is only positive at a finite number of points, each 50 thousand apart.
Losses excess of 150 thousand = 50,000 Σ_{j=0 to ∞} S(150 + 50j)
= 50,000{S(150) + S(200) + S(250) + S(300)} = 50,000(0.1 + 0.05 + 0.02 + 0) = 8,500.146

145 Excess losses are the integral of the survival function from x to infinity. With x at the lower limit of integration, the derivative of the excess losses is -S(x) < 0. The second derivative of the excess losses is f(x) > 0.
146 Which matches the result calculated directly. One could rearrange the numbers that entered into the two calculations in order to see why the results are equal.
Thus in analogy to the continuous case, where one can write the excess losses as an integral of the survival function, in the discrete case, one can write the excess losses as a sum of survival functions, times ΔA:147
E[(A - jΔA)+] = ΔA Σ_{k=0 to ∞} S(jΔA + kΔA).
This result can be turned into a recursion formula. For the above example, losses excess of 150,000 = 50,000(0.1 + 0.05 + 0.02 + 0) = 8500. The losses excess of 200,000 = 50,000(0.05 + 0.02 + 0) = 3500 = 8500 - 5000 = (losses excess of 150,000) - (50,000)(0.1) = losses excess of 150,000 - ΔA S(150,000).
More generally, we can write the excess losses at the larger deductible, (j+1)ΔA, in terms of those at the smaller deductible, jΔA, and the Survival Function at the smaller deductible:148
E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
In other words, in this type of situation, raising the aggregate deductible of the insured by ΔA eliminates additional losses of ΔA S(jΔA), from the point of view of the insurer.149
This recursion can be very useful if there is some maximum value A can take on. In which case we could start at the top and work our way down. In the example, we know there are no aggregate losses excess of 300,000; the stop loss premium for a deductible of 300,000 is zero. Then we could calculate successively the stop loss premiums for deductibles 250, 200, 150, etc. Then any other deductibles can be handled via linear interpolation.
However, it is more generally useful to start at a deductible of zero and work oneʼs way up. The stop loss premium at a deductible of zero is the mean. Usually we would have already calculated the mean aggregate loss as the product of the mean frequency and the mean severity. In the example, the mean aggregate loss is:
(50)(0.6) + (100)(0.2) + (150)(0.1) + (200)(0.05) + (250)(0.03) + (300)(0.02) = 88.5 thousand.
Then the stop loss premium at a deductible of 50,000 is: 88.5 - (50)(1) = 38.5 thousand.
The stop loss premium at a deductible of 100,000 is: 38.5 - (50)(0.4) = 18.5.

147 See Theorem 9.5 in Loss Models. In the corresponding Lee Diagram, the excess losses are a sum of horizontal rectangles of width S(Ai) and height ΔA.
148 See Corollary 9.6 in Loss Models.
149 Or adds additional losses of ΔA S(jΔA), from the point of view of the insured.
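Comment: Here is a minimal Python sketch of this recursion for the example above (my own illustration, not part of Loss Models); the variable names are assumptions of this sketch.

# Stop loss premiums by the recursion E[(A - (j+1)dA)+] = E[(A - j dA)+] - dA * S(j dA),
# for the Merlin's Mall example (amounts in thousands, span dA = 50).
f = {50: 0.6, 100: 0.2, 150: 0.1, 200: 0.05, 250: 0.03, 300: 0.02}
dA = 50
mean = sum(a * p for a, p in f.items())          # 88.5

def S(x):
    return sum(p for a, p in f.items() if a > x)

premium, d = mean, 0
while d <= 300:
    print(d, round(premium, 2))
    premium -= dA * S(d)
    d += dA
# prints 0: 88.5, 50: 38.5, 100: 18.5, 150: 8.5, 200: 3.5, 250: 1.0, 300: 0.0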
This calculation can be arranged in a spreadsheet as follows:
Deductible    Survival Function    Stop Loss Premium
0             1                    88.5
50            0.4                  38.5
100           0.2                  18.5
150           0.1                  8.5
200           0.05                 3.5
250           0.02                 1
300           0                    0
Note that the stop loss premium at a deductible of 300 is 0. In general, the stop loss premium at a deductible of ∞ (or the largest possible aggregate loss) is zero.

Continuous Distributions of Aggregate Losses:

Assume the aggregate annual losses for Halfmoon Insurance are closely approximated by a LogNormal Distribution with µ = 16 and σ = 1.5.
Exercise: What are the mean aggregate annual losses for Halfmoon?
[Solution: exp(16 + 1.5^2/2) = 27,371,147.]
Exercise: For Halfmoon, what is the probability that the aggregate losses in a year will be larger than $100,000,000?
[Solution: 1 - Φ[(ln(100,000,000) - 16)/1.5] = 1 - Φ[1.62] = 5.26%.]
Halfmoon Insurance might buy stop loss reinsurance from Global Reinsurance.150 For example, assume Halfmoon buys stop loss reinsurance excess of $100 million. If Halfmoonʼs aggregate losses exceed $100 million in any given year, then Global Reinsurance will pay Halfmoon the amount by which the aggregate losses exceed $100 million.
Exercise: Halfmoonʼs aggregate losses in 2002 are $273 million. How much does Global Reinsurance pay Halfmoon?
[Solution: $273 million - $100 million = $173 million.]
150 Also called aggregate excess reinsurance. Stop loss reinsurance is mathematically identical to the purchase of insurance excess of an aggregate deductible.
Mathematically, the payments by Global Reinsurance are the same as the losses excess of a deductible or maximum covered loss of $100 million. The expected excess losses are the mean minus the limited expected value.151 The expected losses retained by Halfmoon are the limited expected value.
Exercise: What are the expected losses retained by Halfmoon and the expected payments by Global Reinsurance?
[Solution: For the LogNormal Distribution, the limited expected value is:
E[X ∧ x] = exp(µ + σ^2/2)Φ[(lnx − µ − σ^2)/σ] + x{1 - Φ[(lnx − µ)/σ]}.
E[X ∧ 100,000,000] = (27.37 million)Φ[(ln(100 million) - 16 - 1.5^2)/1.5] + (100 million){1 - Φ[(ln(100 million) - 16)/1.5]}
= (27.37 million)Φ[0.11] + (100 million){1 - Φ[1.61]} = (27.37 million)(0.5438) + (100 million)(0.0537) = $20.25 million.
Thus Halfmoon retains on average $20.25 million of losses. Global Reinsurance pays on average E[X] - E[X ∧ 100,000,000] = 27.37 million - 20.25 million = $7.12 million.
Comment: The formula for the limited expected value for the LogNormal Distribution is given in Appendix A of Loss Models.]
Thus ignoring Global Reinsuranceʼs expenses, etc., the net stop loss premium Global Reinsurance would charge Halfmoon would be in this case $7.12 million.152 In general, the stop loss premium depends on both the deductible and the distribution of the aggregate losses. For example, the stop loss premium for a deductible of $200 million would have been less than that for a deductible of $100 million.
Given the LogNormal Distribution one could calculate variances and higher moments for either the losses excess of the deductible or below the deductible. One could also do calculations concerning layers of loss. Mathematically these are the same type of calculations as were performed on severity distributions.153

151 See “Mahlerʼs Guide to Loss Distributions.”
152 See Definition 9.3 in Loss Models.
153 See “Mahlerʼs Guide to Loss Distributions.”
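Comment: The stop loss premium E[X] - E[X ∧ d] for the LogNormal can be computed without a Normal table, using the error function. Here is a minimal Python sketch (my own illustration, not part of Loss Models); the function names are assumptions of this sketch, and the result differs slightly from the table-based value above only because of rounding of Φ.

import math

mu, sigma = 16.0, 1.5

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_mean():
    return math.exp(mu + 0.5 * sigma ** 2)

def lognormal_lev(x):
    # E[X ^ x] = exp(mu + sigma^2/2) Phi[(ln x - mu - sigma^2)/sigma] + x (1 - Phi[(ln x - mu)/sigma])
    return (lognormal_mean() * Phi((math.log(x) - mu - sigma ** 2) / sigma)
            + x * (1.0 - Phi((math.log(x) - mu) / sigma)))

d = 100e6
print(round(lognormal_mean() / 1e6, 2))                       # mean: about 27.37 million
print(round((lognormal_mean() - lognormal_lev(d)) / 1e6, 2))  # stop loss premium: about 7.1 million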
Exercise: What is the variance of the losses retained by Halfmoon? What is the variance of the payments by Global Reinsurance?
[Solution: For the LogNormal Distribution:
E[(X ∧ x)^2] = exp[2µ + 2σ^2]Φ[{ln(x) − (µ + 2σ^2)}/σ] + x^2{1 - Φ[{ln(x) − µ}/σ]}.
For µ = 16 and σ = 1.5, E[(X ∧ 100 million)^2] = 1.122 x 10^15. E[X^2] = exp[2µ + 2σ^2] = 7.109 x 10^15.
E[X ∧ 100 million] = 2.025 x 10^7. E[X] = exp[µ + .5σ^2] = 27.37 million.
The variance of Halfmoonʼs retained losses is:
E[(X ∧ 100 million)^2] - E[X ∧ 100 million]^2 = 1.122 x 10^15 - (2.025 x 10^7)^2 = 7.12 x 10^14.
The second moment of Globalʼs payments is:
E[X^2] - E[(X ∧ 100 m)^2] - 2(100 million){E[X] - E[X ∧ 100 m]} = 7.109 x 10^15 - 1.122 x 10^15 - (2 x 10^8)(2.737 x 10^7 - 2.025 x 10^7) = 4.563 x 10^15.
From the previous solution, the mean of Globalʼs payments is $7.12 million. Therefore, the variance of Globalʼs payments is: 4.563 x 10^15 - (7.12 x 10^6)^2 = 4.512 x 10^15.]
There is nothing special about the LogNormal Distribution. One could apply the same ideas to the Uniform, Exponential, or other continuous distributions.
Exercise: Aggregate losses are uniformly distributed on (50, 100). What is the net stop loss premium for a deductible of 70?
[Solution: losses excess of 70 = ∫_70^100 (t - 70) f(t) dt = ∫_70^100 (t - 70)/50 dt = 9.
Alternately, for the uniform distribution, E[X ∧ x] = (2xb - a^2 - x^2)/{2(b - a)}, for a ≤ x ≤ b.
E[X ∧ 70] = {2(70)(100) - 50^2 - 70^2}/{2(100 - 50)} = 66. E[X] = (50 + 100)/2 = 75.
E[X] - E[X ∧ 70] = 75 - 66 = 9.]
For any continuous distribution, F(x), the mean, limited expected value, and therefore the excess losses can be written as an integral of the survival function S(x) = 1 - F(x).154
E[A] = ∫_0^∞ S(t) dt.

154 See “Mahlerʼs Guide to Loss Distributions.”
E[A ∧ d] = ∫_0^d S(t) dt.
losses excess of d = E[A] - E[A ∧ d] = ∫_d^∞ S(t) dt.
Loss Models also uses the notation E[(A - d)+] for the excess losses, where y+ is defined as 0 if y < 0 and y if y ≥ 0.
losses excess of d = E[(A - d)+] = ∫_d^∞ (t - d) f(t) dt = ∫_d^∞ S(t) dt.
The stop loss premium at 0 is the mean: E[(A - 0)+] = E[A]. The stop loss premium at ∞ is 0: E[(A - ∞)+] = E[0] = 0.

Other Quantities of Interest:

Once one has the distribution of aggregate losses, either discrete or continuous, one can calculate other quantities than the expected losses excess of an aggregate deductible; i.e., other than the stop loss premium. Basically any quantity we could calculate for a severity distribution,155 we could calculate for an aggregate distribution. For example, one can calculate higher moments. In particular one could calculate the variance of aggregate losses excess of an aggregate deductible.

155 See “Mahlerʼs Guide to Loss Distributions.”
Exercise: Assume the aggregate losses in thousands of dollars for Merlinʼs Mall are approximated by the following discrete distribution: f(50) = 0.6, f(100) = 0.2, f(150) = 0.1, f(200) = 0.05, f(250) = 0.03, f(300) = 0.02. Merlinʼs Mall buys stop loss insurance from Halfmoon Insurance, such that Halfmoon will pay for any aggregate losses excess of a $100 thousand deductible per year. What is the variance of payments by Halfmoon? [Solution: For aggregate losses of: 50, 100, 150, 200, 250, and 300, the amounts paid Halfmoon Insurance are respectively: 0, 0, 50, 100, 150, 200. Thus the expected amount paid by Halfmoon is: (0)(0.6) + (0)(0.2) + (50)(0.1) + (100)(0.05) + (150)(0.03) + (200)(0.02) = 18.5 thousand. The second moment is: (02 )(0.6) + (02 )(0.2) + (502 )(0.1) + (1002 )(0.05) + (1502 )(0.03) + (2002 )(0.02) = 2225 million. Therefore, the variance is: 2225 million - 342.25 million = 1882.75 million.] One could calculate the mean and variance of aggregate losses subject to an aggregate limit. The losses not paid by Halfmoon Insurance due to the aggregate deductible are paid for by the insured, Merlinʼs Mall. Thus from Merlinʼs Mallʼs point of view, it pays for aggregate losses subject to an aggregate maximum of $100,000. Exercise: In the previous exercise, what are the mean and variance of Merlinʼs Mallʼs aggregate losses after the impact of insurance? [Solution: For aggregate losses of: 50, 100, 150, 200, 250, and 300, the amounts paid Merlinʼs Mall after the effect of insurance are respectively: 50, 100, 100, 100, 100, 100. Thus the expected amount paid by Merlinʼs Mall is: (50)(0.6) + (100)(0.2) + (100)(0.1) + (100)(0.05) + (100)( 0.03) + (100)(0.02) = 70 thousand. The second moment is: (502 )(0.6) + (1002 )(0.4) = 5500 million. Therefore, the variance is: 5500 million - 4900 million = 600 million.] Here is an example of how one can do calculations related to layers of loss. Assume the aggregate annual losses for Halfmoon Insurance are closely approximated by a LogNormal Distribution with µ = 16 and σ = 1.5. If Halfmoonʼs aggregate losses exceed $100 million in any given year, then Global Reinsurance will pay Halfmoon the amount by which the aggregate losses exceed $100 million. However, Global will pay no more than $250 million per year. Exercise: Halfmoonʼs aggregate losses in 2002 are $273 million. How much does Global Reinsurance pay Halfmoon? [Solution: $273 million - $100 million = $173 million.]
Exercise: Halfmoonʼs aggregate losses in 2004 are $517 million. How much does Global Reinsurance pay Halfmoon?
[Solution: $517 million - $100 million = $417 million. However, Globalʼs payment is limited to $250 million.
Comment: Unless Halfmoon has additional reinsurance, Halfmoon pays $517 - $250 = $267 million in losses, net of reinsurance.]
Mathematically, the payments by Global Reinsurance are the same as the layer of losses from $100 to $350 million. The expected losses for Global are: E[X ∧ 350 million] - E[X ∧ 100 million].156 The expected losses retained by Halfmoon are: E[X] + E[X ∧ 100 million] - E[X ∧ 350 million].
Exercise: What are the expected losses retained by Halfmoon and the expected payments by Global Reinsurance?
[Solution: For the LogNormal Distribution, the limited expected value is:
E[X ∧ x] = exp(µ + σ^2/2)Φ[(lnx − µ − σ^2)/σ] + x{1 - Φ[(lnx − µ)/σ]}.
For µ = 16 and σ = 1.5: E[X ∧ 100 million] = $20.25 million. E[X ∧ 350 million] = $25.17 million.
E[X] = exp(µ + σ^2/2) = $27.37 million.
Thus Global Reinsurance pays on average E[X ∧ 350 million] - E[X ∧ 100 million] = 25.17 million - 20.25 million = $4.92 million.
Halfmoon retains on average $27.37 - $4.92 = $22.45 million of losses.]
Similarly, one could calculate the variance of the layers of losses. The second moment of the layer of loss from d to u is:
E[(X ∧ u)^2] - E[(X ∧ d)^2] - 2d{E[X ∧ u] - E[X ∧ d]}.157

156 See “Mahlerʼs Guide to Loss Distributions.”
157 See “Mahlerʼs Guide to Loss Distributions.”
Exercise: What is the variance of payments by Global Reinsurance?
[Solution: For the LogNormal Distribution:
E[(X ∧ x)^2] = exp[2µ + 2σ^2]Φ[{ln(x) − (µ + 2σ^2)}/σ] + x^2{1 - Φ[{ln(x) − µ}/σ]}.
For µ = 16 and σ = 1.5, E[(X ∧ 100 million)^2] = 1.122 x 10^15. E[(X ∧ 350 million)^2] = 2.940 x 10^15.
E[X ∧ 100 million] = 2.025 x 10^7. E[X ∧ 350 million] = 2.517 x 10^7.
The second moment of Globalʼs payments is:
E[(X ∧ 350 m)^2] - E[(X ∧ 100 m)^2] - 2(100 million){E[X ∧ 350 m] - E[X ∧ 100 m]}
= 2.940 x 10^15 - 1.122 x 10^15 - (2 x 10^8)(2.517 x 10^7 - 2.025 x 10^7) = 8.34 x 10^14.
From the previous solution, the mean of Globalʼs payments is $4.92 million. Therefore, the variance of Globalʼs payments is: 8.34 x 10^14 - (4.92 x 10^6)^2 = 8.10 x 10^14.
Comment: The formula for the limited moments for the LogNormal Distribution is given in Appendix A of Loss Models.]
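Comment: Here is a minimal Python sketch of the layer calculation above (my own illustration, not part of Loss Models), using the LogNormal limited first and second moments; the function names are assumptions of this sketch, and small differences from the figures above are due to rounding of Φ.

import math

mu, sigma = 16.0, 1.5

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lev1(x):   # E[X ^ x]
    return (math.exp(mu + 0.5 * sigma ** 2) * Phi((math.log(x) - mu - sigma ** 2) / sigma)
            + x * (1.0 - Phi((math.log(x) - mu) / sigma)))

def lev2(x):   # E[(X ^ x)^2]
    return (math.exp(2 * mu + 2 * sigma ** 2) * Phi((math.log(x) - mu - 2 * sigma ** 2) / sigma)
            + x ** 2 * (1.0 - Phi((math.log(x) - mu) / sigma)))

d, u = 100e6, 350e6
layer_mean = lev1(u) - lev1(d)
layer_second = lev2(u) - lev2(d) - 2 * d * (lev1(u) - lev1(d))
layer_var = layer_second - layer_mean ** 2
print(round(layer_mean / 1e6, 2))      # about 4.9 million
print('%.2e' % layer_var)              # about 8.1e14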
Problems: 11.1 (1 point) The stop loss premium for a deductible of $1 million is $120,000. The stop loss premium for a deductible of $1.1 million is $111,000. Assuming the aggregate losses are very unlikely to be between $1 million and $1.1 million dollars, what is the stop loss premium for a deductible of $1.08 million? A. Less than 112,500 B. At least 112,500, but less than 112,600 C. At least 112,600, but less than 112,700 D. At least 112,700, but less than 112,800 E. At least 112,800 11.2 (3 points) The aggregate annual losses have a mean of 13,000 and a standard deviation of 92,000. Approximate the distribution of aggregate losses by a LogNormal Distribution, and then estimate the stop loss premium for a deductible of 25,000. A. Less than 7000 B. At least 7000, but less than 7100 C. At least 7100, but less than 7200 D. At least 7200, but less than 7300 E. At least 7300 Use the following information for the next 9 questions: The aggregate losses have been approximated by the following discrete distribution: f(0) = 0.3, f(10) = 0.4, f(20) = 0.1, f(30) = 0.08, f(40) = 0.06, f(50) = 0.04, f(60) = 0.02. 11.3 (1 point) What is the mean aggregate loss? A. Less than 15 B. At least 15, but less than 16 C. At least 16, but less than 17 D. At least 17, but less than 18 E. At least 18 11.4 (1 point) What is the stop loss premium, for a deductible of 10? A. Less than 4 B. At least 4, but less than 5 C. At least 5, but less than 6 D. At least 6, but less than 7 E. At least 7
11.5 (1 point) What is the stop loss premium, for a deductible of 20? A. Less than 2 B. At least 2, but less than 3 C. At least 3, but less than 4 D. At least 4, but less than 5 E. At least 5 11.6 (1 point) What is the stop loss premium, for a deductible of 30? A. Less than 0.5 B. At least 0.5, but less than 1.0 C. At least 1.0, but less than 1.5 D. At least 1.5, but less than 2.0 E. At least 2.0 11.7 (1 point) What is the stop loss premium, for a deductible of 40? A. Less than 0.25 B. At least 0.25, but less than 0.50 C. At least 0.50, but less than 0.75 D. At least 0.75, but less than 1.00 E. At least 1.00 11.8 (1 point) What is the stop loss premium, for a deductible of 50? A. Less than 0.1 B. At least 0.1, but less than 0.2 C. At least 0.2, but less than 0.3 D. At least 0.3, but less than 0.4 E. At least 0.4 11.9 (1 point) What is the stop loss premium, for a deductible of 60? A. Less than 0.1 B. At least 0.1, but less than 0.2 C. At least 0.2, but less than 0.3 D. At least 0.3, but less than 0.4 E. At least 0.4 11.10 (1 point) What is the stop loss premium, for a deductible of 33? A. Less than 1.4 B. At least 1.4, but less than 1.5 C. At least 1.5, but less than 1.6 D. At least 1.6, but less than 1.7 E. At least 1.7
11.11 (1 point) If the stop loss premium is 3.7, what is the corresponding deductible? A. Less than 19 B. At least 19, but less than 20 C. At least 20, but less than 21 D. At least 21, but less than 22 E. At least 22 Use the following information for the next three questions:
• Aggregate losses follow an Exponential distribution. • There is an aggregate deductible of 250. • An insurer has priced the stop loss insurance assuming a gross premium 30% more than the stop loss premium. 11.12 (1 point) What is the stop loss premium if the mean of the Exponential is 100? A. Less than 8 B. At least 8, but less than 9 C. At least 9, but less than 10 D. At least 10, but less than 11 E. At least 11 11.13 (1 point) What is the stop loss premium if the mean of the Exponential is 110? A. Less than 12 B. At least 12, but less than 13 C. At least 13, but less than 14 D. At least 14, but less than 15 E. At least 15 11.14 (1 point) If the insurer assumed that the mean of the exponential was 100, but it was actually 110, then what is the ratio of gross premium charged to the (correct) stop loss premium? A. Less than 94% B. At least 94%, but less than 95% C. At least 95%, but less than 96% D. At least 96%, but less than 97% E. At least 97%
11.15 (2 points) The stop loss premium at a deductible of 150 is 11.5. The stop loss premium for a deductible of 180 is 9.1. There is no chance that the aggregate losses are between 140 and 180. What is the probability that the aggregate losses are less than or equal to 140? A. Less than 90% B. 90% C. 91% D. 92% E. More than 92%
11.16 (2 points) The average disability lasts 47 days. The insurer will pay for all days beyond the first 10. The insurer will only pay for 75% of the cost of the first 10 days. The cost per day is $80. 60% of disabilities are 10 days or less. Assume that those disabilities of 10 days or less are uniformly distributed from 1 to 10. What is the expected cost for the insurer per disability? A. Less than 3580 B. At least 3580, but less than 3600 C. At least 3600, but less than 3620 D. At least 3620, but less than 3640 E. At least 3640 Use the following information for the next 6 questions:
• Aggregate losses for Slippery Rock Insurance have the following distribution:
f(0) = 47%, f(1) = 10%, f(2) = 20%, f(5) = 13%, f(10) = 6%, f(25) = 3%, f(50) = 1%.
• Slippery Rock Insurance buys aggregate reinsurance from Global Reinsurance. Global will pay those aggregate losses in excess of 8 per year.
• Slippery Rock collects premiums equal to 110% of its expected losses prior to the impacts of reinsurance.
• Global Reinsurance collects from Slippery Rock Insurance 125% of the losses Global Reinsurance expects to pay.
11.17 (1 point) How much premium does Slippery Rock Insurance collect? A. Less than 3.5 B. At least 3.5, but less than 3.6 C. At least 3.6, but less than 3.7 D. At least 3.7, but less than 3.8 E. At least 3.8 11.18 (1 point) What is the variance of Slippery Rockʼs aggregate losses, prior to the impact of reinsurance? A. Less than 30 B. At least 30, but less than 40 C. At least 40, but less than 50 D. At least 50, but less than 60 E. At least 60
11.19 (1 point) What are the expected aggregate losses for Slippery Rock after the impact of reinsurance? A. Less than 1.5 B. At least 1.5, but less than 1.6 C. At least 1.6, but less than 1.7 D. At least 1.7, but less than 1.8 E. At least 1.8 11.20 (1 point) What are the variance of aggregate losses for Slippery Rock after the impact of reinsurance? A. Less than 6.5 B. At least 6.5, but less than 6.6 C. At least 6.6, but less than 6.7 D. At least 6.7, but less than 6.8 E. At least 6.8 11.21 (1 point) How much does Slippery Rock pay Global Reinsurance? A. Less than 1.4 B. At least 1.4, but less than 1.5 C. At least 1.5, but less than 1.6 D. At least 1.6, but less than 1.7 E. At least 1.7 11.22 (1 point) Global Reinsurance in turns buys reinsurance from Cosmos Assurance covering payments due to Globalʼs contract with Slippery Rock. Cosmos Assurance will reimburse Global for the portion of its payments in excess of 12. What are Globalʼs expected aggregate losses, after the impact of its reinsurance with Cosmos? A. Less than 0.6 B. At least 0.6, but less than 0.7 C. At least 0.7, but less than 0.8 D. At least 0.8, but less than 0.9 E. At least 0.9
11.23 (2 points) The aggregate annual losses follow approximately a LogNormal Distribution with parameters µ = 9.902 and σ = 1.483. Estimate the stop loss premium for a deductible of 100,000. (A) 25,000 (B) 27,000 (C) 29,000 (D) 31,000 (E) 33,000
11.24 (2 points) The aggregate losses for Mercer Trucking are given by a Compound Poisson Distribution with λ = 3. The mean severity is $10. The net stop loss premium at $25 is $14.2. The insurer will pay Mercer Trucking a dividend if Mercer Truckingʼs aggregate losses are less than $25. The dividend will be 30% of the amount by which $25 exceeds Mercer Truckingʼs aggregate losses. What is the expected value of next yearʼs dividend?
A. Less than 2.8 B. At least 2.8, but less than 2.9 C. At least 2.9, but less than 3.0 D. At least 3.0, but less than 3.1 E. At least 3.1

Use the following information for the next 3 questions:
Aggregate Deductible    Stop Loss Premium        Aggregate Deductible    Stop Loss Premium
100,000                 2643                     1,000,000               141
150,000                 1633                     2,000,000               53
200,000                 1070                     3,000,000               26
250,000                 750                      4,000,000               15
300,000                 563                      5,000,000               10
500,000                 293
Assume there is no probability between the given amounts.

11.25 (1 point) A stop loss insurance pays the excess of aggregate losses above 700,000. Determine the amount the insurer expects to pay.
(A) 200 (B) 210 (C) 220 (D) 230 (E) 240

11.26 (1 point) A stop loss insurance pays the excess of aggregate losses above 250,000 subject to a maximum payment of 750,000. Determine the amount the insurer expects to pay.
(A) 600 (B) 610 (C) 620 (D) 630 (E) 640

11.27 (2 points) A stop loss insurance pays 75% of the excess of aggregate losses above 500,000 subject to a maximum payment of 1,000,000. Determine the amount the insurer expects to pay.
(A) 140 (B) 150 (C) 160 (D) 170 (E) 180

11.28 (3 points) A manufacturer will buy stop loss insurance with an annual aggregate deductible of D. If annual aggregate losses are less than D, then the manufacturer will pay its workers a safety bonus of one third the amount by which annual losses are less than D. D is chosen so as to minimize the sum of the stop loss premium and the expected bonuses. What portion of the time will the annual aggregate losses exceed D?
(A) 1/4 (B) 1/3 (C) 2/5 (D) 1/2 (E) 3/5
11.29 (5 points) The Duff Brewery buys Workersʼ Compensation Insurance. Duffʼs annual aggregate losses are LogNormally Distributed with µ = 13.5 and σ = 0.75. Duffʼs premiums depend on its actual aggregate annual losses, A. Premium = 1.05(200,000 + 1.1A), subject to a minimum premium of 500,000 and a maximum premium of 2,500,000. What is Duffʼs average premium?
A. Less than 1.3 million B. At least 1.3 million, but less than 1.4 million C. At least 1.4 million, but less than 1.5 million D. At least 1.5 million, but less than 1.6 million E. At least 1.6 million

11.30 (Course 151 Sample Exam #1, Q.14) (1.7 points) Aggregate claims have a compound Poisson distribution with λ = 4, and a severity distribution: p(1) = 3/4 and p(2) = 1/4. Determine the stop loss premium at 2.
(A) 3.05 (B) 3.07 (C) 3.09 (D) 3.11 (E) 3.13

11.31 (Course 151 Sample Exam #1, Q.16) (1.7 points) A stop-loss reinsurance pays 80% of the excess of aggregate claims above 20, subject to a maximum payment of 5. All claim amounts are non-negative integers. Let In be the stop loss premium for a deductible of n (and no limit). You are given:
E[I16] = 3.89, E[I20] = 3.33, E[I24] = 2.84, E[I25] = 2.75, E[I26] = 2.69, E[I27] = 2.65.
Determine the total amount of claims the reinsurer expects to pay.
(A) 0.46 (B) 0.49 (C) 0.52 (D) 0.54 (E) 0.56

11.32 (Course 151 Sample Exam #2, Q.8) (0.8 points) A random loss is uniformly distributed over (0, 80). Two types of insurance are available.
Type                             Premium
Stop loss with deductible 10     Insurer's expected claim plus 14.6
Complete                         Insurer's expected claim times (1+k)
The two premiums are equal. Determine k.
(A) 0.07 (B) 0.09 (C) 0.11 (D) 0.13 (E) 0.15
11.33 (Course 151 Sample Exam #2, Q.22) (1.7 points) Aggregate claims has a compound Negative Binomial distribution with r = 2 and β = 7/3, and individual claim distribution: x p(x) 2 2/3 5 1/3 Determine the stop loss premium at 2. (A) 11.4 (B) 11.8 (C) 12.2
(D) 12.6
(E) 13.0
11.34 (Course 151 Sample Exam #3, Q.21) (1.7 points) For aggregate claims S, you are given: (i) S can only take on positive integer values. (ii) The stop loss premium at zero is 5/3. (iii) The stop loss premium at two is 1/6. (iv) The stop loss premium at three is 0. Determine fS(1). (A) 1/6
(B) 7/18
(C) 1/2
(D) 11/18
(E) 5/6
11.35 (5A, 5/94, Q.24) (1 point) Suppose S has a compound Poisson distribution with Poisson parameter of 2 and E(S) = $200. Net stop-loss premiums with deductibles of $400 and $500 are $100 and $25, respectively. The premium is $500. The insurer agrees to pay a dividend equal to the excess of 80% of the premium over the claims. What is the expected value of the dividend? A. Less than $200 B. At least $200, but less than $250 C. At least $250, but less than $300 D. At least $300, but less than $350 E. $350 or more 11.36 (5A, 5/94, Q.38) (2 points) Assume that the aggregate claims for an insurer have a compound Poisson Distribution with lambda = 2. Individual claim amounts are equal to 1, 2, 3 with probabilities 0.4, 0.3, 0.3, respectively. Calculate the net stop-loss premium for a deductible of 2. 11.37 (CAS9, 11/98, Q.30a) (1 point) Your company has an expected loss ratio of 50%. You have analyzed year-to-year variation and determined that any particular accident yearʼs loss ratio will be uniformly distributed on the interval 40% to 60%. If expected losses are $5.0 million on subject premium of $10.0 million, what is the expected value of losses ceded to an aggregate stop-loss cover with a retention of a 55% loss ratio?
Use the following information for the next two questions: • An aggregate loss distribution has a compound Poisson distribution with expected number of claims equal to 1.25. • Individual claim amounts can take only the values 1, 2 or 3, with equal probability. 11.38 (Course 3 Sample Exam, Q.14) Determine the probability that aggregate losses exceed 3. 11.39 (Course 3 Sample Exam, Q.15) Calculate the expected aggregate losses if an aggregate deductible of 1.6 is applied. 11.40 (3, 5/00, Q.11) (2.5 points) A company provides insurance to a concert hall for losses due to power failure. You are given: (i) The number of power failures in a year has a Poisson distribution with mean 1. (ii) The distribution of ground up losses due to a single power failure is: x Probability of x 10 0.3 20 0.3 50 0.4 (iii) The number of power failures and the amounts of losses are independent. (iv) There is an annual deductible of 30. Calculate the expected amount of claims paid by the insurer in one year. (A) 5 (B) 8 (C) 10 (D) 12 (E) 14 11.41 (3 points) In the previous question, calculate the expected amount of claims paid by the insurer in one year, if there were an annual deductible of 50 rather than 30. A. 6.5 B. 7.0 C. 7.5 D. 8.0 E. 8.5 11.42 (3, 5/01, Q.19 & 2009 Sample Q.107) (2.5 points) For a stop-loss insurance on a three person group: (i) Loss amounts are independent. (ii) The distribution of loss amount for each person is: Loss Amount Probability 0 0.4 1 0.3 2 0.2 3 0.1 (iii) The stop-loss insurance has a deductible of 1 for the group. Calculate the net stop-loss premium. (A) 2.00 (B) 2.03 (C) 2.06 (D) 2.09 (E) 2.12
11.43 (3, 5/01, Q.30) (2.5 points) You are the producer of a television quiz show that gives cash prizes. The number of prizes, N, and prize amounts, X, have the following distributions:
n    Pr(N = n)
1    0.8
2    0.2

x       Pr(X = x)
0       0.2
100     0.7
1000    0.1
You buy stop-loss insurance for prizes with a deductible of 200. The relative security loading is defined as: (premiums / expected losses) - 1. The cost of insurance includes a 175% relative security load. Calculate the cost of the insurance.
(A) 204   (B) 227   (C) 245   (D) 273   (E) 357

11.44 (3, 11/01, Q.18 & 2009 Sample Q.99) (2.5 points) For a certain company, losses follow a Poisson frequency distribution with mean 2 per year, and the amount of a loss is 1, 2, or 3, each with probability 1/3. Loss amounts are independent of the number of losses, and of each other. An insurance policy covers all losses in a year, subject to an annual aggregate deductible of 2. Calculate the expected claim payments for this insurance policy.
(A) 2.00   (B) 2.36   (C) 2.45   (D) 2.81   (E) 2.96

11.45 (2 points) In the previous question, 3, 11/01, Q.18, let Y be the claim payments for this insurance policy. Determine E[Y | Y > 0].
(A) 3.5   (B) 3.6   (C) 3.7   (D) 3.8   (E) 3.9

11.46 (3, 11/02, Q.16 & 2009 Sample Q.92) (2.5 points) Prescription drug losses, S, are modeled assuming the number of claims has a geometric distribution with mean 4, and the amount of each prescription is 40. Calculate E[(S - 100)+].
(A) 60   (B) 82   (C) 92   (D) 114   (E) 146
11.47 (SOA M, 5/05, Q.18 & 2009 Sample Q.165) (2.5 points) For a collective risk model:
(i) The number of losses has a Poisson distribution with λ = 2.
(ii) The common distribution of the individual losses is:
x    fX(x)
1    0.6
2    0.4
An insurance covers aggregate losses subject to an aggregate deductible of 3.
Calculate the expected aggregate payments of the insurance.
(A) 0.74   (B) 0.79   (C) 0.84   (D) 0.89   (E) 0.94
Comment: I have rewritten slightly this past exam question.

11.48 (2 points) In the previous question, SOA M, 5/05, Q.18, for those cases where the aggregate payment is positive, what is the expected aggregate payment?
(A) 2.1   (B) 2.3   (C) 2.5   (D) 2.7   (E) 2.9

11.49 (SOA M, 11/05, Q.19 & 2009 Sample Q.206) (2.5 points) In a given week, the number of projects that require you to work overtime has a geometric distribution with β = 2. For each project, the distribution of the number of overtime hours in the week is the following:
x     f(x)
5     0.2
10    0.3
20    0.5
The number of projects and number of overtime hours are independent. You will get paid for overtime hours in excess of 15 hours in the week. Calculate the expected number of overtime hours for which you will get paid in the week.
(A) 18.5   (B) 18.8   (C) 22.1   (D) 26.2   (E) 28.0

11.50 (SOA M, 11/06, Q.7 & 2009 Sample Q.280) (2.5 points) A compound Poisson claim distribution has λ = 5 and individual claim amounts distributed as follows:
x    fX(x)
5    0.6
k    0.4, where k > 5
The expected cost of an aggregate stop-loss insurance subject to a deductible of 5 is 28.03.
Calculate k.
(A) 6   (B) 7   (C) 8   (D) 9   (E) 10
Solutions to Problems:

11.1. E. Linearly interpolate: (.2)(120,000) + (.8)(111,000) = 112,800.
Comment: If, as is commonly the case, there is some probability that the aggregate losses are between $1 and $1.1 million, then the stop loss premium at $1.08 million is likely to be somewhat closer to $111,000 than calculated here.

11.2. E. Set the observed and theoretical first two moments equal:
mean = 13,000 = exp(µ + σ²/2).
second moment = exp(2µ + 2σ²) = 92,000² + 13,000² = 8633 million.
σ² = ln(8633 million) - 2 ln(13000) = 3.933, so σ = √3.933 = 1.983.
µ = ln(13000) - σ²/2 = 7.507. E[X] = exp(µ + σ²/2) = 13000.
E[X ∧ x] = exp(µ + σ²/2)Φ[(ln x − µ − σ²)/σ] + x {1 − Φ[(ln x − µ)/σ]}.
E[X ∧ 25,000] = 13000 Φ[-.66] + (25000){1 - Φ[1.32]} = (13000)(1 - .7454) + (25000)(1 - .9066) = 5645.
E[X] - E[X ∧ 25,000] = 13000 - 5645 = 7355.
11.3. A. (0)(.3) + (10)(.4) + (20)(.1) + (30)(.08) + (40)(.06) + (50)(.04) + (60)(.02) = 14.0.

11.4. E., 11.5. D., 11.6. E., 11.7. D., 11.8. C., and 11.9. A. The fastest way to do this set of problems is to use the recursion formula: E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
Deductible    Survival Function    Stop Loss Premium
0             0.7                  14.00
10            0.3                  7.00
20            0.2                  4.00
30            0.12                 2.00
40            0.06                 0.80
50            0.02                 0.20
60            0                    0.00
11.10. D. Since there is no probability between 30 and 40, we can linearly interpolate: (0.7)(2.00) + (0.3)(0.8) = 1.64. 11.11. D. Linearly interpolating between a deductible of 20 and 30: d = {(4 - 3.7)(30) + (3.7 - 2)(20)} / (4 - 2) = 21.5.
11.12. B. The mean excess loss for an Exponential is equal to its mean. The losses excess of 250 are: e(250) S(250) = 100 e^(-250/100) = 8.21.

11.13. A. The mean excess loss for an Exponential is equal to its mean. The losses excess of 250 are: e(250) S(250) = 110 e^(-250/110) = 11.33.

11.14. B. The insurer would charge a gross premium of: (1.3)(8.21) = 10.67. The mean excess losses are: 11.33. 10.67 / 11.33 = 0.942.
Comment: Thus the expected excess losses are 94.2% of charged premiums, rather than the desired: 1/1.3 = 76.9%. Even a 10% mistake in estimating the mean had a large effect on excess losses. This is the same mathematical reason why the inflation rate of excess losses over a fixed limit is greater than that of the total losses.

11.15. D. Since there is no probability in the interval 150 to 180, the stop loss premium at 180 = stop loss premium at 150 - (180 - 150)S(150). Therefore, S(150) = (11.5 - 9.1)/(180 - 150) = 0.08. Therefore F(150) = 1 - 0.08 = 0.92. Since there is no probability in the interval (140, 150], F(140) = F(150) = 0.92.

11.16. C. If the insurer paid for everything, then the expected cost = (47)(80) = $3760. There is the equivalent of a (25%)(80) = $20 per day deductible for the first 10 days. For those who stay more than 10 days this is $200. For those who stay 10 days or less, they average: (1+10)/2 = 5.5 days, so the deductible is worth on average: (5.5)($20) = $110. Weighting together the two cases, the deductible is worth: (60%)(110) + (40%)(200) = $146. Thus the insurer expects to pay: 3760 - 146 = $3614.

11.17. A. Slippery Rockʼs expected losses are: (1)(0.1) + (2)(0.2) + (5)(0.13) + (10)(0.06) + (25)(0.03) + (50)(0.01) = 3. Thus it charges a premium of: (1.1)(3) = 3.3.

11.18. C. The second moment of Slippery Rockʼs losses is: (1)(0.1) + (4)(0.2) + (25)(0.13) + (100)(0.06) + (625)(0.03) + (2500)(0.01) = 53.9. Therefore, the variance = 53.9 - 3² = 44.9.

11.19. E. (1)(0.1) + (2)(0.2) + (5)(0.13) + (8)(0.06) + (8)(0.03) + (8)(0.01) = 1.95.

11.20. D. After reinsurance, the second moment of Slippery Rockʼs losses is: (1)(0.1) + (4)(0.2) + (25)(0.13) + (64)(0.06) + (64)(0.03) + (64)(0.01) = 10.55. Therefore, the variance = 10.55 - 1.95² = 6.75.
11.21. A. Globalʼs expected payments are 3 - 1.95 = 1.05. Therefore, Global charges Slippery Rock: (1.25)(1.05) = 1.31.

11.22. B. Cosmos pays Global when Slippery Rockʼs losses exceed: 8 + 12 = 20. Thus Cosmosʼs expected losses are: (5)(0.03) + (30)(0.01) = 0.45. Thus Globalʼs expected losses net of reinsurance are 1.05 - 0.45 = 0.60.

11.23. A. E[X] = exp(µ + σ²/2) = exp(9.902 + 1.483²/2) = 59,973.
E[X ∧ x] = exp(µ + σ²/2)Φ[(ln x − µ − σ²)/σ] + x {1 − Φ[(ln x − µ)/σ]}.
E[X ∧ 100,000] = (59,973)Φ[(ln 100000 - 9.902 - 1.483²)/1.483] + (100,000){1 - Φ[(ln 100000 - 9.902)/1.483]}
= 59,973 Φ[-.40] + (100,000)(1 - Φ[1.09]) = (59,973)(.3446) + (100,000)(.1379) = 34,457.
E[X] - E[X ∧ 100,000] = 59,973 - 34,457 = 25.5 thousand.
11.24. A. Let A be the aggregate losses. The net stop loss premium at 25 is: E[A] - E[A ∧ 25] = 14.2. Thus E[A ∧ 25] = E[A] - 14.2 = (3)(10) - 14.2 = 15.8. E[(25 - A)+] = 25 - E[A ∧ 25] = 25 - 15.8 = 9.2. The dividend is 0.3(25 - A)+. Therefore, the expected dividend is: (0.3)(9.2) = 2.76.

11.25. D. Linearly interpolating: (0.6)(293) + (0.4)(141) = 232.

11.26. B. In order to get the aggregate layer from 250,000 to 250,000 + 750,000 = 1,000,000, subtract the stop loss premiums: 750 - 141 = 609.

11.27. D. An aggregate loss of 1,833,333 results in a payment of: (0.75)(1,833,333 - 500,000) = 1 million. Thus the insurer pays 75% of the layer from 500,000 to 1.833 million. In order to get the stop loss premium at 1.833 million, linearly interpolate: (0.167)(141) + (0.833)(53) = 67.7. In order to get the aggregate layer from 500,000 to 1,833,333, subtract the stop loss premiums: 293 - 67.7 = 225.3. The insurer pays 75% of this layer: (75%)(225.3) = 169.
11.28. A. The stop loss premium is: E[(A - D)+] = E[A] - E[A ∧ D].
The average amount by which aggregate losses are less than D is: E[(D - A)+] = D - E[A ∧ D].
The stop loss premium plus expected bonus is: E[(A - D)+] + E[(D - A)+]/3 = E[A] - E[A ∧ D] + (D - E[A ∧ D])/3 = E[A] + D/3 - (4/3)E[A ∧ D].
Note that E[A ∧ D] is the integral of the survival function from 0 to D, and therefore, d E[A ∧ D] / dD = S(D).
Setting equal to zero the derivative of the stop loss premium plus expected bonus: 1/3 - (4/3)S(D) = 0. ⇒ S(D) = 1/4.

11.29. A. Premium = 500,000 ⇔ 500000 = 1.05(200,000 + 1.1A) ⇒ A = (500000/1.05 - 200000)/1.1 = 251,082.
So the minimum premiums are paid if A ≤ 251,082.
Premium = 2,500,000 ⇔ 2500000 = 1.05(200,000 + 1.1A) ⇒ A = (2500000/1.05 - 200000)/1.1 = 1,982,684.
So the maximum premiums are paid if A ≥ 1,982,684.
If there were no maximum or minimum premium, then the average premium would be: 1.05(200,000 + 1.1E[A]).
If there were no minimum premium, then the average premium would be: 1.05(200,000 + 1.1E[A ∧ 1,982,684]).
Due to the minimum premium, we add to E[A ∧ 1,982,684] the average amount by which losses are less than 251,082, which is: 251,082 - E[A ∧ 251,082].
Thus the average premiums are: 1.05(200,000 + 1.1{E[A ∧ 1,982,684] + 251,082 - E[A ∧ 251,082]}) = 500,000 + (1.05)(1.1){E[A ∧ 1,982,684] - E[A ∧ 251,082]} = Minimum Premium + (1.05)(1.1){Layer of Loss from 251,082 to 1,982,684}.
For the LogNormal, E[X ∧ x] = exp(µ + σ²/2)Φ[(ln x − µ − σ²)/σ] + x {1 − Φ[(ln x − µ)/σ]}.
E[X ∧ 1,982,684] = exp(13.5 + .75²/2)Φ[(ln 1982684 - 13.5 - .75²)/.75] + 1,982,684 {1 - Φ[(ln 1982684 - 13.5)/.75]} = 966,320 Φ[.59] + 1,982,684 {1 - Φ[1.33]} = (966,320)(.7224) + (1,982,684)(1 - .9082) = 880,080.
E[X ∧ 251,082] = exp(13.5 + .75²/2)Φ[(ln 251082 - 13.5 - .75²)/.75] + 251,082 {1 - Φ[(ln 251082 - 13.5)/.75]} = 966,320 Φ[-2.17] + 251,082 {1 - Φ[-1.42]} = (966,320)(.0150) + (251,082)(.9222) = 246,048.
Therefore, the average premium is: 500,000 + (1.05)(1.1)(880,080 - 246,048) = 1.23 million.
Comment: A simplified example of Retrospective Rating.
11.30. C. The mean severity is: (3/4)(1) + (1/4)(2) = 5/4. The mean aggregate loss is (5/4)(4) = 5.
The probability that the aggregate losses are zero is the probability of zero claims, which is: e^-4.
The probability that the aggregate losses are one is the probability that there is one claim and it is of size one, which is: (3/4)(4e^-4) = 3e^-4.
The stop loss premium at 0 is the mean aggregate loss of 5.
We can make use of the recursion: E[(X - (j+1)Δx)+] = E[(X - jΔx)+] - Δx S(jΔx).
Deductible    Survival Function    Stop Loss Premium
0             0.9817               5
1             0.9267               4.0183
2                                  3.0916
Alternately, the expected value of aggregate losses limited to 2 is: (0)(e^-4) + (1)(3e^-4) + (2)(1 - 4e^-4) = 1.9084. The expected value of aggregate losses unlimited is 5. Thus the expected value of aggregate losses excess of 2 is: 5 - 1.9084 = 3.0916.

11.31. C. If the insurer has aggregate losses of 26.25, then the reinsurer pays: .8(26.25 - 20) = 5. If the insurer has losses greater than 26.25, then the reinsurer still pays 5. If the insurer has losses less than 20, then the reinsurer pays nothing. Thus the reinsurerʼs payments are 80% of the layer of aggregate losses from 20 to 26.25. The layer from 20 to 26.25 is the difference between the aggregate losses excess of 20 and those excess of 26.25: I20 - I26.25. By linear interpolation I26.25 = 2.68. Thus the reinsurerʼs expected payments are: (0.8)(I20 - I26.25) = (0.8)(3.33 - 2.68) = 0.52.

11.32. D. For a uniform distribution on (0, 80), the expected loss is 40. Thus the premium for complete insurance is (1+k)(40). For a deductible of 10 the average payment per loss is:
∫ from 10 to 80 of (x - 10)(1/80) dx = (1/80)(x - 10)²/2 evaluated from 10 to 80 = 70²/160 = 30.625.
Thus with a deductible of 10, the premium is: 30.625 + 14.6 = 45.225.
Setting the two premiums equal: (1+k)(40) = 45.225, or k = 0.13.
11.33. C. The mean severity is: (2)(2/3) + (5)(1/3) = 3. The mean frequency is: (2)(7/3) = 14/3. Thus the mean aggregate loss is: (14/3)(3) = 14. The aggregate losses are at least 2 if there is a claim. The chance of no claims is: 1/(1+β)^r = 1/(10/3)² = 0.09. Thus the expected aggregate losses limited to 2 are: (0)(0.09) + (2)(1 - 0.09) = 1.82. Thus the expected aggregate losses excess of 2 are: 14 - 1.82 = 12.18.

11.34. C. Since the stop loss premium at 3 is zero, S is never greater than 3. Since S can only take on positive integer values, S can only be 1, 2, or 3.
Stop loss premium at zero = (1)f(1) + (2)f(2) + (3)f(3) = 5/3.
Stop loss premium at two = (0)f(1) + (0)f(2) + (1)f(3) = 1/6.
Therefore, f(3) = 1/6 and f(1) + 2f(2) = 5/3 - 3/6 = 7/6. Therefore, f(2) = (7/6 - f(1))/2.
Now f(1) + f(2) + f(3) = 1. Therefore, f(1) + (7/6 - f(1))/2 + 1/6 = 1. Solving, f(1) = 1/2.
Comment: For f(1) = 1/2, f(2) = 1/3, and f(3) = 1/6, the stop loss premium at 0 is the mean: (1)(1/2) + (2)(1/3) + (3)(1/6) = 5/3. The stop loss premium at 2 is: (0)(1/2) + (0)(1/3) + (1/6)(3 - 2) = 1/6.
11.35. D. If the loss is (80%)(500) = 400 or more, then the insurer pays no dividend. If the loss S is less than 400, then the insurer pays a dividend of 400 - S. Thus the dividend is 400 - S when S ≤ 400, and is zero when S ≥ 400. The net stop loss premium at 400 is: E [zero when S ≤ 400 and S - 400 when S≥ 400]. Dividend + S - 400 is zero when S ≤ 400, and S - 400 when S ≥ 400. Therefore, E[dividend + S - 400] = net stop loss premium at 400. E[dividend] + E[S] - 400 = net stop loss premium at 400. E[dividend] = 400 + (net stop loss premium at 400) - E[S] = 400 + 100 - 200 = 300. Comment: Somewhat similar to 3, 5/00, Q.30 and Course 151 Sample Exam #1, Q.16. When the dividend is the excess of y over the aggregate loss, then the expected dividend is: y + net stop loss premium at y - mean aggregate loss. In the following Lee Diagram, applied to the distribution of aggregate losses, Area A is the average dividend, Area C is the net stop loss premium at 400. Area B + C is the average aggregate loss. Therefore, Area B = Average aggregate loss - stop loss premium at 400. Area A + Area B = 400. Therefore, 400 = average dividend + average aggregate loss - stop loss premium at 400.
[Lee Diagram omitted: Area A = average dividend, Area C = stop loss premium at 400, Areas B + C = average aggregate loss, Areas A + B = 400.]
11.36. The average severity is (1)(.4) + (2)(.3) + (3)(.3) = 1.9. The average aggregate losses are (1.9)(2) = 3.8. The only way the aggregate losses can be zero is if there are no claims, which has probability e^-2 = 0.1353. The only way the aggregate losses can be 1 is if there is one claim of size 1, which has probability: (0.4)(2e^-2) = 0.1083. Thus E[A ∧ 2] = (0)(0.1353) + (1)(0.1083) + (2)(1 - 0.1353 - 0.1083) = 1.6212. Thus the net stop-loss premium at 2 is: E[A] - E[A ∧ 2] = 3.8 - 1.62 = 2.18.
11.37. The expected loss ratio excess of 55% is: ∫ from 55% to 60% of (x - 55%)/20% dx = 0.625%.
The corresponding premium is: ($10.0 million)(0.625%) = $62,500.

11.38. We need to calculate the density of the aggregate losses at 0, 1, 2 and 3, then sum them and subtract from unity.
The aggregate losses are 0 if there are no claims; f(0) = e^-1.25.
The aggregate losses are 1 if there is a single claim of size 1; f(1) = (1/3)(1.25)e^-1.25.
The aggregate losses are 2 if either there is a single loss of size 2 or there are two losses each of size 1; f(2) = (1/3)(1.25)e^-1.25 + (1/9)(1.25²/2)e^-1.25.
The aggregate losses are 3 if either there is a single loss of size 3, there are two losses of sizes 1 and 2 or 2 and 1, or there are three losses each of size 1; f(3) = (1/3)(1.25)e^-1.25 + (2/9)(1.25²/2)e^-1.25 + (1/27)(1.25³/6)e^-1.25.
Thus, f(0) + f(1) + f(2) + f(3) = e^-1.25 {1 + 1.25 + (1.25²/6) + (1.25³/162)} = .723.
Thus the chance that the aggregate losses are greater than 3 is: 1 - .723 = 0.277.
Alternately, one can compute the convolutions of the severity distribution and weight them together using the Poisson probabilities of various numbers of claims. For example, (f*f*f)(7) = Σ (f*f)(7-x) f(x) = (1/9)(1/3) + (2/9)(1/3) + (3/9)(1/3) = 6/27 = .2222.
Note that I have shown more than is necessary in order to answer this question. One need only calculate up to f*f*f and only for values up to 3. I have not shown the aggregate distribution for larger values, since that would require the calculation of higher convolutions.
Number of Claims:       0        1        2        3        4
Poisson Probability:    0.2865   0.3581   0.2238   0.0933   0.0291
Dollars of Loss   f*0      f        f*f      f*f*f    f*f*f*f   Aggregate Distribution
0                 1.0000   0.0000   0.0000   0.0000   0         0.2865
1                          0.3333   0.0000   0.0000   0         0.1194
2                          0.3333   0.1111   0.0000   0         0.1442
3                          0.3333   0.2222   0.0370   0         0.1726
4                                   0.3333   0.1111   0.0123    0.0853
5                                   0.2222   0.2222   0.0494    N.A.
Then the chance of aggregate losses of 0, 1, 2 or 3 is: .2865 + .1194 + .1442 + .1726 = .7227. Thus the chance that the aggregate losses are greater than 3 is: 1 - .723 = 0.277.
Alternately, we can use the Panjer Algorithm, since this is a compound Poisson Distribution. The severity distribution is s(1) = s(2) = s(3) = 1/3. The p.g.f. of a Poisson is e^(λ(x-1)). s(0) = severity distribution at zero = 0.
c(0) = Pf(s(0)) = p.g.f. of frequency dist. at (density of severity distribution at zero) = e^(1.25(0-1)) = .2865.
For the Poisson Distribution, a = 0 and b = λ = 1.25.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (1.25/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (1.25/1)(1) s(1) c(0) = (1.25)(1/3)(.2865) = .1194.
c(2) = (1.25/2){(1)(1/3)(.1194) + (2)(1/3)(.2865)} = .1442.
c(3) = (1.25/3){(1)(1/3)(.1442) + (2)(1/3)(.1194) + (3)(1/3)(.2865)} = .1726.
Then the chance of aggregate losses of 0, 1, 2 or 3 is: .2865 + .1194 + .1442 + .1726 = .7227.
Thus the chance that the aggregate losses are > 3 is: 1 - .723 = 0.277.

11.39. In the absence of a deductible, the mean aggregate losses are: (average frequency)(average severity) = (1.25)(2) = 2.5. In the previous solution, we calculated f(0) = .2865 and f(1) = .1194. Therefore, the limited expected value at 1.6 of the aggregate losses is: (0)f(0) + (1)f(1) + 1.6{1 - f(0) - f(1)} = 1.6 - 1.6 f(0) - .6 f(1) = 1.6 - (1.6)(.2865) - (.6)(.1194) = 1.07. Thus the average aggregate losses with the deductible of 1.6 are: E[A] - E[A ∧ 1.6] = 2.5 - 1.07 = 1.43.
11.40. E. The mean severity is (0.3)(10) + (0.3)(20) + (0.4)(50) = 29. The mean frequency is 1.
Therefore, prior to a deductible the mean aggregate losses are: (1)(29) = 29.
The probability of no claims is: e^-1 = .3679. The probability of one claim is: e^-1 = .3679. The probability of two claims is: (e^-1)/2 = .1839.
Therefore, the probability of no aggregate losses is 0.3679.
Aggregate losses of 10 correspond to one power failure costing 10, with probability (0.3)(0.3679) = 0.1104.
Aggregate losses of 20 correspond to either one power failure costing 20, or two power failures each costing 10, with probability: (0.3)(0.3679) + (0.3²)(0.1839) = 0.1269.
Thus the chance of aggregate losses of 30 or more is: 1 - (0.3679 + 0.1104 + 0.1269) = 0.3948.
Therefore, the limited expected value of aggregate losses at 30 is: (0)(0.3679) + (10)(0.1104) + (20)(0.1269) + (30)(0.3948) = 15.49.
Thus the losses excess of 30 are: 29 - 15.49 = 13.5.
Alternately, one could use the Panjer Algorithm (Recursive Method) to get the distribution of aggregate losses. Since the severity distribution has support 10, 20, 50, we let Δx = 10: 10 ⇔ 1, 20 ⇔ 2, 30 ⇔ 3, ...
For the Poisson, a = 0, b = λ = 1, and P(z) = e^(λ(z-1)).
c(0) = Pf(s(0)) = Pf(0) = e^(1(0-1)) = 0.3679.
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x-j) = (1/x) Σ_{j=1}^{x} j s(j) c(x-j).
c(1) = (1/1)(1) s(1) c(0) = (.3)(.3679) = 0.1104.
c(2) = (1/2){(1)s(1)c(1) + (2)s(2)c(0)} = (1/2){(.3)(.1104) + (2)(.3)(.3679)} = 0.1269.
One can also calculate the distribution of aggregate losses using convolutions. For the severity distribution, s*s(20) = 0.09, s*s(30) = 0.18, s*s(40) = 0.09, s*s(60) = 0.24, s*s(70) = 0.24, and s*s(100) = 0.16.
Number of Losses:     0        1        2
Poisson Frequency:    0.3679   0.3679   0.1839
Aggregate Losses   s      s*s     Aggregate Distribution
0                  0              0.3679
10                 0.3            0.1104
20                 0.3    0.09    0.1269
Once one has the distribution of aggregate losses, one can use the recursion formula: E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
Deductible    Survival Function    Stop Loss Premium
0             0.6321               29
10            0.5217               22.679
20            0.3948               17.462
30                                 13.514
Alternately, once one has the aggregate distribution, one can calculate the expected amount not paid by the stop loss insurance as follows:
A: Aggregate Losses    B: Probability    C: Amount Not Paid by Stop Loss Insurance, Ded. of 30    D: Product of Col. B & Col. C
0                      0.3679            0                                                        0
10                     0.1104            10                                                       1.104
20                     0.1269            20                                                       2.538
30 or more             0.3948            30                                                       11.844
Sum                    1                                                                          15.486
Since the mean aggregate loss is 29, the expected amount paid by the stop loss insurance is: 29 - 15.486 = 13.514.
11.41. A. The densities of the Poisson frequency are:
Number of claims:    0        1        2        3        4
Probability:         0.3679   0.3679   0.1839   0.0613   0.0153
Aggregate losses of 30 correspond to either two power failures costing 10 and 20, or three power failures each costing 10, with probability: (2)(0.3²)(0.1839) + (0.3³)(0.0613) = 0.0348.
Aggregate losses of 40 correspond to either two power failures costing 20 each, three failures costing 10, 10 and 20, or four power failures each costing 10, with probability: (0.3²)(0.1839) + (3)(0.3³)(0.0613) + (0.3⁴)(0.0153) = 0.0216.
Thus the chance of aggregate losses of 50 or more is: 1 - (0.3679 + 0.1104 + 0.1269 + 0.0348 + 0.0216) = 0.3384.
Therefore, the limited expected value of aggregate losses at 50 is: (0)(0.3679) + (10)(0.1104) + (20)(0.1269) + (30)(0.0348) + (40)(0.0216) + (50)(0.3384) = 22.47.
Thus the losses excess of 50 are: 29 - 22.47 = 6.5.
Alternately, one could use the Panjer Algorithm to get the distribution of aggregate losses. Continuing from the previous solution:
c(3) = (1/3){(1)s(1)c(2) + (2)s(2)c(1) + (3)s(3)c(0)} = (1/3){(.3)(.1269) + (2)(.3)(.1104) + (3)(0)(.3679)} = 0.0348.
c(4) = (1/4){(1)s(1)c(3) + (2)s(2)c(2) + (3)s(3)c(1) + (4)s(4)c(0)} = (1/4){(.3)(.0348) + (2)(.3)(.1269) + (3)(0)(.1104) + (4)(0)(.3679)} = 0.0216.
One can also calculate the distribution of aggregate losses using convolutions.
Number of Losses:     0        1        2        3        4
Poisson Frequency:    0.3679   0.3679   0.1839   0.0613   0.0153
Aggregate Losses   s*0   s      s*s     s*s*s    s*s*s*s   Aggregate Distrib.
0                  1     0                                 0.3679
10                       0.3                               0.1104
20                       0.3    0.09                       0.1269
30                       0      0.18    0.027              0.0348
40                       0      0.09    0.081    0.0081    0.0216
Once one has the distribution of aggregate losses, one can use the recursion formula: E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
Deductible    Survival Function    Stop Loss Premium
0             0.6321               29.000
10            0.5217               22.679
20            0.3948               17.462
30            0.3600               13.514
40            0.3384               9.914
50                                 6.530
11.42. C. For each person the mean is: (0.4)(0) + (0.3)(1) + (0.2)(2) + (0.1)(3) = 1. Therefore, the overall mean for 3 people is: (3)(1) = 3. Prob(Aggregate loss = 0) = 0.4³ = 0.064. Therefore, the limited expected value at 1 is: (0.064)(0) + (1 - 0.064)(1) = 0.936. Net Stop Loss Premium at 1 is: Mean - Limited Expected Value at 1 = 3 - 0.936 = 2.064.

11.43. D. The probability of zero loss: Prob(n = 1)Prob(x = 0) + Prob(n = 2)Prob(x = 0)² = (.8)(.2) + (.2)(.2)² = 0.168. The probability of an aggregate loss of 100 is: Prob(n = 1)Prob(x = 100) + Prob(n = 2)(2)Prob(x = 0)Prob(x = 100) = (.8)(.7) + (.2)(2)(.2)(.7) = .616. Therefore, the probability that the aggregate losses are 200 or more is: 1 - (.168 + .616) = .216. Therefore, E[A ∧ 200] = (.168)(0) + (.616)(100) + (.216)(200) = 104.8. Mean frequency is: (.8)(1) + (.2)(2) = 1.2. Mean severity is: (.2)(0) + (.7)(100) + (.1)(1000) = 170. Mean aggregate loss is: (1.2)(170) = 204. Stop loss premium is: E[A] - E[A ∧ 200] = 204 - 104.8 = 99.2. With a relative security loading of 175%, the insurance costs: (1 + 1.75)(99.2) = 273.
Alternately, the probability of 2000 in aggregate loss is: Prob(n = 2)Prob(x = 1000)² = (0.2)(0.1²) = 0.002. The probability of 1100 in aggregate loss is: Prob(n = 2)(2)Prob(x = 100)Prob(x = 1000) = (.2)(2)(0.7)(0.1) = 0.028. The probability of 1000 in aggregate loss is: Prob(n = 2)(2)Prob(x = 0)Prob(x = 1000) + Prob(n = 1)Prob(x = 1000) = (.2)(2)(0.2)(0.1) + (0.8)(0.1) = 0.088. These are the only possible aggregate values greater than 200. Therefore, the expected aggregate loss excess of 200 is: (2000 - 200)(.002) + (1100 - 200)(.028) + (1000 - 200)(0.088) = 99.2. With a relative security loading of 175%, the insurance costs: (1 + 1.75)(99.2) = 273.

11.44. B. Prob[0 claims] = e^-2. Prob[1 claim] = 2e^-2. Prob[aggregate = 0] = Prob[0 claims] = e^-2. Prob[aggregate = 1] = Prob[1 claim] Prob[size = 1] = (2e^-2)(1/3) = 2e^-2/3. Limited Expected Value of Aggregate at 2 = (0)e^-2 + (1)(2e^-2/3) + (2){1 - (e^-2 + 2e^-2/3)} = 2 - 8e^-2/3. Mean Severity = (1 + 2 + 3)/3 = 2. Mean Aggregate Loss = (2)(2) = 4. Expected Excess of 2 = 4 - (2 - 8e^-2/3) = 2 + 8e^-2/3 = 2.36.
Alternately, let A = aggregate loss. E[A] = (2)(2) = 4. E[(A-1)+] = E[A] - S(0) = 4 - (1 - e^-2). E[(A-2)+] = E[(A-1)+] - S(1) = 4 - (1 - e^-2) - (1 - e^-2 - 2e^-2/3) = 2 + 8e^-2/3 = 2.36.
11.45. B. From the previous solution, Prob[aggregate = 0] = Prob[0 claims] = e^-2. Prob[aggregate = 1] = Prob[1 claim] Prob[size = 1] = 2e^-2/3. Now the aggregate can be two if there are 2 claims of size 1 or 1 claim of size 2. Prob[aggregate = 2] = (2²e^-2/2)(1/3)² + (2e^-2)(1/3) = 8e^-2/9. Thus the probability of a zero total payment by the insurer is: e^-2 + 2e^-2/3 + 8e^-2/9 = 0.3459. From the previous solution, expected claim payments are 2.36. Thus the expected claim payments for this insurance policy when it is positive is: 2.36 / (1 - 0.3459) = 3.61.

11.46. C. For a geometric with β = 4: f(0) = 1/5 = 0.2, f(1) = 0.8f(0) = 0.16, f(2) = 0.8f(1) = 0.128. E[S] = (4)(40) = 160. E[S ∧ 100] = 0f(0) + 40f(1) + 80f(2) + 100{1 - (f(0) + f(1) + f(2))} = (40)(0.16) + (80)(0.128) + (100){1 - (0.2 + 0.16 + 0.128)} = 67.84. E[(S - 100)+] = E[S] - E[S ∧ 100] = 160 - 67.84 = 92.16.

11.47. A. Prob[Agg = 0] = e^-2 = 0.1353. Prob[Agg = 1] = 2e^-2(0.6) = 0.1624. Prob[Agg = 2] = Prob[1 loss of size 2 or 2 losses of size 1] = 2e^-2(0.4) + (2²e^-2/2)(0.6²) = 0.2057. E[A ∧ 3] = 0.1624 + (2)(0.2057) + (3)(1 - 0.1353 - 0.1624 - 0.2057) = 2.0636. E[A] = (mean frequency)(mean severity) = (2)(1.4) = 2.8. E[(A - 3)+] = E[A] - E[A ∧ 3] = 2.8 - 2.0636 = 0.7364.
Comment: The Exam Committee meant to say “subject to an aggregate deductible of 3.”

11.48. B. From the previous solution, Prob[Agg = 0] = e^-2 = 0.1353, Prob[Agg = 1] = 2e^-2(0.6) = 0.1624, Prob[Agg = 2] = Prob[1 loss of size 2 or 2 losses of size 1] = 2e^-2(0.4) + (2²e^-2/2)(0.6²) = 0.2057. The aggregate can be three if: 3 claims of size 1, or one claim of size one and one claim of size 2. Prob[Agg = 3] = (2³e^-2/6)(0.6³) + (2²e^-2/2){(2)(0.6)(0.4)} = 0.1689. Thus the chance the insurer makes a positive payment is: 1 - (0.1353 + 0.1624 + 0.2057 + 0.1689) = 0.3277. From the previous solution, the expected aggregate payment is 0.7364. Thus the average of the positive aggregate payments is: 0.7364 / 0.3277 = 2.247.
Comment: In the exam question we are determining the average aggregate payment that the insurer makes in a year, including those years in which the aggregate payment is zero. In contrast, in this followup question we restrict our attention to only those years where the insurer makes a positive payment.
11.49. B. For a Geometric Distribution with β = 2: f(0) = 1/3, f(1) = (2/3)f(0) = 2/9, f(2) = (2/3)f(1) = 4/27.
The mean of the distribution of overtime hours is: (5)(.2) + (10)(.3) + (20)(.5) = 14. The mean aggregate is: (2)(14) = 28.
Prob[Agg = 0] = Prob[0 projects] = 1/3.
Prob[Agg = 5] = Prob[1 project]Prob[5 overtime] = (2/9)(.2) = .04444.
Prob[Agg = 10] = Prob[1 project]Prob[10 overtime] + Prob[2 projects]Prob[5 overtime]² = (2/9)(0.3) + (4/27)(0.2)² = 0.07259.
E[Agg ∧ 15] = (0)(1/3) + (5)(0.04444) + (10)(0.07259) + 15(1 - 1/3 - 0.04444 - 0.07259) = 9.19.
Expected overtime in excess of 15 is: Mean[Agg] - E[Agg ∧ 15] = 28 - 9.19 = 18.81.
Alternately, one can use a recursive method, with steps of 5. As above, E[A] = 28. Also get the first few values of the aggregate distribution as above.
E[(A - 5)+] = E[A] - 5SA(0) = 28 - (5)(1 - 1/3) = 24.667.
E[(A - 10)+] = E[(A - 5)+] - 5SA(5) = 24.667 - (5)(1 - 1/3 - 0.04444) = 21.556.
E[(A - 15)+] = E[(A - 10)+] - 5SA(10) = 21.556 - (5)(1 - 1/3 - 0.04444 - 0.07259) = 18.81.

11.50. D. Let A be the aggregate loss. E[A] = (5){(0.6)(5) + 0.4k} = 15 + 2k.
Prob[A = 0] = Prob[0 claims] = e^-5. Prob[A ≥ 5] = Prob[at least 1 claim] = 1 - e^-5.
E[A ∧ 5] = (0)e^-5 + 5(1 - e^-5) = 5 - 5e^-5.
28.03 = E[(A - 5)+] = E[A] - E[A ∧ 5] = 10 + 2k + 5e^-5. ⇒ k = (18.03 - 5e^-5)/2 = 9.
Comment: Given the output, solve for the missing input.
Section 12, Important Formulas and Ideas

Introduction (Section 1)
The Aggregate Loss is the total dollars of loss for an insured or set of insureds.
Aggregate Losses = (Exposures)(Frequency)(Severity).
If one is not given the frequency per exposure, but is rather just given the frequency for the whole number of exposures, whatever they are for the particular situation, then Aggregate Losses = (Frequency)(Severity).
Loss Modelsʼ list of advantages of separately analyzing frequency and severity:
1. The number of claims changes as the volume of business changes.
2. The effects of inflation can be incorporated.
3. One can adjust the severity distribution for changes in deductibles, maximum covered loss, etc.
4. One can adjust frequency for changes in deductibles.
5. One can appropriately combine data from policies with different deductibles and maximum covered losses into a single severity distribution.
6. One can create consistent models for the insurer, insured, and reinsurer.
7. One can analyze the tail of the aggregate losses by separately analyzing the tails of the frequency and severity.
Loss Models recommends for modeling aggregate losses, infinitely divisible frequency distributions and severity distributions that are members of scale families.

Convolutions (Section 2)
Convolution calculates the density or distribution function of the sum of two independent variables. There are discrete and continuous cases.
(f*g)(z) = Σ_x f(x) g(z - x) = Σ_y f(z - y) g(y).
(F*G)(z) = Σ_x F(x) g(z - x) = Σ_y f(z - y) G(y) = Σ_x f(x) G(z - x) = Σ_y F(z - y) g(y).
(f*g)(z) = ∫ f(x) g(z-x) dx = ∫ f(z-y) g(y) dy.
(F*G)(z) = ∫ f(x) G(z-x) dx = ∫ F(z-y) g(y) dy = ∫ F(x) g(z-x) dx = ∫ f(z-y) G(y) dy.
The convolution operator is commutative and associative: f*g = g*f. (f*g)*h = f*(g*h).

Using Convolutions (Section 3)
If frequency is N, severity is X, frequency and severity are independent, and aggregate losses are A, then:
FA(x) = Σ_{n=0}^{∞} fN(n) FX*n(x).
fA(x) = Σ_{n=0}^{∞} fN(n) fX*n(x).
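For those who like to check such calculations on a computer, here is a minimal Python sketch of this formula; the function names and the example inputs (λ = 1.25 with claim sizes 1, 2, 3 equally likely, as in the Course 3 Sample Exam questions of Section 11) are only for illustration.

from math import exp, factorial

def convolve(f, g):
    # discrete convolution of two densities given as lists indexed by 0, 1, 2, ...
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

def aggregate_density(lam, severity, max_loss, n_max=20):
    # fA(x) = sum over n of fN(n) fX*n(x), truncating the Poisson frequency at n_max claims
    agg = [0.0] * (max_loss + 1)
    conv = [1.0]                                   # zero-fold convolution: a point mass at 0
    for n in range(n_max + 1):
        p_n = exp(-lam) * lam**n / factorial(n)    # Poisson density fN(n)
        for x in range(min(max_loss, len(conv) - 1) + 1):
            agg[x] += p_n * conv[x]
        conv = convolve(conv, severity)            # next convolution of the severity
    return agg

severity = [0.0, 1/3, 1/3, 1/3]                    # sizes 0, 1, 2, 3
agg = aggregate_density(1.25, severity, max_loss=3)
print([round(p, 4) for p in agg])                  # about [0.2865, 0.1194, 0.1443, 0.1726]
print("P[A > 3] =", round(1 - sum(agg), 3))        # about 0.277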
Generating Functions (Section 4)
Probability Generating Function: PX(t) = E[t^X] = MX(ln(t)).
Moment Generating Function: MX(t) = E[e^(tX)] = PX(e^t).
The Moment Generating Functions of severity distributions, when they exist, are given in Appendix A of Loss Models. The Probability Generating Functions of frequency distributions are given in Appendix B of Loss Models. M(t) = P(e^t).
For an Exponential, M(t) = 1/(1 - θt), t < 1/θ. For a Poisson, P(z) = exp[λ(z - 1)], and M(t) = exp[λ(e^t - 1)].
The moment generating function of the sum of two independent variables is the product of their moment generating functions: MX+Y(t) = MX(t) MY(t).
The Moment Generating Function converts convolution into multiplication: Mf*g = Mf Mg.
The sum of n independent identically distributed variables has the Moment Generating Function taken to the power n. The m.g.f. of f*n is the nth power of the m.g.f. of f.
MX+b(t) = e^(bt) MX(t). McX(t) = E[e^(cXt)] = MX(ct). McX+b(t) = e^(bt) MX(ct).
McX+dY+b(t) = e^(bt) MX(ct) MY(dt), for X and Y independent.
The Moment Generating Function of the average of n independent, identically distributed variables is the nth power of the Moment Generating Function of t/n.
The moment generating function determines the distribution, and vice-versa. Therefore, one can take limits of a distribution by instead taking limits of the Moment Generating Function.
M(0) = 1. M′(0) = E[X]. M′′(0) = E[X²]. M′′′(0) = E[X³]. M^(n)(0) = E[X^n].
MX(t) = Σ_{n=0}^{∞} (nth moment of X) t^n / n!
Moment Generating Functions only exist for distributions all of whose moments exist. However the converse is not true. For the LogNormal Distribution the Moment Generating Function fails to exist, even though all of its moments exist.
The second derivative of ln[MX(t)] at t = 0 is Var[X].
The third derivative of ln[MX(t)] at t = 0 is the 3rd central moment of X.
Let A be Aggregate Losses, X be severity and N be frequency; then the probability generating function of the Aggregate Losses can be written in terms of the p.g.f. of the frequency and p.g.f. of the severity: PA(t) = PN(PX(t)).
The Moment Generating Function of the Aggregate Losses can be written in terms of the p.g.f. of the frequency and m.g.f. of the severity: MA(t) = PN(MX(t)) = MN(ln(MX(t))).
For any Compound Poisson distribution, MA(t) = exp(λ(MX(t) - 1)).
The Moment Generating Function of a mixture is a mixture of the Moment Generating Functions.

Moments of Aggregate Losses (Section 5)
Mean Aggregate Loss = (Mean Frequency)(Mean Severity).
When frequency and severity are independent:
Process Variance of Aggregate Loss = (Mean Freq.)(Variance of Severity) + (Mean Severity)²(Variance of Freq.)
σA² = µF σX² + µX² σF².
The variance of a Compound Poisson is: λ(2nd moment of severity).
The mathematics of Aggregate Distributions and Compound Frequency Distributions are the same:
Aggregate Dist.        Compound Frequency Dist.
Frequency      ⇔      Primary (# of cabs)
Severity       ⇔      Secondary (# of passengers)
One can approximate the distribution of aggregate losses using the Normal Approximation. One could also approximate aggregate losses via a LogNormal Distribution by matching the first two moments.
The Third Central Moment of a Compound Poisson Distribution is: (mean frequency)(third moment of the severity).
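For example, one can verify these moment formulas numerically; the following short Python sketch (the inputs are just illustrative values) computes the mean and process variance of aggregate losses and then a percentile via the Normal Approximation.

from math import sqrt
from statistics import NormalDist

mean_freq, var_freq = 30.0, 29.1            # illustrative frequency moments
mean_sev, var_sev = 105000.0, 2725e6        # illustrative severity moments

mean_agg = mean_freq * mean_sev
var_agg = mean_freq * var_sev + mean_sev**2 * var_freq   # process variance of aggregate loss

p95 = mean_agg + NormalDist().inv_cdf(0.95) * sqrt(var_agg)   # approximate 95th percentile
print(round(mean_agg), round(var_agg), round(p95))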
Recursive Method / Panjer Algorithm (Sections 7 and 8)
The Panjer Algorithm (recursive method) can be used to compute the aggregate distribution when the severity distribution is discrete and the frequency distribution is a member of the (a, b, 0) class.
If the frequency distribution is a member of the (a, b, 0) class:
c(0) = Pf(s(0)).
c(x) = {1/(1 - a s(0))} Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j).
In situations in which there is a positive chance of a zero severity, it may be helpful to thin the frequency distribution and work with the distribution of nonzero losses.
In the same manner, the Panjer Algorithm (recursive method) can be used to compute a compound frequency distribution when the primary distribution is a member of the (a, b, 0) class.
If the frequency distribution, pk, is a member of the (a, b, 1) class:
c(0) = Pf(s(0)).
c(x) = [s(x){p1 - (a + b)p0} + Σ_{j=1}^{x} (a + jb/x) s(j) c(x - j)] / (1 - a s(0)).
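Here is a minimal Python sketch of the (a, b, 0) recursion for a Poisson frequency, for which a = 0, b = λ, and Pf(z) = exp[λ(z - 1)]; the function name and the example severity are only illustrative.

from math import exp

def panjer_poisson(lam, s, x_max):
    # aggregate densities c(0), ..., c(x_max) for a compound Poisson; s is the severity density on 0, 1, 2, ...
    a, b = 0.0, lam
    c = [0.0] * (x_max + 1)
    c[0] = exp(lam * (s[0] - 1.0))          # c(0) = Pf(s(0))
    for x in range(1, x_max + 1):
        total = 0.0
        for j in range(1, min(x, len(s) - 1) + 1):
            total += (a + j * b / x) * s[j] * c[x - j]
        c[x] = total / (1.0 - a * s[0])
    return c

s = [0.0, 1/3, 1/3, 1/3]                    # claim sizes 1, 2, 3 equally likely, as in Section 11
print([round(v, 4) for v in panjer_poisson(1.25, s, 3)])   # about [0.2865, 0.1194, 0.1443, 0.1726]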
Discretization (Section 9) For the method of rounding with span h, construct the discrete distribution g: g(0) = F(h/2). g(ih) = F(h(i + 1/2)) - F(h(i - 1/2)). For the method of rounding, the original and approximating Distribution Function match at all of the points halfway between the support of the discretized distribution. In order to instead have the means match, the approximating densities are: g(0) = 1 - E[X ∧ h]/h. g(ih) = {2E[X ∧ ih] - E[X ∧ (i-1)h] - E[X ∧ (i+1)h]} / h.
Analytic Results (Section 10)
A distribution is closed under convolution if, when one adds independent identically distributed copies, one gets a member of the same family.
Closed under convolution: Gamma, Inverse Gaussian, Normal, Binomial, Poisson, and Negative Binomial.
Stop Loss Premiums (Section 11)
The stop loss premium is the expected aggregate losses excess of an aggregate deductible.
The stop loss premium at zero is the mean; the stop loss premium at infinity is zero.
expected losses excess of d = E[(A - d)+] = ∫_d^∞ (t - d) f(t) dt = ∫_d^∞ S(t) dt.
expected losses excess of d = E[(A - d)+] = Σ_{a > d} (a - d) f(a).
expected aggregate losses excess of d = E[A] - E[A ∧ d].
When there is no probability for the aggregate losses in an interval, the stop loss premium for deductibles in this interval can be gotten by linear interpolation.
If the distribution of aggregate losses is discrete with span ΔA: E[(A - (j+1)ΔA)+] = E[(A - jΔA)+] - ΔA S(jΔA).
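To make the recursion concrete, here is a short Python sketch; the aggregate distribution used in the example is made up purely for illustration.

def stop_loss_premiums(agg_probs, span):
    # agg_probs[j] = P[A = j*span], summing to 1; returns E[(A - j*span)+] for j = 0, 1, ..., len(agg_probs)
    mean = span * sum(j * p for j, p in enumerate(agg_probs))
    premiums = [mean]                       # the stop loss premium at a deductible of zero is the mean
    survival = 1.0
    for p in agg_probs:
        survival -= p                       # S(j*span), the probability A exceeds the current deductible
        premiums.append(premiums[-1] - span * survival)
    return premiums

# hypothetical aggregate distribution on 0, 10, 20, 30:
print(stop_loss_premiums([0.4, 0.3, 0.2, 0.1], 10))   # approximately [10, 4, 1, 0, 0]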
Mahlerʼs Guide to Risk Measures
Joint Exam 4/C
prepared by Howard C. Mahler, FCAS
Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-4
Howard Mahler
[email protected]
www.howardmahler.com/Teaching
Mahlerʼs Guide to Risk Measures Copyright 2013 by Howard C. Mahler. The Risk Measure concepts in Loss Models are discussed.1 2 Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (and sections whose titles are in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Solutions to the problems in each section are at the end of that section.3
Section #    Pages    Section Name
1            2-3      Introduction
2            4-13     Premium Principles
3            14-24    Value at Risk
4            25-44    Tail Value at Risk
5            45-64    Distortion Risk Measures
6            65-72    Coherence
7            73-82    Using Simulation
8            83-84    Important Ideas and Formulas

Exam 4/C Exam Questions by Section of this Study Aid4
Question 27 of the Spring 2007 exam, in my Section 5, was on the Proportional Hazard Transform, no longer on the syllabus. The 11/07 and subsequent exams were not released.
1 See Section 3.5 and Section 21.2.5 of Loss Models.
2 Prior to 11/09 this material was from “An Introduction to Risk Measures in Actuarial Applications” by Mary Hardy.
3 Note that problems include both some written by me and some from past exams. Since this material was added to the syllabus for 2007, there are few past exam questions. Past exam questions are copyright by the Casualty Actuarial Society and the Society of Actuaries and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In some cases Iʼve rewritten past exam questions in order to match the notation in the current Syllabus.
4 This topic was added to the syllabus in 2007.
Section 1, Introduction

Assume that aggregate annual losses (in millions of dollars) follow a LogNormal Distribution with µ = 5 and σ = 1/2, with mean = exp[5 + (1/2)²/2] = 168.174, second moment = exp[(2)(5) + (2)(1/2)²] = 36,315.5, and variance = 36,315.5 - 168.174² = 8033:
[Graph: probability density of these aggregate annual losses, for x from 0 to 600.]
Assume instead that aggregate annual losses (in millions of dollars) follow a LogNormal Distribution with µ = 4.625 and σ = 1, with mean = exp[4.625 + 1²/2] = 168.174, second moment = exp[(2)(4.625) + (2)(1²)] = 76,879.9, and variance = 76,879.9 - 168.174² = 48,597:
[Graph: probability density of these aggregate annual losses, for x from 0 to 600.]
While the two portfolios have the same mean loss, the second portfolio has a much bigger variance. The second portfolio has a larger probability of an extremely bad year. Therefore, we would consider the second portfolio “riskier” to insure than the first portfolio. We will discuss various means to quantify the amount of risk, so-called risk measures.
There are three main uses of risk measures in insurance:5
1. Helping to determine the premium to charge.
2. Determining the appropriate amount of policyholder surplus (capital).
3. Helping to determine an appropriate amount for loss reserves.
We would expect that all other things being equal, an insurer would charge more to insure the riskier second portfolio, than the less risky first portfolio. We would expect that all other things being equal, an insurer should have more policyholder surplus if insuring the riskier second portfolio, than the less risky first portfolio.6

Definition of a Risk Measure:
A risk measure is defined as a functional mapping of an aggregate loss distribution to the real numbers. ρ(X) is the notation used for the risk measure. Given a specific choice of risk measure, a number is associated with each loss distribution (distribution of aggregate losses), which encapsulates the risk associated with that loss distribution.

Exercise: Let the risk measure be: the mean + two standard deviations.7 In other words, ρ(X) = E[X] + 2 StdDev[X]. Determine the risk of the two portfolios discussed previously.
[Solution: For the first portfolio: 168.2 + 2√8025 = 347.4. For the second portfolio: 168.2 + 2√48,589 = 609.1.
Comment: As expected, using this measure, the second portfolio has a larger risk than the first.]
5 These same ideas can be applied with appropriate modification to banking.
6 What is an appropriate amount of surplus might be determined by an insurance regulator or by the market effects of the possible ratings given to the insurer by a rating agency.
7 This is an example of the standard deviation premium principle, to be discussed in the next section.
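For those who want to reproduce this exercise numerically, here is a short Python sketch of the calculation; only the rounding of the intermediate variances differs slightly from the values quoted above.

from math import exp, sqrt

def lognormal_mean_var(mu, sigma):
    mean = exp(mu + sigma**2 / 2)
    second_moment = exp(2 * mu + 2 * sigma**2)
    return mean, second_moment - mean**2

for mu, sigma in [(5.0, 0.5), (4.625, 1.0)]:
    mean, var = lognormal_mean_var(mu, sigma)
    rho = mean + 2 * sqrt(var)              # the risk measure: mean plus two standard deviations
    print(round(mean, 1), round(var), round(rho, 1))
# first portfolio: about 168.2, 8033, 347.4; second portfolio: about 168.2, 48597, 609.1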
Section 2, Premium Principles

Three simple premium principles will be discussed:
1. The Expected Value Premium Principle
2. The Standard Deviation Premium Principle
3. The Variance Premium Principle
Each premium principle generates a premium which is bigger than the expected loss. The difference between the premium and the mean loss is the premium loading, which acts as a cushion against adverse experience. For a given loss distribution, different choices of risk measure result in different premiums.
As elsewhere on the syllabus of this exam, we ignore expenses, investment income, etc., unless specifically stated otherwise.

The Expected Value Premium Principle:
For example, let the premium be 110% of the expected losses.8
More generally, ρ(X) = (1 + k)E[X], k > 0. In the above example, k = 10%.

The Standard Deviation Premium Principle:9
For example, let the premium be the expected losses plus 1.645 times the standard deviation.
More generally, ρ(X) = E[X] + k√Var[X], k > 0.10 In the above example, k = 1.645.
Using the Normal Approximation, since Φ[1.645] = 95%, E[X] + 1.645√Var[X] is approximately the 95th percentile of the aggregate distribution.11 Thus we would expect that the aggregate loss would exceed the premium approximately 5% of the time.
8 On the exam, you would be given the 110%; you would not be responsible for selecting it.
9 See Example 3.12 in Loss Models.
10 While I have used the same letter k in the different risk measures, k does not have the same meaning.
11 The Normal Approximation is one common way to approximate an aggregate distribution, but not the only method. See “Mahlerʼs Guide to Aggregate Distributions.”
The Variance Premium Principle:
σ² = E[(X - E[X])²]. For example, let the premium be the expected losses plus 20% times the variance.12
More generally, ρ(X) = E[X] + k Var[X], k > 0. In the above example, k = 0.2.
Further Reading: There have been many discussions of the use of possible different methods of calculating risk loads along the lines of these simple premium principles.13 Other risk measures have been developed more recently and have certain nice mathematical properties.14
12 On the exam, you would be given the 20%; you would not be responsible for selecting it.
13 See for example, “Reinsurer Risk Loads from Marginal Surplus Requirements” by Rodney Kreps, PCAS 1990 and discussion by Daniel F. Gogol, PCAS 1992; “Risk Loads for Insurers,” by Sholom Feldblum, PCAS 1990, discussion by Steve Philbrick PCAS 1991, Authorʼs reply PCAS 1993, discussion by Todd R. Bault, PCAS 1995; “The Competitive Market Equilibrium Risk Load Formula,” by Glenn G. Meyers, PCAS 1991, discussion by Ira Robbin PCAS 1992, Authorʼs reply PCAS 1993; “Balancing Transaction Costs and Risk Load in Risk Sharing Arrangements,” by Clive L. Keatinge, PCAS 1995; “Pricing to Optimize an Insurerʼs Risk-Return Relationship,” Daniel F. Gogol,
14 In the CAS literature, there have continued to be many papers on this subject. See for example, “An Application of a Game Theory: Property Catastrophe Risk Load,” PCAS 1998, “Capital Consumption: An Alternative Methodology for Pricing Reinsurance”, by Donald Mango, Winter 2003 CAS Forum, “Implementation of PH-Transforms in Ratemaking” by Gary Venter, PCAS 1998, and The Dynamic Financial Analysis Call Papers in the Spring 2001 CAS Forum.
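As a quick numerical illustration of the three premium principles, here is a small Python sketch; the function name and the loadings shown are only example values (they happen to match problem 2.3 below).

from math import sqrt

def premiums(mean, variance, k_expected, k_stddev, k_variance):
    # expected value, standard deviation, and variance premium principles
    return {
        "expected value": (1 + k_expected) * mean,
        "standard deviation": mean + k_stddev * sqrt(variance),
        "variance": mean + k_variance * variance,
    }

# compound Poisson with lambda = 400, severity mean 2 and variance 3:
# aggregate mean = 800, aggregate variance = 400(3 + 2^2) = 2800.
print(premiums(800, 2800, k_expected=0.15, k_stddev=2.0, k_variance=0.05))
# about {'expected value': 920, 'standard deviation': 905.8, 'variance': 940}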
Problems:

2.1 (2 points) Suppose S is a compound Poisson distribution of aggregate claims with a mean number of claims = 500 and with individual claim amounts distributed as an Exponential with mean 1000. The insurer wishes to collect a premium equal to the mean plus one standard deviation of the aggregate claims distribution. Calculate the required premium. Ignore expenses.
(A) 500,000   (B) 510,000   (C) 520,000   (D) 530,000   (E) 540,000

2.2 (2 points) For an insured portfolio, you are given:
(i) the number of claims has a Geometric distribution with β = 12.
(ii) individual claim amounts can take values of 1, 5, or 25 with equal probability.
(iii) the number of claims and claim amounts are independent.
(iv) the premium charged equals expected aggregate claims plus 2% of the variance of aggregate claims.
Determine the premium charged.
(A) 200   (B) 300   (C) 400   (D) 500   (E) 600

2.3 (2 points) For aggregate claims S = X1 + X2 + ... + XN:
(i) N has a Poisson distribution with mean 400.
(ii) X1, X2, . . . have mean 2 and variance 3.
(iii) N, X1, X2, . . . are mutually independent.
Three actuaries each propose premiums based on different premium principles.
Wallace proposes using the expected value premium principle with k = 15%.
Yasmin proposes using the standard deviation premium principle with k = 200%.
Zachary proposes using the variance premium principle with k = 5%.
Rank the three proposed premiums from smallest to largest.
(A) Wallace, Yasmin, Zachary
(B) Wallace, Zachary, Yasmin
(C) Yasmin, Zachary, Wallace
(D) Zachary, Yasmin, Wallace
(E) None of A, B, C, or D
Use the following information for the next two questions:
• An insurer has a portfolio of 1000 insured properties as shown below.
Property Value    Number of Properties
$50,000           300
$100,000          500
$200,000          200
• The annual probability of a claim for each of the insured properties is .03.
• Each property is independent of the others.
• Assume only total losses are possible.
2.4 (2 points) Insurance premiums are set at the mean loss plus one standard deviation. Determine the premium.
(A) Less than 3.5 million
(B) At least 3.5 million, but less than 3.6 million
(C) At least 3.6 million, but less than 3.7 million
(D) At least 3.7 million, but less than 3.8 million
(E) At least 3.8 million

2.5 (2 points) The insurer buys reinsurance with a retention of $75,000 on each property. (For example, in the case of a loss of $200,000, the insurer would pay $75,000, while the reinsurer would pay $125,000.) The annual reinsurance premium is set at 110% of the expected annual excess claims. Insurance premiums are set at the reinsurance premiums plus mean annual retained loss plus one standard deviation of the annual retained loss. Determine the premium.
(A) Less than 3.5 million
(B) At least 3.5 million, but less than 3.6 million
(C) At least 3.6 million, but less than 3.7 million
(D) At least 3.7 million, but less than 3.8 million
(E) At least 3.8 million

2.6 (2 points) Annual aggregate losses have the following distribution:
Annual Aggregate Losses    Probability
0                          50%
10                         30%
20                         10%
50                         5%
100                        5%
Determine the premium using the variance premium principle with k = 1%.
A. 16   B. 18   C. 20   D. 22   E. 24
2.7 (Course 151 Sample Exam #1, Q.20) (2.5 points) For aggregate claims S = X1 + X2 + ... + XN:
(i) N has a Poisson distribution with mean 0.5
(ii) X1, X2, . . . have mean 100 and variance 100
(iii) N, X1, X2, . . . are mutually independent.
For a portfolio of insurance policies, the loss ratio during a premium period is the ratio of aggregate claims to aggregate premiums collected during the period. The relative security loading, (premiums / expected losses) - 1, is 0.1. Using the normal approximation to the compound Poisson distribution, calculate the probability that the loss ratio exceeds 0.75 during a particular period.
(A) 0.43   (B) 0.45   (C) 0.50   (D) 0.55   (E) 0.57

2.8 (Course 151 Sample Exam #2, Q.12) (1.7 points) An insurer provides life insurance for the following group of independent lives:
Number of Lives    Death Benefit    Probability of Death
100                1                0.01
200                2                0.02
300                3                0.03
The insurer purchases reinsurance with a retention of 2 on each life. The reinsurer charges a premium H equal to its expected claims plus the standard deviation of its claims. The insurer charges a premium G equal to expected retained claims plus the standard deviation of retained claims plus H. Determine G.
(A) 44   (B) 46   (C) 70   (D) 94   (E) 96

2.9 (Course 151 Sample Exam #3, Q.3) (0.8 points) A company buys insurance to cover medical claims in excess of 50 for each of its three employees. You are given:
(i) claims per employee are independent with the following distribution:
x      p(x)
0      0.4
50     0.4
100    0.2
(ii) the insurer's relative security loading, (premiums / expected losses) - 1, is 50%.
Determine the premium for this insurance.
(A) 30   (B) 35   (C) 40   (D) 45   (E) 50
2.10 (5A, 11/94, Q.34) (2 points) You are the actuary for Abnormal Insurance Company. You are assigned the task of setting the initial surplus such that the probability of losses less premiums collected exceeding this surplus at the end of the year is 2%. Company premiums were set equal to 120% of expected losses. Assume that the aggregate losses are distributed according to the information below:
Prob(Aggregate Losses < L) = 1 - [10,000,000/(L + 10,000,000)]².
What is the lowest value of the initial surplus that will satisfy the requirements described above?

2.11 (5A, 5/95, Q.35) (1 point) Suppose S is a compound Poisson distribution of aggregate claims with a mean number of claims = 2 and with individual claim amounts distributed as exponential with E(X) = 5 and VAR(X) = 25. The insurer wishes to collect a premium equal to the mean plus one standard deviation of the aggregate claims distribution. Calculate the required premium. Ignore expenses.

2.12 (IOA, 9/09, Q. 3) (9 points) A small bank wishes to improve the performance of its investments by investing £1m in high returning assets. An investment bank has offered the bank two possible investments:
Investment A: A diversified portfolio of shares and derivatives which can be assumed to produce a return of £R1 million where R1 = 0.1 + N, where N is a normal N(1,1) random variable.
Investment B: An over-the-counter derivative which will produce a return of £R2 million where the investment bank estimates: R2 = 1.5 with probability 0.99, and R2 = -5.0 with probability 0.01.
The chief executive of the bank says that if one investment has a better expected return and a lower variance than the other then it is the best choice.
(i) (4.5 points) (a) Calculate the expected return and variance of each investment A and B.
(b) Discuss the chief executiveʼs comments in the light of your calculations.
(ii) (1.5 points) Calculate the following risk measures for each of the two investments A and B:
(a) probability of the returns falling below 0.
(b) probability of the returns falling below -2.
(iii) (3 points) (a) Define other suitable risk measures that could be calculated.
(b) Discuss what these risk measures would show.
Solutions to Problems:

2.1. D. E[S] = λθ = (500)(1000) = 500,000. Var[S] = λ(2θ²) = (500)(2)(1000²) = 1,000,000,000.
E[S] + √Var[S] = 500,000 + √1,000,000,000 = 531,622.
2.2. D. E[N] = 12. Var[N] = (12)(12 + 1) = 156. E[X] = (1 + 5 + 25)/3 = 10.333. E[X²] = (1² + 5² + 25²)/3 = 217. Var[X] = 217 - 10.333² = 110.2. The aggregate has mean: (12)(10.333) = 124. The aggregate has variance: (12)(110.2) + (10.333²)(156) = 17,979. E[S] + (0.02)Var[S] = 124 + (2%)(17,979) = 484.

2.3. E. Mean of aggregate is: (400)(2) = 800. Variance of aggregate is: λ(second moment of severity) = (400)(3 + 2²) = 2800. Wallaceʼs proposed premium is: (1.15)(800) = 920. Yasminʼs proposed premium is: 800 + 2√2800 = 905.8. Zacharyʼs proposed premium is: 800 + (.05)(2800) = 940. From smallest to largest: Yasmin, Wallace, Zachary.

2.4. D. Frequency is Binomial with m = 1000 and q = .03. Mean frequency is: (1000)(.03) = 30. Variance of Frequency is: (1000)(.03)(.97) = 29.1. Mean severity is: (30%)(50000) + (50%)(100000) + (20%)(200000) = 105,000. Variance of severity is: (30%)(50000 - 105000)² + (50%)(100000 - 105000)² + (20%)(200000 - 105000)² = 2725 million. Mean aggregate is: (30)(105,000) = 3.15 million. Variance of aggregate is: (30)(2725 million) + (105,000)²(29.1) = 402,577.5 million. Premium is: 3.15 million + √(402,577.5 million) = 3.15 million + 0.63 million = 3.78 million.
2.5. C. For a $50,000 loss, all $50,000 is retained. For a loss of either $100,000 or $200,000, $75,000 is retained. The mean retained severity is: (30%)(50000) + (70%)(75000) = 67,500. The mean aggregate retained is: (30)(67,500) = 2.025 million. Therefore the mean aggregate excess is: 3.15 million - 2.025 million = 1.125 million. The reinsurance premium is: (110%)(1.125 million) = 1.238 million. Variance of retained severity is: (30%)(50000 - 67,500)² + (70%)(75000 - 67,500)² = 131.25 million. Variance of aggregate retained is: (30)(131.25 million) + (67,500)²(29.1) = 136,524.4 million. Premium is: 1.238 million + 2.025 million + √(136,524.4 million) = 3.63 million.
Comment: Purchasing reinsurance has reduced the risk of the insurer.

2.6. B. E[X] = (0)(50%) + (30%)(10) + (10%)(20) + (5%)(50) + (5%)(100) = 12.5.
σ² = (0 - 12.5)²(50%) + (30%)(10 - 12.5)² + (10%)(20 - 12.5)² + (5%)(50 - 12.5)² + (5%)(100 - 12.5)² = 538.75.
E[X] + (1%)σ² = 12.5 + (.01)(538.75) = 17.9.

2.7. D. The mean aggregate loss is: (100)(.5) = 50. The premiums are: (1.1)(50) = 55. Since frequency is Poisson, the variance of the aggregate loss is: (mean frequency)(second moment of the severity) = (.5)(100 + 100²) = 5050. The loss ratio is 75% if the loss is: (55)(.75) = 41.25. Thus the loss ratio exceeds 75% if the loss exceeds 41.25. Thus using the Normal approximation, the probability that the loss ratio exceeds 75% is: 1 - Φ((41.25 - 50)/√5050) = 1 - Φ(-.12) = Φ(.12) = 0.5478.

2.8. B. For the insurer, the mean payment is: (100)(.01)(1) + (200)(.02)(2) + (300)(.03)(2) = 1 + 8 + 18 = 27. For the insurer, the variance of payments is: (100)(.01)(.99)(1²) + (200)(.02)(.98)(2²) + (300)(.03)(.97)(2²) = 51.59. For the reinsurer, the mean payment is: (100)(.01)(0) + (200)(.02)(0) + (300)(.03)(1) = 9. For the reinsurer, the variance of payments is: (100)(.01)(.99)(0²) + (200)(.02)(.98)(0²) + (300)(.03)(.97)(1²) = 8.73.
Reinsurerʼs premium = 9 + √8.73 = 11.955.
Insurerʼs premium = 27 + √51.59 + 11.955 = 46.14.
2.9. D. The expected payment per employee is: (0)(.4) + (0)(.4) + (100 - 50)(.2) = 10. The expected aggregate payments are: (3)(10) = 30. The premiums = (1.5)(30) = 45.
2.10. The distribution of L is a Pareto Distribution with α = 2 and θ = 10 million. Therefore, E[L] = θ/(α−1) = $10 million. Premiums are (1.2)E[L] = (1.2)($10 million) = $12 million. The 98th percentile of the distribution of aggregate losses is such that 0.02 = [10,000,000/(L + 10,000,000)]². Therefore the 98th percentile of L is 60.71 million. Therefore, we require that: 60.71 million = initial surplus + $12 million. ⇒ initial surplus = $48.71 million. Comment: Use the 98th percentile of the given Pareto Distribution, rather than the Normal Approximation to the Pareto Distribution.
2.11. The mean of the aggregate losses = (2)(5) = 10. The variance of aggregate losses = (2)(25) + (2)(5²) = 100. Mean + Stddev = 10 + 10 = 20.
2.12. (i) Investment A: Expected return = E[0.1 + N] = 0.1 + 1 = 1.1. Variance = 1. Investment B: Expected return = (1.5)(0.99) + (-5.0)(0.01) = 1.435. Variance = (0.99)(1.435 - 1.5)² + (0.01){1.435 - (-5)}² = 0.418275. Investment B has both a higher expected return and a lower variance, so it would be preferred on this basis. However, there is an issue with the possibility of very bad returns on Investment B. Also, the estimated probabilities for Investment B may be somewhat unreliable, as they are probably derived from the heavy righthand tail of a distribution. Thus it might be wise to take this calculation with a grain of salt.
(ii) a. Investment A: the probability of the return falling below 0 is the probability of N(1, 1) being below -0.1: Φ[(-0.1 - 1)/1] = Φ[-1.1] = 0.1357. Investment B: the probability of the return falling below 0 is 0.01.
b. Investment A: the probability of the return falling below -2 is the probability of N(1, 1) being below -2.1: Φ[(-2.1 - 1)/1] = Φ[-3.1] = 0.0010. Investment B: the probability of the return falling below -2 is 0.01.
(iii) One could instead use the Value at Risk or Tail Value at Risk. For example, 95%-VaR is the 95th percentile of the distribution of losses, which in this case would be the 5th percentile of the returns. For Investment A, 95%-VaR is: 0.1 + (1 - 1.645) = -0.545. For Investment B, 95%-VaR is: 1.5. For Investment A, 99%-VaR is: 0.1 + (1 - 2.326) = -1.226. For Investment B, 99%-VaR is: -5.0. The 95%-TVaR is the average of those losses greater than or equal to 95%-VaR, or in this case the average of the returns less than or equal to 95%-VaR. For the Normal Distribution, TVaRp[X] = µ + σ φ[zp] / (1 - p). Thus for Investment A, 95%-TVaR is: 1.1 - (1) {exp[-1.645²/2]/√(2π)} / 0.05 = -0.9622. For Investment B, 95%-TVaR (the average of the worst 5% of returns) is: {(4%)(1.5) + (1%)(-5.0)}/5% = 0.2. For Investment A, 99%-TVaR is: 1.1 - (1) {exp[-2.326²/2]/√(2π)} / 0.01 = -1.5674.
For Investment B, 99%-TVar is: -5.0. Comment: There is no one “correct” measure of risk. Different measures of risk give different orderings in this case.
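Here is a minimal Python sketch (mine, not part of the exam solution) reproducing parts (i) and (ii); the variable names are my own choices.

```python
from statistics import NormalDist

# Investment A: R1 = 0.1 + N, with N ~ Normal(1, 1), so R1 ~ Normal(1.1, 1)
A = NormalDist(mu=1.1, sigma=1.0)
# Investment B: 1.5 with probability 0.99, -5.0 with probability 0.01
b_vals, b_probs = [1.5, -5.0], [0.99, 0.01]

mean_B = sum(v * p for v, p in zip(b_vals, b_probs))
var_B = sum(p * (v - mean_B) ** 2 for v, p in zip(b_vals, b_probs))
print(A.mean, A.variance)                    # 1.1, 1.0
print(round(mean_B, 3), round(var_B, 6))     # 1.435, 0.418275

# (ii) probabilities of the return falling below 0 and below -2
print(round(A.cdf(0.0), 4), round(A.cdf(-2.0), 4))   # 0.1357, 0.0010
print(sum(p for v, p in zip(b_vals, b_probs) if v < 0),
      sum(p for v, p in zip(b_vals, b_probs) if v < -2))   # 0.01, 0.01
```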
Section 3, Value at Risk15 16 In this section, another risk measure will be discussed: Value at Risk ⇔ VaR ⇔ Quantile Risk Measure ⇔ Quantile Premium Principle. Percentiles: Exercise: Assume that aggregate annual losses (in millions of dollars) follow a LogNormal Distribution with µ = 5 and σ = 1/2. Determine the 95th percentile of this distribution. [Solution: 0.95 = F(x) = Φ[(lnx - 5)/(1/2)]. ⇒ (2)(lnx - 5) = 1.645.
⇒ x = exp[5 + (1/2)(1.645)] = 337.8. Comment: Find the 95th percentile of the underlying Normal and exponentiate.] In other words, for this portfolio, there is a 95% chance that the aggregate loss is less than 337.8. π p is the 100pth percentile. For this portfolio, π95% = 337.8. Quantiles: The 95th percentile is also referred to as Q0.95, the 95% quantile. For this portfolio, the 95% quantile is 337.8. 90th percentile ⇔ Q0.90 ⇔ 90% quantile. 99th percentile ⇔ Q0.99 ⇔ 99% quantile. median ⇔ Q.50 ⇔ 50% quantile.
15 16
See Section 3.5.3 of Loss Models. Value at Risk is also discussed in Chapter 25 of Derivative Markets by McDonald, not on the syllabus.
Definition of the Value at Risk: The Value at Risk, VaRp , is defined as the 100p-th percentile. p is sometimes called the security level. VaRp (X) = πp . If aggregate annual losses follow a LogNormal Distribution with µ = 5 and σ = 1/2, then VaR95% is the 95th percentile, or 337.8. For this LogNormal Distribution with µ = 5 and σ = 1/2, here is a graph of VaRp as a function of p. [Graph: VaRp on the vertical axis (roughly 100 to 700) against p on the horizontal axis (0.2 to 0.999); VaRp rises steeply as p approaches 1.]
Exercise: If annual aggregate losses follow a Weibull Distribution with θ = 10 and τ = 3, determine VaR90%. [Solution: .90 = 1 - exp[-(x/10)3 ]. ⇒ x = 13.205. Comment: We have determined the 90th percentile of this Weibull Distribution. As shown in Appendix A: VaRp (X) = θ {-ln(1-p)}1/τ ]. In Appendix A of the Tables attached to the exam, there are formulas for VaRp (X) for a many of the distributions.17 17
This will also help in finding percentiles and in performing simulation by inversion.
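Before that table, here is a minimal Python sketch (mine, not part of the guide) of computing VaRp by inverting the distribution function, for the LogNormal exercise above (µ = 5, σ = 1/2, p = 95%) and the Weibull exercise (θ = 10, τ = 3, p = 90%); the helper names are my own.

```python
import math
from statistics import NormalDist

def var_lognormal(mu, sigma, p):
    # invert F(x) = Phi[(ln x - mu)/sigma]
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

def var_weibull(theta, tau, p):
    # invert F(x) = 1 - exp[-(x/theta)^tau], giving theta * (-ln(1-p))^(1/tau)
    return theta * (-math.log(1.0 - p)) ** (1.0 / tau)

print(round(var_lognormal(5, 0.5, 0.95), 1))   # 337.8
print(round(var_weibull(10, 3, 0.90), 3))      # 13.205
```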
Distribution and VaRp (X):
Exponential: -θ ln(1-p)
Pareto: θ {(1-p)^(-1/α) - 1}
Weibull: θ {-ln(1-p)}^(1/τ)
Single Parameter Pareto: θ (1-p)^(-1/α)
Loglogistic: θ {p^(-1) - 1}^(-1/γ)
Inverse Pareto: θ {p^(-1/τ) - 1}^(-1)
Inverse Weibull: θ {-ln(p)}^(-1/τ)
Burr: θ {(1-p)^(-1/α) - 1}^(1/γ)
Inverse Burr: θ {p^(-1/τ) - 1}^(-1/γ)
Inverse Exponential: θ {-ln(p)}^(-1)
Paralogistic: θ {(1-p)^(-1/α) - 1}^(1/α)
Inverse Paralogistic: θ {p^(-1/τ) - 1}^(-1/τ)
Normal:18 µ + σ zp
18 Not shown in Appendix A attached to the exam. See Example 3.14 in Loss Models. zp is the pth percentile of the Standard Normal.
Problems: 3.1 (1 point) Losses are Normal with µ = 1000 and σ = 25. Determine the VaR80%. A. 1010
B. 1020
C. 1030
D. 1040
E. 1050
3.2 (2 points) Losses follow a LogNormal Distribution with µ = 8 and σ = 0.7. Premiums are 110% of expected losses. Determine the amount of policyholder surplus the insurer must have so that there is a 10% chance that the losses will exceed the premium plus surplus. (A) Less than 2500 (B) At least 2500, but less than 3000 (C) At least 3000, but less than 3500 (D) At least 3500, but less than 4000 (E) At least 4000 3.3 (1 point) If annual aggregate losses follow a Pareto Distribution with α = 4 and θ = 100, determine VaR95%. A. 80
B. 90
C. 100
D. 110
E. 120
3.4 (2 points) A group medical insurance policy covers the medical expenses incurred by 2000 mutually independent lives. The annual loss amount, X, incurred by each life is distributed as follows: x Pr(X=x) 0 0.40 100 0.40 1000 0.15 5000 0.05 The premium is equal to the 99th percentile of the normal distribution which approximates the distribution of total claims. Determine the premium per life. (A) Less than 470 (B) At least 470, but less than 480 (C) At least 480, but less than 490 (D) At least 490, but less than 500 (E) At least 500
3.5 (3 points) Annual Losses for the Rocky Insurance Company are Normal with mean 20 and standard deviation 3. Annual Losses for the Bullwinkle Insurance Company are Normal with mean 30 and standard deviation 4.
The annual losses for the Rocky and Bullwinkle companies have a correlation of 60%. (i) Determine the VaR90% for the Rocky Insurance Company. (ii) Determine the VaR90% for the Bullwinkle Insurance Company. (iii) The Rocky and Bullwinkle companies merge. Determine the VaR90% for the merged company.
3.6 (1 point) Losses follow a Weibull Distribution with θ = 10 and τ = 0.3, for a 99% security level determine the Value at Risk. (A) Less than 1500 (B) At least 1500, but less than 2000 (C) At least 2000, but less than 2500 (D) At least 2500, but less than 3000 (E) At least 3000
3.7 (1 point) Annual aggregate losses have the following distribution: Annual Aggregate Losses Probability 0 50% 10 30% 20 10% 50 4% 100 2% 200 2% 500 1% 1000 1% Determine the 95% Value at Risk. A. 60 B. 70 C. 80 D. 90 E. 100 3.8 (Course 151 Sample Exam #2, Q.11) (1.7 points) A group medical insurance policy covers the medical expenses incurred by 100,000 mutually independent lives. The annual loss amount, X, incurred by each life is distributed as follows: x Pr(X=x) 0 0.30 50 0.10 200 0.10 500 0.20 1,000 0.20 10,000 0.10 The policy pays 80% of the annual losses for each life. The premium is equal to the 95th percentile of the normal distribution which approximates the distribution of total claims. Determine the difference between the premium and the expected aggregate payments. (A) 1,213,000 (B) 1,356,000 (C) 1,446,000 (D) 1,516,000 (E) 1,624,000 3.9 (5A, 11/94, Q.36) (2 points) An auto insurer has 2 classes of insureds with the following claim probabilities and distribution of claim amounts: Number of Probability of Claim Class Insureds One Claim Severity 1 400 0.10 3,000 2 600 0.05 2,000 An insured will have either no claims or exactly one claim. The size of claim for each class is constant. The insurer wants to collect a total dollar amount such that the probability of total claims dollars exceeding that amount is 5%. Using the normal approximation and ignoring expenses, how much should the insurer collect?
3.10 (5A, 11/95, Q.35) (2 points) An insurance company has two classes of insureds with the following claim probabilities and distribution of claim amounts: Number of Probability Claim Class Insureds of 1 claim Severity 1 1,000 0.15 $600 2 5,000 0.05 $800 The probability of an insured having more than one loss is zero. The company wants to collect an amount equal to the 95th percentile of the distribution of aggregate losses. Determine the total premium. 3.11 (5A, 11/98, Q.35) (2 points) You are a pricing actuary offering a new coverage and you have analyzed the distribution of losses capped at various limits shown below: Capped Limit Expected Value Variance 30,000 500 100,000 25,000 450 50,000 20,000 400 40,000 15,000 350 28,000 10,000 250 14,000 5,000 200 9,000 Your chief actuary requires that the premiums be at the 95th percentile of the distribution of losses. The general manager requires that the difference between the premiums and the expected losses be no greater than $200. What is the highest limit of the new coverage that can be written consistent with these requirements? 3.12 (5A, 11/99, Q.37) (2 points) An insurer issues 1-year warranty coverage policies to two different types of insureds. Group 1 insureds have a probability of having a claim of .05 and Group 2 insureds have a probability of having a claim of .10. There are two possible claim amounts of $500 and $1,000. The following table shows the number of insureds in each class. Class Prob. of Claim Claim Amount # of Insureds 1 0.05 $500 200 2 0.10 $500 200 3 0.05 $1000 300 4 0.10 $1000 250 Using the Normal Approximation, how much premium should the insurer collect such that the collected premium equals the 95th percentile of the distribution of total claims?
3.13 (8, 5/09, Q.28) (2.25 points) Given the following information about Portfolios A and B:
• The returns on a stock are Normally distributed. • The volatility is the standard deviation of the returns on a stock. • If you buy stocks, then the loss is the difference between the initial cost of the portfolio and the current value of the portfolio.
• The value of Portfolio A is $15 million and consists only of Company A stock. • The daily volatility of Portfolio A is 3%. • The value of Portfolio B is $7 million and consists only of Company B stock. • The daily volatility of Portfolio B is 2%. • The correlation coefficient between Company A and Company B stock prices is 0.40. a. (0.75 point) Calculate the 10-day 99% Value-at-Risk (VaR) for Portfolio A. b. (0.75 point) Calculate the 10-day 99% VaR for a portfolio consisting of Portfolios A and B. Note: I have revised this past exam question.
Solutions to Problems: 3.1. B. .80 = F(x) = Φ[(x - 1000)/25]. ⇒ (x - 1000)/25 = 0.842
⇒ x = 1000 + (25)(0.842) = 1021. 3.2. C. E[X] = exp[8 + .72 /2] = 3808. Premium is: (1.1)(3808) = 4189. .90 = F(x) = Φ[(lnx - 8)/0.7]. ⇒ (lnx - 8)/0.7 = 1.282.
⇒ x = exp[8 + (0.7)(1.282)] = 7313. 90th percentile of the LogNormal is 7313. Required surplus is: 7313 - 4189 = 3124. 3.3. D. .95 = 1 - {100/(100 + x)}4 . ⇒ 20 = (1 + x/100)4 . ⇒ x = 111.5. As shown in Appendix A, for a Pareto Distribution with parameters α and θ, α > 1: VaRp (X) = θ [(1-p)-1/α - 1]. VaR0.95 = (100) {(0.05)-1/4 - 1} = 111.5. 3.4. D. E[X] = (0)(.4) + (100)(.4) + (1000)(.15) + (5000)(.05) = 440. E[X2 ] = (02 )(.4) + (1002 )(.4) + (10002 )(.15) + (50002 )(.05) = 1,404,000. Var[X] = 1,404,000 - 4402 = 1,210,400. The aggregate has mean: (2000)(440) and variance: (2000)(1,210,400). Φ[2.326] = 0.99. Total premium is: (2000)(440) + (2.326) (2000)(1,210,400) . Premium per life is: 440 + (2.326) 1,210,400 / 2000 = 497.2. 3.5. (i) For Rocky, VaR90% is: 20 + (1.282)(3) = 23.846. (ii) For Bullwinkle, VaR90% is: 30 + (1.282)(4) = 35.128. (iii) Annual losses for Rocky plus Bullwinkle are Normal with mean: 20 + 30 = 50, and variance: 32 + 42 + (2)(.6)(3)(4) = 39.4. For Rocky plus Bullwinkle, VaR90% is: 50 + (1.282)( 39.4 ) = 58.047. Comment: 58.047 < 58.974 = 23.846 + 35.128. Merging has reduced the risk measure, an example of the advantage of diversification. As will be discussed with respect to coherent risk measures, this property is called subadditivity. While Value at Risk is usually subadditive, it is not always subadditive.
3.6. B. 0.99 = 1 - exp[-(x/10)^0.3]. ⇒ 100 = exp[(x/10)^0.3]. ⇒ x = 1625. As shown in Appendix A: VaRp (X) = θ {-ln(1-p)}^(1/τ). VaR0.99(X) = (10) {-ln(0.01)}^(1/0.3) = 1625.
3.7. E. F(50) = 94% < 95%. F(100) = 96% ≥ 95%. Thus 100 is the 95% VaR.
3.8. A. The variance of the severity is: 10,254,250 - 1325² = 8,498,625.
x = 0: density 0.3, contribution to first moment 0, to second moment 0.
x = 50: density 0.1, first moment 5, second moment 250.
x = 200: density 0.1, first moment 20, second moment 4,000.
x = 500: density 0.2, first moment 100, second moment 50,000.
x = 1000: density 0.2, first moment 200, second moment 200,000.
x = 10,000: density 0.1, first moment 1,000, second moment 10,000,000.
Totals: mean severity 1325, second moment 10,254,250.
The mean aggregate payment by the insurer is: (100000)(.8)(1325) = 106 million. The variance of the insurerʼs aggregate payment is: (.82 )(100000)(8,498,625). The standard deviation is: 737,503. For the 95th percentile, one adds 1.645 standard deviations to the mean. Thus the premium is: 106,000,000 + (1.645)(737,503). Premiums - expected aggregate payments = (1.645)(737,503) = 1,213,192. 3.9. Mean Aggregate Loss = (400)(.10)(3000) + (600)(.05)(2000) = 180,000. Variance of Aggregate Losses = (400)(.10)(.9)(30002 ) + (600)(.05)(.95)(20002 ) = 438,000,000. Since the 95th percentile of the Unit Normal Distribution is 1.645, we want to collect: Mean + 1.645 Standard Deviations = 180,000 + 1.645 438,000,000 = 214,427. 3.10. The mean loss is: (.15)(600)(1000) + (.05)(800)(5000) = 290,000. The variance of aggregate losses is: (.15)(.85)(6002 )(1000) + (.05)(.95)(8002 )(5000) = 197,900,000. The 95th percentile of aggregate losses is approximately: 290,000 + (1.645) 197,900,000 = 290,000 + 23,141 = 313,141. Comment: The relative security loading is: 23,141/290,000 = 8.0%.
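Here is a minimal Python sketch (mine, not part of the original solutions) of the Normal-approximation premium calculation in 3.9; the data structure and variable names are my own.

```python
import math
from statistics import NormalDist

# Problem 3.9: two classes, each insured has at most one claim (Bernoulli frequency).
classes = [  # (number of insureds, probability of a claim, fixed claim size)
    (400, 0.10, 3000),
    (600, 0.05, 2000),
]
mean = sum(n * q * s for n, q, s in classes)
var = sum(n * q * (1 - q) * s * s for n, q, s in classes)

z95 = NormalDist().inv_cdf(0.95)            # about 1.645
print(round(mean + z95 * math.sqrt(var)))   # about 214,424; the guide gets 214,427 using z = 1.645
```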
3.11. Using the Normal Approximation, the 95th percentile is approximately: mean + 1.645 (Standard Deviation). The difference between the premiums and the expected losses is: 1.645 (Standard Deviation). Therefore, we require 1.645 (Standard Deviation) < 200 ⇒ Variance < 14,782. The highest limit of the new coverage that can be written consistent with these requirements is $10,000.
3.12. With severity s, Bernoulli parameter q, and n insureds: mean of aggregate losses = nqs, variance of aggregate losses = nq(1-q)s².
Class 1: frequency 0.05, severity 500, 200 insureds, mean 5,000, variance 2,375,000.
Class 2: frequency 0.10, severity 500, 200 insureds, mean 10,000, variance 4,500,000.
Class 3: frequency 0.05, severity 1000, 300 insureds, mean 15,000, variance 14,250,000.
Class 4: frequency 0.10, severity 1000, 250 insureds, mean 25,000, variance 22,500,000.
Overall: mean 55,000, variance 43,625,000.
Approximate the distribution of aggregate losses by the Normal Distribution with the same mean and variance. The 95th percentile ≅ 55,000 + 1.645√43,625,000 = 65,865.
3.13. a. Φ[2.326] = 99%. Assuming the returns on different days are independent, the variances add; variances are multiplied by N, while standard deviations are multiplied by √N. The volatility over ten days is: 0.03√10. One standard deviation of movement in value is: ($15 million)(0.03√10). The 1% worst outcomes are when the value declines by 2.326 standard deviations or more. VaR0.99 = ($15 million)(2.326)(0.03√10) = $3.31 million.
b. The standard deviation of the daily change in the value of the combined portfolio is: √[(15²)(0.03²) + (7²)(0.02²) + (2)(0.4)(15)(0.03)(7)(0.02)] = $0.522 million. VaR0.99 = ($0.522 million)(2.326)√10 = $3.84 million.
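Here is a minimal Python sketch (mine, not part of the original solution) of both parts of 3.13; the variable names are my own.

```python
import math
from statistics import NormalDist

z99 = NormalDist().inv_cdf(0.99)        # about 2.326

# Portfolio values (in $ millions) and daily volatilities, from the problem:
vA, volA = 15.0, 0.03
vB, volB = 7.0, 0.02
rho, days = 0.40, 10

# (a) 10-day 99% VaR for Portfolio A alone
var_A = vA * volA * math.sqrt(days) * z99
# (b) combined portfolio: daily dollar standard deviation, then scale by sqrt(days)
sd_daily = math.sqrt((vA * volA) ** 2 + (vB * volB) ** 2 + 2 * rho * vA * volA * vB * volB)
var_AB = sd_daily * math.sqrt(days) * z99

print(round(var_A, 2), round(var_AB, 2))   # about 3.31 and 3.84 ($ million)
```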
Section 4, Tail Value at Risk19 20 21 In this section, another risk measure will be discussed: Tail Value at Risk ⇔ TVaR ⇔ Conditional Tail Expectation ⇔ CTE ⇔
⇔ Tail Conditional Expectation ⇔ TCE ⇔ Expected Shortfall ⇔ Expected Tail Loss. Definition of the Tail Value at Risk: For a given value of p, the security level, the Tail Value at Risk of a loss distribution is defined as the average of the 1 - p worst outcomes: TVaRp (X) ≡ E[X | X > πp ]. The corresponding risk measure is: ρ(X) = TVaRp (X). Exercise: The aggregate losses are uniform from 0 to 100. Determine TVaR.80 and TVaR.90. [Solution: TVaR.80 = (100 + 80)/2 = 90. TVaR.90 = (100 + 90)/2 = 95.] As with the Value at Risk, for larger choices of p, the Tail Value at Risk is larger, all other things being equal. TVaRp = average size of those losses of size greater than the pth percentile, πp . ∞
TVaRp = ∫[πp, ∞] x f(x) dx / ∫[πp, ∞] f(x) dx = {∫[πp, ∞] x f(x) dx} / (1 - p).
The average size of those losses of size between a and b is:22 E[X | b > X > a] = ({E[X ∧ b] - b S(b)} - {E[X ∧ a] - a S(a)}) / {F(b) - F(a)}. Letting a = πp and b = ∞: TVaRp = ({E[X] - 0} - {E[X ∧ πp ] - πp S(πp )} ) / {1 - F(πp )} = {E[X] - E[X ∧ πp ] + πp (1 - p)}/(1 - p) = πp + (E[X] - E[X ∧ πp ])/(1 - p). TVaRp (X) = π p + (E[X] - E[X ∧ πp ]) / (1 - p). 19
See Section 3.5.4 of Loss Models. For an example of an application, see “DFA Insurance Company Case Study, Part 2 Capital Adequacy and Capital Allocation,” by Stephen W. Philbrick and Robert A. Painter, in the Spring 2001 CAS Forum. 21 This is also discussed in Section 25.2 of Derivative Markets by McDonald, not on the syllabus. 22 See “Mahlerʼs Guide to Loss Distributions.” 20
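Here is a minimal Python sketch (mine, not part of the guide) checking the formula TVaRp(X) = πp + (E[X] - E[X ∧ πp])/(1 - p) for an illustrative Pareto with α = 3 and θ = 20, against a brute-force numerical integral of the survival function above πp; the function names and integration settings are my own.

```python
def pareto_sf(x, alpha, theta):
    # survival function S(x) of a Pareto
    return (theta / (theta + x)) ** alpha

def tvar_pareto_closed(p, alpha, theta):
    pi_p = theta * ((1 - p) ** (-1 / alpha) - 1)                     # VaR_p
    ex = theta / (alpha - 1)                                         # E[X]
    ex_lim = ex * (1 - (theta / (theta + pi_p)) ** (alpha - 1))      # E[X ^ pi_p]
    return pi_p + (ex - ex_lim) / (1 - p)

def tvar_pareto_numeric(p, alpha, theta, upper=2.0e5, steps=400_000):
    # TVaR_p = pi_p + e(pi_p) = pi_p + (integral of S(x) from pi_p to infinity) / (1 - p)
    pi_p = theta * ((1 - p) ** (-1 / alpha) - 1)
    h = (upper - pi_p) / steps
    integral = sum(pareto_sf(pi_p + (i + 0.5) * h, alpha, theta) for i in range(steps)) * h
    return pi_p + integral / (1 - p)

print(round(tvar_pareto_closed(0.90, 3, 20), 2))    # 44.63
print(round(tvar_pareto_numeric(0.90, 3, 20), 2))   # also approximately 44.63
```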
Exercise: Losses follow a Pareto Distribution with α = 3 and θ = 20. Determine TVaR0.90. [Solution: Set 0.90 = F(π0.90) = 1 - {20/(π0.90 + 20)}³. ⇒ π0.90 = 23.09. E[X] = θ/(α - 1) = 20/(3 - 1) = 10. E[X ∧ 23.09] = {θ/(α - 1)}{1 - {20/(23.09 + 20)}²} = 7.846. TVaR0.90 = π0.90 + (E[X] - E[X ∧ π0.90])/(1 - .90) = 23.09 + (10 - 7.846)/0.1 = 44.63. Alternately, X truncated and shifted from below at 23.09 is Pareto with α = 3 and θ = 20 + 23.09 = 43.09, with mean 43.09/(3 - 1) = 21.54. TVaR0.90 = E[X | X > π0.90] = 23.09 + 21.54 = 44.63. Comment: As shown in Appendix A of the Tables attached to the exam: TVaRp = θ {(1-p)^(-1/α) - 1} + θ (1-p)^(-1/α) / (α - 1), for α > 1.]
For this Pareto Distribution with α = 3 and θ = 20, here is a graph of TVaRp as a function of p. [Graph: TVaRp on the vertical axis (roughly 50 to 250) against p on the horizontal axis (0.2 to 0.999); TVaRp rises steeply as p approaches 1.]
TVaR0 (X) = E[X | over the worst 100% of outcomes] = E[X]. For a loss distribution with a maximum, TVaR1 (X) = Max[X].
In Appendix A, there are formulas for TVaRp (X) for a few of the distributions: Exponential, Pareto, Single Parameter Pareto.
Exponential: TVaRp (X) = -θ ln(1-p) + θ
Pareto: TVaRp (X) = θ {(1-p)^(-1/α) - 1} + θ (1-p)^(-1/α) / (α - 1), α > 1
Single Parameter Pareto: TVaRp (X) = αθ (1-p)^(-1/α) / (α - 1), α > 1
Normal:23 TVaRp (X) = µ + σ φ[zp] / (1 - p)
23 Not shown in Appendix A attached to the exam. See Example 3.14 in Loss Models. zp is the pth percentile of the Standard Normal, and φ is the density of the Standard Normal. For example, z0.975 = 1.960.
Relationship to the Mean Excess Loss: The mean excess loss, e(x) = E[X - x | X > x] = E[X | X > x] - x.24 Therefore, E[X | X > x] = x + e(x). Therefore, TVaRp (X) = E[X | X > πp ] = πp + e(πp ). This matches a previous formula, since e(πp ) = (E[X] - E[X ∧ πp ])/S(πp ) = (E[X] - E[X ∧ πp ])/(1 - p). This form of the formula for the TVaR can be useful in those cases where one remembers the form of the mean residual life.
For example, for a Pareto Distribution with α = 3 and θ = 20, as determined previously, π .90 = 23.09. The mean excess loss for a Pareto is e(x) = (x + θ )/(α - 1). Therefore, e(23.09) = (23.09 + 20)/(3 - 1) = 21.54. TVaR.90 = 23.09 + 21.54 = 44.63, matching the previous result. For a Pareto Distribution with parameters α and θ, α > 1: π p = θ{(1 - p)−1/α - 1}. e(πp ) = (πp + θ )/(α - 1) = θ(1 - p)−1/α/(α - 1). TVaRp = πp + e(πp ) = θ{(1 - p)−1/α α/(α - 1) - 1}.25 For the above example, TVaR.90 = 20{(.1-1/3)(3/2) - 1} = 44.63, matching the previous result. Exercise: For an Exponential Distribution with mean 600, determine TVaR.99. [Solution: Set .99 = 1 - exp[-π.99/600]. ⇒ π.99 = 2763. For the Exponential, e(x) = θ = 600. Therefore, TVaR.99 = π.99 + e(π.99) = 2763 + 600 = 3363.] For an Exponential Distribution with mean θ: π p = -θ ln[1 - p]. e(πp ) = θ. TVaRp = πp + e(πp ) = θ(1 - ln[1 - p]).26 For the above example, TVaR.99 = 600(1 - ln[.01]) = 3363, matching the previous result. 24
See “Mahlerʼs Guide to Loss Distributions.” I would not memorize this formula. 26 I would not memorize this formula. 25
A Example with a Discrete Distribution: Let us assume that the aggregate distribution is: 10 50% 50 30% 100 10% 500 8% 1000 2% Then E[L | L ≥ 500] = {(500)(8%) + (1000)(2%)}/10% = 600. In contrast, E[L | L > 500] = 1000. Neither 600 or 1000 is the average of the 5% worst outcomes. Thus neither is used for TVaR.95. Rather we compute TVaR.95 by averaging the 5% worst possible outcomes: TVaR.95 = {(500)(3%) + (1000)(2%)}/5% = 700. In general, in order to calculate TVaRp :27 (1) Take the 1 - p worst outcomes. (2) Average over these worst outcomes.
27
This is equivalent to what had been done in the case of a continuous aggregate distribution.
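Here is a minimal Python sketch (mine, not part of the guide) of that recipe, applied to the discrete aggregate distribution just shown; the function name and structure are my own.

```python
def tvar_discrete(outcomes, p):
    """Average of the (1 - p) worst outcomes of a discrete distribution.
    outcomes: list of (value, probability) pairs whose probabilities sum to 1."""
    tail = 1.0 - p
    remaining, total = tail, 0.0
    for value, prob in sorted(outcomes, key=lambda vp: vp[0], reverse=True):
        take = min(prob, remaining)        # use only as much probability as the tail still needs
        total += value * take
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail

dist = [(10, 0.50), (50, 0.30), (100, 0.10), (500, 0.08), (1000, 0.02)]
print(tvar_discrete(dist, 0.95))   # 700.0, matching the text
```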
TVaR Versus VaR: Exercise: The aggregate losses are uniform from 0 to 100. Determine VaR95% and TVaR95%. [Solution: VaR95% = π.95 = 95. TVaR95% = (100 + 95)/2 = 97.5.] Since TVaRp (X) ≡ E[X | X > πp ], TVaRp (X) ≥ VaRp (X). 28 Unlike VaRp, TVaRp is affected by the behavior in the extreme righthand tail of the distribution. Exercise: The aggregate losses are a two-component splice between a uniform from 0 to 95, and a uniform from 95 to 200, with 95% weight to the first component of the splice. Determine VaR95% and TVaR95%. [Solution: VaR95% = π.95 = 95. TVaR95% = (200 + 95)/2 = 147.5.] For a heavier-tailed distribution, TVaRp can be much larger than VaRp . 29
Exercise: The aggregate losses are a two-component splice between a uniform from 0 to 95, and above 95 a density proportional to a Pareto with α = 3 and θ = 300, with 95% weight to the first component of the splice. Determine VaR95% and TVaR95%. [Solution: VaR95% = π.95 = 95. Above 95 the density of the splice is proportional to a Pareto ∞
Distribution, let us say c fPareto(x). e(95) = ∫ (x - 95) c fPareto (x) dx / {c SPareto(x)} = 95
ePareto(95). For a Pareto with α = 3 and θ = 300, e(x) = (x + 300)/(3 - 1). e(95) = 395/2 = 197.5. TVaR95% = 95 + e(95) = 95 + 197.5 = 292.5. Comment: In this and the previous exercise, the 95% Values at Risk are the same, even though the distribution in this exercise has a larger probability of extremely bad outcomes such as 300.] Derivative of TVaR: ∞
Let G(x) = E[X | X > x] = x + e(x) = x +
∫ S(t) dt /S(x). x
∞
Then dG/dx = 1 - S(x)/S(x) + f(x) ∫ S(t) x
28
∞
dt /S(x)2
= {f(x)/S(x)} ∫ S(t) dt /S(x) = h(x) e(x). x
Only in very unusual situations would the two be equal. A heavier-tailed distribution has f(x) go to zero more slowly as x approaches infinity. The Pareto and LogNormal are examples of heavier-tailed distributions. See “Mahlerʼs Guide to Loss Distributions.” 29
dE[X | X > x]/dx = h(x) e(x).30 dE[X | X > x]/dx > 0, and as expected E[X | X > x] is an increasing function of x. For example, for a Pareto with parameters α and θ, e(x) = (x + θ)/(α - 1), and h(x) = α / (θ+x). Therefore, for a Pareto, dE[X | X > x]/dx = h(x) e(x) = α/(α - 1), for α > 1.31 Exercise: For an Exponential Distribution, determine dE[X | X > x]/dx. [Solution: e(x) = θ, and h(x) = 1/θ. dE[X | X > x]/dx = h(x) e(x) = 1. Comment: For the Exponential: E[X | X > x] = x + e(x) = x + θ.] TVaRp (X) = E[X | X > πp ] = G(πp ). Therefore, by the Chain Rule, TVaRp /dp = h(πp ) e(πp ) d πp /dp. For example, for a Pareto with parameters α and θ, p = F(πp ) = 1 - {θ/(πp + θ)}1/α. ⇒ πp = θ{(1 - p)-1/α - 1}. Therefore, for a Pareto, with the shape parameter α > 1, TVaRp /dp = h(πp ) e(πp ) d πp /dp = {α/(α - 1)} (θ/α)(1 - p)-(1+1/α) = {θ/(α - 1)}(1 - p)-(1+1/α).32 Exercise: For an Exponential Distribution, determine TVaRp /dp. [Solution: p = F(πp ) = 1 - exp[-πp /θ]. ⇒ πp = -θ ln[1 - p]. TVaRp /dp = h(πp ) e(πp ) d πp /dp = (1) θ/(1 - p) = θ/(1 - p). Comment: For the Exponential: TVaRp = πp + e(πp ) = -θ ln[1 - p] + θ.] Since πp is an increasing function of p, d πp /dp > 0. Therefore, TVaRp /dp = h(πp ) e(πp ) d πp /dp > 0, and as expected TVaRp is an increasing function of p.
30
See Exercise 3.37 in Loss Models. h(x) is the hazard rate. For the Pareto: E[X | X > x] = x + e(x) = x + (x + θ)/(γ - 1). 32 As discussed previously, for the Pareto: TVaRp = θ{(1 - p)−1/α α/(α - 1) - 1}. 31
Normal Distribution:33 For a Normal Distribution, the pth percentile is: µ + σ zp , where zp is the pth percentile of the Standard Normal. Exercise: For a Normal Distribution with µ = 100 and σ = 20, determine VaR0.95[X]. [Solution: The 95th percentile of the Standard Normal is 1.645. VaR0.95[X] = 100 + (20)(1.645) = 132.9.] As derived below, TVaRp [X] = µ + σ φ[zp ] / (1 - p). Exercise: For a Normal Distribution with µ = 100 and σ = 20, determine TVaR0.95[X]. [Solution: φ[zp ] = φ[1.645] = exp[-1.6452 /2] /
2 π = 0.10311.
TVaR0.95[X] = 100 + (20)(0.10311)/(1 - 0.95) = 141.24. Comment: Note that TVaR0.95[X] > VaRp [X].] For the Standard Normal: ∞
∫x
∞
x φ(x) dx =
∫x
x exp[-x2 / 2] / 2π dx = -exp[-x2 / 2] / 2 π
∞
]
= exp[-x2 /2] /
2 π = φ(x).
x
For the nonstandard Normal: ∞
∞
∞
∫x x f(x) dx = ∫x x φ[(x - µ) / σ] / σ dx = (x -∫µ)/ σ (σy + µ) φ[y] dy = ∞
σ
∞
y φ[y] dy + µ ∫ φ[y] dy = σ φ[(x-µ)/σ] + µ(1 - Φ[(x-µ)/σ]). ∫ (x - µ)/ σ (x - µ)/ σ ∞
TVaRp [X] =
∫π x f(x) dx / (1 - p) = σ φ[(πp-µ)/σ] / (1 - p) + µ {1 - Φ[(πp-µ)/σ]} / (1 - p) = p
σ φ[zp ] / (1 - p) + µ(1 - Φ[zp ]) / (1 - p) = σ φ[zp ] / (1 - p) + µ(1 - p)/(1 - p) = µ + σ φ[zp ] / (1 - p). 33
See Example 3.14 in Loss Models.
Problems: 4.1 (2 points) What is the TVaR0.95 for an Exponential Distribution with mean 100? A. 400
B. 425
C. 450
D. 475
E. 500
4.2 (3 points) Losses are Normal with µ = 300 and σ = 10. Determine the 90% Tail Value at Risk. Hint: For the Normal Distribution, E[X ∧ x] = µ Φ[(x−µ)/σ] - σ φ[(x−µ)/σ] + x {1 - Φ[(x−µ)/σ]}.
A. less than 315 B. at least 315 but less than 320 C. at least 320 but less than 325 D. at least 325 but less than 330 E. at least 330 4.3 (3 points) F(x) = 1 - {θ/(θ + x)}4 . Calculate the Tail Value at Risk at a security level of 99%. A. 2.6θ
B. 2.8θ
C. 3.0θ
D. 3.2θ
E. 3.4θ
4.4 (2 points) For an Exponential Distribution with mean θ, determine TVaRp - VaRp . A. θ
B. -θ ln(1 - p)
C. θ - θ ln(1 - p)
D. θ + θ ln(1/p)
E. None of A, B, C, or D
4.5 (3 points) Losses follow a LogNormal Distribution with µ = 7 and σ = 0.8. Determine TVaR0.995. A. less than 12,000 B. at least 12,000 but less than 13,000 C. at least 13,000 but less than 14,000 D. at least 14,000 but less than 15,000 E. at least 15,000
Use the following information for the next two questions: Annual aggregate losses have the following distribution: Annual Aggregate Losses Probability 0 50% 10 30% 20 10% 50 4% 100 2% 200 2% 500 1% 1000 1% 4.6 (1 point) Determine the 90% Tail Value at Risk. A. 200 B. 210 C. 220 D. 230
E. 240
4.7 (1 point) Determine the 95% Tail Value at Risk. A. 375 B. 400 C. 425 D. 450
E. 475
Use the following information for the next 2 questions: Losses follows a Single Parameter Pareto Distribution, with α = 6 and θ = 1000. 4.8 (1 point) Determine the 98% Value at Risk. A. 1800
B. 1900
C. 2000
D. 2100
E. 2200
4.9 (2 points) Determine the 98% Tail Value at Risk. A. 1900
B. 2000
C. 2100
D. 2200
E. 2300
4.10 (3 points) You are given the following information:
• Frequency is Binomial with m = 500 and q = 0.3. • Severity is LogNormal with µ = 8 and σ = 0.6. • Frequency and severity are independent. Using the Normal Approximation, determine the 99% Tail Value at Risk for Aggregate Losses. Hint: For the Normal Distribution, TVaRp (X) = µ + σ φ[Φ-1(p)] / (1 - p). A. 655,000
B. 660,000
C. 665,000
D. 670,000
E. 675,000
Use the following information for the next 4 questions: For the aggregate losses, VaR0.9 is 1,000,000. 4.11 (2 points) John believes that the aggregate losses follow an Exponential Distribution. Determine Johnʼs estimate of TVaR0.9. 4.12 (4 points) Paul believes that the aggregate losses follow a LogNormal Distribution with σ = 0.6. Determine Paulʼs estimate of TVaR0.9. 4.13 (4 points) George believes that the aggregate losses follow a LogNormal Distribution with σ = 1.2. Determine Georgeʼs estimate of TVaR0.9. 4.14 (3 points) Ringo believes that the aggregate losses follow a Pareto Distribution with α = 3. Determine Ringoʼs estimate of TVaR0.9.
Use the following information for the next 2 questions: In the state of Windiana, a State Fund pays for losses due to hurricanes. The worst possible annual amounts to be paid by the State Fund in millions of dollars are: Amount Probability 100 3.00% 200 1.00% 300 0.50% 400 0.25% 500 0.10% 600 0.05% 700 0.04% 800 0.03% 900 0.02% 1000 0.01% 4.15 (2 points) Determine TVaR0.95 in millions of dollars. A. 120
B. 140
C. 160
D. 180
E. 200
4.16 (2 points) Determine TVaR0.99 in millions of dollars. A. 400
B. 450
C. 500
D. 550
E. 600
Use the following information for the next 2 questions: Losses follow a mixture of two Exponential Distributions, with means of 1000 and 2000, and with weights of 60% and 40% respectively. 4.17 (2 points) Determine the 95% Value at Risk. A. 3000 B. 3500 C. 4000 D. 4500
E. 5000
4.18 (3 points) Determine the 95% Tail Value at Risk. A. 5700 B. 6000 C. 6300 D. 6600
E. 6900
Use the following information for the next 2 questions: f(x) = 0.050 for 0 ≤ x ≤ 10, f(x) = 0.010 for 10 < x ≤ 50, and f(x) = 0.002 for 50 < x ≤ 100. 4.19 (1 point) Determine the 80% Value at Risk. A. 35 B. 40 C. 45 D. 50
E. 55
4.20 (2 points) Determine the 80% Tail Value at Risk. A. 50 B. 55 C. 60 D. 65
E. 70
4.21 (2 points) F(x) = (x/10)4 , 0 ≤ x ≤ 10. Determine TVaR0.90. A. less than 9.70 B. at least 9.70 but less than 9.75 C. at least 9.75 but less than 9.80 D. at least 9.80 but less than 9.85 E. at least 9.85 4.22 (2 points) For a Normal Distribution with µ = 10 and σ = 3, determine TVaR95%. A. less than 14 B. at least 14 but less than 15 C. at least 15 but less than 16 D. at least 16 but less than 17 E. at least 17 4.23 (3 points) f(x) = 0.0008 for x ≤ 1000, and f(x) = 0.0004 exp[2 - x/500] for x > 1000. Determine the 95% Tail Value at Risk. A. 2200 B. 2300 C. 2400 D. 2500 E. 2600
Solutions to Problems: 4.1. A. Set 0.95 = 1 - exp[-π.95/100]. ⇒ π.95 = -(100)ln(.05) = 299.6. For the Exponential, e(x) = θ = 100. TVaR.95 = π.95 + e(π.95) = 299.6 + 100 = 399.6. As shown in Appendix A: TVaRp (X) = -θ ln(1-p) + θ = -(100)ln(.05) + 100 = 399.6. Comment: See Example 3.15 in Loss Models. 4.2. B. 0.90 = F(x) = Φ[(x - 300)/10]. ⇒ (x - 300)/10 = 1.282
⇒ x = 300 + (10)(1.282) = 312.82. φ[(312.82 - µ)/σ] = φ[1.282] = exp[-1.282²/2]/√(2π) = 0.1754. Φ[(312.82 - µ)/σ] = 0.9.
E[X ∧ x] = µ Φ[(x−µ)/σ] - σ φ[(x−µ)/σ] + x {1 - Φ[(x−µ)/σ]}.
E[X ∧ 312.82] = (300)(0.9) - (10)(0.1754) + (312.82)(1 - 0.9) = 299.53. e(312.82) = (E[X] - E[X ∧ 312.82]) / (1 - 0.9) = (300 - 299.53)/0.1 = 4.7. TVaRp = πp + e(πp) = 312.82 + 4.7 = 317.5. Alternately, for a Normal Distribution, TVaRp [X] = µ + σ φ[zp] / (1 - p) = 300 + (10)φ[1.282]/0.1 = 300 + (10){exp[-1.282²/2]/√(2π)}/0.1 = 317.5. Comment: See Example 3.14 in Loss Models.
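Here is a minimal Python sketch (mine, not part of the original solution) checking the Normal TVaR formula used in 4.2 against a crude numerical integration; the names and grid settings are my own.

```python
from statistics import NormalDist

def tvar_normal(mu, sigma, p):
    # TVaR_p = mu + sigma * phi(z_p) / (1 - p) for a Normal distribution
    z_p = NormalDist().inv_cdf(p)
    return mu + sigma * NormalDist().pdf(z_p) / (1 - p)

print(round(tvar_normal(300, 10, 0.90), 1))   # about 317.5

# brute-force check: E[X | X > pi_p] by integrating x f(x) above the 90th percentile
N = NormalDist(300, 10)
pi_p = N.inv_cdf(0.90)
step, upper = 0.001, 400.0
xs = [pi_p + step * (i + 0.5) for i in range(int((upper - pi_p) / step))]
print(round(sum(x * N.pdf(x) for x in xs) * step / 0.10, 1))   # also about 317.5
```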
4.3. D. For the Pareto Distribution, 0.99 = 1 - {θ/(θ + π.99)}⁴. ⇒ π.99 = θ(100^0.25 - 1) = 2.1623θ. TVaRp = πp + (E[X] - E[X ∧ πp])/(1 - p) = πp + e(πp). TVaR.99 = 2.1623θ + e(2.1623θ) = 2.1623θ + (2.1623θ + θ)/(4 - 1) = 3.2164θ. As shown in Appendix A, for a Pareto Distribution with parameters α and θ, α > 1: TVaRp (X) = VaRp (X) + θ (1-p)^(-1/α) / (α - 1) = θ {(1-p)^(-1/α) - 1} + θ (1-p)^(-1/α) / (α - 1) = θ {(1-p)^(-1/α) α/(α - 1) - 1}. With α = 4, TVaR0.99(X) = θ {(1%)^(-0.25) - 1} + θ (1%)^(-0.25) / 3 = 2.1623θ + 1.0541θ = 3.2164θ. Comment: See Example 3.16 in Loss Models. For the Pareto Distribution, e(x) = (x + θ)/(α - 1), α > 1. [Graph: TVaRp, the Tail Value at Risk as a function of p, for F(x) = 1 - {θ/(θ + x)}⁴, with p running from 0.8 to 0.99; TVaRp rises steeply as p approaches 1.]
4.4. A. TVaRp = πp + e(πp). TVaRp − πp = e(πp) = θ. As shown in Appendix A: VaRp (X) = -θ ln(1-p). TVaRp (X) = -θ ln(1-p) + θ. TVaRp (X) - VaRp (X) = θ. Comment: For an Exponential Distribution, e(x) = θ.
4.5. A. E[X] = exp[7 + 0.82 /2] = 1510. 0.995 = F(x) = Φ[(lnx - 7)/0.8]. ⇒ (lnx - 7)/0.8 = 2.576.
⇒ x = exp[7 + (0.8)(2.576)] = 8611. 99.5th percentile of the LogNormal is 8611. E[X
∧
8611] = (1510)Φ[(ln8611 - 7 - 0.82 )/0.8] + (8611){1 - Φ[(ln8611 - 7)/0.8]}
= (1510)Φ[1.78] + (8611){1 - Φ[2.58]} = (1510)(.9625) + (8611)(.0049) = 1496. TVaR0.995 = π0.995 + (E[X] - E[X
∧
π.995])/(1 - 0.995) = 8611 + (1510 - 1496)/0.0049
= 11,468. 4.6. D. Average the 10% worst possible outcomes: TVaR.90 = {(4%)(50) + (2%)(100) + (2%)(200) + (1%)(500) + (1%)(1000)}/10% = 230. 4.7. B. Average the 5% worst possible outcomes: TVaR.95 = {(1%)(100) + (2%)(200) + (1%)(500) + (1%)(1000)}/5% = 400. 4.8. B. F(X) = 1 - (θ/x)α. .98 = 1 - (1000/π.98)6 . ⇒ π.98 = 1919. As shown in Appendix A: VaRp = θ (1-p)−1/α. VaR0.98 = (1000) (0.02)-1/6 = 1919. 4.9. E. E[X E[X
∧
∧
x] = θ {α - (θ/x)α−1} / (α - 1).
1919] = (1000){6 - (1000/1919)5 }/(6 - 1) = 1192.315.
E[X] = θ α / (α - 1) = (1000)(6/5) = 1200. TVaR.98 = π.98 + (E[X] - E[X
∧
π.98])/(1 - .98) = 1919 + (1200 - 1192.315)/.02 = 2303. ∞
Alternately, f(x) = 6 x 108 / x7 . TVaR.98 =
∫
x f(x) dx /.02 = 2303.
1919
α θ (1-p)-1 / α As shown in Appendix A: TVaRp = , for α > 1. α -1 TVaR0.98 = (6) (1000) (0.02)-1/6 / 5 = 2303. Comment: For a Single Parameter Pareto, with parameters α and θ, πp = θ /(1 - p)1/α. E[X] - E[X
∧
πp ] = θ (θ/πp )α−1 / (α - 1) = θ {(1 - p)1/α}α−1 / (α - 1) = θ (1 - p)1 - 1/α / (α - 1).
TVaRp = πp + (E[X] - E[X
∧
πp ])/(1 - p) = {θ /(1 - p)1/α} {α/(α - 1)} = {α/(α - 1)} πp .
4.10. B. The mean severity is: exp[8 + 0.6²/2] = 3569. The second moment of severity is: exp[(2)(8) + (2)(0.6²)] = 18,255,921. The variance of severity is: 18,255,921 - 3569² = 5,518,160. The mean frequency is: (500)(0.3) = 150. The variance of frequency is: (500)(0.3)(0.7) = 105. The mean aggregate loss is: (150)(3569) = 535,350. The variance of aggregate loss is: (150)(5,518,160) + (3569²)(105) = 2,165,188,905. Thus we approximate by a Normal Distribution with µ = 535,350 and σ = √2,165,188,905 = 46,532. φ[Φ^(-1)(p)] = φ[Φ^(-1)(99%)] = φ[2.326] = exp[-2.326²/2]/√(2π) = 0.02667. TVaR99%(X) = 535,350 + (46,532)(0.02667) / 0.01 = 659,451. Comment: VaR0.99 = 535,350 + (2.326)(46,532) = 643,583. The formula for TVaR for the Normal Distribution is given in Example 3.14 in Loss Models.
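Here is a minimal Python sketch (mine, not part of the original solution) of the aggregate-loss calculation in 4.10; the variable names are my own, and small differences from the figures above are rounding.

```python
import math
from statistics import NormalDist

# frequency: Binomial(m = 500, q = 0.3); severity: LogNormal(mu = 8, sigma = 0.6)
m, q = 500, 0.3
mu, sigma = 8.0, 0.6

mean_freq, var_freq = m * q, m * q * (1 - q)
mean_sev = math.exp(mu + sigma**2 / 2)
second_sev = math.exp(2 * mu + 2 * sigma**2)
var_sev = second_sev - mean_sev**2

mean_agg = mean_freq * mean_sev
sd_agg = math.sqrt(mean_freq * var_sev + mean_sev**2 * var_freq)

p = 0.99
z = NormalDist().inv_cdf(p)
tvar = mean_agg + sd_agg * NormalDist().pdf(z) / (1 - p)
print(round(mean_agg), round(sd_agg), round(tvar))
# close to the guide's 535,350, 46,532, and 659,451 (differences are rounding)
```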
4.13. For the LogNormal, the distribution function at 1,000,000 is 0.9. 0.9 = Φ[{ln[1,000,000] - µ}/1.2]. ⇒ 1.282 = {ln[1,000,000] - µ}/1.2. ⇒ µ = 12.2771. For this LogNormal, E[X] = exp[12.2771 + 1.22 /2] = 441,132. ⎡ ln(x) − µ − σ2 ⎤ ⎡ ln(x) − µ ⎤ E[X ∧ x] = exp(µ + σ2/2) Φ ⎢ + x {1 - Φ ⎢ ⎥ ⎥⎦ }. σ σ ⎣ ⎦ ⎣ E[X ∧ 1,000,000] = 441,132 Φ[0.082] + (1,000,000) {1 - Φ[1.282]} = (441,132)(0.5319) + (1,000,000)(0.10) = 334,638. e(1 million) = (441,132 - 334,638)/0.1 = 1,064,940. TVaRp = 1,000,000 + 1,064,940 = 2,064,940. Comment: The Tail Value at Risk depends on which form of distribution one assumes. Even assuming a LogNormal Distribution, the Tail Value at Risk depends on σ: TVaR ($ million) 4.0 3.5 3.0 2.5 2.0 1.5 0.5
1.0
1.5
2.0
sigma
The bigger σ, the heavier the righthand tail and thus the larger TVaR, all else being equal.
4.14. For the Pareto Distribution, VaRp = θ {(1-p)-1/α - 1}. Thus 1,000,000 = θ {(1-0.9)-1/3 - 1}. ⇒ θ = 866,225. For the Pareto Distribution, TVaRp = VaRp +
θ (1- p) - 1/ α α - 1
= 1,000,000 + (866,225)(1-0.9)-1/3 / (3 -1) = 1,933,113. Comment: The Tail Value at Risk depends as a function of α: TVaR ($million) 3.5
3.0
2.5
2.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
alpha
The smaller α, the heavier the righthand tail and thus the larger TVaR, all else being equal. 4.15. D. What is shown here is the 5% worst outcomes. Their average is: {(100)(3%) + (200)(1.00%) + (300)(0.50%) + (400)(0.25%) + (500)(0.10%) + (600)(0.05%) + (700)(0.04%) + (800)(0.03%) + (900)(0.02%) + (1000)(0.01%)}/5% = 182 million. 4.16. A. The average of the worst 1% of outcomes is: {(300)(0.50%) + (400)(0.25%) + (500)(0.10%) + (600)(0.05%) + (700)(0.04%) + (800)(0.03%) + (900)(0.02%) + (1000)(0.01%)}/1% = 410 million.
4.17. D. We wish to find where the survival function is 5%. 0.05 = 0.6 exp[-x/1000] + 0.4 exp[-x/2000]. 5 exp[x/1000] - 60 - 40 exp[x/2000] = 0. Let y = exp[x/2000]. Then y2 - 8y - 12 = 0
⇒y=
8 ±
64 + 48 = 9.292, taking the positive root. 2
⇒ exp[x/2000] = 9.292. ⇒ x = 4458. 4.18. C. The mean of the mixture is: (60%)(1000) + (40%)(2000) = 1400. The limited expected value of the mixture at 4458 is: (60%)(1000)(1 - e-4458/1000) + (40%)(2000)(1 - e-4458/2000) = 1306.94 e(4458) = (1400 - 1306.94)/0.05 = 1861. TVaR95% = 4458 + e(4458) = 4458 + 1861 = 6319. 4.19. B. & 4.20. C. There is 50% probability on the first interval and 40% probability on the second interval, so the 80th percentile is in the second interval. F(40) = 50% + 30% = 80%. Thus the 80th percentile is 40. 50
The 80% Tail Value at Risk is: E[X | X > 40] = {
100
∫40 x 0.01 dx + 50∫ x 0.002 dx } / 0.2
= (4.5 + 7.5) / 0.2 = 60. 50
Alternately, e(40) = E[(X - 40) | X > 40] = {
100
∫40 (x - 40) 0.01 dx + 50∫ (x - 40) 0.002 dx } / 0.2 =
(0.5 + 3.5) / 0.2 = 20. ⇒ The 80% Tail Value at Risk is: 40 + e(40) = 60. 4.21. E. 0.90 = (x/10)4 . ⇒ 90th percentile = 9.74. f(x) = 4 x3 / 10,000, 0 ≤ x ≤ 10. 10
⇒ TVaR0.90 =
x 4x3 / 10,000 dx / 0.1 = 9.873. ∫ 9.74
4.22. D. For the Normal Distribution: TVaRp [X] = µ + σ φ[zp ] / (1 - p). The 95th percentile of the Standard Normal Distribution is 1.645. φ[1.645] = exp[-1.6452 /2] /
2 π = 0.10311.
TVaR95% = 10 + (3)φ[1.645] / 0.05 = 10 + (60)(0.10311) = 16.19. 4.23. A. f(x) = 0.0008 for x ≤ 1000. ⇒ F(1000) = 0.8. To find the 95th percentile: x
0.95 = 0.8 +
0.0004 exp[2 - t / 500] dt = 0.8 + (0.2) (1 - exp[2- x /500]). ∫ 1000
⇒ 0.25 = exp[2- x /500]. ⇒ x = 1693. ∞
⇒ TVaR95%[X] =
0.008
e2
∞
0.0004 exp[2 - t / 500] t dt / 0.05 = 0.008 e2 ∫ e- t / 500 t dt = ∫ 1693 1693
(-500te - t / 500
-
t= ∞ t / 500 2 500 e )
]
t = 1693
= (0.008) e2 (37,110) = 2194.
Section 5, Distortion Risk Measures34 A distortion function, g, maps [0, 1] to [0, 1] such that g(0) = 0, g(1) = 1, and g is increasing. A distortion risk measure is obtained by taking the integral of g[S(x)]:35 ∞
H(X) =
∫ g[S(x)]dx . 0
Examples of distortion risk measures are: PH Transform36
g(y) = y1/κ
Wang Transform
g(y) = Φ[ Φ-1[y] + κ]
Dual Power Transform
g(y) = 1 - (1 - y)κ
It is less obvious, but the Value-at-Risk and Tail-Value-at-Risk risk measures can also be put in this form and are thus distortion risk measures. Proportional Hazard (PH) Transform: Define the Proportional Hazard (PH) Transform to be: g(S(x)) = S(x)1/κ, κ ≥ 1. Exercise: What is the PH transform of an Exponential Distribution? [Solution: For the Exponential, S(x) = exp[-x/θ]. S(x)1/κ = exp[-x/θ]1/κ = exp[-x/(κθ)]. Thus the PH transform is also an Exponential, but with θ replaced by κθ.] ∞
Recall that E[X] =
∫ S(x) dx .
37
0
The above integral computes the expected value of the losses. If instead we raised the survival function to some power less than one, we would get a larger integral, since S(x) < S(x)1/κ for κ > 1 and S(x) < 1. 34
No longer on the syllabus. The integral is taken over the domain of X, which is most commonly 0 to ∞. 36 The PH Transform is a special case of a Beta Transform, where g is a Beta Distribution with θ = 1. 37 See “Mahlerʼs Guide to Loss Distributions.” 35
The Proportional Hazard Transform risk measure is for κ ≥ 1: ∞
H(X) =
∫ S(x)1/ κ dx .
38
0
For κ = 1, the PH Transform is the mean. As the selected κ increases, so does the PH Transform. The more averse to risk one is, the higher the selected κ should be, resulting in a higher level of security. For a Pareto Distribution with α = 4 and θ = 240, E[X] = 240/(3 - 1) = 80. S(x) = {240/(240 + x)}4 . S(x)1/κ = {240/(240 + x)}4/κ, a Pareto with α = 4/κ and θ = 240. Exercise: For κ = 1.2, what is the PH Transform risk measure for this situation? [Solution: The transformed distribution is a Pareto with α = 4/1.2 = 3.33 and θ = 240. ∞
Therefore
∫ S(x)1/ κ dx = mean of this transformed Pareto = 240/(3.33 - 1) = 103.] 0
In the case of the Pareto Distribution, the PH Transform is also a Pareto, but with α replaced by α/κ. Thus the PH Transform has reduced the Pareto's shape parameter, resulting in a distribution with a heavier tail.39 The PH Transform risk measure is: θ/(α/κ - 1), for κ < α.
38 39
The integral is taken over the domain of X, which is most commonly 0 to ∞. Heavier-tailed distributions are sometimes referred to as more risky.
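Here is a minimal Python sketch (mine, not part of the guide) of the PH Transform risk measure H(X) = ∫ S(x)^(1/κ) dx by numerical integration, for the Pareto with α = 4 and θ = 240 used above; the function names, truncation point, and step count are my own choices.

```python
def ph_transform_risk(sf, kappa, upper, steps=1_000_000):
    """H(X) = integral of S(x)^(1/kappa) dx, approximated by the midpoint rule on [0, upper]."""
    h = upper / steps
    return sum(sf((i + 0.5) * h) ** (1.0 / kappa) for i in range(steps)) * h

pareto_sf = lambda x, alpha=4.0, theta=240.0: (theta / (theta + x)) ** alpha

print(round(ph_transform_risk(pareto_sf, 1.0, 200_000)))   # about 80, the mean (kappa = 1)
print(round(ph_transform_risk(pareto_sf, 1.2, 200_000)))   # about 103, the Pareto with alpha = 4/1.2
```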
Here is a graph of the PH Transform Risk Measure, for a Pareto Distribution with α = 4 and θ = 240, as function of κ:
PHTrans. R. M. 1500
1000
500
80 1
2
3
3.5
kappa
Exercise: For this situation, what value of k corresponds to a relative security loading of 50%? [Solution: θ/(α/κ - 1) = 1.5E[X] = 1.5 θ/(α - 1). ⇒ α - 1 = 1.5α/κ - 1.5.
⇒ κ = 1.5α/(α + .5) = (1.5)(4)/(4 + .5) = 6/4.5 = 1.33.] Exercise: Losses follow an Exponential Distribution with θ = 1000. Determine the PH Transform risk measure for κ = 1.6. [Solution: S(x) = exp[-x/1000]. S(x)1/1.6 = exp[-x/1600]. The PH Transform risk measure is the mean of the new Exponential, 1600. Comment: For the Exponential Distribution, the PH Transform risk measure is κθ.] Wangʼs Transform: Wangʼs Transform produces another risk measure, which is useful for working with Normal or LogNormal losses. Let X be LogNormal with µ = 6 and σ = 2. Then S(x) = 1 - Φ[(lnx - 6)/2]. Φ-1 is the inverse function of Φ. Φ-1[0.95] = 1.645. ⇔ Φ[1.645] = 0.95. Φ-1[1 - .95] = -Φ-1[.95] = -1.645. ⇔ Φ[-1.645] = 0.05.
Φ-1[S(x)] = Φ-1[1 - Φ[(lnx - 6)/2]] = - Φ-1[Φ[(lnx - 6)/2]] = -(lnx - 6)/2. Φ[Φ-1[S(x)] + 0.7] = Φ[0.7 - (lnx - 6)/2] = 1 - Φ[(lnx - 6)/2 - 0.7] = 1 - Φ[(lnx - {6 + (0.7)(2)})/2]. Thus Φ[Φ-1[S(x)] + 0.7] is the survival function of a LogNormal with µ = 6 + (.7)(2) and σ = 2. Define Wangʼs Transform to be: g(S(x)) = Φ[Φ-1[S(x)] + κ], κ ≥ 0. As shown in the above example, if X is LogNormal with parameters µ and σ, then the Wang Transform is also LogNormal but with parameters µ + κσ and σ. The Wang Transform risk measure is for κ ≥ 0: ∞
H(X) =
∫ Φ[ Φ− 1[S(x)] + κ ] dx .
40
0
Exercise: X is LogNormal with µ = 6 and σ = 2. Determine the Wang Transform risk measure for κ = 0.7. [Solution: The Wang Transform is LogNormal with µ = 6 + (0.7)(2) = 7.4 and σ = 2. The Wang Transform risk measure is the mean of that LogNormal: exp[7.4 + 22 /2] = 12,088.] Dual Power Transform: g(S(x)) = 1 - {1 - S(x)}κ = 1 - F(x)κ, κ ≥ 1. Exercise: X is uniform from 0 to 1. Determine the Dual Power Transform with κ = 3. [Solution: F(x) = x. 1 - F(x)3 = 1 - x3 , 0 ≤ x ≤ 1. Comment: This is the survival function of a Beta Distribution with a = 3, b = 1, and θ = 1. The corresponding density is: 3x2 , 0 ≤ x ≤ 1.] Prob[Maximum of a sample of size N ≤ x] = Prob[X ≤ x]N = F(x)N. The distribution function of the maximum of a sample of size N is F(x)N. Therefore, 1 - F(x)N is the survival function of the maximum of a sample of size N. 40
The integral is taken over the domain of X, which is most commonly 0 to ∞.
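Here is a minimal Python sketch (mine, not part of the guide) illustrating the derivation above: for a LogNormal with µ = 6 and σ = 2, the distorted survival function Φ[Φ^(-1)[S(x)] + κ] with κ = 0.7 matches the survival function of a LogNormal with µ shifted to µ + κσ, so the risk measure is that LogNormal's mean; the helper names and test points are my own.

```python
import math
from statistics import NormalDist

N = NormalDist()
mu, sigma, kappa = 6.0, 2.0, 0.7

def sf_lognormal(x, m, s):
    return 1.0 - N.cdf((math.log(x) - m) / s)

# Wang transform of the survival function: g(S(x)) = Phi[Phi^-1(S(x)) + kappa]
for x in [100.0, 1000.0, 10_000.0, 100_000.0]:
    transformed = N.cdf(N.inv_cdf(sf_lognormal(x, mu, sigma)) + kappa)
    shifted = sf_lognormal(x, mu + kappa * sigma, sigma)   # LogNormal with mu -> mu + kappa*sigma
    print(x, round(transformed, 6), round(shifted, 6))     # the two columns agree

# so the Wang Transform risk measure is the mean of the shifted LogNormal:
print(round(math.exp(mu + kappa * sigma + sigma**2 / 2)))  # 12088
```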
Therefore, if κ is an integer, the Dual Power Transform is the survival function of the maximum of a sample of size κ. The Dual Power Transform measure of risk is: ∞
H(X) =
∫1 -
κ
F(x) dx , κ ≥ 1.41
0
Thus if k = N, then the Dual Power Transform risk measure is the expected value of the maximum of a sample of size N. Exercise: X is uniform from 0 to 1. Determine the Dual Power Transform risk measure with κ = 3. 1
[Solution: F(x) = x.
1 - F(x)3
=1-
x3 ,
∫ 1 - x 3 dx = 1 - 1/4 = 3/4.
0 ≤ x ≤ 1.
0
Comment: The mean of a Beta Distribution with a = 3, b = 1, and θ = 1 is: (1)(3)/(3 + 1) = 3/4. The expected value of the maximum of a sample of size N from a uniform distribution on (0, ω) is: ω N/(N + 1). See “Mahlerʼs Guide to Statistics”, covering material on the syllabus of CAS3.] Value at Risk: For the VaRp risk measure: ⎧ 0 if 0 ≤ y ≤ 1 - p g(y) = ⎨ ⎩1 if 1 - p < y ≤ 1 Using the above distortion function, g[S(x)] is one when S(x) > 1 - p, and otherwise zero. S(x) > 1 - p when x < πp . Thus
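Here is a minimal Python sketch (mine, not part of the guide) of the Dual Power Transform risk measure H(X) = ∫ {1 - F(x)^κ} dx for this uniform example, together with a simulation of the expected maximum of a sample of size κ = 3; the names, step count, and seed are my own choices.

```python
import random

def dual_power_uniform(kappa, steps=1_000_000):
    # H(X) = integral over [0, 1] of 1 - F(x)^kappa, with F(x) = x for the uniform on (0, 1)
    h = 1.0 / steps
    return sum(1.0 - ((i + 0.5) * h) ** kappa for i in range(steps)) * h

print(round(dual_power_uniform(3), 4))      # 0.75

# for integer kappa this equals E[maximum of a sample of size kappa]:
random.seed(1)
sims = [max(random.random() for _ in range(3)) for _ in range(200_000)]
print(round(sum(sims) / len(sims), 3))      # about 0.75
```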
∞
πp
0
0
∫ g[S(x)]dx = ∫ 1 dx = πp.
Exercise: What is the distortion function for VaR.95? ⎧ 0 if 0 ≤ y ≤ 0.05 [Solution: g(y) = ⎨ .] ⎩1 if 0.05 < y ≤ 1
41
The integral is taken over the domain of X, which is most commonly 0 to ∞.
Tail-Value-at-Risk: For the TVaRp risk measure: ⎧y / (1-p) if 0 ≤ y ≤ 1 - p g(y) = ⎨ ⎩ 1 if 1 - p < y ≤ 1 Then, g[S(x)] is S(x)/(1 - p) when S(x) ≤ 1 - p, in other words when x ≥ πp , and otherwise 1. Thus ∞
πp
0
0
∞
∫ g[S(x)]dx = ∫ 1 dx + π∫ S(x)/ (1 -
p) dx = πp + e(πp ) = TVaRp .
p
Exercise: What is the distortion function for TVaR.95? ⎧20y if 0 ≤ y ≤ 0.05 [Solution: g(y) = ⎨ .] ⎩ 1 if 0.05 < y ≤ 1 Then, g[S(x)] is 20S(x) when S(x) ≤ 0.05, in other words when x ≥ π.9 5, and otherwise 1. π.95
∞
Thus
∫
0
g[S(x)]dx =
∫
1 dx +
0
= π.9 5 + e(π.9 5) = TVaR.95.
∞
∫
π.95
20 S(x) dx = π.9 5 + (layer from π.9 5 to ∞)/.05
Problems: 5.1 (1 point) What is the PH Transform risk measure with κ = 1.5 for an Exponential Distribution with mean of 100? A. 150 B. 160
C. 170
D. 180
E. 190
5.2 Which of the following distributions are not preserved under a PH Transform? A. Single Parameter Pareto B. Weibull C. Burr D. Gompertzʼs Law, F(x) = 1 - exp[-B[cx - 1)/ln(c)] E. LogNormal 5.3 (2 points) F(x) = 1 - {300/(300 + x)}5 . Determine the Proportional Hazard Transform risk measure with κ = 2. A. 200
B. 220
C. 240
D. 260
E. 280
5.4 (2 points) Losses follow a Uniform Distribution from 0 to 100. Determine the Proportional Hazard Transform risk measure with κ = 1.3. A. less than 55 B. at least 55 but less than 60 C. at least 60 but less than 65 D. at least 65 but less than 70 E. at least 70 5.5 (3 points) Aggregate losses follow a Single Parameter Pareto Distribution with α = 3 and θ = 10. However, a reinsurance contract caps the insurerʼs payments at 30. Determine the Proportional Hazard Transform risk measure of the insurerʼs payments with κ = 1.2. A. less than 16 B. at least 16 but less than 17 C. at least 17 but less than 18 D. at least 18 but less than 19 E. at least 19 5.6 (3 points) Losses follow a Weibull Distribution with θ = 1000 and τ = 0.4. Determine the Proportional Hazard Transform risk measure with κ = 1.8. Hint: Γ(1/2) = A. 14,000
B. 14,500
C. 15,000
D. 15,500
E. 16,000
π.
5.7 (2 points) Annual aggregate losses have the following distribution: Annual Aggregate Losses Probability 100 60% 500 30% 1000 10% Determine the Proportional Hazard Transform risk measure with κ = 2. A. less than 350 B. at least 350 but less than 400 C. at least 400 but less than 450 D. at least 450 but less than 500 E. at least 500 Use the following information for the next two questions: The premium will be set equal to the proportional hazard transform of the distribution of aggregate annual losses retained by the insurer or reinsurer, with κ = 1.2. The relative security loading, η, is such that: Premiums = (1 + η) (Expected Losses). 5.8 (6 points) Annual aggregate losses follow an Exponential Distribution with mean 100. Determine the relative security loading for the following situations: (a) The insurer retains all losses. (b) The insurer retains only the layer from 0 to 50. (c) A reinsurer retains only the layer from 50 to 100. (d) A reinsurer retains only the layer above 100. 5.9 (6 points) Annual aggregate losses follow a Pareto Distribution with α = 3 and θ = 200. Determine the relative security loading for the following situations: (a) The insurer retains all losses. (b) The insurer retains only the layer from 0 to 50. (c) A reinsurer retains only the layer from 50 to 100. (d) A reinsurer retains only the layer above 100. 5.10 (2 points) Losses are LogNormal with µ = 4 and σ = 0.6. Determine the Wang Transform risk measure for κ = 0.3. A. less than 75 B. at least 75 but less than 80 C. at least 80 but less than 85 D. at least 85 but less than 90 E. at least 90
5.11 (2 points) Losses are Normal with µ = 7000 and σ = 500. Determine the Wang Transform risk measure for κ = 0.8. A. 7000
B. 7100
C. 7200
D. 7300
E. 7400
5.12 (3 points) Annual aggregate losses have the following distribution: Annual Aggregate Losses Probability 100 60% 500 30% 1000 10% Determine the Wang Transform risk measure with κ = 0.5. A. 350
B. 400
C. 450
D. 500
E. 550
5.13 (4 points) Y+ is defined as 0 if Y ≤ 0, and Y if Y > 0. X is the value of a put option. X = (220 - P)+, where P follows a LogNormal Distribution with µ = 5.5 and σ = 0.2. Determine the Wang Transform risk measure with κ = 0.6. A. 17
B. 19
C. 21
D. 23
E. 25
5.14 (4 points) Y+ is defined as 0 if Y ≤ 0, and Y if Y > 0. X is the value of a call option. X = (P - 300)+, where P follows a LogNormal Distribution with µ = 5.5 and σ = 0.2. Determine the Wang Transform risk measure with κ = 0.4. A. 8
B. 9
C. 10
D. 11
E. 12
5.15 (2 points) What is the Dual Power Transform risk measure with κ = 3 for an Exponential Distribution with mean 100? A. 140 B. 160 C. 180
D. 200
E. 220
5.16 (2 points) Losses are uniform from 0 to 100. Determine the Dual Power Transform risk measure with κ = 1.4. A. 52
B. 54
C. 56
D. 58
E. 60
5.17 (2 points) Losses follow a Pareto Distribution with α = 5 and θ = 10. Determine the Dual Power Transform risk measure with κ = 2. A. 3.3
B. 3.5
C. 3.7
D. 3.9
E. 4.1
5.18 (2 points) Annual aggregate losses have the following distribution: Annual Aggregate Losses Probability 100 60% 500 30% 1000 10% Determine the Dual Power Transform risk measure with κ = 1.5. A. less than 350 B. at least 350 but less than 400 C. at least 400 but less than 450 D. at least 450 but less than 500 E. at least 500 5.19 (1 point) Graph the distortion functions corresponding to VaR.9. 5.20 (1 point) Graph the distortion function corresponding to TVaR.9 5.21 (1 point) Graph the distortion function corresponding to the PH Transform with κ = 2. 5.22 (1 point) Graph the distortion function corresponding to the Dual Power Transform with κ = 2. 5.23 (3 points) Graph the distortion function corresponding to the Wang Transform with κ = 0.3. 5.24 (3 points) A distortion risk measure has: ⎧10y if 0 ≤ y ≤ 0.1 g(y) = ⎨ ⎩ 1 if 0.1 < y ≤ 1 Determine the risk measure for a Pareto Distribution with α = 3 and θ = 200. A. less than 500 B. at least 500 but less than 600 C. at least 600 but less than 700 D. at least 700 but less than 800 E. at least 800 5.25 (1 point) Which of the following are distortion risk measures? 1. PH (Proportional Hazard) Transform 2. Dual Power Transform 3. Expected Value Premium Principle A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, and 3 E. Not A, B, C, or D
5.26 (2 points) Which of the following is the distortion function for the VaR risk measure for p = 90%? ⎧ 0 if 0 ≤ y ≤ 0.10 A. g(y) = ⎨ ⎩1 if 0.10 < y ≤ 1 ⎧ 0 if 0 ≤ y ≤ 0.90 B. g(y) = ⎨ ⎩1 if 0.90 < y ≤ 1 ⎧10y if 0 ≤ y ≤ 0.10 C. g(y) = ⎨ ⎩ 1 if 0.10 < y ≤ 1 ⎧10y if 0 ≤ y ≤ 0.90 D. g(y) = ⎨ ⎩ 1 if 0.90 < y ≤ 1 E. None of A, B, C, or D 5.27 (2 points) Which of the following is the distortion function for the TVaR risk measure for p = 90%? ⎧ 0 if 0 ≤ y ≤ 0.10 A. g(y) = ⎨ ⎩1 if 0.10 < y ≤ 1 ⎧ 0 if 0 ≤ y ≤ 0.90 B. g(y) = ⎨ ⎩1 if 0.90 < y ≤ 1 ⎧10y if 0 ≤ y ≤ 0.10 C. g(y) = ⎨ ⎩ 1 if 0.10 < y ≤ 1 ⎧10y if 0 ≤ y ≤ 0.90 D. g(y) = ⎨ ⎩ 1 if 0.90 < y ≤ 1 E. None of A, B, C, or D 5.28 (4, 5/07, Q.27) (2.5 points) You are given the distortion function: g(x) =
x.
Calculate the distortion risk measure for losses that follow the Pareto distribution with θ = 1000 and α = 4. (A) Less than 300 (B) At least 300, but less than 600 (C) At least 600, but less than 900 (D) At least 900, but less than 1200 (E) At least 1200
Solution to Problems:
5.1. A. S(x) = exp[-x/100]. S(x)^(1/κ) = exp[-x/100]^(1/1.5) = exp[-x/150]. Thus the PH transform is also an Exponential, but with mean 150.
5.2. E. For the Single Parameter Pareto, S(x) = (θ/x)^α. S(x)^(1/κ) = (θ/x)^(α/κ). Another Single Parameter Pareto, but with α replaced by α/κ.
For the Weibull, S(x) = exp[-(x/θ)^τ]. S(x)^(1/κ) = exp[-(x/θ)^τ]^(1/κ) = exp[-(x/θ)^τ / κ] = exp[-{x/(θ κ^(1/τ))}^τ]. Thus the PH transform is also a Weibull distribution, but with θ replaced by θ κ^(1/τ).
For the Burr, S(x) = {θ/(θ + x^γ)}^α. S(x)^(1/κ) = {θ/(θ + x^γ)}^(α/κ). Thus the PH transform is also a Burr Distribution, but with α replaced by α/κ.
For Gompertzʼs Law, S(x) = exp[-B(c^x - 1)/ln(c)]. S(x)^(1/κ) = exp[-(B/κ)(c^x - 1)/ln(c)]. Thus the PH transform is also Gompertzʼs Law, but with B replaced by B/κ.
For the LogNormal, S(x) = 1 - Φ[(ln x - µ)/σ]. S(x)^(1/κ) is not of the same form.
5.3. A. S(x) = {300/(300 + x)}^5. S(x)^(1/2) = {300/(300 + x)}^2.5, a Pareto Distribution with α = 2.5 and θ = 300. The risk measure is the mean of this second Pareto Distribution, 300/(2.5 - 1) = 200.
5.4. B. S(x) = 1 - x/100, x ≤ 100. S(x)^(1/1.3) = (1 - x/100)^(1/1.3).
∫[0 to 100] (1 - x/100)^(1/1.3) dx = -100(1 - x/100)^(1 + 1/1.3)/(1 + 1/1.3), evaluated from x = 0 to x = 100, = 100/(1 + 1/1.3) = 56.52.
5.5. A. S(x) = (10/x)^3, x < 30. S(x)^(1/1.2) = (10/x)^(3/1.2) = (10/x)^2.5, x < 30, a Single Parameter Pareto Distribution with α = 3/1.2 = 2.5 and θ = 10, capped at 30. The risk measure is the limited expected value at 30 of this transformed Single Parameter Pareto. E[X ∧ x] = θ{α - (θ/x)^(α-1)}/(α - 1). E[X ∧ 30] = 10{2.5 - (10/30)^1.5}/1.5 = 15.38.
5.6. B. For the Weibull, S(x) = exp[-(x/θ)^τ]. S(x)^(1/κ) = exp[-(x/θ)^τ / κ] = exp[-{x/(θ κ^(1/τ))}^τ]. Thus the PH transform is also a Weibull distribution, but with θ replaced by θ κ^(1/τ).
Therefore, the risk measure is the mean of a Weibull with θ = (1000)(1.8^(1/0.4)) = 4347 and τ = 0.4: 4347 Γ(1 + 1/0.4) = 4347 Γ(3.5) = (4347)(2.5)(1.5)(0.5)Γ(1/2) = 8151√π = 14,447.
5.7. E. For the original distribution: S(x) = 1 for x < 100, 0.4 for 100 ≤ x < 500, 0.1 for 500 ≤ x < 1000, 0 for x ≥ 1000.
For the PH Transform, S(x) = 1 for x < 100, 0.4^(1/2) = 0.6325 for 100 ≤ x < 500, 0.1^(1/2) = 0.3162 for 500 ≤ x < 1000, 0 for x ≥ 1000.
The integral of the Survival Function of the PH Transform is: (100)(1) + (500 - 100)(0.6325) + (1000 - 500)(0.3162) = 511.
Comment: The mean of the original distribution is: (100)(1) + (500 - 100)(0.4) + (1000 - 500)(0.1) = 310 = (60%)(100) + (30%)(500) + (10%)(1000).
5.8. For an Exponential with mean θ, S(x) = e^(-x/θ). S(x)^(1/κ) = e^(-x/(θκ)).
∫[d to u] S(x) dx = θ(e^(-d/θ) - e^(-u/θ)).
Therefore for the layer from d to u, E[X] = θ(e^(-d/θ) - e^(-u/θ)), and H(X) = κθ{e^(-d/(κθ)) - e^(-u/(κθ))}.
H(X)/E[X] = κ{e^(-d/(κθ)) - e^(-u/(κθ))}/(e^(-d/θ) - e^(-u/θ)) = 1.2(e^(-d/120) - e^(-u/120))/(e^(-d/100) - e^(-u/100)).
(a) For d = 0 and u = ∞, H(X)/E[X] = 1.2. η = 20.0%.
(b) For d = 0 and u = 50, H(X)/E[X] = 1.2(1 - e^(-50/120))/(1 - e^(-50/100)) = 1.039. η = 3.9%.
(c) For d = 50 and u = 100, H(X)/E[X] = 1.2(e^(-50/120) - e^(-100/120))/(e^(-50/100) - e^(-100/100)) = 1.130. η = 13.0%.
(d) For d = 100 and u = ∞, H(X)/E[X] = 1.2 e^(-100/120)/e^(-100/100) = 1.418. η = 41.8%.
Comment: The lowest layer gets the smallest relative security loading, while the highest layer gets the highest relative security loading.
5.9. For the Pareto, S(x) = θ^α/(x + θ)^α. S(x)^(1/κ) = θ^(α/κ)/(x + θ)^(α/κ).
∫[d to u] S(x) dx = θ^α {1/(θ + d)^(α-1) - 1/(θ + u)^(α-1)}/(α - 1).
Therefore for the layer from d to u, E[X] = θ^α {1/(θ + d)^(α-1) - 1/(θ + u)^(α-1)}/(α - 1) = 200^3 {1/(200 + d)^2 - 1/(200 + u)^2}/2, and H(X) = θ^(α/κ) {1/(θ + d)^(α/κ - 1) - 1/(θ + u)^(α/κ - 1)}/(α/κ - 1) = 200^2.5 {1/(200 + d)^1.5 - 1/(200 + u)^1.5}/1.5.
H(X)/E[X] = (4/3) 200^-0.5 {1/(200 + d)^1.5 - 1/(200 + u)^1.5}/{1/(200 + d)^2 - 1/(200 + u)^2}.
(a) For d = 0 and u = ∞, H(X)/E[X] = 4/3. η = 1/3 = 33.3%.
(b) For d = 0 and u = 50, H(X)/E[X] = (4/3) 200^-0.5 {1/200^1.5 - 1/250^1.5}/{1/200^2 - 1/250^2} = 1.054. η = 5.4%.
(c) For d = 50 and u = 100, H(X)/E[X] = (4/3) 200^-0.5 {1/250^1.5 - 1/300^1.5}/{1/250^2 - 1/300^2} = 1.167. η = 16.7%.
(d) For d = 100 and u = ∞, H(X)/E[X] = (4/3) 200^-0.5 {1/300^1.5}/{1/300^2} = 1.633. η = 63.3%.
5.10. B. The Wang Transform is LogNormal with µ = 4 + (0.3)(0.6) = 4.18 and σ = 0.6. The Wang Transform risk measure is the mean of that LogNormal: exp[4.18 + 0.6^2/2] = 78.26.
5.11. E. Φ^-1[S(x)] = Φ^-1[1 - Φ[(x - µ)/σ]] = -Φ^-1[Φ[(x - µ)/σ]] = -(x - µ)/σ.
Φ[Φ^-1[S(x)] + κ] = Φ[κ - (x - µ)/σ] = 1 - Φ[(x - µ)/σ - κ] = 1 - Φ[(x - {µ + κσ})/σ].
This is the survival function of a Normal with µʼ = µ + κσ and the same σ.
The Wang Transform risk measure is the mean of that Normal: µ + κσ. In this case, µ + κσ = 7000 + (0.8)(500) = 7400.
Comment: As applied to the Normal Distribution, the Wang Transform is equivalent to the Standard Deviation Premium Principle, with α = κ. The Wang Transform Risk Measure for κ > 0 is greater than the mean, eliminating choice A.
5.12. C. For the original distribution: S(x) = 1 for x < 100, 0.4 for 100 ≤ x < 500, 0.1 for 500 ≤ x < 1000, 0 for x ≥ 1000.
Φ^-1[S(x)] is: ∞ for x < 100, Φ^-1[0.4] = -0.253 for 100 ≤ x < 500, Φ^-1[0.1] = -1.282 for 500 ≤ x < 1000, -∞ for x ≥ 1000.
Φ^-1[S(x)] + κ is: ∞ for x < 100, 0.247 for 100 ≤ x < 500, -0.782 for 500 ≤ x < 1000, -∞ for x ≥ 1000.
Φ[Φ^-1[S(x)] + κ] is: 1 for x < 100, Φ[0.25] = 0.5987 for 100 ≤ x < 500, Φ[-0.78] = 0.2177 for 500 ≤ x < 1000, 0 for x ≥ 1000.
The integral of the Survival Function of the Wang Transform is: (100)(1) + (500 - 100)(0.5987) + (1000 - 500)(0.2177) = 448.
5.13. A. SX(x) = Prob[X > x] = Prob[220 - p > x] = Prob[p < 220 - x] = FP(220 - x) = Φ[{ln(220 - x) - 5.5}/0.2], for x ≤ 220.
Φ^-1[S(x)] is: {ln(220 - x) - 5.5}/0.2, for x ≤ 220. Φ^-1[S(x)] + 0.6 = {ln(220 - x) - 5.38}/0.2, for x ≤ 220.
Φ[Φ^-1[S(x)] + 0.6] = Φ[{ln(220 - x) - 5.38}/0.2], for x ≤ 220.
Let p = 220 - x, then Φ[Φ^-1[S(x)] + 0.6] = Φ[{ln(p) - 5.38}/0.2], for p ≤ 220.
The integral of Φ[Φ^-1[S(x)] + 0.6] is the integral of a LogNormal Distribution Function with µ = 5.38 and σ = 0.2, from 0 to 220.
∫[0 to 220] F(x) dx = ∫[0 to 220] {1 - S(x)} dx = 220 - ∫[0 to 220] S(x) dx = 220 - E[X ∧ 220].
For the LogNormal with µ = 5.38 and σ = 0.2, E[X ∧ 220] = exp[5.38 + 0.2^2/2] Φ[(ln 220 - 5.38 - 0.2^2)/0.2] + 220 {1 - Φ[(ln 220 - 5.38)/0.2]} = (221.406)Φ[-0.13] + 220 {1 - Φ[0.07]} = (221.406)(0.4483) + (220)(0.4721) = 203.1.
The integral of Φ[Φ^-1[S(x)] + 0.6] is: 220 - 203.1 = 16.9.
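As a numerical cross-check on results such as solution 5.12, the Wang Transform risk measure of a discrete distribution can be evaluated directly. The following short sketch is my own illustration rather than part of the original solution; it assumes Python 3.8 or later (statistics.NormalDist), and uses κ = 0.5, which is consistent with the 0.247 and -0.782 values appearing in solution 5.12.

    from statistics import NormalDist

    phi, phi_inv = NormalDist().cdf, NormalDist().inv_cdf
    kappa = 0.5   # assumed value of the Wang Transform parameter, consistent with solution 5.12

    # Piecewise-constant survival function: S(x) = 1 on [0, 100), 0.4 on [100, 500), 0.1 on [500, 1000).
    # The risk measure is the integral of Phi[Phi^-1(S(x)) + kappa], taken piece by piece.
    pieces = [(100, 1.0), (500 - 100, 0.4), (1000 - 500, 0.1)]
    H = sum(width * (1.0 if s >= 1.0 else phi(phi_inv(s) + kappa)) for width, s in pieces)
    print(round(H))   # about 448, matching solution 5.12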
5.14. D. SX(x) = Prob[X > x] = Prob[p - 300 > x] = Prob[p > 300 + x] = SP(300 + x) = 1 - Φ[{ln(300 + x) - 5.5}/0.2] = Φ[{5.5 - ln(300 + x)}/0.2], for x > 0.
Φ^-1[S(x)] is: {5.5 - ln(300 + x)}/0.2, for x > 0. Φ^-1[S(x)] + 0.4 = {5.58 - ln(300 + x)}/0.2, for x > 0.
Φ[Φ^-1[S(x)] + 0.4] = Φ[{5.58 - ln(300 + x)}/0.2] = 1 - Φ[{ln(300 + x) - 5.58}/0.2], for x > 0.
Let p = 300 + x, then Φ[Φ^-1[S(x)] + 0.4] = 1 - Φ[{ln(p) - 5.58}/0.2], for p > 300.
The integral of Φ[Φ^-1[S(x)] + 0.4] is the integral of a LogNormal Survival Function with µ = 5.58 and σ = 0.2, from 300 to ∞, which is for that LogNormal: E[X] - E[X ∧ 300].
For the LogNormal with µ = 5.58 and σ = 0.2, E[X] = exp[5.58 + 0.2^2/2] = 270.426.
E[X ∧ 300] = exp[5.58 + 0.2^2/2] Φ[(ln 300 - 5.58 - 0.2^2)/0.2] + 300 {1 - Φ[(ln 300 - 5.58)/0.2]} = (270.426)Φ[0.42] + 300 {1 - Φ[0.62]} = (270.426)(0.6628) + (300)(0.2676) = 259.5.
The integral of Φ[Φ^-1[S(x)] + 0.4] is: 270.426 - 259.5 = 10.9.
5.15. C. F(x) = 1 - exp[-x/100]. F(x)^3 = 1 - 3exp[-x/100] + 3exp[-2x/100] - exp[-3x/100].
∫[0 to ∞] {1 - F(x)^3} dx = 3(100) - (3)(100/2) + 100/3 = 183.3.
Comment: The expected value of the maximum of a sample of size N from an Exponential is: θ Σ[i = 1 to N] 1/i. See “Mahlerʼs Guide to Statistics”, covering material on the syllabus of CAS Exam 3.
5.16. D. F(x) = x/100. F(x)^1.4 = x^1.4/100^1.4.
∫[0 to 100] {1 - F(x)^1.4} dx = 100 - 100/2.4 = 58.33.
Comment: For a uniform distribution on (0, ω), the Dual Power Transform risk measure is: ωκ/(κ + 1).
5.17. D. F(x) = 1 - {10/(10 + x)}^5. F(x)^2 = 1 - 2{10/(10 + x)}^5 + {10/(10 + x)}^10.
∫[0 to ∞] {1 - F(x)^2} dx = 10{2/4 - 1/9} = 3.89.
5.18. B. For the original distribution: F(x) = 0 for x < 100, 0.6 for 100 ≤ x < 500, 0.9 for 500 ≤ x < 1000, 1 for x ≥ 1000.
1 - F(x)^κ is: 1 for x < 100, 1 - 0.6^1.5 = 0.5352 for 100 ≤ x < 500, 1 - 0.9^1.5 = 0.1462 for 500 ≤ x < 1000, 0 for x ≥ 1000.
The integral of the Survival Function of the Dual Power Transform is: (100)(1) + (500 - 100)(0.5352) + (1000 - 500)(0.1462) = 387.
5.19. 90%-VaR. g(y) = 0 for 0 ≤ y ≤ 10%, and 1 for 10% < y ≤ 1.
[Graph of g(y) versus y on (0, 1): a step function equal to 0 up to y = 0.1 and equal to 1 thereafter.]
5.20. TVaR90%. g(y) = y/0.1 = 10y for 0 ≤ y ≤ 10%, and 1 for 10% < y ≤ 1.
[Graph of g(y) versus y on (0, 1): rises linearly from 0 to 1 as y goes from 0 to 0.1, then stays at 1.]
5.21. PH Transform with κ = 2. g(y) = y^(1/2) = √y.
[Graph of g(y) = √y versus y on (0, 1): concave, rising from 0 to 1.]
5.22. Dual Power Transform with κ = 2. g(y) = 1 - (1 - y)^2.
[Graph of g(y) = 1 - (1 - y)^2 versus y on (0, 1): concave, rising from 0 to 1.]
5.23. Wang Transform with κ = 0.3. g(y) = Φ[Φ^-1[y] + 0.3].
[Graph of g(y) versus y on (0, 1): concave, rising from 0 to 1 and lying above the 45-degree line.]
For example, without rounding, g(0.05) = Φ[Φ^-1[0.05] + 0.3] = Φ[-1.645 + 0.3] = Φ[-1.345] = 0.0893. g(0.7) = Φ[Φ^-1[0.7] + 0.3] = Φ[0.524 + 0.3] = Φ[0.824] = 0.795.
5.24. A. This is the distortion function for TVaR.90.
H(X) = ∫[0 to ∞] g[S(x)] dx = ∫[0 to Q.90] dx + 10 ∫[Q.90 to ∞] S(x) dx = Q.90 + (E[X] - E[X ∧ Q.90])/(1 - 0.9) = TVaR.90.
E[X] = 200/(3 - 1) = 100. 0.9 = F(x) = 1 - {200/(200 + x)}^3. ⇒ Q.90 = 230.89.
E[X ∧ 230.9] = {200/(3 - 1)}{1 - (200/(200 + 230.9))^(3-1)} = 78.46.
TVaR.90 = Q.90 + (E[X] - E[X ∧ Q.90])/(1 - 0.9) = 230.89 + (100 - 78.46)/0.1 = 446.
Comment: y = S(x) in the definition of the distortion function g(y). 0 ≤ y ≤ 0.1 ⇔ 0 ≤ S(x) ≤ 0.1 ⇔ 0.9 ≤ F(x) ⇔ Q.90 ≤ x. When S(x) is small, x is large, while when S(x) is large, x is small. 0.1 < y ⇔ 0.1 < S(x) ⇔ F(x) < 0.9 ⇔ x < Q.90.
5.25. A. The Expected Value Premium Principle is not a distortion risk measure.
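The distortion-risk-measure integral in solution 5.24 can also be checked numerically. This is an added sketch, not part of the original solution; it assumes Python with NumPy and SciPy available.

    import numpy as np
    from scipy.integrate import quad

    alpha, theta = 3.0, 200.0                        # Pareto parameters from problem 5.24
    S = lambda x: (theta / (theta + x)) ** alpha     # Pareto survival function
    g = lambda y: min(10.0 * y, 1.0)                 # distortion function for 90% TVaR

    # H(X) = integral over (0, infinity) of g(S(x)) dx; split at the 90th percentile,
    # where g(S(x)) switches from 1 to 10 S(x).
    q90 = theta * ((1 - 0.90) ** (-1 / alpha) - 1)   # VaR at 90%, about 230.89
    H = quad(lambda x: g(S(x)), 0, q90)[0] + quad(lambda x: g(S(x)), q90, np.inf)[0]
    print(round(q90, 2), round(H, 1))                # 230.89 and about 446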
5.26. A. For the α-VaR risk measure: g(y) = 0 if 0 ≤ y ≤ 1 - α, and g(y) = 1 if 1 - α < y ≤ 1.
For 90%-VaR: g(y) = 0 if 0 ≤ y ≤ 0.10, and g(y) = 1 if 0.10 < y ≤ 1.
Comment: g(S(x)) = 0 for S(x) ≤ 0.10, and g(S(x)) = 1 for S(x) > 0.10. In other words, g(S(x)) = 0 for x ≥ Q.90, and g(S(x)) = 1 for x < Q.90.
Therefore, ∫[0 to ∞] g(S(x)) dx = ∫[0 to Q.90] dx = Q.90.
5.27. C. For the α-TVaR risk measure: g(y) = y/(1 - α) if 0 ≤ y ≤ 1 - α, and g(y) = 1 if 1 - α < y ≤ 1.
For 90%-TVaR: g(y) = 10y if 0 ≤ y ≤ 0.10, and g(y) = 1 if 0.10 < y ≤ 1.
Comment: The distortion function for α-TVaR involves 1/(1 - α). g(S(x)) = 10 S(x) for S(x) ≤ 0.10, and g(S(x)) = 1 for S(x) > 0.10. In other words, g(S(x)) = 10 S(x) for x ≥ Q.90, and g(S(x)) = 1 for x < Q.90.
Therefore, ∫[0 to ∞] g(S(x)) dx = ∫[0 to Q.90] dx + ∫[Q.90 to ∞] 10 S(x) dx = Q.90 + 10 E[(X - Q.90)+] = Q.90 + E[(X - Q.90)+]/0.1 = Q.90 + e(Q.90) = 90%-TVaR.
5.28. D. For this Pareto, S(x) = {1000/(1000 + x)}^4. g(S(x)) = {1000/(1000 + x)}^2, the Survival Function of another Pareto with θ = 1000 and α = 2. H(X) is the integral of g(S(x)), the mean of this second Pareto: 1000/(2 - 1) = 1000.
Alternately, ∫[0 to ∞] g(S(x)) dx = ∫[0 to ∞] 1000^2/(1000 + x)^2 dx = -1,000,000/(1000 + x), evaluated from 0 to ∞, = 1000.
Comment: PH transform with κ = 2. For a Pareto Distribution, the PH Transform risk measure is: θ/(α/κ - 1), α/κ > 1.
Section 6, Coherence42
There are various desirable properties for a risk measure to satisfy. A risk measure is coherent if it has the following four properties:
1. Translation Invariance
2. Positive Homogeneity
3. Subadditivity
4. Monotonicity
Translation Invariance: ρ(X + c) = ρ(X) + c, for any constant c.
In other words, a risk measure is translation invariant if adding a constant to the loss variable adds that same constant to the risk measure. Letting X = 0, Translation Invariance ⇒ ρ(c) = c. In other words, if the outcome is certain, the risk measure is equal to the loss. For example, if the loss is always 1000, then the risk measure is 1000.
Positive Homogeneity: ρ(cX) = c ρ(X), for any constant c > 0.
In other words, a risk measure is positive homogeneous if multiplying the loss variable by a positive constant multiplies the risk measure by the same constant.
Positive Homogeneity ⇒ If the loss variable is converted to a different currency at a fixed rate of exchange, then so is the risk measure.
Positive Homogeneity ⇒ If the exposure to loss is doubled, then so is the risk measure.
Subadditivity: ρ(X + Y) ≤ ρ(X) + ρ(Y).
42
See Section 3.5.2 of Loss Models, in particular Definition 3.11. See also “Setting Capital Requirements With Coherent Measures of Risk”, by Glenn G. Meyers, August 2002 and November 2002 Actuarial Reviews.
In other words, a risk measure satisfies subadditivity, if the merging of two portfolios can not increase the total risk compared to the sum of their individual risks, but may decrease the total risk. It should not be possible to reduce the appropriate premium or the required surplus by splitting a portfolio into its constituent parts.43 Exercise: Determine whether VaR90% satisfies subadditivity. [Solution: For example, take the following joint distribution for X and Y: X = 0 and Y = 0, with probability 88% X = 0 and Y = 1, with probability 4% X = 1 and Y = 0, with probability 4% X = 1 and Y = 1, with probability 4% Then for X, Prob[X = 0] = 92%, Prob[X = 1] = 8%, π.90 = 0. For Y, Prob[Y = 0] = 92%, Prob[Y = 1] = 8%, π.90 = 0. For X + Y, Prob[X + Y = 0] = 88%, Prob[X + Y = 1] = 8%, Prob[X + Y = 2] = 4%, π.90 = 1. ρ(X + Y) = 1 > 0 = 0 + 0 = ρ(X) + ρ(Y). Thus VaR90% does not have the subadditivity property. Comment: See Example 3.13 in Loss Models. X and Y are not independent. In order to have the subadditivity property, one must have that H(X + Y) ≤ H(X) + H(Y), for all possible distributions of losses X and Y.] Since it does not satisfy subadditivity, Value at Risk (VaR) is not coherent.44 Monotonicity: If Prob[X ≤ Y] = 1, then ρ(X) ≤ ρ(Y).45 In others words, a risk measure satisfies monotonicity, if X is never greater than Y, then the risk associated with X is not greater than the risk associated with Y. For example, let X = (180 - P)+ and Y = (200 - P)+.46 Then X ≤ Y. Therefore, for any risk measure that satisfies monotonicity, ρ(X) ≤ ρ(Y). 43
Mergers do not increase risk. Diversification does not increase risk. Nevertheless, VaR is still commonly used, particularly in banking. In most practical applications, VaR is subadditive. Also in some circumstances it may be valuable to disaggregate risks. 45 Technically, we are allowing X > Y on a set of probability zero, something of interest to mathematicians but not most actuaries. 46 X and Y are two put options on the price of the same stock P. with different strike prices. 44
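The counterexample in the exercise above is easy to verify by brute force. The following sketch is my addition rather than part of the text; it assumes Python, tabulates the three distributions, and reads off their 90th percentiles.

    # Joint distribution from the exercise: X and Y each equal 1 with probability 8%,
    # and (X, Y) = (1, 1) with probability 4%.
    outcomes = [(0, 0, 0.88), (0, 1, 0.04), (1, 0, 0.04), (1, 1, 0.04)]

    def var_p(dist, p):
        """Smallest value whose cumulative probability is at least p."""
        total = 0.0
        for value, prob in sorted(dist.items()):
            total += prob
            if total >= p:
                return value

    def marginal(index):
        d = {}
        for x, y, prob in outcomes:
            key = (x, y)[index]
            d[key] = d.get(key, 0.0) + prob
        return d

    dist_x, dist_y = marginal(0), marginal(1)
    dist_sum = {}
    for x, y, prob in outcomes:
        dist_sum[x + y] = dist_sum.get(x + y, 0.0) + prob

    print(var_p(dist_x, 0.90), var_p(dist_y, 0.90), var_p(dist_sum, 0.90))
    # 0 0 1: VaR(X + Y) = 1 > 0 + 0 = VaR(X) + VaR(Y), so VaR is not subadditive here.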
Risk Measures: The Tail Value at Risk is a coherent measure of risk. The Standard Deviation Premium Principle and the Value at Risk are not coherent measures of risk.

Risk Measure                           Translation   Positive      Subadditivity   Monotonicity   Coherence
                                       Invariance    Homogeneity
Expected Value Premium Principle       No            Yes           Yes             Yes            No
Standard Deviation Premium Principle   Yes           Yes           Yes             No             No
Variance Premium Principle             Yes           No            No              No             No
Value at Risk                          Yes           Yes           No              Yes            No
Tail Value at Risk                     Yes           Yes           Yes             Yes            Yes
A measure of risk is coherent if and only if it can be expressed as the supremum of the expected losses taken over a class of probability measures on a finite set of scenarios.47 A distortion measure is coherent if and only if the distortion function is concave.48 From this it follows that the PH Transform Risk Measure, the Dual Power Transform, and the Wang Transform are each coherent. It can be shown that for a coherent risk measure: E[X] ≤ ρ(X) ≤ Max[X].49
47
“Coherent Measure of Risks” by Philippe Artzner, Freddy Delbaen, Jean-Marc Eber, and David Heath, Mathematical Finance 9 (1999), No. 3. 48 “Distortion Risk Measures: Coherence and Stochastic Dominance” by Julia L. Wirch and Mary R. Hardy, presented at the 6th International Congress on Insurance: Mathematics and Economics. 49 TVaR0 = E[X] and TVaR1 = Max[X].
Problems: 6.1 (3 points) List and briefly define the properties that make a risk measure, ρ(X), coherent. 6.2 (3 points) Briefly discuss whether the Expected Value Premium Principle is a coherent risk measure. Which of the properties does it satisfy? 6.3 (3 points) Briefly discuss whether the Standard Deviation Premium Principle is a coherent risk measure. Which of the properties does it satisfy? 6.4 (3 points) Briefly discuss whether the Variance Premium Principle is a coherent risk measure. Which of the properties does it satisfy? 6.5 (3 points) Briefly discuss whether the Value at Risk is a coherent risk measure. Which of the properties does it satisfy? 6.6 (2 points) Briefly discuss whether the Tail Value at Risk is a coherent risk measure. Which of the properties does it satisfy? 6.7 (3 points) Briefly discuss whether ρ(X) = E[X] is a coherent risk measure. Which of the properties does it satisfy? 6.8 (3 points) Define ρ(X) = Maximum[X], for loss distributions for which Maximum[X] < ∞. Briefly discuss whether this is a coherent risk measure. Which of the properties does it satisfy? 6.9 (3 points) The Exponential Premium Principle has ρ(X) = ln[E[eαX]]/α, α > 0. Briefly discuss whether it is a coherent risk measure. Which of the properties does it satisfy?
Solution to Problems:
6.1. 1. Translation Invariance: ρ(X + c) = ρ(X) + c. 2. Positive Homogeneity: ρ(cX) = c ρ(X), for any constant c > 0. 3. Subadditivity: ρ(X + Y) ≤ ρ(X) + ρ(Y). 4. Monotonicity: If Prob[X ≤ Y] = 1, then ρ(X) ≤ ρ(Y).
6.2. ρ(X) = (1 + k)E[X], k > 0.
1. ρ(X + c) = (1 + k)E[X + c] = (1 + k)E[X] + (1 + k)c = ρ(X) + (1 + k)c ≠ ρ(X) + c. Translation Invariance does not hold.
2. ρ(cX) = (1 + k)E[cX] = c(1 + k)E[X] = c ρ(X). Positive Homogeneity does hold.
3. ρ(X + Y) = (1 + k)E[X + Y] = (1 + k)E[X] + (1 + k)E[Y] = ρ(X) + ρ(Y) ≤ ρ(X) + ρ(Y). Subadditivity does hold.
4. If Prob[X ≤ Y] = 1, then ρ(X) = (1 + k)E[X] ≤ (1 + k)E[Y] = ρ(Y). Monotonicity does hold.
The Expected Value Premium Principle is not coherent since #1 does not hold.
6.3. ρ(X) = E[X] + k StdDev[X], k > 0.
1. ρ(X + c) = E[X + c] + k StdDev[X + c] = E[X] + c + k StdDev[X] = ρ(X) + c. Translation Invariance does hold.
2. ρ(cX) = E[cX] + k StdDev[cX] = c E[X] + k c StdDev[X] = c ρ(X). Positive Homogeneity does hold.
3. ρ(X + Y) = E[X + Y] + k StdDev[X + Y] = E[X] + E[Y] + k StdDev[X + Y].
Now Var[X + Y] = σX^2 + σY^2 + 2σX σY Corr[X, Y] ≤ σX^2 + σY^2 + 2σX σY, since Corr[X, Y] ≤ 1. ⇒ Var[X + Y] ≤ (σX + σY)^2. ⇒ StdDev[X + Y] ≤ σX + σY.
⇒ ρ(X + Y) ≤ E[X] + E[Y] + k StdDev[X] + k StdDev[Y] = ρ(X) + ρ(Y). Subadditivity does hold.
4. Let X be uniform from 0 to 1. Let Y be constant at 2. Let k = 10. Then ρ(X) = 0.5 + 10√(1/12) = 3.39. ρ(Y) = 2 + (10)(0) = 2. Prob[X ≤ Y] = 1, yet ρ(X) > ρ(Y). Monotonicity does not hold.
The Standard Deviation Premium Principle is not coherent since #4 does not hold.
6.4. ρ(X) = E[X] + k Var[X], k > 0.
1. ρ(X + c) = E[X + c] + k Var[X + c] = E[X] + c + k Var[X] = ρ(X) + c. Translation Invariance does hold.
2. ρ(cX) = E[cX] + k Var[cX] = c E[X] + k c^2 Var[X] ≠ c ρ(X). Positive Homogeneity does not hold.
3. ρ(X + Y) = E[X + Y] + k Var[X + Y] = E[X] + E[Y] + k Var[X + Y]. Now Var[X + Y] = σX^2 + σY^2 + 2σX σY Corr[X, Y]. If Corr[X, Y] > 0, then Var[X + Y] > Var[X] + Var[Y], and ρ(X + Y) > ρ(X) + ρ(Y). Subadditivity does not hold.
4. Let X be uniform from 0 to 1. Let Y be constant at 1. Let k = 10. Then ρ(X) = 0.5 + 10(1/12) = 1.333. ρ(Y) = 1 + (10)(0) = 1. Prob[X ≤ Y] = 1, yet ρ(X) > ρ(Y). Monotonicity does not hold.
The Variance Premium Principle is not coherent.
6.5. ρ(X) = πp, the pth percentile.
1. Adding a constant to a variable adds a constant to each percentile. ρ(X + c) = ρ(X) + c. Translation Invariance does hold.
2. Multiplying a variable by a positive constant multiplies each percentile by that constant. ρ(cX) = c ρ(X). Positive Homogeneity does hold.
3. For example, take the following joint distribution for X and Y:
X = 0 and Y = 0, with probability 88%
X = 0 and Y = 1, with probability 4%
X = 1 and Y = 0, with probability 4%
X = 1 and Y = 1, with probability 4%
Then for X, Prob[X = 0] = 92%, Prob[X = 1] = 8%, π.90 = 0. For Y, Prob[Y = 0] = 92%, Prob[Y = 1] = 8%, π.90 = 0.
For X + Y, Prob[X + Y = 0] = 88%, Prob[X + Y = 1] = 8%, Prob[X + Y = 2] = 4%, π.90 = 1.
Let p = 90%. ρ(X + Y) = 1 > 0 = 0 + 0 = ρ(X) + ρ(Y). Subadditivity does not hold.
4. If Prob[X ≤ Y] = 1, then the pth percentile of X is ≤ the pth percentile of Y. ρ(X) ≤ ρ(Y). Monotonicity does hold.
Value at Risk is not coherent since #3 does not hold.
6.6. ρ(X) = E[X | X > πp ]. 1. Adding a constant to a variable adds a constant to each percentile. ρ(X + c) = E[X + c | X + c > πp + c ] = E[X + c | X > πp ] = E[X | X > πp ] + c = ρ(X) + c. Translation Invariance does hold 2. Multiplying a variable by a constant multiplies each quantile by that constant. ρ(cX) = E[cX | cX > cπp ] = E[cX | X > πp ] = c E[X | X > πp ] = c ρ(X). Positive Homogeneity does hold. 3. E[X | worst p of the outcomes for X] ≥ E[X | worst p of the outcomes for X + Y]. ρ(X + Y) = E[X + Y | worst p of the outcomes for X + Y] = E[X | worst p of the outcomes for X + Y] + E[Y | worst p of the outcomes for X + Y] ≤ E[X | worst p of the outcomes for X] + E[Y | worst p of the outcomes for Y] = ρ(X) + ρ(Y). Subadditivity does hold. 4. If Prob[X ≤ Y] = 1, then the pth percentile of X is ≤ the pth percentile of Y. Therefore ρ(X) = E[X | worst p of the outcomes for X] ≤ E[Y | worst p of the outcomes for X] ≤ E[Y | worst p of the outcomes for Y] = ρ(Y). Monotonicity does hold. The Tail Value at Risk is coherent. 6.7. ρ(X) = E[X]. 1. ρ(X + c) = E[X + c] = E[X] + c = ρ(X) + c. Translation Invariance does hold 2. ρ(cX) = E[cX] = c E[X] = c ρ(X). Positive Homogeneity does hold. 3. ρ(X + Y) = E[X + Y] = E[X] + E[Y] = ρ(X) + ρ(Y) ≤ ρ(X) + ρ(Y). Subadditivity does hold. 4. If Prob[X ≤ Y] = 1, then ρ(X) = E[X] ≤ E[Y] = ρ(Y). Monotonicity does hold. This risk measure is coherent. 6.8. ρ(X) = Max[X]. 1. ρ(X + c) = Max[X + c] = Max[X] + c = ρ(X) + c. Translation Invariance does hold 2. ρ(cX) = Max[cX] = c Max[X] = c ρ(X). Positive Homogeneity does hold. 3. ρ(X + Y) = Max[X + Y] ≤ Max[X] + Max[Y] ≤ ρ(X) + ρ(Y). Subadditivity does hold. 4. If Prob[X ≤ Y] = 1, then ρ(X) = Max[X] ≤ Max[Y] = ρ(Y). Monotonicity does hold. This risk measure is coherent.
6.9. ρ(X) = ln[E[e^(αX)]]/α.
1. ρ(X + c) = ln[E[e^(α(X+c))]]/α = ln[e^(αc) E[e^(αX)]]/α = {αc + ln[E[e^(αX)]]}/α = ρ(X) + c. Translation Invariance does hold.
2. If X is Normal, ρ(X) = µ + ασ^2/2. cX is Normal with parameters cµ and cσ. ρ(cX) = cµ + α(cσ)^2/2 ≠ c ρ(X). Positive Homogeneity does not hold.
3. ρ(X + Y) ≤ ρ(X) + ρ(Y) ⇔ ln[E[e^(α(X+Y))]] ≤ ln[E[e^(αX)]] + ln[E[e^(αY)]] ⇔ E[e^(αX) e^(αY)] ≤ E[e^(αX)] E[e^(αY)] ⇔ Cov[e^(αX), e^(αY)] ≤ 0.
However, this covariance can be positive. For example, take the following joint distribution for X and Y:
X = 0 and Y = 0, with probability 88%
X = 0 and Y = 1, with probability 4%
X = 1 and Y = 0, with probability 4%
X = 1 and Y = 1, with probability 4%
Then for X, Prob[X = 0] = 92%, Prob[X = 1] = 8%. ρ(X) = ln(0.92 + 0.08e^α)/α.
For Y, Prob[Y = 0] = 92%, Prob[Y = 1] = 8%. ρ(Y) = ln(0.92 + 0.08e^α)/α.
For X + Y, Prob[X + Y = 0] = 88%, Prob[X + Y = 1] = 8%, Prob[X + Y = 2] = 4%. ρ(X + Y) = ln(0.88 + 0.08e^α + 0.04e^(2α))/α.
For example, for α = 2, ρ(X) = ρ(Y) = ln(0.92 + 0.08e^2)/2 = 0.206. ρ(X + Y) = ln(0.88 + 0.08e^2 + 0.04e^4)/2 = 0.648.
ρ(X + Y) = 0.648 > 0.412 = ρ(X) + ρ(Y). Subadditivity does not hold.
4. If Prob[X ≤ Y] = 1, then for α > 0, e^(αX) ≤ e^(αY). ⇒ E[e^(αX)] ≤ E[e^(αY)]. ⇒ ln[E[e^(αX)]] ≤ ln[E[e^(αY)]]. ⇒ ρ(X) ≤ ρ(Y). Monotonicity does hold.
The Exponential Premium Principle is not coherent.
Section 7, Using Simulation50
For a general discussion of simulation see “Mahlerʼs Guide to Simulation.” Here I will discuss using the results of a simulation of aggregate losses to estimate risk measures.
Simulating aggregate losses could be relatively simple, for example if one assumes that aggregate losses are LogNormal. On the other hand, it could involve a very complicated simulation model of a property/casualty insurer with many different lines of insurance whose results are not independent, with complicated reinsurance arrangements, etc.51
Here we will not worry about how the simulation was performed. Rather we will be given a large simulated sample. For example, let us assume we have simulated from the distribution of aggregate losses the following sample of size 100, arranged from smallest to largest:52
13, 19, 20, 25, 25, 31, 35, 35, 37, 39, 43, 48, 49, 51, 53, 55, 65, 68, 69, 75, 75, 79, 81, 84, 86, 87, 88, 90, 90, 94, 97, 112, 121, 128, 129, 132, 133, 133, 134, 137, 137, 138, 141, 142, 143, 144, 145, 145, 150, 150, 161, 166, 171, 186, 187, 191, 191, 206, 212, 212, 222, 226, 228, 228, 239, 250, 252, 270, 272, 274, 303, 315, 317, 319, 321, 322, 326, 340, 352, 356, 362, 365, 373, 388, 415, 434, 455, 456, 459, 516, 560, 638, 691, 762, 906, 1031, 1456, 1467, 1525, 2034.
Mean and Standard Deviation:
The sum of this sample is 27,305. The sample mean is 27,305/100 = 273.05. The sum of the squares of the sample is 18,722,291. The estimated 2nd moment is 187,223. Therefore, the sample variance is: (187,223 - 273.05^2)(100/99) = 113,805.
Exercise: Estimate the appropriate premium using the Standard Deviation Premium Principle with k = 0.5.
[Solution: 273.05 + (0.5)√113,805 = 442.]
50
See Section 21.2.5 of Loss Models. See for example, The Dynamic Financial Analysis Call Papers in the Spring 2001 CAS Forum. 52 In practical applications, one would usually simulate a bigger sample, such as size 1000 or 10,000. 51
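For readers who want to reproduce the arithmetic, here is a small sketch (my addition, assuming Python); the name sample stands for the list of 100 simulated values above, and the function name is hypothetical.

    from math import sqrt

    def sd_premium(sample, k):
        """Standard Deviation Premium Principle applied to a simulated sample."""
        n = len(sample)
        mean = sum(sample) / n
        second_moment = sum(x * x for x in sample) / n
        sample_variance = (second_moment - mean ** 2) * n / (n - 1)   # n - 1 divisor
        return mean + k * sqrt(sample_variance)

    # With the 100 simulated aggregate losses listed above: mean = 273.05,
    # sample variance = 113,805, and sd_premium(sample, 0.5) is about 442.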
Estimating Value at Risk: Here, in order to estimate the pth percentile, Loss Models takes the value in the sample corresponding to: 1 + the largest integer in Np.53 For a sample of size 100, VaR0.90 is estimated as: [(100)(0.9)] + 1 = 91st value from smallest to largest. Exercise: For the previous sample of size 100, estimate VaR0.80. [Solution: Take 1 + the largest integer in: (0.80)(100) = 80. So we take the 81st element in the sample from smallest to largest: 362.] In general, let [x] be the greatest integer contained in x. [7.2] = 7. [7.6] = 7. [8.0] = 8. VaRp is estimated as the [Np] + 1 value from smallest to largest. VaRp = L[ N p ] + 1. Using a Series of Simulations: 54 Loss Models does not discuss how to estimate the variance of this estimate of VaR.80. One way would be through a series of simulations. One could repeat the simulation that resulted in the previous sample of 100, and get a new sample of size 100. Using the original sample the estimate of VaR.80 was 362. Using the new sample, the estimate of VaR.80 would be somewhat different. Then we could proceed to simulate a third sample, and get a third estimate of VaR.80. We could produce for example 500 different samples and get 500 corresponding estimates of VaR.80. Then the mean of these 500 estimates of VaR.80, would be a good estimate of VaR.80. The sample variance of these 500 estimates of VaR.80, would be an estimate of the variance of any of the individual estimates of VaR.80. However, the variance of the average of these 500 estimates of VaR.80 would be the sample variance divided by 500.55 This differs from the smoothed empirical estimate of πp which is the p(N+1) loss from smallest to largest, linearly interpolating between two loss amounts if necessary. See “Mahlerʼs Guide to Fitting Loss Distributions.” 54 “Mahlerʼs Guide to Simulation” has many examples of simulation experiments. See especially the section on Estimating the p-value via Simulation. 55 The variance of an average is the variance of a single draw, divided by the number of items being averaged. 53
Estimating Tail Value at Risk: One can estimate TVaRp as an average of the worst outcomes of a simulated sample. For a sample of size 100, VaR0.90 is estimated as: [(100)(0.9)] + 1 = 91st value from smallest to largest. For a sample of size 100, in order to estimate TVaR0.90, take an average of the 10 largest values. Average the values starting at the 91st. For the previous sample of size 100, the 91st value is 560, the estimate of π0.90. We could estimate TVaR90% as the average of the 10 largest values in the sample: (560 + 638 + 691 + 762 + 906 + 1031 + 1456 + 1467 + 1525 + 2034)/10 = 1107. In general, let [x] be the greatest integer contained in x. TVaRp is estimated as the average of the largest values in the sample, starting with the [Np] + 1 value from smallest to largest.56 Exercise: For the previous sample of size 100, estimate TVaR95%. [Solution: [(100)(.95)] + 1 = 96. (1031 + 1456 + 1467 + 1525 + 2034)/5 = 1502.6. Comment: For a small sample such as this, and a large p such as 95%, the estimate of the TVaR95% is subject to a lot of random fluctuation.]
56
There are other similar estimators that would also be reasonable.
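The estimators just described are simple to implement. Here is a brief sketch (my addition, assuming Python); sample stands for a list of simulated aggregate losses such as the one above, and the function names are hypothetical.

    # Empirical VaR and TVaR estimators as used in Loss Models:
    # VaR_p is the ([Np] + 1)-th smallest value; TVaR_p averages from that value up.
    from math import floor

    def var_p(sample, p):
        ordered = sorted(sample)
        return ordered[floor(len(ordered) * p)]      # index [Np] is the ([Np] + 1)-th value

    def tvar_p(sample, p):
        ordered = sorted(sample)
        tail = ordered[floor(len(ordered) * p):]     # the worst N - [Np] outcomes
        return sum(tail) / len(tail)

    # With the sample of 100 simulated aggregate losses above:
    # var_p(sample, 0.80) returns 362, var_p(sample, 0.90) returns 560,
    # and tvar_p(sample, 0.90) returns 1107.0, matching the text.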
Variance of the Estimate of the Tail Value at Risk:
In general a variance can be divided into two pieces.57 Conditioning the estimate of TVaRp on π̂p, the estimator of πp:
Var[TVaR̂p] = E[Var[TVaR̂p | π̂p]] + Var[E[TVaR̂p | π̂p]].
This leads to the following estimate of the variance of the estimate of TVaRp:58
{sp^2 + p(TVaR̂p - π̂p)^2} / {N - [Np]},
where sp^2 is the sample variance of the worst outcomes used to estimate TVaRp.
For the previous sample of size 100, TVaR90% was estimated as an average of the 10 largest values in the sample: (560 + 638 + 691 + 762 + 906 + 1031 + 1456 + 1467 + 1525 + 2034)/10 = 1107.
The sample variance of these 10 worst outcomes is: {(560 - 1107)^2 + (638 - 1107)^2 + (691 - 1107)^2 + (762 - 1107)^2 + (906 - 1107)^2 + (1031 - 1107)^2 + (1456 - 1107)^2 + (1467 - 1107)^2 + (1525 - 1107)^2 + (2034 - 1107)^2}/9 = 238,098.
Thus the estimate of the variance of this estimate of TVaR90% is: {238,098 + (0.9)(1107 - 560)^2}/(100 - 90) = 50,739.
Thus the estimate of the standard deviation of this estimate of TVaR90% is: √50,739 = 225.
57
As discussed in “Mahlerʼs Guide to Buhlmann Credibility.” The first term is the EPV, while the second term is the VHM. The first term dominates for a heavier-tailed distribution, while the second term is significant for a lighter-tailed distribution. Although Loss Models does not explain how to derive the second term, it does cite “Variance of the CTE Estimator”, by B. John Manistre and Geoffrey H. Hancock, April 2005 NAAJ. The derivation in Manistre and Hancock uses the information matrix and the delta method, which are discussed in “Mahlerʼs Guide to Fitting Loss Distributions.” 58
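The variance formula above can likewise be implemented directly. This sketch is my addition rather than part of the text (it assumes Python); it combines the TVaR estimator with the variance estimate, sample again stands for the simulated list, and the function name is hypothetical.

    from statistics import mean, variance    # variance() uses the n - 1 divisor
    from math import floor, sqrt

    def tvar_with_se(sample, p):
        """Empirical TVaR_p plus the estimated standard error of that estimate."""
        ordered = sorted(sample)
        k = floor(len(ordered) * p)           # [Np]
        pi_hat = ordered[k]                   # estimate of the pth percentile
        tail = ordered[k:]                    # worst N - [Np] outcomes
        tvar_hat = mean(tail)
        var_hat = (variance(tail) + p * (tvar_hat - pi_hat) ** 2) / (len(ordered) - k)
        return tvar_hat, sqrt(var_hat)

    # For the full sample of 100 above, tvar_with_se(sample, 0.90) gives TVaR of about 1107,
    # an estimated variance of about 50,739, and a standard deviation of about 225.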
Problems: For the next six questions, you simulate the following 35 random values from a distribution: 6 7 11 14 15 17 18 19 25 29 30 34 38 40 41 48 49 53 60 63 78 103 124 140 192 198 227 330 361 421 514 546 750 864 1638 7.1 (1 point) What is the estimate of VaR0.9? A. 546
B. 750
C. 864
D. 1638
E. None of A, B, C, or D
7.2 (1 point) What is the estimate of VaR0.7? A. 140
B. 192
C. 198
D. 227
E. 330
7.3 (2 points) What is the estimate of TVaR0.9? A. 900
B. 950
C. 1000
D. 1050
E. 1100
7.4 (3 points) What is the variance of the estimate in the previous question? A. 88,000 B. 90,000 C. 92,000 D. 94,000 E. 96,000 7.5 (2 points) What is the estimate of TVaR0.7? A. 400
B. 450
C. 500
D. 550
E. 600
7.6 (3 points) What is the variance of the estimate in the previous question? A. 22,000 B. 24,000 C. 26,000 D. 28,000 E. 30,000
Use the following information for the next 3 questions: One hundred values of the annual earthquake losses in the state of Allshookup for the Presley Insurance Company have been simulated, and ranked from smallest to largest: 57, 72, 98, 128, 151, 160, 163, 171, 203, 218, 257, 262, 267, 301, 323, 327, 337, 372, 397, 401, 441, 447, 454, 464, 491, 498, 500, 509, 512, 520, 522, 523, 526, 530, 531, 553, 554, 565, 620, 632, 633, 637, 641, 648, 660, 666, 678, 685, 695, 708, 709, 728, 732, 782, 810, 826, 851, 858, 862, 871, 890, 903, 942, 947, 955, 976, 984, 992, 1016, 1023, 1024, 1027, 1041, 1047, 1048, 1050, 1055, 1055, 1057, 1062, 1076, 1081, 1088, 1117, 1131, 1148, 1192, 1220, 1253, 1270, 1329, 1398, 1406, 1537, 1578, 1658, 1814, 1909, 2431, 2702. 7.7 (1 point) Estimate VaR0.9. A. 1192
B. 1220
C. 1253
D. 1270
E. 1329
D. 2100
E. 2200
7.8 (2 points) Estimate TVaR0.95. A. 1800
B. 1900
C. 2000
7.9 (3 points) Estimate the standard deviation of the estimate made in the previous question. A. 240 B. 260 C. 280 D. 300 E. 320
7.10 (1 point) XYZ Insurance Company wrote a portfolio of medical professional liability insurance. 100 scenarios were simulated to model the aggregate losses. The 10 worst results of these 100 scenarios are (in $ million): 104, 132, 132, 143, 152, 183, 131, 126, 191, 117. Estimate the 95% Tail Value at Risk.
Use the following information for the next 4 questions: One thousand values of aggregate annual losses net of reinsurance have been simulated. They have been ranked from smallest to largest, and here are the largest 100: 3985, 4239, 4521, 4705, 4875, 5220, 5239, 5294, 5384, 5503, 5514, 5581, 5601,5630, 5735, 5756, 5823, 5872, 5902, 5909, 5945, 6004, 6038, 6085, 6204, 6249, 6265, 6270, 6326, 6338, 6371, 6378, 6398, 6402, 6457, 6533, 6548, 6667, 6679, 6688, 6822, 6920, 6994, 7004, 7039, 7050, 7100, 7126, 7126, 7128, 7129, 7133, 7208, 7250, 7317, 7317, 7352, 7361, 7377, 7466, 7467, 7468, 7470, 7472, 7527, 7534, 7538, 7544, 7547, 7547, 7578, 7607, 7613, 7651, 7663, 7712, 7757, 7771, 7785, 7823, 7849, 7865, 7878, 7880, 7906, 7923, 7928, 7941, 7955, 7963, 7976, 7979, 8011, 8021, 8032, 8034, 8052, 8065, 8089, 8116. 7.11 (1 point) Estimate VaR0.95. A. 7128
B. 7129
C. 7133
D. 7208
E. 7250
D. 4705
E. 4875
D. 7900
E. 8000
7.12 (1 point) Estimate VaR0.90. A. 3985
B. 4239
C. 4521
7.13 (2 points) Estimate TVaR0.99. A. 7600
B. 7700
C. 7800
7.14 (3 points) Estimate the standard deviation of the estimate made in the previous question. A. 20 B. 22 C. 24 D. 26 E. 28
Solution to Problems: 7.1. A. VaR.9 is estimated as the [(.90)(35)] + 1 = 32nd value from smallest to largest: 546. 7.2. B. VAR.7 is estimated as the [(.70)(35)] + 1 = 25th value from smallest to largest: 192. 7.3. B. [(.90)(35)] + 1 = 32. Estimate TVaR.9 as the average of the worst outcomes starting with the 32nd value. (546 + 750 + 864 + 1638)/4 = 949.5. 7.4. D. [Np] + 1 = [(.90)(35)] + 1 = 32. The 32nd element from smallest to largest is 546, the estimate of π.9. sp 2 is the sample variance of the worst outcomes used to estimate TVaR.95: {(546 - 949.5)2 + (750 - 949.5)2 + (864 - 949.5)2 + (1638 - 949.5)2 }/3 = 227,985. The variance of the estimate of TVaRp : {sp 2 + p(TVaRp - πp )2 }/{N - [Np]} = {227,985 + (.9)(949.5 - 546)2 }/(35 - 31) = 93,629. 7.5. D. [(.70)(35)] + 1 = 25. Estimate TVaR.7 as the average of the worst outcomes starting with the 25th value. (192 + 198 + 227 + 330 + 361 + 421 + 514 + 546 + 750 + 864 + 1638)/11 = 549.2. 7.6. B. [Np] + 1 = [(.60)(35)] + 1 = 25. The 25th element from smallest to largest is 192, the estimate of π.7. sp 2 is the sample variance of the worst outcomes used to estimate TVaR.7: {(192 - 549.2)2 + (198 - 549.2)2 + (227 - 549.2)2 + (330 - 549.2)2 + (361 - 549.2)2 + (421 - 549.2)2 + (514 - 549.2)2 + (546 - 549.2)2 + (750 - 549.2)2 + (864 - 549.2)2 + (1638 - 549.2)2 }/10 = 178,080. The variance of the estimate of TVaRp : {sp 2 + p(TVaRp - πp )2 }/{N - [Np]} = {178,080 + (.7)(549.2 - 192)2 }/(35 - 24) = 24,309. Comment: The variance of the estimate of TVaR.7 is much smaller than the variance of the estimate of TVaR.9. It is easier to estimate the Tail Value at Risk at a smaller value of p than a larger value of p; it is hard to estimate what is going on in the extreme righthand tail. 7.7. E. [Np] + 1 = [(100)(.90)] + 1 = 91. The 91st element from smallest to largest is: 1329.
7.8. D. [Np] + 1 = [(100)(.95)] + 1 = 96. Average the 96th to 100th values: (1658 + 1814 + 1909 + 2431 + 2702)/5 = 2102.8. 7.9. C. [Np] + 1 = [(100)(.95)] + 1 = 96. The 96th element from smallest to largest is 1658, the estimate of π.95. sp 2 is the sample variance of the worst outcomes used to estimate TVaR.95: {(1658 - 2102.8)2 + (1814 - 2102.8)2 + (1909 - 2102.8)2 + (2431 - 2102.8)2 + (2702 - 2102.8)2 }/4 = 196,392. The variance of the estimate of TVaRp : {sp 2 + p(TVaRp - πp )2 } / {N - [Np]} = {196,392 + (.95)(2102.8 - 1658)2 } / (100 - 95) = 76,869. The standard deviation is:
√76,869 = 277.
7.10. [Np] + 1 = [(100)(0.95)] + 1 = 96. We average the 96th, 97th, 98th, 99th, 100th values: (132 + 143 + 152 + 183 + 191) / 5 = $160.2 million. Comment: TVaRp is estimated as the average of the largest values in the sample, starting with the [Np] + 1 value from smallest to largest. 7.11. B. [Np] + 1 = [(1000)(.95)] + 1 = 951. The 951st value is: 7129. 7.12. A. [Np] + 1 = [(1000)(.90)] + 1 = 901. The 901st value is: 3985. 7.13. E. [Np] + 1 = [(1000)(.99)] + 1 = 991. Average the 991st to the 1000th values: (7976 + 7979 + 8011 + 8021 + 8032 + 8034 + 8052 + 8065 + 8089 + 8116)/10 = 8037.5.
7.14. C. [Np] + 1 = [(1000)(.99)] + 1 = 991. The 991st element from smallest to largest is 7976, the estimate of π.99. sp^2 is the sample variance of the worst outcomes used to estimate TVaR.99: {(7976 - 8037.5)^2 + (7979 - 8037.5)^2 + (8011 - 8037.5)^2 + (8021 - 8037.5)^2 + (8032 - 8037.5)^2 + (8034 - 8037.5)^2 + (8052 - 8037.5)^2 + (8065 - 8037.5)^2 + (8089 - 8037.5)^2 + (8116 - 8037.5)^2}/9 = 2000. The variance of the estimate of TVaRp: {sp^2 + p(TVaRp - πp)^2}/{N - [Np]} = {2000 + (.99)(8037.5 - 7976)^2}/(1000 - 990) = 574.4. The standard deviation is:
√574.4 = 24.0.
Section 8, Important Ideas and Formulas
Introduction (Section 1):
A risk measure is defined as a functional mapping of a loss distribution to the real numbers. ρ(X) is the notation used for the risk measure.
Premium Principles (Section 2):
Expected Value Premium Principle: ρ(X) = (1 + k)E[X], k > 0.
Standard Deviation Premium Principle: ρ(X) = E[X] + k√Var[X], k > 0.
Variance Premium Principle: ρ(X) = E[X] + k Var[X], k > 0.
Value at Risk (Section 3):
Value at Risk, VaRp(X), is defined as the 100p-th percentile. VaRp(X) = πp.
In Appendix A of the Tables attached to the exam, there are formulas for VaRp(X) for many of the distributions: Exponential, Pareto, Single Parameter Pareto, Inverse Pareto, Inverse Weibull, Burr, Inverse Burr, Inverse Exponential, Paralogistic, Inverse Paralogistic.
Tail Value at Risk (Section 4):
TVaRp(X) ≡ E[X | X > πp] = πp + e(πp) = πp + (E[X] - E[X ∧ πp])/(1 - p).
The corresponding risk measure is: ρ(X) = TVaRp(X).
TVaRp(X) ≥ VaRp(X). TVaR0(X) = E[X]. TVaR1(X) = Max[X].
In Appendix A, there are formulas for TVaRp(X) for a few of the distributions: Exponential, Pareto, Single Parameter Pareto.
For the Normal Distribution: TVaRp(X) = µ + σ φ[zp]/(1 - p).
Coherence (Section 6):
A risk measure is coherent if it has the following four properties:
1. Translation Invariance: ρ(X + c) = ρ(X) + c, for any constant c.
2. Positive Homogeneity: ρ(cX) = c ρ(X), for any constant c > 0.
3. Subadditivity: ρ(X + Y) ≤ ρ(X) + ρ(Y).
4. Monotonicity: If Prob[X ≤ Y] = 1, then ρ(X) ≤ ρ(Y).
The Tail Value at Risk is a coherent measure of risk. The Standard Deviation Premium Principle and the Value at Risk are not coherent measures of risk.
Using Simulation (Section 7):
Let [x] be the greatest integer contained in x.
VaRp is estimated as the [Np] + 1 value from smallest to largest.
TVaRp is estimated as the average of the largest values in the sample, starting with the [Np] + 1 value from smallest to largest.
Estimate of the variance of the estimate of TVaRp: {sp^2 + p(TVaR̂p - π̂p)^2}/{N - [Np]},
where sp^2 is the sample variance of the worst outcomes used to estimate TVaRp.
Mahlerʼs Guide to
Classical Credibility Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-8 Howard Mahler
[email protected] www.howardmahler.com/Teaching
Mahlerʼs Guide to Classical Credibility Copyright 2013 by Howard C. Mahler. The concepts in Section 2 of “Credibility” by Mahler and Dean1 or Section 20.2 of Loss Models are demonstrated. Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (and sections whose title is in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Highly Recommended problems are double underlined. Recommended problems are underlined. Solutions to the problems in each section are at the end of that section. Note that problems include both some written by me and some from past exams.2 The latter are copyright by the Casualty Actuarial Society and the Society of Actuaries and are reproduced here solely to aid students in studying for exams.3 Section # A
B C
1
1 2 3 4 5 6 7
Pages 3-4 5-9 10-25 26-32 33-65 66-114 115-150 150-151
Section Name Normal Distribution Table Introduction Full Credibility for Frequency Full Credibility for Severity Variance of Pure Premiums & Aggregate Losses Full Credibility for Pure Premiums & Aggregate Losses Partial Credibility Important Formulas and Ideas
From Chapter 8 of the fourth Edition of Foundations of Casualty Actuarial Science. My study guide is very similar to and formed a basis for the Credibility Chapter written by myself and Curtis Gary Dean. 2 In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. 3 The solutions and comments are solely the responsibility of the author; the CAS/SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Course 4 Exam Questions by Section of this Study Aid4
[Chart of past exam question numbers by section of this study aid, for the Sample exam and the 5/00, 11/00, 5/01, 11/01, 11/02, 11/03, 11/04, 5/05, 11/05, 11/06, and 5/07 exams.]
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.
4
Excluding any questions that are no longer on the syllabus.
Normal Distribution Table
Entries represent the area under the standardized normal distribution from -∞ to z, Pr(Z < z). The value of z to the first decimal place is given in the left column. The second decimal is given in the top row.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.5   0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998
3.6   0.9998  0.9998  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.7   0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.8   0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.9   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

Values of z for selected values of Pr(Z < z):
Pr(Z < z)   0.800   0.850   0.900   0.950   0.975   0.990   0.995
z           0.842   1.036   1.282   1.645   1.960   2.326   2.576
For Classical Credibility, we will be using the chart at the bottom of the table, showing various percentiles of the Standard Normal Distribution.
Using the Normal Table:5 When using the normal distribution, choose the nearest z-value to find the probability, or if the probability is given, choose the nearest z-value. No interpolation should be used. Example: If the given z-value is 0.759, and you need to find Pr(Z < 0.759) from the normal distribution table, then choose the probability value for z-value = 0.76; Pr(Z < 0.76) = 0.7764. When using the Normal Approximation to a discrete distribution, use the continuity correction.
5
Instructions for Exam 4/C from the SOA/CAS.
Section 1, Introduction Assume Carpenters are currently charged a rate of $10 (per $100 of payroll) for Workers Compensation Insurance.6 Assume further that the recent experience would indicate a rate of $5. Then an actuaryʼs new estimate of the rate for Carpenters might be $5, $10, or most likely something in between. In other words, the new estimate of the appropriate rate for Carpenters would be a weighted average of the separate $5 and $10 estimates. If the actuary put more weight on the observation, the new estimate would be closer to the observation of $5. If on the other hand, the actuary put less weight on the observation, then the new estimate would be closer to the current rate of $10. One could write this as: new estimate = (5)(Z) + (10)(1-Z), where Z is the weight, 0 ≤ Z ≤1. So for example if Z = 20%, then the new estimate is ($5)(.2) + ($10)(.8) = $9. If instead Z = 60%, then the new estimate is ($5)(.6)+($10)(.4) = $7. The weight Z is generally referred to as the “credibility” assigned to the observed data. Credibility is commonly used by actuaries in order to weight together two estimates7 of the same quantity. Let X and Y be two estimates. X might be from a recent observation based on limited data, while Y might be a previous estimate or one obtained from a larger but less specific data set.8 Then the estimate using credibility would = ZX + (1 - Z)Y, where Z is the credibility assigned to the observation X. 1 - Z is generally referred to as the complement of credibility. Thus the use of credibility involves a linear estimate of the true expectation derived as a result of a compromise between hypothesis and observations.
Credibility: A linear estimator by which data external to a particular group or individual are combined with the experience of the group or individual in order to better estimate the expected loss (or any other statistical quantity) for each group or individual. Credibility or Credibility Factor: Z, the weight given the observation. The basic formula is: new estimate = (observation) (Z) + (old estimate) (1-Z).
6
Assume that there is no change in rates indicated for the Contracting Industry Group in which Carpenters are included. So that in the absence of any specific data for the Carpenterʼs class, Carpenters would continue to be charged $10. 7 In some actual applications more than two estimates are weighted together. 8 For example, Y might be (appropriately adjusted) countrywide data for the Carpenterʼs class.
Sometimes it is useful to use the equivalent formula: new estimate = old estimate + Z (observation - old estimate). This can be solved for the credibility: Z = (new estimate - old estimate) / (observation - old estimate).
In the example, in order to calculate a new estimate of the appropriate rate for Carpenters, one first has to decide that one will weight together the current observations9 with the current rate for Carpenters10. Generally on the exam when it is relevant to answering the question, it will be clear which two estimates to weight together. Second, one has to decide how much credibility to assign to the current observation. On the exam this is generally the crux of the questions asked. Two manners of determining how much credibility to assign are covered on the Syllabus. The first is called Classical Credibility or Limited Fluctuation Credibility and is covered in this Study Aid.11 The second is referred to as Buhlmann Credibility, Least Squares Credibility, or Greatest Accuracy Credibility and is covered in another Study Aid.12 Either form of credibility can be applied to various actuarial issues such as: Classification and/or Territorial Ratemaking, Experience Rating (Individual Risk Rating), Loss Reserving, Trend, etc. On the exam, credibility questions will usually involve experience rating13 or perhaps classification ratemaking14, unless they deal with urns, dies, spinners, etc., that are used to model probability and credibility theory situations.15
9
One would have to decide what period of time to use, for example the most recently available 3 years. Also one would adjust the data for law changes, trend, development, etc. 10 In actual applications, various adjustments would be made to the current rate for Carpenters before using it to estimate the proposed rate for Carpenters. 11 Classical Credibility was developed in the U.S. in the first third of the 20th century by early members of the CAS such as Albert Mowbray and Francis Perryman. 12 Greatest Accuracy Credibility was developed in the late 1940s by Arthur Bailey, FCAS, based on earlier work by Albert Whitney and other members of the CAS. 13 Experience Rating refers to the use of the experience of an individual policyholder in order to help determine his premium. This can be for either Commercial Insureds (e.g. Workers Compensation) or Personal Lines Insureds (e.g. Private Passenger Automobile.) 14 For example making the rates for the Workers Compensation class of Carpenters. Similar situations occur when making the rates for the territories of a state or for the classes and territories in a state. 15 The reason you are given problems involving urns, etc. is that one can then ask questions that do not require the knowledge of the specific situation. For example, in order to ask a question involving an actual application to Workers Compensation Classification Ratemaking would require knowledge many students do not have and which can not be covered on the syllabus for this exam. Also, the questions involving urns, etc., illustrate the importance of modeling. In actual applications, someone has to propose a model of the underlying process, so that one can properly apply Credibility Theory. Urn models, etc. allow one to determine which features are important and how they are likely to affect real world situations. A good example is Philbrickʼs target shooting example.
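As a small illustration of the two formulas above, here is a brief sketch; it is my addition rather than part of the text, it assumes Python, and the function names are hypothetical.

    def credibility_estimate(observation, prior, z):
        """New estimate = Z * observation + (1 - Z) * prior estimate."""
        return z * observation + (1.0 - z) * prior

    def implied_z(new_estimate, prior, observation):
        """Back out the credibility weight from the three quantities."""
        return (new_estimate - prior) / (observation - prior)

    print(credibility_estimate(5, 10, 0.20))   # 9.0, the $9 rate in the Carpenters example
    print(credibility_estimate(5, 10, 0.60))   # 7.0
    print(implied_z(150, 100, 800))            # about 0.071, as in problem 1.2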
In general, all other things being equal, one would assign more credibility to a larger volume of data. In Classical Credibility, one determines how much data one needs before one will assign to it 100% credibility. This amount of data is referred to as the Full Credibility Criterion or the Standard for Full Credibility. If one has this much data or more, then Z = 100%; if one has observed less than this amount of data then one has 0 ≤ Z <1. For example, if I observed 1000 full time Carpenters, then I might assign 100% credibility to their data.16 Then if I observed 2000 full time Carpenters I would also assign them 100% credibility. I might assign 100 full time Carpenters 32% credibility. In this case we say we have assigned the observation partial credibility, i.e., less than full credibility. Exactly how to determine the amount of credibility assigned to different amounts of data is discussed in the following sections. There are five basic concepts from Classical Credibility you need to know how to apply in order to answer exam questions: 1. How to determine the Criterion for Full Credibility when estimating frequencies. 2. How to determine the Criterion for Full Credibility when estimating severities. 3. How to determine the Criterion for Full Credibility when estimating pure premiums or aggregate losses. 4. How to determine the amount of partial credibility to assign when one has less data than is needed for full credibility. 5. How to use credibility to estimate the future, by combining the observation and the old estimate.
16
For Workers Compensation that data would be dollars of loss and dollars of payroll.
Problems: 1.1 (1 point) The observed claim frequency is 120. The credibility given to this data is 25%. The complement of credibility is given to the prior estimate of 200. What is the new estimate of the claim frequency? A. Less than 165 B. At least 165 but less than 175 C. At least 175 but less than 185 D. At least 185 but less than 195 E. At least 195 1.2 (1 point) The prior estimate was 100 and after an observation of 800 the new estimate is 150. How much credibility was assigned to the data? A. Less than 4% B. At least 4% but less than 5% C. At least 5% but less than 6% D. At least 6% but less than 7% E. At least 7%
Solutions to Problems: 1.1. C. (25%)(120) + (75%)(200) = 180. 1.2. E. New estimate = old estimate + Z (observation - old estimate).
⇒ Z = (new estimate - old estimate) / (observation - old estimate) = (150-100)/(800-100) = 50/700 = 7.1%.
Page 9
2013-4-8,
Classical Credibility §2 Full Credibility Frequency, HCM 10/16/12, Page 10
Section 2, Full Credibility for Frequency17 The most common uses of Classical Credibility, assume that the frequency is (approximately) Poisson. Thus weʼll deal with that case first. Poisson Case: Assume we have a Poisson process for claim frequency, with an average of 500 claims per year. Then if we observe the numbers of claims, they will vary from year to year around the mean of 500. The variance of a Poisson process is equal to its mean of 500. We can approximate this Poisson Process by a Normal Distribution with a mean of 500 and a variance of 500. We can use this Normal Approximation to estimate how often we will observe results far from the mean. For example, how often can one expect to observe more than 550 claims? The standard deviation is:
500 = 22.36. So 550 claims corresponds to about 50 / 22.36 = 2.24 standard
deviations greater than average. Since Φ(2.24) = .9875, there is approximately a 1.25% chance of observing more than 550 claims. Thus there is about a 1.25% chance of observing more than 10% greater than the expected number of claims. Similarly, we can calculate the chance of observing fewer than 450 claims as approximately 1.25%. Thus the chance of observing outside ±10% from the mean number of claims is about 2.5%. In other words, the chance of observing within ±10% of the expected number of claims is 97.5% in this case18 . If we had a mean of 1000 claims instead of 500 claims, then there would be a greater chance of observing within ±10% of the expected number of claims. This is given by the Normal (10%) (1000) (10%) (1000) approximation as: Φ[ ] - Φ[] = Φ[3.162] - Φ[-3.162] = 1000 1000 1 - (2){1 - Φ[3.162]} = 2Φ[3.162] - 1 = (2)(0.9992) - 1 = 99.84%. Exercise: Compute the Probability of being within ± 5% of the mean, for 100 expected claims. (5%) (100) [Solution: 2Φ[ ] - 1 = 38.29%.] 100 17
A subsequent section deals with estimating Pure Premiums rather than Frequencies. As will be seen in order to calculate a Standard for Full Credibility for the Pure Premium generally one first calculates a Standard for Full Credibility for the Frequency. Thus questions about the former also test whether one knows how to do the latter. 18 Note that here we have ignored the “continuity correction.” As shown in “Mahlerʼs Guide to Frequency Distributions” the probability would be calculated including the continuity correction. The probability of more than 500 claims is approximately: 1 - Φ[(550.5-500)/ 500 ] = 1 - Φ(2.258) = 1 - 0.9880 = 1.20%.
In general, let P be the chance of being within ±k of the mean, given an expected number of claims equal to n. Then P = 2Φ[k√n] - 1. Here is a table showing P, for k = 10%, 5%, 2.5%, 1%, and 0.5%, and for 10, 50, 100, 500, 1000, 5000, and 10,000 claims:

Probability of Being Within ±k of the Mean
Expected # of Claims    k=10%      k=5%      k=2.5%     k=1%      k=0.5%
10                      24.82%    12.56%      6.30%     2.52%     1.26%
50                      52.05%    27.63%     14.03%     5.64%     2.82%
100                     68.27%    38.29%     19.74%     7.97%     3.99%
500                     97.47%    73.64%     42.38%    17.69%     8.90%
1000                    99.84%    88.62%     57.08%    24.82%    12.56%
5000                   100.00%    99.96%     92.29%    52.05%    27.63%
10000                  100.00%   100.00%     98.76%    68.27%    38.29%
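The entries in this table can be reproduced directly from P = 2Φ[k√n] - 1. A minimal sketch, assuming Python with scipy is available (neither is part of the syllabus; the helper name is mine):

```python
from scipy.stats import norm

def prob_within(k, n):
    """P = 2*Phi(k*sqrt(n)) - 1: chance the observed result is within +/- k of
    the mean, for a Poisson with n expected claims (Normal approximation)."""
    return 2.0 * norm.cdf(k * n ** 0.5) - 1.0

print(round(prob_within(0.10, 1000), 4))   # 0.9984
print(round(prob_within(0.05, 100), 4))    # 0.3829
print(round(prob_within(0.025, 5000), 4))  # 0.9229
```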
Turning things around, given values of P and k, one can compute the number of expected claims n0 such that the chance of being within ±k of the mean is P. For example, if P = 90% and k = 2.5%, then based on the above table n0 is somewhat less than 5000 claims. More precisely, P = 2Φ[k√n] - 1, and therefore for P = 0.9 and k = 2.5%: 0.9 = 2Φ[2.5% √n0] - 1.

Thus we want Φ[2.5% √n0] = (1 + P)/2 = 0.95. Let y be such that Φ(y) = (1 + P)/2 = 0.95. Consulting the Standard Normal Table, y = 1.645. Then we want y = 0.025 √n0. Thus n0 = y²/k² = 1.645²/0.025² = 4330 claims. Having taken P = 90% and k = 2.5%, we would refer to 4330 as the Standard for Full Credibility (for estimating frequencies.)

In general, assume one desires that the chance of being within ±k of the mean frequency is at least P. Then for a Poisson frequency, the Standard for Full Credibility is:
n0 = y²/k², where y is such that Φ(y) = (1 + P)/2.19

Exercise: Assuming frequency is Poisson, for P = 95% and for k = 5%, what is the number of claims required for Full Credibility for estimating the frequency?
[Solution: y = 1.960 since Φ(1.960) = (1 + P)/2 = 97.5%. Therefore, n0 = y²/k² = (1.96/0.05)² = 1537 claims.]

19 See Equations 2.2.4 and 2.2.5 in Mahler and Dean.
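Going the other way, n0 can be computed from P and k with the inverse of the standard Normal distribution function. A minimal sketch under the same assumptions (Python with scipy; the function name is illustrative):

```python
from scipy.stats import norm

def full_credibility_standard(p, k):
    """n0 = (y/k)^2 expected claims, where Phi(y) = (1 + P)/2 (Poisson frequency)."""
    y = norm.ppf((1.0 + p) / 2.0)
    return (y / k) ** 2

print(round(full_credibility_standard(0.90, 0.025)))  # 4329 (4330 when y is rounded to 1.645)
print(round(full_credibility_standard(0.90, 0.05)))   # 1082
print(round(full_credibility_standard(0.95, 0.05)))   # 1537
```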
Here are values of y corresponding to various values of P:

P          (1+P)/2       y
60.00%     80.00%      0.842
70.00%     85.00%      1.036
80.00%     90.00%      1.282
90.00%     95.00%      1.645
95.00%     97.50%      1.960
98.00%     99.00%      2.326
99.00%     99.50%      2.576
The relevant values are shown in the lower portion of the Normal table attached to the exam.

Here is a table of values for the Standard for Full Credibility for the Frequency n0, given various values of P and k:20

Standards for Full Credibility for Frequency (Claims)
Probability
Level P     k = 30%   k = 20%   k = 10%   k = 7.5%    k = 5%    k = 2.5%    k = 1%
80.00%         18        41       164        292        657       2,628     16,424
90.00%         30        68       271        481      1,082       4,329     27,055
95.00%         43        96       384        683      1,537       6,146     38,415
96.00%         47       105       422        750      1,687       6,749     42,179
97.00%         52       118       471        837      1,884       7,535     47,093
98.00%         60       135       541        962      2,165       8,659     54,119
99.00%         74       166       664      1,180      2,654      10,616     66,349
99.90%        120       271     1,083      1,925      4,331      17,324    108,276
99.99%        168       378     1,514      2,691      6,055      24,219    151,367
The Standard of 1082 claims corresponding to P = 90% and k = 5% is the most commonly used, followed by the Standard of 683 claims corresponding to P = 95% and k = 7.5%. You should on several different occasions verify that you can calculate quickly and accurately a randomly selected value from this table. For P = 90%, we want a 90% chance of being within ±k of the mean, so we are willing to have a 5% probability outside on either tail, for a total of 10% probability of being outside the error bars. Thus Φ(y) = 0.95, or y = 1.645. Thus n0 = y²/k² = (1.645/0.05)² = 1082 claims.
20 See Longley-Cookʼs “An Introduction to Credibility Theory,” PCAS 1962, or “Some Notes on Credibility” by Perryman, PCAS 1932.
Variations from the Poisson Assumption:

Assume one desires that the chance of being within ±k of the mean frequency is at least P. Then the Standard for Full Credibility is n0 = y²/k², where y is such that Φ(y) = (1 + P)/2. However, this depended on the following assumptions:21
1. One is trying to estimate frequency.
2. Frequency is given by a Poisson process (so that the variance is equal to the mean).
3. There are enough expected claims to use the Normal Approximation.
If any of these assumptions do not hold, then one should not apply the above technique.

Questions can also deal with situations where the frequency is not assumed to be Poisson. If a Binomial, Negative Binomial, or other frequency distribution is substituted for a Poisson distribution, then the difference in the derivation is that the variance is not equal to the mean.

For example, assume one has a Binomial Distribution with parameters m = 1000 and q = 0.3. The mean is 300 and the variance is (1000)(0.3)(0.7) = 210. So the chance of being within ±5% of the expected value is approximately:
Φ[(5%)(300)/√210] - Φ[-(5%)(300)/√210] = Φ(1.035) - Φ(-1.035) = 0.8496 - 0.1504 = 69.9%.
So in the case of a Binomial with parameter q = 0.3, the “Standard for Full Credibility” with P = 70% and k = ±5% is about 1000 exposures or 300 expected claims.

If instead a Negative Binomial Distribution had been assumed, then the variance would have been greater than the mean. This would have resulted in a Standard for Full Credibility greater than in the Poisson situation. One can derive a more general formula when the Poisson assumption does not apply.

Standard for Full Credibility for Frequency, General Case:

The Standard for Full Credibility for Frequency in terms of claims is:22
(y²/k²)(σf²/μf) = n0 (σf²/μf),
which reduces to the Poisson case when σf²/μf = 1.

21 Unlike Buhlmann Credibility, in Classical Credibility the weight given to the prior mean does not depend on the actuaryʼs view of its accuracy.
22 Equation 2.2.6 in Mahler and Dean.
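A minimal sketch of the general-case standard, again assuming Python with scipy (the function name and keyword argument are mine); it reproduces the Negative Binomial exercise that follows:

```python
from scipy.stats import norm

def full_credibility_claims(p, k, var_to_mean=1.0):
    """Standard for full credibility for frequency, in expected claims:
    n0 * (sigma_f^2 / mu_f); reduces to n0 = (y/k)^2 when the ratio is 1 (Poisson)."""
    y = norm.ppf((1.0 + p) / 2.0)
    return (y / k) ** 2 * var_to_mean

# Negative Binomial with beta = 1.5: variance/mean = 1 + beta = 2.5
print(round(full_credibility_claims(0.99, 0.10, var_to_mean=2.5)))
# 1659; the text gets 1660 by rounding y to 2.576 and n0 to 664 claims first
```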
Exercise: Find the number of claims required for full credibility. Require that there is a 90% chance that the estimate of the frequency is correct within ±2.5%. The frequency distribution has a variance twice its mean.
[Solution: P = 90% and y = 1.645. k = 2.5%. n0 = y²/k² = (1.645/0.025)² = 4329 claims. We are given that σf²/μf = 2. Thus n0 (σf²/μf) = (4329)(2) = 8658 claims.]

Exercise: Find the number of claims required for full credibility. Require that there is a 99% chance that the estimate of the frequency is correct within ±10%. Assume the frequency distribution is Negative Binomial, with parameters β = 1.5 and r unknown.
[Solution: P = 99% and thus we want Φ(y) = (1 + P)/2 = 0.995. Thus y = 2.576. k = 10%. n0 = y²/k² = (2.576/0.10)² = 664 claims. We are given that the frequency is Negative Binomial with mean μf = rβ and variance σf² = rβ(1 + β). Thus σf²/μf = 1 + β = 2.5. Thus n0 (σf²/μf) = (664)(2.5) = 1660 claims. This is larger than the standard of 664 for a Poisson frequency, since the Negative Binomial has a variance greater than its mean. In this case the variance is 2.5 times the mean. Thus the standard of 1660 claims is 2.5 times 664.]

Derivation of the Standard for Full Credibility for Frequency:

Require that the observed frequency should be within 100k% of the expected frequency with probability P. Use the following notation: μf = mean frequency, σf² = variance of frequency. Let y be such that Φ(y) = (1 + P)/2. Using the Normal Approximation, what is a formula for the number of claims needed for full credibility of the frequency?

Assume there are N claims expected and therefore N/μf exposures. The mean frequency is μf. The variance of the frequency for a single exposure is σf². A key idea is that if one adds up, for example, 3 independent, identically distributed variables, one gets 3 times the variance. In this case we are assumed to have N/μf independent exposures. Therefore, the variance of the number of claims observed for N/μf independent exposures is: (N/μf)σf².
The observed frequency is the number of claims divided by the number of exposures, N/μf. When one divides by a constant, the variance is divided by that constant squared. Therefore, the variance of the observed frequency is the variance of the number of claims, (N/μf)σf², divided by (N/μf)², which is: μf σf²/N. Thus the standard deviation of the observed claim frequency is: σ = σf √(μf/N).

We desire that Prob(μf - kμf ≤ X ≤ μf + kμf) ≥ P. Using the Normal Approximation, this is true provided: kμf = yσ = yσf √(μf/N). Solving, N = (y²/k²)(σf²/μf).

Exposures vs. Claims:

Standards for Full Credibility have been calculated so far in terms of the expected number of claims. It is common to translate these into a number of exposures by dividing by the (approximate) expected claim frequency. So for example, if the Standard for Full Credibility is 1082 claims (P = 90%, k = 5%) and the expected claim frequency in Homeowners Insurance were 0.04 claims per house-year, then 1082 / 0.04 ≅ 27,000 house-years would be a corresponding Standard for Full Credibility in terms of exposures.

In general, one can divide the Standard for Full Credibility in terms of claims by μf, in order to get it in terms of exposures. Thus in general, the Standard for Full Credibility for Frequency in terms of exposures is:23
(y²/k²)(σf²/μf²) = n0 (σf²/μf²).

23 This is equation 20.6 in Loss Models, as applied to this situation. Equation 20.6 in Loss Models gives the number of exposures required for full credibility: n ≥ λ0 (σ/ξ)². What Loss Models refers to as σ is the standard deviation of the quantity to be estimated, in this case frequency, and ξ is the mean of the quantity to be estimated. So in this case (σ/ξ)² = (σf/μf)².
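The conversion from claims to exposures is a single division, as in the Homeowners example above. A minimal sketch (Python; the function name is illustrative), also showing the direct exposures formula for the Negative Binomial exercise that follows:

```python
def claims_standard_to_exposures(n_claims, mean_frequency):
    """Divide a full-credibility standard in claims by the expected claim
    frequency per exposure to restate it in exposures."""
    return n_claims / mean_frequency

print(round(claims_standard_to_exposures(1082, 0.04)))   # 27050 house-years (about 27,000)

# Or directly in exposures: n0 * (sigma_f^2 / mu_f^2).
# Negative Binomial r = 4, beta = 1.5: sigma^2/mu^2 = (1 + beta)/(r*beta) = 2.5/6
print(round(664 * 2.5 / 6))                               # 277 exposures
```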
Exercise: Find the number of exposures required for full credibility. Require that there is a 99% chance that the estimate of the frequency is correct within ±10%. Assume the frequency distribution is Negative Binomial, with parameters β = 1.5 and r = 4.
[Solution: P = 99% and thus we want Φ(y) = (1 + P)/2 = 0.995. Thus y = 2.576. k = 10%. n0 = y²/k² = (2.576/0.10)² = 664 claims. We are given that the frequency is Negative Binomial with mean μf = rβ and variance σf² = rβ(1 + β). Thus σf²/μf² = (1 + β)/(rβ) = 2.5/{(4)(1.5)} = 0.4167. Thus n0 (σf²/μf²) = (664)(0.4167) = 277 exposures.
Comment: Note the assumed mean frequency is (4)(1.5) = 6. Thus 277 exposures correspond to about (277)(6) = 1660 expected claims, as found in a previous exercise.]

The Choice of P and k:

On the exam one will be given P and k. In practical applications appropriate values of P and k have to be selected.24 While there is clearly some judgment involved in the choice of P and k, the Standards for Full Credibility for a given application are generally chosen by actuaries within a similar range.25

This same type of judgment is involved in the choice of error bars around an estimate of a quantity such as the loss elimination ratio at $10,000. Many times ±2 standard deviations (corresponding to about a 95% confidence interval) will be chosen, but that is not necessarily better than choosing ±1.5 or ±2.5 standard deviations. Similarly one has to decide at what significance level to reject or accept H0 when doing hypothesis testing. Should one use 5%, 1%, or some other significance level? So while Classical Credibility also involves somewhat arbitrary judgments, that has not stood in the way of it being very useful for many decades in many applications.

24 For situations that come up repeatedly, the choice of P and k may have been made several decades ago, but nevertheless the choice was made at some point in time. 1082 claims corresponding to P = 90% and k = 5% is the single most commonly used value.
25 For example, if an actuary were estimating frequency for private passenger automobile insurance, he would probably pick values of P and k that have been used before by other actuaries. These practical applications are beyond the syllabus of this exam.
Problems: 2.1 (1 point) Assume frequency is Poisson. How many claims are required for Full Credibility if one requires that there be a 98% chance of the estimated frequency being within ±2.5% of the true value? A. Less than 8,000 B. At least 8,000 but less than 9,000 C. At least 9,000 but less than 10,000 D. At least 10,000 but less than 11,000 E. At least 11,000 2.2 (3 points) Y represents the number of independent homogeneous exposures in an insurance portfolio. The claim frequency rate per exposure is a random variable with mean = 0.10 and variance = 0.25. A full credibility standard is devised that requires the observed sample frequency rate per exposure to be within 4% of the expected population frequency rate per exposure 95% of the time. Determine the value of Y needed to produce full credibility for the portfolioʼs experience. A. 50,000 B. 60,000 C. 70,000 D. 80,000 E. 90,000 2.3 (1 point) Let A be the number of claims needed for full credibility, if the estimate is to be within ±3% of the true value with a 80% probability. Let B be the similar number using 8% rather than 3%. What is the ratio of A divided by B? A. 3 B. 4 C. 5 D. 6 E. 7 2.4 (2 points) Assume you are conducting a poll relating to a single question and that each respondent will answer either yes or no. You pick a random sample of respondents out of a very large population. Assume that the true percentage of yes responses in the total population is between 20% and 80%. How many respondents do you need, in order to require that there be a 90% chance that the results of the poll are within ±8% of the true answer? A. Less than 1,000 B. At least 1,000 but less than 2,000 C. At least 2,000 but less than 3,000 D. At least 3,000 but less than 4,000 E. At least 4,000
2.5 (1 point) Assume frequency is Poisson. The full credibility standard for a company is set so that the total number of claims is to be within 8% of the true value with probability P. This full credibility standard is calculated to be 625 claims. What is the value of P? A. Less than 93% B. At least 93% but less than 94% C. At least 94% but less than 95% D. At least 95% but less than 96% E. 96% or more 2.6 (1 point) Find the number of claims required for full credibility. Require that there is a 95% chance that the estimate of the frequency is correct within ±10%. The frequency distribution has a variance 3 times its mean. A. Less than 1,000 B. At least 1,000, but less than 1,100 C. At least 1,100, but less than 1,200 D. At least 1,200, but less than 1,300 E. 1,300 or more 2.7 (2 points) A Standard for Full Credibility in terms of claims has been established for frequency assuming that the frequency is Poisson. If instead the frequency is assumed to follow a Negative Binomial with parameters r = 12 and β = 0.5, what is the ratio of the revised Standard for Full Credibility to the original one? A. Less than 1 B. At least 1 but less than 1.2 C. At least 1.2 but less than 1.4 D. At least 1.4 but less than 1.6 E. At least 1.6 2.8 (1 point) Assume frequency is Poisson. How many claims are required for Full Credibility if one requires that there be a 95% chance of being within ±10% of the true frequency? A. Less than 250 B. At least 250 but less than 300 C. At least 300 but less than 350 D. At least 350 but less than 400 E. 400 or more
2.9 (1 point) The total number of claims for a group of insureds is Poisson distributed with a mean of m. Using the Normal approximation, calculate the value of m such that the observed number of claims will be within 6% of m with a probability of 0.98. A. Less than 1,000 B. At least 1,000, but less than 1,500 C. At least 1,500, but less than 2,000 D. At least 2,000, but less than 2,500 E. 2,500 or more 2.10 (1 point) Assume frequency is Poisson. How many claims are required for Full Credibility if one requires that there be a 99% chance of the estimated frequency being within ±7.5% of the true value? A. Less than 800 B. At least 800 but less than 900 C. At least 900 but less than 1000 D. At least 1000 but less than 1100 E. At least 1100 2.11 (2 points) You are given the following information about a book of business: (i) Each insuredʼs claim count has a Poisson distribution with mean λ, where λ has a gamma distribution with α = 5 and θ = 0.3. (ii) The full credibility standard is for frequency to be within 2.5% of the expected with probability 0.95. Using classical credibility, determine the expected number of claims required for full credibility. (A) 6,000 (B) 7,000 (C) 8,000 (D) 9,000 (E) 10,000 2.12 (2 points) Frequency is assumed to follow a Binomial with parameters q = 0.4 and m. How many claims are required for Full Credibility if one requires that there be a 90% chance of the estimated frequency being within ±5% of the true value? (A) 650 (B) 700 (C) 750 (D) 800 (E) 850
2.13 (3 points) A standard for full credibility has been selected so that the actual frequency would be within 10% of the expected frequency 80% of the time. The number of claims for an individual insured is Poisson with mean λ. However, λ in turn varies across the portfolio via a Poisson with mean c. What is the smallest value of c, such that the data for one insured would be given full credibility? A. Less than 300 B. At least 300 but less than 400 C. At least 400 but less than 500 D. At least 500 but less than 600 E. At least 600 2.14 (2, 5/85, Q. 31) (1.5 points) Some scientists believe that Drug X would benefit about half of all people with a certain blood disorder. To estimate the proportion, p, of patients who would benefit from taking Drug X, the scientists will administer it to a random sample of patients who have the blood disorder. The estimate of p will be p^ , the proportion of patients in the sample who benefit from having taken the drug. Which of the following is closest to the minimum sample size that guarantees P[|p - p^ | ≤ 0.03] ≥ 0.95? A. 748
B. 1,068
C. 1,503
D. 2,056
E. 2,401
2.15 (4, 5/86, Q.34) (1 point) Let X be the number of claims needed for full credibility, if the estimate is to be within 5% of the true value with a 90% probability. Let Y be the similar number using 10% rather than 5%. What is the ratio of X divided by Y? A. 1/4 B. 1/2 C. 1 D. 2 E. 4 2.16 (4, 5/87, Q.46) (2 points) The "Classical" approach to credibility optimizes which of the following error measures? A. least squares error criterion B. variance of the hypothetical means C. normal approximation for skewness D. coefficient of variation E. None of the above
2.17 (4, 5/89, Q.29) (1 point) The total number of claims for a group of insureds is Poisson distributed with a mean of m. Calculate the value of m such that the observed number of claims will be within 3% of m with a probability of 0.975 using the normal approximation.
A. Less than 5,000 B. At least 5,000, but less than 5,500 C. At least 5,500, but less than 6,000 D. At least 6,000, but less than 6,500 E. 6,500 or more

2.18 (4B, 11/94, Q.15) (3 points) You are given the following: Y represents the number of independent homogeneous exposures in an insurance portfolio. The claim frequency rate per exposure is a random variable with mean = 0.025 and variance = 0.0025. A full credibility standard is devised that requires the observed sample frequency rate per exposure to be within 5% of the expected population frequency rate per exposure 90% of the time. Determine the value of Y needed to produce full credibility for the portfolioʼs experience.
A. Less than 900 B. At least 900, but less than 1,500 C. At least 1,500, but less than 3,000 D. At least 3,000, but less than 4,500 E. At least 4,500

2.19 (4B, 5/96, Q.13) (1 point) Using the methods of Classical credibility, a full credibility standard of 1,000 expected claims has been established such that the observed frequency will be within 5% of the underlying frequency, with probability P. Determine the number of expected claims that would be required for full credibility if 5% were changed to 1%.
A. 40 B. 200 C. 1,000 D. 5,000 E. 25,000

2.20 (4, 11/04, Q.21 & 2009 Sample Q.148) (2.5 points) You are given:
(i) The number of claims has probability function: p(x) = (m choose x) q^x (1 - q)^(m-x), x = 0, 1, 2, …, m.
(ii) The actual number of claims must be within 1% of the expected number of claims with probability 0.95.
(iii) The expected number of claims for full credibility is 34,574.
Determine q.
(A) 0.05 (B) 0.10 (C) 0.20 (D) 0.40 (E) 0.80
Solutions to Problems:

2.1. B. Φ(2.326) = (1 + P)/2 = (1 + 0.98)/2 = 0.99, so that y = 2.326. n0 = y²/k² = (2.326/0.025)² = 8656.

2.2. B. k = 0.04, P = 95%, y = 1.960, μf = 0.10, σf² = 0.25, and (y²/k²)(σf²/μf) = (1.960/0.04)²(0.25/0.10) = 6002.5 claims ⇔ 6002.5/0.10 = 60,025 exposures.

2.3. E. n0 = y²/k², and thus for a given P the standard for full credibility is inversely proportional to the square of k. Thus A/B = 8²/3² = 7.11.
Comment: The standard for full credibility is larger the smaller k; being within ±3% is a more stringent requirement, which requires more claims than being within ±8%.

2.4. B. Let m be the number of respondents and let q be the true percentage of yes respondents in the total population. The number of yes responses in the sample is given by a Binomial Distribution with parameters q and m, with variance mq(1-q). The percentage of yes responses is N/m, with variance: mq(1-q)/m² = q(1-q)/m. Using the Normal Approximation, 90% probability corresponds to ±1.645 standard deviations of the mean of q. Thus we want: (0.08)(q) = (1.645)√{q(1-q)/m}. √m = (1.645/0.08)√{(1-q)/q}. m = 423{(1/q) - 1}. The desired m is a decreasing function of q. However, we assume q ≥ 0.2, so that m ≤ 423(5 - 1) = 1692.
Alternately, for each respondent, which can be thought of as an exposure, we have a Bernoulli distribution, with σf²/μf² = (1-q)q/q² = 1/q - 1. The standard for full credibility in terms of exposures is: (σf²/μf)(y²/k²)/μf = (σf²/μf²)(y²/k²) = (1.645/0.08)²(1/q - 1) = 423(1/q - 1). For 0.2 ≤ q ≤ 0.8, this is maximized when q = 0.2, and is then: 423(5 - 1) = 1692 exposures.
Comment: The number of exposures needed for full credibility depends on q. We want a standard for full credibility that will be enough exposures to satisfy the criterion regardless of q, so we pick the maximum over q from 20% to 80%.

2.5. D. n0 = y²/k². Therefore y = k√n0 = 0.08√625 = 2.00. Φ(y) = (1 + P)/2. P = 2Φ(y) - 1 = 2Φ(2.00) - 1 = (2)(0.9772) - 1 = 0.9544.
2.6. C. P = 95%. Φ(1.960) = (1+P)/2 = 0.975, so that y = 1.960. (σf2 /µf ) (y2 / k2 ) = (3)(1.960 / 0.10)2 = (3)(384) = 1152 claims. 2.7. D. For frequency, the general formula for the Standard for Full Credibility in terms of claims is: (σf2 /µf ) {y2 / k2 }. Assuming y and k are fixed, then the Standard for Full Credibility is proportional to the ratio of the variance to the mean. For the Poisson this ratio is one. For the Negative Binomial this ratio is: {rβ(1+β)} / (rβ) = 1 + β. Thus the second Standard is 1+β = 1.5 times the first Standard. Comment: The Negative Binomial has more random fluctuation than the Poisson, and therefore the standard for Full Credibility is larger. 2.8. D. Φ(1.960) = 0.975, so that y = 1.960. n0 = y2 / k2 = (1.960 / 0.10)2 = 384. 2.9. C. Φ(2.326) = 0.99, so that y = 2.326. n0 = y2 / k2 = (2.326 / 0.06)2 = 1503 claims. 2.10. E. Φ(2.576) = (1+P)/2 = (1+.99)/2 = 0.995, so that y = 2.576. n0 = y2 / k2 = (2.576 / 0.075)2 = 1180 claims. 2.11. C. For the Gamma-Poisson, the mixed distribution is Negative Binomial, with r = α = 5 and β = θ = 0.3. Therefore, σf2 /µf = rβ(1 + β)/(rβ) = 1 + β = 1.3. For P = 0.95, y = 1.960. k = 0.025. (y2 / k2 )(σf2 /µf) = (1.960/0.025)2 (1.3) = 7991 claims. Comment: Similar to 4, 11/02, Q.14. 2.12. A. Φ(1.645) = (1+P)/2 = (1 + 0.90)/2 = .95, so that y = 1.645. n0 = y2 / k2 = (1.645 / 0.05)2 = 1082 claims. σf2 /µf = (0.4)(0.6)m/(0.4m) = 0.6. n0 σf2 /µf = (1082)(0.6) = 650 claims.
2.13. B. We have y = 1.282 since Φ(1.282) = 0.90 = (1 + 0.80)/2. Therefore, n0 = y²/k² = (1.282/0.10)² = 164 claims. The frequency for the portfolio is a mixture of Poissons. The mean of the mixture is: E[λ] = c. The second moment of each Poisson is: variance + mean² = λ + λ². The second moment of the mixture is the mixture of the second moments: E[λ + λ²] = E[λ] + E[λ²] = c + (c + c²) = 2c + c². Thus the variance of the mixture is: 2c + c² - c² = 2c. Thus the standard for full credibility in terms of number of claims is: (σf²/μf) n0 = (2c/c) n0 = 2n0 = (2)(164) = 328 claims. For the data for one insured to be given full credibility, we need the expected number of claims for an individual insured to be at least 328. The smallest possible c is 328.
Alternately, EPV = E[λ] = c. VHM = Var[λ] = c. So the variance of the mixture is: EPV + VHM = 2c. Proceed as before.

2.14. B. We want a 95% probability of the estimate being within 0.03/p (in percent) of the true value of p. k = 0.03/p. Φ(1.960) = (1 + 95%)/2 = 97.5%. y = 1.960. For Classical Credibility, the standard for full credibility for frequency is: (σf²/μf)(y/k)² = {p(1-p)/p}{1.960/(0.03/p)}² = 4268(1-p)p² claims. Put it in terms of exposures by dividing by the mean frequency p: 4268(1-p)p exposures. p(1-p) ≤ 1/4. Therefore, one can take n = 4268/4 = 1067.
Alternately, let x be the number who benefit. For a Binomial Distribution, p̂ = x/n. Var[X] = np(1-p). Var[p̂] = Var[X]/n² = p(1-p)/n. Using the Normal Approximation, P[|p - p̂| ≤ 0.03] = 0.95 if 0.03 corresponds to 1.960 standard deviations: 0.03 = 1.96√{p(1-p)/n}. ⇒ n = 4268 p(1-p). Now for 0 ≤ p ≤ 1, p(1-p) has its maximum for p = 1/2, when p(1-p) = 1/4. So we can take n = 4268/4 = 1067.
2.15. E. Since the full credibility standard is inversely proportional to the square of k: n0 = y2 / k2 , X/Y = (10%/5%)2 = 4. Alternately, one can compute the values of X and Y assuming one is dealing with the standard for frequency and that the frequency is Poisson. (The answer to this question does not depend on these assumptions.) For k = 5% and P = 90%: Φ(1.645) = 0.95 = (1 + 0.90)/2, so that y = 1.645, n0 = y2 / k2 = (1.645 / 0.05)2 = 1082 = X. For k = 10% and P = 90%: Φ(1.645) = 0.95 = (1 + 0.90)/2, so that y = 1.645, n0 = y2 / k2 = (1.645 / 0.10)2 = 271 = Y. Thus X/Y = 1082 / 271 = 4. Comment: As the requirement gets less strict, for example k= 10% rather than 5%, the number of claims needed for Full Credibility decreases. 2.16. E. The classical approach to credibility attempts to limit the probability of “large” errors. What is considered a “large” error is determined by the choice of k. The classical approach to credibility does not optimize any particular error measure. The Buhlmann or “greatest accuracy” approach, optimizes the least squares error criterion. 2.17. C. Classical Credibility for frequency with k = 0.03 and P = 0.975. y = 2.24 , since Φ(2.24 ) = 0.9875 = (1+P)/2. n0 = y2 / k2 = (2.24/0.03)2 = 5575 claims. 2.18. D. Φ(1.645) = (1 + 0.90)/2 = 0.95 ⇒ y = 1.645. n0 = y2 / k2 = (1.645/0.05)2 = 1082 claims. nf = n0 (σf2 /µf) = (1082)(0.0025/0.025) = 108.2 claims. 108.2 claims / 0.025 = 4328 exposures. Alternately, the standard for full credibility for frequency in terms of number of exposures is: (y2 /k2 ) (σf2 / µf2 ) = (1.645/0.05)2 (0.0025 /0.0252 ) = (1082) (4) = 4328 exposures. 2.19. E. The Standard for Full Credibility (whether it is for frequency, severity, or pure premiums) is inversely proportional to k2 . Thus the revised Standard is: (0.05/0.01)2 (1000) = 25,000. 2.20. B. k = 1%. P = 95%. ⇒ y = 1.960. n0 = (y/k)2 = 38,416 claims. Binomial Frequency. σf2 /µf = mq(1-q)/(mq) = 1 - q. 34,574 = n0 σf2 /µf = 38,416(1 - q). ⇒ q = 0.100.
Section 3, Full Credibility for Severity

You are less likely to be asked a question on the exam involving applying Classical Credibility to estimating future severities. However, the same ideas apply just as they did to frequencies.

Assume we have 5 claims, each independently drawn from an Exponential Distribution: F(x) = 1 - e^(-x/100). Then since the variance of an Exponential is θ², the variance of a single claim is: 100² = 10,000. Thus the variance of the total cost of five independent claims is (5)(10,000) = 50,000. The observed severity is the total observed cost divided by the number of claims, in this case 5. Thus the variance of the observed severity is (1/5)²(50,000) = 2000. When one has N claims, the variance of the observed severity is (N)(10,000)/N² = 10,000/N.

In general, the variance of the observed severity = (process variance of the severity)/(number of claims) = σS²/N. Therefore, the standard deviation of the observed severity is σS/√N.

Assume we wish to have a chance of P that the observed severity will be within ±k of the true average severity. As before with credibility for the frequency, use the Normal Approximation, with y such that Φ(y) = (1 + P)/2. Then within ±y (standard deviations of observed severity) of the mean covers probability P on the Normal Distribution. Therefore, in order to have probability P of differing from the mean severity by less than ±kμS, we want y(σS/√N) = kμS. Solving: N = (y/k)² σS²/μS² = n0 CVSev².

The Standard for Full Credibility for the Severity in terms of number of expected claims is:
(y²/k²)(σS²/μS²) = n0 CVSev²,
where CVSev is the coefficient of variation of the severity = standard deviation / mean.26 27

26 Equation 2.3.2 in Mahler and Dean.
27 This is equation 20.6 in Loss Models, as applied to this situation. Equation 20.6 in Loss Models gives the requirement for full credibility: n ≥ λ0 (σ/ξ)². What Loss Models refers to as σ is the standard deviation of the quantity to be estimated, in this case severity, and ξ is the mean of the quantity to be estimated. So in this case (σ/ξ)² = (σS/μS)². Note that since the denominator of severity is claims, equation 20.6 gives a result here in terms of expected claims rather than exposures.
Note that no assumption was made about the distributional form of the frequency. The Standard for Full Credibility for severity does not depend on whether the frequency is Poisson, Negative Binomial, etc. However, we have assumed that frequency and severity are independent and that all of the claims are drawn from the same distribution.

Exercise: Let P = 90% and k = 5%. If the coefficient of variation of the severity is 3, then what is the Standard for Full Credibility for the severity in terms of expected claims?
[Solution: n0 = (1.645/0.05)² = 1082 claims. Then the Standard for Full Credibility for the severity is: (1082)(3²) = 9738 expected claims.]

Exposures vs. Claims:

Standards for Full Credibility have been calculated so far in terms of the expected number of claims. It is common to translate these into a number of exposures by dividing by the (approximate) expected claim frequency. So for example, if the Standard for Full Credibility is 9738 claims and the expected claim frequency in Homeowners Insurance were 0.04 claims per house-year, then 9738 / 0.04 ≅ 243,000 house-years would be a corresponding Standard for Full Credibility in terms of exposures. In general, one can divide the Standard for Full Credibility in terms of claims by μf, in order to get it in terms of exposures. Thus in general, the Standard for Full Credibility for the Severity in terms of number of exposures is:
n0 CVSev²/μf = (y²/k²)(σS²/μS²)(1/μf),
where CVSev is the coefficient of variation of the severity.
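A minimal sketch of the severity standard, assuming Python with scipy (the function name is mine); it reproduces the CV = 3 exercise above up to rounding:

```python
from scipy.stats import norm

def full_credibility_severity_claims(p, k, cv_severity):
    """Expected claims needed so the observed severity is within +/- k of the
    mean with probability P: n0 * CV^2."""
    y = norm.ppf((1.0 + p) / 2.0)
    return (y / k) ** 2 * cv_severity ** 2

print(round(full_credibility_severity_claims(0.90, 0.05, 3.0)))
# 9740; the text's 9738 comes from rounding n0 to 1082 claims before multiplying by 9
```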
Problems: 3.1 (2 points) You are given the following:
•
The claim amount distribution has mean 500, variance 5,000,000.
•
Frequency and severity are independent.
Find the number of claims required for full credibility, if you require that there will be a 80% chance that the estimate of the severity is correct within ±2%. A. Less than 60,000 B. At least 60,000 but less than 70,000 C. At least 70,000 but less than 80,000 D. At least 80,000 but less than 90,000 E. At least 90,000
3.2 (3 points) You are given the following:
•
The claim amount distribution is LogNormal, with σ = 1.5.
•
Frequency and severity are independent.
Find the number of claims required for full credibility, if you require that there will be a 95% chance that the estimate of the severity is correct within ±10%. A. Less than 2900 B. At least 2900, but less than 3000 C. At least 3000, but less than 3100 D. At least 3100, but less than 3200 E. At least 3200
3.3 (3 points) You are given the following:
•
The claim amount distribution is Pareto, with α = 2.3.
•
Frequency and severity are independent.
Find the number of claims required for full credibility, if you require that there will be a 90% chance that the estimate of the severity is correct within ±7.5%.
(A) 2900 (B) 3100 (C) 3300 (D) 3500 (E) 3700
3.4 (2 points) You require that there will be a 99% chance that the estimate of the severity is correct within ±5%. 17,000 claims are required for full credibility. Determine the coefficient of variation of the size of loss distribution. A. Less than 1 B. At least 1, but less than 2 C. At least 2, but less than 3 D. At least 3, but less than 4 E. At least 4 3.5 (2 points) You are given the following: • The estimated claim frequency is 4%. • Number of claims and claim severity are independent. • Claim severity has the following distribution: Claim Size Probability 10 .50 20 .30 50 . 20 Determine the number of exposures needed so that the estimated average size of claim is within 2% of the expected size with 95% probability. (A) 95,000 (B) 105,000 (C) 115,000 (D) 125,000 (E) 135,000 3.6 (3 points) An actuary is determining the number of claims needed for full credibility in three different situations: (1) Assuming claim severity is Pareto, the estimated claim severity is to be within r of the true value with probability p. (2) Assuming claim frequency is Binomial, the estimated claim frequency is to be within r of the true value with probability p. (3) Assuming claim severity is Exponential, the estimated claim severity is to be within r of the true value with probability p. The same values of r and p are chosen for each situation. Rank these three limited fluctuation full credibility standards from smallest to largest. (A) 1, 2, 3 (B) 2, 1, 3 (C) 3, 1, 2 (D) 2, 3, 1 (E) None of A, B, C, or D
3.7 (3 points) You wish to estimate the average insured damage per hurricane for hurricanes that hit the east or gulf coasts of the United States. The full credibility standard is to be within 10% of the expected severity 98% of the time. The insured damage from a single hurricane in millions of dollars is modeled as a mixture of five Exponential Distributions:
θ1 = 6         p1 = 24%
θ2 = 40        p2 = 26%
θ3 = 700       p3 = 30%
θ4 = 4000      p4 = 17%
θ5 = 18,000    p5 = 3%
Determine the number of hurricanes needed for full credibility.
A. 3000 B. 4000 C. 5000 D. 6000 E. 7000
Solutions to Problems:

3.1. D. y = 1.282 since Φ(1.282) = 0.90. n0 = y²/k² = (1.282/0.02)² = 4109. For severity, the Standard for Full Credibility is: n0 CV² = (4109)(5,000,000/500²) = (4109)(20) = 82,180 claims.

3.2. E. y = 1.960 since Φ(1.960) = 0.975. n0 = y²/k² = (1.960/0.1)² = 384. For the LogNormal Distribution: Mean = exp(μ + 0.5σ²), Variance = exp(2μ + σ²){exp(σ²) - 1}, and therefore the Coefficient of Variation = √{exp(σ²) - 1}. For σ = 1.5, CV² = exp(2.25) - 1 = 8.49. For severity, the Standard for Full Credibility is: n0 CV² = (384)(8.49) = 3260.

3.3. E. Φ(1.645) = 0.95, so that y = 1.645. n0 = y²/k² = (1.645/0.075)² = 481.
Using the formulas for the moments: CV² = E[X²]/E[X]² - 1 = {2θ²/[(α-1)(α-2)]} / {θ/(α-1)}² - 1 = 2(α-1)/(α-2) - 1 = α/(α-2). For α = 2.3, CV² = 2.3/0.3 = 7.667. Therefore n0 (CV²) = (481)(7.667) = 3688.
Comment: The smaller α, the heavier-tailed the Pareto Distribution, making it harder to limit fluctuations in the estimated severity, since a single large claim can affect the observed average severity. Therefore, the smaller α, the larger the Standard for Full Credibility.

3.4. C. Φ(2.576) = 0.995 = (1 + 0.99)/2, so that y = 2.576. n0 = y²/k² = (2.576/0.05)² = 2654. 17,000 = n0 CVSev². ⇒ CVSev = √(17,000/2654) = 2.53.

3.5. D. We have y = 1.960 since Φ(1.960) = 0.975. Therefore n0 = y²/k² = (1.960/0.02)² = 9604. The mean severity is: (10)(0.5) + (20)(0.3) + (50)(0.2) = 21. The variance of the severity is: (11²)(0.5) + (1²)(0.3) + (29²)(0.2) = 229. Thus the coefficient of variation squared = 229/21² = 0.519. n0 (CV²) = (9604)(0.519) = 4984 claims. This corresponds to: 4984/0.04 = 124,600 exposures.
3.6. D. (1) The coefficient of variation for the Pareto is greater than 1 (or infinite). Thus the Standard for Full Credibility for Severity is: CVSev² n0 > 1² n0 = n0.
(2) For the Binomial, variance/mean = mq(1-q)/(mq) = 1 - q. Thus the Standard for Full Credibility for Frequency is: (1 - q) n0 < n0.
(3) The CV for the Exponential is 1. Thus the Standard for Full Credibility for Severity is: CVSev² n0 = 1² n0 = n0.
Thus ranking the standards from smallest to largest: 2, 3, 1.
Comment: The CV of the Pareto is discussed in Section 30 of “Mahlerʼs Guide to Loss Distributions.” Since it is heavier tailed than the Exponential, when it is finite, the CV of the Pareto is greater than that of the Exponential. From its mean and second moment, one can determine that for a Pareto Distribution: Coefficient of Variation = √{α/(α - 2)}, α > 2.

3.7. D. P = 98%. ⇒ y = 2.326. k = 10%. ⇒ n0 = (2.326/0.1)² = 541 hurricanes. The first moment of the mixed severity is: (24%)(6) + (26%)(40) + (30%)(700) + (17%)(4000) + (3%)(18,000) = 1441.8. The second moment of each Exponential is 2θ². Thus the second moment of the mixed severity is: (24%)(2)(6²) + (26%)(2)(40²) + (30%)(2)(700²) + (17%)(2)(4000²) + (3%)(2)(18,000²) = 25,174,850. CV² = E[X²]/E[X]² - 1 = 25,174,850/1441.8² - 1 = 11.110. The standard for full credibility is: (11.110)(541) = 6010 hurricanes.
Comment: 167 hurricanes hit the continental United States from 1900 to 1999. The reported losses would have been adjusted to a current level for inflation, changes in per capita wealth (to represent the changes in property value above the rate of inflation), changes in insurance utilization, and changes in number of housing units (by county). See “A Macro Validation Dataset for U.S. Hurricane Models,” by Douglas J. Collins and Stephen P. Lowe, CAS Forum, Winter 2001. Using the criterion in this question, the credibility for a century of data used for estimating severity of hurricanes is: √(167/6010) = 17%.
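The mixture-moment bookkeeping in 3.7 is easy to check by machine. A minimal sketch (Python with scipy; the variable names are mine), which matches the hand calculation up to rounding:

```python
from scipy.stats import norm

# Mixture of Exponentials from problem 3.7: weights and means (in millions).
weights = [0.24, 0.26, 0.30, 0.17, 0.03]
thetas  = [6, 40, 700, 4000, 18000]

mean = sum(w * t for w, t in zip(weights, thetas))
second = sum(w * 2 * t**2 for w, t in zip(weights, thetas))  # E[X^2] = 2*theta^2 per component
cv2 = second / mean**2 - 1

y = norm.ppf((1 + 0.98) / 2)          # P = 98%
n0 = (y / 0.10) ** 2                  # k = 10%
print(round(cv2, 3), round(n0 * cv2)) # about 11.11 and 6012 (the text rounds to 6010)
```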
Section 4, Variance of Pure Premiums and Aggregate Losses28

The same formulas and techniques can be used to calculate the process variance of the aggregate losses or loss ratios. The loss ratio is defined as losses divided by premiums.

Exercise: XYZ Insurance insures 123,000 automobiles for one year. Total premiums are $57 million. Total loss payments are $48 million. What are the pure premium, aggregate annual loss, and loss ratio?
[Solution: The aggregate loss is $48 million. Pure premium = $48 million / 123,000 car-years = $390 per car-year. Loss Ratio = $48 million / $57 million = 84.2%.]

Aggregate Loss:

The Aggregate Loss is the total dollars of loss for an insured or set of insureds. If not stated otherwise, the period of time is one year. For example, during 1999 the MT Trucking Company may have had $952,000 in aggregate losses on its commercial automobile collision insurance policy. All of the trucking firms insured by the Fly-by-Night Insurance Company may have had $15.1 million in aggregate losses for collision. The dollars of aggregate losses are determined by how many losses there are and the severity of each one.

Exercise: During 1998 MT Trucking suffered three collision losses for $8,000, $13,500, and $22,000. What are its aggregate losses?
[Solution: $8,000 + $13,500 + $22,000 = $43,500.]

Aggregate Losses = (# of Exposures) (# of Claims / # of Exposures) ($ of Loss / # of Claims) = (Exposures) (Frequency) (Severity).

If one is not given the frequency per exposure, but is rather just given the frequency for the whole number of exposures,29 whatever they are for the particular situation, then Aggregate Losses = (Frequency) (Severity).

Similarly, the Aggregate Payment is the total dollars paid by an insurer on an insurance policy or set of insurance policies. If not stated otherwise, the period of time is one year.

28 This important material is covered in “Mahlerʼs Guide to Aggregate Distributions,” as well as here.
29 For example, the expected number of claims from a large commercial insured is 27.3 per year, or the expected number of Homeownerʼs claims expected by XYZ Insurer in the State of Florida is 12,310.
Exercise: During 1998 MT Trucking suffered three collision losses for $8,000, $13,500, and $22,000. MT Trucking has a $10,000 per claim deductible on its policy with the Fly-by-Night Insurance Company. What are the aggregate payments by Fly-by-Night? [Solution: $0 + $3,500 + $12,000 = $15,500.] Pure Premium: Pure Premium = Aggregate Loss per exposure. The mean pure premium is: (mean frequency per exposure)(mean severity). Expected Aggregate Loss = (Mean Pure Premium)(Exposure) Estimated expected pure premiums serve as a starting point for pricing insurance.30
Process Variance: Random fluctuation occurs when one rolls dice, spins spinners, picks balls from urns, etc. The observed result varies from time period to time period due to random chance. This is also true for the pure premium observed for a collection of insureds.31 The variance of the observation for a given risk that occurs due to random fluctuation is referred to as the process variance. That is what will be discussed here.32 Since pure premiums depend on both the number of claims and the size of claims, pure premiums have more reasons to vary than do either frequency or severity individually.
30 One would have to load for loss adjustment expenses, expenses, taxes, profits, etc.
31 In fact this is the fundamental reason for the existence of insurance.
32 The process variance is distinguished from the variance of the hypothetical pure premiums as discussed in Buhlmann Credibility.
Independent Frequency and Severity:

You are given the following:
• For a given risk, the number of claims for a single exposure period is given by a Binomial Distribution with q = 0.3 and m = 2.
• The size of a claim will be 50, with probability 80%, or 100, with probability 20%.
• Frequency and severity are independent.

Exercise: Determine the variance of the pure premium for this risk.
[Solution: List the possibilities and compute the first two moments:

Situation                         Probability   Pure Premium   Square of P.P.
0 claims                            49.00%             0                 0
1 claim @ 50                        33.60%            50             2,500
1 claim @ 100                        8.40%           100            10,000
2 claims @ 50 each                   5.76%           100            10,000
2 claims: 1 @ 50 & 1 @ 100           2.88%           150            22,500
2 claims @ 100 each                  0.36%           200            40,000

Overall                            100.0%             36             3,048
For example, the probability of 2 claims is: 0.3² = 9%. We split this 9% among the possible claim sizes: 50 and 50 @ (0.8)(0.8) = 64%, 50 and 100 @ (0.8)(0.2) = 16%, 100 and 50 @ (0.2)(0.8) = 16%, 100 and 100 @ (0.2)(0.2) = 4%. (9%)(64%) = 5.76%, (9%)(16% + 16%) = 2.88%, (9%)(4%) = 0.36%. One takes the weighted average over all the possibilities. The average Pure Premium is 36. The second moment of the Pure Premium is 3048. Therefore, the variance of the pure premium is: 3048 - 36² = 1752.]

In this case, since frequency and severity are independent, one can make use of the following formula:33
Process Variance of Pure Premium = (Mean Frequency)(Variance of Severity) + (Mean Severity)²(Variance of Frequency)
σPP² = μF σS² + μS² σF².
Memorize this formula! Note that each of the two terms has a mean and a variance, one from frequency and one from severity. Each term is in dollars squared; that is one way to remember that the mean severity (which is in dollars) enters as a square while the mean frequency (which is not in dollars) does not.
33 Equation 2.4.1 in Mahler and Dean.
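A minimal sketch of this formula (Python; the function name is illustrative), verifying the variance of 1752 computed above:

```python
def process_variance_pure_premium(mean_freq, var_freq, mean_sev, var_sev):
    """sigma_PP^2 = mu_F * sigma_S^2 + mu_S^2 * sigma_F^2 (frequency and severity independent)."""
    return mean_freq * var_sev + mean_sev ** 2 * var_freq

# Binomial frequency m = 2, q = 0.3; severity 50 (80%) or 100 (20%):
mean_freq, var_freq = 2 * 0.3, 2 * 0.3 * 0.7            # 0.6 and 0.42
mean_sev = 0.8 * 50 + 0.2 * 100                          # 60
var_sev = 0.8 * 50**2 + 0.2 * 100**2 - mean_sev**2       # 400
print(process_variance_pure_premium(mean_freq, var_freq, mean_sev, var_sev))  # 1752.0
```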
In the above example, the mean frequency is mq = 0.6 and the variance of the frequency is: mq(1 - q) = (2)(0.3)(0.7) = 0.42. The average severity is 60 and the variance of the severity is: (0.8)(10²) + (0.2)(40²) = 400. Thus, the process variance of the pure premium is: (0.6)(400) + (60²)(0.42) = 1752, which matches the result calculated previously.

This same formula can also be used to compute the process variance of the aggregate losses, when frequency and severity are independent: σA² = μF σS² + μS² σF². The sum of losses is just the product of the pure premium and the number of exposures. Provided the risk processes for the individual exposures are independent and identical, then both μF and σF² are multiplied by the number of exposures, as is σPP². In the above example, the process variance of the sum of the losses from 10 exposures is: (6)(400) + (60²)(4.2) = 17,520 = (10)(1752).

Dependent Frequency and Severity:

While frequency and severity are almost always independent, if they are dependent one can use a more general technique.34 The first and second moments can be calculated by listing the pure premiums for all the possible outcomes and taking the weighted average, applying the probabilities as weights to either the pure premium or its square. In continuous cases, this will involve taking integrals, rather than sums. Then one can calculate the variance of the pure premium as: second moment - (first moment)².

Aggregate Losses Versus Pure Premiums:

Exercise: Assume frequency is Poisson with mean 5% for one exposure. Severity is Exponential, with mean 100. What is the mean and variance of the pure premium?
[Solution: The mean pure premium = μF μS = (5%)(100) = 5. Variance of pure premium = μF σS² + μS² σF² = (5%)(100²) + (100²)(5%) = 1000.]

Exercise: If we insure 1000 independent, identically distributed exposures, what is the mean and variance of the aggregate loss?
[Solution: Overall frequency is Poisson with mean: (5%)(1000) = 50. Mean aggregate: (50)(100) = 5000. Variance of aggregate: (50)(100²) + (100²)(50) = 1 million.]

34 See 4B, 5/95, Q.14 and 4, 11/02, Q.36 for examples where frequency and severity are dependent.
So we can use basically the same formula for the mean and variance when working with either the aggregate losses or pure premiums.35 When working with pure premiums, we used 5% as the mean frequency and 5% as the variance of the frequency, the mean and variance of the frequency distribution for a single exposure. However, when working with the aggregate losses, we used 50 as the mean frequency and 50 as the variance of the frequency, the mean and variance of the frequency distribution of the whole portfolio. Note that when we add up 1000 independent, identically distributed exposures, we get 1000 times the mean and 1000 times the variance for a single exposure. In general, when we have N identical, independent exposures: Mean aggregate loss = (N)(mean pure premium). Variance of aggregate loss = (N)(variance of pure premium). Derivation of the formula for the Process Variance of the Pure Premium: The above formula for the process variance of the pure premium for independent frequency and severity is a special case of the formula that also underlies analysis of variance: Var(Y) = EX[VARY(Y|X)] + VARX(EY[Y|X]), where X and Y are any random variables. Letting Y be the pure premium PP and X be the number of claims N in the above formula gives: Var(PP) = EN[VARPP(PP|N)] + VARN(EPP[PP|N]) = EN[NσS2 ] + VARN(µSN) = EN[N]σS2 + µS2 VARN(N) = µF σS2 + µS2 σF2 . Where I have used the assumption that the frequency and severity are independent and the facts: • For a fixed number of claims N, the variance of the pure premium is the variance of the sum of N independent identically distributed variables each with variance σS2 . (Since frequency and severity are assumed independent, σS2 is the same for each value of N.) Such variances add so that VARPP(PP | N) = NσS2 . •
For a fixed number of claims N, for frequency and severity independent the expected value of the pure premium is N times the mean severity: EPP[PP | N] = µSN.
•
Since with respect to N the variance of the severity acts as a constant: EN[NσS2 ] = σS2 EN[N] = µF σS2 .
•
Since with respect to N the mean of the severity acts as a constant: VARN(µSN) = µS2 VARN(N) = µS2 σF2 .
35
One can define the whole portfolio as one “exposure”; then the Aggregate Loss is mathematically just a special case of the Pure Premium.
Letʼs apply this derivation to a previous example. You were given the following:
• For a given risk, the number of claims for a single exposure period is given by a Binomial Distribution with q = 0.3 and m = 2.
• The size of the claim will be 50, with probability 80%, or 100, with probability 20%.
• Frequency and severity are independent.
There are only three possible values of N: N = 0, N = 1 or N = 2. If N = 0, then PP = 0. If N = 1, then either PP = 50 with 80% chance or PP = 100 with 20% chance. If N = 2, then PP = 100 with 64% chance, PP = 150 with 32% chance, or PP = 200 with 4% chance. We then get:

N       Probability   Mean of PP   Square of Mean   Second Moment   Var of PP
                      Given N      of PP Given N    of PP Given N   Given N
0          49%             0              0                0             0
1          42%            60           3600             4000           400
2           9%           120         14,400           15,200           800
Mean                      36           2808                            240
For example given two claims the second moment of the pure premium = (64%)(1002 ) + (32%)(1502 ) + (4%)(2002 ) = 15200. Thus given two claims the variance of the pure premium is: 15200 - 1202 = 800. Thus EN[VARPP(PP|N)] = 240, and VARN(EPP[PP|N]) = 2808 - 362 = 1512. Thus the variance of the pure premium is EN[VARPP(PP|N)] + VARN(EPP[PP|N]) = 240 + 1512 = 1752, which matches the result calculated above. The (total) process variance of the pure premium has been split into two pieces. The first piece calculated as 240, is the expected value over the possible numbers of claims of the process variance of the pure premium for fixed N. The second piece calculated as 1512, is the variance over the possible numbers of the claims of the mean pure premium for fixed N. Expected Value of the Process Variance: In order to solve questions involving Greatest Accuracy/Buhlmann Credibility and Pure Premiums or Aggregate Losses one has to compute the Expected Value of the Process Variance of the Pure Premium or Aggregate Losses.36 This involves being able to compute the process variance for each specific type of risk and then averaging over the different types of risks possible. This may involve taking a weighted average or performing an integral.
36
“See “Mahlerʼs Guide to Buhlmann Credibility.”
Poisson Frequency:

Assume you are given the following:
• For a given risk, the number of claims for a single exposure period is Poisson with mean 7.
• The size of the claim will be 50, with probability 80%, or 100, with probability 20%.
• Frequency and severity are independent.

Exercise: Determine the variance of the pure premium for this risk.
[Solution: μF = σF² = 7. μS = 60. σS² = 400.
σPP² = μF σS² + μS² σF² = (7)(400) + (60²)(7) = 28,000.]

In the case of a Poisson frequency with independent frequency and severity, the formula for the process variance of the pure premium simplifies. Since μF = σF²:
σPP² = μF σS² + μS² σF² = μF(σS² + μS²) = μF (2nd moment of the severity).
When there is a Poisson frequency, the variance of aggregate losses is: λ (2nd moment of severity). In the example above, the second moment of the severity is: (0.8)(50²) + (0.2)(100²) = 4000. Thus σPP² = μF (2nd moment of the severity) = (7)(4000) = 28,000. If instead we have 20 independent exposures and take the sum of the losses, then the variance of these aggregate losses is (140)(4000) = (20)(28,000) = 560,000.

As another example, assume you are given the following:
• For a given risk, the number of claims for a single exposure period is Poisson with mean 3645.
• The severity distribution is LogNormal, with parameters μ = 5 and σ = 1.5.
• Frequency and severity are independent.

Exercise: Determine the variance of the pure premium for this risk.
[Solution: The second moment of the severity = exp(2μ + 2σ²) = exp(14.5) = 1,982,759.264.
Thus σPP² = μF (2nd moment of the severity) = (3645)(1,982,759) = 7.22716 x 10⁹.]
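A minimal sketch of the Poisson shortcut (Python; the function name is mine), reproducing both examples above:

```python
from math import exp

def compound_poisson_variance(lam, second_moment_severity):
    """For Poisson frequency, the variance of aggregate losses (or of the pure
    premium for one exposure) is lambda times the second moment of severity."""
    return lam * second_moment_severity

# Mean 7, severity 50 (80%) or 100 (20%): second moment 4000
print(compound_poisson_variance(7, 0.8 * 50**2 + 0.2 * 100**2))      # 28000.0

# Mean 3645, LogNormal(mu = 5, sigma = 1.5): E[X^2] = exp(2*mu + 2*sigma^2)
print(compound_poisson_variance(3645, exp(2 * 5 + 2 * 1.5**2)))      # about 7.23e9
```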
Normal Approximation: For large numbers of expected claims, the observed pure premiums are approximately Normally Distributed.37 For example, continuing the example above, mean severity = exp(µ + 0.5σ2 ) = exp(6.125) = 457.14. Thus the mean pure premium is (3645)(457.14) = 1,666,292. One could ask what the chance of the observed pure premiums being between 1.4997 million and 1.8329 million. Since the variance is 7.22716 x 109 , the standard deviation of the pure premium is 85,013. Thus the probability of the observed pure premiums being within ±10% of 1.6663 million is approximately: Φ[(1.8329 million - 1.6663 million) / 85,013] - Φ[(1.4997 - 1.6663 million) / 85,013] = Φ[1.96] - Φ[−1.96] = 0.975 - (1 - 0.975) = 95%. Thus in this case with an expected number of claims equal to 3645, there is about a 95% chance that the observed pure premium will be within ±10% of the expected value. One could turn this around and ask how many claims would one need in order to have a 95% chance that the observed pure premium will be within ±10% of the expected value. The answer of 3645 claims could be taken as a Standard for Full Credibility for the Pure Premium.38
37
The more skewed the severity distribution, the higher the expected frequency has to be for the Normal Approximation to produce worthwhile results. 38 As discussed in the next section.
Policies of Different Types: Let us assume we have a portfolio consisting of two types of policies:
Type   Number of Policies   Mean Aggregate Loss per Policy   Variance of Aggregate Loss per Policy
A      10                   6                                3
B      20                   9                                4

Assuming the results of each policy are independent, then the mean aggregate loss for the portfolio is: (10)(6) + (20)(9) = 240. The variance of aggregate loss for the portfolio is: (10)(3) + (20)(4) = 110. For independent policies, the means and variances add. Note that as we have more policies, all other things being equal, the coefficient of variation goes down.

Exercise: Compare the coefficient of variation of aggregate losses in the above example to that if one had instead 100 policies of Type A and 200 policies of Type B.
[Solution: For the original example, CV = √110 / 240 = 0.043. For the new example, CV = √1100 / 2400 = 0.0138.]

Exercise: For each of the two cases in the previous exercise, using the Normal Approximation estimate the probability that the aggregate losses will be at least 5% more than their mean.
[Solution: For the original example, Prob[Agg. > 252] ≅ 1 - Φ[(252 - 240)/√110] = 1 - Φ[1.144] = 12.6%.
For the new example, Prob[Agg. > 2520] ≅ 1 - Φ[(2520 - 2400)/√1100] = 1 - Φ[3.618] = 0.015%.]
For a larger portfolio, all else being equal, there is less chance of an extreme outcome in a given year, measured as a percentage of the mean.
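A small Python sketch (mine, not the author's) reproduces both exercises; the helper name is hypothetical:

import math
from statistics import NormalDist

def portfolio_tail_prob(policies, pct_above_mean):
    # policies: list of (number of policies, mean aggregate loss per policy,
    # variance of aggregate loss per policy). For independent policies the
    # means and variances simply add.
    mean = sum(n * m for n, m, v in policies)
    var = sum(n * v for n, m, v in policies)
    z = pct_above_mean * mean / math.sqrt(var)
    return 1 - NormalDist().cdf(z)

print(portfolio_tail_prob([(10, 6, 3), (20, 9, 4)], 0.05))      # about 12.6%
print(portfolio_tail_prob([(100, 6, 3), (200, 9, 4)], 0.05))    # about 0.015%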
Problems:

Use the following information for the next five questions:
• The number of claims for a single year is Poisson with mean 6200.
• The severity distribution is LogNormal, with parameters µ = 5 and σ = 0.6.
• Frequency and severity are independent.
4.1 (1 point) Determine the expected annual aggregate losses. A. Less than 0.8 million B. At least 0.8 million but less than 0.9 million C. At least 0.9 million but less than 1.0 million D. At least 1.0 million but less than 1.1 million E. At least 1.1 million 4.2 (2 points) Determine the variance of the annual aggregate losses. A. Less than 270 million B. At least 270 million but less than 275 million C. At least 275 million but less than 280 million D. At least 280 million but less than 285 million E. At least 285 million 4.3 (2 points) Determine the chance that the observed annual aggregate losses will be more than 1.130 million. (Use the Normal Approximation.) A. Less than 4% B. At least 4%, but less than 5% C. At least 5%, but less than 6% D. At least 6%, but less than 7% E. At least 7% 4.4 (2 points) Determine the chance that the observed annual aggregate losses will be less than 1.075 million. (Use the Normal Approximation.) A. Less than 4% B. At least 4%, but less than 5% C. At least 5%, but less than 6% D. At least 6%, but less than 7% E. At least 7% 4.5 (1 point) Determine the chance that the observed annual aggregate losses will be within ±2.5% of its expected value. (Use the Normal Approximation.) A. 86% B. 88% C. 90% D. 92% E. 94%
Use the following information for the next three questions:
There are two types of risks. For each type of risk, the frequency and severity are independent.
Type   Frequency Distribution   Severity Distribution
1      Poisson: λ = 4%          Gamma: α = 3, θ = 10
2      Poisson: λ = 6%          Gamma: α = 3, θ = 15
4.6 (1 point) Calculate the process variance of the pure premium for Type 1. A. 48 B. 50 C. 52 D. 54 E. 56 4.7 (1 point) Calculate the process variance of the pure premium for Type 2. A. 150 B. 156 C. 162 D. 168 E. 174 4.8 (1 point) Assume one has a portfolio made up of 80% risks of Type 1, and 20% risks of Type 2. For this portfolio, what is the expected value of the process variance of the pure premium? A. 65
B. 67   C. 69   D. 71   E. 73

Use the following information for the next 3 questions:
• Number of claims for a single insured follows a Negative Binomial distribution, with parameters r = 30 and β = 2/3.
• The amount of a single claim has a Gamma distribution with α = 4 and θ = 1000.
• Number of claims and claim severity distributions are independent.
4.9 (2 points) Determine EN[VARPP(PP | N)], the expected value over the number of possible claims of the variance of the pure premium for a given number of claims. A. 50 million B. 60 million C. 70 million D. 80 million E. 90 million 4.10 (2 points) Determine VARN(EPP[PP|N]), the variance over the number of claims of the expected value of the pure premium for a given number of claims. A. Less than 400 million B. At least 400 million but less than 450 million C. At least 450 million but less than 500 million D. At least 500 million but less than 550 million E. At least 550 million 4.11 (2 points) Determine the pure premium's process variance for a single insured. A. 575 million B. 585 million C. 595 million D. 605 million E. 615 million
Use the following information for the next four questions:
There are three types of risks. For each type of risk, the frequency and severity are independent.
Type   Frequency Distribution                Severity Distribution
Ι      Binomial: m = 10, q = 0.3             Pareto: α = 3, θ = 500
ΙΙ     Poisson: λ = 5                        LogNormal: µ = 6, σ = 0.8
ΙΙΙ    Negative Binomial: r = 2.7, β = 7/3   Gamma: α = 2, θ = 250
4.12 ( 2 points) For a risk of Type Ι, what is the process variance of the pure premium? A. Less than 0.5 million B. At least 0.5 million but less than 0.6 million C. At least 0.6 million but less than 0.7 million D. At least 0.7 million but less than 0.8 million E. At least 0.8 million 4.13 ( 2 points) For a risk of Type ΙΙ, what is the process variance of the pure premium? A. Less than 2.7 million B. At least 2.7 million but less than 2.8 million C. At least 2.8 million but less than 2.9 million D. At least 2.9 million but less than 3.0 million E. At least 3 million 4.14 ( 2 points) For a risk of Type ΙΙΙ, what is the process variance of the pure premium? A. Less than 5.7 million B. At least 5.7 million but less than 5.8 million C. At least 5.8 million but less than 5.9 million D. At least 5.9 million but less than 6.0 million E. At least 6.0 million 4.15 ( 2 points) Assume one has a portfolio made up of 55% risks of Type Ι, 35% risks of Type ΙΙ, and 10% risks of Type ΙΙΙ. For this portfolio, what is the expected value of the process variance of the pure premium? A. Less than 1.7 million B. At least 1.7 million but less than 1.8 million C. At least 1.8 million but less than 1.9 million D. At least 1.9 million but less than 2.0 million E. At least 2.0 million
4.16 (4, 5/89, Q.35) (1 point) For a given risk situation, the frequency distribution follows the Poisson process with mean 0.5. The second moment about the origin for the severity distribution is 1,000. Frequency and severity are independent of each other. What is the process variance of the aggregate claim amount?
A. 500   B. (0.5)²(1000)   C. 1000   D. 0.5   E. Cannot be determined from the information given
4.17 (4, 5/90, Q.43) (2 points) Let N be a random variable for the claim count with:
Pr{N = 4} = 1/4, Pr{N = 5} = 1/2, Pr{N = 6} = 1/4.
Let X be a random variable for claim severity with probability density function f(x) = 3x⁻⁴, for 1 ≤ x < ∞.
Find the coefficient of variation, R, of the aggregate loss distribution, assuming that claim severity and frequency are independent.
A. R < 0.35   B. 0.35 ≤ R < 0.50   C. 0.50 ≤ R < 0.65   D. 0.65 ≤ R < 0.70   E. 0.70 ≤ R

4.18 (4, 5/91, Q.26) (2 points) The probability function of claims per year for an individual risk is Poisson with a mean of 0.10. There are four types of claims. The number of claims has a Poisson distribution for each type of claim. The table below describes the characteristics of the four types of claims.
Type of Claim   Mean Frequency   Severity Mean   Severity Variance
W               0.02             200             2,500
X               0.03             1,000           1,000,000
Y               0.04             100             0
Z               0.01             1,500           2,000,000
Calculate the variance of the pure premium.
A. Less than 70,000
B. At least 70,000 but less than 80,000
C. At least 80,000 but less than 90,000
D. At least 90,000 but less than 100,000
E. At least 100,000
4.19 (4B, 5/92, Q.31) (2 points) You are given that N and X are independent random variables where: • N is the number of claims, and has a binomial distribution with parameters m = 3 and q = 1/6. • X is the size of claim and has the following distribution: P[X=100] = 2/3 P[X=1100] = 1/6 P[X=2100] = 1/6 Determine the coefficient of variation of the aggregate loss distribution. A. Less than 1.5 B. At least 1.5 but less than 2.5 C. At least 2.5 but less than 3.5 D. At least 3.5 but less than 4.5 E. At least 4.5 4.20 (5A, 5/94, Q.22) (1 point) The probability of a particular automobile's being in an accident in a given time period is 0.05. The probability of more than one accident in the time period is zero. The damage to the automobile is assumed to be uniformly distributed over the interval from 0 to 2000. What is the variance of the pure premium? A. Less than 40,000 B. At least 40,000, but less than 50,000 C. At least 50,000, but less than 60,000 D. At least 60,000, but less than 70,000 E. 70,000 or more 4.21 (5A, 5/94, Q.35) (2 points) Your company plans to sell a certain type of policy that is expected to have a claim frequency per policy of 0.15, and a claim size distribution with a mean of 1200 and a standard deviation of 2000. Management believes that 40,000 of these policies can be written this year. Assume that for the portfolio of policies, the number of claims is Poisson distributed. Assume that the premium for each policy is 105% of expected losses. Ignore expenses. What is the amount of surplus that must be held for this portfolio such that the probability that the surplus will be exhausted is .005? 4.22 (5A, 5/94, Q.39) (2 points) Your company plans to sell a certain policy but will not commit any surplus to support it. You have determined that the policy will have a mean frequency per policy of 0.045, and a claim size distribution with a mean of 750 and a second moment about the origin of 60,000,000. The price that is suggested is 105% of expected losses. Management will allow the policy to be written only if the probability that losses will exceed premiums is less than 1%. Ignore expenses and assume that for the portfolio of policies, the number of claims is Poisson distributed. What is the smallest number of policies that must be sold in order to satisfy management's requirement?
4.23 (5A, 11/94, Q.22) (1 point) Assume S is a compound Poisson distribution of aggregate claims with a Poisson parameter of 3. Individual claims are uniformly distributed with integer values from 1 to 6. What is the variance of S?
A. Less than 30
B. At least 30, but less than 40
C. At least 40, but less than 50
D. At least 50, but less than 60
E. Greater than or equal to 60

4.24 (5A, 11/94, Q.38) (3 points) Your company's automobile liability portfolio consists of three tiers. You have determined that the aggregate claim distribution for each tier is compound Poisson, characterized by the following:
                    Tier 1   Tier 2   Tier 3
Poisson parameter   2.3      3.0      1.9
Pr[X = xi | a claim has occurred]
x1 = 1,000          0.60     0.70     0.80
x2 = 5,000          0.30     0.20     0.15
x3 = 10,000         0.10     0.10     0.05
What are the mean and variance of the aggregate claim distribution for the entire automobile portfolio? 4.25 (4B, 5/95, Q.14) (3 points) You are given the following: • For a given risk, the number of claims for a single exposure period will be 1, with probability 3/4; or 2, with probability 1/4. • If only one claim is incurred, the size of the claim will be 80, with probability 2/3; or 160, with probability 1/3. • If two claims are incurred, the size of each claim, independent of the other, will be 80, with probability 1/2; or 160, with probability 1/2. Determine the variance of the pure premium for this risk. A. Less than 3,600 B. At least 3,600, but less than 4,300 C. At least 4,300, but less than 5,000 D. At least 5,000, but less than 5,700 E. At least 5,700
4.26 (5A, 5/95, Q.20) (1 point) Assume S is compound Poisson with a mean number of claims = 4. Individual claims will be of amounts 100, 200, and 500 with probabilities 0.4, 0.5, and 0.1, respectively. What is the variance of S?
A. Less than 150,000
B. At least 150,000, but less than 175,000
C. At least 175,000, but less than 200,000
D. At least 200,000 but less than 225,000
E. Greater than or equal to 225,000

4.27 (4B, 5/96, Q.7) (3 points) You are given the following:
• The number of claims follows a negative binomial distribution with mean 800 and variance 3,200.
• Claim sizes follow a transformed gamma distribution with mean 3,000 and variance 36,000,000.
• The number of claims and claim sizes are independent.
Using the Central Limit Theorem, determine the approximate probability that the aggregate losses will exceed 3,000,000.
A. Less than 0.005
B. At least 0.005, but less than 0.01
C. At least 0.01, but less than 0.1
D. At least 0.1, but less than 0.5
E. At least 0.5
4.28 (4B, 5/96, Q.18) (2 points) Two dice, A and B, are used to determine the number of claims. The faces of each die are marked with either a 1 or a 2, where 1 represents 1 claim and 2 represents 2 claims. The probabilities for each die are:
Die   Probability of 1 Claim   Probability of 2 Claims
A     2/3                      1/3
B     1/3                      2/3
In addition, there are two spinners, X and Y, which are used to determine claim size. Each spinner has two areas marked 2 and 5. The probabilities for each spinner are:
Spinner   Probability that Claim Size = 2   Probability that Claim Size = 5
X         2/3                               1/3
Y         1/3                               2/3
For the first trial, a die is randomly selected from A and B and rolled. If 1 claim occurs, spinner X is spun. If 2 claims occur, both spinner X and spinner Y are spun. For the second trial, the same die selected in the first trial is rolled again. If 1 claim occurs, spinner X is spun. If 2 claims occur, both spinner X and spinner Y are spun.
Determine the expected amount of total losses for the first trial.
A. Less than 4.8
B. At least 4.8, but less than 5.1
C. At least 5.1, but less than 5.4
D. At least 5.4, but less than 5.7
E. At least 5.7

4.29 (3 points) In the previous question, 4B, 5/96, Q.18, determine the variance of the distribution of total losses for the first trial.
A. 4   B. 5   C. 6   D. 7   E. 8
4.30 (5A, 5/96, Q.37) (2.5 points) Given the following information regarding a single commercial property exposure: The probability of claim in a policy period is 0.2. Each risk has at most one claim per period. The distribution of individual claim amounts is LogNormal with parameters µ = 7.54 and σ = 1.14. Assume that all exposures are independent and identically distributed. Using the normal approximation, how many exposures must an insurer write to be 95% sure that the total loss does not exceed twice the expected loss?
4.31 (5A, 11/96, Q.38) (2 points) A portfolio of insurance policies is assumed to follow a compound Poisson claims process with 100 claims expected. The claim amount distribution is assumed to have an expected value of 1,000 and variance of 1,000,000. These insureds would like to self insure their risk provided that there is no more than a 5% chance of insolvency in the first year. If the premium equals the expected loss, and if there are no other risks, then how much capital must the insureds possess in order to meet their solvency requirement? 4.32 (5A, 11/98, Q.22) (1 point) Assume S is compound Poisson with mean number of claims (N) equal to 3. Individual claim amounts follow a distribution with E[X] = 560 and Var[X] = 194,400. What is the variance of S? A. Less than 1,500,000 B. At least 1,500,000, but less than 1,750,000 C. At least 1,750,000, but less than 2,000,000 D. At least 2,000,000, but less than 2,250,000 E. At least 2,250,000 4.33 (5A, 11/98, Q.36) (2 points) Assume the following: i. S = X1 + X2 + X3 +...+ XN where X1 , X2 , X3 ,... XN are identically distributed and N, X1 , X2 , X3 , . . ., XN are mutually independent random variables. ii. N follows a Poisson distribution with λ = 4. iii. Expected value of the variance of S given N, E[Var(S | N)] = 1,344. iv. Var(N) [E(X)]2 = 4,096. Calculate E[X2 ]. 4.34 (5A, 5/99, Q.23) (1 point) Let S be the aggregate amount of claims. The number of claims, N, has the following probability function: Pr(N=0) = 0.25, Pr(N = 1) = 0.25, and Pr(N=2) = 0.50. Each claim size is independent and is uniformly distributed over the interval (2, 6). The number of claims and the claim sizes are mutually independent. What is Var(S)? A. Less than 6 B. At least 6, but less than 9 C. At least 9, but less than 12 D. At least 12, but less than 15 E. At least 15 4.35 (5A, 5/99, Q.36) (2 points) In a given time period, the probability that a particular automobile insurance policyholder will have a physical damage claim is 0.05. Assume that the policyholder can have at most one claim during the given time period. If a physical damage claim is made, the cost of the damages is uniformly distributed over the interval (0, 5000). Calculate the mean and variance of aggregate policy losses within the given time period.
Use the following information for the next two questions:
• The number of claims per year follows a Poisson distribution with mean 300.
• Claim sizes follow a Generalized Pareto distribution, as per Loss Models, with parameters θ = 1,000, α = 3, and τ = 2.
• The nth moment of a Generalized Pareto Distribution is: E[Xⁿ] = θⁿ Γ(α - n) Γ(τ + n) / {Γ(α) Γ(τ)}, for α > n.
• The number of claims and claim sizes are independent.

4.36 (4B, 11/99, Q.12) (2 points) Using the Normal Approximation, determine the probability that annual aggregate losses will exceed 360,000.
A. Less than 0.01
B. At least 0.01, but less than 0.03
C. At least 0.03, but less than 0.05
D. At least 0.05, but less than 0.07
E. At least 0.07

4.37 (4B, 11/99, Q.13) (2 points) After a number of years, the number of claims per year still follows a Poisson distribution, but the expected number of claims per year has been cut in half. Claim sizes have increased uniformly by a factor of two. Using the Normal Approximation, determine the probability that annual aggregate losses will exceed 360,000.
A. Less than 0.01
B. At least 0.01, but less than 0.03
C. At least 0.03, but less than 0.05
D. At least 0.05, but less than 0.07
E. At least 0.07

4.38 (Course 151 Sample Exam #1, Q.4) (0.8 points) For an insurance portfolio:
(i) the number of claims has the probability distribution
n    p(n)
0    0.4
1    0.3
2    0.2
3    0.1
(ii) each claim amount has a Poisson distribution with mean 4
(iii) the number of claims and claim amounts are mutually independent.
Determine the variance of aggregate claims.
(A) 8   (B) 12   (C) 16   (D) 20   (E) 24
4.39 (Course 151 Sample Exam #2, Q.4) (0.8 points) You are given S = S1 + S2, where S1 and S2 are independent and have compound Poisson distributions with the following characteristics:
(i) λ1 = 2 and λ2 = 3
(ii)
x    p1(x)   p2(x)
1    0.6     0.1
2    0.4     0.3
3    0.0     0.5
4    0.0     0.1
Determine the variance of S.
(A) 15.1   (B) 18.6   (C) 22.1   (D) 26.6   (E) 30.1
4.40 (Course 151 Sample Exam #3, Q.1) (0.8 points) For a portfolio of insurance, you are given the distribution of number of claims:
n     Pr(N=n)
0     0.40
5     0.10
10    0.50
and the distribution of the claim amounts:
x     p(x)
1     0.90
2     0.10
Individual claim amounts and the number of claims are mutually independent.
Determine the variance of aggregate claims.
(A) 22.3   (B) 24.1   (C) 25.0   (D) 26.9   (E) 27.4

4.41 (Course 151 Sample Exam #3, Q.13) (1.7 points) You are given:
• The number of claims is given by a mixed Poisson with an Inverse Gaussian mixing distribution, with µ = 500 and θ = 5000.
• The number and amount of claims are independent.
• The mean aggregate loss is 1000.
• The variance of aggregate losses is 150,000.
Determine the variance of the claim amount distribution.
(A) 88   (B) 92   (C) 96   (D) 100   (E) 104
4.42 (4, 11/02, Q.36 & 2009 Sample Q. 53) (2.5 points) You are given:
Number of Claims   Probability   Claim Size   Probability
0                  1/5
1                  3/5           25           1/3
                                 150          2/3
2                  1/5           50           2/3
                                 200          1/3
Claim sizes are independent.
Determine the variance of the aggregate loss.
(A) 4,050   (B) 8,100   (C) 10,500   (D) 12,510   (E) 15,612
Solutions to Problems: 4.1. E. The mean severity = exp(µ + 0.5σ2 ) = exp(5.18) = 177.6828. Thus the mean aggregate losses are: (6200)(177.6828) = 1,101,633. 4.2. D. The second moment of the severity = exp(2µ + 2σ2 ) = exp(10.72) = 45,252. Thus since the frequency is Poisson and independent of the severity:
σPP² = µF(2nd moment of the severity) = (6200)(45,252) = 280.56 million.

4.3. B. Since the variance is 280.56 million, the standard deviation of the aggregate losses is 16,750. Thus the probability of the observed aggregate losses being more than 1130 thousand is approximately: 1 - Φ[(1130 - 1101.63) / 16.75] = 1 - Φ[1.69] = 1 - 0.9545 = 4.55%.

4.4. C. Prob[aggregate losses < 1075 thousand] ≅ Φ[(1075 - 1101.63) / 16.75] = Φ(-1.59) = 1 - 0.9441 = 5.59%.

4.5. C. Using the solutions to the prior two questions: 1 - 4.55% - 5.59% = 89.9%.
Comment: If one were asked for the Full Credibility criterion for Aggregate Losses corresponding to a 90% chance of being within ±2.5% of the expected aggregate losses, in the case of a Poisson frequency, as explained in the next section the answer would be: (y/k)²(1 + CV²) = (1.645/0.025)² exp(σ²) = 4330(1.4333) = 6206 claims. Note that for the LogNormal Distribution: 1 + CV² = exp(σ²) = exp(0.6²) = 1.4333. That is just another way of saying there is about a 90% chance of being within ±2.5% of the expected aggregate losses when one has about 6200 expected claims.

4.6. A. µf = σf² = λ = 0.04. µs = αθ = 30. σs² = αθ² = 300.
σPP² = µfσs² + µs²σf² = (0.04)(300) + (30²)(0.04) = 48.

4.7. C. σPP² = λ(second moment of severity) = (0.06){α(α + 1)θ²} = (0.06)(3)(4)(15²) = 162.

4.8. D. EPV = (80%)(48) + (20%)(162) = 70.8.
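The answers to 4.1 through 4.5 can be reproduced with a short Python script (my own check, standard library only):

import math
from statistics import NormalDist

lam, mu, sigma = 6200, 5.0, 0.6
mean_agg = lam * math.exp(mu + 0.5 * sigma ** 2)        # about 1,101,633     (4.1)
var_agg = lam * math.exp(2 * mu + 2 * sigma ** 2)       # about 280.6 million (4.2)
sd_agg = math.sqrt(var_agg)

N = NormalDist()
print(1 - N.cdf((1_130_000 - mean_agg) / sd_agg))       # about 4.5%   (4.3)
print(N.cdf((1_075_000 - mean_agg) / sd_agg))           # about 5.6%   (4.4)
z = 0.025 * mean_agg / sd_agg
print(N.cdf(z) - N.cdf(-z))                             # about 90%    (4.5)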
4.9. D. For a fixed number of claims N, the variance of the pure premium is the variance of the sum of N independent identically distributed variables each with variance σS2 . (Since frequency and severity are assumed independent, σS2 is the same for each value of N.) Such variances add so that VARPP(PP|N) = NσS2 . EN[VARPP(PP|N)] = EN[NσS2 ] = σS2 EN[N] = σS2 µF. For the Negative Binomial Distribution: mean = rβ = (30)(2/3) = 20. For the Gamma the variance = αθ2 = 4(10002 ) = 4,000,000. Thus EN[VARPP(PP|N)] = σS2 µF = (20)(4 million) = 80 million. 4.10. D. For a fixed number of claims N, with frequency and severity independent, the expected value of the pure premium is N times the mean severity: EPP[PP|N] = µSN. VARN(EPP[PP|N]) = VARN( µSN) = µS2 VARN(N) = µS2 σF2 . For the Negative Binomial: variance = rβ(1+β) = (30)(2/3)(5/3) = 33.33. For the Gamma the mean is αθ = 4(1000) = 4000. Therefore, VARN(EPP[PP|N]) = µS2 σF2 = (4000)2 (33.33) = 533.3 million. 4.11. E. For the Negative Binomial Distribution: mean = 20, variance = 33.33. For the Gamma the mean 4000, variance = 4,000,000. Thus σPP2 = µF σS2 + µS2 σF2 = (20)(4 million) + (4000)2 (33.33) = 80 million + 533.3 million = 613.3 million. Comment: Note that the process variance is also the sum of the answers to the two previous questions: 80 million + 533.3 million = 613.3 million. This is the analysis of variance that is used in the derivation of the formula used to solve this problem. 4.12. C. For the Binomial frequency: mean = mq = 3, variance = mq(1-q) = (10)(0.3)(0.7) = 2.1. For the Pareto severity: mean = θ / (α-1) = 500 / 2 = 250, variance =
αθ² / {(α − 1)²(α − 2)} = (3)(500²) / {(3 − 1)²(3 − 2)} = 187,500.
Since the frequency and severity are independent:
σPP2 = µF σS2 + µS2 σF2 = (3)(187,500) + (2502 )(2.1) = 693,750.
4.13. D. For the Poisson frequency: mean = variance = λ = 5. For the LogNormal severity: Mean = exp(µ + 0.5 σ2) = exp[6 + (0.5 )(0.82 )] = 555.573, Variance = exp(2µ + σ2) {exp( σ2) - 1} = exp[2(6) +(0.82 )] (exp[0.82 ] - 1) = (308661.3) (1.89648 -1) = 276,709. Since the frequency and severity are independent:
σPP2 = µF σS2 + µS2 σF2 = (5)(276709) + (555.5732 )(5) = 2,926,852. Alternately, since the frequency is Poisson and the frequency and severity are independent:
σPP2 = (mean frequency)(2nd moment of the severity). The 2nd moment of a LogNormal Distribution is: exp(2µ + 2σ2) = exp[2(6) + 2(0.82 )] = exp(13.28) = 585370.3. Therefore,
σPP2 = (mean frequency)(2nd moment of the severity) = (5)(585370.3) = 2,926,852. 4.14. E. For the Negative Binomial frequency: mean = rβ = (2.7)(7/3)= 6.3, variance = rβ(1+β) = (2.7)(7/3)(10/3) = 21. For the Gamma severity: mean = αθ = 2(250) = 500, variance = αθ2 = 2 (250)2 = 125000. Since the frequency and severity are independent:
σPP2 = µF σS2 + µS2 σF2 = (6.3)(125000) + (5002 )(21) = 6,037,500. 4.15. E. (55%)(693,750) + (35%)(2,926,852) + (10%) (6,037,500) = 2,009,711. 4.16. A. For a Poisson frequency, σPP2 = µF(2nd moment of the severity) = (0.5)(1000) = 500.
4.17. A. The mean frequency = (1/4)(4) + (1/2)(5) + (1/4)(6) = 5.
2nd moment of frequency = (1/4)(4²) + (1/2)(5²) + (1/4)(6²) = 25.5.
The variance of the frequency = 25.5 - 5² = 0.5.
Mean severity = ∫ x f(x) dx from 1 to ∞ = ∫ 3x⁻³ dx from 1 to ∞ = [-3x⁻²/2] evaluated from 1 to ∞ = 3/2.
Second moment = ∫ x² f(x) dx from 1 to ∞ = ∫ 3x⁻² dx from 1 to ∞ = [-3x⁻¹] evaluated from 1 to ∞ = 3.
Thus the variance of the severity is: 3 - (3/2)² = 3/4.
For independent frequency and severity, the variance of the pure premiums = (mean frequency)(variance of severity) + (mean severity)²(variance of frequency) = (5)(3/4) + (3/2)²(0.5) = 4.875.
The mean of the pure premium is (mean frequency)(mean severity) = (5)(3/2) = 7.5.
The coefficient of variation of the pure premium = √(variance of P.P.) / (mean of P.P.) = √4.875 / 7.5 = 0.294.
Comment: The severity distribution is a Single Parameter Pareto with θ = 1 and α = 3.
The mean = αθ / (α − 1) = 3/2. The variance = αθ² / {(α − 1)²(α − 2)} = 3/4.
4.18. E. Since we have a Poisson Frequency, the Process Variance for each type of claim is given by the mean frequency times the second moment of the severity. For example, for Claim Type Z, the process variance of the pure premium is: (0.01)(2,250,000 + 2,000,000) = 42,500. Then the process variances for each type of claim add to get the total variance, 103,750.
Type of Claim              W        X           Y        Z           SUM
Mean Frequency             0.02     0.03        0.04     0.01
Mean Severity              200      1,000       100      1,500
Square of Mean Severity    40,000   1,000,000   10,000   2,250,000
Variance of Severity       2,500    1,000,000   0        2,000,000
Process Variance of P.P.   850      60,000      400      42,500      103,750
Comment: This is like adding up four independent die rolls; the variances add. For example this could be a nonrealistic model of homeowners insurance with the four types of claims being: Fire, Liability, Theft and Windstorm.
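A short Python check of the table above (my own addition; the dictionary of claim types is simply a convenient encoding of the given data):

claim_types = {
    # type: (mean frequency, mean severity, variance of severity)
    "W": (0.02, 200, 2_500),
    "X": (0.03, 1_000, 1_000_000),
    "Y": (0.04, 100, 0),
    "Z": (0.01, 1_500, 2_000_000),
}

total = 0.0
for name, (freq, mean_sev, var_sev) in claim_types.items():
    # Poisson frequency for each type: process variance = lambda * E[X^2].
    pv = freq * (var_sev + mean_sev ** 2)
    total += pv
    print(name, pv)          # 850, 60,000, 400, 42,500
print("total", total)        # 103,750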
4.19. B. The mean frequency is mq = 1/2, while the variance of the frequency is mq(1 - q) = (3)(1/6)(5/6) = 5/12. The mean severity is: (2/3)(100) + (1/6)(1100) + (1/6)(2100) = 600. The second moment of the severity is: (2/3)(100²) + (1/6)(1100²) + (1/6)(2100²) = 943,333. Thus the variance of the severity is: 943,333 - 600² = 583,333. The variance of the pure premium = (variance of frequency)(mean severity)² + (variance of severity)(mean frequency) = (5/12)(600)² + (1/2)(583,333) = 441,667. The mean pure premium is (1/2)(600) = 300. Therefore, the coefficient of variation is: standard deviation / mean = √441,667 / 300 = 2.2.

4.20. D. The variance of the frequency is: (0.05)(0.95) = 0.0475. The mean damage is: 1000. The variance of the damage is: (2000 - 0)² / 12 = 333,333. The variance of the pure premium = (1000²)(0.0475) + (0.05)(333,333) = 64,167.

4.21. The mean aggregate loss is: (40000)(0.15)(1200) = 7.2 million. Since frequency is Poisson with mean: (40000)(0.15) = 6000, the variance of aggregate losses is: (mean frequency)(2nd moment of severity) = (6000)(2000² + 1200²) = 32,640 million. The standard deviation of aggregate losses is: √(32,640 million) = 180,665.
Premium = (1.05)(expected losses) = (1.05)(7.2 million) = 7.56 million. We want the Premiums + Surplus ≥ Actual Losses.
⇔ Surplus ≥ Actual Losses - Premiums = Actual Losses - 7.56 million.
Φ(2.576) = 0.995 ⇔ the 99.5th percentile of the Standard Normal Distribution is 2.576.
Therefore, 99.5% of the time actual losses are less than or equal to: 7.2 million + (2.576)(180,665) = 7.665 million.
Therefore, we want surplus of at least: 7.665 million - 7.56 million = 105 thousand.
Comment: 100% - 99.5% = 0.5% of the time, actual losses will be greater than 7.665 million, and a surplus of 105 thousand would be exhausted.

4.22. Let N be the number of policies written. The mean aggregate loss = N(0.045)(750) and the variance of aggregate losses = N(0.045)(60,000,000). Thus premiums are: 1.05N(0.045)(750). The 99th percentile of the Unit Normal Distribution is 2.326. Thus we want Premiums - Expected Losses = 2.326(standard deviation of aggregate losses).
(0.05)N(0.045)(750) = 2.326 √{N(0.045)(60,000,000)}.
Therefore, N = (60,000,000/750²)(2.326/0.05)²/0.045 = 5,129,743.
4.23. C. Second moment of the severity = (1² + 2² + 3² + 4² + 5² + 6²)/6 = 15.167. Since the frequency is Poisson, the variance of aggregate losses = (mean frequency)(second moment of the severity) = (3)(15.167) = 45.5.

4.24. For each tier, the mean aggregate loss = (mean frequency)(mean severity) and since frequency is Poisson, the variance of aggregate loss = (mean frequency)(second moment of the severity). The means and variances of the tiers add to get an overall mean of: 19,125, and an overall variance of: 106,875,000.
                         Tier 1       Tier 2       Tier 3
Claim Amount 1,000       0.6          0.7          0.8
Claim Amount 5,000       0.3          0.2          0.15
Claim Amount 10,000      0.1          0.1          0.05
Mean Severity            3,100        2,700        2,050
2nd Moment of Severity   18,100,000   15,700,000   9,550,000
Poisson Parameter        2.3          3            1.9
Mean Aggregate           7,130        8,100        3,895
Variance of Aggregate    41,630,000   47,100,000   18,145,000
Overall Mean Aggregate: 19,125.  Overall Variance of Aggregate: 106,875,000.
Alternately, the tiers, each of which is compound Poisson, add to get a new compound Poisson with mean frequency: 2.3 + 3 + 1.9 = 7.2. The mean severity overall is a weighted average of the means for the individual tiers: {(2.3)(3100) + (3)(2700) + (1.9)(2050)}/7.2 = 2556.25. Thus the mean aggregate loss is: (2556.25)(7.2) = 19,125. The second moment of the severity overall is a weighted average of the second moments for the individual tiers: {(2.3)(18.1 million) + (3)(15.7 million) + (1.9)(9.55 million)}/7.2 = 14.844 million. Thus the variance of aggregate losses is: (7.2)(14.844 million) = 106.9 million.

4.25. D. For example, the chance of 2 claims of size 80 each is the chance of having two claims times the chance given two claims that they will each be 80 = (1/4)(1/2)² = 1/16. In that case the pure premium is 80 + 80 = 160. One takes the weighted average over all the possibilities. The average Pure Premium is 140. The second moment of the Pure Premium is 24,800. Therefore, the variance = 24,800 - 140² = 5200.
Situation                    Probability   Pure Premium   Square of P.P.
1 claim @ 80                 0.5000        80             6,400
1 claim @ 160                0.2500        160            25,600
2 claims @ 80 each           0.0625        160            25,600
2 claims: 1 @ 80 & 1 @ 160   0.1250        240            57,600
2 claims @ 160 each          0.0625        320            102,400
Overall                      1.0000        140            24,800
Comment: Note that the frequency and severity are not independent.
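Since frequency and severity are not independent here, a direct enumeration is the easiest way to confirm the numbers; the following Python sketch is my own check:

from itertools import product

outcomes = []          # (probability, aggregate loss) pairs
for size, p in [(80, 2/3), (160, 1/3)]:
    outcomes.append((3/4 * p, size))                   # one claim
for s1, s2 in product([80, 160], repeat=2):
    outcomes.append((1/4 * 0.5 * 0.5, s1 + s2))        # two claims

mean = sum(p * x for p, x in outcomes)                 # 140
second = sum(p * x ** 2 for p, x in outcomes)          # 24,800
print(mean, second, second - mean ** 2)                # variance = 5,200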
4.26. C. Since the frequency is Poisson, the variance of aggregate losses = (mean frequency)(second moment of the severity) = (4){(0.4)(100²) + (0.5)(200²) + (0.1)(500²)} = 196,000.

4.27. B. The mean pure premium is (3000)(800) = 2.4 million. Since frequency and severity are independent, the (process) variance of the aggregate losses is: µf σs² + µs² σf² = (800)(36 million) + (3000)²(3200) = 57.6 billion. Thus the standard deviation of the pure premiums is: √(57.6 billion) = 240,000. To apply the Normal Approximation we subtract the mean and divide by the standard deviation. The probability that the total losses will exceed 3 million is approximately: 1 - Φ[(3 million - 2.4 million)/240,000] = 1 - Φ(2.5) = 1 - 0.9938 = 0.0062.
Comment: One makes no specific use of the information that the frequency is given by a Negative Binomial, nor that the severity is given by a Transformed Gamma Distribution.

4.28. B. Since Die A and Die B are equally likely, the chance of 1 claim is: (1/2)(2/3) + (1/2)(1/3) = 1/2, while the chance of 2 claims is: (1/2)(1/3) + (1/2)(2/3) = 1/2. The mean of Spinner X is: (2/3)(2) + (1/3)(5) = 3, while the mean of Spinner Y is: (1/3)(2) + (2/3)(5) = 4. If we have one claim the mean loss is E[X] = 3. If we have two claims, then the mean loss is: E[X+Y] = E[X] + E[Y] = 3 + 4 = 7. The overall mean pure premium is: (chance of 1 claim)(mean loss if 1 claim) + (chance of 2 claims)(mean loss if 2 claims) = (1/2)(3) + (1/2)(7) = 5.
Comment: In this problem frequency and severity are not independent. The next question on this exam used the same setup and asked one to perform Bayes Analysis; See section 13 of my guide to Buhlmann Credibility and Bayes Analysis.
4.29. D. Since Die A and Die B are equally likely, the chance of 1 claim is: (1/2)(2/3) + (1/2)(1/3) = 1/2, while the chance of 2 claims is: (1/2)(1/3) + (1/2)(2/3) = 1/2. If we have one claim, then spinner X is spun and the loss is either: 2 with probability 2/3 or 5 with probability 1/3. If we have 2 claims, then spinners X and Y are spun and the loss is either: 4 with probability 2/9, 7 with probability 5/9, or 10 with probability 2/9. Thus the distribution of losses is: 2 @ 1/3, 5 @ 1/6, 4 @ 1/9, 7 @ 5/18,and 10 @ 1/9. Mean loss is: (2)(1/3) + (5)(1/6) + (4)(1/9) + (7)(5/18) + (10)(1/9) = 5. Second moment is: (22 )(1/3) + (52 )(1/6) + (42 )(1/9) + (72 )(5/18) + (102 )(1/9) = 32. Variance = 32 - 52 = 7. Alternately, this is a 50-50 mixture of two situations one claim or two claims. The mean of Spinner X is: (2/3)(2) + (1/3)(5) = 3. The variance of Spinner X is: (2/3)(2 - 3)2 + (1/3)(5 - 3)2 = 2. The mean of Spinner Y is: (1/3)(2) + (2/3)(5) = 4. The variance of Spinner Y is: (1/3)(2 - 4)2 + (2/3)(5 - 4)2 = 2. If we have one claim the mean loss is E[X] = 3. If we have two claims, then the mean loss is: E[X+Y] = E[X] + E[Y] = 3 + 4 = 7. The overall mean is: (1/2)(3) + (1/2)(7) = 5. If we have one claim the second moment is from spinner X: 2 + 32 = 11. If we have two claims the variance is the sum of those for X and Y: 2 + 2 = 4. Thus if we have two claims the second moment: 4 + 72 = 53. Thus the second moment of the mixture is: (1/2)(11) + (1/2)(53) = 32. Therefore, the variance of the mixture is: 32 - 52 = 7. Alternately, take the two types as 1 or 2 claims, equally likely. The hypothetical means for 1 and 2 claims are: 3 and 7. Therefore, the variance of the hypothetical means is: (1/2)(3 - 5)2 + (1/2)(7 - 5)2 = 4. When there is one claims, the process variance is that of spinner X: 2. When there are 2 claims, the process variance is the sum of those for spinners X and Y: 2 + 2 = 4. Expected Value of the process variance is: (1/2)(2) + (1/2)(4) = 3. Total variance is: EPV + VHM = 3 + 4 = 7.
Alternately, take the two types as Die A and B, equally likely. The hypothetical mean if Die A is: (2/3)(3) + (1/3)(7) = 13/3. The hypothetical mean if Die B is: (1/3)(3) + (2/3)(7) = 17/3. Therefore, the variance of the hypothetical means is: (1/2)(13/3 - 5)² + (1/2)(17/3 - 5)² = 4/9. When there is one claim, the second moment of pure premium is: 2 + 3² = 11. When there are two claims, the second moment of pure premium is: 4 + 7² = 53. Therefore, if one has Die A, the second moment of pure premium is: (2/3)(11) + (1/3)(53) = 25. Thus the process variance for Die A is: 25 - (13/3)² = 56/9. Therefore, if one has Die B, the second moment of pure premium is: (1/3)(11) + (2/3)(53) = 39. Thus the process variance for Die B is: 39 - (17/3)² = 62/9. Expected Value of the process variance is: (1/2)(56/9) + (1/2)(62/9) = 59/9. Total variance is: EPV + VHM = 59/9 + 4/9 = 7.
Comment: In this problem frequency and severity are not independent.

4.30. For the LogNormal Distribution, E[X] = exp[µ + σ²/2] = exp[7.54 + 1.14²/2] = 3604. E[X²] = exp[2µ + 2σ²] = exp[(2)(7.54) + (2)(1.14²)] = 47,640,795. Var[X] = 47,640,795 - 3604² = 34,651,979. Mean aggregate loss (per exposure) = (0.2)(3604) = 721. Variance of Aggregate Losses (per exposure) is: (0.2)(34,651,979) + (0.2)(0.8)(3604²) = 9,008,606. Thus if we write N exposures, mean loss = 721N, and the standard deviation of the aggregate loss is 3001√N. We want N such that Prob(Aggregate Loss > (2)(mean)) ≤ 5%. Prob((Aggregate Loss - mean) > 721N) ≤ 5%. Using the Normal Approximation, we want: (1.645)(stddev) < 721N.
⇒ 1.645(3001√N) < 721N. ⇒ N > 46.9.

4.31. The variance of aggregate losses is: (100)(1,000,000 + 1000²) = 200,000,000. The 95th percentile of the aggregate losses exceeds the mean by about 1.645 standard deviations: (1.645)√200,000,000 = 23,264. With at least this much capital, there is no more than a 5% chance of insolvency in the first year.

4.32. B. For a compound Poisson, variance of aggregate losses = (mean frequency)(second moment of severity) = (3)(194,400 + 560²) = 1,524,000.
4.33. 1344 = EN[Var(S | N)] = EN[Var(X1 + X2 + X3 + ... + XN)] = EN[N Var[X]] = E[N] Var[X] = 4 Var[X]. ⇒ Var[X] = 1344/4 = 336.
4,096 = Var(N)[E(X)]² = 4[E(X)]². ⇒ [E(X)]² = 1024.
E[X²] = Var[X] + [E(X)]² = 336 + 1024 = 1360.
Comment: E[Var(S | N)] given in the question is the expected value over N of the variance of the aggregate losses conditional on N.

4.34. D. Mean Frequency = (0.25)(0) + (0.25)(1) + (0.50)(2) = 1.25. Second Moment of the Frequency = (0.25)(0²) + (0.25)(1²) + (0.50)(2²) = 2.25. Variance of the Frequency = 2.25 - 1.25² = 0.6875. Mean Severity = (2 + 6)/2 = 4. Variance of the severity = (6 - 2)²/12 = 4/3. Variance of the Aggregate Losses = (4²)(0.6875) + (4/3)(1.25) = 12.67.

4.35. Mean frequency = 0.05. Variance of frequency = (0.05)(0.95) = 0.0475. Mean severity = (0 + 5000)/2 = 2500. Variance of severity = (5000 - 0)²/12 = 2,083,333. Mean aggregate loss = (0.05)(2500) = 125. Variance of aggregate losses = (0.05)(2,083,333) + (2500²)(0.0475) = 401,042.

4.36. B. The mean and variance of the frequency is 300. The mean of the Generalized Pareto severity is: θτ/(α - 1) = (1000)(2)/(3 - 1) = 1000. The 2nd moment of the Generalized Pareto severity is: θ²τ(τ + 1) / {(α − 1)(α − 2)} = (1000²)(2)(3) / {(3 - 1)(3 - 2)} = 3 million. Mean aggregate losses = (300)(1000) = 300,000. Variance of Aggregate Losses = (mean of Poisson)(2nd moment of severity) = (300)(3 million) = 900 million. Standard Deviation of Aggregate Losses = 30,000. Using the Normal Approximation, the chance that the aggregate losses are greater than 360,000 is approximately: 1 - Φ[(360,000 - 300,000)/30,000] = 1 - Φ[2] = 1 - 0.9772 = 0.0228.
4.37. E. The mean and variance of the frequency is 150. The mean of the Generalized Pareto severity is twice what it was, or 2000. The 2nd moment of the Generalized Pareto severity is four times what it was, or 12 million. Mean aggregate losses = (150)(2000) = 300,000. Variance of Aggregate Losses = (mean of Poisson)(2nd moment of severity) = (150)(12 million) = 1800 million. Standard Deviation of Aggregate Losses = 42,426. Using the Normal Approximation, the chance that the aggregate losses are greater than 360,000 is approximately: 1 - Φ[(360,000 - 300,000)/42,426] = 1 - Φ[1.41] = 1 - 0.9207 = 0.0793.
Comment: The second moment is always multiplied by the square of the inflation factor under uniform inflation. Alternately, one can instead use the behavior under uniform inflation of the Generalized Pareto Distribution; the new severity distribution is also a Generalized Pareto, but with parameters θ = 2000, α = 3 and τ = 2. Its mean and second moment are as Iʼve stated. In general, when one halves the frequency and uniformly doubles the claim size, while the expected aggregate losses remain the same, the variance of the aggregate losses increases. (Given a Poisson Frequency, the variance of the aggregate losses doubles.) Therefore, there is a larger chance for an unusual year. High Severity/Low Frequency lines of insurance are more volatile than High Frequency/Low Severity lines of insurance.

4.38. D. Mean Frequency = (0.4)(0) + (0.3)(1) + (0.2)(2) + (0.1)(3) = 1. 2nd moment of Frequency = (0.4)(0²) + (0.3)(1²) + (0.2)(2²) + (0.1)(3²) = 2. Variance of Frequency = 2 - 1² = 1. Mean Severity = Variance of Severity = 4. Variance of aggregate claims = (4)(1) + (4²)(1) = 20.

4.39. D. The second moment of severity p1 is: (0.6)(1²) + (0.4)(2²) = 2.2. The second moment of severity p2 is: (0.1)(1²) + (0.3)(2²) + (0.5)(3²) + (0.1)(4²) = 7.4. Var[S] = (2)(2.2) + (3)(7.4) = 26.6.

4.40. E. Mean frequency is: (0)(0.4) + (5)(0.1) + (10)(0.5) = 5.5. The 2nd moment of the frequency is: (0²)(0.4) + (5²)(0.1) + (10²)(0.5) = 52.5. Variance of the frequency is: 52.5 - 5.5² = 22.25. Mean severity is: (0.9)(1) + (0.1)(2) = 1.1. 2nd moment of the severity is: (0.9)(1²) + (0.1)(2²) = 1.3. Variance of the severity is: 1.3 - 1.1² = 0.09. Variance of aggregate losses = (1.1²)(22.25) + (5.5)(0.09) = 27.4.
4.41. C. The mean frequency = mean of the Inverse Gaussian = µ = 500. Variance of frequency = mean of Inverse Gaussian + variance of Inverse Gaussian = µ + µ3/θ = 500 + 5003 /5000 = 25,500. Let X be the severity distribution. Then we are given that: 1000 = Mean aggregate loss = 500E[X]. 150,000 = Variance of aggregate losses = 500 Var[X] + 25,500 E[X]2 . Therefore, E[X] = 1000 / 500 = 2 and Var[X] = {150,000 - 25,500(22 )} / 500 = 96. Comment: In general when one has a mixture of Poissons, Mean frequency = E[λ] = mean of mixing distribution, and Second moment of the frequency = Eλ[second moment of Poisson | λ] = E[λ + λ2] = mean of mixing distribution + second moment of mixing distribution. Variance of frequency = mean of mixing distribution + second moment of mixing - (mean of mixing distribution)2 = mean of mixing distribution + variance of mixing distribution. 4.42. B. List the different possible situations and their probabilities: Situation
Probability
Aggregate Loss
Square of the Aggregate Loss
no claims 1 claim @25 1 claim @ 150 2 claims each @50 1 claim @ 50 and 1 claim @ 200 2 claims @ 200
20.00% 20.00% 40.00% 8.89% 8.89% 2.22%
0 25 150 100 250 400
0 625 22,500 10,000 62,500 160,000
105
19,125
Weighted Average
Mean = 105. Second Moment = 19,125. Variance = 19,125 - 1052 = 8100.
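The same enumeration approach verifies this answer; the Python sketch below is my own and the variable names are mine:

from itertools import product

# N = 0, 1, 2 with probabilities 1/5, 3/5, 1/5; claim size distribution depends on N.
sizes_if_one = [(25, 1/3), (150, 2/3)]
sizes_if_two = [(50, 2/3), (200, 1/3)]

dist = {0: 1/5}                                        # aggregate loss -> probability
for x, p in sizes_if_one:
    dist[x] = dist.get(x, 0) + 3/5 * p
for (x1, p1), (x2, p2) in product(sizes_if_two, repeat=2):
    dist[x1 + x2] = dist.get(x1 + x2, 0) + 1/5 * p1 * p2

mean = sum(p * x for x, p in dist.items())             # 105
second = sum(p * x * x for x, p in dist.items())       # 19,125
print(second - mean ** 2)                              # 8,100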
Section 5, Full Credibility for Pure Premiums & Aggregate Losses

A single standard for full credibility applies when one wishes to estimate either pure premiums, aggregate losses, or loss ratios.
Pure Premium = ($ of Loss) / (# of Exposures) = {(# of Claims) / (# of Exposures)} {($ of Loss) / (# of Claims)} = (Frequency)(Severity).
Loss Ratio = $Loss / $Premium.
Since they depend on both the number of claims and the size of claims, pure premiums and aggregate losses have more reasons to vary than do either frequency or severity. Since pure premiums are more difficult to estimate than frequencies, all other things being equal the Standard for Full Credibility for Pure Premiums is larger than that for Frequencies.

Poisson Frequency Example:

For example, assume frequency is Poisson distributed with a mean of 9 (and a variance of 9) and every claim is of size 10. Then since the severity is constant, it does not increase the random fluctuations. Since Var[cX] = c² Var[X], the variance of the pure premium for a single exposure is: (variance of the frequency)(10²) = 900.

Exercise: In the above situation, what is the Standard for Full Credibility (in terms of expected number of claims), so that the estimated pure premium will have a 90% chance of being within ±5% of the true value?
[Solution: We wish to have a 90% probability, so we are to be within ±1.645 standard deviations, since Φ(1.645) = 0.95. For X exposures the variance of the sum of the pure premiums for each exposure is 900X. The variance of the average pure premium per exposure is this divided by X². Thus we have a variance of 900/X and a standard deviation of 30/√X. The mean pure premium is (9)(10) = 90. We wish to be within ±5% of this, or ±4.5. Setting this equal to ±1.645 standard deviations we have:
4.5 = 1.645(30/√X), or X = {(1.645)(30/4.5)}² = 120.26 exposures.
The expected number of claims is: (120.26)(9) = 1082.
Comment: Since severity is constant, the Standard for Full Credibility is the same as that for estimating the frequency, with Poisson frequency, P = 90%, and k = 5%: 1082 claims.]
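This calculation is easy to script; the sketch below (my own, standard library only) solves for the number of exposures and then converts to expected claims:

from statistics import NormalDist

P, k = 0.90, 0.05
y = NormalDist().inv_cdf((1 + P) / 2)            # about 1.645

lam, claim_size = 9, 10                          # Poisson mean 9, every claim of size 10
mean_pp = lam * claim_size                       # 90
var_pp = lam * claim_size ** 2                   # 900, since severity is constant

# Require k * mean_pp = y * sqrt(var_pp / X)  =>  X = (y/k)^2 * var_pp / mean_pp^2.
exposures = (y / k) ** 2 * var_pp / mean_pp ** 2
print(exposures, exposures * lam)                # about 120 exposures, about 1082 claims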
If the severity is not constant but instead varies, then the variance of the pure premium is greater than 900. Specifically assume that the severity is given by a Gamma Distribution, with α = 3 and θ = 10. This distribution has a mean of: αθ = 30, and a variance of: αθ² = 300. Then if we assume frequency and severity are independent, we can use the formula developed for the variance of the pure premium in terms of that of the frequency and severity: σpp² = µs² σf² + µf σs². In this case µs = 30, σs² = 300, µf = σf² = 9, so σpp² = 10,800.

Assume we wish the Standard for Full Credibility (in terms of expected number of claims) to be such that the estimated pure premium will have a 90% chance of being within ±5% of the true value. We wish to have a 90% chance, so we want to be within ±1.645 standard deviations, since Φ(1.645) = 0.95. For X exposures the variance of the sum of the pure premiums for each exposure is 10,800X. The variance of the average pure premium per exposure is this divided by X². Thus we have a variance of 10,800/X and a standard deviation of 103.9/√X. The mean pure premium is (9)(30) = 270. We wish to be within ±5% of this, or ±13.5. Setting this equal to ±1.645 standard deviations we have:
13.5 = 1.645(103.9/√X), or X = {(1.645)(103.9/13.5)}² = 160.3 exposures.
The expected number of claims is: (160.3)(9) = 1443. Note that this is greater than the 1082 claims needed for Full Credibility of the frequency when P = 90% and k = 5%. In fact the ratio is: 1443/1082 = 1 + 1/3 = 1 + CV², where CV² is the square of the coefficient of variation of the severity distribution, which for the Gamma is 1/α = 1/3.

It turns out in general, when frequency is Poisson, that the Standard for Full Credibility for the Pure Premium is: the Standard for Full Credibility for the Frequency times (1 + square of coefficient of variation of the severity):39
nF = n0 (1 + CVSev²) = (y²/k²)(1 + CVSev²).

39 Equation 2.5.4 in Mahler and Dean.
Derivation of the Standard for Full Credibility for Pure Premiums, Poisson Case:

The derivation follows that of the particular case above. Let µs be the mean of the severity distribution while σs² is the variance. Assume that the frequency is Poisson and therefore µf = σf². Assuming the frequency and severity are independent, the variance of the Pure Premium for one exposure unit is: σpp² = µs² σf² + µf σs² = µf (µs² + σs²). For X exposure units, the variance of the estimated average pure premium is this divided by X. We wish to be within ±y standard deviations, where as usual y is such that Φ(y) = (1+P)/2. For a mean pure premium of µf µs we wish to be within ±kµf µs. Setting the two expressions for the error bars equal yields:
kµf µs = y √{µf (µs² + σs²) / X}. Solving for X, X = (y/k)² (µs² + σs²) / (µf µs²).
The expected number of claims needed for Full Credibility is: nF = µf X = (y/k)² (1 + σs²/µs²) = n0 (1 + CVSev²).

A Formula for the Square of the Coefficient of Variation:

The following formula for unity plus the square of the coefficient of variation follows directly from the definition of the Coefficient of Variation.
CV² = Variance / E[X]² = (E[X²] - E[X]²) / E[X]² = (E[X²] / E[X]²) - 1.
Thus, 1 + CV² = E[X²] / E[X]² = 2nd moment divided by the square of the mean.
This formula is useful for Classical credibility problems involving the Pure Premium. For example, assume one has a Pareto Distribution. Then using the formulas for the moments: 1 + CV2 = E[X2 ] / E[X]2 = {2θ2 /(α−1)(α−2)} / {θ / (α−1)}2 = 2 (α−1) / (α−2). For example if α = 5, then (1 + CV2 ) = 2(4)/3 = 8/3 . Exercise: Assume frequency and severity are independent and frequency is Poisson. For P = 90% and k = 5%, and if severity follows a Pareto Distribution with α = 5, what is the Standard for Full Credibility for the Pure Premium in terms of claims? [Solution: n0 (1 + CV2 ) = 1082(8/3) = 2885 claims.] In general the Standard for Full Credibility for the pure premium is the sum of those for frequency and severity. n0 (1+CV2 ) = n0 + n0 CV2 . In this case: 1082 + 1803 = 2885.
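The Poisson-case standard is easy to wrap up as a small function; the sketch below (my own helper, not from the text) reproduces the Pareto example above:

from statistics import NormalDist

def full_credibility_claims_poisson(P, k, cv_squared):
    # n_F = n_0 * (1 + CV^2), with n_0 = (y/k)^2 and Phi(y) = (1 + P)/2.
    y = NormalDist().inv_cdf((1 + P) / 2)
    return (y / k) ** 2 * (1 + cv_squared)

alpha = 5                                        # Pareto severity
cv_squared = 2 * (alpha - 1) / (alpha - 2) - 1   # since 1 + CV^2 = 2(alpha-1)/(alpha-2) = 8/3
print(full_credibility_claims_poisson(0.90, 0.05, cv_squared))
# about 2886; the text rounds n0 to 1082 and gets 2885 claims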
General Case, if Frequency is Not Poisson:

As with the Standard for Full Credibility for frequency, one can derive a more general formula when the Poisson assumption does not apply. The Standard for Full Credibility for estimating either pure premiums or aggregate losses is:40
nF = (y²/k²)(σf²/µf + σs²/µs²) = n0 (σf²/µf + CVSev²),
which reduces to the Poisson case when σf²/µf = 1. Note that if every claim is of size one, then the variance of the severity is zero and the standard for full credibility reduces to that for frequency: n0 σf²/µf.

Exercise: Frequency is Negative Binomial with r = 0.1 and β = 0.5. Severity has a coefficient of variation of 3. The number of claims and claim sizes are independent. The observed aggregate loss should be within 5% of the expected aggregate loss 90% of the time. Determine the expected number of claims needed for full credibility.
[Solution: P = 90%. y = 1.645. k = 0.05. σf²/µf = rβ(1 + β)/(rβ) = 1 + β = 1.5.
(y²/k²)(σf²/µf + CVsev²) = (1.645/0.05)²(1.5 + 3²) = 11,365 claims.]
Note that a Negative Binomial has σf²/µf > 1, so the standard for full credibility is larger than if one assumed a Poisson frequency. Note that if one limits the size of claims, then the coefficient of variation is smaller. Therefore, the criterion for full credibility for basic limits losses is less than that for total losses.
In general the Standard for Full Credibility for the pure premium is the sum of those for frequency and severity: n0 (σf²/µf + CV²) = n0 σf²/µf + n0 CV².

40 S ↔ severity and f ↔ frequency. Equation 2.5.5 in Mahler and Dean, as derived in one of the problems below. See “The Credibility of the Pure Premium,” by Mayerson, Jones, and Bowers, PCAS 1968.
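The general formula can be scripted the same way; this sketch (mine) reproduces the Negative Binomial exercise above:

from statistics import NormalDist

def full_credibility_claims(P, k, var_over_mean_freq, cv_sev_squared):
    # n_F = (y/k)^2 * (sigma_f^2 / mu_f + CV_sev^2); the Poisson case is
    # the special case where sigma_f^2 / mu_f = 1.
    y = NormalDist().inv_cdf((1 + P) / 2)
    return (y / k) ** 2 * (var_over_mean_freq + cv_sev_squared)

# Negative Binomial frequency: sigma_f^2 / mu_f = 1 + beta = 1.5; severity CV = 3.
print(full_credibility_claims(0.90, 0.05, 1.5, 3 ** 2))
# about 11,363; the text rounds y to 1.645 and gets 11,365 claims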
Derivation of the Standard for Full Credibility for Pure Premiums or Aggregate Loss:

Let µf be the mean of the frequency distribution while σf² is its variance. Let µs be the mean of the severity distribution while σs² is its variance. Assuming that frequency and severity are independent, the variance of the Pure Premium for one exposure unit is: σpp² = µf σs² + µs² σf². We assume the number of exposures is known; it is not random.41 For X exposure units, the variance of the estimated average pure premium is this divided by X. We wish to be within ±y standard deviations, where as usual y is such that Φ(y) = (1+P)/2. For a mean pure premium of µf µs we wish to be within ±k µf µs. Setting the two expressions for the error bars equal yields:
k µf µs = y √{(µf σs² + µs² σf²) / X}.
Solving for X, the full credibility standard in exposures: X = (y/k)² (µs² σf² + µf σs²) / (µf² µs²).
The expected number of claims needed for Full Credibility is: nF = µf X = (y/k)² (σf²/µf + σs²/µs²) = n0 (σf²/µf + CVSev²).

Exposures vs. Claims:

Standards for Full Credibility are calculated in terms of the expected number of claims. It is common to translate these into a number of exposures by dividing by the (approximate) expected claim frequency. So for example, if the Standard for Full Credibility is 2885 claims and the expected claim frequency in Auto Insurance were 0.07 claims per car-year, then 2885 / 0.07 ≅ 41,214 car-years would be a corresponding Standard for Full Credibility in terms of exposures.
The Standard for Full Credibility in terms of claims can be converted to exposures by dividing by µf, the mean claim frequency. Standard for Full Credibility in terms of exposures = nF / µf = n0 (σf²/µf + σs²/µs²)/µf = n0 (µs² σf² + µf σs²)/(µf µs)² = n0 (variance of pure premium)/(mean pure premium)² = n0 (CV of the Pure Premium)².

41 While we will solve for that number of exposures which satisfies the criterion for full credibility, in any given application of the credibility technique the number of exposures is known.
When asked for the number of exposures needed for Full Credibility for Pure Premiums one can directly use this formula:42 43
Standard for Full Credibility for Pure Premiums in terms of exposures is:
n0 (Coefficient of Variation of the Pure Premium)² = (y²/k²)(CVPP)².

Exercise: The variance of pure premiums is 100,000. The mean pure premium is 40. Frequency is Poisson. We require that the estimated pure premiums be within 2.5% of the true value 90% of the time. How many exposures are needed for full credibility?
[Solution: The square of the Coefficient of Variation of the Pure Premium is 100,000/40² = 62.5. y = 1.645. k = 0.025. n0 = (y/k)² = 4330.
n0 (Coefficient of Variation of the Pure Premium)² = 4330(62.5) = 270,625 exposures.
Alternately, let m be the mean frequency. Then since the frequency is assumed to be Poisson, variance of pure premium = m(second moment of severity). Thus E[X²] = 100,000/m. E[X] = 40/m. Standard for Full Credibility in terms of claims is: n0 (1 + CV²) = n0 E[X²]/E[X]² = 4330 (100,000/40²) m = 270,625 m claims. To convert to exposures divide by m, to get 270,625 exposures.]

Assumptions:

The formula nF = n0 (σf²/µf + CVsev²) assumes:
1. Frequency and Severity are independent.
2. The claims are drawn from the same distribution or at least from distributions with the same finite mean and variance.44
3. The pure premium or aggregate loss is approximately Normally Distributed (the Central Limit Theorem applies).
4. The number of exposures is known; it is not stochastic.
The pure premiums are often approximately Normal; generally the greater the expected number of claims or the shorter tailed the frequency and severity distributions, the better the Normal Approximation. It is assumed that one has enough claims that the aggregate losses approximate a Normal Distribution.
42 See Equation 20.6 in Loss Models.
43 In the formula for the standard for full credibility the CV is calculated for one exposure. The CV would go down as the number of exposures increased, since the mean increases as a factor of N, while the standard deviation increases as a factor of square root of N. This is precisely why when we have a lot of exposures, we get a good estimate of the pure premium by relying solely on the data. In other words, this is why there is a standard for full credibility.
44 The claim sizes can follow any distribution with a finite mean and variance, so that one can compute the coefficient of variation.
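The exposure-based standard can likewise be scripted; the sketch below (my own) reproduces the exercise on this page:

from statistics import NormalDist

def full_credibility_exposures(P, k, var_pp, mean_pp):
    # Standard in exposures = n_0 * (CV of the pure premium)^2, with n_0 = (y/k)^2.
    y = NormalDist().inv_cdf((1 + P) / 2)
    return (y / k) ** 2 * var_pp / mean_pp ** 2

print(full_credibility_exposures(0.90, 0.025, 100_000, 40))
# about 270,560; the text uses n0 = 4330 and gets 270,625 exposures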
While it is possible to derive formulas that donʼt depend on the Normal Approximation, they are not on the Syllabus.45

Notation in Loss Models:

The Loss Models text does not use the same notation as Mahler-Dean and many other casualty actuarial papers. Thus if one wanted to read this material in Loss Models, one would have to learn the notation in Loss Models.
Mahler-Dean   Loss Models
P             p             probability level
k             r             range parameter
y             yp            such that the mean ±y standard deviations covers probability P on the Normal Distribution
n0            λ0            Standard for Full Credibility for Poisson Frequency
nF                          Standard for Full Credibility for Pure Premium
Loss Models refers to Classical Credibility as Limited Fluctuation Credibility.
45
See for example Appendix 1 of “Classical Partial Credibility with Application to Trend” by Gary Venter, PCAS 1986.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 73 Problems: 5.1 (2 points) You are given the following information : • The number of claims is Poisson. •
The severity distribution is LogNormal, with parameters µ = 6 and σ = 1.2.
• Frequency and severity are independent.
• Full credibility is defined as having a 95% probability of being within plus or minus 10% of the true pure premium.
What is the minimum number of expected claims that will be given full credibility?
A. Less than 1600
B. At least 1600 but less than 1700
C. At least 1700 but less than 1800
D. At least 1800 but less than 1900
E. At least 1900
5.2 (2 points) The number of claims is Poisson. Mean claim frequency = 7%. Mean claim severity = $500. Variance of the claim severity = 1 million. Full credibility is defined as having an 80% probability of being within plus or minus 5% of the true pure premium. What is the minimum number of policies that will be given full credibility?
A. 47,000 B. 48,000 C. 49,000 D. 50,000 E. 51,000
5.3 (3 points) The number of claims is Poisson. The full credibility standard for a company is set so that the total number of claims is to be within 5% of the true value with probability P. This full credibility standard is calculated to be 5000 claims. The standard is altered so that the total cost of claims is to be within 10% of the true value with probability P. The claim frequency has a Poisson distribution and the claim severity has the following distribution:
f(x) = 0.000008 (500 - x), 0 ≤ x ≤ 500
What is the expected number of claims necessary to obtain full credibility under the new standard?
A. 1825 B. 1850 C. 1875 D. 1900 E. 1925
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 74
5.4 (2 points) You are given the following information:
• A standard for full credibility of 3,000 claims has been selected so that the actual pure premium would be within 5% of the expected pure premium 98% of the time.
• The number of claims follows a Poisson distribution, and is independent of the severity distribution.
Using the concepts of classical credibility, determine the coefficient of variation of the severity distribution underlying the full credibility standard.
A. Less than 0.6
B. At least 0.6 but less than 0.7
C. At least 0.7 but less than 0.8
D. At least 0.8 but less than 0.9
E. At least 0.9
5.5 (2 points) You are given the following:
• The number of claims is Poisson distributed.
• Number of claims and claim severity are independent.
• Claim severity has the following distribution:
Claim Size    Probability
1             0.50
5             0.30
10            0.20
Determine the number of claims needed so that the total cost of claims is within 3% of the expected cost with 90% probability.
A. Less than 5000
B. At least 5000 but less than 5100
C. At least 5100 but less than 5200
D. At least 5200 but less than 5300
E. At least 5300
5.6. (2 points) Frequency is Poisson, and severity is Pareto with α = 4. The standard for full credibility is that actual aggregate losses be within 10% of expected aggregate losses 99% of the time. 50,000 exposures are needed for full credibility. Determine the expected number of claims per exposure.
A. 2% B. 3% C. 4% D. 5% E. 6%
5.7 (2 points) The distribution of pure premium has a coefficient of variation of 5. The full credibility standard has been selected so that actual aggregate losses will be within 5% of expected aggregate losses 90% of the time. Using limited fluctuation credibility, determine the number of exposures required for full credibility.
(A) 23,000 (B) 24,000 (C) 25,000 (D) 26,000 (E) 27,000
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 75
5.8 (3 points) Require that the estimated pure premium should be within 100k% of the expected pure premium with probability P. Assume frequency and severity are independent. Use the following notation:
µf = mean frequency
σf² = variance of frequency
µs = mean severity
σs² = variance of severity
Let y be such that Φ(y) = (1 + P)/2. Using the Normal Approximation, which of the following is a formula for the number of claims needed for full credibility of the pure premium?
A. (σf²/µf + σs/µs) (y²/k²)
B. (σf²/µf² + σs/µs) (y²/k²)
C. (σf²/µf + σs²/µs²) (y²/k²)
D. (σf²/µf² + σs²/µs²) (y²/k²)
E. None of the above. 5.9 (2 points) Using the formula derived in the previous question, find the number of claims required for full credibility. Require that there is a 90% chance that the estimate of the pure premium is correct within ±7.5%. The frequency distribution has a variance 2.5 times its mean. The claim amount distribution is a Pareto with α = 2.3. A. Less than 4500 B. At least 4500 but less than 4600 C. At least 4600 but less than 4700 D. At least 4700 but less than 4800 E. At least 4800 5.10 (2 points) The number of claims is Poisson. The expected number of claims needed to produce a selected standard for full credibility for the pure premium is 1500. If the severity were constant, the same selected standard for full credibility would require 850 claims. Given the information below, what is the variance of the severity in the first situation? Average Claim Frequency = 200 Average Claim Severity = 500. A. less than 190,000 B. at least 190,000 but less than 200,000 C. at least 200,000 but less than 210,000 D. at least 210,000 but less than 220,000 E. at least 220,000
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 76 5.11 (1 point) The expected number of claims needed to produce full credibility for the claim frequency is 700. Let: Average claim frequency = 100 Average claim cost = 400 Variance of claim frequency = 100 Variance of claim cost = 280,000 What is the expected number of claims required to produce full credibility for the pure premium? A. Less than 1,750 B. At least 1,750, but less than 1,850 C. At least 1,850, but less than 1,950 D. At least 1,950, but less than 2,050 E. 2,050 or more 5.12 (2 points) A full credibility standard is determined so that the total number of claims is within 5% of the expected number with probability 99%. If the same expected number of claims for full credibility is applied to the total cost of claims, the actual total cost would be within 100k% of the expected cost with 95% probability. The coefficient of variation of the severity is 2.5. The frequency is Poisson. Frequency and severity are independent. Using the normal approximation of the aggregate loss distribution, determine k. A. 4% B. 6% C. 8% D. 10% E. 12% 5.13 (1 point) Which of the following are true regarding Standards for Full Credibility? 1. A Standard for Full Credibility should be adjusted for inflation. 2. All other things being equal, if severity is not constant, a Standard for Full Credibility for pure premiums is larger than that for frequency. 3. All other things being equal, a Standard for Full Credibility for pure premiums is larger as applied to losses limited by a policy limit than when applied to unlimited losses. A. None of 1, 2 or 3 B. 1 C. 2 D. 3 E. None of A, B, C or D 5.14 (2 points) You are given the following:
• The frequency distribution is Poisson.
• The claim amount distribution has mean 1000, variance 4,000,000.
• Frequency and severity are independent.
Find the number of claims required for full credibility, if you require that there will be an 80% chance that the estimate of the pure premium is correct within 10%.
A. Less than 750
B. At least 750 but less than 800
C. At least 800 but less than 850
D. At least 850 but less than 900
E. At least 900
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 77 5.15 (3 points) Standards for full credibility for aggregate losses are being determined for three situations. The only thing that differs among the situations is the assumed size of loss distribution: 1. Exponential. 2. Weibull, τ = 1/2. 3. LogNormal, σ = 0.8. Rank the resulting standards for full credibility from smallest to largest. A. 1, 2, 3 B. 1, 3, 2 C. 2, 1, 3 D. 2, 3, 1 E. none of A, B, C, or D 5.16 (2 points) You are given the following:
• The total losses for one risk within a class of homogeneous risks equals T. • E [{T - E(T)}2 ] = 40,000. • The average amount of each claim = 100. • The frequency for each insured is Poisson. • The average number of claims for each risk = 2. Find the number of claims required for full credibility, if you require that there will be a 90% chance that the estimate of the pure premium is correct within 5%. A. Less than 1,000 B. At least 1,000 but less than 1,500 C. At least 1,500 but less than 2,000 D. At least 2,000 but less than 2,500 E. 2,500 or more 5.17 (1 point) You are given the following:
• You require that the estimated frequency should be within 100k% of the expected frequency with probability P.
• The standard for full credibility for frequency is 800 claims.
• You require that the estimated pure premium should be within 100k% of the expected pure premium with probability P.
• The standard for full credibility for pure premiums is 2000 claims.
• You require that the estimated severity should be within 100k% of the expected severity with probability P.
What is the standard for full credibility for the severity, in terms of the number of claims?
A. 900 B. 1000 C. 1100 D. 1200 E. 1300
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 78 5.18 (3 points) You are given the following: • The number of claims follows a Poisson distribution. •
Claim sizes follow a Burr distribution, with parameters θ (unknown), α = 9, and γ = 0.25.
• The number of claims and claim sizes are independent.
The full credibility standard has been selected so that actual aggregate claim costs will be within 10% of expected aggregate claim costs 85% of the time. Using the methods of Classical credibility, determine the expected number of claims needed for full credibility.
A. Less than 1000
B. At least 1000, but less than 10,000
C. At least 10,000, but less than 100,000
D. At least 100,000, but less than 1,000,000
E. At least 1,000,000
5.19 (2 points) You are given the following: • The number of claims follows a Poisson distribution. • The variance of the number of claims is 20. • The variance of the claim size distribution is 35. • The variance of aggregate claim costs is 1300. • The number of claims and claim sizes are independent. • The full credibility standard has been selected so that actual aggregate claim costs will be within 7.5% of expected aggregate claim costs 98% of the time. Using the methods of classical credibility, determine the expected number of claims required for full credibility. A. Less than 2,000 B. At least 2,000, but less than 2,100 C. At least 2,100, but less than 2,200 D. At least 2,200, but less than 2,300 E. At least 2,300 5.20 (3 points) Determine the number of claims needed for full credibility in three situations. In each case, there will be a 90% chance that the estimate is correct within 10%. 1. Estimating frequency. Frequency is assumed to be Negative Binomial with β = 0.3. 2. Estimating severity. Severity is assumed to be Pareto with α = 5. 3. Estimating aggregate losses. Frequency is assumed to be Poisson. Severity is assumed to be Gamma with α = 2. Rank the resulting standards for full credibility from smallest to largest. A. 1, 2, 3 B. 1, 3, 2 C. 2, 1, 3 D. 2, 3, 1 E. none of A, B, C, or D
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 79 5.21 (2 points) A company has determined that the limited fluctuation full credibility standard is 16,000 claims if: (i) The total cost of claims is to be within r% of the true value with probability p. (ii) The number of claims follows a Geometric distribution with β = 0.4. (iii) The severity distribution is Exponential. The standard is changed so that the total cost of claims is to be within 3r% of the true value with probability p, where claim severity is Gamma with α = 2. Using limited fluctuation credibility, determine the expected number of claims necessary to obtain full credibility under the new standard. A. 1100 B. 1200 C. 1300 D. 1400 E. 1500 5.22 (2 points) You are given the following information about a book of business: (i) Each insuredʼs claim count has a Poisson distribution with mean λ, where λ has a gamma distribution with α = 4 and θ = 0.5. (ii) Individual claim size amounts are independent and uniformly distributed from 0 to 500. (iii) The full credibility standard is for aggregate losses to be within 10% of the expected with probability 0.98. Using classical credibility, determine the expected number of claims required for full credibility. (A) 600 (B) 700 (C) 800 (D) 900 (E) 1000 5.23 (3 points) You are given the following: • • •
Claim sizes follow a gamma distribution, with parameters α = 2.5 and θ unknown.
• The number of claims and claim sizes are independent.
• The full credibility standard for frequency has been selected so that the actual number of claims will be within 2.5% of the expected number of claims P of the time.
• The full credibility standard for aggregate loss has been selected so that the actual aggregate losses will be within 2.5% of the expected actual aggregate losses P of the time, using the same P as for the standard for frequency.
• 13,801 expected claims are needed for full credibility for frequency.
• 18,047 expected claims are needed for full credibility for aggregate loss.
Using the methods of Classical credibility, determine the value of P.
A. 80% B. 90% C. 95% D. 98% E. 99%
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 80 5.24 (4, 11/82, Q.47) (3 points) You are given the following:
• The frequency distribution is Negative Binomial with variance equal to twice its mean.
• The claim amount distribution is LogNormal with mean 100, variance 25,000.
• Frequency and severity are independent.
Find the number of claims required for full credibility, if you require that there will be a 90% chance that the estimate of the pure premium is correct within 5%. Use the Normal Approximation. A. Less than 4500 B. At least 4500 but less than 4600 C. At least 4600 but less than 4700 D. At least 4700 but less than 4800 E. At least 4800 5.25 (4, 5/83, Q.36) (1 point) The number of claims is Poisson. Assume that claim severity has mean equal to 100 and standard deviation equal to 200. Which of the following is closest to the factor which would need to be applied to the full credibility standard based on frequency only, in order to approximate the full credibility standard for the pure premium? A. 0.5 B. 1.0 C. 1.5 D. 2.0 E. 5.0
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 81
5.26 (4, 5/83, Q.46) (3 points) Chebyshev's inequality says that for a probability distribution X, with mean m and standard deviation σ, for any constant "a": Prob(|X - m| ≥ aσ) ≤ 1/a². Using Chebyshev's inequality (rather than the Normal Approximation) derive a formula for the number of claims needed for full credibility of the pure premium. Assume frequency and severity are independent. Require that the observed pure premium should be within 100k% of the expected pure premium with probability P. Use the following notation:
µf = mean frequency
σf² = variance of frequency
µs = mean severity
σs² = variance of severity
A. (σf²/µf + σs²/µs²) / {k² (1 - P)}
B. (σf²/µf² + σs²/µs²) / {k² (1 - P)}
C. (σf²/µf + σs²/µs²) k²/P
D. (σf²/µf² + σs²/µs²) k²/P
E. None of the above. 5.27 (2 points) Using the formula derived in the previous question, find the number of claims required for full credibility. Require that there is a 90% chance that the estimate of the pure premium is correct within 7.5%. The frequency distribution has a variance 2.5 times its mean. The claim amount distribution is a Pareto with α = 2.3. A. Less than 15,000 B. At least 15,000 but less than 16,000 C. At least 16,000 but less than 17,000 D. At least 17,000 but less than 18,000 E. At least 18,000
5.28 (4, 5/85, Q.32) (1 point) The expected number of claims needed to produce a selected level of credibility for the claim frequency is 1200. Let: Average claim frequency = 200 Average claim cost = 400 Variance of claim frequency = 200 Variance of claim cost = 80,000 What is the expected number of claims required to produce the same level of credibility for the pure premium? (Use Classical Credibility.) A. Less than 1,750 B. At least 1,750, but less than 1,850 C. At least 1,850, but less than 1,950 D. At least 1,950, but less than 2,050 E. 2,050 or more
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 82
5.29 (4, 5/85, Q.33) (2 points) How many claims are necessary for full credibility if the standard for full credibility is to have the estimated pure premium be within 8% of the true pure premium 90% of the time? Assume the average claim severity is $1000 and the standard deviation of the claim severity is 4000. Assume the variance of the number of claims is 1.5 times the mean number of claims. Assume frequency and severity are independent.
A. Less than 7,150
B. At least 7,150, but less than 7,250
C. At least 7,250, but less than 7,350
D. At least 7,350, but less than 7,450
E. 7,450 or more
5.30 (4, 5/87, Q.34) (1 point) The expected number of claims needed to produce a selected standard for full credibility for the pure premium is 1800. If the claim size were constant, the same selected standard for full credibility would require 1200 claims. Given the information below, what is the variance of the claim cost in the first situation?
• The number of claims is Poisson.
• Average Claim Frequency = 200.
• Average Claim Cost = 400.
A. 20,000 B. 40,000 C. 80,000 D. 120,000 E. 160,000
5.31 (4, 5/87, Q.35) (2 points) The number of claims for a company's major line of business is Poisson distributed, and during the past year, the following claim size distribution was observed:
$ 0 - 400        20
400 - 800        240
800 - 1200       320
1200 - 1600      210
1600 - 2000      100
2000 - 2400      60
2400 - 2800      30
2800 - 3200      10
3200 - 3600      10
Total            1000
The mean of this claim size distribution is $1216 and the standard deviation is √362,944.
You need to select the number of claims needed to ensure that the estimate of losses is within 8% of the actual value 90% of the time. How many claims are needed for full credibility if the claim size distribution is considered? A. Less than 450 claims B. At least 450, but less than 500 claims C. At least 500, but less than 550 claims D. At least 550, but less than 600 claims E. 600 claims or more
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 83
5.32 (4, 5/90, Q.29) (2 points) The ABC Insurance Company has decided to establish its full credibility requirements for an individual state rate filing using Classical Credibility. The full credibility standard is to be set so that the observed total cost of claims underlying the rate filing should be within 5% of the true value with probability 0.95. The claim frequency follows a Poisson distribution and the claim severity is distributed according to the following distribution:
f(x) = 1/100,000, for 0 ≤ x ≤ 100,000.
What is the expected number of claims, nF, necessary to obtain full credibility?
A.
nF < 1500
B. 1500 ≤ nF < 1800 C. 1800 ≤ nF < 2100 D. 2100 ≤ nF < 2400 E. 2400 ≤ nF 5.33 (4, 5/91, Q.22) (1 point) The average claim size for a group of insureds is $1,500 with standard deviation $7,500. Assuming a Poisson claim count distribution, calculate the expected number of claims so that the total loss will be within 6% of the expected total loss with probability P = 0.90. A. Less than 10,000 B. At least 10,000 but less than 15,000 C. At least 15,000 but less than 20,000 D. At least 20,000 but less than 25,000 E. At least 25,000 5.34 (4, 5/91, Q.39) (3 points) The full credibility standard for a company is set so that the total number of claims is to be within 5% of the true value with probability P. This full credibility standard is calculated to be 800 claims. The standard is altered so that the total cost of claims is to be within 10% of the true value with probability P. The claim frequency has a Poisson distribution and the claim severity has the following distribution. f(x) = (0.0002) (100 - x), 0 ≤ x ≤ 100. What is the expected number of claims necessary to obtain full credibility under the new standard? A. Less than 250 B. At least 250 but less than 500 C. At least 500 but less than 750 D. At least 750 but less than 1000 E. At least 1000
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 84
5.35 (4B, 5/92, Q.1) (2 points) You are given the following information:
• A standard for full credibility of 1,000 claims has been selected so that the actual pure premium would be within 10% of the expected pure premium 95% of the time.
• The number of claims follows a Poisson distribution, and is independent of the severity distribution.
Using the concepts from Classical Credibility determine the coefficient of variation of the severity distribution underlying the full credibility standard.
A. Less than 1.20
B. At least 1.20 but less than 1.35
C. At least 1.35 but less than 1.50
D. At least 1.50 but less than 1.65
E. At least 1.65
5.36 (4B, 5/92, Q.16) (2 points) You are given the following information:
• The number of claims follows a Poisson distribution.
• Claim severity is independent of the number of claims and has the following distribution: f(x) = (5/2) x^(-7/2), x > 1.
A full credibility standard is determined so that the total number of claims is within 5% of the expected number with probability 98%. If the same expected number of claims for full credibility is applied to the total cost of claims, the actual total cost would be within 100K% of the expected cost with 95% probability. Using the normal approximation of the aggregate loss distribution, determine K.
A. Less than 0.04
B. At least 0.04 but less than 0.05
C. At least 0.05 but less than 0.06
D. At least 0.06 but less than 0.07
E. At least 0.07
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 85 5.37 (4B, 11/92, Q.1) (2 points) You are given the following: • The number of claims is Poisson distributed. • Number of claims and claim severity are independent. • Claim severity has the following distribution: Claim Size Probability 1 0.50 2 0.30 10 0.20 Determine the number of claims needed so that the total cost of claims is within 10% of the expected cost with 90% probability. A. Less than 625 B. At least 625 but less than 825 C. At least 825 but less than 1,025 D. At least 1,025 but less than 1,225 E. At least 1,225 5.38 (4B, 11/92, Q.10) (2 points) You are given the following: • A full credibility standard of 3,025 claims has been determined using classical credibility concepts. • The full credibility standard was determined so that the actual pure premium is within 10% of the expected pure premium 95% of the time. • Number of claims is Poisson distributed. Determine the coefficient of variation for the severity distribution. A. Less than 2.25 B. At least 2.25 but less than 2.75 C. At least 2.75 but less than 3.25 D. At least 3.25 but less than 3.75 E. At least 3.75
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 86
5.39 (4B, 11/92, Q.15) (2 points) You are given the following:
• X is the random variable for claim size.
• N is the random variable for number of claims and has a Poisson distribution.
• X and N are independent.
• n0 is the standard for full credibility based only on number of claims.
• nF is the standard for full credibility based on total cost of claims.
• n is the observed number of claims.
• C is the random variable for total cost of claims.
• Z is the amount of credibility to be assigned to total cost of claims.
According to the Classical credibility concepts, which of the following are true?
1. Var(C) = E(N) Var(X) + E(X) Var(N)
2. nF = n0 {E(X)² + Var(X)}/E(X)²
3. Z = √(n/nF)
A. 1 only
B. 2 only
C. 1, 3 only
D. 2, 3 only
E. 1, 2, 3
5.40 (4B, 5/93, Q.10) (2 points) You are given the following: • The number of claims for a single insured follows a Poisson distribution. • The coefficient of variation of the severity distribution is 2. • The number of claims and claim severity distributions are independent. • Claim size amounts are independent and identically distributed. • Based on Classical credibility, the standard for full credibility is 3415 claims. With this standard, the observed pure premium will be within k% of the expected pure premium 95% of the time. Determine k. A. Less than 5.75% B. At least 5.75% but less than 6.25% C. At least 6.25% but less than 6.75% D. At least 6.75% but less than 7.25% E. At least 7.25%
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 87
5.41 (4B, 11/93, Q.11) (3 points) You are given the following:
• Number of claims follows a Poisson distribution.
• Claim severity is independent of the number of claims and has the following probability density distribution: f(x) = 5x^(-6), x > 1.
A full credibility standard has been determined so that the total cost of claims is within 5% of the expected cost with a probability of 90%. If the same number of claims for full credibility of total cost is applied to frequency only, the actual number of claims would be within 100k% of the expected number of claims with a probability of 95%. Using the normal approximation of the aggregate loss distribution, determine k.
A. Less than 0.0545
B. At least 0.0545, but less than 0.0565
C. At least 0.0565, but less than 0.0585
D. At least 0.0585, but less than 0.0605
E. At least 0.0605
5.42 (4B, 5/94, Q.13) (2 points) You are given the following:
• 120,000 exposures are needed for full credibility.
• The 120,000 exposures standard was selected so that the actual total cost of claims is within 5% of the expected total 95% of the time.
• The number of claims per exposure follows a Poisson distribution with mean m.
• m was estimated from the following observed data using the method of moments:
Year    Exposures    Claims
1       18,467       1,293
2       26,531       1,592
3       20,002       1,418
If mean claim severity is $5,000, determine the standard deviation of the claim severity distribution.
A. Less than $9,000
B. At least $9,000, but less than $12,000
C. At least $12,000, but less than $15,000
D. At least $15,000, but less than $18,000
E. At least $18,000
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 88 5.43 (4B, 11/94, Q.11) (3 points) You are given the following: Number of claims follows a Poisson distribution with mean m. X is the random variable for claim severity, and has a Pareto distribution with parameters α = 3.0 and θ = 6000. A standard for full credibility was developed so that the observed pure premium is within 10% of the expected pure premium 98% of the time. Number of claims and claims severity are independent. Using Classical credibility concepts, determine the number of claims needed for full credibility for estimates of the pure premium. A. Less than 600 B. At least 600, but less than 1200 C. At least 1200, but less than 1800 D. At least 1800, but less than 2400 E. At least 2400 5.44 (4B, 5/95, Q.10) (1 point) You are given the following: • The number of claims follows a Poisson distribution. • The distribution of claim sizes has a mean of 5 and variance of 10. • The number of claims and claim sizes are independent. How many expected claims are needed to be 90% certain that actual claim costs will be within 10% of the expected claim costs? A. Less than 100 B. At least 100, but less than 300 C. At least 300, but less than 500 D. At least 500, but less than 700 E. At least 700
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 89 5.45 (4B, 5/95, Q.26) (3 points) You are given the following: • 40,000 exposures are needed for full credibility. • The 40,000 exposures standard was selected so that the actual total cost of claims is within 7.5% of the expected total 95% of the time. • The number of claims per exposure follows a Poisson distribution with mean m. • The claim size distribution is lognormal with parameters µ (unknown) and σ = 1.5. •
The lognormal distribution has the following moments: mean: exp(µ + σ2/2)
variance: exp(2µ + σ2) {exp(σ2) - 1}.
• The number of claims per exposure and claim sizes are independent. Using the methods of classical credibility, determine the value of m. A. Less than 0.05 B. At least 0.05, but less than 0.10 C. At least 0.10, but less than 0.15 D. At least 0.15, but less than 0.20 E. At least 0.20 5.46 (4B, 11/95 Q.11) (2 points) You are given the following:
• The number of claims follows a Poisson distribution.
• Claim sizes follow a Pareto distribution, with parameters θ = 3000 and α = 4.
• The number of claims and claim sizes are independent.
• 2000 expected claims are needed for full credibility.
• The full credibility standard has been selected so that actual claim costs will be within 5% of expected claim costs P% of the time.
Using the methods of Classical credibility, determine the value of P.
A. Less than 82.5
B. At least 82.5, but less than 87.5
C. At least 87.5, but less than 92.5
D. At least 92.5, but less than 97.5
E. At least 97.5
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 90 5.47 (4B, 5/96, Q.27) (2 points) You are given the following: • The number of claims follows a Poisson distribution.
• Claim sizes follow a gamma distribution, with parameters α = 1 and θ (unknown).
•
The number of claims and claim sizes are independent. The full credibility standard has been selected so that actual claim costs will be within 5% of expected claim costs 90% of the time. Using the methods of Classical credibility, determine the expected number of claims required for full credibility. A. Less than 1,000 B. At least 1,000, but less than 2,000 C. At least 2,000, but less than 3,000 D. At least 3,000 E. Cannot be determined from the given information. 5.48 (4B, 11/96, Q.2) (1 point) Using the methods of Classical credibility, a full credibility standard of 1,000 expected claims has been established so that actual claim costs will be within 100c% of expected claim costs 90% of the time. Determine the number of expected claims that would be required for full credibility if actual claim costs were to be within 100c% of expected claim costs 95% of the time. A. Less than 1,100 B. At least 1,100, but less than 1,300 C. At least 1,300, but less than 1,500 D. At least 1,500, but less than 1,700 E. At least 1,700 5.49 (4B, 11/96, Q.28) (2 points) You are given the following:
• The number of claims follows a Poisson distribution.
• Claim sizes are discrete and follow a Poisson distribution with mean 4.
•
The number of claims and claim sizes are independent. The full credibility standard has been selected so that actual claim costs will be within 10% of expected claim costs 95% of the time. Using the methods of Classical credibility, determine the expected number of claims required for full credibility. A. Less than 400 B. At least 400, but less than 600 C. At least 600, but less than 800 D. At least 800, but less than 1,000 E. At least 1,000
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 91 5.50 (4B, 5/97, Q.2) (2 points) The number of claims follows a Poisson distribution. Using the methods of Classical credibility, a full credibility standard of 1,200 expected claims has been established for aggregate claim costs. Determine the number of expected claims that would be required for full credibility if the coefficient of variation of the claim size distribution were changed from 2 to 4 and the range parameter, k, were doubled. A. 500 B. 1,000 C. 1,020 D. 1,200 E. 2,040 5.51 (4B, 11/97, Q.24 & Course 4 Sample Exam 2000, Q.15) (3 points) You are given the following: • The number of claims per exposure follows a Poisson distribution with mean 0.01. •
Claim sizes follow a lognormal distribution, with parameters µ (unknown) and σ = 1.
• The number of claims per exposure and claim sizes are independent.
• The full credibility standard has been selected so that actual aggregate claim costs will be within 10% of expected aggregate claim costs 95% of the time.
Using the methods of Classical credibility, determine the number of exposures required for full credibility.
A. Less than 25,000
B. At least 25,000, but less than 50,000
C. At least 50,000, but less than 75,000
D. At least 75,000, but less than 100,000
E. At least 100,000
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 92 Use the following information for the next two questions: You are given the following: • The number of claims follows a Poisson distribution. •
Claim sizes follow an inverse gamma distribution, with parameters α = 4 and θ unknown.
• The number of claims and claim sizes are independent.
• The full credibility standard has been selected so that the actual aggregate claim costs will be within 5% of expected aggregate claim costs 95% of the time.
5.52 (4B, 5/98 Q.18) (2 points) Using the methods of Classical credibility, determine the expected number of claims required for full credibility.
A. Less than 1,600
B. At least 1,600, but less than 1,800
C. At least 1,800, but less than 2,000
D. At least 2,000, but less than 2,200
E. At least 2,200
5.53 (4B, 5/98 Q.19) (1 point) If the number of claims were to follow a negative binomial distribution instead of a Poisson distribution, determine which of the following statements would be true about the expected number of claims required for full credibility.
A. The expected number of claims required for full credibility would be smaller.
B. The expected number of claims required for full credibility would be the same.
C. The expected number of claims required for full credibility would be larger.
D. The expected number of claims required for full credibility would be either the same or smaller, depending on the parameters of the negative binomial distribution.
E. The expected number of claims required for full credibility would be either smaller or larger, depending on the parameters of the negative binomial distribution.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 93 5.54 (4B, 11/98, Q.5) (2 points) You are given the following: • The number of claims follows a Poisson distribution. • The variance of the number of claims is 10. • The variance of the claim size distribution is 10. • The variance of aggregate claim costs is 500. • The number of claims and claim sizes are independent. • The full credibility standard has been selected so that actual aggregate claim costs will be within 5% of expected aggregate claim costs 95% of the time. Using the methods of Classical credibility, determine the expected number of claims required for full credibility. A. Less than 2,000 B. At least 2,000, but less than 4,000 C. At least 4,000, but less than 6,000 D. At least 6,000, but less than 8,000 E. At least 8,000 5.55 (4B, 11/98, Q.29) (3 points) You are given the following: • The number of claims follows a Poisson distribution. •
Claim sizes follow a Burr distribution, with parameters θ (unknown), α = 6, and γ = 0.5.
• The number of claims and claim sizes are independent.
• 6,000 expected claims are needed for full credibility.
• The full credibility standard has been selected so that actual aggregate claim costs will be within 10% of expected aggregate claim costs P% of the time.
Using the methods of Classical credibility, determine the value of P.
Hint: For the Burr Distribution E[Xⁿ] = θⁿ Γ(1 + n/γ) Γ(α - n/γ) / Γ(α).
A. Less than 80
B. At least 80, but less than 85
C. At least 85, but less than 90
D. At least 90, but less than 95
E. At least 95
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 94 5.56 (4B, 5/99, Q.19) (1 point) You are given the following: • The number of claims follows a Poisson distribution. • The coefficient of variation of the claim size distribution is 2. • The number of claims and claim sizes are independent. • 1,000 expected claims are needed for full credibility. • The full credibility standard has been selected so that the actual number of claims will be within k% of the expected number of claims P% of the time. Using the methods of Classical credibility, determine the number of expected claims that would be needed for full credibility if the full credibility standard were selected so that actual aggregate claim costs will be within k% of expected aggregate claim costs P% of the time. A. 1,000 B. 1,250 C. 2,000 D. 2,500 E. 5,000 5.57 (4B, 11/99, Q.2) (2 points) You are given the following: • The number of claims follows a Poisson distribution. • Claim sizes follow a lognormal distribution, with parameters µ and σ . • The number of claims and claim sizes are independent. • 13,000 expected claims are needed for full credibility. • The full credibility standard has been selected so that actual aggregate claim costs will be within 5% of expected aggregate claim costs 90% of the time. Determine σ. A. Less than 1.2 B. At least 1.2, but less than 1.4 C. At least 1.4, but less than 1.6 D. At least 1.6, but less than 1.8 E. At least 1.8
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 95 5.58 (4, 11/00, Q.14) (2.5 points) For an insurance portfolio, you are given: (i) For each individual insured, the number of claims follows a Poisson distribution. (ii) The mean claim count varies by insured, and the distribution of mean claim counts follows a gamma distribution. (iii) For a random sample of 1000 insureds, the observed claim counts are as follows: Number Of Claims, n 0 1 2 3 4 5 Number Of Insureds, fn 512 307 123 41 11 6
∑ n fn = 750, ∑ n2 fn = 1494. (iv) Claim sizes follow a Pareto distribution with mean 1500 and variance 6,750,000. (v) Claim sizes and claim counts are independent. (vi) The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time. Determine the minimum number of insureds needed for the aggregate loss to be fully credible. (A) Less than 8300 (B) At least 8300, but less than 8400 (C) At least 8400, but less than 8500 (D) At least 8500, but less than 8600 (E) At least 8600 5.59 (4, 11/02, Q.14 & 2009 Sample Q. 39) (2.5 points) You are given the following information about a commercial auto liability book of business: (i) Each insuredʼs claim count has a Poisson distribution with mean λ, where λ has a gamma distribution with α = 15 and θ = 0.2. (ii) Individual claim size amounts are independent and exponentially distributed with mean 5000. (iii) The full credibility standard is for aggregate losses to be within 5% of the expected with probability 0.90. Using classical credibility, determine the expected number of claims required for full credibility. (A) 2165 (B) 2381 (C) 3514 (D) 7216 (E) 7938
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 96 5.60 (4, 11/03, Q.3 & 2009 Sample Q.2) (2.5 points) You are given: (i) The number of claims has a Poisson distribution. (ii) Claim sizes have a Pareto distribution with parameters θ = 0.5 and α = 6. (iii) The number of claims and claim sizes are independent. (iv) The observed pure premium should be within 2% of the expected pure premium 90% of the time. Determine the expected number of claims needed for full credibility. (A) Less than 7,000 (B) At least 7,000, but less than 10,000 (C) At least 10,000, but less than 13,000 (D) At least 13,000, but less than 16,000 (E) At least 16,000 5.61 (4, 5/05, Q.2 & 2009 Sample Q.173) (2.9 points) You are given: (i) The number of claims follows a negative binomial distribution with parameters r and β = 3. (ii) Claim severity has the following distribution: Claim Size Probability 1 0.4 10 0.4 100 0.2 (iii) The number of claims is independent of the severity of claims. Determine the expected number of claims needed for aggregate losses to be within 10% of expected aggregate losses with 95% probability. (A) Less than 1200 (B) At least 1200, but less than 1600 (C) At least 1600, but less than 2000 (D) At least 2000, but less than 2400 (E) At least 2400
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 97
5.62 (4, 11/05, Q.35 & 2009 Sample Q.245) (2.9 points) You are given:
(i) The number of claims follows a Poisson distribution.
(ii) Claim sizes follow a gamma distribution with parameters α (unknown) and θ = 10,000.
(iii) The number of claims and claim sizes are independent.
(iv) The full credibility standard has been selected so that actual aggregate losses will be within 10% of expected aggregate losses 95% of the time.
Using limited fluctuation (classical) credibility, determine the expected number of claims required for full credibility.
(A) Less than 400
(B) At least 400, but less than 450
(C) At least 450, but less than 500
(D) At least 500
(E) The expected number of claims required for full credibility cannot be determined from the information given.
5.63 (4, 11/06, Q.30 & 2009 Sample Q.273) (2.9 points) A company has determined that the limited fluctuation full credibility standard is 2000 claims if:
(i) The total number of claims is to be within 3% of the true value with probability p.
(ii) The number of claims follows a Poisson distribution.
The standard is changed so that the total cost of claims is to be within 5% of the true value with probability p, where claim severity has probability density function:
f(x) = 1/10,000, 0 ≤ x ≤ 10,000.
Using limited fluctuation credibility, determine the expected number of claims necessary to obtain full credibility under the new standard.
(A) 720 (B) 960 (C) 2160 (D) 2667 (E) 2880
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 98
Solutions to Problems:
5.1. B. The mean severity = exp[µ + 0.5σ²] = exp(6.72) = 828.82. The second moment of the severity = exp[2µ + 2σ²] = exp(14.88) = 2,899,358. Thus 1 + CV² = E[X²]/E[X]² = 2,899,358/828.82² = 4.221. y = 1.960 since Φ(1.960) = 0.975 = (1 + 0.95)/2. Therefore n0 = y²/k² = (1.96/0.1)² = 384. Therefore nF = n0 (1 + CV²) = (384)(4.221) = 1621 claims.
5.2. A. Square of Coefficient of Variation = (1 million)/(500²) = 4. y = 1.282 since Φ(1.282) = 0.9 = (1 + 0.8)/2. k = 5%. Therefore in terms of number of claims the full credibility standard is: (y²/k²)(1 + CV²) = (1.282/0.05)² (1 + 4) = 3287 claims. This is equivalent to: 3287/0.07 = 46,958 policies.
5.3. C. The severity has a mean of 166.7, and a second moment of 41,667:
∫0^500 x f(x) dx = 0.000008 ∫0^500 x (500 - x) dx = (0.000008) (250x² - x³/3)] from x = 0 to x = 500 = 166.7.
∫0^500 x² f(x) dx = 0.000008 ∫0^500 x² (500 - x) dx = (0.000008) (500x³/3 - x⁴/4)] from x = 0 to x = 500 = 41,667.
1 + CV² = E[X²]/E[X]² = 41,667/166.7² = 1.5. The standard for Full Credibility for the pure premiums for k = 5% is therefore nF = n0 (1 + CV²) = (5000)(1.5) = 7500. For k = 10% we need to multiply by (5%/10%)² = 1/4, since the full credibility standard is inversely proportional to k². 7500/4 = 1875.
5.4. B. We have y = 2.326 since Φ(2.326) = 0.99 = (1 + 0.98)/2. Therefore n0 = y²/k² = (2.326/0.05)² = 2164. nF = n0 (1 + CV²), therefore CV = √(nF/n0 - 1) = √(3000/2164 - 1) = 0.62.
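As a cross-check on 5.3 (my own sketch, not part of the original solution; the midpoint-rule integration and the function name are choices of the sketch):

def moment(n, steps=100_000):
    # midpoint-rule integral of x^n f(x), with f(x) = 0.000008 (500 - x) on [0, 500]
    h = 500 / steps
    return sum(((i + 0.5) * h) ** n * 0.000008 * (500 - (i + 0.5) * h) * h
               for i in range(steps))

m1, m2 = moment(1), moment(2)
print(round(m1, 1), round(m2))                        # 166.7 and 41667
print(round(5000 * (m2 / m1**2) * (0.05 / 0.10)**2))  # 1875 claims under the new standard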
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 99 5.5. D. We have y = 1.645 since Φ(1.645) = .95 = (1+.9)/2. Therefore, n0 = y2 / k2 = (1.645 / 0.03)2 = 3007. The mean severity is (1)(0.5) + (5)(0.3) + (10)(0.2) = 4. The 2nd moment of the severity is: (12 )(0.5) + (52 )(0.3) + (102 )(0.2) = 28. 1 + CV2 = E[X2 ]/E[X]2 = 28 / 42 = 1.75. nF = n0 (1+CV2 ) = (3007)(1.75) = 5262. 5.6. C. P = 99% ⇒ y = 2.576. ⇒ n0 = (2.576/0.1)2 = 664 claims. For the Pareto severity: 1 + CV2 = E[X2 ] / E[X]2 =
{2θ² / [(4 - 1)(4 - 2)]} / {θ/(4 - 1)}² = 3.
Thus the standard for full credibility is: (664)(3) = 1992 claims. Thus, 1992 claims ⇔ 50,000 exposures. ⇒ λ = 1992 / 50,000 = 3.98%. 5.7. E. P = 90%. ⇒ y = 1.645. ⇒ n0 = (1.645/0.05)2 = 1082. Standard for full credibility is: n0 CVPP2 = (1082)(52 ) = 27,050 exposures. 5.8. C. Assume there are N claims expected and therefore N/µf exposures. The mean pure premium is m = Nµs. For frequency and severity independent, the variance of the pure premium for a single exposure is: µf σs2 + µs2 σf2 . The variance of the aggregate loss for N/µf independent exposures is: σ2 = (N / µf)(µf σs2 + µs2 σf2 ) = N (σs2 + µs2 σf2 /µf). We desire that Prob[m - km ≤ X ≤ m + km] ≥ P. Using the Normal Approximation this is true provided km = yσ Therefore, k2 m2 = y2 σ2 . Thus, k2 N2 µs2 = y2 N (σs2 + µs2 σf2 /µf). Solving, N = y2 (σs2 + µs2 σf2 /µf)/(k2 µs2 ) = (σf2 /µ f + σs2 / µs2 )(y2 /k2 ) = n0 (σf2 /µf + CVSev2 ). Comment: See Mayerson, Jones and Bowers “The Credibility of the Pure Premium”, PCAS 1968. Note that if one assumes a Poisson Frequency Distribution, then σf2 /µf = 1 and the formula becomes: (1 + CV2 ) {y2 / k2 }.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 100
5.9. E. Φ(1.645) = 0.95 so that y = 1.645. n0 = y²/k² = (1.645/0.075)² = 481. Using the formulas for the moments:
CV² = E[X²]/E²[X] - 1 = {2θ²/[(α − 1)(α − 2)]} / {θ/(α − 1)}² - 1 = 2(α − 1)/(α − 2) - 1 = α/(α − 2).
For α = 2.3, CV² = 2.3/0.3 = 7.667. Therefore, n0 (σf²/µf + CV²) = (481)(2.5 + 7.667) = 4890.
5.10. B. nF = n0 (1 + CV²). Therefore CV² = (nF/n0) - 1 = (1500/850) - 1 = 0.7647. Variance of severity = CV² mean² = (0.7647)(500²) = 191,175.
5.11. C. CV² = 280,000/400² = 1.75. nF = n0 (1 + CV²) = (700)(1 + 1.75) = 1925.
5.12. D. Φ(2.576) = 0.995, so y = 2.576. For frequency the standard for full credibility is: (2.576/0.05)² = 2654. Φ(1.960) = 0.975, so y = 1.960 for the Standard for Full Credibility for pure premium. Thus 2654 = nF = (y²/k²)(1 + CV²) = {1.96²/k²}(1 + 2.5²) = 27.85/k². Thus k = √(27.85/2654) = 0.102.
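A quick numeric sketch of 5.9 (my own illustration; the NormalDist call and the variable names are not from the guide):

from statistics import NormalDist

P, k = 0.90, 0.075
y = NormalDist().inv_cdf((1 + P) / 2)       # 1.645
n0 = (y / k) ** 2                           # about 481
var_over_mean_freq = 2.5                    # given: frequency variance = 2.5 times its mean
alpha = 2.3
cv_sev_sq = alpha / (alpha - 2)             # Pareto: E[X^2]/E[X]^2 - 1 = alpha/(alpha - 2)
print(round(n0 * (var_over_mean_freq + cv_sev_sq)))   # 4890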
5.13. C. 1. False. The formula for the Standard for Full Credibility for either severity or the Pure Premium involves the severity distribution via the coefficient of variation, which is not affected by (uniform) inflation. (The Standard for Full Credibility for frequency doesnʼt involve severity at all, and is thus also unaffected.)
2. True. (y²/k²)(1 + CV²) ≥ (y²/k²).
3. False. Limited (basic limits) losses have a smaller coefficient of variation than do unlimited (total limits) losses. Therefore, the Standard for Full Credibility for Basic Limits losses is less.
5.14. C. n0 = y²/k² = (1.282/0.10)² = 164. For the Pure Premium, the Standard For Full Credibility is: n0 (1 + CV²) = (164)(1 + 4,000,000/1000²) = (164)(5) = 820.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 101
5.15. E. The standard for full credibility is n0 (σf²/µf + CVsev²). Since the only thing that differs is the severity distribution, the ranking depends on CVsev, the coefficient of variation of the severity distribution.
For the Exponential, CV = 1.
For the Weibull, 1 + CV² = E[X²]/E[X]² = θ² Γ(1 + 2/τ) / {θ Γ(1 + 1/τ)}² = Γ(1 + 2/τ)/Γ(1 + 1/τ)² = Γ(5)/Γ(3)² = 24/4 = 6. CV = √5 = 2.236.
For the LogNormal, 1 + CV² = E[X²]/E[X]² = exp[2µ + 2σ²]/exp[µ + σ²/2]² = exp[σ²] = exp[0.64] = 1.896. CV = √0.896 = 0.95.
From smallest to largest: 3, 1, 2.
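A small Python sketch of the same comparison (my own check; math.gamma and the variable names are choices of the sketch, not the guide's):

from math import gamma, exp, sqrt

cv_sq_exponential = 1.0                                            # CV = 1 for any theta
tau = 0.5
cv_sq_weibull = gamma(1 + 2 / tau) / gamma(1 + 1 / tau) ** 2 - 1   # 6 - 1 = 5
sigma = 0.8
cv_sq_lognormal = exp(sigma ** 2) - 1                              # about 0.896
for name, cv_sq in [("LogNormal", cv_sq_lognormal),
                    ("Exponential", cv_sq_exponential),
                    ("Weibull", cv_sq_weibull)]:
    print(name, round(sqrt(cv_sq), 3))
# 0.947, 1.0, 2.236: the standards rank 3, 1, 2 from smallest to largest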
5.16. D. k = 5% and P = 90%. We have y = 1.645 since Φ(1.645) = 0.95 = (1+P)/2. Therefore, n0 = y2 /k2 = (1.645/0.05)2 = 1082. For Poisson frequency, the variance of the total losses is: (mean frequency)(µs2 + σs2) = (mean frequency)(mean severity)2 (1 + CVsev2 ). Thus 40,000 = (2)(1002 )(1 + CVsev2 ). ⇒ (1 + CV2 ) = 2. But nF = n0 (1+CVsev2 ) = (1082)(2) = 2164 claims. 5.17. D. The Standard for Full Credibility for the pure premium is the sum of those for frequency and severity. Thus in this case, the standard for full credibility for the severity is 2000 - 800 =1200 claims. 5.18. E. For the Burr Distribution E[Xn ] = λn/τΓ(α − n/τ)Γ(1+ n/τ)/Γ(α). For α = 9 and τ = 0.25, E[X] = λ4 Γ(9-4)Γ(1+ 4)/Γ(9) = λ4 (4!)(4!)/8! = λ4/70. E[X2 ] = λ8 Γ(9-8)Γ(1+ 8)/Γ(9) = λ8 (1)(8!)/8! = λ8. (1+CV2 ) = E[X2 ] / E2 [X] = (λ8)/(λ4/70)2 = 4900. We have y = 1.439 since Φ(1.439) = 0.925 = (1+P)/2. k = 0.10. Therefore, nF = n0 (1+CV2 ) = (y/k)2 (1+CV2 ) = (1.439/0.10}2 (4900) = 1.015 million claims. Comment: For τ = 0.25 one gets a very heavy-tailed Burr Distribution and therefore a very large Standard for Full Credibility.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 102 5.19. B. Frequency is Poisson and therefore µf = σf2 . σpp2 = µs2 σf2 + µfσs2 = µf (µs2 + σs2 ). Thus 1300 = 20(µs2 + 35 ). Therefore, µs2 = 30. C Vsev2 = σs2 / µs2 = 35/30 = 1.167. k = 0.075. Φ(y) = (1+P)/2 = (1+.98)/2 = 0.99. Thus y = 2.326. n0 = (y/k)2 = (2.326/0.075)2 = 962. nF = n0 (1+CV2 ) = (962) (1 + 1.167) = 2085. 5.20. B. For situation #1: n0 (σf2 /µf) = n0 rβ(1 + β)/(rβ) = n0 (1 + β) = 1.3n0 . For situation #2: n0 (CV2 ) = n0 (E[X2 ]/E[X]2 - 1) = n0 ({2θ2 /(α−1)(α−2)}/{θ / (α−1)}2 - 1) = {2(α−1)/(α−2) - 1}n0 = (α/(α - 2))n0 = (5/3)n0 = 1.67n0 . For situation #3: n0 (1 + CV2 ) = n0 (1 + αθ2/(αθ)2) = n0 (1 + 1/α) = 1.5n0 . From smallest to largest: 1, 3, 2. 5.21. D. For the Geometric Distribution: variance / mean = β(1+β) / β = 1 + β = 1.4. For the Exponential distribution: CV = 1. For the Gamma Distribution, CV2 = αθ2 / (αθ)2 = 1/α = 1/2. Old standard: 16,000 = (y/r)2 (1.4 + 12 ). ⇒ (y/r)2 = 16,000/2.4. New standard: {y/(3r)}2 (1.4 + 1/2) = (1/9) (y/r)2 (1.9) = (1.9/9) (16,000/2.4) = 1407. 5.22. E. For the Gamma-Poisson, the mixed distribution is Negative Binomial, with r = α = 4 and β = θ = 0.5. Therefore, for frequency σf2 /µf = rβ(1 + β)/ (rβ) = 1 + β = 1.5. For the Uniform Distribution from 0 to 500, σs2 / µs2 = {(5002 )/12}/2502 = 1/3. For P = 0.98, y = 2.326. k = 0.1. {y2 / k2 }(σf2 /µf + σs2 / µs2 ) = (2.326/0.1)2 (1.5 + 0.333) = 992 claims. Comment: Similar to 4, 11/02, Q.14.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 103 5.23. E. 13,800 = (σf2 /µf)n0 . 18,050 = (σf2 /µf + CVSev2 )n0 . Subtracting the first equation from the second: CVSev2 n0 = 4246. For the Gamma, CV2 = variance/mean2 = αθ2/(αθ)2 = 1/α = 1/2.5 = 0.4. Therefore, n0 = 4246/0.4 = 10,615. ⇒ (y/k)2 = 10,615. ⇒ y/k = 103.03.
⇒ y = (103.03)(2.5%) = 2.576. 99.5% = Φ[y] = (1+P)/2. ⇒ P = 99%. 5.24. E. n0 = (y/k)2 = (1.645/.05)2 = 1082. For the Pure Premium, when we have a general frequency distribution (not necessarily Poisson), the Standard For Full Credibility is: n0 (σf2 /µf + CVsev2 ) = (1082) (2 + 25000/1002 ) = (1082)(4.5) = 4869. 5.25. E. The coefficient of variation = standard deviation / mean = 200 /100 = 2. nF = n0 (1+CVsev2 ) = n0 (1+22 ) = 5n0 . 5.26. A. Assume there are N claims expected and therefore N/µf exposures. The mean pure premium is m = Nµs. For frequency and severity independent, the variance of the pure premium for a single exposure is: µf σs2 + µs2 σf2 . The variance of the aggregate loss for N/µf independent exposures = σ2 = (N/µf)(µf σs2 + µs2 σf2 ) = N (σs2 + µs2 σf2 /µf). We desire that: Prob[(m - km) ≤ X ≤ ( m + km)] ≥ P or equivalently Prob[|X-m| ≥ km] ≤ 1 - P. This is in the form of Chebyshevʼs inequality provided we take 1/a2 = 1 - P, and km = aσ. Thus a = 1 /
√(1 - P), and km = σ/√(1 - P). Therefore k²m²(1 - P) = σ².
Thus, k2 N2 µs2 (1-P) = N (σs2 + µs2 σf2 /µf ). Solving for N = (σs2 + µs2 σf2 /µf ) / {k2 µs2 (1-P)} = (σf2 /µ f + σs2 / µs2 ) / {k2 (1-P)} Comment: See Dale Nelsonʼs review in PCAS 1969 of Mayerson, Jones and Bowers “The Credibility of the Pure Premium”. Note that this formula resembles that derived from the normal approximation, but with y2 replaced by 1/(1 - P). For example for P = 95%, y2 = 1.962 = 3.84, while 1/(1-P) = 1/.05 = 20. Thus while Chebyshevʼs inequality holds regardless of the form of the distribution, it is very conservative if the distribution is approximately Normal. For P = 95% it results in a standard for full credibility 5.2 times as large.
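To see how conservative the Chebyshev bound is (my own sketch, not part of the original comment), one can compare 1/(1 - P) with y² directly:

from statistics import NormalDist

for P in (0.90, 0.95, 0.99):
    y = NormalDist().inv_cdf((1 + P) / 2)
    print(P, round((1 / (1 - P)) / y ** 2, 1))
# about 3.7, 5.2, and 15.1 times the normal-approximation standard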
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 104
5.27. E. Using the formulas for the moments:
CV² = E[X²]/E²[X] - 1 = {2θ²/[(α − 1)(α − 2)]} / {θ/(α − 1)}² - 1 = 2(α − 1)/(α − 2) - 1 = α/(α − 2).
For α = 2.3, CV2 = 2.3 / 0.3 =7.667. Therefore, (σf2 /µf + σs2 /µs2 ) / {k2 (1-P)} = (2.5 + 7.667) / {(0.0752 )(1 - 0.9)} = 18,075. Comment: Note how much larger the Standard for Full Credibility is than when using the Normal Approximation as in a previous question. 5.28. B. nF = n0 (1+CV2 ) = (1200)(1 + 80000/4002 ) = (1200)(1.5) = 1800 claims. 5.29. D. k = 8%, P = 90%. Therefore y = 1.645, since Φ(1.645) = 0.95 = (1+P)/2. n0 = (y/k)2 = (1.645/0.08)2 = 423. CV = standard deviation / mean = 4000/1000 = 4. nF = (σf2 /µf + σs2 /µs2 ) {y2 /k2 } = n0 (σf2 /µf + CV2 ) = (423)(1.5 + 42 ) = 7403. 5.30. C. For a Poisson frequency, the standard for full credibility for the pure premium is n0 (1 + CV2 ), where CV is the coefficient of variation of the severity and n0 is the standard for full credibility for frequency. Therefore in this case, 1800 = 1200(1 + CV2 ). Therefore CV2 = (1800/1200) - 1 = 0.5. But the square of the coefficient of variation = variance / mean2 . Therefore variance of severity = (0.5)(4002 ) = 80,000. Comment: Given an output, you are asked to solve for the missing input. Note that one makes no use of the given average frequency. 5.31. C. k = 0.08 and P = 0.90. y = 1.645 , since Φ(1.645 ) = 0.95 = (1+P)/2. n0 = y2 / k2 = (1.645/0.08)2 = 423. The coefficient of variation of the severity = standard deviation / mean. C V2 = 362,944 / 12162 = 0.245. Thus nF = n0 (1 + CV2) = (423)(1.245) = 527.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 105 5.32. C. The mean of the severity distribution is 100,000/2 = 50,000. The Second Moment of the Severity Distribution is the integral from 0 to 100,000 of x2 f(x), which is 100,0003 / {3 (100000)}. Thus the variance is: 100,0002 / 3 - 50,0002 = 833,333,333. Thus the square of the coefficient of variation is 833,333,333 / 50,0002 = 1/3. k = 5% (within ±5%) and since P = 0.95, y = 1.960 since Φ(1.960) = (1+P)/2 = 0.975. The Standard for Full Credibility Pure Premium, nF = (y2 / k2) (1+CV2 ) = (1.96/.05)2 (1 + 1/3) = (1537) (4/3) = 2049 claims. Comment: For the uniform distribution on the interval (a,b) the coefficient of variation is: Thus the CV2 =
b-a . (b + a) 3
(b - a)2 (100,000 - 0)2 = = 1/3. (b + a)2 (3) (100,000 + 0)2 (3)
Note that the CV2 is 1/3 whenever a = 0. 5.33. C. k = 6% ( within 6% of the expected total cost ). Φ(1.645) = 0.95 = (1+0.90)/2, so that y = 1.645. Standard for full credibility for frequency = n0 = y2 / k2 = (1.645 / 0.06)2 = 751.7. Coefficient of Variation of the severity = 7500 /1500 = 5. Standard for full credibility for pure premium = nF = n0 (1+CV2) = (751.7)(1 + 52 ) = 19,544 claims. 5.34. B. For the given severity distribution the mean is: 100
∫0^100 x f(x) dx = (0.0002) ∫0^100 x (100 - x) dx = (0.0002) (50x² - x³/3)] from x = 0 to x = 100 = 33.33.
For the given severity distribution the second moment is:
∫0^100 x² f(x) dx = (0.0002) ∫0^100 x² (100 - x) dx = (0.0002) {(100/3)x³ - x⁴/4}] from x = 0 to x = 100 = 1666.67.
⇒ y2 = (800)(0.052 ) = 2. Now for the same P value that produced this y value, we want a standard for full credibility for pure premiums, with k = 0.10: {y2 / k2}(1+CV2 ) = {2 / 0.12 } (1 + 0.50) = (200)(1.5) = 300 claims.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 106 5.35. B. k = 10% ( within 10% of the expected pure premium ). Φ(1.960) = 0.975 = (1+.95)/2, so that y = 1.960. Standard for full credibility for frequency = n0 = (y/k)2 = (1.960 / 0.10)2 = 384 claims. Standard for full credibility for pure premium = nF = n0 (1+CV2 ). Therefore CV2 = (nF / n0 ) - 1 = (1000/ 384) -1 = 1.604. Thus CV = 1.27. 5.36. C. k = 5% ( within 5% of the expected frequency). Φ(2.327) = 0.99 = (1+.98)/2, so that y = 2.327. Standard for full credibility for frequency = n0 = (y/k)2 = (2.327 / 0.05)2 = 2166 claims. Now one has to start fresh and write down the formula for a standard for full credibility for the pure premium, with a new y and k. Since Φ(1.960) =0.975 = (1+0.95)/2, the new y = 1.960. The standard for full credibility for pure premium = nF = n0 (1+CV2 ). ∞
The mean severity is:
∞
∫1 x f(x) dx = ∫1 x (5 / 2)
x- 7 / 2
dx = {(5 / 2) / (-3 / 2)}
∞
The 2nd moment is:
∫1
x2
(5 / 2)
x- 7 / 2
dx = {(5 / 2) / (-1/ 2)}
x= ∞ 3 / 2 x
]
= 5/3.
x=1 x =∞ ∞ 1/ 2 x
]
= 5.
x =1
Thus the variance = 5 - (5/3)2 = 2.22. The coefficient of variation is the standard deviation divided by the mean: √2.22 / (5/3) = 0.894. We are given that this standard for full credibility for pure premium is equal to the previously calculated standard for full credibility for frequency; thus 2166 = (1.9602 / k2 )(1 + 0.8942 ). Solving, the new k = 0.056. Comment: The severity is a Single Parameter Pareto Distribution, with α = 2.5 and θ = 1.
5.37. A. We are given k = 10%, P = 90%. Φ(y) = (P +1)/ 2 = 95%. The Normal distribution has a 95% chance of being less than 1.645. Thus y = 1.645. n0 = (y/k)2 = 271. The mean severity is 3.1 and the variance of the severity = 21.7 - 3.12 = 12.09.
Claim Size    Probability    Square of Claim Size
1             0.5            1
2             0.3            4
10            0.2            100
Average 3.1                  21.7
Therefore the Square of the Coefficient of Variation = variance / mean2 = 12.09 / 3.12 = 1.258. Therefore the full credibility standard is = n0 (1 + CV2 ) = (271)(1 + 1.258) = 612 claims. 5.38. B. nF = n0 (1+CV2 ) = (y2 / k2) (1+CV2 ). k =10%. y = 1.960 since Φ(1.960) = 0.975 = (P+1)/2 = 1.90 / 2. Thus 3025 = (1.96/0.1)2 (1+CV2 ). Therefore: 7.87 = 1+CV2 . ⇒ CV = 2.62. 5.39. D. 1. False. The correct formula contains the square of the mean severity: Var(C) = E(N) Var(X) + E(X)2 Var(N). 2. True. Using the fact that the Coefficient of Variation is the mean over the standard deviation: n0 {E(X)2 + Var(X)}/E(X)2 = n0 {1 + Var(X)/E(X)2 } = n0 [1 + CV2 ] = nF. 3. True. The “square root rule” for partial credibility used in Classical Credibility. Comment: Statement 3 is only true for n ≤ nF. For n ≥ nF, Z=1. 5.40. E. The Normal distribution has a (1+P)/2 = (1+.95)/2 = 97.5% chance of being less than 1.960. Thus y = 1.960. Therefore in terms of number of claims the full credibility standard is = y 2 (1+ CV2 ) / k2 = (1.962 )(1 + 4) /k2 = 3415 claims. Therefore k = (1.96) 5 / 3415 = 0.075. Comment: You are given the output, 3415 claims, and asked to solve for the missing input, k.
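As a numerical check of solution 5.37 (my own sketch, not part of the original solution; variable names are mine, and y = 1.645 comes from the Normal table as in the solution):

from math import sqrt

# Discrete severity: sizes 1, 2, 10 with probabilities 0.5, 0.3, 0.2.
sizes, probs = [1, 2, 10], [0.5, 0.3, 0.2]
mean = sum(p * x for p, x in zip(probs, sizes))          # 3.1
second = sum(p * x * x for p, x in zip(probs, sizes))    # 21.7
cv2 = second / mean**2 - 1                               # 1.258

y, k = 1.645, 0.10                                       # P = 90%, k = 10%
n0 = round((y / k) ** 2)                                 # 271
print(round(n0 * (1 + cv2)))                             # 612 claims, as in 5.37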
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 108 5.41. C. We are given k = 5%. Φ(y) = (1+P)/2 = (1+.90)/2 = 0.95, therefore y = 1.645. n0 = y2/k2 = 1.6452 / 0.052 = 1082 claims. The given severity distribution is a Single Parameter Pareto, with α = 5 and θ = 1. First moment is: αθ/ (α-1) = 5/4. Second moment is: αθ2/ (α-2) = 5/3. 1 + CV2 = E[X2 ] / E[X]2 = (5/3) / (5/4)2 = 16/15. Therefore, the standard for full credibility for the Pure Premium is: n0 (1 + CV2 ) = (1082)(16/15) = 1154 claims. Next the problem states that this is also a full credibility standard for frequency. In this case, Φ(y) = (1+0.95)/2 = 0.975, therefore y = 1.960. Thus setting 1154 claims = y2 / k2 = 1.962 / k2 , one solves for k = 0.0577. 5.42. B. Let the standard deviation of the severity distribution, for which we will solve, be σ. The Classical Credibility Standard for the Pure Premium is given by: nf = n0 (1 + CV2 ). CV2 = σ2 / 50002 . n0 = (y/k)2 = (1.96/0.05)2 = 1537 claims. One must now translate n0 into exposures, since that is the manner in which the full credibility criterion is stated in this problem. One does so by dividing by the expected frequency, which is the fitted Poisson parameter m. Using the method of moments, m = (observed # of claims) / (observed number of exposures) = (1293 + 1592 + 1418) / (18,567 + 26,531 + 20,002) = 4303 / 65,000 = .0662. Thus n0 in terms of exposures is: 1537 claims / (0.0662 claims / exposure) = 23,218 exposures. Now one sets the given criterion for full credibility equal to its calculated value: 120,000 = 23,218(1 + σ2 /50002 ). Solving, σ = $10,208. Comment: Assuming a Poisson frequency with parameter 0.0662, a severity distribution with a mean of $5000 and a standard deviation of $10,208, how many exposures are needed for full credibility if we want the actual total cost of claims to be within ±5% of the expected total 95% of the time? The solution to this alternate question is: (1537 claims) {1 + (10208/5000)2 } / (0.0662 claims per exposure) ≅ 120,000 exposures.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 109 5.43. D. nf = n0 {1 + square of coefficient of variation of severity}. n0 = y2 / k2 . k = 10%. P = 98%. ⇒ y = 2.326 For the Pareto, mean = θ / (α−1) = 3000, and the second moment =
2θ2 / {(α − 1) (α − 2)} = 36 million.
1+ CV2 = E[X2 ] / E[X]2 = 36 million / 30002 = 4. Thus nF = (2.326 / 0.1)2 (4) = 2164. 5.44. C. k = .10 (“within 10% of the expected”) y = 1.645 since Φ(1.645) = .95 (“to be 90% certain”, allow 5% outside on each tail.) n0 = y2/k2 = 1.6452 /0.12 = 271. Coefficient of Variation2 = Variance / mean2 = 10 / 25 = 0.4. nF = n0 (1+CV2 ) = (271)(1.4) = 379 claims. 5.45. D. k = 0.075 (“within 7.5% of the expected”) y = 1.960 since Φ(1.960) = 0.975 (“to be 95% certain”, allow 2.5% outside on each tail.) n0 = y2/k2 = 1.9602 /0.0752 = 683. 1 + CV2 = second moment / mean2 = exp(2µ + 2σ2) / exp(2µ + σ2) = exp(σ2) = e2.25 = 9.49. nF = n0 (1 + CV2 ) = (683)(9.49) = 6482 claims. But we are given that the full credibility criterion with respect to exposures is 40,000. To convert to claims we multiply by the mean claim frequency. Therefore nF = 40,000m. Therefore m = 6442 / 40,000 = 16.2%.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 110 5.46. A. n0 = y2/k2 = y2 / 0.052 = 400 y2 . For the full credibility standard for pure premiums (“claims costs”) we need to compute the coefficient of variation. For a Pareto with θ = 3000 and α = 4, the second moment is: 2(3000)2 / (4-1)(4-2), while the mean is: 3000 / (4-1). Thus 1 + CV2 = second moment / mean2 = 2(3) / 2 = 3. Therefore, the standard for full credibility is: n0 (1 + CV2 ) = 400 y2 (3) = 1200 y2 . Setting this equal to the given 2000 claims we solve for y: y =
√(2000 / 1200) = 1.291.
One then needs to compute how much probability is within ± 1.291 standard deviations on the Normal Distribution. Φ(1.291) = 0.9017. Therefore, P = 1 - (2)(1 - .9017) = 0.803. (9.83% is outside on each tail.) Comment: If one had been given P = 80.3% and were asked to solve for the standard for full credibility, then we would want 0.0985 outside on either tail, so we want Φ(y) = 0.9015. Thus y ≅ 1.29 and the standard for full credibility ≅ (3) (1.292 )/ (0.052 ) ≅ 2000. 5.47. C. For the Gamma Distribution the Coefficient of Variation = 1 /
√α = 1.
We are given k = 5% and P = 90%. Φ(1.645) = 0.95 = (1 + P)/2. ⇒y = 1.645. Therefore n0 = y2 /k2 = (1.645/0.05)2 = 1082. nF = n0 (1 + CV2 ) = 1082(1 + 12) = 2164. Comment: The Gamma Distribution for α =1 is an Exponential Distribution, with Coefficient of Variation of 1 (and Skewness of 2.) 5.48. C. nF = y2 /k2 (1+CV2 ), nF is proportional to y2 . For P = 90%, Φ(1.645) = 0.95 = (1 + P)/2. ⇒y = 1.645. For P = 95%, y = 1.960, since Φ(1.960) = 0.975 = (1 + 0.95)/2. Thus the new criterion for full credibility = (1000)(1.960/1.645)2 =1420.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 111 5.49. B. P = 0.95 and k = 0.1. Φ(1.960) = 0.975 = (1+P)/2, so that y = 1.960. n0 = (y/k)2 = (1.960 / 0.10)2 = 384. The mean severity is 4 and so is the variance, since it follows a Poisson Distribution. Thus the square of the coefficient of variation of the severity = CV2 = variance / mean2 = 4 / 42 = 1/4. Therefore nF = n0 (1+CV2 ) = (384)(1 + 1/4) = 480. Comment: Itʼs unusual to have severity follow a Poisson Distribution. This situation is mathematically equivalent to a Poisson-Poisson compound frequency distribution. 5.50. C. nF = y2 /k2 (1+CV2 ). If the CV goes from 2 to 4, and k doubles then the Standard for Full Credibility is multiplied by: {(1+42 ) / (1+22 )} / 22 = (17/5) / 4. Thus the Standard for Full Credibility is altered to: (1200)(17/5) / 4 = 1020. Comment: If k doubled and the CV stayed the same, then the Standard for Full Credibility would be altered to: (1200) / 4 = 300. If k stayed the same and the CV went from 2 to 4, then the Standard for Full Credibility would be altered to: (1200) {(1+42 ) / (1+22 )} = 4080. 5.51. E. For the Lognormal, Mean = exp(µ + σ2/2), 2nd Moment = exp(2µ + 2σ2), 1 + square of coefficient of variation = 2nd moment / mean2 = exp(σ2) = e1 = 2.718. k = 0.1. P = 95%, so that y = 1.96 since Φ(1.96) = 0.975 = (1+P)/2. Thus n0 = y2/k2 = 384. Thus the number of claims needed for full credibility of the pure premium is: n0 (1+CV2 ) = 384(2.718) = 1044 claims. To convert to the full credibility standard to exposures, divide by the expected frequency of 0.01: 1044/.01 = 104.4 thousand exposures.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 112 5.52. E. k = 5% (within 5%), P = 95% (95% of the time). Φ(y) = (1+P)/2 = 0.975, thus y = 1.960. The Inverse Gamma has: E[X] = θ/(α-1). E[X2 ] =
θ2 / {(α − 1) (α − 2)}.
nF = n0 (1+CV2 ) = (y/k)2 (E[X2 ]/E2 [X]) = (1.96/0.05)2 {(α-1)/(α-2)} = (1537)(3/2) ≅ 2306 claims. Comment: In this case, CV2 =1/2. For the Inverse Gamma Distribution, CV2 = 1/(α-2). 5.53. C. The Negative Binomial has a larger variance than the Poisson, so there is more random fluctuation, and therefore the standard for Full Credibility is larger. Specifically, one can derive a more general formula than when the Poisson assumption does not apply. The Standard for Full Credibility is: {y2 / k2 }(σf2 /µf + σs2 / µs2 ), which reduces to the Poisson case when σf2 /µf = 1. For the Negative Binomial the variance is greater than the mean, so σf2 /µf > 1. Thus for the Negative Binomial the standard for Full Credibility is larger than the Poisson case, all else equal. Comment: For example, assume one had in the previous question instead of a Poisson, a Negative Binomial frequency distribution with parameters β = 2.5 and r = 3. Then σf2 /µf = rβ(1+β)/(rβ) = 1 + β = 3.5. Thus, the Standard for Full Credibility would have been: {y2 / k2 }(σf2 /µf + σs2 /µs2 ) = (1537)(3.5 + 0.5) ≅ 6148 claims, rather than about 2306 claims. 5.54. A. Frequency is Poisson and therefore µf = σf2 . σpp2 = µs2 σf2 + µfσs2 = µf (µs2 + σs2 ). Thus 500 = 10(µs2 + 10 ). Therefore, µs2 = 40. CV2 = σs2 / µs2 = 10/40 = 0.25. k = 0.05. Φ(y) = (1+P)/2 = (1+.95)/2 = 0.975. Thus y = 1.96. n0 = (y/k)2 = (1.96/0.05)2 = 1537. nF = n0 (1+CV2 ) = 1537(1+0.25) = 1921. 5.55. D. For the Burr Distribution E[Xn ] =
θn Γ(1 + n / γ) Γ(α - n / γ) / Γ(α).
For α = 6 and γ = 0.5, E[X] = θ Γ(1+ 2) Γ(6-2) / Γ(6) = θ (2!)(3!) / 5! = θ/10. E[X2 ] = θ2Γ(1+ 1)Γ(6-1)/Γ(6) = θ2 (1!)(4!) / 5! = θ2/5. (1+CV2 ) = E[X2 ] / E2 [X] = (θ2/5)/(θ/10)2 = 100/5 = 20. k = 0.10. nF = n0 (1+CV2 ) = (y2 / k2 )(1+CV2 ). Since we are given that nF = 6000. 6000 = (y/0.1)2 (20). y = 0.1
√(6000 / 20) = 0.1 √300 = √3 = 1.732.
(1+P)/2 = Φ(y) = Φ(1.732) = 0.9584. Thus P = 0.917.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 113 5.56. E. nF = n0 (1+CV2 ) = (1000)(1 + 22 ) = 5000. 5.57. C. For the LogNormal Distribution, 1 +CV2 = (2nd moment)/mean2 = exp(2µ + 2σ2)/ exp(µ + .5σ2)2 = exp(σ2). k = 5%, P = 90%. We have y = 1.645, since Φ(1.645) = 0.95 = (1+P)/2. Therefore, n0 = (y/k)2 = (1.645 /0.05)2 = 1082. We are given nF = 13,000. But nF = n0 (1+CV2 ). Thus 13,000 = 1082(1+CV2 ). 1+CV2 = 12.01. Therefore, 12.01 = exp(σ2). ⇒ σ =
√ln(12.01) = 1.577.
5.58. E. The mean frequency is: 750/1000 = 0.75. The variance of the frequency is: 1494/1000 - 0.752 = 0.9315. C VSev2 = 6750000/15002 = 3. k = 5%. P = 95%. y = 1.960. n0 = y2/k2 = 1.9602 /0.052 = 1537. Standard for full credibility = n0(σF2/µF + CVSev2 ) = (1537)(0.9315/0.75 + 3) = 6520 claims. 6520 claims corresponds to 6520/0.75 = 8693 exposures. Alternately, Standard for full credibility in terms of exposures = n0 (coefficient of variation of the pure premium)2 = (1537)(variance of the pure premium)/(mean pure premium)2 = (1537){(.75)(6750000) + (15002 )(0.9315)} / {(0.75)(1500)}2 = (1537)(7.1584 million)/11252 = 8693 exposures. Comment: Items (i) and (ii) are not needed to answer the question, although they do imply that the frequency for the whole portfolio is Negative Binomial. Therefore the factor σF2/µF should be greater than 1. That the severity is Pareto is also not used to answer the question, although one can infer that α = 3 and θ = 3000. 5.59. B. For the Gamma-Poisson, the mixed distribution is Negative Binomial, with r = α = 15 and β = θ = 0.2. Therefore, for frequency, σf2 /µf = rβ(1 + β)/ (rβ) = 1 + β = 1.2. For the Exponential Distribution, σs2 / µs2 = θ2/θ2 = 1. For P = 0.90, y = 1.645. k = 0.05. {y2 / k2 }(σf2 /µf + σs2 /µs2 ) = (1.645/0.05)2 (1.2 + 1) = 2381 claims. Comment: We use the Negative Binomial Distribution for the whole portfolio of insureds, in order to compute the standard for full credibility, thereby taking into account the larger random fluctuation of results due to the heterogeneity of the portfolio.
2013-4-8, Classical Credibility §5 Full Credibility Aggregate Loss, HCM 10/16/12, Page 114 5.60. E. k = 0.02. P = 90% ⇒ y = 1.645. n0 = y2 /k2 = (1.645/0.02)2 = 6765 claims. For the Pareto, E[X] = θ/(α-1) = 0.5/5 = 0.1, E[X2 ] =
2θ2 / {(α − 1) (α − 2)} = (2)(0.5²) / {(6 - 1)(6 - 2)} = 0.025,
and 1 + CV2 = E[X2 ]/E[X]2 = 0.025/0.12 = 2.5. Standard for Full Credibility for pure premium when frequency is Poisson = n0 (1 + CVSev2 ) = (6765)(2.5) = 16,913 claims. Comment: For the Pareto Distribution, CV2 = α/(α-2) = 6/4 = 1.5. 5.61. E. σf2/µf = rβ(1 + β)/(rβ) = 1 + β = 4. E[X] = 24.4. E[X2 ] = 2040.4. CVsev2 = 2040.4/24.42 - 1 = 2.427. K = 10%. P = 95%. y = 1.960. n0 = (1.960/0.1)2 = 384. The standard for full credibility for aggregate losses is: (4 + 2.427)(384) = 2468 claims. 5.62. E. Since we have a Poisson frequency, the standard for full credibility is n0 (1 + CVSev2 ). Thus we need to determine the coefficient of variation of severity. Mean = αθ. Variance = αθ2. CV2 =
αθ2 / (αθ)2 = 1/α, can not be determined.
Comment: One need only know α in order to determine the coefficient of variation of the Gamma Distribution, as in 4B, 5/96, Q.27. P = 95%. y = 1.960. k = 10%. n0 = y2 /k2 = 384. 5.63. B. From the standard for frequency, 2000 = y2 /0.032 . ⇒ y2 = 1.8. For the uniform severity: CV2 = variance/mean2 = (100002 /12)/(10000/2)2 = 1/3. Standard for Aggregate Losses is: n0 (1 + CV2 ) = (1.8/0.052 )(1 + 1/3) = 960 claims.
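As a cross-check (my own sketch, not part of the solutions), the general formula (y/k)2 (σf2/µf + CVsev2) can be wrapped in a small Python helper; the function name and arguments are mine. It reproduces the answers to 5.59 and 5.63 under the assumptions stated there.

def full_credibility_claims(y, k, var_over_mean_freq=1.0, cv2_sev=0.0):
    # General standard: (y/k)^2 (sigma_f^2/mu_f + CV_sev^2); a Poisson frequency gives var/mean = 1.
    return (y / k) ** 2 * (var_over_mean_freq + cv2_sev)

# Solution 5.59: Negative Binomial frequency with 1 + beta = 1.2, Exponential severity (CV^2 = 1).
print(round(full_credibility_claims(1.645, 0.05, var_over_mean_freq=1.2, cv2_sev=1.0)))  # 2381

# Solution 5.63: Poisson frequency with y^2 = 1.8, k = 5%, uniform severity (CV^2 = 1/3).
print(round((1.8 / 0.05**2) * (1 + 1/3)))  # 960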
Section 6, Partial Credibility
When one has at least the number of claims needed for Full Credibility, then one assigns 100% credibility to the observed data. However, when one has less data than is needed for full credibility, one assigns an amount of Credibility less than 100%. If the Standard for Full Credibility is 683 claims and one has only 300 claims, then one assigns less than full credibility to this data. How much less is determined via the “square root rule.”46 Let n be the (expected) number of claims for the volume of data, and nF be the standard for Full Credibility for the pure premium or aggregate losses. Then the partial credibility assigned is Z =
√(n / nF). When dealing with frequency or severity a similar formula applies.
Unless stated otherwise assume that for Classical Credibility the partial credibility is given by this square root rule.47 Use the square root rule for partial credibility for either frequency, severity, pure premiums, or aggregate losses. For example if 1000 claims are needed for full credibility for frequency, then the following credibilities would be assigned:

Expected # of Claims     Credibility
1                          3%
10                        10%
25                        16%
50                        22%
100                       32%
200                       45%
300                       55%
400                       63%
500                       71%
600                       77%
700                       84%
800                       89%
900                       95%
1000                     100%
1500                     100%

46 In some practical applications an exponent other than 1/2 is used. For example, in Workersʼ Compensation classification ratemaking, an exponent of 0.4 is used by the NCCI.
47 In contrast for Buhlmann/ Greatest Accuracy Credibility, Z = N / (N+K) for K equal to the Buhlmann Credibility parameter. There is no Standard for Full Credibility for Buhlmann Credibility.
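Since the square root rule is just Z = √(n/nF) capped at 100%, the table above can be checked with a few lines of Python (a sketch of mine; the function name is an assumption):

from math import sqrt

def classical_credibility(n, n_full):
    # Square root rule: Z = sqrt(n / n_full), capped at 100%.
    return min(1.0, sqrt(n / n_full))

for n in (1, 100, 500, 1000, 1500):
    print(n, round(classical_credibility(n, 1000), 3))
# 1 0.032, 100 0.316, 500 0.707, 1000 1.0, 1500 1.0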
Exercise: The Standard for Full Credibility is 683 claims and one has observed 300 claims. How much credibility is assigned to this data? [Solution:
√(300 / 683) = 66.3%.]
Exercise: The Standard for Full Credibility is 683 claims and one has observed 2000 claims. How much credibility is assigned to this data? [Solution: 100%. When the volume of data is greater than (or equal to) Standard for Full Credibility, one assigns 100% credibility to the data.] When available, one generally uses the number of exposures or the expected number of claims in the square root rule, rather than the observed number of claims.48 Make sure that in the square root rule you divide comparable quantities: Z=
√(number of claims / standard for full credibility in terms of claims), or
Z = √(number of exposures / standard for full credibility in terms of exposures).
Exercise: Prior to observing any data, you assume that the claim frequency rate per exposure has mean = 0.25. The Standard for Full Credibility for frequency is 683 claims. One has observed 300 claims on 1000 exposures. Estimate the number of claims you expect for these 1000 exposures next year. [Solution: The expected number of claims on 1000 exposures is: (1000)(.25) = 250. Z=
√(250 / 683) = 60.5%.
Alternately, a standard of 683 claims corresponds to 683/0.25 = 2732 exposures. Z=
√(1000 / 2732) = 60.5%.
In either case, the estimated future frequency = (60.5%)(0.30) + (1 - 60.5%)(0.25) = 0.280. (1000)(0.280) = 280 claims.]
48 See “Credibility” by Mahler and Dean, page 29.
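The exercise above can also be checked with a short Python sketch (mine, not part of the text; variable names are assumptions, and the 0.25 prior frequency and the 683 claim standard are taken from the exercise):

from math import sqrt

prior_freq = 0.25                                    # assumed mean claim frequency per exposure
n_full_claims = 683                                  # standard for full credibility, in claims
exposures, observed_claims = 1000, 300

expected_claims = prior_freq * exposures             # 250
Z = min(1.0, sqrt(expected_claims / n_full_claims))  # 0.605
est_freq = Z * (observed_claims / exposures) + (1 - Z) * prior_freq
print(round(Z, 3), round(est_freq * exposures))      # 0.605 and 280 claims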
Limiting Fluctuations:
For example, assume that the mean frequency per exposure is 2%, and we have 50,000 exposures. Then the expected number of claims is: (2%)(50,000) = 1000. If frequency is Poisson, then the variance of the number of claims from a single exposure is 2%. The variance of the average frequency for the portfolio of 50,000 exposures is: 2% / 50,000.49 The standard deviation of the observed claim frequency is: √(2% / 50,000) = 0.000632.
If instead of 50,000 exposures one had only 5000 exposures, then the expected number of claims is: (2%)(5,000) = 100. The standard deviation of the estimated frequency is:
√(2% / 5000) = 0.002.
With only 5000 rather than 50,000 exposures, there would be considerably more fluctuation in the observed claim frequency.
49 The variance of an average is the variance of a single draw divided by the number of items being averaged.
Below are shown 100 random simulations of the claim frequency for 50,000 exposures with a Poisson parameter λ = 0.02, for 1000 expected claims:
[Figure: the 100 simulated claim frequencies, plotted on a scale from 0.014 to 0.024, stay close to the mean of 0.02.]
Below are shown 100 random simulations of the claim frequency for 5,000 exposures with a Poisson parameter λ = 0.02, for 100 expected claims:
[Figure: the 100 simulated claim frequencies, plotted on the same 0.014 to 0.024 scale, scatter much more widely around 0.02.]
With only 100 expected claims, there is much more random fluctuation in the observed claim frequency, than with 1000 expected claims.
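The two plots can be reproduced in a few lines of Python (my own sketch, not the simulation used for the figures); it relies on the fact that the total claim count for a portfolio of independent Poisson exposures is itself Poisson with mean λ times the number of exposures:

import numpy as np

rng = np.random.default_rng(0)
lam = 0.02                      # Poisson parameter per exposure

for exposures in (50_000, 5_000):
    claims = rng.poisson(lam * exposures, size=100)   # 100 simulated years
    freq = claims / exposures
    print(exposures, round(freq.std(), 6))
# The observed standard deviations are roughly 0.00063 and 0.002,
# matching sqrt(lam / exposures) for 50,000 and 5,000 exposures.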
Let us now assume that the standard for full credibility for estimating frequency is chosen as 1000 expected claims.50 Then if we had 50,000 exposures and 1000 expected claims, we would give the observed frequency a credibility of one; we would rely totally on the observed frequency to estimate the future frequency. As discussed previously, for this amount of data the standard deviation of the observed claim frequency is: 0.000632. This is also the standard deviation of the estimated claim frequency. Thus the chosen standard for full credibility results in a standard deviation of the estimated claim frequency of 0.000632.51 If we had only 5000 exposures and 100 expected claims, then as discussed previously, the standard deviation of the observed claim frequency is: 0.002. If we were to rely totally on the observed frequency to estimate the future frequency, then the standard deviation of that estimate would be much larger than desired. However, with only 100 expected claims, in estimating the future frequency we multiply the observation by Z =
√(100 / 1000) = 31.6%. The standard deviation of Z times the observation is:
(0.316)(0.002) = 0.000632. This is the same standard deviation as when we had full credibility. Therefore, using credibility, the fluctuation in the estimated frequency due to the fluctuations in the data will be the same. To reiterate, the standard deviation of the observed claim frequency is larger for 100 expected claims than for 1000 claims. If one uses 1000 claims as the Standard for Full Credibility, then the credibility assigned to 100 expected claims is the ratio of the standard deviation with 1000 expected claims to the standard deviation with 100 expected claims. In this case, Z = 0.000632 / 0.002 = 31.6% =
√(100 / 1000).
This concept is shown below for two Normal Distributions, approximating Poisson frequency Distributions, one with mean 1000 and variance 1000 (solid curve) and the other with mean 100 and variance 100 (dotted curve).
50 This would have been based on some choice of P and k, as discussed previously.
51 If we had more than 50,000 exposures, the standard deviation of the estimated claim frequency would be less.
The x-axis is the number of claims / mean number of claims.
[Figure: the two Normal densities plotted against number of claims / mean number of claims, from 0.7 to 1.3; the solid curve (mean 1000) is much narrower than the dotted curve (mean 100), with an arrow of plus or minus one coefficient of variation shown on each curve.]
Each arrow is plus or minus one coefficient of variation, since each curve has been scaled in terms of its mean number of claims. With a full credibility standard of 1000 claims, the partial credibility for 100 expected claims is the ratio of the lengths of the arrows: Z = 0.0316/0.100 = 31.6%. The credibilities are inversely proportional to the standard deviations of the observed frequencies. In general, the partial credibility assigned to n claims for n ≤ nF will be the ratio of the standard deviation with nF expected claims to the standard deviation with n expected claims. This ratio will be such that Z =
√(n / nF).
The standard deviation of Z times the observation will be that for full credibility: VAR[(Z)(observation)] = Z2 VAR[observation] = (n / nF) (µ / n) = µ / nF. Thus the random fluctuation in the estimate that is due to the contribution of Z times the observation has been limited to that which was deemed acceptable when the Standard for Full Credibility was determined. This is why the term “Limited Fluctuation Credibility” is sometimes used to describe Classical Credibility.
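As a quick numerical check of this point (my own sketch, using the same assumptions as the example above: λ = 0.02 and a standard of 1000 expected claims; variable names are mine), Z times the standard deviation of the observed frequency comes out the same no matter how many claims are expected:

from math import sqrt

lam = 0.02            # mean frequency per exposure
n_full = 1000         # standard for full credibility, in expected claims

for expected_claims in (1000, 400, 100, 25):
    exposures = expected_claims / lam
    sd_observed = sqrt(lam / exposures)        # sd of the observed frequency
    Z = sqrt(expected_claims / n_full)
    print(expected_claims, round(Z, 3), round(Z * sd_observed, 6))
# Z * sd_observed is 0.000632 in every case, the full credibility level.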
The square root rule for partial credibility is designed so that when one has less data than the standard for the full credibility, the weight given the observation is such that the standard deviation of the estimate of the future has the same value it would have had if instead we had an amount of data equal to the standard for full credibility.
Deriving the Square Root Rule:52
Let nF be such that when the observed pure premium Xfull is based on nF claims:
P = Prob[µ - kµ ≤ Xfull ≤ µ + kµ] ⇔ P = Prob[-kµ/σfull ≤ (Xfull - µ)/σfull ≤ kµ/σfull]. In this case, our estimate = Xfull.
Let Xpartial be the observed pure premium based on n claims, with n < nF. In this case, our estimate = ZXpartial + (1-Z)Y, where Y is other information. We desire to limit the fluctuation in this estimate due to the term ZXpartial. We desire ZXpartial to have a large probability of being close to Zµ:
P = Prob[Zµ - kµ ≤ ZXpartial ≤ Zµ + kµ] ⇔ P = Prob[-kµ/(Zσpartial) ≤ (Xpartial - µ)/σpartial ≤ kµ/(Zσpartial)].
Assuming both (Xfull - µ)/σfull and (Xpartial - µ)/σpartial are approximately Standard Normals, comparing the two requirements, in order to make both probabilities P, we require that:
kµ/σfull = kµ/(Zσpartial). ⇒ Z = σfull/σpartial.
However, the standard deviation of an average is inversely proportional to the square root of the amount of data. Therefore,
Z = σfull/σpartial = √(1/nF) / √(1/n) = √(n / nF).

52 See pages 514-515 of Mahler and Dean.
Comparing Different Standards for Full Credibility:
The credibilities assigned to various numbers of claims under either a Standard for Full Credibility of 2500 claims (dashed) or 1000 claims (solid) are shown below.
[Figure: credibility, from 0 to 1, versus number of claims, from 0 to 3000, for the two standards.]
For large volumes of data the credibility is 100% under either Standard. For smaller volumes of data, more credibility is assigned when using a Standard for Full Credibility of 1000 claims rather than 2500 claims. The differences in the amount of credibility assigned using these two different Standards for Full Credibility of 1000 and 2500 claims are:
[Figure: difference in credibility, up to about 0.37, versus number of claims, from 0 to 3000.]
For smaller volumes of data there is as much as a 37% difference in the credibilities depending on the Standard for Full Credibility. Nevertheless, even for the criteria differing by a factor of 2.5, the credibilities assigned to most volumes of data are not that dissimilar.53

53 Rounding Standards for Full Credibility to a whole number of claims should be more than sufficient.
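The 37% figure is easy to confirm with a short Python sketch (mine; the helper name is an assumption): the gap between the two credibility curves peaks at 1000 claims, where the 1000-claim standard already gives full credibility:

from math import sqrt

def Z(n, n_full):
    return min(1.0, sqrt(n / n_full))

diffs = [(n, Z(n, 1000) - Z(n, 2500)) for n in range(1, 3001)]
n_at_max, max_gap = max(diffs, key=lambda t: t[1])
print(n_at_max, round(max_gap, 3))   # 1000 claims, a gap of about 0.368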
Classical Credibility vs. Buhlmann Credibility:
Below the Classical Credibility formula for credibility with 2500 claims for Full Credibility (dashed curve) is compared to one from Buhlmann Credibility (solid curve): Z = N/(N + 350).54
[Figure: credibility, from 0 to 1, versus number of claims, from 0 to 5000, for the two formulas.]
One important distinction is that as the volume of data increases the Buhlmann Credibility approaches but never quite attains 100% credibility.55
54 Z = N/(N+K) for K equal to the “Buhlmann Credibility parameter”. In this example, K = 350. See “Mahlerʼs Guide to Buhlmann Credibility.”
55 However, the credibilities produced by these two formulas are relatively similar. Generally this will be true provided the Standard for Full Credibility is about 7 or 8 times the Buhlmann Credibility Parameter. See “An Actuarial Note on Credibility Parameters”, by Howard Mahler, PCAS 1986.
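To see how close the two formulas are numerically, here is a small comparison in Python (my own sketch, using the 2500-claim standard and K = 350 from the text):

from math import sqrt

def classical(n, n_full=2500):
    return min(1.0, sqrt(n / n_full))

def buhlmann(n, K=350):
    return n / (n + K)

for n in (100, 500, 1000, 2500, 5000):
    print(n, round(classical(n), 3), round(buhlmann(n), 3))
# At 1000 claims: 0.632 versus 0.741. At 5000 claims the classical value is 1.000,
# while the Buhlmann value is 0.935 and never quite reaches 1.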
Here is the difference in the credibilities produced by these two formulas:
[Figure: Classical minus Buhlmann credibility, ranging between about -0.1 and +0.1, versus number of claims, from 0 to 5000.]
Problems: 6.1 (1 point) The Standard for Full Credibility is 1500 claims. How much credibility is assigned to 200 claims? A. less than 0.2 B. at least 0.2 but less than 0.3 C. at least 0.3 but less than 0.4 D. at least 0.4 but less than 0.5 E. at least 0.5 6.2 (1 point) The 1996 pure premium underlying the rate equals $1,000. The loss experience is such that the actual pure premium for that year equals $1,200 and the number of claims equals 400. If 8000 claims are needed for full credibility and the square root rule for partial credibility is used, estimate the pure premium underlying the rate in 1997. (Assume no change in the pure premium due to inflation.) A. Less than $1,020 B. At least $1,020, but less than $1,030 C. At least $1,030, but less than $1,040 D. At least $1,040, but less than $1,050 E. $1,050 or more 6.3 (1 point) Using the square root rule for partial credibility a certain volume of data is assigned credibility of 0.26. How much credibility would be assigned to 20 times that volume of data? A. less than 0.5 B. at least 0.5 but less than 0.7 C. at least 0.7 but less than 0.9 D. at least 0.9 but less than 1.1 E. at least 1.1 6.4 (2 points) Assume a Standard for Full Credibility for severity of 2000 claims. Assume that for the class of Plumbers one has observed 513 claims totaling $4,771,000. Assume the average cost per claim for all similar classes is $10,300. What is the estimated average cost per claim for the Plumbers class? A. less than 9600 B. at least 9600 but less than 9650 C. at least 9650 but less than 9700 D. at least 9700 but less than 9750 E. at least 9750
6.5 (1 point) The Standard for Full Credibility is 4500 claims. The expected claim frequency is 4% per house-year. How much credibility is assigned to 5000 house-years of data? A. less than 0.2 B. at least 0.2 but less than 0.3 C. at least 0.3 but less than 0.4 D. at least 0.4 but less than 0.5 E. at least 0.5 6.6 (2 points) You are given the following information: •
Frequency is Poisson
• Severity follows a Gamma Distribution with α = 2.5.
• Frequency and Severity are Independent.
• Full credibility is defined as having a 98% probability of being within plus or minus 6%
of the true pure premium. What credibility is assigned to 200 claims? A. less than 0.32 B. at least 0.32 but less than 0.34 C. at least 0.34 but less than 0.36 D. at least 0.36 but less than 0.38 E. at least 0.38 6.7 (3 points) You are given the following: Prior to observing any data, you assume that the claim frequency rate per exposure has mean = 0.05 and variance = 0.15. A full credibility standard is devised that requires the observed sample frequency rate per exposure to be within 3% of the expected population frequency rate per exposure 98% of the time. You observe 9021 claims on 200,000 exposures. Estimate the number of claims you expect for these 200,000 exposures next year. A. less than 9200 B. at least 9200 but less than 9300 C. at least 9300 but less than 9400 D. at least 9400 but less than 9500 E. at least 9500 6.8 (1 point) The standard for full credibility is 1000 exposures. For how many exposures would Z = 40%?
6.9 (3 points) You are given the following: • The number of claims follows a Poisson distribution. • The variance of the pure premium distribution is 100. • The a priori estimate of the mean pure premium is 6. • The full credibility standard has been selected so that the estimated pure premiums will be within 2.5% of their expected value 80% of the time. • You observe $3,200 of losses for 800 exposures. Using the methods of classical credibility, estimate the future pure premium. A. Less than 5.0 B. At least 5.0, but less than 5.2 C. At least 5.2, but less than 5.4 D. At least 5.4, but less than 5.6 E. At least 5.6 Use the following information for the next two questions:
• The number of claims follows a Poisson distribution.
• Claim sizes follow an exponential distribution.
• The number of claims and claim sizes are independent.
• Credibility is assigned to the observed data using the concepts of classical credibility.
6.10 (2 points) If one were estimating the future frequency, the volume of data observed would be assigned 60% credibility. Assume the same value of k and P are used to determine the Full Credibility Criterion for frequency and pure premiums. How much credibility would be assigned to this same volume of data for estimating the future pure premium? A. Less than 45% B. At least 45%, but less than 50% C. At least 50%, but less than 55% D. At least 55% E. Cannot be determined from the given information. 6.11 (2 points) If one were estimating the future frequency, the volume of data observed would be assigned 100% credibility. Assume the same value of k and P are used to determine the Full Credibility Criterion for frequency and pure premiums. How much credibility would be assigned to this same volume of data for estimating the future pure premium? A. Less than 85% B. At least 85%, but less than 90% C. At least 90%, but less than 95% D. At least 95% E. Cannot be determined from the given information.
6.12 (3 points) You are given the following: • The number of claims follows a Poisson distribution.
• The number of claims and claim sizes are independent.
• Credibility is assigned to the observed data using the concepts of classical credibility.
• The estimated pure premium is to be within 10% of its expected value 95% of the time.
• You observe the following data:
Year: 1 2 3 4
Dollars of Loss: 200 150 230 180
• There is no inflation.
• There is no change in exposure.
• The current manual premium contains a provision for losses of 210.
Estimate the future annual losses.
A. Less than 197 B. At least 197, but less than 199 C. At least 199, but less than 201 D. At least 201, but less than 203 E. At least 203
6.13 (3 points) You are given:
• The exposure in 2002 is 12% more than the exposure in 2001. • The full credibility standard is to be within 2.5% of the expected aggregate loss 90% of the time. Estimate the aggregate losses (in millions) for 2002. (A) Less than 1.25 (B) At least 1.25, but less than 1.30 (C) At least 1.30, but less than 1.35 (D) At least 1.35, but less than 1.40 (E) At least 1.40
6.14 (2 points) So far this baseball season, the Houston Astros baseball team has won 35 games and lost 72 games. Using a simulation, a website has predicted that for the entire season the Houston Astros are expected to win 55.4 games and lose 106.6 games. Sanford Beech is an actuarial student. Sandy notices that using classical credibility, giving weight 1 - Z to a 50% winning percentage, he can get the same estimate. How many games would if take for Sandy to give full credibility? A. Less than 190 B. At least 190 but less than 195 C. At least 195 but less than 200 D. At least 200 but less than 205 E. At least 205 6.15 (3 points) You are given the following: • The number of losses is Poisson distributed with mean 500. • Number of losses and loss severity are independent. • Loss severity has the following distribution: Loss Size Probability 100 .30 1000 .40 10,000 .20 100,000 .10 • There is a 1000 deductible and maximum covered loss of 25,000. How much credibility would be assigned so that the estimated total cost of claim payments is within 10% of the expected cost with 90% probability? A. Less than 55% B. At least 55% but less than 60% C. At least 60% but less than 65% D. At least 65% but less than 70% E. At least 70% 6.16 (2 points) Prior to the beginning of the baseball season you expected the New York Yankees to win 100 of 162 games. The Yankees have won 8 of their first 19 games this season. Using a standard for full credibility of 1000 games, predict how many games in total the Yankees will win this season. A. 88 B. 90 C. 92 D. 94 E. 96
6.17 (3 points) You are given the following information: •
Claim counts follow a Poisson distribution.
• Claim sizes follow a Gamma Distribution.
• Claim sizes and claim counts are independent.
• The full credibility standard is to be within 5% of the expected aggregate loss 90% of the time.
• The number of claims in 2007 was 77.
• The average size of claims in 2007 was 6861.
• In 2007, the provision in the premium in order to pay losses was 400,000.
• The exposure in 2008 is identical to the exposure in 2007.
• There is 4% inflation between 2007 and 2008.
If the estimate of aggregate losses in 2008 is 447,900, what is the value of the α parameter for the Gamma distribution of severity? (A) 2
(B) 3
(C) 4
(D) 5
(E) 6
6.18 (2 points) For Workers Compensation Insurance for Hazard Group D you are given the following information on lost times claims: State of Con Island Countrywide Number of Claims 7,363 442,124 Dollars of Loss 218 million 23,868 million The full credibility standard has been selected so that actual severity will be within 7.5% of expected severity 99% of the time. The coefficient of variation of the size of loss distribution is 4. What is the estimated average severity for Hazard Group D in the state of Con Island? A. 37,000 B. 39,000 C. 41,000 D. 43,000 E. 45,000 6.19 (3 points) The average baseball player has a batting average of 0.260. In his first six at bats, Reginald Mantle gets 3 hits, for a batting average of 0.500. In his 3000 at bats, Willie Mays Hayes has gotten 900 hits, for a batting average of 0.300. Which of these two players would you expect to have a better batting average in the future? Use Classical Credibility to discuss why.
6.20 (4, 5/84, Q.35) (2 points) Frequency is Poisson. Three years of data are used to calculate the pure premium. In the case of an average annual claim count of 36 claims, 20% credibility is assigned to the observed pure premium. The standard for full credibility was chosen so as to achieve a 90% probability of departing no more than 5% from the expected value. What is the ratio of the standard deviation to the mean for the claim severity distribution? A. Less than 1.1 B. At least 1.1, but less than 1.4 C. At least 1.4, but less than 1.7 D. At least 1.7, but less than 2.0 E. 2.0 or more 6.21 (4, 5/85, Q.30) (1 point) The 1984 pure premium underlying the rate equals $1,000. The loss experience is such that the actual pure premium for that year equals $1,200 and the number of claims equals 600. If 5400 claims are needed for full credibility and the square root rule for partial credibility is used, estimate the pure premium underlying the rate in 1985. (Assume no change in the pure premium due to inflation.) A. Less than $1,025 B. At least $1,025, but less than $1,075 C. At least $1,075, but less than $1,125 D. At least $1,125, but less than $1,175 E. $1,175 or more 6.22 (4, 5/86, Q.35) (1 point) You are in the process of revising rates. The premiums currently being used reflect a loss cost per insured of $100. The loss costs experienced during the two year period used in the rate review averaged $130 per insured. The average frequency during the two year review period was 250 claims per year. Using a full credibility standard of 2,500 claims and assigning partial credibility, what loss cost per insured should be reflected in the new rates? (Assume that there is no inflation.) A. Less than $105 B. At least $105, but less than $110 C. At least $110, but less than $115 D. At least $115, but less than $120 E. $120 or more
6.23 (4, 5/87, Q.36) (2 points) The actuary for XYZ Insurance Company has just developed a new rate for a particular class of insureds. The new rate has a loss cost provision of $125. In doing so, he used the partial credibility approach of classical credibility. In the experience period used, there were 10,000 insureds with an average claim frequency of 0.0210. If the loss cost in the old rate was $100 and the loss cost in the experience period was $200, what was the actuary's standard for full credibility? (Assume zero inflation.) A. Less than 3,000 B. At least 3,000, but less than 3,200 C. At least 3,200, but less than 3,400 D. At least 3,400, but less than 3,600 E. 3,600 or more. 6.24 (4, 5/88, Q.34) (2 points) Assume the random variable N, representing the number of claims for a given insurance portfolio during a one year period, has a Poisson distribution with a mean of n. Also assume X1 , X2 ..., XN are N independent, identically distributed random variables with Xi representing the size of the ith claim. Let C = X1 + X2 + ... Xn represent the total cost of claims during a year. We want to use the observed value of C as an estimate of future costs. Using Classical credibility procedures, we are willing to assign full credibility to C provided it is within 10.0% of its expected value with probability 0.96. Frequency is Poisson. If the claim size distribution has a coefficient of variation of 0.60, what credibility should we assign to the experience if 213 claims occur? A. Less than 0.60 B. At least 0.60, but less than 0.625 C. At least 0.625, but less than 0.650 D. At least 0.650, but less than 0.675 E. 0.675 or more
6.25 (4, 5/88, Q.35) (2 points) The High Risk Insurance Company is revising its rates, based on its experience during the past two years. The company experienced an average of 1,250 claims annually over these two years. The loss costs underlying the current rates average $500 per insured. The Actuary is proposing that this loss costs provision be revised upward to $550, based on the average loss costs of $700 experienced over the two year experience period. The Actuary is using the Classical credibility approach. The expected number of claims necessary for full credibility is determined by the requirement that the observed total cost of claims should be within 100k% of the true value 100P% of the time. What is the probability that a fully credible estimate of the loss costs (for a sample whose expected number of claims is equal to the full credibility standard) would be within 5% of the true value? Assume that frequency is Poisson, the average claim size is $700, and the variance of the claim size distribution is 17,640,000. A. Less than 0.775 B. At least 0.775, but less than 0.825 C. At least 0.825, but less than 0.875 D. At least 0.875, but less than 0.925 E. 0.925 or more
6.26 (4, 5/89, Q.30) (2 points) The Slippery Rock Insurance Company is reviewing their rates. In order to calculate the credibility of the most recent loss experience they have decided to use Classical credibility. The expected number of claims necessary for full credibility is to be determined so that the observed total cost of claims should be within 5% of the true value 90% of the time. Based on independent studies, they have estimated that individual claims are independent and identically distributed as follows: f(x) = 1 / 200,000, 0 ≤ x ≤ 200,000. Assume that the number of claims follows a Poisson distribution. What is the credibility Z to be assigned to the most recent experience given that it contains 1,082 claims? Use a normal approximation. A. Z ≤ 0.800 B. 0.800 < Z < 0.825 C. 0.825 < Z < 0.850 D. 0.850 < Z < 0.875 E. 0.875 < Z
6.27 (4, 5/91, Q.23) (2 points) The average claim size for a group of insureds is $1,500 with standard deviation $7,500. Assuming a Poisson claim count distribution, use as your standard for full credibility, the expected number of claims so that the total loss will be within 6% of the expected total loss with probability P = 0.90. We observe 6,000 claims and a total loss of $15,600,000 for a group of insureds. If our prior estimate of the total loss is 16,500,000, find the Classical credibility estimate of the total loss for this group of insureds. A. Less than 15,780,000 B. At least 15,780,000 but less than 15,870,000 C. At least 15,870,000 but less than 15,960,000 D. At least 15,960,000 but less than 16,050,000 E. At least 16,050,000 6.28 (4B, 5/92, Q.6) (1 point) You are given the following information for a group of insureds: • Prior estimate of expected total losses $20,000,000 • Observed total losses $25,000,000 • Observed number of claims 10,000 • Required number of claims for full credibility 17,500 Using the partial credibility as in Classical credibility, determine the estimate for the group's expected total losses based upon the latest observation. A. Less than $21,000,000 B. At least $21,000,000 but less than $22,000,000 C. At least $22,000,000 but less than $23,000,000 D. At least $23,000,000 but less than $24,000,000 E. At least $24,000,000 6.29 (4B, 11/93, Q.20) (2 points) You are given the following: • P = Prior estimate of pure premium for a particular class of business. • O = Observed pure premium during latest experience period for same class of business. • R = Revised estimate of pure premium for same class following observations. • F = Number of claims required for full credibility of pure premium. Based on the concepts of Classical credibility, determine the number of claims used as the basis for determining R. A.
D.
F (R - P) O - P F (R - P)2 (O - P)2
B.
F (R - P)2 (O - P) 2
E.
F2 (R - P) O - P
C.
F (R - P) O - P
6.30 (4B, 11/95, Q.12) (1 point) 2000 expected claims are needed for full credibility. Determine the number of expected claims needed for 60% credibility. A. Less than 700 B. At least 700, but less than 900 C. At least 900, but less than 1100 D. At least 1100, but less than 1300 E. At least 1300 6.31 (4B, 5/96, Q.28) (1 point) The full credibility standard has been selected so that the actual number of claims will be within 5% of the expected number of claims 90% of the time. Frequency is Poisson. Using the methods of Classical credibility, determine the credibility to be given to the experience if 500 claims are expected. A. Less than 0.2 B. At least 0.2, but less than 0.4 C. At least 0.4, but less than 0.6 D. At least 0.6, but less than 0.8 E. At least 0.8 6.32 (4B, 11/96, Q.29) (1 point) You are given the following:
• The number of claims follows a Poisson distribution.
• Claim sizes are discrete and follow a Poisson distribution with mean 4.
• The number of claims and claim sizes are independent.
The full credibility standard has been selected so that the actual number of claims will be within 10% of the expected number of claims 95% of the time. Using the methods of Classical credibility, determine the expected number of claims needed for 40% credibility. A. Less than 100 B. At least 100, but less than 200 C. At least 200, but less than 300 D. At least 300, but less than 400 E. At least 400
6.33 (4B, 5/99, Q.18) (1 point) You are given the following: • The number of claims follows a Poisson distribution. • The coefficient of variation of the claim size distribution is 2. • The number of claims and claim sizes are independent. • 1,000 expected claims are needed for full credibility. • The full credibility standard has been selected so that the actual number of claims will be within k% of the expected number of claims P% of the time. Using the methods of Classical credibility, determine the number of expected claims needed for 50% credibility. A. Less than 200 B. At least 200, but less than 400 C. At least 400, but less than 600 D. At least 600, but less than 800 E. At least 800 6.34 (4B, 11/99, Q.18) (2 points) You are given the following: • Partial Credibility Formula A is based on the methods of classical credibility, with 1,600 expected claims needed for full credibility. • Partial Credibility Formula B is based on Buhlmann's credibility formula with a Buhlmann Credibility Parameter of K = 391. • One claim is expected during each period of observation. Determine the largest number of periods of observation for which Partial Credibility Formula B yields a larger credibility value than Partial Credibility Formula A. A. Less than 400 B. At least 400, but less than 800 C. At least 800, but less than 1,200 D. At least 1,200, but less than 1,600 E. At least 1,600
6.35 (4, 5/00, Q.26) (2.5 points) You are given: (i) Claim counts follow a Poisson distribution. (ii) Claim sizes follow a lognormal distribution with coefficient of variation 3. (iii) Claim sizes and claim counts are independent. (iv) The number of claims in the first year was 1000. (v) The aggregate loss in the first year was 6.75 million. (vi) In the first year, the provision in the premium in order to pay losses was 5.00 million. (vii) The exposure in the second year is identical to the exposure in the first year. (viii) The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time. Determine the classical credibility estimate of losses (in millions) for the second year. (A) Less than 5.5 (B) At least 5.5, but less than 5.7 (C) At least 5.7, but less than 5.9 (D) At least 5.9, but less than 6.1 (E) At least 6.1 Note: I have reworded bullet vi in the original exam question
6.36 (4, 11/01, Q.15 & 2009 Sample Q.65) (2.5 points) You are given the following information about a general liability book of business comprised of 2500 insureds:
(i) Xi = the sum from j = 1 to Ni of Yij, a random variable representing the annual loss of the ith insured.
(ii) N1 , N2 , ... , N2500 are independent and identically distributed random variables following a negative binomial distribution with parameters r = 2 and β = 0.2. (iii) Yi1, Yi2, ..., YiNi are independent and identically distributed random variables following a Pareto distribution with α = 3.0 and θ = 1000. (iv) The full credibility standard is to be within 5% of the expected aggregate losses 90% of the time. Using classical credibility theory, determine the partial credibility of the annual loss experience for this book of business. (A) 0.34 (B) 0.42 (C) 0.47 (D) 0.50 (E) 0.53
6.37 (4, 11/03, Q.35 & 2009 Sample Q.27) (2.5 points) You are given: (i) Xpartial = pure premium calculated from partially credible data (ii) µ = E[Xpartial] (iii) Fluctuations are limited to ±k µ of the mean with probability P (iv) Z = credibility factor Which of the following is equal to P? (A)
Pr[µ - kµ ≤ Xpartial ≤ µ + kµ]
(B) Pr[Zµ - k ≤ Z Xpartial ≤ Zµ + k]
(C) Pr[Zµ - µ ≤ Z Xpartial ≤ Zµ + µ]
(D) Pr[1 - k ≤ Z Xpartial + (1-Z)µ ≤ 1 + k]
(E) Pr[µ - kµ ≤ Z Xpartial + (1-Z)µ ≤ µ + kµ]
Solutions to Problems:
6.1. C. Z = √(200 / 1500) = 36.5%.
6.2. D. Z = √(400 / 8000) = 22.4%.
Estimated Pure Premium = (22.4%)(1200) + (77.6%)(1000) = $1045.
6.3. D. Since the credibility is proportional to the square root of the number of claims, we get (26%)(√20) = 116%. However, the credibility is limited to 100%.
6.4. E. Z = √(513 / 2000) = 0.506. Observed average cost per claim is: 4,771,000 / 513 = 9300.
Thus the estimated severity = (0.506)(9300) + (1 - 0.506)(10,300) = $9794.
6.5. B. The expected number of claims is (0.04)(5000) = 200. Z = √(200 / 4500) = 21.1%.
6.6. A. Φ(2.326) = 0.99, so that y = 2.326. n0 = y2/k2 = (2.326 / 0.06)2 = 1503. For the Gamma Distribution, the mean is αθ, while the variance is αθ2. Thus CV2 = αθ2 / (αθ)2 = 1/α = 1/2.5 = 0.4. nF = n0 (1+CV2 ) = (1503)(1.4) = 2104. Z = √(200 / 2104) = 30.8%.
6.7. B. P = 98%. Therefore y = 2.326, since Φ(2.326) = 0.99 = (1+P)/2. k = 0.03. Standard For Full Credibility is: nF = (y / k)2 (σf2/µf) = (2.326/0.03)2 (0.15/0.05) = 18,034 claims, or 18,034/0.05 = 360,680 exposures. Z =
√(200,000 / 360,680) = 74.5%.
Estimated future frequency is: (74.5%)(9021/200000) + (25.5%)(.05) = 4.635%. Expected number of future claims is: (200000)(4.635%) = 9270. Comment: When available, one generally uses the number of exposures or the expected number of claims in the square root rule, rather than the observed number of claims. Using the expected number of claims, Z = √(10,000 / 18,034) = 74.5%.
6.8. √(x / 1000) = 0.4. ⇒ x = (0.4²)(1000) = 160 exposures.
6.9. C. CV2 of the Pure Premium is: 100/62 = 2.778. y = 1.282. k = 0.025. n0 = y2/k2 = 2630. Standard for Full Credibility for P.P. = n0 (Coefficient of Variation of the P.P.)2 = (2630)(2.778) = 7306 exposures. Z =
√(800 / 7306) = 33.1%. Observation = 3200/800 = 4.
New Estimate = (4)(33.1%) + (6)(66.9%) = 5.34. Alternately, let m be the mean frequency. Then since the frequency is assumed to be Poisson, variance of pure premium = m(second moment of severity). Thus E[X2 ] = 100 / m. E[X] = 6 / m. Standard for Full Credibility in terms of claims is: n0 (1 + CV2 ) = n0 E[X2 ] / E[X]2 = (2630) (100 / 62 )m = 7306m claims. Expected number of claims = 800m. Z=
√(800m / 7306m) = 33.1%. Proceed as before.
Comment: You are given the number of exposures and not the number of claims, so that it may be easier to get a standard for full credibility in terms of exposures. When computing Z, make sure the ratio you use is either claims/claims or exposures/exposures. The numerator and the standard for full credibility in the denominator should be in the same units.
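As a quick check of solution 6.9 (my own sketch, not part of the original solution; y = 1.282 is the 90th percentile of the Normal, corresponding to P = 80%):

from math import sqrt

y, k = 1.282, 0.025
cv2_pp = 100 / 6**2                        # variance of the pure premium 100, mean 6
std_exposures = (y / k) ** 2 * cv2_pp      # about 7306 exposures for full credibility
Z = sqrt(800 / std_exposures)              # about 0.331
print(round(Z * (3200 / 800) + (1 - Z) * 6, 2))   # 5.34, answer C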
6.10. A. The Exponential Distribution has a coefficient of variation of 1. For a Poisson frequency, standard for full credibility for pure premium = nF = n0 (1 + CV2 ) = n0 (1 + 12 ) = 2 n0 = twice standard for full credibility for frequencies. Since the credibility is inversely proportional to the square root of the standard for full credibility, the credibility for pure premiums is that for frequency divided by
√2 : 60%/√2 = 42.4%.
6.11. E. We know we have an amount of data at least equal to the full credibility criterion for frequency. If we have a lot more data, we would also assign 100% credibility for estimating pure premiums. If we have just enough data to assign 100% credibility for estimating frequencies, then we would assign 100%/ 2 = 70.7% credibility for estimating pure premiums. Thus we cannot determine the answer from the given information. Comment: One could proceed as in the previous question and calculate 100%/ 2 = 70.7%. However, this assumes that we have just enough data to assign 100% credibility for estimating frequencies. In fact we may have much more data than this. For example, if the full credibility criteria for frequency is 1082 claims, we might have either 1082 or 100,000 claims in our data. 6.12. B. y = 1.960. k = 0.10. n0 = (y / k)2 = 384. Estimated annual pure premium is : (200 + 150 + 230 + 180) / 4 = 190. Estimated variance of the pure premium is : {(200 - 190)2 + (150 - 190)2 + (230 - 190)2 + (180 - 190)2 } / (4-1) = 1133. Using the formula: Standard for Full Credibility for P.P. in exposures = n0 (Coefficient of Variation of the Pure Premium)2 = (384)(1133/1902 ) = 12.1 exposures. Since we have 4 exposures (we have counted each year as one exposure,) Z=
√(4 / 12.1) = 57.5%. Observation = 190. Prior estimate is 210.
Therefore, estimated P.P. = (190)(57.5%) + (210)(42.5%) = 198.5. Alternately, let µF be the mean frequency, σF be the standard deviation of the frequency, µS be the mean frequency, and σS be the standard deviation of the severity. Then in terms of claims, the Standard for Full Credibility for P.P. is: n0 (σF2/µF + CV2 ) = n0 (σF2/µF + σS2/µS2). Thus in terms of exposures, the Standard for Full Credibility for P.P. is: n0 (σF2/µF + σS2/µS2)/µF = n0 (σF2µS2 + µFσS2)/(µF2 µS2) = n0 (variance of P.P.)/(mean of P.P.)2 = n0 (Coefficient of Variation of the Pure Premium)2 . Proceed as above. Comment: The use of the unbiased estimator of the variance, with n - 1 in the denominator, when we have a sample, is the type of thing that is done in Loss Models.
6.13. D. P = 0.90 and k = 0.025. Φ(1.645) = 0.95 = (1+P)/2, so that y = 1.645. n0 = (1.645 / 0.025)2 = 4330. CVSev2 = 1/2. nF = n0 (1 + CVSev2 ) = (4330)(1 + 1/2) = 6495 claims. Z =
√(810 / 6495) = 35.3%.
The prior estimate of aggregate losses is: (80%)($1.6 million) = $1.28 million. The observation of aggregate losses is $1.134 million. Thus the new estimate is: (0.353)(1.134) + (1 - 0.353)(1.28) = 1.228 million. Since exposures have increased by 12%, the estimate of aggregate losses for 2002 is: (1.12)(1.228) = $1.38 million.
6.14. B. The observed winning percentage is: 35/107. The predicted winning percentage for the remainder of the season is: (55.4 - 35) / (162 - 107) = 20.4/55. Z (35/107) + (1 - Z)(0.5) = 20.4/55. ⇒ Z = 0.7466. √(107 / nF) = 0.7466. ⇒ nF = 192 games. Comment: Information taken from www.coolstandings.com, as of August 4, 2012.
6.15. D. We are given k = 10%, P = 90%. Thus y = 1.645. n0 = (y/k)² = 271.
For losses of size 100 and 1000 the insurer makes no payment. In the case of 10,000, the insurer pays 10,000 - 1000 = 9000. In the case of 100,000, the insurer pays 25,000 - 1000 = 24,000.
The distribution of the size of nonzero payments is: 9000 @ 2/3 and 24,000 @ 1/3.
This has mean of: (2/3)(9000) + (1/3)(24,000) = 14,000.
This has second moment of: (2/3)(9000²) + (1/3)(24,000²) = 246,000,000.
1 + CV² = 246,000,000/14,000² = 1.255.
We expect 500 losses, and (0.3)(500) = 150 nonzero payments.
Number of claims (nonzero payments) needed for full credibility is: (271)(1.255) = 340.
Z = √(150/340) = 66.4%.
Alternately, the distribution of amounts paid is: 0 @ 0.7, 9000 @ 0.2, and 24,000 @ 0.1.
This has mean of: (0.2)(9000) + (0.1)(24,000) = 4200.
This has second moment of: (0.2)(9000²) + (0.1)(24,000²) = 73,800,000.
1 + CV² = 73,800,000/4200² = 4.184.
Number of losses needed for full credibility is: (271)(4.184) = 1134.
Z = √(500/1134) = 66.4%.
Comment: The expected total payments are: (150)(14000) = 2,100,000 = (500)(4200). If for example, we observed 2,500,000 in total payments this year, we would estimate total payments next year of: (66.4%)(2,500,000) + (33.6%)(2,100,000) = 2,365,600.
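The two approaches in 6.15 can be verified numerically: the per-nonzero-payment standard and the per-loss standard give exactly the same credibility. A minimal sketch of my own, not from the original solution:

```python
# Sketch of 6.15: the per-(nonzero payment) and the per-loss standards give the same Z.
n0 = (1.645 / 0.10)**2                  # about 271 claims
# nonzero payments: 9000 with probability 2/3, 24000 with probability 1/3
m1 = (2/3)*9000 + (1/3)*24000           # 14,000
m2 = (2/3)*9000**2 + (1/3)*24000**2     # 246,000,000
Z1 = (150 / (n0 * m2 / m1**2))**0.5     # 150 expected nonzero payments
# all losses: 0 with prob. 0.7, 9000 with prob. 0.2, 24000 with prob. 0.1
m1b = 0.2*9000 + 0.1*24000              # 4,200
m2b = 0.2*9000**2 + 0.1*24000**2        # 73,800,000
Z2 = (500 / (n0 * m2b / m1b**2))**0.5   # 500 expected losses
print(round(Z1, 3), round(Z2, 3))       # both about 0.665 (66.4% above, after rounding n0 to 271)
```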
6.16. C. Z = √(19/1000) = 13.8%. Observed frequency = 8/19 = 0.421.
Prior estimate of frequency = 100/162 = 0.617.
Estimated future frequency = (13.8%)(0.421) + (1 - 13.8%)(0.617) = 0.590.
Estimated number of games won rest of season = (0.590)(162 - 19) = 84.4.
Estimated total number of games won = 8 + 84.4 = 92.4.
6.17. C. Prior to taking into account inflation, the estimate of aggregate losses in 2008 must have been: 447,900/1.04 = 430,673.
The observed aggregate loss is: (77)(6861) = 528,297.
Z (528,297) + (1 - Z)(400,000) = 430,673. ⇒ Z = 23.9%.
P = 90%. y = 1.645. ⇒ n0 = (y/k)² = (1.645/0.05)² = 1082 claims.
For a Gamma, CV² = (αθ²)/(αθ)² = 1/α. Standard for Full Credibility is: (1082)(1 + 1/α).
23.9% = Z = √(77 / {(1082)(1 + 1/α)}). ⇒ (0.239²)(1082)(1 + 1/α) = 77. ⇒ α = 4.07.
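For 6.17, α can also be backed out directly in a few lines. This sketch is mine; it reproduces the answer above up to rounding.

```python
# Sketch of 6.17: back out alpha from the implied credibility.
prior, observed = 400_000, 77 * 6861       # prior estimate and observed aggregate losses
new_estimate = 447_900 / 1.04              # remove the 4% inflation adjustment
Z = (new_estimate - prior) / (observed - prior)
n0 = (1.645 / 0.05)**2                     # about 1082 claims
# Z = sqrt(77 / (n0 * (1 + 1/alpha)))  =>  1 + 1/alpha = 77 / (n0 * Z**2)
alpha = 1 / (77 / (n0 * Z**2) - 1)
print(round(Z, 3), round(alpha, 1))        # about 0.239 and 4.1
```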
6.18. B. Average severity for the State = $218 million/7363 = $29,608.
Average severity countrywide = $23,868 million/442,124 = $53,985.
P = 99%. ⇒ y = 2.576. n0 = (2.576/0.075)² = 1180 claims.
Standard for full credibility for severity is: CV² n0 = (4²)(1180) = 18,880 claims.
Z = √(7363/18,880) = 62.4%.
Estimated state severity is: (62.4%)($29,608) + (1 - 62.4%)($53,985) = $38,774.
Comment: You are not responsible for knowing the details of any specific line of insurance. This is a simplified piece of the calculation of State/Hazard Group Relativities for Workers Compensation Insurance.
6.19. I would expect Willie to have a better batting average in the future than Reginald. While Reginald has a batting average of 0.500, there is too little data to have much credibility. Thus the estimated future batting average of Reginald is probably only slightly higher than the overall mean of 0.260. On the other hand, Willie has a considerable amount of data. His estimated future batting average is close to or equal to his observed 0.300.
For example, let us assume a Binomial Model. Then for q = 0.26, the ratio of the variance to the mean frequency is: mq(1 - q)/(mq) = 1 - q = 0.74.
If for example, we were to take P = 90% and k = 5%, then n0 = 1082 claims.
The Standard for Full Credibility for frequency would be: (0.74)(1082) = 801 claims.
This is equivalent to: 801/0.26 = 3081 exposures (at bats).
Then for Reginaldʼs data, Z = √(6/3081) = 4.4%.
Reginaldʼs estimated future batting average is: (4.4%)(0.5) + (1 - 4.4%)(0.26) = 0.271.
For Willieʼs data, Z = √(3000/3081) = 98.7%.
Willieʼs estimated future batting average is: (98.7%)(0.3) + (1 - 98.7%)(0.26) = 0.299.
Comment: Not the style of question you will get on your exam. Other reasonable choices for P and k would produce somewhat different credibilities. With additional information besides the results of their batting, one could make better estimates.

6.20. B. k = 5% and P = 90%. We have y = 1.645, since Φ(1.645) = 0.95 = (1+P)/2.
Therefore n0 = (y/k)² = (1.645/0.05)² = 1082.
When we have 36 claims per year for three years we assign 20% credibility; therefore 0.20 = √(108/nF). Thus nF = 2700.
But nF = n0 (1 + CV²). Thus 2700 = 1082(1 + CV²). CV = 1.22.
6.21. B. The credibility Z = √(600/5400) = 1/3.
Thus the new estimate is: (1/3)(1200) + (1- 1/3)(1000) = $1067.
6.22. C. The credibility assigned to (2)(250) = 500 claims is Z = √(500/2500) = 0.447.
The new estimate is: (0.447)(130) + (1 - 0.447)(100) = $113.
6.23. C. The credibility assigned was:
(change in loss cost)/(difference between observation and prior estimate) = (125 - 100)/(200 - 100) = 25%.
The expected number of claims was (10,000)(0.0210) = 210.
Z = √(210/nF). Therefore nF = 210/0.25² = 3360.
Comment: We expect 210 claims, and Z = √(210/3360) = 0.25.
Then the new estimate of the loss costs is: ($200)(0.25) + ($100)(1 - 0.25) = $125.

6.24. B. (1+P)/2 = (1.96)/2 = 0.98. Thus y = 2.054, since Φ(2.054) = 0.98.
The standard for full credibility is: (y²/k²)(1 + CV²) = (2.054/0.10)²(1 + 0.6²) = 574 claims.
Thus we assign credibility of Z = √(213/574) = 60.9%.
6.25. D. CV² = variance/mean² = 17,640,000/700² = 36.
k = 0.05, while P (and y) are to be solved for.
The credibility being applied to the observation is: Z = (change in estimate)/(observation - prior estimate) = (550 - 500)/(700 - 500) = 0.25.
We expect: (2)(1250) = 2500 claims. Thus since 2500 claims are given 0.25 credibility, the full credibility standard is: 2500/0.25² = 40,000 claims.
However, that should equal (y²/k²)(1 + CV²) = (y²/0.05²)(1 + 36).
Thus: y = (0.05)√(40,000/37) = 1.644.
Φ(y) = (1+P)/2. Thus P = 2Φ(1.644) - 1 = (2)(0.9499) - 1 = 0.90.

6.26. D. k = 0.05 and P = 0.90. y = 1.645, since Φ(1.645) = 0.95 = (1+P)/2.
n0 = y²/k² = (1.645/0.05)² = 1082.
The mean of the severity distribution is 100,000. The second moment of the severity is the integral of x²/200,000 from 0 to 200,000, which is 200,000²/3. Thus the variance is 3,333,333,333.
The square of the coefficient of variation is: variance/mean² = 3,333,333,333/100,000² = 0.3333.
Thus nF = n0 (1 + CV²) = (1082)(1.333) = 1443.
For 1082 claims, Z = √(1082/1443) = √(3/4) = 0.866.
Comment: For the uniform distribution on [a, b], the CV = (b - a)/{(b + a)√3}. For a = 0, CV² = 1/3.
6.27. D. k = 6%. Φ(1.645) = 0.95 = (1 + 0.90)/2, so that y = 1.645.
Standard for full credibility for frequency = n0 = y²/k² = (1.645/0.06)² = 756.
Coefficient of Variation of the severity = 7500/1500 = 5.
Standard for full credibility for pure premium = nF = n0 (1 + CV²) = 756(1 + 5²) = 19,656 claims.
Z = √(6000/19,656) = 0.552. The prior estimate is given as $16.5 million.
The observation is given as $15.6 million. Thus the new estimate is: (0.552)(15.6) + (1 - 0.552)(16.5) = $16.00 million.
6.28. D. Z = √(10,000/17,500) = 75.6%.
Thus the new estimate = (25 million)(0.756) + (20 million)(1 - 0.756) = $23.78 million.
6.29. B. Z = √(N/F). Thus, R = O √(N/F) + P {1 - √(N/F)}.
Solving for N: N = F (R - P)²/(O - P)².
Comment: Writing the revised estimate as R = P + Z(O-P) can be useful in general and allows a slightly quicker solution of the problem. This can also be written as Z = (R - P) / (O - P); i.e., the credibility is the ratio of the revision of the estimate from the prior estimate to the deviation of the observation from the prior estimate.
6.30. B. 0.6 = Z = √(n/2000). Therefore, n = (0.6²)(2000) = 720 claims.
6.31. D. We are given k = 5% and P = 90%, therefore we have y = 1.645 since Φ(1.645) = 0.95 = (1 + P)/2. Therefore, n0 = (y/k)² = (1.645/0.05)² = 1082.
The partial credibility is given by the square root rule: Z = √(500/1082) = 0.68.

6.32. A. P = 0.95 and k = 0.1. Φ(1.960) = 0.975 = (1+P)/2, so that y = 1.960.
n0 = (y/k)² = (1.960/0.10)² = 384. Z = √(n/384) = 0.4. Thus n = (384)(0.4²) = 61.4.
6.33. B. √(n/1000) = 0.5. Thus n = (1000)(0.5²) = 250.
6.34. B. For N observations, Classical Credibility = √(N/1600) = √N/40, for N ≤ 1600.
For N observations, Greatest Accuracy/Buhlmann Credibility = N/(N + K) = N/(N + 391).
The Classical Credibility exceeds the Buhlmann Credibility when √N/40 > N/(N + 391), i.e. when N - 40√N + 391 > 0.
Setting N - 40√N + 391 = 0: √N = {40 ± √(40² - (4)(1)(391))}/2 = 23 or 17.
For N between 17² = 289 and 23² = 529, the Buhlmann Credibility is greater than the Classical Credibility.
Comment: The 2 formulas for K = 391 and nF = 1600 produce very similar credibilities.

N       Classical Cred.   Buhlmann Cred.
0            0.0%              0.0%
100         25.0%             20.4%
200         35.4%             33.8%
300         43.3%             43.4%
400         50.0%             50.6%
500         55.9%             56.1%
529         57.5%             57.5%
600         61.2%             60.5%
1000        79.1%             71.9%
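The comparison table is easy to regenerate; the following short sketch is my own, using the K = 391 and nF = 1600 of the comment above.

```python
# Sketch of the 6.34 comment: compare the two credibility formulas for K = 391 and nF = 1600.
for n in [0, 100, 200, 300, 400, 500, 529, 600, 1000]:
    classical = min(1.0, (n / 1600)**0.5)   # square root rule, capped at 100%
    buhlmann = n / (n + 391)                # greatest accuracy credibility
    print(n, f"{classical:.1%}", f"{buhlmann:.1%}")
```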
See “An Actuarial Note on Credibility Parameters,” by Howard Mahler, PCAS 1986.

6.35. A. P = 0.95 and k = 0.05. Φ(1.960) = 0.975 = (1+P)/2, so that y = 1.960.
n0 = (y/k)² = (1.960/0.05)² = 1537.
Standard for full credibility for pure premium = nF = n0 (1 + CV²) = 1537(1 + 3²) = 15,370 claims.
Z = √(1000/15,370) = 25.5%.
The prior estimate is given as $5 million. The observation is given as $6.75 million.
Thus the new estimate is: (25.5%)(6.75) + (1 - 25.5%)(5) = $5.45 million.

6.36. C. k = 0.05. P = 90%. y = 1.645. n0 = (1.645/0.05)² = 1082 claims.
For the Negative Binomial, µf = (2)(0.2) = 0.4. σf² = (2)(0.2)(1.2). σf²/µf = 1.2.
For the Pareto, E[X] = 1000/(3 - 1) = 500. E[X²] = (2)(1000²)/{(3 - 1)(3 - 2)} = 1,000,000.
CV² = E[X²]/E[X]² - 1 = 1,000,000/500² - 1 = 4 - 1 = 3.
Standard for Full Credibility = (σf²/µf + CVSev²) n0 = (1.2 + 3)(1082) = 4546 claims.
2500 exposures ⇔ (2500)(0.4) = 1000 expected claims. Z = √(1000/4546) = 47%.
6.37. E. The estimate using classical credibility is: Z Xpartial + (1-Z)µ. We want this estimate to be within ±kµ of µ, with probability P ⇔ P = Pr[µ - kµ ≤ Z Xpartial + (1-Z)µ ≤ µ + kµ]. Comment: See page 30 in Section 2.6 of Mahler & Dean. P = Pr[- kµ ≤ Z Xpartial - Zµ ≤ kµ] ⇔ P = Pr[Zµ - kµ ≤ Z Xpartial ≤ Zµ + kµ] ⇔ P = Pr[(1-Z)µ + Zµ - kµ ≤ (1-Z)µ + Z Xpartial ≤ (1-Z)µ + Zµ + kµ] ⇔ P = Pr[µ - kµ ≤ Z Xpartial + (1-Z)µ ≤ µ + kµ].
Section 7, Important Formulas and Ideas

The estimate using credibility = ZX + (1-Z)Y, where Z is the credibility assigned to the observation X.

Full Credibility (Sections 2, 3, and 5):

Assume one desires that the chance of being within ±k of the mean frequency is at least P; then n0 = y²/k², where y is such that Φ(y) = (1+P)/2.

The Standard for Full Credibility for Frequency is, in terms of claims: n0 σf²/µf.
In the Poisson case this is: n0.
The Standard for Full Credibility for Severity is, in terms of claims: CVSev² n0.
The Standard for Full Credibility for either Pure Premiums or Aggregate Losses is, in terms of claims: (σf²/µf + CVSev²) n0. In the Poisson case this is: (1 + CVSev²) n0.
The standard can be put in terms of exposures rather than claims by dividing by µf.
Standard for Full Credibility for Pure Premiums or Aggregate Losses, in terms of exposures: n0 (coefficient of variation of the pure premium)².

Variance of Pure Premiums and Aggregate Losses (Section 4):

Pure Premiums = ($ of Loss)/(# of Exposures) = {(# of Claims)/(# of Exposures)} {($ of Loss)/(# of Claims)} = (Frequency)(Severity).

When frequency and severity are independent:
σPP² = µFreq σSev² + µSev² σFreq².
σAgg² = µF σS² + µS² σF².
Partial Credibility (Section 6): When one has at least the number of claims needed for Full Credibility, then one assigns 100% credibility to the observations. Otherwise use the square root rule:
Z = √(number of claims / standard for full credibility in terms of claims), or
Z = √(number of exposures / standard for full credibility in terms of exposures).
When available, one generally uses the number of exposures or the expected number of claims in the square root rule, rather than the observed number of claims. Make sure that in the square root rule you divide comparable quantities; either divide claims by claims or divide exposures by exposures.
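As a quick illustration of these formulas, here is a short sketch of my own (not from the syllabus readings). It assumes a Poisson frequency for the pure premium standard and uses scipy for the normal quantile.

```python
# Minimal sketch of the Section 7 formulas, Poisson frequency assumed in the last step.
from scipy.stats import norm

def n0(P, k):
    """Standard for full credibility for frequency, in claims, Poisson case."""
    y = norm.ppf((1 + P) / 2)
    return (y / k)**2

def partial_credibility(n_claims, standard):
    """Square root rule, capped at 100%."""
    return min(1.0, (n_claims / standard)**0.5)

std_pp = n0(0.90, 0.05) * (1 + 2.0**2)   # pure premium standard with a severity CV of 2
print(round(n0(0.90, 0.05)), round(std_pp), round(partial_credibility(1000, std_pp), 3))
# about 1082 claims, 5412 claims, and Z = 0.43
```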
Mahlerʼs Guide to
Buhlmann Credibility and Bayesian Analysis Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-9 Howard Mahler
[email protected] www.howardmahler.com/Teaching
Mahlerʼs Guide to Buhlmann Credibility and Bayesian Analysis
Copyright 2013 by Howard C. Mahler.

Information in bold or sections whose title is in bold are more important for passing the exam. Information presented in italics (and sections whose title is in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications.
Solutions to the problems in each section are at the end of that section.1

Section #   Pages     Section Name
1           4-6       Introduction
2           7-48      Conditional Distributions
3           49-76     Covariances and Correlations
4           77-126    Bayesian Analysis, Introduction
5           127-205   Bayesian Analysis, with Discrete Risk Types
6           206-282   Bayesian Analysis, with Continuous Risk Types
7           283-331   EPV and VHM
8           332-393   Buhlmann Credibility, Introduction
9           394-453   Buhlmann Credibility, Discrete Risk Types
10          454-504   Buhlmann Credibility, with Continuous Risk Types
11          505-516   Linear Regression & Buhlmann Credibility
12          517-555   Philbrickʼs Target Shooting Example
13          556-596   Die / Spinner Models
14          597-611   Classification Ratemaking
15          612-625   Experience Rating
16          626-646   Loss Functions
17          647-679   Least Squares Credibility
18          680-703   The Normal Equations for Credibilities
19          704-707   Important Formulas and Ideas

1 Note that problems include both some written by me and some from past exams. In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. Past exam questions are copyright by the Casualty Actuarial Society and Society of Actuaries and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy.
Course 4 Exam Questions by Section of this Study Aid2

(Chart of past exam question numbers, arranged by exam sitting and by Sections 1 through 18 of this study aid, covering the 2009 Sample questions and the 5/00, 11/00, 5/01, 11/01, 11/02, 11/03, 11/04, 5/05, 11/05, 11/06, and 5/07 exams.)

The CAS/SOA did not release the 5/02, 5/03, 5/04, and 5/06 exams.
The CAS/SOA did not release the 11/07 and subsequent exams.

2 Excluding any questions that are no longer on the syllabus.
Section 1, Introduction This study guide covers a number of related ideas.3 The basic concepts of Credibility are covered in “Mahlerʼs Guide to Classical Credibility.” Read at least the first section of that Study Guide prior to this one. The concepts in this study guide are applied to special situations in “Mahlerʼs Guide to Conjugate Priors.” In this study guide, the preliminary mathematical ideas of conditional distributions, covariances and correlations are covered first.4 The first key idea is that of Bayes Theorem and Bayesian Analysis. The second key idea is that of Greatest Accuracy or Buhlmann Credibility. Loss Models uses the term “Greatest Accuracy” Credibility for what is more commonly known as Buhlmann, Bühlmann-Straub, or Least Squares Credibility. In this study guide I will use the terms Buhlmann Credibility and Greatest Accuracy Credibility interchangeably. Many of you will benefit by reading my section on the Philbrick Target Shooting Example, prior to studying Buhlmann Credibility.5 One has to become proficient at applying Buhlmann Credibility to various situations typically posed in exam questions, in particular calculating the expected value of the process variance and the variance of the hypothetical means. Therefore, one has to become proficient at calculating variances.6 The third key idea, Nonparametric Empirical Bayesian Estimation, is presented in its own study guide. Rather than a model, the data is used in order to estimate the expected value of the process variance and the variance of the hypothetical means. The fourth key idea, Semiparametric Estimation, is presented in its own study guide. Both a model and data are relied upon in order to estimate the expected value of the process variance and the variance of the hypothetical means.
3 The concepts in Chapter 20 of Loss Models related to Buhlmann or Greatest Accuracy Credibility are demonstrated. This material can also be covered from “Credibility” by Mahler and Dean, Chapter 8 of the fourth Edition of Foundations of Casualty Actuarial Science. My study guide is very similar to and formed a basis for the Credibility Chapter written by myself and Curtis Gary Dean.
4 Many of those familiar with these ideas would benefit by glancing over the important ideas in these sections and doing the highly recommended problems. Those unfamiliar with these ideas should go through these sections in more detail.
5 While not on the syllabus, it will help many students develop an understanding of the ideas on the syllabus.
6 The process variances of various loss (severity) and frequency distributions are covered in those study guides. The process variance of pure premiums is covered in “Mahlerʼs Guide to Classical Credibility”. In general, credibility depends on the variance-covariance structure, as discussed in the section on the Normal Equations for Credibility; however, with rare exceptions, on the exam one need only calculate variances in order to calculate credibilities.
Problems: 1.1 (1 point) You observe 1256 claims per 10,000 exposures. The credibility given to this data is 70%. The complement of credibility is given to the prior estimate of .203 claims per exposure. What is the new estimate of the claim frequency? A. Less than .135 B. At least .135 but less than .145 C. At least .145 but less than .155 D. At least .155 but less than .165 E. At least .165 1.2 (1 point) The prior estimate was a pure premium of $2.53 per $100 of payroll. After observing $81,472 per $795,034 of payroll, the new estimate is a pure premium of $2.87. How much credibility was assigned to the observed data? A. Less than 4% B. At least 4% but less than 5% C. At least 5% but less than 6% D. At least 6% but less than 7% E. At least 7% 1.3 (1 point) Given an observation with a value of 250, the Buhlmann credibility estimate for the expected value of the next observation would be 223. If instead the observation had been 100, the Buhlmann credibility estimate for the expected value of the next observation would have been 118. Determine the Buhlmann credibility of the first observation. A. 40% B. 50% C. 60% D. 70% E. 80% 1.4 (4, 5/83, Q.35, Q.39, Q.41) (1 point) Which of the following are true? 1. Credibility can be characterized as a measure of the relative value of the information contained in the data. 2. The definition of full credibility depends on two parameters. 3. If one assumes that claims for an individual driver are Poisson distributed and that the means of these distributions are Gamma distributed, then the total number of accidents follows a Poisson distribution. A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A ,B, C or D. 1.5 (4, 5/96, Q.3) (1 point) Given a first observation with a value of 2, the Buhlmann credibility estimate for the expected value of the second observation would be 1. Given a first observation with a value of 5, the Buhlmann credibility estimate for the expected value of the second observation would be 2. Determine the Buhlmann credibility of the first observation. A. 1/3 B. 2/5 C. 1/2 D. 3/5 E. 2/3
Solutions to Problems:

1.1. C. (70%)(1256/10,000) + (1 - 70%)(0.203) = 0.149.

1.2. B. Observed pure premium per $100 of payroll is: 81,472/7950.34 = $10.25.
Z = (new estimate - old estimate)/(observation - old estimate) = ($2.87 - $2.53)/($10.25 - $2.53) = 4.4%.

1.3. D. Let Y be the prior estimate and Z be the credibility of the first observation.
250Z + (1 - Z)Y = 223, and 100Z + (1 - Z)Y = 118. ⇒ 150Z = 105. ⇒ Z = 105/150 = 70%.
Alternately, the credibility is the slope of the line of posterior estimates versus observations:
Z = Δ estimates / Δ observations = (223 - 118)/(250 - 100) = 105/150 = 70%.

1.4. E. 1. True. This is one way to describe Buhlmann Credibility.
2. True. The Classical Credibility Standard For Full Credibility depends on choosing P and k.
3. False. The mixed predictive distribution for the Gamma-Poisson is a Negative Binomial Distribution.

1.5. A. Let Y be the prior estimate and Z be the credibility of the first observation.
Then: 2Z + (1 - Z)Y = 1, and 5Z + (1 - Z)Y = 2. Therefore, 3Z = 1 or Z = 1/3.
Alternately, the credibility is the slope of the line of posterior estimates versus observations:
Z = Δ estimates / Δ observations = (2 - 1)/(5 - 2) = 1/3.
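The "credibility as a slope" idea used in 1.3 and 1.5 can be written as a one-line function. This sketch is mine, not part of the original solutions.

```python
# Credibility as the slope of posterior estimates against observations.
def credibility_from_two_points(obs1, est1, obs2, est2):
    return (est1 - est2) / (obs1 - obs2)

print(credibility_from_two_points(250, 223, 100, 118))  # 0.70, as in 1.3
print(credibility_from_two_points(5, 2, 2, 1))          # 0.333..., as in 1.5
```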
Section 2, Conditional Distributions7

Assume that 14% of actuarial students take exam seminars and that 8% of actuarial students both take exam seminars and pass their exam. Then the chance of a student who has taken an exam seminar of passing his exam is 8%/14% = 57%. Assume 1000 total students, of whom 140 take exam seminars. Of these 140 students, 80 pass, for a pass rate of 80/140. This is a simple example of a conditional probability.
The conditional probability of an event A given another event B is defined as:
P[A|B] = P[A and B]/P[B].
In the simple example, A = {student passes exam}, B = {student takes exam seminar}, P[A and B] = 8%, P[B] = 14%. Thus P[A|B] = P[A and B]/P[B] = 8%/14% = 57%.
Here is a more complicated example of a conditional probability. Assume two honest, six-sided dice of different colors are rolled, and the results D1 and D2 are observed. Let S = D1 + 2D2. Then one can easily compute S for all the possible outcomes:
        D2=1   D2=2   D2=3   D2=4   D2=5   D2=6
D1=1      3      5      7      9     11     13
D1=2      4      6      8     10     12     14
D1=3      5      7      9     11     13     15
D1=4      6      8     10     12     14     16
D1=5      7      9     11     13     15     17
D1=6      8     10     12     14     16     18
Then when S ≤ 13 we have the following equally likely possibilities:

D1    D2=1  D2=2  D2=3  D2=4  D2=5  D2=6   Possibilities   Conditional Density Function of D1 given that S ≤ 13
1      x     x     x     x     x     x          6               6/27
2      x     x     x     x     x                5               5/27
3      x     x     x     x     x                5               5/27
4      x     x     x     x                      4               4/27
5      x     x     x     x                      4               4/27
6      x     x     x                            3               3/27

7 Conditional distributions form the basis for Bayesian Analysis. Thus even though these ideas are only occasionally tested directly, you should make sure you have a firm understanding of the concepts in this section. For those who already know this material, do only a few problems from this section in order to refresh your memory.
The mean of the conditional density function of D1 given that S ≤ 13 is: {(6)(1) + (5)(2) + (5)(3) + (4)(4) +(4)(5)+(3)(6)} / 27 = 3.148. The median is 3, since the Distribution Function at 3 is 16/27 ≥ .5 while that at 2 is 11/27 < .5. The mode is 1, since that is the value at which the density is a maximum. Exercise: In the above example, if S > 13, what is the chance that D1 = 4? [Solution: The following 9 pairs have S > 13: (2,6), (3,6), (4,5), (4,6), (5,5),(5,6), (6,4), (6,5) and (6,6). Thus P[D1 = 4 | S > 13] = 2 / 9.] Theorem of Total Probability: We observe that P[D1 = 4 | S > 13] P[S > 13] = (2/9)(9/36) = 2/36, while P[D1 = 4 | S ≤ 13] P[S ≤ 13] = (4/27)(27/36) = 4/36. The sum of these two terms is: (2/36) + (4/36) = 6/36 = P[D1 = 4]. Note that either S > 13 or S ≤ 13; these are two disjoint events that cover all the possibilities. One can have a longer series of mutually disjoint events that cover all the possibilities rather than just two. If one has such a set of events Bi, then one can write the marginal distribution function P[A] in terms of the conditional distributions P[A | Bi] and the probabilities P[Bi]:
P[A] = Σ P[A | Bi] P[Bi], where the sum is taken over the events Bi.
This theorem follows from: Σ P[A | Bi] P[Bi] = Σ P[A and Bi] = P[A], provided that the Bi are disjoint events that cover all possibilities. Thus one can compute probabilities of events either directly or by summing a product of terms. For example, we already know that in this example there is a 1 in 6 chance that the first die is 3. However, we can compute this probability by taking the set of disjoint events, that S is 3, 4, 5,..., or 18.
Let Bi = {S ≡ D1 + 2D2 = i} for i = 3 to 18. Then we have for D1 = 3:

i     P[D1 = 3 | S = i]   P[S = i]   P[D1 = 3 | S = i] × P[S = i]
3           0.00%           2.78%           0.00%
4           0.00%           2.78%           0.00%
5          50.00%           5.56%           2.78%
6           0.00%           5.56%           0.00%
7          33.33%           8.33%           2.78%
8           0.00%           8.33%           0.00%
9          33.33%           8.33%           2.78%
10          0.00%           8.33%           0.00%
11         33.33%           8.33%           2.78%
12          0.00%           8.33%           0.00%
13         33.33%           8.33%           2.78%
14          0.00%           8.33%           0.00%
15         50.00%           5.56%           2.78%
16          0.00%           5.56%           0.00%
17          0.00%           2.78%           0.00%
18          0.00%           2.78%           0.00%
Sum                            1           16.67%
So in this case we can indeed calculate the probability that the first die is 3, using the Theorem of Total Probability: 1/6 = P[D1 = 3] = Σ P[D1 = 3 | S = i] P[S = i] Exercise: Assume that the number of students taking an exam by exam center are as follows: Chicago 3500, Los Angeles 2000, New York 4500. The number of students from each exam center passing the exam are: Chicago 2625, Los Angeles 1200, New York 3060. What is the overall passing percentage? [Solution: (2625+1200+3060) / (3500+2000+4500) = 6885/10000 = 68.85%. ] Exercise: Assume that the percent of students taking an exam by exam center are as follows: Chicago 35%, Los Angeles 20%, New York 45%. The percent of students from each exam center passing the exam are: Chicago 75%, Los Angeles 60%, New York 68%. What is the overall passing percentage? [Solution: Σ P[A | Bi] P[Bi] = (75%)(35%) + (60%)(20%) + (68%)(45%) = 68.85%. ] Note that this exercise is mathematically the same as the previous exercise. This is a concrete example of the Theorem of Total Probability.
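As a brute-force check of the Theorem of Total Probability in this dice example, the following short Python sketch (mine, not from the text) recomputes P[D1 = 3] by conditioning on S.

```python
# Check of the Theorem of Total Probability for P[D1 = 3], with S = D1 + 2*D2.
from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]   # 36 equally likely pairs
total = Fraction(0)
for i in range(3, 19):                      # condition on each possible value of S
    event = [(d1, d2) for (d1, d2) in outcomes if d1 + 2*d2 == i]
    if not event:
        continue
    p_s = Fraction(len(event), 36)
    p_d1_given_s = Fraction(sum(1 for (d1, _) in event if d1 == 3), len(event))
    total += p_d1_given_s * p_s
print(total)                                # 1/6
```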
Conditional Expectation:

The mean of the conditional density function of D2 given that S ≤ 13 is:
{(6)(1) + (6)(2) + (6)(3) + (5)(4) + (3)(5) + (1)(6)}/27 = 2.852.
In general, in order to compute such a conditional expectation, we take the weighted average over all the possibilities y:
E[X | B] = Σy y P[X = y | B].
Note that the conditional expectation of D2 given that S ≤ 13, which is 2.852, is not equal to the unconditional expectation of D2, which is 3.5, the mean of a fair six-sided die. The fact that we observed that S ≤ 13 decreased the expected value of D2.
Exercise: What is the mean of the conditional density function of D2 given that S > 13?
[Solution: (0/9)(1) + (0/9)(2) + (0/9)(3) + (1/9)(4) + (3/9)(5) + (5/9)(6) = 5.444.]
In general, we can compute the unconditional expectation by taking a weighted average of the conditional expectations over all the possibilities:
E[X] = Σ E[X | Bi] P[Bi], where the sum is taken over the events Bi.
The different events Bi must be disjoint and cover all the possibilities. In this particular case for example we can take two possibilities S ≤ 13 and S > 13: E[D2 ] = E[D2 |S ≤ 13] P[S ≤ 13] + E[D2 |S > 13] P[S > 13] = ( 2.852)(27/36) + (5.444 )(9/36) = 3.5, which is the correct unconditional mean.
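The same weighted average can be verified by brute force over the 36 equally likely outcomes. A minimal sketch of my own:

```python
# Conditional expectations of D2 given S <= 13 and S > 13, and their weighted average.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
low  = [d2 for (d1, d2) in outcomes if d1 + 2*d2 <= 13]
high = [d2 for (d1, d2) in outcomes if d1 + 2*d2 > 13]
e_low, e_high = sum(low)/len(low), sum(high)/len(high)
overall = e_low * len(low)/36 + e_high * len(high)/36
print(round(e_low, 3), round(e_high, 3), round(overall, 3))   # 2.852, 5.444, 3.5
```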
One could have obtained 3.5 by instead summing over each possible value of S:

i     E[D2 | S = i]   P[S = i]   E[D2 | S = i] × P[S = i]
3         1.000         2.78%          0.028
4         1.000         2.78%          0.028
5         1.500         5.56%          0.083
6         1.500         5.56%          0.083
7         2.000         8.33%          0.167
8         2.000         8.33%          0.167
9         3.000         8.33%          0.250
10        3.000         8.33%          0.250
11        4.000         8.33%          0.333
12        4.000         8.33%          0.333
13        5.000         8.33%          0.417
14        5.000         8.33%          0.417
15        5.500         5.56%          0.306
16        5.500         5.56%          0.306
17        6.000         2.78%          0.167
18        6.000         2.78%          0.167
Sum                       1            3.500
For example, for S = 11, there are three equally likely possibilities:8 (D1 = 5, D2 = 3), (D1 = 3, D2 = 4), (D1 = 1, D2 = 5). Thus E[D2 | S = 11] = (3 + 4 + 5)/3 = 4.

Conditional Variances:

One could be asked any questions about a conditional distribution that one could be asked about any other distribution. For example, the variance of the conditional distribution of D2 given S = 11 is computed by subtracting the square of the mean from the second moment. Since for S = 11 there are three equally likely possibilities: (D1 = 5, D2 = 3), (D1 = 3, D2 = 4), (D1 = 1, D2 = 5), the conditional distribution has probability of 1/3 at each of 3, 4 and 5. Thus its second moment is: (1/3)(3²) + (1/3)(4²) + (1/3)(5²) = 16.667. Thus since the mean is 4, the conditional variance is: 16.667 - 4² = 0.667.
In general, one can compute higher moments in the same way one computes the conditional mean:
E[Xⁿ | B] = Σy yⁿ P[X = y | B].

8 Recall that we defined S = D1 + 2D2.
In general, we can compute the unconditional higher moments by taking a weighted average of the conditional moments over all the possibilities:
E[Xⁿ] = Σ E[Xⁿ | Bi] P[Bi], where the sum is taken over the events Bi.
A Continuous Example:

Iʼve gone through a discrete example involving conditional distributions. Here is a continuous example.9 Wherever one would use sums in the discrete case, one uses integrals in the continuous case.
For a given value of λ, the number of claims is Poisson distributed with mean λ. In turn λ is distributed uniformly from 0.1 to 0.4.

Exercise: From an insured picked at random, what is the chance that zero claims are observed?
[Solution: Given λ, the chance that we observe zero claims is e^(-λ).
P(n=0) = ∫[0.1, 0.4] P(n=0 | λ) f(λ) dλ = ∫[0.1, 0.4] e^(-λ) (1/0.3) dλ = (-1/0.3) e^(-λ) evaluated from λ = 0.1 to λ = 0.4
= (e^(-0.1) - e^(-0.4))/0.3 = 0.782.]

Exercise: From an insured picked at random, what is the chance that one claim is observed?
[Solution: Given λ, the chance that we observe one claim is: e^(-λ) λ¹/1! = λe^(-λ).
P(n=1) = ∫[0.1, 0.4] P(n=1 | λ) f(λ) dλ = ∫[0.1, 0.4] λe^(-λ) (1/0.3) dλ = (-1/0.3){λe^(-λ) + e^(-λ)} evaluated from λ = 0.1 to λ = 0.4
= (1.1e^(-0.1) - 1.4e^(-0.4))/0.3 = 0.190.]

Exercise: What is the unconditional mean?
[Solution: The unconditional mean can be obtained by integrating the conditional means versus the distribution of λ:
E[X] = ∫[0.1, 0.4] E[X | λ] f(λ) dλ = ∫[0.1, 0.4] λ (1/0.3) dλ = {(0.4² - 0.1²)/2}/0.3 = 0.25.]

9 For additional continuous examples, see the problems below. Also the Conjugate Prior processes provide continuous examples, including the important Gamma-Poisson.
In general, we can compute the unconditional higher moments by taking an integral of the conditional moments times the chance of each possibility over all the possibilities:
E[Xⁿ] = ∫ E[Xⁿ | λ] f(λ) dλ.

Exercise: What is the unconditional variance?
[Solution: For the Poisson Distribution the mean is λ, the variance is λ, and therefore the (conditional) second moment is: λ + λ².
The unconditional second moment can be obtained by integrating the conditional second moments versus the distribution of λ:
E[X²] = ∫[0.1, 0.4] E[X² | λ] f(λ) dλ = ∫[0.1, 0.4] (λ + λ²)(1/0.3) dλ = (λ²/2 + λ³/3)(1/0.3) evaluated from λ = 0.1 to λ = 0.4 = 0.32.
From the previous exercise the unconditional mean is 0.25.
Thus the unconditional variance is: 0.32 - 0.25² = 0.2575.
Comment: Integrate the conditional moments, not the conditional variance.]
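The integrals in this continuous example can also be checked numerically. This sketch is mine and uses scipy's quad routine for the integration.

```python
# Numeric check of the uniform(0.1, 0.4) mixture of Poissons.
from math import exp, factorial
from scipy.integrate import quad

f = lambda lam: 1 / 0.3                                   # density of lambda on (0.1, 0.4)
p = lambda n: quad(lambda lam: exp(-lam) * lam**n / factorial(n) * f(lam), 0.1, 0.4)[0]
mean = quad(lambda lam: lam * f(lam), 0.1, 0.4)[0]
second = quad(lambda lam: (lam + lam**2) * f(lam), 0.1, 0.4)[0]
print(round(p(0), 3), round(p(1), 3))                     # 0.782 and 0.190
print(round(mean, 4), round(second - mean**2, 4))         # 0.25 and 0.2575
```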
Summary: This material is preliminary, and therefore there will be very few if any exam questions based solely on the material in this section. However, it does form the basis for the many Bayesian Analysis questions covered subsequently. Thus it is a good idea to have an ability to apply the concepts related to conditional distributions to the type of situations that come up on questions involving Bayesian Analysis, including those covered in “Mahlerʼs Guide to Conjugate Priors”.
Problems: 2.1 (2 points) Assume that 3% of men are colorblind, while 0.2% of women are colorblind. The studio audience of the television show “The Vista” is made up 20% of men and 80% of women. A person in this audience is colorblind. What is the chance that this colorblind person is a man? A. less than 0.65 B. at least 0.65 but less than 0.70 C. at least 0.70 but less than 0.75 D. at least 0.75 but less than 0.80 E. at least 0.80 Use the following information for the next 6 questions: A large set of urns contain many black and red balls. There are four types of urns each with differing percentages of black balls. Each type of urn has a differing chance of being picked. Type of Urn A Priori Probability Percentage of Black Balls I 40% 3% II 30% 5% III 20% 8% IV 10% 13% 2.2 (1 point) An urn is picked and a ball is selected from that urn. What is the chance that the ball is black? A. Less than 0.050 B. At least 0.050 but less than 0.055 C. At least 0.055 but less than 0.060 D. At least 0.060 but less than 0.065 E. At least 0.065 2.3 (1 point) An urn is picked and a ball is selected from that urn. If the ball is black, what is the chance that Urn I was picked? A. Less than 0.22 B. At least 0.22 but less than 0.24 C. At least 0.24 but less than 0.26 D. At least 0.26 but less than 0.28 E. At least 0.28
2.4 (1 point) An urn is picked and a ball is selected from that urn. If the ball is black, what is the chance that Urn II was picked? A. Less than 0.22 B. At least 0.22 but less than 0.24 C. At least 0.24 but less than 0.26 D. At least 0.26 but less than 0.28 E. At least 0.28 2.5 (1 point) An urn is picked and a ball is selected from that urn. If the ball is black, what is the chance that Urn III was picked? A. Less than 0.22 B. At least 0.22 but less than 0.24 C. At least 0.24 but less than 0.26 D. At least 0.26 but less than 0.28 E. At least 0.28 2.6 (1 point) An urn is picked and a ball is selected from that urn. If the ball is black, what is the chance that Urn IV was picked? A. Less than 0.22 B. At least 0.22 but less than 0.24 C. At least 0.24 but less than 0.26 D. At least 0.26 but less than 0.28 E. At least 0.28 2.7 (2 points) An urn is picked and a ball is selected from that urn and then replaced. If the ball is black, what is the chance that the next ball picked from that same urn will be black? A. Less than 0.06 B. At least 0.06 but less than 0.07 C. At least 0.07 but less than 0.08 D. At least 0.08 but less than 0.09 E. At least 0.09
Use the following information for the next two questions: For a given value of λ, the number of claims is Poisson distributed with mean λ. In turn λ is distributed uniformly from 0 to 1.5. 2.8 (2 points) What is the chance that zero claims are observed? A. Less than 0.35 B. At least 0.35 but less than 0.40 C. At least 0.40 but less than 0.45 D. At least 0.45 but less than 0.50 E. At least 0.50 2.9 (2 points) What is the chance that one claim is observed? A. Less than 0.35 B. At least 0.35 but less than 0.40 C. At least 0.40 but less than 0.45 D. At least 0.45 but less than 0.50 E. At least 0.50 Use the following information for the next 8 questions: Let X and Y be two continuous random variables with joint density function: f(x,y) = (6 + 12x + 18y) / 25, for 1< y < x < 2. f(x, y) = 0 otherwise. 2.10 (3 points) What is the (unconditional) marginal density f(x)? A. (11x2 - 6x - 5) / 25 B. (11x2 - 16x - 15) / 25 C. (21x2 - 16x - 15) / 25 D. (21x2 - 6x - 5) / 25 E. None of the above 2.11 (3 points) What is the (unconditional) marginal density f(y)? A. (30 + 36y - 16y2 ) / 25 B. (36 + 36y - 24y2 ) / 25 C. (30 + 30y - 16y2 ) / 25 D. (36 + 30y - 24y2 ) / 25 E. None of the above
2.12 (2 points) What is the conditional density f(y | X=1.5)? A. (24/93) (4 + 3y) B. (24/93) (3 + 3y) C. (24/93) (4 + 5y) D. (24/93) (3 + 5y) E. None of the above 2.13 (2 points) What is the conditional density f(x | Y = 1.5)? A. (8x + 4) / 9 B. (4x + 11) / 9 C. (2x + 15) / 9 D. (x + 19) / 9 E. None of the above 2.14 (4 points) What is the conditional expectation E[Y | X =1.6]? A. 1.27 B. 1.29 C. 1.31 D. 1.33 E. 1.35 2.15 (4 points) What is the conditional expectation E[X | Y = 1.2]? A. Less than 1.50 B. At least 1.50 but less than 1.55 C. At least 1.55 but less than 1.60 D. At least 1.60 but less than 1.65 E. At least 1.65 2.16 (4 points) What is the unconditional expectation E[X]? A. Less than 1.50 B. At least 1.50 but less than 1.55 C. At least 1.55 but less than 1.60 D. At least 1.60 but less than 1.65 E. At least 1.65 2.17 (4 points) What is the unconditional expectation E[Y]? A. Less than 1.25 B. At least 1.25 but less than 1.30 C. At least 1.30 but less than 1.35 D. At least 1.35 but less than 1.40 E. At least 1.40
Use the following information for the next two questions: X and Y are each given by the result of rolling a six-sided die. X and Y are independent of each other. Z = X + Y. 2.18 (1 point) What is the probability that X = 6 if Z ≥ 10? A. 1/5 B. 1/4 C. 1/3 D. 2/5 E. 1/2 2.19 (2 points) What is the expected value of X if Z ≥ 10? A. less than 4.8 B. at least 4.8 but less than 5.0 C. at least 5.0 but less than 5.2 D. at least 5.2 but less than 5.4 E. at least 5.4 2.20 (2 points) Let X and Y each be distributed exponentially with distribution functions: F(x) = 1 - e-3x, x > 0, H(y) = 1 - e-3y, y > 0. X and Y are independent. Let Z = X + Y. Given Z = 1/2, what is the conditional distribution of X? A. P[X=x | Z=1/2] = 2, for 0
Use the following information for the next two questions: City Boston Springfield Worcester Pittsfield
Percentage of Total Drivers 40% 25% 20% 15%
Percent of Drivers Accident-Free 90% 92% 94% 96%
2.24 (1 point) A driver is picked at random. If the driver is accident-free, what is the chance the driver is from Boston? A. 35% B. 36% C. 37% D. 38% E. 39% 2.25 (1 point) A driver is picked at random. If the driver has had an accident, what is the chance the driver is from Pittsfield? A. Less than 0.05 B. At least 0.05 but less than 0.06 C. At least 0.06 but less than 0.07 D. At least 0.07 but less than 0.08 E. At least 0.08
2.26 ( 3 points ) A die is selected at random from an urn that contains four six-sided dice with the following characteristics: Number of Faces Number on Face Die A Die B Die C Die D 1 3 1 1 1 2 1 3 1 1 3 1 1 3 1 4 1 1 1 3 The first five rolls of the selected die yielded the following in sequential order: 2, 3, 1, 2, and 4. What is the probability that the selected die is B? A. Less than 0.5 B. At least 0.5, but less than 0.6 C. At least 0.6, but less than 0.7 D. At least 0.7, but less than 0.8 E. 0.8 or more
2.27 (1 point) On a multiple choice exam, each question has 5 possible answers, exactly one of which is correct. On those questions for which he is not certain of the answer, Les N. DeRisk's strategy for taking the exam is to answer at random from the 5 possible answers. Assume he correctly answers the questions for which he knows the answers. If Les knows the answers to 76% of the questions, what is the probability that he knew the answer to a question he answered correctly? (A) 90% (B) 92% (C) 94% (D) 96% (E) 98% 2.28 (19 points) You are given the following joint distribution of X and Y: y x 0 1 2 0 0.1 0.2 0 1 0 0.2 0.1 2 0.2 0 0.2 (a) What are the marginal distributions of X and Y? (b) What are the conditional distributions of X, given y = 0, 1, 2? (c) What are the conditional expected values of X, given y = 0, 1, 2? (d) What are the conditional variances of X, given y = 0, 1, 2? (e) Verify that E[X] = E[E[X | Y]]. (f) What is E[Var[X | Y]]? (g) What is Var[E[X | Y]]? (h) Verify that Var[X] = E[Var[X | Y]] + Var[E[X | Y]]. (i) What are the conditional distributions of Y, given x = 0, 1, 2? (j) What are the conditional expected values of Y, given x = 0, 1, 2? (k) What are the conditional variances of Y, given x = 0, 1, 2? (l) Verify that E[Y] = E[E[Y | X]]. (m) What is E[Var[Y | X]]? (n) What is Var[E[Y | X]]? (o) Verify that Var[Y] = E[Var[Y | X]] + Var[E[Y | X]]. (p) What is the skewness of X? (q) What is the skewness of Y? (r) What is the kurtosis of X? (s) What is the kurtosis of Y?
Use the following information for the next 4 questions: For any given insured, the number of claims is Negative Binomial with parameters β and r = 3. Let p = 1/(1+β). Over the portfolio of insureds, p is distributed uniformly from 0.1 to 0.6. 2.29 (2 points) For an insured picked at random, what is the chance that zero claims are observed? A. Less than 5% B. At least 5%, but less than 7% C. At least 7%, but less than 9% D. At least 9%, but less than 11% E. 11% or more 2.30 (2 points) For an insured picked at random, what is the chance that one claim is observed? A. Less than 5% B. At least 5%, but less than 7% C. At least 7%, but less than 9% D. At least 9%, but less than 11% E. 11% or more 2.31 (2 points) What is the unconditional mean? A. Less than 6 B. At least 6, but less than 7 C. At least 7, but less than 8 D. At least 8, but less than 9 E. 9 or more 2.32 (2 points) What is the unconditional variance? A. Less than 55 B. At least 55, but less than 60 C. At least 60, but less than 65 D. At least 65, but less than 70 E. 70 or more 2.33 (2 points) A medical test has been developed for the disease Hemoglophagia. The test gives either a positive result, indicating that the patient has Hemoglophagia, or a negative result, indicating that the patient does not have Hemoglophagia. However, the test sometimes gives an incorrect result. 1 in 500 of those who do not have Hemoglophagia nevertheless have a positive test result. 3% of those having Hemoglophagia have a negative test result. If 83% is the probability that a person with a positive test result has Hemoglophagia, determine the percent of the general population that has Hemoglophagia.
Use the following information for the next two questions: The number of children per family is Poisson with λ = 2. 2.34 (1 point) A family is picked at random. If the family has at least one child, what is the probability that it has more than one child? A. 69% B. 71% C. 73% D. 75% E. 77% 2.35 (2 points) A child is picked at random. What is the probability that this child has at least one brother or sister? A. 83% B. 85% C. 87% D. 89% E. 91%
2.36 (3 points) Prior to your going on vacation for a week, your neighbor, forgetful Frank, has agreed to water your prized potted plant. If your plant is watered it has a 95% chance of living. If your plant is not watered it has only a 40% chance of living. When you return from vacation your plant is dead! Discuss how an actuary would determine the probability that Frank watered your plant. 2.37 (2, 5/85, Q.10) (1.5 points) Let X and Y be continuous random variables with joint density function f(x, y) = 1.5x for 0 < y < 2x < 1. What is the conditional density function of Y given X = x ? A. 1/(2x) for 0 < x < 1/2. B. 1/(2x) for 0 < y < 2x < 1. C. 4/x for 0 < x < 1. D. 4/x for 0 < y < 2x < 1. E. 16x/(1 - 2y2 ) for 0 < y < 2x < 1. 2.38 (2, 5/85, Q.27) (1.5 points) Let X and Y have the joint density function f(x, 7) = x + y for 0 < x < 1 and 0 < y < 1. What is the conditional mean E(Y I X = 1/3)? A. 3/8 B. 5/12 C. 1/2 D. 7/12 E. 3/5 2.39 (4, 5/86, Q.32) (1 point) Let X,Y and Z be random variables. Which of the following statements are true? 1. The variance of X is the second moment about the origin of X. 2. If Z is the product of X and Y, then the expected value of Z is the product of the expected values of X and Y. 3. The expected value of X is equal to the expectation over all possible values of Y, of the conditional expectation of X given Y. A. 2 B. 3 C. 1, 2 D. 1, 3 E. 2, 3
2.40 (2, 5/88, Q.18) (1.5 points) Let X and Y be continuous random variables with joint density function f(x, y) = 6xy + 3x2 for 0 < x < y < 1. What is E(X I Y = y)? A. 3y4
B. 2y5
C.
6xy + 3x2 4y3
D.
6x2y + 3x3 4y3
E.
11y 16
2.41 (2, 5/88, Q.37) (1.5 points) Let X and Y be continuous random variables with joint density function f(x, y) = e-y/2 for -y < x < y and y > 0. What is P[X < 1 I Y = 3]? A. e-3/2
B. 2e-3
C. (e-1 - e-3)/2
D. 1/6
E. 2/3
2.42 (4, 5/88, Q.33) (1 point) On this multiple choice exam, each question has 5 possible answers, exactly one of which is correct. On those questions for which he is not certain of the answer, Stu Dent's strategy for taking the exam is to answer at random from the 5 possible answers. Assume he correctly answers the questions for which he knows the answers. If Stu Dent knows the answers to 75% of the questions, what is the probability that he knew the answer to a question he answered correctly? A. Less than 0.850 B. At least 0.850, but less than 0.875 C. At least 0.875, but less than 0.900 D. At least 0.900, but less than 0.925 E. 0.925 or more 2.43 (4, 5/89, Q.28) (2 points) A die is selected at random from an urn that contains two six-sided dice with the following characteristics: Number of Faces Number on Face Die #1 Die #2 1 1 1 2 3 1 3 1 1 4 1 3 The first five rolls of the selected die yielded the following in sequential order: 2, 3, 4, 1, and 4. What is the probability that the selected die is the second one? A. Less than 0.5 B. At least 0.5, but less than 0.6 C. At least 0.6, but less than 0.7 D. At least 0.7, but less than 0.8 E. 0.8 or more
2.44 (4, 5/89, Q.31) (3 points) Let Y be a random variable which represents the number of claims that occur in a given year. The probability density function for Y is a function of the parameter θ. The parameter θ is distributed uniformly over the interval (0, 1). The probability of no claims occurring during a given year is greater than 0.350. That is, P(Y=0) > 0.350 under the assumption that the prior distribution of θ is uniform on (0, 1). Which of the following represent possible conditional probability distributions for Y given θ? 1. P(Y = y | θ) = e−θ θy / y ! ⎛n + y -1⎞ 2. P(Y = y | θ) = ⎜ ⎟ θn (1-θ)y for n = 2 ⎝ y ⎠ ⎛ n⎞ 3. P(Y = y | θ) = ⎜ ⎟ θy (1-θ)n-y for n = 2 ⎝y ⎠ A. 1
B. 2
C. 3
D. 1, 2
E. 1, 3
2.45 (2, 5/90, Q.14) (1.7 points) Let X and Y be continuous random variables with joint density function f(x, y) = (12/25)(x + y2 ) for 1 < x < y < 2. What is the marginal density function of Y where nonzero? A. (6/25)(2y3 - y2 - 1) for 1 < y < 2.
B. (6/25)(3 + 2y2 ) for 1 < y < 2.
C. (6/25)y2 (1 + 2y) for 1 < y < 2.
D. 3(x + y2 )/(8 + 6x - 3x2 - x3 ) for 1 < x < y < 2.
E. 3(x + y2 )/(7 + 3x) for 1 < x < y < 2. 2.46 (2, 5/90, Q.32) (1.7 points) Let X and Y be discrete random variables with joint probability function f(x, y) = (x2 + y2 )/56, for x = 1, 2, 3, and y = 1,..., x. What is P[Y = 3 I Y ≥ 2]? A. 9/28 B. 1/3 C. 6/13 D. 41/54 E. 6/7
2.47 (2, 5/90, Q.35) (1.7 points) Let X and Y be continuous random variables with joint probability function f(x, y) and marginal density functions fX and fY, respectively, that are nonzero only on the interval (0 ,1). Which of the following statements is always true? 1
A. E[X2 Y 3 ] =
1
∫0 x2dx ∫0 y3dy 1
C. E[X2 Y 3 ] =
1
B. E[X2 ] =
1
∫0 x2 f(x,y)dx ∫0 y3 f(x,y)dy
∫0 x2 f(x,y)dx 1
D. E[X2 ] =
∫0 x2 fX(x)dx
1
E. E[Y3 ] =
∫0 y 3 fX(x)dx
2.48 (4, 5/91, Q.30) (2 points) The expected value of a random variable X, is written as E[X]. Which of the following are true? 1. E[X] = EY[E[X|Y]] 2. Var[X] = EY[Var[X|Y]] + VarY[E[X|Y]] 3. E[g(Y)] = EY[EX[g(Y)| X]] A. 1, 2
B. 1, 3
C. 2, 3
D. 1, 2, 3
E. None of A, B, C, or D
2.49 (2, 5/92, Q.11) (1.7 points) Let X and Y be continuous random variables with joint density function f(x, y) = 3x/4 for 0 < x < 2 and 0 < y < 2 - x. What is P[X > 1]? A. 1/8 B. 1/4 C. 3/8 D. 1/2 E. 3/4 2.50 (2, 5/92, Q.17) (1.7 points) Let X and Y be continuous random variables with joint density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. What is the marginal density function for X, where nonzero? A. y + 1/2
B. 2x
C. x
D. (x + x2 )/2
E. x + 1/2
2.51 (2, 5/92, Q.44) (1.7 points) Let X and Y be discrete random variables with joint probability function f(x, y) = (x + 1)(y + 2)/54 for x = 0, 1, 2 and y = 0, 1, 2. What is E(Y I X = 1)? A. 11/27
B. 1
C. 11/9
D. (y + 2)/9
E. (y2 + 2y)/9
2.52 (4B, 5/92, Q.4) (2 points) You have selected a die at random from the two dice described below. Die A Die B 2 sides labeled 1 4 sides labeled 1 2 sides labeled 2 1 side labeled 2 2 sides labeled 3 1 side labeled 3 The following outcomes from 5 tosses of the selected die are observed: 1, 1, 2, 3, 1. Determine the probability that you selected Die A. A. Less than 0.20 B. At least 0.20 but less than 0.30 C. At least 0.30 but less than 0.40 D. At least 0.40 but less than 0.50 E. At least 0.50 2.53 (4B, 5/94, Q.5) (2 points) Two honest, six-sided dice are rolled, and the results D1 and D2 are observed. Let S = D1 + D2 . Which of the following are true concerning the conditional distribution of D1 given that S<6? 1. The mean is less than the median. 2. The mode is less than the mean. 3. The probability that D1 = 2 is 1/3. A. 2
B. 3
C. 1, 2
D. 2, 3
E. None of A, B, C, or D
2.54 (2, 2/96, Q.34) (1.7 points) Let X and Y be continuous random variables with joint density function f(x, y) = 2 for 0 < x < y < 1. Determine the conditional density function of Y given X = x, where 0 < x < 1. A. 1/(1 - x) for x < y < 1 B. 2(1 - x) for x < y < 1 C. 2 for x < y < 1 D. 1/y for x < y <1 E. 1/(1 - y) for x < y < 1 2.55 (4B, 5/96, Q.8) (1 point) You are given the following:
• •
X1 and X2 are two independent observations of a discrete random variable X.
X is equally likely to be 1, 2, 3, 4, 5, or 6. Determine the conditional mean of X1 given that X1 + X2 is less than or equal to 4. A. 5/3 B. 2 C. 5/2 D. 10/3 E. 7/2
2.56 (4B, 11/98, Q.9) (2 points) You are given the following: • A portfolio consists of 75 liability risks and 25 property risks. • The risks have identical claim count distributions. • Loss sizes for liability risks follow a Pareto distribution, with parameters θ = 300 and α = 4. • Loss sizes for property risks follow a Pareto distribution, with parameters θ = 1,000 and α = 3. A risk is randomly selected from the portfolio and a claim of size k is observed. Determine the limit of the posterior probability that this risk is a liability risk as k goes to zero. A. 3/4 B. 40/49 C. 10/11 D. 40/43 E. 1 2.57 (4B, 11/98, Q.16) (2 points) You are given the following: • A portfolio of automobile risks consists of 900 youthful drivers and 800 nonyouthful drivers. • A youthful driver is twice as likely as a nonyouthful driver to incur at least one claim during the next year. • The expected number of youthful drivers (n) who will be claim-free during the next year is equal to the expected number of nonyouthful drivers who will be claim-free during the next year. Determine n. A. Less than 150 B. At least 150, but less than 350 C. At least 350, but less than 550 D. At least 550, but less than 750 E. At least 750
Use the following information for the next two questions: • Four shooters are available to shoot at a target some distance away that has the following design:
• • • •
R
S
T
U
V
W
X
Y
Z
Shooter A hits Areas R, S, U, and V, each with probability 1/4. Shooter B hits Areas S, T, V, and W, each with probability 1/4. Shooter C hits Areas U, V, X, and Y, each with probability 1/4. Shooter D hits Areas V, W, Y, and Z, each with probability 1/4.
2.58 (4B, 5/99, Q.9) (2 points) One shooter is randomly selected and fires two shots. Determine the probability that the shooter can be identified with certainty. A. 1/4 B. 7/16 C. 1/2 D. 9/16 E. 3/4 2.59 (4B, 11/99, Q.11) (2 points) Two distinct shooters are randomly selected, and each fires one shot. Determine the probability that both shots land in the same Area. A. 1/16 B. 5/48 C. 3/16 D. 1/4 E. 5/12 2.60 (Course 1 Sample Exam, Q.13) (1.9 points) Let X and Y be discrete random variables with joint probability function p(x, y) = (2x + y)/12, for (x, y) = (0, 1), (0, 2), (1, 2) and (1, 3). Determine the marginal probability function of X. A. p(x) is 1/6 for x = 0 and 5/6 for x = 1. B. p(x) is 1/4 for x = 0 and 3/4 for x = 1. C. p(x) is 1/3 for x = 0 and 2/3 for x = 1. D. p(x) is 2/9 for x = 1, 3/9 x = 2, and 4/9 for x = 3. E. p(x) is y/12 for x = 0 and (2 + y)/12 for x = 1. 2.61 (IOA 101, 4/00, Q.7) (2.25 points) Suppose that X and Y are continuous random variables. Prove that ∞
E(X) =
∫-∞E(X | Y = y) fY(y) dy .
2.62 (1, 5/00, Q.22) (1.9 points) An actuary determines that the annual numbers of tornadoes in counties P and Q are jointly distributed as follows: Annual number of tornadoes in county Q 0 1 2 3 Annual number 0 0.12 0.06 0.05 0.02 of tornadoes 1 0.13 0.15 0.12 0.03 in county P 2 0.05 0.15 0.10 0.02 Calculate the conditional variance of the annual number of tornadoes in county Q, given that there are no tornadoes in county P. (A) 0.51 (B) 0.84 (C) 0.88 (D) 0.99 (E) 1.76 2.63 (2 points) In the previous question, calculate the conditional variance of the annual number of tornadoes in county Q, given that there is at least one tornado in county P. A. Less than 0.75 B. At least 0.75, but less than 0.80 C. At least 0.80, but less than 0.85 D. At least 0.85, but less than 0.90 E. At least 0.90 2.64 (1, 11/00, Q.4) (1.9 points) A diagnostic test for the presence of a disease has two possible outcomes: 1 for disease present and 0 for disease not present. Let X denote the disease state of a patient, and let Y denote the outcome of the diagnostic test. The joint probability function of X and Y is given by: P(X = 0, Y = 0) = 0.800. P(X = 1, Y = 0) = 0.050. P(X = 0, Y = 1) = 0.025. P(X = 1, Y = 1) = 0.125. Calculate Var(Y | X = 1). (A) 0.13 (B) 0.15 (C) 0.20 (D) 0.51 (E) 0.71 2.65 (IOA 101, 4/01, Q.2) (1.5 points) A certain medical test either gives a positive or negative result. The positive test result is intended to indicate that a person has a particular (rare) disease, while a negative test result is intended to indicate that they do not have the disease. Suppose, however, that the test sometimes gives an incorrect result: 1 in 100 of those who do not have the disease have positive test results, and 2 in 100 of those having the disease have negative test results. If 1 person in 1000 has the disease, calculate the probability that a person with a positive test result has the disease. 2.66 (4, 11/04, Q.13 & 2009 Sample Q.142) (2.5 points) You are given: (i) The number of claims observed in a 1-year period has a Poisson distribution with mean θ. (ii) The prior density is: π(θ) = e−θ/(1 - e-k), 0 < θ < k. (iii) The unconditional probability of observing zero claims in 1 year is 0.575. Determine k. (A) 1.5 (B) 1.7 (C) 1.9 (D) 2.1 (E) 2.3
Solutions to Problems:

2.1. D. The probability of picking a colorblind person out of this population is: (3%)(20%) + (0.2%)(80%) = 0.76%. The chance of a person being both colorblind and male is: (3%)(20%) = 0.6%. Thus the (conditional) probability that the colorblind person is a man is: 0.6%/0.76% = 78.9%.
Alternately, assume we have 1000 people: 200 men and 800 women. The expected number of colorblind people is: (3%)(200) + (0.2%)(800) = 6 + 1.6 = 7.6. The proportion of the colorblind people in the audience who are male is: 6/7.6 = 78.9%.

2.2. C. Taking a weighted average, the a priori chance of a black ball is 5.6%:

Type    A Priori Probability    % Black Balls    Product
I              0.4                  0.03          0.0120
II             0.3                  0.05          0.0150
III            0.2                  0.08          0.0160
IV             0.1                  0.13          0.0130
SUM            1                                  0.0560
2.3. A. P[Urn = I | Ball = Black] = P[Urn = I and Ball =Black] / P[Ball = Black] = (.4)(.03) / .056 = .012/.056 = 21.4%. 2.4. D. P[Urn = II | Ball = Black] = P[Urn = II and Ball =Black] / P[Ball = Black] = (.3)(.05) / .056 = .015/.056 = 26.8%. 2.5. E. P[Urn = III | Ball = Black] = P[Urn = III and Ball =Black] / P[Ball = Black] = (.2)(.08) / .056 = .016/.056 = 28.6%. 2.6. B. P[Urn = IV | Ball = Black] = P[Urn = IV and Ball =Black] / P[Ball = Black] = (.1)(.13) / .056 = .013/.056 = 23.2%. Comment: The conditional probabilities of the four types of urns add to unity.
2.7. C. Using the solutions to the previous problems, one takes a weighted average using the posterior probabilities of each type of urn: (21.4%)(.03) + (26.8%)(.05) + (28.6%)(.08) + (23.2%)(.13) = 0.073.
Comment: This whole set of problems can be usefully organized into a spreadsheet:
Type   A Priori      % Black   Probability Weights    Posterior     Col. C x
       Probability   Balls     = Col. B x Col. C      Probability   Col. E
I         0.4         0.03          0.0120               0.214       0.0064
II        0.3         0.05          0.0150               0.268       0.0134
III       0.2         0.08          0.0160               0.286       0.0229
IV        0.1         0.13          0.0130               0.232       0.0302
SUM                                 0.0560               1.000       0.0729
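To make the arithmetic in the spreadsheet concrete, here is a small illustrative sketch (not part of the original solution; the variable names are hypothetical) that reproduces the posterior probabilities and the predictive chance of a black ball:

```python
# Posterior probabilities for the four urn types, and the predictive
# probability that the next ball drawn is black (problems 2.2-2.7).
prior = [0.4, 0.3, 0.2, 0.1]          # a priori probability of each type of urn
p_black = [0.03, 0.05, 0.08, 0.13]    # chance of a black ball for each type

# Probability weights: prior times chance of the observation (a black ball).
weights = [pr * pb for pr, pb in zip(prior, p_black)]
total = sum(weights)                  # = 0.056, the a priori chance of a black ball

posterior = [w / total for w in weights]
predictive = sum(po * pb for po, pb in zip(posterior, p_black))

print(posterior)    # approximately [0.214, 0.268, 0.286, 0.232]
print(predictive)   # approximately 0.073
```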
This is a simple example of Bayesian Analysis, which is covered subsequently.
2.8. E. Given λ, the chance that we observe zero claims is: e^-λ λ^0 / 0! = e^-λ.
P(n=0) = ∫_0^1.5 P(n=0 | λ) f(λ) dλ = ∫_0^1.5 e^-λ (1/1.5) dλ = -(1/1.5)e^-λ, evaluated from 0 to 1.5, = (1 - e^-1.5)/1.5 = 0.518.
2.9. A. Given λ, the chance that we observe one claim is: e^-λ λ^1 / 1! = λe^-λ.
P(n=1) = ∫_0^1.5 P(n=1 | λ) f(λ) dλ = ∫_0^1.5 λe^-λ (1/1.5) dλ = -(1/1.5){λe^-λ + e^-λ}, evaluated from 0 to 1.5, = (1 - 2.5e^-1.5)/1.5 = 0.295.
Comment: For larger numbers of observed claims the result is an incomplete Gamma Function. In this case P(n=1) = (1/1.5)Γ(2; 1.5) = (.4422)/1.5 = .295. For example, P(n=2) = (1/1.5)Γ(3; 1.5) = (.1912)/1.5 = .127. One can compute the other probabilities in a similar manner:
# claims:   0        1        2        3        4        5        6        7
Prob:       0.5179   0.2948   0.1274   0.0438   0.0124   0.0030   0.0006   0.0001
2.10. E. f(x) = ∫_{y=1}^{x} f(x,y) dy = (1/25) ∫_{y=1}^{x} (6 + 12x + 18y) dy = (1/25)(6y + 12xy + 9y^2), evaluated from y=1 to y=x,
= (6x + 12x^2 + 9x^2 - 6 - 12x - 9)/25 = (21x^2 - 6x - 15)/25.
Comment: This unconditional density function is referred to as the marginal or full marginal density of x.
2.11. D. f(y) = ∫_{x=y}^{2} f(x,y) dx = (1/25) ∫_{x=y}^{2} (6 + 12x + 18y) dx = (1/25)(6x + 6x^2 + 18yx), evaluated from x=y to x=2,
= (12 + 24 + 36y - 6y - 6y^2 - 18y^2)/25 = (36 + 30y - 24y^2)/25.
2.12. A. f(y | x) = f(x,y) / f(x) = (6 + 12x + 18y) / (21x^2 - 6x - 15).
f(y | X=1.5) = (6 + 12(1.5) + 18y) / (21(1.5)^2 - 6(1.5) - 15) = (24 + 18y)/23.25 = (4 + 3y)(24/93).
2.13. B. f(x | y) = f(x,y) / f(y) = (6 + 12x + 18y) / (36 + 30y - 24y^2).
f(x | Y = 1.5) = (6 + 12x + 18(1.5)) / (36 + 30(1.5) - 24(1.5)^2) = (12x + 33)/27 = (4x + 11)/9.
Comment: Verify that the integral of this density function from x = 1.5 to 2 is in fact unity.
2.14. C. E[Y | X = x] = ∫_{y=1}^{x} y f(y | x) dy = {1/(21x^2 - 6x - 15)} ∫_{y=1}^{x} (6y + 12xy + 18y^2) dy
= {1/(21x^2 - 6x - 15)}(3y^2 + 6xy^2 + 6y^3), evaluated from y=1 to y=x, = (12x^3 + 3x^2 - 6x - 9) / (21x^2 - 6x - 15) = (4x^2 + 5x + 3) / (7x + 5).
Therefore, E[Y | X = 1.6] = {4(1.6)^2 + 5(1.6) + 3} / {7(1.6) + 5} = 1.311.
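As a rough numerical check of the closed form above (a sketch only, not part of the original solution; the grid-based integration is an approximation):

```python
# Numerical check of E[Y | X = 1.6] for f(x,y) = (6 + 12x + 18y)/25 on 1 < y < x < 2.
x = 1.6
n = 200000
dy = (x - 1.0) / n
num = 0.0   # integral of y * f(x,y) over y in (1, x)
den = 0.0   # integral of f(x,y) over y in (1, x), i.e. the marginal f(x)
for i in range(n):
    y = 1.0 + (i + 0.5) * dy          # midpoint rule
    fxy = (6 + 12 * x + 18 * y) / 25
    num += y * fxy * dy
    den += fxy * dy
print(num / den)   # approximately 1.311, matching (4x^2 + 5x + 3)/(7x + 5)
```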
2.15. D. E[X | Y = y] = ∫_{x=y}^{2} x f(x | y) dx = {1/(36 + 30y - 24y^2)} ∫_{x=y}^{2} (6x + 12x^2 + 18yx) dx
= {1/(36 + 30y - 24y^2)}(3x^2 + 4x^3 + 9yx^2), evaluated from x=y to x=2, = (44 + 36y - 3y^2 - 13y^3) / (36 + 30y - 24y^2) = (13y^2 + 29y + 22) / (24y + 18).
Therefore, E[X | Y = 1.2] = {13(1.2)^2 + 29(1.2) + 22} / {24(1.2) + 18} = 1.614.
Comment: E[X | Y = y] is close to being linear in y.
2.16. E. E[X] = ∫_{y=1}^{2} ∫_{x=y}^{2} x f(x,y) dx dy = (1/25) ∫_{y=1}^{2} ∫_{x=y}^{2} (6x + 12x^2 + 18xy) dx dy
= (1/25) ∫_{y=1}^{2} (3x^2 + 4x^3 + 9x^2 y), evaluated from x=y to x=2, dy = (1/25) ∫_{y=1}^{2} (44 + 36y - 3y^2 - 13y^3) dy
= (1/25)(44y + 18y^2 - y^3 - (13/4)y^4), evaluated from y=1 to y=2, = 169/100 = 1.69.
Comment: One can get the same answer by taking the integral over y of: E[X | Y=y] f(y).
2.17. D. E[Y] = ∫_{y=1}^{2} ∫_{x=y}^{2} y f(x,y) dx dy = (1/25) ∫_{y=1}^{2} ∫_{x=y}^{2} (6y + 12xy + 18y^2) dx dy
= (1/25) ∫_{y=1}^{2} (6xy + 6x^2 y + 18xy^2), evaluated from x=y to x=2, dy = (1/25) ∫_{y=1}^{2} (36y + 30y^2 - 24y^3) dy
= (1/25)(18y^2 + 10y^3 - 6y^4), evaluated from y=1 to y=2, = 34/25 = 1.36.
Comment: One can get the same answer by taking the integral over x of: E[Y | X = x] f(x).
2.18. E. There are the following 6 equally likely possibilities such that Z ≥ 10: (4,6), (5,5), (5,6), (6,4), (6,5), (6,6). Of these, 3 have X = 6, so that Prob[X = 6 | Z ≥ 10] = Prob[X = 6 and Z ≥ 10] / Prob[Z ≥ 10] = (3/36) / (6/36) = 1/2.
2.19. D. There are the following 6 equally likely possibilities such that X + Y ≥ 10: (4,6), (5,5), (5,6), (6,4), (6,5), (6,6). Of these one has X = 4, two have X = 5, and three have X = 6. Therefore E[X | Z ≥ 10] = {(1)(4) + (2)(5) + (3)(6)} / 6 = 5.33.
For those who like diagrams (an x marks a pair with X + Y ≥ 10):
        Y=1   Y=2   Y=3   Y=4   Y=5   Y=6    Possibilities   Conditional Density Function of X given that X + Y ≥ 10
X = 1    -     -     -     -     -     -          0
X = 2    -     -     -     -     -     -          0
X = 3    -     -     -     -     -     -          0
X = 4    -     -     -     -     -     x          1                1/6
X = 5    -     -     -     -     x     x          2                2/6
X = 6    -     -     -     x     x     x          3                3/6
E[X | Z ≥ 10] = Σ i P[X=i | Z ≥ 10] = (1)(0) + (2)(0) + (3)(0) + (4)(1/6) + (5)(2/6) + (6)(3/6) = 5.33.
2.20. A. Since y > 0 and x + y = 1/2, x ≤ 1/2. If x + y = 1/2, then y = 0.5 - x. The probability that X = x and Y = 0.5 - x is: f(x)h(0.5 - x) = (3e^-3x)(3e^-3(0.5-x)) = 9e^-1.5. Now this is proportional to the conditional distribution of x since: P[X=x | Z=1/2] = P[X=x and Z=1/2] / P[Z=1/2]. Thus the conditional distribution of x is proportional to a constant, 9e^-1.5, thus it is uniform on (0, 0.5]. P[X=x | Z=1/2] = 2 for 0 < x ≤ 1/2.
2.21. X + Y is Binomial with m = 3 + 5 = 8 and q = 0.2.
Prob[X = x | X + Y = 6] = Prob[X = x and X + Y = 6]/Prob[X + Y = 6] = Prob[X = x] Prob[Y = 6 - x]/Prob[X + Y = 6]
= (3!/{(3-x)!x!} 0.2^x 0.8^(3-x))(5!/{(5-(6-x))!(6-x)!} 0.2^(6-x) 0.8^(5-(6-x))) / (8!/{(8-6)!6!} 0.2^6 0.8^(8-6))
= (3!/{(3-x)!x!})(5!/{(x-1)!(6-x)!}) / (8!/{2!6!}) = C(3, x) C(5, 6-x) / C(8, 6).
This is a Hypergeometric Distribution.
Comment: Beyond what you are likely to be asked on your exam! The Hypergeometric Distribution is discussed for example in Introduction to Probability Models by Ross.
2.22. X + Y is Poisson with mean 3 + 7 = 10. Prob[X = x | X + Y = 9] = Prob[X = x and X + Y = 9] / Prob[X + Y = 9] = Prob[X = x] Prob[Y = 9 - x] / Prob[X + Y = 9]
= {e^-3 3^x / x!} {e^-7 7^(9-x) / (9-x)!} / {e^-10 10^9 / 9!} = {9!/(x! (9-x)!)} 3^x 7^(9-x) / 10^9 = C(9, x) 0.3^x 0.7^(9-x).
This is Binomial with m = 9 and q = 0.3.
Comment: In general, Prob[X = x | X + Y = z] is Binomial with m = z and q = λ1/(λ1 + λ2).
2.23. B. Assume a priori that the unknown baby is equally likely to be a boy or a girl. After the baby is added, let there be a total of N babies. Then there are 4 boys, N-5 girls, and one unknown baby. If the unknown baby is a boy, then the chance of having picked a boy (the chance of the observation) is 5/N. On the other hand, if the unknown baby is a girl, then the chance of having picked a boy (the chance of the observation) is 4/N. Then the posterior chances of being a boy or a girl are proportional to 5/N and 4/N respectively. Thus the posterior probabilities are: 5/9 and 4/9. The probability that the baby added was a girl is: 4/9 = 0.444.
Type of        A Priori Chance   Chance of the Observation   Prob. Weight =            Posterior Chance
Unknown Baby   of This Type      Given Type                  Product of Columns B&C    of This Type
Boy                0.500              5/N                        2.5/N                     5/9
Girl               0.500              4/N                        2/N                       4/9
Overall                                                          4.5/N                     1.000
Comment: See “Itʼs a Puzzlement” by John Robertson in the 5/97 Actuarial Review.
2.24. E. The chance that a driver is accident-free is: (40%)(90%) + (25%)(92%) + (20%)(94%) + (15%)(96%) = 92.2%. The chance that a driver is both accident-free and from Boston is: (40%)(90%) = 36%. Thus the chance this driver is from Boston is: 36% / 92.2% = 39.0%.
Comment: Some may find it helpful to assume for example a total of 100,000 drivers.
2.25. D. The chance that a driver has had an accident is: (40%)(10%) + (25%)(8%) + (20%)(6%) + (15%)(4%) = 7.8%. The chance that a driver both has had an accident and is from Pittsfield is (15%)(4%) = .6%. Thus the chance this driver is from Pittsfield is: .6% / 7.8% = 0.077.
Comment: This is the type of reasoning required to do an exam question involving Bayesian Analysis. Note that the chances for each of the other cities are: {(40%)(10%), (25%)(8%), (20%)(6%)} / 7.8%. You should confirm that the conditional probabilities for the four cities sum to 100%.
2.26. B. If one has Die A, then the chance of the observation is: (1/6)(1/6)(3/6)(1/6)(1/6) = 3/7776. This is also the chance of the observation for Die C or Die D. If one has Die B, then the chance of the observation is: (3/6)(1/6)(1/6)(3/6)(1/6) = 9/7776.
Die    A Priori      Chance of      Probability Weights   Posterior
       Probability   Observation    = Col. B x Col. C     Probability
A        0.25         0.000386          0.000096            0.1667
B        0.25         0.001157          0.000289            0.5000
C        0.25         0.000386          0.000096            0.1667
D        0.25         0.000386          0.000096            0.1667
SUM                                      0.000579            1.000
2.27. C. If Les knows the answer, then the chance of observing a correct answer is 100%. If Les doesnʼt know the answer to a question then the chance of observing a correct answer is 20%.
Type of Question    A Priori Chance of      Chance of the   Prob. Weight =            Posterior Chance of
                    This Type of Question   Observation     Product of Columns B&C    This Type of Question
Les knows                 0.760               1.0000            0.7600                     94.06%
Les Doesn't know          0.240               0.2000            0.0480                      5.94%
Overall                                                          0.808                      1.000
2.28. (a) Marginal Distribution of X is the result of adding across columns: 30% @0, 30% @1, 40% @2. Marginal Distribution of Y is the result of adding down rows: 30% @0, 40% @1, 30% @2. (b) Given y = 0, conditional distribution of X: 1/3 @0, 0 @1, 2/3 @2. Given y = 1, conditional distribution of X: 1/2 @0, 1/2 @1, 0 @2. Given y = 2, conditional distribution of X: 0 @0, 1/3 @1, 2/3 @2. (c) Given y = 0, conditional expected value of X: (1/3)(0) + (0)(1) + (2/3)(2) = 4/3. Given y = 1, conditional expected value of X: (1/2)(0) + (1/2)(1) + (0)(2) = 1/2. Given y = 2, conditional expected value of X: (0)(0) + (1/3)(1) + (2/3)(2) = 5/3. (d) Given y = 0, conditional second moment of X: (1/3)(02 ) + (0)(12 ) + (2/3)(22 ) = 8/3. Given y = 1, conditional second moment of X: (1/2)(02 ) + (1/2)(12 ) + (0)(22 ) = 1/2. Given y = 2, conditional second moment of X: (0)(02 ) + (1/3)(12 ) + (2/3)(22 ) = 3. Given y = 0, conditional variance of X: 8/3 - (4/3)2 = 8/9. Given y = 1, conditional variance of X: 1/2 - (1/2)2 = 1/4. Given y = 2, conditional variance of X: 3 - (5/3)2 = 2/9. (e) Using the marginal distribution, E[X] = (.3)(0) + (.3)(1) + (.4)(2) = 1.1. E[E[X | Y]] = Prob(y = 0)E[X | y = 0] + Prob(y = 1)E[X | y = 1] + Prob(y = 2)E[X | y = 2] = (.3)(4/3) + (.4)(1/2) + (.3)(5/3) = 1.1. (f) E[Var[X | Y]] = Prob(y = 0)Var[X | y =0] + Prob(y = 1)Var[X | y =1] + Prob(y = 2)Var[X | y =2] = (.3)(8/9) + (.4)(1/4) + (.3)(2/9) = .4333. (g) Var[E[X | Y]] = {(.3)(4/3)2 +(.4)(1/2)2 + (.3)(5/3)2 } - 1.12 = .2567. (h) Var[X] = (.3)(02 ) + (.3)(12 ) + (.4)(22 )} - 1.12 = .690 = .4333 + .2567 = E[Var[X | Y]] + Var[E[X | Y]]. (i) Given x = 0, conditional distribution of Y: 1/3 @0, 2/3 @1, 0 @2. Given x = 1, conditional distribution of Y: 0 @0, 2/3 @1, 1/3 @2. Given x = 2, conditional distribution of Y: 1/2 @0, 0 @1, 1/2 @2. (j) Given x = 0, conditional expected value of Y: (1/3)(0) + (2/3)(1) + (0)(2) = 2/3. Given x = 1, conditional expected value of Y: (0)(0) + (2/3)(1) + (1/3)(2) = 4/3. Given x = 2, conditional expected value of Y: (1/2)(0) + (0)(1) + (1/2)(2) = 1. (k) Given x = 0, conditional second moment of Y: (1/3)(02 ) + (2/3)(12 ) + (0)(22 ) = 2/3. Given x = 1, conditional second moment of Y: (0)(02 ) + (2/3)(12 ) + (1/3)(22 ) = 2. Given x = 2, conditional second moment of Y: (1/2)(02 ) + (0)(12 ) + (1/2)(22 ) = 2. Given x = 0, conditional variance of Y: 2/3 - (2/3)2 = 2/9. Given x = 1, conditional variance of Y: 2 - (4/3)2 = 2/9. Given x = 2, conditional variance of Y: 2 - (1)2 = 1.
(l) Using the marginal distribution, E[Y] = (.3)(0) + (.4)(1) + (.3)(2) = 1. E[E[Y | X]] = Prob(x = 0)E[Y | x = 0] + Prob(x = 1)E[Y | x = 1] + Prob(x = 2)E[Y | x = 2] = (.3)(2/3) + (.3)(4/3) + (.4)(1) = 1. (m) E[Var[Y | X]] = Prob(x = 0)Var[Y | x =0] + Prob(x = 1)Var[Y | x =1] + Prob(x = 2)Var[Y | x =2] = (.3)(2/9) + (.3)(2/9) + (.4)(1) = .5333. (n) Var[E[X | Y]] = {(.3)(2/3)2 +(.3)(4/3)2 + (.4)(1)2 } - 12 = .0667. (o) Var[Y] = (.3)(02 ) + (.4)(12 ) + (.3)(22 )} - 12 = .6 = .5333 + .0667 = E[Var[X | Y]] + Var[E[X | Y]]. (p) Marginal Distribution of X is the result of adding across columns: 30% @0, 30% @1, 40% @2. E[X] = (.3)(0) + (.3)(1) + (.4)(2) = 1.1. Var[X] = {(.3)(02 ) + (.3)(12 ) + (.4)(22 )} - 1.12 = .690. E[(X - X )3 ] = (.3)(0 - 1.1)3 + (.3)(1 - 1.1)3 + (.4)(2 - 1.1)3 = -0.108. Skew[X] = -0.108/0.6901.5 = -0.188. (q) Marginal Distribution of Y is the result of adding down rows: 30% @0, 40% @1, 30% @2. Since the distribution of Y is symmetric around 1, its skewness is zero. (r) E[(X - X )4 ] = (.3)(0 - 1.1)4 + (.3)(1 - 1.1)4 + (.4)(2 - 1.1)4 = 0.7017. Kurtosis[X] = 0.7017/0.6902 = 1.474. (s) E[Y] = (.3)(0) + (.4)(1) + (.3)(2) = 1. Var[Y] = (.3)(02 ) + (.4)(12 ) + (.3)(22 )} - 12 = 0.6. E[(Y - Y )4 ] = (.3)(0 - 1)4 + (.4)(1 - 1)4 + (.3)(2 - 1)4 = 0.6. Kurtosis[Y] = 0.6/0.62 = 1.667. 2.29. B. p = 1/(1+β). Therefore, 1 - p = β/(1+β). For the Negative Binomial Distribution with r = 3, the conditional probability given p is: P(x | p) = {βx / (1+β)x+r } (x+r-1)! /{x! (r-1)!} = p3 (1-p)x (x+2)! / {(2)(x!)}. Thus the conditional probability of zero claims is: P(x = 0 | p) = p3 . .6 P(x = 0) =
∫_{p=0.1}^{0.6} P(x = 0 | p) f(p) dp = ∫_{0.1}^{0.6} p^3 (1/0.5) dp = (1/2)p^4, evaluated from p=0.1 to p=0.6, = 0.06475.
2.30. D. P(x = 1 | p) = 3p^3 (1 - p) = 3(p^3 - p^4).
P(x = 1) = ∫_{p=0.1}^{0.6} P(x = 1 | p) f(p) dp = 3 ∫_{0.1}^{0.6} (p^3 - p^4)(2) dp = (6)(p^4/4 - p^5/5), evaluated from p=0.1 to p=0.6, = 0.10095.
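For those who like to sanity-check these mixture integrals numerically, here is a brief sketch (an illustration only under the stated uniform prior on [0.1, 0.6]; it is not part of the original solution):

```python
# Numerical check of P(x=0) and P(x=1) when p is uniform on [0.1, 0.6]
# and, given p, X is Negative Binomial with r = 3 (problems 2.29-2.30).
n = 100000
dp = (0.6 - 0.1) / n
p0 = p1 = 0.0
for i in range(n):
    p = 0.1 + (i + 0.5) * dp                 # midpoint of each sub-interval
    f = 1 / 0.5                              # uniform density on an interval of length 0.5
    p0 += (p ** 3) * f * dp                  # P(x = 0 | p) = p^3
    p1 += 3 * (p ** 3) * (1 - p) * f * dp    # P(x = 1 | p) = 3 p^3 (1 - p)
print(round(p0, 5), round(p1, 5))            # approximately 0.06475 and 0.10095
```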
2.31. C. For the Negative Binomial Distribution with r = 3, the mean is: 3β = 3(1 - p)/p = 3p^-1 - 3. The unconditional mean can be obtained by integrating the conditional means versus the distribution of p:
E[X] = ∫_{p=0.1}^{0.6} E[X | p] f(p) dp = 3 ∫_{0.1}^{0.6} (p^-1 - 1)(2) dp = (6)(ln(p) - p), evaluated from p=0.1 to p=0.6, = 7.751.
2.32. E. For the Negative Binomial Distribution with r = 3, the mean is 3p^-1 - 3, while the variance is 3β(1 + β) = 3{(1 - p)/p}/p = 3p^-2 - 3p^-1. Therefore the (conditional) second moment is: 3p^-2 - 3p^-1 + (3p^-1 - 3)^2 = 12p^-2 - 21p^-1 + 9. The unconditional second moment can be obtained by integrating the conditional second moments versus the distribution of p:
E[X^2] = ∫_{p=0.1}^{0.6} E[X^2 | p] f(p) dp = ∫_{0.1}^{0.6} (12p^-2 - 21p^-1 + 9)(2) dp = (-24p^-1 - 42 ln(p) + 18p), evaluated from p=0.1 to p=0.6, = 133.746.
From the previous solution the mean is: 7.751. Thus the variance = 133.746 - 7.751^2 = 73.675.
2.33. Prob[Positive | Hemoglophagia] = 0.97. Prob[Positive | No Hemoglophagia] = 0.002.
x = percent of the general population that has Hemoglophagia.
Prob[Positive] = (x)(0.97) + (1 - x)(0.002) = 0.002 + 0.968x.
0.83 = Prob[Yes | Positive] = Prob[Positive and Yes]/Prob[Positive] = (x)(0.97) / (0.002 + 0.968x).
⇒ 0.00166 + 0.80344x = 0.97x. ⇒ x = 1.0%.
Comment: The percent of those who test positive who have Hemoglophagia, as a function of the percent of the population with Hemoglophagia:
Percent with Hem.                  5.00%     1.00%     0.10%     0.01%
Percent of Positives with Hem.     96.23%    83.05%    32.68%    4.63%
For an extremely rare disease, the majority of positive test results are false positives. 2.34. A. f(0) = e-2 = .1353. f(1) = 2e-2 = .2707. Prob[N > 1 | N > 0] = (1 - .1353 - .2707)/(1 - .1353) = 68.7%.
2.35. C. f(0) = e-2 = .1353. f(1) = 2e-2 = .2707. Let x = average number of children in families with more than one child. 2 = Overall average = (0)(.1353) + (1)(.2707) + x(1 - .1353 - .2707). ⇒ x = 2.911. Therefore percentage of children in families with more than one child is: (2.911)(1 - .1353 - .2707)/2 = 86.5%. Alternately, the percentage of children in families with n children is: n f(n)/E[N]. The percentage of children in families with one child is: (1)(.2707/)2 = .135. Percentage of children in families with more than one child is: 1 - .135 = 86.5%. Comment: A child must come from a family with at least one child. Unlike the previous question, here we pick the child rather than the family at random. Picking a child at random, means each child is equally likely to be picked. Then we are more likely to pick a child from a family with a lot of children. For example, assume there are only two families. Family 1 has one child: Alice. Family 2 has three children: Ben, Charlotte, and Dan. A family is picked at random. If the family has at least one child, what is the probability that it has more than one child? Since we are equally likely to pick Family #1 or #2, the probability is 1/2. If a child is picked at random, what is the probability that this child has at least one brother or sister? We are equally likely to pick Alice, Ben, Charlotte, or Dan, thus the probability is 3/4.
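To illustrate the size-biased sampling idea in this solution, here is a short sketch (an illustration only, assuming the same Poisson family-size distribution with mean 2; it is not part of the original solution):

```python
import math

# Percentage of children who live in families with more than one child,
# when the number of children per family is Poisson with mean 2 (problem 2.35).
mean = 2.0
f = [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(60)]

# A randomly picked child comes from a family of size n with probability n f(n) / E[N].
share_one_child = 1 * f[1] / mean
print(1 - share_one_child)   # approximately 0.865, i.e. 86.5%
```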
2.36. Let p be the a priori probability that Frank remembered to water your plant. Then the probability that he watered your plant and it died anyway is: 0.05p. The probability that he failed to water your plant and it died is: 0.6(1 - p). Thus given that the plant died, the probability Frank failed to water your plant is:
0.6(1 - p) / {0.6(1 - p) + 0.05p} = (12 - 12p) / (12 - 11p).
For example, if you assume a 30% a priori probability that Frank will water your plant, then the probability that Frank failed to water your plant is: 8.4/8.7 = 96.6%. If you instead assume an 80% a priori probability that Frank will water your plant, then the probability that Frank failed to water your plant is: 2.4/3.2 = 75%.
[Graph omitted: the probability that Frank failed to water your plant as a function of p, falling from about 1.0 toward 0.6 as p runs from 0.2 to 0.8.]
Comment: One could instead just check to see how damp the soil around your plant is.
2.37. B. 1.5x / ∫_{y=0}^{2x} 1.5x dy = 1/(2x), 0 < y < 2x < 1.
2.38. E. ∫_{y=0}^{1} y (1/3 + y) dy / ∫_{y=0}^{1} (1/3 + y) dy = (1/6 + 1/3)/(1/3 + 1/2) = (1/2)/(5/6) = 3/5.
2.39. B. 1. False. The variance is the second central moment: Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2. The second moment around the origin is E[X^2]. 2. False. Cov[X,Y] = E[XY] - E[X]E[Y], so statement 2 only holds when the covariance of X and Y is zero. (This is true if X and Y are independent.) 3. True. E[X] = EY[E[X|Y]].
2.40. E. ∫_{x=0}^{y} x (6xy + 3x^2) dx / ∫_{x=0}^{y} (6xy + 3x^2) dx = (2y^4 + 3y^4/4)/(3y^3 + y^3) = {(11/4)y^4}/(4y^3) = 11y/16.
2.41. E. Given Y = 3, X is uniform from -3 to +3. P[X < 1 | Y = 3] = 4/6 = 2/3.
∫_{x=-3}^{1} (e^-3 / 2) dx / ∫_{x=-3}^{3} (e^-3 / 2) dx = 2e^-3/(3e^-3) = 2/3.
2.42. E. If Stu knows the answer, then the chance of observing a correct answer is 100%. If Stu doesnʼt know the answer to a question then the chance of observing a correct answer is 20%.
Type of Question    A Priori Chance of      Chance of the   Prob. Weight =            Posterior Chance of
                    This Type of Question   Observation     Product of Columns B&C    This Type of Question
Stu knows                 0.750               1.0000            0.7500                     93.75%
Stu Doesn't know          0.250               0.2000            0.0500                      6.25%
Overall                                                          0.800                      1.000
2.43. D. If one has Die #1, then the chance of the observation is: (3/6)(1/6)(1/6)(1/6)(1/6) = 3/7776. If one has Die #2, then the chance of the observation is: (3/6)(1/6)(1/6)(3/6)(1/6) = 9/7776.
Die    A Priori      Chance of      Probability Weights   Col. D / Sum of Col. D
       Probability   Observation    = Col. B x Col. C     = Posterior Probability
1         0.5          0.00387          0.00193                 0.2500
2         0.5          0.01160          0.00580                 0.7500
SUM                                      0.00773                 1.000
2.44. A. P(Y=0) = ∫_{0}^{1} P(Y = 0 | θ) f(θ) dθ = ∫_{0}^{1} P(Y = 0 | θ) dθ.
For the first case, P(Y = 0 | θ) = e^-θ, and P(Y=0) = ∫_{0}^{1} e^-θ dθ = 1 - e^-1 = 0.632.
For the second case, P(Y = 0 | θ) = θ^2, and P(Y=0) = ∫_{0}^{1} θ^2 dθ = 1/3 = 0.333.
For the third case, P(Y = 0 | θ) = (1-θ)^2, and P(Y=0) = ∫_{0}^{1} (1-θ)^2 dθ = 1/3 = 0.333.
Only in the first case is P(Y=0) > 0.35. Comment: Three separate problems in which you need to calculate P(Y=0) given P(Y = y | θ) and f(θ). The conditional distributions are a Poisson, Negative Binomial and a Binomial. In each case they are being mixed by the same uniform distribution on [0,1]. 2.45. A.
∫_{x=1}^{y} (12/25)(x + y^2) dx = (12/25){y^2/2 - 1/2 + (y - 1)y^2} = (6/25)(2y^3 - y^2 - 1), for 1 < y < 2.
Comment: The marginal density of Y can not depend on x, eliminating choices D and E. The marginal density of Y has to integrate to 1, eliminating choices B, C, and E.
2.46. C. f(1, 1) = 2/56. f(2, 1) = 5/56. f(2, 2) = 8/56. f(3, 1) = 10/56. f(3, 2) = 13/56. f(3, 3) = 18/56.
P[Y = 3 | Y ≥ 2] = (18/56)/(8/56 + 13/56 + 18/56) = 18/39 = 6/13.
2.47. D. Statement A is false. Even if X and Y were independent, then E[X^2 Y^3] = E[X^2]E[Y^3] = ∫_{0}^{1} x^2 fX(x) dx ∫_{0}^{1} y^3 fY(y) dy. Statement B is false. Statement C is false. E[X^2 Y^3] = ∫∫ x^2 y^3 f(x,y) dx dy. Statement D is true. Statement E is false. E[Y^3] = ∫_{0}^{1} y^3 fY(y) dy.
2.48. D. 1. True. 2. True. 3. True. Assume discrete distributions; if they are instead continuous distributions then replace summations by integrals.
EY[EX[g(Y) | X]] = Σ_i P[yi] EX[g(Y) | X] = Σ_i P[yi] {Σ_k g(yi) P[yi and xk] / Σ_k P[yi and xk]}
= Σ_i P[yi] {Σ_k g(yi) P[yi and xk] / P[yi]} = Σ_i {Σ_k g(yi) P[yi and xk]} = Σ_{i,k} g(yi) P[yi and xk] = E[g(Y)].
Comment: EY denotes expectation taken over the sample space of Y, (the weighted average taken over all possible values of Y, using the probabilities as the weights.) Statement 3 is similar to E[g(Y)] =EY[g(Y)] = EX[EY[g(Y)| X]]. It turns out you can get the same result (in the regular situations encountered in practice) whether you first take the expectation over X or first take the expectation over Y. Basically you are just taking the double integral of g(y) times the probability density function over all possible values of x and y. The order in which you evaluate this double integral should not matter. In the demonstration of Statement 3, Iʼve used the fact that the sum over all k of P[xk and yi] is P[yi] , since we exhaust all possible x values. The final equality in the demonstration of Statement 3 is the definition of the expected value. 2.49. D.
∫_{x=1}^{2} ∫_{y=0}^{2-x} (3x/4) dy dx = (3/4) ∫_{x=1}^{2} (2x - x^2) dx = (3/4)(3 - 7/3) = 1/2.
2.50. E. ∫_{y=0}^{1} (x + y) dy = x + 1/2, for 0 < x < 1.
Comment: The marginal density of X can not depend on y, eliminating choice A. The marginal density of X has to integrate to 1, eliminating choices C and D.
2.51. C. f(1, 0) = 4/54. f(1, 1) = 6/54. f(1, 2) = 8/54.
E[Y | X = 1] = {(0)(4/54) + (1)(6/54) + (2)(8/54)}/(4/54 + 6/54 + 8/54) = 11/9.
2.52. C.
Type of   A Priori Chance   Chance of the   Prob. Weight =            Col. D / Sum of Col. D =
Die       of This Type      Observation     Product of Columns B&C    Posterior Chance of This Type of Die
A             0.500           0.00412           0.00206                      33.3%
B             0.500           0.00823           0.00412                      66.7%
Overall                                          0.00617                      1.000
For example, the chance of the observation if we have picked Die B is: (4/6)(4/6)(1/6)(1/6)(4/6) = 0.00823.
2.53. A. When S < 6 we have the following equally likely possibilities (an x marks a possible pair):
         D2=1   D2=2   D2=3   D2=4    Possibilities   Conditional Density Function of D1 given that S < 6
D1 = 1     x      x      x      x          4               4/10
D1 = 2     x      x      x      -          3               3/10
D1 = 3     x      x      -      -          2               2/10
D1 = 4     x      -      -      -          1               1/10
The mean of the conditional density function of D1 given that S < 6 is:
(0.4)(1) + (0.3)(2) + (0.2)(3) + (0.1)(4) = 2. The median is equal to 2, since the Distribution Function at 2 is 0.7 ≥ 0.5, but at 1 it is 0.4 < 0.5. The mode is 1, since that is the value at which the density is a maximum. Thus 1. F, 2. T, 3. F. 2.54 . A.
2 / ∫_{y=x}^{1} 2 dy = 2/{2(1 - x)} = 1/(1 - x), for x < y < 1.
2.55. A. When X1 + X2 ≤ 4 we have the following equally likely possibilities (an x marks a possible pair):
         X2=1   X2=2   X2=3   X2=4   X2=5   X2=6    Possibilities   Conditional Density Function of X1 given that X1 + X2 ≤ 4
X1 = 1     x      x      x      -      -      -          3               3/6
X1 = 2     x      x      -      -      -      -          2               2/6
X1 = 3     x      -      -      -      -      -          1               1/6
X1 = 4     -      -      -      -      -      -          0               0
X1 = 5     -      -      -      -      -      -          0               0
X1 = 6     -      -      -      -      -      -          0               0
The mean of the conditional density function of X1 given that X1 + X2 ≤ 4 is: {(3)(1) + (2)(2) + (1)(3) + (0)(4) + (0)(5) + (0)(6)} / (3 + 2 + 1) = 5/3.
2.56. D. The posterior distribution is proportional to the product of the prior distribution and the chance of the observation given the type of risk. The prior distribution is: .75, .25. The chances of the observation given the types of risks are the densities at k: (4)(3004 )(300 + k)-5, and (3)(10003 )(1000 + k)-4. Thus the posterior distribution is proportional to: (.75)(4)(3004 )(300 + k)-5 = (3)(3004 )/(300 + k)5 = (3/300)/(300/300 + k/300)5 = (1/100)(1 + k/300)-5, and (.25)(3)(10003 )(1000 + k)-4 = (.75/1000)(1 + k/1000)-4. As k goes to zero, these probability weights go to .01, and .75/1000 = .00075. Normalized to unity, these weights are: .01/.01075 = .930 = 40/43, and .00075/.01075 = .170 = 3/43. 2.57. D. n = the expected number of youthful claim free drivers = the expected number of nonyouthful claims free drivers The chance of a youthful driver incurring at least one claim is: (900 - n)/900. The chance of a nonyouthful driver incurring at least one claim is: (800 - n)/800. We are given that the former is twice the latter: (900 - n)/900 = 2(800 - n)/800. Solving for n: n = 1/{(1/400 - 1/900)} = 720. Comment: Check: (900 - 720)/900 = 20% is twice (800 - 720)/800 = 10%. 2.58. D. The situation is totally symmetric with respect to the shooters and the shooters are equally likely, so we can just take one of the four shooters and examine that situation. If for example we have shooter A, then there are 16 equally likely outcomes: R S U V
      R     S     U     V
R    Yes   Yes   Yes   Yes
S    Yes   No    Yes   No
U    Yes   Yes   No    No
V    Yes   No    No    No
Iʼve labeled those situations where the shooter can be identified with certainty. There are 9 out of 16 such situations, for a probability of 9/16.
Comment: Note that in the case of targets S and U being hit, you can eliminate both shooters B and C, leaving only shooter A. The pattern of shooters is as follows:
A      A,C        C
A,B    A,B,C,D    C,D
B      B,D        D
2.59. B. The situation is totally symmetric with respect to the shooters and the shooters are equally likely, so we can just take one of the four shooters as first and examine that situation. If for example we assume shooter A is the first shooter, then there is 1/3 chance each that B, C or D is the second shooter. Assume A, hits area R, then there is zero chance that both shots land in the same area. If A hits area V, then regardless of the identity of the second shooter, there is a 1/4 chance of his shot hitting the same area V. If A hits area S, then if the second shooter is B, there is a 1/4 chance of the second shot landing in the same area S; otherwise the chance is zero. Therefore, if A hits area S, there is a (1/3)(1/4) =1/12 chance of the second shot being in the same area. Similarly, if A hits area U, there is a (1/3)(1/4) =1/12 chance of the second shot being in the same area. Thus overall, there is a (0 + 1/4 + 1/12 + 1/12)/4 = 5/48 chance of two shots in the same area. 2.60. B. Prob[X = 0] = p(0, 1) + p(0, 2) = 1/12 + 2/12 = 1/4. Prob[X = 1] = p(1, 2) + p(1, 3) = 4/12 + 5/12 = 3/4.
2.61. E(X | Y = y) = ∫ x f(x | y) dx = ∫ x f(x, y)/fY(y) dx.
⇒ ∫ E(X | Y = y) fY(y) dy = ∫∫ x f(x, y) dx dy = E[X].
2.62. D. E[Q | P = 0] = {(0)(.12) + (1)(.06) + (2)(.05) + (3)(.02)}/(.12 + .06 + .05 + .02) = .22/.25 = 0.88. E[Q2 | P = 0] = {(02 )(.12) + (12 )(.06) + (22 )(.05) + (32 )(.02)}/(.12 + .06 + .05 + .02) = .44/.25 = 1.76. Var[Q | P = 0] = 1.76 - .882 = 0.9856. 2.63. B. E[Q | P ≠ 0] = {(0)(.18) + (1)(.3) + (2)(.22) + (3)(.05)}/(.18 + .3 + .22 + .05) = .89/.75 = 1.1867. E[Q2 | P ≠ 0] = {(02 )(.18) + (12 )(.3) + (22 )(.22) + (32 )(.05)}/(.18 + .3 + .22 + .05) = 1.63/.75 = 2.1733. Var[Q | P ≠ 0] = 2.1733 - 1.18672 = 0.765. 2.64. C. E[Y | X = 1] = {(.050)(0) + (.125)(1)}/(.050 + .125) = .7143. E[Y2 | X = 1] = {(.050)(02 ) + (.125)(12 )}/(.050 + .125) = .7143. Var(Y | X = 1) = .7143 - .71432 = 0.2041. 2.65. Prob[Positive | Yes] = .98. Prob[Positive | No] = .01. Prob[Positive] = (.001)(.98) + (.01)(.999) = .01097. Prob[Yes | Positive] = Prob[Positive and Yes]/Prob[Positive] = (.001)(.98)/.01097 = 8.93%.
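The conditional-variance calculations in 2.62-2.64 all follow the same pattern; here is a brief sketch (illustrative only, using the tornado table from problem 2.62; it is not part of the original solution) that reproduces the two conditional variances:

```python
# Conditional variance of Q given P = 0, and given P >= 1 (problems 2.62-2.63).
joint = {  # joint[(p, q)] = probability
    (0, 0): 0.12, (0, 1): 0.06, (0, 2): 0.05, (0, 3): 0.02,
    (1, 0): 0.13, (1, 1): 0.15, (1, 2): 0.12, (1, 3): 0.03,
    (2, 0): 0.05, (2, 1): 0.15, (2, 2): 0.10, (2, 3): 0.02,
}

def cond_var_of_q(condition):
    """Variance of Q restricted to the (p, q) cells whose p satisfies the condition."""
    total = sum(prob for (p, q), prob in joint.items() if condition(p))
    e1 = sum(q * prob for (p, q), prob in joint.items() if condition(p)) / total
    e2 = sum(q * q * prob for (p, q), prob in joint.items() if condition(p)) / total
    return e2 - e1 ** 2

print(cond_var_of_q(lambda p: p == 0))   # approximately 0.9856
print(cond_var_of_q(lambda p: p >= 1))   # approximately 0.765
```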
2.66. C. Prob[zero claims | θ] = e^-θ.
Prob[zero claims] = ∫_{0}^{k} e^-θ π(θ) dθ = ∫_{0}^{k} e^-θ e^-θ/(1 - e^-k) dθ = {(1 - e^-2k)/2}/(1 - e^-k) = (1 + e^-k)/2.
Set .575 = (1 + e-k)/2. ⇒ k = 1.897. Comment: I used the fact that (1 - x2 )/(1 - x) = 1+ x, with x = e-k.
Section 3, Covariances and Correlations10 Given the joint distribution of two variables, one can compute their covariance and correlation. Covariances: The Covariance of X and Y is defined as: Cov[X,Y] ≡ E[XY] - E[X]E[Y]. Exercise: E[X] = 9, E[Y] = 8, and E[XY] = 60. What is the covariance of X and Y? [Solution: Cov[X,Y] = E[XY] - E[X]E[Y] = 60 - (9)(8) = -12.] As in the previous section, assume two honest, six-sided dice of different colors are rolled, and the results D1 and D2 are observed. Let S = D1 + 2D2 . In order to compute the covariance of D2 and S, one must first compute E[D2 S]. Since S = D1 + 2D2 , for all the possible outcomes, S has values:
          D2=1   D2=2   D2=3   D2=4   D2=5   D2=6
D1 = 1      3      5      7      9     11     13
D1 = 2      4      6      8     10     12     14
D1 = 3      5      7      9     11     13     15
D1 = 4      6      8     10     12     14     16
D1 = 5      7      9     11     13     15     17
D1 = 6      8     10     12     14     16     18
Then for these 36 equally likely possibilities the product of D2 and S is:
          D2=1   D2=2   D2=3   D2=4   D2=5   D2=6
D1 = 1      3     10     21     36     55     78
D1 = 2      4     12     24     40     60     84
D1 = 3      5     14     27     44     65     90
D1 = 4      6     16     30     48     70     96
D1 = 5      7     18     33     52     75    102
D1 = 6      8     20     36     56     80    108
Thus E[D2 S] = (3+4+5+6+7+8+10+12+14+16+18+20+21+24+27+33+36+ 36+40+44+48+52+56+55+60+65+70+75+80+78+84+90+96+102+108)/36 = 42.5833. 10
Even though these ideas are only occasionally tested directly, it will be assumed that you have a good understanding of the concepts in this section. For those who already know this material, do only a few problems from this section in order to refresh your memory.
Alternately, one could use the conditional distributions of S given various values of D2 to simplify this calculation somewhat: E[D2 S] = Σ_i {P(D2 = i) E[D2 | D2 = i] E[S | D2 = i]} =
Σ_i P(D2 = i) i E[S | D2 = i] = (1/6)(1)(5.5) + (1/6)(2)(7.5) + (1/6)(3)(9.5) + (1/6)(4)(11.5) + (1/6)(5)(13.5) + (1/6)(6)(15.5) = 42.5833.
Now E[D2] = 3.5, while E[S] = E[D1 + 2D2] = E[D1] + 2E[D2] = 3.5 + (2)(3.5) = 10.5.
Therefore Cov[D2, S] = E[D2 S] - E[D2]E[S] = 42.5833 - (3.5)(10.5) = 5.8333.
Alternately, Cov[D2, S] = Cov[D2, D1 + 2D2] = Cov[D2, D1] + 2Cov[D2, D2] = 0 + 2Var[D2] = (2)(35/12) = 35/6 = 5.8333.
Cov[X, X] = E[X^2] - E[X]^2 = Var[X]. Thus the variance is a special case of the covariance, Cov[X, X] = Var[X].
Note that if X and Y are independent, then E[XY] = E[X]E[Y] and thus Cov[X, Y] = 0.
Note that Cov[X, Y] = Cov[Y, X]. Also Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z]. Also if b is any constant, Cov[X, bY] = b Cov[X, Y], while Cov[X, b] = 0.
Var[X + Y] = E[(X+Y)^2] - E[X + Y]^2 = E[X^2] + 2E[XY] + E[Y^2] - (E[X] + E[Y])^2 = (E[X^2] - E[X]^2) + 2(E[XY] - E[X]E[Y]) + (E[Y^2] - E[Y]^2) = Var[X] + Var[Y] + 2Cov[X,Y].
Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y]. If X and Y are independent then: Cov[X, Y] = 0 and Var[X + Y] = Var[X] + Var[Y].
Exercise: Var[X] = 10, Var[Y] = 20, and Cov[X, Y] = -12. What is Var[X + Y]?
[Solution: Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y] = 10 + 20 + (2)(-12) = 6.]
Similarly, Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov[X,Y].
Exercise: Var[X] = 10, Var[Y] = 20, and Cov[X, Y] = -12. What is Var[3X + 4Y]?
[Solution: Var[3X + 4Y] = 9Var[X] + 16Var[Y] + (2)(3)(4)Cov[X,Y] = 90 + 320 + (24)(-12) = 122.]
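As a quick check of the dice example (a sketch for illustration only, not part of the guide), Cov[D2, S] can be verified by enumerating all 36 equally likely outcomes:

```python
# Brute-force check that Cov[D2, S] = 35/6 when S = D1 + 2*D2 for two fair dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
n = len(outcomes)                      # 36 equally likely outcomes

e_d2 = sum(d2 for _, d2 in outcomes) / n
e_s = sum(d1 + 2 * d2 for d1, d2 in outcomes) / n
e_d2s = sum(d2 * (d1 + 2 * d2) for d1, d2 in outcomes) / n

print(e_d2s - e_d2 * e_s)              # 5.8333... = 35/6
```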
Correlations:
The Correlation of two random variables is defined in terms of their covariance:
Corr[X, Y] ≡ Cov[X, Y] / √(Var[X] Var[Y]).
Exercise: Var[X] = 10, Var[Y] = 20, and Cov[X, Y] = -12. What is Corr[X, Y]?
[Solution: Corr[X, Y] = Cov[X, Y] / √(Var[X] Var[Y]) = -12 / √((10)(20)) = -0.85.]
The correlation is always in the interval [-1, +1].
Corr[X, Y] = Corr[Y, X].
Corr[X, X] = 1.
Corr[X, -X] = -1.
Corr[X, aY] = Corr[X, Y] if a > 0; 0 if a = 0; -Corr[X, Y] if a < 0.
Corr[X, aX] = 1 if a > 0.
Two variables that are proportional with a positive proportionality constant are perfectly correlated and have a correlation of one. Closely related variables, such as height and weight, have a correlation close to but less than one. Unrelated variables have a correlation near zero. Inversely related variables, such as the average temperature and the use of heating oil, are negatively correlated.
Continuing the example with dice, we can determine the correlation of D2 and S = D1 + 2D2.
Exercise: What is the variance of D2?
[Solution: The second moment is (1/6)(1^2) + (1/6)(2^2) + (1/6)(3^2) + (1/6)(4^2) + (1/6)(5^2) + (1/6)(6^2) = 91/6. The mean is 7/2. Thus the variance = 91/6 - (7/2)^2 = 35/12.]
Exercise: What is the variance of S?
[Solution: Since D1 and D2 are independent, Var[S] = Var[D1 + 2D2] = Var[D1] + 4Var[D2] = (35/12) + (4)(35/12) = 175/12.]
Thus, Corr[D2, S] = Cov[D2, S] / √(Var[D2] Var[S]) = 5.8333 / √((35/12)(175/12)) = 0.894.
Note that D2 and S are positively correlated. Larger values of D2 tend to be associated with larger values of S, and vice versa.11
A Poisson Process Example:12
N has a Poisson distribution with mean 6.
Let A = X1 +...+ XN, where Prob(Xi = 5) = 50% and Prob(Xi = 25) = 50%, for all i, where the Xiʼs are independent.
Let B = Y1 +...+ YN, where Prob(Yi = 5) = 80% and Prob(Yi = 25) = 20%, for all i, where the Yiʼs are independent.
Note that we assume that A and B have the same number of claims N.
Exercise: Calculate the Covariance of A and B.
[Solution: E[X] = 15. E[Y] = 9. E[AB] = EN[E[AB | n]] = EN[15n 9n] = 135E[N^2] = 135(6 + 6^2) = 5670.
Cov[A, B] = 5670 - (15)(6)(9)(6) = 810.]
One could instead use the general result established below for this type of situation: Cov[A, B] = Var[N] E[X]E[Y] = (6)(15)(9) = 810.
Exercise: Calculate the correlation coefficient between A and B.
[Solution: E[X^2] = (0.5)(5^2) + (0.5)(25^2) = 325. Var[A] = (6)(325) = 1950.
E[Y^2] = (0.8)(5^2) + (0.2)(25^2) = 145. Var[B] = (6)(145) = 870.
Corr[A, B] = 810 / √((1950)(870)) = 0.622.]
11 The maximum possible correlation is +1, so that D2 and S are very strongly (positively) correlated.
12 See 4, 11/01, Q.29.
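A quick simulation can also be used to confirm that the correlation between A and B is roughly 0.62. This is an illustrative sketch only (not part of the guide), with an arbitrary seed and sample size:

```python
import numpy as np

# Simulate aggregate losses A and B that share the same Poisson(6) claim count,
# with severities {5, 25} drawn with probabilities (0.5, 0.5) and (0.8, 0.2).
rng = np.random.default_rng(1)
trials = 100_000
counts = rng.poisson(6, size=trials)                       # common claim count N
a = np.array([rng.choice([5, 25], size=k, p=[0.5, 0.5]).sum() for k in counts])
b = np.array([rng.choice([5, 25], size=k, p=[0.8, 0.2]).sum() for k in counts])

print(np.corrcoef(a, b)[0, 1])                             # roughly 0.62
```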
A General Result for the Covariance when there is a common Number of Claims:
Assume two aggregate processes. In each process frequency and severity are independent of each other, and the claim sizes are mutually independent, identically distributed random variables. Let N be the number of claims. Assume N is the same for each of the two processes, but the first process has severity X and the second process has severity Y. X and Y are independent. Let A be the aggregate loss from the first process and B be the aggregate loss from the second process.
Then Cov[A, B] = Var[N] E[X]E[Y].
Proof: E[AB] = EN[E[AB | n]] = EN[E[X]E[Y]n^2] = E[N^2]E[X]E[Y].
Cov[A, B] = E[AB] - E[A]E[B] = E[N^2]E[X]E[Y] - E[N]E[X]E[N]E[Y] = Var[N] E[X]E[Y].
If the frequency is Poisson with mean λ, then Cov[A, B] = λE[X]E[Y], Var[A] = λE[X^2], and Var[B] = λE[Y^2].
Corr[A, B] = λE[X]E[Y] / √(λE[X^2] λE[Y^2]) = 1 / √((E[X^2]/E[X]^2)(E[Y^2]/E[Y]^2)).
For the example above, Corr[A, B] = 1 / √((1.4444)(1.7901)) = 0.622.
Sample Covariance and Correlation:
Assume we have the following heights of eight fathers and their adult sons (in inches):13
Father:   53   54   57   58   61   62   63   66
Son:      56   58   61   60   63   62   65   64
Here is a graph of this data:
[Scatter plot omitted: Son's height (56 to 64) plotted against Father's height (54 to 66).]
There appears to be a relationship between the height of the father, X, and the height of his son, Y. A taller father seems to be more likely to have a taller son. One way to measure this relationship is by computing the sample covariance and sample correlation.
13
There are only 8 pairs of observations solely in order to keep things simple.
The sample variance of X is: sX^2 = Σ(Xi - X̄)^2 / (N - 1).
Exercise: What are the sample variances of X and Y?
[Solution: Mean height of fathers = X̄ = 59.25.
Xi - X̄ = (53, 54, 57, 58, 61, 62, 63, 66) - 59.25 = (-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75).
sX^2 = {(-6.25)^2 + ... + 6.75^2} / (8 - 1) = 143.5/7 = 20.5.
Mean height of sons = Ȳ = 61.125.
Yi - Ȳ = (56, 58, 61, 60, 63, 62, 65, 64) - 61.125 = (-5.125, -3.125, -0.125, -1.125, 1.875, 0.875, 3.875, 2.875).
sY^2 = {(-5.125)^2 + ... + 2.875^2} / (8 - 1) = 64.875/7 = 9.2679.]
In analogy to the sample variance, the sample covariance of X and Y is computed as:14
Ĉov[X, Y] = Σ(Xi - X̄)(Yi - Ȳ) / (N - 1).
Exercise: What is the sample covariance of X and Y?
[Solution: Σ(Xi - X̄)(Yi - Ȳ) / (N - 1) = {(-6.25)(-5.125) + ... + (6.75)(2.875)}/(8 - 1) = 89.75/7 = 12.8214.]
Then the sample correlation is the sample covariance divided by the product of the two sample standard deviations:15
r = Ĉov[X, Y] / (sX sY) = Σ(Xi - X̄)(Yi - Ȳ) / √{Σ(Xi - X̄)^2 Σ(Yi - Ȳ)^2}.
Exercise: What is the sample correlation of X and Y?
[Solution: Using previous solutions: r = 12.8214 / √((20.5)(9.2679)) = 0.9302.
Comment: The slope of a linear regression with intercept fit to this line is: β̂ = r sY/sX = (0.93019)(3.0443)/4.5277 = 0.6254.]
14 Just as the variance is a special case of the covariance, the sample variance is a special case of the sample covariance.
15 The factors of 1/(N-1) cancel.
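The whole calculation above can be reproduced with a few lines of code; this is an illustrative sketch (not part of the guide) using the same eight father/son pairs:

```python
# Sample variances, sample covariance, and sample correlation for the height data.
fathers = [53, 54, 57, 58, 61, 62, 63, 66]
sons    = [56, 58, 61, 60, 63, 62, 65, 64]
n = len(fathers)

mean_x = sum(fathers) / n
mean_y = sum(sons) / n
var_x = sum((x - mean_x) ** 2 for x in fathers) / (n - 1)        # 20.5
var_y = sum((y - mean_y) ** 2 for y in sons) / (n - 1)           # 9.2679
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(fathers, sons)) / (n - 1)           # 12.8214
r = cov_xy / (var_x * var_y) ** 0.5                              # 0.9302

print(var_x, var_y, cov_xy, r)
```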
Covariance of Sample Means:
As discussed previously, Var[X̄] = Var[X]/n.16 A similar result holds for Cov[X̄, Ȳ]. For paired samples of data:17
Cov[X̄, Ȳ] = Cov[(1/n) Σ_{i=1}^{n} Xi , (1/n) Σ_{j=1}^{n} Yj] = Σ_{j=1}^{n} Σ_{i=1}^{n} Cov[Xi, Yj] / n^2 = n Cov[X, Y]/n^2 = Cov[X, Y]/n.
Cov[X̄, Ȳ] = Cov[X, Y]/n.
Exercise: For the heights example assume Var[X] = Var[Y] = 14, and Corr[X, Y] = 0.93. Determine Cov[X̄, Ȳ].
[Solution: Cov[X̄, Ȳ] = Cov[X, Y]/n = (0.93)(14)/8 = 1.63.]
For the example, mean height of fathers = X̄ = 59.25, and mean height of sons = Ȳ = 61.125. Ȳ - X̄ = 61.125 - 59.25 = 1.875, is an estimate of the average difference in heights between sons and their fathers.
Exercise: For the heights example assume Var[X] = Var[Y] = 14, and Corr[X, Y] = 0.93. Determine Var[Ȳ - X̄].
[Solution: Var[Ȳ - X̄] = Var[X̄] + Var[Ȳ] - 2Cov[X̄, Ȳ] = 14/8 + 14/8 - (2)(0.93)(14)/8 = 0.245.]
See “Mahlerʼs Guide to Frequency Distributions.” The Xi are a series of independent, identically distributed variables. The Yi are another series of independent, identically distributed variables. Xi and Yj for i ≠ j are independent, and therefore have a covariance of zero. For i = j, Cov[Xi, Yj] = Cov[X, Y]. 17
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 57 Problems: 3.1 (1 point) E[X] = 3, E[X2 ] = 15, E[Y] = 5. E[Y2 ] = 37. E[XY] = 19. What is the correlation of X and Y? A. less than 0.30 B. at least 0.30 but less than 0.35 C. at least 0.35 but less than 0.40 D. at least 0.40 but less than 0.45 E. at least 0.45 3.2 (1 point) You are given the following: • Let X be a random variable X. • Z is defined to be 0.75X. Determine the correlation coefficient of X and Z. A. 0.00 B. 0.25 C. 0.50 D. 0.75
E. 1.00
3.3 (3 points) N has a Poisson distribution with mean 4. Let A = X1 +...+ XN, where Xi = 9 for all i. Let B = Y1 +...+ YN, where Prob(Yi = 5) = 80% and Prob(Yi = 25) = 20%, for all i, where the Yi ʼs are independent. Calculate the correlation coefficient between A and B. (A) 0.75 (B) 0.80 (C) 0.85 (D) 0.90
(E) 0.95
Use the following information for the next 8 questions:
•
G is the result of rolling a green six-sided die.
•
R is the result of rolling a red six-sided die.
•
G and R are independent of each other.
•
M is the maximum of G and R.
3.4 (1 point) What is the mode of M ? A. 2 B. 3 C. 4
D. 5
E. 6
3.5 (1 point) What is the mode of the conditional distribution of M if G = 3? A. 2 B. 3 C. 4 D. 5 E. 6
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 58 3.6 (1 point) What is the expected value of M? A. less than 4.0 B. at least 4.0 but less than 4.2 C. at least 4.2 but less than 4.4 D. at least 4.4 but less than 4.6 E. at least 4.6 3.7 (1 point) What is the expected value of the conditional distribution of M if G = 3? A. 3 B. 4 C. 5 D. 6 E. 7 3.8 (1 point) What is the variance of M? A. less than 1.0 B. at least 1.0 but less than 1.3 C. at least 1.3 but less than 1.6 D. at least 1.6 but less than 1.9 E. at least 1.9 3.9 (1 point) What is the variance of the conditional distribution of M if G = 3? A. less than 1.0 B. at least 1.0 but less than 1.3 C. at least 1.3 but less than 1.6 D. at least 1.6 but less than 1.9 E. at least 1.9 3.10 (3 points) What is the covariance of G and M? A. less than 1.0 B. at least 1.0 but less than 1.3 C. at least 1.3 but less than 1.6 D. at least 1.6 but less than 1.9 E. at least 1.9 3.11 (3 points) What is the correlation of G and M? A. less than 0.3 B. at least 0.3 but less than 0.4 C. at least 0.4 but less than 0.5 D. at least 0.5 but less than 0.6 E. at least 0.6
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 59 3.12 (2 points) You are given the following for a sample of four observations from a bivariate distribution: (i) x y 12 33 14 11 25 16 36 40 A is the covariance of the empirical distribution Fe as defined by these four observations. B is the maximum possible covariance of an empirical distribution with identical marginal distributions to Fe . Determine B - A. A. less than 40 B. at least 40 but less than 45 C. at least 45 but less than 50 D. at least 50 but less than 55 E. at least 55 3.13 (2 points) You are given the following: • A is a random variable with mean 15 and variance 7. • B is a random variable with mean 25 and variance 13. • C is a random variable with mean 45 and variance 28. • A, B, and C are independent. • X=A+B • Y=A+C Determine the correlation coefficient between X and Y. A. less than 0.3 B. at least 0.3 but less than 0.4 C. at least 0.4 but less than 0.5 D. at least 0.5 but less than 0.6 E. at least 0.6 3.14 (2 points) You have a paired sample of data: (X1 , Y1 ), (X2 , Y2 ), ... , (X100, Y100). The Xi are a series of independent, identically distributed variables. The Yi are another series of independent, identically distributed variables. Assume Var[X] = 9, Var[Y] = 16, and Corr[X, Y] = 0.8. Determine Var[ X - Y ]. A. 0.02 B. 0.04
C. 0.06
D. 0.08
E. 0.10
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 60 Use the following information for the next 11 questions:
•
A green ball is placed at random in one of three urns.
•
Then independently a red ball is placed at random in one of these three urns.
•
Let X be the number of balls in Urn 1.
•
Let Y be the number of balls in Urn 2.
•
Let Z be the number of balls in Urn 3.
•
Let N be the number of occupied urns.
3.15 (1 point) Given X = 0, what is the (conditional) variance of Y? A. 0.2 B. 0.3 C. 0.4 D. 0.5 E. 0.6 3.16 (1 point) Given X = 1, what is the (conditional) variance of Z? A. less than 0.3 B. at least 0.3 but less than 0.4 C. at least 0.4 but less than 0.5 D. at least 0.5 but less than 0.6 E. at least 0.6 3.17 (2 points) What is the covariance of Y and Z? A. -0.5 B. -0.4 C. -0.3 D. -0.2
E. -0.1
3.18 (2 points) Given X = 0, what is the (conditional) covariance of Y and Z? A. -0.5 B. -0.4 C. -0.3 D. -0.2 E. -0.1 3.19 (2 points) Given X = 1, what is the (conditional) covariance of Y and Z? A. less than -0.3 B. at least -0.3 but less than -0.2 C. at least -0.2 but less than -0.1 D. at least -0.1 but less than 0 E. at least 0 3.20 (1 point) What is the correlation of Y and Z? A. less than -0.3 B. at least -0.3 but less than -0.2 C. at least -0.2 but less than -0.1 D. at least -0.1 but less than 0 E. at least 0
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 61 3.21 (1 point) Given X = 0, what is the (conditional) correlation of Y and Z? A. less than -0.9 B. at least -0.9 but less than -0.8 C. at least -0.8 but less than -0.7 D. at least -0.7 but less than -0.6 E. at least -0.6 3.22 (1 point) Given X = 1, what is the (conditional) correlation of Y and Z? A. less than -0.9 B. at least -0.9 but less than -0.8 C. at least -0.8 but less than -0.7 D. at least -0.7 but less than -0.6 E. at least -0.6 3.23 (1 point) Given X = 0, what is the (conditional) expected value of N? A. 1/4 B. 1/2 C. 1 D. 3/2 E. 2 3.24 (1 point) Given X = 1, what is the (conditional) expected value of N? A. 1/4 B. 1/2 C. 1 D. 3/2 E. 2 3.25 (1 point) Given X = 2, what is the (conditional) expected value of N? A. 1/4 B. 1/2 C. 1 D. 3/2 E. 2 Use the following information for the next 3 questions:
• Two independent lives x and y are of the same age. • Both lives follow De Moivreʼs law of mortality with the same ω. • T(xy) is the the time until failure for the joint life status of x and y. • T( xy ) is the the time until failure for the last survivor status of x and y. 3.26 (3 points) What is the correlation of T(x) and T(xy)? A. 0.3 B. 0.4 C. 0.5 D. 0.6 E. 0.7 3.27 (3 points) What is the correlation of T(x) and T( xy )? A. 0.3 B. 0.4 C. 0.5 D. 0.6 E. 0.7 3.28 (3 points) What is the correlation of T(xy) and T( xy )? A. 0.3 B. 0.4 C. 0.5 D. 0.6 E. 0.7
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 62 3.29 (7 points) You are given the following data points and summary statistics: Price of Silver (X) Price of Gold (Y) $4.67 $343.80 $5.97 $416.25 $6.39 $427.75 $9.04 $530.00 $13.01 $639.75 5
∑Xi = 39.08 i=1
5
∑Yi = 2,357.55 i=1
(i) (2 points) Determine the sample variance of X. (ii) (2 points) Determine the sample variance of Y. (iii) (2 points) Determine the sample covariance of X and Y. (iv) (1 point) Determine the sample correlation of X and Y. 3.30 (3 points) You are given the following joint distribution of X and Y: y x 0 1 2 0 0.1 0.2 0 1 0 0.2 0.1 2 0.2 0 0.2 Determine the correlation of X and Y. A. less than 0.1 B. at least 0.1 but less than 0.2 C. at least 0.2 but less than 0.3 D. at least 0.3 but less than 0.4 E. at least 0.4 3.31 (2, 5/88, Q.41) (1.5 points) A hat contains 3 chips numbered 1, 2, and 3. Two chips are drawn successively from the hat without replacement. What is the correlation between the number on the first chip and the number on the second chip? A. -1/2 B. -1/3 C. 0 D. 1/3 E. 1/2 3.32 (2, 5/90, Q.4) (1.7 points) Let X and Y be discrete random variables with joint probability distribution given in the table below: X Y -1 0 1 0 0.1 0.1 0.2 1 0.1 0.3 0.2 What is Cov(X, Y)? A. -0.02 B. 0 C. 0.02 D. 0.10 E. 0.12
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 63 3.33 (2, 5/92, Q.40) (1.7 points) Let X and Y be discrete random variables with joint probability function given by the following table: x y 0 1 2 0 0 2/5 1/5 1 1/5 1/5 0 What is the variance of Y - X? A. 4/25 B. 16/25 C. 26/25 D. 5/4 E. 7/5 3.34 (2, 5/92, Q.45) (1.7 points) Let X and Y be continuous random variables with joint density function f(x, y) = 6x for 0 < x < y < 1. Note that E[X] = 1/2 and E[Y] = 3/4. What is Cov(X, Y)? A. 1/40 B. 2/5 C. 5/8 D. 1 E. 13/8 3.35 (2, 2/96, Q.4) (1.7 points) Let X1 , X2 , X3 be uniform random variables on the interval (0, 1) with Cov(Xi, Xj) = 1/24 for i, j = 1, 2, 3, i ≠ j. Calculate the variance of X1 + 2X2 - X3 . A. 1/6
B. 1/4
C. 5/12
D. 1/2
E. 11/12
3.36 (2, 2/96, Q.8) (1.7 points) Let X and Y be discrete random variables with joint probability function p(x, y) given by the following table: x y 2 3 4 5 0 0.05 0.05 0.15 0.05 1 0.40 0 0 0 2 0.05 0.15 0.10 0 For this joint distribution E(X) = 2.85 and E(Y) = 1. Calculate Cov(X, Y). A. -0.20 B. -0.15 C. 0.95 D. 2.70 E. 2.85 3.37 (5B, 11/98, Q.24) (1.5 points) Given the following information, what is the variance of a portfolio consisting of equal weights of stocks B and C? Show all work. ρ AB = 0.6. ρAC = 1.0. σA2 = 0.4. σB2 = 0.7. σC2 = 0.6.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 64 3.38 (4B, 11/99, Q.29) (2 points) You are given the following: • A is a random variable with mean 5 and coefficient of variation 1. • B is a random variable with mean 5 and coefficient of variation 1. • C is a random variable with mean 20 and coefficient of variation 1/2. • A, B, and C are independent. •X=A+B •Y=A+C Determine the correlation coefficient between X and Y. A. -2/ 10
B. -1/ 10
C. 0
D. 1/ 10
E. 2/ 10
3.39 (IOA 101, 4/00, Q.12) (3 points) A random sample of 200 pairs of observations (x, y) from a discrete bivariate distribution (X, Y) is as follows: the observation (-2, 2) occurs 50 times the observation (0, 0) occurs 90 times the observation (2, -1) occurs 60 times. Calculate the sample correlation coefficient for these data. 3.40 (1, 5/00, Q.20) (1.9 points) Let X and Y denote the values of two stocks at the end of a five-year period. X is uniformly distributed on the interval (0, 12). Given X = x, Y is uniformly distributed on the interval (0, x). Determine Cov(X, Y) according to this model. (A) 0 (B) 4 (C) 6 (D) 12 (E) 24 3.41 (IOA 101, 9/00, Q.10) (3.75 points) Let Z be a random variable with mean 0 and variance 1, and let X be a random variable independent of Z with mean 5 and variance 4. Let Y = X - Z. Calculate the correlation coefficient between X and Y. 3.42 (4, 11/00, Q.32) (2.5 points) You are given the following for a sample of five observations from a bivariate distribution: (i) x y 1 4 2 2 4 3 5 6 6 4 (ii) x = 3.6, y = 3.8. A is the covariance of the empirical distribution Fe as defined by these five observations. B is the maximum possible covariance of an empirical distribution with identical marginal distributions to Fe . Determine B - A. (A) 0.9 (B) 1.0
(C) 1.1
(D) 1.2
(E) 1.3
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 65 3.43 (1, 5/01, Q.7) (1.9 points) A joint density function is given by f(x, y) = kx, for 0 < x < 1, 0 < y < 1, where k is a constant. What is Cov(X,Y)? (A) - 1/6 (B) 0 (C) 1/9 (D) 1/6 (E) 2/3 3.44 (4, 11/01, Q.29) (2.5 points) In order to simplify an actuarial analysis Actuary A uses an aggregate distribution S = X1 +...+ XN, where N has a Poisson distribution with mean 10 and Xi = 1.5 for all i. Actuary Aʼs work is criticized because the actual severity distribution is given by Pr(Yi = 1) = Pr(Yi = 2) = 0.5, for all i, where the Yi ʼs are independent. Actuary A counters this criticism by claiming that the correlation coefficient between S and S* = Y1 +...+ YN is high. Calculate the correlation coefficient between S and S*. (A) 0.75 (B) 0.80 (C) 0.85 (D) 0.90 (E) 0.95 3.45 (4, 11/03, Q.13) (2.5 points) You are given: (i) Z1 and Z2 are independent N(0,1) random variables. (ii) a, b, c, d, e, f are constants. (iii) Y = a + bZ1 + cZ2 and X = d + eZ1 + fZ2 . Determine E[Y | X]. (A) a (B) a + (b + c)(X - d) (C) a + (be + cf)(X - d) (D) a + [(be + cf) / (e2 + f2 )] X (E) a + [(be + cf) / (e2 + f2 )] (X - d)
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 66 Solutions to Problems: 3.1. E. Var[X] = 15 - 32 = 6. Var[Y] = 37 - 52 = 12. Cov[X, Y] = E[XY] - E[X]E[Y] = 19 - (3)(5) = 4. Corr[X,Y] = Cov[X,Y] /
√(Var[X] Var[Y]) = 4 / √((6)(12)) = 0.47.
3.2. E. Var[Z] = Var[.75X] = .752 Var[X]. Cov[X, Z] = Cov[X, .75X] = .75Cov[X, X] = .75 Var[X]. Therefore, Corr[X,Z] = .75 Var[X] / Var[X] 0.752 Var[X] = 1. Comments: Two variables that are proportional with a positive proportionality constant are perfectly correlated and have a correlation of one. 3.3. A. E[A] = (4)(9) = 36. Var[A] = (4)(92 ) = 324. E[B] = (4)((.8)(5) + (.2)(25)) = 36. Var[B] = (4)((.8)(52 ) + (.2)(252 )) = 580. E[AB] = E[E[AB | n]] = E[E[9n B | n]] = E[9n E[B | n]] = E[9n 9n] = 81E[n2 ] = (81)(2nd moment of the Poisson) = (81)(4 + 42 ) = 1620. COV[A, B] = E[AB] - E[A]E[B] = 1620 - (36)(36) = 324. Corr[A, B] = COV[A, B]/ Var[A]Var[B] = 324/ (324)(580) = 0.747. Comment: Similar to 4, 11/01, Q.29. 3.4. E. There are 36 equally likely possibilities, with corresponding values of M = MAX[G,R]: R G 1 2 3 1 1 2 3 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 The most likely value of M is therefore 6.
4
5
66
4 4 4 4 5 6
5 5 5 5 5 6
6 6 6 6 6 6
3.5. B. The conditional distribution of M if G =3 is: f(3) = 3/6, f(4) = 1/6, f(5) = 1/6, and f(6) = 1/6. Thus the mode of the conditional distribution of M if G =3 is 3. 3.6. D. Examining the 36 equally likely possibilities, the distribution of M is: f(1) =1/36, f(2) = 3/36, f(3) = 5/36, f(4) = 7/36, f(5) = 9/36, and f(6) = 11/36. Thus the mean of M is: ((1)(1) + (3)(2) + (5)(3) + (7)(4) + (9)(5) + (11)(6)) / 36 = 4.472.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 67 3.7. B. The conditional distribution of M if G =3 is: f(3) = 3/6, f(4) = 1/6, f(5) = 1/6, and f(6) = 1/6. Thus the mean of the conditional distribution of M if G =3 is: {(3)(3) +4 +5 +6} /6 = 4. 3.8. E. The distribution of M is: f(1) =1/36, f(2) = 3/36, f(3) = 5/36, f(4) = 7/36, f(5) = 9/36, and f(6) = 11/36. Thus the second moment of M is: ((1)(12 ) + (3)(22 ) + (5)(32 ) + (7)(42 ) + (9)(52 ) + (11)(62 )) / 36 = 21.972. Thus since the mean of M is 4.472, the variance of M is: 21.972 - 4.4722 = 1.97. 3.9. C. The conditional distribution of M if G =3 is: f(3) = 3/6, f(4) = 1/6, f(5) = 1/6, and f(6) = 1/6. Thus the the second moment of the conditional distribution of M if G =3 is: {(3)(32 ) + 42 + 52 + 62 } /6 = 17.33. Thus since the mean of the conditional distribution of M is 4, the variance the conditional distribution of M is: 17.33 - 42 = 1.33. 3.10. C. First one has to compute E[GM]. There are 36 equally likely possibilities and the corresponding values of M = MAX[G,R] are: G 1 2 3 4 5 6
1
2
R 3
1 2 3 4 5 6
2 2 3 4 5 6
3 3 3 4 5 6
4
5
66
4 4 4 4 5 6
5 5 5 5 5 6
6 6 6 6 6 6
E[GM] = Σ P(G = i) i E[M | G = i] = (1/6)(1)(21/6) + (1/6)(2)(22/6) +(1/6)(3)(24/6) + (1/6)(4)(27/6) + (1/6)(5)(31/6) + (1/6)(6)(36/6) = 17.1111. Thus Covar[G,M] = E[GM] - E[G]E[M] = 17.1111 - (3.5)(4.4722) = 1.458. 3.11. E. Corr[G,M] = Covar[g,M] / {Var[G] Var[M]}.5 = 1.458 / {(35/12) (1.973)}.5 = 0.608.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 68 3.12. E. For the observations, E[XY] = {(12)(33) + (14)(11) + (25)(16) + (36)(40)}/4 = 597.5. A = Cov[X,Y] = E[XY] - E[X]E[Y] = 597.5 - E[X]E[Y]. The maximum correlation and covariance occurs when the smallest x corresponds to the smallest y, and the largest x corresponds to the largest y, keeping the observed sets of values of x and y the same as before: x y 12 11 14 16 25 33 36 40 Now, E[XY] = {(12)(11) + (14)(16) + (25)(33) + (36)(40)}/4 = 655.25. B = Cov[X,Y] = E[XY] - E[X]E[Y] = 655.25 - E[X]E[Y]. B - A = 655.25 - 597.5 = 57.75. Comment: Similar to 4, 11/00, Q.32. 3.13. A. Var[X] = Var[A] + Var[B] = 7 + 13 = 20, since A and B are independent. Var[Y] = Var[A] + Var[C] = 7 + 28 = 35, since A and C are independent. Cov[X, Y] = Cov[A + B, A + C] = Cov[A, A] + Cov[A, C] + Cov[B, A] + Cov[B, C] = Var[A] + 0 + 0 + 0 = 7. Corr[X , Y] = Cov[X , Y] /
Var[X]Var[Y] = 7 /
(20)(35) = 0.26.
Comment: No use is made of the given means. 3.14. C. Cov[X, Y] = Corr[X, Y]
Var[X]Var[Y] = (0.8) (9)(16) = 9.6.
Cov[ X , Y ] = Cov[X, Y]/n = 9.6/100 = 0.096. Var[ X - Y ] = Var[ X ] + Var[ Y ] - 2 Cov[ X , Y ] = 9/100 + 16/100 - (2)(.096) = 0.058. 3.15. D. If X = 0, then Y is binomial with q = 1/2 and m = 2. So the conditional variance of Y is: (2)(1/2)(1-1/2) = 1/2. 3.16. A. If X = 1, then Z is binomial with q = 1/2 and m = 1. So the conditional variance of Z is: (1)(1/2)(1-1/2) = 1/4. 3.17. D. There are nine equally likely possibilities: 0,rg,0; r,g,0; g,r,0; gr,0,0; 0,0,rg; r,0,g; g,0,r; 0,r,g; 0,g,r. Therefore, E[YZ] = (0+0+0+0+0+0+0+1+1)/9 = 2/9. E[Y] = E[Z] = 2/3. Thus, Covar[Y,Z] = 2/9 - (2/3)(2/3) = - 2/9 = -0.222. 3.18. A. If X = 0, then of the original nine equally likely possibilities only 4 apply: 0,rg,0; 0,0,rg; 0,r,g; 0,g,r. Thus E[YZ | X =0] = (0 + 0 + 1 + 1) /4 = 1/2. E[Y | X=0] = E[Z | X=0] = 1. Thus, Covar[Y,Z | X =0] = 1/2 - (1)(1) = -1/2. 3.19. B. If X = 1, then of the original nine equally likely possibilities only 4 apply: r,g,0; g,r,0; r,0,g; g,0, r. Thus E[YZ | X =1] = (0 + 0 + 0 + 0) /4 = 0. E[Y | X=1] = E[Z | X=1] = 1/2. Thus, Covar[Y,Z | X =1] = 0 - (1/2)(1/2) = -1/4.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 69
3.20. A. Y and Z are each distributed as a Binomial with q = 1/3 and m = 2. Thus Var[Y] = Var[Z] = (2)(1/3)(2/3) = 4/9.
From a previous question, Covar[Y,Z] = -2/9. Thus Corr[Y,Z] = (-2/9) / {(4/9)(4/9)}^0.5 = -1/2.
Comment: Note that Y and Z are negatively correlated. Larger values of Y tend to be associated with smaller values of Z and vice versa.
3.21. A. From a previous question, Var[Y | X=0] = Var[Z | X=0] = 1/2. From a previous question, Covar[Y,Z | X=0] = -1/2.
Thus Corr[Y,Z | X=0] = (-1/2) / {(1/2)(1/2)}^0.5 = -1.
Comment: When X = 0, Y and Z are perfectly negatively correlated. This follows from the fact that Y and Z are identically distributed and that Y + Z = 2, a constant. Therefore, Cov[Y, Z] = Cov[Y, 2 - Y] = Cov[Y, 2] - Cov[Y, Y] = 0 - Var[Y] = -Var[Y].
Corr[Y, Z] = Cov[Y, Z] / {Var[Y] Var[Z]}^0.5 = -Var[Y] / {Var[Y] Var[Y]}^0.5 = -1.
3.22. A. From a previous question, Var[Y | X=1] = Var[Z | X=1] = 1/4. From a previous question, Covar[Y, Z | X=1] = -1/4.
Thus Corr[Y, Z | X=1] = (-1/4) / {(1/4)(1/4)}^0.5 = -1.
3.23. D. If X = 0, then of the original nine equally likely possibilities only 4 apply: 0,rg,0; 0,0,rg; 0,r,g; 0,g,r.
Thus E[N | X = 0] = (2 + 2 + 1 + 1)/4 = 3/2.
3.24. E. If X = 1, then there is exactly one other urn (either Y or Z) that is occupied, so that N = 2 and the (conditional) expected value of N is 2.
3.25. C. If X = 2, then there are no other occupied urns, so that N = 1 and the (conditional) expected value of N is 1.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 70
3.26 to 3.28. D., D., & C. Let u = T(x); u is uniformly distributed from 0 to ω - x.
Let v = T(y); v is uniformly distributed from 0 to ω - x.
Let w = T(xy) = min[u, v], the joint-life time. Let z = T(x̄ȳ) = max[u, v], the last-survivor time. Let ω - x = b.
Prob[W > w] = Prob[both future lifetimes > w] = Prob[u > w] Prob[v > w] = (1 - w/b)(1 - w/b).
f(w) = -d(Prob[W > w])/dw = (2/b)(1 - w/b), 0 ≤ w ≤ b.
Integrating w f(w) from 0 to b, E[W] = b/3. Integrating w² f(w) from 0 to b, E[W²] = b²/6.
⇒ Var[W] = b²/6 - (b/3)² = b²/18.
E[w | u] = E[min[u, v] | u] = Prob[v < u] E[v | v < u] + Prob[v ≥ u] u = (u/b)(u/2) + (1 - u/b)u = u - u²/(2b).
E[uw] = E[u E[w | u]] = E[u(u - u²/(2b))] = E[u²] - E[u³]/(2b) = ∫_{0}^{b} u²/b du - {∫_{0}^{b} u³/b du}/(2b) = b²/3 - b²/8 = 5b²/24.
Cov[u, w] = E[uw] - E[u]E[w] = 5b²/24 - (b/2)(b/3) = b²/24.
Corr[u, w] = Cov[u, w]/√(Var[u] Var[w]) = (b²/24)/√((b²/12)(b²/18)) = √6/4 = 0.612.
By symmetry, Corr[T(x), T(x̄ȳ)] ≡ Corr[u, max[u, v]] = Corr[u, min[u, v]] = 0.612.
u + v = min[u, v] + max[u, v] ≡ w + z. Therefore, Cov[u + v, u + v] = Cov[w + z, w + z].
Since u and v are independent, Var[u] + Var[v] = Var[w] + Var[z] + 2 Cov[w, z].
Cov[w, z] = (Var[u] + Var[v] - Var[w] - Var[z])/2 = (b²/12 + b²/12 - b²/18 - b²/18)/2 = b²/36.
Corr[w, z] = Cov[w, z]/√(Var[w] Var[z]) = (b²/36)/√((b²/18)(b²/18)) = 18/36 = 1/2.
Comment: Similar to Example 9.5.2 and Exercise 9.13 in Actuarial Mathematics.
3.29. (i) X̄ = 39.08/5 = 7.816. Xi - X̄ = -3.146, -1.846, -1.426, 1.224, 5.194.
sX² = {(-3.146)² + (-1.846)² + (-1.426)² + (1.224)² + (5.194)²}/(5 - 1) = 10.954.
(ii) Ȳ = 2,357.55/5 = 471.51. Yi - Ȳ = -127.71, -55.26, -43.76, 58.49, 168.24.
sY² = {(-127.71)² + (-55.26)² + (-43.76)² + (58.49)² + (168.24)²}/(5 - 1) = 13,251.
(iii) Cov[X, Y] = {(-3.146)(-127.71) + (-1.846)(-55.26) + (-1.426)(-43.76) + (1.224)(58.49) + (5.194)(168.24)}/(5 - 1) = 377.90.
(iv) r = Cov[X, Y]/(sX sY) = 377.90/√((10.954)(13,251)) = 0.992.
Comment: Setup taken from CAS3L, 5/08, Q.9.
For a linear regression, the fitted slope is β̂ = Cov[X, Y]/sX² = 377.90/10.954 = 34.5.
The fitted intercept is α̂ = Ȳ - β̂ X̄ = 471.51 - (34.5)(7.816) = 201.86.
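As an aside that is not in the original solution, the five (x, y) pairs implied by the stated means and deviations can be used to check the sample correlation and the fitted regression line by computer:

```python
# Five (x, y) pairs implied by the stated means and deviations in 3.29.
x = [4.670, 5.970, 6.390, 9.040, 13.010]      # mean 7.816
y = [343.80, 416.25, 427.75, 530.00, 639.75]  # mean 471.51

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)                        # 10.954
sy2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)                        # about 13,251
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)   # 377.90
r = cov / (sx2 * sy2) ** 0.5                                             # 0.992
slope = cov / sx2                                                        # 34.5
intercept = ybar - slope * xbar                                          # about 201.9
print(round(r, 3), round(slope, 1), round(intercept, 1))
```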
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 71 3.30. B. E[X] = (.3)(0) + (.3)(1) + (.4)(2) = 1.1. Var[X] = (.3)(02 ) + (.3)(12 ) + (.4)(22 )} - 1.12 = .69. E[Y] = (.3)(0) + (.4)(1) + (.3)(2) = 1. Var[Y] = (.3)(02 ) + (.4)(12 ) + (.3)(22 )} - 12 = .6. E[XY] = (.5)(0) + (.2)(1) + (.1)(2) + (.2)(4) = 1.2. Cov[X, Y] = E[XY] - E[X]E[Y] = 1.2 - (1.1)(1) = 0.1. Corr[X, Y] = Cov[X, Y]/ Var[X]Var[Y] = 0.1/ (0.69)(0.6) ) = 0.155. 3.31. A. Let X be the first chip and Y be the second chip picked. E[XY] = {(1)(2) + (1)(3) + (2)(1) + (2)(3) + (3)(1) + (3)(2)}/6 = 11/3. Cov[X, Y] = E[XY] - E[X]E[Y] = 11/3 - (2)(2) = -1/3. Var[X] = Var[Y] = 2/3. Corr[X, Y] = Cov[X, Y]/ Var[X]Var[Y] = (-1/3)/(2/3) = -1/2. 3.32. A. E[XY] = (0.1)(-1)(0) + (0.1)(0)(0) + (0.2)(1)(0) + (0.1)(-1)(1) + (0.3)(0)(1) + (0.2)(1)(1) = 0.1. E[X] = (0.2)(-1) + (0.4)(0) + (0.4)(1) = .2. E[Y] = (0.4)(0) + (0.6)(1) = 0.6. Cov[X, Y] = E[XY] - E[X]E[Y] = 0.1 - (0.2)(0.6) = -0.02. 3.33. C. E[XY] = (2/5)(1)(0) + (1/5)(2)(0) + (1/5)(1)(0) + (1/5)(1)(1) = 1/5. E[X] = (1/5)(0) + (3/5)(1) + (1/5)(2) = 1. E[Y] = (3/5)(0) + (2/5)(1) = 2/5. E[X2 ] = (1/5)(0) + (3/5)(1) + (1/5)(4) = 7/5. E[Y2 ] = (3/5)(0) + (2/5)(1) = 2/5. E[Y - X] = E[Y] - E[X] = 2/5 - 1 = -3/5. E[(Y - X)2 ] = E[Y2 ] + E[X2 ] - 2E[X]E[Y] = 2/5 + 7/5 - (2)(1)(1/5) = 7/5. Var[Y - X] = 7/5 - (-3/5)2 = 26/25. Alternately, Var[X] = 7/5 - 12 = 2/5. Var[Y] = 2/5 - (2/5)2 = 6/25. Cov[X, Y] = E[XY] - E[X]E[Y] = 1/5 - (1)(2/5) = -1/5. Var[Y - X] = Var[Y] + Var[X] - 2Cov[X, Y] = 6/25 + 2/5 + 2/5 = 26/25. 3.34. A.
E[XY] = ∫_{y=0}^{1} ∫_{x=0}^{y} (xy)(6x) dx dy = ∫_{y=0}^{1} 2y⁴ dy = 2/5.
Cov[X, Y] = E[XY] - E[X]E[Y] = 2/5 - (1/2)(3/4) = 1/40. 3.35. C. Var[Xi] = 1/12.
Var[X1 + 2X2 - X3 ] =
Var[X1 ] + 4Var[X2 ] + Var[X3 ] + 4Cov[X1 , X2 ] - 2Cov[X1 , X3 ] - 4Cov[X2 , X3 ] = 6/12 - 2/24 = 5/12.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 72
3.36. B. E[XY] = (.05)(2)(0) + (.05)(3)(0) + (.15)(4)(0) + (.05)(5)(0) + (.4)(2)(1) + (.05)(2)(2) + (.15)(2)(3) + (.1)(2)(4) = 2.7.
Cov[X, Y] = E[XY] - E[X]E[Y] = 2.7 - 2.85 = -0.15.
3.37. ρAC = 1.0. ⇒ ρBC = ρAB = 0.6.
Variance of a portfolio consisting of equal weights of stocks B and C:
(0.5²)σB² + (0.5²)σC² + (0.5)(0.5)2ρBCσBσC = (0.5²)(0.7) + (0.5²)(0.6) + (0.5)(0.5)(2)(0.6)√0.7 √0.6 = 0.519.
3.38. D. Var[A] = {(mean)(CV)}2 = 25. Var[B] = {(5)(1)}2 = 25. Var[C] = {(20)(1/2)}2 = 100. Var[X] = Var[A] + Var[B] = 25 + 25 = 50, since A and B are independent. Var[Y] = Var[A] + Var[C] = 25 + 100 = 125, since A and C are independent. Cov[X, Y] = Cov[A+B, A+C] = Cov[A, A] + Cov[A, C] + Cov[B, A] + Cov[B, C] = Var[A] + 0 + 0 + 0 = 25. Corr[X , Y] = Cov[X , Y] /
√(Var[X] Var[Y]) = 25 / √((50)(125)) = 1/√10.
Comment: Since A, B, and C are independent, Cov[A, C] = Cov[B, A] = Cov[B, C] = 0. 3.39. E[X] = {(50)(-2) + (90)(0) + (60)(2)}/200 = 0.1. E[Y] = {(50)(2) + (90)(0) + (60)(1)}/200 = 0.2. E[XY] = {(50)(-2)(2) + (90)(0)(0) + (60)(2)(1)}/200 = -1.6. E[X2 ] = {(50)(-2)2 + (90)(0)2 + (60)(2)2 }/200 = 2.2. E[Y2 ] = {(50)(2)2 + (90)(0)2 + (60)(1)2 }/200 = 1.3. Cov[X, Y] = E[XY] - E[X]E[Y] = - 1.6 - (.1)(.2) = -1.62. Var[X] = E[X2 ] - E[X]2 = 2.2 - 0.12 = 2.19. Var[Y] = E[Y2 ] - E[Y]2 = 1.3 - 0.22 = 1.26. Corr[X, Y] = Cov[X, Y]/ Var[X]Var[Y] = -1.62/ (2.19)(1.26) = -0.975.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 73 3.40. C.
E[XY] = ∫_{x=0}^{12} ∫_{y=0}^{x} (xy) 1/(12x) dy dx = ∫_{x=0}^{12} x²/24 dx = 24.
E[X] = ∫_{x=0}^{12} ∫_{y=0}^{x} x 1/(12x) dy dx = ∫_{x=0}^{12} x/12 dx = 6.
E[Y] = ∫_{x=0}^{12} ∫_{y=0}^{x} y 1/(12x) dy dx = ∫_{x=0}^{12} x/24 dx = 3.
Cov[X, Y] = E[XY] - E[X]E[Y] = 24 - (6)(3) = 6.
3.41. Cov[X, Y] = Cov[X, X - Z] = Cov[X, X] - Cov[X, Z] = Var[X] - 0 = 4.
Var[Y] = Var[X - Z] = Var[X] + Var[Z] - 2Cov[X, Z] = 4 + 1 - (2)(0) = 5.
Corr[X, Y] = 4/√((4)(5)) = 0.894.
3.42. D. For the five observations, E[XY] = {(1)(4) + (2)(2) + (4)(3) + (5)(6) + (6)(4)}/5 = 14.8.
Cov[X,Y] = E[XY] - E[X]E[Y] = 14.8 - (3.6)(3.8) = 1.12 = A.
If instead we had the maximum correlation between the observed x and y, keeping the observed sets of values of x and y the same as before:
x    y
1    2
2    3
4    4
5    4
6    6
E[XY] = {(1)(2) + (2)(3) + (4)(4) + (5)(4) + (6)(6)}/5 = 16.
Cov[X,Y] = E[XY] - E[X]E[Y] = 16 - (3.6)(3.8) = 2.32 = B.
B - A = 2.32 - 1.12 = 1.2.
Comment: The maximum correlation and covariance occurs when the smallest x corresponds to the smallest y, and the largest x corresponds to the largest y.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 74
3.43. B. Since f can be written as a product of a function of x and a function of y (and the support of x does not depend on y and vice versa), x and y are independent. ⇒ Cov[X, Y] = 0.
Alternately, the double integral of x over the square is 1/2, so k = 2.
E[XY] = ∫_{x=0}^{1} ∫_{y=0}^{1} (xy)(2x) dy dx = 1/3.
E[X] = ∫_{x=0}^{1} ∫_{y=0}^{1} x (2x) dy dx = 2/3.
E[Y] = ∫_{x=0}^{1} ∫_{y=0}^{1} y (2x) dy dx = 1/2.
Cov[X, Y] = E[XY] - E[X]E[Y] = 0. 3.44. E. E[S] = (10)(1.5) = 15. Var[S] = (10)(1.52 ) = 22.5. E[S*] = (10)((1+2)/2) = 15. Var[S*] = (10)(2nd mom. of severity) = (10){(12 + 22 )/2} = 25. E[SS*] = En [E[SS* | n]] = En [E[1.5nS* | n]] = En [1.5nE[S* | n]] = En [1.5n1.5n] = 2.25 En [n2 ] = 2.25(2nd moment of the Poisson) = 2.25(10 + 102 ) = 247.5. COV[S, S*] ≡ E[SS*] - E[S]E[S*] = 247.5 - (15)(15) = 22.5. Corr[S, S*] ≡ COV[S, S*]/ Var[S]Var[S*] = 22.5/ (22.5)(25) = 0.949.
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 75 3.45. E. Assume for example, that all the constants are positive. If d = 1000 and X = 1002 ⇒ Z 1 and Z2 are more likely to be positive than negative ⇒ E[Z1 ] > 0 and E[Z2 ] > 0 ⇒ E[Y | X = 1002] > 0. If d = 1000, then E[Y | X = 998] < 0. So we expect E[Y | X] to depend on X - d. This eliminates choices A and D. Now try a = 0, b = 1, c = 0, d = 0, e = 0, f = 1. Then Y = Z1 and X = Z2 . X and Y are independent
⇒ E[Y | X] = E[Y] = E[Z1] = 0. However, in this case choice B would give: X, eliminating choice B.
Now try a = 0, b = 1, c = 1, d = 0, e = 1, f = 1. Then Y = Z1 + Z2 = X. E[Y | X] = X.
However, in this case choice C would give 2X, eliminating choice C.
Thus the answer must be the only remaining choice E.
Alternately, notice that all of the choices are linear in X. Thus in this case, the least squares linear estimator must be exact; i.e., equal to E[Y | X]. Applying linear regression, the least squares linear estimator of Y given X has slope:
Cov[X, Y]/Var[X] = Cov[d + eZ1 + fZ2, a + bZ1 + cZ2]/Var[d + eZ1 + fZ2]
= {eb Var[Z1] + cf Var[Z2]}/{e² Var[Z1] + f² Var[Z2]} = (be + cf)/(e² + f²),
using the fact that Z1 and Z2 are independent and each have variance of 1.
The intercept is: E[Y] - βE[X] = a - d(be + cf)/(e² + f²).
So the linear least squares estimator is:
a - d(be + cf)/(e² + f²) + X(be + cf)/(e² + f²) = a + {(be + cf)/(e² + f²)}(X - d).
Alternately, one could prove the result as follows: Let G = Z1 + uZ2 and H = Z1 + vZ2.
Prob[Z1 = z | G = g] = Prob[Z1 = z and G = g]/Prob[G = g] = Prob[Z1 = z] Prob[Z2 = (g - z)/u]/Prob[G = g] = φ[z] φ[(g - z)/u]/Prob[G = g].
The density of the Standard Normal is: φ[z] = exp[-z²/2]/√(2π).
Therefore, given G, the density of Z1 is proportional to:
φ[z] φ[(g - z)/u] ~ exp[-z²/2] exp[-0.5(g - z)²/u²] = exp[-0.5{z²(1 + u²)/u² - 2gz/u² + g²/u²}]
~ exp[-0.5{z²(1 + u²)/u² - 2gz/u² + g²/(u² + u⁴)}] = exp[-0.5{(z/u)√(1 + u²) - g/√(u² + u⁴)}²]
= exp[-0.5{z - g/(1 + u²)}²/{u²/(1 + u²)}],
which is proportional to a Normal Distribution with µ = g/(1 + u²) and σ² = u²/(1 + u²).
⇒ Given G, the density of Z1 is a Normal Distribution with µ = g/(1 + u2 ) and σ2 = u2 /(1 + u2 ). ⇒ E[Z1 | G = g] = g/(1 + u2 ).
2013-4-9 Buhlmann Credibility §3 Covariances and Correlations, HCM 10/19/12, Page 76 Given G, the density of Z2 is proportional to: Prob[Z2 = z and G = g] = Prob[ Z1 = g - uz]Prob[Z2 = z] = φ[g - uz] φ[z] ~ exp[-.5(g - uz)2 ] exp[-z2 /2] = exp[-.5{z2 (1 + u2 ) - 2gzu + g2 )}]
~ exp[-0.5{z²(1 + u²) - 2gzu + g²u²/(1 + u²)}] = exp[-0.5{z√(1 + u²) - gu/√(1 + u²)}²] = exp[-0.5{z - gu/(1 + u²)}²/{1/(1 + u²)}], which is proportional to a Normal Distribution with µ = gu/(1 + u²) and σ² = 1/(1 + u²).
⇒ Given G, the density of Z2 is a Normal Distribution with µ = gu/(1 + u2 ) and σ2 = 1/(1 + u2 ). ⇒ E[Z2 | G = g] = gu/(1 + u2 ). E[H | G = g] = E[Z1 + vZ2 | G = g] = E[Z1 | G = g] + vE[Z2 | G = g] = g/(1 + u2 ) + vgu/(1 + u2 ) = g(1 + uv)/(1 + u2 ). X = d + eZ1 + fZ2 ⇒ (X - d)/e = Z1 + (f/e)Z2 . Let G = (X - d)/e, and u = f/e. Y = a + bZ1 + cZ2 ⇒ (Y - a)/b = Z1 + (c/b)Z2 . Let H = (Y - a)/b, and v = c/b. Then E[Y | X] = E[a + bH | G = (X - d)/e] = a + b E[H | G = (X - d)/e] = a + b {(X - d)/e}(1 + uv)/(1 + u2 ) = a + b {(X - d)/e}{1 + (f/e)(c/b)}/(1 + (f/e)2 ) = a + (X - d) (be + cf) / (e2 + f2 ). Comment: Beyond what you are likely to be asked on your exam, but not beyond what you could be asked. Some exams have a very hard question like this one. You can not be expected to prove this result under exam conditions! X and Y are Bivariate Normal and E[Y | X] is linear in X; see example 5d and Section 7.7 in A First Course in Probability by Ross. For a more general discussion, see page 206 of Applied Regression Analysis by Draper and Smith.
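The following Monte Carlo sketch is an addition, not part of the original solution; it checks the result E[Y | X] = a + (X - d)(be + cf)/(e² + f²) for one arbitrary, hypothetical choice of the constants, by simulating Z1 and Z2 and averaging Y over outcomes where X falls near a chosen value.

```python
import random

# Hypothetical constants: Y = a + b*Z1 + c*Z2 and X = d + e*Z1 + f*Z2.
a, b, c = 1.0, 2.0, -1.0
d, e, f = 3.0, 1.5, 0.5
slope = (b * e + c * f) / (e ** 2 + f ** 2)

random.seed(1)
total_y, count = 0.0, 0
for _ in range(1_000_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = d + e * z1 + f * z2
    if 3.9 < x < 4.1:                      # condition on X falling near 4
        total_y += a + b * z1 + c * z2
        count += 1

print(total_y / count)                      # simulated E[Y | X near 4]
print(a + (4.0 - d) * slope)                # formula: a + (X - d)(be + cf)/(e^2 + f^2) at X = 4
```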
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 77
Section 4, Bayesian Analysis, Introduction

Bayesian Analysis will be discussed in this and the following two sections.18 It is also discussed to some extent in the section on the Die-Spinner Models and in the section on the Philbrick Target Shooting Example.

Bayes Theorem:

Take the following simple example. Assume there are two types of risks, each with Bernoulli claim frequencies. One type of risk has a 30% chance of a claim (and a 70% chance for no claims.) The second type of risk has a 50% chance of having a claim. Of the universe of risks, 3/4 are of the first type with a 30% chance of a claim, while 1/4 are of the second type with a 50% chance of having a claim.

Type of Risk    A Priori Probability    Chance of a Claim
1               3/4                     30%
2               1/4                     50%

If a risk is chosen at random, then the chance of having a claim is (3/4)(30%) + (1/4)(50%) = 35%. In this simple example, there are two possible outcomes: either we observe 1 claim or no claims. Thus the chance of no claims is 65%.
Assume we pick a risk at random and observe no claim. Then what is the chance that we have risk Type 1? By the definition of conditional probability we have:
P(Type = 1 | n = 0) = P(Type = 1 and n = 0) / P(n = 0).
However, P(Type = 1 and n = 0) = P(n = 0 | Type = 1) P(Type = 1) = (0.7)(0.75).
Therefore, P(Type = 1 | n = 0) = P(n = 0 | Type = 1) P(Type = 1) / P(n = 0) = (0.7)(0.75) / 0.65 = 0.8077.
This is a special case of Bayesʼ Theorem:
P(A | B) = P(B | A) P(A) / P(B).
P(Risk Type | Observation) = P(Observation | Risk Type) P(Risk Type) / P(Observation).
Exercise: Assume we pick a risk at random and observe no claim. Then what is the chance that we have risk Type 2?
[Solution: P(Type = 2 | n = 0) = P(n = 0 | Type = 2) P(Type = 2) / P(n = 0) = (0.5)(0.25) / 0.65 = 0.1923.]
18 This material could just as easily all be in one big section; the division into three is somewhat artificial.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 78
Of course with only two types of risks the chance of a risk being Type 2 is unity minus the chance of being Type 1. We observe that 0.8077 + 0.1923 = 1.
Some may find it helpful to assume we have for example 1000 risks, 750 of Type 1 and 250 of Type 2. Of the 750 risks of Type 1, on average (70%)(750) = 525 have no claim. Of the 250 risks of Type 2, on average (50%)(250) = 125 have no claim. Thus there are expected to be 650 risks with no claim. Of these 650 risks, 525 or 80.77% are of Type 1, and 125 or 19.23% are of Type 2.

Estimating the Future from an Observation:

Now not only do we have probabilities posterior to an observation, but we can use these to estimate the chance of a claim if the same risk is observed again. For example, if we observe no claim the estimated claim frequency for the same risk is:
(post. prob. Type 1)(claim freq. Type 1) + (posterior prob. Type 2)(claim freq. Type 2) = (0.8077)(30%) + (0.1923)(50%) = 33.85%.
This type of Bayesian analysis can be organized into a spreadsheet. For the above example with an observation of zero claims:

A          B                  C                D                E                  F
Type of    A Priori Chance    Chance of the    Prob. Weight =   Posterior Chance   Mean
Risk       of This Type       Observation      Product of       of This Type       Annual
           of Risk                             Columns B & C    of Risk            Freq.
1          0.75               0.7              0.525            80.77%             0.30
2          0.25               0.5              0.125            19.23%             0.50
Overall                                        0.650            1.000              33.85%
Study this very carefully. Organize all of your solutions of Bayesian Analysis questions in a single manner that works well for you. In my spreadsheet, one lists the different types of risks.19 Then list the a priori chance of each type of risk. Next determine the chance of the observation given a particular type of risk. Next compute: “probability weights” = (a priori chance of risk)(chance of observation given that type of risk). These are the Bayes Theorem probabilities except we have not divided by the a priori chance of the observation. We can automatically convert the “probability weights” to probabilities by dividing by their sum so that they add up to unity. (Note how the sum of the probability weights is 0.65, which is the a priori chance of observing zero claims.) Then one can use these posterior probabilities to estimate any quantity of interest; in this case we get a posterior estimate of the frequency.20
19 In this example there are only two types of risks. In more complicated examples I sometimes use additional columns of the spreadsheet to list characteristics of the different types of risks.
20 In more complicated examples I sometimes use additional columns of the spreadsheet to compute the quantity of interest for the different types of risks.
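For study at home (you will not have a computer at the exam), the same spreadsheet can be done in a few lines of code. The sketch below is an addition, not part of the original guide; the inputs are the two types of risks above, with an observation of zero claims.

```python
a_priori   = [0.75, 0.25]   # column B: a priori chance of each type of risk
chance_obs = [0.70, 0.50]   # column C: chance of no claims, given the type
means      = [0.30, 0.50]   # column F: mean annual claim frequency of each type

weights = [p * c for p, c in zip(a_priori, chance_obs)]     # column D: probability weights
total = sum(weights)                                        # 0.65, the a priori chance of no claims
posterior = [w / total for w in weights]                    # column E: 80.77%, 19.23%
estimate = sum(p * m for p, m in zip(posterior, means))     # 33.85%
print(total, [round(p, 4) for p in posterior], round(estimate, 4))
```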
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 79
Note that the posterior estimate is a weighted average of the hypothetical means for the different types of risks. Thus the posterior estimate of 33.85% is in the range of the hypotheses, 30% to 50%. This is true in general for Bayesian analysis. The result of Bayesian Analysis is always within the range of hypotheses. This is not necessarily true for the results of applying Credibility.
Exercise: What if a risk is chosen at random and one claim is observed? What is the posterior estimate of the chance of a claim from this same risk?
[Solution: (0.6429)(0.3) + (0.3571)(0.5) = 37.14%.

A          B                  C                D                E                  F
Type of    A Priori Chance    Chance of the    Prob. Weight =   Posterior Chance   Mean
Risk       of This Type       Observation      Product of       of This Type       Annual
           of Risk                             Columns B & C    of Risk            Freq.
1          0.75               0.3              0.225            64.29%             0.30
2          0.25               0.5              0.125            35.71%             0.50
Overall                                        0.350            1.000              37.14%
For example, P(Type = 1 | n = 1) = P(Type =1 and n =1) / P(n=1) = (0.75)(0.3) / 0.35 = 0.643, P(Type = 2 | n = 1) = P(Type =2 and n =1) / P(n=1) = (0.25)(0.5) / 0.35 = 0.357.] Note how the estimate posterior to the observation of one claim is 37.14%, greater than the a priori estimate of 35%. The observation has let us infer that it is more likely that the risk is of the high frequency type than it was prior to the observation. Thus we infer that the future chance of a claim from this risk is higher than it was prior to the observation. Similarly, the estimate posterior to the observation of no claim is 33.85%, less than the a priori estimate of 35%.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 80 Steps for Bayesian Analysis: There are generally the following steps to Bayesian Analysis exam questions: 1. Read and understand the given model21. An overall population is divided into different subgroups with (possibly) different a priori probabilities. 2. Calculate the expected value of the quantity of interest22 for each a priori possibility23. 3. Read and understand the observation(s) of an individual from the overall population.24 Compute the chance of the observation given a risk from each of the subgroups of the overall population. 4. Compute the posterior probabilities using Bayesʼ Theorem, using steps 1 and 3. 5. Take a weighted average of the expected values from step 2, using the posterior probabilities from step 4. Note that this assumes that we have observed an individual from the overall population without knowing which subgroup it is from. Then we are estimating the future outcome for the same individual observed in step 3. If instead one picked a new individual from the population, then the estimate would be the weighted average of the expected values from step 2, using the a priori probabilities from step 1 rather than the posterior probabilities from step 4.
21
In actual applications you may specify the model yourself. On the exam the model will be specified for you. It may involve dice, urns, spinners, insured drivers, etc. 22 The quantity of interest may be the sum of die rolls, claim frequency, claim severity, total losses, etc. 23 The a priori possibilities may be different urns, spinners, classes, etc. 24 In actual applications you may need to obtain the relevant data. On the exam the data will be given you. It may involve rolls of dice, spins of spinners, dollars of loss. etc.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 81
Multi-Sided Dice Example:
Letʼs illustrate Bayesian Analysis with a simple example involving multi-sided dice:
Assume that there are a total of 100 multi-sided dice of which 60 are 4-sided, 30 are 6-sided, and 10 are 8-sided. The multi-sided dice with 4 sides have 1, 2, 3 and 4 on them.25 The multi-sided dice with the usual 6 sides have numbers 1 through 6 on them. The multi-sided dice with 8 sides have numbers 1 through 8 on them.26 For a given die each side has an equal chance of being rolled; i.e., the die is fair.
Your friend has picked at random a multi-sided die. (You do not know what sided die he has picked.) He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again.
If the result is a 3 then the estimate of the next roll of the same die is 2.853:

A          B                  C                D                E                  F
Type of    A Priori Chance    Chance of the    Prob. Weight =   Posterior Chance   Mean
Die        of This Type       Observation      Product of       of This Type       Die
           of Die                              Columns B & C    of Die             Roll
4-sided    0.600              0.250            0.1500           70.6%              2.5
6-sided    0.300              0.167            0.0500           23.5%              3.5
8-sided    0.100              0.125            0.0125            5.9%              4.5
Overall                                        0.2125           1.000              2.853
The general steps to Bayesian Analysis exam questions were in this case: 1. “Read and understand the given model .” Make sure you understand what is meant by the three different types of dice and note their a priori probabilities. 2. “Calculate the expected value of the quantity of interest for each a priori possibility.” The mean die rolls for the three types of dice are: 2.5, 3.5 and 4.5. 3. “Read and understand the observation(s).” There is a single die roll for a 3. 4. “Compute the posterior probabilities using Bayesʼ Theorem.” The posterior probabilities for the three types of dice are: 70.6%, 23.5%, and 5.9%. 5. “Take a weighted average of the expected values from step 2, using the posterior probabilities from step 4.” (70.6%)(2.5) + (23.5%)(3.5) + (5.9%)(4.5) = 2.85.
25 The mean of a 4-sided die is: (1 + 2 + 3 + 4)/4 = 2.5.
26 The mean of an 8-sided die is: (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8)/8 = (1+8)/2 = 4.5.
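As an addition to the guide (not something available at the exam), here is the same dice calculation in code; the function below is only an illustration, but it reproduces the estimate of 2.853 for an observed 3, and 3.700 for an observed 6 as in the exercise that follows.

```python
dice = {4: 0.60, 6: 0.30, 8: 0.10}    # number of sides: a priori probability

def bayes_estimate(roll):
    # probability weight = (a priori chance of die) x (chance of rolling this result on that die)
    weights = {n: p * (1.0 / n if roll <= n else 0.0) for n, p in dice.items()}
    total = sum(weights.values())                  # a priori chance of the observation
    means = {n: (n + 1) / 2 for n in dice}         # mean roll of an n-sided die
    return sum((w / total) * means[n] for n, w in weights.items())

print(round(bayes_estimate(3), 3))   # 2.853
print(round(bayes_estimate(6), 3))   # 3.700
```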
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 82
Exercise: If instead a 6 is rolled, what is the estimate of the next roll of the same die?
Solution: The estimate of the next roll of the same die is 3.700:

A          B                  C                D                E                  F
Type of    A Priori Chance    Chance of the    Prob. Weight =   Posterior Chance   Mean
Die        of This Type       Observation      Product of       of This Type       Die
           of Die                              Columns B & C    of Die             Roll
4-sided    0.600              0.000            0.0000            0.0%              2.5
6-sided    0.300              0.167            0.0500           80.0%              3.5
8-sided    0.100              0.125            0.0125           20.0%              4.5
Overall                                        0.0625           1.000              3.700
Alternately, assume we picked each of the 100 dice, and rolled each 24 times. Of these 2400 rolls, 1440 would be of 4-sided dice, 720 would be of 6-sided dice, and 240 would be of 8-sided dice. The expected total number of sixes rolled is: 720/6 + 240/8 = 120 + 30 = 150. Of the sixes rolled, 120/150 = 80% are from 6-sided dice, and 30/150 = 20% are from 8-sided dice. Proceed as before.]
For this multisided die example, we get the following set of estimates corresponding to each possible observation:

Observation          1      2      3      4      5    6    7    8
Bayesian Estimate    2.853  2.853  2.853  2.853  3.7  3.7  4.5  4.5
Note that while in this simple example the posterior estimates are the same for a number of different observations, this is not usually the case.
Exercise: What is the a priori chance of each possible outcome?
[Solution: In this case there is a 60% / 4 = 15% chance that a 4-sided die will be picked and then a 1 will be rolled. Similarly, there is a 30% / 6 = 5% chance that a 6-sided die will be selected and then a 1 will be rolled. There is a 10% / 8 = 1.25% chance that an 8-sided die will be selected and then a 1 will be rolled. The total chance of a 1 is therefore: 15% + 5% + 1.25% = 21.25%.

Roll      Probability due    Probability due    Probability due    A Priori
of Die    to 4-sided die     to 6-sided die     to 8-sided die     Probability
1         0.15               0.05               0.0125             0.2125
2         0.15               0.05               0.0125             0.2125
3         0.15               0.05               0.0125             0.2125
4         0.15               0.05               0.0125             0.2125
5                            0.05               0.0125             0.0625
6                            0.05               0.0125             0.0625
7                                               0.0125             0.0125
8                                               0.0125             0.0125
Sum       0.6                0.3                0.1                1
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 83 Using the Multiview Calculator: Using the TI-30X-IIS Multiview, one could work as follows to do Bayes Analysis for this multisided die example when a 3 is rolled: DATA DATA Clear L1 ENTER 0.6 x 1/4 ENTER 0.3 x 1/6 ENTER 0.1 x 1/8 ENTER (The three probability weights should now be in the column labeled L1.) DATA Clear L2 ENTER (Use on the big button at the upper right to select Clear L2.) 2.5 ENTER 3.5 ENTER 4.5 ENTER (The means of the three types of dice should now be in the column labeled L2.) 2nd STAT 1-VAR ENTER (If necessary use on the big button at the upper right to select 1-VAR.) DATA L2 ENTER (Use on the big button at the upper right to select DATA L2.) FRQ L1 ENTER (Use and on the big button at the upper right to select FRQ L1.) CALC ENTER (Use and on the big button at the upper right to select CALC.) Various outputs are displayed. Use and on the big button to scroll through them. n = 0.2125. (the sum of the weights, the a priori chance of the observation.) X = 2.853 (weighted average of the means in L1 with weights in L2, the estimate using Bayes Analysis.) To exit stat mode, hit 2ND QUIT. To display the outputs again: 2nd STAT STATVAR ENTER (Use on the big button at the upper right to select STATVAR.)
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 84
Using the Functions of the Calculator to Compute a Weighted Average:
Assume the following:

Value    Weight
2.5      0.6
3.5      0.3
4.5      0.1

The weighted average is: {(0.6)(2.5) + (0.3)(3.5) + (0.1)(4.5)} / (0.6 + 0.3 + 0.1) = 3.0.
Note that as is often the case, here the weights add to one, so there is no need to divide by the sum of the weights. You could just calculate this weighted average directly. Alternately, you can use the statistics functions of the allowed electronic calculators to calculate weighted averages. If you already know how to fit a linear regression, you can just use some of the outputs of that. We let X be 2.5, 3.5, 4.5, and the dependent variable Y be the weights: 0.6, 0.3, 0.1.
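For completeness, here is the same weighted average done in code rather than on a calculator; this snippet is an addition, not part of the original guide.

```python
values  = [2.5, 3.5, 4.5]
weights = [0.6, 0.3, 0.1]
weighted_average = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted_average)   # 3.0
```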
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 85 Using the TI-30X-IIS Multiview, one would fit a straight line with intercept as follows: DATA DATA Clear L1 ENTER 2.5 ENTER 3.5 ENTER 4.5 ENTER (The three values should now be in the column labeled L1.) DATA Clear L2 ENTER (Use on the big button at the upper right to select Clear L2.) 0.6 ENTER 0.3 ENTER 0.1 ENTER (The three weights should now be in the column labeled L2.) 2nd STAT 2-VAR ENTER (Use on the big button at the upper right to select 2-VAR.) CALC ENTER (Use and on the big button at the upper right to select CALC.) Various outputs are displayed. Use and on the big button to scroll through them. n = 3. (number of data points.) etc. ΣY = 1 (sum of the weights) ΣXY = 3 a = -0.25 (slope) b = 1.208 (intercept) In general the weighted average is ΣXY / ΣY. To exit stat mode, hit 2ND QUIT.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 86 Alternately, using the TI-30X-IIS Multiview: DATA DATA Clear All ENTER (Use on the big button at the upper right to select Clear All.) 2.5 ENTER 3.5 ENTER 4.5 ENTER (The three values should now be in the column labeled L1.) 0.6 ENTER 0.3 ENTER 0.1 ENTER (The three weights should now be in the column labeled L2.) 2nd STAT 1-VAR ENTER Under Data select L1 (If necessary use and on the big button at the upper right) Under Frequency select L2 (If necessary use and on the big button at the upper right) CALC ENTER ( Use and on the big button at the upper right to select CALC.) Various outputs are displayed. Use and on the big button to scroll through them. n = 1. (The sum of the weights.) X = 3 (The weighted average.) To exit stat mode, hit 2ND QUIT.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 87 Using the TI-30X-IIS, one would fit a straight line with intercept as follows: 2nd STAT CLRDATA ENTER 2nd STAT 2-VAR ENTER (Use the key if necessary to select 2-VAR rather than 1-VAR.) DATA X1 = 2.5 Y 1 = 0.6 X2 = 3.5 Y 2 = 0.3 X3 = 4.5 Y 3 = 0.1 ENTER STATVAR Various outputs are displayed. Use the key and the key to scroll through them.) n = 3. (number of data points.) etc. ΣY = 1 (sum of the weights) ΣXY = 3 a = -0.25 (slope)
b = 1.208 (intercept)
In general the weighted average is ΣXY / ΣY. Alternately, using the TI-30X-IIS: 2nd STAT CLRDATA ENTER 2nd STAT 1-VAR ENTER (Use the key if necessary to select 2-VAR rather than 1-VAR.) DATA X1 = 2.5 Freq = 6 X2 = 3.5
(Frequencies have to be integers and proportional to the weights.)
Freq = 3 X3 = 4.5 Freq = 1 ENTER STATVAR Various outputs are displayed. Use the key and the key to scroll through them.) X = 3 (The weighted average.)
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 88 Using the BA II Plus Professional, one would fit a straight line with intercept as follows: 2nd DATA 2nd CLR WORK X1 2.5 ENTER ↓ Y 1 0.6 ENTER ↓ X2 3.5 ENTER ↓ Y2 0.3 ENTER ↓ X3 4.5 ENTER ↓ Y3 0.1 ENTER ↓ 2nd STAT If necessary press 2nd SET until LIN is displayed (for linear regression) Various outputs are displayed. Use the keys ↓ and ↑ to scroll through them. n = 3. (number of data points.) etc. ΣY = 1 (sum of the weights) ΣXY = 3 a = 1.208 (intercept) b = -0.25 (slope) In general the weighted average is ΣXY / ΣY.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 89 Balance: Exercise: What is the a priori overall mean? [Solution: The a priori chance of the types of dice are 60%, 30%, and 10%, with means of 2.5, 3.5, and 4.5. Therefore, the a priori overall mean is: (60%)(2.5) + (30%)(3.5) + (10%)(4.5) = 3.] Note that the Bayesian Estimates are in balance; the weighted average of the Bayesian Estimates, using the a priori chance of each observation, is equal to the a priori overall mean of 3: (0.2125)(1) + (0.2125)(2) + (0.2125)(3) + (0.2125)(4) + (0.0625)(5) + (0.0625)(6) + (0.0125)(7) + (0.0125)(8) = 3.
Roll of Die    A Priori Probability    Bayesian Analysis Estimate
1              0.2125                  2.853
2              0.2125                  2.853
3              0.2125                  2.853
4              0.2125                  2.853
5              0.0625                  3.7
6              0.0625                  3.7
7              0.0125                  4.5
8              0.0125                  4.5
Average        1                       3.000
If Di are the possible outcomes, then the Bayesian estimates are E[X | Di ]. Then ΣP(Di ) E[X | Di ] = E[X] = the a priori mean. The estimates that result from Bayesian Analysis are always in balance: The sum of the product of the a priori chance of each outcome times its posterior Bayesian estimate is equal to the a priori mean. Assume we were to repeat many times the following simulation of the multi-sided die example: 1. Choose a die. 2. Roll the die. 3. Apply Bayes Analysis to predict the outcome of the next die roll from the same die. Then the average of the results of step three would be the a priori mean of the model, 3. This is what we mean by the estimates that result from Bayesian Analysis are always in balance. If the original model is correct, then the expected value of the Bayes Estimator is the a priori mean.
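The balance property can also be checked numerically; the following short snippet is an addition to the guide and simply recomputes the weighted average of the Bayesian estimates using the a priori chance of each roll.

```python
a_priori  = [0.2125, 0.2125, 0.2125, 0.2125, 0.0625, 0.0625, 0.0125, 0.0125]
bayes_est = [2.853, 2.853, 2.853, 2.853, 3.7, 3.7, 4.5, 4.5]
balance = sum(p * e for p, e in zip(a_priori, bayes_est))
print(round(balance, 2))   # 3.0, the a priori mean
```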
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 90
Multi-sided Dice Example, More than One Roll:
If one observes more than one roll from the same die, the Bayesian Analysis proceeds in the same fashion as in the previous case of one roll, but one computes the probability of observing these rolls. Sometimes one is just given the sum of the observations. Again one just computes the probability of the observation given the particular type of die.
For example, if one observes two rolls that sum to 11, the estimate of the next roll is 3.86:

A          B                  C                D                E                  F
Type of    A Priori Chance    Chance of the    Prob. Weight =   Posterior Chance   Mean
Die        of This Type       Observation      Product of       of This Type       Die
           of Die                              Columns B & C    of Die             Roll
4-sided    0.600              0.0000           0.0000            0.0%              2.5
6-sided    0.300              0.0556           0.0167           64.0%              3.5
8-sided    0.100              0.0938           0.0094           36.0%              4.5
Overall                                        0.0260           1.000              3.86
For example, the chance of the observation is 6/64 = 0.09375 if your friend has been rolling an 8-sided die. Two rolls of an 8-sided die can sum to 11 with rolls of: (3,8), (4,7), (5,6), (6,5), (7,4) or (8,3), for six possibilities out of 8² = 64.
Exercise: If the sum of three rolls is 4, what is the estimate of the next roll from the same die?
[Solution: The estimate using Bayesian Analysis of the next roll is 2.66.

A          B                  C                D                E                  F
Type of    A Priori Chance    Chance of the    Prob. Weight =   Posterior Chance   Mean
Die        of This Type       Observation      Product of       of This Type       Die
           of Die                              Columns B & C    of Die             Roll
4-sided    0.600              0.04688          0.02813          85.5%              2.5
6-sided    0.300              0.01389          0.00417          12.7%              3.5
8-sided    0.100              0.00586          0.00059           1.8%              4.5
Overall                                        0.03288          1.000              2.66
For example, the chance of the observation is 3/216 if your friend has been rolling a 6-sided die. Three rolls of a 6-sided die can sum to 4 with rolls of: (1,1,2), (1,2,1), or (2,1,1), for three possibilities out of 6³ = 216.]
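As a final addition (not part of the original guide), the sum-of-rolls case can be checked by brute force: count the ways k rolls of an n-sided die can reach the observed total, and then redo the usual probability-weight calculation. The function below is only an illustration, but it reproduces the estimates of 3.86 and 2.66 above.

```python
from itertools import product

dice = {4: 0.60, 6: 0.30, 8: 0.10}   # number of sides: a priori probability

def bayes_estimate_from_sum(total, k):
    weights = {}
    for n, p in dice.items():
        # count the rolls of k n-sided dice whose faces add up to the observed total
        ways = sum(1 for rolls in product(range(1, n + 1), repeat=k) if sum(rolls) == total)
        weights[n] = p * ways / n ** k            # (a priori chance) x (chance of the observation)
    denom = sum(weights.values())
    return sum((w / denom) * (n + 1) / 2 for n, w in weights.items())

print(round(bayes_estimate_from_sum(11, 2), 2))   # 3.86
print(round(bayes_estimate_from_sum(4, 3), 2))    # 2.66
```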
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 91 Problems: 4.1 (1 point) Box I contains 5 red and 3 blue marbles, while Box II contains 3 red and 7 blue marbles. A fair die is rolled. If the roll results in a one, two, three, or four, a marble is chosen at random from Box I. If it results in a five or six, a marble is chosen at random from Box II. If the chosen marble is blue, but you are not allowed to see the roll of the die (and thus, you don't know which box has been chosen), what is the probability that Box I was chosen? A. Less than 49% B. At least 49%, but less than 50% C. At least 50%, but less than 51% D. At least 51%, but less than 52% E. 52% or more.
Use the following information for the next two questions:
There are three dice:
Die A: 2 faces labeled 0, 4 faces labeled 1.
Die B: 4 faces labeled 0, 2 faces labeled 1.
Die C: 5 faces labeled 0, 1 face labeled 1.
4.2 (2 points) You select a die at random and then roll the selected die twice. The rolls add to 1. What is the expected value of the next roll of the same die?
A. less than 0.45
B. at least 0.45 but less than 0.50
C. at least 0.50 but less than 0.55
D. at least 0.55 but less than 0.60
E. at least 0.60
4.3 (2 points) You select a die at random and then roll the selected die three times. The rolls add to 1. What is the expected value of the next roll of the same die?
A. 0.25 B. 0.30 C. 0.35 D. 0.40 E. 0.45
4.4 (1 point) Which of the following statements are true with respect to the estimates from Bayesian Analysis? 1. They are in balance. 2. They are between the observation and the a priori mean. 3. They are within the range of hypotheses. A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 92 Use the following information for the next two questions: There are two types of urns, each with many balls labeled $1000 and $2000. A Priori Chance of Percentage of Percentage of Type of Urn This Type of Urn $1000 Balls $2000 Balls I 80% 90% 10% II 20% 70% 30% 4.5 (2 points) You pick an Urn at random (80% chance it is of Type I) and pick one ball. If the ball is $2000, what is the expected value of the next ball picked from that same urn? A. 1130 B. 1150 C. 1170 D. 1190 E. 1210 4.6 (2 points) You pick an Urn at random (80% chance it is of Type I) and pick three balls. If two of the balls were $1000 and one of the balls was $2000, what is the expected value of the next ball picked from that same urn? A. 1140 B. 1160 C. 1180 D. 1200 E. 1220
4.7 (2 points) Let X1 be the outcome of a single trial and let E[X2 | X1 ] be the expected value of the outcome of a second trial. You are given the following information: Outcome = T P(X1 = T) Bayesian Estimate For E[X2 | X1 = T] 1 5/8 1.4 4 2/8 3.6 16 1/8 ---Determine the Bayesian estimate for E[X2 | X1 = 16]. A. Less than 11 B. At least 11, but less than 12 C. At least 12, but less than 13 D. At least 13, but less than 14 E. 14 or more 4.8 (2 points) There are four types of urns with differing percentages of black balls. Each type of urn has a differing chance of being picked. Type of Urn A Priori Probability Percentage of Black Balls I 40% 4% II 30% 8% III 20% 12% IV 10% 16% An urn is chosen and fifty balls are drawn from it, with replacement; no black balls are drawn. Use Bayes Theorem to estimate the probability of picking a black ball from the same urn. A. 3.0% B. 3.5% C. 4.5% D. 5.0% E. 5.5%
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 93 4.9 (3 points ) A die is selected at random from an urn that contains four six-sided dice with the following characteristics: Number of Faces Number on Face Die A Die B Die C Die D 1 3 1 1 1 2 1 3 1 1 3 1 1 3 1 4 1 1 1 3 The first five rolls of the selected die yielded the following in sequential order: 2, 3, 1, 2, and 4. Using Bayesian Analysis, what is the expected value of the next roll of the same die? A. 1.8 B. 2.0 C. 2.2 D. 2.4 E. 2.6 4.10 (3 points) A game of chance has been designed where you are dealt a two-card hand from a deck of cards chosen at random from two available decks. The two decks are as follows: Deck A: 1 suit from a regular deck of cards (13 cards). Deck B: same as Deck A except the ace is missing (12 cards). You will receive $10 for each ace or face card in your hand. Assume that you have been dealt two cards which are either an ace or a face card (i.e., a $20 hand). NOTE: A face card equals either a Jack, Queen or King. Using Bayesian Analysis, what is the expected value of the next hand drawn from the same deck assuming the first hand is replaced? A. 5.1 B. 5.3 C. 5.5 D. 5.7 E. 5.9 4.11 (3 points) Your friend has picked at random one of three multi-sided dice. He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again. One of the three multi-sided dice has 4 sides (with 1, 2, 3 and 4 on them), the second die has the usual 6 sides (with numbers 1 through 6), and the last die has 8 sides (with numbers 1 through 8). For a given die each side has an equal chance of being rolled; i.e., the die is fair. Assume the first roll was a five. Use Bayesʼ Theorem to estimate the next roll. A. 3.5 B. 3.7 C. 3.9 D. 4.1 E. 4.3
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 94 Use the following information for the next three questions: You are on a game show, and the host Monty Hall gives you the choice of one of three doors. Behind one door is a car, while behind each of the other two doors is a goat. You pick a door at random, and will receive what is behind your door when it is opened. Monty Hall knows what is behind each door. 4.12 (2 points) Assume that whenever this game is played, Monty Hall opens a door the contestant did not pick that has a goat behind it. Then Monty Hall gives the contestant a chance to switch doors. What is the probability that the car is behind the door you originally picked? 4.13 (2 points) Assume that whenever this game is played, Monty Hall opens at random a door that the contestant did not pick. If opening this door reveals a goat behind it, then Monty Hall gives the contestant a chance to switch doors. If opening this door reveals the car, then the game is over, and the contestant gets a goat. After you pick your door, Monty opens a different door revealing a goat. What is the probability that the car is behind the door you originally picked? 4.14 (2 points) Assume that whenever this game is played, Monty Hall opens a door the contestant did not pick. If the door the contestant picked has a goat behind it, then one third of the time Monty opens the other door with a goat behind it, while two thirds of the time Monty opens the door with a car behind it. If the door the contestant picked has the car behind it, then Monty opens another door revealing a goat. After you pick your door, Monty opens a different door revealing a goat, and gives you an opportunity to switch doors. What is the probability that the car is behind the door you originally picked?
4.15 (2 points) Only two cab companies operate in a city, Green and Blue. Eighty-five percent of the cabs are Green and 15 percent are Blue. A cab was involved in a hit-and-run accident at night. A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80 percent of the time and got the color wrong 20 percent of the time. What is the probability that the cab involved in the accident is Blue rather than Green?
(A) 40% (B) 50% (C) 60% (D) 70% (E) 80%
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 95 4.16 (3 points) You are given the following: • Two urns each contain three marbles. • One urn contains two red marbles and one black marble. • The other urn contains one red marble and two black marbles. An urn is randomly selected and designated Urn A. The other urn is designated Urn B. One marble is randomly drawn from Urn A. The selected marble is placed in Urn B. One marble is randomly drawn from Urn B. This second selected marble is placed in Urn A. One marble is randomly drawn from Urn A. This third selected marble is placed in Urn B. One marble is randomly drawn from Urn B. This fourth selected marble is placed in Urn A. This process is continued indefinitely, with marbles alternately drawn from Urn A and Urn B. The first selected marble is red. The second selected marble is black. Determine the Bayesian analysis estimate of the probability that the third selected marble will be red. A. Less than 0.3 B. At least 0.3, but less than 0.5 C. At least 0.5, but less than 0.7 D. At least 0.7, but less than 0.9 E. At least 0.9 Use the following information for the next two questions: There are three large urns, each filled with so many balls that you can treat it as if there are an infinite number. Urn 1 contains balls with "zero" written on them. Urn 2 has balls with "one" written on them. The final Urn 3 is filled with 50% balls with "zero" and 50% balls with "one". An urn is chosen at random and five balls are picked. 4.17 (4, 5/83, Q.44a) (2 points) If all five balls have zero written on them, use Bayes Theorem to estimate the expected value of another ball picked from that urn. A. less than 0.02 B. at least 0.02 but less than 0.04 C. at least 0.04 but less than 0.06 D. at least 0.06 but less than 0.08 E. at least 0.08 4.18 (4, 5/83, Q.44c) (2 points) If 3 balls have 0 written on them and 2 have 1 written on them, use Bayes Theorem to estimate the expected value of another ball picked from that urn. A. less than 0.42 B. at least 0.42 but less than 0.44 C. at least 0.44 but less than 0.46 D. at least 0.46 but less than 0.48 E. at least 0.48
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 96 Use the following information for the next two questions: There are three dice : Die A Die B Die C 2 faces labeled 0 4 faces labeled 0 5 faces labeled 0 4 faces labeled 1 2 faces labeled 1 1 faces labeled 1 4.19 (4, 5/84, Q.51) (2 points) You select a die at random and then roll the selected die. Assuming a 0 is rolled what is the expected value of the next roll of the same die? A. less than 0.2 B. at least 0.2 but less than 0.3 C. at least 0.3 but less than 0.4 D. at least 0.4 but less than 0.5 E. at least 0.5 4.20 (4, 5/84, Q.51) (2 points) You select a die at random and then roll the selected die. Assuming a 1 is rolled what is the expected value of the next roll of the same die? A. less than 0.5 B. at least 0.5 but less than 0.6 C. at least 0.6 but less than 0.7 D. at least 0.7 but less than 0.8 E. at least 0.8 4.21 (4, 5/86, Q.36) (1 point) Box I contains 3 red and 2 blue marbles, while Box II contains 3 red and 7 blue marbles. A fair die is rolled. If the roll results in a one, two, three, or four, a marble is chosen at random from Box I. If it results in a five or six a marble is chosen at random from Box II. If the chosen marble is red, but you are not allowed to see the roll of the die (and thus, you don't know which box has been chosen), what is the probability that Box I was chosen? A. Less than 55% B. At least 55%, but less than 65% C. At least 65%, but less than 75% D. At least 75%, but less than 85% E. 85% or more.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 97 4.22 (4, 5/89, Q.36) (2 points) Your friend selected at random one of two urns and then she pulled a ball with the number 4 on it from the urn. Then, she replaced the ball in the urn. One of the urns contains four balls numbered 1 through 4. The other urn contains six balls numbered 1 through 6. Your friend will make another random selection of a ball from the same urn. Using the Bayesian method (i.e. Bayes' Theorem) what is the expected value of the number on the ball? A. Less than 2.925 B. At least 2.925, but less than 2.975 C. At least 2.975, but less than 3.025 D. At least 3.025, but less than 3.075 E. 3.075 or more 4.23 (4, 5/90, Q.39) (2 points) Three urns contain balls marked with either 0 or 1 in the proportions described below. Marked 0 Marked 1 Urn A 10% 90% Urn B 60 40 Urn C 80 20 An urn is selected at random and three balls are selected, with replacement, from the urn. The total of the values is 1. Three more balls are selected from the same urn. Calculate the expected total of the three balls using Bayesʼ theorem. A. less than 1.05 B. at least 1.05 but less than 1.10 C. at least 1.10 but less than 1.15 D. at least 1.15 but less than 1.20 E. at least 1.20 4.24 (4, 5/91, Q.38) (2 points) One spinner is selected at random from a group of three different spinners. Each of the spinners is divided into six equally likely sectors marked as described below. --------Number of Sectors---------Spinner Marked 0 Marked 12 Marked 48 A 2 2 2 B 3 2 1 C 4 1 1 Assume a spinner is selected and a 12 was obtained on the first spin. Use Bayes' theorem to calculate the expected value of the second spin using the same spinner. A. Less than 12.5 B. At least 12.5 but less than 13.0 C. At least 13.0 but less than 13.5 D. At least 13.5 but less than 14.0 E. At least 14.0
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 98 4.25 (4, 5/91, Q.50) (2 points) Four urns contain balls marked with either 0 or 1 in the proportions described below. Urn Marked 0 Marked 1 A 70% 30% B 70 30 C 30 70 D 20 80 An urn is selected at random and four balls are selected from the urn with replacement. The total of the values is 2. Four more balls are selected from the same urn. Calculate the expected total of the four balls using Bayesʼ theorem. A. Less than 1.96 B. At least 1.96 but less than 1.99 C. At least 1.99 but less than 2.02 D. At least 2.02 but less than 2.05 E. At least 2.05 4.26 (2, 5/92, Q.19) (1.7 points) A test for a disease correctly diagnoses a diseased person as having the disease with probability .85. The test incorrectly diagnoses someone without the disease as having the disease with probability .10. If 1% of the people in a population have the disease, what is the chance that a person from this population who tests positive for the disease actually has the disease? A. 0.0085 B. 0.0791 C. 0.1075 D. 0.1500 E. 0.9000 4.27 (4B, 5/92, Q.8) (3 points) Two urns contain balls each marked with 0, 1, or 2 in the proportions described below: Percentage of Balls in Urn Marked 0 Marked 1 Marked 2 Urn A .20 .40 .40 Urn B .70 .20 .10 An urn is selected at random and two balls are selected, with replacement, from the urn. The sum of values on the selected balls is 2. Two more balls are selected from the same urn. Determine the expected total of the two balls using Bayes' Theorem. A. Less than 1.6 B. At least 1.6 but less than 1.7 C At least 1.7 but less than 1.8 D. At least 1.8 but less than 1.9 E. At least 1.9
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 99
4.28 (4B, 5/92, Q.24) (2 points) Let X1 be the outcome of a single trial and let E[X2 | X1 ] be the expected value of the outcome of a second trial. You are given the following information:

Outcome                  Buhlmann Credibility              Bayesian
T          P(X1 = T)     Estimate For E[X2 | X1 = T]       Estimate For E[X2 | X1 = T]
1          1/3           3.4                               2.6
8          1/3           7.6                               7.8
12         1/3           10.0                              ---

Determine the Bayesian estimate for E[X2 | X1 = 12].
A. 8.6  B. 10.0  C. 10.6  D. 12.0  E. Cannot be determined.
4.29 (4B, 11/94, Q.5) (3 points) Two urns contain balls with each ball marked 0 or 1 in the proportions described below: Percentage of Balls in Urn Marked 0 Marked 1 Urn A 20% 80% Urn B 70% 30% An urn is randomly selected and two balls are drawn from the urn. The sum of the values on the selected balls is 1. Two more balls are selected from the same urn. Note: Assume that each selected ball has been returned to the urn before the next ball is drawn. Determine the Bayesian analysis estimate of the expected value of the sum of the values on the second pair of selected balls. A. Less than 1.035 B. At least 1.035, but less than 1.055 C. At least 1.055, but less than 1.075 D. At least 1.075, but less than 1.095 E. At least 1.095
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 100 4.30 (4B, 11/95, Q.20) (2 points) Ten urns each contain five balls, numbered as follows: Urn 1: 1,2,3,4,5 Urn 2: 1,2,3,4,5 Urn 3: 1,2,3,4,5 Urn 4: 1,2,3,4,5 Urn 5: 1,2,3,4,5 Urn 6: 1,1,1,1,1 Urn 7: 2,2,2,2,2 Urn 8: 3,3,3,3,3 Urn 9: 4,4,4,4,4 Urn 10: 5,5,5,5,5 An urn is randomly selected. A ball is then randomly selected from this urn. The selected ball has the number 2 on it. This ball is then replaced, and another ball is randomly selected from the same urn. The second selected ball has the number 3 on it. This ball is then replaced, and another ball is randomly selected from the same urn. Determine the Bayesian analysis estimate of the expected value of the number on this third selected ball. A. Less than 2.2 B. At least 2.2, but less than 2.4 C. At least 2.4, but less than 2.6 D. At least 2.6, but less than 2.8 E. At least 2.8 4.31 (4B, 5/97, Q.18) (3 points) You are given the following:
• 12 urns each contain 10 marbles.
• n of the urns contain 3 red marbles and 7 black marbles.
• The remaining 12 - n urns contain 6 red marbles and 4 black marbles.
An urn is randomly selected, and one marble is randomly drawn from it. The selected marble is red. This marble is replaced, and a marble is again randomly drawn from the same urn. The Bayesian analysis estimate of the probability that the second selected marble is red is 0.54. Determine n.
A. 4 B. 5 C. 6 D. 7 E. 8
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 101 Use the following information for the next two questions: • Two urns each contain three marbles. • One urn contains two red marbles and one black marble. • The other urn contains one red marble and two black marbles. An urn is randomly selected and designated Urn A. The other urn is designated Urn B. One marble is randomly drawn from Urn A. The selected marble is placed in Urn B. One marble is randomly drawn from Urn B. This second selected marble is placed in Urn A. One marble is randomly drawn from Urn A. This third selected marble is placed in Urn B. One marble is randomly drawn from Urn B. This fourth selected marble is placed in Urn A. This process is continued indefinitely, with marbles alternately drawn from Urn A and Urn B. 4.32 (4B, 11/97, Q.27) (1 point) The first two selected marbles are red. Determine the Bayesian analysis estimate of the probability that the third selected marble will be red. A. 1/2 B. 11/21 C. 4/7 D. 2/3 E. 1 4.32 (4B, 11/97, Q.28) (2 points) Determine the limit as n goes to infinity of the Bayesian analysis estimate of the probability that the (2n+1)st selected marble will be red if the first 2n selected marbles are red (where n is an integer). A. 1/2 B. 11/21 C. 4/7 D. 2/3 E. 1
4.33 (4B, 5/99, Q.2) (2 points) Each of two urns contains two fair, six-sided dice. Three of the four dice have faces marked with 1, 2, 3, 4, 5, and 6. The other die has faces marked with 1, 1, 1, 2, 2, and 2. One urn is randomly selected, and the dice in it are rolled. The total on the two dice is 3. Determine the Bayesian analysis estimate of the expected value of the total on the same two dice on the next roll. A. 5.0 B. 5.5 C. 6.0 D. 6.5 E. 7.0
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 102
4.34 (4B, 5/99, Q.26) (3 points) A company invests in a newly offered stock that it judges will turn out to be one of three types. The company believes that the stock is equally likely to be any of the three types.
• The annual dividend for Type A stocks has a normal distribution with mean 10 and variance 1.
• The annual dividend for Type B stocks has a normal distribution with mean 10 and variance 4.
• The annual dividend for Type C stocks has a normal distribution with mean 10 and variance 16.
After the company has held the stock for one year, the stock pays a dividend of amount d. The company then determines that the posterior probability that the stock is of Type B is greater than either the posterior probability that the stock is of Type A or the posterior probability that the stock is of Type C. Determine all the values of d for which this would be true.
Hint: The density function for a normal distribution is f(x) = exp[-(x-µ)²/(2σ²)] / {σ√(2π)}.
A. |d-10| < 2√((2 ln 2)/3)
B. |d-10| < 4√((2 ln 2)/3)
C. 2√((2 ln 2)/3) < |d-10| < 4√((2 ln 2)/3)
D. |d-10| > 2√((2 ln 2)/3)
E. |d-10| > 4√((2 ln 2)/3)
4.35 (4B, 11/99, Q.16) (2 points) You are given the following: • A red urn and a blue urn each contain 100 balls. • Each ball is labeled with both a letter and a number. • The distribution of letters and numbers on the balls is as follows: Letter A Letter B Number 1 Number 2 Red Urn : 90 10 90 10 Blue Urn: 60 40 10 90 • Within each urn, the appearance of the letter A on a ball is independent of the appearance of the number 1 on a ball. One ball is drawn randomly from a randomly selected urn, observed to be labeled A-2, and then replaced. Determine the expected value of the number on another ball drawn randomly from the same urn. A. Less than 1.2 B. At least 1.2, but less than 1.4 C. At least 1.4, but less than 1.6 D. At least 1.6, but less than 1.8 E. At least 1.8
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 103 4.36 (Course 1 Sample Exam, Q.3) (1.9 points) Ten percent of a companyʼs life insurance policyholders are smokers. The rest are nonsmokers. For each nonsmoker, the probability of dying during the year is 0.01. For each smoker, the probability of dying during the year is 0.05. Given that a policyholder has died, what is the probability that the policyholder was a smoker? A. 0.05 B. 0.20 C. 0.36 D. 0.56 E. 0.90 4.37 (Course 1 Sample Exam, Q.29) (1.9 points) An insurance company designates 10% of its customers as high risk and 90% as low risk. The number of claims made by a customer in a calendar year is Poisson distributed with mean λ and is independent of the number of claims made by a customer in the previous calendar year. For high risk customers λ = 0.6, while for low risk customers λ = 0.1. Calculate the expected number of claims made in calendar year 1998 by a customer who made one claim in calendar year 1997. A. 0.15 B. 0.18 C. 0.24 D. 0.30 E. 0.40 4.38 (1, 5/00, Q.2) (1.9 points) A study of automobile accidents produced the following data: Model year Proportion of all vehicles Probability of involvement in an accident 1997 0.16 0.05 1998 0.18 0.02 1999 0.20 0.03 Other 0.46 0.04 An automobile from one of the model years 1997, 1998, and 1999 was involved in an accident. Determine the probability that the model year of this automobile is 1997. (A) 0.22 (B) 0.30 (C) 0.33 (D) 0.45 (E) 0.50 4.39 (1, 5/00, Q.33) (1.9 points) A blood test indicates the presence of a particular disease 95% of the time when the disease is actually present. The same test indicates the presence of the disease 0.5% of the time when the disease is not present. One percent of the population actually has the disease. Calculate the probability that a person has the disease given that the test indicates the presence of the disease. (A) 0.324 (B) 0.657 (C) 0.945 (D) 0.950 (E) 0.995
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 104 4.40 (1, 11/00, Q.12) (1.9 points) An actuary studied the likelihood that different types of drivers would be involved in at least one collision during any one-year period. The results of the study are presented below. Type of driver Percentage of all drivers Probability of at least one collision Teen 8% 0.15 Young adult 16% 0.08 Midlife 45% 0.04 Senior 31% 0.05 Total 100% Given that a driver has been involved in at least one collision in the past year, what is the probability that the driver is a young adult driver? (A) 0.06 (B) 0.16 (C) 0.19 (D) 0.22 (E) 0.25 4.41 (1, 11/00, Q.22) (1.9 points) The probability that a randomly chosen male has a circulation problem is 0.25. Males who have a circulation problem are twice as likely to be smokers as those who do not have a circulation problem. What is the conditional probability that a male has a circulation problem, given that he is a smoker? (A) 1/4 (B) 1/3 (C) 2/5 (D) 1/2 (E) 2/3 4.42 (1, 5/01, Q.6) (1.9 points) An insurance company issues life insurance policies in three separate categories: standard, preferred, and ultra-preferred. Of the companyʼs policyholders, 50% are standard, 40% are preferred, and 10% are ultra-preferred. Each standard policyholder has probability 0.010 of dying in the next year, each preferred policyholder has probability 0.005 of dying in the next year, and each ultra-preferred policyholder has probability 0.001 of dying in the next year. A policyholder dies in the next year. What is the probability that the deceased policyholder was ultra-preferred? (A) 0.0001 (B) 0.0010 (C) 0.0071 (D) 0.0141 (E) 0.2817 4.43 (1, 5/01, Q.23) (1.9 points) A hospital receives 1/5 of its flu vaccine shipments from Company X and the remainder of its shipments from other companies. Each shipment contains a very large number of vaccine vials. For Company Xʼs shipments, 10% of the vials are ineffective. For every other company, 2% of the vials are ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one vial is ineffective. What is the probability that this shipment came from Company X ? (A) 0.10 (B) 0.14 (C) 0.37 (D) 0.63 (E) 0.86
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 105 4.44 (4, 11/01, Q.7 & 2009 Sample Q.60) (2.5 points) You are given the following information about six coins: Coin Probability of Heads 1–4 0.50 5 0.25 6 0.75 A coin is selected at random and then flipped repeatedly. Xi denotes the outcome of the ith flip, where “1” indicates heads and “0” indicates tails. The following sequence is obtained: S = {X1 , X2 , X3 , X4 } = {1, 1, 0, 1}. Determine E(X5 | S) using Bayesian analysis. (A) 0.52
(B) 0.54
(C) 0.56
(D) 0.59
(E) 0.63
4.45 (2 points) In the previous question, 4, 11/01, Q. 7, using Bayesian analysis, determine the probability that coin flips 5 and 6 are both heads. A. Less than 29% B. At least 29%, but less than 31% C. At least 31%, but less than 33% D. At least 33%, but less than 35% E. At least 35% 4.46 (2 points) In 4, 11/01, Q. 7, you are instead given that: X1 + X2 + X3 + X4 = 3. Determine E(X5 | S) using Bayesian analysis. (A) 0.52
(B) 0.54
(C) 0.56
(D) 0.59
(E) 0.63
4.47 (3 points) In 4, 11/01, Q. 7, using Bayesian analysis, determine the probability that coin flip 5 is a tail, coin flip 6 is a head, and coin flip 7 is a tail. A. Less than 10.0% B. At least 10.0%, but less than 10.5% C. At least 10.5%, but less than 11.0% D. At least 11.0%, but less than 11.5% E. At least 11.5%
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 106
4.48 (1, 5/03, Q.8) (2.5 points) An auto insurance company insures drivers of all ages. An actuary compiled the following statistics on the companyʼs insured drivers:
Age of Driver   Probability of Accident   Portion of Companyʼs Insured Drivers
16-20                  0.06                        0.08
21-30                  0.03                        0.15
31-65                  0.02                        0.49
66-99                  0.04                        0.28
A randomly selected driver that the company insures has an accident. Calculate the probability that the driver was age 16-20.
(A) 0.13 (B) 0.16 (C) 0.19 (D) 0.23 (E) 0.40
4.49 (1, 5/03, Q.31) (2.5 points) A health study tracked a group of persons for five years. At the beginning of the study, 20% were classified as heavy smokers, 30% as light smokers, and 50% as nonsmokers. Results of the study showed that light smokers were twice as likely as nonsmokers to die during the five-year study, but only half as likely as heavy smokers. A randomly selected participant from the study died over the five-year period. Calculate the probability that the participant was a heavy smoker.
(A) 0.20 (B) 0.25 (C) 0.35 (D) 0.42 (E) 0.57
4.50 (4, 5/07, Q.35) (2.5 points) The observation from a single experiment has distribution:
Pr(D = d | G = g) = g^(1-d) (1-g)^d for d = 0, 1
The prior distribution of G is: Pr(G = 1/5) = 3/5 and Pr(G = 1/3) = 2/5
Calculate Pr(G = 1/3 | D = 0).
(A) 2/19 (B) 3/19 (C) 1/3 (D) 9/19
(E) 10/19
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 107 Solutions to Problems: 4.1. D. A
B
Type of Box I II
A Priori Chance of This Type of Box 0.667 0.333
C
D
E
Chance of the Observation 0.3750 0.7000
Prob. Weight = Product of Columns B&C 0.2500 0.2333
Posterior Chance of This Type of Box 51.72% 48.28%
0.483
1.000
Overall
4.2. A. A
B
A Priori Chance of Type of This Type Die of Die A 0.3333 B 0.3333 C 0.3333
C
D
E
F
Chance of the Observation 0.4444 0.4444 0.2778
Prob. Weight = Product of Columns B&C 0.1481 0.1481 0.0926
Posterior Chance of This Type of Die 0.3810 0.3810 0.2381
Mean Roll of Die 0.6667 0.3333 0.1667
0.389
1.000
0.421
Overall
If the sum of two rolls is 1, then one of them must have been a 0 and the other a 1. For example, if die A is chosen, then the chance of the sum being 1 is: (2)(1/3)(2/3) = 4/9. If instead die C is chosen, then the chance of the sum being 1 is: (2)(5/6)(1/6) = 5/18. 4.3. C. A
B
Type of Die A B C
A Priori Chance of This Type of Die 0.3333 0.3333 0.3333
Overall
C
D
Prob. Weight = Chance Product of the of Columns Observation B&C 0.2222 0.0741 0.4444 0.1481 0.3472 0.1157 0.338
E
F
Posterior Chance of This Type of Die 0.2192 0.4384 0.3425
Mean Roll of Die 0.6667 0.3333 0.1667
1.000
0.349
For example, the chance of observing a single 1 on three rolls for Die C is: 3(1/6)(5/6)^2 = 75/216. (The chance of given numbers of ones being rolled is given by a Binomial Distribution with m = 3 and q = 1/6 in the case of Die C.)
4.4. B. 1. True. 2. Not necessarily true (it is always true of Credibility.) 3. True.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 108 4.5. D. A
B
A Priori Chance of Type of This Type Urn of Urn I 0.8000 II 0.2000
C
D
E
F
Chance of the Observation 0.1000 0.3000
Prob. Weight = Product of Columns B&C 0.0800 0.0600
Posterior Chance of This Type of Urn 0.5714 0.4286
Mean Draw from Urn 1100 1300
0.140
1.000
1186
Overall
4.6. B. For example, the chance of picking 2 @ $1000 and 1 @ $2000 from Urn II is given by f(2) for a Binomial distribution with n = 3 and q = .7: (3)(.7^2)(.3). A
B
A Priori Chance of Type of This Type Urn of Urn I 0.8000 II 0.2000
C
D
E
F
Chance of the Observation 0.2430 0.4410
Prob. Weight = Product of Columns B&C 0.1944 0.0882
Posterior Chance of This Type of Urn 0.6879 0.3121
Mean Draw from Urn 1100 1300
0.283
1.000
1162
Overall
4.7. E. Bayesian Estimates are in balance; the sum of the product of the a priori chance of each outcome times its posterior Bayesian estimate is equal to the a priori mean. The a priori mean is: (5/8)(1) + (2/8)(4) + (1/8)(16) = 3.625. Let E[X2 | X1 = 16] = y. Then setting the sum of the chance of each outcome times its posterior mean equal to the a priori mean: (5/8)(1.4) + (2/8)(3.6) + (1/8)(y) = 3.625. Therefore y = 14.8.
4.8. C. Chance of no Black Balls in Fifty Draws = (1-p)^50. A
B
A Priori Type Probability I 0.4 II 0.3 III 0.2 IV 0.1 SUM
C
% Black Balls 0.04 0.08 0.12 0.16
D
E
Chance of Probability No Black Balls Weights = in Fifty Draws Col. B x Col. D 0.12989 0.0520 0.01547 0.0046 0.00168 0.0003 0.00016 0.0000 0.0569
F
G
Posterior Probability 0.912 0.081 0.006 0.000
Col. C x Col. F 0.036 0.007 0.001 0.000
1.000
0.044
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 109 4.9. D. If one has Die A, then the chance of the observation is: (1/6)(1/6)(3/6)(1/6)(1/6) = 3 / 7776. This is also the chance of the observation for Die C or Die D. If one has Die B, then the chance of the observation is: (3/6)(1/6)(1/6)(3/6)(1/6) = 9 / 7776. A
B
C
Die A B C D
A Priori Probability 0.25 0.25 0.25 0.25
D
Probability Chance of Weights = Observation Col. B x Col. C 0.00387 0.00097 0.01160 0.00290 0.00387 0.00097 0.00387 0.00097
SUM
E
F
Posterior Probability 0.1667 0.5000 0.1667 0.1667
Mean Value of a Single Roll of a Die 2.000 2.333 2.667 3.000
1.000
2.444
0.00580
4.10. D. Given Deck A, the chance of getting a $20 hand is (4/13)(3/12) = 3/39. Given Deck B, the chance of getting a $20 hand is (3/12)(2/11) = 1/22. For Deck A, the expected value of a card is (10)(4/13). The expected value of a hand is 80/13. For Deck B the expected value of a two card hand is (2)(10)(3/12) = 5. A
Deck A B
B
C
D
E
F
A Priori Probability 50% 50%
Chance of the Observation 0.07692 0.04545
Prob. Weight = Product of Columns B&C 0.038462 0.022727
Posterior Chance of This Type of Risk 62.86% 37.14%
Mean Value of a Hand
0.061189
1.000
5.73
Overall
6.15 5.00
4.11. C. A
B
C
D
E
F
Type of Die 4-sided 6-sided 8-sided
A Priori Chance of This Type of Die 0.333 0.333 0.333
Chance of the Observation 0.000 0.167 0.125
Prob. Weight = Product of Columns B&C 0.000 0.056 0.042
Posterior Chance of This Type of Risk 0.0% 57.1% 42.9%
Mean Die Roll 2.5 3.5 4.5
0.097
1.000
3.93
Overall
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 110 4.12. Applying Bayes Theorem as shown below, there is a 1/3 chance that the car is behind the door you originally picked. A
B
C
D
E
Behind Chosen Door
A Priori Probability
Chance of the Observation Given Risk r Type
Probability Weight
Posterior Probability
Car Goat
0.3333 0.6667
1.0000 1.0000
0.3333 0.6667
0.3333 0.6667
1.000
1.000
Overall
The observation is that Monty opened a door (other than the one you picked) with a goat behind it. Prob[Monty opens a door with a goat behind it given the door you picked has a car behind it] = 1. Prob[Monty opens a door with a goat behind it given the door you picked has a goat behind it] = 1. Comment: In this case, it is advantageous to accept Montyʼs offer and switch doors. Regardless of what is behind the door that you picked, Monty will open a door you did not pick with a goat behind it and give you a chance to switch. There is a 100% chance you will observe Monty opening a door with a goat behind it and give you a chance to switch doors. 4.13. Applying Bayes Theorem as shown below, there is a 1/2 chance that the car is behind the door you originally picked. A
B
C
D
E
Behind Chosen Door
A Priori Probability
Chance of the Observation Given Risk Type
Probability Weight
Posterior Probability
Car Goat
0.3333 0.6667
1.0000 0.5000
0.3333 0.3333
0.5000 0.5000
0.667
1.000
Overall
Comment: In this case, it is neutral between accepting and refusing Montyʼs offer to switch doors. Monty opens a door you did not pick at random. If you picked the door with a car, then the door Monty opens is always a goat. There is a 100% chance you will observe Monty giving you a chance to switch doors.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 111 4.14. Applying Bayes Theorem as shown below, there is a 60% chance that the car is behind the door you originally picked. A
B
C
D
E
Behind Chosen Door
A Priori Probability
Chance of the Observation Given Risk Type
Probability Weight
Posterior Probability
Car Goat
0.3333 0.6667
1.0000 0.3333
0.3333 0.2222
0.6000 0.4000
0.556
1.000
Overall
For example, (2/3)(1/3) = 2/9. (2/9)/(1/3 + 2/9) = 0.4. Comment: In this case, it is advantageous to refuse Montyʼs offer to switch doors. The probability that the car is behind the door you originally picked depends on the procedure employed by Monty Hall. Quite often this problem is imprecisely stated, without specifying Montyʼs procedure, but implicitly assuming the procedure specified in the first question, rather than some other procedure such as those in one of the other two questions!
4.15. A. Prob[witness says Blue and it is Blue] = (80%)(15%) = 12%. Prob[witness says Blue and it is Green] = (20%)(85%) = 17%. Prob[witness says Blue] = 12% + 17% = 29%. Prob[Blue | witness said Blue] = 12%/29% = 41.4%.
4.16. A. Let Urn 1 contain two red marbles and one black marble. Let Urn 2 contain one red marble and two black marbles. If A is Urn 1, then the chance of the first marble being red is 2/3. Then this red marble is placed in Urn 2 and the chance that the second marble is black is 2/4. (2 black marbles out of 1 + 3 = 4 marbles.) Thus in this case the chance of the observation is (2/3)(2/4) = 1/3. The black marble is placed in Urn 1, which now has 1 red and 2 black marbles. So there is a 1/3 chance the third marble is red. If A is Urn 2, then the chance of the first marble being red is 1/3. Then this red marble is placed in Urn 1 and the chance that the second marble is black is 1/4. (1 black marble out of 1 + 3 = 4 marbles.) Thus in this case the chance of the observation is (1/3)(1/4) = 1/12. The black marble is placed in Urn 2, which now has 3 black marbles. So there is a 0 chance the third marble is red. A
B
C
Urn Number Which is A
A Priori Chance of This Situation
Chance of the Observation
1 2
0.5 0.5
0.3333 0.0833
Overall
D
E
Prob. Weight = Posterior Product of Chance of Columns B & C This Situation
F
Chance of Red Third Marble
0.1667 0.0417
0.8000 0.2000
0.3333 0.0000
0.2083
1.0000
0.2667
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 112 4.17. A. A
B
Type of Urn 1 2 3
A Priori Chance of this Type of Urn 0.333 0.333 0.333
C
D
Chance of the Observation 1.0000 0 0.0312
Prob. Weight = Product of Columns B & C 0.3333 0.0000 0.0104
Posterior Chance of this Type of Urn 97.0% 0.0% 3.0%
Mean Ball From Urn 0.0 1.0 0.5
0.3438
1.000
0.0152
Overall
E
F
4.18. E. Since the observation is impossible from either Urn 1 or Urn 2, the posterior probability is 100% Urn 3. Therefore the posterior estimate is 1/2. A
B
A Priori Chance of Type of This Type Urn of Urn 1 0.333 2 0.333 3 0.333
C
D
E
F
Chance of the Observation 0.0000 0 0.3125
Prob. Weight = Product of Columns B&C 0.0000 0.0000 0.1042
Posterior Chance of This Type of Urn 0.0% 0.0% 100.0%
Mean Ball From Urn 0.0 1.0 0.5
0.1042
1.000
0.5000
Overall
4.19. C. A
B
A Priori Type of Chance of Die This Type of Die A 0.3333 B 0.3333 C 0.3333 Overall
C
D
E
F
Chance of the Observation 0.3333 0.6667 0.8333
Prob. Weight = Product of Columns B&C 0.1111 0.2222 0.2778
Posterior Chance of This Type of Die 0.1818 0.3636 0.4545
Mean Roll of Die 0.6667 0.3333 0.1667
0.611
1.000
0.318
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 113 4.20. B. A
B
A Priori Type of Chance of Die This Type of Die A 0.3333 B 0.3333 C 0.3333
C
D
E
F
Chance of the Observation 0.6667 0.3333 0.1667
Prob. Weight = Product of Columns B&C 0.2222 0.1111 0.0556
Posterior Chance of This Type of Die 0.5714 0.2857 0.1429
Mean Roll of Die 0.6667 0.3333 0.1667
0.389
1.000
0.500
Overall
Comment: Note that the a priori mean is (1/3)(2/3) + (1/3)(1/3) + (1/3)(1/6) = 7/18. The chance of observing a 0 is 11/18 and the chance of observing a 1 is 7/18. The Bayesian Estimates balance to the a priori mean: (11/18)(.318) + (7/18)(.500) = 7/18. 4.21. D. The posterior distribution is proportional to the product of the a priori chance of picking each box and the chance of the observation. A
B
A Priori Chance of This Box 0.667 0.333
Type of Box I II
C
D
E
Chance of the Observation 0.6000 0.3000
Prob. Weight = Product of Columns B&C 0.4000 0.1000
Col. D / Sum of Col. D = Posterior Chance of This Box 80.00% 20.00%
0.500
1.000
Overall
4.22. A. A
B
Urn
A Priori Chance of This Urn
I II Overall
0.500 0.500
C
D
E
Chance Prob. Weight = Col. D / Sum of Col. D = of the Product of Posterior Observation Columns B & C Chance of this Urn 0.2500 0.1667
0.1250 0.0833 0.2083
60.0% 40.0% 1.000
F
Mean Ball for this Urn 2.500 3.500 2.900
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 114 4.23. A. A
B
C
A Priori Chance of Chance Type of This Type of the Urn of Urn Observation A 0.3333 0.0270 B 0.3333 0.4320 C 0.3333 0.3840 Overall
D
E
F
Prob. Weight = Product of Columns B&C 0.0090 0.1440 0.1280
Col. D / Sum of Col. D = Posterior Chance of This Type of Urn 0.0320 0.5125 0.4555
Mean for 3 Balls from Urn 2.7000 1.2000 0.6000
0.2810
1.000
0.975
For example, the chance of observing a single 1 and 2 zeros on three draws from Urn C is: (3)(.2)(.8)^2 = .384. (The chance of given numbers of ones being drawn is given by a Binomial Distribution with n = 3 and q = .2 in the case of Urn C.)
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 115 4.24. E. Compute the probability weights as the product of the chance of observing a 12 given that type of Spinner and the a priori chance of that type of Spinner. Then the new estimate is a weighted average of the means of the spinners, using either the probability weights or the posterior probabilities. (The posterior probabilities are just the probability weights divided by their sum. Note that the sum of the probability weights of .278 is the a priori chance of observing a 12.) A
B
A Priori Chance of Type of This Type Spinner of Spinner A 0.3333 B 0.3333 C 0.3333
C
Chance of Observing Spin of 12 0.3333 0.3333 0.1667
D
E
F
Prob. Weight = Col. D / Sum of Col D. = Mean Product Posterior Chance of for of Columns This Type This Type B&C of Spinner of Spinner 0.1111 0.4000 20 0.1111 0.4000 12 0.0556 0.2000 10
Overall
0.278
1.000
14.800
Comment: The observation in this question a single spin of 12. You could have just as easily been asked for the new estimate if the observation was a single spin of 0: A
B
A Priori Chance of Type of This Type Spinner of Spinner A 0.3333 B 0.3333 C 0.3333
C
Chance of Observing Spin of 0 0.3333 0.5000 0.6667
Overall
D
E
Prob. Weight = Col. D / Sum of Col D. = Product Posterior Chance of of Columns This Type B&C of Spinner 0.1111 0.2222 0.1667 0.3333 0.2222 0.4444 0.500
1.000
F
Mean for This Type of Spinner 20 12 10 12.889
If the observation was a single spin of 48: A
B
A Priori Chance of Type of This Type Spinner of Spinner A 0.3333 B 0.3333 C 0.3333 Overall
C
D
E
F
Chance of Observing Spin of 48 0.3333 0.1667 0.1667
Prob. Weight = Product of Columns B&C 0.1111 0.0556 0.0556
Col. D / Sum of Col D. = Posterior Chance of This Type of Spinner 0.5000 0.2500 0.2500
Mean for This Type of Spinner 20 12 10
0.222
1.000
15.500
Note that the three estimates weighted by the corresponding probabilities of the three observations equal the a priori mean: (.278)(14.8) +(.5)(12.889) + (.222)(15.5) = 14.00. This is an example of the general result, that Bayesian Estimates are “in balance”.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 116
4.25. B. For example, the chance of 2 Balls of each type from Urn D is given by f(2) for a Binomial distribution with n = 4 and q = 0.8: 6(0.8^2)(0.2^2) = 0.1536. A
B
C
D
Urn
A Priori Probability
% of Balls Marked 1
A B C D
0.25 0.25 0.25 0.25
0.3 0.3 0.7 0.8
E
F
Chance of Probability Posterior 2 Balls with 1 Weights = Probability in 4 draws Col. B x Col. D 0.2646 0.0662 0.2793 0.2646 0.0662 0.2793 0.2646 0.0662 0.2793 0.1536 0.0384 0.1621
SUM
0.2369
1.0000
G
Col. C x Col. F 0.0838 0.0838 0.1955 0.1297 0.4928
The posterior probabilities are the probability weights divided by the sum of the probability weights. For example, 0.1621 = 0.0384/0.2368. By weighting the means by the posterior probabilities, the posterior expected value of a single ball from the same urn is 0.4928. Thus the expected value of 4 balls is: (4)(0.4928) = 1.971. 4.26. B. Prob[disease | positive] = Prob[positive | disease] Prob[disease] / Prob[positive] = (0.85)(0.01)/{(0.85)(0.01) + (0.10)(0.99)} = 0.00850/.1075 = 0.0791. 4.27. D. If we observe that the sum of two balls is 2, then the balls picked were in order either: 2,0 1,1 or 0,2. Therefore the chance of the observation if we have picked Urn B is: (0.1)(0.7) + (0.2)(0.2) + (0.7)(0.1) = 0.18. Similarly, the chance of the observation if we have picked Urn A is: (0.4)(0.2) + (0.4)(0.4) + (0.2)(0.4) =0 .32. As shown below, the posterior probability for Urn A is: 0.16 / 0.25 = 64%. For example, the mean for Urn B is: (0)(0.7) + (1)(0.2) + (2)(0.1) = 0.4 A
B
A Priori Chance of Type of This Type Urn of Urn A 0.500 B 0.500 Overall
C
Chance of the Observation 0.32 0.18
D
E
F
Prob. Weight = Col. D / Sum of Col. D = Product Posterior Mean of of Columns Chance of This Type B&C This Type of Urn of Urn 0.16 64.0% 1.2 0.09 36.0% 0.4 0.25000
1.000
The posterior estimate for a single ball drawn from the same urn is 0.912. Thus the posterior estimate for two balls is: (2)(0.912) = 1.824.
0.912
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 117
4.28. C. The Bayesian Estimates are always in balance; the sum of the product of the a priori chance of each outcome times its posterior Bayesian estimate is equal to the a priori mean. The a priori mean is (1/3)(1) + (1/3)(8) + (1/3)(12) = 7. Let E[X2 | X1 = 12] = ψ. Then setting the sum of the chance of each outcome times its posterior mean equal to the a priori mean: (1/3)(2.6) + (1/3)(7.8) + (1/3)(ψ) = 7. Therefore ψ = 10.6.
Comment: The given “Buhlmann Credibility estimates” are on the line: 0.6T + 2.8. (They average to 7, the a priori mean.) They should be the least squares linear approximation to the Bayesian estimate. One should be able to solve for the missing Bayesian Estimate ψ, since the credibility must equal the slope of the weighted least squares line, which is given (incorrectly) as 0.6. This slope is given by: {Σwi Xi Yi - (Σwi Yi)(Σwi Xi)} / {Σwi Xi^2 - (Σwi Xi)^2}, where the wi are the a priori probabilities, which in this case are all 1/3, the Xi are the possible outcomes, and the Yi are the Bayesian Estimates. Using the missing value of 10.6, derived in the solution above, one gets a slope of: {64.07 - (7)(7)} / {69.67 - 7^2} = 0.73. Thus the given “Buhlmann Credibility estimates” are in fact inconsistent with the other given information. In fact they should be along the line: 1.89 + 0.73T. (The values along that line are 2.62, 7.73, 10.65 and fall very close to the Bayesian estimates.) Thus if one uses the given incorrect “Buhlmann Credibility estimates” in an attempt to solve this problem one will end up with the wrong answer. If instead one just ignores these “Buhlmann Credibility estimates”, one should obtain the right answer.
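Not part of the original guide: the weighted least squares slope discussed in the comment to 4.28 can be checked with a short Python sketch (variable names are illustrative only).

w = [1/3, 1/3, 1/3]           # a priori probabilities
x = [1, 8, 12]                # possible outcomes
y = [2.6, 7.8, 10.6]          # Bayesian estimates, with 10.6 from the solution

mean_x = sum(wi * xi for wi, xi in zip(w, x))
mean_y = sum(wi * yi for wi, yi in zip(w, y))
cov_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) - mean_x * mean_y
var_x  = sum(wi * xi * xi for wi, xi in zip(w, x)) - mean_x ** 2
slope = cov_xy / var_x
print(slope, mean_y - slope * mean_x)   # about 0.73 and 1.89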
A Priori Probability 0.5 0.5
Chance of Observation 0.3200 0.4200
Probability Weights 0.1600 0.2100
Posterior Probability 0.4324 0.5676
Mean (2 balls) 1.6 0.6
0.3700
1.0000
1.032
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 118 4.30. E. A
B
C
D
E
F
G
H
Urn A Priori Chance Chance Chance Prob. Weight = Posterior Chance of (Type Chance of of of of the Product of This Type Picking Picking Observation = of Columns This Type Col. C times Col. D B&E of Risk Risk) of Risk 2 3 1 2 3 4 5 6 7 8 9 10
0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100
0.2 0.2 0.2 0.2 0.2 0 1 0 0 0
0.2 0.2 0.2 0.2 0.2 0 0 1 0 0
0.04000 0.04000 0.04000 0.04000 0.04000 0.00000 0.00000 0.00000 0.00000 0.00000
Overall
Mean
0.00400 0.00400 0.00400 0.00400 0.00400 0.00000 0.00000 0.00000 0.00000 0.00000
0.20000 0.20000 0.20000 0.20000 0.20000 0.00000 0.00000 0.00000 0.00000 0.00000
3 3 3 3 3 1 2 3 4 5
0.02000
1.00000
3
4.31. A. The posterior Bayesian estimate is: (0.3)(0.3n) / (7.2 - 0.3n) + (0.6)(7.2 - 0.6n) / (7.2 - 0.3n). We set this equal to 0.54 as given: (0.54)(7.2 - 0.3n) = 0.09n + 4.32 - 0.36n. ⇒ n = 4. A
B
Type of Urn
A Priori Chance of this type of Urn n/12 (12-n)/12
1 2
C
D
Chance Prob. Weight = of the Product Observation of Columns B&C 0.3 .3n/12 0.6 .6(12-n)/12
E
F
Posterior Chance of this type of Urn .3n/(7.2-.3n) (7.2-.6n)/(7.2-.3n)
Mean of this Type of Urn 0.3 0.6
Comment: Given the output, we must solve for the missing input n. One can verify that if n = 4, then the usual Bayesian Analysis would produce an estimate of .54. A
Type of Urn 1 2 Overall
B
C
D
E
F
A Priori Chance of this type of Urn 0.3333 0.6667
Chance of the Observation
Prob. Weight = Product of Columns B&C 0.10 0.40 0.50
Posterior Chance of this type of Urn 0.20 0.80 1.00
Mean of this type of Urn 0.30 0.60 0.54
0.30 0.60
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 119 4.32. B. Let Urn 1 contain two red marbles and one black marble. Let Urn 2 contain one red marble and two black marbles. If A is Urn 1, then the chance of the first marble being red is 2/3. Then this red marble is placed in Urn 2 and the chance that the second marble is red is 2/4. (1 + 1 = 2 red marbles out of 1 + 3 = 4 marbles.) Thus in the case the chance of the observation is (2/3)(2/4) = 1/3. If A is Urn 2, then the chance of the first marble being red is 1/3. Then this red marble is placed in Urn 1 and the chance that the second marble is red is 3/4. (1+2 = 3 red marbles out of 1+3 = 4 marbles.) Thus in the case the chance of the observation is (1/3)(3/4) = 1/4. Note that regardless of whether A is Urn 1 or 2, the first step removes a red marble from Urn A while the second step returns a red marble to Urn A. Thus prior to the third step, Urn A, as well as Urn B, are in their original configurations. Thus the chance that the third selected marble is red if A is Urn 1 is 2/3 and if A is Urn 2 is 1/3. The posterior chances of A being 1 and 2 are 4/7 and 3/7, resulting in a chance of the third marble being red of: (4/7)(2/3) + (3/7)(1/3) = 11/21. A
B
C
Urn Number Which is A
A Priori Chance of This Situation
Chance of the Observation
1 2
0.5 0.5
0.3333 0.2500
D
E
Prob. Weight = Posterior Product of Chance of Columns B & C This Situation
Overall
F
Chance of Red Third Marble
0.1667 0.1250
0.5714 0.4286
0.6667 0.3333
0.2917
1.0000
0.5238
Comment: Too hard and too long for 1 point.
4.33. D. If the first 2n marbles are red, each cycle of 2 draws returns us to the starting configuration of marbles in the Urns. Thus we have n independent repetitions of the experiment in the previous question. Thus the chances of the observations are (1/3)^n and (1/4)^n, if A is Urn 1 and Urn 2 respectively. Since the two situations are a priori equally probable, the posterior probabilities are proportional to (1/3)^n and (1/4)^n. Thus the posterior probabilities are: (1/3)^n / {(1/3)^n + (1/4)^n} and (1/4)^n / {(1/3)^n + (1/4)^n}. As n approaches infinity, the posterior probabilities approach 1 and 0. Thus the estimate that the (2n+1)st marble is red approaches: (1)(2/3) + (0)(1/3) = 2/3.
Comment: Uses the intermediate results of the prior solution.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 120 4.34. B. Label the urn with two dice with faces 1, 2, 3, 4, 5, and 6, as Urn A. Label the urn with one die with faces 1, 2, 3, 4, 5, and 6, and one die with faces 1, 1, 1, 2, 2, and 2 as Urn B. Given Urn A, the average result of rolling two dice is 3.5 + 3.5 = 7. Given Urn A, the chance of observing a sum of 3 is 2/36 = 1/18. Given Urn B, the average result of rolling two dice is 3.5 + 1.5 = 5. Given Urn B, the chance of observing a sum of 3 is 6/36 = 1/6. A
B
C
D
E
F
Type of Urn A B
A Priori Chance of This Type of Urn 0.5000 0.5000
Chance of the Observation 0.0556 0.1667
Prob. Weight = Product of Columns B&C 0.0278 0.0833
Posterior Chance of This Type of Urn 0.2500 0.7500
Mean Outcome for Urn 7.0000 5.0000
0.111
1.000
5.500
Overall
4.35. C. Since the a priori probabilities are equal, the posterior probabilities are proportional to the densities for the observation. Let y = d - 10. The densities for Types A, B, and C are proportional to: exp(-y^2/2), exp(-y^2/8)/2, and exp(-y^2/32)/4. Therefore, since the posterior probability that it is Type B is largest, we have: exp(-y^2/8)/2 > exp(-y^2/2) and exp(-y^2/8)/2 > exp(-y^2/32)/4. Taking logs: -y^2/8 - ln(2) > -y^2/2 and -y^2/8 - ln(2) > -y^2/32 - ln(4). Thus (3/8)y^2 > ln(2) and ln(2) > (3/32)y^2. Therefore |y| > √[8 ln(2)/3] = 2 √[2 ln(2)/3] and |y| < √[32 ln(2)/3] = 4 √[2 ln(2)/3]. Since y = d - 10: 2 √[2 ln(2)/3] < |d - 10| < 4 √[2 ln(2)/3].
Comment: If d is very close to 10, then it is most likely to have come from Type A, which has the smallest variance around 10. If d is very far from 10, then it is most likely to have come from Type C. Therefore, the only way Type B could have the largest posterior probability of the three types is if d is neither too far from nor too close to 10. Of the five choices, only choice C is of this form.
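Not part of the original guide: the band in which Type B has the largest posterior probability can be checked numerically. The following Python sketch (names are illustrative) computes the three normal densities, which are proportional to the posterior probabilities since the priors are equal.

from math import exp, log, sqrt, pi

def normal_pdf(x, mu, var):
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

# Equal priors, so the posterior is proportional to the density.
variances = {"A": 1.0, "B": 4.0, "C": 16.0}

def most_likely_type(d):
    densities = {t: normal_pdf(d, 10.0, v) for t, v in variances.items()}
    return max(densities, key=densities.get)

lower = 2 * sqrt(2 * log(2) / 3)   # about 1.36
upper = 4 * sqrt(2 * log(2) / 3)   # about 2.72

# Type B should be most likely exactly when lower < |d - 10| < upper.
for d in [10.5, 11.0, 12.0, 13.0, 14.0]:
    print(d, most_likely_type(d), lower < abs(d - 10) < upper)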
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 121 4.36. D. Given the red urn, the chance of the observation is: (90%)(10%) = 9%. Given the blue urn, the chance of the observation is: (60%)(90%) = 54%. The mean number from the red urn is: (.9)(1) + (.1)(2) = 1.1. The mean number from the blue urn is: (.1)(1) + (.9)(2) = 1.9. A
B
A Priori Chance of Type of This Type Urn of Urn Red 0.5000 Blue 0.5000 Overall
C
D
E
F
Chance of the Observation 0.0900 0.5400
Prob. Weight = Product of Columns B&C 0.0450 0.2700
Posterior Chance of This Type of Urn 0.1429 0.8571
Mean Outcome for Urn 1.1000 1.9000
0.315
1.000
1.786
4.37. C. Prob[smoker | died] = Prob[died | smoker] Prob[smoker]/Prob[died] = (.05)(.1)/{(.05)(.1) + (.01)(.9)} = .005/.014 = 0.357.
4.38. C. Prob[high | 1 claim] = Prob[1 claim | high] Prob[high]/Prob[1 claim] = (.6e^-.6)(.1)/{(.6e^-.6)(.1) + (.1e^-.1)(.9)} = .0329/.1144 = .288. Prob[low | 1 claim] = Prob[1 claim | low] Prob[low]/Prob[1 claim] = (.1e^-.1)(.9)/{(.6e^-.6)(.1) + (.1e^-.1)(.9)} = .0814/.1144 = .712. Expected number of claims: (.288)(.6) + (.712)(.1) = 0.244.
4.39. D. Prob[1997 | accident] = Prob[accident | 1997] Prob[1997]/Prob[accident] = (.05)(.16)/{(.05)(.16) + (.02)(.18) + (.03)(.20)} = .008/.0176 = 0.4545. Comment: Since the automobile was from one of the model years 1997, 1998, and 1999, we do not use the information on other model years.
4.40. B. Prob[disease | positive] = Prob[positive | disease] Prob[disease] / Prob[positive] = (.95)(.01)/{(.95)(.01) + (.005)(.99)} = .0095/.01445 = 0.657.
4.41. D. Prob[young | collision] = Prob[collision | young] Prob[young] / Prob[collision] = (.08)(16%)/{(.15)(8%) + (.08)(16%) + (.04)(45%) + (.05)(31%)} = .0128/.0583 = 0.220.
4.42. C. Let Prob[smoker | no problem] = p. Then Prob[smoker | problem] = 2p. Prob[problem | smoker] = Prob[smoker | problem] Prob[problem] / Prob[smoker] = (2p)(.25)/{(2p)(.25) + (p)(.75)} = .5/1.25 = 0.4.
4.43. D. Prob[ultra | die] = Prob[die | ultra] Prob[ultra] / Prob[die] = (.001)(10%)/{(.001)(10%) + (.005)(40%) + (.010)(50%)} = .0001/.0071 = 0.0141.
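These one-line applications of Bayes Theorem all follow the same pattern: multiply each a priori probability by the likelihood of the observation and normalize. A short Python sketch (not part of the original guide; the numbers are those of 4.38 above) illustrates the pattern.

from math import exp

priors  = [0.1, 0.9]     # high risk, low risk
lambdas = [0.6, 0.1]     # Poisson means

# Chance of exactly one claim under a Poisson with mean lambda: lambda * e^(-lambda).
weights = [p * lam * exp(-lam) for p, lam in zip(priors, lambdas)]
total = sum(weights)
posterior = [w / total for w in weights]                    # about 28.8% and 71.2%
expected_claims = sum(pr * lam for pr, lam in zip(posterior, lambdas))
print(posterior, expected_claims)                           # expected_claims is about 0.244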
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 122
4.44. A. Prob[observation | X] = 30(.1)(.9^29). Prob[observation | not X] = 30(.02)(.98^29). Prob[X | obs.] = Prob[obs. | X] Prob[X] / Prob[obs.] = 30(.1)(.9^29)(1/5) / {30(.1)(.9^29)(1/5) + 30(.02)(.98^29)(4/5)} = (.1)(.9^29) / {(.1)(.9^29) + (4)(.02)(.98^29)} = 1/{1 + .8(.98/.90)^29} = 0.096.
4.45. C. If q is the chance of a head, then the probability of the observation of head, head, tail, head, in that order, is: q^3(1-q).
B
C
D
E
F
Type of Coin
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Prob. of a Head
I II III
66.67% 16.67% 16.67%
0.0625 0.0117 0.1055
0.0417 0.0020 0.0176
68.09% 3.19% 28.72%
0.500 0.250 0.750
0.0612
100.00%
0.564
Overall
Comment: Since this is a Bernoulli, with 1 corresponding to a head and 0 corresponding to a tail, Prob[Head on 5th flip] = Prob[X5 = 1] = E[X5 ].
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 123 4.46. D. One uses the posterior distribution from the solution to the exam question. Then the chance of two heads on the next two coin flips is q2 . A
B
C
D
E
F
Type of Coin
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Prob. of Two Heads
I II III
66.67% 16.67% 16.67%
0.0625 0.0117 0.1055
0.0417 0.0020 0.0176
68.09% 3.19% 28.72%
0.2500 0.0625 0.5625
0.0612
100.00%
0.334
Overall
Alternately, from the solution to the exam question, the chance of a head on coin flip number five is 0.564. Also the distribution posterior to the fourth coin flip is: 68.09%, 3.19%, and 28.72%. Use this as the prior distribution to the fifth coin flip, and then get the distribution posterior to the fifth coin flip, assuming a head on the fifth coin flip. A
B
C
D
E
F
Type of Coin
Probability Prior to the 5th Flip
Chance of Head on the 5th Flip
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Chance of Head on the 6th Flip
I II III
68.09% 3.19% 28.72%
0.500 0.250 0.750
0.3404 0.0080 0.2154
60.38% 1.41% 38.20%
0.500 0.250 0.750
0.5638
100.00%
0.592
Overall
Prob[head on 5th and head on 6th] = Prob[head on 5th] Prob[head on 6th | head on 5th] = (0.564)(0.592) = 0.334. Comment: The correct answer is not: 0.564^2 = 0.318.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 124
4.47. C. If q is the chance of a head, then the probability of the observation of 3 heads in 4 trials is the density at 3 of a Binomial with m = 4: 4q^3(1-q). A
B
C
D
E
F
Type of Coin
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Prob. of a Head
I II III
66.67% 16.67% 16.67%
0.2500 0.0469 0.4219
0.1667 0.0078 0.0703
68.09% 3.19% 28.72%
0.500 0.250 0.750
0.2448
100.00%
0.564
Overall
Comment: The observation in this question did not specify the order in which the heads and tails occurred. Therefore, each of the chances of observation were 4 times those in the exam question. However, the factor of 4 was common to each row, so the posterior distribution was the same as in the previous question. Therefore, the solution was the same as in the exam question.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 125 4.48. B. One uses the posterior distribution from the solution to the exam question. Then the chance of tail, head, tail, on the next three coin flips is (1-q)q(1-q). A
B
C
D
E
F
Type of Coin
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Prob. of Head, Tail, Head
I II III
66.67% 16.67% 16.67%
0.0625 0.0117 0.1055
0.0417 0.0020 0.0176
68.09% 3.19% 28.72%
0.1250 0.1406 0.0469
0.0612
100.00%
0.103
Overall
Alternately, from the solution to the exam question, the chance of a head on coin flip number five is .564. Also the distribution posterior to the fourth coin flip is: 68.09%, 3.19%, and 28.72%. Use this as the prior distribution to the fifth coin flip, and then get the distribution posterior to the fifth coin flip, assuming a tail on the fifth coin flip. A
B
C
D
E
F
Type of Coin
Probability Prior to the 5th Flip
Chance of Tail on the 5th Flip
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Chance of Head on the 6th Flip
I II III
68.09% 3.19% 28.72%
0.500 0.750 0.250
0.3404 0.0239 0.0718
78.05% 5.49% 16.46%
0.500 0.250 0.750
0.4362
100.00%
0.527
Overall
Use this as the prior distribution to the sixth coin flip, and then get the distribution posterior to the sixth coin flip, assuming a head on the sixth coin flip. A
B
C
D
E
F
Type of Coin
Probability Prior to the 6th Flip
Chance of Head on the 6th Flip
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Coin
Chance of Head on the 7th Flip
I II III
78.05% 5.49% 16.46%
0.500 0.250 0.750
0.3902 0.0137 0.1235
73.99% 2.60% 23.41%
0.500 0.250 0.750
0.5274
100.00%
0.552
Overall
Prob[tail on 5th, head on 6th, and tail on 7th] = Prob[tail on 5th] Prob[head on 6th | tail on 5th] Prob[tail on 7th | tail on 5th and head on 6th] = (1 - 0.564)(0.527)(1 - 0.552) = 0.103. Comment: The correct answer is not: (1 - .564)(.564)(1 - .564) = 0.107. 4.49. B. Prob[young | accident] = Prob[accident | young] Prob[young] / Prob[accident] = (.06)(8%)/{(.06)(8%) + (.03)(15%) + (.02)(49%) + (.04)(28%)} = .0048/.0303 = 0.158.
2013-4-9 Buhlmann Credibility §4 Bayes Analysis Introduction, HCM 10/19/12, Page 126
4.50. D. Let q = Prob[die | nonsmoker]. Then Prob[die | light smokers] = 2q and Prob[die | heavy smoker] = 4q. Prob[heavy smoker | die] = Prob[die | heavy smoker] Prob[heavy smoker] / Prob[die] = (4q)(.2)/{(4q)(.2) + (2q)(.3) + (q)(.5)} = .8/1.9 = 0.421.
4.51. E. Prob[G = 1/3 | D = 0] = Prob[G = 1/3] Prob[D = 0 | G = 1/3] / Prob[D = 0] = (2/5)(1/3)/{(3/5)(1/5) + (2/5)(1/3)} = (2/15)/(3/25 + 2/15) = 10/19. Comment: The distribution of D is Bernoulli with mean 1 - g. An application of Bayes Theorem. For whatever reason, the way the question was worded made this simple question harder for me. A mathematically equivalent statement of the question: Frequency is Bernoulli. The prior distribution of q is: Prob[q = 4/5] = 60% and Prob[q = 2/3] = 40%. If no claims are observed, what is the posterior probability that q is 2/3?
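The last solution is easy to verify numerically; the following Python fragment (not part of the original guide) reproduces the 10/19.

# Posterior probability that G = 1/3 given D = 0, as in 4.51 above.
priors = {1/5: 3/5, 1/3: 2/5}
weights = {g: p * g for g, p in priors.items()}   # Pr(D = 0 | G = g) = g
total = sum(weights.values())
print(weights[1/3] / total)                       # 10/19, about 0.526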
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 127
Section 5, Bayesian Analysis, with Discrete Types of Risks In this section, Bayesian Analysis will be applied to situations involving frequency, severity, pure premiums, or aggregate losses, when there are discrete types of risks. The case of continuous types of risks will be covered in the next section.
Bayesian Analysis, Frequency Example: One can use Bayesian Analysis to predict the future claim frequency. For example, assume the following information:
Type    Portion of Risks in this Type    Bernoulli (Annual) Frequency Distribution
1       50%                              q = 40%
2       30%                              q = 70%
3       20%                              q = 80%
We assume that the types are homogeneous; i.e., every risk of a given type has the same frequency process. Assume in addition that a risk is picked at random and that we do not know what type it is.27 If for this randomly selected risk during 4 years one observes 3 claims, then one can use Bayesian Analysis to predict the future frequency. For the three types, the mean (annual) frequencies are: 0.4, 0.7, and 0.8. Therefore the a priori mean frequency is: (50%)(0.4) + (30%)(0.7) + (20%)(0.8) = 0.57. If the frequency is Bernoulli each year, assuming the years are independent, then the frequency is Binomial over 4 years, with parameters q and m = 4. Therefore, a risk of type 1 is Binomial with q = 40% and m = 4. Thus the chance of observing 3 claims in four years for a risk of type 1 is: (0.4)^3 (0.6)^1 4!/(3! 1!) = 0.1536. Similarly, the chance of observing 3 claims in four years for a risk of type 2 is: (0.7)^3 (0.3)^1 4!/(3! 1!) = 0.4116. The chance of observing 3 claims in four years for a risk of type 3 is: (0.8)^3 (0.2)^1 4!/(3! 1!) = 0.4096. 27
The latter is very important. If one knew which type the risk was, one would use the expected value for that type in order to estimate the future frequency.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 128 Then by Bayesʼ Theorem after observing 3 claims in 4 years, the posterior chance of this individual being of each type is proportional to the product of the a priori chance of being of that type and the chance of the observation if the individual were of that type. So for example, the chance of the individual being from type 1 is proportional to: (50%)(0.1536) = 0.0768. The probability weights for the other two types are: (30%)(0.4116) = 0.12348, and (20%)(0.4096) = 0.08192. One can convert these probability weights to probabilities by dividing by their sum.28 Thus, 0.0768 / 0.2822 = 27.21%, 0.12348 / 0.2822 = 43.76%, and 0.08192 / 0.2822 = 29.03%. Finally one can weight together the mean frequencies for each type, using these posterior probabilities.29 (27.21%)(0.4) + (43.76%)(0.7) + (29.03%)(0.8) = 0.6474. Thus using Bayesian Analysis the estimated future annual frequency for this individual is 0.6474.30 This whole calculation is organized in a spreadsheet as follows: A
B
C
D
E
F
Type
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Freq.
1 2 3
50% 30% 20%
0.15360 0.41160 0.40960
0.076800 0.123480 0.081920
27.215% 43.756% 29.029%
0.4 0.7 0.8
0.282200
100.000%
0.6474
Overall
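Not part of the original guide: the spreadsheet above can be reproduced with a few lines of code. The following Python sketch (names are illustrative) carries out the same Bayesian Analysis.

from math import comb

priors = [0.5, 0.3, 0.2]
qs     = [0.4, 0.7, 0.8]

# Chance of 3 claims in 4 years for each type: Binomial(m = 4, q) density at 3.
likelihoods = [comb(4, 3) * q**3 * (1 - q) for q in qs]   # 0.1536, 0.4116, 0.4096
weights = [p * l for p, l in zip(priors, likelihoods)]    # probability weights
total = sum(weights)                                      # 0.2822
posterior = [w / total for w in weights]                  # 27.2%, 43.8%, 29.0%
estimate = sum(pr * q for pr, q in zip(posterior, qs))
print(posterior, estimate)                                # estimate is about 0.6474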
Sequential Approach: In the above example, let us assume that for a given insured, one observes 3 claims over the first four years, and then no claim in the fifth year. Then there are two equivalent ways to perform Bayes Analysis. One could start from time zero and go through year 5 in one step. The chance of the observation given q is: {4q^3(1 - q)}(1 - q) = 4q^3 (1 - q)^2.
28
The sum of the probability weights is the a priori chance of the observations. In this case there was an a priori chance of observing 3 claims in 4 years of 28.22%. 29 One would get the same answer whether one used the posterior probabilities or the posterior probability weights to take the weighted average. 30 As shown in a subsequent section, in this case one would get a different estimate if one used Buhlmann Credibility.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 129 A
B
C
D
E
F
Type
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Freq.
1 2 3
50% 30% 20%
0.09216 0.12348 0.08192
0.046080 0.037044 0.016384
46.31% 37.23% 16.47%
0.4 0.7 0.8
0.099508
1.000
0.5775
Overall
Alternately, one could use the distribution posterior to 4 years as the distribution prior to year 5. The chance of the observation in year 5 given q is: (1 - q). A
B
C
D
E
F
Type
Posterior to Year 4 Probability
Year 5 Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Freq.
1 2 3
27.215% 43.756% 29.029%
0.6 0.3 0.2
0.163290 0.131268 0.058058
46.31% 37.23% 16.46%
0.4 0.7 0.8
0.352616
1.000
0.5775
Overall
This results in the same estimate as previously, using one big step rather than two smaller steps as here. Such a sequential approach works for Bayesian Analysis in general.
Bayesian Analysis, First Severity Example: One can use Bayesian Analysis to predict the future claim severity. For example, assume the following information:
Type    Portion of Risks in this Type    Gamma Severity Distribution
1       50%                              α = 4, θ = 100
2       30%                              α = 3, θ = 100
3       20%                              α = 2, θ = 100
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 130
We assume that the types are homogeneous; i.e., every risk of a type has the same severity process. Assume in addition that a risk is picked at random and we do not know what type it is.31 Finally we assume that we are unconcerned with frequency.32 If for this randomly selected risk one observes 3 claims for a total of $450, then one can use Bayesian Analysis to predict the future severity of this risk. The sum of 3 independent claims drawn from a single Gamma Distribution is another Gamma Distribution, but with parameters 3α and θ. Thus if the risk is of type 1, the distribution of the sum of 3 claims is Gamma with α = 12 and θ = 100. For type 2 the sum has parameters α = 9 and θ = 100. For type 3 the sum has parameters α = 6 and θ = 100. Thus assuming one has 3 claims, the chance that they will add to $450 if the risk is from type 1 is the density at 450 of a Gamma Distribution with α = 12 and θ = 100: θ^(-α) x^(α-1) e^(-x/θ) / Γ(α) = (0.01)^12 450^11 e^(-(0.01)(450)) / Γ(12) = 0.0000426439. Similarly, the chance for type 2 is: (0.01)^9 450^8 e^(-(0.01)(450)) / Γ(9) = 0.000463292. The chance for a risk from type 3 is: (0.01)^6 450^5 e^(-(0.01)(450)) / Γ(6) = 0.00170827. Then by Bayesʼ Theorem, the posterior chance of this individual being from each type is proportional to the product of the a priori chance of being of that type and the chance of the observation if the individual were of that type. So for example, the chance of the individual being of type 1 is proportional to: (50%)(0.0000426439) = 0.0000213. The probability weights for the other two types are: (30%)(0.000463292) = 0.0001390, and (20%)(0.00170827) = 0.0003417. One can convert these probability weights to probabilities by dividing by their sum. For example, 0.0003417 / 0.0005020 = 68.06%. Since the mean of the Gamma Distribution is αθ, for the three types the mean severities are: 400, 300 and 200. Finally one can weight together the mean severities for each type, using these posterior probabilities: (4.25%)(400) + (27.69%)(300) + (68.06%)(200) = 236.33 Thus using Bayesian Analysis the estimated future annual severity for this individual is 236.
31
The latter is very important. If one knew which type the risk was, one would use the expected value for that type in order to estimate the future severity. 32 Either all the risk types have the same frequency process, or we are given no time period within which the number of claims is observed. 33 One would get the same answer whether one used the posterior probabilities or the posterior probability weights to take the weighted average.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 131 This whole calculation is organized in a spreadsheet as follows: A
B
C
D
E
F
Type
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Severity
1 2 3
50% 30% 20%
0.0000426 0.0004633 0.0017083
0.0000213 0.0001390 0.0003417
4.25% 27.69% 68.06%
400 300 200
0.0005020
100.00%
236
Overall
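Not part of the original guide: a short Python sketch (illustrative names) that reproduces this severity example, using the density of the Gamma sum of the three claims.

from math import exp, lgamma, log

def gamma_pdf(x, alpha, theta):
    # density of a Gamma(alpha, theta) distribution at x
    return exp((alpha - 1) * log(x) - x / theta - alpha * log(theta) - lgamma(alpha))

priors = [0.5, 0.3, 0.2]
alphas = [4, 3, 2]                 # per-claim Gamma shape parameters
means  = [400, 300, 200]           # mean severities, alpha * theta

# The sum of 3 independent Gamma(alpha, 100) claims is Gamma(3 * alpha, 100).
weights = [p * gamma_pdf(450, 3 * a, 100) for p, a in zip(priors, alphas)]
total = sum(weights)
posterior = [w / total for w in weights]
print(posterior)                                      # about 4.25%, 27.69%, 68.06%
print(sum(p * m for p, m in zip(posterior, means)))   # about 236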
Bayesian Analysis, Second Severity Example: This example is similar to the previous example, except we observe the individual claim sizes rather than just the total. As before, one can use Bayesian Analysis to predict the future claim severity. Assume as previously:
Type    Portion of Risks in this Type    Gamma Severity Distribution
1       50%                              α = 4, θ = 100
2       30%                              α = 3, θ = 100
3       20%                              α = 2, θ = 100
We assume that the types are homogeneous; i.e., every risk within a type has the same severity process. Assume in addition that a risk is picked at random and we do not know what type it is.34 Finally we assume that we are unconcerned with frequency.35 Since the mean of the Gamma Distribution is αθ, for the three types the mean severities are: 400, 300, and 200. If for this randomly selected risk one observes 3 claims of sizes: $50, $100 and $300 (for a total of $450), then one can use Bayesian Analysis to predict the future severity of this risk.
34
The latter is very important. If one knew which type the insured was, one would use the expected value for that type in order to estimate the future severity. 35 Either all the risk types have the same frequency process, or we are given no time period within which the number of claims is observed.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 132
If one assumes that the claims occurred in the order 50, 100 and 300, then the likelihood of the three observed claims is the product of the individual likelihoods: f(50)f(100)f(300). If one assumes the claims could have occurred in any order, one gets an extra factor of 3! = 6. Since no order was specified, I will include the factor of 6.36 6f(50)f(100)f(300) = 6 {θ^(-α) 50^(α-1) e^(-50/θ)/Γ(α)} {θ^(-α) 100^(α-1) e^(-100/θ)/Γ(α)} {θ^(-α) 300^(α-1) e^(-300/θ)/Γ(α)} = 6 θ^(-3α) 1,500,000^(α-1) e^(-450/θ) / Γ(α)^3. So if the risk is of type 1 with α = 4 and θ = 100, the likelihood is: (6)(0.01)^12 (1,500,000)^3 e^(-4.5) / Γ(4)^3 = 1.0415 x 10^-9. Similarly, the chance for type 2 is: (6)(0.01)^9 (1,500,000)^2 e^(-4.5) / Γ(3)^3 = 1.8746 x 10^-8. The chance for a risk of type 3 is: (6)(0.01)^6 (1,500,000)^1 e^(-4.5) / Γ(2)^3 = 9.9981 x 10^-8. Then by Bayesʼ Theorem, the posterior chance of this individual being of each type is proportional to the product of the a priori chance of being of that type and the chance of the observation if the individual were of that type. So for example, the chance of the individual being from type 1 is proportional to: (50%)(1.0415 x 10^-9) = 5.2075 x 10^-10. The probability weights for the other two types are: (30%)(1.8746 x 10^-8) = 5.6238 x 10^-9, and (20%)(9.9981 x 10^-8) = 1.9996 x 10^-8. One can convert these probability weights to probabilities by dividing by their sum. For example, 5.2075 x 10^-10 / 2.6141 x 10^-8 = 1.99%. Finally one can weight together the mean severities for each type, using these posterior probabilities:37 (1.99%)(400) + (21.51%)(300) + (76.49%)(200) = 225. Thus using Bayesian Analysis, the estimated future annual severity for this individual is 225.
36
Since this extra factor would appear on each row, it would drop out when one calculated posterior probabilities. Therefore, some prefer to leave it and similar factors out, when it would not make a difference such as in this case. 37 One would get the same answer whether one used the posterior probabilities or the probability weights to take the weighted average.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 133 Note that differs from the estimate of 236 we got previously using the sum of claims rather than the individual claim values. On the exam, if you are given the individual claim sizes use them in this situation, unless specifically told otherwise.38 This whole calculation is organized in a spreadsheet as follows: A
B
C
D
E
F
Type
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Severity
1 2 3
50% 30% 20%
0.00000000104 0.00000001875 0.00000009998
0.00000000052 0.00000000562 0.00000002000
1.99% 21.51% 76.49%
400 300 200
0.00000002614
100.00%
225
Overall
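Not from the original guide: the same answer can be checked with a short script that multiplies the individual claim-size densities (the 3! factor would cancel in the posterior, so it is omitted here).

from math import exp, lgamma, log

def gamma_pdf(x, alpha, theta):
    return exp((alpha - 1) * log(x) - x / theta - alpha * log(theta) - lgamma(alpha))

priors = [0.5, 0.3, 0.2]
alphas = [4, 3, 2]
means  = [400, 300, 200]
claims = [50, 100, 300]

weights = []
for p, a in zip(priors, alphas):
    like = 1.0
    for c in claims:
        like *= gamma_pdf(c, a, 100)   # likelihood of each observed claim size
    weights.append(p * like)

total = sum(weights)
posterior = [w / total for w in weights]
print(posterior)                                      # about 1.99%, 21.51%, 76.49%
print(sum(p * m for p, m in zip(posterior, means)))   # about 225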
Bayesian Analysis, Third Severity Example: This example is similar to the previous example, except we consider the frequency processes. As before, one can use Bayesian Analysis to predict the future claim severity. Assume the following information:
Type    Portion of Risks in this Type    Bernoulli Frequency Dist.    Gamma Severity Dist.
1       50%                              q = 40%                      α = 4, θ = 100
2       30%                              q = 70%                      α = 3, θ = 100
3       20%                              q = 80%                      α = 2, θ = 100
We assume that the types are homogeneous; i.e., every risk within a type has the same frequency and severity processes. Assume in addition that a risk is picked at random and we do not know what type it is. If for this randomly selected risk one observes 3 claims in 4 years, of sizes: $50, $100 and $300, in any order, then one can use Bayesian Analysis to predict the future severity of this risk. Combining the computations of the previous severity example and an earlier frequency example, one can determine the probability of observing 3 claims in 4 years and the probability given one has observed three claims that they are of sizes 50, 100 and 300. Then P(Observation | Risk Type) = P(3 claims in 4 years | Risk Type) P(Severities = 50, 100, 300 | Risk Type & 3 claims).
38
Note the similar Buhlmann Credibility situation, to be discussed subsequently, only uses the sum of claims, never their separate values.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 134 For example for risk type 2, the chance of observing 3 claims in 4 years was computed in the frequency example as 0.4116. For risk type 2, the likelihood given one has observed three claims that they were of sizes 50, 100 and 300, in any order, was computed in the previous severity example as 1.8746 x 10-8. Thus given risk type 2, the chance of the current observation is their product: (0.4116)(1.8746 x 10-8) = 7.716 x 10-9. Then the probability weight for risk type 2 is the product of the chance of the observation given risk type 2 and the a priori chance of risk type 2: (7.716 x 10-9)(30%) = 2.315 x 10-9. The Bayesian Analysis calculation is organized in a spreadsheet as follows: A
B
C
D
E
Chance of Chance of Observing Observing A Priori 3 Claims Given Severities Type Probability in 4 Years Given 3 Claims 1 2 3
50% 30% 20%
0.1536 0.4116 0.4096
Chance of the Observation CxD
F
G
H
Prob. Weight = Posterior Product Chance of Mean of Columns This Type Annual B&E of Risk Severity
0.00000000104 0.00000000016 0.00000000008 0.00000001875 0.00000000772 0.00000000231 0.00000009998 0.00000004095 0.00000000819
0.76% 21.87% 77.38%
400 300 200
0.00000001059
100.00%
223
Overall
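Not from the original guide: a Python sketch combining the frequency and severity likelihoods, as in the spreadsheet above.

from math import comb, exp, lgamma, log

def gamma_pdf(x, alpha, theta):
    return exp((alpha - 1) * log(x) - x / theta - alpha * log(theta) - lgamma(alpha))

priors = [0.5, 0.3, 0.2]
qs     = [0.4, 0.7, 0.8]       # Bernoulli parameters; Binomial(m = 4, q) over 4 years
alphas = [4, 3, 2]
means  = [400, 300, 200]
claims = [50, 100, 300]

weights = []
for p, q, a in zip(priors, qs, alphas):
    freq = comb(4, 3) * q**3 * (1 - q)     # chance of 3 claims in 4 years
    sev = 1.0
    for c in claims:
        sev *= gamma_pdf(c, a, 100)        # likelihood of the observed claim sizes
    weights.append(p * freq * sev)

total = sum(weights)
posterior = [w / total for w in weights]
print(posterior)                                      # about 0.76%, 21.87%, 77.38%
print(sum(p * m for p, m in zip(posterior, means)))   # about 223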
Bayesian Analysis, Pure Premium Example, Continuous Distributions: One can use Bayesian Analysis to predict the future pure premium. However, exam questions involving Bayesian Analysis, pure premiums, and continuous severity distributions tend to be very long. Here is an example that shows why. Assume the following information:
Type    Portion of Risks in this Type    Bernoulli Frequency Dist.    Gamma Severity Dist.
1       50%                              q = 40%                      α = 4, θ = 100
2       30%                              q = 70%                      α = 3, θ = 100
3       20%                              q = 80%                      α = 2, θ = 100
We assume that the types are homogeneous; i.e., every risk of a given type has the same frequency process and the same severity process. Assume that for an individual risk the frequency and severity are independent. Assume in addition that a risk is picked at random and that we do not know what type it is.39 39
The latter is very important. If one knew which class the risk was from, one would use the expected value for that class to estimate the future pure premium.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 135 If for this randomly selected risk one observes in 4 years a total of $450, then one can use Bayesian Analysis to predict the future pure premium of this risk. For the three types the mean (annual) frequencies are: 0.4, 0.7 and 0.8. Since the mean of the Gamma Distribution is αθ, for the three types the mean severities are: 400, 300 and 200. Thus the mean pure premiums are: (0.4)(400) = 160, (0.7)(300) = 210, and (0.8)(200) = 160. Now comes the difficult part. One needs to compute the probability of observing $450 of loss in 4 years. In this case involving the Binomial Distribution, there are at most 4 claims in 4 years.40 If there is no claim, then one would not have any losses, so that is not a possibility for the observation. However, if one has one claim, then the chance of $450 in loss is the Gamma Distribution with α and θ at x = 450: θ−αxα−1 e−x/θ / Γ(α). If one has two claims, then the chance of $450 in loss is the Gamma Distribution41 with 2α and θ at x = 450: θ−2αx2α−1 e−x/θ / Γ(2α). If one has three claims, then the chance of $450 in loss is the Gamma Distribution with 3α and θ at x = 450: θ−3αx3α−1 e−x/θ / Γ(3α). If one has four claims, then the chance of $450 in loss is the Gamma Distribution with 4α and θ at x = 450: θ−4αx4α−1 e−x/θ / Γ(4α). Now, the chance of one claim in four years is: 4q(1-q)3 . The chance of two claims in four years is: 6q2 (1-q)2 . The chance of three claims in four years is: 4q3 (1-q). The chance of four claims in four years is: q4 . Thus the overall chance of observing $450 in four years is: 4q(1-q)3 θ−α xα−1 e−x/θ /Γ(α) + 6q2 (1-q)2 θ−2α x2α−1 e−x/θ /Γ(2α) + 4q3 (1-q) θ−3α x3α−1e−x/θ /Γ(3α) + q4 θ−4α x4α−1 e−x/θ /Γ(4α).
40
If instead of a Binomial one had a Poisson, there would be a small but positive chance of any very large number of claims. 41 The sum of two independent Gamma distributions with α and θ is a Gamma distributions with 2α and θ.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 136
For type 1 with q = 0.4, α = 4, and θ = 100, the computation of the overall chance of observing $450 goes as follows:42

A            B                 C                            D
Number       Probability of    Given this Number of         Column B times
of Claims    this number       Claims, the Probability      Column C
             of Claims         of $450 in Total Losses
0            0.1296            0                            0
1            0.3456            0.0016871788                 0.0005830890
2            0.3456            0.0008236295                 0.0002846464
3            0.1536            0.0000426439                 0.0000065501
4            0.0256            0.0000005338                 0.0000000137
Sum          1                                              0.0008742991
Thus for a risk of type 1 the likelihood of $450 in loss in 4 years is 0.000874299.
In a similar manner, for a risk of type 2 the likelihood of $450 in loss in 4 years is: 0.000737971.

A            B                 C                            D
Number       Probability of    Given this Number of         Column B times
of Claims    this number       Claims, the Probability      Column C
             of Claims         of $450 in Total Losses
0            0.0081            0                            0
1            0.0756            0.0011247859                 0.0000850338
2            0.2646            0.0017082686                 0.0004520079
3            0.4116            0.0004632916                 0.0001906908
4            0.2401            0.0000426439                 0.0000102388
Sum          1                                              0.0007379713
In a similar manner, for a risk of type 3 the likelihood of $450 in loss in 4 years is: 0.0013090.

A            B                 C                            D
Number       Probability of    Given this Number of         Column B times
of Claims    this number       Claims, the Probability      Column C
             of Claims         of $450 in Total Losses
0            0.0016            0                            0
1            0.0256            0.0004999048                 0.0000127976
2            0.1536            0.0016871788                 0.0002591507
3            0.4096            0.0017082686                 0.0006997068
4            0.4096            0.0008236295                 0.0003373586
Sum          1                                              0.0013090137
While this is not particularly difficult compared to computations actuaries typically do with the aid of a computer, it would take very long on the exam.
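With a computer it is quick; for instance, a minimal sketch in Python, assuming scipy is available (the variable names are illustrative), reproduces the three likelihoods:

```python
from scipy.stats import binom, gamma

theta, x, n_years = 100, 450, 4
freq_sev = [(0.4, 4), (0.7, 3), (0.8, 2)]   # (Bernoulli q, Gamma alpha) for types 1, 2, 3

for q, alpha in freq_sev:
    # Sum over the possible numbers of claims in 4 years; the sum of n independent
    # Gamma severities with shape alpha (and common theta) is Gamma with shape n*alpha.
    likelihood = sum(binom.pmf(n, n_years, q) * gamma.pdf(x, n * alpha, scale=theta)
                     for n in range(1, n_years + 1))
    print(likelihood)   # approximately 0.000874, 0.000738, 0.001309
```

The a priori weights of 50%, 30%, and 20% then enter only at the final weighting step shown in the table below.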
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 137
One can use these likelihoods in the usual manner to get posterior probabilities for each type:

A        B              C                D                    E                   F
Type     A Priori       Chance of the    Prob. Weight =       Posterior Chance    Mean Annual
         Probability    Observation      Product of           of This Type        Pure Premium
                                         Columns B&C          of Risk
1        50%            0.0008742990     0.0004371495         47.50%              160
2        30%            0.0007379710     0.0002213913         24.06%              210
3        20%            0.0013090100     0.0002618020         28.45%              160
Overall                                  0.0009203428         100.00%             172
Using the posterior probabilities of 47.50%, 24.06%, and 28.45% as weights, the estimated future pure premium is $172. Other than the length of time necessary to calculate Column C, the chance of the observation, this was a typical Bayesian Analysis question. In the next example, the frequency and severity process are somewhat simpler.
Bayesian Analysis, Pure Premium Example, Discrete Distributions:43

Assume there are two types of insureds with different risk processes:

Type    Portion of Risks in this Type
A       75%
B       25%

You are given the following for Type A:
• The number of claims for a single exposure period will be either 0, 1 or 2:
    Number of Claims    Probability
    0                   60%
    1                   30%
    2                   10%
• The size of each claim, independent of any other, will be 50, with probability 80%; or 100, with probability 20%.
You are given the following for Type B:
• The number of claims for a single exposure period will be either 0, 1 or 2:
    Number of Claims    Probability
    0                   50%
    1                   30%
    2                   20%
• The size of each claim, independent of any other, will be 50, with probability 60%; or 100, with probability 40%.
43 See also the subsequent section on the Die/Spinner Models.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 138
A risk is selected at random44 and you observe a total of $100 in losses during a single year. One can use Bayesian Analysis to estimate the future pure premium for this risk.
For type A, the chance of observing $100 in total losses is the sum of the chance for one claim of size $100 and that for two claims each of $50: (0.3)(0.2) + (0.1)(0.8^2) = 6.0% + 6.4% = 12.4%.
For type B, the chance of observing $100 in total losses is: (0.3)(0.4) + (0.2)(0.6^2) = 12.0% + 7.2% = 19.2%.
Then the Bayesian Analysis of the pure premiums proceeds as follows:

A         B             C               D                E                  F       G       H
Type      A Priori      Chance of the   Prob. Weight =   Posterior Chance   Mean    Mean    Mean Annual
          Probability   Observation     Product of       of This Type       Freq.   Sev.    Pure Premium
                                        Columns B&C      of Risk
A         75%           0.124           0.09300          65.96%             0.5     60      30
B         25%           0.192           0.04800          34.04%             0.7     70      49
Overall                                 0.14100          100.00%                            36.47

Thus the estimated future pure premium is: (65.96%)(30) + (34.04%)(49) = $36.47.
44 You donʼt know to which type this risk belongs.
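The same spreadsheet logic is easy to script; a minimal sketch in Python (the helper names are illustrative):

```python
# Bayesian estimate for the discrete pure premium example (illustrative names).
types = {
    "A": (0.75, {0: 0.6, 1: 0.3, 2: 0.1}, {50: 0.8, 100: 0.2}),
    "B": (0.25, {0: 0.5, 1: 0.3, 2: 0.2}, {50: 0.6, 100: 0.4}),
}

def chance_of_100(counts, sizes):
    # $100 in total losses = one claim of 100, or two claims of 50 each.
    return counts[1] * sizes[100] + counts[2] * sizes[50] ** 2

def mean_pure_premium(counts, sizes):
    # Frequency and severity are independent here, so the mean pure premium
    # is mean frequency times mean severity.
    mean_freq = sum(n * p for n, p in counts.items())
    mean_sev = sum(x * p for x, p in sizes.items())
    return mean_freq * mean_sev

weights = {t: prior * chance_of_100(c, s) for t, (prior, c, s) in types.items()}
total = sum(weights.values())
estimate = sum(weights[t] / total * mean_pure_premium(*types[t][1:]) for t in types)
print(round(estimate, 2))   # 36.47
```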
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 139
Bayesian Analysis, Pure Premium Example, Dependent Frequency and Severity:

Unlike the previous example, in this case the frequency and severity are not independent.
Assume there are two types of insureds with different risk processes:

Type    Portion of Risks in this Type
A       75%
B       25%

You are given the following for Type A:
• The number of claims for a single exposure period will be either 0, 1 or 2:
    Number of Claims    Probability
    0                   60%
    1                   30%
    2                   10%
• If only one claim is incurred, the size of the claim will be 50, with probability 80%; or 100, with probability 20%.
• If two claims are incurred, the size of each claim, independent of the other, will be 50, with probability 50%; or 100, with probability 50%.
You are given the following for Type B:
• The number of claims for a single exposure period will be either 0, 1 or 2:
    Number of Claims    Probability
    0                   50%
    1                   30%
    2                   20%
• If only one claim is incurred, the size of the claim will be 50, with probability 60%; or 100, with probability 40%.
• If two claims are incurred, the size of each claim, independent of the other, will be 50, with probability 30%; or 100, with probability 70%.
A risk is selected at random45 and you observe a total of $100 in losses during a single year. One can use Bayesian Analysis to estimate the future pure premium for this risk.
45 You donʼt know to which type this risk belongs.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 140
For type A one can compute the mean pure premium by listing all the possibilities:

Situation                       Probability   Pure Premium
0 claims                        60.0%         0
1 claim @ 50                    24.0%         50
1 claim @ 100                   6.0%          100
2 claims @ 50 each              2.5%          100
2 claims: 1 @ 50 & 1 @ 100      5.0%          150
2 claims @ 100 each             2.5%          200
Overall                         100.0%        33

For example, the chance for 1 claim at $50 and one at $100 is the chance for two claims of 10%, multiplied by the Binomial probability: (2)(0.5)(0.5) = 1/2. (10%)(1/2) = 5%.
Similarly, for type B:

Situation                       Probability   Pure Premium
0 claims                        50.0%         0
1 claim @ 50                    18.0%         50
1 claim @ 100                   12.0%         100
2 claims @ 50 each              1.8%          100
2 claims: 1 @ 50 & 1 @ 100      8.4%          150
2 claims @ 100 each             9.8%          200
Overall                         100.0%        55
For example, the chance for 1 claim at $50 and one at $100 is the chance for two claims of 20%, multiplied by the Binomial probability: (2)(0.7)(0.3) = 0.42. (20%)(0.42) = 8.4%.
For type A, the chance of observing $100 in total losses is the sum of the chance for one claim of size $100 and that for two claims each of $50: 6.0% + 2.5% = 8.5%.
For type B, the chance of observing $100 in total losses is: 12.0% + 1.8% = 13.8%.
Then the Bayesian Analysis for an observation of a total of $100 in losses during a single year proceeds as follows:

A         B             C               D                E                  F
Type      A Priori      Chance of the   Prob. Weight =   Posterior Chance   Mean Annual
          Probability   Observation     Product of       of This Type       Pure Premium
                                        Columns B&C      of Risk
A         75%           0.085           0.06375          64.89%             33
B         25%           0.138           0.03450          35.11%             55
Overall                                 0.09825          100.00%            40.73

Thus the estimated future pure premium is: (64.89%)(33) + (35.11%)(55) = $40.73.
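Listing the outcomes in code gives a quick check; a minimal sketch in Python (the names are illustrative):

```python
# Outcome listing for the dependent frequency and severity example (illustrative names).
types = {
    "A": (0.75, {0: 0.6, 1: 0.3, 2: 0.1}, {50: 0.8, 100: 0.2}, {50: 0.5, 100: 0.5}),
    "B": (0.25, {0: 0.5, 1: 0.3, 2: 0.2}, {50: 0.6, 100: 0.4}, {50: 0.3, 100: 0.7}),
}

def aggregate_pmf(counts, one_claim, two_claims):
    # Probability of each possible total of annual losses.
    agg = {0: counts[0]}
    for x, p in one_claim.items():
        agg[x] = agg.get(x, 0) + counts[1] * p
    for x1, p1 in two_claims.items():
        for x2, p2 in two_claims.items():
            agg[x1 + x2] = agg.get(x1 + x2, 0) + counts[2] * p1 * p2
    return agg

weights, mean_pp = {}, {}
for t, (prior, counts, one_claim, two_claims) in types.items():
    agg = aggregate_pmf(counts, one_claim, two_claims)
    mean_pp[t] = sum(x * p for x, p in agg.items())   # 33 for type A, 55 for type B
    weights[t] = prior * agg[100]                     # chance of observing $100 in a year
total = sum(weights.values())
print(round(sum(weights[t] / total * mean_pp[t] for t in types), 2))   # 40.73
```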
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 141
Bayesian Analysis, Layer Average Severity Example:46

One can apply Bayesian Analysis in order to estimate any quantity of interest. For example, one might be interested in the "layer average severity", the average dollars in a layer per loss (or per loss excess of a certain amount).
Exercise: Assume losses are given by a Single Parameter Pareto Distribution: f(x) = α 10^α x^(-(α+1)), x > 10.
If α = 2.5, what are the average dollars in the layer from 15 to 30 per loss?
[Solution: In general, the average dollars in the layer from 15 to 30 per loss is: E[X ∧ 30] - E[X ∧ 15].
For the Single Parameter Pareto, with parameters α and θ: E[X ∧ x] = αθ/(α - 1) - θ^α / {(α - 1) x^(α-1)}.47
For θ = 10 and α = 2.5:
E[X ∧ 30] = (2.5)(10)/(1.5) - (10^2.5) / {(30^1.5)(1.5)} = 16.67 - 210.82/164.32 = 16.67 - 1.28 = 15.39.
E[X ∧ 15] = (2.5)(10)/(1.5) - (10^2.5) / {(15^1.5)(1.5)} = 16.67 - 210.82/58.09 = 16.67 - 3.63 = 13.04.
E[X ∧ 30] - E[X ∧ 15] = 15.39 - 13.04 = 2.35.]
Exercise: Assume losses are given by a Single Parameter Pareto Distribution: f(x) = α 10^α x^(-(α+1)), x > 10. Assume we expect per year 6 claims excess of 10.
If α = 2.5, what are the expected annual dollars in the layer from 15 to 30?
[Solution: We expect 2.35 dollars in the layer from 15 to 30 for each claim of size greater than 10. The average dollars in the layer from 15 to 30 per loss excess of ten is 2.35. Therefore, the expected annual dollars in the layer from 15 to 30 is: (6)(2.35) = 14.1.
Comment: Since the Single Parameter Pareto has support starting at 10, everything is excess of 10. The number of claims excess of 10 here is analogous to the number of losses in a situation in which instead the support starts at zero.]
Thus we see that given a particular value of the parameter alpha, we can compute the layer average severity.48 For α = 2.5, the layer average severity for the layer from 15 to 30 is 2.35.49
46 This example is somewhat hard.
47 Note that this formula for the limited expected value only works for α ≠ 1. For α = 1, one could perform the appropriate integrals, rather than using the formula for the limited expected values.
48 Which, with additional information, can be used to compute the expected annual dollars in that layer.
49 It should be noted that if everything is in millions of dollars (θ = 10 million, and the layer is from 15 million to 30 million), then the layer average severity is just multiplied by one million. In general one can easily adjust the scale in this manner. Changing scales from, for example, 10 million to 10 can often make a computation easier to perform. (On the computer you may thereby avoid overflow problems.) It can also make the computation much easier to check.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 142
As alpha gets smaller the Single Parameter Pareto Distribution gets heavier-tailed, while as alpha gets larger the distribution gets lighter-tailed. Thus we expect the layer average severity to depend on alpha.
Exercise: Assume losses are given by a Single Parameter Pareto Distribution: f(x) = α 10^α x^(-(α+1)), x > 10, α > 1.
What are the average dollars in the layer from 15 to 30 per loss?
[Solution: E[X ∧ 30] = 10α/(α-1) - 10^α / {(30^(α-1))(α-1)}.
E[X ∧ 15] = 10α/(α-1) - 10^α / {(15^(α-1))(α-1)}.
E[X ∧ 30] - E[X ∧ 15] = {10^α/(α-1)} {1/15^(α-1) - 1/30^(α-1)}.]
Now that we have the quantity of interest, the layer average severity, as a function of alpha, we are ready to perform Bayesian Analysis. Given an a priori distribution of values for alpha and a set of observations, one can estimate the future layer average severity. Assume the following:
• Losses are given by a Single Parameter Pareto Distribution: f(x) = α 10^α x^(-(α+1)), x > 10.
• Based on prior information we assume that alpha has the following distribution:
    α      A Priori Probability
    2.1    20%
    2.3    30%
    2.5    30%
    2.7    20%
• We then observe 7 losses (each of size greater than 10): 12, 15, 17, 18, 23, 28, 39.
Use Bayesian Analysis to estimate the future layer average severity for the layer from 15 to 30 (the average dollars of loss in the layer from 15 to 30, assuming there has been a single loss of size greater than 10).
Weʼve seen how to compute the layer average severity for this situation given a value of alpha. The other item to compute is the probability of the observation given alpha. The probability of the observation is the product of the densities at the observed points, given alpha: f(x) = α 10^α x^(-(α+1)). So, for example, if alpha = 2.5, then f(x) = 790.6/x^3.5 and f(17) = 0.0390.
For alpha = 2.5, f(12) f(15) f(17) f(18) f(23) f(28) f(39) = (0.1321)(0.0605)(0.0390)(0.0320)(0.01355)(0.00681)(0.00213) = 1.960 × 10^-12.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 143
The solution to this problem in the usual spreadsheet format is:

A        B             C               D                E               F          G
Alpha    A Priori      Chance of the   Prob. Weight =   Posterior       Layer      Square of
         Probability   Observation     Product of       Distribution    Average    Layer Average
                                       Columns B&C      of Alpha        Severity   Severity
2.1      20%           4.155e-12       8.311e-13        32.60%          3.10       9.64
2.3      30%           2.931e-12       8.792e-13        34.49%          2.70       7.27
2.5      30%           1.960e-12       5.880e-13        23.07%          2.35       5.50
2.7      20%           1.253e-12       2.507e-13        9.83%           2.04       4.18
Overall                                2.549e-12        100.00%         2.68       7.33

The posterior distribution of alpha is: 32.60%, 34.49%, 23.07%, 9.83%; posterior to the observations we believe the loss distribution is somewhat more likely to have a smaller value of alpha (be heavier-tailed).
Posterior to the observation, the estimated layer average severity is: (32.60%)(3.10) + (34.49%)(2.70) + (23.07%)(2.35) + (9.83%)(2.04) = 2.68.50
This compares to the a priori estimate of the layer average severity: (20%)(3.10) + (30%)(2.70) + (30%)(2.35) + (20%)(2.04) = 2.54.
We can also use the posterior distribution to compute that the expected value of the square of the layer average severity is 7.33. Combining this second moment with the expected value gives a variance of the layer average severities of 7.33 - 2.68^2 = 0.148. The posterior standard deviation is √0.148 = 0.38.
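The whole calculation takes only a few lines of code; a minimal sketch in Python (the function names are illustrative):

```python
# Single Parameter Pareto with theta = 10; layer from 15 to 30 (illustrative names).
theta = 10.0
losses = [12, 15, 17, 18, 23, 28, 39]
prior = {2.1: 0.20, 2.3: 0.30, 2.5: 0.30, 2.7: 0.20}

def density(x, a):
    return a * theta**a * x**(-(a + 1))

def layer(a, bottom=15, top=30):
    # E[X ^ top] - E[X ^ bottom] for the Single Parameter Pareto, alpha > 1.
    return (theta**a / (a - 1)) * (bottom**(1 - a) - top**(1 - a))

weights = {}
for a, p in prior.items():
    chance = 1.0
    for x in losses:
        chance *= density(x, a)          # product of densities at the observed points
    weights[a] = p * chance
total = sum(weights.values())
estimate = sum(w / total * layer(a) for a, w in weights.items())
second = sum(w / total * layer(a)**2 for a, w in weights.items())
print(round(estimate, 2), round(second, 2))   # 2.68 7.33
```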
Note that instead of an a priori distribution on only four values, one could have had an a priori distribution with support over many more values. The a priori distribution could have been given by some function, either discrete or continuous. If the a priori distribution is continuous, then integrals (or numerical integration) would have replaced sums. The loss distribution function could have been something other than a Single Parameter Pareto Distribution, in which case it could have had more than one unknown parameter; then one would have to perform multiple sums or multiple integrals. The quantity of interest could have been something other than the layer average severity, in which case the only change is in the formula for the quantity of interest as a function of the parameter(s).
50 Note that one weights together the values of the quantity of interest. One does not calculate the posterior mean of alpha and then calculate the layer average severity for this value of alpha. (In this case, the posterior mean of alpha is 2.327, and the corresponding layer average severity is 2.65.)
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 144
Exercise: Assume the following:
• Losses (in millions of dollars) are given by a Single Parameter Pareto Distribution: f(x) = α 10^α x^(-(α+1)), x > 10.
• Based on prior information we assume alpha has the following distribution:
    α      A Priori Probability
    1.5    10%
    2.0    20%
    2.5    40%
    3.0    20%
    3.5    10%
• You then observe 7 losses (each of size greater than 10): 12, 15, 17, 18, 23, 28, 39 (in millions of dollars).
• You expect per year 13 losses of size greater than 10.
Use Bayesian Analysis to estimate the expected annual dollars of loss (in millions of dollars) in the layer from 30 million to 50 million.
[Solution:

A        B             C               D                E               F
Alpha    A Priori      Chance of the   Prob. Weight =   Posterior       Layer Average
         Probability   Observation     Product of       Distribution    Severity
                                       Columns B&C      of Alpha
1.5      10%           7.591e-12       7.591e-13        28.70%          2.60
2.0      20%           4.835e-12       9.670e-13        36.57%          1.33
2.5      40%           1.960e-12       7.840e-13        29.65%          0.69
3.0      20%           5.971e-13       1.194e-13        4.52%           0.36
3.5      10%           1.494e-13       1.494e-14        0.56%           0.19
Overall                                2.644e-12        100.00%         1.46
The estimated layer average severity for the layer from 30 million to 50 million is 1.46 million. Therefore, the expected annual loss in the layer from 30 million to 50 million is: (1.46 million)(13) = $19.0 million.] Note that in this exercise, the a priori distribution of alpha was more diffuse than in the example above. Therefore, the observations had an opportunity to have a greater impact on our posterior estimate. The posterior distribution of alpha differed significantly from the prior distribution of alpha. Therefore, the a priori estimate of $11.6 million differed significantly from the posterior estimate of $19.0 million in expected annual losses in this layer.51
51 The a priori estimate of the layer average severity is: (0.1)(2.60) + (0.2)(1.33) + (0.4)(0.69) + (0.2)(0.36) + (0.1)(0.19) = 0.89. (13)(0.89) = 11.6.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 145 Problems: Use the following information for the next 7 questions: There are three types of risks. Assume 60% of the risks are of Type A, 25% of the risks are of Type B, and 15% of the risks are of Type C. Each risk has either one or zero claims per year. Type of Risk Chance of a Claim A Priori Chance of Type of Risk A 20% 60% B 30% 25% C 40% 15% 5.1 (1 point) What is the overall mean annual claim frequency? A. 24.5%
B. 25.0%
C. 25.5%
D. 26.0%
E. 26.5%
5.2 (1 point) You observe no claim in a year. What is the probability that the risk you are observing is of Type A? A. 58% B. 60% C. 62% D. 64% E. 66% 5.3 (1 point) You observe no claim in a year. What is the probability that the risk you are observing is of Type B? A. 23.0% B. 23.5% C. 24.0% D. 24.5% E. 25.0% 5.4 (1 point) You observe no claim in a year. What is the probability that the risk you are observing is of Type C? A. 12% B. 14% C. 16% D. 18% E. 20% 5.5 (1 point) You observe no claim in a year. What is the expected annual claim frequency from the same risk? A. 21% B. 23% C. 25% D. 27% E. 29% 5.6 (2 points) You observe one claim in a year. What is the expected annual claim frequency from the same risk? A. 22% B. 24% C. 26% D. 28% E. 30% 5.7 (3 points) You observe a single risk over five years. You observe 2 claims in 5 years. What is the expected annual claim frequency from the same risk? A. 21% B. 23% C. 25% D. 27% E. 29%
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 146 Use the following information for the next two questions: An insured population consists of 9% youthful drivers and 91% adult drivers. Based on experience, we have derived the following probabilities that an individual driver will have n claims in a year: n Youth Adult 0 85% 95% 1 10% 4% 2 4% 1% 3 1% 0% 5.8 (2 points) If a driver has had exactly two claims in the prior year, what is the probability it is a youthful driver? A. Less than 0.26 B. At least 0.26, but less than 0.27 C. At least 0.27, but less than 0.28 D. At least 0.28, but less than 0.29 E. 0.29 or more. 5.9 (2 points) If a driver has had exactly two claims in the prior year, what is the expected number of claims for that same driver over the next year? A. Less than 0.10 B. At least 0.10, but less than 0.11 C. At least 0.11, but less than 0.12 D. At least 0.12, but less than 0.13 E. 0.13 or more. 5.10 (2 points) There are two types of risks, with equal frequencies but different size of loss distributions. Each claim is either $1000 or $2000. A Priori Chance of Percentage of Percentage of Type of Risk This Type of Risk $1000 Claims $2000 Claims Low 80% 90% 10% High 20% 70% 30% You pick a risk at random (80% chance it is Low) and observe three claims. If two of the claims were $1000 and one of the claims was $2000, what is the expected value of the next claim from that same risk? A. less than 1160 B. at least 1160 but less than 1170 C. at least 1170 but less than 1180 D. at least 1180 but less than 1190 E. at least 1190
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 147 Use the following information for the following three questions: Frequency and severity are independently distributed for each driver. There are three types of drivers with the following characteristics: Portion of Drivers Poisson Annual Pareto Type of This Type Claim Frequency Claim Severity Good
60%
5%
α = 5, θ = 10,000
Bad
30%
10%
α = 4, θ = 10,000
Ugly
10%
20%
α = 3, θ = 10,000
5.11 (3 points) A driver is observed to have over a five year period a single claim. Use Bayes theorem to predict this driverʼs future annual claim frequency. A. 9.1% B. 9.3% C. 9.5% D. 9.7% E. 9.9% 5.12 (3 points) Over 1 year, for an individual driver you observe a single claim of size $25,000. Use Bayes Theorem to estimate this driverʼs future average claim severity. A. less than $4100 B. at least $4100 but less than $4200 C. at least $4200 but less than $4300 D. at least $4300 but less than $4400 E. at least $4400 5.13 (4 points) Over 3 years, for an individual driver you observe two claims of sizes $5,000 and $25,000 in that order. Use Bayes Theorem to estimate this driverʼs future average claim severity. A. less than $4100 B. at least $4100 but less than $4200 C. at least $4200 but less than $4300 D. at least $4300 but less than $4400 E. at least $4400
5.14 (2 points) Annual claim counts for each policyholder follow a Negative Binomial distribution with r = 3. Half of the policyholders have β = 1. The other half of the policyholders have β = 2. A policyholder had 2 claims in one year. Determine the probability that for this policyholder β = 2. A. 20%
B. 25%
C. 30%
D. 35%
E. 40%
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 148 5.15 (2 points) The aggregate loss distributions for three risks for one exposure period are as follows: Aggregate Losses $0 $100 $500 Risk A 0.90 0.07 0.03 B 0.50 0.30 0.20 C 0.30 0.33 0.37 A risk is selected at random and is observed to have $500 of aggregate losses in the first exposure period. Determine the Bayesian analysis estimate of the expected value of the aggregate losses for the same risk's second exposure period. A. Less than $100 B. At least $100, but less than $125 C. At least $125, but less than $150 D. At least $150, but less than $175 E. At least $175 5.16 (2 points) Annual claim counts for each policyholder follow a Binomial distribution with m = 2. 80% of the policyholders have q = 0.10. The remaining 20% of the policyholders have q = 0.20. A policyholder had 1 claim in one year. Determine the probability that for this policyholder q = 0.10. A. 61% B. 63% C. 67% D. 69% E. 71% 5.17 (2 points) You are given the following information:
•
S i = state of the world i, for i = 1, 2, 3
•
The probability of each state = 1/3
•
In any state there is either 0 or 1 claims, and the probability of a claim = 30%
•
The claim size is either 1 or 2 units
•
Given that a claim has occurred, the following are conditional probabilities
of claim size (in units) for each possible state: S1 S2 S3 Pr(1) = 1/3 Pr(1) = 1/2 Pr(1) = 1/6 Pr(2) = 2/3 Pr(2) = 1/2 Pr(2) = 5/6 Use the data given above and Bayes' Theorem. If you observe a single claim of size 2 units, in which range is your estimate of the pure premium for that risk? A. 0.45 B. 0.47 C. 0.49 D. 0.51 E. 0.53
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 149 Use the following information for the next 6 questions: For a single insured selected from type A: • The number of claims for a single exposure period will be 1, with probability 4/5; or 2, with probability 1/5. • If only one claim is incurred, the size of the claim will be 50, with probability 3/4; or 200, with probability 1/4. • If two claims are incurred, the size of each claim, independent of the other, will be 50, with probability 60%; or 150, with probability 40%. For a single insured selected from type B: • The number of claims for a single exposure period will be 1, with probability 3/5; or 2, with probability 2/5. • If only one claim is incurred, the size of the claim will be 50, with probability 1/2; or 200, with probability 1/2. • If two claims are incurred, the size of each claim, independent of the other, will be 50, with probability 80%; or 150, with probability 20%. An insured has been selected from a population consisting of 65% insureds in type A and 35% insureds in type B. It is not known which of the two types the insured is from. This insured will be observed for two exposures periods. 5.18 (3 points) If the first exposure period had total losses of 50, determine the expected number of claims for the second exposure period. A. 1.20 B. 1.22 C. 1.24 D. 1.26 E. 1.28 5.19 (2 points) What is the mean pure premium of a risk from type A? A. 104 B. 106 C. 108 D. 110 E. 112 5.20 (2 points) What is the mean pure premium of a risk from type B? A. 100 B. 110 C. 120 D. 130 E. 140 5.21 (2 points) If the first exposure period had total losses of 50, determine the expected total losses for the second exposure period. A. 109 B. 111 C. 113 D. 115 E. 117 5.22 (3 points) If the first exposure period had total losses of 200, determine the expected number of claims for the second exposure period. A. 1.21 B. 1.23 C. 1.25 D. 1.27 E. 1.29 5.23 (2 points) If the first exposure period had total losses of 200, determine the expected total losses for the second exposure period. E. 121 A. 113 B. 115 C. 117 D. 119
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 150 5.24 (3 points) You are given:
• A portfolio of independent insureds is divided into two classes, Class A and Class B. • There are three times as many insureds in Class A as in Class B. • The number of claims for each insured during a single year follows a Bernoulli distribution. • The expected number of claims per year for an individual insured in Class A is 0.4. • The expected number of claims per year for an individual insured in Class B is 0.8. • Classes A and B have claim size distributions as follows: Claim Size Class A Class B 1000 0.30 0.50 2000 0.70 0.50 One insured is chosen at random. The insured's loss for three years combined is 2000. Use Bayesian Analysis to estimate the future pure premium for this insured. (A) 700 (B) 725 (C) 750 (D) 775 (E) 800 5.25 (3 points) You are given the following: •
A portfolio consists of 1000 independent risks.
•
450 of the risks each have a policy with a $100 per claim deductible, 300 of the risks each have a policy with a $1000 per claim deductible, and 250 of the risks each have a policy with a $10,000 per claim deductible.
•
The risks have identical claim count distributions.
•
Prior to truncation by policy deductibles, the loss size distribution for each risk is as follows: Claim Size $50 $500 $5,000 $50,000
•
Probability 40% 30% 20% 10%
A report is available which shows actual loss sizes incurred for each policy
after truncation by policy deductibles, but does not identify the policy deductible associated with each policy. The report shows exactly three losses for a single policy selected at random. Two of the losses are $50,000 and $5,000, but the amount of the third is illegible. Using Bayesʼ Theorem, what is the expected value of this illegible number? A. $11,000 B. $13,000 C. $15,000 D. $17,000 E. $19,000
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 151 5.26 (2 points) You are given the following information:
•
There are three types of risks
•
The types are homogeneous, every risk of a given type has the same
Poisson frequency process: Portion of Average Risks in (Annual) Claim Type this Type Frequency 1 70% 40% 2 20% 60% 3 10% 80% A risk is picked at random and we do not know what type it is. For this randomly selected risk, during 1 year there are 3 claims. Use Bayesian Analysis to predict the future claim frequency of this same risk. A. 0.54 B. 0.56 C. 0.58 D. 0.60 E. 0.62 5.27 (3 points) You are given the following information:
•
There are three types of risks
•
The types are homogeneous, every risk of a given type has the same
Exponential severity process: Portion of Average Risks in Claim Type this Type Size 1 70% $25 2 20% $40 3 10% $50 A risk is picked at random and we do not know what type it is. For this randomly selected risk, there are 3 claims for a total of $140. Use Bayesian Analysis to predict the future average claim size of this same risk. A. $34 B. $36 C. $38 D. $40 E. $42
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 152 Use the following information for the next four questions:
•
There are three types of risks
•
The types are homogeneous, every risk of a given type has the same Poisson frequency process and the same Exponential severity process: Portion of Average Average Risks in Annual Claim Claim Type this Type Frequency Size 1 70% 40% $25 2 20% 60% $40 3 10% 80% $50
5.28 (2 points) If a risk is of type 1, what is the likelihood of the observed total losses in a year being $140? A. 0.01% B. 0.02% C. 0.03% D. 0.04% E. 0.05% 5.29 (2 points) If a risk is of type 2, what is the likelihood of the observed total losses in a year being $140? A. 0.04% B. 0.05% C. 0.06% D. 0.07% E. 0.08% 5.30 (2 points) If a risk is of type 3, what is the likelihood of the observed total losses in a year being $140? A. 0.03% B. 0.05% C. 0.07% D. 0.09% E. 0.11% 5.31 (2 points) A risk is picked at random and we do not know what type it is. For this randomly selected risk, there are $140 of total losses in one year. Use Bayesian Analysis to predict the future average pure premium of this same risk. A. Less than $25 B. At least $25, but less than $26 C. At least $26, but less than $27 D. At least $27, but less than $28 E. $28 or more
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 153
Use the following information for the next nine questions:
The years for a crop insurer are either Good, Typical, or Poor, depending on the weather conditions. The years follow a Markov Chain with transition matrix Q:
      [ 0.70  0.20  0.10 ]
Q =   [ 0.15  0.80  0.05 ]
      [ 0.20  0.30  0.50 ]
The insurerʼs aggregate losses for each type of year are Exponentially distributed:
Type      Mean
Good      25
Typical   50
Poor      100
5.32 (3 points) What is the insurerʼs average annual aggregate loss?
Hint: A stationary distribution, π, satisfies the matrix equation πQ = π, and Σπi = 1.
A. 41   B. 43   C. 45   D. 47   E. 49
5.33 (2 points) What is the probability that the aggregate losses in a year picked at random are greater than 120? A. 8.4% B. 8.6% C. 8.8% D. 9.0% E. 9.2% 5.34 (1 point) Let X be the insurerʼs annual losses this year and Y be the insurerʼs annual losses next year. If this year is Good, what is E[Y]? A. 34 B. 36 C. 38 D. 40 E. 42 5.35 (1 point) Let X be the insurerʼs annual losses this year and Y be the insurerʼs annual losses next year. If this year is Typical, what is E[Y]? A. 41 B. 43 C. 45 D. 47 E. 49 5.36 (1 point) Let X be the insurerʼs annual losses this year and Y be the insurerʼs annual losses next year. If this year is Poor, what is E[Y]? A. 55 B. 60 C. 65 D. 70 E. 75 5.37 (2 points) Let X be the insurerʼs annual losses this year and Y be the insurerʼs annual losses next year. What is E[XY]? A. 2500 B. 2600 C. 2700 D. 2800 E. 2900 5.38 (3 points) What is the long term variance observed in the insurerʼs annual aggregate losses? A. 3000 B. 3100 C. 3200 D. 3300 E. 3400
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 154 5.39 (2 points) Let X be the insurerʼs annual losses this year and Y be the insurerʼs annual losses next year. What is the correlation of X and Y? A. 7% B. 8% C. 9% D. 10% E. 11% 5.40 (3 points) If the insurer has losses of 75 this year, what are the insurerʼs expected losses next year? A. 46 B. 48 C. 50 D. 52 E. 54
5.41 (3 points) You are given the following information:
•
There are three types of risks
•
The types are homogeneous, every risk of a given type has the same
Exponential severity process: Portion of Average Risks in Claim Type This type Size 1 70% $25 2 20% $40 3 10% $50 A risk is picked at random and we do not know what type it is. For this randomly selected risk, there are 3 claims of sizes: $30, $40, and $70. Use Bayesian Analysis to predict the future average claim size of this same risk. A. $34 B. $36 C. $38 D. $40 E. $42 5.42 (2 points) The number of claims incurred each year is 0, 1, 2, 3, or 4, with equal probability. If there is a claim, there is a 65% chance it will be reported to the insurer by year end, independent of any other claims. If there is 1 claim incurred during 2003 that is reported by the end of year 2003, what is the estimated number of claims incurred during 2003? (A) Less than 1.7 (B) At least 1.7, but less than 1.8 (C) At least 1.8, but less than 1.9 (D) At least 1.9, but less than 2.0 (E) At least 2.0
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 155 Use the following information for the next three questions: You are given the following information about two classes of risks:
• Risks in Class A have a Poisson frequency with a mean of 0.6 per year. • Risks in Class B have a Poisson frequency with a mean of 0.8 per year. • Risks in Class A have an exponential severity distribution with a mean of 11. • Risks in Class B have an exponential severity distribution with a mean of 15. • Class A has three times the number of risks in Class B. • Within each class, severities and claim counts are independent. • A risk is randomly selected and observed to have three claims during one year. • The observed claim amounts were: 7, 10, and 21. 5.43 (2 points) Calculate the posterior expected value of the frequency for this risk. (A) Less than 0.68 (B) At least 0.68, but less than 0.69 (C) At least 0.69, but less than 0.70 (D) At least 0.70, but less than 0.71 (E) At least 0.71 5.44 (2 points) Calculate the posterior expected value of the severity for this risk. (A) Less than 12.0 (B) At least 12.0, but less than 12.5 (C) At least 12.5, but less than 13.0 (D) At least 13.0, but less than 13.5 (E) At least 13.5 5.45 (2 points) Calculate the posterior expected value of the pure premium for this risk. (Do not make separate estimates of frequency and severity.) (A) Less than 8.0 (B) At least 8.0, but less than 8.5 (C) At least 8.5, but less than 9.0 (D) At least 9.0, but less than 9.5 (E) At least 9.5
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 156 5.46 (3 points) You are given the following: • The number of claims incurred each year is Poisson with mean 4.
• If there is a claim, there is a 70% chance it will be reported to the insurer by year end. • The chance of a claim being reported by year end is independent of the reporting of any other claim, and is also independent of the number of claims incurred. If there are 3 claims incurred during 2003 that are reported by the end of year 2003, what is the estimated number of claims incurred during 2003? A. 4.0 B. 4.2 C. 4.4 D. 4.6 E. 4.8 Use the following information for the next two questions: There are two equally likely types of risks, each with severities 20 and 50: Type Probability of 20 Probability of 50 A 50% 50% B 90% 10% 5.47 (2 points) A loss of size 20 is observed. Using Bayes Analysis, what is the estimated future severity for this insured? (A) Less than 26 (B) At least 26, but less than 27 (C) At least 27, but less than 28 (D) At least 28, but less than 29 (E) At least 29 5.48 (2 points) A second loss is observed for this same insured, this time of size 50. Using Bayes Analysis, what is the estimated future severity for this insured? (A) Less than 31 (B) At least 31, but less than 32 (C) At least 32, but less than 33 (D) At least 33, but less than 34 (E) At least 34
5.49 (3 points) You are given the following:
• The number of claims incurred each year is Negative Binomial with r = 2 and β = 1.6. • If there is a claim, there is a 70% chance it will be reported to the insurer by year end. • The chance of a claim being reported by year end is independent of the reporting of any other claim, and is also independent of the number of claims incurred. If there are 5 claims incurred during 2003 that are reported by the end of year 2003, what is the estimated number of claims incurred during 2003? A. 6.0 B. 6.2 C. 6.4 D. 6.6 E. 6.8
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 157 5.50 (2 points) You are given: (i) Two risks have the following severity distributions: Probability of Claim Probability of Claim Amount of Claim Amount for Risk 1 Amount for Risk 2 250 0.5 0.7 2,500 0.3 0.2 60,000 0.2 0.1 (ii) Risk 1 is twice as likely to be observed as Risk 2. A claim of 250 is observed. Determine the Bayesian estimate of the second claim amount from the same risk. (A) Less than 10,200 (B) At least 10,200, but less than 10,400 (C) At least 10,400, but less than 10,600 (D) At least 10,600, but less than 10,800 (E) At least 10,800 5.51 (3 points) An insurance company sells three types of policies with the following characteristics: Type of Policy Proportion of Total Policies Annual Claim Frequency I
30%
Negative Binomial with r = 1 and β = 0.25
II
50%
Negative Binomial with r = 2 and β = 0.25
III
20%
Negative Binomial with r = 2 and β = 0.50
A randomly selected policyholder is observed to have a total of one claim for Year 1 through Year 4. For the same policyholder, determine the Bayesian estimate of the expected number of claims in Year 5. (A) Less than 0.4 (B) At least 0.4, but less than 0.5 (C) At least 0.5, but less than 0.6 (D) At least 0.6, but less than 0.7 (E) At least 0.7 5.52 (4 points) There are two types of insured, equally likely. They each have ground up size of loss distributions that are Pareto with α = 4. However, for one type θ = 800, while for the other type θ = 1200. For each type, there is a deductible of either 500 or 1000, equally likely. From a policy picked at random you observe two payments: 400, 1500. Determine the posterior probabilities of all four combinations of type of insured and deductible.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 158 5.53 (2 points) Each insured has at most one claim a year. Claim Size Distribution Class Prior Probability Probability of a Claim 100 200 A 3/4 1/5 2/3 1/3 B 1/4 2/5 1/2 1/2 An insured is chosen at random and a single claim of size 100 has been observed during a year. Use Bayes Theorem to estimate the future pure premium for this insured. A. 32 B. 34 C. 36 D. 38 E. 40 5.54 (4, 5/85, Q.41) (3 points) Si = state of the world i, for i = 1, 2, 3. The probability of each state = 1/3. In any state, the probability of a claim = 1/2. The claim size is either 1 or 2 units. Given that a claim has occurred, the following are conditional probabilities of claim size (in units) for each possible state: S1 S2 S3 Pr(1) = 2/3 Pr(1) = 1/2 Pr(1) = 5/6 Pr(2) = 1/3 Pr(2) = 1/2 Pr(2) = 1/6 Use the data given above and Bayes' Theorem. If you observe a single claim of size 2 units, in which range is your estimate of the pure premium for that risk? A. Less than 0.65 B. At least 0.65, but less than 0.67 C. At least .67, but less than 0.69 D. At least 0.69, but less than 0.71 E. 0.71 or more 5.55 (4, 5/87, Q.39) (2 points) An insured population consists of 1500 youthful drivers and 8500 adult drivers. Based on experience, we have derived the following probabilities that an individual driver will have n claims in a year's time: n Youth Adult 0 0.50 0.80 1 0.30 0.15 2 0.15 0.05 3 0.05 0.00 If you have a policy with exactly one claim on it in the prior year, what is the probability the insured is a youthful driver? A. Less than 0.260 B. At least 0.260, but less than 0.270 C. At least 0.270, but less than 0.280 D. At least 0.280, but less than 0.290 E. 0.290 or more
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 159 5.56 (165, 5/90, Q.9) (1.7 points) A Bayesian method is used to estimate a value t from an observed value u. You are given: (i) t is a sample from a random variable T that has a binomial distribution with q = 1/2 and m = 2. (ii) u is a sample from a random variable U with conditional distribution, given t, that is Binomial with q = t/2 and m = 2. Determine E[T | u = 2]. (A) 5/6 (B) 1 (C) 5/4 (D) 5/3 (E) 5/2 5.57 (4B, 11/92, Q.24) (2 points) A portfolio of three risks exists with the following characteristics:
•
The claim frequency for each risk is normally distributed with mean and standard deviations: Distribution of Claim Frequency Risk Mean Standard Deviation A 0.10 0.03 B 0.50 0.05 C 0.90 0.01
•
A frequency of 0.12 is observed for an unknown risk in the portfolio. Determine the Bayesian estimate of the same risk's expected claim frequency. A. 0.10 B. 0.12 C. 0.13 D. 0.50 E. 0.90 5.58 (4B, 5/93, Q.26) (2 points) You are given the following information: • An insurance portfolio consists of two classes, A and B. • The number of claims distribution for each class is: Probability of Number of Claims = Class 0 1 2 3 A 0.7 0.1 0.1 0.1 B 0.5 0.2 0.1 0.2 • Class A has three times as many insureds as Class B. • A randomly selected risk from the portfolio generates 1 claim over the most recent policy period. Determine the Bayesian analysis estimate of the claims frequency rate for the observed risk. A. Less than 0.72 B. At least 0.72 but less than 0.78 C. At least 0.78 but less than 0.84 D. At least 0.84 but less than 0.90 E. At least 0.90
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 160 5.59 (4B, 11/93, Q.17) (2 points) You are given the following: Two risks have the following severity distribution. Probability of Claim Amount For Amount of Claim Risk 1 Risk 2 100 0.50 0.70 1,000 0.30 0.20 20,000 0.20 0.10 Risk 1 is twice as likely as Risk 2 of being observed. A claim of 100 is observed, but the observed risk is unknown. Determine the Bayesian analysis estimate of the expected value of a second claim amount from the same risk. A. Less than 3,500 B. At least 3,500, but less than 3,650 C. At least 3,650, but less than 3,800 D. At least 3,800, but less than 3,950 E. At least 3,950 5.60 (4B, 5/94, Q.8) (2 points) The aggregate loss distributions for two risks for one exposure period are as follows: Aggregate Losses $50 $1,000 Risk $0 A 0.80 0.16 0.04 B 0.60 0.24 0.16 A risk is selected at random and observed to have $0 of losses in the first two exposure periods. Determine the Bayesian analysis estimator of the expected value of the aggregate losses for the same risk's third exposure period. A. Less than $90 B. At least $90, but less than $95 C. At least $95, but less than $100 D. At least $100, but less than $105 E. At least $105
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 161 5.61 (4B, 5/95, Q.19) (2 points) The aggregate loss distributions for three risks for one exposure period are as follows: Aggregate Losses $0 $50 $2,000 Risk A 0.80 0.16 0.04 B 0.60 0.24 0.16 C 0.40 0.32 0.28 A risk is selected at random and is observed to have $50 of aggregate losses in the first exposure period. Determine the Bayesian analysis estimate of the expected value of the aggregate losses for the same risk's second exposure period. A. Less than $300 B. At least $300, but less than $325 C. At least $325, but less than $350 D. At least $350, but less than $375 E. At least $375 5.62 (4B, 11/95, Q.18 & Course 4 Sample Exam 2000, Q. 11) (3 points) You are given the following:
• A portfolio consists of 150 independent risks. • 100 of the risks each have a policy with a $100,000 maximum covered loss, and 50 of the risks each have a policy with a $1,000,000 maximum covered loss. • The risks have identical claim count distributions.
• Prior to censoring by maximum covered losses, the claim size distribution for each risk is as follows: Claim Size $10,000 $50,000 $100,000 $1,000,000
Probability 1/2 1/4 1/5 1/20
• A claims report is available which shows actual claim sizes incurred for each policy after censoring by maximum covered losses, but does not identify the maximum covered loss associated with each policy. The claims report shows exactly three claims for a policy selected at random. Two of the claims are $100,000, but the amount of the third is illegible. What is the expected value of this illegible number? A. Less than $45,000 B. At least $45,000, but less than $50,000 C. At least $50,000, but less than $55,000 D. At least $55,000, but less than $60,000 E. At least $60,000
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 162 5.63 (4B, 5/96, Q.5) (2 points) You are given the following: • A portfolio of independent risks is divided into two classes.
• •
Each class contains the same number of risks.
For each risk in Class 1, the number of claims for a single exposure period follows a Poisson distribution with mean 1. • For each risk in Class 2, the number of claims for a single exposure period follows a Poisson distribution with mean 2. A risk is selected at random from the portfolio. During the first exposure period, 2 claims are observed for this risk. During the second exposure period, 0 claims are observed for this same risk. Determine the posterior probability that the risk selected came from Class 1. A. Less than 0.53 B. At least 0.53, but less than 0.58 C. At least 0.58 but less than 0.63 D. At least 0.63 but less than 0.68 E. At least 0.68 5.64 (4B, 11/96, Q.12) (3 points) You are given the following:
• 75% of claims are of Type A and the other 25% of claims are of Type B. • Type A claim sizes follow a normal distribution with mean 3,000 and variance 1,000,000. • Type B claim sizes follow a normal distribution with mean 4,000 and variance 1,000,000. A claim file exists for each of the claims, and one of them is randomly selected. The claim file selected is incomplete and indicates only that its associated claim size is greater than 5,000. Determine the posterior probability that a Type A claim was selected. A. Less than 0.15 B. At least 0.15, but less than 0.25 C. At least 0.25, but less than 0.35 D. At least 0.35, but less than 0.45 E. At least 0.45
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 163 5.65 (4B, 5/97, Q.11) (2 points) You are given the following:
• • •
A portfolio of independent risks is divided into three classes. Each class contains the same number of risks.
For all of the risks in Class 1, claim sizes follow a uniform distribution on the interval from 0 to 400. • For all of the risks in Class 2, claim sizes follow a uniform distribution on the interval from 0 to 600. • For all of the risks in Class 3, claim sizes follow a uniform distribution on the interval from 0 to 800. A risk is selected at random from the portfolio. The first claim observed for this risk is 340. Determine the Bayesian analysis estimate of the expected value of the second claim observed for this same risk. A. Less than 270 B. At least 270, but less than 290 C. At least 290, but less than 310 D. At least 310, but less than 330 E. At least 330 5.66 (4B, 5/98, Q.24) (3 points) You are given the following: • A portfolio consists of 100 independent risks. • 25 of the risks have a policy with a $5,000 maximum covered loss, 25 of the risks have a policy with a $10,000 maximum covered loss, and 50 of the risks have a policy with a $20,000 maximum covered loss. • The risks have identical claim count distributions. • Prior to censoring by maximum covered losses, claims size for each risk follow a Pareto distribution, with parameters θ = 5,000 and α = 2. •
A claims report is available which shows the number of claims in various claim size ranges for each policy after censoring by maximum covered loss, but does not identify the maximum covered loss associated with each policy. The claims report shows exactly one claim for a policy selected at random. This claim falls in the claim size range of $9,000 to $11,000. Determine the probability that this policy has a $10,000 maximum covered loss. A. Less than 0.35 B. At least 0.35, but less than 0.55 C. At least 0.55, but less than 0.75 D. At least 0.75, but less than 0.95 E. At least 0.95
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 164 5.67 (4B, 5/99, Q.16) (2 points) You are given the following: • The number of claims per year for Risk A follows a Poisson distribution with mean m . • The number of claims per year for Risk B follows a Poisson distribution with mean m + 1. • The probability of selecting Risk A is equal to the probability of selecting Risk B. One of the risks is randomly selected, and zero claims are observed for this risk during one year. Determine the posterior probability that the selected risk is Risk A. A. Less than 0.3 B. At least 0.3, but less than 0.5 C. At least 0.5, but less than 0.7 D. At least 0.7, but less than 0.9 E. At least 0.9 5.68 (4B, 11/99, Q.28) (2 points) You are given the following: • The number of claims per year for Risk A follows a Poisson distribution with mean m. • The number of claims per year for Risk B follows a Poisson distribution with mean 2m. • The probability of selecting Risk A is equal to the probability of selecting Risk B. One of the risks is randomly selected, and zero claims are observed for this risk during one year. Determine the posterior probability that the selected risk will have at least one claim during the next year. A.
(1 - e^(-m)) / (1 + e^(-m))
B. (1 - e^(-3m)) / (1 + e^(-m))
C. 1 - e^(-m)
D. 1 - e^(-2m)
E. 1 - e^(-2m) - e^(-4m)
5.69 (4, 5/00, Q.7) (2.5 points) You are given the following information about two classes of risks: (i) Risks in Class A have a Poisson claim count distribution with a mean of 1.0 per year. (ii) Risks in Class B have a Poisson claim count distribution with a mean of 3.0 per year. (iii) Risks in Class A have an exponential severity distribution with a mean of 1.0. (iv) Risks in Class B have an exponential severity distribution with a mean of 3.0. (v) Each class has the same number of risks. (vi) Within each class, severities and claim counts are independent. A risk is randomly selected and observed to have two claims during one year. The observed claim amounts were 1.0 and 3.0. Calculate the posterior expected value of the aggregate loss for this risk during the next year. (A) Less than 2.0 (B) At least 2.0, but less than 4.0 (C) At least 4.0, but less than 6.0 (D) At least 6.0, but less than 8.0 (E) At least 8.0 5.70 (2 points) In the previous question, 4, 5/00, Q.7, change the observation to: the observed two claim amounts were 1.0, and at least 3.0. Calculate the posterior expected value of the aggregate loss for this risk during the next year.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 165 5.71 (4, 5/00, Q.22) (2.5 points) You are given: (i) A portfolio of independent risks is divided into two classes, Class A and Class B. (ii) There are twice as many risks in Class A as in Class B. (iii) The number of claims for each insured during a single year follows a Bernoulli distribution. (iv) Classes A and B have claim size distributions as follows: Claim Size Class A Class B 50,000 0.60 0.36 100,000 0.40 0.64 (v) The expected number of claims per year is 0.22 for Class A and 0.11 for Class B. One insured is chosen at random. The insured's loss for two years combined is 100,000. Calculate the probability that the selected insured belongs to Class A. (A) 0.55 (B) 0.57 (C) 0.67 (D) 0.71 (E) 0.73 5.72 (4, 11/00, Q.28) (2.5 points) Prior to observing any claims, you believed that claim sizes followed a Pareto distribution with parameters θ = 10 and α = 1, 2 or 3, with each value being equally likely. You then observe one claim of 20 for a randomly selected risk. Determine the posterior probability that the next claim for this risk will be greater than 30. (A) 0.06 (B) 0.11 (C) 0.15 (D) 0.19 (E) 0.25 5.73 (3 points) In 4, 11/00, Q.28, you instead observe for a randomly selected risk two claims, one of size 20 and the other of size 40, not necessarily in that order. Determine the posterior probability that the next claim for this risk will be greater than 30. (A) 18% (B) 20% (C) 22% (D) 24% (E) 26% 5.74 (2 points) In 4, 11/00, Q.28, you instead observe for a randomly selected risk one claim of size 20 or more. Determine the posterior probability that the next claim for this risk will be greater than 30. (A) 13% (B) 15% (C) 17% (D) 19% (E) 21%
5.75 (4, 11/02, Q.39 & 2009 Sample Q. 55) (2.5 points) You are given: Number of Claim Count Probabilities Class Insureds 0 1 2 3 4 1 3000 1/3 1/3 1/3 0 0 2 2000 0 1/6 2/3 1/6 0 3 1000 0 0 1/6 2/3 1/6 A randomly selected insured has one claim in Year 1. Determine the expected number of claims in Year 2 for that insured. (A) 1.00 (B) 1.25 (C) 1.33 (D) 1.67 (E) 1.75
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 166
5.76 (CAS3, 11/03, Q.12) (2.5 points) A driver is selected at random. If the driver is a "good" driver, he is from a Poisson population with a mean of 1 claim per year. If the driver is a "bad" driver, he is from a Poisson population with a mean of 5 claims per year. There is equal probability that the driver is either a "good" driver or a "bad" driver. If the driver had 3 claims last year, calculate the probability that the driver is a "good" driver.
A. Less than 0.325
B. At least 0.325, but less than 0.375
C. At least 0.375, but less than 0.425
D. At least 0.425, but less than 0.475
E. At least 0.475
5.77 (CAS3, 11/03, Q.13) (2.5 points) The Allerton Insurance Company insures 3 indistinguishable populations. The claims frequency of each insured follows a Poisson process. Given:
Population (class)   Expected time between claims   Probability of being in class   Claim cost
I                    12 months                      1/3                             1,000
II                   15 months                      1/3                             1,000
III                  18 months                      1/3                             1,000
Calculate the expected loss in year 2 for an insured that had no claims in year 1.
A. Less than 810
B. At least 810, but less than 910
C. At least 910, but less than 1,010
D. At least 1,010, but less than 1,110
E. At least 1,110
5.78 (4, 11/03, Q.14 & 2009 Sample Q.11) (2.5 points) You are given:
(i) Losses on a companyʼs insurance policies follow a Pareto distribution with probability density function: f(x | θ) = θ / (x + θ)^2, 0 < x < ∞.
(ii) For half of the companyʼs policies θ = 1, while for the other half θ = 3.
For a randomly selected policy, losses in Year 1 were 5.
Determine the posterior probability that losses for this policy in Year 2 will exceed 8.
(A) 0.11 (B) 0.15 (C) 0.19 (D) 0.21 (E) 0.27
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 167 5.79 (4, 11/03, Q.39 & 2009 Sample Q.29) (2.5 points) You are given: (i) Each risk has at most one claim each year. (ii) Type of Risk Prior Probability Annual Claim Probability I 0.7 0.1 II 0.2 0.2 III 0.1 0.4 One randomly chosen risk has three claims during Years 1-6. Determine the posterior probability of a claim for this risk in Year 7. (A) 0.22 (B) 0.28 (C) 0.33 (D) 0.40 (E) 0.46 5.80 (4, 11/04, Q.5 & 2009 Sample Q.136) (2.5 points) You are given: (i) Two classes of policyholders have the following severity distributions: Probability of Claim Probability of Claim Claim Amount Amount for Class 1 Amount for Class 2 250 0.5 0.7 2,500 0.3 0.2 60,000 0.2 0.1 (ii) Class 1 has twice as many claims as Class 2. A claim of 250 is observed. Determine the Bayesian estimate of the expected value of a second claim from the same policyholder. (A) Less than 10,200 (B) At least 10,200, but less than 10,400 (C) At least 10,400, but less than 10,600 (D) At least 10,600, but less than 10,800 (E) At least 10,800 5.81 (4, 5/05, Q.35 & 2009 Sample Q.203) (2.9 points) You are given: (i) The annual number of claims on a given policy has the geometric distribution with parameter β. (ii) One-third of the policies have β = 2, and the remaining two-thirds have β = 5. A randomly selected policy had two claims in Year 1. Calculate the Bayesian expected number of claims for the selected policy in Year 2. (A) 3.4 (B) 3.6 (C) 3.8 (D) 4.0 (E) 4.2
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 168
5.82 (4, 11/05, Q.15 & 2009 Sample Q.226) (2.9 points) For a particular policy, the conditional probability of the annual number of claims given Θ = θ, and the probability distribution of Θ are as follows:
Number of Claims    0      1      2
Probability         2θ     θ      1 - 3θ

θ              0.10    0.30
Probability 0.80 0.20 One claim was observed in Year 1. Calculate the Bayesian estimate of the expected number of claims for Year 2. (A) Less than 1.1 (B) At least 1.1, but less than 1.2 (C) At least 1.2, but less than 1.3 (D) At least 1.3, but less than 1.4 (E) At least 1.4 5.83 (CAS3, 5/06, Q.30) (2.5 points) Claim counts for each policyholder are independent and follow a common Negative Binomial distribution. A priori, the parameters for this distribution are (r, β) = (2, 2) or (r, β) = (4, 1). Each parameter set is considered equally likely. Policy files are sampled at random. The first two files sampled do not contain any claims. The third policy file contains a single claim. Based on this information, calculate the probability that (r, β) = (2, 2). A. Less than 0.30 B. At least 0.30, but less than 0.45 C. At least 0.45, but less than 0.60 D. At least 0.60, but less than 0.75 E. At least 0.75 5.84 (4, 11/06, Q.16 & 2009 Sample Q.260) (2.9 points) You are given: (i) Claim sizes follow an exponential distribution with mean θ. (ii) For 80% of the policies, θ = 8. (iii) For 20% of the policies, θ = 2. A randomly selected policy had one claim in Year 1 of size 5. Calculate the Bayesian expected claim size for this policy in Year 2. (A) Less than 5.8 (B) At least 5.8, but less than 6.2 (C) At least 6.2, but less than 6.6 (D) At least 6.6, but less than 7.0 (E) At least 7.0
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 169
Solutions to Problems:
5.1. C. (20%)(60%) + (30%)(25%) + (40%)(15%) = 25.5%.
Comment: The chance of observing no claim is: 1 - 0.255 = 0.745.
5.2. D. P(Type A | no claim) = P(no claim | Type A)P(Type A) / P(no claim) = (0.8)(0.6) / 0.745 = 64.43%.
5.3. B. (0.7)(0.25) / 0.745 = 23.49%.
5.4. A. (0.6)(0.15) / 0.745 = 12.08%.
5.5. C.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Annual
Risk      of This Type      Observation     Prior x Chance   of This Type       Freq.
A         0.60              0.80            0.480            64.43%             0.20
B         0.25              0.70            0.175            23.49%             0.30
C         0.15              0.60            0.090            12.08%             0.40
Overall                                     0.745            1.000              24.77%
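For those who like to check such tables electronically, here is a minimal Python sketch (my own, not part of the original solution) of the same calculation; the variable names are arbitrary. The same few lines, with the chances of the observation changed, reproduce the tables in the following solutions.

priors = [0.60, 0.25, 0.15]       # a priori chance of each type of risk
chances = [0.80, 0.70, 0.60]      # chance of the observation (no claim) by type
means = [0.20, 0.30, 0.40]        # mean annual frequency by type
weights = [p * c for p, c in zip(priors, chances)]      # probability weights: 0.480, 0.175, 0.090
total = sum(weights)                                    # 0.745
posterior = [w / total for w in weights]                # 64.43%, 23.49%, 12.08%
print(sum(q * m for q, m in zip(posterior, means)))     # about 0.2477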
5.6. D.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Annual
Risk      of This Type      Observation     Prior x Chance   of This Type       Freq.
A         0.60              0.20            0.120            47.06%             0.20
B         0.25              0.30            0.075            29.41%             0.30
C         0.15              0.40            0.060            23.53%             0.40
Overall                                     0.255            1.000              27.65%
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 170
5.7. D. For example, if one has a risk of Type B, the chance of observing 2 claims in 5 years is given by (a Binomial Distribution): (10)(0.3²)(0.7³) = 0.3087.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Annual
Risk      of This Type      Observation     Prior x Chance   of This Type       Freq.
A         0.60              0.2048          0.123            48.78%             0.20
B         0.25              0.3087          0.077            30.64%             0.30
C         0.15              0.3456          0.052            20.58%             0.40
Overall                                     0.252            1.000              27.18%
5.8. D.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance
Driver    of This Type      Observation     Prior x Chance   of This Type
Youth     0.090             4%              0.0036           28.35%
Adult     0.910             1%              0.0091           71.65%
Overall                                     0.013            1.000
5.9. B. The mean claim frequency for youthful drivers is: (1)(10%) + (2)(4%) + (3)(1%) = 21%. The mean claim frequency for adult drivers is: (1)(4%) + (2)(1%) = 6%. (28.35%)(21%) + (71.65%)(6%) = 10.29%.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Expected Claims
Driver    of This Type      Observation     Prior x Chance   of This Type       per 3 Years
Youth     0.090             4%              0.0036           28.35%             0.210
Adult     0.910             1%              0.0091           71.65%             0.060
Overall                                     0.013            1.000              0.103
Comment: Assumes that if the driver were youthful in the prior year, he will also be youthful in the future period. On the exam, do not worry about such possible real world concerns.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 171
5.10. B.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Claim
Risk      of This Type      Observation     Prior x Chance   of This Type       from Risk
Low       0.8000            0.2430          0.1944           0.6879             1100
High      0.2000            0.4410          0.0882           0.3121             1300
Overall                                     0.283            1.000              1162
5.11. A. Note that over 5 years one gets a Poisson with 5 times the mean for a single year. For the Poisson with mean µ, the chance of n accidents is e^-µ µ^n / n!. Therefore the chance of a single accident is µe^-µ. For each type of driver the chance of a single accident is therefore: Good µ = 0.25: 0.195; Bad µ = 0.5: 0.303; Ugly µ = 1: 0.368. Therefore the Probability Weights are: (0.195)(0.6), (0.303)(0.3), (0.368)(0.1). Taking a weighted average of the claim frequencies of each type gives 9.1%.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Annual
Driver    of This Type      Observation     Prior x Chance   of This Type       Freq.
Good      0.6               0.195           0.117            47.8%              0.05
Bad       0.3               0.303           0.091            37.1%              0.10
Ugly      0.1               0.368           0.037            15.0%              0.20
Overall                                     0.245            1.000              9.1%
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 172
5.12. B. The posterior probability of the $25,000 claim having come from each type of driver is proportional to the product of the a priori chance of having that type of driver, times the chance of observing one claim during a year from that type of driver, times the chance of having a claim of size 25,000. (For example, the chance of a claim having come from a Good driver times the chance of such a claim being of size 25,000 for a Good Driver.) For example, for Good drivers (Poisson with mean 0.05) the chance of observing a single claim over one year is (0.05)e^-0.05 = 0.04756. For the Pareto, f(x) = αθ^α / (θ + x)^(α+1). Thus for the Good drivers, with Pareto parameters α = 5 and θ = 10,000, given one has a claim the chance it is of size 25,000 is f(25,000) = 5(10000⁵)/(35000⁶) = (5 x 10^-4)/(3.5⁶) = 2.72 x 10^-7. The chances of claims of size $25,000 are: 2.7, 7.6, 20.0, each times 10^-7. Thus the probabilities are proportional to: (0.04756)(2.7 x 10^-7)(60%), etc. The average severities by Type of driver are θ/(α-1): 2500, 3333, and 5000. Therefore, the weighted average is $4120.
Type of   α   A Priori Chance   f(25000)   Chance of Observing   Prob. Weight =              Posterior Chance   Average
Driver        of This Type                 one Claim             Prior x Chance x f(25000)   of This Type       Severity
Good      5   0.6               2.72e-7    0.04756               7.76e-9                     12.7%              2500.00
Bad       4   0.3               7.62e-7    0.09048               2.07e-8                     33.8%              3333.33
Ugly      3   0.1               2.00e-6    0.16375               3.27e-8                     53.5%              5000.00
Overall                                                          6.12e-8                     1.00               4120
Comment: Note that with differing claim frequencies and severities for different types of risks one has to take into account both when computing the chance of observing a given size claim. A Good Driver is less likely to have produced a claim than a Bad Driver and in this example if a Good Driver produces a claim it is less likely to be $25000 than a claim from a Bad Driver.
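Here is a short Python sketch (my own illustration, not part of the original) of the calculation in 5.12, where the likelihood for each type is the chance of one claim times the Pareto density at 25,000; the function name pareto_pdf is mine.

import math
priors = [0.6, 0.3, 0.1]              # Good, Bad, Ugly
lambdas = [0.05, 0.10, 0.20]          # Poisson means
alphas = [5, 4, 3]                    # Pareto shapes; theta = 10,000 for all types
theta = 10000.0
def pareto_pdf(x, alpha, theta):
    return alpha * theta**alpha / (theta + x)**(alpha + 1)
weights = [p * lam * math.exp(-lam) * pareto_pdf(25000, a, theta)
           for p, lam, a in zip(priors, lambdas, alphas)]
total = sum(weights)
posterior = [w / total for w in weights]                   # about 12.7%, 33.8%, 53.5%
severities = [theta / (a - 1) for a in alphas]             # 2500, 3333.3, 5000
print(sum(q * s for q, s in zip(posterior, severities)))   # about 4120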
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 173 5.13. E. P(Observation | Risk Type) = P(2 claims in 3 years | Risk Type)P(Severities = 5000 & 25000 | Risk Type & 2 claims). Thus we need to compute both the probability of 2 claims in 3 years given a risk type and the probability given a risk type and that we observe 2 claims in 3 years, that the claims are of sizes $5000 and $25,000. Over three years each of the risk types has a Poisson frequency with three times the annual mean. For example for Bad drivers, the frequency over three years is Poisson with parameter (3)(10%) = 30%. For a Bad driver, the chance of observing 2 claims in 3 years is thus: (.32 )e-.3 / 2 = .03334. The likelihood given one has observed two claims that they were of sizes 5,000 and 25,000 is the product of f(5000)f(25000). For example for Bad drivers with Pareto parameters α = 4, θ =10,000, f(5000)f(25000) = {4(100004 )/(150005 )}{4(100004 )/(350005 )} = 4.0117 x 10-11. Thus given a Bad driver, the chance of the current observation is their product: (.03334) (4.0117 x 10-11) = 1.337 x 10-12. Then the probability weight for Bad Drivers is the product of the chance of the observation given a Bad Driver and the a priori chance of a Bad Driver: (.3)(1.337 x 10-12) = (.3)(.03334) (4.0117 x 10-11) = 4.01 x 10-13. Getting the probability weights for Good and Ugly drivers in a similar manner, one divides each weight by the sum of the weights and computes posterior probabilities of: 4.2%, 24.5% and 71.3%. The average severities by type of driver are θ/(α−1): 2500, 3333, and 5000. Therefore, the weighted average is $4487. A
B
C
D
E
Type of Driver
A Priori Chance of Chance of Observing This Type 2 Claims α of Driver in f(5000) 3 Years
Good Bad Ugly
5 4 3
Overall
0.6 0.3 0.1
0.00968 0.03334 0.09879
4.39e-5 5.27e-5 5.93e-5
F
f(25000) 2.72e-7 7.62e-7 2.00e-6
G
H
Chance of ProbObser- ability vation Weight DXEXF CxG 1.16e-13 1.34e-12 1.17e-11
I
J
Posterior Chance of This Type Average of Driver Severity
6.94e-14 4.01e-13 1.17e-12
4.2% 24.5% 71.3%
2500.0 3333.3 5000.0
1.64e-12
1.00
4487
5.14. D. f(2) = {r(r+1)/2} β²/(1+β)^(r+2) = 6β²/(1+β)⁵. For β = 1, f(2) = 3/16. For β = 2, f(2) = 8/81. P[Observation] = (1/2)(3/16) + (1/2)(8/81) = 0.1431. By Bayes Theorem, P[Risk Type | Observation] = P[Obser. | Type] P[Type] / P[Observation]. P[Risk Type Two | Observation] = (8/81)(0.5)/0.1431 = 34.5%. Comment: Similar to CAS3, 5/06, Q.30.
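A quick numerical check of 5.14 (a sketch of mine; r = 3 is implied by the coefficient 6 = r(r+1)/2):

def nb_density_at_2(r, beta):
    return (r * (r + 1) / 2) * beta**2 / (1 + beta)**(r + 2)
f1 = nb_density_at_2(3, 1)          # 3/16
f2 = nb_density_at_2(3, 2)          # 8/81
prob_obs = 0.5 * f1 + 0.5 * f2      # 0.1431
print(0.5 * f2 / prob_obs)          # about 0.345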
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 174
5.15. E.
Type of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Avg. Aggregate
Risk      of This Type      Observation     Prior x Chance   of This Type       Losses
A         0.333             0.030           0.0100           5.0%               22
B         0.333             0.200           0.0667           33.3%              130
C         0.333             0.370           0.1233           61.7%              218
Overall                                     0.2000           1.000              179
For example, the average aggregate loss for risk type B is: (0)(.50) + (100)(.30) + (500)(.20) = 130.
The estimated future aggregate losses are: (5.0%)(22) + (33.3%)(130) + (61.7%)(218) = 179.
5.16. D. f(1) = 2q(1-q). For q = 0.10, f(1) = 0.18. For q = 0.20, f(1) = 0.32.
P[Observation] = (80%)(0.18) + (20%)(0.32) = 0.208.
By Bayes Theorem, P[Risk Type | Observation] = P[Obser. | Type] P[Type] / P[Observation].
P[Risk Type One | Observation] = (.18)(.8)/0.208 = 69.2%.
Comment: Similar to CAS3, 5/06, Q.30.
5.17. D.
State of   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Pure
World      of This State     Observation     Prior x Chance   of This Type       Premium
1          0.333             0.200           0.0667           33.3%              0.500
2          0.333             0.150           0.0500           25.0%              0.450
3          0.333             0.250           0.0833           41.7%              0.550
Overall                                      0.2000           1.000              0.508
Comment: Remember to multiply by the given 30% claim frequency in order to convert the mean severities into mean pure premiums.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 175
5.18. C. If one has total losses of 50, then one has had a single claim of size 50. The chance of this observation for type A is: (4/5)(3/4) = 60%. For type B it is: (3/5)(1/2) = 30%. Then the Bayesian Analysis to determine the expected claim frequency is:
Type   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Freq.
       of This Type      Observation     Prior x Chance   of this Type       for this Type
A      0.650             0.6000          0.3900           78.8%              1.200
B      0.350             0.3000          0.1050           21.2%              1.400
Overall                                  0.4950           1.000              1.242
Comment: Note that the frequency and severity are not independent.
5.19. B. The different possible outcomes and their probabilities for a Risk from type A are:
Situation                      Probability   Total Losses
1 claim @ 50                   60.0%         50
1 claim @ 200                  20.0%         200
2 claims @ 50 each             7.2%          100
2 claims: 1 @ 50 & 1 @ 150     9.6%          200
2 claims @ 150 each            3.2%          300
Overall                        100.0%        106.0
For example, the chance of 2 claims, with one of size 50 and one of size 150, is the chance of having two claims times the chance, given two claims, that one will be 50 and the other 150: (.2){(2)(.6)(.4)} = 9.6%. In that case the total losses are 50 + 150 = 200.
5.20. D. The different possible outcomes and their probabilities for a Risk from type B are:
Situation                      Probability   Total Losses
1 claim @ 50                   30.0%         50
1 claim @ 200                  30.0%         200
2 claims @ 50 each             25.6%         100
2 claims: 1 @ 50 & 1 @ 150     12.8%         200
2 claims @ 150 each            1.6%          300
Overall                        100.0%        131
For example, the chance of 2 claims each of size 150 is the chance of having two claims times the chance, given two claims, that both will be 150: (0.4){(0.2)(0.2)} = 1.6%.
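The two means, 106 for type A and 131 for type B, can also be reproduced by enumerating the outcomes in a few lines of Python (a sketch of mine; the helper name mean_total and its arguments are hypothetical):

def mean_total(p_two_claims, sizes_one, sizes_two):
    # sizes_one: size -> probability, given one claim
    # sizes_two: size -> probability per claim, given two claims
    p_one_claim = 1 - p_two_claims
    total = sum(p_one_claim * p * s for s, p in sizes_one.items())
    for s1, p1 in sizes_two.items():
        for s2, p2 in sizes_two.items():
            total += p_two_claims * p1 * p2 * (s1 + s2)
    return total
print(mean_total(0.2, {50: 0.75, 200: 0.25}, {50: 0.6, 150: 0.4}))   # 106 for type A
print(mean_total(0.4, {50: 0.5, 200: 0.5}, {50: 0.8, 150: 0.2}))     # 131 for type B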
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 176
5.21. B. If one has total losses of 50, then one has had a single claim of size 50. The chance of this observation for type A is: (4/5)(3/4) = 60%. For type B it is: (3/5)(1/2) = 30%. Then the Bayesian Analysis to determine the expected pure premium is:
Type   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean P.P.
       of This Type      Observation     Prior x Chance   of this Type       for this Type
A      0.650             0.6000          0.3900           78.8%              106
B      0.350             0.3000          0.1050           21.2%              131
Overall                                  0.4950           1.000              111.3
5.22. E. If one has total losses of 200, then either one has had a single claim of size 200, or one had claims of 50 and 150. The chance of this observation for type A is: (4/5)(1/4) + (1/5){(2)(.6)(.4)} = 29.6%. For type B it is: (3/5)(1/2) + (2/5){(2)(.8)(.2)} = 42.8%. Then the Bayesian Analysis to determine the expected claim frequency is:
Type   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Freq.
       of This Type      Observation     Prior x Chance   of this Type       for this Type
A      0.650             0.2960          0.1924           56.2%              1.200
B      0.350             0.4280          0.1498           43.8%              1.400
Overall                                  0.3422           1.000              1.288
5.23. C. If one has total losses of 200, then either one has had a single claim of size 200, or one had claims of 50 and 150. The chance of this observation for type A is: (4/5)(1/4) + (1/5){(2)(.6)(.4)} = 29.6%. For type B it is: (3/5)(1/2) + (2/5){(2)(.8)(.2)} = 42.8%. Then the Bayesian Analysis to determine the expected pure premium is:
Type   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean P.P.
       of This Type      Observation     Prior x Chance   of this Type       for this Type
A      0.650             0.2960          0.1924           56.2%              106
B      0.350             0.4280          0.1498           43.8%              131
Overall                                  0.3422           1.000              116.9
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 177
5.24. C. The mean pure premium for class A is: (.4)(1700) = 680. The mean pure premium for class B is: (.8)(1500) = 1200. The observation of 2000 over 3 years corresponds to either a single claim of size 2000 (a claim in one year and none in the others) or two claims of size 1000 (a claim in two of the years and none in the remaining year.) The chance of 1 claim over three years is: 3q(1-q)². The chance of 2 claims over three years is: 3q²(1-q). If the risk is from Class A, then the chance of the observation is: (.7){3(.4)(1-.4)²} + (.3)(.3){3(.4)²(1-.4)} = .32832. If the risk is from Class B, then the chance of the observation is: (.5){3(.8)(1-.8)²} + (.5)(.5){3(.8)²(1-.8)} = .144.
Class   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean Pure
        of This Class     Observation     Prior x Chance   of This Class      Premium
A       0.7500            0.3283          0.2462           0.8724             680
B       0.2500            0.1440          0.0360           0.1276             1,200
Overall                                   0.2822           1.0000             746.3
Comment: Similar to 4, 5/00, Q.22.
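A short Python check of 5.24 (my own sketch; the function name chance_obs is mine):

def chance_obs(q, p2000):
    one_claim_of_2000 = 3 * q * (1 - q)**2 * p2000
    two_claims_of_1000 = 3 * q**2 * (1 - q) * (1 - p2000)**2
    return one_claim_of_2000 + two_claims_of_1000
weight_A = 0.75 * chance_obs(0.4, 0.7)            # 0.75 x 0.32832
weight_B = 0.25 * chance_obs(0.8, 0.5)            # 0.25 x 0.144
posterior_A = weight_A / (weight_A + weight_B)    # about 0.8724
print(posterior_A * 680 + (1 - posterior_A) * 1200)   # about 746.3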
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 178 5.25. D. If one has a $100 deductible, then the chance of having a $5,000 loss reported is 20%/60% = 1/3, while the chance of having a $50,000 loss reported is 10%/60% = 1/6. So if the deductible were $100, then the chance of the first two observed losses having sizes of $50,000 and $5,000 is: (2)(1/6)(1/3) = 1/9. If instead one has a $1000 deductible, the chance of having a $5,000 loss reported is 20%/30% = 2/3, while the chance of having a $50,000 loss reported is 10%/30% = 1/3. So if the deductible were $1000, then the chance of the first two observed losses having sizes of $50,000 and $5,000 is: (2)(1/3)(2/3) = 4/9. If one has a $10,000 deductible, then a $5000 loss would not be reported, so there is no chance for this observation. Now the a priori chance of having a $100 deductible is 45%. The mean size of a reported loss when the deductible is $100 is: {(30%)(500) + (20%)(5000) + (10%)(50,000)} / {30% + 20% + 10%} = 10,250. The mean size of a reported loss when the deductible is $1000 is: { (20%)(5000) + (10%)(50,000)} / {20%+10%} = 20000. Putting all of the above together, the posterior estimate of a third loss from the same policy is: A
B
C
D
E
F
Deductible Size (Type of Risk)
A Priori Chance of This Type of Risk
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Average Reported Loss Severity
100 1000 10000
0.450 0.3 0.250
0.11110 0.4444 0.00000
0.04999 0.13332 0.00000
0.27273 0.72727 0.00000
10,250 20,000 50,000
0.18332
1.00000
17,341
Overall
5.26. C. The chance of observing 3 claims for a Poisson is: e^-θ θ³/3!. Therefore the chance of observing 3 claims for a risk of type 1 is: e^-0.4 (0.4³)/6 = .00715.
Type   A Priori      Chance of the   Prob. Weight =   Posterior Chance   Mean Annual
       Probability   Observation     Prior x Chance   of This Type       Freq.
1      70%           0.00715         0.005005         39.13%             0.4
2      20%           0.01976         0.003951         30.89%             0.6
3      10%           0.03834         0.003834         29.98%             0.8
Overall                              0.012791         1.000              0.5817
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 179 5.27. A. The sum of 3 independent claims drawn from a single Exponential Distribution with parameter θ is a Gamma Distribution with parameters: α = 3 and θ. So if the risk is of type 1, the distribution of the sum of 3 claims is Gamma with α = 3 and θ = 25. For type 2 the sum has parameters: α = 3 and θ = 40. For type 3 the sum has parameters: α = 3 and θ = 50. Thus assuming one has 3 claims, the chance that they will add to $140 if the risk is of type 1 is the density of a Gamma(3, 25) at 140: θ−αxα−1 e−x/θ / Γ(α) = (.04)3 1403-1 e-(.04)(140) / Γ(3) = 0.0023193. Similarly, the chance for type 2 is: (.025)3 1403-1 e-(.025)(140) / Γ(3) = .00462397. The chance of the observation for type 3 is: (.02)3 1403-1 e-(.02)(140) / Γ(3) = .00476751. A
Type 1 2 3 Overall
B
C
D
E
F
A Priori Probability 70% 20% 10%
Chance of the Observation 0.00232 0.00462 0.00477
Prob. Weight = Product of Columns B&C 0.001624 0.000925 0.000477
Posterior Chance of This Type of Risk 53.67% 30.57% 15.75%
Mean Annual Severity 25 40 50
0.003025
100.00%
33.52
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 180 5.28. A. The chance of observing n claims for a Poisson is e−λ λ n /n!. Therefore, for example, the chance of observing 3 claims for a risk of type 1 is : e-.4 (.43 ) / 6 = .00715. If one had three claims, then the sum of the claims is given by a Gamma with α =3 and θ. Thus assuming one had 3 claims, the chance that they will add to $140 if the risk is of type 1 is the density of Gamma(3, 25) at 140: θ−αxα−1 e−x/θ / Γ(α) = (25)3 1403-1 e-140/25 / Γ(3) = 0.0023193. Thus if one has type 1, the chance of observing 3 claims totaling $140 is: (.00715)(.0023193) = .00001658. One can compute and then combine the other ways to get a total of $140 in loss for type 1: A
B
C
D
Number of Claims
Probability of this number of Claims
Given this Number of Claims, the Probability of $140 in Total Losses
Column B times Column C
0 1 2 3 4 5
0.67032 0.26813 0.05363 0.00715 0.00072 0.00006
0 0.0001479 0.0008283 0.0023193 0.0043294 0.0060611
0 0.000039660 0.000044419 0.000016583 0.000003096 0.000000347
Sum
1.00000
0.000104105
Thus for a risk of type 1 the likelihood of $140 in loss in a year is .000104. Comment: Iʼve ignored the possibility of more than 5 claims, since that adds very little to the total likelihood in this case. 5.29. C. For type two with average frequency of 60% and average severity of $40, the likelihood of the observed total losses in a year being $140 is computed: A
B
C
D
Number of Claims
Probability of this number of Claims
Given this Number of Claims, the Probability of $140 in Total Losses
Column B times Column C
0 1 2 3 4 5
0.54881 0.32929 0.09879 0.01976 0.00296 0.00036
0 0.00075493 0.00264227 0.00462397 0.00539464 0.00472031
0 0.00024859 0.00026102 0.00009136 0.00001599 0.00000168
Sum
0.99996
0.00061863
Comment: We have ignored the possibility of more than 5 claims, since that adds very little to the total likelihood in this case.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 181 5.30. E. For type 3 with average frequency of 80% and average severity of $50, the likelihood of the observed total losses in a year being $140 is computed: A
B
C
D
Number of Claims
Probability of this number of Claims
Column B times Column C
0 1 2 3 4 5
0.44933 0.35946 0.14379 0.03834 0.00767 0.00123
Given this Number of Claims, the Probability of $140 in Total Losses 0 0.00121620 0.00340536 0.00476751 0.00444967 0.00311477
Sum
0.99982
0 0.00043718 0.00048964 0.00018280 0.00003412 0.00000382 0.00114756
5.31. C. One can use these likelihoods computed in the prior three questions in the usual manner to get posterior probabilities for each type: A
Type 1 2 3
B
A Priori Probability 70% 20% 10%
C
D
E
F
Chance of the Observation 0.00010410 0.00061860 0.00114800
Prob. Weight = Product of Columns B&C 0.00007287 0.00012372 0.00011480
Posterior Chance of This Type of Risk 23.40% 39.73% 36.87%
Mean Annual Pure Premium 10 24 40
0.00031139
100.00%
26.6
Overall
Using the posterior probabilities as weights, the estimated future pure premium is $26.6. 5.32. D. The balance equations for the stationary distribution are: .7π1 + .15π2 + .2π3 = π1.
.2π1 + .8π2 + .3π3 = π2.
.1π1 + .05π2 + .5π3 = π3.
Also π1 + π2 + π3 = 1.
Eliminating π3 from the first two equations: 1.7π1 - 1.15π2 = 3π1 - 2π2. Thus π2 = 1.529π1. Substituting into the third equation: π3 = (.1π1 + .05(1.529)π1)/.5 = .353π1. Therefore, substituting into the constraint equation: (1+ 1.529 +.353)π1 = 1. Thus π1 = .347, π2 = .531, and π3 = .122. Therefore, the mean annual aggregate losses = (.347)(25) + (.531)(50) +(.122)(100) = 47.4.
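The stationary distribution in 5.32 can also be found numerically. A minimal sketch of mine, assuming NumPy is available; the matrix P is built from the given transition probabilities:

import numpy as np
P = np.array([[0.70, 0.20, 0.10],    # from Good to Good, Typical, Poor
              [0.15, 0.80, 0.05],    # from Typical
              [0.20, 0.30, 0.50]])   # from Poor
A = np.vstack([P.T - np.eye(3), np.ones(3)])   # pi P = pi, together with sum(pi) = 1
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]      # about (0.347, 0.531, 0.122)
print(pi @ np.array([25, 50, 100]))            # about 47.4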
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 182 5.33. C. S(x) = e-x/θ, with θ varying by type of year. Good: S(120) = exp(-120/25) = .82%. Typical: S(120) = exp(-120/50) = 9.07%. Poor: S(120) = exp(-120/100) = 30.12%. From the previous solution, π1 = .347, π2 = .531, and π3 = .122. Therefore, the probability that the annual aggregate losses are greater than 120 is: (.347)(.82%) + (.531)(9.07%) + (.122)(30.12%) = 8.8%. 5.34. C. If this year is Good, there is a 70% chance next year is Good, 20% chance next year is Typical, and 10% chance next year is Poor. Therefore, E[Y | X is Good] = (.7)(25) + (.2)(50) +(.1)(100) = 37.5. 5.35. E. (.15)(25) + (.8)(50) + (.05)(100) = 48.75. 5.36. D. (.2)(25) + (.3)(50) + (.5)(100) = 70. Comment: Note that the insurerʼs average annual loss could be computed using the probabilities of X being of a certain type and E[Y | Type of X]: (34.7%)(37.5) + (53.1%)(48.75) + (12.2%)(70) = 47.4, which matches the solution to a previous question. 5.37. A. E[XY | Type of X] = E[X | Type of X] E[Y | Type of X]. E[XY | X is Good] = (25)(37.5) = 937.5. Type of First Year
Probability
Mean for First Year
Expected Value for Second Year
E[XY]
Good Typical Poor
34.7% 53.1% 12.2%
25 50 100
37.5 48.75 70
937.5 2437.5 7000
Average
2474
Comment: Uses the solutions to previous questions. E[XY | Type of X] = Σ xy Prob[X = x and Y = y | Type of X] =
Σ xy Prob[X = x | Type of X] Prob[ Y = y | Type of X] = Σ x Prob[X = x | Type of X] Σ y Prob[Y = y | Type of X] = E[X | Type of X] E[Y | Type of X].
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 183 5.38. D. For a year chosen at random, the aggregate losses are distributed as per: an Exponential with mean 25 (Good) with a 34.7% probability, an Exponential with mean 50 (Typical) with a 53.1% probability, and an Exponential with mean 100 (Poor) with a 12.2% probability. A 3-point mixture of Exponentials, with mean: (.347)(25) + (.531)(50) + (.122)(100) = 47.425. Its second moment is: (.347){(2)(252 )} + (.531){(2)(502 )} + (.122){(2)(1002 )} = 5528.75. Its variance is: 5528.75 - 47.4252 = 3280. Alternately, the variance of an Exponential Distribution is θ2. Thus if X is Good, the process variance is: 252 = 625. The expected value of the process variance of X is: (34.7%)(252 ) + (53.1%)(502 ) + (12.2%)(1002 ) = 2764. Type
Probability
Process Variance
Mean
Square of Mean
Good Typical Poor
34.7% 53.1% 12.2%
625 2500 10000
25 50 100
625 2500 10000
2764.38
47.42
2764.38
Average
The variance of the hypothetical means = 2764.38 - 47.422 = 516. The total variance is the sum of the expected value of the process variance and the variance of the hypothetical means = 2764 + 516 = 3280. Comment: One uses results from previous solutions. The EPV and VHM, are discussed in a subsequent section. 5.39. A. Cov[X, Y] = E[XY] - E[X]E[Y] = 2474 - (47.4)(47.4) = 227. Corr[X,Y] = Cov[X, Y] / Var[X]Var[Y] = 227/3280 = 6.9%. Comment: Difficult. Uses the solutions to previous questions. The losses in consecutive years are positively correlated, since a Good Year is more likely to be followed by another Good Year and a Poor Year is more likely to be followed by another Poor Year.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 184 5.40. C. One applies Bayes Theorem in order to compute the chances that this year was Good, Typical, or Poor, conditional on the observation of losses of 75 this year. Prob[This Year is Good | X = 75] = Prob[ X = 75 | Good] Prob[Good]/ Prob[X = 75] = (e-75/25 /25)(.347) / {(e-75/25 /25)(.347) + (e-75/50 /50)(.531) + (e-75/100 /100)(.122)} = .00069 / (.00069 + .00237 + .00058) = 19.0%. Similarly, Prob[This Year is Typical | X = 75] = .00237 / (.00069 + .00237 + .00058) = 65.2%. Prob[This Year is Poor | X = 75] = .00058 / (.00069 + .00237 + .00058) = 15.8%. If this year is Good, 37.5 are the expected losses next year. If this year is Typical, 48.75 are the expected losses next year. If this year is Poor, 70 are the expected losses next year. Thus the expected losses next year are: (19.0%)(37.5) + (65.2%)(48.75) + (15.8%)(70) = 50.0. Comment: Probably beyond what you will be asked on the exam. This whole calculation could be arranged in the following spreadsheet: A
B
Type
A Priori Probability
Good Typical Poor
34.7% 53.1% 12.2%
Overall
C
D
E
Chance Prob. Weight = Posterior Chance of of the Product This Type = Observation of B Columns Col. D / Sum of Col. D &C 0.00199 0.00069 0.19001 0.00446 0.00237 0.65154 0.00472 0.00058 0.15845 0.00364
1.00000
F
Expected Losses Next Year 37.50 48.75 70.00 49.98
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 185 5.41. A. If the risk is of type 1, the distribution has parameter θ = 25 and density e−x/θ / θ = .04e-.04x. Thus the chance of the observation is: {.04e-.04(30)} {.04e-.04(40)} { .04e-.04(70)} = (.04)3 e-.04(140) = 2.37 x 10-7. For type 2 the chance of the observation is: (.025)3 e-.025(140) = 4.72 x 10-7. For type 3 the chance of the observation is: (.02)3 e-.02(140) = 4.86 x 10-7. A
C
D
E
F
Type
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Severity
1 2 3
70% 20% 10%
0.0000002367 0.0000004718 0.0000004865
0.0000001657 0.0000000944 0.0000000486
53.67% 30.57% 15.76%
25 40 50
0.0000003087
100.00%
33.5
Overall
B
Comment: One gets the same solution as the previous question. This is true in this case since each riskʼs severity is from a Gamma Distribution with the same value of α. (The Exponential is a Gamma for α = 1.) In the example in the text of this section where this was not the case, the two severity examples produced different results from each other.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 186 5.42. C. Given m ≥ 1 claims have been incurred, the probability of observing 1 claim by year end is the density at 1 of a Binomial Distribution with parameters .65 and m: m(.65)(.35m-1). A
B
C
D
E
F
# of Claims Incurred
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Probability = Col. D / Sum of Col. D
# of Claims Incurred
0 1 2 3 4
0.20 0.20 0.20 0.20 0.20
0.0000 0.6500 0.4550 0.2389 0.1115
0.0000 0.1300 0.0910 0.0478 0.0223
0.0000 0.4466 0.3126 0.1641 0.0766
0 1 2 3 4
Overall
1.0000
0.2911
1.0000
1.871
Comment: Thus the number of claims remaining to be reported is estimated as: 1.871 - 1 = .871. See “Loss Development Using Credibility”, by Eric Brosius. In a similar manner one can compute the estimates for other possible observations: # of Claims Reported by Year End
Estimated Number of Claims Incurred
Estimated Number of Claims Not Yet Reported
0 1 2 3 4
0.512 1.871 2.905 3.583 4.000
0.512 0.871 0.905 0.583 0.000
As usual, the estimates using Bayes Analysis are in balance: # of Claims Reported by Year End
A Priori Probability of the Observation
Estimated Number of Claims Incurred
0 1 2 3 4
0.3061 0.2911 0.2353 0.1318 0.0357
0.512 1.871 2.905 3.583 4.000
Weighted Average
2.000
The prior mean number of claims incurred of (0 + 1 + 2 + 3 + 4)/5 = 2, is equal to the weighted average of the posterior estimates of the numbers of claims incurred, using weights equal to the a priori probabilities of each possible observation.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 187 5.43. A. If the risk is from Class A, then the chance of the observation is: (e-.6 .63 / 3!)(6)(e-7/11/11)(e-10/11/11)(e-21/11/11) = (.01976)(.0001425) = .000002816. If the risk is from Class B, then the chance of the observation is: (e-.8 .83 / 3!)(6)(e-7/15/15)(e-10/15/15)(e-21/15/15) = (.03834)(.0001411) = .000005411. A
B
C
D
E
F
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Class
Mean Frequency
Class
A Priori Chance of This Class
A B
0.7500 0.2500
2.816e-6 5.411e-6
2.112e-6 1.353e-6
0.6096 0.3904
0.6000 0.8000
3.465e-6
1.0000
0.6781
Overall
Comment: I have used the entire observation, including the information on severity, in order to estimate the probability that risk is from Class A or B. In general when doing Bayes Analysis, use all of the information given, no more and no less. 5.44. C. Using the posterior probabilities from the previous solution: (0.6096)(11) + (0.3904)(15) = 12.562. 5.45. C. The mean pure premium for Class A is: (.6)(11) = 6.6. The mean pure premium for Class B is: (.8)(15) = 12.0. Using the posterior probabilities from a previous solution: (0.6096)(6.6) + (0.3904)(12.0) = 8.708. Comment: Similar to 4, 5/00, Q.7. (0.6781)(12.562) = 8.518 ≠ 8.708.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 188
5.46. B. Given m ≥ 3 claims have been incurred, then the chance of the observation is the density at 3 of a Binomial Distribution with parameters 0.7 and m: {m!/(3!(m-3)!)} 0.7³ 0.3^(m-3).
The chance that m claims have been incurred is: e^-4 4^m/m!.
Thus by Bayes Theorem, the posterior probability of m, for m ≥ 3 is:
(e^-4 4^m/m!){m!/(3!(m-3)!)} 0.7³ 0.3^(m-3) / Σ_{m≥3} (e^-4 4^m/m!){m!/(3!(m-3)!)} 0.7³ 0.3^(m-3)
= {0.3^m 4^m/(m-3)!} / Σ_{m=3}^∞ 0.3^m 4^m/(m-3)! = {1.2^m/(m-3)!} / {1.2³ Σ_{i=0}^∞ 1.2^i/i!}
= {1.2^m/(m-3)!} / {1.2³ e^1.2} = e^-1.2 1.2^(m-3)/(m-3)!.
Thus the posterior mean is:
Σ_{m=3}^∞ m e^-1.2 1.2^(m-3)/(m-3)! = Σ_{i=0}^∞ (i+3) e^-1.2 1.2^i/i! = Σ_{i=0}^∞ i e^-1.2 1.2^i/i! + 3 Σ_{i=0}^∞ e^-1.2 1.2^i/i!
= (mean of a Poisson with λ = 1.2) + (3)(sum of the densities of a Poisson with λ = 1.2) = 1.2 + 3 = 4.2.
Alternately, divide the original Poisson Process into two independent Poisson Processes: claims reported by year end with mean (.7)(4) = 2.8, and claims not reported by year end with mean (.3)(4) = 1.2. Since the two processes are independent, the expected number of claims not reported is 1.2, regardless of the observation. Therefore, for 3 claims observed by year end, the expected number of claims incurred is: 3 + 1.2 = 4.2.
Comment: See "Loss Development Using Credibility", by Eric Brosius. The posterior distribution of m - 3 is Poisson with mean 1.2.
5.47. C.
Type   A Priori Chance   Chance of the   Prob. Weight =   Posterior Chance   Mean
       of This Type      Observation     Prior x Chance   of This Type
A      0.50              0.50            0.2500           0.3571             35.000
B      0.50              0.90            0.4500           0.6429             23.000
Overall                                  0.7000           1.0000             27.286
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 189 5.48. B. Using the prior distribution and the observation of both losses: A
B
C
D
E
F
Type
A Priori Chance of This Type
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Type
Mean
A B
0.50 0.50
0.25 0.09
0.1250 0.0450
0.7353 0.2647
35.000 23.000
0.1700
1.0000
31.824
Overall
Alternately, one can use the posterior distribution to the observation of the first claim, as the prior distribution to the observation of the second claim: A
B
C
D
E
F
Type
Distribution Posterior to 1st Claim
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Type
Mean
A B
0.3571 0.6429
0.50 0.10
0.1785 0.0643
0.7353 0.2647
35.000 23.000
0.2428
1.0000
31.823
Overall
Comment: Such a sequential approach always works for Bayesian Analysis. 5.49. D. Given m ≥ 5 claims have been incurred, then the chance of the observation is the density at 5 of a Binomial Distribution with parameters .7 and m: {m!/ (5!(m-5)!)} .75 .3m-5. The chance that m claims have been incurred is: (1.6m/2.6m+2)(m+1)!/(m!(2-1)!) = (.1479)(.6154m)(m+1). Thus by Bayes Theorem, the posterior probability of m, for m ≥ 5 is proportional to: (.1479)(.6154m)(m+1){m!/ (5!(m-5)!)} .75 .3m-5, which is proportional to: (.18462m){(m+1)!/(m-5)!)}. Letting i = m - 5, then i = 0, 1, 2, 3, ..., and the density of i is proportional to: (.18462i){(i+6)!/i!}. This is proportional to a Negative Binomial Distribution with r = 7, and β/(1+β) = .18462. β = .226. Therefore, the posterior mean of i is: (7)(.226) = 1.58. Therefore, the posterior mean of m is: 5 + 1.58 = 6.58. Comment: See “Loss Development Using Credibility”, by Eric Brosius. While the number of risk types are infinite, m = 0, 1, 2, ..., they are discretely distributed rather than continuous. 5.50. B. For risk 1, the mean is: (.5)(250) + (.3)(2500) + (.2)(60000) = 12,875. Risk
A Priori Probability
Probability of Observation
Probability Weight
Posterior Distribution
Mean
1 2
66.67% 33.33%
0.5 0.7
0.3333 0.2333
58.82% 41.18%
12,875 6,675
0.5667
Comment: Set up taken from the 4, 11/03, Q.23.
10,322
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 190 5.51. B. If for each year we have a Negative Binomial with r and β, then for a sum of four independent years we have a Negative Binomial with parameters 4r and β. Over 4 years, Type I is Negative Binomial with r = 4 and β = 0.25, Type II is Negative Binomial with r = 8 and β = 0.25, and Type III is Negative Binomial with r = 8 and β = 0.50. f(1) = rβ/(1+β)r+1. A
B
C
A Priori Type of Chance of Chance Risk This Type of the of Risk Observation I II III
0.30 0.50 0.20
0.3277 0.2684 0.1040
Overall
Comment: Similar to 4, 11/06, Q.2.
D
E
F
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Annual Freq.
0.09830 0.13422 0.02081
38.80% 52.98% 8.21%
0.250 0.500 1.000
0.25333
1.000
0.444
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 191
5.52. For the Pareto Distribution: f(x) = αθ^α/(θ + x)^(α+1) = 4θ⁴/(θ + x)⁵.
The density for the payments excess of a deductible of size d is:
f(x + d)/S(d) = {4θ⁴/(θ + x + d)⁵} / {θ/(θ + d)}⁴ = 4(θ + d)⁴/(θ + x + d)⁵.
There are four combinations equally likely a priori: θ = 800 and d = 500, θ = 800 and d = 1000, θ = 1200 and d = 500, θ = 1200 and d = 1000.
The chances of the observation are:
{4(800 + 500)⁴/(800 + 400 + 500)⁵} {4(800 + 500)⁴/(800 + 1500 + 500)⁵} = 53.411 x 10^-9,
{4(800 + 1000)⁴/(800 + 400 + 1000)⁵} {4(800 + 1000)⁴/(800 + 1500 + 1000)⁵} = 87.421 x 10^-9,
{4(1200 + 500)⁴/(1200 + 400 + 500)⁵} {4(1200 + 500)⁴/(1200 + 1500 + 500)⁵} = 81.445 x 10^-9,
{4(1200 + 1000)⁴/(1200 + 400 + 1000)⁵} {4(1200 + 1000)⁴/(1200 + 1500 + 1000)⁵} = 106.568 x 10^-9.
Since the four combinations are equally likely a priori, the probability weights are: 53.411, 87.421, 81.445, and 106.568. Thus the posterior probabilities are:
16.24%  θ = 800 and d = 500
26.58%  θ = 800 and d = 1000
24.77%  θ = 1200 and d = 500
32.41%  θ = 1200 and d = 1000
5.53. D. The average pure premium for Class A is: (1/5){(2/3)(100) + (1/3)(200)} = 26.67. The average pure premium for Class B is: (2/5){(1/2)(100) + (1/2)(200)} = 60. Chance of the observation if the insured is from class A: (1/5)(2/3). Thus the probability weight for class A is: (3/4)(1/5)(2/3) = 1/10. Similarly, the probability weight for class B is: (1/4)(2/5)(1/2) = 1/20. Thus the posterior distribution is: 2/3 and 1/3. The estimated future pure premium for this insured is: (2/3)(26.67) + (1/3)(60) = 37.78.
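A brief Python check of the posterior probabilities in 5.52 (a sketch of mine; excess_density is my name for the density of the payment excess of the deductible, and the two payments of 400 and 1500 are those implied by the solution above):

def excess_density(x, theta, d):
    # Pareto with alpha = 4, shifted to payments in excess of the deductible d
    return 4 * (theta + d)**4 / (theta + x + d)**5
combos = [(800, 500), (800, 1000), (1200, 500), (1200, 1000)]    # (theta, deductible)
weights = [excess_density(400, t, d) * excess_density(1500, t, d) for t, d in combos]
total = sum(weights)
print([round(w / total, 4) for w in weights])   # about 0.1624, 0.2658, 0.2477, 0.3241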
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 192 5.54. D. A
B
C
D
State of World
A Priori Chance of This Type of State
Chance of the Observation
1 2 3
0.333 0.333 0.333
0.333 0.500 0.167
Overall
E
F
Prob. Weight = Col. D / Sum of Col. D = Product Posterior Chance of of Columns This Type Average B&C of Risk Severity 0.1111 0.1667 0.0556
33.3% 50.0% 16.7%
0.3333
1.000
1.3333 1.5000 1.1667
G
Average Pure Premium 0.667 0.750 0.583 0.694
Comment: Remember to multiply by the given 50% claim frequency in order to convert the mean severities into mean pure premiums. 5.55. B. A
B
C
D
E
Type of Driver
A Priori Chance of This Type of Driver
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Type of Driver
Youth Adult
0.150 0.850
30% 15%
0.0450 0.1275
26.09% 73.91%
0.172
1.000
Overall
5.56. D. Prob[u = 2 | T = 0] is the density at 2 of a Binomial with q = 0 and m = 2, which is 0. Prob[u = 2 | T = 1] is the density at 2 of a Binomial with q = 1/2 and m = 2, which is 1/4. Prob[u = 2 | T = 1] is the density at 2 of a Binomial with q = 1 and m = 2, which is 1. A
C
D
E
F
T
A Priori Probability
Chance of the Observation
Prob. Weight = Product of Columns B&C
Posterior Chance of This Type of Risk
Mean Frequency
0 1 2
25% 50% 25%
0.000 0.250 1.000
0.000 0.125 0.250
0.00% 33.33% 66.67%
0 1 2
0.375
100.00%
1.667
Overall
B
Expected future frequency is: (0)(0) + (1/3)(1) + (2/3)(2) = 5/3.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 193 5.57. A. The Normal Distribution has a density function of exp[-(x-µ)2 /(2σ2) ] / σ(2π).5. For example, for Type A the Normal Density at .12 is: e-.222/(.03)(2.507) = 10.65. The probability weights are the chance of the observation (the probability density function) times the a priori probability. The probability density function at .12 from either risk B or C is so small, (10-12 and 10-1320 respectively,) that the posterior chance of Risk A is 100%. Thus the Bayesian estimate is .10, the mean of Risk A. A
Type of Risk A B C
B
C
D
A Priori Chance of This Type of Risk Mean 0.333 0.1 0.333 0.5 0.333 0.9
E
Probability Density Function 1.065e+1 2.288e-12 0.000e+0
Standard Deviation 0.03 0.05 0.01
F
G
Prob. Weight = Posterior Product Chance of of Columns This Type B&E of Risk 3.5494 100.0% 0.0000 0.0% 0.0000 0.0%
Overall
3.5494
1.000
H
Mean For this Type of Risk 0.1 0.5 0.9 0.1000
Comment: Assume that each of the three risks is a priori equally likely. 5.58. B. The posterior probabilities are the probability weights divided by their sum. A
B
C
D
E
F
A Priori Prob. Weight = Posterior Chance of Chance Product Chance of Type of This Type of the of Columns This Type Risk of Risk Observation B&C of Risk A 0.750 0.100 0.075 60.0% B 0.250 0.200 0.050 40.0% Overall
0.125
Avg. Claim Frequency 0.6 1.0 0.76
1.000
The posterior estimate is the product of the posterior probabilities and the means for each type of risk: (60%)(.6) + (40%)(1.0) = .76. 5.59. A. Risk 1 2 Overall
A priori Probability 0.666 0.333
Chance of Observation 0.5 0.7
Probability Weight 0.333 0.233
Probability 0.588 0.412
Mean 4350 2270
0.566
1
3494
Comment: (.333)(.7) = .233. .233 / .566 = .412. (.7)(100) + (.2)(1000) + (.1)(20000) = 2270. (.588)(.4350) + (.412)(2270) = 3494.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 194 5.60. B. A priori Probability 0.5 0.5
Risk A B
Chance of Observation 0.64 0.36
Probability Weight 0.32 0.18
Posterior Probability 0.64 0.36
Mean 48 172
0.5
1
92.64
Overall
The probability weight is the product of the a priori probability and the chance of the observation. The posterior probability is the probability weight divided by the sum of the probability weights. The posterior estimate is: (.64)(48) + (.36)(172) = 92.64. 5.61. E. For example, the average aggregate loss for risk type A is: (0)(.80) + (50)(.16) + (2000)(.04) = 88. A
B
C
A Priori Chance of Chance Type of This Type of the Risk of Risk Observation A 0.333 0.160 B 0.333 0.240 C 0.333 0.320 Overall
D
E
Prob. Weight = Posterior Chance Product of This Type = of Columns Col. D/ B&C Sum of Col. D 0.053 22.2% 0.080 33.3% 0.107 44.4% 0.240 1.000
F
Average Aggregate Losses 88.0 332.0 576.0 386.2
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 195 5.62. C. If one has a $1 million maximum covered loss the chance of having a $100,000 payment is 1/5 = .2. So if the maximum covered loss were $1 million, then the chance of observing two claims payments of $100,000 is .22 = .04. If instead one has a $100,000 maximum covered loss, the chance of having a $100,000 payment is the chance of having a total size of loss ≥ 100,000, which is 1/5 +1/20 = .25. So if the maximum covered loss were $100,000, then the chance of observing two claims payments of $100,000 is .252 = .0625. Now the a priori chance of having a maximum covered loss of $100,000 is 2/3; since the claim count distributions are the same for both types, the chance of a claim coming from a policy with maximum covered loss $100,000 is also 2/3. The mean payment for a claim when the maximum covered loss is $100,000 is: (1/2)(10000) + (1/4)(50000) + (1/5)(100,000) + (100,000)(1/20) = 42,500. The mean payment for a claim when the maximum covered loss is $1 million is: (1/2)(10000) + (1/4)(50000) + (1/5)(100,000) + (1/20)(1,000,000) = 87,500. Therefore, the posterior estimate of a third claim from the same policy is: A
B
Type of Risk
A Priori Chance of This Type of Risk 0.667 0.333
1 2 Overall
C
D
E
F
Chance of the Observation 0.06250 0.04000
Prob. Weight = Product of Columns B&C 0.04167 0.01333 0.05500
Posterior Chance of This Type of Risk 0.75758 0.24242 1.00000
Average Claim Severity $42,500 $87,500 $53,409
Comment: It may take a moment to recognize that this is a question involving Bayesian Analysis; we observe two claims payments from a single policy of unknown type and we wish to estimate the size of another claims payment from the same policy. While this is a somewhat artificial example, (but then how many times in your career have you picked balls from urns), this question tests whether you can recognize and apply Bayes Analysis to general situations.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 196 5.63. D. For the Poisson Distribution with mean λ, f(x) = e−λ λ x / x!. The chance of observing zero claims is therefore f(0) = e−λ, while the chance of observing 2 claims is f(2) = e−λ λ 2 / 2. For the Poisson Distribution the number of claims observed over the first period is independent of the number of claims observed over the second period. (The Poisson has a constant claims intensity, such that how many claims are observed over any interval of time is independent of how many claims are observed over any other disjoint interval of time.) Thus the chance of observation given λ is f(0)f(2) = e−2λ λ 2 / 2. For Class 1 with λ = 1 this is: e-2 / 2 = .06767. For Class 2 with λ = 2 this is: 2e-4 = .03663. The Bayesian Analysis proceeds as follows: A
B
Class 1 2
A Priori Chance of This Class 0.500 0.500
C
D
E
F
Chance of the Observation 0.0677 0.0366
Prob. Weight = Product of Columns B&C 0.0338 0.0183
Posterior Chance of This Type of Risk 64.9% 35.1%
Mean Frequency
0.0522
1.000
1.35
Overall
1 2
5.64. C. If Type A, the chance of the observation is 1 - Φ((5000-3000)/1000) = 1 - Φ(2) = 0.0228. If Type B, then the mean is 4000 and the standard deviation is 1000 so that the chance of the observation is 1 - Φ[(5000-4000)/1000] = 1 - Φ(1) = 0.1587. A
Type of Claim A B SUM
B
A Priori Probability 0.75 0.25
C
Chance of Observation 0.0228 0.1587
D
E
Probability Col. D / Sum of Col. D = Weights = Posterior Col. B x Col. C Probability 0.0171 0.301 0.0397 0.699 0.05677
1.000
The posterior chance of Type A is proportional to the product of its a priori chance and the chance of the observation if Type A. The posterior probability is: .0171/.05678 = 0.301.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 197 5.65. B. The posterior probabilities are proportional to the product of the chance of the observation given each class and the a priori probability of each class. Since the a priori probabilities of the classes are all equal, the posterior probabilities are proportional to the chance of the observation given each class. Thus, the posterior probabilities are proportional to the density functions at 340: 1/400, 1/600 and 1/800. Dividing by their sum, these produce posterior probabilities of: 6/13, 4/13 and 3/13. The means of the classes are 200, 300 and 400. Thus the Bayesian analysis estimate of the expected value of a second claim from the same risk is: (6/13)(200) + (4/13)(300) + (3/13)(400) = 3600/13 = 277. A
B
C
D
E
F
Class
A Priori Chance of this Class
Chance of the Observation
0.3333 0.3333 0.3333
0.002500 0.001667 0.001250
Posterior Chance of This Class = Col. D / Sum of D 0.4615 0.3077 0.2308
Mean of this Class
1 2 3
Prob. Weight = Product of Columns B&C 0.000833 0.000556 0.000417 0.001806
1.000
276.9
Overall
200 300 400
5.66. C. F(x) = 1 - {5000/(5000+x)}2 . The probability of the observation given a maximum covered loss of $5000 is 0. The probability of the observation given a maximum covered loss of $10,000 is: 1 - F(9000) = 0.1276. The probability of the observation given a maximum covered loss of $20,000 is: F(11000) - F(9000) = (1 - 0.0977) - (1 - 0.1276) = 0.0299. A
B
Maximum Covered Loss
A Priori Chance of This Type 0.2500 0.2500 0.5000
5000 10000 20000
C
Chance of the Observation 0.0000 0.1276 0.0299
Overall
D
E
Prob. Weight = Posterior Chance of Product This Type = of Columns Col. D / Sum of Col. D B&C 0.0000 0.0000 0.0319 0.6809 0.0149 0.3191 0.0469
1.0000
5.67. D. If Risk A, the chance of the observation is e-m. If Risk B, the chance of the observation is e-(m+1). The posterior probabilities are proportional to the product of the a priori probabilities and the chance of the observation. Given that A and B are equally likely a priori, the posterior probabilities are proportional to e-m and e-(m+1). Thus the posterior probability of Risk A is: e-m / {e-m + e-(m+1)} = e / (1+e) = 0.731.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 198 5.68. B. The chance of observing zero claims if we have risk type A is e-m. The chance of observing zero claims if we have risk type B is e-2m. The a priori chance of risk type A and B are each .5. Therefore, by Bayes theorem the posterior probability of Risk Type A is: (.5)(e-m) / { (.5)(e-m) +(.5)(e-2m)} = 1/(1+e-m). The posterior probability of Risk Type B is: (.5)(e-2m) / { (.5)(e-m) +(.5)(e-2m)} = e-m/(1+e-m). Thus the posterior chance of zero claims is: e-m/(1+e-m) + e-2m e-m/(1+e-m) = (e-m + e-3m)/(1+e-m). Thus the posterior chance of at least one claim is : 1 - (e-m + e-3m)/(1+e-m) = (1 - e- 3 m) / (1+e-m). Comment: Some students may find this easier to do by just plugging in a value for m at the beginning, such as m = 0.7, and just proceeding to do the problem numerically. In that case, compare your numerical answer to the available choices. With m = 0.7: A
Type of Risk A B Overall
B
C
D
E
F
Mean A Priori Prob. Weight = Posterior of Chance of Chance Product Probability Poisson This of the of Columns Col. E / Sum of Col. E Dist. Type Observation C&D 0.7 1.4
0.5000 0.5000
0.4966 0.2466
G
Chance of at Least 1 Claim
0.2483 0.1233
0.6682 0.3318
0.5034 0.7534
0.3716
1.0000
0.586
(1 - e-3m)/(1 + e-m) = (1 - e-2.1)/(1 + e-.7) = .8775 / 1.4966 = 0.586.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 199 5.69. D. If the risk is from Class A, then the chance of the observation is: (e-1 12 / 2!)(2)(e-1/1)(e-3/1) = e-5 = .00674. If the risk is from Class B, then the chance of the observation is: (e-3 32 / 2!)(2)(e-1/3/3)(e-3/3/3) = e-13/3 = .01312. The mean pure premium for Class A is: (1)(1) = 1. The mean pure premium for Class B is: (3)(3) = 9. A
Class A B
B
C
A Priori Chance Chance of of the This Class Observation 0.5000 0.5000
0.00674 0.01312
Overall
D
E
F
Prob. Weight = Product of Columns B&C 0.00337 0.00656
Posterior Chance of This Class
Mean Pure Premium
0.3392 0.6608
1.0000 9.0000
0.00993
1.0000
6.286
Comment: I have included a factor of two in the chance of observations in order to take into account the two combinations of claim severities (either claim can be first.) If the question has instead said that the first claim was of size 1 and the second claim was of size 3, then one should leave out this factor of two. You get the same answer to the question whether you include this factor of two or not. In general, any factor that shows up on every row of the “chance of the observation” column, drops out when one computes the posterior distribution. 5.70. If we have two claims, then the chance they are of sizes 1.0, and at least 3.0 is: 2 f(1) S(3) = 2 (e-1/θ / θ) e-3/θ. If the risk is from Class A, then the chance of the observation is: (e-1 12 / 2!)(2)(e-1/1)(e-3) = e-5 = 0.00674. If the risk is from Class B, then the chance of the observation is: (e-3 32 / 2!)(2)(e-1/3/3)(e-3/3) = 3 e-13/3 = 0.03937. The mean pure premium for Class A is: (1)(1) = 1. The mean pure premium for Class B is: (3)(3) = 9.
Class A B Overall
A Priori Chance Chance of of the This Class Observation 0.5000 0.5000
0.00674 0.03937
Probability Weight B&C 0.00337 0.01969 0.02305
Posterior Chance of This Class
Mean Pure Premium
0.1461 0.8539
1.0000 9.0000
1.0000
7.831
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 200 5.71. D. The observation corresponds to either a single claim of size 100,000 (either a claim in the first year and none in the second year or vice versa) or two claims of size 50,000 (one in each year.) If the risk is from Class A, then the chance of the observation is: (2)(.22)(.78)(.4) + (.22)(.22)(.6)(.6) = .1547. If the risk is from Class B, then the chance of the observation is: (2)(.11)(.89)(.64) + (.11)(.11)(.36)(.36) = .1269. A
B
C
D
E
F
Class
A Priori Chance of This Class
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Class
Mean Pure Premium
A B
0.6667 0.3333
0.1547 0.1269
0.1031 0.0423
0.709 0.291
15,400 9,020
0.1454
1.000
13,544
Overall
Comment: Bullet (v) could/should have been worded more clearly. The expected number of claims per year for an individual risk in Class A is 0.22. Usually expected frequencies are intended to be per individual or per exposure, unless stated otherwise. If not, in this case they would have said something like the total number of claims per year expected from Class A.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 201 5.72. C. f(x) = α θα/(x+θ)α+1 = α 10α/(x+10)α+1. f(20) = (α/30) /3α. S(x) = (θ/(x+θ))α. S(30) = (10/40)α = 1/4α. A
α
1 2 3
B
C
D
E
A Priori Chance of This Type of Risk 0.3333 0.3333 0.3333
f(20) = Chance of the Observation 0.01111 0.00741 0.00370
Prob. Weight = Product of Columns B&C 0.00370 0.00247 0.00123
Posterior Chance of This Type of Risk = Col. D / Sum of Col. D 50.00% 33.33% 16.67%
25.00% 6.25% 1.56%
0.00741
1.000
14.84%
Overall
F
S(30)
Comment: The a priori probability that a claim will be greater than 30 is: (1/3)(25%) + (1/3)(6.25%) + (1/3)(1.56%) = 10.94%. Since posterior to the observation, there is a greater chance that the Pareto Distribution is longer-tailed (α smaller), the estimate of S(30) has increased. Since the survival function is not linear in alpha, one should not weight together the alphas: (1/2)(1) + (1/3)(2) + (1/6)(3) = 1.667. This would result in an incorrect value for the posterior estimate of S(30); 1/41.667 = 9.9% ≠ 14.84%. 5.73. B. f(x) = α θα/(x+θ)α+1 = α 10α/(x+10)α+1. f(20) = (α/30) /3α. f(40) = (α/50) /5α. The probability of the observation is: 2f(20)f(40). S(x) = (θ/(x+θ))α. S(30) =(10/40)α = 1/4α. A
B
α
A Priori Chance
f(20)
1 2 3
0.3333 0.3333 0.3333
0.01111 0.00741 0.00370
Overall
C
D
E
F
G
H
f(40)
Probability of the Observation
Probability Weight
Posterior Chance
S(30)
0.0000889 0.0000237 0.0000036
0.00002963 0.00000790 0.00000119
76.53% 20.41% 3.06%
25.00% 6.25% 1.56%
0.00003872
1.000
20.46%
0.00400 0.00160 0.00048
Comment: Since it appears on every row, the factor of 2 in the probability of the observation does not affect the posterior distribution.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 202 5.74. D. S(x) = (θ/(x+θ))α. S(30) = (10/40)α = 1/4α. The probability of the observation is: S(20) = (10/30)α = 1/3α. A
B
C
D
E
F
α
A Priori Chance
Probability of the Observation
Probability Weight
Posterior Chance
S(30)
1 2 3
0.3333 0.3333 0.3333
33.33% 11.11% 3.70%
0.11111 0.03704 0.01235
69.23% 23.08% 7.69%
25.00% 6.25% 1.56%
0.16049
100.00%
18.87%
Overall
5.75. B. Mean for class 1 is: (0 + 1 + 2)/3 = 1. Mean for class 2 is: (1)(1/6) + (2)(2/3) + (3)(1/6) = 2. Mean for class 3 is: (2)(1/6) + (3)(2/3) + (4)(1/6) = 3. A
B
C
D
Class
A Priori Chance of This Class
Chance of the Observation
Prob. Weight = Product of Columns B&C
1 2 3
0.5000 0.3333 0.1667
0.33333 0.16667 0.00000
0.16667 0.05556 0.00000
75.00% 25.00% 0.00%
1.00 2.00 3.00
0.22222
1.000
1.25
Overall
E
F
Posterior Mean Chance of This Frequency Class = Col. D / Sum of Col. D
Comment: While there is no reason why they could not have asked you to estimate the future using Buhlmann Credibility, they did not. Buhlmann Credibility is a least squares linear approximation to Bayesian Analysis. Therefore, Bayes Analysis is the default. Use Bayes Analysis when both are available, unless: they use the words: credibility, Buhlmann, Buhlmann-Straub, semiparametric, empirical Bayes, etc., or it is one of the Conjugate Prior situations in which Buhlmann = Bayes, so it does not matter which one you use. 5.76. A. Prob[3 claims | good] = (13 )e-1/3! = .0613. Prob[3 claims | bad] = (53 )e-5/3! = .1404. Since the two types of drivers are equally likely, the a priori probability of the observation of 3 claims is: (.5)(.0613) + (.5)(.1404) = .1009. By Bayes Theorem, P[A | B] = P[B | A]P[A]/P[B]: Prob[good | 3 claims] = Prob[good]Prob[3 claims | good]/Prob[3 claims] = (.5)(.0613)/.1009 = 30.4%. Comment: Prob[bad | 3 claims] = Prob[bad]Prob[3 claims | bad]/Prob[3 claims] = (.5)(.1404)/.1009 = 69.6% = 1 - 30.4%. The estimated future claim frequency for this driver is: (30.4%)(1) + (69.6%)(5) = 3.78.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 203 5.77. A. For Class I, λ = 12/12 = 1 per year. For Class II, λ = 12/15 = 0.8 per year. For Class III, λ = 12/18 = 2/3 per year. Prob[0 claims | Class I] = e-1 = .3679. Prob[0 claims | Class II] = e-.8 = .4493. Prob[0 claims | Class III] = e-2/3 = .5134. The a priori probability of the observation of 0 claims is: (1/3)(.3679) + (1/3)(.4493) + (1/3)(.5134) = .4435. By Bayes Theorem, P[A | B] = P[B | A]P[A]/P[B]: Prob[Class I | 0 claims] = Prob[Class I]Prob[0 claims | Class I]/Prob[0 claims] = (1/3)(.3679)/.4435 = 27.65%. Prob[Class II | 0 claims] = Prob[Class II]Prob[0 claims | Class II]/Prob[0 claims] = (1/3)(.4493)/.4435 = 33.77%. Prob[Class III | 0 claims] = Prob[Class III]Prob[0 claims | Class III]/Prob[0 claims] = (1/3)(.5134)/.4435 = 38.59%. Therefore, the expected future frequency for this insured is: (27.65%)(1) + (33.77%)(.8) + (38.59%)(2/3) = .8039. The expected loss in year 2 for this insured is: (1000)(.8039) = 804. 5.78. D. This a Pareto Distribution with α = 1 and S(x) = θ/(x+θ). Type
A priori chance
f(5)
Probability Weight
Posterior Probability
S(8)
θ=1 θ=3
0.5 0.5
0.02778 0.04688
0.01389 0.02344
0.37209 0.62791
0.1111 0.2727
0.03733
0.2126
5.79. B. For a Binomial with m = 6, f(3) = 20q3 (1-q)3 . For q = 0.1, f(3) = 0.01458. Type
A priori Probability
Chance of Observation
Probability Weight
Posterior Probability
Mean Frequency
I II III
0.7 0.2 0.1
0.01458 0.08192 0.27648
0.01021 0.01638 0.02765
0.188 0.302 0.510
0.1 0.2 0.4
0.05424
0.283
Comment: Buhlmann Credibility is a linear approximation to Bayes Analysis; therefore, on the exam the default is to use Bayes Analysis unless they say to use credibility.
2013-4-9 Buhlmann Credibility §5 Bayes Discrete Risk Types, HCM 10/19/12, Page 204 5.80. B. Mean severity for Class 1: (.5)(250) + (.3)(2500) + (.2)(60000) = 12,875. A
B
C
D
E
F
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Class
Mean Severity
Class
A Priori Chance of This Class
1 2
0.6667 0.3333
0.5000 0.7000
0.3333 0.2333
0.588 0.412
12,875 6,675
0.5667
1.000
10,322
Overall
Comment: Same setup as 4, 11/03, Q.23, which uses Buhlmann Credibility. 5.81. C. The chance of the observation is the density at two of a Geometric: β2/(1 + β)3 . A
B
C
D
E
F
Beta
A Priori Chance of This Type
Chance of the Observation
Prob. Weight = Product of Columns B & C
Posterior Chance of This Type
Mean Frequency
2 5
0.3333 0.6667
0.1481 0.1157
0.0494 0.0772
0.390 0.610
2 5
0.1265
1.000
3.829
Overall
5.82. A. Type I, θ = 0.1, has distribution: 20% @ 0, 10% @ 1, and 70% @ 2, with mean 1.5.
Type II, θ = 0.3, has distribution: 60% @ 0, 30% @ 1, and 10% @ 2, with mean 0.5.

A         B                  C               D                   E                  F
Theta     A Priori Chance    Chance of the   Prob. Weight =      Posterior Chance   Mean
          of This Type       Observation     Product of B & C    of This Type
0.1       0.80               0.10            0.08                0.571              1.500
0.3       0.20               0.30            0.06                0.429              0.500
Overall                                      0.14                1.000              1.071
5.83. E. f(0) = 1/(1+β)^r. For r = 2 and β = 2, f(0) = 1/9. For r = 4 and β = 1, f(0) = 1/16.
f(1) = rβ/(1+β)^(r+1). For r = 2 and β = 2, f(1) = 4/27. For r = 4 and β = 1, f(1) = 1/8.
Probability of the observation is: f(0)f(0)f(1).
Prob[Obs. | r = 2 and β = 2] = (1/9)(1/9)(4/27) = 0.0018290.
Prob[Obs. | r = 4 and β = 1] = (1/16)(1/16)(1/8) = 0.0004883.
P[Observation] = (1/2)(0.0018290) + (1/2)(0.0004883) = 0.001159.
By Bayes Theorem, P[Risk Type | Observation] = P[Obser. | Type] P[Type] / P[Observation].
P[Risk Type One | Observation] = (0.0018290)(0.5)/0.001159 = 78.9%.
Comment: P[Risk Type Two | Observation] = (0.0004883)(0.5)/0.001159 = 21.1%.
5.84. E. Prob[θ = 8 | observation] = (0.8 e^(-5/8)/8) / {(0.8 e^(-5/8)/8) + (0.2 e^(-5/2)/2)} = 0.867.
Prob[θ = 2 | observation] = (0.2 e^(-5/2)/2) / {(0.8 e^(-5/8)/8) + (0.2 e^(-5/2)/2)} = 0.133.
Posterior estimate of θ is: (0.867)(8) + (0.133)(2) = 7.20.
Section 6, Bayesian Analysis, with Continuous Risk Types In the prior section Bayes Theorem was applied to situations where there were several distinct types of risks. In this section, Bayes Theorem will be applied in a similar manner to situations in which there are an infinite number of risk types, parameterized in some continuous manner. Where summation was used in the discrete case, instead integration will be used in the continuous case. “Mahlerʼs Guide to Conjugate Priors” contains many more examples of Bayesian Analysis with Continuous Risk Types.52 An Example of Mixing Bernoullis: For example, assume: • In a large portfolio of risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean q. • The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. • The distribution of q within the portfolio has density function: π(q) = 20.006q4 , 0.6 ≤ q ≤ 0.8. A policyholder is selected at random from the portfolio. He is observed to have two claims in three years. What is the density of his posterior distribution function of q? The chance of the observing 2 claims in 3 years, given q, is from a Binomial Distribution with parameters m = 3 and q: 3q2 (1-q) = 3q2 - 3q3 . By Bayes Theorem, Prob(q | observation) = Prob(q)Prob(observation | q) / Prob(observation). Therefore, the posterior probability density function of q is proportional to : Prob(q)Prob(observation | q) = π(q)(3q2 - 3q3 ). In order to compute the posterior distribution we need to divide by the integral of π(q)(3q2 - 3q3 ), over the support of π(q), which is [.6, .8]. 0.8
∫[0.6, 0.8] π(q) (3q^2 - 3q^3) dq = ∫[0.6, 0.8] 20.006 q^4 (3q^2 - 3q^3) dq = 60.018 ∫[0.6, 0.8] q^6 - q^7 dq
= 60.018 {q^7/7 - q^8/8} evaluated from q = 0.6 to q = 0.8
= 60.018 {(0.029959 - 0.020972) - (0.003999 - 0.002100)} = 0.4254.

52 See the sections on Mixing Poissons, Gamma-Poisson, Beta-Bernoulli, Inverse Gamma - Exponential, and the Overview.
Thus the density of the posterior distribution of q is: π(q)(3q^2 - 3q^3) / 0.4254 = 20.006 q^4 3(q^2 - q^3) / 0.4254 = 141.079 (q^6 - q^7), for 0.6 ≤ q ≤ 0.8.
Exercise: In this example, a policyholder is selected at random from the portfolio. He is observed to have two claims in three years. What is his future expected annual claim frequency?
[Solution: Since for each insured q is the mean frequency, what we want is the mean of the posterior distribution of q. As calculated above, for this observation the density of the posterior distribution of q is: 141.079 (q^6 - q^7), for 0.6 ≤ q ≤ 0.8. Its mean is:
∫[0.6, 0.8] 141.079 (q^6 - q^7) q dq = 141.079 ∫[0.6, 0.8] q^7 - q^8 dq
= 141.079 {q^8/8 - q^9/9} evaluated from q = 0.6 to q = 0.8
= 141.079 {(0.02097 - 0.01491) - (0.00210 - 0.00112)} = 0.716.]
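The normalizing constant and the posterior mean can also be checked by numerical integration. The following is a rough Python sketch of my own (not from the text), assuming SciPy is available:

    # Numerical check of the Bernoulli-mixing example: prior 20.006 q^4 on [0.6, 0.8],
    # observation of 2 claims in 3 years.
    from scipy.integrate import quad

    prior = lambda q: 20.006 * q**4
    like  = lambda q: 3 * q**2 * (1 - q)          # Binomial(3, q) probability of 2 claims
    weight = lambda q: prior(q) * like(q)

    norm, _ = quad(weight, 0.6, 0.8)              # the normalizing constant
    mean, _ = quad(lambda q: q * weight(q), 0.6, 0.8)
    print(round(norm, 4), round(mean / norm, 3))  # 0.4254  0.716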
Bayesian Interval Estimates:
By the use of Bayes Theorem one obtains an entire posterior distribution. Rather than just using the mean of that posterior distribution in order to get a point estimate, one can use the density function of the posterior distribution to estimate the posterior chance that the quantity of interest is in a given interval.53 For the above example, one could compute the posterior probability that the future expected frequency for this insured, q, is in, for example, the interval [0.75, 0.76]. The posterior chance that q is in the interval [0.75, 0.76] is the integral from 0.75 to 0.76 of the posterior density:
141.079 ∫[0.75, 0.76] q^6 - q^7 dq = 141.079 {q^7/7 - q^8/8} evaluated from q = 0.75 to q = 0.76
= (141.079) {(0.020922 - 0.013913) - (0.019069 - 0.012514)} = 6.4%.
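The same machinery gives the interval estimate numerically; a brief sketch (again my own, assuming SciPy):

    # Posterior probability that q lies in [0.75, 0.76]
    from scipy.integrate import quad

    posterior = lambda q: 141.079 * (q**6 - q**7)
    prob, _ = quad(posterior, 0.75, 0.76)
    print(round(prob, 3))   # about 0.064, i.e. 6.4%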
53
Bayesian Interval Estimates can come up in the situations covered in “Mahlerʼs Guide to Conjugate Priors.”
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 208 More generally, the same technique can be applied whenever one has a continuous prior distribution of the quantity of interest.54 Then given an observation one applies Bayes Theorem to get the posterior density, which is proportional to the product of the prior density and the chance of the observation. One needs to divide by the integral of this product over the support of the prior density, in order to get the posterior density.55 Then in order to find the posterior chance that the quantity of interest is in a given interval, one can integrate the posterior density over that interval.56 Exercise: For each individual within a portfolio, the probability of a claim in a year is a Bernoulli with mean q. The prior distribution of q within the portfolio is uniform on [0,1]. An insured is picked at random from the portfolio, and one claim was observed in one year. What is the posterior estimate that this insured has a q parameter less than 0.2? [Solution: The prior density function of q is π(q) = 1 for 0 ≤ q ≤ 1. The chance of observing one claim in one year, given q, is q. By Bayes Theorem, the posterior probability density function is proportional to: π(q)q = q. In order to compute the posterior density we need to divide by the integral of π(q)q, over the support of π(q), which is [0, 1]. 1
∫[0, 1] π(q) q dq = ∫[0, 1] q dq = 1/2.
Thus the posterior density is: π(q) q / ∫ π(q) q dq = (1)(q)/(1/2) = 2q, for 0 ≤ q ≤ 1.
The posterior probability that q is in the interval [0, 0.2) is the integral from 0 to 0.2 of the posterior density:
∫[0, 0.2] 2q dq = q^2 evaluated from q = 0 to q = 0.2 = 0.2^2 = 0.04.]
If instead the prior distribution of q had been π(q) = 4q^3, 0 ≤ q ≤ 1, then the density of the posterior distribution of q would be: π(q) q / ∫ π(q) q dq = (4q^3)(q)/(4/5) = 5q^4, for 0 ≤ q ≤ 1.
In this case, the prior probability that q is in the interval [0, 0.2) is:
∫[0, 0.2] 4q^3 dq = q^4 evaluated from q = 0 to q = 0.2 = 0.2^4 = 0.0016.

54 In the example, q was distributed continuously.
55 The posterior density has to integrate to one. In the example, we divided π(q)(3q^2 - 3q^3) by 0.4254 so that the posterior density would integrate to 1 rather than 0.4254.
56 In the example, we integrated the posterior density over the interval [0.75, 0.76], and determined that there was a 6.4% probability that q was in that interval.
While the posterior probability that q is in the interval [0, 0.2) is:
∫[0, 0.2] 5q^4 dq = q^5 evaluated from q = 0 to q = 0.2 = 0.2^5 = 0.00032.
Bayesian Estimation:57 Loss Models formalizes and generalizes what has been discussed so far. It is important to note that many people perform and understand Bayesian Analysis without using the following formal definitions. Nevertheless, the notation and terminology of the text may be used in exam questions, so it is a good idea to learn it. Letʼs first go over some definitions by applying them to the previous example. • In a large portfolio of risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean q. • The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. • The distribution of q within the portfolio has density function: 20.006q4 , 0.6 ≤ q ≤ 0.8 • A policyholder is selected at random from the portfolio. He is observed to have two claims in three years. In general one has the possibilities described via a Prior Distribution of the parameter(s), denoted by π(θ). In this example, the prior distribution of q is: π(q) = 20.006q4 , 0.6 ≤ q ≤ 0.8. The Model Distribution is the likelihood of the observation given a particular value of the parameter or a particular type of insured, denoted fX|θ(x|θ). In general the model distribution will be a product of densities; it is the likelihood function. In this example, the model distribution is the product of Bernoulli densities58: fX|q(2 | q) = 3q2 (1-q). The Joint Distribution is the product of the Prior Distribution and the Model Distribution.59 fX,θ(x,θ) = π(θ) fX|θ(x|θ). In this example, the joint distribution has probability density function: 20.006q4 3q2 (1-q), 0.6 ≤ q ≤ 0.8. Note that x denotes the observations, in this case 2 claims in 3 years. If instead one had seen 3 claims in 3 years, then the joint distribution would instead have p.d.f. of: 20.006q4 q3 , 0.6 ≤ q ≤ 0.8. 57
See Section 15.5.1 of Loss Models.
58 Note that this is a Binomial with parameters m = 3 and q. The factor of 3 in front comes from the fact that which years had claims was not specified; there are 3 different combinations that would produce 2 claims in 3 years.
59 This is just the usual definition used in statistics; see the section on conditional distributions.
The Marginal Distribution of x is the integral over the possible values of the parameter(s) of the joint density:60
fX(x) = ∫ π(θ) fX|θ(x | θ) dθ.
In the Bernoulli example, the marginal distribution for the number of claims over three years, evaluated at 1, is:61
fX(1) = ∫[0.6, 0.8] 20.006 q^4 3q(1 - q)^2 dq = 60.018 ∫[0.6, 0.8] q^5 - 2q^6 + q^7 dq =
(60.018) {(0.8^6/6 - 2(0.8^7)/7 + 0.8^8/8) - (0.6^6/6 - 2(0.6^7)/7 + 0.6^8/8)} = (60.018)(0.00474 - 0.00188) = 17.2%.
Exercise: For the Bernoulli example, what is the marginal distribution for the number of claims over three years?
[Solution: x: 0, 1, 2, 3. f(x): 2.5%, 17.2%, 42.5%, 37.8%.]
In this example, it is the chance, prior to any observations, that we will observe 0, 1, 2, or 3 claims over the coming 3 years. In general, the marginal distribution is prior to any observations. The marginal distribution is often referred to as the “prior mixed” distribution. Posterior to observations one has two additional distributions of interest.
The Posterior Distribution is the distribution of the parameters subsequent to the observation(s). It is just the conditional distribution of the parameters given the observations. The density of the posterior distribution is denoted by: πθ|X(θ | x). As computed previously using Bayes Theorem, in the Bernoulli example after observing 2 claims in 3 years the density of the posterior distribution of q is: 141.079 (q^6 - q^7), for 0.6 ≤ q ≤ 0.8.
The Predictive Distribution is the distribution of x subsequent to the observation(s). It is the mixed distribution of x, given the observations. The density of the predictive distribution is denoted by: fY|X(y | x).
60 This is just the usual definition used in statistics; see the section on conditional distributions.
61 One could also compute the marginal distribution for the number of claims over a single year. In this case, there would be an a priori 71.9% chance of observing one claim over the coming year and a 28.1% chance of observing no claims over the coming year. (One integrates f(q)q and f(q)(1-q), respectively.)
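As a check on the marginal (prior mixed) distribution quoted above, one can integrate the prior against the Binomial probabilities numerically. A Python sketch of my own (only math.comb and scipy.integrate.quad are assumed):

    # Marginal distribution of the number of claims in three years for the Bernoulli example
    from scipy.integrate import quad
    from math import comb

    prior = lambda q: 20.006 * q**4
    def marginal(x):
        f = lambda q: prior(q) * comb(3, x) * q**x * (1 - q)**(3 - x)
        return quad(f, 0.6, 0.8)[0]

    print([round(marginal(x), 3) for x in range(4)])   # about [0.025, 0.172, 0.425, 0.378]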
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 211 Exercise: For the Bernoulli example, after observing 2 claims in 3 years, what is the predictive distribution for the number of claims over the next year? [Solution: The density of the posterior distribution of q is: 141.079(q6 - q7 ), for 0.6 ≤ q ≤ 0.8. fY|X(y | x) =
∫[0.6, 0.8] 141.079 (q^6 - q^7) q dq = 141.079 ∫[0.6, 0.8] q^7 - q^8 dq = 71.6%.
Since for a Bernoulli we can have only 0 or 1 claims in a single year, the predictive distribution is: 28.4% chance of zero claims, and 71.6% chance of one claim, in other words a Bernoulli with q = 0.716. Comment: One can integrate (1-q) times the posterior density and obtain 28.4%.] The predictive distribution is analogous to the marginal (or prior mixed) distribution, but posterior to the observations.62 The predictive distribution is computed from the posterior distribution in the same manner as the marginal distribution is computed from the prior distribution. Mixing Poissons and the Conjugate Prior situations63 provide good examples, which should allow you to fully understand these concepts. Finally it should be noted that throughout this section I have used the mean (expected value using the posterior distribution) of the quantity of interest as the estimator. This is only one possible Bayesian estimator; the use of the mean as the estimator corresponds to a least squares criteria. The Bayes Estimator based on the mean of the posterior distribution is the least squares estimator with respect to the true underlying value of the quantity of interest, as well as with respect to the next observation. Other estimators such as the median, percentiles and the mode of the posterior distribution, are discussed in a subsequent section on Loss Functions / Error Functions. However, unless stated otherwise, assume that Bayesian Estimation refers to the Bayes Estimator corresponding to the squared-error loss function, the mean of the posterior distribution.
62 Therefore, the predictive distribution is sometimes referred to as the “posterior mixed” distribution.
63 See “Mahlerʼs Guide to Conjugate Priors.”
Summary: If π(θ) is the density of the prior distribution of the parameter θ, then the density of the posterior distribution of θ is proportional to: π(θ) P(Observation | θ). The density of the posterior distribution of θ is:
π(θ) Prob[Observation | θ] / ∫ π(θ) Prob[Observation | θ] dθ.

The Bayes estimate is:64
∫ (Mean given θ) π(θ) Prob[Obs. | θ] dθ / ∫ π(θ) Prob[Obs. | θ] dθ.

The Bayes estimate is analogous to the case with a discrete distribution of risk types:
∑ (Mean given θ) Prob[θ] Prob[Obs. | θ] / ∑ Prob[θ] Prob[Obs. | θ].

Improper Prior Distributions:
In the course of Bayesian Estimation one will use the prior distribution in order to get the posterior distribution. In order to do so, the only thing of importance will be the relative chances of each value of the parameter or type of insured. Therefore, it is not even essential that the prior distribution, π(θ), be a true distribution that integrates to unity. For example, if instead in the Bernoulli example we had taken π(q) = q^4, 0.6 ≤ q ≤ 0.8, we would have gotten the same posterior distribution, once we normalized it so that it integrates to unity. One could generalize to a situation in which the “prior distribution” does not even have a finite integral. All that is important is that the probabilities all be non-negative. An improper prior distribution is a set of non-negative probabilities for which the sum or integral is infinite.65
64 In the case of the Bernoulli, the mean frequency is the parameter q. In the case of Negative Binomial with r fixed and β varying, the mean frequency would be rβ.
65 See Definition 15.8 in Loss Models.
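The Summary formulas translate directly into a small numerical routine. The following is only a sketch of one possible implementation (the function name bayes_estimate is mine), shown reproducing the 0.716 from the Bernoulli example:

    # Generic Bayes estimate by numerical integration of the Summary formulas
    from scipy.integrate import quad

    def bayes_estimate(prior, likelihood, cond_mean, lo, hi):
        weight = lambda t: prior(t) * likelihood(t)
        denom = quad(weight, lo, hi)[0]
        numer = quad(lambda t: cond_mean(t) * weight(t), lo, hi)[0]
        return numer / denom

    # Bernoulli example: prior 20.006 q^4 on [0.6, 0.8], 2 claims observed in 3 years
    est = bayes_estimate(lambda q: 20.006 * q**4,
                         lambda q: 3 * q**2 * (1 - q),
                         lambda q: q, 0.6, 0.8)
    print(round(est, 3))   # 0.716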
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 213 Exercise: Let severity be given by an exponential distribution with mean δ : f(x) = e-x/δ / δ. In turn, let δ have the improper prior distribution π(δ) = 1/δ , 0 < δ< ∞. One observes 5 claims of sizes: 3, 4, 6, 9 ,11. Determine the posterior distribution of δ. [Solution: The chance of the observation is the product of the densities at the observed points: e-3/δ e-4/δe-6/δe-9/δe-11/δ/ δ5 = e-33/δ / δ5. Multiplying by the prior distribution of 1/δ gives the probability weights: e-33/δ / δ6. The posterior distribution is proportional to this and therefore is an Inverse Gamma Distribution.66 Specifically, the posterior distribution of delta is a Inverse Gamma with θ = 33 (the sum of the observed claims) and α = 5 (the number of observed claims.) πδ | x(δ | x) = 335 e-33/δ / {δ6 Γ(5)}.] Exercise: Let severity be given by an exponential distribution with mean δ : f(x) = e-x/δ / δ. In turn, let δ have the improper prior distribution π(δ) = 1/δ , 0 < δ < ∞. One observes 5 claims of sizes: 3, 4, 6, 9 ,11. Estimate the future average claim size. [Solution: The posterior distribution of delta is an Inverse Gamma with θ = 33 and α = 5. The posterior average claim size is E[δ] = mean of posterior Inverse Gamma = θ / (α−1) = 33/4 = 8.25.] Credibility (Confidence) Intervals:67 Exercise: For each individual within a portfolio, the probability of a claim in a year is a Bernoulli with mean q. The prior distribution of q within the portfolio is uniform on [0, 1]. An insured is picked at random from the portfolio, and one claim was observed in one year. What is the posterior estimate that this insured has a q parameter in the interval [0.20, 0.98]? [Solution: As shown in a previous exercise, the posterior density is: 2q, for 0 ≤ q ≤ 1. The posterior chance that q is in the interval [0.20, 0.98] is the integral from 0.2 to 0.98 of the 0.98
posterior density:
∫[0.2, 0.98] 2q dq = q^2 evaluated from q = 0.2 to q = 0.98 = 0.98^2 - 0.2^2 = 0.9604 - 0.0400 = 0.92.]
Thus posterior to the observation, [0.20, 0.98] is a 92% confidence interval for q. In general, by eliminating some probability on either tail, one can use the posterior distribution to create a confidence interval of a given level. 66
One can normalize the probability weights by dividing by their integral; the integral is of the Gamma variety. Alternately, one can recognize the Inverse Gamma from the negative power of the variable multiplied by the exponential of the reciprocal of that variable. 67 Loss Models Definition 15.19 refers to what is commonly termed a confidence interval as a “credibility interval.”
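A numerical cross-check of the improper-prior exercise above (my own sketch; it relies on SciPy's quad handling the infinite upper limit, which is fine here since the integrand decays rapidly):

    # Improper prior pi(delta) = 1/delta, five Exponential claims totaling 33.
    # The un-normalized posterior weight is exp(-33/delta)/delta^6, and the Bayes
    # estimate of the future average claim size should be 33/4 = 8.25.
    from math import exp, inf
    from scipy.integrate import quad

    weight = lambda d: exp(-33.0 / d) / d**6 if d > 0 else 0.0
    denom = quad(weight, 0, inf)[0]
    numer = quad(lambda d: d * weight(d), 0, inf)[0]
    print(round(numer / denom, 2))   # 8.25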
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 214 Quite often the Normal Approximation is used to get a confidence interval. Using the Normal Approximation, the estimated mean ±1.96 times the estimated standard deviation is a 95% confidence interval. The use of the Normal Approximation is usually valid when one has a large amount of data, since the posterior distribution is usually asymptotically normal; i.e., as the volume of data increases the posterior distribution approaches a Normal.68 In general, [a, b] is a credibility interval (confidence interval) at level 1- α for a parameter, provided that the probability that the parameter is outside the interval [a, b] is less than or equal to α. In this example, for a = 0.20, b = 0.98, we have a confidence interval with α = .08, for the parameter q. Usually actuaries pick some reasonable value of α and get a reasonable interval that (approximately) covers probability of 1- α or a little more. Even if two actuaries agree on a value of alpha, they may come up with slightly different confidence intervals.69 This can be the case because they used different approximations. More fundamentally, the confidence interval is not unique. One has to specify something more than just alpha, if one wants a unique interval. For example, one might specify that the interval should be symmetric around the estimated mean. Another example would be an “equal probability interval”, which would leave half the probability on either tail. If α were 10%, then an equal probability interval [a, b] would be such that there is a 5% probability of being less than a and a 5% probability of being greater than b. Even more generally, one is not limited to intervals. One can take the union of several intervals. Such credibility sets have credibility intervals as a special case.70 So if for example, most of the probability of a parameter θ were concentrated around θ = 2 and θ = 7, then a 90% credibility set for θ might be: [1.5, 2.5] ∪ [6, 8].
68
See Theorem 15.22 of Loss Models. Besides the conditions that both the prior distribution and the model distribution (prior likelihood function) are twice differentiable in the parameter(s), there are also the conditions stated in Theorem 15.5. In particular applications it may be clear that the normal approximation is appropriate, for example, if the posterior distribution is a Gamma. Loss Models implies that the Normal Approximation would only be applied if one had difficulty calculating the exact form of the posterior distribution. In fact, the Normal Approximation is often used by actuaries even when the exact form of the posterior distribution has been calculated. 69 Fortunately , I have never found this to be of major practical importance in actuarial work. Since the choice of the confidence level 1-α is somewhat arbitrary, I never agonize over small differences in confidence intervals. 70 I have never found this to be of practical use in actuarial work.
Exercise: Let the posterior distribution of the parameter θ be given as follows:

θ:      1     1.5    2     2.5    3     4     5     6     7     8     9
f(θ):   1%   10%   27%   10%    3%    1%    3%   10%   25%    9%    1%
Find a 90% confidence set for θ. [Solution: [1.5, 2.5] ∪ [6, 8] is a 90% confidence set. While it is not the only such confidence set, it is the smallest one. For example [1, 7] also covers 90% probability.] Loss Models defines the Highest Posterior Density (HPD) credibility set for a given level 1- α, as that credibility set among all possible credibility sets for a given posterior density and for the given level 1- α, such that the smallest value the posterior density takes in the HPD credibility set is as large as possible.71 In the above exercise, the smallest value of the posterior density in [1.5, 2.5] ∪ [6, 8] is 9%, while the smallest value of the posterior density in [1, 7] is 1%. So we know that [1, 7] is not the HPD 90% credibility set for this posterior density. You can confirm that [1.5, 2.5] ∪ [6, 8] is the HPD 90% credibility set for this posterior density.72 This concept applies equally well to either discrete or continuous densities. The way one would construct an HPD credibility set is to start with the places where the density was largest. Then keep adding places where the density is a little lower, until the sum of the probability covered is the desired amount. In the exercise above, one would start with places where the density is at least 27%, then keep lowering that to 25%, 10%, etc. until one covered at least 90% probability. The HPD 90% credibility set in the exercise turns out to consist of those places where the density is at least 9%. Exercise: Given that the posterior distribution of a parameter θ is a Normal Distribution with µ = 7 and σ = 11, construct the HPD 90% credibility set for θ. [Solution: 7 ± (1.645)(11) = (-11.095, 25.095).] In the case of the Normal Distribution the Highest Probability Density credibility set is just the usual confidence interval symmetric around the mean.73 In the above exercise, the Highest Probability Density 90% credibility set is a confidence interval of ±1.645 standard deviations around the mean. 71
See Definition 15.21 of Loss Models.
72 The sum of the probabilities at those points where the density is greater than 9% is only 82%. Thus any set that covers 90% of the probability will have to include a point where the density is less than or equal to 9%.
73 This follows from the fact that the Normal is symmetric and unimodal.
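The construction of the HPD credibility set described above, adding the points of highest posterior density until the desired probability is covered, can be sketched in a few lines of Python (my own illustration of the idea, not code from Loss Models):

    # HPD 90% credibility set for the discrete posterior in the exercise
    thetas = [1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9]
    probs  = [0.01, 0.10, 0.27, 0.10, 0.03, 0.01, 0.03, 0.10, 0.25, 0.09, 0.01]

    pairs = sorted(zip(probs, thetas), reverse=True)   # largest density first
    covered, hpd = 0.0, []
    for p, t in pairs:
        if covered >= 0.90:
            break
        hpd.append(t)
        covered += p

    print(sorted(hpd), round(covered, 2))
    # [1.5, 2, 2.5, 6, 7, 8]  0.91  -- the points where the density is at least 9%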
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 216 If the posterior distribution is continuous and unimodal (has one mode), then the smallest interval of a given confidence level is [a, b] such that the value of the posterior density at a and b are the same and such that [a, b] covers the desired amount of probability.74 Exercise: Let the posterior distribution for the parameter q be given by qe-q , 0< q. Write down the equations that need to be solved (numerically) in order to find the smallest 80% confidence interval for q. [Solution: Let the desired interval be [a, b]. Then in order to cover 80% probability b
0.8 = ∫[a, b] q e^(-q) dq = -(1 + q) e^(-q) evaluated from q = a to q = b = (1 + a)e^(-a) - (1 + b)e^(-b).
Also for the shortest interval, the density function at a and b need to be equal: ae-a = be-b. ] By solving these equations numerically, it turns out that a = 0.1673 and b = 3.08029. Thus the interval [.1673, 3.08029] is the smallest 80% confidence interval for this posterior distribution (which is a Gamma distribution with α = 2 and θ = 1.) While the equations need to be solved numerically, the concept is relatively simple when put in graphical terms. The density is shown below. Also shown are vertical lines at 0.1673 and 3.08029. We note that the density has the same value at 0.1673 and 3.08029; the vertical lines are the same height.75 The area under the density between these two lines is 80%.76
[Figure: the posterior density q e^(-q) for q from 0 to 6, with vertical lines of equal height at 0.1673 and 3.08029.]

74 Theorem 15.20 in Loss Models. Example 15.20 in Loss Models shows how to set up equations for a and b that have to be solved numerically.
75 0.1673 e^(-0.1673) = 0.1415 = 3.08029 e^(-3.08029).
76 (1 + 0.1673)e^(-0.1673) - (1 + 3.08029)e^(-3.08029) = 0.9875 - 0.1875 = 0.8000. Since the posterior density is a Gamma Distribution, this area can also be put in terms of Incomplete Gamma Functions: Γ[2; 3.08029] - Γ[2; 0.1673] = 0.8125 - 0.0125 = 0.8000. (Note that 0.8125 = 1 - 0.1875 and 0.0125 = 1 - 0.9875.)
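The two equations for the smallest 80% interval can be solved numerically with a root finder. A possible sketch (mine; it assumes scipy.optimize.fsolve and a reasonable starting guess):

    # Shortest 80% interval for the posterior q e^(-q): equal density at the endpoints
    # and 80% probability between them.
    from math import exp
    from scipy.optimize import fsolve

    def equations(v):
        a, b = v
        coverage = (1 + a) * exp(-a) - (1 + b) * exp(-b)
        equal_height = a * exp(-a) - b * exp(-b)
        return [coverage - 0.8, equal_height]

    a, b = fsolve(equations, [0.2, 3.0])
    print(round(a, 4), round(b, 5))   # about 0.1673  3.08029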
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 217 In order to solve graphically for the shortest 80% confidence interval, one picks a height, sees where the density achieves that height, draws the corresponding vertical lines, and checks the area under the curve between the two vertical lines. If the area is larger than the desired confidence level, in this case 80%, then increase the selected height and try again. If the area is smaller than the desired confidence level, then decrease the selected height and try again. Eventually youʼll find an interval that covers the desired level of confidence. We note that usually the shortest confidence interval has different probabilities outside on either tail. In this case, there is 1.25% outside on the left and 18.75% outside on the right. Exercise: Let the posterior distribution for the parameter q be given by qe-q , 0< q. Write down the equations that need to be solved (numerically) in order to find the equal probability 80% confidence interval for q, the confidence interval that places 10% outside at each end. [Solution: Let the desired interval be [a,b]. Then in order to have 10% probability outside at each tail, F(a) = 1 - (1+a)e-a = .1 and F(b) = 1 - (1+b)e-b = .9. Note that since the posterior distribution is a Gamma Distribution with alpha = 2 and theta = 1, one can also write these equations in terms of Incomplete Gamma Functions as: Γ[2; a] = 0.1 and Γ[2; b] = 0.9.] By solving these equations numerically, it turns out that a = 0.531812 and b = 3.88972. Thus the interval [0.531812, 3.88972] is 80% confidence interval for q for this posterior distribution, that places 10% outside at each tail. While the equations need to be solved numerically, the concept is relatively simple when put in graphical terms. The density is shown below. Also shown are vertical lines at 0.531812 and 3.88972. We note that the areas outside the vertical lines but under the curve are the same; they are each 10%. The area under the density between these two lines is 80%.
[Figure: the posterior density q e^(-q) for q from 0 to 6, with vertical lines at 0.531812 and 3.88972; 10% of the probability lies outside each line.]
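The equal probability interval can be obtained directly from the inverse of the Incomplete Gamma Function. A brief sketch of my own, using scipy.special.gammaincinv:

    # Equal probability 80% interval: 10% in each tail of the Gamma(alpha = 2, theta = 1)
    from scipy.special import gammaincinv

    a = gammaincinv(2, 0.1)   # solves Gamma[2; a] = 0.1
    b = gammaincinv(2, 0.9)   # solves Gamma[2; b] = 0.9
    print(round(float(a), 6), round(float(b), 5))   # about 0.531812  3.88972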
The “equal probability” interval is longer than the smallest interval determined previously. One could also determine an 80% confidence interval using the Normal Approximation.
Exercise: Let the posterior distribution for the parameter q be given by q e^(-q), 0 < q. Use the Normal Approximation to determine an 80% confidence interval for q. Note that the posterior distribution is a Gamma Distribution with α = 2 and θ = 1.
[Solution: The Gamma Distribution has a mean of αθ = 2 and a variance of αθ^2 = 2. For an 80% confidence interval we want ± 1.282 standard deviations, since Φ(1.282) = (1 + 0.8)/2 = 0.9. Thus the desired interval is 2 ± 1.282√2 = [0.187, 3.813].]
The interval determined by the Normal Approximation differs from the other two previously determined. This interval of [0.187, 3.813] actually covers a probability of Γ[2; 3.813] - Γ[2; 0.187] = 0.8937 - 0.0155 = 87.8%, which is more than the required 80%.77 This interval is shown below:
[Figure: the posterior density q e^(-q) for q from 0 to 6, with vertical lines at 0.187 and 3.813, the interval from the Normal Approximation.]

77 That is why it is called the Normal approximation.
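The 87.8% coverage of the Normal Approximation interval can be confirmed with the Incomplete Gamma Function; a two-line sketch of mine:

    from scipy.special import gammainc
    coverage = gammainc(2, 3.813) - gammainc(2, 0.187)
    print(round(float(coverage), 3))   # about 0.878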
An Exponential Example:78
Let us assume that severity follows an Exponential Distribution with hazard rate λ, f(x | λ) = λ e^(-λx), with λ varying across the portfolio. The prior distribution of the parameter λ has probability density function:
π(λ) = 2 for 0.1 ≤ λ ≤ 0.3, and π(λ) = 3 for 0.3 < λ ≤ 0.5.
Exercise: Verify that π(λ) integrates to one over its support.
[Solution: (2)(0.3 - 0.1) + (3)(0.5 - 0.3) = 1.]
Exercise: What is the a priori mean severity?
[Solution: E[X | λ] = 1/λ. Thus we integrate 1/λ versus the prior density of lambda:
∫[0.1, 0.3] 2/λ dλ + ∫[0.3, 0.5] 3/λ dλ = 2 ln[3] + 3 ln[5/3] = 3.73.]
Exercise: What is the prior probability that lambda is between 0.2 and 0.35?
[Solution: ∫[0.2, 0.3] 2 dλ + ∫[0.3, 0.35] 3 dλ = (2)(0.1) + (3)(0.05) = 35%.]
Some of the calculations in this example are longer than what you should get on your exam. Concentrate on the concepts and then do some (more) of my questions or past exam questions in this section.
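For readers who want to check the piecewise-prior arithmetic numerically, here is a rough Python sketch of my own (the points argument just tells quad where the prior density jumps):

    # A priori mean severity and prior probability that lambda is in [0.2, 0.35]
    from scipy.integrate import quad

    prior = lambda lam: 2.0 if lam <= 0.3 else 3.0

    mean_sev = quad(lambda lam: prior(lam) / lam, 0.1, 0.5, points=[0.3])[0]
    prob     = quad(prior, 0.2, 0.35, points=[0.3])[0]
    print(round(mean_sev, 2), round(prob, 2))   # 3.73  0.35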
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 220 Exercise: What is the marginal (prior mixed) distribution? Hint:
∫ y e- cy dy = -y e-cy / c - e-cy / c2.
[Solution: f(x | λ) = λ e-xλ. 0.5
The marginal distribution is:
∫0.1 π[λ] λ e- xλ dλ .
0.3
∫0.12λ
λ = 0.3
e- xλ
dλ = (2) { -λ
e- xλ
/ 5 -
e- xλ
]
/ 25
}=
λ = 0.1
(2) {0.1 e-0.1x / x - 0.3 e-0.3x / x + e-0.1x / x2 - e-0.3x / x2 } . 0.5
∫ 0.3
λ = 0.5
3λ e - xλ dλ = (3) { -λ e- xλ / x - e- xλ / x2
]
}=
λ = 0.3
(3) {0.3 e-0.3x / x - 0.5 e-0.5x / x + e-0.3x / x2 - e-0.5x / x2 }. 0.3
The marginal distribution is:
∫0.12λ
0.5
e- xλ
dλ +
∫0.3 3λ e - x λ dλ =
0.2 e-0.1x / x + 2e-0.1x / x2 + 0.3 e-0.3x / x + e-0.3x / x2 - 1.5 e-0.5x / x - 3e-0.5x / x2 .] A graph of the marginal (prior mixed) density: density 0.30 0.25 0.20 0.15 0.10 0.05 2
4
6
8
10
x
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 221 One can use the marginal distribution in order to for example determine the probability that the first claim observed will be in a certain interval. For example, the probability that the first claim will be in the interval from 6 to 8 is the integral from 6 to 8 of the density of the marginal distribution. Using a computer, the probability that the first claim will be in the interval from 6 to 8 is 6.88%. Exercise: A single claim of size 5 is observed from an insured. What is the posterior distribution of lambda for that insured? Hint:
∫ y e- cy dy = -y e-cy / c - e-cy / c2.
[Solution: The chance of the observation given lambda is: f(5 | λ) = λ e-5λ. Thus the numerator of Bayes Theorem, the probability weight, is π(λ) f(5 | λ) : 2λ e-5λ for 0.1 ≤ λ ≤ 0.3, and 3λ e-5λ for 0.3 < λ ≤ 0.5. 0.3
∫0.12λ
λ = 0.3
e- 5λ
dλ = (2) { -λ
e- 5λ
/ 5 -
e- 5λ
/ 25
]
}=
λ = 0.1
(2) {0.1 e-0.5 / 5 - 0.3 e-1.5 / 5 + e-0.5 / 25 - e-1.5 / 25} = 0.02816. 0.5
∫ 0.3
λ = 0.5
3λ e - 5λ dλ = (3) { -λ e- 5λ / 5 - e- 5λ / 25
]
}=
λ = 0.3
(3) {0.3 e-1.5 / 5 - 0.5 e-2.5 / 5 + e-1.5 / 25 - e-2.5 / 25} = 0.03246. 0.3
Thus the denominator of Bayes Theorem is:
∫0.12λ
0.5
e- 5λ
dλ +
∫0.3 3λ e - 5λ dλ = 0.06062.
Dividing the numerator and denominator of Bayes Theorem, the density of the posterior distribution of lambda is: 33 λ e-5λ for 0.1 ≤ λ ≤ 0.3, and 49.5 λ e-5λ for 0.3 < λ ≤ 0.5. Comment: For a draw from a continuous distribution such as an Exponential, we use the density as the chance of the observation. The hint involves a Gamma type integral you should know how to do for your exam. One can do this integral by parts or just remember the result. ∞
Mean of an Exponential with hazard rate c is:
∞
∫0 x c e- cx dx = c ∫0 x e- cx dx = (c) (1/c2) = 1/c. ]
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 222 The density of lambda, posterior to observing one claim of size 5: density 3.0 2.5 2.0 1.5 1.0 0.5 0.1
0.2
0.3
0.4
0.5
lambda
Exercise: What is the expected value of the next claim from the same insured? Hint:
∫ y e- cy dy = -y e-cy / c - e-cy / c2.
[Solution: E[X | λ] = 1/λ. Thus we integrate 1/λ versus the posterior density of lambda. 0.3
33
0.5
∫0.1e- 5λ dλ + 49.5 0.3∫ e- 5λ dλ = (33)(e-0.5/5 - e-1.5/5) + (49.5)(e-1.5/5 - e-2.5/5) = 3.93.
Comment: Differs somewhat from the a priori mean severity of 3.73.] Exercise: What is the posterior probability that lambda is between 0.2 and 0.35? 0.3
[Solution: 33
0.35
∫0.2 λ e- 5λ dλ + 49.5 0.3∫ λ e- 5λ dλ = λ = 0.3
(33) { -λ
e- 5λ
/ 5 -
e- 5λ
/ 25
]
λ = 0.35
} + (49.5) { -λ e- 5λ / 5 - e- 5λ / 25
λ = 0.2
]
}=
λ = 0.3
(33) {0.2 e-1 / 5 - 0.3 e-1.5 / 5 + e-1 / 25 - e-1.5 / 25} + (49.5) {0.3 e-1.5 / 5 - 0.35 e-1.75 / 5 + e-1.5 / 25 - e-1.75 / 25} = 0.23487 + 0.158295 = 39.3%. Comment: This Bayes interval estimate differs from the a priori probability of 35%.]
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 223 Exercise: A single claim of size 5 is observed from an insured. What is the predictive (posterior mixed) distribution? Hint:
∫ y2 e- cy dy = -y2 e-cy / c - 2y e-cy / c2 - 2e-cy / c3.
[Solution: f(x | λ) = λ e-xλ. From a previous solution, the posterior density of lambda is: 33 λ e-5λ for 0.1 ≤ λ ≤ 0.3, and 49.5 λ e-5λ for 0.3 < λ ≤ 0.5. The density of the predictive distribution is the integral of f(x | λ) times the posterior density of λ. 0.3
0.3
∫0.1λ e- xλ 33 λ e- 5λ dλ = 33 0.1∫ λ2 e- (x+ 5)λ dλ =
(33) { -λ
2
λ = 0.3
e- (x + 5)λ
/ (x + 5)
- 2λ
e- (x+ 5)λ
/ (x +
5)2
-
2e-(x+ 5)λ
/ (x +
]
5)3
}=
λ = 0.1
(33) {0.01 e-0.1(x+5) / (x+5) - 0.09 e-0.3(x+5) / (x+5) + 0.2e-0.1(x+5) / (x+5)2 - 0.6e-0.3(x+5) / (x+5)2 + 2e-0.1(x+5) / (x+5)3 - 2e-0.3(x+5) / (x+5)3 } . 0.5
49.5
∫0.3 λ 2 e- (x + 5)λ dλ =
(49.5) { -λ 2 e- (x + 5)λ / (x + 5)
λ = 0.5
- 2λ e- (x+ 5)λ / (x + 5)2 - 2e-(x+ 5)λ / (x + 5)3
]
}=
λ = 0.3
(49.5) {0.09 e-0.3(x+5) / (x+5) - 0.25 e-0.5(x+5) / (x+5) + 0.6e-0.3(x+5) / (x+5)2 - e-0.5(x+5) / (x+5)2 + 2e-0.3(x+5) / (x+5)3 - 2e-0.5(x+5) / (x+5)3 } . Thus the density of the predictive distribution is: 0.33 e-0.1(x+5) / (x+5) + 6.6e-0.1(x+5) / (x+5)2 + 66e-0.1(x+5) / (x+5)3 + 1.485 e-0.3(x+5) / (x+5) + 9.9 e-0.3(x+5) / (x+5)2 + 33 e-0.3(x+5) / (x+5)3 - 12.375 e-0.5(x+5) / (x+5) - 49.5e-0.5(x+5) / (x+5)2 - 99 e-0.5(x+5) / (x+5)3 . ] One can use the predictive distribution in order to for example determine the probability that the next claim observed from this same insured will be in a certain interval. For example, the probability that the next claim will be in the interval from 6 to 8 is the integral from 6 to 8 of the density of the predictive distribution. Using a computer, the probability that the next claim will be in the interval from 6 to 8 is 7.26%. This compares to the a priori probability of 6.88%.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 224 A graph of the predictive (posterior mixed) density after observing a single claim of size 5: density 0.30 0.25 0.20 0.15 0.10 0.05 2
4
6
8
10
x
With only one observation, the predictive distribution is very similar to the marginal distribution; however, here is a comparison of their righthand tails: density 0.016 0.014 0.012 0.010 0.008
Predictive Marginal
0.006 0.004 12
14
16
18
20
x
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 225 Exercise: Assume instead that two claims were observed from a single insured. The two observed claims are of sizes in that order 5 and 10. What is the posterior distribution of lambda for that insured? Hint:
∫ y2 e- cy dy = -y2 e-cy / c - 2y e-cy / c2 - 2e-cy / c3.
[Solution: The chance of the observation given lambda is: f(5 | λ) f(10 | λ) = λ2 e-15λ. Thus the numerator of Bayes Theorem, the probability weight, is π(λ) λ2 e-15λ : 2λ 2 e-15λ for 0.1 ≤ λ ≤ 0.3, and 3λ2 e-15λ for 0.3 < λ ≤ 0.5. 0.3
∫ 0.1
2λ2
e- 5λ
dλ = (2) { -λ
2
λ = 0.3
e-15λ
/ 15 - 2λ
e-15λ
/ 225 -
2e-15λ
/ 3375
]
}=
λ = 0.1
(2) {0.01e-1.5 / 15 - 0.09e-4.5 / 15 + 0.2e-1.5 / 225 - 0.6e-4.5 / 225 + 2e-1.5 / 3375 - 2e-4.5 / 3375} = 0.0007529. λ = 0.5
0.5
∫ 0.3
3λ 2
e- 5λ
dλ = (3) { -λ
2
e-15λ
/ 15 - 2λ
e-15λ
/ 225 -
2e-15λ
/ 3375
]
}=
λ = 0.3
(3) {0.09e-4.5 / 15 - 0.25e-7.5 / 15 + 0.6e-4.5 / 225 - 1e-7.5 / 225 + 2e-4.5 / 3375 - 2e-7.5 / 3375} = 0.0002726. 0.3
The denominator of Bayes Theorem is:
0.5
∫0.12λ2 e- 5λ dλ + 0.3∫ 3λ 2 e- 5λ dλ = 0.0010255.
Thus the density of the posterior distribution of lambda is: 1950 λ2 e-15λ for 0.1 ≤ λ ≤ 0.3, and 2925 λ2 e-15λ for 0.3 < λ ≤ 0.5. Comment: If the observed claims were 5 and 10 in either order, then the chance of the observation would have been multiplied by two. However, this 2 would have canceled in the numerator and denominator of Bayes Theorem, resulting in the same posterior distribution of lambda.]
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 226 The density of lambda, posterior to observing two claims of sizes 5 and 10: density
4
3
2
1
0.1
0.2
0.3
0.4
0.5
Exercise: The two observed claims are of sizes in that order 5 and 10. What is the expected value of the next claim from the same insured? Hint:
∫ y e- cy dy = -y e-cy / c - e-cy / c2.
[Solution: E[X | λ] = 1/λ. Thus we integrate 1/λ versus the posterior density of lambda. 0.3
1950
0.5
∫0.1λ e-15 λ dλ + 2925 0.3∫ λ e-15λ dλ =
(1950) (0.1e-1.5/15 + e-1.5/152 - 0.3e-4.5/15 + e-4.5/152 ) + (2925) (0.3e-4.5/15 + e-4.5/152 - 0.5e-7.5/15 + e-7.5/152 ) = 5.04. Comment: Differs from the a priori mean severity of 3.73.]
lambda
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 227 Exercise: The two observed claims are of sizes in that order 5 and 10. What is the posterior probability that lambda is between 0.2 and 0.35? Hint:
∫ y2 e- cy dy = -y2 e-cy / c - 2y e-cy / c2 - 2e-cy / c3. 0.3
[Solution: 1950
0.35
∫0.2 λ 2 e-15λ dλ + 2925 0.3∫ λ2 e-15λ dλ =
(1950) {0.22 e-3 / 15 - 0.32 e-4.5 / 15 + 0.4 e-3 / 152 - 0.6 e-4.5 / 152 + 2e-3 / 153 - 2e-4.5 / 153 } + (2925){0.32 e-4.5/15 - 0.352 e-5.25/15 + 0.6e-4.5/152 - 0.7e-5.25/152 + 2e-4.5/153 - 2e-5.25/153 } = 40.7%. Comment: This Bayes interval estimate differs from the a priori probability of 35%.]
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 228 Exercise: The two observed claims are of sizes in that order 5 and 10. What is the predictive (posterior mixed) distribution? Hint:
∫ y3 e- cy dy =-y3 e-cy / c - 3y2 e-cy / c2 - 6y e-cy / c3 - 6e-cy / c4.
[Solution: f(x | λ) = λ e-xλ. From a previous solution, the posterior density of lambda is: 1950 λ2 e-15λ for 0.1 ≤ λ ≤ 0.3, and 2925 λ2 e-15λ for 0.3 < λ ≤ 0.5. The predictive distribution is the integral of f(x | λ) times the posterior density of λ. 0.3
0.3
∫0.1λ e- xλ 1950 λ2 e- 15λ dλ = 1950 0.1∫ λ3 e- (x + 15)λ dλ =
(1950) {0.001e-0.1(x+15) / (x+15) - 0.027 e-0.3(x+15) / (x+15) + 0.03e-0.1(x+15) / (x+15)2 - 0.27e-0.3(x+15) / (x+15)2 + 0.6e-0.1(x+15) / (x+15)3 - 1.8e-0.3(x+15) / (x+15)3 + 6e-0.1(x+15) / (x+15)4 - 6e-0.3(x+15) / (x+15)4 } . 0.5
2925
∫0.3 λ 3 e- (x+ 15)λ dλ =
(2925) {0.027 e-0.3(x+15) / (x+15) - 0.125 e-0.5(x+15) / (x+15) + 0.27e-0.3(x+15) / (x+15)2 - 0.75e-0.5(x+15) / (x+15)2 + 1.8e-0.3(x+15) / (x+15)3 - 3e-0.5(x+15) / (x+15)3 + 6e-0.3(x+15) / (x+15)4 - 6e-0.5(x+15) / (x+15)4 } . Thus the predictive distribution is: 1.95 e-0.1(x+15) / (x+15) + 58.5 e-0.1(x+15) / (x+15)2 + 1170 e-0.1(x+15) / (x+15)3 + 11,700 e-0.1(x+15) / (x+15)4 + 26.325 e-0.3(x+15) / (x+15) + 263.25 e-0.3(x+15) / (x+15)2 + 1755 e-0.3(x+15) / (x+15)3 + 5850 e-0.3(x+15) / (x+15)4 - 365.625 e-0.5(x+15) / (x+15) - 2193.75 e-0.5(x+15) / (x+15)2 - 8775 e-0.5(x+15) / (x+15)3 - 17,550 e-0.5(x+15) / (x+15)4 . ] The probability that the next claim will be in the interval from 6 to 8 is the integral from 6 to 8 of the density of the predictive distribution. Using a computer, the probability that the next claim will be in the interval from 6 to 8 is 8.71%. This compares to the a priori probability of 6.88%.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 229 A graph of the predictive (posterior mixed) density after observing two claims of sizes 5 and 10: density
0.20
0.15
0.10
0.05
2
4
6
8
10
x
Here is a comparison of this predictive distribution and the marginal distribution: density 0.30 0.25
Marginal
0.20 0.15 0.10 Predictive 0.05
2
4
6
8
10
12
14
x
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 230 Problems: Use the following information for the next two questions:
•
The probability of y successes in m trials is given by a Binomial distribution with parameters m and q.
•
The prior distribution of q is uniform on [0,1].
•
Two successes were observed in three trials.
6.1 (3 points) What is the Bayesian estimate for the probability that the unknown parameter q is in the interval [0.5, 0.6]? A. Less than 0.15 B. At least 0.15, but less than 0.16 C. At least 0.16, but less than 0.17 D. At least 0.17, but less than 0.18 E. 0.18 or more 6.2 (2 points) What is the probability that a success will occur on the fourth trial? A. 0.3
B. 0.4
C. 0.5
D. 0.6
E. 0.7
6.3 (3 points) For a group of insureds, you are given: (i) The amount of a claim is uniformly distributed from 0 to ω. (ii) The prior distribution of ω is a Single Parameter Pareto with α = 2 and θ = 10. (iii) Four independent claims are observed of sizes: 4, 5, 7, and 13 . Determine the probability that the next claim will exceed 15. (A) 5% (B) 6% (C) 7% (D) 8% (E) 9% 6.4 (2 points) Use the following information: (i) The number of days per hospital stay for patients at an individual hospital follows a zero-truncated negative binomial distribution with parameters r = 2 and β. (ii) β varies between different hospitals. (ii) The assumed prior distribution of β is: π[β] = 2310 β6 / (1 + β)12, 0 < β < ∞. At Mercy Hospital you observe 100 hospital stays that total 450 days. For Mercy Hospital, determine the density of the posterior distribution of β up to a proportionality constant.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 231 Use the following information for the next four questions: • In a large portfolio of risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean q. • The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. The number of claims for one policyholder is independent of the number of claims for any other policyholder. • The distribution of q within the portfolio has density function: 400q , 0 < q ≤ 0.05 ⎧ f(q) = ⎨ . ⎩40 - 400q , 0.05 < q < 0.10 • A policyholder Phillip DeTanque is selected at random from the portfolio. 6.5. (1 point) Prior to any observations, what is the probability that Phil has a Bernoulli parameter in the interval [0.03, 0.04]? A. 10% B. 11% C. 12% D. 13% E. 14% 6.6. (2 points) During Year 1, Phil has one claim. What is the probability that Phil has a Bernoulli parameter in the interval [0.03, 0.04]? A. 10% B. 11% C. 12% D. 13% E. 14% 6.7. (2 points) During Year 1, Phil has one claim. During Year 2, Phil has no claim. What is the probability that Phil has a Bernoulli parameter in the interval [0.03, 0.04]? A. less than 9.7% B. at least 9.7% but less than 10.0% C. at least 10.0% but less than 10.3% D. at least 10.3% but less than 10.6% E. at least 10.6% 6.8. (3 points) During Year 1, Phil has one claim. During Year 2, Phil has no claim. During Year 3, Phil has no claim. What is the probability that Phil has a Bernoulli parameter in the interval [0.03, 0.04]? A. less than 9.7% B. at least 9.7% but less than 10.0% C. at least 10.0% but less than 10.3% D. at least 10.3% but less than 10.6% E. at least 10.6%
6.9 (2 points) f(x ; α) is a probability density function with one parameter α. α is distributed via π(α), 0 < α < ∞. One observes: x1 , x2 , x3 ,..., xn . What is the posterior probability that α is in the interval [a, b]?
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 232 Use the following information for the next two questions:
• Each insured has its accident frequency given by a Poisson Distribution with mean λ. • λ is assumed to be distributed across the portfolio via the improper prior distribution: π(λ) = 1/λ, λ > 0.
• An insured is randomly selected from the portfolio and you observe C claims in Y years. 6.10 (2 points) What is the density of the posterior distribution of λ? A. λe −λY B. C (λY)C exp(-(λY)C) / λ C. YC+1 λ Ce−λY / C! D. C YC /(λ+Y)C+1 E. None of the above 6.11 (2 points) Using Bayesian Analysis, what is the estimated future claim frequency for this insured? A. (C-1)/Y
B. C/Y
C. (C-1)/(Y-1)
D. C/(Y-1)
E. None of A, B, C, or D
Use the following information for a group of insureds for the next four questions: (i) The amount of a claim is uniformly distributed, but will not exceed a certain unknown limit b. (ii) The prior distribution of b is: π(b) = 200/b3 , b > 10. 6.12 (2 points) Determine the probability that 25 < b < 35. (A) 0.08 (B) 0.10 (C) 0.12 (D) 0.14 (E) 0.16 6.13 (2 points) Determine the probability that the next claim will exceed 20. (A) 0.02 (B) 0.04 (C) 0.06 (D) 0.08 (E) 0.10 6.14 (2 points) From a given insured, three independent claims of sizes 17, 13, and 22 are observed in that order. For this insured, determine the posterior probability that 25 < b < 35. (A) 0.41 (B) 0.43 (C) 0.45 (D) 0.47 (E) 0.49 6.15 (2 points) From a given insured, three independent claims of sizes 17, 13, and 22 are observed in that order. Determine the probability that the next claim from this insured will exceed 20. (A) 0.24 (B) 0.26 (C) 0.28 (D) 0.30 (E) 0.32
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 233 Use the following information for the next two questions: • Losses for individual policyholders follow a Compound Poisson Process. • The prior distribution of the annual claims intensity λ is uniform on [2, 6]. • Severity is Gamma with parameters α = 3 and θ. • The prior distribution of θ has density 25e-5/θ/θ3 , θ > 0. 6.16 (3 points) An individual policyholder has 3 claims this year. What is the expected number of claims from that policyholder next year? Hint: ∫ xn e-x / n! dx = -e-x(xn /n! + xn-1/(n-1)! + ...+ x + 1). (A) 3.00
(B) 3.25
(C) 3.50
(D) 3.75
(E) 4.00
6.17 (3 points) An individual policyholder has 3 claims this year, of sizes 4, 7, and 13. What is the expected aggregate loss from that policyholder next year? ∞
Hint:
∫0 x - (α + 1) e- β / x dx = Γ(α)/βα.
(A) 27
(B) 30
(C) 33
(D) 36
(E) 39
6.18 (3 points) You are given the following: • Claim sizes for a given policyholder follow a distribution with density function f(x) = 3x2 /b3 , 0 < x < b. • The prior distribution of b is a Single Parameter Pareto Distribution with α = 3 and θ = 40. A policyholder experiences two claims of sizes 30 and 60. Determine the expected value of the next claim from this policyholder. A. 30 B. 40 C. 50 D. 60 E. 70 6.19 (4 points) Use the following information: • Claim sizes for a given policyholder follow a mixed exponential distribution with density function f(x) = 0.75λe-λx + 0.5λe-2λx, 0 < x < ∞. • The prior distribution of λ is uniform from 0.01 to 0.05. • The policyholder experiences a claim of size 60. Use Bayesian Analysis to determine the expected size of the next claim from this policyholder. A. 36 B. 37 C. 38 D. 39 E. 40
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 234 Use the following information for the next two questions: (i) Xi is the claim count observed for driver i for one year. (ii) Xi has a negative binomial distribution with parameters β = 0.5 and ri. (iii) The riʻs have an exponential distribution with mean 0.4. (iv) The size of claims follows a Pareto Distribution with α = 3 and θ = 1000. 6.20 (4 points) An individual driver is observed to have 2 claims in one year. Use Bayesian Analysis to estimate this driverʼs future annual claim frequency. (A) 0.33 (B) 0.35 (C) 0.37 (D) 0.39 (E) 0.41 6.21 (2 points) An individual driver is observed to have 2 claims in one year, of sizes 1500 and 800. Use Bayesian Analysis to estimate this driverʼs future annual aggregate loss. (A) 190 (B) 210 (C) 230 (D) 250 (E) 270 Use the following information for the next two questions: (i) Xi is the claim count observed for insured i for one year. (ii) Xi has a negative binomial distribution with parameters r = 2 and βi. (iii) The βiʼs have a distribution π[β] = 280β4 / (1 + β)9 , 0 < β < ∞. 6.22 (3 points) What is the mean annual claim frequency? A. 1.67 B. 2 C. 2.5 D. 3 E. 3.33 6.23 (3 points) An insured has 8 claims in one year. What is that insuredʼs expected future annual claim frequency? A. 4.8 B. 5.0 C. 5.2 D. 5.4 E. 5.6
6.24 (4 points) Use the following information: • Students are given standardized tests on different subjects. • The scores of students on each test are Normally distributed with mean 65. • However, the variance of each of these Normal Distributions, v, differs between the test. • v is assumed to be distributed across the portfolio via the improper prior distribution: π(v) = 1/v, v > 0. • A test on a new subject is administered to 80 students. • The mean of these scores is 66. • The second moment of these scores is 4400. Using Bayes Analysis, estimate the variance of this test. A. 90 B. 95 C. 100 D. 105 E. 110
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 235 Use the following information for the next two questions: • Claim sizes for a given policyholder follow an exponential distribution with density function f(x) = λe-λx , 0 < x < ∞. • The prior distribution of λ is uniform from 0.02 to 0.10. • The policyholder experiences a claim of size y. 6.25 (3 points) If y = 60, use Bayesian Analysis to determine the expected size of the next claim from this policyholder. A. 24 B. 26 C. 28 D. 30 E. 32 6.26 (3 points) Determine the limit as y approaches infinity of the expected size of the next claim from this policyholder. A. 20 B. 30 C. 40 D. 50 E. ∞
Use the following information for the next three questions: (i) The prior distribution of the parameter Θ has probability density function: π(θ) = 1/θ2, 1 < θ < ∞. (ii) Given Θ = θ, claim sizes follow a Pareto distribution with parameters α = 3 and θ. A claim of 5 is observed. 6.27 (3 points) Calculate the posterior probability that Θ exceeds 2. (A) 0.81
(B) 0.84
(C) 0.87
(D) 0.90
(E) 0.93
6.28 (3 points) Calculate the expected value of the next claim. (A) 5.2 (B) 5.4 (C) 5.6 (D) 5.8 (E) 6.0 6.29 (2 points) Which of the following is the probability that the next claim exceeds 5? ∞
A. 162
θ3 ∫ (5 + θ)6 dθ 1
∞
B. 162
∞
θ4 C. 162 ∫ dθ 6 (5 + θ) 1 E. None of A, B, C, or D.
θ3 ∫ (5 + θ)7 dθ 1
∞
D. 162
θ4
∫ (5 + θ)7 1
dθ
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 236 6.30 (2 points) For each insured, the probability of event A in each year is p. p is constant for each insured. Each year is independent of any other. Over a portfolio of insureds, p is uniformly distributed from 0 to 10%. For a given insured, event A occurs in each of three years. In the fourth year, what is probability of observing event A for this same insured? A. 7.5% B. 8.0% C. 8.5% D. 9.0% E. 9.5% Use the following information for the next three questions: • Severity is uniform from 0 to b. • b is distributed uniformly from 10 to 30. 6.31 (1 point) What is the chance that the next loss is less than 10? A. 55% B. 60% C. 65% D. 70% E. 75% 6.32 (3 points) If you observe two losses each of size less than 10, what is the chance that the next loss is less than 10? A. 55% B. 60% C. 65% D. 70% E. 75% 6.33 (2 points) If you observe two losses each of size less than 10, what is the expected size of the next loss? A. Less than 8.0 B. At least 8.0, but less than 8.5 C. At least 8.5, but less than 9.0 D. At least 9.0, but less than 9.5 E. At least 9.5
6.34 (3 points) Use the following information:
• Each insured has its severity given by a Exponential Distribution with mean µ. • µ is assumed to be distributed across the portfolio via the improper prior distribution: π(µ) = 1/µ, µ > 0.
• An insured is randomly selected from the portfolio and you observe two losses of sizes 100 and 400. Using Bayesian Analysis, what is the estimated future average severity for this insured? A. 200
B. 250
C. 300
D. 400
E. 500
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 237 6.35 (3 points) For a portfolio of policies, you are given: (i) The annual claim amount on a policy has probability density function: f(x | θ) = 2x/θ2, 0 < x < θ. (ii) The prior distribution of θ has density function: π(θ) = 4θ3, 0 < θ < 1. (iii) A randomly selected policy had claim amount 0.9 in Year 1. Determine the Bayesian estimate of the claim amount for the selected policy in Year 2. (A) 0.444 (B) 0.500 (C) 0.622 (D) 0.634 (E) 0.667 6.36 (5 points) Use the following information: • Claim sizes follow a Gamma Distribution, with parameters α = 3 and θ. • The prior distribution of θ is assumed to be uniform on the interval (5, 10). You observe from an insured 2 claims of sizes 15 and 31. Using Bayes Analysis, what is the estimated future claim severity from this insured? You may use the following values of the incomplete Gamma Function: Γ[4, 4.6] = 0.674294.
Γ[4, 9.2] = 0.981580.
Γ[5, 4.6] = 0.486766.
Γ[5, 9.2] = 0.951420.
6.37 (5 points) You are given: (i) The size of claims on a given policy has a Pareto Distribution with parameters θ = 1000 and α. (ii) The prior distribution of α is uniform from 3 to 5. A randomly selected policy had a claim of size 800. Use Bayes Analysis to estimate the size of the next claim from this policy. Hint:
∫ x cx dx = x cx / ln[c] - cx / ln[c]2. x
Let the Exponential Integral Function be Ei[x] =
∫
-∞
Ei[ln(4/9)] = -0.30453219. Ei[4 ln(4/9)] = -0.00959173.
et dt . t
Ei[2 ln(4/9)] = -0.08359820. Ei[5 ln(4/9)] = -0.00353746.
Ei[3 ln(4/9)] = -0.02722911.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 238 6.38 (4 points) You are given: (i) The annual number of claims on a given policy has a geometric distribution with parameter β. (ii) The prior distribution of β has the density function: π(β) = α / (β + 1)(α+1), 0 < β < ∞, where α is a known constant greater than 2. A randomly selected policy had x claims in Year 1. Determine the Bayesian estimate of the number of claims for the selected policy in Year 2. 1
Hint:
∫0 y a - 1 (1- y)b - 1 dy = β(a , b) = Γ[a] Γ[b] / Γ[a+b].
(A)
1 α −1
(B)
1 (α − 1) x + α (α − 1) α
(C) x
(D)
x+1 α
(E)
x +1 α −1
Use the following information for the next two questions: • Each insured has its severity given by a Gamma Distribution with α = 5. • θ is assumed to be distributed across the portfolio via the improper prior distribution: π(θ) = 1/θ, θ > 0.
• An insured is randomly selected from the portfolio and you observe 3 losses of sizes: 10, 30, 50. 6.39 (3 points) Using Bayesian Analysis, what is the estimated future average severity for this insured? A. 26
B. 28
C. 30
D. 32
E. 34
6.40 (3 points) Use the Bayesian central limit theorem to construct a 90% credibility interval (confidence interval) for the estimate in the previous question.
6.41 (4 points) Severity is LogNormal with parameters µ and σ = 1.2. µ is uniformly distributed across the portfolio from 9 to 10. From an individual insured, you observe one claims of size 59,874. Use Bayes Analysis to estimate the size of the next claim from the same insured. Hint:
∫
Exp[bx - ax2] dx = Exp[
A. 31,000
B. 32,000
b2 ] 4a
C. 33,000
π 2ax - b {Φ[ ] - 1/2}. a 2a D. 34,000
E. 35,000
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 239 6.42 (3 points) Use the following information:
• Each insured has its annual frequency given by a Binomial Distribution with m = 5.
• q is assumed to be distributed across the portfolio via the improper prior distribution: π(q) = 1 / {q (1 - q)}, 0 < q < 1.
• An insured is randomly selected from the portfolio and you observe three years with the following numbers of claims: 3, 2, 5.
What is the estimated future average frequency for this insured?
A. 2.9   B. 3.0   C. 3.1   D. 3.2   E. 3.3
Use the following information for the next two questions:
• For each insured, claim sizes are uniformly distributed from b to b + 100. • b varies between the insureds via an Exponential Distribution with θ = 80. 6.43 (3 points) An insured is selected at random, and a claim of size 300 is observed from that insured. Determine the expected value of the next claim from this same insured. A. 280 B. 290 C. 300 D. 310 E. 320 6.44 (3 points) An insured is selected at random, and two claims of sizes 200 and 230 are observed from that insured. Determine the expected value of the next claim from this same insured. A. 200 B. 210 C. 215 D. 220 E. 230
Use the following information for the next two questions:
• Frequency for an individual is a 50-50 mixture of two Poissons with means λ and 2λ. • The prior distribution of λ is Exponential with a mean of 0.1. 6.45 (2 points) An insured is chosen at random and observed to have no claims in the first year. Estimate of the expected number of claims next year for the same insured. A. 0.09 B. 0.10 C. 0.11 D. 0.12 E. 0.13 6.46 (3 points) An insured is chosen at random and observed to have one claim in the first year. Estimate of the expected number of claims next year for the same insured. A. 0.24 B. 0.26 C. 0.28 D. 0.30 E. 0.32
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 240 6.47 (15 points) During World War II, you assume that the enemy has a total of n tanks that have serial numbers from 1 to n. You observe k enemy tanks and the maximum of their serial numbers is m. You assume n follows the improper prior discrete distribution uniform on m to infinity. (a) (2 points) For k > 1, using Bayes Analysis, determine the posterior distribution for n. ∞
Hints: For binomial coefficients, for m ≥ k > 1: the sum from i = m to ∞ of 1/C(i, k) = {k/(k - 1)} {1/C(m - 1, k - 1)}, where C(n, k) denotes the binomial coefficient n! / {k! (n - k)!}.
For a sample of size k from the numbers 1 to n, for k ≤ m ≤ n, the probability that the maximum of the sample is m is: C(m - 1, k - 1) / C(n, k).
(b) (2 points) For k > 2, determine the mean of this posterior distribution for n.
(c) (4 points) For k > 3, determine the variance of this posterior distribution. Hint: Calculate the second factorial moment.
(d) (2 points) For k > 1, determine the posterior probability that n > x, for x ≥ m.
(e) (5 points) You observe 5 enemy tanks and their maximum serial number is 20. With the aid of a computer, graph the posterior density of n, the total number of enemy tanks. Determine the mean and variance of this posterior distribution of n. With the aid of a computer, graph the posterior survival function of n.
6.48 (2, 5/83, Q.21) (1.5 points) Suppose one observation x is taken on a random variable X with density function f(x | θ) = 2x / (1 - θ^2), θ ≤ x ≤ 1. The prior density function for θ is p(θ) = 4θ(1 - θ^2), 0 < θ ≤ 1. What is E(θ | x)?
A. 2/(3x^2)   B. 8/15   C. 2x/3   D. 2/x^2   E. 2/x
6.49 (4B, 11/95, Q.5) (2 points) A number x is randomly selected from a uniform distribution on the interval [0, 1]. Three independent Bernoulli trials are performed with probability of success x on each trial. All three are successes. What is the posterior probability that x is less than 0.9? A. Less than 0.6 B. At least 0.6, but less than 0.7 C. At least 0.7, but less than 0.8 D. At least 0.8, but less than 0.9 E. At least 0.9
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 241 6.50 (4B, 11/97, Q.9) (2 points) You are given the following: • In a large portfolio of automobile risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean m/100,000, where m is the number of miles driven each and every year by the policyholder. • The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. The number of claims for one policyholder is independent of the number of claims for any other policyholder. • The distribution of m within the portfolio has density function f(m) = m/100,000,000 , 0 < m ≤ 10,000 = (20,000- m)/100,000,000, 10,000 < m < 20,000. A policyholder is selected at random from the portfolio. During Year 1, one claim is observed for this policyholder. During Year 2, no claims are observed for this policyholder. No information is available regarding the number of claims observed during Years 3 and 4. Determine the posterior probability that the selected policyholder drives less than 10,000 miles each year. Hint: Use a change of variable such as q = m/100,000. A. 1/3 B. 37/106 C. 23/54 D. 1/2 E. 14/27 6.51 (4B, 5/98, Q.8) (2 points) You are given the following: • The number of claims during one exposure period follows a Bernoulli distribution with mean q. • The prior density function of q is assumed to be f(q) = (π/2) sin(πq/2), 0 < q < 1. 1
Hint: the integral from 0 to 1 of (πq/2) sin(πq/2) dq = 2/π, and the integral from 0 to 1 of (πq^2/2) sin(πq/2) dq = 4(π - 2)/π^2.
The claims experience is observed for one exposure period and no claims are observed. Determine the posterior density function of q. A. (π/2) sin(πq/2) , 0 < q < 1 B. (πp/2) sin(πq/2) , 0 < q < 1 C. (π(1-q)/2) sin(πq/2) , 0 < q < 1 D. (π2q/4) sin(πq/2) , 0 < q < 1 E. (π2(1-q)/{2(π-2)}) sin(πq/2) , 0 < q < 1
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 242 6.52 (4B, 11/99, Q.5) (3 points) You are given the following: • Claim sizes for a given policyholder follow a distribution with density function f(x) = 2x/b2 , 0 < x < b. • The prior distribution of b has density function g(b) = 1/b2 , 1 < b < ∞. The policyholder experiences a claim of size 2. Determine the expected value of a second claim from this policyholder. A. 1 B. 3/2 C. 2 D. 3 E. ∞ 6.53 (4, 11/01, Q.14 & 2009 Sample Q.64) (2.5 points) For a group of insureds, you are given: (i) The amount of a claim is uniformly distributed, but will not exceed a certain unknown limit θ. (ii) The prior distribution of θ is π(θ) = 500/θ2, θ > 500. (iii) Two independent claims of 400 and 600 are observed. Determine the probability that the next claim will exceed 550. (A) 0.19 (B) 0.22 (C) 0.25 (D) 0.28 (E) 0.31 6.54 (2 points) Altering bullet (iii) in the previous question, 4, 11/01, Q.14, assume that instead two independent claims of 400 and 300 are observed. Determine the probability that the next claim will exceed 550. (A) 0.19 (B) 0.22 (C) 0.25 (D) 0.28 (E) 0.31
6.55 (4, 11/02, Q.21 & 2009 Sample Q. 43) (2.5 points) You are given: (i) The prior distribution of the parameter Θ has probability density function: π(θ) = 1/θ2, 1 < θ < ∞. (ii) Given Θ = θ, claim sizes follow a Pareto distribution with parameters α = 2 and θ. A claim of 3 is observed. Calculate the posterior probability that Θ exceeds 2. (A) 0.33
(B) 0.42
(C) 0.50
(D) 0.58
(E) 0.64
6.56 (2 points) Altering bullet (i) in 4, 11/02, Q.21, you are given: (i) The prior distribution of the parameter Θ has discrete probability density function: Prob[θ = 2] = 70%, Prob[θ = 4] = 30%. (ii) Given Θ = θ, claim sizes follow a Pareto distribution with parameters α = 2 and θ. A claim of 3 is observed. Calculate the posterior distribution of Θ.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 243 6.57 (4, 11/02, Q.24 & 2009 Sample Q. 45) (2.5 points) You are given: (i) The amount of a claim, X, is uniformly distributed on the interval [0, θ]. (ii) The prior distribution of θ is π(θ) = 500/θ2, θ > 500. Two claims, x1 = 400 and x2 = 600, are observed. You calculate the posterior distribution as: f(θ | x1 , x2 ) = 3(6003 /θ4), θ > 600. Calculate the Bayesian premium, E(X3 | x1 , x2 ). (A) 450
(B) 500
(C) 550
(D) 600
(E) 650
6.58 (2 points) Altering bullet (ii) in 4, 11/02, Q.24, you are given: (i) The amount of a claim, X, is uniformly distributed on the interval [0, θ]. (ii) The prior distribution of θ is: Prob[θ = 500] = 50%, Prob[θ = 1000] = 30%, Prob[θ = 2000] = 20%. Two claims, x1 = 400 and x2 = 600, are observed. Calculate the Bayesian premium, E(X3 | x1 , x2 ). A. Less than 500 B. At least 500, but less than 550 C. At least 550, but less than 600 D. At least 600, but less than 650 E. At least 650
6.59 (4, 11/03, Q.19 & 2009 Sample Q.15) (2.5 points) You are given: (i) The probability that an insured will have at least one loss during any year is p. (ii) The prior distribution for p is uniform on [0, 0.5]. (iii) An insured is observed for 8 years and has at least one loss every year. Determine the posterior probability that the insured will have at least one loss during Year 9. (A) 0.450 (B) 0.475 (C) 0.500 (D) 0.550 (E) 0.625 6.60 (4, 11/03, Q.31 & 2009 Sample Q.24) (2.5 points) You are given: (i) The probability that an insured will have exactly one claim is θ. (ii) The prior distribution of θ has probability density function: π(θ) = (3/2)
√θ , 0 < θ < 1.
A randomly chosen insured is observed to have exactly one claim. Determine the posterior probability that θ is greater than 0.60. (A) 0.54
(B) 0.58
(C) 0.63
(D) 0.67
(E) 0.72
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 244 6.61 (4, 11/04, Q.33 & 2009 Sample Q.157) (2.5 points) You are given: (i) In a portfolio of risks, each policyholder can have at most one claim per year. (ii) The probability of a claim for a policyholder during a year is q. (iii) The prior density is π(q) = q3 /0.07, 0.6 < q < 0.8. A randomly selected policyholder has one claim in Year 1 and zero claims in Year 2. For this policyholder, determine the posterior probability that 0.7 < q < 0.8. (A) Less than 0.3 (B) At least 0.3, but less than 0.4 (C) At least 0.4, but less than 0.5 (D) At least 0.5, but less than 0.6 (E) At least 0.6 6.62 (4, 11/05, Q.32 & 2009 Sample Q.242) (2.9 points) You are given: (i) In a portfolio of risks, each policyholder can have at most two claims per year. (ii) For each year, the distribution of the number of claims is: Number of Claims Probability 0 0.10 1 0.90 - q 2 q (iii) The prior density is: π(q) = q2 /0.039, 0.2 < q < 0.5. A randomly selected policyholder had two claims in Year 1 and two claims in Year 2. For this insured, determine the Bayesian estimate of the expected number of claims in Year 3. (A) Less than 1.30 (B) At least 1.30, but less than 1.40 (C) At least 1.40, but less than 1.50 (D) At least 1.50, but less than 1.60 (E) At least 1.60
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 245 Solutions to Problems: 6.1. C. Assuming a given value of q, the chance of observing two successes in three trials is 3q2 (1-q). The prior distribution of q is: g(q) = 1, 0 ≤ q ≤ 1. By Bayes Theorem, the posterior distribution of q is proportional to the product of the chance of the observation and the prior distribution: 3q2 (1-q). Thus the posterior distribution of q is proportional to q2 - q3 . (You can keep the factor of 3 and get the same result.) The integral of q2 - q3 from 0 to 1 is 1/3 - 1/4 = 1/12. Thus the posterior distribution of q is 12(q2 - q3 ). (The integral of the posterior distribution has to be unity. In this case dividing by 1/12; i.e., multiplying by 12, will make it so.) The posterior chance of q in [.5, .6] is: .6
The integral from 0.5 to 0.6 of 12(q^2 - q^3) dq = [4q^3 - 3q^4] evaluated from q = 0.5 to q = 0.6 = 0.1627.
Comment: A Beta-Bernoulli conjugate prior situation. The uniform distribution is a Beta distribution with a=1 and b=1. The posterior distribution is Beta with parameters aʼ = a + 2 = 1 + 2 = 3, and b ʻ = b + 3 - 2 = 1 + 3 - 2 = 2: {(4!) /((2!)(1!))}q3-1 (1 - q)2-1 = 12(q2 - q3 ). See “Mahlerʼs Guide to Conjugate Priors.” 6.2. D. From the solution to the prior question, the posterior distribution of q is: 12(q2 - q3 ). The mean of this posterior distribution is: 1
The integral from 0 to 1 of q 12(q^2 - q^3) dq = [3q^4 - (12/5)q^5] evaluated from q = 0 to q = 1 = 0.6.
The chance of a success on the fourth trial is E[q] = 0.6. Comment: A Beta-Bernoulli conjugate prior situation. The uniform distribution is a Beta distribution with a=1 and b=1. The posterior mean is: (a + number of successes) / (a + b + number of trials) = (1 + 2)/(1 + 1 + 3) = 3/5 = .6. See “Mahlerʼs Guide to Conjugate Priors.”
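Not part of the original solution: the 6.1 and 6.2 results can be double-checked numerically. A minimal sketch in Python, assuming scipy is available (variable names are illustrative only):

    from scipy import integrate

    # Posterior for 6.1/6.2: proportional to q^2 (1 - q) on [0, 1]
    # (two successes in three Bernoulli trials, uniform prior on q).
    norm, _ = integrate.quad(lambda q: q**2 * (1 - q), 0.0, 1.0)      # 1/12
    post = lambda q: q**2 * (1 - q) / norm                            # 12(q^2 - q^3)
    prob, _ = integrate.quad(post, 0.5, 0.6)                          # 6.1: about 0.1627
    mean, _ = integrate.quad(lambda q: q * post(q), 0.0, 1.0)         # 6.2: 0.6
    print(prob, mean)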
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 246 6.3. B. Severity is uniform on [0, ω]. If for example, ω = 13.01, the chance of a claim of size 13 is 1/13.01. If for example, ω = 12.99, the chance of a claim of size 13 is 0. For ω ≥ 13, Prob[observation] = 24 f(4) f(5) f(7) f(13) = (24)(1/ω)(1/ω)(1/ω)(1/ω) = 24/ω3. For ω < 13, Prob[observation] = 24 f(4) f(5) f(7) f(13) = 24 f(4) f(5) f(7) 0 = 0. π(ω) = 200/ω3, ω > 10. ∞
The integral from 10 to ∞ of π(ω) Prob[observation | ω] dω = the integral from 10 to 13 of π(ω) (0) dω + the integral from 13 to ∞ of (200/ω^3)(24/ω^4) dω = 800/13^6.
By Bayes Theorem, the density of the posterior distribution of ω is: π(ω) Prob[observation | ω] / (800/13^6) = (4800/ω^7) / (800/13^6) = 6 (13^6)/ω^7, ω ≥ 13. (Recall that if ω < 13, Prob[observation] = 0.)
For ω < 15, S(15) = 0. For ω ≥ 15, S(15) = (ω - 15)/ω = 1 - 15/ω. The probability that the next claim will exceed 15 is:
the integral from 13 to ∞ of S(15) 6 (13^6)/ω^7 dω = the integral from 13 to 15 of (0) 6 (13^6)/ω^7 dω + the integral from 15 to ∞ of (1 - 15/ω) 6 (13^6)/ω^7 dω
= (13^6) times the integral from 15 to ∞ of 6/ω^7 dω - (90)(13^6) times the integral from 15 to ∞ of 1/ω^8 dω = 13^6/15^6 - (90/7) 13^6/15^7 = 6.05%.
Comment: Similar to 4, 11/01, Q.14. Note that the posterior distribution of ω is also a Single Parameter Pareto, so the Single Parameter Pareto is a Conjugate Prior to the uniform Iikelihood. In general: αʼ = α + n, and θʼ = Max[x1 , ... xn , θ]. In this case, αʼ = 2 + 4 = 6, and θʼ = Max[4, 5 ,7, 13, 10] = 13.
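As a cross-check (not part of the original solution), the 6.3 probability can be verified by numerical integration; a sketch assuming Python with scipy:

    from scipy import integrate

    # 6.3: posterior density of omega is 6 * 13^6 / omega^7 for omega >= 13.
    post = lambda w: 6 * 13**6 / w**7
    # P[next claim > 15 | omega] = 1 - 15/omega for omega >= 15, and 0 otherwise.
    prob, _ = integrate.quad(lambda w: (1 - 15 / w) * post(w), 15, float("inf"))
    print(prob)   # about 0.0605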
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 247 6.4. The density at x of the zero truncated negative binomial is proportional to: {β^x / (1+β)^(x+2)} / {1 - 1/(1+β)^2} = {β/(1+β)}^x {1 / (β(2+β))}.
Thus the chance of the observation is proportional to: {β/(1+β)}^450 {1 / (β^100 (2+β)^100)} = β^350 / {(1+β)^450 (2+β)^100}.
Thus multiplying by π[β], the posterior distribution of β is proportional to: β^356 / {(1+β)^462 (2+β)^100}, β > 0.
6.5. E. The integral from 0.03 to 0.04 of f(q) dq = the integral from 0.03 to 0.04 of 400q dq = [200q^2] evaluated from q = 0.03 to q = 0.04 = 200(0.0016 - 0.0009) = 14%.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 248 6.6. A. The chance of the observation given q is q. By Bayes Theorem, the posterior probability density function is proportional to f(q)q. In order to compute the posterior density function we need to divide by the integral of f(q)q. .1
.05
.1
∫f(q)qdq = ∫400q2 dq + ∫ 40q - 400q2 dq = 0
0
.05
q=.05
(400/3)q3
]
q=.1
+ {20q2 - (400/3)q3 } ] = .01667 + (.2 - .1333) - (.05 - .01667) = .05.
q=0
q =.05
Thus the posterior density is: 400q2 /.05 = 8000q2 for 0 < q ≤ .05, and {40q - 400q2 }/.05 = 800q - 8000q2 for .05< q ≤ .1. Then the posterior chance that q is in the interval [.03, .04] is the integral from .03 to .04 of the posterior density: .04
q = .04
∫8000q2 dq = 8000q3/3 ] = .1707 - .0720 = 9.87%. .03
q = .03
Comment: An example of a Bayesian Interval Estimate. After having observed a claim, the chance of Phil being a better than average risk has declined. For example, the chance of Philʼs expected frequency being in the interval [.03, .04] has declined from 14% to 9.9%. 6.7. C. The chance of the observation given q is q(1-q) = q - q2 . By Bayes Theorem, the posterior probability density function is proportional to f(q)(q - q2 ). In order to compute the posterior density function we need to divide by the integral of f(q)(q - q2 ). .1
.05
.1
∫f(q)(q - q2)dq = ∫{400q2 - 400q3} dq + ∫ {40q - 440q2 + 0
0
.05 q=.05
{(400/3)q3 - (100)q4 }
400q3 } dq =
] q=0
q =.1
+ {20q2 - (440/3)q3 + (100)q4 } ] = .016042 + .031042 = .047083. q =.05
Thus the posterior density is: {400q2 - 400q3 } / .047083 for 0 < q ≤ .05, and {40q - 440q2 + 400q3 } / .047083 for .05< q ≤ .1. Then the posterior chance that q is in the interval [.03, .04] is the integral from .03 to .04 of the posterior density: .04
q=.04
∫{400q2 - 400q3}/.047083 dq = {(400/3)q3 - (100)q4}/.047083 ] = .004758 /.047083 = 10.1%. .03
q=.03
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 249 6.8. D. The chance of the observation given q is q(1-q)2 = q - 2q2 + q3 . By Bayes Theorem, the posterior probability density function is proportional to f(q)(q - 2q2 + q3 ). In order to compute the posterior density we need to divide by the integral of f(q)(q - 2q2 + q3 ). .1
.05
.1
∫f(q)(q - 2q2 + q3)dq = 400∫{q2 - 2q3 + q4} dq + 40∫ {q - 12q2 + 0
0
.05
q=.05
400{q3 /3 - q4 /2 + q5 /5}
]
21q3 - 10q4 } dq =
q=.1
+ 40{q2 /2 - 4q3 + 21q4 /4 - 2q5 } ] =
q=0
q =.05
(400)(.000041667 - .000003125 + .000000063) + 40{(.005 - .004 + .000525 - .00002) - (.00125 - .0005 + .000032813 - .000000625)} = .01544 +.02891 = .04435. Thus the posterior density is: 400{q2 - 2q3 + q4 } / .04435 for 0 < q ≤ .05, and 40{q - 12q2 + 21q3 - 10q4 } / .04435 for .05< q ≤ .1. Then the posterior chance that q is in the interval [.03, .04] is the integral from .03 to .04 of the posterior density: .04
q=.04
∫
400 {q2 - 2q3 + q4 } / .04435 dq =400{q3 /3 - q4 /2 + q5 /5}/ .04435 .03
]
=
q=.03
.004590 / .04435 = 10.35%. n
6.9. The probability of the observation given α is: f(x1 ; α) f(x2 ; α)... f(xn ; α) = Π f(xi ; α). i=1 n
∞
n
Posterior density of α is: π(α)Π f(xi ; α) / ∫ π(α) Π f(xi ; α) dα. i=1
0
i=1
Prob[ a ≤ α ≤ b] = Integral of the posterior density of α, from a to b: b
n
∞
n
∫ π(α) Π f(xi ; α) dα / ∫ π(α) Π f(xi ; α) dα. a
i=1
0
i=1
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 250 6.10. E. Given lambda, the number of claims over Y years is Poisson with mean Yλ. Therefore, the chance of the observation given λ is: (Yλ)Ce−λY/C!. The posterior distribution is proportional to the product of the chance of the observation given lambda and the prior distribution of lambda: {YCλ Ce−λY/C!} π(λ) = {YCλ Ce−λY/C!} / λ, which is proportional to: λC-1e−λY, proportional to a Gamma Distribution with α = C and θ = 1/Y. Thus the posterior distribution of lambda is a Gamma Distribution with α = C and θ = 1/Y: π λ|x(λ|x) = YC λ C - 1e−λY /(C-1)!, λ > 0. Comment: After this first set of observations one now has the starting point for a Gamma-Poisson process, which can be easily updated for any additional observations of the same insured, as discussed in “Mahlerʼs Guide to Conjugate Priors.” 6.11. B. The posterior distribution of lambda is a Gamma Distribution with α = C and θ = 1/Y. Since each insureds average claim frequency is lambda, the expected future claim frequency is just the expected value of the posterior distribution of lambda. The mean of a Gamma Distribution is: θα = C/Y. Comment: The estimated future claim frequency is equal to the observed claim frequency C/Y. That is why 1/λ is referred to as the noninformative or vague (improper) prior distribution for a Poisson. 6.12. A. & 6.13. D.
35
Prob[25 < b < 35] =
35
∫ 200/b3 db = -100/b2 ] = .16 - .0816 = 0.0784.
25
25
For b ≥ 20, S(20) = (b - 20)/b = 1 - 20/b. ∞
b=∞
∫(1 - 20/b) 200/b3 db = {-100/b2 + (20)(200/3)/b3} ] = .25 - .1667 = 0.0833. 20
b = 20
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 251 6.14. B. & 6.15. A. For b ≥ 22, Prob[observation] = f(17)f(13)f(22) = (1/b)(1/b)(1/b) = 1/b3. For b < 22, Prob[observation] = f(17)f(13)f(22) = f(17)f(13)0 = 0. ∞
∞
∫π(b) Prob[observation | b] db = ∫(200/b3)(1/b3)db = (200/5)/225 = 40/225. 22
22
By Bayes Theorem, Posterior distribution of b is: π(b) Prob[observation | b]/ {40/225 } = (5)(225 )/b6, b >22. 35
Prob[25 < b < 35] =
35
∫ (5)(225)/b6 db = -(225)/b5 ] = .5277 - .0981 = 0.4296.
25
25
For b ≥ 20, S(20) = (b - 20)/b = 1 - 20/b. ∞
∫(1 - 20/b) (5)(225)/b6 db = 1 +
b=∞
(20)((5/6)(225 ))/b6
22
] = 1 - .7576 = 0.2424. b = 22
Comment: Similar to 4, 11/01, Q.14. If we did not assume a prior distribution of b, then the maximum likelihood fit of b would be the maximum of the sample, or 22. Then the estimate of S(20) would be: (22 - 20)/22 = 1/11. 6.16. D. The chance of the observation given λ is: f(3 | λ) = λ3e−λ/6. The prior density of λ is: 1/4, 2 ≤ λ ≤ 6. Therefore, the posterior distribution is proportional to: (1/4)λ3e−λ/6 ∼ λ3e−λ. 6
6
The posterior distribution of λ is: λ3e−λ / ∫λ3e−λ dλ = λ3e−λ / {(3!)(-e-x)(x3 /3! + x2 /2! + x + 1)]} = 2
2
λ 3e−λ / (6){(e-2)(6.333) - (e-6)(61)} = λ3e−λ /4.236. The mean of the posterior distribution is: 6
6
∫λ4e−λ dλ / 4.236 = {(4!)(-e-x)(x4/4! + x3/3! + x2/2! + x + 1)]}/ 4.236 = 2
(24){(e-2)(7) - (e-6)(115)}/4.236 = 15.894/4.236 = 3.75.
2
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 252 6.17. C. Since the prior distributions of λ and θ are independent, the distribution of λ posterior to observing 3 claims is that from the previous solution, with resulting estimated future frequency of 3.75. Given θ, the chance of the observed sizes of loss is: 6f(4)f(7)f(13) ∼ e−4/θθ-3 e−7/θθ-3 e−13/θθ-3 = e−24/θθ-9. Thus the posterior distribution of θ is proportional to: e−24/θθ-9e-5/θ/θ3 = e−29/θθ−12. Mean of posterior distribution of θ is: ∞
∞
∫θe−29/θθ−12dθ / ∫ e−29/θθ−12dθ = {Γ(10)/2910}/{Γ(11)/2911} = 29(9!/10!) = 2.9. 0
0
E[severity] = E[3θ] = (3)(2.9) = 8.7. Mean aggregate loss = (mean frequency)(mean severity) = (3.75)(8.7) = 32.6. Comment: The posterior distribution of θ is an Inverse Gamma with α = 11 and scale parameter 29, with mean 29/(11 - 1) = 2.9. 6.18. C. If b < 60, then we would not observe a loss of size 60. The probability of the observation is: 2f(30)f(60), for b≥ 60. 2f(30)f(60) = (2)(900/b3 )(3600/b3 ), which is proportional to: 1/b6 , b ≥ 60. Single Parameter Pareto Distribution with α = 3 and θ = 40: π(b) = (3)403 /b4 , which is proportional to: 1/b4 , b > 40. Thus the posterior distribution of b is proportional to: (1/b4 )(1/b6 ) = 1/b10, b ≥ 60. The posterior distribution of b is: ∞
(1/b10 ) / ∫ 1/b10 db = (1/b10 )/{1/((9)/609 )} = (9)609 /b10. 60
Given b, the mean severity is the integral from 0 to b of xf(x): 3b/4. Posterior expected value of this mean severity: ∞
∫(3b/4){(9)609 /b10} db = {1/((8)608)}(3/4)(9)609 = 50.6. 60
Comment: Similar to 4B, 11/99, Q.5.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 253 6.19. E. X is a 75%-25% mixture of Exponentials with means 1/λ and 1/(2λ). E[X | λ] = .75/λ + .25/(2λ) = .875/λ. The posterior density of λ is proportional to: Π(λ) f(60) = (.75λe-λ60 + .5λe-2λ60)/.04 = 18.75λe-60λ + 12.5λe-120λ, .01 ≤ λ ≤ .05. .05
∫18.75λe-60λ + 12.5λe-120λ dλ = .01 λ = .05
{18.75(-λe-60λ/60 - e-60λ/3600) + 12.5(-λe-120λ/120 - e-120λ/14400)
]
= .00409634.
λ = .01
Therefore, the posterior density is: (18.75λe-60λ + 12.5λe-120λ)/.00409634 = 4577λe-60λ + 3052λe-120λ, .01 ≤ λ ≤ .05. The expected size of the next claim from the same policyholder is: .05
.05
∫(4577λe-60λ + 3052λe-120λ) E[X | λ] dλ = ∫(4577λe-60λ + 3052λe-120λ) .875/λ dλ = .01
.01 .05
∫
.875 4577e-60λ + 3052e-120λ dλ = 39.96. .01
6.20. E. For a Negative Binomial, f(2) = (r(r+1)/2) β2/(1+β)2+r. g(r) = e-r/.4/.4. Therefore the posterior distribution is proportional to: e-2.5rr(r+1)/(1.5)r= (r + r2 )e-2.905r. ∞
∫ (r + r2)e− 2.905r dr
= Γ(2)/2.9052 + Γ(3)/2.9053 = 1/2.9052 + 2/2.9053 = .200.
0
Therefore, the posterior distribution of r is: (r + r2 )e-2.905r/.200 = 5(r + r2 )e-2.905r. ∞
E[r] = ∫ r5(r + r2 )e−2.905r dr = 5Γ(4)/2.9054 + 5Γ(3)/2.9053 = 30/2.9054 + 10/2.9053 = .829. 0
Expected future annual frequency = E[rβ] = .5E[r] = 0.414. Comment: Set up taken from 4, 5/00, Q.37. 1/(1.5)r = e-ln(1.5)r = e-.405r. ∞
∫ tα − 1 e− t / θ dt = Γ(α)θα, see “Mahlerʼs Guide to Conjugate Priors.” 0
We apply this result twice, once with α = 2 and θ = 1/2.905, and once with α = 3 and θ = 1/2.905.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 254 6.21. B. The chance of the observation is: 2(Negative Binomial @2)(Pareto @ 1500)(Pareto @ 800). However, since the Pareto densities do not involve r, they do not affect the posterior density of r. From the previous solution, the posterior density of r is 5(r + r2 )e-2.905r, and the expected future annual frequency is .414. The average severity is: 1000/(3-1) = 500. Expected future annual aggregate loss is: (500)(.414) = 207. Comment: Since the parameters of the Pareto do not vary by insured, the observed claim sizes do not affect the posterior distribution. 6.22. E. Making the change of variables, x = β/(1+β), dβ = dx/(1-x)2 : ∞
∞
∫
1
∫
1
∫
∫
E[β] = β π(β) dβ = 280 β5/(1+β)9 dβ = 280 x5 (1-x)4 dx/(1-x)2 = 280 x5 (1-x)2 dx = 0
0
0
0
(280)(Γ(6)Γ(3)/Γ(6 + 3)) = (280)(5! 2! / 8!) = 5/3. E[rβ] = (2)E[β] = (2)(5/3) = 10/3 = 3.33. Comment: By a change of variables the distribution of the parameter was converted to a Beta distribution and the integral into a Beta type integral. The Beta Distribution and Beta Type Integrals are discussed in “Mahlerʼs Guide to Conjugate Priors.” See also page 2 of the tables attached to your exam. If π[β] is proportional to βa-1/(1 + β)(a+b), then x = β/(1+β) follows a Beta Distribution with parameters a and b. E[β] = a/(b-1). For this problem a = 5 and b = 4, and a/(b-1) = 5/3. The mixed distribution is sometimes called a Generalized Waring Distribution. See Example 4.7.2 in Insurance Risk Models by Panjer and WiIlmot.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 255 6.23. C. The chance of the observation is the p.d.f. at 8 of a Negative Binomial Distribution, which given β is proportional to: β8/(1+β)8+2 = β8/(1+β)10. Therefore, the posterior distribution of β is proportional to: {β8/(1+β)11}{β4/(1 + β)9 } = β12/(1+β)19. ∞
∞
∫
E[β] = β β12/(1+β)19 dβ / 0
0
1
1
1
∫ β12/(1+β)19dβ = ∫x13 (1-x)6dx/(1-x)2 / ∫x12 (1-x)7dx/(1-x)2 0
0
1
∫
∫
= x13 (1-x)4 dx / x12 (1-x)5 dx = (Γ(14)Γ(5)/Γ(14 + 5))/(Γ(13)Γ(6)/Γ(13 +6)) = 0
0
(13! 4! / 18!)/(12! 5! / 18!) = 13/5. E[rβ] = (2)E[β] = (2)(13/5) = 26/5 = 5.2. Comment: If π[β] is proportional to βa-1/(1 + β)(a+b), and one observes C claims in one year, then the mean of the posterior distribution of β is: (a + C)/(r + b - 1). For this problem a = 5, b = 4, r = 2 and C = 8. (a + C)/(r + b - 1) = 13/5. If for fixed r, 1/(1+β) of the Negative Binomial is distributed over a portfolio by a Beta, then the posterior distribution of 1/(1+β) parameters is also given by a Beta. Thus the Beta distribution is a conjugate prior to the Negative Binomial Distribution for fixed r. Equivalently, β/(1+β) can be distributed via a different Beta Distribution. It turns out that this is an example of “exact credibility”, in which the estimate from Bayesian Analysis equals that from Buhlmann Credibility. In this case K = (b-1)/r and Z = r/(r+b-1). Other examples of exact credibility are discussed in “Mahlerʼs Guide to Conjugate Priors.”
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 256 6.24. B. A Normal Distribution with mean 65 and variance v has a density of: exp[-
(x - 65)2 1 ] . 1 / 4 2v v 2π
Thus the chance of the observation is proportional to: exp[-
1 ∑ (xi - 65)2 ] 20 . 2v v
π(v) = 1/v, v > 0. Thus the posterior distribution of v is proportional to: exp[-
∑ (xi - 65)2 1 ] 21 , v > 0. 2v v
Therefore, the posterior distribution of v is Inverse Gamma with: α + 1 = 21, and θ = ∑ (xi - 65)2 /2. ∑ (xi - 65)2 = ∑ xi2 - 130 ∑ xi + (n)(652 ) = (80)(4400) - (130)(80)(66) + (80)(652 ) = 3600. The posterior distribution of v is Inverse Gamma with α = 21 - 1 = 20, and θ = 3600/2 = 1800. The mean of this Inverse Gamma is: 1800 / (20 - 1) = 94.7. The estimate the variance of this test is 94.7. Comment: In the absence of any other information, such as a prior mean and prior distribution of v, our estimate of the variance of this test would be the sample variance of: (80/79) (4400 - 662 ) = 44.56. As usual, our estimate depends on the prior distribution used.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 257 6.25. C. The chance of the observation given λ is: λe -60λ. The prior distribution of λ is uniform from 0.02 to 0.10, with constant density 12.5. The posterior distribution of λ is proportional to: 12.5λe-60λ, 0.02 < λ < 0.1. Therefore, the posterior distribution of λ is proportional to: λe-60λ, 0.02 < λ < 0.1. .1
.1
∫
λe-60λ dλ = - λe-60λ/60 - e-60λ/3600] = .000179. .02
.02
Thus the posterior distribution of λ is: λe-60λ/.000179, .02 < λ < .1. The expected future severity is: E[1/λ] = .1
.1
.1
∫(1/λ)λe-60λ/.000179 dλ = (1/.000179)∫e-60λ dλ = (1/.000179) (-e-60λ/60)] = 27.8. .02
.02
.02
6.26. D. The chance of the observation given λ is: λe -yλ. Thus the posterior distribution of λ is proportional to: λe-yλ, .02 < λ < .1. .1
.1
∫
λe-yλ dλ = - λe-yλ/y - e-yλ/y2 ] = .02e-.02y/y - e-.02y/y2 - .1e-.1y/y - e-.1y/y2 . .02
.02
The posterior distribution of λ is: λe-yλ/{.02e-.02y/y - e-.02y/y2 - .1e-.1y/y - e-.1y/y2 }, 0.02 < λ < 0.1. The expected future severity is: .1
E[1/λ] =
∫e-yλ/{.02e-.02y/y - e-.02y/y2 - .1e-.1y/y - e-.1y/y2} dλ =
.02
{e-.02y/y - e-.1y/y }/{.02e-.02y/y - e-.02y/y2 - .1e-.1y/y - e-.1y/y2 } = {1 - e-.08y}/{.02 - 1/y - .1e-.08y - e-.08y/y}. As y goes to infinity this goes to 1/0.02 = 50. Alternately, as y gets larger it is more and more likely that the mean severity of 1/λ gets larger, or λ gets smaller. In the limit λ = 0.02, the smallest possible value. 1/λ → 50.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 258 6.27. C. π(θ) = 1/θ2, 1 < θ < ∞. f(x) = (αθα)(θ + x)−(α + 1) = 3θ3/(x + θ)4 . f(5) = 3θ3/(5 + θ)4 . Posterior distribution of θ is proportional to: π(θ)f(5) = 3θ/(5 + θ)4 , 1 < θ < ∞. ∞
∞
∫
∞
∫
∫
3θ/(5 + θ)4 dθ = 3 ( 5 + θ - 5)/(5 + θ)4 dθ = 3 1/(5 + θ)3 - 5/(5 + θ)4 dθ = 1
1
1 θ=∞
= 3{-(1/2)/(5 + θ)2 + (5/3)/(5 + θ)3 } ] = (3)(1/162) = 1/54. θ=1
Posterior distribution of θ is: {3θ/(5 + θ)4 }/(1/54) = 162θ/(5 + θ)4 , 1 < θ < ∞. ∞
∫
The posterior probability that θ exceeds 2 = 162θ/(5 + θ)4 dθ = 2 θ=∞
= (162){-(1/2)/(5 + θ)2 + (5/3)/(5 + θ)3 } ] = (162)(.005345) = 0.866. θ= 2
Comment: Similar to 4, 11/02, Q.21, however the integral here is harder. 6.28. B. The mean of the Pareto Distribution is: θ/(α−1) = θ/2. Posterior distribution of θ is: 162θ/(5 + θ)4 , 1 < θ < ∞. ∞
∞
∫
∫
The expected value of the next claim is: (θ/2)162θ/(5 + θ)4 dθ = 81 θ2/(5 + θ)4 dθ = 1
1
∞
∞
∫
∫
81 (θ2 + 10θ + 25 - 10θ - 25)/(5 + θ)4 dθ = 81 1/(5 + θ)2 - 10θ/(5 + θ)4 - 25/(5 + θ)4 dθ = 1
1
81{(1/6) -10(1/162) - 25(1/648)} = (81)(.06636) = 5.375. Alternately, one can let y = θ/(5 + θ). Then dy = 5/(5 + θ)2 . 1
∫
y=1
expected value = 81 y2/5 dy = (27/5)y3 1/6
] = 5.375.
y = 1/6
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 259 6.29. D. For the Pareto, S(x) = (θ/(θ+x))α. S(5) = θ3/(5 + θ)3 . Posterior distribution of θ is: 162θ/(5 + θ)4 , 1 < θ < ∞. Therefore, the probability that the next claim exceeds 5 is: ∞
∞
∫{θ3/(5 + θ)3}162θ/(5 + θ)4 dθ = 162∫θ4/(5 + θ)7 dθ. 1
1
Comment: Let y = θ/(5 + θ). Then 1 - y = 5/(5 + θ) and dy = 5/(5 + θ)2 . 1
y =1
∫
Probability = 162 y4(1-y)/25 dy = (162/25)(y5 /5 - y6 /6)] = 21.6%. 1/6
y = 1/6
6.30. B. The posterior distribution is proportional to: (density of p)(probability of observation given p) = (10)(p3 ). .1
∫
p 3 dp = .000025. ⇒ The posterior distribution of p is: p3 /.000025 = 40000p3 , 0 < p < 0.1. 0 .1
∫
Posterior mean of p = p 40000p3 dp = 8%. 0
6.31. A.
30
∫
Prob[x < 10 | b] = 10/b. Prob[x < 10] = (10/b) (1/20) db = (1/2)(ln30 - ln10) = 0.549. 10
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 260 6.32. C. The chance of the observation given b is: (10/b)2 . Prior density of b is: 1/20, 10 ≤ b ≤ 30.
Post. Dist. of b is proportional to: (10/b)2 (1/20) = 5/b2 .
30
∫
Post. Dist. of b = (5/b2 ) / 5/b2 db = (5/b2 )/(1/3) = 15/b2 , 10 ≤ b ≤ 30. 10 30
∫(10/b) (15/b2) db = 2/3.
Post. Prob. x < 10 = 10
6.33. B. Post. Dist. of b = 15/b2 , 10 ≤ b ≤ 30. E[X | b] = b/2. 30
∫
Posterior mean of X = (b/2) (15/b2 ) db = 8.24. 10
6.34. E. The chance of the observation is: 2(e-100/µ / µ)(e-400/µ / µ) = 2 e-500/µ / µ2. Therefore, the posterior distribution is proportional to: (1/µ) e-500/µ / µ2 = e-500/µ / µ3. Therefore, the posterior mean is:
∞
∞
0
0
∫ µ e− 500/µ / µ 3 dµ /
∫ e − 500/µ / µ3 dµ .
Making the change of variables, µ = 1/x, dµ = -dx/x2 . ∞
∞
∞
0
0
0
∫ µ e− 500/µ / µ 3 dµ =
∞
∞
0
0
∫ e − 500/µ / µ3 dµ =
∫ e − 500/µ / µ2 dµ =
∫ e − 500x dx = 1/500.
∫ e − 500x x d x = 1/5002.
Therefore, the posterior mean is: (1/500)/(1/5002 ) = 500. Comment: The posterior distribution is proportional to an Inverse Gamma Distribution with α = 2 and θ = 500. Therefore, this is the posterior distribution. Its mean is: 500/( 2 - 1) = 500. For this situation, the posterior distribution of µ is a Inverse Gamma with θ = the sum of the observed losses and α = the number of observed losses. Thus the posterior mean is: (sum of losses)/(number of losses - 1).
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 261 6.35. D. The chance of the observation is zero if θ ≤ 0.9. For θ > 0.9, the chance of the observation is: (2)(0.9)/θ2. π(θ) = 4θ3, 0 < θ < 1. Thus the posterior distribution is proportional to: θ3/θ2 = θ, for 0.9 < θ < 1. 1
∫0.9 θ dθ = (1/2)(12 - 0.92) = 0.095. Thus the posterior distribution of θ is: θ/0.095, for 0.9 < θ < 1. θ
E[X | θ] =
∫0 x 2x / θ2 dx = 2θ/3. 1
Therefore, the posterior mean is:
∫0.9 (θ / 0.095) (2θ / 3) dθ = (2/9)(13 - 0.93)/0.095 = 0.634
Comment: Setup taken from 4, 11/05, Q.7, which instead uses Buhlmann Credibility.
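Not part of the original solution: a quick numerical check of the 6.35 posterior mean, assuming Python with scipy:

    from scipy import integrate

    # 6.35: posterior of theta is proportional to theta on (0.9, 1);
    # the mean claim size given theta is 2*theta/3.
    norm, _ = integrate.quad(lambda t: t, 0.9, 1.0)                       # 0.095
    est, _ = integrate.quad(lambda t: (t / norm) * (2 * t / 3), 0.9, 1.0)
    print(est)   # about 0.634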
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 262 6.36. The chance of the observation is: f(15) f(31) = 2 Thus the probability weight is: (1/5) 2
152 e - 15 / θ 312 e - 31/ θ . 2 θ3 2 θ3
152 e - 15 / θ 312 e - 31/ θ , 5 < θ < 10. 2 θ3 2 θ3
This is proportional to: e-46/θ θ−6. The mean of the Gamma is 3θ. 10
3 Thus the estimated future severity is:
∫5 e - 4 6 / θ θ -5 dθ
10
.
∫5 e - 4 6 / θ θ -6 dθ
10
Let t = 46/θ,
9.2
∫5 e - 4 6 / θ θ- 6 dθ = (1/46)5 4.6∫ e - t t4 dt = (1/46)5 Γ(5) {Γ[5, 9.2] - Γ[5, 4.6]} =
(24/465 ) (0.951420 - 0.486766) = 5.41442 x 10-8. 10
Let t = 46/θ,
9.2
∫5 e - 4 6 / θ θ-5 dθ = (1/46)4 4.6∫ e - t t3 dt = (1/46)4 Γ(4) {Γ[4, 9.2] - Γ[4, 4.6]} =
(6/464 ) (0.981580 - 0.674294) = 4.11778 x 10-7. Estimated future severity = (3) (4.11778 x 10-7) / (5.41442 x 10-8) = 22.8. Comment: Beyond what you are likely to be asked on your exam.
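As a check on 6.36 (not part of the original solution), the same estimate can be reproduced by direct numerical integration instead of incomplete Gamma functions; a sketch assuming Python with numpy and scipy:

    import numpy as np
    from scipy import integrate

    # 6.36: Gamma severity with alpha = 3 and unknown theta, theta uniform on (5, 10);
    # observed claims of 15 and 31. Mean severity given theta is 3*theta.
    def likelihood(theta):
        f = lambda x: x**2 * np.exp(-x / theta) / (2 * theta**3)   # Gamma(3, theta) density
        return f(15) * f(31)

    num, _ = integrate.quad(lambda t: 3 * t * likelihood(t), 5, 10)
    den, _ = integrate.quad(likelihood, 5, 10)
    print(num / den)   # about 22.8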
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 263 6.37. The chance of the observation is: f(800) = α
800 α
1 = α (4/9)α. α + 1 1800 1800
The prior density of α is: 1/2, 3 < α < 5. The integral of the probability weight is: 5
∫3
1 1 1 α (4 / 9)α dα = 2 3600 1800
5
∫3 α (4 / 9)α dα =
1 {5 (4/9)5 / ln(4/9) - 3 (4/9)3 / ln(4/9) - (4/9)5 / ln(4/9)2 + (4/9)3 / ln(4/9)2 } = 0.32499 / 3600. 3600 Therefore, the posterior distribution of α is: 1 α (4/9)α / (0.32499 / 3600) = 3.077 α (4/9)α, 3 < α < 5. 3600 Given α, the mean of the Pareto is: 1000 / (α - 1). Thus the estimate of the size of the next claim from this policy is: 5
∫3
3.077 α
(4 / 9)α
1000 dα = 3077 α - 1
5
∫3
α (4 / 9)α dα . α - 1
Let x = α - 1. Then the estimate is: 4
3077
∫2
x +1 (4 / 9)x + 1 dx = (12,308/9) { x
4
4
∫2 (4 / 9)x dx + ∫2 (4 / 9)x / x dx } .
4
∫2 (4 / 9)x dx = (4/9)4 / ln(4/9) - (4/9)2 / ln(4/9) = 0.19547. 4
Letting t = x ln(4/9),
∫2
4 ln(4/ 9)
(4 / 9)x / x dx =
∫
2 ln(4/ 9)
et dt = Ei[4 ln(4/9)] - Ei[2 ln(4/9)] = t
(-0.00959173) - (-0.0835982) = 0.07401. Therefore the estimate is: (12,308/9) (0.19547 + 0.07401) = 368.5. Comment: Beyond what you should be asked on your exam.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 264 6.38. D. The chance of the observation given β is: βx / (1+β)x+1. Therefore, the posterior distribution is proportional to: βx / (1+β)x+α+2, 0 < β < ∞. The mean if each Geometric is β. ∞
Thus the posterior mean is:
∫0
β
βx βx + α + 2
∞
dβ /
∫0
β
βx βx + α + 2
dβ .
Let y = 1/(1+β). dy = dβ/(1+β)2 . 1 - y = β/(1+β). ∞
βx
1
∫0 β βx + α + 2 dβ = ∫0 yx (1- y)α dy = β(x+1 , α+1) = Γ[x+1] Γ[α+1] / Γ[x+2+α]. ∞
βx
1
∫0 β βx + α + 2 dβ = ∫0 yx + 1 (1- y)α - 1 dy = β(x+2 , α) = Γ[x+2] Γ[α] / Γ[x+2+α]. Thus the posterior mean is:
Γ[x + 2] Γ[α] / Γ[x + 2 + α] Γ[x + 2]/ Γ[x +1] x + 1 = = . Γ[x +1] Γ[α +1] / Γ[x + 2 + α] Γ[α + 1]/ Γ[α ] α
Comment: Difficult! Setup taken from 4, 5/05, Q.17, which instead uses Buhlmann Credibility. In this case, Bayes Analysis and Buhlmann Credibility produce the same answer.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 265 6.39. D. The density of the Gamma is:
exp[-x / θ] x4 θ5 4!
Thus the chance of the observation is proportional to: exp[-10 / θ] exp[-30 / θ] exp[-50 / θ] exp[-90 / θ] = . θ5 θ5 θ5 θ15 π(θ) = 1/θ, θ > 0. Therefore, the posterior distribution of θ is proportional to: exp[-90 / θ] 1 exp[-90 / θ] = , θ > 0. θ15 θ θ16 Thus the posterior distribution of the θ parameter of the severity distribution is Inverse Gamma with: α = 15, and scale parameter β = 90. The mean of this posterior Inverse Gamma is: 90 / (15 - 1) = 6.4286. Therefore, the estimate of the future mean severity is: α E[θ] = (5)(6.4286) = 32.143. Comment: If instead we had taken π(θ) = 1/θ2, θ > 0, then the posterior distribution would have been an Inverse Gamma with α = 16, and scale parameter θ = 90. The mean of this Inverse Gamma is: 90 / (16 - 1) = 6. The resulting estimate of the future mean severity would be: (5)(6) = 30 = X . Thus in this situation, π(θ) = 1/θ2, θ > 0, would be called the noninformative or diffuse prior. 6.40. From the previous solution, the posterior distribution of the θ parameter of the severity distribution is Inverse Gamma with: α = 15, and scale parameter β = 90. Thus posterior, E[θ2] =
β2 90 2 = = 44.5055. (α - 1) (α - 2) (14)(13)
E[θ] = β/(α-1) = 90 / (15 - 1) = 6.4286. Thus posterior, Var[θ] = 44.5055 - 6.42862 = 3.1786. E[5θ] = (5)(6.4286) = 32.143. Var[5θ] = (25)Var[θ] = (25)(3.1786) = 79.465. Thus using the Normal Approximation, a 90% confidence interval for the estimated mean severity is: 32.143 ± 1.645 79.465 = 32.143 ± 14.664 = [17.48, 46.81]. Comment: Similar to Exercise 15.81 in Loss Models.
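Not part of the original solution: the 6.39 and 6.40 figures follow directly from the Inverse Gamma posterior; a short sketch in Python (numpy only):

    import numpy as np

    # 6.39/6.40: posterior of theta is Inverse Gamma with alpha = 15 and scale 90.
    alpha, scale = 15, 90
    mean_theta = scale / (alpha - 1)                          # 6.4286
    var_theta = scale**2 / ((alpha - 1)**2 * (alpha - 2))     # 3.1786
    mean_sev = 5 * mean_theta                                 # 32.14
    sd_sev = 5 * np.sqrt(var_theta)
    print(mean_sev, mean_sev - 1.645 * sd_sev, mean_sev + 1.645 * sd_sev)   # about [17.5, 46.8]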
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 266 6.41. A. The chance of the observation is: f(59,874). The probability weight is: f(59,874) / 1, 9 < µ < 10. This is proportional to: exp[-{ln(59,874) - µ)2 / {(2)(1.22 )}] = exp[-(11 - µ)2 /2.88]. This is proportional to: exp[7.6391µ - 0.3472µ2]. The mean of the LogNormal is: exp[µ + 1.22 /2] = 2.0544 eµ. 10
∫ exp[7.639µ - 0.3472µ2 ] 2.0544 exp[µ] dµ
Thus the estimated future severity is:
9
.
10
∫9 exp[7.639µ - 0.3472µ2 ] dµ
10
∫9 exp[7.639µ - 0.3472µ2 ] dµ =
Exp[
7.6392 ] (4)(0.3472)
(2)(0.3472)(10) - 7.639 (2)(0.3472)(9) - 7.639 π {Φ[ ] - Φ[ ]} = 0.3472 (2)(0.3472) (2)(0.3472)
(5.3258 x 1018) {Φ[-0.83] - Φ[-1.67]} = (5.3258 x 1018) (0.2033 - 0.0475) = 8.298 x 1017. 10
10
∫9 exp[7.639µ - 0.3472µ2 ] 2.0544 exp[µ] dµ = 2.0544 ∫9 exp[8.639µ - 0.3472µ2 ] dµ .
10
∫9 exp[8.639µ - 0.3472µ2 ] dµ =
Exp[
8.6392 ] (4)(0.3472)
(2)(0.3472)(10) - 8.639 (2)(0.3472)(9) - 8.639 π {Φ[ ] - Φ[ ]} = 0.3472 (2)(0.3472) (2)(0.3472)
(6.5571 x 1023) {Φ[-2.03] - Φ[-2.87]} = (6.5571 x 1023) (0.0212 - 0.0021) = 1.2524 x 1022. Thus, the estimated future severity = ( 2.0544)(1.2524 x 1022) / (8.298 x 1017) = 31,007.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 267 6.42. E. The chance of the observation given q is: {10 q3 (1-q)2 } {10 q2 (1 - q)3 } q5 = 100 q10 (1 - q)5 . π(q) =
1 , 0 < q < 1. Thus the posterior distribution of q is proportional to: q (1− q)
1 q10 (1 - q)5 = q9 (1 - q)4 , 0 < q < 1. q (1− q) Therefore, the posterior distribution of q is Beta with a = 10, b = 5, and θ = 1. The mean of this posterior distribution of q is: 10 / (10 + 5) = 2/3. Thus the estimated future average frequency for this insured is: m E[q] = (5)(2/3) = 3.333. Comment: More generally let the data for n years be: x1 , x2 , ... , xn . Then the chance of the observation is proportional to: q
x1
(1- q)
m - x1
x x ... qxn (1- q) m - n = qΣ xi (1- q) mn - Σ i .
x Thus the posterior distribution is proportional to: qΣ xi - 1 (1- q) mn - Σ i - 1.
Therefore, the posterior distribution of q is Beta with a = ∑ xi, b = mn - ∑ xi, and θ = 1. The mean of this posterior distribution of q is: θ
a ∑ xi = . a + b mn
Thus the estimated future average frequency for the insured is: m
∑ xi = X. mn
The estimate of the future is the observed claim frequency. 1 Thus for this situation, is called the noninformative or diffuse prior. q (1− q) Since the posterior distribution is Beta, the predictive distribution (posterior mixed distribution) is a Beta-Binomial, as discussed in “Mahlerʼs Guide to Conjugate Priors.”
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 268 6.43. B. If b were 300, then claim sizes are uniform from 300 to 400, and we could observe a claim of size 300. If b were 200, then claim sizes are uniform from 200 to 300, and we could observe a claim of size 300. b can be 200, 300, or anything in between. Given 200 ≤ b ≤ 300, the chance of the observation given b is 1/100. π(b) = e-b/80 / 80. Thus the probability weight is: (e-b/80 / 80) (1/100). Thus the posterior distribution of b is proportional to: e-b/80, 200 ≤ b ≤ 300. b = 300
e- b / 80 db = -80 e - b / 80 ] = 80 (e-2.5 - e-3.75). ∫ b = 200 200 300
Thus the posterior distribution of b is:
e - b / 80 80 (e - 2.5 - e - 3.75)
, 200 ≤ b ≤ 300.
Given b, the mean severity is: b + 50. Therefore, the expected value of the next claim from the same insured is: 300
∫ 200
(b + 50)
e- b / 80 80 (e - 2.5 - e - 3.75 )
db =
300
1 80 (e - 2.5 - e - 3.75) 1 80 (e - 2.5 - e - 3.75)
∫ 200
300
b e- b / 8 0 db + 50
(-80b
e- b / 80
-
802
∫ 200
80 (e - 2.5 - e - 3.75 )
b = 300 b / 80 e )
]
db =
+ (50)(1) =
b = 200
200 e - 2.5 + 80 e - 2.5 - 300 e - 3.75 - 80 e - 3.75 e - 2.5 - e - 3.75
e - b / 80
+ 50 = 239.845 + 50 = 289.845.
Comment: The integral of the posterior density of b over its support has to be one.
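As a numerical cross-check of 6.43 (not part of the original solution), assuming Python with numpy and scipy:

    import numpy as np
    from scipy import integrate

    # 6.43: posterior of b is proportional to exp(-b/80) on [200, 300];
    # given b, the mean claim size is b + 50.
    w = lambda b: np.exp(-b / 80.0)
    den, _ = integrate.quad(w, 200, 300)
    num, _ = integrate.quad(lambda b: (b + 50) * w(b), 200, 300)
    print(num / den)   # about 289.85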
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 269 6.44. B. If b were 200, then claim sizes are uniform from 200 to 300, and we could observe a claim of size 200 and a claim of size 230. If b were 130, then claim sizes are uniform from 130 to 230, and we could observe a claim of size 200 and a claim of size 230. b can be 130, 200, or anything in between. Given 130 ≤ b ≤ 200, the chance of the observation given b is: (1/100) (1/100). π(b) = e-b/80 / 80. Thus the probability weight is: (e-b/80 / 80) (1/100) (1/100). Thus the posterior distribution of b is proportional to: e-b/80, 130 ≤ b ≤ 200. 200
∫ 130
e - b / 80
db = -80 e
b = 200 - b / 80
]
= 80 (e-1.625 - e-2.5).
b = 130
Thus the posterior distribution of b is:
e - b / 80 80 (e - 1.625 - e - 2.5)
, 130 ≤ b ≤ 200.
Given b, the mean severity is: b + 50. Therefore, the expected value of the next claim from the same insured is: 200
∫ 130
(b + 50)
e- b / 80 80 (e - 1.625 - e - 2.5)
db =
200
1 80 (e - 1.625 - e - 2.5)
∫ 130
200
b e - b / 8 0 db + 50
∫ 130
e- b / 80 80 (e - 1.625 - e - 2.5)
db =
b = 200
1 80 (e - 1.625 - e - 2.5)
(-80b e- b / 80 - 802 e- b / 80 )]
b = 130
130 e-1.625 + 80 e -1.625 - 200 e - 2.5 - 80 e- 2.5 e - 1.625 - e - 2.5
+ (50)(1) =
+ 50 = 159.960 + 50 = 209.960.
Comment: The integral of the posterior density of b over its support has to be one.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 270 6.45. E. & 6.46. B. π(λ) = 10 e−10λ, λ > ∞. The chance of no claims given λ is: 0.5e−λ + 0.5 e−2λ. Thus the posterior distribution is proportional to: e−10λ(e−λ + e−2λ) = e−11λ + e−12λ. The mean frequency given λ is: 0.5 λ + (0.5)(2λ) = 1.5λ. Thus the posterior mean frequency is: ∞
∫λ
e - 11λ
∞
dλ +
∫ λ e - 12λ dλ
1.5 0 ∞
0 ∞
0
0
= 1.5
∫ e - 11λ dλ + ∫ e - 12λ dλ
1/ 112 + 1/ 122 = 0.1309. 1/ 11 + 1/ 12
The chance of one claims given λ is: 0.5λe−λ + 0.5 (2λ)e−2λ. Thus the posterior distribution is proportional to: e−10λ(λe−λ + 2λe−2λ) = λe−11λ + 2λ e−12λ. The mean frequency given λ is: 0.5 λ + (0.5)(2λ) = 1.5λ. Thus the posterior mean frequency is: ∞
∫ 1.5
λ2
e - 11λ
∞
∫
dλ + 2 λ 2 e - 12λ dλ
0 ∞
∫ λ e - 11λ dλ 0
0 ∞
+ 2
= 1.5
∫ λ e - 12λ dλ
2 / 113 + (2)(2 / 123 ) = 0.2585. 1/ 112 + 2 / 122
0
∞
Comment: For Gamma type integrals:
∫ tn e- c t
0
dt = n! / cn+1.
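Not part of the original solution: the 6.45 and 6.46 estimates can be confirmed numerically; a sketch assuming Python with numpy and scipy:

    import numpy as np
    from math import factorial
    from scipy import integrate

    # 6.45/6.46: frequency is a 50-50 mixture of Poisson(lambda) and Poisson(2*lambda);
    # the prior on lambda is Exponential with mean 0.1, density 10*exp(-10*lambda).
    prior = lambda lam: 10 * np.exp(-10 * lam)

    def estimate(n_obs):
        like = lambda lam: 0.5 * np.exp(-lam) * lam**n_obs / factorial(n_obs) \
                         + 0.5 * np.exp(-2 * lam) * (2 * lam)**n_obs / factorial(n_obs)
        num, _ = integrate.quad(lambda lam: 1.5 * lam * like(lam) * prior(lam), 0, np.inf)
        den, _ = integrate.quad(lambda lam: like(lam) * prior(lam), 0, np.inf)
        return num / den

    print(estimate(0), estimate(1))   # about 0.131 and 0.259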
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 271 6.47. (a) Since the prior distribution is uniform, given m ≥ k, the probability weight for n ≥ m is ⎛ m- 1⎞ ⎜ ⎟ ⎝ k -1⎠ proportional to: . Given we have observed k and m, this is proportional to: ⎛ n⎞ ⎜ ⎟ ⎝ k⎠ ∞
From the given identity, the sum of the probability weights is:
∑ ⎛1n⎞
n=m ⎜
⎟ ⎝ k⎠
=
1 . ⎛ n⎞ ⎜ k⎟ ⎝ ⎠
k 1 . mk -1 ⎛ 1⎞ ⎜ k -1⎟ ⎝ ⎠
Thus dividing the probability weight by its sum, the posterior probability for n is: ⎛ m - 1⎞ ⎜ ⎟ k - 1 ⎝ k - 1⎠ , n ≥ m ≥ k > 1. k ⎛ n⎞ ⎜k⎟ ⎝ ⎠ ⎛ m- 1⎞ ⎜ ⎟ k - 1 ⎝ k -1⎠ (b) The mean of the posterior distribution is: = (k-1) n k ⎛ n⎞ n=m ⎜ k⎟ ⎝ ⎠ ∞
∑
⎛ m- 1⎞ (k-1) ⎜ ⎟ ⎝ k -1⎠
(k-1)
∞
∞
⎛ m- 1⎞ ⎛ m- 1⎞ k -1 1 1 = (k-1) ⎜ = (k-1) ⎜ ⎟ ⎟ ⎛ n -1⎞ ⎛ i ⎞ ⎝ k -1⎠ ⎝ k -1⎠ k - 2 n=m ⎜ i=m-1⎜ ⎟ ⎟ ⎝ k -1⎠ ⎝ k -1⎠
∑
∑
(m- 1)! k -1 (m-k)! (k - 2)! k-1 = (m-1) , k > 2. (m-k)! (k -1)! k - 2 (m- 2)! k-2
⎛ m- 1⎞ ⎜ k -1⎟ ⎝ ⎠
∞
∑
n=m
1 = m⎛ 2⎞ ⎜ k - 2⎟ ⎝ ⎠
(n- k)!(k - 1)! = (n -1)!
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 272 (c) The second factorial moment of the posterior distribution is: E[N(N-1)] = ⎛ m-1⎞ ∞ ∞ ∞ ⎜ ⎟ ⎛ m- 1⎞ ⎛ m- 1⎞ (n- k)!(k - 2)! k -1 ⎝ k -1⎠ 1 2 2 = (k-1) ⎜ = (k-1) ⎜ = n(n- 1) ⎟ ⎟ (n - 2)! k ⎛ n⎞ ⎛ n - 2⎞ ⎝ k -1⎠ ⎝ k -1⎠ n=m n=m n=m ⎜ ⎟ ⎜ k⎟ ⎝ k - 2⎠ ⎝ ⎠
∑
∑
∑
⎛ m- 1⎞ k - 2 1 (m- 1)! k - 2 (m-k)! (k - 3)! k -1 = (k-1)2 = (m-1)(m-2). (k-1)2 ⎜ ⎟ m3 ⎞ (m-k)! (k -1)! k - 3 (m- 3)! k-3 ⎝ k -1⎠ k - 3 ⎛ ⎜ k - 3⎟ ⎝ ⎠ E[N2 ] = E[N(N-1)] + E[N] = Var[N] = E[N2 ] - E[N]2 =
k -1 k -1 (m-1)(m-2) + (m-1) . k-3 k-2
k -1 k -1 (k - 1)2 (m-1)(m-2) + (m-1) - (m-1)2 = k-3 k-2 (k - 2)2
(k -1) (m- 1) 2 (m-2) + (k-2)(k-3) - (m-1)(k-1)(k-3)} = (k - 1) (m - 1) (m + 1 - k), k > 3. {(k-2) (k - 2)2 (k - 3) (k - 2)2 (k - 3) ⎛ m- 1⎞ ∞ ⎜ ⎟ k - 1 ⎝ k -1⎠ k -1 ⎛ m- 1⎞ 1 (d) For x ≥ m, Prob[N > x] = = = ⎜ ⎟ k ⎛ n⎞ k ⎝ k -1⎠ ⎛ n⎞ n=x+1 n=x+1 ⎜ ⎟ ⎜ k⎟ ⎝ k⎠ ⎝ ⎠ ∞
∑
∑
⎛ m - 1⎞ ⎜ ⎟ ⎝ k - 1⎠ k -1 ⎛ m- 1⎞ k 1 (m- 1)! (x + 1 - k)! (k - 1)! = = ⎜ ⎟ k ⎝ k -1⎠ k -1 ⎛ x ⎞ (m-k)! (k -1)! x! ⎛ x ⎞ ⎜ k -1⎟ ⎜k - 1⎟ ⎝ ⎠ ⎝ ⎠ =
(m- 1)! (x + 1 - k)! , k > 1. (m- k)! x!
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 273 (e) k = 5 and m = 20. The posterior probability is for n ≥ 20: ⎛ m- 1⎞ ⎜ ⎟ k -1 ⎝ k -1⎠ = (4/5) k ⎛ n⎞ ⎜ k⎟ ⎝ ⎠
⎛ 19⎞ ⎜ ⎟ ⎝4⎠ (n - 5)! = 372,096 . n! ⎛ n⎞ ⎜5⎟ ⎝ ⎠
For example, Prob[n = 20] = 1/5 = 20%, and Prob[n = 21] = 16/105 = 15.24%. Here is a graph of the posterior densities of n: Prob. 0.20
0.15
0.10
0.05
20
25
30
The mean of the posterior distribution of n is: (m-1)
35 k -1 = (19)(4/3) = 25.33. k-2
The variance of the posterior distribution of n is: (k -1) (m- 1) (m + 1 - k) = (16)(4)(19) / {(9)(2)} = 67.56. (k - 2)2 (k - 3) For x ≥ 20, S(x) =
(m-1)! (x + 1 - k)! 19! (x - 4)! = . (m- k)! x! 15! x!
For example, S(20) = 16/20 = 80%, and S(21) = 272/420 = 64.76%.
40
n
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 274 Here is a graph of the posterior survival function: S(x) 0.8
0.6
0.4
0.2
x 20 25 30 35 40 45 50 Comment: The German Tank Problem. See for example, http://en.wikipedia.org/wiki/German_tank_problem Assume that we have the numbers from 1 to n, and pick a subset of size k, without repeating any numbers. There are n ways to chose the first number, n - 1 ways to choose the second number, and n! n+1-k ways to chose the final number. Thus there are such subsets. (n -k)! If a subset of size k has a maximum of m ≤ n, then it can be thought of as first choosing an m, and then choosing the remaining k - 1 elements from the number from 1 to m-1. m-1! Thus there are subsets whose maximum is m. (m-1- k)! Thus given k ≤ m ≤ n, the probability the subset has a maximum of m is: ⎛ m- 1⎞ ⎜ ⎟ ⎝ k -1⎠ (m-1)! n! (m-1)! n! / = / = , matching one of the given hints. (m-1- k)! (n -k)! (m-1- k)! k! (n -k)! k! ⎛ n⎞ ⎜ k⎟ ⎝ ⎠
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 275 6.48. C. The probability weight is: f(x | θ) p(θ) = 8xθ, 0 < θ ≤ x. x
x
∫
∫
E(θ l x) = θ 8xθ dθ / 8xθ dθ = (8x3 /3)/(8x2 /2) = 2x/3. 0
0
6.49. B. Given x, the chance of observing three successes is x3 . The a priori distribution of x is f(x) = 1, 0≤ x ≤ 1. By Bayes Theorem, the posterior density is proportional to the product of the chance of the observation and the a priori density function. Thus the posterior density is proportional to x3 for 0≤ x ≤ 1. Since the integral from zero to one of x3 is 1/4, the posterior density is 4x3 . (The posterior density has to integrate to unity.) Thus the posterior chance that x < 0.9 is the integral of the posterior density from 0 to 0.9, which is 0.94 = 0.656. Alternately, by Bayes Theorem (or directly from the definition of a conditional distribution): Pr[x<.9 | 3 successes] = Pr[3 successes | x<.9 ] Pr[x<.9] / Pr[3 successes] = Pr[3 successes and x<.9 ] / Pr[3 successes] = .9
∫
1
x3 f(x)dx /
x=0
∫
x=0
.9
x3 f(x)dx =
∫
x=0
1
x3 dx /
∫
x=0
x3 dx = {(.94 )/4 } / {(14 )/4} = 0.656.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 276 6.50. B. Let q = m/100,000. Then m = 100,000q. dm/dq = 100000. Then f(q) = f(m)dm/dq = (m/100,000,000)100000 = (100,000q/100,000,000)100000 = 100q for 0 < q ≤ .1, and f(q) = f(m)dm/dq = ((20,000- m)/100,000,000)100000 = ((20,000 -100,000q)/100,000,000)100000 = 20 - 100q for .1 < q < .2. Thus, after the change of variables, f(q) = 100q for 0< q ≤ 0.1, f(q) = 20 - 100q for 1 < q < 0.2. We want to compute the posterior probability that q < .1. The chance of the observation given q is q(1-q). The posterior probability density function is proportional to f(q)q(1-q). In order to compute the posterior distribution we need to divide by the integral of f(q)q(1-q). .2
.1
.2
∫f(q)q(1-q)dq = ∫(100q)q(1-q)dq + ∫(20 - 100q)q(1-q)dq = 0
0
.1 q=.1
{100q3 /3 - 25q4 }
]
q=.2
+ {10q2 - 40q3 + 25q4 } ] = .37/12 + (3.6 - 3.36 +.45)/12 = 1.06/12.
q=0
q =.1
Thus the posterior density is: 100(q2 -q3 )12/1.06 for 0 < q ≤ .1, and (.2 - 100q)(q-q2 )12/1.06 for .1< q ≤ .2. Thus the posterior chance that q<.1 is the integral from 0 to .1 of the posterior density: .1
q=.1
∫(q2-q3)1200/1.06 dq = (1200/1.06)(q3/3 - q4/4) ] = (1200/1.06)(.0037/12) = 37/106. 0
q=0
Comment: The key point for the change of variables is that since probability density functions are derivatives there is an extra factor of dm/dq. f(q) = f(m)dm/dq. One way to remember this is that f(q) = dF(q)/dq and thus f(q)dq = dF = f(m)dm. Both f(q)dq and f(m)dm must integrate to unity since they are probability density functions.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 277 6.51. E. Given q, the chance of observing no claims is 1 - q. The posterior density is proportional to the product of the chance of the observation and the prior density: (1 - q)(π/2) sin(πq/2) = (π/2) sin(πq/2) - (πq/2) sin(πq/2). Integrating from zero to one to: 1
1
∫ (π/2) sin(πq/2) dq - ∫ (πq/2) sin(πq/2) dq = 1 - 2/π = (π − 2)/π. 0
0
(The first integral is unity, since it is the integral of the given density. The second integral is gotten from the hint.) Dividing by this integral, the posterior density is: (1-q)(π/2) sin(πq/2) / {(π − 2)/π} = (π2(1-q)/{2(π-2)}) sin(πq/2). Comment: Choice B integrates to 2/π ≠ 1, while Choice C integrates to 1 - 2/π ≠ 1, thus neither is a density function. Choice A can also be eliminated, since in this case if one observes zero rather than one claim, then the posterior mean is lower than the prior mean and thus the posterior density can not be equal to the prior density.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 278 6.52. C. Given b, the chance of the observation is f(2). If b > 2, then f(2) = 4/b2 . If b ≤ 2, f(2) = 0. The a priori chance of a given value of b, 1 < b < ∞, is 1/b2 . Thus by Bayes Theorem, the posterior density of b is proportional to: (4/b2 )(1/b2 ) = 4/b4 if b > 2, and zero for b ≤ 2. In order to convert this to a density we need to divide by its integral over the domain of b. 2
∞
∫
∞
∫
0 db + 4/b4 db = -4/ 3b3 ] = 1/6. 1
2
2
Thus the posterior density of b is: (4/b4 )/(1/6) = 24/b4 , b > 2. The mean conditional on b: b
x=b
∫
Mean = x(2x/b2 ) dx = (2/3)x3 /b2 ] = 2b /3. 0
x=0
Given b, the mean is 2b /3. Thus the posterior estimate of the next claim is: ∞
∞
∫
(2b /3)(24/b4 ) db = -8/ b2 ] = 2. 2
2
Comment: It turns out in this case, that if one observed a single claim of size y> 1, the posterior estimate is also y. This is an example of a noninformative or vague prior distribution. One has to be very careful to distinguish the two cases where b ≤2 and b > 2. Once we observe a claim of size 2, from the fact that the support of f(x) is 0 < x< b, we know b > 2. Thus the posterior density of b is zero for b ≤ 2. Even though it would have been better if it did so, this question did not specify whether to use Buhlmann Credibility or Bayesian Analysis. However, if one tries to use Buhlmann Credibility one would run into trouble. Prior to any observations, one has an infinite overall mean; the integral of (2b/3)(1/b2 ) from 1 to infinity is infinite. Prior to any observations, one has an infinite second moment of the hypothetical means; the integral of (2b/3)2 (1/b2 ) from 1 to infinity is infinite. Therefore the VHM is also infinite or undefined. The integral of (b2 /18)(1/b2 ) from 1 to infinity is infinite; therefore the EPV is infinite. When both the VHM and EPV are infinite, one can not calculate K and therefore one can not apply Buhlmann Credibility. However, in this case one could take a limit of Buhlmann Credibility Parameters. For g(b) = L/ (L-1)b2 , 1 < b < L, one can calculate, EPV = L/18, overall mean = 2 L (ln L) / 3(L-1), and VHM = (4/9){L - {L (ln L) / (L-1)}2 }. K = EPV/VHM = (1/8)(1/{1 - L(ln L)2 / (L-1)2 }. As L goes to infinity, K goes to 1/8. Thus for large L, for one observation Z ≅ 8/9. The estimate using Buhlmann Credibility is approximately (8/9)(2)+(1/9){2 L (ln L) / 3(L-1)}; as L goes to infinity, this estimate goes to infinity.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 279 6.53. E. Severity is uniform on [0, θ]. If for example, θ = 601, the chance of a claim of size 600 is 1/601. If for example, θ = 599, the chance of a claim of size 600 is 0. For θ ≥ 600, Prob[observation] = 2f(400)f(600) = (2)(1/θ)(1/θ) = 2/θ2. For θ < 600, Prob[observation] = 2f(400)f(600) = 2f(400)0 = 0. ∞
∞
∫π(θ) Prob[observation | θ] dθ = ∫(500/θ2)(2/θ2)dθ = (1000/3)/6003. 500
600
By Bayes Theorem, Posterior distribution of θ is: π(θ) Prob[observation | θ]/ {(1000/3)/6003 } = (3)(6003 )/θ4, θ > 600. For θ ≥ 550, S(550) = (θ - 550)/θ = 1 - 550/θ. θ=∞
∞
∫(1 - 550/θ) (3)(6003)/θ4 dθ = 1 +
(550)(3/4)(6003 )θ−4
600
] = 1 - 0.6875 = 0.3125. θ = 600
Comment: The integral over its support of the posterior distribution must be one. The Single Parameter Pareto is a Conjugate Prior to the uniform Iikelihood. Assume a uniform distribution on (0, ω), with π[ω] a Single Parameter Pareto with α and θ. Then the parameters of the posterior Single Parameter Pareto are: αʼ = α + n, and θʼ = Max[x1 , ... xn , θ]. In this case, the posterior distribution is Single Parameter Pareto with parameters: α = 1 + 2 = 3, and θ = Max[400, 600, 500] = 600.
2013-4-9 Buhlmann Cred. §6 Bayes Continuous Risk Types, HCM 10/19/12, Page 280 6.54. A. Severity is uniform on [0, θ]. For θ ≥ 400, Prob[observation] = 2f(400)f(300) = (2)(1/θ)(1/θ) = 2/θ2. ∞
∫
π(θ) Prob[observation | θ] dθ =
500
∞
∫
(500 / θ 2) (2 / θ 2) dθ = (1000/3)/5003 .
500
By Bayes Theorem, Posterior distribution of θ is: π(θ) Prob[observation | θ] / {(1000/3)/5003 } = (3)(5003 )/θ4, θ > 500. For θ ≥ 550, S(550) = (θ - 550)/θ = 1 - 550/θ. For θ ≤ 550, S(550) = 0. ∞
(3)(5003 )
∫
(1 - 550 / θ) / θ 4 dθ = (3)(5003 ) {
550
1 550 } = 0.188. 3 (5503) 4 (5504)
6.55. E. π(θ) = 1/θ2, 1 < θ < ∞. f(x) = 2θ2/(x + θ)3 . f(3) = 2θ2/(3 + θ)3 . Posterior distribution of θ is proportional to: π(θ)f(3) = 2/(3 + θ)3 , 1 < θ < ∞. ∞
θ=∞
∫
2/(3 + θ)3 dθ = -1/(3 + θ)2 ] = 1/16. 1
θ=1
Posterior density of θ is: {2/(3 + θ)3 }/(1/16) = 32/(3 + θ)3 , 1 < θ < ∞. ∞
∫
θ=∞
The posterior probability that θ exceeds 2 = 32/(3 + θ)3 dθ = -16/(3 + θ)2 2
6.56. f(x) = 2θ2/(x + θ)3 . f(3) = 2θ2/(3 + θ)3 . If θ = 2, f(3) = 8/125 = 0.06400. If θ = 4, f(3) = 32/343 = 0.09329. Posterior distribution of θ is proportional to Prob[θ] f(3): (.7)(.064) = .0448 for θ = 2, (.3)(.09329) = .02799 for θ = 4. Posterior distribution of θ is: .0448/(.0448 + .02799) = 61.5% for θ = 2, .02799/(.0448 + .02799) = 38.5% for θ = 4.
] = 16/25 = 0.64.
θ=2
6.57. A. Severity is uniform on [0, θ]. E[X | θ] = θ/2.
We are given that the posterior distribution of θ is: (3)(600³)/θ⁴, θ > 600.
Posterior estimate of the average severity is:
∫_600^∞ E[X | θ] f(θ | x1, x2) dθ = ∫_600^∞ (θ/2)(3)(600³)/θ⁴ dθ = [-(3/4)(600³)/θ²] evaluated from θ = 600 to θ = ∞ = 450.
Comment: Same setup as 4, 11/01, Q.14. See the solution to that question in order to see how one would derive the given posterior distribution.
6.58. C. Severity is uniform on [0, θ]. E[X | θ] = θ/2.
The uniform distribution from [0, 500] has density of 0 at 600, and the chance of the observation is 0. The uniform distribution from [0, 1000] has density of 1/1000, and the chance of the observation is 1/1000². The uniform distribution from [0, 2000] has density of 1/2000, and the chance of the observation is 1/2000².
If we assume that the claims are 400 and 600 in that order, then the chance of the observation is: f(400) f(600).

Theta     A Priori       Chance of       Probability    Posterior      Mean
          Probability    Observation     Weight         Probability    Severity
500       0.5            0.000e+0        0.000e+0       0.000          250.0
1000      0.3            1.000e-6        3.000e-7       0.857          500.0
2000      0.2            2.500e-7        5.000e-8       0.143          1000.0
Overall                                  3.500e-7                      571.4
If we instead assume that the claims are 400 and 600 in either order, then the chance of the observation is: 2 f(400) f(600); however, we get the same posterior distribution.
6.59. A. The prior density of p is: π(p) = 2, 0 ≤ p ≤ 0.5. The probability of the observation is p⁸.
Therefore, the posterior distribution of p is: 2p⁸ / ∫_0^0.5 2p⁸ dp = p⁸/{(1/2)⁹/9} = 4608 p⁸, 0 ≤ p ≤ 0.5.
The posterior probability that the insured will have at least one loss during Year 9 is:
∫_0^0.5 (4608 p⁸) p dp = (4608)(1/2)¹⁰/10 = 0.45.
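The spreadsheet-style Bayesian update used in 6.58 above (and repeatedly in the next section) is easy to script; a minimal Python sketch reproducing that table:

thetas = [500.0, 1000.0, 2000.0]
prior = [0.5, 0.3, 0.2]

def chance_of_obs(theta):
    # claims of 400 and 600 from a uniform severity on [0, theta]
    return (1.0 / theta)**2 if theta >= 600 else 0.0

weights = [p * chance_of_obs(t) for p, t in zip(prior, thetas)]
total = sum(weights)
posterior = [w / total for w in weights]
mean_sev = [t / 2.0 for t in thetas]                       # E[X | theta] for a uniform on [0, theta]
estimate = sum(q * m for q, m in zip(posterior, mean_sev))
print(posterior, estimate)                                  # [0, 0.857, 0.143] and 571.4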
6.60. E. The probability of the observation is θ. Therefore, the posterior distribution of θ is:
θ(3/2)√θ / ∫_0^1 θ(3/2)√θ dθ = θ^1.5/(1/2.5) = 2.5θ^1.5, 0 ≤ θ ≤ 1.
The posterior probability that θ is greater than 0.60 is:
∫_0.6^1 2.5θ^1.5 dθ = 1^2.5 - 0.6^2.5 = 0.721.
Comment: The prior distribution of θ is a Beta with a = 1.5 and b = 1. This is mathematically the same as a Beta-Bernoulli frequency process; see “Mahlerʼs Guide to Conjugate Priors.” The posterior distribution of θ is Beta with parameters: aʼ = a + 1 = 2.5, bʼ = b + 1 - 1 = 1.
6.61. D. Prob[observation | q] = q(1 - q) = q - q².
Posterior distribution = (q - q²)(q³/0.07) / ∫_0.6^0.8 (q - q²)(q³/0.07) dq = (q⁴ - q⁵)/∫_0.6^0.8 (q⁴ - q⁵) dq = (q⁴ - q⁵)/0.01407.
Posterior probability that 0.7 < q < 0.8: ∫_0.7^0.8 (q⁴ - q⁵) dq / 0.01407 = 0.00784/0.01407 = 0.557.
6.62. B. Given q, the probability of the observation is: Prob[2 claims in Year 1] Prob[2 claims in Year 2] = (q)(q) = q².
Therefore, the posterior distribution is proportional to: q² π(q) = q⁴/0.039, 0.2 < q < 0.5.
∫_0.2^0.5 q⁴/0.039 dq = 0.1586.
Therefore, the posterior distribution is: (q⁴/0.039)/0.1586 = 161.7q⁴, 0.2 < q < 0.5.
Given q, the mean is: 0.90 - q + 2q = 0.90 + q.
Therefore, the expected future frequency is: ∫_0.2^0.5 161.7q⁴ (0.90 + q) dq = 0.90 + 0.419 = 1.319.
Section 7, EPV and VHM

In the prior sections we saw how to apply Bayesian Analysis. In the next section, how to apply Buhlmann Credibility will be demonstrated. In order to apply Buhlmann Credibility, one will first have to calculate the Expected Value of the Process Variance (EPV) and the Variance of the Hypothetical Means (VHM), which together sum to the total variance.79 How to compute these important quantities will be demonstrated in this section.80

A Series of Examples:

The following information will be used in a series of examples involving the frequency, severity, and pure premium:

Type    Portion of Risks    Bernoulli (Annual)        Gamma Severity
        in this Type        Frequency Distribution    Distribution
1       50%                 q = 40%                   α = 4, θ = 100
2       30%                 q = 70%                   α = 3, θ = 100
3       20%                 q = 80%                   α = 2, θ = 100

We assume that the types are homogeneous; i.e., every insured of a given type has the same frequency and severity process. Assume that for an individual insured, frequency and severity are independent.81
I will show how to compute the Expected Value of the Process Variance and the Variance of the Hypothetical Means in each case. In general, the simplest case involves the frequency, followed by the severity, with the pure premium being the most complex case.

Expected Value of the Process Variance, Frequency Example:

For type 1, the process variance of the Bernoulli frequency is: q(1 - q) = (0.4)(1 - 0.4) = 0.24. Similarly, for type 2 the process variance for the frequency is: (0.7)(1 - 0.7) = 0.21. For type 3 the process variance for the frequency is: (0.8)(1 - 0.8) = 0.16.

79 Those who are familiar with the general application of analysis of variance (ANOVA) may find that it helps them to understand the material in this section.
80 Many of you will benefit by first reading the section on the Philbrick Target Shooting Example.
81 Across types, the frequency and severity are not independent. In this example, types with higher average frequency have lower average severity.
The expected value of the process variance is the weighted average of the process variances for the individual types, using the a priori probabilities as the weights.82
The EPV of the frequency = (50%)(0.24) + (30%)(0.21) + (20%)(0.16) = 0.215.
This computation can be organized in the form of a spreadsheet:

Class      A Priori       Bernoulli       Process
           Probability    Parameter q     Variance
1          50%            0.4             0.240
2          30%            0.7             0.210
3          20%            0.8             0.160
Average                                   0.215
I recommend you organize your computations for exam questions in a similar manner, or one that works for you. Using the same structure for similar problems every time reduces the chance for error.
Note that to compute the EPV one first computes variances and then one computes the expected value. In contrast, in order to compute the VHM, one first computes expected values, and then one computes the variance.

Variance of the Hypothetical Mean Frequencies:

For type 1, the mean of the Bernoulli frequency is q = 0.4. Similarly, for type 2 the mean frequency is 0.7. For type 3 the mean frequency is 0.8.
The variance of the hypothetical mean frequencies is computed the same way one would any other variance. First one computes the first moment: (50%)(0.4) + (30%)(0.7) + (20%)(0.8) = 0.57. Then one computes the second moment: (50%)(0.4²) + (30%)(0.7²) + (20%)(0.8²) = 0.355. Then the VHM = 0.355 - 0.57² = 0.0301.
This computation can be organized in the form of a spreadsheet:

Class      A Priori       Bernoulli       Mean         Square of
           Probability    Parameter q     Frequency    Mean Freq.
1          50%            0.4             0.4          0.160
2          30%            0.7             0.7          0.490
3          20%            0.8             0.8          0.640
Average                                   0.57         0.355

Then the variance of the hypothetical mean frequencies = 0.3550 - 0.570² = 0.0301.
82 Note that while in this case with discrete possibilities we take a sum, as discussed subsequently, in the case of continuous risk types we would take an integral.
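For those who like to check these computations with a few lines of code, here is a minimal Python sketch of the frequency EPV and VHM just computed; it also previews the total-variance check made below:

prob = [0.50, 0.30, 0.20]           # a priori probabilities of the three types
q    = [0.40, 0.70, 0.80]           # Bernoulli frequency parameter by type

pv   = [qi * (1 - qi) for qi in q]                    # process variance by type
epv  = sum(p * v for p, v in zip(prob, pv))           # 0.215

mean   = sum(p * qi for p, qi in zip(prob, q))        # 0.57
second = sum(p * qi**2 for p, qi in zip(prob, q))     # 0.355
vhm    = second - mean**2                             # 0.0301

print(epv, vhm, epv + vhm, mean * (1 - mean))         # total variance check: 0.2451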
Total Variance, Frequency Example:

For an insured picked at random, there is a 0.57 chance of a claim and a 0.43 chance of no claim. This is a Bernoulli, with variance: (0.57)(0.43) = 0.2451.
Total Variance = 0.2451 = 0.215 + 0.0301 = EPV + VHM.
In general, as will be demonstrated subsequently, EPV + VHM = Total Variance.

Expected Value of the Process Variance, Severity Example:

The computation of the EPV for severity is similar to that for frequency, with one important difference. One has to weight together the process variances of the severities for the individual types using the chance that a claim came from each type.83 The chance that a claim came from an individual of a given type is proportional to the product of the a priori chance of an insured being of that type and the mean frequency for that type. Taking into account the mean frequencies in this manner is only necessary when one is predicting future severities and the type of insured includes specifying both frequency and severity. In the current example, while Type 1 represents 50% of the insureds, as shown below it represents only 35.1% of the claims, due to its relatively low claim frequency.
For type 1, the process variance of the Gamma severity is: αθ² = 4(100²) = 40,000. Similarly, for type 2 the process variance for the severity is: 3(100²) = 30,000. For type 3 the process variance for the severity is: 2(100²) = 20,000.
The mean frequencies are: 0.4, 0.7, and 0.8. The a priori chances of each type are: 50%, 30%, and 20%. Thus the weights to use to compute the EPV of the severity are: (0.4)(50%) = 0.20, (0.7)(30%) = 0.21, and (0.8)(20%) = 0.16. Thus the probabilities that a claim came from each class are: 0.20/0.57 = 0.351, 0.21/0.57 = 0.368, and 0.16/0.57 = 0.281. The expected value of the process variance of the severity is the weighted average of the process variances for the individual types, using these weights.
The EPV of the severity is: {(0.2)(40,000) + (0.21)(30,000) + (0.16)(20,000)} / (0.2 + 0.21 + 0.16) = 30,702.84

83 Each claim is one observation of the severity process. The denominator for severity is number of claims. In contrast, the denominator for frequency, as well as pure premiums, is exposures.
84 Note that this result differs from what one would get by using the a priori probabilities as weights. The latter method, which is not correct in this case, would result in: (50%)(40,000) + (30%)(30,000) + (20%)(20,000) = 33,000 ≠ 30,702.
This computation can be organized in the form of a spreadsheet:

Class      A Priori       Mean         Weights =          Probability that a claim    Gamma Parameters    Process
           Probability    Frequency    Col. B x Col. C    came from this class        α       θ           Variance
1          50%            0.4          0.20               0.351                       4       100         40,000
2          30%            0.7          0.21               0.368                       3       100         30,000
3          20%            0.8          0.16               0.281                       2       100         20,000
Average                                0.57               1.000                                           30,702
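A minimal Python sketch of the claim-weighted severity computation in the table above; it also produces the VHM of the severity, which is derived next:

prob  = [0.50, 0.30, 0.20]                   # a priori probabilities of the types
freq  = [0.40, 0.70, 0.80]                   # mean (Bernoulli) frequencies
alpha = [4, 3, 2]                            # Gamma severity shape parameters
theta = 100                                  # common Gamma scale parameter

w = [p * f for p, f in zip(prob, freq)]      # weights proportional to expected claim counts
total = sum(w)
w = [wi / total for wi in w]                 # probability a claim came from each type

sev_mean = [a * theta for a in alpha]        # 400, 300, 200
sev_var  = [a * theta**2 for a in alpha]     # 40,000, 30,000, 20,000

epv_sev = sum(wi * v for wi, v in zip(w, sev_var))         # 30,702
m1 = sum(wi * m for wi, m in zip(w, sev_mean))             # 307.02
m2 = sum(wi * m**2 for wi, m in zip(w, sev_mean))          # 100,526
print(epv_sev, m2 - m1**2)                                 # EPV and VHM (6,265) of the severity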
Variance of the Hypothetical Mean Severities:

The computation of the VHM for severity is similar to that for frequency, with one important difference. In computing the moments one has to use for each individual type the chance that a claim came from that type.85 The chance that a claim came from an individual of a given type is proportional to the product of the a priori chance of an insured being of that type and the mean frequency for that type.
For type 1, the mean of the Gamma severity is αθ = 4(100) = 400. Similarly, for type 2 the mean severity is 3(100) = 300. For type 3 the mean severity is 2(100) = 200. The mean frequencies are: 0.4, 0.7, and 0.8. The a priori chances of each type are: 50%, 30%, and 20%. Thus the weights to use to compute the moments of the severity are: (0.4)(50%) = 0.20, (0.7)(30%) = 0.21, and (0.8)(20%) = 0.16.
The variance of the hypothetical mean severities is computed the same way as one would any other variance. First one computes the first moment: {(0.2)(400) + (0.21)(300) + (0.16)(200)} / (0.2 + 0.21 + 0.16) = 307.02. Then one computes the second moment: {(0.2)(400²) + (0.21)(300²) + (0.16)(200²)} / (0.2 + 0.21 + 0.16) = 100,526. Then the VHM of the severity = 100,526 - 307.02² = 6265.

85 Each claim is one observation of the severity process. The denominator for severity is number of claims. In contrast, the denominator for frequency (as well as pure premiums) is exposures.
This computation can be organized in the form of a spreadsheet:

Class      A Priori       Mean         Weights =          Gamma Parameters    Mean        Square of
           Probability    Frequency    Col. B x Col. C    α       θ           Severity    Mean Severity
1          50%            0.4          0.20               4       100         400         160,000
2          30%            0.7          0.21               3       100         300         90,000
3          20%            0.8          0.16               2       100         200         40,000
Average                                0.57                                   307.02      100,526
Then the variance of the hypothetical mean severities = 100,526 - 307.02² = 6265.

Two cases for Severities:

The above example assumes that not only do the types differ in their severities, they also differ in their frequencies.
Exercise: There are two types of risks that are equally likely. Class 1 has a mean frequency of 10% and an Exponential Severity with mean 5. Class 2 has a mean frequency of 20% and an Exponential Severity with mean 8. Calculate the EPV and VHM.
[Solution: For an Exponential Distribution, mean = θ and variance = θ².

Class      A Priori    Mean         Weights =          θ     Process     Mean        Square of
           Prob.       Frequency    Col. B x Col. C          Variance    Severity    Mean Severity
1          50%         0.1          0.05               5     25          5           25
2          50%         0.2          0.10               8     64          8           64
Average                             0.15                     51.00       7.00        51.00

EPV = 51. VHM = 51 - 7² = 2.]
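The same pattern works for any discrete set of risk types; here is a minimal Python sketch (the helper function is mine, not from the text) applied to the exercise above:

def epv_vhm(weights, means, variances):
    """Weighted EPV and VHM; the weights need not be normalized."""
    total = float(sum(weights))
    w = [x / total for x in weights]
    epv = sum(wi * v for wi, v in zip(w, variances))
    m1 = sum(wi * m for wi, m in zip(w, means))
    m2 = sum(wi * m * m for wi, m in zip(w, means))
    return epv, m2 - m1 * m1

# Exercise above: weights = (a priori prob.) x (mean frequency); Exponential severities
print(epv_vhm([0.5 * 0.1, 0.5 * 0.2], [5, 8], [25, 64]))   # (51.0, 2.0)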
If the types do not differ in their frequencies, then the computations of the EPV and VHM are somewhat simpler.
Exercise: There are two types of risks that are equally likely. Class 1 has an Exponential Severity with mean 5. Class 2 has an Exponential Severity with mean 8. Calculate the EPV and VHM.
[Solution: For an Exponential Distribution, mean = θ and variance = θ².

Type      A Priori Chance     θ     Process     Mean    Square
          of This Type              Variance            of Mean
1         50%                 5     25          5       25
2         50%                 8     64          8       64
Overall                             44.50       6.50    44.50

EPV = 44.50. VHM = 44.50 - 6.50² = 2.25.
Comment: Unlike the previous exercise, since there is no mention of differing frequency by type, we assume the mean frequencies for the types are the same; i.e., we ignore frequency. We therefore get different answers for the EPV and VHM than in the previous exercise.]

Expected Value of the Process Variance, Pure Premium Example:

The computation of the EPV for the pure premiums is similar to that for frequency. However, it is more complicated to compute each process variance of the pure premiums.86
For type 1, the mean of the Bernoulli frequency is q = 0.4. For type 1, the variance of the Bernoulli frequency is q(1 - q) = (0.4)(1 - 0.4) = 0.24. For type 1, the mean of the Gamma severity is αθ = 4(100) = 400, and the variance of the Gamma severity is αθ² = 4(100²) = 40,000. Thus, since frequency and severity are assumed to be independent, the process variance of the pure premium is: (Mean Freq.)(Variance of Severity) + (Mean Severity)²(Variance of Freq.) = (0.4)(40,000) + (400²)(0.24) = 54,400.
Similarly, for type 2 the process variance of the pure premium is: (0.7)(30,000) + (300²)(0.21) = 39,900. For type 3 the process variance of the pure premium is: (0.8)(20,000) + (200²)(0.16) = 22,400.
The expected value of the process variance is the weighted average of the process variances for the individual types, using the a priori probabilities as the weights. The EPV of the pure premium = (50%)(54,400) + (30%)(39,900) + (20%)(22,400) = 43,650.

86 See “Mahlerʼs Guide to Classical Credibility.”
This computation can be organized in the form of a spreadsheet:

Class      A Priori       Mean         Variance of    Mean        Variance of    Process
           Probability    Frequency    Frequency      Severity    Severity       Variance
1          50%            0.4          0.24           400         40,000         54,400
2          30%            0.7          0.21           300         30,000         39,900
3          20%            0.8          0.16           200         20,000         22,400
Average                                                                          43,650
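A minimal Python sketch of the pure premium process variances and their weighted average (the EPV) shown in the table above:

prob   = [0.50, 0.30, 0.20]
freq_m = [0.40, 0.70, 0.80]                     # Bernoulli means
freq_v = [q * (1 - q) for q in freq_m]          # Bernoulli variances
sev_m  = [400, 300, 200]                        # Gamma means (alpha * theta)
sev_v  = [40000, 30000, 20000]                  # Gamma variances (alpha * theta^2)

# process variance of the pure premium, with frequency and severity independent:
pv_pp = [fm * sv + sm**2 * fv
         for fm, fv, sm, sv in zip(freq_m, freq_v, sev_m, sev_v)]
epv_pp = sum(p * v for p, v in zip(prob, pv_pp))
print(pv_pp, epv_pp)    # 54,400, 39,900, 22,400 and EPV = 43,650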
Variance of the Hypothetical Mean Pure Premiums:

The computation of the VHM for the pure premiums is similar to that for frequency. One first has to compute the mean pure premium for each type.
For type 1, the mean of the Bernoulli frequency is q = 0.4. For type 1, the mean of the Gamma severity is: αθ = 4(100) = 400. Thus, since frequency and severity are assumed to be independent, the mean pure premium is: (Mean Frequency)(Mean Severity) = (0.4)(400) = 160. For type 2, the mean pure premium is: (0.7)(300) = 210. For type 3, the mean pure premium is: (0.8)(200) = 160.87
One computes the first and second moments of the mean pure premiums as follows:

Class      A Priori       Mean         Mean        Mean Pure    Square of
           Probability    Frequency    Severity    Premium      Pure Premium
1          50%            0.4          400         160          25,600
2          30%            0.7          300         210          44,100
3          20%            0.8          200         160          25,600
Average                                            175          31,150

Thus the variance of the hypothetical mean pure premiums = 31,150 - 175² = 525.

87 Note that in this example it turns out that the mean pure premium for type 3 happens to equal that for type 1, even though the two types have different mean frequencies and severities. The mean pure premiums tend to be similar when, as in this example, high frequency is associated with low severity.
Total Variance, Pure Premium Example:

For each risk type, the second moment of the pure premium is the process variance plus the square of the mean. The second moments are:

Class      A Priori       Process Variance    Mean Pure    Second Moment
           Probability    of Pure Premium     Premium      of Pure Premium
1          50%            54,400              160          80,000
2          30%            39,900              210          84,000
3          20%            22,400              160          48,000
Average                                       175          74,800

The second moment for the mixture is a weighted average of the individual second moments, using the a priori probabilities as the weights: (50%)(80,000) + (30%)(84,000) + (20%)(48,000) = 74,800.
The total variance = 74,800 - 175² = 44,175. EPV + VHM = 43,650 + 525 = 44,175.
Thus, as is true in general, in this case EPV + VHM = Total Variance.

EPV versus VHM:

While the two pieces of the total variance seem similar, the order of operations in their computation is different.
In the case of the Expected Value of the Process Variance, EPV, first one separately computes the process variance for each of the types of risks and then one takes the expected value over all types of risks. Symbolically: EPV = Eθ[VAR[X | θ]]. Loss Models uses the symbol v to refer to the Expected Value of the Process Variance.
In the case of the Variance of the Hypothetical Means, VHM, first one computes the expected value for each type of risk and then one takes their variance over all types of risks. Symbolically: VHM = VARθ[E[X | θ]]. Loss Models uses the symbol a to refer to the Variance of the Hypothetical Means.
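A minimal Python sketch verifying EPV + VHM = Total Variance for the pure premium example above, using the quantities already computed:

prob    = [0.50, 0.30, 0.20]
pv_pp   = [54400, 39900, 22400]          # process variances of the pure premium by type
mean_pp = [160, 210, 160]                # hypothetical mean pure premiums by type

epv = sum(p * v for p, v in zip(prob, pv_pp))                          # 43,650
m1  = sum(p * m for p, m in zip(prob, mean_pp))                        # 175
vhm = sum(p * m**2 for p, m in zip(prob, mean_pp)) - m1**2             # 525

second = sum(p * (v + m**2) for p, v, m in zip(prob, pv_pp, mean_pp))  # 74,800
total_var = second - m1**2                                             # 44,175
print(epv + vhm, total_var)                                            # both 44,175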
Demonstration that Total Variance = EPV + VHM:

One can demonstrate that in general:88 VAR[X] = Eθ[VAR[X | θ]] + VARθ[E[X | θ]].
First one can rewrite the EPV: Eθ[VAR[X | θ]] = Eθ[E[X² | θ] - E[X | θ]²] = Eθ[E[X² | θ]] - Eθ[E[X | θ]²] = E[X²] - Eθ[E[X | θ]²].
Second, one can rewrite the VHM: VARθ[E[X | θ]] = Eθ[E[X | θ]²] - Eθ[E[X | θ]]² = Eθ[E[X | θ]²] - E[X]².
Putting together the first two steps:
EPV + VHM = Eθ[VAR[X | θ]] + VARθ[E[X | θ]] = E[X²] - Eθ[E[X | θ]²] + Eθ[E[X | θ]²] - E[X]² = E[X²] - E[X]² = VAR[X] = Total Variance of X.
EPV + VHM = Total Variance.

Use in Buhlmann Credibility:

The Variance of the Hypothetical Means and the Expected Value of the Process Variance will be used in the next section to compute the Buhlmann Credibility to assign to N observations of this risk process; i.e., observing the same insured for N years. However, in each case one should compute these quantities for one insured for one year. In general, one computes the EPV and VHM for a single observation of the risk process, whether that consists of observing the pure premium for a single randomly selected insured for one year, or it consists of observing the value of a single ball drawn from a randomly selected urn.
88 Similarly, Cov[X, Y] = Eθ[Cov[X, Y | θ]] + Covθ[E[X | θ], E[Y | θ]]. See for example, Howard Mahlerʼs discussion of Glenn Meyersʼ “Analysis of Experience Rating,” PCAS 1987.
VHM When There Are Only Two Types of Risks:

When there are only two types of risks, the calculation of the VHM can be done via a shortcut. If the two hypothetical means are µ1 and µ2, and the a priori probabilities are p1 and 1 - p1, then the VHM = (µ1 - µ2)²(p1)(1 - p1).
Exercise: There are two risk types, with hypothetical means of 3 and 8. The a priori probabilities are 60% and 40%. Calculate the Variance of the Hypothetical Means.
[Solution: VHM = (3 - 8)²(0.6)(0.4) = 6. Alternately, the overall mean is: (60%)(3) + (40%)(8) = 5. Second moment of the hypothetical means = (60%)(3²) + (40%)(8²) = 31. VHM = 31 - 5² = 6.]

Dividing the Covariance Into Two Pieces:89

Similar to the result for variances, I will show that: Total Covariance = Expected Value of the Process Covariance + Covariance of the Hypothetical Means.90
Assume that a set of parameters θ varies across a portfolio.
Cov[X, Y] = E[XY] - E[X] E[Y] = Eθ[E[XY | θ]] - Eθ[E[X | θ]] Eθ[E[Y | θ]]
= Eθ[E[XY | θ]] - Eθ[E[X | θ] E[Y | θ]] + Eθ[E[X | θ] E[Y | θ]] - Eθ[E[X | θ]] Eθ[E[Y | θ]]
= Eθ[E[XY | θ] - E[X | θ] E[Y | θ]] + {Eθ[E[X | θ] E[Y | θ]] - Eθ[E[X | θ]] Eθ[E[Y | θ]]}
= Eθ[Cov[X, Y | θ]] + Covθ[E[X | θ], E[Y | θ]].
For example, X might be the losses limited to 1000, and Y might be losses excess of 1000. There are two equally likely types of risks.

Type     E[X]     E[Y]     E[XY]
1        400      7000     3 million
2        600      9000     6 million

Then, for Type 1: Cov[X, Y] = 3 million - (400)(7000) = 200,000. For Type 2: Cov[X, Y] = 6 million - (600)(9000) = 600,000. Eθ[Cov[X, Y | θ]] = 400,000.
Covθ[E[X | θ], E[Y | θ]] = (1/2)(400)(7000) + (1/2)(600)(9000) - (500)(8000) = 100,000.
Then, Cov[X, Y] = Eθ[Cov[X, Y | θ]] + Covθ[E[X | θ], E[Y | θ]] = 400,000 + 100,000 = 500,000.

89 See for example, Howard Mahlerʼs discussion of Glenn Meyersʼ “An Analysis of Experience Rating”, PCAS 1987.
90 This is a generalization of the fact that Total Variance = EPV + VHM.
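A minimal Python sketch checking the covariance decomposition for the two-type example above:

p   = [0.5, 0.5]                            # the two types are equally likely
ex  = [400, 600]                            # E[X | type]
ey  = [7000, 9000]                          # E[Y | type]
exy = [3_000_000, 6_000_000]                # E[XY | type]

cov_within = [xy - x * y for xy, x, y in zip(exy, ex, ey)]          # 200,000 and 600,000
epc = sum(pi * c for pi, c in zip(p, cov_within))                   # 400,000

mx = sum(pi * x for pi, x in zip(p, ex))                            # 500
my = sum(pi * y for pi, y in zip(p, ey))                            # 8,000
chm = sum(pi * x * y for pi, x, y in zip(p, ex, ey)) - mx * my      # 100,000

total_cov = sum(pi * xy for pi, xy in zip(p, exy)) - mx * my        # 500,000
print(epc + chm, total_cov)                                         # both 500,000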
Mixed Distributions:

If λ is the parameter being mixed, then one can split the variance of a mixed distribution into two pieces as: Var[X] = E[Var[X | λ]] + Var[E[X | λ]].91
Var[E[X | λ]] > 0. ⇒ Var[X] > E[Var[X | λ]].
⇔ Variance of a mixture > Average of the variances of the components. Mixing increases the variance.
Exercise: A claim count distribution can be expressed as a mixed Poisson distribution. The mean of the Poisson distribution is uniformly distributed over the interval [0, 5]. Determine the variance of the mixed distribution.
[Solution: The distribution of λ has mean 2.5 and variance: 5²/12 = 25/12.
Var[N] = E[Var[N | λ]] + Var[E[N | λ]] = E[λ] + Var[λ] = 2.5 + 25/12 = 4.583.
Alternately, the mean of the mixed distribution is: E[λ] = 2.5. Given lambda, the second moment of each Poisson is: λ + λ². The second moment of the mixed distribution is: E[λ + λ²] = E[λ] + E[λ²] = 2.5 + (25/12 + 2.5²) = 10.833. Variance of the mixed distribution is: 10.833 - 2.5² = 4.583.]
Exercise: One has a two-point mixture of Binomials. The first component has m = 3 and q = 0.2. The second component has m = 4 and q = 0.1. The first component is given 70% weight and the second component is given weight 30%. Determine the variance of the mixed distribution.
[Solution: The first Binomial distribution has mean 0.6 and variance 0.48. The second Binomial distribution has mean 0.4 and variance 0.36. The EPV is: (0.7)(0.48) + (0.3)(0.36) = 0.444. The overall mean is: (0.7)(0.6) + (0.3)(0.4) = 0.54. The second moment of the hypothetical means is: (0.7)(0.6²) + (0.3)(0.4²) = 0.30. The VHM is: 0.30 - 0.54² = 0.0084. The variance of the mixture is: EPV + VHM = 0.444 + 0.0084 = 0.4524. Alternately, the second moment of the mixture is: (0.7)(0.48 + 0.6²) + (0.3)(0.36 + 0.4²) = 0.744. The variance of the mixture is: 0.744 - 0.54² = 0.4524.]

91 See Theorem 5.7 in Loss Models. As discussed, the first piece, E[Var[X | λ]], is the Expected Value of the Process Variance, while the second piece, Var[E[X | λ]], is the Variance of the Hypothetical Means. Total Variance = Expected Value of the Process Variance + Variance of the Hypothetical Means.
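A minimal Python sketch of the two-point Binomial mixture in the second exercise, computing the variance both ways:

w         = [0.7, 0.3]                        # mixing weights
means     = [3 * 0.2, 4 * 0.1]                # Binomial means m*q: 0.6 and 0.4
variances = [3 * 0.2 * 0.8, 4 * 0.1 * 0.9]    # Binomial variances m*q*(1-q): 0.48 and 0.36

epv = sum(wi * v for wi, v in zip(w, variances))                      # 0.444
m1  = sum(wi * m for wi, m in zip(w, means))                          # 0.54
vhm = sum(wi * m**2 for wi, m in zip(w, means)) - m1**2               # 0.0084

second = sum(wi * (v + m**2) for wi, v, m in zip(w, variances, means))
print(epv + vhm, second - m1**2)                                      # both 0.4524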
Problems:

Use the following information for the next 2 questions:
There are three types of risks. Each risk has either one or zero claims per year.

Type of Risk     Chance of a Claim     A Priori Chance of Type of Risk
A                30%                   50%
B                40%                   35%
C                50%                   15%

7.1 (2 points) What is the Expected Value of the Process Variance?
A. Less than 0.19
B. At least 0.19 but less than 0.20
C. At least 0.20 but less than 0.21
D. At least 0.21 but less than 0.22
E. At least 0.22

7.2 (2 points) What is the Variance of the Hypothetical Means?
A. Less than 0.006
B. At least 0.006 but less than 0.007
C. At least 0.007 but less than 0.008
D. At least 0.008 but less than 0.009
E. At least 0.009
7.3 (3 points) For a large group of insureds:
• The hypothetical mean frequency for an individual insured is m.
• Over the group, m is distributed uniformly on the interval (0, 5].
• The severity for an individual insured is Exponential: f(x) = (1/r) exp(-x/r), x ≥ 0.
• Over the group, r is distributed: g(r) = 2r/9, 0 ≤ r ≤ 3.
• For any individual insured, frequency and severity are independent.
• m and r are independently distributed.
In which range is the variance of the hypothetical mean pure premiums for this class of risks?
A. Less than 6
B. At least 6, but less than 8
C. At least 8, but less than 10
D. At least 10, but less than 12
E. 12 or more
Use the following information in the next two questions: An insured population consists of 12% youthful drivers and 88% adult drivers. Based on experience, we have derived the following probabilities that an individual driver will have n claims in a year's time: n Youthful Adult 0 0.85 0.95 1 0.10 0.04 2 0.04 0.01 3 0.01 0.00 7.4 (2 points) What is the Expected Value of the Process Variance? A. Less than 0.07 B. At least 0.07 but less than 0.09 C. At least 0.09 but less than 0.11 D. At least 0.11 but less than 0.13 E. At least 0.13 7.5 (1 point) What is the Variance of the Hypothetical Means? A. 0.0018 B. 0.0020 C. 0.0022 D. 0.0024 E. 0.0026
Use the following information in the next two questions: A portfolio of three risks exists with the claim frequency for each risk Normally Distributed with mean and standard deviations: Risk Mean Standard Deviation A 0.10 0.03 B 0.30 0.05 C 0.50 0.01 7.6 (1 point) What is the Expected Value of the Process Variance? A. 0.0010 B. 0.0012 C. 0.0014 D. 0.0016 E. 0.0018 7.7 (1 point) What is the Variance of the Hypothetical Means? A. Less than 0.023 B. At least 0.023 but less than 0.025 C. At least 0.025 but less than 0.027 D. At least 0.027 but less than 0.029 E. At least 0.029
Use the following information for the next 6 questions: Two dice, A1 and A2 , are used to determine the number of claims. Each side of both dice are marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. The probability of a claim for each die is: Die Probability of Claim 2/6 A1 A2
3/6
In addition, there are two spinners, B1 and B2 , representing claim severity. Each spinner has two areas marked 20 and 50. The probabilities for each claim size are: Claim Size Spinner 20 50 B1 0.60 0.40 B2
0.20 0.80 A single observation consists of selecting a die randomly from A1 and A2 and a spinner randomly from B1 and B2, rolling the selected die, and if there is a claim spinning the selected spinner. 7.8 (1 point) Determine the Expected Value of the Process Variance for the frequency. A. Less than 0.22 B. At least 0.22 but less than 0.23 C. At least 0.23 but less than 0.24 D. At least 0.24 but less than 0.25 E. At least 0.25 7.9 (1 point) Determine the Variance of the Hypothetical Mean frequencies. A. Less than 0.0060 B. At least 0.0060 but less than 0.0063 C. At least 0.0063 but less than 0.0066 D. At least 0.0066 but less than 0.0069 E. At least 0.0069 7.10 (2 points) Determine the Expected Value of the Process Variance for the severity. A. Less than 150 B. At least 150 but less than 170 C. At least 170 but less than 190 D. At least 190 but less than 210 E. At least 210
7.11 (1 point) Determine the Variance of the Hypothetical Mean severities. A. Less than 33 B. At least 33 but less than 34 C. At least 34 but less than 35 D. At least 35 but less than 36 E. At least 36 7.12 (2 points) Determine the Expected Value of the Process Variance for the pure premium. A. Less than 420 B. At least 420 but less than 425 C. At least 425 but less than 430 D. At least 430 but less than 435 E. At least 435 7.13 (2 points) Determine the Variance of the Hypothetical Mean pure premiums. A. Less than 15.3 B. At least 15.3 but less than 15.8 C. At least 15.8 but less than 16.3 D. At least 16.3 but less than 16.8 E. At least 16.8 Use the following information for the next two questions: There are two types of urns, each with many balls labeled $1000 and $2000. A Priori Chance of Percentage of Percentage of Type of Urn This Type of Urn $1000 Balls $2000 Balls I 80% 90% 10% II 20% 70% 30% 7.14 (2 points) What is the Expected Value of the Process Variance? A. Less than 90,000 B. At least 90,000 but less than 100,000 C. At least 100,000 but less than 110,000 D. At least 110,000 but less than 120,000 E. At least 120,000 7.15 (2 points) What is the Variance of the Hypothetical Means? A. Less than 5000 B. At least 5000 but less than 6000 C. At least 6000 but less than 7000 D. At least 7000 but less than 8000 E. At least 8000
Use the following information for the next 6 questions:
• For an individual insured, frequency and severity are independent.
• For an individual insured, frequency is given by a Poisson Distribution.
• For an individual insured, severity is given by an Exponential Distribution.
• Each type is homogeneous; i.e., every insured of a given type has the same frequency process and severity process.

Type     Portion of Insureds     Mean         Mean
         in this Type            Frequency    Severity
1        30%                     4            50
2        45%                     6            100
3        25%                     9            200

7.16 (2 points) What is the Expected Value of the Process Variance for the frequency?
A. Less than 6.0  B. At least 6.0 but less than 6.2  C. At least 6.2 but less than 6.4  D. At least 6.4 but less than 6.6  E. At least 6.6
7.17 (2 points) What is the Variance of the Hypothetical Mean frequencies?
A. Less than 3.1  B. At least 3.1 but less than 3.3  C. At least 3.3 but less than 3.5  D. At least 3.5 but less than 3.7  E. At least 3.7
7.18 (2 points) What is the Expected Value of the Process Variance for the severity?
A. Less than 20,000  B. At least 20,000 but less than 20,500  C. At least 20,500 but less than 21,000  D. At least 21,000 but less than 21,500  E. At least 21,500
7.19 (2 points) What is the Variance of the Hypothetical Mean severities?
A. 3400  B. 3600  C. 3800  D. 4000  E. 4200
7.20 (3 points) What is the Expected Value of the Process Variance for the pure premium? A. Less than 200,000 B. At least 200,000 but less than 250,000 C. At least 250,000 but less than 300,000 D. At least 300,000 but less than 350,000 E. At least 350,000 7.21 (2 points) What is the Variance of the Hypothetical Mean pure premiums? A. 275,000 B. 300,000 C. 325,000 D. 350,000 E. 375,000
Use the following information in the next two questions: The number of claims is given by a Binomial distribution with parameters m = 10 and q. The prior distribution of q is uniform on [0, 1]. 7.22 (2 points) What is the Expected Value of the Process Variance? A. Less than 1.8 B. At least 1.8 but less than 2.0 C. At least 2.0 but less than 2.2 D. At least 2.2 but less than 2.4 E. At least 2.4 7.23 (2 point) What is the Variance of the Hypothetical Means? A. Less than 8.2 B. At least 8.2 but less than 8.4 C. At least 8.4 but less than 8.6 D. At least 8.6 but less than 8.8 E. At least 8.8
Use the following information for the next two questions: (i) Xi is the claim count observed for insured i for one year. (ii) Xi has a negative binomial distribution with parameters r = 2 and βi. (iii) The βiʻs have a distribution π[β] = 280β4 / (1 + β)9 , 0 < β < ∞. 7.24 (3 points) What is the Expected Value of the Process Variance of claim frequency? A. 7 B. 9 C. 11 D. 13 E. 15 7.25 (2 points) What is the Variance of the Hypothetical Mean claim frequencies? A. 7 B. 9 C. 11 D. 13 E. 15
7.26 (2 points) You are given the following:
• A portfolio of risks consists of 2 classes, A and B.
• For an individual risk in either class, the number of claims follows a Poisson distribution with mean λ.

                                               Distribution of Lambdas within Class
Class              Number of Exposures        Mean       Standard Deviation
A                  700                        0.080      0.17
B                  300                        0.200      0.24
Total Portfolio    1,000

Determine the standard deviation of the distribution of λ for the individuals within the total portfolio.
A. 0.19  B. 0.20  C. 0.21  D. 0.22  E. 0.23
Use the following information for the next two questions: • Number of claims for a single insured follows a Poisson distribution with mean λ. • The amount of a single claim has a Pareto distribution with α = 4 given by: F(x) = 1 - {θ/(θ + x)}4
x > 0, θ > 0.
• θ and λ are independent random variables. • E[λ] = 0.70, var(λ) = 0.20 • E[θ] = 100, var(θ) = 40,000 • Number of claims and claim severity distributions are independent. 7.27 (2 points) Determine the expected value of the pure premium's process variance for a single risk. A. Less than 10,000 B. At least 10,000 but less than 11,000 C. At least 11,000 but less than 12,000 D. At least 12,000 but less than 13,000 E. At least 13,000 7.28 (2 points) Determine the variance of the hypothetical mean pure premiums. A. Less than 2000 B. At least 2000 but less than 2500 C. At least 2500 but less than 3000 D. At least 3000 but less than 3500 E. At least 3500
Use the following information for the next four questions:
• There are two types of insureds, A and B.
• 30% are Type A and 70% are Type B.
• Each insured has a LogNormal Distribution of size of loss.
• Type A has µ = 4 and σ = 1.
• Type B has µ = 4 and σ = 1.5.
7.29 (2 points) What is the Expected Value of the Process Variance for severities?
A. Less than 150,000
B. At least 150,000 but less than 160,000
C. At least 160,000 but less than 170,000
D. At least 170,000 but less than 180,000
E. At least 180,000
7.30 (2 points) What is the Variance of the Hypothetical Mean Severities?
A. 1,100  B. 1,200  C. 1,300  D. 1,400  E. 1,500
7.31 (2 points) The mean frequency for Type A is 3, while that for Type B is 2. What is the Expected Value of the Process Variance for severities? A. Less than 150,000 B. At least 150,000 but less than 160,000 C. At least 160,000 but less than 170,000 D. At least 170,000 but less than 180,000 E. At least 180,000 7.32 (2 points) The mean frequency for Type A is 3, while that for Type B is 2. What is the Variance of the Hypothetical Mean Severities? A. Less than 1,000 B. At least 1,100 but less than 1,200 C. At least 1,200 but less than 1,300 D. At least 1,300 but less than 1,400 E. At least 1,400 7.33 (3 points) You are given the following: • The amount of an individual claim has an Inverse Gamma distribution with shape parameter α = 5 and scale parameter θ. • The parameter θ is distributed via an Exponential Distribution with mean 60. What is the variance of the mixed distribution? A. 350 B. 375 C. 400 D. 425
E. 450
Use the following information for the next two questions: There are three dice : Die A Die B Die C 2 faces labeled 0 4 faces labeled 0 5 faces labeled 0 4 faces labeled 1 2 faces labeled 1 1 faces labeled 1 7.34 (2 points) What is the Expected Value of the Process Variance? A. Less than 0.15 B. At least 0.15 but less than 0.16 C. At least 0.16 but less than 0.17 D. At least 0.17 but less than 0.18 E. At least 0.18 7.35 (1 point) What is the Variance of the Hypothetical Means? A. Less than 0.04 B. At least 0.04 but less than 0.05 C. At least 0.05 but less than 0.06 D. At least 0.06 but less than 0.07 E. At least 0.07
Use the following joint distribution for the next two questions:

                     Θ
X           100         200
0           0.2         0.1
10          0.1         0.3
40          0.1         0.2

7.36 (2 points) What is the Expected Value of the Process Variance?
(A) 245  (B) 250  (C) 255  (D) 260  (E) 265
7.37 (2 points) What is the Variance of the Hypothetical Means?
(A) 5  (B) 6  (C) 7  (D) 8  (E) 9
7.38 (2 points) For each insured frequency is Poisson with mean λ. For each insured the severity distribution has a parameter θ. For each insured frequency and severity are independent. λ and θ vary across a portfolio of insureds independently of each other. The mean frequency for the portfolio is 10%. For aggregate losses, the expected value of the process variance is 20,000. Determine the second moment of the mixed severity distribution. (A) 100,000 (B) 200,000 (C) 300,000 (D) 400,000 (E) Can not be determined Use the following information for the next three questions: Severity is LogNormal with parameters m and v. m varies across the portfolio via a Normal Distribution with parameters µ and σ. 7.39 (3 points) What is the Expected Value of the Process Variance? 7.40 (3 points) What is the Variance of the Hypothetical Means? 7.41 (1 point) What is the Buhlmann Credibility Parameter K = EPV/VHM?
7.42 (4 points) For each of the following models of a baseball team, determine the standard deviation of the number of games won in a year. The team plays 162 games in a year. (a) There is a 50% probability of winning each game independent of any other game. (b) The team plays 81 road games with a 45% probability of winning each game independent of any other road game. The team plays 81 home games with a 55% probability of winning each game independent of any other home game. (c) For each game there is equally likely to be a 45% or 55% probability of winning that game. The winning probability of each game is independent of that of any other game. The outcome of each game is independent of that of any other game. (d) The team is equally likely to have a 45% or 55% probability of winning each game. In any single year, the winning probability of each game is equal to that of any other game. The outcome of each game is independent of that of any other game. 7.43 (2 points) For each insured frequency is Poisson with mean λ. λ varies across a set of insureds via a Poisson Distribution with mean µ. Determine the variance of the mixed distribution. A. µ
B. 2µ
C. µ + µ2
D. 2µ2
E. Can not be determined
7.44 (3 points) Use the following information:
• Frequency for an individual is a 50-50 mixture of two Poissons with means λ and 2λ.
• The prior distribution of λ is Exponential with a mean of 0.1.
Determine the Buhlmann Credibility Parameter K = (the expected value of the process variance) / (the variance of the hypothetical means).
A. 6  B. 7  C. 8  D. 9  E. 10
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 305
7.48 (4, 5/87,Q.31) (1 point) Let X, Y, and Z be discrete random variables. Which of the following statements are true? 1. If Z is the sum of X and Y, the variance of Z is the sum of the variance of X and the variance of Y. 2. If Z is the difference between X and Y, the variance of Z is the difference between the variance of X and the variance of Y. 3. EY[VAR[X|Y]] = VARY [E[X|Y]]. A. 1
B. 1, 2
C. 1, 3
D. 1, 2, 3
E. None of A, B, C, D.
7.49 (4B, 11/92, Q.23) (2 points) You are given the following: • A portfolio of risks consists of 2 classes, A and B.
•
For an individual risk in either class, the number of claims follows a Poisson distribution. Distribution of Claim Frequency Rates Mean Standard Deviation 0.050 0.227 0.210 0.561
Class Number of Exposures A 500 B 500 Total Portfolio 1,000 Determine the standard deviation of the claim frequency for the total portfolio. A. Less than 0.390 B. At least 0.390 but less than 0.410 C. At least 0.410 but less than 0.430 D. At least 0.430 but less than 0.450 E. At least 0.450
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 306
Use the following information for the next two questions: • Number of claims for a single insured follows a Poisson distribution with mean λ. • The amount of a single claim has an exponential distribution given by: f(x) = (1/θ) e-x/θ, x > 0, θ > 0. • θ and λ are independent random variables. • E[λ] = 0.10, var(λ) = 0.0025. • E[θ] = 1000, var(θ) = 640,000. • Number of claims and claim severity distributions are independent. 7.50 (4B, 5/93, Q.21) (2 points) Determine the expected value of the pure premium's process variance for a single risk. A. Less than 150,000 B. At least 150,000 but less than 200,000 C. At least 200,000 but less than 250,000 D. At least 250,000 but less than 300,000 E. At least 300,000 7.51 (4B, 5/93, Q.22) (2 points) Determine the variance of the hypothetical means for the pure premium. A. Less than 10,000 B. At least 10,000 but less than 20,000 C. At least 20,000 but less than 30,000 D. At least 30,000 but less than 40,000 E. At least 40,000
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 307
Use the following information for the next two questions. • The number of claims for a single risk follows a Poisson distribution with mean m. • m is a random variable with E[m] = 0.40 and var(m) = 0.10. • The amount of an individual claim has a uniform distribution on [0, 100,000]. • The number of claims and the amount of an individual claim are independent. 7.52 (4B, 5/94, Q.22) (3 points) Determine the expected value of the pure premium's process variance for a single risk. A. Less than 400 million B. At least 400 million, but less than 800 million C. At least 800 million, but less than 1,200 million D. At least 1,200 million, but less than 1,600 million E. At least 1,600 million 7.53 (4B, 5/94, Q.23) (2 points) Determine the variance of the hypothetical means for the pure premium. A. Less than 400 million B. At least 400 million, but less than 800 million C. At least 800 million, but less than 1,200 million D. At least 1,200 million, but less than 1,600 million E. At least 1,600 million
7.54 (4B, 5/95, Q.4) (3 points) You are given the following: • The number of losses for a single risk follows a Poisson distribution with mean m. • The amount of an individual loss follows an exponential distribution with mean 1000/m and variance (1000/m)2 . • m is a random variable with density function f(m) = (1 + m)/ 6, 1 < m < 3 • The number of losses and the individual loss amounts are independent. Determine the expected value of the pure premium's process variance for a single risk. A. Less than 940,000 B. At least 940,000, but less than 980,000 C. At least 980,000, but less than 1,020,000 D. At least 1,020,000, but less than 1,060,000 E. At least 1,060,000
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 308
7.55 (4B, 5/96, Q.11 & 4B, 11/98, Q.21) (3 points) You are given the following:
• The number of claims for a single risk follows a Poisson distribution with mean λ. • The amount of an individual claim is always 1,000λ. • λ is a random variable with density function f(λ) = 4/λ5, 1 < λ < ∞. Determine the expected value of the process variance of the aggregate losses for a single risk. A. Less than 1,500,000 B. At least 1,500,000, but less than 2,500,000 C. At least 2,500,000, but less than 3,500,000 D. At least 3,500,000, but less than 4,500,000 E. At least 4,500,000 7.56 (4B, 11/97, Q.17) (2 points) You are given the following: • The number of claims follows a Poisson distribution with mean λ . • Claim sizes follow the following distribution: Claim Size Probability 2λ
1/3
8λ
2/3
• The prior distribution for λ is: λ
Probability
1 1/3 2 1/3 3 1/3 • The number of claims and claim sizes are independent. Determine the expected value of the process variance of the aggregate losses. A. Less than 150 B. At least 150, but less than 300 C. At least 300, but less than 450 D. At least 450, but less than 600 E. At least 600
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 309
7.57 (4B, 5/98, Q.7) (2 points) You are given the following: • The number of claims during one exposure period follows a Bernoulli distribution with mean p. • The prior density function of p is assumed to be f(p) = (π/2) sin(πp/2) , 0 < p < 1. 1
Hint:
∫
(πp/2) sin(πp/2) dp = 2/π and
0
1
∫
(πp2 /2) sin(πp/2) dp = 4(π - 2)/π2 .
0
Determine the expected value of the process variance. B. 2(4 - π )/π2
A. 4(π - 3)/π2
C. 4(π - 2)/π2
D. 2/π
E. (4 - π ) / {2(π - 3)}
7.58 (4B, 5/98, Q.26) (3 points) You are given the following: • The number of claims follows a Poisson distribution with mean m. •
Claim sizes follow a distribution with mean 20m and variance 400m2 .
•
m is a gamma random variable with density function f(m) = m2 e-m / 2 , 0 < m < ∞.
• For any value of m, the number of claims and the claim sizes are independent. Determine the expected value of the process variance of the aggregate losses. A. Less than 10,000 B At least 10,000, but less than 25,000 C. At least 25,000, but less than 40,000 D. At least 40,000, but less than 55,000 E. At least 55,000 Use the following information for the next two questions: •
Claim sizes follow a Pareto distribution, with parameters θ and α = 3.
•
The prior distribution of θ has density function f(θ) = e-θ, 0 < θ < ∞.
7.59 (4B, 5/99, Q.5) (2 points) Determine the expected value of the process variance. A. 3/8
B. 3/4
C. 3/2
D. 3
E. 6
7.60 (4B, 5/99, Q.6) (2 points) Determine the variance of the hypothetical means. A. 1/4
B. 1/2
C. 1
D. 2
E. 4
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 310
7.61 (4B, 11/99, Q.4) (2 points) You are given the following: • Claim sizes for a given policyholder follow a distribution with density function f(x) = 2x/b2 , 0 < x < b. • The prior distribution of b has density function g(b) = 1/b2 , 1 < b < ∞ . Determine the expected value of the process variance. A. 0 B. 1/18 C. 4/9 D. 1/2 E. ∞ 7.62 (IOA 101, 9/01, Q.5) (2.25 points) The number of claims, X, to be processed in a day by an employee of an insurance company is modeled as X ~ Poisson with mean 10. The time (minutes) the employee takes, Y, to process x claims is modeled as having a distribution with conditional mean and variance given by E[Y | X = x] = 15x + 20, Var[Y | X = x] = x + 12. Calculate the unconditional variance of the time the employee takes to process claims in a day. 7.63 (4, 5/05, Q.13 & 2009 Sample Q.183) (2.9 points) You are given claim count data for which the sample mean is roughly equal to the sample variance. Thus you would like to use a claim count model that has its mean equal to its variance. An obvious choice is the Poisson distribution. Determine which of the following models may also be appropriate. (A) A mixture of two binomial distributions with different means (B) A mixture of two Poisson distributions with different means (C) A mixture of two negative binomial distributions with different means (D) None of (A), (B) or (C) (E) All of (A), (B) and (C)
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 311
Solutions to Problems: 7.1. E. For a Bernoulli the process variance is q(1-q). Type of Risk A B C
A Priori Probability 0.50 0.35 0.15
q
Process Variance 0.21 0.24 0.25
0.3 0.4 0.5
Average
0.2265
7.2. A. The Variance of the Hypothetical Means = 0.1385 - 0.3652 = 0.005275. Type of Risk A B C
A Priori Probability 0.50 0.35 0.15
Average
Mean 0.3 0.4 0.5
Square of Mean 0.09 0.16 0.25
0.3650
0.1385
7.3. E. m is the mean claim frequency for an insured, and h(m) = 1/5 on [0, 5]. The mean severity for an insured is r, since that is the mean for the given exponential distribution. Therefore for a given insured the mean pure premium is mr. The first moment of the hypothetical mean pure premiums is (since the distributions of m and r are independent): m=5
∫ m=0
r=3
m=5
r=3
∫m r g(r) h(m) dr dm = ∫m/5 dm ∫ r (2r/9) dr = (2.5)(2) = 5. r=0
m=0
r=0
Similarly, the second moment of the hypothetical mean pure premiums is: m=5
r=3
∫
∫m2 r2 g(r) h(m) dr dm = ∫m2/5 dm ∫ r2 (2r/9) dr = (25/3)(9/2) = 37.5.
m=0
r=0
m=5
m=0
r=3
r=0
Thus, the variance of the hypothetical mean pure premiums is: 37.5 - 52 = 12.5. Comment: When two variables are independent, the second moment of their product is equal to the product of their second moments. The same is not true for variances.
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 312
7.4. C. The process variance for the Youthful drivers is: .35 - .212 = .3059. Number of Claims 0 1 2 3
Probability for Youthful 0.85 0.10 0.04 0.01
Average Number of Claims 0 1 2 3
Probability for Adult 0.95 0.04 0.01 0.00
Average
n 0 1 2 3
Square of n 0 1 4 9
0.2100
0.3500
n 0 1 2 3
Square of n 0 1 4 9
0.0600
0.0800
Thus the process variance for the Adult drivers is .08 - .062 = .0764. Thus the Expected Value of the Process Variance = (.0764)( 88%) + (.3059)(12%) = 0.104. 7.5. D. The Variance of the Hypothetical Means = .00846 - .07802 = 0.00238. Type of Driver Youthful Adult
A Priori Probability 0.1200 0.8800
Average
Mean 0.2100 0.0600
Square of Mean 0.04410 0.00360
0.0780
0.00846
7.6. B. The process variances are the squares of the standard deviations: .0009, .0025, .0001. Averaging we get EPV = (.0035) /3 = 0.001167. 7.7. C. The Variance of the Hypothetical Means = .1167 - .32 = 0.0267. Type of Risk A B C Average
A Priori Probability 0.3333 0.3333 0.3333
Mean 0.1 0.3 0.5
Square of Mean 0.01 0.09 0.25
0.3000
0.1167
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 313
7.8. C. For a Bernoulli the process variance is q(1-q). For example for Die A1 , the process variance = (2/6)(1- 2/6) = 2/9 = .2222. Type of Die A1 A2
Bernoulli Parameter 0.3333 0.5000
A Priori Probability 0.50 0.50
Average
Process Variance 0.2222 0.2500 0.2361
7.9. E. The Variance of the Hypothetical Means = .18056 - .416672 = 0.00695. Type of Die A1 A2
A Priori Probability 0.50 0.50
Average
Mean 0.33333 0.50000
Square of Mean 0.11111 0.25000
0.41667
0.18056
7.10. C. For spinner B1 the first moment is: (20)(.6) + (50)(.4) = 32 and the second moment is: (202 )(.6) + (502 )(.4) = 1240. Thus the process variance is: 1240 - 322 = 216. For spinner B2 the first moment is: (20)(.2) + (50)(.8) = 44 and the second moment is: (202 )(.2) + (502 )(.8) = 2080. Thus the process variance is: 2080 - 442 = 144. Therefore, the expected value of the process variance is: (1/2)(216) + (1/2)(144) = 180. Type of Spinner B1 B2
A Priori Probability 0.50 0.50
Mean 32 44
Second Moment 1240 2080
Average
Process Variance 216 144 180
7.11. E. The Variance of the Hypothetical Means = 1480 - 382 = 36. Type of Spinner B1 B2 Average
A Priori Probability 0.50 0.50
Mean 32 44
Square of Mean 1024 1936
38
1480
Comment: Note that the spinners are chosen independently of the dice, so frequency and severity are independent across risk types. Thus one can ignore the frequency process in this and the prior question. One can not do so when for example low frequency is associated with low severity, as in the questions related to “good”, “bad” and “ugly “ drivers.
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 314
7.12. B. For each possible pair of die and spinner use the formula: variance of p.p. = µf σs2 + µs2 σf2. Die and Spinner A1, B1 A1, B2 A2, B1 A2, B2
A Priori Chance of Risk 0.250 0.250 0.250 0.250
Mean Freq. 0.333 0.333 0.500 0.500
Variance of Freq. 0.222 0.222 0.250 0.250
Mean Severity 32 44 32 44
Process Variance of P.P. 299.6 478.2 364.0 556.0
Variance of Sev. 216 144 216 144
Mean
424.4
Comment: It is a much longer problem if one does not make use of values calculated in the solutions to the previous questions. 7.13. D. The Variance of the Hypothetical Means = 267.222 - 15.8332 = 16.53. Die and Spinner A1, B1 A1, B2 A2, B1 A2, B2
A Priori Chance of Risk 0.250 0.250 0.250 0.250
Mean Freq. 0.333 0.333 0.500 0.500
Mean Severity 32 44 32 44
Mean
Mean Pure Premium 10.667 14.667 16.000 22.000
Square of Mean P.P. 113.778 215.111 256.000 484.000
15.833
267.222
7.14. D. For example, the second moment of Urn II is (.7)(10002 ) + (.3)(20002 ) = 1,900,000. The process variance of Urn II = 1,900,000 - 13002 = 210,000. Type of Urn I II
A Priori Probability 0.8000 0.2000
Mean 1100 1300
Second Moment 1,300,000 1,900,000
Average
Process Variance 90,000 210,000 114,000
7.15. C. The variance of the hypothetical means is: 1,306,000 - 11402 = 6400. Type of Urn I II Average
A Priori Probability 0.8000 0.2000
Mean 1100 1300
Square of Mean 1,210,000 1,690,000
1,140
1,306,000
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 315
7.16. B. For the Poisson the process variance is the equal to the mean. The expected value of the process variance is the weighted average of the process variances for the individual types, using the a priori probabilities as the weights. The EPV of the frequency = (30%)(4) + (45%)(6) + (25%)(9) = 6.15. Type 1 2 3
A Priori Probability 30% 45% 25%
Poisson Parameter 4 6 9
Average
Process Variance 4 6 9 6.15
7.17. C. One computes the first and 2nd moments of the mean frequencies as follows: Type 1 2 3 Average
A Priori Probability 30% 45% 25%
Poisson Parameter 4 6 9
Mean Frequency 4 6 9
Square of Mean Freq. 16 36 81
6.15
41.25
Then the variance of the hypothetical mean frequencies = 41.25 - 6.152 = 3.43. Comment: Using the solution to this question and the previous question, as explained in the next section, the Buhlmann Credibility parameter for frequency is K = EPV/VHM = 6.15/3.43 = 1.79. The Buhlmann Credibility applied to the observation of the frequency for E exposures would be: Z = E / (E + 1.79).
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 316
7.18. A. One has to weight together the process variances of the severities for the individual types using the chance that a claim came from each type. The chance that a claim came from an individual type is proportional to the product of the a priori chance of an insured being of that type and the mean frequency for that type. Parameterize the Exponential with mean θ. For type 1, the process variance of the Exponential severity is: θ2 = 502 = 2500. The mean frequencies are: 4, 6, and 9. The a priori chances of each type are: 30%, 45% and 25%. Thus the weights to use to compute the EPV of the severity are (4)(30%), (6)(45%), (9)(25%) = 1.2, 2.7, 2.25. The expected value of the process variance of the severity is the weighted average of the process variances for the individual types, using these weights. EPV = {(1.2)(2500) + (2.7)(10000) + (2.25)(40000)}/(1.2 + 2.7 + 2.25) = 19,512. A
Type 1 2 3
B
C
D
E
F
A Priori
Mean
Weights =
Exponential Parameter
Process
θ 50 100 200
Variance 2,500 10,000 40,000
Probability 30% 45% 25%
Frequency Col. B x Col C. 4 1.20 6 2.70 9 2.25
Average
19,512
6.15
7.19. A. As in the previous question, in computing the moments one has to use as weights for each individual type the chance that a claim came from that type. A
B
C
D
E
F
Type
A Priori Probability
Mean Frequency
Weights Col.B x Col.C
Mean Severity
Square of Mean Severity
1 2 3
30% 45% 25%
4 6 9
1.20 2.70 2.25
50 100 200
2,500 10,000 40,000
6.15
126.83
19,512
Average
Then the variance of the hypothetical mean severities = 19512 - 126.832 = 3426. Comment: As explained in the next section, the Buhlmann Credibility parameter for severity is K = EPV/VHM = 19512 / 3426 = 5.7. The Buhlmann Credibility applied to the observation of the mean severity for N claims would be: Z = N / (N + 5.7).
2013-4-9
Buhlmann Cred. §7 EPV and VHM,
HCM 10/19/12,
Page 317
7.20. B. Since frequency and severity are assumed to be independent, the process variance of the pure premium = (Mean Frequency)(Variance of Severity) + (Mean Severity)²(Variance of Frequency).
Type      A Priori Probability   Mean Frequency   Variance of Frequency   Mean Severity   Variance of Severity   Process Variance
1         30%                    4                4                       50              2500                   20,000
2         45%                    6                6                       100             10000                  120,000
3         25%                    9                9                       200             40000                  720,000
Average                                                                                                          240,000
7.21. E. The mean pure premium = (Mean Frequency)(Mean Severity). Then one computes the first and second moments of the mean pure premiums as follows:
Type      A Priori Probability   Mean Frequency   Mean Severity   Mean Pure Premium   Square of Pure Premium
1         30%                    4                50              200                 40,000
2         45%                    6                100             600                 360,000
3         25%                    9                200             1,800               3,240,000
Average                                                           780.00              984,000
Then the VHM of the pure premiums = 984,000 - 780² = 375,600.
Comment: Using the solution to this question and the previous question, as explained in the next section, the Buhlmann Credibility parameter for the pure premium is K = EPV/VHM = 240,000 / 375,600 = 0.64. The Buhlmann Credibility applied to the observation of the pure premium for E exposures would be: Z = E / (E + 0.64).

7.22. A. The process variance for a Binomial is: mq(1-q) = 10q(1-q) = 10q - 10q².
EPV = 10E[q - q²] = 10 ∫₀¹ (q - q²) dq = 10(1/2 - 1/3) = 10/6 = 1.67.
7.23. B. For the Binomial Distribution the mean is mq = 10q.
VHM = Var[10q] = 100 Var[q] = 100{E[q²] - E[q]²} = 100{1/3 - (1/2)²} = 100/12 = 8.33.
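These two integrals are simple enough to check by numerical integration; a Python sketch of my own, assuming scipy is available:

from scipy.integrate import quad

# q is uniform on (0, 1); frequency is Binomial with m = 10.
epv = 10 * quad(lambda q: q - q**2, 0, 1)[0]        # 10/6 = 1.667
e_q  = quad(lambda q: q, 0, 1)[0]                   # 1/2
e_q2 = quad(lambda q: q**2, 0, 1)[0]                # 1/3
vhm = 100 * (e_q2 - e_q**2)                         # 100/12 = 8.333
print(epv, vhm)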
7.24. D. Making the change of variables, x = β/(1+β), so that β = x/(1-x) and dβ = dx/(1-x)²:
E[β] = ∫₀^∞ β π(β) dβ = 280 ∫₀^∞ β⁵/(1+β)⁹ dβ = 280 ∫₀¹ x⁵(1-x)⁴ dx/(1-x)² = 280 ∫₀¹ x⁵(1-x)² dx
= (280) Γ(6)Γ(3)/Γ(6 + 3) = (280)(5! 2! / 8!) = 5/3.
E[β²] = ∫₀^∞ β² π(β) dβ = 280 ∫₀^∞ β⁶/(1+β)⁹ dβ = 280 ∫₀¹ x⁶(1-x)³ dx/(1-x)² = 280 ∫₀¹ x⁶(1-x) dx
= (280) Γ(7)Γ(2)/Γ(7 + 2) = (280)(6! 1! / 8!) = 5.
Process Variance = rβ(1+β) = 2β + 2β².
EPV = E[2β + 2β²] = 2E[β] + 2E[β²] = (2)(5/3) + (2)(5) = 13.33.
Comment: By a change of variables the distribution of the parameter was converted to a Beta distribution and the integral into a Beta type integral. The Beta Distribution and Beta Type Integrals are discussed in “Mahlerʼs Guide to Conjugate Priors.” See also page 2 of the tables attached to your exam. If π[β] is proportional to β^(a-1)/(1+β)^(a+b), then x = β/(1+β) follows a Beta Distribution with parameters a and b.

7.25. B. VHM = Var[rβ] = Var[2β] = 4Var[β] = 4(E[β²] - E[β]²) = (4)(5 - 25/9) = 80/9 = 8.89.
Comment: Buhlmann Credibility Parameter = K = EPV/VHM = 13.33/8.89 = 1.5. If π[β] is proportional to β^(a-1)/(1+β)^(a+b), then it turns out that K = (b-1)/r. For this problem a = 5, b = 4, and r = 2. K = (b-1)/r = 3/2. This is an example of “exact credibility”, in which the estimate of the future frequency of an insured using Bayesian Analysis equals that from Buhlmann Credibility. See “Mahlerʼs Guide to Conjugate Priors.”
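Since the change of variables is easy to get wrong, here is a numerical check of E[β], E[β²], the EPV, the VHM, and K, done directly in the β scale. It is a sketch of my own, assuming scipy is available.

from scipy.integrate import quad
import numpy as np

# Prior density of beta: pi(beta) = 280 beta^4 / (1 + beta)^9, for beta > 0.
pi = lambda b: 280 * b**4 / (1 + b)**9

e_b  = quad(lambda b: b    * pi(b), 0, np.inf)[0]    # 5/3
e_b2 = quad(lambda b: b**2 * pi(b), 0, np.inf)[0]    # 5

# Negative Binomial with r = 2: process variance = r * beta * (1 + beta).
epv = 2 * e_b + 2 * e_b2                             # 13.33
vhm = 4 * (e_b2 - e_b**2)                            # 8.89
print(epv, vhm, epv / vhm)                           # K = 1.5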
7.26. B. The distribution of claim frequencies for the combined portfolio has mean given by:
E[λ] = {(700)(0.080) + (300)(0.200)} / 1000 = 0.116.
The second moment for Class A is the variance plus the square of the mean: 0.17² + 0.08² = 0.0353.
The second moment for Class B is: 0.24² + 0.20² = 0.0976.
The combined portfolioʼs second moment is: {(700)(0.0353) + (300)(0.0976)} / 1000 = 0.05399.
Thus the variance of the combined portfolio is: 0.05399 - 0.116² = 0.0405.
The standard deviation is: √0.0405 = 0.2013.
Alternately, the variance of the means for the two classes is:
Var[E[λ | Class]] = {(0.7)(0.080²) + (0.3)(0.200²)} - 0.116² = 0.003024.
The average of the process variances for the two classes is:
E[Var[λ | Class]] = {(700)(0.17²) + (300)(0.24²)} / 1000 = 0.03751.
The variance of λ across the whole portfolio is:
Var[λ] = E[Var[λ | Class]] + Var[E[λ | Class]] = 0.03751 + 0.003024 = 0.0405.
Thus the standard deviation is: √0.0405 = 0.2013.
7.27. C. E[θ²] = Var[θ] + E[θ]² = 40,000 + 10,000 = 50,000.
Since frequency and severity are independent, for fixed θ and λ,
Process Variance of the Pure Premium = E[freq.]Var[sev.] + E[sev.]²Var[freq.] = λVar[sev.] + E[sev.]²λ
= λ(2nd moment of the severity) = λ(2θ² / {(α-1)(α-2)}) = λθ²/3.
EPV = E[λθ²/3] = (1/3)E[λ]E[θ²] = (1/3)(0.7)(50,000) = 11,667.
Comment: The 2nd moment of the Pareto Distribution is: 2θ² / {(α-1)(α-2)}.

7.28. D. E[λ²] = Var[λ] + E[λ]² = 0.20 + 0.49 = 0.69. E[θ²] = Var[θ] + E[θ]² = 40,000 + 10,000 = 50,000.
The hypothetical mean pure premium is: (avg. freq.)(avg. severity) = λθ/(α-1) = λθ/3.
Var[Mean P.P.] = Var[λθ/3] = E[(λθ/3)²] - E[λθ/3]² = E[λ²]E[θ²]/9 - E[λ]²E[θ]²/9
= (0.69)(50,000)/9 - (0.49)(10,000)/9 = 3,289.
Comment: The mean of the Pareto Distribution is: θ/(α-1). Combining the answer to this question and the previous one, K = 11,667/3,289 = 3.5.
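The arithmetic in 7.27 and 7.28 can be reproduced directly from the given moments of λ and θ. A short sketch of my own (not part of the original solution):

# Given: E[lambda] = 0.7, Var[lambda] = 0.20; E[theta] = 100, Var[theta] = 40,000.
# Severity is Pareto with alpha = 4: mean = theta/3, second moment = theta^2/3.
e_lam, var_lam = 0.7, 0.20
e_th, var_th = 100.0, 40000.0
e_lam2 = var_lam + e_lam**2                          # 0.69
e_th2  = var_th + e_th**2                            # 50,000
epv = e_lam * e_th2 / 3                              # about 11,667
vhm = e_lam2 * e_th2 / 9 - (e_lam * e_th / 3)**2     # about 3,289
print(epv, vhm, epv / vhm)                           # K is about 3.5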
7.29. D. For a LogNormal, mean = exp[µ + 0.5σ²], and 2nd moment = exp[2µ + 2σ²].
Type      A Priori Probability   µ   σ     Mean     2nd Moment   Process Variance
A         0.3                    4   1     90.02    22,026       13,923
B         0.7                    4   1.5   168.17   268,337      240,055
Average                                                           172,215
7.30. C. VHM = 22,229 - 144.73² = 1282.
Type      A Priori Probability   µ   σ     Mean     Square of Mean
A         0.3                    4   1     90.02    8,103
B         0.7                    4   1.5   168.17   28,283
Average                                    144.73   22,229
7.31. B. The portion of claims from Type A is: (0.3)(3) / {(0.3)(3) + (0.7)(2)} = 0.9/(0.9 + 1.4) = 39.13%.
The portion of claims from Type B is: 1.4/(0.9 + 1.4) = 60.87%.
Type      A Priori Prob.   Mean Freq.   Weight   µ   σ     Mean     2nd Moment   Process Variance
A         0.3              3            0.9      4   1     90.02    22,026       13,923
B         0.7              2            1.4      4   1.5   168.17   268,337      240,055
Average                                                                           151,569
EPV = (39.13%)(13,923) + (60.87%)(240,055) = 151,569.

7.32. E. VHM = 20,386 - 137.59² = 1455.
Type      A Priori Prob.   Mean Freq.   Weight   Percent of Claims   µ   σ     Mean     Square of Mean
A         0.3              3            0.9      39.13%              4   1     90.02    8,103
B         0.7              2            1.4      60.87%              4   1.5   168.17   28,283
Average                                 2.3      100.00%                       137.59   20,386
7.33. B. Each Inverse Gamma has mean: θ/(α−1) = θ/4, and second moment: θ²/{(α−1)(α−2)} = θ²/12.
Therefore, each Inverse Gamma has a (process) variance of: θ²/12 - (θ/4)² = θ²/48.
Since θ is distributed via an Exponential Distribution with mean 60: E[θ] = 60, E[θ²] = (2)(60²) = 7200, and Var[θ] = 60² = 3600.
EPV = E[θ²/48] = E[θ²]/48 = 7200/48 = 150. VHM = Var[θ/4] = Var[θ]/4² = 3600/16 = 225.
Total Variance = EPV + VHM = 150 + 225 = 375.
Alternately, the mean of the mixture is the mixture of the means: E[θ/4] = E[θ]/4 = 60/4 = 15.
The second moment of the mixture is the mixture of the second moments: E[θ²/12] = E[θ²]/12 = 7200/12 = 600.
Therefore, the variance of the mixture is: 600 - 15² = 375.
Alternately, this is an example of an Exponential-Inverse Gamma. The mixed distribution is Pareto with α = 5 and θ = 60. This Pareto has mean: θ/(α−1) = 60/4 = 15, and second moment: 2θ²/{(α−1)(α−2)} = (2)(60²)/{(4)(3)} = 600. Therefore, this Pareto has a variance of: 600 - 15² = 375.

7.34. E. For example, for Die Type C, the process variance = (1/6)(5/6) = 5/36 = 0.1389.
Type of Die   A Priori Probability   Process Variance
A             0.3333                 0.2222
B             0.3333                 0.2222
C             0.3333                 0.1389
Average                              0.1944
7.35. B. The Variance of the Hypothetical Means = 0.1944 - 0.3889² = 0.0432.
Type of Die   A Priori Probability   Mean     Square of Mean
A             0.3333                 0.6667   0.4444
B             0.3333                 0.3333   0.1111
C             0.3333                 0.1667   0.0278
Average                              0.3889   0.1944
7.36. C. & 7.37. D. Adding the probabilities, there is a 40% a priori probability of Θ = 100, risk type A, and a 60% a priori probability of Θ = 200, risk type B.
For Risk Type A, the distribution of X is: 0 with probability 0.2/0.4 = 1/2, 10 with probability 0.1/0.4 = 1/4, and 40 with probability 0.1/0.4 = 1/4.
The mean for Risk Type A is: E[X | Θ = 100] = (0)(1/2) + (10)(1/4) + (40)(1/4) = 12.5.
The 2nd moment for Risk Type A is: E[X² | Θ = 100] = (0²)(1/2) + (10²)(1/4) + (40²)(1/4) = 425.
Process Variance for Risk Type A is: Var[X | Θ = 100] = 425 - 12.5² = 268.75.
Similarly, Risk Type B has mean 18.333, second moment 583.33, and process variance 247.22.
EPV = 255.8. The variance of the hypothetical means = 264.16 - 16² = 8.2.
Risk Type   A Priori Chance   Mean    Square of Mean   Process Variance
A           0.4               12.50   156.25           268.75
B           0.6               18.33   336.10           247.22
Average                       16.00   264.16           255.83
Comment: Overall we have: 30% X = 0, 40% X = 10, and 30% X = 40. Thus the overall mean is 16, and the total variance is 264. Note that EPV + VHM = 255.8 + 8.2 = 264 = total variance. Similar to 4, 11/02, Q.29.

7.38. B. For a compound Poisson, the process variance of aggregate loss is: λE[X² | θ].
Thus for aggregate loss, EPV = E[λ] Eθ[E[X² | θ]] = E[λ] E[X²]. Therefore, 20,000 = (10%)E[X²]. ⇒ E[X²] = 200,000.
Comment: For severity, the mixture of the second moments is the second moment of the mixture.
7.39, 7.40, & 7.41. The process variance given m is:
Second Moment of LogNormal - Square of First Moment of LogNormal
= Exp[2m + 2v²] - Exp[m + v²/2]² = Exp[2m + 2v²] - Exp[2m + v²] = Exp[2m] (Exp[2v²] - Exp[v²]).
Therefore, EPV = E[Exp[2m]] (Exp[2v²] - Exp[v²]).
m is Normal with mean µ and standard deviation σ. Therefore, 2m is Normal with parameters 2µ and 2σ. Therefore, Exp[2m] is LogNormal with parameters 2µ and 2σ.
Therefore, E[Exp[2m]] is the mean of this LogNormal: Exp[2µ + (2σ)²/2] = Exp[2µ + 2σ²].
Therefore, EPV = Exp[2µ + 2σ²] (Exp[2v²] - Exp[v²]).
The hypothetical mean given m is: First Moment of LogNormal = Exp[m + v²/2] = Exp[m] Exp[v²/2].
Therefore, VHM = Var[Exp[m]] Exp[v²/2]² = Var[Exp[m]] Exp[v²].
m is Normal with mean µ and standard deviation σ. Therefore, Exp[m] is LogNormal with parameters µ and σ. Therefore, Var[Exp[m]] is the variance of this LogNormal:
Exp[2µ + 2σ²] - Exp[µ + σ²/2]² = Exp[2µ + 2σ²] - Exp[2µ + σ²] = Exp[2µ] (Exp[2σ²] - Exp[σ²]).
Therefore, VHM = Exp[2µ] (Exp[2σ²] - Exp[σ²]) Exp[v²].
K = EPV/VHM = {Exp[2µ + 2σ²] (Exp[2v²] - Exp[v²])} / {Exp[2µ] (Exp[2σ²] - Exp[σ²]) Exp[v²]} = (Exp[v²] - 1) / (1 - Exp[-σ²]).
7.42. a) Variance = (0.5)(1 - 0.5)(162) = 40.5. Standard Deviation = 6.364.
b) Variance of the road games is: (0.45)(1 - 0.45)(81) = 20.0475. Variance of the home games is: (0.55)(1 - 0.55)(81) = 20.0475.
Variance of the total number of games won is: 20.0475 + 20.0475 = 40.095. Standard Deviation = 6.332.
c) This is a mixture. For a single game, the mean is 0.5.
The second moment of a Bernoulli is: q(1-q) + q² = q. For a single game, the second moment is: (1/2)(0.45) + (1/2)(0.55) = 0.5.
The variance of the mixture for one game is: 0.5 - 0.5² = 0.25.
Variance of the mixture for 162 games is: (0.25)(162) = 40.5. Standard Deviation = 6.364.
Alternately, EPV = (1/2)(0.45)(0.55) + (1/2)(0.55)(0.45) = 0.2475. VHM = (1/2)(0.45 - 0.5)² + (1/2)(0.55 - 0.5)² = 0.0025.
Therefore, the total variance for one game is: 0.2475 + 0.0025 = 0.25.
Variance of the mixture is: (0.25)(162) = 40.5. Standard Deviation = 6.364.
d) This is a mixture. The mean number of wins for the year is: (0.5)(162) = 81.
The second moment of a Binomial is: mq(1-q) + (mq)².
The second moment of a Binomial with m = 162 and q = 0.45 is: (162)(0.45)(0.55) + {(0.45)(162)}² = 5354.5.
The second moment of a Binomial with m = 162 and q = 0.55 is: (162)(0.55)(0.45) + {(0.55)(162)}² = 7978.9.
The second moment of the mixture is: (1/2)(5354.5) + (1/2)(7978.9) = 6666.7.
Variance of the mixture is: 6666.7 - 81² = 105.7. Standard Deviation = 10.281.
Alternately, EPV = (1/2)(162)(0.45)(0.55) + (1/2)(162)(0.55)(0.45) = 40.095. VHM = (1/2)(72.9 - 81)² + (1/2)(89.1 - 81)² = 65.61.
Therefore, the variance of the mixture for 162 games is: 40.095 + 65.61 = 105.7. Standard Deviation = 10.281.

7.43. B. The process variance given λ is λ. ⇒ EPV = E[λ] = µ.
The hypothetical mean given λ is λ. ⇒ VHM = Var[λ] = µ.
Total Variance = EPV + VHM = 2µ.
Comment: The Buhlmann Credibility Parameter is: K = EPV/VHM = µ/µ = 1.
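Part (d) of 7.42 is easy to confirm by direct computation. A sketch of my own, illustrative only:

# 7.42 d): the season-long q is 0.45 or 0.55 with probability 1/2 each,
# and all 162 games in a season share the same q (a mixture of Binomials).
m = 162
qs = [0.45, 0.55]
epv = sum(0.5 * m * q * (1 - q) for q in qs)             # 40.095
means = [m * q for q in qs]                              # 72.9 and 89.1
grand = sum(0.5 * mu for mu in means)                    # 81
vhm = sum(0.5 * (mu - grand)**2 for mu in means)         # 65.61
total_var = epv + vhm                                    # 105.7
print(total_var, total_var**0.5)                         # about 105.7 and 10.28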
7.44. B. The mean frequency given λ is: (0.5)(λ) + (0.5)(2λ) = 1.5λ.
VHM = Var[1.5λ] = 1.5² Var[λ] = (1.5²)(0.1²) = 0.0225.
Second moment given lambda is: (0.5)(λ + λ²) + (0.5){2λ + (2λ)²} = 1.5λ + 2.5λ².
Process variance given lambda is: 1.5λ + 2.5λ² - (1.5λ)² = 1.5λ + 0.25λ².
EPV = 1.5E[λ] + 0.25E[λ²] = (1.5)(0.1) + (0.25)(2)(0.1²) = 0.155.
K = EPV / VHM = 0.155 / 0.0225 = 6.89.

7.45. D.
Type of Urn   A Priori Probability   Process Variance
1             0.3333                 0
2             0.3333                 0
3             0.3333                 0.25
Average                              0.0833
7.46. A. Variance of the Hypothetical Means = 0.4167 - 0.5² = 0.1667.
Type of Urn   A Priori Probability   Mean for this Type of Urn   Square of Mean of this Type of Urn
1             0.3333                 0                           0
2             0.3333                 1                           1
3             0.3333                 0.5                         0.25
Average                              0.5                         0.4167
Comment: This is a mixture. Mean of the mixture is: (0 + 1 + 1/2)/3 = 1/2.
Each Urn is Bernoulli. For each Urn, its mean is q and its variance is q(1-q). Therefore, for each Urn its second moment is: q(1-q) + q² = q.
Second moment of the mixture is: (0 + 1 + 1/2)/3 = 1/2. ⇒ Variance of the mixture: 1/2 - (1/2)² = 1/4.
EPV + VHM = 0.0833 + 0.1667 = 1/4 = Total Variance.
If you calculated the variance of the mixture, which is the total variance, and calculated either the EPV or the VHM, then you could back the other one out of the total variance.
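The relationship EPV + VHM = Total Variance noted in the comment can be checked directly; for instance, with the following sketch (the names are my own):

# Three urns, picked with equal probability; each draw is Bernoulli with mean q.
probs = [1/3, 1/3, 1/3]
qs    = [0.0, 1.0, 0.5]       # Urn 1: all "zero", Urn 2: all "one", Urn 3: half and half
epv = sum(p * q * (1 - q) for p, q in zip(probs, qs))       # 0.0833
mean   = sum(p * q    for p, q in zip(probs, qs))           # 0.5
second = sum(p * q**2 for p, q in zip(probs, qs))           # 0.4167
vhm = second - mean**2                                      # 0.1667
print(epv, vhm, epv + vhm)                                  # total variance = 0.25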
7.47. A. Let m be the mean claim frequency for an insured. Then h(m) = 1 on (0, 1].
The mean severity for a risk is r, since that is the mean for the given exponential distribution. Therefore, for a given insured the mean pure premium is mr.
The first moment of the hypothetical mean pure premiums is:
∫₀¹ ∫₀¹ m r g(r) h(m) dr dm = {∫₀¹ m dm} {∫₀¹ r(2r) dr} = (1/2)(2/3) = 1/3.
The second moment of the hypothetical mean pure premiums is (since the frequency and severity distributions are independent):
∫₀¹ ∫₀¹ m² r² g(r) h(m) dr dm = {∫₀¹ m² dm} {∫₀¹ r²(2r) dr} = (1/3)(1/2) = 1/6.
The variance of the hypothetical mean pure premiums is: 1/6 - (1/3)² = 1/18 = 0.0556.

7.48. E. 1. False. In general VAR[X+Y] = VAR[X] + VAR[Y] + 2COV[X,Y]. Thus statement 1 is only true when COV[X,Y] = 0.
2. False. In general VAR[X-Y] = VAR[X] + VAR[Y] - 2COV[X,Y].
3. False. In analysis of variance, these are the two pieces that make up the unconditional variance, and usually they are not equal. For example, the expected value of the process variance is usually not equal to the variance of the hypothetical means.

7.49. D. Assume that what is given as the “Distribution of Claim Frequencies Rates” is the distribution of the Poisson parameters λ of the individual drivers across each class.
The distribution of claim frequencies for the combined portfolio has mean given by: E[λ] = {(500)(0.050) + (500)(0.210)} / 1000 = 0.130.
The second moment for Class A is: 0.227² + 0.05² = 0.054029. The second moment for Class B is: 0.561² + 0.21² = 0.358821.
The combined portfolioʼs second moment is a weighted average: E[λ²] = {(500)(0.054029) + (500)(0.358821)} / 1000 = 0.2064.
Thus the variance of λ over the combined portfolio is: 0.2064 - 0.13² = 0.1895. The standard deviation is: √0.1895 = 0.4353.
Alternately, the variance of the means for the two classes is: Var[E[λ | Class]] = 0.08² = 0.0064.
The average of the process variances for the two classes is: E[Var[λ | Class]] = {(500)(0.227²) + (500)(0.561²)} / 1000 = 0.1831.
The variance of λ across the whole portfolio is: Var[λ] = E[Var[λ | Class]] + Var[E[λ | Class]] = 0.1831 + 0.0064 = 0.1895.
Thus the standard deviation is: √0.1895 = 0.4353.
7.50. E. E[θ²] = Var[θ] + E[θ]² = 640,000 + 1,000,000 = 1,640,000.
Since frequency and severity are independent, for fixed θ and λ,
Process Variance of the Pure Premium = E[freq.]Var[sev.] + E[sev.]²Var[freq.] = λθ² + θ²λ = 2λθ².
Expected Value of the Process Variance = E[2λθ²] = 2E[λ]E[θ²] = (2)(0.1)(1,640,000) = 328,000.

7.51. B. E[λ²] = Var[λ] + E[λ]² = 0.0025 + 0.01 = 0.0125. E[θ²] = Var[θ] + E[θ]² = 640,000 + 1,000,000 = 1,640,000.
The hypothetical mean pure premium is: (avg. freq.)(avg. severity) = λθ.
Var[Mean P.P.] = Var[λθ] = E[(λθ)²] - E[λθ]² = E[λ²]E[θ²] - E[λ]²E[θ]² = (0.0125)(1,640,000) - (0.01)(1,000,000) = 10,500.
Comment: Note that if one were to combine the answer to this question and the previous one, then the Buhlmann credibility parameter is K = 328,000 / 10,500 = 31.2.

7.52. D. Given m, since the frequency and severity are independent, the process variance of the Pure Premium
= (mean freq.)(variance of severity) + (mean severity)²(variance of freq.)
= m{(variance of severity) + (mean severity)²} = m(2nd moment of the severity)
= (m/100,000) ∫₀^100,000 x² dx = (m/100,000)(100,000)³/3 = m(3.333 x 10⁹).
Thus, E[Process Variance] = (3.333 x 10⁹)E[m] = (3.333 x 10⁹)(0.4) = 1333 million.

7.53. A. The mean severity is 50,000 for each risk. Therefore, given m, the hypothetical mean pure premium = 50,000m.
Thus the Variance of the Hypothetical Means of the Pure Premiums is:
Var[50,000m] = 50,000² Var[m] = (2.5 x 10⁹)(0.10) = 2.5 x 10⁸ = 250 million.
7.54. D. Take m fixed; then since the frequency and severity are independent:
Process Variance = µf σs² + µs² σf² = m(1000/m)² + (1000/m)²m = 2,000,000/m.
To get the expected value of the Process Variance one has to integrate over all values of m using the p.d.f. f(m) = (1+m)/6:
EPV = (2 million) ∫₁³ (1/m)(1+m)/6 dm = (1 million / 3) ∫₁³ (1/m + 1) dm = (1 million / 3) {ln(m) + m}, evaluated from m = 1 to m = 3,
= (1 million / 3) {ln(3) + 2} = 1.03 million.

7.55. D. The individuals in the portfolio are parameterized via λ, which in turn is distributed as f(λ) = 4/λ⁵, 1 < λ < ∞.
For each individual we are given that frequency is Poisson with mean λ and that severity is constant at 1000λ.
Thus for each individual (λ fixed), we have: µf = σf² = λ, µs = 1000λ, and σs² = 0.
Thus for each individual (λ fixed), we have: σPP² = µf σs² + µs² σf² = (λ)(0) + (1000λ)²(λ) = 1,000,000λ³.
In order to find the Expected Value of the Process Variance, one needs to take the integral of σPP² f(λ) dλ:
∫₁^∞ 1,000,000λ³ (4/λ⁵) dλ = 4,000,000 ∫₁^∞ λ⁻² dλ = (4,000,000)(1) = 4,000,000.
Comment: You have to carefully calculate the process variance of the aggregate losses for each type of risk and then average over the different types of risks.
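Both of these EPV integrals can be checked by numerical integration; a sketch of my own, assuming scipy is available:

from scipy.integrate import quad
import numpy as np

# 7.54: process variance given m is 2,000,000/m; f(m) = (1 + m)/6 on [1, 3].
epv_54 = quad(lambda m: (2_000_000 / m) * (1 + m) / 6, 1, 3)[0]
print(epv_54)        # about 1.03 million

# 7.55: process variance given lambda is 1,000,000 lambda^3; f(lambda) = 4/lambda^5 on [1, infinity).
epv_55 = quad(lambda lam: 1_000_000 * lam**3 * 4 / lam**5, 1, np.inf)[0]
print(epv_55)        # 4,000,000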
7.56. D. For a Poisson frequency with independent frequency and severity, the process variance of the aggregate losses is equal to the mean frequency times the second moment of the severity.
Given λ, the second moment of the severity is: (1/3)(2λ)² + (2/3)(8λ)² = 44λ².
Thus given λ, the process variance of the pure premiums is: λE[X²] = λ(44λ²) = 44λ³.
The Expected Value of the Process Variance is: (1/3)(44) + (1/3)(352) + (1/3)(1188) = 528.
Lambda    A Priori Probability   Process Variance of P.P. = 44 Lambda Cubed
1         0.3333                 44
2         0.3333                 352
3         0.3333                 1188
Average                          528
7.57. B. The Bernoulli has process variance: q(1-q) = q - q².
E[q] = ∫₀¹ (πq/2) sin(πq/2) dq = 2/π.
E[q²] = ∫₀¹ (πq²/2) sin(πq/2) dq = 4(π - 2)/π².
Thus EPV = E[q - q²] = E[q] - E[q²] = 2/π - 4(π - 2)/π² = (8 - 2π)/π² = 2(4 - π)/π².
Comment: The Variance of the Hypothetical Means would be computed as follows.
VHM = Var[q] = E[q²] - E[q]² = 4(π - 2)/π² - (2/π)² = (4π - 12)/π².
Then the Buhlmann Credibility Parameter K = EPV/VHM = (8 - 2π)/(4π - 12) = 3.03.
µF σS2 + µS2 σF2 = (m)( 400m2 ) + (20m)2 (m) = 800m3 . Therefore the expected value of the process variance of the aggregate losses is: ∞
∞
∫f(m)800m3 dm = 400 ∫ m5 e-m dm = 400 Γ(6) = (400)(5!) = 48,000. m=0
m=0
Comment: The integral from 0 to ∞ of m5 e-m is given in terms of a Gamma Function. Alternately, we need 800 times the integral of m3 times f(m), where f(m) is a Gamma Distribution, with parameters α = 3 and θ = 1. This integral is the third moment of a Gamma Distribution, it is equal to α(α+1)(α+2)θ3 = 60. Thus the expected value of the process variance of the aggregate losses is equal to (800)(60) = 48,000.
7.59. C. The process variance for a Pareto is: 2θ²/{(α−1)(α−2)} − {θ/(α−1)}² = 3θ²/4.
EPV = ∫₀^∞ (Process Variance | θ) f(θ) dθ = ∫₀^∞ (3θ²/4) e^(-θ) dθ = (3/4) ∫₀^∞ θ² e^(-θ) dθ = (3/4)Γ(3) = (3/4)(2) = 3/2.

7.60. A. The mean for a Pareto is: θ/(α−1) = θ/2.
2nd Moment of the Hypothetical Means = ∫₀^∞ (θ/2)² f(θ) dθ = (1/4) ∫₀^∞ θ² e^(-θ) dθ = (1/4)Γ(3) = (1/4)(2) = 1/2.
1st Moment of the Hypothetical Means = ∫₀^∞ (θ/2) f(θ) dθ = (1/2) ∫₀^∞ θ e^(-θ) dθ = (1/2)Γ(2) = (1/2)(1) = 1/2.
Therefore, Variance of the Hypothetical Means = 1/2 - (1/2)² = 1/4.
Comment: There are many ways to make a mistake and still get the right answer. Combining the solutions to this and the previous question would produce a Buhlmann Credibility Parameter K = EPV/VHM = (3/2)/(1/4) = 6.

7.61. E. Given b, one can get the process variance by computing the first and second moments:
Mean = ∫₀ᵇ x(2x/b²) dx = (2/3)x³/b², evaluated from 0 to b, = 2b/3.
2nd moment = ∫₀ᵇ x²(2x/b²) dx = x⁴/(2b²), evaluated from 0 to b, = b²/2.
Given b, the Process Variance = b²/2 - (2b/3)² = b²/18.
EPV = ∫₁^∞ (b²/18)(1/b²) db = b/18, evaluated from 1 to ∞, = ∞.
Comment: The overall mean, the second moment of the hypothetical means, and the VHM are all also infinite.
7.62. Var[Y] = Var[E[Y | X = x]] + E[Var[Y | X = x]] = Var[15X + 20] + E[X + 12] = 15² Var[X] + E[X] + 12 = (225)(10) + 10 + 12 = 2272.

7.63. A. When one mixes distributions, the variance increases. Var[X] = E[Var[X | λ]] + Var[E[X | λ]] ≥ E[Var[X | λ]].
Thus, since for a Poisson Distribution the variance is equal to the mean, for a mixture of Poisson Distributions the variance is greater than the mean.
Since for a Negative Binomial Distribution the variance is greater than the mean, for a mixture of Negative Binomial Distributions the variance is greater than the mean.
Since for a Binomial Distribution the variance is less than the mean, for a mixture of Binomial Distributions the variance can be either less than, greater than, or equal to the mean.
Thus a mixture of two binomial distributions with different means may be appropriate for the given situation.
Section 8, Buhlmann Credibility, Introduction

The expected value of the process variance and the variance of the hypothetical means, discussed in the previous section, will be used in Buhlmann Credibility.

Buhlmann Credibility Parameter:

The Buhlmann Credibility Parameter is calculated as: K = EPV / VHM.92
K = Expected Value of Process Variance / Variance of Hypothetical Means,
where the Expected Value of the Process Variance and the Variance of the Hypothetical Means are each calculated for a single observation of the risk process.93

Buhlmann Credibility Formula:

Then for N observations, the Buhlmann Credibility is: Z = N / (N + K).94
Loss Models calls Z the Buhlmann Credibility Factor or Bühlmann-Straub Credibility Factor.95
Using Buhlmann Credibility, the estimate of the future = Z(observation) + (1 - Z)(prior mean).

Multi-sided Dice Example:

An example involving multi-sided dice was discussed previously with respect to Bayesian Analysis:
There are a total of 100 multi-sided dice of which 60 are 4-sided, 30 are 6-sided and 10 are 8-sided. The multi-sided dice with 4 sides have 1, 2, 3 and 4 on them. The multi-sided dice with the usual 6 sides have numbers 1 through 6 on them. The multi-sided dice with 8 sides have numbers 1 through 8 on them. For a given die each side has an equal chance of being rolled; i.e., the die is fair.
Your friend has picked at random a multi-sided die. He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again.

92 In Loss Models notation, k = v/a.
93 It is important to use a single consistent definition of a single draw from the risk process, for both the calculation of K and the determination of N.
94 If N = 1, then Z = 1/(1 + K) = VHM/(VHM + EPV) = VHM / Total Variance.
95 Bühlmann-Straub refers to the case where there are varying numbers of exposure units, as in for example Group Health Insurance or Commercial Automobile Insurance.
Expected Value of the Process Variance, Multi-sided Die Example:

For each type of die we can compute the mean and the (process) variance. For example, for a six-sided die one need only list all the possibilities:
Roll of Die   A Priori Probability   Roll times Probability   Square of Roll times Probability
1             0.16667                0.16667                  0.16667
2             0.16667                0.33333                  0.66667
3             0.16667                0.50000                  1.50000
4             0.16667                0.66667                  2.66667
5             0.16667                0.83333                  4.16667
6             0.16667                1.00000                  6.00000
Sum           1                      3.5                      15.16667
Thus the mean is 3.5 and the variance is 15.16667 - 3.5² = 2.91667 = 35/12.
Thus the conditional variance if a six-sided die is picked is 35/12. VAR[X | 6-sided] = 35/12.
Exercise: What are the mean and variance of a four-sided die?
[Solution: The mean is 2.5 and the variance is 15/12.]
Exercise: What are the mean and variance of an eight-sided die?
[Solution: The mean is 4.5 and the variance is 63/12.]
Exercise: What are the mean and variance of a die with S sides?
[Solution: The mean is (S+1)/2 and the variance is (S² - 1)/12. The mean is the sum of the integers from 1 to S, divided by S. The former is S(S+1)/2; thus the mean is (S+1)/2. The second moment is the sum of the squares of the integers from 1 to S, divided by S. The former is S(S+1)(2S+1)/6; thus the second moment is (S+1)(2S+1)/6. Then the variance is: (S+1)(2S+1)/6 - {(S+1)/2}² = (S² - 1)/12.]
One computes the Expected Value of the Process Variance by weighting together the process variances for each type of risk, using as weights the chance of having each type of risk.96
In this case the Expected Value of the Process Variance is: (60%)(15/12) + (30%)(35/12) + (10%)(63/12) = 25.8/12 = 2.15.
In symbols this sum is: P(4-sided)VAR[X | 4-sided] + P(6-sided)VAR[X | 6-sided] + P(8-sided)VAR[X | 8-sided].
96 In situations where the types of risks are parameterized by a continuous distribution, as for example in the Gamma-Poisson frequency process, one will take an integral rather than a sum.
Using the fact that a die with S sides has Process Variance of (S² - 1)/12:
Type of Die   A Priori Chance of this Type of Die   Process Variance of this Type of Die
4-sided       0.6                                   1.25000
6-sided       0.3                                   2.91667
8-sided       0.1                                   5.25000
Average                                             2.15
Note that this is the Expected Value of the Process Variance for one observation of the risk process; i.e., one roll of a die.

Variance of the Hypothetical Means, Multi-sided Die Example:

One can also compute the Variance of the Hypothetical Means by the usual technique; compute the first and second moments of the hypothetical means. In this case:
Type of Die   A Priori Chance of this Type of Die   Mean for this Type of Die   Square of Mean of this Type of Die
4-sided       0.6                                   2.5                         6.25
6-sided       0.3                                   3.5                         12.25
8-sided       0.1                                   4.5                         20.25
Average                                             3                           9.45
The Variance of the Hypothetical Means is the second moment minus the square of the (overall) mean = 9.45 - 3² = 0.45.
Note that this is the variance for a single observation; i.e., one roll of a die.

Total Variance:

One can compute the total variance of the observed results if one were to repeat this experiment repeatedly. One need merely compute the chance of each possible outcome.
In this case there is a 60% / 4 = 15% chance that a 4-sided die will be picked and then a 1 will be rolled. Similarly, there is a 30% / 6 = 5% chance that a 6-sided die will be selected and then a 1 will be rolled. There is a 10% / 8 = 1.25% chance that an 8-sided die will be selected and then a 1 will be rolled. The total chance of a 1 is therefore: 15% + 5% + 1.25% = 21.25%.
Roll of Die   Probability due to 4-sided die   Probability due to 6-sided die   Probability due to 8-sided die   A Priori Probability   Roll times Probability   Square of Roll times Probability
1             0.15                             0.05                             0.0125                           0.2125                 0.2125                   0.2125
2             0.15                             0.05                             0.0125                           0.2125                 0.4250                   0.8500
3             0.15                             0.05                             0.0125                           0.2125                 0.6375                   1.9125
4             0.15                             0.05                             0.0125                           0.2125                 0.8500                   3.4000
5                                              0.05                             0.0125                           0.0625                 0.3125                   1.5625
6                                              0.05                             0.0125                           0.0625                 0.3750                   2.2500
7                                                                               0.0125                           0.0125                 0.0875                   0.6125
8                                                                               0.0125                           0.0125                 0.1000                   0.8000
Sum           0.6                              0.3                              0.1                              1                      3                        11.6
The mean is 3 (the same as computed above) and the second moment is 11.6. Therefore, the total variance is 11.6 - 3² = 2.6.
As is generally true, in this case, EPV + VHM = 2.15 + 0.45 = 2.6 = Total Variance. Thus the total variance has been split into two pieces.

Estimating Future Die Rolls:

Your friend has picked at random a multi-sided die. He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again.
In this case, K = EPV / VHM = 2.15 / 0.45 = 4.778 = 43/9, where the EPV and VHM were calculated previously for a single die roll.
In this example, the prior mean is: (60%)(2.5) + (30%)(3.5) + (10%)(4.5) = 3.0.
Thus the new estimate = Z(observation) + (1 - Z)(3) = 3 + Z(observation - 3). If the credibility assigned to the observation is larger, then our new estimate is more responsive to the observation, and vice versa.
Z = N/(N + K), where N is the number of observations. For 1 observation, Z = 1/(1 + 4.778) = 0.1731 = 9/52 = 0.45 / (0.45 + 2.15) = VHM/(VHM + EPV).
Thus in this case if we observe a roll of a 5, then the new estimate is: (0.1731)(5) + (1 - 0.1731)(3) = 3.3462.
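The whole multi-sided die calculation, from the EPV and VHM through the credibility estimate, fits in a few lines of Python. This sketch is my own; it simply re-does the arithmetic above.

# 60% 4-sided, 30% 6-sided, 10% 8-sided dice.
probs = [0.6, 0.3, 0.1]
sides = [4, 6, 8]
means = [(s + 1) / 2 for s in sides]                 # 2.5, 3.5, 4.5
pvars = [(s**2 - 1) / 12 for s in sides]             # 15/12, 35/12, 63/12

epv = sum(p * v for p, v in zip(probs, pvars))               # 2.15
prior_mean = sum(p * m for p, m in zip(probs, means))        # 3.0
vhm = sum(p * m**2 for p, m in zip(probs, means)) - prior_mean**2   # 0.45

K = epv / vhm                                        # 4.778
N, observation = 1, 5
Z = N / (N + K)                                      # 0.1731
estimate = Z * observation + (1 - Z) * prior_mean    # 3.3462
print(K, Z, estimate)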
Estimates Are In Balance:

The Buhlmann Credibility estimate is a linear function of the observation. When observing one die roll,
estimate = 0.1731(observation) + (1 - 0.1731)(3) = 0.1731(observation) + 2.4807:
Observation    1        2        3   4        5        6        7        8
New Estimate   2.6538   2.8269   3   3.1731   3.3462   3.5193   3.6924   3.8655
Exercise: Calculate the a priori chances of the possible observations for this multi-sided die example, with a 60% chance of a 4-sided die, 30% chance of a 6-sided die, and a 10% chance of an 8-sided die.
[Solution:
Observation            1        2        3        4        5        6        7        8
A Priori Probability   0.2125   0.2125   0.2125   0.2125   0.0625   0.0625   0.0125   0.0125 ]
Exercise: Calculate the weighted average of the Buhlmann Credibility estimates, using as weights the a priori chances of the possible observations for this multi-sided die example.
[Solution: (0.2125)(2.6538 + 2.8269 + 3 + 3.1731) + (0.0625)(3.3462 + 3.5193) + (0.0125)(3.6924 + 3.8655) = 3.]
The weighted average of the Buhlmann Credibility estimates, using as weights the a priori chances of the possible observations, is equal to the a priori mean of 3.
Let µ be the a priori mean. In general, the Buhlmann Credibility estimates are: (observation - µ)Z + µ.
If as here Z is the same for each of the estimates being averaged,97 then the average over all the possible observations is: (average of possible observations - µ)Z + µ = (µ - µ)Z + µ = µ.
Thus, if exposures do not vary, the estimates that result from Buhlmann Credibility are in balance: The weighted average of the Buhlmann Credibility estimates over the possible observations for a given situation, using as weights the a priori chances of the possible observations, is equal to the a priori mean.
This might not be the case, if rather than averaging over the possible observations for a single situation, one were averaging over the actual observations for classes with different number of exposures when doing classification ratemaking or over commercial insureds when doing experience rating. See the method that “preserves total losses” in the section on varying exposures in “Mahlerʼs Guide to Empirical Bayesian Credibility.”
Buhlmann Credibility versus Bayesian Analysis:

The above results of applying Buhlmann Credibility in the case of observing a single die roll differ from those previously obtained for Bayesian Analysis.
Observation                     1        2        3        4        5        6        7        8
Buhlmann Credibility Estimate   2.6538   2.8269   3        3.1731   3.3462   3.5193   3.6924   3.8655
Bayesian Analysis Estimate      2.853    2.853    2.853    2.853    3.7      3.7      4.5      4.5
[Figure: Estimate (from 3.0 to 4.5) versus Observation (1 to 8). The Buhlmann Credibility estimates lie on a straight line; the Bayesian Analysis estimates are shown as dots.]
Using Buhlmann Credibility: Estimate = Z(observation) + (1 - Z)(a priori mean). The slope of this line is Z, and 0 ≤ Z ≤ 1. Therefore, the Buhlmann Credibility estimates are on a straight line, with nonnegative slope not exceeding one. The straight line formed by the Buhlmann Credibility estimates seems to approximate the Bayesian Analysis Estimates (dots). As discussed subsequently, the Buhlmann Credibility Estimates are the weighted least squares line fit to the Bayesian Estimates. The a priori mean is 3. Therefore, if one observes a 3, the estimate from Buhlmann credibility is also 3. In general, for an observation equal to the a priori mean, the estimate using Buhlmann Credibility is equal to the observation. This is usually not the case for Bayesian Analysis.
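To see where the tabulated Bayesian estimates come from, and how the Buhlmann line approximates them, one can compute both for every possible first roll. The sketch below is my own; it assumes the same 60%/30%/10% prior as above.

probs = [0.6, 0.3, 0.1]
sides = [4, 6, 8]
K, prior_mean = 2.15 / 0.45, 3.0
Z = 1 / (1 + K)

for roll in range(1, 9):
    # Posterior probability of each die type, given the observed roll.
    joint = [p / s if roll <= s else 0.0 for p, s in zip(probs, sides)]
    post = [j / sum(joint) for j in joint]
    bayes = sum(pr * (s + 1) / 2 for pr, s in zip(post, sides))
    buhlmann = Z * roll + (1 - Z) * prior_mean
    print(roll, round(bayes, 3), round(buhlmann, 4))

The printed values reproduce the two rows of the table above: the Bayes estimates jump in steps, while the Buhlmann estimates lie on a straight line through (3, 3).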
Also note that in the case of the application of credibility, the estimate is always between the a priori estimate and the observation. This is not necessarily true in general for Bayesian Analysis.
As discussed in a previous section, the estimates from Bayes Analysis are within the range of hypotheses, in this case from 2.5, the mean of a 4-sided die, to 4.5, the mean of an eight-sided die. This is not necessarily true in general for estimates from Buhlmann Credibility.
Exercise: Calculate the weighted average of the Bayesian Analysis Estimates, using as weights the a priori chances of the possible observations for this multi-sided die example.
[Solution: (4)(0.2125)(2.853) + (2)(0.0625)(3.7) + (2)(0.0125)(4.5) = 3.]
The weighted average of the Bayesian Analysis estimates, using as weights the a priori chances of the possible observations, is equal to the a priori mean of 3. As discussed in a previous section, Bayesian Analysis estimates are always “in balance.”
For either Bayes Analysis or Buhlmann Credibility, one starts with different risk types, and an a priori probability for each risk type. One can apply one or the other to the same set up.98
Buhlmann Credibility is the weighted least squares linear approximation to Bayes Analysis. Note that in the multi-sided die example, if we observe a 7 or an 8, then we know it must have been the eight-sided die with mean 4.5. This is the Bayes estimate, but not the estimate from Buhlmann Credibility. Being a linear approximation, Buhlmann Credibility may have some logical deficiencies in simple models such as the multi-sided die example.99
In Buhlmann Credibility, one uses the initial given probabilities to calculate the EPV and VHM. One calculates the EPV, VHM, and K prior to knowing the particular observation! One does not use Bayes Theorem to get the posterior probabilities, and then use them to calculate the EPV and VHM!
Use either the more exact Bayes Analysis or its approximation Buhlmann Credibility, not both. If an exam question can be done by either method, and they do not specify which, use the more exact Bayes Analysis rather than its approximation Buhlmann Credibility.100
98 On some exams they have had two questions with the same set up and asked you to apply Bayes Analysis and Buhlmann Credibility in the two separate questions.
99 While this kind of thing can happen in simple models used on the exam or for teaching purposes, it would be extremely unlikely to occur in a practical application of Buhlmann Credibility.
100 For some situations discussed in “Mahlerʼs Guide to Conjugate Priors,” the two methods give the same result.
Multiple Die Rolls:

So far we have computed variances in the case of a single roll of a die. One can also compute variances when one is rolling more than one die.101 There are a number of somewhat different situations which lead to different variances, which lead in turn to different credibilities.
Exercise: Each actuary attending a CAS Meeting rolls 2 multi-sided dice. One die is 4-sided and the other is 6-sided. Each actuary rolls his two dice and reports the sum. What is the expected variance of the results reported by all the actuaries?
[Solution: The variance is the sum of that for a 4-sided and a 6-sided die. Variance = (15/12) + (35/12) = 50/12 = 4.167.]
One has to distinguish the situation in this exercise, where the types of dice rolled are known, from one where each actuary is selecting dice at random. The latter introduces an additional source of random variation, as shown in the following exercise.
Exercise: Each actuary attending a CAS Meeting independently selects 2 multi-sided dice. For each actuary his two multi-sided dice are selected independently of each other, with each die having a 60% chance of being 4-sided, a 30% chance of being 6-sided, and a 10% chance of being 8-sided. Each actuary rolls his two dice and reports the sum. What is the expected variance of the results reported by all the actuaries?
[Solution: The total variance is the sum of the EPV and VHM. For each actuary let his 2 dice be A and B. Let the parameter (number of sides) for A be θ and that for B be ψ. Note that A only depends on θ, while B only depends on ψ, since the two dice were selected independently.
Then EPV = Eθ,ψ[Var[A+B | θ,ψ]] = Eθ,ψ[Var[A | θ,ψ]] + Eθ,ψ[Var[B | θ,ψ]] = Eθ[Var[A | θ]] + Eψ[Var[B | ψ]] = 2.15 + 2.15 = (2)(2.15) = 4.30.
VHM = Varθ,ψ[E[A+B | θ,ψ]] = Varθ,ψ[E[A | θ,ψ] + E[B | θ,ψ]] = Varθ[E[A | θ]] + Varψ[E[B | ψ]] = (2)(0.45) = 0.90,
where I have used the fact that E[A | θ] and E[B | ψ] are independent and thus their variances add.
Total variance = EPV + VHM = 4.3 + 0.9 = 5.2.]
This previous exercise is subtly different from a situation where the two dice selected by a given actuary are always of the same type.
These dice examples can help one to think about insurance situations where one has more than one observation or insureds of different sizes.
Exercise: Each actuary attending a CAS Meeting selects two multi-sided dice, both of the same type. For each actuary, his multi-sided dice have a 60% chance of being 4-sided, a 30% chance of being 6-sided, and a 10% chance of being 8-sided. Each actuary rolls his dice and reports the sum. What is the expected variance of the results reported by all the actuaries?
[Solution: The total variance is the sum of the EPV and VHM. For each actuary let his two die rolls be A and B. Let the parameter (number of sides) for his dice be θ, the same for both dice.
Then EPV = Eθ[Var[A+B | θ]] = Eθ[Var[A | θ]] + Eθ[Var[B | θ]] = 2.15 + 2.15 = (2)(2.15) = 4.30.
The VHM = Varθ[E[A+B | θ]] = Varθ[2E[A | θ]] = (2²)Varθ[E[A | θ]] = (4)(0.45) = 1.80,
where we have used the fact that E[A | θ] and E[B | θ] are the same.
Total variance = EPV + VHM = 4.3 + 1.8 = 6.1.
Alternately, Total Variance = (N)(EPV for one observation) + (N²)(VHM for one observation) = (2)(2.15) + (2²)(0.45) = 6.1.]
Note that this exercise is the same mathematically as if each actuary chose a single die and reported the sum of rolling his die twice. Contrast this exercise with the previous one, in which each actuary chose two dice, with the type of each die independent of the other.
Letʼs go over these ideas again with each actuary rolling 3 dice.
For example, assume you sum the independent rolls of 2 four-sided dice and 1 six-sided die. The mean result is: (2)(2.5) + (1)(3.5) = 8.5. The variance is: (2)(15/12) + (1)(35/12) = 65/12 = 5.4167.102
In contrast, if one selected 3 dice at random, with a 2/3 chance that each one was four-sided and a 1/3 chance that it was six-sided, the variance of the sum of rolling the dice would be larger than in the previous situation where we know the types of dice.
Type of Die   A Priori Chance of this Type of Die   Mean for this Type of Die   Square of Mean of this Type of Die   Process Variance of this Type of Die
4-sided       0.6667                                2.5                         6.25                                 1.2500
6-sided       0.3333                                3.5                         12.25                                2.9167
Average                                             2.8333                      8.25                                 1.8056
The Expected Value of the Process Variance of a single die is 1.8056. For the roll of three dice it is (3)(1.8056) = 5.4167.
However, now that we are picking the dice at random, we need to add the VHM. The Variance of the Hypothetical Means for the selection of one die is: 8.25 - 2.8333² = 0.222.
The variance of a four-sided die is 15/12. The variance of a six-sided die is 35/12.
For the selection of three independent dice, the variance of the means is multiplied by 3; the VHM is: (3)(0.222) = 0.666.
One can also compute this variance by going back to first principles and listing all the possibilities:
Number of 4-sided Dice   Number of 6-sided Dice   A Priori Chance   Hypothetical Mean   Square of Hypothetical Mean
0                        3                        0.0370            10.5                110.25
1                        2                        0.2222            9.5                 90.25
2                        1                        0.4444            8.5                 72.25
3                        0                        0.2963            7.5                 56.25
Overall                                           1                 8.5                 72.9167
For example, the chance of one 4-sided die and two 6-sided dice is (3)(2/3)(1/3)² = 2/9.
The Variance of the Hypothetical Means is: 72.9167 - 8.5² = 0.6666. Thus the total variance is: 5.417 + 0.666 = 6.083, which we note is higher than the 5.417 variance calculated when we know we have 2 four-sided and 1 six-sided die.
Instead of making independent selections, you could pick dice which are always all of the same type. Assume you have three urns, A, B, and C. Urns A and B each contain 4-sided dice. Urn C contains 6-sided dice. Pick an urn at random, and then select 3 dice from the selected urn. Then there is a 2/3 chance that the selected dice all were four-sided and a 1/3 chance that they all were six-sided. In this situation, the variance of the sum of rolling the dice would be even larger than in the previous situation where the selection of each die was instead independent.
The Expected Value of the Process Variance of a single die is 1.8056. For the roll of three dice it is (3)(1.8056) = 5.4167, the same as above.
The Variance of the Hypothetical Means for the selection of one die is 2/9. For the selection of three identical dice, the hypothetical means are each multiplied by 3, so their variance is multiplied by 9. Thus the VHM = (3²)(2/9) = 2. Thus the total variance is: 5.417 + 2 = 7.417.
Mathematically, this situation is the same as one in which a single die is selected and then rolled three times. This is the situation that is common in credibility questions. In that case the VHM for the sum of the three rolls is: (3²)(2/9) = 2 and the total variance is: EPV + VHM = 5.417 + 2 = 7.417.
The Total Variance = (3)(EPV single die roll) + (3²)(VHM single die roll). The VHM has increased as per N², the square of the number of observations, while the EPV goes up only as N.
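The three situations just discussed (known dice, three independently selected dice, three identical dice) differ only in how the VHM scales. A sketch of my own making that explicit:

epv1, vhm1 = 1.8056, 2/9      # per-die EPV and VHM when the die type is unknown (2/3 vs 1/3 mix)
N = 3

# Known dice (2 four-sided, 1 six-sided): the process variances simply add.
var_known = 2 * (15/12) + 1 * (35/12)            # 5.4167

# Three dice selected independently: both the EPV and the VHM scale with N.
var_independent = N * epv1 + N * vhm1            # 6.083

# Three identical dice (or one die rolled three times): the VHM scales with N squared.
var_identical = N * epv1 + N**2 * vhm1           # 7.417
print(var_known, var_independent, var_identical)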
Total Variance of N observations = (N)(EPV for one observation) + (N²)(VHM for one observation).
This is the assumption behind the Buhlmann Credibility formula: Z = N / (N+K). Note that the Buhlmann Credibility parameter K is the ratio of the EPV to the VHM for a single die. The Buhlmann Credibility formula is set up to automatically adjust the credibility for the number of observations N.

Number of Observations:

Exercise: For the multi-sided die example, three rolls from a random die are: 2, 5 and 1. Use Buhlmann Credibility to estimate the next roll from the same die.
[Solution: The average observation is: (2 + 5 + 1)/3 = 8/3. There are three observations, so that Z = 3/(3 + K) = 3/(3 + 4.778) = 38.6%. The a priori mean is 3. Thus the estimate is: (38.6%)(8/3) + (61.4%)(3) = 2.87.
Comment: We have previously calculated the Buhlmann Credibility Parameter, K, for this situation. Since K does not depend on the observations, there is no need to calculate it again.]
It makes sense to assign more credibility to more rolls of the selected die, since as we gather more information, we should be able to get a better idea of which type of die has been chosen.
If one has N observations of the risk process, one assigns Buhlmann Credibility of: Z = N / (N + K).
For N = K, Z = K/(K + K) = 1/2. Therefore, the Buhlmann Credibility Parameter, K, is the number of observations needed for 50% credibility.
For the Buhlmann Credibility formula, as N → ∞, Z → 1, but Buhlmann Credibility never quite reaches 100%. In this example with K = 4.778:
# of Observations   1       2       3       5       10      25      100     1000
Credibility         17.3%   29.5%   38.6%   51.1%   67.7%   84.0%   95.4%   99.5%
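The table of credibilities is just Z = N/(N + K) evaluated at each N; for example (a sketch of my own):

K = 4.778
for N in [1, 2, 3, 5, 10, 25, 100, 1000]:
    print(N, round(N / (N + K), 3))
# prints 0.173, 0.295, 0.386, 0.511, 0.677, 0.840, 0.954, 0.995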
Below are two examples, corresponding to K = 100 and K = 400, of how Buhlmann Credibility varies with the number of observations:
[Figure: Credibility Z versus number of observations N (0 to 5000), with one curve for K = 100 and a lower curve for K = 400.]
Below is shown the difference in the resulting credibilities for these two different values, 100 and 400, of the Buhlmann Credibility Parameter, K:
[Figure: Difference in credibility versus number of observations N (0 to 5000), on a vertical scale from 0.05 to 0.30.]
If we add up N independent rolls of the same die, the process variances add. So if η² is the expected value of the process variance of a single die, then Nη² is the expected value of the process variance of the sum of N identical dice. The process variance of one 6-sided die is 35/12, while the process variance of the sum of ten 6-sided dice is 350/12.
In contrast, if τ² is the variance of the hypothetical means of one die roll, then the variance of the hypothetical means of the sum of N rolls of the same die is N²τ². This follows from the fact that each of the means is multiplied by N, and that multiplying by a constant multiplies the variance by the square of that constant.
Thus as N increases, the variance of the hypothetical means of the sum goes up as N², while the process variance goes up only as N. Based on the case with one roll, we expect the credibility to be given by:
Z = VHM / Total Variance = VHM / (VHM + EPV) = N²τ² / (N²τ² + Nη²) = N / (N + η²/τ²) = N / (N + K),
where K = η²/τ² = EPV / VHM, with the EPV and VHM each for a single die.103
In general, one computes the EPV and VHM for a single observation of the risk process, and then plugs the number of observations N into the formula for Buhlmann Credibility. If one is estimating claim frequencies, pure premiums, or aggregate losses, then N is in exposures. If one is estimating claim severities, then N is in number of claims. N is in the units of whatever is in the denominator of the quantity one is estimating.
Claim frequency = (number of claims) / (number of exposures).
Claim severity = (dollars of loss) / (number of claims).
Pure Premium = (dollars of loss) / (number of exposures).
103 In a subsequent section, Z = VHM / Total Variance is derived in the case of N dice or N observations.
Assumptions Underlying Z = N / (N+K):104

There are a number of important assumptions underlying the formula Z = N / (N+K), where K = EPV / VHM. While these assumptions usually hold on the exam, they do not hold in some real world applications.105 These assumptions are:
1. The complement of credibility is given to the overall mean.
2. The credibility is determined as the slope of the weighted least squares line to the Bayesian Estimates.
3. The risk parameters and risk process do not shift over time.106
4. The expected value of the process variance of the sum of N observations increases as N. Therefore the expected value of the process variance of the average of N observations decreases as 1/N.
5. The variance of the hypothetical means of the sum of N observations increases as N². Therefore the variance of the hypothetical means of the average of N observations is independent of N.
In addition, we must be careful that an insured has been picked at random, that we observe that insured, and that we then attempt to estimate the future observation of that same insured. If instead one goes back and chooses a new insured at random, then the information contained in the observation has been lost.
There is an exception, when they talk about picking insureds from a class:107 “One class of policyholders is selected at random from the book. Nine policyholders are selected at random from this class and are observed to have produced a total of seven claims. Five additional policyholders are selected at random from the same class. Determine the Bühlmann credibility estimate for the total number of claims for these five policyholders.” 108
Each insured in the class is assumed to have the same risk process, so we can use the experience from the first set of insureds to predict the future for the second set of insureds from the same class.
104 These assumptions are also discussed in the subsequent section on Least Squares Credibility.
105 See for example, Howard Mahlerʼs discussion of Glenn Meyersʼ “An Analysis of Experience Rating”, PCAS 1987, or “Credibility with Parameter Uncertainty, Risk Heterogeneity, and Shifting Risk Parameters,” by Howard C. Mahler, PCAS 1998.
106 In the Philbrick Target Shooting example discussed in a subsequent section, we assume the targets are fixed and that the skill of the marksmen does not change over time.
107 See for example, 4, 11/00, Q. 38 or 4, 11/02, Q. 32.
108 Language quoted from 4, 11/00, Q. 38.
Problems:

8.1 (1 point) The Expected Value of the Process Variance is 50. The Variance of the Hypothetical Means is 5.
How much Buhlmann Credibility is assigned to 30 observations of this risk process?
A. less than 60%
B. at least 60% but less than 65%
C. at least 65% but less than 70%
D. at least 70% but less than 75%
E. at least 75%

8.2 (1 point) If 4 observations are assigned 30% Buhlmann Credibility, what is the value of the Buhlmann Credibility parameter K?
A. less than 9.0
B. at least 9.0 but less than 9.2
C. at least 9.2 but less than 9.4
D. at least 9.4 but less than 9.6
E. at least 9.6

8.3 (3 points) Your friend picked at random one of three multi-sided dice. He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again. One of the three multi-sided dice has 4 sides (with 1, 2, 3 and 4 on them), the second die has the usual 6 sides (numbers 1 through 6), and the last die has 8 sides (with numbers 1 through 8). For a given die each side has an equal chance of being rolled; in other words, the die is fair. Assume the first roll was a seven. Use Buhlmann Credibility to estimate the next roll of the same die.
Hint: A die with S sides has mean: (S+1)/2, and variance: (S² - 1)/12.
A. 4.1     B. 4.2     C. 4.3     D. 4.4     E. 4.5

Use the following information for the next two questions:
There are three large urns, each filled with so many balls that you can treat it as if there are an infinite number. Urn 1 contains balls with “zero” written on them. Urn 2 has balls with “one” written on them. The final Urn 3 is filled with 50% balls with “zero” and 50% balls with “one”. An urn is chosen at random and three balls are selected.

8.4 (2 points) If all three balls have “zero” written on them, use Buhlmann Credibility to estimate the expected value of another ball picked from that urn.
A. 0.05     B. 0.06     C. 0.07     D. 0.08     E. 0.09

8.5 (2 points) If two balls have “zero” written on them and one ball has “one” written on it, use Buhlmann Credibility to estimate the expected value of another ball picked from that urn.
A. 0.30     B. 0.35     C. 0.40     D. 0.45     E. 0.50
8.6 (1 point) Which of the following statements are true with respect to the estimates from Buhlmann Credibility?
1. They are the linear least squares approximation to the estimates from Bayesian Analysis.
2. They are between the observation and the a priori mean.
3. They are within the range of hypotheses.
A. 1, 2     B. 1, 3     C. 2, 3     D. 1, 2, 3     E. None of A, B, C, or D

8.7 (2 points) There are two types of urns, each with many balls labeled $10 and $20.
Type of Urn   A Priori Chance of This Type of Urn   Percentage of $10 Balls   Percentage of $20 Balls
I             70%                                   80%                       20%
II            30%                                   60%                       40%
An urn is selected at random, and you observe a total of $60 on 5 balls drawn from that urn at random.
Using Buhlmann Credibility, what is the estimated value of the next ball drawn from that urn?
A. 12.1     B. 12.2     C. 12.3     D. 12.4     E. 12.5

8.8 (3 points) A die is selected at random from an urn that contains four six-sided dice with the following characteristics:
                 Number of Faces
Number on Face   Die A   Die B   Die C   Die D
1                3       1       1       1
2                1       3       1       1
3                1       1       3       1
4                1       1       1       3
The first four rolls of the selected die yielded the following in sequential order: 3, 4, 2, and 4.
Using Buhlmann Credibility, what is the expected value of the next roll of the same die?
A. Less than 2.8
B. At least 2.8, but less than 2.9
C. At least 2.9, but less than 3.0
D. At least 3.0, but less than 3.1
E. 3.1 or more

8.9 (2 points) The a priori mean annual number of claims per policy for a block of insurance policies is 10. A randomly selected policy had 15 claims in Year 1 and 11 claims in Year 2. Based on these two years of data, the Bühlmann credibility estimate of the number of claims on the selected policy in Year 3 is 10.75. In Year 3 this policy had 22 claims.
Calculate the Bühlmann credibility estimate of the number of claims on the selected policy in Year 4 based on the data for Years 1, 2 and 3.
(A) 11.0     (B) 11.5     (C) 12.0     (D) 12.5     (E) 13.0
8.10 (3 points) A teacher gives to her class a final exam with 100 available points. She finds that the scores very closely approximate a normal distribution with a mean of 56 and a standard deviation of 8. The teacher assumes that the score actually achieved by a student on the exam is normally distributed around his “true competence” with the course material. Based on her past experience, the teacher estimates the standard deviation of this distribution to be 4. The teacher uses Buhlmann credibility to estimate, for each student, his “true competence” from his observed exam score. The teacher wishes to pass those students whose “true competence” she estimates to be greater than or equal to 65.
In which range should the passing grade be?
A. Less than 65
B. At least 65, but less than 67
C. At least 67, but less than 69
D. At least 69, but less than 71
E. 71 or more

8.11 (3 points) There are three dice:
Die A: 2 faces labeled 0, 4 faces labeled 1.
Die B: 4 faces labeled 0, 2 faces labeled 1.
Die C: 5 faces labeled 0, 1 face labeled 1.
A die is picked at random. The die is rolled 8 times and 2 “ones” are observed.
Using Buhlmann Credibility, what is the expected value of the next roll of that same die?
A. Less than 0.25
B. At least 0.25 but less than 0.26
C. At least 0.26 but less than 0.27
D. At least 0.27 but less than 0.28
E. At least 0.28

8.12 (3 points) There are four types of urns with differing percentages of black balls. Each type of urn has a differing chance of being picked.
Type of Urn   A Priori Probability   Percentage of Black Balls
I             40%                    4%
II            30%                    8%
III           20%                    12%
IV            10%                    16%
An urn is chosen and fifty balls are drawn from it, with replacement; no black balls are drawn.
Use Buhlmann credibility to estimate the probability of picking a black ball from the same urn.
A. less than 3%
B. at least 3% but less than 4%
C. at least 4% but less than 5%
D. at least 5% but less than 6%
E. at least 6%
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 349 8.13 ( 3 points) Smith has selected a handful of ordinary six-sided dice. You do not know how many dice Smith has selected, but you assume it is either 10, 11, 12 or 13, with the following probabilities: Number of Dice A Priori Probability 10 20% 11 40% 12 30% 13 10% Smith rolls this same number of dice five times, without you seeing how many dice he rolled. Smith reports totals of: 54, 44, 41, 48 and 47. Using Buhlmann Credibility, what is your estimate of the average sum you can expect if Smith rolls this same number of dice again? Hint: A six-sided die has a mean of 3.5 and a variance of 35/12. A. Less than 43.0 B. At least 43.0, but less than 43.5 C. At least 43.5, but less than 44.0 D. At least 44.0, but less than 44.5 E. 44.5 or more. 8.14 (2 points) You are given the following: • A portfolio consists of a number of independent insureds. • Losses for each insured for each exposure period are one of three values: α, β, or γ. • The probabilities for α, β, and γ vary by insured, but are fixed over time. • The average probabilities for α, β, and γ over all insureds are 30%, 60%, and 10%, respectively. One insured is selected at random from the portfolio and its losses are observed for one exposure period. Estimates of the same insured's expected losses for the next exposure period are as follows: Observed Bayesian Analysis Buhlmann Losses Estimate Credibility Estimate α
y 100
β 150 140
γ 260 300
Determine y. A. Less than 85 B. At least 85, but less than 90 C. At least 90, but less than 95 D. At least 95, but less than 100 E. At least 100
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 350 8.15 (3 points) You are given: (i) Laurence uses classical credibility. (ii) Hans uses the Bühlmann credibility formula. (iii) The full credibility standard used by Laurence is eight times the Bühlmann Credibility Parameter, K, used by Hans. For most amounts of data, the amounts of credibility assigned by Laurence and Hans differ. However, for two non-zero amounts of data, they assign the same credibility. Which of the following are the credibilities assigned in these two cases? (A) 35% and 65% (B) 30% and 70% (C) 25% and 75% (D) 20% and 80% (E) 15% and 85% Use the following information for the next two questions: • The a priori estimate is 25. • Individual observations can extend from 13 to 49. • The Buhlmann Credibility parameter, k, is 11. • There are 4 observations. 8.16 (1 point) What is the minimum possible value of the Buhlmann credibility estimate of the next observation? A. Less than 21 B. At least 21, but less than 22 C. At least 22, but less than 23 D. At least 23, but less than 24 E. At least 24 8.17 (1 point) What is the maximum possible value of the Buhlmann credibility estimate of the next observation? A. Less than 31 B. At least 31, but less than 32 C. At least 32, but less than 33 D. At least 33, but less than 34 E. At least 34
[Answer choices A through E are graphs, not reproduced here: each plots the Bayes and Buhlmann credibility estimates of the number of claims in Year 2 against the number of claims observed in Year 1 (0 through 6).]
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 352 8.19 (2 points) For a group of policies, you are given: (i) The annual loss on an individual policy follows a gamma distribution with parameters α = 4 and θ. (ii) The prior distribution of θ has mean 600. (iii) A randomly selected policy had losses of 1400 in Year 1 and 1900 in Year 2. (iv) Loss data for Year 3 was misfiled and unavailable. (v) Based on the data in (iii), the Bühlmann credibility estimate of the loss on the selected policy in Year 4 is 1800. (vi) After the estimate in (v) was calculated, the data for Year 3 was located. The loss on the selected policy in Year 3 was 2763. Calculate the Bühlmann credibility estimate of the loss on the selected policy in Year 4 based on the data for Years 1, 2, and 3. (A) Less than 1850 (B) At least 1850, but less than 1950 (C) At least 1950, but less than 2050 (D) At least 2050, but less than 2150 (E) At least 2150 8.20 (3 points) Use the following information for a portfolio of insureds: (i) The frequency distribution has a parameter α, which varies across the portfolio. (ii) The mean frequency given α is µN(α). (iii) The distribution of µN(α) has a mean of 0.3 and a variance of 0.5. (iv) The severity distribution has a parameter β, which varies across the portfolio. (v) The mean severity given β is µX(β). (vi) The distribution of µX(β) has a mean of 200 and a variance of 6000. (vii) α and β vary independently across the portfolio. (viii) Given α and β, frequency and severity are independent. (ix) The overall variance of aggregate losses is 165,000. Determine the Buhlmann Credibility Parameter for aggregate losses, K. A. 2 B. 3 C. 4 D. 5 E. 6 8.21 (7 points) For a group of risks, you are given: (i) The number of claims for each risk follows a binomial distribution with parameters m = 6 and q. (ii) The values of q are equally likely to be 0.1, 0.3, or 0.6. During Year 1, k claims are observed for a randomly selected risk. For the same risk, both Bayesian and Bühlmann credibility estimates of the number of claims in Year 2 are calculated for k = 0, 1, 2, ... , 6. Plot as function of k these Bayesian and Bühlmann credibility estimates together on the same graph.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 353
Use the following information for the next two questions:
At Dinahʼs Diner by the Shore, her lunchtime customers have the following joint distribution of the number of cups of coffee they drink and the number of pieces of pie they eat:
                            Number of Pieces of Pie
Number of Cups of Coffee    0      1      2
0                           15%    10%    5%
1                           10%    20%    10%
2                           5%     10%    15%
Any given customer drinks the same number of cups of coffee each time they have lunch at Dinahʼs.
Burt has eaten a total of 4 pieces of pie with his last 3 lunches.
8.22 (3 points) Use Bayes Analysis to estimate the number of pieces of pie that Burt will eat with his next lunch.
8.23 (3 points) Use Buhlmann credibility to estimate the number of pieces of pie that Burt will eat with his next lunch.
Use the following information for the next two questions:
Your company offers an insurance product. There is at most one claim a year per policyholder.
There are the following three equally likely types of policyholders:
Type   Frequency of Claim   Severity of Claim Given Claim Occurs
                            Probability    Claim Size
1      20%                  80%            $1,500
                            20%            $1,000
2      40%                  50%            $1,500
                            50%            $1,000
3      80%                  10%            $1,500
                            90%            $1,000
You are also given the following 7 years of claims history for a policyholder named Jim:
Year     1    2       3    4       5    6       7
Losses   0    $1000   0    $1000   0    $1500   0
8.24 (4 points) Use Buhlmann Credibility to estimate Jimʼs losses in year 8.
A. 503   B. 508   C. 513   D. 518   E. 523
8.25 (3 points) Use Bayes Analysis to estimate Jimʼs losses in year 8.
A. 500   B. 505   C. 510   D. 515   E. 520
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 354 Use the following information for the following 2 questions: There are two classes of insureds. 70% of policyholders are in group A, while 30% of policyholders are in group B. The size of claims is either 1000 or 2000. In group A, 80% of the claims are of size 1000. In group B, 50% of the claims are of size 1000. For a particular policyholder, you do not know what group it is from, but you observe the following experience over four years: Year Number of Claims Claim Sizes 1 1 1000 2 3 1000, 1000, 2000 3 0 --4 2 2000, 2000 8.26 (2 points) Using Buhlmann Credibility, estimate the future average severity for this policyholder. A. 1350 B. 1370 C. 1390 D. 1410 E. 1430 8.27 (2 points) Using Bayes Analysis, estimate the future average severity for this policyholder. A. 1350 B. 1370 C. 1390 D. 1410 E. 1430
8.28 (3 points) Each insured has at most one claim a year. Claim Size Distribution Class Prior Probability Probability of a Claim 100 200 A 3/4 1/5 2/3 1/3 B 1/4 2/5 1/2 1/2 An insured is chosen at random and a single claim of size 200 has been observed during two years. Use Buhlmann Credibility to estimate the future pure premium for this insured. A. 40 B. 41 C. 42 D. 43 E. 44 8.29 (2 points) You are using Buhlmann Credibility to estimate annual frequency. The estimated future frequency for an individual with 1 year claim free is 7.875%. The estimated future frequency for an individual with 2 years claim free is 7.000%. Determine the estimated future frequency for an individual with 3 years claim free. A. 6.1% B. 6.2% C. 6.3% D. 6.4% E. 6.5%
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 355 8.30 (4, 11/82, Q.40) (1 point) In which of the following situations, should credibility be expected to increase? 1. Larger quantity of observations. 2. Increase in the prior mean. 3. Increase in the variance of hypothetical means. A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3
E. None of A, B, C, or D.
Use the following information for the next three questions: A game of chance has been designed where you are dealt cards from a deck of cards chosen at random from two available decks. The two decks are as follows: Deck A: 1 suit from a regular deck of cards (13 cards). Deck B: same as Deck A except the ace is missing (12 cards). You will receive $10 for each ace or face card in your hand. NOTE: A face card equals either a Jack, Queen, or King. 8.31 (2 points) What is the Expected Value of the Process Variance (for the dealing of a single card)? A. 16 B. 18 C. 20 D. 22 E. 24 8.32 (2 points) What is the Variance of the Hypothetical Means (for the dealing of a single card)? A. 0.06 B. 0.08 C. 0.10 D. 0.12 E. 0.14 8.33 (4, 5/84, Q.49) (2 points) Assume that you have been dealt two cards with replacement and both cards are either an ace or a face card (i.e., a $20 hand). Using Buhlmann Credibility, what is the expected value of the next card drawn from the same deck assuming the previous cards have been replaced? A. Less than 2.7 B. At least 2.7, but less than 2.8 C. At least 2.8, but less than 2.9 D. At least 2.9, but less than 3.0 E. 3.0 or more
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 356 8.34 (4, 5/85, Q.39) (3 points) A teacher gives a final exam to his class. He finds that the scores very closely approximate a normal distribution with a mean of 55 and standard deviation of 10. The teacher assumes that the score actually achieved by a student on the exam is normally distributed around his "true competence" with the course material. Based on his past experience the teacher estimates the standard deviation of this distribution to be 5. The teacher uses Buhlmann credibility to estimate, for each student, his "true competence" from his observed exam score. He wishes to pass those students whose "true competence" he estimates to be greater than or equal to 70%. In which range should the passing grade be? Hint: Total variance = expected value of the process variance + variance of the hypothetical means. A. 73 B. 75 C. 77 D. 79 E. 81 8.35 (4, 5/86 Q.40) (1 point) Which of the following statements are true? 1. If X is the random variable representing the aggregate loss amount, N the random variable representing number of claims, and Yi the random variable representing the amount of the ith claim, then VAR[X] = E[Yi] VAR[N] + VAR[Yi] (E[N])2 . 2. Using Buhlmann/Greatest Accuracy credibility methods, the amount of credibility assigned is a decreasing function of the expected value of the process variance. 3. P(H and B) = P(H | B) P(B) A. 3 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 8.36 (4, 5/86, Q.43) (3 points) Jones has selected a handful of ordinary six-sided dice. You do not know how many dice Jones has selected, but you assume it is either 11, 12 or 13, each with equal probability. Jones rolls this same number of dice five times, without you seeing how many dice he rolled. Jones reports totals of 45, 44, 51, 48 and 47. Using Buhlmann Credibility, what is your estimate of the average sum you can expect if he rolls this same number of dice again? Note: A six-sided die has a mean of 3.5 and a variance of 35/12. A. 42 B. 43 C. 44 D. 45 E. 46 8.37 (4, 5/88, Q.37) (3 points) The universe for this problem consists of two urns. Each urn contains several balls of equal size, weight and shape. Urn 1 contains 5 white, 10 black and 5 red balls. Urn 2 contains 20 white, 8 black and 12 red balls. Each white ball is worth zero, each black $100, and each red $500. An urn is selected at random, and you want to determine the expected value of a ball drawn from the urn. Each observation consists of drawing a ball at random from the urn, recording the result, and then replacing the ball. You make two observations from the same urn. What Buhlmann credibility would you assign these two observations, for the purpose of determining the expected value of a ball drawn from the urn? A. Less than 0.0003 B. At least 0.0003, but less than 0.0004 C. At least 0.0004, but less than 0.0005 D. At least 0.0005, but less than 0.0006 E. 0.0006 or more
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 357 8.38 (4, 5/88, Q.42) (1 point) Which of the following statements are true? 1. The Buhlmann credibility estimate is the “best" linear approximation to the Bayesian estimate of the pure premium. 2. A Bayesian estimate is the weighted average of the hypothesis and outcome. 3. If the number of claims for an individual are Poisson distributed with parameter q, where q is a Gamma distributed random variable, then the total number of accidents is also Gamma distributed. A. 1 B. 2 C. 1, 2 D. 1, 3 E. 1, 2 and 3 8.39 (4, 5/89, Q.32) (1 point) Which of the following will increase the credibility of your body of data? 1. A larger number of observations. 2. A smaller process variance. 3. A larger variance of the hypothetical means. A. 1 B. 2 C. 1, 2 D. 2, 3 E. 1, 2, 3 8.40 (4, 5/89, Q.37) (2 points) Your friend selected at random one of two urns and then she pulled a ball with the number 4 on it from the urn. Then, she replaced the ball in the urn. One of the urns contains four balls numbered 1 through 4. The other urn contains six balls numbered 1 through 6. Your friend will make another random selection of a ball from the same urn. Using the Buhlmann credibility model what is the estimate of the expected value of the number on the ball? A. Less than 2.925 B. At least 2.925, but less than 2.975 C. At least 2.975, but less than 3.025 D. At least 3.025, but less than 3.075 E. 3.075 or more 8.41 (4, 5/90, Q.35) (1 point) The underlying expected loss for each individual insured is assumed to be constant over time. If the Buhlmann credibility assigned to the pure premium for an insured observed for one year is 1/2, what is the Buhlmann credibility to be assigned to the pure premium for an insured observed for 3 years? A. 1/2 B. 2/3 C. 3/4 D. 6/7 E. Cannot be determined
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 358 8.42 (4, 5/90, Q.40) (2 points) Three urns contain balls marked with either 0 or 1 in the proportions described below. Marked 0 Marked 1 Urn A 10% 90% Urn B 60 40 Urn C 80 20 An urn is selected at random and three balls are selected, with replacement, from the urn. The total of the values is 1. Three more balls are selected from the same urn. Calculate the expected total of the three balls using Buhlmann's credibility formula. A. less than 1.05 B. at least 1.05 but less than 1.10 C. at least 1.10 but less than 1.15 D. at least 1.15 but less than 1.20 E. at least 1.20 8.43 (4, 5/91, Q.37) (2 points) One spinner is selected at random from a group of three different spinners. Each of the spinners is divided into six equally likely sectors marked as described below. ----------Number of Sectors---------Spinner Marked 0 Marked 12 Marked 48 A 2 2 2 B 3 2 1 C 4 1 1 Assume a spinner is selected and a zero is obtained on the first spin. What is the Buhlmann credibility estimate of the expected value of the second spin using the same spinner? A. Less than 12.5 B. At least 12.5 but less than 13.0 C. At least 13.0 but less than 13.5 D. At least 13.5 but less than 14.0 E. At least 14.0
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 359 8.44 (4, 5/91, Q.51) (3 points) Four urns contain balls marked with either 0 or 1 in the proportions described below. Urn Marked 0 Marked 1 A 70% 30% B 70 30 C 30 70 D 20 80 An urn is selected at random and four balls are selected from the urn with replacement. The total of the values is 2. Four more balls are selected from the same urn. Calculate the expected total of the four balls using Buhlmann's credibility formula. A. Less than 1.96 B. At least 1.96 but less than 1.99 C. At least 1.99 but less than 2.02 D. At least 2.02 but less than 2.05 E. At least 2.05 8.45 (4B, 5/92, Q.9) (3 points) Two urns contain balls each marked with 0, 1, or 2 in the proportions described below: Percentage of Balls in Urn Marked 0 Marked 1 Marked 2 Urn A 0.20 0.40 0.40 Urn B 0.70 0.20 0.10 An urn is selected at random and two balls are selected, with replacement, from the urn. The sum of values on the selected balls is 2. Two more balls are selected from the same urn. Determine the expected total of the two balls using Buhlmann's credibility formula. A. Less than 1.6 B. At least 1.6 but less than 1.7 C. At least 1.7 but less than 1.8 D. At least 1.8 but less than 1.9 E. At least 1.9 8.46 (4B, 11/92, Q.19) (1 point) You are given the following:
• The Buhlmann credibility of an individual risk's experience is 1/3 based upon 1 observation. • The risk's underlying expected loss is constant. Determine the Buhlmann credibility for the risk's experience after four observations. A. 1/4 B. 1/2 C. 2/3 D. 3/4 E. Cannot be determined.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 360 8.47 (4B, 5/93, Q.3) (1 point) You are given the following: • X is a random variable with mean m and variance v.
• m is a random variable with mean 2 and variance 4. • v is a random variable with mean 8 and variance 32. Determine the value of the Buhlmann credibility factor Z, after three observations of X. A. Less than 0.25 B. At least 0.25 but less than 0.50 C. At least 0.50 but less than 0.75 D. At least 0.75 but less than 0.90 E. At least 0.90 Use the following information for the next two questions: Two urns contain balls with each ball marked 0 or 1 in the proportions described below: Percentage of Balls in Urn Marked 0 Marked 1 Urn A 20% 80% Urn B 70% 30% An urn is randomly selected and two balls are drawn from the urn. The sum of the values on the selected balls is 1. Two more balls are selected from the same urn. Note: Assume that each selected ball has been returned to the urn before the next ball is drawn. 8.48 (4B, 11/94, Q.6) (3 points) Determine the Buhlmann credibility estimate of the expected value of the sum of the values on the second pair of selected balls. A. Less than 1.035 B. At least 1.035, but less than 1.055 C. At least 1.055, but less than 1.075 D. At least 1.075, but less than 1.095 E. At least 1.095 8.49 (4B, 11/94, Q.7) (1 point) The sum of the values of the second pair of selected balls was 2. One of the two urns is then randomly selected and two balls are drawn from the urn. Determine the Buhlmann credibility estimate of the expected value of the sum of the values on the third pair of selected balls. A. Less than 1.07 B. At least 1.07, but less than 1.17 C. At least 1.17, but less than 1.27 D. At least 1.27, but less than 1.37 E. At least 1.37
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 361 8.50 (4B, 5/95, Q.2) (2 points) You are given the following: • The Buhlmann credibility of three observations is twice the credibility of one observation. • The expected value of the process variance is 9. What is the variance of the hypothetical means? A. 3 B. 4 C. 6 D. 8 E. 9 8.51 (4B, 5/95, Q.16) (1 point) Which of the following will DECREASE the credibility of the current observations? 1. Decrease in the number of observations 2. Decrease in the variance of the hypothetical means 3. Decrease in the expected value of the process variance A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3 8.52 (4B, 11/95, Q.2) (1 point) The Buhlmann credibility of five observations of the loss experience of a single risk is 0.29. What is the Buhlmann credibility of two observations of the loss experience of this risk? A. Less than 0.100 B. At least 0.100, but less than 0.125 C. At least 0.125, but less than 0.150 D. At least 0.150, but less than 0.175 E. At least 0.175 8.53 (4B, 11/95, Q.21) (3 points) Ten urns each contain five balls, numbered as follows: Urn 1: 1,2,3,4,5 Urn 2: 1,2,3,4,5 Urn 3: 1,2,3,4,5 Urn 4: 1,2,3,4,5 Urn 5: 1,2,3,4,5 Urn 6: 1,1,1,1,1 Urn 7: 2,2,2,2,2 Urn 8: 3,3,3,3,3 Urn 9: 4,4,4,4,4 Urn 10: 5,5,5,5,5 An urn is randomly selected. A ball is then randomly selected from this urn. The selected ball has the number 2 on it. This ball is then replaced, and another ball is randomly selected from the same urn. The second selected ball has the number 3 on it. This ball is then replaced, and another ball is randomly selected from the same urn. Determine the Buhlmann credibility estimate of the expected value of the number on this third selected ball. A. Less than 2.2 B. At least 2.2, but less than 2.4 C. At least 2.4, but less than 2.6 D. At least 2.6, but less than 2.8 E. At least 2.8
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 362 8.54 (4B, 5/96, Q.16) (3 points) You are given the following: • Two urns contain balls.
• In Urn A, half of the balls are marked 0 and half of the balls are marked 2.
•
In Urn B, half of the balls are marked 0 and half of the balls are marked t. An urn is randomly selected. A ball is then randomly selected from this urn, observed, and replaced. You wish to estimate the expected value of the number on the second ball randomly selected from this same urn. For which of the following values of t would the Buhlmann credibility of the first observation be greater than 1/10? A. 1 B. 2 C. 3 D. 4 E. 5 8.55 (4B, 5/96, Q.24) (2 points) A die is randomly selected from a pair of fair, six-sided dice, A and B. Die A has its faces marked with 1, 2, 3, 4, 5, and 6. Die B has its faces marked with 6, 7, 8, 9, 10, and 11. The selected die is rolled four times. The results of the first three rolls are 1, 2, and 3. Determine the Buhlmann credibility estimate of the expected value of the result of the fourth roll. A. Less than 1.75 B. At least 1.75, but less than 2.25 C. At least 2.25, but less than 2.75 D. At least 2.75, but less than 3.25 E. At least 3.25 8.56 (4B, 11/96, Q.10) (2 points) The Buhlmann credibility of n observations of the loss experience of a single risk is 1/3. The Buhlmann credibility of n+1 observations of the loss experience of this risk is 2/5. Determine the Buhlmann credibility of n+2 observations of the loss experience of this risk. A. 4/9 B. 5/11 C. 1/2 D. 6/11 E. 5/9 8.57 (4B, 5/97, Q.23) (3 points) You are given the following:
• Two urns contain balls.
• In Urn A, half of the balls are marked 0 and half of the balls are marked 2.
•
In Urn B, half of the balls are marked 0 and half of the balls are marked t. An urn is randomly selected. A ball is then randomly selected from this urn, observed, and replaced. An estimate is to be made of the expected value of the number on the second ball randomly selected from this same urn. Determine the limit of the Buhlmann credibility of the first observation as t goes to infinity. A. 0 B. 1/3 C. 1/2 D. 2/3 E. 1
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 363 8.58 (4B, 11/97, Q.12) (2 points) You are given the following: • A portfolio consists of a number of independent insureds. • Losses for each insured for each exposure period are one of three values: a, b, or c. • The probabilities for a, b, and c vary by insured, but are fixed over time. • The average probabilities for a, b, and c over all insureds are 5/12, 1/6, and 5/12, respectively. One insured is selected at random from the portfolio and its losses are observed for one exposure period. Estimates of the same insured's expected losses for the next exposure period are as follows: Observed Bayesian Analysis Buhlmann Losses Estimate Credibility Estimate a 3.0 x b 4.5 3.8 c 6.0 6.1 Determine x. A. Less than 1.75 B. At least 1.75, but less than 2.50 C. At least 2.50, but less than 3.25 D. At least 3.25, but less than 4.00 E. At least 4.00
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 364 Use the following information for the next two questions: • An urn contains six dice. • Three of the dice have two sides marked 1, two sides marked 2, and two sides marked 3. • Two of the dice have two sides marked 1, two sides marked 3, and two sides marked 5. • One die has all six sides marked 6. One die is randomly selected from the urn and rolled. A 6 is observed. 8.59 (4B, 5/98, Q.14) (2 points) Determine the Buhlmann credibility estimate of the expected value of the second roll of this same die. A. Less than 4.5 B. At least 4.5, but less than 5.0 C. At least 5.0, but less than 5.5 D. At least 5.5, but less than 6.0 E. At least 6.0 8.60 (4B, 5/98, Q.15) (3 points) The selected die is placed back in the urn. A seventh die is then added to the urn. The seventh die is one of the following three types: 1. Two sides marked 1, two sides marked 3, and two sides marked 5. 2. All six sides marked 3. 3. All six sides marked 6. One die is again randomly selected from the urn and rolled. An estimate is to be made of the expected value of the second roll of this same die. Determine which of the three types for the seventh die would increase the Buhlmann credibility of the first roll of the selected die (compared to the Buhlmann credibility used in the previous question.) A. 1 B. 2 C. 3 D. 1, 3 E. 2, 3 8.61 (Course 4 Sample Exam 2000, Q.28) Four urns contain balls marked either 1 or 3 in the following proportions: Urn Marked 1 Marked 3 1 - p1 1 p1 2
p2 1 - p2
3 p3 1 - p3
4 p4 1 - p4
(That is, Urn i contains proportion pi of its balls marked 1 and proportion 1 - pi marked 3, for i = 1, 2, 3, 4.)
An urn is selected at random (with each urn being equally likely) and balls are drawn from it in three separate rounds. In the first round, two balls are drawn with replacement. In the second round, one ball is drawn with replacement. In the third round two balls are drawn with replacement. After two rounds, the Bühlmann-Straub credibility estimate of the total of the values on the two balls to be drawn in the third round could range from 3.8 to 5.0 (depending on the results of the first two rounds). Determine the value of Buhlmann-Straubʼs k.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 365 8.62 (4, 5/01, Q.6) (2.5 points) You are given: (i) The full credibility standard is 100 expected claims. (ii) The square-root rule is used for partial credibility. You approximate the partial credibility formula with a Bühlmann credibility formula by selecting a Bühlmann k value that matches the partial credibility formula when 25 claims are expected. Determine the credibility factor for the Bühlmann credibility formula when 100 claims are expected. (A) 0.44 (B) 0.50 (C) 0.80 (D) 0.95 (E) 1.00
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 366
8.63 (4, 5/05, Q.32 & 2009 Sample Q.200) (2.9 points) For five types of risks, you are given:
(i) The expected number of claims in a year for these risks ranges from 1.0 to 4.0.
(ii) The number of claims follows a Poisson distribution for each risk.
During Year 1, n claims are observed for a randomly selected risk.
For the same risk, both Bayes and Bühlmann credibility estimates of the number of claims in Year 2 are calculated for n = 0, 1, 2, ... , 9.
Which graph represents these estimates?
[Answer choices A through E are graphs, not reproduced here: each plots the Bayes and Bühlmann estimates against the number of claims in Year 1 (0 through 10).]
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 367 8.64 (4, 11/06, Q.6 & 2009 Sample Q.251) (2.9 points) For a group of policies, you are given: (i) The annual loss on an individual policy follows a gamma distribution with parameters α = 4 and θ. (ii) The prior distribution of θ has mean 600. (iii) A randomly selected policy had losses of 1400 in Year 1 and 1900 in Year 2. (iv) Loss data for Year 3 was misfiled and unavailable. (v) Based on the data in (iii), the Bühlmann credibility estimate of the loss on the selected policy in Year 4 is 1800. (vi) After the estimate in (v) was calculated, the data for Year 3 was located. The loss on the selected policy in Year 3 was 2763. Calculate the Bühlmann credibility estimate of the loss on the selected policy in Year 4 based on the data for Years 1, 2 and 3. (A) Less than 1850 (B) At least 1850, but less than 1950 (C) At least 1950, but less than 2050 (D) At least 2050, but less than 2150 (E) At least 2150
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 368
8.65 (4, 5/07, Q.2) (2.5 points) For a group of risks, you are given:
(i) The number of claims for each risk follows a binomial distribution with parameters m = 6 and q.
(ii) The values of q range from 0.1 to 0.6.
During Year 1, k claims are observed for a randomly selected risk.
For the same risk, both Bayesian and Bühlmann credibility estimates of the number of claims in Year 2 are calculated for k = 0, 1, 2, ... , 6.
Determine the graph that is consistent with these estimates.
[Answer choices (A) through (E) are graphs, not reproduced here.]
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 369 8.66 (4, 5/07, Q.21) (2.5 points) You are given: (i) Losses in a given year follow a gamma distribution with parameters α and θ, where θ does not vary by policyholder. (ii) The prior distribution of α has mean 50. (iii) The Bühlmann credibility factor based on two years of experience is 0.25. Calculate Var(α). (A) Less than 10 (B) At least 10, but less than 15 (C) At least 15, but less than 20 (D) At least 20, but less than 25 (E) At least 25
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 370 Solutions to Problems: 8.1. E. The Buhlmann Credibility parameter is: K = (Expected Value of the Process Variance) / (Variance of the Hypothetical Means) = 50/ 5 = 10. Z = N / (N+K) = 30 / (30 + 10) = 75%. 8.2. C. Z = N / (N+K) therefore K = N {(1/Z) - 1} = 4{(1/.3) - 1)} = 9.33. 8.3. A. The variance of the hypothetical means = 12.91667 - 3.52 = .6667.
Type of Die   A Priori Chance   Process Variance   Mean Die Roll   Square of Mean Die Roll
4-sided       0.333             1.250              2.5             6.25
6-sided       0.333             2.917              3.5             12.25
8-sided       0.333             5.250              4.5             20.25
Overall                         3.1389             3.50            12.91667
K = EPV / VHM = 3.1389 / .6667 = 4.71. Z = 1/(1 + 4.71) = .175. The a priori estimate is 3.5 and the observation is 7, so the new estimate is: (.175)(7) + (.825)(3.5) = 4.11. Comment: We know that the 4-sided and 6-sided dice could not have resulted in a seven. Thus using Bayes Analysis, the posterior distribution would be 100% probability of the 8-sided die. Buhlmann Credibility is a linear approximation to Bayes Analysis. This illustrates the problems with using a linear estimator such as Buhlmann Credibility. The Buhlmann Credibility estimate when there is an extreme observation may not be very sensible. On the exam, use Buhlmann Credibility when they ask you to. 8.4. C. Expected Value of the Process Variance = 0.0833. Variance of the Hypothetical Means = 0.4167 - 0.52 = 0.1667. Type of Urn 1 2 3 Average
A Priori Probability: 0.3333, 0.3333, 0.3333
Mean for this Type of Urn: 0, 1, 0.5; average 0.5
Square of Mean: 0, 1, 0.25; average 0.4167
Process Variance: 0, 0, 0.25; average (EPV) 0.0833
K= EPV / VHM = .0833 / .1667 = .5 Thus for N = 3, Z = 3/(3+.5) = 85.7%. The observed mean is 0 and the a priori mean is .5, therefore the new estimate = (0)(0.857) + (0.5)(1 - 0.857) = 0.0715. 8.5. B. As computed in the solution to the previous question, for 3 observations Z = 85.7% and the a priori mean is .5. Since the observed mean is 1/3, the new estimate is: (1/3)(.857) + (0.5)(1 - 0.857) = 0.357.
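The arithmetic in solutions 8.1 through 8.5 follows one recipe: compute the expected value of the process variance (EPV) and the variance of the hypothetical means (VHM) from the prior, set K = EPV/VHM and Z = N/(N + K), and credibility-weight the observed mean against the a priori mean. A minimal Python sketch of that recipe, checked against solution 8.4; the function name buhlmann_estimate is my own, not from the text:

def buhlmann_estimate(priors, means, process_vars, n, observed_mean):
    # Buhlmann credibility estimate for a discrete prior over risk types.
    mu = sum(p * m for p, m in zip(priors, means))                # a priori mean
    epv = sum(p * v for p, v in zip(priors, process_vars))        # expected value of the process variance
    vhm = sum(p * m * m for p, m in zip(priors, means)) - mu**2   # variance of the hypothetical means
    z = n / (n + epv / vhm)
    return z * observed_mean + (1 - z) * mu

# Solution 8.4: three equally likely urns with hypothetical means 0, 1, 0.5,
# process variances 0, 0, 0.25; three balls observed, all zeros.
print(buhlmann_estimate([1/3, 1/3, 1/3], [0, 1, 0.5], [0, 0, 0.25], 3, 0))   # about 0.071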
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 371 8.6. A. 1. True. 2. True (This is always true of credibility, but may not be true of Bayesian analysis.) 3. False (This is always true of Bayesian analysis, but may not be true of credibility.) 8.7. E. For example, the second moment of Urn II is: (.6)(102 ) + (.4)(202 ) = 220. The process variance of Urn II = 220 - 142 = 24. Type of Urn I II
A Priori Probability: 70%, 30%
Mean: 12.0, 14.0; overall 12.6
Square of Mean: 144.0, 196.0; average 159.6
Second Moment: 160, 220
Process Variance: 16.0, 24.0; average (EPV) 18.4
The variance of the hypothetical means is: 159.6 - 12.62 = 0.84. Thus the Buhlmann credibility parameter is K = EPV / VHM = 18.4 / 0.84 = 21.9. Thus for 5 observations Z = 5 / (5 + 21.9) = 18.6%. The prior mean is $12.6. The observed mean is: 60/5 = $12.0. Thus the new estimate is: (.186)(12.0) + (1 - .186)(12.6) = $12.49. 8.8. A. For Die A the mean is: (1+1+1+2+3+4)/6 = 2 and the second moment is: (1+1+1+4+9+16)/6 = 5.3333. Thus the process variance for Die A is 5.3333 - 22 = 1.3333. Similarly for Die B the mean is 2.3333 and the second moment is 6.3333. Thus the process variance for Die B is: 6.333 - 2.3332 = . 889. The mean of Die C is 2.6667. The process variance for Die C is: {(1-2.6667)2 + (2-2.6667)2 + (3)(3-2.6667)2 + (4-2.6667)2 } / 6 = .889, the same as Die B. The mean of Die D is 3. The process variance for Die D is: {(1-3)2 + (2-3)2 + (3-3)2 + (3)(4-3)2 } / 6 = 1.333, the same as Die A. Thus the expected value of the process variance = (1/4)(1.333) + (1/4)(.889) + (1/4)(.889) + (1/4)(1.333) = 1.111.
Die    A Priori Chance of Die   Mean     Square of Mean
A      0.250                    2.0000   4.0000
B      0.250                    2.3333   5.4443
C      0.250                    2.6667   7.1113
D      0.250                    3.0000   9.0000
Mean                            2.5000   6.3889
Thus the Variance of the Hypothetical Means = 6.3889 - 2.52 = 0.1389. Therefore, the Buhlmann Credibility Parameter = K = EPV / VHM = 1.111 / .1389 = 8.0. Thus the credibility for 4 observations is: 4/(4+K) = 4 /12 = 1/3. The a priori mean is 2.5. The observed mean is: (3 + 4 + 2 + 4)/4 = 3.25. Thus the estimated future die roll is: (1/3)(3.25) + (1 - 1/3)(2.5) = 2.75. Comment: Iʼve illustrated the two different methods of computing variances, first in terms of the moments and second as the second central moment.
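For solution 8.8, the per-die moments can be generated from the face values instead of being keyed in. A short sketch under the same assumptions (the face lists and helper names below are mine); it reproduces the estimate of 2.75:

dice = {   # face values of the four dice in problem 8.8
    "A": [1, 1, 1, 2, 3, 4],
    "B": [1, 2, 2, 2, 3, 4],
    "C": [1, 2, 3, 3, 3, 4],
    "D": [1, 2, 3, 4, 4, 4],
}
def mean(xs):
    return sum(xs) / len(xs)
def var(xs):   # second moment minus square of the mean
    return mean([x * x for x in xs]) - mean(xs) ** 2

means = [mean(faces) for faces in dice.values()]
epv = mean([var(faces) for faces in dice.values()])   # dice equally likely: 1.111
vhm = var(means)                                      # 0.1389, so K = 8
z = 4 / (4 + epv / vhm)                               # four rolls observed: 1/3
observed = (3 + 4 + 2 + 4) / 4                        # 3.25
print(z * observed + (1 - z) * mean(means))           # 2.75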
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 372 8.9. C. The a priori mean is 10. The mean for observed Years 1 and 2 is: (15 + 11)/2 = 13. Therefore, 13Z + (1 - Z)(10) = 10.75 ⇒ Z = 0.25. ⇒ 2/(2 + K) = 0.25 ⇒ K = 6. For three years of data, the observed mean is: (15 + 11 + 22)/3 = 16. Z = 3/(3 + 6) = 1/3. (1/3)(16) + (2/3)(10) = 12. 8.10. C. The total variance is 82 = 64. The (assumed) expected value of the process variance = 42 = 16. Thus the variance of the hypothetical means = total variance - EPV = 64 -16 = 48. The Buhlmann Credibility parameter K is EPV/VHM = 16/48 = 1/3. Thus the credibility assigned to one observation is 1/(1+K) = 1 /(4/3) = .75. Thus if one observes a score of s, our estimate of that studentʼs “true competence” would be s(.75) + (1-.75)(56) = 14 + .75s. Thus an estimated true competency of 65 would correspond to a score such that 65 = 14 + .75s. Thus s = 51 / .75 = 68. Comment: Note that for a single observation, Z = (total variance - EPV) / total variance = (64 - 16)/64 = 3/4. 8.11. E. The Variance of the Hypothetical Means = .1944 - .38892 = .0432. Type of Die A B C
A Priori Probability: 0.3333, 0.3333, 0.3333
Mean: 0.6667, 0.3333, 0.1667; average 0.3889
Square of Mean: 0.4444, 0.1111, 0.0278; average 0.1944
Process Variance: 0.2222, 0.2222, 0.1389; average (EPV) 0.1944
Thus the Buhlmann credibility parameter is K =EPV / VHM = .1944 / .0432 = 4.50. Thus for 8 observations Z = 8 / (8 +4.5) = 64%. The prior mean is .3889 and the observation is 2/8= .25. Thus the new estimate is: (64%)(.25) + (36%)(.3889) = 0.300. 8.12. B. Assign a value of zero to a non-black ball and a value of 1 to a black ball. Then the future estimate is equal to the chance of picking a black ball. Use the fact that for the Bernoulli the process variance is q(1-q). Type I II III IV Overall
A Priori Probability: 0.4, 0.3, 0.2, 0.1
% Black Balls: 0.04, 0.08, 0.12, 0.16; overall mean 0.08
Process Variance: 0.0384, 0.0736, 0.1056, 0.1344; average (EPV) 0.072
The EPV = .072. Using the fact that the overall mean is .08, the variance of the hypothetical means is = (.4(.04-.08)2 ) + (.3(.08-.08)2 ) + (.2(.12-.08)2 ) + (.1(.16-.08)2 ) = .0016. Thus K = .072 / .0016 = 45. Z = 50 / (50+45) = 53%. Estimate = (0)(53%) + (8%)(47%) = 3.8%. Comment: Note that the estimate is outside the range of hypothetical means. While this can happen for estimates based on Credibility, it canʼt for estimates based on Bayes Theorem.
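The point in the comment to solution 8.12, that a credibility estimate can fall outside the range of the hypothetical means, is easy to verify numerically. A small sketch under the same assumptions (Bernoulli process variance q(1 - q); variable names are mine):

priors = [0.4, 0.3, 0.2, 0.1]
q = [0.04, 0.08, 0.12, 0.16]      # chance of a black ball, by type of urn

mu = sum(p * qi for p, qi in zip(priors, q))                  # 0.08
epv = sum(p * qi * (1 - qi) for p, qi in zip(priors, q))      # 0.072
vhm = sum(p * (qi - mu) ** 2 for p, qi in zip(priors, q))     # 0.0016
z = 50 / (50 + epv / vhm)                                     # about 0.53
estimate = z * 0 + (1 - z) * mu
print(estimate, estimate < min(q))   # about 0.038, True: below the smallest hypothetical mean of 0.04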
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 373 8.13. C. The mean of a sum of N 6-sided dice is 3.5N. The process variance of a sum on N independent 6-sided is N(35/12). Number of Dice
A Priori Chance of this Number of Dice
10 11 12 13
0.2 0.4 0.3 0.1
Average
Mean of Sum Square of Mean of Process Variance of Dice Sum of Dice for Sum of Dice 35.0 38.5 42.0 45.5
1225.000 1482.250 1764.000 2070.250
29.167 32.083 35.000 37.917
39.550
1574.125
32.958
Thus the variance of the hypothetical means = 1574.125 - 39.552 = 9.92. The EPV = 32.958. Thus K = EPV/VHM = 32.958 / 9.92 = 3.32. Thus five observations are given credibility Z = 5/(5+K) = 5/8.32 = .601. The observed average is (54+ 44+ 41+ 48 +47)/5 = 46.8. The a priori mean is 39.55. Thus the new estimate is: (.601)(46.8)+(1-.601)(39.55) = 43.9 8.14. C. The weighted average of the Buhlmann Credibility Estimates are: (30%)(100) + (60%)(140) + (10%)(300) = 144. Since the Buhlmann Credibility Estimates are in balance, the a priori mean is 144. Since the Bayesian Analysis Estimates are also in balance, we must have: (30%)(y) + (60%)(150) + (10%)(260) = 144. Thus y = (144 - 116)/.3 = 93.33.
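Solution 8.13 above scales the single-die moments up to a hand of N dice: hypothetical mean 3.5N and process variance N(35/12). A brief sketch (my own variable names) that reproduces the estimate of roughly 43.9:

priors = [0.2, 0.4, 0.3, 0.1]
n_dice = [10, 11, 12, 13]
means = [3.5 * n for n in n_dice]            # hypothetical mean of one roll of the hand
pvars = [n * 35 / 12 for n in n_dice]        # process variance of one roll of the hand

mu = sum(p * m for p, m in zip(priors, means))                  # 39.55
epv = sum(p * v for p, v in zip(priors, pvars))                 # about 32.96
vhm = sum(p * m * m for p, m in zip(priors, means)) - mu ** 2   # about 9.92
z = 5 / (5 + epv / vhm)                                         # five observed totals: 0.601
observed = (54 + 44 + 41 + 48 + 47) / 5                         # 46.8
print(z * observed + (1 - z) * mu)                              # about 43.9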
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 374
8.15. E. For classical credibility, Z = √(N / (8K)). For Bühlmann Credibility, Z = N/(N + K).
Setting the two credibilities equal: √(N / (8K)) = N/(N + K). Therefore, for N ≠ 0, 8KN = (N + K)².
N² - 6NK + K² = 0. N = {6K ± √(36K² - 4K²)}/2 = K(3 ± 2√2) = 5.83K or 0.172K.
For N = 5.83K, Z = 5.83/6.83 = 85%. For N = 0.172K, Z = 0.172/1.172 = 15%.
Comment: For K = 100 and a standard for full credibility of 800, a graph of the two credibility formulas against the number of claims (omitted here) shows the two curves crossing at 17 and 583 claims, where Z = 15% or 85%.
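The two credibilities in solution 8.15 can be confirmed by solving N² - 6NK + K² = 0 for N in units of K. A quick check using only the standard library (the script is mine):

import math
roots = [3 + 2 * math.sqrt(2), 3 - 2 * math.sqrt(2)]   # N/K, about 5.83 and 0.172
for x in roots:
    z_buhlmann = x / (x + 1)           # Z = N/(N + K) with N = xK
    z_classical = math.sqrt(x / 8)     # Z = sqrt(N/(8K))
    print(round(z_buhlmann, 3), round(z_classical, 3))   # about 0.85 and 0.85, then 0.15 and 0.15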
8.16. B. & 8.17. B. Z = 4/(4+11) = 4/15. Let x be the average of the 4 observations; x can range from 13 to 49. The estimate is: (4/15)(x) + (11/15)(25) = (4x + 275)/15. For x = 13 the estimate is 327/15 = 21.8. For x = 49 the estimate is: 471/15 = 31.4.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 375 8.18. E. The Buhlmann estimate should be on a straight line, but they are not in Graph B, eliminating graph B. The Buhlmann estimates are the weighted least squares linear approximation to those of Bayes Analysis. However, in graph A, the Buhlmann estimates are always higher than the Bayes estimates. Thus graph A is eliminated. The Bayes estimates must remain within the range of hypotheses, in this case 0.5 to 2.0, eliminating graph C. In graph D, the slope of the Buhlmann line is about: (2.1 - 0.3)/6 = 0.3 = Z. In graph D, the intercept of the Buhlmann line is about: 0.3 = (1 - Z)µ. ⇒ µ = 0.3/(1 - .3) = 0.43. However, µ, the a priori overall mean should be between 0.5 and 2.0, eliminating graph D. Comment: Similar to 4, 5/05, Q.32. The problem with graph D is not obvious! The slope of the straight line formed by the Buhlmann estimates is Z. For ordinary situations, 0 < Z < 1. Thus this slope must be positive and less than 1, which is true for all those graphs in which the Buhlmann estimates are on a straight line. There is no requirement that the Buhlmann estimates must remain within the range of hypotheses. 8.19. D. The a priori mean is E[αθ] = 4E[θ] = (4)(600) = 2400. The mean for observed Years 1 and 2 is: (1400 + 1900)/2 = 1650. Therefore, Z1650 + (1 - Z)(2400) = 1800. ⇒ Z = 0.8. ⇒ 2/(2+K) = 0.8 ⇒ K = 1/2. For three years of data, the observed mean is: (1400 + 1900 + 2763)/3 = 2021, and Z = 3/(3 + 1/2) = 6/7. (6/7)(2021) + (1/7)(2400) = 2075. Comment: Similar to 4, 11/06, Q.6. 8.20. E. E[µN(α)] = 0.3. E[µN(α)2 ] = 0.5 + 0.32 = 0.59. E[µX(β)] = 200. E[µX(β)2 ] = 6000 + 2002 = 46,000. Hypothetical Mean Aggregate Loss = µN(α)µX(β). First Moment of Hypothetical Means of Aggregate Loss = E[µN(α)µX(β)] = E[µN(α)]E[µX(β)] = (.3)(200) = 60. Second Moment of Hypothetical Means of Aggregate Loss = E[(µN(α)µX(β))2 ] = E[µN(α)2 ]E[µX(β)2 ] = (.59)(46,000) = 27,140. Variance of Hypothetical Means of Aggregate Loss = 27,140 - 602 = 23,540. We are given that the total variance of aggregate losses is 165,000. Therefore, EPV + VHM = 165,000. ⇒ EPV = 165,000 - 23,540 = 141,460. K = EPV/VHM = 141,460/23,540 = 6.0.
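Solution 8.19 backs Z and K out of the first estimate and then reuses that K with the third year of data. A compact sketch of the two steps (variable names are mine):

prior_mean = 4 * 600                               # E[alpha * theta] = 2400
xbar2 = (1400 + 1900) / 2                          # 1650
z2 = (1800 - prior_mean) / (xbar2 - prior_mean)    # 0.8
k = 2 * (1 - z2) / z2                              # 0.5

xbar3 = (1400 + 1900 + 2763) / 3                   # 2021
z3 = 3 / (3 + k)                                   # 6/7
print(z3 * xbar3 + (1 - z3) * prior_mean)          # about 2075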
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 376
8.21. The hypothetical means are 0.6, 1.8, and 3.6, equally likely. The a priori mean is: (0.6 + 1.8 + 3.6)/3 = 2.
The variance of the hypothetical means is: {(0.6 - 2)² + (1.8 - 2)² + (3.6 - 2)²}/3 = 1.52.
The process variances are: 0.54, 1.26, and 1.44, equally likely.
The expected value of the process variance is: (0.54 + 1.26 + 1.44)/3 = 1.08.
K = EPV/VHM = 1.08/1.52 = 0.711. Z = 1/(1 + K) = 58.4%.
Thus, the Buhlmann credibility estimates are: 0.584k + (1 - 0.584)(2) = 0.584k + 0.832.
For Bayes analysis, the chance of the observation is f(k) for a Binomial with m = 6: C(6, k) q^k (1-q)^(6-k).
For a given value of k, this is proportional to: q^k (1-q)^(6-k).
The 3 values of q are equally likely, and thus the probability weights are proportional to: q^k (1-q)^(6-k).
For k = 0, the probability weights are: 0.9⁶, 0.7⁶, and 0.4⁶. The mean frequencies are: (6)(0.1), (6)(0.3), and (6)(0.6).
Bayes Estimate is: 6{(0.9⁶)(0.1) + (0.7⁶)(0.3) + (0.4⁶)(0.6)} / {0.9⁶ + 0.7⁶ + 0.4⁶} = 0.835.
For k = 1, the probability weights are: (0.1)(0.9⁵), (0.3)(0.7⁵), and (0.6)(0.4⁵).
Bayes Estimate is: 6{(0.9⁵)(0.1²) + (0.7⁵)(0.3²) + (0.4⁵)(0.6²)} / {(0.1)(0.9⁵) + (0.3)(0.7⁵) + (0.6)(0.4⁵)} = 1.283.
For k = 2, the probability weights are: (0.1²)(0.9⁴), (0.3²)(0.7⁴), and (0.6²)(0.4⁴).
Bayes Estimate is: 6{(0.9⁴)(0.1³) + (0.7⁴)(0.3³) + (0.4⁴)(0.6³)} / {(0.1²)(0.9⁴) + (0.3²)(0.7⁴) + (0.6²)(0.4⁴)} = 2.033.
For k = 3, the probability weights are: (0.1³)(0.9³), (0.3³)(0.7³), and (0.6³)(0.4³).
Bayes Estimate is: 6{(0.9³)(0.1⁴) + (0.7³)(0.3⁴) + (0.4³)(0.6⁴)} / {(0.1³)(0.9³) + (0.3³)(0.7³) + (0.6³)(0.4³)} = 2.808.
For k = 4, the Bayes Estimate is: 6{(0.9²)(0.1⁵) + (0.7²)(0.3⁵) + (0.4²)(0.6⁵)} / {(0.1⁴)(0.9²) + (0.3⁴)(0.7²) + (0.6⁴)(0.4²)} = 3.302.
For k = 5, the Bayes Estimate is: 6{(0.9)(0.1⁶) + (0.7)(0.3⁶) + (0.4)(0.6⁶)} / {(0.1⁵)(0.9) + (0.3⁵)(0.7) + (0.6⁵)(0.4)} = 3.506.
For k = 6, the Bayes Estimate is: 6{0.1⁷ + 0.3⁷ + 0.6⁷} / {0.1⁶ + 0.3⁶ + 0.6⁶} = 3.572.
k Bayes Estimate Buhlmann Estimate
0 0.835 0.832
1 1.283 1.416
2 2.033 2.000
3 2.808 2.584
4 3.302 3.168
5 3.506 3.752
6 3.572 4.336
Here is a graph, with the Bayes Estimates as the points and the Buhlmann Estimates as the straight line:
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 377
[Graph omitted: the Bayes estimates (points) and the Buhlmann estimates (straight line) plotted against k = 0, 1, ..., 6.]
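The full set of Bayes and Buhlmann estimates in solution 8.21 can be generated in a few lines. A sketch using only the standard library; the code and its names are mine, not the text's:

from math import comb

qs = [0.1, 0.3, 0.6]                            # equally likely binomial q's, m = 6
mu = sum(6 * q for q in qs) / 3                 # a priori mean, 2.0
epv = sum(6 * q * (1 - q) for q in qs) / 3      # 1.08
vhm = sum((6 * q - mu) ** 2 for q in qs) / 3    # 1.52
z = 1 / (1 + epv / vhm)                         # about 0.584 for one year of data

for k in range(7):
    weights = [comb(6, k) * q**k * (1 - q)**(6 - k) for q in qs]
    bayes = sum(w * 6 * q for w, q in zip(weights, qs)) / sum(weights)
    buhlmann = z * k + (1 - z) * mu
    print(k, round(bayes, 3), round(buhlmann, 3))   # matches the table above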
Comment: Similar to 4, 5/07, Q.2. 8.22. There are various possible combinations of pieces of pie for Burtʼs three lunches: 0, 2, 2; 2, 0, 2; 2, 2, 0; 1, 1, 2; 1, 2, 1; 2, 1, 1. If Burt is someone who always drinks no cups of coffee, then the chance of the observation is: 3(1/2)(1/6)(1/6) + (3)(1/3)(1/3)(1/6) = .0972. If Burt is someone who always drinks one cup of coffee, then the chance of the observation is: 3(1/4)(1/4)(1/4) + (3)(1/2)(1/2)(1/4) = .2344. If Burt is someone who always drinks two cups of coffee, then the chance of the observation is: 3(1/6)(1/2)(1/2) + (3)(1/3)(1/3)(1/2) = .2917. Number A Priori of Cups Chance of Chance of Coffee This Type of the of Risk Observation 1 2 3
0.3 0.4 0.3
Overall
1.000
9.72% 23.44% 29.17%
Prob. Weight = Product Prior Two Columns
Posterior Chance of This Type of Risk
Average Number of Pieces of Pie
0.0292 0.0938 0.0875
13.86% 44.55% 41.58%
0.667 1.000 1.333
0.2104
1.000
1.092
Comment: If we had been told how many cups of coffee Burt drinks with his lunch each day, then there would have been no need to use Bayes Analysis. “Any given customer drinks the same number of cups of coffee each time they have lunch at Dinahʼs,” means that there is useful information contained in the number of pieces of pie eaten in the past for predicting the number of pieces of pie eaten in the future by the same customer. If each customer instead had the whole joint distribution shown, then the expected number of pieces of pie eaten is one per lunch for every customer.
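The posterior in solution 8.22 is just prior times likelihood, renormalized. A compact check of the 1.092 estimate (the helper name and layout are mine):

priors = [0.3, 0.4, 0.3]          # customer drinks 0, 1, or 2 cups of coffee
pie = [[1/2, 1/3, 1/6],           # P(0, 1, 2 pieces of pie | 0 cups)
       [1/4, 1/2, 1/4],           # given 1 cup
       [1/6, 1/3, 1/2]]           # given 2 cups

def prob_total_4_in_3(p):
    # a total of 4 pieces over 3 lunches: permutations of (0, 2, 2) or (1, 1, 2)
    return 3 * p[0] * p[2]**2 + 3 * p[1]**2 * p[2]

likelihoods = [prob_total_4_in_3(p) for p in pie]              # 0.0972, 0.2344, 0.2917
weights = [pr * L for pr, L in zip(priors, likelihoods)]
posterior = [w / sum(weights) for w in weights]                # 13.9%, 44.6%, 41.6%
means = [sum(k * pk for k, pk in enumerate(p)) for p in pie]   # 2/3, 1, 4/3
print(sum(po * m for po, m in zip(posterior, means)))          # about 1.09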
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 378 8.23. The hypothetical means are: {(15%)(0) + (10%)(1) + (5%)(2)}/30% = 2/3, {(10%)(0) + (20%)(1) + (10%)(2)}/40% = 1, {(5%)(0) + (10%)(1) + (15%)(2)}/30% = 4/3. The first moment of the hypothetical means is: (30%)(2/3) + (40%)(1) + (30%)(4/3) = 1. The 2nd moment of the hypothetical means is: (30%)(2/3)2 + (40%)(1)2 + (30%)(4/3)2 = 1.0667. VHM = 1.0667 - 1 = 0.0667. For someone who buys no cups of coffee, the second moment is: {(15%)(02 ) + (10%)(12 ) + (5%)(22 )}/30% = 1. The process variance is: 1 - (2/3)2 = 5/9. For someone who buys 1 cup of coffee, the second moment is: {(10%)(02 ) + (20%)(12 ) + (10%)(22 )}/40% = 1.5. The process variance is: 1.5 - (1)2 = 1/2. For someone who buys 2 cups of coffee, the second moment is: {(5%)(02 ) + (10%)(12 ) + (15%)(22 )}/30% = 2.333. The process variance is: 2.333 - (4/3)2 = 5/9. EPV = (30%)(5/9) + (40%)(1/2) + (30%)(5/9) = 0.533. K = EPV/VHM = 0.533/0.0667 = 8. Z = 3/(3 + 8) = 3/11. Estimated number of pieces of pie: (3/11)(4/3) + (8/11)(1) = 12/11 = 1.091. Comment: While the estimates using Bayes Analysis and Buhlmann credibility are very similar for this observation, they are not equal. 8.24. D. Type 3 has average severity of $1050, and variance of severity of: (10%)(4502 ) + (90%)(502 ) = 22,500. Annual aggregate losses for Type 3 have a process variance of: (80%)(22,500) + (10502 )(80%)(20%) = 194,400. Type
A Priori Probability
1 2 3
33.33% 33.33% 33.33%
Mean Variance of Frequency Frequency 0.2 0.4 0.8
0.16 0.24 0.16
Mean Severity
Variance of Severity
Process Variance
1400 1250 1050
40,000 62,500 22,500
321,600 400,000 194,400
Average
305,333
EPV = (321,600 + 400,000 + 194,400)/3 = 305,333. Type
A Priori Probability
Mean Frequency
Mean Severity
Mean Aggregate
Square of Mean Aggregate
1 2 3
33.33% 33.33% 33.33%
0.2 0.4 0.8
1400 1250 1050
280 500 840
78,400 250,000 705,600
540
344,667
Average
VHM = 344,667 - 5402 = 53,067. K = EPV/VHM = 305,333/53,067 = 5.75. For 7 years of data, Z = 7/(7 + 5.75) = 54.9%. Jimʼs observed mean annual loss is: (1000 + 1000 + 1500)/7 = $500. Estimate for Jim for year 8 is: (54.9%)($500) + (1 - 54.9%)($540) = $518. Comment: Based on Q. 18 of the SOA Fall 2009 Group and Health - Design and Pricing Exam.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 379 8.25. A. For Type 1, the chance of the observation of Jim is: (.8){(.2)(.2)}(.8){(.2)(.2)}(.8){(.2)(.8)}(.8) = 0.00010486. For Type 2, the chance of the observation of Jim is: (.6){(.4)(.5)}(.6){(.4)(.5)}(.6){(.4)(.5)}(.6) = 0.00103680. For Type 3, the chance of the observation of Jim is: (.2){(.8)(.9)}(.2){(.8)(.9)}(.2){(.8)(.1)}(.2) = 0.00006636. The mean aggregate for Type 1 is: (0.2)(1400) = 280. The mean aggregate for Type 2 is: (0.4)(1250) = 500. The mean aggregate for Type 3 is: (0.8)(1050) = 840. Type
A Priori Probability
Chance of Observation
Probability Weight
Posterior Distribution
Mean Aggregate
1 2 3
33.33% 33.33% 33.33%
0.00010486 0.0010368 0.00006636
0.000034953 0.000345600 0.000022120
8.68% 85.83% 5.49%
280 500 840
0.000402673
100.00%
500
Estimate for Jim for year 8 is: (8.68%)($280) + (85.83%)($500) + (5.49%)($840) = $500. 8.26. B. For group A, the mean is 1200, and the variance is: (0.8)(2002 ) + (0.2)(8002 ) = 160,000. For group B, the mean is 1500, and the variance is: (0.5)(5002 ) + (0.5)(5002 ) = 250,000. EPV = (0.7)(160,000) + (0.3)(250,000) = 187,000. Overall mean is: (0.7)(1200) + (0.3)(1500) = 1290. Second Moment of the hypothetical means is: (0.7)(12002 ) + (0.3)(15002 ) = 1,683,000. VHM = 1,683,000 - 12902 = 18,900. K = EPV / VHM = 187,000 / 18,900 = 9.89. We are estimating severity; there are a total of 6 claims, so N = 6. Z = 6 / (6+K) = 37.8%. Observed mean severity is 1500. The future estimate is: (0.378)(1500) + (1 - 0.378)(1290) = 1369. 8.27. C. For group A, the chance of the observation is proportional to: (0.83 )(0.23 ) = 0.004096. For group B, the chance of the observation is proportional to: (0.53 )(0.53 ) = 0.015625. Thus the probability weights are: (0.7)(0.004096) and (0.3)(0.015625). The posterior distribution is: 38.0% and 62.0% The means for the two groups are: 1200 and 1500. The estimate of future severity is: (38%)(1200) + (62%)(1500) = 1386. Comment: One can include or exclude binomial coefficients. As long as one is consistent between the two groups, it will not affect the posterior distribution you get. In the absence of inflation, we make no use of which years the claims occurred in.
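Solutions 8.26 and 8.27 work from the same two severity distributions; only the weighting of the two groups changes. A brief sketch of both calculations (variable names are mine):

priors = [0.7, 0.3]       # group A, group B
p1000 = [0.8, 0.5]        # chance a claim is of size 1000, by group

means = [p * 1000 + (1 - p) * 2000 for p in p1000]                      # 1200, 1500
pvars = [p * (1000 - m)**2 + (1 - p) * (2000 - m)**2 for p, m in zip(p1000, means)]
mu = sum(w * m for w, m in zip(priors, means))                          # 1290
epv = sum(w * v for w, v in zip(priors, pvars))                         # 187,000
vhm = sum(w * m * m for w, m in zip(priors, means)) - mu**2             # 18,900
z = 6 / (6 + epv / vhm)                                                 # six observed claims
print(z * 1500 + (1 - z) * mu)                                          # Buhlmann: about 1369

likelihood = [p**3 * (1 - p)**3 for p in p1000]    # three claims of 1000 and three of 2000
w = [pr * L for pr, L in zip(priors, likelihood)]
posterior = [wi / sum(w) for wi in w]              # 38.0%, 62.0%
print(sum(po * m for po, m in zip(posterior, means)))                   # Bayes: about 1386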
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 380 8.28. B. The average pure premium for Class A is: (1/5){(2/3)(100) + (1/3)(200)} = 26.67. The average pure premium for Class B is: (2/5){(1/2)(100) + (1/2)(200)} = 60. The a priori mean is: (3/4)(26.67) + (1/4)(60) = 35.00. The second moment of the hypothetical means is: (3/4)(26.672 ) + (1/4)(602 ) = 1433.5. VHM = 1433.5 - 35.002 = 208.5. The variance of severity for Class A is: (2/3)(100 - 133.33)2 + (1/3)(200 - 133.33)2 = 2222. The variance of pure premium for Class A is: (1/5)(2222) + (133.332 )(1/5)(4/5) = 3289. The variance of severity for Class B is: (1/2)(100 - 150)2 + (1/2)(200 - 150)2 = 2500. The variance of pure premium for Class B is: (2/5)(2500) + (1502 )(2/5)(3/5) = 6400. EPV = (3/4)(3289) + (1/4)(6400) = 4067. K = EPV / VHM = 4067 / 208.5 = 19.5. We observe two years so that N = 2. Z = 2 / (2 + K) = 9.3%. (0.093)(200/2) + (1 - 0.093)(35) = 41.0. 8.29. C. For one year Z = 1/(1+K), 1 - Z = K/(1+K). Let µ be the a priori mean frequency. Then the estimate for an individual with one year claim free is: µ K / (1 + K) = 0.07875. For two years Z = 2/(2+K), 1 - Z = K/(2+K). The estimate for an individual with two years claim free is: µ K / (2 + K) = 0.07. Dividing the two equations: (1+K)/(2+K) = 1.125. ⇒ K = 7. ⇒ µ = 9%. For three years Z = 3/(3+K), 1 - Z = K/(3+K). The estimate for an individual with three years claim free is: µ K / (3 + K) = (9%)(7/10) = 6.3%. 8.30. B. 1. True. 2. False. 3. True. 8.31. C. For each deck the process is a 10 times a Bernoulli Process. Thus the process variance is 100 times that of a Bernoulli. For Deck A, q = 4/13 and the process variance is (100)(4/13)(1- 4/13) = 21.302. For Deck B, q = 3/12 and the process variance is (100)(3/12)(1- 3/12) = 18.750. Since the Decks are equally likely, the Expected Value of the Process Variance = (21.302+18.750) /2 = 20.026.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 381 8.32. B. The variance of the hypothetical means = 7.85873 - 2.788462 = 0.0832. Number of Deck A B Average
A Priori Chance of This Deck: 0.5, 0.5
Mean of a Card from this Deck: 3.07692, 2.50000; average 2.78846
Square of Mean: 9.46746, 6.25000; average 7.85873
Process Variance: 21.302, 18.750; average (EPV) 20.026
8.33. C. As shown in the prior solutions, EPV = 20.026 and VHM = .0832. Thus the Buhlmann credibility parameter = K = EPV/VHM = 20.026 / .0832 = 241. Thus two observations are given credibility: 2/(2+K) = 2/243 = .0082. The observed average is $10 per card. The a priori mean is $2.788 per card. Thus the estimate of the next card is: (.0082)(10)+(1-.0082)(2.788) = 2.85. 8.34. B. The total variance is 102 = 100. The (assumed) expected value of the process variance = 52 = 25. Thus the variance of the hypothetical means = total variance - EPV = 100 -25 = 75. The Buhlmann Credibility parameter K is EPV/VHM = 25/75 = 1/3. Thus the credibility assigned to one observation is 1/(1+K) = 1 /(4/3) = .75. Thus if one observes a score of s, our estimate of that studentʼs “true competence” would be: (.75)s + (1 - .75)(55) = 13.75 + .75s. Thus an estimated true competency of 70% would correspond to a score such that: 70 = 13.75 + .75s. Thus s = 56.25 / .75 = 75. 8.35. D. 1. False. The correct formula is VAR[X] = (E[Yi])2 VAR[N] + VAR[Yi] E[N]. Note this formula only hold when frequency and severity are independent. 2. True. As the EPV increases there is more random fluctuation and the credibility assigned to the observation decreases. 3. True. P(H | B) = P(H and B) / P(B).
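In solutions 8.10 and 8.34, the credibility for a single score comes straight from splitting the total variance: Z = VHM/(VHM + EPV) = (total - EPV)/total. A two-line check of the passing score in 8.34 (my own sketch):

total_var, epv = 10**2, 5**2
z = (total_var - epv) / total_var              # 0.75
prior_mean, target = 55, 70
print((target - (1 - z) * prior_mean) / z)     # passing score of 75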
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 382 8.36. D. The mean of a sum of N 6-sided dice is 3.5N. The process variance of a sum of N independent 6-sided dice is N(35/12). Thus as calculated below, the variance of the hypothetical means = 1772.167 - 422 = 8.167, while the EPV = 35. Thus the Buhlmann credibility parameter = K = EPV/VHM = 35 / 8.167 = 4.29. Thus five observations are given credibility: 5/(5+K) = 5/9.29 = .538. The observed average is: (45+ 44+ 51+ 48 +47)/5 = 47. The a priori mean is 42. Thus the new estimate is: (.538)(47)+(1-.538)(42) = 44.7. Number of Dice 11 12 13
A Priori Chance of This Number of Dice: 0.3333, 0.3333, 0.3333
Mean of Sum of Dice: 38.5, 42.0, 45.5; average 42.000
Square of Mean of Sum of Dice: 1482.250, 1764.000, 2070.250; average 1772.167
Process Variance for Sum of Dice: 32.083, 35.000, 37.917; average (EPV) 35.000
8.37. A. For Urn 1 the mean is: {(5)(0) + (10)(100) + (5)(500)}/(5 + 10 + 5) = 175. For Urn 1 the second moment is: {(5)(02 ) + (10)(1002 ) + (5)(5002 )}/(5 + 10 + 5) = 67500. Therefore, the process variance of Urn 1 is: 67500 - 1752 = 36875. For Urn 2, the mean is: {(20)(0) + (8)(100) + (12)(500)}/(20 + 8 + 12) = 170. For Urn 2, the second moment is: {(20)(02 ) + (8)(1002 ) + (12)(5002 )}/(20 + 8 + 12) = 77000. Therefore, the process variance of Urn 2 is: 77000 - 1702 = 48100. Expected Value of the Process Variance = 42488. Variance of the Hypothetical Means = 29762.5 - 172.52 = 6.25. K= EPV / VHM = 42488 / 6.25 = 6798. Thus for N = 2, Z = 2/(2 + 6798) = 0.00029. Type of Urn 1 2
A Priori Probability: 0.5000, 0.5000
Mean for this Type of Urn: 175, 170; average 172.5
Square of Mean: 30,625.0, 28,900.0; average 29,762.5
Process Variance: 36,875, 48,100; average (EPV) 42,488
8.38. A. 1. True. 2. False. An estimate using credibility is the weighted average of the hypothesis and outcome. 3. False. The total number of accidents follows a Negative Binomial Distribution. 8.39. E. 1. True. 2. True. 3. True.
2013-4-9 Buhlmann Credibility §8 Buhlmann Cred. Introduction, HCM 10/19/12, Page 383 8.40. E. For the first urn the mean is: (1 + 2 + 3 + 4)/4 = 2.5. The second moment is: (12 + 22 + 32 + 42 )/4 = 7.5. Thus the process variance = 7.5 - 2.52 = 1.25. For the second urn the mean is: (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. The second moment is: (12 + 22 + 32 + 42 + 52 + 62 )/6 = 15.167. Thus the process variance = 15.167 - 3.52 = 2.917. Thus the expected value of the process variance is: (.5)(1.25)+(.5)(2.917) = 2.083.
Urn       A Priori Chance   Process Variance   Mean of Ball From Urn   Square of Mean of Ball From Urn
I         0.500             1.250              2.5                     6.25
II        0.500             2.917              3.5                     12.25
Overall                     2.083              3.00                    9.25
The variance of the hypothetical means = 9.25 - 32 = .25. K = EPV / VHM = 2.083 / .25 = 8.33. Z = 1/(1 + 8.33) = 10.7%. The a priori estimate is 3 and the observation is 4. Therefore, the new estimate is: (.107)(4) + (1-.107)(3) = 3.11. 8.41. C. Z = N/(N + K). For N =1, Z = 1/2, therefore K = N{(1/Z) - 1} = 1(2 - 1) = 1. Therefore for N =3, Z = 3 / (3 + 1) = 3/4. 8.42. D. The variance of the hypothetical means = .33667 - .52 = .08667.
Type of Urn   A Priori Chance   Process Variance   Mean for 1 Ball from Urn   Square of Mean of 1 Ball from Urn
A             0.333             0.090              0.9                        0.81
B             0.333             0.240              0.4                        0.16
C             0.333             0.160              0.2                        0.04
Overall                         0.1633             0.50                       0.33667
K = EPV / VHM = .1633 / .08667 = 1.884. For 3 balls, Z = 3/(3 + 1.884) = .614. The a priori estimate for three balls is (3)(.5) = 1.5 and the observation is 1, so the new estimate is: (.614)(1) + (1 - .614)(1.5) = 1.193.
8.43. C. For each type of spinner one calculates the mean.
For example, for Spinner B it is: {(3)(0) + (2)(12) + (1)(48)}/6 = 12.
One can also compute the 2nd moment for each type of spinner;
for example, for Spinner B it is: {(3)(0²) + (2)(12²) + (1)(48²)}/6 = 432.
Then for each type of Spinner, the process variance is the second moment minus the square of the mean.
For example, for Spinner B the process variance is: 432 - 12² = 288.
One weights together the individual process variances to get: EPV = 337.33.
Variance of the Hypothetical Means = 214.6667 - 14² = 18.6667.

Type of     A Priori       Mean for this    Square of Mean          2nd Moment    Process
Spinner     Probability    Type Spinner     of this Type Spinner    of Spinner    Variance
A             0.3333           20               400                    816           416
B             0.3333           12               144                    432           288
C             0.3333           10               100                    408           308
Average                        14               214.6667                             337.3333
K = EPV / VHM = 337.33/18.6667 = 18.07. Thus for N = 1, Z = 1/(1 + 18.07) = 5.2%.
The observed mean is 0 and the a priori mean is 14, therefore, the new estimate is:
(0)(.052) + (14)(1 - .052) = 13.3.
8.44. D. The overall mean is .525. The second moment of the hypothetical means is .3275.

        A Priori       % of Balls    Square of Mean      Process
Urn     Probability    Marked 1      of this Type Urn    Variance
A          0.25           0.3            0.09              0.21
B          0.25           0.3            0.09              0.21
C          0.25           0.7            0.49              0.21
D          0.25           0.8            0.64              0.16
SUM                       0.5250         0.3275            0.1975
Therefore, the Variance of the Hypothetical Means = .3275 - .525² = .051875.
The process variance for a single draw from an Urn is q(1-q), since it is a Bernoulli process.
For example, for Urn D, the process variance is (.8)(1 - .8) = .16.
EPV = (.25)(.21) + (.25)(.21) + (.25)(.21) + (.25)(.16) = .1975.
Then K = EPV / VHM = .1975 / .051875 = 3.807.
For four balls drawn, Z = N / (N + K) = 4/(4 + 3.807) = .5124.
The prior estimate of the average draw is .525. The observed average draw is 2/4 = .5.
Thus the new estimate of the average draw is: (.5124)(.5) + (1 - .5124)(.525) = .5122.
For four draws, the new estimate = (4)(.5122) = 2.049.
Comment: Note that we calculate the VHM & EPV for a draw of a single ball.
The Buhlmann Credibility formula automatically adjusts for N = 4.
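As a numerical check of 8.44 (this sketch is not part of the original solution; the variable names are made up), the per-ball EPV and VHM lead directly to the estimate for four draws:

```python
# Sketch for 8.44: Bernoulli urns, per-ball EPV and VHM, then N = 4 draws with 2 marked balls.
qs     = [0.3, 0.3, 0.7, 0.8]
priors = [0.25] * 4
epv  = sum(p * q * (1 - q) for p, q in zip(priors, qs))            # 0.1975
mean = sum(p * q for p, q in zip(priors, qs))                      # 0.525
vhm  = sum(p * q * q for p, q in zip(priors, qs)) - mean ** 2      # 0.051875
k = epv / vhm                                                      # about 3.807
z = 4 / (4 + k)                                                    # about 0.5124
est_per_ball = z * (2 / 4) + (1 - z) * mean                        # about 0.5122
print(4 * est_per_ball)                                            # about 2.049 marked balls expected
```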
8.45. C. The variance of the hypothetical means = .8 - .8² = .16.

Type of     A Priori Chance of    Mean of This    Square of Mean         Second Moment          Process
Urn         This Type of Urn      Type of Urn     of This Type of Urn    of This Type of Urn    Variance
A                0.500                1.2              1.44                   2                   0.560
B                0.500                0.4              0.16                   0.6                 0.440
Overall                               0.80             0.80                                       0.50
The second moment of a ball from Urn B is: (0²)(.7) + (1²)(.2) + (2²)(.1) = .6.
Thus the process variance of Urn B is: .6 - .4² = .44.
K = EPV / VHM = .5 / .16 = 3.125. Z = N / (N + K) = 2/(2 + 3.125) = .39.
The a priori estimate for the sum of two balls is (2)(.8) = 1.6 and the observation is 2,
so the new estimate is: (.39)(2) + (1 - .39)(1.6) = 1.756.
8.46. C. For one observation Z = 1 / (1 + K) = 1/3. Thus K = 2.
For four observations, Z = 4 / (4 + K) = 4/6 = 2/3.
8.47. C. Expected Value of the Process Variance = E[v] = 8.
Variance of the Hypothetical Means = Var[m] = 4.
K = EPV / VHM = 8/4 = 2. So, Z = 3 / (3 + K) = 3 / (3 + 2) = 3/5 = 0.6.
8.48. C. The process variance for picking one ball is given for the Bernoulli by q(1-q).

        A Priori       Mean        Square     Process
Urn     Probability    (1 ball)    of Mean    Variance
A          0.5           0.8         0.64       0.16
B          0.5           0.3         0.09       0.21
Mean                     0.550       0.365      0.185
Variance of the Hypothetical Means = .365 - .55² = .25² = .0625.
Thus, K = .185 / .0625 = 2.96. Z = 2/(2 + 2.96) = 40.3%.
The complement of credibility is given to the a priori estimate for a pair of balls, which is: (2)(.55) = 1.1.
Thus, the New Estimate for a pair of balls = (40.3%)(1) + (59.7%)(1.1) = 1.06.
Comment: One can instead calculate the process variance, VHM, and K for a pair of balls.
Then K = .37 / .25 = 1.48 and Z = 1 / (1 + 1.48) = 40.3%, thus getting the same result.
8.49. B. Any information about which urn we had chosen that may have been contained in prior
observations is no longer relevant once we make a new random selection of an urn.
Therefore our best estimate is the grand mean (assuming equal probabilities for the two urns) of 1.10.
Comment: Has to be read carefully. Tests basic understanding of an important point.
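The comment to 8.48 can be verified numerically. This short sketch (mine, not the study guide's) shows that Z is the same whether the EPV, VHM, and K are computed per ball with N = 2 or per pair of balls with N = 1:

```python
# Sketch for the comment to 8.48: per-ball versus per-pair scaling leaves Z unchanged.
epv_ball, vhm_ball = 0.185, 0.0625
z_from_balls = 2 / (2 + epv_ball / vhm_ball)       # K = 2.96, Z = 0.403

epv_pair, vhm_pair = 2 * 0.185, 4 * 0.0625         # process variance scales by 2, VHM by 2^2
z_from_pairs = 1 / (1 + epv_pair / vhm_pair)       # K = 1.48, Z = 0.403

print(z_from_balls, z_from_pairs)                  # both about 0.403
```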
8.50. A. Let the variance of the hypothetical means = VHM; we will solve for VHM.
K = Expected Value of the Process Variance / Variance of the Hypothetical Means = 9 / VHM.
Thus the credibility of one observation is: 1 / (1 + (9/VHM)) = VHM / (VHM + 9).
The credibility of three observations is: 3 / (3 + (9/VHM)) = 3VHM / (3VHM + 9).
We are given that 3VHM / (3VHM + 9) = 2{VHM / (VHM + 9)}.
Therefore, 6VHM + 18 = 3VHM + 27. Therefore VHM = 3.
Comment: Backwards! One is usually given the Variance of the Hypothetical Means and asked to
calculate the credibilities. Note one can first solve for K, which equals 3, and then solve for
VHM = EPV / K = 9/3 = 3.
8.51. D. 1. T. Fewer observations are less valuable, all other things being equal.
2. T. When the risks are more similar to each other, the observation of an individual risk is worth
less relative to the overall mean.
3. F. The information value of an individual observation is increased when the random noise is decreased.
8.52. C. Z = N/(N + K), therefore K = N(1 - Z)/Z. If Z = .29 when N = 5, then K = 12.24.
Therefore when N = 2, Z = 2 / (2 + 12.24) = 0.140.
8.53. D. The Process Variance of Urn #1 is: {(1-3)² + (2-3)² + (3-3)² + (4-3)² + (5-3)²} / 5 = 2.

Urn                 A Priori          Hypothetical    Square     Process
(Type of Risk)      Chance of Risk    Mean            of Mean    Variance
1                      0.100              3               9          2
2                      0.100              3               9          2
3                      0.100              3               9          2
4                      0.100              3               9          2
5                      0.100              3               9          2
6                      0.100              1               1          0
7                      0.100              2               4          0
8                      0.100              3               9          0
9                      0.100              4              16          0
10                     0.100              5              25          0
Mean                                      3              10          1
Expected Value of the Process Variance = 1. Variance of the Hypothetical Means = 10 - 3² = 1.
K = EPV / VHM = 1/1 = 1. For two observations, Z = 2 / (2 + 1) = 2/3.
The a priori mean is 3 and the observation is (2 + 3)/2 = 2.5.
The new estimate = (2/3)(2.5) + (1/3)(3) = 2.67.
Comment: Note that due to the particular values in this question it is easy to make a mistake,
but still end up with the correct answer.
8.54. E. Urn A is twice a Bernoulli process with q = 1/2, thus it has a mean of: 2(.5) = 1,
and variance of: 2²(.5)(1 - .5) = 1.
Urn B is t times a Bernoulli process with q = 1/2, thus it has a mean of: (t)(.5) = t/2,
and variance of: t²(.5)(1 - .5) = t²/4.
Thus since the two Urns are equally likely, the Expected Value of the Process Variance is: (.5)(1 + t²/4).
The overall mean is: (1 + t/2)/2 = 1/2 + t/4.
The Variance of the Hypothetical Means is:
(1/2)(1/2 + t/4 - 1)² + (1/2)(1/2 + t/4 - t/2)² = (1/4)(1 - t/2)².
K = EPV / VHM and for one observation Z = 1/(1 + K), therefore:

t      EPV       VHM       K         Z
1      0.625     0.0625    10.000     9.1%
2      1.000     0.0000    ∞          0
3      1.625     0.0625    26.000     3.7%
4      2.500     0.2500    10.000     9.1%
5      3.625     0.5625     6.444    13.4%
Comment: Since for one observation K = (1/Z) - 1, if Z > 1/10, then K < 9.
This may save some time testing the five cases.
8.55. C. Die A has a mean of 3.5. The second moment is (1² + 2² + 3² + 4² + 5² + 6²) / 6 = 91/6.
Therefore the variance is 91/6 - 3.5² = 35/12 = 2.9167.
Die B has a mean of 8.5 and the same variance as Die A.
EPV = (.5)(35/12) + (.5)(35/12) = 35/12 = 2.9167.
Variance of the Hypothetical Means = 42.25 - 6² = 6.25.

        A Priori       Mean for     Square of Mean    Process
Die     Probability    this Die     of this Die       Variance
A         0.5000          3.5          12.25            2.9167
B         0.5000          8.5          72.25            2.9167
Average                   6            42.25            2.9167
K= EPV / VHM = (35/12) / (6.25) = 7/15 = .4667. Thus for N =3, Z = 3/(3+.4667) = 86.5%. The observed mean is (1+2+3)/3 =2 and the a priori mean is 6. Therefore, the new estimate = (2)(86.5%) + (6)(13.5%) = 2.54. Comment: All of the outcomes for Die B are 5 more than those for Die A. Adding a constant to a variable adds that same constant to the mean and does not alter the variance. Note the contrast in this case of the Buhlmann Credibility estimate compared to the Bayesian Analysis result. The observation is only possible if we have chosen Die A. Thus, Bayesian Analysis gives a posterior estimate of 3.5, the mean of Die A. 8.56. B. n / (n+K) = 1/3 and (n+1)/ (n+1+K) = 2/5. Therefore, 3n = n+K and 5n+5 = 2n+2+2K. Therefore, K = 2n. Then n =3 and K = 6. Thus n+2 observations have credibility: Z = (n+2)/(n+2+K) = (3+2) / (3+2+6) = 5/11.
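The contrast between the Buhlmann and Bayes results in the comment to 8.55 above can be checked with a short sketch (mine, not from the text; it assumes Die A has faces 1-6 and Die B faces 6-11, as implied by the comment):

```python
# Sketch for 8.55: Buhlmann estimate after observing rolls 1, 2, 3, versus the Bayes result.
epv = 35 / 12                          # both dice have the variance of a fair six-sided die
vhm = (3.5**2 + 8.5**2) / 2 - 6**2     # 6.25
k = epv / vhm                          # 7/15
z = 3 / (3 + k)                        # about 0.865
print(z * 2 + (1 - z) * 6)             # Buhlmann estimate: about 2.54
# Bayes: the observation (1, 2, 3) is impossible for Die B (faces 6 through 11), so the
# posterior puts all weight on Die A, and the Bayes estimate is its mean, 3.5.
```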
8.57. B.
           Mean           Square of Mean    Process Variance
Urn A      1              1                 1
Urn B      t/2            t²/4              t²/4
Average    1/2 + t/4      1/2 + t²/8        1/2 + t²/8
Thus the Variance of the Hypothetical Means is: 1/2 + t²/8 - (1/2 + t/4)² = 1/4 - t/4 + t²/16.
Then K = EPV / VHM = (1/2 + t²/8) / (1/4 - t/4 + t²/16).
As t approaches infinity, K approaches: (t²/8) / (t²/16) = 2.
Thus for one observation, Z approaches: 1/(1 + 2) = 1/3.
Comment: The process variance of Urn B is {(0 - t/2)² + (t - t/2)²}/2 = t²/4.
8.58. C. The (weighted) average of the Bayesian Analysis Estimates is:
(5/12)(3) + (1/6)(4.5) + (5/12)(6) = 9/2.
Since the Bayesian Analysis Estimates are in balance, the a priori mean is 9/2.
The Buhlmann Credibility estimates are thus: (observation - 9/2)Z + 9/2.
Since Z is the same for each of the insureds (one exposure for each, Z = 1/(1 + K)),
the Buhlmann Credibility estimates are also in balance.
Thus we want x to be such that: (x)(5/12) + (3.8)(1/6) + (6.1)(5/12) = 9/2. x = 3.18.
Comment: Both the Bayesian and Buhlmann Estimates are in balance.
8.59. B. Define the three types of dice as:
Type A: 2@1, 2@2, 2@3.  Type B: 2@1, 2@3, 2@5.  Type C: All @ 6.

Type of     A Priori Chance of    Process     Mean        Square of Mean
Die         This Type of Die      Variance    Die Roll    of Die Roll
A                0.500              0.667        2             4
B                0.333              2.667        3             9
C                0.167              0.000        6            36
Overall                             1.2222       3.000        11.000
The variance of the hypothetical means = 11 - 3² = 2. K = EPV / VHM = 1.2222 / 2 = .6111.
Z = 1/(1 + .6111) = .621. The a priori estimate is 3 and the observation is 6,
so the new estimate is: (.621)(6) + (1 - .621)(3) = 4.86.
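For 8.59 (and the situations in 8.60 that follow), the mixed-die computation is easy to script. This is an illustrative sketch only; the helper name k_and_z is made up:

```python
# Sketch for 8.59: K and Z for a mixture of die types, then the estimate after rolling a 6.
def k_and_z(priors, means, process_vars, n=1):
    epv = sum(p * v for p, v in zip(priors, process_vars))
    mu = sum(p * m for p, m in zip(priors, means))
    vhm = sum(p * m * m for p, m in zip(priors, means)) - mu * mu
    k = epv / vhm
    return k, n / (n + k)

# Types A, B, C of 8.59: means 2, 3, 6 and process variances 2/3, 8/3, 0.
k, z = k_and_z([1/2, 1/3, 1/6], [2, 3, 6], [2/3, 8/3, 0])
print(k, z)                               # about 0.611 and 0.621
print(z * 6 + (1 - z) * 3)                # estimate after rolling a 6: about 4.86
```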
8.60. C. Define the four types of dice as:
Type A: 2@1, 2@2, 2@3.  Type B: 2@1, 2@3, 2@5.  Type C: All @ 6.  Type D: All @ 3.
Situation 1:

Type of     A Priori Chance of    Process     Mean        Square of Mean
Die         This Type of Die      Variance    Die Roll    of Die Roll
A                0.429              0.667        2             4
B                0.429              2.667        3             9
C                0.143              0.000        6            36
Overall          1.0000             1.4286       3.000        10.7143
The variance of the hypothetical means = 10.7143 - 3² = 1.7143.
K = EPV / VHM = 1.4286 / 1.7143 = .833. Z = 1/(1 + .833) = .546.
Situation 2:

Type of     A Priori Chance of    Process     Mean        Square of Mean
Die         This Type of Die      Variance    Die Roll    of Die Roll
A                0.429              0.667        2             4
B                0.286              2.667        3             9
C                0.143              0.000        6            36
D                0.143              0.000        3.000         9.000
Overall          1.0000             1.0476       3.0000       10.7143
The variance of the hypothetical means = 10.7143 - 3² = 1.7143.
K = EPV / VHM = 1.0476 / 1.7143 = .611. Z = 1/(1 + .611) = .621.
Situation 3:

Type of     A Priori Chance of    Process     Mean        Square of Mean
Die         This Type of Die      Variance    Die Roll    of Die Roll
A                0.429              0.667        2             4
B                0.286              2.667        3             9
C                0.286              0.000        6            36
Overall          1.0000             1.0476       3.4286       14.5714
The variance of the hypothetical means = 14.5714 - 3.4286² = 2.8161.
K = EPV / VHM = 1.0476 / 2.8161 = 0.372. Z = 1/(1 + 0.372) = 0.729.
Since the Credibility in the previous question is .621, the Credibility is less in Situation 1,
the same in Situation 2, and higher in Situation 3.
Comment: One can just calculate the values of the Buhlmann Credibility Parameter K and check
when K is smaller, so that Z will be larger. The situations compare as follows:

Situation            EPV       VHM       K        Z
in the Question      1.2222    2.0000    0.611    0.621
1                    1.4286    1.7143    0.833    0.545
2                    1.0476    1.7143    0.611    0.621
3                    1.0476    2.8161    0.372    0.729
In Situation 1, the EPV is higher and the VHM is smaller than in the previous question, each of
which decreases the Credibility. In Situation 2, the EPV is smaller and the VHM is smaller than in
the previous question, which act in different directions on the Credibility. In this case the credibility
turns out to be the same as in the previous question, but it could have been either higher or lower.
In Situation 3, the EPV is lower and the VHM is higher than in the previous question, each of which
increases the Credibility.
8.61. Let m be the a priori mean. Let x = the average draw observed over the first two rounds,
which consist of a total of three draws. Then 1 ≤ x ≤ 3.
The Buhlmann-Straub credibility estimate after 3 draws is: {3/(3+k)}x + {k/(3+k)}m = (3x + km)/(3+k).
We are told that the estimates for the average draw are from 3.8/2 = 1.9 to 5.0/2 = 2.5.
Therefore, when x = 1, the estimate is 1.9 and when x = 3, the estimate is 2.5.
Thus (3 + km)/(3+k) = 1.9 and (9 + km)/(3+k) = 2.5.
Therefore, subtracting the two equations, 6/(3 + k) = .6. k = 7.
Alternately, the smallest observation for the sum of two balls is: (2)(1) = 2,
while the largest such observation is: (2)(3) = 6.
Z = Δestimate / Δobservation = (5.0 - 3.8)/(6 - 2) = .3.
During the first 2 rounds there are a total of three observations; therefore, Z = 3/(3 + k).
Thus since Z = .3, .3 = 3/(3 + k). ⇒ k = 7.
8.62. C. For classical credibility, for 25 claims, Z = √(25/100) = 1/2.
In order to have the Buhlmann Credibility be the same for 25 claims, 25/(25 + K) = 1/2. K = 25.
Therefore for 100 claims the Buhlmann Credibility is: Z = 100/(100 + 25) = 0.80.
Comment: If K were to be put in terms of exposures rather than claims, then K = 25/µf.
100 claims ⇔ 100/µf exposures. Therefore, for 100 claims the Buhlmann credibility is:
Z = (100/µf)/(100/µf + 25/µf) = 100/(100 + 25) = .80.
[Graph comparing the Buhlmann Credibility (dashed) and the Classical Credibility (solid) as
functions of the number of claims, from 0 to 140 claims.]
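In place of the graph, the two credibility curves of 8.62 can be tabulated with a short sketch (my addition; it assumes K = 25 and a full-credibility standard of 100 claims, as in the solution):

```python
# Sketch for 8.62: Buhlmann credibility with K = 25 versus classical (square root rule)
# credibility with a full-credibility standard of 100 claims.
for n in [10, 25, 50, 100, 140]:
    z_buhlmann = n / (n + 25)
    z_classical = min(1.0, (n / 100) ** 0.5)
    print(n, round(z_buhlmann, 3), round(z_classical, 3))
# Both rules give Z = 1/2 at 25 claims; at 100 claims the classical rule reaches full
# credibility, while Buhlmann gives 100/125 = 0.80.
```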
8.63. A. The Buhlmann estimates should lie on a straight line, but they do not in Graph E, eliminating graph E.
The Buhlmann estimates are the weighted least squares linear approximation to those of Bayes Analysis.
However, in graph C, the Buhlmann estimates are always lower than the Bayes estimates. Thus graph C is eliminated.
The Bayes estimates must remain within the range of hypotheses, in this case 1 to 4, eliminating graphs B & D.
The a priori mean is between 1 and 4. The Buhlmann estimate is always between the observation
and the a priori mean. Therefore, the Buhlmann estimate for an observation of 1 must be at least 1.
In graph C, for an observation of 1 the Buhlmann estimate is below 1. Another reason to eliminate graph C.
In graph C, the slope of the Buhlmann line is about: (3.3 - 0.5)/8 = 0.35 = Z.
In graph C, the intercept of the Buhlmann line is about: 0.5 = (1 - Z)µ. ⇒ µ = 0.5/(1 - 0.35) = 0.77.
However, µ, the a priori overall mean, should be between 1 and 4, also eliminating graph C.
Comment: The slope of the straight line formed by the Buhlmann estimates is Z.
For ordinary situations, 0 < Z < 1. Thus this slope must be positive and less than 1,
which is true for graphs A to D.
8.64. D. The a priori mean is E[αθ] = 4E[θ] = (4)(600) = 2400.
The mean for observed Years 1 and 2 is: (1400 + 1900)/2 = 1650.
Therefore, Z(1650) + (1 - Z)(2400) = 1800. ⇒ Z = 0.8.
⇒ 2/(2+K) = 0.8 ⇒ K = 1/2. For three years of data, the observed mean is: (1400 + 1900+ 2763)/3 = 2021, and Z = 3/(3 + 1/2) = 6/7. (6/7)(2021) + (1/7)(2400) = 2075. 8.65. E. (A) Buhlmann is not a linear approximation to the Bayesian since the Buhlmann estimate is always less than the corresponding Bayesian estimate. Also the Bayesian estimates go outside the range of hypotheses, which is (6)(.1) = 0.6 to (6)(0.6) = 3.6. (B) Buhlmann is not a linear approximation to the Bayesian since the Buhlmann estimate is always greater than the corresponding Bayesian estimate. (C) The Bayesian estimates go outside the range of hypotheses, which is (6)(.1) = 0.6 to (6)(0.6) = 3.6. (D) The Buhlmann estimates are not on a straight line. Comment: For graph E, we can estimate the slope as about (4.3 - 0.6)/6 = 62% and the intercept as about 0.7. Thus, Z ≅ (4.3 - 0.6)/6 = 62%. µ ≅ 0.7/(1 - 62%) = 2.
8.66. A. 0.25 = Z = 2/(2 + K). ⇒ K = 6.
Process Variance = variance of a Gamma Distribution = αθ².
EPV = E[αθ²] = θ²E[α] = 50θ².
Hypothetical Mean = mean of a Gamma Distribution = αθ. VHM = Var[αθ] = θ²Var[α].
6 = K = EPV/VHM = 50θ² / (θ²Var[α]) = 50/Var[α]. ⇒ Var[α] = 50/6 = 8.33.
Comment: Here the Gamma is the distribution of annual aggregate losses,
rather than a distribution of severity.
Section 9, Buhlmann Credibility, Discrete Risk Types

Buhlmann Credibility will be applied to situations involving frequency, severity, pure premiums,
or aggregate losses.

A Series of Examples:

In a previous section, the following information was used in a series of examples involving the
frequency, severity, and pure premium:

            Portion of      Bernoulli (Annual)       Gamma
            Risks in        Frequency                Severity
Type        this Type       Distribution             Distribution
1             50%             q = 40%                α = 4, θ = 100
2             30%             q = 70%                α = 3, θ = 100
3             20%             q = 80%                α = 2, θ = 100
We assume that the types are homogeneous; i.e., every insured of a given type has the same
frequency and severity process. Assume that for an individual insured, frequency and severity
are independent.
Using the Expected Value of the Process Variance and the Variance of the Hypothetical Means
computed in a previous section, one can compute the Buhlmann Credibility Parameter in each case.
An insured is picked at random of an unknown type.109 For this randomly selected insured during
4 years one observes 3 claims for a total of $450.110 Use Buhlmann Credibility to predict the future
frequency, severity, or pure premium of this insured.

Frequency Example:

As computed in the previous section, the EPV of the frequency = 0.215, while the variance of the
hypothetical mean frequencies = 0.0301. Thus the Buhlmann Credibility parameter is:
K = EPV / VHM = 0.215 / 0.0301 = 7.14.
Thus 4 years of experience are given a credibility of: 4/(4 + K) = 4/11.14 = 35.9%.
The observed frequency is 3/4 = 0.75. The a priori mean frequency is 0.57.
The estimate of the future frequency for this insured is: (0.359)(0.75) + (1 - 0.359)(0.57) = 0.635.
109 If one knew which type the insured was, one would use the expected value for that type to
estimate the future frequency, severity, or pure premium.
110 Unlike the Bayesian Analysis case, even if one were given the separate claim amounts, the
Buhlmann Credibility estimate of severity only makes use of the sum of the claim amounts, or
equivalently the average.
Severity Example:

As computed in the previous section, the EPV of the severity = 30,702, while the variance of the
hypothetical mean severities = 6265. Thus the Buhlmann Credibility parameter is
K = EPV / VHM = 30,702 / 6265 = 4.90.
Thus 3 observed claims are given a credibility of 3/(3 + K) = 3/7.9 = 38.0%.111
The observed mean severity is: $450/3 = $150. The a priori mean severity is $307.
Thus the estimate of the future severity for this insured is: (0.380)(150) + (1 - 0.380)(307) = $247.3.

Two cases for Severities:

Assume there are two types of risks that are equally likely.
Class 1 has a mean frequency of 10% and an Exponential Severity with mean 5.
Class 2 has a mean frequency of 20% and an Exponential Severity with mean 8.
As computed in the previous section, EPV = 51 and VHM = 2. Therefore, K = 51/2 = 25.5.
If the types do not differ in their frequencies, then as computed in the previous section,
EPV = 44.50 and VHM = 2.25. Therefore, K = 44.50/2.25 = 19.8, rather than 25.5.

Pure Premium Example:

As computed in the previous section, the EPV of the pure premium is 43,650, while the variance
of the hypothetical mean pure premiums is 525. Thus the Buhlmann Credibility parameter is:
K = EPV / VHM = 43,650 / 525 = 83.1.
Thus 4 years of experience are given a credibility of: 4/(4 + K) = 4/87.1 = 4.6%.
The observed pure premium is $450/4 = $112.5. The a priori mean pure premium is $175.
Thus the estimate of the future pure premium for this insured is: (0.046)(112.5) + (1 - 0.046)(175) = $172.
Note that this estimate of the future pure premium is not equal to the product of our previous
estimates of the future frequency and severity: (0.635)($247.3) = $157 ≠ $172.
In general, one does not get the same result if one uses credibility to make separate estimates of
the frequency and severity instead of directly estimating the pure premium.
Therefore, carefully read exam questions involving credibility estimates of the pure premium,
to see which of the two methods one is expected to use.
111 Note that the number of observed claims is used to determine the Buhlmann credibility of the severity.
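The frequency, severity, and pure premium estimates in this series of examples can also be computed directly from the three-type table above. The following is my own sketch of that calculation (the function name k_of and the variable names are made up; the weights and formulas are the standard ones for Bernoulli frequency and Gamma severity):

```python
# Sketch: frequency, severity, and pure premium credibility estimates for the three-type portfolio.
types = [  # (portion of risks, Bernoulli q, Gamma alpha); theta = 100 for every type
    (0.5, 0.4, 4),
    (0.3, 0.7, 3),
    (0.2, 0.8, 2),
]
theta = 100

def k_of(weights, means, pvars):
    epv = sum(w * v for w, v in zip(weights, pvars))
    mu = sum(w * m for w, m in zip(weights, means))
    vhm = sum(w * m * m for w, m in zip(weights, means)) - mu * mu
    return mu, epv / vhm

w = [t[0] for t in types]

# Frequency: Bernoulli process variance q(1-q); 4 years observed with 3 claims.
mu_f, k_f = k_of(w, [t[1] for t in types], [t[1] * (1 - t[1]) for t in types])
z = 4 / (4 + k_f)
print(z * 0.75 + (1 - z) * mu_f)                    # about 0.635

# Severity: hypothetical mean alpha*theta, process variance alpha*theta^2;
# types weighted by their expected number of claims; 3 claims averaging 150 observed.
ws = [t[0] * t[1] / mu_f for t in types]
mu_s, k_s = k_of(ws, [t[2] * theta for t in types], [t[2] * theta**2 for t in types])
z = 3 / (3 + k_s)
print(z * 150 + (1 - z) * mu_s)                     # about 247

# Pure premium: mean q*alpha*theta; process variance q*(alpha*theta^2) + q(1-q)*(alpha*theta)^2;
# 4 years observed with 450 of aggregate losses, i.e. an observed pure premium of 112.5.
mu_p, k_p = k_of(w, [t[1] * t[2] * theta for t in types],
                 [t[1] * t[2] * theta**2 + t[1] * (1 - t[1]) * (t[2] * theta)**2 for t in types])
z = 4 / (4 + k_p)
print(z * 112.5 + (1 - z) * mu_p)                   # about 172
```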
Exposures:

In exam questions, it is common that one policyholder observed for one year is one exposure.
In those situations, one policyholder for three years is three exposures, nine policyholders for one
year is nine exposures,112 and four policyholders observed for five years is 20 exposures.
For example, in automobile insurance, exposures are measured in car-years. If one observes
100 cars in each of three years, then one observes a total of 300 exposures.
A commercial automobile policyholder may have a different large number of vehicles insured each
year, in which case one adds up the automobiles from each year in order to get the total exposures.
The same would be done for the members of a group health insurance policy.

Exercise: For a group health insurance policy, you observe the following number of employees in
Years 1, 2 and 3 respectively: 800, 600, 400.113 How many exposures are there in total?
[Solution: 800 + 600 + 400 = 1800.]

While the unit of time is usually a year, occasionally it is something different such as a month.
In that case, one insured for one month is one exposure.114

Buhlmann and Bayes Each Lack a Nice Property the Other Has:

There are two types of insureds, equally likely. Each insured has a Bernoulli frequency.
Type A has a mean frequency of 2%. Type B has a mean frequency of 50%.

Exercise: Determine the Buhlmann Credibility Parameter, K.
[Solution: Overall mean is: (50%)(0.02) + (50%)(0.5) = 0.26.
Second Moment of the Hypothetical Means is: (50%)(0.02²) + (50%)(0.5²) = 0.1252.
Variance of the Hypothetical Means is: 0.1252 - 0.26² = 0.0576.
Expected Value of the Process Variance is: (50%)(0.02)(1 - 0.02) + (50%)(0.5)(1 - 0.5) = 0.1348.
K = EPV / VHM = 0.1348/0.0576 = 2.34.]

Exercise: An insured is picked at random. This insured has 2 claims in 2 years.
Use Buhlmann credibility to estimate the future annual claim frequency for this insured.
[Solution: Z = 2/(2 + K) = 2/4.34 = 46.1%. Estimate is: (46.1%)(2/2) + (1 - 46.1%)(0.26) = 0.601.]

112 See 4, 11/00, Q.38.
113 See 4, 11/01, Q.26.
114 See 4, 11/03, Q.27.
In this case, the range of hypotheses is from 2% to 50%. However, the estimated future frequency
using Buhlmann Credibility of 60.1% is outside that range of hypotheses.
The estimate from Buhlmann Credibility can be outside the range of hypotheses, since it is a
linear estimator which approximates the Bayes result.
As discussed previously, in contrast, the estimate from Bayes Analysis is always within the range
of hypotheses, since it is a weighted average of the hypothetical means.

Exercise: An insured is picked at random. This insured has 2 claims in 10 years.
Use Bayes Analysis to estimate the future annual claim frequency for this insured.
[Solution: The probability of the observation is: C(10, 2) q²(1-q)⁸ = 45q²(1-q)⁸.

               A Priori       Chance of the    Probability    Posterior Chance        Mean
Type    q      Probability    Observation      Weight         of This Type of Risk    Annual Freq.
1       0.02      50%           0.01531          0.007657         25.84%                0.02
2       0.50      50%           0.04395          0.021973         74.16%                0.50
Overall                                          0.029630         1.000                 0.376

The estimated future annual claim frequency is 0.376.]
⇔ estimate using Buhlmann Credibility Greatest Accuracy Credibility ⇔ Buhlmann Credibility
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 398 Sequential Approach: Assume the observations come in a sequence, first x1 , then x2 , then x3 , etc. Let the initial estimate be y0 . Then let y1 = the estimate after observing x1 , y2 = the estimate after observing x1 and x2 , etc. Then one standard estimation method would be to let y n = yn-1 + an (xn - yn-1), where an = nth gain factor, 0 < an < 1. In other words, the new estimate after an observation is the most recent estimate plus some fraction of the difference between the latest observation and the most recent estimate. It turns out the use of Buhlmann Credibility is a special case of this method, with an = 1/(n+K), and y0 = m = a priori mean. Exercise: Show that for the use of Buhlmann Credibility, an = (yn - yn-1)/(xn - yn-1) = 1/(n+K). [Solution: Let X n = the average of the first n observations. n
y n = Z X n + (1-Z)m = (n X n + Km) / (n+K) = ( ∑ xi + Km) / (n+K). i=1
n
n-1
i=1
i=1
y n - yn-1 = {( ∑ xi + Km)(n+K-1) - ( ∑ xi + Km)(n+K)} / {(n+K)(n+K-1)} = n
n
n
n-1
n-1
i=1
i=1
i=1
i=1
i=1
{n ∑ xi + K ∑ xi - ∑ xi - Km - n ∑ xi - K ∑ xi } / {(n+K)(n+K-1)} = n
{nxn - ∑ xi + Kxn - Km} / {(n+K)(n+K-1)}. i=1
n-1
n-1
i=1
i=1
xn - yn-1 = xn - ( ∑ xi + Km)/(n+K -1) = (nxn + Kxn - xn - ∑ xi - Km) / (n+K -1) = n
{nxn - ∑ xi + Kxn - Km} / (n+K-1) = (yn - yn-1)(n+K). ] i=1
For example, let m = 10, K = 5, and the data be: 7, 15, 12, 8. Then y0 = 10.
y1 = 10 + (1/6)(7 - 10) = 9.5. y2 = 9.5 + (1/7)(15 - 9.5) = 10.286.
y3 = 10.286 + (1/8)(12 - 10.286) = 10.500. y4 = 10.500 + (1/9)(8 - 10.500) = 10.222.
If instead we used all the data at once, X = 10.5, Z = 4/(4 + 5) = 4/9, and the estimate using
Buhlmann Credibility = 10.5(4/9) + (10)(5/9) = 92/9 = 10.222, the same estimate as obtained
using the sequential approach.
As n → ∞, an = 1/(n + K) → 0. As we get more data, the method is less responsive to the
difference between the latest observation and the most recent estimate.
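The numerical example above is easy to reproduce. This sketch (mine, not from the text) runs the sequential updates and then the all-at-once Buhlmann estimate, confirming they agree:

```python
# Sketch: sequential updating with gain factor 1/(n + K) versus using all the data at once.
m, K = 10, 5
data = [7, 15, 12, 8]

y = m
for n, x in enumerate(data, start=1):
    y = y + (x - y) / (n + K)           # gain factor a_n = 1/(n + K)
    print(n, round(y, 3))               # 9.5, 10.286, 10.5, 10.222

xbar = sum(data) / len(data)            # 10.5
Z = len(data) / (len(data) + K)         # 4/9
print(Z * xbar + (1 - Z) * m)           # 10.222..., the same answer
```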
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 399 Problems: 9.1 (2 points) You are given: (i) The annual number of claims on a given policy has the geometric distribution with parameter β. (ii) One-third of the policies have β = 2, and the remaining two-thirds have β = 5. A randomly selected policy had ten claims in Year 1. Using Buhlmann Credibility determine the expected number of claims for the selected policy in Year 2. (A) 4.3 (B) 4.4 (C) 4.5 (D) 4.6 (E) 4.7 9.2 (3 points) You are given: (i) Size of loss follows a LogNormal distribution with µ = 5. (ii) For half of the companyʼs policies σ = 1.0, while for the other half σ = 1.5. For a randomly selected policy, you observe 5 losses of sizes: 50, 100, 150, 200, and 250. Using Buhlmann Credibility determine the expected size of the next loss from the selected policy. (A) 325 (B) 330 (C) 335 (D) 340 (E) 345 9.3 (2 points) The aggregate loss distributions for three risks for one exposure period are as follows: Aggregate Losses $0 $100 $500 Risk A 0.80 0.10 0.10 B 0.60 0.20 0.20 C 0.30 0.50 0.20 A risk is selected at random and is observed to have $500 of aggregate losses in the first exposure period. Determine the Buhlmann Credibility estimate of the expected value of the aggregate losses for the same risk's second exposure period. A. $120 B. $130 C. $140 D. $150 E. $160 9.4 (3 points) You are given: (i) Losses on a companyʼs insurance policies follow a Pareto distribution with probability density function: f(x | θ) = 4θ4 / (x + θ)5 , 0 < x < ∞. (ii) For half of the companyʼs policies θ = 1, while for the other half θ = 3. For a randomly selected policy, losses in Year 1 were 5. Using Buhlmann Credibility determine the expected losses for the selected policy in Year 2. (A) 0.95 (B) 1.00 (C) 1.05 (D) 1.10 (E) 1.15
Use the following information for the following 7 questions:
There are three types of drivers with the following characteristics:

            Portion of Drivers     Poisson Annual       Pareto
Type        of This Type           Claim Frequency      Claim Severity
Good              50%                    3%             α = 6, θ = 1000
Bad               30%                    5%             α = 5, θ = 1000
Ugly              20%                   10%             α = 4, θ = 1000
For any individual driver, frequency and severity are independent. 9.5 (3 points) A driver is observed to have over a five year period a single claim. Use Buhlmann Credibility to predict this driverʼs future annual claim frequency. A. 5.5% B. 6.0% C. 6.5% D. 7.0% E. 7.5% 9.6 (2 points) What is the expected value of the process variance of the claim severities (for the observation of a single claim)? A. less than 130,000 B. at least 130,000 but less than 140,000 C. at least 140,000 but less than 150,000 D. at least 150,000 but less than 160,000 E. at least 160,000 9.7 (2 points) What is the variance of the hypothetical mean severities (for the observation of a single claim)? A. less than 2000 B. at least 2000 but less than 3000 C. at least 3000 but less than 4000 D. at least 4000 but less than 5000 E. at least 5000 9.8 (2 points) Over several years, for an individual driver you observe a single claim of size $2500. Use Buhlmann credibility to estimate this driverʼs future average claim severity. A. less than $325 B. at least $325 but less than $335 C. at least $335 but less than $345 D. at least $345 but less than $355 E. at least $355
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 401 9.9 (3 points) What is the expected value of the process variance of the pure premiums (for the observation of a single exposure)? A. less than 11,000 B. at least 11,000 but less than 12,000 C. at least 12,000 but less than 13,000 D. at least 13,000 but less than 14,000 E. at least 14,000 9.10 (2 points) What is the variance of the hypothetical mean pure premiums (for the observation of a single exposure)? A. less than 70 B. at least 70 but less than 80 C. at least 80 but less than 90 D. at least 90 but less than 100 E. at least 100 9.11 (2 points) A driver is observed to have over a five year period a total of $2500 in losses. Use Buhlmann Credibility to predict this driverʼs future pure premium. A. less than $25 B. at least $25 but less than $30 C. at least $30 but less than $35 D. at least $35 but less than $40 E. at least $40
9.12 (3 points) There are three types of risks. Assume 60% of the risks are of Type A, 25% of the risks are of Type B, and 15% of the risks are of Type C. Each risk has either one or zero claims per year. Type of Risk Chance of a Claim A Priori Chance of Type of Risk A 20% 60% B 30% 25% C 40% 15% A risk is selected at random, and you observe 4 claims in 9 years. Using Buhlmann Credibility, what is the estimated future claim frequency for that risk? A. Less than 0.29 B. At least 0.29 but less than 0.30 C. At least 0.30 but less than 0.31 D. At least 0.31 but less than 0.32 E. At least 0.32
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 402 9.13 (3 points) Each taxicab has either have zero or one accident per month. Taxicabs are of two equally common types, with mean accident frequency of either 1% or 2% per month. Over the last 3 years, Deniro Taxis has had 10 cabs all of the same type. Over the last 3 years, Deniro Taxis had 4 accidents. In the future Deniro Taxis will have 12 cabs. Use Buhlmann credibility to predict how many accidents Deniro Taxis will have over the coming 3 years. A. 5.0 B. 5.2 C. 5.4 D. 5.6 E. 5.8 9.14 (3 points) You are given: (i) Size of loss follows a Gamma Distribution with α = 4. (ii) Three quarters of the companyʼs policies have θ = 10, while for the other quarter have θ = 12. For a randomly selected policy, you observe 8 losses of sizes: 10, 20, 25, 30, 35, 40, 40, and 50. Using Buhlmann Credibility determine the expected size of the next loss from the selected policy. (A) 39.0 (B) 39.5 (C) 40.0 (D) 40.5 (E) 41.0 9.15 (3 points) You are given the following information about two classes of risks:
• Risks in Class A have a Poisson frequency with a mean of 0.6 per year. • Risks in Class B have a Poisson frequency with a mean of 0.8 per year. • Risks in Class A have an Exponential severity distribution with a mean of 11. • Risks in Class B have an Exponential severity distribution with a mean of 15. • Class A has three times the number of risks in Class B. • Within each class, severities and claim counts are independent. • A risk is randomly selected and observed to have three claims during one year. • The observed claim amounts were: 7, 10, and 21. Using Buhlmann Credibility, estimate the annual losses for next year for this risk. (Do not make separate estimates of frequency and severity.) (A) 8.0 (B) 8.2 (C) 8.4 (D) 8.6 (E) 8.8 9.16 (2 points) You are given the following information about three types of insureds:
• 60% of insureds are Type 1, 30% are Type 2, and 10% are Type 3. • Insureds in Class 1 have a Binomial frequency with m = 10 and q = 0.1. • Insureds in Class 2 have a Binomial frequency with m = 10 and q = 0.2. • Insureds in Class 3 have a Binomial frequency with m = 10 and q = 0.4. • An insured is randomly selected and observed to have five losses during one year. Using Buhlmann Credibility, estimate the number of losses next year for this risk. (A) 2.0 (B) 2.5 (C) 3.0 (D) 3.5 (E) 4.0
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 403 9.17 (3 points) You are given the following: • A portfolio consists of 150 independent risks. • 100 of the risks each have a policy with a $100,000 per claim policy limit, and 50 of the risks each have a policy with a $1,000,000 per claim policy limit. • The risks have identical claim count distributions. • Prior to censoring by policy limits, the claim size distribution for each risk is as follows: Claim Size Probability $10,000 1/2 $50,000 1/4 $100,000 1/5 $1,000,000 1/20 • A claims report is available which shows actual claim sizes incurred for each policy after censoring by policy limits, but does not identify the policy limit associated with each policy. The claims report shows exactly three claims for a policy selected at random. Two of the claims are $100,000, but the amount of the third is illegible. Use Buhlmann Credibility to estimate the value of this illegible number. A. Less than $61,000 B. At least $61,000, but less than $62,000 C. At least $62,000, but less than $63,000 D. At least $63,000, but less than $64,000 E. At least $64,000 9.18 (2 points) You assume the following information on claim frequency for individual automobile drivers: Type Portion Expected Claim of Driver of Drivers Claims Variance A 20% 0.02 0.03 B 50% 0.05 0.06 C 30% 0.10 0.15 Determine the Bühlmann credibility factor for one year of experience of a single driver selected at random from the population, if its type is unknown. (A) 0.5% (B) 1.0% (C) 1.5% (D) 2.0% (E) 2.5% 9.19 (3 points) The number of claims incurred each year is 2, 3, 4, 5, or 6, with equal probability. If there is a claim, there is a 65% chance it will be reported to the insurer by year end, independent of any other claims. There are 3 claims incurred during 2003 that are reported by the end of year 2003. Use Buhlmann Credibility in order to estimate the number of claims incurred during 2003. (A) 4.30 (B) 4.35 (C) 4.40 (D) 4.45 (E) 4.50
9.20 (3 points) You are given the following joint distribution:

              Θ
 X        1        2
 0       0.1      0.0
 5       0.1      0.3
25       0.1      0.4
For a given value of Θ, a sample of size 30 for X sums to 510. Determine the Bühlmann credibility premium. (A) 15.5 (B) 15.7 (C) 15.9 (D) 16.1
(E) 16.3
9.21 (3 points) You are given the following:
• The number of claims incurred each year is Negative Binomial with r = 2 and β = 1.6. • If there is a claim, there is a 70% chance it will be reported to the insurer by year end. • The chance of a claim being reported by year end is independent of the reporting of any other claim, and is also independent of the number of claims incurred. There are 5 claims incurred during 2003 that are reported by the end of year 2003. Use Buhlmann Credibility in order to estimate the number of claims incurred during 2003. A. 6.0 B. 6.2 C. 6.4 D. 6.6 E. 6.8 9.22 (2 points) An insurer writes a large book of policies. You are given the following information regarding claims filed by insureds against these policies: (i) A maximum of one claim may be filed per year. (ii) The probability of a claim varies by insured, and the claims experience for each insured is independent of every other insured. (iii) The probability of a claim for each insured remains constant over time. (iv) The overall probability of a claim being filed by a randomly selected insured in a year is 0.12. (v) The variance of the individual insured claim probabilities is 0.03. An insured selected at random is found to have filed 2 claims over the past 8 years. Determine the Bühlmann credibility estimate for the expected number of claims the selected insured will file over the next 3 years. (A) 0.55 (B) 0.60 (C) 0.65 (D) 0.70 (E) 0.75
9.23 (3 points) You are given the following information:
• There are three types of risks.
• The types are homogeneous, every risk of a given type has the same Exponential severity process:

            Portion of Risks      Average
Type        in This Type          Claim Size
1                70%                  25
2                20%                  40
3                10%                  50

A risk is picked at random and we do not know what type it is.
For this randomly selected risk, over 5 years there are 3 claims of sizes: 30, 40, and 70.
Use Bühlmann Credibility to predict the future average claim size of this same risk.
A. Less than 35
B. At least 35, but less than 36
C. At least 36, but less than 37
D. At least 37, but less than 38
E. 38 or more
9.24 (3 points) For a portfolio of insurance risks, aggregate losses per year per exposure follow
a distribution with mean µ and coefficient of variation 1.4, with µ varying by class as follows:

Class       µ        Percent of Risks in Class
X           3                 50%
Y           4                 30%
Z           5                 20%

A randomly selected risk has the following experience over three years:
Year      Number of Exposures      Aggregate Losses
1                124                     403
2                103                     360
3                 98                     371
Assuming 100 exposures in Year 4, calculate the Bühlmann-Straub estimate of the mean aggregate
losses in Year 4 for this risk.
A. Less than 350
B. At least 350, but less than 355
C. At least 355, but less than 360
D. At least 360, but less than 365
E. 365 or more
9.25 (3 points) You are given the following information:
• There are two types of risks.
• The types are homogeneous, every risk of a given type has the same Exponential severity process:

            Portion of Risks      Average Claim
Type        in This Type          Size in Year 1
1                75%                   200
2                25%                   300
• Inflation is 5% per year. A risk is picked at random and we do not know what type it is. For this randomly selected risk, the experience is as follows: In Year 1 there are two claims of sizes 20 and 100. In Year 2 there are no claims. In Year 3 there are two claims of sizes 50 and 400. Use Bühlmann Credibility to predict the average claim size of this same risk in year 4. A. Less than 235 B. At least 235, but less than 240 C. At least 240, but less than 245 D. At least 245, but less than 250 E. 250 or more 9.26 (4 points) For a portfolio of insurance risks, aggregate losses in 2005 follow a LogNormal Distribution with parameters σ = 1.5 and µ varying by type: Class
µ
Percent of Risks in Class
1 8 50% 2 9 50% A randomly selected risk has the following experience: Year Aggregate Losses 2005 32,000 2006 29,000 2007 37,000 Inflation is 10% per year. Estimate of the mean aggregate losses in 2008 for this risk using Buhlmann credibility. A. Less than 24,000 B. At least 24,000, but less than 25,000 C. At least 25,500, but less than 26,000 D. At least 26,000, but less than 27,000 E. 27,000 or more
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 407 Use the following information for the next two questions: Annual claim counts for each policyholder follow a Negative Binomial distribution with r = 3. Half of the policyholders have β = 1. The other half of the policyholders have β = 2. A policyholder had 7 claims in one year. 9.27 (2 points) Using Buhlmann Credibility, estimate the future claim frequency for this policyholder. A. 4.8 B. 4.9 C. 5.0 D. 5.1 E. 5.2 9.28 (3 points) Using Bayes Analysis, estimate the future claim frequency for this policyholder. A. 4.8 B. 4.9 C. 5.0 D. 5.1 E. 5.2
9.29 (4 points) Severity is LogNormal. µ is equally likely to be 6 or 7. σ is equally likely to be 0.5 or 1. µ and σ are distributed independently of each other. Determine the Buhlmann Credibility Parameter K. A. 4 B. 6 C. 8 D. 10
E. 12
9.30 (2 points) You are given: (i) Each risk has at most one claim each year. (ii) Type of Risk Prior Probability Annual Claim Probability I 0.7 0.1 II 0.2 0.2 III 0.1 0.4 One randomly chosen risk has three claims during Years 1-6. Use Buhlmann Credibility in order to estimate the probability of a claim for this risk in Year 7. (A) 0.25 (B) 0.28 (C) 0.31 (D) 0.34 (E) 0.37
Use the following information for the following 6 questions:
There are two types of insureds with the following characteristics:

            Portion of Insureds      Annual Claim Frequency
Type        of This Type             per Exposure                Claim Severity
1                 40%                Bernoulli q = 0.03          Gamma α = 6 and θ = 100
2                 60%                Bernoulli q = 0.06          Gamma α = 4 and θ = 100

For a particular policyholder you observe the following experience over four years:
Year      Exposures      Number of Claims      Claim Sizes
2005         20                 2              500, 1000
2006         25                 0              ---
2007         30                 1              300
2008         25                 1              800
Use the following information for the next two questions:
• Each driver can not have 2 or more accidents per year.
• Two groups of drivers comprise equal proportions of the insured population.
• One group has a 5% annual frequency, while the other has a 10% annual frequency.
• Every driver has the following accident severity distribution:

Size of Loss      Probability
   1000              60%
   3000              30%
   5000              10%
9.37 (2 points) What credibility would be given to an insured's number of accidents over three years using Buhlmann's Credibility Formula? A. Less than 1.5% B. At least 1.5%, but less than 2.0% C. At least 2.0%, but less than 2.5% D. At least 2.5%, but less than 3.0% E. 3.0% or more 9.38 (3 points) What credibility would be given to an insured's aggregate losses over three years using Buhlmann's Credibility Formula? A. Less than 1.5% B. At least 1.5%, but less than 2.0% C. At least 2.0%, but less than 2.5% D. At least 2.5%, but less than 3.0% E. 3.0% or more 9.39 (4, 5/87, Q.40) (3 points) There are two classes of insureds in a given population. Each insured has either no claims or exactly one claim in one experience period. For each insured the distribution of the number of claims is binomial. The probability of a claim in one experience period is 0.20 for Class 1 insureds and 0.30 for Class 2. The population consists of 40% Class 1 insureds and 60% for Class 2. An insured is selected at random without knowing the insured's class. What credibility would be given to this insured's experience for five experience periods using Buhlmann's Credibility Formula? A. Less than 0.06 B. At least 0.06, but less than 0.08 C. At least 0.08, but less than 0.10 D. At least 0.10, but less than 0.12 E. 0.12 or more
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 410 9.40 (4, 5/90, Q.56) (3 points) Employ a Buhlmann credibility estimate that uses pure premium rather than using frequency or severity separately. Consider a group of insureds described by the following. 1. An individual insured's claim frequency is Poisson. 2. An individual insured's claim severity has a variance equal to the mean squared. Frequency Severity 3. Expected value of the hypothetical means 0.1 100 4. Variance of the hypothetical means 0.1 2500 5. Frequency and severity are independently distributed. What is the Buhlmann credibility, Z, for a single pure premium observation of an insured selected from the group? A. 0 < Z ≤ 1/5 B. 1/5 < Z ≤ 1/4 C. 1/4 < Z ≤ 1/3 D. 1/3 < Z ≤ 1/2 E. 1/2 < Z ≤ 1 9.41 (4, 5/91, Q.25) (1 point) Assume that the expected pure premium for an individual insured is constant over time. If the Buhlmann credibility for two years of experience is equal to 0.40, find the Buhlmann credibility for three years of experience. A. Less than 0.500 B. At least 0.500 but less than 0.525 C. At least 0.525 but less than 0.550 D. At least 0.550 but less than 0.575 E. At least 0.575
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 411 Use the following information for the next two questions: Classes A and B have the same number of risks. Each class is homogeneous. The following data are the mean and process variance for a risk from the given class. Number of Claims Size of Loss Class Mean Variance Mean Variance A 0.1667 0.1389 4 20 B 0.8333 0.1389 2 5 A risk is randomly selected from one of the two classes and four observations are made of the risk. 9.42 (4B, 5/92, Q.18) (3 points) Determine the value for the Buhlmann credibility, Z, that can be applied to the observed pure premium. A. Less than 0.05 B. At least 0.05 but less than 0.10 C. At least 0.10 but less than 0.15 D. At least 0.15 but less than 0.20 E. At least 0.20 9.43 (4B, 5/92, Q.19) (1 point) The pure premium calculated from the four observations is 0.25. Determine the Buhlmann credibility estimate for the risk's pure premium. A. Less than 0.25 B. At least 0.25 but less than 0.50 C. At least 0.50 but less than 0.75 D. At least 0.75 but less than 1.00 E. At least 1.00
9.44 (4B, 5/93, Q.27) (2 points) Use the following information: • An insurance portfolio consists of two classes, A and B. • The number of claims distribution for each class is: Probability of Number of Claims = Class 0 1 2 3 A 0.7 0.1 0.1 0.1 B 0.5 0.2 0.1 0.2 • Class A has three times as many insureds as Class B. • A randomly selected risk from the portfolio generates 1 claim over the most recent policy period. Determine the Buhlmann credibility estimate of the claims frequency rate for the observed risk. A. Less than 0.72 B. At least 0.72 but less than 0.78 C. At least 0.78 but less than 0.84 D. At least 0.84 but less than 0.90 E. At least 0.90
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 412 9.45 (4B, 5/93, Q.29) (2 points) You are given the following: • The distribution for number of claims is binomial with parameters q and m, where m = 1. • The prior distribution of q has mean = 0.25 and variance = 0.07. Determine the Buhlmann credibility to be assigned to a single observation of one risk. A. Less than 0.20 B. At least 0.20 but less than 0.25 C. At least 0.25 but less than 0.30 D. At least 0.30 but less than 0.35 E. At least 0.35 9.46 (4B, 11/93, Q.18) (3 points) You are given the following: Two risks have the following severity distribution. Probability of Claim Amount For Amount of Claim Risk 1 Risk 2 100 0.50 0.70 1,000 0.30 0.20 20,000 0.20 0.10 Risk 1 is twice as likely as Risk 2 of being observed. A claim of 100 is observed, but the observed risk is unknown. Determine the Buhlmann credibility estimate of the expected value of a second claim amount from the same risk. A. Less than 3,500 B. At least 3,500, but less than 3,650 C. At least 3,650, but less than 3,800 D. At least 3,800, but less than 3,950 E. At least 3,950 9.47 (4B, 5/94, Q.9) (3 points) The aggregate loss distributions for two risks for one exposure period are as follows: ___ Aggregate Losses $50 $1,000 Risk $0 A 0.80 0.16 0.04 B 0.60 0.24 0.16 A risk is selected at random and observed to have $0 of losses in the first two exposure periods. Determine the Buhlmann credibility estimate of the expected value of the aggregate losses for the same risk's third exposure period. A. Less than $90 B. At least $90, but less than $95 C. At least $95, but less than $100 D. At least $100, but less than $105 E. At least $105
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 413 Use the following information for the next two questions: A portfolio of 200 independent insureds is subdivided into two classes as follows: Expected Variance of Number Number of Number of Expected Variance of of Claims Per Claims Per Severity Severity Class Insureds Insured Insured Per Claim Per Claim 1 50 0.25 0.75 4 20 2 150 0.50 0.75 8 36 • Claim count and severity for each insured are independent. • A risk is selected at random from the portfolio, and its pure premium, P1 , for one exposure period is observed. 9.48 (4B, 5/94, Q.17) (3 points) Use the Buhlmann credibility method to estimate the expected value of the pure premium for the second exposure period for the same selected risk. A. 3.25 B. 0.03 P1 + 3.15 C. 0.05 P1 + 3.09 D. 0.08 P1 + 3.00
E. None of A, B, C, or D
9.49 (4B, 5/94, Q.18) (1 point) After three exposure periods, the observed pure premium for the selected risk is P. The selected risk is returned to the portfolio. Then, a second risk is selected at random from the portfolio. Use the Buhlmann credibility method to estimate the expected pure premium for the next exposure period for the newly selected risk. A. 3.25 B. 0.09 P + 2.97 C. 0.14 P + 2.80 D. 0.20 P + 2.59 E. 0.99 P + 0.03 9.50 (4B, 5/95, Q.20) (3 points) The aggregate loss distributions for three risks for one exposure period are as follows: Aggregate Losses $0 $50 $2,000 Risk A 0.80 0.16 0.04 B 0.60 0.24 0.16 C 0.40 0.32 0.28 A risk is selected at random and is observed to have $50 of aggregate losses in the first exposure period. Determine the Buhlmann credibility estimate of the expected value of the aggregate losses for the same risk's second exposure period. A. Less than $300 B. At least $300, but less than $325 C. At least $325, but less than $350 D. At least $350, but less than $375 E. At least $375
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 414 9.51 (4B, 5/96, Q.6) (2 points) You are given the following: • A portfolio of independent risks is divided into two classes.
• •
Each class contains the same number of risks.
For each risk in Class 1, the number of claims for a single exposure period follows a Poisson distribution with mean 1. • For each risk in Class 2, the number of claims for a single exposure period follows a Poisson distribution with mean 2. A risk is selected at random from the portfolio. During the first exposure period, 2 claims are observed for this risk. During the second exposure period, 0 claims are observed for this same risk. Determine the Buhlmann credibility estimate of the expected number of claims for this same risk for the third exposure period. A. Less than 1.32 B. At least 1.32, but less than 1.34 C. At least 1.34, but less than 1.36 D. At least 1.36, but less than 1.38 E. At least 1.38 9.52 (4B, 11/96, Q.4) (2 points) You are given the following:
• • •
A portfolio of independent risks is divided into three classes. Each class contains the same number of risks.
For each risk in Classes 1 and 2, the probability of exactly one claim during one exposure period is 1/3, while the probability of no claim is 2/3. • For each risk in Class 3, the probability of exactly one claim during one exposure period is 2/3, while the probability of no claim is 1/3. A risk is selected at random from the portfolio. During the first two exposure periods, two claims are observed for this risk (one in each exposure period). Determine the Buhlmann credibility estimate of the probability that a claim will be observed for this same risk during the third exposure period. A. 4/9 B. 1/2 C. 6/11 D. 5/9 E. 3/5
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 415 9.53 (4B, 5/97, Q.12) (2 points) You are given the following:
• • • • •
A portfolio of independent risks is divided into three classes. Each class contains the same number of risks. For all of the risks in Class 1, claim sizes follow a uniform distribution on [0, 400]. For all of the risks in Class 2, claim sizes follow a uniform distribution on [0, 600].
For all of the risks in Class 3, claim sizes follow a uniform distribution on [0, 800]. A risk is selected at random from the portfolio. The first claim observed for this risk is 340. Determine the Buhlmann credibility estimate of the expected value of the second claim observed for this same risk. A. Less than 270 B. At least 270, but less than 290 C. At least 290, but less than 310 D. At least 310, but less than 330 E. At least 330
9.54 (4B, 11/97, Q.5) (2 points) You are given the following: • A portfolio of independent risks is divided into two classes. • Each class contains the same number of risks. • The claim count probabilities for each risk for a single exposure period are as follows: Class Probability of 0 Claims Probability of 1 Claim 1 1/4 3/4 2 3/4 1/4 • All claims incurred by risks in Class 1 are of size u. • All claims incurred by risks in Class 2 are of size 2u. A risk is selected at random from the portfolio. Determine the Buhlmann credibility for the pure premium of one exposure period of loss experience for this risk. A. Less than 0.05 B. At least 0.05, but less than 0.15 C. At least 0.15, but less than 0.25 D. At least 0.25, but less than 0.35 E. At least 0.35
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 416 9.55 (4B, 11/99, Q.14) (3 points) You are given the following: • A portfolio of independent risks is divided into two classes. • Each class contains the same number of risks. • The claim count distribution for each risk in Class A is a mixture of a Poisson distribution with mean 1/6 and a Poisson distribution with mean 1/3, with each distribution in the mixture having a weight of 0.5. • The claim count distribution for each risk in Class B is a mixture of a Poisson distribution with mean 2/3 and a Poisson distribution with mean 5/6, with each distribution in the mixture having a weight of 0.5. A risk is selected at random from the portfolio. Determine the Buhlmann credibility of one observation for this risk. A. 9/83 B. 9/82 C. 1/9 D. 10/83 E. 5/41 9.56 (Course 4 Sample Exam 2000, Q.24) You are given the following:
• • • • • •
Type A risks have each year's losses uniformly distributed on the interval [0, 1]. Type B risks have each year's losses uniformly distributed on the interval [0, 2]. A risk is selected at random with each type being equally likely. The first year's losses equal L. Let X be the Buhlmann credibility estimate of the second year's losses.
Let Y be the Bayesian estimate of the second year's losses. Which of the following statements is true? A. If L < 1, then X > Y. B. If L > 1, then X < Y. C. If L = 1/2, then X < Y. D. There are no values of L such that X = Y. E. There are exactly two values of L such that X = Y. 9.57 (4, 5/00, Q.3) (2.5 points) You are given the following information about two classes of business, where X is the loss for an individual insured: Class 1 Class 2 Number of insureds 25 50 E(X) 380 23 E(X2 )
365,000 ---You are also given that an analysis has resulted in a Buhlmann k value of 2.65. Calculate the process variance for Class 2. (A) 2,280 (B) 2,810 (C) 7,280 (D) 28,320 (E) 75,050
9.58 (4, 11/00, Q.19) (2.5 points) For a portfolio of independent risks, you are given:
(i) The risks are divided into two classes, Class A and Class B.
(ii) Equal numbers of risks are in Class A and Class B.
(iii) For each risk, the probability of having exactly 1 claim during the year is 20% and the probability of having 0 claims is 80%.
(iv) All claims for Class A are of size 2.
(v) All claims for Class B are of size c, an unknown but fixed quantity.
One risk is chosen at random, and the total loss for one year for that risk is observed.
You wish to estimate the expected loss for that same risk in the following year.
Determine the limit of the Bühlmann credibility factor as c goes to infinity.
(A) 0  (B) 1/9  (C) 4/5  (D) 8/9  (E) 1

9.59 (4, 11/00, Q.38) (2.5 points) An insurance company writes a book of business that contains several classes of policyholders. You are given:
(i) The average claim frequency for a policyholder over the entire book is 0.425.
(ii) The variance of the hypothetical means is 0.370.
(iii) The expected value of the process variance is 1.793.
One class of policyholders is selected at random from the book. Nine policyholders are selected at random from this class and are observed to have produced a total of seven claims. Five additional policyholders are selected at random from the same class.
Determine the Bühlmann credibility estimate for the total number of claims for these five policyholders.
(A) 2.5  (B) 2.8  (C) 3.0  (D) 3.3  (E) 3.9

9.60 (4, 11/01, Q.11 & 2009 Sample Q.62) (2.5 points) An insurer writes a large book of home warranty policies. You are given the following information regarding claims filed by insureds against these policies:
(i) A maximum of one claim may be filed per year.
(ii) The probability of a claim varies by insured, and the claims experience for each insured is independent of every other insured.
(iii) The probability of a claim for each insured remains constant over time.
(iv) The overall probability of a claim being filed by a randomly selected insured in a year is 0.10.
(v) The variance of the individual insured claim probabilities is 0.01.
An insured selected at random is found to have filed 0 claims over the past 10 years.
Determine the Bühlmann credibility estimate for the expected number of claims the selected insured will file over the next 5 years.
(A) 0.04  (B) 0.08  (C) 0.17  (D) 0.22  (E) 0.25
9.61 (4, 11/01, Q.26 & 2009 Sample Q.72) (2.5 points) You are given the following data on large business policyholders:
(i) Losses for each employee of a given policyholder are independent and have a common mean and variance.
(ii) The overall average loss per employee for all policyholders is 20.
(iii) The variance of the hypothetical means is 40.
(iv) The expected value of the process variance is 8000.
(v) The following experience is observed for a randomly selected policyholder:
    Year    Average Loss per Employee    Number of Employees
    1       15                           800
    2       10                           600
    3       5                            400
Determine the Bühlmann-Straub credibility premium per employee for this policyholder.
(A) Less than 10.5
(B) At least 10.5, but less than 11.5
(C) At least 11.5, but less than 12.5
(D) At least 12.5, but less than 13.5
(E) At least 13.5

9.62 (4, 11/01, Q.38 & 2009 Sample Q.78) (2.5 points) You are given:
(i) Claim size, X, has mean µ and variance 500.
(ii) The random variable µ has a mean of 1000 and variance of 50.
(iii) The following three claims were observed: 750, 1075, 2000.
Calculate the expected size of the next claim using Bühlmann credibility.
(A) 1025  (B) 1063  (C) 1115  (D) 1181  (E) 1266
9.63 (4, 11/02, Q.29 & 2009 Sample Q.48) (2.5 points) You are given the following joint distribution:
            Θ = 0    Θ = 1
    X = 0   0.4      0.1
    X = 1   0.1      0.2
    X = 2   0.1      0.1
For a given value of Θ and a sample of size 10 for X:
Σ xi = 10, where the sum runs from i = 1 to 10.
Determine the Bühlmann credibility premium.
(A) 0.75  (B) 0.79  (C) 0.82  (D) 0.86  (E) 0.89
9.64 (4, 11/02, Q.32 & 2009 Sample Q.50) (2.5 points) You are given four classes of insureds, each of whom may have zero or one claim, with the following probabilities:
              Number of Claims
    Class     0       1
    I         0.9     0.1
    II        0.8     0.2
    III       0.5     0.5
    IV        0.1     0.9
A class is selected at random (with probability 1/4), and four insureds are selected at random from the class. The total number of claims is two.
If five insureds are selected at random from the same class, estimate the total number of claims using Bühlmann-Straub credibility.
(A) 2.0  (B) 2.2  (C) 2.4  (D) 2.6  (E) 2.8
9.65 (4, 11/03, Q.23 & 2009 Sample Q.18) (2.5 points) You are given:
(i) Two risks have the following severity distributions:
    Amount of Claim    Probability of Claim Amount for Risk 1    Probability of Claim Amount for Risk 2
    250                0.5                                        0.7
    2,500              0.3                                        0.2
    60,000             0.2                                        0.1
(ii) Risk 1 is twice as likely to be observed as Risk 2.
A claim of 250 is observed.
Determine the Bühlmann credibility estimate of the second claim amount from the same risk.
(A) Less than 10,200
(B) At least 10,200, but less than 10,400
(C) At least 10,400, but less than 10,600
(D) At least 10,600, but less than 10,800
(E) At least 10,800

9.66 (4, 11/04, Q.9 & 2009 Sample Q.139) (2.5 points) Members of three classes of insureds can have 0, 1 or 2 claims, with the following probabilities:
              Number of Claims
    Class     0      1      2
    I         0.9    0.0    0.1
    II        0.8    0.1    0.1
    III       0.7    0.2    0.1
A class is chosen at random, and varying numbers of insureds from that class are observed over 2 years, as shown below:
    Year    Number of Insureds    Number of Claims
    1       20                    7
    2       30                    10
Determine the Bühlmann-Straub credibility estimate of the number of claims in Year 3 for 35 insureds from the same class.
(A) 10.6  (B) 10.9  (C) 11.1  (D) 11.4  (E) 11.6
9.67 (4, 11/04, Q.25 & 2009 Sample Q.151) (2.5 points) You are given:
(i) A portfolio of independent risks is divided into two classes.
(ii) Each class contains the same number of risks.
(iii) For each risk in Class 1, the number of claims per year follows a Poisson distribution with mean 5.
(iv) For each risk in Class 2, the number of claims per year follows a binomial distribution with m = 8 and q = 0.55.
(v) A randomly selected risk has three claims in Year 1, r claims in Year 2 and four claims in Year 3.
The Bühlmann credibility estimate for the number of claims in Year 4 for this risk is 4.6019.
Determine r.
(A) 1  (B) 2  (C) 3  (D) 4  (E) 5

9.68 (4, 5/05, Q.20 & 2009 Sample Q.190) (2.9 points) For a particular policy, the conditional probability of the annual number of claims given Θ = θ, and the probability distribution of θ, are as follows:
    Number of claims    0      1     2
    Probability         2θ     θ     1 - 3θ
Probability[θ = 0.05] = 0.80. Probability[θ = 0.30] = 0.20.
Two claims are observed in Year 1.
Calculate the Bühlmann credibility estimate of the number of claims in Year 2.
(A) Less than 1.68
(B) At least 1.68, but less than 1.70
(C) At least 1.70, but less than 1.72
(D) At least 1.72, but less than 1.74
(E) At least 1.74
9.69 (4, 5/07, Q.36) (2.5 points) For a portfolio of insurance risks, aggregate losses per year per exposure follow a normal distribution with mean θ and standard deviation 1000, with θ varying by class as follows:
    Class    θ       Percent of Risks in Class
    X        2000    60%
    Y        3000    30%
    Z        4000    10%
A randomly selected risk has the following experience over three years:
    Year    Number of Exposures    Aggregate Losses
    1       24                     24,000
    2       30                     36,000
    3       26                     28,000
Calculate the Bühlmann-Straub estimate of the mean aggregate losses per year per exposure in Year 4 for this risk.
(A) 1100  (B) 1138  (C) 1696  (D) 2462  (E) 2500
Solutions to Problems:

9.1. C. Each Geometric Distribution has mean β, and variance β(1+β).
    Beta       A Priori Chance of This Type of Risk    Hypothetical Mean    Square of Hypothetical Mean    Process Variance
    2          0.333                                    2.000                4.000                          6.000
    5          0.667                                    5.000                25.000                         30.000
    Overall                                             4.000                18.000                         22.000
VHM = 18 - 4² = 2. K = EPV / VHM = 22/2 = 11.
One insured for one year. ⇒ N = 1. ⇒ Z = 1/(1 + 11) = 1/12.
A priori mean = 4. Observation = 10. Estimate = (10)(1/12) + (4)(11/12) = 4.5.
Comment: Setup taken from 4, 5/05, Q.35 on Bayesian Analysis. If it were one insured for three years, then N = 3. If it were instead five insureds for 3 years each, then N = 15.

9.2. D. Each LogNormal Distribution has µ = 5, mean = exp[5 + σ²/2], and second moment = exp[10 + 2σ²].
    Sigma      A Priori Chance of This Type of Risk    Hypothetical Mean    Square of Hypothetical Mean    Second Moment    Process Variance
    1          0.500                                    245                  59,874                         162,755          102,881
    1.5        0.500                                    457                  208,981                        1,982,759        1,773,778
    Overall                                             351                  134,428                                         938,329
VHM = 134,428 - 351² = 11,227. K = EPV / VHM = 938,329/11,227 = 83.6.
N = number of claims = 5. Z = 5/(5 + 83.6) = 5.6%.
A priori mean = 351. Observation = (50 + 100 + 150 + 200 + 250)/5 = 150.
Estimate = (5.6%)(150) + (1 - 5.6%)(351) = 340.
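The same EPV / VHM / K / Z mechanics recur in nearly every solution in this section. The short Python sketch below is not part of the original guide; the function and variable names are my own, and it simply re-runs the arithmetic of solutions 9.1 and 9.2 from a list of (a priori probability, hypothetical mean, process variance) triples, as a way to check the numbers.

from math import exp

def buhlmann_estimate(types, n, observed_mean):
    """types: list of (a priori probability, hypothetical mean, process variance).
    n: number of observations (years, claims, or exposures, as appropriate).
    observed_mean: the mean of the observations."""
    overall_mean = sum(p * m for p, m, _ in types)
    second_moment = sum(p * m * m for p, m, _ in types)
    epv = sum(p * v for p, _, v in types)          # Expected Value of the Process Variance
    vhm = second_moment - overall_mean ** 2        # Variance of the Hypothetical Means
    k = epv / vhm                                  # Buhlmann Credibility Parameter
    z = n / (n + k)                                # credibility given to the observation
    return z * observed_mean + (1 - z) * overall_mean

# Solution 9.1: Geometric severities with beta = 2 or 5; one observation of 10.
geom = [(1/3, 2.0, 2.0 * 3.0), (2/3, 5.0, 5.0 * 6.0)]   # mean beta, variance beta(1+beta)
print(buhlmann_estimate(geom, n=1, observed_mean=10))    # 4.5

# Solution 9.2: LogNormal severities (mu = 5, sigma = 1 or 1.5); five claims averaging 150.
lognormals = []
for sigma in (1.0, 1.5):
    mean = exp(5 + sigma**2 / 2)
    second = exp(10 + 2 * sigma**2)
    lognormals.append((0.5, mean, second - mean**2))
print(buhlmann_estimate(lognormals, n=5, observed_mean=150))   # approximately 340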
9.3. B. For Risk A the mean is: (.8)(0) + (.1)(100) + (.1)(500) = 60, and the second moment is: (.8)(0²) + (.1)(100²) + (.1)(500²) = 26,000.
Thus the process variance for Risk A is: 26,000 - 60² = 22,400.
    Risk    A Priori Chance of Risk    Mean      Square of Mean    Second Moment    Process Variance
    A       0.333                      60        3,600             26,000           22,400
    B       0.333                      120       14,400            52,000           37,600
    C       0.333                      150       22,500            55,000           32,500
    Mean                               110.00    13,500                             30,833
Thus the Variance of the Hypothetical Means = 13,500 - 110² = 1400. K = EPV / VHM = 30,833/1400 = 22.0.
Z = 1/(1 + K) = 1/23 = .043. The a priori mean is 110. The observation is 500.
Thus the estimated die roll is: (.043)(500) + (1 - .043)(110) = 126.8.

9.4. C. Each Pareto Distribution has α = 4, mean θ/(α−1) = θ/3, second moment 2θ²/{(α−1)(α−2)} = θ²/3, and variance θ²/3 - (θ/3)² = 2θ²/9.
    Theta      A Priori Chance of This Type of Risk    Hypothetical Mean    Square of Hypothetical Mean    Process Variance
    1          0.500                                    0.333                0.111                          0.222
    3          0.500                                    1.000                1.000                          2.000
    Overall                                             0.667                0.556                          1.111
VHM = .556 - .667² = .111. K = EPV / VHM = 1.111/.111 = 10. N = 1. Z = 1/(1 + 10) = 1/11.
A priori mean = 0.667. Observation = 5. Estimate = (5)(1/11) + (.667)(10/11) = 1.06.

9.5. B. For each Poisson, the process variance is the mean.
Therefore, the Expected Value of the Process Variance = (.5)(.03) + (.3)(.05) + (.2)(.1) = .05 = overall mean frequency.
    Type of Driver    A Priori Chance of This Type of Driver    Mean Annual Claim Freq.    Square of Mean Claim Freq.    Poisson Process Variance
    Good              0.5                                        0.03                       0.0009                        0.03
    Bad               0.3                                        0.05                       0.0025                        0.05
    Ugly              0.2                                        0.1                        0.0100                        0.1
    Average                                                      0.050                      0.0032                        0.050
Therefore the variance of the hypothetical mean frequencies = .0032 - .05² = .0007.
Therefore K = EPV / VHM = .05 / .0007 = 71.4. Z = 5/(5 + 71.4) = 6.5%.
Estimated frequency = (6.5%)(.2) + (93.5%)(.05) = 0.060.
9.6. B. One needs to figure out, for each type of driver, a single observation of the risk process; in other words, for the observation of a single claim, the process variance of the average size of a claim.
The process variance for a Pareto Distribution is: θ²α/{(α−1)²(α−2)}, so the process variances are: 60,000, 104,167, and 222,222.
The probability weights are the product of claim frequency and the a priori frequency of each type of driver: (.5)(.03), (.3)(.05), (.2)(.10).
The probabilities that a claim came from each of the types of drivers are the probability weights divided by their sum: .3, .3, .4.
Thus the weighted average process variance of the severity is: (60,000)(.3) + (104,167)(.3) + (222,222)(.4) = 138,139.
    Type of Driver    A Priori Chance of This Type of Driver    Avg. Claim Freq.    Probability Weight For Claim    Probability For Claim    Alpha    Process Variance of Claim Severity
    Good              0.5                                        0.03                0.015                           0.3                      6        60,000
    Bad               0.3                                        0.05                0.015                           0.3                      5        104,167
    Ugly              0.2                                        0.1                 0.020                           0.4                      4        222,222
    Average                                                      0.050                                               1.000                             138,139
Comment: On the one hand, a claim is more likely to be from a Good Driver since there are many Good Drivers. On the other hand, a claim is more likely to be from an Ugly Driver, because each such driver produces more claims. Thus one needs to take into account both the proportion of a type of driver and its expected claim frequency. The probability that a claim came from each type of driver is proportional to the product of claim frequency and the a priori frequency of each type of driver.

9.7. C. Average severities for the Pareto Distributions are: θ/(α−1) = 200, 250, and 333. The overall average severity is 268.3.
Average of the severity squared is: (.3)(40,000) + (.3)(62,500) + (.4)(111,111) = 75,194.
Therefore, the variance of the hypothetical mean severities = 75,194 - 268.3² = 3209.
    Type of Driver    A Priori Chance of This Type of Driver    Avg. Claim Freq.    Probability Weight For Claim    Probability For Claim    Alpha    Avg. Claim Severity    Square of Avg. Claim Severity
    Good              0.5                                        0.03                0.015                           0.300                    6        200                    40,000
    Bad               0.3                                        0.05                0.015                           0.300                    5        250                    62,500
    Ugly              0.2                                        0.1                 0.020                           0.400                    4        333                    111,111
    Average                                                      0.050                                               1.000                             268.3                  75,194

9.8. A. K = EPV/VHM = 138,139 / 3209 = 43.0. Z = 1/(1 + 43.0) = 1/44.
New estimate = {2500 + (43)(268.3)} / 44 = $319.
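Solutions 9.6 through 9.8 hinge on weighting each class by the probability that an observed claim came from it, rather than by the a priori class probabilities. The following Python sketch is not from the original guide; the variable names are mine, and the severity means and variances are simply copied from the tables above. It reproduces the claim-weighting step and the resulting severity estimate.

# A sketch of the claim-weighting used when estimating severity (solutions 9.6-9.8).
a_priori = [0.5, 0.3, 0.2]                 # Good, Bad, Ugly
freq     = [0.03, 0.05, 0.10]
sev_mean = [200.0, 250.0, 1000.0 / 3]      # Pareto means theta/(alpha - 1)
sev_var  = [60000.0, 104167.0, 222222.0]   # Pareto variances theta^2 alpha / {(alpha-1)^2 (alpha-2)}

weights = [p * f for p, f in zip(a_priori, freq)]      # a priori probability times claim frequency
total = sum(weights)
claim_probs = [w / total for w in weights]             # 0.3, 0.3, 0.4

epv = sum(cp * v for cp, v in zip(claim_probs, sev_var))                       # about 138,139
mean_sev = sum(cp * m for cp, m in zip(claim_probs, sev_mean))                 # about 268.3
vhm = sum(cp * m * m for cp, m in zip(claim_probs, sev_mean)) - mean_sev**2    # close to the guide's 3209

k = epv / vhm                              # about 43
z = 1 / (1 + k)                            # one observed claim
print(round(z * 2500 + (1 - z) * mean_sev))   # about 319, as in solution 9.8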
9.9. A. For a Poisson frequency, variance of p.p. = (mean freq.)(2nd moment of severity).
For the Pareto Distribution, the second moment is: 2θ²/{(α−1)(α−2)}.
    (A) Type of Driver    (B) A Priori Chance of This Type of Driver    (C) Claim Freq.    (D) Alpha    (E) Expected Value of Square of Claim Sizes    (F) Variance of P.P.: Product of Columns C & E
    Good                  0.5                                            0.03               6            100,000                                        3,000
    Bad                   0.3                                            0.05               5            166,667                                        8,333
    Ugly                  0.2                                            0.1                4            333,333                                        33,333
    Average                                                                                                                                             10,667
9.10. E. The variance of the hypothetical pure premiums = 287.1 - 13.42² = 107.0.
    Type of Driver    A Priori Chance of This Type of Driver    Avg. Claim Freq.    Alpha    Avg. Claim Severity    Avg. Pure Premium    Square of Avg. Pure Premium
    Good              0.5                                        0.03                6        200                    6.00                 36.0
    Bad               0.3                                        0.05                5        250                    12.50                156.2
    Ugly              0.2                                        0.1                 4        333                    33.33                1,111.1
    Average                                                                                                          13.42                287.1
9.11. D. The observed pure premium is $2500 / 5 = $500. K = EPV/ VHM = 10667 / 107.0 = 99.7. Z = 5/ (5 + 99.7) = 4.8%. Estimated pure premium = (4.8%)($500) + (1 - 4.8%)($13.42) = $36.78. Comment: The result of making separate estimates for frequency and severity is different than just working directly with pure premiums: 36.76 ≠ (.060)(319) = 19.1. 9.12. B. As shown in the solutions to problems in the previous section, EPV = .1845 and the VHM = .0055. Thus K = EPV / VHM = .1845 / .0055 = 33.7. Thus for 9 observations Z = 9 / (9+33.7) = 21.1%. The prior mean is .255 and the observation is 4/9 = .444. Thus the new estimate is: (.211)(.444) + (.789)(.255) = 0.295.
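For readers who like to check the pure premium chain in solutions 9.9 through 9.11 numerically, here is a short Python sketch. It is not part of the original guide; the names are my own, and the inputs are the figures quoted in the tables above. With a Poisson frequency, the process variance of the pure premium for a class is the mean frequency times the second moment of severity.

a_priori  = [0.5, 0.3, 0.2]                 # Good, Bad, Ugly
freq      = [0.03, 0.05, 0.10]
sev_mean  = [200.0, 250.0, 1000.0 / 3]
sev_2nd   = [100000.0, 166667.0, 333333.0]  # 2 theta^2 / {(alpha-1)(alpha-2)}

pp_mean = [f * m for f, m in zip(freq, sev_mean)]         # 6.00, 12.50, 33.33
pp_var  = [f * s for f, s in zip(freq, sev_2nd)]          # 3,000; 8,333; 33,333

epv = sum(p * v for p, v in zip(a_priori, pp_var))        # about 10,667
mu  = sum(p * m for p, m in zip(a_priori, pp_mean))       # about 13.42
vhm = sum(p * m * m for p, m in zip(a_priori, pp_mean)) - mu**2   # about 107

k = epv / vhm                        # about 100 (the guide gets 99.7)
z = 5 / (5 + k)                      # five exposures observed
estimate = z * 500 + (1 - z) * mu    # observed pure premium was 2500/5 = 500
print(estimate)                      # about 36.7; the guide, with more rounding, gets 36.78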
9.13. E. For a single cab for a single month, frequency is Bernoulli. The process variance is q(1-q), which is: (.01)(.99) = .0099 and (.02)(.98) = .0196, for the two types.
EPV = (1/2)(.0099) + (1/2)(.0196) = .01475. Overall mean = (1/2)(.01) + (1/2)(.02) = .015.
VHM = (1/2)(.01 - .015)² + (1/2)(.02 - .015)² = .000025. K = EPV/VHM = .01475/.000025 = 590.
We observe: (3)(12)(10) = 360 cab-months. Z = 360/(360 + K) = 37.9%. Observed frequency = 4/360.
Estimated future frequency = (.379)(4/360) + (1 - .379)(.015) = .01353 per cab-month.
For (3)(12)(12) = 432 cab-months, we expect: (.01353)(432) = 5.84 accidents.
Alternately, since the frequency per month is so small, it is very close to a Poisson. Thus we have approximately a Poisson frequency with annual mean of .12 or .24.
EPV = overall mean = .18. VHM = .06² = .0036. K = EPV/VHM = .18/.0036 = 50.
We observe 30 cab-years. Z = 30/(30 + K) = 3/8. Observed frequency = 4/30.
Estimated future frequency = (3/8)(4/30) + (5/8)(.18) = .1625 per cab-year.
For (3)(12) = 36 cab-years, we expect: (.1625)(36) = 5.85 accidents.
Comment: Have assumed that the 12 future cabs are all of the same (unknown) type as were the 10 past cabs.

9.14. C. Each Gamma Distribution has α = 4, mean = 4θ, and variance = 4θ².
    Theta      A Priori Chance of This Type of Risk    Hypothetical Mean    Square of Hypothetical Mean    Process Variance
    10         0.750                                    40                   1,600                          400
    12         0.250                                    48                   2,304                          576
    Overall                                             42                   1,776                          444
VHM = 1776 - 42² = 12. K = EPV / VHM = 444/12 = 37.
N = number of claims = 8. Z = 8/(8 + 37) = 17.8%.
A priori mean = 42. Observation = (10 + 20 + 25 + 30 + 35 + 40 + 40 + 50)/8 = 31.25.
Estimate = (17.8%)(31.25) + (1 - 17.8%)(42) = 40.1.
9.15. E. Each Poisson Distribution has mean λ, and each Exponential Distribution has mean θ and second moment 2θ².
The mean aggregate loss is λθ. The variance of the aggregate loss is 2λθ².
    Class      λ      θ     A Priori Chance of This Type of Risk    Hypothetical Mean    Square of Hypothetical Mean    Process Variance
    A          0.6    11    0.750                                    6.60                 43.56                          145.20
    B          0.8    15    0.250                                    12.00                144.00                         360.00
    Overall                                                          7.95                 68.67                          198.90
VHM = 68.67 - 7.95² = 5.47. K = EPV / VHM = 198.9/5.47 = 36.36.
One risk for one year. ⇒ N = 1. Z = 1/(1 + 36.36) = 2.7%.
A priori mean = 7.95. Observation = 7 + 10 + 21 = 38.
Estimate = (2.7%)(38) + (1 - 2.7%)(7.95) = 8.76.

9.16. C. Each Binomial Distribution has mean = 10q and variance = 10q(1-q).
    q          A Priori Chance of This Type of Risk    Hypothetical Mean    Square of Hypothetical Mean    Process Variance
    0.1        0.600                                    1.00                 1.00                           0.90
    0.2        0.300                                    2.00                 4.00                           1.60
    0.4        0.100                                    4.00                 16.00                          2.40
    Overall                                             1.60                 3.40                           1.26
VHM = 3.4 - 1.6² = 0.84. K = EPV / VHM = 1.26/0.84 = 1.5.
One insured for one year. ⇒ N = 1. Z = 1/(1 + 1.5) = 40.0%.
A priori mean = 1.6. Observation = 5. Estimate = (40%)(5) + (1 - 40%)(1.6) = 2.96.
9.17. A. If one has a $1 million maximum covered loss, the censoring has no effect. If instead one has a $100,000 maximum covered loss, the chance of having a $100,000 payment is the chance of having a total size of loss ≥ 100,000, which is 1/5 + 1/20 = .25.
In that case the mean is: (.5)(10,000) + (.25)(50,000) + (.25)(100,000) = 42,500.
Similarly, with a $100,000 maximum covered loss, the second moment of the severity is: (.5)(10,000²) + (.25)(50,000²) + (.25)(100,000²) = 3.175 x 10^9.
Thus with a $100,000 maximum covered loss the process variance is: 3.175 x 10^9 - 42,500² = 1.369 x 10^9.
    Maximum Covered Loss    A Priori Chance    Hypothetical Mean    Second Moment    Process Variance    Square of Hypothetical Mean
    1 million               0.333              87,500               5.2675e+10       4.5019e+10          7.6562e+9
    100 thousand            0.667              42,500               3.1750e+9        1.3688e+9           1.8062e+9
    Overall                                    57,500                                1.5919e+10          3.7562e+9
VHM = 3.756 x 10^9 - 57,500² = 4.50 x 10^8. K = EPV/VHM = 1.592 x 10^10 / 4.50 x 10^8 = 35.4.
N = 2. Z = 2/(2 + 35.4) = 5.4%. A priori mean = $57,500. Observation = $100,000.
Estimate = (100,000)(5.4%) + (57,500)(1 - 5.4%) = 59,800.

9.18. B. VHM = .00433 - .059² = .000849. EPV = .081.
    Type of Driver    A Priori Chance of Driver    Mean Freq.    Square of Mean Freq.    Variance of Freq.
    A                 0.200                        0.020         0.0004                  0.03
    B                 0.500                        0.050         0.0025                  0.06
    C                 0.300                        0.100         0.0100                  0.15
    Mean                                           0.0590        0.00433                 0.081
K = EPV/VHM = .081/.000849 = 95.4. Z = 1/(1 + 95.4) = 1.0%.
Comment: Similar to 4, 11/01, Q.23.
9.19. A. Given the number of claims incurred m, the mean number of claims reported is: .65m. Thus the VHM = Var[.65m] = .652 Var[m] = (.4225){((2-4)2 + (3-4)2 + (4-4)2 + (5-4)2 + (6-4)2 )/5} = (.4225)(2) = .845. Given the number of claims incurred m, the number of claims reported by year end is Binomial with q = .65 and m. Thus the process variance is: (.65)(.35)m = .2275m. EPV = E[.2275m] = .2275E[m] = (.2275)(4) = .91. K = EPV/VHM = .91/.845 = 1.077. For one observation of the risk process, Z = 1/(1+1.077) = 48.1%. Relying solely on the observation, the estimated number of claims incurred is: 3/.65= 4.615. The a priori mean number of claims incurred is 4. Thus the estimated number of claims incurred is: (.481)(4.615) + (.519)(4) = 4.30. Comment: Beyond what you are likely to be asked on your exam. One would estimate the number of claims yet to be reported as: 4.30 - 3 = 1.30. See “Loss Development Using Credibility”, by Eric Brosius.
9.20. E. Adding the probabilities, there is a 30% a priori probability of Θ = 1, risk type A, and a 70% a priori probability of Θ = 2, risk type B.
The mean for risk type A is: E[X | Θ = 1] = (0)(1/3) + (5)(1/3) + (25)(1/3) = 10.
The 2nd moment for risk type A is: E[X² | Θ = 1] = (0²)(1/3) + (5²)(1/3) + (25²)(1/3) = 216.67.
Process Variance for Risk Type A is: Var[X | Θ = 1] = 216.67 - 10² = 116.67.
Similarly, Risk Type B has mean 16.43, 2nd moment 367.86, and process variance 97.91.
    Risk Type    A Priori Chance    Mean     Square of Mean    Process Variance
    A            0.3                10.00    100.00            116.67
    B            0.7                16.43    269.94            97.91
    Average                         14.50    218.96            103.54
The variance of the hypothetical means = 218.96 - 14.502 = 8.71. K = EPV/VHM = 103.54/8.71 = 11.9. Z = 30/(30 + K) = 71.6%. Observed mean is: 510/30 = 17. Prior mean is 14.5. The estimate using Credibility is: (.716)(17) + (.284)(14.5) = 16.3. Comment: Similar to 4, 11/02, Q.29. 9.21. D. Given the number of claims incurred m, the mean number of claims reported is: .7m. Thus the VHM = Var[.7m] = .72 Var[m] = (.49){(2)(1.6)(1+1.6)} = 4.077. Given the number of claims incurred m, the number of claims reported is Binomial with q = .7 and m. Thus the process variance is (.7)(.3)m = .21m. EPV = E[.21m] = .21E[m] = (.21)(2)(1.6) = .672. K = EPV/VHM = .672/4.077 = .165. For one observation of the risk process, Z = 1/1.165 = .858. Relying solely on the observation the estimated number of claims incurred is: 5/.7 = 7.143 The a priori mean number of claims incurred is: (2)(1.6) = 3.2. Thus the estimated number of claims incurred is: (.858)(7.143) + (.142)(3.2) = 6.58. Comment: Beyond what you are likely to be asked. Since in this case the Bayesian Analysis estimates turn out to lay along a straight line, they are equal to the estimate using Buhlmann Credibility. See a problem in a previous section for the Bayesian Analysis estimate for the same situation. See “Loss Development Using Credibility”, by Eric Brosius. 9.22. C. E[q] = Overall mean = 0.12. .03 = VHM = Var[q] = E[q2 ] - E[q]2 . Therefore, E[q2 ] = .03 + E[q]2 = .03 + .122 = .0444. EPV = E[q(1-q)] = E[q] - E[q2 ] = .12 - .0444 = .0756. K = EPV/VHM = .0756/.03 = 2.52. Z = 8/(8 + 2.52) = .760. Estimated future annual frequency = (.760)(2/8) + (.240)(.12) = .219. Estimated number of claims for 3 years = (3)(.219) = 0.657. Comment: Similar to 4, 11/01, Q.11.
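Solution 9.22 uses a shortcut worth remembering when each insured's annual claim count is Bernoulli(q) and only the mean and variance of q across insureds are given: VHM = Var[q], and EPV = E[q(1-q)] = E[q] - E[q²] = E[q] - (Var[q] + E[q]²). The following Python sketch is not from the original guide (the function name and arguments are my own); it re-runs solution 9.22 and the similar solution 9.60 later in this section.

def bernoulli_credibility(mean_q, var_q, n_years, observed_claims, future_years):
    # EPV and VHM from the first two moments of q across insureds.
    epv = mean_q - (var_q + mean_q ** 2)
    k = epv / var_q
    z = n_years / (n_years + k)
    est_annual = z * (observed_claims / n_years) + (1 - z) * mean_q
    return est_annual * future_years

# Solution 9.22: E[q] = 0.12, Var[q] = 0.03, 2 claims in 8 years, estimate 3 future years.
print(bernoulli_credibility(0.12, 0.03, 8, 2, 3))    # about 0.657

# Solution 9.60: E[q] = 0.10, Var[q] = 0.01, 0 claims in 10 years, estimate 5 future years.
print(bernoulli_credibility(0.10, 0.01, 10, 0, 5))   # about 0.222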
9.23. A. EPV = 1007.5. VHM = 1007.5 - 30.5² = 77.25. K = EPV/VHM = 13.0.
    Type       A Priori Chance of This Type    θ     Process Variance    Mean     Square of Mean
    1          0.700                           25    625                 25       625
    2          0.200                           40    1,600               40       1,600
    3          0.100                           50    2,500               50       2,500
    Overall                                          1,007.50            30.50    1,007.50
Z = 3/(3 + 13.0) = 3/16. Observed Mean = (30 + 40 + 70)/3 = 46.67. Prior Mean = 30.5. Estimate = (3/16)(46.67) + (13/16)(30.5) = 33.53. Comment: Since there is no mention of differing frequency by type, we assume the mean frequencies for the types are the same; i.e., we ignore frequency. N = 3 is the number of claims. The number of years observed is not used. 9.24. B. The process variance for a risk from class X is: {(1.4)(3)}2 = 17.64.
    Class      A Priori Chance of This Type    µ    Process Variance    Mean    Square of Mean
    X          0.500                           3    17.64               3       9
    Y          0.300                           4    31.36               4       16
    Z          0.200                           5    49.00               5       25
    Overall                                         28.03               3.70    14.30
EPV = 28.03. VHM = 14.30 - 3.702 = 0.61. K = EPV/VHM = 50.0. Z = 325/(325 + 50.0) = 86.7%. Prior Mean = 3.70. Observed Mean = (403 + 360 + 371)/(124 + 103 + 98) = 1134/325 = 3.49. Estimated future pure premium = (86.7%)(3.49) + (1 - 86.7%)(3.70) = 3.52. Estimated aggregate loss for 100 exposures: (100)(3.52) = 352. Comment: Similar to 4, 5/07, Q. 36.
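The Bühlmann-Straub pattern in solution 9.24 (credibility based on total exposures, with the exposure-weighted pure premium as the observation) is easy to check numerically. The Python sketch below is not from the original guide; the names are mine, and the inputs are copied from the table above.

classes = [  # (a priori probability, mean pure premium, process variance of pure premium)
    (0.5, 3.0, (1.4 * 3) ** 2),
    (0.3, 4.0, (1.4 * 4) ** 2),
    (0.2, 5.0, (1.4 * 5) ** 2),
]
epv = sum(p * v for p, _, v in classes)                  # about 28.03
mu = sum(p * m for p, m, _ in classes)                   # 3.70
vhm = sum(p * m * m for p, m, _ in classes) - mu ** 2    # about 0.61
k = epv / vhm

exposures = [124, 103, 98]
losses = [403.0, 360.0, 371.0]
n = sum(exposures)                                       # 325 exposures observed
observed_pp = sum(losses) / n                            # about 3.49
z = n / (n + k)

future_pp = z * observed_pp + (1 - z) * mu
print(round(100 * future_pp))    # estimated aggregate loss for 100 exposures, about 352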
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 432 9.25. D. As a first step adjust everything to the year 4 level. The means of the Exponentials are: (200)(1.053 ) = 231.53, and (300)(1.053 ) = 347.29. The EPV is: (75%)(231.532 ) + (25%)(347.292 ) = 70,357. The overall mean is: (75%)(231.53) + (25%)(347.29) = 260.47. The VHM is: (75%)(231.53 - 260.47)2 + (25%)(347.29 - 260.47)2 = 2513. K = 70,357/2513 = 28. We observe 4 claims, so Z = 4/(4 + 28) = 1/8. On a year four level, the observed average claim size is: {20(1.053 ) + 100(1.053 ) + 50(1.05) + 400(1.05)}/4 = 152.85. The estimated future claim size on the year 4 level is: (1/8)(152.85) + (7/8)(260.47) = 247.02. Alternately, working in year 1, the EPV is: (75%)(2002 ) + (25%)(3002 ) = 52,500. The overall mean is: (75%)(200) + (25%)(300) = 225. The VHM is: (75%)(200 - 225)2 + (25%)(300 - 225)2 = 1875. K = 52,500/1875 = 28. We observe 4 claims, so Z = 4/(4 + 28) = 1/8. On a year one level, the observed average claim size is: (20 + 100 + 50/1.052 + 400/1.052 )/4 = 132.04. The estimated future claim size on the year 1 level is: (1/8)(132.04) + (7/8)(225) = 213.38. On the year four level: (213.38)(1.053 ) = 247.01. Comment: Both the EPV and VHM are multiplied by the total inflation factor squared. Therefore, K, the Buhlmann Credibility Parameter, is not affected by inflation which changes the scale.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 433 9.26. A. As a first step adjust everything to the year 2008 level. The LogNormals have parameters σ = 1.5, and µ = 8 + ln(1.13 ) = 8.286 and 9 + ln(1.13 ) = 9.286. The first LogNormal has mean exp[8.286 + 1.52 /2] = 12,222, second moment exp[(2)(8.286) + (2)1.52 ] = 1,417,272,377, and variance 1,417,272,377 - 12,2222 = 1,267,895,093. The second LogNormal has mean exp[9.286 + 1.52 /2] = 33,223, second moment exp[(2)(9.286) + (2)1.52 ] = 10,472,305,100, and variance 10,472,305,100 - 33,2232 = 9,368,537,371. The EPV is: (50%)(1,267,895,093) + (50%)(9,368,537,371) = 5,318,216,232. The overall mean is: (50%)(12,222) + (50%)(33,223) = 22,723. The VHM is: (50%)(12,222 - 22,723)2 + (50%)(33,223 - 22,723)2 = 110,250,000. K = EPV/VHM = 5,318.22/110.25 = 48.2. We observe 3 years, so Z = 3/(3 + 48.2) = 5.9%. On the year 2008 level, the average aggregate loss is: {(32,000)(1.13 ) + (29,000)(1.12 ) + (37,000)(1.1)}/3 = 39,461. The estimated aggregate loss for 2008 is: (5.9%)(39,461) + (1 - 5.9%)(22,723) = 23,711. Alternately, working in the year 2004, the first LogNormal has mean exp[8 + 1.52 /2] = 9182, second moment exp[(2)(8) + (2)1.52 ] = 799,902,178, and variance 799,902,178 - 91822 = 715,593,054. The second LogNormal has mean exp[9 + 1.52 /2] = 24,959, second moment exp[(2)(9) + (2)1.52 ] = 5,910,522,063, and variance 5,910,522,063 - 24,9592 = 5,287,570,382. The EPV is: (50%)(715,593,054) + (50%)(5,287,570,382) = 3,001.6 million. The overall mean is: (50%)(9182) + (50%)(24,959) = 17,071. The VHM is: (50%)(9182 - 17,071)2 + (50%)(24,959 - 17,071)2 = 622.2 million. K = EPV/VHM = 3,001.6/ 622.2 = 48.2. We observe 3 years, so Z = 3/(3 + 48.2) = 5.9%. On a year 2004 level, the average aggregate loss is: (32,000 + 29,000/1.1 + 37,000/1.12 )/3 = 29,647. The estimated future aggregate loss on the year 2004 level is: (5.9%)(29,647) + (1 - 5.9%)(17,071) = 17,813. On the year 2008 level, the estimated aggregate loss is: (17,813)(1.13 ) = 23,709. Comment: Both the EPV and VHM are multiplied by the total inflation factor squared. Therefore, K, the Buhlmann Credibility Parameter, is not affected by inflation which changes the scale.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 434 9.27. B. For β = 1, the process variance is: rβ (1+β) = (3)(1)(1 + 1) = 6. For β = 2, the process variance is: rβ (1+β) = (3)(2)(1 + 2) = 18. EPV = (6 + 18)/2 = 12. For β = 1, the mean is: rβ = (3)(1) = 3. For β = 2, the mean is: rβ = (3)(2) = 6. The overall mean is: (3 + 6)/2 = 4.5. The VHM = (.5)(3 - 4.5)2 + (.5)(6 - 4.5)2 = 2.25. K = EPV/VHM = 12/2.25 = 5.33. Z = 1/(1 + 5.33) = 15.8%. (15.8%)(7) + (1 - 15.8%)(4.5) = 4.895. 9.28. D. f(7) = {r(r+1)(r+2)(r+3)(r+4)(r+5)(r+6)/7!}β7/(1+β)r+7 = 36β7/(1+β)10. For β = 1, f(7) = 3.516%. For β = 2, f(7) = 7.804%. P[Observation] = (1/2)(3.516%) + (1/2)(7.804%) = 5.66%. By Bayes Theorem, P[Risk Type | Observation] = P[Obser. | Type] P[Type] / P[Observation]. P[β = 1 | Observation] = (3.516%)(.5)/(5.66%) = 31.06%. P[β = 2 | Observation] = (7.804%)(.5)/(5.66%) = 68.94%. (31.06%)(3)(1) + (68.94%)(3)(2) = 5.068. 9.29. B. For each LogNormal, E[X] = exp[µ + σ2/2] and E[X2 ] = exp[2µ + 2σ2]. Process variance = Second Moment - Mean2 . µ
σ
Mean
Square of Mean
Second Moment
Process Variance
6 7 6 7
0.5 0.5 1 1
457.14 1,242.65 665.14 1,808.04
208,981.29 1,544,174.47 442,413.39 3,269,017.37
268,337.29 1,982,759.26 1,202,604.28 8,886,110.52
59,356.00 438,584.80 760,190.89 5,617,093.15
1,043.24
1,366,146.63
3,084,952.84
1,718,806.21
Average
VHM = 1,366,146.63 - 1043.242 = 277,797. K = EPV/VHM = 1,718,806 / 277,797 = 6.2.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 435 9.30. A. VHM = 0.031 - 0.152 = 0.0085. Type
A priori Probability
Process Variance
Mean
Square of Mean
I II III
0.7 0.2 0.1
0.09000 0.16000 0.24000
0.10000 0.20000 0.40000
0.010 0.040 0.160
0.11900
0.15000
0.03100
EPV = 0.119. K = 0.119/0.0085 = 14. Z = 6/(6 + 14) = 30%. Estimate = (3/6)(30%) + (0.15)(70%) = 0.255. Comment: Setup taken from 4, 11/03, Q.39, where instead one should use Bayes analysis. 9.31. A. For frequency, EPV = 0.04548 and VHM = 0.00252 - 0.0482 = 0.000216. Type 1 2 Average
A Priori Probability 40% 60%
Mean 0.03000 0.06000
Square of Mean 0.00090 0.00360
Process Variance 0.02910 0.05640
0.04800
0.00252
0.04548
K = EPV / VHM = 0.04548/0.00216 = 21.1. There are a total of 100 exposures, so Z = 100/(100 + 21.1) = 82.6%. Observed frequency is 4/100. Prior mean is 0.048. Estimated future frequency is: (82.6%)(0.040) + (1 - 82.6%)(0.048) = 4.14%.
9.32. B. The density of the Gamma Distribution is: f(x) = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}.
Since θ = 100 is the same for both risk types, we can ignore the factor of e^(-x/θ).
The chance of the observation is proportional to:
q^2 (1-q)^18 {500^(α-1) / (100^α Γ(α))} (1-q)^25 q(1-q)^29 {1000^(α-1) / (100^α Γ(α))} q(1-q)^24 {300^(α-1) / (100^α Γ(α))} {800^(α-1) / (100^α Γ(α))}
= q^4 (1-q)^96 1200^(α-1) / {100^4 Γ(α)^4}.
Therefore, ignoring the constant of 100^4 in the denominator, the probability weights are:
(40%)(0.03^4)(0.97^96) 1200^5/(5!)^4 = 0.20884, and (60%)(0.06^4)(0.94^96) 1200^3/(3!)^4 = 0.02729.
Thus the posterior distribution is: 88.44% and 11.56%.
Therefore, the estimated future frequency is: (88.44%)(0.03) + (11.56%)(0.06) = 3.35%.
Comment: In the case of Bayes analysis, we use all of the information given. In this case, we are given information on severity, both in the model and the observation, and thus this is used to help get the chance of the observation and thus the posterior distribution. If we had not been given the information on severity, then the analysis would have been instead:
The chance of the observation is: (20 choose 2) q^2 (1-q)^18 (25 choose 0) q^0 (1-q)^25 (30 choose 1) q(1-q)^29 (25 choose 1) q(1-q)^24.
This is proportional to: q^4 (1-q)^96.
Therefore, the probability weights are: (40%)(0.03^4)(0.97^96), and (60%)(0.06^4)(0.94^96).
Thus the posterior distribution is:
(40%)(0.03^4)(0.97^96) / {(40%)(0.03^4)(0.97^96) + (60%)(0.06^4)(0.94^96)} = 45.96%, and
(60%)(0.06^4)(0.94^96) / {(40%)(0.03^4)(0.97^96) + (60%)(0.06^4)(0.94^96)} = 54.04%.
Therefore, the estimated future frequency is: (45.96%)(0.03) + (54.04%)(0.06) = 4.62%.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 437 9.33. C. The mean frequencies are 3% and 6%, thus of the claims we expect from type 1: (40%)(3%) = 25%, and the remaining 75% from type 2. (40%)(3%) + (60%)(6%) Mean Severity is 100α. Variance of Severity is 1002 α = 10,000 α. Thus the two process variances are: 60,000 and 40,000. For severity, EPV = (25%)(60,000) + (75%)(40,000) = 45,000. The hypothetical means are: 600 and 400. Thus the first moment of the hypothetical means is: (25%)(600) + (75%)(400) = 450. The second moment of the hypothetical means is: (25%)(6002 ) + (75%)(4002 ) = 210,000. VHM = 210,000 - 4502 = 7500. K = EPV / VHM = 45,000/7500 = 6. There are a total of 4 claims, so Z = 4/(4 + 6) = 40%. Observed severity is: (500 + 1000 + 300 + 800)/4 = 650. Prior mean is 450. Estimated future severity is: (40%)(650) + (60%)(450) = 530.
9.34. E. The density of the Gamma Distribution is: f(x) = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}.
Since θ = 100 is the same for both risk types, we can ignore the factor of e^(-x/θ).
The chance of the observation is proportional to:
q^2 (1-q)^18 {500^(α-1) / (100^α Γ(α))} (1-q)^25 q(1-q)^29 {1000^(α-1) / (100^α Γ(α))} q(1-q)^24 {300^(α-1) / (100^α Γ(α))} {800^(α-1) / (100^α Γ(α))}
= q^4 (1-q)^96 1200^(α-1) / {100^4 Γ(α)^4}.
Therefore, ignoring the constant of 100^4 in the denominator, the probability weights are:
(40%)(0.03^4)(0.97^96) 1200^5/(5!)^4 = 0.20884, and (60%)(0.06^4)(0.94^96) 1200^3/(3!)^4 = 0.02729.
Thus the posterior distribution is: 88.44% and 11.56%. The hypothetical mean severities are: 600 and 400.
Therefore, the estimated future severity is: (88.44%)(600) + (11.56%)(400) = 577.
Comment: In the case of Bayes analysis, we use all of the information given. In this case, we are given information on frequency, both in the model and the observation, and thus this is used to help get the chance of the observation and thus the posterior distribution. If we had not been given the information on frequency in the model, then the analysis would have been instead:
The chance of the observation is proportional to:
{500^(α-1) / (100^α Γ(α))} {1000^(α-1) / (100^α Γ(α))} {300^(α-1) / (100^α Γ(α))} {800^(α-1) / (100^α Γ(α))} = 1200^(α-1) / {100^4 Γ(α)^4}.
Therefore, the probability weights are: (40%) 1200^5 / {100^4 (5!)^4} = 0.048, and (60%) 1200^3 / {100^4 (3!)^4} = 0.008.
Thus the posterior distribution is: 85.71% and 14.29%. The hypothetical mean severities are: 600 and 400.
Therefore, the estimated future severity is: (85.71%)(600) + (14.29%)(400) = 571.
9.35. D. Mean frequency is q. Variance of frequency is: q(1-q). Mean Severity is 100α. Variance of Severity is: 100²α.
Thus the variance of the pure premium is: q(100²α) + (100α)² q(1-q) = 10,000qα{1 + α(1 - q)}.
Thus the two process variances are: (10,000)(0.03)(6){1 + (6)(0.97)} = 12,276, and (10,000)(0.06)(4){1 + (4)(0.94)} = 11,424.
Thus, for pure premium, EPV = (40%)(12,276) + (60%)(11,424) = 11,765.
The hypothetical means are: (0.03)(600) = 18, and (0.06)(400) = 24.
Thus the first moment of the hypothetical means is: (40%)(18) + (60%)(24) = 21.6.
The second moment of the hypothetical means is: (40%)(18²) + (60%)(24²) = 475.2. VHM = 475.2 - 21.6² = 8.64.
K = EPV / VHM = 11,765/8.64 = 1362. There are a total of 100 exposures, so Z = 100/(100 + 1362) = 6.8%.
Observed pure premium is: (500 + 1000 + 300 + 800)/100 = 26. Prior mean is 21.6.
Estimated future pure premium is: (6.8%)(26) + (1 - 6.8%)(21.6) = 21.90.
Given 40 exposures, the estimated aggregate loss is: (40)(21.90) = 876.

9.36. A. The density of the Gamma Distribution is: f(x) = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}.
Since θ = 100 is the same for both risk types, we can ignore the factor of e^(-x/θ).
The chance of the observation is proportional to:
q^2 (1-q)^18 {500^(α-1) / (100^α Γ(α))} (1-q)^25 q(1-q)^29 {1000^(α-1) / (100^α Γ(α))} q(1-q)^24 {300^(α-1) / (100^α Γ(α))} {800^(α-1) / (100^α Γ(α))}
= q^4 (1-q)^96 1200^(α-1) / {100^4 Γ(α)^4}.
Therefore, ignoring the constant of 100^4 in the denominator, the probability weights are:
(40%)(0.03^4)(0.97^96) 1200^5/(5!)^4 = 0.20884, and (60%)(0.06^4)(0.94^96) 1200^3/(3!)^4 = 0.02729.
Thus the posterior distribution is: 88.44% and 11.56%.
The hypothetical mean pure premiums are: (0.03)(600) = 18, and (0.06)(400) = 24.
Therefore, the estimated future pure premium is: (88.44%)(18) + (11.56%)(24) = 18.69.
Given 30 exposures, the estimated aggregate loss is: (30)(18.69) = 561.
Comment: In the case of Bayes analysis, we use all of the information given. In this case, we are given information on both frequency and severity, both in the model and the observation, and thus this is used to help get the chance of the observation and thus the posterior distribution. Thus the posterior distribution is the same for estimating frequency, severity, and pure premiums: 88.44% chance of type 1 and 11.56% chance of type 2.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 440 9.37. D. The hypothetical mean frequencies are: 5% and 10% VHM = 2.5%2 = 0.000625. The process variances are: (0.05)(0.95) = 0.0475, and (0.1)(0.9) = 0.09. EPV = (0.0475 + 0.09)/2 = 0.06875. K = EPV / VHM = 0.06875 / 0.000625 = 110. Z = 3 / (3 + K) = 2.65%. 9.38. B. The mean severity is 2000. The variance of severity is 1,800,000 The hypothetical means are: (5%)(2000) = 100, and (10%)(2000) = 200. VHM = 502 = 2500. The process variance for type 1 is: (0.05)(1,800,000) + (20002 )(0.05)(0.95) = 280,000. The process variance for type 1 is: (0.10)(1,800,000) + (20002 )(0.10)(0.90) = 540,000. EPV = (280,000 + 540,000)/2 = 410,000. K = EPV / VHM = 410,000 / 2500 = 164. Z = 3 / (3 + K) = 1.80%. 9.39. A. Expected Value of the Process Variance = .19. Class 1 2 Average
A Priori Probability 0.4000 0.6000
Mean for this Class 0.2 0.3
Square of Mean of this Class 0.04 0.09
Process Variance 0.16 0.21
0.26
0.0700
0.1900
Variance of the Hypothetical Means = .070 - .262 = .0024. K = EPV / VHM = .19 / .0024 = 79.2. Thus for N = 5, Z = 5/(5 + 79.2) = 5.94%.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 441 9.40. D. Set the Poisson parameter for an insured equal to λ. We are given that across the different insureds E[λ] = .1 and VAR[λ] = .1. Therefore, E[λ2 ] = .1 + .12 = .11. Set the mean severity for an insured equal to µ. We are given that across the different insureds E[µ] = 100 and VAR[µ] = 2500. Therefore, E[µ2 ] = 2500 + 1002 = 12500. We are given that the process variance of the severity is µ2 . Therefore the second moment of the severity process = severity process variance + (mean severity)2 = µ2 + µ2 = 2 µ2 . Since the frequency is Poisson and frequency and severity are independent, the process variance of the pure premium is equal to: (mean frequency)(2nd moment of the severity) = 2λµ2 . The expected value of the process variance of the P.P. = E[2λµ2 ] = 2E[λ]E[µ2 ] = (2)(.1)(12500) = 2500. (Where we have made use of the fact that the frequency and severity are independent.) For an individual insured, the hypothetical pure premiums is λµ. The variance of the hypothetical pure premiums = E[(λµ)2] - E2 [λµ] = E[λ2 ]E[µ2 ] - E2 [λ] E2 [µ] = (12500)(.11) - (.12 )(1002 ) = 1375 - 100 = 1275. K = EPV/VHM = 2500 / 1275 = 1.961. For one observation Z = 1 /(1+1.961) = 0.338. Comment: Difficult. Really tests your understanding of the concepts. 9.41. B. Z = N / (N+K). .40 = 2 / (2+K). Therefore, K = 3. Therefore for N = 3, Z = 3 / (3+3) = 0.5. 9.42. D. The hypothetical mean pure premiums are: (.1667)(4) = 2/3 and (.8333)(2) = 5/3. Since the two classes have the same number of risks the overall mean is 7/6 and the variance of the hypothetical mean pure premiums between classes is: {(2/3 - 7/6)2 + (5/3 - 7/6)2 }/2 = 1/4. For each type of risk, the process variance of the pure premiums is given by: µfσs2 + µs2 σf2 . For Class A, that is: (.1667)(20)+(42 )(.1389) = 5.5564. For Class B, that is: (.8333)(5)+(22 )(.1389) = 4.7221. Since the classes have the same number of risks, the Expected Value of the Process Variance = (.5)(5.5564) + (.5)(4.7221) = 5.139. Thus K = EPV / VHM = 5.139 / .25 = 20.56. Z = N /(N + K) = 4/(4 + 20.56) = 0.163. 9.43. E. The prior estimate is the overall mean of 7/6. The observation is .25. Thus the new estimate is: (.163)(.25) + (7/6)(1 - .163) = 1.017. Comment: Uses the solution of the previous question.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 442 9.44. A. Variance of Hypothetical Means = .520 - .72 = .03. Type Risk A B Mean
A Priori Chance of Risk 0.750 0.250
Avg. Claim Frequency 0.6 1.0 0.700
Square of Avg. Claim Frequency 0.36 1.00 0.520
Second Moment of Claim Freq. 1.400 2.400
Process Variance Claim Freq. 1.04 1.40 1.130
K= EPV /VHM = 1.13 / .030 = 37.7. Z = 1 / (1+37.7) = .026. New Estimate = (.026)(1) + (1 - .026)(.700) = 0.708. Comment: The Process Variance for each risk is the Second Moment of the Frequency minus the square of Average Claim Frequency. For example, 1.4 - .36 = 1.04. The second moment of the claim frequency for Risk B is computed as: (02 )(.5) + (12 )(.2) + (22 )(.1) + (32 )(.2) = 2.40. 9.45. E. We have E[θ] = .25 and Var[θ] = E[θ2] - E[θ]2 = .07. Therefore, E[θ2] = .1325. For the Binomial we have a process variance of mq(1-q) = θ(1−θ) = θ − θ2. Therefore, the Expected Value of the Process Variance = E[θ] - E[θ2] = .25 - .1325 = .1175. For the Binomial the mean is mq = θ. ⇒ Variance of the Hypothetical Means = Var[θ] = .07. K = EPV / VHM = .1175 / .07 = 1.6786. For one observation Z = 1 / (1 + 1.6786) = 0.373. Comment: Since the Buhlmann credibility does not depend on the form of the prior distribution (but just on its mean and variance), one could assume it was a Beta Distribution. In that case one would have a Beta-Bernoulli conjugate prior situation. (See “Mahlerʼs Guide to Conjugate Priors”.) One can then solve for the parameters a and b of the Beta: mean = .25 = a / (a+b) and variance = .07 = ab / {(a+b+1)(a+b)2 }. Therefore b = 3a. ⇒ 4a+1 = (.25)(.75) / .07 = 2.679. ⇒ a = .420. ⇒ b = 1.260. The Buhlmann Credibility parameter for the Beta-Bernoulli is: a + b = 1.68. For one observation, Z = 1 /(1+1.68) = .373.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 443 9.46. B. The variance of the hypothetical means = 14,318,301 - 36532 = 973,892. Risk 1 2
A priori Probability 0.666 0.333
Average
Mean 4350 2270
Second Moment 80,305,000 40,207,000
3,653
Process Variance 61,382,500 35,054,100
Square of Mean 18,922,500 5,152,900
52,553,760
14,318,301
Thus K = EPV / VHM = 52,553,760 / 973,892 = 54. For one observation, Z = 1 / (1 + 54) = 1/55. Thus the new estimate = ($100)(1/55) + ($3653)(54/55) = $3588. Comment: (.7)(1002 ) + (.2)(10002 ) + (.1)(200002 ) = 40,207,000. 40,207,000 - 22702 = 35,054,100. The Buhlmann Credibility estimate has to be between the a priori mean of $3653 and the observation of $100, so that prior to calculating Z one knows that the solution is either A , B, or just barely C. 9.47. D. The Mean for Risk A = (.84)(0) + (.16)(50) + (.04)(1000) = 48. The Second Moment for Risk A = (.84)(02 ) + (.16)(502 ) + (.04)(10002 ) = 40400. The Process Variance for Risk A = 40400 - 482 = 38096. Risk A B
A priori Probability 0.5 0.5
Average
Mean 48 172
Second Moment 40400 160600
110
Process Variance 38096 131016
Square of Mean 2304 29584
84556
15944
The variance of the hypothetical means = 15944 - 1102 = 3844. Thus K = EPV / VHM = 84556 / 3844 = 22. For two observations, Z = 2 / (2 + 22) = 1/12. Thus the new estimate = ($0)(1/12) + ($110)(11/12) = $100.83. 9.48. B. Since the frequency and severity are independent, for each class the Process Variance of the Pure Premium = (mean frequency)(variance of severity) + (mean severity)2 (variance of frequency). Class 1 2 Weighted Average
A Priori Probability 0.25 0.75
Mean Pure Premium 1 4
Variance of Pure Premium 17 66
Square of Mean P.P. 1 16
3.25
53.75
12.25
Variance of Hypothetical Mean P.P. = 12.25 - (3.25)2 = 1.6875. K = EPV/ VHM = 53.75 / 1.6875 = 31.9. Z = 1/(1+31.9) = 3.0%. New Estimate = (.03)(P1 ) + (.97)(3.25) = 0.03P1 + 3.15. Comment: One has to assume that one is unaware of which class the risk has been selected from, just as in for example problems involving urns, where one doesnʼt know from which urn a ball has been drawn.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 444 9.49. A. Any information about which risk we had chosen, which may have been contained in prior observations, is no longer relevant once we make a new random selection of a risk from the portfolio. Therefore our best estimate is the a priori mean of 3.25. 9.50. B. For example, the second moment of aggregate losses for risk type A is: (02 )(.80) + (502 )(.16) + (20002 )(.04) = 160400. Therefore the Process Variance for type A risk is: 164000 - 882 = 152,656. Type Risk A B C Mean
A Priori Chance of Risk 0.333 0.333 0.333
Average Aggregate Losses 88.0 332.0 576.0 332.0
Square of Average Aggregate Losses 7,744 110,224 331,776 149,915
Second Moment of Agg. Losses 160,400 640,600 1,120,800
Process Variance Agg. Losses 152,656 530,376 789,024 490,685
The overall a priori mean is 332. The second moment of the means is: 149915. Therefore, the Variance of the Hypothetical Means = 149915 - 3322 = 39691. Expected Value of the Process Variance = 490685. K = EPV / VHM = 490685/39691 = 12.4. Z= 1 / (1+12.4) = 7.5%. New estimate = (.075)(50) + (.925) (332) = $311. 9.51. D. Variance of the Hypothetical Means = 2.5 - 1.52 = .25. Class 1 2
A Priori Probability 0.5000 0.5000
Mean for this Class 1 2
Square of Mean of this Class 1 4
Process Variance 1 2
1.5
2.5
1.5
Average
K= EPV / VHM = 1.5 / .25 = 6. Thus for N = 2, Z = 2/(2 + 6) = 25%. The observed mean is (2+0)/2 = 1 and the a priori mean is 1.5. Therefore, the new estimate = (1)(25%) + (1.5)(75%) = 1.375. 9.52. C. The Expected Value of the Process Variance = .2222. The Variance of the Hypothetical Means = .2222 - .44442 = .0247. K = EPV/VHM = .2222 / .0247 = 9. Z = 2/(2+9) = 2/11. The observed frequency is 2/2 = 1. The a priori mean is .4444. Thus the estimate of the future frequency is: (2/11)(1) + (9/11)(.4444) = .545 = 6 /11. Class
A Priori Chance of This Class
Mean Annual Claim Freq.
Square of Mean Claim Freq.
Bernoulli Process Variance
1 2 3
0.3333 0.3333 0.3333
0.3333 0.3333 0.6667
0.1111 0.1111 0.4444
0.2222 0.2222 0.2222
0.4444
0.2222
0.2222
Average
9.53. C. The process variances of the classes are 400²/12, 600²/12, and 800²/12.
Since the a priori probabilities of the classes are all equal, the Expected Value of the Process Variance = {(400²/12) + (600²/12) + (800²/12)} / 3 = 32,222.
The means of the classes are 200, 300 and 400. Thus the a priori overall mean is: (200 + 300 + 400)/3 = 300.
Thus the Variance of the Hypothetical Means is: {(200-300)² + (300-300)² + (400-300)²} / 3 = 6667.
Thus the Buhlmann Credibility Parameter K = EPV/VHM = 32,222/6667 = 4.83.
Thus for one claim, (N = 1), Z = 1/(1 + 4.83) = .171.
Thus after an observation of 340, the Buhlmann credibility estimate of the expected value of a second claim from the same risk is: (340)(.171) + (300)(1 - .171) = 307.
In spreadsheet form, the calculation of the EPV and VHM is as follows:
    Class      A Priori Chance of this Class    Process Variance    Mean for this Class    Square of Mean for this Class
    1          0.333                            13,333              200                    40,000
    2          0.333                            30,000              300                    90,000
    3          0.333                            53,333              400                    160,000
    Overall                                     32,222              300                    96,667
The Variance of the Hypothetical Means = 96,667 - 300² = 6667.
Comment: The variance of the uniform distribution on [a, b] is (b-a)²/12. Recall that when estimating severities, the number of observations is in terms of the number of claims, in this case one. Since when using credibility the posterior estimate is always between the a priori mean and the observation, we can eliminate choices A and B.
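As a quick numerical check of solution 9.53, the following Python sketch (not part of the original guide; the variable names are my own) builds the EPV, VHM, K and Z directly from the uniform distributions.

# Uniform on [0, b]: mean b/2, variance b^2/12.
uppers = [400.0, 600.0, 800.0]
p = 1.0 / 3.0                                   # classes are equally likely

means = [b / 2 for b in uppers]                 # 200, 300, 400
pvars = [b * b / 12 for b in uppers]            # 13,333; 30,000; 53,333

epv = p * sum(pvars)                            # about 32,222
mu = p * sum(means)                             # 300
vhm = p * sum(m * m for m in means) - mu ** 2   # about 6,667
k = epv / vhm                                   # about 4.83
z = 1 / (1 + k)                                 # one observed claim, about 0.171

print(round(z * 340 + (1 - z) * mu))            # about 307, answer C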
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 446 9.54. A. The mean pure premium for Class One is u3/4. The second moment of the pure premium is (0)(1/4) + (u2 )(3/4) = (u2 )(3/4); thus the Process Variance for the premium is: (u2 )(3/4) - (u3/4)2 = 3u2 /16. Alternately, the process is u times a Bernoulli, so the process variance is u2 times the process variance of a Bernoulli: u2 q(1-q) = u2 (3/4)(1/4) = 3u2 /16. Similarly, the process variance for Class 2, which has 2u times a Bernoulli with q =1/4, is: (2u)2 (1/4)(1-1/4) = 12u2 /16. Thus since the classes are equally likely, the EPV = (1/2)(3u2 /16) + (1/2)(12u2 /16) = u2 15/32. The mean pure premium for Class 2 is (2u)(1/4) = 2u/4. That for Class 1 is u3/4. Thus since the two classes are equally likely, the overall mean pure premium is: (1/2)(3u/4) + (1/2)(2u/4) = 5u/8. The Variance of the Hypothetical Mean Pure Premiums is: (1/2)(5u/8 - 3u/4)2 + (1/2)(5u/8 - 2u/4)2 = u2 /64. Thus the Buhlmann Credibility Parameter, K = EPV/VHM = (u2 15/32)/( u2 /64) = 30. Thus for one exposure, the credibility Z = 1 /(1+K) = 1/31 = 0.032. In spreadsheet form the calculation of the EPV and VHM is as follows: Class 1 2 Average
A Priori Probability 0.5000 0.5000
Mean for this Class 3u/4 2u/4
Square of Mean of this Class (9/16)u^2 (4/16)u^2
Process Variance (3/16)u^2 (12/16)u^2
5u/8
(13/32)u^2
(15/32)u^2
VHM = (13/32)u2 - (5u/8)2 = u2 (26/64 - 25/64) = u2 /64. EPV = (15/32)u2 . Comment: Since the given solutions do not depend on u, just taking u =1 at the beginning may help you to get the correct solution. (u drops out of the solution.)
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 447 9.55. B. Each class has a mixed distribution. The moments of a mixed distribution are weighted averages of the moments of the individual distributions. Thus the mean for Class A is: (1/2)(1/6) + (1/2)(1/3) = 1/4. Similarly, the mean for class B is: (1/2)(2/3) + (1/2)(5/6) = 3/4. Class A and B are equally likely, therefore the overall mean is 1/2. The Variance of the Hypothetical Means is: (1/2)(1/4-1/2)2 + (1/2)(3/4-1/2)2 = 1/42 = 1/16. The second moment of a Poisson with mean λ (and variance λ) is: λ + λ2. Thus the second moment for a Poisson with mean 1/6 is: 1/6 + 1/36 = 7/36. The second moment for a Poisson with mean 1/3 is: 1/3 + 1/9 = 16/36. Thus the second moment for class A is: (1/2)(7/36) + (1/2)(16/36) = 23/72. Therefore, the process variance of Class A is: 23/72 - (1/4)2 = 37/144. Similarly, the second moment for Class B is: (1/2)(2/3 + 4/9) + (1/2)(5/6 + 25/36) = 95/72. Therefore, the process variance of Class B is: 95/72 - (3/4)2 = 109/144. Therefore, EPV = (1/2)(37/144) + (1/2)(109/144) = 73/144. Buhlmann Credibility Parameter, K = EPV/VHM = (73/144)/(1/16) = 73/9. For one observation, Z = 1/(1 + 73/9) = 9/82.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 448 9.56. E. Risk A has a mean of .5 and a variance of 12 /12 = 1/12. Risk B has a mean of 1 and a variance of 22 /12 = 1/3. Since the risks are equally likely, the a priori mean is: (.5+1)/2 = .75. The VHM = (1/4)2 = 1/16. The EPV = (1/12 + 1/3)/2 = 5/24. K = EPV/VHM = (5/24)/(1/16) = 10/3. For one observation, Z = 1/(1+10/3) = 3/13. If the observation is L, the estimate from Buhlmann Credibility is: X = (3/13)L + (10/13)(.75) = (6L + 15)/26. The density at L given risk A is 1 If L≤1 and 0 if L >1. The density at L given risk B is 1/2. If L ≤ 1, then the density at L given Risk A is twice that given Risk B. Thus since the risks are equally likely a priori, the posterior distribution is: 2/3 Risk A and 1/3 Risk B, if L < 1. If L > 1, then the posterior distribution is: 100% B. If L ≤ 1, then the Bayesian estimate is: Y = (2/3)(1/2) + (1/3)(1) = 2/3. If L > 1, then the Bayesian estimate of next yearʼs losses is: Y = (0)(1/2) + (1)(1) = 1. Now we check versus the given statements. A. If L < 1, X = (6L + 15)/26 and Y = 2/3. For L < 7/18, X < Y. For L > 7/18, X > Y. Thus Statement A is not true. B. If L > 1, X = (6L + 15)/26 and Y = 1. For L < 11/6, X < Y. For L > 11/6, X > Y. Thus Statement B is not true. C. If L =1/2, then X = (6L + 15)/26 = 18/26 = 9/13 = .692 > 2/3 = Y. Thus Statement C is not true. D. X = Y at L = 7/18 and L = 11/6. Thus Statement D is not true. E. At L = 7/18, X = Y = 2/3. At L = 11/6, X = Y = 1. Statement E is true. Comment: The uniform distribution on [a, b] has variance: (b-a)2 /12. A graph of the two estimates as a function of L follows:
[Graph omitted: the two estimates plotted against L, for L from 0.5 to 2, with values between 0.6 and 1.]
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 449 9.57. A. Let the unknown process variance for Class 2 be y. Mean = ((380)(25) + (23)(50))/(25+50) = 142. Second Moment of the Hypothetical Means = ((3802 )(25) + (232 )(50))/(25+50) = 48,486. VHM = 48486 - 1422 = 28,322. EPV = k VHM = (2.65)(28,322) = 75,053. Process Variance for Class 1 = 365,000 - 3802 = 220,600. 75053 = EPV = ((220600)(25) + 50y)/75. Therefore, y = 2280. Comment: Gives the usual output and asks you to solve for a missing input. The missing input, E[X2 ] for Class 2 is: 2280 + 232 = 2809. If one had been given this value, the calculation of K would have gone as follows: Class 1 2
A Priori Probability 0.3333 0.6667
Hypothetical Mean 380 23
Square of Hypoth. Mean 144,400 529
142
48,486
Average
Second Moment 365,000 2,809
Process Variance 220,600 2,280 75,053
VHM = 48,486 - 142² = 28,322. K = EPV/VHM = 75,053/28,322 = 2.65, matching the given value of K.

9.58. B. EPV = .32 + .08c². VHM = .08 + .02c² - (.2 + .1c)² = .04 - .04c + .01c².
    Class      Mean P.P.       Square of Mean    Variance of P.P.
    A          (2)(.2) = .4    .16               (2²)(.2)(.8) = .64
    B          .2c             .04c²             c²(.2)(.8) = .16c²
    Overall    .2 + .1c        .08 + .02c²       .32 + .08c²
K = EPV/VHM = (.32 + .08c²)/(.04 - .04c + .01c²) = (.32/c² + .08)/(.04/c² - .04/c + .01).
The limit as c approaches ∞ of K is .08/.01 = 8. Therefore, Z approaches 1/(1+8) = 1/9.
Alternately, one can turn this into a numerical problem by taking a value for c that is much larger than 2. For example, for c = 1000:
For Class A, mean P.P. is: (.2)(2) = .4, and the process variance of P.P. is: (2²){(.2)(.8)} = .64.
For Class B, mean P.P. is: (.2)(1000) = 200, and the process variance of P.P. is: (1000²){(.2)(.8)} = 160,000.
    Class      A Priori Probability    Mean P.P.    Square of Mean P.P.    Process Variance
    A          0.5000                  0.4          0.16                   0.64
    B          0.5000                  200          40,000                 160,000
    Average                            100.2        20,000.08              80,000.32
VHM = 20,000.08 - 100.2² = 9,960.04. K = EPV/VHM = 80,000.32/9,960.04 = 8.032. Z = 1/(1 + 8.032) ≅ 1/9.
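The "plug in a large c" idea in the alternate approach to 9.58 is easy to automate. The Python sketch below is not from the original guide (the function name is my own); it evaluates the credibility for several values of c and shows it approaching 1/9.

def credibility_for(c):
    # Pure premium for each class: Bernoulli(0.2) claim count times a fixed claim size.
    means = [0.2 * 2, 0.2 * c]
    pvars = [(2 ** 2) * 0.2 * 0.8, (c ** 2) * 0.2 * 0.8]
    mu = sum(means) / 2
    epv = sum(pvars) / 2
    vhm = sum(m * m for m in means) / 2 - mu ** 2
    k = epv / vhm
    return 1 / (1 + k)

for c in (10, 100, 1000, 10**6):
    print(c, round(credibility_for(c), 4))   # approaches 1/9 = 0.1111 as c grows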
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 450 9.59. D. K = EPV/VHM = 1.793/.370 = 4.85. Z = 9/(9 + K) = .650. Estimated frequency for the class = (.65)(7/9) + (1 - .65)(.425) = .654. For five policyholders, we expect (5)(.654) = 3.27 claims. Comment: This is an example of classification ratemaking. One can use the observation to predict the future claim frequency of policyholders from the same class, whether or not these are the same policyholders one has observed. The hypothetical means are the hypothetical mean frequencies for the each class. It is assumed that each individual in a given class has the same frequency distribution. The VHM is the variance between classes. 9.60. D. E[q] = Overall mean = 0.1. .01 = VHM = Var[q] = E[q2 ] - E[q]2 . Therefore, E[q2 ] = .01 + E[q]2 = .01 + .12 = .02. EPV = E[q(1-q)] = E[q] - E[q2 ] = .10 - .02 = .08. K = EPV/VHM = .08/.01 = 8. Z = 10/(10 + 8) = 5/9. Estimated future frequency = (5/9)(0) + (4/9)(.1) = .0444. Estimated number of claims for 5 years = (5)(.0444) = 0.222. 9.61. C. K = EPV/VHM = 8000/40 = 200. Z = 1800/(1800 + 200) = 90%. Total reported losses = (15)(800) + (10)(600) + (5)(400) = 20,000. Observed pure premium = 20000/1800 = 11.11. Overall mean pure premium = 20. Estimated future pure premium = (.9)(11.11) + (1 - .9)(20) = 12. Comment: If we were told that there are 500 employees in year 4, then the estimated aggregate loss for year 4 is: (500)(12) = 6000. First estimate the future pure premium or frequency. Then multiply by the number of exposures in the later year. See for example, 4, 11/04, Q.9. 9.62. B. VHM = Var[µ] = 50. EPV = E[Var[X | µ]] = E[500] = 500. K = EPV/VHM = 10. Z = 3/(3 + 10) = 3/13. A priori mean = E[µ] = 1000. Estimated future severity = (3/13)((750 + 1075 + 2000)/3) + (10/13)(1000) = 1063.
9.63. D. Adding the probabilities, there is a 60% a priori probability of Θ = 0, risk type A, and a 40% a priori probability of Θ = 1, risk type B.
For risk type A, the distribution of X is: 0 with probability .4/.6 = 2/3, 1 with probability .1/.6 = 1/6, and 2 with probability .1/.6 = 1/6.
Putting this into ordinary language: Prob[Type A] = Prob[Θ = 0] = .4 + .1 + .1 = 60%. Prob[Type B] = Prob[Θ = 1] = .1 + .2 + .1 = 40%.
    X         0      1      2
    Type A    4/6    1/6    1/6
    Type B    1/4    2/4    1/4
The mean for risk type A is: E[X | Θ = 0] = (0)(2/3) + (1)(1/6) + (2)(1/6) = 0.5.
The 2nd moment for risk type A is: E[X² | Θ = 0] = (0²)(2/3) + (1²)(1/6) + (2²)(1/6) = .8333.
Process Variance for risk type A is: Var[X | Θ = 0] = .8333 - .5² = .5833.
Similarly, risk type B has mean 1, second moment 1.5, and process variance 0.5.
    Risk Type    A Priori Chance    Mean    Square of Mean    Process Variance
    A            0.6                0.50    0.25              0.583
    B            0.4                1.00    1.00              0.500
    Average                         0.70    0.55              0.550
EPV = (.6)(.5833) + (.4)(0.5) = .550. Overall Mean = (.6)(0.5) + (.4)(1) = 0.7.
2nd moment of the hypothetical means is: (.6)(0.5²) + (.4)(1²) = 0.55. VHM = .55 - .7² = .06.
K = EPV/VHM = .55/.06 = 9.17. Z = 10/(10 + K) = 52.2%. Observed mean is: 10/10 = 1. Prior mean is .7.
Estimate is: (.522)(1) + (.478)(.7) = 0.857.
Comment: Overall we have: 50% X = 0, 30% X = 1, and 20% X = 2. Thus the overall mean is .7, and the total variance is .61. Note that EPV + VHM = .55 + .06 = .61 = total variance.

9.64. C. Variance of the Hypothetical Means = .2775 - .425² = .0969.
    Class      A Priori Probability    Mean      Square of Mean    Process Variance
    1          0.25                    0.1       0.01              0.09
    2          0.25                    0.2       0.04              0.16
    3          0.25                    0.5       0.25              0.25
    4          0.25                    0.9       0.81              0.09
    Average                            0.4250    0.2775            0.1475
K = EPV/VHM = .1475/.0969 = 1.5. Z = 4/(4 + K) = 72.7%.
Estimated future frequency = (.727)(2/4) + (.273)(.425) = .48. (5)(.48) = 2.4.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 452
9.65. D. For risk 1, the mean is: (.5)(250) + (.3)(2500) + (.2)(60000) = 12,875,
the second moment is: (.5)(250²) + (.3)(2500²) + (.2)(60000²) = 721,906,250,
and the process variance is: 721,906,250 - 12,875² = 556,140,625.
Risk       A Priori Probability    Mean         Square of the Mean    Second Moment    Process Variance
1          66.67%                  12,875.00    165,765,625           721,906,250      556,140,625
2          33.33%                  6,675.00     44,555,625            361,293,750      316,738,125
Average                            10,808.33    125,362,292                            476,339,792
VHM = 125,362,292 - 10,808.33² = 8,542,945. EPV = 476,339,792. K = EPV/VHM = 55.8. Z = 1/(1 + K) = 1.8%.
(1.8%)(250) + (1 - 1.8%)(10,808.33) = 10,618.
Comment: With only two types of risks, VHM = (12,875 - 6675)²(2/3)(1/3) = 8,542,222, with the difference from above due to rounding.
9.66. C. For the third class, (.7)(0) + (.2)(1) + (.1)(2) = .4. (.7)(0²) + (.2)(1²) + (.1)(2²) = .6.
Class      A Priori Chance    Mean     Square of Mean    Second Moment    Process Variance
1          0.333              0.200    0.0400            0.400            0.360
2          0.333              0.300    0.0900            0.500            0.410
3          0.333              0.400    0.1600            0.600            0.440
Overall                       0.300    0.0967            0.500            0.403
The Variance of the Hypothetical Means = .0967 - .3² = .0067. K = EPV/VHM = .403/.0067 = 60.1.
Z = 50/(50 + 60.1) = 45.4%. Estimated future frequency = (45.4%)(17/50) + (54.6%)(.3) = .318. (35)(.318) = 11.1.
9.67. C. Variance of the Hypothetical Means = 22.18 - 4.7² = 0.09.
Class      A Priori Probability    Mean    Square of Mean    Process Variance
1          0.5000                  5       25                5
2          0.5000                  4.4     19.36             1.98
Average                            4.7     22.18             3.49
K = EPV/VHM = 3.49/.09 = 38.8. Thus for N = 3, Z = 3/(3 + 38.8) = 7.2%.
The observed mean is: (3 + r + 4)/3 = (7 + r)/3, and the a priori mean is 4.7.
Therefore, the estimate = (7.2%)(7 + r)/3 + (92.8%)(4.7). Set this equal to 4.6019:
4.6019 = (7.2%)(7 + r)/3 + (92.8%)(4.7). ⇒ r = 3.0.
2013-4-9 Buhlmann Cred. §9 Buhl. Cred. Discrete Risk Types, HCM 10/19/12, Page 453
9.68. B. E[X | θ] = (0)(2θ) + (1)(θ) + (2)(1 - 3θ) = 2 - 5θ. E[X² | θ] = (0)(2θ) + (1)(θ) + (4)(1 - 3θ) = 4 - 11θ.
θ       Prob.    Mean    Mean²     Second Moment    Process Variance
.05     .8       1.75    3.0625    3.45             .3875
.30     .2       0.50    0.2500    0.70             .4500
Avg.             1.50    2.5000                     .4000
VHM = 2.5 - 1.5² = .25. K = EPV/VHM = .4/.25 = 1.6. Z = 1/(1 + K) = 1/2.6.
Estimate is: (2)(1/2.6) + (1.5)(1 - 1/2.6) = 1.692.
9.69. B. The process variance is 1000² = 1 million for a risk from each class. ⇒ EPV = 1 million.
The overall mean is: (60%)(2000) + (30%)(3000) + (10%)(4000) = 2500.
The second moment of the hypothetical means is: (60%)(2000²) + (30%)(3000²) + (10%)(4000²) = 6.7 million.
Therefore, VHM = 6.7 million - 2500² = 0.45 million. K = EPV/VHM = 1/0.45 = 2.22. Z = 80/(80 + 2.22) = 97.3%.
The observed pure premium is: (24,000 + 36,000 + 28,000)/(24 + 30 + 26) = 88,000/80 = 1100.
Estimated future pure premium is: (97.3%)(1100) + (1 - 97.3%)(2500) = 1138.
Comment: Choice A would correspond to Z = 1, while choice E would correspond to Z = 0, thus one can probably eliminate these choices.
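The discrete-risk-type solutions above all follow the same recipe: tabulate each class's hypothetical mean and process variance, probability-weight them into the EPV and VHM, and form K = EPV/VHM. The short Python sketch below is my own illustration, not part of the Study Guide (the function name buhlmann_discrete is made up); it mechanizes that recipe and is checked against the numbers in solution 9.63.

```python
# My own sketch of the recipe used in these solutions, checked against solution 9.63.
def buhlmann_discrete(prior, dists):
    """prior: a priori probability of each risk type.
    dists: for each type, a dict {outcome: probability} giving its frequency distribution."""
    means = [sum(x * p for x, p in d.items()) for d in dists]
    seconds = [sum(x * x * p for x, p in d.items()) for d in dists]
    pvs = [m2 - m * m for m2, m in zip(seconds, means)]            # process variance by type
    epv = sum(w * pv for w, pv in zip(prior, pvs))                 # expected value of the process variance
    grand_mean = sum(w * m for w, m in zip(prior, means))
    vhm = sum(w * m * m for w, m in zip(prior, means)) - grand_mean**2
    return epv, vhm, epv / vhm, grand_mean

epv, vhm, k, mean = buhlmann_discrete([0.6, 0.4],
                                      [{0: 2/3, 1: 1/6, 2: 1/6}, {0: 1/4, 1: 2/4, 2: 1/4}])
z = 10 / (10 + k)                                                  # ten years of data, observed mean 1
print(round(epv, 3), round(vhm, 3), round(k, 2), round(z * 1.0 + (1 - z) * mean, 3))
# 0.55, 0.06, 9.17, 0.857
```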
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 454
Section 10, Buhlmann Credibility, with Continuous Risk Types
In the prior section Buhlmann Credibility was applied to situations where there were several distinct types of risks. In this section, Buhlmann Credibility will be applied in a similar manner to situations in which there are an infinite number of risk types, parameterized in some continuous manner. Where summation was used in the discrete case, integration will be used instead in the continuous case. “Mahlerʼs Guide to Conjugate Priors” contains many more examples of Buhlmann Credibility with Continuous Risk Types.115
An Example of Mixing Bernoullis:
For example, assume:
• In a large portfolio of risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean q.
• The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year.
• The distribution of q within the portfolio has density function: π(q) = 20.006q⁴, 0.6 ≤ q ≤ 0.8.
For a given value of q, the process variance of a Bernoulli is q(1 - q) = q - q².116
We use integration to get the expected value of the process variance:117
EPV = E[q - q²] = ∫[0.6, 0.8] (q - q²) π(q) dq = 20.006 ∫[0.6, 0.8] (q⁵ - q⁶) dq = 0.199.
For a given value of q, the hypothetical mean is q. We use integrals to get the first and second moments of the hypothetical means:
E[q] = ∫[0.6, 0.8] q π(q) dq = 20.006 ∫[0.6, 0.8] q⁵ dq = 0.71851.
E[q²] = ∫[0.6, 0.8] q² π(q) dq = 20.006 ∫[0.6, 0.8] q⁶ dq = 0.51936.
VHM = E[q²] - E[q]² = 0.51936 - 0.71851² = 0.0031. K = EPV/VHM = 0.199/0.0031 = 64.
115 See the sections on Mixing Poissons, Gamma-Poisson, Beta-Binomial, Inverse Gamma - Exponential, Normal-Normal, and Overview.
116 In general, the process variance and the hypothetical mean will be some function of the parameter.
117 In cases where the distribution of the parameter is well known, for example uniform or exponential, we can avoid doing integrals, since we already know the mean, moments, and variance of such a distribution.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 455
Exercise: A policyholder is selected at random from this portfolio. He is observed to have one claim in three years. Using Buhlmann Credibility, what is his expected future frequency?
[Solution: Z = 3/(3 + K) = 4.5%. Estimate = (4.5%)(1/3) + (95.5%)(0.71851) = 0.70.]
Summary:
Assume q is a parameter which varies across the portfolio, via prior distribution π(q).
EPV = ∫ (Process Variance | q) π(q) dq.
First Moment of Hypothetical Means = ∫ (Mean | q) π(q) dq.
Second Moment of Hypothetical Means = ∫ (Mean | q)² π(q) dq.
VHM = Second Moment of Hypothetical Means - (First Moment of Hypothetical Means)².
Using Moment Formulas:
In order to calculate the EPV and VHM, in those cases where the prior distribution π(q) is one of the distributions in the Appendices attached to the exam, often one can use moment formulas rather than doing integrals.
For example, assume that the annual number of claims on a given policy has a Geometric Distribution with parameter β. The prior distribution of β is a Gamma Distribution with parameters α = 4 and θ = 0.1.
The process variance is: β(1 + β) = β + β². Thus the EPV = E[β] + E[β²] = (mean of the Gamma) + (second moment of the Gamma) = (4)(0.1) + (4)(5)(0.1²) = 0.60.
The hypothetical mean is β. Thus the VHM = Var[β] = Variance of the Gamma = (4)(0.1²) = 0.04.
Therefore, K = EPV / VHM = 0.60 / 0.04 = 15.
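As a sanity check on the integrals in the mixing-Bernoullis example, the following Python sketch (my own illustration, not text from the Study Guide; the midpoint-rule integrator is just one simple choice) redoes the EPV, VHM, K, and the exercise estimate by brute-force numerical integration, and also reproduces the Geometric-Gamma moment-formula shortcut.

```python
# My own numerical check of the mixing-Bernoullis example; pi(q) = 20.006 q^4 on [0.6, 0.8].
def integrate(f, a, b, n=100000):
    h = (b - a) / n                      # simple midpoint rule, fine for smooth integrands
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

pi = lambda q: 20.006 * q**4
epv = integrate(lambda q: (q - q * q) * pi(q), 0.6, 0.8)        # E[q(1-q)] = 0.199
eq = integrate(lambda q: q * pi(q), 0.6, 0.8)                   # E[q] = 0.71851
eq2 = integrate(lambda q: q * q * pi(q), 0.6, 0.8)              # E[q^2] = 0.51936
vhm = eq2 - eq**2                                               # 0.0031
k = epv / vhm                                                   # about 64
z = 3 / (3 + k)                                                 # three years observed, one claim
print(round(k), round(z * (1 / 3) + (1 - z) * eq, 2))           # 64, 0.70

# Moment-formula shortcut for the Geometric-Gamma example (alpha = 4, theta = 0.1):
a, t = 4, 0.1
print((a * t + a * (a + 1) * t * t) / (a * t * t))              # K = 0.60/0.04 = 15
```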
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 456
A Severity Example:
Assume severity is Pareto, with θ = 1, and alpha varying across the portfolio. Assume alpha is uniformly distributed from 3 to 5. E[X | α] = 1/(α - 1).
Thus the first moment of the hypothetical means is: ∫[3, 5] {1/(α - 1)} (1/2) dα = {ln(4) - ln(2)}/2 = 0.34657.
Exercise: What is the second moment of the hypothetical means?
[Solution: ∫[3, 5] {1/(α - 1)²} (1/2) dα = (1/2 - 1/4)/2 = 1/8 = 0.125.]
Thus the VHM = 0.125 - 0.34657² = 0.004887.
E[X² | α] = 2/{(α - 1)(α - 2)}. Therefore, Var[X | α] = 2/{(α - 1)(α - 2)} - 1/(α - 1)².118
Thus, EPV = ∫[3, 5] [2/{(α - 1)(α - 2)} - 1/(α - 1)²] (1/2) dα = ∫[3, 5] 1/{(α - 1)(α - 2)} dα - 1/8
= ∫[3, 5] [1/(α - 2) - 1/(α - 1)] dα - 1/8 = ln(3/1) - ln(4/2) - 1/8 = 0.2805.
Therefore, K = EPV / VHM = 0.2805 / 0.004887 = 57.4.
Exercise: For an individual policyholder, we observe 20 claims which total 9. Use Buhlmann Credibility to estimate the size of the next claim from the same policyholder.
[Solution: Z = 20 / (20 + 57.4) = 25.8%. Prior mean is 0.34657, from above. Estimate is: (25.8%)(9/20) + (1 - 25.8%)(0.34657) = 0.373.]
118 I have left the process variance in this form in order to make it easier to integrate.
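The severity example can likewise be verified numerically. This Python sketch is mine (not from the Study Guide) and simply integrates the conditional mean and second moment of the Pareto against the uniform prior on alpha; it reproduces K ≈ 57.4 and the exercise estimate of 0.373.

```python
# My own numerical check of the severity example: Pareto with theta = 1, alpha uniform on (3, 5).
def integrate(f, a, b, n=200000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

prior = lambda a: 0.5                                   # uniform density on (3, 5)
mean = lambda a: 1 / (a - 1)                            # E[X | alpha]
second = lambda a: 2 / ((a - 1) * (a - 2))              # E[X^2 | alpha]

m1 = integrate(lambda a: mean(a) * prior(a), 3, 5)                    # 0.34657
vhm = integrate(lambda a: mean(a)**2 * prior(a), 3, 5) - m1**2        # 0.004887
epv = integrate(lambda a: (second(a) - mean(a)**2) * prior(a), 3, 5)  # 0.2805
k = epv / vhm                                                         # about 57.4
z = 20 / (20 + k)                                                     # 20 claims observed
print(round(k, 1), round(z * (9 / 20) + (1 - z) * m1, 3))             # 57.4, 0.373
```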
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 457
Problems:
Use the following information for the next two questions:
• Claim sizes follow a Gamma Distribution, with parameters α and θ = 10.
• The prior distribution of α is assumed to be uniform on the interval (2.5, 4.5).
• Buhlmann Credibility is being used to estimate claim severity.
10.1 (2 points) Determine the value of the Buhlmann Credibility Parameter, K.
A. Less than 10 B. At least 10, but less than 11 C. At least 11, but less than 12 D. At least 12, but less than 13 E. 13 or more
10.2 (1 point) You observe from an insured 2 claims of sizes 15 and 31. What is the estimated future claim severity from this insured?
A. 25 B. 27 C. 29 D. 31 E. 33
10.3 (3 points) You are given:
• The annual claim count for each individual insured has a Negative Binomial Distribution, with parameters r and β, which do not change over time.
• For each insured, β = 0.3.
• The r parameters vary across the portfolio of insureds, via a Gamma Distribution with α = 4 and θ = 5.
Determine the Buhlmann credibility factor Z for an individual driver for one year.
A. 52% B. 54% C. 56% D. 58% E. 60%
10.4 (3 points) You are given:
• The annual claim count for each individual insured has a Negative Binomial Distribution, with parameters r and β, which do not change over time.
• For each insured, r = 3.
• The β parameters vary across the portfolio of insureds, via an Exponential Distribution with θ = 0.7.
Determine the Buhlmann credibility factor for an individual driver for one year.
(A) 45% (B) 55% (C) 65% (D) 75% (E) 85%
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 458
10.5 (3 points) You are given the following:
• Claim sizes for a given policyholder follow an exponential distribution with f(x) = λe^(-λx), 0 < x < ∞.
• The prior distribution of λ is uniform from 0.02 to 0.10.
• The policyholder experiences a claim of size 60.
Using Buhlmann Credibility, determine the expected size of the next claim from this policyholder.
A. 24 B. 26 C. 28 D. 30 E. 32
10.6 (2 points) Use the following information:
• The probability of y successes in m trials is given by a Binomial distribution with parameters m and q.
• The prior distribution of q is uniform on [0, 1].
• Two successes were observed in six trials.
Use Buhlmann Credibility to estimate the probability that a success will occur on the seventh trial.
A. 0.34 B. 0.36 C. 0.38 D. 0.40 E. 0.42
10.7 (4 points) You are given the following:
• Claim sizes for a given policyholder follow a distribution with density function f(x) = 3x²/b³, 0 < x < b.
• The prior distribution of b is a Single Parameter Pareto Distribution with α = 6 and θ = 40.
A policyholder experiences two claims of sizes 30 and 60.
Use Buhlmann Credibility to determine the expected value of the next claim from this policyholder.
A. 37 B. 38 C. 39 D. 40 E. 41
Use the following information for the next two questions:
(i) Xi is the claim count observed for driver i for one year.
(ii) Xi has a negative binomial distribution with parameters β = 1.6 and ri.
(iii) The riʼs have an exponential distribution with mean 0.8.
(iv) The size of claims follows a Pareto Distribution with α = 5 and θ = 1000.
10.8 (3 points) An individual driver is observed to have 2 claims in one year. Use Buhlmann Credibility to estimate this driverʼs future annual claim frequency.
(A) 1.3 (B) 1.4 (C) 1.5 (D) 1.6 (E) 1.7
10.9 (3 points) An individual driver is observed to have 2 claims in one year, of sizes 1500 and 800. Apply Buhlmann Credibility to aggregate losses in order to estimate this driverʼs future annual aggregate loss.
(A) 600 (B) 700 (C) 800 (D) 900 (E) 1000
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 459
10.10 (3 points) Use the following information:
• In a large portfolio of risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean q.
• The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. The number of claims for one policyholder is independent of the number of claims for any other policyholder.
• The distribution of q within the portfolio has density function: f(q) = 400q for 0 < q ≤ 0.05, and f(q) = 40 - 400q for 0.05 < q < 0.10.
• A policyholder Phillip DeTanque is selected at random from the portfolio. During Year 1, Phil has one claim. During Year 2, Phil has no claim. During Year 3, Phil has one claim.
Use Buhlmann Credibility to estimate the probability that Phil will have a claim during Year 4.
A. 5.4% B. 5.8% C. 6.2% D. 6.6% E. 7.0%
Use the following information for the next three questions:
• Losses for individual policyholders follow a Compound Poisson Distribution.
• The prior distribution of the Poisson parameter λ is uniform on [2, 6].
• Severity is Gamma with parameters α = 3 and θ.
• The prior distribution of θ has density 62.5e^(-5/θ)/θ⁴, θ > 0.
• The distributions of λ and θ are independent.
10.11 (2 points) An individual policyholder has 2 claims this year. Using Buhlmann Credibility, what is the expected number of claims from that policyholder next year?
(A) 3.00 (B) 3.25 (C) 3.50 (D) 3.75 (E) 4.00
10.12 (3 points) An individual policyholder has 2 claims of sizes 4 and 7. Using Buhlmann Credibility, what is the expected size of the next claim from that policyholder?
(A) 5.5 (B) 6.0 (C) 6.5 (D) 7.0 (E) 7.5
10.13 (4 points) This year, an individual policyholder has 2 claims of sizes 4 and 7. Applying Buhlmann Credibility directly to the aggregate losses, what is the expected aggregate loss from that policyholder next year?
(A) 16 (B) 18 (C) 20 (D) 22 (E) 24
10.14 (4 points) Severity is LogNormal with parameters µ and σ = 1.2. µ varies across the portfolio via a Gamma Distribution with α = 8 and θ = 0.3. Determine the value of the Buhlmann Credibility Parameter K for severity.
A. 3 B. 4 C. 5 D. 6 E. 7
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 460
10.15 (3 points) Use the following information:
(i) Xi is the claim count observed for insured i for one year.
(ii) Xi has a negative binomial distribution with parameters r = 2 and βi.
(iii) The βiʼs have a distribution π[β] = 280β⁴/(1 + β)⁹, 0 < β < ∞.
An insured has 8 claims in one year. Using Buhlmann Credibility, what is that insuredʼs expected future annual claim frequency?
Hint: ∫[0, ∞] x^(c-1)/(1 + x)^(c+d) dx = Γ(c) Γ(d)/Γ(c + d), for c > 0, d > 0.
A. 4.8 B. 5.0 C. 5.2 D. 5.4 E. 5.6
Use the following information for a group of insureds for the next two questions:
• The amount of a claim is uniformly distributed, but will not exceed a certain unknown limit b. • The prior distribution of b is: π(b) = 3000/b4 , b > 10. • From an insured, three claims of sizes 17, 13, and 22 are observed in that order. 10.16 (3 points) Use Buhlmann Credibility to estimate the size of the next claim from this insured. A. 11 B. 12 C. 13 D. 14 E. 15 10.17 (3 points) Use Bayes Analysis to estimate the size of the next claim from this insured. A. 11 B. 12 C. 13 D. 14 E. 15
10.18 (3 points) You are given the following: • Claim sizes for a given policyholder follow an exponential distribution with density function f(x) = e-x/δ / δ , 0 < x < ∞. • The prior distribution of δ is Pareto with α = 2 and θ = 100. • The policyholder experiences a claim of size 60. Using Buhlmann Credibility, determine the expected size of the next claim from this policyholder. A. 60 B. 70 C. 80 D. 90 E. 100
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 461 10.19 (3 points) You are given: (i) Claim counts follow a Poisson Distribution with mean ζ/20. (ii) Claim sizes follow a Gamma Distribution with α = 5 and θ = 4ζ. (iii) Claim counts and claim sizes are independent, given ζ. (iv) The prior distribution has probability density function: π(ζ) = 7/ζ8, ζ > 1. For 100 exposures, calculate the Buhlmann Credibility for aggregate losses. (A) 44% (B) 47% (C) 50% (D) 53% (E) 56% 10.20 (2 points) You are given the following: • Severity is uniform from 0 to b. • b is distributed uniformly from 10 to 15. Calculate Bühlmannʼs K for severity. A. 25 B. 30 C. 35 D. 40
E. 45
10.21 (3 points) You are given the following: • Number of claims for a single insured follows a Poisson distribution with mean λ. • The amount of a single claim has a distribution with mean µ and coefficient of variation of 4. • µ and λ are independent random variables. • E[λ] = 2, Var[λ] = 3. • The distribution of µ has a coefficient of variation of 2. • Number of claims and claim severity distributions are independent. Calculate Bühlmannʼs K for aggregate losses. A. 3.5 B. 4.0 C. 4.5 D. 5.0 E. 5.5
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 462 Use the following information for the next two questions: • During year 5, claim sizes are LogNormal with µ = 6 and σ = 1. • Claim frequency per employee is Binomial with parameters m = 3 and q. • q is the same for the employees of a given employer. • The prior distribution of q between employers is uniform from 0.01 to 0.03. • Frequency and severity are independent. 10.22 (3 points) Determine the Buhlmann Credibility Parameter, K, for estimating pure premiums. (A) Less than 400 (B) At least 400, but less than 500 (C) At least 500, but less than 600 (D) At least 600, but less than 700 (E) At least 700 10.23 (2 points) You observe the following experience for a particular employer. Year Number of Employees Loss per Employee 1 100 21 2 120 28 3 140 27 You expect 130 employees for this employer in year 5. Inflation is 4% per year. Use Buhlmann Credibility to determine the expected losses from this employer in year 5. A. 4600 B. 4700 C. 4800 D. 4900 E. 5000
10.24 (3 points) You are given: (i) The number of claims in a year for a selected risk follows a Poisson distribution with mean λ. (ii) The severity of claims for the selected risk follows an exponential distribution with mean θ. (iii) The number of claims is independent of the severity of claims. (iv) The prior distribution of λ is exponential with mean 4. (v) The prior distribution of θ is Poisson with mean 7. (vi) A priori, λ and θ are independent. Using Bühlmannʼs credibility for aggregate losses, determine k. (A) 4/9 (B) 1/2 (C) 5/9 (D) 2/3 (E) 3/4
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 463
10.25 (4 points) You are given the following:
• The number of claims follows a distribution with mean λ and variance 2λ.
• Claim sizes follow a distribution with mean µ and variance 2µ².
• The number of claims and claim sizes are independent.
• λ and µ have a prior probability distribution with joint density function f(λ, µ) = 1.5λ, 0 < µ < 2λ, 0 < λ < 1.
Determine the value of Buhlmann's k for severity.
(A) 5 (B) 6 (C) 7 (D) 8 (E) 9
10.26 (2 points) You are given:
(i) The annual number of claims on a given policy has a geometric distribution with parameter β.
(ii) The prior distribution of β is Gamma with parameters α and θ.
Determine the Bühlmann credibility parameter, K.
(A) (α - 1)/θ (B) α/θ (C) (α + 1)/θ (D) α + 1/θ (E) None of A, B, C, or D.
10.27 (3 points) Use the following information:
• Claim sizes for a given policyholder follow a mixed exponential distribution with density function f(x) = 0.75λe^(-λx) + 0.5λe^(-2λx), 0 < x < ∞.
• The prior distribution of λ is uniform from 0.01 to 0.05.
• The policyholder experiences a claim of size 60.
Use Buhlmann Credibility to determine the expected size of the next claim from this policyholder.
A. 36 B. 37 C. 38 D. 39 E. 40
10.28 (4 points) You are given:
(i) Claim counts follow a Poisson Distribution with mean ζ/20.
(ii) Claim sizes follow a Gamma Distribution with α = 5 and θ = 4ζ.
(iii) Claim counts and claim sizes are independent, given ζ.
(iv) The prior distribution has probability density function: π(ζ) = 7/ζ⁸, ζ > 1.
Calculate the Buhlmann Credibility parameter, K, for estimating severity.
(A) 1 (B) 3 (C) 5 (D) 7 (E) 9
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 464
10.29 (3 points) You are given:
(i) The number of claims in a year for a selected risk follows a Poisson distribution with mean λ.
(ii) The severity of claims for the selected risk follows an exponential distribution with mean µ.
(iii) The number of claims is independent of the severity of claims.
(iv) The joint density of λ and µ is: 2,500,000 λ³ e^(−10λ) µ^(−4) e^(−10/µ) / 3, λ > 0, µ > 0.
Using Bühlmannʼs credibility for aggregate losses, determine k.
(A) 1 (B) 3 (C) 5 (D) 7 (E) 9
Use the following information for the next two questions:
(i) The number of claims for each policyholder has a Binomial Distribution with parameters m = 3 and q.
(ii) The prior distribution of q is f(q) = 1.5 - q, 0 < q < 1.
(iii) A randomly selected policyholder had the following claims experience:
Year    Number of Claims
1       1
2       0
3       2
4       1
5       0
(iv) ∫[0, 1] q^(a−1) (1 − q)^(b−1) dq = (a - 1)! (b - 1)! / (a + b - 1)!.
10.30 (4 points) Use Bayes Analysis to estimate this policyholderʼs frequency for year 6.
10.31 (3 points) Use Buhlmann Credibility to estimate this policyholderʼs frequency for year 6.
10.32 (5 points) The distribution of aggregate losses is LogNormal with parameters 5 and σ. σ2 varies across the portfolio via an Inverse Gaussian Distribution with µ = 0.6 and θ = 2. Determine the value of the Buhlmann Credibility Parameter for aggregate losses. Hint: Use the moment generating function of the Inverse Gaussian Distribution. A. 20 B. 30 C. 40 D. 50 E. 60
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 465
10.33 (3 points) You are given the following:
• The amount of an individual claim has an Inverse Gamma distribution with shape parameter α = 6 and scale parameter θ.
• The parameter θ is distributed via an Exponential Distribution.
Calculate Bühlmannʼs k for severity.
A. 1/4 B. 1/2 C. 1 D. 3/2 E. 5/2
10.34 (4 points) You are given:
(i) The number of claims in a year for a selected risk follows a Poisson distribution with mean λ.
(ii) The severity of claims for the selected risk follows an exponential distribution with mean µ.
(iii) The number of claims is independent of the severity of claims.
(iv) The joint density of λ and µ is: 0.3 µ e^(−0.1µ)/(1 + 10λ)⁴, λ > 0, µ > 0.
Using Bühlmannʼs credibility for aggregate losses, determine k.
(A) 10 (B) 12 (C) 14 (D) 16 (E) 18
10.35 (2 points) You are given:
(i) Given Θ = θ, claim sizes follow a Pareto distribution with parameters α = 4 and θ.
(ii) The prior distribution of the parameter Θ is uniform from 10 to 50.
Six claims are observed. Determine Bühlmannʼs credibility to be used in order to estimate the size of the seventh claim.
A. less than 10% B. at least 10% but less than 15% C. at least 15% but less than 20% D. at least 20% but less than 25% E. at least 25%
10.36 (3 points) Severity is Gamma with parameters that vary continuously: π(α, θ) = α²θ³/23.75, 2 < α < 3, 1 < θ < 2.
Determine the Buhlmann credibility parameter.
A. 5 B. 7 C. 9 D. 11 E. 13
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 466
10.37 (5 points) Severity is LogNormal with parameters u and 2. u varies across the portfolio via a Normal Distribution with µ = 5 and σ² = 3.
A policyholder submits 20 claims with an average size of 10,000. Use Buhlmann Credibility to predict the size of the next claim from this policyholder.
A. less than 6500 B. at least 6500 but less than 7000 C. at least 7000 but less than 7500 D. at least 7500 but less than 8000 E. at least 8000
10.38 (4 points) Severity is Inverse Gamma with parameters that vary continuously. α and θ vary independently.
α is uniformly distributed from 3 to 5. θ is distributed via: π(θ) = 3000/θ⁴, θ > 10.
Determine the Buhlmann credibility to be assigned to 4 claims.
Hint: ∫ 1/{(x − 2)(x − 1)²} dx = 1/(x − 1) + ln[x - 2] - ln[x - 1].
B. 60%
C. 65%
D. 70%
E. 75%
10.39 (2 points) The size of loss has a density: f(x | θ) = 3θ3 / x4 , x > θ. The prior distribution of θ is uniform from 10 to 15. We observe 8 claims for a total of 200 from an insured. Use Buhlmann Credibility to estimate the size of the next claim from that insured. A. 19.50 B. 19.75 C. 20.00 D. 20.25 E. 20.50 10.40 (3 points) You are given: (i) The claim count observed for an individual driver for one year has a geometric distribution with parameter β. (ii) β varies across a group of drivers. (iii) 10β follows a zero-truncated poisson distribution with λ = 2. Determine the Buhlmann credibility factor for an individual driver for five years. (A) Less than 0.05 (B) At least 0.05, but less than 0.10 (C) At least 0.10, but less than 0.15 (D) At least 0.15, but less than 0.20 (E) At least 0.20
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 467 10.41 (4 points) You are given: (i) Severity follows a LogNormal Distribution with σ = 2. (ii) µ varies across a group of drivers. (iii) µ-5 follows a Negative Binomial distribution with r = 6 and β = 0.05. Determine the Buhlmann credibility parameter K. A. 10 B. 20 C. 30 D. 50
E. 75
10.42 (3 points) For group medical insurance, you have the following three years of experience from a particular insured group: Year Number of members Number of Claims Average Loss Per Member 1 10 25 2143 2 15 40 2551 3 20 45 2260 There will be 25 members in year 4. The number of claims per member in any year follows a Binomial distribution with parameters m = 10 and q. q is the same for all members in a group, but varies between groups. q is distributed uniformly over the interval (0.20, 0.40). Claim severity follows a Gamma distribution with parameters α = 5, θ = 200. Calculate the Buhlmann-Straub estimate of aggregate losses in year 4. (A) 60,000 (B) 61,000 (C) 62,000 (D) 64,000 (E) 66,000
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 468 Use the following information for the next four questions: There exists a set of risks, each of which can have at most one accident during each year. The hypothetical mean frequencies vary among the individual risks and is a priori distributed with equal probability in the interval between 0.07 and 0.13. The severity and frequency distributions are independent. There are two types of risks, each with a different severity distribution: Risk Type 5 units 10 units 20 units type 1: 1/3 1/2 1/6 type 2: 1/2 1/4 1/4 60% of the risks are type 1, and 40% are type 2. 10.43 (4, 11/82, Q.46A) (3 points) What is the variance of the hypothetical mean pure premiums? A. less than 0.02 B. at least 0.02 but less than 0.03 C. at least 0.03 but less than 0.04 D. at least 0.04 but less than 0.05 E. at least 0.05 10.44 (4, 11/82, Q.46B) (4 points) What is the expected value of the process variance? A. less than 11.6 B. at least 11.6 but less than 11.7 C. at least 11.7 but less than 11.8 D. at least 11.8 but less than 11.9 E. at least 11.9 10.45 (4, 11/82, Q.46C) (1 point) Find K, the Buhlmann Credibility Parameter. A. less than 350 B. at least 350 but less than 400 C. at least 400 but less than 450 D. at least 450 but less than 500 E. at least 500 10.46 (1 point) A risk is chosen at random. You observe a total of 45 units of losses over 15 years. Use Buhlmann Credibility to estimate the future pure premium for that same risk. A. less than 1.1 B. at least 1.1 but less than 1.2 C. at least 1.2 but less than 1.3 D. at least 1.3 but less than 1.4 E. at least 1.4
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 469 10.47 (4B, 11/97, Q.10) (3 points) You are given the following: • In a large portfolio of automobile risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean m/100,000, where m is the number of miles driven each and every year by the policyholder. • The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. The number of claims for one policyholder is independent of the number of claims for any other policyholder. • The distribution of m within the portfolio has density function m/ 100,000,000 , 0 < m ≤ 10,000 ⎧ f(m) = ⎨ . ⎩(20,000 - m)/ 100,000,000, 10,000 < m < 20,000 A policyholder is selected at random from the portfolio. During Year 1, one claim is observed for this policyholder. During Year 2, no claims are observed for this policyholder. No information is available regarding the number of claims observed during Years 3 and 4. Hint: Use a change of variable such as q = m/100,000. Determine the Buhlmann credibility estimate of the expected number of claims for the selected policyholder during Year 5. A. 3/31 B. 1/10 C. 7/62 D. 63/550 E. 73/570 10.48 (4B, 11/98, Q.19) (2 points) You are given the following: •
Claim sizes follow a gamma distribution, with parameters α and θ = 1/2 .
•
The prior distribution of α is assumed to be uniform on the interval (0, 4).
Determine the value of Buhlmann's k for estimating the expected value of a claim. A. 2/3 B. 1 C. 4/3 D. 3/2 E. 2 10.49 (4B, 5/99, Q.13) (3 points) You are given the following: •
The number of claims follows a distribution with mean λ and variance 2λ.
•
Claim sizes follow a distribution with mean µ and variance 2µ2.
•
The number of claims and claim sizes are independent.
•
λ and µ have a prior probability distribution with joint density function f(λ, µ) = 1, 0 < λ < 1, 0 < µ < 1.
Determine the value of Buhlmann's k for aggregate losses. A. Less than 3 B. At least 3, but less than 6 C. At least 6, but less than 9 D. At least 9, but less than 12 E. At least 12 10.50 (2 points) In the previous question, determine the value of Buhlmann's k for severity. (A) 5 (B) 6 (C) 7 (D) 8 (E) 9
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 470 10.51 (4B, 11/99, Q.20) (3 points) You are given the following: • The number of claims follows a Poisson distribution with mean λ . • Claim sizes follow a distribution with density function f(x) = e-x/λ / λ, 0 < x < ∞. • The number of claims and claim sizes are independent. • The prior distribution of λ has density function g(λ) = e−λ, 0 < λ < ∞. Determine the value of Buhlmann's k for aggregate losses. ∞
Hint: ∫ λ n e- λ dλ = n! 0
A. 0
B. 3/5
C. 1
D. 2
E. ∞
10.52 (4, 5/00, Q.37) (2.5 points) You are given: (i) Xi is the claim count observed for driver i for one year. (ii) Xi has a negative binomial distribution with parameters β = 0.5 and ri. (iii) µi is the expected claim count for driver i for one year. (iv) The µiʼs have an exponential distribution with mean 0.2. Determine the Buhlmann credibility factor for an individual driver for one year. (A) Less than 0.05 (B) At least 0.05, but less than 0.10 (C) At least 0.10, but less than 0.15 (D) At least 0.15, but less than 0.20 (E) At least 0.20
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 471 10.53 (4, 11/01, Q.18 & 2009 Sample Q.67) (2.5 points) You are given the following information about a book of business comprised of 100 insureds: Ni
(i) Xi =
∑Yij is a random variable representing the annual loss of the ith insured. j=1
(ii) N1 , N2 , ..., N100 are independent random variables distributed according to a negative binomial distribution with parameters r (unknown) and β = 0.2. (iii) Unknown parameter r has an exponential distribution with mean 2. (iv) Yi1, Yi2 , ..., YiNi are independent random variables distributed according to a Pareto distribution with α = 3 and θ = 1000. Determine the Bühlmann credibility factor, Z, for the book of business. (A) 0.000 (B) 0.045 (C) 0.500 (D) 0.826 (E) 0.905 10.54 (2 points) In the previous question, 4, 11/01, Q.18, change bullet iii: (iii) The prior distribution of r is discrete: Prob[r = 1] = 1/3, Prob[r = 2] = 1/3, and Prob[r = 3] = 1/3. Determine the Bühlmann credibility factor, Z, for the book of business. (A) 0.50 (B) 0.60 (C) 0.70 (D) 0.80 (E) 0.90
10.55 (4, 11/02, Q.18 & 2009 Sample Q. 41) (2.5 points) You are given: (i) Annual claim frequency for an individual policyholder has mean λ and variance σ2. (ii) The prior distribution for λ is uniform on the interval [0.5, 1.5]. (iii) The prior distribution for σ2 is exponential with mean 1.25. A policyholder is selected at random and observed to have no claims in Year 1. Using Bühlmann credibility, estimate the number of claims in Year 2 for the selected policyholder. (A) 0.56 (B) 0.65 (C) 0.71 (D) 0.83 (E) 0.94 10.56 (2 points) You are given: (i) Annual claim frequency for an individual policyholder has mean λ and variance σ2. (ii) The prior distribution for λ has a 50% chance of 0.5 and 50% chance of 1.5. (iii) The prior distribution for σ2 has a 75% chance of 1 and a 25% chance of 2. (iv) The prior distribution for λ and σ2 are independent. A policyholder is selected at random and observed to have no claims in Year 1. Using Bühlmann credibility, estimate the number of claims in Year 2 for the selected policyholder. (A) 0.56 (B) 0.65 (C) 0.71 (D) 0.83 (E) 0.94
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 472 10.57 (4, 11/03, Q.11 & 2009 Sample Q.8) (2.5 points) You are given: (i) Claim counts follow a Poisson distribution with mean θ. (ii) Claim sizes follow an exponential distribution with mean 10θ. (iii) Claim counts and claim sizes are independent, given θ. (iv) The prior distribution has probability density function: π(θ) = 5/θ6, θ > 1. Calculate Bühlmannʼs k for aggregate losses. (A) Less than 1 (B) At least 1, but less than 2 (C) At least 2, but less than 3 (D) At least 3, but less than 4 (E) At least 4 10.58 (2 points) In the previous question, 4, 11/03, Q.11, change bullet iv: (iv) The prior distribution of θ is discrete: Prob[θ = 1] = 70%, Prob[θ = 2] = 30%. Calculate Bühlmannʼs k for aggregate losses. (A) Less than 1 (B) At least 1, but less than 2 (C) At least 2, but less than 3 (D) At least 3, but less than 4 (E) At least 4
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 473 10.59 (4, 11/04, Q.29 & 2009 Sample Q.154) (2.5 points) You are given: (i) Claim counts follow a Poisson distribution with mean λ. (ii) Claim sizes follow a lognormal distribution with parameters µ and σ. (iii) Claim counts and claim sizes are independent. (iv) The prior distribution has joint probability density function: f(λ, µ, σ) = 2σ, 0 < λ < 1, 0 < µ < 1, 0 < σ < 1. Calculate Bühlmannʼs k for aggregate losses. (A) Less than 2 (B) At least 2, but less than 4 (C) At least 4, but less than 6 (D) At least 6, but less than 8 (E) At least 8 10.60 (3 points) In the previous question, change bullet iv: (iv) The prior joint distribution of λ, µ, and σ is discrete: Prob[λ = 1/4, µ = 3/4, σ = 1/2] = 30%. Prob[λ = 3/4, µ = 1/2, σ = 1/4] = 20%. Prob[λ = 1/2, µ = 1/4, σ = 3/4] = 50%. Calculate Bühlmannʼs k for aggregate losses. (A) Less than 10 (B) At least 10, but less than 20 (C) At least 20, but less than 30 (D) At least 30, but less than 40 (E) At least 40 10.61 (4, 5/05, Q.11 & 2009 Sample Q.181) (2.9 points) You are given: (i) The number of claims in a year for a selected risk follows a Poisson distribution with mean λ. (ii) The severity of claims for the selected risk follows an exponential distribution with mean θ. (iii) The number of claims is independent of the severity of claims. (iv) The prior distribution of λ is exponential with mean 1. (v) The prior distribution of θ is Poisson with mean 1. (vi) A priori, λ and θ are independent. Using Bühlmannʼs credibility for aggregate losses, determine k. (A) 1 (B) 4/3 (C) 2 (D) 3 (E) 4
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 474 10.62 (4, 5/05, Q.17 & 2009 Sample Q.187) (2.9 points) You are given: (i) The annual number of claims on a given policy has a geometric distribution with parameter β. (ii) The prior distribution of β has the Pareto density function π(β) = α/(β + 1)(α+1), 0 < β < ∞, where α is a known constant greater than 2. A randomly selected policy had x claims in Year 1. Determine the Bühlmann credibility estimate of the number of claims for the selected policy in Year 2. 1 (α − 1) x 1 x+1 x +1 (A) (B) + (C) x (D) (E) α (α − 1) α α −1 α α −1 10.63 (4, 11/05, Q.7 & 2009 Sample Q.219) (2.9 points) For a portfolio of policies, you are given: (i) The annual claim amount on a policy has probability density function: f(x | θ) = 2x/θ2, 0 < x < θ. (ii) The prior distribution of θ has density function: π(θ) = 4θ3, 0 < θ < 1. (iii) A randomly selected policy had claim amount 0.1 in Year 1. Determine the Bühlmann credibility estimate of the claim amount for the selected policy in Year 2. (A) 0.43 (B) 0.45 (C) 0.50 (D) 0.53 (E) 0.56
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 475 Solutions to Problems: 10.1. B. The process variance for a Gamma is αθ2. Thus EPV = E[αθ2] = E[100α] = 100E[α] = (100)((4.5+2.5)/2) = 350. The mean of a Gamma is αθ. Thus VHM = Var[αθ] = Var[10α] = 100Var[α] = (100){(4.5-2.5)2 /12} = 33.33. K = EPV/VHM = 350/33.33 = 10.5. Comment: I have used the fact that the variance of a uniform distribution on [a,b] is: (b-a)2 /12. Note that the value of the scale parameter, θ, drops out of the calculation of the Buhlmann Credibility Parameter, K. 10.2. E. From the previous solution K = 10.5. Thus Z = 2/(2+10.5) = 16%. The observation is: (15+31)/2 = 23. The a priori mean is: E[αθ] = E[10α] = 10E[α] = (10)((4.5+2.5)/2) = 35. Thus the new estimate is: (23)(16%) + (35)(84%) = 33.1. 10.3. B. E[r] = mean of the Gamma = αθ = (4)(5) = 20. Var[r] = variance of the Gamma = αθ2 = (4)(52 ) = 100. The mean of each Negative Binomial Distribution is: rβ = 0.3r. The process variance of each Negative Binomial is: rβ(1+β) = (0.3)(1.3)r = 0.39r. EPV = E[0.39r] = 0.39E[r] = (0.39)(20) = 7.8. VHM = Var[0.3r] = 0.32 Var[r] = (0.09)(100) = 9. K = EPV/VHM = 7.8 /9 = 0.867. For one driver, for one year, Z = 1/(1+0.867) = 53.6%. Comment: Similar to 4, 5/00, Q.37. 10.4. A. E[β] = mean of the Exponential = θ = 0.7. Var[β] = variance of the Exponential = θ2 = 0.72 = 0.49. E[β2] = second moment of the Exponential = 2θ2 = 2(0.72 ) = 0.98. The mean of each Negative Binomial Distribution is: rβ = 3β. The process variance of each Negative Binomial is: rβ(1+β) = 3β + 3β2. EPV = E[3β + 3β2] = 3E[β] + 3E[β2] = (3)(0.7) + (3)(0.98) = 5.04. VHM = Var[3β] = 32 Var[β] = (9)(0.49) = 4.41. K = EPV/VHM = 5.04 /4.41 = 1.143. For one driver, for one year, Z = 1/(1+1.143) = 46.7%.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 476 10.5. B. Given λ the process variance is 1/λ2. .10
.10
EPV = E[1/λ2] = ∫(1/.08) 1/λ2 dλ = 500. E[1/λ] = ∫(1/.08) 1/λ dλ = 20.12. .02
.02
Given λ the mean is 1/λ. Prior Mean = E[1/λ] = 20.12. VHM = E[1/λ2] - E[1/λ]2 = 500 - 20.122 = 95.2. K = EPV/VHM = 500/95.2 = 5.25. Z = 1/(1 + K) = .160. Estimated future severity = (.160)(60) + (.840)(20.12) = 26.5. 10.6. C. Process Variance for a single trial is: q(1- q) = q - q2 . EPV = E[q] - E[q2 ] = 1/2 - 1/3 = 1/6. The hypothetical mean for one trial is q. VHM = variance of a uniform from [0 , 1] = 1/12. K = EPV/ VHM = 2. Z = 6/(6 + K) = 75%. Estimated future frequency per trial is: (75%)(2/6) + (25%)(1/2) = 0.375. Comment: A Beta-Bernoulli conjugate prior situation. The uniform distribution is a Beta distribution with a = 1 and b = 1. K = a + b = 2. See “Mahlerʼs Guide to Conjugate Priors.” 10.7. E. The Single Parameter Pareto Distribution has mean: (6)(40)/(6 -1) = 48, second moment: (6)(402 )/(6 - 2) = 2400, and variance: 2400 - 482 = 96. b
b
E[X | b] = ∫ x 3x2 /b3 dx = 3b/4. E[X2 | b] = ∫ x2 3x2 /b3 dx = 3b2 /5. 0
0
Process Variance given b is: 3b2 /5 - (3b/4)2 = 3b2 /80. EPV = E [3b2 /80] = (3/80)(2nd moment of the Single Parameter Pareto Distribution) = (3/80)(2400) = 90. VHM = Var[3b/4] = (9/16)Var[b] = (9/16)(96) = 54. K = EPV/VHM = 90/54 = 5/3. We observe two claims, so Z = 2/(2 + K) = 54.5%. Prior mean = E[E[X | b]] = E[3b/4] = (3/4)E[b] = (3/4)(48) = 36. Observed mean = (30 + 60)/2 = 45. Estimated future severity = (54.5%)(45) + (45.5%)(36) = 40.9. 10.8. C. Process variance given r is: r(1.6)(2.6) = 4.16 r. EPV= E[4.16r] = 4.16 E[r] = (4.16)(.8) = 3.328. Mean given r is: 1.6 r. Prior mean = E[1.6r] = (1.6)(.8) = 1.28. VHM = Var[1.6 r] = 2.56Var[r] = 2.56(.82 ) = 1.6384. K = EPV/VHM = 3.328/1.6384 = 2.03. Z = 1/(1 + K) = 33.0%. Estimated future frequency = (.330)(2) + (.670)(1.28) = 1.52. Comment: Similar to 4, 5/00, Q.37.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 477 10.9. C. The Pareto Severity has mean 1000/4 = 250, second moment (2)(10002 )/((4)(3)) = 166,667, and variance 166667 - 2502 = 104,167. Given r, the process variance of aggregate loss is: (1.6 r)(104,167) + (2502 )(4.16 r) = 426667r. EPV = E[426667r] = (426667)(.8) = 341,334. Given r, the mean aggregate loss is: (1.6r)(250) = 400r. VHM = 4002 Var[r] = (160000)(.82 ) = 102400. K = EPV/VHM = 341,334/102,400 = 3.33. Observe one year, Z = 1/(1 + K) = 23.1%. A priori mean aggregate loss is: E[400r] = (400)(.8) = 320. Estimated future annual aggregate loss = (.231)(2300) + (.769)(320) = 777. Comment: Similar to 4, 11/01, Q.18. Note that the mean severity times the answer to the previous question: (250)(1.52) = 380 is not equal to the solution to this question. Since the severity distribution does not vary by insured, the former estimate has something to recommend it. .05
.10
.10
10.10. D. E[q] = ∫q f(q)dq = 400∫q2 dq + 40∫q dq - 400∫q2 dq = .05. 0
.05
.05
.10
.05 .10
E[q2 ] = ∫q2 f(q)dq = 400∫q3 dq + 40∫q2 dq - 400∫q3 dq = .002917. 0
.05
.05
Process Variance for a single year is: q(1- q) = q - q2 . EPV = E[q] - E[q2 ] = .05 - .002917 = .0471. The hypothetical mean for one trial is q. VHM = variance of the distribution of q = E[q2 ] - E[q]2 = .002917 - .052 = .000417. K = EPV/ VHM = .0471/.000417 = 113. Z = 3/(3+K) = 2.6%. Estimated future frequency per trial is: (2.6%)(2/3) + (97.4%)(.05) = 6.6%. 10.11. C. Since we are mixing Poissons, EPV of Frequency = mean frequency = 4. VHM frequencies = Variance of uniform distribution on [2, 6] = (6 - 2)2 /12 = 4/3. K = EPV/ VHM = 4/(4/3) = 3. Z = 1/(1 + K) = 1/4. Estimated future frequency = (1/4)(2) + (3/4)(4) = 3.5.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 478 10.12. B. The distribution of θ is Inverse Gamma with α = 3 and scale parameter 5, with mean 5/2, and second moment 25/2. Given θ, the process variance of the severity is: αθ2 = 3θ2. EPV = E[3θ2] = 3 E[θ2] = (3)(25/2) = 37.5. Given θ, the mean aggregate loss is: αθ = 3θ. First moment of the hypothetical means is: E[3θ] = 3E[θ] = (3)(5/2) = 7.5. Second moment of the hypothetical means is: E[9θ2] = 9E[θ2] = (9)(25/2) = 112.5. VHM = 112.5 - 7.52 = 56.25. K = EPV/VHM = 37.5/56.25 = 2/3. We observe 2 claims, Z = 2/(2 + K) = 75.0%. Estimated future severity is: (75.0%)(11/2) + (25.0%)(7.5) = 6.0. 10.13. B. The distribution of λ has mean 4, variance 4/3, and second moment 17.333. The distribution of θ is Inverse Gamma with α = 3 and scale parameter 5, with mean 5/2, and second moment 25/2. Given λ and θ, the process variance of the aggregate loss is: λ (second moment of the Gamma severity) = λα(α+1)θ2 = 12λθ2. EPV = E[12λθ2] = 12 E[λ]E[θ2] = (12)(4)(25/2) = 600. Given λ and θ, the mean aggregate loss is: λαθ = 3λθ. First moment of the hypothetical means is: E[3λθ] = 3 E[λ] E[θ] = (3)(4)(5/2) = 30. Second moment of the hypothetical means is: E[9λ2θ2] = 9E[λ2] E[θ2] = (9)(17.333)(25/2) = 1950. VHM = 1950 - 302 = 1050. K = EPV/VHM = 600/1050 = .57. We observe one year, Z = 1/(1 + K) = 63.7%. Estimated future aggregate loss is: (63.7%)(11) + (36.3%)(30) = 17.9. Comment: Note that the product of the separate estimates of frequency and severity: (3.5)(6) = 21, is not equal to the estimate of aggregate losses 17.9.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 479 10.14. B. The moment generating function is defined as: M(t) = E[ext]. For the Gamma Distribution, M(t) = (1 - θt)−α, t < 1/θ. Therefore, E[eµ] = M(1) = (1 - 0.3)-8 = 17.3467. E[e2µ] = M(2) = (1 - 0.6)-8 = 1525.88. For the LogNormal Distribution, E[X] = exp[µ + σ2/2] = 2.0544 eµ. Thus, the first moment of the hypothetical means is: 2.0544 E[eµ] = (2.0544)(17.3467) = 35.637. Second moment of the hypothetical means is: 2.05442 E[e2µ] = (2.05442 )(1525.88) = 6440.1. VHM = 6440.1 - 35.6372 = 5170. For the LogNormal, E[X2 ] = exp[2µ + 2σ2] = 17.8143 e2µ. Process Variance = 17.8143 e2µ - (2.0544 eµ)2 = 13.5937 e2µ. EPV = 13.5937 E[e2µ] = (13.5937)(1525.88) = 20,742. K = EPV/VHM = 20,742/5170 = 4.0. Comment: 1/θ = 1/0.3 = 3.333, so that M(2) exists. 10.15. C. Given β, the process variance is: 2β(1+ β). ∞
E[β] = 280 ∫β5/(1 + β)9 dβ = (280)Γ(6)Γ(3)/Γ(9) = 280(5!)(2!)/(8!) = 1.667. 0 ∞
E[β2] = 280 ∫β6/(1 + β)9 dβ = (280)Γ(7)Γ(2)/Γ(9) = 280(6!)(1!)/(8!) = 5. 0
EPV = E[2β(1+ β)] = (2){E[β] + E[β2]} = (2)(1.667 + 5) = 13.33. Given β, the mean is: 2β. A priori mean is: 2E[β] = (2)(1.667) = 3.33. VHM = Var[2β] = 4 Var[β] = (4){E[β2] - E[β]2 } = (4)(5 - 1.6672 ) = (4)(2.22) = 8.88. K = EPV/VHM = 13.33 /8.88 = 1.5. Z = 1/(1 + K) = .4. Estimated future frequency = (.4)(8) + (.6)(3.33) = 5.2. Comment: If for fixed r, 1/(1+β) of the Negative Binomial is distributed over a portfolio by a Beta, then the posterior distribution of 1/(1+β) parameters is also given by a Beta. Thus the Beta distribution is a conjugate prior to the Negative Binomial Distribution for fixed r.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 480 10.16. D. The prior distribution of b is a Single Parameter Pareto with θ = 10 and α = 3, with first moment αθ/(α-1) = 15, second moment αθ2/(α-2) = 300, and variance 300 - 152 = 75. Given b, the losses are uniform on [0, b], with process variance b2 /12. EPV = E[b2 /12] = E[b2 ]/12 = 300/12 = 25. Given b, the hypothetical mean is b/2. VHM = Var[b/2] = Var[b]/4 = 75/4 = 18.75. K = EPV/VHM = 25/18.75 = 4/3. Since we observe 3 claims, Z = 3/(3 + K) = 69.2%. Prior mean = E[b/2] = E[b]/2 = 15/2 = 7.5. Observed mean = (17 + 13 + 22)/3 = 17.33. Estimated future severity = (69.2%)(17.33) + (30.8%)(7.5) = 14.3.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 481 10.17. C. Severity is uniform on [0, b]. If for example, b = 22.01, the chance of a claim of size 22 is 1/22.01. If for example, b = 21.99, the chance of a claim of size 22 is 0. For b ≥ 22, Prob[observation] = 6 f(17) f(13) f(22) = (6)(1/b)(1/b)(1/b) = 6/b3 . For b < 22, Prob[observation] = 6 f(17) f(13) f(22) = = 6 f(17) f(14) 0 = 0. π(b) = 3000/b4 b > 10. ∞
22
∞
∫10 π(b) Prob[observation | b] db = 10∫ π(b) 0 db + 22∫
3000 6 db = 3000 / 226 . b4 b3
By Bayes Theorem, the posterior distribution of ω is: π(b) Prob[observation | b ] 18,000 / b7 226 = = 6 , b ≥ 22. 3000 / 226 3000 / 226 b7 (Recall that if ω < 22, Prob[observation] = 0.) The mean of the uniform from 0 to b is b/2. Thus, the expected value of the next claim from the same insured is: ∞
∫
22
6
226 b db = (3)(226 ) / {(5)(225 )} = 13.2. b7 2
Comment: The Single Parameter Pareto is a Conjugate Prior to the uniform Iikelihood. In general, αʼ = α + n, and θʼ = Max[x1 , ... xn , θ]. In this case, the posterior distribution of b is Single Parameter Pareto with: α = 3 + 3 = 6, and θ = Max[10, 17, 13, 22] = 22. The mean of the uniform from 0 to b is b/2. Thus the estimate from Bayes Analysis is the posterior expected value of b/2: (Mean of the posterior Single Parameter Pareto) / 2 = {(6)(22)/(6-1)} / 2 = 13.2. Since the Single Parameter Pareto is not a member of a linear exponential family, it does not follow that Buhlmann Credibility is equal to Bayes Analysis; in this case they differ.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 482 10.18. C. Var[X | δ] = δ2. EPV = E[Var[X | δ]] = E[δ2] = second moment of the Pareto = 2θ2/{(α-1)(α-2)}. E[X | δ] = δ. VHM = Var[E[X | δ]] = Var[δ] = variance of the Pareto = 2θ2/{(α-1)(α-2)} - θ2/(α-1)2 = αθ2/{(α-1)2(α-2)}. However, for α = 2, the 2nd moment and variance of the Pareto do not exist. Nevertheless, as α → 2, K = EPV/VHM = 2(α-1)/α → 1. If one takes K = 1, then Z = 1/(1 + K) = 1/2. Prior mean = mean of Pareto = 100/(2-1) =100. Estimated severity = (1/2)(60) + (1/2)(100) = 80. Comment: Beyond what you are likely to be asked on the exam! EPV → ∞ and VHM → ∞. 10.19. B. The second moment of the Gamma Distribution is: (6)(5)(4ζ)2 = 480ζ2. For a Poisson frequency, the process variance is: λ(second moment of severity) =(ζ/20) (480ζ2) = 24ζ3. ∞
EPV =
∞
∫1 (PV given ζ) π(ζ) dζ = ∫1 24 ζ
3
7 / ζ8 dζ =
ζ= ∞ -42 / ζ4
]
= 42.
ζ=1
The mean aggregate loss given ζ is: (ζ/20)(5)(4ζ) = ζ2. ∞
Overall Mean =
∫1 ζ
∞ 2
π(ζ) dζ =
∫1 ζ
2
7 / ζ8
dζ =
ζ =∞ -(7 / 5) / ζ5
]
∞
2nd moment of the hypothetical means =
= 7/5.
ζ =1
∫1 (ζ )
22
∞
π(ζ) dζ =
∫1 ζ
4
7 / ζ8
dζ =
VHM = 7/3 - (7/5)2 = .3733. K = EPV/VHM = 42/.3733 = 112.5. For 100 exposures, Z = 100/(100 + K) = 47.1%. Comment: Similar to 4, 11/03, Q.11. 10.20. A. Process Variance = b2 /12. EPV = E[b2 /12] = E[b2 ]/12 = (52 /12 + 12.52 )/12 = 13.19. Hypothetical Mean = b/2. VHM = Var[b/2] = Var[b]/4 = (52 /12)/4 = .521. K = EPV/VHM = 13.19/.521 = 25.3.
ζ =∞ -(7 / 3) / ζ3
]
ζ =1
= 7/3.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 483 10.21. E. Let E[µ] = c. Then Var[µ] = 22 E[µ]2 = 4c2 . E[µ2] = 5c2 . Severity has mean µ, standard deviation 4µ, and 2nd moment: (4µ)2 + µ2 = 17µ2. Process Variance = λ (2nd moment of severity) = 17λµ2. EPV = E[17λµ2] = 17E[λ]E[µ2] = (17)(2)(5c2 ) = 170c2 . Hypothetical Mean = λµ. VHM = Var[λµ] = E[(λµ)2 ] − E[λµ]2 = E[λ2]E[µ2] − E[λ]2 E[µ]2 = (3 + 22 )(5c2 ) - 22 c2 = 31c2 . K = EPV/VHM = 170/31 = 5.48. 10.22. C. The mean severity is: exp[6 + 12 /2] = e6.5 = 665.142. The second moment of severity is: exp[(2)(6) + (2)(12 )] = e14 = 1,202,604. The variance of severity is: 1,202,604 - 665.1422 = 760,191. E[q] = .02. Var[q] = (.03 - .01)2 /12 = 0.0003333. E[q2 ] = Var[q] + E[q]2 = 0.00043333. For a given value of q, The process variance of pure premium is: 3q760,191 + 3q(1-q)665.1422 = 3,607,812q - 1,327,240q2 . EPV = 3,607,812E[q] - 1,327,240E[q2 ] = 3,607,812(0.02) - 1,327,240(0.00043333) = 71,581. Hypothetical mean = (3q)(665.142) = 1995.4q. Overall mean is: (1995.4)(.02) = 39.91. VHM = 1995.42 Var[q] = (1995.42 )(0.00003333) = 132.7. K = EPV/VHM = 71,581/132.7 = 539. 10.23. A. The total number of employee-years is: 100 + 120 + 140 = 360. Z = 360/(360 + 539) = 40.0%. The inflated losses are: (100)(21)(1.044 ) + (120)(28)(1.043 ) + (140)(27)(1.042 ) = 10,324.69. Observed pure premium is: 10,324.69/360 = 28.68, brought to the year 5 level. From the previous solution, the a priori mean pure premium on the year 5 level is: 39.91. Estimated future pure premium on the year 5 level is: (.4)(28.68) + (1 - .4)(39.91) = 35.42. For 130 employees, the expected losses are: (130)(35.42) = 4605. Comment: In year 4, one has data from years 1 to 3 and is predicting the losses for year 5. 10.24. A. The mean aggregate loss is: λθ. The variance of aggregate loss is: λ(2nd moment of severity) = λ2θ2. EPV = E[λ2θ2] = 2 E[λ] E[θ2] = (2)(4)(7 + 72 ) = 448. First moment of the hypothetical means: E[λθ] = E[λ]E[θ] = (4)(7) = 28. Second moment of the hypothetical means: E[(λθ)2 ] = E[λ2]E[θ2] = {(2)42 }(7 + 72 ) = 1792. VHM = 1792 - 282 = 1008. K = EPV/VHM = 448/1008 = 4/9. Comment: Similar to 4, 5/05, Q.11.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 484 10.25. C. The process variance for severity is given as 2µ2. Since for the joint distribution λ and µ are not independent, one has to weight together the process variances of the severities for the individual types using the chance that a claim came from each type. The chance that a claim came from an individual of a given type is proportional to the product of the a priori chance of an insured being of that type and the mean frequency for that type. This is similar to picking a die and spinner together. In this case, these weights are: f(λ, µ) λ = 1.5 λ2, 0 < µ < 2λ < 1. 1 2λ
1
∫ ∫1.5 λ2 dµ dλ = ∫3λ3 dλ = 3/4. 0 0
0 1
2λ
1
2λ
1
∫1.5λ2 ∫2µ2 dµ dλ / ∫1.5λ2 ∫ dµ dλ= ∫1.5 λ2 16λ3/3 dλ / (3/4) = (8/6)(4/3) = 16/9.
EPV = 0
0
0
0
0
The hypothetical mean severity is given as µ. 1
Overall Mean severity =
2λ
1
2λ
1
∫1.5λ2 ∫µ dµ dλ / ∫1.5λ2 ∫ dµ dλ= ∫1.5 λ2 2λ2 dλ / (3/4) = 0
0
0
0
0
1
2λ
1
(3/5)(4/3) = 4/5. 2λ
∫1.5λ2 ∫µ2 dµ dλ / ∫1.5λ2 ∫ dµ dλ = 8/9.
2nd moment of the hypothetical means = 0
0
0
0
VHM = 8/9 - (4/5)2 = 56/225. K = EPV/VHM = (16/9)/(56/225) = 450/63 = 7.14. Comment: Beyond what you are likely to be asked on your exam! The setup of 4B, 5/99, Q.13 has been altered in the final bullet from an independent to a dependent distribution. 10.26. E. EPV = E[β(1+β)] = E[β] + E[β2] = 1st Moment of Gamma + 2nd Moment of Gamma = αθ + α(α+1)θ2. VHM = Var[β] = variance of the Gamma = αθ2. K = EPV/VHM = {αθ + α(α+1)θ2}/αθ2 = 1/θ + α + 1. Comment: Similar to 4, 5/05, Q.17.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 485 10.27. D. f(x) = 0.75λe-λx + 0.5λe-2λx = 0.75λe-λx + (0.25) (2λe-2λx). X is a 75%-25% mixture of Exponentials with means 1/λ and 1/(2λ). E[X | λ] = .75/λ + .25/(2λ) = .875/λ. E[X2 | λ] = .75(2/λ2) + .25{2/(2λ)2 } = 1.625/λ2. Var[X | λ] = 1.625/λ2 - (.875/λ)2 = .859375/λ2. .05
∫
EPV = (.859375/λ2)/.04 dλ = 1718.75. .01 .05
∫
First Moment of the hypothetical means = (.875/λ)/.04 dλ = 35.206. .01 .05
∫
Second Moment of the hypothetical means = (.875/λ)2 /.04 dλ = 1531.25. .01
VHM = 1531.25 - 35.2062 = 291.8. K = 1718.75/291.8 = 5.89. Z = 1/(1 + K) = 14.5%. The expected size of the next claim from the same policyholder is: (14.5%)(60) + (85.5%)(35.206) = 38.80.
2013-4-9 Buhlmann Credibility §10 Buhl. Cred. Contin. Risk Types, HCM 10/19/12, Page 486 10.28. C. One has to weight together the process variances of the severities for the individual types using the chance that a claim came from each type. The chance that a claim came from an individual of a given type is proportional to the product of the a priori chance of an insured being of that type and the mean frequency for that type. In this case, these weights are: π(ζ) ζ/20 = (7/20)/ζ7, ζ > 1. ∞
∞
∫1 π(ζ) ζ / 20 dζ = (7/20) ∫1 ζ - 7 dζ = (7/20)(1/6) = 7/120. The variance of the Gamma Distribution is: (5)(4ζ)2 = 80ζ2. ∞
EPV =
∞
∞
∫1 (PV given ζ) π(ζ) ζ / 20 dζ / ∫1 π(ζ) ζ / 20 dζ = (120/7) ∫1 80 ζ2 (7 / ζ8) (ζ / 20) dζ ∞
= 480
∫1 ζ - 5 dζ = 120.
The mean of the Gamma Distribution is: (5)(4ζ) = 20ζ. ∞
Overall Mean =
∞
∞
∫1 20 ζ π(ζ) ζ / 20 dζ / ∫1 π(ζ) ζ / 20 dζ = (120/7) ∫1 7ζ- 6 dζ = 120/5 = 24. ∞
∞
∫1
2nd moment of the hypothetical means = (20 ζ)2 π(ζ) ζ / 20 dζ /
∫1 π(ζ) ζ / 20 dζ
∞
= (120/7)
∫1 140 ζ- 5 dζ = (120/7)(140/4) = 600.
VHM = 600 - 242 = 24. K = EPV/VHM = 120/24 = 5. Comment: Beyond what you are likely to be asked on your exam.
10.29. D. λ and µ are distributed independently.
λ has a density which is Gamma with α = 4 and θ = 1/10. π(λ) = λ³ e^(-10λ) 10⁴/Γ[4] = λ³ e^(-10λ) 10,000/6.
E[λ] = (4)(1/10) = 0.4. E[λ²] = (4)(5)(1/10)² = 0.2.
µ has a density which is Inverse Gamma with α = 3 and θ = 10. π(µ) = µ⁻⁴ e^(-10/µ) 10³/Γ[3] = µ⁻⁴ e^(-10/µ) 500.
E[µ] = 10/2 = 5. E[µ²] = (10²)/{(2)(1)} = 50.
The mean aggregate loss is: λµ. The variance of aggregate loss is: λ(2nd moment of severity) = λ(2µ²) = 2λµ².
EPV = E[2λµ²] = 2 E[λ] E[µ²] = (2)(0.4)(50) = 40.
First moment of the hypothetical means: E[λµ] = E[λ]E[µ] = (0.4)(5) = 2.
Second moment of the hypothetical means: E[(λµ)²] = E[λ²]E[µ²] = (0.2)(50) = 10.
VHM = 10 - 2² = 6. K = EPV/VHM = 40/6 = 6.67.
Comment: One can compute the moments by doing the relevant integrals. All of the integrals are of the Gamma type.
E[2λµ²] = ∫_0^∞ ∫_0^∞ 2λµ² (2,500,000/3) λ³ e^(-10λ) µ⁻⁴ e^(-10/µ) dλ dµ
= (5,000,000/3) ∫_0^∞ λ⁴ e^(-10λ) dλ ∫_0^∞ µ⁻² e^(-10/µ) dµ = (5,000,000/3)(Γ[5]/10⁵)(Γ[1]/10¹) = 40.
E[λµ] = ∫_0^∞ ∫_0^∞ λµ (2,500,000/3) λ³ e^(-10λ) µ⁻⁴ e^(-10/µ) dλ dµ
= (2,500,000/3) ∫_0^∞ λ⁴ e^(-10λ) dλ ∫_0^∞ µ⁻³ e^(-10/µ) dµ = (2,500,000/3)(Γ[5]/10⁵)(Γ[2]/10²) = 2.
E[(λµ)²] = ∫_0^∞ ∫_0^∞ λ²µ² (2,500,000/3) λ³ e^(-10λ) µ⁻⁴ e^(-10/µ) dλ dµ
= (2,500,000/3) ∫_0^∞ λ⁵ e^(-10λ) dλ ∫_0^∞ µ⁻² e^(-10/µ) dµ = (2,500,000/3)(Γ[6]/10⁶)(Γ[1]/10¹) = 10.
The mixed distribution for frequency is a Negative Binomial with r = α = 4 and β = θ = 1/10.
This Negative Binomial has mean of (4)(1/10) = 0.4, and variance of (4)(1/10)(11/10) = 0.44.
The mixed distribution for severity is a Pareto with α = 3 and θ = 10.
This Pareto has mean of 10/2 = 5, second moment of (2)(10²)/{(2)(1)} = 100, and variance of 100 - 5² = 75.
Thus one might think that the variance of aggregate loss is: (0.4)(75) + (5²)(0.44) = 41.
However, EPV + VHM = 40 + 6 = 46 > 41. Just looking at the separate frequency and severity mixed distributions does not capture the full variance, since it ignores the various combinations of Poissons with Exponentials such as a small λ with a small µ, or a large λ with a large µ.
10.30. The chance of the observation is proportional to: q(1-q)² (1-q)³ q²(1-q) q(1-q)² (1-q)³ = q⁴(1-q)¹¹.
Thus the posterior distribution is proportional to: (1.5 - q) q⁴(1-q)¹¹ = 1.5q⁴(1-q)¹¹ - q⁵(1-q)¹¹.
Therefore, the posterior mean of q is:
{1.5 ∫_0^1 q⁵(1-q)¹¹ dq - ∫_0^1 q⁶(1-q)¹¹ dq} / {1.5 ∫_0^1 q⁴(1-q)¹¹ dq - ∫_0^1 q⁵(1-q)¹¹ dq}
= {(1.5)(5! 11!/17!) - 6! 11!/18!} / {(1.5)(4! 11!/16!) - 5! 11!/17!} = {(1.5)(5!/17!) - 6!/18!} / {(1.5)(4!/16!) - 5!/17!}
= {(1.5)(18) 5! - 6!} / {(1.5)(17)(18) 4! - (18) 5!} = {(1.5)(18)(5) - (5)(6)} / {(1.5)(17)(18) - (18)(5)} = 105/369 = 0.28455.
The estimated future frequency is: (3)(0.28455) = 0.8537.
10.31. The process variance is 3q(1-q).
EPV = 3 ∫_0^1 q(1-q)(1.5 - q) dq = 4.5 ∫_0^1 q(1-q) dq - 3 ∫_0^1 q²(1-q) dq = (4.5) 1! 1!/3! - (3) 2! 1!/4! = 0.5.
Overall mean is: ∫_0^1 3q(1.5 - q) dq = 4.5 ∫_0^1 q dq - 3 ∫_0^1 q² dq = 4.5/2 - 3/3 = 1.25.
Second Moment of the Hypothetical Means is:
∫_0^1 (3q)²(1.5 - q) dq = 13.5 ∫_0^1 q² dq - 9 ∫_0^1 q³ dq = 13.5/3 - 9/4 = 2.25.
VHM = 2.25 - 1.25² = 0.6875. K = EPV/VHM = 0.5/0.6875 = 0.7273. Z = 5/(5 + K) = 0.873.
Estimated future frequency is: (0.873)(4/5) + (1 - 0.873)(1.25) = 0.857.
Comment: While the Buhlmann and Bayes estimates are very similar, they are not equal. The prior distribution of q is similar to but not equal to a Beta Distribution.
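Solutions 10.30 and 10.31 estimate the same future frequency, first by Bayes Analysis and then by Buhlmann Credibility. The sketch below (my own check, not from the text) reproduces both numbers; it assumes, as the solutions do, a prior density of (1.5 - q) on [0, 1], annual frequency Binomial(m = 3, q), and five observed years with 4 claims in total.

```python
# Bayes vs. Buhlmann for solutions 10.30 and 10.31.
from scipy.integrate import quad

prior = lambda q: 1.5 - q                       # prior density of q on [0, 1]
like = lambda q: q**4 * (1 - q)**11             # likelihood of the 5-year observation (up to constants)

post_num = quad(lambda q: q * prior(q) * like(q), 0, 1)[0]
post_den = quad(lambda q: prior(q) * like(q), 0, 1)[0]
bayes = 3 * post_num / post_den                 # posterior mean annual frequency

epv = quad(lambda q: 3 * q * (1 - q) * prior(q), 0, 1)[0]
first = quad(lambda q: 3 * q * prior(q), 0, 1)[0]
second = quad(lambda q: (3 * q)**2 * prior(q), 0, 1)[0]
k = epv / (second - first**2)
z = 5 / (5 + k)
buhlmann = z * (4 / 5) + (1 - z) * first
print(bayes, buhlmann)                          # about 0.854 and 0.857
```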
10.32. C. The moment generating function is defined as: M(t) = E[e^(xt)].
For the Inverse Gaussian Distribution, M(t) = exp[(θ/µ)(1 - √(1 - 2tµ²/θ))], for t < θ/(2µ²).
Therefore, E[exp[σ²]] = M(1) = exp[(2/0.6)(1 - √(1 - 2(0.6²)/2))] = 1.9477.
E[exp[σ²/2]] = M(0.5) = exp[(2/0.6)(1 - √(1 - 2(0.5)(0.6²)/2))] = 1.3701.
E[exp[2σ²]] = M(2) = exp[(2/0.6)(1 - √(1 - 2(2)(0.6²)/2))] = 4.8042.
For the LogNormal Distribution, E[X] = exp[µ + σ²/2] = exp[5 + σ²/2] = 148.41 exp[σ²/2].
Thus, the first moment of the hypothetical means is: 148.41 E[exp[σ²/2]] = (148.41)(1.3701) = 203.34.
Second moment of the hypothetical means is: 148.41² E[exp[σ²]] = (148.41²)(1.9477) = 42,899.
VHM = 42,899 - 203.34² = 1552.
For the LogNormal, E[X²] = exp[2µ + 2σ²] = 22,026 exp[2σ²].
Process Variance = 22,026 exp[2σ²] - (148.41 exp[σ²/2])² = 22,026 exp[2σ²] - 22,026 exp[σ²].
EPV = 22,026 E[exp[2σ²]] - 22,026 E[exp[σ²]] = (22,026)(4.8042 - 1.9477) = 62,917.
K = EPV/VHM = 62,917/1552 = 40.5.
Comment: Long and difficult! Note that θ/(2µ²) = 2/{(2)(0.6²)} = 2.78, so that M(2) exists.
10.33. B. Each Inverse Gamma has mean: θ/(α - 1) = θ/5, and second moment: θ²/{(α - 1)(α - 2)} = θ²/20.
Therefore, each Inverse Gamma has a (process) variance of: θ²/20 - (θ/5)² = θ²/100.
θ is distributed via an Exponential Distribution; call the mean of this Exponential Distribution µ.
E[θ] = µ, E[θ²] = 2µ², and Var[θ] = µ².
EPV = E[θ²/100] = E[θ²]/100 = 2µ²/100 = µ²/50. VHM = Var[θ/5] = Var[θ]/5² = µ²/25.
K = EPV/VHM = (µ²/50)/(µ²/25) = 0.5.
Comment: This is an example of an Exponential-Inverse Gamma. K = 2/(α - 2), for α > 2.
10.34. B. λ and µ are distributed independently.
λ has a density which is Pareto with α = 3 and θ = 1/10. π(λ) = 3(1/10)³/(1/10 + λ)⁴ = 30/(1 + 10λ)⁴.
E[λ] = (1/10)/2 = 0.05. E[λ²] = (2)(1/10)²/{(2)(1)} = 0.01.
µ has a density which is Gamma with α = 2 and θ = 10. π(µ) = µ e^(-0.1µ)/(Γ[2] 10²) = µ e^(-0.1µ)/100.
E[µ] = (2)(10) = 20. E[µ²] = (2)(3)(10²) = 600.
The mean aggregate loss is: λµ. The variance of aggregate loss is: λ(2nd moment of severity) = λ(2µ²) = 2λµ².
EPV = E[2λµ²] = 2 E[λ] E[µ²] = (2)(0.05)(600) = 60.
First moment of the hypothetical means: E[λµ] = E[λ]E[µ] = (0.05)(20) = 1.
Second moment of the hypothetical means: E[(λµ)²] = E[λ²]E[µ²] = (0.01)(600) = 6.
VHM = 6 - 1² = 5. K = EPV/VHM = 60/5 = 12.
Comment: One can compute the moments by doing the relevant integrals, but it is a lot of work!
E[2λµ²] = ∫_0^∞ ∫_0^∞ 2λµ² 0.3µ e^(-0.1µ)/(1 + 10λ)⁴ dλ dµ = 0.6 ∫_0^∞ λ/(1 + 10λ)⁴ dλ ∫_0^∞ µ³ e^(-0.1µ) dµ = (0.6)(1/600)(Γ[4] 10⁴) = 60.
E[λµ] = ∫_0^∞ ∫_0^∞ λµ 0.3µ e^(-0.1µ)/(1 + 10λ)⁴ dλ dµ = 0.3 ∫_0^∞ λ/(1 + 10λ)⁴ dλ ∫_0^∞ µ² e^(-0.1µ) dµ = (0.3)(1/600)(Γ[3] 10³) = 1.
E[λ²µ²] = ∫_0^∞ ∫_0^∞ λ²µ² 0.3µ e^(-0.1µ)/(1 + 10λ)⁴ dλ dµ = 0.3 ∫_0^∞ λ²/(1 + 10λ)⁴ dλ ∫_0^∞ µ³ e^(-0.1µ) dµ = (0.3)(1/3000)(Γ[4] 10⁴) = 6.
All of the integrals involving µ are of the Gamma type. The integrals involving λ can be done via integration by parts, or if one is clever by using the moment formulas for a Pareto Distribution.
For example, ∫_0^∞ λ/(1 + 10λ)⁴ dλ = ∫_0^∞ x (0.1⁴)/(0.1 + x)⁴ dx = (0.1/3) ∫_0^∞ x (3)(0.1³)/(0.1 + x)⁴ dx
= (0.1/3) ∫_0^∞ x f(x) dx, where f(x) is the density of a Pareto with α = 3 and θ = 0.1.
Therefore, (0.1/3) ∫_0^∞ x f(x) dx = (0.1/3)(1st moment of this Pareto) = (0.1/3)(0.1/2) = 1/600.
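Because the priors in 10.34 are independent, each double integral factors into a λ-integral times a µ-integral, which makes a numeric check straightforward. A minimal sketch under those assumptions (my own illustration, not from the text):

```python
# Check of the moments in solution 10.34: lambda ~ Pareto(alpha=3, theta=0.1), mu ~ Gamma(alpha=2, theta=10).
from scipy.integrate import quad
import numpy as np

pi_lam = lambda lam: 30 / (1 + 10 * lam)**4        # Pareto density of lambda
pi_mu = lambda mu: mu * np.exp(-0.1 * mu) / 100    # Gamma density of mu

E = lambda f, dens: quad(lambda t: f(t) * dens(t), 0, np.inf)[0]

epv = 2 * E(lambda l: l, pi_lam) * E(lambda m: m**2, pi_mu)     # E[2*lambda*mu^2]
first = E(lambda l: l, pi_lam) * E(lambda m: m, pi_mu)          # E[lambda*mu]
second = E(lambda l: l**2, pi_lam) * E(lambda m: m**2, pi_mu)   # E[(lambda*mu)^2]
print(epv, first, second, epv / (second - first**2))            # 60, 1, 6, K = 12
```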
10.35. E. For a Pareto Distribution with α = 4, E[X] = θ/3, E[X²] = θ²/3, and Var[X] = 2θ²/9.
The uniform distribution from 10 to 50 has mean 30 and variance 40²/12.
EPV = E[2θ²/9] = (2/9)E[θ²] = (2/9)(40²/12 + 30²) = 229.63.
VHM = Var[θ/3] = Var[θ]/3² = (40²/12)/9 = 14.815.
K = EPV/VHM = 229.63/14.815 = 15.5. Z = 6/(6 + 15.5) = 27.9%.
10.36. D. The hypothetical mean is: αθ. Therefore, the First Moment of Hypothetical Means is:
∫∫ αθ π(α, θ) dα dθ = [∫_2^3 α(α²) dα][∫_1^2 θ(θ³) dθ]/23.75 = (16.25)(6.2)/23.75 = 4.242.
Second Moment of Hypothetical Means is:
∫∫ (αθ)² π(α, θ) dα dθ = [∫_2^3 α²(α²) dα][∫_1^2 θ²(θ³) dθ]/23.75 = (42.2)(10.5)/23.75 = 18.657.
VHM = 18.657 - 4.242² = 0.662.
The process variance is: αθ². Therefore, the EPV is:
∫∫ αθ² π(α, θ) dα dθ = [∫_2^3 α(α²) dα][∫_1^2 θ²(θ³) dθ]/23.75 = (16.25)(10.5)/23.75 = 7.184.
K = EPV/VHM = 7.184/0.662 = 10.85.
10.37. A. The process variance given u is:
Second Moment of LogNormal - Square of First Moment of LogNormal = Exp[2u + (2)(2²)] - Exp[u + (2²)/2]² = Exp[2u](e⁸ - e⁴).
Therefore, EPV = E[e^(2u)](e⁸ - e⁴).
u is Normal with mean 5 and variance 3. (When we multiply a variable by a constant such as 2, we multiply the variance by that constant squared, and thus we multiply the standard deviation by that constant.)
Therefore, 2u is Normal with mean 10, and variance: (2²)(3) = 12.
Therefore, e^(2u) is LogNormal with parameters 10 and √12.
Therefore, E[e^(2u)] is the mean of this LogNormal: Exp[10 + 12/2] = e¹⁶.
Therefore, EPV = e¹⁶(e⁸ - e⁴) = 26,004 million.
The hypothetical mean given u is: First Moment of LogNormal = Exp[u + (2²)/2] = e^u e².
Therefore, VHM = Var[e^u](e²)² = Var[e^u] e⁴.
u is Normal with mean 5 and variance 3. Therefore, e^u is LogNormal with parameters 5 and √3.
Therefore, Var[e^u] is the variance of this LogNormal: Exp[(2)(5) + (2)(3)] - Exp[5 + 3/2]² = e¹⁶ - e¹³.
Therefore, VHM = (e¹⁶ - e¹³)e⁴ = 461.01 million.
K = EPV/VHM = 26,004/461.01 = 56.4. Z = 20/(20 + 56.4) = 26.2%.
The hypothetical mean given u is: e^u e². e^u is LogNormal with parameters 5 and √3.
Therefore, E[e^u] is the mean of this LogNormal: Exp[5 + 3/2] = e^6.5.
Prior mean is: E[e^u] e² = e^6.5 e² = e^8.5 = 4915.
Estimate = (26.2%)(10,000) + (1 - 26.2%)(4915) = 6247.
Comment: Long and hard!
10.38. C. The prior density of alpha is: 1/2, 3 ≤ α ≤ 5. The prior density of theta is: 3000/θ⁴, θ > 10.
The hypothetical mean is the mean of the Inverse Gamma: θ/(α - 1). Therefore, the First Moment of Hypothetical Means is:
∫_3^5 ∫_10^∞ {θ/(α - 1)}(1/2)(3000/θ⁴) dθ dα = 1500 ∫_3^5 1/(α - 1) dα ∫_10^∞ θ⁻³ dθ = (1500){ln(4) - ln(2)}(1/200) = 5.1986.
Second Moment of Hypothetical Means is:
∫_3^5 ∫_10^∞ {θ/(α - 1)}²(1/2)(3000/θ⁴) dθ dα = 1500 ∫_3^5 1/(α - 1)² dα ∫_10^∞ θ⁻² dθ = (1500){1/2 - 1/4}(1/10) = 37.5.
VHM = 37.5 - 5.1986² = 10.475.
The second moment of an Inverse Gamma is: θ²/{(α - 1)(α - 2)}.
Therefore, the process variance is: θ²/{(α - 1)(α - 2)} - {θ/(α - 1)}² = θ²/{(α - 1)²(α - 2)}.
Therefore, the EPV is:
∫_3^5 ∫_10^∞ θ²/{(α - 1)²(α - 2)} (1/2)(3000/θ⁴) dθ dα = 1500 ∫_10^∞ θ⁻² dθ ∫_3^5 1/{(α - 1)²(α - 2)} dα.
∫_3^5 1/{(α - 1)²(α - 2)} dα = [1/(α - 1) + ln(α - 2) - ln(α - 1)] evaluated from α = 3 to α = 5
= (1/4 - 1/2) + ln(3/1) - ln(4/2) = 0.155465.
EPV = (1500)(1/10)(0.155465) = 23.32.
K = EPV/VHM = 23.32/10.475 = 2.23. Z = 4/(4 + 2.23) = 64.2%.
10.39. D. Severity follows a Single Parameter Pareto with α = 3.
E[X | θ] = 3θ/(3 - 1) = 3θ/2. E[X² | θ] = 3θ²/(3 - 2) = 3θ². Var[X | θ] = 3θ² - (3θ/2)² = 3θ²/4.
Then the VHM = Var[3θ/2] = (9/4)Var[θ] = (9/4)(5²/12) = 75/16.
EPV = E[3θ²/4] = (3/4)E[θ²] = (3/4)(5²/12 + 12.5²) = 118.75.
K = EPV/VHM = 76/3.
The prior mean is: E[3θ/2] = (3/2)E[θ] = (1.5)(12.5) = 18.75. The observed mean is: 200/8 = 25.
Z = 8/(8 + 76/3) = 24.0%. Estimate = (24.0%)(25) + (1 - 24.0%)(18.75) = 20.25.
Comment: The variance of a uniform is its width squared divided by 12. The second moment of a uniform is its variance plus the square of its mean.
10.40. E. The mean of the zero-truncated Poisson distribution is: λ/(1 - e^(-λ)) = 2/(1 - e^(-2)) = 2.313.
Therefore, E[10β] = 2.313. ⇒ E[β] = 0.2313.
As shown in Appendix B of Loss Models, the variance of the zero-truncated Poisson distribution is:
λ{1 - (λ + 1)e^(-λ)}/(1 - e^(-λ))² = (2)(1 - 3e^(-2))/(1 - e^(-2))² = 1.5890.
Therefore, Var[10β] = 1.5890. ⇒ Var[β] = 1.5890/100 = 0.01589.
The process variance is: β(1+β) = β + β².
EPV = E[β + β²] = E[β] + E[β²] = E[β] + Var[β] + E[β]² = 0.2313 + 0.01589 + 0.2313² = 0.3007.
VHM = Var[β] = 0.01589.
K = EPV/VHM = 0.3007/0.01589 = 18.92. For 5 years of data, Z = 5/(5 + 18.92) = 20.9%.
10.41. E. E[X | µ] = exp[µ + 2²/2] = e⁷ e^(µ-5). ⇒ E[X] = e⁷ E[e^(µ-5)].
E[X | µ]² = e¹⁴ e^(2(µ-5)). ⇒ second moment of the hypothetical means = E[E[X | µ]²] = e¹⁴ E[e^(2(µ-5))].
Var[X | µ] = E[X² | µ] - E[X | µ]² = exp[2µ + (2)(2²)] - (e² e^µ)² = (e⁸ - e⁴)e^(2µ) = (e¹⁸ - e¹⁴)e^(2(µ-5)).
EPV = E[Var[X | µ]] = (e¹⁸ - e¹⁴) E[e^(2(µ-5))].
Now the probability generating function is defined as: P[z] = E[z^n]. Here µ - 5 follows a Negative Binomial distribution, so µ - 5 takes the place of n.
Therefore, E[e^(µ-5)] = P[e], and E[e^(2(µ-5))] = P[e²].
As shown in Appendix B of Loss Models, for the Negative Binomial: P(z) = {1/(1 - β(z - 1))}^r = 1/(1.05 - 0.05z)⁶, z < 1 + 1/β = 21.
P(e) = 1.7143. P(e²) = 10.0659.
EPV = (e¹⁸ - e¹⁴) P(e²) = (e¹⁸ - e¹⁴)(10.0659) = 648.8 million.
VHM = E[E[X | µ]²] - E[X]² = e¹⁴ E[e^(2(µ-5))] - (e⁷ E[e^(µ-5)])² = e¹⁴ P(e²) - e¹⁴ P(e)² = e¹⁴(10.0659 - 1.7143²) = 8.571 million.
K = EPV/VHM = 648.8/8.571 = 75.7.
Comment: Difficult!
10.42. B. E[q] = 0.3. E[q²] = Var[q] + E[q]² = 0.2²/12 + 0.3² = 0.09333.
Process variance of pure premium is: (10q)(5)(200²) + (1000²)(10)(q)(1-q) = (12 million)q - (10 million)q².
EPV = (12 million)E[q] - (10 million)E[q²] = (12 million)(0.3) - (10 million)(0.09333) = 2,666,667.
The mean pure premium is: (5)(200)(10q) = 10,000q.
A priori mean pure premium is: 10,000 E[q] = (10,000)(0.3) = 3000.
VHM = Var[10,000q] = (100 million)Var[q] = (100 million)(0.2²/12) = 333,333.
K = EPV/VHM = 2,666,667/333,333 = 8.0.
There are a total of 45 members during the three years. Z = 45/(45 + 8) = 85.0%.
Observed average pure premium is: {(10)(2143) + (15)(2551) + (20)(2260)}/(10 + 15 + 20) = 2331.
Estimated future pure premium is: (0.850)(2331) + (1 - 0.850)(3000) = 2431.
Estimate of aggregate losses in year 4 is: (2431)(25) = 60,775.
Comment: Similar to 4, 11/01, Q.18. No use is made of the given claim counts.
10.43. C. Mean frequency is 0.10.
Mean severity for risk type one is: (1/3)(5) + (1/2)(10) + (1/6)(20) = 10.
Mean severity for risk type two is: (1/2)(5) + (1/4)(10) + (1/4)(20) = 10.
Thus the mean severity is independent of whether a risk is "type 1" or "type 2".
Let m be the mean frequency for a risk. Since the frequency and severity are distributed independently, the hypothetical mean is 10m. The first moment of the hypothetical means is (10)(0.10) = 1.
The second moment of the hypothetical means is:
∫_0.07^0.13 (10m)²(1/0.06) dm = (100/3){(0.13³) - (0.07³)}/0.06 = 0.0618/0.06 = 1.03.
Thus the variance of the hypothetical means is 1.03 - 1² = 0.03.
Comment: Since the mean severity is independent of the type of risk, the computation is simplified somewhat.
10.44. E. For a risk with average frequency m, and of "type 1", the variance of the pure premium is E(PP²) - E²(PP) = 125m - 100m²:

Outcome              A Priori Probability   Pure Premium   Square of Pure Premium
No Claim             1 - m                  0              0
A Claim of Size 5    m/3                    5              25
A Claim of Size 10   m/2                    10             100
A Claim of Size 20   m/6                    20             400
Overall              1                      10m            125m

For a risk with average frequency m, and of "type 2", the variance of the pure premium is E(PP²) - E²(PP) = 137.5m - 100m²:

Outcome              A Priori Probability   Pure Premium   Square of Pure Premium
No Claim             1 - m                  0              0
A Claim of Size 5    m/2                    5              25
A Claim of Size 10   m/4                    10             100
A Claim of Size 20   m/4                    20             400
Overall              1                      10m            137.5m

The density function for the average frequency m is (1/0.06) on [0.07, 0.13]. Therefore, the expected value of the process variance of the pure premiums can be obtained by weighting together the chances of the two types of severities and integrating over m:
(60%) ∫_0.07^0.13 {125m - 100m²}(1/0.06) dm + (40%) ∫_0.07^0.13 {137.5m - 100m²}(1/0.06) dm
= (1/0.06) ∫_0.07^0.13 {130m - 100m²} dm = (1/0.06){(130)(0.006) - (100)(0.000618)} = 11.97.
Alternately, the mean severity for type 1 is 10 with a variance of 25. The mean severity for type 2 is 10 with a variance of 37.5. For a risk with mean frequency m, the variance of the frequency is m(1-m), since we have a Bernoulli.
Thus for a risk with average frequency m, and of "type 1", the variance of the pure premium is: 25m + (10²)m(1-m) = 125m - 100m².
For a risk with average frequency m, and of "type 2", the variance of the pure premium is: 37.5m + (10²)m(1-m) = 137.5m - 100m².
Then proceed as above.
Comment: Alternately one can compute the total variance, which turns out to be 12, and subtract the variance of the hypothetical means from the previous question of 0.03, getting the expected value of the process variance of 11.97.
10.45. B. K = EPV/VHM = 11.97/0.03 = 399.
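Solutions 10.43 through 10.45 combine a discrete severity-type mixture with a continuous frequency parameter m. The sketch below (my own check, not from the text) integrates the process variance and the hypothetical means over m and reproduces EPV = 11.97, VHM = 0.03, and K = 399.

```python
# Numeric check of 10.43-10.45: m uniform on [0.07, 0.13]; 60% of risks are severity "type 1", 40% "type 2".
from scipy.integrate import quad

dens = lambda m: 1 / 0.06                                    # density of m
pv = lambda m: 0.6 * (125 * m - 100 * m**2) + 0.4 * (137.5 * m - 100 * m**2)
hm = lambda m: 10 * m                                        # hypothetical mean pure premium

epv = quad(lambda m: pv(m) * dens(m), 0.07, 0.13)[0]
first = quad(lambda m: hm(m) * dens(m), 0.07, 0.13)[0]
second = quad(lambda m: hm(m)**2 * dens(m), 0.07, 0.13)[0]
vhm = second - first**2
k = epv / vhm
z = 15 / (15 + k)
print(epv, vhm, k)               # about 11.97, 0.03, 399
print(z * 3 + (1 - z) * 1)       # about 1.072, matching the estimate in 10.46
```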
10.46. A. Z = 15/(15 + 399) = 0.036. The prior estimate is the mean pure premium of 1. The observation is 45/15 = 3.
Thus the new estimate is: (0.036)(3) + (1 - 0.036)(1) = 1.072.
10.47. D. For q = m/100,000, f(q) = 100q, for 0 < q ≤ 0.1, and f(q) = 20 - 100q, for 0.1 < q ≤ 0.2.
The process variance for the Bernoulli given q is q(1-q). Thus the Expected Value of the Process Variance is the integral of q(1-q)f(q), which is 1.06/12:
∫_0^0.2 f(q) q(1-q) dq = ∫_0^0.1 (100q) q(1-q) dq + ∫_0.1^0.2 (20 - 100q) q(1-q) dq
= {100q³/3 - 25q⁴} evaluated from q = 0 to q = 0.1, plus {10q² - 40q³ + 25q⁴} evaluated from q = 0.1 to q = 0.2
= 0.37/12 + (3.6 - 3.36 + 0.45)/12 = 1.06/12.
The hypothetical mean is q. Therefore overall mean = E[q] = the integral from 0 to 0.2 of qf(q):
∫_0^0.2 f(q) q dq = ∫_0^0.1 (100q) q dq + ∫_0.1^0.2 (20 - 100q) q dq
= {100q³/3} evaluated from q = 0 to q = 0.1, plus {10q² - (100/3)q³} evaluated from q = 0.1 to q = 0.2
= 0.0333 + 0.3 - 0.2333 = 0.1.
The Variance of the Hypothetical Means is the integral of f(q)(q - 0.1)². By symmetry around q = 0.1, we can take twice the integral from 0 to 0.1:
2 ∫_0^0.1 f(q)(q - 0.1)² dq = 2 ∫_0^0.1 (100q)(q² - 0.2q + 0.01) dq = {50q⁴ - 40q³/3 + q²} evaluated from q = 0 to q = 0.1 = 0.02/12.
Buhlmann Credibility Parameter, K = EPV/VHM = (1.06/12)/(0.02/12) = 53.
For two exposures, Z = 2/(2 + 53) = 2/55. The observed frequency is 1/2.
The new estimate is: (2/55)(1/2) + (53/55)(0.1) = 63/550.
Comment: Since f(q) is symmetric around q = 0.1, E[q] = 0.1.
10.48. D. The process variance for a Gamma is αθ². Thus EPV = E[αθ²] = E[α/2²] = E[α]/4 = {(4 + 0)/2}/4 = 1/2.
The mean of a Gamma is αθ. Thus VHM = Var[αθ] = Var[α/2] = Var[α]/2² = {(4 - 0)²/12}/4 = 1/3.
K = EPV/VHM = (1/2)/(1/3) = 3/2.
Comment: The variance of a uniform distribution on [a, b] is: (b - a)²/12.
10.49. E. Since µ and λ are independent and each distributed uniformly over (0, 1), E[µ] = E[λ] = 1/2. E[µ²] = E[λ²] = 1/3.
For a given µ and λ, the mean pure premium is µλ, and the process variance of the pure premium is:
σ_PP² = µ_F σ_S² + µ_S² σ_F² = λ(2µ²) + (µ²)(2λ) = 4µ²λ.
Therefore, EPV = E[4µ²λ] = 4 E[µ²] E[λ] = (4)(1/3)(1/2) = 2/3.
VHM = Var[µλ] = E[(µλ)²] - E²[µ]E²[λ] = E[µ²]E[λ²] - (1/2)²(1/2)² = (1/3)(1/3) - 1/16 = 7/144.
Buhlmann Credibility Parameter K = EPV/VHM = (2/3)/(7/144) = 13.7.
Comment: E[µ], E[λ], E[µ²], and E[λ²] can each be computed by doing a double integral with respect to f(λ, µ) dµ dλ.
10.50. D. Since for the joint distribution λ and µ are independent, for severity the type of risk is determined just by µ. This is similar to picking a die and spinner separately.
EPV = ∫_0^1 2µ² dµ = 2/3.
Overall Mean severity = ∫_0^1 µ dµ = 1/2.
2nd moment of the hypothetical mean severities = ∫_0^1 µ² dµ = 1/3. VHM = 1/3 - 1/2² = 1/12.
K = EPV/VHM = (2/3)/(1/12) = 8.
10.51. B. Given λ, the mean losses = (mean of Poisson)(mean severity) = (λ)(λ) = λ².
Thus the overall mean is: ∫_0^∞ λ² f(λ) dλ = ∫_0^∞ λ² e^(-λ) dλ = 2! = 2.
The second moment of the hypothetical means is: ∫_0^∞ λ⁴ f(λ) dλ = ∫_0^∞ λ⁴ e^(-λ) dλ = 4! = 24.
Therefore the Variance of the Hypothetical Means = 24 - 2² = 20.
Given λ, process variance = (mean of Poisson)(second moment of severity) = λ(2λ²) = 2λ³.
EPV = ∫_0^∞ 2λ³ f(λ) dλ = 2 ∫_0^∞ λ³ e^(-λ) dλ = 2(3!) = 12. K = EPV/VHM = 12/20 = 3/5.
10.52. C. The mean of each Negative Binomial Distribution is: rβ = 0.5ri = µi. E[0.5ri] = E[µi] = 0.2. Therefore, E[ri] = 0.2/0.5 = 0.4.
The process variance of each Negative Binomial is: rβ(1+β) = (0.5)(1.5)ri = 0.75ri.
EPV = E[0.75ri] = 0.75E[ri] = (0.75)(0.4) = 0.3.
VHM = Var[µi] = Variance of the Exponential = 0.2² = 0.04.
K = EPV/VHM = 0.3/0.04 = 7.5. For one driver, for one year, Z = 1/(1 + 7.5) = 11.8%.
Alternately, since beta is fixed at 0.5, r = µ/0.5 = 2µ. Therefore, since µ follows an Exponential Distribution with mean 0.2, the r parameters over the portfolio have an Exponential Distribution with mean 0.4.
Hypothetical mean is: rβ = 0.5r. ⇒ VHM = Var[0.5r] = 0.5² Var[r] = (0.25)(0.4²) = 0.04. Proceed as before.
Comment: The frequency process is Negative Binomial, with β fixed and r varying across the portfolio of risks. It has been implicitly assumed that for each driver his r parameter is constant over time. Whenever, as here, we are given the distribution of the hypothetical means, we can use that distribution to get the VHM directly, as I did here. For example, in the case of the Gamma-Poisson, the VHM is the variance of the Gamma distribution of lambdas.
10.53. E. mean severity = 1000/(3 - 1) = 500. 2nd moment severity = (2)(1000²)/{(3 - 1)(3 - 2)} = 1,000,000. Variance = 750,000.
Mean frequency = 0.2r. Variance of frequency = (0.2)(1.2)r = 0.24r.
EPV = E[(0.24r)(500²) + (0.2r)(750,000)] = E[210,000r] = 210,000E[r] = (210,000)(2) = 420,000.
VHM = Var[(0.2r)(500)] = Var[100r] = 100² Var[r] = (10,000)(2²) = 40,000.
K = EPV/VHM = 420,000/40,000 = 10.5. Z = 100/(100 + 10.5) = 0.905.
Comment: Assume r is the same for every insured in this book of business; bullet #2 could have been clearer in this regard.
10.54. B. mean severity = 1000/(3 - 1) = 500. 2nd moment severity = (2)(1000²)/{(3 - 1)(3 - 2)} = 1,000,000. Variance = 750,000.
Mean frequency = 0.2r. Variance of frequency = (0.2)(1.2)r = 0.24r.
E[r] = 2. E[r²] = 1/3 + 4/3 + 9/3 = 14/3. Var[r] = 14/3 - 2² = 2/3.
EPV = E[(0.24r)(500²) + (0.2r)(750,000)] = E[210,000r] = 210,000E[r] = (210,000)(2) = 420,000.
VHM = Var[(0.2r)(500)] = Var[100r] = 100² Var[r] = (10,000)(2/3) = 6667.
K = EPV/VHM = 420,000/6667 = 63. Z = 100/(100 + 63) = 61.4%.
10.55. E. EPV = E[σ²] = 1.25. VHM = Var[λ] = (1.5 - 0.5)²/12 = 1/12. K = 1.25/(1/12) = 15.
Z = 1/(1 + K) = 1/16. Prior mean = E[λ] = 1. Estimate = (1/16)(0) + (15/16)(1) = 15/16 = 0.94.
Comment: You have to assume that the distributions of λ and σ² are independent.
10.56. D. EPV = E[σ²] = (75%)(1) + (25%)(2) = 1.25.
VHM = Var[λ] = (0.5)(0.5 - 1)² + (0.5)(1.5 - 1)² = 0.25.
K = 1.25/0.25 = 5. Z = 1/(1 + K) = 1/6. Prior mean = E[λ] = 1. Estimate = (1/6)(0) + (5/6)(1) = 0.833.
Comment: Discrete risk type analog to 4, 11/02, Q.18.
10.57. C. The second moment of the Exponential Distribution is: 2(10θ)² = 200θ².
For a Poisson frequency, the process variance of aggregate losses is: λ(second moment of severity) = θ(200θ²) = 200θ³.
EPV = ∫_1^∞ (PV given θ) π(θ) dθ = ∫_1^∞ 200θ³ (5/θ⁶) dθ = [-500/θ²] evaluated from θ = 1 to θ = ∞ = 500.
The mean aggregate loss given θ is: (θ)(10θ) = 10θ².
Overall Mean = ∫_1^∞ 10θ² π(θ) dθ = ∫_1^∞ 10θ² (5/θ⁶) dθ = [-(50/3)/θ³] evaluated from θ = 1 to θ = ∞ = 50/3.
2nd moment of the hypothetical means = ∫_1^∞ (10θ²)² π(θ) dθ = ∫_1^∞ 100θ⁴ (5/θ⁶) dθ = [-500/θ] evaluated from θ = 1 to θ = ∞ = 500.
VHM = 500 - (50/3)² = 2000/9. K = EPV/VHM = 500/(2000/9) = 2.25.
Alternately, π(θ) = 5/θ⁶, θ > 1. Rewriting this prior distribution as f(x) = 5/x⁶, x > 1, it is a Single Parameter Pareto Distribution with α = 5 and θ = 1.
The process variance of aggregate losses is: λ(second moment of severity) = θ(200θ²) = 200θ³.
Therefore, EPV = E[200θ³] = 200(third moment of the Single Parameter Pareto Distribution) = (200)(5)(1³)/(5 - 3) = 500.
The mean aggregate loss given θ is: (θ)(10θ) = 10θ².
Overall Mean = E[10θ²] = 10(second moment of the Single Parameter Pareto Distribution) = (10)(5)(1²)/(5 - 2) = 50/3.
2nd moment of the hypothetical means = E[(10θ²)²] = 100E[θ⁴] = (100)(fourth moment of the Single Parameter Pareto Distribution) = (100)(5)(1⁴)/(5 - 4) = 500.
VHM = 500 - (50/3)² = 2000/9. K = EPV/VHM = 500/(2000/9) = 2.25.
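The Single Parameter Pareto shortcut in 10.57 can be confirmed by integrating directly against π(θ) = 5/θ⁶. A minimal numeric sketch (my own, not from the text):

```python
# Check of solution 10.57: theta has prior density 5/theta^6 for theta > 1.
from scipy.integrate import quad
import numpy as np

prior = lambda t: 5 / t**6
epv = quad(lambda t: 200 * t**3 * prior(t), 1, np.inf)[0]          # 500
first = quad(lambda t: 10 * t**2 * prior(t), 1, np.inf)[0]         # 50/3
second = quad(lambda t: (10 * t**2)**2 * prior(t), 1, np.inf)[0]   # 500
print(epv, first, second - first**2, epv / (second - first**2))    # 500, 16.67, 222.2, K = 2.25
```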
10.58. D. 2nd moment of Exponential: 2(10θ)² = 200θ². Poisson frequency ⇒ the process variance is: λ(2nd moment of severity) = θ(200θ²) = 200θ³.
For θ = 1, process variance is: 200. For θ = 2, process variance is: 1600. EPV = (70%)(200) + (30%)(1600) = 620.
Mean aggregate loss given θ is: (θ)(10θ) = 10θ². For θ = 1, mean is: 10. For θ = 2, mean is: 40.
Overall Mean = (70%)(10) + (30%)(40) = 19.
2nd moment of the hypothetical means = (70%)(10²) + (30%)(40²) = 550.
VHM = 550 - 19² = 189. K = EPV/VHM = 620/189 = 3.28.
10.59. E. For fixed parameters, the mean aggregate is: λ exp[µ + σ²/2] = λ exp[µ] exp[σ²/2].
Variance of aggregate is: λ(2nd moment of severity) = λ exp[2µ + 2σ²] = λ exp[2µ] exp[2σ²].
EPV = ∫_0^1 ∫_0^1 ∫_0^1 λ exp[2µ] exp[2σ²] 2σ dσ dµ dλ = [λ²/2]_0^1 [exp[2µ]/2]_0^1 [exp[2σ²]/2]_0^1 = (1/2){(e² - 1)/2}{(e² - 1)/2} = 5.1025.
Overall mean = ∫_0^1 ∫_0^1 ∫_0^1 λ exp[µ] exp[σ²/2] 2σ dσ dµ dλ = [λ²/2]_0^1 [exp[µ]]_0^1 [2 exp[σ²/2]]_0^1 = (1/2)(e¹ - 1)(2)(e^0.5 - 1) = 1.1147.
2nd moment of hypothetical means = ∫_0^1 ∫_0^1 ∫_0^1 λ² exp[2µ] exp[σ²] 2σ dσ dµ dλ = [λ³/3]_0^1 [exp[2µ]/2]_0^1 [exp[σ²]]_0^1 = (1/3){(e² - 1)/2}(e¹ - 1) = 1.8297.
VHM = 1.8297 - 1.1147² = 0.5872. K = EPV/VHM = 5.1025/0.5872 = 8.69.
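Because the three parameters in 10.59 are independent, each triple integral factors into one-dimensional pieces (with the σ-density 2σ on (0, 1)). A minimal numeric check (my own, not from the text):

```python
# Check of solution 10.59: lambda and mu uniform on (0,1); sigma has density 2*sigma on (0,1).
from math import exp
from scipy.integrate import quad

E_lam = lambda f: quad(f, 0, 1)[0]                          # lambda uniform on (0,1)
E_mu = lambda f: quad(f, 0, 1)[0]                           # mu uniform on (0,1)
E_sig = lambda f: quad(lambda s: f(s) * 2 * s, 0, 1)[0]     # sigma density 2*sigma

epv = E_lam(lambda l: l) * E_mu(lambda m: exp(2 * m)) * E_sig(lambda s: exp(2 * s**2))
first = E_lam(lambda l: l) * E_mu(lambda m: exp(m)) * E_sig(lambda s: exp(s**2 / 2))
second = E_lam(lambda l: l**2) * E_mu(lambda m: exp(2 * m)) * E_sig(lambda s: exp(s**2))
print(epv, first, second - first**2, epv / (second - first**2))   # about 5.10, 1.11, 0.587, K = 8.69
```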
10.60. E. For fixed parameters, the mean aggregate is: λ exp[µ + σ²/2]. Variance of aggregate is: λ(2nd moment of severity) = λ exp[2µ + 2σ²].

Type      Probability   lambda   mu     sigma   Mean     Square of Mean   Process Var.
1         0.3           0.25     0.75   0.50    0.5997   0.3597           1.8473
2         0.2           0.75     0.50   0.25    1.2758   1.6276           2.3102
3         0.5           0.50     0.25   0.75    0.8505   0.7234           2.5392
Overall                                         0.8603   0.7951           2.2858

VHM = 0.7951 - 0.8603² = 0.0550. K = EPV/VHM = 2.2858/0.0550 = 41.6.
10.61. B. The mean aggregate loss is: λθ. The variance of aggregate loss is: λ(2nd moment of severity) = λ(2θ²) = 2λθ².
EPV = E[2λθ²] = 2 E[λ] E[θ²] = (2)(1)(1 + 1²) = 4.
First moment of the hypothetical means: E[λθ] = E[λ]E[θ] = (1)(1) = 1.
Second moment of the hypothetical means: E[(λθ)²] = E[λ²]E[θ²] = {(2)(1²)}(1 + 1²) = 4.
VHM = 4 - 1² = 3. K = EPV/VHM = 4/3.
10.62. D. The distribution of β is a Pareto Distribution with θ = 1. It has mean 1/(α - 1), second moment 2/{(α - 1)(α - 2)}, and variance: 2/{(α - 1)(α - 2)} - 1/(α - 1)² = α/{(α - 1)²(α - 2)}.
A Geometric Distribution has mean β and variance β(1 + β).
EPV = E[β(1+β)] = E[β] + E[β²] = 1/(α - 1) + 2/{(α - 1)(α - 2)} = α/{(α - 1)(α - 2)}.
VHM = Var[β] = variance of the Pareto = α/{(α - 1)²(α - 2)}.
K = EPV/VHM = α - 1. Z = 1/(1 + K) = 1/α.
Observation is x and the prior mean is: E[β] = 1/(α - 1).
Estimate is: (1/α)x + (1 - 1/α){1/(α - 1)} = (x + 1)/α.
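The table in 10.60 is just three evaluations of the LogNormal-based mean and process variance, followed by probability-weighted averages. A short sketch reproducing it (my own illustration, not from the text):

```python
# Reproduce the table in solution 10.60.
from math import exp

types = [  # (probability, lambda, mu, sigma)
    (0.3, 0.25, 0.75, 0.50),
    (0.2, 0.75, 0.50, 0.25),
    (0.5, 0.50, 0.25, 0.75),
]
epv = first = second = 0.0
for p, lam, mu, sig in types:
    mean = lam * exp(mu + sig**2 / 2)        # hypothetical mean aggregate loss
    pv = lam * exp(2 * mu + 2 * sig**2)      # process variance (Poisson frequency)
    epv += p * pv
    first += p * mean
    second += p * mean**2
vhm = second - first**2
print(first, second, epv, vhm, epv / vhm)    # about 0.8603, 0.7951, 2.2858, 0.0550, K = 41.6
```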
10.63. A. E[X | θ] = ∫_0^θ 2x²/θ² dx = 2θ/3. E[X² | θ] = ∫_0^θ 2x³/θ² dx = θ²/2.
Process Variance = E[X² | θ] - E[X | θ]² = θ²/2 - (2θ/3)² = θ²/18.
EPV = ∫_0^1 (θ²/18) 4θ³ dθ = (2/9)[θ⁶/6] evaluated from θ = 0 to θ = 1 = 1/27.
First moment of the hypothetical means = ∫_0^1 (2θ/3) 4θ³ dθ = (8/3)[θ⁵/5] evaluated from θ = 0 to θ = 1 = 8/15.
2nd moment of the hypothetical means = ∫_0^1 (2θ/3)² 4θ³ dθ = (16/9)[θ⁶/6] evaluated from θ = 0 to θ = 1 = 8/27.
VHM = 8/27 - (8/15)² = 0.01185. K = EPV/VHM = (1/27)/0.01185 = 3.125.
Z = 1/(1 + K) = 24.2%. Estimate is: (24.2%)(0.1) + (1 - 24.2%)(8/15) = 0.428.
Section 11, Linear Regression & Buhlmann Credibility

Buhlmann Credibility is a linear regression approximation to Bayes Analysis.

Fitting a Straight Line with an Intercept:¹¹⁹ ¹²⁰

Two-variable regression model ⇔ 1 independent variable and 1 intercept. Yi = α + βXi + εi.
Ordinary least squares regression: minimize the sum of the squared differences between the estimated and observed values of the dependent variable.
estimated slope = β̂ = {N ΣXiYi - ΣXi ΣYi} / {N ΣXi² - (ΣXi)²}.  α̂ = Ȳ - β̂ X̄.
To convert a variable to deviations form, one subtracts its mean. A variable in deviations form is written with a small rather than capital letter. xi = Xi - X̄. Variables in deviations form always have a mean of zero.
In deviations form, the least squares regression to the two-variable (linear) regression model, Yi = α + βXi + εi, has solution:
β̂ = Σxiyi / Σxi² = ΣxiYi / Σxi².  α̂ = Ȳ - β̂ X̄.

¹¹⁹ A review of material not on the syllabus of Exam 4/C. See “Mahlerʼs Guide to Regression,” in the Discussions portion of the NEAS webpage: www.neas-seminars.com/Discussions/
¹²⁰ Provided you are given the individual data rather than the summary statistics, the allowed electronic calculators will fit a least squares straight line with an intercept.
Weighted Regressions:¹²¹

In a weighted regression we weight some of the observations more heavily than others. One can perform a weighted regression by minimizing the weighted sum of squared errors:
Σ wi (Yi - Ŷi)² = Σ wi (Yi - α - βXi)².
The resulting fitted parameters are:¹²²
β̂ = {Σwi ΣwiXiYi - ΣwiXi ΣwiYi} / {Σwi ΣwiXi² - (ΣwiXi)²}.
α̂ = {ΣwiYi - β̂ ΣwiXi} / Σwi.
Provided that the weights add to one, the weighted regression can be put into deviations form, by subtracting the weighted average from each variable: xi = Xi - ΣwiXi.  yi = Yi - ΣwiYi.
For the two variable model: β̂ = Σwixiyi / Σwixi².  α̂ = ΣwiYi - β̂ ΣwiXi.

Multisided Die Example:

Letʼs apply weighted least squares regression to the Bayesian Estimates of the results of a single die-roll in the multi-sided die example. We had previously for this example:

Observation   A Priori Probability   Bayesian Estimate
1             0.2125                 2.853
2             0.2125                 2.853
3             0.2125                 2.853
4             0.2125                 2.853
5             0.0625                 3.7
6             0.0625                 3.7
7             0.0125                 4.5
8             0.0125                 4.5

The weights to be used are the a priori probabilities of each observation. Put the variables in deviations form, by subtracting the weighted average from each variable: xi = Xi - ΣwiXi, yi = Yi - ΣwiYi.
w = {0.2125, 0.2125, 0.2125, 0.2125, 0.0625, 0.0625, 0.0125, 0.0125}. X = {1, 2, 3, 4, 5, 6, 7, 8}.

¹²¹ A review of material not on the syllabus of Exam 4/C.
¹²² If all of the wi = 1, then this reduces to the case of an unweighted regression.
ΣwiXi = 3 = a priori mean. x = X - ΣwiXi = {-2, -1, 0, 1, 2, 3, 4, 5}.
Y = {2.853, 2.853, 2.853, 2.853, 3.7, 3.7, 4.5, 4.5}. ΣwiYi = 3.¹²³ y = Y - ΣwiYi = {-0.147, -0.147, -0.147, -0.147, 0.7, 0.7, 1.5, 1.5}.
Then if the least squares line is Y = α + βX,
β̂ = Σwixiyi / Σwixi² = 0.45/2.6 = 0.173. α̂ = ΣwiYi - β̂ ΣwiXi = 3 - (3)(0.173) = (3)(0.827) = 2.481.
Ŷi = 2.481 + 0.173Xi, where Xi is the observation.
Note that the slope of the line is the previously calculated credibility for one roll of a die, Z = 17.3%.¹²⁴
Note that for one exposure Z = 1/(1 + K) = 1/(1 + EPV/VHM) = VHM/(VHM + EPV) = VHM/Total Variance = 0.45/2.6.
The intercept of the fitted line is: (0.827)(3) = (1 - Z)(a priori mean).
Thus in this example, the fitted weighted regression line is the estimates using Buhlmann Credibility: Z(observation) + (1 - Z)(a priori mean). This is true in general.

Derivation of the Relationship of Buhlmann to Bayes:

In a more general situation, we would have the possible types of risks Rm. (In the example, these were the three types of multi-sided dice. In other situations the different types of risks are parameterized by a continuous distribution such as a Gamma.) The Bayesian Estimates would be E[X | D], where D is the observed data. Let Di be the possible outcomes of the risk process.
ΣwiXi is the a priori mean.
ΣwiYi = Σ P(Di) E[X | Di] = E[X] = the prior mean. In other words, the Bayesian Estimates are always in balance.
Σwixi² = Σ P(Di) (Di - prior mean)² = the variance of the whole risk process (by definition).
Σwixiyi = Σ P(Di) (Di - prior mean)(E[X | Di] - prior mean)
= Σ P(Di) Di E[X | Di] - Σ P(Di) Di (prior mean) - Σ (prior mean) P(Di) E[X | Di] + (prior mean)²
= Σ P(Di) Di E[X | Di] - (prior mean)²
= Σ_i P(Di) Di Σ_m P(Rm | Di) E(Rm) - (prior mean)²
= Σ_{i,m} Di P(Di) P(Rm | Di) E(Rm) - (prior mean)²
= Σ_{i,m} Di P(Di | Rm) P(Rm) E(Rm) - (prior mean)²
= Σ_m E(Rm) P(Rm) E(Rm) - (prior mean)²
= second moment of the hypothetical means - (prior mean)² = VHM.¹²⁵
Thus the slope of the weighted regression line is: β̂ = Σwixiyi / Σwixi² =
(Variance of the Hypothetical Means)/(Total Variance) = VHM/(EPV + VHM) = 1/(1 + EPV/VHM) = 1/(1 + K) = Buhlmann Credibility for one observation = Z.
Thus the slope of the weighted least squares line to the Bayesian Estimates is the Buhlmann Credibility.
The intercept of the weighted regression line is: α̂ = ΣwiYi - β̂ ΣwiXi = prior mean - Z(prior mean) = (1 - Z)(prior mean).
Thus the weighted regression line is: Y = α + βX = (1 - Z)(prior mean) + Z(observation) = estimate using Buhlmann Credibility.

¹²³ The Bayesian Estimates are in balance, so that their weighted average is equal to the a priori mean of 3.
¹²⁴ For the multi-sided die example, we had EPV = 2.15, VHM = 0.45, K = EPV/VHM = 4.778, and for the roll of a single die Z = 1/(1 + K) = 17.3%.
¹²⁵ By Bayes Theorem, P(Rm)P(Di | Rm) = P(Di)P(Rm | Di); they are each equal to the probability of having both Rm and Di. Also Σ_i Di P(Di | Rm) = E(Rm).
A General Result:

Thus the line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. The slope of this weighted least squares line to the Bayesian Estimates is the Buhlmann Credibility.
Buhlmann Credibility is the Least Squares approximation to the Bayesian Estimates.
When the a priori probabilities of each outcome are equal, then this weighted regression reduces to an ordinary regression.

Exercise: You are given the following information about a model:
First Observation   Unconditional Probability   Bayesian Estimate of Second Observation
1                   1/4                         3
4                   1/4                         6
10                  1/4                         13
25                  1/4                         18
Determine the Bühlmann credibility, Z, to be applied to one observation.
[Solution: X̄ = 10. x = Xi - X̄ = {-9, -6, 0, 15}. Ȳ = 10. y = Yi - Ȳ = {-7, -4, 3, 8}.
β̂ = Σxiyi / Σxi² = 207/342 = 0.605 = Z for one observation.]
Continuing, the intercept of the regression line is: α̂ = Ȳ - β̂ X̄ = 10 - (10)(0.605) = 3.95.
The weighted regression line is: Ŷi = 3.95 + 0.605Xi. Thus the estimate using Buhlmann Credibility is: (0.605)(observation) + 3.95 = Z(observation) + (1 - Z)(10), with Z = 60.5% and prior mean = 10.
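The weighted least squares calculation above is short enough to script. The sketch below is my own illustration (not part of the text); it reproduces both the multi-sided die line, Ŷ = 2.481 + 0.173X, and the exercise's slope of 0.605.

```python
# Weighted least squares line through the Bayesian estimates; the slope is the Buhlmann credibility Z.
def buhlmann_line(w, X, Y):
    """Return (intercept, slope) of the regression of Y on X with weights w (weights sum to 1)."""
    xbar = sum(wi * xi for wi, xi in zip(w, X))
    ybar = sum(wi * yi for wi, yi in zip(w, Y))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, X, Y))
    sxx = sum(wi * (xi - xbar)**2 for wi, xi in zip(w, X))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Multi-sided die example:
w = [0.2125]*4 + [0.0625]*2 + [0.0125]*2
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [2.853, 2.853, 2.853, 2.853, 3.7, 3.7, 4.5, 4.5]
print(buhlmann_line(w, X, Y))                                     # about (2.481, 0.173)

# Exercise with equal a priori probabilities:
print(buhlmann_line([0.25]*4, [1, 4, 10, 25], [3, 6, 13, 18]))    # about (3.95, 0.605)
```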
Least Squares Approximations:

It turns out that Buhlmann Credibility Estimates form a Least Squares Approximation in three different but related manners. Buhlmann Credibility Estimates are the (weighted) least squares line between:
1. Bayesian Estimates vs. Possible Observations
2. True Means vs. Observations
3. Subsequent Observations vs. Prior Observations
The third result is an asymptotic one as the sample size approaches infinity. It will be demonstrated in a later section.¹²⁶ ¹²⁷

¹²⁶ It is illustrated graphically for a particular example in “A Graphical Illustration of Experience Rating Credibilities,” by Howard C. Mahler, PCAS 1998.
¹²⁷ The connection between linear regression and Buhlmann Credibility is a key idea used in “Loss Development Using Credibility” by Eric Brosius, not on the syllabus of this exam.
Problems:

11.1 (2 points) Let X1 be the outcome of a single trial and let E[X2 | X1] be the expected value of the outcome of a second trial as described in the table below.
Outcome V   Initial Probability of Outcome   Bayesian Estimate E[X2 | X1 = V]
0           1/4                              2
3           1/2                              4
10          1/4                              6
Determine the Buhlmann Credibility assigned to a single observation.
A. 31%  B. 33%  C. 35%  D. 37%  E. 39%

11.2 (2 points) You are given the following information:
First Observation   Unconditional Probability   Bayesian Estimate of Second Observation
1                   75%                         1.50
4                   25%                         2.50
Determine the Bühlmann credibility estimate of the second observation, given that the first observation is 4.
(A) 2.50  (B) 2.75  (C) 3.00  (D) 3.25  (E) 3.50

11.3 (1 point) You are given the following:
• An experiment consists of ten possible outcomes: R1, R2, ..., R10.
• The a priori probability of outcome Ri is Pi.
• For each possible outcome, Bayesian analysis was used to calculate predictive estimates, Ei, for the second observation of the experiment.
• Σ_{i=1}^{10} Pi Ei = 24.
• The Buhlmann credibility factor after one experiment is 1/4.
Determine the values for the parameters a and b that minimize the expression: Σ_{i=1}^{10} Pi (a + bRi - Ei)².
A. a = 6; b = 3/4  B. a = 18; b = 3/4  C. a = 6; b = 1/4  D. a = 18; b = 1/4  E. None of A, B, C, or D.
11.4 (3 points) You are given the following information about a credibility model:
First Observation   Unconditional Probability   Bayesian Estimate of Second Observation
10                  1/3                         18
20                  1/3                         23
60                  1/3                         49
Determine the estimate of the second observation using Buhlmann Credibility, given that the first observation is 60.
(A) 43  (B) 45  (C) 47  (D) 49  (E) 51

11.5 (4, 5/83, Q.38) (1 point) Which of the following are true?
1. The estimates resulting from the use of Buhlmann Credibility and the application of Bayes Theorem are always equal.
2. The estimate resulting from the use of Buhlmann Credibility is a linear approximation to the estimate resulting from the use of Bayes Theorem.
3. If the estimate resulting from the use of Buhlmann Credibility is greater than the hypothetical mean, then the estimate resulting from the application of Bayes Theorem is also greater than the hypothetical mean.
A. 2  B. 3  C. 1, 3  D. 2, 3  E. 1, 2, 3

11.6 (4, 5/90, Q.57) (3 points) Let X1 be the outcome of a single trial and let E[X2 | X1] be the expected value of the outcome of a second trial as described in the table below.
Outcome K   Initial Probability of Outcome   Bayesian Estimate E[X2 | X1 = K]
0           1/3                              1
3           1/3                              6
12          1/3                              8
Which of the following represents the Buhlmann credibility estimates corresponding to the Bayesian estimates (1, 6, 8)?
A. (3, 5, 10)  B. (2, 4, 10)  C. (2.5, 4.0, 8.5)  D. (1.5, 3.375, 9.0)  E. (1, 6, 8)

11.7 (4B, 5/93, Q.6) (1 point) Which of the following are true?
1. Buhlmann credibility estimates are the best linear least squares approximations to estimates from Bayesian analysis.
2. Buhlmann credibility requires the assumption of a distribution for the underlying process generating claims.
3. Buhlmann credibility estimates are equivalent to estimates from Bayesian analysis when the likelihood density function is a member of a linear exponential family and the prior distribution is the conjugate prior.
A. 1  B. 2  C. 3  D. 1, 2  E. 1, 3
11.8 (4B, 11/93, Q.24) (3 points) You are given the following:
• An experiment consists of three possible outcomes, R1 = 0, R2 = 2, and R3 = 14.
• The a priori probability distribution for the experiment's outcome is:
Outcome, Ri   Probability, Pi
0             2/3
2             2/9
14            1/9
• For each possible outcome, Bayesian analysis was used to calculate predictive estimates, Ei, for the second observation of the experiment. The predictive estimates are:
Outcome, Ri   Bayesian Analysis Predictive Estimate Ei Given Outcome Ri
0             7/4
2             55/24
14            35/12
• The Buhlmann credibility factor after one experiment is 1/12.
Determine the values for the parameters a and b that minimize the expression: Σ_{i=1}^{3} Pi (a + bRi - Ei)².
A. a = 1/12; b = 11/12  B. a = 1/12; b = 22/12  C. a = 11/12; b = 1/12  D. a = 22/12; b = 1/12  E. a = 11/12; b = 11/12

11.9 (4, 11/02, Q.7 & 2009 Sample Q. 35) (2.5 points) You are given the following information about a credibility model:
First Observation   Unconditional Probability   Bayesian Estimate of Second Observation
1                   1/3                         1.50
2                   1/3                         1.50
3                   1/3                         3.00
Determine the Bühlmann credibility estimate of the second observation, given that the first observation is 1.
(A) 0.75  (B) 1.00  (C) 1.25  (D) 1.50  (E) 1.75
Solutions to Problems:

11.1. D. The Buhlmann Credibility is the slope of the least squares line fit to the Bayesian Estimates. One needs to do a weighted regression with the weights equal to the a priori probabilities; in this case one can just duplicate the point (3, 4) and perform an unweighted regression. Thus the X values are: 0, 3, 3, 10 and the Y values are: 2, 4, 4, 6.
The slope is: {((1/n)ΣXiYi) - ((1/n)ΣXi)((1/n)ΣYi)} / {((1/n)ΣXi²) - ((1/n)ΣXi)²} = {21 - (4)(4)}/{29.5 - 4²} = 5/13.5 = 0.370.
11.2. A. The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. Since there are only two values, the Bayesian Estimates are on a straight line, so Buhlmann equals Bayes. Given that the first observation is 4, the Bühlmann credibility estimate is 2.5.
Comment: Fitting the weighted regression: X = 1, 4. X̄ = ΣwiXi = 1.75. x = X - X̄ = -0.75, 2.25. Y = 1.5, 2.5. Ȳ = ΣwiYi = 1.75. y = Y - Ȳ = -0.25, 0.75. w = 0.75, 0.25.
Σwixiyi = 0.5625. Σwixi² = 1.6875. slope = Σwixiyi/Σwixi² = 0.5625/1.6875 = 0.333. Intercept = Ȳ - (slope)X̄ = 1.75 - (0.333)(1.75) = 1.167. 1.167 + (4)(0.333) = 2.50.
11.3. D. Since the Bayesian Estimates are in balance, ΣPiEi = 24 = a priori mean. The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights.
The slope, b = Z = 1/4. The intercept, a = (1 - Z)(a priori mean) = (3/4)(24) = 18.
11.4. D. The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. Since the a priori probabilities are equal we fit an unweighted regression.
X = 10, 20, 60. X̄ = 30. x = -20, -10, 30. Y = 18, 23, 49. Ȳ = 30. y = -12, -7, 19.
Σxiyi = 880. Σxi² = 1400. slope = Σxiyi/Σxi² = 880/1400 = 0.6286. Intercept = Ȳ - (slope)X̄ = 30 - (0.6286)(30) = 11.14.
Bühlmann credibility estimate of the second observation = 11.14 + (0.6286)(first observation). Given that the first observation is 60, the estimate is: 11.14 + (0.6286)(60) = 48.9.
Comment: Similar to 4, 11/02, Q.7.
11.5. A. 1. False. They are sometimes equal (as in the Gamma-Poisson Conjugate Prior), but are often unequal. See for example my multi-sided die example in prior sections. 2. True. 3. False. The Buhlmann Credibility estimate is a linear approximation to the result of Bayesian Analysis. They can be on different sides of the hypothetical mean. For example in my multi-sided die example in prior sections, the prior mean is 3, and if a 4 is observed the Buhlmann credibility estimate is 3.17 while the Bayes Analysis estimate is 2.85.
11.6. C. The Buhlmann Credibility is the slope of the least squares line fit to the Bayesian Estimates. One needs to do a weighted regression with the weights equal to the a priori probabilities; in this case since the a priori probabilities are the same one can perform an unweighted regression. The X values are: 0, 3, 12 and the Y values are: 1, 6, 8.
The slope is: {((1/n)ΣXiYi) - ((1/n)ΣXi)((1/n)ΣYi)} / {((1/n)ΣXi²) - ((1/n)ΣXi)²} = {38 - (5)(5)}/{51 - 5²} = 0.5.

           X     Y     XY    X²
           0     1     0     0
           3     6     18    9
           12    8     96    144
Average    5     5     38    51

Thus the Buhlmann Credibility is 0.50 and the new estimates are:
(observation)Z + (prior mean)(1 - Z) = (0, 3, 12)(0.5) + (5)(1 - 0.5) = (0, 1.5, 6) + 2.5 = (2.5, 4.0, 8.5).
Alternately, one can check whether the given choices are each of the form:
new estimate = (observation)Z + (prior mean)(1 - Z) = prior mean + Z(observation - prior mean) = 5 + Z(observation - 5).
This will be so if Z = (new estimate - 5)/(observation - 5) is the same for the different observations and corresponding estimates.

Observation   Est. A   Cal. Z   Est. B   Cal. Z   Est. C   Cal. Z   Est. D   Cal. Z   Est. E   Cal. Z
0             3        0.40     2.00     0.60     2.50     0.50     1.50     0.70     1.00     0.80
3             5        0.00     4.00     0.50     4.00     0.50     3.38     0.81     6.00     -0.50
12            10       0.71     10.00    0.71     8.50     0.50     9.00     0.57     8.00     0.43

Since the new estimates in choice C are the only ones in the desired form, we have eliminated all the other choices. If some of the other choices were in the proper form one could compare to see which one had the smallest squared error compared to the Bayesian Estimates. In the case of choice C, the squared error is: (1/3)(2.5 - 1)² + (1/3)(4 - 6)² + (1/3)(8.5 - 8)² = 2.167.
Comment: Note that the Bayesian Estimates are "in balance"; they average to the a priori overall mean of 5: (1/3)(1) + (1/3)(6) + (1/3)(8) = 5. The average of the Buhlmann Credibility Estimates is also 5. This eliminates choices A, B, and D.
11.7. E. 1. True. 2. False. We need only know the mean, VHM and EPV; we need not know the distribution; consider for example a die-spinner example. 3. True.
11.8. D. Buhlmann Credibility is the least squares linear approximation to the Bayesian analysis result. The given expression is the squared error of a linear estimate. Thus the values of a and b that minimize the given expression correspond to the Buhlmann credibility estimate.
In this case, the new estimate using Buhlmann Credibility = (prior mean)(1 - Z) + (observation)Z = 2(1 - 1/12) + (1/12)(observation) = 22/12 + (1/12)(observation). Therefore a = 22/12 and b = 1/12.
Alternately, one can minimize the given expression. One takes the partial derivatives with respect to a and b and sets them equal to zero:
Σ 2Pi(a + bRi - Ei) = 0, and Σ 2PiRi(a + bRi - Ei) = 0.
Therefore, (2/3)(a + b(0) - 7/4) + (2/9)(a + b(2) - 55/24) + (1/9)(a + b(14) - 35/12) = 0 ⇒ a + 2b = 2,
and (2/9)(2)(a + b(2) - 55/24) + (1/9)(14)(a + b(14) - 35/12) = 0 ⇒ 18a + 204b = 50 ⇒ 9a + 102b = 25.
One can either solve these two simultaneous linear equations by matrix methods or try the choices A through E.
Comment: Normally one would not be given the Buhlmann credibility factor as was the case here, allowing the first method of solution, which does not use the information given on the values of the Bayesian analysis estimates. Note that the Bayesian estimates balance to the a priori mean of 2: (2/3)(7/4) + (2/9)(55/24) + (1/9)(35/12) = (126 + 55 + 35)/108 = 216/108 = 2.
11.9. C. The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. Since the a priori probabilities are equal we fit an unweighted regression.
X = 1, 2, 3. X̄ = 2. x = X - X̄ = -1, 0, 1. Y = 1.5, 1.5, 3. Ȳ = 2. y = Y - Ȳ = -0.5, -0.5, 1.
Σxiyi = 1.5. Σxi² = 2. slope = Σxiyi/Σxi² = 1.5/2 = 0.75 = Z. Intercept = Ȳ - (slope)X̄ = 2 - (0.75)(2) = 0.5.
Bühlmann credibility estimate of the second observation = 0.5 + 0.75(first observation). Given that the first observation is 1, the Bühlmann credibility estimate is: (0.5) + (0.75)(1) = 1.25.
Comment: The Bühlmann credibility estimate given 2 is 2; the estimate given 3 is 2.75. The Bayesian Estimates average to 2, the overall a priori mean. Bayesian estimates are in balance. The Bühlmann Estimates are also in balance; they also average to 2.
Section 12, Philbrick Target Shooting Example¹²⁸

In “An Examination of Credibility Concepts,” by Stephen Philbrick there is an excellent target shooting example that illustrates the ideas of Buhlmann Credibility.¹²⁹
Assume there are four marksmen each shooting at his own target. Each marksmanʼs shots are assumed to be distributed around his target, marked by the letters A, B, C, and D, with an expected mean equal to the location of his target. If the targets are arranged as in Figure 1, the resulting shots of each marksman would tend to cluster around his own target. The shots of each marksman have been distinguished by a different symbol. So for example the shots of marksman B are shown as triangles. We see that in some cases one would have a hard time deciding which marksman had made a particular shot if we did not have the convenient labels.
Figure 1
¹²⁸ While the specific target shooting example is not on the current syllabus, all the ideas it illustrates are on the syllabus. In addition, most of the problems in this section would be legitimate questions for your exam. Therefore, many of you will benefit from going over this section and doing at least some of the problems.
¹²⁹ In the 1981 Proceedings of the CAS. It is on the syllabus of the Group and Health - Design and Pricing Exam of the SOA. In my opinion this is the single best paper ever written on the subject of credibility. More actuaries have gotten a good intuitive understanding of credibility by reading this paper, than from any other source.
The point E represents the average of the four targets A, B, C, and D. Thus E is the overall mean.¹³⁰ If we did not know which marksman were shooting we would estimate that the shot would be at E; the a priori estimate is E.
Once we observe a shot from an unknown marksman,¹³¹ we could be asked to estimate the location of the next shot from the same marksman. Using Buhlmann Credibility our estimate would be between the observation and the a priori mean of E. The larger the Credibility assigned to the observation, the closer the estimate is to the observation. The smaller the credibility assigned to the data, the closer the estimate is to E.
There are a number of features of this target shooting example that control how much Buhlmann Credibility is assigned to our observation.
We have assumed that the marksmen are not perfect; they do not always hit their target. The amount of spread of their shots around their targets can be measured by the variance. The average spread over the marksmen is the Expected Value of the Process Variance (EPV). The better the marksmen, the smaller the EPV and the more tightly clustered around the targets the shots will be. The worse the marksmen, the larger the EPV and the less tightly spread.
The better the marksmen, the more information is contained in a shot. The worse the marksmen, the more random noise contained in the observation of the location of a shot. Thus when the marksmen are good, we expect to give more weight to an observation (all other things being equal) than when the marksmen are bad. Thus the better the marksmen, the higher the credibility:

Marksmen   Clustering of Shots   Expected Value of the Process Variance   Amount of Noise   Credibility Assigned to an Observation
Good       Tight                 Small                                    Low               Larger
Bad        Loose                 Large                                    High              Smaller

The smaller the Expected Value of the Process Variance the larger the credibility. This is illustrated by Figure 2. It is assumed in Figure 2 that each marksman is better¹³² than was the case in Figure 1. The EPV is smaller and we assign more credibility to the observation. This makes sense, since in Figure 2 it is a lot easier to tell which marksman is likely to have made a particular shot based solely on its location.
¹³⁰ In this example, each of the marksmen is equally likely. Thus we weight each target equally. As was seen previously, in general one would take a weighted average using the not necessarily equal a priori probabilities as the weights.
¹³¹ Thus the shot does not have one of the convenient labels attached to it. This is analogous to the situation in Auto Insurance, where the drivers in a classification are presumed not to be wearing little labels telling us who are the safer and less safe drivers in the class. We rely on the observed experience to help estimate that.
¹³² Alternately the marksmen could be shooting from closer to the targets. See Part 4B, 5/93, Q.4.
Figure 2
Another feature that determines how much credibility to give an observation is how far apart the four targets are placed. As we move the targets further apart (all other things being equal) it is easier to distinguish the shots of the different marksmen. Each target is a hypothetical mean of one of the marksmenʼs shots. The spread of the targets can be quantified as the Variance of the Hypothetical Means.

Targets     Variance of the Hypothetical Means   Information Content   Credibility Assigned to an Observation
Closer      Small                                Lower                 Smaller
Far Apart   Large                                Higher                Larger
As illustrated in Figure 3, the further apart the targets the more credibility we would assign to our observation. The larger the VHM the larger the credibility. It is easier to distinguish which marksmen made a shot based solely on its location in Figure 3 than in Figure 1.
Figure 3
The third feature that one can vary is the number of shots observed from the same unknown marksman. The more shots we observe, the more information we have and thus the more credibility we would assign to the average of the observations.
Each of the three features discussed follows from the formula for Buhlmann Credibility Z = N/(N + K) = N(VHM)/{N(VHM) + EPV}. Thus as the EPV increases, Z decreases. As VHM increases, Z increases. As N increases, Z increases.

Feature of Target Shooting Example   Mathematical Quantification   Buhlmann Credibility
Better Marksmen                      Smaller EPV                   Larger
Targets Further Apart                Larger VHM                    Larger
More Shots                           Larger N                      Larger
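To see the direction of each effect, one can evaluate Z = N(VHM)/{N(VHM) + EPV} directly. A tiny sketch of my own follows; the numeric values EPV = 144 and VHM = 25 (and VHM = 225 for targets further apart) anticipate the one-dimensional example later in this section and are used here only to illustrate how Z moves.

```python
# Direction of each effect on Z = N*VHM / (N*VHM + EPV).
def Z(n, vhm, epv):
    return n * vhm / (n * vhm + epv)

print(Z(1, 25, 144))    # base case, one shot: about 0.148
print(Z(1, 25, 36))     # better marksmen (smaller EPV): Z rises
print(Z(1, 225, 144))   # targets further apart (larger VHM): Z rises
print(Z(3, 25, 144))    # more shots (larger N): Z rises, about 0.342
```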
Expected Value of the Process Variance versus Variance of the Hypothetical Means:

There are two separate reasons why the observed shots vary. First, the marksmen are not perfect. In other words the Expected Value of the Process Variance is positive. Even if all the targets were in the same place, there would still be a variance in the observed results. This component of the total variance due to the imperfection of the marksmen is quantified by the EPV.
Second, the targets are spread apart. In other words, the Variance of the Hypothetical Means is positive. Even if every marksman were perfect, there would still be a variance in the observed results, when the marksmen shoot at different targets. This component of the total variance due to the spread of the targets is quantified by the VHM.
One needs to understand the distinction between these two sources of variance in the observed results. Also one has to know that the total variance of the observed shots is a sum of these two components: Total Variance = EPV + VHM.

Buhlmann Credibility is a Relative Concept:

In general, when trying to predict the future by using the past, one tries to separate the signal (useful information) from the noise (random fluctuation). In Philbrickʼs target shooting example, we are comparing two estimates of the location of the next shot from the same marksman:
1. the average of the shots, i.e., the average of the observations,
2. the average of the targets, i.e., the a priori mean.
Z measures the usefulness of one estimator relative to the other estimator. When the marksmen are better, there is less random fluctuation in the shots and the average of the observations is a better estimate, relative to the a priori mean. In this case, the weight Z applied to the average of the observations is larger, while the weight 1 - Z applied to the a priori mean is smaller.
As the targets get closer together, there is less variation of the hypothetical means, and the a priori mean becomes a better estimate, relative to the average of the observations. In this case, the weight applied to the average of the observations, Z, is smaller, while that applied to the a priori mean, 1 - Z, is larger.
The Buhlmann Credibility measures the usefulness of one estimator, the average of the observations, relative to another estimator, the a priori mean. If Z = 50%, then the two estimators are equally good or equally bad. Buhlmann credibility is a relative measure of the value of the information contained in the observation versus that in the a priori mean.
A One Dimension Target Shooting Example:

Here is a one-dimensional example of Philbrick's target shooting model, such that the marksmen only miss to the left or right. Assume:
• There are two marksmen.
• The targets for the marksmen are at the points on the number line: 20 and 30.
• The distribution of shots from each marksman follows a normal distribution with mean equal to his target value and with standard deviation of 12.
Here are 20 simulated shots from each of the marksmen, with the shots labeled as to which marksman it was from:
Assume instead we had unlabeled shots from a single unknown marksman. By observing where an unknown marksman's shot(s) hit the number line, you want to predict the location of his next shot. We can use either Bayesian Analysis or Buhlmann Credibility.
To use Buhlmann Credibility we need to calculate the Expected Value of the Process Variance and the Variance of the Hypothetical Means. The process variance for every marksman is assumed to be the same and equal to 12² = 144. Thus the EPV = 144.
The overall mean is 25 and the VHM is: {(20-25)² + (30-25)²}/2 = 25.
Thus the Buhlmann Credibility parameter is K = EPV / VHM = 144/25 = 5.76.
Exercise: Assume a single shot at 18 from an unknown marksman. Use Buhlmann Credibility to predict the location of the next shot from the same marksman.
[Solution: The credibility of a single observation is Z = 1/(1+5.76) = 14.8%. The Buhlmann Credibility estimate of the next shot is: (18)(14.8%) + (25)(85.2%) = 24.0.]
In order to apply Bayesian Analysis, one must figure out the likelihood of the observation of 18 given that the shot came from each marksman. In order to do so, one uses the assumed Normal Distribution: f(x) = exp[-(x-µ)²/(2σ²)] / (σ√(2π)), -∞ < x < ∞.
For example, the first marksman has a probability density function: f(x) = exp[-(x-20)²/288] / (12√(2π)).
Thus the density function at 18 for the first marksman is f(18) = exp[-(18-20)²/288] / (12√(2π)) = 0.0328.
Similarly the chance of observing 18 if the shot came from the other marksman is 0.0202.
One then computes probability weights as the product of the (equal) a priori chances and the conditional likelihoods of the observation. One converts these to probabilities by dividing by their sum. The resulting posterior chance that it was marksman number 1 is: 0.01639 / 0.02648 = 61.9%. Then the posterior estimate of the next shot is a weighted average of the means using the posterior probabilities: (61.9%)(20) + (38.1%)(30) = 23.8. This whole calculation can be arranged in a spreadsheet as follows:
Marksman   Mean   Std. Dev.   A Priori Chance   Chance of Observing 18   Prob. Weight   Posterior Chance   Mean
1          20     12          0.500             0.0328                   0.01639        61.92%             20
2          30     12          0.500             0.0202                   0.01008        38.08%             30
Overall                                                                  0.02648        100.00%            23.81
(Prob. Weight = A Priori Chance x Chance of Observing 18; Posterior Chance = Prob. Weight / Sum of Prob. Weights.)
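The following short sketch (my own code, not from the text) reproduces the single-shot calculation above: targets at 20 and 30, standard deviation 12, and one shot at 18. The helper name normal_pdf is my own.

from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

targets, sigma, shot = [20, 30], 12, 18

# Buhlmann Credibility
epv = sigma ** 2                                                    # 144
prior_mean = sum(targets) / len(targets)                            # 25
vhm = sum((t - prior_mean) ** 2 for t in targets) / len(targets)    # 25
k = epv / vhm                                                       # 5.76
z = 1 / (1 + k)                                                     # 0.148
print(z * shot + (1 - z) * prior_mean)                              # about 24.0

# Bayesian Analysis
weights = [0.5 * normal_pdf(shot, t, sigma) for t in targets]       # 0.01639, 0.01008
posterior = [w / sum(weights) for w in weights]                     # 61.9%, 38.1%
print(sum(p * t for p, t in zip(posterior, targets)))               # about 23.8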
More Shots:
What if instead of a single shot at 18 one observed three shots at 18, 26 and 4 from the same unknown marksman? Then the chance of the observation is now the product of the likelihoods at 18, 26, and 4. Otherwise the Bayesian Analysis proceeds as before and the estimate of the next shot is 21.3:

Marksman   Mean   Std.   A Priori   Chance of      Chance of      Chance of     Chance of      Prob.      Posterior   Mean
                  Dev.   Chance     Observing 18   Observing 26   Observing 4   Observation    Weight     Chance
1          20     12     0.50       0.0328         0.0293         0.0137        1.3147e-5      6.57e-6    86.70%      20
2          30     12     0.50       0.0202         0.0314         0.0032        2.0162e-6      1.01e-6    13.30%      30
Overall                                                                                        7.58e-6    100.0%      21.33
As calculated above, K = 5.76. The credibility of 3 observations is: Z = 3/(3+5.76) = 34.2%. The larger number of observations has increased the credibility. The average of the observations is: (18 + 26 + 4)/3 = 16. The a priori mean is 25. Thus the Buhlmann Credibility estimate of the next shot is: (16)(34.2%) + (25)(65.8%) = 21.9.
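Here is a sketch (my own) extending the earlier code to several shots from the same unknown marksman; with shots 18, 26 and 4 it reproduces the estimates above of roughly 21.9 (Buhlmann) and 21.3 (Bayes). The function name estimates is my own.

from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def estimates(shots, targets=(20, 30), sigma=12):
    n = len(shots)
    prior_mean = sum(targets) / len(targets)
    epv = sigma ** 2
    vhm = sum((t - prior_mean) ** 2 for t in targets) / len(targets)
    z = n / (n + epv / vhm)
    buhlmann = z * sum(shots) / n + (1 - z) * prior_mean

    # Bayes: the likelihood of the whole sample is the product of the densities.
    weights = []
    for t in targets:
        like = 1.0
        for x in shots:
            like *= normal_pdf(x, t, sigma)
        weights.append(0.5 * like)
    total = sum(weights)
    bayes = sum(w / total * t for w, t in zip(weights, targets))
    return buhlmann, bayes

print(estimates([18, 26, 4]))   # roughly (21.9, 21.3)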
Moving the Targets: Assume that the targets were further apart: • There are two marksmen. • The targets for the marksmen are at the points on the number line: 10 and 40. • The distribution of shots from each marksman follows a normal distribution with mean equal to his target value and with standard deviation of 12. Here are 20 simulated shots from each of the marksmen:
Each shot has more informational content than in the previous case. Without the labels, here one would be more likely to be able to correctly determine which marksman made a particular shot, than when the targets were closer together.
Exercise: Assume we observe three shots from an unknown marksman at: 38, 46 and 24. What is the Buhlmann Credibility estimate of the next shot from the same marksman?
[Solution: The EPV is still 144 while the VHM is now: 15² = 225. K = EPV/VHM = 144/225 = 0.64. Z = 3/(3+0.64) = 82.4%. The average of the observations is: (38+46+24)/3 = 36. The a priori mean is 25. Thus the Buhlmann Credibility estimate of the next shot is: (36)(82.4%) + (25)(17.6%) = 34.1.]
The larger VHM has increased the credibility.133
Exercise: Assume we observe three shots from an unknown marksman at: 38, 46 and 24. Use Bayesian Analysis to estimate the next shot from the same marksman.

Marksman   Mean   Std.   A Priori   Chance of      Chance of      Chance of      Chance of      Prob.       Posterior   Mean
                  Dev.   Chance     Observing 38   Observing 46   Observing 24   Observation    Weight      Chance
1          10     12     0.50       0.0022         0.0004         0.0168         1.358e-8       6.792e-9    0.10%       10
2          40     12     0.50       0.0328         0.0293         0.0137         1.315e-5       6.574e-6    99.90%      40
Overall                                                                                         6.580e-6    100.0%      39.97
Altering the Skill of the Marksmen: Return to the original situation, but now assume that the marksmen are more skilled:134 • There are two marksmen. • The targets for the marksmen are at the points on the number line: 20 and 30. • The distribution of shots from each marksman follows a normal distribution with mean equal to his target value and with standard deviation of 3. With a smaller process variance, each shot contains more information about which marksman produced it. Here we can more easily infer which marksman is likely to have made a shot than when the marksmen were less skilled. The smaller EPV has increased the credibility.135
133 If instead one had moved the targets closer together, then the credibility assigned to a single shot would have been less. A smaller VHM leads to less credibility.
134 Alternately, assume the marksmen are shooting from closer to the targets.
135 If instead one had less skilled marksmen, then the credibility assigned to a single shot would have been less. A larger EPV leads to less credibility.
Here are 20 simulated shots from each of the marksmen:
Exercise: Assume we observe three shots from an unknown marksman at: 18, 26 and 4. What is the Buhlmann Credibility estimate of the next shot from the same marksman?
[Solution: The EPV is 3² = 9 while the VHM is 5² = 25. K = EPV/VHM = 9/25 = 0.36. Z = 3/3.36 = 89.3%, more than in the original example. The average of the observations is: (18+26+4)/3 = 16. The a priori mean is 25. The Buhlmann Credibility estimate of the next shot is: (16)(89.3%) + (25)(10.7%) = 17.0.]
Exercise: Assume we observe three shots from an unknown marksman at: 18, 26 and 4. Use Bayesian Analysis to estimate the next shot from the same marksman.

Marksman   Mean   Std.   A Priori   Chance of      Chance of      Chance of     Chance of      Prob.       Posterior   Mean
                  Dev.   Chance     Observing 18   Observing 26   Observing 4   Observation    Weight      Chance
1          20     3      0.50       0.1065         0.0180         0.0000        1.7e-10        8.48e-11    100.00%     20
2          30     3      0.50       0.0000         0.0547         0.0000        1.588e-23      7.94e-24    0.00%       30
Overall                                                                                        8.48e-11    100.0%      20.00
Limiting Situations and Buhlmann Credibility:

As the number of observations approaches infinity, the credibility approaches one. In the target shooting example, as the number of shots approaches infinity, our Buhlmann Credibility estimate approaches the mean of the observations. On the other hand, if we have no observations, then the estimate is the a priori mean. We give the a priori mean a weight of 1, so 1-Z = 1 or Z = 0.

Buhlmann Credibility is given by Z = N / (N + K). In the usual situations where one has a finite number of observations, 0 < N < ∞, one will have 0 < Z < 1 provided 0 < K < ∞. The Buhlmann Credibility is only zero or unity in unusual situations. The Buhlmann Credibility parameter K = EPV / VHM. So K = 0 if EPV = 0 or VHM = ∞. On the other hand, K is infinite if EPV = ∞ or VHM = 0.

The Expected Value of the Process Variance is zero only if one has certainty of results.136 In the case of the Philbrick Target Shooting Example, if all the marksmen were absolutely perfect, then the expected value of the process variance would be zero. In that situation we assign the observation a credibility of unity; our new estimate is the observation.

The Variance of the Hypothetical Means is infinite if one has little or no knowledge and therefore has a large variation in hypotheses.137 In the case of the Philbrick Target Shooting Example, as the targets get further and further apart, the variance of the hypothetical means approaches infinity. We assign the observations more and more weight as the targets get further apart. If one target were in Alaska, another in California, another in Maine and the fourth in Florida, we would give the observation virtually 100% credibility. In the limit, our new estimate is the observation; the credibility is one.

However, in most applications of Buhlmann Credibility the Expected Value of the Process Variance is positive and the Variance of the Hypothetical Means is finite, so that K > 0.

The Expected Value of the Process Variance can be infinite only if the process variance is infinite for at least one of the types of risks. If in an example involving claim severity, one assumed a Pareto distribution with α ≤ 2, then one would have infinite process variance. In the Philbrick Target Shooting example, a marksman would have to be infinitely terrible in order to have an infinite process variance.
136
For example, one could assume that it is certain that the sun will rise tomorrow; there has been no variation of results, the sun has risen every day of which you are aware. 137 For example, an ancient Greek philosopher might have hypothesized that the universe was more than 3000 years old with all such ages equally likely.
As the marksmen get worse and worse, we give the observation less and less weight. In the limit where the location of the shot is independent of the location of the target, we give the observation no weight; the credibility is zero.

The Variance of the Hypothetical Means is zero only if all the types of risks have the same mean. For example, in the Philbrick Target Shooting example, if all the targets are at the same location (or alternately each of the marksmen is shooting at the same target) then the VHM = 0. As the targets get closer and closer to each other, we give the observation less and less weight. In the limit we give the observation no weight; the credibility is zero. In the limit, all the weight is given to the single target.

However, in the usual applications of Buhlmann Credibility there is variation in the hypotheses and there is a finite expected value of process variance, and therefore K is finite. Assuming 0 < K < ∞ and 0 < N < ∞, then 0 < Z < 1. Thus in ordinary circumstances the Buhlmann Credibility is strictly between zero and one.

Buhlmann Credibility Parameter = K = EPV/VHM. Buhlmann Credibility = Z = N / (N + K).
EPV → 0 ⇒ K → 0 ⇒ Z → 1.
EPV → ∞ ⇒ K → ∞ ⇒ Z → 0.
VHM → 0 ⇒ K → ∞ ⇒ Z → 0.
VHM → ∞ ⇒ K → 0 ⇒ Z → 1.
N → ∞ ⇒ Z → 1.

Analysis of Variance:

There are two distinct reasons why the shots from a series of randomly selected marksmen vary: different targets and the imperfection of the marksmen.
Marksmen Perfect ⇔ EPV = 0 ⇔ shots vary only due to separate targets.
Targets the Same ⇔ VHM = 0 ⇔ shots vary only due to the imperfection of the marksmen.
These two effects add: Total Variance = EPV + VHM.
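The following tiny sketch (my own, not the author's) simply evaluates Z = N/(N + EPV/VHM) at a few extreme inputs to illustrate the limits listed above; the values 144 and 25 are the EPV and VHM from the earlier one-dimensional example, and the other inputs are arbitrary stand-ins for "very small" and "very large."

def credibility(n, epv, vhm):
    k = epv / vhm
    return n / (n + k)

print(credibility(1, epv=144, vhm=25))        # ordinary case: strictly between 0 and 1
print(credibility(1, epv=0.001, vhm=25))      # EPV -> 0: Z -> 1
print(credibility(1, epv=144, vhm=1e9))       # VHM -> infinity: Z -> 1
print(credibility(1, epv=144, vhm=0.001))     # VHM -> 0: Z -> 0
print(credibility(10_000, epv=144, vhm=25))   # N -> infinity: Z -> 1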
Bayesian Analysis: The Philbrick Target Shooting example is also useful for illustrating ideas from Bayesian Analysis. If we observe a shot from an unknown marksman, the new estimate using Bayesian Analysis is a weighted average of the locations of the targets.138 Assuming the a priori probabilities of each marksman are equal, the weight applied to each target is proportional to the chance of the shot having come from the corresponding marksman.139 Consider the limiting case where the targets get further and further apart; i.e., the Variance of the Hypothetical Means approaches infinity. As the targets get further and further apart the chance of observing a shot goes quickly to zero for all but the closest target.140 Thus for the Bayesian Analysis approach, (assuming the a priori chance of the closest target is greater than zero), as the targets get further and further apart, the probability weight of the closest target is much larger than any of the others. Thus virtually all of the weight is given to the mean of the closest target, and the posterior estimate approaches the mean of that closest target as the VHM → ∞. This differs from the Buhlmann Credibility approach. Assuming all else stays constant, as the Variance of the Hypothetical Means approaches infinity, the Buhlmann Credibility Parameter K = EPV / VHM approaches zero. Therefore, Z = 1/(1+K) → 1, and the Buhlmann Credibility estimate approaches the observed shot. If the Variance of the Hypothetical Means approaches zero, the hypothetical means get closer and closer. The Bayes Analysis estimate is a weighted average of these hypothetical means, which all approach the overall mean as the VHM → 0. If the EPV approaches zero, then all the process variances approach zero. Therefore, the chance of observing a shot goes quickly to zero for all but the closest target. Thus for the Bayesian Analysis approach, (assuming the a priori chance of the closest target is greater than zero), as the marksmen get better, the probability weight of the closest target is much larger than any of the others. Thus virtually all of the weight is given to the mean of the closest target, and the posterior estimate approaches the mean of that closest target as the EPV → 0. It should be noted that as the EPV → 0, the expected distance between the shot and the closest target also goes to zero. Thus the posterior estimate also approaches the location of the shot as well as the nearest target. 138
In general the Bayesian Estimate is a weighted average of the hypothetical means.
139 In general, the weight applied to each hypothetical mean is proportional to the product of the chance of the observation given that we have the corresponding type of risk times the a priori chance of that type of risk.
140 See for example, the one-dimensional example above. When the targets were moved further apart, the posterior probabilities for targets other than the closest got small. Move the targets even further apart, and the posterior probability for the closest will quickly go to one while the others go to zero.
Problems: Use the following information in the next four questions: There are three marksmen, each of whose shots are Normally Distributed (in one dimension) with means and standard deviations: Risk Mean Standard Deviation A 100 60 B 200 120 C 300 240 12.1 (2 points) A marksman is chosen at random. What is the Buhlmann Credibility Parameter? A. Less than 1 B. At least 1 but less than 2 C. At least 2 but less than 3 D. At least 3 but less than 4 E. At least 4 12.2 (1 point) A marksman is chosen at random. You observe two shots at 90 and 150. Using Buhlmann Credibility, estimate the next shot from the same marksman. A. Less than 130 B. At least 130 but less than 140 C. At least 140 but less than 150 D. At least 150 but less than 160 E. At least 160 12.3 (3 points) A marksman is chosen at random. If you observe two shots at 90 and 150, what is the chance that it was marksman B? A. Less than 10% B. At least 10% but less than 15% C. At least 15% but less than 20% D. At least 20% but less than 25% E. At least 25% 12.4 (1 point) A marksman is chosen at random. If you observe two shots at 90 and 150, what is the Bayesian Estimate of the next shot from the same marksman? A. Less than 130 B. At least 130 but less than 140 C. At least 140 but less than 150 D. At least 150 but less than 160 E. At least 160
12.5 (2 points) You are given the following:
• Four shooters are to shoot at a target some distance away that has the following design:
W  X
Z  Y
• Shooter A hits Area W with probability 1/2 and Area X with probability 1/2.
• Shooter B hits Area X with probability 1/2 and Area Y with probability 1/2.
• Shooter C hits Area Y with probability 1/2 and Area Z with probability 1/2.
• Shooter D hits Area Z with probability 1/2 and Area W with probability 1/2.
Three of the four shooters are randomly selected, and each of the three selected shooters fires one shot. Two shots land in Area X, and one shot lands in Area Z (not necessarily in that order). The remaining shooter (who was not among the three previously selected) then fires a shot. Determine the probability that this shot lands in Area Z.
A. 1/4 B. 1/2 C. 2/3 D. 3/4 E. 1
Use the following information to answer each of the next two questions:
Assume you have two shooters, each of whose shots is given by a (one dimensional) Normal distribution:
Shooter   Mean   Variance
A         +10    9
B         -10    225
Assume a priori each shooter is equally likely. You observe a single shot at +20.
12.6 (2 points) Use Bayes Theorem to estimate the location of the next shot.
A. less than 0 B. at least 0 but less than 2 C. at least 4 but less than 6 D. at least 6 but less than 8 E. at least 8
12.7 (2 points) Use Buhlmann Credibility to estimate the location of the next shot.
A. less than 0 B. at least 0 but less than 2 C. at least 4 but less than 6 D. at least 6 but less than 8 E. at least 8
Use the following information for the next 5 questions:
• There are three shooters P, Q, and R.
• Each shooter is to shoot at his target, P, Q, or R, some distance away.
• The shots of each shooter are distributed over a circle of radius 2 centered at his targeted point. The probability density is given by f(r,θ) = 1/(4πr), where r is the distance from his targeted point, and θ is the angle measured counterclockwise from the vertical.
• The targeted points P, Q, and R are at the vertices of an equilateral triangle with sides of length 1. P is at (0,0), Q is at (1,0) and R is at (1/2, √3/2).
One of the three shooters is randomly selected, and that shooter fires a shot at his targeted point. The shot lands at the point S (-0.4, 0). This same shooter then fires a second shot (at the same point targeted in the first shot.)
12.8 (3 points) Determine the Bayesian analysis estimate of the location of the second shot.
A. (0.140, 0.153) B. (0.278, 0.173) C. (0.320, 0.231) D. (0.500, 0.866) E. None of the A, B, C, or D
12.9 (2 points) What is the Expected Value of the Process Variance?
A. 2/3 B. 1 C. 4/3 D. 5/3 E. 2
12.10 (1 point) What is the Variance of the Hypothetical Means?
A. 1/4 B. 1/3 C. 2/5 D. 1/2 E. 3/5
12.11 (2 points) Determine the Buhlmann Credibility estimate of the location of the second shot. A. (0.320, 0.231) B. (0.140, 0.173) C. ( -0.040, 0.115) D. (-0.220, 0.058) E. None of the A, B, C, or D 12.12 (2 points) The second shot is observed to be (-0.8, -0.9). The third shot is observed to be (-0.3, -0.7). Using the information provided by the location of the first three shots, determine the Buhlmann Credibility estimate of the location of the fourth shot from the same shooter. A. (0.320, 0.231) B. (0.140, 0.173) C. ( -0.040, 0.115) D. (-0.220, 0.058) E. None of the A, B, C, or D
Use the following information for the next two questions:
• There are four marksmen.
• The targets for the marksmen are at the points on the number line: 10, 20, 30, and 40.
• The marksmen only miss to the left or right.
• The distribution of shots from each marksman follows a normal distribution with mean equal to his target value and with standard deviation of 15.
• Three shots from an unknown marksman are observed at 22, 26, and 14.
Normal Distribution: f(x) = exp[-(x-µ)²/(2σ²)] / (σ√(2π)), -∞ < x < ∞, with mean = µ and variance = σ².
12.13 (3 points) Use Buhlmann Credibility to predict the location of the next shot from the same marksman. A. less than 20 B. at least 20 but less than 21 C. at least 21 but less than 22 D. at least 22 but less than 23 E. at least 23 12.14 (4 points) Use Bayesian Analysis to predict the location of the next shot from the same marksman. A. less than 20 B. at least 20 but less than 21 C. at least 21 but less than 22 D. at least 22 but less than 23 E. at least 23
Use the following information for the next 4 questions:
With equal probability, one of two people will be shooting at one of two targets. Each person aims for a different target. You observe one shot (but not the shooter). Assume each person's shots are normally distributed around their target, with standard deviation 1. Define a coordinate system with one target at +m, and the other at -m. The observed shot was at x.
Normal Distribution: f(x) = exp[-(x-µ)²/(2σ²)] / (σ√(2π)), -∞ < x < ∞, with mean = µ and variance = σ².
12.15 (4, 5/83, Q.47a) (2 points) Using Buhlmann Credibility, estimate where the next shot by the same person will appear.
A. x/(1 + m²)
B. xm/(1 + m²)
C. (m - x)/(1 + m²)
D. xm²/(1 + m²)
E. None of A, B, C, or D.
12.16 (4, 5/83, Q.47c) (1 point) For a fixed observed shot x, what happens to your estimate in the prior question as the distance between the targets, 2m, gets very large?
A. It approaches zero.
B. It approaches m if x > 0, and -m if x < 0.
C. It approaches x, the observed shot
D. None of A, B, or C
E. Can not be determined
12.17 (4, 5/83, Q.47b) (2 points) Using Bayes' Theorem, estimate where the next shot by the same person will appear.
A. m [exp(mx) + exp(-mx)] / [exp(mx) - exp(-mx)]
B. m [exp(mx) - exp(-mx)] / [exp(mx) + exp(-mx)]
C. x [exp(mx) + exp(-mx)] / [exp(mx) - exp(-mx)]
D. x [exp(mx) - exp(-mx)] / [exp(mx) + exp(-mx)]
E. None of the above. 12.18 (4, 5/83, Q.47c) (1 point) For a fixed observed shot x≠0, what happens to your estimate in the prior question as the distance between the targets, 2m, gets very large? A. It approaches zero. B. It approaches m if x > 0, and -m if x < 0. C. It approaches x, the observed shot D. None of A, B, or C E. Can not be determined
12.19 (4, 5/88, Q.36) (1 point) In reference to Philbrick's gun shot example, which of the following statements are correct? 1. The variance of the hypothetical means increases as the relative distance between the means increases. 2. The credibility of a single observation will be increased as the variance of the hypothetical means increases. 3. Using a Bayesian credibility approach, the best estimate of the location of a second shot after observing a single shot is somewhere on a line connecting the mean of all of the clusters and the mean of the cluster to which the observed shot is closest. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 12.20 (4, 5/90, Q.27) (1 point) Which of the following will increase the credibility of your body of data? 1. A larger number of observations. 2. A smaller process variance. 3. A smaller variance of the hypothetical means. A. 1 B. 2 C. 1, 2 D. 1, 3 E. 1, 2, 3 12.21 (4B, 5/92, Q.13) (2 points) The following is based upon Philbrick's target shooting model with marksmen a, b, c and d each shooting at targets with mean target hits A, B, C and D, respectively, and overall mean E. Given the observation of a single shot X without knowing which marksman fired the shot, which of the following are true concerning the prediction of the same marksman's next shot, Y? 1. If Y is predicted using Bayesian analysis, then Y = F where F is the revised mean of the marksmen determined using the posterior probabilities that shot X was fired by each of the marksmen a, b, c, d. 2. The Buhlmann credibility estimate of Y is equivalent to a linear interpolation of the points X and E where the point X is given weight Z= N/(M+N) and E is given weight 1-Z, M is the expected variance of the marksmen's shots and N is the variance of the mean shots of the marksmen. 3. If Y is predicted using Bayesian analysis, then it is possible that Y is farther in absolute distance from E than both the observed shot X and the Buhlmann credibility estimate for Y. A. 1 only B. 1 and 2 only C. 1 and 3 only D. 2 and 3 only E. 1, 2, and 3
Use the following information for the next two questions: Consider a one-dimensional example of Philbrick's target shooting model such that the marksmen only miss to the left or right. • There are four marksmen. • Each marksman's target is initially 100 feet from him. • The initial targets for the marksmen are at the points 2, 4, 6, and 8 on a number line (measured in one-foot increments). • The accuracy of each marksman follows a normal distribution with mean equal to his target value and with standard deviation directly proportional to the distance from the target. At a distance of 100 feet from the target,the standard deviation is 3 feet. • By observing where an unknown marksman's shot hits the number line, you want to predict the location of his next shot. 12.22 (4B, 11/93, Q.3) (1 point) Determine the Buhlmann credibility assigned to a single shot of a randomly selected marksman. A. Less than 0.10 B. At least 0.10, but less than 0.20 C. At least 0.20, but less than 0.30 D. At least 0.30, but less than 0.40 E. At least 0.40 12.23 (4B, 11/93, Q.4) (3 points) Which of the following will increase Buhlmann credibility the most? A. Revise targets to 0, 4, 8, and 12. B. Move marksmen to 60 feet from targets. C. Revise targets to 2, 2, 10, and 10. D. Increase number of observations from same marksman to 3. E. Move two marksmen to 50 feet from targets and increase number of observations from same selected marksman to 2. 12.24 (4B, 11/94, Q.14) (1 point) Which of the following statements are true? 1. As the process variance goes to zero, the credibility associated with the current observations goes to one. 2. As the variance of the hypothetical means increases, the credibility associated with the current observations will increase. 3. As the variance of the hypothetical means increases, estimates produced by the Buhlmann credibility method approach those produced by a pure Bayesian analysis method. A. 1 B. 3 C. 1, 2 D. 2, 3 E. 1, 2, 3
12.25 (4B, 11/94, Q.23) (2 points) Two marksmen shoot at a target. For each marksman, point values are assigned based upon the location of each shot. The three possible locations and point values are: Location Point Value Hit center of target 50 points Hit target, but not center 10 points Miss target completely 0 points Probabilities for the shot locations for each marksman are: Probability of Marksman Hitting Center Hitting Target, But Not Center Missing Target A 0.01 0.09 0.90 B 0.20 0.45 0.35 A marksman is randomly selected and his shots are observed. Determine the expected score of the 21st shot if the first 20 shots all missed the target. A. 0.00 B. 1.40 C. 7.95 D. 14.50 E. Cannot be determined from the given information. 12.26 (4B, 5/95, Q.25) (1 point) Philbrick uses a target shooting example to help explain credibility. For this problem, consider the limiting case where the variance of the hypothetical means approaches infinity. Assume that the location of the shot, the mean of the closest cluster to the shot, and the population mean are all known. Match each technique with its resulting best estimate of the location of the next shot from the same shooter. Technique Estimate of Location of Next Shot B = Pure Bayesian approach 1 = Location of the shot C = Buhlmann credibility 2 = Mean of the closest cluster 3 = Mean of the population A. B with 1, C with 1 B. B with 1, C with 2 C. B with 2, C with 1 D. B with 2, C with 2 E. B with 2, C with 3
Use the following information for the next two questions:
• A shooter is to shoot at one of three points, X, Y, or Z, on a target some distance away.
• The shots of the shooter are uniformly distributed over a circle of radius 1 centered at the targeted point.
• X, Y, and Z are at the vertices of an equilateral triangle with sides of length 1.
• G is the point equidistant from X, Y, and Z at the center of the triangle.
• M is the point halfway between X and Y on the line segment joining X and Y. One of the three points, X, Y, or Z, is randomly selected, and the shooter fires a shot at this point. The shot lands at the point S, halfway between X and M on the line segment joining X and M. The shooter then fires a second shot at the same point targeted in the first shot. 12.27 (4B, 11/96, Q.17) (2 points) Determine the Bayesian analysis estimate of the location of the second shot. A. X B. G C. M D. S E. A point other than X, G, M, or S 12.28 (4B, 11/96, Q.18) (2 points) Determine the Buhlmann credibility estimate of the location of the second shot. A. X B. G C. M D. S E. A point other than X, G, M, or S
Use the following information for the next two questions: • A shooter is to shoot at one of three points, X, Y, or Z, on a target some distance away. • The shots of the shooter are uniformly distributed inside a circle of radius 2/3 that is centered at the targeted point. • X, Y, and Z are at the vertices of an equilateral triangle with sides of length 1. • G is the point equidistant from X, Y, and Z at the center of the triangle. • M is the point halfway between X and Y on the line segment joining X and Y. • N is the point halfway between X and Z on the line segment joining X and Z. • P is the point halfway between M and N on the line segment joining M and N. One of the three points, X, Y, or Z, is randomly selected, and the shooter fires a shot at this point. The shot lands at the point M. The shooter then fires a second shot at the same point targeted in the first shot. 12.29 (4B, 5/97, Q.14) (2 points) Determine the Bayesian analysis estimate of the location of the second shot. A. X B. Y C. Z D. G E. M 12.30 (4B, 5/97, Q.15) (2 points) The second shot lands at the point N. The shooter then fires a third shot at the same point targeted in the first two shots. Determine the Bayesian analysis estimate of the location of the third shot. A. X B. G C. M D. N E. P 12.31 (4B, 11/97, Q.14) (2 points) You are given the following: • Four shooters are to shoot at a target some distance away that has the following design:
W  X
Z  Y
• Shooter A hits Area W with probability 1/2 and Area X with probability 1/2. • Shooter B hits Area X with probability 1/2 and Area Y with probability 1/2. • Shooter C hits Area Y with probability 1/2 and Area Z with probability 1/2. • Shooter D hits Area Z with probability 1/2 and Area W with probability 1/2. Three of the four shooters are randomly selected, and each of the three selected shooters fires one shot. One shot lands in Area W, one shot lands in Area X, and one shot lands in Area Y (not necessarily in that order). The remaining shooter (who was not among the three previously selected) then fires a shot. Determine the probability that this shot lands in Area Z. A. 1/4 B. 1/2 C. 2/3 D. 3/4 E. 1
12.32 (4B, 5/98 Q.12) (1 point) You are given the following: • Six shooters are to shoot at a target some distance away that has the following design:
W  X
Z  Y
• Shooter A hits Area W with probability 1/2 and Area X with probability 1/2. • Shooter B hits Area W with probability 1/2 and Area Y with probability 1/2. • Shooter C hits Area W with probability 1/2 and Area Z with probability 1/2. • Shooter D hits Area X with probability 1/2 and Area Y with probability 1/2. • Shooter E hits Area X with probability 1/2 and Area Z with probability 1/2. • Shooter F hits Area Y with probability 1/2 and Area Z with probability 1/2. Five of the six shooters are randomly selected, and each of the five selected shooters fires one shot. Three shots land in Area W and two shots lands in Area Y (not necessarily in that order). The remaining shooter (who was not among the five previously selected) then fires a shot. Determine the probability that this shot lands in Area Y. A. 0 B. 1/4 C. 1/3 D. 1/2 E. 1
Solutions to Problems:
12.1. D. The expected value of the process variance is 25,200. The variance of the hypothetical means is: 46,667 - 200² = 6,667. K = EPV / VHM = 25,200 / 6,667 = 3.8.

Type of    A Priori   Mean   Square    Standard    Process
Marksman   Chance            of Mean   Deviation   Variance
A          0.333      100    10,000    60          3,600
B          0.333      200    40,000    120         14,400
C          0.333      300    90,000    240         57,600
Average               200    46,667                25,200
12.2. E. Z = N / (N + K) = 2 / (2 + 3.8) = 0.345. The average observation is: (90 + 150)/2 = 120. The a priori mean = 200. Thus the estimate of the next shot is: (0.345)(120) + (1 - 0.345)(200) = 172.
12.3. C. The density for a Normal Distribution with mean µ and standard deviation σ is given by f(x) = exp(-0.5{(x-µ)/σ}²) / (σ√(2π)). Thus the density function at 90 for Marksman A is: exp(-0.5{(90-100)/60}²) / (60√(2π)) = 0.00656.

Type of    Mean   Std.   A Priori   Chance of      Chance of       Chance of       Probability   Posterior
Marksman          Dev.   Chance     Observing 90   Observing 150   the Observat.   Weight        Chance
A          100    60     0.333      0.00656        0.00470         3.081e-5        1.027e-5      78.97%
B          200    120    0.333      0.00218        0.00305         6.657e-6        2.219e-6      17.06%
C          300    240    0.333      0.00113        0.00137         1.550e-6        5.167e-7      3.97%
Overall                                                                            1.301e-5      100.0%
12.4. A. Use the results of the previous question to weight together the a priori means:

Marksman   Posterior Chance of This Type of Risk   A Priori Mean
A          78.97%                                   100
B          17.06%                                   200
C          3.97%                                    300
Overall                                             125
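Here is a sketch (my own code) reproducing solutions 12.1-12.4 above for the three marksmen with means 100/200/300, standard deviations 60/120/240, and shots at 90 and 150.

from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

means, sds, shots = [100, 200, 300], [60, 120, 240], [90, 150]

epv = sum(s ** 2 for s in sds) / 3                       # 25,200
grand_mean = sum(means) / 3                              # 200
vhm = sum((m - grand_mean) ** 2 for m in means) / 3      # 6,667
k = epv / vhm                                            # 3.8  (12.1)

z = 2 / (2 + k)
print(z * 120 + (1 - z) * grand_mean)                    # about 172  (12.2)

weights = []
for m, s in zip(means, sds):
    w = 1 / 3
    for x in shots:
        w *= normal_pdf(x, m, s)
    weights.append(w)
posterior = [w / sum(weights) for w in weights]
print(posterior[1])                                      # about 17%  (12.3)
print(sum(p * m for p, m in zip(posterior, means)))      # about 125  (12.4)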
12.5. B. For convenience add the shooters around the outer edge of the diagram:
        A
    W   |   X
D  -----+-----  B
    Z   |   Y
        C
Work out the chance of the observation given that each of the shooters was excluded. • If Shooter A was excluded, then there could have been at most one shot in Area X. Thus the probability of the observation if shooter A was excluded is 0. • If Shooter B was excluded, then there could have been at most one shot in Area X. Thus the probability of the observation if shooter B was excluded is 0. • If Shooter C was excluded, then Shooter D must have hit area Z, (since D could only hit Z or W, and no shot was observed to have hit W.) Shooter A and Shooter B must have hit area X, (since there are two shots observed in Area X.) The probability of this observation is: (1/2)(1/2)(1/2) = 1/8. • If Shooter D was excluded, then Shooter C must have hit area Z, (since C could only hit Z or Y, and no shot was observed to have hit Y.) Shooter A and Shooter B must have hit area X, (since there are two shots observed in Area X.) The probability of this observation is: (1/2)(1/2)(1/2) = 1/8. Then as computed in the spreadsheet, there is a 50% chance that Area Z will be hit. A
Excluded   A Priori   Chance of the   Prob.     Posterior   Chance of
Marksman   Chance     Observation     Weight    Chance      Hitting Area Z
A          0.25       0               0.00000   0%          0.0
B          0.25       0               0.00000   0%          0.0
C          0.25       0.125           0.03125   50%         0.5
D          0.25       0.125           0.03125   50%         0.5
Overall                               0.0625    100%        50%
Comment: The posterior probabilities of the shot landing in the other areas are: X: 0, Y: 25%, and W: 25%.
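A brute-force sketch (my own, not the author's method) for problem 12.5: enumerate which shooter sat out and what each selected shooter hit, keep the outcomes matching the observation (two shots in X, one in Z), and average the remaining shooter's chance of hitting Z.

from itertools import product

areas = {"A": ("W", "X"), "B": ("X", "Y"), "C": ("Y", "Z"), "D": ("Z", "W")}

weight_z = total = 0.0
for excluded in areas:
    selected = [s for s in areas if s != excluded]
    for hits in product(*(areas[s] for s in selected)):      # each outcome has prob (1/2)^3
        if sorted(hits) == ["X", "X", "Z"]:
            p = 0.25 * (0.5 ** 3)                             # a priori 1/4 for the excluded shooter
            total += p
            weight_z += p * (0.5 if "Z" in areas[excluded] else 0.0)

print(weight_z / total)    # 0.5, matching answer B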
12.6. A. The Normal Distribution has a density function of: f(x) = exp[-(x-µ)²/(2σ²)] / (σ√(2π)). The probability weights are the chance of the observation times the a priori probability.

Shooter   A Priori      Mean   Standard    Chance of    Probability   Posterior     Mean
          Probability          Deviation   Shot at 20   Weight        Probability
A         0.5           10     3           0.000514     0.000257      0.1250        10
B         0.5           -10    15          0.003599     0.001800      0.8750        -10
Sum                                                     0.002057      1.0000        -7.50
Comment: Note that the new estimate of -7.5 is outside the interval formed by the observed value of 20 and the overall mean of 0. While this can happen for estimates based on Bayes Theorem, this can not occur for estimates based on Credibility.
12.7. E. The expected value of the process variance is 117. The variance of the hypothetical means is: 100 - 0² = 100. K = EPV / VHM = 117 / 100 = 1.17. Z = 1 / (1 + 1.17) = 0.461. Estimate = (0.461)(20) + (1 - 0.461)(0) = 9.2.

Shooter   A Priori Probability   Process Variance   Mean   Square of Mean
A         0.5                    9                  10     100
B         0.5                    225                -10    100
Average                          117                0      100
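A sketch (my own) for solutions 12.6-12.7: with shooters at +10 (standard deviation 3) and -10 (standard deviation 15) and a single shot at +20, the Bayes estimate (-7.5) falls outside the interval between the observation (20) and the prior mean (0), while the Buhlmann estimate (9.2) cannot.

from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

means, sds, shot = [10, -10], [3, 15], 20

weights = [0.5 * normal_pdf(shot, m, s) for m, s in zip(means, sds)]
posterior = [w / sum(weights) for w in weights]
print(sum(p * m for p, m in zip(posterior, means)))   # about -7.5 (Bayes)

epv = sum(s ** 2 for s in sds) / 2                    # 117
vhm = sum(m ** 2 for m in means) / 2 - 0 ** 2         # 100
z = 1 / (1 + epv / vhm)                               # 0.461
print(z * shot + (1 - z) * 0)                         # about 9.2 (Buhlmann)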
12.8. B. The point S is at (-0.4, 0) and is at a distance of 0.4 from P, 1.4 from Q, and {0.9² + (√3/2)²}^0.5 = 1.249 from R. Thus if the shooter is P, the probability density function at S is 1/(4π(0.4)). If the shooter is Q, the probability density function at S is 1/(4π(1.4)). If the shooter is R, the probability density function at S is 1/(4π(1.249)). The posterior probability of shooter P is proportional to the product of the a priori probability of shooter P and the density function at S given shots at target P, in this case (1/3)/(4π(0.4)). Similarly, one gets probability weights for Targets Q and R of: (1/3)/(4π(1.4)) and (1/3)/(4π(1.249)). Thus the posterior probabilities are proportional to: 1/0.4 = 2.5, 1/1.4 = 0.714, and 1/1.249 = 0.801. Thus the posterior probabilities are: 0.6227, 0.1779, and 0.1994. Thus the posterior estimate is: (0.6227)P + (0.1779)Q + (0.1994)R = (0.6227)(0,0) + (0.1779)(1, 0) + (0.1994)(1/2, √3/2) = (0.278, 0.173).
Target   A Priori      Chance of the   Prob.     Posterior   x-value   y-value
         Probability   Observation     Weight    Chance
P        33.33%        0.19894         0.06631   62.27%      0.000     0.000
Q        33.33%        0.05684         0.01895   17.79%      1.000     0.000
R        33.33%        0.06371         0.02124   19.94%      0.500     0.866
Overall                                0.10650   100.00%     0.278     0.173
Comment: Beyond what you are likely to be asked on the exam. Since it is a weighted average of P, Q, and R, with weights between 0 and 1, the Bayesian Estimate is within the triangle PQR; estimates from Bayesian Analysis are always within the range of hypotheses.
12.9. C. The process variance is the expected squared distance of the observation from its expected value. If we are shooting at target P, then the process variance is the expected squared distance of the shot from P:
∫[θ=0 to 2π] ∫[r=0 to 2] (r²){1/(4πr)} r dr dθ = (1/(4π)) ∫[θ=0 to 2π] ∫[r=0 to 2] r² dr dθ = (8/(12π)) ∫[θ=0 to 2π] dθ = 4/3.
Thus the process variance for shooting at this target is 4/3. The process variance for the other two targets is the same, so that the EPV = 4/3.
Comment: Difficult. Beyond what you are likely to be asked on the exam.
12.10. B. The Variance of the Hypothetical Means is the (weighted) average squared distance of the targets from their grand mean M. In this case P is at (0,0), Q is at (1,0) and R is at (1/2, √3/2), then M is at (P/3) + (Q/3) + (R/3) = (1/2, √3/6). The squared distance of M to any of the targets is then 1/4 + 3/36 = 1/3. Therefore VHM = 1/3.
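A numerical check (my own sketch) of solutions 12.9-12.10. Because the density is f(r,θ) = 1/(4πr) on a disk of radius 2, the distance r from the target has a uniform marginal distribution on [0, 2], so E[r²] (the process variance) can be checked by simulation; the VHM is the average squared distance of the three targets from their centroid.

import random

# EPV: expected squared distance of a shot from its target.
random.seed(0)
r2 = [random.uniform(0, 2) ** 2 for _ in range(1_000_000)]
print(sum(r2) / len(r2))                      # about 4/3

# VHM for targets P=(0,0), Q=(1,0), R=(1/2, sqrt(3)/2).
targets = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
gx = sum(x for x, _ in targets) / 3
gy = sum(y for _, y in targets) / 3
print(sum((x - gx) ** 2 + (y - gy) ** 2 for x, y in targets) / 3)   # 1/3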
12.11. A. Using Buhlmann Credibility the estimate of the next shot will be a weighted average of the prior mean M and the average of the observation(s), in this case S. Thus the new estimate will be somewhere on the line between the point M = (1/2, √3/6) and the point S = (-0.4, 0). The Buhlmann Credibility Parameter K = EPV / VHM = (4/3)/(1/3) = 4. Thus for one observation Z = 1/(1+K) = 1/5. Thus the estimate of the next shot = (1/5)S + (4/5)M = (1/5)(-0.4, 0) + (4/5)(1/2, √3/6) = (0.320, 0.231).
Comment: Difficult. Beyond what you are likely to be asked on the exam; it is extremely unlikely that you will be asked to compute the credibility in a two dimensional situation. See the following diagram:
[Diagram: targets P, Q, R with grand mean M and observed shot S; the Buhlmann Credibility Estimate (*) lies on the segment between S and M, and the Bayesian Estimate (+) lies inside the triangle.]
12.12. E. The Buhlmann Credibility Parameter K = EPV / VHM = (4/3)/(1/3) = 4. Thus for three observations Z = 3/(3+K) = 3/7. The average of the three observations is: ((-0.4, 0) + (-0.8, -0.9) + (-0.3, -0.7))/3 = (-0.500, -0.533). Thus the estimate of the next shot = (3/7)(-0.500, -0.533) + (4/7)(1/2, √3/6) = (0.071, -0.063).
Comment: Note that the estimate is outside the triangle PQR. The estimates that result from Buhlmann Credibility are occasionally outside the range of hypotheses.
12.13. D. The process variance for every marksman is assumed to be the same and equal to 15² = 225. Thus the EPV = 225. The overall mean is 25 and the VHM is: {(10-25)² + (20-25)² + (30-25)² + (40-25)²}/4 = 125. K = EPV / VHM = 225/125 = 1.8. Z = 3/(3+1.8) = 62.5%. The average observation: (22+26+14)/3 = 20.67. The estimate of the next shot is: (20.67)(62.5%) + (25)(37.5%) = 22.3.
12.14. C. The estimate of the next shot is 21.2:

Marksman   Mean   Std.   A Priori   Chance of      Chance of      Chance of      Chance of     Prob.      Posterior   Mean
                  Dev.   Chance     Observing 22   Observing 26   Observing 14   Observation   Weight     Chance
1          10     15     0.25       0.0193         0.0151         0.0257         7.4641e-6     1.87e-6    22.22%      10
2          20     15     0.25       0.0264         0.0246         0.0246         1.5889e-5     3.97e-6    47.31%      20
3          30     15     0.25       0.0231         0.0257         0.0151         8.9163e-6     2.23e-6    26.55%      30
4          40     15     0.25       0.0129         0.0172         0.0059         1.3189e-6     3.30e-7    3.93%       40
Overall                                                                                        8.40e-6    100.0%      21.2
12.15. D. The a priori mean is 0. The variance of the hypothetical means is m². The process variance for each marksman is 1² = 1. Therefore, the EPV = 1. K = EPV/VHM = 1/m². Thus for one observation, Z = 1/(1+K) = m²/(1+m²). Thus the new estimate is (x){m²/(1+m²)} + 0{1/(1+m²)} = xm²/(1+m²).
12.16. C. As m → ∞, xm²/(1+m²) → x. In other words, as m approaches ∞, the VHM gets large, Z approaches 1, and the estimate approaches the observation.
12.17. B. The posterior distribution is proportional to the product of the chance of the observation and the a priori chance of having each target. The targets are equally likely, so their a priori probability is each 1/2. Given that the shooter is aiming at a target with mean µ, the chance of the observation is: (1/√(2π)) exp(-(x-µ)²/2). Thus the posterior chance of the shooter aiming at the target at +m is proportional to: exp(-(x-m)²/2), while the posterior chance of the target at -m is proportional to: exp(-(x+m)²/2). Thus the new estimate is: {m exp(-(x-m)²/2) - m exp(-(x+m)²/2)} / {exp(-(x-m)²/2) + exp(-(x+m)²/2)} = {m exp(-(x²+m²)/2)(exp(mx) - exp(-mx))} / {exp(-(x²+m²)/2)(exp(mx) + exp(-mx))} = m{exp(mx) - exp(-mx)}/{exp(mx) + exp(-mx)}.
12.18. B. As m → ∞, if x > 0, then m{exp(mx) - exp(-mx)}/{exp(mx) + exp(-mx)} → m exp(mx)/exp(mx) = m. As m → ∞, if x < 0, then m{exp(mx) - exp(-mx)}/{exp(mx) + exp(-mx)} → -m exp(-mx)/exp(-mx) = -m. As the targets get further apart the Bayesian Analysis estimate approaches the closer target.
Comment: For an observation of zero, the Bayes Analysis estimate is zero, regardless of m.
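A sketch (my own) checking the algebraic answers to 12.15 and 12.17 numerically for one arbitrary choice of m and x: the Buhlmann estimate xm²/(1+m²) and the Bayes estimate m(e^{mx} - e^{-mx})/(e^{mx} + e^{-mx}), which equals m·tanh(mx).

from math import exp, pi, sqrt, tanh

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

m, x = 2.0, 0.7

# Buhlmann: EPV = 1, VHM = m^2, K = 1/m^2, Z = m^2/(1 + m^2).
z = m ** 2 / (1 + m ** 2)
print(z * x, x * m ** 2 / (1 + m ** 2))          # identical

# Bayes: posterior-weighted average of the two targets +m and -m.
w_plus, w_minus = normal_pdf(x, m, 1), normal_pdf(x, -m, 1)
bayes = (m * w_plus - m * w_minus) / (w_plus + w_minus)
print(bayes, m * tanh(m * x))                    # identical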
12.19. B. 1. True. 2. True. As the targets get further apart the credibility assigned to a shot increases. 3. False. Using a Bayesian (Buhlmann) credibility approach, the best estimate of the location of a second shot after observing a single shot is somewhere on a line connecting the mean of all of the clusters and the observed shot. 12.20. C. 1. True. The more shots you observe, the more weight you give to the mean observation. 2. True. The better the marksmen, the more the credibility given the observation. 3. False. The closer the targets, the more weight is given to the overall mean and the less weight is given to the shot. 12.21. E. 1. T. The definition of the Bayesian Estimate. 2. T. The definition of Buhlmann Credibility estimate, plus the fact that the Buhlmann Credibility Z = N/(N+K) which for one shot is Z = 1/(1+K) = 1/(1+ EPV/VHM) = VHM / (VHM + EPV). VHM = is the variance of the mean shots of the marksmen, while EPV = the expected variance of the marksmen's shots. 3. T. While the Buhlmann Credibility estimate is between E and X, the Bayesian estimate need not be so. The Bayesian estimate can be further from the overall mean, E, than the observed shot, X, let alone than the Buhlmann Estimate. Comment: The Buhlmann Credibility estimate must be on the straight line between E and X. The Bayesian Estimate being a weighted average of A, B, C and D (with weights between 0 and 1) must be somewhere within or on the square formed by the targets. Depending on the particular situation the Bayesian Analysis estimate could be either closer to or further from E than is the Buhlmann Credibility Estimate. The situation in statement 3 could look as follows if the Bayesian Estimate were further from E: A
[Diagram: targets A, B, C, D at the corners of a square with overall mean E, an observed shot X, the Buhlmann Estimate (*) on the segment between E and X, and the Bayesian Estimate (+) farther from E than X.]
12.22. D. Expected Value of the Process Variance = 3² = 9. Overall mean is 5. The Variance of the Hypothetical Means = (1/4){(5-2)² + (5-4)² + (5-6)² + (5-8)²} = 5. K = 9/5. Z = 1/(1 + 9/5) = 5/14 = 0.357.
12.23. A.
A. Expected Value of the Process Variance remains = 9. Overall mean is 6. The Variance of the Hypothetical Means = (1/4){(6-0)² + (6-4)² + (6-8)² + (6-12)²} = 20. K = 9/20. Z = 1/(1 + 9/20) = 20/29 = 0.690.
B. New standard deviation = (3)(6/10) = 1.8. EPV = 1.8² = 3.24. VHM = 5. K = 3.24/5 = 0.648. Z = 1/1.648 = 0.607.
C. VHM = (10-6)² = 16. K = 9/16. Z = 1/1.563 = 0.640.
D. K = 9/5. Z = 3/(3 + 1.8) = 0.625.
E. The standard deviation for the two closer marksmen = (3)(5/10) = 1.5. One needs to average the process variances of the two closer and two further marksmen. Expected Value of the Process Variance = (1/2)(1.5² + 3²) = 5.625. K = 5.625/5 = 1.125. Z = 2/(2 + 1.125) = 0.640.
Comment: Too long for three points! Good review of the ways to increase Buhlmann Credibility: A & C: Increase Variance of the Hypothetical Means, B & E: Decrease the Expected Value of the Process Variance, D & E: Increase Number of Observations.
12.24. C. 1. T, 2. T, 3. F. As the variance of the hypothetical means increases, the Buhlmann credibility estimate approaches the observed shot, while the Bayesian analysis estimate approaches the cluster closest to the shot.
12.25. B. The probability of Marksman A missing the target 20 times is proportional to 0.9²⁰. The probability of Marksman B missing the target 20 times is proportional to 0.35²⁰. Thus Bayes Analysis gives virtually all the weight to Marksman A. Marksman A has a mean score of (0.01)(50) + (0.09)(10) = 1.4.

Marksman   A Priori      Chance of     Probability   Posterior     Mean
           Probability   Observation   Weight        Probability   Score
A          0.5           0.1216        0.0608        1.0000        1.4
B          0.5           7.61e-10      3.80e-10      6.26e-9       14.5
Sum        1.0000                      0.0608        1.0000        1.400
Comment: Tests understanding of the basic concept of Bayesian analysis. If not told otherwise use Bayesian Analysis, rather than its linear approximation Buhlmann Credibility. Note that in this case, if one had been asked to apply Buhlmann Credibility, the solution would be as follows:

Marksman   A Priori      Process    Hypothetical   Square
           Probability   Variance   Mean           of Mean
A          0.5           32.04      1.40           1.96
B          0.5           334.75     14.50          210.25
Sum        1.00          183.395    7.950          106.105

For Marksman A the process variance is: {(0.01)(50)² + (0.09)(10)² + (0.9)(0)²} - (1.4)² = 32.04. EPV = 183.395, VHM = 106.105 - (7.950)² = 42.9025. K = 183.395 / 42.9025 = 4.275. For 20 observations, Z = 20/24.275 = 82.4%. The observation is an average score of 0, and the a priori estimate is 7.95. Thus the estimate using Buhlmann Credibility is: (0)(82.4%) + (7.95)(17.6%) = 1.399. That the estimates based on Bayesian Analysis and Buhlmann Credibility are virtually the same is a coincidence! For example, if instead we had observed 40 shots all of which had missed the target, then the Bayesian Analysis estimate would again be 1.4. However, the Buhlmann Credibility would now be 40/44.275 = 90.3%; thus the estimate using Buhlmann Credibility would now be: (0)(90.3%) + (7.95)(9.7%) = 0.771.
12.27. B. For shots at each target the probability density is 1/π uniform over a unit circle centered at that target. (The area of a unit circle is π.) The point S is inside a unit circle centered at either X, Y or Z. Thus each of the three probability density functions is 1/π at S. The posterior probability of Target X is proportional to the product of the a priori probability of Target X and the density function at S given shots at target X, in this case (1/3)(1/π) = 1/ 3π. Similarly, one gets an equal probability weight for Targets Y or Z. Thus the posterior probabilities are equal to 1/3. Thus the posterior estimate is: (1/3)X + (1/3)Y + (1/3)Z = G.
12.28. E. Using Buhlmann Credibility the estimate of the next shot will be a weighted average of the prior mean G and the average of the observation(s), in this case S. Thus the new estimate will be somewhere on the line between the point G and the point S. This is a point other than X, G, M, or S.
Comment: It is extremely unlikely that you will be asked to compute the credibility in a two dimensional situation as above. Here is a diagram showing the estimate using Buhlmann Credibility, with Z = 40%:
[Diagram: targets X, Y, Z with center G, observed shot S, and the Buhlmann Credibility estimate on the segment between S and G.]
The Buhlmann Credibility minimizes the expected squared errors, which are defined in terms of the squared distances between points. The process variance is the expected squared distance of observation from its expected value. If we are shooting at target X, then the process variance is the expected squared distance of the shot from X. The density is 1/π over a unit circle. Thus the expected squared distance from X is:
∫[θ=0 to 2π] ∫[r=0 to 1] (r²)(1/π) r dr dθ = ∫[θ=0 to 2π] ∫[r=0 to 1] r³/π dr dθ = ∫[θ=0 to 2π] 1/(4π) dθ = 1/2.
Thus the process variance for shooting at this target is 1/2. The process variance for the other two targets is the same, so that the Expected Value of the Process Variance = EPV = 1/2. The Variance of the Hypothetical Means is the (weighted) average squared distance of the targets from their mean of G. In this case if X is at (0,0), Y is at (1,0) and Z is at (1/2, √3/2), then G is at (1/2, √3/6). The squared distance of G to any of the targets is then 1/4 + 3/36 = 1/3. Therefore VHM = 1/3. K = EPV / VHM = (1/2)/(1/3) = 3/2. Z = 1/(1+K) = 1/2.5 = 0.4. Thus the new estimate is: 0.4S + 0.6G = (0.4)(1/4, 0) + (0.6)(1/2, √3/6) = (0.4, √3/10).
12.29. E. The posterior probabilities of the targets are proportional to the product of the chance of the observation given each target and the a priori probability of each target. Since the a priori probabilities of the targets are all equal, the posterior probabilities are proportional to the chance of the observation given each target. Thus, the posterior probabilities are proportional to the density functions at the observed shot M. M is a distance of 1/2 from either X or Y, and a distance from Z of √3/2 = 0.866 > 2/3. Therefore, if the target is Z, then the density at M is zero. If the target is either X or Y, then the density at M is: 9/(4π). Therefore, the posterior probabilities are proportional to: 9/(4π), 9/(4π), and 0. Thus the posterior probabilities are: 1/2, 1/2, and 0. Thus the Bayesian analysis estimate of the location of a second shot is: (1/2)X + (1/2)Y + (0)Z = M.
[Diagram: the equilateral triangle X, Y, Z with center G, midpoints M (of XY) and N (of XZ), and the point P between M and N.]
Comment: For shots at each target the probability density is 9/(4π), uniform over a circle of radius 2/3 centered at that target, since the area of a circle of radius 2/3 is π(2/3)². The triangle XMZ is a right triangle. Its hypotenuse XZ is length 1. Side XM is length 1/2. Therefore, side MZ is of length: √(1² - (1/2)²) = √3/2.
12.30. A. The posterior probabilities of the targets are proportional to the product of the chance of the observation given each target and the a priori probability of each target. Since the a priori probabilities of the targets are all equal, the posterior probabilities are proportional to the chance of the observation given each target. Thus, the posterior probabilities are proportional to product of the density functions at the observed shots M and N. If the target is Z, then the density at M is zero. If the target is either X or Y, then the density at M is 9/4π. If the target is Y, then the density at N is zero. If the target is either X or Z, then the density at N is 9/4π. Therefore, the posterior probabilities are proportional to: (9/4π)(9/4π), (9/4π)(0), and (0)(9/4π) = (9/4π)2 , 0, and 0. Thus the posterior probabilities are: 1, 0, 0. Thus the Bayesian analysis estimate of the location of a second shot is: (1)X + (0)Y + (0)Z = X. Comment: If the shot is at M, then the target could not have been Z, since it is too far away. If the shot is at N, then the target could not have been Y, since it is too far away. Thus if one observes a shot at M and a shot at N, the target must have been X.
12.31. A.
        A
    W   |   X
D  -----+-----  B
    Z   |   Y
        C
Work out the chance of the observation given that each of the shooters was excluded. • If Shooter A was excluded, then Shooter B must have hit Area X, (since only A or B could have hit X.) Thus Shooter C must have hit Area Y, (since only B or C could have hit Y, and B hit X.) Shooter D must have hit area W, (since only A or D could have hit W, and A is assumed not to have shot.) Thus the probability of the observation if shooter A was excluded is: (1/2)(1/2)(1/2) = 1/8. • If Shooter B was excluded, then Shooter A must have hit Area X. Thus Shooter D must have hit Area W and thus Shooter C must have hit Area Y. The probability of this observation is: (1/2)(1/2)(1/2) = 1/8. • If Shooter C was excluded, then Shooter B must have hit Area Y, (since only C or B could have hit Y.) Shooter D must have hit area W, (since D could only hit Z or W, and no shot was observed to have hit Z.) Thus Shooter A must have hit area X, (since A is the only shooter remaining, and X must have been hit by someone.) The probability of this observation is: (1/2)(1/2)(1/2) = 1/8. • If Shooter D was excluded, then Shooter A must have hit area W. ⇒ Shooter C must have hit Area Y. ⇒ Shooter B must have hit area X. Probability of this observation is: (1/2)(1/2)(1/2) = 1/8. Then as computed in the spreadsheet, the posterior chances that the remaining shooter is A, B, C or D are equally likely; there is a 25% chance that Area Z will be hit. Excluded Marksman
Excluded   A Priori   Chance of the   Prob.     Posterior   Chance of
Marksman   Chance     Observation     Weight    Chance      Hitting Area Z
A          0.25       0.125           0.03125   25%         0.0
B          0.25       0.125           0.03125   25%         0.0
C          0.25       0.125           0.03125   25%         0.5
D          0.25       0.125           0.03125   25%         0.5
Overall                               0.125     100%        25%
Comment: This is an example where the observation does not alter the a priori probabilities. This would not be true if instead there had been observed 2 shots in Area X and 1 shot in Area Z. 12.32. A. For three shots to have appeared in area W, shooters A, B, and C must have been selected and all must have hit area W. Thus since the remaining two shooters each hit area Y, they must have been shooters D and F. That means that shooter E is the remaining shooter. Thus the probability that the remaining shooter hits area Y is 0.
Section 13, Die/Spinner Models141

There are simple models of pure premiums involving dice and spinners. The frequency is based on a die roll and the severity is based on the result of a spinner. Either Buhlmann Credibility or Bayesian Analysis can be applied to these models.
For example, assume: Two dice, A1 and A2, are used to determine the number of claims. Each side of both dice is marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. The probability of a claim for each die is:

Die    Probability of Claim
A1     1/6
A2     4/6

In addition, there are two spinners, B1 and B2, representing claim severity. Each spinner has two areas marked 30 and 50. The probabilities for each claim size are:

            Claim Size
Spinner     30      50
B1          0.75    0.25
B2          0.40    0.60

A die is selected randomly from A1 and A2 and a spinner is selected randomly from B1 and B2. Note that in this example the die and spinner are chosen independently of each other.142 See the problems for an example where they are chosen together. Therefore, there are four different combinations of die and spinner. This is an example of a cross classification system.

        B1      B2
A1      25%     25%
A2      25%     25%
Four observations from the selected die and spinner yield the following claim amounts: 30, 0, 0, 30.
141 See “An Examination of Credibility Concepts”, by Stephen Philbrick, PCAS 1981. Philbrick expands on an example in “Credibility for Severity”, by Charles C. Hewitt, PCAS 1970. Both the Philbrick and Hewitt papers are excellent reading for those who wish to understand credibility.
142 This is similar to what is done in 4, 5/01, Q.28. In contrast, in 4, 5/01, Q.10-11, the frequency and severity distributions go together; they are not selected independently.
Using Buhlmann Credibility we can determine the expected pure premium for the next observation from the same die and spinner, without separately estimating the future frequency and severity.143
In order to calculate the EPV of 305.5, for each possible pair of die and spinner use the formula: variance of p.p. = µf σs² + µs² σf².
Therefore, we need to compute the mean and variance of each die and each spinner.
For example, the mean severity for spinner B2 is: (0.4)(30) + (0.6)(50) = 42.
The variance of spinner B2 is: (0.4)(30 - 42)² + (0.6)(50 - 42)² = 96.

Die and     A Priori Chance    Mean     Variance    Mean        Variance    Process Variance
Spinner     of Risk            Freq.    of Freq.    Severity    of Sev.     of P.P.
A1, B1      0.250              0.167    0.139       35          75          182.6
A1, B2      0.250              0.167    0.139       42          96          261.0
A2, B1      0.250              0.667    0.222       35          75          322.2
A2, B2      0.250              0.667    0.222       42          96          456.0
Mean                                                                        305.5

One computes the mean pure premium for each possible combination of die and spinner.

Die and     A Priori Chance    Mean     Mean        Mean Pure    Square of
Spinner     of Risk            Freq.    Severity    Premium      Mean P.P.
A1, B1      0.250              0.167    35          5.83         34.03
A1, B2      0.250              0.167    42          7.00         49.00
A2, B1      0.250              0.667    35          23.33        544.44
A2, B2      0.250              0.667    42          28.00        784.00
Mean                                                16.04        352.87
Thus the Variance of the Hypothetical Means = 352.87 - 16.04² ≈ 95.5. Therefore, the Buhlmann Credibility Parameter for pure premium = K = EPV / VHM = 305.5 / 95.5 = 3.2. Thus the credibility for 4 observations is 4/(4 + K) = 4/7.2 = 55.6%. The a priori mean pure premium as computed above is 16.04. The observed pure premium is (30 + 0 + 0 + 30)/4 = 15. Thus the estimated future pure premium is: (0.556)(15) + (1 - 0.556)(16.04) = 15.46.
143 In general the product of separate estimates of frequency and severity will not equal that obtained by working directly with the pure premiums.
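To make the mechanics above concrete, here is a minimal Python sketch (it is not part of the original text, and the names dice, spinners, and moments are illustrative choices) that redoes the EPV / VHM / credibility calculation for this die-spinner example. Because it keeps full precision, its output can differ slightly from the hand-rounded figures above.

```python
# Illustrative sketch of the Buhlmann pure premium calculation for the
# die-spinner example; all names are assumptions made for this sketch.
dice = {"A1": 1/6, "A2": 4/6}                     # probability of a claim per die
spinners = {"B1": {30: 0.75, 50: 0.25},           # claim-size distributions
            "B2": {30: 0.40, 50: 0.60}}

def moments(dist):
    """Mean and variance of a discrete claim-size distribution."""
    mean = sum(x * p for x, p in dist.items())
    return mean, sum(p * (x - mean) ** 2 for x, p in dist.items())

epv = second_moment = overall_mean = 0.0
for q in dice.values():                           # Bernoulli frequency: mean q, variance q(1-q)
    for dist in spinners.values():
        weight = 0.25                             # a priori chance of this die-spinner pair
        mu_s, var_s = moments(dist)
        pp_mean = q * mu_s                        # hypothetical mean pure premium
        pp_var = q * var_s + mu_s ** 2 * q * (1 - q)   # process variance of the pure premium
        epv += weight * pp_var
        overall_mean += weight * pp_mean
        second_moment += weight * pp_mean ** 2

vhm = second_moment - overall_mean ** 2
K = epv / vhm
Z = 4 / (4 + K)                                   # credibility for 4 observations
observed = (30 + 0 + 0 + 30) / 4
print(epv, vhm, K, Z, Z * observed + (1 - Z) * overall_mean)
# roughly 305.5, 95.5, 3.2, 0.556, 15.5
```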
The Bayesian Analysis for the same example proceeds as follows:

Type of Die    A Priori Chance of       Chance of the    Prob. Weight =        Posterior Chance of      Mean         Mean        Mean Pure Premium
and Spinner    This Die & Spinner       Observation      Product of Columns    this Die & Spinner       Frequency    Severity    (Freq. × Sev.)
A1, B1         0.25                     0.010851         0.0027127             0.2187                   0.1667       35          5.833
A1, B2         0.25                     0.003086         0.0007716             0.0622                   0.1667       42          7.000
A2, B1         0.25                     0.027778         0.0069444             0.5599                   0.6667       35          23.333
A2, B2         0.25                     0.007901         0.0019753             0.1592                   0.6667       42          28.000
Overall                                                  0.0124040             1.000                                             19.233
For example, if one has die A1 and spinner B2 , the chance of the observation of 30, 0, 0, 30, is: {(1/6)(0.4)}(5/6)(5/6){(1/6)(0.4)} = 0.003086. The posterior chance of die A1 and spinner B2 is: 0.003086 / 0.0124040 = 0.0622. The estimated future mean pure premium = (0.2187)(5.833) + (0.0622)(7) + (0.5599)(23.333) + (0.1592)(28) = 19.23.
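The Bayesian analysis can be checked the same way with a short script. This is an illustrative sketch, not part of the original text; it assumes, as above, that the four die-spinner pairs are a priori equally likely, and the variable names are arbitrary.

```python
# Illustrative sketch of the Bayesian analysis for the observation 30, 0, 0, 30.
dice = {"A1": 1/6, "A2": 4/6}
spinners = {"B1": {30: 0.75, 50: 0.25}, "B2": {30: 0.40, 50: 0.60}}
observations = [30, 0, 0, 30]

weights, mean_pps = [], []
for q in dice.values():
    for dist in spinners.values():
        likelihood = 1.0
        for x in observations:
            # no claim with probability 1-q; a claim of size x with probability q * dist[x]
            likelihood *= (1 - q) if x == 0 else q * dist[x]
        weights.append(0.25 * likelihood)                 # prior times likelihood
        mean_pps.append(q * sum(x * p for x, p in dist.items()))

total = sum(weights)
posterior = [w / total for w in weights]
print([round(p, 4) for p in posterior])                   # ~[0.2187, 0.0622, 0.5599, 0.1592]
print(sum(p * m for p, m in zip(posterior, mean_pps)))    # ~19.23
```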
Simulation Experiments:

It can take many repetitions of the die-spinner process for the observed mean pure premium to approach its expected value. I simulated the four different combinations of die and spinner of this die-spinner example. Here are examples of the average pure premiums after different numbers of simulation runs (20, 50, 100, 200, 500, and 1000 runs):

[Graph: Die A1 and Spinner B1, with mean pure premium: (1/6)(35) = 5.83.]
[Graph: Die A1 and Spinner B2, with mean pure premium: (1/6)(42) = 7.]
[Graph: Die A2 and Spinner B1, with mean pure premium: (4/6)(35) = 23.33.]
[Graph: Die A2 and Spinner B2, with mean pure premium: (4/6)(42) = 28.]
Generally, the larger the process variance the more observations it will take until you can discern which risk you are likely observing.
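For readers who want to reproduce graphs like those above, here is a minimal simulation sketch (not from the original text; the function name and the seed are arbitrary). It simulates one fixed die-spinner pair and prints the average pure premium after various numbers of trials.

```python
# Illustrative simulation of the pure premium for one fixed die-spinner pair.
import random

def average_pure_premium(q, sizes, probs, n, seed=1):
    """Average pure premium over n trials: die with claim probability q, given spinner."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if rng.random() < q:                       # die roll: does a claim occur?
            total += rng.choices(sizes, probs)[0]  # spinner: claim size
    return total / n

# Die A1 (claim probability 1/6) and spinner B1 (30 with prob. 0.75, 50 with prob. 0.25);
# the averages wander around the expected pure premium (1/6)(35) = 5.83.
for n in (20, 50, 100, 200, 500, 1000):
    print(n, average_pure_premium(1/6, [30, 50], [0.75, 0.25], n))
```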
Problems:

Use the following information for the next 5 questions:
Two dice, A1 and A2, are used to determine the number of claims. Each side of both dice is marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. The probability of a claim for each die is:

Die    Probability of Claim
A1     1/6
A2     2/6

In addition, there are two spinners, B1 and B2, representing claim severity. Each spinner has two areas marked 10 and 40. The probabilities for each claim size are:

            Claim Size
Spinner     10      40
B1          0.70    0.30
B2          0.20    0.80

A die is selected randomly from A1 and A2 and a spinner from B1 and B2.
Four observations from the selected die and spinner yield the following claim amounts in the following order: 10, 0, 40, 10.

13.1 (3 points) Using Buhlmann Credibility, determine the expected claim frequency for the next observation from the same die and spinner.
A. 0.28   B. 0.30   C. 0.32   D. 0.34   E. 0.36
13.2 (3 points) Using Buhlmann Credibility, determine the expected claim severity for the next observation from the same die and spinner.
A. 22.0   B. 22.4   C. 22.8   D. 23.2   E. 23.6

13.3 (4 points) Using Buhlmann Credibility, determine the expected pure premium for the next observation from the same die and spinner. (Do not separately estimate the future frequency and severity.)
A. 7.0   B. 7.5   C. 8.0   D. 8.5   E. 9.0

13.4 (3 points) Using Bayesian Analysis, determine the expected pure premium for the next observation from the same die and spinner.
A. 6.7   B. 7.0   C. 7.3   D. 7.6   E. 7.9
13.5 (2 points) A new die is selected randomly from A1 and A2 and a new spinner from B1 and B2. For these same selected die and spinner, determine the limit of E[Xn | X1 = X2 = . . . = Xn-1 = 0] as n goes to infinity.
A. 3.8   B. 4.0   C. 4.2   D. 4.4   E. 4.6
Use the following information for the next four questions.
Assume there are 3 types of risks each with equal probability. Whether or not there is a claim is determined by whether a six-sided die comes up with a zero or a one, with a one indicating a claim. If a claim occurs then its size is determined by a spinner.

Type    Number of die faces with a 1 rather than a 0    Claim Size Spinner
I       2                                               $100 70%, $200 30%
II      3                                               $100 50%, $200 50%
III     4                                               $100 30%, $200 70%

13.6 (3 points) In one observation of a risk, you observe a single claim for $200. Use Bayes Theorem to estimate the pure premium for this risk.
A. less than 90
B. at least 90 but less than 92
C. at least 92 but less than 94
D. at least 94 but less than 96
E. at least 96

13.7 (2 points) What is the variance of the hypothetical mean pure premiums?
A. 820   B. 840   C. 860   D. 880   E. 900

13.8 (3 points) What is the expected value of the process variance of the pure premiums?
A. less than 6100
B. at least 6100 but less than 6200
C. at least 6200 but less than 6300
D. at least 6300 but less than 6400
E. at least 6400

13.9 (2 points) In one observation of a risk, you observe a single claim for $200. Use Buhlmann credibility to estimate the pure premium for this risk.
A. less than 90
B. at least 90 but less than 92
C. at least 92 but less than 94
D. at least 94 but less than 96
E. at least 96
Use the following information for the next two questions.
Two dice, A and B, are used to determine the number of claims. The faces of each die are marked with either a 1 or a 2, where 1 represents 1 claim and 2 represents 2 claims. The probabilities for each die are:

Die    Probability of 1 Claim    Probability of 2 Claims
A      2/3                       1/3
B      1/3                       2/3

In addition, there are two spinners, X and Y, which are used to determine claim size. Each spinner has two areas marked 2 and 5. The probabilities for each spinner are:

Spinner    Probability that Claim Size = 2    Probability that Claim Size = 5
X          2/3                                1/3
Y          1/3                                2/3

For the first trial, a die is randomly selected from A and B and rolled. If 1 claim occurs, spinner X is spun. If 2 claims occur, both spinner X and spinner Y are spun.
For the second trial, the same die selected in the first trial is rolled again. If 1 claim occurs, spinner X is spun. If 2 claims occur, both spinner X and spinner Y are spun.

13.10 (3 points) If the first trial yielded total losses of 7, use Bayesian Analysis to determine the expected total losses for the second trial.
A. Less than 4.6
B. At least 4.6, but less than 4.9
C. At least 4.9, but less than 5.2
D. At least 5.2, but less than 5.5
E. At least 5.5

13.11 (4 points) If the first trial yielded total losses of 7, use Buhlmann Credibility to determine the expected total losses for the second trial.
A. Less than 4.6
B. At least 4.6, but less than 4.9
C. At least 4.9, but less than 5.2
D. At least 5.2, but less than 5.5
E. At least 5.5
Use the following information for the next 4 questions:
Two dice, A1 and A2, are used to determine the number of claims. Each side of both dice is marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. The probability of a claim for each die is:

Die    Probability of Claim
A1     1/3
A2     1/2

In addition, there are two spinners, B1 and B2, representing claim severity. Each spinner has two areas marked 2 and 5. The probabilities for each claim size are:

            Claim Size
Spinner     2       5
B1          0.60    0.40
B2          0.30    0.70

For Spinner B1, the mean is 3.2, and the process variance is 2.16.
For Spinner B2, the mean is 4.1, and the process variance is 1.89.
A die is selected randomly from A1 and A2 and a spinner from B1 and B2.
Five observations from the selected die and spinner yield the following claim amounts in the following order: 0, 2, 5, 0, 5.

13.12 (2 points) Using Bayesian Analysis, determine the expected claim frequency for the next observation from the same die and spinner.
A. 0.40   B. 0.42   C. 0.44   D. 0.46   E. 0.48

13.13 (2 points) Using Bayesian Analysis, determine the expected claim severity for the next observation from the same die and spinner.
A. less than 3.4
B. at least 3.4 but less than 3.5
C. at least 3.5 but less than 3.6
D. at least 3.6 but less than 3.7
E. at least 3.7

13.14 (3 points) Using Bayesian Analysis, determine the expected pure premium for the next observation from the same die and spinner. (Do not separately estimate the future frequency and severity.)
A. 1.60   B. 1.62   C. 1.64   D. 1.66   E. 1.68

13.15 (3 points) Using Buhlmann Credibility, determine the expected pure premium for the next observation from the same die and spinner.
A. 1.60   B. 1.62   C. 1.64   D. 1.66   E. 1.68
Use the following information for the next two questions:
• For each insured, frequency is Geometric with mean β.
• For an insured picked at random, β is equally likely to be 5% or 15%.
• For each insured, severity is Exponential with mean θ.
• For an insured picked at random, θ is equally likely to be 10 or 20.
• The distributions of β and θ are independent.
• During years 1, 2, and 3, from an individual insured you observe a total of 2 claims of sizes 5 and 15.

13.16 (3 points) Determine the Bayesian estimate of the expected value of the aggregate losses from this same insured in year four.
A. less than 1.7
B. at least 1.7 but less than 1.8
C. at least 1.8 but less than 1.9
D. at least 1.9 but less than 2.0
E. at least 2.0

13.17 (3 points) Determine the Buhlmann Credibility estimate of the expected value of the aggregate losses from this same insured in year four.
A. less than 1.7
B. at least 1.7 but less than 1.8
C. at least 1.8 but less than 1.9
D. at least 1.9 but less than 2.0
E. at least 2.0
Use the following information for the next two questions:
Two spinners, A1 and A2, are used to determine number of claims. Each spinner is divided into regions marked 0 and 1, where 0 represents no claims and 1 represents a claim. The probability of a claim for each spinner is:

Spinner    Probability of Claim
A1         0.15
A2         0.05

A second set of spinners, B1 and B2, represents claim severity. Each spinner has two areas marked 20 and 40. The probabilities for each claim size are:

            Claim Size
Spinner     20      40
B1          0.80    0.20
B2          0.30    0.70

A spinner is selected randomly from A1 and A2 and a second from B1 and B2. Three observations from the selected spinners yield the following claim amounts in the following order: 0, 20, 0.

13.18 (4B, 11/92, Q.6) (3 points) Use Buhlmann credibility to separately estimate the expected number of claims and expected severity. Use these estimates to calculate the expected value of the next observation from the same pair of spinners.
A. Less than 2.9
B. At least 2.9 but less than 3.0
C. At least 3.0 but less than 3.1
D. At least 3.1 but less than 3.2
E. At least 3.2

13.19 (4B, 11/92, Q.7) (3 points) Determine the Bayesian estimate of the expected value of the next observation from the same pair of spinners.
A. Less than 2.9
B. At least 2.9 but less than 3.0
C. At least 3.0 but less than 3.1
D. At least 3.1 but less than 3.2
E. At least 3.2
Use the following information for the next two questions: Two dice, A1 and A2 , are used to determine the number of claims. Each side of both dice are marked with either a 0 or a 1, where 0 represents no claim and 1 represents a claim. The probability of a claim for each die is: Die Probability of Claim 1/6 A1 A2
3/6
In addition, there are two spinners, B1 and B2 , representing claim severity. Each spinner has two areas marked 2 and 14. The probabilities for each claim size are: Claim Size Spinner 2 14 5/6 1/6 B1 B2
3/6
3/6
A die is randomly selected from A1 and A2 and a spinner is randomly selected from B1 and B2 . The selected die is rolled and if a claim occurs, the selected spinner is spun. 13.20 (4B, 5/93, Q.13) (2 points) Determine E[X1 ], where X1 is the first observation from the selected die and spinner. A. 2/3 B. 4/3 C. 2 D. 4 E. 8 13.21 (4B, 5/93, Q.14) (2 points) For the same selected die and spinner, determine the limit of E[Xn | X1 = X2 = . . . = Xn-1 = 0] as n goes to infinity. A. Less than 0.75 B. At least 0.75 but less than 1.50 C. At least 1.50 but less than 2.25 D. At least 2.25 but less than 3.00 E. At least 3.00
13.22 (4B, 11/93, Q.14) (3 points) There are two methods for calculating credibility estimates for pure premium. One utilizes separate estimates for frequency and severity, and the other utilizes only the aggregate claim amount. Let A1 and A2 be equally likely frequency distributions and let B1 and B2 be equally likely severity distributions. Number of Claims
Probability of Claim for: A1 A2
Amount of Claims
Probability of Claim Amount for: B1 B2
0 0.80 0.60 100 0.40 0.80 1 0.20 0.40 200 0.60 0.20 A state Ai, Bj is selected at random, and a claim of 100 is observed. Determine the Buhlmann credibility estimate for the next observation from the same selected state utilizing only aggregate claim amounts. A. Less than 41.5 B. At least 41.5, but less than 42.5 C. At least 42.5, but less than 43.5 D. At least 43.5, but less than 44.5 E. At least 44.5
Use the following information for the next two questions: • A portfolio of independent risks is divided into two classes of equal size.
•
•
• •
All of the risks in Class 1 have identical claim count and claim size distributions as follows: Class 1 Class 1 Number of Claims Probability Claim Size Probability 1 1/2 50 2/3 2 1/2 100 1/3 All of the risks in Class 2 have identical claim count and claim size distributions as follows: Class 2 Class 2 Number of Claims Probability Claim Size Probability 1 2/3 50 1/2 2 1/3 100 1/2 The number of claims and claim size(s) for each risk are independent. A risk is selected at random from the portfolio, and a pure premium of 100 is observed for the first exposure period.
13.23 (4B, 11/95, Q.14 & Course 4 Sample Exam 2000, Q.19) (3 points) Determine the Bayesian analysis estimate of the expected number of claims for this same risk for the second exposure period. A. 4/3 B. 25/18 C. 41/29 D. 17/12 E. 3/2 13.24 (4B, 11/95, Q.15 & Course 4 Sample Exam 2000, Q.20) (2 points) A pure premium of 150 is observed for this risk for the second exposure period. Determine the Buhlmann credibility estimate of the expected pure premium for this same risk for the third exposure period. A. Less than 110 B. At least 110, but less than 120 C. At least 120, but less than 130 D. At least 130, but less than 140 E. At least 140
13.25 (4B, 5/96, Q.19) (2 points) Two dice, A and B, are used to determine the number of claims. The faces of each die are marked with either a 1 or a 2, where 1 represents 1 claim and 2 represents 2 claims. The probabilities for each die are: Die A B
Probability of 1 Claim Probability of 2 Claims 2/3 1/3 1/3 2/3
In addition, there are two spinners, X and Y, which are used to determine claim size. Each spinner has two areas marked 2 and 5. The probabilities for each spinner are:
Spinner X Y
Probability that Claim Size = 2 2/3 1/3
Probability that Claim Size = 5 1/3 2/3
For the first trial, a die is randomly selected from A and B and rolled. If 1 claim occurs, spinner X is spun. If 2 claims occur, both spinner X and spinner Y are spun. For the second trial, the same die selected in the first trial is rolled again. If 1 claim occurs, spinner X is spun. If 2 claims occur, both spinner X and spinner Y are spun. If the first trial yielded total losses of 5, determine the expected number of claims for the second trial. A. Less than 1.38 B. At least 1.38, but less than 1.46 C. At least 1.46, but less than 1.54 D. At least 1.54, but less than 1.62 E. At least 1.62 13.26 (2 points) In the previous question, if the first trial yielded total losses of 7, determine the expected losses for the second trial using Bayes Analysis. 13.27 (3 points) In 4B, 5/96, Q.19, if the first trial yielded total losses of 7, determine the expected losses for the second trial using Buhlmann Credibility.
Use the following information for the next two questions: Two dice, A and B, are used to determine the number of claims. The faces of each die are marked with either a 0 or a 1, where 0 represents 0 claims and 1 represents 1 claim. The probabilities for each die are: Die Probability of 0 Claims Probability of 1 Claim A 2/3 1/3 B 1/3 2/3 In addition, there are two spinners, X and Y, which are used to determine claim size. Spinner X has two areas marked 2 and 8. Spinner Y has only one area marked 2. The probabilities for each spinner are: Probability that Probability that Spinner Claim Size = 2 Claim Size = 8 X 1/3 2/3 Y 1 0 For the first trial, a die is randomly selected from A and B and rolled. If a claim occurs, a spinner is randomly selected from X and Y and spun. 13.28 (4B, 11/96, Q.6) (1 point) Determine the expected amount of total losses for the first trial. A. Less than 1.4 B. At least 1.4, but less than 1.8 C. At least 1.8, but less than 2.2 D. At least 2.2, but less than 2.6 E. At least 2.6 13.29 (4B, 11/96, Q.7) (2 points) For each subsequent trial, the same die selected in the first trial is rolled again. If a claim occurs, a spinner is again randomly selected from X and Y and spun. Determine the limit of the Bayesian analysis estimate of the expected amount of total losses for the nth trial as n goes to infinity if the first n-1 trials each yielded total losses of 2. A. Less than 1.4 B. At least 1.4, but less than 1.8 C. At least 1.8, but less than 2.2 D. At least 2.2, but less than 2.6 E. At least 2.6
13.30 (4, 11/00, Q.33) (2.5 points) A car manufacturer is testing the ability of safety devices to limit damages in car accidents. You are given: (i) A test car has either front air bags or side air bags (but not both), each type being equally likely. (ii) The test car will be driven into either a wall or a lake, with each accident type being equally likely. (iii) The manufacturer randomly selects 1, 2, 3 or 4 crash test dummies to put into a car with front air bags. (iv) The manufacturer randomly selects 2 or 4 crash test dummies to put into a car with side air bags. (v) Each crash test dummy in a wall-impact accident suffers damage randomly equal to either 0.5 or 1, with damage to each dummy being independent of damage to the others. (vi) Each crash test dummy in a lake-impact accident suffers damage randomly equal to either 1 or 2, with damage to each dummy being independent of damage to the others. One test car is selected at random, and a test accident produces total damage of 1. Determine the expected value of the total damage for the next test accident, given that the kind of safety device (front or side air bags) and accident type (wall or lake) remain the same. (A) 2.44 (B) 2.46 (C) 2.52 (D) 2.63 (E) 3.09 13.31 (3 points) In the previous question, determine the expected value of the total damage for the next test accident if: the number of dummies, the kind of safety device (front or side air bags), and the accident type (wall or lake), all remain the same. (A) 1.0 (B) 1.1 (C) 1.2 (D) 1.3 (E) 1.4
Use the following information for 4, 5/01, questions 10 and 11. (i) The claim count and claim size distributions for risks of type A are: Number of Claims Probabilities Claim Size Probabilities 0 4/9 500 1/3 1 4/9 1235 2/3 2 1/9 (ii) The claim count and claim size distributions for risks of type B are: Number of Claims Probabilities Claim Size Probabilities 0 1/9 250 2/3 1 4/9 328 1/3 2 4/9 (iii) Risks are equally likely to be type A or type B. (iv) Claim counts and claim sizes are independent within each risk type. (v) The variance of the total losses is 296,962. A randomly selected risk is observed to have total annual losses of 500. 13.32 (4, 5/01, Q.10) (2.5 points) Determine the Bayesian premium for the next year for this same risk. (A) 493 (B) 500 (C) 510 (D) 513 (E) 514 13.33 (4, 5/01, Q.11) (2.5 points) Determine the Bühlmann credibility premium for the next year for this same risk. (A) 493 (B) 500 (C) 510 (D) 513 (E) 514
13.34 (4, 5/01, Q.28) (2.5 points) Two eight-sided dice, A and B, are used to determine the number of claims for an insured. The faces of each die are marked with either 0 or 1, representing the number of claims for that insured for the year. Die Pr(Claims = 0) Pr(Claims = 1) A 1/4 3/4 B 3/4 1/4 Two spinners, X and Y, are used to determine claim cost. Spinner X has two areas marked 12 and c. Spinner Y has only one area marked 12. Spinner Pr(Cost = 12) Pr(Cost = c) X 1/2 1/2 Y 1 0 To determine the losses for the year, a die is randomly selected from A and B and rolled. If a claim occurs, a spinner is randomly selected from X and Y and spun. For subsequent years, the same die and spinner are used to determine losses. Losses for the first year are 12. Based upon the results of the first year, you determine that the expected losses for the second year are 10. Calculate c. (A) 4 (B) 8 (C) 12 (D) 24 (E) 36 13.35 (4, 11/01, Q.23 & 2009 Sample Q.70) (2.5 points) You are given the following information on claim frequency of automobile accidents for individual drivers: Business Use Pleasure Use Expected Claim Expected Claim Claims Variance Claims Variance Rural 1.0 0.5 1.5 0.8 Urban 2.0 1.0 2.5 1.0 Total 1.8 1.06 2.3 1.12 You are also given: (i) Each driverʼs claims experience is independent of every other driverʼs. (ii) There are an equal number of business and pleasure use drivers. Determine the Bühlmann credibility factor for a single driver. (A) 0.05 (B) 0.09 (C) 0.17 (D) 0.19 (E) 0.27
Solutions to Problems:

13.1. C. The EPV for frequency is 0.1806 and is calculated as follows:

Type of Die    Bernoulli Parameter    A Priori Probability    Process Variance
A1             0.1667                 0.50                    0.1389
A2             0.3333                 0.50                    0.2222
Average                                                       0.1806

Type of Die    A Priori Probability    Mean      Square of Mean
A1             0.50                    0.1667    0.0278
A2             0.50                    0.3333    0.1111
Average                                0.2500    0.0694

VHM = 0.0694 - 0.2500² = 0.0069. K = EPV / VHM = 0.1806 / 0.0069 = 26.2. Z = 4/(4 + K) = 0.132.
The a priori mean frequency is 0.25. The observed claim frequency is 3/4 = 0.75.
Thus the estimated future frequency is: (0.132)(0.75) + (1 - 0.132)(0.25) = 0.316.

13.2. D. For Spinner B1, the mean is (0.7)(10) + (0.3)(40) = 19, the second moment is (0.7)(10²) + (0.3)(40²) = 550, and the process variance is 550 - 19² = 189.

Type of Spinner    A Priori Probability    Mean    Second Moment    Process Variance
B1                 0.50                    19      550              189
B2                 0.50                    34      1300             144
Average                                                             166.5

The Expected Value of the Process Variance = (1/2)(189) + (1/2)(144) = 166.5.

Type of Spinner    A Priori Probability    Mean    Square of Mean
B1                 0.50                    19      361
B2                 0.50                    34      1156
Average                                    26.5    758.5

Thus the Variance of the Hypothetical Means = 758.5 - 26.5² = 56.25.
Therefore, the Buhlmann Credibility Parameter for severity = K = EPV / VHM = 166.5 / 56.25 = 2.96.
Thus the credibility for 3 claims is: 3/(3 + K) = 50.3%. The a priori mean severity is 26.5. The observed claim severity is: (10 + 40 + 10)/3 = 20.
Thus the estimated future severity is: (0.503)(20) + (1 - 0.503)(26.5) = 23.2.
Comment: Note that the spinners are chosen independently of the dice, so frequency and severity are independent across risk types. Thus one can ignore the frequency process in this question. One cannot do so when, for example, low frequency is associated with low severity, as in the questions related to “good”, “bad” and “ugly” drivers.
13.3. C. For each possible pair of die and spinner, variance of p.p. = µf σs² + µs² σf². Die and Spinner A1, B1 A1, B2 A2, B1 A2, B2
A Priori Chance of Risk 0.250 0.250 0.250 0.250
Mean Freq. 0.167 0.167 0.333 0.333
Variance of Freq. 0.139 0.139 0.222 0.222
Mean Severity 19 34 19 34
Variance of Sev. 189 144 189 144
Mean Die and Spinner A1, B1 A1, B2 A2, B1 A2, B2
Process Variance of P.P. 81.64 184.56 143.22 304.89 178.58
A Priori Chance of Risk 0.250 0.250 0.250 0.250
Mean Freq. 0.167 0.167 0.333 0.333
Mean Severity 19 34 19 34
Mean
Mean Pure Premium 3.17 5.67 6.33 11.33
Square of Mean P.P. 10.03 32.11 40.11 128.44
6.62
52.67
Thus VHM = 52.67 - 6.62² = 8.85. K = EPV / VHM = 178.6 / 8.85 = 20.2. Z = 4/(4 + K) = .165. The a priori mean pure premium is 6.62. The observed pure premium is: (10 + 0 + 40 + 10)/4 = 15. Thus the estimated future pure premium is: (.165)(15) + (1 - .165)(6.62) = 8.00. Comment: Note that the result is not equal to the product of the separate estimates for frequency and severity: (.316)(23.3) = 7.36 ≠ 8.00. Neither is it equal to the estimate of pure premium using Bayesian Analysis: 6.84 ≠ 8.00. 13.4. A. There are four possible combinations of die and spinner, or four risk types. If Die A1 and Spinner B1, then the chance of the observation of 10, 0, 40, 10, is: {(1/6)(.7)}(5/6){(1/6)(.3)}{(1/6)(.7)} = .000567. If Die A1 and Spinner B2, then the chance of the observation is: {(1/6)(.2)}(5/6){(1/6)(.8)}{(1/6)(.2)} = .000123. If Die A2 and Spinner B1, then the chance of the observation is: {(2/6)(.7)}(4/6){(2/6)(.3)}{(2/6)(.7)} = .003630. If Die A2 and Spinner B2, then the chance of the observation is: {(2/6)(.2)}(4/6){(2/6)(.8)}{(2/6)(.2)} = .000790. Die and Spinner A1, B1 A1, B2 A2, B1 A2, B2 Mean
A Priori Chance of Risk 0.250 0.250 0.250 0.250
Chance of the Observation 0.000567 0.000123 0.003630 0.000790
Probability Weight 0.0001418 0.0000308 0.0009075 0.0001975 0.001277
Posterior Mean Mean Pure ProbabilityMean SeverityPremium Freq. 11.1% 3.17 2.4% 5.67 71.0% 6.33 15.5% 11.33 100.0%
6.74
13.5. D. As we have more and more observations with no claim, the probability that we selected die A1 rather than die A2 increases. Therefore the expected value of the pure premium goes to (mean frequency for die A1) (mean severity) = (1/6)((19 + 34)/2) = 4.42. Alternately letʼs assume we have for example 100 observations with no claim, then the chance of this observation is (5/6)^100 if we have die A1 and (4/6)^100 if we have die A2. Thus the Bayes Analysis is as follows: A
B
C
D
E
F
G
H
Type of A Priori Prob. Weight = Posterior Mean Die Chance of Chance Product Chance of Pure and This Die of the of Columns this Die Mean Mean Premium Spinner & Spinner Observation B&C & Spinner Frequency Severity Cols. F x G A1, B1 0.25 1.207e-8 3.019e-9 0.5000 0.167 19 3.17 A1, B2 0.25 1.207e-8 3.019e-9 0.5000 0.167 34 5.67 A2, B1 0.25 2.460e-18 6.149e-19 0.0000 0.333 19 6.33 A2, B2 0.25 2.460e-18 6.149e-19 0.0000 0.333 34 11.33 Overall
6.037e-9
4.42
1.000
13.6. C. For a Type I Risk, the chance of observing a $200 claim is (2/6)(30%) = 10%, since there is a 2/6 chance of observing any claim, and once a claim is observed there is 30% chance it will be $200. Similarly for Type II : (3/6)(50%) = 25%, For Type III : (4/6)(70%) = 46.7%. Since the types of risks are equally likely a priori, the posterior probabilities are therefore proportional to 10%, 25% and 46.7%. As calculated below this results in a posterior weighted average p.p. of 93.0. A
B
A Priori Chance of Type of This Type Risk of Risk I 0.333 II 0.333 III 0.333 Overall
C
D
E
F
Chance of the Observation 0.100 0.250 0.467
Prob. Weight = Product of Columns B&C 0.033 0.083 0.156
Posterior Chance of This Type of Risk 12.2% 30.6% 57.1%
Avg. Pure Premium 43.3 75.0 113.3
0.272
1.000
93.0
13.7. A. The mean pure premiums are computed for each type of risk by multiplying the mean frequency by the mean severity. The frequencies are: 2/6, 3/6, 4/6. The mean severities are: $130, $150, $170. Mean p.p.: 130/3, 150/2, 340/3. Remembering that the a priori probabilities are stated to be equal, the grand mean pure premium is $77.22. The expected value of the p.p. squared is 6782. Thus the variance = 6782 - 77.22² = 819.
Type of Risk I II III
A Priori Chance of This Type of Risk 0.333 0.333 0.333
Avg. Claim Freq. 0.333 0.500 0.667
Avg. Claim Severity 130 150 170
Avg. Pure Premium 43.3 75.0 113.3
Square of Avg. Pure Premium 1878 5625 12844
77.2
6782
Average
13.8. D. Use for each type the formula: variance of p.p. = µf σs² + µs² σf², to get process variances of: 4456, 6875, and 7822. For example, for Type I: µf = 1/3, σf² = (1/3)(2/3), µs = 130, σs² = (70%)(30²) + (30%)(70²) = 2100; Type I variance of p.p. = (1/3)(2100) + (2/9)(130²) = 4456. Remembering that the a priori probabilities are stated to be equal, these process variances average to 6384. Type Risk I II III Mean
A Priori Chance of Risk 0.333 0.333 0.333
Mean Freq. 0.333 0.500 0.667
Variance of Freq. 0.222 0.250 0.222
Mean Severity 130 150 170
Variance of Sev. 2100 2500 2100
Process Variance of P.P. 4456 6875 7822 6384
Comment: In this case, while for any given type of risk the frequency and severity are independent, this is not true across risks. For example, the Type I risks are low frequency and low severity. 13.9. B. Using the solutions to the two previous questions, K = EPV/ VHM = 6384/819 = 7.79. Thus for one observation, Z = 1 / 8.79 = 11.4%. estimated p.p. = (.114)($200) + (1 - .114)($77.22) = $91.2
13.10. D. Note that unlike the usual die-spinner example, we do not pick the spinners at random (independent of the number of claims.) Rather, the spinners depend on the number of claims, but not the type of risk. In fact we only have two types of risk: low frequency corresponding to Die A and high frequency corresponding to Die B. The mean of Spinner X is: (2/3)(2) +(1/3)(5) = 3, while the mean of Spinner Y is: (1/3)(2) + (2/3)(5) = 4. If we have one claim the mean loss is E[X] = 3. If we have two claims, then the mean loss is: E[X+Y] = E[X] + E[Y] = 3 + 4 = 7. Thus the mean pure premium for Die A is: (2/3)(3) + (1/3)(7) = 4.333. The mean pure premium for Die B is: (1/3)(3) + (2/3)(7) = 5.667. In this case, if one observes a total losses of 7, it must have come from two claims, one of size 2 and one of size 5. There are two ways this could occur; either the claim from Spinner X is 2 and that from Spinner Y is 5 or vice versa. Thus if we have two claims, the chance of one being of size 2 and the other of size 5 is the sum of these two situations: (2/3)(2/3) + (1/3)(1/3) = 5/9. Thus if we have selected Die A, there is a (1/3)(5/9) = .1852 chance of this observation. If we have selected Die B, there is a (2/3)(5/9)= .3704 chance of this observation. A
B
C
D
E
F
Die A B
A Priori Chance of This Die 0.500 0.500
Chance of the Observation 0.1852 0.3704
Prob. Weight = Product of Columns B&C 0.0926 0.1852
Posterior Chance of This Type of Die 33.3% 66.7%
Mean P.P. for This Type of Die 4.333 5.667
0.2778
1.000
5.222
Overall
13.11. C. The mean pure premium if Die A is 4.333 and the mean pure premium if Die B is 5.667. Thus since Die A and Die B are equally likely a priori, the overall mean is 5 and the Variance of the Hypothetical Mean Pure Premiums is: {(5.667 - 5)² + (4.333 - 5)²}/2 = 0.667² = 0.444. If one has Die A, then the possible outcomes are as follows: Situation 1 claim @ 2 1 claim @ 5 2 claims @ 2 each 2 claims: X @ 2 & Y @ 5 2 claims: X @ 5 & Y @ 2 2 claims @ 5 each
Probability 44.4% 22.2% 7.4% 14.8% 3.7% 7.4%
Pure Premium 2 5 4 7 7 10
Square of P.P. 4 25 16 49 49 100
Overall
100.0%
4.333
25.00
Thus for Die A, the process variance of the pure premiums is 25 - 4.3332 = 6.225. Similarly, if one has Die B, then the possible outcomes are as follows: Situation 1 claim @ 2 1 claim @ 5 2 claims @ 2 each 2 claims: X @ 2 & Y @ 5 2 claims: X @ 5 & Y @ 2 2 claims @ 5 each
Probability 22.2% 11.1% 14.8% 29.6% 7.4% 14.8%
Pure Premium 2 5 4 7 7 10
Square of P.P. 4 25 16 49 49 100
Overall
100.0%
5.667
39.00
Thus for Die B, the process variance of the pure premiums is 39 - 5.6672 = 6.885. Thus since Die A and Die B are equally likely a priori, the Expected Value of the Process Variance of the Pure Premiums is: (.5)(6.225) + (.5)(6.885) = 6.555. Thus the Buhlmann Credibility Parameter K = EPV / VHM = 6.555 / .444 = 14.8. Thus one observation would be given credibility of 1/(1 + 14.8) = 6.3%. The a priori mean pure premium is: (.5)(4.333) + (.5)(5.667) = 5. Since the observed pure premium is 7, the Buhlmann Credibility estimate of the future pure premium is: (.063)(7) + (1 - .063)(5) = 5.126.
13.12. C. The frequency only depends on the type of die. We observe no claim, claim, claim, no claim, claim. Chance of observation is: (1-q)² q³. Die
A Priori Chance of Die
Chance of the Observation
Probability Weight
A1 A2
0.500 0.500
0.016461 0.031250
0.0082305 0.0156250
34.5% 65.5%
0.3333 0.5000
0.023855
100.0%
0.4425
Mean
Posterior Mean MeanFrequency ProbabilityMean Severity Freq.
13.13. E. The severity only depends on the type of spinner. We observe claims of size 2, 5, and 5. For Spinner B1, this has a probability of: (.6)(.4)(.4) = 0.096. Spinner
A Priori Chance of Spinner
Chance of the Observation
Probability Weight
B1 B2
0.500 0.500
0.096000 0.147000
0.0480000 0.0735000
39.5% 60.5%
3.200 4.100
0.121500
100.0%
3.744
Mean
Posterior Mean Mean Severity ProbabilityMean Severity Freq.
13.14. D. There are four possible combinations of die and spinner, or four risk types. If Die A1 and Spinner B1, then the chance of the observation of 0, 2, 5, 0, 5 is: (2/3){(1/3)(.6)}{(1/3)(.4)}(2/3){(1/3)(.4)} = .001580. If Die A1 and Spinner B2, then the chance of the observation is: (2/3){(1/3)(.3)}{(1/3)(.7)}(2/3){(1/3)(.7)} = .002420. If Die A2 and Spinner B1, then the chance of the observation is: (1/2){(1/2)(.6)}{(1/2)(.4)}(1/2){(1/2)(.4)} = .003000. If Die A2 and Spinner B2, then the chance of the observation is: (1/2){(1/2)(.3)}{(1/2)(.7)}(1/2){(1/2)(.7)} = .004594. Die and Spinner
A Priori Chance of Risk
Chance of the Observation
Probability Weight
A1, B1 A1, B2 A2, B1 A2, B2
0.250 0.250 0.250 0.250
0.001580 0.002420 0.003000 0.004594
0.0003951 0.0006049 0.0007500 0.0011484
13.6% 20.9% 25.9% 39.6%
1.067 1.367 1.600 2.050
0.002898
100.0%
1.657
Mean
Posterior Mean Mean Pure ProbabilityMean SeverityPremium Freq.
Comment: Note that the result is equal to the product of the separate estimates for frequency and severity: (.4425)(3.744) = 1.657. This is due to the fact that in this case the die and spinner are chosen separately. This is not a general property of Bayesian Analysis.
13.15. C. In order to calculate the EPV for each possible pair of die and spinner use the formula: variance of p.p. = µf σs2 + µs2 σf2 . Die and Spinner
A Priori Chance of Risk
Mean Freq.
Variance of Freq.
Mean Severity
Variance of Sev.
Process Variance of P.P.
A1, B1 A1, B2 A2, B1 A2, B2
0.250 0.250 0.250 0.250
0.333 0.333 0.500 0.500
0.222 0.222 0.250 0.250
3.2 4.1 3.2 4.1
2.16 1.89 2.16 1.89
2.996 4.366 3.640 5.147
Mean
4.037
Die and Spinner
A Priori Chance of Risk
Mean Freq.
Mean Severity
Mean Pure Premium
Square of Mean P.P.
A1, B1 A1, B2 A2, B1 A2, B2
0.250 0.250 0.250 0.250
0.333 0.333 0.500 0.500
3.2 4.1 3.2 4.1
1.067 1.367 1.600 2.050
1.138 1.868 2.560 4.202
1.521
2.442
Mean
Thus VHM = 2.442 - 1.521² = 0.1286. K = EPV / VHM = 4.037 / 0.1286 = 31.4. Z = 5/(5 + K) = .137. The a priori mean pure premium is 1.52. The observed pure premium is: (0 + 2 + 5 + 0 + 5)/5 = 2.4. Thus the estimated future pure premium is: (.137)(2.4) + (1 - .137)(1.52) = 1.64.
13.16. C. There are four risk types equally likely: β = 5% and θ = 10, β = 5% and θ = 20, β = 15% and θ = 10, β = 15% and θ = 20. Given β and θ, the mean (annual) pure premium is βθ.
Over three years, the number of claims is Geometric with mean 3β; the density at 2 is: (3β)² / (1 + 3β)³.
For the Exponential: 2 f(5) f(15) = (2) (e^(-5/θ)/θ) (e^(-15/θ)/θ).
Thus the chance of the observation is proportional to: β² e^(-20/θ) / {(1 + 3β)³ θ²}.
Since the four risk types are equally likely, we can also use this as the probability weight.

Beta    Theta    Probability Weight    Posterior Probability    Mean Pure Premium
5%      10       0.00000222            10.8%                    0.500
5%      20       0.00000151            7.4%                     1.000
15%     10       0.00000999            48.7%                    1.500
15%     20       0.00000679            33.1%                    3.000
Mean             0.00002051            100.0%                   1.851

Comment: Analogous to a die-spinner question in which the die and spinner are chosen separately, and thus with a cross-classification setup and 4 risk types.

13.17. B. There are four risk types equally likely: β = 5% and θ = 10, β = 5% and θ = 20, β = 15% and θ = 10, β = 15% and θ = 20. Given β and θ, the mean (annual) pure premium is: βθ.
Given β and θ, the process variance of the annual pure premium is: βθ² + θ²β(1+β) = θ²(2β + β²).

Beta    Theta    Process Variance    Mean Pure Premium    Square of Mean Pure Premium
5%      10       10.25               0.5                  0.250
5%      20       41.00               1.0                  1.000
15%     10       32.25               1.5                  2.250
15%     20       129.00              3.0                  9.000
Mean             53.12               1.5                  3.125

EPV = 53.12. VHM = 3.125 - 1.5² = 0.875. K = EPV / VHM = 53.12 / 0.875 = 60.7.
For three years of data, Z = 3 / (3 + 60.7) = 4.7%.
The observed annual pure premium is: (5 + 15)/3 = 20/3. The prior mean annual pure premium is 1.5.
The estimated pure premium for year four is: (0.047)(20/3) + (1 - 0.047)(1.5) = 1.74.
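As a check on 13.17, here is a short sketch (not part of the original text) of the same EPV / VHM / credibility computation; the variable names are illustrative, and the four (β, θ) combinations are taken to be equally likely, as stated in the problem.

```python
# Illustrative sketch of the Buhlmann estimate for the Geometric-Exponential model in 13.17.
betas, thetas = (0.05, 0.15), (10.0, 20.0)

epv = mean_pp = second_moment = 0.0
for b in betas:
    for t in thetas:
        w = 0.25                                   # each (beta, theta) pair equally likely
        m = b * t                                  # hypothetical mean annual pure premium
        pv = b * t**2 + t**2 * b * (1 + b)         # mu_f*sigma_s^2 + mu_s^2*sigma_f^2
        epv += w * pv
        mean_pp += w * m
        second_moment += w * m**2

vhm = second_moment - mean_pp**2
K = epv / vhm                                      # ~60.7
Z = 3 / (3 + K)                                    # three years of data
estimate = Z * (5 + 15) / 3 + (1 - Z) * mean_pp
print(K, Z, estimate)                              # ~60.7, ~0.047, ~1.74
```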
13.18. D. First one estimates the frequency. We have a Bernoulli process with mean q and process variance q(1-q).

Type of Spinner    A Priori Chance    Process Variance    Mean Spin    Square of Mean Spin
A1                 0.500              0.1275              0.15         0.0225
A2                 0.500              0.0475              0.05         0.0025
Overall                               0.0875              0.1000       0.0125

The variance of the hypothetical means = 0.0125 - 0.1² = 0.0025. K = EPV / VHM = 0.0875 / 0.0025 = 35.
N = 3 because we have three observations, so Z = 3/(3 + 35) = 7.9%.
The a priori estimate is 0.1, and the observation is 1/3 (one claim in three trials), so the new estimate of the frequency is: (0.079)(1/3) + (1 - 0.079)(0.1) = 0.118.
Similarly one can estimate the severity:

Type of Spinner    A Priori Chance    Process Variance    Mean Spin    Square of Mean Spin
B1                 0.500              64                  24           576
B2                 0.500              84                  34           1156
Overall                               74                  29           866

For example, the process variance for Spinner B2 is: (0.3)(20²) + (0.7)(40²) - 34² = 84.
The variance of the hypothetical means = 866 - 29² = 25. K = EPV / VHM = 74 / 25 = 2.96.
N = 1 because we have a single claim and thus only one observation of the claim severity process. (The “B” spinner was only spun a single time.) Thus Z = 1/(1 + 2.96) = 25.3%.
The a priori estimate is 29, and the observation is 20 (one claim of size 20), so the new estimate of the severity is: (0.253)(20) + (1 - 0.253)(29) = 26.7.
Combining the separate estimates of frequency and severity, one gets an estimated pure premium of: (0.118)(26.7) = 3.15.
Comment: Note the solution would differ if one worked directly with the pure premiums rather than separately estimating the frequency and severity. The solution to this alternate problem is as follows: The variance of the hypothetical means is: 10.825 - 2.9² = 2.415. The expected value of the process variance is 83.175. Therefore K = EPV / VHM = 83.175 / 2.415 = 34.4. For three observations, Z = 3/(3 + 34.4) = 8.0%. The observation is 20/3. The a priori mean pure premium is 2.9. Thus the new estimate of the pure premium is: (20/3)(.08) + (2.9)(1 - .08) = 3.20.
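A brief sketch (not part of the original text) of the separate frequency and severity estimates in 13.18 and their product follows; the helper buhlmann is an illustrative construction, not a standard library routine.

```python
# Illustrative sketch: separate Buhlmann estimates of frequency and severity for 13.18.
def buhlmann(means_vars, prior, n, observed):
    """Buhlmann estimate given (hypothetical mean, process variance) pairs and prior weights."""
    epv = sum(p * v for (m, v), p in zip(means_vars, prior))
    mu = sum(p * m for (m, v), p in zip(means_vars, prior))
    vhm = sum(p * m**2 for (m, v), p in zip(means_vars, prior)) - mu**2
    z = n / (n + epv / vhm)
    return z * observed + (1 - z) * mu

# Frequency: Bernoulli spinners with claim probabilities 0.15 and 0.05; one claim in three trials.
frequency = [(q, q * (1 - q)) for q in (0.15, 0.05)]
est_freq = buhlmann(frequency, [0.5, 0.5], n=3, observed=1/3)

# Severity: spinners paying 20 or 40; a single observed claim of 20.
def spinner(p20, p40):
    mean = 20 * p20 + 40 * p40
    return mean, p20 * (20 - mean)**2 + p40 * (40 - mean)**2
severity = [spinner(0.8, 0.2), spinner(0.3, 0.7)]
est_sev = buhlmann(severity, [0.5, 0.5], n=1, observed=20)

print(est_freq, est_sev, est_freq * est_sev)       # ~0.118, ~26.7, ~3.16
```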
13.19. E. A
B
C
D
A Priori Prob. Weight = Chance of Chance Product Type of This Pair of the of Columns Spinners of Spinners Observation B&C A1, B1 0.25 0.0867 0.0217 A1, B2 0.25 0.0325 0.0081 A2, B1 0.25 0.0361 0.0090 A2, B2 0.25 0.0135 0.0034 Overall
0.0422
E
F
G
H
Posterior Mean Chance of Pure This Pair Mean Mean Premium of Spinners Frequency Severity Cols. F x G 0.5135 0.15 24 3.600 0.1926 0.15 34 5.100 0.2138 0.05 24 1.200 0.0802 0.05 34 1.700 3.223
1.000
For example, the chance of the observation if one has spinners A1 and B2 is: (.85)(.15)(.85)(.30) = .0325. For example, the posterior chance of spinners A1 and B2 is: .0081 / .0422 = .1926. For example, the mean severity if Spinner B2 is: (.30)(20) + (.70)(40) = 34. 13.20. C. The mean frequency is: (1/2){(1/6)+(3/6)} = 1/3. For spinner B1 the mean claim size is: (5/6)(2) + (1/6)(14) = 4. For spinner B2 the mean claim size is: (3/6)(2) + (3/6)(14) = 8. Thus the mean claim size is: (1/2)(4 + 8) = 6. The mean pure premium as the product of the mean frequency and mean severity: (1/3) (6) = 2. Alternately, one can calculate the mean pure premiums for each type of risk and average: Type of Risk A1,B1 A1,B2 A2,B1 A2,B2 Overall
A Priori Chance of This Type of Risk 0.250 0.250 0.250 0.250
Mean Frequency 0.167 0.167 0.500 0.500
Mean Severity 4.000 8.000 4.000 8.000
Mean Pure Premium 0.667 1.333 2.000 4.000
0.333
6.000
2.000
13.21. B. As we have more and more observations with no claim, the probability that we selected die A1 rather than die A2 increases. Therefore the expected value of the pure premium goes to (mean frequency for die A1) (mean severity) = (1/6)(6) = 1. Alternately letʼs assume we have for example 100 observations with no claim, then the chance of this observation is (5/6)^100 if we have die A1 and (3/6)^100 if we have die A2. Thus the Bayes Analysis is as follows: A
B
Type of Risk A1,B1 A1,B2 A2,B1 A2,B2
A Priori Chance of This Type of Risk 0.250 0.250 0.250 0.250
Overall
C
D
E
F
Chance of the Observation 1.207e-8 1.207e-8 7.889e-31 7.889e-31
Prob. Weight = Product of Columns B&C 3.019e-9 3.019e-9 1.972e-31 1.972e-31
Posterior Chance of This Type of Risk 50.00% 50.00% 0.00% 0.00%
Mean Pure Premium 0.667 1.333 2.000 4.000
6.037e-9
1.000
1.000
13.22. E. Mean frequencies are A1 = .2, A2 = .4. (Bernoulli) Process variances of the frequencies are: A1 = (.2)(.8) = .16, A2 = (.4)(.6) = .24. Mean severities are: B1 = 160, B2 = 120. Calculating the second central moment, the process variances of the severities are for B1: (.4)(100 - 160)² + (.6)(200 - 160)² = 2400, and for B2: (.8)(100 - 120)² + (.2)(200 - 120)² = 1600. Since the frequency and severity are independent, for each type of risk the Process Variance of the Pure Premium = (mean frequency)(variance of severity) + (mean severity)² (variance of frequency). Type Risk A1, B1 A1, B2 A2, B1 A2, B2 Mean
A Priori Probability 0.25 0.25 0.25 0.25
Mean Freq. 0.2 0.2 0.4 0.4
Variance of Freq. 0.16 0.16 0.24 0.24
Mean Severity 160 120 160 120
Variance of Sev. 2400 1600 2400 1600
Mean P.P. 32 24 64 48
0.3
0.2
140
2000
42
Variance Sq. of of P.P. Mean P.P. 4576 1024 2624 576 7104 4096 4096 2304 4600
2000
Expected Value of the Process Variance of the P.P. = 4600. Variance of the Hypothetical Mean Pure Premiums = 2000 - 42² = 236. Thus, K = 4600 / 236 = 19.5. Z = 1/(1 + 19.5) = 0.0488. The new estimate = (100)(.0488) + (42)(1 - .0488) = 44.8. Comment: Similar to a Die/Spinner question. What if instead the question had asked about the separate estimates of frequency and severity rather than the method using only aggregate claim amounts? In that case the solution would differ as follows. For the frequency, EPV = .20, VHM = .1² = .01, and K = .20 / .01 = 20. Z = 1 / (1 + 20) = 1/21. New estimated frequency = (1)(1/21) + (.3)(20/21) = 1/3. For the severity, EPV = 2000, VHM = 20² = 400, and K = 2000 / 400 = 5. Z = 1 / (1 + 5) = 1/6. New estimated severity = (100)(1/6) + (140)(5/6) = 133.33. The new estimated pure premium in this case would be the product of the separate estimates of frequency and severity: (1/3)(133.33) = 44.44. Notice that this differs from the solution to the question that was asked on the exam. In general the two methods would give different results.
13.23. C. The a priori chance of a risk from either Class is 1/2. If we have a risk from Class 1, then there are two ways of observing a pure premium of 100. One can observe a single claim of 100, which has a probability of (1/2)(1/3) = 1/6, or one can observe two claims each of size 50, which has a probability of (1/2)(2/3)(2/3) = 2/9. Thus for a risk from Class 1, the chance of observing a pure premium of 100 is: (1/6) + (2/9) = 28/72. Similarly, for a risk from Class 2, the chance of observing a pure premium of 100 is: (1/3) + (1/12) = 30/72. By Bayes Theorem, the posterior chance of a risk from Class 1 is proportional to the product of having made the observation if the risk had been from Class 1 times the a priori probability of the risk having been from Class 1; this product is: (28/72)(1/2) = 14/72. Similarly, for Class 2 the posterior probability is proportional to: (30/72)(1/2) = 15/72. Therefore, the posterior probabilities are: (14/29) and (15/29). The mean claim frequency for Class 1 is 3/2, while that for Class 2 is 4/3. Thus the posterior estimate of the claim frequency is: (14/29)(3/2) + (15/29)(4/3) = (21+20)/29 = 41/29. In spreadsheet form, being sure to retain plenty of decimal places: A
B
A Priori Chance of Type of This Type Risk of Risk 1 0.500 2 0.500 Overall
C
D
Chance of the Observation 0.38889 0.41667
Prob. Weight = Product of Columns B&C 0.19444 0.20833 0.40278
E
F
Posterior Chance of This Type of Risk = Average Col. D / Claim Sum of Col D. Frequency 0.48276 1.50000 0.51724 1.33333 1.41379 1.00000
Comment: One has to be careful of rounding, since 17/12 =1.417 while 41/29 = 1.414, so that choices D and C are very close.
13.24. A. The average pure premiums are the product of the average frequency and the average severity. Since the average pure premium for each type is 100, the Variance of the Hypothetical Mean Pure Premiums is zero. Class 1 2 Overall
A Priori Chance 0.5 0.5
Avg. Freq. 1.5000 1.3333
Avg. Severity 66.6667 75.0000
Avg. Pure Premium 100.0000 100.0000
Variance of Freq. 0.2500 0.2222
Variance of Sev. 555.55 625.00
Process Variance of P.P. 1944.44 2083.21 2013.82
Since the frequency and severity are independent, the process variance of the pure premium is: (variance of severity)(mean frequency) + (variance of frequency)(mean severity)². For example, for class 2 the process variance of pure premium = (625)(1.333) + (.2222)(75²) = 2083. Since the Expected Value of the Process Variance is greater than zero, and VHM = 0, we have Z = 0. (We have K = ∞, N = 2, and therefore Z = N / (N+K) = 0.) The a priori estimate of the pure premium is: (.5)(100) + (.5)(100) = 100. Thus the new estimate is: (125)(0) + (100)(1 - 0) = 100. Comment: Thereʼs no need to compute the EPV in order to answer this question. When the mean pure premiums for each class are equal, observations of the pure premium are given no Buhlmann credibility. However, as seen in the previous question, the more exact Bayes Analysis is able to extract some useful information from the observations, even in this case where the mean pure premiums are equal for each class.
13.25. B. In this case, if one observes a total losses of 5, it must have come from a single claim of size 5. If we have selected Die A, there is a: (2/3)(1/3) chance of this observation. If we have selected Die B, there is a: (1/3)(1/3) chance of this observation. A
B
C
D
Die A B
A Priori Chance of This Die 0.500 0.500
Chance of the Observation 0.2222 0.1111
Prob. Weight = Product of Columns B&C 0.1111 0.0556
Overall
0.1667
E
F
Posterior Mean Chance of for This Type This Type of Die of Die 66.7% 1.333 33.3% 1.667 1.000
1.444
Comment: Note that what is being asked for the expected number of claims. Unlike the usual diespinner example, we do not pick the spinners at random (independent of the number of claims.) Rather, the spinners depend on the number of claims, but not the type of risk. In fact we only have two types of risk: low frequency corresponding to Die A and high frequency corresponding to Die B. 13.26. In this case, if one observes a total losses of 7, it must have come from two claims of sizes 2 and 5. The probability of that is proportional to the probability of having two claims, which for Die A is 1/3 and for die B is 2/3. Thus the posterior distribution is 1/3 @ A and 2/3 @ B. The mean of Spinner X is: (2/3)(2) + (1/3)(5) = 3. The mean of Spinner Y is: (1/3)(2) + (2/3)(5) = 4. Thus the mean pure premium if die A is: (2/3)(3) + (1/3)(3 + 4) = 13/3. The mean pure premium if die B is: (1/3)(3) + (2/3)(3 + 4) = 17/3. Thus the Bayesian estimate of the pure premium for trial two is: (1/3)(13/3) + (2/3)(17/3) = 47/9 = 5.222. Comment: The types of die are the two risk types. The severity distribution is independent of which die we pick. Thus the probability of the observation conditional on having 2 claims is independent of the type of die chosen. Assuming we have 2 claims, the chance of one of them being 2 and the other being 5 is: Prob[X = 2] Prob[Y = 5] + Prob[Y = 2] Prob[X = 5] = (2/3)(2/3) + (1/3)(1/3) = 5/9.
13.27. The mean pure premium if Die A is 13/3 and the mean pure premium if Die B is 17/3. Thus since Die A and Die B are equally likely a priori, the Variance of the Hypothetical Mean Pure Premiums is: (2/3)² = 4/9. If one has Die A, then the possible outcomes are as follows: Situation 1 claim @ 2 1 claim @ 5 2 claims @ 2 each 2 claims: 1 @ 2 & 1 @ 5 2 claims: 1 @ 5 & 1 @ 2 2 claims @ 5 each
Probability 44.4% 22.2% 7.4% 14.8% 3.7% 7.4%
Pure Premium 2 5 4 7 7 10
Square of P.P. 4 25 16 49 49 100
Overall
100.0%
4.333
25.00
Thus for Die A, the process variance of the pure premiums is: 25 - (13/3)² = 56/9 = 6.222. Similarly, if one has Die B, then the possible outcomes are as follows: Situation 1 claim @ 2 1 claim @ 5 2 claims @ 2 each 2 claims: 1 @ 2 & 1 @ 5 2 claims: 1 @ 5 & 1 @ 2 2 claims @ 5 each
Probability 22.2% 11.1% 14.8% 29.6% 7.4% 14.8%
Pure Premium 2 5 4 7 7 10
Square of P.P. 4 25 16 49 49 100
Overall
100.0%
5.667
39.00
Thus for Die B, the process variance of the pure premiums is: 39 - (17/3)² = 62/9 = 6.889. Thus since Die A and Die B are equally likely a priori, the Expected Value of the Process Variance of the Pure Premiums is: (0.5)(56/9) + (0.5)(62/9) = 59/9. Thus the Buhlmann Credibility Parameter K = EPV / VHM = (59/9) / (4/9) = 14.75. For one observation, Z = 1 / (1+14.75) = 6.35%. The a priori mean pure premium is: (0.5)(13/3) + (0.5)(17/3) = 5. The Buhlmann Credibility estimate of the future pure premium is: (0.0635)(7) + (1 - 0.0635)(5) = 5.127. Comment: While for this observation the estimates from Buhlmann Credibility and Bayesian Analyses are very similar, they are not equal. 13.28. C. Frequency and severity are independent. The mean frequency is: (1/3 + 2/3)/2 = 1/2. The mean of spinner X is: (2)(1/3) + (8)(2/3) = 6. The mean of spinner Y is 2. Thus, the mean severity is (6 + 2)/2 = 4. Thus the mean pure premium is: (mean frequency)(mean severity) = (1/2)(4) = 2.
13.29. E. Since the spinner is randomly reselected after each trial, as n goes to infinity we continue to assume that spinner X and Y are equally likely. However the same die is used for each trial, so we can apply Bayes Theorem to estimate the posterior probability of each die. We observe a claim every trial. Therefore the posterior probability of die B is proportional to (2/3)n-1, while that for die A is proportional to (1/3)n-1. Thus as n goes to infinity the ratio of the probability of die B compared to that of die A, goes to infinity. Since the probabilities add to unity, the probability of die B goes to unity. Thus the expected frequency goes to 2/3, that of die B. The expected severity remains 4. Thus the expected pure premium goes to (2/3)(4) = 8/3 = 2.667. Comment: What if instead neither the die nor spinner are reselected after each trial? Then in the limit die B and spinner Y get all the probability. Thus the posterior estimate of the pure premium would be: (2/3)(2) = 4/3 in this case. 13.30. C. There are four equally likely types of tests: Front F, W F, L
Wall Lake
Side S, W S, L
The number of crash dummies acts as the frequency, while the amount of damage acts as the severity. We use Bayes Analysis to predict the future pure premium. If we have a Front and Wall test, then the total damage can be 1 if there is either one dummy with damage of 1, or 2 dummies each with damage .5. This has probability of: (1/4)(1/2) + (1/4)(1/2)² = 3/16. If we have a Front and Lake test, then the total damage can be 1 if there is one dummy with damage of 1. This has probability of: (1/4)(1/2) = 1/8. If we have a Side and Wall test, then the total damage can be 1 if there are 2 dummies each with damage .5. This has probability of: (1/2)(1/2)² = 1/8. If we have a Side and Lake test, then the total damage cannot be 1. A
B
Type
A Priori Prob.
Frequency
F, W F, L S, W S, L
0.25 0.25 0.25 0.25
1 or 2 or 3 or 4 1 or 2 or 3 or 4 2 or 4 2 or 4
SUM
C
D
E
F
G
H
I
Mean Mean Chance Probability Posterior Freq. Severity Sev. of Weights = Prob. Col. B x Col. G Observ.Col. 2.5 2.5 3.0 3.0
.5 or 1 1 or 2 .5 or 1 1 or 2
0.75 1.5 0.75 1.5
0.1875 0.1250 0.1250 0.0000
J
Mean P.P.
0.04688 0.03125 0.03125 0.00000
0.429 0.286 0.286 0.000
1.875 3.750 2.250 4.500
0.10938
1.000
2.518
The posterior distribution is: 3/7, 2/7, 2/7, 0. The estimated total damage from a test of the same type is: (3/7)(1.875) + (2/7)(3.75) + (2/7)(2.25) + (0)(4.5) = 2.518. Comment: Mathematically similar to a Die/Spinner model of pure premium in which one separately chooses one of two dice and one of two spinners.
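Here is a small Python sketch of my own (not from the original text) of the generic Bayes Analysis recipe used in 13.30: weight each a priori probability by the chance of the observation, normalize, and take the posterior-weighted average of the per-type means.

    # Bayes Analysis for 13.30: posterior = (prior x chance of observation), normalized.
    prior = [0.25, 0.25, 0.25, 0.25]             # F,W  F,L  S,W  S,L
    chance_of_obs = [3/16, 1/8, 1/8, 0.0]        # P(total damage = 1 | type), from above
    mean_pp = [1.875, 3.750, 2.250, 4.500]       # mean total damage by type

    weights = [p * c for p, c in zip(prior, chance_of_obs)]
    total = sum(weights)                          # 0.109375
    posterior = [w / total for w in weights]      # 3/7, 2/7, 2/7, 0
    estimate = sum(q * m for q, m in zip(posterior, mean_pp))
    print(round(estimate, 3))                     # 2.518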
13.31. D. We use Bayes Analysis to predict the expected value of the total damage for the next test accident.
If we have a Front and Wall test, then the total damage can be 1 if there is either one dummy with damage of 1, or 2 dummies each with damage .5. If we have a Front and Lake test, then the total damage can be 1 if there is one dummy with damage of 1. If we have a Side and Wall test, then the total damage can be 1 if there are 2 dummies each with damage .5. If we have a Side and Lake test, then the total damage can not be 1.

Air Bag Type | Type of Accident | Number of Dummies | A Priori Probability | Chance of Observation | Probability Weight | Posterior Probability | Average Damage
Front | Wall | 1 | 0.0625 | 0.5 | 0.03125 | 0.286 | 0.75
Front | Wall | 2 | 0.0625 | 0.25 | 0.015625 | 0.143 | 1.50
Front | Wall | 3 | 0.0625 | 0 | 0 | 0.000 | 2.25
Front | Wall | 4 | 0.0625 | 0 | 0 | 0.000 | 3.00
Front | Lake | 1 | 0.0625 | 0.5 | 0.03125 | 0.286 | 1.50
Front | Lake | 2 | 0.0625 | 0 | 0 | 0.000 | 3.00
Front | Lake | 3 | 0.0625 | 0 | 0 | 0.000 | 4.50
Front | Lake | 4 | 0.0625 | 0 | 0 | 0.000 | 6.00
Side | Wall | 2 | 0.1250 | 0.25 | 0.03125 | 0.286 | 1.50
Side | Wall | 4 | 0.1250 | 0 | 0 | 0.000 | 3.00
Side | Lake | 2 | 0.1250 | 0 | 0 | 0.000 | 3.00
Side | Lake | 4 | 0.1250 | 0 | 0 | 0.000 | 6.00
SUM | | | 1 | | 0.109375 | 1 | 1.286
If there is a Wall accident the average damage per dummy is: (.5 + 1)/2 = .75. If there is a Lake accident the average damage per dummy is: (1 + 2)/2 = 1.5. The estimated total damage for the next test accident is: (.286)(.75) + (.143)(1.5) + (.286)(1.5) + (.286)(1.5) = 1.286. Comment: The difference from Course 4, 11/00, Q.33 is that in this question the number of dummies is kept the same for the next test, in addition to the type of air bag and the type of accident.
13.32. A. For risk type A, the chance of observing total annual losses of 500 is: Prob(1 claim) Prob(size = 500) = (4/9)(1/3) = 12/81. For risk type B, the chance of observing total annual losses of 500 is: Prob(2 claims) Prob(size = 250)^2 = (4/9)(2/3)^2 = 16/81.
Since the risk types are equally likely, the posterior distribution is proportional to the chances of the observation, 12/81 and 16/81. Thus the posterior chance of A is: 12/(12 + 16) = 3/7, and of B is: 16/(12 + 16) = 4/7.
The mean loss for A is: (2/3)(990) = 660. The mean loss for B is: (4/3)(276) = 368.
The estimated future loss is: (3/7)(660) + (4/7)(368) = 493.

Type of Risk | A Priori Chance of This Type of Risk | Chance of the Observation | Prob. Weight | Posterior Chance of This Type of Risk | Avg. Pure Premium
A | 0.500 | 0.148 | 0.074 | 42.9% | 660.0
B | 0.500 | 0.198 | 0.099 | 57.1% | 368.0
Overall | | | 0.173 | 1.000 | 493.1
13.33. D. VHM = 285,512 - 514^2 = 21,316.

Risk Type | Mean Freq. | Var. Freq. | Mean Sev. | Var. Sev. | Process Variance | Mean P.P. | Square of Mean P.P.
A | 0.667 | 0.444 | 990 | 120,050 | 515,633.3 | 660 | 435,600
B | 1.333 | 0.444 | 276 | 1,352 | 35,659 | 368 | 135,424
Avg. | | | | | 275,646 | 514 | 285,512
K = EPV/VHM = 275646/21316 = 12.9. Z = 1/(1 + 12.9) = 7.2%. Estimate = (.072)(500) + (1 - .072)(514) = 513. Comment: EPV + VHM = 275,646 + 21,316 = 296,962 = variance of the total losses. Thus one can save time by using the given total variance and either the EPV or VHM to get the other, or one can use the given total variance to check our work.
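A short Python sketch of my own, reproducing 13.33: with independent frequency and severity, the process variance of the pure premium for each risk type is (mean freq)(var sev) + (mean sev)^2 (var freq); EPV, VHM, K and Z then follow as usual. All inputs are taken from the solution above.

    # 13.33: Buhlmann credibility for pure premiums with independent frequency and severity.
    risks = {
        "A": {"mean_f": 2/3, "var_f": 4/9, "mean_s": 990, "var_s": 120050},
        "B": {"mean_f": 4/3, "var_f": 4/9, "mean_s": 276, "var_s": 1352},
    }
    prior = 0.5
    pv = {t: r["mean_f"] * r["var_s"] + r["mean_s"] ** 2 * r["var_f"] for t, r in risks.items()}
    mean_pp = {t: r["mean_f"] * r["mean_s"] for t, r in risks.items()}    # 660 and 368

    epv = sum(prior * pv[t] for t in risks)                               # about 275,646
    overall = sum(prior * mean_pp[t] for t in risks)                      # 514
    vhm = sum(prior * mean_pp[t] ** 2 for t in risks) - overall ** 2      # 21,316
    z = 1 / (1 + epv / vhm)                                               # about 7.2%
    print(round(z * 500 + (1 - z) * overall))                             # about 513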
13.34. E. Given Die A and Spinner X, the chance of the observation is: (3/4)(1/2) = 3/8. Given Die A and Spinner Y, the chance of the observation is: (3/4)(1) = 3/4. Given Die B and Spinner X, the chance of the observation is: (1/4)(1/2) = 1/8. Given Die B and Spinner Y, the chance of the observation is: (1/4)(1) = 1/4.

Type of Die and Spinner | A Priori Chance of This Die & Spinner | Chance of the Observation | Prob. Weight | Posterior Chance of This Die & Spinner | Mean Freq. | Mean Sev. | Mean Pure Premium
A, X | 0.25 | 0.375 | 0.0938 | 0.2500 | 0.7500 | 6 + .5c | 4.5 + 3c/8
A, Y | 0.25 | 0.750 | 0.1875 | 0.5000 | 0.7500 | 12 | 9.000
B, X | 0.25 | 0.125 | 0.0312 | 0.0833 | 0.2500 | 6 + .5c | 1.5 + c/8
B, Y | 0.25 | 0.250 | 0.0625 | 0.1667 | 0.2500 | 12 | 3.000
Overall | | | 0.3750 | 1.000 | | |
The posterior mean pure premium is: (.25)(4.5 + 3c/8) + (.50)(9) + (.0833)(1.5 + c/8) + (.1667)(3) = 6.25 + .1042c.
Setting this equal to the stated estimate of 10: 10 = 6.25 + .1042c. ⇒ c = 36.
Comment: If c = 12, then the two spinners are equal. Each loss is then of size 12, and the chance of the observation is the chance of observing 1 claim. In this case, the estimated future pure premium would be 7.5:

Type of Die | A Priori Chance of This Die | Chance of the Observation | Prob. Weight | Posterior Chance of This Die | Mean Freq. | Mean Sev. | Mean Pure Premium
A | 0.50 | 0.750 | 0.3750 | 0.7500 | 0.7500 | 12 | 9.000
B | 0.50 | 0.250 | 0.1250 | 0.2500 | 0.2500 | 12 | 3.000
Overall | | | 0.5000 | 1.000 | | | 7.500
13.35. D. For Business Use drivers: (% rural)(1) + (% urban)(2) = 1.8. Thus of the Business Use drivers: 80% are urban and 20% are rural. For Pleasure Use drivers: (% rural)(1.5) + (% urban)(2.5) = 2.3. Thus of the Pleasure Use drivers: 80% are urban and 20% are rural.

Type of Driver | A Priori Chance of Driver | Mean Freq. | Square of Mean Freq. | Variance of Freq.
B, R | 0.100 | 1.000 | 1.000 | 0.5
B, U | 0.400 | 2.000 | 4.000 | 1.0
P, R | 0.100 | 1.500 | 2.250 | 0.8
P, U | 0.400 | 2.500 | 6.250 | 1.0
Mean | | 2.050 | 4.425 | 0.930
VHM = 4.425 - 2.05^2 = 0.2225. EPV = 0.930. K = EPV/VHM = 0.930/0.2225 = 4.18. Z = 1/(1 + 4.18) = 0.193.
Comment: It is intended that there are four separate cells: Business/Rural, Business/Urban, Pleasure/Rural, Pleasure/Urban. Each driver is in one and only one of the four cells.

 | Business | Pleasure
Rural | 10% | 10%
Urban | 40% | 40%

The EPV within Business Use is: (0.2)(.5) + (0.8)(1) = 0.9. The Variance of the Hypothetical Means within Business Use is: (0.2)(1 - 1.8)^2 + (0.8)(2 - 1.8)^2 = 0.16. 0.9 + 0.16 = 1.06, the shown total claims variance for Business Use. This is not the way one would estimate the experience of an individual driver in this type of situation in practical applications with classifications.
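A brief Python sketch of my own, reproducing 13.35: the hypothetical means are the four cell mean frequencies, weighted by the a priori cell probabilities.

    # 13.35: Buhlmann credibility across the four driver cells.
    cells = {  # (a priori probability, mean frequency, process variance of frequency)
        "B,R": (0.10, 1.0, 0.5),
        "B,U": (0.40, 2.0, 1.0),
        "P,R": (0.10, 1.5, 0.8),
        "P,U": (0.40, 2.5, 1.0),
    }
    overall_mean = sum(p * m for p, m, _ in cells.values())                    # 2.05
    epv = sum(p * v for p, _, v in cells.values())                             # 0.930
    vhm = sum(p * m * m for p, m, _ in cells.values()) - overall_mean ** 2     # 0.2225
    z = 1 / (1 + epv / vhm)
    print(round(z, 3))                                                         # 0.193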
Section 14, Classification Ratemaking
An important aspect of most forms of insurance is the classification of insureds.144 One groups together insureds with similar characteristics related to risk, so that differences in costs may be recognized.145 Various characteristics might be used in order to group together insureds who are likely to have a similar average loss cost. For example, whether or not someone smokes could be used for life or health insurance. Age is an important classification variable for life insurance. The type of business is important for Workersʼ Compensation Insurance; Furniture Stores are more hazardous on average than Hardware Stores. The year and model of car are important for Automobile Collision Insurance. The place of principal garaging, the years of driving experience, etc., might be important for automobile insurance.
Such classes are intended to improve the estimate of the future compared to not using the classification information. They are not intended to eliminate all uncertainty. Generally, one would try to estimate the future pure premium or loss ratio for a classification by applying weight Z to the observation for the class and weight 1 - Z to the estimate for a group of classes. For example, one might estimate the Soap Manufacturing Class by weighting its experience with that of all Manufacturing classes. In classification ratemaking, Z is quite often calculated using Classical Credibility.146 However, ideas from Buhlmann Credibility and Empirical Bayesian Credibility147 can also be used.148
As will be discussed in the next section, an individual policyholderʼs experience can be used together with his classification in order to improve the estimate of that policyholderʼs future experience compared to not relying on the individual experience at all. Experience Rating applies weight Z to the observation of an individual policyholder and weight 1 - Z to the estimate for its class. In Experience Rating, Z is usually calculated using ideas from Buhlmann Credibility and/or Empirical Bayesian Credibility.
Experience Rating is used on top of and in addition to Classification Ratemaking. A well-designed system of Classifications and an Experience Rating Plan work together in order to improve the estimates of the future. Giving some weight to the additional information provided by the experience of an individual policyholder improves the estimate of the future but does not remove all prediction error.
144
For example, see Foundations of Casualty Actuarial Science, Fourth Edition, Chapters 6 and 3.
145 For example, see the Actuarial Standards Board, Standards of Practice #12.
146 See "Mahlerʼs Guide to Classical Credibility."
147 See "Mahlerʼs Guide to Empirical Bayesian Credibility."
148 In that case the complement of credibility is applied to the loss ratio for a larger group of classes, as in "Empirical Bayesian Credibility for Workersʼ Compensation Ratemaking", by Glenn Meyers, PCAS 1984, or the current relativity for a class, as in "Workersʼ Compensation Classification Credibilities," by Howard C. Mahler, CAS Forum, Fall 1999.
Homogeneity of Classifications:
One important feature is the homogeneity of the classifications.149 One desires classifications that are relatively homogeneous; one desires that the insureds in a class be as similar as possible in their expected pure premiums.150
Below is shown an example of two classes that are relatively homogeneous. The Poisson parameters of the first class are distributed via a Gamma Distribution around a mean of 10%, while those of the second class are distributed via a Gamma Distribution around a mean of 30%.
[Figure: densities of lambda for the two classes; a Gamma with alpha = 5 and theta = 0.02, and a Gamma with alpha = 5 and theta = 0.06, plotted for lambda from 0 to about 0.8.]
Note that there is considerable “overlap” between the classes. The worst insured in the low risk class has a higher expected claim frequency than many insureds in the high risk class. It is generally the case that classifications will exhibit such “overlap”. For example the safest Furniture Store is probably of lower hazard than the least safe Hardware Store, even though on average Furniture Stores are more hazardous. Above, each class has a spread of insureds from more to less risky. The more homogeneous the class the less spread there is. Below is shown an example of less homogeneous classes.
149 See pages 555-556 of Loss Models.
150 If one ignores differences in expected severity, one desires that the insureds within a class have similar expected frequencies.
[Figure: densities of lambda for two less homogeneous classes; a Gamma with alpha = 2 and theta = 0.08, and a Gamma with alpha = 2 and theta = 0.12, plotted for lambda from 0 to about 0.8.]
These less homogeneous, more heterogeneous classes are the type of thing one might get if one classified insureds according to their middle initials. One would not expect to get much or even any distinction between the classes. The purpose of class plans is to group insureds of similar hazard. A good class plan, such as in the first diagram, would produce class means far apart from each other and individual means within a class tightly bunched around the class mean. We would then assign more credibility to the average for a class and less to the overall mean. The more homogeneous the classes, the more credibility is assigned to their data and the less to the overall average, when determining classification rates. In actual applications there are competing goals.151 One wants to have classifications for which there is likely to be enough data from which to make reasonably accurate rates. Thus it is not useful to divide the total universe of insureds into very tiny but very homogeneous classes. Rather, one wants reasonably homogeneous classes with a reasonable amount of data in most of them. For example, generally one divides states into territories that are large enough to produce a usable quantity of data, but are small enough to capture the variation of hazard across the state. Then one would make rates for each territory as a credibility weighted average of the experience of that territory and the combined experience for the state. 151
The American Academy of Actuaries Committee on Risk Classification Report : “Risk Classification Statement of Principles”, June 1980, lists three “statistical considerations”: Homogeneity, Credibility, and Predictive Stability. Michael Walters in “Risk Classification Standards”, PCAS 1981, lists the following broad desirable characteristics of classification systems: homogeneous, well-defined, and practical.
Homogeneity of Territories:
For most lines of insurance, the premium charged depends on the geographical location. A state will be divided into many territories, with the rate depending on the territory. Territories act mathematically like another classification dimension, and thus many of the same ideas apply to territories as apply to classes.
Let us assume that for a line of insurance one models the costs by zipcode across a state.152 Then one could create territories by grouping together zipcodes with similar expected pure premiums.153 One would want homogeneous territories, but also territories that are each big enough to have sufficient data to have enough credibility for use in determining territory relativities.
One way to measure the homogeneity of territories would be to divide the total variance between zipcodes into a variance between territories and a variance within territories. The smaller the within variance, and thus the larger the between variance, the more homogeneous the territories. For example, one might get a graph of the within variance similar to the following:154
[Figure: the within-territory percent of total variance, declining from roughly 50% toward roughly 20% as the number of territories increases from 6 to 40.]
As one divides the state into more and more territories, the rate at which homogeneity improves declines. In this case, one might choose to use about 20 territories, balancing the desire for homogeneous territories with the desire for credible territories. 152
The line of insurance might be homeowners or private passenger automobile. The model might be a generalized linear model, taking into account many aspects of each zipcode. The effect on expected costs due to zipcodes could be determined having adjusted for the effect of the other rating variables. 153 Traditionally one requires that territories be contiguous, but if one relaxes that restriction one can get more homogeneous territories. 154 Adapted from “Determination of Statistically Optimal Geographic Territory Boundaries,” by Klayton N. Southwood, CAS Special Interest Seminar on Predictive Modeling, October 2006.
Problems:
14.1 (1 point) Which of the following are true with respect to grouping of policies to create rating classifications in Property/Casualty insurance?
1. Policies are occasionally grouped according to the different levels of the various risk factors involved.
2. Such grouping should leave an insignificant level of residual heterogeneity.
3. When a significant level of residual heterogeneity would result rather than use such a grouping one should use experience rating methods.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D.

Use the following information for the next 15 questions:
• The random variable representing the number of claims for a single policyholder in a year follows a Poisson distribution.
• Policyholders are divided into five distinct classes, with the following percent in each of the classes:
Class 1: 20%, Class 2: 30%, Class 3: 25%, Class 4: 15%, Class 5: 10%
• Within each class of policyholders, the Poisson parameters vary via a Gamma Distribution, representing the heterogeneity of risks within that class.

Gamma Parameters
Class | α | θ
1 | 1.1 | 0.32
2 | 1.6 | 0.26
3 | 2.0 | 0.28
4 | 1.8 | 0.38
5 | 2.5 | 0.31
14.2 (1 point) For Class 1, what is the variance of the hypothetical mean frequencies of its policyholders? A. 0.10 B. 0.11 C. 0.12 D. 0.13 E. 0.14 14.3 (1 point) For Class 1, what is the expected value of the process variance of the frequencies of its policyholders? A. 0.27 B. 0.29 C. 0.31 D. 0.33 E. 0.35 14.4 (1 point) Let N be the number of claims next year for a policyholder chosen at random from Class 1. What is the variance of N? A. 0.46 B. 0.49 C. 0.52 D. 0.55 E. 0.58
14.5 (2 points) Define more homogeneous as a smaller variance of the hypothetical mean frequencies. Which of the five classes is most homogeneous?
A. Class 1 B. Class 2 C. Class 3 D. Class 4 E. Class 5
14.6 (1 point) Which of the five classes is least homogeneous?
A. Class 1 B. Class 2 C. Class 3 D. Class 4 E. Class 5
14.7 (1 point) Assume you were creating an experience rating system to apply to just the policyholders in the most homogeneous class; how much Buhlmann Credibility should be applied to three years of experience of an individual policyholder? (The complement of credibility will be applied to the class mean.)
A. Less than 42% B. At least 42%, but less than 45% C. At least 45%, but less than 48% D. At least 48%, but less than 51% E. At least 51%
14.8 (1 point) Assume you were creating an experience rating system to apply to just the insureds in the least homogeneous class; how much Buhlmann Credibility should be applied to three years of experience of an individual policyholder? (The complement of credibility will be applied to the class mean.)
A. Less than 42% B. At least 42%, but less than 45% C. At least 45%, but less than 48% D. At least 48%, but less than 51% E. At least 51%
14.9 (1 point) Using the answer to the previous question, estimate the future annual frequency of a policyholder chosen at random from this Class, who had 5 claims in 3 years.
A. 1.00 B. 1.05 C. 1.10 D. 1.15 E. 1.20
14.10 (1 point) Let N be the number of claims next year for a policyholder chosen at random from this portfolio. What is the mean of N?
A. 0.43 B. 0.46 C. 0.49 D. 0.52 E. 0.55
14.11 (2 points) Let N be the number of claims next year for a policyholder chosen at random from this portfolio. What is the variance of N?
A. 0.61 B. 0.63 C. 0.65 D. 0.67 E. 0.69
14.12 (2 points) What is the (weighted) average of the variances of the hypothetical means within each of the classes? A. Less than 0.16 B. At least 0.16, but less than 0.18 C. At least 0.18, but less than 0.20 D. At least 0.20, but less than 0.22 E. At least 0.22 14.13 (2 points) Assume you were creating an experience rating system to be applied to all the policyholders, how much Buhlmann Credibility should be applied to three years of experience of an individual policyholder? (The complement of credibility will be applied for each policyholder to the mean of its class. Thus for each policyholder we make use of the knowledge of the class to which it belongs.) Hint: Use the answer to the previous question. A. Less than 35% B. At least 35%, but less than 40% C. At least 40%, but less than 45% D. At least 45%, but less than 50% E. At least 50% 14.14 (1 point) Using the answer to the previous question, estimate the future annual frequency of a policyholder chosen at random from Class 4, who had 5 claims in 3 years. A. 1.00 B. 1.05 C. 1.10 D. 1.15 E. 1.20 14.15 (2 points) Assume the state passed a law banning insurers from using the above classification system. If you create an experience rating system to be applied to all the policyholders, how much Buhlmann Credibility should be applied to three years of experience of an individual policyholder? (One ignores classification for predicting the future frequency and applies the complement of credibility to the mean over all classes.) A. Less than 35% B. At least 35%, but less than 40% C. At least 40%, but less than 45% D. At least 45%, but less than 50% E. At least 50% 14.16 (1 point) Using the answer to the previous question, estimate the future annual frequency of a policyholder chosen at random from Class 4, who had 5 claims in 3 years. A. 1.00 B. 1.05 C. 1.10 D. 1.15 E. 1.20
14.17 (4, 5/88, Q.38) (1 point) Which of the following statements are true?
1. Large values of credibility are always desirable.
2. A class plan with homogeneous classes will result in low credibilities for individual risk experience.
3. For "good" class plans the credibility of class experience will be higher than for a poorer class plan.
A. 1 B. 2 C. 3 D. 2, 3 E. 1, 2 and 3

14.18 (4, 5/91, Q.45) (3 points) A population of insureds consists of two classifications each with 50% of the total insureds. The Buhlmann credibility for the experience of a single insured within a classification is calculated below.

Classification | Mean Frequency | Variance of Hypothetical Means | Expected Value of Process Variance | Buhlmann Credibility
A | 0.09 | 0.01 | 0.09 | 0.10
B | 0.27 | 0.03 | 0.27 | 0.10

Calculate the Buhlmann credibility for the experience of a single insured selected at random from the population if its classification is unknown.
A. Less than 0.08 B. At least 0.08 but less than 0.10 C. At least 0.10 but less than 0.12 D. At least 0.12 but less than 0.14 E. At least 0.14

14.19 (4B, 5/93, Q.23) (1 point) Which of the following statements are true concerning the use of credibility in classification ratemaking?
1. A small standard deviation of observations within a particular class would indicate a homogeneous group of risks within the class.
2. The credibility assigned to class data will tend to decrease as the variance of the hypothetical means between classes increases.
3. A well-designed class plan (resulting in homogeneous classes) generally results in high credibility assigned to the classification experience.
A. 1 B. 2 C. 1, 2 D. 1, 3 E. 1, 2, 3
14.20 (4B, 5/95, Q.24) (2 points) You are given the following:
• The random variable representing the number of claims for a single policyholder follows a Poisson distribution.
• For each class of policyholders, the Poisson parameters follow a gamma distribution representing the heterogeneity of risks within that class.
• For four distinct classes of risks, the random variable representing the number of claims of a policyholder, chosen at random, follows a negative binomial distribution with parameters r and β, as follows:

 | Class 1 | Class 2 | Class 3 | Class 4
r | 5.88 | 1.26 | 10.89 | 2.47
β | 0.2041 | 0.1111 | 0.0101 | 0.0526

• The negative binomial distribution with parameters r and β has the form:
f(x) = {r(r + 1)...(r + x - 1) / x!} β^x / (1 + β)^(x + r).
• The lower the standard deviation of the gamma distribution, the more homogeneous the class. Which of the four classes is most homogeneous? A. Class 1 B. Class 2 C. Class 3 D. Class 4 E. Cannot be determined from the given information.
Solutions to Problems:
14.1. E. 1. F. Policies are usually grouped into classifications. 2. F. The residual heterogeneity is often still considerable. 3. F. Experience rating is used in addition to classifications, rather than instead of classifications.
14.2. B. For Class 1, VHM = Var[λ] = Variance of the Gamma Distribution = αθ^2 = (1.1)(.32^2) = 0.1126.
14.3. E. For Class 1, EPV = E[λ] = Mean of the Gamma = αθ = (1.1)(.32) = 0.352.
14.4. A. Total Variance = EPV + VHM = 0.352 + 0.1126 = 0.4646.
Comment: See the discussion of the Gamma-Poisson in "Mahlerʼs Guide to Conjugate Priors." The marginal distribution is Negative Binomial with r = α = 1.1 and β = θ = 0.32, with variance: rβ(1 + β) = (1.1)(0.32)(1.32) = 0.4646.
14.5. B. & 14.6. D. The most homogeneous class has the lowest variance of the distribution of Poisson parameters within that class. VHM = Var[λ] = Variance of the Gamma Distribution = αθ^2.

Class | α | θ | VHM
1 | 1.1 | 0.32 | 0.11264
2 | 1.6 | 0.26 | 0.10816
3 | 2.0 | 0.28 | 0.15680
4 | 1.8 | 0.38 | 0.25992
5 | 2.5 | 0.31 | 0.24025
Class 2 is the most homogeneous. Class 4 is least homogeneous or the most heterogeneous.
14.7. B. The most homogeneous class is Class 2. The EPV = E[λ] = mean of the Gamma Distribution = αθ, since we are mixing Poissons. Within Class 2 the variance of the hypothetical means is VHM = Var[λ] = Variance of the Gamma Distribution = αθ^2. Therefore, the Buhlmann Credibility parameter K = αθ / αθ^2 = 1/θ = 1/.26 = 3.85. Thus three years of experience gets credibility Z = 3/(3 + K) = 43.8%.
Comment: See the Gamma-Poisson in "Mahlerʼs Guide to Conjugate Priors."
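The brief Python sketch below is my own illustration of the Gamma-Poisson shortcut used in 14.7 and 14.8: with a Gamma mixing distribution, EPV = αθ and VHM = αθ^2, so K = 1/θ.

    # Gamma-Poisson: EPV = alpha*theta, VHM = alpha*theta^2, so K = 1/theta.
    def credibility_three_years(theta):
        k = 1 / theta
        return 3 / (3 + k)

    print(round(credibility_three_years(0.26), 3))   # Class 2: about 0.438
    print(round(credibility_three_years(0.38), 3))   # Class 4: about 0.533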
14.8. E. The least homogeneous class is Class 4. K = αθ / αθ^2 = 1/θ = 1/.38 = 2.63. Thus three years of experience gets credibility Z = 3/(3 + K) = 53.3%.
Comment: Note that the credibility for experience rating was less for the more homogeneous class than it is here for the more heterogeneous class. Here the classification does a worse job of predicting the future frequency of an individual policyholder, so we give relatively more weight to the experience of the individual policyholder.
14.9. E. From the previous solution Z = 53.3%. The mean for Class 4 is: (1.8)(0.38) = 0.684. Estimated future annual frequency is: (0.533)(5/3) + (1 - 0.533)(0.684) = 1.208.
14.10. D. & 14.11. E. As shown below the weighted average frequency is 0.5153. Using analysis of variance, the variance for the whole portfolio is: E[Variance | Class] + VAR[Mean | Class].

Class | A Priori Probability | α | θ | Class Mean | Square of Class Mean | Total Within Class Variance
1 | 20% | 1.1 | 0.32 | 0.3520 | 0.1239 | 0.4646
2 | 30% | 1.6 | 0.26 | 0.4160 | 0.1731 | 0.5242
3 | 25% | 2.0 | 0.28 | 0.5600 | 0.3136 | 0.7168
4 | 15% | 1.8 | 0.38 | 0.6840 | 0.4679 | 0.9439
5 | 10% | 2.5 | 0.31 | 0.7750 | 0.6006 | 1.0152
Average | | | | 0.5153 | 0.2853 | 0.6725
VAR[Mean | Class] = variance of the hypothetical means between classes = 0.2853 - 0.5153^2 = 0.0198. The Total Variance within each class is: EPV + VHM within class = αθ + αθ^2 = αθ(1 + θ). E[Variance | Class] = (weighted) average of the (total) variances within each class = (20%)(0.4646) + (30%)(0.5242) + (25%)(0.7168) + (15%)(0.9439) + (10%)(1.0152) = 0.6725. Therefore the variance for the whole portfolio = 0.6725 + 0.0198 = 0.6923.
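A small Python sketch of my own showing the decomposition used in 14.10 and 14.11: the portfolio variance is the average within-class variance plus the variance of the class means.

    # 14.10 / 14.11: portfolio mean and variance from the class-level quantities.
    classes = [  # (weight, alpha, theta)
        (0.20, 1.1, 0.32), (0.30, 1.6, 0.26), (0.25, 2.0, 0.28),
        (0.15, 1.8, 0.38), (0.10, 2.5, 0.31),
    ]
    mean = sum(w * a * t for w, a, t in classes)                          # 0.5153
    within = sum(w * a * t * (1 + t) for w, a, t in classes)              # E[Var | Class] = 0.6725
    between = sum(w * (a * t) ** 2 for w, a, t in classes) - mean ** 2    # Var[Mean | Class] = 0.0198
    print(round(mean, 4), round(within + between, 4))                     # 0.5153 0.6923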
14.12. A. Within a class, VHM = Var[λ] = Variance of the Gamma Distribution = αθ^2.

Class | A Priori Probability | α | θ | VHM
1 | 20% | 1.1 | 0.32 | 0.1126
2 | 30% | 1.6 | 0.26 | 0.1082
3 | 25% | 2.0 | 0.28 | 0.1568
4 | 15% | 1.8 | 0.38 | 0.2599
5 | 10% | 2.5 | 0.31 | 0.2402
Average | | | | 0.1572
Comment: Note that one can divide the total variance for the portfolio into: VHM Between Classes + VHM Within Classes + EPV = .0198 + .1572 + .5153 = .6923.
14.13. D. The Expected Value of the Process Variance = mean = .5153, since we are mixing Poissons. The average of the within class variances of the hypothetical means is .1572. Therefore, the Buhlmann Credibility parameter K = .5153 / .1572 = 3.28. Thus three years of experience gets credibility Z = 3/(3 + K) = 47.8%.
Comment: Very difficult. This is a simplified version of what is done in real world applications. One applies experience rating on top of and in addition to classification rating. In theory one could use different Buhlmann Credibility Parameters for each Class, based on the EPV for that class and variance of hypothetical means within that class. In each case, K = αθ / αθ^2 = 1/θ.

Class | α | θ | K
1 | 1.1 | 0.32 | 3.12
2 | 1.6 | 0.26 | 3.85
3 | 2.0 | 0.28 | 3.57
4 | 1.8 | 0.38 | 2.63
5 | 2.5 | 0.31 | 3.23
However, in practice one usually uses the same Buhlmann Credibility parameter, in this case 3.28, for insureds from every class. Thus the credibility would be determined from some average of the variance of the hypothetical means within the classes. (Note that this average of the variance of the hypothetical means within the classes is smaller than the total variance of hypothetical means ignoring the class plan: .1572 < .1572 + .0198 = .1770.) The resulting Buhlmann Credibility parameter is generally somewhere in the range of parameters that could be calculated for each class separately. The resulting experience rating credibilities are in the general range of those that would result from using a separately calculated Buhlmann Credibility parameter for each class.
14.14. D. From the previous solution Z = 47.8%. The mean for Class 4 is: (1.8)(0.38) = .684. Estimated future annual frequency is: (.478)(5/3) + (1 - .478)(.684) = 1.154.
14.15. E. The VHM = VHM within classes plus VHM between classes = .0198 + .1572 = .1770. EPV = .5153. K = EPV/VHM = .5153/.1770 = 2.91. Z = 3/(3+K) = 50.8%. Comment: In the absence of the classification plan, the individual experience gets more weight than it did in the presence of the classification system. 14.16. C. From the previous solution Z = 50.8%. From a previous solution, the overall mean is .5153. Estimated future annual frequency is: (.508)(5/3) + (1 - .508)(.5153) = 1.100. 14.17. D. 1. False. 2. True. 3. True.
14.18. D. The overall mean is: (.5)(.09) + (.5)(.27) = .18. In order to compute the Variance of the Hypothetical Means, one can compute the second moment of the hypothetical means for each class. For class A the second moment of the hypothetical means is: .01 + .09^2 = .0181. For class B the second moment of the hypothetical means is: .03 + .27^2 = .1029. The second moment for the whole population is the weighted average of these second moments for each class: (.5)(.0181) + (.5)(.1029) = .0605. Thus the overall variance of the hypothetical means is: .0605 - .18^2 = .0281. The Expected Value of the Process Variance for the whole population is a weighted average of the expected value of the process variance for each class: (.5)(.09) + (.5)(.27) = .18. Thus K = EPV / VHM = .18 / .0281 = 6.406. For one observation Z = 1 / (1 + 6.406) = 0.135.
Alternately, in order to compute the VHM, one can apply the concepts of analysis of variance. The VAR[Mean | Class] = (.5)(.09 - .18)^2 + (.5)(.27 - .18)^2 = .0081. Therefore, the variance of the hypothetical means for the whole population is: E[Variance of the hypothetical means | Class] + VAR[Mean | Class] = {(.5)(.01) + (.5)(.03)} + .0081 = .0281, as computed above.
Comment: Difficult! More recent exam questions assume that everyone in a class has the same distribution; in other words that they are independent, identically distributed variables. Instead, here it is assumed that each class is not homogeneous; looking at each class separately, there is a variance of hypothetical means within each class. This is more realistic. Therefore, when we combine the two classes and pick an insured at random, without knowing what class it is from, we have to do extra work to get the overall VHM. The definition of the expected value is such that one can weight together the expected value for a subpopulation times the chance of being in that subpopulation. Thus, (combined) EPV = E[Process Variance] = E[Process Variance | Class A] Prob[Class A] + E[Process Variance | Class B] Prob[Class B] = (EPV for Class A)(proportion in Class A) + (EPV for Class B)(proportion in Class B). Also, E[Square of Hypothetical Means] = E[Square of Hypothetical Means | Class A] Prob[A] + E[Square of Hypothetical Means | Class B] Prob[B]. Note that the overall Variance of the Hypothetical Means is greater than the average of that for the individual classes: .0281 > {(.5)(.01) + (.5)(.03)}. Note that one makes no use of the "Buhlmann Credibility" given for each class; for each class EPV/VHM = K = 9 and thus for one observation Z = 1/(1+9) = 1/10. Note that when a risk is chosen at random without knowing which class it is from, the credibility for one observation is increased compared to that when we know which class the risk is from. In other words, in the absence of the class plan we give more weight to the individual insuredʼs observed experience. In the absence of the class plan, the complement of credibility is given to the overall mean rather than the mean of the relevant class. The overall mean of .18 is a worse predictor of an individualʼs future experience than was the relevant class mean of either .09 or .27, and therefore it is given less weight. Credibility is a measure of the relative value of two predictors.
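A short Python sketch of my own showing the mixing argument of 14.18: combine the two classes by weighting the EPVs and the second moments of the hypothetical means.

    # 14.18: combined credibility when the class is unknown.
    classes = [  # (weight, class mean, VHM within class, EPV within class)
        (0.5, 0.09, 0.01, 0.09),
        (0.5, 0.27, 0.03, 0.27),
    ]
    overall_mean = sum(w * m for w, m, _, _ in classes)                       # 0.18
    epv = sum(w * pv for w, _, _, pv in classes)                              # 0.18
    second_moment = sum(w * (vhm + m ** 2) for w, m, vhm, _ in classes)       # 0.0605
    vhm = second_moment - overall_mean ** 2                                   # 0.0281
    z = 1 / (1 + epv / vhm)
    print(round(z, 3))                                                        # 0.135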
14.19. D. 1. True. 2. False. The credibility will increase not decrease. As the classes differ more from each other, the data for each class will be given more credibility in estimating its own mean while the overall mean will be given less credibility. 3. True.
14.20. C. The most homogeneous class has the lowest standard deviation of the Gamma Distribution and therefore the smallest variance of the Gamma which is the smallest Variance of the Hypothetical Means. The Variance of the Hypothetical Means = Total Variance - Process Var. = Var. of the Negative Binomial - Mean of the Gamma = Variance of the Negative Binomial - Mean of Negative Binomial = rβ(1+β) - rβ = rβ^2. It is smallest for Class 3.

Class | r | β | Variance of Gamma
1 | 5.88 | 0.0204 | 0.0024
2 | 1.26 | 0.1111 | 0.0156
3 | 10.89 | 0.0101 | 0.0011
4 | 2.47 | 0.0526 | 0.0068

Alternately, the parameters of the Gamma can be derived from those of the Negative Binomial, α = r, θ = β. Then the Variance of the Gamma = αθ^2 = rβ^2.
Section 15, Experience Rating
Assume that an insured has had no accidents over the last decade. This provides evidence that he is a safer than average insured; his expected claim frequency is lower than average for his class. Thus for automobile insurance one might give him a "safe driver discount" off of the otherwise applicable rate for his class. This is an example of experience rating.155
Generally, experience rating consists of modifying the rate charged to an insured (driver, business, etc.) based on its past experience. While such plans can be somewhat complex in detail,156 in broad outline they all reward better than expected experience and penalize worse than expected experience. Depending on the particular circumstances more or less weight is put on the insuredʼs observed experience from the recent past.157 The new estimate of the insuredʼs frequency or pure premium is a weighted average of that for his classification and the observation. The amount of weight given to the observation is the credibility assigned to the individual insuredʼs data.
How much credibility to assign to an individual insuredʼs data is precisely what has been covered in previous sections. In general it should depend on:
1. What is being estimated. Pure Premiums are harder to estimate than frequencies. Total Limits losses are harder to estimate than basic limits losses.
2. The volume of data. All other things being equal, the more data the more credibility is assigned to the observation.158
3. The Expected Value of the Process Variance. The more volatile the experience, the less credibility is assigned to it.
4. The variance of the hypothetical means within classes; the more homogeneous the classification the smaller this variance and the less credibility is assigned to the insuredʼs individual experience compared to that for the whole classification.
The more homogeneous the classes, the less credibility assigned an individualʼs data and the more to the average for the class, when performing experience rating (individual risk rating.) The credibility is a relative measure of the value of the information contained in the observation of the individual versus the information in the class average.
155
For example, see Foundations of Casualty Actuarial Science, Fourth Edition, Chapter 4.
156 Experience Rating Plans are currently covered on the CAS Part 5 and Part 9 Exams.
157 The period of past experience used varies between the different Experience Rating Plans.
158 For example, in Workersʼ Compensation Insurance the data from a business with $10,000 in Expected Losses would be given much less credibility for Experience Rating than the data from a business with $1 million in Expected Losses.
The more homogeneous the classes, the more value we place on the class average and the less we place on the individualʼs experience. Thus low credibility is neither good nor bad. It merely reflects the relative values of two pieces of information. With a well designed class plan, we need to rely less on the observations of the individual than with a poorly designed class plan. In auto insurance, if we classified insureds based on their middle initials, we would expect to give the insuredsʼ individual experience a lot of credibility. A poor class plan leads one to rely more on individual experience.
Note that the role of the class in Experience Rating has changed from its role in Classification Ratemaking. In Experience Rating, the class experience receives the complement of credibility not given to the individualʼs experience. In the case of classification rating, the class experience gets the credibility while the complement of credibility is assigned to the experience of all classes combined. In Experience Rating, the insured is the smaller unit while the class is the larger unit. In Classification Ratemaking, the class is the smaller unit while the state is the larger unit. In both cases, the weight given to the classificationʼs experience is larger the more homogeneous the class. Thus the more homogeneous the classes, the more credibility is given to the experience of each class for Classification Ratemaking. The more homogeneous the class, the less credibility is assigned to the individualʼs experience and therefore the more weight is given to the class experience for Experience Rating.
A model that helps one to understand the concepts of experience rating is the Gamma-Poisson frequency process.159 Each insuredʼs frequency is given by a Poisson Process. The mean frequencies of the insureds within a class are distributed via a Gamma Distribution. The variance of this Gamma Distribution quantifies the homogeneity of the class. The smaller the variance of this Gamma, the more homogeneous the class. The observed experience of an insured can be used to improve the estimate of that insuredʼs future claim frequency.
We assume a priori that the average claim frequencies of the insureds in a class are distributed via a Gamma Distribution with α = 3 and θ = 2/3. The average frequency for the class is (3)(2/3) = 2. If we observe no claims in a year, then the posterior distribution of that insuredʼs (unknown) Poisson parameter is a Gamma distribution with α = 3 and θ = 0.4, with an average of: (3)(0.4) = 1.2.160 Thus the observation has lowered our estimate of this insuredʼs future claim frequency.
159
See "Mahlerʼs Guide to Conjugate Priors."
160 See "Mahlerʼs Guide to Conjugate Priors." The posterior alpha is 3 + 0 = 3. The posterior theta is 1/(1 + 1/(2/3)) = 1/2.5 = .4.
The prior Gamma with α = 3 and θ = 2/3, and the posterior Gamma with α = 3 and θ = 0.4, are shown:
[Figure: densities of the prior Gamma (α = 3, θ = 2/3) and the posterior Gamma (α = 3, θ = 0.4), plotted for lambda from 0 to 6; the posterior is shifted toward smaller lambda.]
If instead we observe 5 claims in a year, then the posterior distribution of that insuredʼs (unknown) Poisson parameter is a Gamma distribution with α = 8 and θ = 0.4, with an average of: (8)(0.4) = 3.2.161 Thus this observation has raised our estimate of this insuredʼs future claim frequency. The posterior Gamma in the case of this alternate observation is shown below:
[Figure: densities of the prior Gamma (α = 3, θ = 2/3) and the posterior Gamma (α = 8, θ = 0.4), plotted for lambda from 0 to 6; the posterior is shifted toward larger lambda.]
161 See "Mahlerʼs Guide to Conjugate Priors." The posterior alpha is 3 + 5 = 8. The posterior theta is 1/(1 + 1/(2/3)) = 1/2.5 = .4.
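The following minimal Python sketch is my own illustration of the conjugate update used above: posterior α = prior α + number of claims, and posterior 1/θ = prior 1/θ + number of years observed.

    # Gamma-Poisson conjugate update: alpha' = alpha + claims, 1/theta' = 1/theta + years.
    def posterior(alpha, theta, claims, years):
        return alpha + claims, 1 / (1 / theta + years)

    a0, t0 = 3, 2/3
    for claims in (0, 5):
        a1, t1 = posterior(a0, t0, claims, 1)
        print(claims, a1, round(t1, 2), round(a1 * t1, 1))   # posterior mean = alpha' * theta'
    # 0 claims: alpha = 3, theta = 0.4, mean 1.2;  5 claims: alpha = 8, theta = 0.4, mean 3.2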
Problems:
The following information pertains to the next three questions. For a large class of drivers the Variance of the Hypothetical Mean Frequencies is 0.124 and the overall mean frequency is 0.660. Assume the claim count for each individual driver has a Poisson distribution whose mean does not change over time.
15.1 (1 point) Use Buhlmann Credibility to estimate the expected annual claim frequency of a driver who has had one accident-free year.
A. less than 55% B. at least 55% but less than 58% C. at least 58% but less than 61% D. at least 61% but less than 64% E. at least 64%
15.2 (1 point) Use Buhlmann Credibility to estimate the expected annual claim frequency of a driver who has had one accident in four years.
A. less than 40% B. at least 40% but less than 42% C. at least 42% but less than 44% D. at least 44% but less than 46% E. at least 46%
15.3 (1 point) Use Buhlmann Credibility to estimate the expected annual claim frequency of a driver who has had eight accidents in ten years.
A. less than 70% B. at least 70% but less than 72% C. at least 72% but less than 74% D. at least 74% but less than 76% E. at least 76%
15.4 (1 point) Under a certain Experience Rating Plan, an insured with $15,000 in Expected Losses who has no claims during the experience period receives a 17% credit modification. Under this Experience Rating Plan, how much credibility is assigned to the data of an insured with $15,000 in Expected Losses?
A. less than 10% B. at least 10% but less than 15% C. at least 15% but less than 20% D. at least 20% but less than 25% E. at least 25%
Use the following information for the following 14 questions:162
Claim severity in the State of Confusion follows a Pareto distribution, with parameters α = 3, θ = $20,000. There are four types of risks in this State, all with claim frequency given by a Poisson distribution:

Type | Average Annual Claim Frequency
Excellent | 5
Good | 10
Bad | 15
Ugly | 20
15.5 (2 points) A risk is selected at random from a class made up equally of “Good” and “Bad” risks. What is the Buhlmann Credibility assigned to this riskʼs claim frequency observed over a single year? (The complement of credibility will be assigned to the estimated claim frequency for the class.) A. less than 30% B. at least 30% but less than 40% C. at least 40% but less than 50% D. at least 50% but less than 60% E. at least 60% 15.6 (1 point) A risk is selected at random from a class made up equally of “Good” and “Bad” risks. What is the Buhlmann Credibility assigned to this riskʼs claim frequency observed over a three year period? (The complement of credibility will be assigned to the estimated claim frequency for the class.) A. less than 30% B. at least 30% but less than 40% C. at least 40% but less than 50% D. at least 50% but less than 60% E. at least 60%
162
In my paper “A Graphical Illustration of Experience Rating Credibilities,” PCAS 1998, not on the syllabus, I use the situations assumed in these problems, in order to illustrate via graphs the concepts of Experience Rating.
15.7 (2 points) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. What is the Buhlmann Credibility assigned to this riskʼs claim frequency observed over a single year? (The complement of credibility will be assigned to the estimated claim frequency for the class.) A. less than 60% B. at least 60% but less than 70% C. at least 70% but less than 80% D. at least 80% but less than 90% E. at least 90% 15.8 (2 points) A risk is selected at random from a class made up equally of all four types of risks. What is the Buhlmann Credibility assigned to this riskʼs claim frequency observed over a single year? (The complement of credibility will be assigned to the estimated claim frequency for the class.) A. less than 60% B. at least 60% but less than 70% C. at least 70% but less than 80% D. at least 80% but less than 90% E. at least 90% 15.9 (3 points) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. What is the expected value of the process variance of the loss pure premium? A. less than 5 billion B. at least 5 billion but less than 5.5 billion C. at least 5.5 billion but less than 6 billion D. at least 6 billion but less than 6.5 billion E. at least 6.5 billion 15.10 (2 points) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. What is the variance of the hypothetical loss pure premiums? A. less than 5 billion B. at least 5 billion but less than 5.5 billion C. at least 5.5 billion but less than 6 billion D. at least 6 billion but less than 6.5 billion E. at least 6.5 billion
15.11 (1 point) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. What is the Buhlmann Credibility assigned to this riskʼs loss pure premium observed over a single year? (The complement of credibility will be assigned to the estimated loss pure premium for the class.) A. less than 40% B. at least 40% but less than 50% C. at least 50% but less than 60% D. at least 60% but less than 70% E. at least 70% 15.12 (1 point) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. This risk is observed to have $300,000 in losses in a single year. Using Buhlmann Credibility what is the expected dollars of loss for this risk in a single future year? A. less than $200,000 B. at least $200,000 but less than $205,000 C. at least $205,000 but less than $210,000 D. at least $210,000 but less than $215,000 E. at least $215,000 15.13 (1 point) Claim sizes are limited to $25,000. What is the mean of the (limited) severity? A. less than $6,000 B. at least $6,000 but less than $6,500 C. at least $7,000 but less than $7,500 D. at least $7,500 but less than $8,000 E. at least $8,000 15.14 (2 points) Claim sizes are limited to $25,000. What is the second moment of the (limited) severity distribution? A. less than 115 million B. at least 115 million but less than 120 million C. at least 120 million but less than 125 million D. at least 125 million but less than 130 million E. at least 130 million
15.15 (3 points) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. Claim sizes are limited to $25,000. What is the expected value of the process variance of the loss pure premium? A. less than 1.3 billion B. at least 1.3 billion but less than 1.4 billion C. at least 1.4 billion but less than 1.5 billion D. at least 1.5 billion but less than 1.6 billion E. at least 1.6 billion 15.16 (3 points) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. Claim sizes are limited to $25,000. What is the variance of the hypothetical loss pure premium? A. less than 3.3 billion B. at least 3.3 billion but less than 3.4 billion C. at least 3.4 billion but less than 3.5 billion D. at least 3.5 billion but less than 3.6 billion E. at least 3.6 billion 15.17 (1 point) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. Claim sizes are limited to $25,000. What is the Buhlmann Credibility assigned to this riskʼs (limited) loss pure premium observed over a single year? (The complement of credibility will be assigned to the estimated (limited) loss pure premium for the class.) A. less than 71% B. at least 71% but less than 73% C. at least 73% but less than 75% D. at least 75% but less than 75% E. at least 77% 15.18 (1 point) A risk is selected at random from a class made up equally of “Excellent” and “Ugly” risks. Claim sizes are limited to $25,000. This risk is observed to have $200,000 in losses in a single year. Using Buhlmann Credibility what is the expected dollars of (limited) loss for this risk in a single future year? A. less than $170,000 B. at least $170,000 but less than $175,000 C. at least $175,000 but less than $180,000 D. at least $180,000 but less than $185,000 E. at least $185,000
15.19 (3 points) For an experience rating plan, the credibility assigned to an insuredʼs experience is given by the Buhlmann Credibility formula with K = 40,000: Z = E / (E + 40,000), with E = expected losses for the insured. The sizes of insureds, in other words their expected losses, vary across the portfolio via a Pareto Distribution with θ = 40,000 and α. Determine the average credibility assigned to an insured in this portfolio.
15.20 (4, 5/84, Q.47) (1 point) Suppose you are given a class of insurance which is homogeneous and has a Poisson claim count process for the individual risk. Assume the expected frequency is 10% and disregard severity. What Buhlmann credibility would be assigned to the annual experience of an individual risk taken from the class?
A. 0% B. 10% C. 90% D. None of the above E. Insufficient information given
15.21 (4, 5/85, Q.43) (1 point) An insured's loss rate is to be credibility weighted with the loss rate of its class. Which of the following statements are true?
1) As the variance of the hypothetical means increases, the insured's credibility should increase.
2) As the expected value of the process variance increases, the insured's credibility should decrease.
3) If all insureds in the class are identical, the insured's credibility should be zero.
A. 3 B. 2, 3 C. 1, 3 D. 1, 2 E. 1, 2, 3
Solutions to Problems:
15.1. B. Each insuredʼs frequency process is given by a Poisson with parameter λ, with λ varying over the group of insureds. The process variance for each insured is λ. Thus the expected value of the process variance is estimated as follows: Eλ[VAR[X | λ]] = Eλ[λ] = overall mean = .660. K = EPV / VHM = .66 / .124 = 5.32. For one year Z = 1 / (1 + 5.32) = .158. If there are no accidents, estimated frequency is: (0)(.158) + (.66)(1 - .158) = 0.556.
Comment: This indicates a claim free credit for one year of 15.8%, equal to the credibility.
15.2. E. From the solution to the previous question, K = 5.32. For four years the credibility Z = 4 / (4 + 5.32) = 42.9%. The observed frequency is 1/4 = .25. The prior mean is .66. The new estimate = (.25)(42.9%) + (.66)(57.1%) = 0.484.
Comment: Under a (simplified) experience rating this insured might get a credit of: 1 - (.484 / .66) = 26.7%.
15.3. D. K = 5.32. For 10 years, Z = 10 / (10 + 5.32) = 65.3%. The observed frequency is .80 and the prior mean is .66. The new estimate is: (.8)(.653) + (.66)(.347) = 0.751.
15.4. C. Let R be the class rate. Let D be the rate based solely on the observed data for the chosen insured. Then for credibility Z, the rate charged the insured = ZD + (1 - Z)R. For the case where D = 0 and the rate charged the insured is: R(1 - 0.17) = 0.83R, we have: 0.83R = Z(0) + (1 - Z)R = (1 - Z)R. Thus Z = 0.17.
Comment: The credibility is equal to the claims free discount, in this case 17%.
15.5. B. The overall mean frequency is: (10 + 15)/2 = 12.5. Since we are mixing Poissons, Expected Value of the Process Variance = overall mean = 12.5. Variance of Hypothetical Mean Frequencies = {(10 - 12.5)^2 + (15 - 12.5)^2}/2 = (5/2)^2 = 6.25. K = EPV/VHM = 12.5 / 6.25 = 2. Z = 1 / (1 + 2) = 33.3%.
15.6. E. K = 12.5 / 6.25 = 2. Z = 3 / (3 + 2) = 60%.
15.7. D. The overall mean frequency is: (5 + 20)/2 = 12.5. Since we are mixing Poissons, Expected Value of the Process Variance = overall mean = 12.5. Variance of Hypothetical Mean Frequencies = {(5 - 12.5)^2 + (20 - 12.5)^2}/2 = 7.5^2 = 56.25. K = EPV/VHM = 12.5 / 56.25 = .222. Z = 1 / (1 + .222) = 81.8%.
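The brief Python sketch below is my own illustration of the calculation pattern in 15.1-15.3: K = EPV/VHM, with EPV equal to the overall mean for mixed Poissons, and then Z = n/(n + K) for n years of data.

    # 15.1-15.3: Buhlmann estimates for an individual driver.
    epv, vhm, prior_mean = 0.660, 0.124, 0.660
    k = epv / vhm                                  # 5.32
    def estimate(claims, years):
        z = years / (years + k)
        return z * (claims / years) + (1 - z) * prior_mean

    print(round(estimate(0, 1), 3))    # about 0.556
    print(round(estimate(1, 4), 3))    # about 0.484
    print(round(estimate(8, 10), 3))   # about 0.751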
15.8. C. The overall mean frequency is: (5 + 10 + 15 + 20)/4 = 12.5. Since we are mixing Poissons, Expected Value of the Process Variance = overall mean = 12.5. Variance of Hypothetical Mean Frequencies = {(5 - 12.5)^2 + (10 - 12.5)^2 + (15 - 12.5)^2 + (20 - 12.5)^2}/4 = 31.25. K = 12.5 / 31.25 = .4. Z = 1 / (1 + 0.4) = 71.4%.
15.9. B. The process variance of the pure premium with a Poisson frequency = (mean frequency)(second moment of the severity). The severity distribution is assumed to be the same for all types of risks, therefore the expected value of the process variance = (overall mean frequency)(second moment of Pareto). For a Pareto with parameters α = 3, θ = $20,000, the second moment of severity = 2θ^2/{(α−1)(α−2)} = 4 x 10^8. Therefore, Expected Value of Process Variance = (12.5)(4 x 10^8) = 5 x 10^9.
15.10. C. For a Pareto with parameters α = 3, θ = $20,000, Mean Severity = θ/(α−1) = 10,000. Thus the Hypothetical Mean Pure Premiums are: (5)(10000) and (20)(10000). The overall mean pure premium is: (12.5)(10000). Thus the Variance of the Hypothetical Mean Pure Premiums = 10000^2 {(5 - 12.5)^2 + (20 - 12.5)^2}/2 = 5.625 x 10^9.
15.11. C. K = 5 x 10^9 / 5.625 x 10^9 = .889. Z = 1 / (1 + .889) = 52.9%.
15.12. E. The mean losses for the class = (12.5)($10000) = 125,000. The credibility is 52.9%. Therefore, new estimate = (.529)(300000) + (1 - .529)(125000) = $217,575.
15.13. E. For the Pareto, E[X ∧ x] = {θ/(α−1)}{1 − (θ/(θ+x))^(α−1)}. E[X ∧ 25000] = (20000/2)(1 - (20000/45000)^2) = $8025.
15.14. C. For the Pareto: E[(X ∧ L)^2] = E[X^2] {1 - (1 + L/θ)^(1−α) [1 + (α-1)L/θ]}. E[(X ∧ 25000)^2] = (4 x 10^8){1 - (1 + 1.25)^(-2) [1 + (2)(1.25)]} = 1.235 x 10^8.
15.15. D. The process variance with a Poisson frequency = (mean frequency)(second moment of the severity). The severity distribution is assumed to be the same for all types of risks, therefore the expected value of the process variance = (overall mean frequency)(second moment of severity). From the previous problem, second moment of severity = 1.235 x 10^8. Therefore, Expected Value of Process Variance = (12.5)(123.5 million) = 1.544 x 10^9.
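A hedged Python sketch of my own checking the Pareto quantities used in 15.9-15.15 (second moment, limited expected value, and limited second moment for α = 3, θ = 20,000, limit 25,000).

    # Pareto (alpha = 3, theta = 20,000) moments used in 15.9-15.15.
    alpha, theta, u = 3.0, 20000.0, 25000.0

    second_moment = 2 * theta**2 / ((alpha - 1) * (alpha - 2))                  # 4 x 10^8
    lev = (theta / (alpha - 1)) * (1 - (theta / (theta + u)) ** (alpha - 1))    # E[X limited to u] = about 8025
    lim_second = second_moment * (1 - (1 + u / theta) ** (1 - alpha) * (1 + (alpha - 1) * u / theta))

    print(second_moment, round(lev), round(lim_second / 1e8, 3))   # 400000000.0 8025 1.235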
15.16. E. From a previous problem, E[X ∧ 25000] = $8025. Thus the Hypothetical Mean Pure Premiums are: (5)($8025) and (20)($8025). Variance of the Hypothetical Mean Pure Premiums = (8025^2){(5 - 12.5)^2 + (20 - 12.5)^2}/2 = 3.623 x 10^9.
15.17. A. K = 1.544 x 10^9 / 3.623 x 10^9 = .426. Z = 1/(1 + .426) = 70.1%.
15.18. B. Z = 70.1% and the overall mean loss is 12.5 x $8025 = $100,313. Thus the new estimate = (.701)($200,000) + (1 - .701)($100,313) = $170,194.
15.19. The survival function of the sizes of insureds is: S(E) = {40,000 / (E + 40,000)}^α.
Thus 1 - S(E)^(1/α) = 1 - 40,000/(E + 40,000) = E / (E + 40,000) = Z.
Therefore, the mean Z is: ∫ {E / (E + 40,000)} f(E) dE = ∫ {1 - S(E)^(1/α)} f(E) dE = ∫ f(E) dE - ∫ S(E)^(1/α) f(E) dE, with all integrals taken from E = 0 to ∞. Since f(E) is minus the derivative of S(E), this equals [F(E) + S(E)^(1 + 1/α) / (1/α + 1)] evaluated from E = 0 to E = ∞, which is: 1 - 1/(1/α + 1) = 1/(α + 1).
Alternately, with f(E) = α 40,000^α / (E + 40,000)^(α+1), the mean Z is: ∫ {E / (E + 40,000)} f(E) dE = α 40,000^α ∫ E / (E + 40,000)^(α+2) dE. Using integration by parts, ∫ E / (E + 40,000)^(α+2) dE, taken from 0 to ∞, is: [-E / {(α + 1)(E + 40,000)^(α+1)}] from 0 to ∞ + ∫ dE / {(α + 1)(E + 40,000)^(α+1)} = 0 + [-1 / {α(α + 1)(E + 40,000)^α}] from 0 to ∞ = 1 / {α(α + 1) 40,000^α}. Therefore, the mean Z is: α 40,000^α / {α(α + 1) 40,000^α} = 1/(α + 1).
Alternately, (α + 1) 40,000^(α+1) / (x + 40,000)^(α+2) is the density of a Pareto Distribution with parameters α + 1 and 40,000. This distribution has a mean of: 40,000/(α + 1 - 1) = 40,000/α. Therefore, ∫ x (α + 1) 40,000^(α+1) / (x + 40,000)^(α+2) dx = 40,000/α. ⇒ ∫ x / (x + 40,000)^(α+2) dx = 1 / {α(α + 1) 40,000^α}. Thus the mean Z is: ∫ {E / (E + 40,000)} f(E) dE = α 40,000^α ∫ E / (E + 40,000)^(α+2) dE = α 40,000^α / {α(α + 1) 40,000^α} = 1/(α + 1).
Comment: The Buhlmann credibility formula has the same mathematical form as the distribution
2013-4-9
Buhlmann Credibility §15 Experience Rating
HCM 10/19/12,
Page 625
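The result of 15.19 is easy to confirm by numerical integration. A minimal sketch, assuming SciPy is available and using an illustrative value of α (the 40,000 scale and the Pareto form of S(E) are from the problem):

```python
from scipy.integrate import quad

alpha, s = 2.5, 40000.0   # alpha is an illustrative choice; s = 40,000 from the problem

def f(e):                 # density of the sizes of insureds (Pareto)
    return alpha * s**alpha / (e + s)**(alpha + 1)

def z(e):                 # Buhlmann credibility as a function of size
    return e / (e + s)

mean_z, _ = quad(lambda e: z(e) * f(e), 0, float("inf"))
print(mean_z, 1 / (alpha + 1))   # both ≈ 0.2857
```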
15.20. A. Since the class is (completely) homogeneous, the experience of an individual is assigned no credibility for experience rating. (The VHM is zero, so Z = 0.)

15.21. E. 1. True. 2. True. 3. True. If the class is (completely) homogeneous, then the Variance of the Hypothetical Means is zero and the credibility assigned to the experience of an individual is zero. (The class loss rate is given 100% credibility.)
Section 16, Loss Functions

A loss function is defined as a function of the estimate of a parameter and its true value.163 The loss function most commonly used by actuaries is the squared error loss function.164 The squared error is: (estimate - true value)². The smaller the expected value of the loss function, the better the estimate. Which estimator is best depends on which loss function one attempts to minimize.165 It turns out that the estimator that minimizes squared errors is the mean.

Another loss function is the absolute error: |estimate - true value|. It turns out that the estimator that minimizes absolute errors is the median.

Bayesian Estimation:

Throughout the section on Bayesian Analysis, I used the mean of the posterior distribution of the quantity of interest as the estimator. When doing Bayes Analysis questions, use the mean of the posterior distribution, unless specifically stated otherwise. However, the mean is only one possible Bayesian estimator. Bayes Analysis using the squared-error loss function just means do what we usually do: get the posterior mean of the quantity of interest.

In general, the Bayes Estimator minimizes the expected value of the given loss function.166 Depending on the loss function one attempts to minimize, one gets the following estimators:167
163 See Definition 15.15 in Loss Models.
164 For example, in linear regression one tries to minimize squared errors.
165 In general, which estimator is “best” depends on which criterion one uses.
166 See Definition 15.16 in Loss Models.
167 See Theorem 15.18 in Loss Models. Note that multiplying a loss function by a constant does not change the estimator that minimizes that loss function.
Error or Loss Function                                                Name              Bayesian Point Estimator

(estimate - true value)²                                              Squared-error     Mean

0 if estimate = true value; 1 if estimate ≠ true value                Zero-one          Mode

|estimate - true value|                                               Absolute-error    Median

(1 - p)|estimate - true value| if estimate ≥ true value (overestimate);
(p)|estimate - true value| if estimate ≤ true value (underestimate)                     pth percentile
Exercise: Assume the following information:
• The probability of y successes in m trials is given by a Binomial distribution with parameters m and q.
• The prior distribution of q is uniform on [0, 1].
• One success was observed in three trials.
Determine the posterior distribution of q.
[Solution: Assuming a given value of q, the chance of observing one success in three trials is 3q(1 - q)². The prior distribution of q is: π(q) = 1, 0 ≤ q ≤ 1. By Bayes Theorem, the posterior distribution of q is proportional to the product of the chance of the observation and the prior distribution: 3q(1 - q)². Thus the posterior distribution of q is proportional to q - 2q² + q³. The integral of q - 2q² + q³ from 0 to 1 is: 1/2 - 2/3 + 1/4 = 1/12.
Thus the posterior distribution of q is: 12(q - 2q² + q³) = 12q - 24q² + 12q³, 0 ≤ q ≤ 1.]
In this exercise, the posterior distribution of q is a Beta Distribution with a = 2, b = 3, and θ = 1:168

[Graph of the posterior density of q (density on the vertical axis, q from 0 to 1 on the horizontal axis), rising to a maximum and then declining to 0 at q = 1.]
Using the squared-error loss function, the expected future frequency is given by the mean of the posterior distribution: θa/(a + b) = (1)(2)/(2 + 3) = 2/5.169

Using instead the zero-one loss function, the expected future frequency is given by the mode of the posterior distribution: θ(a - 1)/(a + b - 2) = (1)(2 - 1)/(2 + 3 - 2) = 1/3.170 The graph of the density reaches a maximum at q = 1/3.

Using the absolute error loss function, the expected future frequency is given by the median of the posterior distribution. The posterior distribution of q is: f(q) = 12q - 24q² + 12q³. Therefore by integration, F(q) = 6q² - 8q³ + 3q⁴. At the median the distribution function is 0.5: 6q² - 8q³ + 3q⁴ = 0.5. One can solve numerically for q = 0.3857. In the above graph, half of the area is to the left of q = 0.3857, while the other half of the area is to the right of q = 0.3857.
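These estimates can be reproduced directly from the Beta(a = 2, b = 3) posterior. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import beta

post = beta(2, 3)                 # posterior of q from the exercise above
print(post.mean())                # 0.4       (squared-error loss: mean)
print((2 - 1) / (2 + 3 - 2))      # 0.333...  (zero-one loss: mode of Beta(2, 3))
print(post.median())              # 0.3857    (absolute-error loss: median)
print(post.ppf(0.75))             # 0.5437    (underestimates 3 times as bad: 75th percentile)
```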
168 As shown in Appendix A attached to the exam, the density of the Beta Distribution is f(x) = {(a + b - 1)!/((a - 1)!(b - 1)!)} (x/θ)^(a-1) (1 - x/θ)^(b-1) /θ, 0 ≤ x ≤ θ. This is a special case of a Beta-Bernoulli conjugate prior. See “Mahlerʼs Guide to Conjugate Priors.”
169 See Appendix A of Loss Models. One can compute the mean by integrating q times the posterior density, from 0 to 1.
170 f(q) = 12(q - 2q² + q³), fʼ(q) = 12(1 - 4q + 3q²), which is zero when q = 1/3 or 1. One can confirm that the density reaches a maximum at q = 1/3.
Exercise: Use a loss function based on the absolute error, but treat underestimates as three times as important as overestimates. Determine the Bayesian estimate of the future claim frequency.
[Solution: The loss function is: |estimate - true value|, if estimate ≥ true value (overestimate); 3|estimate - true value|, if estimate ≤ true value (underestimate). We can multiply this loss function by 1/4, without affecting which estimator is best: (1/4)|estimate - true value|, if estimate ≥ true value (overestimate); (3/4)|estimate - true value|, if estimate ≤ true value (underestimate). Consulting the previous chart, minimizing the expected value of this loss function corresponds to using the 75th percentile. For the 75th percentile of the posterior distribution of q: 6q² - 8q³ + 3q⁴ = 0.75. Solving numerically, q = 0.5437.
Comment: Since we really want to avoid underestimates, we choose a larger estimator, the 75th percentile. If we instead treated overestimates as three times as important as underestimates, then we would choose a smaller estimator, the 25th percentile.]

Thus we see that depending on which loss function we use, the estimated future frequency is either the mean of the posterior distribution 0.4, the mode of the posterior distribution 0.3333, the median of the posterior distribution 0.3857, or the 75th percentile of the posterior distribution 0.5437. Note that the only difference is the criterion that was used to decide which estimator is best; the a priori assumptions and the observations are the same in each case.

Connecting the Loss Functions with the Estimators:

Assume we observe a sample of size 5 from an unknown distribution: 12, 3, 38, 5, 8. We wish to estimate the next value from this distribution. Let us assume we wish to minimize the squared error of the estimate. Let the estimate be y. Then if x is the next observed value, we wish to minimize (x - y)².

Let us assume the uniform and discrete distribution on the given sample, in other words the empirical distribution function.171 In other words, let us assume a 20% chance of each of: 12, 3, 38, 5 or 8. Then the expected squared error for an estimate of y is:
0.2(12 - y)² + 0.2(3 - y)² + 0.2(38 - y)² + 0.2(5 - y)² + 0.2(8 - y)².
171 This is similar to a step used in Bootstrapping. See “Mahlerʼs Guide to Simulation.”
Taking the derivative with respect to y and setting it equal to zero:
0 = 0.4{(12 - y) + (3 - y) + (38 - y) + (5 - y) + (8 - y)}. ⇒ y = (12 + 3 + 38 + 5 + 8)/5 = 13.2.
Thus the sample mean minimizes the squared error loss function.

More generally, for the empirical distribution function, the expected squared error using an estimate of y is: Σ(xᵢ - y)²/n. We minimize the expected squared error:
0 = ∂{Σ(xᵢ - y)²/n}/∂y = -(2/n)Σ(xᵢ - y). ⇒ y = Σxᵢ/n = sample mean.
Similarly, the expected absolute error using an estimate of y is:
0.2|12 - y| + 0.2|3 - y| + 0.2|38 - y| + 0.2|5 - y| + 0.2|8 - y|.

[Graph of the expected absolute error as a function of the estimate y, for y from 5 to 20.]

The expected absolute error is minimized for y = 8, which is the empirical median.
More generally, for the empirical distribution function, the expected absolute error for an estimate of y is: Σ|xᵢ - y|/n. We minimize the expected absolute error:
0 = ∂{Σ|xᵢ - y|/n}/∂y = -Σ sgn[xᵢ - y]/n, where sgn(z) equals 1 when z > 0, equals -1 when z < 0, and equals 0 when z = 0.
This partial derivative is equal to zero when there is an equal chance that y > xᵢ or y < xᵢ, which occurs when y is the sample median. Thus in this example, if one wanted to estimate the next value, one might take either the empirical mean of 13.2 or the empirical median of 8. Which of these two estimates is “better” depends on which criterion or loss function one uses to decide the question.

Note that the squared error loss function (dashed line) counts extreme errors more heavily than does the absolute error loss function (solid line):

[Graph comparing the squared error loss function (dashed line) and the absolute error loss function (solid line), for errors between -2 and 2.]
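The same connection can be seen by brute force on the sample above. A minimal sketch (the 0.01 grid of candidate estimates is an arbitrary choice):

```python
import numpy as np

sample = np.array([12, 3, 38, 5, 8])
ys = np.arange(0.0, 40.0, 0.01)     # candidate estimates

sq = [np.mean((sample - y)**2) for y in ys]        # expected squared error
ab = [np.mean(np.abs(sample - y)) for y in ys]     # expected absolute error

print(ys[np.argmin(sq)], sample.mean())      # ~13.2, the sample mean
print(ys[np.argmin(ab)], np.median(sample))  # 8.0, the sample median
```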
Therefore, it is not surprising that one would get a different best estimate, depending on which of these two loss functions one is trying to minimize.

If y is the estimate and x is the observation, let the loss function be defined by:
error = (1 - p)|y - x|, if y ≥ x (overestimate); (p)|y - x|, if y ≤ x (underestimate).
If the estimate is y, then the partial derivative with respect to y is (p - 1) if x ≤ y and p if x ≥ y. Thus the expected value of this derivative is: Prob(x ≤ y)(p - 1) + Prob(x > y)(p) = Prob(x ≤ y)(p - 1) + {1 - Prob(x ≤ y)}(p) = p - Prob(x ≤ y). Thus the expected value of the derivative of this loss function is zero when Prob(x ≤ y) = p, in other words when y is the pth percentile of the distribution of x.
Note that if p = 0.5, then the loss function is proportional to the absolute error loss function discussed previously, which is minimized by the 50th percentile, in other words, the median.
Exercise: Assume that we wish to estimate Loss Reserves and that we believe errors when our reserve turns out to be an underestimate to be 9 times as bad as when our reserve turns out to be an overestimate. In other words, we are more concerned about the possibility that the outcome x turns out to be greater than the estimate y, than vice versa. What estimator should we use?
[Solution: In this case, the loss function would be:172
error = |y - x|, if y ≥ x (overestimate); 9|y - x|, if y ≤ x (underestimate).
If we multiply this loss function by 1/10, then this is a special case of the prior loss function, with p = 9/10. Thus the best estimator would be the 90th percentile.]

If y is the estimate and x is the observation, let the loss function be defined by:173
error = 0 if y = x; 1 if y ≠ x.
The mode is that value of x most likely to occur. In other words, the probability of not matching a single selected value is smallest at the mode. Therefore, an estimate equal to the mode of the distribution function of x will minimize the expected value of this loss function.
172 The loss function is subject to an arbitrary multiplicative constant.
173 This is referred to by Loss Models as the “zero-one” loss function. It has little application to actuarial work.
Problems:

Use the following information for the next 4 questions:
9 years of losses (in millions of dollars), ranked from smallest to largest, are observed: 2, 7, 10, 18, 23, 30, 58, 72, and 617. One wishes to estimate the losses (in millions of dollars) for the next year.

16.1 (2 points) If you are interested in minimizing the expected squared error, what is your estimate of next yearʼs losses (in millions of dollars)?
A. Less than 20  B. At least 20 but less than 40  C. At least 40 but less than 60  D. At least 60 but less than 80  E. At least 80

16.2 (2 points) You are interested in minimizing the expected absolute error. What is your estimate of next yearʼs losses (in millions of dollars)?
A. Less than 20  B. At least 20 but less than 40  C. At least 40 but less than 60  D. At least 60 but less than 80  E. At least 80

16.3 (2 points) You are doing loss reserving, and are more concerned with underestimates than with overestimates. The absolute value of any underestimate will be treated as four times as important as the absolute value of any overestimate. What is your estimate of next yearʼs losses (in millions of dollars)?
A. Less than 20  B. At least 20 but less than 40  C. At least 40 but less than 60  D. At least 60 but less than 80  E. At least 80

16.4 (2 points) You are more concerned with overestimates than with underestimates. The absolute value of any overestimate will be treated as four times as important as the absolute value of any underestimate. What is your estimate of next yearʼs losses (in millions of dollars)?
A. Less than 20  B. At least 20 but less than 40  C. At least 40 but less than 60  D. At least 60 but less than 80  E. At least 80
Use the following information for the next 3 questions:
• Let severity be given by an exponential distribution with mean µ: f(x) = e^(-x/µ)/µ.
• In turn, let µ have the improper prior distribution π(µ) = 1/µ, 0 < µ < ∞.
• One observes 3 claims of sizes from an insured: 6, 9, 11.
• You may use the following values of the Incomplete Gamma Function:
Γ[2; 1.67835] = Γ[3; 2.67406] = Γ[4; 3.67206] = Γ[5; 4.67091] = 0.5

16.5 (2 points) Estimate µ for this insured, using the squared-error loss function.
A. Less than 8  B. At least 8 but less than 10  C. At least 10 but less than 12  D. At least 12 but less than 14  E. At least 14

16.6 (2 points) Estimate µ for this insured, using the zero-one loss function.
A. Less than 8  B. At least 8 but less than 10  C. At least 10 but less than 12  D. At least 12 but less than 14  E. At least 14

16.7 (2 points) Estimate µ for this insured, using the absolute error loss function.
A. Less than 8  B. At least 8 but less than 10  C. At least 10 but less than 12  D. At least 12 but less than 14  E. At least 14
16.8 (2 points) A loss function has been defined by:
loss = −2(x − α) if x ≤ α; 3(x − α) if x ≥ α,
where α is the Bayesian point estimate of x.
Which statistic of x should α be so as to minimize the expected value of the loss function?
A. 40th percentile  B. 60th percentile  C. Mean  D. Median  E. Mode
Use the following information for the next three questions:
• In a large portfolio of risks, the number of claims for one policyholder during one year follows a Bernoulli distribution with mean q.
• The number of claims for one policyholder for one year is independent of the number of claims for the policyholder for any other year. The number of claims for one policyholder is independent of the number of claims for any other policyholder.
• The distribution of q within the portfolio has density function:
f(q) = 400q, 0 < q ≤ 0.05; f(q) = 40 - 400q, 0.05 < q < 0.10.
• A policyholder Phillip DeTanque is selected at random from the portfolio.
• During Year 1, Phillip has one claim.
• During Year 2, Phillip has no claim.
• During Year 3, Phillip has no claim.

16.9 (3 points) A loss function is defined as equal to zero if the estimate equals the true value, and one otherwise. You are interested in minimizing the expected value of this loss function. Find the Bayesian estimate of Phillip's q.
A. 0.0400  B. 0.0424  C. 0.0500  D. 0.0576  E. 0.0600

16.10 (4 points) You are interested in minimizing the expected absolute error. Find the Bayesian estimate of Phillip's q.
A. 0.0400  B. 0.0424  C. 0.0500  D. 0.0576  E. 0.0600

16.11 (2 points) You are interested in minimizing the expected squared error. Find the Bayesian estimate of Phillip's q.
A. 0.0400  B. 0.0424  C. 0.0500  D. 0.0576  E. 0.0600

16.12 (4 points) Severity follows a LogNormal Distribution. The prior density of the parameters is: π(µ, σ) = 1/σ. Three losses were paid on a policy, with the following sizes: 1000, 2000, 5000. Determine the Bayesian estimate of µ and σ, using the posterior mode.
Use the following information for the next two questions:
• The amount of a single payment has the Single Parameter Pareto Distribution with θ = 10 and unknown shape parameter.
• The prior distribution has the Gamma Distribution with α = 3 and scale parameter = 5.
• Three losses were paid on a policy, with the following sizes: 13, 16, 21.

16.13 (3 points) With the squared error loss function, what is the Bayes estimate of the shape parameter of the Single Parameter Pareto Distribution for this policy?
A. 3.0  B. 3.2  C. 3.4  D. 3.6  E. 3.8

16.14 (3 points) With the zero-one loss function, what is the Bayes estimate of the shape parameter of the Single Parameter Pareto Distribution for this policy?
A. 3.0  B. 3.2  C. 3.4  D. 3.6  E. 3.8

Use the following information for the next two questions:
• Size of loss is uniform on [0, c].
• The improper prior of c is: π(c) = 1/c, c > 0.
• A particular insured has two losses of sizes: 10, 15.

16.15 (3 points) Using the absolute error loss function, what is the Bayesian estimate of c for this insured?
A. 15  B. 21  C. 25  D. 28  E. 30

16.16 (3 points) Using the absolute error loss function, what is the Bayesian estimate of the size of the next loss from this insured?
A. 10.50  B. 10.75  C. 11.00  D. 11.25  E. 11.50
16.17 (4B, 11/94, Q.20) (2 points) The density function for a certain parameter, α, is:
f(α) = 4.6^α e^(-4.6)/α!, α = 0, 1, 2, ....
A loss function has been defined by:
loss = 0 if α = α₁; k if α ≠ α₁,
where α₁ is the Bayesian point estimate of α, and k is a positive constant.
Which statistic of α should α₁ be so as to minimize the expected value of the loss function?
A. 33rd percentile  B. Maximum value  C. Mean  D. Minimum value  E. Mode

16.18 (4B, 11/97, Q.13) (2 points) You are given the following:
• The random variable X has the density function f(x) = e^(-x), 0 < x < ∞.
• A loss function is given by |X - k|, where k is a constant.
Determine the value of k that will minimize the expected loss.
A. ln 0.5  B. 0  C. ln 2  D. 1  E. 2

16.19 (4B, 11/99, Q.8) (2 points) You are given the following:
• A loss function is given by: k - X if X - k ≤ 0; α(X - k) if X - k > 0, where X is a random variable.
• The expected loss is minimized when k is equal to the 80th percentile of X.
Determine α.
A. 0.2  B. 0.8  C. 1.0  D. 2.0  E. 4.0
Solutions to Problems:

16.1. E. Using the squared error loss function corresponds to using the mean as the estimator. The observed mean is: (2 + 7 + 10 + 18 + 23 + 30 + 58 + 72 + 617)/9 = 93.
Alternately, for the observed data, here is what the sum of the squared errors would have been if we had made various estimates:
Estimate:                85       90       93       95       100
Sum of Squared Errors:   313,878  313,383  313,302  313,338  313,743

16.2. B. Using the absolute error loss function corresponds to using the median as the estimator. The estimated median is the (9+1)(50%) = 5th observed loss, which is: 23.
Alternately, for the observed data, here is what the sum of the absolute errors would have been if we had made various estimates:
Estimate:                15   22   23   24   30
Sum of Absolute Errors:  754  741  740  741  747
For example, for an estimate of 22, the sum of the absolute errors would have been:
|2-22| + |7-22| + |10-22| + |18-22| + |23-22| + |30-22| + |58-22| + |72-22| + |617-22| = 20 + 15 + 12 + 4 + 1 + 8 + 36 + 50 + 595 = 741.

16.3. D. Using a loss function proportional to: (1 - 0.8)|estimate - true value|, if estimate ≥ true value (overestimate); (0.8)|estimate - true value|, if estimate ≤ true value (underestimate), corresponds to using the 80th percentile as the estimator. The estimated 80th percentile is the (9+1)(80%) = 8th observed loss, which is: 72.
Alternately, for the observed data, here is what the sum of each absolute value of any underestimate multiplied by 4, plus the absolute value of any overestimate, would have been if we had made various estimates:
Estimate:       50    70    72    75    95
Sum of Errors:  2598  2538  2536  2548  2628
For example, for an estimate of 70, the sum of the errors would have been:
|2-70| + |7-70| + |10-70| + |18-70| + |23-70| + |30-70| + |58-70| + 4|72-70| + 4|617-70| = 2538.
16.4. A. Using a loss function proportional to: (1 - 0.2)|estimate - true value|, if estimate ≥ true value (overestimate); (0.2)|estimate - true value|, if estimate ≤ true value (underestimate), corresponds to using the 20th percentile as the estimator. The estimated 20th percentile is the (9+1)(20%) = 2nd observed loss, which is: 7.
Alternately, for the observed data, here is what the sum of each absolute value of any overestimate multiplied by 4, plus the absolute value of any underestimate, would have been if we had made various estimates:
Estimate:       3    5    7    9    11
Sum of Errors:  815  807  799  801  808
For example, for an estimate of 11, the sum of the errors would have been:
4|2-11| + 4|7-11| + 4|10-11| + |18-11| + |23-11| + |30-11| + |58-11| + |72-11| + |617-11| = 808.
Comment: Multiplying the loss function by any constant does not change the estimate of next yearʼs losses.

16.5. D. The chance of the observation is the product of the densities at the observed points: e^(-6/µ) e^(-9/µ) e^(-11/µ)/µ³ = e^(-26/µ)/µ³. Multiplying by the prior distribution of 1/µ gives the probability weights: e^(-26/µ)/µ⁴. The posterior distribution is proportional to this and therefore is an Inverse Gamma Distribution, with θ = 26 (the sum of the observed claims) and α = 3 (the number of observed claims). Using the squared-error loss function, the estimator is the mean of the posterior distribution = mean of the posterior Inverse Gamma = θ/(α−1) = 26/2 = 13.

16.6. A. From the previous solution, the posterior distribution is an Inverse Gamma Distribution, with θ = 26 and α = 3. Using the zero-one loss function, the estimator is the mode of the posterior Inverse Gamma = θ/(α+1) = 26/4 = 6.5.

16.7. B. The posterior distribution is an Inverse Gamma Distribution, with θ = 26 and α = 3. Using the absolute error loss function, the estimator is the median of the posterior Inverse Gamma. The distribution function is: 1 - Γ[α; θ/x] = 1 - Γ[3; 26/x]. The median is where the distribution function is 0.5. In other words we want Γ[3; 26/x] = 0.5. We are given that Γ[3; 2.67406] = 0.5. Thus 26/x = 2.67406, or x = 26/2.67406 = 9.72.
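The answers to 16.5-16.7 can be cross-checked with SciPy's Inverse Gamma, whose shape/scale parameterization matches the α = 3, θ = 26 posterior above. A minimal sketch:

```python
from scipy.stats import invgamma

post = invgamma(a=3, scale=26)   # posterior from 16.5: Inverse Gamma, alpha = 3, theta = 26
print(post.mean())               # 13.0   (squared-error loss: mean, 16.5)
print(26 / (3 + 1))              # 6.5    (zero-one loss: mode = theta/(alpha+1), 16.6)
print(post.median())             # 9.72   (absolute-error loss: median, 16.7)
```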
16.8. B. The partial derivative of the loss function is -2 if x ≤ α and 3 if x ≥ α. Thus the expected value of the derivative is: Prob(x ≤ α)(-2) + Prob(x ≥ α)(3) = Prob(x ≤ α)(-2) + {1 - Prob(x ≤ α)}(3) = 3 - 5 Prob(x ≤ α). Thus the expected value of the derivative of this loss function is zero when Prob(x ≤ α) = 3/5 = 60%, in other words when α is the 60th percentile of the distribution of x.
Comment: The given loss function is just 5 times the loss function discussed in the text of this section, with p = 0.6. The best estimator is the pth percentile, in this case the 60th percentile.
16.9. C. The chance of the observation given q is q(1 - q)² = q - 2q² + q³. By Bayes Theorem, the posterior probability density function is proportional to: f(q)(q - 2q² + q³). In order to compute the posterior density we need to divide by the integral of f(q)(q - 2q² + q³):
∫₀^0.1 f(q)(q - 2q² + q³) dq = 400 ∫₀^0.05 {q² - 2q³ + q⁴} dq + 40 ∫_0.05^0.1 {q - 12q² + 21q³ - 10q⁴} dq
= 400{q³/3 - q⁴/2 + q⁵/5} evaluated from 0 to 0.05 + 40{q²/2 - 4q³ + 21q⁴/4 - 2q⁵} evaluated from 0.05 to 0.1
= (400)(0.000041667 - 0.000003125 + 0.000000063) + 40{(0.005 - 0.004 + 0.000525 - 0.00002) - (0.00125 - 0.0005 + 0.000032813 - 0.000000625)} = 0.01544 + 0.02891 = 0.04435.
Thus the posterior density is: 400{q² - 2q³ + q⁴}/0.04435, for 0 < q ≤ 0.05, and 40{q - 12q² + 21q³ - 10q⁴}/0.04435, for 0.05 < q ≤ 0.1.
For the zero-one loss function, the Bayes Estimator is the mode, where the posterior density is maximized. Plugging in the given points one gets:
q:                       0.04  0.0424  0.05  0.0576  0.06
posterior density at q:  13.3  14.9    20.3  19.6    19.1
Comment: Ignoring the factor of 40/0.04435, the derivative of the density is: 10{2q - 6q² + 4q³}, for 0 < q ≤ 0.05, and {1 - 24q + 63q² - 40q³}, for 0.05 < q ≤ 0.1. One can check for places where the derivative is zero, then check the value of the density at the endpoints 0, 0.05, and 0.1, and determine that 0.05 is the mode. The density is as follows:
[Graph of the posterior density of q, rising to its maximum at q = 0.05 and then declining, for q from 0 to 0.10.]
16.10. D. From the previous solution, the posterior density is: 400{q² - 2q³ + q⁴}/0.04435, for 0 < q ≤ 0.05, and 40{q - 12q² + 21q³ - 10q⁴}/0.04435, for 0.05 < q ≤ 0.1. For the absolute error function, the Bayes Estimator is the median. One can integrate the density from 0 to each of the given points and determine where the distribution function is 0.5. The median is 0.0576. Plugging in the given points one gets:
q:                                     0.04   0.0424  0.05   0.0576  0.06
posterior distribution function at q:  0.181  0.215   0.348  0.500   0.547
For example, the posterior distribution function at 0.06 is computed as follows:
∫₀^0.05 400{q² - 2q³ + q⁴}/0.04435 dq + ∫_0.05^0.06 40{q - 12q² + 21q³ - 10q⁴}/0.04435 dq
= 400{q³/3 - q⁴/2 + q⁵/5}/0.04435 evaluated from 0 to 0.05 + 40{q²/2 - 4q³ + 21q⁴/4 - 2q⁵}/0.04435 evaluated from 0.05 to 0.06
= 0.348 + 0.199 = 0.547.

16.11. D. From a previous solution, the posterior density is: 400{q² - 2q³ + q⁴}/0.04435 for 0 < q ≤ 0.05 and 40{q - 12q² + 21q³ - 10q⁴}/0.04435 for 0.05 < q ≤ 0.1. For the squared error function, the Bayes Estimator is the mean. Integrating q times the density from q equals 0 to 0.1, the mean is 0.0576:
∫₀^0.05 400{q² - 2q³ + q⁴}q/0.04435 dq + ∫_0.05^0.1 40{q - 12q² + 21q³ - 10q⁴}q/0.04435 dq
= 400{q⁴/4 - 2q⁵/5 + q⁶/6}/0.04435 evaluated from 0 to 0.05 + 40{q³/3 - 3q⁴ + 21q⁵/5 - 5q⁶/3}/0.04435 evaluated from 0.05 to 0.1
= 0.01299 + 0.04461 = 0.0576.
Comment: The mean and median are slightly different if taken out to more decimal places.
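The answers to 16.9-16.11 can all be checked by numerical integration of the posterior. A minimal sketch, assuming SciPy is available (the grid size is an arbitrary choice):

```python
import numpy as np
from scipy.integrate import quad

prior = lambda q: 400*q if q <= 0.05 else 40 - 400*q   # density of q on (0, 0.1)
like  = lambda q: q * (1 - q)**2                        # one claim, then two claim-free years

const, _ = quad(lambda q: prior(q) * like(q), 0, 0.1)   # normalizing constant ≈ 0.04435
post = lambda q: prior(q) * like(q) / const

qs = np.linspace(1e-6, 0.1, 100001)
pdf = np.array([post(q) for q in qs])
dq = qs[1] - qs[0]
cdf = np.cumsum(pdf) * dq

print(const)                           # ≈ 0.04435
print(qs[pdf.argmax()])                # ≈ 0.05   (mode: zero-one loss, 16.9)
print(qs[np.searchsorted(cdf, 0.5)])   # ≈ 0.0576 (median: absolute error, 16.10)
print(np.sum(qs * pdf) * dq)           # ≈ 0.0576 (mean: squared error, 16.11)
```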
16.12. f(x) = exp[-(ln(x) − µ)²/(2σ²)] / (xσ√(2π)).
The chance of the observation is: f(1000) f(2000) f(5000), which is proportional to:
exp[-(ln(1000) − µ)²/(2σ²)] exp[-(ln(2000) − µ)²/(2σ²)] exp[-(ln(5000) − µ)²/(2σ²)] / σ³.
The prior distribution of parameters is: π(µ, σ) = 1/σ. Thus the density of the posterior distribution of parameters is proportional to:
exp[-{(ln(1000) − µ)² + (ln(2000) − µ)² + (ln(5000) − µ)²}/(2σ²)] / σ⁴.
The mode is where this density is largest. (The proportionality constant will not affect this.) We can maximize this density by maximizing its log:
-{(ln(1000) − µ)² + (ln(2000) − µ)² + (ln(5000) − µ)²}/(2σ²) - 4 ln[σ].
Setting the partial derivative with respect to µ equal to zero:
0 = {(ln(1000) − µ) + (ln(2000) − µ) + (ln(5000) − µ)}/σ². ⇒ µ = {ln(1000) + ln(2000) + ln(5000)}/3 = 7.675.
Setting the partial derivative with respect to σ equal to zero:
0 = {(ln(1000) − µ)² + (ln(2000) − µ)² + (ln(5000) − µ)²}/σ³ - 4/σ. ⇒
σ² = {(ln(1000) − µ)² + (ln(2000) − µ)² + (ln(5000) − µ)²}/4 = {(ln(1000) − 7.675)² + (ln(2000) − 7.675)² + (ln(5000) − 7.675)²}/4 = 0.3259.
⇒ σ = 0.571.
Comment: Similar to Exercise 15.80 in Loss Models. The use of the posterior mode corresponds to the zero-one loss function.
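The posterior mode in 16.12 can be checked in a couple of lines. A minimal sketch using NumPy:

```python
import numpy as np

logs = np.log([1000.0, 2000.0, 5000.0])
mu = logs.mean()                               # 7.675
sigma = np.sqrt(np.sum((logs - mu)**2) / 4)    # sqrt(0.3259) ≈ 0.571
print(mu, sigma)
```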
16.13. D. & 16.14. A. For the Single Parameter Pareto Distribution, f(x | α) = α 10^α / x^(α+1).
For the Gamma, π(α) = α² e^(-α/5) / {Γ[3] 5³} = α² e^(-α/5) / 250.
The posterior distribution is proportional to: f(13 | α) f(16 | α) f(21 | α) π(α) = (α 10^α/13^(α+1)) (α 10^α/16^(α+1)) (α 10^α/21^(α+1)) (α² e^(-α/5)/250).
This is proportional to: (α 10^α/13^α) (α 10^α/16^α) (α 10^α/21^α) (α² e^(-α/5)) = α⁵ 10^(3α) e^(-α/5)/4368^α = α⁵ e^(6.9078α) e^(-0.2α)/e^(8.3821α) = α⁵ e^(-1.6743α).
Thus the posterior distribution is a Gamma Distribution with parameters 6 and 1/1.6743.
For the squared error loss function, the Bayes estimate is the mean of the posterior distribution, which in this case is: αθ = 6/1.6743 = 3.58.
For the zero-one loss function, the Bayes estimate is the mode of the posterior distribution. The mode of a Gamma distribution for α > 1 is: (α - 1)θ = (6 - 1)/1.6743 = 2.99.
Comment: Similar to Example 15.17 in Loss Models. In general, the Gamma Distribution is a Conjugate Prior to the Single Parameter Pareto likelihood with θ fixed. If the prior Gamma has shape parameter α and scale parameter β, then the posterior Gamma has parameters: αʼ = α + n, and 1/βʼ = 1/β + Σ ln[xᵢ/θ], with the sum running over the n observed losses.
In this case: αʼ = 3 + 3 = 6, and 1/βʼ = 1/5 + ln[13/10] + ln[16/10] + ln[21/10] = 1.6743.

16.15. B. If c < 15, then the chance of the observation is zero. For c ≥ 15, the chance of the observation is 1/c². Thus the density of the posterior distribution is proportional to: (1/c)(1/c²) = 1/c³, c > 15.
∫_15^∞ dc/c³ = (1/2)(1/15²).
Thus the density of the posterior distribution is: (2)(15²)/c³, c > 15. Integrating the density, the distribution function is: 1 - (15/c)², c > 15. Using the absolute error loss function, we want the median of the posterior distribution.
0.5 = 1 - (15/c)². ⇒ c = 15√2 = 21.2.
Comment: The posterior distribution is a Single Parameter Pareto Distribution.
16.16. D. From the previous solution, the density of the posterior distribution is: 450/c³, c > 15.
The uniform has density 1/c, 0 ≤ x ≤ c. Therefore, for x < 15, c > x and the density of the uniform at x is 1/c, and thus the density of the predictive distribution is:
∫_15^∞ (1/c)(450/c³) dc = 2/45.
However, for x ≥ 15, the density of the uniform at x is zero unless c ≥ x, and thus the density of the predictive distribution is:
∫_x^∞ (1/c)(450/c³) dc = 150/x³.
The distribution of the next loss from this same insured is the predictive distribution. Using the absolute error loss function, we want the median of the predictive distribution. The predictive distribution function at 15 is: (15)(2/45) = 2/3. Thus the median is less than 15.
The median of the predictive distribution is: {(1/2)/(2/3)}(15) = (0.75)(15) = 11.25.
Comment: We wish to minimize the expected absolute error of our estimate compared to the next observation.
The density of the predictive distribution is: f(x | observation) = 2/45 for x < 15, and 150/x³ for x ≥ 15.
[Graph of the predictive density: constant at 2/45 for x up to 15, then declining as 150/x³.]
Its integral from 0 to infinity is: (15)(2/45) + ∫_15^∞ 150/x³ dx = 2/3 + 75/15² = 1.
16.17. E. The mode is the value the distribution is most likely to assume. Therefore the probability of not matching the single selected value is smallest at the mode. The expected value of the given “loss function” is k times the probability of not matching the single selected value α₁. Thus α₁ = mode will produce the smallest expected value of the loss function.
Comment: The expected value of the given “loss function” is a measure of the error of the point estimate. Different such measures or criteria will yield different “best” point estimates. For this particular measure of error the best estimate is the mode. This loss function is proportional to the “zero-one loss function” in Loss Models, and therefore produces the same best estimator, the mode. Note also that the particular density function given for α is not used to solve this question.

16.18. C. This absolute value loss function is minimized by the median. The median of this distribution is the value of x such that 0.5 = F(x) = 1 - e^(-x). Thus x = ln 2 = 0.693.

16.19. E. The loss function corresponding to the pth percentile is proportional to:
error = (1 - p)(k - X) if X ≤ k; (-p)(k - X) if X ≥ k.
For the 80th percentile, p = 0.8 and the loss function is proportional to:
error = 0.2(k - X) if X - k ≤ 0; 0.8(X - k) if X - k > 0.
Multiplying by 5, we obtain the given loss function, with α = (5)(0.8) = 4.
Comment: One can always multiply a loss function by any positive constant, without changing the estimator that minimizes it. In this situation, we particularly dislike underestimates; the loss function is four times as large when X > k than when X < k. Therefore, this loss function is minimized by an estimator that tends to aim high, such as the 80th percentile, rather than the median.
Section 17, Least Squares Credibility

Some of the mathematics behind the Buhlmann Credibility formula will be discussed in this section.174 It will be shown that Buhlmann Credibility is the linear estimator which minimizes the expected squared error measured with respect to either the future observation, the hypothetical mean, or the Bayesian Estimate. As will be shown, the expected squared error as a function of the amount of credibility assigned to the observations is a parabola. First weʼll consider a single observation of a risk process and then extend the result to the average of several observations.

Loss Models distinguishes between the situation where there is no variation in size or exposure, the so-called Buhlmann Model, and the situation where there is variation in size or exposure, the so-called Buhlmann-Straub Model.175

The Buhlmann Model:176

For a given policyholder, its losses in different years, Xᵢ, are independent, identically distributed variables.177 178 The means and variances differ across a group of policyholders in some manner.
µ(θ) = E[Xᵢ | θ], the hypothetical mean for risk type θ.
v(θ) = Var[Xᵢ | θ], the process variance for risk type θ.
Then as discussed previously:
µ = E[µ(θ)]. EPV = E[v(θ)]. VHM = Var[µ(θ)]. K = EPV/VHM.
For N years of past data, Z = N/(N + K). X̄ = ΣXᵢ/N.
Buhlmann Credibility Premium = estimate of the future = Z X̄ + (1 - Z)µ.
174 Almost all exam questions ask you to apply the formula rather than asking about the mathematics behind it.
175 Many actuaries do not make a big deal out of this distinction.
176 See Section 20.3.5 of Loss Models.
177 X could be the number of claims rather than the aggregate loss. As we have seen, the same mathematics can also be applied to severity, where n is the number of claims.
178 We actually need only assume that for a given policyholder the means and the variances are the same and that the distributions are independent.
The Buhlmann-Straub Model:179

For a given policyholder, its pure premiums in different years, Xᵢ, are independent.180 In year i, the policy has exposures mᵢ, some measure of size.181 As before, µ(θ) = E[Xᵢ | θ]. Now we assume that the variance of the pure premium is inversely proportional to size: Var[Xᵢ | θ] = v(θ)/mᵢ.
Then as discussed previously:
µ = E[µ(θ)]. EPV = E[v(θ)]. VHM = Var[µ(θ)]. K = EPV/VHM.
Let m = Σmᵢ = total exposures. Z = m/(m + K).
The observed pure premium is: total losses / total exposures = Σmᵢ Xᵢ / Σmᵢ = Σmᵢ Xᵢ / m.
Buhlmann-Straub Credibility Premium = estimate of the future pure premium = Z(observed pure premium) + (1 - Z)µ.

Now we will discuss the expected squared errors of these estimators, starting with the simpler Buhlmann Model, without size of insured being important.

Covariance Matrix:

In order to compute expected squared errors, it is useful to work with the Covariance Matrix of different years of data.182 The Covariance Matrix has variances of individual years down the diagonal and covariances between different years off the diagonal. In general, the expected squared errors, and thus the least squares credibility, depend on the covariance structure of the data.
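To make the two models concrete, here is a minimal sketch (my own illustration; the function names and the example numbers in the last line are not from the text):

```python
def buhlmann_estimate(xs, mu, epv, vhm):
    """Buhlmann model: xs are N years of data for one policyholder; epv and vhm are per year."""
    n = len(xs)
    k = epv / vhm
    z = n / (n + k)
    return z * (sum(xs) / n) + (1 - z) * mu

def buhlmann_straub_estimate(losses, exposures, mu, epv, vhm):
    """Buhlmann-Straub model: losses and exposures by year; epv and vhm are per unit of exposure."""
    m = sum(exposures)
    k = epv / vhm
    z = m / (m + k)
    observed_pp = sum(losses) / m          # observed pure premium
    return z * observed_pp + (1 - z) * mu

# Multi-sided die example: EPV = 2.15, VHM = 0.45, a priori mean 3, one observed roll of 5.
print(buhlmann_estimate([5], mu=3, epv=2.15, vhm=0.45))   # ≈ 3.35, i.e. 3 + 0.173(5 - 3)
```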
179 See Section 20.3.6 of Loss Models.
180 X could be the frequency rather than the pure premium.
181 If all of the mᵢ = 1, then the Buhlmann-Straub model reduces to the Buhlmann model.
182 Cov[X, Y] = E[XY] - E[X]E[Y]. Cov[X, X] = Var[X].
For example, letʼs calculate the covariance between two separate rolls in the multi-sided die example.183 First one calculates the expected product of the die rolls, given a certain sided die has been picked. Then one takes the expected value over the different sided dice. Then one subtracts the overall mean squared.

If one has selected a 4-sided die, then the (conditional) expected value of the product of two rolls is:
E[X₁X₂ | 4-sided] = {(1)(1) + (1)(2) + (1)(3) + (1)(4) + (2)(1) + (2)(2) + (2)(3) + (2)(4) + (3)(1) + (3)(2) + (3)(3) + (3)(4) + (4)(1) + (4)(2) + (4)(3) + (4)(4)}/16 = 100/16 = 2.5².
Given that one has picked a 4-sided die, the two rolls are independent and the expected value of the product is just the product of the means: E[X₁X₂ | 4-sided] = E[X₁ | 4-sided] E[X₂ | 4-sided] = (2.5)(2.5).
Similarly, E[X₁X₂ | 6-sided] = 3.5², and E[X₁X₂ | 8-sided] = 4.5².
A priori there is a 60% chance of a 4-sided die, 30% chance of a 6-sided die, and 10% chance of an 8-sided die. Thus:
E[X₁X₂] = P(4)E[X₁X₂ | 4-sided] + P(6)E[X₁X₂ | 6-sided] + P(8)E[X₁X₂ | 8-sided] = (60%)(2.5²) + (30%)(3.5²) + (10%)(4.5²) = 9.45.
The overall mean is: E[X] = (60%)(2.5) + (30%)(3.5) + (10%)(4.5) = 3.
Thus the covariance between different rolls is: Cov[X₁, X₂] = E[X₁X₂] - E[X₁]E[X₂] = 9.45 - (3)(3) = 0.45.
This also is the Variance of the Hypothetical Means computed earlier for this multi-sided die example. In fact, the arithmetic was precisely the same. The covariance between different rolls is just the VHM.

As shown below, in general for the Buhlmann Model, where size is not important, the covariance structure between the years of data is:
Cov[Xᵢ, Xⱼ] = η²δᵢⱼ + τ², where δᵢⱼ is 1 for i = j and 0 for i ≠ j,
η² is the Expected Value of the Process Variance for one exposure, and τ² is the Variance of the Hypothetical Means.
Assume that there are a total of 100 multi-sided dice of which 60 are 4-sided, 30 are 6-sided, and 10 are 8-sided. The multi-sided dice with 4 sides have 1, 2, 3 and 4 on them. The multi-sided dice with the usual 6 sides have numbers 1 through 6 on them. The multi-sided dice with 8 sides have numbers 1 through 8 on them. For a given die each side has an equal chance of being rolled; i.e., the die is fair. Your friend has picked at random a multi-sided die. He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 650
Then COV[X1 , X2 ] = τ2, and COV[X1 , X1 ] = VAR[X1 ] = η2 + τ2. Thus in the multi-sided die example in which τ2 = 0.45 and η2 = 2.15, the variance-covariance matrix between five rolls of the same die is:184
1 2 3 4 5
1
2
3
4
5
2.60 0.45 0.45 0.45 0.45
0.45 2.60 0.45 0.45 0.45
0.45 0.45 2.60 0.45 0.45
0.45 0.45 0.45 2.60 0.45
0.45 0.45 0.45 0.45 2.60
As discussed previously, for this example the covariance between different years of data is 0.45, while the (total) variance of a single year of data is 2.60, where one roll of the die ⇔ one “year” of data. When size of risk is important, the Buhlmann-Straub covariance structure is: COV[Xi , Xj] = δij (η 2 / mi) + τ2, where mi is the exposures for year i. As discussed previously, the EPV is (assumed to be) inversely proportional to the size of risk. Derivation of the Buhlmann Covariance Structure: In general, let the different types of risks be parameterized by θ.185 Let µ(θ) be the hypothetical mean for risks of type θ, E[X | θ] = µ(θ). Note that Eθ [µ(θ)] = µ, the overall a priori mean. Also τ2 = VHM = Eθ[µ(θ)2 ] - µ2. Then E[X1X2] = Eθ[ E[ X1X2 | θ ] ] = Eθ[ E[ X1 | θ ] E[ X2 | θ ] ] = Eθ[µ(θ)µ(θ)] = Eθ[µ(θ)2 ] = τ2 + µ2. Where we have used the fact that for a given type of risk, the first observation X1 and the second observation X2 are independent draws from the same risk process.
184
Iʼve only shown five rows and columns, corresponding to a total of five trials or die rolls. For example, in the case of the Gamma-Poisson, θ is the Poisson parameter. In a discrete case, θ could take on the four values: “Excellent”, “Good”, “Bad” , or “Ugly”. 185
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 651
When i ≠ j, for example 1 and 2, we have that the expected covariance between different years of data is: COV[X1 , X2 ] = E[X1 X2 ] - E[X1 ]E[X2 ] = Eθ[E[X1 X2 |θ]] - µ2 = Eθ[ E[X1 |θ] E[X2 |θ] ] - Eθ[µ(θ)]2 = Eθ[µ2(θ)]] - Eθ[µ(θ)]2 = second moment of the hypothetical means - square of the overall mean = Variance of the Hypothetical Means = τ2. When i = j we have that: COV[Xi , Xj] = COV[Xi , Xi] = VAR[X] = Total Variance = EPV + VHM = η2 + τ2. Thus putting together the cases when i = j and i ≠ j, COV[Xi , Xj] = η2δij + τ2. Squared Errors, Using a Single Observation: Assume we have a universe of risks.186 Choose one risk at random. Let X1 be an observation of a risk process.187 We wish to predict the next outcome X2 of the risk process for the same risk. Let the a priori overall mean be µ. It is assumed that the a priori expected value of the future observation equals µ: E[X2 ] = µ, as does that of the prior observation: E[X1 ] = µ. Let the new estimate be of the linear form: F = Z X1 + (1 - Z)µ. Then the error of the estimate compared to the posterior observation is: F - X2 = Z X1 + (1 - Z)µ - X2 = Z(X1 - X2 ) + (1 - Z)(µ - X2 ). Thus the expected value of the squared error as a function of Z is: V(Z) = E[{Z(X1 - X2 ) + (1 - Z)(µ - X2 )}2 ] = Z2 E[(X1 - X2 )2 ] + 2Z(1 - Z) E [(X1 - X2 )(µ - X2 )] + (1 - Z)2 E[(µ - X2 )2 ]. It will be useful to write out the expected value of various product terms. Let η2 be the expected value of the process variance of a single observation and τ2 be the variance of the hypothetical means of a single observation. Recall that the total variance of a single observation is: η2 + τ2. 186 187
An urn filled with dice, a group of drivers, etc. Roll a die, observe an individual driver for a year, etc.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 652
Then the expected value of the square of an observation from a year is the total variance plus the square of the overall mean: E[X1 2 ] = E[X2 2 ] = η2 + τ2 + µ2. The expected value of a product of observations from different years is the variance of the hypothetical means plus the square of the overall mean: E[X1 X2 ] = Cov[X1 , X2 ] + E[X1 ]E[X2 ] = τ2 + µ2.188 Terms involving the overall mean, involve neither the EPV nor VHM: E[X1 µ] = µE[X1] = µ2. The terms entering into the expected value of the squared error are: E[(X1 - X2 )2 ] = E[X1 2 ] - 2E[X1 X2 ] + E[X2 2 ] = 2(η2 + τ2 + µ2) - 2(τ2 + µ2) = 2η2. E[(X1 - X2 )(µ - X2)] = E[X1 µ] - E[X1 X2 ] - E[X2 µ] + E[X2 2 ] = µ2 - (τ2 + µ2) - µ2 + (η2 + τ2 + µ2) = η2. E[(µ - X2 )2 ] = E[µ2] - 2E[µX2 ] + E[X2 2 ] = µ2 - 2µ2 + (η2 + τ2 + µ2) = η2 + τ2. Thus the expected value of the squared error as a function of Z is: V(Z) = Z2 E[(X1 - X2 )2 ] + 2Z(1 - Z) E[(X1 - X2 )(µ - X2 )] + (1 - Z)2 E[(µ - X2 )2 ] = 2η2 Z 2 + 2η2Z(1 - Z) + (η2 + τ2)(1-Z)2 = (η2 + τ2) Z2 - 2τ2 Z + (η2 + τ2). Expected Value of the Squared Error = (η2 + τ2)Z2 - 2τ2 Z + (η2 + τ2).189 Thus the expected value of the squared error as a function of the weight given to the observation, Z, is a parabola. In order to minimize the expected value of the squared error, one sets its derivative equal to zero. Setting V´(Z) = 0, and solving for Z: Z = τ2 / (η2 + τ2 ) = VHM / Total Variance = 1 / (1 + η2/τ2). If we let K = η2/τ2 = EPV /VHM, where both the VHM and EPV are for a single observation of the risk process, then for one observation: Z = 1/(1 + K). 188 189
For two different years of data, there is no term involving the expected value of the process variance. The squared error between the estimate and the observation.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 653
For example, in the multi-sided die example as calculated in previous sections, for a single observation, EPV = η2 = 2.15, and VHM = τ2 = 0.45. In that case, the Buhlmann Credibility parameter, K = 2.15/.45 = 4.778, and for a single observation the Buhlmann Credibility was 1/(1 + 4.778) = 0.173. The expected value of the squared error as function of Z is in this case: V(Z) = (η2 + τ2) Z2 - 2τ2 Z + (η2 + τ2) = 2.6Z2 - 0.9Z + 2.6. The mean squared error as a function of the weight applied to the observation are shown below for 1 die (solid line), 3 dice (dashed line), and 10 dice (dotted line):190 MSE
2.9 N= 1
2.8 2.7
N= 3
2.6 2.5
N = 10
2.4 2.3
Weight 0.2
0.4
0.6
0.8
1
V(Z) for an observation of one die roll is minimized for Z = 0.173, the value of the Buhlmann Credibility.191 Similarly, the mean squared errors for observations of 3 and 10 die rolls are minimized for Z = 3/(3 + 4.778) = 0.386 and 10/(10 + 4.778) = 0.677.
As discussed below, for N dice, V(Z) = (η2/N + τ2) Z2 - 2τ2 Z + (η2 + τ2) = (2.15/N + .45)Z2 - .9Z + 2.6. Note that for values of Z somewhat different than .173, the expected squared error is still relatively small. For values near optimal, small differences in the Credibility have a relatively small effect on the expected squared error. See “An Actuarial Note on Credibility Parameters”, by Howard C. Mahler, PCAS 1986. 190 191
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 654
For the cases where the observations consist of either one (solid line), three (dashed line), or ten (dotted line) die rolls, shown below are the expected squared error as a function of k, such that the weight to the observation = N/(N + k): MSE
2.8
2.7
2.6 N= 1 2.5 N= 3 2.4 N = 10 2.3 k 1
2
3
4
5
6
7
8
9
10
In each case, the mean squared error is minimized for k = 4.778, the Buhlmann Credibility Parameter for this multisided die example.192 A Simulation Experiment: Simulate the multi-sided die example, as follows: Pick a die at random, with a 60% chance of a 4-sided die, a 30% chance of a 6-sided die, and a 10% chance of a 8-sided die. Roll the die. Estimate the second roll as: w(first roll) + (1 - w)(a priori mean) = w(first roll) + 3(1 - w). Roll this same die again. Then compute the squared error: (predicted second roll - actual second roll)2 . 192
For values near the optimal value of 4.778, small differences in K have a very small effect on the expected squared error. Generally one needs only to estimate K within about a factor of 2. In this case, for values of K between about 3 and 7, the expected squared error is close to minimal. As discussed in a separate study guide, Empirical Bayesian Credibility methods attempt to estimate K solely from the observed data. While the random fluctuations in the data often produce considerable uncertainty in the estimate of K, fortunately K does not need to be estimated very precisely.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 655
For example, if the first die is a six-sided die, and the two rolls are a 4 and a 5, then the squared prediction error would be: {w4 + (1-w)3 - 5}2 . Given a simulated pair of rolls from a die, the squared error is a function of w, the weight applied to the observation. This situation was simulated 1000 times, and the squared errors were averaged. Here is a graph of the Mean Squared Error as a function of w: MSE 4.25 4 3.75 3.5 3.25 3 2.75 0.2
0.4
0.6
0.8
1
Weight
The mean squared error between prediction and observation, is minimized for about w = 0.178, close to the Buhlmann Credibility for a single die roll of 0.173.193 MSE 2.5586 2.5584 2.5582 2.558 2.5578 2.5576
0.16
193
0.17
0.18
0.19
The difference is due to simulating only 1000 runs.
0.2
Weight
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 656
Squared Errors, Using an Average of Several Observations: If instead of a single observation, one uses the average of several observations, the key difference is that the EPV of the average of several observations is inversely proportional to the number of observations. Otherwise, the situation parallels that for a single observation. Let X1 , X2 , ..., XN be N observations of a risk process.194 We wish to predict the next outcome XN+1 for the same risk. Let the average of the observations be O = X = (1/N) Σ Xi. Let the a priori overall mean be µ. Itʼs assumed that the a priori expected value of the future observation equals µ: E[XN+1] = µ, as does that of each prior observation: E[Xi] = µ. Let the new estimate be of the linear form: F = Z O + (1 - Z)µ. Then the error compared to the posterior observation is: F - XN+1 = Z O + (1 - Z)µ - XN+1 = Z(O - XN+1) + (1 - Z)(µ - XN+1). Thus the expected value of the squared error as a function of Z is: V(Z) = E[{Z(O - XN+1) + (1 - Z)(µ - XN+1)}2 ] = Z 2 E[(O - XN+1)2 ] + 2Z(1 - Z) E[(O - XN+1)(µ - XN+1)] + (1 - Z)2 E[(µ - XN+1)2 ]. It will be useful to write out the expected value of various product terms. As before, let η2 be the expected value of the process variance of a single observation and τ2 be the variance of the hypothetical means of a single observation. Then the expected value of the square of an observation from a single year is the total variance plus the square of the overall mean: E[X1 2 ] = E[X2 2 ] = E[XN+12 ] = η2 + τ2 + µ2. O is the average of N independent draws from the same risk process, therefore its mean is µ and its (total) variance is: η2/N + τ2. Thus E[O2] = η2/N + τ2 + µ2.195 The expected value of a product of observations from different years is the variance of the hypothetical means plus the square of the overall mean: E[X1 XN+1] = τ2 + µ2. Thus E[OXN+1] = (1/N) Σ E[XiXN+1] = (1/N)N(τ2 + µ2) = τ2 + µ2. Terms involving the observation and year to be estimated do not contain the EPV. 194 195
Roll N identical dice, observe an individual driver for N years, etc. The process variance of an average declines as per the number of observations.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 657
Terms involving the overall mean, involve neither the EPV nor VHM: E[X1 µ] = µE[X1 ] = µ2. The terms entering into the expected value of the squared error are: E[(O - XN+1)2 ] = E[O2 ] - 2E[OXN+1] + E[XN+12 ] = (η2/N + τ2 + µ2) - 2(τ2 + µ2) + (η2 + τ2 + µ2) = (1 + 1/N)η2. E[(O - XN+1)(µ - XN+1)] = E[Oµ] - E[OXN+1] - E[XN+1µ] + E[XN+12 ] = µ2 - (τ2 + µ2) - µ2 + (η2 + τ2 + µ2) = η2. E[(µ - XN+1)2 ] = E[µ2] - 2E[µXN+1] + E[XN+12 ] = µ2 - 2µ2 + (η2 + τ2 + µ2) = η2 + τ2. Thus the expected value of the squared error as a function of Z is: V(Z) = Z2 E[(O - XN+1)2 ] + 2Z(1 - Z) E[(O - XN+1)(µ - XN+1)] + (1 - Z)2 E[(µ - XN+1)2 ] = (1 + 1/N)η2 Z2 + 2η2Z(1 - Z) + (η2 + τ2)(1 - Z)2 = (η2/N + τ2) Z2 - 2τ2 Z + (η2 + τ2). V(Z) = (η2/N + τ2) Z2 - 2τ2 Z + (η2 + τ2). As was the case of a single observation, the expected value of the squared error as a function of Z is a parabola. In order to minimize the expected value of the squared error, one sets its derivative equal to zero. Setting V´(Z) = 0, and solving for Z: Z = τ2 / (η2/N + τ2) = VHM/ Total Variance = N/(N + η2/τ2). If we let K = η2/τ2 = EPV / VHM, where both the VHM and EPV are for a single observation of the risk process, then for N observations Z = N/(N + K). Therefore Z = N / (N + K), where K = η2/τ2 = EPV / VHM, where EPV and VHM are each for a single observation. For example in the multi-sided die example, for a single observation EPV = η2 = 2.15, and VHM = τ2 = .45. In that case, the Buhlmann Credibility parameter K = 2.15 / 0.45 = 4.778, and for N observations the Buhlmann Credibility was Z = N/(N + 4.778). For example, for N = 3, the Buhlmann Credibility is: Z = 3/7.778 = 0.386. The expected value of the squared error as function of Z is in this case: V(Z) = (η2/N + τ2)Z2 - 2τ2 Z + (η2 + τ2) = (2.15/N + 0.45)Z2 - 0.9Z + 2.6. For N = 3, V(Z) = 1.1667Z2 - 0.9Z + 2.6. V(Z) is minimized for Z = 0.386, the value of the Buhlmann Credibility for three rolls of a die.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 658
Z = VHM / (VHM + EPV) = VHM / Total Variance, where the Variance of the Hypothetical Means and Expected Value of the Process Variance have already been adjusted to be that for N observations.196 Note that the Credibility is: Z = VHM / (VHM + EPV) = (1/EPV) / {(1/EPV) + (1/VHM)}, while the complement of credibility is: 1 - Z = (1/VHM) / {(1/EPV) + (1/VHM)}. Thus the observation is given a weight inversely proportional to its variance: EPV, while the overall a priori mean is given a weight inversely proportional to its variance: VHM. Thus each estimator of the quantity of interest is weighted inversely proportionally to its variance.197 Note that credibility is a measure of how reliable one estimator is relative to the other estimators; the larger the inverse variance of an estimator is compared to that of the other estimators, the more credibility it is given. Summary of Behavior with Number of Observations: As we add up N independent observations of the risk process, the process variances add. Therefore, if η2 is the process variance of a single observation, then Nη2 is the process variance of the sum of N identical draws from the given risk process. In contrast if τ2 is the variance of the hypothetical means of one observation, then the variance of the hypothetical means of the sum of N observations is N2 τ2 . This follows from the fact that each of the means is multiplied by N, and that multiplying by a constant multiplies the variance by the square of that constant. Thus as N increases, the variance of the hypothetical means of the sum goes up as N2 , while the process variance of the sum goes up only as N. Since the average is just the sum times 1/N, the variance of the hypothetical means of the average is independent of N, while the process variance of the average goes down as 1/N. Thus since the relative weights are inversely proportional to the variance, the weight given the observation increases relative to that given to the prior mean, as N increases. As N increases, the credibility given to the observation increases. The credibility to be given by: Z = VHM / Total Variance = VHM / (VHM + EPV) = N2 τ2 / (N2 τ2 + Nη2) = τ2 / (τ2 + η2/N) = N / (N + η2/τ2). 196
One computes K = EPV / VHM where the EPV and VHM are for one observation. The formula Z = N / (N + K) automatically adjust for the number of observations N. 197 This is a special case of general statistical situation. If two unbiased estimators are independent, then the minimum variance unbiased estimator is the weighted average of the two estimators, with each estimator weighted in inverse proportion to its variance.
2013-4-9
Buhlmann Cred. §17 Least Squares Credibility,
HCM 10/19/12,
Page 659
Therefore Z = N / (N + K), where K = η2 / τ2 = EPV / VHM, where EPV and VHM are each for a single observation.198 Behavior with Size of Risk: In lines of insurance such as Group Health, Commercial Automobile, and Workersʼ Compensation, one deals with insureds with differing number of exposures. For example, one fleet might consist of 10 trucks, while another consists of 100 trucks. All else being equal, one assigns more credibility to the experience of a larger insured than to that of a smaller insured. Let m be some relevant measure of the size of an insured such as exposures, then Z gets larger as m gets larger. Generally what we are estimating for the insured is a quantity with some measure of size of risk in the denominator. So for example, the frequency is number of claims divided by exposures, severity is number of dollars divided by number of claims, pure premiums are losses divided by number of exposures, and a loss ratio is losses divided by premiums. In each case the denominator is some measure of the size of the insured. If one adds up identical exposures then η2, the process variance of losses, is multiplied by m, while in contrast τ2, the variance of the hypothetical mean losses, is multiplied by m2 . However, if instead we deal with pure premiums which are losses divided by m, then each variance is divided by m2 . Thus we have that the process variance is η2/m, while the variance of the hypothetical means is τ2. As the size of risk increases, the process variance of pure premiums decreases while the variance of the hypothetical mean pure premiums remains the same.199 200 This corresponds to the Buhlmann-Straub covariance structure: Cov[Xi, Xj] = τ2 + δijη2/mi. As the size of risk increases the process variance (noise) decreases, so we assign the observation more credibility. Specifically the credibility Z = VHM / Total Variance = VHM / (VHM + EPV) = τ2 / (τ2 + η2/m) = m / (m + η2/τ2).
198
On the exam, the calculation of the Credibility thus involves the calculation of these two variances for a single observation. In general situations, one has to analyze a Covariance matrix, as discussed in the next section. See for example, Howard Mahlerʼs “An Example of Credibility and Shifting Risk Parameters”, PCAS 1990. 199 While this behavior holds on the exam, unless specifically stated otherwise as in 4, 5/01, Q.23, it does not hold in all practical applications. This is discussed briefly in the next section. 200 The key idea is that the pure premium is an average; pure premium is the average loss per exposure. The same mathematics would apply to other averages such as frequency or severity.
Therefore Z = m / (m + K), where K = η2 / τ2 = EPV / VHM, where EPV and VHM are each for a single unit of m; m is measured in whatever units appear in the denominator of the quantity of interest.201 If we are interested in frequency or pure premiums then E, the measure of size of risk, is in units of exposures. If we are interested in severity then E, the measure of size of risk, is in units of (expected number of) claims. If we are interested in loss ratios then E, the measure of size of risk, is in units of premiums. Errors Compared to the Hypothetical Mean: Throughout this section the expected squared error was measured by comparing the estimate to the future observation. When dealing with models, one could instead measure the error of the observation with respect to the hypothetical mean for the chosen risk.202 However, these two varieties of errors are closely related to each other. It turns out that the expected squared error measured with respect to the future observation just contains an extra EPV compared to the squared error with respect to the hypothetical mean.203 This can be shown as follows. As previously, let η2 be the expected value of the process variance of a single observation and let τ2 be the variance of the hypothetical means of a single observation. Let O be the past observation, an average of N data points. Let F be the Credibility estimate, F = ZO + (1 - Z)µ. Let XN+1 be the future observation. V X(Z) = E[(XN+1 - F)2 ]. V HM(Z) = E[(µ(θ) - F)2 ], where µ(θ) is the hypothetical mean for a given risk type θ. Then VHM(Z) = E[(µ(θ)-F)2 ] = E[(µ(θ) - XN+1 + XN+1 - F)2 ] = E[(µ(θ) - XN+1)2 ] + E[( XN+1 - F)2 ] + 2E[(µ(θ) - XN+1)( XN+1 - F)]. Now we have E[(µ(θ) - XN+1)2 ] = η2, since this is just the EPV, the variance of the (single) future observation around its hypothetical mean. 201
On the exam the calculation of the credibility thus involves the calculation of these two variances for a single exposure. In general situations, as discussed in the next section, one has to analyze a Variance-Covariance matrix. 202 Note that in insurance applications the hypothetical mean for a given insured can not be observed. 203 This extra EPV comes from the extra random fluctuation of the future observation around the hypothetical mean for that risk.
E[(XN+1 - µ(θ))( XN+1 - O)] = E[(XN+12 ] + E[µ(θ) O)] - E[(XN+1O)] - E[µ(θ)XN+1] = (η2 + τ2 + µ2) + (τ2 + µ2) - (τ2 + µ2) - (τ2 + µ2) = η2. E[(XN+1 - µ(θ))( XN+1 - µ)] = E[(XN+12 ] + E[µ(θ) µ)] - E[(XN+1 µ)] - E[ µ(θ)XN+1] = (η2 + τ2 + µ2) + µ2 - µ2 - (τ2 + µ2) = η2. Therefore, E[(µ(θ) - XN+1)(XN+1 - F)] = -E[(XN+1 - µ(θ))(XN+1 - F)] = -E[(XN+1 - µ(θ))(XN+1 - ZO + (1 - Z)µ)] = -ZE[(XN+1 - µ(θ))(XN+1 - O)] - (1 - Z)E[(XN+1 - µ(θ))(XN+1 - µ)] = -Zη2 - (1 - Z)η2 = -η2. Therefore, VHM(Z) = E[(µ(θ) - XN+1)2 ] + E[(XN+1 - F)2 ] + 2E[(µ(θ) - XN+1)( XN+1 - F)] = η2 + VX(Z) - 2η2 = VX(Z) - η2. Thus as stated above, the expected squared error measured with respect to the future observation contains just an extra EPV compared to the squared error with respect to the hypothetical mean: VX(Z) = VHM(Z) + η2. Using the previous formula for the expected squared error with respect to the future observation, when one uses N observations, we have: V X(Z) = (η2/N + τ2)Z2 - 2τ2 Z + (η2 + τ2). V HM(Z) = (η2/N + τ2)Z2 - 2τ2 Z + τ2. Since these two expected squared errors differ by only η2, independent of Z, the value of Z that minimizes one also minimizes the other. Thus the Buhlmann Credibility minimizes the expected squared error measured with respect to either the future observation or the hypothetical means. Exercise: For the multi-sided die example, where one observes one roll of a die, what is VX(Z), the expected squared error measured with respect to the future observation? [Solution: VX(Z) = (η2/N + τ2)Z2 - 2τ2Z + (η2 + τ2). For this example, N =1, τ2 = 0.45, and η2 = 2.15. Thus VX(Z) = 2.6 Z2 - 0.9 Z + 2.6.]
Exercise: For the multi-sided die example, where one observes one roll of a die, what is VHM(Z), the expected squared error measured with respect to the hypothetical means? [Solution: VHM(Z) = (η2/N + τ2)Z2 - 2τ2Z + τ2. For this example, N =1, τ2 = 0.45, and η2 = 2.15. Thus VHM(Z) = 2.6Z2 - 0.9 Z + 0.45.] Exercise: For the multi-sided die example, what is the Buhlmann Credibility assigned to an observation of one roll of a die? [Solution: Minimize either VX(Z) or VHM(Z) and obtain Z = 0.45 / 2.6 = 17.3%.] Errors Compared to the Bayesian Estimates: One can also consider the expected squared error with respect to the results of Bayes Analysis. As shown in the section on Linear Regression, Buhlmann Credibility also minimizes these squared errors. Let E[µ(θ) | O] be the Bayesian Estimate, given the past observation O. Let VB(Z) = E[(E[µ(θ) | O] - F)2 ]. Then VHM(Z) = E[(µ(θ) - F)2 ] = E[(µ(θ) - E[µ(θ) | O] + E[µ(θ) | O] - F)2 ] = E[(E[µ(θ) | O] - µ(θ))2 ] + E[(E[µ(θ) | O] - F)2 ] - 2E[(E[µ(θ) | O] - µ(θ))(E[µ(θ) | O] - F)]. Now we have that, E[(E[µ(θ) | O] - µ(θ))(E[µ(θ) | O] - F)] = EO[E[(E[µ(θ) | O] - µ(θ))(E[µ(θ) | O] - F) | O]] = EO[(E[µ(θ) | O] - E[µ(θ) | O])(E[µ(θ) | O] - {ZO + (1 - Z)µ})] = EO[0(E[µ(θ) | O] - {ZO + (1 - Z)µ})] = EO[0] = 0. Thus, VHM(Z) = E[(E[µ(θ) | O] - µ(θ))2 ] + VB(Z). Note that the first term on the righthand side of the equation, E[(E[µ(θ) | O] - µ(θ))2 ], is independent of Z. Therefore the value of Z that minimizes the expected squared error with respect to the hypothetical means, VHM(Z), will also minimize the expected squared error with respect to the Bayesian Estimates, VB(Z).
Thus the Buhlmann Credibility is the linear estimator which minimizes the expected squared error measured with respect to either the future observation, the hypothetical mean, or the Bayesian Estimate.
Exercise: For the multi-sided die example, where one observes one roll of a die, what is E[(E[µ(θ) | O] - µ(θ))²], the expected squared difference between the hypothetical means and the Bayesian Estimates?
[Solution: As calculated in a previous section, the Bayesian Estimates corresponding to each of the possible observations are:
Observation:          1      2      3      4      5    6    7    8
Bayesian Estimate:  2.853  2.853  2.853  2.853  3.7  3.7  4.5  4.5
For a given observation, we can compute the posterior chance that we have a 4, 6, or 8 sided die and thus µ(θ) equal to 2.5, 3.5, or 4.5. For example, as shown in a prior section if a 3 is observed, those posterior chances are: 70.6%, 23.5% and 5.9%. Thus if one observes a roll of a 3,
E[(E[µ(θ) | O] - µ(θ))² | 3] = (70.6%)(2.853 - 2.5)² + (23.5%)(2.853 - 3.5)² + (5.9%)(2.853 - 4.5)² = 0.346.
For other possible observations one can do the similar computation and then weight together the results using the a priori chances of each observation:204

Observation   A Priori Chance   Bayesian    Posterior Chance   Posterior Chance   Posterior Chance   Expected Squared
              of Observation    Estimate     of 4-sided die     of 6-sided die     of 8-sided die       Difference
    1              0.212          2.853          70.6%              23.5%               5.9%               0.346
    2              0.212          2.853          70.6%              23.5%               5.9%               0.346
    3              0.212          2.853          70.6%              23.5%               5.9%               0.346
    4              0.212          2.853          70.6%              23.5%               5.9%               0.346
    5              0.062          3.700           0.0%              80.0%              20.0%               0.160
    6              0.062          3.700           0.0%              80.0%              20.0%               0.160
    7              0.013          4.500           0.0%               0.0%             100.0%               0.000
    8              0.013          4.500           0.0%               0.0%             100.0%               0.000
  Average                         3.000                                                                    0.314
Thus for this example, E[(E[µ(θ) | O] - µ(θ))²] = 0.314.]
Exercise: For the multi-sided die example, where one observes one roll of a die, what is VB(Z), the expected squared error measured with respect to the Bayesian Estimates?
[Solution: VHM(Z) = 2.6Z² - 0.9Z + 0.45, and E[(E[µ(θ) | O] - µ(θ))²] = 0.314.
Thus VB(Z) = VHM(Z) - E[(E[µ(θ) | O] - µ(θ))²] = 2.6Z² - 0.9Z + 0.45 - 0.314 = 2.6Z² - 0.9Z + 0.136.]
204
These a priori chances were computed in a previous section.
Exercise: For the multi-sided die example, what is the Buhlmann Credibility assigned to an observation of one roll of a die? [Solution: Minimize either VX(Z), VHM(Z), or VB(Z) and obtain Z = 0.45 / 2.6 = 17.3%.] Note that the expected squared error of the Credibility Estimates with respect to the Bayesian Estimate is smaller than that with respect to the Hypothetical Means. In the example, 2.6Z2 - 0.9 Z + 0.136 < 2.6Z2 - 0.9 Z + 0.45. Buhlmann Credibility is a closer approximation to the Bayesian Estimates than to the Hypothetical Mean. This makes sense, since the Bayesian Estimates are themselves an approximation to the Hypothetical Means; the Bayesian Estimates are closer to the Hypothetical Means than are the Credibility Estimates, except when the two estimates are equal, since the Bayesian Estimates are not restricted to being a linear function of the observations.
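As an illustration only (this is not part of the syllabus readings), here is a short Python sketch of the three quadratics for the multi-sided die example; the function name and the use of Python are my own choices, and the numerical inputs (τ² = 0.45, η² = 2.15, and the 0.314 computed above) are taken from the example.

    def die_example_errors(z, tau2=0.45, eta2=2.15, bayes_gap=0.314, n=1):
        # Expected squared error versus the hypothetical means:
        v_hm = (eta2 / n + tau2) * z**2 - 2 * tau2 * z + tau2
        # Versus the future observation: larger by an extra EPV.
        v_x = v_hm + eta2
        # Versus the Bayesian estimates: smaller by the 0.314 computed above.
        v_b = v_hm - bayes_gap
        return v_x, v_hm, v_b

    # All three quadratics are minimized at the same Z = VHM/(VHM + EPV):
    z_star = 0.45 / (0.45 + 2.15)
    print(round(z_star, 3), die_example_errors(z_star))   # Z is about 0.173

Since the three quadratics differ only by constants, any one of them can be used to locate the least squares credibility.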
Problems:
Use the following information for the next 5 questions:
You observe the following experience for six insureds during the years 1996 to 1998 combined:

Insured             Premium in 1996-98 (prior to experience mod)   Losses in 1996-98   Loss Ratio in 1996-98
Acme Anvils                         100                                   65                  65.0%
Buzz Beer                            60                                   35                  58.3%
Cash & Carey                        130                                   80                  61.5%
Drum & Drummer                      120                                  110                  91.7%
Everything Elvis                    190                                   90                  47.4%
Frank & Stein                        80                                   70                  87.5%
Overall                             680                                  450                  66.2%
Experience modifications will be calculated for each of these insureds using their experience from 1996 to 1998 together with the formulas:
Z = P / (P + K), and M = {(L/P) Z + (1 - Z)(0.662)} / 0.662,
where Z = credibility, P = premium, L = losses, M = experience modification, and 0.662 is the observed overall loss ratio.
You subsequently observe the following experience for these same six insureds during the year 2000:

Insured             Premium in 2000 (prior to experience mod)   Losses in 2000   Loss Ratio in 2000
Acme Anvils                          30                               20               66.7%
Buzz Beer                            20                               15               75.0%
Cash & Carey                         50                               35               70.0%
Drum & Drummer                       40                               30               75.0%
Everything Elvis                     60                               25               41.7%
Frank & Stein                        35                               30               85.7%
Overall                             235                              155               66.0%
17.1 (1 point) If K = 70, what is the experience modification for Everything Elvis? A. Less than 0.75 B. At least 0.75 but less than 0.78 C. At least 0.78 but less than 0.81 D. At least 0.81 but less than 0.84 E. At least 0.84
17.2 (2 points) The modified loss ratio in the year 2000 is:
losses / {(premiums) (experience modification)}.
If K = 70, what is the modified loss ratio for Drum & Drummer? A. Less than 0.55 B. At least 0.55 but less than 0.57 C. At least 0.57 but less than 0.59 D. At least 0.59 but less than 0.61 E. At least 0.61 17.3 (2 points) If K = 70, what is the squared difference in the year 2000, between the overall loss ratio and the modified loss ratio for Frank & Stein? A. Less than 0.004 B. At least 0.004 but less than 0.005 C. At least 0.005 but less than 0.006 D. At least 0.006 but less than 0.007 E. At least 0.007 17.4 (3 points) If K = 70, what is the sum of the squared differences in the year 2000, between the overall loss ratio and the modified loss ratios for these six insureds? A. Less than 0.049 B. At least 0.049 but less than 0.050 C. At least 0.050 but less than 0.051 D. At least 0.051 but less than 0.052 E. At least 0.052 17.5 (6 points) Which of the following values of K results in the smallest sum of the squared differences in the year 2000 between the overall loss ratio and the modified loss ratios for these six insureds? A. 40 B. 55 C. 70 D. 85 E. 100 17.6 (3 points) Let X1 , X2 , ..., XN be independent random variables with common mean µ. Var[Xi] = σi2. Let Y = ΣwiXi. Determine wi, such that Y is an unbiased estimator of µ, with the smallest variance. 17.7 (1 point) Let X1 , X2 , ..., XN be independent random variables with common mean µ. Var[Xi] = σ2/mi. Let Y = ΣwiXi. Determine wi, such that Y is an unbiased estimator of µ, with the smallest variance. Hint: Use the answer to the previous question.
17.8 (1 point) Let X1 , X2 , ..., XN be independent random variables with common mean µ. Var[Xi] = b + c/mi. Let Y = ΣwiXi. Determine wi, such that Y is an unbiased estimator of µ, with the smallest variance.
Use the following information for the next 10 questions:
There are two urns each containing a large number of balls. Each ball has a number written on it.
Number on Ball:    0      1      4
Urn A:            50%    30%    20%
Urn B:            20%    50%    30%
An urn is picked at random, and balls are drawn from that urn with replacement.
17.9 (1 point) Let X be the result of drawing a single ball. Determine E[X | A], E[X | B], and VarU[E[X | U]].
17.10 (1 point) Let Y be the sum of drawing 3 balls from a single urn. Determine E[Y | A], E[Y | B], and VarU[E[Y | U]].
17.11 (1 point) Let X̄ be the average of drawing 3 balls from a single urn. Determine E[X̄ | A], E[X̄ | B], and VarU[E[X̄ | U]].
17.12 (1 point) Let X be the result of drawing a single ball. Determine Var[X | A], Var[X | B], and EU[Var[X | U]].
17.13 (1 point) Let Y be the sum of drawing 3 balls from a single urn. Determine Var[Y | A], Var[Y | B], and EU[Var[Y | U]].
17.14 (1 point) Let X̄ be the average of drawing 3 balls from a single urn. Determine Var[X̄ | A], Var[X̄ | B], and EU[Var[X̄ | U]].
17.15 (1 point) Let X be the result of drawing a single ball. Determine Var[X].
17.16 (1 point) Let Y be the sum of drawing 3 balls from a single urn. Determine Var[Y].
17.17 (1 point) Let X̄ be the average of drawing 3 balls from a single urn. Determine Var[X̄].
17.18 (1 point) An urn is selected at random. Harry draws five balls, which total 5. Then Sally draws six more balls from the same urn, which total 7. Using Bühlmann-Straub Credibility, estimate the sum of the next 100 balls drawn from the same urn. A. Less than 120 B. At least 120 but less than 125 C. At least 125 but less than 130 D. At least 130 but less than 135 E. At least 135
17.19 (2 points) For each policyholder, losses X1 ,…, Xn , conditional on Θ, are independently and identically distributed with mean, µ(θ) = E(Xj | Θ = θ), j = 1, 2,... , n and variance, v(θ) = Var(Xj | Θ = θ), j = 1, 2,... , n You are given: (i) Cov(Xi, Xj) = 40, for i ≠ j. (ii) Var(Xi) = 130. Determine the Bühlmann credibility assigned for estimating X4 based on X1 , X2 , X3 . (A) Less than 60% (B) At least 60%, but less than 65% (C) At least 65%, but less than 70% (D) At least 70%, but less than 75% (E) At least 75% 17.20 (2 points) Cov[Xi, Xj] = 2 + 18δij, where δij = 0 if i ≠ j and 1 if i = j. If X1 = 27 and X2 = 19, then using Buhlmann Credibility, the estimate of X3 is 32. If instead X1 = 40, X2 = 61, X3 = 45, and X4 = 29, then using Buhlmann Credibility what is the estimate of X5 ? A. 36
B. 37
C. 38
D. 39
E. 40
17.21 (2 points) Assume the Bühlmann-Straub covariance structure. Xi is the aggregate losses for year i. Ei is the exposures for year i. The expected value of the process variance of aggregate losses for year i is: 20,000 / Ei. Year Exposures 1 2000 2 3000 3 3000 The credibility assigned for estimating X4 based on X1 , X2 , and X3 is 2/3. Calculate Cov(X1 , X1 ). (A) 5
(B) 10
(C) 15
(D) 16.67
(E) None of A, B, C, or D
17.22 (3 points) Using least-squares regression and the following information, estimate the credibility of one year of claims experience.

                                  First Period Claim Count
Second Period Claim Count       0        1       2      Total
          0                   8300      740      40      9080
          1                    750      100       8       858
          2                     50       10       2        62
        Total                 9100      850      50    10,000

A. 1%   B. 3%   C. 5%   D. 7%   E. 9%
17.23 (2 points) A model for the claim frequency from an insurance policy is parameterized by θ. You are given n years of claim frequencies from this policy, X1 , X2 , ... Xn . The policy has mi exposures in year i. You are asked to use the Bühlmann-Straub credibility model to estimate the expected claim frequency in year n + 1 for this policy. Which of conditions (A), (B), or (C) are required by the model? (A) The Xi are independent, conditional on θ. (B) The Xi have a common mean. (C) Var[Xi | Θ = θ] = v(θ)/mi. (D) Each of (A), (B), and (C) is required. (E) None of (A), (B), or (C) is required.
17.24 (4, 5/00, Q.18) (2.5 points) You are given two independent estimators of an unknown quantity µ: (i) Estimator A: E(µA) = 1000 and σ(µA) = 400 (ii) Estimator B: E(µB) = 1200 and σ(µB) = 200 Estimator C is a weighted average of the two estimators A and B, such that: µC = w µA + (1 - w)µB Determine the value of w that minimizes σ(µC). (A) 0
(B) 1/5
(C) 1/4
(D) 1/3
(E) 1/2
17.25 (4, 11/05, Q.26 & 2009 Sample Q.236) (2.9 points) For each policyholder, losses X1 ,…, Xn , conditional on Θ, are independently and identically distributed with mean, µ(θ) = E(Xj | Θ = θ), j = 1, 2,... , n and variance, v(θ) = Var(Xj | Θ = θ), j = 1, 2,... , n You are given: (i) The Bühlmann credibility assigned for estimating X5 based on X1 ,…, X4 is Z = 0.4. (ii) The expected value of the process variance is known to be 8. Calculate Cov(Xi, Xj), i ≠ j. (A) Less than -0.5 (B) At least -0.5, but less than 0.5 (C) At least 0.5, but less than 1.5 (D) At least 1.5, but less than 2.5 (E) At least 2.5 17.26 (4, 5/07, Q.32) (2.5 points) You are given n years of claim data originating from a large number of policies. You are asked to use the Bühlmann-Straub credibility model to estimate the expected number of claims in year n + 1. Which of conditions (A), (B), or (C) are required by the model? (A) All policies must have an equal number of exposure units. (B) Each policy must have a Poisson claim distribution. (C) There must be at least 1082 exposure units. (D) Each of (A), (B), and (C) is required. (E) None of (A), (B), or (C) is required.
Solutions to Problems: 17.1. C. Z = 190/(190 + 70) = .731. M = {(.731)(.474) + (1 - .731)(.662)}/.662 = .525/.662 = 0.792. 17.2. D. Z = 120/(120 + 70) = .632. M = {(.632)(.917) + (1 - .632)(.662)}/.662 = 1.243. For the year 2000: (L/P)/M = .750/1.243 = 0.603. 17.3. C. Z = 80/(80 + 70) = .533. M = {(.533)(.875) + (1 - .533)(.662)}/.662 = 1.172. For the year 2000: (L/P)/M = .857/1.171 = .732. The overall loss ratio in the year 2000 is .660. The squared difference is: (.732 - .660)2 = 0.00518. 17.4. B. The sum of the squared differences is 0.04977.
Insured             Premium 1996-98 (prior to exper. mod)   Loss Ratio 1996-98   Exper. Mod (K = 70)   Loss Ratio 2000   Modified Loss Ratio 2000   Squared Difference
Acme Anvils                        100                            65.0%                0.990                66.7%                67.4%                   0.00020
Buzz Beer                           60                            58.3%                0.945                75.0%                79.3%                   0.01791
Cash & Carey                       130                            61.5%                0.954                70.0%                73.3%                   0.00545
Drum & Drummer                     120                            91.7%                1.243                75.0%                60.3%                   0.00317
Everything Elvis                   190                            47.4%                0.792                41.7%                52.6%                   0.01787
Frank & Stein                       80                            87.5%                1.172                85.7%                73.1%                   0.00517
Overall                            680                            66.2%                                     66.0%                                        0.04977
17.5. B. The smallest squared difference occurs when K = 55.
The premiums and 1996-98 loss ratios are those shown in the solution to 17.4. The computations for each value of K are:

K = 40:
Insured             Exper. Mod   Loss Ratio 2000   Modified Loss Ratio 2000   Squared Difference
Acme Anvils            0.987          66.7%                67.5%                   0.00025
Buzz Beer              0.929          75.0%                80.7%                   0.02186
Cash & Carey           0.946          70.0%                74.0%                   0.00641
Drum & Drummer         1.289          75.0%                58.2%                   0.00603
Everything Elvis       0.765          41.7%                54.5%                   0.01324
Frank & Stein          1.215          85.7%                70.6%                   0.00212
Overall                               66.0%                                        0.04990

K = 55:
Insured             Exper. Mod   Loss Ratio 2000   Modified Loss Ratio 2000   Squared Difference
Acme Anvils            0.989          66.7%                67.4%                   0.00022
Buzz Beer              0.938          75.0%                79.9%                   0.01956
Cash & Carey           0.951          70.0%                73.6%                   0.00588
Drum & Drummer         1.264          75.0%                59.3%                   0.00439
Everything Elvis       0.780          41.7%                53.4%                   0.01565
Frank & Stein          1.191          85.7%                72.0%                   0.00362
Overall                               66.0%                                        0.04932

K = 85:
Insured             Exper. Mod   Loss Ratio 2000   Modified Loss Ratio 2000   Squared Difference
Acme Anvils            0.990          66.7%                67.3%                   0.00018
Buzz Beer              0.951          75.0%                78.9%                   0.01667
Cash & Carey           0.958          70.0%                73.1%                   0.00510
Drum & Drummer         1.225          75.0%                61.2%                   0.00226
Everything Elvis       0.804          41.7%                51.8%                   0.01991
Frank & Stein          1.156          85.7%                74.1%                   0.00668
Overall                               66.0%                                        0.05080

K = 100:
Insured             Exper. Mod   Loss Ratio 2000   Modified Loss Ratio 2000   Squared Difference
Acme Anvils            0.991          66.7%                67.3%                   0.00017
Buzz Beer              0.956          75.0%                78.5%                   0.01570
Cash & Carey           0.960          70.0%                72.9%                   0.00480
Drum & Drummer         1.210          75.0%                62.0%                   0.00158
Everything Elvis       0.814          41.7%                51.2%                   0.02178
Frank & Stein          1.143          85.7%                75.0%                   0.00813
Overall                               66.0%                                        0.05217
Comment: One desires that the loss ratios after the application of experience rating be similar for the different insureds. One way to quantify that goal is to after the fact compute the squared differences between the overall loss ratio and the modified loss ratio. Then the value of K that produced the smallest squared error would have worked best if it had been used. Thus here a Buhlmann Credibility Parameter of (about) 55 would have worked well in the past. An actual test would rely on much more data as well as being somewhat more complicated. See for example, “Parametrizing the Workersʼ Compensation Experience Rating Plan,” by William R. Gillam, PCAS 1992, and the Discussion by Howard Mahler in PCAS 1993.
[Graph: the sum of the squared differences as a function of K, with K from 40 to 100 on the horizontal axis and the sum of squared differences (SumSqDiff, roughly 0.0500 to 0.0520) on the vertical axis.]
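For readers who want to reproduce the table values above, here is a short Python sketch (illustrative only; the helper name is my own, and since it uses the rounded loss ratios from the problem statement the printed sums will match the tables only to rounding).

    # Data from the problem: premium 1996-98, loss ratio 1996-98, loss ratio 2000.
    insureds = [(100, 0.650, 0.667), (60, 0.583, 0.750), (130, 0.615, 0.700),
                (120, 0.917, 0.750), (190, 0.474, 0.417), (80, 0.875, 0.857)]

    def sum_sq_diff(k, prior_lr=0.662, overall_lr_2000=0.660):
        total = 0.0
        for premium, lr_past, lr_2000 in insureds:
            z = premium / (premium + k)                       # Z = P/(P + K)
            mod = (z * lr_past + (1 - z) * prior_lr) / prior_lr
            total += (lr_2000 / mod - overall_lr_2000) ** 2   # modified LR vs. overall
        return total

    for k in (40, 55, 70, 85, 100):
        print(k, round(sum_sq_diff(k), 5))   # smallest sum near K = 55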
17.6. Y unbiased. ⇒ µ = E[Y] = ΣwiE[Xi] = µΣwi. ⇒ Σwi = 1. Var[Y] = Σwi2 Var[Xi] = Σwi2 σi2. Use Lagrange Multipliers to minimize Var[Y], subject to the constraint: Σwi - 1 = 0. Set equal to zero the partial derivatives with respect to wj of: Σwi2 σi2 + λ(Σwi - 1). 0 = 2wjσj2 + λ. j = 1, ..., N. ⇒ wj = -λ/(2σj2). In other words, each variable gets weight inversely proportional to its variance.
Σwi = 1. ⇒ wj = (1/σj2)/Σ(1/σi2). Comment: For the case of just two variables X1 and X2 , w1 = (1/σ1 2)/(1/σ1 2 + 1/σ2 2) = σ2 2/(σ1 2 + σ2 2) and w2 = (1/σ2 2)/(1/σ1 2 + 1/σ2 2) = σ1 2/(σ1 2 + σ2 2). In other words, each variable gets weight inversely proportional to its variance. Similar to Exercise 1 in “Topics in Credibility” by Dean. 17.7. w j = (1/σj2)/Σ(1/σi2) = (mj/σ2)Σmi/σ2 = mj/m, where m = Σmi. Comment: If mi is the exposure associated with each Xi, and each Xi is a frequency, pure premium, loss ratio, etc., then the Buhlmann-Straub Covariance Structure assumes Var[Xi] = σ2/mi, where σ2 would be the process variance for one exposure. Then the smallest variance of an unbiased estimator of the common mean is the exposure weighted average of the Xi. If Xi were pure premium, then Σ(mi/m)Xi = (ΣmiXi)/m = (total losses)/(total exposures). Similar to Exercise 10 in “Topics in Credibility” by Dean. 17.8. Using a previous solution, wj = (1/σj2)/Σ(1/σi2) = {mj/(bmj + c)}/Σ{mi/(bmi + c)}. Comment: See Examples 16.7 and 16.29 in Loss Models. If mi is the exposure associated with each Xi, and each Xi is a frequency, pure premium, loss ratio, etc., then the Buhlmann-Straub Covariance Structure assumes b = 0. If b = 0, this reduces to the previous question. 17.9. E[X | A] = (50%)(0) + (30%)(1) + (20%)(4) = 1.1. E[X | B] = (20%)(0) + (50%)(1) + (30%)(4) = 1.7. E[X] = (1.1 + 1.7)/2 = 1.4. VarU[E[X | U]] = {(1.1 - 1.4)2 + (1.7 - 1.4)2 }/2 = 0.09.
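Here is a small Python sketch of the inverse-variance weighting used in 17.6 through 17.8 (illustrative only; the function name is my own). It also checks 4, 5/00, Q.18 (problem 17.24), where the standard deviations of the two estimators are 400 and 200.

    def min_variance_weights(variances):
        # Each estimator gets weight inversely proportional to its variance.
        inverse = [1.0 / v for v in variances]
        total = sum(inverse)
        return [w / total for w in inverse]

    print(min_variance_weights([400**2, 200**2]))   # [0.2, 0.8]; w = 1/5 on estimator A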
17.10. The expected value of the sum of 3 balls is 3 times the expected value of a single ball. E[Y | A] = 3E[X | A] = (3)(1.1) = 3.3. E[Y | B] = 3E[X | B] = (3)(1.7) = 5.1. E[Y] = 4.2. VarU[E[Y | U]] = {(3.3 - 4.2)2 + (5.1 - 4.2)2 }/2 = 0.81. Comment: The variance of the hypothetical means of the sum is 32 = 9 times the VHM for a single ball. 17.11. The expected value of the average of 3 balls is the expected value of a single ball. E[ X | A] = E[X | A] = 1.1. E[ X | B] = E[X | B] = 1.7. E[ X ] = (1.1 + 1.7)/2 = 1.4. VarU[E[ X | U]] = {(1.1 - 1.4)2 + (1.7 - 1.4)2 }/2 = 0.09. Comment: The variance of the hypothetical means of the average is the VHM for a single ball. 17.12. E[X2 | A] = (50%)(02 ) + (30%)(12 ) + (20%)(42 ) = 3.5. Var[X | A] = 3.5 - 1.12 = 2.29. E[X2 | B] = (20%)(02 ) + (50%)(12 ) + (30%)(42 ) = 5.3. Var[X | B] = 5.3 - 1.72 = 2.41. EU[Var[X | U]] = (2.29 + 2.41)/2 = 2.35. 17.13. The variance of the sum of 3 balls is 3 times the variance of a single ball. Var[Y | A] = 3Var[X | A] = (3)(2.29) = 6.87. Var[Y | B] = 3Var[X | B] = (3)(2.41) = 7.23. EU[Var[X | U]] = (6.87 + 7.23)/2 = 7.05. Comment: The expected value of the process variance of the sum is 3 times the EPV for a single ball. 17.14. The variance of the average of 3 balls is 1/3 the variance of a single ball. Var[Y | A] = Var[X | A]/3 = 2.29/3 = .7633. Var[Y | B] = Var[X | B]/3 = 2.41/3 = .8033. EU[Var[X | U]] = (.7633 + .8033)/2 = .7833. Comment: The expected value of the process variance of the sum is 1/3 the EPV for a single ball. 17.15. For an urn picked at random, Prob[0] = 35%, Prob[1] = 40%, and Prob[4]= 25%. E[X] = (35%)(0) + (40%)(1) + (25%)(4) = 1.4. E[X2 ] = (35%)(02 ) + (40%)(12 ) + (25%)(42 ) = 4.4. Var[X] = 4.4 - 1.42 = 2.44. Alternately, Var[X] = VarU[E[X | U]] + EU[Var[X | U]] = 0.09 + 2.35 = 2.44. Comment: Similar to Exercise 2 in “Topics in Credibility” by Dean.
17.16. Var[Y] = VarU[E[Y | U]] + EU[Var[Y | U]] = 0.81 + 7.05 = 7.86.
Alternately, list all of the possible outcomes:
Outcome                    Y      Y²     Prob. Given A   Prob. Given B   Probability
All 0                      0       0        0.1250          0.0080         0.0665
All 1                      3       9        0.0270          0.1250         0.0760
All 4                     12     144        0.0080          0.0270         0.0175
two @ 0 and one @ 1        1       1        0.2250          0.0600         0.1425
two @ 0 and one @ 4        4      16        0.1500          0.0360         0.0930
two @ 1 and one @ 0        2       4        0.1350          0.1500         0.1425
two @ 1 and one @ 4        6      36        0.0540          0.2250         0.1395
two @ 4 and one @ 0        8      64        0.0600          0.0540         0.0570
two @ 4 and one @ 1        9      81        0.0360          0.1350         0.0855
0, 1, 4                    5      25        0.1800          0.1800         0.1800
Average                   4.2    25.5       1.0000          1.0000         1.0000
Var[Y] = E[Y²] - E[Y]² = 25.5 - 4.2² = 7.86.
Comment: Var[Y] ≠ 3Var[X] = (3)(2.44) = 7.32. For the sum of N balls from a single urn, the variance is: 0.09N² + 2.35N, where 0.09 is the VHM for a single ball and 2.35 is the EPV for a single ball.
17.17. Var[X̄] = VarU[E[X̄ | U]] + EU[Var[X̄ | U]] = 0.09 + 0.7833 = 0.8733.
Alternately, list all of the possible outcomes:
Outcome                    X̄        X̄²      Prob. Given A   Prob. Given B   Probability
All 0                   0.0000    0.0000       0.1250          0.0080         0.0665
All 1                   1.0000    1.0000       0.0270          0.1250         0.0760
All 4                   4.0000   16.0000       0.0080          0.0270         0.0175
two @ 0 and one @ 1     0.3333    0.1111       0.2250          0.0600         0.1425
two @ 0 and one @ 4     1.3333    1.7778       0.1500          0.0360         0.0930
two @ 1 and one @ 0     0.6667    0.4444       0.1350          0.1500         0.1425
two @ 1 and one @ 4     2.0000    4.0000       0.0540          0.2250         0.1395
two @ 4 and one @ 0     2.6667    7.1111       0.0600          0.0540         0.0570
two @ 4 and one @ 1     3.0000    9.0000       0.0360          0.1350         0.0855
0, 1, 4                 1.6667    2.7778       0.1800          0.1800         0.1800
Average                   1.4     2.8333       1.0000          1.0000         1.0000
Var[X̄] = E[X̄²] - E[X̄]² = 2.8333 - 1.4² = 0.8733.
Comment: Var[X̄] ≠ Var[X]/3 = 2.44/3 = 0.8133. See Exercise 5 in “Topics in Credibility” by Dean. For the average of N balls from a single urn, Var[X̄] = 0.09 + 2.35/N, where 0.09 is the VHM for a single ball and 2.35 is the EPV for a single ball.
17.18. D. Using the EPV and VHM for a single ball, K = EPV/VHM = 2.35/.09 = 26.1. Z = 11/(11 + 26.1) = .296. The a priori mean is 1.4. The observed mean is: 12/11. Estimate of the mean is: (.296)(12/11) + (1 - .296)(1.4) = 1.309. Estimate for 100 balls is: 130.9. Comment: Similar to Exercise 6 in “Topics in Credibility” by Dean. 17.19. A. VHM = Cov(Xi, Xj) = 40. VHM + EPV = Var(Xi) = 130. ⇒ EPV = 90. K = EPV/VHM = 90/40 = 2.25. Z = N/(N + K) = 3/(3 + 2.25) = 57.1%. Comment: Similar to 4, 11/05, Q.26. 17.20. B. This is the Buhlmann covariance structure, with EPV = 18 and VHM = 2. K = EPV/VHM = 18/2 = 9. For two observations, Z = 2/(2 + 9) = 2/11. (2/11)(27 +19)/2 + (9/11)µ = 32. ⇒ µ = 34. For four observations, Z = 4/(4 + 9) = 4/13. observed mean is: (40 + 61 + 45 + 29)/4 = 43.75. estimate is: (4/13)(43.75) + (9/13)(34) = 37. 17.21. C. The EPV that we use to calculate K is for one exposure: 20,000. The observed exposures total 8000. 2/3 = Z = 8000/(8000 + K). ⇒ K = 4000. 4000 = EPV/VHM = 20,000/VHM. ⇒ VHM = 5. Expected value of the process variance of aggregate losses for year 1 is: 20,000 / 2000 = 10. Cov(X1 , X1 ) = Var[X1 ] = (Expected value of the process variance for year 1) + VHM = 10 + 5 = 15. Comment: Similar to 4, 11/05, Q.26.
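The urn example of 17.9 through 17.18 can also be checked numerically. The following Python sketch (illustrative only; the variable names are my own) computes the EPV and VHM for a single ball by enumeration and then the estimate in 17.18.

    urns = {"A": {0: 0.50, 1: 0.30, 4: 0.20}, "B": {0: 0.20, 1: 0.50, 4: 0.30}}

    means, process_variances = [], []
    for dist in urns.values():
        m1 = sum(x * p for x, p in dist.items())
        m2 = sum(x * x * p for x, p in dist.items())
        means.append(m1)
        process_variances.append(m2 - m1 ** 2)

    epv = sum(process_variances) / 2                       # 2.35
    grand_mean = sum(means) / 2                            # 1.4
    vhm = sum((m - grand_mean) ** 2 for m in means) / 2    # 0.09

    k = epv / vhm                                          # about 26.1
    n = 11                                                 # 5 + 6 balls observed
    z = n / (n + k)
    estimate_per_ball = z * (12 / n) + (1 - z) * grand_mean
    print(round(100 * estimate_per_ball, 1))               # about 131; answer D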
17.22. C. Let X be the first year of experience and Y be the second year of experience.
X̄ = {(9100)(0) + (850)(1) + (50)(2)}/10000 = 0.0950.
ΣX²/N = {(9100)(0²) + (850)(1²) + (50)(2²)}/10000 = 0.1050. Ȳ = {(9080)(0) + (858)(1) + (62)(2)}/10000 = 0.0982.
ΣXY/N = {(1)(100) + (2)(10 + 8) + (4)(2)}/10000 = 0.0144. Let x = X - X̄, and y = Y - Ȳ.
Σxi2 /N = variance of X = ΣX2 /N - X 2 = .1050 - .09502 = .0960. Σxiy i/N = sample covariance of X and Y = ΣXY/N - X Y = .0144 - (.0950)(.0982) = .00507. Estimated Z = slope of the regression line = Σxiy i / Σxi2 = .00507/.0960 = 5.3%. Comment: Beyond what you are likely to be asked on the exam. See “A Graphical Illustration of Experience Rating Credibilities,” by Howard C. Mahler, PCAS 1998, or pages 315-316 of “Risk Classification” by Robert J. Finger, in Foundations of Casualty Actuarial Science. 17.23. D. All of these are assumptions of the Bühlmann-Straub credibility model. Comment: See Section 16.4.5 of Loss Models. 17.24. B. The two estimates A and B are independent, therefore: Var[C] = Var[wA + (1 - w)B] = w2 Var[A] + (1 - w)2 Var[B] = 4002 w2 + 2002 (1 - w)2 . d Var[C] / dw = 2w4002 - 2(1 - w)2002 . Setting the derivative equal to zero: 2w4002 - 2(1 - w)2002 = 0. ⇒ w = 2002 /(4002 + 2002 ) = 1/5. Comment: Each of the two estimators A and B is given weight inversely proportional to its variance. For w = 1/5, estimate C is: (1/5)(1000) + (4/5)(1200) = 1160. 17.25. C. We have four years of data, and therefore Z = 4/(4 + K). ⇒ 0.4 = 4/(4 + K).
⇒ K = 6. We are given, EPV = 8. Therefore, VHM = EPV/K = 8/6 = 1.333. For the Buhlmann Covariance Structure, for i ≠ j, Cov(Xi, Xj) = VHM = 1.333. Comment: See Equation 20.35 of Loss Models. Cov(Xi, Xi) = EPV + VHM = 9.333.
17.26. E. The Bühlmann-Straub credibility model deals with policies with different number of exposure units, so that A is not true. There is no requirement of a specific type of claim count distribution, so that B is not true. There is no minimum number of exposures required, so that C is not true. Comment: For classical credibility, 1082 claims is a commonly used standard for Full Credibility for frequency, corresponding to frequency being Poisson, P = 90%, and k = 5%. In Loss Models, the “Bühlmann credibility model” refers to the case such as individual automobile drivers, where there are no exposures, or one driver for one year is one exposure, or where each policy has the same number of exposures.
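Returning to the solution to 17.22 above, the estimated credibility is the slope of the least squares line of second period claims on first period claims. Here is a short Python sketch (illustrative only; the cell layout and variable names are my own).

    cells = [  # (first period count, second period count, number of insureds)
        (0, 0, 8300), (1, 0, 740), (2, 0, 40),
        (0, 1, 750),  (1, 1, 100), (2, 1, 8),
        (0, 2, 50),   (1, 2, 10),  (2, 2, 2)]

    n = sum(count for _, _, count in cells)                     # 10,000
    ex  = sum(x * c for x, _, c in cells) / n                   # 0.0950
    ey  = sum(y * c for _, y, c in cells) / n                   # 0.0982
    exx = sum(x * x * c for x, _, c in cells) / n               # 0.1050
    exy = sum(x * y * c for x, y, c in cells) / n               # 0.0144
    z = (exy - ex * ey) / (exx - ex * ex)
    print(round(z, 3))                                          # about 0.053, i.e. 5.3%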
Section 18, The Normal Equations for Credibilities In general, one can solve the normal equations for the least squares credibility. The normal equations involve the variance-covariance structure of the data. As discussed in the previous section, for Buhlmann Credibility the Variance-Covariance structure between the years of data is as follows:205 COV[Xi , Xj] = η2δij + τ2
where δij is 1 for i = j and 0 for i≠j.
More generally let COV[Xi , Xj] = Cij. Assume that in order to estimate X1+N we give weight Zi to year Xi, for i = 1 to N, with weight 1 - ΣZi given to the a priori mean µ. Then just as in the previous section, it turns out206 that the expected value of the squared errors is a quadratic function of the credibilities Z:207

V(Z) = Σi=1 to N Σj=1 to N Zi Zj Cij - 2 Σi=1 to N Zi Ci,1+N + C1+N,1+N.
The cases dealt with previously are a special case of this more general situation. Specifically if we observe one year, then N = 1 and the above equation becomes: V(Z) = Z2 C11 - 2ZC12 + C22. For the Buhlmann covariance structure, C11 = C22 = η2 + τ2 and C12 = τ2. Thus for one year of data, V(Z) = Z2 (η2 + τ2) - 2Zτ2 + η2 + τ2, as was derived previously for this covariance structure. For the Buhlmann-Straub covariance structure, for a risk of size m, C 11 = C22 = η2/m + τ2, and C12 = τ2. Thus for one year of data, V(Z) = Z 2 (η2/m + τ2) - 2Zτ2 + η2/m + τ2.
205 When size of risk E is important, the Buhlmann-Straub covariance structure of frequency, severity or pure premiums is: COV[Xi , Xj] = δij (η²/m) + τ². The EPV is inversely proportional to the size of risk.
206 See for example, “A Markov Chain Model of Shifting Risk Parameters”, by Howard Mahler, PCAS 1997.
207 The credibilities are a vector with N elements, one for each year of data.
In the general case, one can minimize the squared error by taking the partial derivatives of V(Z) with respect to each Zi and setting them equal to zero. This yields N linear equations in N unknowns, which can be solved by the usual matrix methods.208

Σi=1 to N Zi Cij = Cj,1+N,  for j = 1, 2, 3, ..., N.

These are sometimes called the normal equations.209 In Loss Models they assume linear estimators of the form: a0 + Σ ZiXi. Then they derive equations for the least squares linear estimator; they derive the linear estimator that minimizes the mean squared error (MSE). The unbiasedness equation:210 E[X] = a0 + Σ ZiE[Xi], as well as the above equations:211

Σi=1 to N Zi Cij = Cj,1+N,  for j = 1, 2, 3, ..., N.
Loss Models refers to the unbiasedness equation plus the above equations as the normal equations. In most applications, one assumes the E[Xi] are each equal to an a priori mean µ, and therefore if one takes a0 = (1 - Σ Zi) µ, then the unbiasedness equation is satisfied. Specifically, it is common to take a0 = (1 - Σ Zi) µ = (complement of credibility)(a priori mean). For one year of data, the estimator reduces to: ZX + (1 - Z)(a priori mean). 208
See for example, “A Markov Chain Model of Shifting Risk Parameters”, by Howard Mahler, PCAS 1997. The equations will hold equally well if there is a gap between the data and the year to be estimated; for example, if one uses years 1, 2, and 3 to predict year 7, then the terms on the righthand side of the equations are Cj, 7. 209 See equations 20.26 in Loss Models. These are sometimes called the normal equations, but Loss Models uses that term for these equations plus the unbiasedness equation. 210 See equations 20.25 in Loss Models. 211 See equations 20.26 in Loss Models.
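As an illustration of the equations above (not part of the syllabus readings), here is a short Python sketch that solves the normal equations numerically; the function names are my own and it assumes the numpy package is available. The small check at the end uses the covariance structure of problem 17.20, Cov[Xi, Xj] = 2 + 18δij.

    import numpy as np

    def credibilities(cov):
        # Solve sum over i of Zi * C[i][j] = C[j][N] for the weights Z1, ..., ZN,
        # where the last row/column of cov corresponds to the year being predicted.
        cov = np.asarray(cov, dtype=float)
        n = cov.shape[0] - 1
        return np.linalg.solve(cov[:n, :n], cov[:n, n])

    def estimate(cov, observations, prior_mean):
        z = credibilities(cov)
        return float(z @ np.asarray(observations) + (1 - z.sum()) * prior_mean)

    # Check: two observations with Cov[Xi, Xj] = 2 + 18*delta_ij, as in problem 17.20.
    cov = 2 + 18 * np.eye(3)
    print(credibilities(cov))   # each year gets weight 1/11; total credibility 2/11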
Equal Exposures per Year:
Assume the EPV and VHM are some function of the size of the insured, but the insured has the same number of exposures each year. Then all the EPVs are equal and all the VHMs are equal, and Cij = VHM + δij EPV. This looks just like the Buhlmann covariance structure, and the solution to the Normal Equations is Z = N/(N + K). The Normal Equations are in this case:

Σi=1 to N Zi (VHM + δij EPV) = VHM,  for j = 1, 2, 3, ..., N.
By symmetry, each Zi is equal to the others; let Zi = Z/N, so that ΣZi = Z. Then each of the Normal Equations is: (Z/N)(N VHM + EPV) = VHM. Z = N/(N + EPV/VHM) = N/(N + K), where K = EPV/VHM as usual. Thus even if the Buhlmann-Straub covariance structure does not hold, if there are equal exposures each year, and the EPV and VHM are only functions of the size of insured, then the usual Buhlmann Credibility formula holds. If there are the same number of exposures per year, then the Normal Equations produce the same result as Z = N/(N + K), provided N is the number of years, and K is computed based on the number of exposures in each year. Exercise: Mannyʼs Hat Company has 150 exposures in each of 2 years. You expect a similar company to have an annual frequency of 0.060 claims per exposure. You assume the variance of the hypothetical mean frequency is 0.0001, and the expected value of the annual process variance is: 0.001 + .1/mi, where mi is the annual number of exposures. If Mannyʼs Hat Company had 19 and 14 claims in the two years observed, estimate its future claims frequency per exposure. [Solution: This is not the Buhlmann-Straub covariance structure. However, since there are the same number of exposures each year, we can use Z = N/(N + K), provided K is computed based on the number of exposures in each year, and N is the number of years. Each year the EPV is: 0.001 + 0.1/150 = 0.00167. VHM = 0.0001. K = EPV/VHM = 0.00167/0.0001 = 16.7. N is the number of years, 2. For 2 years, Z = 2/(2 + K) = 0.107. The observed frequency per exposure is: 33/300 = 0.110. Thus the estimated future frequency is: (0.107)(0.110) + (1 - 0.107)(0.060) = 0.065. Alternately, the covariance matrix is:
( 0.00167 + 0.0001          0.0001 )     ( 0.00177   0.0001  )
( 0.0001          0.00167 + 0.0001 )  =  ( 0.0001    0.00177 ).
The Normal Equations are: 0.00177Z1 + 0.0001Z2 = 0.0001, and 0.0001Z1 + 0.00177Z2 = 0.0001.
Adding the equations: (0.00187)(Z1 + Z2) = 0.0002. ⇒ Z1 + Z2 = 0.1070. By symmetry Z1 = Z2 = 0.1070/2 = 0.0535.
Estimated future frequency is: (19/150)(0.0535) + (14/150)(0.0535) + (1 - 0.107)(0.060) = 0.065.]

Buhlmann-Straub Covariance Structure:
In the case of the Buhlmann-Straub Covariance Structure, even if there are differing exposures each year, the usual Buhlmann Credibility formula holds. For the Buhlmann-Straub Covariance Structure, for an insured of size mi in year i: Cij = τ² + δij η²/mi. The Normal Equations are in this case:

Σi=1 to N Zi (τ² + δij η²/mi) = τ²,  for j = 1, 2, 3, ..., N.

⇒ τ² Σi=1 to N Zi + Zj η²/mj = τ²,  for j = 1, 2, 3, ..., N.

⇒ Zj = mj (τ²/η²) (1 - Σi=1 to N Zi),  for j = 1, 2, 3, ..., N.

Summing these equations over j, and letting m = Σmi:

Σ Zj = m (τ²/η²) (1 - Σ Zj).

Therefore, Σ Zj = (m τ²/η²) / (1 + m τ²/η²).

Therefore, Zj = mj (τ²/η²) (1 - Σ Zi) = mj (τ²/η²) / (1 + m τ²/η²) = mj / (η²/τ² + m).

τ² is the variance of the hypothetical means, and η² is the Expected Value of the Process Variance for one exposure. If as usual we let K = η²/τ², then Zj = mj / (m + K).
Let us assume we are estimating for example pure premiums.212 Then our estimated future pure premium is:
ΣZj PPj + (1 - ΣZj)µ = Σ {mj/(m + K)}(Lj/mj) + {1 - Σ mj/(m + K)}µ = ΣLj/(K + m) + {1 - m/(K + m)}µ = L/(K + m) + Kµ/(K + m).
What if instead we just combined all the years of data and let Z = m/(m + K)? Then the estimate of the future pure premium is:
Z(L/m) + (1 - Z)µ = L/(K + m) + Kµ/(K + m), the same result as above.
We have shown, that in the case of the Buhlmann-Straub Covariance structure, even if the number of exposures varies by year, using the Normal Equations results in the usual Buhlmann Credibility formula being applied to all the years of data combined.
Exercise: Mannyʼs Hat Company has the following data for two years:
Year:         1     2
Exposures:   100   200
Claims:       12    21
You expect a similar company to have an annual frequency of 0.060 claims per exposure. You assume the variance of the hypothetical mean frequency is 0.0001, and the expected value of the annual process variance is: 0.1/mi, where mi is the annual number of exposures. Estimate its future claims frequency.
[Solution: This is an example of the Buhlmann-Straub covariance structure. Therefore, we can use the usual Buhlmann Credibility Formula. K = (EPV for one Exposure)/VHM = 0.1/0.0001 = 1000. There are 300 Exposures in total, so Z = 300/(300 + 1000) = 3/13. The observed frequency is: 33/300 = 0.110. Thus the estimated future frequency is: (3/13)(0.110) + (10/13)(0.060) = 0.07154.
Alternately, one can use the Normal Equations as follows. The EPV for year 1 is: 0.1/100 = 0.0010. The EPV for year 2 is: 0.1/200 = 0.0005. The variance of year 1 is: EPV + VHM = 0.0010 + 0.0001 = 0.0011. The variance of year 2 is: EPV + VHM = 0.0005 + 0.0001 = 0.0006. The covariance of different years = VHM = 0.0001. The normal equations are: 0.0011Z1 + 0.0001Z2 = 0.0001. ⇒
11Z1 + Z2 = 1.
0.0001Z1 + 0.0006Z2 = 0.0001. ⇒
Z1 + 6Z2 = 1.
Solving these two linear equations in two unknowns, Z1 = 1/13, and Z2 = 2/13. The estimated future frequency is: (1/13)(12/100) + (2/13)(21/200) + (10/13)(.06) = 0.07154.] 212
The example would work just as well for frequency, severity, loss ratios, etc.
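As a numerical check of the exercise above (illustrative only; the variable names are my own), the following Python sketch shows that applying Zj = mj/(m + K) year by year gives the same answer as applying Z = m/(m + K) to the combined data.

    exposures = [100, 200]
    claims = [12, 21]
    vhm, epv_per_exposure = 0.0001, 0.1
    prior = 0.060

    k = epv_per_exposure / vhm                   # 1000
    m = sum(exposures)                           # 300

    z_combined = m / (m + k)
    est_combined = z_combined * sum(claims) / m + (1 - z_combined) * prior

    z_by_year = [mi / (m + k) for mi in exposures]          # Zj = mj/(m + K)
    est_by_year = sum(z * c / mi for z, c, mi in zip(z_by_year, claims, exposures))
    est_by_year += (1 - sum(z_by_year)) * prior

    print(round(est_combined, 5), round(est_by_year, 5))    # both about 0.07154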
Varying Exposures by Year, More General Variance-Covariance Structures:
If the number of exposures differs by year, and one does not have the Buhlmann-Straub Covariance structure, then in general one does not get the usual Buhlmann Credibility formula. However, one can solve the Normal Equations as one would any other set of linear equations.
Exercise: Mannyʼs Hat Company has the following data for two years:
Year:         1     2
Exposures:   100   200
Claims:       12    21
You expect a similar company to have an annual frequency of 0.060 claims per exposure. You assume the variance of the hypothetical mean frequency is 0.0001, and the expected value of the annual process variance is 0.001 + 0.1/mi, where mi is the annual number of exposures. Estimate its future claims frequency.
[Solution: The EPV for year 1 is: 0.001 + 0.1/100 = 0.0020. The EPV for year 2 is: 0.001 + 0.1/200 = 0.0015. The variance of year 1 is: EPV + VHM = 0.0020 + 0.0001 = 0.0021. The variance of year 2 is: EPV + VHM = 0.0015 + 0.0001 = 0.0016. The covariance of different years = VHM = 0.0001.
The normal equations are:
0.0021Z1 + 0.0001Z2 = 0.0001. ⇒ 21Z1 + Z2 = 1.
0.0001Z1 + 0.0016Z2 = 0.0001. ⇒ Z1 + 16Z2 = 1.
Solving these two linear equations in two unknowns, in matrix form:213 the inverse of the coefficient matrix (21 1; 1 16) is (16 -1; -1 21)/335, so
(Z1; Z2) = {(16 -1; -1 21)/335} (1; 1) = (15/335; 20/335).
Thus Z1 = 15/335 = 3/67, and Z2 = 20/335 = 4/67. Therefore, the estimated future frequency is: (3/67)(12/100) + (4/67)(21/200) + (60/67)(0.06) = 0.06537.]
Note how this estimate of 0.06537 differs significantly from the estimate of 0.07154 in a previous exercise, where the exposures in the two years and the total experience were the same, but the covariance structure was the Buhlmann-Straub structure.
213
I have shown the solution in matrix form, to remind you how one would solve n linear equations in n unknowns. If a practical application had for example 4 years of data, then one would need to invert a 4 by 4 matrix, in order to solve the Normal Equations.
In one example of a different covariance structure, as in the above exercise, the expected value of the annual process variance is of the form: w + v/mi.214 This covariance structure is used to model parameter uncertainty. Parameter uncertainty involves random fluctuations in the states of the universe, that affect most insureds somewhat similarly regardless of size. As the size of risk increase, the EPV goes to a constant w, rather than zero, as assumed more commonly. Therefore, as the size of risk approaches infinity, the credibility does not approach 1.215 Another example of a different covariance structure is to assume that the variance of the hypothetical means depends on the size of insured: VHM = a + b/mi. This covariance structure is used to model risk heterogeneity. Risk heterogeneity occurs when an insured is a sum of subunits, and not all of the subunits have the same risk process. In Workers Compensation Experience Rating, the commonly assumed covariance structure includes both parameter uncertainty and risk heterogeneity, which leads to credibilities of the form: (Linear Function of Size of Insured)/ (Linear Function of Size of Insured).216 Given these or any other covariance structure, one can obtain the credibilities assigned to each year of data by solving a set of linear equations in a similar manner. For example assume the variance-covariance matrix is given by:217 3.5833 0.3750 0.2837 0.2159 0.3750 3.5833 0.3750 0.2837 0.2837 0.3750 3.5833 0.3750 0.2159 0.2837 0.3750 3.5833
214 See Example 20.25 in Loss Models. See 4, 5/01, Q.23.
215 See Howard Mahlerʼs Discussion of Robin R. Gillamʼs, “Parametrizing the Workers Compensation Experience Rating Plan”, PCAS 1993, or “Credibility with Parameter Uncertainty, Risk Heterogeneity, and Shifting Risk Parameters,” by Howard Mahler, PCAS 1998.
216 See Example 20.26 in Loss Models. See also, Howard Mahlerʼs discussion of “Parametrizing the Workers Compensation Experience Rating Plan”, PCAS 1993, or Howard Mahlerʼs “Credibility with Shifting Risk Parameters, Risk Heterogeneity and Parameter Uncertainty”, PCAS 1998.
217 This example is taken from “A Markov Chain Model of Shifting Risk Parameters”, by Howard Mahler, PCAS 1997. As with the variance-covariance matrices underlying Buhlmann Credibility, the off-diagonal elements are smaller than those along the diagonal. In the Buhlmann covariance structure, all of the off-diagonal terms are equal to τ², while all the diagonal elements equal τ² + η². In contrast, here as the terms get further from the diagonal they decline. This decline reflects an assumption that as years of data get further apart they are less closely correlated. This is the situation one would expect when risk parameters shift over time.
Then the normal equations become for years 1, 2 and 3 predicting year 4:
3.5833 Z1 + 0.3750 Z2 + 0.2837 Z3 = 0.2159.
0.3750 Z1 + 3.5833 Z2 + 0.3750 Z3 = 0.2837.
0.2837 Z1 + 0.3750 Z2 + 3.5833 Z3 = 0.3750.
Note the way that each equation corresponds to a row of the variance-covariance matrix. These three linear equations in three unknowns have the solution:218 Z1 = 4.6%, Z2 = 6.4%, and Z3 = 9.4%. Thus in this case one would give the data from Year 1 a weight of 4.6%, that from Year 2 a weight of 6.4%, that from Year 3 a weight of 9.4%, with the remaining weight of 79.6% being given to the a priori mean.219
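For those who wish to verify the example above, here is a short Python sketch (illustrative only; it assumes the numpy package is available).

    import numpy as np

    cov = np.array([[3.5833, 0.3750, 0.2837, 0.2159],
                    [0.3750, 3.5833, 0.3750, 0.2837],
                    [0.2837, 0.3750, 3.5833, 0.3750],
                    [0.2159, 0.2837, 0.3750, 3.5833]])

    z = np.linalg.solve(cov[:3, :3], cov[:3, 3])
    print(np.round(z, 3), round(1 - z.sum(), 3))   # about [0.046 0.064 0.094] and 0.796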
Summary:
For the Buhlmann Covariance Structure, the Normal Equations ⇒ Z = N/(N + K).
For the Buhlmann-Straub Covariance Structure, the Normal Equations ⇒ Z = m/(m + K), where m is the total number of exposures.
If one assumes a covariance structure different than the Buhlmann-Straub, and exposures vary by year, then the equation Z = m/(m + K) does not hold. In these situations, one can solve the set of linear equations, called the “normal equations”, for the amount of credibility to be assigned to each year of data.
218 This would involve inverting the 3 by 3 matrix of coefficients.
219 Note the way the data from the most recent year is given more weight than that from a more distant year. This is typical when one takes into account shifting risk parameters. If the rate of shifting is relatively slow, then the assumption of stable risk parameters is reasonable to use for practical purposes.
Problems: Use the following information for the next 5 questions:
• There are four years of data: X1, X2, X3, X4.
• Assume a priori that E[X1] = E[X2] = E[X3] = E[X4] = µ.
• The assumed variance-covariance matrix between the years of data is:
  ( 17   5   5   5 )
  (  5  17   5   5 )
  (  5   5  17   5 )
  (  5   5   5  17 )
• Assume you will use a linear estimator, a0 + a1X1 + a2X2 + a3X3, in order to predict X4.
• You will employ the normal equations in order to determine the least squares linear estimator of X4 given X1, X2, and X3.
18.1 (1 point) What is a0?
A. 5µ/17   B. µ/3   C. 15µ/27   D. µ   E. None of the above.
18.2 (1 point) What is a1?
A. 5/27   B. 1/4   C. 5/17   D. 1/3   E. None of the above.
18.3 (1 point) What is a2?
A. 5/27   B. 1/4   C. 5/17   D. 1/3   E. None of the above.
18.4 (1 point) What is a3?
A. 5/27   B. 1/4   C. 5/17   D. 1/3   E. None of the above.
18.5 (1 point) Assume µ = 100 and X1 = 70, X2 = 110, and X3 = 90, what is the least squares linear estimate of X4 ? A. Less than 91 B. At least 91 but less than 92 C. At least 92 but less than 93 D. At least 93 but less than 94 E. At least 94
Use the following information for the next two questions:
• Let Xi be the losses in year i.
• Assume a priori that E[X1] = E[X2] = E[X3] = E[X4] = 100.
• The assumed variance-covariance matrix between the years of data is:
  ( 17   4   3   2 )
  (  4  17   4   3 )
  (  3   4  17   4 )
  (  2   3   4  17 )
• Assume you will use a linear estimator, a0 + a1X1 + a2X2 + a3X3, in order to predict X4.
• You will employ the Normal Equations in order to determine the least squares linear estimator of X4 given X1, X2, and X3.
18.6 (3 points) If X1 = 70, X2 = 110, and X3 = 90, what is the estimate of X4?
A. Less than 94
B. At least 94 but less than 95
C. At least 95 but less than 96
D. At least 96 but less than 97
E. At least 97
18.7 (3 points) If X1 = 90, X2 = 110, and X3 = 70, what is the estimate of X4?
A. Less than 94
B. At least 94 but less than 95
C. At least 95 but less than 96
D. At least 96 but less than 97
E. At least 97
18.8 (2 points) You are given the following information about a single risk:
(i) The risk has m exposures in each year.
(ii) The risk is observed for n years.
(iii) The variance of the hypothetical means is a.
(iv) The expected value of the annual process variance is w + v/m.
Determine the limit of the Bühlmann-Straub credibility factor as n approaches infinity.
(A) m/(m + (w + v)/a)   (B) m/(m + m²w/a)   (C) m/(m + w/a)   (D) m/(m + v/a)   (E) 1
18.9 (1 point) The a priori expected value of X is 1000. X1 = 900, X2 = 1100, X3 = 800, and X4 = 1300. Using the Normal Equations you determine that: Z1 = 10%, Z2 = 15%, Z3 = 30%, Z4 = 20%. What is the resulting estimate of X5 ? A. 1000
B. 1005
C. 1010
D. 1015
E. 1020
Use the following information for the next three questions: You are given the following data for Cohen Construction Company: Year: 2001 2002 Exposures: 10 50 Losses: 80 750 Pure Premium: 8 15 The expected pure premium for a construction company similar to this one is 21. 18.10 (3 points) You assume that:
• The variance of the hypothetical mean pure premiums is 12.
• The expected value of the annual process variance is: 400/E, where E is the number of exposures that year.
Use least squares credibility in order to estimate the future pure premium for the Cohen Construction Company. A. Less than 16.50 B. At least 16.50 but less than 17.00 C. At least 17.00 but less than 17.50 D. At least 17.50 but less than 18.00 E. At least 18.00
18.11 (4 points) You assume that:
• The variance of the hypothetical mean pure premiums is 12.
• The expected value of the annual process variance is: 4 + 400/E, where E is the number of exposures that year.
Use least squares credibility in order to estimate the future pure premium for the Cohen Construction Company. A. Less than 16.50 B. At least 16.50 but less than 17.00 C. At least 17.00 but less than 17.50 D. At least 17.50 but less than 18.00 E. At least 18.00
18.12 (4 points) You assume that:
• Ei is the number of exposures for year i.
• PPi is the pure premium for year i.
• Cov[PPi, PPj] = 12 + 80/(Ei Ej) + δij {4 + 400/Ej}.
• The exposures in year 2003 are expected to be 30.
Use the normal equations in order to estimate the pure premium in 2003 for the Cohen Construction Company. A. Less than 16.50 B. At least 16.50 but less than 17.00 C. At least 17.00 but less than 17.50 D. At least 17.50 but less than 18.00 E. At least 18.00
18.13 (2 points) You are given the following information about a single risk:
(i) The risk has 100 exposures in each year.
(ii) The risk is observed for 3 years.
(iii) The variance of the hypothetical means is 5.
(iv) Where m is the number of exposures observed during a year, the expected value of the annual process variance is 30 + 7500/m.
Determine the credibility factor Z assigned to the sum of these three years of data.
A. 10% B. 12.5% C. 15% D. 17.5% E. 20%
18.14 (2 points) You are given the following information about a single risk:
(i) The risk has m exposures in each year.
(ii) The risk is observed for n years.
(iii) The variance of the hypothetical means is a + b/m.
(iv) The expected value of the annual process variance is v/m.
Determine the limit of the Bühlmann-Straub credibility factor as m approaches zero.
(A) 0   (B) n/(n + v/a)   (C) n/(n + w/a)   (D) n/(n + v/b)   (E) n/(n + w/b)
18.15 (8 points) Use the following information:
• Let Ri be the class relativity in year i.
• Assume a priori that E[Ri] = 1.
• Let mi be the expected losses in thousands of dollars for the class in year i, measuring the size of the class.
• δij = 0 if i ≠ j, and 1 if i = j.
• Cov[Ri, Rj] = 0.05 + 5/(mi mj) + δij {0.005 + 25/(mi mj)}.
i      mi     Ri
1     350    0.92
2     180    0.83
3     190    0.76
4     290    0.98
5     320    0.50
6     270
Employ the Normal Equations in order to determine the least squares linear estimator of R6. (Use a computer to help you with the computations.)
A. 0.82   B. 0.84   C. 0.86   D. 0.88   E. 0.90
18.16 (6 points) Use the following information:
• You are using data from years 1 through 5 in order to predict year 6.
• δij = 0 if i ≠ j, and 1 if i = j.
• The covariance between years of data is: Cov[Xi, Xj] = 0.9^|i-j| + 5 δij.
Employ the Normal Equations in order to determine the credibilities to assign to each of the years of data. (Use a computer to help you with the computations.)
18.17 (4, 5/01, Q.23) (2.5 points) You are given the following information about a single risk:
(i) The risk has m exposures in each year.
(ii) The risk is observed for n years.
(iii) The variance of the hypothetical means is a.
(iv) The expected value of the annual process variance is w + v/m.
Determine the limit of the Bühlmann-Straub credibility factor as m approaches infinity.
(A) n/(n + (w + v)/a)   (B) n/(n + n²w/a)   (C) n/(n + w/a)   (D) n/(n + v/a)   (E) 1
Solutions to Problems: 18.1. E. , 18.2. A. , 18.3. A. , 18.4. A. Write down the normal equations for the least squares linear estimator of X4 given X1 , X2 , and X3 . The unbiasedness equation is: a0 + a1 µ + a2 µ + a3 µ = µ. The 3 remaining normal equations are: 17a1 + 5a2 + 5a3 = 5 5a1 + 17a2 + 5a3 = 5 5a1 + 5a2 + 17a3 = 5 By symmetry a1 = a2 = a3 . Therefore 27a1 = 5. a1 = 5/27 = a2 = a3 . Alternately, one can solve these three linear equations in three unknowns by inverting the matrix of coefficients: (a1 ) (22 -5 -5 ) (5) (5/27) (a2 ) = (-5
22
-5 )/324 (5) =
(5/27)
(a3 )
-5
22 )
(5/27)
(-5
(5)
Using a1 ,a2 and a3 , and the unbiasedness equation to solve for a0 : a0 = µ(1- a1 - a2 - a3) = 12µ /27. Comment: This is the Buhlmann Covariance Structure, with VHM = 5 and EPV + VHM = 17. Thus, EPV = 12 and K = 12/5. We give three years of data a credibility of 3/(3 + 12/5) = 15/27. Thus the credibility estimate of X4 is: (15/27)( X1 + X2 + X3 )/3 + (1 - 15/27)µ = 12µ/27 + (5/27)X1 + (5/27)X2 + (5/27)X3 . 18.5. E. From the previous solutions, a1 = a2 = a3 = 5/27 and a0 = 12µ /27. Thus, X4 = 12µ/27 + (5/27)X1 + (5/27)X2 + (5/27)X3 = (12/27)(100) + (5/27)(70) + (5/27)(110) + (5/27)(90) = 94.44. Comment: This is the Buhlmann Covariance Structure, with VHM = 5 and EPV + VHM = 17. Thus, EPV = 12 and K = 12/5. We give three years of data a credibility of 3/(3 + 12/5) = 15/27 = 55.6%. The a priori mean is 100. The observed average is: (70 + 110 + 90) / 3 = 90. Thus the credibility estimate of X4 is: (55.6%)(90) + (44.4%)(100) = 94.44.
18.6. E. & 18.7. B. The Normal Equations are: E[X] = a0 + Σ Zi E[Xi], and Σ Zi Cij = Cj,4 for j = 1, 2, 3 (the sum taken over i = 1 to 3).
17Z1 + 4Z2 + 3Z3 = 2. 4Z1 + 17Z2 + 4Z3 = 3. 3Z1 + 4Z2 + 17Z3 = 4. Solving, Z1 = 17/308, Z2 = 36/308, Z3 = 61/308.
⇒ a0 = 100 - (100)(17/308 + 36/308 + 61/308) = (100)(194/308).
X̂4 = (100)(194/308) + (17/308)X1 + (36/308)X2 + (61/308)X3.
If X1 = 70, X2 = 110, and X3 = 90, then X̂4 = (100)(194/308) + (17/308)(70) + (36/308)(110) + (61/308)(90) = 97.53.
If X1 = 90, X2 = 110, and X3 = 70, then X̂4 = (100)(194/308) + (17/308)(90) + (36/308)(110) + (61/308)(70) = 94.68.
Comment: This type of covariance structure can occur when there are shifting risk parameters over time. In which case, older years of data are given less weight than recent years. See “A Markov Chain Model of Shifting Risk Parameters”, by Howard Mahler, PCAS 1997. 18.8. E. K = EPV/VHM = (w + v/m)/a. Z = n/(n + K) = n/(n + (w + v/m)/a). As n approaches infinity, Z approaches 1. Comment: Similar to 4, 5/01, Q.23. This covariance structure is used to model parameter uncertainty. Increasing the number years observed can overcome the effects of parameter uncertainty, by averaging over the different assumed random states of the universe in each year. This can not be accomplished by observing more exposures from a single year. See “Credibility with Parameter Uncertainty, Risk Heterogeneity, and Shifting Risk Parameters,” by Howard Mahler, PCAS 1998. 18.9. B. Estimate of X5 = (.1)(900) + (.15)(1100) + (.3)(800) + (.2)(1300) + (1 - .1 - .15 - .3 - .2)(1000) = 1005.
18.10. A. K = (EPV for one Exposure)/VHM = 400/12 = 100/3. We observe a total of 60 exposures, so Z = 60/(60 + 100/3) = 9/14. The observed pure premium is: (80 + 750)/(10 + 50) = 83/6.
Therefore, the estimated future pure premium is: (9/14)(83/6) + (1 - 9/14)(21) = 459/28 = 16.39.
Alternately, the EPV for 2001 is: 400/10 = 40. The EPV for year 2002 is: 400/50 = 8.
The variance of year 2001 is: EPV + VHM = 40 + 12 = 52. The variance of year 2002 is: EPV + VHM = 8 + 12 = 20.
The covariance of different years = VHM = 12.
C =
( 52  12 )
( 12  20 )
The normal equations for credibility and linear estimators of the form a0 + Σ ZiXi:
Σ Zi Cij = Cj,1+n, for j = 1, 2, 3, ..., n (the sum taken over i = 1 to n).
Plus the unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
Since we are assuming each year has the same expected pure premium, a0 = µ(1 - Σ Zi) = (a priori mean)(complement of credibility).
The normal equations have coefficients from C: 52Z1 + 12Z2 = 12. 12Z1 + 20Z2 = 12.
Solving, Z1 = 3/28 and Z2 = 15/28.
Therefore, the estimated future pure premium is: (8)(3/28) + (15)(15/28) + (21)(1 - (3/28 + 15/28)) = 16.39.
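As an illustration (a sketch, not part of the original study guide), the alternate approach above can be reproduced with a two-line linear solve; the matrix and values are those of solution 18.10.

import numpy as np

# Two observed years with 10 and 50 exposures, VHM = 12, EPV per exposure = 400.
C = np.array([[52.0, 12.0],
              [12.0, 20.0]])       # variance-covariance of the two observed years
b = np.array([12.0, 12.0])         # covariance of each year with the future year = VHM

Z = np.linalg.solve(C, b)          # -> [3/28, 15/28]
mu = 21.0                          # a priori mean pure premium
x = np.array([8.0, 15.0])          # observed pure premiums: 80/10 and 750/50
print(Z, mu * (1.0 - Z.sum()) + Z @ x)   # -> 16.39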
18.11. B. The EPV for 2001 is: 4 + 400/10 = 44. The EPV for year 2002 is: 4 + 400/50 = 12.
The variance of year 2001 is: EPV + VHM = 44 + 12 = 56. The variance of year 2002 is: EPV + VHM = 12 + 12 = 24.
The covariance of different years = VHM = 12.
C =
( 56  12 )
( 12  24 )
The normal equations for credibility and linear estimators of the form a0 + Σ ZiXi:
Σ Zi Cij = Cj,1+n, for j = 1, 2, 3, ..., n (the sum taken over i = 1 to n).
Plus the unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
Since we are assuming each year has the same expected pure premium, a0 = E[X](1 - Σ Zi) = (a priori mean)(complement of credibility).
The normal equations have coefficients from C: 56Z1 + 12Z2 = 12. 12Z1 + 24Z2 = 12.
Solving, Z1 = 12% and Z2 = 44%.
Therefore, the estimated future pure premium is: (8)(12%) + (15)(44%) + (21)(1 - (12% + 44%)) = 16.80.
Comment: This covariance structure is used to model parameter uncertainty.
18.12. A. Cov[PP1, PP1] = 12 + 80/10 + 4 + 400/10 = 64.
Cov[PP1, PP2] = Cov[PP2, PP1] = 12 + 80/√((10)(50)) = 15.578.
Cov[PP2, PP2] = 12 + 80/50 + 4 + 400/50 = 25.6.
Cov[PP1, PP3] = Cov[PP3, PP1] = 12 + 80/√((10)(30)) = 16.619.
Cov[PP2, PP3] = Cov[PP3, PP2] = 12 + 80/√((50)(30)) = 14.066.
Cov[PP3, PP3] = 12 + 80/30 + 4 + 400/30 = 32.
C =
( 64      15.578  16.619 )
( 15.578  25.6    14.066 )
( 16.619  14.066  32     )
The normal equations for credibility and linear estimators of the form a0 + Σ ZiXi:
Σ Zi Cij = Cj,1+n, for j = 1, 2, 3, ..., n (the sum taken over i = 1 to n).
Plus the unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
Since we are assuming each year has the same expected pure premium, a0 = E[X](1 - Σ Zi) = (a priori mean)(complement of credibility).
The normal equations have coefficients from C: 64Z1 + 15.578Z2 = 16.619. 15.578Z1 + 25.6Z2 = 14.066.
Solving, Z1 = 14.8% and Z2 = 45.9%.
Therefore, the estimated future pure premium is: (8)(14.8%) + (15)(45.9%) + (21)(1 - (14.8% + 45.9%)) = 16.32.
Comment: This covariance structure can be used to model parameter uncertainty and risk heterogeneity. See “Credibility with Parameter Uncertainty, Risk Heterogeneity, and Shifting Risk Parameters,” by Howard Mahler, PCAS 1998.
18.13. B. There are equal exposures each year, and the EPV and VHM are only functions of the size of insured, and therefore the usual Buhlmann Credibility formula holds. Z = N/(N + K), provided N is the number of years, and K is computed based on the number of exposures in each year.
K = EPV/VHM = (30 + 7500/100)/5 = 105/5 = 21. Z = N/(N + K) = 3/(3 + 21) = 12.5%.
Alternately, set up the normal equations for credibility and linear estimators of the form a0 + Σ ZiXi:
Σ Zi Cij = Cj,4, for j = 1, 2, 3 (the sum taken over i = 1 to 3).
Plus the unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
The covariance of years i and j is: VHM = 5 for i ≠ j, and VHM + EPV = 5 + 30 + 7500/100 = 110 for i = j.
Thus the Normal Equations are: 110Z1 + 5Z2 + 5Z3 = 5. 5Z1 + 110Z2 + 5Z3 = 5. 5Z1 + 5Z2 + 110Z3 = 5.
By symmetry, each Zi is equal to the others. Then each equation becomes 120Z1 = 5. ⇒ Z1 = 5/120.
The total weight given to the three years of data is: Z1 + Z2 + Z3 = 3Z1 = 5/40 = 12.5%.
Comment: Somewhat similar to 4, 5/01, Q.23.
18.14. D. K = EPV/VHM = (v/m)/(a + b/m) = v/(ma + b). Z = n/(n + K) = n/(n + v/(ma + b)).
As m approaches zero, Z approaches n/(n + v/b).
Alternately, set up the normal equations for credibility and linear estimators of the form a0 + Σ ZiXi:
Σ Zi Cij = Cj,1+n, for j = 1, 2, 3, ..., n (the sum taken over i = 1 to n).
Plus the unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
The covariance of years i and j is: VHM = a + b/m for i ≠ j, and VHM + EPV = a + b/m + v/m for i = j. Therefore, Cij = a + b/m + δij(v/m).
By symmetry, each Zi is equal to the others; let Zi = Z/n, so that ΣZi = Z. Then (Z/n)(na + nb/m + v/m) = a + b/m.
Z = n/(n + (v/m)/(a + b/m)) = n/(n + v/(ma + b)). As m approaches zero, Z approaches n/(n + v/b).
Comment: Similar to 4, 5/01, Q.23. Assuming that each year has the same expected value, a0 = (1 - Z)E[X].
This covariance structure is used to model risk heterogeneity. Risk heterogeneity occurs when an insured is a sum of subunits, and not all of the subunits have the same risk process. For this covariance structure, as the size of risk approaches zero, the credibility approaches a positive constant. This covariance structure can be refined to remove this feature. See “Credibility with Parameter Uncertainty, Risk Heterogeneity, and Shifting Risk Parameters,” by Howard Mahler, PCAS 1998.
18.15. B. Var[Ri] = Cov[Ri, Ri] = 0.05{1 + 100/mi + 0.1 + 500/mi} = 0.05(1.1 + 600/mi).
For example, Var[R1] = 0.05(1.1 + 600/350) = 0.140714.
For i ≠ j, Cov[Ri, Rj] = 0.05 + 5/√(mi mj).
For example, Cov[R1, R2] = 0.05 + 5/√((350)(180)) = 0.0699205.
The Normal Equations are: Σ Zi Cij = Cj,6, for j = 1, 2, 3, 4, 5 (the sum taken over i = 1 to 5).
0.140714Z1 + 0.0699205Z2 + 0.0693892Z3 + 0.0656941Z4 + 0.0649404Z5 = .066265. 0.0699205Z1 + 0.221667Z2 + 0.0770369Z3 + 0.0718844Z4 + 0.0708333Z5 = .0726805. 0.0693892Z1 + 0.0770369Z2 + 0.212895Z3 + 0.0713007Z4 + 0.0702777Z5 = .0720755. 0.0656941Z1 + 0.0718844Z2 + 0.0713007Z3 + 0.158448Z4 + 0.0664133Z5 = .0678685. 0.0649404Z1 + 0.0708333Z2 + 0.0702777Z3 + 0.0664133Z4 + 0.14875Z5 = .0670103. Solving, Z1 = 0.195, Z2 = 0.113, Z3 = 0.118, Z4 = 0.167, Z5 = 0.181.
Σ Zi = 0.195 + 0.113 + 0.118 + 0.167 + 0.181 = 0.774. The remaining weight of: 1 - 0.774 = 0.226 is given to the a priori mean relativity of 1. The estimated relativity for year 6 is: (0.195)(0.92) + (0.113)(0.83) + (0.118)(0.76) + (0.167)(0.98) + (0.181)(0.50) + (0.226)(1) = 0.843.
Comment: Well beyond what you will be asked on your exam. A very simplified version of “Workersʼ Compensation Classification Credibilities,” by Howard C. Mahler, CAS Forum, Fall 1999.
18.16. Var[X] = Cov[X, X] = 1 + 5 = 6. Cov[X1, X2] = 0.9. Cov[X1, X3] = 0.9² = 0.81. Cov[X1, X4] = 0.9³ = 0.729.
The covariance matrix between the five years of data (rows) and years 1 through 6 (columns) is:
( 6       0.9     0.81    0.729   0.6561  0.59049 )
( 0.9     6       0.9     0.81    0.729   0.6561  )
( 0.81    0.9     6       0.9     0.81    0.729   )
( 0.729   0.81    0.9     6       0.9     0.81    )
( 0.6561  0.729   0.81    0.9     6       0.9     )
Therefore, the Normal Equations are:
6Z1 + 0.9Z2 + 0.81Z3 + 0.729Z4 + 0.6561Z5 = 0.59049.
0.9Z1 + 6Z2 + 0.9Z3 + 0.81Z4 + 0.729Z5 = 0.6561.
0.81Z1 + 0.9Z2 + 6Z3 + 0.9Z4 + 0.81Z5 = 0.729.
0.729Z1 + 0.81Z2 + 0.9Z3 + 6Z4 + 0.9Z5 = 0.81.
0.6561Z1 + 0.729Z2 + 0.81Z3 + 0.9Z4 + 6Z5 = 0.9.
Solving: Z1 = 6.86%, Z2 = -5.21%, Z3 = 8.83%, Z4 = 10.22%, Z5 = 12.16%.
6.86% - 5.21% + 8.83% + 10.22% + 12.16% = 32.86%. The remaining weight of 67.14% is given to the a priori mean.
Comment: Beyond what you will be asked on your exam. The older years are less correlated with year 6, the year we wish to estimate, and thus their data is given less weight. The Normal Equations can have solutions where some of the “credibilities” are negative or greater than one. In this case, giving negative weight to the data from year 2, allows us to give more weight to the data from year 1, resulting in a smaller expected squared error. Year 1 is correlated with year 0, etc., and therefore contains valuable information about prior years. See “A Markov Chain Model of Shifting Risk Parameters,” by Howard Mahler, PCAS 1997, not on the syllabus.
18.17. B. K = EPV/VHM = (w + v/m)/a. Z = n/(n + K) = na/(na + w + v/m).
As m approaches infinity, Z approaches: na/(na + w) = n/(n + w/a).
Alternately, set up the normal equations for credibility and linear estimators of the form a0 + Σ ZiXi:
Σ Zi Cij = Cj,1+n, for j = 1, 2, 3, ..., n (the sum taken over i = 1 to n).
Plus the unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
The covariance of years i and j is: VHM = a for i ≠ j, and VHM + EPV = a + w + v/m for i = j. Therefore, Cij = a + δij(w + v/m).
By symmetry, each Zi is equal to the others; let Zi = Z/n, so that ΣZi = Z. Then (Z/n)(na + w + v/m) = a.
Z = na/(na + w + v/m). As m approaches infinity, Z approaches na/(na + w) = n/(n + w/a).
Comment: Not as hard as it looks! This is an example where the Buhlmann-Straub covariance structure does not hold. Even if the Buhlmann-Straub covariance structure does not hold, if there are equal exposures each year, and the EPV and VHM are only functions of the size of insured, then the usual Buhlmann Credibility formula holds. If there are the same number of exposures per year, then the Normal Equations produce the same result as Z = N/(N + K), provided N is the number of years, and K is computed based on the number of exposures in each year. Assuming each year has the same expected value, a0 = (1 - Z)E[X].
This covariance structure is used to model parameter uncertainty. As the size of risk increases, the EPV goes to a constant w, rather than zero, as assumed more commonly. Therefore, as the size of risk approaches infinity, the credibility does not approach one, assuming w > 0.
See Howard Mahlerʼs Discussion of Robin R. Gillamʼs “Parametrizing the Workers Compensation Experience Rating Plan”, PCAS 1993, or “Credibility with Parameter Uncertainty, Risk Heterogeneity, and Shifting Risk Parameters,” by Howard Mahler, PCAS 1998.
Section 19, Important Formulas and Ideas

Here are what I believe are the most important formulas and ideas from this study guide to know for the exam.

Conditional Distributions (Section 2):
P[A | B] = P[A and B] / P[B].
E[X | B] = Σy y P[X = y | B].
P[A] = ΣBi P[A | Bi] P[Bi].
E[X] = ΣBi E[X | Bi] P[Bi].
Covariances and Correlations (Section 3):
Cov[X,Y] ≡ E[XY] - E[X]E[Y]. Cov[X, X] = Var[X].
Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y].
If X and Y are independent then: Cov[X, Y] = 0 and Var[X + Y] = Var[X] + Var[Y].
Corr[X,Y] ≡ Cov[X, Y] / √(Var[X] Var[Y]).
The correlation is always in the interval [-1, +1].
Bayesian Analysis (Sections 4, 5, and 6):
Bayesʼ Theorem: P(A | B) = P(B | A) P(A) / P(B).
P(Risk Type | Observation) = P(Observation | Risk Type) P(Risk Type) / P(Observation).
Unless stated otherwise, the estimate resulting from Bayesian Analysis is the mean of the posterior distribution, (corresponding to using the squared error loss function.) The result of Bayesian Analysis is always within the range of hypotheses. The estimates that result from Bayesian Analysis are always in balance: The sum of the product of the a priori chance of each outcome times its posterior Bayesian estimate is equal to the a priori mean.
If π(θ) is the prior distribution of the parameter θ, then the posterior distribution of θ is proportional to: π(θ) P(Observation | θ).
The posterior distribution of θ is: π(θ) Prob[Observation | θ] / ∫ π(θ) Prob[Observation | θ] dθ.
The Bayes estimate is: ∫ (Mean given θ) π(θ) Prob[Obs. | θ] dθ / ∫ π(θ) Prob[Obs. | θ] dθ.
When there is a continuous distribution of risk types, one can use the posterior distribution to get Bayesian Interval Estimates.

Buhlmann Credibility (Sections 7, 8, 9, and 10):
v = EPV = Expected Value of the Process Variance = Eθ[VAR[X | θ]].
a = VHM = Variance of the Hypothetical Means = VARθ[E[X | θ]].
EPV + VHM = Total Variance.
Buhlmann Credibility Parameter = K = EPV / VHM,
where the Expected Value of the Process Variance and the Variance of the Hypothetical Means are each calculated for a single observation of the risk process. One calculates the EPV, VHM, and K prior to knowing the particular observation!
If one is estimating claim frequencies or pure premiums, then N is in exposures. If one is estimating claim severities, then N is in number of claims.
For N observations, the Buhlmann Credibility Factor is: Z = N / (N + K).
K is the number of observations needed for 50% credibility.
Estimate of the future = (Z) (Observation) + (1 - Z) (Prior Mean). In the use of credibility, the estimate is always between the a priori estimate and the observation.
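As an illustration (a sketch, not part of the original study guide), the credibility weighting can be written as a one-line Python function; the example values below are those of solution 18.5.

def buhlmann_estimate(prior_mean, observed_mean, n, k):
    """Buhlmann credibility estimate with Z = n/(n + k)."""
    z = n / (n + k)
    return z * observed_mean + (1 - z) * prior_mean

# K = EPV/VHM = 12/5, three years averaging 90, a priori mean of 100:
print(buhlmann_estimate(100.0, 90.0, 3, 12.0 / 5.0))   # -> 94.44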
The Buhlmann Credibility estimate is a linear function of the observation. If exposures do not vary, the estimates that result from Buhlmann Credibility are in balance: The weighted average of the Buhlmann Credibility estimates over the possible observations for a given situation, using as weights the a priori chances of the possible observations, is equal to the a priori mean.

Linear Regression (Section 11):
The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. Buhlmann Credibility is the Least Squares approximation to the Bayesian Estimates. The slope of the weighted least squares line to the Bayesian Estimates is the Buhlmann Credibility.

Classification and Experience Rating (Sections 14 and 15):
The more homogeneous the classes, the more credibility is assigned to the class data and the less to the overall average, when determining classification rates. The more homogeneous the classes, the less credibility assigned an individualʼs data and the more to the average for the class, when performing experience rating (individual risk rating.) The credibility is a relative measure of the value of the information contained in the observation of the individual versus the information in the class average.

Loss Functions (Section 16):
Bayes Analysis using the Squared-error Loss Function, just means do what we usually do, get the posterior mean of the quantity of interest.
Error or Loss Function, and the corresponding Bayesian Point Estimator:
• Squared-error: (estimate - true value)². Bayesian point estimator: the Mean.
• Zero-one: 0 if estimate = true value, 1 if estimate ≠ true value. Bayesian point estimator: the Mode.
• Absolute-error: |estimate - true value|. Bayesian point estimator: the Median.
• (1-p)|estimate - true value| if estimate ≥ true value (overestimate); (p)|estimate - true value| if estimate ≤ true value (underestimate). Bayesian point estimator: the pth percentile.
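As an illustration (a sketch, not part of the original study guide), the three most common Bayesian point estimators can be read off a discrete posterior; the posterior probabilities below are illustrative numbers only.

posterior = {1: 0.0259, 2: 0.2263, 3: 0.4212, 4: 0.3266}   # hypothetical posterior

mean = sum(theta * p for theta, p in posterior.items())     # squared-error loss
mode = max(posterior, key=posterior.get)                     # zero-one loss
cumulative, median = 0.0, None
for theta in sorted(posterior):                              # absolute-error loss
    cumulative += posterior[theta]
    if cumulative >= 0.5:
        median = theta
        break
print(mean, mode, median)   # -> approximately 3.05, 3, 3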
Least Squares Credibility (Section 17):
Buhlmann Credibility is the linear estimator which minimizes the expected squared error measured with respect to either the future observation, the hypothetical mean, or the Bayesian Estimate. The expected value of the squared error as a function of the weight applied to the observation is a parabola.
Buhlmann Covariance Structure: COV[Xi, Xj] = EPV δij + VHM.
The Buhlmann-Straub Model: For a given policyholder its data (frequency, severity, or pure premium) in different years, Xi, are independent. In year i, the policy has exposures mi, some measure of size. µ(θ) = E[Xi | θ]. Var[Xi | θ] = v(θ)/mi.
Buhlmann-Straub Covariance Structure: COV[Xi, Xj] = (EPV / Size) δij + VHM.
Normal Equations (Section 18):
If there are equal exposures each year, and the EPV and VHM are only functions of the size of insured, then the Normal Equations produce the same result as Z = N/(N + K), provided N is the number of years, and K is computed based on the number of exposures in each year.
For Buhlmann Covariance Structure, the Normal Equations ⇒ Z = N / (N+K).
For Buhlmann-Straub Covariance Structure, the Normal Equations ⇒ Z = N / (N+K).
If one assumes a covariance structure different than the Buhlmann-Straub, and exposures vary by year, then the equation Z = m/(m + K) does not hold. In these situations, one can solve the set of linear equations, called the “normal equations”, for the amount of credibility to be assigned to each year of data.
Assume linear estimators of the form: a0 + Σ ZiXi. Variance-Covariance Matrix: Cij.
Then the least squares estimator satisfies the equations:
The unbiasedness equation: E[X] = a0 + Σ ZiE[Xi].
Σ Zi Cij = Cj,1+N, for j = 1, 2, 3, ..., N (the sum taken over i = 1 to N).
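As an illustration (a sketch, not part of the original study guide), the normal equations above can be packaged as a small solver; the function name is for illustration only, and the example reproduces the covariance structure used in solutions 18.6 and 18.7.

import numpy as np

def normal_equation_credibilities(C, cov_with_target, means, target_mean):
    """Solve sum_i Z_i C_ij = Cov[X_j, target] for the credibilities Z_i,
    then get a0 from the unbiasedness equation."""
    C = np.asarray(C, dtype=float)
    Z = np.linalg.solve(C.T, np.asarray(cov_with_target, dtype=float))
    a0 = target_mean - Z @ np.asarray(means, dtype=float)
    return Z, a0

Z, a0 = normal_equation_credibilities(
    [[17.0, 4.0, 3.0], [4.0, 17.0, 4.0], [3.0, 4.0, 17.0]],
    [2.0, 3.0, 4.0], [100.0, 100.0, 100.0], 100.0)
print(Z, a0)   # -> [17/308, 36/308, 61/308], a0 = (100)(194/308)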
Mahlerʼs Guide to Conjugate Priors
Joint Exam 4/C
prepared by Howard C. Mahler, FCAS
Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-10
Howard Mahler
[email protected]
www.howardmahler.com/Teaching
Mahlerʼs Guide to Conjugate Priors Copyright 2013 by Howard C. Mahler. Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Solutions to the problems in each section are at the end of that section.1
Section #   Pages      Section Name
1           3          Introduction
2           4-60       Mixing Poissons
3           61-67      Gamma Function and Distribution
4           68-152     Gamma-Poisson
5           153-159    Beta Distribution
6           160-203    Beta-Bernoulli
7           204-231    Beta-Binomial
8           232-237    Inverse Gamma Distribution
9           238-265    Inverse Gamma-Exponential
10          266-287    Normal-Normal
11          288-303    Linear Exponential Families
12          304-343    Overview of Conjugate Priors
13          344-353    Important Formulas and Ideas

1
Note that problems include both some written by me and some from past exams. The latter are copyright by the CAS and SOA and are reproduced here solely to aid students in studying for exams. In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam questions, but it will not be specifically tested. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Course 4 Exam Questions by Section of this Study Aid2
[Chart of past exam questions by section of this study aid, covering the Sample Exam and the 5/00, 11/00, 5/01, 11/01, 11/02, 11/03, 11/04, 5/05, 11/05, 11/06, and 5/07 exams; the layout of this chart did not survive extraction.]
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.
2
Excluding any questions that are no longer on the syllabus.
Section 1, Introduction

Bayesian Analysis and Buhlmann Credibility will be applied to particular situations that commonly occur. First will be discussed situations in which each insured has a Poisson frequency, but with different means. Mixing Poissons can involve either discrete risk types or a continuous distribution of λ. This section serves as a useful review of how to apply Bayesian Analysis and Buhlmann Credibility, as well as a way to prepare for the important Gamma-Poisson conjugate prior situation.

If the prior and posterior distributions have the same type of distribution, then the prior distribution is called a conjugate prior. For example, the Gamma Distribution is a conjugate prior to the Poisson Distribution. This Gamma-Poisson frequency process, in which each insured is Poisson and λ has a Gamma Distribution, is very important to learn well.

This Study Aid will review in detail four common conjugate prior situations, in decreasing order of importance: Gamma-Poisson, Beta-Bernoulli, Inverse Gamma-Exponential, and Normal-Normal. In all four of these cases, the estimates from Bayesian Analysis and Buhlmann Credibility are equal. These situations are examples of what Loss Models refers to as exact credibility. Also, some important results for linear exponential families will be discussed in Section 11.
Section 2, Mixing Poissons

This section presents a simple frequency example as a precursor to the important Gamma-Poisson frequency process. Most of the important features of the Gamma-Poisson are present in the example in this section. Study this example closely and then go back and forth between it and the Gamma-Poisson. Even those who know the Gamma-Poisson very well, should find this a useful example of Bayesian Analysis and Buhlmann Credibility.

Prior Distribution:3
Assume there are four types of risks or insureds, all with claim frequency given by a Poisson distribution:

Type        Average Annual Claim Frequency    A Priori Probability
Excellent   1                                 40%
Good        2                                 30%
Bad         3                                 20%
Ugly        4                                 10%

These four different Poisson distributions are shown below, through eight claims:
3
The first portion of this example is in “Mahlerʼs Guide to Frequency Distributions.” However, here we introduce observations and then apply Bayes Analysis and Buhlmann Credibility.
For a Poisson Distribution with parameter λ the chance of having n claims is given by: f(n) = λ^n e^(-λ) / n!. So for example for an Ugly risk with λ = 4, the chance of n claims is: 4^n e^(-4) / n!. For an Ugly risk the chance of 6 claims is: 4^6 e^(-4) / 6! = 10.4%. Similarly the chance of 6 claims for Excellent, Good, or Bad risks are: 0.05%, 1.20%, and 5.04%, respectively.

Marginal Distribution (Prior Mixed Distribution):
If we have a risk but do not know what type it is, we weight together the 4 different chances of having 6 claims, using the a priori probabilities of each type of risk in order to get the chance of having 6 claims: (0.4)(0.05%) + (0.3)(1.20%) + (0.2)(5.04%) + (0.1)(10.42%) = 2.43%. The table below displays similar values for other numbers of claims. The probabilities in the final column represent the marginal distribution (also referred to as the prior mixed distribution), which is the assumed distribution of the number of claims for the entire portfolio of risks, prior to any observations.4
Number of Claims 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Probability for Excellent Risks 36.79% 36.79% 18.39% 6.13% 1.53% 0.31% 0.05% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Probability for Good Risks 13.53% 27.07% 27.07% 18.04% 9.02% 3.61% 1.20% 0.34% 0.09% 0.02% 0.00% 0.00% 0.00% 0.00% 0.00%
Probability for Bad Risks 4.98% 14.94% 22.40% 22.40% 16.80% 10.08% 5.04% 2.16% 0.81% 0.27% 0.08% 0.02% 0.01% 0.00% 0.00%
Probability for Ugly Risks 1.83% 7.33% 14.65% 19.54% 19.54% 15.63% 10.42% 5.95% 2.98% 1.32% 0.53% 0.19% 0.06% 0.02% 0.01%
Probability for All Risks 19.95% 26.56% 21.42% 14.30% 8.63% 4.78% 2.43% 1.13% 0.49% 0.19% 0.07% 0.02% 0.01% 0.00% 0.00%
SUM
100.00%
100.00%
100.00%
100.00%
100.00%
Prior Mean: Note that the overall (a priori) mean can be computed in either one of two ways. First one can weight together the means for each type of risk, using the a priori probabilities: (0.4)(1) + (0.3)(2) + (0.2)(3) + (0.1)(4) = 2. Alternately, one can compute the mean of the marginal distribution: (0)(0.1995) + (1)(0.2656) + (2)(0.2142) + ... = 2.
4
While the marginal distribution is easily computed by weighting together the four Poisson distributions, in this case it is not itself a Poisson nor another well known distribution.
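As a quick numerical check of the marginal distribution above (a sketch, not part of the original study guide; the helper names are for illustration only):

from math import exp, factorial

# The four types of risks: (a priori probability, Poisson mean).
types = [(0.40, 1), (0.30, 2), (0.20, 3), (0.10, 4)]

def poisson_pmf(n, lam):
    return lam ** n * exp(-lam) / factorial(n)

def marginal(n):
    """Marginal (prior mixed) probability of n claims."""
    return sum(p * poisson_pmf(n, lam) for p, lam in types)

print(marginal(6))                               # -> 0.0243, the 2.43% above
print(sum(n * marginal(n) for n in range(60)))   # a priori mean -> 2.0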
Prior Total Variance:

Number of Claims    Probability for All Risks    Square of Number of Claims
0                   0.1995                       0
1                   0.2656                       1
2                   0.2142                       4
3                   0.1430                       9
4                   0.0863                       16
5                   0.0478                       25
6                   0.0243                       36
7                   0.0113                       49
8                   0.0049                       64
9                   0.0019                       81
10                  0.0007                       100
11                  0.0002                       121
12                  0.0001                       144
13                  0.0000                       169
14                  0.0000                       196
Mean                                             2.000 (claims), 7.000 (square of claims)
Variances can be computed. The total variance is the variance of the claim distribution for the entire portfolio. As seen above, the total variance = 7 - 2² = 3.

Prior Expected Value of the Process Variance:
The process variance for an individual risk is its Poisson parameter λ since the frequency for each risk is Poisson. Therefore, the expected value of the process variance = the expected value of λ = the a priori mean frequency = 2.

Prior Variance of the Hypothetical Means:
The variance of the hypothetical means is computed as follows:

Type of Risk   A Priori Probability   Mean   Mean Squared
Excellent      0.4                    1      1
Good           0.3                    2      4
Bad            0.2                    3      9
Ugly           0.1                    4      16
Overall                               2      5
Variance of the Hypothetical Means = 5 - 2² = 1. The Expected Value of the Process Variance + Variance of the Hypothetical Means = 2 + 1 = 3 = Total Variance.
Observations: Let us now introduce the concept of observations. A risk is selected at random and it is observed to have 5 claims in one year.

Posterior Distribution:
We can employ Bayesian analysis to compute what the chances are that the selected risk was of each type:

(A) Type of Risk   (B) A Priori Probability   (C) Chance of Observation   (D) = (B) x (C) Probability Weight   (E) = (D) / Sum of (D) Posterior Probability
Excellent          0.4                        0.0031                      0.00124                              2.59%
Good               0.3                        0.0361                      0.01083                              22.63%
Bad                0.2                        0.1008                      0.02016                              42.12%
Ugly               0.1                        0.1563                      0.01563                              32.66%
SUM                                                                       0.04786                              100.00%
While the posterior probabilities can be calculated in a straightforward manner, they do not come from some named well-known distribution. Predictive Distribution: Using these posterior probabilities, one can compute the predictive distribution; i.e., the posterior analog of the marginal distribution: Number of Probability for Probability for Claims Excellent Risks Good Risks
Probability for Bad Risks
Probability for Ugly Risks
Probability for All Risks
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
36.79% 36.79% 18.39% 6.13% 1.53% 0.31% 0.05% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
13.53% 27.07% 27.07% 18.04% 9.02% 3.61% 1.20% 0.34% 0.09% 0.02% 0.00% 0.00% 0.00% 0.00% 0.00%
4.98% 14.94% 22.40% 22.40% 16.80% 10.08% 5.04% 2.16% 0.81% 0.27% 0.08% 0.02% 0.01% 0.00% 0.00%
1.83% 7.33% 14.65% 19.54% 19.54% 15.63% 10.42% 5.95% 2.98% 1.32% 0.53% 0.19% 0.06% 0.02% 0.01%
6.71% 15.76% 20.82% 20.06% 15.54% 10.18% 5.80% 2.93% 1.33% 0.55% 0.21% 0.07% 0.02% 0.01% 0.00%
SUM
100.00%
100.00%
100.00%
100.00%
100.00%
For example, in the year subsequent to our observation, for this same risk the chance of having 6 claims is: (0.0259)(.05%) + (0.2263)(1.20%) + (0.4212)(5.04%) + (0.3266)(10.42%) = 5.80%. After having observed 5 claims in a year our new estimate of the chance of that same risk having 6 claims the next year is 5.8% rather than only 2.4% as it was prior to any observation. Below are displayed both the posterior predictive distribution (squares) and the prior marginal distribution (triangles), through 8 claims:
Posterior Mean: One can compute the means and variances posterior to the observations. The posterior mean can be computed either by weighting together the means of the different types of risks using the posterior weights or by computing the mean of the predictive distribution. The former gives (2.59%)(1) + (22.63%)(2) + (42.12%)(3) + (32.66%)(4) = 3.05. Alternately, the mean of the predictive distribution is: (0.0671)(0) + (0.1576)(1) + (0.2082)(2) + ... = 3.05. Thus the new estimate posterior to the observations for this risk using Bayesian Analysis is 3.05. This compares to the a priori estimate of 2. In general, the observations provide information about the given risk, which allows one to make a better estimate of the future experience of that risk.
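As an illustration (a sketch, not part of the original study guide), this Bayesian update can be reproduced directly from the priors and the Poisson likelihoods:

from math import exp, factorial

prior = {1: 0.40, 2: 0.30, 3: 0.20, 4: 0.10}   # Poisson mean -> a priori probability
observed = 5                                    # claims observed in one year

def poisson_pmf(n, lam):
    return lam ** n * exp(-lam) / factorial(n)

weight = {lam: p * poisson_pmf(observed, lam) for lam, p in prior.items()}
total = sum(weight.values())
posterior = {lam: w / total for lam, w in weight.items()}
posterior_mean = sum(lam * p for lam, p in posterior.items())
print(posterior)        # -> approximately {1: 0.026, 2: 0.226, 3: 0.421, 4: 0.327}
print(posterior_mean)   # -> 3.05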
Mixing Poissons, Bayesian Analysis:

Prior Distribution of λ  --(Mixing)-->  Marginal Distribution (Number of Claims)
        |
  (Observations)
        ↓
Posterior Distribution of λ  --(Mixing)-->  Predictive Distribution (Number of Claims)
Posterior Expected Value of the Process Variance:
Just as prior to the observations, posterior to the observations one can compute three variances: the expected value of the process variance, the variance of the hypothetical pure premiums, and the total variance. The process variance for an individual risk is its Poisson parameter λ since the frequency for each risk is Poisson. Therefore the expected value of the process variance = the expected value of λ = the posterior mean frequency = 3.05.

Posterior Variance of the Hypothetical Means:
The variance of the hypothetical means is computed as follows:

Type of Risk   Posterior Probability   Mean   Mean Squared
Excellent      0.0259                  1      1
Good           0.2263                  2      4
Bad            0.4212                  3      9
Ugly           0.3266                  4      16
Overall                                3.05   9.95
Variance of the Hypothetical Means = 9.95 - 3.05² = 0.65. Note how after the observation the variance of the hypothetical means is less than prior, since the observations have allowed us to narrow down the possibilities.
Posterior Total Variance: The EPV + VHM = 3.05 + 0.65 = 3.70 = Total Variance. Calculating the total variance directly from the predictive distribution as shown below, Total Variance = 12.99 - 3.05² = 3.69, which matches the sum of the expected value of the process variance and the variance of the hypothetical means, except for rounding. Posterior Predictive Distrib.
Number of Claims
Square of # of Claims
0.0671 0.1576 0.2082 0.2006 0.1554 0.1018 0.0580 0.0293 0.0133 0.0055 0.0021 0.0007 0.0002 0.0001 0.0000
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 1 4 9 16 25 36 49 64 81 100 121 144 169 196
Mean
3.05
12.99
Buhlmann Credibility: Next, letʼs apply Buhlmann Credibility to this example. The Buhlmann Credibility parameter K = the expected value of the process variance / variance of the hypothetical means = 2 / 1 = 2. Note that K can be computed prior to any observation and doesnʼt depend on them. Having observed 5 claims in one year, Z = 1 / (1+ 2) = 1/3. The observation = 5. The a priori mean = 2. Therefore, the new estimate = (1/3)(5) + (1 - 1/3)(2) = 3. Note that in this case the estimate from Buhlmann Credibility does not match the estimate from Bayesian Analysis. Mixing Poissons, Buhlmann Credibility EPV = Overall A Priori Mean = E[λ] = Mean of Prior Distribution of λ = Mean of Marginal Distribution. VHM = Var[λ] = Variance of Prior Distribution of λ. Total Variance = Variance of Marginal Distribution = EPV + VHM = Mean of Prior Distribution + Variance of Prior Distribution.5 5
Variance of the Marginal Distribution = Mean of the Marginal Distribution + Variance of Prior Distribution. Therefore, Variance of the Marginal Distribution > Mean of the Marginal Distribution. See Equation 6.45 in Loss Models.
Extending the Example: There is nothing unique about assuming four types of risks. If one had assumed for example 100 different types of risks, with mean frequencies from 0.1 to 10, then there would have been no change in the conceptual complexity of the situation, although the computational complexity would have been increased. Assume that a priori the probability of each of these 100 types of risks with mean frequency λ, was given approximately by: (10)(1.5³) λ² e^(-1.5λ) / Γ(3) = 16.875 λ² e^(-1.5λ)
λ = 0.1, 0.2 , 0.3,..., 9.9, 10.
(This is 10 times6 a Gamma density with α = 3 and θ = 2/3. Recall that Γ(3) = 2! = 2.) Note that the Gamma distribution isnʼt being used here as a size of loss distribution. There has been no mention of claim severity; only claim frequency has been dealt with here. One would compute the overall mean frequency by taking the sum from λ = 0.1 to λ = 10 of the product of this density times λ times Δλ = Σ (16.875 λ3 e-1.5λ)(.1) ≅ mean of a Gamma distribution with α = 3 and θ = 2/3. Thus the overall mean is approximately: (3)(2/3) = 2. A further extension of this discrete example to a continuous case would give the Gamma-Poisson situation discussed in a subsequent section. The same type of questions as were asked in this example can be asked about the Gamma-Poisson situation. Due to the mathematical properties of the Gamma and Poisson there are some specific relationships in the case of the Gamma-Poisson in addition to those in this example.
6
One multiplies by 10 in order to compensate for having selected Δµ = 1/10.
A General Result for Mixing Poissons: Assume there is a mixture of different individuals each of which has frequency given by a Poisson, but with different means. For each individual risk the Process Variance is the mean, since this is the case for the Poisson. Therefore the Expected Value of the Process Variance is the (a priori) overall mean frequency. The total variance is the sum of the Expected Value of the Process Variance plus the Variance of the Hypothetical Means. Thus one could estimate the EPV and VHM as:7 EPV = estimated mean. VHM = estimated total variance - EPV. Therefore, the Buhlmann Credibility parameter K = EPV / VHM = A Priori Mean / (A Priori Total Variance - A Priori Mean). Therefore: Buhlmann Credibility Parameter = A Priori Mean / “Excess Variance”. In the example, K = 2 / (3 - 2) = 2, which matches the result above. The denominator is the extra variance beyond the Poisson, that is introduced by the mixing of individual risks with different expected means. Note that the credibility assigned to one observation is: Z = 1/(1+K) = (A Priori Total Variance - A Priori Mean) / A Priori Total Variance = 1 - (A Priori Mean / A Priori Total Variance). In general, when mixing Poissons with means λ via a distribution g(λ), the Expected Value of the Process Variance is the mean of g, and the Variance of the Hypothetical Means is the variance of g. K = E[λ]/Var[λ]. This general result for mixing Poissons is the idea behind semiparametric estimation with Poissons, as discussed in “Mahlerʼs Guide to Semiparametric Estimation.” In the important special case where g is a Gamma Distribution, one gets the Gamma-Poisson frequency process, with the EPV= mean of the Gamma = αθ, while the VHM = variance of the Gamma = αθ2, so that K = (αθ) / (αθ2) = 1/θ, the inverse of the scale parameter of the Gamma.
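As an illustration (a sketch, not part of the original study guide), the general result can be turned into a two-line computation; the inputs are the mean and total variance of the example in this section.

# For a mixture of Poissons: EPV = overall mean, VHM = total variance - mean.
mean, total_variance = 2.0, 3.0            # values from this section's example
K = mean / (total_variance - mean)         # -> 2.0
Z_one_year = 1.0 / (1.0 + K)               # credibility of a single year -> 1/3
print(K, Z_one_year)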
7
See “Mahlerʼs Guide to Semiparametric Estimation.”
B. 4.9%
C. 5.1%
D. 5.3%
E. 5.5%
2.2 (3 points) You observe 4 claims by an individual in a single year. Use Buhlmann Credibility to predict that individualʼs future claim frequency. A. less than 1.7 B. at least 1.7 but less than 1.8 C. at least 1.8 but less than 1.9 D. at least 1.9 but less than 2.0 E. at least 2.0 2.3 (3 points) You observe 4 claims by an individual in a single year. Use Bayesian Analysis to predict that individualʼs future claim frequency. A. less than 2.0 B. at least 2.0 but less than 2.1 C. at least 2.1 but less than 2.2 D. at least 2.2 but less than 2.3 E. at least 2.3 2.4 (2 points) You observe 37 claims by an individual in ten years. Use Buhlmann Credibility to predict that individualʼs future claim frequency. A. less than 2.9 B. at least 2.9 but less than 3.1 C. at least 3.1 but less than 3.3 D. at least 3.3 but less than 3.5 E. at least 3.5 2.5 (3 points) You observe 37 claims by an individual in ten years. Use Bayesian Analysis to predict that individualʼs future claim frequency. A. 2.8 B. 3.0 C. 3.2 D. 3.4 E. 3.6
Page 13
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 14
Use the following information for the next four questions: Each insured has its accident frequency given by a Poisson Process with mean λ. For a portfolio of insureds, λ is distributed uniformly on the interval from 3 to 7. 2.6 (1 point) What is the Expected Value of the Process Variance? A. 1 B. 2 C. 3 D. 4 E. 5 2.7 (2 points) What is the Variance of the Hypothetical Means? A. less than 1.0 B. at least 1.0 but less than 1.5 C. at least 1.5 but less than 2.0 D. at least 2.0 but less than 2.5 E. at least 2.5 2.8 (2 points) An individual insured from this portfolio is observed to have 7 accidents in a single year. Use Buhlmann Credibility to estimate the future accident frequency of that insured. A. less than 5.5 B. at least 5.5 but less than 5.7 C. at least 5.7 but less than 5.9 D. at least 5.9 but less than 6.1 E. at least 6.1 2.9 (4 points) An individual insured from this portfolio is observed to have 7 accidents in a single year. Use Bayesian Analysis to estimate the future accident frequency of that insured. Γ(8; 7) - Γ(8; 3) Γ(8; 7) - Γ(8; 3) A. 7 B. 8 Γ(7; 7) - Γ(7; 3) Γ(7; 7) - Γ(7; 3) C. 7
Γ(9; 7) - Γ(9; 3) Γ(8; 7) - Γ(8; 3)
D. 8
Γ(9; 7) - Γ(9; 3) Γ(8; 7) - Γ(8; 3)
E. None of the above.
2.10 (2 points) Each insured has its accident frequency given by a Poisson distribution with mean λ. Over a portfolio of insureds, λ is distributed via g(λ). g has a mean of 0.09 and a variance of 0.003. For an insured, two claims are observed in three years. Use Buhlmann Credibility to estimate the future claim frequency of this insured. A. 10% B. 11% C. 12% D. 13% E. 14%
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 15
Use the following information for the next 8 questions: (i) An individual automobile insured has annual claim frequencies that follow a Poisson distribution with mean λ. (ii) An actuaryʼs prior distribution for the parameter λ has probability density function: Π(λ) = (0.7)(20e−20λ) + (0.3)(10e−10λ). 2.11 (1 point) What is the a priori expected annual frequency? (A) 5.0% (B) 5.5% (C) 6.0% (D) 6.5% (E) 7.0% 2.12 (1 point) For an insured picked at random, what is the probability that λ > 10%? A. less than 16% B. at least 16% but less than 18% C. at least 18% but less than 20% D. at least 20% but less than 22% E. at least 22% 2.13 (3 points) If in the first policy year, no claims were observed for an insured, what is the probability that for this insured, λ > 10%? A. 14%
B. 16%
C. 18%
D. 20%
E. 22%
2.14 (3 points) If in the first policy year, no claims were observed for an insured, determine the expected number of claims in the second policy year. (A) 4.0% (B) 4.5% (C) 5.0% (D) 5.5% (E) 6.0% 2.15 (3 points) If in the first policy year, one claim was observed for an insured, determine the expected number of claims in the second policy year. A. 11% B. 13% C. 15% D. 17% E. 19% 2.16 (3 points) If in the first policy year, two claims was observed for an insured, determine the expected number of claims in the second policy year. (A) 19% (B) 20% (C) 21% (D) 22% (E) 23% 2.17 (3 points) Determine the Buhlmann credibility parameter, K. (A) 10 (B) 12 (C) 14 (D) 16 (E) 18 2.18 (1 point) If in the first policy year, two claims was observed for an insured, using Buhlmann Credibility, what is the expected number of claims in the second policy year? (A) 19% (B) 20% (C) 21% (D)22% (E) 23%
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 16
Use the following information for the next seven questions: Each insured has its accident frequency given by a Poisson Process with mean θ. For a portfolio of insureds θ is distributed as follows on the interval from a to b: f(θ) =
(d + 1) θd , 0 ≤ a ≤ θ ≤ b ≤ ∞. bd + 1 - ad + 1
You may use the following values of the Incomplete Gamma Function: Γ(α ; x) x
α=2.5
α=3.5
α=4.5
1.2
0.209
0.066
0.017
2.4 0.559 0.316 0.149 3.6 0.794 0.592 0.384 You may also use the following values of the Complete Gamma Function: Γ(2.5) = 1.329, Γ(3.5) = 3.323, Γ(4.5) = 11.632 2.19 (1 point) What is the expected mean value of θ, prior to any observations? A. (1 + D.
1 bd - ad ) d+ 1 - ad + 1 d b
bd - ad bd + 1 - ad + 1
B. (d+1) E.
bd + 2 - ad + 2 bd + 1 - ad + 1
C.
d + 1 bd + 2 - ad + 2 d + 2 bd + 1 - ad + 1
d + 1 b - a
2.20 (2 points) An insured is randomly selected from the portfolio and we observe C claims in Y years. Which of the following is the posterior distribution of θ for this insured? A.
Yd + C θ d + C e - Yθ {Γ(d+ C; bY) - Γ(d + C; aY) } Γ(d + C)
B.
Yd + C θ d + C + 1 e -Yθ Γ(d+ C; bY) - Γ(d + C; aY)
C.
Yd + C + 1 θd + C + 1 e - Yθ Γ(d+ C +1; bY) - Γ(d + C +1; aY)
D.
Yd + C + 1 θd + C + 1 e - Yθ Γ(d+ C; bY) - Γ(d + C; aY)
E.
Yd + C + 1 θ d + C e- Yθ {Γ(d+ C +1; bY) - Γ(d + C + 1; aY) } Γ(d + C + 1)
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 17
2.21 (2 points) An insured is randomly selected from the portfolio and we observe C claims in Y years. Which of the following is the Bayesian Analysis estimate of θ for this insured? A.
Γ(d+ C; bY) - Γ(d + C; aY) d+ C Y Γ(d+ C +1; bY) - Γ(d + C +1; aY)
B.
d + C Γ(d+ C +1; bY) - Γ(d + C +1; aY) Γ(d + C; bY) - Γ(d + C; aY) Y
C.
d + C Γ(d+ C + 2; bY) - Γ(d + C + 2; aY) Γ(d+ C; bY) - Γ(d + C; aY) Y
D.
Γ(d+ C; bY) - Γ(d + C; aY) d+ C+ 1 Γ(d+ C +1; bY) - Γ(d + C +1; aY) Y
E.
d + C + 1 Γ(d+ C + 2; bY) - Γ(d + C + 2; aY) Γ(d+ C + 1; bY) - Γ(d + C + 1; aY) Y
2.22 (1 point) If the parameter d = -1/2, and if a = 0.2 and b =0 .6, what is the expected mean value of θ, prior to any observations? A. 0.34
B. 0.35
C. 0.36
D. 0.37
E. 0.38
2.23 (2 points) The parameter d = -1/2, and a =0.2 and b = 0.6. An individual insured from this portfolio is observed to have 2 claims over 6 years. Which of the following, with support from 0.2 to 0.6, is the posterior distribution of θ for this insured? A. 113.4 θ1.5 e-6θ
B. 113.4 θ1.5 e-7θ
D. 113.4 θ2.5 e-7θ
E. None of the above
C. 113.4 θ2.5 e-6θ
2.24 (2 points) The parameter d = -1/2, and a = 0.2 and b =0 .6. An individual insured from this portfolio is observed to have 2 claims over 6 years. Which of the following is the Bayesian Analysis estimate of θ for this insured? A. less than 0.35 B. at least 0.35 but less than 0.36 C. at least 0.36 but less than 0.37 D. at least 0.37 but less than 0.38 E. at least 0.38 2.25 (2 points) The parameter d = -1, and a = 0 and b = ∞. An individual insured from this portfolio is observed to have 2 claims over 6 years. Which of the following is the Bayesian Analysis estimate of θ for this insured? A. less than 0.35 B. at least 0.35 but less than 0.36 C. at least 0.36 but less than 0.37 D. at least 0.37 but less than 0.38 E. at least 0.38
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 18
Use the following information for the next 7 questions: A group of drivers have their expected annual claim frequency uniformly distributed over the interval 2% to 8%. Each driver's observed number of claims per year follows a Poisson distribution. 2.26 (3 points) A particular driver from this group is observed to have two claims over the most recent five year period. Using Buhlmann credibility, what is the estimate of this driver's future annual claim frequency? A. 6.0% B. 6.2% C. 6.4% D. 6.6% E. 6.8% 2.27 (2 points) A particular driver from this group is observed to have no claims over the most recent five year period. Using Bayesian Analysis, what is the estimate of this driver's future annual claim frequency? A. less than 4.6% B. at least 4.6% but less than 4.7% C. at least 4.7% but less than 4.8% D. at least 4.8% but less than 4.9% E. at least 4.9% 2.28 (2 points) A particular driver from this group is observed to have no claims over the most recent five year period. What is the probability that this driverʼs annual Poisson parameter is less than 5%? A. less than 53% B. at least 53% but less than 55% C. at least 55% but less than 57% D. at least 57% but less than 59% E. at least 59% 2.29 (3 points) A particular driver from this group is observed to have one claim over the most recent five year period. Using Bayesian Analysis, what is the estimate of this driver's future annual claim frequency? A. 5.1% B. 5.3% C. 5.5% D. 5.7% E. 5.9% 2.30 (3 points) A particular driver from this group is observed to have one claim over the most recent five year period. What is the probability that this driverʼs annual Poisson parameter is less than 5%? A. less than 33% B. at least 33% but less than 35% C. at least 35% but less than 37% D. at least 37% but less than 39% E. at least 39%
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 19
2.31 (2 points) A particular driver from this group is observed to have no claims over the most recent five year period. What is the probability that this driver has no claims the following year? A. 93% B. 94% C. 95% D. 96% E. 97% 2.32 (2 points) A particular driver from this group is observed to have no claims over the most recent five year period. What is the probability that this driver has one claim the following year? A. 4.6% B. 4.7% C. 4.8% D. 4.9% E. 5.0%
Use the following information for the next three questions: (i) The number of claims experienced in a given year by each insured follows a Poisson distribution. (ii)
The mean value λ of the Poisson distribution is distributed across the population according to a Single Parameter Pareto Distribution with α = 3 and θ = 0.8.
(iii)
λ is constant for each insured over time.
(iv)
An insured is picked at random and has 4 claims in 7 years.
2.33 (3 points) What is the Buhlmann credibility estimate of the future expected annual claim frequency for this particular insured? A. 74% B. 76% C. 78% D. 80% E. 82% 2.34 (1 point) What is the a priori chance of observing 4 claims in 7 years? A. 7% B. 8% C. 9% D. 10% E. 11% 2.35 (3 points) Use Bayes Theorem in order to estimate the future expected annual claim frequency for this particular insured. A. Less than 80% B. At least 80%, but less than 85% C. At least 85%, but less than 90% D. At least 90%, but less than 95% E. 95% or more
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 20
Use the following information for the next 6 questions: (i) Claim counts for individual insureds follow a Poisson distribution. (ii) Half of the insureds have expected annual claim frequency of 40%. (iii) The other half of the insureds have expected annual claim frequency of 60%. 2.36 (2 points) A randomly selected insured has made 2 claims in each of the first two policy years. Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year. (A) 0.554 (B) 0.556 (C) 0.558 (D) 0.560 (E) 0.562 2.37 (2 points) A randomly selected insured has made 1 claim in the first policy year and 3 claims in the second policy year. Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year. (A) 0.554 (B) 0.556 (C) 0.558 (D) 0.560 (E) 0.562 2.38 (2 points) A randomly selected insured has made a total of 4 claims in the first two policy years. Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year. (A) 0.554 (B) 0.556 (C) 0.558 (D) 0.560 (E) 0.562 2.39 (3 points) A randomly selected insured had at most 2 claims in each of the first two policy years. Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year. (A) 0.490 (B) 0.492 (C) 0.494 (D) 0.496 (E) 0.498 2.40 (3 points) A randomly selected insured has made at most a total of 2 claims in the first two policy years. Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year. (A) 0.490 (B) 0.492 (C) 0.494 (D) 0.496 (E) 0.498 2.41 (3 points) A randomly selected insured had at least 2 claims in each of the first two policy years. Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year. (A) 0.54 (B) 0.55 (C) 0.56 (D) 0.57 (E) 0.58
2.42 (3 points) You are given: (i) The annual number of claims for an individual risk follows a Poisson distribution with mean λ. (ii) For 80% of the risks, λ = 4. (iii) For 20% of the risks, λ = 7. A randomly selected risk had r claims in Year 1. The Bayesian estimate of this riskʼs expected number of claims in Year 2 is 5.97. Determine r. (A) 8 (B) 9 (C) 10 (D) 11 (E) 12
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 21
Use the following information for the next two questions: (i) The conditional distribution of the number of claims per policyholder is Poisson with mean λ. (ii) The variable λ has a Weibull distribution with parameters θ and τ. (iii) A policyholder has 1 claim in Year 1, 2 claims in year 2, and 3 claims in year 3. 2.43 (3 points) Which of the following is equal to the mean of the posterior distribution of λ? ∞
(A)
∫
λτ + 6 exp[−{3λ + (λ / θ) τ }] dλ /
0 ∞
(B)
∫
∞
∫
λτ + 3 exp[−{6λ + (λ / θ) τ }] dλ /
(D)
∫
(λ / θ) τ }] dλ
∞
∫ λτ + 3 exp[−{5λ +
(λ / θ) τ }] dλ
0
λτ + 6 exp[−{3λ + (λ / θ) τ }] dλ /
0 ∞
∫ λτ + 3 exp[−{5λ + 0
0
(C)
∞
∞
∫ λτ + 5 exp[−{3λ +
(λ / θ) τ }] dλ
0
λτ + 3 exp[−{6λ + (λ / θ) τ }] dλ /
0
∞
∫ λτ + 5 exp[−{3λ +
(λ / θ) τ }] dλ
0
(E) None of A, B, C, or D 2.44 (3 points) Which of the following is equal to the probability of three claims in year 4 from this policyholder? ∞
(A) (1/6) ∫
λτ + 8 exp[−{3λ + (λ / θ) τ }] dλ /
0
∞
∫
∞
∞
0
0
λτ + 8 exp[−{3λ + (λ / θ) τ }] dλ /
0 ∞
(D)
∫
∫ λτ + 5 exp[−{3λ +
(λ / θ) τ }] dλ
0
(B) (1/6) ∫ λτ + 8 exp[−{4λ + (λ / θ) τ }] dλ / (C)
∞
∫ λτ + 5 exp[−{3λ +
∞
∫ λτ + 5 exp[−{3λ +
(λ / θ) τ }] dλ
(λ / θ) τ }] dλ
0
λτ + 8 exp[−{4λ + (λ / θ) τ }] dλ /
0
(E) None of A, B, C, or D
∞
∫ λτ + 5 exp[−{3λ + 0
(λ / θ) τ }] dλ
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 22
2.45 (3 points) For an individual high-tech company, the number of Workers Compensation Insurance claims per employee per year is Poisson with mean λ. Over the high-tech industry, λ is uniformly distributed from 0.002 to 0.008. You have the following experience for Initech, a high-tech company: Year Number of Employees Number of Workers Compensation Insurance Claims 1 2000 15 2 2400 19 3 1600 12 During year 4, Initech will have 1400 employees. Using Bühlmann-Straub Credibility, estimate the number of Workers Compensation Insurance Claims Initech will have in year 4. (A) 7 (B) 8 (C) 9 (D) 10 (E) 11 2.46 (3 points) You are given the following information for health insurance: (i) Annual claim counts for individual insureds follow a Poisson distribution. (ii) Three quarters of the insureds have expected annual claim frequency of 2. (iii) The other quarter of the insureds have expected annual claim frequency of 4. (iv) A particular insured has the following experience over 6 years: Number of Claims Number of Years 0 or 1 1 2 or 3 2 more than 3 3 Determine the Bayesian expected number of claims for the insured in the next year. (A) 2.7 (B) 2.9 (C) 3.1 (D) 3.3 (E) 3.5 2.47 (3 points) The number of claims each year for an individual insured has a Poisson distribution with parameter λ. The expected annual claim frequencies of the entire population of insureds is distributed by: 3λ2/8 for 0 < λ < 2. Chip Monk had 2 claims during the past 3 years. Using Buhlmann credibility, what is the estimate of Chipʼs future annual claim frequency? A. 1.25 B. 1.27 C. 1.29 D. 1.31 E. 1.33
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 23
Use the following information for the next three questions: (i) The number of claims incurred in a month by any employee of an insured follows a Poisson distribution with mean λ. (ii) For a given insured, λ is the same for all employees. (iii) The number of claims for different employees are independent. (iv) For a particular employer, you have the following experience: Month Number of employees Number of Claims 1 200 16 2 210 19 3 230 21 4 270 25 2.48 (2 points) The prior distribution of λ is 0.1 and 0.2 equally likely. Determine the Bühlmann-Straub credibility estimate of the number of claims in the next 12 months for 400 employees. (A) Less than 460 (B) At least 460, but less than 500 (C) At least 500, but less than 540 (D) At least 540, but less than 580 (E) At least 580 2.49 (2 points) The prior distribution of λ is 0.1 and 0.2 equally likely. Determine the Bayes Analysis estimate of the number of claims in the next 12 months for 400 employees. (A) Less than 460 (B) At least 460, but less than 500 (C) At least 500, but less than 540 (D) At least 540, but less than 580 (E) At least 580 2.50 (2 points) Using classical credibility, you wish the estimated mean frequency to be within 10% of its true value 95% of the time. Similar employers have a mean frequency of 15% per employee per month. Estimate of the number of claims in the next 12 months for 400 employees. (A) Less than 460 (B) At least 460, but less than 500 (C) At least 500, but less than 540 (D) At least 540, but less than 580 (E) At least 580
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 24
2.51 (4, 5/84, Q.36) (2 points) There is a new brand of chocolate chip cookies. You buy a box of 50 of the cookies, and discover that there are a total of 265 chips. Your prior research has led you to expect 4.80 chips per cookie, with a variance between brands of 0.20. You assume that for a given brand the number of chips in each cookie varies randomly and is given by a Poisson distribution. Using Buhlmann credibility, estimate the average number of chips per cookie for this new brand. A. Less than 4.9 B. At least 4.9, but less than 5.0 C. At least 5.0, but less than 5.1 D. At least 5.1, but less than 5.2 E. 5.2 or more 2.52 (4, 5/86, Q.42) (3 points) A group of drivers have their expected annual claim frequency uniformly distributed over the interval (0.10, 0.30). Each driver's observed number of claims per year follows a Poisson distribution. A particular driver from this group is observed to have three claims over the most recent five year period. Using Buhlmann credibility, what is the estimate of this driver's future claim frequency? A. Less than 0.21 B. At least 0.21, but less than 0.22 C. At least 0.22, but less than 0.23 D. At least 0.23, but less than 0.24 E. 0.24 or more. 2.53 (4, 5/88, Q.43) (2 points) The number of claims each year for an individual insured has a Poisson distribution. The expected annual claim frequencies of the entire population of insureds are uniformly distributed over the interval (0.0, 1.0). An individual's expected annual claim frequency is constant through time. An insured is selected at random. The insured is then observed to have no claims during a year. What is the posterior density function of the expected annual claim frequency θ for this insured? A. θ is uniformly distributed over (0, 1)
B. 3(1-θ)2 for 0 < θ < 1
C. e−θ / (1 - e-1) for 0 < θ < 1
D. eθ / (e-1) for 0 < θ < 1
E. θ has a beta distribution
2.54 (4, 5/88, Q.44) (2 points) The number of claims each year for an individual insured has a Poisson distribution. The expected annual claim frequencies of the entire population of insureds are uniformly distributed over the interval (0.0, 1.0). An individual's expected annual claim frequency is constant through time. A particular insured had four claims during the prior three years. Using Buhlmann credibility, what is the estimate of this insured's future annual claim frequency? A. Less than 0.65 B. At least 0.65, but less than 0.70 C. At least 0.70, but less than 0.75 D. At least 0.75, but less than 0.80 E. 0.80 or more 2.55 (4, 5/89, Q.41) (3 points) Assume an individual insured is selected at random from a population of insureds. The number of claims experienced in a given year by each insured follows a Poisson distribution. The mean value λ of the Poisson distribution is distributed across the population according to the following distribution: f(λ) = 3λ-4 over the interval (1, ∞). Given that a particular insured experienced a total of 20 claims in the previous 2 years, what is the Buhlmann credibility estimate of the future expected annual claim frequency for this particular insured? (Assume frequency is constant for each insured over time.) A. Less than 2 B. At least 2, but less than 4 C. At least 4, but less than 6 D. At least 6, but less than 8 E. 8 or more 2.56 (4, 5/90, Q.41) (2 points) Assume that the number of claims made by an individual insured follows a Poisson distribution. Assume also that the expected number of claims, λ, for insureds in the population has the probability density function f(λ) = 4 λ-5 for 1 < λ < ∞. What is the value of K used in Buhlmannʼs credibility formula for estimating the expected number of claims for an individual insured? A. K < 5.7 B. 5.7 < K < 5.8 C. 5.8 < K < 5.9 D. 5.9 < K < 6.0 E. 6.0 < K
2.57 (4, 5/90, Q.52) (2 points) The number of claims each year for an individual insured has a Poisson distribution. The expected annual claim frequency of the entire population of insureds is uniformly distributed over the interval (0, 1). An individualʼs expected claim frequency is constant through time. A particular insured had 3 claims during the prior three years. Using Buhlmann credibility, what is the estimate of this insured's future annual claim frequency? A. Less than 0.60 B. At least 0.60 but less than 0.65 C. At least 0.65 but less than 0.70 D. At least 0.70 but less than 0.75 E. At least 0.75 Use the following information for the next two questions:
• The claim count N for an individual insured has a Poisson distribution with mean λ. • λ is uniformly distributed between 1 and 3. 2.58 (4, 5/91, Q.42) (2 points) Find the probability that a randomly selected insured will have no claims. A. Less than 0.11 B. At least 0.11 but less than 0.13 C. At least 0.13 but less than 0.15 D. At least 0.15 but less than 0.17 E. At least 0.17 2.59 (4, 5/91, Q.43) (2 points) If an insured has one claim during a first period, use Buhlmann's credibility formula to estimate the expected number of claims for that insured in the next period. A. Less than 1.20 B. At least 1.20 but less than 1.40 C. At least 1.40 but less than l.60 D. At least 1.60 but less than 1.80 E. At least 1.80
2.60 (4B, 11/96, Q.20) (3 points) You are given the following:
• The number of claims for a single risk follows a Poisson distribution with mean λµ.
• λ and µ have a prior probability distribution with joint density function f(λ, µ) = 1, 0 < λ < 1, 0 < µ < 1.
Determine the value of Buhlmann's k.
A. Less than 5.5
B. At least 5.5, but less than 6.5
C. At least 6.5, but less than 7.5
D. At least 7.5, but less than 8.5
E. At least 8.5

Use the following information for the next two questions:
You are given the following:
• A large portfolio of automobile risks consists solely of youthful drivers.
• The number of claims for one driver during one exposure period follows a Poisson distribution with mean 4 - g, where g is the grade point average of the driver.
• The distribution of g within the portfolio is uniform on the interval [0, 4].
A driver is selected at random from the portfolio. During one exposure period, no claims are observed for this driver.

2.61 (4B, 5/97, Q.4) (2 points) Determine the posterior probability that the selected driver has a grade point average greater than 3.
A. Less than 0.15
B. At least 0.15, but less than 0.35
C. At least 0.35, but less than 0.55
D. At least 0.55, but less than 0.75
E. At least 0.75

2.62 (4B, 5/97, Q.5) (2 points) Determine the Buhlmann credibility estimate of the expected number of claims for this driver during the next exposure period.
A. Less than 0.375
B. At least 0.375, but less than 0.425
C. At least 0.425, but less than 0.475
D. At least 0.475, but less than 0.525
E. At least 0.525
2.63 (4B, 5/98, Q.2) (1 point) You are given the following:
• The number of claims for a single insured follows a Poisson distribution with mean λ.
• λ varies by insured and follows a Poisson distribution with mean µ.
Determine the value of Buhlmann's k.
A. 1
B. λ
C. µ
D. λ/µ
E. µ/λ

2.64 (4, 11/00, Q.3) (2.5 points) You are given the following for a dental insurer:
(i) Claim counts for individual insureds follow a Poisson distribution.
(ii) Half of the insureds are expected to have 2.0 claims per year.
(iii) The other half of the insureds are expected to have 4.0 claims per year.
A randomly selected insured has made 4 claims in each of the first two policy years.
Determine the Bayesian estimate of this insuredʼs claim count in the next (third) policy year.
(A) 3.2  (B) 3.4  (C) 3.6  (D) 3.8  (E) 4.0

2.65 (2 points) In the previous question, 4, 11/00, Q.3, what is the probability that this insuredʼs claim count in the next (third) policy year is 1?
A. Less than 8%
B. At least 8%, but less than 9%
C. At least 9%, but less than 10%
D. At least 10%, but less than 11%
E. At least 11%

2.66 (2 points) In 4, 11/00, Q.3, using Buhlmann Credibility, estimate this insuredʼs claim count in the next (third) policy year.
(A) 3.2  (B) 3.4  (C) 3.6  (D) 3.8  (E) 4.0

2.67 (4, 5/01, Q.18) (2.5 points) You are given:
(i) An individual automobile insured has annual claim frequencies that follow a Poisson distribution with mean λ.
(ii) An actuaryʼs prior distribution for the parameter λ has probability density function:
π(λ) = (0.5) 5e^−5λ + (0.5) (1/5)e^−λ/5.
(iii) In the first policy year, no claims were observed for the insured.
Determine the expected number of claims in the second policy year.
(A) 0.3  (B) 0.4  (C) 0.5  (D) 0.6  (E) 0.7

2.68 (3 points) In the previous question, using Buhlmann Credibility, determine the expected number of claims in the second policy year.
(A) 0.26  (B) 0.28  (C) 0.30  (D) 0.32  (E) 0.34
Use the following information for 4, 5/01, questions 37 and 38. You are given the following information about workersʼ compensation coverage: (i) The number of claims for an employee during the year follows a Poisson distribution with mean (100 - p)/100, where p is the salary (in thousands) for the employee. (ii) The distribution of p is uniform on the interval (0, 100]. 2.69 (4, 5/01, Q.37) (2.5 points) An employee is selected at random. No claims were observed for this employee during the year. Determine the posterior probability that the selected employee has salary greater than 50 thousand. (A) 0.5 (B) 0.6 (C) 0.7 (D) 0.8 (E) 0.9 2.70 (4, 5/01, Q.38) (2.5 points) An employee is selected at random. During the last 4 years, the employee has had a total of 5 claims. Determine the Bühlmann credibility estimate for the expected number of claims the employee will have next year. (A) 0.6 (B) 0.8 (C) 1.0 (D) 1.1 (E) 1.2
2.71 (4, 5/05, Q.6 & 2009 Sample Q.177) (2.9 points) You are given:
(i) Claims are conditionally independent and identically Poisson distributed with mean Θ.
(ii) The prior distribution function of Θ is: F(θ) = 1 - 1/(1 + θ)^2.6, θ > 0.
Five claims are observed.
Determine the Bühlmann credibility factor.
(A) Less than 0.6
(B) At least 0.6, but less than 0.7
(C) At least 0.7, but less than 0.8
(D) At least 0.8, but less than 0.9
(E) At least 0.9
2.72 (4, 5/05, Q.14 & 2009 Sample Q.184) (2.9 points) You are given: (i) Annual claim frequencies follow a Poisson distribution with mean λ. (ii) The prior distribution of λ has probability density function: π(λ) = (0.4)e−λ/6/6 + (0.6)e−λ/12/12, λ > 0. Ten claims are observed for an insured in Year 1. Determine the Bayesian expected number of claims for the insured in Year 2. (A) 9.6 (B) 9.7 (C) 9.8 (D) 9.9 (E) 10.0 2.73 (3 points) In the previous question, use Buhlmann Credibility in order to determine the expected number of claims for the insured in Year 2. (A) 9.6 (B) 9.7 (C) 9.8 (D) 9.9 (E) 10.0
2.74 (CAS3, 5/05, Q.17) (2.5 points) An insurer selects risks from a population that consists of three independent groups.
• The claims generation process for each group is Poisson.
• The first group consists of 50% of the population. These individuals are expected to generate one claim per year.
• The second group consists of 35% of the population. These individuals are expected to generate two claims per year.
• Individuals in the third group are expected to generate three claims per year.
A certain insured has two claims in year 1.
What is the probability that this insured has more than two claims in year 2?
A. Less than 21%
B. At least 21%, but less than 25%
C. At least 25%, but less than 29%
D. At least 29%, but less than 33%
E. 33% or more
2.75 (SOA M, 5/05, Q.39 & 2009 Sample Q.170) (2.5 points) In a certain town the number of common colds an individual will get in a year follows a Poisson distribution that depends on the individualʼs age and smoking status. The distribution of the population and the mean number of colds are as follows:
                        Proportion of population    Mean number of colds
Children                        0.30                        3
Adult Non-Smokers               0.60                        1
Adult Smokers                   0.10                        4
Calculate the conditional probability that a person with exactly 3 common colds in a year is an adult smoker.
(A) 0.12  (B) 0.16  (C) 0.20  (D) 0.24  (E) 0.28

2.76 (4, 11/05, Q.19 & 2009 Sample Q.230) (2.9 points) For a portfolio of independent risks, the number of claims for each risk in a year follows a Poisson distribution with means given in the following table:
Class    Mean Number of Claims per Risk    Number of Risks
1                      1                         900
2                     10                          90
3                     20                          10
You observe x claims in Year 1 for a randomly selected risk. The Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is 11.983.
Determine x.
(A) 13  (B) 14  (C) 15  (D) 16  (E) 17

2.77 (4, 11/06, Q.2 & 2009 Sample Q.247) (2.9 points) An insurance company sells three types of policies with the following characteristics:
Type of Policy    Proportion of Total Policies    Annual Claim Frequency
I                           5%                     Poisson with λ = 0.25
II                         20%                     Poisson with λ = 0.50
III                        75%                     Poisson with λ = 1.00
A randomly selected policyholder is observed to have a total of one claim for Year 1 through Year 4.
For the same policyholder, determine the Bayesian estimate of the expected number of claims in Year 5.
(A) Less than 0.4
(B) At least 0.4, but less than 0.5
(C) At least 0.5, but less than 0.6
(D) At least 0.6, but less than 0.7
(E) At least 0.7
2.78 (4, 11/06, Q.19 & 2009 Sample Q.263) (2.9 points) You are given:
(i) The number of claims incurred in a month by any insured follows a Poisson distribution with mean λ.
(ii) The claim frequencies of different insureds are independent.
(iii) The prior distribution of λ is Weibull with θ = 0.1 and τ = 2.
(iv) Some values of the gamma function are Γ(0.5) = 1.77245, Γ(1) = 1, Γ(1.5) = 0.88623, Γ(2) = 1.
(v)
Month    Number of Insureds    Number of Claims
1               100                  10
2               150                  11
3               250                  14
Determine the Bühlmann-Straub credibility estimate of the number of claims in the next 12 months for 300 insureds.
(A) Less than 255
(B) At least 255, but less than 275
(C) At least 275, but less than 295
(D) At least 295, but less than 315
(E) At least 315

2.79 (4, 11/06, Q.23 & 2009 Sample Q.267) (2.9 points) You are given:
(i) The annual number of claims for an individual risk follows a Poisson distribution with mean λ.
(ii) For 75% of the risks, λ = 1.
(iii) For 25% of the risks, λ = 3.
A randomly selected risk had r claims in Year 1. The Bayesian estimate of this riskʼs expected number of claims in Year 2 is 2.98.
Determine the Bühlmann credibility estimate of the expected number of claims for this risk in Year 2.
(A) Less than 1.9
(B) At least 1.9, but less than 2.3
(C) At least 2.3, but less than 2.7
(D) At least 2.7, but less than 3.1
(E) At least 3.1
2.80 (4, 5/07, Q.6) (2.5 points) An insurance company sells two types of policies with the following characteristics:
Type of Policy    Proportion of Total Policies    Poisson Annual Claim Frequency
I                          θ                              λ = 0.50
II                       1 - θ                            λ = 1.50
A randomly selected policyholder is observed to have one claim in Year 1.
For the same policyholder, determine the Bühlmann credibility factor Z for Year 2.
(A) (θ - θ²) / (1.5 - θ²)
(B) (1.5 - θ) / (1.5 - θ²)
(C) (2.25 - 2θ) / (1.5 - θ²)
(D) (2θ - θ²) / (1.5 - θ²)
(E) (2.25 - 2θ²) / (1.5 - θ²)
Solutions to Problems:

2.1. D. Chance of observing 4 accidents is λ⁴e^−λ/24. Weight the chances of observing 4 accidents by the a priori probability of λ.
Type       A Priori Probability    Poisson Parameter    Chance of 4 Claims
A                  0.6                    1                  1.53%
B                  0.3                    2                  9.02%
C                  0.1                    3                 16.80%
Average                                                       5.31%
2.2. E. Since we are mixing Poissons, EPV = Overall mean = 1.5. VHM = 2.7 - 1.5² = 0.45.
K = EPV/VHM = 3.33. Z = 1/(1 + 3.33) = 23.1%.
Type       A Priori Probability    Poisson Parameter    Square of Mean
A                  0.6                    1                   1
B                  0.3                    2                   4
C                  0.1                    3                   9
Average                                  1.5                 2.7
New Estimate = (23.1%)(4) + (76.9%)(1.5) = 2.08.

2.3. C. Chance of observing 4 accidents is λ⁴e^−λ/24.
Type    A Priori Prob.    Poisson Param.    Chance of 4 Claims    Probability Weight    Posterior Prob.    Mean
A            0.6                1                 0.0153                0.0092               0.1733            1
B            0.3                2                 0.0902                0.0271               0.5101            2
C            0.1                3                 0.1680                0.0168               0.3166            3
Sum          1.0               1.50                                      0.0531               1.0000           2.14
2.4. C. EPV = 1.5. VHM = 2.7 - 1.5² = 0.45. K = 3.33. (Same as in the solution before last.)
Z = 10/(10 + 3.33) = 75%. New Estimate = (75%)(3.7) + (25%)(1.5) = 3.15.

2.5. B. Chance of observing 37 accidents in ten years is (10λ)³⁷e^−10λ/37!.
Type    A Priori Prob.    Poisson Param.    Chance of 37 Claims    Probability Weight    Posterior Prob.    Mean
A            0.6                1                 3.299e-11              0.00000              0.0000            1
B            0.3                2                 2.058e-4               0.00006              0.0198            2
C            0.1                3                 3.061e-2               0.00306              0.9802            3
Sum          1.0               1.50                                       0.00312              1.0000           2.98
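As a quick numerical cross-check of solutions 2.1-2.5 (this sketch is not part of the original solutions, and the function names are mine), the following Python script reproduces both the Bayesian and the Buhlmann estimates for the three-type Poisson mixture, using only the standard library:

from math import exp, factorial

priors = {"A": (0.6, 1.0), "B": (0.3, 2.0), "C": (0.1, 3.0)}  # (a priori probability, Poisson lambda)

def poisson_pmf(n, mean):
    return mean**n * exp(-mean) / factorial(n)

def bayes_estimate(claims, years):
    # Posterior-weighted mean of lambda given `claims` claims observed over `years` years.
    weights = {t: p * poisson_pmf(claims, lam * years) for t, (p, lam) in priors.items()}
    total = sum(weights.values())
    return sum(weights[t] / total * priors[t][1] for t in priors)

def buhlmann_estimate(claims, years):
    mean = sum(p * lam for p, lam in priors.values())        # 1.5
    second = sum(p * lam**2 for p, lam in priors.values())   # 2.7
    epv, vhm = mean, second - mean**2                        # mixing Poissons: EPV = overall mean
    z = years / (years + epv / vhm)
    return z * (claims / years) + (1 - z) * mean

print(bayes_estimate(4, 1))       # ~2.14  (solution 2.3)
print(buhlmann_estimate(4, 1))    # ~2.08  (solution 2.2)
print(bayes_estimate(37, 10))     # ~2.98  (solution 2.5)
print(buhlmann_estimate(37, 10))  # ~3.15  (solution 2.4)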
2.6. E. EPV = Expected Value of λ = overall mean = 5.

2.7. B. For the uniform distribution on the interval (a, b), the variance = (b - a)²/12. In this case with a = 3 and b = 7, the variance is: (7 - 3)²/12 = 4/3.

2.8. A. K = EPV/VHM = 5/(4/3) = 3.75. For one year, Z = 1/(1 + 3.75) = 0.211.
New estimate is: (7)(0.211) + (5)(1 - 0.211) = 5.42.

2.9. D. The posterior distribution of λ is proportional to the product of the chance of the observation of 7 claims in one year times the a priori probability of λ. The former is λ⁷e^−λ/7! for a Poisson. The latter is 0.25 for λ between 3 and 7, and zero elsewhere. The posterior distribution can be obtained by dividing 0.25λ⁷e^−λ/7! by its integral from 3 to 7.
(0.25/7!) ∫_3^7 λ⁷e^−λ dλ = (0.25/7!) Γ(8) {Γ(8; 7) - Γ(8; 3)}.
Therefore, the posterior density of λ is given by: λ⁷e^−λ / {Γ(8)[Γ(8; 7) - Γ(8; 3)]}. The posterior mean is:
∫_3^7 λ λ⁷e^−λ dλ / {Γ(8)[Γ(8; 7) - Γ(8; 3)]} = Γ(9){Γ(9; 7) - Γ(9; 3)} / (Γ(8){Γ(8; 7) - Γ(8; 3)}) = 8{Γ(9; 7) - Γ(9; 3)} / {Γ(8; 7) - Γ(8; 3)}.
Comment: This is a difficult question. By use of a computer: 8{Γ(9; 7) - Γ(9; 3)} / {Γ(8; 7) - Γ(8; 3)} = (8)(0.2709 - 0.0038)/(0.4013 - 0.0119) = 5.49.

2.10. E. For each insured, its variance is λ, thus EPV = E[λ] = mean of g = 0.09. VHM = Var[λ] = variance of g = 0.003. K = EPV/VHM = 0.09/0.003 = 30. Z = 3/(3 + K) = 3/33 = 1/11. The observed frequency is 2/3.
Estimate of future frequency = (1/11)(2/3) + (10/11)(0.09) = 0.142.
Comment: In order to perform Buhlmann Credibility, we do not need to know the form of g, but only its first two moments, or equivalently its mean and variance.

2.11. D. E[λ] = the mean of the prior mixed exponential = weighted average of the means of the two exponential distributions = (0.7)(1/20) + (0.3)(1/10) = 6.5%.
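The 5.49 in the comment to solution 2.9 can be reproduced with a few lines of Python (a sketch of mine, not part of the original guide), using the finite-sum formula for the Incomplete Gamma Function at integer alpha given in the Gamma Function section later in this study guide:

from math import exp, factorial

def inc_gamma(n, x):
    # Gamma(n; x) = 1 - sum_{i=0}^{n-1} x^i e^(-x) / i!, for positive integer n
    return 1.0 - sum(x**i * exp(-x) / factorial(i) for i in range(n))

posterior_mean = 8 * (inc_gamma(9, 7) - inc_gamma(9, 3)) / (inc_gamma(8, 7) - inc_gamma(8, 3))
print(posterior_mean)  # ~5.49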
2.12. D. ∫_0.1^∞ Π(λ) dλ = 14 ∫_0.1^∞ e^−20λ dλ + 3 ∫_0.1^∞ e^−10λ dλ = (14e^−2/20) + (3e^−1/10) = 20.51%.
Alternately, S(0.1) for the mixed distribution is the weighted average of S(0.1) for the two Exponentials:
(0.7)(e^−(20)(0.1)) + (0.3)(e^−(10)(0.1)) = 0.7e^−2 + 0.3e^−1 = 20.51%.

2.13. C. Posterior distribution is: Π(λ)e^−λ / ∫Π(λ)e^−λ dλ = {14e^−21λ + 3e^−11λ} / {14 ∫_0^∞ e^−21λ dλ + 3 ∫_0^∞ e^−11λ dλ}
= {14e^−21λ + 3e^−11λ} / {(14/21) + (3/11)} = {14e^−21λ + 3e^−11λ} / 0.9394.
(1.0645) ∫_0.1^∞ {14e^−21λ + 3e^−11λ} dλ = (1.0645){(14e^−2.1/21) + (3e^−1.1/11)} = 18.35%.
2.14. E. Given λ, the chance of the observation is: e^−λ.
Therefore, by Bayes Theorem the posterior distribution is: Π(λ)e^−λ / ∫Π(λ)e^−λ dλ.
Therefore the posterior mean is: ∫λΠ(λ)e^−λ dλ / ∫Π(λ)e^−λ dλ
= {14 ∫_0^∞ λe^−21λ dλ + 3 ∫_0^∞ λe^−11λ dλ} / {14 ∫_0^∞ e^−21λ dλ + 3 ∫_0^∞ e^−11λ dλ}
= {(14/21²) + (3/11²)} / {(14/21) + (3/11)} = 0.05654/0.9394 = 6.02%.
Comment: Similar to 4, 5/01, Q.18.
2.15. B. Given λ, the chance of the observation is: λe^−λ.
Therefore, by Bayes Theorem the posterior distribution is: Π(λ)λe^−λ / ∫Π(λ)λe^−λ dλ.
Therefore the posterior mean is: ∫λ²Π(λ)e^−λ dλ / ∫λΠ(λ)e^−λ dλ
= {14 ∫_0^∞ λ²e^−21λ dλ + 3 ∫_0^∞ λ²e^−11λ dλ} / {14 ∫_0^∞ λe^−21λ dλ + 3 ∫_0^∞ λe^−11λ dλ}
= {(2)(14/21³) + (2)(3/11³)} / {(14/21²) + (3/11²)} = 0.007531/0.05654 = 13.3%.
Comment: For Gamma type integrals, as discussed in the section on the Gamma Function:
∫_0^∞ t^(α-1) e^−t/θ dt = Γ(α)θ^α, or ∫_0^∞ t^n e^−ct dt = n!/c^(n+1).
[Figure: the prior (dashed) and posterior distributions of lambda.]
2.16. D. Given λ, the chance of the observation is: λ²e^−λ/2.
Therefore, by Bayes Theorem the posterior distribution is: Π(λ)λ²e^−λ / ∫Π(λ)λ²e^−λ dλ.
Therefore the posterior mean is: ∫λ³Π(λ)e^−λ dλ / ∫λ²Π(λ)e^−λ dλ
= {14 ∫_0^∞ λ³e^−21λ dλ + 3 ∫_0^∞ λ³e^−11λ dλ} / {14 ∫_0^∞ λ²e^−21λ dλ + 3 ∫_0^∞ λ²e^−11λ dλ}
= {(6)(14/21⁴) + (6)(3/11⁴)} / {(2)(14/21³) + (2)(3/11³)} = 0.001661/0.007531 = 22.1%.
Comment: If n claims are observed, then the posterior distribution is:
{14λ^n e^−21λ + 3λ^n e^−11λ} / {(14)(Γ(n+1)/21^(n+1)) + (3)(Γ(n+1)/11^(n+1))}.
If n claims are observed, then the posterior mean is:
{(14)(Γ(n+2)/21^(n+2)) + (3)(Γ(n+2)/11^(n+2))} / {(14)(Γ(n+1)/21^(n+1)) + (3)(Γ(n+1)/11^(n+1))}
= (n+1){(14/21^(n+2)) + (3/11^(n+2))} / {(14/21^(n+1)) + (3/11^(n+1))}.
For example, for n = 0, 1, 2, 3, 4, and 5, the posterior means are: 0.060, 0.133, 0.221, 0.319, 0.421, and 0.523.

2.17. B. Since we are mixing Poissons, EPV = a priori mean = 0.065.
Second moment of the hypothetical means = ∫λ²Π(λ) dλ = 14 ∫_0^∞ λ²e^−20λ dλ + 3 ∫_0^∞ λ²e^−10λ dλ = (14)(2/20³) + (3)(2/10³) = 0.0095.
VHM = 0.0095 - 0.065² = 0.005275. K = EPV/VHM = 0.065/0.005275 = 12.3.
Alternately, the distribution of λ is a mixed exponential, with 2nd moment a weighted average of those of the individual distributions: (0.7)(2/20²) + (0.3)(2/10²) = 0.0095. Proceed as before.

2.18. C. One year of data is given a credibility of: 1/(1 + 12.3) = 7.5%. The a priori mean is 0.065.
If 2 claims are observed, the estimated future frequency is: (0.075)(2) + (0.925)(0.065) = 0.210.
Comment: If n claims are observed, then the estimated future frequency is: 0.075n + (0.925)(0.065).
Number of Claims    Buhlmann Credibility Estimate    Bayes Analysis Estimate
0                           6.0%                            6.0%
1                          13.5%                           13.3%
2                          21.0%                           22.1%
3                          28.5%                           31.9%
4                          36.0%                           42.1%
2.19. C. ∫_a^b θ f(θ) dθ = {(d+1)/(b^(d+1) - a^(d+1))} θ^(d+2)/(d+2), evaluated from a to b,
= {(d+1)/(d+2)} {b^(d+2) - a^(d+2)} / {b^(d+1) - a^(d+1)}.

2.20. E. For an insured with a Poisson annual frequency of θ, over Y years his frequency is Poisson with mean θY. Therefore the chance of the observation is: e^−Yθ (Yθ)^C / C!.
The posterior distribution is proportional to the product of the a priori probability and the chance of the observation: {(d+1)θ^d / (b^(d+1) - a^(d+1))} e^−Yθ (Yθ)^C / C!.
Therefore, the posterior distribution is proportional to θ^(d+C) e^−Yθ. Since the density has support [a, b], we must integrate from a to b in order to calculate the constant we must divide by in order that the posterior density integrates to unity as required.
∫_a^b θ^(d+C) e^−Yθ dθ = ∫_0^b θ^(d+C) e^−Yθ dθ - ∫_0^a θ^(d+C) e^−Yθ dθ = {Γ(d+C+1; bY) - Γ(d+C+1; aY)} Γ(d+C+1) / Y^(d+C+1).
Thus the posterior distribution is: Y^(d+C+1) θ^(d+C) e^−Yθ / [{Γ(d+C+1; bY) - Γ(d+C+1; aY)} Γ(d+C+1)].
Comment: See the next section in order to see how to do the required integrals, related to the Gamma Distribution and the Incomplete Gamma Function.

2.21. E. We want the mean of the posterior distribution determined in the previous question. We need to integrate the posterior distribution times θ, over its support [a, b].
∫_a^b {Y^(d+C+1) / [{Γ(d+C+1; bY) - Γ(d+C+1; aY)} Γ(d+C+1)]} θ^(d+C+1) e^−Yθ dθ
= {Y^(d+C+1) / [{Γ(d+C+1; bY) - Γ(d+C+1; aY)} Γ(d+C+1)]} {Γ(d+C+2; bY) - Γ(d+C+2; aY)} Γ(d+C+2) / Y^(d+C+2)
= {(d+C+1)/Y} {Γ(d+C+2; bY) - Γ(d+C+2; aY)} / {Γ(d+C+1; bY) - Γ(d+C+1; aY)}.

2.22. E. From the solution to a prior question, the prior mean is:
{(d+1)/(d+2)} {b^(d+2) - a^(d+2)} / {b^(d+1) - a^(d+1)} = (0.5/1.5)(0.6^1.5 - 0.2^1.5)/(0.6^0.5 - 0.2^0.5) = 0.382.
2.23. A. From the solution to a previous question, the posterior distribution is:
Y^(d+C+1) θ^(d+C) e^−Yθ / [{Γ(d+C+1; bY) - Γ(d+C+1; aY)} Γ(d+C+1)] = 6^2.5 θ^1.5 e^−6θ / [{Γ(2.5; 3.6) - Γ(2.5; 1.2)} Γ(2.5)]
= 88.182 θ^1.5 e^−6θ / {(0.794 - 0.209)(1.329)} = 113.4 θ^1.5 e^−6θ, with support [0.2, 0.6].

2.24. D. From a previous solution, the mean of the posterior distribution is:
{(d+C+1)/Y} {Γ(d+C+2; bY) - Γ(d+C+2; aY)} / {Γ(d+C+1; bY) - Γ(d+C+1; aY)} = (2.5/6){Γ(3.5; 3.6) - Γ(3.5; 1.2)} / {Γ(2.5; 3.6) - Γ(2.5; 1.2)}
= (0.4167)(0.592 - 0.066)/(0.794 - 0.209) = 0.375.
Comment: The observation of a frequency of 1/3 has reduced the posterior estimate to 0.375 from the prior estimate of 0.382.

2.25. A. From the solution to a previous question, the mean of the posterior distribution is:
{(d+C+1)/Y} {Γ(d+C+2; bY) - Γ(d+C+2; aY)} / {Γ(d+C+1; bY) - Γ(d+C+1; aY)} = (C/Y){Γ(3; ∞) - Γ(3; 0)} / {Γ(2; ∞) - Γ(2; 0)} = (C/Y)(1 - 0)/(1 - 0) = C/Y = 1/3.
Comments: Note that 1/θ on [0, ∞) is not a proper density, since it has an infinite integral. Nevertheless, such “improper priors” are used in Bayesian Analysis. Note that the posterior estimate is the same as the observed frequency of 2/6 = 1/3. This will always be the case for this choice of a, b and d; this prior is therefore referred to as the “non-informative prior” or “vague prior” for a Poisson process. Note that Γ(α; ∞) = 1 and Γ(α; 0) = 0; this is why the Gamma Distribution with support from 0 to ∞ can be defined by F(x) = Γ(α; x/θ).

2.26. A. EPV = E[λ] = overall mean = 0.05. VHM = Var[λ] = (8% - 2%)²/12 = 0.0003.
K = EPV/VHM = 0.05/0.0003 = 166.7. For five years, Z = 5/(5 + 166.7) = 2.9%.
The a priori frequency is 5%, while the observed frequency is: 2/5 = 40%.
Thus the new estimate is: (2.9%)(40%) + (97.1%)(5%) = 6.0%.
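The 0.375 in solution 2.24 can also be checked directly by numerical integration, without any incomplete Gamma values. The following sketch (mine, not part of the original solutions) integrates the unnormalized posterior θ^1.5 e^−6θ over its support [0.2, 0.6] with a simple midpoint rule:

from math import exp

def integrate(f, a, b, steps=100000):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

weight = lambda t: t**1.5 * exp(-6 * t)              # unnormalized posterior density
mean = integrate(lambda t: t * weight(t), 0.2, 0.6) / integrate(weight, 0.2, 0.6)
print(mean)  # ~0.375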
2.27. D. The chance of observing no claims over five years, given λ, is e^−5λ. The prior density of λ is 16.67 for 0.02 < λ < 0.08. Therefore, the posterior density of λ is:
16.67e^−5λ / ∫_0.02^0.08 16.67e^−5λ dλ = 21.32e^−5λ for 0.02 < λ < 0.08.
The mean of the posterior distribution is:
∫_0.02^0.08 λ 21.32e^−5λ dλ = -21.32(λe^−5λ/5 + e^−5λ/25), evaluated from 0.02 to 0.08, = 4.85%.

2.28. B. The posterior density is: 21.32e^−5λ for 0.02 < λ < 0.08.
Prob[λ < 0.05] = ∫_0.02^0.05 21.32e^−5λ dλ = -21.32e^−5λ/5, evaluated from 0.02 to 0.05, = 53.7%.

2.29. C. The chance of observing one claim over five years, given λ, is 5λe^−5λ. The prior density of λ is 16.67 for 0.02 < λ < 0.08. Therefore, the posterior density of λ is:
16.67(5λe^−5λ) / ∫_0.02^0.08 16.67(5λe^−5λ) dλ = 16.67(5λe^−5λ)/0.1896 = 439.6λe^−5λ for 0.02 < λ < 0.08.
The mean of the posterior distribution is:
∫_0.02^0.08 λ 439.6λe^−5λ dλ = -439.6(λ²e^−5λ/5 + 2λe^−5λ/25 + 2e^−5λ/125), evaluated from 0.02 to 0.08, = 5.47%.

2.30. D. The posterior density is: 439.6λe^−5λ for 0.02 < λ < 0.08.
Prob[λ < 0.05] = ∫_0.02^0.05 439.6λe^−5λ dλ = -439.6(λe^−5λ/5 + e^−5λ/25), evaluated from 0.02 to 0.05, = 38.4%.
2.31. C. The posterior density is: 21.32e^−5λ for 0.02 < λ < 0.08. The probability of observing no claims given λ is e^−λ. The probability of observing no claims is:
∫_0.02^0.08 e^−λ 21.32e^−5λ dλ = -21.32(e^−6λ/6), evaluated from 0.02 to 0.08, = 95.3%.
Comment: While in this case the probability of observing no claims is close to exp[-future mean frequency], they are not equal. To more decimal places, the future mean frequency is 0.048502. e^−0.048502 = 0.95266. On the other hand, the probability of observing no claims is 0.95280.

2.32. A. The posterior density is: 21.32e^−5λ for 0.02 < λ < 0.08. The probability of observing one claim given λ is λe^−λ. The probability of observing one claim is:
∫_0.02^0.08 λe^−λ 21.32e^−5λ dλ = -21.32(λe^−6λ/6 + e^−6λ/36), evaluated from 0.02 to 0.08, = 4.59%.

2.33. A. E[λ] = (α/(α - 1))θ = (3/2)(0.8) = 1.2. E[λ²] = αθ²/(α - 2) = 1.92. Var[λ] = 1.92 - 1.2² = 0.48.
EPV = E[λ] = 1.2. VHM = Var[λ] = 0.48. Thus K = EPV/VHM = 1.2/0.48 = 2.5. Z = 7/(7 + K) = 0.737.
Estimated future annual frequency = (0.737)(4/7) + (1 - 0.737)(1.2) = 73.6%.
Comment: Note that the distribution of λ has support λ > 0.8, and therefore the estimated future annual frequency is outside the range of hypotheses. This can happen when using Buhlmann Credibility, a linear approximation to Bayes Analysis, but this can not happen when using Bayes Analysis.
2.34. B. Prob(Observation | λ) = (7λ)⁴e^−7λ/4! = 100.04λ⁴e^−7λ. Prior density of λ is: αθ^α/λ^(α+1) = 1.536/λ⁴, λ > 0.8.
Prob(observation) = ∫_0.8^∞ (100.04λ⁴e^−7λ)(1.536/λ⁴) dλ = 153.7 ∫_0.8^∞ e^−7λ dλ = 8.1%.
Comment: If in this case one observed for example 6 claims in 7 years, the answer would involve Incomplete Gamma Functions. However, if one observed for example 3 claims in 7 years, the answer would involve doing Exponential Integrals; Exponential Integrals involve an integral of e^−x times a negative integral power of x from t to ∞. See Handbook of Mathematical Functions.
Here is the chance of the observation for various numbers of claims observed over 7 years:
# Claims    Prob. of Obser.    # Claims    Prob. of Obser.    # Claims    Prob. of Obser.
0                0.12%             8           10.73%            15            1.59%
1                0.75%             9            8.92%            16            1.20%
2                2.36%            10            7.01%            17            0.92%
3                5.01%            11            5.30%            18            0.72%
4                8.12%            12            3.93%            19            0.57%
5               10.72%            13            2.89%            20            0.45%
6               12.06%            14            2.13%            21            0.37%
7               11.96%
2.35. D. Prob(Observation | λ) = (7λ)⁴e^−7λ/4! = 100.04λ⁴e^−7λ. Prior density of λ is: αθ^α/λ^(α+1) = 1.536/λ⁴, λ > 0.8.
Therefore the posterior density is proportional to: (100.04λ⁴e^−7λ)(1.536/λ⁴) ~ e^−7λ, λ > 0.8.
Posterior density is: e^−7λ / ∫_0.8^∞ e^−7λ dλ = 1893e^−7λ, λ > 0.8.
Mean of the posterior distribution = ∫_0.8^∞ λ 1893e^−7λ dλ = -1893(λe^−7λ/7 + e^−7λ/49), evaluated from 0.8 to ∞, = 94.3%.
Comment: If in this case one observed for example 6 claims in 7 years, the estimate would involve Incomplete Gamma Functions. However, if one observed for example 3 claims in 7 years, the estimate would involve doing Exponential Integrals; Exponential Integrals involve an integral of e^−x times a negative integral power of x from t to ∞. See Handbook of Mathematical Functions.
[Figure: the estimated future annual frequencies for various numbers of claims observed over 7 years, with Bayes Analysis as the dots and Buhlmann Credibility as the straight line.]
2.36. A. The chance of observing 2 claims in a year is: λ²e^−λ/2!. Therefore, the chance of observing 2 claims in each of the first two years is: (λ²e^−λ/2!)².
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.00288                  0.00144               0.228               0.400
B             0.5000               0.00976                  0.00488               0.772               0.600
Overall                                                      0.0063                1.000               0.554
Comment: Similar to 4, 11/00, Q.3.

2.37. A. The chance of observing 1 claim in a year is: λe^−λ. The chance of observing 3 claims in a year is: λ³e^−λ/3!. Therefore, the chance of the observation is: (λe^−λ)(λ³e^−λ/3!) = λ⁴e^−2λ/6.
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.00192                  0.00096               0.228               0.400
B             0.5000               0.00651                  0.00325               0.772               0.600
Overall                                                      0.0042                1.000               0.554
Comment: Since the chances of observation are proportional to those in the previous question, we get the same posterior distribution and the same answer.

2.38. A. Over two years we have a Poisson with mean 2λ. Therefore, the chance of the observation is: e^−2λ(2λ)⁴/4! = (2/3)λ⁴e^−2λ.
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.00767                  0.00383               0.228               0.400
B             0.5000               0.02602                  0.01301               0.772               0.600
Overall                                                      0.0168                1.000               0.554
Comment: Since the chances of observation are proportional to those in the previous questions, we get the same posterior distribution and the same answer. In this question we have somewhat less information about what occurred, than in the previous questions. In general, one must be careful to use the exact wording of the observation, even though in these particular questions the result of Bayesian Analysis only depended on the sum of the claims over the first two years.
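The claim that all three observations lead to the same posterior and the same estimate of 0.554 is easy to confirm numerically. The following is a small Python sketch of mine (not part of the original solutions) for the two-class example with λ = 0.4 or 0.6 and equal prior weights:

from math import exp, factorial

def pmf(n, lam):
    return lam**n * exp(-lam) / factorial(n)

def posterior_mean(likelihood):
    # 50/50 prior on lambda = 0.4 or 0.6; `likelihood` maps lambda -> P(observation | lambda)
    w = {lam: 0.5 * likelihood(lam) for lam in (0.4, 0.6)}
    total = sum(w.values())
    return sum(w[lam] / total * lam for lam in w)

print(posterior_mean(lambda lam: pmf(2, lam)**2))             # 2 claims each year: ~0.554 (2.36)
print(posterior_mean(lambda lam: pmf(1, lam) * pmf(3, lam)))  # 1 claim then 3 claims: ~0.554 (2.37)
print(posterior_mean(lambda lam: pmf(4, 2 * lam)))            # 4 claims in two years: ~0.554 (2.38)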
2.39. E. The chance of observing at most 2 claims in a year is: f(0) + f(1) + f(2) = e^−λ + λe^−λ + λ²e^−λ/2!. Therefore, the chance of observing at most 2 claims in each of the first two years is: (e^−λ + λe^−λ + λ²e^−λ/2!)².
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.98421                  0.49211               0.508               0.400
B             0.5000               0.95430                  0.47715               0.492               0.600
Overall                                                      0.9693                1.000               0.498

2.40. D. Over two years we have a Poisson with mean 2λ. Therefore, the chance of the observation is: e^−2λ + 2λe^−2λ + (2λ)²e^−2λ/2!.
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.95258                  0.47629               0.520               0.400
B             0.5000               0.87949                  0.43974               0.480               0.600
Overall                                                      0.9160                1.000               0.496

2.41. C. The chance of observing at least 2 claims in a year is: 1 - {f(0) + f(1)} = 1 - {e^−λ + λe^−λ}. Therefore, the chance of observing at least 2 claims in each of the first two years is: (1 - {e^−λ + λe^−λ})².
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.00379                  0.00189               0.203               0.400
B             0.5000               0.01486                  0.00743               0.797               0.600
Overall                                                      0.0093                1.000               0.559
2.42. B. Probabilities of the Observation: 4^r e^−4/r!, 7^r e^−7/r!. Probability Weights: 0.8(4^r e^−4)/r!, 0.2(7^r e^−7)/r!.
Posterior Distribution: 0.8(4^r e^−4) / {0.8(4^r e^−4) + 0.2(7^r e^−7)}, 0.2(7^r e^−7) / {0.8(4^r e^−4) + 0.2(7^r e^−7)}.
Let w = posterior probability that λ = 4. We are given that: 5.97 = (4)w + (7)(1 - w). ⇒ w = 0.343.
⇒ 0.343 = 0.8(4^r e^−4) / {0.8(4^r e^−4) + 0.2(7^r e^−7)} = 4/{4 + (7/4)^r e^−3}. ⇒ 0.343(7/4)^r e^−3 = 2.628.
⇒ (7/4)^r = 153.9. ⇒ r = ln(153.9)/ln(7/4) = 9.
Comment: Similar to 4, 11/06, Q.23.

2.43. C. The chance of the observation given λ is: (λe^−λ)(λ²e^−λ/2)(λ³e^−λ/6) = λ⁶e^−3λ/12.
π(λ) = τ(λ/θ)^τ exp[-(λ/θ)^τ]/λ = τλ^(τ-1) exp[-(λ/θ)^τ]/θ^τ.
Therefore, the posterior distribution of λ is proportional to: (λ⁶e^−3λ){λ^(τ-1) exp[-(λ/θ)^τ]} = λ^(τ+5) exp[-{3λ + (λ/θ)^τ}].
The mean of the posterior distribution of λ is: (integral of λ times the “probability weight”) / (integral of the “probability weight”)
= ∫_0^∞ λ^(τ+6) exp[-{3λ + (λ/θ)^τ}] dλ / ∫_0^∞ λ^(τ+5) exp[-{3λ + (λ/θ)^τ}] dλ.

2.44. B. The chance of 3 claims given λ is: λ³e^−λ/6. From the previous solution, the posterior distribution of λ is proportional to: λ^(τ+5) exp[-{3λ + (λ/θ)^τ}].
The probability of three claims in year 4 from this policyholder is: (integral of λ³e^−λ/6 times the “probability weight”) / (integral of the “probability weight”)
= (1/6) ∫_0^∞ λ^(τ+8) exp[-{4λ + (λ/θ)^τ}] dλ / ∫_0^∞ λ^(τ+5) exp[-{3λ + (λ/θ)^τ}] dλ.

2.45. D. EPV = E[λ] = 0.005. VHM = Var[λ] = (0.008 - 0.002)²/12 = 0.000003. K = EPV/VHM = 0.005/0.000003 = 1667.
The total number of employees observed is 6000. Z = 6000/(6000 + 1667) = 78.3%. Observed frequency is: 46/6000 = 0.00767. Prior mean is 0.005.
Estimated future frequency is: (0.783)(0.00767) + (1 - 0.783)(0.005) = 0.00709.
Estimated number of claims for year four is: (0.00709)(1400) = 9.9.
Comment: Similar to Exercise 7 in “Topics in Credibility” by Dean.
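A quick Bühlmann-Straub check of solution 2.45 in Python (a sketch of mine, not part of the original solutions):

employees = [2000, 2400, 1600]
claims = [15, 19, 12]

epv = 0.005                                   # E[lambda] for lambda uniform on (0.002, 0.008)
vhm = (0.008 - 0.002)**2 / 12                 # 0.000003
k = epv / vhm                                 # 1667
n = sum(employees)                            # 6000 employee-years of exposure
z = n / (n + k)                               # ~78.3%
freq = z * sum(claims) / n + (1 - z) * epv    # ~0.00709
print(1400 * freq)                            # ~9.9 expected claims in year 4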
2.46. E. The probability covered by the first interval is: f(0) + f(1) = (1 + λ)e^−λ.
The probability covered by the second interval is: f(2) + f(3) = (λ²/2 + λ³/6)e^−λ.
The probability covered by the last interval is: 1 - {f(0) + f(1) + f(2) + f(3)}.
For λ = 2, these three probabilities are: 0.4060, 0.4511, and 0.1429.
Thus for λ = 2 the likelihood is: (0.4060)(0.4511²)(0.1429³) = 0.000241.
For λ = 4, these three probabilities are: 0.0916, 0.3419, and 0.5665.
Thus for λ = 4 the likelihood is: (0.0916)(0.3419²)(0.5665³) = 0.001947.
Thus the probability weights are: (3/4)(0.000241) and (1/4)(0.001947).
Therefore, the posterior distribution of lambda is: 27.1% and 72.9%.
The expected value of lambda for this insured is: (2)(27.1%) + (4)(72.9%) = 3.46.

2.47. D. EPV = E[λ] = ∫_0^2 λ (3λ²/8) dλ = 3/2. E[λ²] = ∫_0^2 λ² (3λ²/8) dλ = 12/5.
VHM = Var[λ] = E[λ²] - E[λ]² = 12/5 - (3/2)² = 3/20. K = EPV/VHM = (3/2)/(3/20) = 10. Z = 3/(3 + K) = 3/13.
Estimated future frequency for Chip is: (3/13)(2/3) + (10/13)(3/2) = 102/78 = 1.308.

2.48. A. The EPV = prior mean = 0.15, VHM = 0.05² = 0.0025, K = 0.15/0.0025 = 60.
Z = 910/(910 + K) = 910/970. Observed mean is: 81/910.
Estimated future frequency is: (910/970)(81/910) + (60/970)(0.15) = 0.09278 per month, per employee. (12)(400)(0.09278) = 445.4 claims.
Comment: An example of where for Buhlmann Credibility, the estimated future frequency of 0.09278 is outside the range of hypotheses: 0.1 to 0.2.
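The interval-censored likelihood in solution 2.46 is easy to get wrong by hand, so the following Python sketch (mine, not part of the original solutions) recomputes the 3.46 directly:

from math import exp, factorial

def pmf(n, lam):
    return lam**n * exp(-lam) / factorial(n)

def likelihood(lam):
    p_01 = pmf(0, lam) + pmf(1, lam)     # one year with 0 or 1 claims
    p_23 = pmf(2, lam) + pmf(3, lam)     # two years with 2 or 3 claims
    p_4plus = 1 - (p_01 + p_23)          # three years with more than 3 claims
    return p_01 * p_23**2 * p_4plus**3

weights = {2: 0.75 * likelihood(2), 4: 0.25 * likelihood(4)}
total = sum(weights.values())
print(sum(w / total * lam for lam, w in weights.items()))   # ~3.46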
2.49. B. The sum of 910 independent, identically distributed Poissons is a Poisson with mean 910λ. Therefore, the chance of the observation given lambda is: (910λ)⁸¹ exp[-910λ]/81!. This is proportional to: λ⁸¹e^−910λ.
Since the two types are equally likely, the probability weights are proportional to: 0.1⁸¹e^−91 and 0.2⁸¹e^−182. These in turn are proportional to: e⁹¹ and 2⁸¹.
The first weight is much, much bigger than the second; their ratio is 1.37 x 10¹⁵. Thus the posterior distribution is (subject to rounding): 100% and 0%.
Estimated future frequency is: 0.1 per month, per employee. (12)(400)(0.1) = 480 claims.

2.50. E. P = 95%. ⇒ y = 1.960. n₀ = (1.960/0.10)² = 384 claims. Z = √(81/384) = 45.9%.
(45.9%)(81/910) + (1 - 45.9%)(0.15) = 0.1220 per month, per employee. (12)(400)(0.1220) = 586 claims.

2.51. D. Prior to any observations, the Expected Value of the Process Variance is the (a priori) Overall Mean of 4.8, since we are mixing Poissons. The Variance of the Hypothetical Means (between brands) is 0.20.
K = EPV/VHM = 4.8/0.2 = 24. We observe 50 cookies, so Z = 50/(50 + 24) = 0.676. The observed frequency is 265/50 = 5.3.
The new estimate is: (0.676)(5.3) + (1 - 0.676)(4.8) = 5.14 chips per cookie.
Comment: Note the way we use our a priori model to compute K, prior to any observations. Note that the exposure unit in this case is cookies; we are attempting to estimate the chips per cookie.

2.52. D. EPV = overall mean frequency = (0.1 + 0.3)/2 = 0.2.
The 2nd moment of the mean frequencies = ∫_0.1^0.3 x² dx / (0.3 - 0.1) = 0.008667/0.2 = 0.04333.
Therefore the Variance of the Hypothetical Mean Frequencies = 0.04333 - 0.2² = 0.00333.
K = EPV/VHM = 0.2/0.00333 = 60. For five years, Z = 5/(5 + 60) = 1/13.
The a priori frequency is 0.2, while the observed frequency is 3/5 = 0.6.
Thus the estimated future annual frequency is: (0.6)(1/13) + (0.2)(12/13) = 0.231.
Comment: For the uniform distribution on the interval (a, b), the Variance = (b - a)²/12. In this case with a = 0.1 and b = 0.3, the variance is 0.2²/12 = 0.00333, the VHM.
2.53. C. By Bayes Theorem the posterior density is proportional to the product of the prior density and the chance of the observation given θ. The chance of the observation given θ is e^−θ. Thus the posterior density is proportional to (1)e^−θ for 0 < θ < 1. In order to convert to a density function, one must divide by its integral from 0 to 1, which is (1 - e^−1). Thus the posterior density is: e^−θ/(1 - e^−1), for 0 < θ < 1.

2.54. D. Let the mean claim frequency for each insured be θ, then f(θ) = 1 for 0 ≤ θ ≤ 1. Then the second moment is the integral from 0 to 1 of θ²f(θ) dθ, which is θ³/3 from zero to one, or 1/3.
The Expected Value of the Process Variance = E[θ] = 1/2. The variance of the hypothetical means = Var[θ] = second moment - mean² = (1/3) - (1/2)² = 1/12.
Therefore, K = EPV/VHM = (1/2)/(1/12) = 6. For 3 years Z = 3/(3 + K) = 3/9 = 1/3. The prior estimate is 1/2 and the observed frequency is 4/3.
Thus the new estimate = (1/3)(4/3) + (1 - 1/3)(1/2) = 0.777.
Comment: The variance of the uniform distribution on the interval (a, b) is (b - a)²/12, which in this case is 1/12.

2.55. C. The (prior) distribution of λ has a mean of:
∫_1^∞ λ f(λ) dλ = ∫_1^∞ 3λ^−3 dλ = -(3/2)λ^−2, evaluated from 1 to ∞, = 1.5.
The second moment is: ∫_1^∞ λ² f(λ) dλ = ∫_1^∞ 3λ^−2 dλ = -3λ^−1, evaluated from 1 to ∞, = 3.
2.56. E. The distribution of λ is a Single Parameter Pareto with θ = 1 and α = 4. The mean = (α / (α − 1))θ = 4/3. The second moment = αθ2 /(α − 2) = 2. The variance = 2 - (4/3)2 = 2/9. EPV = E[λ] = mean of f(λ) = 4/3. VHM = Var[λ] = 2/9. K = EPV / VHM = (4/3) /(2/9) = 6. 2.57. C. Let the mean claim frequency for each insured be θ, then f(θ) = 1 for 0 ≤ θ ≤ 1. Then the second moment is the integral from 0 to 1 of θ2f(θ)dθ, which is θ3 /3 from zero to one, or 1/3. Then the Expected Value of the Process Variance = E[θ] = 1/2. The variance of the hypothetical means = VAR[θ] = second moment - mean2 = (1/3) - (1/2)2 = 1/12. Therefore, K = EPV / VHM = (1/2) / (1/12) = 6. For 3 years Z = 3 / (3+K) = 3/9 = 1/3. The prior estimate is 1/2 and the observed frequency is 3/3 = 1. Thus the new estimate = (1/3)(1) + (1 - 1/3)(1/2) = 2/3. 2.58. D. The chance of no claims for a Poisson is e−λ. We average over the possible values of λ: 3
3
∫
(1/2) e−λ dλ = (1/2)(-e−λ) ] = (1/2)(e-1 - e-3) = (1/2)(.368 - .050) = 0.159. 1
1
2.59. E. Expected Value of the Process Variance = E[λ] = 2. Variance of the Hypothetical Means = Var[λ] = (3 - 1)²/12 = 1/3.
K = EPV/VHM = 2/(1/3) = 6. For one observation, Z = 1/(1 + 6) = 1/7. The prior estimate is E[λ] = 2. The observation is a frequency of 1.
Thus the new estimate is: (1/7)(1) + (1 - 1/7)(2) = 13/7 = 1.857.

2.60. A. The mean frequency is E[λµ] = E[λ]E[µ] = (1/2)(1/2) = 1/4. Since we are mixing Poissons, the EPV = mean frequency = 1/4.
The Variance of the Hypothetical Means = Var[λµ] = E[(λµ)²] - E[λµ]² = E[λ²]E[µ²] - (1/4)² = (1/3)(1/3) - 1/16 = 7/144.
Thus K = EPV/VHM = (1/4)/(7/144) = 36/7 = 5.143.
Comment: Note that since λ and µ are independent, the expected values can be separated into products of separate expected values. The uniform distribution from zero to one has first moment of 1/2 and second moment of 1/3.
2.61. D. The posterior probabilities are proportional to the product of the chance of the observation given each grade point average and the a priori probability of each grade point average. The density of g is 1/4 on [0, 4]. The chance of observing zero claims from a Poisson with mean λ is: e^−λ = e^−(4-g) = e^(g-4).
Thus, the posterior density is proportional to (1/4)e^(g-4) on [0, 4]. Dividing by the integral from zero to 4, (1 - e^−4)/4, the posterior density is: e^(g-4)/(1 - e^−4).
The posterior probability that the selected driver has a grade point average greater than 3 is the integral of the posterior density from 3 to 4: (1 - e^−1)/(1 - e^−4) = 0.632/0.982 = 0.644.
Comment: The integrals used are as follows:
∫_0^4 e^(g-4) dg = e^(g-4), evaluated from g = 0 to g = 4, = 1 - e^−4.
∫_3^4 e^(g-4) dg = e^(g-4), evaluated from g = 3 to g = 4, = 1 - e^−1.

2.62. C. For a Poisson process the variance is equal to the mean, in this case 4 - g. Thus the Expected Value of the Process Variance is equal to the overall mean of 2:
∫_0^4 (4 - g) f(g) dg = ∫_0^4 (4 - g)(1/4) dg = g - g²/8, evaluated from g = 0 to g = 4, = 2.
The Variance of the Hypothetical Means is the variance of the Uniform Distribution on [0, 4], which is (4 - 0)²/12 = 16/12 = 4/3.
Thus K = EPV/VHM = 2/(4/3) = 1.5. For 5 exposures, Z = 5/(5 + 1.5) = 0.769.
Expected number of claims is: (0)(0.769) + (2)(1 - 0.769) = 0.462.
Comments: E[4 - g] = 4 - E[g] = 4 - 2 = 2. Var[4 - g] = Var[4] + Var[g] = 0 + 4/3 = 4/3. The variance of the uniform distribution on [a, b] is (b - a)²/12.

2.63. A. Expected Value of the Process Variance = overall mean = µ. Variance of the Hypothetical Means = Var[λ] = µ. K = EPV/VHM = µ/µ = 1.
Comment: Each individual insured has a Poisson frequency. The distribution of hypothetical means is given by a second Poisson. Var[λ] = Variance of the second Poisson = Mean of the second Poisson = µ. Since λ acts as a sort of dummy variable, the solution canʼt depend on λ, thus eliminating choices B, D, and E.
2.64. C. The chance of observing 4 claims in a year is: λ⁴e^−λ/4!. Therefore, the chance of observing 4 claims in each of the first two years is: (λ⁴e^−λ/4!)².
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.00814                  0.00407               0.176                 2
B             0.5000               0.03817                  0.01908               0.824                 4
Overall                                                      0.0232                1.000                3.65
Comment: If instead the observation had been a total of 8 claims over two years, then the chance of the observation would have been: (2λ)⁸e^−2λ/8!.
Class    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Frequency
A             0.5000               0.02977                  0.01489               0.176                 2
B             0.5000               0.13959                  0.06979               0.824                 4
Overall                                                      0.0847                1.000                3.65
While in this case one would get the same answer, in general in using Bayes Theorem, it is important to use all of the information in the observation exactly as given.

2.65. D. The posterior probabilities of the two risk types are: 0.176 and 0.824.
For λ = 2, f(1) = 2e^−2 = 0.271. For λ = 4, f(1) = 4e^−4 = 0.073.
(0.176)(0.271) + (0.824)(0.073) = 10.8%.

2.66. B. Mean = (0.5)(2) + (0.5)(4) = 3. Mixing Poissons ⇒ EPV = mean = 3. Second Moment of the Hypothetical Means = (0.5)(2²) + (0.5)(4²) = 10. VHM = 10 - 3² = 1.
K = EPV/VHM = 3/1 = 3. Z = 2/(2 + K) = 2/5. Observed Frequency = 8/2 = 4.
Estimated Frequency = (2/5)(4) + (3/5)(3) = 3.4.
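The 3.65 (Bayes) and 3.4 (Bühlmann) above can be confirmed with a few lines of Python (my own sketch, not part of the original solutions):

from math import exp, factorial

def pmf(n, lam):
    return lam**n * exp(-lam) / factorial(n)

# Bayes (2.64): half the insureds have mean 2, half have mean 4; observe 4 claims in each of two years.
w = {lam: 0.5 * pmf(4, lam)**2 for lam in (2, 4)}
total = sum(w.values())
print(sum(w[lam] / total * lam for lam in w))   # ~3.65

# Buhlmann (2.66): EPV = overall mean = 3, VHM = 1, K = 3, two years of data.
z = 2 / (2 + 3)
print(z * (8 / 2) + (1 - z) * 3)                # 3.4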
2.67. A. Given λ, the chance of the observation is: e^−λ. Therefore, by Bayes Theorem the posterior distribution of λ is proportional to:
Π(λ)e^−λ = 2.5e^−6λ + 0.1e^−1.2λ = (5/12)(6e^−6λ) + (1/12)(1.2e^−1.2λ).
This is proportional to the mixed exponential distribution: (5/6)(6e^−6λ) + (1/6)(1.2e^−1.2λ), which must therefore be the posterior distribution of λ.
The expected number of claims = mean of the posterior distribution: (5/6)(1/6) + (1/6)(1/1.2) = 0.278.
Comment: When not told which one to use, use Bayes Analysis, rather than Buhlmann Credibility, which is a linear approximation to Bayes Analysis. The a priori mean is: (1/2)(1/5) + (1/2)(5) = 2.6.
The posterior distribution is: Π(λ)e^−λ / ∫Π(λ)e^−λ dλ = {2.5e^−6λ + 0.1e^−1.2λ} / ∫_0^∞ {2.5e^−6λ + 0.1e^−1.2λ} dλ
= {2.5e^−6λ + 0.1e^−1.2λ}/{2.5/6 + 0.1/1.2} = {2.5e^−6λ + 0.1e^−1.2λ}/(1/2) = 5e^−6λ + 0.2e^−1.2λ.
Posterior mean = ∫_0^∞ λ{5e^−6λ + 0.2e^−1.2λ} dλ = 5 ∫_0^∞ λe^−6λ dλ + 0.2 ∫_0^∞ λe^−1.2λ dλ = 5/6² + 0.2/1.2² = 0.278.
If the prior distribution of λ had been an exponential rather than mixed exponential, then this would have been a special case of the Gamma-Poisson.
[Figure: the prior (dashed) and posterior distributions of lambda.]
∫
2.68. D. first moment of the hypothetical means = λ Π(λ) dλ = ∞
∞
∫
∫
2.5 λe−5λdλ + .1 λe−λ/5dλ = (2.5)(1/52 ) + (.1)(52 ) = 2.6. 0
0
∫
second moment of the hypothetical means = λ2 Π(λ) dλ = ∞
∞
∫
∫
2.5 λ2e−5λdλ + .1 λ2e−λ/5dλ = (2.5)(2/53 ) + (.1)(2)(53 ) = 25.04. 0
0
Since we are mixing Poissons, EPV = a priori mean = 2.6. VHM = 25.04 - 2.62 = 18.28. K = EPV/VHM = 2.6/18.28 = 0.142. Z = 1/(1 + K) = 0.876. Estimated frequency = (0.876)(0) + (1 - 0.876)(2.6) = 0.322. Alternately, the distribution of λ is a mixed exponential, with 50% weight to an exponential with mean 1/5 and 50% weight to an exponential with mean 5. First moment of the mixture = (.5)(.2) + (.5)(5) = 2.6. Second moment of the mixture = (.5)(2)(.22 ) + (.5)(2)(52 ) = 25.04. Proceed as before. 2.69. B. The chance of the observation of no claims is: e−λ = e.01p -1. The prior density of p is .01, 0 < p < 100. Therefore, by Bayes Theorem, the posterior density is proportional to: .01e.01p -1 = .01e-1e.01p, 0 < p < 100. 100
∫.01e-1e.01p dp = .01e-1{100(e - 1)}. 0
The posterior distribution is: .01e-1e.01p/{.01e-1100(e - 1)} = e.01p /{100(e - 1)}, 0 < p < 100. The posterior probability that p > 50 is: 100
∫
(1/ 100(e - 1)) e.01pdp = (e - e.5)/(e - 1) = 0.622. 50
Comment: One can approximate the solution, by use of discrete risk types. Divide the employees into two equally likely groups: p < 50 and p > 50. Then the groups have average salaries of 25 and 75, with average lambdas of .75 and .25. Therefore, the chances of the observation for the two groups are approximately: e-.75 = .472 and e-.25 = .779. Since the two groups are equally likely a priori, the posterior distribution is: .472/(.472 + .779) = .377 and .779/(.472 + .779) = 0.623.
2013-4-10,
Conjugate Priors §2 Mixing Poissons,
HCM 10/21/12,
Page 56
2.70. B. λ = 1 - .01p. VHM = Var[λ] = (.012 )Var[p] = (.012 )(1002 /12) = 1/12. EPV = E[Process Variance | λ] = E[λ] = 1 - .01E[p] = 1 - (.01)(50) = 1/2. K = EPV/VHM = (1/2)/(1/12) = 6. Z = 4 /(4+6) = 40%. Estimated frequency = (40%)(5/4) + (60%)(1/2) = 0.8. Comment: There is no reason why the salaries could not have been uniform from for example (20, 100], rather than (0, 100]. The variance of the uniform distribution on [a, b] is: (b-a)2 /12. 2.71. E. (See Comment) F(θ) is Pareto with parameters 2.6 and 1. It has mean: 1/(2.6 - 1) = 0.625, second moment: (2)(12 )/{(2.6 - 1)(2.6 - 2)} = 2.0833, and variance: 2.0833 - .6252 = 1.6927. EPV = E[Θ] = 0.625. VHM = Var[Θ] = 1.6927. K = EPV/VHM = .625/1.6927 = .37. We observe 5 claims, so n = 5. Z = 5/(5 + K) = 5/5.37 = 93.1%. Comment: It is intended that the claim severity is Poisson, although the question should have made this much clearer. The CAS/SOA accepted both choices C and E. Apparently, they allowed Z = 1/(1 + K) = 1/1.37 = 73.0%. This is incorrect, since when dealing with severity the number of draws from the risk process is the number of claims observed. However, the question should have made it clearer that we were dealing with severity. For example, “Claim sizes are conditionally independent and identically Poisson ...”
2.72. D. The probability of observing 10 claims given λ is proportional to: λ10e−λ. Therefore the posterior distribution of λ is proportional to: (λ10e−λ){(0.4)e−λ/6/6 + (0.6)e−λ/12/12} = .066667λ10e−7λ/6+ 0.05λ10e−13λ/12. The posterior mean is: ∞
∫
∞
∫
(.066667λ10e−7λ/6+ 0.05λ10e−13λ/12) λ dλ / .066667λ10e−7λ/6+ 0.05λ10e−13λ/12 dλ = 0
0
{(.066667)11!(6/7)12/ + (0.05)11!(12/13)12}/{(.066667)10!(6/7)11/ + (0.05)10!(12/13)11} = (11){(.066667)(6/7)12 + (0.05)(12/13)12}/{(.066667)(6/7)11 + (0.05)(12/13)11} = 9.885. Comment: Gamma type integrals, discussed in “Mahlerʼs Guide to Conjugate Priors.” This is not a Gamma-Poisson situation since π(λ) is a mixture of two Exponentials. In general, let π(λ) be a mixture of two Exponentials: π(λ) = we−λ/θ/θ + (1 - w)e−λ/µ/µ. If one observes C claims in Y years, then applying Bayes Theorem as in this solution, the posterior wµ(Y + 1/ µ)C + 2 + (1- w)θ(Y+ 1/ θ)C + 2 mean turns out to be: (C + 1) . wµ(Y + 1/ µ)C + 2 (Y + 1/ θ) + (1- w)θ(Y + 1/ θ)C + 2(Y + 1/ µ) If π(λ) were an Exponential with mean 6, then αʼ = 1 + 10 = 11 and 1/θʼ = 1/6 + 1 = 7/6, and the expected number of claims in Year 2 would be αʼθʼ = 11/(7/6) = 66/7. If instead π(λ) were an Exponential with mean 12, then αʼ = 1 + 10 = 11 and 1/θʼ = 1/12 + 1 = 13/12, and the expected number of claims in Year 2 would be αʼθʼ = 11/(13/12) = 132/13. As an approximation to the exact answer, one could weight these two results together: (.4)(66/7) + (.6)(132/13) = 9.864, very close to the correct answer in this case, but not always. 2.73. E. The distribution of λ is a 40%-60% mixture of Exponential Distributions with means 6 and 12. This mixture has a mean of: (40%)(6) + (60%)(12) = 9.6, a second moment of: (40%)(2)(62 ) + (60%)(2)(122 ) = 201.6, and variance of: 201.6 - 9.62 = 109.44. EPV = E[Process Variance | λ] = E[λ] = 9.6. VHM = Var[λ] = 109.44. K = EPV/VHM = 9.6/109.44 = .0877. Z = 1/(1 + .0877) = .919. Estimated future frequency: (.919)(10) + (1 - .919)(9.6) = 9.97.
2.74. C. The chance of the observation is: e^−λλ²/2.
Group    A Priori Probability    Lambda    Chance of Observation    Probability Weight    Posterior Probability    Prob[N > 2]
I                0.50               1            0.18394                  0.09197                0.4175                 8.03%
II               0.35               2            0.27067                  0.09473                0.4300                32.33%
III              0.15               3            0.22404                  0.03361                0.1525                57.68%
Sum                                                                        0.22031                                      26.05%
For example: (0.5)(0.18394) = 0.09197. 0.09197/0.22031 = 0.4175. Prob[N > 2] = 1 - e^−λ - λe^−λ - λ²e^−λ/2.
(0.4175)(8.03%) + (0.4300)(32.33%) + (0.1525)(57.68%) = 26.05%.

2.75. B. The chance of the observation is: e^−λλ³/6.
Type          A Priori Probability    Lambda    Chance of Observation    Probability Weight    Posterior Probability
Children              30%                3            0.22404                  0.06721                 54.4%
Nonsmoker             60%                1            0.06131                  0.03679                 29.8%
Smoker                10%                4            0.19537                  0.01954                 15.8%
Sum                  100%                                                       0.12354                100.0%
By Bayes Theorem, Prob[Smoker | 3 colds] = Prob[3 colds | smoker] Prob[smoker] / Prob[3 colds]
= (0.19537)(0.1) / {(0.22404)(0.3) + (0.06131)(0.6) + (0.19537)(0.1)} = 0.01954/0.12354 = 15.8%.

2.76. B. The overall mean is: {(1)(900) + (10)(90) + (20)(10)}/1000 = 2. Since we are mixing Poissons, EPV = overall mean = 2.
The 2nd moment of the hypothetical means is: {(1²)(900) + (10²)(90) + (20²)(10)}/1000 = 13.9. VHM = 13.9 - 2² = 9.9.
K = EPV/VHM = 2/9.9 = 0.202. Z = 1/(1 + K) = 1/1.202 = 0.832.
11.983 = estimate = 0.832x + (1 - 0.832)(2). ⇒ x = 14.
Comment: Given the output, one needs to determine the missing input.
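Solution 2.76 works backwards from the credibility estimate to the observation; the following Python sketch (mine, not part of the original solutions) recovers x the same way:

classes = [(900, 1), (90, 10), (10, 20)]                   # (number of risks, Poisson mean)
total = sum(n for n, _ in classes)
mean = sum(n * lam for n, lam in classes) / total          # 2
second = sum(n * lam**2 for n, lam in classes) / total     # 13.9
epv, vhm = mean, second - mean**2                          # 2 and 9.9 (mixing Poissons)
z = 1 / (1 + epv / vhm)                                    # ~0.832

x = (11.983 - (1 - z) * mean) / z
print(x)   # ~14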
2.77. D. Over 4 years, Type I is Poisson with λ = 1.00, Type II is Poisson with λ = 2.00, and Type III is Poisson with λ = 4.00. f(1) = λe^−λ.
Type of Risk    A Priori Chance    Chance of Observation    Probability Weight    Posterior Chance    Mean Annual Freq.
I                    0.05                 0.3679                  0.01839              14.43%                0.250
II                   0.20                 0.2707                  0.05413              42.47%                0.500
III                  0.75                 0.0733                  0.05495              43.10%                1.000
Overall                                                            0.12748             100.00%               0.679

2.78. B. EPV = E[λ] = Mean of Weibull = θΓ[1 + 1/τ] = (0.1)Γ(1.5) = 0.088623.
Second Moment of Weibull = θ²Γ[1 + 2/τ] = (0.01)Γ(2) = 0.01. VHM = Var[λ] = Variance of Weibull = 0.01 - 0.088623² = 0.002146.
K = EPV/VHM = 0.088623/0.002146 = 41.3. Z = 500/(500 + 41.3) = 0.924. A Priori Mean = E[λ] = Mean of Weibull = 0.088623.
Estimated Future Frequency: (0.924)(35/500) + (1 - 0.924)(0.088623) = 0.0714 per month.
Estimated number of claims for 300 insureds for 12 months: (12)(300)(0.0714) = 257.
Comment: While we observe 3 months of experience, note that we predict the next 12 months.

2.79. E. Probabilities of Observation: e^−1/r!, 3^r e^−3/r!. Probability Weights: 0.75e^−1/r!, 0.25(3^r e^−3)/r!.
Posterior Distribution: 0.75e^−1/(0.75e^−1 + 0.25·3^r e^−3), 0.25·3^r e^−3/(0.75e^−1 + 0.25·3^r e^−3).
We are given that: 2.98 = (1)(0.75e^−1)/(0.75e^−1 + 0.25·3^r e^−3) + (3)(0.25·3^r e^−3)/(0.75e^−1 + 0.25·3^r e^−3).
⇒ 2.235e^−1 + 0.745·3^r e^−3 = 0.75e^−1 + 0.75·3^r e^−3. ⇒ 3^r = 1.49e^−1/(0.005e^−3) = 298e². ⇒ r = {ln(298) + 2}/ln(3) = 7.
EPV = E[λ] = (0.75)(1) + (0.25)(3) = 1.5 = a priori mean. VHM = Var[λ] = (0.75)(1 - 1.5)² + (0.25)(3 - 1.5)² = 0.75.
K = EPV/VHM = 2. Z = 1/(1 + 2) = 1/3. (1/3)(7) + (2/3)(1.5) = 3.33.
Alternately, let w = posterior probability that λ = 1. We are given that: 2.98 = (1)w + (3)(1 - w). ⇒ w = 0.01.
⇒ 0.01 = 0.75e^−1/(0.75e^−1 + 0.25·3^r e^−3) = 1/(1 + 3^(r-1) e^−2). ⇒ 3^(r-1) e^−2 = 99. ⇒ r = 7. Proceed as before.
Comment: Long!
2.80. A. Since we are mixing Poissons, EPV = mean = θ(0.50) + (1 - θ)(1.50) = 1.5 - θ.
Second moment of the hypothetical means is: θ(0.50²) + (1 - θ)(1.50²) = 2.25 - 2θ.
VHM = 2.25 - 2θ - (1.5 - θ)² = θ - θ². K = EPV/VHM = (1.5 - θ)/(θ - θ²). Z = 1/(1 + K) = (θ - θ²)/(1.5 - θ²).
Comment: The number of claims observed in year one does not affect Z. “The Bühlmann credibility factor Z for Year 2” is the Bühlmann credibility factor Z used for predicting the number of claims in Year 2.
One could take for example θ = 0.3, do the problem numerically, and then see which of the given choices matches your solution.
Type       A Priori Probability    Poisson Parameter    Square of Mean
1                  0.3                   0.50                 0.25
2                  0.7                   1.50                 2.25
Average                                  1.20                 1.65
EPV = 1.2. VHM = 1.65 - 1.2² = 0.21. K = 1.2/0.21 = 5.714. Z = 1/(1 + K) = 1/6.714 = 0.1489.
Choice A gives: (0.3 - 0.3²)/(1.5 - 0.3²) = 0.21/1.41 = 0.1489.
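The numeric check at θ = 0.3 can be run directly (my own Python sketch, not part of the original solution):

theta = 0.3
mean = theta * 0.5 + (1 - theta) * 1.5             # EPV, since we are mixing Poissons
second = theta * 0.5**2 + (1 - theta) * 1.5**2
vhm = second - mean**2                             # equals theta - theta^2
z_direct = 1 / (1 + mean / vhm)
z_choice_a = (theta - theta**2) / (1.5 - theta**2)
print(z_direct, z_choice_a)                        # both ~0.1489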
Section 3, Gamma Function and Distribution⁸

The quantity x^(α-1) e^−x is finite for x ≥ 0 and α ≥ 1. Since it declines quickly to zero as x approaches infinity, its integral from zero to ∞ exists. This is the much studied and tabulated (complete) Gamma Function:
Γ(α) = ∫_0^∞ t^(α-1) e^−t dt = θ^−α ∫_0^∞ t^(α-1) e^−t/θ dt, for α > 0, θ > 0.
For integer α, Γ(α) = (α - 1)!. In general, Γ(α) = (α - 1)Γ(α - 1).
Γ(1) = 1. Γ(2) = 1. Γ(3) = 2. Γ(4) = 6. Γ(5) = 24. Γ(6) = 120. Γ(7) = 720. Γ(8) = 5040.
One does not need to know how to compute the complete Gamma Function for noninteger alpha. Many computer programs will give values of the complete Gamma Function.
Γ(1/2) = √π.  Γ(3/2) = 0.5√π.  Γ(-1/2) = -2√π.  Γ(-3/2) = (4/3)√π.
For α ≥ 10:⁹
lnΓ(α) ≅ (α - 0.5)lnα - α + ln(2π)/2 + 1/(12α) - 1/(360α³) + 1/(1260α⁵) - 1/(1680α⁷) + 1/(1188α⁹) - 691/(360,360α¹¹) + 1/(156α¹³) - 3617/(122,400α¹⁵).
For α < 10 use the recursion relationship Γ(α) = (α - 1)Γ(α - 1). The Gamma function is undefined at the negative integers and zero.
For large α: Γ(α) ≅ e^−α α^(α-1/2) √(2π), which is Stirlingʼs formula.¹⁰
The ratios of two Gamma functions with arguments that differ by an integer can be computed in terms of a product of factors, just as one would with a ratio of factorials.
Exercise: What is Γ(8)/Γ(5)? [Solution: Γ(8)/Γ(5) = 7!/4! = (7)(6)(5) = 210.]
Exercise: What is Γ(8.3)/Γ(5.3)? [Solution: Γ(8.3)/Γ(5.3) = 7.3!/4.3! = (7.3)(6.3)(5.3) = 243.747.]
⁸ See Appendix A of Loss Models. Also see the Handbook of Mathematical Functions, by M. Abramowitz, et. al.
⁹ See Appendix A of Loss Models, and the Handbook of Mathematical Functions, by M. Abramowitz, et. al.
¹⁰ See the Handbook of Mathematical Functions, by M. Abramowitz, et. al.
Note that even when the arguments are not integer, the ratio still involves a product of factors. The solution of the last exercise depended on the fact that 8.3 - 5.3 = 3 is an integer.
Integrals involving e^(-x) and powers of x can be written in terms of the Gamma function:
∫_0^∞ t^(α-1) e^(-t/θ) dt = Γ(α) θ^α,   or for integer n:   ∫_0^∞ t^n e^(-ct) dt = n! / c^(n+1).
Exercise: What is the integral from 0 to ∞ of: t^4 e^(-t/10)?
[Solution: With α = 5 and θ = 10, this integral is: Γ(5) 10^5 = (4!)(100,000) = 2,400,000.]
This formula for "gamma-type" integrals is very useful for working with anything involving the Gamma distribution, for example the Gamma-Poisson process. It follows from the definition of the Gamma function and a change of variables. The Gamma density in the Appendix of Loss Models is: θ^(-α) x^(α-1) e^(-x/θ) / Γ(α). Since this probability density function must integrate to unity, the above formula for gamma-type integrals follows. This is a useful way to remember this formula on the exam.
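As an optional check, a short Python sketch using scipy reproduces the gamma-type integral above and the ratio from the last exercise (the code and names below are my own, not part of the tables attached to the exam):

```python
# Sketch: check the gamma-type integral formula and the exercise above numerically.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, theta = 5, 10
closed_form = gamma(alpha) * theta**alpha                        # Gamma(5) 10^5 = 2,400,000
numeric, _ = quad(lambda t: t**(alpha - 1) * np.exp(-t/theta), 0, np.inf)
print(closed_form, round(numeric))                               # both 2,400,000

# Ratio of Gamma functions whose arguments differ by an integer:
print(gamma(8.3) / gamma(5.3))                                   # (7.3)(6.3)(5.3) = 243.747
```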
Incomplete Gamma Function:
As shown in Appendix A of Loss Models, the Incomplete Gamma Function is defined as:
Γ(α ; x) = ∫_0^x t^(α-1) e^(-t) dt / Γ(α).
Γ(α ; 0) = 0. Γ(α ; ∞) = Γ(α)/Γ(α) = 1.
As discussed below, the Incomplete Gamma Function with the introduction of a scale parameter θ is the Gamma Distribution.
Exercise: Via integration by parts, put Γ(2 ; x) in terms of Exponentials and powers of x.
[Solution: Γ(2 ; x) = ∫_0^x t e^(-t) dt / Γ(2) = ∫_0^x t e^(-t) dt = [-e^(-t) - t e^(-t)] evaluated from t = 0 to t = x = 1 - e^(-x) - xe^(-x).]
One can prove via integration by parts that Γ(α ; x) = Γ(α-1 ; x) - x^(α-1) e^(-x) / Γ(α).11
This recursion formula for integer alpha is: Γ(n ; x) = Γ(n-1 ; x) - x^(n-1) e^(-x) / (n-1)!.
Combined with the fact that Γ(1 ; x) = ∫_0^x e^(-t) dt = 1 - e^(-x), this leads to the following formula for the Incomplete Gamma for positive integral alpha:12
Γ(n ; x) = 1 - Σ_(i=0)^(n-1) x^i e^(-x) / i!.
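For those who like to verify such formulas numerically, here is a minimal Python sketch; it compares the closed form above with scipy's regularized lower incomplete gamma (the helper name is mine):

```python
# Sketch: the closed form for the Incomplete Gamma at integer alpha versus scipy.
from math import exp, factorial
from scipy.special import gammainc      # gammainc(a, x) = Gamma(a ; x) as defined above

def incomplete_gamma_int(n, x):
    """Gamma(n ; x) = 1 - sum_{i=0}^{n-1} x^i e^(-x) / i!, for positive integer n."""
    return 1.0 - sum(x**i * exp(-x) / factorial(i) for i in range(n))

print(incomplete_gamma_int(2, 1.5), gammainc(2, 1.5))     # both ~ 0.4422
print(incomplete_gamma_int(8, 12.5), gammainc(8, 12.5))   # both ~ 0.9302 (used again in Section 4)
```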
Integrals Involving Exponentials times Powers:
One can use the incomplete Gamma Function to handle integrals involving t e^(-t/θ):
∫_0^x t e^(-t/θ) dt = ∫_0^(x/θ) θs e^(-s) θ ds = θ² ∫_0^(x/θ) s e^(-s) ds = θ² Γ(2 ; x/θ) Γ(2) = θ² {1 - e^(-x/θ) - (x/θ)e^(-x/θ)}.
∫_0^x t e^(-t/θ) dt = θ² {1 - e^(-x/θ) - (x/θ)e^(-x/θ)}.
Exercise: What is the integral from 0 to 4.3 of: t e^(-t/5)?
[Solution: (5²) {1 - e^(-4.3/5) - (4.3/5)e^(-4.3/5)} = 5.32.]
Such integrals can also be done via integration by parts, or one can make use of the formula for the Limited Expected Value of an Exponential Distribution:13
∫_0^x t e^(-t/θ) dt = θ ∫_0^x t {e^(-t/θ)/θ} dt = θ{E[X ∧ x] - xS(x)} = θ{θ(1 - e^(-x/θ)) - xe^(-x/θ)} = θ²{1 - e^(-x/θ) - (x/θ)e^(-x/θ)}.
11 See for example, Formula 6.5.13 in the Handbook of Mathematical Functions, by Abramowitz, et. al.
12 See Theorem A.1 in Appendix A of Loss Models. One can also establish this result by computing the waiting time until the nth claim for a Poisson Process, as shown in "Mahlerʼs Guide to Stochastic Processes," on another exam.
13 See Appendix A of Loss Models.
When the upper limit is infinity, the integral simplifies: ∫_0^∞ t e^(-t/θ) dt = θ².14
In a similar manner, one can use the incomplete Gamma Function to handle integrals involving t^n e^(-t/θ), for n integer:
∫_0^x t^n e^(-t/θ) dt = θ^(n+1) ∫_0^(x/θ) s^n e^(-s) ds = θ^(n+1) Γ(n+1 ; x/θ) Γ(n+1) = n! θ^(n+1) {1 - Σ_(i=0)^(n) (x/θ)^i e^(-x/θ) / i!}.
Exercise: What is the integral from 0 to 4.3 of: t² e^(-t/5)?
[Solution: ∫_0^x t² e^(-t/θ) dt = θ³ ∫_0^(x/θ) s² e^(-s) ds = θ³ Γ(3 ; x/θ) Γ(3) = 2θ³ {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2}.
For θ = 5 and x = 4.3, this is: 250 {1 - e^(-0.86) - 0.86e^(-0.86) - 0.86² e^(-0.86)/2} = 14.108.]
In general, ∫_0^x t² e^(-t/θ) dt = 2θ³ {1 - e^(-x/θ) - (x/θ)e^(-x/θ) - (x/θ)² e^(-x/θ)/2}.
14 If one divided by θ, then the integrand would be t times the density of an Exponential Distribution. Therefore, the given integral is θ (mean of an Exponential Distribution) = θ².
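The two exercises above are easy to verify on a computer; here is a minimal Python sketch (numeric integration as a cross-check of the closed forms):

```python
# Sketch: check the integrals of t e^(-t/theta) and t^2 e^(-t/theta) from 0 to x = 4.3, theta = 5.
import numpy as np
from scipy.integrate import quad

theta, x = 5.0, 4.3
closed_1 = theta**2 * (1 - np.exp(-x/theta) - (x/theta)*np.exp(-x/theta))
closed_2 = 2*theta**3 * (1 - np.exp(-x/theta) - (x/theta)*np.exp(-x/theta)
                         - (x/theta)**2 * np.exp(-x/theta)/2)
numeric_1, _ = quad(lambda t: t * np.exp(-t/theta), 0, x)
numeric_2, _ = quad(lambda t: t**2 * np.exp(-t/theta), 0, x)
print(closed_1, numeric_1)   # both ~ 5.32
print(closed_2, numeric_2)   # both ~ 14.108
```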
Gamma Distribution:15
The Gamma Distribution can be defined in terms of the Incomplete Gamma Function, F(x) = Γ(α ; x/θ). Note that Γ(α; ∞) = Γ(α)/Γ(α) = 1 and Γ(α; 0) = 0, so we have as required for a distribution function F(∞) = 1 and F(0) = 0.
f(x) = (x/θ)^α e^(-x/θ) / {x Γ(α)} = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}, x > 0.
Exercise: What is the mean of a Gamma Distribution?
[Solution: ∫_0^∞ x f(x) dx = ∫_0^∞ x x^(α-1) e^(-x/θ) / {θ^α Γ(α)} dx = {∫_0^∞ x^α e^(-x/θ) dx} / {θ^α Γ(α)} = Γ(α+1) θ^(α+1) / {θ^α Γ(α)} = θ Γ(α+1)/Γ(α) = αθ.]
Exercise: What is the nth moment of a Gamma Distribution?
[Solution: ∫_0^∞ x^n f(x) dx = ∫_0^∞ x^n x^(α-1) e^(-x/θ) / {θ^α Γ(α)} dx = {∫_0^∞ x^(n+α-1) e^(-x/θ) dx} / {θ^α Γ(α)} = Γ(α+n) θ^(α+n) / {θ^α Γ(α)} = θ^n Γ(α+n)/Γ(α) = (α+n-1)(α+n-2)....(α) θ^n.
Comment: This is the formula shown in Appendix A of Loss Models.]
Exercise: What is the 3rd moment of a Gamma Distribution with α = 6 and θ = 10?
[Solution: (α+n-1)(α+n-2)....(α) θ^n = (6+3-1)(6+3-2)(6)(10^3) = (8)(7)(6)(1000) = 336,000.]
15 See "Mahlerʼs Guide to Loss Distributions."
Problems:
3.1 (1 point) What is the value of the integral from zero to infinity of: x^6 e^(-4x)? A. less than 0.04 B. at least 0.04 but less than 0.05 C. at least 0.05 but less than 0.06 D. at least 0.06 but less than 0.07 E. at least 0.07
3.2 (1 point) What is the density at x = 15 of the Gamma distribution with parameters α = 4 and θ = 10? A. less than 0.012 B. at least 0.012 but less than 0.013 C. at least 0.013 but less than 0.014 D. at least 0.014 but less than 0.015 E. at least 0.015
3.3 (1 point) What is the value of the integral from zero to infinity of: x^(-5) e^(-7/x)? A. less than 0.002 B. at least 0.002 but less than 0.003 C. at least 0.003 but less than 0.004 D. at least 0.004 but less than 0.005 E. at least 0.005
3.4 (2 points) What is the integral from 3 to 12 of: x e^(-x/3)? A. 5.2
B. 5.4
C. 5.6
D. 5.8
E. 6.0
3.5 (1 point) What is the density at x = 70 of the Gamma distribution with parameters α = 5 and θ = 20? A. less than 0.008 B. at least 0.008 but less than 0.009 C. at least 0.009 but less than 0.010 D. at least 0.010 but less than 0.011 E. at least 0.011
Solutions to Problems:
3.1. B. ∫_0^∞ t^(α-1) e^(-t/θ) dt = Γ(α) θ^α. Set α - 1 = 6 and θ = 1/4.
∫_0^∞ t^6 e^(-4t) dt = Γ(6+1) / 4^(6+1) = 6! / 4^7 = 0.0439.
3.2. B. θ^(-α) x^(α-1) e^(-x/θ) / Γ(α) = (10^(-4)) 15^3 e^(-1.5) / Γ(4) = 0.0126.
3.3. B. The density of the Inverse Gamma is: θ^α e^(-θ/x) / {x^(α+1) Γ(α)}, 0 < x < ∞. Since this density integrates to one, x^(-(α+1)) e^(-θ/x) integrates to θ^(-α) Γ(α). Thus taking α = 4 and θ = 7, x^(-5) e^(-7/x) integrates to: 7^(-4) Γ(4) = 6 / 7^4 = 0.0025.
Comment: Alternately, one can make the change of variables y = 1/x and convert this to the integral of a Gamma density, rather than that of an Inverse Gamma Density.
3.4. D. ∫_0^x t e^(-t/θ) dt = θ²{1 - e^(-x/θ) - (x/θ)e^(-x/θ)}. Set θ = 3.
∫_3^12 t e^(-t/3) dt = ∫_0^12 t e^(-t/3) dt - ∫_0^3 t e^(-t/3) dt = (3²){1 - e^(-x/3) - (x/3)e^(-x/3)} evaluated from x = 3 to x = 12 = (9){e^(-1) + (1)e^(-1) - e^(-4) - (4)e^(-4)} = 5.80.
Comment: Can also be done using integration by parts.
3.5. C. (x/θ)^α e^(-x/θ) / {x Γ(α)} = (70/20)^5 e^(-70/20) / {70 Γ(5)} = (3.5)^5 e^(-3.5) / {(70)(24)} = 0.00944.
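For students who like to confirm answers numerically, a minimal Python sketch covering Solutions 3.1, 3.3, and 3.4 (3.2 and 3.5 are direct density evaluations) might look like this; the change of variables in 3.3 follows the comment above:

```python
# Sketch: numeric versions of Solutions 3.1, 3.3, and 3.4 using scipy.
import numpy as np
from scipy.integrate import quad

s31, _ = quad(lambda x: x**6 * np.exp(-4*x), 0, np.inf)     # 6!/4^7 ~ 0.0439
s33, _ = quad(lambda y: y**3 * np.exp(-7*y), 0, np.inf)     # y = 1/x substitution: Gamma(4)/7^4 ~ 0.0025
s34, _ = quad(lambda t: t * np.exp(-t/3), 3, 12)            # ~ 5.80
print(round(s31, 4), round(s33, 4), round(s34, 2))
```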
Section 4, Gamma-Poisson
The number of claims a particular policyholder makes in a year is assumed to be Poisson with mean λ. Recall that for a Poisson Distribution with parameter λ the chance of having n claims is given by: λ^n e^(-λ) / n!. For example the chance of having 6 claims is given by: λ^6 e^(-λ) / 6!.
Prior Distribution:16 Assume the λ values of the portfolio of policyholders are Gamma distributed with α = 3 and θ = 2/3, and therefore probability density function:17
f(λ) = 1.6875 λ² e^(-1.5λ), λ ≥ 0.
The prior Gamma is displayed below:
[Figure: plot of the prior Gamma density against the Poisson Parameter λ, for λ from 0 to 6.]
The Prior Distribution Function is given in terms of the Incomplete Gamma Function:18 F(λ) = Γ(3; 1.5λ). So for example, the a priori chance that the λ value lies between 4 and 5 is: F(5) - F(4) = Γ(3; 7.5) - Γ(3; 6) = 0.9797 - 0.9380 = 0.0417.19 Graphically, this is the area between 4 and 5 and under the prior Gamma.
16 The first portion of this example is also in "Mahlerʼs Guide to Frequency Distributions." However, here we introduce observations and then apply Bayes Analysis and Buhlmann Credibility.
17 For the Gamma Distribution, f(x) = x^(α-1) e^(-x/θ)/{Γ(α) θ^α}. One can look up the formulas for the density and distribution function of a Gamma Distribution in the tables attached to the exam.
18 For the Gamma Distribution, F(x) = Γ(α; x/θ).
19 These values of the Incomplete Gamma Function were calculated on a computer using Mathematica.
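These Incomplete Gamma values need not come from Mathematica; as a sketch, scipy gives the same interval probability:

```python
# Sketch: the a priori probability that lambda lies between 4 and 5, F(lambda) = Gamma(3; 1.5 lambda).
from scipy.special import gammainc

prior_prob = gammainc(3, 1.5*5) - gammainc(3, 1.5*4)
print(prior_prob)   # ~ 0.9797 - 0.9380 = 0.0417
```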
Marginal Distribution (Prior Mixed Distribution):
If we have a risk and do not know what type it is, in order to get the chance of having 6 claims, one would weight together the chances of having 6 claims, using the a priori probabilities and integrating from zero to infinity20:
∫_0^∞ {λ^6 e^(-λ)/6!} f(λ) dλ = ∫_0^∞ {λ^6 e^(-λ)/6!} 1.6875 λ² e^(-1.5λ) dλ = 0.00234375 ∫_0^∞ λ^8 e^(-2.5λ) dλ.
This integral can be written in terms of the Gamma function, as was shown in a previous section:
∫_0^∞ λ^(α-1) e^(-λ/θ) dλ = Γ(α) θ^α.
Thus ∫_0^∞ λ^8 e^(-2.5λ) dλ = Γ(9) 2.5^(-9) = (8!)(0.4)^9 ≅ 10.57.
Thus the probability of having 6 claims ≅ (0.00234375)(10.57) ≅ 2.5%.
More generally, if the distribution of Poisson parameters λ is given by a Gamma distribution f(λ) = θ^(-α) λ^(α-1) e^(-λ/θ) / Γ(α), and we compute the chance of having n accidents by integrating from zero to infinity:
∫_0^∞ {λ^n e^(-λ)/n!} f(λ) dλ = ∫_0^∞ {λ^n e^(-λ)/n!} λ^(α-1) e^(-λ/θ)/{θ^α Γ(α)} dλ = {1/(θ^α Γ(α) n!)} ∫_0^∞ λ^(n+α-1) e^(-λ(1 + 1/θ)) dλ
= Γ(n+α) / {θ^α Γ(α) (1 + 1/θ)^(n+α) n!} = α(α+1)...(α+n-1) θ^n / {(1+θ)^(n+α) n!}.
The mixed distribution is in the form of the Negative Binomial distribution with parameters r = α and β = θ:
Probability of x accidents = {r(r+1)...(r+x-1)/x!} β^x / (1+β)^(x+r).
For the specific case dealt with previously: r = α = 3 and β = θ = 2/3.
20 Note the way both the Gamma and the Poisson have terms involving powers of λ and e^(-λ), and these similar terms combine in the product.
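A minimal Python sketch makes the same point computationally: mixing the Poisson over the Gamma numerically agrees with the Negative Binomial with r = α and β = θ (scipy's nbinom is parameterized by r and the success probability 1/(1+β)):

```python
# Sketch: the mixed distribution equals a Negative Binomial with r = alpha, beta = theta.
import numpy as np
from math import factorial
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn
from scipy.stats import nbinom

alpha, theta = 3, 2/3                     # prior Gamma
r, beta = alpha, theta                    # marginal Negative Binomial

def mixed_prob(n):
    """P(N = n) = integral of Poisson(n | lambda) times the prior Gamma density."""
    integrand = lambda lam: ((lam**n * np.exp(-lam) / factorial(n))
                             * lam**(alpha - 1) * np.exp(-lam/theta)
                             / (theta**alpha * gamma_fn(alpha)))
    return quad(integrand, 0, np.inf)[0]

print(mixed_prob(6), nbinom.pmf(6, r, 1/(1 + beta)))   # both ~ 0.02477
```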
This marginal Negative Binomial is displayed below, through 10 claims:
[Figure: plot of the marginal Negative Binomial probabilities for 0 through 10 claims.]
The chance of having 6 claims is: {(3)(4)(5)(6)(7)(8)/6!} (2/3)^6 / (1 + 2/3)^(6+3) = 2.477%.
This is the same result as calculated above. On the exam, one should not go through this calculation above. Rather remember that for the Gamma-Poisson the (prior) marginal distribution is always a Negative Binomial, with r = α = shape parameter of the (prior) Gamma and β = θ = scale parameter of the (prior) Gamma.21 r goes with alpha, beta rhymes with theta. Prior Mean: Note that the overall (a priori) mean can be computed in either one of two ways. First one can weight together the means for each type of risk, using the a priori probabilities. This is E[λ] = the mean of the prior Gamma = αθ = 3(2/3) = 2. Alternately, one can compute the mean of the marginal distribution: the mean of a Negative Binomial is rβ = 3(2/3) = 2. Of course the two results match.
21 β goes with θ, and they rhyme, leaving r to go with α. If integer, α is the number of identical Exponentials one adds in order to get a Gamma; while, if integer, r is the number of identical Geometric variables one adds in order to get a Negative Binomial.
Prior Expected Value of the Process Variance: The process variance for an individual risk is its Poisson parameter λ since the frequency for each risk is Poisson. Therefore the expected value of the process variance = the expected value of λ = the a priori mean frequency = αθ = 3(2/3) = 2.
Prior Variance of the Hypothetical Means: The variance of the hypothetical means is the variance of λ = Var[λ] = Variance of the Prior Gamma = αθ² = 3(2/3)² = 1.33.
Prior Total Variance: The total variance is the variance of the marginal distribution, which for the Negative Binomial equals rβ(1+β) = 3(2/3)(5/3) = 3.33.
The Expected Value of the Process Variance + Variance of the Hypothetical Means = 2 + 1.33 = 3.33 = Total Variance. In general, The Expected Value of the Process Variance + Variance of the Hypothetical Means = αθ + αθ² = αθ(1+θ) = rβ(1+β) = Total Variance.
For the Gamma-Poisson we have:
Variance of the Gamma = αθ² = rβ² = {β/(1+β)} rβ(1+β) = {θ/(1+θ)} rβ(1+β) = {θ/(1+θ)} (Variance of the marginal Negative Binomial).
Mean of the Gamma = αθ = rβ = {1/(1+β)} rβ(1+β) = {1/(1+θ)} rβ(1+β) = {1/(1+θ)} (Variance of the marginal Negative Binomial).
Therefore, Variance of the Gamma + Mean of the Gamma = {θ/(1+θ) + 1/(1+θ)} (Variance of the marginal Negative Binomial) = Variance of the marginal Negative Binomial.
Which is just another way of saying that: EPV + VHM = Total Variance.
VHM = the variance of the Gamma. Total Variance = the variance of the Negative Binomial = EPV + VHM.
⇒ αθ² = Variance of Gamma < Variance of Negative Binomial = rβ(1+β) = αθ + αθ².
Observations:
Let us now introduce the concept of observations. A risk is selected at random and it is observed to have 5 claims in one year.
Posterior Distribution:
We can employ Bayesian analysis to compute what the chances are that the selected risk had a given Poisson Parameter. Given a Poisson with parameter λ, the chance of observing 5 claims is: λ^5 e^(-λ) / 5!. The a priori probability of λ is the Prior Gamma distribution: f(λ) = 1.6875 λ² e^(-1.5λ). Thus the posterior chance of λ is proportional to the product of the chance of observation and the a priori probability: λ^7 e^(-2.5λ). This is proportional to the density for a Gamma distribution with α = 8 and θ = 1/2.5 = 2/5.
For an observation of 5 claims, the posterior Gamma is displayed below:
[Figure: plot of the posterior Gamma density against the Poisson Parameter λ, for λ from 0 to 6.]
The Posterior Distribution Function is given in terms of the Incomplete Gamma Function: F(λ) = Γ(8; 2.5λ). So for example, the posterior chance that the λ value lies between 4 and 5 is: F(5) - F(4) = Γ(8; 12.5) - Γ(8; 10) = 0.9302 - 0.7798 = 0.1504. Graphically, this is the area between 4 and 5 and under this posterior Gamma. Note how observing 5 claims in one year has increased the chance of the Poisson parameter being in the interval from 4 to 5, from 0.0417 to 0.1504. This is an example of a “Bayesian Interval Estimate.”
For an observation of 5 claims, the posterior Gamma with α = 8 and θ = 2/5, and the prior Gamma with α = 3 and θ = 2/3, are compared below:
[Figure: plot comparing the prior and posterior Gamma densities as functions of lambda, for λ from 0 to 6.]
After observing 5 claims in a year, the probability that this risk has a small Poisson parameter has decreased, while the probability that it has a large Poisson parameter has increased.
In general, if one observes C claims for E exposures, we have that the chance of the observation given λ is proportional to (Eλ)^C e^(-Eλ).22 This is proportional to λ^C e^(-Eλ). The prior Gamma is proportional to λ^(α-1) e^(-λ/θ). Thus the posterior probability for λ is proportional to the product: λ^(C+α-1) e^(-(E+1/θ)λ). This is proportional to the density for a Gamma distribution with a shape parameter of: C + α, and scale parameter of: 1/(E + 1/θ) = θ/(1 + Eθ).
Exercise: A risk is selected at random and it is observed to have 0 rather than 5 claims in one year. Determine the posterior probability that the mean future expected frequency for this risk lies between 4 and 5.
[Solution: The posterior distribution is a Gamma distribution with shape parameter of αʼ = C + α = 0 + 3 = 3 and 1/θʼ = 1/θ + E = 3/2 + 1 = 5/2. θʼ = 2/5. In other words, F(λ) = Γ(3; λ/(2/5)) = Γ(3; 2.5λ). So the posterior chance that the λ value lies between 4 and 5 is: F(5) - F(4) = Γ(3; 12.5) - Γ(3; 10) = 0.99966 - 0.99723 = 0.00243.
Comment: Graphically, the solution to this exercise is the area between 4 and 5 and under this posterior Gamma. Note how observing 0 claims in one year has decreased the chance of the Poisson parameter being in the interval from 4 to 5.]
22 The Poisson parameter for E exposures is Eλ.
For an observation of 0 claims, the posterior Gamma with α = 3 and θ = 2/5, and the prior Gamma with α = 3 and θ = 2/3, are compared below:
[Figure: plot comparing the prior and posterior Gamma densities as functions of lambda, for λ from 0 to 6.]
After observing no claims in a year, the probability that this risk has a small Poisson parameter has increased, while the probability that it has a large Poisson parameter has decreased.
For the Gamma-Poisson the posterior density function is also a Gamma. This posterior Gamma has a shape parameter = prior shape parameter + the number of claims observed. This posterior Gamma has a scale parameter = 1 / {1/(Prior scale parameter) + number of exposures (usually years) observed}.
The updating formulas are: Posterior α = Prior α + C.    1/(Posterior θ) = 1/(Prior θ) + E.
For example, in the case where we observed 5 claims in 1 year, C = 5 and E = 1. The prior shape parameter was 3 while the prior scale parameter was 2/3. Therefore the posterior shape parameter = 3 + 5 = 8, while the posterior scale parameter = 1/(3/2 + 1) = 2/5, matching the result obtained above. The fact that the posterior distribution is of the same form as the prior distribution is why the Gamma is a Conjugate Prior Distribution for the Poisson.
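The updating formulas are easy to turn into a short Python sketch (the function name is mine); it reproduces both the 5-claim example and the 0-claim exercise:

```python
# Sketch: the Gamma-Poisson updating formulas.
def update_gamma_prior(alpha, theta, claims, exposures):
    """Posterior alpha = prior alpha + C;  1/(posterior theta) = 1/(prior theta) + E."""
    return alpha + claims, 1.0 / (1.0/theta + exposures)

print(update_gamma_prior(3, 2/3, claims=5, exposures=1))   # (8, 0.4): the example above
print(update_gamma_prior(3, 2/3, claims=0, exposures=1))   # (3, 0.4): the zero-claim exercise
```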
Predictive Distribution:
Since the posterior distribution is also a Gamma distribution, the same analysis that led to a Negative Binomial (prior) marginal distribution, will lead to a (posterior) predictive distribution that is Negative Binomial. However, the parameters of the predictive Negative Binomial are related to the posterior Gamma. For the Gamma-Poisson the (posterior) predictive distribution is always a Negative Binomial, with r = shape parameter of the posterior Gamma, and β = scale parameter of the posterior Gamma.
Thus for the Predictive Negative Binomial: r = shape parameter of the prior Gamma + number of claims observed, while β = 1 / {1/(Scale parameter of the Prior Gamma) + number of exposures observed}.
In the particular example, r = 3 + 5 = 8. β = 1/(1/(2/3) + 1) = 2/5 = 0.4. Thus posterior to the observation of 5 claims, the chance of observing n claims in a year is given by: {(8)(9)...(8 + n - 1)/n!} 0.4^n / 1.4^(n+8).
Therefore posterior to having observed 5 claims in one year, the chance of observing 6 claims in a future year is: {13!/(7! 6!)} 0.4^6 / 1.4^14 ≅ 6.3%.
Our estimate of the chance of having 6 claims has been increased by the observations from 2.5% to 6.3%.
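As a sketch, the same comparison in Python (again using scipy's r and 1/(1+β) parameterization of the Negative Binomial):

```python
# Sketch: prior marginal versus posterior predictive probability of 6 claims in a year.
from scipy.stats import nbinom

prior_r, prior_beta = 3, 2/3            # marginal Negative Binomial
post_r, post_beta = 8, 0.4              # predictive Negative Binomial after 5 claims in 1 year
print(nbinom.pmf(6, prior_r, 1/(1 + prior_beta)))   # ~ 0.025
print(nbinom.pmf(6, post_r, 1/(1 + post_beta)))     # ~ 0.063
```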
Below the prior marginal Negative Binomial distribution (triangles) and posterior predictive Negative Binomial distribution (squares) are compared:
[Figure: plot comparing the two distributions for 0 through 10 claims.]
Observing 5 claims has increased the probability of seeing a large number of claims from this risk in the future.
Posterior Mean: One can compute the means and variances posterior to the observations. The posterior mean can be computed in either one of two ways. First one can weight together the means for each type of risk, using the posterior probabilities. This is E[λ] = the mean of the posterior Gamma = αθ = 8 / 2.5 = 3.2. Alternately, one can compute the mean of the predictive distribution: the mean of a Negative Binomial is: rβ = 8 / 2.5 = 3.2. Of course the two results match. Thus the new estimate posterior to the observations for this risk using Bayesian Analysis is 3.2. This compares to the a priori estimate of 2. In general, the observations provide information about the given risk, which allows one to make a better estimate of the future experience of that risk. Not surprisingly observing 5 claims in a single year has raised the estimated frequency from 2 to 3.2.
Posterior Expected Value of the Process Variance: Just as prior to the observations, posterior to the observations one can compute three variances: the expected value of the process variance, the variance of the hypothetical pure premiums, and the total variance. The process variance for an individual risk is its Poisson parameter λ since the frequency for each risk is Poisson. Therefore the expected value of the process variance = the expected value of λ = the posterior mean frequency = 3.2.
Posterior Variance of the Hypothetical Means: The variance of the hypothetical means is: the variance of λ = Var[λ] = Variance of the Posterior Gamma = αθ² = 8(2/5)² = 1.28. After the observation the variance of the hypothetical means is less than prior (1.28 < 1.33) since the observations have allowed us to narrow down the possibilities.23
Posterior Total Variance: The total variance is the variance of the predictive distribution. The variance of the Negative Binomial equals: rβ(1+β) = 8(0.4)(1.4) = 4.48.
The Expected Value of the Process Variance + Variance of the Hypothetical Means = 3.2 + 1.28 = 4.48 = Total Variance. In general, EPV + VHM = αθ + αθ² = αθ(1+θ) = Total Variance.
23 While the posterior VHM is usually less than the prior VHM, when the observation is sufficiently far from our prior expectations, the posterior VHM can be larger than the prior VHM. For example, with a prior Gamma with α = 2 and θ = 1/10, if we observe 5 claims in one year, then the posterior Gamma has parameters: α = 7 and θ = 1/11. The posterior VHM is 7/11² = 0.058, which is greater than the prior VHM = 2/10² = 0.020.
Buhlmann Credibility:
Next, letʼs apply Buhlmann Credibility to this example. The Buhlmann Credibility parameter K = the (prior) expected value of the process variance / the (prior) variance of the hypothetical means = 2 / (4/3) = 1.5. Note that K can be computed prior to any observation and doesnʼt depend on them. Specifically both variances are for a single insured for one year.
For the Gamma-Poisson in general, K = EPV/VHM = αθ/(αθ²) = 1/θ.
For the Gamma-Poisson the Buhlmann credibility parameter K is equal to the inverse of the scale parameter of the Prior Gamma. For the example, K = 1/θ = 1/(2/3) = 1.5. Having observed 5 claims in one year, Z = 1 / (1+ 1.5) = 0.4. The observation = 5. The a priori mean = 2. Therefore, the new estimate = (0.4)(5) + (1 - 0.4)(2) = 3.2. Note that in this case the estimate from Buhlmann Credibility matches the estimate from Bayesian Analysis. For the Gamma-Poisson the estimates from using Bayesian Analysis and Buhlmann Credibility are equal.24 Summary: The many different aspects of the Gamma-Poisson are summarized in below. It would be a good idea to know everything on that diagram for the exam. The Gamma distribution is a distribution of parameters, while the Negative Binomial is a distribution of number of claims. Be sure to be able to clearly distinguish between the situation prior to observations and that posterior to the observations. It is important to note that the Exponential distribution is a special case of the Gamma distribution, for α = 1. Therefore, many exam questions involving the Exponential-Poisson can be answered quickly as a special case of the Gamma-Poisson.
24 As discussed in a subsequent section, this is a special case of the general results for conjugate priors of members of linear exponential families. This is an example of what Loss Models refers to as "exact credibility."
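Before the summary, here is a minimal Python sketch of the Buhlmann calculation for this example, showing that it matches the Bayes estimate of 3.2:

```python
# Sketch: Buhlmann credibility for the Gamma-Poisson example (K = 1/theta).
alpha, theta = 3, 2/3
K = (alpha*theta) / (alpha*theta**2)          # EPV / VHM = 1/theta = 1.5
E, C = 1, 5                                   # one year observed, five claims
Z = E / (E + K)                               # 0.4
estimate = Z * (C/E) + (1 - Z) * alpha*theta  # 3.2, the same as the Bayes Analysis estimate
print(K, Z, estimate)
```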
Gamma-Poisson Frequency Process (summary):
• Gamma Prior (Distribution of Parameters): shape parameter = alpha = α, scale parameter = theta = θ. The Poisson parameters of the individuals making up the entire portfolio are distributed via a Gamma Distribution with parameters α and θ: f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ[α], mean = αθ, variance = αθ².
• Mixing the Poisson process over this prior Gamma gives the Negative Binomial Marginal Distribution (Number of Claims): r = shape parameter of the Prior Gamma = α. β = scale parameter of the Prior Gamma = θ. Mean = rβ = αθ. Variance = rβ(1+β) = αθ + αθ².
• Observations: # claims = C, # exposures = E.
• Gamma Posterior (Distribution of Parameters): Posterior Shape parameter = αʼ = α + C. Posterior Scale parameter: 1/θʼ = 1/θ + E.
• Mixing the Poisson process over the posterior Gamma gives the Negative Binomial Predictive Distribution (Number of Claims): r = shape parameter of the Posterior Gamma = αʼ = α + C. β = scale parameter of the Posterior Gamma = θʼ = 1/(E + 1/θ). Mean = rβ = (α + C)/(E + 1/θ). Variance = rβ(1+β) = (α + C)/(E + 1/θ) + (α + C)/(E + 1/θ)².
• Gamma is a Conjugate Prior, and the Poisson is a Member of a Linear Exponential Family: Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = 1/θ.
Comparing the Gamma and Negative Binomial Distributions:
Mixing a Poisson process over the Gamma Prior Distribution of Parameters (shape parameter = α, scale parameter = θ) produces the Negative Binomial Marginal Distribution of the number of claims, with r = α and β = θ.
The Gamma is a distribution of each insuredʼs mean frequency. ⇒ The a priori mean frequency is the mean of the Gamma. The mean of the Negative Binomial is also the a priori mean.
⇒ αθ = Mean of Gamma = Mean of Negative Binomial = rβ.
VHM = the variance of the Gamma. Total Variance = variance of the Negative Binomial.
Total Variance = EPV + VHM > VHM.
⇒ αθ² = Variance of Gamma < Variance of Negative Binomial = rβ(1+β) = αθ + αθ².
Predictive Distribution for More than One Year:
In the example, the posterior distribution of λ was Gamma with α = 8 and θ = 0.4. Therefore, the predictive distribution for the next year was Negative Binomial with r = 8 and β = 0.4. This predictive distribution was used to determine the probability of having a certain number of claims during the next year.
Sometimes one is interested in the number of claims over several future years. For example, let us determine the probability of having a certain number of claims during the next two years. The number of claims over one year is Poisson with mean λ. Therefore, the number of claims over two years is Poisson with mean 2λ. The posterior distribution of 2λ is Gamma with α = 8 and θ = (2)(0.4) = 0.8. Thus mixing this Poisson by this Gamma, the distribution of the number of claims for the next two years is Negative Binomial with r = 8 and β = 0.8.
Exercise: What is the probability that this insured has 3 claims over the next two years?
[Solution: f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(8)(9)(10)/6} 0.8^3/1.8^11 = 9.56%.]
In general, if the posterior distribution is Gamma with parameters αʼ and θʼ, then over Y future years the predictive distribution is a Negative Binomial with parameters r = αʼ and β = Yθʼ.25
Note that we do not add the predictive Negative Binomial for one year to itself. This would be the correct thing to do if we assumed each year had a different lambda picked at random. Here we are assuming that each year has the same unknown lambda.
Exercise: Alan and Bob each have 5 claims over one year. What is the probability that they have in total 3 claims over the next year?
[Solution: Each of Alan and Bob has a posterior distribution of λ which is Gamma with α = 8 and θ = 0.4. Therefore, the predictive distribution for the next year for each of them is Negative Binomial with r = 8 and β = 0.4. The number of claims Alan and Bob will have are independent. Therefore, the sum of their claims next year is the sum of their Negative Binomials, a Negative Binomial Distribution with r = (2)(8) = 16 and β = 0.4. f(3) = {r(r+1)(r+2)/3!} β^3/(1+β)^(3+r) = {(16)(17)(18)/6} 0.4^3/1.4^19 = 8.74%.
Comment: While the Negative Binomial with r = 16 and β = 0.4 has the same mean as the one with r = 8 and β = 0.8, it does not have the same probabilities. 8.74% ≠ 9.56%.]
25 For the Gamma-Poisson, the mixed distribution for Y years of data is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. See "Mahlerʼs Guide to Frequency Distributions."
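The distinction in the last comment is easy to see numerically; as a sketch, the two Negative Binomials have the same mean but different probabilities:

```python
# Sketch: three claims over the next two years for one insured (same unknown lambda: r = 8, beta = 0.8)
# versus Alan and Bob combined for one year (independent lambdas: r = 16, beta = 0.4).
from scipy.stats import nbinom

print(nbinom.pmf(3, 8, 1/(1 + 0.8)))    # ~ 0.0956
print(nbinom.pmf(3, 16, 1/(1 + 0.4)))   # ~ 0.0874
```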
Problems: 4.1 (2 points) For an insurance portfolio the distribution of the number of claims a particular policyholder makes in a year is Poisson with mean λ. The λ-values of the policyholders follow the Gamma distribution, with parameters α = 5, and θ = 1/8. The probability that a policyholder chosen at random will experience x claims is given by which of the following? A.
(5)(6)...(4 + x) 8^5 / {9^x x!}
B. (5)(6)...(4 + x) 8^5 / {9^(x+5) x!}
C. (8)(9)...(7 + x) 8^5 / {9^x x!}
D. (8)(9)...(7 + x) 8^5 / {9^(x+5) x!}
E. None of A, B, C, or D.
Use the following information to answer the following two questions: Let the likelihood of a claim be given by a Poisson distribution with parameter λ. The prior density function of λ is given by f(λ) = 3125 λ^4 e^(-5λ) / 24. You observe 1 claim in 2 years.
4.2 (2 points) The posterior density function of λ is proportional to which of the following? A. λ^3 e^(-5λ) B. λ^4 e^(-6λ) C. λ^5 e^(-7λ) D. λ^6 e^(-8λ) E. None of A, B, C, or D.
4.3 (2 points) What is the Buhlmann credibility estimate of the posterior mean claim frequency? A. less than 0.70 B. at least 0.70 but less than 0.75 C. at least 0.75 but less than 0.80 D. at least 0.80 but less than 0.85 E. at least 0.85
Use the following information to answer the next 14 questions: The number of claims a particular policyholder makes in a year is Poisson with mean λ. The λ values of the portfolio of policyholders have probability density function: f(λ) = (100,000 / 24) λ^4 e^(-10λ).
You are given the following values of the Incomplete Gamma Function Γ(α ; y):
y      α = 4    α = 5    α = 6    α = 7
3.0    0.353    0.185    0.084    0.034
4.0    0.567    0.371    0.215    0.111
5.4    0.787    0.627    0.454    0.298
7.2    0.928    0.844    0.724    0.580
4.4 (1 point) What is the mean claim frequency for the portfolio? A. less than 35% B. at least 35% but less than 45% C. at least 45% but less than 55% D. at least 55% but less than 65% E. at least 65% 4.5 (1 point) What is the probability that an insured picked at random from this portfolio will have a Poisson parameter between 0.3 and 0.4? A. less than 15% B. at least 15% but less than 16% C. at least 16% but less than 17% D. at least 17% but less than 18% E. at least 18% 4.6 (2 points) What is the probability that a policyholder chosen at random will experience 3 claims in a year? A. less than 0.6% B. at least 0.6% but less than 0.9% C. at least 0.9% but less than 1.2% D. at least 1.2% but less than 1.5% E. at least 1.5%
4.7 (1 point) What is the variance of the claim frequency for the portfolio? A. less than 0.54 B. at least 0.54 but less than 0.56 C. at least 0.56 but less than 0.58 D. at least 0.58 but less than 0.60 E. at least 0.60 4.8 (1 point) What is the expected value of the process variance? A. less than 0.48 B. at least 0.48 but less than 0.50 C. at least 0.50 but less than 0.52 D. at least 0.52 but less than 0.54 E. at least 0.54 4.9 (1 point) What is the variance of the hypothetical mean frequencies? A. less than 0.042 B. at least 0.042 but less than 0.044 C. at least 0.044 but less than 0.046 D. at least 0.046 but less than 0.048 E. at least 0.048 4.10 (1 point) An insured has 2 claims over 8 years. Using Buhlmann Credibility what is the estimate of this insured's expected claim frequency? A. less than 40% B. at least 40% but less than 45% C. at least 45% but less than 50% D. at least 50% but less than 55% E. at least 55% 4.11 (2 points) An insured has 2 claims over 8 years. What is the posterior probability density function for this insured's Poisson parameter λ? A. f(λ) = 850305.6 e−12λ λ 8 B. f(λ) = 850305.6 e−12λ λ 6 C. f(λ) = 850305.6 e−18λ λ 8 D. f(λ) = 850305.6 e−18λ λ 6 E. None of A, B, C, or D
4.12 (1 point) An insured has 2 claims over 8 years. What is the mean of the posterior distribution of λ? A. less than 40% B. at least 40% but less than 45% C. at least 45% but less than 50% D. at least 50% but less than 55% E. at least 55% 4.13 (1 point) An insured has 2 claims over 8 years. What is the probability that this insured has a Poisson parameter between 0.3 and 0.4? A. 26% B. 28% C. 30% D. 32% E. 34% 4.14 (2 points) An insured has 2 claims over 8 years. What is the variance of the posterior distribution of λ? A. less than 0.014 B. at least 0.014 but less than 0.016 C. at least 0.016 but less than 0.018 D. at least 0.018 but less than 0.020 E. at least 0.020 4.15 (1 point) An insured has 2 claims over 8 years. What is the probability that this insured has a Poisson parameter between 0.3 and 0.4? Use the Normal Approximation. A. less than 27% B. at least 27% but less than 29% C. at least 29% but less than 31% D. at least 31% but less than 33% E. at least 33% 4.16 (2 points) An insured has 2 claims over 8 years. What is the probability that this insured will experience 3 claims in the next year? A. 0.6% B. 0.8% C. 1.0% D. 1.2% E. 1.4% 4.17 (2 points) An insured has 2 claims over 8 years. What is the variance of the predictive distribution? A. 0.35 B. 0.37 C. 0.39 D. 0.41
E. 0.43
4.18 (2 points) An insured has 2 claims over 8 years. What is the probability that this insured will experience 4 claims in the next three years? A. 2.2% B. 2.4% C. 2.6% D. 2.8% E. 3.0%
Use the following information for the next 2 questions: Prior to any observations you assume each group health policyholder has a frequency distribution which is Poisson, with mean (µ)(number of individuals in that group). You assume that µ is distributed across the different group health policyholders via a Gamma Distribution. You observe the following data for a portfolio of group health policyholders:
Policyholder                 1997    1998    1999     Sum
1             # claims         17      20      16      53
              # in group        9      10      13      32
2             # claims         19      23      17      59
              # in group       11       8       7      26
3             # claims         26      30      35      91
              # in group       14      17      18      49
Sum           # claims         62      73      68     203
              # in group       34      35      38     107
4.19 (2 points) Prior to any observations, you assume the Gamma Distribution of µ has parameters α = 9 and θ = 0.2. You expect Policyholder 1 to have 14 individuals in the year 2001. What are the average number of claims expected in the year 2001 from Policyholder 1? A. 23.5 B. 24.0 C. 24.5 D. 25.0 E. 25.5 4.20 (2 points) Prior to any observations, you assume the Gamma Distribution of µ has parameters α = 9 and θ = 0.2. You expect Policyholder 2 to have 7 individuals in the year 2001. You expect the average claim to cost $800 in the year 2001. What is the Buhlmann credibility premium in the year 2001 for Policyholder 2? A. $11,250 B. $11,500 C. $11,750 D. $12,000
E. $12,250
4.21 (3 points) The number of robberies of a given convenience store during the month is assumed to be Poisson distributed with an unknown mean, that varies by store via an Exponential Distribution with mean 0.015. The Big Apple Convenience Store on Main Street has had 4 robberies over the last 36 months. What is the probability that this store will have two robberies over the next 12 months? A. less than 9% B. at least 9% but less than 10% C. at least 10% but less than 11% D. at least 11% but less than 12% E. at least 12%
Use the following information to answer the next 6 questions: The number of claims a particular policyholder makes in a year is Poisson. The values of the Poisson parameter (for annual claim frequency) for the individual policyholders in a portfolio follow a Gamma distribution, with parameters α = 3 and θ = 1/12. 4.22 (2 points) What is the chance that an insured picked at random from the portfolio will have no claims over the next three years? A.45% B. 47% C. 49% D. 51% E. 53% 4.23 (2 points) What is the chance that an insured picked at random from the portfolio will have one claim over the next three years? A. 30% B. 35% C. 40% D. 50% E. 55% 4.24 (2 points) How much credibility would be assigned to three years of data from an insured picked at random from the portfolio? A. 10% B. 15% C. 20% D. 25% E. 30% 4.25 (1 point) An insured picked at random from the portfolio is observed for three years and has no claims. Use Buhlmann credibility to estimate its future annual claim frequency. A. 0.14 B. 0.16 C. 0.18 D. 0.20 E. 0.22 4.26 (1 point) An insured picked at random from the portfolio is observed for three years and has one claim. Use Buhlmann credibility to estimate its future annual claim frequency. A. less than 0.19 B. at least 0.19 but less than 0.21 C. at least 0.21 but less than 0.23 D. at least 0.23 but less than 0.25 E. at least 0.25 4.27 (3 points) Use Bayesian Analysis to predict the future annual claim frequency of those insureds who have fewer than two claims over a three year period. A. less than 0.19 B. at least 0.19 but less than 0.21 C. at least 0.21 but less than 0.23 D. at least 0.23 but less than 0.25 E. at least 0.25
4.28 (3 points) The conditional distribution of the annual number of accidents per driver is Poisson with mean λ. λ is constant for a particular driver, but varies between different drivers. The variable λ has a gamma distribution with parameters α = 1.5 and θ = 0.03. A particular driver, Green Acker, has had a total of 2 accidents over the last 3 years. What is the probability that Green Acker will have a total of 1 accident over the next 3 years? (A) 20% (B) 21% (C) 22% (D) 23% (E) 24% Use the following information for the next 2 questions: • The random variable representing the number of claims for a single policyholder follows a Poisson distribution. • For a portfolio of policyholders, the Poisson parameters follow a Gamma distribution representing the heterogeneity of risks within that portfolio. • The random variable representing the number of claims in a year of a policyholder, chosen at random, follows a Negative Binomial distribution with parameters: r = 4 and β = 3/17. 4.29 (1 point) Determine the variance of the Gamma distribution. (A) 0.110 (B) 0.115 (C) 0.120 (D) 0125 (E) 0.130 4.30 (2 points) For a policyholder chosen at random from this portfolio, determine the chance of observing 2 claims over 5 years. (A) 17.0% (B) 17.5% (C) 18.0% (D) 18.5% (E) 19.0% Use the following information for the next three questions: (i) The annual number of claims for each policyholder follows a Poisson distribution with mean λ. (ii) The distribution of λ across all policyholders has probability density function: f(λ) = 100λe−10λ, λ > 0. A randomly selected policyholder is known to have had at least one claim last year. 4.31 (3 points) What is the expected future claim frequency of this policyholder? (A) 0.29 (B) 0.31 (C) 0.33 (D) 0.35 (E) 0.37 4.32 (2 points) Determine the posterior probability that this same policyholder will have no claims this year. (A) 0.72 (B) 0.74 (C) 0.76 (D) 0.78 (E) 0.80 4.33 (3 points) Determine the posterior probability that this same policyholder will have at least 2 claims this year. (A) 3.0% (B) 3.5% (C) 4.0% (D) 4.5% (E) 5.0%
Use the following information for the next 2 questions: Prior to any observations you assume each group health policyholder has a frequency distribution which is Poisson, with mean (µ)(number of individuals in that group). You assume that µ is distributed across the different group health policyholders via a Gamma Distribution, with parameters α = 9 and θ = 0.2. You observe the following data for a portfolio of group health policyholders:
Policyholder                    1997       1998       1999        Sum
1             $ of Loss       $8,700    $11,800    $11,100    $31,600
              # in group           9         10         13         32
2             $ of Loss      $13,000    $18,200    $27,600    $58,800
              # in group          14         17         18         49
Sum           $ of Loss      $21,700    $30,000    $38,700    $90,400
              # in group          23         27         31         81
4.34 (2 points) Assume that the average claim is $600. What is the expected pure premium for policyholder 2 in the year 2001? A. 900 B. 1000 C. 1100 D. 1200 E. 1300
4.35 (3 points) Assume that in the year 2001 the average claim cost will be $800. Assume 7% annual inflation. Assuming you expect 16 individuals in group 1 in the year 2001, what is the expected cost for policyholder 1?
A. Less than $19,500
B. At least $19,500, but less than $20,000
C. At least $20,000, but less than $20,500
D. At least $20,500, but less than $21,000
E. At least $21,000
4.36 (2 points) You are given: (i) The number of claims incurred in a month by any insured has a Poisson distribution with mean λ. (ii) The claim frequencies of different insureds are independent. (iii) The prior distribution of λ is exponential with mean 1/8. (iv) A randomly selected insured has 1 claim in the final quarter of 2004 and 3 claims in 2005. Determine the credibility estimate of the number of claims for this insured during 2006. (A) 2.4 (B) 2.5 (C) 2.6 (D) 2.7 (E) 2.8
Use the following information to answer the next 12 questions: The number of claims a particular policyholder makes in a year is Poisson. The values of the Poisson parameter (for annual claim frequency) for the individual policyholders in a portfolio of 10,000 follow a Gamma distribution, with parameters α = 4 and θ = 0.1. You observe this portfolio for one year and divide it into three groups based on how many claims you observe for each policyholder: Group A: Those with no claims. Group B: Those with one claim. Group C: Those with two or more claims. 4.37 (1 point) What is the expected size of Group A? (A) 6200 (B) 6400 (C) 6600 (D) 6800 (E) 7000 4.38 (1 point) What is the expected size of Group B? (A) 2400 (B) 2500 (C) 2600 (D) 2700 (E) 2800 4.39 (1 point) What is the expected size of Group C? (A) 630 (B) 650 (C) 670 (D) 690 (E) 710 4.40 (1 point) What is the expected future claim frequency for a member of Group A? (A) 36%
(B) 38%
(C) 40%
(D) 42%
(E) 44%
4.41 (1 point) What is the expected future claim frequency for a member of Group B? (A) 37% (B) 39% (C) 41% (D) 43% (E) 45% 4.42 (3 points) What is the expected future claim frequency for a member of Group C? (A) 52% (B) 54% (C) 56% (D) 58% (E) 60% 4.43 (1 point) What is the chance next year of 0 claims from an insured in Group A? (A) 65% (B) 67% (C) 69% (D) 71% (E) 73% 4.44 (1 point) What is the chance next year of 0 claims from an insured in Group B? (A) 65% (B) 67% (C) 69% (D) 71% (E) 73% 4.45 (3 points) What is the chance next year of 0 claims from an insured in Group C? (A) 54% (B) 56% (C) 58% (D) 60% (E) 62% 4.46 (2 points) What is the chance next year of 2 or more claims from an insured in Group A? (E) 5.9% (A) 5.1% (B) 5.3% (C) 5.5% (D) 5.7%
4.47 (2 points) What is the chance next year of 2 or more claims from an insured in Group B? (A) 7.5% (B) 7.7% (C) 7.9% (D) 8.1% (E) 8.3% 4.48 (4 points) What is the chance next year of 2 or more claims from an insured in Group C? (A) 12% (B) 13% (C) 14% (D) 15% (E) 16%
4.49 (3 points) You are given the following: • A portfolio consists of a number of independent risks. •
The number of claims per year for each risk follows a Poisson distribution with mean µ.
•
The prior distribution of µ among the risks in the portfolio is assumed to be
a Gamma distribution. • During several years, a positive number of claims are observed for a particular insured from this portfolio. Which of the following statements are true? 1. For this insured, the posterior distribution of µ can not be an Exponential. 2. For this insured, the coefficient of variation of the posterior distribution of µ is less than the coefficient of variation of the prior distribution of µ. 3. The coefficient of variation of the posterior distribution of the number of claims per year for this insured is less than the coefficient of variation of the a priori distribution of the number of claims per year. A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C or D 4.50 (2 points) You are given the following information: • For a given group health policy, the number of claims for each member follows a Poisson distribution with parameter λ. •
λ is the same for each member of a given group.
•
However, λ varies between groups, via a Gamma distribution with mean = 0.08
and variance = 0.0004. • For a particular group, during the latest three years a total of 120 claims has been observed. In each of the three years, this group had 300 members. Determine the Bayesian estimate of lambda for this group based upon the recent observations. A. less than 0.110 B. at least 0.110 but less than 0.115 C. at least 0.115 but less than 0.120 D. at least 0.120 but less than 0.125 E. at least 0.125
Use the following information for the next two questions: (i) The number of claims incurred in a year by any insured has a Poisson distribution with mean λ. (ii) For an individual insured, λ is constant over time. (iii) The claim frequencies of different insureds are independent. (iv) The prior density of λ is gamma with: f(λ) = (200λ)4 e-200λ / (6λ). (v) Preferred homeowners in Territory 5 whose homeowners insurance is written by ABC Insurance Company are assumed to each have the same mean frequency. (vi) Recent experience for such homeowners insureds has been as follows: Year Number of Insureds Number of Claims 1 200 3 2 250 2 3 300 3 4 350 ? 4.51 (2 points) Determine the Bühlmann-Straub credibility estimate of the number of claims in Year 4. (A) 4.0 (B) 4.2 (C) 4.4 (D) 4.6 (E) 4.8 4.52 (3 points) What is the probability of observing at most 2 claims in Year 4? (A) 18% (B) 20% (C) 23% (D) 27% (E) 30% 4.53 (3 points) You are given: (i) The number of claims per auto insured follows a Poisson distribution with mean λ. (ii) The prior distribution for λ has the following probability density function: f(λ) = (300λ)40 e-300λ / {λΓ(40)} (iii) Randy observes the following claims experience: Year 1 Year 2 Number of claims 60 Number of autos insured 500 600 (iv) Let Randyʼs estimate of the expected number of claims in year 2 be R. (v) Randy rotates to another area of his insurerʼs actuarial department. Andy takes over Randyʼs old job. Andy observes the following claims experience: Year 1 Year 2 Year 3 Number of claims 60 90 Number of autos insured 500 600 700 (vi) Let Andyʼs estimate of the expected number of claims in year 3 be A. Determine R + A. (A) 170 (B) 172 (C) 174 (D) 176 (E) 178
For the next 3 questions, use the following information on the number of accidents over a six year period for two sets of drivers:
Number of Accidents    Number of Female Drivers    Number of Male Drivers
0                                        19,634                    21,800
1                                         3,573                     6,589
2                                           558                     1,476
3                                            83                       335
4                                            19                        69
5                                             4                        16
6                                             1                         4
7                                             0                         2
8                                             0                         1
9                                             0                         1
Total                                    23,872                    30,293
4.54 (5 points) Fit a Negative Binomial to the data for Females using the method of moments. Test that fit by using the Chi-Square Goodness of Fit Test. Group the data using the largest number of groups such that the expected number of drivers in each group is at least 5. 4.55 (5 points) Fit a Negative Binomial to the data for Males using the method of moments. Test that fit by using the Chi-Square Goodness of Fit Test. Group the data using the largest number of groups such that the expected number of drivers in each group is at least 5. 4.56 (3 points) For each of the previous questions, assume that each insured has a Poisson frequency with mean that is constant over time. Assume that the means of the Poisson distributions are Gamma Distributed across each of the groups of drivers. In each case, using the fitted Negative Binomial Distribution, how much credibility would be given to three years of data from a single driver.
4.57 (2 points) You are given for automobile insurance: (i) Each driver has a frequency that is Poisson with mean λ. (ii) Across the portfolio, λ has a Gamma Distribution with parameters α and θ. (iii) A given driver is observed to have no claims in three years. (iv) The posterior estimate of the future claim frequency for this insured is 85% of the prior estimate. What is the value of θ? (A) 1/20
(B) 1/17
(C) 1/15
(D) 1/10
(E) Can not be determined.
Use the following information for the next two questions:
(i) The conditional distribution of the number of claims per policyholder is Poisson with mean λ.
(ii) The variable λ has a gamma distribution with parameters α and θ.
(iii) A policyholder has 1 claim in Year 1, 2 claims in year 2, and 3 claims in year 3.
4.58 (3 points) Which of the following is equal to the mean of the posterior distribution of λ?
(A) ∫_0^∞ λ^(α+4) exp[-(3 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+3) exp[-(3 + 1/θ)λ] dλ
(B) ∫_0^∞ λ^(α+4) exp[-(6 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+3) exp[-(6 + 1/θ)λ] dλ
(C) ∫_0^∞ λ^(α+6) exp[-(6 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+5) exp[-(6 + 1/θ)λ] dλ
(D) ∫_0^∞ λ^(α+6) exp[-(3 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+5) exp[-(3 + 1/θ)λ] dλ
(E) None of A, B, C, or D
4.59 (3 points) Which of the following is equal to the probability of two claims in year 4 from this policyholder?
(A) ∫_0^∞ λ^(α+5) exp[-(3 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+3) exp[-(3 + 1/θ)λ] dλ
(B) ∫_0^∞ λ^(α+5) exp[-(4 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+3) exp[-(6 + 1/θ)λ] dλ
(C) ∫_0^∞ λ^(α+7) exp[-(3 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+5) exp[-(6 + 1/θ)λ] dλ
(D) ∫_0^∞ λ^(α+7) exp[-(4 + 1/θ)λ] dλ / ∫_0^∞ λ^(α+5) exp[-(3 + 1/θ)λ] dλ
(E) None of A, B, C, or D
4.60 (4 points) You are given:
(i) The number of accidents per taxicab follows a Poisson distribution with mean λ.
(ii) λ is the same for each taxicab owned by a particular company, but λ varies between the companies.
(iii) The prior distribution for λ has the following probability density function: f(λ) = (50λ)^5 e^(-50λ) / {λ Γ(5)}.
(iv) Calloway Cab Company has the following claims experience:
                        Year 1    Year 2
Number of accidents          7         6
Number of taxicabs          20        25
The Calloway Cab Company expects to have 30 cabs in each of Years 3 and 4.
Determine the probability of a total of 9 accidents in Years 3 and 4.
(A) 8% (B) 9% (C) 10% (D) 11% (E) 12%
4.61 (4 points) The number of medical malpractice claims from each doctor is Poisson with mean λ. The improper prior distribution is: π(λ) = 1, λ > 0.
(a) Dr. Phil Fine has no claims in year 1. What is his expected number of claims in year 2?
(b) Dr. Phil Fine now has 2 claims in year 2. What is the variance of his posterior distribution of λ?
(c) Dr. Phil Fine now has 1 claim in year 3. What is the variance of his predictive distribution?
4.62 (5 points) Prior to any observations, you assume the Gamma Distribution of µ has parameters α = 10 and θ unknown. You assume the claims experience of the different policyholders are independent. Which of the following equations should be solved in order to estimate θ via maximum likelihood from the observed data?
A. 3θ = 54/(32 + 1/θ) + 60/(26 + 1/θ) + 92/(49 + 1/θ)
B. 30θ = 63/(32 + 1/θ) + 69/(26 + 1/θ) + 101/(49 + 1/θ)
C. 3θ = 54/(32 + 10/θ) + 60/(26 + 10/θ) + 92/(49 + 10/θ)
D. 30θ = 63/(32 + 10/θ) + 69/(26 + 10/θ) + 101/(49 + 10/θ)
E. None of the above.
Use the following information for the next two questions: • The number of claims in a year for an individual follows a Poisson Distribution with parameter λ. • λ follows a Gamma Distribution with α = 2 and θ = 0.10. • For 4 individuals picked at random you observe a total of 6 claims during a year. 4.63 (3 points) Determine the Bayesian estimate of the expected total number of claims next year for this group of 4 individuals. A. 1.2 B. 1.3 C. 1.4 D. 1.5 E. 1.6 4.64 (2 points) Determine the probability that this group of 4 individuals will have a total of 3 claims in the next year. A. 8.0% B. 8.5% C. 9.0% D. 9.5% E. 10.0%
Use the following information for the next two questions: You are given the following information: • The number of claims in a year for an individual follows a Poisson Distribution with parameter λ. • λ follows a Gamma Distribution with α = 2 and θ = 0.10. • For an individual you observe a total of 6 claims during 4 years. 4.65 (2 points) Determine the Bayesian estimate of the posterior annual claim frequency rate for this individual. A. 0.4 B. 0.5 C. 0.6 D. 0.7 E. 0.8 4.66 (2 points) Determine the probability that this individual will have 3 claims in the next year. A. 2.0% B. 2.5% C. 3.0% D. 3.5% E. 4.0%
4.67 (2 points) You are given: (i) The number of accidents an individual has in a year follows a Poisson distribution with mean λ. (ii) λ varies between individuals via a Gamma Distribution with α = 3 and θ = 1/100. (iii) David had two accidents in five years. Use the Bayes estimate that corresponds to the zero-one loss function, in order to predict Davidʼs future mean annual frequency. A. 3.0% B. 3.8% C. 4.0% D. 4.8% E. 5.0%
Use the following information for the next two questions:
• Pan-Global Airways is running a contest. • They will pick at random one of the flights purchased to fly on them this year, and award that customer free air travel on Pan-Global for the rest of their life.
• The number of flights purchased each year by each of their customers is Poisson with mean λ. • λ is distributed across their customers via a Gamma Distribution with α = 1/3 and θ = 6. 4.68 (3 points) What is the expected number of flights purchased by the lucky customer during the year of the contest? A. 5 B. 6 C. 7 D. 8 E. 9 4.69 (3 points) Assume that the annual number of flights taken by the lucky customer will still be Poisson, but with a mean 1.5 times what it had been. What is the expected number of future flights taken per year by the contest winner? A. 10 B. 11 C. 12 D. 13 E. 14 4.70 (2 points) Claim frequency follows a Poisson distribution with parameter λ. λ is distributed according to: g(λ) = 25 λ e-5λ. An insured selected at random from this population has two claims during the past year. Find the posterior density function for λ. Find the predictive distribution. 4.71 (2 points) Claim frequency follows a Poisson distribution with mean λ. λ is constant for an individual insured, but varies over an insured population via a Gamma Distribution with α = 6 and θ = 0.04. An insured is selected at random and is claim free for n years. Determine n if the posterior estimate of λ for this insured is 0.15. A. 5
B. 10
C. 15
D. 20
E. 25
4.72 (3 points) Claim frequency follows a Poisson distribution with parameter λ. λ follows a gamma distribution with a mean equal to 1. During the next year, 10 policies produced 20 claims. The predictive distribution has a variance of 16/9. Determine the variance of the posterior distribution of λ. A. 1/12
B. 1/11
C. 1/10
D. 1/9
E. 1/8
4.73 (4, 11/82, Q.49) (3 points) You are given the following probability density functions: Poisson: f(d|h) = e-h hd /d!, d = 0, 1, 2, ...
Mean = h. Variance = h.
Gamma: g(h) = ar e-ah hr-1 / Γ(r), 0≤ h ≤ ∞.
Γ(r+1)/Γ(r) = r, r≥1.
Mean = r/a. Variance = r/a2 . The probability distribution of claims per year (d) is specified by a Poisson distribution with parameter h. The prior distribution of h in a class is given by a Gamma distribution with parameters a, r. Given an observation of c claims in a one-year period, determine the posterior probability distribution for h. Use Bayes' Theorem. A. (a+c)r e-(a+c)h hr-1 / Γ(r)
B. (a+1)r+c+1 e-(a+1)h hr+c / Γ(r+c+1)
C. ar+c e-ah hr+c-1 / Γ(r+c)
D. (a+1)r+c e-(a+1)h hr+c-1 / Γ(r+c)
E. (a+c)r +1 e-(a+c)h hr / Γ(r+1) 4.74 (4, 5/86, Q.37) (1 point) The claim frequency rate Q has a gamma distribution with parameters α and θ. If N policies are written, the number of claims Y will be Poisson distributed with mean NQ. If we observe y claims in one year for N policies, what is the Bayesian update of Q? A. (1-Z) (αθ) + Z (y/N) ; Z = (N / (N + 1/θ))
B. (1-Z) (αθ) + Z (y/N) ; Z = (1/θ / (N + 1/θ))
C. (α + N) / (1/θ + y)
D. (α + 1/θ) / (N + y)
E. None of the above 4.75 (165, 11/86, Q.9) (1.8 points) You are using a Bayesian method to estimate the mean of the number X of accidents per year in a certain group. You assume that X has a Poisson distribution with mean m. Your prior opinion concerning m is that it has an exponential distribution, f(m) = a e-am for some a. In 1985, you observed 14 accidents in the group and revise your opinion concerning m. The mean of the posterior distribution of m is 10. Determine 1/a, the mean of the prior distribution. (A) 2 (B) 4 (C) 6 (D) 7 (E) 12
4.76 (4, 5/87, Q.42) (2 points) If n is the observed number of claims of m independent trials from a Poisson distribution with mean q, then the probability density function of n is given by: f(n) = exp(-qm) (qm)n / n! , n = 0, 1, 2,.... The prior distribution of q is Gamma, which has the following distribution: h(q) = exp(-bq)ba qa-1 / (a-1)! , q > 0 The mean of this Gamma distribution is a/b and its variance is a/b2 . What is the Buhlmann credibility given to an observation of n claims in m trials? A. ma / ( ma + b2 )
B. m / ( m + b)
C. ma / ( ma + b)
D. m / ( m + a)
E. m / ( m + b^2 )
4.77 (4, 5/87, Q.47) (2 points) Let D be a random variable which represents the number of claims experienced in a one year period. The probability density function of D is Poisson with parameter h. That is, P(D=d | h) = e-h hd /d! ; d = 0, 1, 2, .... E(D|h) = VAR(D | h) = h. The random variable H, which represents the parameter of the Poisson distribution, follows the following probability distribution: g(h) = e-h ; 0 ≤ h. E(H) = VAR(H) = 1. What is the probability of observing 2 claims in the next year? A. 1/16 B. 1/12 C. 1/8 D. 1/6 E. 1/3 4.78 (4, 5/87, Q.56) (1 point) Suppose that the distribution of the number of claims for an individual insured is Poisson. Suppose further than an insurer has a number of independent such individuals. Assume, however, that the expected number of claims for individuals from this population follows a gamma distribution. What distribution will the insurer's claims follow? A. Binomial B. Negative Binomial C. Poisson D. Gamma E. Cannot be determined
4.79 (4, 5/89, Q.40) (2 points) The number of claims X for a given insured follows a Poisson distribution, P[X = x] = λxe-λ/x!. The expected annual mean of the Poisson distribution over the population of insureds follows the distribution f(λ) = e-λ over the interval (0, ∞). An insured is selected from the population at random. Over the last year this particular insured had no claims. What is the posterior density function of λ for the selected insured? A. e-λ, for λ > 0
B. λe-λ, for λ > 0
C. 2e-2λ, for λ > 0
D. 4λe-2λ, for λ > 0
E. None of A, B, C, or D.
Use the following information for the next two questions: The probability distribution function of claims per year for an individual risk is a Poisson distribution with parameter h. The prior distribution of h is a gamma distribution given by g(h) = he-h for 0 < h < ∞. 4.80 (4, 5/90, Q.46) (2 points) Given an observation of 1 claim in a one-year period, what is the posterior distribution for h? A. e-h
B. h2 e-h /2
C. 4h2 e-2h
D. he-h
E. 4he-2h
4.81 (4, 5/90, Q.47) (1 point) What is the Buhlmann credibility to be assigned to a single observation? A. 1/4 B. 1/3 C. 1/2 D. 2/3 E. 1/(1+h)
4.82 (4, 5/90, Q.48) (2 points) An automobile insurer entering a new territory assumes that each individual car's claim count has a Poisson distribution with parameter λ. The insurer also assumes that λ has a gamma distribution with probability density function: f(λ) = e-λ/θ (λ/θ)α−1 / {θΓ(α)}. Initially, the parameters of the gamma distribution are assumed to be α = 50 and θ = 1/500. During the subsequent two year period the insurer covered 750 and 1100 cars for the first and second years, respectively. The insurer incurred 65 and 112 claims in the first and second years, respectively. What is the coefficient of variation of the posterior gamma distribution? A. 0.066 B. 0.141 C. 0.520 D. 1.000 E. Not enough information
4.83 (4, 5/91, Q.31) (2 points) The number of claims a particular policyholder makes in a year has a Poisson distribution with mean q. The q-values for policyholders follow a gamma distribution with variance equal to 0.2. The resulting distribution of policyholders by number of claims is a negative binomial with parameters r and β such that the mean is equal to: rβ, and variance equal to: rβ(1 + β) = 0.5. What is the value of r(1 + β)? A. Less than 0.6 B. At least 0.6 but less than 0.8 C. At least 0.8 but less than 1.0 D. At least 1.0 but less than 1.2 E. At least 1.2 4.84 (4, 5/91, Q.49) (2 points) The parameter µ is the mean of a Poisson distribution. If µ has a prior gamma distribution with parameters α and θ, and a sample x1 , x2 , ..., xn from the Poisson distribution is available. Which of the following is the formula for the Bayes estimator of µ (i.e. the mean of the posterior distribution)? n
A. {n/(n + 1/θ)} Σi=1..n ln(xi)/n + {1/(n/θ + 1)} αθ
B. {n/(n + 1/θ^2)} Σi=1..n xi/n + {1/(n/θ^2 + 1)} αθ
C. {αn/(αn + 1)} Σi=1..n xi/n + {1/(αn + 1)} αθ
D. {n/(n + 1/θ)} Σi=1..n xi/n + {(1/θ)/(n + 1/θ)} αθ
E. {α + Σi=1..n xi} / (n + 1/θ)
4.85 (4B, 5/92, Q.11) (2 points) You are given the following information:
• Number of claims follows a Poisson distribution with parameter λ.
• The claim frequency rate, λ, has a Gamma distribution with mean = 0.14 and variance 0.0004.
• During the latest two-year period, 110 claims have been observed.
•
In each of the two years, 310 policies were in force. Determine the Bayesian estimate of the posterior claim frequency rate based upon the latest observations. A. Less than 0.14 B. At least 0.14 but less than 0.15 C. At least 0.15 but less than 0.16 D. At least 0.16 but less than 0.17 E. At least 0.17 Use the following information for the next two questions: The number of claims for an individual risk in a single year follows a Poisson distribution with parameter λ. The parameter λ has for a prior distribution the following Gamma density function with parameters α = 1 and θ = 2: f(x) = (1/2) e-λ/2, λ > 0. You are given that three claims arose in the first year. 4.86 (4B, 5/92, Q.28) (2 points) Determine the posterior distribution of λ. A. (1/2) e-3λ/2 B. (1/12)λ3 e-λ/2 C. (1/4)λ3 e-λ/2 D. (27/32)λ3 e-3λ/2 E. (1/12)λ2 e-3λ/2 4.87 (4B, 5/92, Q.29) (2 points) Determine the Buhlmann credibility estimate for the expected number of claims in the second year. A. Less than 2.25 B. At least 2.25 but less than 2.50 C. At least 2.50 but less than 2.75 D. At least 2.75 but less than 3.00 E. At least 3.00
4.88 (4B, 11/92, Q.9) (2 points) You are given the following:
• Number of claims for a single insured follows a Poisson distribution with mean λ.
• The claim frequency rate, λ, has a gamma distribution with mean 0.10 and variance 0.0003.
• During the last three-year period 150 claims have occurred.
•
In each of the three years, 200 policies were in force. Determine the Bayesian estimate of the posterior claim frequency rate based upon the latest observations. A. Less than 0.100 B. At least 0.100 but less than 0.130 C. At least 0.130 but less than 0.160 D. At least 0.160 but less than 0.190 E. At least 0.190 4.89 (4B, 11/92, Q.16) (3 points) You are given the following:
• Number of claims follows a Poisson distribution with mean λ.
• λ has the Gamma distribution f(λ) = 3e-3λ, λ > 0.
•
The random variable Y, representing claim size, has the gamma distribution: p(y) = exp(-y/2500) / 2500, y > 0. Determine the variance of the pure premium. A. Less than 2,500,000 B. At least 2,500,000 but less than 3,500,000 C. At least 3,500,000 but less than 4,500,000 D. At least 4,500,000 but less than 5,500,000 E. At least 5,500,000
4.90 (4B, 5/93, Q.32) (2 points) You are given the following: • The number of claims for a class of business follows a Poisson distribution. • The prior distribution for the expected claim frequency rate of individuals belonging to this class of business is a Gamma distribution with mean = 0.10 and variance = 0.0025. • During the next year, 6 claims are sustained by the 20 risks in the class. Determine the variance of the posterior distribution for the expected claim frequency rate of individuals belonging to this class of business. A. Less than 0.0005 B. At least 0.0005 but less than 0.0015 C. At least 0.0015 but less than 0.0025 D. At least 0.0025 but less than 0.0050 E. At least 0.0050
4.91 (4B, 11/93, Q.2) (1 point) You are given the following: • Number of claims follows a Poisson distribution with parameter µ. • Prior to the first year of coverage, µ is assumed to have the Gamma distribution f(µ) = 1000150 µ149 e-1000µ / Γ(150), µ > 0. • In the first year, 300 claims are observed on 1,500 exposures. • In the second year, 525 claims are observed on 2,500 exposures. After two years, what is the Bayesian probability estimate of E[µ]? A. Less than 0.17 B. At least 0.17, but less than 0.18 C. At least 0.18, but less than 0.19 D. At least 0.19, but less than 0.20 E. At least 0.20 Use the following information for the next two questions: • For an individual risk in a population, the number of claims for a single exposure period follows a Poisson distribution with mean µ. •
• For the population, µ is distributed according to an exponential distribution with mean 0.1, g(µ) = 10e-10µ, µ > 0.
• An individual risk is selected at random from the population.
• After one exposure period, one claim has been observed.
4.92 (4B, 5/94, Q.25) (3 points) Determine the density function of the posterior distribution of µ for the selected risk. A. 11e-11µ
B. 10µe-11µ
C. 121µe-11µ
D. (1/10)e-9µ
E. (11e-11µ) / µ2
4.93 (4B, 5/94, Q.26) (2 points) Determine the Buhlmann credibility factor, z, assigned to the number of claims for a single exposure period. A. 1/10 B. 1/11 C. 1/12 D. 1/14 E. None of A, B, C, or D
4.94 (4B, 11/94, Q.3) (2 points) You are given the following: The number of claims for a single risk follows a Poisson distribution with mean m. m is a random variable having a prior Gamma distribution with mean = 0.5. The value of k in Buhlmannʼs partial credibility formula is 10. After five exposure periods, the posterior distribution is Gamma with mean 0.6. Determine the number of claims observed in the five exposure periods. A. 3 B. 4 C. 5 D. 6 E. 10
4.95 (4B, 11/94, Q.24) (2 points) You are given the following: r is a random variable that represents the number of claims for an individual risk and has the Poisson density function f(r) = t r e -t / r !, r = 0, 1, 2, ... The parameter t has a prior Gamma distribution with density function h(t) = 5 e-5t, t > 0. A portfolio consists of 100 independent risks, each having identical density functions. In one year, 10 claims are experienced by the portfolio. Use the Buhlmann credibility method to determine the expected number of claims in the second year for the portfolio. A. Less than 6 B. At least 6, but less than 8 C. At least 8, but less than 10 D. At least 10, but less than 12 E. At least 12 Use the following information for the next two questions: For an individual risk in a population, the number of claims for a single exposure period follows a Poisson distribution with parameter λ. For the population, λ is distributed according to an exponential distribution: h(λ) = 5 e-5λ, λ > 0. An individual risk is randomly selected from the population. After two exposure periods, one claim has been observed. 4.96 (4B, 11/94, Q.25) (2 points) For the selected risk, subsequent to the observation, determine the expected value of the process variance. A. 0.04 B. 0.20 C. 0.29 D. 5.00 E. 25.00 4.97 (4B, 11/94, Q.26) (3 points) Determine the density function of the posterior distribution of λ for the selected risk. A. 7e-7λ
B. 5λe-7λ
C. 49λe-7λ
D. 108λ2 e-6λ
E. 270λ2 e-6λ
Use the following information for the next two questions: • A portfolio of insurance risks consists of two classes, 1 and 2, that are equal in size. • For a Class 1 risk, the number of claims follows a Poisson distribution with mean λ1. • λ1 varies by insured and follows an exponential distribution with mean 0.3. • For a Class 2 risk, the number of claims follows a Poisson distribution with mean λ2. • λ2 varies by insured and follows an exponential distribution with mean 0.7. Hint: The exponential distribution is a special case of the Gamma distribution with α = 1. 4.98 (4B, 5/95, Q.7) (2 points) Two risks are randomly selected, one from each class. What is total variance of the number of claims observed for both risks combined? A. Less than 0.70 B. At least 0.70, but less than 0.95 C. At least 0.95, but less than 1.20 D. At least 1.20, but less than 1.45 E. At least 1.45 4.99 (4B, 5/95, Q.8) (2 points) Of the risks that have no claims during a single exposure period, what proportion can be expected to be from Class 1? A. Less than 0.53 B. At least 0.53, but less than 0.58 C. At least 0.58, but less than 0.63 D. At least 0.63, but less than 0.68 E. At least 0.68
4.100 (4B, 5/95, Q.12) (2 points) You are given the following: • A portfolio consists of 1,000 identical and independent risks. • The number of claims for each risk follows a Poisson distribution with mean λ. • Prior to the latest exposure period, λ is assumed to have a gamma distribution, with parameters α = 250 and θ = 1/2000. • During the latest exposure period, the following loss experience is observed: Number of Claims Number of Risks 0 906 1 89 2 4 3 1 1,000 Determine the mean of the posterior distribution of λ. A. Less than 0.11 B. At least 0.11, but less than 0.12 C. At least 0.12, but less than 0.13 D. At least 0.13, but less than 0.14 E. At least 0.14 4.101 (4B, 11/95, Q.7) (2 points) You are given the following:
• The number of claims per year for a given risk follows a Poisson distribution with mean λ.
• The prior distribution of λ is assumed to be a gamma distribution with coefficient of variation 1/6.
Determine the coefficient of variation of the posterior distribution of λ after 160 claims have been observed for this risk. A. Less than 0.05 B. At least 0.05, but less than 0.10 C. At least 0.10, but less than 0.15 D. At least 0.15 E. Cannot be determined from the given information.
4.102 (4B, 5/96, Q.21) (2 points) You are given the following:
• The number of claims per year for a given risk follows a Poisson distribution with mean θ.
• The prior distribution of θ is assumed to be a gamma distribution with mean 1/2 and variance 1/8.
Determine the variance of the posterior distribution of θ after a total of 4 claims have been observed for this risk in a 2-year period. A. 1/16 B. 1/8 C. 1/6 D. 1/2
E. 1
4.103 (4B, 11/97, Q.2) (2 points) You are given the following: • A portfolio consists of 100 identical and independent risks. • The number of claims per year for each risk follows a Poisson distribution with mean λ . • The prior distribution of λ is assumed to be a gamma distribution with mean 0.25 and variance 0.0025. During the latest year, the following loss experience is observed: Number of Claims Number of Risks 0 80 1 17 2 3 Determine the variance of the posterior distribution of λ . A. Less than 0.00075 B. At least 0.00075, but less than 0.00125 C. At least 0.00125, but less than 0.00175 D. At least 0.00175, but less than 0.00225 E. At least 0.00225 4.104 (4B, 5/98, Q.4) (2 points) You are given the following: • A portfolio consists of 10 identical and independent risks. •
• The number of claims per year for each risk follows a Poisson distribution with mean λ.
• The prior distribution of λ is assumed to be a gamma distribution with mean 0.05 and variance 0.01.
• During the latest year, a total of n claims are observed for the entire portfolio.
• The variance of the posterior distribution of λ is equal to the variance of the prior distribution of λ.
Determine n. A. 0 B. 1 C. 2
D. 3
E. 4
Use the following information for the next two questions: • The number of errors that a particular baseball player makes in any given game follows a Poisson distribution with mean λ. •
• λ does not vary by game.
• The prior distribution of λ is assumed to follow a distribution with mean 1/10, variance α/β^2, and density function f(λ) = β^α λ^(α-1) e^(-βλ) / Γ(α), 0 < λ < ∞.
• The player is observed for 60 games and makes one error.
4.105 (4B, 5/99, Q.23) (2 points) If the prior distribution is constructed so that the credibility of the observations is very close to zero, determine which of the following is the largest. A. f(0) B. f(1/100) C. f(1/20) D. f(1/10) E. f(1) 4.106 (4B, 5/99, Q.24) (2 points) If the prior distribution is constructed so that the variance of the hypothetical means is 1/400, determine the expected number of errors that the player will make in the next 60 games. A. Less than 0.5 B. At least 0.5, but less than 2.5 C. At least 2.5, but less than 4.5 D. At least 4.5, but less than 6.5 E. At least 6.5 Use the following information for the next two questions: • The number of claims for a particular insured in any given year follows a Poisson distribution with mean λ. • λ does not vary by year. • The prior distribution of λ is assumed to follow a distribution with mean 10/m, variance 10/m2 , and density function f(λ) = e-mλ m10 λ 9 / Γ(10), 0 < λ < ∞, where m is a positive integer. 4.107 (4B, 11/99, Q.23) (2 points) The insured is observed for m years, after which the posterior distribution of λ has the same variance as the prior distribution. Determine the number of claims that were observed for the insured during these m years. A. 10 B. 20 C. 30 D. 40 E. 50 4.108 (4B, 11/99, Q.24) (2 points) As the number of years of observation becomes larger and larger, the ratio of the variance of the predictive (negative binomial) distribution to the mean of the predictive (negative binomial) distribution approaches what value? A. 0 B. 1 C. 2 D. 4 E. ∞
4.109 (Course 4 Sample Exam 2000, Q.4) An individual automobile insured has a claim count distribution per policy period that follows a Poisson distribution with parameter λ. For the overall population, λ follows a distribution with density function according to an exponential distribution: h(λ) = 5 e−5λ, λ > 0. One insured is selected at random from the population and is observed to have a total of one claim during two policy periods. Determine the expected number of claims that this same insured will have during the third policy period. 4.110 (4, 5/00, Q.30) (2.5 points) You are given: (i) An individual automobile insured has an annual claim frequency distribution that follows a Poisson distribution with mean λ. (ii) λ follows a gamma distribution with parameters α and θ. (iii) The first actuary assumes α = 1 and θ = 1/6. (iv) The second actuary assumes the same mean for the gamma distribution, but only half the variance. (v) A total of one claim is observed for the insured over a three year period. (vi) Both actuaries determine the Bayesian premium for the expected number of claims in the next year using their model assumptions. Determine the ratio of the Bayesian premium that the first actuary calculates to the Bayesian premium that the second actuary calculates. (A) 3/4 (B) 9/11 (C) 10/9 (D) 11/9 (E) 4/3 4.111 (4, 5/01, Q.2) (2.5 points) You are given: (i) Annual claim counts follow a Poisson distribution with mean λ. (ii) The parameter λ has a prior distribution with probability density function: f(λ) = e−λ/3/3, λ > 0. Two claims were observed during the first year. Determine the variance of the posterior distribution of λ. (A) 9/16
(B) 27/16
(C) 9/4
(D) 16/3
(E) 27/4
4.112 (2 points) In the previous question, 4, 5/01, Q.2, determine the variance of the predictive distribution of the number of claims in the second year. (A) 2.0 (B) 2.5 (C) 3.0 (D) 3.5 (E) 4.0
4.113 (4, 11/01, Q.3 & 2009 Sample Q.58) (2.5 points) You are given: (i) The number of claims per auto insured follows a Poisson distribution with mean λ. (ii) The prior distribution for λ has the following probability density function: f(λ) = (500λ)50 e-500λ / {λΓ(50)} (iii) A company observes the following claims experience: Year 1 Year 2 Number of claims 75 210 Number of autos insured 600 900 The company expects to insure 1100 autos in Year 3. Determine the expected number of claims in Year 3. (A) 178 (B) 184 (C) 193 (D) 209 (E) 224 4.114 (4, 11/01, Q.34 & 2009 Sample Q.76) (2.5 points) You are given: (i) The annual number of claims for each policyholder follows a Poisson distribution with mean θ. (ii) The distribution of θ across all policyholders has probability density function: f(θ) = θe−θ, θ > 0. ∞
(iii) ∫0∞ θ e^(-nθ) dθ = 1/n^2.
A randomly selected policyholder is known to have had at least one claim last year. Determine the posterior probability that this same policyholder will have at least one claim this year. (A) 0.70 (B) 0.75 (C) 0.78 (D) 0.81 (E) 0.86 4.115 (3 points) In the previous question, what is the expected future annual frequency for this policyholder? (A) 13/6 (B) 11/5 (C) 9/4 (D) 7/3 (E) 5/2
4.116 (4, 11/02, Q.3 & 2009 Sample Q. 32) (2.5 points) You are given: (i) The number of claims made by an individual insured in a year has a Poisson distribution with mean λ. (ii) The prior distribution for λ is gamma with parameters α = 1 and θ = 1.2. Three claims are observed in Year 1, and no claims are observed in Year 2. Using Bühlmann credibility, estimate the number of claims in Year 3. (A) 1.35 (B) 1.36 (C) 1.40 (D) 1.41 (E) 1.43
4.117 (4, 11/03, Q.27 & 2009 Sample Q.21) (2.5 points) You are given: (i) The number of claims incurred in a month by any insured has a Poisson distribution with mean λ. (ii) The claim frequencies of different insureds are independent. (iii) The prior distribution is gamma with probability density function: f(λ) = (100λ)6 e-100λ / (120λ). Number of Insureds Number of Claims (iv) Month 1 100 6 2 150 8 3 200 11 4 300 ? Determine the Bühlmann-Straub credibility estimate of the number of claims in Month 4. (A) 16.7 (B) 16.9 (C) 17.3 (D) 17.6 (E) 18.0 4.118 (2 points) In the previous question, using the Normal Approximation, determine the probability that the number of claims observed in month 4 is more than 20. (A) 16% (B) 19% (C) 21% (D) 24% (E) 30%
4.119 (4, 5/05, Q.21 & 2009 Sample Q.191) (2.9 points) You are given: (i) The annual number of claims for a policyholder follows a Poisson distribution with mean Λ. (ii) The prior distribution of Λ is gamma with probability density function: f(λ) =
(2λ)^5 e^(-2λ) / (24λ), λ > 0.
An insured is selected at random and observed to have x1 = 5 claims during Year 1 and x2 = 3 claims during Year 2. Determine E(Λ | x1 = 5, x2 = 3). (A) 3.00
(B) 3.25
(C) 3.50
(D) 3.75
(E) 4.00
4.120 (1 point) In the previous question, for this insured what is the probability of observing 4 claims in year 3? (A) 14% (B) 15% (C) 16% (D) 17% (E) 18% 4.121 (2 points) In 4, 5/05, Q.21, for this insured what is the probability of observing a total of 6 claims in years 3, 4, and 5? (A) 5% (B) 6% (C) 7% (D) 8% (E) 9%
4.122 (4, 11/05, Q.2 & 2009 Sample Q.215) (2.9 points) You are given: (i) The conditional distribution of the number of claims per policyholder is Poisson with mean λ. (ii) The variable λ has a gamma distribution with parameters α and θ. (iii) For policyholders with 1 claim in Year 1, the credibility estimate for the number of claims in Year 2 is 0.15. (iv) For policyholders with an average of 2 claims per year in Year 1 and Year 2, the credibility estimate for the number of claims in Year 3 is 0.20. Determine θ. (A) Less than 0.02 (B) At least 0.02, but less than 0.03 (C) At least 0.03, but less than 0.04 (D) At least 0.04, but less than 0.05 (E) At least 0.05 4.123 (4, 11/06, Q.10 & 2009 Sample Q.254) (2.9 points) You are given: (i) A portfolio consists of 100 identically and independently distributed risks. (ii) The number of claims for each risk follows a Poisson distribution with mean λ. (iii) The prior distribution of λ is: π(λ) = (50λ)4 e−50λ / (6λ), λ > 0. During Year 1, the following loss experience is observed: Number of Claims Number of Risks 0 90 1 7 2 2 3 1 Total 100 Determine the Bayesian expected number of claims for the portfolio in Year 2. (A) 8 (B) 10 (C) 11 (D) 12 (E) 14
Solutions to Problems: 4.1. B. The Gamma-Poisson has a Negative Binomial mixed distribution, with parameters r = α = 5 and β = θ = 1/8. {r(r+1) ... (r + x -1)/x!}βx / (1+β)x+r = {(5)(6) ... (4+x)/x!}(1/8)x /(9/8)x+5 = {(5)(6) ... (4+x)/x!} 85 / 9x + 5. 4.2. C. For the Gamma-Poisson, if the prior Gamma has parameters α = 5, θ = 1/5, then the Posterior Gamma has parameters: α = 5 + 1 = 6 and 1/θ = 5 + 2 = 7. Posterior Gamma = θ−α λα−1 e-λ/θ / Γ(α) = 76 λ5 e-7λ / 5! = 980.4 λ5 e- 7λ . 4.3. E. For the Gamma-Poisson, Buhlmann credibility gives the same answer as the mean of the posterior distribution. The mean of the posterior Gamma with parameters α = 6 and θ = 1/7 is: 6/7. Alternately, K = 1/θ = 5. Z = 2 /(2+K) = 2/7. Prior estimate = 5/5 = 1. Observed frequency = 1/2. Therefore, new estimate = (2/7)(1/2) + (1 - 2/7)(1) = 6/7 = 0.857. 4.4. C. The prior distribution is a Gamma with α = 5 (the power to which λ is taken in the density function is α-1) and θ = 1/10 (the value multiplying −λ where it is exponentiated in the density function is 1/θ = 10.) Gamma-Poisson has a marginal distribution (prior to any observations) of a Negative Binomial with parameters: r = α = 5, β = θ = 1/10, with mean = rβ = 5/10 = 0.5. Alternately, each insured expected mean frequency is distributed via the Gamma distribution. Therefore, the overall expected mean for the portfolio is the mean of the Gamma distribution with parameters 5 and 1/10: 5/10 = 0.5. 4.5. E. The prior distribution is a Gamma with α = 5 and θ = 1/10. Thus F(x) = Γ(α; x/ θ) = Γ(5; 10x). F(.4) - F(.3) = Γ(5; 4) - Γ(5; 3 ) = .371 - .185 = 0.186. 4.6. E. Gamma-Poisson has a marginal distribution (prior to any observations) of a Negative Binomial with parameters: β = θ = 1/10, r = α = 5. f(3) = (5)(6)(7)β3 / {(3!)(1+β)3+r} = (35)(.001)/(1.18 ) = 1.6%. 4.7. B. Variance of the Negative Binomial = mean (1+β) = .5 (11/10) = .55 Alternately one can use the solutions to the next two questions: the total variance = Expected value of the process variance + Variance of the hypothetical means = .5 + .05 = 0.55.
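The solutions above all rely on the same conjugate-prior update: for a Gamma(α, θ) prior on a Poisson mean, observing C claims on E exposures gives a posterior Gamma with αʼ = α + C and 1/θʼ = 1/θ + E, while the mixed distribution is Negative Binomial with r = α and β = θ. The following short Python sketch is not part of the original solutions; it assumes scipy is available and simply reproduces the numbers from problems 4.2, 4.3 and 4.6.

from scipy.stats import nbinom

# Prior Gamma parameters and the observation from problems 4.2 and 4.3:
alpha, theta = 5.0, 1/5      # prior shape and scale
C, E = 1, 2                  # 1 claim observed in 2 years

alpha_post = alpha + C             # posterior shape = 6
theta_post = 1 / (1/theta + E)     # posterior scale = 1/7
print(alpha_post * theta_post)     # posterior mean frequency = 6/7 = 0.857

# Mixed (marginal) distribution of one year of claims is Negative Binomial with r = alpha, beta = theta.
# For problem 4.6 the prior is Gamma with alpha = 5, theta = 1/10:
r, beta = 5, 1/10
print(nbinom.pmf(3, r, 1/(1 + beta)))   # about 0.016, matching solution 4.6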
4.8. C. Expected value of the process variance = expected value of the variance of Poisson = expected value of λ = mean of Gamma = 0.5. 4.9. E. VHM = variance of λ = variance of Gamma = (5)(1/102 ) = 0.05. 4.10. A. Using the solutions to the previous two questions, K= .5 / .05 = 10. Z = 8/(8 + 10) = 4 /9. Prior estimate = .5. Observed frequency = 2/8. Therefore, the new estimate = (4/9)(1/4) + (5/9)(1/2) = 7/18 = 0.39. 4.11. D. Posterior Gamma has: αʼ = α + C = 5 + 2 = 7, 1/θʼ = 1/θ + E = 10 + 8 = 18. f(λ) = (187 /(6!))e−18λλ 6 = 850305.6 e-18 λ λ 6 . 4.12. A. Mean of the posterior Gamma is 7/18 = 0.389. Comment: Note that the posterior mean is the same as the estimate using Buhlmann credibility, 7/18, since the Gamma is a Conjugate Prior for the Poisson, which is a member of a linear exponential family. 4.13. B. The posterior distribution is a Gamma with α = 7 and θ = 1/18. Thus F(x) = Γ(α; x/θ) = Γ(7; 18x). F(0.4) - F(0.3) = Γ(7; 7.2) - Γ(7; 5.4 ) = 0.580 - 0.298 = 0.282. Comment: An example of a Bayesian interval estimation. ∞
Note that Γ(7; 7.2) = Σi=7..∞ 7.2^i e^(-7.2) / i! = 1 - Σi=0..6 7.2^i e^(-7.2) / i! = 1 - 0.420 = 0.580.
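The incomplete Gamma values used in solution 4.13 can also be checked numerically. A minimal sketch, assuming scipy is available; Γ(α; x) here is the regularized lower incomplete gamma function.

from scipy.special import gammainc
from scipy.stats import poisson

print(gammainc(7, 7.2))          # about 0.580
print(gammainc(7, 5.4))          # about 0.298
# The identity used in the comment above: Gamma(7; 7.2) = 1 - (Poisson(7.2) CDF at 6).
print(1 - poisson.cdf(6, 7.2))   # about 0.580 again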
4.14. E. Variance of the posterior Gamma, 7 / 182 = 0.022. Comment: This is less than the .050 variance of the prior; the observation has narrowed the possibilities for this insured. 4.15. A. The posterior distribution is a Gamma, with α = 7 and θ = 1/18, mean .389, and variance of 7/182 . The standard deviation is:
√7 / 18 = 0.147.
Thus F(.4) - F(.3) ≅ Φ[(.4-.389)/.147] - Φ[(.3-.389)/.147] = .530 - .272 = 0.258. Comment: Note that this differs from the exact answer of .282 obtained as the solution to a previous question using values of the Incomplete Gamma Functions. 4.16. B. The predictive distribution is Negative Binomial with parameters: β = θ = 1/18, r = α = 7. f(3) = (7)(8)(9)β3 / {(3!)(1+β)3+r} = (84)(1/183 )/(19/18)10 = 0.8%.
4.17. D. The predictive distribution is Negative Binomial with parameters: β = θ = 1/18, r = α = 7. The variance of the predictive distribution is: (7)(1/18)(1 + 1/18) = 0.410. 4.18. E. The posterior distribution of λ is Gamma with parameters: α = 7, θ = 1/18. This is for one future year. Over three future years, frequency is Poisson with mean 3λ. The posterior distribution of 3λ is Gamma with parameters: α = 7, θ = 3/18 = 1/6. Thus, the number of claims over three years is Negative Binomial with r = 7 and β = 1/6. f(4) = {(7)(8)(9)(10)/4!}β4 / (1+β)4+r = (210)(1/64 )/(7/6)11 = 2.97%. Comment: For the Gamma-Poisson, the mixed distribution for Y years of data is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. See “Mahlerʼs Guide to Frequency Distributions.” 4.19. A. We observe 53 claims from 32 exposures. The posterior Gamma has parameters α = 9 + 53 = 62 and θ = .2/(1+(.2)(32)) = .02703, with mean (62)(.02703) = 1.676. With 14 individuals we expect: (14)(1.676) = 23.5 claims. Alternately, the Buhlmann Credibility Parameter K = 1/θ = 1/.2 = 5. For 32 exposures, Z = 32/(32+5) = 32/37. The observed frequency is 53/32 . The a priori frequency is αθ = (9)(.2) = 1.8. Thus the future estimated frequency is: (32/37)(53/32) + (1.8)(5/37) = 62/37 = 1.676. Thus we expect (14)(1.676) = 23.5 claims. Comment: During 2000, we have data from 1997 to 1999 and trying to predict the year 2001. 4.20. E. We observe 59 claims from 26 exposures. The posterior Gamma has parameters α = 9 + 59 = 68 and θ = 1/(5 + 26) = .032258, with mean (68)(.032258) = 2.1935. With 7 individuals we expect (7)(2.1935) = 15.355 claims. At $800 per claim, this is $12,284. Alternately, the Buhlmann Credibility Parameter K = 1/θ = 1/.2 = 5. For 26 exposures, Z = 26/(26+5) = 26/31. The observed frequency is 59/26 . The a priori frequency is αθ = (9)(.2) =1.8. Thus the future estimated frequency is: (26/31)(59/26) + (1.8)(5/31) = 68/31 = 2.1935. Then proceed as before.
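Solutions 4.19 and 4.20 work the same example two ways. The sketch below (illustrative only, plain Python) checks that the conjugate-prior update and the Bühlmann credibility weighting give the same answer for 4.19.

alpha, theta = 9.0, 0.2   # prior Gamma
C, E = 53, 32             # claims and exposures observed

# Bayesian route: posterior Gamma mean.
bayes = (alpha + C) / (1/theta + E)              # 62/37 = 1.676

# Buhlmann route: K = 1/theta for the Gamma-Poisson.
K = 1/theta
Z = E / (E + K)
buhlmann = Z * (C/E) + (1 - Z) * (alpha*theta)   # also 62/37

print(bayes, buhlmann, 14 * bayes)               # about 23.5 expected claims for 14 individuals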
4.21. B. The prior distribution of λ per month is Gamma with α = 1, and θ = 0.015. The posterior distribution of λ is Gamma with α = 1 + 4 = 5 and 1/θ = 1/0.015 + 36 = 102.667. This is for one future month. Over 12 future months, frequency is Poisson with mean 12λ. The posterior distribution of 12λ is Gamma with α = 5, and θ = 12/102.667 = 0.11688. Thus, for 12 months, the distribution of the number of accidents is Negative Binomial with r = 5, and β = 0.11688. f(2) = {r(r+1)/2}β2/(1+β)2+r = {(5)(6)/2} (0.11688)2 /(1 + 0.11688)7 = 9.45%. Alternately, if λ is the mean over 1 month for a particular store, then 12λ is the mean over 1 year. Multiplying an Exponential (Gamma) by a constant we multiply theta by that constant. Converting to years, the prior distribution is Gamma: α = 1, and θ = (12)(0.015) = 0.18. We observe 4 robberies over 3 years. The posterior distribution is Gamma with α = 1 + 4 = 5, and 1/θ = 1/.18 + 3 = 8.556. For 1 year, the predictive distribution is Negative Binomial with r = 5 and β = 1/8.556. f(2) = {r(r+1)/2}β2/(1+β)2+r = {(5)(6)/2} (1/8.556)2 /(1 + 1/8.556)7 = 9.45%. Comment: While the predictive distribution for one month is Negative Binomial with r = 5 and β = 1/102.667, we can not add up 12 copies of this distribution in order to get the distribution of the number of accidents for 12 months. This would be what we would do if each month had a different lambda picked at random from the Gamma Distribution; here each month from a single store has the same lambda. See “Mahlerʼs Guide to Frequency Distributions.” For the Gamma-Poisson, the mixed distribution for Y years of data is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. 4.22. D. The Poisson parameters over three years are three times those on an annual basis. Therefore they are given by a Gamma distribution with α = 3 and θ = 3/12 = 1/4. (The mean frequency is now 3/4 per three years rather than 3/12 = 1/4 on an annual basis. It might be helpful to recall that θ is the scale parameter for the Gamma Distribution.) The marginal distribution for the Gamma-Poisson is a Negative Binomial, with parameters r = α = 3 and β = θ = 1/4. f(0) = 1/(1 + β)r = 1/(5/4)3 = 0.512. 4.23. A. Negative Binomial, with parameters r = α = 3 and β = θ = 1/4. Therefore f(1) = rβ/(1 + β)r+1 = (3)(1/4)/(5/4)4 = 0.3072.
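The device in solution 4.21 (multiply the posterior scale parameter by the number of future periods, then mix to a Negative Binomial) is easy to verify numerically. A sketch, assuming scipy is available:

from scipy.stats import nbinom

# Solution 4.21: monthly Poisson, prior Gamma with alpha = 1, theta = 0.015;
# 4 robberies observed in 36 months; probability of 2 robberies in the next 12 months.
alpha_post = 1 + 4
theta_post = 1 / (1/0.015 + 36)          # posterior scale per month

r = alpha_post
beta = 12 * theta_post                   # scale up for 12 future months
print(nbinom.pmf(2, r, 1/(1 + beta)))    # about 0.0945, matching the 9.45% above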
4.24. C. For the Gamma-Poisson, the Buhlmann Credibility Parameter K = 1/θ = 12. Z = 3/(3 + 12) = 1/5 = 20%. Alternately, one can use the θ of 1/4 from the Gamma distribution of Poisson parameters for 3 years and get Z = 1 / (1 + 4) = 1/5, where there is a single three-year period. Comment: Iʼd recommend sticking to the first approach under exam conditions. 4.25. D. The a priori mean annual frequency is the mean of the prior Gamma = 3/12 = 1/4. Z = .2 from the previous solution. The new estimate is: (.2)(0) + (1 - .2)(.25) = 0.2. 4.26. E. The a priori mean annual frequency is the mean of the prior Gamma = 3/12 = 1/4. The observed annual frequency is 1/3. The credibility is .2. The new estimate = (.2)(1/3) + (1 - .2)(.25) = 0.267. 4.27. C. For the Gamma-Poisson the result of Bayesian Analysis equal that of Buhlmann credibility. The estimates if one observes zero or one claim over three years are given by previous solutions as: .2 and .267. The probabilities of observing zero or one claim over three years are given by previous solutions as: .512 and .3072. Thus the combined estimate for insured with either zero or one claim over three years is the weighted average: {(.2)(.512) + (.267)(.3072)} / (.512 + .3072) = 0.225. Alternately, for a Poisson parameter of λ, the chance of zero or one claim over three years is: e−3λ + 3λe−3λ = e−3λ(1 + 3λ). The a priori probability of λ is proportional to λ2e−12λ. Thus the posterior distribution of λ is proportional to: e−3λ(1 + 3λ)λ2e−12λ = e−15λ(λ2 + 3λ3). In order to get the posterior distribution we must divide by the integral from zero to infinity. Twice using the formula for “Gamma integrals”, this integral is: 15-3Γ(3) + 3(15-4)Γ(4) = .0009481. The posterior distribution is: 1055e−15λ(λ2 + 3λ3). The mean of this distribution is the integral from zero to infinity of λ times f(λ):
∫1055e−15λ(λ3 + 3λ4) dλ = (1055){(15-4Γ(4) + 3(15-5)Γ(5))} = (1055)(.0002133) = 0.225. 4.28. A. The posterior distribution is Gamma with α = 1.5 + 2 = 3.5, and 1/θ = 1/0.03 + 3 = 36.33. Thus, the posterior distribution of 3λ is Gamma with α = 3.5, and θ = 3/36.33 = 0.0826. Thus, for a total of 3 years, the distribution of the number of accidents is Negative Binomial with r = 3.5 and β = 0.0826. f(1) = rβ/(1+β)1+r = 3.5 (0.0826)/(1 + 0.0826)4.5 = 20.2%. Comment: For the Gamma-Poisson, the mixed distribution for Y years of data is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. See “Mahlerʼs Guide to Frequency Distributions.”
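When the posterior is not a named distribution, as in solution 4.27, the "Gamma integrals" can also be done numerically. A minimal sketch, assuming scipy is available; the functions are the unnormalized posterior pieces defined in that solution.

import numpy as np
from scipy.integrate import quad

prior = lambda lam: lam**2 * np.exp(-12*lam)        # prior density for lambda, up to a constant
like  = lambda lam: np.exp(-3*lam) * (1 + 3*lam)    # P[0 or 1 claims in three years | lambda]
weight = lambda lam: prior(lam) * like(lam)         # unnormalized posterior

norm, _ = quad(weight, 0, np.inf)
mean, _ = quad(lambda lam: lam * weight(lam), 0, np.inf)
print(mean / norm)                                  # about 0.225, matching solution 4.27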
4.29. D. The variance of the Gamma = VHM = Total Variance - EPV = Variance of the Negative Binomial - Mean of the Gamma = Variance of the Negative Binomial - Mean of Negative Binomial = rβ(1+β) - rβ = rβ2 = (4)(3/17)2 = 0.125. Alternately, the parameters of the Gamma can be gotten from those of the Negative Binomial, α = r = 4, θ = β = 3/17. Then the Variance of the Gamma = αθ2 = 0.125. 4.30. B. The Gamma distribution of Poisson parameters for 1 year from the previous solution has parameters α = 4 and θ = 3/17. The distribution of Poisson parameters for 5 years has the same α = 4, but θ = (5)(3/17) =15 /17. The Negative Binomial for 5 years of data has parameters: r = α = 4 and β = θ = 15/17. Therefore the chance of 2 accidents is: f(2) = (r(r + 1)/2)β2/(1 + β)r+2 = {(4)(5)/2} (15/17)2 /(1 + 15/17)4+2 = (10)(.7785)/(44.484) = 17.5%. 4.31. A. Prob[observe 1 or more claim | λ] = 1 - Prob[0 claims | λ] = 1 - e−λ. ∞
∫0∞ (1 - e^(-λ)) 100λ e^(-10λ) dλ = 1 - 100 ∫0∞ λ e^(-11λ) dλ = 1 - 100/11^2 = 21/121.
By Bayes Theorem, the posterior distribution of λ is: (1 - e^(-λ)) 100λ e^(-10λ) / (21/121) = (12100/21)(λe^(-10λ) - λe^(-11λ)). Therefore, the expected future claim frequency of this policyholder =
∫0∞ λ (12100/21)(λe^(-10λ) - λe^(-11λ)) dλ = (12100/21) ∫0∞ (λ^2 e^(-10λ) - λ^2 e^(-11λ)) dλ = (12100/21){2/10^3 - 2/11^3} = 0.287.
Comment: Gamma type integral: ∫0∞ t^(α-1) e^(-t/θ) dt = Γ(α) θ^α, or for integer n: ∫0∞ t^n e^(-ct) dt = n! / c^(n+1).
4.32. C. From the previous solution, the posterior distribution of λ is: (12100/21)(λe−10λ - λe−11λ). Therefore, the posterior probability that this same policyholder will have 0 claims this year is: ∞
∫0∞ e^(-λ) (12100/21)(λe^(-10λ) - λe^(-11λ)) dλ = (12100/21) ∫0∞ (λe^(-11λ) - λe^(-12λ)) dλ =
(12100/21)(1/112 - 1/122 ) = 0.761. Comment: Similar to 4, 11/01, Q.34. 4.33. C. From a previous solution, the posterior distribution of λ is: (12100/21)(λe−10λ - λe−11λ). Prob[ 2 or more claims | λ] = 1 - e−λ - λe−λ. Therefore, the posterior probability that this same policyholder will have at least 2 claims this year is: ∞
∫0∞ (1 - e^(-λ) - λe^(-λ))(12100/21)(λe^(-10λ) - λe^(-11λ)) dλ =
1 - (12100/21) ∫0∞ (λe^(-11λ) + λ^2 e^(-11λ) - λe^(-12λ) - λ^2 e^(-12λ)) dλ =
1 - (12100/21)(1/112 + 2/113 - 1/122 - 2/123 ) = 0.0405. 4.34. D. Since we are not given the data on the number of claims, we divide the dollars of loss by the average claim cost. For policyholder 2 that is $58,800/$600 = 98 claims. There are 49 exposures observed for policyholder 2. Thus the posterior Gamma has parameters α = 9 + 98 = 107 and θ = .2/(1+(.2)(49)) = .01852, with mean (107)(.01852) = 1.9815. With a frequency of 1.9815 at $600 per claim, the expected pure premium is: (1.9815)(600) = $1189 per person. Alternately, one could use Buhlmann Credibility with K = 1/θ = 5, and Z = 49/(49 + 5) = 90.7%. The observed frequency is: 98/49 = 2. The a priori mean frequency is: (9)(.2) = 1.8. The estimated future frequency is: (.907)(2) + (1 - .907)(1.8) = 1.9814. The expected pure premium is: (1.9814)(600) = $1189 per person. Comment: During 2000, we have data from 1997 to 1999 and are trying to predict the year 2001.
4.35. B. Since we are not given the data on the number of claims, we divide the dollars of loss by the average claim cost. In 1997 the average claim cost is 800/(1.07)4 = 610. Thus the $8700 in losses correspond to 8700/610 = 14.26 claims. Similarly, 11,800/ (800/(1.07)3 ) = 18.07, and 11,100/ (800/(1.07)2 ) = 15.89. Thus we have 14.26 + 18.07 + 15.89 = 48.22 claims and 32 exposures. The posterior Gamma has parameters: α = 9 + 48.22 = 57.22 and 1/θ = 1/.2 + 32 = 37, with mean 57.22/37 = 1.547 . With 16 individuals we expect (16)(1.547) = 24.75 claims. At $800 per claim, this is $19,800. Alternately, one could use Buhlmann Credibility with K = 1/θ = 5. With 32 exposures, Z= 32/37. The inferred, observed frequency is 48.22/32. The a priori frequency is (9)(.2) = 1.8. Thus the estimated future frequency is: (32/37)(48.22/32) + (5/37)(1.8) = 57.22/37 = 1.546. Then proceed as before. 4.36. C. This is a Gamma-Poisson with α = 1 and θ = 1/8. We observe 3 + 12 = 15 months and 4 claims. αʼ = 1 + 4 = 5 and 1/θʼ = 8 + 15 = 23. Mean of the posterior Gamma is: 5/23. Expected number of claims during 2006 (12 months) is: (12)(5/23) = 60/23 = 2.61. Alternately, K = 1/θ = 8. Z = 15/(15 + 8) = 15/23. A prior mean is 1/8. Estimated future frequency is: (4/15)(15/23) + (1/8)(8/23) = 5/23. Expected number of claims during 2006 (12 months) is: (12)(5/23) = 2.61. 4.37. D, 4.38. B, & 4.39. D. The mixed distribution is a Negative Binomial with r = α = 4 and β = θ = 0.1. f(0) = (1+β)-r = 1.1-4 = .6830. Expected size of group A: 6830. f(1) = rβ(1+β)-(r+1) = (4)(.1)1.1-5 = .2484. Expected size of group B: 2484. Expected size of group C: 10000 - (6830 + 2484) = 686. 4.40. A. αʼ = α + 0 = 4. 1/θʼ = 1/θ + 1 = 11. αʼθʼ = 4/11 = 0.364. 4.41. E. αʼ = α + 1 = 5. 1/θʼ = 1/θ + 1 = 11. αʼθʼ = 5/11 = 0.455.
4.42. C. Expected Number of claims for this portfolio is: (10000)(4)(.1) = 4000. Expected number of claims from Group A is: (4/11)(6830) = 2484. Expected number of claims from Group B is: (5/11)(2484) = 1129. Therefore, expected number of claims from Group C is: 4000 - (2484 + 1129) = 387. Expected claim frequency for Group C is: 387 / 686 = 0.564. Alternately, by Bayes Theorem, the posterior distribution of λ is proportional to: (10000 λ3 e−10λ / 6)(1 - e−λ - λe−λ). ∞
∫0∞ (1 - e^(-λ) - λe^(-λ))(10000 λ^3 e^(-10λ) / 6) dλ = 1 - (10000/6) ∫0∞ (λ^3 e^(-11λ) + λ^4 e^(-11λ)) dλ = 1 - (10000/6)(6/11^4 + 24/11^5) = .068618.
Therefore, the posterior distribution is: 24289 λ3 e−10λ(1 - e−λ - λe−λ), with mean: ∞
∫0∞ 24289 λ^3 e^(-10λ) (1 - e^(-λ) - λe^(-λ)) λ dλ = 24289(24/10^5 - 24/11^5 - 120/11^6) = 0.564.
Comment: The future claim frequency for those with exactly 2 claims is: 6/11 = .545. However, group C also includes some policyholders who had more than 2 claims, and therefore with even higher expected claim frequencies. 4.43. D. rʼ = αʼ = 4, βʼ = θʼ = 1/11= .09091. For the predictive Negative Binomial: f(0) = 1.09091-4 = 70.6%. 4.44. A. rʼ = αʼ = 5, βʼ = θʼ = 1/11= .09091. For the predictive Negative Binomial: f(0) = 1.09091-5 = 64.7%.
4.45. C. From a previous solution, the posterior distribution is: 24289 λ3 e−10λ(1 - e−λ - λe−λ). Chance of 0 claims is: ∞
∫0∞ 24289 λ^3 e^(-10λ) (1 - e^(-λ) - λe^(-λ)) e^(-λ) dλ = 24289 ∫0∞ (λ^3 e^(-11λ) - λ^3 e^(-12λ) - λ^4 e^(-12λ)) dλ =
24289(6/114 - 6/124 - 24/125 ) = 58.3%. Comment: Prior to any observation, we expect 6830 policyholders with no claims in the first year. In the next year, we expect to see: (.706)(6830) + (.647)(2484) + (.583)(686) = 6829 policyholders with no claims, the same number as the first year, subject to rounding. A better estimate of the mean frequency for each insured results from observing; however, the distribution for the whole portfolio remains the same, assuming the original model was okay. 4.46. E. rʼ = αʼ = 4, βʼ = θʼ = 1/11= .09091. For the predictive Negative Binomial: 1 - (f(0) + f(1)) = 1 - {1.09091-4 + (4)(.09091)(1.09091-5)} = 5.86%. 4.47. E. rʼ = αʼ = 5, βʼ = θʼ = 1/11= .09091. For the predictive Negative Binomial: 1 - (f(0) + f(1)) = 1 - {1.09091-5 + (5)(.09091)(1.09091-6)} = 8.31%. 4.48. A. From a previous solution, the posterior distribution is: 24289 λ3 e−10λ(1 - e−λ - λe−λ). Chance of 2 or more claims is: ∞
∫0∞ 24289 λ^3 e^(-10λ) (1 - e^(-λ) - λe^(-λ))^2 dλ =
24289 ∫0∞ (λ^3 e^(-10λ) + 2λ^4 e^(-12λ) + λ^3 e^(-12λ) + λ^5 e^(-12λ) - 2λ^3 e^(-11λ) - 2λ^4 e^(-11λ)) dλ =
24289(6/104 + 48/125 + 6/124 + 120/126 - 12/114 - 48/115 ) = 11.6%. Alternately, for the whole portfolio we expect 686 policyholders to have 2 or more claims. We expect (5.86%)(6830) = 400 from Group A, (8.31%)(2484) = 206 from Group B, and therefore 686 - (400 + 206) = 80 from Group C. 80/686 = 11.7%.
4.49. A. For the Gamma-Poisson the posterior distribution of µ is a Gamma. Posterior α = Prior α + number of claims observed ≥ Prior α + 1 > 1. (α > 0.) Thus while the posterior distribution of µ is a Gamma it canʼt be an Exponential, which would be a Gamma with α = 1. Thus Statement #1 is true. For the Gamma Distribution, the coefficient is variation is αθ2 / (αθ) = 1/ α . Since the Posterior α = Prior α + number of claims observed ≥ Prior α + 1 > Prior α, the coefficient of variation of the posterior distribution of µ is less than the coefficient of variation of the prior distribution of µ. Statement #2 is true. The distribution of the number of claims is a Negative Binomial. The Negative Binomial has a coefficient of variation of: rβ(1+ β) / (rβ) = (1+ β) / (rβ) . However, r = α and β = θ. Thus the prior CV2 is: (1+β)/(rβ) = (1+θ) / αθ = (1 + 1/θ) / α. Let C claims be observed in E years. Posterior α = Prior α + C. 1/Posterior θ = E + 1/Prior θ. Thus the posterior CV2 is: (1 + E + 1/θ) /(C+α). Therefore, depending on the values of C and E, the posterior CV of the distribution of the number of claims can be either larger or smaller than the prior CV. Statement #3 is not true. 4.50. D. The scale parameter of a Gamma is the variance/mean, which in this case for the prior Gamma is: 0.0004 / 0.08 = 1/200. Then the shape parameter is the mean divided by the scale parameter, which for the prior Gamma is: (0.08)/(1/200) = 16. For the Gamma-Poisson, the posterior Gamma has shape parameter = prior shape parameter + number of claims observed = 16 + 120 = 136, and inverse posterior scale parameter = inverse prior scale parameter + the number of observed exposures = 200 + 900 = 1100. Thus the posterior scale parameter of the posterior Gamma is 1/1100. The Bayesian estimate is the mean of the posterior Gamma = posterior shape parameter times the posterior scale parameter = 136/1100 = 0.1236. Alternately, for the Gamma-Poisson the Bayes estimate is equal to the Buhlmann Credibility estimate. For the Gamma-Poisson, the Buhlmann Credibility parameter is the inverse scale parameter of the prior Gamma. Thus the Buhlmann Credibility parameter K = 200. We have observed (3)(300) = 900 member-years, so that the credibility Z = N / (N+K) = 900 / (900 + 200) = 9/11. The observed frequency is 120 / 900 = 0.1333. The prior estimate of the frequency is the mean of the prior Gamma, which is 0.08. Thus the new estimate of the frequency = (9/11)(0.1333) + (2/11)(0.08) = 0.1236.
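Solution 4.50's conversion of the prior mean and variance into (α, θ) and then into the Bayesian estimate is mechanical enough to script. A sketch in plain Python, using only the figures given in that solution:

mean, var = 0.08, 0.0004      # prior Gamma mean and variance
theta = var / mean            # scale = 1/200
alpha = mean / theta          # shape = 16

C, E = 120, 900               # observed claims and member-years
print((alpha + C) / (1/theta + E))   # 136/1100 = 0.1236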
4.51. C. Gamma-Poisson with prior α = 4 and prior θ = 1/200. αʼ = α + (3 + 2 + 3) = 12. 1/θʼ = 1/θ + (200 + 250 + 300) = 950. Estimated future frequency = mean of the posterior Gamma = αʼθʼ = 12/950. Estimate of the number of claims in Year 4: (350)(12/950) = 4.42. Alternately, K = 1/θ = 200. Observed frequency = (3 + 2 + 3)/(200 + 250 + 300) = 8/750. Prior mean frequency = mean of the prior Gamma = αθ = 4/200 = .02. Z = 750/(750 + K) = .789. Estimated future frequency = (.789)(8/750) + (.211)(.02) = .01264. Estimate of the number of claims in Year 4: (350)(.01264) = 4.42. Comment: Similar to 4, 11/03, Q.27. 4.52. C. Posterior to Year 3, the distribution of λ for one is exposure is Gamma with α = 12 and θ = 1/950. For 350 exposures, the mean frequency is Poisson with mean 350 λ. The distribution of 350 λ is Gamma with α = 12 and θ = 350/950 = 7/19. The number of claims in Year 4 is Negative Binomial with r = 12 and β = 7/19 f(0) = 1/(1 + β)r = (26/19)12 = .02319. f(1) = f(0)rβ/(1+β) = (.02319)(12)(7/26) = .07492. f(2) = f(1)((r+1)/2)β/(1+β) = (.07492)(13/2)(7/26) = .13113. f(0) + f(1) + f(2) = 22.9%. Comment: For the Gamma-Poisson, the mixed distribution for Y exposures is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. See “Mahlerʼs Guide to Frequency Distributions.” 4.53. A. The prior distribution of λ is Gamma with α = 40 and θ = 1/300. After one year: αʼ = α + C = 40 + 60 = 100. 1/θʼ = 1/θ + E = 300 + 500 = 800. Estimated future frequency = Mean of posterior Gamma = αʼθʼ= 100/800 = 1/8. Expected number of claims in year 2: (600)(1/8) = 75. Updating for one more year of data: αʼ = α + C = 100 + 90 = 190. 1/θʼ = 1/θ + E = 800 + 600 = 1400. Estimated future frequency = Mean of posterior Gamma = αʼθʼ= 190/1400 = .1357. Expected number of claims in year 3: (700)(.1357) = 95. R + A = 75 + 95 = 170. Alternately, Andy can update for both years at once: αʼ = α + C = 40 + 60 + 90 = 190. 1/θʼ = 1/θ + E = 300 + 500 + 600 = 1400. Proceed as before.
4.54. For female drivers, the mean is: 0.211126. The second moment is: 0.292895. rβ = .211126. rβ(1+β) = 0.292895 - 0.2111262 = 0.248321.
⇒ 1 + β = .248321/.211126 = 1.17617. ⇒ β = 0.17617. ⇒ r = .211126/.17617 = 1.1984.
Number of Claims    Observed    Method of Moments Negative Binomial    Expected    Chi Square
0                   19,634      0.8232820                              19,653.4    0.019
1                    3,573      0.1477789                               3,527.8    0.580
2                      558      0.0243305                                 580.8    0.896
3                       83      0.0038853                                  92.7    1.025
4 and over              24      0.0007233                                  17.3    2.625
Sum                 23,872      1                                      23,872.0    5.145
Where the last group is 4 and over, since 5 and over would have had 2.7 expected drivers. With 2 fitted parameters, we have 5 - 1 - 2 = 2 degrees of freedom. For the Chi-Square with 2 d.f. the critical value for 10% is 4.605 and for 5% is 5.991. 4.605 < 5.145 < 5.991. Thus we reject the fit at 10%, and do not reject the fit at 5%. 4.55. For male drivers, the mean is: 0.361701. The second moment is: 0.574357. rβ = 0.361701. rβ(1+β) = 0.574357 - 0.3617012 = 0.443529.
⇒ 1 + β = 0.443529/0.361701 = 1.22623. ⇒ β = 0.22623. ⇒ r = .361701/.22623 = 1.5988.
Number of Claims    Observed    Method of Moments Negative Binomial    Expected    Chi Square
0                   21,800      0.7217573                              21,864.2    0.188
1                    6,589      0.2128941                               6,449.2    3.030
2                    1,476      0.0510369                               1,546.1    3.175
3                      335      0.0112953                                 342.2    0.150
4                       69      0.0023959                                  72.6    0.176
5 and over              24      0.0006205                                  18.8    1.441
Sum                 30,293      1                                      30,293.0    8.162
Where the last group is 5 and over, since 6 and over would have had 3.8 expected drivers. With 2 fitted parameters, we have 6 - 1 - 2 = 3 degrees of freedom. For the Chi-Square with 3 d.f. the critical value for 5% is 7.815 and for 2.5% is 9.348. 7.815 < 8.162 < 9.348. Thus we reject the fit at 5%, and do not reject the fit at 2.5%.
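The method-of-moments fit and the chi-square statistic in solutions 4.54 and 4.55 follow the same recipe. The sketch below assumes scipy is available and takes the empirical moments as given in solution 4.54 rather than recomputing them from the raw data; it reproduces the female-driver table.

import numpy as np
from scipy.stats import nbinom

counts = np.array([19634, 3573, 558, 83, 24])   # drivers with 0, 1, 2, 3, 4+ claims
mean, second = 0.211126, 0.292895               # empirical moments from solution 4.54

beta = (second - mean**2) / mean - 1            # 0.17617
r = mean / beta                                 # 1.1984

p = 1 / (1 + beta)
probs = [nbinom.pmf(k, r, p) for k in range(4)]
probs.append(1 - sum(probs))                    # group "4 and over"
expected = counts.sum() * np.array(probs)

print(((counts - expected)**2 / expected).sum())   # about 5.1, with 5 - 1 - 2 = 2 degrees of freedom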
4.56. For each group, we have assumed a Gamma-Poisson. r ⇔ α and β ⇔ θ. The data is for six years, so each mean Poisson frequency is 6λ, where λ is the mean for a single year. Thus each Gamma Distribution has 6 times the scale parameter for one year. Therefore, the θ for one year is: the fitted β divided by 6. The Buhlmann Credibility parameter for the Gamma-Poisson is 1/θ for one year. K = 6/(fitted beta). For three years of data, Z = 3/(3 + K). Data Set Fitted Beta K Credibility for Three Years of Data Female 0.17617 34.1 8.1% Male 0.22623 26.5 10.2% Comment: The males drivers are a less homogeneous group than the female drivers. Thus the experience of male driver is given more credibility, compared to the mean of all male drivers, than is the case for female drivers. The credibility assigned to an individual driversʼ experience would be less if one took into account classifications and territories. Data for 1969-1974 California Drivers, taken from Table 1 and Table A2 of “The Distribution of Automobile Accidents-Are Relativities Stable Over Time?,” by Emilio C. Venezian, PCAS 1990. The means for each driver are not constant over time. See “A Markov Chain Model of Shifting Risk Parameters,” by Howard C. Mahler, PCAS 1997. This results in less credibility being given to older years of data. 4.57. B. This is a Gamma-Poisson. The prior estimate is: αθ. αʼ = α + C = α + 0 = α. 1/θʼ = 1/θ + E = 1/θ + 3. ⇒ θʼ = θ/(1 + 3θ). The posterior estimate is: αʼθʼ = αθ/(1 + 3θ). (Posterior estimate)/(prior estimate) = 1/(1 + 3θ) Posterior estimate is 85% of prior estimate. ⇒ 0.85 = 1/(1 + 3θ). ⇒ θ = 1/17. Alternately, Z = the claims free discount = 1 - 85% = 15%. For the Gamma Poisson, K = 1/θ. Z = 3/(3 + 1/θ) = 3θ/(3θ + 1) = 15%. ⇒ θ = 1/17.
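The conversion in solution 4.56 above, from the fitted six-year β to an annual Bühlmann credibility parameter and then to Z for three years of data, can be written out directly. An illustrative sketch in plain Python:

for label, beta_6yr in [("Female", 0.17617), ("Male", 0.22623)]:
    K = 6 / beta_6yr          # annual Buhlmann credibility parameter, as in solution 4.56
    Z = 3 / (3 + K)           # credibility for three years of data
    print(label, round(K, 1), round(Z, 3))   # 34.1 and 0.081; 26.5 and 0.102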
4.58. D. The chance of the observation given λ is: (λe−λ)(λ2e−λ/2)(λ3e−λ/6) = λ6e−3λ/12. π(λ) = θ−α λα−1 e−λ/θ / Γ(α). Therefore, the posterior distribution of λ is proportional to: (λ6e−3λ)(λα−1 e−λ/θ) = λα+5 e−(3+1/θ)λ. The mean of the posterior distribution of λ is: (integral of λ times the “probability weight”)/(integral of the “probability weight”) ∞
=
∫
λα + 6 exp[−(3 + 1/ θ) λ] dλ /
0
∞
∫ λα + 5 exp[−(3 + 1/ θ) λ] dλ . 0
Comment: The integrals are of the Gamma type. ∞
∫ λα + 6 exp[−(3 +
1/ θ) λ] dλ = Γ[α+7] / (3 + 1/θ)α +7.
0
∞
∫ λα + 5 exp[−(3 + 1/ θ) λ] dλ = Γ[α+6] / (3 + 1/θ)α +6. 0
{Γ[α+7] / (3 + 1/θ)α +7}/{Γ[α+6] / (3 + 1/θ)α +6} = (α + 6)/(3 + 1/θ). The posterior distribution is Gamma with αʼ = α + 6, and 1/θʼ = 1/θ + 3. The mean of this Gamma is: αʼ θʼ = (α + 6)/(3 + 1/θ).
4.59. E. The chance of 2 claims given λ is: λ2e−λ/2. From the previous solution, the posterior distribution of λ is proportional to: λα+5 e−(3+1/θ)λ. The probability of two claims in year 4 from this policyholder is: (integral of λ2e−λ/2 times the “probability weight”)/(integral of the “probability weight”) ∞
= 0.5
∫
0
λα + 7 exp[−(4 + 1/ θ) λ] dλ /
∞
∫ λα + 5 exp[−(3 + 1/ θ) λ] dλ . 0
Comment: The integrals are of the Gamma type. ∞
∫ λα + 7 exp[−(4 + 1/ θ) λ] dλ = Γ[α+8] / (4 + 1/θ)α + 8. 0
∞
∫ λα + 5 exp[−(3 + 1/ θ) λ] dλ = Γ[α+6] / (3 + 1/θ)α + 6. 0
(.5){Γ[α+8] / (4 + 1/θ)α + 8} / {Γ[α+6] / (3 + 1/θ)α + 6} = {(α+6)(α+7)/2} (3 + 1/θ)α + 6 / (4 + 1/θ)α + 8. The posterior distribution is Gamma with αʼ = α + 6, and 1/θʼ = 1/θ + 3. The predictive distribution is Negative Binomial with r = α + 6, and β =1/(1/θ + 3). The density at two of this Negative Binomial is: {r(r+1)/2!} β2/(1+β)2+r = {(α + 6)(α + 7)/2} {1/(3 + 1/θ)}2 / {(4 + 1/θ)/(3 + 1/θ)}α + 8 = {(α+6)(α+7)/2} (3 + 1/θ)α + 6 / (4 + 1/θ)α + 8.
4.60. B. The prior distribution is Gamma with α = 5 and θ = 1/50. The posterior distribution of λ is Gamma with α = 5 + 7 + 6 = 18, and 1/θ = 50 + 20 + 25 = 95. For 60 exposures, the frequency is Poisson with mean 60λ. The posterior distribution of 60λ is Gamma with α = 18, and θ = 60/95 = 12/19. Thus, for a total of 60 exposures in years 3 and 4, the distribution of the number of accidents is Negative Binomial with r = 18 and β = 12/19. f(9) = {(18)(19)(20)(21)(22)(23)(24)(25)(26)/9!} β9 / (1+β)9+r = 9.1%. Comment: For the Gamma-Poisson, the mixed distribution for Y exposures of data is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. See “Mahlerʼs Guide to Frequency Distributions.” A graph of the distribution of the total number of accidents in years 3 and 4: f(x)
4.61. a. The posterior distribution is proportional to: π(λ) f(0) = e−λ, an Exponential with mean 1. b. The posterior distribution is proportional to: π(λ) f(0) f(2) = λ2 e−2λ/2. This is proportional to a Gamma Distribution with α = 3 and θ = 1/2, which must be the posterior distribution. It has a variance of: αθ2 = 3/4. c. The posterior distribution is proportional to: π(λ) f(0) f(2) f(1) = λ3 e−3λ/2. This is proportional to a Gamma Distribution with α = 4 and θ = 1/3 which must be the posterior distribution. Mixing Poissons via this Gamma produces a Negative Binomial predictive distribution, with r = 4 and β = 1/3. This Negative Binomial has a variance of: rβ(1+β) = αθ2 = 4(1/3)(4/3) = 16/9. Alternately, posterior EPV = posterior mean = (4)(1/3) = 4/3. Posterior VHM = Variance of posterior gamma = (4)(1/3)2 = 4/9. Posterior Total Variance = EPV + VHM = 4/3 + 4/9 = 16/9.
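Solution 4.61(c) splits the predictive variance into the posterior expected process variance plus the variance of the posterior Gamma. A small check in plain Python, using the posterior parameters derived above:

alpha, theta = 4, 1/3                 # posterior Gamma from solution 4.61(c)
r, beta = alpha, theta                # predictive Negative Binomial parameters

vhm = alpha * theta**2                # posterior variance of lambda = 4/9
epv = alpha * theta                   # posterior mean = expected process variance = 4/3
print(vhm + epv, r * beta * (1 + beta))   # both 16/9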
4.62. B. The contributions to the likelihood from each policyholder will multiply, since their experience is assumed to be independent. For the first policyholder, if it has a Poisson parameter of µ, then the likelihood from 1997 is f(17) for a Poisson with mean 9µ: (9µ)^17 e^(−9µ)/17!. Multiplying the likelihoods from the 3 years, the likelihood for the first policyholder is: {(9µ)^17 e^(−9µ)/17!} {(10µ)^20 e^(−10µ)/20!} {(13µ)^16 e^(−13µ)/16!}. Ignoring annoying constants, which will not affect the maximum likelihood, this is: µ^(17+20+16) e^(−(9+10+13)µ) = µ^53 e^(−32µ). Given µ in turn is distributed via a Gamma with α = 10 and θ unknown, we can calculate the expected value of this likelihood:
∫₀^∞ µ^53 e^(−32µ) θ^(−10) µ⁹ e^(−µ/θ) / Γ(10) dµ = {θ^(−10)/Γ(10)} ∫₀^∞ µ^62 e^(−(32 + 1/θ)µ) dµ = {θ^(−10)/Γ(10)} {Γ(63) (32 + 1/θ)^(−63)} = {Γ(63)/Γ(10)} θ^(−10) (32 + 1/θ)^(−63).
The first policyholder had 53 claims and 32 exposures and the likelihood was proportional to: θ^(−10) (32 + 1/θ)^(−63) = θ^(−α) (E + 1/θ)^(−(C+α)).
Thus for the second policyholder with 59 claims and 26 exposures, the likelihood is proportional to: θ^(−10) (26 + 1/θ)^(−69). For the third policyholder with 91 claims and 49 exposures, the likelihood is proportional to: θ^(−10) (49 + 1/θ)^(−101).
The product of the likelihoods is: θ^(−30) (32 + 1/θ)^(−63) (26 + 1/θ)^(−69) (49 + 1/θ)^(−101).
The loglikelihood is: −30 ln(θ) − 63 ln(32 + 1/θ) − 69 ln(26 + 1/θ) − 101 ln(49 + 1/θ).
Setting the derivative with respect to θ equal to 0: −30/θ − (−1/θ²) 63/(32 + 1/θ) − (−1/θ²) 69/(26 + 1/θ) − (−1/θ²) 101/(49 + 1/θ) = 0.
30θ = 63/(32 + 1/θ) + 69/(26 + 1/θ) + 101/(49 + 1/θ).
Comment: In this case, the maximum likelihood estimate of θ = 0.192. Very difficult! More generally assume we have N policyholders, with policyholder i having Ci claims and Ei exposures. Then if α is known, the maximum likelihood equation for θ is: αθ = (1/N) Σ (α + Ci)/(Ei + 1/θ). If we performed Bayesian analysis on each policyholder separately, then policyholder i has future expected frequency: (α + Ci)/(Ei + 1/θ). Thus the maximum likelihood equation for θ is equivalent to: overall average future claim frequency = average of the future claim frequencies for the individual policyholders.
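Not part of the original solution, but the maximum likelihood equation can be solved numerically to confirm θ of about 0.192; a minimal bisection sketch:

    def g(t):
        # 30*theta minus the right-hand side of the maximum likelihood equation
        return 30*t - (63/(32 + 1/t) + 69/(26 + 1/t) + 101/(49 + 1/t))

    lo, hi = 0.05, 1.0          # g(lo) < 0 < g(hi)
    for _ in range(60):         # bisection
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    print((lo + hi) / 2)        # about 0.192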
4.63. B. & 4.64. D. The number of claims for the group is the sum of 4 independent Poissons, which in turn is also Poisson with mean equal to: λ1 + λ2 + λ3 + λ4. The lambdas are independent draws from a Gamma Distribution with α = 2 and θ = 0.10. Thus their sum is a Gamma Distribution with α = 8 and θ = 0.10. Thus we have a Gamma-Poisson, with 6 claims in one year. αʼ = α + C = 8 + 6 = 14. 1/θʼ = 1/θ + E = 10 + 1 = 11. Posterior mean is: 14/11 = 1.273.
The predictive distribution is Negative Binomial with r = 14 and β = 1/11.
f(3) = {r(r+1)(r+2)/6} β³ / (1+β)^(r+3) = {(14)(15)(16)/6} (1/11)³ / (12/11)^17 = 9.58%.
Comment: The average number of claims next year per individual is: 1.273/4 = 0.318. The mathematics of looking at 4 random individuals for one year is somewhat different than when looking at one individual for 4 years.
4.65. C. & 4.66. A. We have a Gamma-Poisson. αʼ = α + C = 2 + 6 = 8. 1/θʼ = 1/θ + E = 10 + 4 = 14. Posterior mean is: 8/14 = 0.571.
The predictive distribution is Negative Binomial with r = 8 and β = 1/14.
f(3) = {r(r+1)(r+2)/6} β³ / (1+β)^(r+3) = {(8)(9)(10)/6} (1/14)³ / (15/14)^11 = 2.05%.
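An illustrative numerical check of both densities (not part of the original solutions):

    import math

    def nb_pmf(k, r, beta):
        # Negative Binomial probability function with parameters r and beta
        return math.comb(k + r - 1, k) * (1/(1 + beta))**r * (beta/(1 + beta))**k

    print(nb_pmf(3, 14, 1/11))  # about 0.0958, i.e. 9.58%
    print(nb_pmf(3, 8, 1/14))   # about 0.0205, i.e. 2.05%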
4.67. B. The posterior distribution of λ is Gamma with α = 3 + 2 = 5, and 1/θ = 100 + 5 = 105. The zero-one loss function corresponds to the mode. The mode of this Gamma is: θ (α - 1) = (1/105)(4) = 3.8%.
4.68. E. Let x be the number of flights purchased by customers during the contest year. X follows the mixed distribution, a Negative Binomial with r = 1/3 and β = 6. The chance of a customer being picked is proportional to his number of flights x. Thus the chance that the customer picked had purchased x flights is proportional to: x Prob[X = x] = x f(x). Now the sum of x f(x) is the mean of the Negative Binomial: rβ = (1/3)(6) = 2. Let Y be the number of flights purchased by the lucky customer. Then g(y) = y f(y)/2, where f(y) is the Negative Binomial with r = 1/3 and β = 6.
E[Y] = Σ y g(y) = Σ y² f(y)/2 = (1/2)(second moment of f) = (1/2){(variance of f) + (mean of f)²} = (1/2){(1/3)(6)(7) + 2²} = 9.
4.69. C. Let Y be the number of flights purchased by the lucky customer. Then g(y) = y f(y)/2, where f(y) is the Negative Binomial with r = 1/3 and β = 6. Given a customer who took y flights in a year, his posterior distribution of lambda is: Gamma with α = 1/3 + y, and 1/θ = 1/6 + 1 = 7/6. Thus the posterior mean is: (1/3 + y)(6/7) = 2/7 + 6y/7. Thus assuming no change in behavior, the mean number of flights next year is:
Σ (2/7 + 6y/7) g(y) = (2/7) Σ g(y) + (6/7) Σ y g(y) = (2/7)(1) + (6/7) Σ y² f(y)/2 = 2/7 + (3/7)(second moment of f) = (2/7) + (3/7){(1/3)(6)(7) + 2²} = 8.
Given that the lucky customer on average will increase his flights by 50%, the expected number of future flights taken per year by the contest winner is: (1.5)(8) = 12.
Comment: A customer who takes more than his average number of flights during the contest year is more likely to have won the contest. Therefore, the winning customer is likely to have taken more than his average number of flights during the contest year. Therefore, the average lambda for the contest winner is 8, which is less than 9, the average number of flights taken during the contest year by the contest winner.
4.70. This is a Gamma-Poisson with α = 2 and θ = 1/5. The posterior distribution is Gamma with α = 2 + 2 = 4, and 1/θ = 5 + 1 = 6. ⇒ θ = 1/6. Thus, the predictive distribution, the posterior mixed distribution, is Negative Binomial with r = 4 and β = 1/6.
4.71. C. The posterior distribution is Gamma with α = 6 + 0 = 6, and 1/θ = 25 + n. Thus, the mean of the posterior distribution is: 6/(25 + n). Set: 6/(25 + n) = 0.15. ⇒ n = 15.
Alternately, the prior mean is: (6)(0.04) = 0.24. Z(0) + (1 − Z)(0.24) = 0.15. ⇒ Z = 3/8. For the Gamma-Poisson, K = 1/θ = 1/0.04 = 25. ⇒ 3/8 = n/(n + 25). ⇒ n = 15.
4.72. D. αθ = 1. ⇒ 1/θ = α. The posterior distribution is Gamma with αʼ = α + 20, and 1/θʼ = 1/θ + 10 = α + 10. The predictive distribution, the posterior mixed distribution, is Negative Binomial with rʼ = α + 20, and βʼ = 1/(α + 10). The variance of the predictive distribution is: rʼβʼ(1 + βʼ) = (α + 20) {1/(α + 10)} {(α + 11)/(α + 10)}.
16/9 = (α + 20) {1/(α + 10)} {(α + 11)/(α + 10)}. ⇒ (16)(α + 10)² = (9)(α + 20)(α + 11). ⇒ 7α² + 41α − 380 = 0. ⇒ α = 5. ⇒ αʼ = 25, and θʼ = 1/15. Thus the variance of the posterior distribution is: αʼθʼ² = 25/15² = 1/9.
4.73. D. p(h | c) is proportional to f(c | h) g(h) = (e^(−h) h^c / c!) a^r e^(−ah) h^(r−1) / Γ(r) = a^r e^(−(a+1)h) h^(r+c−1) / (Γ(r) c!). This is proportional to e^(−(a+1)h) h^(r+c−1). One can recognize that this is proportional to a Gamma Distribution with parameters (a+1), (r+c). Therefore, p(h|c) = (a+1)^(r+c) e^(−(a+1)h) h^(r+c−1) / Γ(r+c).
Comment: Note that r is the shape parameter of the prior Gamma, corresponding to α in Loss Models, while a is the inverse scale parameter, corresponding to 1/θ in Loss Models.
4.74. A. For the Gamma-Poisson, the posterior Gamma has parameters: α + y, and 1/(1/θ + N). The Bayesian Update is the mean of the posterior distribution: (α + y)/(1/θ + N) = {(1/θ)/(N + 1/θ)}(αθ) + {N/(N + 1/θ)}(y/N) = {1 − N/(N + 1/θ)}(αθ) + {N/(N + 1/θ)}(y/N).
Alternately, for the Gamma-Poisson the estimates from Buhlmann Credibility and Bayesian Analysis are equal. For the Gamma-Poisson the Buhlmann Credibility parameter is equal to the inverse of the scale parameter of the (prior) Gamma, which is 1/θ in this case. Thus Z = N/(N + 1/θ). The prior estimate is the mean of the prior Gamma or αθ. The observed frequency is y/N. Thus the new estimate is: Z(y/N) + (1 − Z)(αθ).
4.75. A. Gamma-Poisson with α = 1 and 1/θ = a. αʼ = 1 + 14. 1/θʼ = a + 1. 10 = αʼθʼ = 15/(a + 1). ⇒ a = 1/2. ⇒ 1/a = 2.
4.76. B. The expected value of the process variance = E[q] = a/b. The variance of the hypothetical means = VAR[q] = a/b². Therefore, the Buhlmann Credibility Parameter = K = (a/b)/(a/b²) = b. Thus m trials is given credibility of m/(m+K) = m/(m+b).
4.77. C. P(D=2) = ∫₀^∞ P(D=2 | h) g(h) dh = ∫₀^∞ (e^(−h) h²/2!) e^(−h) dh = (1/2) ∫₀^∞ h² e^(−2h) dh = (1/2) Γ(3)/2³ = 2/16 = 1/8.
4.78. B. The marginal distribution for the Gamma-Poisson is a Negative Binomial.
4.79. C. The prior Gamma has α = 1 (an Exponential Distribution), and θ = 1. We observe 0 claims in 1 year. The posterior Gamma has α′ = prior α + number of claims = 1 + 0 = 1 (an Exponential Distribution) and 1/θ′ = prior 1/θ + number of exposures = 1 + 1 = 2. That is, the posterior density function is: 2e^(−2λ).
Comment: Given λ, the chance of observing zero claims is: λ⁰ e^(−λ)/0! = e^(−λ). The posterior distribution is proportional to the chance of observation and the a priori distribution of λ: (e^(−λ))(e^(−λ)) = e^(−2λ). Dividing by the integral of e^(−2λ) from 0 to ∞ gives the posterior distribution: e^(−2λ)/(1/2) = 2e^(−2λ).
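As an illustrative check of 4.77 (not in the original, and assuming scipy is available), the mixing integral can be evaluated numerically:

    import numpy as np
    from scipy import integrate

    # P(D = 2) = integral over h of Poisson(2 | h) times the Exponential prior g(h) = e^(-h)
    val, _ = integrate.quad(lambda h: (np.exp(-h) * h**2 / 2) * np.exp(-h), 0, np.inf)
    print(val)  # about 0.125 = 1/8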
4.80. C. The prior Gamma Distribution has parameters α = 2 and θ = 1. Thus the posterior Gamma has parameters α = prior α + number of claims = 2 + 1 = 3, 1/posterior θ = 1/prior θ + number of exposures = 1 + 1 = 2. Thus the posterior Gamma has density f(h) = θ^(−α) h^(α−1) e^(−h/θ)/Γ(α) = 2³ h^(3−1) e^(−2h)/Γ(3) = 8h² e^(−2h)/2 = 4h² e^(−2h).
Comment: For the Gamma, f(h) = θ^(−α) h^(α−1) e^(−h/θ)/Γ(α). Thus for α = 2 and θ = 1, f(h) = 1^(−2) h^(2−1) e^(−h)/Γ(2) = h e^(−h), the given prior distribution.
4.81. C. The prior Gamma Distribution has parameters α = 2 and θ = 1. For the Gamma-Poisson the Buhlmann Credibility Parameter K = 1/(prior θ) = 1. Thus for one observation, Z = 1/(1 + K) = 1/2.
4.82. A. For the Gamma-Poisson, the posterior Gamma has shape parameter α′ = prior α + number of claims observed = 50 + 65 + 112 = 227. For the Gamma Distribution, the mean is αθ, while the variance is αθ². Thus the coefficient of variation is: √variance / mean = √(αθ²)/(αθ) = 1/√α. The CV of the posterior Gamma is: 1/√227 = 0.066.
Comment: 1/θ′ = 1/θ + number of exposures = 500 + 750 + 1100 = 2350. Thus the posterior θ is 1/2350. One could go from the prior Gamma to the Gamma posterior of both years of observations in two steps, by first computing the Gamma posterior of one year by just adding in the exposures and claims observed over the first year.
4.83. B. The variance of the mixed Negative Binomial Distribution is equal to the total variance of the portfolio = the EPV + VHM = Mean of the Gamma + Variance of the Gamma. Thus Mean of the Gamma = 0.5 − 0.2 = 0.3. Solving for the parameters of the Gamma: mean = αθ = 0.3, and variance = αθ² = 0.2. Thus θ = 0.2/0.3 = 2/3 and α = (0.3)/(2/3) = 0.45. Now r = α = 0.45, while β = θ = 2/3. Thus r(1+β) = (0.45)(5/3) = 0.75.
Alternately, since β = θ and r = α, the variance of the Gamma = αθ² = rβ². Thus since we are given the variance of the Gamma is 0.2, rβ² = 0.2. Also we are given that rβ(1+β) = 0.5. Therefore (1+β)/β = 0.5/0.2, and β = 2/3. Therefore r = 0.2/(2/3)² = 0.45. r(1+β) = (0.45)(5/3) = 0.75.
4.84. E. For the Gamma-Poisson, the posterior Gamma has shape parameter = α + Σxi, and inverse of the posterior scale parameter = 1/θ + n. The Bayesian estimate is the mean of the posterior Gamma = posterior shape parameter / posterior inverse scale parameter = {α + Σxi}/(n + 1/θ). Alternately, for the Gamma-Poisson the Bayes estimate is equal to the Buhlmann Credibility estimate. For the Gamma-Poisson, the Buhlmann Credibility parameter is 1/θ, the inverse scale parameter of the prior Gamma. Thus Z = n/(n + 1/θ), and 1 − Z = (1/θ)/(n + 1/θ). The observed mean is (1/n)Σxi and the prior estimate is the mean of the prior Gamma: αθ. Thus the new estimate is: {(1/n)Σxi}{n/(n + 1/θ)} + {αθ}{(1/θ)/(n + 1/θ)} = {α + Σxi}/(n + 1/θ).
4.85. D. The prior Gamma has mean = αθ = 0.14, and variance = αθ² = 0.0004. Thus α = 0.14²/0.0004 = 49 and θ = 0.0004/0.14 = 1/350. The posterior Gamma has parameters α′ = prior α + number of claims observed = 49 + 110 = 159, and 1/θ′ = prior 1/θ + number of exposures observed = 350 + (2)(310) = 970. Thus θ′ = 1/970. The Bayesian estimate is the mean of the posterior Gamma = α′θ′ = 159/970 = 0.164.
Alternately, one can calculate the Buhlmann Credibility estimate, which for the Gamma-Poisson is equal to the Bayesian Estimate. EPV = Expected Value of the Poisson Parameter = Mean of Prior Gamma = 0.14. VHM = Variance of the Poisson Parameters = Variance of the Prior Gamma = 0.0004. K = EPV/VHM = 0.14/0.0004 = 350 = prior 1/θ. Weʼve observed (2)(310) = 620 policy-years of exposures, so Z = 620/(620 + 350) = 0.639. The prior estimate is 0.14, while the observation is 110/620 = 0.177. Thus the new estimate is: (0.639)(0.177) + (1 − 0.639)(0.14) = 0.164.
Comments: Note that the number of exposures observed is: (2)(310) rather than 310, since the frequency is expressed per year.
4.86. D. αʼ = α + C = 1 + 3 = 4, and 1/θʼ = 1/θ + E = 1/2 + 1 = 1.5. Posterior Gamma: θ^(−α) x^(α−1) e^(−x/θ)/Γ(α) = 1.5⁴ x^(4−1) e^(−1.5x)/Γ(4) = (81/16) x³ e^(−1.5x)/6 = (27/32) x³ e^(−1.5x).
4.87. C. For the Gamma-Poisson, the Buhlmann Credibility estimate is equal to the Bayesian Estimate. The latter is equal to the mean of the Posterior Gamma (which equals the mean of the Poisson parameters of the individual insureds.) Based on the solution to the previous question, the posterior Gamma has α = 4 and 1/θ = 1.5, and thus a mean of: 4/1.5 = 2.67. Alternately, for the Gamma-Poisson the Buhlmann Credibility Parameter is the inverse of the prior scale parameter. Thus K = 0.5. Z = N/(N+K) = 1/(1 + 0.5) = 2/3. The observation is 3/1. The prior estimate is the mean of the prior Gamma = 1/0.5 = 2. Thus the posterior estimate = (2/3)(3) + (1/3)(2) = 8/3.
4.88. E. The Gamma distribution has a mean of αθ and variance of αθ². We are given 0.1 = αθ and 0.0003 = αθ², so that θ = 0.0003/0.1 = 0.003, and α = 33.33. For the Gamma-Poisson, the posterior Gamma has shape parameter = prior shape parameter plus the number of claims observed = 33.33 + 150 = 183.33. The posterior Gamma has the inverse of the scale parameter equal to the inverse of the prior scale parameter plus the number of exposures observed = 333.33 + (3)(200) = 933.33. Thus the posterior scale parameter is 1/933.33. The Bayes Estimate is the mean of the posterior Gamma Distribution = 183.33/933.33 = 0.1964.
Comment: Note that there are 200 exposures for each of three years, which counts as observing 600 exposures (because the frequencies are numbers of claims per year per exposure.)
4.89. D. The severity is given by an Exponential Distribution, with mean of 2500 and variance of 2500². The distribution of λ is Exponential; E[λ] = 1/3, VAR[λ] = 1/3² = 1/9. The hypothetical mean frequencies differ, but the hypothetical mean severities do not. The hypothetical mean pure premium is 2500λ. Thus the variance of the hypothetical mean pure premiums is VAR[2500λ] = 2500² VAR[λ] = (6.25 million)(1/9) = 0.694 million. The process variance of the pure premium is given by: (mean frequency)(variance of the severity) + (mean severity)²(variance of frequency) = (λ)(2500²) + (2500)²(λ) = λ(12.5 million). Therefore the EPV of the pure premium = E[λ](12.5 million) = (1/3)(12.5 million) = 4.167 million. The total variance of the pure premiums is: EPV + VHM = 4.167 + 0.694 million = 4.861 million.
Comment: One has to assume that the frequency and severity are independent. The frequency is given by a Gamma-Poisson process, with a Gamma with parameters α = 1 and θ = 1/3. The mixed distribution is Negative Binomial with parameters r = 1 and β = 1/3, mean rβ = 1/3, and variance rβ(1+β) = (1)(1/3)(4/3) = 4/9. The variance of the pure premium is given by: (mean freq.)(variance of the severity) + (mean severity)²(variance of frequency) = (1/3)(2500²) + (2500)²(4/9) = 4,861,111. However, this alternative only works when the frequency is Poisson and the variance of the hypothetical mean severities is zero.
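A small numerical restatement of the variance calculation in 4.89, added for illustration and not part of the original text:

    mean_sev, mean_lambda, var_lambda = 2500, 1/3, 1/9
    epv = mean_lambda * (mean_sev**2 + mean_sev**2)   # E[lambda] times 12.5 million
    vhm = mean_sev**2 * var_lambda                    # Var[2500 * lambda]
    print(epv + vhm)                                  # about 4,861,111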
4.90. D. Prior Gamma has mean = 0.1 = αθ and variance = 0.0025 = αθ². Therefore θ = 0.0025/0.1 = 1/40, and α = 0.1/θ = 4. For the Gamma-Poisson Conjugate Prior, the Posterior Gamma has shape parameter equal to the shape parameter of the Prior Gamma + Number of Claims Observed = 4 + 6 = 10. The inverse of the scale parameter of the Posterior Gamma = inverse of the scale parameter of Prior Gamma + number of exposures = 40 + 20 = 60. Thus θʼ = 1/60. The variance of the Posterior Gamma = 10/60² = 0.00278.
4.91. D. Prior Gamma has scale parameter θ = 1/1000 and shape parameter α = 150. After the first year of observations: the new inverse scale parameter = old inverse scale parameter + number of exposures = 1000 + 1500 = 2500, and the new shape parameter = old shape parameter + number of claims = 150 + 300 = 450. Similarly, after the second year of observations: new inverse scale parameter = old inverse scale parameter + number of exposures = 2500 + 2500 = 5000, and the new shape parameter = old shape parameter + number of claims = 450 + 525 = 975. The Bayesian estimate = the mean of the posterior Gamma = (Posterior Shape parameter)(Posterior scale parameter) = (975)(1/5000) = 0.195.
Comment: One can go directly from the prior Gamma to the Gamma posterior of both years of observations, by just adding in the exposures and claims observed over the whole period of time. One would obtain the same result.
4.92. C. Since the exponential is a special case of the Gamma, this is a Gamma-Poisson process. The new shape parameter = prior shape parameter + # claims = 1 + 1 = 2. Posterior inverse scale parameter = prior inverse scale parameter + # exposures = 10 + 1 = 11. Therefore the Posterior density = θ^(−α) µ^(α−1) e^(−µ/θ)/Γ(α) = 11² µ^(2−1) e^(−11µ)/Γ(2) = 121µ e^(−11µ).
Alternately, one can apply Bayes Theorem and integrate. The chance of observing one claim in one exposure period given µ is µe^(−µ).
∫₀^∞ µ e^(−µ) g(µ) dµ = ∫₀^∞ 10µ e^(−11µ) dµ = [−(10/11)µ e^(−11µ) − (10/121) e^(−11µ)]₀^∞ = 10/121.
By Bayes Theorem, the posterior density of µ is: (the a priori probability)(chance of the observation), divided by the above integral. Therefore the density equals: (10 e^(−10µ))(µ e^(−µ))/(10/121) = 121µ e^(−11µ).
4.93. B. For the Gamma-Poisson, the Buhlmann Credibility parameter K = the inverse of the scale parameter of the prior Gamma = 10. Thus Z = 1/(1 + 10) = 1/11.
Alternately, for the Gamma-Poisson the Buhlmann credibility estimate equals the Bayes Analysis estimate. The latter is the mean of the posterior Gamma (with α = 2, θ = 1/11) calculated in the previous question: 2/11. The mean of the prior Gamma is 1/10. The observation is 1. Thus (2/11) = (Z)(1) + (1 − Z)(1/10). Therefore Z = 1/11.
Alternately, EPV = E[µ] = Mean of Prior Gamma = 1/10. VHM = VAR[µ] = Variance of the Prior Gamma = 1/10². Therefore K = (1/10)/(1/100) = 10. Therefore Z = 1/11.
4.94. B. Since K = 10, for 5 exposure periods, credibility = 5/(5 + 10) = 1/3. For the Gamma-Poisson the Buhlmann Credibility result is equal to the result of Bayesian Analysis, the mean of the posterior distribution, which is given as 0.6. Therefore we know that the Buhlmann credibility estimate of the mean is 0.6. If x is the number of claims observed in the five exposure periods, then the observed frequency is x/5. The prior estimate is 0.5, the mean of the prior Gamma distribution. Therefore we have: 0.6 = Buhlmann Credibility Estimate = (1/3)(x/5) + (2/3)(1/2). Therefore x = 4.
Alternately, for the Poisson-Gamma, the value of K in the Buhlmann credibility formula is equal to the inverse of the scale parameter of the prior Gamma distribution. Thus θ = 1/10. The mean of the prior Gamma is αθ = 0.5. Thus the prior Gamma distribution has a shape parameter α = 5. For the Gamma-Poisson, the posterior Gamma has new inverse scale parameter equal to the prior inverse scale parameter plus the number of exposures. Thus the posterior inverse scale parameter is equal to 10 + 5 = 15. The posterior shape parameter is equal to the prior shape parameter of 5 plus the number of claims observed, or 5 + x. However, we are given that the posterior Gamma has a mean of 0.6. But this equals the ratio of its shape parameter to its inverse scale parameter. Thus 0.6 = (5 + x)/15. Therefore x = 4.
4.95. D. One solution uses the Bayesian result for the Gamma-Poisson and the fact that for the Gamma-Poisson the Bayes result equals the Buhlmann Credibility result. The prior Gamma has parameters: 1, 1/5, and we observe 10 claims for 100 exposures. Therefore, the posterior Gamma has parameters: 1 + 10 = 11, 1/(5 + 100) = 1/105. Therefore the Posterior Gamma has mean 11/105 = 10.5%, so 100 risks are expected to have 10.5 claims.
Alternately, the observed is 10 claims. Prior mean of Gamma is 1/5 = 0.2, so 100 risks are expected to have 100/5 = 20 claims. The process variance for the Poisson is λ, which has expected value = mean of prior Gamma = 1/5 = 0.2. The variance of the hypothetical means is the variance of the prior Gamma = αθ² = 1/5² = 0.04. Buhlmann K = EPV/VHM = 0.2/0.04 = 5. Z = 100/(100 + 5) = 0.95. New Estimate = (10)(95%) + (20)(5%) = 10.5.
Comment: Once you know that the prior estimate is 20 and that the observation is 10, then the estimate based on Buhlmann Credibility must be between 10 and 20 and therefore only choices D and E are possible answers.
4.96. C. The prior Gamma has parameters 1, 1/5. The posterior Gamma has parameters: 1 + 1 = 2, 1/(5 + 2) = 1/7. It has mean 2/7. For the Poisson, the process variance is the mean λ. So the expected value of the process variance for this risk after the observation is the expected value of λ, or the mean of the posterior Gamma: 2/7 = 0.29.
Comments: Note that the Exponential-Poisson is a special case of the Gamma-Poisson.
4.97. C. The prior Gamma has parameters 1, 1/5. The posterior Gamma has parameters: 1 + 1 = 2, 1/(5 + 2) = 1/7. f(λ) = 7² λ^(2−1) exp(−7λ)/Γ(2) = 49λ exp(−7λ).
Alternately, since over 2 periods we have a Poisson with parameter 2λ, the chance of observing one claim over two periods is equal to: 2λ exp(−2λ). So by Bayes Theorem, the posterior chance of various values of lambda is proportional to this chance times h(λ): 5 exp(−5λ) 2λ exp(−2λ) = 10λ exp(−7λ). We need only divide by its integral in order to convert to a probability density function. One can either compute this integral or realize it must be the Gamma Distribution with shape parameter 2 and scale parameter 1/7.
4.98. E. The variance for the sum of both risks combined is the sum of the individual variances. (Variances add for independent variables.) For each risk, its total variance is the sum of the EPV and VHM. For each Class, the EPV = E[Poisson Variance] = E[Poisson Mean] = Mean for the Class. The Variance of the Hypothetical Means for risks from a class is: Var[Poisson Means] = Variance of Exponential Distribution = θ² = (Mean for the Class)².

Class   Mean   EPV   VHM    Total Variance
1       0.3    0.3   0.09   0.39
2       0.7    0.7   0.49   1.19
SUM                         1.58

Comment: The Exponential Distribution has a mean of θ and a variance of θ². Alternately, one can get the total variance for each risk from the mixed negative binomial distributions. As shown in the solution to the next question, the mixed distribution for Class 1 has parameters r = 1 and β = 0.3. The variance for the Negative Binomial is rβ(1+β) = 0.39. Similarly the variance of a risk from Class 2 with r = 1 and β = 0.7 is rβ(1+β) = 1.19. The sum of the two variances is therefore: 0.39 + 1.19 = 1.58.
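The table can be reproduced with a few lines of arithmetic; this check is illustrative only and not part of the original guide:

    for cls, mean in [(1, 0.3), (2, 0.7)]:
        epv = mean          # expected process variance of a Poisson equals its mean
        vhm = mean**2       # variance of the Exponential distribution of Poisson means
        print(cls, epv + vhm)   # 0.39 and 1.19; they sum to 1.58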
4.99. B. For the Gamma-Poisson the mixed distribution is a Negative Binomial with parameters r = shape parameter of the Gamma = 1, and β = θ. For risks from Class 1, β = θ = 0.3. For risks from Class 2, β = θ = 0.7. For the Negative Binomial distribution, f(0) = 1/(1+β)^r = 1/(1+β). Thus the chances of observing zero claims (over one exposure period) for the two classes are: 0.769 and 0.588. Therefore:

A: Class   B: A priori Probability   C: Chance of Observation   D: Probability Weight = B x C   E: Probability = D / Sum of D
1          0.5                       0.769                      0.3845                          0.5667
2          0.5                       0.588                      0.2940                          0.4333
Overall                                                         0.6785                          1.0000
The probability of a risk being from Class 1 if no claims are observed (over one exposure period) is 56.7%.
Comment: One can work out the probabilities of observing zero claims given a risk from one of the Classes. The chance of zero claims for a Poisson with mean λ is e^(−λ). Integrating this probability over the values of lambda gives:
∫₀^∞ e^(−λ) f(λ) dλ = ∫₀^∞ e^(−λ) e^(−λ/θ)/θ dλ = (1/θ) ∫₀^∞ e^(−(1 + 1/θ)λ) dλ = (1/θ)/(1 + 1/θ) = 1/(1 + θ).
Which for theta equal to 0.3 and 0.7 respectively for the two Classes gives probabilities of 0.769 and 0.588 as obtained above. The remainder of the solution proceeds as above.
4.100. B. The number of observed claims is: (89)(1) + (4)(2) + (3)(1) = 100, for 1000 observed risks. For the Gamma-Poisson the posterior Gamma has parameters: α = prior α + number of claims observed = 250 + 100 = 350, 1/θ = 1/prior θ + number of risks observed = 2000 + 1000 = 3000. The mean of the posterior Gamma is αθ = 350/3000 = 0.1167.
Alternately, use the fact that for the Gamma-Poisson the Buhlmann Credibility estimate is equal to the Bayesian Estimate. For the Gamma-Poisson, the Buhlmann Credibility parameter is the inverse of the scale parameter of the prior Gamma, 1/θ = 2000. Therefore, the credibility for 1000 observed risks is Z = 1000/(1000 + 2000) = 1/3. The prior estimate is the mean of the prior Gamma = αθ = 250/2000 = 0.125. The observed frequency is: 100/1000 = 0.100. Therefore, the Buhlmann Credibility estimate is: (1/3)(0.100) + (2/3)(0.125) = 0.1167.
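An illustrative Bayes-theorem check of the 56.7% figure in 4.99 (not part of the original solution):

    priors = {1: 0.5, 2: 0.5}
    theta  = {1: 0.3, 2: 0.7}
    # chance of zero claims for a class is 1/(1 + theta)
    weights = {c: priors[c] / (1 + theta[c]) for c in priors}
    total = sum(weights.values())
    print(weights[1] / total)   # about 0.5667, i.e. 56.7%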
4.101. B. For the Gamma Distribution, the mean is αθ, while the variance is αθ². Thus the coefficient of variation is: √variance / mean = √(αθ²)/(αθ) = 1/√α.
Thus for the Gamma Distribution, α = 1/CV². Thus the prior Gamma has: α = 1/(1/6)² = 36. For the Gamma-Poisson, the posterior Gamma has shape parameter α′ = prior α + number of claims observed = 36 + 160 = 196. The CV of the posterior Gamma = 1/√196 = 1/14 = 0.0714.
4.102. C. The Prior Gamma Distribution has mean = 1/2 = αθ and variance = 1/8 = αθ². Therefore θ = (1/8)/(1/2) = 1/4, and α = (mean)/(θ) = (1/2)/(1/4) = 2. The Posterior Gamma has α′ = prior α + number of claims observed = 2 + 4 = 6. The Posterior Gamma has 1/θ′ = 1/prior θ + number of exposures observed = 4 + 2 = 6. Therefore the Posterior Gamma has a variance of: α′(θ′)² = 6/6² = 1/6.
4.103. B. One can solve for the parameters of the prior Gamma, α and θ, via the Method of Moments: mean = αθ = 0.25, variance = αθ² = 0.0025. Thus θ = 1/100, and α = 25. The parameters of the posterior Gamma are: posterior alpha = prior alpha + number of observed claims = 25 + 23 = 48. posterior theta = 1/{(1/prior theta) + number of observed exposures} = 1/(100 + 100) = 1/200. Variance of the posterior Gamma is: (posterior alpha)(posterior theta)² = 48/200² = 0.0012.
4.104. C. Let the parameters of the prior Gamma be α and θ. Then the prior Gamma has mean = αθ = 0.05 and variance αθ² = 0.01. Therefore α = 0.25 and θ = 1/5. Let the parameters of the posterior Gamma be α´ and θ´. Then α´ = α + n = 0.25 + n and 1/θ´ = 1/θ + 10 = 15. The posterior Gamma has a variance of: 0.01 = α´(θ´)² = (0.25 + n)/15². Solving, n = 2.
4.105. D. Mean of the prior Gamma is 1/10 = α/β. If Z → 0, then K → ∞. But for the Gamma-Poisson, the Buhlmann Credibility Parameter = K = β. Thus β → ∞. However, the variance of the prior Gamma is α/β² = 1/(10β) → 0. Thus most of the probability of the prior Gamma is close to its mean of 1/10. f(1/10) is large.
Comment: The credibility of the observation is small because the VHM is small. Note that in this question the parameter β for the Gamma Distribution corresponds to 1/θ in Loss Models.
4.106. C. VHM = α/β² = 1/400. Also 1/10 = α/β. Therefore, α = 4 and β = 40. The parameters of the posterior Gamma are 4 + 1 = 5 and 40 + 60 = 100. The posterior mean is 5/100 = 0.05. Therefore, the expected number of errors that the player will make in the next 60 games is (0.05)(60) = 3.
Alternately, K = β = 40. Z = 60/(60 + 40) = 60%. Prior mean is 1/10. Observation = 1/60. The estimate = (0.6)(1/60) + (0.4)(1/10) = 0.05. (0.05)(60) = 3.
Comment: Note that in this question the parameter β for the Gamma Distribution corresponds to 1/θ in Loss Models. A better model would classify players by position. For example, first basemen have a higher average fielding percentage than shortstops. This is similar to the classification schemes for insurance, which use characteristics of the insureds to divide the universe into more homogeneous groups.
4.107. C. The Prior Gamma has parameters α = 10 and θ = 1/m. If we observe C claims in m years, then the posterior Gamma has parameters: α = 10 + C, and 1/θ = m + m = 2m. Variance of posterior Gamma = (10 + C)/(2m)². Variance of prior Gamma = 10/m². We want: 10/m² = (10 + C)/(2m)². Thus, 40 = 10 + C. ⇒ C = 30.
4.108. B. For any Negative Binomial, mean = rβ, variance = rβ(1+β), variance/mean = 1 + β. The predictive Negative Binomial after Y years has β = posterior θ = 1/(Y + 1/prior θ) = 1/(Y + m). 1 + β = 1 + (posterior θ) = 1 + 1/(m + Y), if we observe for Y years. As Y goes to infinity, 1 + β goes to 1.
Comment: For a Negative Binomial Distribution, the variance is always greater than the mean, thus Choice A (of 0) can be eliminated. As we observe more and more years, the predictive distribution gets closer and closer to a Poisson Distribution, that of the individual insured we are observing, and thus the ratio of the variance to the mean approaches one.
4.109. This is a Gamma-Poisson, with prior Gamma with parameters α = 1 and θ = 1/5. The posterior Gamma has parameters: α = 1 + 1 = 2. 1/θ = 5 + 2 = 7. Thus the posterior Gamma has a mean of αθ = 2/7.
Alternately, one can use Buhlmann Credibility. K = 1/prior θ = 5. Z = 2/(2 + 5) = 2/7. Observed frequency = 1/2. A priori estimate is the mean of the prior Gamma, 1/5. New Estimate = (2/7)(1/2) + (5/7)(1/5) = 2/7.
4.110. C. The first actuary has a posterior Gamma with parameters: αʼ = α + C = 1 + 1 = 2 and 1/θʼ = 1/θ + E = 1/(1/6) + 3 = 9. So for the first actuary, the posterior Gamma has a mean of: αʼθʼ = 2/9. The second actuary has a prior Gamma with a mean the same as that of the first actuaryʼs: (1)(1/6) = 1/6, and variance half that of the first actuaryʼs: (1/2){(1)(1/6)²} = 1/72. Thus as per fitting via the method of moments, for the second actuary: mean = αθ = 1/6 and variance = αθ² = 1/72. Thus the second actuary has a prior Gamma with parameters: θ = (1/72)/(1/6) = 1/12 and α = (1/6)/(1/12) = 2. The second actuary has a posterior Gamma with parameters: αʼ = α + C = 2 + 1 = 3 and 1/θʼ = 1/θ + E = 1/(1/12) + 3 = 15. So for the second actuary, the posterior Gamma has a mean of: αʼθʼ = 3/15 = 1/5. Therefore, the ratio of the Bayesian premium that the first actuary calculates to the Bayesian premium that the second actuary calculates is: (2/9)/(1/5) = 10/9.
Alternately, one can work with credibilities. For the first actuary, K = 1/θ = 6. For three years Z = 3/9 = 1/3. Prior mean = mean of prior Gamma = αθ = 1/6. Observation = 1/3. Estimate = (1/3)(1/3) + (1 − 1/3)(1/6) = 2/9. For the second actuary, K = 1/θ = 12. For three years Z = 3/15 = 1/5. Prior mean = mean of prior Gamma = αθ = 1/6. Observation = 1/3. Estimate of 2nd actuary = (1/5)(1/3) + (1 − 1/5)(1/6) = 1/5. The ratio of their estimates is: (2/9)/(1/5) = 10/9.
Comment: The second actuary assumes there is less variation between the insureds, and therefore applies less weight to the observation than does the first actuary. Bullet (iv) in the question applies to the prior Gamma, rather than the posterior Gamma.
4.111. B. This is a Gamma-Poisson, with α = 1 (Exponential) and θ = 3. We observe 2 claims in 1 year. Therefore, the posterior distribution is Gamma with αʼ = α + C = 1 + 2 = 3, and 1/θʼ = 1/θ + E = 1/3 + 1 = 4/3. θʼ = 3/4. The variance of the posterior Gamma Distribution is: αθ² = (3)(3/4)² = 27/16.
4.112. E. The posterior distribution is Gamma with α = 3 and θ = 3/4. Therefore, the predictive distribution is Negative Binomial with r = 3, β = 3/4, and variance: rβ(1 + β) = (3)(3/4)(7/4) = 63/16 = 3.9375.
4.113. B. The prior distribution of λ is Gamma with α = 50 and θ = 1/500. αʼ = α + C = 50 + 75 + 210 = 335. 1/θʼ = 1/θ + E = 500 + 600 + 900 = 2000. Estimated future frequency = Mean of posterior Gamma = αʼθʼ = 335/2000 = 0.1675. Expected number of claims = (1100)(0.1675) = 184.25.
Alternately, K = 1/θ = 500. Z = 1500/(1500 + 500) = 3/4. Prior mean = mean of prior Gamma = αθ = 50/500 = 0.10. Observation = 285/1500 = 0.19. Estimate = (3/4)(0.19) + (1 − 3/4)(0.10) = 0.1675. Expected number of claims in Year 3 = (1100)(0.1675) = 184.25.
Comment: Note that posterior to Year 1, we have a Gamma with: α = 50 + 75 = 125, 1/θ = 500 + 600 = 1100. This acts as the Gamma prior to Year 2. Then adding in the experience for Year 2: αʼ = 125 + 210 = 335, 1/θʼ = 1100 + 900 = 2000.
4.114. D. Prob[observe 1 or more claim | θ] = 1 − Prob[0 claims | θ] = 1 − e^(−θ).
∫₀^∞ (1 − e^(−θ)) θe^(−θ) dθ = ∫₀^∞ θe^(−θ) dθ − ∫₀^∞ θe^(−2θ) dθ = 1 − 1/2² = 3/4.
By Bayes Theorem, the posterior distribution of θ is: (1 − e^(−θ)) θe^(−θ)/(3/4). Thus, the posterior probability that this same policyholder will have at least one claim this year is:
∫₀^∞ (1 − e^(−θ))(4/3)(1 − e^(−θ)) θe^(−θ) dθ = (4/3) ∫₀^∞ θe^(−θ) − 2θe^(−2θ) + θe^(−3θ) dθ = (4/3)(1 − 2/2² + 1/3²) = 0.815.
Alternately, Prob[observe 0 claims this year | θ] = e^(−θ). Therefore, given the posterior distribution of θ, the density at 0 of the predictive distribution is:
∫₀^∞ e^(−θ)(4/3)(1 − e^(−θ)) θe^(−θ) dθ = (4/3) ∫₀^∞ θe^(−2θ) − θe^(−3θ) dθ = (4/3)(1/2² − 1/3²) = 5/27.
Prob[at least one claim this year] = 1 - 5/27 = 22/27 = 0.815. Alternately, the prior Gamma, f(λ) = λe−λ, λ > 0, has α = 2 and θ = 1. The marginal distribution is Negative Binomial with r = α = 2 and β = θ = 1.
Therefore, Prob(n claims in Year 1) = (n + 1)/2^(2+n). If one observes n claims in Year 1, the posterior Gamma has αʼ = α + C = 2 + n and 1/θʼ = 1/θ + 1 = 1 + 1. θʼ = 1/2. The predictive distribution is Negative Binomial with rʼ = αʼ = 2 + n and βʼ = θʼ = 0.5. Therefore, Prob(0 claims in Year 2 | n claims in Year 1) = 1/1.5^(2+n).
Prob(0 claims in Year 2 | 1 or more claims in Year 1) =
Σ_(n≥1) Prob(n claims in Yr 1) Prob(0 claims in Yr 2 | n claims in Yr 1) / Σ_(n≥1) Prob(n claims in Year 1)
= Σ_(n≥1) {(n + 1)/2^(2+n)} {1/1.5^(2+n)} / Σ_(n≥1) (n + 1)/2^(2+n) = (1/9){Σ_(n≥1) n/3^n + Σ_(n≥1) 1/3^n} / (3/4) = (1/9)(3/4 + 1/2)(4/3) = 5/27.
Prob(1 or more claims in year 2 | 1 or more claims in Year 1) = 1 − 5/27 = 22/27 = 0.815.
Comment: Bullet number iii is a special case of the formula for Gamma type integrals. It also follows from the formula for the mean of an Exponential Distribution:
∫₀^∞ x n e^(−nx) dx = (Mean of an Exponential Dist. with density n e^(−nx)) = 1/n. ⇒ ∫₀^∞ x e^(−nx) dx = 1/n².
In the alternate solution, Σ_(n≥1) n/3^n = Σ_(n≥0) n/3^n = 1.5 Σ_(n≥0) n (0.5^n/1.5^(n+1)) = 1.5 (mean of Geometric Distribution with β = 0.5) = (1.5)(0.5) = 3/4.
4.115. D. From the previous solution, the posterior distribution of θ is: (4/3)(1 − e^(−θ)) θe^(−θ). Therefore, the expected future annual frequency is:
∫₀^∞ θ (4/3)(1 − e^(−θ)) θe^(−θ) dθ = (4/3) ∫₀^∞ θ²e^(−θ) − θ²e^(−2θ) dθ = (4/3)(Γ(3)/1³ − Γ(3)/2³) = (4/3)(2! − 2!/8) = 2.33.
0
Alternately, assume you start with 1000 policyholders. The marginal distribution is Negative Binomial with r = α = 2 and β = θ = 1, with density at 0 of: 1/(1+ 1)2 = 1/4. Therefore, we expect 250 out of these 1000 policyholders to have 0 claims and thus 750 to have at least one claim. K = 1/θ = 1/1 = 1. Z = 1/(1 + K) = 1/2. The a priori mean is: αθ = (2)(1) = 2. Therefore, the expected future annual frequency for the 250 who had no claims is: (1/2)(0) + (1 - 1/2)(2) = 1. Thus we expect these 250 policyholders to have 250 claim next year. We expect the 1000 policyholders to have (1000)(2) = 2000 claims in total. Thus the 750 who had at least one claim are expected to have: 2000 - 250 = 1750 claims. Their expected annual frequency is: 1750/750 = 7/3. Alternately, we expect 2000 claims from 1000 policyholders. On average 250 had no claim. Thus the observed frequency for the 750 policyholders with at least one claim is: 2000/750. K = 1/θ = 1/1 = 1. Z = 1/(1 + K) = 1/2. The a priori mean is: αθ = (2)(1) = 2. Therefore, the expected future annual frequency for the 750 who had at least 1 claim is: (1/2)(2000/750) + (1 - 1/2)(2) = 7/3. 4.116. D. For the Gamma-Poisson, K = 1/θ = 1/1.2 = .833. Z = 2/(2 + K) = .706. Observed frequency is 3/2. Prior mean frequency is the mean of the Gamma: αθ = 1.2. Estimated future frequency is: (.706)(3/2) + (1 - .706)(1.2) = 1.412. Alternately, for the Gamma-Poisson Buhlmann Credibility gives the same result as Bayesian Analysis. αʼ = α + C = 1 + 3 = 4. 1/θʼ = 1/θ + E = 1/1.2 + 2 = 2.833. θʼ = .353. Mean of the posterior Gamma is: αʼθʼ = (4)(.353) = 1.412.
4.117. B. Gamma-Poisson with prior α = 6 and prior θ = 1/100. αʼ = α + (6 + 8 + 11) = 31. 1/θʼ = 1/θ + (100 + 150 + 200) = 550. Estimated future frequency = mean of the posterior Gamma = αʼθʼ = 31/550. Estimate of the number of claims in Month 4: (300)(31/550) = 16.9.
Alternately, K = 1/θ = 100. Observed frequency = (6 + 8 + 11)/(100 + 150 + 200) = 1/18. Prior mean frequency = mean of the prior Gamma = αθ = 6/100 = 0.06. Z = 450/(450 + K) = 9/11. Estimated future frequency = (9/11)(1/18) + (2/11)(0.06) = 0.0564. Estimate of the number of claims in Month 4: (300)(0.0564) = 16.9.
Comment: It is not clear to me exactly what is going on in this exam question. Perhaps what was intended is that the prior gamma is for a very large group of insureds, while the experience given is for a particular type of insured, each of whom is assumed to have the same mean frequency.
4.118. D. The posterior distribution of λ for a single insured is Gamma with α = 31 and θ = 1/550. The frequency for a sum of 300 insureds is Poisson with mean 300λ. The posterior distribution of 300λ is Gamma with α = 31 and θ = 300/550 = 6/11. The number of claims from 300 insureds is Negative Binomial with r = 31 and β = 6/11. The mean is: (31)(6/11) = 16.91. The variance is: (16.91)(1 + 6/11) = 26.13. Prob[more than 20 claims] ≅ 1 − Φ[(20.5 − 16.91)/√26.13] = 1 − Φ[0.70] = 24.2%.
Comment: For the Gamma-Poisson, the mixed distribution for Y exposures is given by a Negative Binomial Distribution, with parameters r = α and β = Yθ. See “Mahlerʼs Guide to Frequency Distributions.”
4.119. B. This is a Gamma-Poisson with α = 5 and θ = 1/2. αʼ = α + C = 5 + 8 = 13. 1/θʼ = 1/θ + E = 2 + 2 = 4. Posterior mean = αʼθʼ = 13/4 = 3.25.
Comment: For the Gamma Distribution in Loss Models: f(x) = (x/θ)^α e^(−x/θ) / {x Γ(α)}, x > 0.
4.120. C. The predictive distribution is Negative Binomial with rʼ = αʼ = 13 and βʼ = θʼ = 1/4. f(4) = {(13)(14)(15)(16)/4!} (1/4)⁴ / (5/4)^(13+4) = 16.0%.
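A quick check of the 16.0% in 4.120, added here for illustration and not in the original text:

    import math

    r, beta, k = 13, 1/4, 4
    f4 = math.comb(k + r - 1, k) * (1/(1 + beta))**r * (beta/(1 + beta))**k
    print(f4)   # about 0.160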
4.121. D. Posterior, over one year λ follows a Gamma with α = 13 and θ = 1/4.
⇒ Over 3 years for this single individual, 3λ follows a Gamma with α = 13 and θ = (3)(1/4) = 3/4. Therefore, the posterior mixed distribution for three years is Negative Binomial with r = 13 and β = 3/4.
f(6) = {(13)(14)(15)(16)(17)(18)/6!} (3/4)⁶ / (1 + 3/4)^19 = 7.97%.
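An illustrative numerical check of this density (not part of the original solution):

    import math

    r, beta, k = 13, 3/4, 6
    f6 = math.comb(k + r - 1, k) * (1/(1 + beta))**r * (beta/(1 + beta))**k
    print(f6)   # about 0.0797, i.e. 7.97%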
4.122. A. For the Gamma-Poisson, K = 1/θ. The prior mean frequency is αθ. For the first case, Z = 1/(1 + K) = 1/(1 + 1/θ) = θ/(1 + θ). 0.15 = Z(1) + (1 − Z)αθ = θ/(1 + θ) + {1/(1 + θ)}αθ = (1 + α)θ/(1 + θ). ⇒ 0.15 = 0.85θ + αθ. For the second case, Z = 2/(2 + K) = 2/(2 + 1/θ) = 2θ/(1 + 2θ). 0.20 = Z(2) + (1 − Z)αθ = (2)2θ/(1 + 2θ) + {1/(1 + 2θ)}αθ = (4 + α)θ/(1 + 2θ). ⇒ 0.20 = 3.6θ + αθ. Subtracting the first equation from the second: 0.05 = 2.75θ. ⇒ θ = 0.0182. ⇒ α = 7.4.
Alternately, using Bayes Analysis, for the first situation: αʼ = α + 1, 1/θʼ = 1/θ + 1. (α + 1)/(1/θ + 1) = 0.15. ⇒ α + 1 = 0.15/θ + 0.15. ⇒ 0.15 = 0.85θ + αθ. For the second situation, there are 4 claims in 2 years: αʼ = α + 4, 1/θʼ = 1/θ + 2. (α + 4)/(1/θ + 2) = 0.20. ⇒ α + 4 = 0.20/θ + 0.4. ⇒ 0.20 = 3.6θ + αθ.
⇒ θ = 0.0182. ⇒ α = 7.4. 4.123. D. A Gamma-Poisson with α = 4 and θ = 1/50. αʼ = α + C = 4 + (0)(90) + (1)(7) + (2)(2) + (3)(1) = 18. 1/θʼ = 1/θ + E = 50 + 100 = 150. Posterior mean frequency is: αʼθʼ = 18/150. Expected Number of Claims for 100 exposures is: (100)(18/150) = 12. Alternately, K = 1/θ = 50. Z = 100/(100 + 50) = 2/3. Estimated future frequency is: (2/3)(14/100) + (1/3)(4/50) = 0.12. (100)(0.12) = 12.
Section 5, Beta Distribution

The quantity x^(a−1) (1−x)^(b−1) for a > 0, b > 0, has a finite integral from 0 to 1. This integral is called the (complete) Beta Function. The value of this integral clearly depends on the choices of the parameters a and b.26 This integral is: (a−1)! (b−1)! / (a+b−1)! = Γ(a) Γ(b) / Γ(a+b).
The Complete Beta Function is a combination of three Complete Gamma Functions:
β[a, b] = ∫₀¹ x^(a−1) (1−x)^(b−1) dx = (a−1)! (b−1)! / (a+b−1)! = Γ(a) Γ(b) / Γ(a+b).
Note that β(a, b) = β(b, a).
Exercise: What is the integral from zero to 1 of x⁵ (1−x)³?
[Solution: β(6, 4) = Γ(6) Γ(4) / Γ(6+4) = 5! 3! / 9! = 1/504 = 0.001984.]
One can turn the complete Beta Function into a distribution on the interval [0, 1] in a manner similar to how the Gamma Distribution was created from the (complete) Gamma Function on [0, ∞]. The Incomplete Beta Function involves an integral from 0 to x < 1:27
β[a, b; x] = ∫₀ˣ t^(a−1) (1−t)^(b−1) dt / β[a, b].
The Incomplete Beta Function is zero at x = 0 and one at x = 1. The latter follows from:
β[a, b; 1] = ∫₀¹ t^(a−1) (1−t)^(b−1) dt / β[a, b] = β[a, b] / β[a, b] = 1.
The following relationship is sometimes useful: β(a, b; x) = 1 − β(b, a; 1−x).
The two parameter Incomplete Beta Function is a special case of what Loss Models calls the Beta distribution, for θ = 1. The Beta Distribution in Loss Models has an additional parameter θ which determines its support: F(x) = β(a, b; x/θ), 0 ≤ x ≤ θ. For use in the Beta-Bernoulli frequency process, θ is always equal to one. For θ = 1,
f(x) = {(a+b−1)! / ((a−1)! (b−1)!)} x^(a−1) (1−x)^(b−1), 0 ≤ x ≤ 1.

26 The results have been tabulated and this function is widely used in many applications. See for example the Handbook of Mathematical Functions, by Abramowitz, et. al.
27 As shown in Appendix A of Loss Models.
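As an illustrative numerical check of the exercise above (added, not part of the original text), the complete Beta Function can be computed from Gamma Functions:

    import math

    a, b = 6, 4
    beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    print(beta_ab)          # about 0.001984 = 1/504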
β(a, b; x) has mean: a/(a+b), second moment: a(a+1)/{(a+b)(a+b+1)}, and variance: ab/{(a+b)²(a+b+1)}.
The mean is between zero and one; for b < a the mean is greater than 0.5. For a fixed ratio of a/b the mean is constant and for a and b large β(a,b;x) approaches a Normal Distribution. As a or b get larger the variance decreases. For either a or b extremely large, virtually all the probability is concentrated at the mean. Here are various Beta Distributions with θ = 1:
[Graphs of four Beta densities with θ = 1: a = 1, b = 5; a = 2, b = 4; a = 4, b = 2; and a = 5, b = 1.]
1
For a > b the Beta Distribution is skewed to the left. For a < b it is skewed to the right. For a = b it is symmetric. For a ≤ 1, the Mode = 0. For b ≤ 1, the Mode = 1. If 0 < a < 1, then f(0) = ∞. If 0 < b < 1, then f(1) = ∞. β(a,b;x), the Beta distribution for θ = 1 is closely connected to the Binomial Distribution. The Binomial parameter q varies from zero to one, the same domain as the Incomplete Beta Function. The Beta density is proportional to the chance of success to the power a-1, times the chance of failure to the power b-1. The constant in front of the Beta density is (a+b-1) times the binomial coefficient for (a+b-2) and a-1.
The Incomplete Beta Function is a conjugate prior distribution for the Binomial.28 The Incomplete Beta Function for integer parameters can be used to compute the sum of terms from the Binomial Distribution.29

Summary of the Beta Distribution:

Support: 0 ≤ x ≤ θ
Parameters: a > 0 (shape parameter), b > 0 (shape parameter), θ > 0 (similar to a scale parameter, determines the support)

F(x) = β(a, b; x/θ) ≡ {(a+b−1)! / ((a−1)! (b−1)!)} ∫₀^(x/θ) t^(a−1) (1−t)^(b−1) dt.

f(x) = {1/β(a, b)} (x/θ)^a (1 − x/θ)^(b−1) / x = {Γ(a+b) / (Γ(a) Γ(b))} (x/θ)^a (1 − x/θ)^(b−1) / x
= {(a+b−1)! / ((a−1)! (b−1)!)} (x/θ)^(a−1) (1 − x/θ)^(b−1) / θ, 0 ≤ x ≤ θ.

For a = 1, b = 1, the Beta Distribution is the uniform distribution from [0, θ].

E[X^n] = θ^n Γ(a+b) Γ(a+n) / {Γ(a+b+n) Γ(a)} = θ^n (a+b−1)! (a+n−1)! / {(a+b+n−1)! (a−1)!}
= θ^n a(a+1) ... (a+n−1) / {(a+b)(a+b+1) ... (a+b+n−1)}.

Mean = θ a/(a+b)

E[X²] = θ² a(a+1) / {(a+b)(a+b+1)}

Variance = θ² ab / {(a+b)²(a+b+1)}

Coefficient of Variation = Standard Deviation / Mean = √{b / (a(a+b+1))}

Skewness = 2(b − a) √(a+b+1) / {(a+b+2) √(ab)}

Mode = θ (a−1)/(a+b−2), for a > 1 and b > 1

Limited Expected Value = E[X ∧ x] = θ{a/(a+b)} β(a+1, b; x/θ) + x{1 − β(a, b; x/θ)}.

28 The Beta-Bernoulli discussed in the next section is a special case of the Beta-Binomial.
29 See “Mahlerʼs Guide to Frequency Distributions.” On the exam you should either compute the sum of binomial terms directly or via the Normal Approximation. Note that the use of the Beta Distribution is an exact result, not an approximation. See for example the Handbook of Mathematical Functions, by Abramowitz, et. al.
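For illustration (not part of the original guide), these summary formulas are easy to evaluate; for example, for the Beta Distribution with a = 5, b = 7, θ = 1, used in the next section:

    a, b, theta = 5, 7, 1
    mean = theta * a / (a + b)
    variance = theta**2 * a * b / ((a + b)**2 * (a + b + 1))
    mode = theta * (a - 1) / (a + b - 2)
    print(mean, variance, mode)   # about 0.4167, 0.0187, 0.4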
[Graph of the Beta Distribution for a = 3, b = 3, and θ = 1: a density symmetric about 0.5, with mode at x = 0.5.]
Uniform Distribution: The Uniform Distribution from 0 to θ is a Beta Distribution with a = 1 and b = 1. Specifically, the Uniform Distribution from 0 to 1 is a Beta Distribution with a = 1, b = 1, and θ = 1. DeMoivreʼs Law is a Beta Distribution with a = 1, b = 1, and θ = ω. The future lifetime of a life aged x under DeMoivreʼs Law is a Beta Distribution with a = 1, b = 1, and θ = ω - x.
Problems: 5.1 (1 point) For a Beta Distribution with parameters a = 4, b = 6, and θ = 1, what is the density function at x = 0.4? A. 0.5 B. 1.0
C. 1.5
D. 2.0
E. 2.5
5.2 (1 point) For a Beta Distribution with parameters a = 4, b = 6, and θ = 1, what is the mean? A. 0.2
B. 0.3
C. 0.4
D. 0.5
E. 0.6
5.3 (1 point) For a Beta Distribution with parameters a = 4, b = 6, and θ = 1, what is the variance? A. 0.005
B. 0.01
C. 0.02
D. 0.03
E. 0.04
5.4 (1 point) For a Beta Distribution with parameters a = 4, b = 6, and θ = 1, what is the mode? A. 0.350
B. 0.375
C. 0.400
D. 0.425
E. 0.450
5.5 (2 points) A Beta Distribution with θ = 1, has a mean of 50% and a coefficient of variation of 20%. Determine its parameters a and b. 5.6 (IOA 101, 4/00, Q.3) (1.5 points) In an investigation into the proportion (q) of lapses in the first year of a certain type of policy, the uncertainty about q is modeled by taking q to have a beta distribution with parameters a = 1, b = 9, and θ = 1, that is, with density: f(q) = 9(1 - q)8 , 0 < q < 1. Using this distribution, calculate the probability that q exceeds 0.2.
2013-4-10,
Conjugate Priors §5 Beta Distribution,
HCM 10/21/12,
Page 158
Solutions to Problems: 5.1. E. f(x) = {(a+b-1)! / (a-1)! (b-1)!} (x/θ)a-1{1 - (x/θ)}b-1 / θ = {9! / 3! 5!}.43 .65 = 504(.064)(.07776) = 2.508. 5.2. C. Mean = θa/(a+b) = 4 / 10 = 0.4. 5.3. C. Second moment = θ2a(a+1)/{(a+b)(a+b+1)}= (4)(5)/{(10)(11) = .1818. Variance = .1818 - .42 = 0.0218. Alternately, Variance = θ2ab / {(a+b)2 (a+b+1)} = (4)(6) / {102 11} = 0.0218. 5.4. B. f(x) is proportional to: x3 (1- x)5 . Setting the derivative with respect to x equal to zero: 0 = 3x2 (1- x)5 - 5x3 (1- q)4 .
⇒ 3(1 - x) = 5x. ⇒ x = 3/8 = 0.375. Comment: In general, the mode of a Beta Distribution is: θ(a - 1) / (a + b - 2), for a > 1 and b > 1. A graph of the density of this Beta Distribution, near the mean of 4/(4 + 6) = 0.4: Prob. 2.5 2.4 2.3 2.2 2.1 2 0.3
5.5. 50% =
0.35
0.4
0.45
0.5
x
a . ⇒ a = b. a + b
(Second Moment) / (Mean)2 = 1 + CV2 = 1.04. ⇒ Second Moment = (1.04)(0.52 ) = 0.26. 0.26 =
a (a +1) a (a +1) (a +1) = = .⇒ (a + b) (a + b + 1) (a + a) (a + a + 1) 2 (2a + 1)
(0.52) (2a + 1) = a + 1. ⇒ a = 12. ⇒ b = 12.
2013-4-10,
Conjugate Priors §5 Beta Distribution,
5.6.
1
∫
q=1
Prob[q > 0.2] = 9(1 - q)8 dq = -(1 - q)9 0.2
] = 0.89 = 0.134.
q = 0.2
HCM 10/21/12,
Page 159
2013-4-10,
Conjugate Priors §6 Beta-Bernoulli,
HCM 10/21/12,
Page 160
Section 6, Beta-Bernoulli The Beta-Bernoulli is another example of a conjugate prior situation. As with the Gamma-Poisson, it involves a mixture of claim frequency parameters across a portfolio of risks. Here rather than a Poisson process, the number of claims a particular policyholder makes in a year or single trial is assumed to be Bernoulli with mean q. For a Bernoulli Distribution with parameter q the chance of having 1 claim is q, and of zero claims is 1-q. In a single Bernoulli trial there is either zero or one claim.30 The parameter q is greater than or equal to zero and less than or equal to 1; 0 ≤ q ≤ 1. The mean of the Bernoulli is q and its variance is q(1-q). Prior Distribution: Assume that the values of q are given by a Beta Distribution with a = 5, b = 7, and θ = 1, β(5,7; x), with probability density function:31 f(q) = 2310 q4 (1-q)6 , 0 ≤ q ≤ 1. This prior Beta Distribution is displayed below:
2.5 2 1.5 1 0.5
0.2
30
0.4
0.6
0.8
1
Bernoulli Parameter
The sum of m independent Bernoulli trials with the same parameter q, is a Binomial Distribution with parameters m and q. The Bernoulli is a special case of the Binomial for m = 1. 31 For the Beta-Bernoulli, the value of θ is always 1, so that q goes from 0 to 1.
2013-4-10,
Conjugate Priors §6 Beta-Bernoulli,
HCM 10/21/12,
Page 161
Marginal Distribution (Prior Mixed Distribution): If we have a risk and do not know what type it is, in order to get the chance of having a claim, one would weight together the chances of having a claim, using the a priori probabilities of the Bernoulli parameter q and integrating from zero to one: 1
∫ q f(q) dq = 2310 0
1
∫ q5 (1- q)6 dq
= 2310
0
5! 6! = 2310 / 5544 = 5/12 = 0.417. 12!
Where we have used the fact from the previous section: 1
∫ xa - 1 (1- x)b - 1 dx =
0
(a - 1)! (b- 1)! Γ(a) Γ(b) = . (a + b - 1)! Γ(a + b)
Thus the chance of having one claim is 0.417. Regardless of what type of risk we have chosen from the portfolio, the only other possibility is having no claims and therefore the chance of having no claims is 0.583. The (prior) marginal distribution is a Bernoulli with parameter q = 0.417. In general if one has q given by β(a,b; x), then the marginal distribution is a Bernoulli with parameter given by the integral from zero to one of q times f(q). This is just the mean of the β(a,b; x) distribution. Thus, if the Bernoulli parameters q are distributed by β(a,b; x), then the marginal distribution is also a Bernoulli with parameter a/(a+b), the mean of β(a,b; x). Note that for the particular case a = 5 and b = 7 we get a marginal distribution with Bernoulli parameter of 5/(5+7) = 5/12, which matches the result obtained above. Prior Expected Value of the Process Variance: The process variance for an individual risk is: q(1-q) = q - q2 since the frequency for each risk is Bernoulli. Therefore the expected value of the process variance = the expected value of q minus the expected value of q2 = the a priori mean frequency - second moment of the frequency.32 The former is the mean of β(a,b): a/(a+b). The latter is the second moment of β(a,b): 32
a (a +1) . (a + b) (a + b + 1)
This relationship holds generally for mixing Bernoullis or Binomials, whether or not q follows a Beta Distribution. See Example 20.37 in Loss Models.
2013-4-10,
Conjugate Priors §6 Beta-Bernoulli,
HCM 10/21/12,
Page 162
For a = 5 and b = 7, the mean of β(5,7) is 5/12 = 0.4167, while the second moment of β(5,7) is:
(5)(6) = 0.1923. (12)(13)
Thus the expected value of the process variance is: 0.4167 - 0.1923 = 0.224. In general, the expected value of the process variance is : first moment of Beta Distribution - second moment of the Beta Distribution = a (a +1) ab a = . a + b (a + b) (a + b + 1) (a + b) (a + b + 1) For a = 5 and b = 7 this equals: (5)(7) / {(12)(13)} = 0.224, which matches the previous result. Prior Variance of the Hypothetical Means: The variance of the hypothetical means is the variance of q = Var[q] = ab Variance of the Prior Beta = . For a = 5 and b = 7 this is: 0.0187. 2 (a + b) (a + b + 1) Prior Total Variance: The total variance is the variance of the marginal distribution. The variance of the Bernoulli is the chance of success times the chance of failure. The marginal distribution is a Bernoulli with chance of ab a a a success . Thus the total variance is: {1 }= . (a + b)2 a + b a + b a + b For a = 5 and b = 7 this equals: (5)(7)/122 = 0.243. The Expected Value of the Process Variance + Variance of the Hypothetical Means = 0.224 + 0.019 = 0.243 = Total Variance.
EPV + VHM =
ab ab ab + = = Total Variance. 2 (a + b) (a + b + 1) (a + b) (a + b + 1) (a + b)2
VHM = the variance of the Prior Beta. Total Variance = the variance of the Marginal Bernoulli = EPV + VHM. ab ab ⇒ = Variance of Prior Beta < Variance of Marginal Bernoulli = . 2 (a + b) (a + b + 1) (a + b)2
2013-4-10,
Conjugate Priors §6 Beta-Bernoulli,
HCM 10/21/12,
Page 163
Observations: Let us now introduce the concept of observations. A risk is selected at random and it is observed to have 13 claims in 19 trials (or years.) Note that for the Bernoulli the number of claims is less than or equal to the number of trials. (One can describe this case of the Bernoulli as observing 13 “successes” in 19 trials or equivalently 6 “failures” in 19 trials.) Posterior Distribution: We can employ Bayesian analysis to compute what the chances are that the selected risk had a given Bernoulli Parameter. Given a Bernoulli with parameter q, the chance of observing 13 claims in 19 trials is Binomial:33 27,132 q13(1-q)6 . The a priori probability of q is the Prior Beta distribution: π(q) = 2310 q4 (1-q)6 , 0 ≤ q ≤ 1. Thus the posterior chance of q is proportional to the product of the chance of observation and the a priori probability: q17(1-q)12. This is proportional to the density for a Beta with a = 18, b = 13, and θ = 1, f(x) = 1,556,878,050 q17(1-q)12, 0 ≤ q ≤ 1. In general, if one observes r claims for n trials, we have that the chance of this observation given q is proportional to qr(1-q)n-r. The prior Beta is proportional to qa-1(1-q)b-1. Note the way that both the Beta and the Bernoulli have q to a power and (1-q) to another power. The posterior probability for q is proportional to their product qa+r-1(1-q)b+n-r-1. This is proportional to the density for β(a+r, b+n-r; x). Thus for the Beta-Bernoulli the posterior density function is also a Beta. This posterior Beta has a first parameter = prior first parameter plus the number of claims observed. This posterior Beta has a second parameter = prior second parameter plus the number of trials (usually years) minus the number of claims observed34. The updating formulas are: aʼ = a + r.
bʼ = b + (n - r).
For example, in the case where we observed 13 claims in 19 trials, r = 13 and n = 19. The prior first parameter was 5 while the prior second parameter was 7. Therefore the posterior first parameter = 5 + 13 = 18, while the posterior second parameter = 7 + 19 - 13 = 13, matching the result obtained above, β(18,13; x). 33
The constant in front of the Binomial is 19! / (13! 6!) = 27132. The posterior second parameter = prior second parameter + the number of failures observed. The posterior first parameter = prior first parameter + the number of successes observed. In all case the third parameter, θ, is 1. 34
2013-4-10,
Conjugate Priors §6 Beta-Bernoulli,
HCM 10/21/12,
Page 164
The prior distribution of q is: (a + b - 1)! π(q) = β(a, b; q) = qa-1 (1 - q)b-1, 0 ≤ q ≤ 1. (a - 1)! (b- 1)! If for example we were modeling the testing of missiles, then q is associated with successes, while 1-q is associated with the failures. We add the number of successes to a, which is in the exponent of q: aʼ = a + r. We add the number of failures to b, which is in the exponent of 1-q: bʼ = b + (n - r). The fact that the posterior distribution is of the same form as the prior distribution is why the Beta is a Conjugate Prior Distribution for the Bernoulli. Below are compared the prior β(5, 7; x) (solid) and the posterior β(18, 13; x) (dashed):
[Graph of the prior β(5, 7; x) (solid) and posterior β(18, 13; x) (dashed) densities as functions of q.]
Observing 13 claims in 19 trials has increased the probability of a large Bernoulli parameter and decreased the probability of a small Bernoulli parameter.
Exercise: If the prior Beta has a = 5 and b = 7, and 3 claims are observed in 19 trials, what is the posterior Beta?
[Solution: aʼ = 5 + 3 = 8. bʼ = 7 + 19 - 3 = 23. Posterior Beta is β(8, 23; x).
Comment: The posterior density is: f(q) = 46,823,400 q^7 (1-q)^22.]
Below are compared the prior β(5, 7; x) (solid) and this posterior β(8, 23; x) (dashed):
[Graph of the prior β(5, 7; x) (solid) and posterior β(8, 23; x) (dashed) densities as functions of q.]
Observing 3 claims in 19 trials has decreased the probability of a large Bernoulli parameter and increased the probability of a small Bernoulli parameter.
Predictive Distribution (Posterior Mixed): Since the posterior distribution is also a Beta distribution, the same analysis that led to a Bernoulli (prior) marginal distribution will lead to a (posterior) predictive distribution that is Bernoulli. However, the parameters are related to the posterior Beta. For the Beta-Bernoulli the (posterior) predictive distribution is always a Bernoulli, with q = (first parameter of the posterior Beta) / (first parameter of the posterior Beta + second parameter of the posterior Beta). Thus q = (first parameter of the prior Beta + number of claims observed) / (first parameter of the prior Beta + second parameter of the prior Beta + number of trials observed). In the particular example with a posterior distribution of β(18,13; x), the parameter of the posterior Bernoulli predictive distribution is q = 18 / (18 + 13) = 0.5806. Alternatively, one can compute this in terms of the prior β(5,7; x) and the observations of 13 claims in 19 trials: q = (5 + 13) / (5 + 7 + 19) = 0.5806.
Posterior Mean: One can compute the means and variances posterior to the observations. The posterior mean can be computed in either one of two ways. First, one can weight together the means for each type of risk, using the posterior probabilities. This is E[q] = the mean of the posterior Beta = 18/(18 + 13) = 0.5806. Alternately, one can compute the mean of the predictive distribution: the mean of the predictive Bernoulli is q = 0.5806. Of course the two results match. Thus posterior to the observations, for this risk, the new estimate using Bayesian Analysis is 0.5806. This compares to the a priori estimate of 0.4167. In general, the observations provide information about the given risk, which allows one to make a better estimate of the future experience of that risk. Not surprisingly, observing 13 claims in 19 trials (for a frequency of 0.6842) has raised the estimated frequency from 0.4167 to 0.5806.
Posterior Expected Value of the Process Variance: The process variance for an individual risk is q(1-q) = q - q^2, where q is its Bernoulli parameter. Therefore the expected value of the process variance = the expected value of q - the expected value of q^2 = the posterior mean frequency - second moment of the frequency. The former is the mean of the β(a,b): a/(a+b). The latter is the second moment of β(a,b): a(a+1)/{(a+b)(a+b+1)}. The expected value of the process variance is: 1st moment of the Beta Distribution - 2nd moment of the Beta Distribution = a/(a + b) - a(a + 1)/{(a + b)(a + b + 1)} = ab/{(a + b)(a + b + 1)}.
For a = 18 and b = 13 this equals: (18)(13)/{(31)(32)} = 0.2359.
Posterior Variance of the Hypothetical Means: The variance of the hypothetical means is the variance of q = Var[q] = variance of the Posterior Beta = ab/{(a + b)^2 (a + b + 1)} = (18)(13)/{(31^2)(32)} = 0.00761.
Note how after the observation the variance of the hypothetical means is less than the prior variance of the hypothetical means (0.00761 < 0.0187), since the observations have allowed us to narrow down the possibilities.35
35 The posterior VHM is usually but not always less than the prior VHM. When the observation corresponds to a low prior expectation, the posterior VHM can be larger than the prior VHM. For example, with a = 21 and b = 1, the a priori mean is 21/22 = 0.955. If one observes one claim in 25 trials, then the posterior β(22,25; x) has a variance of 0.0052, greater than the 0.0019 variance of the prior β(21,1; x).
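The counterexample in the footnote is easy to check numerically; here is a small sketch (an added illustration, using only the Beta variance formula quoted above):

# Variance of a Beta(a, b) with theta = 1.
def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

print(beta_var(21, 1))           # prior VHM, about 0.0019
print(beta_var(21 + 1, 1 + 24))  # posterior VHM after 1 claim in 25 trials, about 0.0052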
Posterior Total Variance: The total variance is the variance of the predictive distribution. The variance of the Bernoulli equals q(1-q) = (0.5806)(1 - 0.5806) = 0.2435. The Expected Value of the Process Variance + Variance of the Hypothetical Means = 0.2359 + 0.0076 = 0.2435 = Total Variance, as per the general result.
Buhlmann Credibility: Next, letʼs apply Buhlmann Credibility to this example. The Buhlmann Credibility parameter K = the (prior) expected value of the process variance / the (prior) variance of the hypothetical means = 0.2244 / 0.0187 = 12. Note that K can be computed prior to any observations and does not depend on them. Specifically, both variances are for a single insured for one trial.
In general, K = prior EPV / prior VHM = [ab/{(a + b)(a + b + 1)}] / [ab/{(a + b)^2 (a + b + 1)}] = a + b.
For the Beta-Bernoulli in general, the Buhlmann Credibility parameter K = a + b, where β(a, b; x) is the prior distribution. For the example, K = 5 + 7 = 12. Having observed 13 claims in 19 trials, Z = 19 / (19 + 12) = 0.6129. The observation = 13/19. The a priori mean = 5/12 = 0.4167. Therefore the new estimate = (0.6129)(13/19) + (1 - 0.6129)(0.4167) = 0.5806. Note that in this case the estimate from Buhlmann Credibility matches the estimate from Bayesian Analysis. For the Beta-Bernoulli the estimates from using Bayesian Analysis and Buhlmann Credibility are equal.36
36 As discussed in a subsequent section, this is a special case of the general results for conjugate priors of members of linear exponential families.
Summary: The many different aspects of the Beta-Bernoulli are summarized below. Be sure to be able to clearly distinguish between the situation prior to observations and that posterior to the observations. Note the parallels to the Gamma-Poisson as summarized previously.
Beta-Bernoulli Frequency Process (Number of Claims):
• Beta Prior (Distribution of Parameters): the Bernoulli parameters q of the individuals making up the entire portfolio are distributed via a Beta Distribution with parameters a and b: f(x) = {(a+b-1)! / [(a-1)!(b-1)!]} x^(a-1) (1-x)^(b-1), 0 ≤ x ≤ 1, mean = a/(a+b), variance = ab/{(a+b+1)(a+b)^2}.
• Mixing the Bernoulli Process (number of claims) over the prior Beta gives the Bernoulli Marginal Distribution: Bernoulli parameter q = mean of Bernoulli = a/(a+b) = mean of prior Beta. Variance = q(1-q) = ab/(a+b)^2.
• Observations: # claims = # successes = r, # exposures = # of trials = n.
• Beta Posterior (Distribution of Parameters): Posterior 1st parameter = a + r. Posterior 2nd parameter = b + n - r.
• Mixing the Bernoulli Process over the posterior Beta gives the Bernoulli Predictive Distribution: Bernoulli parameter q = mean of Bernoulli = (a+r)/(a+b+n) = mean of posterior Beta. Variance = q(1-q) = (a+r)(b+n-r)/(a+b+n)^2.
• Beta is a Conjugate Prior; the Bernoulli is a Member of a Linear Exponential Family. Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = a + b.
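As a numeric check of the summary (an added sketch using the running example a = 5, b = 7 with 13 claims in 19 trials), the Buhlmann credibility estimate with K = a + b reproduces the mean of the posterior Beta:

a, b, r, n = 5, 7, 13, 19
prior_mean = a / (a + b)
K = a + b                            # Buhlmann credibility parameter for the Beta-Bernoulli
Z = n / (n + K)                      # 0.6129
buhlmann = Z * (r / n) + (1 - Z) * prior_mean
bayes = (a + r) / (a + b + n)        # mean of the posterior Beta
print(buhlmann, bayes)               # both 0.5806...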
Uniform-Bernoulli: Since the Uniform Distribution is a special case of the Beta Distribution with a = 1 and b = 1, the Uniform-Bernoulli is a special case of the Beta-Bernoulli.37
When the Parameter θ is Less Than One: Assume each insured has Bernoulli frequency with parameter q, and q is distributed via a Beta Distribution with parameters a, b, and θ < 1.38 The prior distribution of q is proportional to: q^(a-1) (1 - q/θ)^(b-1), 0 ≤ q ≤ θ < 1. If we observe r claims in n years, the probability of the observation is proportional to: q^r (1 - q)^(n-r). Therefore, using Bayes Theorem, the posterior distribution of q is proportional to: q^r (1 - q)^(n-r) q^(a-1) (1 - q/θ)^(b-1) = q^(r+a-1) (1 - q)^(n-r) (1 - q/θ)^(b-1), 0 ≤ q ≤ θ < 1. Unless n = r, the posterior distribution of q is not a Beta Distribution.39 When θ < 1, we do not have a Conjugate Prior situation.
We can apply Buhlmann Credibility to this situation in the usual manner. The process variance is q(1 - q) = q - q^2.
EPV = E[q - q^2] = E[q] - E[q^2] = First moment of Beta - Second Moment of Beta
= θa/(a + b) - θ^2 a(a + 1)/{(a + b)(a + b + 1)} = {θa/(a + b)} {1 - θ(a + 1)/(a + b + 1)}.
VHM = Var[q] = Variance of Beta = Second Moment of Beta - Square of First Moment of Beta
= θ^2 a(a + 1)/{(a + b)(a + b + 1)} - θ^2 a^2/(a + b)^2 = θ^2 {a/(a + b)} {(a + 1)/(a + b + 1) - a/(a + b)}.
K = EPV/VHM = {1 - θ(a + 1)/(a + b + 1)} / [θ {(a + 1)/(a + b + 1) - a/(a + b)}] = (a + b) {(a + b + 1)/θ - (a + 1)} / b.40
When θ = 1, K = a + b, as obtained previously.
37 See 4, 5/89, Q.49; 4B, 5/96, Q.30; 4B, 5/97, Q.9; 4B, 11/97, Q.19; 4B, 11/98, Q.14; 4, 11/00, Q.11.
38 For a = 1 and b = 1, this is the special case of a uniform distribution from 0 to θ.
39 4, 11/03, Q.19, where the prior distribution of q is a uniform from 0 to 0.5, is an example of this exception; we see a claim every year, and thus the posterior distribution is a Beta. However, you should treat 4, 11/03, Q.19 as just another continuous risk type Bayes question.
40 I would not memorize this formula.
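Since the formula for K when θ < 1 is not worth memorizing, here is a small sketch (an added illustration with hypothetical values a = 5, b = 7, θ = 0.5) that computes K directly from the EPV and VHM, checks it against the closed form above, and confirms that it reduces to a + b when θ = 1:

def k_direct(a, b, theta):
    """Buhlmann K when q ~ Beta(a, b; theta) and annual claims are Bernoulli(q)."""
    e_q = theta * a / (a + b)
    e_q2 = theta ** 2 * a * (a + 1) / ((a + b) * (a + b + 1))
    epv = e_q - e_q2          # E[q - q^2]
    vhm = e_q2 - e_q ** 2     # Var[q]
    return epv / vhm

def k_closed_form(a, b, theta):
    return (a + b) * ((a + b + 1) / theta - (a + 1)) / b

print(k_direct(5, 7, 0.5), k_closed_form(5, 7, 0.5))  # both about 34.29
print(k_direct(5, 7, 1.0))                            # 12.0 = a + b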
Problems:
Use the following information to answer the next 14 questions:
The number of claims r that a particular policyholder makes in a year is Bernoulli with mean q. The q values of the portfolio of policyholders have probability density function: g(q) = 280 q^3 (1 - q)^4, 0 ≤ q ≤ 1.
You are given the following values of the Incomplete Beta Function:
y      β(4,5; y)   β(10,11; y)   β(11,10; y)
0.45   0.523       0.409         0.249
0.50   0.637       0.588         0.412
0.55   0.740       0.751         0.591
0.60   0.826       0.872         0.755
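On the exam such values are given; if you want to reproduce them yourself, the regularized incomplete Beta function is available in SciPy (this assumes SciPy is installed and is not part of the original problem set):

from scipy.special import betainc   # regularized incomplete Beta function

print(betainc(4, 5, 0.50))    # beta(4,5; 0.50), about 0.637
print(betainc(11, 10, 0.55))  # beta(11,10; 0.55), about 0.591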
6.1 (1 point) What is the mean claim frequency for the portfolio? A. less than 45% B. at least 45% but less than 46% C. at least 46% but less than 47% D. at least 47% but less than 48% E. at least 48% 6.2 (1 point) What is the chance that an insured picked at random from this portfolio has a Bernoulli parameter between 0.50 and 0.55? A. less than 10% B. at least 10% but less than 11% C. at least 11% but less than 12% D. at least 12% but less than 13% E. at least 13%
6.3 (2 points) The probability that a policyholder chosen at random will experience n claims in a year is given by which of the following?
A. (1 choose n) (3^n) (6^(1-n)) / 9, n = 0, 1
B. (1 choose n) (6^n) (3^(1-n)) / 9, n = 0, 1
C. (1 choose n) (4^n) (5^(1-n)) / 9, n = 0, 1
D. (1 choose n) (5^n) (4^(1-n)) / 9, n = 0, 1
E. None of A, B, C, or D
6.4 (3 points) What is the expected value of the process variance? A. less than 0.20 B. at least 0.20 but less than 0.22 C. at least 0.22 but less than 0.24 D. at least 0.24 but less than 0.26 E. at least 0.26
6.5 (2 points) What is the variance of the hypothetical mean frequencies? A. less than 0.018 B. at least 0.018 but less than 0.020 C. at least 0.020 but less than 0.022 D. at least 0.022 but less than 0.024 E. at least 0.024
6.6 (1 point) What is the variance of the claim frequency for the portfolio? A. less than 0.23 B. at least 0.23 but less than 0.24 C. at least 0.24 but less than 0.25 D. at least 0.25 but less than 0.26 E. at least 0.26
6.7 (2 points) An insured has 7 claims over 12 years. Using Buhlmann Credibility what is the estimate of this insured's expected claim frequency? A. less than 51% B. at least 51% but less than 53% C. at least 53% but less than 55% D. at least 55% but less than 57% E. at least 57%
6.8 (2 points) An insured has 7 claims over 12 years. The posterior probability density function for this insured's Bernoulli parameter q is proportional to which of the following?
A. q^9 (1-q)^10   B. q^10 (1-q)^9   C. q^8 (1-q)^11   D. q^11 (1-q)^8   E. None of A, B, C, or D
6.9 (2 points) An insured has 7 claims over 12 years. What is the mean of the posterior distribution? A. less than 49% B. at least 49% but less than 50% C. at least 50% but less than 51% D. at least 51% but less than 52% E. at least 52% 6.10 (1 point) An insured has 7 claims over 12 years. What is the chance that this insured has a Bernoulli parameter between 0.50 and 0.55? A. less than 16% B. at least 16% but less than 17% C. at least 17% but less than 18% D. at least 18% but less than 19% E. at least 19% 6.11 (2 points) An insured has 7 claims over 12 years. What is the variance of the posterior distribution? A. less than 0.010 B. at least 0.010 but less than 0.012 C. at least 0.012 but less than 0.014 D. at least 0.014 but less than 0.016 E. at least 0.016
6.12 (1 point) An insured has 7 claims over 12 years. What is the chance that this insured has a Bernoulli parameter between 0.50 and 0.55? Use the Normal Approximation.
A. 13% B. 15% C. 17% D. 19% E. 21%
6.13 (2 points) An insured has 7 claims over 12 years. What is the probability density function for the predictive distribution of the number of claims per year for this insured?
A. (1 choose n) (9^n) (12^(1-n)) / 21, n = 0, 1
B. (1 choose n) (12^n) (9^(1-n)) / 21, n = 0, 1
C. (1 choose n) (8^n) (13^(1-n)) / 21, n = 0, 1
D. (1 choose n) (13^n) (8^(1-n)) / 21, n = 0, 1
E. None of A, B, C, or D
6.14 (4 points) An insured has 7 claims over 12 years. What is the probability that this same insured will have 7 claims over the next 12 years? A. 15% B. 17% C. 19% D. 21% E. 23%
6.15 (2 points) You are given the following:
• The number of claims for a single insured is a Bernoulli with parameter q, where q varies between insureds.
• The overall average frequency is 0.6.
• The Expected Value of the Process Variance is 0.2.
Determine the Variance of the Hypothetical Mean Frequencies. A. less than 0.03 B. at least 0.03 but less than 0.05 C. at least 0.05 but less than 0.07 D. at least 0.07 but less than 0.09 E. at least 0.09
Use the following information for the next two questions: • The number of claims that a particular policyholder makes in a year is Bernoulli with mean q. • The q values over the portfolio of policyholders are uniformly distributed from 0 to 1.
• A policyholder is observed to have 2 claims in 7 years. 6.16 (2 points) What is the expected future annual claim frequency for this policyholder? A. 1/3 B. 3/8 C. 3/7 D. 4/9 E. 5/11 6.17 (3 points) What is the probability that this policyholder has a q parameter less than 0.4? A. less than 70% B. at least 70% but less than 72% C. at least 72% but less than 74% D. at least 74% but less than 76% E. at least 76%
Use the following information for the next two questions: • You have your back to a pool table. • Your friend places the cue ball at random on the table. • He places another ball at random on the table, and tells you that it is to the left of the cue ball. • He places yet another ball at random on the table, and tells you that it is to the right of the cue ball. • He places yet another ball at random on the table, and tells you that it is to the left of the cue ball. 6.18 (2 points) Using Bayes Analysis, estimate the fraction of the way the cue ball is from the left end of the table towards the right end. 6.19 (2 points) Using Bayes Analysis, estimate the probability that the cue ball is less than one fourth of the way from the left end of the table towards the right end.
Use the following information for the next four questions: Professor Zweistein of the Institute of Basic Studies in Kingston, N. J. has determined that the chance of getting a head when flipping a U. S. penny is a Bernoulli process, with the expected number of heads distributed among the different pennies via the Incomplete Beta Function β(499,501; x). 6.20 (1 point) A penny is chosen at random and flipped. What is the chance of a head? A. less than 49.4% B. at least 49.4% but less than 49.6% C. at least 49.6% but less than 49.8% D. at least 49.8% but less than 50.0% E. at least 50.0% 6.21 (2 points) A penny is chosen at random and flipped 2000 times. 1010 heads are observed. Use Buhlmann Credibility to estimate the future chance of a head when flipping this penny. A. 50.1% B. 50.2% C. 50.3% D. 50.4% E. 50.5% 6.22 (3 points) A penny is chosen at random and flipped 2000 times. 1010 heads are observed. What is the chance that this penny has a Bernoulli parameter greater than .500? Hint: Approximate a Beta Distribution by a Normal Distribution. A. less than 60% B. at least 60% but less than 65% C. at least 65% but less than 70% D. at least 70% but less than 75% E. at least 75% 6.23 (2 points) A penny is chosen at random and flipped 2000 times. 1010 heads are observed. Which of the following is a 90% confidence interval for the Bernoulli parameter of this penny? Hint: Approximate a Beta Distribution by a Normal Distribution. A. (0.479, 0.527) B. (0.482, 0.524) C. (0.485, 0.521) D. (0.488, 0.518) E. (0.491, 0.515)
Use the following information about a missile defense system for the next three questions:
• Each trial has chance of success q, independent of any other trial.
• A priori you assume that q is Beta distributed, with a = 5 and b = 3.
• You observe one success in the first six trials.
6.24 (2 points) Estimate the chance of success on the next trial. (A) 1/5 (B) 2/5 (C) 3/7 (D) 1/2 (E) None of A, B, C, or D 6.25 (1 point) What is the variance of the predictive distribution? (A) 15/64 (B) 6/25 (C) 35/144 (D) 12/49 (E) None of A, B, C, or D 6.26 (3 points) Estimate the chance of having a failure on each of the next three trials. A. less than 18% B. at least 18% but less than 19% C. at least 19% but less than 20% D. at least 20% but less than 21% E. at least 21%
6.27 (8 points) Over the last decade in the Duchy of Grand Fenwick there have been 111 boys born and 97 girls born. You assume that the natural proportion of boys born to human beings is 52%. Use Bayes Analysis to determine the probability that the future longterm proportion of boys born in the Duchy of Grand Fenwick will be greater than 52%. (a) (2 points) Assume that the longterm proportion of boys born varies between populations. It is 48% for 1/4 of the populations, 52% for 1/2 of the populations, and 56% for the remaining 1/4 of populations. (b) (4 points) Assume that the longterm proportion of boys born varies between populations, uniformly from 48% to 56%. (c) (2 points) Assume that the longterm proportion of boys born varies between populations, following a Beta Distribution with a = 13, b = 12, and θ = 1.
Use the following information for the next 3 questions:
• The probability that a baseball player gets a hit in any given attempt is q.
• The results of attempts are independent of each other.
• For a particular ballplayer, q does not vary by attempt.
• The prior distribution of q is assumed to follow a distribution with density function proportional to: q^134 (1-q)^349, 0 ≤ q ≤ 1.
6.28 (1 point) What is the probability that a ballplayer chosen at random will get a hit on his next attempt? A. less than 26% B. at least 26% but less than 27% C. at least 27% but less than 28% D. at least 28% but less than 29% E. at least 29%
6.29 (2 points) Flash Phillips is observed for 100 attempts and gets 40 hits. How many hits do you expect Flash to get in his next 100 attempts? A. 29 B. 30 C. 31 D. 32 E. 33
6.30 (2 points) Flash Phillips is observed for 100 more attempts for a total of 200, and gets 5 more hits, for a total of 45. Estimate the chance that Flash will get a hit in his next attempt. A. less than 0.255 B. at least 0.255 but less than 0.260 C. at least 0.260 but less than 0.265 D. at least 0.265 but less than 0.270 E. at least 0.270
6.31 (2 points) Lucy van Pelt will hold a football for Charlie Brown to kick. The probability that Lucy will pull the football away just as Charlie tries to kick it is q. You assume that the probability q will be the same for each trial, and that the results of each trial are independent. Prior to any observations, you had assumed that q has a Beta Distribution with parameters a = 1, b = 5, and θ = 1. Over the years, 47 times in a row, Lucy has pulled the football away just before Charlie tried to kick it, and Charlie landed flat on his back. The 48th time, what is the probability that Lucy pulls the football away just before Charlie tries to kick it?
A. 80% B. 90% C. 95% D. 99% E. 99.9%
6.32 (4 points) A scientist, Lucky Tom, finds coins on his 60 minute walk to work at a Poisson rate of 0.5 coins/minute. The denominations are randomly distributed: (i) 60% of the coins are worth 1; (ii) 20% of the coins are worth 5; and (iii) 20% of the coins are worth 10. One of Tomʼs fellow scientists accidentally released a tyrannosaur, which eats only scientists. Each scientist has a chance of being eaten of q per day. For an individual scientist his value of q remains constant as long as he remains alive. Initially, over all scientists, q is distributed via a Beta Distribution with a = 2, b = 150, and θ = 1. Since the tyrannosaur was released, Lucky Tom has survived 300 days without being eaten. What is the expected amount of money Lucky Tom finds in the future before being eaten? A. 35,000 B. 40,000 C. 45,000 D. 50,000 E. 55,000 Use the following information for the next two questions: • Baseball teams play 162 games in a year. • The probability that a baseball team wins any given game is q. • The results of games are independent of each other. • For a particular team, q does not vary during a year. • Over the different teams, q is distributed via a Beta Distribution with a = 15, b = 15, and θ = 1. • The Durham Bulls baseball team wins 40 of its first 60 games this year. 6.33 (2 points) What is the expected total number of games the Durham Bulls will win this year? (A) 96 (B) 98 (C) 100 (D) 102 (E) 104 6.34 (3 points) What is the variance of the total number of games the Durham Bulls will win this year? (A) 48 (B) 50 (C) 52 (D) 54 (E) 56
6.35 (3 points) For each mother, each child has a chance q of being a girl, independent of the gender of her other children. The value of q varies across the population via a Beta Distribution with parameters a = 10, b = 10, and θ = 1. Mrs. Molly Weasley has had six children, all sons. What is the probability that her next child will be a girl? A. less than 40% B. at least 40% but less than 42% C. at least 42% but less than 44% D. at least 44% but less than 46% E. at least 46%
Use the following information for the next three questions:
• At halftime of a basketball game they will choose someone from the crowd at random. • The person chosen will get a chance to make a shot from half court. • Let q be the chance of making the shot. • You assume that q is distributed across attendees via a Beta Distribution with a = 1, b = 19, and θ = 1.
• In honor of retiring the uniform number 5 of their former star player Archibald Andrews, today the team will allow the lucky person chosen five chances to make the half court shot.
• At todayʼs game, Steven Quincy Urkel is chosen out of the crowd. • Steve misses his first four attempts at making the shot. 6.36 (2 points) Estimate Steveʼs chance of making his last shot, using the Bayes estimate for the squared error loss function. (A) 0 (B) 1% (C) 2% (D) 3% (E) 4% 6.37 (2 points) Estimate Steveʼs chance of making his last shot, using the Bayes estimate for the absolute error loss function. (A) 0 (B) 1% (C) 2% (D) 3% (E) 4% 6.38 (2 points) Estimate Steveʼs chance of making his last shot, using the Bayes estimate for the zero-one loss function. (A) 0 (B) 1% (C) 2% (D) 3% (E) 4%
6.39 (2 points) Use the following information: • Baseball teams play 162 games in a year. • The probability that a baseball team wins any given game is q. • The results of games are independent of each other. • For a particular team, q varies between the games during a year. • The Hadley Saints baseball team has an expected winning percentage this year of 60%. Determine the standard deviation of the total number of games the Hadley Saints will win this year.
Use the following information for the next two questions: • Each policyholder will have zero or one claim in a year.
• The probability of having a claim is equal to q. • The q values over the portfolio of policyholders are uniformly distributed from 0 to 1. 6.40 (2 points) A particular policyholder has no claims over n years. Determine the expected number of claims this policyholder will have in the following year. 6.41 (2 points) A different policyholder has one claim in each of n years. Determine the expected number of claims this policyholder will have in the following year.
6.42 (4, 5/89, Q.38) (2 points) The prior distribution of your hypothesis about the unknown value of H is given by P(H = 1/4) = 4/5, P(H = 1/2) = 1/5. The data from a single experiment is distributed according to P(D = d | H = h) = h^d (1-h)^(1-d) for d = 0, 1. If the result of a single experimental outcome is d = 1, what is the posterior distribution of H?
A. P(D = d | H = h) = h^(d/2) (1-h)^(1-d/2), for d = 0, 1
B. P(H = 1/4) = 2/3, P(H = 1/2) = 1/3
C. P(H = 1/4) = 1/2, P(H = 1/2) = 1/2
D. P(D = d | H = h) = h^(2d/3) (1-h)^(1-d/3), for d = 0, 1
E. P(H = 1/4) = 1/3, P(H = 1/2) = 2/3
6.43 (4, 5/89, Q.49) (3 points) The probability of y successes in n trials is given by the binomial distribution with p.d.f.: f(y; θ) = (n choose y) θ^y (1 - θ)^(n-y). The prior distribution of θ is a uniform distribution: g(θ) = 1, 0 ≤ θ ≤ 1. Given that one success was observed in two trials, what is the Bayesian estimate for the probability that the unknown parameter θ is in the interval [0.45, 0.55]?
A. Less than 0.10
B. At least 0.10, but less than 0.20
C. At least 0.20, but less than 0.30
D. At least 0.30, but less than 0.40
E. 0.40 or more
6.44 (165, 11/89, Q.9) (1.7 points) You are using the Bayesian process to estimate a binomial probability. The prior distribution is Beta with θ = 1. You are given: (i) The mean of the prior distribution is 1/10. (ii) The mode of the prior distribution is 1/20. (iii) The mean of the posterior distribution is 19/115. (iv) Five trials of an experiment produce h successes. Determine h. Hint: The mode of a Beta Distribution is θ(a - 1) / (a + b - 2).
(A) 1 (B) 2 (C) 3 (D) 4 (E) 5
6.45 (4B, 11/92, Q.27) (2 points) You are given the following:
• The distribution for number of claims is Bernoulli with parameter q.
• The prior distribution of q is the beta distribution: f(q) = {(3 + 4 + 1)! / (3! 4!)} q^3 (1-q)^4, 0 ≤ q ≤ 1.
• 2 claims are observed in 3 trials.
Determine the mean of the posterior distribution of q. A. Less than 0.45 B. At least 0.45 but less than 0.55 C. At least 0.55 but less than 0.65 D. At least 0.65 but less than 0.75 E. At least 0.75
Use the following information for the next two questions:
• The probability of an individual having exactly one claim in one exposure period is q, while the probability of no claims is 1-q.
• q is a random variable with the Beta density function f(q) = 6q(1-q), 0 ≤ q ≤ 1.
6.46 (4B, 5/94, Q.2) (3 points) Determine the Buhlmann credibility factor, z, for the number of observed claims for one individual for one exposure period. A. 1/12 B. 1/6 C. 1/5 D. 1/4 E. None of A, B, C, or D
6.47 (4B, 5/94, Q.3) (2 points) You are given the following:
• An individual is selected at random and observed for 12 exposure periods.
• During the 12 exposure periods, the selected individual incurs 3 claims.
Determine the probability that the same individual will have one claim in the next exposure period. A. 1/4 B. 1/7 C. 2/7 D. 3/8 E. 5/16
Use the following information for the next two questions: • The probability that a single insured will produce exactly one claim during one exposure period is q, while the probability of no claim is 1-q. • q varies by insured and follows a beta distribution with density function f(q) = 6q(1-q), 0 ≤ q ≤ 1. 6.48 (4B, 11/95, Q.24) (3 points) Two insureds are randomly selected. During the first two exposure periods, one insured produces a total of two claims (one in each exposure period) and the other insured does not produce any claims. Determine the probability that each of the two insureds will produce one claim during the third exposure period. A. 2/9 B. 1/4 C. 4/9 D. 1/2 E. 2/3 6.49 (4B, 11/95, Q.25) (2 points) Determine the number of exposure periods of loss experience of a single insured needed to give a Buhlmann credibility factor, Z, of 0.75. A. 2 B. 4 C. 6 D. 12 E. 24
6.50 (165, 5/96, Q.11) (1.9 points) Fifteen successes have been observed in an experiment with n trials. You are applying the Bayesian process to estimate the true probability of success which you know is a binomial probability. You also know that the form of the prior distribution is Beta, with θ = 1. However, you are undecided as to which parameters to use. Prior distribution I has parameters a1 and b1 , while prior distribution II has parameters a2 and b2 . You are given: (i) a1 = 5; (ii) b1 = b2 ; (iii) the mode of prior distribution I is 1/7; (iv) the mean of prior distribution II is 6/11; and (v) the mean of posterior distribution I is 16/31 of the mean of posterior distribution II. Determine n. (A) 75 (B) 90 (C) 100 (D) 125 (E) 165
6.51 (4B, 5/96, Q.30) (3 points) A number x is randomly selected from a uniform distribution on the interval [0, 1]. Four Bernoulli trials are to be performed with probability of success x. The first three are successes. What is the probability that a success will occur on the fourth trial? A. Less than 0.675 B. At least 0.675, but less than 0.725 C. At least 0.725, but less than 0.775 D. At least 0.775, but less than 0.825 E. At least 0.825 6.52 (4B, 5/97, Q.9 & Course 4 Sample Exam 2000, Q.34) (3 points) You are given the following: • The number of claims for Risk 1 during a single exposure period follows a Bernoulli distribution with mean q. • The prior distribution for q is uniform on the interval [0,1]. • The number of claims for Risk 2 during a single exposure period follows a Poisson distribution with mean λ.
• The prior distribution for λ has the density function f(λ) = β e^(−βλ), 0 < λ < ∞, β > 0.
• The loss experience of both risks is observed for an equal number of exposure periods.
Determine all values of β for which the Buhlmann credibility of the loss experience of Risk 2 will be greater than the Buhlmann credibility of the loss experience of Risk 1.
Hint: ∫ from 0 to ∞ of λ^2 β e^(−βλ) dλ = 2/β^2.
A. β > 0 B. β < 1 C. β > 1 D. β < 2 E. β > 2
6.53 (4B, 5/97, Q.25) (2 points) You are given the following:
• The number of claims for a single insured is 1 with probability q and 0 with probability 1-q, where q varies by insured.
• The expected value of the process variance is 0.10.
• The average of the hypothetical means is 0.30.
Determine the variance of the hypothetical means. A. 0.01 B. 0.09 C. 0.10 D. 0.11 E. 0.19
6.54 (4B, 11/97, Q.19) (3 points) You are given the following:
• The number of claims for a single insured follows a Bernoulli distribution with mean q.
• q varies by insured and follows a uniform distribution on the interval [0, s], where 0 ≤ s < 1.
Determine the value of Buhlmann's k.
A. 2 B. 8 C. s^2 / 12 D. s(3 - 2s) / 6 E. 2(3 - 2s) / s
6.55 (4B, 11/98, Q.14) (2 points) You are given the following:
• The probability that a risk has at least one loss during any given month is q.
• q does not vary by month.
• The prior distribution of q is assumed to be uniform on the interval (0, 1).
• This risk is observed for n months.
• At least one loss is observed during each of these n months.
• After this period of observation, the mean of the posterior distribution of q for this risk is 0.95.
Determine n. A. 8 B. 9 C. 10 D. 18 E. 19
Use the following information for the next two questions:
• The probability that a particular baseball player gets a hit in any given attempt is q.
• q does not vary by attempt.
• The prior distribution of q is assumed to follow a distribution with mean 1/3, variance ab/{(a + b)^2 (a + b + 1)}, and density function π(q) = {Γ(a + b) / (Γ(a) Γ(b))} q^(a-1) (1-q)^(b-1), 0 ≤ q ≤ 1.
• The player is observed for nine attempts and gets four hits.
6.56 (4B, 11/98, Q.23) (2 points) If the prior distribution is constructed so that the credibility of the observations is arbitrarily close to zero, determine which of the following is the largest. A. f(0) B. f(1/3) C. f(1/2) D. f(2/3) E. f(1) 6.57 (4B, 11/98, Q.24) (3 points) If the prior distribution is constructed so that the variance of the hypothetical means is 1/45, determine the probability that the player gets a hit in the tenth attempt. A. 1/3 B. 13/36 C. 7/18 D. 5/12 E. 4/9
6.58 (4, 11/00, Q.11) (2.5 points) For a risk, you are given: (i) The number of claims during a single year follows a Bernoulli distribution with mean p. (ii) The prior distribution for p is uniform on the interval [0,1]. (iii) The claims experience is observed for a number of years. (iv) The Bayesian premium is calculated as 1/5 based on the observed claims. Which of the following observed claims data could have yielded this calculation? (A) 0 claims during 3 years (B) 0 claims during 4 years (C) 0 claims during 5 years (D) 1 claim during 4 years (E) 1 claim during 5 years
Solutions to Problems:
6.1. A. Beta-Bernoulli. The Beta has a = 4 and b = 5 (and θ = 1). The mean of the Bernoulli is q. Mean of the portfolio = E[q] = Mean of the Beta Distribution = a / (a+b) = 4/9 = 0.444.
6.2. B. The prior distribution is a Beta with a = 4 (one more than the exponent of q) and b = 5 (one more than the exponent of 1-q). Thus F(x) = β(4,5; x). F(0.55) - F(0.50) = β(4,5; 0.55) - β(4,5; 0.5) = 0.740 - 0.637 = 0.103.
6.3. C. The (prior) marginal distribution is a Bernoulli with mean 4/9. This is the case because the chance of one claim is the integral of q g(q), which is the mean of g(q), which is the mean of the prior Beta = 4/9, as per the previous question. The chance of no claim is the integral of (1 - q) g(q) = {integral of g(q)} - {integral of q g(q)} = 1 - mean of the prior Beta = 1 - 4/9 = 5/9. Note that for each value of q the Bernoulli can have only zero or one claim; therefore, these are the only two possibilities when we integrate over all values of q.
6.4. C. The variance of the Bernoulli is: q(1-q) = q - q^2. Expected value of the process variance of the portfolio = E[q] - E[q^2]. E[q] = Mean of the Beta Distribution = a / (a+b) = 4/9. E[q^2] = 2nd moment of the Beta Distribution = a(a+1)/{(a+b)(a+b+1)} = (4)(5)/{(9)(10)} = 0.2222. Therefore, the expected value of the process variance of the portfolio = E[q] - E[q^2] = 0.4444 - 0.2222 = 0.2222.
6.5. E. The mean of the Bernoulli is q. Therefore, the variance of the hypothetical mean frequencies = Var[q] = Variance of the Beta = 2/81 = 0.0247.
6.6. C. Using the solutions of the previous two problems, the total variance = Expected value of the process variance + Variance of the hypothetical means = 0.2222 + 0.0247 = 0.2469. Alternately, the variance of the marginal Bernoulli (with a mean of 4/9) is: (4/9)(1 - 4/9) = 0.247.
6.7. B. K = 0.2222 / 0.0247 = 9.0. Z = 12 / (12 + 9) = 57.1%. Estimated claim frequency = (0.571)(7/12) + (0.429)(0.444) = 0.524. Comment: For the Beta-Bernoulli, K = a + b = 4 + 5 = 9.
6.8. B. For the Beta-Bernoulli, the posterior distribution is Beta with new parameters equal to (a + # claims observed) and (b + # years - # claims observed) = (4 + 7) and (5 + 12 - 7) = 11 and 10. β(11,10; x) is proportional to: q^10 (1-q)^9.
6.9. E. The mean of the posterior Beta with parameters 11, 10 is: 11 / (11 + 10) = 11/21 = 0.524. Alternately, the estimated frequency using Buhlmann Credibility was 0.52, and this Buhlmann Credibility result must be equal to the result of Bayesian Analysis, since the Beta is a conjugate prior of the Bernoulli.
6.10. C. The posterior distribution is a Beta with a = 11 and b = 10. Thus F(x) = β(11,10; x). F(0.55) - F(0.50) = β(11,10; 0.55) - β(11,10; 0.5) = 0.591 - 0.412 = 0.179. Comment: An example of a Bayesian Interval Estimate.
6.11. B. The posterior Beta with parameters 11 and 10 has mean = 11/21 = 0.5238, second moment a(a+1)/{(a+b)(a+b+1)} = (11)(12)/{(21)(22)} = 0.2857, and variance = 0.2857 - 0.5238^2 = 0.0113. Comment: The variance of the posterior Beta is considerably less than the variance of the prior Beta distribution. The observations allow us to narrow the distribution of possibilities.
6.12. D. The posterior distribution of q is a Beta with a = 11 and b = 10, with mean of 0.524 and standard deviation of: √0.0113 = 0.106. Thus F(0.55) - F(0.50) ≅ Φ((0.55 - 0.524)/0.106) - Φ((0.5 - 0.524)/0.106) = Φ(0.25) - Φ(-0.23) = 0.5987 - 0.4090 = 0.1897. Comment: Note that this differs from the exact answer of 0.179 obtained as the solution to a previous question using values of the Incomplete Beta Functions.
6.13. E. The predictive distribution is a Bernoulli with mean 11/21. This is the case because the chance of one claim is the integral of q g(q), which is the mean of g(q), which is the mean of the posterior Beta = 11/21, as per a previous question. The chance of no claim is the integral of (1 - q) g(q) = {integral of g(q)} - {integral of q g(q)} = 1 - mean of the posterior Beta = 1 - 11/21 = 10/21. Note that for each value of q the Bernoulli can have only zero or one claim; therefore, these are the only two possibilities when we integrate over all values of q. Thus the density is: (1 choose n) (11^n) (10^(1-n)) / 21, n = 0, 1.
6.14. B. Given q, the probability of 7 claims in 12 years is: (12 choose 7) q^7 (1 - q)^5 = 792 q^7 (1-q)^5. The posterior distribution of q is a Beta with a = 11 and b = 10, with density f(q) = {20!/(10! 9!)} q^10 (1 - q)^9 = 1,847,560 q^10 (1 - q)^9, 0 ≤ q ≤ 1. Therefore, the probability of 7 claims over the next 12 years is:
∫ from 0 to 1 of 1,847,560 q^10 (1 - q)^9 · 792 q^7 (1 - q)^5 dq = 1,463,267,520 ∫ from 0 to 1 of q^17 (1 - q)^14 dq
= (1,463,267,520)(17! 14! / 32!) = 1,463,267,520 / 8,485,840,800 = 17.24%.
Comment: Beyond what you are likely to be asked on your exam. Involves Beta type integrals. The estimated future claim frequency is 11/21 and the predictive distribution for the next year is a Bernoulli with mean 11/21. However, the number of claims over the next several years is given by the mixture of Binomial Distributions via a Beta, a Beta-Binomial Distribution, rather than a Binomial Distribution. The probability of 7 claims from a Binomial with m = 12 and q = 11/21 is: (12!/(7! 5!))(11/21)^7 (10/21)^5 = 21.0%, not the correct answer to this question.
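A sketch (again assuming SciPy is available) that reproduces this Beta-Binomial probability without carrying out the factorial integral by hand:

from math import comb
from scipy.special import beta as beta_fn   # complete Beta function B(a, b)

# P(7 claims in the next 12 years | posterior Beta(11, 10)):
a, b, m, k = 11, 10, 12, 7
prob = comb(m, k) * beta_fn(a + k, b + m - k) / beta_fn(a, b)
print(prob)   # about 0.1724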
6.17. A. The posterior Beta has a = 3, b = 6, θ = 1. f(q) = {8!/(2! 5!)} q^(3-1) (1 - q)^(6-1) = 168 q^2 (1 - q)^5.
Prob[q ≤ 0.4] = 168 ∫ from 0 to 0.4 of q^2 (1 − q)^5 dq. Substituting y = 1 - q:
= 168 ∫ from 0.6 to 1 of (1 - y)^2 y^5 dy = 168 ∫ from 0.6 to 1 of (y^5 - 2y^6 + y^7) dy = 168 [y^6/6 - 2y^7/7 + y^8/8] evaluated from y = 0.6 to y = 1 = (168)(0.00407503) = 68.46%.
Comment: F(x) = β(3, 6; x/1). F(0.4) = β(3, 6; 0.4) = 68.46%.
6.18. & 6.19. Let q be the fraction of the way the cue ball is from the left end of the table towards the right end. Then the probability a ball is to the left of the cue ball is q. Assume that “your friend places the cue ball at random on the table” means that q is uniformly distributed from 0 to 1. Thus we have a Uniform-Bernoulli, a special case of the Beta-Bernoulli with a = 1 and b = 1. Since q is the probability of a ball being to the left of the cue ball, we have two successes in three trials. Thus the posterior distribution of q is Beta with: aʼ = 1 + 2 = 3, and bʼ = 1 + 1 = 2. The posterior mean is: aʼ / (aʼ + bʼ) = 3/5 = 0.6. The posterior density of q is: {4! / (2! 1!)} q^2 (1 - q) = 12q^2 - 12q^3, 0 ≤ q ≤ 1. The posterior probability that q < 1/4 is:
∫ from 0 to 0.25 of (12q^2 - 12q^3) dq = (4)(0.25^3) - (3)(0.25^4) = 5.08%.
Comment: Similar to the situation originally discussed by the Reverend Thomas Bayes. His work was edited by Richard Price and published posthumously in 1764 as “An Essay towards solving a Problem in the Doctrine of Chances”. http://rstl.royalsocietypublishing.org/content/53/370.full.pdf . 6.20. D. Beta-Bernoulli, with a = 499 and b = 501. Mean of prior Beta is: a/(a+b) = 499 / (499+501) = 0.499. 6.21. C. For the Beta-Bernoulli, K = a + b = 1000. Z = 2000 / (2000 + 1000) = 2/3. estimate: (2/3)(1010/2000) + (1/3)(.499) = 0.503. Alternately, Posterior Beta has parameters: 499 + 1010 = 1509 and 501 + 2000 - 1010 = 1491. Mean of posterior Beta: 1509 / (1509 + 1491) = 1509/3000 = 0.503.
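A sketch checking solutions 6.17 and 6.18-6.19 with the incomplete Beta function (assuming SciPy):

from scipy.special import betainc

print(betainc(3, 6, 0.4))    # 6.17: about 0.6846
print(betainc(3, 2, 0.25))   # 6.18-6.19: P(q < 1/4) for the posterior Beta(3, 2), about 0.0508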
6.22. B. The 2nd moment of the posterior Beta is: a(a+1)/{(a+b)(a+b+1)} = (1509)(1510)/{(3000)(3001)} = 0.2530923. Variance = 0.2530923 - 0.503^2 = 0.0000833. Prob(q > 0.500) ≅ 1 - Φ[(0.500 - 0.503)/√0.0000833] = 1 - Φ(-0.33) = 0.6293.
Comment: The Beta posterior is a distribution of Bernoulli parameters. 6.23. D. From the solution to the previous question, the posterior Beta β(1509,1491), has mean .503 and Standard Deviation .00913. On the Normal Distribution, since Φ(1.645) − Φ(-1.645) = .95 - .05 = .90, the mean ±1.645 standard deviations covers a probability of 90%. Thus .503 ± (1.645)(.00913) = (0.488, 0.518) is an approximate 90% confidence interval for the Bernoulli parameter. Comment: This an example of using the posterior distribution in order to find a confidence interval for value of the true parameter(s) around its Bayesian estimator. In this case an approximate 95% confidence interval for the Bernoulli parameter around its Bayesian estimate of 0.503 would be 0.503 ± (1.960)(0.00913) = (0.485, 0.521). 6.24. C. The posterior distribution of q is Beta with a = 5 + 1 = 6 and b = 3 + 6 - 1 = 8. The posterior mean is: a/(a + b) = 6/(6 + 8) = 6/14 = 3/7. 6.25. D. The predictive distribution is Bernoulli, with q = 3/7. Its variance is: (3/7)(1 - 3/7) = 12/49.
6.26. E. From the previous solution, the posterior distribution of q is a Beta Distribution with a = 6 and b = 8. g(q) = {13!/(5! 7!)} q^(6-1) (1-q)^(8-1) = 10,296 q^5 (1-q)^7, 0 ≤ q ≤ 1. Given q, the probability of 3 failures in three trials is: (1-q)^3.
f(0) = ∫ from 0 to 1 of (1-q)^3 · 10,296 q^5 (1-q)^7 dq = 10,296 ∫ from 0 to 1 of q^5 (1 - q)^10 dq = 10,296 β(6, 11)
= (10,296)(5! 10! / 16!) = 10,296/48,048 = 21.4%.
Alternately, in order to calculate the probability of 3 failures in 3 Bernoulli trials, proceed sequentially, one trial at a time. The posterior distribution of q is Beta with a = 6 and b = 8 with mean 6/(6 + 8) = 3/7. The chance of a failure in the first trial is: 1 - 3/7 = 4/7. Posterior to one trial with a failure, we get a Beta with a = 6 and b = 9. The conditional probability of a failure in the second trial is: 9/(6 + 9) = 0.6. Posterior to two trials each with a failure, we get a Beta with a = 6 and b = 10. The conditional probability of a failure in the third trial is: 10/(6 + 10) = 5/8. Therefore, the probability of 3 failures in 3 trials is: (4/7)(0.6)(5/8) = 21.4%.
Comment: Beyond what you are likely to be asked on your exam. The predictive distribution is not a Binomial Distribution with m = 3 and q = 3/7, with density at 0 of: (4/7)^3 = 0.187. Instead, the predictive distribution is a Beta-Binomial with m = 3, a = 6, and b = 8. See Exercise 15.82 in Loss Models.
6.27. (a) Let q be the longterm proportion of boys born. Given q, the probability of the observation is proportional to: q^111 (1-q)^97. For q = 48% this is: 1.1752 × 10^-63. For q = 52% this is: 3.6034 × 10^-63. For q = 56% this is: 2.9092 × 10^-63. Thus the probability weights are: 1.1752/4, 3.6034/2, and 2.9092/4. The posterior probabilities are: 10.4%, 63.8%, and 25.8%. Thus the posterior chance that q > 52%, in other words that q = 56%, is 25.8%.
(b) Given q, the probability of the observation is proportional to: q^111 (1-q)^97. Since q is uniform from 0.48 to 0.56, the probability weights are also proportional to: q^111 (1-q)^97.
Thus the desired probability that q > 52% is:
{∫ from 0.52 to 0.56 of q^111 (1-q)^97 dq} / {∫ from 0.48 to 0.56 of q^111 (1-q)^97 dq}.
Now q^111 (1-q)^97 is proportional to a Beta Distribution, with a = 112, b = 98, and θ = 1. This Beta Distribution has a mean of: 112 / (112 + 98) = 0.53333. This Beta Distribution has a second moment of: (112)(113) / {(210)(211)} = 0.28562. Thus this Beta Distribution has a variance of: 0.28562 - 0.53333^2 = 0.001179. Ignore the constants in front of the density of the Beta, since they will cancel in the ratio of integrals we want.
The integral in the numerator is (proportional to) the difference between the Beta Distribution at 0.56 and at 0.52. Use the Normal Approximation:
Φ[(0.56 - 0.53333)/√0.001179] - Φ[(0.52 - 0.53333)/√0.001179] = Φ[0.78] - Φ[-0.39] = 0.7823 - 0.3483 = 0.4340.
The integral in the denominator is (proportional to) the difference between the Beta Distribution at 0.56 and at 0.48. Use the Normal Approximation:
Φ[(0.56 - 0.53333)/√0.001179] - Φ[(0.48 - 0.53333)/√0.001179] = Φ[0.78] - Φ[-1.55] = 0.7823 - 0.0606 = 0.7217.
Thus the desired probability is: 0.4340 / 0.7217 = 60.1%.
(Using a computer, the exact answer is 60.02%.)
(c) For this Beta-Bernoulli, the posterior Beta has: aʼ = 13 + 111 = 124, and bʼ = 12 + 97 = 109. Thus the posterior probability that q > 0.52 is the survival function of the Beta at 0.52. This Beta Distribution has a mean of: 124 / (124 + 109) = 0.53219. This Beta Distribution has a second moment of: (124)(125) / {(233)(234)} = 0.28429. Thus this Beta Distribution has a variance of: 0.28429 - 0.53219^2 = 0.001064.
Use the Normal Approximation: 1 - Φ[(0.52 - 0.53219)/√0.001064] = 1 - Φ[-0.37] = Φ[0.37] = 0.6443.
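For reference, both of the exact figures quoted for parts (b) and (c) can be reproduced with the incomplete Beta function (a sketch assuming SciPy, not part of the original solution):

from scipy.special import betainc

# (b) weights proportional to a Beta(112, 98) density restricted to [0.48, 0.56]:
num = betainc(112, 98, 0.56) - betainc(112, 98, 0.52)
den = betainc(112, 98, 0.56) - betainc(112, 98, 0.48)
print(num / den)                    # about 0.6002

# (c) posterior Beta(124, 109); P(q > 0.52) is its survival function at 0.52:
print(1 - betainc(124, 109, 0.52))  # about 0.6462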
(Using a computer, the exact answer is 64.62%.) Comment: Notice the way that the posterior distribution depends on the prior distribution assumed. All of the priors have a mean of 52%. Similar to a situation analyzed by Pierre Simon Laplace. Laplace developed the modern form of what is now called Bayes Theorem: Prior times likelihood is proportional to the posterior. Unlike in Laplaceʼs day, currently in some countries the percentage of boys born differs significantly from the natural rate, due to using ultrasound to select to have abortions based on the gender of the fetus. 6.28. C. The prior distribution of q is a Beta Distribution with a = 135, b = 350, and θ = 1. (a - 1 = 134 and b - 1 = 349.) The mean of the (prior) Beta Distribution is: 135/(135 + 350) = 0.278. The marginal distribution is Bernoulli with q = .278. 6.29. B. For the Beta-Bernoulli, the Buhlmann Credibility parameter = a + b = 485. Therefore Z = 100 / (100+485) = 100/585. The prior mean is a/(a+b) = 135/485 and the observation is 40/100. Therefore the new estimate of q is: (100/585)(40/100) + (485/585)(135/485) = 175/585 = .299. Thus over his next 100 attempts we expect (100)(.299) = 29.9 hits. Alternately, the posterior Beta has parameters: 135 + 40 = 175 and 350 + 60 = 410. The mean of the posterior Beta is 175 / (175+ 410) = 175/585 = .299. (This is also the mean of the predictive Bernoulli distribution.) Thus over his next 100 attempts we expect (100)(.299) = 29.9 hits.
6.30. C. The distribution of q prior to any observations is a Beta Distribution with a = 135 and b = 350. For the Beta-Bernoulli, the Buhlmann Credibility parameter = a + b = 485. Therefore Z = 200 / (200 + 485) = 200/685. The prior mean is a/(a+b) = 135/485 and the observation is 45/200. Therefore the new estimate of q is: (200/685)(45/200) + (485/685)(135/485) = 180/685 = 0.263. Alternately, the posterior Beta has parameters: 135 + 45 = 180 and 350 + 155 = 505. The mean of the posterior Beta is 180 / (180 + 505) = 180/685 = 0.263. Alternately, one can start with the distribution of q after the first 100 attempts, which from a previous solution is a Beta with parameters a = 175 and b = 410. Then using only the observation of the second 100 attempts to update this (intermediate) Beta gives a Beta posterior to all the observations, with parameters 175 + 5 = 180 and 410 + 95 = 505. Then proceed as before. Comment: One can either update in two smaller steps or one big step.
6.31. B. The posterior Beta has a = 1 + 47 = 48 and b = 5 + 0 = 5. The mean of the posterior Beta is: 48/(48 + 5) = 90.6%. Comment: The mean of the prior Beta is: 1/(1 + 5) = 1/6. Based on the comic strip Peanuts. In 1952, it might not have occurred to someone that Lucy could be so mean; even 1/6 would have been a rather high estimate of the probability of someone doing something like this.
6.32. D. Beta-Bernoulli, with chance of success (for the tyrannosaur) of q. The posterior Beta for Tomʼs q has: a = 2 + 0 = 2, and b = 150 + 300 = 450. The number of future days Tom stays alive is the number of failures for the tyrannosaur prior to his first success, which is Geometric with β = (chance of failure for the tyrannosaur)/(chance of success for the tyrannosaur) = (1 - q)/q = 1/q - 1. Expected number of days alive = E[β] = E[1/q] - 1. For Tomʼs posterior Beta Distribution, E[X^-1] = Γ(a + b)Γ(a - 1)/{Γ(a)Γ(a + b - 1)} = (a + b - 1)/(a - 1) = (2 + 450 - 1)/(2 - 1) = 451. Therefore, E[β] = E[1/q] - 1 = 451 - 1 = 450 days. Tom expects to find (60)(0.5){(0.6)(1) + (0.2)(5) + (0.2)(10)} = 108 worth of coins each day. (450 days alive in the future)(108 per day) = 48,600. Comment: Difficult! Beyond what you are likely to be asked on the exam. One could add half a walk, 48,600 + 54 = 48,654, to take into account the probability that Tom completes his walk on the day he is eaten. Note, each scientist can be eaten only once in total. Therefore, this is not your typical Beta-Bernoulli. However, one can update the Beta in the same manner as usual for a scientist who represents 300 failures and no successes for the tyrannosaur.
6.33. D. aʼ = 15 + 40 = 55. bʼ = 15 + (60 - 40) = 35. Posterior mean is: aʼ/(aʼ + bʼ) = 55/90 = 11/18. So we expect the Durham Bulls to win (11/18)(102) = 62 of their remaining 102 games. Expected total wins: 40 + 62 = 102. Alternately, using Buhlmann Credibility, K = a + b = 15 + 15 = 30, Z = 60/(60 + K) = 2/3. Prior mean = a/(a + b) = 15/30 = 1/2. Future frequency = (2/3)(40/60) + (1/3)(1/2) = 11/18. Proceed as before.
6.34. C. The posterior distribution of q is Beta; aʼ = 15 + 40 = 55. bʼ = 15 + (60 - 40) = 35. Posterior mean is: aʼ/(aʼ + bʼ) = 55/90 = 0.6111. 2nd moment of the posterior Beta is: aʼ(aʼ+1)/{(aʼ+ bʼ)(aʼ+ bʼ+ 1)} = (55)(56)/{(90)(91)} = 0.3761. The total number of wins is 40 plus the additional wins in the remaining 102 games. Thus the variance of the total number of wins is the variance of the remaining wins. The number of remaining wins is Binomial with m = 102 and q. Posterior EPV = E[102 q(1-q)] = (102)(E[q] - E[q^2]) = (102)(0.6111 - 0.3761) = 24.0. Posterior VHM = Var[102 q] = (102^2) Var[q] = (102^2)(E[q^2] - E[q]^2) = (102^2)(0.3761 - 0.6111^2) = 27.6. Total Variance = EPV + VHM = 24.0 + 27.6 = 51.6.
6.35. A. K = a + b = 20. Z = 6/(6 + 20) = 3/13. Observed frequency of girls = 0/6 = 0. A priori mean = a/(a + b) = 10/(10 + 10) = 1/2. Estimate = (Z)(0) + (1 - Z)(1/2) = (10/13)(1/2) = 5/13 = 38.5%. Alternately, aʼ = a + r = 10 + 0 = 10. bʼ = b + n - r = 10 + 6 = 16. Mean of posterior Beta is: aʼ/(aʼ + bʼ) = 10/(10 + 16) = 5/13 = 38.5%.
6.36. E., 6.37. D., 6.38. A. The posterior distribution of q is Beta with a = 1 and b = 19 + 4 = 23. The estimate corresponding to the squared error loss function is the mean. The mean of the posterior Beta is 1/(1 + 23) = 1/24 = 4.17%. The estimate corresponding to the absolute loss function is the median. The density of the posterior Beta is: f(q) = 23(1-q)^22, 0 < q < 1. Therefore, the distribution function is F(q) = 1 - (1-q)^23, 0 < q < 1. Setting F(q) = 0.5: 0.5 = (1-q)^23. ⇒ q = 2.97%. ⇒ The median of the posterior Beta is 2.97%. The estimate corresponding to the zero-one loss function is the mode. The density of the posterior Beta is: f(q) = 23(1-q)^22, 0 < q < 1. This is a decreasing function of q, and thus the mode is at q = 0. Comment: Loss functions are discussed in “Mahlerʼs Guide to Buhlmann Credibility.” The zero-one loss function is kind of silly for actuarial work.
6.39. The mean of the mixture for a single game is: 0.6. For a given value of q, for a single game, the second moment of the Bernoulli is: (1-q)q + q^2 = q. Therefore, the second moment of the mixture is: E[q] = 0.6. Therefore, the variance of the mixture is: 0.6 - 0.6^2 = 0.24. The variance of the number of games won during a season is: (0.24)(162) = 38.88. The standard deviation of the number of games won during a season is 6.24. Alternately, for a single game EPV = E[q(1-q)] = E[q] - E[q^2], and VHM = Var[q] = E[q^2] - E[q]^2. For a single game, the (total) variance is: EPV + VHM = E[q] - E[q]^2 = 0.6 - 0.6^2 = 0.24. The variance of the number of games won during a season is: (0.24)(162) = 38.88. The standard deviation of the number of games won during a season is 6.24. Comment: The usual assumption for a Beta-Binomial would be that for a given team q is the same throughout the year. That is not the case here. Note that in this case, the answer does not depend on the distribution of q.
6.40. This uniform is a Beta with a = 1 and b = 1. This is a Beta-Bernoulli with a = 1 and b = 1. aʼ = 1 + 0 = 1, and bʼ = 1 + n = n + 1. Posterior mean is: aʼ / (aʼ + bʼ) = 1 / (n+2).
6.41. This uniform is a Beta with a = 1 and b = 1. This is a Beta-Bernoulli with a = 1 and b = 1. aʼ = 1 + n = n + 1, and bʼ = 1 + 0 = 1. Posterior mean is: aʼ / (aʼ + bʼ) = (n+1) / (n+2).
6.42. B. The posterior probability that H = 1/4 is: 0.2 / 0.3 = 2/3. The posterior probability that H = 1/2 is: 0.1 / 0.3 = 1/3.
A: H        B: A Priori Chance of This Value of H   C: Chance of the Observation   D: Prob. Weight = Product of Columns B & C   E: Posterior Chance of This Value of H
1/4         0.800                                   0.2500                         0.2000                                       66.7%
1/2         0.200                                   0.5000                         0.1000                                       33.3%
Overall                                                                            0.3000                                       1.000
Comment: A mixture of Bernoulli Distributions.
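The same table can be produced in a few lines (a sketch of the standard discrete Bayes computation, not part of the original solution):

priors = {0.25: 0.8, 0.5: 0.2}                    # P(H = h)
likelihood = {h: h for h in priors}               # P(D = 1 | H = h) = h
weights = {h: priors[h] * likelihood[h] for h in priors}
total = sum(weights.values())                     # 0.30
posterior = {h: w / total for h, w in weights.items()}
print(posterior)                                  # {0.25: 0.667, 0.5: 0.333}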
6.43. B. Assuming a given value of θ, the chance of observing one success in two trials is proportional to: θ(1−θ). The prior distribution of θ is: g(θ) = 1, 0 ≤ θ ≤ 1. By Bayes Theorem, the posterior distribution of θ is proportional to the product of the chance of the observation and the prior distribution: θ(1−θ). Thus the posterior distribution of θ is proportional to θ − θ^2. The integral of θ − θ^2 from 0 to 1 is 1/2 - 1/3 = 1/6. We must divide by 1/6 in order to have the integral of the posterior distribution equal to unity. Thus the posterior distribution of θ is 6(θ − θ^2). The posterior chance of θ in [0.45, 0.55] is:
6 ∫ from 0.45 to 0.55 of (θ − θ^2) dθ = [3θ^2 - 2θ^3] evaluated from θ = 0.45 to θ = 0.55 = 3(0.55^2 - 0.45^2) - 2(0.55^3 - 0.45^3) = 0.3 - 0.1505 = 0.1495.
Comment: This is an example of a Bayesian interval estimate. A Beta-Binomial conjugate prior situation, since the uniform distribution is a Beta distribution with a = 1 and b = 1. The posterior distribution is Beta(2,2; θ) = {3! / (1! 1!)} θ^(2-1) (1-θ)^(2-1) = 6(θ − θ^2).
⇒ 19/115 = (1.8 + h)/23. ⇒ h = 2. 6.45. B. The given Beta Distribution has parameters: a = 4 and b = 5. The posterior distribution is also a Beta Distribution, but with new first parameter, a´, equal to the prior first parameter, a, plus the number of claims observed = 4 + 2 = 6, and new second parameter, b´, equal to the prior second parameter, b, plus the number of trials minus the number of claims observed = 5 + 3 - 2 = 6. The mean of a Beta distribution is a /(a + b). Thus the mean of the posterior distribution is: 6 / (6 + 6) = 0.5. 6.46. C. This is a Beta-Bernoulli conjugate prior, with prior Beta Distribution with a = 2 and b = 2. For this situation, the Buhlmann Credibility parameter K = a + b = 4. For one exposure period: Z = 1 / (1 + 4) = 1/5.
6.47. E. Since we have a Bernoulli process with either zero or one claim per exposure period, the chance of having one claim in the next exposure period is the posterior mean. Since this is a conjugate prior situation with a member of a linear exponential family, the Bayes Analysis result is equal to the Buhlmann credibility result. From the previous question K = 4. Therefore, Z = 12 / (12 + 4) = 75%. The observed mean is 3/12 = .25. The prior mean is the mean of the (prior) Beta = a / (a + b) = 2 / 4 = .5. Thus the new estimate = (75%)(.25) + (25%)(.5) = .3125 = 5/16. Alternately, for the Beta-Bernoulli, the parameters of the posterior Beta are: posterior a = prior a + number of claims observed = 2 + 3 = 5, and posterior b = prior b + (number of exposures - number of claims observed) = 2 + 9 = 11. Mean of the posterior Beta = ( posterior a ) / ( posterior a + posterior b ) = 5 / (5 + 11 ) = 5/16. 6.48. A. The prior distribution is a Beta with parameters a = 2 and b = 2. For the Beta-Bernoulli the (posterior) predictive distribution is a Bernoulli, with q = ( a + number of claims observed) / (a + b + number of trials observed) = (2 + number of claims observed) / (2 + 2 + 2) = (2 + number of claims observed) / 6. For the first insured, which had two claims in two trials, q = 4/6 = 2/3. For the second insured, with no claims in two trials, q = 2/6 = 1/3. Thus posterior to the observations, we have two insureds each with Bernoulli distributions, one with q = 2/3 and one with q = 1/3. The chance that they each have a claim is: (2/3)(1/3) = 2/9. 6.49. D. For the Beta-Bernoulli, the Buhlmann Credibility Parameter is K = a + b = 2 + 2 = 4. For N exposure periods, Z = N / (N + 4). Setting Z = .75 and solving for N, N = 12.
6.50. D. The mode of the Beta is where is the density is largest. f(x) is proportional to: (x/θ)a-1 (1 - x/θ)b-1, 0 ≤ x ≤ θ. Setting the derivative of f(x) equal to 0: 0 = (a - 1)(x/θ)a-2 (1 - x/θ)b-1/θ - (b - 1)(x/θ)a-1 (1 - x/θ)b-2/θ. (a - 1)(1 - x/θ) = (b - 1)(x/θ). ⇒ x = θ(a - 1)/(a + b - 2). Checking against the endpoints, this is the maximum of the density when a > 1 and b > 1. The mode is: θ(a - 1)/(a + b - 2) = (a - 1)/(a + b - 2), for a > 1 and b > 1. The mode of prior distribution I is 1/7. ⇒ 1/7 = (5 - 1)/(5 + b1 - 2). ⇒ b1 = 25. ⇒ b2 = 25. Mean of the Beta is: θa/(a+b) = a/(a+b). The mean of prior distribution II is 6/11. ⇒ 6/11 = a2 /(a2 + 25). ⇒ a2 = 30. After 15 successes in n trials, using prior distribution I, aʼ = 5 + 15 = 20, bʼ = 25 + n - 15 = 10 + n, and the posterior mean is: 20/(30 + n). After 15 successes in n trials, using prior distribution II, aʼ = 30 + 15 = 45, bʼ = 25 + n - 15 = 10 + n, and the posterior mean is: 45/(55 + n). The mean of posterior distribution I is 16/31 of the mean of posterior distribution II. ⇒ 20/(30 + n) = (16/31)45/(55 + n). ⇒ 31n + 1705 = 36n + 1080. ⇒ n = 125. 6.51. D. Given x, the chance of observing three successes out of three trials is x3 . The a priori density function is f(x) =1, 0< x <1. The posterior probability is proportional to the product of the chance of the observation and the a priori probability: (x3 ) (1) = x3 . The integral from zero to one of x3 is 1/4. In order to get a density function we must divide x3 by this integral; therefore the posterior density is 4x3 . The mean of the posterior density is the integral from 0 to 1 of (x)(4x3). The integral from 0 to 1 of 4x4 is: 4/5 = 0.80. Alternately, the uniform distribution is a Beta distribution with a = 1 and b = 1. For a Beta-Bernoulli. the posterior mean is: (a + number of successes) / (a + b + number of trials) = (1 + 3)/(1 + 1 + 3) = 4/5 = 0.80.
6.52. D. Risk 1 follows a Uniform Distribution, a Beta-Bernoulli, with a = 1 and b = 1. Thus Risk 1 has Buhlmann Credibility Parameter of: a + b = 2. Risk 2 follows a Gamma-Poisson with α = 1 and θ = 1/β. Thus Risk 2 has Buhlmann Credibility Parameter of: 1/θ = β. If for an equal number of exposures more credibility is assigned to Risk 2 than Risk 1, then the Buhlmann Credibility Parameter for Risk 2 is less than that for Risk 1. In other words, β < 2. Alternately, one can work out the two Buhlmann Credibility Parameters. For Risk 1, the EPV is E[q - q2 ] = 1/2 - 1/3 = 1/6. For Risk 1, the VHM is VAR[q] = 1/12. Thus for Risk 1, K = (1/6)/(1/12) = 2. For Risk 2, the EPV is E[λ] = 1/β. Using the hint, the VHM is VAR[λ] = E[λ2] - E2 [λ] = 2 / β2 - 1/ β2 = 1/ β2. Thus for Risk 2, K = (1/ β)/(1/ β2) = β. Comment: This question is a combination of two simpler questions asking you to compute the credibility for each of two different (conjugate prior) situations. Note that since Z = N/(N + K), a smaller value of K corresponds to a larger Z. 6.53. D. For a Bernoulli the mean is q and the process variance is q(1-q) = q - q2 . Thus the Expected Value of the Process Variance is: E[q] - E[q2 ]. Thus E[q2 ] = E[q] - EPV. We are given that: average of the hypothetical means = E[q] = .3 and EPV = .1. Thus E[q2 ] = .3 - .1 = .2. Now VHM = VAR[q] = E[q2 ] - E[q]2 = .2 - .32 = .2 - .09 = 0.11.
6.54. E. Since q is uniform from 0 to s, E[q] = s/2, and Var[q] = s^2/12.
⇒ E[q^2] = s^2/12 + (s/2)^2 = s^2/3. For fixed q, the process variance of the Bernoulli is q(1-q).
⇒ EPV = E[q(1-q)] = E[q] - E[q^2] = s/2 - s^2/3. The hypothetical mean of the Bernoulli is q.
⇒ VHM = Var[q] = s^2/12. Thus the Buhlmann Credibility Parameter K = EPV/VHM = (s/2 - s^2/3)/(s^2/12) = (6 - 4s)/s = 2(3 - 2s)/s.
Alternately, the density of q is: f(q) = 1/s for 0 ≤ q ≤ s. Thus the Expected Value of the Process Variance is:
∫ from 0 to s of q(1-q) f(q) dq = (1/s) ∫ from 0 to s of (q - q^2) dq = (1/s)(s^2/2 - s^3/3) = s/2 - s^2/3.
For fixed q, the mean of the Bernoulli is q. The overall mean is thus s/2. Thus the Variance of the Hypothetical Means is:
∫ from 0 to s of (q - s/2)^2 f(q) dq = (1/s) ∫ from 0 to s of (q - s/2)^2 dq = (1/s) [(q - s/2)^3 / 3] evaluated from q = 0 to q = s = s^2/24 - (-s^2/24) = s^2/12.
Proceed as before. Alternately, this uniform distribution is a Beta with a = 1, b = 1, and θ = s. Thus this is a Beta-Bernoulli, with θ < 1. K = (a + b){(a + b + 1)/θ - (a + 1)}/b = 2(3/s - 2)/1 = 2(3 - 2s)/s. Comment: q is uniform on [0, s], with the value of s fixed but unknown. Some might find it easier to select a value of s such as 0.4 (not 1), compute K and then see which letter solution could be right. If s = 1, then we would have a uniform distribution on [0, 1]; this is a special case of a Beta-Bernoulli with a = 1, b = 1, (and θ = 1). For this case, K = a + b = 2. Only choices A and E approach this result as s approaches 1.
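A quick numeric sanity check of K = 2(3 - 2s)/s (my own sketch, using the sample value s = 0.4 suggested in the comment):

s = 0.4
epv = s/2 - s**2/3
vhm = s**2/12
print(epv/vhm, 2*(3 - 2*s)/s)   # both 11.0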
6.55. D. Given q, the chance of the observation is q^n. Since the a priori distribution of q is uniform on [0,1], the posterior distribution of q is proportional to q^n.
∫ from 0 to 1 of q^n dq = [q^(n+1)/(n+1)] from 0 to 1 = 1/(n+1).
Thus the posterior distribution of q is: q^n / {1/(n+1)} = (n+1) q^n. Thus the mean of the posterior distribution is:
∫ from 0 to 1 of q (n+1) q^n dq = [(n+1) q^(n+2)/(n+2)] from 0 to 1 = (n+1)/(n+2).
Setting the posterior mean equal to the given 0.95, one solves for n. 0.95 = (n+1)/(n+2). 0.95n + 1.9 = n + 1. 0.9 = 0.05n. n = 18. Alternately, this is a Beta-Bernoulli, with a = 1 and b =1. The posterior Beta has parameters: 1 + n and 1 + (n-n). This posterior Beta with parameters: n + 1 and 1, has a mean of (n+1)/((n+1) + 1) = (n + 1)/(n + 2). Then proceed as above to solve for n. Alternately, this is a Beta-Bernoulli, with a = 1 and b = 1. Thus the result of Bayesian Analysis is equal to that of Buhlmann Credibility. K = a + b = 2. Z = n / (n + 2). Observation = 1. Prior mean = 1/2. New estimate = (1)(n/(n + 2)) + (1/2)(2/(n + 2)) = (n + 1)/(n + 2). Then proceed as above to solve for n. Comment: Backwards. Given an output, solve for a missing input. 6.56. B. We are given that the mean of the Beta Distribution is 1/3, so a/(a+b) = 1/3 or b = 2a. For comparing values of π(q) between values of q (for fixed a and b) we donʼt care about the constant in front. π(q) is proportional to: qa-1 (1-q)b-1 = qa-1 (1-q)2a-1 . We can maximize this, by maximizing its log: (a-1)ln(q) + (2a-1)ln(1-q). Taking the derivative of this log, and setting it equal to zero: (a-1)/q -(2a-1)/(1-q) = 0. (2a - 1)q = (a - 1) - (a - 1)q. ⇒ q = (a - 1)/(3a - 2). For the Beta-Bernoulli, the Buhlmann Credibility Parameter K = a + b = 3a. For small credibility, K is large, so a is large. As a approaches infinity, the value at which f(q) is largest: (a - 1)/(3a - 2) approaches 1/3. Comment: As the credibility gets infinitesimal, the prior density has its mode at its mean of 1/3. As a goes to infinity, the variance of the prior density goes to zero, and the probability is concentrated at the mean.
6.57. C. We are given that ab/{(a+b+1)(a+b)^2} = 1/45. Also from the solution to the previous question we have b = 2a. Thus a(2a)/{(a+2a+1)(a+2a)^2} = 2a^2 / {(3a+1)(9a^2)} = 2/{9(3a+1)} = 1/45. Thus 1/(3a+1) = 1/10. Thus a = 3 and b = 6. The Buhlmann Credibility parameter K = a + b = 9. Thus for 9 observations, Z = 9/(9 + 9) = 50%. The prior mean is 1/3. The observation is 4/9. Thus the new estimate is: (4/9)(1/2) + (1/3)(1 - 1/2) = 4/18 + 3/18 = 7/18. Alternately, the posterior Beta has parameters: 3 + 4 = 7 and 6 + (9 - 4) = 11. Thus the mean of the posterior Beta is 7/(7+11) = 7/18. Comment: Gives you two of the intermediate results and asks you to solve for a and b. Check: a/(a+b) = 3/(3+6) = 1/3. ab/{(a+b+1)(a+b)^2} = (3)(6)/{(10)(9^2)} = 1/45.
6.58. A. Assume one observes C claims in Y years. Then the chance of the observation is proportional to: p^C (1-p)^(Y-C). The a priori density of p is 1, 0 ≤ p ≤ 1. Thus by Bayes Theorem, the posterior density of p is proportional to: p^C (1-p)^(Y-C), 0 ≤ p ≤ 1. Therefore, the mean of the posterior density is:
∫ from 0 to 1 of p p^C (1-p)^(Y-C) dp / ∫ from 0 to 1 of p^C (1-p)^(Y-C) dp = β(C+2, Y+1-C)/β(C+1, Y+1-C) = {Γ(C+2)Γ(Y+1-C)/Γ(Y+3)} / {Γ(C+1)Γ(Y+1-C)/Γ(Y+2)} = (C + 1)/(Y + 2). Of the combinations given, the posterior mean is 1/5 for C = 0 and Y = 3. Alternately, this is a special case of the Beta-Bernoulli Conjugate Prior, with a = 1 and b = 1. aʼ = a + C = C + 1. bʼ = b + Y - C = Y + 1 - C. The mean of the posterior Beta is: aʼ/(aʼ + bʼ) = (C + 1)/(Y + 2). Proceed as before. Alternately, since for the Beta-Bernoulli Conjugate Prior, Buhlmann Credibility gives the same result as Bayes Analysis, one could use Buhlmann Credibility with K = a + b = 2. Z = Y/(Y+2). Posterior estimate = (C/Y)Y/(Y+2) + (1/2)2/(Y+2) = (C + 1)/(Y + 2).
Section 7, Beta-Binomial41 The Beta Distribution is also a conjugate prior to the Binomial Distribution with m fixed. Since a Binomial is a series of m independent Bernoulli trials, one can apply similar ideas to a Beta-Binomial as to a Beta-Bernoulli. Exercise: The number of claims in a year from an individual policyholder is Binomial with parameters m = 10 and q. The prior distribution of q is Beta with a = 3, b = 6, and θ = 1. An individual policyholder has 4 claims the first year, and 5 claims the second year. What is the expected number of claims from this policyholder in the third year? [Solution: We observed two years with the equivalent of 10 Bernoulli trials each; we observed (2)(10) = 20 Bernoulli trials, with 4 + 5 = 9 claims. aʼ = a + r = 3 + 9 = 12, and bʼ = b + n - r = 6 + 20 - 9 = 17. Mean of the posterior Beta Distribution of q is: aʼ/(aʼ + bʼ) = 12/(12 + 17) = 0.414. Expected future annual frequency is: mq = (10)(0.414) = 4.14.] The Beta Distribution is a conjugate prior to the Binomial Distribution for fixed m. If for fixed m, the q parameter of the Binomial is distributed over a portfolio by a Beta, then the posterior distribution of q parameters is also given by a Beta with parameters: aʼ = a + number of claims. bʼ = b + m (number of years) - number of claims. Exercise: The number of claims in a year from an individual policyholder is Binomial with parameters m = 10 and q. The prior distribution of q is Beta with a = 3, b = 6, and θ = 1. An individual policyholder has 4 claims the first year, and 5 claims the second year. What is the posterior distribution of q? [Solution: The posterior distribution is also Beta, with aʼ = a + number of claims = 3 + 4 + 5 = 12, and bʼ = b + m(number of years) - number of claims = 6 + (10)(2) - (4 + 5) = 17. Comment: Mean of the posterior Beta Distribution of q is: aʼ/(aʼ + bʼ) = 12/(12 + 17) = 0.414. Expected future annual frequency is: mq = (10)(0.414) = 4.14.] This is the same result as obtained previously by thinking of the Binomial as the sum of 10 independent Bernoullis. The two approaches are mathematically equivalent. Use whichever approach you prefer. The number of claims per year is Binomial. We observe several years. This is mathematically the same as if we had independent Bernoulli trials, each with mean q. The distribution of q is Beta. This is mathematically equivalent to a Beta-Bernoulli. Therefore, the estimates from Buhlmann Credibility and Bayes Analysis are equal. 41
For m = 1, we get the special case of the Beta-Bernoulli.
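The updating in the exercise above is easy to script. A minimal Python sketch (the variable names are mine, not from the text):

a, b, m = 3, 6, 10
claims, years = 4 + 5, 2
a_post = a + claims                     # 12
b_post = b + m * years - claims         # 17
mean_q = a_post / (a_post + b_post)     # 12/29 = 0.414
print(a_post, b_post, m * mean_q)       # 12 17 4.14  (expected claims next year)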
Translate from “Binomial Land” to “Bernoulli Land.” Update for Observations. Translate back to “Binomial Land.” This trick works to get the posterior mean. It does not work to get the predictive distribution. As will be discussed, the mixed distribution is a “Beta-Binomial”. Alternately, one can use the following updating formulas for the Beta-Binomial case, in order to get the posterior Beta: aʼ = a + number of claims. bʼ = b + m (number of years) - number of claims. It turns out that for the Beta-Binomial: K = (a + b) / m, with Bayes = Buhlmann.
Beta-Binomial Frequency Process
Beta is a Conjugate Prior for the Binomial Likelihood. Binomial with m fixed is a Member of a Linear Exponential Family. Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = (a + b)/m.
[Flowchart:]
Beta Prior (a, b), Dist. of Parameters → (mixing with the Binomial Process) → Beta-Binomial Marginal Dist. of the Number of Claims.
Observations: # claims = C, # years = Y.
Beta Posterior Dist. of Parameters, aʼ = a + C, bʼ = b + mY - C → (mixing with the Binomial Process) → Beta-Binomial Predictive Dist. of the Number of Claims.
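The flow above can be summarized in a few lines of code. The sketch below (my own illustration, using the running example's numbers) computes the Bayes estimate via the posterior Beta and confirms that it matches Buhlmann credibility with K = (a + b)/m:

a, b, m = 3, 6, 10
C, Y = 9, 2                              # observed claims and years

a_post, b_post = a + C, b + m*Y - C      # posterior Beta parameters
bayes = m * a_post / (a_post + b_post)   # Bayes estimate of next year's claims

K = (a + b) / m                          # Buhlmann credibility parameter
Z = Y / (Y + K)
prior_mean = m * a / (a + b)
buhlmann = Z * (C / Y) + (1 - Z) * prior_mean
print(bayes, buhlmann)                   # both 4.1379...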
Beta-Binomial, Prior Expected Value of the Process Variance:
Since the frequency for each risk is Binomial, the process variance for an individual risk is: mq(1-q) = mq - mq^2. Therefore the expected value of the process variance is: m E[q] - m E[q^2].
E[q] = the mean of the Beta Distribution = a/(a + b).
E[q^2] = the second moment of the Beta Distribution = a(a + 1)/{(a + b)(a + b + 1)}.
EPV = m a/(a + b) - m a(a + 1)/{(a + b)(a + b + 1)} = m ab/{(a + b)(a + b + 1)}.
Beta-Binomial, Prior Variance of the Hypothetical Means:
Variance of the Prior Beta Distribution = a(a + 1)/{(a + b)(a + b + 1)} - a^2/(a + b)^2 = ab/{(a + b)^2 (a + b + 1)}.
Since the frequency for each risk is Binomial, each hypothetical mean is mq.
VHM = Var[mq] = m^2 Var[q] = m^2 ab/{(a + b)^2 (a + b + 1)}.
Beta-Binomial, Buhlmann Credibility: K = EPV/VHM = (a + b)/m.42
As with the Beta-Bernoulli, Buhlmann = Bayes.
EPV + VHM = m ab/{(a + b)(a + b + 1)} + m^2 ab/{(a + b)^2 (a + b + 1)} = m ab(m + a + b)/{(a + b)^2 (a + b + 1)} = Variance of Mixed Beta-Binomial.
42
For m = 1, this reduces to the Beta-Bernoulli.
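A numeric check of these formulas (my own sketch, using a = 3, b = 6, m = 10):

a, b, m = 3, 6, 10
epv = m * a * b / ((a + b) * (a + b + 1))
vhm = m**2 * a * b / ((a + b)**2 * (a + b + 1))
print(epv / vhm, (a + b) / m)                                            # K both ways: 0.9
print(epv + vhm, m * a * b * (m + a + b) / ((a + b)**2 * (a + b + 1)))   # both 4.222...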
Mixed Beta-Binomial Distribution:
In the case of a Beta-Binomial, the mixed distribution is not a Binomial. Therefore, while one can use the idea that a Binomial is a sum of independent Bernoulli trials to get the mean future frequency, one cannot use this idea in general to predict the probability of the insured having a certain number of claims in the future. Rather, one has to perform the mixing. The predictive distribution is a mixture of Binomial Distributions via the posterior Beta. For example, in the previous exercise, the posterior Beta for this policyholder has a = 12, b = 17, and θ = 1. Therefore, the posterior density of q is:
g(q) = {28!/(11! 16!)} q^11 (1 - q)^16 = 365,061,060 q^11 (1 - q)^16, 0 ≤ q ≤ 1.
Given q, the probability of 6 claims next year is the density at 6 for a Binomial with m = 10: {10!/(6! 4!)} q^6 (1 - q)^4 = 210 q^6 (1 - q)^4.
Therefore, mixing over q, the probability of 6 claims next year for this policyholder is:
∫ from 0 to 1 of 210 q^6 (1 - q)^4 365,061,060 q^11 (1 - q)^16 dq = 76,662,822,600 ∫ from 0 to 1 of q^17 (1 - q)^20 dq.
This is a Beta type integral discussed in the previous section:
∫ from 0 to 1 of x^(a-1) (1 - x)^(b-1) dx = (a - 1)!(b - 1)!/(a + b - 1)! = Γ(a)Γ(b)/Γ(a + b) = β(a, b).
Thus, with a = 18 and b = 21, this integral is: 17! 20!/38! = 1/604,404,010,980.
Therefore, mixing over q, the probability of 6 claims next year for this policyholder is: 76,662,822,600 / 604,404,010,980 = 12.68%. This is not the same as the density at 6 of a Binomial with m = 10 and q = posterior mean = 12/29: {10!/(6! 4!)} (12/29)^6 (17/29)^4 = 12.45%.
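The mixing integral above is equivalent to the Beta-Binomial density discussed next. A short Python check (my own sketch; the helper names are mine) of the 12.68% versus the 12.45% plug-in value:

from math import comb, lgamma, exp

def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

# Predictive probability of 6 claims next year: posterior Beta(12, 17), m = 10.
a, b, m, x = 12, 17, 10, 6
mixed = comb(m, x) * exp(log_beta(a + x, b + m - x) - log_beta(a, b))
plug_in = comb(m, x) * (a/(a+b))**x * (1 - a/(a+b))**(m - x)
print(mixed, plug_in)   # about 0.1268 versus 0.1245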
This mixed distribution is an example of what is sometimes called a Beta-Binomial Distribution.43
f(x) = β(a + x, b + m - x) / {(m+1) β(a, b) β(x + 1, m + 1 - x)}, x = 0, 1, ... , m.44
Exercise: In this example, what is the probability that the insured has zero claims next year?
[Solution: In this example, the predictive distribution is a Beta-Binomial with m = 10, a = 12, and b = 17.
f(0) = β(12+0, 17+10-0) / {(10+1)β(12, 17)β(0+1, 10+1-0)} = β(12, 27) / {(11)β(12, 17)β(1, 11)}.
β(12, 27) = Γ(12)Γ(27)/Γ(39) = 11! 26!/38!. β(12, 17) = Γ(12)Γ(17)/Γ(29) = 11! 16!/28!. β(1, 11) = Γ(1)Γ(11)/Γ(12) = (1) 10!/11! = 1/11.
f(0) = {11! 26!/38!} / {(11)(11! 16!/28!)(1/11)} = (26!/16!)(28!/38!) = {(17)(18) ... (26)} / {(29)(30) ... (38)} = 1.12%.
Alternately, given q, the probability of 0 claims next year is the density at 0 for a Binomial with m = 10: (1-q)^10. The posterior distribution of q is Beta with a = 12, b = 17, and θ = 1: g(q) = {28!/(11! 16!)} q^11 (1 - q)^16 = 365,061,060 q^11 (1 - q)^16, 0 ≤ q ≤ 1.
f(0) = ∫ from 0 to 1 of (1 - q)^10 365,061,060 q^11 (1 - q)^16 dq = 365,061,060 ∫ from 0 to 1 of q^11 (1 - q)^26 dq = 365,061,060 β(12, 27) = 365,061,060 (11! 26!/38!) = 365,061,060 / 32,489,701,776 = 1.12%.
Alternately, treat this as 10 separate Bernoulli trials each with no claim. The distribution of q prior to next year is Beta with a = 12, and b = 17 with mean 12/29. The chance of no claims in the first trial is 17/29. Posterior to one trial with no claim, we get a Beta with a = 12 and b = 18. The conditional probability of no claims in the second trial is: 18/(12 + 18) = 18/30. Posterior to two trials with no claim, we get a Beta with a = 12 and b = 19. The conditional probability of no claims in the third trial is: 19/(12 + 19) = 19/31. Proceeding in this manner, the probability of no claims in all 10 trials is: (17/29)(18/30)(19/31)(20/32)(21/33)(22/34)(23/35)(24/36)(25/37)(26/38) = 1.12%. Comment: The final sequential technique only works when one is trying to get the probability of having either no claims in the future, or the maximum number of claims in the future, in this case 10 per year. If one instead was calculating the probability of 6 claims next year, one would not know which of the 10 trials had a claim and which did not.] 43
See Exercise 15.82 in Loss Models. This distribution is also sometimes called a Binomial-Beta, Negative Hypergeometric, or Polya-Eggenberger Distribution. It has three parameters: m, a, and b. See for example Kendallʼs Advanced Theory of Statistics by Stuart and Ord. 44
β(a, b) = Γ(a)Γ(b) / Γ(a+b).
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 209
In this example, the predictive distribution is a Beta-Binomial with m = 10, a = 12, and b = 17, with densities at 0 to 10 of: 0.0112362, 0.0518594, 0.121351, 0.188768, 0.215442, 0.188022, 0.126840, 0.0652322, 0.0244621, 0.00604002, 0.00074612.45 Variance of the Mixed Beta-Binomial Distribution:46 The number of claims in a year from an individual policyholder is Binomial with parameters m = 10 and q. The prior distribution of q is Beta with a = 3, b = 6, and θ = 1. Exercise: Determine the mean, second moment, and variance of this prior Beta distribution. [Solution: E[q] = Mean of the Beta = a / (a + b) = 3/9 = 1/3. a (a +1) (3) (4) E[q2 ] = Second Moment of the Beta = = = 2/15. (a + b) (a + b + 1) (3 + 6) (3 + 6 + 1) Var[q] = Variance of the Beta = 2/15 - (1/3)2 = 1/45.] Then there are two different techniques one could use to determine the variance of the marginal (prior mixed) distribution. Process Variance of the Binomial = 10q(1 - q) = 10q - 10q2 . EPV = E[10q - 10q2 ] = 10 E[q] - 10 E[q2 ] = (10)(1/3 - 2/15) = 2. Hypothetical Mean of the Binomial = 10q. VHM = Var[10q] = 102 Var[q] = (102 )(1/45) = 2.222. Variance of the marginal (mixed) distribution is: EPV + VHM = 2 + 2.222 = 4.22. Alternately, the mean of the mixture is the mixture of the means:47 E[10q] = 10 E[q] = 10/3 = 3.333. The second moment of the mixture is the mixture of the second moments:48 E[10q(1-q) + (10q)2 ] = 10 E[q] + (9)(10) E[q2 ] = 10/3 + (90)(2/15) = 15.333. Variance of the mixture is: 15.333 - 3.3332 = 4.22.
In general, the Beta-Binomial has mean: m 45
a b (m+ a + b) 49 a , and variance: m . (a + b)2 (a+ b +1) a + b
These densities are not equal to those of a Binomial with m = 10 and q = posterior mean =12/29: 0.00479192, 0.0338253, 0.107445, 0.202249, 0.249838, 0.211627, 0.124487, 0.0502131, 0.0132917, 0.00208497, 0.000147174. 46 See 4, 5/07, Q.3. 47 The mean of the Binomial is 10q. 48 The second moment of the Binomial is: variance of the Binomial + (mean of the Binomial)2 . 49 With θ = 1 for the Beta. When m = 1, the Beta-Binomial is just a Bernoulli Distribution.
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 210
The Beta-Binomial has the same mean as for a Binomial with m and q = a/(a+b). a b However, for m > 1, this is a larger variance than that of this Binomial: m . a + b a + b For this example, the prior distribution of q is Beta with a = 3, b = 6, and θ = 1. m = 10. The mean of the mixed Beta-Binomial is: m and the variance is: m
a 3 = 10 = 3.333, a + b 3+6
a b (m+ a + b) (3) (6) (10 + 3 + 6) = 10 = 4.22, (a + b)2 (a+ b +1) (3 + 6)2 (3 + 6 + 1)
matching the previous results. If an individual policyholder has 4 claims the first year, and 5 claims the second year, then as discussed previously, the posterior distribution is also Beta, with aʼ = a + number of claims = 3 + 4 + 5 = 12, and bʼ = b + m(number of years) - number of claims = 6 + (10)(2) - (4 + 5) = 17. Exercise: Determine the mean, second moment, and variance of this posterior Beta distribution. [Solution: E[q] = Mean of the Beta = a / (a + b) = 12/29 = 0.4138. a (a +1) (12) (17) E[q2 ] = Second Moment of the Beta = = = 0.1793. (a + b) (a + b + 1) (12 + 17) (12 + 17 + 1) Var[q] = Variance of the Beta = 0.1793 - 0.41382 = 0.00807.] Exercise: Determine the variance of the predictive (posterior mixed) distribution. [Solution: Process Variance of the Binomial = 10q(1 - q) = 10q - 10q2 . Posterior EPV = E[10q - 10q2 ] = 10 E[q] - 10 E[q2 ] = (10)(0.4138 - 0.1793) = 2.345. Hypothetical Mean of the Binomial = 10q. Posterior VHM = Var[10q] = 102 Var[q] = (102 )(0.00807) = 0.807. Variance of the predictive (mixed) distribution is: EPV + VHM = 2.345 + 0.807 = 3.152. Alternately, the mean of the mixture is the mixture of the means: E[10q] = 10 E[q] = (10)(0.4138) = 4.138. The second moment of the mixture is the mixture of the second moments: E[10q(1-q) + (10q)2 ] = 10 E[q] + (9)(10) E[q2 ] = (10)(0.4138) + (90)(0.1793) = 20.275. Variance of the mixture is: 20.275 - 4.1382 = 3.152. Comment: Parallel to the calculation of the variance of the marginal (prior mixed) distribution, except using the posterior Beta rather than the prior Beta.]
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 211
Problems: 7.1 (3 points) The number of claims in a year from an individual policyholder is Binomial with parameters m = 3 and q. The prior distribution of q is Beta with a = 2, b = 4, and θ = 1. An individual policyholder has 1 claim the first year, 2 claims the second year, and 1 claim the third year. What is the expected number of claims from this policyholder in the fourth year? A. 1 B. 6/5 C. 4/3 D. 3/2 E. 8/5 7.2 (2 points) The annual number of claims is Binomial with m = 4 and q. π(q) = 60q2 (1-q)3 , 0 < q < 1. You observe 8 claims in 3 years. Compare the estimate based on Buhlmann Credibility to that from Bayes Analysis. 7.3 (2 points) You are given: (i) For Q = q, X1 , X2 ,..., Xm are independent, identically distributed Bernoulli random variables with parameter q. (ii) The prior distribution of Q is beta with a = 20, b = 28, and θ = 1. (iii) 12 claims are observed. Determine the smallest value of m such that the mean of the posterior distribution of Q is less than or equal to 0.25. (A) 80 (B) 90 (C) 100 (D) 110 (E) 120 7.4 (3 points) You are given: (i) The number of claims for each policyholder has a binomial distribution with parameters m = 10 and q. (ii) The prior distribution of q is beta with parameters a = 4, b = unknown, and θ = 1. (iii) A randomly selected policyholder had the following claims experience: Year Number of Claims 1 3 2 2 3 y (iv) The Bayesian credibility estimate for the expected number of claims in Year 3 based on the Year 1 and Year 2 experience is 3.3333. (v) The Bayesian credibility estimate for the expected number of claims in Year 4 based on the Year 1, Year 2 and Year 3 experience is 3.7838. Determine y. (A) 4 (B) 5 (C) 6 (D) 7 (E) 8
Use the following information for the next five questions: The number of claims in a year from an individual policyholder is Binomial with parameters m = 5 and q. The prior distribution of q is: π(q) = 60q3 (1- q)2 , 0 ≤ q ≤ 1. 7.5 (1 point) What is the mean of the marginal distribution? A. 2.7 B. 2.9 C. 3.1 D. 3.3 E. 3.5 7.6 (2 points) What is the variance of the marginal distribution? A. less than 1.2 B. at least 1.2 but less than 1.4 C. at least 1.4 but less than 1.6 D. at least 1.6 but less than 1.8 E. at least 1.8 7.7 (2 points) An individual policyholder has 4 claims in a year. What is the expected future annual claim frequency for this policyholder? A. 3.3 B. 3.5 C. 3.7 D. 3.9 E. 4.1 7.8 (2 points) An individual policyholder has 4 claims in a year. What is the variance of the predictive distribution? A. less than 1.2 B. at least 1.2 but less than 1.4 C. at least 1.4 but less than 1.6 D. at least 1.6 but less than 1.8 E. at least 1.8 7.9 (3 points) An individual policyholder has 3 claims in a year. Use the Bayes estimate that corresponds to the zero-one loss function, in order to predict this insuredʼs frequency next year. A. 2.8 B. 2.9 C. 3.0 D. 3.1 E. 3.2
7.10 (3 points) You are given: (i) Conditional on Q = q, the random variables X1 , X2 , … Xm, are independent and follow a Bernoulli distribution with parameter q. (ii) Sm = X1 + X2 + … + Xm. (iii) The distribution of Q is beta with a = 7, b = 3, and θ = 0.6. Determine the variance of the marginal distribution of S40. (A) 12
(B) 14
(C) 16
(D) 18
(E) 20
Use the following information for the next 7 questions: Assume that given an inherent claim frequency q, the number of claims observed for one risk in m trials is given by a Binomial distribution with mean mq and variance mq(1-q). Also assume that the parameter q varies between 0 and 1 for the different risks, with q following a Beta distribution with mean a/(a + b) and variance ab/{(a + b)^2 (a + b + 1)}.
7.11 (2 points) If you observe d claims in m trials for an individual insured. Which of the following is the expected value of the posterior distribution of q for this insured?
A. (a+m) / (a+b+d)  B. (b+m) / (a+b+d)  C. (a+d) / (a+b+m)  D. (b+d) / (a+b+m)  E. None of the above
7.12 (3 points) What is the Expected Value of the Process Variance (for a single trial)?
A. ab / (a + b)  B. ab / {(a + b)(a + b + 1)}  C. ab / (a + b)^2  D. ab / {(a + b)^2 (a + b + 1)}  E. None of the above
7.13 (1 point) What is the Variance of the Hypothetical Means (for a single trial)?
A. ab / (a + b)  B. ab / {(a + b)(a + b + 1)}  C. ab / (a + b)^2  D. ab / {(a + b)^2 (a + b + 1)}  E. None of the above
7.14 (4, 5/83, Q.41a) (1 point) What is the Buhlmann Credibility assigned to the observation of m trials?
A. m / (m+a)  B. m / (m+b)  C. m / (m+a+b)  D. m / (m+a+ab+b)  E. None of the above
7.15 (1 point) If you observe d claims in m trials for an individual insured. Using Buhlmann Credibility, what is the estimated future claim frequency for this insured?
A. (a+m) / (a+b+d)  B. (b+m) / (a+b+d)  C. (a+d) / (a+b+m)  D. (b+d) / (a+b+m)  E. None of the above
7.16 (4 points) Let d be such that 0 ≤ d ≤ m. What is the probability of observing d claims in m trials for an individual insured? Let the complete Beta Function be defined as β(r,s) = Γ(r)Γ(s) / Γ(r+s).
A. β(a + d, b) / {(m+1) β(a, b + m - d) β(d + 1, m + 1 - d)}
B. β(a + d, b) / {(m+1) β(a + m - d, b) β(d, m)}
C. β(a + d, b + m - d) / {(m+1) β(a + m, b + d) β(d, m)}
D. β(a + d, b + m - d) / {(m+1) β(a, b) β(d + 1, m + 1 - d)}
E. None of the above.
7.17 (2 points) If a = 2 and b = 4, then what is the probability of observing 5 claims in 7 trials for an individual insured?
A. 6.8%  B. 7.0%  C. 7.2%  D. 7.4%  E. 7.6%
Use the following information for the next two questions: (i) Conditional on Q = q, the random variables X1 , X2 , … Xm, are independent and follow a Bernoulli distribution with parameter q. (ii) Sm = X1 + X2 + … + Xm (iii) The distribution of Q is beta with a = 3, b = 11, and θ = 1. 7.18 (3 points) Determine the variance of the marginal distribution of S50. (A) 34
(B) 36
(C) 38
(D) 40
(E) 42
7.19 (2 points) We observe that S50 = 7. Determine the variance of the predictive distribution for X51. (A) 0.10
(B) 0.13
(C) 0.16
(D) 0.19
(E) 0.22
Use the following information for the next three questions:
• Frequency follows a binomial distribution with parameters 4 and q. • The prior distribution of q is: 6 (q - q2 ), 0 ≤ q ≤ 1. • During the next year there is one claim. 7.20 (2 points) Find the posterior distribution of q. 7.21 (1 point) Estimate the future annual frequency for this insured. A. 1.5 B. 1.6 C. 1.7 D. 1.8 E. 1.9 7.22 (2 points) Find the posterior probability that q is more than 0.5. A. 23% B. 25% C. 27% D. 29% E. 31%
7.23 (4, 5/86, Q.47) (1 point) The beta distribution is a conjugate prior distribution to the binomial distribution. Explain briefly what is meant by this.
Use the following information for the next two questions: Assume that the number of claims, r, made by an individual insured in one year follows a binomial distribution: p(r) = [3!/{r!(3-r)!}] θ^r (1 - θ)^(3-r), r = 0, 1, 2, 3. Also assume that the parameter, θ, has the following p.d.f.: g(θ) = 6(θ - θ^2), 0 < θ < 1.
7.24 (4, 5/91, Q.32) (2 points) Given an observation of one claim in a one year period, what is the posterior distribution of θ?
A. 30 θ^2 (1 - θ)^2  B. 10 θ^2 (1 - θ)^3  C. 6 θ^2 (1 - θ)^2  D. 60 θ^2 (1 - θ)^3  E. 105 θ^2 (1 - θ)^4
7.25 (4, 5/91, Q.33) (3 points) What is the Buhlmann credibility assigned to a single observation?
A. 3/8  B. 3/7  C. 1/2  D. 3/5  E. 3/4
7.26 (165, 5/91, Q.12) (1.9 points) A Bayesian process is being used to estimate the mortality rate at age x. You are given the following: (i) The mortality rate is assumed to be a random variable with a Beta distribution with θ = 1. (ii) The mean of the prior distribution is 0.20. (iii) 10 lives were observed. (iv) 1 life died before attaining age x + 1. (v) The number of deaths has a binomial distribution. (vi) You have twice as much confidence in the prior mean as in the observed mortality rate. (vii) The mode of any Beta distribution is θ(a - 1) / (a + b - 2). Determine the mode of the posterior distribution. (A) 1/13 (B) 1/9 (C) 1/7 (D) 1/6
(E) 1/5
Use the following information for the next two questions: The number of claims for an individual risk in one year follows the Binomial Distribution with parameters m = 5 and q. The parameter q has a prior distribution in the form of a beta: f(q) = 60 q3 (1-q)2 , 0 ≤ q ≤ 1. No claims occurred in the first year. 7.27 (4B, 5/93, Q.4) (1 point) The posterior distribution of q is proportional to which of the following? A. q3 (1-q)2
B. q3 (1-q)7
C. q8 (1-q)2
D. q7 (1-q)3
E. q2 (1-q)8
7.28 (4B, 5/93, Q.5) (3 points) Determine the Buhlmann credibility estimate for the expected number of claims in the second year. A. Less than 1.40 B. At least 1.40 but less than 1.50 C. At least 1.50 but less than 1.60 D. At least 1.60 but less than 1.70 E. At least 1.70 7.29 (4B, 5/93, Q.29) (2 points) You are given the following: • The distribution for number of claims is binomial with parameters q and m = 1. • The prior distribution of q has mean = 0.25 and variance = 0.07. Determine the Buhlmann credibility to be assigned to a single observation of one risk. A. Less than 0.20 B. At least 0.20 but less than 0.25 C. At least 0.25 but less than 0.30 D. At least 0.30 but less than 0.35 E. At least 0.35
7.30 (4, 11/03, Q.7 & 2009 Sample Q.5) (2.5 points) You are given:
(i) The annual number of claims for a policyholder has a binomial distribution with probability function: p(x | q) = [2!/{x!(2-x)!}] q^x (1-q)^(2-x), x = 0, 1, 2.
(ii) The prior distribution is: π(q) = 4q^3, 0 < q < 1.
This policyholder had one claim in each of Years 1 and 2. Determine the Bayesian estimate of the number of claims in Year 3.
(A) Less than 1.1  (B) At least 1.1, but less than 1.3  (C) At least 1.3, but less than 1.5  (D) At least 1.5, but less than 1.7  (E) At least 1.7
7.31 (5 points) In the previous question, 4, 11/03, Q.7, estimate the probability of having 0 claims in Year 3, the probability of having 1 claim in Year 3, and the probability of having 2 claims in Year 3.
7.32 (4, 11/04, Q.1 & 2009 Sample Q.133) (2.5 points) You are given:
(i) The annual number of claims for an insured has probability function: p(x) = [3!/{x!(3-x)!}] q^x (1 - q)^(3-x), x = 0, 1, 2, 3.
(ii) The prior density is π(q) = 2q, 0 < q < 1.
A randomly chosen insured has zero claims in Year 1. Using Bühlmann credibility, estimate the number of claims in Year 2 for the selected insured.
(A) 0.33  (B) 0.50  (C) 1.00  (D) 1.33  (E) 1.50
7.33 (4, 11/06, Q.9 & 2009 Sample Q.253) (2.9 points) You are given:
(i) For Q = q, X1, X2,..., Xm are independent, identically distributed Bernoulli random variables with parameter q.
(ii) Sm = X1 + X2 +...+ Xm
(iii) The prior distribution of Q is beta with a = 1, b = 99, and θ = 1.
Determine the smallest value of m such that the mean of the marginal distribution of Sm is greater than or equal to 50.
(A) 1082  (B) 2164  (C) 3246  (D) 4950  (E) 5000
7.34 (4, 11/06, Q.29 & 2009 Sample Q.272) (2.9 points) You are given: (i) The number of claims made by an individual in any given year has a binomial distribution with parameters m = 4 and q. (ii) The prior distribution of q has probability density function π(q) = 6q(1 - q), 0 < q < 1. (iii) Two claims are made in a given year. Determine the mode of the posterior distribution of q. (A) 0.17 (B) 0.33 (C) 0.50 (D) 0.67
(E) 0.83
7.35 (4, 5/07, Q.3) (2.5 points) You are given: (i) Conditional on Q = q, the random variables X1 , X2 , … Xm, are independent and follow a Bernoulli distribution with parameter q. (ii) Sm = X1 + X2 + … + Xm (iii) The distribution of Q is beta with a = 1, b = 99, and θ = 1. Determine the variance of the marginal distribution of S101. (A) 1.00
(B) 1.99
(C) 9.09
(D) 18.18
(E) 25.25
7.36 (4, 5/07, Q.15) (2.5 points) You are given: (i) The number of claims for each policyholder has a binomial distribution with parameters m = 8 and q. (ii) The prior distribution of q is beta with parameters a (unknown), b = 9, and θ = 1. (iii) A randomly selected policyholder had the following claims experience: Year Number of Claims 1 2 2 k (iv) The Bayesian credibility estimate for the expected number of claims in Year 2 based on the Year 1 experience is 2.54545. (v) The Bayesian credibility estimate for the expected number of claims in Year 3 based on the Year 1 and Year 2 experience is 3.73333. Determine k. (A) 4 (B) 5 (C) 6 (D) 7 (E) 8
Solutions to Problems: 7.1. B. The prior distribution of q is proportional to: q(1-q)3 . The chance of the observation is proportional to: {q(1-q)2 } {q2 (1-q)} {q(1-q)2 }. Thus the posterior distribution of q is proportional to: q(1-q)3 {q(1-q)2 }{q2 (1-q)}{q(1-q)2 } = q5 (1-q)8 . Therefore, the posterior distribution of q is a Beta with a = 6, b = 9, and θ = 1, with mean 6 / (6 + 9) = 0.4. The expected future annual frequency is: (0.4)(3) = 1.2. Alternately, each year is a Binomial with m = 3, the sum of three independent Bernoullis. Thus three years is the sum of 9 independent Bernoullis, each with the same q. There are a total of 4 claims in 9 Bernoulli trials so: aʼ = a + r = 2 + 4 = 6, and bʼ = b + n - r = 4 + 9 - 4 = 9. Posterior mean of q is: 6/(6 + 9). Expected future annual frequency is: (6/15)(3) = 1.2. 7.2. The number of claims per year is Binomial with m = 4 and q. We observe 3 years. This is mathematically the same as if we had (4)(3) independent Bernoulli trials, each with mean q. The distribution of q is Beta with a = 3 and b = 4. Therefore, this is mathematically equivalent to a Beta-Bernoulli. Therefore, the estimates from Buhlmann Credibility and Bayes Analysis are equal. Alternately, the number of claims over 3 years is Binomial with m = 12 and q. The posterior distribution of q is proportional to: f(8) π(q) ~ q8 (1-q)4 q2 (1-q)3 = q10(1-q)7 . Therefore, the posterior distribution of q is Beta with a = 11 and b = 8. E[X | q] = 4q. Bayes Analysis estimate = E[4q] = 4E[q] = 4(11/19) = 44/19. Var[X | q] = 4q(1 - q) = 4q - 4q2 . The prior distribution of q is Beta with a = 3 and b = 4. E[q] = 3/7. E[q2 ] = (3)(4)/{(7)(8)} = 3/14. (Prior) EPV = E[4q - 4q2 ] = 4(3/7) - (4)(3/14) = 6/7. (Prior) VHM = Var[4q] = 16Var[q] = (16){3/14 - (3/7)2 } = 24/49. K = EPV/VHM = (6/7)/(24/49) = 7/4. Z = 3/(3+K) = 12/19. A priori mean = E[4q] = (4)(3/7) = 12/7. Buhlmann Credibility estimate = (12/19)(8/3) + (7/19)(12/7) = 44/19. Therefore, the estimates from Buhlmann Credibility and Bayes Analysis are equal. 7.3. A. The Posterior Beta has parameters: aʼ = 20 + 12 = 32, and bʼ = 28 + m - 12 = 16 + m. The mean of the Posterior Beta is: aʼ/(aʼ + bʼ) = 32/(48 + m) ≤ .25. ⇒ m ≥ 80. Comment: Similar to 4, 11/06, Q.9.
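Solutions 7.1 through 7.3 rest on the Beta-Binomial updating formulas. As a spot check of 7.2, the following sketch (my own, illustrative only) shows the Bayes and Buhlmann estimates agreeing at 44/19:

a, b, m, claims, years = 3, 4, 4, 8, 3
a1, b1 = a + claims, b + m*years - claims        # posterior Beta(11, 8)
bayes = m * a1 / (a1 + b1)                       # 44/19

K = (a + b) / m                                  # 7/4
Z = years / (years + K)                          # 12/19
buhlmann = Z * (claims/years) + (1 - Z) * m * a / (a + b)
print(bayes, buhlmann)                           # both 2.3157...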
7.4. B. A Beta-Binomial; Bayes Analysis equals Buhlmann Credibility in this case. Based on the first two years of data: aʼ = a + 3 + 2 = 9, and bʼ = b + (10 - 3) + (10 - 2) = b + 15. Therefore, the Bayesian credibility estimate for the future frequency is aʼ/(aʼ + bʼ) = 9/(9 + b + 15) = 9/(b + 24). The estimate of the number of claims is 10 times that: 3.3333 = 90/(b + 24). ⇒ b = 3. Based on the first three years of data: aʼ = a + 3 + 2 + y = 9 + y, and bʼ = b + (10 - 3) + (10 - 2) + (10 - y) = 28 - y. The estimate of the number of claims is: 3.7838 = 10(9 + y)/(9 + y + 28 - y) = 10(9 + y)/37. ⇒ y = 5. Comment: Similar to 4, 5/07, Q.15. Given the usual outputs, solve for missing inputs.
7.5. B. & 7.6. E. The prior distribution is Beta with a = 4, b = 3, and θ = 1, with mean 4/(4 + 3) = 4/7, second moment (4)(5)/{(7)(8)} = 5/14, and variance 5/14 - (4/7)2 = 3/98. The mean number of claims is E[5q] = 5 E[q] = 5(4/7) = 2.857. Prior to observations, the EPV = E[mq(1-q)] = 5E[q] - 5E[q2 ] = (5)(4/7) - (5)(5/14) = 15/14. Prior to observations, the VHM = Var[5q] = 25 Var[q] = (25)(3/98) = 75/98. Variance of the marginal distribution = Prior EPV + Prior VHM = 15/14 + 75/98 = 90/49 = 1.837. Comment: The mixed distribution is not a Binomial; it is a Beta-Binomial. Similar to 4, 5/07 Q.3. 7.7. A. The prior distribution of q is proportional to: q3 (1- q)2 . The chance of the observation is proportional to: q4 (1-q). Thus the posterior distribution of q is proportional to: q3 (1- q)2 q4 (1-q) = q7 (1-q)3 . Therefore, the posterior distribution of q is a Beta with a = 8, b = 4, and θ = 1, with mean 8/(8 + 4) = 2/3. The expected future annual frequency is: (2/3)(5) = 10/3. Alternately, a year is a Binomial with m = 5, the sum of five independent Bernoullis. The prior distribution is Beta with a = 4, b = 3, and θ = 1. There are a total of 4 claims in 5 Bernoulli trials so: aʼ = a + r = 4 + 4 = 8, and bʼ = b + n - r = 3 + 5 - 4 = 4. Proceed as before.
7.8. C. The posterior distribution of q is a Beta with a = 8, b = 4, and θ = 1, with mean 8/(8 + 4) = 2/3, second moment (8)(9)/{(12)(13)} = 0.4615, and variance: .4615 - (2/3)2 = 0.01709. Posterior to observations, the EPV = E[mq(1-q)] = 5E[q] - 5E[q2 ] = (5)(2/3) - (5)(.4615) = 1.026. Posterior to observations, the VHM = Var[5q] = 25 Var[q] = (25)(0.01709) = 0.427. Variance of the predictive distribution = Posterior EPV + Posterior VHM = 1.026 + 0.427 = 1.453. Comment: The mixed distribution is not a Binomial; it is a Beta-Binomial. Similar to 4, 5/07 Q.3, which deals with the marginal distribution prior to observations, rather than the predictive distribution posterior to observations. 7.9. C. The prior distribution of q is proportional to: q3 (1- q)2 . The chance of the observation is proportional to: q3 (1-q)2 . Thus the posterior distribution of q is proportional to: q3 (1- q)2 q3 (1-q)2 = q6 (1-q)4 . Therefore, the posterior distribution of q is a Beta with a = 7, b = 5, and θ = 1. The zero-one loss function corresponds to the mode. ⇒ We wish to maximize this density. Setting the derivative with respect to q equal to zero: 6q5 (1-q)4 - 4q6 (1-q)3 = 0. ⇒ 6(1-q) = 4q. ⇒ q = 0.6. Thus the estimated frequency is: (5)(0.6) = 3.0. Comment: In general, the mode of a Beta Distribution is: θ(a - 1) / (a + b - 2), for a > 1 and b > 1.
7.10. E. E[q] = Mean of the Beta = θ a/(a + b) = (0.6)(7/10) = 0.42.
E[q^2] = Second Moment of the Beta = θ^2 a(a + 1)/{(a + b)(a + b + 1)} = (0.6^2)(56/110) = 0.183273.
Var[q] = Variance of the Beta = 0.183273 - 0.42^2 = 0.006873. Process Variance = 40q(1 - q) = 40q - 40q^2. EPV = 40 E[q] - 40 E[q^2] = (40)(0.42) - (40)(0.183273) = 9.469. Hypothetical Mean = 40q. VHM = 40^2 Var[q] = (40^2)(0.006873) = 10.996. Variance of the marginal (mixed) distribution is: EPV + VHM = 9.469 + 10.996 = 20.465.
Alternately, E[q] = Mean of the Beta = θ a/(a + b) = (0.6)(7/10) = 0.42. The mean of the mixture is the mixture of the means: E[40q] = 40 E[q] = (40)(0.42) = 16.80. E[q^2] = Second Moment of the Beta = θ^2 a(a + 1)/{(a + b)(a + b + 1)} = (0.6^2)(56/110) = 0.183273. The second moment of the mixture is the mixture of the second moments: E[40q(1-q) + (40q)^2] = 40 E[q] + (1560) E[q^2] = (40)(0.42) + (1560)(0.183273) = 302.706. Variance of the mixture is: 302.706 - 16.80^2 = 20.466.
Comment: The mixed distribution is not a Binomial. Here, θ < 1. For use in the Beta-Binomial conjugate prior situation, θ = 1.
7.11. C. For the Beta-Bernoulli, the posterior distribution is Beta with new parameters equal to (a + # claims observed), (b + # trials - # claims observed) = (a+d), (b+m-d). The mean of this posterior Beta is: (a+d)/{(a+d)+(b+m-d)} = (a+d) / (a+b+m).
7.12. B. The process variance for an individual risk (for one trial) is q(1-q) = q - q^2 since the frequency for each risk is Bernoulli. EPV = E[q] - E[q^2] = mean of Beta - second moment of Beta = a/(a+b) - a(a+1)/{(a+b)(a+b+1)} = ab / {(a+b)(a+b+1)}.
7.13. D. The variance of the hypothetical means (for one trial) is the variance of q = Var[q] = Variance of the Prior Beta = a(a+1)/{(a+b)(a+b+1)} - a^2/(a+b)^2 = ab / {(a+b)^2 (a+b+1)}.
7.14. C. K = the (prior) expected value of the process variance / the (prior) variance of the hypothetical means = {ab/((a+b)(a+b+1))} / {ab/((a+b)^2 (a+b+1))} = a + b. Z = N/(N + K) = m/(m + a + b).
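A quick check of 7.10 (my own sketch):

a, b, theta, n = 7, 3, 0.6, 40
Eq  = theta * a / (a + b)                               # 0.42
Eq2 = theta**2 * a * (a + 1) / ((a + b) * (a + b + 1))  # 0.18327...
epv = n * (Eq - Eq2)
vhm = n**2 * (Eq2 - Eq**2)
print(epv + vhm)    # about 20.5, matching answer choice (E) 20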
7.15. C. The prior mean frequency is the mean of the prior Beta Distribution, which is a/(a+b). The observation is d/m. From the prior solution Z = m/(m+a+b). Thus the estimate using Buhlmann Credibility is: {m/(a+b+m)}(d/m) + {1 - m/(a+b+m)}{a/(a+b)} = d/(a+b+m) + a/(a+b+m) = (a + d) / (a + b + m).
7.16. D. The probability density of q is a Beta Distribution with parameters a and b: {(a+b-1)! / ((a-1)!(b-1)!)} q^(a-1) (1-q)^(b-1) = {Γ(a+b) / Γ(a)Γ(b)} q^(a-1) (1-q)^(b-1). One can compute the unconditional density at d via integration:
f(d) = ∫ from 0 to 1 of f(d | q) {Γ(a+b) / Γ(a)Γ(b)} q^(a-1) (1-q)^(b-1) dq
= ∫ from 0 to 1 of {Γ(a+b) / Γ(a)Γ(b)} {m! / (d!)(m-d)!} q^d (1-q)^(m-d) q^(a-1) (1-q)^(b-1) dq
= {1/β(a,b)} {Γ(m+1) / Γ(d+1)Γ(m+1-d)} ∫ from 0 to 1 of q^(a+d-1) (1-q)^(b+m-d-1) dq
= {1/β(a,b)} {Γ(m+2) / ((m+1)Γ(d+1)Γ(m+1-d))} {Γ(a+d)Γ(b+m-d) / Γ(a+b+m)}
= {1/β(a,b)} {1/(m+1)} {1/β(d+1, m+1-d)} β(a+d, b+m-d) = β(a+d, b+m-d) / {(m+1) β(a,b) β(d+1, m+1-d)}.
Comment: This (prior) marginal distribution is sometimes called a Binomial-Beta, Negative Hypergeometric, or Polya-Eggenberger Distribution. See for example Kendallʼs Advanced Theory of Statistics by Stuart and Ord. It has three parameters: m, a and b. It has mean ma/(a+b) and variance: abm(m+a+b) / {(a+b+1)(a+b)^2}.
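The marginal density derived in 7.16 is easy to code. A short sketch (the function name is mine) that also reproduces the answer to 7.17, using the equivalent form with a binomial coefficient in place of 1/{(m+1)β(d+1, m+1-d)}:

from math import comb, lgamma, exp

def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binomial_pmf(d, m, a, b):
    # P(d claims in m trials) when q ~ Beta(a, b): C(m, d) β(a+d, b+m-d) / β(a, b)
    return comb(m, d) * exp(log_beta(a + d, b + m - d) - log_beta(a, b))

print(beta_binomial_pmf(5, 7, 2, 4))   # 0.0757... = 5/66, the answer to 7.17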
7.17. E. Using the solution to the previous question: f(d) = β(a+d,b+m-d) / {(m+1)β(a,b)β(d+1,m+1-d). Here d = 5, a = 2, b = 4, and m = 7. f(5) = β(2+5,4+7-5) / {(7+1)β(2,4)β(5+1,7+1-5) = β(7,6) / {(8)β(2,4)β(6,3). β(7,6) = Γ(7)Γ(6) / Γ(7+6) = (6!)(5!)/(12!) = 1/5544. β(2,4) = Γ(2)Γ(4) / Γ(2+4) = (1!)(3!)/(5!) = 1/20. β(6,3) = Γ(6)Γ(3) / Γ(6+3) = (5!)(2!)/(8!) = 1/168. Therefore, f(5) = β(7,6) / {(8)β(2,4)β(6,3) = {(20)(168)} / {(8)(5544)} = 5/66 = 0.07576. Comment: One can also compute the solution by doing integrals similar to those in the solution to the previous question. The probability of observing other number of claims in 7 trials is as follows: d f(d) F(d)
0 0.15152 0.15152
1 0.21212 0.36364
2 0.21212 0.57576
3 0.17677 0.75253
4 0.12626 0.87879
5 0.07576 0.95455
6 0.03535 0.98990
7 0.01010 1.00000
This is an example of the “Binomial-Beta” distribution with: a = 2, b =4, and m = 7. 7.18. B. E[q] = Mean of the Beta = a/(a + b) = 3/14 = 0.2143. E[q2 ] = Second Moment of the Beta = a(a + 1)/{(a + b)(a + b + 1)} = (3)(4)/{(14)(15)} = 0.05714. Var[q] = Variance of the Beta = 12/{(14)(15)} - (3/14)2 = .01122. Process Variance = 50q(1 - q) = 50q - 50q2 . EPV = 50 E[q] - 50 E[q2 ] = (50)(0.2143 - 0.0571) = 7.86. Hypothetical Mean = 50 q. VHM = 502 Var[q] = (502 )(.01122) = 28.05. Variance of the marginal (mixed) distribution is: EPV + VHM = 7.86 + 28.05 = 35.9. Alternately, E[q] = Mean of the Beta = a/(a + b) = 3/14 = 0.2143. The mean of the mixture is the mixture of the means: E[50q] = 50 E[q] = (50)(0.2143) = 10.715. E[q2 ] = Second Moment of the Beta = a(a + 1)/{(a + b)(a + b + 1)} = (3)(4)/{(14)(15)} = 0.05714. The second moment of the mixture is the mixture of the second moments: E[50q(1-q) + (50q)2 ] = 50 E[q] + (49)(50) E[q2 ] = (50)(0.2143) + (49)(50)(0.05714) = 150.7. Variance of the mixture is: 150.7 - 10.7152 = 35.9. Comment: Similar to 4, 5/07, Q.3. The mixed distribution is not a Binomial; it is a Beta-Binomial. 7.19. B. The posterior distribution is Beta with a = 3 + 7 = 10 and b = 11 + (50 - 7) = 54. This Beta has mean: 10/(10 + 54) = 0.156. Therefore, the predictive distribution is Bernoulli, with q = 0.156. Its variance is: (0.156)(1 - 0.156) = 0.132.
7.20. The chance of the observation given q is: 4 q (1-q)3 . Thus the posterior distribution is proportional to: (q - q2 ) q (1-q)3 = q2 (1-q)4 , 0 ≤ q ≤ 1. This is a Beta Distribution with a = 3, b = 5, and θ = 1. The constant in front is: Γ(3 + 5) / {Γ(3) Γ(5)} = 7! / {(2!)(4!)} = 105. The posterior distribution of q is: 105 q2 (1-q)4 , 0 ≤ q ≤ 1. Alternately, the prior distribution of q is Beta with a = 2, b = 2, and θ = 1. Thus this is a Beta-Binomial, and the posterior distribution is Beta with parameters: a = 2 + 1 = 3, b = 2 + (4)(1) - 1 = 5, and θ = 1. The posterior distribution of q is: 105 q2 (1-q)4 , 0 ≤ q ≤ 1. 7.21. A. The mean of the posterior Beta is: 3 / (3 + 5) = 3/8. Thus the estimated future frequency is: (4)(3/8) = 1.5. Alternately, K = (a+b)/m = (2 + 2)/4 = 1. Z = 1/(1+K) = 1/2. Prior mean is: m a / (a +b) = 4 (2)/(2 + 2) = 2. Thus the estimated future frequency is: (1/2)(1) + (1 - 1/2)(2) = 1.5. 7.22. A. Let x = 1 - q. Then, 105 q2 (1-q)4 = 105 (1 - x)2 x4 = 105 (x4 - 2x5 + x6 ). q > 0.5. ⇔ x < 0.5. 0.5
105
∫0 x4 - 2x5 + x6 dx = (105) {0.55/5 - (2) (0.56/6) + 0.57/7} = 22.66%.
7.23. If the prior distribution is a Beta Distribution and the likelihood is a Binomial Distribution, then the posterior distribution is also a Beta Distribution. 7.24. D. By Bayes Theorem, the posterior distribution of θ is proportional to p(1)g(θ) = 3θ(1-θ)2 6(θ−θ2 ) = 18θ2 (1-θ)3 . Thus the posterior distribution is a Beta Distribution with parameters: a = 3 and b = 4. The constant in front is: Γ(3 + 4) / {Γ(3)Γ(4)} = 6! / {2! 3!} = 60. Thus the posterior distribution is 60θ2 (1-θ)3 . Comment: The prior distribution is a Beta with a = 2 and b = 2. We observe 1 claim in 3 trials. The posterior 1st parameter, a´, is the prior 1st parameter, a, + # claims = 2 + 1 = 3. The posterior 2nd parameter, b´, is the prior 2nd parameter, b, + # trials - # claims = 2 + 3 - 1 = 4.
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 226
7.25. B. For the Beta-Bernoulli, the Buhlmann Credibility parameter is a+b (the sum of the parameters of the Prior Beta), which in this case is 2 + 2 = 4. In this case “one observation” is equal to 3 Bernoulli Trials (m=3 in the Binomial.) Therefore, Z = N / (N + K) = 3 /(3 + 4) = 3/7. Alternately, the process variance for an individual risk is: 3θ(1−θ) = 3θ - 3θ2, since the frequency for each risk is Binomial with m = 3. Therefore the expected value of the process variance = the expected value of 3θ - the expected value of 3θ2 = 3E[θ] - 3E[θ2]. E[θ] = the mean of the β(a,b; x): a/(a+b) = 2/(2+2) = 2/4= .5. E[θ2] = second moment of Beta = a(a+1)/{(a+b) (a+b+1)} = (2)(3)/{(4)(5)} = .3. Therefore the EPV = 3(.5 - .3) = .6. The Variance of the Hypothetical Means is the variance of 3θ = 9Var[θ] = 9 (Variance of the Prior Beta) = 9 (.3 - .52 ) = .45. K = EPV / VHM = .6 / .45 = 4/3. In this case, one observation is one draw from the Binomial, therefore N =1. Z = 1 / (1 + 4/3) = 3/7. Comment: Note the care that must be taken over the meaning of N. One must be consistent with the definition of an exposure in the calculation of the variances. The two solutions given here use different such definitions, but still produce the same answer for the Credibility. 7.26. C. Mean of the Beta is: θa/(a+b) = a/(a+b). ⇒ 0.2 = a/(a + b). ⇒ b = 4a. You have twice as much confidence in the prior mean as in the observed mortality rate.
⇒ Z = 1/3. Z = 10/(10 + K) = 1/3. ⇒ K = 20. For the Beta-Binomial, K = a + b. ⇒ a + b = 20. ⇒ 5a = 20. ⇒ a = 4. ⇒ b = 16. aʼ = a + 1 = 5. bʼ = b + 10 - 1 = 25. Posterior Mode = (aʼ - 1)/(aʼ + bʼ - 2) = 4/28 = 1/7. Alternately, posterior estimate is: (1/3)(prior mean) + (2/3)(observed mortality) = (1/3){a/(a+b)}+ (2/3)(1/10) = (1/3)(1/5) + (2/3)(1/10) = 2/15. But the posterior estimate is the mean of the posterior Beta distribution: aʼ/(aʼ + bʼ) = (a + 1)/(a + 1 + b + 9) = (a + 1)/(a + b + 10) = (a + 1)/(5a + 10).
⇒ (a + 1)/(5a + 10) = 2/15. ⇒ a = 4. ⇒ b = 16. Proceed as before. 7.27. B. By Bayes Theorem the posterior is proportional to: (a prior chance of value of q) (chance of observation given q) = f(q) p(0 | q) = {60 q3 (1-q)2 } {(1-q)5 }, which is proportional to: q3 (1-q)7 . Comment: The Beta and the Binomial are Conjugate Priors. (The Beta-Bernoulli is a special case.) The posterior is also a Beta Distribution, with a = 4 and b = 8 and therefore with constant in front of: (a+b-1)! / (a-1)! (b-1)! = 11! / (3!)(7!) = 1320.
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 227
7.28. D. One can use the fact that for the Beta-Bernoulli the Bayes and Buhlmann estimates are equal. The mean of the Posterior Distribution β(4,8; t) is: a /(a + b) = 4/(4 + 8) = 4 /12 = 1/3. The expected number of claims for 5 trials is then: 5(1/3) = 5/3 = 1.667. Alternately, for the Beta-Bernoulli, the Buhlmann Credibility parameter K = a + b = 4 + 3 = 7. (Where a and b are the parameters of the prior Beta.) Thus for 5 trials, Z = 5 / (5+7) = 5/12. The prior estimate (for 5 Bernoulli Trials) is 5E[q] = (5)(Mean of β(4,3; q)) = (5)(4/7) = 20/7. Buhlmann estimate upon observing 0 claims is: (5/12)(0) + (7/12)(20/7) = 5/3 = 1.667. 7.29. E. We have E[q] = .25 and Var[q] = E[q2 ] - E[q]2 = .07. Therefore, E[q2 ] = .1325. For the Binomial we have a process variance of mq(1-q) = q(1-q) = q - q2 . Therefore, the Expected Value of the Process Variance = E[q] - E[q2 ] = .25 - .1325 = .1175. For the Binomial the mean is mq = q. Therefore the Variance of the Hypothetical Means = Var[q] = .07. Therefore K = EPV / VHM = .1175 / .07 = 1.6786. For one observation Z = 1 / (1 + 1.6786) = 0.373. Comment: Since the Buhlmann credibility does not depend on the form of the prior distribution (but just on its mean and variance), one could assume it was a Beta Distribution. One can then solve for the parameters a and b of the Beta: mean = .25 = a / (a+b), and variance = .07 = ab / {(a+b+1)(a+b)2 }. Therefore b = 3a, 4a+1 = (.25)(.75) / .07 = 2.679. a = .420, b = 1.260. The Buhlmann Credibility parameter for the Beta-Bernoulli is a + b = 1.68. For one observation, Z = 1 / (1 + 1.68) = 0.373.
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 228
7.30. C. π(q) = 4q3 (1 - q)0 , 0 < q < 1. f(x) = {(a+b-1)! / ((a-1)! (b-1)!)} xa-1 (1-x)b - 1, 0 < x < 1.
⇒ The prior distribution is Beta with a = 4 and b = 1. Each year is a Binomial with m = 2. ⇔ Two independent Bernoulli Trials. Thus 2 years ⇔ 4 independent Bernoulli Trials. Observe 2 claims in 4 trials. aʼ = a + 2 = 6. bʼ = b + (4 - 2) = 3. Mean of Posterior Beta is: aʼ/(aʼ + bʼ) = 6/(6 + 3) = 2/3. Expected claims in Year 3: E[mq] = 2E[q] = (2)(2/3) = 4/3. Alternately, chance of observation is: {2q(1-q)} {2q(1-q)} = 4q2 (1-q)2 . By Bayes Theorem, posterior distribution of q is proportional to: π(q) 4q2 (1-q)2 = 16q5 (1-q)2 , 0 ≤ q ≤ 1. This is proportional to a Beta Distribution with a = 6 and b = 3, which therefore must be the posterior distribution of q. Proceed as before. Alternately, for the Beta-Binomial: aʼ = a + number of claims = 4 + 2 = 6. bʼ = b + m (number of years) - number of claims = 1 + (2)(2) - 2 = 3. Proceed as before. Alternately, prior distribution is Beta with a = 4 and b = 1. 4 independent Bernoulli Trials. K = a + b = 5. Prior mean of q is: 4 / (4 + 1) = 0.8. Observe 2 claims in 4 trials. Z = 4 / (4 + 5) = 4/9. Estimate of q: (4/9) (2/4) + (5/9) (0.8) = 2/3. Expected claims in Year 3: E[mq] = 2 E[q] = (2) (2/3) = 4/3. Alternately, prior distribution is Beta with a = 4 and b = 1. Binomial with m = 2. K = (a + b) / m = (4 + 1) / 2 = 2.5. Prior mean frequency: (2) 4 / (4 + 1) = 1.6. Observe 2 claims in 2 years. Z = 2 / (2 + 2.5) = 4/9. Expected claims in Year 3: (4/9) (2/2) + (5/9) (1.6) = 4/3.
2013-4-10,
Conjugate Priors §7 Beta-Binomial, HCM 10/21/12,
Page 229
7.31. From the previous solution, posterior to Year 2, the distribution of q is a Beta Distribution with a = 6 and b = 3. g(q) = {8!/(5!2!)} q^(6-1) (1-q)^(3-1) = 168 q^5 (1-q)^2, 0 ≤ q ≤ 1.
Given q, the probability of 2 claims in Year 3 is the density at 2 for a Binomial with m = 2: q^2.
f(2) = ∫0^1 q^2 168 q^5 (1-q)^2 dq = 168 ∫0^1 q^7 (1-q)^2 dq = 168 β(8, 3) = (168)(7! 2! / 10!) = 168/360.
Given q, the probability of 1 claim in Year 3 is: 2q(1-q).
f(1) = ∫0^1 2q(1-q) 168 q^5 (1-q)^2 dq = 336 ∫0^1 q^6 (1-q)^3 dq = 336 β(7, 4) = (336)(6! 3! / 10!) = 336/840.
Given q, the probability of 0 claims in Year 3 is: (1-q)^2.
f(0) = ∫0^1 (1-q)^2 168 q^5 (1-q)^2 dq = 168 ∫0^1 q^5 (1-q)^4 dq = 168 β(6, 5) = (168)(5! 4! / 10!) = 168/1260.
The probabilities of having either 0, 1 and 2 claims in Year 3 are respectively: 168/1260 = 2/15 = 13.33%, 336/840 = 2/5 = 40%, and 168/360 = 7/15 = 46.67%. Alternately, in order to calculate the probability of 0 claims in Year 3, treat this as 2 separate Bernoulli trials each with no claim. The distribution of q prior to Year 3 is Beta with a = 6 and b = 3 with mean 6/9 = 2/3. The chance of no claims in the first trial is: 1 - 2/3 = 1/3. Posterior to one trial with no claim, we get a Beta with a = 6 and b = 4. The conditional probability of no claims in the second trial is: 4/(6 + 4) = 0.4. Therefore, the probability of no claims in both trials is: (1/3)(.4) = 13.33%. In order to calculate the probability of 2 claims in Year 3, treat this as 2 separate Bernoulli trials each with a claim. The distribution of q prior to Year 3 is Beta with a = 6 and b = 3 with mean 6/9 = 2/3. The chance of a claim in the first trial is 2/3. Posterior to one trial with a claim, we get a Beta with a = 7 and b = 3. The conditional probability of a claim in the second trial is: 7/(7 + 3) = 0.7. Therefore, the probability of a claim in both trials is: (2/3)(.7) = 46.67%. Therefore, the probability of 1 claim in Year 3 is: 1 - 13.33% - 46.67% = 40%. Comment: Beyond what you are likely to be asked on your exam. The predictive distribution is not a Binomial Distribution with m = 2 and q = 2/3, with densities: (1/3)2 = 1/9, 2(1/3)(2/3) = 4/9, (2/3)2 = 4/9. Instead, the predictive distribution is a Beta-Binomial with m = 2, a = 6, and b = 3: f(x) = β(a+x, b+m-x) / {(m+1)β(a, b)β(x+1, m+1-x)} = β(6+x, 5-x) / {(3)β(6, 3)β(x+1, 3-x)}.
f(0) = β(6, 5) / {(3)β(6, 3)β(1, 3)} = (1/1260)/{(3)(1/168)(1/3)} = 168/1260.
f(1) = β(7, 4) / {(3)β(6, 3)β(2, 2)} = (1/840)/{(3)(1/168)(1/6)} = 336/840.
f(2) = β(8, 3) / {(3)β(6, 3)β(3, 1)} = (1/360)/{(3)(1/168)(1/3)} = 168/360.
See Exercise 15.82 in Loss Models.
7.32. C. This is a Beta-Binomial situation with m = 3, a = 2 and b = 1. One year of experience is three Bernoulli trials. We have zero claims. aʼ = a + 0 = 2 + 0 = 2. bʼ = b + (3 - 0) = 1 + 3 = 4. Mean of the Posterior Beta = aʼ/(aʼ + bʼ) = 2/(2 + 4) = 1/3. Number of claims expected in Year 2: (3)(1/3) = 1. Alternately, K = a + b = 2 + 1 = 3. Z = 3/(3 + 3) = 1/2. Mean of prior Beta: a/(a + b) = 2/(2 + 1). Estimated future frequency per Bernoulli trial: (1/2)(0/3) + (1/2)(2/3) = 1/3. Number of claims expected in Year 2: (3)(1/3) = 1.
7.33. E. E[Xi] = E[q] = the mean of the Beta = a/(a + b) = 1/100. E[Sm] = mE[Xi] = m/100 ≥ 50. ⇒ m ≥ 5000.
Comment: No observations and no application of Bayes Theorem to get a posterior distribution.
7.34. C. A Beta-Binomial with a = 2 and b = 2. 4 trials with 2 successes. aʼ = a + 2 = 4. bʼ = b + 4 - 2 = 4. Posterior Distribution of q is Beta with a = 4 and b = 4. f(q) is proportional to: q^3 (1 - q)^3. We wish to maximize this density. Setting the derivative with respect to q equal to zero: 0 = 3q^2 (1 - q)^3 - 3q^3 (1 - q)^2. ⇒ q = 0.5.
Comment: The mode is where the density is largest. The question refers to the posterior distribution of q, in other words after observations, in order to distinguish it from the prior distribution of q. This posterior Beta is symmetric around its mean of: 4/(4 + 4) = 0.5, which is also the mode:
[Graph of the posterior Beta(4, 4) density on 0 ≤ q ≤ 1, symmetric about 0.5.]
7.35. B. E[q] = Mean of the Beta = a/(a + b) = 1/100. E[q2 ] = Second Moment of the Beta = a(a + 1) / {(a + b)(a + b + 1)} = 2/{(100)(101)}. Var[q] = Variance of the Beta = 2/{(100)(101)} - 1/1002 = 99/{(1002 )(101)}. Process Variance = 101q(1 - q) = 101q - 101q2 . EPV = 101 E[q] - 101 E[q2 ] = 101/100 - 2/100 = 0.99. Hypothetical Mean = 101 q. VHM = 1012 Var[q] = (1012 )99/{(1002 )(101)} = 0.9999. Variance of the marginal (mixed) distribution is: EPV + VHM = 0.99 + 0.9999 = 1.9899. Alternately, E[q] = Mean of the Beta = a/(a + b) = 1/100. The mean of the mixture is the mixture of the means: E[101q] = 101 E[q] = 101/100 = 1.01. E[q2 ] = Second Moment of the Beta = a(a + 1) / {(a + b)(a + b + 1)} = 2/{(100)(101)}. The second moment of the mixture is the mixture of the second moments: E[101q(1-q) + (101q)2 ] = 101 E[q] + (100)(101) E[q2 ] = 101/100 + (100)(101)(2)/{(100)(101)} = 101/100 + 2 = 3.01. Variance of the mixture is: 3.01 - 1.012 = 1.9899. Comment: The mixed distribution is not a Binomial; it is a Beta-Binomial. 7.36. D. A Beta-Binomial; Bayes Analysis equals Buhlmann Credibility in this case. Based on the first year of data: aʼ = a + 2, and bʼ = b + (8 - 2) = 9 + 6 = 15. Therefore, the estimate for q is: aʼ/(aʼ + bʼ) = (a + 2)/(a + 2 + 15). The estimate of the number of claims is 8 times that: 2.54545 = 8(a + 2)/(a + 2 + 15). ⇒ a = 5. Based on the first two years of data: aʼ = a + 2 + k = 7 + k, and bʼ = b + 6 + (8 - k) = 23 - k. The estimate of the number of claims is: 3.73333 = 8(7 + k)/(7 + k + 23 - k) = 8(7 + k)/30. ⇒ k = 7. Comment: Given the usual outputs, solve for missing inputs.
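As a numerical check of 7.35 above, here is a short Python sketch of my own; a = 1 and b = 99 are inferred from the stated first and second moments of q, and are not given explicitly in the solution:

# Check of 7.35: variance of the mixture for a Binomial with m = 101 and q ~ Beta(a = 1, b = 99).
a, b, m = 1.0, 99.0, 101

mean_q = a / (a + b)                                  # 1/100
second_q = a * (a + 1.0) / ((a + b) * (a + b + 1.0))  # 2/{(100)(101)}
var_q = second_q - mean_q**2

epv = m * (mean_q - second_q)                         # 0.99
vhm = m**2 * var_q                                    # 0.9999

print(round(epv + vhm, 4))                            # 1.9899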
Section 8, Inverse Gamma Distribution50
If X follows a Gamma Distribution, with parameters α and 1, then θ/X follows an Inverse Gamma Distribution with parameters α and θ. (Thus an Inverse Gamma Distribution is no more complicated conceptually than the Gamma Distribution.) α is the shape parameter and θ is the scale parameter, as parameterized in Loss Models. The Distribution Function is: F(x) = 1 - Γ(α ; θ/x), while the probability density function is: f(x) = θ^α e^(-θ/x) / {Γ(α) x^(α+1)}.
Note that the density has an exponential of 1/x times a negative power of x. This is how one recognizes an Inverse Gamma density.51 The scale parameter θ is divided by x in the exponential. The negative power of x has an absolute value one more than the shape parameter α.
Exercise: A probability density function is proportional to e^(-11/x) x^(-2.5). What distribution is this?
[Solution: This is an Inverse Gamma Distribution with α = 1.5 and θ = 11. The proportionality constant in front of the density is 11^1.5 / Γ(1.5) = 36.48 / 0.8862 = 41.16. Note that there is no requirement that α be an integer. However, if α is non-integral then one needs access to a software package that computes the (complete) Gamma Function.]
The Distribution Function is similar to that of a Gamma Distribution: Γ(α ; x/θ). If x/θ follows an Inverse Gamma Distribution with scale parameter of one, then θ/x follows a Gamma Distribution with a scale parameter of one.
The Inverse Gamma is heavy-tailed, as can be seen by the lack of the existence of certain moments.52 The nth moment of an Inverse Gamma only exists for n < α.
Note that the Inverse Gamma density function integrates to unity from zero to infinity.53
∫0^∞ e^(-θ/x) / x^(α+1) dx = Γ(α) / θ^α, α > 0.
This fact will be useful for working with the Inverse Gamma Distribution.
50 See Appendix A of Loss Models.
51 The Gamma density has an exponential of x times x to a power.
52 In the extreme tail its behavior is similar to that of a Pareto distribution with the same shape parameter α.
53 This follows from substituting y = 1/x in the definition of the Gamma Function. Remember it via the fact that all probability density functions integrate to unity over their support.
For example, one can compute the moments of the Inverse Gamma Distribution:
E[X^n] = ∫0^∞ x^n f(x) dx = ∫0^∞ x^n θ^α e^(-θ/x) / {Γ(α) x^(α+1)} dx = {θ^α/Γ(α)} ∫0^∞ e^(-θ/x) x^(-(α+1-n)) dx
= {θ^α/Γ(α)} Γ(α - n) / θ^(α-n) = θ^n Γ(α - n) / Γ(α), α - n > 0.
Alternately, the moments of the Inverse Gamma also follow from the moments of the Gamma Distribution which are E[X^n] = θ^n Γ(α+n) / Γ(α). (This formula works for n positive or negative.) If X follows a Gamma, with unity scale parameter Γ(α ; x), then Z = θ/X has Distribution Function: F(z) = 1 - Γ(α; θ/z). This is the Inverse Gamma Distribution, as parameterized by Loss Models. Thus the Inverse Gamma has Moments: E[Z^n] = E[(θ/X)^n] = θ^n E[X^(-n)] = θ^n Γ(α-n) / Γ(α).
Specifically, the mean of the Inverse Gamma = E[θ/X] = θ Γ(α-1) / Γ(α) = θ/(α-1).
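As a numerical sanity check on this moment formula, here is a short Python sketch of my own; α = 4 and θ = 9 are illustrative choices (they happen to match the problems later in this section):

from math import gamma, exp, log
from scipy.integrate import quad

alpha, theta = 4.0, 9.0

def pdf(x):
    # Inverse Gamma density, written in log form and guarded at x = 0 to avoid overflow
    if x <= 0.0:
        return 0.0
    return exp(alpha * log(theta) - theta / x - (alpha + 1.0) * log(x)) / gamma(alpha)

for n in (1, 2):
    numeric = quad(lambda x: x**n * pdf(x), 0.0, float("inf"))[0]
    formula = theta**n * gamma(alpha - n) / gamma(alpha)
    print(n, round(numeric, 3), round(formula, 3))   # the numeric and closed-form values agree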
Inverse Gamma Distribution

Support: x > 0
Parameters: α > 0 (shape parameter), θ > 0 (scale parameter)

D. f.: F(x) = 1 - Γ(α ; θ/x)

P. d. f.: f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]} = (θ/x)^α e^(-θ/x) / {x Γ[α]}

Moments: E[X^k] = θ^k / {(α − 1)(α − 2)...(α − k)}, α > k.

Mean = θ/(α − 1), α > 1

Second Moment = θ^2 / {(α − 1)(α − 2)}, α > 2

Variance = θ^2 / {(α − 1)^2 (α − 2)}, α > 2

Mode = θ/(α + 1)

Coefficient of Variation = Standard Deviation / Mean = 1/√(α − 2), α > 2.

Skewness = 4√(α − 2) / (α − 3), α > 3.

Kurtosis = 3(α − 2)(α + 5) / {(α − 3)(α − 4)}, α > 4.

Limited Expected Value: E[X ∧ x] = {θ/(α − 1)} {1 - Γ[α−1; θ/x]} + x Γ[α; θ/x], α > 1

R(x) = Excess Ratio = Γ[α−1; θ/x] − (α−1) (x/θ) Γ[α; θ/x], α > 1

e(x) = Mean Excess Loss = {θ/(α − 1)} Γ[α-1; θ/x] / Γ[α; θ/x] - x, α > 1

X ∼ Gamma(α, 1) ⇔ θ/X ∼ Inverse Gamma(α, θ).
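Since several of these entries are easy to mistype, here is a small Python check of my own (not from the text) of the Limited Expected Value formula against direct numerical integration, for the illustrative values α = 3, θ = 6000, and a limit of 5000:

from math import gamma, exp, log
from scipy.integrate import quad
from scipy.special import gammainc   # regularized incomplete gamma, the Γ(α ; y) of Loss Models

alpha, theta, limit = 3.0, 6000.0, 5000.0

def pdf(x):
    # Inverse Gamma density in log form, guarded at x = 0
    if x <= 0.0:
        return 0.0
    return exp(alpha * log(theta) - theta / x - (alpha + 1.0) * log(x)) / gamma(alpha)

lev_formula = (theta / (alpha - 1.0)) * (1.0 - gammainc(alpha - 1.0, theta / limit)) \
              + limit * gammainc(alpha, theta / limit)

lev_numeric = quad(lambda x: x * pdf(x), 0.0, limit)[0] + limit * quad(pdf, limit, float("inf"))[0]

print(round(lev_formula, 1), round(lev_numeric, 1))   # the two values agree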
Displayed below are various Inverse Gamma Distributions:
[Graphs of the Inverse Gamma density for: α = 0.8 and θ = 500; α = 1.5 and θ = 1500; α = 3 and θ = 6000; each shown for x from 0 to 10,000.]
Problems:

Use the following information for the next 3 questions:
You have an Inverse Gamma Distribution with parameters α = 4 and θ = 9.

8.1 (1 point) What is the density function at x = 7?
A. less than 0.01
B. at least 0.01 but less than 0.02
C. at least 0.02 but less than 0.03
D. at least 0.03 but less than 0.04
E. at least 0.04

8.2 (1 point) What is the mean?
A. 2.0   B. 2.5   C. 3.0   D. 3.5   E. 4.0

8.3 (1 point) What is the variance?
A. 4.0   B. 4.5   C. 5.0   D. 5.5   E. 6.0

8.4 (1 point) What is the integral from zero to infinity of e^(-6/x) x^(-11)?
A. 0.006   B. 0.008   C. 0.010   D. 0.012   E. 0.014
Solutions to Problems:

8.1. B. f(x) = θ^α e^(-θ/x) / {Γ(α) x^(α+1)}. f(7) = 9^4 e^(-9/7) / {Γ(4) 7^5} = 0.0180.

8.2. C. Mean = θ/(α−1) = 9/(4 - 1) = 3.

8.3. B. Variance = θ^2 / {(α−1)^2 (α−2)} = (9^2) / {(3^2)(2)} = 4.5.
Alternately, the second moment is θ^2 Γ(α-2) / Γ(α) = θ^2 / {(α−1)(α−2)} = (9^2) / {(3)(2)} = 13.5. Thus the variance = 13.5 - 3^2 = 4.5.

8.4. A. ∫0^∞ e^(-θ/x) x^(-(α+1)) dx = Γ(α) / θ^α. Letting θ = 6 and α = 10, the integral from zero to infinity of e^(-6/x) x^(-11) is: Γ(10) / 6^10 = 9! / 6^10 = 0.006001.
Comment: e^(-6/x) x^(-11) is proportional to the density of an Inverse Gamma Distribution with θ = 6 and α = 10. Thus its integral from zero to infinity is the inverse of the constant in front of the Inverse Gamma Density, since the density itself must integrate to unity. Alternately, one could let y = 6/x and convert the integral to a complete Gamma Function.
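For those who want to confirm these answers by brute force, a short sketch of mine (not part of the original solutions):

from math import exp, log, factorial
from scipy.integrate import quad

# 8.1: Inverse Gamma density with alpha = 4, theta = 9, evaluated at x = 7.
f7 = exp(4.0 * log(9.0) - 9.0 / 7.0 - 5.0 * log(7.0)) / factorial(3)   # Gamma(4) = 3!
print(round(f7, 4))                                                    # 0.0180

# 8.4: integral of e^(-6/x) x^(-11) from 0 to infinity, versus 9!/6^10.
integrand = lambda x: exp(-6.0 / x - 11.0 * log(x)) if x > 0.0 else 0.0
numeric = quad(integrand, 0.0, float("inf"))[0]
print(round(numeric, 6), round(factorial(9) / 6.0**10, 6))             # both 0.006001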
Section 9, Inverse Gamma - Exponential The Inverse Gamma - Exponential is a third example of a conjugate prior situation. Unlike the previous two examples, the Inverse Gamma - Exponential involves a mixture of severity rather than frequency parameters across a portfolio of risks. The sizes of loss for a particular policyholder are assumed to be Exponential with mean δ. Given δ, the distribution function of the size of loss is 1 - e−x/δ, while the density of the size of loss distribution is:
e^(-x/δ) / δ.
The mean of this Exponential is δ and its variance is δ^2. Note this is not the parameterization of the exponential used in Loss Models; I have used δ rather than θ, so as to not confuse the scale parameter of the Exponential with that of the Inverse Gamma which is θ. So for example, the density of a loss being of size 8 is: (1/δ) e^(-8/δ). If δ = 2 this density is: (1/2)e^(-4) = 0.009, while if δ = 20 this density is: (1/20)e^(-0.4) = 0.034.54
Prior Distribution:
Assume that the values of δ are given by an Inverse Gamma distribution with α = 6 and θ = 15, with probability density function:55 g(δ) = 94,921.875 e^(-15/δ) / δ^7, 0 ≤ δ < ∞.
54 The first portion of this example is the same as in “Mahlerʼs Guide to Loss Distributions.” However, here we introduce observations and then apply Bayes Analysis and Buhlmann Credibility.
55 The constant in front is: θ^α / Γ(α) = 15^6 / Γ(6) = 11,390,625 / 120 = 94,921.875.
Displayed below is this distribution of Exponential parameters:
[Graph of the prior Inverse Gamma density of δ, for δ from 0 to 10.]
Note that this distribution is an Inverse Gamma which has a mean of: θ/(α-1) = 15/(6-1) = 3. This is the a priori estimate of claim severity.
Marginal Distribution (Prior Mixed Distribution):
If we have a policyholder and do not know its expected mean severity, in order to get the density of the next loss being of size 8, one would weight together the densities of having a loss of size 8 given δ, using the a priori probabilities of δ: g(δ) = 94,921.875 e^(-15/δ) / δ^7, and integrating from zero to infinity:
f(8) = ∫0^∞ {e^(-8/δ)/δ} g(δ) dδ = 94,921.875 ∫0^∞ e^(-23/δ) / δ^8 dδ = 94,921.875 (6!) / (23^7) = 0.0201.
Where we have used the fact:
∫0^∞ e^(-θ/x) / x^(α+1) dx = Γ(α) / θ^α = (α-1)! / θ^α.
More generally, if the distribution of Exponential means δ is given by an Inverse Gamma distribution g(δ) = θ^α e^(-θ/δ) / {Γ(α) δ^(α+1)}, then we compute the density at size x by integrating from zero to infinity:56
56 Both the Exponential and the Inverse Gamma have terms involving powers of e^(-1/δ) and 1/δ; note how these terms combine when one takes the product.
f(x) = ∫0^∞ {e^(-x/δ)/δ} g(δ) dδ = ∫0^∞ {e^(-x/δ)/δ} θ^α e^(-θ/δ) / {Γ(α) δ^(α+1)} dδ = {θ^α/Γ(α)} ∫0^∞ e^(-(θ+x)/δ) / δ^(α+2) dδ
= {θ^α/Γ(α)} Γ(α+1) / (θ+x)^(α+1) = α θ^α / (θ+x)^(α+1).
Thus the (prior) mixed distribution is in the form of the Pareto distribution. Note that the shape parameter and scale parameter of the mixed Pareto distribution are the same as those of the Inverse Gamma distribution. For the specific case dealt with previously: α = 6 and θ = 15. Thus the density at size x is: 6(15^6)(15+x)^(-7). For x = 8 this density is: 6(15^6)(23)^(-7) = 0.0201. This is the same result as calculated above.
For the Inverse Gamma-Exponential the (prior) marginal distribution is always a Pareto, with α = shape parameter of the (prior) Inverse Gamma and θ = scale parameter of the prior Inverse Gamma. The marginal Pareto is a size of loss distribution, while the prior Inverse Gamma is a distribution of parameters. Since the Inverse Gamma is a distribution of each insuredʼs mean severity, the a priori mean severity is the mean of the Inverse Gamma. The mean of the Pareto is also the a priori mean. ⇒ Mean of the Inverse Gamma = Mean of the Pareto. In this particular case we get a marginal Pareto distribution with parameters of α = 6 and θ = 15, which has a mean of 15/(6 - 1) = 3, which matches the mean of the prior Inverse Gamma. Note that the formula for the mean of an Inverse Gamma and a Pareto are both θ/(α-1).
Exercise: Each insured has an Exponential severity with mean δ. The values of δ are distributed via an Inverse Gamma with parameters α = 2.3 and θ = 1200. An insured is picked at random. What is the chance that his next claim will be greater than 1000?
[Solution: The marginal distribution is a Pareto with parameters α = 2.3 and θ = 1200.
S(1000) = {θ/(θ + x)}^α = {1200/(1000 + 1200)}^2.3 = 24.8%.]
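The mixing integral above can also be confirmed numerically; the following sketch is my own, using the α = 6, θ = 15, x = 8 example, and compares the mixture density with the Pareto density:

from math import gamma, exp, log
from scipy.integrate import quad

alpha, theta, x = 6.0, 15.0, 8.0

def prior(delta):
    # Inverse Gamma density of the Exponential mean delta, in log form, guarded at 0
    if delta <= 0.0:
        return 0.0
    return exp(alpha * log(theta) - theta / delta - (alpha + 1.0) * log(delta)) / gamma(alpha)

mixed = quad(lambda d: (exp(-x / d) / d) * prior(d) if d > 0.0 else 0.0, 0.0, float("inf"))[0]
pareto = alpha * theta**alpha / (theta + x)**(alpha + 1.0)

print(round(mixed, 4), round(pareto, 4))   # both 0.0201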
Prior Expected Value of the Process Variance:
The process variance of the severity for an individual risk is δ^2, since the severity for each risk is Exponential. Therefore the expected value of the process variance = the expected value of δ^2 = second moment of the distribution of δ = second moment of the Inverse Gamma Distribution = θ^2 / {(α − 1)(α − 2)}.
Thus for α = 6 and θ = 15, the expected value of the process variance is: 15^2 / {(5)(4)} = 11.25.
Prior Variance of the Hypothetical Means:
The variance of the hypothetical mean severities is the variance of δ = Var[δ] = Variance of the Prior Inverse Gamma = 2nd moment of Inverse Gamma - square of mean of Inverse Gamma = θ^2 / {(α − 1)(α − 2)} - θ^2 / (α − 1)^2 = θ^2 / {(α − 1)^2 (α − 2)}.
For α = 6 and θ = 15, VHM = 15^2 / {(6 - 1)^2 (6 - 2)} = 9/4 = 2.25.
Prior Total Variance:
The total variance = the variance of the marginal Pareto distribution = 2nd moment of Pareto - square of mean of Pareto = 2θ^2 / {(α − 1)(α − 2)} − θ^2 / (α − 1)^2 = α θ^2 / {(α − 1)^2 (α − 2)}.
For α = 6 and θ = 15 this equals: (6)(15^2) / {(6 - 1)^2 (6 - 2)} = 13.5.
The Expected Value of the Process Variance + Variance of the Hypothetical Means = 11.25 + 2.25 = 13.5 = Total Variance.
VHM = the variance of the Inverse Gamma. Total Variance = the variance of the Pareto. Total Variance = EPV + VHM.
⇒ Variance of the Inverse Gamma < Variance of the Pareto.
Variance of Inverse Gamma: θ^2 / {(α - 1)^2 (α - 2)}, α > 2. Variance of Pareto: α θ^2 / {(α - 1)^2 (α - 2)}, α > 2.
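A one-screen check of mine of the identity EPV + VHM = Total Variance for this example:

# Prior variances for the Inverse Gamma - Exponential with alpha = 6 and theta = 15.
alpha, theta = 6.0, 15.0

epv = theta**2 / ((alpha - 1.0) * (alpha - 2.0))                    # 11.25
vhm = theta**2 / ((alpha - 1.0)**2 * (alpha - 2.0))                 # 2.25
pareto_variance = alpha * theta**2 / ((alpha - 1.0)**2 * (alpha - 2.0))   # 13.5

print(epv + vhm, pareto_variance)                                   # 13.5 13.5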
Observations:
Let us now introduce the concept of observations. A risk is selected at random and it is observed to have 3 claims of size: 8, 5, and 4, in that order.57
Posterior Distribution:
We can employ Bayesian analysis to compute the probability that the selected risk had a given Exponential parameter. Given an Exponential with parameter δ, the density at size 8 is: e^(-8/δ)/δ. The density of the first claim being of size 8, the second of size 5, and the third of size 4 is the product:
{e^(-8/δ)/δ} {e^(-5/δ)/δ} {e^(-4/δ)/δ} = e^(-17/δ) / δ^3.
The a priori probability of δ is the Prior Inverse Gamma distribution: g(δ) = 94,921.875 e^(-15/δ) / δ^7, 0 ≤ δ < ∞. Thus the posterior density of δ is proportional to the product of the density of observation and the a priori probability: e^(-32/δ) / δ^10.
This is proportional to the density for an Inverse Gamma distribution with α = 9 and θ = 32. This posterior Inverse Gamma (dashed) is compared to the prior Inverse Gamma (solid):
[Graph of the prior and posterior Inverse Gamma densities of δ, for δ from 0 to 10.]
57 It will turn out for the forthcoming analysis that the answer will only depend on the number of claims, 3, and their total, 17.
In general, if one observes C claims totaling L losses, we have that the density of the observation given δ is proportional to a product of terms equal to: e^(-L/δ) / δ^C. The prior Inverse Gamma distribution is proportional to: e^(-θ/δ) / δ^(α+1). Note that both the density of observation and the Inverse Gamma have a term involving the exponential of 1/δ and a term involving δ to a negative power. The posterior probability for δ is proportional to: e^(-(θ+L)/δ) / δ^(α+1+C). This is proportional to the density of an Inverse Gamma distribution with new shape parameter = α + C and new scale parameter = θ + L. Thus for the Inverse Gamma - Exponential the posterior density function is also an Inverse Gamma. This posterior Inverse Gamma has a shape parameter = prior shape parameter plus the number of claims observed. This posterior Inverse Gamma has a scale parameter = prior scale parameter plus the total cost of the claims observed.
Posterior α = prior α + C.
Posterior θ = prior θ + L.
For example, in the case where we observed 3 claims totaling 17, C = 3 and L = 17. The prior α shape parameter was 6 while the prior θ scale parameter was 15. Therefore the posterior α shape parameter = 6 + 3 = 9, while the posterior θ scale parameter = 15 + 17 = 32, matching the result obtained above. The fact that the posterior distribution is of the same form as the prior distribution is why the Inverse Gamma is a Conjugate Prior Distribution for the Exponential. Predictive Distribution: Since the posterior distribution is also an Inverse Gamma distribution, the same analysis that led to a Pareto (prior) marginal distribution, will lead to a (posterior) predictive distribution that is Pareto. However, the parameters are related to the posterior Inverse Gamma. For the Inverse Gamma-Exponential the (posterior) predictive distribution is always a Pareto, with parameters: α = shape parameter of the posterior Inverse Gamma and θ = scale parameter of the posterior Inverse Gamma. Thus posterior α = prior α + C and posterior θ = prior θ + L. In the particular example with a posterior Inverse Gamma distribution with parameters 9 and 32, the parameters of the posterior Pareto predictive distribution are also 9 and 32. Alternatively, one can compute this in terms of the prior Inverse Gamma with parameters 6 and 15 and the observations of 3 claims totaling 17; α = 6 + 3 = 9 and θ = 15 + 17 = 32.
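The updating rule is mechanical enough to express in a few lines of code; this is a minimal sketch of mine (the function name is my own invention), and the numbers reproduce the worked example:

def update_inverse_gamma(alpha, theta, claim_count, total_losses):
    # Posterior Inverse Gamma parameters: alpha' = alpha + C, theta' = theta + L.
    return alpha + claim_count, theta + total_losses

alpha_post, theta_post = update_inverse_gamma(6, 15, claim_count=3, total_losses=17)
print(alpha_post, theta_post)             # 9 32
print(theta_post / (alpha_post - 1))      # posterior (and predictive) mean severity = 4.0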
Below are compared the marginal Pareto (solid) and the posterior predictive Pareto (dashed):
[Graph of the two Pareto densities, for sizes of loss from 0 to 10.]
Having observed 3 claims totaling 17, an average severity of 5.67 compared to the a priori mean of 3, has increased the estimated probability of a large claim in the future. Exercise: Each insured has an Exponential severity with mean δ. The values of δ are distributed via an Inverse Gamma with parameters α = 2.3 and θ = 1200. An insured is picked at random and observed to have 5 claims totaling 3000. What is the chance that his next claim will be greater than 1000? [Solution: The predictive distribution is a Pareto with parameters α = 2.3 + 5 = 7.3 and θ = 1200 + 3000 = 4200. S(1000) = {θ/(x+θ)}α = {4200/(1000+4200)}7.3 = 21.0%.] Posterior Mean: One can compute the means and variances posterior to the observations. The posterior mean can be computed in either one of two ways. First one can weight together the means for each type of risk, using the posterior probabilities. This is E[δ] = the mean of the posterior Inverse Gamma = 32/(9-1) = 4. Alternately, one can compute the mean of the predictive Pareto distribution: θ/(α-1) = 32/(9-1) = 4. Of course the two results match. Thus the new estimate posterior to the observations for this riskʼs average severity using Bayesian Analysis is 4. This compares to the a priori estimate of 3. In general, the observations provide information about the given risk, which allows one to make a better estimate of the future experience of that risk. Not surprisingly observing 3 claims totaling 17, for an observed average severity of 5.67, has raised the estimated severity from 3 to 4.
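As a check of the exercise above (a sketch of mine, not from the text):

# Predictive probability that the next claim exceeds 1000, for prior alpha = 2.3, theta = 1200,
# after observing 5 claims totaling 3000.
alpha, theta = 2.3, 1200.0
alpha_post = alpha + 5
theta_post = theta + 3000.0

s_1000 = (theta_post / (theta_post + 1000.0)) ** alpha_post   # predictive Pareto survival function
print(round(s_1000, 3))                                        # about 0.21, matching the 21.0% above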
Posterior Expected Value of the Process Variance:
Just as prior to the observations, posterior to the observations one can compute three variances: the expected value of the process variance, the variance of the hypothetical pure premiums, and the total variance. The process variance of the severity for an individual risk is δ^2 since the severity for each risk is Exponential. Therefore the expected value of the process variance = the expected value of δ^2 = second moment of the posterior Inverse Gamma = θ^2 / {(α − 1)(α − 2)}.
Thus for the Posterior Inverse Gamma with α = 9 and θ = 32, the expected value of the process variance is: 32^2 / {(8)(7)} = 18.29.
Posterior Variance of the Hypothetical Means:
The variance of the hypothetical mean severities is the variance of δ = Var[δ] = Variance of the Posterior Inverse Gamma = θ^2 / {(α − 1)(α − 2)} - θ^2 / (α − 1)^2 = θ^2 / {(α − 1)^2 (α − 2)}.
For α = 9 and θ = 32 this is: 32^2 / {(9 - 1)^2 (9 - 2)} = 16/7 = 2.29.
Posterior Total Variance:
The total variance = the variance of the predictive Pareto distribution = 2θ^2 / {(α − 1)(α − 2)} − θ^2 / (α − 1)^2 = α θ^2 / {(α − 1)^2 (α − 2)}.
For α = 9 and θ = 32, total variance = (9)(32^2) / {(8^2)(7)} = 20.57.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 246
Buhlmann Credibility: Next, letʼs apply Buhlmann Credibility to this example. The Buhlmann Credibility parameter K = the (prior) expected value of the process variance / the (prior) variance of the hypothetical means = 11.25 / 2.25 = 5. Note that K can be computed prior to any observation and doesnʼt depend on them. Specifically both variances are for a single insured for one trial. In general, EPV =
K=
prior EPV = prior VHM
θ2 θ2 , α > 2, and VHM = , α > 2. (α − 1) (α − 2) (α − 1)2 (α − 2) θ2 (α − 1) (α − 2) θ2
= α - 1, α > 2.
(α − 1)2 (α − 2)
For the Inverse Gamma-Exponential in general, the Buhlmann Credibility parameter K = α - 1, where α > 2 is the shape parameter of the prior Inverse Gamma distribution. For this example, K = 6 - 1 = 5. Having observed 3 claims, Z = 3 / (3 + 5) = 3/8 = 0.375. The observed severity = 17/3. The a priori mean = 15/(6-1) = 3. Thus the new estimate = (3/8)(17/3) + (5/8)(3) = 4. Note that in this case the estimate from Buhlmann Credibility matches the estimate from Bayesian Analysis. For the Inverse Gamma-Exponential the estimates from using Bayesian Analysis and Buhlmann Credibility are equal.58 Summary: The many different aspects of the Inverse Gamma-Exponential are summarized below. Be sure to be able to clearly distinguish between the situation prior to observations and that posterior to the observations.
58
As discussed in a subsequent section this is a special case of the general results for conjugate priors of members of linear exponential families. This is another example of what Loss Models calls “exact credibility.” It should be noted that for α ≤ 2, one should not use Buhlmann Credibility. In these cases one can still use Bayesian Analysis, since α > 0 and C ≥ 1 ⇒ αʼ = α + C > 1 ⇒ posterior Inverse Gamma has a finite mean.
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 247 2013-4-10,
Inverse Gamma-Exponential Severity Process
Mixing
Exponential Process
Variance = αθ2/{(α−2)(α−1)2}.
(Size of Loss) α = shape parameter of the Prior Inverse Gamma = α . θ = scale parameter of the Prior Inverse Gamma = θ. Mean = θ/(α−1). second moment = 2θ2/{(α−1)(α−2)}.
Pareto Marginal Distribution:
Inverse Gamma is a Conjugate Prior, Exponential is a Member of a Linear Exponential Family Buhlmann Credibility Estimate = Bayes Analysis Estimate Buhlmann Credibility Parameter, K = α - 1. Inverse Gamma Prior (Distribution of Parameters) Shape parameter = alpha = α, Scale parameter = theta = θ.
Exponential Process
Mixing
Variance = αθ2/{(α−2)(α−1)2}.
(Size of Loss) α = shape parameter of the Posterior Inverse Gamma = αʼ = α + C. θ = scale parameter of the Posterior Inverse Gamma = θʼ = θ + L. Mean = θ/(α−1). second moment = 2θ2/{(α−1)(α−2)}.
Pareto Predictive Distribution:
Observations: $ of Loss = L, # claims = C.
Inverse Gamma Posterior (Distribution of Parameters) Posterior Shape parameter = αʼ = α + C. Posterior Scale parameter = θʼ = θ + L.
Exponential Parameters (means) of individuals making up the entire portfolio are distributed via an Inverse Gamma Distribution with parameters α and θ: f(x) = θα e-θ/x/ {xα+1Γ[α]}, mean = θ/(α−1), second moment = θ2/{(α−1)(α−2)} , variance = θ2/{(α−2)(α−1)2}.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 248
Comparing the Inverse Gamma and Pareto Distributions:
Exponential Process
Inverse Gamma Prior Distribution of Parameters α, θ.
Mixing
Pareto Marginal Size of Loss α = α. θ = θ.
Since the Inverse Gamma is a distribution of each insureds mean severity. ⇒ The a priori mean severity is the mean of the Inverse Gamma. The mean of the Pareto is also the a priori mean. ⇒ Mean of Inverse Gamma = Mean of Pareto =
θ . α-1
VHM = the variance of the Inverse Gamma. Total Variance = the variance of the Pareto. Total Variance = EPV + VHM > VHM. θ2 α θ2 ⇒ = Variance of Inverse Gamma < Variance of Pareto = , α > 2. (α − 1)2 (α − 2) (α − 1)2 (α − 2)
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 249
Hazard Rates of Exponentials Distributed via a Gamma:59 The way that I have presented things, the mean of each Exponential Severity was δ, and δ followed an Inverse Gamma Distribution. If the hazard rate of the Exponential, λ, is distributed via a Gamma(α, θ), then the mean 1/λ is distributed via an Inverse Gamma(α, 1/θ), and therefore the mixed distribution is Pareto. If the Gamma has parameters α and θ, then the mixed Pareto has parameters α and 1/θ. Relationship to the Gamma-Poisson: Assume as before, δ, the mean of each Exponential, follows an Inverse Gamma Distribution with parameters α = 6 and θ = 15. Then, F(δ) = 1 - Γ[6, 15/δ]. If λ = 1/δ, then F(λ) = Γ[6, 15λ]. ⇒ λ follows a Gamma with parameters α = 6 and θ = 1/15. This is mathematically the same as Exponential interarrival times each with mean 1/λ, or a Poisson Process with intensity λ. Prob[X > x] ⇔ Prob[Waiting time to 1st claim > x] = Prob[no claims by time x]. From time 0 to x we have a Poisson Frequency with mean xλ. xλ has a Gamma Distribution with parameters 6 and x/15. This is mathematically a Gamma-Poisson, with mixed distribution that is Negative Binomial with r = 6 and β = x/15. Prob[X > x] ⇔ Prob[no claims by time x] = f(0) = 1/(1 + x/15)6 = 156 /(15+ x)6 . This is the survival function at x of a Pareto Distribution, with parameters α = 6 and θ = 15, as obtained previously. As before, a risk is selected at random and it is observed to have 3 claims of size: 8, 5, and 4, in that order. This is mathematically equivalent to having three interarrival times in the Poisson Process of lengths 8, 5, and 4. In other words, we see a total of 3 claims in a length of time: 8 + 5 + 4 = 17. Recall that the prior Gamma had parameters α = 6 and θ = 1/15. Thus using the updating formula for the Gamma-Poisson, the posterior Gamma has parameters 6 + 3 = 9 and 1/(15 + 17) = 1/32. Translating back to the Inverse Gamma-Exponential, the Posterior Inverse Gamma has parameters α = 9 and θ = 1/(1/32) = 32, matching the result obtained previously. 59
See for example, 4B, 11/93, Q.6.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 250
Problems: Use the following information to answer the next 15 questions: The size of claim distribution for any particular policyholder is exponential with mean δ. The δ values of the portfolio of policyholders have probability density function: g(δ) = 216 e -6/δ δ−5 You are given the following values of the Incomplete Gamma Function, as per the Appendix of Loss Models: Γ(α ; y) y
α=4
α=5
α=6
α=7
1.0
0.019
0.004
0.001
0.000
1.5 2.0
0.066 0.143
0.019 0.053
0.004 0.017
0.001 0.005
3.0 3.5 6.0 7.0
0.353 0.463 0.849 0.918
0.185 0.275 0.715 0.827
0.084 0.142 0.554 0.699
0.034 0.065 0.394 0.550
9.1 (1 point) What is the mean claim size for the portfolio? A. less than 1.6 B. at least 1.6 but less than 1.8 C. at least 1.8 but less than 2.0 D. at least 2.0 but less than 2.2 E. at least 2.2 9.2 (1 point) What is the probability that an insured picked at random from this portfolio will have an Exponential parameter δ between 2 and 4? A. less than 25% B. at least 25% but less than 35% C. at least 35% but less than 45% D. at least 45% but less than 55% E. at least 55%
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 251
9.3 (2 points) The probability density function that a claim chosen at random will be of size x is given by which of the following? A. 279,936 (6 + x)-7 B. 38,880 (6 + x)-6 C. 5184 (6 + x)-5 D. 648 (6 + x)-4 E. None of A, B, C, or D 9.4 (1 point) An insured is picked at random. What is the probability that his next claim will be greater than 3 and less than 5? A. less than 8% B. at least 8% but less than 10% C. at least 10% but less than 12% D. at least 12% but less than 14% E. at least 14% 9.5 (2 points) What is the variance of the claim severity for the portfolio? A. less than 3 B. at least 3 but less than 5 C. at least 5 but less than 7 D. at least 7 but less than 9 E. at least 9 9.6 (2 points) What is the expected value of the process variance for the claim severity? A. less than 3 B. at least 3 but less than 5 C. at least 5 but less than 7 D. at least 7 but less than 9 E. at least 9 9.7 (2 points) What is the variance of the hypothetical mean severities? A. less than 3 B. at least 3 but less than 5 C. at least 5 but less than 7 D. at least 7 but less than 9 E. at least 9
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 252
9.8 (2 points) An insured has 2 claims of sizes 3 and 5. Using Buhlmann Credibility what is the estimate of this insured's expected future claim severity? A. less than 2.3 B. at least 2.3 but less than 2.5 C. at least 2.5 but less than 2.7 D. at least 2.7 but less than 2.9 E. at least 2.9 9.9 (2 points) An insured has 2 claims of sizes 3 and 5 . Which of the following is proportional to the posterior probability density function for this insured's exponential parameter? A. e−10/δ δ−5 B. e−12/δ δ−6 C. e−14/δ δ−7 D. e−16/δ δ−8 E. None of A, B, C, or D 9.10 (1 point) An insured has 2 claims of sizes 3 and 5. What is the mean of the posterior severity distribution? A. less than 2.3 B. at least 2.3 but less than 2.5 C. at least 2.5 but less than 2.7 D. at least 2.7 but less than 2.9 E. at least 2.9 9.11 (1 point) An insured has 2 claims of sizes 3 and 5. What is the probability that this insured has an Exponential parameter δ between 2 and 4? A. 40%
B. 44%
C. 48%
D. 52%
E. 56%
9.12 (2 points) An insured has 2 claims of sizes 3 and 5. What is the variance of the posterior distribution of exponential parameters? A. less than 2.0 B. at least 2.0 but less than 2.1 C. at least 2.1 but less than 2.2 D. at least 2.2 but less than 2.3 E. at least 2.3
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 253
9.13 (1 point) An insured has 2 claims of sizes 3 and 5. What is the probability that this insured has an Exponential parameter δ between 2 and 4? Use the Normal Approximation. A. 40% B. 44% C. 48%
D. 52%
E. 56%
9.14 (2 points) An insured has 2 claims of sizes 3 and 5. Which of the following is proportional to the density of the predictive distribution of the severity for this insured? A. (14 + x)-8 B. (13 + x)-7 C. (12 + x)-6 D. (11 + x)-5 E. None of A, B, C, or D 9.15 (1 point) An insured has 2 claims of sizes 3 and 5. What is the probability that his next claim will be greater than 2 and less than 4? A. less than 18% B. at least 18% but less than 20% C. at least 20% but less than 22% D. at least 22% but less than 24% E. at least 24%
9.16 (2 points) You are given the following: • The size of the single claim follows a distribution: f(x) = λe-λx, x > 0. • The parameter λ is a random variable with probability density function: g(λ) = 100e-100λ, λ > 0. • Two claims are observed. They have sizes 40 and 80. What is the expected value of the size of the next claim? A. 70 B. 80 C. 90 D. 100 E. 110 9.17 (3 points) You are given the following: • The amount of an individual claim has an Exponential Distribution with mean q • The parameter q is distributed via an Inverse Gamma distribution with shape parameter α = 4 and scale parameter θ = 1000. From an individual insured you observe 3 claims of sizes: 100, 200, and 500. For the zero-one loss function, what is the Bayes estimate of q for this insured? A. 225 B. 250 C. 275 D. 300 E. 325
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 254
Use the following information for the next six questions:
• The time in minutes between text messages sent by an individual girl is Exponential with mean µ. • For an individual, the times between sending text messages are independent. • You assume that µ is distributed between different girls via an Inverse Gamma Distribution with α = 4 and θ = 100.
• Samantha sent her 5th text message exactly 20 minutes after you started observing her. • You may use the following values of the incomplete Gamma Function: Γ[8 ; 7.67] = 0.500. Γ[9 ; 8.67] = 0.500. Γ[10 ; 9.67] = 0.500. 9.18 (2 points) Estimate Samanthaʼs mean time in minutes between text messages, using the Bayes estimate that minimizes the squared error loss function. (A) 0 (B) 10 (C) 12 (D) 14 (E) 15 9.19 (2 points) Estimate Samanthaʼs mean time in minutes between text messages, using the Bayes estimate that minimizes the absolute error loss function. (A) 0 (B) 10 (C) 12 (D) 14 (E) 15 9.20 (2 points) Estimate Samanthaʼs mean time in minutes between text messages, using the Bayes estimate that minimizes the zero-one loss function. (A) 0 (B) 10 (C) 12 (D) 14 (E) 15 9.21 (2 points) Estimate the time in minutes until Samantha sends her next text message, using the Bayes estimate that minimizes the squared error loss function. (A) 0 (B) 10 (C) 12 (D) 14 (E) 15 9.22 (2 points) Estimate the time in minutes until Samantha sends her next text message, using the Bayes estimate that minimizes the absolute error loss function. (A) 0 (B) 10 (C) 12 (D) 14 (E) 15 9.23 (2 points) Estimate the time in minutes until Samantha sends her next text message, using the Bayes estimate that minimizes the zero-one loss function. (A) 0 (B) 10 (C) 12 (D) 14 (E) 15
9.24 (3 points) The size of each loss is Exponential with mean µ. The improper prior distribution is: π(µ) = 1, µ > 0. You observe losses of size: 10, 5, 15, and 20. What is the variance of the predictive distribution?
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 255
9.25 (4B, 11/93, Q.6) (2 points) You are given the following: • An individual risk has exactly one claim each year. • The size of the single claim follows an exponential distribution with parameter t, f(x) = te-tx, x > 0 • The parameter t is a random variable with probability density function h(t) = te-t, t > 0. • A claim of $5 is observed in the current year. Determine the posterior distribution of t. A. t2 e-5t
B. (125/2)t2 e-5t
C. t2 e-6t
D. 108t2 e-6t
E. 36t2 e-6t
Use the following information for the next two questions: • Claim sizes for a given risk follow a distribution with density function f(x) = e-x/λ /λ , 0 < x < ∞, λ > 0. •
The prior distribution of λ is assumed to follow a distribution with mean 50 and density function g(λ) = 500,000 e-100/λ / λ4 , 0 < λ < ∞.
9.26 (4B, 5/98, Q.28) (2 points) Determine the variance of the hypothetical means. A. Less than 2,000 B. At least 2,000, but less than 4,000 C. At least 4,000, but less than 6,000 D. At least 6,000, but less than 8,000 E. At least 8,000 9.27 (4B, 5/98, Q.29) (2 points) Determine the density function of the posterior distribution of λ after 1 claim of size 50 has been observed for this risk. A. 62,500 e-50/λ / λ4 B. 500,000 e-100/λ / λ4 C. 1,687,500 e-150/λ / λ4 D. 50,000,000 e-100/λ / (3λ5) E. 84,375,000 e-150/λ / λ5
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 256
9.28 (4, 11/00, Q.23) (2.5 points) You are given: (i) The parameter Λ has an inverse gamma distribution with probability density function: g(λ) = 500 λ−4 e-10/λ , λ > 0. (ii) The size of a claim has an exponential distribution with probability density function: f (x | Λ = λ) = λ−1 e-x/λ , x > 0, λ > 0. For a single insured, two claims were observed that totaled 50. Determine the expected value of the next claim from the same insured. (A) 5 (B) 12 (C) 15 (D) 20 (E) 25 9.29 (2 points) In the previous question, what is the probability that the next claim from the same insured is greater than 30? (A) 5% (B) 7% (C) 9% (D) 11% (E) 13% 9.30 (2 points) In 4, 11/00, Q.23, determine the variance of the distribution of the size of the next claim from the same insured.
9.31 (4, 5/07, Q.30) (2.5 points) You are given: (i) Conditionally, given β, an individual loss X follows the exponential distribution with probability density function: f(x | β) = exp(-x/β)/β, 0 < x < ∞. (ii) The prior distribution of β is inverse gamma with probability density function: π(β) = c2 exp(-c/β)/β3, 0 < β < ∞. ∞
(iii)
∫ exp(−a / y) / yn dy = (n-2)! / an-1, n = 2, 3, 4, .... 0
Given that the observed loss is x, calculate the mean of the posterior distribution of β. (A) 1 / (x + c)
(B) 2 / (x + c)
(C) (x + c) / 2
(D) x + c
(E) 2(x + c)
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 257
Solutions to Problems: 9.1. D. g(δ) is an Inverse Gamma density function, with parameters: θ = 6 and α = 4. (α+1 is the power to which 1/δ is raised, while θ is the number divided by δ in the exponential term.) It has mean θ / (α - 1) = 6 / 3 = 2. Each exponential severity has a mean of δ. Thus, Mean of the portfolio = E[δ] = Mean of Inverse Gamma Distribution = 2. 9.2. B. The prior distribution is an Inverse Gamma with θ = 6 and α = 4. Thus F(x) = 1 - Γ(α; θ/x) = 1 - Γ(4; 6/x). F(4) - F(2) = {1 - Γ(4; 1.5)} - {1 - Γ(4; 3 )} = Γ(4; 3) - Γ(4; 1.5) = .353 - .066 = 0.287. 9.3. C. The (prior) marginal distribution is given by a Pareto with θ = 6 and α = 4. (Note the mean is θ / (α-1) = 2, which matches the answer to a previous question.) Pareto density = (αθα)(θ + x)−(α + 1) = (4)(64 ) / (6 +x)5 = 5184 / (6 +x)5 . 9.4. C. The (prior) marginal distribution is given by a Pareto with θ = 6 and α = 4. F(x) = 1 - (θ/(θ + x))α. F(3) = 1 - (6/9)4 = .8025. F(5) = 1 - (6/11)4 = .9115. F(5) - F(3) = .9115 - .8025 = 10.9%. 9.5. D. For the Pareto, Variance = θ2α / (α−2)(α−1)2. Since θ = 6 and α = 4, Variance = 8. Alternately, the total variance equals the expected value of the process variance plus the variance of the hypothetical means = 6 + 2 = 8. (Using the solutions to the next two questions.) 9.6. C. The process variance for an exponential is δ2 . Therefore the expected value of the process variance = E[δ2 ] = second moment of the Inverse Gamma = θ2 / (α-1)(α-2) =
62 = 6. (4 -1) (4 - 2)
9.7. A. The mean for an exponential is δ. Therefore the variance of the hypothetical means =VAR[δ] = variance of the Inverse Gamma = 2nd moment - mean2 = 6 - 22 = 2.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 258
9.8. D. Using the solutions to the prior 2 questions, K = 6/2 = 3. (Alternately, K = α - 1 = 3.) Thus, Z = 2/(2 + 3) = 40%. Prior estimate is 2. Observation is (3+5) / 2 = 4. Thus the new estimate = (40%)(4) + (60%)(2) = 2.8 9.9. C. The Posterior Distribution is an Inverse Gamma with θ = 6 + 3 + 5 = 14 and α = 4 + 2 = 6. f(δ) = θα e− θ/δ / {Γ(α) δ α+1} = (146 ) e− 14/δ / {Γ(7-1)δ 7} ∝ e-14 / δ δ- 7. 9.10. D. Mean of Posterior Inverse Gamma is 14/(6 - 1) = 2.8. 9.11. E. The posterior distribution is an Inverse Gamma with θ = 14 and α = 6. Thus F(x) = 1 - Γ(α; θ/x ) = 1 - Γ(6; 14/x). F(4) - F(2) = {1 - Γ(6; 3.5)} - {1 - Γ(6; 7)} = Γ(6; 7) - Γ(6; 3. ) = 0.699 - 0.142 = 0.557. 9.12. A. Variance of Posterior Inverse Gamma = 2nd moment - mean2 = 142 / {(6-1) (6-2)} - {14/(6-1)}2 = 9.8 - 2.82 = 1.96. 9.13. D. The posterior distribution is an Inverse Gamma with θ = 14 and α = 6, with mean of 2.8 and standard deviation of
1.96 = 1.4.
Thus F(4) - F(2) ≅ Φ[(4 - 2.8)/1.4] - Φ[(2 - 2.8)/1.4] = Φ(0.86) - Φ(-0.57) = 0.8051 - 0.2843 = 0.5208. Comment: Note that this differs from the exact answer of 0.557 obtained as the solution to a previous question using values of the Incomplete Gamma Functions. 9.14. E. (Posterior) predictive distribution is Pareto with θ = 14 and α = 6. Pareto density = (αθα)(θ + x)−(α + 1) = (6)(146 ) / (14 +x)7 = 45,177,216 / (14 +x)7 . 9.15. D. The (posterior) predictive distribution is given by a Pareto with θ = 14 and α = 6. F(x) = 1 - (θ/(θ + x))α. F(2) = 1 - (14/16)6 = 0.5512. F(4) = 1 - (14/18)6 = 0.7786. F(4) - F(2) = 0.7786 - 0.5512 = 22.7%.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 259
9.16. E. The posterior distribution is proportional to: e-100λλe -40λ λe -80λ = λ2e-220λ. E[X | λ] =1/λ. Therefore, the expected size of the next claim is: ∞
∫
∞
∫
(1/λ) λ2e-220λ dλ / λ2e-220λ dλ = 220-2 Γ(2) /{220-3 Γ(3)} = 220/2 = 110. 0
0
Alternately, let δ = 1/λ. λ = 1/δ. dλ/dδ = -1/δ2. Then severity is exponential with mean δ. h(δ) = g(λ) |dλ/dδ| = 100e-100/δ/δ2, an Inverse Gamma with α = 1 and θ = 100, (an Inverse Exponential.) αʼ = α + C = 1 + 2 = 3. θʼ = θ + L = 100 + 120 = 220. θʼ/(αʼ-1) = 220/2 = 110. Comment: A somewhat disguised Inverse Gamma-Exponential. Since α = 1 ≤ 2, one can not apply Buhlmann Credibility. The prior Inverse Gamma has no finite mean, and neither the EPV nor VHM exist. 9.17. A. For the Exponential, f(x | q) = e-x/q/q. For the Inverse Gamma, π(q) = 10004 e-1000/q / {Γ[4] q4+1} = 10004 e-1000/q / {3! q5 }. The posterior distribution is proportional to: f(100 | q) f(200 | q) f(500 | q) π(q), which is proportional to: (e-100/q/q) (e-200/q/q) (e-500/q/q) (e-1000/q / q5 ) = e-1800/q / q8 . This is an Inverse Gamma Distribution with parameters 7 and 1800. For the zero-one loss function, the Bayes estimate is the mode of the posterior distribution. The mode of the Inverse Gamma distribution is: θ/(α + 1) = 1800/(7 + 1) = 225. Comment: This is an example of an Inverse Gamma-Exponential. αʼ= α + C = 4 + 3 = 7, and θʼ= θ + L = 1000 + 800 = 1800. For the squared error loss function, the Bayes estimate is the mean of the posterior distribution, which in this case is: θ/(α−1) = 1800/(7 - 1) = 300.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 260
9.18. E., 9.19. D., 9.20. C. We are trying to estimate µ for Samantha. The number of text messages of 5 acts like the number of claims. The total time of 20 minutes acts like the total dollars of loss. The posterior distribution of µ is Inverse Gamma with α = 4 + 5 = 9, and θ = 100 + 20 = 120. After observations, this is the distribution of hypothetical means. The estimate corresponding to the squared error loss function is the mean. The mean of the posterior Inverse Gamma is: θ/(α-1) = 120 / (9-1) = 15. The estimate corresponding to the absolute loss function is the median. The distribution function of the posterior Inverse Gamma is: F(µ) = 1 - Γ[9; 120/µ]. Setting this equal to 50%: 0.50 = Γ[9; 120/µ]. ⇒ 120/µ = 8.67. ⇒ µ = 120/8.67 = 13.84. The estimate corresponding to the zero-one loss function is the mode. The mode of the posterior Inverse Gamma is: θ/(α+1) = 120 / (9+1) = 12. Comment: Loss functions are discussed in “Mahlerʼs Guide to Buhlmann Credibility.” The zero-one loss function is kind of silly for actuarial work. 9.21. E., 9.22. B., 9.23. A. We are trying to estimate the next observation from Samantha. The number of text messages of 5 acts like the number of claims. The total time of 20 minutes acts like the total dollars of loss. The posterior distribution of µ is Inverse Gamma with α = 4 + 5 = 9, and θ = 100 + 20 = 120. Thus the predictive distribution is Pareto with α = 9, and θ = 120. After observations, this is the distribution of observed lengths of time between text messages. The estimate corresponding to the squared error loss function is the mean. The mean of the predictive Pareto is: θ/(α-1) = 120 / (9-1) = 15. The estimate corresponding to the absolute loss function is the median. The median of the predictive Pareto is: θ {(1-p)−1/α - 1} = (120) (0.5-1/9 - 1) = 9.607. The estimate corresponding to the zero-one loss function is the mode. The mode of the predictive Pareto is 0. Comment: Loss functions are discussed in “Mahlerʼs Guide to Buhlmann Credibility.” The zero-one loss function is kind of silly for actuarial work. While the density of the predictive Pareto is largest at zero, that is not a sensible estimate of the next observation. The mean of the Inverse Gamma and the Pareto are equal; however, their modes and medians are not.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 261
9.24. The posterior distribution is proportional to: π(µ) f(10) f(5) f(15) f(20) = e−50/µ / µ4. This is proportional to an Inverse Gamma Distribution with α = 3 and θ = 50, which must be the posterior distribution. Mixing Exponentials via this Inverse Gamma produces a Pareto predictive distribution, with α = 3 and θ = 50. This Pareto has mean = θ/(α-1) = 25, second moment = (2)(502 )/2 = 2500, and variance = 2500 - 252 = 1875. Alternately, posterior EPV = E[µ2] = second moment of posterior inverse gamma = (502 )/2 = 1250. Posterior VHM = Variance of posterior inverse gamma = 1250 - 252 = 625. Posterior Total Variance = EPV + VHM = 1250 + 625 = 1875. 9.25. D. Using Bayes Theorem, the posterior probability weight for t is: (chance of observation given t)(a priori probability) = (te-5t)(te-t) = t2 e-6t. One has to normalize the probability weights so that they integrate to unity. This is proportional to a Gamma density, with α = 3 and θ = 1/6. Therefore in order to be a true probability density function, the constant in front must be θ−α / Γ(θ) = 63 / Γ(3) = 216 / 2! = 108. Therefore, the Posterior distribution is: 108t2 e-6t . Comment: Choices C and E do not integrate to unity; they are proportional to the correct choice D, which does integrate to unity. While this is in fact a special case of the Inverse Gamma- Exponential conjugate prior, this fact doesnʼt seem to aid in the speed of solution. One would have to convert the exponential to the parameterization, δ = 1/t, in which case h(t) becomes an Inverse Gamma with respect to δ.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 262
9.26. B. f(x) is an exponential Distribution with mean λ. g(λ) is an Inverse-Gamma Distribution with parameters α = 3 and θ = 100. The Variance of the Hypothetical Means = Var[λ] = Variance of the Inverse Gamma = θ2 / {(α-1)2 (α-2)} = 1002 / (22 (1)) = 2500. Alternately, one can compute the first and second moments of g, by integrating from zero to infinity and using the change of variables ζ = 100/λ:
∫
∫
E[λ] = λ500,000 e-100/λ / λ4 dλ = 500,000 (100/ζ) e−ζ (ζ/100)4 (100/ζ2) dζ =
∫
50 ζ e−ζ dζ = 50 Γ(2) = 50(1!) = 50.
∫
∫
E[λ2 ] = λ2 500,000 e-100/λ / λ4 dλ = 500,000 (100/ζ)2 e−ζ (ζ/100)4 (100/ζ2) dζ =
∫
5000 e−ζ dζ = 5000. Thus the VHM = Var[λ] = 5000 - 502 = 2500. Alternately, one can compute the first and second moments of g, by computing the -1 and -2 moments of the corresponding Gamma Distribution. If ψ = 1/λ, then the distribution of ψ is: g(λ) |dλ/dψ| = 500,000 e-100ψ ψ4 |(-1/ψ2 )| = 500000 e-100ψ ψ2, which is a Gamma Distribution with parameters α = 3 and θ = 1/100. The nth moment of a Gamma is: θn Γ(α+n)/Γ(α). The -1 moment of this Gamma is: θ-1Γ(α-1)/ Γ(α) = 1/ {θ(α-1)} = 100/2 = 50. The -2 moment of this Gamma is: θ-2Γ(α-2)/ Γ(α) = θ-2/(α-1)(α-2) = 1002 /2 = 5000. Thus as above, the VHM = 5000 - 502 = 2500. Comment: In this case, we were given the first moment of g(λ).
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 263
9.27. E. The prior distribution of λ, g(λ), is an Inverse Gamma Distribution with parameters α = 3 and θ = 100. The posterior distribution of λ is also an Inverse Gamma Distribution, but with parameters α = 3 + 1 = 4 and θ = 100 + 50 = 150. That is a density of: θα e− θ/x / {Γ(α) x α+1} = 1504 e-150/λ /{Γ(4) λ5 } = (506250000/6) e-150/λ / λ5 = 84,375,000 e-150/λ / λ5 . Alternately, if one observes 1 claim of size 50, we have that the chance of the observation given λ is: e-50/λ /λ. The posterior distribution of λ is proportional to: (e-50/λ /λ) g(λ) = (e-50/λ /λ)500,000 e-100/λ / λ4 = 500,000 e-150/λ λ-5. To get the posterior distribution, we need to divide by the integral of this quantity: ∞
∞
∞
∫(e-50/λ /λ) g(λ) dλ = ∫(e-50/λ /λ)500,000 e-100/λ / λ4 dλ = 500,000 ∫ e-150/λ λ-5 dλ. λ=0
λ=0
λ=0
Using the change of variables ζ = 150/λ, the integral becomes: 0
∫
∞
∫
500,000 e-ζ (ζ/150)5 (-150dζ / ζ2 ) = (500,000/1504 ) e-ζ ζ3/ dζ = (500,000/1504 ) Γ(4) ζ=∞
ζ =0
Thus the posterior distribution of λ is: 500,000 e-150/λ λ-5 / {(500,000/1504 ) Γ(4)} = 84,375,000 e-150/λ / λ5 .
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 264
9.28. C. This is an Inverse Gamma - Exponential Conjugate Prior. The prior Inverse Gamma has parameters: α = 3 and θ = 10. There are 2 claims and $50 of loss, thus the posterior Inverse Gamma has parameters: αʼ = 3 + 2 = 5 and θʼ = 10 + 50 = 60. The mean of the posterior Inverse Gamma is: θʼ/(αʼ-1) = 60/4 = 15. Alternately, the predictive distribution is Pareto with α = 5 and θ = 60, and mean: θ/(α - 1) = 60/4 = 15. Alternately, for the Inverse Gamma-Exponential, Buhlmann Credibility produces the same estimate as Bayesian Analysis. K = α -1 = 2. Z = 2/(2+K) = 1/2. Prior mean is: θ/(α-1) = 10/2 = 5. estimate = (1/2)(50/2) + (1/2)(5) = 15. Alternately, the sum of two independent identically distributed Exponential Distributions each with mean λ is a Gamma distribution with α = 2 and θ = λ. Therefore, the chance of the observation 2 claims totaling $50 is the density at 50 of a Gamma distribution with α = 2 and θ = λ, which is proportional to λ−2e-50/λ. By Bayes Theorem the posterior distribution of λ is proportional to: (λ−4e-10/λ)(λ−2e-50/λ) = λ−6e-60/λ. This is an Inverse Gamma Distribution with parameters: α = 5 and θ = 60, and mean: 60/(5-1) = 15. 9.29. E. This is an Inverse Gamma - Exponential Conjugate Prior. The prior Inverse Gamma has parameters: α = 3 and θ = 10. There are 2 claims and $50 of loss, thus the posterior Inverse Gamma has parameters: αʼ = 3 + 2 = 5, and θʼ = 10 + 50 = 60. The predictive distribution is Pareto with α = 5 and θ = 60. For the predictive Pareto Distribution, S(30) = {60/(60 + 30)}5 = 0.132. 9.30. The predictive distribution is Pareto with α = 5 and θ = 60. For this predictive Pareto Distribution: mean = 60 / (5 - 1) = 15. 2nd moment =
(2) (602 ) = 600. Variance = 600 - 152 = 375. (5 - 1) (5 - 2)
Comment: We have computed the variance of the posterior mixed distribution.
2013-4-10,
Conjugate Priors §9 Inverse Gamma - Exponential, HCM 10/21/12, Page 265
9.31. C. An Inverse Gamma - Exponential, with α = 2 and θ = c. The posterior Inverse Gamma has parameters: αʼ = α + C = 2 + 1 = 3, and θʼ = θ + L = c + x. The mean of the posterior Inverse Gamma is: θʼ /(αʼ - 1) = (x + c)/2. Alternately, one can apply Bayes Theorem. The posterior distribution of β is: f(x | β) π(β) /
∫
f(x | β) π(β) dβ = c2
∫ f(x | β) π(β) dβ .
∞
∫ exp[-(c + x)/ β] / β4 dβ = c2 2!/(x + c)3, using the hint with n = 4 and a = x + c. 0
∞
∫ β f(x | β) π(β) dβ = c2 ∫ exp[-(c + x)/ β] / β3dβ = c2 1!/(x + c)2, using the hint with n = 3 and a = x+c. 0
The mean of the posterior distribution of β is:
∫ β f(x | β) π(β) dβ / ∫ f(x | β) π(β) dβ = (x + c)/2. Comment: Since x is a loss, we would expect x to appear in the numerator rather than the denominator of the expected future loss, eliminating choices A and B.
2013-4-10,
Conjugate Priors §10 Normal-Normal,
HCM 10/21/12,
Page 266
Section 10, Normal-Normal60 The Normal-Normal is a fourth example of a conjugate prior situation. Like the Inverse GammaExponential, it involves a mixture of claim severity rather than frequency parameters across a portfolio of risks. Unfortunately, unlike the Gamma-Poisson, where the Gamma, Poisson, and Negative Binomial Distributions each take different roles, in the Normal-Normal the Normal Distribution takes on all of these roles. Thus this is not the first Conjugate Prior situation one should learn. The sizes of claims a particular policyholder makes is assumed to be Normal with mean m and known fixed variance s2 .61 Given m, the distribution function of the size of loss is: Φ[(x-m)/s], while the density of exp[the size of loss distribution is: φ[(x-m)/s] =
(x - m)2 ] 2s2
s
2π
.
So for example if s = 3, then the probability density of claim being of size 8 is: exp(-(8-m)2 /18) / {3 2 π }. If m = 2 this density is: exp(-2) / {3 2 π } = 0.018, while if m = 20 this density is: exp(-8) / {3 2 π } = 0.000045. Prior Distribution: Assume that the values of m are given by another Normal Distribution with mean 7 and standard deviation of 2, with probability density function: exp[f(m) =
8
(m- 7)2 ] (2) (82 ) 2π
, -∞ < m < ∞.
Note that the mean of this distribution of 7 is the a priori estimate of claim severity.
60
The Normal-Normal is discussed in Loss Models at Examples 5.5, 20.13, and Exercise 20.35. Note Iʼve used roman letter for parameters of the Normal likelihood, in order to distinguish from the parameters of the Normal prior distribution discussed below. 61
Below is displayed this prior distribution of hypothetical mean severities:62
[Figure: density of the prior Normal Distribution (mean 7, standard deviation 2), plotted against the hypothetical mean severity.]
Marginal Distribution (Prior Mixed Distribution):

If we have a risk and do not know what type it is, in order to get the chance of the next claim being of size 8, one would weight together the chances of having a claim of size 8 given m: exp[-(8-m)²/18] / {3√(2π)}, using the a priori probabilities of m: f(m) = exp[-(m-7)²/8] / {2√(2π)}, and integrating from minus infinity to infinity:

∫ exp[-(8-m)²/18] / {3√(2π)} f(m) dm =

∫ exp[-(8-m)²/18] / {3√(2π)} exp[-(m-7)²/8] / {2√(2π)} dm =

{1/(6√(2π))} ∫ exp[-{(8-m)²/18 + (m-7)²/8}] / √(2π) dm =

{1/(6√(2π))} ∫ exp[-{13m² - 190m + 697}/72] / √(2π) dm =

62 There is a very small but positive chance that the mean severity will be negative. There is always a positive chance that the mean severity will be negative for the Normal-Normal conjugate prior.
{1/(6√(2π))} ∫ exp[-{m² - (190/13)m + (95/13)² + 697/13 - (95/13)²}/(72/13)] / √(2π) dm =

exp[(-36/13²)/(72/13)] / {6√(2π)} ∫ exp[-(m - 95/13)² / {2(6/√13)²}] / √(2π) dm =

{exp(-1/26)/(6√(2π))} (6/√13) = exp(-1/26) / {√13 √(2π)} = 0.1065.

Where we have used the fact that a Normal Density integrates to unity:63

∫ exp[-(m - 95/13)² / {2(6/√13)²}] / {(6/√13) √(2π)} dm = 1.
More generally, if the distribution of hypothetical means m is given by a Normal Distribution f(m) = exp[-(m-µ)²/(2σ²)] / {σ√(2π)}, and we compute the chance of having a claim of size x by integrating from minus infinity to infinity:64

∫ exp[-(x-m)²/(2s²)] / {s√(2π)} f(m) dm =

∫ exp[-(x-m)²/(2s²)] / {s√(2π)} exp[-(m-µ)²/(2σ²)] / {σ√(2π)} dm =

{1/(sσ√(2π))} ∫ exp[-{(x-m)²/(2s²) + (m-µ)²/(2σ²)}] / √(2π) dm =

{1/(sσ√(2π))} ∫ exp[-{(s² + σ²)m² - (xσ² + µs²)2m + (x²σ² + µ²s²)}/(2s²σ²)] / √(2π) dm.

63 With mean of 95/13 and standard deviation of 6/√13.
64 Note that Iʼve used Greek letters for the parameters of the prior Normal Distribution, while I used roman letters for the parameters of the Normal likelihood.
Let ξ² = s²σ²/(s² + σ²), ν = (xσ² + µs²)/(s² + σ²), and δ = (x²σ² + µ²s²)/(s² + σ²); then this integral equals:

{1/(sσ√(2π))} ∫ exp[-(m² - 2νm + δ)/(2ξ²)] / √(2π) dm =

{1/(sσ√(2π))} ∫ exp[-{m² - 2νm + ν² - ν² + δ}/(2ξ²)] / √(2π) dm =

{1/(sσ√(2π))} exp[(ν² - δ)/(2ξ²)] ∫ exp[-(m - ν)²/(2ξ²)] / √(2π) dm =

{1/(sσ√(2π))} exp[(ν² - δ)/(2ξ²)] ξ = exp[(ν² - δ)/(2ξ²)] / {√(s² + σ²) √(2π)}.

Where we have used the fact that a Normal Density integrates to unity:65

∫ exp[-(m - ν)²/(2ξ²)] / {ξ√(2π)} dm = 1.

Note that ν² - δ = (x²σ⁴ + 2xµσ²s² + µ²s⁴ - {x²s²σ² + x²σ⁴ + µ²s⁴ + µ²σ²s²})/(s² + σ²)² = (2xµσ²s² - x²s²σ² - µ²σ²s²)/(s² + σ²)² = -(x - µ)²σ²s²/(s² + σ²)².

Thus, (ν² - δ)/ξ² = {-(x-µ)²σ²s²/(s² + σ²)²} (s² + σ²)/(s²σ²) = -(x - µ)²/(s² + σ²).

Thus the marginal distribution can be put back in terms of x, s, µ, and σ:

{1/(√(s² + σ²) √(2π))} exp[(ν² - δ)/(2ξ²)] = {1/(√(s² + σ²) √(2π))} exp[-(x-µ)²/{2(s² + σ²)}].

This is a Normal Distribution with mean µ and variance s² + σ². Thus if the likelihood is a Normal Distribution with variance s² (fixed and known), and the prior distribution of the hypothetical means of the likelihood is also a Normal, but with mean µ and variance σ², then the Marginal Distribution (Prior Mixed Distribution) is yet a third Normal Distribution with mean µ and variance s² + σ².

65 With mean of ν and standard deviation of ξ.
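As a sanity check on this result, the following Python sketch (my own illustration; the function name and grid settings are assumptions, not from the text) numerically mixes a Normal(m, s²) severity over m ~ Normal(µ, σ²) and compares the result to the closed-form Normal(µ, s² + σ²) density at x = 8, reproducing the 0.1065 computed above.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

s, mu, sigma, x = 3.0, 7.0, 2.0, 8.0

# Riemann sum over m on a wide grid: integral of f(x | m) f(m) dm.
dm = 0.001
grid = [mu - 10 * sigma + i * dm for i in range(int(20 * sigma / dm))]
mixed = sum(normal_pdf(x, m, s) * normal_pdf(m, mu, sigma) for m in grid) * dm

closed_form = normal_pdf(x, mu, math.sqrt(s ** 2 + sigma ** 2))
print(mixed, closed_form)   # both about 0.1065
```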
As with the other Conjugate Priors discussed above, the mean of the likelihood is what is varying among the insureds in the portfolio. Therefore, the mean of the marginal distribution is equal to that of the prior distribution, in this case µ. For the specific case dealt with previously: s = 3, µ = 7, and σ = 2, the marginal distribution has a Normal Distribution with a mean of 7 and variance of: 32 + 22 = 13. Thus the chance of having a claim of size x is: exp[-(x-7)2 /26] / { 13
2 π }.
For x = 8 this chance is: exp(-1/26) / { 13 2 π } = 0.1065. This is the same result as calculated above. For the Normal-Normal the marginal distribution is always a Normal, with mean equal to that of the prior Normal66 and variance equal to the sum of the variances of the prior Normal and the Normal likelihood.67 Prior Expected Value of the Process Variance: The process variance of the severity for an individual risk is s2 since the severity for each risk is a Normal with fixed variance s2 . Therefore the expected value of the process variance = E[s2 ] = s2 . Thus for s = 3, EPV = 32 = 9. Prior Variance of the Hypothetical Means: The variance of the hypothetical mean severities is the variance of m = Var[m] = Variance of the Prior Normal = σ2 = 22 = 4. Prior Total Variance: The total variance = the variance of the marginal Normal Distribution = s2 + σ2 = 32 + 22 = 13. Expected Value of the Process Variance + Variance of the Hypothetical Means = 9 + 4 = 13 = Total Variance. The fact that the EPV + VHM = Total Variance is one way to remember the variance of the marginal Normal. 66
This fact follows from the the fact that the prior distribution is parametrizing the mean severities of the likelihoods. As will be discussed below, the EPV is the variance of the Normal Likelihood, the VHM is the variance of the prior Normal, and the total variance is the variance of the marginal distribution. Thus this relationship follows from the general fact that the total variance is the sum of the EPV and VHM. 67
EPV = s2 = variance of Normal Likelihood. VHM = σ2 = variance of Normal Prior. Variance of marginal Normal = Total Variance = EPV + VHM = s2 + σ2 . Observations: Let us now introduce the concept of observations. A risk is selected at random and it is observed to have 5 claims of sizes: 8, 7, 5, 4 and 3. Note that for the forthcoming analysis all that will be important is that there were 5 claims totaling 27. Posterior Distribution: We can employ Bayesian analysis to compute what the chances are that the selected risk had a given hypothetical mean. Given a Normal severity distribution with mean m and variance 9, the chance of observing a claim of size 8 is: exp(-(8-m)2 /18) / {3 2 π }. The chance of having 5 claims of size: 8, 7, 5, 4 and 3 is the product of five likelihoods, which is proportional to: exp[-(8-m)2 /18] exp[-(7-m)2 /18] exp[-(5-m)2 /18] exp[-(4-m)2 /18] exp[-(3-m)2 /18] = exp[-{(8-m)2 + (7-m)2 + (5-m)2 + (4-m)2 + (3-m)2 }/18] = exp[-{163 - 54m + 5m2 }/18].
The a priori probability of m is the Prior Normal distribution: f(m) = exp(-(m-7)2 /8) / {2 2 π }. Thus the posterior chance of m is proportional to the product of the chance of observation and the a priori probability: exp[-{163 - 54m + 5m2 }/18] exp[-(m-7)2 /8] = exp[-{1093 - 342m + 29m2 }/72] = exp[-{(1093/29) - (342/29)m + m2 } / {2(36/29)}]. This is proportional to the density for a Normal distribution with mean 171/29 and variance 36/29.
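The posterior parameters can also be checked by brute force. The sketch below (illustrative only; the grid spacing is an arbitrary choice of mine) discretizes m, applies Bayes Theorem numerically to the five observed claims, and recovers a posterior mean of about 171/29 = 5.90 and a posterior variance of about 36/29 = 1.24.

```python
import math

s, mu, sigma = 3.0, 7.0, 2.0
claims = [8, 7, 5, 4, 3]

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

dm = 0.001
grid = [i * dm for i in range(-5000, 20001)]   # m from -5 to 20
post = [normal_pdf(m, mu, sigma) * math.prod(normal_pdf(x, m, s) for x in claims)
        for m in grid]
total = sum(post)
post = [p / total for p in post]               # normalize on the grid

post_mean = sum(m * p for m, p in zip(grid, post))
post_var = sum((m - post_mean) ** 2 * p for m, p in zip(grid, post))
print(post_mean, 171 / 29)   # both about 5.897
print(post_var, 36 / 29)     # both about 1.241
```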
Below the prior Normal with µ = 7 and σ = 2, and this posterior Normal with µ = 171/29 and σ = 6/√29, are compared:
[Figure: densities of the prior and posterior Normal Distributions, plotted against the hypothetical mean severity.]
In general, if one observes C claims totaling L losses, we have that the chance of the observation given m is a product of terms proportional to: exp[-{-2Lm + Cm²}/(2s²)]. The prior Normal distribution is proportional to: exp[-(-2mµ + m²)/(2σ²)].
The posterior probability for m is therefore proportional to:
exp[-{-2Lm + Cm²}/(2s²)] exp[-(-2mµ + m²)/(2σ²)] = exp[-{-2(Lσ² + µs²)m + (Cσ² + s²)m²}/(2s²σ²)] = exp[-{-2((Lσ² + µs²)/(Cσ² + s²))m + m²} / {2s²σ²/(Cσ² + s²)}].
This is proportional to the density of a Normal distribution with new mean = (Lσ² + µs²)/(Cσ² + s²) and new variance = s²σ²/(Cσ² + s²). Thus for the Normal-Normal the posterior density function is also a Normal.
This posterior Normal has a mean of (Lσ² + µs²)/(Cσ² + s²), and a variance of s²σ²/(Cσ² + s²).
For example, in the case where we observed 5 claims totaling 27, C = 5 and L = 27, the prior Normal had mean µ = 7 and variance σ2 = 4, and the Normal Likelihoods had variance s2 = 9, the posterior Normal has a mean of: (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(27)(4) + (7)(9)} / {(5)(4) + 9} = 171/29, and variance of: s2 σ2/(Cσ2 + s2 ) = (4)(9) / {(5)(4) + 9} = 36/29, matching the result obtained above. The fact that the posterior distribution is of the same form as the prior distribution is why the Normal is a Conjugate Prior Distribution for the Normal (fixed severity). Posterior Mean: One can compute the means and variances posterior to the observations. The posterior mean can be computed by weighting together the means for each type of risk, using the posterior probabilities. This is E[m] = the mean of the posterior Normal = (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(27)(4) + (7)(9)} / {(5)(4) + 9} = 171/29. Thus the new estimate posterior to the observations for this risk using Bayesian Analysis is 171/29. This compares to the a priori estimate of 7. In general, the observations provide information about the given risk, which allows one to make a better estimate of the future experience of that risk. Not surprisingly observing 5 claims totaling 27 (for an average severity of 5.4) has lowered the estimated future mean severity from 7 to 171/29 = 5.9. Posterior Expected Value of the Process Variance: Just as prior to the observations, posterior to the observations one can compute three variances: the expected value of the process variance, the variance of the hypothetical pure premiums, and the total variance. The process variance of the severity for an individual risk is s2 since the severity for each risk is a Normal with fixed variance s2 . Therefore the expected value of the process variance = the expected value of s2 = s2 = 9. Posterior Variance of the Hypothetical Means: The variance of the hypothetical mean severities is the variance of m = Var[m] = Variance of the Posterior Normal =
s²σ² / (Cσ² + s²) = 36/29.
Posterior Total Variance:
The posterior total variance = the posterior Expected Value of the Process Variance + the posterior Variance of the Hypothetical Means = Variance of the Normal Likelihood + Variance of the Posterior Normal = s² + s²σ²/(Cσ² + s²).
For this example: s² + s²σ²/(Cσ² + s²) = 9 + 36/29 = 297/29.
This total variance is the variance of the predictive distribution. Predictive Distribution: Since the posterior distribution is also a Normal distribution, the same analysis that led to a Normal (prior) marginal distribution, will lead to a (posterior) predictive distribution that is Normal. However, the parameters are related to the posterior Normal. For the Normal-Normal the predictive distribution is always a Normal with mean =
(Lσ² + µs²)/(Cσ² + s²), and variance = s² + s²σ²/(Cσ² + s²).
In the particular example, the predictive distribution is a Normal with mean 171/29 and variance 297/29. Below are compared the prior marginal Normal with µ = 7 and σ = √13, and this posterior predictive Normal with µ = 171/29 and σ = √(297/29):
[Figure: densities of the prior marginal and posterior predictive Normal Distributions, plotted against the claim size x.]
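A short Python helper (a sketch using my own function name, not something taken from Loss Models) makes it easy to reproduce the posterior and predictive moments for this example, and to confirm that the Buhlmann Credibility estimate with K = s²/σ² matches the Bayesian estimate of 171/29.

```python
def normal_normal_update(mu, sigma2, s2, C, L):
    """Return (posterior mean, posterior variance, predictive variance)."""
    post_mean = (L * sigma2 + mu * s2) / (C * sigma2 + s2)
    post_var = s2 * sigma2 / (C * sigma2 + s2)
    return post_mean, post_var, s2 + post_var

post_mean, post_var, pred_var = normal_normal_update(mu=7, sigma2=4, s2=9, C=5, L=27)
print(post_mean, 171 / 29)   # about 5.897
print(post_var, 36 / 29)     # about 1.241
print(pred_var, 297 / 29)    # about 10.241

# Buhlmann credibility gives the same point estimate, with K = s^2 / sigma^2:
K = 9 / 4
Z = 5 / (5 + K)
print(Z * (27 / 5) + (1 - Z) * 7)   # 5.897 = 171/29 again
```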
Buhlmann Credibility:
Next, letʼs apply Buhlmann Credibility to this example. The Buhlmann Credibility parameter
K = (the prior expected value of the process variance) / (the prior variance of the hypothetical means) = 9/4.
Note that K can be computed prior to any observations and doesnʼt depend on them. Specifically, both variances are for a single insured for one trial.
For the Normal-Normal in general, the Buhlmann Credibility parameter K = s²/σ², where σ² is the variance of the prior Normal and s² is the variance of each of the Normal Likelihoods. For this example, K = 9/4.
Having observed 5 claims, Z = 5/{5 + (9/4)} = 20/29 = 0.690. The observed severity = 27/5. The a priori mean = 7. Thus the estimated future severity is: (20/29)(27/5) + (9/29)(7) = 171/29.
Note that in this case the estimate from Buhlmann Credibility matches the estimate from Bayesian Analysis. For the Normal-Normal the estimates from using Bayesian Analysis and Buhlmann Credibility are equal.68 For the Normal-Normal, it is easier to apply Buhlmann Credibility than Bayes Analysis.
Summary:
The many different aspects of the Normal-Normal Conjugate Prior Severity Process are summarized below. Be sure to be able to clearly distinguish between the situation prior to observations and that posterior to the observations.
The Normal-Normal is far and away the least important of the four conjugate prior situations to learn for your exam. While applying Bayes Analysis to this situation is very difficult, one can apply Buhlmann credibility to this situation without memorizing anything. One can determine the EPV and VHM by just applying their definitions.
68 As discussed in a subsequent section, this is a special case of the general results for conjugate priors of members of linear exponential families. This is another example of what Loss Models calls “exact credibility.”
Normal-Normal Severity Process:
• Normal Prior (Distribution of Parameters): f(m) = φ((m-µ)/σ); mean = µ, variance = σ². The means of the Normal Severity Distributions of the individuals making up the entire portfolio are distributed via a Normal Distribution with parameters µ and σ: f(m) = exp[-(m-µ)²/(2σ²)] / {σ√(2π)}.
• Normal Severity Process for each insured: fixed variance s², mean m.
• Mixing gives the Normal Marginal Distribution (size of loss): Mean = µ = mean of the prior Normal Distribution. Variance = s² + σ².
• Observations: $ of Loss = L, # of claims = C.
• Normal Posterior (Distribution of Parameters): Mean = (Lσ² + µs²)/{Cσ² + s²}. Variance = s²σ²/{Cσ² + s²}.
• Mixing gives the Normal Predictive Distribution (size of loss): Mean = (Lσ² + µs²)/{Cσ² + s²} = mean of the posterior Normal Distribution. Variance = s² + s²σ²/{Cσ² + s²}.
• The Normal is a Conjugate Prior, and the Normal (fixed variance) is a member of a Linear Exponential Family, so the Buhlmann Credibility Estimate = Bayes Analysis Estimate, with K = Variance of Normal Likelihood / Variance of Normal Prior = s²/σ².
Problems:
Use the following information to answer the next 15 questions:
The size of claim distribution for any particular policyholder is Normal with mean m and variance 25. The m values of the portfolio of policyholders have probability density function:
f(m) = exp[-(m-100)²/288] / {12√(2π)}, -∞ < m < ∞.
10.1 (1 point) What is the mean claim size for the portfolio?
A. less than 20 B. at least 20 but less than 40 C. at least 40 but less than 80 D. at least 80 but less than 160 E. at least 160
10.2 (1 point) What is the total variance of the claim severity for the portfolio?
A. less than 135 B. at least 135 but less than 145 C. at least 145 but less than 155 D. at least 155 but less than 165 E. at least 165
10.3 (1 point) What is the probability that an insured picked at random from this portfolio will have an expected mean severity m between 110 and 120?
A. less than 15% B. at least 15% but less than 25% C. at least 25% but less than 35% D. at least 35% but less than 45% E. at least 45%
10.4 (1 point) What is the value of the probability density function for the claim sizes of the entire portfolio?
A. exp[-(m-100)²/338] / {13√(2π)}
B. exp[-(m-100)²/288] / {12√(2π)}
C. exp[-(m-100)²/98] / {7√(2π)}
D. exp[-(m-100)²/50] / {5√(2π)}
E. None of A, B, C, or D
10.5 (1 point) What is the probability that a claim picked at random from this portfolio will be of size between 100 and 120? A. less than 15% B. at least 15% but less than 25% C. at least 25% but less than 35% D. at least 35% but less than 45% E. at least 45% 10.6 (1 point) What is the expected value of the process variance for the claim severity? A. less than 20 B. at least 20 but less than 40 C. at least 40 but less than 80 D. at least 80 but less than 160 E. at least 160 10.7 (1 point) What is the variance of the hypothetical mean severities? A. less than 20 B. at least 20 but less than 40 C. at least 40 but less than 80 D. at least 80 but less than 160 E. at least 160 10.8 (2 points) An insured has 3 claims of sizes 95, 115, and 120. Using Buhlmann Credibility what is the estimate of this insured's expected future claim severity? A. less than 97 B. at least 97 but less than 100 C. at least 100 but less than 103 D. at least 103 but less than 106 E. at least 106 10.9 (1 point) An insured has 3 claims of sizes 95, 115, and 120. What is the mean of the posterior severity distribution? A. less than 97 B. at least 97 but less than 100 C. at least 100 but less than 103 D. at least 103 but less than 106 E. at least 106
10.10 (2 points) An insured has 3 claims of sizes 95, 115, and 120. What is the variance of the posterior severity distribution? A. less than 8.0 B. at least 8.0 but less than 8.1 C. at least 8.1 but less than 8.2 D. at least 8.2 but less than 8.3 E. at least 8.3 10.11 (1 point) An insured has 3 claims of sizes 95, 115, and 120. Which of the following is proportional to the posterior probability density function for this insured's mean expected severity? A. exp[-(m-100)2 /15.75] B. exp[-(m-105)2 /15.75] C. exp[-(m-100)2 /338] D. exp[-(m-105)2 /338] E. None of A, B, C, or D 10.12 (1 point) An insured has 3 claims of sizes 95, 115, and 120. What is the probability that this insured will have an expected future mean severity m between 110 and 115? A. less than 15% B. at least 15% but less than 25% C. at least 25% but less than 35% D. at least 35% but less than 45% E. at least 45% 10.13 (2 points) An insured has 3 claims of sizes 95, 115, and 120. Which of the following is proportional to the predictive distribution of the severity for this insured? A. exp[-(m-100)2 /65.76] B. exp[-(m-100)2 /15.75] C. exp[-(m-108.78)2 /65.76] D. exp[-(m-108.78)2 /15.75] E. None of A, B, C, or D
10.14 (1 point) An insured has 3 claims of sizes 95, 115, and 120. What is the probability that the next claim from this insured will be of size between 115 and 120? A. less than 15% B. at least 15% but less than 25% C. at least 25% but less than 35% D. at least 35% but less than 45% E. at least 45% 10.15 (2 points) An insured has 3 claims of sizes 95, 115, and 120. Using Bayesian Analysis what is the estimate of this insured's expected future claim severity? A. less than 97 B. at least 97 but less than 100 C. at least 100 but less than 103 D. at least 103 but less than 106 E. at least 106
Use the following information for the next two questions:
• The size of claim distribution for any particular policyholder is LogNormal, with parameters µ and σ = 1.7.
• The µ values of the portfolio of policyholders have a Normal Distribution with mean 3.8 and variance 2.25.
• For a particular policyholder you observe 5 claims of sizes: 120, 160, 210, 270, and 380.
10.16 (4 points) Use Bayes Analysis to estimate the expected future claim severity for this policyholder. Hint: Work with the log claim sizes and apply Bayes Analysis to the resulting Normal-Normal Conjugate Prior in order to get the predictive Normal distribution. Then convert the predictive Normal Distribution to a LogNormal Distribution. A. 400 B. 500 C. 600 D. 700 E. 800 10.17 (5 points) Use Buhlmann Credibility to estimate the expected future claim severity for this policyholder. A. 400 B. 500 C. 600 D. 700 E. 800
Use the following information to answer the next 4 questions:
• The size of claim distribution for any particular Manufacturing Class is assumed to be Normal with mean m and variance 1.2 million.
• The hypothetical means, m, are assumed to be normally distributed over the Manufacturing Industry Group: f(m) = exp[-(m-6000)²/68,450] / {185√(2π)}, -∞ < m < ∞.
• For the Widget Manufacturing Class, one observes 72 claims for a total of 350,000.
10.18 (2 points) Using Buhlmann Credibility, what is the estimated future average claim severity for the Widget Manufacturing Class?
Hint: The density of the Normal Distribution is: f(x) = exp[-(x-µ)²/(2σ²)] / {σ√(2π)}, -∞ < x < ∞.
A. Less than 5100 B. At least 5100, but less than 5200 C. At least 5200, but less than 5300 D. At least 5300, but less than 5400 E. At least 5400
10.19 (1 point) What is the mean of the posterior distribution of m? A. Less than 5100 B. At least 5100, but less than 5200 C. At least 5200, but less than 5300 D. At least 5300, but less than 5400 E. At least 5400 10.20 (1 point) What is the variance of the posterior distribution of m? A. Less than 8,000 B. At least 8,000, but less than 9,000 C. At least 9,000, but less than 10,000 D. At least 10,000, but less than 11,000 E. At least 11,000 10.21 (2 points) What is the probability that the Widget Manufacturing Class has an expected future mean severity m between 5000 and 5500? A. 94% B. 95% C. 96% D. 97% E. 98%
10.22 (2 points) You are given the following: • The IQs of actuaries are normally distributed with mean 135 and standard deviation 10. • Each actuaryʼs score on an IQ test is normally distributed around his true IQ, with standard deviation of 15. • Abbie the actuary scores a 155 on an IQ test. Using Buhlmann Credibility, what is the estimate of Abbieʼs IQ? A. 139 B. 141 C. 143 D. 145 E. 147
Solutions to Problems: 10.1. D. f(m) is a Normal density function with mean of 100 (and variance of 144.) Each Normal severity has a mean of m. Thus, Mean of the portfolio = E[m] = Mean of the prior Normal Distribution = 100. 10.2. E. f(m) is a Normal density function with mean of 100 and variance of 144. The variance of the portfolio is the sum of the variance of the prior Normal and the (fixed) variance of the Normal Severity Processes = 144 + 25 = 169. 10.3. B. f(m) is a Normal density function with mean of 100 and standard deviation of 12. Thus the probability that an insured picked at random from this portfolio will have a mean severity m between 110 and 120 is: Φ[(120 - 100)/12] - Φ[(110 - 100)/12] = Φ(1.67) - Φ(0.83) = 0.9525 - 0.7967 = 0.1558. Comment: Note that we picked an insured at random and asked about its expected mean severity. This is different than picking a claim at random and asking about its size. The former is the hypothetical mean with a distribution with a variance equal to the VHM (by definition), while the latter has a distribution with variance equal to the total variance. 10.4. A. The (prior) marginal distribution is a Normal, with mean of 100 and variance of: 144 + 25 = 169. Thus the probability density function that a claim chosen at random will be of size x is given by: φ((x-100)/13) = exp[-(x-100)2 /338] / {13 2 π }. 10.5. D. The (prior) marginal distribution is a Normal, with µ =100 and σ2 = 144 + 25 = 169. The standard deviation is
√169 = 13. Thus the chance of a claim being in the interval from 100 to 120 is:
Φ[(120-100)/13] - Φ[(100-100)/13] = Φ(1.54) - Φ(0) = 0.9382 - 0.5 = 0.4382. 10.6. B. We are given that each insured has a severity process with variance 25. Thus each process variance is 25 and so is their expected value over the portfolio. 10.7. D. The hypothetical mean severity for each insured is m. The distribution of m is given as Normal with mean of 100 and variance of 144. Thus the variance of the hypothetical mean severities is 144. Comment: Note that the total variance of 169, the variance of the (prior) marginal distribution, is equal to the sum of the (prior) EPV of 25 and the (prior) VHM of 144.
10.8. E. The Buhlmann Credibility Parameter K = EPV/VHM = (Variance of the Normal Likelihood)/(Variance of the Normal Prior Distribution) = 25/144. Thus for three claims, Z = 3/(3 + 25/144) = 432/457. The prior mean is 100. The observed mean is (95 + 115 + 120)/3 = 110. The Buhlmann Credibility Estimate = (110)(432/457) + (100)(25/457) = 50,020/457 = 109.45.
10.9. E. The posterior distribution is a Normal, with mean equal to: (Lσ² + µs²)/(Cσ² + s²) = {(330)(144) + (100)(25)}/{(3)(144) + 25} = 50,020/457 = 109.45.
10.10. A. The posterior distribution is a Normal, with variance equal to: s²σ²/(s² + Cσ²) = (25)(144)/{25 + (3)(144)} = 7.8775.
10.11. E. The posterior distribution is a Normal, with mean equal to: (Lσ² + µs²)/(Cσ² + s²) = {(330)(144) + (100)(25)}/{(3)(144) + 25} = 109.45, and variance equal to: s²σ²/(s² + Cσ²) = (25)(144)/{25 + (3)(144)} = 7.8775. Thus the posterior density is: exp[-(m-109.45)²/15.75] / {2.807√(2π)}.
10.12. D. The posterior distribution is a Normal, with mean equal to: (Lσ² + µs²)/(Cσ² + s²) = {(330)(144) + (100)(25)}/{(3)(144) + 25} = 109.45, and variance equal to: s²σ²/(s² + Cσ²) = (25)(144)/{25 + (3)(144)} = 7.8775. Thus the probability that this insured will have an expected future mean severity between 110 and 115 is: Φ[(115 - 109.45)/2.807] - Φ[(110 - 109.45)/2.807] = Φ(1.98) - Φ(0.20) = 0.9761 - 0.5793 = 0.3968.
10.13. E. The (posterior) predictive distribution is a Normal, with mean equal to: (Lσ² + µs²)/(Cσ² + s²) = {(330)(144) + (100)(25)}/{(3)(144) + 25} = 109.45, and variance equal to: s² + s²σ²/(s² + Cσ²) = 25 + (25)(144)/{25 + (3)(144)} = 32.8775. Thus the (posterior) predictive density is: exp[-(m-109.45)²/65.76] / {5.734√(2π)}.
Comment: The posterior total variance of 32.8775, the variance of the (posterior) predictive distribution, is equal to the sum of the posterior EPV of 25 and the posterior VHM of 7.8775.
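These solutions can be verified numerically; the sketch below (my own check, with Φ computed from the error function, not part of the original solutions) reproduces 10.8-10.14.

```python
import math

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma2, s2 = 100, 144, 25
claims = [95, 115, 120]
C, L = len(claims), sum(claims)

post_mean = (L * sigma2 + mu * s2) / (C * sigma2 + s2)
post_var = s2 * sigma2 / (C * sigma2 + s2)
pred_var = s2 + post_var

print(post_mean)   # 109.45  (10.8, 10.9, 10.15)
print(post_var)    # 7.88    (10.10)
print(pred_var)    # 32.88   (10.13)

# 10.12: probability the hypothetical mean is between 110 and 115.
sd = math.sqrt(post_var)
print(Phi((115 - post_mean) / sd) - Phi((110 - post_mean) / sd))        # about 0.40

# 10.14: probability the next claim is between 115 and 120.
sd_pred = math.sqrt(pred_var)
print(Phi((120 - post_mean) / sd_pred) - Phi((115 - post_mean) / sd_pred))  # about 0.13
```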
10.14. A. The (posterior) predictive distribution is a Normal, with mean equal to: (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(330)(144) + (100)(25)} / {(3)(144) + 25} = 109.45, and variance equal to: s2 + s2 σ2 / (s2 + Cσ2) = 25 + ((25)(144) / {25 + (3)(144)}) = 32.8775. The standard deviation is: 32.8775 = 5.734. Thus the chance of a claim being in the interval from 115 to 120 is: Φ[(120 - 109.45)/5.733] - Φ[(115 - 109.45)/5.733] = Φ(1.84) - Φ(0.97) = 0.9671 - 0.8340 = 0.1331. 10.15. E. The (posterior) predictive distribution is a Normal, with mean equal to: (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(330)(144) + (100)(25)} / {(3)(144) + 25} = 109.45. Comment: This mean of the (posterior) predictive distribution = Bayes Analysis Estimate = Buhlmann Credibility Estimate. 10.16. E. For any particular policyholder, the log of the claim sizes follows a Normal distribution with standard deviation of 1.7. The hypothetical means of these distributions are in turn Normally Distributed. Thus this is mathematically a Normal-Normal conjugate prior situation. The sum of the observed log claim sizes is: ln(120) + ln(160) + ln(210) + ln(270) + ln(380) = 4.787 + 5.075 + 5.347 + 5.598 + 5.940 = 26.747. Thus the (posterior) predictive distribution of the log claim sizes is a Normal, with mean equal to: (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(26.747)(2.25) + (3.8)(1.72 )} / {(5)(2.25) + 1.72 } = 71.16/14.14 = 5.03, and variance equal to: s2 + s2 σ2 / (s2 + Cσ2) = (1.72 ) + ((1.72 )(2.25) / {1.72 + (5)(2.25)}) = 2.89 + 0.460 = 3.35. Thus the standard deviation is: 3.35 = 1.83. The mean of the corresponding LogNormal Distribution with parameters 5.03 and 1.83 is: exp[5.03 + (0.5)(1.832 )] = exp(6.70) = 816. Comment: Very difficult. For an example of the use of the Normal-Normal Conjugate Prior in the context of a LogNormal claim severity, see for example pages 279-280 of “Credibility Using SemiParametric Models,” by Virginia R. Young, ASTIN Bulletin, Volume 27, No. 2, November 1997. The answer seems peculiar, since it is much larger than any of the observed claims. However, with long-tailed distributions, the overwhelming majority of claims are less than the mean. For example, the median of the posterior predictive LogNormal is exp(5.03) = 152. For this LogNormal, the probability of a claim being less than the mean of 816 is: Φ[(6.70-5.03)/1.83] = Φ(0.91) = 82%.
10.17. B. The process variance given µ is: Second Moment of LogNormal - Square of First Moment of LogNormal = Exp[2µ + (2)(1.72 )] - Exp[µ + (1.72 )/2]2 = Exp[2µ] (e5.78 - e2.89) = 305.77 Exp[2µ]. Therefore, EPV = 305.77 E[e2µ]. µ is Normal with mean 3.8 and standard deviation 1.5. Therefore, 2µ is Normal with mean 7.6 and standard deviation 3. Therefore, e2µ is LogNormal with parameters 7.6 and 3. Therefore, E[e2µ] is the mean of this LogNormal: Exp[7.6 + 32 /2] = e12.1 = 179,872. Therefore, EPV = (305.77)(179,872) = 55.00 million. The hypothetical mean given µ is: First Moment of LogNormal = Exp[µ + (1.72 )/2] = eµ e1.445. Therefore, VHM = Var[eµ] (e1.445)2 = Var[eµ] e2.89. µ is Normal with mean 3.8 and standard deviation 1.5. Therefore, eµ is LogNormal with parameters 3.8 and 1.5. Therefore, Var[eµ] is the variance of this LogNormal: Exp[(2)(3.8) + (2)(2.25)] - Exp[3.8 + 2.25/2]2 = e12.1 - e9.85 = 160,914. Therefore, VHM = 160,914 e2.89 = 2.895 million. K = EPV/VHM = 55.00/2.895 = 19.0. ⇒ Z = 5 / (5 + 19.0) = 20.8%. The hypothetical mean given µ is: eµ e1.445. eµ is LogNormal with parameters 3.8 and 1.5. Therefore, E[eµ] is the mean of this LogNormal: Exp[3.8 + 1.52 /2] = e4.925. Thus the prior mean is: E[eµ] e1.445 = e4.925 e1.445 = 584. Observed mean is: (120 + 160 + 210 + 270 + 380)/5 = 228. Estimate = (20.8%) (228) + (1 - 20.8%) (584) = 510. Comment: The estimate using Buhlmann Credibility is not equal to the estimate using Bayes Analysis. While Buhlmann is equal to Bayes in "Normal land”, they are not equal in "LogNormal land”. This is the case because linearity is not preserved under exponentiation. This is also why the mean of a LogNormal is not equal to the exponential of the mean of the underlying Normal. Working in “Normal land”, K = 2.89/2.25 = 1.284, n = 5, and Z = 79.6%. The mean of the log observed claims sizes is: 5.35. The prior mean of the log claim sizes is 3.8, the mean of the prior Normal. (5.35)(0.796) + (3.8)(1 - 0.796) = 5.03. This 5.03 is equal to the mean of the predictive Normal as given in my solution to the previous question. Then as per my solution to the previous question, one could get the variance of the predictive Normal, and proceed as I did to get the predictive LogNormal Distribution and the Bayes estimate of the future severity.
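A short script helps with the arithmetic in 10.16. The sketch below (my own, with variable names that are not from the text) works with the log claim sizes, applies the Normal-Normal update, and converts the predictive Normal to a LogNormal mean of roughly 816 (answer E); small differences from 816 come from rounding. As noted in the comment above, the Buhlmann estimate for 10.17 is computed on the original claim-size scale and so differs (about 510).

```python
import math

sigma_log = 1.7          # the fixed s of the Normal likelihood of the log claim sizes
mu0, var0 = 3.8, 2.25    # prior Normal on the LogNormal mu parameter
claims = [120, 160, 210, 270, 380]

logs = [math.log(c) for c in claims]
C, L = len(logs), sum(logs)
s2, sigma2 = sigma_log ** 2, var0

pred_mean_log = (L * sigma2 + mu0 * s2) / (C * sigma2 + s2)
pred_var_log = s2 + s2 * sigma2 / (C * sigma2 + s2)
print(pred_mean_log, pred_var_log)                        # about 5.03 and 3.35
print(math.exp(pred_mean_log + 0.5 * pred_var_log))       # about 816-819, answer E
```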
10.18. C. The process variance for each class is 1.2 million. The EPV is 1.2 million. f(m) is Normal, with µ = 6000 and σ = 185. The VHM is the variance of f(m), which is 1852 = 34225. Thus K = EPV/VHM = 1,200,000/ 34,225 = 35.06. Z = 72 / (72+35.06) = 0.673. The prior estimate is the mean of f(m) = 6000. The observed severity is 350000/72 = 4861. Thus the posterior estimate is: (4861)(0.673) + (6000)(1 - 0.673) = 5233. Comment: This is a Conjugate Prior situation with the likelihood a member of a linear exponential family, a Normal with fixed variance. Therefore, the Buhlmann Credibility estimate equals that from Bayesian Analysis, the mean of the posterior distribution of m, which is shown in the next solution. 10.19. C. The posterior distribution of m is a Normal, with mean equal to: (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(350000)(34225) + (6000)(1,200,000)} / {(72)(34225) + 1,200,000} = 19,178,750,000 / 3,664,200 = 5234. 10.20. E. The posterior distribution of m is a Normal, with variance equal to: s2 σ2 / (s2 + Cσ2) = (1,200,000)(34,225) / {1,200,000 + (72)(34,225)} = 11,208. 10.21. E. The posterior distribution of m is a Normal, with mean equal to 5234, and variance equal to 11208. Thus it has standard deviation of
√11,208 = 106. Thus the probability that this class has
an expected future mean severity between 5000 and 5500 is: Φ[(5500 - 5234)/106] - Φ[(5000 - 5234)/106] = Φ(2.51) - Φ(-2.21) = 0.9940 - (1 - 0.9864) = 0.9804. Comment: Thus (5000, 5500) is a reasonable interval estimate for the expected severity for the Widget Manufacturing Class. 10.22. B. The Expected Value of the Process Variance is: 152 = 225. The Variance of the Hypothetical Means is 102 = 100. Thus K = EPV/VHM = 225/100 = 2.25. Z = 1/(1+ 2.25) = 0.308. The prior estimate is 135 and the observation is 155. Thus the posterior estimate is: (155)(0.308) + (135)(1 - 0.308) = 141. Comment: Since this is a Conjugate Prior situation with the likelihood a member of a linear exponential family (a Normal with fixed variance), the Buhlmann Credibility estimate equals that from Bayesian Analysis. The Bayesian Analysis estimate is the mean of the posterior distribution of m, (Lσ2 + µs2 ) / (Cσ2 + s2 ) = {(155)(100) + (135)(225)} / {(1)(100) + 225} = 141, resulting in the same solution.
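The sketch below (again my own check, not part of the original solutions) reproduces 10.18 through 10.21 with the same update formulas.

```python
import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma2, s2 = 6000, 185 ** 2, 1_200_000
C, L = 72, 350_000

K = s2 / sigma2
Z = C / (C + K)
print(K, Z)                              # about 35.06 and 0.673
print(Z * (L / C) + (1 - Z) * mu)        # about 5233  (10.18)

post_mean = (L * sigma2 + mu * s2) / (C * sigma2 + s2)
post_var = s2 * sigma2 / (C * sigma2 + s2)
print(post_mean, post_var)               # about 5234 and 11,208  (10.19, 10.20)

sd = math.sqrt(post_var)
print(Phi((5500 - post_mean) / sd) - Phi((5000 - post_mean) / sd))   # about 0.98  (10.21)
```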
Section 11, Linear Exponential Families69
Linear exponential families include: Exponential, Poisson, Normal for fixed σ, Binomial for m fixed (including the special case of the Bernoulli, m = 1), Negative Binomial for r fixed (including the special case of the Geometric, r = 1), the Gamma for α fixed, and the Inverse Gaussian for θ fixed.
Definition: A linear exponential family is a family of probability density functions defined on a fixed interval such that f(x; ψ) = p(x) exp[r(ψ)x] / q(ψ), for the parameter ψ in a fixed interval.70 71
r(ψ) is called the “canonical parameter”. Thus the log density is of the form: ln f(x; ψ) = x r(ψ ) + ln[p(x)] - ln[q(ψ)]; the log density is the sum of three terms: x times the canonical parameter, a function of x, and a function of the parameter ψ. Note that f can be continuous, for example Exponential, or discrete, for example Bernoulli. Note that a constant multiplying f(x; ψ) can be absorbed into either p(x) or q(ψ). There are different ways to parameterize the density. The log density could be the sum of three terms: x times minus ψ, a function of x, and a function of the parameter. In that case, ψ would be called the “natural parameter”.72 In other words, if we take r(ψ) = -ψ, then f(x; ψ) = p(x) e-ψx / q(ψ). Any single parameter family of distributions that can be put in this form (and that has fixed support that does not depend on the parameter) is a linear exponential family. The so-called natural parameter is just a particular way of parametrizing such densities, which happens to be convenient for deriving various results for linear exponential families in general. 69
As discussed in Sections 5.4 and 15.5.3 of Loss Models. Linear Exponential Families come up in other applications, for example Generalized Linear Models (GLIM). See for example “A Primer on the Exponential Family of Distributions”, by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program. 70 The fixed interval for x can be -∞ to +∞, 0 to 1, 0 to ∞, etc. The key thing is that the boundaries can not depend in any way on ψ. 71 I have used ψ rather than θ, in order to avoid confusion where Loss Models has already used θ as a parameter of a distribution. 72 Different authors use different terminology.
For the Poisson with parameter λ, f(x; λ) = e^(-λ) λ^x / x!. Therefore, ln f(x; λ) = x ln(λ) - ln(x!) - λ. Therefore, the Poisson distribution is a member of a linear exponential family of the discrete type. If we set ψ = -ln(λ), then we have the Poisson in terms of its natural parameter; ln f(x; ψ) = -xψ - ln(x!) - e^(-ψ) = -xψ + ln[p(x)] - ln[q(ψ)], with p(x) = 1/x! and ln[q(ψ)] = e^(-ψ).
For the Exponential distribution: f(x; θ) = e^(-x/θ)/θ. ln f(x; θ) = -x/θ - ln θ. Therefore, the Exponential distribution is a member of a linear exponential family of the continuous type.73
Exercise: Show that the Normal Distribution with σ fixed and single parameter µ is a member of a linear exponential family.
[Solution: For the Normal Distribution with σ fixed and single parameter µ, f(x; µ) = exp[-(x-µ)²/(2σ²)] / {σ√(2π)}.
ln f(x; µ) = -0.5{(x-µ)/σ}2 - ln σ - 0.5 ln[2π] =
xµ / σ2 - 0.5 x2 / σ2 - 0.5 µ2 / σ2 - ln σ - 0.5 ln[2π]. This has only one term where x and the single parameter µ appear together and in that term x is linear, therefore, the Normal distribution for fixed variance is a member of a linear exponential family of the continuous type. If one lets ψ = -µ / σ2 , then ψ is the natural parameter of the Normal Distribution with σ fixed.] For the Binomial Distribution with m fixed and single parameter q, f(x; q) = m! qx (1-q)m-x / {x!(m-x)!}. ln f(x; q) = x ln[q] + (m-x) ln[1-q] + ln[m!] - ln[x!] - ln[m-x)!] = x {ln[q] - ln[1-q])} - ln[x!] - ln[(m-x)!] + m ln[1-q] + ln[m!]. Therefore, the Binomial Distribution with m fixed is a member of a linear exponential family of the discrete type. Specifically, with m = 1, the Bernoulli is a member of a linear exponential family of the discrete type. Exercise: What is the natural parameter for the Binomial Distribution with m fixed and single parameter q? [Solution: ln f(x; q) = x {ln[q] - ln[1-q])} - ln[x!] - ln[(m-x)!] + m ln[1-q] + ln[m!]. Thus the natural parameter is: -{ln[q] - ln[1-q]} = ln[(1-q)/q)] = ln[1/q - 1]. ] 73
With natural parameter ψ = 1/θ, q(ψ) = 1/ψ and p(x) = 1. 0 < x < ∞, 0 < ψ < ∞.
For the Negative Binomial Distribution with r fixed and single parameter β, f(x; p) = (x+r-1)! βx / {(1+β)x+r x!(r-1)!} ln f(x; p) = x ln β - (x+r)ln(1+β) + ln[(x+r-1)!] - ln x! - ln[(r-1)!] = x {lnβ − ln(1+β)} + ln[(x+r-1)!] - ln x! - ln[(r-1)!] - r ln(1+β). Therefore, the Negative Binomial Distribution with r fixed is a member of a linear exponential family of the discrete type. Specifically, with r = 1, the Geometric Distribution is a member of a linear exponential family of the discrete type. Exercise: What is the natural parameter for the Negative Binomial Distribution with r fixed and single parameter β? [Solution: ln f(x; β) = x {lnβ - ln(1+β)} + ln[(x+r-1)!] - ln x! - ln (r-1)! - r ln(1+β) . Thus the natural parameter is: -{ln[β] - ln[1+β]} = ln[(1+β)/β] = ln[1 + 1/β]. ]
Exercise: Let f(x; µ) = exp[- 5(x-µ)2 / (xµ2 )]
5 , x > 0. π x3
Is this density a member of a linear exponential family? [Solution: ln f(x) = - 5(x-µ)2 / xµ2 - (3/2)ln(x) + ln(5/π)/2 = -5x/µ2 + 10/µ - 5/x - (3/2)ln(x) + ln(5/π )/2. The term involving both x and µ is: -5x/µ2 . Since this is linear in x, this is a member of a linear exponential family. Comment: This is an Inverse Gaussian Distribution, with θ = 10.] Exercise: Let f(x; µ) = µ exp[-(1 - xµ)2 / (20x)] /
√(20π x), x > 0.
Is this density a member of a linear exponential family? [Solution: ln f(x) = ln(µ) - (1- xµ)2 / (20x) - ln(20π x) / 2 = ln(µ) - 1/20x + µ/10 - xµ2 /20 - ln(20π x) /2. This is indeed a member of an Exponential Family, and the term involving both x and µ is: xµ2 /20. Since this is linear in x, this is a member of a linear exponential family. Comment: This is a Reciprocal Inverse Gaussian Distribution, with one parameter fixed, as discussed Insurance Riskʼs Models by Panjer & Willmot, not on the syllabus.]
Cases where Bayes = Buhlmann: If the likelihood density is a member of a linear exponential family and the conjugate prior distribution is used as the prior distribution, then the Buhlmann Credibility estimate is equal to the corresponding Bayesian estimate (for the squared error loss function.)74 Specifically, this applies to the Gamma-Poisson, Beta-Bernoulli, Inverse Gamma-Exponential, and the Normal-Normal (fixed variance).75 Loss Models uses the term exact credibility to refer to situations where the Buhlmann Credibility estimate are identical to the corresponding Bayesian estimate (for a squared error loss function.) Since the Buhlmann Credibility estimates are the least squares line fit to the Bayesian estimates, the two are identical if and only if the Bayesian estimates are along a straight line. Mean and Variance of a Linear Exponential Family: One can derive general formulas for the mean and variance76 of any member of a linear exponential family. Assume one has a likelihood from a linear exponential family: f(x; ψ) = p(x)e-ψx / q(ψ), with x in some fixed interval independent of ψ. For example if f(x; ψ) is a Poisson, then ψ = - ln(λ), p(x) = 1/ x! and ln[q(ψ)] = e-ψ, x = 0,1,2... Thus λ = e−ψ and q(ψ) = exp[e-ψ] = eλ. q(ψ) is the normalizing constant such that the density function f(x) integrates to unity over its domain77: 1 = ∫f(x; ψ) dx = ∫p(x)e-ψx / q(ψ) dx = (1/q(ψ)) ∫p(x)e-ψx dx. Therefore, q(ψ) =
∫ p(x)e-ψx dx.
Exercise: Verify the above equation for the Poisson. However, since the Poisson is discrete, substitute summation for integration.
[Solution: q(ψ) = exp[e^(-ψ)] = e^λ. Summing from x = 0 to ∞: Σ p(x) e^(-ψx) = Σ λ^x / x! = e^λ = q(ψ).]
This result is demonstrated subsequently. It also applies for example to the Beta-Binomial (fixed m), and the Beta-Negative Binomial (fixed r).
This result is demonstrated subsequently. It also applies for example to the Beta-Binomial (fixed m), and the Beta- Negative Binomial (fixed r) . 76 As well as higher moments. 77 For convenience I have written this in terms of integrals. For discrete density functions such as the Poisson, summation is substituted for integration. 75
Differentiating the above equation with respect to the natural parameter ψ, one obtains:78 qʼ(ψ) = -∫ p(x) x e^(-ψx) dx.
The mean of f(x; ψ) depends on ψ and is computed as follows: µ(ψ) = ∫ x f(x; ψ) dx = {1/q(ψ)} ∫ x p(x) e^(-ψx) dx = -qʼ(ψ)/q(ψ). Therefore:79
µ(ψ) = -qʼ(ψ)/q(ψ) = -d ln[q(ψ)] / dψ.
For example for the Poisson, ln[q(ψ)] = e^(-ψ), and -d ln[q(ψ)]/dψ = e^(-ψ) = λ, which is in fact the mean of the Poisson Distribution. Thus as shown in Loss Models, we have computed the mean of a linear exponential family in terms of q(ψ), its normalizing function in terms of its natural parameter.
In a similar manner we can obtain formulas for the second moment and the variance. We had: qʼ(ψ) = -∫ p(x) x e^(-ψx) dx. Differentiating again with respect to ψ we obtain: qʼʼ(ψ) = ∫ p(x) x² e^(-ψx) dx = q(ψ) ∫ f(x; ψ) x² dx = q(ψ) E[X²].
Therefore, the second moment is: E[X²] = qʼʼ(ψ)/q(ψ).
Therefore the variance = v(ψ) = E[X²] - E[X]² = qʼʼ(ψ)/q(ψ) - {qʼ(ψ)/q(ψ)}² = {qʼʼ(ψ)q(ψ) - qʼ(ψ)²}/q(ψ)² = d{qʼ(ψ)/q(ψ)}/dψ = d² ln[q(ψ)] / dψ².
v(ψ) = d² ln[q(ψ)] / dψ² = -µʼ(ψ).80
78 Note that for a member of a linear exponential family, the domain of integration does not depend on the parameter ψ. If it did, the derivative with respect to ψ of the integral would contain additional terms and the result obtained here would not apply.
79 See Equation 5.8 in Loss Models, with r(ψ) = -ψ.
80 See Equation 5.9 in Loss Models, with r(ψ) = -ψ. Since the variance is > 0, µʼ(ψ) < 0, and µ(ψ) is strictly decreasing in ψ. For example, for the Exponential Distribution the mean is the inverse of the natural parameter.
Exercise: Verify the above equation for the variance in the case of a Poisson.
[Solution: For the Poisson, ln[q(ψ)] = e^(-ψ), and d² ln[q(ψ)]/dψ² = e^(-ψ) = λ, which is in fact the variance of the Poisson.]
More generally, the rth cumulant, κr = (-1)^r d^r ln[q(ψ)]/dψ^r = (-1)^(r-1) d^(r-1) µ(ψ)/dψ^(r-1) = -dκr-1/dψ.81
Exercise: Use the above relationship to determine the skewness of a Gamma Distribution, λ^α x^(α-1) e^(-λx)/Γ(α), with natural parameter λ and mean α/λ.
[Solution: κ2 = -d(α/λ)/dλ = α/λ². κ3 = -d(α/λ²)/dλ = 2α/λ³. Skewness = κ3/κ2^1.5 = 2/√α.]
81 See Volume 1 of Kendallʼs Advanced Theory of Statistics, by Stuart and Ord.
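A quick simulation (my own check; the seed, sample size, and parameter values are arbitrary) confirms that the skewness of a Gamma with shape α is 2/√α, regardless of the rate parameter.

```python
import math
import random

random.seed(1)
alpha, lam = 3.0, 0.5
# random.gammavariate takes (shape, scale), so the scale is 1/lam.
sample = [random.gammavariate(alpha, 1 / lam) for _ in range(200_000)]

m = sum(sample) / len(sample)
m2 = sum((x - m) ** 2 for x in sample) / len(sample)
m3 = sum((x - m) ** 3 for x in sample) / len(sample)

print(m3 / m2 ** 1.5)          # simulated skewness, about 1.15
print(2 / math.sqrt(alpha))    # theoretical value 2/sqrt(3) = 1.155
```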
Equality of Maximum Likelihood & Method of Moments: For Linear Exponential Families, the Methods of Maximum Likelihood and Moments produce the same result when applied to ungrouped data, in the absence of truncating or censoring. Thus there are many cases where one can apply the method of maximum likelihood to ungrouped data by instead performing the simpler method of moments: Exponential, Poisson, Normal for fixed σ, Binomial for m fixed (including the special case of the Bernoulli, m=1) , Negative Binomial for r fixed (including the special case of the Geometric, r =1), and the Gamma for α fixed. Demonstration that Maximum Likelihood Equals Method Moments: This useful fact is demonstrated as follows. Assume one has a likelihood from a linear exponential family: f(x; ψ) = p(x)e-ψx / q(ψ), with x in some fixed interval independent of ψ. Then as shown above, the mean is given by: µ(ψ) = -qʼ(ψ) / q(ψ). Since we have a single parameter ψ, the method of moments consists of setting this equal to the observed mean. In other words (1/n)Σ xi = -qʼ(ψ) / q(ψ) . Now the Method Maximum Likelihood consists of maximizing the sum of the log densities. ln f(xi) = -xiψ + ln[p(xi)] - ln[q(ψ)].
∂ ln f(xi) / ∂ψ = -xi - qʼ(ψ) / q(ψ). Setting the partial derivative, with respect to the single parameter ψ, of the loglikelihood equal to zero: 0 = Σxi + nqʼ(ψ) / q(ψ). Therefore, (1/n)Σ xi = -qʼ(ψ) / q(ψ). This is the same equation as obtained for the method of moments. The mean is therefore a sufficient statistic for any linear exponential family. In other words, all of the information in the data useful for estimating the parameter is contained in the mean of the data. If we wish to estimate the parameter, once we have the mean, we can ignore the individual data points.
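The equality of the two estimators is easy to see numerically for the Exponential. In the sketch below (illustrative data and a crude grid search of my own, not from the text), the θ that maximizes the likelihood coincides with the sample mean, the method of moments estimate.

```python
import math

data = [1.2, 0.7, 3.4, 2.1, 0.4, 5.6, 1.9]

def negloglik(theta):
    """Negative log-likelihood of an Exponential with mean theta."""
    return sum(x / theta + math.log(theta) for x in data)

# Crude one-dimensional search over a grid of candidate theta values.
thetas = [0.01 * i for i in range(1, 1001)]
mle = min(thetas, key=negloglik)

print(mle)                     # about 2.19
print(sum(data) / len(data))   # the sample mean, also about 2.19
```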
A Table of Some Linear Exponential Families (Natural Parameter ψ,82 q(ψ), and p(x) for each):

Poisson: density e^(-λ) λ^x/x!; ψ = -ln(λ); q(ψ) = exp[e^(-ψ)]; p(x) = 1/x!.
Exponential: density e^(-x/θ)/θ; ψ = 1/θ; q(ψ) = 1/ψ; p(x) = 1.
Gamma, fixed α: density θ^(-α) x^(α-1) e^(-x/θ)/Γ(α); ψ = 1/θ; q(ψ) = ψ^(-α); p(x) = x^(α-1)/Γ(α).
Normal, fixed σ: density exp[-.5{(x-µ)/σ}²]/{σ√(2π)}; ψ = -µ/σ²; q(ψ) = exp(0.5ψ²σ²); p(x) = exp(-.5x²/σ²)/{σ√(2π)}.
Bernoulli:83 density q^x (1-q)^(1-x); ψ = ln[(1-q)/q]; q(ψ) = 1 + e^(-ψ); p(x) = 1.
Binomial, fixed m: density m! q^x (1-q)^(m-x)/{x!(m-x)!}; ψ = ln[(1-q)/q]; q(ψ) = (1 + e^(-ψ))^m; p(x) = m!/{x!(m-x)!}.
Geometric: density β^x/(1+β)^(x+1); ψ = ln(1 + 1/β); q(ψ) = 1/(1 - e^(-ψ)); p(x) = 1.
Negative Binomial, fixed r: density (x+r-1)! β^x/{(1+β)^(x+r) x!(r-1)!}; ψ = ln(1 + 1/β); q(ψ) = (1 - e^(-ψ))^(-r); p(x) = (x+r-1)!/{x!(r-1)!}.
Inverse Gaussian, fixed θ: density √[θ/(2π)] x^(-1.5) exp[-θx/(2µ²) + θ/µ - θ/(2x)]; ψ = θ/(2µ²); q(ψ) = exp[-√(2θψ)]; p(x) = √[θ/(2π)] x^(-1.5) exp[-θ/(2x)].
Loss Models uses θ in order to designate the natural parameter. I have used ψ instead, in order to avoid confusion with the use of θ as a parameter in the Exponential, Gamma and Inverse Gaussian. 83 In the Bernoulli and the Binomial do not confuse the parameter q with the function q(ψ), used to describe the linear exponential family. 82
Conjugate Priors: Assume one has a likelihood from a linear exponential family: f(x; ψ) = p(x)e-ψx / q(ψ), with x in some fixed interval independent of ψ. Then one can write down a corresponding conjugate prior distribution of the natural parameter ψ:84 π(ψ) = q(ψ)-K e-Kµψ / c(µ, K). Where c is a normalizing constant, which depends on µ and K, the two parameters of the Conjugate Prior distribution π(ψ).
c(µ, K) = ∫q(ψ)-K e-Kµψ dψ.
It will turn out that K is the Buhlmann Credibility Parameter, while µ is the a priori mean = E[µ(ψ)].
For example, if we have a Poisson likelihood, with parameter λ, then the natural parameter ψ = -ln(λ) and q(ψ) = exp[e−ψ] = eλ. Thus π(ψ) = q(ψ)-K e-Kµψ / c(µ, K) = exp[-Ke−ψ] exp[-Kµψ] / c(µ, K). In order to put this density function in terms of λ rather than ψ, we need to change variables, remembering to multiply by |dψ/dλ| = 1/λ. exp[-Ke−ψ] = e−λK. exp[-Kµψ] = exp[Kµ ln(λ)] = λKµ. Thus, in terms of lambda, this supposed conjugate prior is: e−λK λKµ − 1 / c(µ, K). Which we recognize as a Gamma Distribution with parameters θ = 1/K and α = Kµ. In this case, the normalizing constant is: c(µ, K) = Γ(α) θα = Γ(Kµ) K-Kµ. As was discussed previously, the Gamma is in fact a conjugate prior to the Poisson. The Buhlmann Credibility Parameter K is in fact 1/θ. µ is in fact the a priori mean, which is equal to that of the Gamma, αθ = Kµ/K = µ.
84
See Theorem 15.24 of Loss Models. It will be shown that as claimed π(ψ) is in fact a conjugate prior for f.
Marginal Distribution: The marginal distribution is the integral of the likelihood times the prior distribution.
∫f(x; ψ)π(ψ) dψ = ∫{p(x)e-ψx/q(ψ)} q(ψ)-K e-Kµψ / c(µ, K) dψ = {p(x)/c(µ, K)} ∫e-ψ(x+Kµ) q(ψ)-(K+1) dψ = p(x) c((x + Kµ)/(K+1), K+1) / c(µ, K). Where we have used the fact that c(a, b) = ∫q(ψ)-b e-abψ dψ. Thus the marginal distribution is: p(x) c((x + Kµ)/(K+1), K+1) / c(µ, K). For example, in the case of the Gamma-Poisson, p(x) = 1/x! and c(µ, K) = Γ( Kµ) K-Kµ. Thus, p(x) c((x + Kµ)/(K+1), K+1) / c(µ, K) = (1/x!) (Γ(x + Kµ) (K+1)-(x+Kµ)} / {Γ( Kµ) K-Kµ} = {Γ(x + Kµ) / (x! Γ( Kµ))} (K+1)-(x+Kµ) KKµ = {Γ(x + α) / (x! Γ(α))} (1+ 1/θ)-(x+α) θ−α = {(x+α-1)! /(x! (α-1)!)} θx / (1+θ)x+α. This is a Negative Binomial Distribution, with r = α and β = θ, which as discussed previously is in fact the marginal distribution of the Gamma- Poisson. A Priori Mean: We can determine the a priori mean, which is the mean of the marginal distribution as well as E[µ(ψ)]. Since µ(ψ) = -qʼ(ψ)/q(ψ),
E[µ(ψ)] = ∫µ(ψ) π(ψ) dψ = -∫(qʼ(ψ)/q(ψ)) q(ψ)-K e-Kµψ / c(µ, K) dψ = (-1/c(µ, K)) ∫qʼ(ψ) q(ψ)-(K+1) e-Kµψ dψ.
Using integration by parts,85 this equals: (-1/c(µ, K)) {-q(ψ)-K e-Kµψ /K ] - µ∫q(ψ)-K e-Kµψ dψ} = π(ψ)/K ] + µ∫π(ψ) dψ.
The first term is the difference of π(ψ) at its limits of support. For all of the examples we are dealing with, π(ψ) is equal (and in fact zero) at the endpoints of its support. Thus this first term vanishes and therefore, E[µ(ψ)] = µ∫π(ψ) dψ = (µ)(1) = µ.
85 With du = qʼ(ψ) q(ψ)-(K+1) dψ and v = e-Kµψ.
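Returning to the Gamma-Poisson example, the claim that the marginal distribution is Negative Binomial can also be checked numerically. The sketch below (my own illustration; α, θ, and the integration grid are arbitrary choices) mixes a Poisson over a Gamma prior and compares the result to the Negative Binomial probabilities with r = α and β = θ.

```python
import math

alpha, theta = 2.0, 1.5

def gamma_pdf(lam):
    """Gamma density with shape alpha and scale theta."""
    return lam ** (alpha - 1) * math.exp(-lam / theta) / (math.gamma(alpha) * theta ** alpha)

def neg_binomial(x):
    """Negative Binomial probability with r = alpha and beta = theta."""
    return (math.gamma(x + alpha) / (math.factorial(x) * math.gamma(alpha))
            * theta ** x / (1 + theta) ** (x + alpha))

dlam = 0.001
for x in range(5):
    mixed = sum(math.exp(-lam) * lam ** x / math.factorial(x) * gamma_pdf(lam) * dlam
                for lam in [i * dlam for i in range(1, 40001)])   # lambda from 0 to 40
    print(x, round(mixed, 5), round(neg_binomial(x), 5))          # the two columns agree
```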
Thus µ is in fact the a priori mean.86 Note that we have also shown that the mean of the marginal distribution is µ. Posterior Distribution: Next let us determine the posterior distribution of ψ. Assume we have n observations: x1 , ..., xn , which sum to S. Then by Bayes Theorem the posterior distribution is proportional to: π(ψ) f(x1 ;ψ) ...f(xn ;ψ). This is proportional to: q(ψ)-K e-Kµψ {exp[-ψx1 ] / q(ψ)}... {exp[-ψxn ] / q(ψ)} = q(ψ)-(K+n) e-(Kµ+S)ψ. Thus the posterior distribution has the same form as the prior distribution, but with new parameters: Kʼ = K + n = K + number of observations, and µʼ = (Kµ + S) / (K + n). Thus π(ψ) is indeed a conjugate prior for the linear exponential family f(x; ψ). For example, in the case of the Gamma-Poisson, K = 1/θ and µ = αθ. As was discussed previously, the parameters of the posterior Gamma are 1/θʼ = 1/θ + E = K + n, and αʼ = α + C = Kµ + S. Thus Kʼ = 1/θʼ = 1/θ + E = K + number of observations. µʼ = αʼθʼ = (α + C) / (1/θ + E) = (Kµ + S) / ( K+ n). So the general formula to update the parameters of the conjugate prior distribution works in the particular case of the Gamma-Poisson. Why Buhlmann = Bayesian for Conjug. Priors with Likelihoods from Linear Expon. Families:87 Since the posterior density of ψ has the same form as the prior density, one can obtain this Bayesian Estimate by substituting in the expression for the expected value of µ(ψ) the posterior parameters rather than the prior parameters. Recall that prior to any observations the expected value of µ(ψ) is µ. Since posterior to observations µʼ = (Kµ+S) / (K+n), the posterior expected value of µ(ψ) is: (Kµ + S) / (K + n).
That is why the letter µ was chosen for this parameter of the prior distribution. See Theorem 15.24 in Loss Models. This is also discussed in “A Teacherʼs Remark on Exact Credibility” by Hans Gerber, ASTIN Bulletin, November 1995. 86 87
For example, in the case of the Gamma-Poisson, this posterior estimate is: (α + number of claims) / (1/θ + number of exposures). The estimate based solely on the observations is: (sum of observations)/(number of observations) = S/n. The estimate prior to observations is the prior expected value of µ(ψ), which is µ. We can rewrite the posterior Bayesian estimate as a linear combination of these two estimates: Posterior Bayesian Estimate = (Kµ+S) / (K+n) = {n/ (K+n)} S/n) + {K/ (K+n)} µ = {n / (n+K)} {estimate based on observations} + (1 - {n / (n+K)}) (prior estimate) For example, in the case of the Gamma-Poisson the posterior estimate = (α + C) / (1/θ + E) = {E / (E + 1/θ)} (C/E) + {1- (E/(E + 1/θ))} (αθ).88 Thus the Bayesian estimator is a linear function of the observation and the prior estimate. In general, the Buhlmann Credibility estimate is the least squares linear approximation to the Bayesian estimate.89 When as is the case here, the Bayesian estimate is itself linear, then it is fit exactly by the least squares line, and the Buhlmann Credibility estimate is equal to the Bayesian estimate. Thus it has been shown that if the likelihood density is a member of a linear exponential family and the conjugate prior distribution is used as the prior distribution, then the Buhlmann Credibility estimate is equal to the corresponding Bayesian estimate. Thus this is an example of what Loss Models refers to as exact credibility. The credibility is the weight given to the observations; Z = n / (n+K). Thus the Buhlmann Credibility parameter K is equal to one of the parameters used to construct the Conjugate Prior π(ψ; µ, K) above.90
88
Where C = number of claims observed, E = number of exposures observed. “Linear” here refers to a linear function of the observation and the prior estimate. 90 This was why this parameterization of the conjugate prior was used. In the case of the Gamma-Poisson, K = 1/θ and Z = E/(E + 1/θ). 89
Problems: 11.1 (1 point) Which of the following statements are true? 1. If H is some hypothesis and B is an event, then the posterior probability of H is proportional to the product of the conditional probability of B given H and the prior probability of H. 2. When the conditional distribution of the current observations is given by a Binomial distribution, the Beta distribution is a conjugate prior distribution. 3. A probability density function which is non-zero on an interval and such that f(x; θ) is proportional to exp[sin(θ)x + 3x4 + 2x3 + cos(θ)] is a linear exponential family. A. 1, 2
B. 1, 3
C. 2, 3
D. 1, 2, and 3
E. None of A, B, C, or D.
11.2 (2 points) Which of the following are linear exponential families? A. A LogNormal Distribution with µ unknown and σ = 3. B. A Pareto Distribution with θ unknown and α = 4. C. A Gamma Distribution with θ unknown and α = 5. D. A Weibull Distribution with θ unknown and τ = 6. E. None of A, B, C, or D. 11.3 (4, 5/85, Q.42) (1 point) You are given that the likelihood density distribution is a member of a linear exponential family and that the conjugate prior distribution is used as the prior distribution. Which of the following statements are true? 1) It is possible that the likelihood density distribution is a Poisson distribution, and the prior distribution is a Gamma distribution. 2) It is possible that the likelihood density distribution is a negative binomial distribution, and the prior distribution is a beta distribution. 3) The Buhlmann credibility estimate is equal to the corresponding Bayesian estimate. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 11.4 (4, 5/86, Q.41). (1 point) Which of the following statements are true? 1. Buhlmann's credibility estimates are the best linear approximation to Bayesian estimates. 2. The normal distribution is a member of a linear exponential family. 3. In the case of a normal likelihood and a normal conjugate prior distribution, the Buhlmann and Bayesian credibility estimates are equal. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3.
11.5 (4, 5/87, Q.43) (1 point) Let u be in the interval c < u < d. Let x be in the interval a < x < b. Which of the following define the probability density form of a linear exponential family? A. f(x; u) = exp[p(u)A(x) + B(x) + q(u)] B. f(x; u) = exp[p(u)x + B(x) + q(u)] C. f(x; u) = exp[p(u)A(x) + B(x)] D. f(x; u) = exp[p(u)x] + B(x) + q(u) E. f(x; u) = exp[B(x) + q(u) ] 11.6 (4B, 11/92, Q.28) (2 points) You are given the following:
• The conditional distribution f(x|θ) is a member of the linear exponential family.
• The prior distribution of θ, g(θ), is a conjugate prior with f(x|θ).
• E[X] = 1.
• E[X | X1 = 4] = 2, where X1 is the value of a single observation.
• The expected value of the process variance, Eθ[Var(X | θ)] = 3.
Determine the variance of the hypothetical means, Varθ(E[X | θ]).
A. At least 0 but less than 2
B. At least 2 but less than 5
C. At least 5 but less than 8
D. At least 8
E. Cannot be determined.
11.7 (4B, 11/94, Q.2) (3 points) You are given the following: The density function g(x|α) is a member of the linear exponential family. The prior distribution of α, h(α), is a conjugate prior distribution with the density function, g(x|α). X1 and X2 are distributed via g(x|α), and E[X1 ] = 0.5 and E[ X2 | X1 = 3] = 1.00. The variance of the hypothetical means, Varα(E[X | α]) = 6. Determine the expected value of the process variance, Eα[Var(X | α)]. A. Less than 2.0 B. At least 2.0, but less than 12.0 C. At least 12.0, but less than 22.0 D. At least 22.0 E. Cannot be determined from the given information.
Solutions to Problems:

11.1. D. 1. True. Bayes Theorem. 2. True. 3. True. It is an exponential family. Inside the exponential, the term involving both x and the single parameter θ is linear in x.

11.2. C.
A. For the LogNormal, f(x) = exp[-0.5 ({ln(x) − µ}/σ)^2] / (xσ√(2π)). For σ = 3, ln f(x) = -0.5 ({ln(x) − µ}/3)^2 - ln(3x√(2π)). The term involving both x and µ is µ ln(x)/9, which is not linear in x. Thus the LogNormal for fixed σ is not a member of a linear exponential family.
B. For the Pareto, f(x) = α θ^α (θ + x)^-(α + 1). For α = 4, ln f(x) = ln(4) + 4 ln(θ) - 5 ln(θ+x). The term involving both x and θ is -5 ln(θ+x), which is not linear in x. Thus the Pareto for fixed α is not a member of a linear exponential family.
C. For the Gamma, f(x) = θ^-α x^(α-1) e^(-x/θ) / Γ(α). For α = 5, ln f(x) = -5 ln(θ) + 4 ln(x) - x/θ - ln(24). The term involving both x and θ is -x/θ, which is linear in x. (The distribution is defined on the fixed interval 0 < x < ∞, and θ is in the fixed interval 0 < θ < ∞.) Thus the Gamma for fixed α is a member of a linear exponential family.
D. For the Weibull, f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x. For τ = 6, ln f(x) = ln(6) + 5 ln(x) - 6 ln(θ) - (x/θ)^6. The term involving both x and θ is (x/θ)^6, which is not linear in x. Thus the Weibull for fixed τ ≠ 1 is not a member of a linear exponential family.
Comment: The Gamma for α = 1 is an Exponential Distribution, a linear exponential family. The Weibull for τ = 1 is an Exponential Distribution, a linear exponential family.

11.3. E. 1. True. The Gamma-Poisson is an example of a conjugate prior and the Poisson is a member of a linear exponential family. 2. True. The Negative Binomial with r fixed is a member of a linear exponential family and the Beta Distribution is a conjugate prior to it. 3. True.

11.4. E. 1. True. 2. True for a Normal with known variance. 3. True.

11.5. B. f(x; u) = exp[p(u)x + B(x) + q(u)].
Comment: Choice A is the form of an exponential family; for A(x) = x, one has a linear exponential family. The form in Loss Models parameterizes using the natural parameter, rather than p(u). A linear exponential family may be parameterized using the natural parameter, but it does not have to be parameterized in that convenient manner.
11.6. A. For the given situation, the estimate based on Bayesian Analysis is equal to that from Buhlmann Credibility. After a single observation of 4, the new Bayesian Estimate is 2, which is also the Buhlmann Credibility estimate. The prior mean is 1. Thus, 2 = Z(4) + (1-Z)(1). Therefore Z = 1/3. With one observation Z = 1 / (1+K). Thus since Z = 1/3, K = 2. But K = EPV / VHM and we are given EPV = 3. Thus VHM = 3/2. Comment: Eθ refers to the expected value averaging over the different values of θ. Similarly, Varθ refers to taking the second central moment over all the different values of θ. 11.7. D. Since we are dealing with conjugate priors and a member of a linear exponential family, the Bayesian estimate is equal to the Buhlmann Credibility Estimate. Our prior mean is 0.5 and if our observation is 3, then the new estimate is 1. Therefore: 1 = 3Z + 0.5(1 - Z). ⇒ Z = 20%. For one observation Z = 1/(1 + K). ⇒ K = 4. But K = Expected Value of the Process Variance / Variance of the Hypothetical Means. Therefore 4 = Expected Value of the Process Variance / 6. Therefore the Expected Value of the Process Variance = 24 Comment: Multiple concepts, difficult. X1 and X2 are intended to be the prior and posterior variables. Thus E[X1 ] is the mean prior to any observation. E[X2 | X1 = 3] is the mean after an observation of 3.
Section 12, Overview of Conjugate Priors

Four examples of conjugate priors have been discussed: the Gamma-Poisson and Beta-Bernoulli for frequency, and the Inverse Gamma-Exponential and the Normal-Normal for severity. In each case the posterior distribution is of the same form as the prior distribution.
Updating:

Thus in each case one can update this distribution from one year to the next, with one yearʼs posterior distribution becoming the next yearʼs prior distribution. For example, for a Gamma-Poisson assume we start with a prior Gamma with parameters: α = 2 and θ = 1/20. If during the first year we observe 3 claims on 16 exposures,91 then the posterior Gamma has α = 2 + 3 = 5, and 1/θ = 20 + 16 = 36. This becomes the prior distribution for the second year. If one now observes in the second year 5 claims on 18 exposures, the posterior Gamma has α = 5 + 5 = 10, and 1/θ = 36 + 18 = 54. One can continue in this manner. Note that one can also get the posterior Gamma after two years by using the observation over the two years combined, 8 claims on 34 exposures, to update the original prior Gamma and get: α = 2 + 8 = 10 and 1/θ = 20 + 34 = 54.
Gamma-Poisson, Updating One Year at a Time:

Prior: α = 2, 1/θ = 20. Mean Frequency: 10.0%.
[Graph of the prior Gamma density of the expected frequency, shown from 0 to 0.4.]

91 Perhaps these are the robbery claims from a small chain of convenience stores.
Year 1: 3 Claims, 16 Exposures. Posterior: α = 5, 1/θ = 36. Mean Frequency: 13.9%.
[Graph of the posterior Gamma density.]

Year 2: 5 Claims, 18 Exposures. Posterior: α = 10, 1/θ = 54. Mean Frequency: 18.5%.
[Graph of the posterior Gamma density.]

Year 3: 4 Claims, 20 Exposures. Posterior: α = 14, 1/θ = 74. Mean Frequency: 18.9%.
[Graph of the posterior Gamma density.]
Year 4: 4 Claims, 22 Exposures. Posterior: α = 18, 1/θ = 96. Mean Frequency: 18.8%.
[Graph of the posterior Gamma density.]
Year 5: 3 Claims, 20 Exposures. Posterior: α = 21, 1/θ = 116. Mean Frequency: 18.1%.
[Graph of the posterior Gamma density.]
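The year-by-year updating shown above is easy to reproduce. The following short sketch is an added illustration (not from the original text): it applies α → α + claims and 1/θ → 1/θ + exposures one year at a time and prints the posterior parameters and mean frequencies quoted in the captions above.

```python
# Sequential Gamma-Poisson updating, reproducing the five years of figures above.
alpha, inv_theta = 2.0, 20.0                                  # prior: alpha = 2, 1/theta = 20
yearly_data = [(3, 16), (5, 18), (4, 20), (4, 22), (3, 20)]   # (claims, exposures) per year

print(f"Prior: alpha = {alpha:.0f}, 1/theta = {inv_theta:.0f}, mean = {alpha/inv_theta:.1%}")
for year, (claims, exposures) in enumerate(yearly_data, start=1):
    alpha += claims            # add observed claims to the shape parameter
    inv_theta += exposures     # add observed exposures to 1/theta
    print(f"Year {year}: alpha = {alpha:.0f}, 1/theta = {inv_theta:.0f}, "
          f"mean frequency = {alpha/inv_theta:.1%}")
```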
Other Conjugate Priors:

As discussed previously, the Beta distribution is a conjugate prior to the Binomial Distribution for fixed m. If for fixed m, the q parameter of the Binomial is distributed over a portfolio by a Beta, then the posterior distribution of q parameters is also given by a Beta. For m = 1, we get the special case of the Beta-Bernoulli.
The Beta distribution is also a conjugate prior to the Negative Binomial Distribution for fixed r. If for fixed r, 1/(1+β) of the Negative Binomial is distributed over a portfolio by a Beta,92 then the posterior distribution of 1/(1+β) parameters is also given by a Beta.93 For r = 1, one gets the special case of the Beta-Geometric. One could equally well have β/(1+β) = 1 - 1/(1+β) distributed via a Beta.
The marginal distribution is called a Generalized Waring Distribution, with density:
f(x) = Γ(k+x) Γ(a+b) Γ(a+k) Γ(b+x) / {Γ(k) x! Γ(a) Γ(b) Γ(a+k+b+x)}, x = 0, 1, 2, ...
The Inverse Gamma Distribution is a conjugate prior to the Gamma Distribution. (The Inverse Gamma-Exponential is a special case.) The marginal distribution and the predictive distribution are each Generalized Pareto Distributions.
There are many other conjugate prior situations.94 Here are some examples, involving continuous mixtures of severity distributions. In each case the scale parameter θ is being mixed, with the other parameters in the severity distribution held fixed.

Severity | Prior Distribution | Marginal Distribution
Exponential | Inverse Gamma: α, θ | Pareto: α, θ
Inverse Exponential | Gamma: α, θ | Inverse Pareto: τ = α, θ
Weibull, τ = t | Inverse Transformed Gamma: α, θ, τ = t | Burr: α, θ, γ = t
Inverse Weibull, τ = t | Transformed Gamma: α, θ, τ = t | Inverse Burr: τ = α, θ, γ = t
Gamma, α = a | Inverse Gamma: α, θ | Generalized Pareto: α, θ, τ = a
Inverse Gamma, α = a | Gamma: α, θ | Generalized Pareto: α = a, θ, τ = α
Transformed Gamma, α = a, τ = t | Inverse Transformed Gamma: α, θ, τ = t | Transformed Beta: α, θ, γ = t, τ = a
Inverse Transformed Gamma, α = a, τ = t | Transformed Gamma: α, θ, τ = t | Transformed Beta: α = a, θ, γ = t, τ = α

94 See the problems for illustrations of some of these additional examples. In Foundations of Casualty Actuarial Science, 3rd edition and earlier, at the end of his Credibility Chapter, Gary Venter has an excellent chart displaying a large number of conjugate distributions.
Exercise: The severity for each insured is a Transformed Gamma Distribution with parameters α = 3.9, q, and τ = 5. Over the portfolio, q varies via an Inverse Transformed Gamma Distribution with parameters α = 2.4, θ = 17, and τ = 5. What is the severity distribution for the portfolio as a whole?
[Solution: Using the above chart, the mixed distribution is a Transformed Beta Distribution with parameters α = 2.4, θ = 17, γ = 5, and τ = 3.9.]

Let C be the number of claims observed and let L be the dollars of loss observed. Let Λ be the sum of the inverses of the loss sizes observed. Let ζ be the sum of the loss sizes observed, each to the power t; if t = 1, then ζ = L. Let ω be the sum of the loss sizes observed, each to the power -t; if t = 1, then ω = Λ.
For these conjugate priors, here are the parameters of the posterior distribution:

Severity | Prior Distribution | Parameters of Posterior Distribution95
Exponential96 | Inverse Gamma: α, θ | αʼ = α + C, θʼ = θ + L
Inverse Exponential | Gamma: α, θ | αʼ = α + C, 1/θʼ = 1/θ + Λ
Weibull, τ = t 97 | Inverse Transformed Gamma: α, θ, τ = t | αʼ = α + C, θʼ^t = θ^t + ζ, τʼ = t
Inverse Weibull, τ = t | Transformed Gamma: α, θ, τ = t | αʼ = α + C, θʼ^-t = θ^-t + ω, τʼ = t
Gamma, α = a | Inverse Gamma: α, θ | αʼ = α + aC, θʼ = θ + L
Inverse Gamma, α = a | Gamma: α, θ | αʼ = α + aC, 1/θʼ = 1/θ + Λ
Transformed Gamma, α = a, τ = t | Inverse Transformed Gamma: α, θ, τ = t | αʼ = α + aC, θʼ^t = θ^t + ζ, τʼ = t
Inverse Transformed Gamma, α = a, τ = t | Transformed Gamma: α, θ, τ = t | αʼ = α + aC, θʼ^-t = θ^-t + ω, τʼ = t

95 In each case the posterior distribution has the same form as the prior distribution.
96 This is the Inverse Gamma-Exponential Conjugate Prior, discussed previously. Note that it is a special case of the Inverse Transformed Gamma-Weibull, the Inverse Gamma-Gamma, and Inverse Transformed Gamma-Transformed Gamma Conjugate Priors.
97 If one instead in the Weibull used the parameter λ = θ^τ, then λ follows an Inverse Gamma Distribution, and the Inverse Gamma is a conjugate prior to the Weibull.
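As an added, hedged illustration (not part of the original text), the “Inverse Gamma, α = a” row of the table above can be verified numerically with the data of Problem 12.13 below: an Inverse Gamma severity with a = 3, a Gamma prior with α = 5 and θ = 10, and observed claims of 5, 8, 10, and 20.

```python
# Check of one row of the posterior-parameter table:
# Inverse Gamma severity (shape a) with a Gamma prior on its scale parameter.
a = 3                          # shape of the Inverse Gamma severity
alpha, theta = 5.0, 10.0       # prior Gamma parameters for the severity's scale parameter
claims = [5, 8, 10, 20]

C = len(claims)                       # number of claims observed
Lambda = sum(1 / x for x in claims)   # sum of the inverses of the loss sizes

alpha_post = alpha + a * C            # alpha' = alpha + aC
inv_theta_post = 1 / theta + Lambda   # 1/theta' = 1/theta + Lambda

print(alpha_post, inv_theta_post)     # 17 and 0.575, matching the solution to Problem 12.13
```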
Dirichlet-Multinomial:

For a life aged x, let qx = y1, and 1|qx = y2. Assume a priori, y1 and y2 have a joint distribution f(y1, y2) proportional to: y1^2 y2^4 (1 - y1 - y2)^3. In order to integrate to one, the appropriate constant in front turns out to be: 11!/(2! 4! 3!) = 138,600. Then,
f(y1, y2) = 138,600 y1^2 y2^4 (1 - y1 - y2)^3, 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, 0 ≤ y1 + y2 ≤ 1.
The Dirichlet Distribution is a generalization of the Beta Distribution (with θ = 1):98
f(y1, y2, y3, ..., yk) = Γ(a) ∏_{j=0}^{k} {yj^(aj - 1) / Γ(aj)}, 0 ≤ yj ≤ 1, where y0 = 1 - ∑_{j=1}^{k} yj, and a = ∑_{j=0}^{k} aj.99
The above prior is a Dirichlet Distribution with parameters a0 = 4, a1 = 3, and a2 = 5. It can be shown that for the Dirichlet Distribution, E[yj] = aj/a.

Exercise: For the above Dirichlet Distribution with parameters a0 = 4, a1 = 3, and a2 = 5, determine E[y1] and E[y2].
[Solution: a = 4 + 3 + 5 = 12. E[y1] = a1/a = 3/12 = 1/4. E[y2] = a2/a = 5/12.
Comment: Therefore, the prior estimates of qx and 1|qx are 1/4 and 5/12 respectively.]

Assume we observe 30 lives all aged x that are assumed to follow the same mortality function. 6 lives die during the first year and 8 lives die during the second year. The remaining 16 lives survive beyond the second year. Then applying Bayes Theorem, the posterior distribution is proportional to:
{y1^2 y2^4 (1 - y1 - y2)^3} {y1^6 y2^8 (1 - y1 - y2)^16} = y1^8 y2^12 (1 - y1 - y2)^19.
This posterior distribution is also a Dirichlet Distribution, with a0 = 20, a1 = 9, and a2 = 13.

Exercise: Posterior to the observation, determine E[y1] and E[y2].
[Solution: a = 20 + 9 + 13 = 42. E[y1] = a1/a = 9/42. E[y2] = a2/a = 13/42.
Comment: Therefore, the posterior estimates of qx and 1|qx are 9/42 and 13/42 respectively.]

98 See for example, Kendallʼs Advanced Theory of Statistics. For example, a Dirichlet Distribution with parameters a0 = 4 and a1 = 3 is a Beta Distribution with parameters a = 3 and b = 4. “A Multivariate Bayesian Claim Count Development Model With Closed Form Posterior and Predictive Distributions,” by Stephen Mildenhall, CAS Forum Winter 2006, combines the use of the Dirichlet-Multinomial and the Gamma-Poisson.
99 The constant, Γ(a) / ∏_{j=0}^{k} Γ(aj), is required so that the density integrates to one over its support.
In general, let there be dj deaths observed during year j, let d0 lives survive beyond year k, and let d = ∑_{j=0}^{k} dj = the original number of lives. Then the posterior distribution is Dirichlet with parameters ajʼ = aj + dj. The Dirichlet Distribution is a conjugate prior to the Multinomial.
The posterior estimate of E[yj] = E[j-1|qx] is: ajʼ/aʼ = (aj + dj)/(a + d).
(aj + dj)/(a + d) = {d/(a + d)} (dj/d) + {a/(a + d)} (aj/a) = Z (observed rate) + (1 - Z) (prior rate), where Z = d/(a + d).
Thus the estimate from Bayes Analysis is equal to that from Buhlmann Credibility, with the Buhlmann Credibility Parameter K = a = ∑_{j=0}^{k} aj.100
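As an added illustration (not from the original text), the following sketch reruns the Dirichlet-Multinomial example above: prior parameters a0 = 4, a1 = 3, a2 = 5, with 30 lives of which 6 die in year one, 8 die in year two, and 16 survive beyond year two. It confirms both the posterior means (9/42 and 13/42) and that the credibility weighting with Z = d/(a + d) gives the same answers.

```python
from fractions import Fraction as F

# Dirichlet-Multinomial updating for the example above.
prior = {"survive": F(4), "q_x": F(3), "1|q_x": F(5)}       # a0, a1, a2
deaths = {"survive": 16, "q_x": 6, "1|q_x": 8}              # d0, d1, d2 (30 lives in total)

a = sum(prior.values())
d = sum(deaths.values())
posterior = {key: prior[key] + deaths[key] for key in prior}
a_post = sum(posterior.values())

# Posterior Bayesian estimates E[y_j] = a_j' / a'.
bayes = {key: posterior[key] / a_post for key in posterior}

# Buhlmann Credibility with K = a, so Z = d/(a + d).
Z = F(d, d + a)
cred = {key: Z * F(deaths[key], d) + (1 - Z) * (prior[key] / a) for key in prior}

print(bayes["q_x"], bayes["1|q_x"])   # 3/14 (= 9/42) and 13/42
print(cred["q_x"], cred["1|q_x"])     # the same values
```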
Limits:

The Incomplete Gamma Function, Γ(a; y), can be obtained as a limit of an appropriate sequence of Incomplete Beta Functions β(a, b; x), with b = 1 + y/x, as x goes to zero.101 Also, the Poisson with parameter θ can be obtained as a limit of an appropriate sequence of Binomial Distributions, with m = θ/q as q goes to zero. Therefore, the Gamma-Poisson Conjugate Prior can be obtained as a limit of an appropriate sequence of Beta-Binomial Conjugate Priors. Since the Poisson can also be obtained as a limit of Negative Binomial Distributions, the Gamma-Poisson Conjugate Prior can also be obtained as a limit of an appropriate sequence of Beta-Negative Binomial Conjugate Priors.

Pure Premiums:

One can combine a frequency assumption and a severity assumption in order to get a model of total costs, i.e., pure premiums. Problems 12.1 to 12.5 below combine a Gamma-Poisson frequency assumption with an Inverse Gamma-Exponential severity assumption.
100 For the Beta-Bernoulli, a0 = b, a1 = a, and K = a0 + a1 = a + b.
101 See “Mahlerʼs Guide to Loss Distributions.”
Variance of the Prior versus the Marginal Distribution:

In the Gamma-Poisson, the prior Gamma and the marginal Negative Binomial have the same mean: αθ = rβ. However, the variance of the Negative Binomial is greater than the variance of the Gamma: for β > 0, rβ(1 + β) > rβ^2 = αθ^2. This follows from the fact that the variance of the Gamma is the VHM, while the variance of the Negative Binomial is the Total Variance = EPV + VHM > VHM.
In general for our conjugate prior examples, the variance of the Marginal Distribution (prior mixed) is greater than the variance of the prior distribution. Thus the variance of the mixed Pareto is greater than that of the prior Inverse Gamma. The variance of the mixed Bernoulli is greater than that of the prior Beta. Similarly, the variance of the Predictive Distribution (posterior mixed) is greater than the variance of the posterior distribution.

Some General Ideas:

If the likelihood density is a member of a linear exponential family and the conjugate prior distribution is used as the prior distribution, then the Buhlmann Credibility estimate is equal to the corresponding Bayesian estimate (for a squared error loss function). This was the case for all four of the examples discussed in detail.
However, it is important to note that while a lot of time has been spent going over the details of the four conjugate prior examples, these are very special examples. In general, one does not have the posterior distribution of the same type as the prior distribution.102 Even when one has a conjugate prior situation, the Bayesian estimate is not necessarily equal to the Buhlmann Credibility estimate. If the likelihood density is not a member of a linear exponential family, even if a conjugate prior distribution is used as the prior distribution, the Buhlmann Credibility estimate is usually not equal to the corresponding Bayesian estimate (for a squared error loss function).
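To make the variance comparison above concrete, here is a small added sketch (an illustration, not from the original text) using the Gamma prior α = 2, θ = 1/20 from the updating example: the prior (Gamma) variance is the VHM, the marginal Negative Binomial variance is EPV + VHM, and the latter is larger.

```python
# Gamma-Poisson for one exposure:
# prior Gamma(alpha, theta); marginal Negative Binomial with r = alpha, beta = theta.
alpha, theta = 2.0, 1 / 20.0

vhm = alpha * theta**2                 # variance of the prior Gamma (= VHM)
epv = alpha * theta                    # expected Poisson process variance (= mean frequency)
r, beta = alpha, theta
marginal_var = r * beta * (1 + beta)   # variance of the marginal Negative Binomial

print(vhm, epv, marginal_var)          # 0.005, 0.1, 0.105: marginal variance = EPV + VHM > VHM
```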
102 See for example the mixing Poisson example presented in Section 2, as well as the many examples in “Mahlerʼs Guide to Buhlmann Credibility and Bayesian Analysis.”
Problems: Use the following information for the next 5 questions: The number of claims a particular policyholder makes in a year is Poisson with mean λ. The values of the Poisson parameter λ (for annual claim frequency) for the individual policyholders in a portfolio follow a Gamma distribution with parameters: α = 6 and θ = 2.5. The size of claim distribution for any particular policyholder is exponential with mean δ. The δ values of the portfolio of policyholders have an Inverse Gamma distribution with parameters: α = 3 and θ = 20. For an individual insured the frequency and severity process are independent of each other. In addition, the distributions across the portfolio of λ and δ are independent of each other. 12.1 (3 points) What is the Expected Value of the Process Variance of the (annual) Pure Premiums for this portfolio? A. less than 6100 B. at least 6100 but less than 6200 C. at least 6200 but less than 6300 D. at least 6300 but less than 6400 E. at least 6400 12.2 (3 points) What is the Variance of the Hypothetical Mean (annual) Pure Premiums for this portfolio? A. less than 28,000 B. at least 28,000 but less than 29,000 C. at least 29,000 but less than 30,000 D. at least 30,000 but less than 31,000 E. at least 31,000 12.3 (2 points) An individual risk from this portfolio is observed to have claims totaling 900 in cost over 3 years. Use Buhlmann Credibility to estimate the future (annual) pure premium of this individual. A. less than 288 B. at least 288 but less than 289 C. at least 289 but less than 290 D. at least 290 but less than 291 E. at least 291
12.4 (3 points) An individual risk from this portfolio is observed to have 50 claims over 3 years. Use Buhlmann Credibility to estimate the future (annual) claim frequency of this individual. A. less than 16.3 B. at least 16.3 but less than 16.4 C. at least 16.4 but less than 16.5 D. at least 16.5 but less than 16.6 E. at least 16.6 12.5 (3 points) An individual risk from this portfolio is observed to have 50 claims totaling 900 in cost over 3 years. Use Buhlmann Credibility to estimate the future average claim severity of this individual. A. less than 17.2 B. at least 17.2 but less than 17.4 C. at least 17.4 but less than 17.6 D. at least 17.6 but less than 17.8 E. at least 17.8
12.6 (5 points) The number of claims a particular policyholder makes in a year follows a distribution with mean µ:
f(x) = 6^6 µ^x (x + 5)! / {x! 5! (6 + µ)^(6 + x)}, x = 0, 1, 2, ....
The means µ for the individual policyholders in a portfolio follow a Generalized Pareto Distribution, with parameters α, θ = 6, and τ. An insured is chosen at random from this portfolio.
What is the probability of observing n claims over the coming year for this insured?
Hint: The density of a Generalized Pareto Distribution is: f(x) = Γ[α + τ] θ^α x^(τ-1) / {Γ[α] Γ[τ] (θ + x)^(α + τ)}, x > 0.
A. Γ(α + 6) Γ(α + τ) Γ(6 + x) / {Γ(6) Γ(α) Γ(τ + x + 6 + α) Γ(x + 1)}
B. Γ(α + 6) Γ(α + τ) Γ(τ + x) / {Γ(τ) Γ(α) Γ(τ + x + 6 + α) Γ(x + 1)}
C. Γ(α + τ) Γ(τ + x) Γ(6 + x) / {Γ(6) Γ(τ) Γ(α) Γ(x + 1)}
D. Γ(α + 6) Γ(τ + x) Γ(6 + x) / {Γ(6) Γ(τ) Γ(α) Γ(τ + x + 6 + α)}
E. Γ(α + 6) Γ(α + τ) Γ(τ + x) Γ(6 + x) / {Γ(6) Γ(τ) Γ(α) Γ(τ + x + 6 + α) Γ(x + 1)}
12.7 (3 points) f(x) = x^2 λ^3 e^(-λx) / 2, x > 0. π(λ) = (8/3) λ^3 e^(-2λ), λ > 0.
You observe n claims, x1, x2, ..., xn.
Compare the estimate based on Buhlmann Credibility to that from Bayes Analysis.
Use the following information for the next 3 questions: The number of claims a particular policyholder makes in a year follows a distribution with parameter p: f(x) = p(1 - p)x, x = 0, 1, 2, .... The values of the parameter p for the individual policyholders in a portfolio follow a Beta distribution, with parameters a = 4, b = 5, θ = 1: g(p) = 280 p3 (1 - p)4 , 0 ≤ p ≤ 1. 12.8 (2 points) What is the a priori mean annual claim frequency for the portfolio? A. less than 1.5 B. at least 1.5 but less than 1.6 C. at least 1.6 but less than 1.7 D. at least 1.7 but less than 1.8 E. at least 1.8 12.9 (3 points) You observe an individual policyholder to have 6 claims in 10 years. Assuming the observation of the separate years are independent, what is the posterior distribution of the parameter p for this policyholder? A. 19,612,560 p14(1-p)9 B. 27,457,584 p13(1-p)10 C. 32,449,872 p12(1-p)11 D. 62,403,600 p11(1-p)13 E. 49,031,400 p10(1-p)14 12.10 (2 points) You observe an individual policyholder to have 6 claims in 10 years. Assuming the observation of the separate years are independent, what is the posterior estimate of the average annual claim frequency for this policyholder? A. less than 0.85 B. at least 0.85 but less than 0.90 C. at least 0.90 but less than 0.95 D. at least 0.95 but less than 1.00 E. at least 1.00
Use the following information for the next eight questions:
The size of claims for any particular policyholder follows an Inverse Gamma distribution with density:
f(x) = θ^3 e^(-θ/x) / (2x^4), x > 0.
The values of the parameter θ for the individual policyholders in a portfolio follow a Gamma distribution with density:
g(θ) = θ^4 e^(-θ/10) / 2,400,000, θ > 0.
Hint: The density of a Generalized Pareto Distribution is: f(x) = Γ[α + τ] θ^α x^(τ-1) / {Γ[α] Γ[τ] (θ + x)^(α + τ)}, x > 0.
The nth moment of a Generalized Pareto Distribution is: E[X^n] = θ^n Γ(α - n) Γ(τ + n) / {Γ(α) Γ(τ)}, α > n.
12.11 (3 points) Prior to any observations, what is the density function for the claims severity from an insured picked at random from this portfolio? A. 105,000 x4 / (x+10)8 B. 2,800,000 x4 / (x+10)9 C. 63,000,000 x4 / (x+10)10 D. 1,260,000,000 x4 / (x+10)11 E. None of the above. 12.12 (2 points) Prior to any observations, what is the expected claim severity for an insured picked at random from this portfolio? A. less than 26 B. at least 26 but less than 27 C. at least 27 but less than 28 D. at least 28 but less than 29 E. at least 29 12.13 (2 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. What is the posterior density function for the parameter θ for this insured? A. 4.78 x 10-31 θ16 e−.1θ
B. 4.78 x 10-14 θ16 e−θ
C. 2.81 x 1014 θ16 e−43θ
D. 9.82 x 1015 θ16 e−53θ
E. None of the above.
12.14 (3 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. What is the density function for the future claim severity from this insured? A. 105,000 x4 / (x+10)8 B. 163.1 x10 / (x+0.575)14 C. 15291 x16 / (x+1.739)20 D. 5.313 x20 / (x+0.1)24 E. None of the above. 12.15 (2 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. Using Bayesian Analysis what is the expected future average claim severity from this insured? A. less than 15 B. at least 15 but less than 16 C. at least 16 but less than 17 D. at least 17 but less than 18 E. at least 18 12.16 (3 points) What is the Expected Value of the Process Variance, prior to any observations? A. less than 100 B. at least 100 but less than 250 C. at least 250 but less than 500 D. at least 500 but less than 1000 E. at least 1000 12.17 (2 points) What is the Variance of the Hypothetical Mean Severities, prior to any observations? A. 75 B. 100 C. 125 D. 150 E. 175 12.18 (2 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. Using Buhlmann Credibility what is the expected future average claims severity from this insured? A. less than 15 B. at least 15 but less than 16 C. at least 16 but less than 17 D. at least 17 but less than 18 E. at least 18
Use the following information for the next nine questions: The size of claims for any particular policyholder follows a Gamma distribution with density: f(x) = λ6 x5 e−λx / 120, x > 0. The values of parameter λ for the individual policyholders in a portfolio follow a Gamma distribution with density: g(λ) = 3456 λ3 e−12λ, λ > 0. 12.19 (3 points) Prior to any observations, what is the density function for the claims severity from an insured picked at random from this portfolio? A. 5,806,080 x4 (x +12)-9 B. 10,450,944 x5 (x +12)-10 C. 156,764,160 x4 (x +12)-10 D. 313,528,320 x5 (x +12)-11 E. None of the above. 12.20 (2 points) Prior to any observations, what is the expected claim severity for an insured picked at random from this portfolio? A. less than 22 B. at least 22 but less than 23 C. at least 23 but less than 24 D. at least 24 but less than 25 E. at least 25 12.21 (2 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. What is the posterior density function for the parameter λ for this insured? A. 5.013 x 1017 λ 27 e−43λ B. 1.636 x 1018 λ 30 e−43λ C. 4.934 x 1020 λ 27 e−55λ D. 3.370 x 1021 λ 30 e−55λ E. None of the above. 12.22 (3 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. What is the density function for the future claim severity from this insured? A. 1.917 x 1049 x4 / (x + 55)-30
B. 1.150 x 1050 x5 / (x + 55)-31
C. 5.409 x 1054 x4 / (x + 55)-33 E. None of the above.
D. 3.570 x 1055 x5 / (x + 55)-34
12.23 (2 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. Using Bayesian Analysis what is the expected future average claim severity from this insured? A. less than 13 B. at least 13 but less than 14 C. at least 14 but less than 15 D. at least 15 but less than 16 E. at least 16 12.24 (3 points) Use the Bayesian central limit theorem to construct a 95% credibility interval (confidence interval) for the estimate in the previous question. 12.25 (3 points) What is the Expected Value of the Process Variance, prior to any observations? A. less than 125 B. at least 125 but less than 130 C. at least 130 but less than 135 D. at least 135 but less than 140 E. at least 140 12.26 (2 points) What is the Variance of the Hypothetical Mean Severities, prior to any observations? A. less than 270 B. at least 270 but less than 280 C. at least 280 but less than 290 D. at least 290 but less than 300 E. at least 300 12.27 (2 points) From a particular insured you observe four claims of sizes: 5, 8, 10, 20. Using Buhlmann Credibility what is the expected future average claims severity from this insured? A. less than 13 B. at least 13 but less than 14 C. at least 14 but less than 15 D. at least 15 but less than 16 E. at least 16
Use the following information for the next four questions:
The severity distribution of each risk in a portfolio is given by a Weibull Distribution: F(x) = 1 - exp[-(x/β)^(1/3)].
β varies over the portfolio via an Inverse Transformed Gamma Distribution with α = 2.5, θ = 343, and τ = 1/3.
For the Inverse Transformed Gamma Distribution:
F(x) = 1 - Γ[α; (θ/x)^τ], x > 0.
f(x) = τ θ^(τα) exp[-(θ/x)^τ] / {x^(τα + 1) Γ[α]}.
E[X^k] = θ^k Γ[α - k/τ] / Γ[α], k < ατ.
Mode = θ {τ / (ατ + 1)}^(1/τ).
You may use the following values of the Incomplete Gamma Function: Γ[5.5 ; 5.1705] = Γ[6.5 ; 6.16988] = Γ[7.5 ; 7.16943] = Γ[8.5 ; 8.16909] = 0.5. For a given risk you observe four claims of sizes: 1, 8, 27, and 125. 12.28 (3 points) Determine the posterior distribution of β for this risk. 12.29 (1 point) You are interested in minimizing the squared error loss function. Find the Bayesian estimate of the hypothetical mean claim severity for this risk. A. 50 B. 100 C. 200 D. 300 E. 400 12.30 (1 point) A loss function is defined as equal to zero if the estimate equals the true value, and one otherwise. You are interested in minimizing the expected value of this loss function. Find the Bayesian estimate of the hypothetical mean claim severity for this risk. A. Less than 50 B. At least 50, but less than 100 C. At least 100, but less than 200 D. At least 200, but less than 400 E. At least 400 12.31 (2 points) You are interested in minimizing the expected absolute error. Find the Bayesian estimate of the hypothetical mean claim severity for this risk. A. 50 B. 100 C. 150 D. 200 E. 250
12.32 (3 points) Severity is Normal with mean 1000 and variance v. The prior distribution of v is Inverse Gamma with θ = 50,000 and α = 4. You observe 2 claims of sizes 800 and 900.
What is the Bayesian estimate of v, i.e., the mean of the posterior distribution of v?
A. Less than 17,000
B. At least 17,000, but less than 17,500
C. At least 17,500, but less than 18,000
D. At least 18,000, but less than 18,500
E. At least 18,500

12.33 (3 points) The Dirichlet Distribution is:
f(y1, y2, y3, ..., yk) = Γ(a) ∏_{j=0}^{k} {yj^(aj - 1) / Γ(aj)}, 0 ≤ yj ≤ 1, where y0 = 1 - ∑_{j=1}^{k} yj, and a = ∑_{j=0}^{k} aj.
For the Dirichlet Distribution, E[yj] = aj/a.
For a life aged x, let qx = y1, 1|qx = y2, 2|qx = y3, and 3|qx = y4.
A priori, y1, y2, y3, and y4 have a joint distribution which is Dirichlet with parameters: a0 = 20, a1 = 3, a2 = 4, a3 = 5, and a4 = 6.
50 lives all aged x are assumed to follow the same mortality function. 3 lives die during the first year, 2 lives die during the second year, 4 lives die during the third year, and 6 lives die during the fourth year. The remaining 35 lives survive beyond the fourth year.
Determine the posterior estimate of 3qx.
A. 0.21   B. 0.22   C. 0.23   D. 0.24   E. 0.25
12.34 (2 points) Losses follow a distribution function: F(x) = 1 - exp[-c x2 ], x > 0, c > 0. The prior distribution of c follows a Gamma Distribution with parameters α = 10 and θ = 0.02. A sample of 4 values is observed: 1, 2, 3, 5. Determine the form of the posterior distribution of c. 12.35 (3 points) You are given the following: • The amount of an individual claim has an Inverse Gamma distribution with shape parameter α = 6 and scale parameter q. • The parameter q is distributed via an Exponential Distribution with mean 100. From an individual insured you observe 3 claims of sizes: 10, 20, and 50. For the zero-one loss function, what is the Bayes estimate of q for this insured? A. 90 B. 95 C. 100 D. 105 E. 110
12.36 (3 points) You are given the following:
• F(x | β) = 1 - (10/x)^β, x > 10.
• The prior distribution of β is Gamma with α = 4 and θ = 0.5.
• From an individual insured you observe 3 claims of sizes: 15, 25, and 60.
What is the posterior distribution of β?

12.37 (4, 5/85, Q.55) (3 points) Losses have the following probability density function:
f(x; λ) = λ exp(-λ √x) / (2 √x), x > 0. One observes four losses of sizes: 1, 4, 9 and 64.
The prior distribution of λ is Gamma with α = 0.7 and θ = 1/2.
In which of the following ranges is the Bayesian estimate of λ (i.e., the mean of the posterior distribution of λ)?
A. Less than 0.28
B. At least 0.28, but less than 0.29
C. At least 0.29, but less than 0.30
D. At least 0.30, but less than 0.31
E. At least 0.31

12.38 (4, 5/87, Q.63) (2 points) The parameter ψ is to be estimated for a Pareto distribution with shape parameter ψ and scale parameter 1. It is assumed that ψ has a prior distribution g(ψ) that is Gamma distributed with parameters α = 2.2 and θ = 1/2.
The posterior distribution is Gamma with parameters: αʼ = n + α, and 1/θʼ = 1/θ + ∑_{i=1}^{n} ln(1 + xi), where n is the sample size.
If the square of the error is to be minimized, what is the Bayes estimate of ψ given the following sample: 3, 9, 12, 6, 4?
A. 1.1   B. 7.2   C. 2 + ln(34)   D. 7.2 / {2 + ln(34)}   E. 7.2 / {2 + ln(18,200)}
12.39 (4, 5/88, Q.45) (1 point) Which of the following are true? 1. One major advantage of the use of conjugate prior distributions is that the prior distribution for one year can be used as the posterior distribution for the next year. 2. If a set of data can be assumed to have a binomial distribution, and a beta distribution is employed as the prior distribution, then the mean of the posterior distribution is equal to the corresponding Buhlmann credibility estimate of the binomial distribution parameter. 3. If n independent Bernoulli trials are performed with constant probability, θ, of success, and the prior distribution of θ is a beta distribution, then the posterior distribution of θ is a beta distribution. A. 2
B. 1, 2
C. 1, 3
D. 2, 3
E. 1, 2, and 3
12.40 (165, 5/90, Q.10) (1.7 points) A mortality study involves 50 insureds, each age x, with similar medical profiles. 7 insureds die during the first year, 12 insureds die during the second year, 11 insureds die during the third year, and 11 insureds die during the fourth year. The observed mortality rates are graduated by a Bayesian method. You are given:
(i) The prior joint distribution of ti = i-1|qx is Dirichlet,
fT(t1, t2, t3, t4) = Γ(a) ∏_{j=0}^{4} {tj^(aj - 1) / Γ(aj)}, 0 ≤ tj ≤ 1, where t0 = 1 - ∑_{j=1}^{4} tj, and a = ∑_{j=0}^{4} aj.
(ii) The parameters of the prior Dirichlet Distribution are: a0 = 4, a1 = 7, a2 = 4, a3 = 15, a4 = 9.
(iii) For the Dirichlet Distribution, E[tj] = aj/a.
(iv) The vector of graduated values v is the mean vector of the posterior distribution for T.
Determine the graduated value v3 (estimating 2|qx).
(A) 0.29   (B) 0.31   (C) 0.42   (D) 0.52   (E) 0.54
12.41 (165, 11/90, Q.10) (1.9 points) A complete study is performed on 40 patients with a serious disease. In the first year after diagnosis, 4 deaths occur. In the second year, 24 deaths occur. You wish to do a Bayesian graduation of ti = i-1|q0 using a Dirichlet prior with parameters a0 = 2, a1 = 8, a1 = 6, and a3 = 4, fT(t1 , t2 , t3 ) is proportional to: t1 7 t2 5 t3 3 (1 - t1 - t2 - t3 ). For the Dirichlet Distribution, fT(t1 , t2 , ..., tk) = Γ(Σai)
k
∏ { tj aj - 1 / Γ (aj )} , 0 ≤ tj ≤ 1, where j =0
k
t0 = 1 -
∑ tj .
For the Dirichlet Distribution, E[tj] = aj/(Σai).
j =1
Use the graduated values to estimate the mortality rate q1 as: E[1|q0 ]/E[q0 ]. (A) 1/5
(B) 1/2
(C) 5/8
(D) 2/3
(E) 5/6
12.42 (4B, 11/93, Q.7) (1 point) For each of the following pairs, give the corresponding predictive density function.
Likelihood Function | Conjugate Prior | Predictive Density
Bernoulli | Beta | 1
Poisson | Gamma | 2
Exponential | Inverse Gamma | 3
A. 1 = Bernoulli, 2 = Negative Binomial, 3 = Pareto
B. 1 = Beta, 2 = Gamma, 3 = Inverse Gamma
C. 1 = Weibull, 2 = Gamma, 3 = Inverse Gamma
D. 1 = Weibull, 2 = Negative Binomial, 3 = Pareto
E. 1 = Pareto, 2 = Gamma, 3 = Pareto

12.43 (165, 11/94, Q.10) (1.9 points) A complete study is performed on 100 patients with a serious illness. The observed mortality rates are graduated by a Bayesian method. A Dirichlet Distribution with parameters a0, a1, ..., ak, has joint density
fT(t1, t2, ..., tk) = Γ(a) ∏_{j=0}^{k} {tj^(aj - 1) / Γ(aj)}, 0 ≤ tj ≤ 1, where t0 = 1 - ∑_{j=1}^{k} tj, and a = ∑_{j=0}^{k} aj.
For the Dirichlet Distribution, E[tj] = aj/a.
You are given:
(i) ti = i-1|q0 using a Dirichlet prior distribution with parameters a1 = 50, a2 = 40, a3 = 30, and a = ∑_{j=0}^{k} aj = 150.
(ii) The following table of estimated mortality rates was constructed from the graduated values:
i: 0, 1, 2
qi: 9/25, 2/5, 7/12
Determine the number of patients who died during the third year.
(A) 20   (B) 22   (C) 24   (D) 26   (E) 28
12.44 (165, 11/97, Q.12) (1.9 points) A Bayesian graduation has been performed on mortality rate data for 200 patients having a certain operation. A Dirichlet Distribution is used for both the prior and posterior distributions. The parameters of the prior joint Dirichlet distribution of ti = i-1|qx are a0, a1, ..., ak.
fT(t1, t2, ..., tk) = Γ(a) ∏_{j=0}^{k} {tj^(aj - 1) / Γ(aj)}, 0 ≤ tj ≤ 1, where t0 = 1 - ∑_{j=1}^{k} tj, and a = ∑_{j=0}^{k} aj.
For the Dirichlet Distribution, E[tj] = aj/a.
The graduated mortality rate for each year is the weighted average of the prior rate and the observed rate for that year, with a weight of 5/9 to the prior rate. The prior rate is 0.06 for year 1.
Determine a1, the second parameter of the prior Dirichlet distribution.
(A) 7   (B) 10   (C) 12   (D) 15   (E) 22
12.45 (4, 5/00, Q.10) (2.5 points) You are given:
The size of a claim for an individual insured follows an inverse exponential distribution with the following probability density function:
f(x | θ) = θ e^(-θ/x) / x^2, x > 0.
The parameter θ has a prior distribution with the following probability density function:
g(θ) = e^(-θ/4) / 4, θ > 0.
One claim of size 2 has been observed for a particular insured.
Which of the following is proportional to the posterior distribution of θ?
(A) θ e^(-θ/2)   (B) θ e^(-3θ/4)   (C) θ e^(-θ)   (D) θ^2 e^(-θ/2)   (E) θ^2 e^(-9θ/4)
12.46 (3 points) In the previous question, 4, 5/00, Q.10, what is the predictive distribution? (A) Pareto with α = 2 and θ = 4/3. (B) Inverse Pareto with τ = 2 and θ = 4/3. (C) Gamma with α = 2 and θ = 4/3. (D) Inverse Gamma with α = 2 and θ = 4/3. (E) None of the above.
Solutions to Problems:

12.1. A. Take an individual risk with Poisson parameter λ and Exponential parameter δ. The mean frequency is λ and the process variance of the frequency is λ. The mean severity is δ and the process variance of the severity is δ^2. Since for an individual risk the frequency and severity are independent, the Process Variance of the pure premium = µf σs^2 + µs^2 σf^2 = λδ^2 + δ^2 λ = 2λδ^2. Since the distributions of λ and δ are independent, the EPV of the Pure Premium = E[2λδ^2] = 2E[λ]E[δ^2] = 2 (mean of the Gamma Distribution)(2nd moment of the Inverse Gamma Distribution). The mean of the Gamma = αθ = (6)(2.5) = 15. The second moment of the Inverse Gamma = θ^2 / {(α-1)(α-2)} = 400 / {(2)(1)} = 200. Therefore, the Expected Value of the Process Variance of the Pure Premium = (2)(15)(200) = 6000.

12.2. D. Take an individual risk with Poisson parameter λ and Exponential parameter δ. The mean frequency is λ and the mean severity is δ. Mean pure premium = λδ. VHM of Pure Premiums = Var[λδ] = E[(λδ)^2] - E^2[λδ]. Since the distributions of λ and δ are independent, the VHM = E[λ^2]E[δ^2] - E^2[λ]E^2[δ] = (second moment of the Gamma Dist.)(second moment of the Inverse Gamma Dist.) - (square of the mean of the Gamma Dist.)(square of the mean of the Inverse Gamma Dist.). The mean of the Gamma = αθ = (6)(2.5) = 15. The second moment of the Gamma = α(α+1)θ^2 = 262.5. The mean of the Inverse Gamma = θ/(α - 1) = 20/(3 - 1) = 10. The second moment of the Inverse Gamma = θ^2 / {(α-1)(α-2)} = 400 / {(2)(1)} = 200. Therefore, the Variance of the Hypothetical Mean Pure Premiums = (262.5)(200) - (15^2)(10^2) = 30,000.

12.3. D. From the two previous solutions, K = EPV/VHM = 6000/30,000 = 1/5 = 0.2. Z = 3/3.2 = 93.75%. The prior mean pure premium is (mean frequency)(mean severity) = (15)(10) = 150. Observed pure premium = 900/3 = 300. New estimate = (0.9375)(300) + (1 - 0.9375)(150) = 290.63.

12.4. C. For the Gamma-Poisson, the Buhlmann Credibility Parameter is the inverse of the scale parameter of the Gamma Distribution: K = 1/θ = 0.4. Z = 3/(3 + K) = 3/3.4 = 0.882. The prior mean frequency is the mean of the Gamma = αθ = 15. The observed frequency is 50/3 = 16.67. Thus the new estimated frequency is: (0.882)(16.67) + (1 - 0.882)(15) = 16.47.
12.5. D. For the Inverse Gamma-Exponential, the Buhlmann Credibility Parameter is the shape parameter of the Inverse Gamma Distribution - 1. K = α - 1 = 2. Z = 50/(50 + 2) = .962. (Note that for severity one uses the number of claims to compute the credibility, not the number of years.) The prior mean severity is the mean of the Inverse Gamma = θ/(α-1) = 10. The observed severity is 900/50 = 18. Thus the estimated future severity is: (.962)(18) + (1 - .962)(10) = 17.70. Comment: Note that the product of the separate Buhlmann Credibility estimates of frequency and severity from these two questions does not match the Buhlmann credibility estimate of the pure premium (even though the observed data is the same.) (16.47)(17.70) = 291.52 ≠ 290.63. Thus it makes a difference whether one analyzes pure premiums or separately analyzes frequency and severity.
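The comment above is easy to verify numerically. The following added sketch (an illustration, not part of the original text) recomputes the three Buhlmann Credibility estimates from solutions 12.3, 12.4, and 12.5, and shows that the product of the separate frequency and severity estimates differs from the pure premium estimate.

```python
def buhlmann(prior_mean, observed_mean, n, K):
    """Buhlmann Credibility estimate with n observations and credibility parameter K."""
    Z = n / (n + K)
    return Z * observed_mean + (1 - Z) * prior_mean

# Pure premium (solution 12.3): K = EPV/VHM = 6000/30000 = 0.2, 3 years observed.
pp = buhlmann(prior_mean=150, observed_mean=900 / 3, n=3, K=0.2)

# Frequency (solution 12.4): K = 1/theta = 0.4, 3 years observed.
freq = buhlmann(prior_mean=15, observed_mean=50 / 3, n=3, K=0.4)

# Severity (solution 12.5): K = alpha - 1 = 2, 50 claims observed.
sev = buhlmann(prior_mean=10, observed_mean=900 / 50, n=50, K=2)

print(round(pp, 2), round(freq, 2), round(sev, 2))   # 290.63, 16.47, 17.69
print(round(freq * sev, 2))                          # about 291.4: not equal to 290.63
```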
12.6. E. The density of a Generalized Pareto Distribution is:
g(µ) = Γ(α+τ) θ^α µ^(τ-1) / {Γ(τ) Γ(α) (θ+µ)^(α+τ)} = Γ(α+τ) 6^α µ^(τ-1) / {Γ(τ) Γ(α) (6+µ)^(α+τ)}.
The chance of n claims is ∫ f(x | µ) g(µ) dµ =
∫_0^∞ {(x+5)! / (x! 5!)} 6^6 µ^x / (6+µ)^(6+x) Γ(α+τ) 6^α µ^(τ-1) / {Γ(τ) Γ(α) (6+µ)^(α+τ)} dµ
= {Γ(x+6) / (Γ(x+1) Γ(6))} 6^(6+α) Γ(α+τ) / {Γ(τ) Γ(α)} ∫_0^∞ µ^(x+τ-1) / (6+µ)^(6+x+α+τ) dµ
= {Γ(x+6) / (Γ(x+1) Γ(6))} 6^(6+α) Γ(α+τ) / {Γ(τ) Γ(α)} 6^-(6+α) Γ(τ+x) Γ(6+α) / Γ(τ+x+6+α)
= {Γ(α+6) Γ(α+τ) / (Γ(6) Γ(τ) Γ(α))} {Γ(τ+x) Γ(6+x) / (Γ(τ+x+6+α) Γ(x+1))}.
Comment: Very difficult. The integrand is of the same form as a Generalized Pareto Distribution. Since the density of the Generalized Pareto integrates to unity, we know that:
∫_0^∞ Γ(α+τ) θ^α µ^(τ-1) / {Γ(τ) Γ(α) (θ+µ)^(α+τ)} dµ = 1. Therefore,
∫_0^∞ µ^(τ-1) / (θ+µ)^(α+τ) dµ = θ^-α Γ(τ) Γ(α) / Γ(α+τ). This can be rewritten as:
∫_0^∞ µ^(s-1) / (θ+µ)^t dµ = θ^(s-t) Γ(s) Γ(t-s) / Γ(t).
This is an example of a Generalized Waring Frequency Distribution, with parameter r = 6. Each insured has a Negative Binomial Distribution, with fixed parameter r; in this case r = 6. For example, if α = 3.4 and τ = 0.6, then the first few densities are:
x:    0       1       2       3       4       5       6       7       8
f(x): 0.4814  0.2002  0.1073  0.0639  0.0406  0.0271  0.0187  0.0134  0.0098
12.7. The distribution of x is Gamma with α = 3 and θ = 1/λ. This is mathematically the same as if we had 3n claims from Exponentials each with mean 1/λ, summing to Σxi. The distribution of λ is Gamma with α = 4 and θ = 1/2. Therefore, this is mathematically equivalent to an Inverse Gamma-Exponential, (the distribution of 1/λ, the mean of each Exponential, is Inverse Gamma.) Therefore, the estimates from Buhlmann Credibility and Bayes Analysis are equal. Alternately, the posterior distribution is proportional to: f(x1 ) ... f(xn ) π(λ) ~ λ3n exp[−λΣxi] λ3 e−2λ = λ3n+3 exp[−λ(2 + Σxi)]. Therefore, the posterior distribution of λ is Gamma with α = 4 + 3n, and θ = 1/(2 + Σxi). E[X | λ] = 3/λ. Bayes Analysis estimate = E[3/λ] = 3E[1/λ] = (3)(negative first moment of posterior distribution) = (3)(2 + Σxi)/(3 + 3n) = (2 + Σxi)/(n + 1). Var[X | λ] = 3/λ2. The prior distribution of λ is Gamma with α = 4 and θ = 1/2. (Prior) EPV = E[3/λ2] = 3(negative second moment of prior distribution) = (3)(22 )/{(3)(2)} = 2. (Prior) VHM = Var[3/λ] = 9Var[1/λ] = (9)(E[1/λ2] - E[1/λ]2 ) = (9){2/3 - (2/3)2 } = 2. K = EPV/VHM = 2/2 = 1. Z = n/(n+1). A priori mean = E[3/λ] = (3)(2/3) = 2. Buhlmann Credibility estimate = {n(n+1)}Σxi/n + (1/(n+1))(2) = (2 + Σxi)/(n + 1). Therefore, the estimates from Buhlmann Credibility and Bayes Analysis are equal. Comment: If f(x) is Gamma with parameters a and 1/λ, and the prior distribution of λ is Gamma with parameters α > 2, and θ, then the posterior distribution of λ is Gamma with parameters αʼ = α + an, and 1/θʼ = 1/θ + Σxi. The estimated future severity using Bayes Analysis is: E[a/λ] = a(1/θ + Σxi) / (α + an - 1). EPV = E[a/λ2] = a/{θ2 (α-1)(α-2)}. VHM = Var[a/λ] = a2 Var[1/λ] = (a2 )(E[1/λ2] - E[1/λ]2 ) = (a2 ) {1/{θ2 (α-1)(α-2)} - (1/{θ (α-1)})2 } = a2 / {θ2 (α-1)2 (α-2)}. K = EPV/VHM = (α - 1)/a. Z = n/{n+ (α - 1)/a}. A priori mean = E[a/λ] = a/{θ (α-1)}. Buhlmann Credibility estimate = {n/{n+ (α - 1)/a}}Σxi/n + {((α - 1)/a)/{n+ (α - 1)/a}}a/{θ (α-1)} = (1/θ + Σxi)/(n + (α - 1)/a) = a(1/θ + Σxi)/(α + an - 1), the same as from using Bayes Analysis.
2013-4-10,
Conjugate Priors §12 Overview,
HCM 10/21/12,
Page 329
12.8. C. This is a Geometric Distribution (a Negative Binomial with r =1), parameterized somewhat differently than in Loss Models, with p = 1/(1+β). Therefore for a given value of p the mean is: µ(p) = β = (1-p)/p. In order to get the average mean over the whole portfolio we need to take the integral of µ(p) g(p) dp. 1
1
1
∫µ(p) g(p) dp = ∫((1-p)/p) 280 p3(1-p)4 dp = 280 ∫ p2(1-p)5 dp = 280 Γ(3)Γ(6) / Γ(3+6) 0
0
0
= 280 (2!)(5!) / 8! = 5/3. Comment: Difficult! Special case of the Beta-Negative Binomial (for r fixed) Conjugate Prior. For the Beta-Negative Binomial in general, the a priori mean turns out to be rb/(a-1). For r =1, b = 5 and a = 4, the a priori mean is (1)(5)/3 = 5/3. 12.9. B. The sum of ten independent Geometric Distribution is a Negative Binomial with r = 10. Therefore, the sum has a Negative Binomial Distribution, with parameters r = 10 and β = (1-p)/p. f(x) = (10+x-1!) p10(1-p)x / (9! x!). f(6) = (15!) p10(1-p)6 / 9! 6! = 5005 p10(1-p)6 . By Bayes Theorem, the posterior distribution of the parameter p for this policyholder is proportional to the product of f(6) times g(p). f(6)g(p) = 5005p10(1-p)6 280p3 (1-p)4 which is proportional to p 13(1-p)10, and is therefore a Beta Distribution with parameters a = 14, b = 11. Thus the posterior distribution of p is: Γ(25)/{Γ(14)Γ(11)}p13(1-p)10 = (24!)/{(13!)(10!)}p13(1-p)10 = 27,457,584 p1 3(1-p)1 0. Comment: For the Beta-Negative Binomial in general the distribution of p posterior to observing C claims in E years is a Beta with parameters: a + rE, b + C. For C = 6, E = 10, a = 4, b = 5 and r =1, the posterior Beta has parameters: 4 + (1)(10), and 5 + 6 = 14, and 11.
12.10. A. From the solution to the previous question, the posterior distribution of p is: Γ(14+11)/{Γ(14)Γ(11)} p13(1-p)10. The Geometric Distribution with parameter β = (1-p)/p has mean β = µ(p) = (1-p)/p. In order to get the average mean over the whole portfolio we need to take the integral of µ(p) g(p) dp. 1
1
∫µ(p) g(p) dp = ∫((1-p)/p) Γ(25)/{Γ(14)Γ(11)} p13(1-p)10 dp = 0
0 1
∫
Γ(25)/{Γ(14)Γ(11)} p12(1-p)11 dp = {Γ(25)/{Γ(14)Γ(11)}} Γ(13)Γ(12) / Γ(13+12) 0
= {Γ(12) / Γ(11)} {Γ(13) / Γ(14)} = 11/13. Comment: Difficult! For the Beta-Negative Binomial in general the estimated mean annual frequency posterior to observing C claims in E years is: r(b + C) / (a + rE - 1). Therefore, the posterior estimated mean annual frequency is (1)(5+6) / {4 + (1)(10) -1} = 11 /13. For the Beta-Negative Binomial in general, the Buhlmann Credibility parameter is: (a-1)/r. The Credibility assigned to the observation of 6 /10 is Z = E / {E + (a-1)/r} = 10 / {10 + (4-1)/1} = 10 /13. The prior estimate is 5/3. The Buhlmann Credibility estimate is: (10/13)(6/10) + (1- 10/13)(5/3) = 6/13 + 5/13 = 11/13, equal to the Bayesian estimate. This is yet another example of the general result when the likelihood is a member of a linear exponential family in a conjugate prior situation.
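As an added check (an illustration, not from the original text), the Beta-Geometric formulas quoted in this comment can be evaluated directly: with a = 4, b = 5, r = 1 and 6 claims observed in 10 years, the Bayesian and Buhlmann estimates agree.

```python
from fractions import Fraction as F

a, b, r = 4, 5, 1      # Beta prior parameters and the fixed Negative Binomial r (Geometric)
C, E = 6, 10           # 6 claims observed in 10 years

# Bayesian estimate of next year's mean frequency: r(b + C) / (a + rE - 1).
bayes = F(r * (b + C), a + r * E - 1)

# Buhlmann Credibility: K = (a - 1)/r, prior mean = rb/(a - 1).
K = F(a - 1, r)
Z = F(E, 1) / (E + K)
buhlmann = Z * F(C, E) + (1 - Z) * F(r * b, a - 1)

print(bayes, buhlmann)   # both 11/13
```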
12.11. A. The marginal distribution is: ∞
∞
∫f(x)g(θ)dθ = ∫{θ3e−θ/x / (2x4) } {θ4 e−θ/10/2,400,000}dθ = 0
0 ∞
∫θ7e−θ(1/x + 1/10) dθ = x-4 {Γ(8)/4,800,000} (1/x + 1/10)-8 =
x-4/4,800,000 0
x-4 {5040 /4,800,000} {x8 108 (x + 10)-8} = 105,000 x4 /(x+10)8 . Comment: The integral is reduced to one involving the Gamma function. The conditional distribution of x is an Inverse Gamma Distribution, with shape parameter 3 and scale parameter θ. The prior distribution of θ is a Gamma Distribution, with shape parameter 5 and scale parameter 10. The marginal distribution is a Generalized Pareto Distribution, with α = 3, θ = 10, τ = 5. In general the marginal distribution will be a Generalized Pareto with parameters: α = shape parameter of the conditional Inverse Gamma Distribution, θ = scale parameter of the prior Gamma Distribution, τ = shape parameter of the prior Gamma Distribution. Thus we see how one can obtain a Generalized Pareto as a mixture of an Inverse Gamma Distribution via a Gamma Distribution. As a special case, one can obtain a Pareto Distribution as a mixture of an Inverse Exponential via a Gamma Distribution. As another special case, one can obtain an Inverse Pareto Distribution as a mixture of an Inverse Gamma via an Exponential Distribution.
12.12. A. From the previous solution, the marginal distribution is: 105,000 x4 /(x+10)8 . This is a Generalized Pareto Distribution with α = 3, θ = 10, τ = 5. Its mean is: θτ / (α-1) = (10)(5)/2 = 25. Alternately, given θ, one has a Inverse Gamma severity distribution with parameters α = 3 and θ, with mean θ/(α-1) = θ/2. Thus the overall mean is E[θ/2] = E[θ]/2 = (mean of the Gamma distribution of θ)/2 = (5)(10)/2 = 25. 12.13. E. Using Bayes Theorem, the posterior distribution is proportional to the product of the chance of the observation and the prior distribution of θ: f(5)f(8)f(10)f(20)g(θ), which is proportional to: θ3e−θ/5 θ3e−θ/8 θ3e−θ/10 θ3e−θ/20 θ4 e−θ/10 = θ16e−.575θ. This is proportional to a Gamma Distribution with α = 17, and scale parameter = 1/0.575: (.57517)θ16e−.575θ / Γ(17) = 3.92 x 10-18 θ16e−.575θ.
12.14. C. From the previous solution, the posterior distribution is: (0.57517)θ16e−.575θ / Γ(17), and thus the predictive distribution is: ∞
∫{θ3e−θ/x / (2x4 } {(0.57517)θ16e−.575θ / Γ(17)} dθ = 0
0.57517 -4 x 2 Γ(17)
=
∞
∫0
0.57517 -4 θ19 e- θ(1/ x + 0.575) dθ = x Γ(20) (1/x + 0.575)-20 2 Γ(17)
Γ(20) (0.57517) x-4 x20 0.575-20 (x + 1.739)-20 = 2 Γ(17)
(2907) x16 0.575-3 (x + 1.739)-2 = 15,291 x16 / (x+1.739)2 0. Comment: The integral is reduced to one involving the Gamma function. The predictive distribution is a Generalized Pareto Distribution with α = 3, θ = 1/ 0.575 = 1.739, τ = 17. 12.15. A. From the previous solution, the predictive distribution is a Generalized Pareto Distribution with α = 3, θ = 1/0.575 = 1.739, τ = 17. This has a mean of θ/(α-1) = (17)(1/0.575)/2 = 14.78. 12.16. D. Given θ, one has a Inverse Gamma severity distribution with shape parameter 3 and variance θ2/ (3-1)2 (3-2) = θ2/4. EPV = E[θ2/4] = E[θ2]/4 = (second moment of the prior Gamma distribution of θ)/4 = (500 + 502 )/4 = 3000/4 = 750. 12.17. C. Given θ, one has a Inverse Gamma severity distribution with shape parameter 3 and mean θ/ (3-1) = θ/2. Thus the Variance of the Hypothetical Mean Severities = Var[θ/2] = Var[θ]/22 = (variance of the prior Gamma distribution of θ)/4 = (5(10)2 )/4 = 125. 12.18. E. From previous solutions, EPV = 750, VHM = 125, and thus K = 750/125 = 6. Z = 4/(4 + 6) = .4. From a previous solution, the a priori mean is 25. The observation is (5+8+10+20)/4 = 10.75. Thus the estimate is: (10.75)(.4) + (25)(.6) = 19.3. Comment: Note that the estimates from Buhlmann Credibility and Bayesian Analysis are not equal. While the Gamma is a Conjugate Prior for the Inverse Gamma severity, the Inverse Gamma is not a member of a linear exponential family.
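In contrast to the exact-credibility cases, solutions 12.14 through 12.18 give an example where the two estimates differ. The added sketch below (an illustration, not part of the original text) recomputes both: the Bayes estimate from the posterior Gamma of solution 12.13, and the Buhlmann estimate from the EPV and VHM of solutions 12.16 and 12.17.

```python
# Inverse Gamma severity (shape 3) with a Gamma prior on theta (alpha = 5, theta = 10).
claims = [5, 8, 10, 20]

# Bayes: posterior of theta is Gamma with alpha' = 17 and rate 0.575 (solution 12.13);
# the hypothetical mean severity is theta/2, so the Bayes estimate is E[theta]/2.
alpha_post, rate_post = 17, 0.575
bayes = (alpha_post / rate_post) / 2

# Buhlmann: EPV = 750 and VHM = 125 (solutions 12.16 and 12.17), so K = 6.
EPV, VHM = 750, 125
K = EPV / VHM
Z = len(claims) / (len(claims) + K)
prior_mean, observed_mean = 25, sum(claims) / len(claims)
buhlmann = Z * observed_mean + (1 - Z) * prior_mean

print(round(bayes, 2), round(buhlmann, 1))   # 14.78 versus 19.3: the estimates are not equal
```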
12.19. B. The marginal distribution is: ∞
∞
∫f(x)g(λ)dλ = ∫{λ6x5e−λx / 120 } {3456 λ3 e−12λ}dλ = 0
0 ∞
∫λ9e−λ(x +12) dλ = 28.8x5 Γ(10) (x +12)-10 = 10,450,944 x5 (x +12)- 1 0.
28.8x5 0
Comment: The integral is reduced to one involving the Gamma function. The conditional distribution of x is a Gamma Distribution with shape parameter 6 and scale parameter 1/λ. The prior distribution of λ is another Gamma Distribution with shape parameter 4 and scale parameter 1/12. The marginal distribution is a Generalized Pareto Distribution with α = 4, θ = 12, τ = 6. In general the marginal distribution will be a Generalized Pareto with parameters: α = shape parameter of the prior Gamma Distributions, θ = 1 / scale parameter of the prior Gamma Distribution, τ = shape parameter of the conditional Gamma Distribution. Thus we see how one can obtain a Generalized Pareto as a mixture of an Gamma Distribution via another Gamma Distribution. As a special case, one can obtain a Pareto Distribution as a mixture of an Exponential via a Gamma Distribution. (Reformulated this is the Inverse-Gamma Exponential Conjugate Prior.) 12.20. D. From the previous solution, the marginal distribution is: 10,450,944 x5 (x +12)-10. This is a Generalized Pareto Distribution with parameters α = 4, θ = 12, τ = 6. Its mean is: θτ / (α-1) = (12)(6)/3 = 24. Alternately, given λ, one has a Gamma severity distribution with mean 6/λ. Thus the overall mean is E[6/λ] = 6E[1/λ] = (6) (the negative first moment of the Gamma distribution of λ, with α = 4 and θ = 1/12) = (6) {(1/12)-1/(4-1)) = 24. 12.21. C. Using Bayes Theorem, the posterior distribution is proportional to the product of the chance of the observation and the prior distribution of λ: f(5)f(8)f(10)f(20)g(λ), which is proportional to: λ6e−5λ λ6e−8λ λ6e−10λ λ6e−20λ λ3 e−12λ = λ27e−55λ. This is proportional to a Gamma Distribution with α = 28 and θ = 1/55: (5528)λ27e−55λ / Γ(28) = 4.934 x 1020 λ 27e−55λ.
12.22. D. From the previous solution, the posterior distribution of λ is: (5528)λ27e−55λ / Γ(28), and thus the predictive distribution is: ∞
∫{λ6x5e−λx / 120} {(5528)λ27e−55λ / Γ(28)} dλ = 0 ∞
(5528 / 120 Γ(28)) x5
∫λ33e−λ(x + 55) dλ = (5528 / 120 Γ(28))x5 Γ(34) (x + 55)-34 0
= 3.570 x 1055 x5 / (x + 55)- 3 4. Comment: The integral is reduced to one involving the Gamma function. The predictive distribution is a Generalized Pareto Distribution with α = 28, θ = 55, and τ = 6. 12.23. A. From the previous solution, the predictive distribution is a Generalized Pareto Distribution with α = 28, θ = 55, τ = 6. This has a mean of θτ/(α-1) = (6)(55)/27 = 12.22. Alternately, given λ the severity is Gamma with α = 6 and θ = 1/λ, and thus mean 6/λ. From a previous solution, the posterior distribution of λ is: (5528)λ27e−55λ / Γ(28), a Gamma Distribution with α = 28 and θ = 1/55. Thus the posterior mean of 1/λ is: θ−1 Γ[α-1]/Γ[α] = 55 Γ[27]/Γ[28] = 55/27. Thus the posterior mean severity is: 6 E[1/λ] = (6)(55)/27 = 12.22. Comment: For a Gamma Distribution, E[Xk] = θk
Γ(α + k) , k > −α. Γ(α)
12.24. From a previous solution, the posterior distribution of λ is: (5528)λ27e−55λ / Γ(28), a Gamma Distribution with α = 28 and θ = 1/55. Thus posterior, E[1/λ2] = θ−2 Γ[α-2]/Γ[α] = 552 Γ[26]/Γ[28] = 552 /{(26)(27)}.
⇒ Var[6/λ] = 36 Var[1/λ] = 36{E[1/λ2] - E[1/λ]2 } = (36){552 /{(26)(27)} - (55/27)2 } = 5.745. From the previous solution, E[6/λ] = (6)(55)/27 = 12.22. Thus using the Normal Approximation, a 95% confidence interval is: 12.22 ± 1.960 5.745 = 12.22 ± 4.70 = [7.52, 16.92]. Comment: Similar to Exercise 15.81 in Loss Models. Γ(α + k) , k > −α. For a Gamma Distribution, E[Xk] = θk Γ(α)
12.25. E. Given λ, one has a Gamma severity distribution with variance 6/λ2. Thus the Expected Value of the Process Variance is E[6/λ2] = 6E[1/λ2] = 6 (negative second moment of the Gamma distribution of λ) = (6){(1/12)-2/{(4-1)(3-1)}) = 144. 12.26. C. Given λ, one has a Gamma severity distribution with mean 6/λ. Thus the Variance of the Hypothetical Mean Severities is Var[6/λ] = 62 Var[1/λ] = (36){E[1/λ2] - E2 [1/λ]} = 36{24 - 42 ) = 288. 12.27. A. From previous solutions, EPV = 144, VHM = 288, and thus K = 144/288 = 1/2. Z = 4/(4 + 1/2) = 8/9. From a previous solution, the a priori mean is 24. The observation is: (5 + 8 + 10 + 20)/4 = 10.75. Thus the estimate is: (10.75)(8/9) + (24)(1/9) = 110/9 = 12.2. Comment: Note that the estimate from Buhlmann Credibility is equal to that from Bayesian Analysis. The Gamma is a Conjugate Prior for the Gamma severity, and the Gamma severity, with fixed shape parameter, is a member of a linear exponential family. 12.28. The posterior density is proportional to the product of the prior distribution and the densities at the observations. The prior Inverse Transformed Gamma density with β taking the place of x is proportional to: β -11/6 exp(-7/β1/3). The Weibull density at observation x is proportional to: β−1/3 exp(- x1/3 β −1/3). Thus at x = 1, 8, 27, 125, the densities are proportional to: β −1/3 exp(-β−1/3), θ−1/3 exp(-2 β −1/3), β−1/3 exp(-3 β −1/3), and β−1/3 exp(-5 β −1/3). Thus the posterior distribution is proportional to: β -11/6 exp(-7β−1/3) β−1/3 exp(-β−1/3) β−1/3 exp(-2β−1/3) β−1/3 exp(-3β−1/3) β−1/3 exp(-5β−1/3) = β -19/6 exp(-18β −1/3). This proportional to: an Inverse Transformed Gamma distribution with α = 13/2 = 6.5, θ = 183 and τ = 1/3. Note that then: ατ + 1 = (1/3)(13/2) + 1 = 19/6. Comment: The Inverse Transformed Gamma is a Conjugate Prior to the Weibull Iikelihood with tau fixed; the tau parameters of the Weibull and the Inverse Transformed Gamma have to be equal. If one instead in the Weibull used the parameter λ = θτ, then λ follows an Inverse Gamma Distribution, and the Inverse Gamma is a conjugate prior to the Weibull.
12.29. E. As determined in the previous solution, the posterior distribution of β is: an Inverse Transformed Gamma distribution with α = 6.5, θ = 18³ and τ = 1/3. It has mean: 18³ Γ(6.5 - 3) / Γ(6.5) = 18³ / {(5.5)(4.5)(3.5)} = 67.32.
Given a Weibull with parameters β and τ = 1/3, the mean severity is: β Γ(1 + 1/τ) = β Γ(1 + 3) = 6β.
Posterior expected value of the severity is: E[6β] = 6 E[β] = (6)(67.32) = 404.
Comment: Using the squared error loss function the Bayesian estimate is the mean. Loss functions are discussed in “Mahlerʼs Guide to Buhlmann Credibility.”
12.30. A. As determined in a previous solution, the posterior distribution of β is: an Inverse Transformed Gamma distribution with α = 6.5, θ = 18³ and τ = 1/3. Using the zero-one loss function, the Bayesian estimate of β is the mode of the posterior Inverse Transformed Gamma distribution: θ {τ/(ατ + 1)}^(1/τ) = 18³ {(1/3)/(13/6 + 1)}³ = 6.80.
Given a Weibull with parameters β and τ = 1/3, the mean severity is: β Γ(1 + 1/τ) = β Γ(1 + 3) = 6β. Thus the estimated hypothetical mean severity is: (6)(6.80) = 40.8.
12.31. C. As determined in a previous solution, the posterior distribution of β is: an Inverse Transformed Gamma distribution with α = 6.5, θ = 18³ and τ = 1/3. Using the absolute error loss function, the Bayesian estimate of β is the median of the posterior distribution. The Inverse Transformed Gamma Distribution function is: 1 - Γ[α; (θ/x)^τ] = 1 - Γ[6.5; 18/x^(1/3)], so the median is such that Γ[6.5; 18/x^(1/3)] = 0.5. We are given that Γ[6.5; 6.16988] = 0.5. Therefore, 18/x^(1/3) = 6.16988. ⇒ median = (18/6.16988)³ = 24.8.
Given a Weibull with parameters β and τ = 1/3, the mean severity is: β Γ(1 + 1/τ) = β Γ(1 + 3) = 6β. Thus the estimate of the hypothetical mean severity is: (6)(24.8) = 149.
12.32. E. The probability of the observation is proportional to:
{exp[-(800 - 1000)²/(2v)]/√v} {exp[-(900 - 1000)²/(2v)]/√v} = exp[-25000/v]/v.
The prior distribution of v is proportional to: e^(-50000/v)/v^5. Therefore, the posterior distribution of v is proportional to: (exp[-25000/v]/v) e^(-50000/v)/v^5 = e^(-75000/v)/v^6. This is proportional to an Inverse Gamma Distribution with α = 5 and θ = 75,000.
⇒ The posterior distribution of v is an Inverse Gamma Distribution with α = 5 and θ = 75,000. The mean of the posterior distribution of v is: 75,000/(5 - 1) = 18,750.
12.33. B. Applying Bayes Theorem, the posterior distribution is proportional to:
{y1^2 y2^3 y3^4 y4^5 (1 - y1 - y2 - y3 - y4)^19} {y1^3 y2^2 y3^4 y4^6 (1 - y1 - y2 - y3 - y4)^35} = y1^5 y2^5 y3^8 y4^11 (1 - y1 - y2 - y3 - y4)^54.
This posterior distribution is also a Dirichlet Distribution, with a0 = 55, a1 = 6, a2 = 6, a3 = 9, and a4 = 12. a' = Σ aj' = 55 + 6 + 6 + 9 + 12 = 88. E[yj] = E[j-1|qx] = aj'/a'.
E[qx] = a1'/a' = 6/88 = .06818. E[1|qx] = a2'/a' = 6/88 = .06818. E[2|qx] = a3'/a' = 9/88 = .10227. E[3|qx] = a4'/a' = 12/88 = .13636.
The posterior estimate of 3px is: (1 - qx)(1 - 1|qx)(1 - 2|qx) = (1 - .06818)(1 - .06818)(1 - .10227) = .7795. The posterior estimate of 3qx is: 1 - .77949 = 0.2205.
12.34. F(x) = 1 - exp[-cx²]. f(x) = 2cx exp[-cx²]. Thus the chance of the observation is proportional to: c^4 exp[-c(1² + 2² + 3² + 5²)] = c^4 exp[-39c]. The prior distribution is proportional to the density of the Gamma prior: c^9 exp[-50c]. Thus the posterior distribution is proportional to: c^9 exp[-50c] c^4 exp[-39c] = c^13 exp[-89c]. Thus the posterior distribution is also a Gamma with parameters α = 14 and θ = 1/89.
Comment: The severity distribution is a Weibull parameterized differently than in Loss Models.
12.35. C. For the Inverse Gamma, f(x | q) = q^α e^(-q/x) / {Γ[α] x^(α+1)} = q^6 e^(-q/x) / {5! x^7}. For the Exponential, π(q) = e^(-q/100)/100. The posterior distribution is proportional to: f(10 | q) f(20 | q) f(50 | q) π(q), which is proportional to: q^6 e^(-q/10) q^6 e^(-q/20) q^6 e^(-q/50) e^(-q/100) = q^18 e^(-0.18q). This is a Gamma Distribution with parameters 19 and 1/0.18 = 5.556.
For the zero-one loss function, the Bayes estimate is the mode of the posterior distribution. The mode of a Gamma distribution for α > 1 is: θ(α - 1) = (5.556)(19 - 1) = 100.
Comment: This is an example of an Exponential-Inverse Gamma, a special case of the Gamma-Inverse Gamma with α = 1. α' = α + aC = 1 + (6)(3) = 19, and 1/θ' = 1/θ + Σ(1/xi) = 1/100 + 1/10 + 1/20 + 1/50. For the squared error loss function, the Bayes estimate is the mean of the posterior distribution, which in this case is: θα = (5.556)(19) = 105.6.
12.36. By differentiating with respect to x: f(x | β) = β 10^β / x^(β+1), x > 10. The chance of the observation given β is: 6 f(15 | β) f(25 | β) f(60 | β). This is proportional to: β(10/15)^β β(10/25)^β β(10/60)^β = β³ 1.5^(-β) 2.5^(-β) 6^(-β) = β³ 22.5^(-β).
The prior density of beta is a Gamma Distribution and is proportional to: e^(-2β) β³. Therefore, the posterior density of β is proportional to: e^(-2β) β³ β³ 22.5^(-β) = β^6 exp[-β(2 + ln(22.5))] = β^6 exp[-β/0.1956]. Thus the posterior distribution of β is Gamma with α = 7 and θ = 0.1956.
Comment: The severity distribution is a Single Parameter Pareto with α = β and θ = 10. Note that for β ≤ 1, the mean severity does not exist. Since the Gamma has support starting at zero, neither the marginal nor the predictive distributions have a finite mean. In general, the Gamma Distribution is a Conjugate Prior to the Single Parameter Pareto likelihood with θ fixed. In general, if the prior Gamma has shape parameter α and scale parameter δ, then the posterior Gamma has parameters: α' = α + n, and 1/δ' = 1/δ + Σ_{i=1}^{n} ln[xi/θ].
In this case: α' = 4 + 3 = 7, and 1/δ' = 1/0.5 + ln[15/10] + ln[25/10] + ln[60/10] = 5.1135.
[Graph comparing the prior and posterior Gamma densities of β, plotted for beta from 0 to 7.]
12.37. C. By Bayes Theorem, the posterior distribution of λ is proportional to the product of the prior distribution of λ and the chance of the observation given λ. The chance of the observation is a product of four density functions: f(1;λ) f(4;λ) f(9;λ) f(64;λ), which is proportional to: λ^4 exp(-λ) exp(-2λ) exp(-3λ) exp(-8λ) = λ^4 e^(-14λ). The a priori distribution of λ is: 2^0.7 λ^(0.7−1) e^(−2λ) / Γ(0.7). Thus the posterior distribution of λ is proportional to: λ^4 e^(-14λ) λ^(−0.3) e^(−2λ) = λ^3.7 e^(-16λ). This is proportional to a Gamma distribution with α = 4.7 and θ = 1/16. Thus the mean of the posterior distribution is αθ = 4.7/16 = 0.294.
Comment: The Gamma Distribution is a conjugate prior distribution to the Weibull Distribution. Assume that the Weibull is parameterized in a somewhat different way than in Loss Models, has parameters τ (fixed and known), and λ. λ here corresponds to θ^(−τ) in Loss Models. If the prior Gamma distribution of λ has parameters α and θ, then the posterior Gamma distribution of λ has parameters: α' = α + n, and 1/θ' = 1/θ + Σ xi^τ.
In this case, α' = 0.7 + 4 = 4.7, and 1/θ' = 2 + (1^(1/2) + 4^(1/2) + 9^(1/2) + 64^(1/2)) = 16. One could instead put this all in terms of an Inverse Gamma Distribution of 1/λ. Integrating the probability weight of λ^3.7 e^(-16λ) from zero to infinity one gets Γ(4.7)/16^4.7. Thus dividing by this constant will give the posterior distribution, which must integrate to unity. The posterior distribution is: 16^4.7 λ^3.7 e^(-16λ) / Γ(4.7), a Gamma distribution with α = 4.7 and θ = 1/16.
12.38. E. The posterior distribution of ψ is a Gamma with parameters α′ = n + α = 5 + 2.2 = 7.2, and 1/θ′ = 1/θ + Σ ln(1+xi) = 2 + ln(4) + ln(10) + ln(13) + ln(7) + ln(5) = 2 + ln((4)(10)(13)(7)(5)) = 2 + ln(18200). ⇒ θ′ = 1/{2 + ln(18200)}. The mean of the posterior Gamma is: α′θ′ = 7.2/{2 + ln(18,200)}.
Alternately, not using the hint, the chance of the observation is a product of probability density functions of the Pareto: f(x) = ψ(1+x)^(−(ψ+1)). The a priori distribution of ψ is given by the prior Gamma Distribution: g(ψ) = θ^(−α) ψ^(α−1) e^(−ψ/θ) / Γ(α) = 2^2.2 ψ^1.2 e^(−2ψ) / Γ(2.2), which is proportional to ψ^1.2 e^(−2ψ). Thus the posterior distribution of ψ is proportional to:
ψ(1+3)^(−(ψ+1)) ψ(1+9)^(−(ψ+1)) ψ(1+12)^(−(ψ+1)) ψ(1+6)^(−(ψ+1)) ψ(1+4)^(−(ψ+1)) ψ^1.2 e^(−2ψ) = ψ^6.2 (18200)^(−(ψ+1)) e^(−2ψ) = (18200)^(-1) ψ^6.2 e^(-(2 + ln(18200))ψ).
This is proportional to a Gamma probability density with parameters: α = 7.2 and θ = 1/{2 + ln(18,200)}. The mean of the posterior Gamma is: αθ = 7.2/{2 + ln(18,200)}.
Comment: When one is minimizing the squared error, the Bayes estimator is the mean of the posterior distribution. The posterior distribution of ψ is proportional to the product of the chance of the observation given ψ and the a priori chance of ψ. The Gamma is a Conjugate Prior for the Pareto (with a fixed scale parameter).
12.39. D. 1. False. The posterior distribution for one year can be used as the prior distribution for the next year. 2. True. 3. True. The Beta Distribution is a conjugate prior distribution to the Binomial Distribution with fixed n.
12.40. A. Applying Bayes Theorem, the posterior distribution is proportional to:
{t1^6 t2^3 t3^14 t4^8 (1 - t1 - t2 - t3 - t4)^3} {t1^7 t2^12 t3^11 t4^11 (1 - t1 - t2 - t3 - t4)^9} = t1^13 t2^15 t3^25 t4^19 (1 - t1 - t2 - t3 - t4)^12.
This posterior distribution is also a Dirichlet Distribution, with a0 = 13, a1 = 14, a2 = 16, a3 = 26, and a4 = 20. a' = Σ aj' = 13 + 14 + 16 + 26 + 20 = 89. E[tj] = E[j-1|qx] = aj'/a'.
E[2|qx] = a3'/a' = 26/89 = 0.292.
Comment: E[qx] = a1'/a' = 14/89. E[1|qx] = a2'/a' = 16/89. E[3|qx] = a4'/a' = 20/89.
12.41. C. Assume that x lives die in year 3. Then 40 - 4 - 24 - x = 12 - x lives survive beyond year 3. Applying Bayes Theorem, the posterior distribution is proportional to:
{t1^7 t2^5 t3^3 (1 - t1 - t2 - t3)} {t1^4 t2^24 t3^x (1 - t1 - t2 - t3)^(12 - x)} = t1^11 t2^29 t3^(3+x) (1 - t1 - t2 - t3)^(13 - x).
This posterior distribution is also a Dirichlet Distribution, with a0 = 14 - x, a1 = 12, a2 = 30, and a3 = 4 + x.
E[q0] = 12/(14 - x + 12 + 30 + 4 + x) = 12/60 = 1/5. E[1|q0] = 30/(14 - x + 12 + 30 + 4 + x) = 30/60 = 1/2.
q1 = 1|q0/(1 - q0) = (1/2)/(1 - 1/5) = 5/8.
12.42. A. For the Beta-Bernoulli the predictive distribution is the Bernoulli, the predictive distribution for the Gamma-Poisson is the Negative Binomial, and for the Inverse Gamma-Exponential the predictive distribution is the Pareto.
12.43. D. The prior distribution is proportional to: t1^49 t2^39 t3^29 .... Assume that x die in the first year, y die in the second year, and z die in the third year. Applying Bayes Theorem, the posterior distribution is proportional to: t1^(49+x) t2^(39+y) t3^(29+z) .... This posterior distribution is also a Dirichlet Distribution, with a1' = 50 + x, a2' = 40 + y, a3' = 30 + z. aj' = aj + dj, where dj is the number who die during year j and d0 is the number who survive beyond year k. a' = Σaj' = Σaj + Σdj = 150 + 100 = 250. E[tj] = E[j-1|qx] = aj'/a'. E[2|qx] = a3'/a' = (30 + z)/250.
The posterior estimate of 2|qx = (1 - q0)(1 - q1)q2 = (16/25)(3/5)(7/12) = .224. (30 + z)/250 = 0.224. ⇒ z = 26.
12.44. D. The prior rate for the first year is: a1/a = 0.06. ⇒ a1 = 0.06a.
Let there be dj deaths observed during year j, d0 lives survive beyond year k, and d = Σ_{j=0}^{k} dj = original number of lives. Then applying Bayes Theorem, the posterior distribution is Dirichlet with parameters aj' = aj + dj. The posterior estimate of ti = i-1|qx is: ai'/a' = (ai + di)/(a + d) = (ai/a){a/(a + d)} + (di/d){d/(a + d)}.
The weight to the prior rate, ai/a, is: a/(a + d) = a/(a + 200) = 5/9. ⇒ a = 250. ⇒ a1 = (.06)(250) = 15.
Comment: The estimate from Bayes Analysis is equal to that from Buhlmann Credibility, with K = a = 250, Z = d/(d + a), and 1 - Z = a/(d + a) = 250/(200 + 250) = 5/9.
12.45. B. By Bayesʼ Theorem, the posterior distribution of θ is proportional to the product of the a priori probability of θ and the chance of the observation given θ: g(θ) f(2 | θ) = (e^(-θ/4)/4)(θ e^(-θ/2)/2²) = θ e^(−3θ/4)/16, which is proportional to: θ e^(−3θ/4).
Comment: θ ⇔ x; the posterior distribution of θ is proportional to: x e^(-x/(4/3)). This is a Gamma Distribution with α = 2 and scale parameter of 4/3. Looking in the Appendix A attached to the exam, the constant is: 1/{Γ(α) θ^α} = 1/{Γ(2) (4/3)²} = 9/16. Therefore, the posterior distribution is: (9/16) θ e^(−3θ/4). This is an example of a Gamma - Inverse Exponential Conjugate Prior. The severity distribution is an Inverse Exponential Distribution with scale parameter θ. The prior distribution of θ is a Gamma Distribution with α = 1 and scale parameter 4. One observes one claim of size 2 (C = 1, Λ = sum of inverses of loss sizes = 1/2). The posterior distribution of θ is also a Gamma Distribution, but with α = 1 + C = 1 + 1 = 2 and scale parameter: 1/(1/4 + Λ) = 1/(1/4 + 1/2) = 4/3.
12.46. B. The predictive distribution is the Inverse Exponential severity mixed via the posterior Gamma:
∫₀^∞ (9/16) θ e^(-3θ/4) {θ e^(-θ/x)/x²} dθ = {9/(16x²)} ∫₀^∞ θ² e^(-(3/4 + 1/x)θ) dθ = {9/(16x²)} 2!/(3/4 + 1/x)³ = (8x/3)(x + 4/3)^(-3).
This is an Inverse Pareto as per Appendix A of Loss Models, with τ = 2 and θ = 4/3.
Comment: Note that neither the Inverse Exponential nor the Inverse Pareto have a finite mean. Therefore, one cannot use Bayesian Analysis or Buhlmann Credibility to predict the future severity of this insured. The density of the Inverse Pareto is: τθ x^(τ−1)/(x + θ)^(τ+1).
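The mixing integral in 12.46 can also be checked numerically. The Python sketch below (illustration only, not needed for the exam) integrates the Inverse Exponential density against the posterior Gamma at the test value x = 5 and compares it to the closed-form Inverse Pareto density (8x/3)(x + 4/3)^(-3); the cutoff and step count are arbitrary choices.

```python
import math

# Check the predictive density of 12.46 by numerical integration at x = 5.
x = 5.0

def integrand(theta):
    posterior = (9.0 / 16.0) * theta * math.exp(-3.0 * theta / 4.0)  # posterior Gamma density of theta
    severity = theta * math.exp(-theta / x) / x**2                   # Inverse Exponential density at x
    return posterior * severity

# Simple trapezoidal integration over theta from 0 to a large cutoff.
steps, upper = 200000, 60.0
h = upper / steps
numeric = h * (sum(integrand(i * h) for i in range(1, steps)) + 0.5 * integrand(upper))

closed_form = (8.0 * x / 3.0) * (x + 4.0 / 3.0) ** -3                # Inverse Pareto, tau = 2, theta = 4/3
print(numeric, closed_form)                                          # both about 0.0525
```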
Section 13, Important Formulas and Ideas Here are what I believe are the most important formulas and ideas from this study guide to know for the exam.
Mixing Poissons (Section 2): When mixing Poissons: EPV = µ = a priori mean.
Gamma-Poisson Conjugate Prior (Section 4): Estimate using Buhlmann Credibility = Estimate using Bayes Analysis Inputs: Prior Gamma Distribution has parameters α and θ. Poisson Likelihood (Model Distribution) has mean λ that varies across the portfolio via a Gamma Distribution. Marginal Distribution: Negative Binomial Distribution with parameters r = α , β = θ. r goes with alpha, and beta rhymes with theta. Buhlmann Credibility Parameter = 1/θ. Observations: C claims for E exposures. Posterior Gamma Distribution has parameters αʼ = α + C, 1/θʼ = E + 1/θ. Predictive Distribution: Negative Binomial with parameters rʼ = αʼ and βʼ = θʼ. The Exponential is a special case of the Gamma with α = 1. For the Exponential-Poisson, the marginal distribution is Geometric with β = θ.
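The Gamma-Poisson updating above is easy to check numerically. The short Python sketch below is only an illustration (the values α = 3, θ = 0.1, and C = 2 claims in E = 4 exposures are made up); it applies the updating formulas and verifies that the Buhlmann credibility estimate with K = 1/θ matches the mean of the predictive Negative Binomial.

```python
# Gamma-Poisson conjugate prior: a quick numerical check of the updating formulas.
alpha, theta = 3.0, 0.1      # prior Gamma parameters (illustrative values)
C, E = 2, 4                  # observed: C claims in E exposures

# Posterior Gamma: alpha' = alpha + C, 1/theta' = 1/theta + E.
alpha_post = alpha + C
theta_post = 1.0 / (1.0 / theta + E)

# Predictive Negative Binomial: r' = alpha', beta' = theta'; its mean is r' * beta'.
predictive_mean = alpha_post * theta_post

# Buhlmann credibility with K = 1/theta gives the same estimate of the annual frequency.
K = 1.0 / theta
Z = E / (E + K)
prior_mean = alpha * theta
buhlmann_estimate = Z * (C / E) + (1 - Z) * prior_mean

print(predictive_mean, buhlmann_estimate)   # both about 0.357
```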
Beta-Bernoulli and Beta Binomial Conjugate Priors (Section 6 to 7): Estimate using Buhlmann Credibility = Estimate using Bayes Analysis Inputs: Prior Beta has parameters a, b, and θ = 1. For use in the Beta-Bernoulli, the θ parameter in the Beta is always equal to one.
For θ = 1, f(x) = {(a + b - 1)! / [(a - 1)! (b - 1)!]} x^(a-1) (1 - x)^(b-1), 0 ≤ x ≤ 1.
Bernoulli Likelihood has mean q that varies across the portfolio via Beta. Marginal Distribution: Bernoulli with parameter q = a/(a+b). Buhlmann Credibility Parameter = a + b. Observations: r claims (successes) for n exposures (trials). Posterior Beta has parameters aʼ = a + r and bʼ = b + n - r. Predictive Distribution: Bernoulli with parameter q = aʼ/(aʼ+bʼ). The Uniform Distribution on (0,1) is a special case of the Beta Distribution, with a = 1 and b = 1, (and θ = 1). Thus the case of Bernoulli parameters uniformly distributed on (0, 1) is a special case of the Beta-Bernoulli, with a =1 and b =1. If for fixed m, the q parameter of the Binomial is distributed over a portfolio by a Beta, then the posterior distribution of q parameters is also given by a Beta with parameters: aʼ = a + number of claims. bʼ = b + m(number of years) - number of claims.
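The Beta-Bernoulli updating can be checked the same way. In the sketch below the prior parameters a = 2, b = 8 and the observation of r = 3 claims in n = 10 trials are made-up illustrative values; it verifies that the posterior mean q equals the Buhlmann credibility estimate with K = a + b.

```python
# Beta-Bernoulli conjugate prior: verify posterior mean = Buhlmann credibility estimate.
a, b = 2.0, 8.0              # prior Beta parameters (illustrative values)
r, n = 3, 10                 # observed: r claims (successes) in n exposures (trials)

# Posterior Beta: a' = a + r, b' = b + n - r; predictive Bernoulli has q = a'/(a' + b').
a_post, b_post = a + r, b + (n - r)
posterior_q = a_post / (a_post + b_post)

# Buhlmann credibility with K = a + b.
K = a + b
Z = n / (n + K)
prior_q = a / (a + b)
buhlmann_estimate = Z * (r / n) + (1 - Z) * prior_q

print(posterior_q, buhlmann_estimate)   # both 0.25
```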
Inverse Gamma-Exponential Conjugate Prior (Section 9): Estimate using Buhlmann Credibility = Estimate using Bayes Analysis Inputs: Prior Inverse Gamma has parameters α and θ. Exponential Likelihood has mean δ that varies across the portfolio via an Inverse Gamma Distribution with parameters α and θ. Marginal Distribution: Pareto with same parameters as Prior Inverse Gamma. Buhlmann Credibility Parameter = α - 1. Observations: L dollars of loss on C claims. Posterior Inverse Gamma has parameters α' = α + C and θ' = θ + L. Predictive Distribution: Pareto with same parameters as Posterior Inverse Gamma.
Normal-Normal Conjugate Prior (Section 10): Estimate using Buhlmann Credibility = Estimate using Bayes Analysis Inputs: Prior Normal has parameters µ and σ; Normal Likelihood has fixed variance of s², with mean m that varies across the portfolio via prior Normal. Marginal Distribution: Normal with mean µ, and variance σ² + s². Buhlmann Credibility Parameter = EPV/VHM = s²/σ². Observations: L dollars of loss on C claims.
Posterior Normal has mean = (Lσ² + µs²)/(Cσ² + s²) and variance = s²σ²/(Cσ² + s²).
Predictive Distribution: Normal with same mean as the posterior Normal and variance s² more than that of the posterior Normal.
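The same kind of numerical check works for the severity conjugate pairs. The sketch below uses made-up values (a prior Inverse Gamma with α = 4, θ = 3000, and L = 5000 of loss on C = 2 claims) and confirms that the mean of the predictive Pareto equals the Buhlmann credibility estimate with K = α - 1.

```python
# Inverse Gamma-Exponential conjugate prior: predictive Pareto mean vs. Buhlmann estimate.
alpha, theta = 4.0, 3000.0   # prior Inverse Gamma parameters (illustrative values)
L, C = 5000.0, 2             # observed: L dollars of loss on C claims

# Posterior Inverse Gamma: alpha' = alpha + C, theta' = theta + L.
alpha_post, theta_post = alpha + C, theta + L
# Predictive Pareto has the posterior parameters; its mean is theta'/(alpha' - 1).
predictive_mean = theta_post / (alpha_post - 1)

# Buhlmann credibility with K = alpha - 1.
K = alpha - 1
Z = C / (C + K)
prior_mean = theta / (alpha - 1)          # mean of the marginal Pareto
buhlmann_estimate = Z * (L / C) + (1 - Z) * prior_mean

print(predictive_mean, buhlmann_estimate)   # both 1600.0
```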
Linear Exponential Families (Section 11): ln f(x; ψ) = x r(ψ ) + ln[p(x)] - ln[q(ψ)]; with r(ψ) the “natural parameter”. Examples of Linear Exponential Families: Poisson, Exponential, Gamma for fixed α, Normal for fixed σ, Bernoulli, Binomial for fixed m, Geometric, Negative Binomial for fixed r, Inverse Gaussian for fixed θ. If the likelihood density is a member of a linear exponential family and the conjugate prior distribution is used as the prior distribution, then the Buhlmann Credibility estimate is equal to the corresponding Bayesian estimate (for a squared error loss function.) Specifically, this applies to the Gamma-Poisson, Beta-Bernoulli, Inverse Gamma-Exponential, and the Normal-Normal (fixed variance.) For Linear Exponential Families, the Methods of Maximum Likelihood and Moments produce the same result when applied to ungrouped data.
Updating Formulas, Conjugate Priors:
Gamma-Poisson: α' = α + C, 1/θ' = (1/θ) + E.
Beta-Bernoulli: a' = a + r, b' = b + n - r.
Beta-Binomial: a' = a + # of claims, b' = b + m(# of years) - # of claims.
Inverse Gamma-Exponential: α' = α + C, θ' = θ + L.
Normal-Normal: µ' = (Lσ² + µs²)/(Cσ² + s²), σ'² = s²σ²/(Cσ² + s²).

Buhlmann Credibility Parameters, Conjugate Priors:
Gamma-Poisson: K = 1/θ.
Beta-Bernoulli: K = a + b.
Beta-Binomial: K = (a + b)/m.
Inverse Gamma-Exponential: K = α - 1.
Normal-Normal: K = s²/σ².
Estimate using Buhlmann Credibility = Estimate using Bayesian Analysis.

Marginal Distributions, Conjugate Priors:
Gamma-Poisson: Negative Binomial with r = α, β = θ.
Beta-Bernoulli: Bernoulli with q = a/(a + b).
Beta-Binomial: Beta-Binomial.
Inverse Gamma-Exponential: Pareto with α' = α, θ' = θ.
Normal-Normal: Normal with µ' = µ, σ'² = σ² + s².
Gamma-Poisson Frequency Process

Gamma Prior (Distribution of Parameters): Shape parameter = alpha = α, scale parameter = theta = θ. The Poisson parameters of the individuals making up the entire portfolio are distributed via a Gamma Distribution with parameters α and θ: f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ[α], mean = αθ, variance = αθ².
Mixing the Poisson process over the prior Gamma gives the Negative Binomial Marginal Distribution (number of claims): r = shape parameter of the Prior Gamma = α, β = scale parameter of the Prior Gamma = θ. Mean = rβ = αθ. Variance = rβ(1+β) = αθ + αθ².
Observations: # claims = C, # exposures = E.
Gamma Posterior (Distribution of Parameters): Posterior shape parameter = α' = α + C. Posterior scale parameter: 1/θ' = 1/θ + E.
Mixing the Poisson process over the posterior Gamma gives the Negative Binomial Predictive Distribution (number of claims): r = shape parameter of the Posterior Gamma = α' = α + C, β = scale parameter of the Posterior Gamma = θ' = 1/(E + 1/θ). Mean = rβ = (α + C)/(E + 1/θ). Variance = rβ(1+β) = (α + C)/(E + 1/θ) + (α + C)/(E + 1/θ)².
Gamma is a Conjugate Prior, and the Poisson is a member of a linear exponential family, so the Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = 1/θ.
Beta-Bernoulli Frequency Process

Beta Prior a, b (Distribution of Parameters): The Bernoulli parameters of the individuals making up the entire portfolio are distributed via a Beta Distribution with parameters a and b: f(x) = (a+b-1)! x^(a-1) (1-x)^(b-1) / {(a-1)!(b-1)!}, 0 ≤ x ≤ 1, mean = a/(a+b), variance = ab/{(a+b+1)(a+b)²}.
Mixing the Bernoulli process over the prior Beta gives the Bernoulli Marginal Distribution (number of claims): Bernoulli parameter q = mean of Bernoulli = a/(a+b) = mean of prior Beta. Variance = q(1-q) = ab/(a+b)².
Observations: # claims = # successes = r, # exposures = # of trials = n.
Beta Posterior (Distribution of Parameters): Posterior 1st parameter = a + r. Posterior 2nd parameter = b + n - r.
Mixing the Bernoulli process over the posterior Beta gives the Bernoulli Predictive Distribution (number of claims): Bernoulli parameter q = mean of Bernoulli = (a+r)/(a+b+n) = mean of posterior Beta. Variance = q(1-q) = (a+r)(b+n-r)/(a+b+n)².
Beta is a Conjugate Prior, and the Bernoulli is a member of a linear exponential family, so the Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = a + b.
Beta-Binomial Frequency Process

Beta is a Conjugate Prior for the Binomial likelihood, and the Binomial with m fixed is a member of a linear exponential family, so the Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = (a + b)/m.
Beta Prior a, b (Distribution of Parameters). Mixing the Binomial process over the prior Beta gives the Beta-Binomial Marginal Distribution (number of claims).
Observations: # claims = C, # years = Y.
Beta Posterior (Distribution of Parameters): a' = a + C, b' = b + mY - C. Mixing the Binomial process over the posterior Beta gives the Beta-Binomial Predictive Distribution (number of claims).
Inverse Gamma-Exponential Severity Process

Inverse Gamma Prior (Distribution of Parameters): Shape parameter = alpha = α, scale parameter = theta = θ. The Exponential parameters (means) of the individuals making up the entire portfolio are distributed via an Inverse Gamma Distribution with parameters α and θ: f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ[α]}, mean = θ/(α−1), variance = θ²/{(α−2)(α−1)²}.
Mixing the Exponential process over the prior Inverse Gamma gives the Pareto Marginal Distribution (size of loss): α = shape parameter of the Prior Inverse Gamma, θ = scale parameter of the Prior Inverse Gamma. Mean = θ/(α−1). Variance = αθ²/{(α−2)(α−1)²}.
Observations: $ of Loss = L, # claims = C.
Inverse Gamma Posterior (Distribution of Parameters): Posterior shape parameter = α' = α + C. Posterior scale parameter = θ' = θ + L.
Mixing the Exponential process over the posterior Inverse Gamma gives the Pareto Predictive Distribution (size of loss): α = α' = α + C, θ = θ' = θ + L. Mean = θ'/(α'−1). Variance = α'θ'²/{(α'−2)(α'−1)²}.
Inverse Gamma is a Conjugate Prior, and the Exponential is a member of a linear exponential family, so the Buhlmann Credibility Estimate = Bayes Analysis Estimate. Buhlmann Credibility Parameter, K = α - 1.
Normal-Normal Severity Process

Normal Prior (Distribution of Parameters): The means m of the Normal severity distributions (each with fixed variance s²) of the individuals making up the entire portfolio are distributed via a Normal Distribution with parameters µ and σ: f(m) = exp[-(m-µ)²/(2σ²)] / {σ√(2π)}, mean = µ, variance = σ².
Mixing the Normal severity process (fixed variance s², mean m) over the prior Normal gives the Normal Marginal Distribution (size of loss): Mean = µ = mean of the prior Normal Distribution. Variance = s² + σ².
Observations: $ of Loss = L, # claims = C.
Normal Posterior (Distribution of Parameters): Mean = (Lσ² + µs²) / {Cσ² + s²}. Variance = s²σ² / {Cσ² + s²}.
Mixing the Normal severity process over the posterior Normal gives the Normal Predictive Distribution (size of loss): Mean = (Lσ² + µs²) / {Cσ² + s²} = mean of the posterior Normal Distribution. Variance = s² + s²σ² / {Cσ² + s²}.
Normal is a Conjugate Prior, and the Normal (fixed variance) is a member of a linear exponential family, so the Buhlmann Credibility Estimate = Bayes Analysis Estimate. K = Variance of Normal Likelihood / Variance of Normal Prior = s²/σ².
Mahlerʼs Guide to
Semiparametric Estimation Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-11 Howard Mahler
[email protected] www.howardmahler.com/Teaching
Mahlerʼs Guide to Semiparametric Estimation Copyright 2013 by Howard C. Mahler. This Study Aid will discuss the important technique of semiparametric estimation. Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Solutions to the problems in each section are at the end of that section.1
Section #    Pages    Section Name
1            3        Introduction
2            4-31     Poisson Frequency
3            32-36    Negative Binomial with Beta Fixed
4            37-40    Geometric Frequency
5            41-44    Negative Binomial with r Fixed
6            45-54    Overview
7            55-58    Other Distributions
8            59       Important Ideas & Formulas
Note that problems include both some written by me and some from past exams. The latter are copyright by the CAS and SOA and are reproduced here solely to aid students in studying for exams. In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam questions, but it will not be specifically tested. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Course 4 Exam Questions by Section of this Study Aid2

Section    Sample   5/00   11/00   5/01   11/01   11/02   11/03   11/04   5/05   11/05   11/06   5/07
1
2            39      33      7                                     37      28     30             25
3
4
5
6
7

The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.

2 Excluding any questions that are no longer on the syllabus.
Section 1, Introduction

The application of Buhlmann Credibility to the type of situation in which one assumes the likelihood has a certain form is referred to by Loss Models as semiparametric empirical Bayes estimation.3 Initially, the likelihood of the frequency for each insured will be assumed to be a Poisson, by far the most common application. Subsequently, I discuss semiparametric estimation using the assumptions of either a Geometric or a Negative Binomial frequency. In each case, the Expected Value of the Process Variance and Total Variance will be estimated, and then we will back out the Variance of the Hypothetical Means. Total Variance = EPV + VHM ⇒ VHM = Total Variance - EPV.

Semiparametric vs. Nonparametric vs. Full Parametric Estimation:

Semiparametric estimation assumes a particular form of the frequency distribution, but differs from full parametric estimation via Buhlmann Credibility4 where in addition one assumes a particular distribution of types of insureds or mixing distribution. Full parametric estimation assumes a complete model and K is calculated with no reference to any observations. For example, if one assumes Good Drivers are 80% of the portfolio and each have a Poisson frequency with λ = 0.03, while Bad Drivers are 20% of the portfolio and each have a Poisson frequency with λ = 0.07, then one can calculate the Buhlmann Credibility Parameter, K, with no reference to any data (see the short sketch at the end of this section). In semiparametric estimation, one relies on the data to help calculate K.
Semiparametric estimation assumes a particular form of the frequency distribution, which differs from Nonparametric estimation5 where no such assumption is made. One can use semiparametric estimation with only one year of data from a portfolio of insureds. Nonparametric estimation requires data separately for each of at least two years from several insureds.
From the least complete to most complete modeling assumptions: Nonparametric, semiparametric, full parametric. From the least reliance on data in order to calculate K to the most reliance on data to calculate K: Full parametric, semiparametric, nonparametric.
3 See Section 20.4.2 of Loss Models, in particular Example 20.36. See pages 8-47 to 8-49 of “Credibility” by Mahler and Dean. See also “Topics in Credibility” by Dean.
4 See “Mahlerʼs Guide to Buhlmann Credibility.”
5 See “Mahlerʼs Guide to Empirical Bayesian Credibility.”
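As a concrete illustration of the full parametric case mentioned above (the Good Driver / Bad Driver example with λ = 0.03 and λ = 0.07), the short Python sketch below computes K = EPV/VHM directly from the assumed model, with no data at all. It is only a numerical illustration of that example.

```python
# Full parametric estimation: K computed from an assumed model, with no data.
# Good Drivers: 80% of the portfolio, Poisson with lambda = 0.03.
# Bad Drivers:  20% of the portfolio, Poisson with lambda = 0.07.
weights = [0.80, 0.20]
lambdas = [0.03, 0.07]

# For a Poisson, the process variance equals lambda, so EPV = E[lambda].
epv = sum(w * lam for w, lam in zip(weights, lambdas))

# VHM = variance of the hypothetical means = E[lambda^2] - E[lambda]^2.
second_moment = sum(w * lam**2 for w, lam in zip(weights, lambdas))
vhm = second_moment - epv**2

K = epv / vhm
print(epv, vhm, K)   # EPV = 0.038, VHM = 0.000256, K = 148.4...
```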
Section 2, Poisson Frequency

As discussed in “Mahlerʼs Guide to Conjugate Priors”, when each insured has a Poisson frequency, EPV = E[Process Variance | λ] = E[λ] = overall mean. In semiparametric estimation, when one assumes each exposure has a Poisson Distribution, one estimates the mean and total variance and then:
estimated EPV = estimated mean = X.
estimated VHM = estimated total variance - estimated EPV = s² - X.

Exercise: Assume that one observes that the claim count distribution during a year is as follows for a group of 10,000 insureds:6
Total Claim Count:      0     1    2    3    4    5    6    7    >7
Number of Insureds:   8116  1434  329   87   24    7    2    1     0
Assume in addition that the claim count for each individual insured has a Poisson distribution. Estimate the Buhlmann Credibility parameter K.
[Solution: X = 0.2503 and s² = (10,000/9999)(0.4213 - 0.2503²) = 0.3587, where I have used the sample variance, in order to have an unbiased estimate of the variance.7

Number of Claims (A)    Probability (B)    Col. A times Col. B (C)    Square of Col. A times Col. B (D)
0                        0.8116             0.0000                     0.0000
1                        0.1434             0.1434                     0.1434
2                        0.0329             0.0658                     0.1316
3                        0.0087             0.0261                     0.0783
4                        0.0024             0.0096                     0.0384
5                        0.0007             0.0035                     0.0175
6                        0.0002             0.0012                     0.0072
7                        0.0001             0.0007                     0.0049
>7                       0.0000             0.0000                     0.0000
Sum                      1                  0.2503                     0.4213

estimated EPV = X = 0.2503. estimated VHM = Total Variance - estimated EPV = s² - X = 0.3587 - 0.2503 = 0.1084. K = EPV/VHM = 0.2503/0.1084 = 2.3.]
6 For example, 329 of these 10,000 insureds happened to have 2 claims over the previous year. Some of these 329 were very unlucky, but many of these 329 have a worse than average future expected claim frequency.
7 The sample variance has 10,000 - 1 = 9999 in the denominator, rather than 10,000. For a smaller number of insureds, using the sample variance rather than the usual variance could make a significant difference. Loss Models uses the sample variance in this situation, while Mahler & Dean do not. On the exam, I would use the sample variance in this situation.
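The calculation in the exercise above is mechanical, so a short Python sketch may help; it simply recomputes the sample mean, the sample variance, EPV, VHM, and K for the 10,000-insured claim count data, reproducing K ≈ 2.3. This is only a check of the arithmetic, not something needed on the exam.

```python
# Semiparametric estimation with a Poisson frequency: estimate K from one year of claim counts.
counts = {0: 8116, 1: 1434, 2: 329, 3: 87, 4: 24, 5: 7, 6: 2, 7: 1}   # claims -> number of insureds
n = sum(counts.values())                                              # 10,000 insureds

mean = sum(k * m for k, m in counts.items()) / n
second_moment = sum(k * k * m for k, m in counts.items()) / n
sample_variance = (n / (n - 1)) * (second_moment - mean**2)

epv = mean                          # Poisson: process variance = lambda, so EPV = overall mean
vhm = sample_variance - epv         # back out the VHM from the total variance
K = epv / vhm

print(round(mean, 4), round(sample_variance, 4), round(K, 2))   # 0.2503, 0.3587, 2.31
```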
Several Years of Data:8
Assume we have three years of data for 5 insureds:9

                  Number of Claims
Insured    Year 1    Year 2    Year 3    Total
1             1         0         0        1
2             2         1         3        6
3             0         1         2        3
4             0         0         1        1
5             0         3         0        3
If we assume each insured is Poisson and that for each insured his Poisson parameter is the same for each of the three years, then it makes sense to treat the sum of the claims from each insured as a single observation of the risk process. The mean 3-year frequency is: (1 + 6 + 3 + 1 + 3) /5 = 14/5 = 2.8 = estimated EPV. The sample variance of the three year totals is: {(1 - 2.8)2 + (6 - 2.8)2 + (3 - 2.8)2 + (1 - 2.8)2 + (3 - 2.8)2 } / (5 - 1) = 4.2. Estimated VHM = 4.2 - 2.8 = 1.4. K = EPV/VHM = 2.8/1.4 = 2. Exercise: Estimate the future number of claims from insured #2 over the next 3 years. [Solution: K was calculated assuming 3 years of data from a single insured was one draw from the risk process, and therefore, Z = 1/(1 + K) = 1/3. Estimated 3 year frequency = (1/3)(6) + (2/3)(2.8) = 3.87.] The estimated future annual frequency for insured #2 is: 3.87 / 3 = 1.29. One could get the same result, by instead treating one year as a single observation of the risk process. The mean annual frequency is 14/15 = 0.9333 = estimated EPV (annual.) The sum of three years from one insured is the sum of three independent, identically distributed variables, with three times the process variance. Therefore, the EPV for three years would be (3)(.9333) = 2.8. The sample variance for the three year totals is 4.2. VHM for 3 years = Var[H.M. for 3 years] = Var[(3)(annual H. M.)] = 32 Var[annual H. M.] = 9 (VHM annual). 4.2 = 2.8 + (9) (VHM annual). Estimated VHM annual = (4.2 - 2.8)/9 = 0.1556. K = EPV/VHM = 6 on an annual basis. For 3 years of data, Z = 3/(3+6) = 1/3. The estimated future annual frequency for insured #2 is: (1/3)(6/3) + (2/3)(0.9333) = 1.29. 8 9
See for example, 4, 5/05, Q.28, and 4, 5/07, Q.25. Practical applications of this technique would involve many more than 5 insureds.
Exercise: During a single 3-year period, 5 insureds had the following experience:

Number of Claims in Year 1 through Year 3    Number of Insureds
1                                                    2
3                                                    2
6                                                    1

The number of claims per year follows a Poisson distribution, with λ constant for a given insured. For the insured who had 6 claims over the period, using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 4.
[Solution: X = {(2)(1) + (2)(3) + (1)(6)}/5 = 2.8. E[X²] = {(2)(1²) + (2)(3²) + (1)(6²)}/5 = 11.2.
Sample Variance = (5/4)(11.2 - 2.8²) = 4.2. Estimated EPV = X = 2.8. Estimated VHM = 4.2 - 2.8 = 1.4.
K = EPV/VHM = 2.8/1.4 = 2. Throughout we have taken 3 years as one draw from the risk process, so N = 1. Z = 1/(1 + 2) = 1/3. Observed frequency per year for this policyholder is: 6/3 = 2. Overall mean frequency per year is: 2.8/3 = 0.9333. (1/3)(2) + (1 - 1/3)(0.9333) = 1.29.
Comment: This is a summarized version of the previous data, and therefore we get the previous result. Similar to 4, 5/05, Q.28 and 4, 5/07, Q.25.]
One could get a somewhat different result by treating the data as 15 separate observations when calculating K. Mean = 14/15 = 0.9333 = estimated EPV. However, the sample variance for 15 separate observations is: {7(0 - 0.9333)² + 4(1 - 0.9333)² + 2(2 - 0.9333)² + 2(3 - 0.9333)²}/(15 - 1) = 1.2095.
Estimated VHM = 1.2095 - 0.9333 = 0.2762. K = 0.9333/0.2762 = 3.38. Z = 3/(3 + 3.38) = .470.
The estimated future annual frequency for insured #2 is: (0.470)(6/3) + (0.530)(0.9333) = 1.43.
In general these two somewhat different methods of treating several years of data produce somewhat different results.10 The first method, using the sum for each insured, is preferable. Treating this data as 15 separate observations would ignore the assumption that for an individual each year of data is assumed to come from the same Poisson distribution.11 10
As shown subsequently via a simulation experiment, for a larger number of insureds the estimated values of K will be in the same general range. 11 If instead one assumes that for an individual his Poisson parameter shifts over time, then one should specifically take that into account. See for example, “A Markov Chain Model of Shifting Risk Parameters”, by Howard Mahler, PCAS 1997.
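The two treatments of several years of data discussed above can also be compared with a short sketch. The Python code below simply reproduces both calculations for the 5-insured, 3-year example: treating each insuredʼs 3-year total as one draw from the risk process (the preferable method, K = 2) versus treating the 15 annual observations separately (K ≈ 3.38).

```python
# Several years of data, Poisson frequency: two ways of estimating K from the same data.
data = [[1, 0, 0], [2, 1, 3], [0, 1, 2], [0, 0, 1], [0, 3, 0]]   # 5 insureds, 3 years each

def k_from(observations):
    """Semiparametric estimate of K = EPV/VHM, assuming each observation is Poisson."""
    n = len(observations)
    mean = sum(observations) / n
    sample_var = sum((x - mean) ** 2 for x in observations) / (n - 1)
    epv = mean
    vhm = sample_var - epv
    return epv / vhm

# Method 1 (preferable): each insured's 3-year total is one draw from the risk process.
totals = [sum(row) for row in data]
print(k_from(totals))                 # 2.0

# Method 2: treat the 15 annual counts as separate observations.
annual = [x for row in data for x in row]
print(k_from(annual))                 # about 3.38
```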
Problems:

Use the following information for the next two questions:
You observe a portfolio of risks for a single year. Assume each individual riskʼs claim frequency is given by a Poisson process. The number of claims observed is as follows:

Number of Claims    Number of Insureds
0                    210
1                    293
2                    235
3                    141
4                     70
5                     31
6                     12
7                      5
8                      2
9                      1
Total               1000
2.1 (3 points) What is the Buhlmann credibility of a single year of data from an individual risk from this portfolio? A. less than 19% B. at least 19% but less than 20% C. at least 20% but less than 21% D. at least 21% but less than 22% E. at least 22% 2.2 (2 points) What is the Buhlmann credibility of three years of data from an individual risk from this portfolio? A. less than 45% B. at least 46% but less than 47% C. at least 47% but less than 48% D. at least 48% but less than 49% E. at least 49%
2.3 (3 points) The claim count distribution during a year is as follows for a large group of insureds: Claim Count : 0 1 2 3 4 5 >5 Percentage of Insureds: 60.0% 24.0% 9.8% 3.9% 1.6% 0.7% 0% Assume the claim count for each individual insured has a Poisson distribution whose expected mean does not change over time. What is the estimated future annual frequency for an insured who had 4 claims during the year? (A) 1.6 (B) 1.7 (C) 1.8 (D) 1.9 (E) 2.0 2.4 (2 points) You have data from 10,000 insureds for one year. Xi is the number of claims from the ith insured.
∑Xi = 431.   ∑Xi² = 534.   ∑Xi³ = 736.
The number of claims of a given insured during the year is assumed to be Poisson distributed with an unknown mean that varies by insured. Determine the semiparametric empirical Bayes estimate of the expected number of claims next year of an insured that reported two claims during the studied year. (A) Less than 0.32 (B) At least 0.32, but less than 0.35 (C) At least 0.35, but less than 0.38 (D) At least 0.38, but less than 0.41 (E) At least 0.41
The following information pertains to the next two questions. The claim count distribution is as follows for a large sample of insureds:

Total Claim Count          0     1    2    3    4    >4
Percentage of Insureds    65%   25%   6%   3%   1%   0%
Assume the claim count for each individual insured has a Poisson distribution whose expected mean does not change over time. 2.5 (1 point) What is the expected value of the process variance? A. 0.42 B. 0.44 C. 0.46 D. 0.48 E. 0.50 2.6 (1 point) What is the variance of the hypothetical means? A. 0.16 B. 0.17 C. 0.18 D. 0.19 E. 0.20
2.7 (3 points) You are given the following data for private passenger automobile insurance for a year:

Number of Claims per Policy    Number of Policies
0                                103,704
1                                 14,075
2                                  1,766
3                                    255
4                                     45
5                                      6
6                                      2
7 and over                             0
Total                            119,853

Assuming the number of claims per year for each policyholder follows a Poisson distribution, using semiparametric empirical Bayes estimation, determine the Bühlmann credibility factor, Z, for one year of data.
A. Less than 0.10 B. At least 0.10 but less than 0.13 C. At least 0.13 but less than 0.16 D. At least 0.16 but less than 0.19 E. At least 0.19
2.8 (3 points) The claim count distribution during a year is as follows for a group of 27,000 insureds:
Total Claim Count:      0      1     2    3    4    5    6    7 or more
Number of Insureds:   25422  1410   131   24    9    3    1       0
Joe Smith, an insured from this portfolio, is observed to have two claims in six years. Assume each insuredʼs claim count follows a Poisson Distribution. Using semiparametric estimation, what is Joeʼs estimated future annual frequency?
A. Less than 0.10 B. At least 0.10, but less than 0.15 C. At least 0.15, but less than 0.20 D. At least 0.20, but less than 0.25 E. At least 0.25
Use the following information for the next two questions:
An insurer has data on the number of claims for 700 policyholders for five years. Let Xij be the number of claims from the ith policyholder in year j. You are given:

∑_{j=1}^{5} X1j = 1.
∑_{i=1}^{700} ∑_{j=1}^{5} Xij = 268.
∑_{i=1}^{700} ∑_{j=1}^{5} Xij² = 319.
∑_{i=1}^{700} (∑_{j=1}^{5} Xij)² = 471.
The frequency for each policyholder is assumed to be Poisson. 2.9 (3 points) Use semiparametric estimation, treating the 5 years of data from each policyholder as one draw from the risk process, in order to estimate the number of claims for policyholder #1 over the next year. (A) 0.09 (B) 0.10 (C) 0.11 (D) 0.12 (E) 0.13 2.10 (3 points) Use semiparametric estimation, treating the data as 3500 separate observations, in order to estimate the number of claims for policyholder #1 over the next year. (A) 0.09 (B) 0.10 (C) 0.11 (D) 0.12 (E) 0.13
2.11 (2 points) A group of 1000 drivers is observed for a year to determine how many claims each driver experiences. The data is as follows: # of Claims # of Drivers 0 960 1 32 2 6 3 2 Assuming each insured is Poisson, estimate the credibility to be assigned to one year of frequency data from a driver. A. 10% B. 15% C. 20% D. 25% E. 30% 2.12 (3 points) You are given: (i) During a 3-year period, 500 policies had the following claims experience: Total Claims in Years 1, 2, and 3 Number of Policies 0 405 1 50 2 30 3 10 4 5 (ii) The number of claims per year follows a Poisson distribution. (iii) Each policyholder was insured for the entire 3-year period. A randomly selected policyholder had 2 claims over the 3-year period. Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 4 for the same policyholder. A. Less than 0.20 B. At least 0.20, but less than 0.25 C. At least 0.25, but less than 0.30 D. At least 0.30, but less than 0.35 E. 0.35 or more 2.13 (3 points) The claim count distribution during a year is as follows for a group of 9461 insureds: Total Claim Count: 0 1 2 3 4 5 6 7 8&more Number of Insureds: 7840 1317 239 42 14 4 4 1 0 Assume each insureds claim count follows a Poisson Distribution. How much credibility would be given to three years of data from an insured? A. Less than 45% B. At least 45%, but less than 50% C. At least 50%, but less than 55% D. At least 55%, but less than 60% E. At least 60%
2.14 (3 points)You are given: (i) During a single 3-year period, 30,293 policies had the following total claims experience: Number of Claims in Year 1 through Year 3 Number of Policies 0 25,480 1 4,198 2 537 3 67 4 8 5 3 (ii) The number of claims per year follows a Poisson distribution. (iii) Each policyholder was insured for the entire period. A randomly selected policyholder had 2 claims over the period. Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 4 for the same policyholder. (A) 0.10 (B) 0.12 (C) 0.14 (D) 0.16 (E) 0.18 2.15 (3 points)You are given: (i) During a single 6-year period, 23,872 policies had the following total claims experience: Number of Claims in Year 1 through Year 6 Number of Policies 0 19,634 1 3,573 2 558 3 83 4 19 5 4 6 1 (ii) The number of claims per year follows a Poisson distribution. (iii) Each policyholder was insured for the entire period. A randomly selected policyholder had 2 claims over the period. Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the claim frequency in Year 7 for the same policyholder. (A) 4% (B) 5% (C) 6% (D) 7% (E) 8%
2.16 (2 points) You are given:
(i) Each of 4000 policyholders was insured for 2 years.
(ii) For each policyholder, you assume that the number of claims per year follows a Poisson distribution.
(iii) Let Xi be the number of claims that the ith policyholder had over the two years.
(iv) ∑_{i=1}^{4000} Xi = 260.
(v) ∑_{i=1}^{4000} Xi² = 320.
A similar policyholder had two claims over a 5-year period. Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the annual claim frequency for this policyholder. (A) 13% (B) 14% (C) 15% (D) 16% (E) 17%
Use the following information for the next 3 questions: The claim count distribution is as follows for a large sample of insureds. Total Claim Count 0 1 2 3 4 >4 Percentage of Insureds 61.9% 28.4% 7.8% 1.6% 0.3% 0% Assume the claim count for each individual insured has a Poisson distribution which does not change over time. 2.17 (4, 5/85, Q.36) 1 point) What is the expected value of the process variance? A. Less than 0.45 B. At least 0.45, but less than 0.55 C. At least 0.55, but less than 0.65 D. At least 0.65, but less than 0.75 E. 0.75 or more 2.18 (4, 5/85, Q.37) (1 point) What is the variance of the hypothetical means? A. Less than 0.01 B. At least 0.01, but less than 0.02 C. At least 0.02, but less than 0.03 D. At least 0.03, but less than 0.04 E. 0.04 or more 2.19 (4, 5/85, Q.38) (1 point) Find the expected claim frequency of an insured who has had one accident free year. A. Less than 0.425 B. At least 0.425, but less than 0.450 C. At least 0.450, but less than 0.475 D. At least 0.475, but less than 0.500 E. 0.500 or more
2.20 (4, 5/89, Q.39) (2 points) A group of 340 insureds in a high crime area submit the following 210 theft claims in a one year period: Number of Claims Number of Insureds 0 200 1 80 2 50 3 10 Each insured is assumed to have a Poisson distribution for the number of thefts, but the mean of such distribution may vary from one insured to another. If a particular insured experienced 2 claims in the observation period, what is the Buhlmann credibility estimate of the number of claims for this insured in the next year? (Use the observed data to estimate the expected value of the process variance and the variance of the hypothetical means.) A. Less than 0.75 B. At least 0.75, but less than 0.85 C. At least 0.85, but less than 0.95 D. At least 0.95, but less than 1.20 E. 1.20 or more 2.21 (4, 5/91, Q.35) (2 points) The number of claims for each insured in a population has a Poisson distribution. The distribution of insureds by number of actual claims in a single year is shown below. Number of Number of Claims Insureds 0 900 1 90 2 7 3 2 4 1 _____
Total 1000 Calculate the Buhlmann estimate of credibility to be assigned to the observed number of claims for an insured in a single year. A. Less than 0.10 B. At least 0.10 but less than 0.13 C. At least 0.13 but less than 0.16 D. At least 0.16 but less than 0.19 E. At least 0.19
2.22 (4B, 11/97, Q.7 & Course 4 Sample Exam 2000, Q.39) (3 points) You are given the following:
• The number of losses arising from m+4 individual insureds over a single period of observation is distributed as follows:

Number of Losses    Number of Insureds
0                          m
1                          3
2                          1
3 or more                  0
• The number of losses for each insured follows a Poisson distribution, but the mean of each such distribution may be different for individual insureds. • The variance of the hypothetical means is to be estimated from the above observations. Determine all values of m for which the estimate of the variance of the hypothetical means will be greater than 0. A. m ≥ 0 B. m ≥ 1 C. m ≥ 3 D. m ≥ 7 E. m ≥ 9 2.23 (4B, 11/98, Q.11) (2 points) You are given the following: • The number of losses arising from 500 individual insureds over a single period of observation is distributed as follows: Number of Losses Number of Insureds 0 450 1 30 2 10 3 5 4 5 5 or more 0 • The number of losses for each insured follows a Poisson distribution, but the mean of each such distribution may be different for individual insureds. Determine the Buhlmann credibility of the experience of an individual insured over a single period. A. Less than 0.20 B. At least 0.20, but less than 0.30 C. At least 0.30, but less than 0.40 D. At least 0.40, but less than 0.50 E. At least 0.50
2.24 (4, 5/00, Q.33) (2.5 points) The number of claims a driver has during the year is assumed to be Poisson distributed with an unknown mean that varies by driver. The experience for 100 drivers is as follows: Number of Claims during the Year Number of Drivers 0 54 1 33 2 10 3 2 4 1 Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation. (A) 0.046 (B) 0.055 (C) 0.061 (D) 0.068 (E) 0.073 2.25 (4, 11/00, Q.7) (2.5 points) The following information comes from a study of robberies of convenience stores over the course of a year: (i) Xi is the number of robberies of the ith store, with i = 1, 2,..., 500. (ii) ∑ Xi = 50 (iii) ∑ Xi2 = 220 (iv) The number of robberies of a given store during the year is assumed to be Poisson distributed with an unknown mean that varies by store. Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year. (A) Less than 0.02 (B) At least 0.02, but less than 0.04 (C) At least 0.04, but less than 0.06 (D) At least 0.06, but less than 0.08 (E) At least 0.08
2.26 (4, 11/04, Q.37 & 2009 Sample Q.159) (2.5 points) For a portfolio of motorcycle insurance policyholders, you are given: (i) The number of claims for each policyholder has a conditional Poisson distribution. (ii) For Year 1, the following data are observed: Number of Claims Number of Policyholders 0 2000 1 600 2 300 3 80 4 20 Total 3000 Determine the credibility factor, Z, for Year 2. (A) Less than 0.30 (B) At least 0.30, but less than 0.35 (C) At least 0.35, but less than 0.40 (D) At least 0.40, but less than 0.45 (E) At least 0.45 2.27 (4, 5/05, Q.28 & 2009 Sample Q.197) (2.9 points) You are given: (i) During a 2-year period, 100 policies had the following claims experience: Total Claims in Years 1 and 2 Number of Policies 0 50 1 30 2 15 3 4 4 1 (ii) The number of claims per year follows a Poisson distribution. (iii) Each policyholder was insured for the entire 2-year period. A randomly selected policyholder had one claim over the 2-year period. Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 3 for the same policyholder. (A) 0.380 (B) 0.387 (C) 0.393 (D) 0.403 (E) 0.443
2.28 (4, 11/05, Q.30 & 2009 Sample Q.240) (2.9 points) For a group of auto policyholders, you are given: (i) The number of claims for each policyholder has a conditional Poisson distribution. (ii) During Year 1, the following data are observed for 8000 policyholders: Number of Claims Number of Policyholders 0 5000 1 2100 2 750 3 100 4 50 5+ 0 A randomly selected policyholder had one claim in Year 1. Determine the semiparametric empirical Bayes estimate of the number of claims in Year 2 for the same policyholder. (A) Less than 0.15 (B) At least 0.15, but less than 0.30 (C) At least 0.30, but less than 0.45 (D) At least 0.45, but less than 0.60 (E) At least 0.60 2.29 (4, 5/07, Q.25) (2.5 points) You are given: (i) During a single 5-year period, 100 policies had the following total claims experience: Number of Claims in Year 1 through Year 5 Number of Policies 0 46 1 34 2 13 3 5 4 2 (ii) The number of claims per year follows a Poisson distribution. (iii) Each policyholder was insured for the entire period. A randomly selected policyholder had 3 claims over the period. Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 6 for the same policyholder. (A) Less than 0.25 (B) At least 0.25, but less than 0.50 (C) At least 0.50, but less than 0.75 (D) At least 0.75, but less than 1.00 (E) At least 1.00
Solutions to Problems: 2.1. C. Mean = 1.753, 2nd moment = 5.283. Total Variance (adjusted for degrees of freedom) = (1000/999) (5.283 - 1.7532) = 2.212. EPV = Mean = 1.753. VHM = Total Variance - EPV = .459. K = EPV / VHM = 1.753 / .459 = 3.82. Z = 1 / (1 + 3.82) = 20.7%. Number of Insureds 210 293 235 141 70 31 12 5 2 1
Number of Claims 0 1 2 3 4 5 6 7 8 9
Square of # of Claims 0 1 4 9 16 25 36 49 64 81
1000
1753
5283
Comment: In this case we are given data, therefore first we need to compute the total variance and back out the VHM. The fact that the claim frequency is Poisson is sufficient to allow an estimate of the EPV, but without the data we could not estimate the VHM. We are not given a complete model of the risk process, as for example in the case of a Gamma-Poisson. 2.2. A. For the previous solution, K = 3.82. For three years of data, Z = 3 / (3 + 3.82) = 44.0%. 2.3. C. Mean = .652 and the total variance = 1.414 - .6522 = .989. A
B
C
D
Number of Claims 0 1 2 3 4 5
A Priori Probability 0.60000 0.24000 0.09800 0.03900 0.01600 0.00700
Col. A times Col. B 0.00000 0.24000 0.19600 0.11700 0.06400 0.03500
Square of Col. A times Col. B 0.00000 0.24000 0.39200 0.35100 0.25600 0.17500
Sum
1
0.652
1.41400
EPV = overall mean = .652. VHM = Total Variance - EPV = .989 - .652 = .337. K = EPV/VHM = .652/.337 = 1.93. Z = 1/(1 + K) = .341. Estimated future annual frequency = (.341)(4) + (1 - .341)(.652) = 1.79.
2013-4-11,
Semiparametric Estimation §2 Poisson Frequency,
HCM 10/22/12,
Page 21
2.4. C. Sample Mean = 431/ 10000 = .0431. Second Moment = 534/ 10000 = .0534. Sample Variance = (10000/9999)(.0534 - .04312 ) = .05155. EPV = Mean = .0431. VHM = Variance - EPV = .05155 - .0431 = .00845. K = EPV/VHM = .0431/ .00845 = 5.10. Z = 1/(1+K) = 16.4%. Estimated frequency = (.164)(2) + (1 - .164)(.0431) = 0.364. Comment: Similar to 4, 11/00, Q.7. One does not make use of
∑ Xi3 = 736.
2.5. E. & 2.6. B. Each insuredʼs frequency process is given by a Poisson with parameter λ, with λ varying over the group of insureds. Then the process variance for each insured is λ. Thus the expected value of the process variance is estimated as follows: Eλ[VAR[X | λ]] = Eλ[λ] = overall mean = 0.50. A
B
C
D
Number of Claims 0 1 2 3 4
A Priori Probability 0.65 0.25 0.06 0.03 0.01
Col. A times Col. B 0.00 0.25 0.12 0.09 0.04
Square of Col. A times Col. B 0.00 0.25 0.24 0.27 0.16
Sum
1
0.50
0.92
The mean = 0.50 and the total variance = 0.92 - 0.502 = 0.67. Thus we estimate the Variance of the Hypothetical Means as: Total Variance - EPV = 0.67 - 0.50 = 0.17.
2013-4-11,
Semiparametric Estimation §2 Poisson Frequency,
HCM 10/22/12,
2.7. C. Mean = 18594/119853 = 0.1551. 2nd moment = 24376/119853 = 0.2034. Number of Claims
Number of Policies
Contribution to First Moment
Contribution to Second Moment
0 1 2 3 4 5 6
103,704 14,075 1,766 255 45 6 2
0 14,075 3,532 765 180 30 12
0 14,075 7,064 2,295 720 150 72
21
119,853
18,594
24,376
Sample Variance = (119,853 / 119,852) (0.2034 - 0.15512) = 0.1793. EPV = Mean = 0.1551. VHM = Sample Variance - EPV = 0.1793 - 0.1551 = 0.0242. K = EPV / VHM = 0.1551 / .0242 = 6.41. Z = 1 / (1 + 6.41) = 13.5%. Comment: Taken from pages 107-108 of Mathematical Methods in Risk Theory by Hans Buhlmann. The same data is shown in Table 16.20 in Loss Models. 2.8. D. The estimated mean is: 1801/27000 = .06670. The estimated second moment is: 2405/27000 = .08907. The sample variance is: (27000/26999)(.08907 - .066702 ) = .08462. A
B
C
D
Number of Claims 0 1 2 3 4 5 6
Number of Insureds 25422 1410 131 24 9 3 1
Col. A times Col. B 0 1410 262 72 36 15 6
Square of Col. A times Col. B 0 1410 524 216 144 75 36
Sum
27000
1801
2405
Since we have assumed each insured is Poisson: EPV = E[λ] = overall mean = .06670. VHM = Total Variance - EPV = .08462 - .06670 = .01792. K = EPV/VHM = .06670/.01792 = 3.72. For six years of data, Z = 6/(6+3.72) = 61.7%. The estimated overall mean is .06670 and the observed frequency for Joe is 2/6 = 1/3. Therefore, Joeʼs estimated future frequency is: (.617)(1/3) + (1 - .617)(.06670) = 0.231.
Page 22
2013-4-11,
Semiparametric Estimation §2 Poisson Frequency,
HCM 10/22/12,
Page 23
2.9. C. The overall mean 5-year frequency is: 268/700 = .3829. The second moment of the 5-year claim frequency is: 471/700 = .6729. Therefore, the sample variance is: (700/699)(.6729 - .38292 ) = .5270. Since each policyholder is Poisson, the estimated EPV (5 year) = mean = .3829. The estimated VHM (5 year) = Total variance - EPV = .5270 - .3829 = .1441. K (5 year) = EPV/VHM = .3829/.1441 = 2.66. Remembering that 5 years of data was defined as one draw from the risk process, Z = 1/(1 + 2.66) = 27.3%. Over 5 years policyholder #1 had 1 claim. Thus the estimated number of claims for policyholder # 1 over the next 5 years is: (27.3%)(1) + (72.7%)(.3829) = .551. Thus the estimated future annual frequency for policyholder # 1 is: .551/5 = 0.110. 2.10. D. The overall mean annual frequency is: 268/3500 = .0766. The second moment of the annual claim frequency is: 319/3500 = .0911. Therefore, the sample variance is: (3500/3499)(.0911 - .07662 ) = .0853. Since each policyholder is Poisson, the estimated EPV = mean = .0911. The estimated VHM = Total variance - EPV = .0853 - .0766 = .0087. K = EPV/VHM = .0766/.0087 = 8.8. For 5 years of data, Z = 5/(5 + 8.8) = 36.2%. The mean annual frequency of policyholder #1 is 1/5 = .2. Thus the estimated future annual frequency for policyholder # 1 is: (36.2%)(.2) + (63.8%)(.0766) = 0.121. Comment: Note the somewhat different estimates in this and the previous solution. 2.11. E. First Moment is: {(0)(960) + (1)(32) + (2)(6) + (3)(2)}/1000 = .050. Second Moment is: {(02 )(960) + (12 )(32) + (22 )(6) + (32 )(2)}/1000 = .074. Estimated Total Variance = .074 - .0502 = .0715. Process Variance = λ. Expected Value of the Process Variance = E[λ] = .050. Estimated Variance of the Hypothetical Means = Total Variance - EPV = .0715 - .05 = .0215. K = EPV/VHM = .050/.0215 = 2.33. For one year of data, Z = 1/(1 + K) = 1/3.33 = 30.0%. Comment: If instead you use the sample variance of: (1000/999)(.0715) = .0716, then VHM = .0716 - .05 = .0216. K = .050/.0216 = 2.31. Z = 1/(1 + K) = 1/3.31 = 30.2%.
2.12. E. Treat three years from a single insured as one draw from the risk process. Estimated EPV = X = {(405)(0) + (50)(1) + (30)(2) + (10)(3) + (5)(4)}/500 = 0.32. Second Moment = {(405)(0²) + (50)(1²) + (30)(2²) + (10)(3²) + (5)(4²)}/500 = 0.68. Sample Variance = (500/499)(.68 - .32²) = .5788. Estimated VHM = .5788 - .32 = .2588. K = EPV/VHM = .32/.2588 = 1.236. N = 1, since we observe an insured for a three year period, one draw from the risk process. Z = 1/(1 + K) = .447. Estimated number of claims for three years is: (.447)(2) + (1 - .447)(.32) = 1.071. Estimated number of claims for one year is: 1.071/3 = 0.357. Comment: Similar to 4, 5/05, Q.28.
2.13. C. The mean is: {(0)(7840) + (1)(1317) + (2)(239) + (3)(42) + (4)(14) + (5)(4) + (6)(4) + (7)(1)}/9461 = 0.2144. The second moment is: {(0)(7840) + (1)(1317) + (4)(239) + (9)(42) + (16)(14) + (25)(4) + (36)(4) + (49)(1)}/9461 = 0.3348. The estimated variance is: (9461/9460)(.3348 - .2144²) = .2889. EPV = estimated mean = 0.2144. VHM = total variance - EPV = 0.2889 - 0.2144 = 0.0745. K = EPV/VHM = 0.2144/0.0745 = 2.88. For three years, Z = 3/(3 + 2.88) = 51%.
2.14. B. X = {(25,480)(0) + (4,198)(1) + (537)(2) + (67)(3) + (8)(4) + (3)(5)}/30,293 = 0.1822. E[X²] = {(25,480)(0²) + (4,198)(1²) + (537)(2²) + (67)(3²) + (8)(4²) + (3)(5²)}/30,293 = 0.2361. Sample Variance = (30,293/30,292)(0.2361 - 0.1822²) = 0.2029. Estimated EPV = X = 0.1822. Estimated VHM = 0.2029 - 0.1822 = 0.0207. K = EPV/VHM = 0.1822/0.0207 = 8.80. Throughout we have taken 3 years as one draw from the risk process, so N = 1. Z = 1/(1 + 8.80) = 10.2%. Observed frequency per year for this policyholder is 2/3. Overall mean frequency per year is: 0.1822/3 = 0.0607. (10.2%)(2/3) + (1 - 10.2%)(0.0607) = 0.123. Comment: Similar to 4, 5/07, Q. 25. Data for male drivers in California from 1969 to 1971, taken from Table A2 of “The Distribution of Automobile Accidents - Are Relativities Stable Over Time?,” by Emilio C. Venezian, PCAS 1990. If instead we want K for one year being one draw from the risk process, then the previously determined EPV is divided by 3 and the previously determined VHM is divided by 3². EPV = 0.1822/3 = 0.0607. VHM = 0.0207/9 = 0.0023. K = 0.0607/0.0023 = 26.4. Now N = 3, and Z = 3/(3 + 26.4) = 10.2%, matching the credibility determined previously.
2.15. E. X = {(19,634)(0) + (3,573)(1) + (558)(2) + (83)(3) + (19)(4) + (4)(5) + (1)(6)}/23,872 = 0.2111. E[X²] = {(19,634)(0²) + (3,573)(1²) + (558)(2²) + (83)(3²) + (19)(4²) + (4)(5²) + (1)(6²)}/23,872 = 0.2929. Sample Variance = (23,872/23,871)(0.2929 - 0.2111²) = 0.2483. Estimated EPV = X = 0.2111. Estimated VHM = 0.2483 - 0.2111 = 0.0372. K = EPV/VHM = 0.2111/0.0372 = 5.67. Throughout we have taken 6 years as one draw from the risk process, so N = 1. Z = 1/(1 + 5.67) = 15.0%. Observed frequency per year for this policyholder is: 2/6 = 1/3. Overall mean frequency per year is: 0.2111/6 = 0.0352. (15.0%)(1/3) + (1 - 15.0%)(0.0352) = 0.080. Comment: Similar to 4, 5/07, Q. 25. Data for female drivers in California from 1969 to 1974, taken from Table 1 of “The Distribution of Automobile Accidents - Are Relativities Stable Over Time?,” by Emilio C. Venezian, PCAS 1990.
2.16. B. Treat two years from a single insured as one draw from the risk process. Estimated EPV = X = 260/4000 = 0.065. Second Moment = 320/4000 = 0.08. Sample Variance = (4000/3999)(0.08 - 0.065²) = 0.07579. Estimated VHM = 0.07579 - 0.065 = 0.01079. K = EPV/VHM = 0.065/0.01079 = 6.02. We observe a policyholder for 5 years; N = 2.5, since one draw from the risk process is two years. Z = 2.5/(2.5 + K) = 29.3%. Observed two-year frequency for this policyholder is: (2)(2/5) = 0.8. Estimated number of claims for two years is: (29.3%)(0.8) + (70.7%)(0.065) = 0.280. Estimated number of claims for one year is: 0.280/2 = 0.140.
Alternately, the EPV for a single year is the mean annual frequency: 0.065/2 = 0.0325. The hypothetical means for a single year are half of those for two years. Therefore, the VHM for one year is 1/2² = 1/4 that for two years: 0.01079/4 = 0.002698. If one year is defined as a single draw from the risk process, then K = 0.0325/0.002698 = 12.05. We observe a policyholder for 5 years; N = 5, since one draw from the risk process is one year. Z = 5/(5 + K) = 29.3%. The observed annual claim frequency for this insured is: 2/5 = 0.4. Estimated future annual frequency for this insured is: (29.3%)(0.4) + (70.7%)(0.0325) = 0.140.
Comment: The definition of what is one draw from the risk process must be consistent between the calculation of the EPV and VHM, and the determination of N.
2.17. B.

        A                      B                      C                        D
Number of Claims    A Priori Probability    Col. A times Col. B    Square of Col. A times Col. B
       0                  0.61900                 0.00000                  0.00000
       1                  0.28400                 0.28400                  0.28400
       2                  0.07800                 0.15600                  0.31200
       3                  0.01600                 0.04800                  0.14400
       4                  0.00300                 0.01200                  0.04800
      Sum                 1                       0.50000                  0.78800
Each insuredʼs frequency process is given by a Poisson with parameter θ, with θ varying over the group of insureds. Then the process variance for each insured is θ. Thus the expected value of the process variance is estimated as follows: Eθ[VAR[X | θ]] = Eθ[θ] = overall mean = 0.5.
2.18. D. Following the solution to the previous question, one can estimate the total mean as 0.5 and the total variance as: 0.788 - 0.5² = 0.538. Thus we estimate the Variance of the Hypothetical Means as: Total Variance - EPV = 0.538 - 0.5 = 0.038.
2.19. C. Using the results of the previous two questions, K = EPV/VHM = .5/.038 = 13.16. For one year Z = 1/(1 + 13.16) = .071. The a priori mean is .5. Thus if there are no accidents, then the expected future claim frequency is: (0)(.071) + (.5)(1 - .071) = 0.465. Comment: This indicates a claim free credit for one year of 7.1%, equal to the credibility.
2.20. B. Each insuredʼs frequency process is given by a Poisson with parameter θ, with θ varying over the group of insureds. The process variance for each insured is θ. Thus the expected value of the process variance is estimated as follows: Eθ[VAR[X | θ]] = Eθ[θ] = overall mean = .6176.

        A                   B               C                  D                          E
Number of Claims   Number of Insureds   Probability   Col. A times Col. C   Square of Col. A times Col. C
       0                  200             0.58824          0.00000                  0.00000
       1                   80             0.23529          0.23529                  0.23529
       2                   50             0.14706          0.29412                  0.58824
       3                   10             0.02941          0.08824                  0.26471
      Sum                 340             1                0.61760                  1.08824
One can estimate the total variance (adjusting for degrees of freedom) as: (340/339)(1.0882 - .6176²) = .7089. Thus we estimate the Variance of the Hypothetical Means as: Total Variance - EPV = .7089 - .6176 = .0913. Thus the Buhlmann Credibility parameter K = EPV/VHM = .6176/.0913 = 6.76. For one observation of a single insured, Z = 1/(1 + 6.76) = 12.9%. The observed frequency is 2 and the prior mean is .6176. Thus the new estimate is: (.129)(2) + (1 - .129)(.6176) = 0.796.
2.21. D.

Number of Insureds    Number of Claims    Square of Number of Claims
      900                    0                       0
       90                    1                       1
        7                    2                       4
        2                    3                       9
        1                    4                      16
Total: 1000          Mean: 0.114           Second Moment: 0.152
Expected Value of the Process Variance = Mean of the Poissons = 114/1000 = .114. The Total Variance (adjusting for degrees of freedom) = (1000/999)(.152 - .114²) = .139. The estimate of the Variance of the Hypothetical Means = Total Variance - EPV = .139 - .114 = .025. K = .114/.025 = 4.56. For one observation, Z = 1/(1 + 4.56) = 0.180. Comment: Note that when mixing Poissons, the credibility assigned to one observation is Z = 1/(1+K) = (A Priori Total Variance - A Priori Mean)/A Priori Total Variance = 1 - (A Priori Mean/A Priori Total Variance) = 1 - (.114/.139) = .180.
2.22. D. The estimated first moment is: (number of claims)/(number of exposures) = {(0)m + (1)(3) + (2)(1)}/(m + 3 + 1) = 5/(m + 4). Similarly, the estimated second moment is: {(0²)m + (1²)(3) + (2²)(1)}/(m + 3 + 1) = 7/(m + 4). Thus the estimated total variance (adjusted for degrees of freedom) is: {(m + 4)/(m + 3)}{7/(m + 4) - 25/(m + 4)²} = 7/(m + 3) - 25/{(m + 4)(m + 3)}. Since we are mixing Poissons, the estimate of the Expected Value of the Process Variance is equal to the overall mean of 5/(m + 4). Thus the estimated Variance of the Hypothetical Means is: Total Variance - Estimated EPV = 7/(m + 3) - 25/{(m + 4)(m + 3)} - 5/(m + 4) = (2m - 12)/{(m + 4)(m + 3)}. Since the denominator is never negative, this expression is positive for m > 12/2 = 6. Since m is an integer number of exposures, the estimated VHM is positive for m ≥ 7. Comment: Note that when this technique or the more sophisticated Empirical Credibility techniques are used in actual applications, it is an important concern that the estimated VHM could be very small or even negative. If one does not adjust for degrees of freedom, then the estimated total variance is: 7/(m + 4) - 25/(m + 4)². Then the estimated Variance of the Hypothetical Means is: Total Variance - Estimated EPV = 7/(m + 4) - 25/(m + 4)² - 5/(m + 4) = (2m - 17)/(m + 4)². Since the denominator is never negative, this expression is positive for m > 17/2 = 8.5. Since m is an integer number of exposures, the estimated VHM is positive for m ≥ 9. (This was the intended solution when this question was originally asked and the Syllabus readings were different.)
2.23. E. The observed mean is .17, while the estimated variance (adjusted for degrees of freedom) is: (500/499)(.39 - .17²) = .3618.

Number of Insureds    Number of Losses    Square of Number of Losses
      450                    0                       0
       30                    1                       1
       10                    2                       4
        5                    3                       9
        5                    4                      16
Total: 500           Mean: 0.170           Second Moment: 0.390
Since we are mixing Poissons, the EPV = mean = .17. The VHM = Total Variance - EPV = .3618 - .17 = .1918. K = EPV/VHM = .17/.1918 = .886. For one observation, Z = 1/(1 + K) = 1/1.886 = 0.53.
2.24. E. The estimated first moment is 63/100 = .63. The estimated second moment is 107/100 = 1.07. The sample variance is: (100/99)(1.07 - .63²) = .680.

Number of Insureds    Number of Claims    Square of # of Claims
       54                    0                     0
       33                    1                     1
       10                    2                     4
        2                    3                     9
        1                    4                    16
      100                   63                   107
EPV = Mean = .630. VHM = Total Variance - EPV = .680 - .630 = .050. K = EPV/VHM = .63/.050 = 12.63. Z = 1/(1 + 12.63) = 7.3%. Comment: As per Loss Models, the sample variance is used to estimate the total variance; otherwise the estimated variance would be biased. If one used the “regular variance” rather than the sample variance, one would instead get: VHM = .673 - .630 = .043, K = 14.62, and Z = 6.4%, not the intended answer out of Loss Models. However, Mahler & Dean, now also on the syllabus, would use the biased estimator of the variance.
2.25. B. EPV = Mean = 50/500 = .1. Second moment = 220/500 = .44. Sample Variance = (500/499)(.44 - .1²) = .4309. VHM = Total Variance - EPV = .4309 - .1 = .3309. K = EPV/VHM = .1/.3309 = .302. For one store for one year, Z = 1/(1 + K) = 1/1.302 = .768. The observation is 0 and the overall mean is .1, so the estimated future frequency for this store is: (0)(.768) + (.1)(.232) = 0.023.
2.26. A. Mean = 1520/3000 = .5067. Second Moment = 2840/3000 = .9467. Sample Variance = (3000/2999)(.9467 - .5067²) = .6902. Estimated EPV = .5067. Estimated VHM = .6902 - .5067 = .1835. K = EPV/VHM = .5067/.1835 = 2.762. For one policyholder, Z = 1/(1 + 2.762) = 0.266. Comment: “The number of claims for each policyholder has a conditional Poisson distribution” ⇔ the frequency for each policyholder is Poisson, but the values of λ (may) vary across the portfolio. In other words, f(n | λ) is Poisson with mean λ.
2.27. C. Treat two years from a single insured as one draw from the risk process. Estimated EPV = X = {(50)(0) + (30)(1) + (15)(2) + (4)(3) + (1)(4)}/100 = 0.76. Second Moment = {(50)(0²) + (30)(1²) + (15)(2²) + (4)(3²) + (1)(4²)}/100 = 1.42. Sample Variance = (100/99)(1.42 - .76²) = .8509. Estimated VHM = .8509 - .76 = .0909. K = EPV/VHM = .76/.0909 = 8.36. N = 1, since we observe an insured for a two year period, one draw from the risk process. Z = 1/(1 + K) = 10.7%. Estimated number of claims for two years is: (.107)(1) + (1 - .107)(.76) = .786. Estimated number of claims for one year is: .786/2 = 0.393.
Alternately, the EPV for a single year is the mean annual frequency: 0.76/2 = 0.38. The hypothetical means for a single year are half of those for two years. Therefore, the VHM for one year is 1/2² = 1/4 that for two years: .0909/4 = .0227. If one year is defined as a single draw from the risk process, then K = 0.38/.0227 = 16.74. We observe two years of data, two draws from the risk process, and therefore N = 2. Z = 2/(2 + K) = 2/18.74 = 10.7%. The observed annual claim frequency for this insured is: 1/2. Estimated future annual frequency for this insured is: (.107)(1/2) + (1 - .107)(0.38) = 0.393.
Comment: The definition of what is one draw from the risk process must be consistent between the calculation of the EPV and VHM, and the determination of N.
2.28. D. Mean = {(0)(5000) + (1)(2100) + (2)(750) + (3)(100) + (4)(50)}/8000 = 4100/8000 = .5125. Second Moment = {(0²)(5000) + (1²)(2100) + (2²)(750) + (3²)(100) + (4²)(50)}/8000 = 6800/8000 = .85. Sample Variance is: (8000/7999)(.85 - .5125²) = .5874. Estimated EPV = mean = .5125. Estimated VHM = Sample Variance - Mean = .5874 - .5125 = .0749. K = EPV/VHM = .5125/.0749 = 6.84. Z = 1/(1 + K) = 1/7.84 = .1275. Estimate is: (.1275)(1) + (1 - .1275)(.5125) = 0.575.
2.29. A. X = {(46)(0) + (34)(1) + (13)(2) + (5)(3) + (2)(4)}/100 = 0.83. E[X²] = {(46)(0²) + (34)(1²) + (13)(2²) + (5)(3²) + (2)(4²)}/100 = 1.63. Sample Variance = (100/99)(1.63 - 0.83²) = 0.951. Estimated EPV = X = 0.83. Estimated VHM = 0.951 - 0.83 = 0.121. K = EPV/VHM = 6.86. Throughout we have taken 5 years as one draw from the risk process, so N = 1. Z = 1/(1 + 6.86) = 12.7%. Observed frequency per year for this policyholder is 3/5 = 0.6. Overall mean frequency per year is: 0.83/5 = 0.166. (12.7%)(0.6) + (1 - 12.7%)(0.166) = 0.221. Comment: The answer has to be between 0.166 and 0.6, so choices D and E make no sense.
Section 3, Negative Binomial with β Fixed

One can assume a Negative Binomial Distribution with fixed known β parameter, with the r parameter varying between the insureds. In semiparametric estimation, when one assumes each exposure has a Negative Binomial Distribution with known β parameter, one estimates the mean and total variance and then:12
EPV = (1 + β)(estimated mean).
VHM = estimated total variance - EPV.
Exercise: Derive the above formula for the EPV.
[Solution: Since the mean frequency for each exposure is rβ, the overall mean is: β E[r]. Therefore, E[r] = (estimated mean)/β. Since the process variance of each exposure is rβ(1+β), the EPV is: E[rβ(1+β)] = β(1+β) E[r] = β(1+β)(estimated mean)/β = (1 + β)(estimated mean).]
Exercise: Assume that one observes that the claim count distribution during a year is as follows for a group of 10,000 insureds:
Total Claim Count:     0     1     2    3   4   5   6   7   >7
Number of Insureds:  8116  1434  329   87  24   7   2   1    0
Assume in addition that the claim count for each individual insured has a Negative Binomial distribution, with β = 0.3 and r varying over the portfolio of insureds. Estimate the Buhlmann Credibility parameter K. [Solution: In a previous exercise we estimated the total mean as 0.2503, and the total variance as 0.3587. EPV = (1+ β)(estimated mean) = (1.3)(0.2503) = 0.3254. VHM = Total Variance - EPV = 0.3587 - 0.3254 = 0.0333. K = EPV/VHM = 0.3254/0.0333 = 9.8. ]
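A short Python sketch of the same calculation (illustrative only; the function name is mine and not from the readings):

def buhlmann_k_nb_beta_fixed(mean, sample_var, beta):
    # Negative Binomial frequency with beta known and r varying across insureds
    epv = (1 + beta) * mean
    vhm = sample_var - epv          # set Z = 0 if this comes out negative
    return epv / vhm

k = buhlmann_k_nb_beta_fixed(0.2503, 0.3587, 0.3)   # about 9.8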
12 As β → 0, the Negative Binomial → Poisson, and the EPV → estimated mean.
Problems:
3.1 (2 points) The following information comes from a study of robberies of convenience stores over the course of a year:
(i) Xi is the number of robberies of the ith store, with i = 1, 2, ..., 500.
(ii) ∑Xi = 50
(iii) ∑Xi² = 220
(iv) The number of robberies of a given store during the year is assumed to be Negative Binomial with β = 2.7, and r that varies by store.
Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year.
(A) Less than 0.02
(B) At least 0.02, but less than 0.04
(C) At least 0.04, but less than 0.06
(D) At least 0.06, but less than 0.08
(E) At least 0.08

3.2 (2 points) The number of claims a driver has during the year is assumed to be Negative Binomial with β = 0.6, and r that varies by driver. The number of losses arising from 500 individual insureds over a single period of observation is distributed as follows:
Number of Losses    Number of Insureds
       0                   450
       1                    30
       2                    10
       3                     5
       4                     5
  5 or more                  0
Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.
A. Less than 0.20
B. At least 0.20, but less than 0.30
C. At least 0.30, but less than 0.40
D. At least 0.40, but less than 0.50
E. At least 0.50
3.3 (3 points) The claim count distribution during a year is as follows for a group of 27,000 insureds:
Total Claim Count:      0      1     2    3   4   5   6   7 or more
Number of Insureds:  25422   1410   131   24   9   3   1      0
Joe Smith, an insured from this portfolio, is observed to have two claims in six years. Assume each insuredʼs claim count follows a Negative Binomial Distribution. Assume β = 0.25 for each insured, but r varies across the portfolio of insureds. Using semiparametric estimation, what is Joeʼs estimated future annual frequency?
A. Less than 0.10
B. At least 0.10, but less than 0.15
C. At least 0.15, but less than 0.20
D. At least 0.20, but less than 0.25
E. At least 0.25
Solutions to Problems:
3.1. E. Mean = 50/500 = .1. Second moment = 220/500 = .44. Sample Variance = (500/499)(.44 - .1²) = .4309. EPV = (1 + β)(mean) = (3.7)(.1) = .37. VHM = Total Variance - EPV = .4309 - .37 = .0609. K = EPV/VHM = .37/.0609 = 6.08. For one store for one year, Z = 1/(1 + K) = 1/7.08 = .141. The observation is 0 and the overall mean is .1, so the estimated future frequency for this store is: (0)(.141) + (.1)(1 - .141) = 0.086. Comment: Similar to 4, 11/00, Q.7, except that there a Poisson is assumed.
3.2. B. The observed mean is .17, while the estimated variance (adjusted for degrees of freedom) is: (500/499)(.39 - .17²) = .3618.

Number of Insureds    Number of Losses    Square of Number of Losses
      450                    0                       0
       30                    1                       1
       10                    2                       4
        5                    3                       9
        5                    4                      16
Total: 500           Mean: 0.170           Second Moment: 0.390
Estimated EPV = (1+β)mean = (1.6)(.170) = .272. Estimated VHM = Estimated Total Variance - Estimated EPV = .3618 - .272 = .0898. K = EPV / VHM = .272 / .0898 = 3.03. Z = 1 / (1+ 3.03) = 24.8%. Comment: Similar to 4B, 11/98, Q.11, except that there a Poisson is assumed.
3.3. A. The estimated mean is: 1801/27,000 = .06670. The estimated second moment is: 2405/27,000 = .08907. The sample variance is: (27,000/26,999)(.08907 - .06670²) = .08462.

        A                   B                  C                        D
Number of Claims   Number of Insureds   Col. A times Col. B   Square of Col. A times Col. B
       0                 25,422                  0                        0
       1                  1,410              1,410                    1,410
       2                    131                262                      524
       3                     24                 72                      216
       4                      9                 36                      144
       5                      3                 15                       75
       6                      1                  6                       36
      Sum                27,000              1,801                    2,405
Since we have assumed each insured has a Negative Binomial frequency with β = 0.25, for each insured, the process variance = rβ(1+ β) = rβ(1.25) = (1.25)(mean). EPV = E[Process Variance] = E[(1.25)(mean)] = (1.25)E[mean] = (1.25)(overall mean) = (1.25)(.06670) = .08338. VHM = Total Variance - EPV = .08462 - .08338 = .00124. K = EPV/VHM = .08338/.00124 = 67. For six years of data, Z = 6/(6+67) = 8.2%. The estimated overall mean is .06670 and the observed frequency for Joe is 2/6 = 1/3. Therefore, Joeʼs estimated future frequency is: (.082)(1/3) + (1 - .082)(.06670) = 0.089.
Section 4, Geometric Frequency

Assume that each exposure has a Geometric distribution, with the parameter β varying over the portfolio of exposures. Let the mixing distribution be g(β).13 Then since the mean frequency for each exposure is β, the overall mean is the mean of g. Since the process variance of each exposure is β(1+β) = β + β², the EPV is: E[β + β²] = E[β] + E[β²] = mean of g + second moment of g. Since the mean of each exposure is β, the VHM is by definition the variance of g = second moment of g - (mean of g)².
Thus if we let the overall mean be µ, we have the following relationships when mixing Geometric Distributions:14
µ = mean of g.
EPV = µ + second moment of g.
VHM = second moment of g - µ².
Total Variance = EPV + VHM = 2(second moment of g) + µ - µ².
Therefore, EPV - VHM = µ + µ².
Let the total variance = σ² = EPV + VHM = EPV + (EPV - µ² - µ) = 2 EPV - µ² - µ.
Therefore, EPV = (σ² + µ + µ²)/2. Thus VHM = (σ² - µ - µ²)/2.
Thus, as in the situation of mixing Poissons, the mean and total variance determine the EPV and VHM. Therefore, assuming one is mixing Geometric Distributions, one can estimate the EPV and VHM from the estimated mean X and estimated variance s²:
EPV = (s² + X + X²)/2.
VHM = s² - EPV = (s² - X - X²)/2.
Note that for some data sets, s² < X + X², and therefore the estimated VHM would be negative. In this case, either the assumption of a mixture of Geometric Distributions is not appropriate, or there is a lot of random fluctuation which affected the estimate. It should also be noted that as β gets very small, the Geometric approaches a Poisson. Therefore, when the mean claim frequency is small, there is very little difference between assuming a mixture of Geometric Distributions and a mixture of Poissons.
13 For example, g(β) could be a Beta Distribution.
14 See Exercise 20.73 in Loss Models.
In semiparametric estimation, when one assumes each exposure has a Geometric Distribution, one estimates the mean and total variance and then:
EPV = (estimated total variance + estimated mean + estimated mean²)/2.
VHM = estimated total variance - EPV.
Exercise: Assume that one observes that the claim count distribution during a year is as follows for a group of 10,000 insureds:
Total Claim Count:     0     1     2    3   4   5   6   7   >7
Number of Insureds:  8116  1434  329   87  24   7   2   1    0
Assume in addition that the claim count for each individual insured has a Geometric distribution. Estimate the Buhlmann Credibility parameter K.
[Solution: In a previous exercise we estimated the total mean as 0.2503, and the total variance (adjusted for degrees of freedom) as 0.3587. EPV = (estimated total variance + estimated mean + estimated mean²)/2 = (0.3587 + 0.2503 + 0.2503²)/2 = 0.3358. VHM = Total Variance - EPV = 0.3587 - 0.3358 = 0.0229. K = EPV/VHM = 0.3358/0.0229 = 14.7.
Comment: When we had assumed instead that each individual insured had a Poisson Distribution, we estimated K = 2.3. Here we assumed more of the total variance was due to the process variance and less was due to differences between the insureds. Therefore, here less credibility would be assigned to the experience of an individual insured.]
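A short Python sketch of the Geometric case (illustrative only; names are mine):

def buhlmann_k_geometric(mean, sample_var):
    # Geometric frequency, with beta varying across insureds
    epv = (sample_var + mean + mean ** 2) / 2
    vhm = sample_var - epv
    return epv / vhm

k = buhlmann_k_geometric(0.2503, 0.3587)   # about 14.7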
Problems:
4.1 (3 points) The following information comes from a study of robberies of convenience stores over the course of a year:
(i) Xi is the number of robberies of the ith store, with i = 1, 2, ..., 500.
(ii) ∑Xi = 50
(iii) ∑Xi² = 220
(iv) The number of robberies of a given store during the year is assumed to be Geometric with unknown mean that varies by store.
Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year.
(A) Less than 0.02
(B) At least 0.02, but less than 0.04
(C) At least 0.04, but less than 0.06
(D) At least 0.06, but less than 0.08
(E) At least 0.08
4.2 (2 points) The number of claims a driver has during the year is assumed to be Geometric with mean that varies by driver. The number of losses arising from 500 individual insureds over a single period of observation is distributed as follows:
Number of Losses    Number of Insureds
       0                   450
       1                    30
       2                    10
       3                     5
       4                     5
  5 or more                  0
Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.
A. Less than 0.20
B. At least 0.20, but less than 0.30
C. At least 0.30, but less than 0.40
D. At least 0.40, but less than 0.50
E. At least 0.50
Solutions to Problems:
4.1. D. Mean = 50/500 = .1. Second moment = 220/500 = .44. Sample Variance = (500/499)(.44 - .1²) = .4309. EPV = (estimated total variance + estimated mean + estimated mean²)/2 = (.4309 + .1 + .1²)/2 = .2705. VHM = Total Variance - EPV = .4309 - .2705 = .1604. K = EPV/VHM = .2705/.1604 = 1.69. For one store for one year, Z = 1/(1 + K) = 1/2.69 = .372. The observation is 0 and the overall mean is .1, so the estimated future frequency for this store is: (0)(.372) + (.1)(1 - .372) = 0.063. Comment: Similar to 4, 11/00, Q.7, except there a Poisson is assumed.
4.2. B. The observed mean is .17, while the estimated variance (adjusted for degrees of freedom) is: (500/499)(.39 - .17²) = .3618.

Number of Insureds    Number of Losses    Square of Number of Losses
      450                    0                       0
       30                    1                       1
       10                    2                       4
        5                    3                       9
        5                    4                      16
Total: 500           Mean: 0.170           Second Moment: 0.390
Estimated EPV = (estimated total variance + estimated mean + estimated mean²)/2 = (.3618 + .17 + .17²)/2 = .2804. Estimated VHM = Estimated Total Variance - Estimated EPV = .3618 - .2804 = .0814. K = EPV/VHM = .2804/.0814 = 3.44. Z = 1/(1 + 3.44) = 22.5%. Comment: Similar to 4B, 11/98, Q.11, except there a Poisson is assumed.
Section 5, Negative Binomial with r Fixed

Instead of assuming a Geometric Distribution, one can assume a Negative Binomial Distribution with fixed known r parameter, with the β parameter varying between the insureds. In semiparametric estimation, when one assumes each exposure has a Negative Binomial Distribution with known r parameter, one estimates the mean and total variance and then:
EPV = (estimated total variance + (r)(estimated mean) + estimated mean²)/(1 + r) = (s² + r X + X²)/(1 + r).
VHM = estimated total variance - EPV = s² - EPV.
Exercise: Derive the above formula for the EPV.
[Solution: Let the mixing distribution be g(β). Then since the mean frequency for each exposure is rβ, the overall mean is: µ = r(the mean of g). Since the process variance of each exposure is rβ(1+β) = rβ + rβ², the EPV is: E[rβ + rβ²] = rE[β] + rE[β²] = r(mean of g) + r(second moment of g). Since the mean of each exposure is rβ, the VHM is by definition: Var[rβ] = r² Var[β] = (r²)(the variance of g) = (r²)(second moment of g) - (r²)(mean of g)². Then σ² = total variance = EPV + VHM = r(mean of g) + r(second moment of g) + (r²)(second moment of g) - (r²)(mean of g)² = µ - µ² + (r + r²)(second moment of g). Therefore, second moment of g = (σ² + µ² - µ)/(r + r²). Therefore, EPV = r(mean of g) + r(second moment of g) = µ + (σ² + µ² - µ)/(1 + r) = (σ² + rµ + µ²)/(1 + r). If we estimate µ by X and σ² by s², then the estimated EPV is: (s² + r X + X²)/(1 + r).
Comment: For r = 1, one has a Geometric and, as before, EPV = (σ² + µ + µ²)/2.]
Exercise: Assume that one observes that the claim count distribution during a year is as follows for a group of 10,000 insureds:
Total Claim Count:     0     1     2    3   4   5   6   7   >7
Number of Insureds:  8116  1434  329   87  24   7   2   1    0
Assume in addition that the claim count for each individual insured has a Negative Binomial distribution, with r = 2, and β varying over the portfolio of insureds. Estimate the Buhlmann Credibility parameter K.
[Solution: In a previous exercise we estimated the total mean as 0.2503, and the total variance as 0.3587. EPV = (estimated total variance + r(estimated mean) + estimated mean²)/(1 + r) = {0.3587 + 2(0.2503) + 0.2503²}/3 = 0.3073. VHM = Total Variance - EPV = 0.3587 - 0.3073 = 0.0514. K = EPV/VHM = 0.3073/0.0514 = 6.0.]
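A short Python sketch of this case (illustrative only; names are mine):

def buhlmann_k_nb_r_fixed(mean, sample_var, r):
    # Negative Binomial frequency with r known and beta varying across insureds
    epv = (sample_var + r * mean + mean ** 2) / (1 + r)
    vhm = sample_var - epv
    return epv / vhm

k = buhlmann_k_nb_r_fixed(0.2503, 0.3587, 2)   # about 6.0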
Problems:
5.1 (3 points) The following information comes from a study of robberies of convenience stores over the course of a year:
(i) Xi is the number of robberies of the ith store, with i = 1, 2, ..., 500.
(ii) ∑Xi = 50
(iii) ∑Xi² = 220
(iv) The number of robberies of a given store during the year is assumed to be Negative Binomial with r = 2, and β that varies by store.
Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year.
(A) Less than 0.02
(B) At least 0.02, but less than 0.04
(C) At least 0.04, but less than 0.06
(D) At least 0.06, but less than 0.08
(E) At least 0.08
5.2 (2 points) The number of claims a driver has during the year is assumed to be Negative Binomial with r = 3, and β that varies by driver. The number of losses arising from 500 individual insureds over a single period of observation is distributed as follows:
Number of Losses    Number of Insureds
       0                   450
       1                    30
       2                    10
       3                     5
       4                     5
  5 or more                  0
Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.
A. Less than 0.20
B. At least 0.20, but less than 0.30
C. At least 0.30, but less than 0.40
D. At least 0.40, but less than 0.50
E. At least 0.50
Solutions to Problems:
5.1. C. Mean = 50/500 = .1. Second moment = 220/500 = .44. Sample Variance = (500/499)(.44 - .1²) = .4309. EPV = (estimated total variance + (r)(estimated mean) + estimated mean²)/(1 + r) = (.4309 + (2)(.1) + .1²)/(1 + 2) = .2136. VHM = Total Variance - EPV = .4309 - .2136 = .2173. K = EPV/VHM = .2136/.2173 = .98. For one store for one year, Z = 1/(1 + K) = 1/1.98 = .505. The observation is 0 and the overall mean is .1, so the estimated future frequency for this store is: (0)(.505) + (.1)(1 - .505) = 0.050. Comment: Similar to 4, 11/00, Q.7, except there instead a Poisson is assumed.
5.2. C. The observed mean is .17, while the estimated variance (adjusted for degrees of freedom) is: (500/499)(.39 - .17²) = .3618.

Number of Insureds    Number of Losses    Square of Number of Losses
      450                    0                       0
       30                    1                       1
       10                    2                       4
        5                    3                       9
        5                    4                      16
Total: 500           Mean: 0.170           Second Moment: 0.390
EPV = (estimated total variance + r(estimated mean) + estimated mean²)/(1 + r) = (.3618 + (3)(.17) + .17²)/(1 + 3) = .2252. Estimated VHM = Estimated Total Variance - Estimated EPV = .3618 - .2252 = .1366. K = EPV/VHM = .2252/.1366 = 1.65. Z = 1/(1 + 1.65) = 37.7%. Comment: Similar to 4B, 11/98, Q.11, except there a Poisson is assumed.
Section 6, Overview

Comparisons Between Types of Distributions:
In four exercises the same data has been used to estimate the Buhlmann Credibility Parameter, K, using semiparametric estimation. However, the result depended on the assumed type of frequency distribution for each insured:

Distribution Type:   Poisson   Geometric   Negative Binomial, r = 2   Negative Binomial, β = 0.3
K:                     2.3       14.7               6.0                        9.8
In the case of a shorter-tailed Poisson, more of the total variation is assumed to be due to variation between the insureds and less to random fluctuation in the risk process of each individual insured. Therefore, for the Poisson assumption, K is smaller and the credibility assigned to an individual is larger. In the case of a longer-tailed Geometric, less of the total variation is assumed to be due to variation between the insureds and more to random fluctuation in the risk process of each individual insured. Therefore, for the Geometric assumption, K is larger and Z is smaller.

Mixing Bernoullis:
One cannot perform semiparametric estimation if we assume a Bernoulli frequency. Assume that each exposure has a Bernoulli distribution, with the parameter q varying over the portfolio of exposures. Let the mixing distribution be g(q).15 Then since the mean frequency for each exposure is q, the overall mean is the mean of g. Since the process variance of each exposure is q(1 - q) = q - q², the EPV is: E[q - q²] = E[q] - E[q²] = mean of g - second moment of g. Since the mean of each exposure is q, the VHM is by definition the variance of g = second moment of g - (mean of g)². Thus if we let the overall mean be µ, we have the following relationships when mixing Bernoullis:
µ = mean of g.
EPV = µ - second moment of g.
VHM = second moment of g - µ².
Total Variance = EPV + VHM = µ - µ².
15 The important special case in which g(q) is a Beta Distribution is discussed in “Mahlerʼs Guide to Conjugate Priors.”
Thus unlike the situation of mixing Poissons, the mean and total variance do not usefully involve the EPV and VHM. Thus in the case of mixing Bernoullis, one cannot estimate the EPV and VHM from the observed mean and total variance. Each exposure has a Bernoulli distribution if and only if each exposure has either zero or one claim (per year). So it is easy to tell whether a Bernoulli assumption is appropriate. If each exposure has a Bernoulli distribution, then the observed variance is always the observed mean minus the square of the observed mean. Thus the observed variance does not provide significant help in estimating either the EPV or VHM.16

A Simulation Experiment:
The difference between full parametric and semiparametric estimation can be illustrated via simulation. For example, let us simulate a model involving mixing Poissons. Assume there are four types of risks, all with claim frequency given by a Poisson distribution:

   Type      Average Annual Claim Frequency    Number of Risks
Excellent                  1                         40
  Good                     2                         30
   Bad                     3                         20
  Ugly                     4                         10

As shown in “Mahlerʼs Guide to Buhlmann Credibility,” for this full parametric model, EPV = 2, VHM = 1, and therefore, K = 2. Therefore, for one year of data, for the full parametric model, Z = 1/3. We determine that one year of data gets a credibility of 1/3, prior to seeing any data. In contrast, for semiparametric estimation, Z depends on the observed data for a portfolio. I simulated 1 year of data from these 100 insureds:
# of claims:     0   1   2   3  4  5  6  7  8  9
# of insureds:  28  27  12  19  7  5  1  0  0  1

16 Since the total variance is EPV + VHM, and each of the VHM and EPV are positive, we know each must be less than the total variance. Thus if one has enough data to estimate the total variance with some accuracy, then this estimate serves as an upper bound on each of the EPV and VHM.
Exercise: What is the mean and sample variance for the above data?
[Solution: The mean is 1.76. The sample variance is: (100/99)(6.00 - 1.76²) = 2.932.]
If instead of the full parametric model, I merely assume that each insured is Poisson and apply semiparametric estimation to the simulated data, then K = 1.76/(2.932 - 1.76) = 1.50. Therefore, Z = 1/2.50 = 40%.
Exercise: For an insured who had 5 claims in one year, what is his estimated frequency next year?
[Solution: It depends on whether one uses full parametric or semiparametric estimation. Using the full parametric model, the a priori mean is 2, Z = 1/3, and the estimate = (1/3)(5) + (2/3)(2) = 3. Using semiparametric estimation, the observed mean is 1.76, Z = 40%, and the estimate = (40%)(5) + (60%)(1.76) = 3.056.]
I ran a total of ten simulations of the above 100 insureds, and used semiparametric estimation in order to estimate K. The results were: 1.50, 1.98, 1.15, 2.60, 1.20, 2.11, 1.63, 1.76, 1.67, and 3.08. Thus one sees that with only 100 insureds,17 the estimates of the Buhlmann Credibility parameter using semiparametric estimation are subject to considerable random fluctuation. However, the resulting estimates of future claim frequency are subject to less random fluctuation. For example, for the final simulation, the overall mean was 2.21, K = 3.08, and the estimate for an insured with 5 claims would be (1/4.08)(5) + (3.08/4.08)(2.21) = 2.894. This is not that dissimilar from the estimate from the first simulation, 3.056, as calculated above.
When I instead simulated the same situation, but with 10,000 rather than 100 insureds,18 the estimates of K using semiparametric estimation were less subject to random fluctuation. For ten simulations the estimates of K were: 2.02, 1.90, 2.06, 2.07, 2.04, 1.94, 1.94, 1.96, 2.06, 1.95. These are all relatively close to the true underlying Buhlmann Credibility Parameter,19 K = 2.
I next altered the mean frequencies to 3%, 6%, 9%, and 12%.20 With 10,000 insureds,21 for ten simulations the estimates of K were:22 37, 124, 184, 33, 25, 35, 41, 32, 126, 49.
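A brief Python sketch of one such simulation (illustrative only; it is not the code I actually used, and the sampler and names are my own choices):

import math
import random

def simulated_k(seed):
    # one simulated year for 100 insureds: 40 with lambda = 1, 30 with 2, 20 with 3, 10 with 4
    rng = random.Random(seed)
    lambdas = [1] * 40 + [2] * 30 + [3] * 20 + [4] * 10

    def poisson(lam):
        # Knuth's method; adequate for the small means used here
        limit, count, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return count
            count += 1

    claims = [poisson(lam) for lam in lambdas]
    n = len(claims)
    mean = sum(claims) / n
    second = sum(c * c for c in claims) / n
    sample_var = n / (n - 1) * (second - mean ** 2)
    vhm = sample_var - mean                  # EPV = mean under the Poisson assumption
    return mean / vhm if vhm > 0 else None   # None: set Z = 0 when the estimated VHM <= 0

print([simulated_k(seed) for seed in range(10)])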
17 With an overall mean frequency of 2, for 200 total expected claims.
18 And thus had a total of 20,000 rather than 200 expected claims.
19 In actual applications of semiparametric estimation we would not know the true underlying model or the true value of K. The amount of credibility assigned to the data is relatively insensitive to K, and therefore small differences in K like these usually do not make a significant difference in the estimates of future claim frequency.
20 This is more similar to a situation in private passenger automobile insurance.
21 With a mean frequency of 4%, for 400 total expected claims.
22 While in this case none of the ten simulated K values were negative due to a negative estimate of the VHM, recall that if the estimated VHM < 0, we set Z = 0.
Exercise: For 10,000 insureds, each of which has a Poisson frequency, 4000 have a mean frequency of 3%, 3000 have a mean of 6%, 2000 have a mean of 9%, and 1000 have a mean of 12%. What is the Buhlmann Credibility Parameter, K?
[Solution: EPV = mean = (40%)(3%) + (30%)(6%) + (20%)(9%) + (10%)(12%) = .06. VHM = (40%)(3% - 6%)² + (30%)(6% - 6%)² + (20%)(9% - 6%)² + (10%)(12% - 6%)² = .0009. K = .06/.0009 = 66.7. Comment: This is a full parametric model, as per “Mahlerʼs Guide to Buhlmann Credibility.”]
Thus for these ten simulations, the values of K vary considerably around the true value of 66.7. The corresponding credibility assigned to one year of data varies from 1/185 = 0.5% to 1/26 = 3.8%, compared to the correct amount of 1/67.7 = 1.5%.

Simulating Several Years of Data:
Rather than just simulating one year of data from each insured, for example one can simulate three years of data from each insured. I simulated 100 insureds with mean frequencies of 1, 2, 3, and 4, as per the original example. The number of claims by year for the 100 insureds were: 1 0 0, 2 1 3, 0 1 2, 0 0 1, 0 3 0, 2 1 0, 0 0 0, 1 0 2, 0 2 1, 1 2 3, 0 1 0, 0 3 0, 0 2 5, 0 0 1, 0 0 1, 0 0 0, 2 1 1, 0 1 2, 0 1 1, 0 1 1, 0 2 0, 1 2 0, 1 0 1, 1 2 0, 2 0 0, 4 0 2, 1 2 2, 0 1 0, 0 3 0, 0 1 0, 0 2 2, 1 0 1, 2 3 0, 1 0 0, 0 1 0, 0 0 2, 2 1 0, 2 1 2, 0 0 0, 1 0 1, 0 2 3, 2 3 6, 6 6 2, 2 4 2, 2 4 0, 2 0 0, 3 3 2, 2 1 1, 2 1 0, 1 2 2, 0 0 3, 3 2 2, 1 2 1, 3 0 2, 0 1 0, 0 2 0, 1 2 1, 0 1 2, 2 3 2, 3 3 3, 2 3 2, 3 2 2, 1 4 4, 1 2 0, 2 2 2, 0 1 1, 0 2 1, 2 3 0, 2 2 3, 3 6 4, 6 3 2, 6 2 0, 5 4 3, 0 2 2, 7 4 1, 3 3 2, 3 3 1, 3 3 2, 0 6 3, 3 3 4, 5 2 1, 3 5 2, 3 1 2, 3 2 5, 4 4 1, 3 4 3, 3 3 4, 3 1 3, 2 7 4, 2 2 1, 5 6 2, 2 5 5, 3 2 3, 3 5 4, 6 4 4, 7 3 7, 5 3 8, 0 3 3, 9 6 7, 4 6 3.
As discussed previously, one can treat the data in two somewhat different ways. It is preferable to treat the sum of the claims from each insured over the three years as a single observation.23 Then the mean (3 year) frequency is: 589/100 = 5.89 = EPV. The second moment is: Σ (total for insured i)² / 100 = 5325/100 = 53.25. Thus the sample variance is: (100/99)(53.25 - 5.89²) = 18.745. The estimated VHM (3 year) = 18.745 - 5.89 = 12.855. Estimated K (3 year) = EPV / VHM = 5.89/12.855 = 0.46.
23 See 4, 5/05, Q.28.
Exercise: Using the above estimated K, what is the estimated future annual frequency for the final insured, with 4, 6 and 3 claims?
[Solution: We have calculated K assuming that one draw from the risk process is three years of data from a single insured. Therefore, Z = 1/(1 + 0.46) = 68.5%. Estimated three-year frequency = (68.5%)(13) + (31.5%)(5.89) = 10.76. Estimated future annual frequency = 10.76/3 = 3.59.]
If instead one treats each year of data from each insured as a separate observation, then the mean annual frequency is: 589/300 = 1.963. The second moment of the frequency is: 2117/300 = 7.057. The sample variance = (300/299)(7.057 - 1.963²) = 3.214. Thus the estimated VHM = 3.214 - 1.963 = 1.251. The estimated K = 1.963/1.251 = 1.57.
Exercise: Using semiparametric estimation, treating each year of data from each insured as a separate observation, what is the estimated future annual frequency for the final insured, with 4, 6 and 3 claims?
[Solution: Z = 3/(3 + 1.57) = 65.6%. Estimate = (65.6%)(13/3) + (34.4%)(1.963) = 3.52.]
More generally, let Xij be the number of claims from the ith insured for year j. If we have n insureds and Y years of data, then treating each year of data from each insured as a separate observation:
estimated mean frequency = Σi Σj Xij / (nY), summing over i = 1 to n and j = 1 to Y.
estimated second moment of the frequency = Σi Σj Xij² / (nY).
Estimated K = EPV / VHM = mean / { [nY/(nY - 1)](second moment - mean²) - mean }.
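A short Python sketch of both approaches (illustrative only; the function names are mine):

def k_each_year_separately(data):
    # data[i][j] = claims for insured i in year j
    obs = [x for row in data for x in row]
    n = len(obs)
    mean = sum(obs) / n
    second = sum(x * x for x in obs) / n
    vhm = n / (n - 1) * (second - mean ** 2) - mean
    return mean / vhm

def k_working_with_totals(data):
    totals = [sum(row) for row in data]
    n = len(totals)
    mean = sum(totals) / n
    second = sum(t * t for t in totals) / n
    vhm = n / (n - 1) * (second - mean ** 2) - mean
    return mean / vhm      # K when one draw from the risk process is the Y-year total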
We note that depending on how one uses the data in semiparametric estimation, one gets somewhat different amounts of credibility applied to the data, and slightly different estimates of the future annual frequency. It is preferable not to treat each year of data from an individual insured separately, but rather to work with the total for each insured, since for an individual each year of data is assumed to come from the same Poisson distribution. In each case, the credibility differs from that for the full parametric case, 3/(3+2) = 60%.
Alternately, one could apply nonparametric estimation to this situation.24 Let si² = the usual sample variance for the data from a single insured i. Then EPV = average of the si², and VHM = (the sample variance of the means for the insureds) - EPV/(# years of data). For the simulated data shown above, s1² = 1/3,25 EPV = 1.710, the mean for the first insured is 1/3, VHM = 1.513, and K = 1.710/1.513 = 1.13. Thus using nonparametric estimation, three years of data are given a credibility of: 3/(3 + 1.13) = 72.6%.
For ten simulations, the credibility given to three years of data is, for these four different techniques:

Simulation #   Full Parametric   Semiparametric, Each Year Treated Separately   Semiparametric, Working with Insured's Totals   Nonparametric
     1              60.0%                    65.6%                                       68.5%                                      72.6%
     2              60.0%                    69.0%                                       69.8%                                      70.9%
     3              60.0%                    56.7%                                       56.9%                                      57.0%
     4              60.0%                    59.8%                                       60.4%                                      61.0%
     5              60.0%                    68.2%                                       65.9%                                      62.2%
     6              60.0%                    46.3%                                       54.3%                                      61.5%
     7              60.0%                    50.9%                                       46.8%                                      42.4%
     8              60.0%                    68.6%                                       64.4%                                      57.5%
     9              60.0%                    59.6%                                       59.8%                                      59.8%
    10              60.0%                    39.7%                                       46.7%                                      52.3%
   Avg.             60.0%                    58.4%                                       59.4%                                      59.7%
The average credibility assigned is similar for the four techniques.26 However, the random fluctuations in the data have a large effect on the nonparametric estimates of K, which rely solely on the data. In contrast, the full parametric method does not rely on the data in order to estimate K, and therefore the estimate of K is unaffected by the random fluctuations in the data. The semiparametric estimates of K are usually affected somewhat less by random fluctuations than nonparametric estimation.
24 See “Mahlerʼs Guide to Empirical Bayesian Credibility.”
25 The mean of 1, 0, 0 is 1/3. The sample variance is: {(0 - 1/3)² + (0 - 1/3)² + (1 - 1/3)²}/(3 - 1) = 1/3.
26 This is due to the fact that the model that generated the simulated data was the same one used for full parametric estimation. In actual actuarial applications we do not have a perfect model of reality.
Therefore, the Full Parametric Method has the advantage of a stable estimate of K. Thus one would prefer the Full Parametric Method, provided the assumptions in the model are close to reality.27 The nonparametric model has the advantage of making very few assumptions, but if there is not a very large data set, the estimate of K is subject to significant random fluctuation. The semiparametric estimates of K share some of the advantages and disadvantages of the other two methods.
27 This is a very big caveat. However, in many situations the actuary can use information from other similar situations (other years, other states, other insurers, etc.) to help him formulate a model.
Problems:
Use the following information for the next 4 questions:
The claim count distribution during a year is as follows for a group of 10,000 insureds:
Total Claim Count:     0     1     2    3   4   5   6   7   8   9   >9
Number of Insureds:  6788  2099  704  253  95  37  15   6   2   1    0

6.1 (3 points) Assume each insuredʼs claim count follows a Poisson Distribution.
How much credibility would be given to three years of data from an insured?
A. 60% B. 65% C. 70% D. 75% E. 80%

6.2 (3 points) Assume each insuredʼs claim count follows a Geometric Distribution.
How much credibility would be given to three years of data from an insured?
A. Less than 15%
B. At least 15%, but less than 20%
C. At least 20%, but less than 25%
D. At least 25%, but less than 30%
E. At least 30%

6.3 (3 points) Assume each insuredʼs claim count follows a Negative Binomial Distribution, with r = 3, and β varying across the portfolio of insureds.
How much credibility would be given to three years of data from an insured?
A. Less than 40%
B. At least 40%, but less than 45%
C. At least 45%, but less than 50%
D. At least 50%, but less than 55%
E. At least 55%

6.4 (3 points) Assume each insuredʼs claim count follows a Negative Binomial Distribution, with β = 0.2, and r varying across the portfolio of insureds.
How much credibility would be given to three years of data from an insured?
A. Less than 40%
B. At least 40%, but less than 45%
C. At least 45%, but less than 50%
D. At least 50%, but less than 55%
E. At least 55%
Solutions to Problems:
6.1. B. The estimated mean is: 4988/10,000 = .4988. The estimated second moment is: 10,680/10,000 = 1.0680. The sample variance is: (10,000/9999)(1.0680 - .4988²) = .8193.

        A                   B                  C                        D
Number of Claims   Number of Insureds   Col. A times Col. B   Square of Col. A times Col. B
       0                  6788                  0                        0
       1                  2099               2099                     2099
       2                   704               1408                     2816
       3                   253                759                     2277
       4                    95                380                     1520
       5                    37                185                      925
       6                    15                 90                      540
       7                     6                 42                      294
       8                     2                 16                      128
       9                     1                  9                       81
      Sum               10,000               4988                   10,680
EPV = overall mean = .4988. VHM = Total Variance - EPV = .8193 - .4988 = .3205. K = EPV/VHM = .4988/.3205 = 1.56. For three years, Z = 3/4.56 = 65.8%. Comment: One year of experience would be given a credibility of Z = 1/(1+K) = 1/2.56 = .391. Thus the estimated future claim frequency (Buhlmann credibility premium) for an insured who had for example 5 claims in one year is: (.391)(5) + (1 - .391)(.4988) = 2.26.
6.2. A. In the previous solution we estimated the total mean as .4988 and the estimated total variance as .8193. EPV = (estimated total variance + estimated mean + estimated mean²)/2 = (.8193 + .4988 + .4988²)/2 = .7835. VHM = Total Variance - EPV = .8193 - .7835 = .0358. K = EPV/VHM = .7835/.0358 = 21.9. For three years, Z = 3/24.9 = 12.0%. Comment: One year of experience would be given a credibility of Z = 1/(1+K) = 1/22.9 = .044. Thus the estimated future claim frequency (Buhlmann credibility premium) for an insured who had for example 7 claims in one year is: (.044)(7) + (1 - .044)(.4988) = .785. Here we assumed more of the total variance was due to the process variance and less was due to differences between the insureds, than was the case when we assumed each insured was Poisson. Therefore, here much less credibility would be assigned to the experience of an individual insured.
6.3. C. In a previous solution we estimated the total mean as .4988 and the estimated total variance as .8193. EPV = (estimated total variance + r(estimated mean) + estimated mean²)/(1 + r) = (.8193 + 3(.4988) + .4988²)/4 = .6411. VHM = Total Variance - EPV = .8193 - .6411 = .1782. K = EPV/VHM = .6411/.1782 = 3.60. For three years, Z = 3/6.60 = 45.5%.
6.4. D. In a previous solution we estimated the total mean as .4988 and the estimated total variance as .8193. EPV = (1 + β)(estimated mean) = (1.2)(.4988) = .5986. VHM = Total Variance - EPV = .8193 - .5986 = .2207. K = EPV/VHM = .5986/.2207 = 2.71. For three years, Z = 3/5.71 = 52.5%.
Section 7, Other Distributions

While semiparametric estimation is usually applied to frequency distributions, particularly the Poisson Distribution, it is possible to apply these same ideas to other distributions. For example, assume each insured has an Exponential Severity with mean θ, and θ varies across the portfolio.
µ = E[Mean | θ] = E[θ]. EPV = E[Var | θ] = E[θ²]. VHM = Var[Mean | θ] = Var[θ] = E[θ²] - E[θ]².
Total Variance = EPV + VHM = 2E[θ²] - E[θ]² = 2E[θ²] - µ².
EPV = E[θ²] = (Total Variance + µ²)/2.
Therefore, we can estimate the EPV as: (S² + X²)/2. VHM = S² - EPV.
Exercise: For 100 claims from this portfolio of insureds, ΣXi = 2000, ΣXi² = 90,000. Estimate K, using semiparametric estimation.
[Solution: X = 2000/100 = 20. E[X²] = 900. S² = (900 - 20²)(100/99) = 505. EPV = (S² + X²)/2 = (505 + 20²)/2 = 452.5. VHM = 505 - 452.5 = 52.5. K = 452.5/52.5 = 8.6.]
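A short Python sketch of the Exponential severity case (illustrative only; the function name is mine):

def k_exponential_severity(n, sum_x, sum_x2):
    xbar = sum_x / n
    s2 = n / (n - 1) * (sum_x2 / n - xbar ** 2)
    epv = (s2 + xbar ** 2) / 2      # Exponential: process variance = theta squared
    vhm = s2 - epv
    return epv / vhm

k = k_exponential_severity(100, 2000, 90000)   # about 8.6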
Problems:
7.1 (3 points) Each insured has a Gamma Severity with parameters α = 3 and θ, with θ varying across the portfolio. For 1000 claims from this portfolio of insureds, ΣXi = 20,000, ΣXi² = 550,000. Estimate the Buhlmann Credibility Parameter K, using semiparametric estimation.
A. 11 B. 13 C. 15 D. 17 E. 19
7.2 (3 points) The annual aggregate losses for each policy are uniform from 0 to ω, with ω varying across the portfolio. For 200 policies from this portfolio, ΣXi = 20,000, ΣXi² = 3,000,000. Estimate the amount of credibility to be applied to one year of data from one policy, using semiparametric estimation.
A. 5% B. 10% C. 15% D. 20% E. 25%
7.3 (3 points) Each insured has an Exponential Severity with mean θ, and θ varies across the portfolio. For 1000 claims from this portfolio of insureds, ΣXi = 50,000, ΣXi² = 5,100,000. Estimate the Buhlmann Credibility Parameter K, using semiparametric estimation.
A. 40 B. 50 C. 60 D. 70 E. 80
7.4 (3 points) The annual aggregate losses for each policy are Normal with σ = 170, with µ varying across the portfolio. For 100 policies from this portfolio, ΣXi = 20,000, ΣXi² = 7,000,000. Using semiparametric estimation, estimate the future annual aggregate losses for a policy that had a total of 1300 in loss in 3 years.
A. 215 B. 220 C. 225 D. 230 E. 235
Solutions to Problems:
7.1. A. µ = E[3θ] = 3E[θ]. EPV = E[3θ²] = 3E[θ²]. VHM = Var[3θ] = 9Var[θ] = 9E[θ²] - 9E[θ]². Total Variance = EPV + VHM = 12E[θ²] - 9E[θ]² = 12E[θ²] - µ². EPV = 3E[θ²] = (Total Variance + µ²)/4. X = 20,000/1000 = 20. E[X²] = 550. S² = (550 - 20²)(1000/999) = 150.15. EPV = (S² + X²)/4 = (150.15 + 20²)/4 = 137.54. VHM = 150.15 - 137.54 = 12.61. K = 137.54/12.61 = 10.9. Comment: For α fixed and θ varying across the portfolio, EPV = (S² + X²)/(α + 1).
7.2. E. µ = E[ω/2] = E[ω]/2. EPV = E[ω²/12] = E[ω²]/12. VHM = Var[ω/2] = Var[ω]/4 = E[ω²]/4 - E[ω]²/4. Total Variance = EPV + VHM = E[ω²]/3 - E[ω]²/4 = E[ω²]/3 - µ². EPV = E[ω²]/12 = (Total Variance + µ²)/4. X = 20,000/200 = 100. E[X²] = 3,000,000/200 = 15,000. S² = (15,000 - 100²)(200/199) = 5025.13. EPV = (S² + X²)/4 = (5025.13 + 100²)/4 = 3756.3. VHM = 5025.13 - 3756.3 = 1268.8. K = 3756.3/1268.8 = 2.96. Z = 1/(1 + K) = 25.3%.
7.3. B. µ = E[θ]. EPV = E[θ²]. VHM = Var[θ] = E[θ²] - E[θ]². Total Variance = EPV + VHM = 2E[θ²] - E[θ]² = 2E[θ²] - µ². EPV = E[θ²] = (Total Variance + µ²)/2. X = 50,000/1000 = 50. E[X²] = 5100. S² = (5100 - 50²)(1000/999) = 2602.6. EPV = (S² + X²)/2 = (2602.6 + 50²)/2 = 2551.3. VHM = 2602.6 - 2551.3 = 51.3. K = 2551.3/51.3 = 49.7.
7.4. D. EPV = E[σ²] = E[170²] = 28,900. X = 20,000/100 = 200. E[X²] = 7,000,000/100 = 70,000. S² = (70,000 - 200²)(100/99) = 30,303. VHM = 30,303 - 28,900 = 1403. K = 28,900/1403 = 20.6. Z = 3/(3 + K) = 12.7%. Estimate = (12.7%)(1300/3) + (87.3%)(200) = 229.6.
Section 8, Important Ideas & Formulas

Assume we have one year of data from a group of insureds. Assume each insuredʼs frequency distribution is of the same type, but a parameter varies across the group of insureds. Let X = observed mean, and s² = sample variance = estimate of the total variance. In each case, the estimated VHM = s² - estimated EPV.
Poisson: EPV = X = observed mean.
Negative Binomial (fixed β): EPV = (1 + β) X.
Geometric: EPV = (s² + X + X²)/2.
Negative Binomial (fixed r): EPV = (s² + r X + X²)/(1 + r).
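These four cases can be summarized in one short Python sketch (illustrative only; the function name and family labels are mine):

def semiparametric_epv(xbar, s2, family, beta=None, r=None):
    if family == "Poisson":
        return xbar
    if family == "Negative Binomial, beta fixed":
        return (1 + beta) * xbar
    if family == "Geometric":
        return (s2 + xbar + xbar ** 2) / 2
    if family == "Negative Binomial, r fixed":
        return (s2 + r * xbar + xbar ** 2) / (1 + r)
    raise ValueError("unknown family: " + family)

# In every case: VHM = s2 - EPV, K = EPV / VHM, and Z = N / (N + K).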
Mahlerʼs Guide to Empirical Bayesian Credibility
Joint Exam 4/C
prepared by Howard C. Mahler, FCAS
Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-12
Howard Mahler
[email protected]
www.howardmahler.com/Teaching
Mahlerʼs Guide to Empirical Bayesian Credibility
Copyright 2013 by Howard C. Mahler.
The concepts in Section 20.4 of Loss Models, by Klugman, Panjer and Willmot, related to Nonparametric Estimation are demonstrated.
Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (and sections whose title is in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications.
Solutions to the problems in each section are at the end of that section. Note that problems include both some written by me and some from past exams.1 I have assigned point values to the questions, based on 100 points corresponding to a four-hour actuarial exam.
Section #    Pages      Section Name
    1         3-5       Introduction
    2         6-38      No Varying Exposures
    3         39-51     Differing Numbers of Years, No Varying Exposures
    4         52-93     Varying Exposures
    5         94-106    Differing Numbers of Years, Varying Exposures
    6         107-113   Using an A Priori Mean
    7         114-124   Using an A Priori Mean, Varying Exposures
    8         125-138   Assuming a Poisson Frequency
    9         139-143   Important Formulas and Ideas

1 In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. Past exam questions are copyright by the Casualty Actuarial Society and Society of Actuaries and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Course 4 Exam Questions by Section of this Study Aid2
[Chart of past exam questions by section of this study aid, with one column per exam (Sample, 5/00, 11/00, 5/01, 11/01, 11/02, 11/03, 11/04, 5/05, 11/05, 11/06, 5/07) and one row per section (1 through 8); the question numbers shown are 31, 15, 16, 27, 32, 11, 15, 30, 17, 27, 11, 25, 11, 22, and 13.]
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.
2 Excluding any questions that are no longer on the syllabus.
Section 1, Introduction

Section 20.4.1 of Loss Models discusses estimating the Expected Value of the Process Variance and the Variance of the Hypothetical Means from data.3 This study guide will explain such Empirical Bayesian Credibility techniques. First we will contrast these nonparametric methods to “full parametric” and semiparametric methods of estimation.

Multi-sided Dice Example:
“Mahlerʼs Guide to Buhlmann Credibility”, “Mahlerʼs Guide to Conjugate Priors”, Loss Models, and “Credibility” by Mahler and Dean, have many examples of different models for which the Bühlmann Credibility assigned to an observation are determined. In these “Full Parametric Models”, prior to an observation, all aspects of the risk process are specified. One such model involves multi-sided dice:4
There are a total of 100 multi-sided dice of which 60 are 4-sided, 30 are 6-sided and 10 are 8-sided. The multi-sided dice with 4 sides have 1, 2, 3, and 4 on them. The multi-sided dice with the usual 6 sides have numbers 1 through 6 on them. The multi-sided dice with 8 sides have numbers 1 through 8 on them. For a given die each side has an equal chance of being rolled; i.e., the die is fair. Your friend has picked at random a multi-sided die. He then rolled the die and told you the result. You are to estimate the result when he rolls that same die again.
It was shown that for this example:
A Priori Mean Die Roll = 3.
Expected Value of the Process Variance = EPV = 2.15.
Variance of the Hypothetical Means = VHM = .45.
Bühlmann Credibility Parameter = K = EPV/VHM = 2.15 / .45 = 4.778.
Credibility assigned to the observation of one die roll = Z = 1/(1+K) = 17.3%.
Bühlmann Credibility Estimate of the next roll from the same die = (0.173)(Observation) + (0.827)(3).
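As a numerical check, here is a short Python sketch of the dice example (illustrative only; the variable names are mine):

dice = [(0.60, 4), (0.30, 6), (0.10, 8)]    # (probability of picking this die type, number of sides)
overall_mean = sum(p * (s + 1) / 2 for p, s in dice)                # 3.0
epv = sum(p * (s * s - 1) / 12 for p, s in dice)                    # 2.15
vhm = sum(p * ((s + 1) / 2 - overall_mean) ** 2 for p, s in dice)   # 0.45
k = epv / vhm                                                       # about 4.778
z = 1 / (1 + k)                                                     # about 0.173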
3 See also “Topics in Credibility” by Dean. This is also discussed briefly in Section 6.6 of “Credibility” by Mahler and Dean, not on the Syllabus.
4 See “Mahlerʼs Guide to Buhlmann Credibility,” or Sections 3.1 and 3.2 of “Credibility” by Mahler and Dean, the 4th Edition of Foundations of Casualty Actuarial Science.
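As a quick numerical check of this full parametric model, here is a minimal Python sketch (the layout and variable names are mine) that reproduces the EPV, VHM, K, and Z quoted above:

    # Full parametric multi-sided die model: 60% 4-sided, 30% 6-sided, 10% 8-sided dice.
    probs = {4: 0.60, 6: 0.30, 8: 0.10}
    mean_die = lambda n: (n + 1) / 2          # mean of a fair n-sided die
    var_die = lambda n: (n * n - 1) / 12      # process variance of a fair n-sided die

    a_priori_mean = sum(p * mean_die(n) for n, p in probs.items())                  # 3.0
    epv = sum(p * var_die(n) for n, p in probs.items())                             # 2.15
    vhm = sum(p * mean_die(n) ** 2 for n, p in probs.items()) - a_priori_mean ** 2  # 0.45
    K = epv / vhm                                                                   # 4.778
    Z = 1 / (1 + K)                                                                 # 0.173
    print(a_priori_mean, epv, vhm, K, Z)

No data enter this calculation; everything follows from the assumed structure of the model.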
Example of Mixing Poissons:

In “Mahlerʼs Guide to Semiparametric Estimation”, there is an example involving Poisson frequencies.5 Assume that one observes that the claim count distribution is as follows for a large group of insureds:

Total Claim Count:         0      1      2      3      4      5     >5
Percentage of Insureds:  60.0%  24.0%   9.8%   3.9%   1.6%   0.7%    0%
One can estimate the total mean as 0.652 and the total variance as: 1.414 - 0.652² = 0.989.

Assume in addition that the claim count for each individual insured has a Poisson distribution which does not change over time. In other words each insuredʼs frequency process is given by a Poisson with parameter λ, with λ varying over the group of insureds. Then the process variance for each insured is λ. Thus the expected value of the process variance is estimated as follows: Eλ[VAR[X | λ]] = Eλ[λ] = overall mean = 0.652.

Thus we estimate the Variance of the Hypothetical Means as: Total Variance - EPV = 0.989 - 0.652 = 0.337.
Then K = EPV / VHM = 0.652/0.337 = 1.93.

Exercise: An insured from this portfolio is observed to have 1 claim in five years. Estimate the future claim frequency for this insured.
[Solution: Z = 5/(5 + K) = 5/6.93 = 72.2%. The estimated future frequency = (72.2%)(1/5) + (1 - 72.2%)(0.652) = 0.326.
Comment: Once we have estimated K, we can use it for similar insureds to those in this portfolio. For example, for a similar insured observed for 4 years, Z = 4/(4 + 1.93) = 67.5%.]

This is an example of what is called Semiparametric Estimation, or Semiparametric Empirical Bayes Estimation. While the form of the risk process for each insured is specified, the distribution of risk parameters among the insureds is not specified. Therefore, in addition to the modeling assumptions, one must rely on observed data, in order to estimate the Bühlmann Credibility Parameter.
5 See also page 48 of “Credibility” by Mahler and Dean.
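The semiparametric calculation above can be reproduced with a short Python sketch (an illustration of my own, with the percentages entered as given):

    # Semiparametric estimation: each insured is assumed Poisson, so EPV = overall mean.
    counts = [0, 1, 2, 3, 4, 5]
    weights = [0.600, 0.240, 0.098, 0.039, 0.016, 0.007]

    mean = sum(n * w for n, w in zip(counts, weights))               # 0.652
    second_moment = sum(n * n * w for n, w in zip(counts, weights))  # 1.414
    total_variance = second_moment - mean ** 2                       # 0.989
    epv = mean                                                       # Poisson: process variance = lambda
    vhm = total_variance - epv                                       # 0.337
    K = epv / vhm                                                    # about 1.93

    Z = 5 / (5 + K)                                                  # about 72.2% for five years of data
    print(round(Z * (1 / 5) + (1 - Z) * mean, 3))                    # about 0.326

Only the form of the frequency distribution is assumed; the mixing distribution of λ across insureds is never specified.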
Nonparametric Estimation:

In contrast to the two previous examples, one can rely on the data itself, without any specific model of the risk process, to estimate the EPV and VHM. An important set of techniques has been developed to estimate the Bühlmann Credibility Parameter, K, from observed data.6 This is sometimes referred to as nonparametric estimation. For nonparametric estimation, you need either a number of individuals, classes, etc., each observed over a number of years, or a number of individuals, classifications, etc., observed across a larger group.7 Examples of these techniques will be discussed in this study guide.
Semiparametric vs. Nonparametric vs. Full Parametric Estimation:

Semiparametric estimation assumes a particular form of the frequency distribution, which differs from nonparametric estimation where no such assumption is made. One can use semiparametric estimation with only one year of data from a portfolio of insureds. Nonparametric estimation requires data separately for each of at least two years from several insureds.8

Semiparametric estimation assumes a particular form of the frequency distribution, but differs from full parametric estimation via Bühlmann Credibility where in addition one assumes a particular distribution of types of insureds or mixing distribution. Full parametric estimation assumes a complete model and the Bühlmann Credibility Parameter, K, is calculated with no reference to any observations.

From the least complete to most complete modeling assumptions: nonparametric, semiparametric, full parametric.
From the least reliance on data in order to calculate K to the most reliance on data to calculate K: full parametric, semiparametric, nonparametric.
6 References include: Loss Models, by Klugman, Panjer, and Willmot; “Institute of Actuaries Study Note, An Introduction to Credibility Theory,” by H. R. Waters; “Report of the Credibility Subcommittee: Development and Testing of Empirical Bayes Credibility Procedures for Classification Ratemaking,” I.S.O., September 1980; Gary Venterʼs Credibility Chapter of Editions 1 to 3 of Foundations of Casualty Actuarial Science; Advanced Risk Theory by De Vylder.
7 Four or more risks, classes, etc., is recommended in order to apply these techniques in practical applications. However, in order to reduce the time needed to perform calculations, exam questions may have only 2 or 3 individuals or classes.
8 As discussed subsequently, if one relies on an a priori estimate of the mean, one can apply these techniques to a single insured, class, etc.
Section 2, No Varying Exposures (No Variation in Years)

There are a number of situations to which Nonparametric Empirical Bayesian Estimation can be applied. The simplest case is that in which the insureds each have the same volume of data in each year. Loss Models refers to this case as the Bühlmann case, as opposed to the Bühlmann-Straub case with varying exposures.

This case with no varying exposures can be split in turn into a simpler case in which each insured has the same number of years of data, and a more complex case in which insureds have different numbers of years of data. The simpler case, which is a special case of the more complex case, will be discussed first. In this simplest case, we have a rectangular array of data (number of claims, dollars of loss, etc.) with no empty cells and no specific mention of exposures. Each class (or individual, region, etc.) has the same number of years of data.

A Three Driver Example:

For example, assume there are 3 drivers in a particular rating class. For each of 5 years, we have the number of claims for each of these 3 drivers:9

          1   2   3   4   5
Hugh      0   0   0   0   0
Dewey     0   1   0   0   0
Louis     0   0   2   1   0
Exercise: Calculate the mean and sample variance of each driver. Calculate the overall mean and the sample variance of the driverʼs means.
[Solution: The sample variance for Louis is:10
{(0 - 0.6)² + (0 - 0.6)² + (2 - 0.6)² + (1 - 0.6)² + (0 - 0.6)²} / (5 - 1) = 0.80.

            1   2   3   4   5    Mean     Sample Variance
Hugh        0   0   0   0   0    0.00          0.00
Dewey       0   1   0   0   0    0.20          0.20
Louis       0   0   2   1   0    0.60          0.80
Mean                             0.2667        0.333
Variance                         0.0933

The driverʼs means are: 0, 0.2, and 0.6; their sample variance is:
{(0 - 0.2667)² + (0.2 - 0.2667)² + (0.6 - 0.2667)²} / (3 - 1) = 0.09333.]

Using nonparametric empirical Bayesian estimation, one estimates the EPV as the average of the sample variances for each driver = (0 + 0.2 + 0.8) / 3 = 0.33333.

9 I always put the observations going across and going down the individuals, classes, groups, regions, etc.
10 Note that in the sample variance one divides by n - 1 = 4, rather than n = 5.
Then one estimates the VHM as:
(sample variance of the driverʼs means) - EPV / (# years of data) = 0.09333 - 0.33333/5 = 0.0267.11

We divide the estimated EPV by the estimated VHM in order to estimate K.
K = EPV/VHM = 0.3333 / 0.0267 = 12.5. Z = 5 / (5 + 12.5) = 28.6%.
Overall mean = (0 + 0.2 + 0.6)/3 = 0.2667.

Using nonparametric empirical Bayesian estimation to estimate the future claim frequency of each driver:
Hugh: (0.286)(0) + (1 - 0.286)(0.2667) = 0.190.
Dewey: (0.286)(0.2) + (1 - 0.286)(0.2667) = 0.248.
Louis: (0.286)(0.6) + (1 - 0.286)(0.2667) = 0.362.
Usually one would not employ this technique in practical applications, unless one had many more drivers.

We note that the resulting estimates are in balance:
(0.190 + 0.248 + 0.362) / 3 = 0.2667 = the observed mean claims frequency.

Formulas, no Variation in Exposures or Years:

This is an example of Empirical Bayesian estimation. In general this technique would proceed as follows:
Assume we have Y years of data from each of C classes. Let Xit be the data (die roll, frequency, severity, pure premium, etc.) observed for class (or risk) i, in year t, for i = 1,...,C, and t = 1,...,Y.12

Let X̄i = (Σ_{t=1}^{Y} Xit) / Y = average of the data for class i.

Let X̄ = (Σ_{i=1}^{C} Σ_{t=1}^{Y} Xit) / (CY) = overall average of the observed data.

Let si² = Σ_{t=1}^{Y} (Xit - X̄i)² / (Y - 1) = sample variance = estimated process variance for class i.

The estimated EPV = (1/C) Σ_{i=1}^{C} si² = Σ_{i=1}^{C} Σ_{t=1}^{Y} (Xit - X̄i)² / {C(Y - 1)}.13

The estimated VHM = Σ_{i=1}^{C} (X̄i - X̄)² / (C - 1) - EPV/Y.14

K = EPV / VHM = estimated Bühlmann Credibility Parameter.

In the above example: C = 3, Y = 5, X̄2 = 0.2, X̄ = 0.267, s2² = 0.20, EPV = 0.333,
VHM = 0.0933 - 0.333/5 = 0.0267, and K = 0.333 / 0.0267 = 12.5.
Thus 5 years of data would be given a credibility of: 5 / (5 + 12.5) = 28.6%.
The remaining 71.4% weight would be assigned to the observed overall mean of 0.267.
Thus for risk 2, Dewey, one would estimate the future annual frequency as: (28.6%)(0.2) + (71.4%)(0.267) = 0.248.

11 A correction term is subtracted from the sample variance of the driverʼs means in order to make the estimate of the VHM unbiased as discussed below.
12 Here we have assumed the same number of years of data for each class or risk.
13 Loss Models uses the notation v for the EPV, and thus the estimated EPV would be v̂.
14 Loss Models uses the notation a for the VHM, and thus the estimated VHM would be â.
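These formulas translate directly into a few lines of code. Here is a minimal Python sketch (my own illustration), applied to the three-driver data above; it reproduces the estimates 0.190, 0.248, and 0.362:

    # Nonparametric empirical Bayes estimation: equal years, no varying exposures.
    data = {"Hugh":  [0, 0, 0, 0, 0],
            "Dewey": [0, 1, 0, 0, 0],
            "Louis": [0, 0, 2, 1, 0]}
    C = len(data)     # number of classes (here, drivers)
    Y = 5             # years of data per class

    means = {k: sum(v) / Y for k, v in data.items()}
    sample_vars = {k: sum((x - means[k]) ** 2 for x in v) / (Y - 1) for k, v in data.items()}

    grand_mean = sum(means.values()) / C                                          # 0.2667
    epv = sum(sample_vars.values()) / C                                           # 0.3333
    vhm = sum((m - grand_mean) ** 2 for m in means.values()) / (C - 1) - epv / Y  # 0.0267
    K = epv / vhm                                                                 # about 12.5
    Z = Y / (Y + K)                                                               # about 28.6%

    for name in data:
        print(name, round(Z * means[name] + (1 - Z) * grand_mean, 3))

The same few lines apply to any rectangular array of data with equal years and no exposures.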
Using the Functions of the Calculator to Compute Sample Means and Variances:

Using the TI-30XS MultiView, one could work as follows with the data for Louis, (0, 0, 2, 1, 0):
DATA DATA Clear L1 ENTER
0 ENTER 0 ENTER 2 ENTER 1 ENTER 0 ENTER
(The five claim counts for Louis should now be in the column labeled L1.)
2nd STAT 1-VAR ENTER (If necessary use the arrow keys on the big button at the upper right to select 1-VAR.)
DATA L1 ENTER (If necessary use the arrow keys on the big button at the upper right to select DATA L1.)
FRQ ONE ENTER (If necessary use the arrow keys to select FRQ ONE.)
CALC ENTER (Use the arrow keys on the big button at the upper right to select CALC.)
Various outputs are displayed. Use the arrow keys on the big button to scroll through them.
n = 5 (number of data points)
x̄ = 0.6 (sample mean of X)
Sx = 0.89443 (square root of the sample variance of X)
σx = 0.8 (square root of the variance of X, computed with n in the denominator)
ΣX = 3
ΣX² = 5
To exit stat mode, hit 2ND QUIT.
To display the outputs again: 2nd STAT STATVAR ENTER (Use the arrow keys on the big button at the upper right to select STATVAR.)
To get Sx², scroll down to Sx, hit ENTER, hit x². The sample variance is: 0.8.
Using the TI-30X IIS, one could work as follows with the data for Louis, (0, 0, 2, 1, 0):
2nd STAT CLRDATA ENTER
2nd STAT 1-VAR ENTER (Use the arrow key if necessary to select 1-VAR rather than 2-VAR.)
DATA
X1 = 0 Freq = 1
X2 = 0 Freq = 1
X3 = 2 Freq = 1
X4 = 1 Freq = 1
X5 = 0 Freq = 1
ENTER
STATVAR
Various outputs are displayed. Use the arrow keys to scroll through them.
n = 5 (number of data points)
x̄ = 0.6 (sample mean of X)
Sx = 0.89443 (square root of the sample variance of X)
σx = 0.8 (square root of the variance of X, computed with n in the denominator)
ΣX = 3
ΣX² = 5
Sx² = 0.89443² = 0.8.
Alternately, one could have entered the data for Louis, (0, 0, 2, 1, 0), as follows:
X1 = 0 Freq = 3
X2 = 1 Freq = 1
X3 = 2 Freq = 1
ENTER
Using the BA II Plus Professional, one could work as follows with the data for Louis, (0, 0, 2, 1, 0):
2nd DATA 2nd CLR WORK
X1 0 ENTER ↓↓
X2 0 ENTER ↓↓
X3 2 ENTER ↓↓
X4 1 ENTER ↓↓
X5 0 ENTER
2nd STAT
If necessary press 2nd SET until 1-V is displayed (for one variable).
Various outputs are displayed. Use the keys ↓ and ↑ to scroll through them.
n = 5 (number of data points)
x̄ = 0.6 (sample mean of X)
Sx = 0.89443 (square root of the sample variance of X)
σx = 0.8 (square root of the variance of X, computed with n in the denominator)
ΣX = 3
ΣX² = 5
Sx² = 0.89443² = 0.8.
Multisided Die Simulation:

For example, assume we have the following data generated by a simulation of the multi-sided die example:

Risk     Die                    Trial Number                             Sample
Number   Type   1  2  3  4  5  6  7  8  9 10 11 12    Mean    Variance
1         4     1  1  2  4  3  2  3  1  1  1  4  1    2.000     1.455
2         4     4  1  3  1  2  2  4  3  4  1  4  3    2.667     1.515
3         4     3  2  3  4  1  2  1  4  4  3  2  3    2.667     1.152
4         4     4  4  3  3  1  3  1  3  2  1  3  2    2.500     1.182
5         4     1  4  3  2  2  3  3  4  3  2  2  4    2.750     0.932
6         4     2  1  4  1  2  3  3  2  4  4  1  2    2.417     1.356
7         6     5  5  1  2  5  3  6  1  3  3  3  2    3.250     2.750
8         6     1  2  3  1  3  1  1  6  3  3  2  2    2.333     2.061
9         6     4  2  6  4  1  5  2  5  6  3  6  1    3.750     3.659
10        8     8  3  6  2  2  4  2  7  8  7  5  7    5.083     5.720

Average                                                2.942     2.178
Variance                                               0.8056
Assume that we only have the observed rolls and do not know what sided die generated each risk. Then one can attempt to estimate the EPV and VHM, and thus the Bühlmann Credibility constant K, as follows.

For Risk 1 separately we can estimate the mean as 2 and the process variance as:15
{(1-2)² + (1-2)² + (2-2)² + (4-2)² + (2-2)² + (3-2)² + (1-2)² + (1-2)² + (3-2)² + (1-2)² + (4-2)² + (1-2)²} / (12 - 1) = 1.455.
We have calculated the usual sample variance of the first row of data.

Exercise: Based on the above data for risk 9 what is the estimated process variance?
[Solution: The estimated mean is 3.75 and the estimated process variance is 3.659.
Comment: We have estimated the sample mean and variance for a row of the data.]

In this manner one gets a separate estimate of the process variance of each of the ten risks. By taking an average of these ten values, one gets an estimate of the (overall) expected value of the process variance, in this case 2.178.16 17
15 Note that this estimate of the process variance only involves data from a single row. We have divided by the number of columns minus one; these Empirical Bayes methodologies are careful to adjust for degrees of freedom in order to attempt to get unbiased estimators. With an unknown mean, one divides by n - 1 in order to get the sample variance, an unbiased estimate of the variance.
16 Note that the EPV for the underlying model is in this case 2.15; thus 2.178 is a pretty good estimate.
17 Note that this is the “within variance” from analysis of variance.
Now by taking an average of the means estimated for each of the ten risks, one gets an estimate of the overall average, in this case 2.942.

Then in order to estimate the VHM one could sum the squared deviations of the individual means from the overall mean. In this case that sum is:
(2 - 2.942)² + (2.667 - 2.942)² + (2.667 - 2.942)² + (2.5 - 2.942)² + (2.75 - 2.942)² + (2.417 - 2.942)² + (3.25 - 2.942)² + (2.333 - 2.942)² + (3.75 - 2.942)² + (5.083 - 2.942)² = 7.251.
Dividing this sum by one less than the number of risks, 7.251 / (10 - 1) = 0.8057 is the sample variance of the hypothetical means.18 However, as discussed subsequently, this would be a biased estimate of the VHM. It contains a little too much random variation due to the use of the observed means rather than the actual hypothetical means. It turns out that if we subtract out the estimated EPV divided by the number of years, the resulting estimator will be an unbiased estimator of the VHM: 0.8057 - (2.178/12) = 0.6242.

In this example: C = 10, Y = 12, X̄1 = 2, X̄ = 2.942, s1² = 1.455, EPV = 2.178,
VHM = 7.251/(10 - 1) - 2.178/12 = 0.6242, and K = 2.178/0.6242 = 3.5.
Thus twelve trials of data would be given a credibility of 12 / (12 + 3.5) = 77.4%.19
The remaining 22.6% weight would be assigned to the observed overall mean of 2.942.
Thus for risk 1 one would estimate a future roll as: (77.4%)(2.000) + (22.6%)(2.942) = 2.21.

Exercise: Assuming we are unaware of which sided die was used to generate each risk, what is the estimate of a future roll from risk 9?
[Solution: (77.4%)(3.750) + (22.6%)(2.942) = 3.57.]

Note that the computations of the estimated EPV and VHM would have been much faster if one had been given summary statistics. In this case:
ΣΣ(Xij - X̄i)² = 239.583.   Σ(X̄i - X̄)² = 7.251.
Estimated EPV = ΣΣ(Xij - X̄i)² / {C(Y - 1)} = 239.583 / {(10)(12 - 1)} = 2.178.
Estimated VHM = Σ(X̄i - X̄)² / (C - 1) - EPV/Y = 7.251 / (10 - 1) - 2.178/12 = 0.6241.
18 This is the “between variance” from analysis of variance.
19 If instead one had used the value of K of 4.778 from the full parametric model, then 12 trials would be assigned a credibility of 12/(12 + 4.778) = 71.5%. In the full parametric model, one would weight the observation with the model overall mean of 3.000. For risk 1, with an observation of 2.000, that would result in an estimate of a future mean die roll of: (71.5%)(2.000) + (28.5%)(3.000) = 2.28.
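When only such summary statistics are given, as is common on exam questions, the estimates follow in a couple of lines. A minimal Python sketch (the variable names are mine):

    # Empirical Bayes estimates from summary statistics, C classes with Y years each.
    C, Y = 10, 12
    sum_sq_within = 239.583      # sum over i and t of (Xit - Xbar_i)^2
    sum_sq_between = 7.251       # sum over i of (Xbar_i - Xbar)^2

    epv = sum_sq_within / (C * (Y - 1))          # 2.178
    vhm = sum_sq_between / (C - 1) - epv / Y     # about 0.624
    K = epv / vhm                                # about 3.5
    Z = Y / (Y + K)                              # about 77.4%
    print(epv, vhm, K, Z)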
Here are the estimated mean future die rolls for each of the ten risks:

Risk      Observed Mean    Estimated Future
Number    Die Roll         Mean Die Roll
1            2.000              2.21
2            2.667              2.73
3            2.667              2.73
4            2.500              2.60
5            2.750              2.79
6            2.417              2.54
7            3.250              3.18
8            2.333              2.47
9            3.750              3.57
10           5.083              4.60
Random Fluctuations:

Note that the estimated value of K of 3.5 differs somewhat from the value of K = 2.15/0.45 = 4.778 for the full parametric multisided die example. The difference is due to a lack of information here; in the full parametric model we were given or we assumed that there was a 60% chance that the die was 4-sided, 30% chance the die was 6-sided, and 10% chance that the die was 8-sided. In contrast, here we have made no assumptions about the types of dice or their probabilities. In other words, we have made no assumptions about the “structure parameters” that specify the risk process.

Unfortunately, while this estimate of the EPV is pretty good, that of the VHM is subject to considerable random fluctuation. A significant difficulty with this technique is that the estimated VHM can be negative, even though as a variance the VHM cannot actually be negative. In the case of a negative estimate of the VHM it is generally set equal to zero; this results in no credibility being given to the data, which makes sense if the actual VHM is very small. If the estimated VHM ≤ 0, then set Z = 0.

In any case, the random fluctuation in the estimate of the VHM can lead to considerable random fluctuation in the estimate of K. In this case, when a series of ten different simulations were run, the estimated values of K ranged from 2.9 to over 100 to negative values (which would be set equal to infinity when the VHM was set equal to zero).
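One way to see this fluctuation is to rerun the simulation many times and look at the spread of the estimated K. A rough Python sketch (my own, with an arbitrary choice of seeds and run counts):

    import random

    def estimate_K(C=10, Y=12, seed=None):
        # Simulate C dice for Y rolls each; return the empirical Bayes estimate of K,
        # or None when the estimated VHM is not positive (Z would then be set to 0).
        rng = random.Random(seed)
        dice = rng.choices([4, 6, 8], weights=[0.6, 0.3, 0.1], k=C)
        rolls = [[rng.randint(1, n) for _ in range(Y)] for n in dice]
        means = [sum(r) / Y for r in rolls]
        grand = sum(means) / C
        epv = sum(sum((x - m) ** 2 for x in r) for r, m in zip(rolls, means)) / (C * (Y - 1))
        vhm = sum((m - grand) ** 2 for m in means) / (C - 1) - epv / Y
        return epv / vhm if vhm > 0 else None

    print([estimate_K(seed=s) for s in range(10)])   # the estimates vary widely from run to run

With only ten risks and twelve rolls each, estimates of K well below and well above the model value of 4.778 are common.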
Explanation of the Correction Term for the VHM:

The formula for estimating the Variance of the Hypothetical Means has a correction term involving the (estimated) Expected Value of the Process Variance:
VHM = (the sample variance of the class means) - EPV / (# years of data).
The basic reason why we need to subtract EPV/Y is because we are working with the sample means rather than the hypothetical means. Each sample mean has a variance of (Process Variance)/Y, due to random fluctuation.20 Thus the sample variance of the sample means contains this extra random fluctuation and is too large. We subtract EPV/Y to correct for this and make the estimate of the Variance of the Hypothetical Means unbiased.

More formally, using the analysis of variance formula to break Var[X̄i] into two pieces:
Var[X̄i] = Var[E[X̄i | Class i]] + E[Var[X̄i | Class i]] = Var[µi] + E[σi²/Y] = VHM + EPV/Y.
⇒ VHM = Var[X̄i] - EPV/Y.21

Assumptions for Nonparametric Empirical Bayes Estimation:

This technique is referred to as nonparametric empirical Bayes estimation, because we have made no specific assumptions about the structure parameters. In fact, we make some assumptions; as will be discussed, we implicitly or explicitly assume the usual Bühlmann covariance structure between the data from years, classes, etc. Of course, in addition we assumed that the years of data from Louis were drawn from a single distribution, the years of data from Dewey were drawn from a single distribution, etc. We did not specify the form of each of these distributions; they could each have a different form or the same form. The important point is that the whole computation would make no sense if each row of the data matrix were not a random sample from its own distribution, whatever it is.

20 In general, the variance of any average of independent, identically distributed variables is the variance of a single variable divided by the number of variables. Var[mean] = Var[X]/n.
21 See page 19 in “Topics in Credibility” by Dean.
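The effect of the correction term can also be checked by simulation; a small sketch of my own using the multi-sided die portfolio (true VHM = 0.45, EPV = 2.15):

    import random

    def raw_between_variance(C=1000, Y=12, seed=0):
        # Sample variance of the class means for one large simulated portfolio of dice.
        rng = random.Random(seed)
        dice = rng.choices([4, 6, 8], weights=[0.6, 0.3, 0.1], k=C)
        means = [sum(rng.randint(1, n) for _ in range(Y)) / Y for n in dice]
        grand = sum(means) / C
        return sum((m - grand) ** 2 for m in means) / (C - 1)

    # With many classes the uncorrected value sits near VHM + EPV/Y = 0.45 + 2.15/12 = 0.629;
    # subtracting EPV/Y = 0.179 brings it back near the true VHM of 0.45.
    print(raw_between_variance())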
The Assumed Covariance Structure:

The Empirical Bayes methods shown here assume the usual Bühlmann covariance structure.22 When the size of risk is not important, for two years of data from a single risk, the variance-covariance structure between the years of data is as follows:
COV[Xt, Xu] = η²δtu + τ², where δtu is 1 for t = u and 0 for t ≠ u.
COV[X1, X2] = τ² = VHM, and COV[X1, X1] = VAR[X1] = η² + τ² = EPV + VHM.

Assume one has data from a number of risks over several years. Let Xit be the data (die roll, frequency, severity, pure premium, etc.) observed for class (or risk) i, in year t. Then the assumed covariance structure is:
COV[Xit, Xju] = δij δtu η² + δij τ².
In other words, two observations from different risks have a covariance of zero, two observations of the same risk in different years have a covariance of τ², the VHM, while the observation of a risk in a single year has a variance of η² + τ², the VHM plus the EPV.

Let M be the overall grand mean, M = E[Xit]. Then we have from the assumed covariance structure that the expected value of the product of two observations is:
E[Xit Xju] = COV[Xit, Xju] + E[Xit] E[Xju] = δij δtu η² + δij τ² + M².
Then for example for a single risk in a single year: E[Xit²] = η² + τ² + M².

Let X̄i = (Σ_{t=1}^{Y} Xit) / Y = average of the data for class i.

Then E[Xit X̄i] = (1/Y) Σ_{u=1}^{Y} E[Xit Xiu] = (1/Y) Σ_{u=1}^{Y} (δtu η² + τ² + M²) = τ² + M² + η²/Y.

22 This covariance structure is discussed in “Mahlerʼs Guide to Buhlmann Credibility.” The Bühlmann-Straub covariance structure is assumed when exposures vary by year, as is discussed subsequently.
Also E[X̄i²] = E[X̄i X̄i] = E[(Σ_{t=1}^{Y} Xit X̄i) / Y] = (1/Y) Σ_{t=1}^{Y} (τ² + M² + η²/Y)
= τ² + M² + η²/Y = E[Xit X̄i].

An Unbiased Estimate of the EPV:

In order to estimate the EPV, one looks at the squared differences between the observations for a class and the average for that class:

Let si² = Σ_{t=1}^{Y} (Xit - X̄i)² / (Y - 1).

Then (Y - 1) E[si²] = Σ_{t=1}^{Y} E[(Xit - X̄i)²] = Σ_{t=1}^{Y} {E[Xit²] - 2 E[Xit X̄i] + E[X̄i²]}
= Σ_{t=1}^{Y} {η² + τ² + M² - (τ² + M² + η²/Y)} = η² (Y - 1).

Thus, E[si²] = η² (Y - 1) / (Y - 1) = η².
si², the sample variance, is an unbiased estimator of the process variance.
Then any weighted average of the si² for the various classes would be an unbiased estimator of the EPV.

In particular, EPV = (1/C) Σ_{i=1}^{C} si² is an unbiased estimator of the EPV.
Empirical Bayesian Cred. §2 No Varying Exposures, HCM 10/22/12, Page 18
An Unbiased Estimate of the VHM: In order to estimate the VHM, one looks at the squared differences between the average for each class and the overall average: C
Let
ξ2
=
∑ ( Xi - X )2 . i=1
Y
For two different risks, i ≠ j, E[ X i X j] = E[(1 / Y2 )
Y
∑ ∑ Xit Xju ] t=1 u=1
Y
= (1 / Y2 )
Y
Y
Y
∑ ∑ E[Xit Xju] = ∑ ∑ M2 = M2. t=1 u=1
t=1 u=1
Therefore, since previously we had that E[ X i 2 ] = M2 + τ2 + η2 /Y, we have that: E[ X i X j] = M2 + δij (τ2 + η2 / Y). E[ X i X ] = E[(1/ C)
C
C
C
j=1
j=1
j=1
∑ X i X j ] = (1/ C) ∑ E[Xi Xj ] = (1/ C) ∑ M2 + δ ij(τ 2 + η2 / Y )
= (1/ C) {CM2 + τ2 + η2/Y} = M2 + τ2/C + η2 / (CY). E[ X 2 ] = E[ X X ] = E[(1/ C)
C
C
C
i=1
i=1
i=1
∑ X i X ] = (1/ C) ∑ E[Xi X] = (1/ C) ∑ M2 + τ2 + η2 / Y
= M2 + τ2/C + η2 / (CY). C
Then E[ξ2 ] =
∑ E[(Xi - X i=1
)2
C
]=
∑ E[Xi 2] - 2 E[X i X ] + E[ X 2] i=1
C
=
∑ M2 + τ2 + η2 / Y - {M2 + τ2 / C + η2 / (CY)} i=1
= (τ2 + η2/Y) (C - 1).
Thus, E[ξ²/(C - 1) - η²/Y] = τ².

Therefore if one takes:
EPV = (1/C) Σ_{i=1}^{C} si², which is an unbiased estimator of η²,
then an unbiased estimator of the Variance of the Hypothetical Means, τ², is:
VHM = Σ_{i=1}^{C} (X̄i - X̄)² / (C - 1) - EPV/Y.
Summary of Empirical Bayesian Estimation (no variation in years or exposures):

For this case, Empirical Bayesian Estimation can be summarized as follows:

si² is just the usual sample variance for the data from a single class i:
si² = Σ_{t=1}^{Y} (Xit - X̄i)² / (Y - 1).

EPV = average of the sample variances for each class = (1/C) Σ_{i=1}^{C} si² = Σ_{i=1}^{C} Σ_{t=1}^{Y} (Xit - X̄i)² / {C(Y - 1)}.

VHM = (the sample variance of the class means) - EPV / (# years of data)
= Σ_{i=1}^{C} (X̄i - X̄)² / (C - 1) - EPV/Y.

If the estimated VHM ≤ 0, then set Z = 0.
Otherwise, as usual, K = EPV/VHM, and Z = (# years of data) / (# years of data + K).23

23 More generally, in the absence of exposures, the amount of data is the number of observations or the number of columns in the data grid.
Problems:

2.1 (2 points) You are given the following information about 3 groups over 2 years:
Group    Loss in Year 1    Loss in Year 2
1              18                20
2              25                30
3              16                10
What is the credibility assigned to two years of data?
(A) 70% (B) 80% (C) 90% (D) 95% (E) 99%
2.2 (3 points) An insurer has data on losses for seven policyholders for ten years.
Xij is the loss from the ith policyholder for year j. You are given:
X̄1 = 13.1.   X̄ = 10.3.
Σ_{i=1}^{7} Σ_{j=1}^{10} (Xij - X̄i)² = 253.   Σ_{i=1}^{7} (X̄i - X̄)² = 17.
Estimate the losses for policyholder #1 over the next year.
(A) Less than 11.0
(B) At least 11.0, but less than 11.5
(C) At least 11.5, but less than 12.0
(D) At least 12.0, but less than 12.5
(E) At least 12.5

2.3 (3 points) Use the following information:
• Phil and Sylvia are competitors in the light bulb business.
• You test 100 bulbs from each of them.
• Let Xi be the lifetime for light bulb i.
• For Philʼs 100 light bulbs: Σ Xi = 85,217 and Σ Xi² = 82,239,500.
• For Sylviaʼs 100 light bulbs: Σ Xi = 90,167 and Σ Xi² = 90,372,400.
Use Nonparametric Empirical Bayes estimation in order to estimate the average lifetime of a randomly selected light bulb from Phil.
A. 860 B. 865 C. 870 D. 875 E. 880
Use the following information for the next two questions: Hi Blumfield has just moved cross country to Massachusetts, in order to take a new job. During Hiʼs first week, Hiʼs coworker “Wild” Bill Brown drove Hi to and from work. Hiʼs car finally arrived from California over the next weekend. So during his second week, Hi drove his own car to and from work. Hiʼs two way commute times were in minutes: Day Riding With Bill Driving His Own Car Monday 45 53 Tuesday 46 50 Wednesday 39 51 Thursday 42 50 Friday 43 46 2.4 (3 points) Using nonparametric Empirical Bayes Estimation, what is Bühlmannʼs credibility parameter, K? A. Less than 0.5 B. At least 0.5 but less than 1.0 C. At least 1.0 but less than 1.5 D. At least 1.5 but less than 2.0 E. At least 2.0 2.5 (1 point) In his third week, Hi will again drive his own car into work. Using nonparametric Empirical Bayes Estimation, predict Hiʼs average commute time during his third week. A. Less than 49.1 B. At least 49.1 but less than 49.3 C. At least 49.3 but less than 49.5 D. At least 49.5 but less than 49.7 E. At least 49.7 2.6 (3 points) Medical inflation is 5% per year. You have the following data for health insurance claim costs for two insureds for two years. Policyholder
                 2000    2001
Gilbert           500     200
Sullivan          400     900
Estimate Gilbertʼs claim costs during 2002, using Nonparametric Empirical Bayes estimation.
A. Less than 490
B. At least 490 but less than 500
C. At least 500 but less than 510
D. At least 510 but less than 520
E. At least 520
Use the following information to answer the next four questions:
For each of 5 employees for each of 6 years, displayed below are the dollars of health insurance claims made:

            1     2     3     4     5     6     Mean     Sample Variance
Alice      138    73   320   102   782   270   280.83        69,679
Asok       112   103   129    93   104   171   118.67           802
Dilbert    135   155   121   123    77   139   125.00           704
Howard      91   206   109   211   116    81   135.67         3,341
Wally      467   133   265   193   118    89   210.83        19,679
Assume that each year of data has been brought to the current cost level. 2.7 (1 point) Use nonparametric Empirical Bayes techniques to estimate the Expected Value of the Process Variance. A. Less than 18,000 B. At least 18,000 but less than 19,000 C. At least 19,000 but less than 20,000 D. At least 20,000 but less than 21,000 E. At least 21,000 2.8 (2 points) Use nonparametric Empirical Bayes techniques to estimate the Variance of the Hypothetical Means. A. Less than 1500 B. At least 1500 but less than 1600 C. At least 1600 but less than 1700 D. At least 1700 but less than 1800 E. At least 1800 2.9 (1 point) Using the results of the previous questions, what is the credibility given to the observed data from Wally? A. Less than 30% B. At least 30% but less than 35% C. At least 35% but less than 40% D. At least 40% but less than 45% E. At least 45% 2.10 (1 point) What is the estimated future annual cost for Howard? A. Less than 150 B. At least 150 but less than 155 C. At least 155 but less than 160 D. At least 160 but less than 165 E. At least 165
Use for the next three questions the past claims data on a portfolio of policyholders: Year Policyholder 1 2 3 1 85 50 65 2 60 55 70 3 80 95 75 2.11 (2 points) What is the Bühlmann credibility premium for policyholder 1 for year 4? A. Less than 62 B. At least 62 but less than 67 C. At least 67 but less than 72 D. At least 72 but less than 77 E. At least 77 2.12 (2 points) What is the Bühlmann credibility premium for policyholder 2 for year 4? A. Less than 62 B. At least 62 but less than 67 C. At least 67 but less than 72 D. At least 72 but less than 77 E. At least 77 2.13 (2 points) What is the Bühlmann credibility premium for policyholder 3 for year 4? A. Less than 62 B. At least 62 but less than 67 C. At least 67 but less than 72 D. At least 72 but less than 77 E. At least 77
2.14 (3 points) Survival times are available for six insureds, three from Class A and three from Class B. The three from Class A died at times t = 17, t = 32, and t = 39. The three from Class B died at times t = 12, t = 20, and t = 24. Nonparametric Empirical Bayes estimation is used to estimate the mean survival time for each class. Unbiased estimators of the expected value of the process variance and the variance of the hypothetical means are used. Estimate Z, the Bühlmann credibility factor. (A) 0 (B) 1/4 (C) 1/2 (D) 3/4 (E) 1
2.15 (2 points) Two insureds produced the following losses over a three-year period: Annual Losses Insured Year 1 Year 2 Year 3 1 25 25 25 2 27 24 33 Using the nonparametric empirical Bayes method, determine the Bühlmann credibility premium for Insured 1. A. 25.4 B. 25.6 C. 25.8 D. 26.0 E. 26.2 2.16 (3 points) Two vehicles were selected at random from a population and the following claim counts were observed: Number of Claims during Year Vehicle Year 1 Year 2 Year 3 Year 4 1 1 0 0 0 2 1 3 0 1 Use empirical Bayesian estimation procedures to calculate the credibility weighted estimate of the annual claims frequency for the first vehicle. A. 0.48 B. 0.50 C. 0.52 D. 0.54 E. 0.56 2.17 (2 points) Two medium-sized insurance policies produced the following losses over a threeyear period: Annual Losses Insured Year 1 Year 2 Year 3 1 25 34 23 2 15 26 17 Assuming that the annual exposures for each policy are equal and remain constant through time, use empirical Bayesian estimation procedures to calculate the credibility assigned to the data from the first insured. A. 50% B. 55% C. 60% D. 65% E. 70% 2.18 (3 points) Lucky Tom finds coins on his walk to work. He can take one of two routes to work. The last three times he took the first route he found coins worth: 123, 89, and 101. The last three times he took the second route he found coins worth: 80, 112, and 67. Estimate the worth of the coins he finds tomorrow if Lucky Tom takes the second route to work tomorrow, using Nonparametric Empirical Bayes estimation. A. Less than 89 B. At least 89 but less than 91 C. At least 91 but less than 93 D. At least 93 but less than 95 E. At least 95
2.19 (4 points) At Hogwarts School of Witchcraft and Wizardry, the number of points in the annual competition for the house cup has been over the last three years: House Year 1 Year 2 Year 3 Gryffindor 421 450 428 Hufflepuff 442 403 382 Ravenclaw 405 398 456 Slytherin 461 487 475 Estimate Slytherinʼs points in Year 4, using Nonparametric Empirical Bayes estimation. A. 445 B. 450 C. 455 D. 460 E. 465 Use the following information for the next two questions: Cookie Monster is testing brands of chocolate chip cookies. Being diligent, he eats 1000 cookies from each of four brands: Trollhouse, Little Gloriaʼs, Chip Off The Old Block, and Hello Mr. Chips. Let Xi be the number of chips for cookie i, i = 1 to 1000. For Trollhouse: Σ Xi = 10,211 and Σ Xi2 = 114,681. For Little Gloriaʼs: Σ Xi = 10,593 and Σ Xi2 = 123,397. For Chip Off The Old Block: Σ Xi = 10,363 and Σ Xi2 = 117,531. For Hello Mr. Chips: Σ Xi = 11,312 and Σ Xi2 = 139,382. 2.20 (2 points) What is the estimated Expected Value of the Process Variance? A. 10.6 B. 10.8 C. 11.0 D. 11.2 E. 11.4 2.21 (3 points) Use Nonparametric Empirical Bayes estimation in order to estimate the average number of chips per cookie for Hello Mr. Chips. A. 11.0 B. 11.1 C. 11.2 D. 11.3 E. 11.4
2.22 (3 points) For each of 5 years, we have the number of claims for each of 2 insureds:

       1   2   3   4   5
A      0   0   1   0   0
B      1   0   0   0   2

Estimate the number of claims for insured B over the next 5 years, using Nonparametric Empirical Bayes estimation.
A. 2.0 B. 2.2 C. 2.4 D. 2.6 E. 2.8
2.23 (3 points) Two insureds produced the following losses over a three-year period: Annual Losses Insured Year 1 Year 2 Year 3 1 $3500 $3100 $3300 2 $2700 $3400 $2900 Inflation is 5% per year. Using the nonparametric empirical Bayes method, estimate the losses in year 5 for Insured 2. A. Less than 3600 B. At least 3600 but less than 3610 C. At least 3610 but less than 3620 D. At least 3620 but less than 3630 E. At least 3630 2.24 (3 points) Two insureds produced losses over a three-year period: Annual Losses Insured Year 1 Year 2 Year 3 A 38 29 41 B ?? ?? ?? For insured B, the losses in Year 2 were 8 more than in Year 1. For insured B, the losses in Year 3 were 2 more than in Year 2. In Year 3, the losses for Insured B are greater than those for Insured A. Using nonparametric empirical Bayes estimation, the Bühlmann credibility factor for an individual policyholder is 10.68%. What were the losses observed for insured B in Year 1? A. 33 B. 34 C. 35 D. 36 E. 37 2.25 (4B, 11/93, Q.25) (1 point) You are given the following: • A random sample of losses taken from policy year 1992 has sample variance, s2 = 16. • The losses are sorted into 3 classes, A, B, and C, of equal size. • The sample variances for each of the classes are: sA2 = 4
sB2 = 5
sC2 = 6
Estimate the variance of the hypothetical means. A. Less than 4 B. At least 4, but less than 8 C. At least 8, but less than 12 D. At least 12 E. Not enough information to estimate
2.26 (Course 4 Sample Exam 2000, Q.31) You wish to determine the relationship between sales (Y) and the number of radio advertisements broadcast (X). Data collected on four consecutive days is shown below.
Day    Sales    Number of Radio Advertisements
1        10                  2
2        20                  2
3        30                  3
4        40                  3
Using the method of least squares, you determine the estimated regression line: Ŷ = -25 + 20X.
You perform an Empirical Bayes nonparametric credibility analysis by treating the first two days, on which two radio advertisements were broadcast, as one group, and the last two days, on which three radio advertisements were broadcast, as another group.
Determine the estimated credibility, Z, of the data from each group.

2.27 (4, 5/00, Q.15 and 4, 11/02, Q.11 & 2009 Sample Q. 38) (2.5 points) An insurer has data on losses for four policyholders for seven years. Xij is the loss from the ith policyholder for year j.
You are given:
Σ_{i=1}^{4} Σ_{j=1}^{7} (Xij - X̄i)² = 33.60
Σ_{i=1}^{4} (X̄i - X̄)² = 3.30
Calculate the Bühlmann credibility factor for an individual policyholder using nonparametric empirical Bayes estimation.
(A) Less than 0.74
(B) At least 0.74, but less than 0.77
(C) At least 0.77, but less than 0.80
(D) At least 0.80, but less than 0.83
(E) At least 0.83
2.28 (4, 11/00, Q.16) (2.5 points) Survival times are available for four insureds, two from Class A and two from Class B. The two from Class A died at times t = 1 and t = 9. The two from Class B died at times t = 2 and t = 4. Nonparametric Empirical Bayes estimation is used to estimate the mean survival time for each class. Unbiased estimators of the expected value of the process variance and the variance of the hypothetical means are used. Estimate Z, the Bühlmann credibility factor. (A) 0 (B) 2/19 (C) 4/21 (D) 8/25 (E) 1 2.29 (4, 11/03, Q.15 & 2009 Sample Q.12) (2.5 points) You are given total claims for two policyholders: Year Policyholder 1 2 3 4 X 730 800 650 700 Y 655 650 625 750 Using the nonparametric empirical Bayes method, determine the Bühlmann credibility premium for Policyholder Y. (A) 655 (B) 670 (C) 687 (D) 703 (E) 719 2.30 (4, 11/06, Q.27 & 2009 Sample Q.270) (2.9 points) Three individual policyholders have the following claim amounts over four years: Policyholder Year 1 Year 2 Year 3 Year 4 X 2 3 3 4 Y 5 5 4 6 Z 5 5 3 3 Using the nonparametric empirical Bayes procedure, calculate the estimated variance of the hypothetical means. (A) Less than 0.40 (B) At least 0.40, but less than 0.60 (C) At least 0.60, but less than 0.80 (D) At least 0.80, but less than 1.00 (E) At least 1.00
2.31 (4, 5/07, Q.11) (2.5 points) Three policyholders have the following claims experience over three months: Policyholder Month 1 Month 2 Month 3 Mean Variance I 4 6 5 5 1 II 8 11 8 9 3 III 5 7 6 6 1 Nonparametric empirical Bayes estimation is used to estimate the credibility premium in Month 4. Calculate the credibility factor Z. (A) 0.57 (B) 0.68 (C) 0.80 (D) 0.87 (E) 0.95
Solutions to Problems: 2.1. C. EPV = average of the sample variances for each class = 10.833. VHM = Sample Variance of the Means - EPV/(# Years) = 53.083 - 10.833/2 = 47.667. Group
Loss in Year 1
Loss in Year 2
Mean
Sample Variance
1 2 3
18 25 16
20 30 10
19 27.5 13
2 12.5 18
19.833 53.083
10.833
Average Sample Variance
K = EPV/VHM = 10.833/47.667 = .227. For two years of data, Z = 2/(2 + .227) = 90%. 2.2. E. Estimated EPV = average of the sample variances for each policyholder = 1 7
7
10
∑ ∑ (Xij - X i)2 = (1/7)(1/9)(253) = 4.016. i=1
1 9
j=1
Estimated VHM = sample variance of the X i - EPV/(number of years) = 7
(1/6)
∑ (Xi - X )2
- (4.016/10) = 17/6 - 0.4016 = 2.432.
i=1
K = EPV / VHM = 4.016 / 2.432 = 1.651. With 10 years of data, Z = 10/(10+K) = 10/11.651 = 85.8%. The estimate for policyholder #1 is: Z(observed mean for #1) + (1 - Z)( observed overall mean) = (0.858) X 1 + (1 - 0.858) X = (0.858)(13.1) + (0.142)(10.3) = 12.7. 2.3. C. The mean for Phil is 852.17. The second moment for Phil is 822,395. The sample variance for Phil is: (100/99)(822,395 - 852.172 ) = 97,173. The mean for Sylvia is 901.67. The second moment for Sylvia is 903,724. The sample variance for Sylvia is: (100/99)(903,724 - 901.672 ) = 91,632. EPV = (97,173 + 91,632)/2 = 94,403. Overall Mean = (85217 + 90167)/200 = 876.92. Sample Variance of their means is: {(852.17 - 876.92)2 + (901.67 - 876.92)2 } / (2 -1) = 1225.1. Estimated VHM = 1225.1 - 94,403/100 = 281.1. K = EPV/VHM = 94,403/281.1 = 335.8. Z = 100/(100 + 335.8) = 22.9%. Estimated lifetime for Philʼs bulbs is: (0.229)(852.17) + (0.771)(876.92) = 871.3. Comment: Estimate for Sylviaʼs bulbs is: (0.229)(901.67) + (0.771)(876.92) = 882.6.
2.4. A. Bill has a mean time of (45 + 46 + 39 + 42 + 43)/5 = 43. Bill has a sample variance of: {(45 - 43)2 + (46 - 43)2 + (39 - 43)2 + (42 - 43)2 + (43 - 43)2 } / (5 - 1) =7.5. Hi has a mean time of (53 + 50 + 51 + 50 + 46)/5 = 50. Hi has a sample variance of: {(53 - 50)2 + (50 - 50)2 + (51 - 50)2 + (50 - 50)2 + (46 - 50)2 } / (5 - 1) =6.5. The mean time for both weeks is: (43 + 50)/2 = 46.5. The estimated EPV = (7.5 + 6.5)/2 = 7. The estimated VHM is: {(43 - 46.5)2 + (50 - 46.5)2 } / (2 - 1) - EPV/5 = 23.1. K = 7/23.1 = 0.30. 2.5. E. Z = 5/5.30 = 94.3%. The estimate of Hiʼs future commute time is: (0.943)(50) + (1 - 0.943)(46.5) = 49.8 minutes. Comment: The concept is that part of the variation in commute times is “random”; i.e., due to variations in the day of the week, the time of day, weather, delays, etc. Only a portion of the variation is due to who is driving and which car they are driving. Thus we supplement the limited data from Hi driving himself, with some data from Bill over the same route. Note that the estimate could be rewritten as: (0.943)(50) + (1 - 0.943)(50 + 43)/2 = (0.943 + 0.057/2)(50) + (0.057/2)(43) = (97.1%)(Hiʼs mean) + (2.9%)(Billʼs mean). We give more weight to Hiʼs mean because we assume it is more relevant to predicting the future commuting time when Hi drives, (and the two data sources have the same volume of data.) If instead, Bill were driving Hi to work during the third week, then the estimate would be: (0.943)(43) + (1 - 0.943)(46.5) = 43.2. 2.6. E. First put all of the costs on the year 2002 level by adjusting for inflation. For example: (1.052 )(500) = 551. Policyholder
2000
2001
Mean
Sample Variance
Gilbert Sullivan
551 441
210 945
380 693
58,140 127,008
537 48828
92,574
Average Sample Variance
EPV = 92,574. Estimated VHM = 48828 - (92574/2) = 2541. K = 92,574/2541 = 36.4. Z = 2/(2 + 36.4) = 5.2%. Estimate for Gilbert is: (380)(0.052) + (537)(0.948) = 529.
2.7. B. One takes the average of the given sample variances for each individual; the estimate of the EPV is: 18,841. Alice Asok Dilbert Howard Wally
1
2
3
4
5
6
Mean
Sample Variance
138 112 135 91 467
73 103 155 206 133
320 129 121 109 265
102 93 123 211 193
782 104 77 116 118
270 171 139 81 89
280.83 118.67 125.00 135.67 210.83
69,679 802 704 3,341 19,679
174.20 4925.52
18,841
Mean Sample Variance
2.8. D. One takes the sample variance of the 5 individual means: {(281-174)2 + (119-174)2 + (125-174)2 + (136-174)2 + (211-174)2 } / (5 - 1) = 4926. The estimate of the VHM is then: 4926 - EPV/(# of years) = 4926 - 18841/6 = 1786. 2.9. C. K = EPV/VHM = 18841/1786 = 10.5. For 6 years of data Z = 6/(6 + 10.5) = 36.4%. Comment: Since they each have 6 years of data, each individualʼs data is given the same credibility. (There are not differing numbers of exposures per year.) 2.10. D. The estimate of Howardʼs future annual costs is: (135.67)(0.364) + (174.20)(1 - 0.364) = 160. Comment: The estimates of each individualʼs future annual costs are: Mean
Estimate
Alice Asok Dilbert Howard Wally
280.83 118.67 125.00 135.67 210.83
213 154 156 160 188
Mean
174.20
174
2.11. C. EPV = 158.33. VHM = 128.70 - (158.33/3) = 75.9. Policyholder
1
Year 2
1 2 3
85 60 80
50 55 95
Average Sample Variance
3
Mean
Sample Variance
65 70 75
66.67 61.67 83.33
308.33 58.33 108.33
70.56 128.70
158.33
K = 158.33/75.9 = 2.09. Z = 3/(3 + 2.09) = 59.0%. The mean for policyholder 1 is 66.67 and the overall mean is 70.56. Thus the estimate for policyholder 1 for year 4 is: (0.590)(66.67) + (0.410)(70.56) = 68.26.
2.12. B. The estimate for policyholder 2 for year 4 is: (0.590)(61.67) + (0.410)(70.56) = 65.31. 2.13. E. The estimate for policyholder 3 for year 4 is: (0.590)(83.33) + (0.410)(70.56) = 78.09. 2.14. C. EPV = average of the sample variances for each class = 81.833. VHM = Sample Variance of the Means - EPV/(# Years) = 56.889 - 81.833/3 = 29.611. Class
First Survival Time
Second Survival Time
Third Survival Time
Mean
Sample Variance
A B
17 12
32 20
39 24
29.333 18.667
126.333 37.333
24.000 56.889
81.833
Average Sample Variance
Estimated K = 81.833/29.611 = 2.76. Z = 3 / (3 + 2.76) = 0.521. Comment: Similar to 4, 11/00, Q.16. 2.15. E. Insured #1 has sample mean of 25, and sample variance of 0. Insured #2 has sample mean of 28, and sample variance of 21. EPV = (0 + 21)/2 = 10.5. VHM = {(25 - 26.5)2 + (28 - 26.5)2 }/(2 - 1) - EPV/3 = 1. K = EPV/VHM = 10.5. Z = 3/(3 + K) = 22.2%. Estimate for Insured #1 is: (22.2%)(25) + (1 - 22.2%)(26.5) = 26.17. 2.16. A. EPV = (0.25 + 1.583)/2 = 0.917.
A B Mean Sample Variance
1
2
3
4
Mean
1 1
0 3
0 0
0 1
0.250 1.250 0.750 0.500
Sample Variance 0.250 1.583 0.917
VHM = (the sample variance of the means) - EPV / (# years of data) = 0.500 - 0.917/4 = 0.271. K = EPV/VHM = 0.917/0.271 = 3.38. Z = 4 / (4 + K) = 0.542. Estimated frequency for first vehicle: (0.25)(0.542) + (1 - 0.542)(0.75) = 0.479. Comment: Similar to Exercise 12 in “Topics in Credibility” by Dean.
2.17. D. EPV = (34.333 + 34.333)/2 = 34.333.
A B
1
2
3
Mean
25 15
34 26
23 17
27.333 19.333
Mean Sample Variance
23.333 32.000
Sample Variance 34.333 34.333 34.333
VHM = (the sample variance of the means) - EPV / (# years of data) = 32 - 34.333/3 = 20.56. K = EPV/VHM = 34.33/20.56 = 1.67. Z = 3/(3 + K) = 0.642. Comment: Similar to Exercise 13 in “Topics in Credibility” by Dean. The credibility assigned to the data from each insured is the same. Since the exposures are the same for each policy and for each year, we can use the formulas that do not involve exposures. 2.18. D. EPV = 416.83. Estimated VHM = 162 - (416.83/3) = 23.06. Route
1
Year 2
1 2
123 80
89 112
3
Mean
Sample Variance
101 67
104.33 86.33
297.33 536.33
95.33 162.00
416.83
Average Sample Variance
K = 416.83/23.06 = 18.1. Z = 3/(3 + 18.1) = 14.2%. The estimate for route 1 is: (0.142)(86.33) + (0.858)(95.33) = 94.05. 2.19. E. EPV = average of the sample variances = 581.9. VHM = sample variance of the means - EPV/(# years ) = 819.4 - 581.9/3 = 625.4. House
1
Year 2
3
Mean
Sample Variance
Gryffindor Hufflepuff Ravenclaw Slytherin
421 442 405 461
450 403 398 487
428 382 456 475
433.0 409.0 419.7 474.3
229.0 927.0 1002.3 169.3
434.0 819.4
581.9
Average Sample Variance
K = EPV/VHM = 581.9/625.4 = 0.93. For three years of data, Z = 3/(3 + 0.93) = 76.3%. The observed annual points for Slytherin is 474.3 and the overall mean is 434.0. The estimated future annual points for Slytherin is: (76.3%)(474.3) + (1 - 76.3%)(434.0) = 464.7.
2.20. B. For Trollhouse, the mean is 10.211, and the second moment is 114.681. The sample variance is: (1000/999)(114.681 - 10.2112 ) = 10.43. For Little Gloriaʼs, the mean is 10.593, and the second moment is 123.397. The sample variance is: (1000/999)(123.397 - 10.5932 ) = 11.20. For Chip Off The Old Block, the mean is 10.363, and the second moment is 117,531. The sample variance is: (1000/999)(117.531 - 10.3632 ) = 10.15. For Hello Mr. Chips, the mean is 11.312, and the second moment is 139.382. The sample variance is: (1000/999)(139.382 - 11.3122 ) = 11.43. Estimated EPV = (10.43 + 11.20 + 10.15 + 11.43)/4 = 10.80. 2.21. D. Overall Mean = (10.211 + 10.593 + 10.363 + 11.312)/4 = 10.620. Sample Variance of their means is: {(10.211 - 10.62)2 + (10.593 - 10.62)2 + (10.363 - 10.62)2 + (11.312 - 10.62)2 } / (4 -1) = 0.240. Estimated VHM = 0.240 - 10.80/1000 = 0.229. K = EPV/VHM = 10.80/0.229 = 47.2. Z = 1000/(1000 + 47.2) = 95.5%. Estimated average number of chips per cookie for Hello Mr. Chips is: (0.955)(11.312) + (0.045)(10.62) = 11.28. 2.22. A. EPV = (0.2 + 0.8)/2 = 0.5.
A B Mean Sample Variance
1
2
3
4
5
Mean
0 1
0 0
1 0
0 0
0 2
0.20 0.60 0.400 0.080
Sample Variance 0.20 0.80 0.500
VHM = (the sample variance of the means) - EPV / (# years of data) = 0.08 - 0.5/5 = -0.02. Since the estimated VHM is negative, set Z = 0, equivalent to assuming the VHM is extremely small. Overall mean = 4/10 = 0.4. Estimated frequency for insured B: (0)(3/5) + (1 - 0)(0.4) = 0.4. (5)(0.4) = 2.0.
2.23. C. First, inflate all of the losses up to the year 5 level. For example, (1.054 )(3500) = 4254. Insured
1
Year 2
1 2
4254 3282
3589 3936
3
Mean
Sample Variance
3638 3197
3827 3472
137,502 163,432
3649 63,145
150,467
Average Sample Variance
Estimated EPV = 150,467. Estimated VHM = 63,145 - (150,467/3) = 12,989. K = 150,467/12,989 = 11.6. Z = 3 / (3 + 11.6) = 20.5%. The estimate for insured 2 is: (0.205)(3472) + (1 - 0.205)(3649) = 3613. 2.24. C. 0.1068 = Z = 3 / (3 + K). ⇒ K = 25.09. Let x be the losses observed for insured B in Year 1. For Insured A, mean = 36, sample variance = (22 + 72 + 52 )/2 = 39. For Insured B, mean = (x + x + 8 + x + 10)/3 = x + 6, sample variance = (62 + 22 + 42 )/2 = 28. EPV = average of the sample variances for each insured = (39 + 28)/2 = 33.5. 25.09 = K = EPV/VHM. ⇒ VHM = 33.5/25.09 = 1.335. 1.335 = VHM = Sample Variance of the Means - EPV/(# Years).
⇒ Sample Variance of the Means = 1.335 + 33.5/3 = 12.50. Overall mean = (36 + x + 6)/2 = 21 + x/2. 12.50 = Sample Variance of the Means = (15 - x/2)2 + (x/2 - 15)2 .
⇒ (x/2 - 15)2 = 6.25. ⇒ x/2 = 15 ± 2.5. ⇒ x = 35 or 25. However, we are told that in Year 3, the losses for Insured B are greater than those for Insured A. 35 + 10 = 45 > 41. 25 + 10 = 35 < 41. Therefore, the losses in Year 1 for Insured B are 35. Comment: Here is the calculation using the missing inputs: Insured
Loss in Year 1
Loss in Year 2
Loss in Year 3
Mean
Sample Variance
A B
38 35
29 43
41 45
36 41
39 28
38.500 12.500
33.500
Average Sample Variance
EPV = 33.5. VHM = 12.5 - 33.5/3 = 1.333. K = EPV/VHM = 33.5/1.333 = 25.1. For three years of data, Z = 3 / (3 + 25.1) = 10.68%.
2.25. C. Variance of the Hypothetical Means = Total Variance - EPV = 16 - (4 + 5 + 6) / 3 = 16 - 5 = 11. Comment: The estimated EPV is the weighted average of the individual estimated process variances; in this case the classes are equally likely so they each have probability 1/3. The estimated total variance is the sample variance of the losses, 16. 2.26. Group A has a mean sales of (10 + 20)/2 = 15. Group A has a sample variance of: {(10 - 15)2 + (20 - 15)2 } / (2 - 1) = 50. Group B has a mean sales of (30 + 40)/2 = 35. Group B has a sample variance of: {(30 - 35)2 + (40 - 35)2 } / (2 - 1) = 50. Group
1
2
Mean
Sample Variance
A B
10 30
20 40
15 35
50 50
25 200
50
Mean Sample Variance
The estimated EPV = (50 + 50)/2 = 50. The mean for both groups is: (15+35)/2 = 25. The estimated VHM is: {(15 - 25)2 +(35 - 25)2 } /(2 - 1) - EPV/2 = 200 - 50/2 = 175. K = EPV / VHM = 50/175 = 2/7. For two days of data, Z = 2/(2 + 2/7) = 7/8. Comment: This question is not treating each day as if it had a different number of exposures. In fact once you divide the days into two groups, one ignores the number of advertisements per day. The estimate of future sales on days with two advertisements (Group A) is: (7/8)(15) + (1/8)(25) = 16.25. The estimate of future sales on days with three advertisements (Group B) is: (7/8)(35) + (1/8)(25) = 33.25. 2.27. D. Estimated EPV = average of the sample variances for each policyholder = 1 4
4
7
∑ ∑ (Xij - X i)2 = (1/4)(1/6)(33.60) = 1.4. i=1
1 6
j=1
Estimated VHM = (sample variance of the X i ) - EPV / (number of years) = 1 3
4
∑ (Xi - X )2 - (1.4/7) = (3.30/3) - 0.2 = 0.9. i=1
K = EPV / VHM = 1.4/0.9 = 1.556. With 7 years of data, Bühlmann credibility factor = Z = 7 / (7 + K) = 7/ 8.556 = 81.8%.
2.28. A. EPV = average of the sample variances for each class = 17. VHM = Sample Variance of the Means - EPV/(# Years) = 2 - 17/2 = -6.5. Class
First Survival Time
Second Survival Time
Mean
Sample Variance
A B
1 2
9 4
5 3
32 2
4.000 2.000
17.000
Average Sample Variance
Since the estimate of the VHM is negative, one treats it as if VHM = 0, which implies zero credibility. Z = 0. Comment: Since Z = 0, the estimated survival time for each class is equal to the overall mean of 4. The Variance of Hypothetical Means is never negative, just as with any other variance. 2.29. C. EPV = average of the sample variances for each class = 3475. VHM = Sample Variance of the Means - EPV/(# Years) = 1250 - 3475/4 = 381. Policyholder
1
Year 2
3
4
Mean
Sample Variance
X Y
730 655
800 650
650 625
700 750
720 670
3,933 3,017
695 1250
3,475
Average Sample Variance
K = EPV/VHM = 3475/381 = 9.1. For four years of data, Z = 4 / (4 + 9.1) = 30.5%. Estimate for Policyholder Y is: (0.305)(670) + (1 - 0.305)(695) = 687. Comment: The answer must be between the mean for policyholder Y of 670 and the overall mean of 695. Thus only choice C and perhaps choice B are possible. 2.30. C. X1 = 3. Sample Variance for X is: {(2 - 3)2 + (3 - 3)2 + (3 - 3)2 + (4 - 3)2 } / (4 - 1) = 2/3. X2 = 5. Sample Variance for Y is: {(5 - 5)2 + (5 - 5)2 + (4 - 5)2 + (6 - 5)2 } / (4 - 1) = 2/3. X 3 = 4. Sample Variance for Z is: {(5 - 4)2 + (5 - 4)2 + (3 - 4)2 + (3 - 4)2 } /( 4 - 1) = 4/3. Estimated EPV = (2/3 + 2/3 + 4/3) / 3 = 8/9. X = (3 + 5 + 4)/3 = 4. Estimated VHM = {(3 - 4)2 + (5 - 4)2 + (4 - 4)2 } / (3 - 1) - EPV/4 = 1 - 2/9 = 7/9 = 0.778. Comment: K = EPV/VHM = (8/9)/(7/9) = 8/7. Z = 4/(4 + K) = 7/9. 2.31. D. Estimated EPV = (1 + 3 + 1)/3 = 5/3. Overall Mean is: (5 + 9 + 6)/3 = 20/3. Estimated VHM = {(5 - 20/3)2 + (9 - 20/3)2 + (6 - 20/3)2 } / (3 - 1) - (5/3)/3 = 3.778. K = EPV/VHM = (5/3)/3.778 = 0.441. Z = 3 / (3 + K) = 87.2%. Comment: They gave you the sample mean and sample variance for each policyholder.
Section 3, Differing Numbers of Years, No Varying Exposures The method can be generalized in order to deal with differing numbers of years of data.24 Rather than take an average of each insureds sample variance, one takes a weighted average in order to estimate the EPV. One uses weights equal to the number of degrees of freedom for each class, that is the number of years of data for that class minus 1. Y i = the number of years of data for class i Yi
∑ (Xit - Xi )2 si2 = t=1
= the usual sample variance for the data from a single class i.25
Yi - 1
C
∑(Yi
C Yi
-
1)si2
∑∑ (Xit - Xi )2
EPV = weighted average of these sample variances = i=1C
= i=1 t=1 C ∑(Yi - 1) ∑(Yi - 1) i=1
.
i=1
Then besides using the altered estimate of the EPV, the estimate of the VHM is more complicated: C
Let Π =
∑Yi i=1
C ∑ Yi2 i= 1 . C ∑ Yi i= 1
C
∑Yi ( Xi
- X )2 - (C -1)EPV
VHM = i=1
Π Yi
∑ Xit Where X i = t=1 Yi
.
C Yi
∑∑ Xit = average of the data for class i, and X = i=1Ct=1
= overall average.
∑Yi i=1
24
These formulas are special cases of the formulas presented in the next sections, that deal with situations with varying exposures. In those more general formulas, set the number of exposures equal to 1 whenever the insured has data for that year, in order to get the formulas shown here. 25 Or from a single insured i, depending on the data set being analyzed.
Exercise: Verify that the above equation for the VHM reduces to the prior one when each class has the same number of years of data Y.
[Solution: With Yi = Y for every class:
Π = Σ Yi - (Σ Yi²)/(Σ Yi) = CY - CY²/(CY) = CY - Y = Y(C - 1).
VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π = {Y Σ (X̄i - X̄)² - (C - 1)EPV} / {Y(C - 1)}
= Σ (X̄i - X̄)² / (C - 1) - EPV/Y = (sample variance of the class means) - EPV / (# of years).]

Exercise: There are 3 drivers in a particular rating class. For each year in which they were licensed to drive, we have the number of claims for each driver. Hugh was licensed for the last 3 years, Dewey was licensed for the last 5 years, while Louis was licensed for the last 4 years. Use nonparametric empirical Bayesian estimation to estimate the EPV.

            Year 1    Year 2    Year 3    Year 4    Year 5
Hugh          -         -         0         0         0
Dewey         0         1         0         0         0
Louis         -         0         2         1         0
[Solution: The EPV = weighted average of the sample variances for each driver, with weights equal to the number of years minus one
= {(0)(2) + (0.2)(4) + (0.9167)(3)} / (2 + 4 + 3) = 3.55 / 9 = 0.394.

            Year 1    Year 2    Year 3    Year 4    Year 5    Mean    Sample Variance
Hugh          -         -         0         0         0       0.00        0.0000
Dewey         0         1         0         0         0       0.20        0.2000
Louis         -         0         2         1         0       0.75        0.9167

Comment: Note that in computing the numerator of the EPV, one could save a little time by just computing the numerator of each sample variance. For example, for Louis this would be:
(0 - 0.75)² + (2 - 0.75)² + (1 - 0.75)² + (0 - 0.75)² = 2.75 = (4 - 1)(0.9167).]
Exercise: For this same data, use nonparametric empirical Bayesian estimation to estimate the VHM.
[Solution: Π = Σ Yi - (Σ Yi²)/(Σ Yi) = (3 + 5 + 4) - (3² + 5² + 4²)/(3 + 5 + 4) = 12 - 50/12 = 7.83.
X̄ = overall average of the observed data = 4 claims / 12 years = 1/3.
The means for the three individual drivers are: 0, 0.20, and 0.75.
VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π
= {(3)(0 - 0.333)² + (5)(0.20 - 0.333)² + (4)(0.75 - 0.333)² - (2)(0.394)} / 7.83 = 0.042.]

Exercise: For the above data, use nonparametric empirical Bayesian estimation to estimate the future claim frequency for each driver.
[Solution: K = EPV / VHM = 0.394/0.042 = 9.4. The numbers of years of data are 3, 5, and 4.
Thus the credibilities are: 3/12.4 = 24%, 5/14.4 = 35%, and 4/13.4 = 30%.
The estimates of future claim frequency are:
Hugh: (24%)(0) + (76%)(0.333) = 0.25.
Dewey: (35%)(0.2) + (65%)(0.333) = 0.29.
Louis: (30%)(0.75) + (70%)(0.333) = 0.46.]
Problems:

Use the following information for the next three questions:
Past claims data on a portfolio of policyholders are given below:

Policyholder    Year 1    Year 2    Year 3
     1            85        50        65
     2            60        55        70
     3            -         95        75

3.1 (3 points) What is the Bühlmann credibility premium for policyholder 1 for year 4?
A. Less than 62
B. At least 62 but less than 67
C. At least 67 but less than 72
D. At least 72 but less than 77
E. At least 77
3.2 (2 points) What is the Bühlmann credibility premium for policyholder 2 for year 4?
A. Less than 62
B. At least 62 but less than 67
C. At least 67 but less than 72
D. At least 72 but less than 77
E. At least 77
3.3 (3 points) What is the Bühlmann credibility premium for policyholder 3 for year 4?
A. Less than 62
B. At least 62 but less than 67
C. At least 67 but less than 72
D. At least 72 but less than 77
E. At least 77
3.4 (4 points) Use the following information:
• Phil and Sylvia are competitors in the light bulb business.
• You test light bulbs from each of them.
• Let Xi be the lifetime for light bulb i.
• For Philʼs 100 light bulbs: Σ Xi = 85,217 and Σ Xi² = 82,239,500.
• For Sylviaʼs 200 light bulbs: Σ Xi = 178,102 and Σ Xi² = 165,218,000.
Use Nonparametric Empirical Bayes estimation in order to estimate the average lifetime of a randomly selected light bulb from Sylvia.
A. Less than 880
B. At least 880 but less than 882
C. At least 882 but less than 884
D. At least 884 but less than 886
E. At least 886

3.5 (5 points) For each of 5 employees, displayed below are the dollars of health insurance claims made:

            Year 1    Year 2    Year 3    Year 4    Year 5    Year 6     Sum
Alice         138        73       320       102       782       270     1685
Asok           -         -        129        93       104       171      497
Dilbert       135       155       121       123        77       139      750
Howard         91       206       109       211       116        81      814
Wally         467       133       265       193       118        89     1265
Sum           831       567       944       722      1197       750     5011

Assume that each year of data has been brought to the current cost level.
There was no data from Asok for the first two years. (You may assume that Asok was not employed at this company during years 1 and 2.)
What is the estimated future annual cost for Asok?
A. Less than 150
B. At least 150 but less than 155
C. At least 155 but less than 160
D. At least 160 but less than 165
E. At least 165
Use the following information for the next two questions:
Survival times are available for five insureds, three from Class A and two from Class B.
The three from Class A died at times t = 17, t = 32, and t = 39.
The two from Class B died at times t = 12 and t = 24.
Nonparametric Empirical Bayes estimation is used to estimate the mean survival time for each class. Unbiased estimators of the expected value of the process variance and the variance of the hypothetical means are used.
3.6 (2 points) Estimate the survival time for an insured picked at random from Class A.
A. Less than 22
B. At least 22 but less than 24
C. At least 24 but less than 26
D. At least 26 but less than 28
E. At least 28
3.7 (2 points) Estimate the survival time for an insured picked at random from Class B.
A. Less than 22
B. At least 22 but less than 24
C. At least 24 but less than 26
D. At least 26 but less than 28
E. At least 28
3.8 (4 points) You have the following data for the number of medical malpractice claims filed against each of five pediatricians.

                            Year 1    Year 2    Year 3    Year 4    Mean    Sample Variance
Dr. George Burns               1         3         2         2        2         0.667
Dr. Flo Schotte                -         1         0         2        1         1.000
Dr. Major Payne                -         -         1         1        1         0.000
Dr. Vera Sharpe-Needles        2         4         1         1        2         2.000
Dr. Hy Fever                   0         0         2         0       0.5        1.000

Use Nonparametric Empirical Bayes estimation in order to estimate the expected number of claims from each doctor in year 5.
3.9 (3 points) Past claims data on four insureds are given below:

Insured    Year 1    Year 2
   1         75        60
   2         60        55
   3         -         85
   4         50        40

Use Nonparametric Empirical Bayes estimation in order to estimate the Bühlmann credibility premium for insured 3.
A. Less than 74
B. At least 74 but less than 76
C. At least 76 but less than 78
D. At least 78 but less than 80
E. At least 80
Solutions to Problems:

3.1. C., 3.2. B., 3.3. D.
Estimated EPV = Σ(Yi - 1)si² / Σ(Yi - 1) = {(3 - 1)(308.33) + (3 - 1)(58.33) + (2 - 1)(200)} / (2 + 2 + 1) = 186.67.

Policyholder    Year 1    Year 2    Year 3    Total    Mean     Sample Variance
     1            85        50        65       200     66.67        308.33
     2            60        55        70       185     61.67         58.33
     3            -         95        75       170     85.00        200.00

Π = Σ Yi - (Σ Yi²)/(Σ Yi) = 8 - (3² + 3² + 2²)/8 = 5.25.
X̄ = (200 + 185 + 170)/8 = 69.38.
Estimated VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π
= {(3)(66.67 - 69.38)² + (3)(61.67 - 69.38)² + (2)(85 - 69.38)² - (186.67)(3 - 1)} / 5.25
= (688.33 - 373.34) / 5.25 = 60.0.
K = 186.67/60.0 = 3.11.
Policyholder #1 has three years of data, so Z = 3/(3 + 3.11) = 49.1%.
Thus the estimate for policyholder 1 for year 4 is: (0.491)(66.67) + (0.509)(69.38) = 68.05.
Policyholder #2 has three years of data, so Z = 3/(3 + 3.11) = 49.1%.
The estimate for policyholder 2 for year 4 is: (0.491)(61.67) + (0.509)(69.38) = 65.59.
Policyholder #3 has two years of data, so Z = 2/(2 + 3.11) = 39.1%.
The estimate for policyholder 3 for year 4 is: (0.391)(85.00) + (0.609)(69.38) = 75.49.
Comment: The estimate of 68.05 for policyholder #1 differs from the estimate of 68.28 in the previous section for a similar setup, but in which each policyholder has 3 years of data.
3.4. D. The mean for Phil is 852.17. The second moment for Phil is 822,395.
The sample variance for Phil is: (100/99)(822,395 - 852.17²) = 97,173.
The mean for Sylvia is 178,102/200 = 890.51. The second moment for Sylvia is 165,218,000/200 = 826,090.
The sample variance for Sylvia is: (200/199)(826,090 - 890.51²) = 33,248.
EPV = Σ(Yi - 1)si² / Σ(Yi - 1) = {(99)(97,173) + (199)(33,248)} / (99 + 199) = 54,485.
Overall Mean = (85,217 + 178,102)/300 = 877.73.
Π = Σ Yi - (Σ Yi²)/(Σ Yi) = 300 - (100² + 200²)/300 = 133.33.
VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π
= {(100)(852.17 - 877.73)² + (200)(890.51 - 877.73)² - (2 - 1)(54,485)} / 133.33 = 326.4.
K = EPV/VHM = 54,485/326.4 = 166.9. For Sylvia, Z = 200/(200 + 166.9) = 54.5%.
Estimate for Sylviaʼs bulbs is: (0.545)(890.51) + (0.455)(877.73) = 884.7.
Comment: For Phil, Z = 100/(100 + 166.9) = 37.5%.
Estimated lifetime for Philʼs bulbs is: (0.375)(852.17) + (0.625)(877.73) = 868.1.
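As an aside, the arithmetic of this solution is easy to script; here is a minimal check in Python (the helper name sample_stats is my own, not from the text), using the fact that the sample variance can be recovered from a count, a sum, and a sum of squares.

```python
n_p, sum_p, sumsq_p = 100, 85_217, 82_239_500       # Phil's bulbs
n_s, sum_s, sumsq_s = 200, 178_102, 165_218_000     # Sylvia's bulbs

def sample_stats(n, total, total_sq):
    """Mean and sample variance from a count, a sum, and a sum of squares."""
    mean = total / n
    var = (n / (n - 1)) * (total_sq / n - mean ** 2)
    return mean, var

mean_p, var_p = sample_stats(n_p, sum_p, sumsq_p)    # 852.17 and about 97,173
mean_s, var_s = sample_stats(n_s, sum_s, sumsq_s)    # 890.51 and about 33,248

epv = ((n_p - 1) * var_p + (n_s - 1) * var_s) / (n_p - 1 + n_s - 1)   # about 54,485
overall = (sum_p + sum_s) / (n_p + n_s)                               # 877.73
pi = (n_p + n_s) - (n_p ** 2 + n_s ** 2) / (n_p + n_s)                # 133.33
vhm = (n_p * (mean_p - overall) ** 2
       + n_s * (mean_s - overall) ** 2 - epv) / pi                    # about 326
z_s = n_s / (n_s + epv / vhm)                                         # about 0.545
print(round(z_s * mean_s + (1 - z_s) * overall, 1))                   # about 884.7
```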
3.5. E. One calculates the sample variance for each individual. Then one takes a weighted average; each personʼs sample variance gets a weight proportional to its number of degrees of freedom = number of years - 1. So in this case the weights are: 5/23, 3/23, 5/23, 5/23, 5/23. The estimate of the EPV is: 20,461.

            Year 1    Year 2    Year 3    Year 4    Year 5    Year 6    Number of Years - 1    Process Variance
Alice         138        73       320       102       782       270              5                  69,679
Asok           -         -        129        93       104       171              3                   1,198
Dilbert       135       155       121       123        77       139              5                     704
Howard         91       206       109       211       116        81              5                   3,341
Wally         467       133       265       193       118        89              5                  19,679
Sum                                                                              23       Weighted Average 20,461

Π = Σ Yi - (Σ Yi²)/(Σ Yi) = (6 + 4 + 6 + 6 + 6) - (6² + 4² + 6² + 6² + 6²)/28 = 22.29.
X̄ = overall average of the observed data = 5011 / (28 man-years) = 178.96.
The means for the five individuals are: 280.83, 124.25, 125, 135.67, and 210.83.
VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π
= {(6)(280.83 - 178.96)² + (4)(124.25 - 178.96)² + (6)(125 - 178.96)² + (6)(135.67 - 178.96)² + (6)(210.83 - 178.96)² - (20,461)(4)} / 22.29 = 1220.
K = 20,461/1220 = 16.8.
The credibilities are: 6/22.8 = 26.3%, 4/20.8 = 19.2%, 6/22.8 = 26.3%, 6/22.8 = 26.3%, and 6/22.8 = 26.3%.
The estimate of Asokʼs future annual cost is: (0.192)(124.25) + (1 - 0.192)(178.96) = 168.
Comment: The estimates of each individualʼs future annual costs are:

             Alice     Asok     Dilbert    Howard    Wally
Mean        280.83    124.25     125.00    135.67   210.83
Estimate       206       168        165       168      187
3.6. D. & 3.7. B.
Estimated EPV = Σ(Yi - 1)si² / Σ(Yi - 1) = {(3 - 1)(126.333) + (2 - 1)(72)} / (2 + 1) = 108.22.

Class    First Survival Time    Second Survival Time    Third Survival Time     Mean     Sample Variance
A                 17                      32                      39           29.333        126.333
B                 12                      24                       -           18.000         72.000

Π = Σ Yi - (Σ Yi²)/(Σ Yi) = 5 - (3² + 2²)/5 = 2.4.
X̄ = 124 / 5 = 24.8.
Estimated VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π
= {(3)(29.333 - 24.8)² + (2)(18 - 24.8)² - (108.22)(2 - 1)} / 2.4 = 19.13.
K = 108.22/19.13 = 5.66.
Class A has three years of data, so Z = 3/(3 + 5.66) = 34.6%.
Thus the estimated survival time for Class A is: (0.346)(29.333) + (0.654)(24.8) = 26.4.
Class B has two years of data, so Z = 2/(2 + 5.66) = 26.1%.
Thus the estimated survival time for Class B is: (0.261)(18) + (0.739)(24.8) = 23.0.
3.8. EPV = Σ(Yi - 1)si² / Σ(Yi - 1) = {(0.667)(3) + (1)(2) + (0)(1) + (2)(3) + (1)(3)} / (3 + 2 + 1 + 3 + 3) = 13/12.
Π = Σ Yi - (Σ Yi²)/(Σ Yi) = (4 + 3 + 2 + 4 + 4) - (4² + 3² + 2² + 4² + 4²)/17 = 13.41.
X̄ = 23/17 = 1.353.
Σ Yi (X̄i - X̄)² = (4)(2 - 1.353)² + (3)(1 - 1.353)² + (2)(1 - 1.353)² + (4)(2 - 1.353)² + (4)(0.5 - 1.353)² = 6.882.
VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π = {6.882 - (13/12)(5 - 1)} / 13.41 = 0.190.
K = EPV/VHM = (13/12)/0.190 = 5.70.

                           Mean    Years of Data      Z       Estimated Number of Claims
Dr. Burns                    2           4          41.2%               1.62
Dr. Schotte                  1           3          34.5%               1.23
Dr. Payne                    1           2          26.0%               1.26
Dr. Sharpe-Needles           2           4          41.2%               1.62
Dr. Fever                   0.5          4          41.2%               1.00

For example, (41.2%)(2) + (1 - 41.2%)(1.353) = 1.62.
3.9. D. Estimated EPV = Σ(Yi - 1)si² / Σ(Yi - 1)
= {(2 - 1)(112.5) + (2 - 1)(12.5) + (1 - 1)(N.A.) + (2 - 1)(50)} / (1 + 1 + 1) = 58.33.

Insured    Year 1    Year 2    Total    Mean     Sample Variance
   1         75        60       135     67.50        112.50
   2         60        55       115     57.50         12.50
   3         -         85        85     85.00          N/A
   4         50        40        90     45.00         50.00

Π = Σ Yi - (Σ Yi²)/(Σ Yi) = 7 - (2² + 2² + 1² + 2²)/7 = 5.143.
X̄ = (135 + 115 + 85 + 90)/7 = 60.71.
Estimated VHM = {Σ Yi (X̄i - X̄)² - (C - 1)EPV} / Π
= {(2)(67.5 - 60.71)² + (2)(57.5 - 60.71)² + (1)(85 - 60.71)² + (2)(45 - 60.71)² - (58.33)(4 - 1)} / 5.143 = 198.6.
K = 58.33/198.6 = 0.294.
For insured 3 with one year of data, Z = 1/(1 + 0.294) = 77.3%.
Thus the estimate for insured 3 is: (77.3%)(85) + (1 - 77.3%)(60.71) = 79.49.
Comment: Since there is only one year of data from insured number 3, the sample variance is not defined. However, that is okay since its contribution to the estimate of the EPV includes a factor of: 1 - 1 = 0.
Section 4, Varying Exposures (No Differing Numbers of Years)26

In the example discussed previously with three drivers, each year from each driver was one exposure. In this and the other examples in the prior sections, each year or trial of data represented the same volume of data. In many insurance applications there would be varying amounts of data per year for each insured or each class.27 28
The Bühlmann-Straub Method is a generalization of the previous Bühlmann method, in order to deal with those circumstances in which there are different numbers of exposures or premiums by class and/or year. Again we will first deal with the simpler case where each class has the same number of years of data.

A Two Class Example:

Assume you have data for 2 classes over three years and wish to use nonparametric Bayesian Estimation to estimate the future pure premium for each class.

                        Exposures                       Losses
Class              1      2      3    Total       1      2      3    Total
Poultry Farms     41     37     29     107       232     33    237     502
Dairy Farms       58     59     53     170       104     60    151     315

The mean pure premium for Poultry is: 502/107 = 4.69. The mean pure premium for Dairy is: 315/170 = 1.85.

                         Pure Premium
Class               1       2       3     Total
Poultry Farms     5.66    0.89    8.17     4.69
Dairy Farms       1.79    1.02    2.85     1.85

Let Xit = Pure Premium for Class i in year t.
mit = exposures for class i in year t. mi = Σt mit.

26 The situation with varying exposures is more general than the situation without varying exposures.
27 For example in group health insurance, different employers have different numbers of employees, and the number of employees varies over time. In commercial automobile insurance, the number of insured cars in the fleet varies from policyholder to policyholder and over time.
28 A classification is a set of similar insureds that are grouped together for purposes of estimating an average rate to charge them. See for example, “Risk Classification”, Robert Finger in Foundations of Casualty Actuarial Science. The number of exposures varies between classes and over time.
X̄i = Σt mit Xit / mi = weighted average pure premium for class i.29

Then one estimates the process variance of each class as: vi = Σt mit (Xit - X̄i)² / (Y - 1).30

For Poultry the estimated process variance is:
{(41)(5.66 - 4.69)² + (37)(0.89 - 4.69)² + (29)(8.17 - 4.69)²} / (3 - 1) = 462.0.
Similarly, for Dairy the estimated process variance is:
{(58)(1.79 - 1.85)² + (59)(1.02 - 1.85)² + (53)(2.85 - 1.85)²} / (3 - 1) = 46.9.
Then the estimate of the EPV is: (v1 + v2)/2 = (462.0 + 46.9)/2 = 254.

One estimates the VHM as follows:31
Π = m - (Σi mi²)/m = 277 - (107² + 170²)/(107 + 170) = 131.3.
X̄ = overall average = (502 + 315) / (107 + 170) = 2.95.
Σi mi (X̄i - X̄)² = (107)(4.69 - 2.95)² + (170)(1.85 - 2.95)² = 530.
Then the estimated VHM = {Σi mi (X̄i - X̄)² - (C - 1)EPV} / Π = {530 - (2 - 1)(254)} / 131.3 = 2.10.

29 This weighted average pure premium for a class is the sum of the losses for that class divided by the sum of the exposures for that class.
30 This is analogous to a sample variance, however its computation involves exposures and the weighted average. As will be discussed, we are estimating the process variance for a single exposure. The notation is taken from 4, 11/01, Q.30.
31 The special formula for the denominator of the VHM is analogous to that when the number of years varied.
Exercise: Determine the credibility to be used in order to estimate the future pure premium for each class.
[Solution: K = EPV / VHM = 254 / 2.10 = 121.
For the Poultry class: Z = 107 / (107 + 121) = 46.9%. For the Dairy class: Z = 170 / (170 + 121) = 58.4%.]

This is an example of the Bühlmann-Straub Method, one of the Empirical Bayesian techniques. In general the Bühlmann-Straub Method would proceed as follows:
Assume we have Y years of data from each of C classes.
Let Xit be the data (die roll, frequency, severity, pure premium, loss ratio, etc.) observed for class (or risk) i, in year t, for i = 1, ..., C, and t = 1, ..., Y.
Let mit be the measure of size (premiums, exposures, number of die rolls, number of members in a group, etc.) for class (or risk) i, in year t, for i = 1, ..., C, and t = 1, ..., Y.
Let mi = Σt mit = sum of exposures for class i.
Let m = Σi Σt mit = overall sum of exposures.
Let X̄i = Σt mit Xit / mi = weighted average of the data for class i.
Let X̄ = Σi Σt mit Xit / m = overall weighted average of the observed data.
Let vi = Σt mit (Xit - X̄i)² / (Y - 1) = estimated (process) variance of the data for class i.

estimated EPV (for a single exposure of the risk process) = Σi vi / C = Σi Σt mit (Xit - X̄i)² / {C(Y - 1)}.

Let Π = m - (Σi mi²)/m = total exposures “adjusted for degrees of freedom.”32

estimated VHM = {Σi mi (X̄i - X̄)² - EPV (C - 1)} / Π.33

If the estimated VHM ≤ 0, set Z = 0.
K = EPV / VHM = estimated Bühlmann Credibility Parameter.

As always we wish to estimate the EPV and VHM for one draw from the risk process, in other words for one exposure for one year. Each cell usually has more than one exposure, and thus a smaller random fluctuation in the observed average frequency, pure premium, etc., than if it had had only one exposure. Therefore, we multiply (Xit - X̄i)² by mit, the exposures in that cell, in order to increase the random fluctuations back up to the level they would have been at with one exposure. The corresponding adjustment is made in the first term in the numerator of the VHM.34

32 Loss Models does not use any notation or terminology to describe what I have called Π, an intermediate step in the calculation of the estimated VHM.
33 See Equation 20.76 in Loss Models, by Klugman, Panjer, and Willmot, or Part III, Chapter 3, Theorem 20, in Advanced Risk Theory by De Vlyder.
34 These formulas are derived in the next section.
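To make the general recipe concrete, here is a minimal Python sketch of the Bühlmann-Straub estimates described above (the function name buhlmann_straub and the nested-list input format are my own choices, not notation from the text); it assumes every class has the same number of years of data.

```python
def buhlmann_straub(m, x):
    """Nonparametric empirical Bayes (Buhlmann-Straub) estimates.
    m[i][t] = exposures for class i in year t.
    x[i][t] = observation (e.g. pure premium) for class i in year t."""
    C, Y = len(m), len(m[0])
    mi = [sum(row) for row in m]                 # exposures per class
    mtot = sum(mi)                               # overall exposures
    xbar_i = [sum(m[i][t] * x[i][t] for t in range(Y)) / mi[i] for i in range(C)]
    xbar = sum(m[i][t] * x[i][t] for i in range(C) for t in range(Y)) / mtot
    # EPV for a single exposure: average of the per-class process variance estimates.
    v = [sum(m[i][t] * (x[i][t] - xbar_i[i]) ** 2 for t in range(Y)) / (Y - 1)
         for i in range(C)]
    epv = sum(v) / C
    # Total exposures "adjusted for degrees of freedom."
    pi = mtot - sum(w * w for w in mi) / mtot
    vhm = (sum(mi[i] * (xbar_i[i] - xbar) ** 2 for i in range(C)) - epv * (C - 1)) / pi
    if vhm <= 0:
        return epv, 0.0, [0.0] * C               # negative VHM estimate: Z = 0
    k = epv / vhm
    return epv, vhm, [mi[i] / (mi[i] + k) for i in range(C)]
```

Fed the Poultry/Dairy exposures and yearly pure premiums from the two class example, this returns roughly EPV = 254, VHM = 2.1, and credibilities of about 47% and 58%, matching the worked numbers above.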
Preserving the Total Losses:

One could take the credibility weighted average of each classʼs pure premium and the overall mean pure premium.
For the Poultry class, the estimated future pure premium would be: (46.9%)(4.69) + (1 - 46.9%)(2.95) = 3.77.
For the Dairy class, the estimated future pure premium would be: (58.4%)(1.85) + (1 - 58.4%)(2.95) = 2.31.

Class            Exposures    Losses    Pure Premium      Z       Estimated P.P.
Poultry Farms       107         502         4.69        46.9%          3.77
Dairy Farms         170         315         1.85        58.4%          2.31
Overall             277         817         2.95

However, there is a problem with this approach! The estimated pure premiums of 3.77 and 2.31 correspond to total estimated losses of: (107)(3.77) + (170)(2.31) = 796. This differs from the observed total losses of 817. Our fancy actuarial technique has lowered the overall rate level, without our intending to. In general, proceeding in the above manner, the observed total losses will not equal the total estimated losses.
In order to preserve the total losses, one applies the complement of credibility to the credibility weighted average pure premium.35

Exercise: You have data for 2 classes over three years. Use nonparametric Bayesian Estimation to estimate the future pure premium for each class, using the method that preserves total losses.

                      Exposures                Losses
Class             1      2      3         1      2      3
Poultry Farm     41     37     29        232     33    237
Dairy Farm       58     59     53        104     60    151

[Solution: From a prior solution, for the Poultry class Z = 46.9%, and for the Dairy class Z = 58.4%.
The credibility weighted pure premium is: {(46.9%)(4.69) + (58.4%)(1.85)} / (46.9% + 58.4%) = 3.12.
Thus the estimated pure premiums are, Poultry: (46.9%)(4.69) + (1 - 46.9%)(3.12) = 3.86, and Dairy: (58.4%)(1.85) + (1 - 58.4%)(3.12) = 2.38.
Comment: If for example, we expected 50 exposures from Dairy Farms, we might charge a premium of: (50)(2.38) = 119.]

Note that now the estimated total losses are: (107)(3.86) + (170)(2.38) = 818, matching the observed total losses of 817, subject to rounding.
35 All past exam questions of this type have used this method that preserves total losses.
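A brief sketch of this balancing step (plain Python; the rounded credibilities and pure premiums are taken from the worked example above, and the variable names are mine):

```python
# Credibilities, class pure premiums, and exposures from the Poultry/Dairy example.
z  = [0.469, 0.584]
pp = [4.69, 1.85]
m  = [107, 170]

# Complement of credibility goes to the credibility-weighted average pure premium.
cred_wtd_mean = sum(zi * p for zi, p in zip(z, pp)) / sum(z)        # about 3.12
est = [zi * p + (1 - zi) * cred_wtd_mean for zi, p in zip(z, pp)]   # about 3.86 and 2.38

total_est = sum(mi * e for mi, e in zip(m, est))
print(round(total_est))   # close to the observed 817; exact with unrounded inputs
```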
Derivation of the Method of Preserving the Total Losses:

If one applies the complement of credibility to the credibility weighted average pure premiums, provided the credibility is of the form Z = E/(E + K), then as will be shown the total losses are preserved.36
Let Li be the losses for class i, mi be the exposures for class i, pi be the estimated pure premium for class i, K = the estimated Bühlmann Credibility Parameter, and Zi be the credibility for class i, Zi = mi / (mi + K).

The credibility weighted average pure premium
= Σ Zi (Li/mi) / Σ Zi = Σ {mi/(mi + K)}(Li/mi) / Σ {mi/(mi + K)} = Σ {Li/(mi + K)} / Σ {mi/(mi + K)}.

The estimated pure premium for class i
= pi = Zi (Li/mi) + (1 - Zi)(credibility weighted average pure premium)
= Li/(mi + K) + {K/(mi + K)} Σ {Li/(mi + K)} / Σ {mi/(mi + K)}.

The estimated total losses = Σ mi pi
= Σ {mi Li/(mi + K)} + K Σ {mi/(mi + K)} Σ {Li/(mi + K)} / Σ {mi/(mi + K)}
= Σ {mi Li/(mi + K)} + K Σ {Li/(mi + K)}
= Σ {(mi + K) Li/(mi + K)} = Σ Li = the total observed losses.

36 Note the result does not depend on where K came from or what its numerical value is.

Multisided Die Example with Differing Volumes of Data:

As in the previous example, the data would consist of the number of exposures per year for each risk, as well as the outcome each year for each risk. For example, the outcome could be the number of accidents or dollars of loss. This technique is set up to be applied to the accident frequencies, pure premiums, severities, the average of a bunch of die rolls, etc.
For example, assume we have the following data generated by a simulation of the multi-sided die example, where each trial consists of the sum of many rolls of the selected die:

                        Number of Die Rolls, by Trial
Risk    Die Type      1      2      3      4      5      6
  1        4         150    150    150    150    150    150
  2        4          80     90    100    100    110    120
  3        4         120    110    100    100     90     80
  4        4          40     70    100    100    130    160
  5        4         160    130    100    100     70     40
  6        4          50     50     50     50     50     50
  7        6          40     70    100    100    130    160
  8        6         160    130    100    100     70     40
  9        6         120    110    100    100     90     80
 10        8          80     90    100    100    110    120

                        Sum of Die Rolls, by Trial
Risk                  1      2      3      4      5      6
  1                  378    371    394    346    386    360
  2                  205    222    256    246    290    288
  3                  281    296    263    244    217    196
  4                  104    178    270    257    339    410
  5                  396    323    253    240    173    101
  6                  121    119    122    132    128    125
  7                  136    260    341    346    467    546
  8                  555    478    352    364    258    124
  9                  394    391    362    352    319    279
 10                  400    385    461    425    491    535
Assume that we only have the observed data and do not know what sided die generated each risk. Then as was the case in the absence of varying exposures, one can attempt to estimate the EPV and VHM or the Bühlmann Credibility Parameter K using the average die rolls and number of die rolls for each trial and risk.37
Risk          Average of Die Rolls, by Trial                          Mean    Process Variance
  #       1        2        3        4        5        6
  1     2.520    2.473    2.627    2.307    2.573    2.400            2.483        2.047
  2     2.562    2.467    2.560    2.460    2.636    2.400            2.512        0.819
  3     2.342    2.691    2.630    2.440    2.411    2.450            2.495        1.993
  4     2.600    2.543    2.700    2.570    2.608    2.562            2.597        0.309
  5     2.475    2.485    2.530    2.400    2.471    2.525            2.477        0.195
  6     2.420    2.380    2.440    2.640    2.560    2.500            2.490        0.470
  7     3.400    3.714    3.410    3.460    3.592    3.413            3.493        1.378
  8     3.469    3.677    3.520    3.640    3.686    3.100            3.552        2.688
  9     3.283    3.555    3.620    3.520    3.544    3.487            3.495        1.523
 10     5.000    4.278    4.610    4.250    4.464    4.458            4.495        6.449
Average                                                               3.009        1.787

(The number of die rolls for each risk and trial is as in the previous table.)

For Risk 2 separately we can estimate the mean as:38
{(80)(2.562) + (90)(2.467) + (100)(2.56) + (100)(2.46) + (110)(2.636) + (120)(2.4)} / 600 = 2.512.
37 In an insurance application one would work with the frequency (or severity, pure premium, loss ratio, etc.) and number of exposures (or claims, premiums, etc.) for each year and each risk or class.
38 Assuming one had the sum of the die rolls for each trial, one could do this calculation in an equivalent way (except for rounding) as: (205 + 222 + 256 + 246 + 290 + 288) / 600 = 1507/600 = 2.512.
For Risk 2 separately we can estimate the process variance (per die roll):39
{(80)(2.562 - 2.512)² + (90)(2.467 - 2.512)² + (100)(2.56 - 2.512)² + (100)(2.46 - 2.512)² + (110)(2.636 - 2.512)² + (120)(2.4 - 2.512)²} / (6 - 1) = 0.819.
For each trial, we multiplied its squared difference by its exposures, since it is assumed that the process variance for a trial is that for a single die roll divided by the number of dice.40 Recall that we are trying to estimate the EPV for a single die roll.41

Exercise: Based on the data for risk 9, what is the estimated variance?
[Solution: The estimated mean is 3.495 and the estimated variance is 1.523.]

In this manner one gets a separate estimate of the process variance of each of the ten risks. By taking an average of these ten values, one gets an estimate of the (overall) expected value of the process variance, in this case 1.787.42 By taking a weighted average of the means estimated for each of the ten risks, using the exposures for each risk as the weight, one gets an estimate of the overall average, in this case 3.008.
Then in order to estimate the VHM one could take a weighted average of the squared deviations of the individual means from the overall mean, using the exposures for each class as the weight. The first part of the numerator of the estimated VHM is:
Σ mi (X̄i - X̄)² = (900)(2.483 - 3.008)² + (600)(2.512 - 3.008)² + (600)(2.495 - 3.008)² + (600)(2.597 - 3.008)² + (600)(2.477 - 3.008)² + (300)(2.49 - 3.008)² + (600)(3.493 - 3.008)² + (600)(3.552 - 3.008)² + (600)(3.495 - 3.008)² + (600)(4.495 - 3.008)² = 2692.5.
However, as derived below this estimate would be biased; it contains too much random variation due to the use of the observed means rather than the actual hypothetical means. It turns out that if we subtract out from the numerator the estimated EPV times the number of classes minus one, then this estimator will be an unbiased estimator of the VHM.

39 Note that this estimate of the process variance only involves data from a single row. We have divided by the number of columns minus one. With an unknown mean, one divides by n - 1 in order to get an unbiased estimate of the variance.
40 The assumption that the process variance of the frequency is inversely proportional to the size of risk is the same as used in the derivation of the Bühlmann Credibility formula: Z = E/(E + K).
41 More generally, we will be estimating the EPV and VHM for a single exposure. The Bühlmann Credibility formula will adjust for the number of exposures.
42 One could take any weighted average of the separate estimates of the process variance. The Bühlmann-Straub method, in the case where each class has the same number of years of data, gives each of the ten estimates of the process variance equal weight. Note that the EPV for the underlying model is in this case 2.10.
Numerator of estimated VHM = 2692.5 - (1.787)(10 - 1) = 2676.4.
As derived below, the denominator should be less than the sum of the exposures.43 The denominator will be the sum of the exposures minus a term which is the sum of the squares of the exposures for each class divided by the overall number of exposures:
Π = m - Σ mi²/m = 6000 - (900² + 600² + 600² + 600² + 600² + 300² + 600² + 600² + 600² + 600²)/6000 = 6000 - 630 = 5370.
Dividing the numerator by this denominator produces an unbiased estimate of the VHM.
Estimated VHM = 2676.4 / 5370 = 0.498.
Thus for this example, one would estimate that K = 1.787 / 0.498 = 3.6.
Note that this estimated value of K differs somewhat from the value of K = 2.15/0.45 = 4.778 for the multi-sided die example, as discussed in the introductory section. However, unlike here, there we made use of the parameters of the full parametric model in order to estimate the EPV and VHM.

In summary, for the simulation of the multi-sided die example:
C = 10, Y = 6, X̄1 = 2.483, X̄ = 3.008, v1 = 2.047,
estimated EPV = Σi vi / C = Σi Σt mit (Xit - X̄i)² / {C(Y - 1)} = 1.787,
m1 = 900, m = 6000, Π = m - (Σi mi²)/m = 6000 - 630 = 5370,
estimated VHM = {Σi mi (X̄i - X̄)² - EPV (C - 1)} / Π = {2692.5 - (1.787)(10 - 1)} / 5370 = 0.498,
estimated K = EPV / VHM = 1.787 / 0.498 = 3.6.

43 In the case of one roll per trial, this is equivalent to dividing by the number of risks minus one, in order to adjust for the number of degrees of freedom.
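A quick arithmetic check of the last few steps of this simulation example (plain Python; the exposure list is read off the simulated data above):

```python
# Total die rolls (exposures) for each of the ten risks in the simulation.
m = [900, 600, 600, 600, 600, 300, 600, 600, 600, 600]
epv = 1.787                                    # average of the ten process variances
numerator = 2692.5 - epv * (10 - 1)            # about 2676.4
pi = sum(m) - sum(w * w for w in m) / sum(m)   # 6000 - 630 = 5370
vhm = numerator / pi                           # about 0.498
print(round(vhm, 3), round(epv / vhm, 1))      # about 0.498 and K of about 3.6
```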
Negative Estimate of the VHM:

Unfortunately, the estimate of the VHM produced by the Bühlmann-Straub or other Empirical Bayes Estimators is often subject to considerable random fluctuation. A significant difficulty is that the estimated VHM can be negative, even though we know that as a variance it can not be negative. In the case of negative estimates of the VHM, set the VHM = 0. This results in no credibility being given to the data, which makes sense if the actual VHM is very small but positive.
In any case, the random fluctuation in the estimate of the VHM can lead to considerable random fluctuation in the estimate of K. In the case of the multi-sided dice, with relatively large amounts of data, when ten additional simulation experiments were run, the estimated values of K ranged from about 3 to about 7.

Formulas, Variation in Exposures, No Variation in Years:44

vi = Σt mit (Xit - X̄i)² / (Y - 1).

EPV = Σi vi / C = Σi Σt mit (Xit - X̄i)² / {C(Y - 1)}.

Π = m - (Σi mi²)/m.

VHM = {Σi mi (X̄i - X̄)² - EPV (C - 1)} / Π. If the estimated VHM < 0, set Z = 0.

K = EPV / VHM.    Zi = mi / (mi + K).

The complement of credibility is given to the “credibility weighted average.”

44 Note that if there are missing data cells, but there are the same number of years of data for each class, one can still use these formulas, rather than the more complicated formulas in the next section. See 4, 11/04, Q.17.
No Variation in Exposures:

The formulas in this section consider exposures. If one takes the number of exposures for each class for each year equal to 1, then one obtains the formulas for the case without exposures, discussed previously. Let mit = 1 for all i and t. Then:

EPV = Σi Σt mit (Xit - X̄i)² / {C(Y - 1)} = Σi Σt (Xit - X̄i)² / {C(Y - 1)}.

Π = m - (Σi mi²)/m = CY - (Σi Y²)/(CY) = CY - CY²/(CY) = CY - Y = (C - 1)Y.

VHM = {Σi mi (X̄i - X̄)² - EPV (C - 1)} / Π = {Σi Y (X̄i - X̄)² - EPV (C - 1)} / {(C - 1)Y}
= Σi (X̄i - X̄)² / (C - 1) - EPV/Y.
Exercise: Use the formulas that consider exposures in order to estimate the EPV and VHM for the previously discussed three driver example, which did not involve exposures.

            Year 1    Year 2    Year 3    Year 4    Year 5
Hugh          0         0         0         0         0
Dewey         0         1         0         0         0
Louis         0         0         2         1         0

[Solution: Just take the number of exposures for each class for each year equal to 1: mit = 1 for all i and t.
EPV = Σi Σt mit (Xit - X̄i)² / {C(Y - 1)}
= (1/3)(1/4){(0 - 0)² + (0 - 0)² + (0 - 0)² + (0 - 0)² + (0 - 0)² + (0 - 0.2)² + (1 - 0.2)² + (0 - 0.2)² + (0 - 0.2)² + (0 - 0.2)² + (0 - 0.6)² + (0 - 0.6)² + (2 - 0.6)² + (1 - 0.6)² + (0 - 0.6)²} = 4/12 = 1/3.
Π = m - (Σi mi²)/m = 15 - (5² + 5² + 5²)/15 = 10.
VHM = {Σi mi (X̄i - X̄)² - EPV (C - 1)} / Π
= {(5)(0 - 0.267)² + (5)(0.2 - 0.267)² + (5)(0.6 - 0.267)² - (1/3)(3 - 1)} / 10 = 0.0267.
Comment: This matches the result obtained previously using the simpler formulas that do not involve exposures.]
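A short check of this reduction in Python (a sketch under the assumption that every cell has one exposure; the variable names are mine):

```python
# Three drivers, five years each, one exposure per cell (mit = 1 for all i and t).
data = [[0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 2, 1, 0]]
C, Y = len(data), len(data[0])

means = [sum(row) / Y for row in data]                  # 0, 0.2, 0.6
overall = sum(map(sum, data)) / (C * Y)                 # 0.2667
epv = sum((x - means[i]) ** 2
          for i, row in enumerate(data) for x in row) / (C * (Y - 1))
pi = C * Y - (C * Y ** 2) / (C * Y)                     # reduces to (C - 1) Y = 10
vhm = (sum(Y * (mu - overall) ** 2 for mu in means) - epv * (C - 1)) / pi
print(round(epv, 4), round(vhm, 4))                     # 0.3333 and 0.0267
```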
Problems:

4.1 (3 points) You are given the following experience for two insured groups:

                                              Year
Group                                1       2       3      Total
  1     Number of members           13      12      14        39
        Average loss per member     46     125     114      94.72
  2     Number of members           20      22      25        67
        Average loss per member     45      70      22      44.63
Total   Number of members                                     106
        Average loss per member                             63.06

Σi Σj mij (Xij - X̄i)² = 74,030.    Σi mi (X̄i - X̄)² = 61,849.

Determine the nonparametric Empirical Bayes credibility premium for group 2, using the method that preserves total losses.
(A) 49  (B) 50  (C) 51  (D) 52  (E) 53

4.2 (2 points) You are given the following information for four territories:

Territory    Exposures     Losses
    1           235        30,400
    2           103        12,200
    3           130        12,800
    4            47         3,000
  Total         515        58,400

The Bühlmann Credibility Parameter, K = 200.
Using the method that preserves total losses, estimate the future pure premium for Territory #4.
(A) 100  (B) 101  (C) 102  (D) 103  (E) 104
Use the following information to answer the next four questions: For each of 6 classes for each of 3 years one has the number of exposures and the observed claim frequency (the number of claims divided by the number of exposures): Exposures Claim Frequency Class Year Year Weighted 1 2 3 Sum 1 2 3 Average Cracker Mfg.
14
20
13
47
10
6
14
9.40
Creamery
8
10
9
27
26
12
17
17.81
Bakery
24
22
19
65
14
5
9
9.49
Macaroni Mfg.
12
10
18
40
12
18
7
11.25
Ice Cream Mfg.
25
18
25
68
9
4
6
6.57
Grain Milling
8
7
11
26
24
28
15
21.27
Sum
91
87
95
273
13.68
9.17
9.97
10.95
4.3 (3 points) Use nonparametric Empirical Bayes techniques to estimate the Expected Value of the Process Variance. A. 300 B. 325 C. 350 D. 375 D. 400 4.4 (3 points) Use nonparametric Empirical Bayes techniques to estimate the Variance of the Hypothetical Means. A. 17.5 B. 18.0 C. 18.5 D. 19.0 E. 19.5 4.5 (1 point) Using the results of the previous questions, what is the credibility given to the three years of observed data from the Cracker Manufacturing class? A. 68% B. 70% C. 72% D. 74% E. 76% 4.6 (2 points) What is the estimated future claim frequency for the Creamery class? Use the method that preserves total claims. E. 15.3 A. 14.5 B. 14.7 C. 14.9 D. 15.1
4.7 (3 points) One is given the data from 27 insurance agencies writing commercial fire insurance on behalf of your company in the same state, over the last 3 years. Let Xij be the loss ratio for agency i in year j. Let mij be the premium for agency i in year j. 3 27
∑ ∑ mij (Xij - Xi
)2
27
∑mi = 93.2.
= .600.
i=1 j=1
i=1
27
∑mi (X i
27
-
X )2
∑mi2 = 872.3.
= .343.
i=1
i=1
Determine the Buhlmann Credibility factor, Z, for estimating the future loss ratio of an insurance agency with a total of 8.0 in premiums over 5 years, using nonparametric empirical Bayes estimation. A. Less than 35% B. At least 35% but less than 40% C. At least 40% but less than 45% D. At least 45% but less than 50% E. At least 50% 4.8 (2 points) You are given the following information for three policyholders: Policyholder Premiums Losses Loss Ratio 1 5 2 40.0% 2 6 3 50.0% 3 10 9 90.0% Total
21
14
66.7%
The Bühlmann Credibility Parameter, K = 10. Using the method that preserves total losses, estimate the future loss ratio for Policyholder #1. (A) 56% (B) 57% (C) 58% (D) 59% (E) 60%
For the next two questions use the following experience for four insurers writing private passenger automobile insurance in the nation of Hiber: Year Insurer 1 2 3 Total Freedom
Premium Loss Ratio
10 82%
9 85%
8 86%
27 84.1%
Victoria
Premium Loss Ratio
12 77%
11 84%
12 75%
35 78.5%
Enterprise
Premium Loss Ratio
16 76%
18 75%
20 79%
54 76.8%
Security
Premium Loss Ratio
8 82%
8 77%
8 78%
24 79.0%
Total
Premium Loss Ratio
46 78.6%
46 79.5%
48 79.0%
140 79.0%
Xij denotes the loss ratio for insurer i and year j, where i = 1, 2, 3, 4 and j = 1, 2, 3. Premium is the money charged to the insured in order to buy insurance. Loss ratio = losses / premium. Corresponding to each loss ratio is the amount of premium, mij, measuring exposure. 4
3
∑ ∑ mij (Xij - Xi )2 i=1 j=1
4
= 864.15
∑mi (X i
- X )2 = 1000.79
i=1
4.9 (3 points) Determine the nonparametric Empirical Bayes credibility factor for Freedom Insurance. A. Less than 65% B. At least 65% but less than 70% C. At least 70% but less than 75% D. At least 75% but less than 80% E. At least 80% 4.10 (2 points) Determine the nonparametric Empirical Bayes credibility premium for Freedom Insurance, using the method that preserves total losses. A. Less than 80% B. At least 80% but less than 81% C. At least 81% but less than 82% D. At least 82% but less than 83% E. At least 83%
Use the following data on two classes over three years for the next four questions: Exposures Year 2001 2002 2003
Standard 100 120 150
Preferred 50 40 30
Total 150 160 180
Total
370
120
490
Losses Year 2001 2002 2003
Standard 12,000 13,000 14,000
Preferred 3200 4500 1700
Total 15,200 17,500 15,700
Total
39,000
9400
48,400
Preferred 64.00 112.50 56.67
Total 101.33 109.38 87.22
78.33
98.78
Pure Premium Year Standard 2001 120.00 2002 108.33 2003 93.33 Total
105.41
Assume that the losses in each year have been adjusted to the cost level for the year 2005. 4.11 (3 points) Use nonparametric Empirical Bayes techniques to estimate the Expected Value of the Process Variance. (A) 25,000 (B) 26,000 (C) 27,000 (D) 28,000 (E) 29,000 4.12 (3 points) Use nonparametric Empirical Bayes techniques to estimate the Variance of the Hypothetical Mean Pure Premiums. A. Less than 170 B. At least 170 but less than 180 C. At least 180 but less than 190 D. At least 190 but less than 200 E. At least 200
4.13 (1 point) How much credibility is assigned to the data for the Standard Class? A. Less than 75% B. At least 75% but less than 80% C. At least 80% but less than 85% D. At least 85% but less than 90% E. At least 90% 4.14 (2 points) Using the method that preserves total losses, estimate the pure premium in the year 2005 for the Preferred Class. A. Less than 87 B. At least 87 but less than 88 C. At least 88 but less than 89 D. At least 89 but less than 90 E. At least 90
4.15 (3 points) One is given the data for homeowners insurance from 11 branch offices of your insurance company over the last 6 years. Let Xij be the loss ratio for branch office i in year j. Let mij be the premium for branch office i in year j. 11 6
∑ ∑ mij (Xij - Xi
)2
11
= 1.843.
i=1 j=1
i=1
m1 = 9.0.
i=1
11
∑mi (X i
∑mi = 77.2. 11
-
X )2
= 0.361.
∑mi2 = 1105. i=1
Determine the Buhlmann Credibility factor, Z, for estimating the future loss ratio of the first branch office, using nonparametric empirical Bayes estimation. A. 5% B. 10% C. 15% D. 20% E. 25%
Use the following information for the next 4 questions: You observe loss ratios for each of 5 years for each of 10 classes. Let Xij be the loss ratio for class i in year j, adjusted to the current level. Let mij be the premiums for class i in year j. _ Let Xi be the weighted average loss ratio for class i. Let be the overall weighted average loss ratio. Let mi be the total premiums for class i. 1
3
9
10
100
∑ ∑ mij (Xij - Xi
63
)2
27
5
8
mi
351
4
7
77.2% 82.5% 74.0% 67.3% 73.1% 61.0% 80.2% 72.8% 77.4% 68.6%
10 5
2
6
Class: _ Xi
193
33
125
178
162
142
10
= 102,397.
i=1 j=1
∑mi = 1374. i=1
10
∑mi (X i
- X )2 = 38,429.
X = 76.09%.
i=1
4.16 (1 point) Determine the nonparametric empirical Bayes estimate for the Expected Value of the Process Variance. (A) 2000 (B) 2200 (C) 2400 (D) 2600 (E) 2800 4.17 (2 points) Determine the nonparametric empirical Bayes estimate for the Variance of the Hypothetical Mean Loss Ratios. (A) 13 (B) 14 (C) 15 (D) 16 (E) 17 4.18 (1 point) What is the credibility assigned to the observed data for class 6? (A) 6% (B) 8% (C) 10% (D) 12% (E) 14% 4.19 (4 points) Determine the nonparametric Empirical Bayes credibility premium for class 6, using the method that preserves total losses. (A) 71% (B) 72% (C) 73% (D) 74% (E) 75%
Use the following information for the next 5 questions: The Cars-R-Us Insurance Company has insured the following 7 taxicab companies in a city, with the following numbers of cabs in their fleets and the following aggregate dollars of loss for each of 5 years:
Company
Number of Cabs 1991 1992 1993
1994
1995
Sum
Arriba Taxis Bowery Canarsie Cabs Downtown East River Flushing Gandhi
80 5 20 10 15 10 25
80 5 20 10 20 11 29
83 6 22 9 25 10 29
85 5 23 9 27 11 30
88 5 26 10 30 12 30
416 26 111 48 117 54 143
Sum
165
175
184
190
201
915
Company
Aggregate Losses 1991 1992 1993
1994
1995
Sum
Arriba Taxis Bowery Canarsie Cabs Downtown East River Flushing Gandhi
$1,000,000 $20,000 $40,000 $50,000 $0 $60,000 $140,000
$570,000 $20,000 $120,000 $0 $40,000 $0 $40,000
$1,800,000 $100,000 $150,000 $110,000 $30,000 $0 $330,000
$200,000 $70,000 $0 $10,000 $0 $0 $70,000
$380,000 $0 $20,000 $40,000 $20,000 $170,000 $100,000
$3,950,000 $210,000 $330,000 $210,000 $90,000 $230,000 $680,000
Sum
$1,310,000
$790,000
$2,520,000
$350,000
$730,000
$5,700,000
1994
1995
1991 to 1995
$2,353 $14,000 $0 $1,111 $0 $0 $2,333
$4,318 $0 $769 $4,000 $667 $14,167 $3,333
$9,495 $8,077 $2,973 $4,375 $769 $4,259 $4,755
Company 1991 Arriba Taxis Bowery Canarsie Cabs Downtown East River Flushing Gandhi 7
Losses per Cab 1992 1993
$12,500 $4,000 $2,000 $5,000 $0 $6,000 $5,600
$7,125 $4,000 $6,000 $0 $2,000 $0 $1,379
$21,687 $16,667 $6,818 $12,222 $1,200 $0 $11,379
5
∑ ∑ mij (Xij - Xi )2 i=1 j=1
7
= 26,723,482,801.
∑mi (X i i=1
- X )2 = 9,876,143,619.
4.20 (1 point) Use nonparametric Empirical Bayes techniques to estimate the Expected Value of the Process Variance. A. Less than 930 million B. At least 930 million but less than 940 million C. At least 940 million but less than 950 million D. At least 950 million but less than 960 million E. At least 960 million 4.21 (2 points) Use nonparametric Empirical Bayes techniques to estimate the Variance of the Hypothetical Means. A. Less than 6.5 million B. At least 6.5 million but less than 7.0 million C. At least 7.0 million but less than 7.5 million D. At least 7.5 million but less than 8.0 million E. At least 8.0 million 4.22 (1 point) Using the results of the previous questions, what is the credibility given to the five years of observed data from the Downtown Company? A. Less than 21% B. At least 21% but less than 22% C. At least 22% but less than 23% D. At least 23% but less than 24% E. At least 24% 4.23 (2 points) Using the method that preserves total losses, what is the estimated future pure premium for the Canarsie Cab Company? A. Less than $4200 B. At least $4200 but less than $4400 C. At least $4400 but less than $4600 D. At least $4600 but less than $4800 E. At least $4800 4.24 (1 point) Using the method that preserves total losses, assuming 5 cabs in 1997, what are the expected aggregate losses in 1997 for the Bowery Company? A. Less than $28,000 B. At least $28,000 but less than $30,000 C. At least $30,000 but less than $32,000 D. At least $32,000 but less than $34,000 E. At least $34,000
For the next two questions, use the following information for two classes and three years. Number of Claims Class Year 1 Year 2 Year 3 Total A 2 4 3 9 B 4 9 6 19 Number of Exposures Class Year 1 Year 2 Year 3 Total A 100 150 200 450 B 200 200 200 600 Frequency Class Year 1 Year 2 Year 3 Total A 0.02000 0.02667 0.01500 0.02000 B 0.02000 0.04500 0.03000 0.03167 4.25 (3 points) Determine the Buhlmann Credibility parameter, K, to be used for estimating future frequency, using nonparametric empirical Bayes estimation. A. 300 B. 600 C. 900 D. 1200 E. 1500 4.26 (1 point) Using nonparametric empirical Bayes credibility, estimate the future frequency for the Class A, using the method that preserves total claims. A. 2.28% B. 2.32% C. 2.36% D. 2.40% E. 2.44%
4.27 (3 points) You are given the following commercial automobile policy experience: Company Year 1 Year 2 Year 3 Total Losses I ? 100,000 120,000 Number of Automobiles ? 200 300 Losses II 30,000 50,000 ? Number of Automobiles 100 200 ? Losses III 160,000 ? 150,000 Number of Automobiles 400 ? 300 Determine the nonparametric empirical Bayes credibility factor, Z, for Company II. (A) Less than 0.4 (B) At least 0.4, but less than 0.5 (C) At least 0.5, but less than 0.6 (D) At least 0.6, but less than 0.7 (E) At least 0.7
220,000 500 80,000 300 310,000 700
4.28 (2 points) You are given the following information for two group policyholders: Group Year 1 Year 2 A Aggregate losses 600 600 Number of members 10 12 B Aggregate losses 1000 900 Number of members 25 20 Using nonparametric empirical Bayes estimation, calculate the credibility factor, Z, used for Group B's experience. (A) 85% (B) 87% (C) 89% (D) 91% (E) 93% 4.29 (4, 11/00, Q.27) (2.5 points) You are given the following information on towing losses for two classes of insureds, adults and youths: Exposures Year Adult Youth Total 1996 2000 450 2450 1997 1000 250 1250 1998 1000 175 1175 1999 1000 125 1125 Total
5000
Pure Premium Year 1996 1997 1998 1999
1000
Adult 0 5 6 4
Weighted average 3
6000
Youth 15 2 15 1
Total 2.755 4.400 7.340 3.667
10
4.167
You are also given that the estimated variance of the hypothetical means is 17.125. Determine the nonparametric empirical Bayes credibility premium for the youth class, using the method that preserves total losses. (A) Less than 5 (B) At least 5, but less than 6 (C) At least 6, but less than 7 (D) At least 7, but less than 8 (E) At least 8
4.30 (4, 5/01, Q.32) (2.5 points) You are given the following experience for two insured groups: Year Group 1 2 3 Total 1
2
Number of members
8
12
5
25
Average loss per member
96
91
113
97
Number of members
25
30
20
75
113
111
116
113
Average loss per member Total 2
Number of members
100
Average loss per member
109
3
2
∑ ∑ mij (xij - x i )2
= 2020
i=1 j=1
∑mi ( xi
- x)2 = 4800
i=1
Determine the nonparametric Empirical Bayes credibility premium for group 1, using the method that preserves total losses. (A) 98 (B) 99 (C) 101 (D) 103 (E) 104 4.31 (4, 11/01, Q.30) (2.5 points) You are making credibility estimates for regional rating factors. You observe that the Bühlmann-Straub nonparametric empirical Bayes method can be applied, with rating factor playing the role of pure premium. Xij denotes the rating factor for region i and year j, where i = 1, 2, 3 and j = 1, 2, 3, 4. Corresponding to each rating factor is the number of reported claims, mij, measuring exposure. You are given: 4
∑mij Xij
4
i
mi =
∑mij
Xi =
j=1
j=1
1 2 3
50 300 150
mi
1.406 1.298 1.178
4
v i = (1/3) ∑mij (Xij - Xi )2
mi( X i - X )2
j=1
0.536 0.125 0.172
0.887 0.191 1.348
Determine the credibility estimate of the rating factor for region 1 using the method that 3
preserves
∑mi Xi . i=1
(A) 1.31
(B) 1.33
(C) 1.35
(D) 1.37
(E) 1.39
4.32 (4, 11/04, Q.17 & 2009 Sample Q.145) (2.5 points) You are given the following commercial automobile policy experience: Company Year 1 Year 2 Year 3 Losses I 50,000 50,000 ? Number of Automobiles 100 200 ? Losses Number of Automobiles
II
? ?
150,000 500
150,000 300
Losses III 150,000 ? 150,000 Number of Automobiles 50 ? 150 Determine the nonparametric empirical Bayes credibility factor, Z, for Company III. (A) Less than 0.2 (B) At least 0.2, but less than 0.4 (C) At least 0.4, but less than 0.6 (D) At least 0.6, but less than 0.8 (E) At least 0.8 4.33 (4, 5/05, Q.25 & 2009 Sample Q.194) (2.9 points) You are given: Group Year 1 Year 2 Year 3 Total Claims 1 10,000 15,000 Number in Group 50 60 Average 200 250
Total 25,000 110 227.27
Total Claims Number in Group Average
34,000 190 178.95
2
16,000 100 160
18,000 90 200
Total Claims 59,000 Number in Group 300 Average 196.67 You are also given â = 651.03. Use the nonparametric empirical Bayes method to estimate the credibility factor for Group 1. (A) 0.48 (B) 0.50 (C) 0.52 (D) 0.54 (E) 0.56
Solutions to Problems: C
Y
∑ ∑ mit (Xit - X i )2 4.1. B. EPV =
i=1 t=1
C (Y - 1)
=
1 1 (74,030) = 18,508. 2 3-1
C
∑ mi 2
=1 Π=m- i m
= 106 - (392 + 672 )/106 = 49.3.
C
∑ mi ( Xi -
VHM = i=1
X)2 - EPV (C - 1) =
Π
61,849 - (2 - 1)(18,508) = 879. 49.3
K = EPV/VHM = 18,508/879 = 21.1. Z 1 = 39 / (39 + 21.1) = 0.649. Z2 = 67 / (67 + 21.1) = 0.760. Credibility weighted mean =
(94.72)(0.649) + (44.63)(0.760) = 67.70. 0.649 + 0.760
Estimate for group 2 is: (44.63)(0.760) + (67.70)(1 - 0.760) = 50.17. Comment: Similar to 4, 5/01, Q.32. 4.2. B. Z = E/(E + K) = E/(E + 200). In order to preserve the total losses, apply the complement of credibility to the credibility weighted pure premium of: (54.02)(129.36) + (33.99)(118.45) + (39.39)(98.46) + (19.03)(63.83) = 110.00. 54.02 + 33.99 + 39.39 + 19.03 The estimated pure premium for Territory 4 is: (0.1903)(63.83) + (1 - 0.1903)(110.00) = 101.21. Territory
Exposures
Losses
Pure Premium
Credibility
Estimated P.P.
1 2 3 4 Sum
235 103 130 47 515
30,400 12,200 12,800 3,000 58,400
129.36 118.45 98.46 63.83 113.40
54.02% 33.99% 39.39% 19.03%
120.46 112.87 105.45 101.21
Cred. Weighted
110.00
Comment: (120.46)(235) + (112.87)(103) + (105.45)(130) + (101.21)(47) = 58,399, the observed total losses, subject to rounding.
4.3. C. From the data from the first class the estimated process variance is: {(14)(10 - 9.4)2 + (20)(6 - 9.4)2 + (13)(14 - 9.4)2 } / (3 - 1) = 255.7. Similarly, for each of the remaining classes the estimated process variance would be: 440.0, 468.1, 393.8, 137.3, 404.6. Thus the estimate of the Expected Value of the Process Variance is: (255.7 + 440.0 + 468.1 + 393.8 + 137.3 + 404.6) / 6 = 349.9. 6
4.4. A.
∑ mi (X i - X)2 = 47(9.4 - 10.95)2 + 27(17.81 - 10.95)2 + 65(9.49 - 10.95)2 + i=1
40(11.25 - 10.95)2 + 68(6.57 - 10.95)2 + 26(21.27 - 10.95)2 = 5597.5. 6
Π=m-
∑ mi2 / m = 273 - {472 + 272 + 652 + 402 + 682 + 262}/273 = 221.48 i=1
Thus the estimated VHM = {5597.5 - (349.9)(6 - 1)} / 221.48 = 17.4. 4.5. B. K = 349.9/17.4 = 20.1. For the Cracker Manufacturing class the three years of data have a total of 47 exposures. Thus Z = 47/(47 + 20.1) = 70.0%.
4.6. E. The credibility assigned to the data from the Creamery Class is: 27/(27 + 20.1) = 57.3%. The credibilities assigned to the classes are: 0.700, 0.573, 0.764, 0.666, 0.772, 0.564. The observed frequencies for the classes are given as: 9.40, 17.81, 9.49, 11.25, 6.57, 21.27. Thus the credibility weighted pure premium is: (0.700)(9.40) + (0.573)(17.81) + (0.764)(9.49) + (0.666)(11.25) + (0.772)(6.57) + (0.564)(21.27) 0.700 + 0.573 + 0.764 + 0.666 + 0.772 + 0.564 = 12.03. The observed claim frequency for the Creamery Class is 17.81. Thus the estimated future claim frequency for the Creamery Class is: (57.3%)(17.81) + (42.7%)(12.03) = 15.34. Comment: Using the method that preserves the overall number of claims: Class
Exposures Exposures Claim Frequency Claim Frequency Year (Sum of 3 Years) Year (over 3 years)
Cracker Mfg. Creamery Bakery Macaroni Mfg. Ice Cream Mfg. Grain Milling
47 27 65 40 68 26
9.40 17.81 9.49 11.25 6.57 21.27
Overall Cred. Weighted
273
10.95 12.03
Credibility (K = 20.1)
Estimated Future Claim Frequency
0.700 0.573 0.764 0.666 0.772 0.564
10.19 15.35 10.09 11.51 7.82 17.24
The observed claim frequency over the three years is given in the table in the question. For Creamery it was calculated as: {(8)(26) + (10)(12) + (9)(17)} / (8 + 10 + 9) = 17.81. The numerator is the number of claims observed over the three years. The denominator is the number of exposures observed over the three years. The mean observed claim frequency over all years and classes of 10.95 was calculated similarly. If one did not use the method that preserves total claims, then the estimated future claim frequencies for each of the six classes would be calculated as follows: Class
Exposures Exposures Claim Frequency Claim Frequency Year (Sum of 3 Years) Year (over 3 years)
Cracker Mfg. Creamery Bakery Macaroni Mfg. Ice Cream Mfg. Grain Milling
47 27 65 40 68 26
9.40 17.81 9.49 11.25 6.57 21.27
Overall
273
10.95
Credibility (K = 20.1)
Estimated Future Claim Frequency
0.700 0.573 0.764 0.666 0.772 0.564
9.87 14.89 9.84 11.15 7.57 16.77
1 1 4.7. A. EPV = 27 3 - 1
27 3
∑ ∑ mij (Xij - Xi)2 = 0.600/54 = 0.0111. i=1 j=1
27
Π=m-
∑ mi2 / m = 93.2 - (872.3/93.2) = 83.84. i=1
27
VHM = {
∑ mi (X i - X)2 - EPV(27 - 1)} / Π = {0.343 - (26)(0.0111)}/83.84 = 0.000649. i=1
K = EPV/VHM = 0.0111/0.000649 = 17.1. Z = 8/(8 + 17.1) = 31.9%. Comment: Loss Ratio = Losses / Premium takes the place of pure premium = Losses / Exposures. Thus premiums take the place of exposures. We are applying the same mathematics to a different situation. See 4, 11/01, Q.30. It is usual to have the same number of years for the individual whose future you are predicting, as was used to estimate K. However, that is not a requirement. Once K has been estimated, in this case using 3 years of data from each of 27 agencies, this Buhlmann Credibility Parameter can be used in the formula Z = N(N + K) with N equal to any reasonable value. In this case, we happen to have more years of data for the agency we wish to estimate the future. In other sections, I discuss how to estimate K using data that does not have the same number of years of data from each individual or class. 4.8. A. The credibilities are: 5/(5 + 10) = 33.3%, 37.5%, and 50.0%. {(0.333)(0.4) + (0.375)(0.5) + (0.500)(0.9)} The Credibility Weighted Loss Ratio is: = 0.638. 0.333 + 0.375 + 0.500 The estimated future loss ratio for policyholder #1 is: (0.333)(0.4) + (0.667)(0.638) = 55.9%. 1 1 4.9. A. EPV = 4 3 - 1 i
4
3
∑ ∑ mij (Xij - Xi )2
= 864.15 / 8 = 108.02.
i=1 j=1 j
4
Π=m-
∑ mi2 /m = 140 - (272 + 352 + 542 + 242)/140 = 101.1. i=1 4
VHM = {
∑ mi (X i - X)2 - (C - 1)EPV}/Π = {1000.79 - (4-1)(108.02)} / 101.1 = 6.694. i=1
K = EPV/VHM = 108.02/6.694 = 16.14. Freedom has total premiums of 27 and credibility of: Z1 = 27/(27 + 16.14) = 0.626.
4.10. D. K = 16.14. Z1 = 27/(27 + 16.14) = 0.626. Z 3 = 54/(54 + 16.14) = 0.770.
Z 2 = 35/(35 + 16.14) = 0.684.
Z 4 = 24/(24 + 16.14) = 0.598. Credibility weighted mean =
(0.626)(84.1%) + (0.684)(78.5%) + (0.770)(76.8%) + (0.598)(79.0%) = 79.4%. 0.626 + 0.684 + 0.770 + 0.598 Estimated future loss ratio for Freedom is: (0.626)(84.1%) + (1 - 0.626)(79.4%) = 82.3%. Comment: Here loss ratio takes the place of pure premium, while premiums takes the place of exposures measuring the size of insured. Just as in the more common case with pure premiums, here:
∑ mij Xij is the total losses for insured j. j
See 4, 11/01, Q.30 for yet another parallel case involving in that case regional rating factors. 1 4.11. E. v i = Y - 1 1 Estimated EPV = C Class Standard Preferred
Y
∑ mit (Xit - Xi )2 = estimated process variance for class i. t=1 C
∑ vi = (22094.6 + 35525.0)/2 = 28,810. i=1
(# of Exposures)(Loss per Exposure - 3 year average)^2 2001 2002 2003 21300.2 10272.2
1028.7 46694.4
21860.2 14083.3
Average
22094.6 35525.0 28809.8
120.00 64.00
Loss per Exposure 108.33 112.50
Standard Preferred Standard Preferred
Standard Preferred
Process Variance
93.33 56.67
Weighted Avg. 105.41 78.33
$12,000 $3,200
Aggregate Losses $13,000 $4,500
$14,000 $1,700
Sum $39,000 $9,400
100 50
Number of Exposures 120 40
150 30
Sum 370 120
Comment: In this question, during 2004 we are using data from the years 2001-2003 in order to predict what will happen in 2005. This is a very common situation in practical applications.
2013-4-12
Empirical Bayesian Cred. §4 Varying Exposures, HCM 10/22/12, Page 82 C
4.12. E. Let Π = m -
∑ mi2 / m = 490 - (3702 + 1202) / 490 = 181.2. i=1
X = overall average loss per exposure = 48,400/490 = 98.78. C
∑ mi ( Xi -
X)2 - EPV (C - 1)
estimated VHM = i=1
Π
=
{370(105.41 - 98.78)2 + 120(78.33 - 98.78)2 - (2-1)(28810)} / 181.2 = 207.7. 4.13. A. K = EPV/VHM = 28810/207.7 = 139. For the Standard Class, Z = 370/(370 + 139) = 72.7%. 4.14. B. For the Preferred Class, Z = 120/(120+139) = 46.3%. In order to preserve the total losses apply the complement of credibility to the credibility weighted average pure premium: {(0.727)(105.41) + (.0463)(78.33)} / (0.727 + 0.463) = 94.87. Therefore, the estimated pure premium for the Preferred Class is: (0.463)(78.33) + (1 - 0.463)(94.87) = 87.21. 1 1 4.15. B. EPV = 11 6 - 1
11 6
∑ ∑ mij (Xij - Xi)2 = 1.843/55 = .0335. i=1 j=1
11
Π=m-
∑mi2 / m = 77.2 - (1105/77.2) = 62.89. i=1
11
VHM = { ∑mi (X i - X )2 - EPV(11 - 1)} / Π = {0.361 - (0.0335)(10)} / 62.89 = 0.00041. i=1
K = EPV/VHM = 0.0335/0.00041 = 82. Z = 9/(9 + 82) = 9.9%. 1 1 4.16. D. EPV = C Y - 1
C
Y
∑ ∑ mij (Xij - Xi)2 = 101 i=1 j=1
1 (102,397) = 2560. 5 - 1
2013-4-12
Empirical Bayesian Cred. §4 Varying Exposures, HCM 10/22/12, Page 83 C
4.17. A. Π = m -
∑ mi2 / m = 1374 - 269,954/1374 = 1177.5. i=1
C
∑ mi ( Xi -
X)2 - EPV (C - 1)
VHM = i=1
= {38,429 - (10-1)(2560)} / 1177.5 = 13.07.
Π
4.18. E. K = EPV/VHM = 2560/13.07 = 195.9. Z 6 = 33/(33+195.9) = 0.144. 4.19. C. Credibility weighted mean = 280.0/3.723 = 75.2. Class
Exposures
Credibility
Loss Ratio
Product
Estimate
1 2 3 4 5 6 7 8 9 10
100 351 63 27 193 33 125 178 162 142
33.8% 64.2% 24.3% 12.1% 49.6% 14.4% 39.0% 47.6% 45.3% 42.0%
77.2 82.5 74.0 67.3 73.1 61.0 80.2 72.8 77.4 68.6
26.1 52.9 18.0 8.2 36.3 8.8 31.2 34.7 35.0 28.8
75.9 79.9 74.9 74.3 74.2 73.2 77.2 74.1 76.2 72.4
Sum
372.3%
280.0
Estimated future loss ratio for class 6 is: (61.0%)(.144) + (75.2%)(1 - 0.144) = 73.2%. Comment: If the overall loss ratio were adequate, then the manual rate for class 6 might be changed by: 73.2%/76.09% - 1 = -3.8%, ignoring fixed expense loadings, etc. Similarly, class 2 with a higher than average estimated loss ratio of 79.9%, might have its manual rate increased by: 79.9%/76.09% - 1 = 5.0%.
1 4.20. D. vi = Y - 1
Y
∑ mit (Xit - Xi )2 = estimated process variance of data for company i. t=1
1 1 Estimated EPV = (1/C) Σ vi = C Y - 1
C
Y
∑ ∑ mij (Xij - Xi)2 = (26,723,482,801) / {(7)(4)} i=1 j=1
= 954.4 million. Comment: Here is how one would calculate Σ Company
Arriba Taxis Bowery Canarsie Downtown East River Flushing Gandhi Sum
Σ mij (Xij - X i)2:
(Number of Cabs)(Loss per Cab-5 year average)^2 1991 1992 1993 1994 1995 7.223e+8 8.311e+7 1.893e+7 3.906e+6 8.876e+6 3.030e+7 1.784e+7
4.494e+8 8.311e+7 1.833e+8 1.914e+8 3.030e+7 1.996e+8 3.305e+8
1.234e+10 4.427e+8 3.253e+8 5.542e+8 4.639e+6 1.814e+8 1.272e+9
4.336e+9 1.754e+8 2.033e+8 9.588e+7 1.598e+7 1.996e+8 1.760e+8
2.359e+9 3.262e+8 1.263e+8 1.406e+6 3.156e+5 1.178e+9 6.065e+7
Sum 2.020e+10 1.111e+9 8.570e+8 8.468e+8 6.010e+7 1.789e+9 1.857e+9 2.672e+10
2013-4-12
Empirical Bayesian Cred. §4 Varying Exposures, HCM 10/22/12, Page 85 C
4.21. A. Let Π = m -
∑ mi2 / m = i=1
915 - (4162 + 262 + 1112 + 482 + 1172 + 542 + 1432 )/915 = 915 - 244 = 668.6. C
∑ mi ( Xi -
estimated VHM = i=1
X)2 - EPV (C - 1) =
Π
{9876 million - (7 - 1)(954 million)} / 668.6 = 6.21 million. Comment: X = overall average loss per cab = 5.7 million/915 = $6230. Here is how one would calculate Σ mi ( X i - X )2 : Arriba Taxis Bowery Canarsie Downtown East River Flushing Gandhi
Mean 9495 8077 2973 4375 769 4259 4755
Exposures 416 26 111 48 117 54 143
Contribution to numerator of the VHM 4,434,653,600 88,696,634 1,177,493,439 165,169,200 3,489,234,957 209,781,414 311,114,375
Overall
6230
915
9,876,143,619
4.22. D. K = EPV/VHM = 954 /6.21 = 154. There are 48 exposures. Z = 48/(48+154) = 23.8%. 4.23. B. Calculate the credibility weighted pure premium of 5218: Arriba Taxis Bowery Canarsie Downtown East River Flushing Gandhi Sum
Exposures 416 26 111 48 117 54 143
Z 73.0% 14.4% 41.9% 23.8% 43.2% 26.0% 48.1% 270.4%
Pure Premium $9495 $8077 $2973 $4375 $769 $4259 $4755
Extension $6930 $1167 $1245 $1040 $332 $1106 $2289 $14108
Estimate $8340 $5631 $4278 $5018 $3297 $4969 $4995 $5218
The mean pure premium for Canarsie is $2973. Canarsie has 111 exposures, and K = 154, so Z = 111/(111+154) = 41.9%. Thus the estimated future pure premium is: (0.419)($2973) + (0.581)($5218) = $4277.
4.24. B. The mean pure premium for Bowery is $8077. Bowery has 26 exposures and K = 154, so Z = 26/(26+154) = 14.4%. Thus the estimated future pure premium is: (.144)($8077) + (.856)($5218) = $5630. The expected aggregate losses for 5 cabs are: (5)(5630) = $28,150. 1 4.25. B. vi = Y - 1
Y
∑ mit (Xit - Xi )2 = estimated process variance for class i. t=1
Estimated EPV = (1/C) Σ vi = (.00583 + .03167)/2 = 0.01875. Class A B
(# of Exposures)(Frequency - 3 year average)^2 1 2 3
Process Variance
0.00000 0.02722
0.00583 0.03167
0.00667 0.03556
0.00500 0.00056
Average
0.01875
A B
0.02000 0.02000
Frequency 0.02667 0.04500
A B
2 4
Number of Claims 4 9
100 200
Number of Exposures 150 200
A B
0.01500 0.03000
Weighted Avg. 0.02000 0.03167
3 6
Sum 9 19
200 200
Sum 450 600
Π = m - Σmi2 /m = 1050 - (4502 + 6002 )/1050 = 514.3. X1 = 9/450 = 0.02000. X 2 = 19/600 = 0.03167. X = overall average frequency = 28/1050 = 0.02667. C
∑ mi ( Xi -
VHM = i=1
X)2 - EPV (C - 1) Π
=
{450(0.02000 - 0.02667)2 + 600(0.03167 - 0.02667)2 - (2 - 1)(0.01875)} / 514.3 = 0.0000316. K = EPV/VHM = 0.01875/0.0000316 = 593.
4.26. C. Z A = 450/(450 + 593) = 0.431. ZB = 600/(600 + 593) = 0.503. Credibility weighted mean = {(0.431)(0.0200) + (0.503)(0.03167)} / (0.431 + 0.503) = 0.0263. Estimated frequency for Class A is: (.431)(.0200) + (1 - .431)(.0263) = 0.0236. Comment: If one did not use the method that preserves total losses, in this case one would get a very similar result: (0.431)(0.0200) + (1 - 0.431)(0.02667) = 0.0238. 4.27. D. Xit = pure premiums = losses/(number of automobiles): Company
Pure Premium
I II III
500 300 400
Exposures 200 300 100 200 400 300
I II III
1 vi = Y - 1
400 250 500
XBari
vi
440.00 266.67 442.86
1,200,000 166,667 1,714,286
Average
1,026,984
Total 500 300 700
Y
∑ mit (Xit - Xi )2 = estimated process variance for class i. t=1
Estimated EPV = (1/C) Σ vi = (1200 + 167 + 1714)/3 = 1027 thousand. Let Π = m - Σ mi2 / m = 1500 - (5002 + 3002 + 7002 )/1500 = 946.7. X = overall average loss per exposure = 610,000/1500 = 406.67.
Σ mi ( X i - X )2 = 500(440 - 406.67)2 + 300(266.67 - 406.67)2 + 700(442.86 - 406.67)2 = 7352 thousand. C
∑ mi ( Xi -
Estimated VHM = i=1
X)2 - EPV (C - 1) Π
=
{7352 thousand - (3 - 1)(1027 thousand)}/946.7 = 5596. K = EPV/VHM = (1027 thousand) / 5596 = 184. For Company II, Z = 300 / (300 + 184) = 62.0%. Comment: Similar to 4, 11/04, Q.17. Note that even though there are missing data cells, each company has the same number of years of data, two. Therefore, one can still use the formulas without varying numbers of years of data, rather than the more complicated formulas with varying number of years. Using the latter formulas would result in the same answer.
4.28. B. The pure premiums by year for Group A are: 60 and 50. For Group A, overall p.p. is 1200/22 = 54.55. The pure premiums by year for Group B are: 40 and 45. Overall p.p. is 1900/45 = 42.22. 1 vi = Y - 1
Y
∑ mit (Xit - Xi )2 . t=1
Estimated process variance for Group A: (1/1) {(10)(60 - 54.55)2 + (12)(50 - 54.55)2 } = 545.46. Estimated process variance for Group B: (1/1) {(25)(40 - 42.22)2 + (20)(45 - 42.22)2 } = 277.78. Estimated EPV is: (545.46 + 277.78)/2 = 411.62. C
m-
∑ mi2 / m = 67 - (222 + 452)/67 = 29.55. i=1
The overall mean is: 3100 / 67 = 46.27. C
∑ mi ( Xi -
X)2 = (22)(54.55 - 46.27)2 + (45)(42.22 - 46.27)2 = 2246.40.
i=1
C
∑ mi ( Xi -
estimated VHM = i=1
X)2 - EPV (C - 1) C
m -
∑ mi2 / m i=1
K = EPV / VHM = 411.62 / 62.09 = 6.63. For Group B, Z = 45 / (45 + K) = 87.2%.
= {2246.40 - (411.62)(2 - 1)} / 29.55 = 62.09.
1 4.29. E. v i = Y - 1
Y
∑ mit (Xit - Xi )2 = estimated process variance for class i. t=1
Estimated EPV = (1/C) Σ vi = (10667 + 13917)/2 = 12,292. Class Adult Youth
(# of Exposures)(Loss per Exposure - 4 year average)^2 1996 1997 1998 1999 18000 11250
4000 16000
9000 4375
1000 10125
Average
10667 13917 12292
Adult Youth
$0 $15
Loss per Exposure $5 $2
Adult Youth
$0 $6,750
Aggregate Losses $5,000 $500
2000 450
Number of Exposures 1000 250
Adult Youth
Process Variance
$6 $15
$4 $1
Weighted Avg. $3 $10
$6,000 $2,625
$4,000 $125
Sum 15000 10000
1000 125
Sum 5000 1000
1000 175
K = EPV/VHM = 12292/17.125 = 718. For Youth, Z = 1000/(1000 + K) = 0.582. For Adult, Z = 5000/(5000 + K) = 0.874. In order to preserve the total losses apply the complement of credibility to the credibility weighted average p.p.: {(0.582)(10) + (0.874)(3)} / (0.582 + 0.874) = 5.80. Therefore, the estimated pure premium for the Youth Class is: (0.582)(10) + (1 - 0.582)(5.80) = 8.24. Comment: The estimated pure premium for the Adult Class is: (.874)(3) + (1 - .874)(5.80) = 3.35. The combined estimated losses corresponding to the observed exposures are: (5000)(3.35) + (1000)(8.24) = 24,990, equal to the observed losses of 25,000, subject to rounding. If one had been asked to calculate the VHM, one would proceed as follows. Let Π = m - Σ m2 / m = total exposures “adjusted for degrees of freedom” = 6000 - (50002 + 10002 )/6000 = 1666.67. X = overall average loss per exposure = 25000/6000 = 4.167. C
∑ mi ( Xi -
estimated VHM = i=1
X)2 - EPV (C - 1) Π
=
{(5000(3 - 4.167)2 + 1000(10 - 4.167)2 ) - (2 - 1)12292)} / 1666.67 = 17.125.
1 1 4.30. A. EPV = C Y - 1 i
C
Y
∑ ∑ mij (Xij - Xi)2 = (1/2){1/(3-1)} (2020) = 505. i=1 j=1
j
Π = m - Σmi2 /m = 100 - (252 + 752 )/100 = 37.5. C
∑ mi ( Xi -
X)2 - EPV (C - 1)
VHM = i=1
Π
= {4800 - (2-1)(505)}/37.5 = 114.5.
K = EPV/VHM = 505/114.5 = 4.41. Z 1 = 25/(25+4.41) = .850. Z2 = 75/(75 + 4.41) = .944. Credibility weighted mean = {(97)(.850) + (113)(.944)}/(.850 + .944) = 105.4. Estimate for group 1 is: (97)(.850) + (105.4)(1 - .850) = 98.3. Comment: Σ mi( X i - X )2 = (25)(97-109)2 + (75)(113-109)2 = 4800. 2
3
∑ ∑ mij (Xij - Xi)2 = i=1 j=1
(8)(96-97)2 + (12)(91-97)2 + (5)(113-97)2 + (25)(113-113)2 + (30)(111-113)2 + (20)(116-113)2 = 2020. Estimate for group 1 is: (97) (85.0%) + (105.4) (1 - 85.0%) = 98.3. Estimate for group 2 is: (113) (94.4%) + (105.4) (1 - 94.4%) = 112.6. Reported exposures for group 1: 25. Reported exposures for group 2: 75. (25) (98.3) + (75) (112.6) = 10,903, matching total reported losses subject to rounding. Reported total losses: (25) (97) + (75) (113) = 10,900.
4.31. C. EPV = (v1 + v2 + v3 )/3 = (0.536 + 0.125 + 0.172)/3 = 0.278. Π = m - Σ mi2 / m = 500 - (502 + 3002 + 1502 )/500 = 270. C
∑ mi ( Xi -
VHM = i=1
X)2 - EPV (C - 1) Π
= {(0.887 + 0.191 + 1.348) - (0.278)(3-1)} / 270 =
1.878/270 = .00693. K = EPV/VHM = 0.278/0.00693 = 40.1. Z 1 = 50 / (50 + 40.1) = 0.555. Z2 = 300 / (300 + 40.1) = 0.882. Z3 = 150 / (150 + 40.1) = 0.789. The credibility weighted rating factor is: {(0.555)(1.406) + (0.882)(1.298) + (0.789)(1.178)} / (0.555 + 0.882 + 0.789) = 1.282. Estimated rating factor for region 1: (0.555)(1.406) + (1 - 0.555)(1.282) = 1.351. Comment: A situation not specifically covered on the Syllabus. On the exam, donʼt worry about the situation, just apply the mathematics. A rating factor would be used to help set the rate to be charged in a geographical region compared to some overall average rate. The higher the rating factor, the higher the rate charged in that region. For example, for automobile insurance an urban area would typically have a higher rating factor than a rural area. See for example,“Ratemaking” by Charles McClenahan and “Risk Classification” by Robert Finger, in Foundations of Casualty Actuarial Science.
4.32. B. Xit = pure premiums = losses/(number of automobiles): Company
Pure Premium
I II III
500 300 3000
Exposures 100 200 500 300 50 150
I II III
1 vi = Y - 1
250 500 1000
XBari
vi
333.33 375 1500
4,166,667 7,500,000 150,000,000
Average
53,888,889
Total 300 800 200
Y
∑ mit (Xit - Xi )2 = estimated process variance for class i. t=1
Estimated EPV = (1/C) Σ vi = (4.167 + 7.5 + 150)/3 = 53.89 million. Let Π = m - Σ mi2 / m = 1300 - (3002 + 8002 + 2002 )/1300 = 707.7. X = overall average loss per exposure = 700,000/1300 = 538.46. C
∑ mi ( Xi -
Estimated VHM = i=1
X)2 - EPV (C - 1) Π
=
{300(333.33 - 538.46)2 + 800(375 - 538.46)2 + 200(1500 - 538.46)2 - (3-1)(53.89 mil)}/707.7 = 0.1570 million. K = EPV / VHM = 53.89 million/ 0.1570 million = 343. For Company III, Z = 200 / (200 + 343) = 36.8%. Comment: Note that even though there are missing data cells, each company has the same number of years of data, two. Therefore, one can still use the formulas without varying numbers of years of data, rather than the more complicated formulas with varying number of years. Using the latter formulas would result in the same answer.
4.33. B. Xit = pure premiums = (total claims) / (number in group). Group
Pure Premium
1 2
200 160 Exposures 50 100
1 2
1 vi = Y - 1
250 200
60 90
XBari
vi
227.27 178.95
68,182 75,789
Average
71,986
Total 110 190
Y
∑ mit (Xit - Xi )2 = estimated process variance for group i. t=1
Estimated EPV = (1/C) Σ vi = (68182 + 75789)/2 = 71,986. The estimated VHM is given as 651.03. K = EPV/VHM = 71986/651.03 = 110.6. Group 1 has 110 exposures, so its data is given credibility of: 110/(110 + 110.6) = 49.9%. Comment: a is Loss Models notation for the VHM. Thus â is the estimated VHM. Presumably, those students who used the syllabus readings “Credibility” by Mahler and Dean, and “Topics in Credibility” by Dean, which do not use that notation, had to guess that â was the estimated VHM, or had to do this calculation themselves! Even though there are missing years of data, each group has two years of data. Therefore, there is no need to use the formulas involving differing numbers of years of data. Group 2 has 190 exposures, so its data is given credibility of 190/(190 + 110.6) = 63.2%. Let Π = m - Σ mi2 / m = 300 - (1102 + 1902 )/300 = 139.33. X = overall average loss per exposure = 59000/300 = 196.67. C
∑ mi ( Xi -
Estimated VHM = i=1
X)2 - EPV (C - 1) Π
=
{110( 227.27 - 196.67)2 + 190(178.95 - 196.67)2 - (2-1)(71,986)} / 139.33 = 651.0, as given.
Section 5, Varying Exposures, Differing Numbers of Years45

As was the case with no variation in exposures, when one has variation in exposures the method can be generalized in order to deal with differing numbers of years of data. Just as in the case without exposures, rather than take an average of each insured's estimated variance, one takes a weighted average in order to estimate the EPV. One uses weights equal to the number of degrees of freedom for each class, that is the number of years of data for that class minus 1.

Yi = the number of years of data for class i.

Estimated EPV = weighted average of the estimated variances for each class
= Σ (Yi - 1) vi / Σ (Yi - 1) = ΣΣ mit (Xit - X̄i)² / Σ (Yi - 1),
where the sums over i run from 1 to C, and the inner sum over t runs from 1 to Yi.

When each class has the same number of years of data Y, then we just take a straight average of the sample variances for each class, and the above equation for the EPV reduces to the prior one when the years do not vary.

One uses the same formulas for estimating the VHM as when the years did not vary:

Π = m - Σ mi²/m.

VHM = {Σ mi (X̄i - X̄)² - EPV (C - 1)} / Π.

45 The formulas in this section have those in the previous sections as special cases.
Exercise: You have data for 2 classes. You have no data from the Dairy Class for year one; you have data from both classes for the remaining two years. Use nonparametric Bayesian Estimation with the method that preserves total losses in order to estimate the future pure premium for each class.

                     Exposures                   Losses
Class                Year 1   Year 2   Year 3    Year 1   Year 2   Year 3
Poultry Farms          41       37       29        232      33       237
Dairy Farms            --       59       53        --       60       151

[Solution: The mean pure premium for Poultry is: (232 + 33 + 237) / (41 + 37 + 29) = 502/107 = 4.69.
The mean pure premium for Dairy is: (60 + 151) / (59 + 53) = 211/112 = 1.88.
EPV = ΣΣ mit (Xit - X̄i)² / Σ (Yi - 1).
For Poultry the contribution to the numerator is: (41)(5.66 - 4.69)² + (37)(0.89 - 4.69)² + (29)(8.17 - 4.69)² = 924.
Similarly, for Dairy: (59)(1.02 - 1.88)² + (53)(2.85 - 1.88)² = 94.
Thus the estimate of the Expected Value of the Process Variance is: (924 + 94) / (2 + 1) = 339.
Π = m - Σ mi²/m = 219 - (107² + 112²)/219 = 109.4.
X̄ = overall average = (502 + 211) / (107 + 112) = 713/219 = 3.26.
Σ mi (X̄i - X̄)² = (107)(4.69 - 3.26)² + (112)(1.88 - 3.26)² = 432.
Thus the estimated VHM = {432 - (339)(2 - 1)} / 109.4 = 0.85. K = 339/0.85 = 399.
For the Poultry class the three years of data have a total of 107 exposures. Thus Z = 107/(107 + 399) = 21.1%.
For the Dairy class the two years of data have a total of 112 exposures. Thus Z = 112/(112 + 399) = 21.9%.
Credibility weighted pure premium: {(21.1%)(4.69) + (21.9%)(1.88)} / (21.1% + 21.9%) = 3.26.
The estimated pure premium for Poultry is: (21.1%)(4.69) + (1 - 21.1%)(3.26) = 3.56.
The estimated pure premium for Dairy is: (21.9%)(1.88) + (1 - 21.9%)(3.26) = 2.96.
Comment: In this case, since the two classes have very similar volumes of data, the credibility weighted pure premium is similar to the overall mean.]
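The exercise above can also be worked through in code. The following self-contained Python sketch is added for illustration only; the class names and data are those of the exercise, and the variable names are merely descriptive.

```python
# Illustrative worked sketch of the exercise above: Poultry has three years of
# data, Dairy only two, and the complement of credibility goes to the
# credibility-weighted pure premium so that total losses are preserved.
exposures = {"Poultry": [41, 37, 29], "Dairy": [59, 53]}
losses    = {"Poultry": [232, 33, 237], "Dairy": [60, 151]}

m_i  = {c: sum(v) for c, v in exposures.items()}                 # 107 and 112
pp_i = {c: sum(losses[c]) / m_i[c] for c in exposures}           # 4.69 and 1.88
m    = sum(m_i.values())
xbar = sum(sum(v) for v in losses.values()) / m                  # 3.26

# EPV: weighted average of the per-class estimates, with weights Y_i - 1.
num = sum(e * (l / e - pp_i[c]) ** 2
          for c in exposures for e, l in zip(exposures[c], losses[c]))
epv = num / sum(len(exposures[c]) - 1 for c in exposures)        # about 339

pi  = m - sum(v ** 2 for v in m_i.values()) / m                  # 109.4
vhm = (sum(m_i[c] * (pp_i[c] - xbar) ** 2 for c in m_i) - epv * (len(m_i) - 1)) / pi
k   = epv / vhm                                                  # about 400 (399 in the text)

Z = {c: m_i[c] / (m_i[c] + k) for c in m_i}                      # roughly 21% and 22%
cred_wtd = sum(Z[c] * pp_i[c] for c in Z) / sum(Z.values())      # 3.26
for c in m_i:
    print(c, round(Z[c] * pp_i[c] + (1 - Z[c]) * cred_wtd, 2))   # 3.56 and 2.96
```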
Assumptions for Nonparametric Empirical Bayes Estimation:

This technique is referred to as nonparametric empirical Bayes estimation, because we have made no specific assumptions about the structure parameters. In fact, we make some assumptions; as will be discussed, we implicitly or explicitly assume a certain covariance structure between the data from years, classes, etc. To the extent the assumed covariance structure does not match the particular situation to which the method is being applied, the resulting credibilities will be less than optimal.46

The Assumed Covariance Structure:

The empirical Bayes methods shown here assume the usual Bühlmann or Bühlmann-Straub covariance structure, as discussed in "Mahlerʼs Guide to Buhlmann Credibility." As discussed previously, when the size of risk m is not important, for two years of data from a single risk, the variance-covariance structure between the years of data is as follows:
COV[Xt, Xu] = η² δtu + τ², where δtu is 1 for t = u and 0 for t ≠ u.
COV[X1, X2] = τ² = VHM, and COV[X1, X1] = VAR[X1] = η² + τ² = EPV + VHM.
When the size of risk m is important, the EPV is assumed to be inversely proportional to the size of risk, and the covariance structure of frequency, severity, or pure premiums is assumed to be:
COV[Xt, Xu] = δtu (η²/m) + τ².

Assume one has data from a number of risks over several years. Let Xit be the data (die roll, frequency, severity, pure premium, etc.) observed for class (or risk) i in year t, and let mit be the measure of size (premiums, exposures, number of die rolls, etc.) for class (or risk) i in year t. Then the assumed covariance structure is:
COV[Xit, Xju] = δij δtu (η²/mit) + δij τ².
In other words, two observations from different random risks have a covariance of zero, two observations of the same risk in different years have a covariance of τ², the VHM, while the observation of a risk in a single year has a variance of (η²/mit) + τ², the VHM plus the EPV (for a risk of size 1) divided by the size of risk.

46 Examples of other covariance structures are discussed near the end of "Mahlerʼs Guide to Buhlmann Credibility," and in Examples 20.25 and 20.26 in Loss Models.
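To make the covariance structure concrete, here is a small simulation sketch added for illustration (it is not from the text, and all parameter values are made up). Each risk draws a hypothetical mean with variance τ², and each year adds process noise with variance η²/m; the sample covariance between two years of the same risk should then be close to τ², while the variance of a single year should be close to η²/m + τ².

```python
# Simulation check of the assumed covariance structure (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
tau2, eta2 = 0.04, 2.0            # VHM and EPV (for one exposure)
m1, m2 = 25.0, 50.0               # sizes of risk in years 1 and 2
n_risks = 200_000

theta = rng.normal(1.0, np.sqrt(tau2), n_risks)                    # hypothetical means
x1 = theta + rng.normal(0.0, np.sqrt(eta2 / m1), n_risks)          # year 1 observations
x2 = theta + rng.normal(0.0, np.sqrt(eta2 / m2), n_risks)          # year 2 observations

print("Cov[X1, X2] =", np.cov(x1, x2)[0, 1], "  (should be near tau^2 =", tau2, ")")
print("Var[X1]     =", x1.var(ddof=1), "  (should be near eta^2/m1 + tau^2 =", eta2 / m1 + tau2, ")")
```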
Let M be the overall grand mean, M = E[Xit]. Then we have from the assumed covariance structure that the expected value of the product of two observations is:
E[Xit Xju] = COV[Xit, Xju] + E[Xit] E[Xju] = δij δtu (η²/mit) + δij τ² + M².
Then, for example, for a single risk in a single year: E[Xit²] = (η²/mit) + τ² + M².

Let X̄i = (1/mi) Σ mit Xit = weighted average of the data for class i, with the sum over t running from 1 to Y.

Then E[Xit X̄i] = (1/mi) Σu miu E[Xit Xiu] = (1/mi) Σu miu {δtu (η²/mit) + τ² + M²}
= (1/mi) Σu miu (τ² + M²) + (1/mi) mit (η²/mit) = τ² + M² + η²/mi.

Also, E[X̄i²] = E[X̄i X̄i] = E[(1/mi) Σt mit Xit X̄i] = (1/mi) Σt mit E[Xit X̄i]
= (1/mi) Σt mit (τ² + M² + η²/mi) = τ² + M² + η²/mi = E[Xit X̄i].

An Unbiased Estimate of the EPV (for a single exposure):

In order to estimate the EPV, one looks at the squared differences between the observations for a class and the weighted average observation for that class:

Let vi = Σt mit (Xit - X̄i)² / (Y - 1).
Then, (Y - 1) E[vi] = Σt mit E[(Xit - X̄i)²] = Σt mit {E[Xit²] - 2 E[Xit X̄i] + E[X̄i²]}
= Σt mit {(η²/mit) + τ² + M² - (τ² + M² + η²/mi)} = η² Σt (1 - mit/mi) = η² (Y - 1).

Thus, E[vi] = η² (Y - 1)/(Y - 1) = η². vi is an unbiased estimator of the (expected value of the) process variance. Then any weighted average of the vi for the various classes would be an unbiased estimator of the EPV. In particular,

EPV = (1/C) Σ vi, with the sum over i running from 1 to C,

is an unbiased estimator of the EPV (for a single exposure of the risk process).

An Unbiased Estimate of the VHM:

In order to estimate the VHM, one looks at the squared differences between the weighted average observations for each class and the overall weighted average:

Let ξ² = Σ mi (X̄i - X̄)², with the sum over i running from 1 to C.

For two different risks, i ≠ j:
E[X̄i X̄j] = E[(1/(mi mj)) Σt Σu mit mju Xit Xju] = (1/(mi mj)) Σt Σu mit mju E[Xit Xju]
= (1/(mi mj)) Σt Σu mit mju M² = M².

Therefore, since previously we had that E[X̄i²] = M² + τ² + η²/mi, we have that:
E[X̄i X̄j] = M² + δij (τ² + η²/mi).
E[X̄i X̄] = E[(1/m) Σj mj X̄i X̄j] = (1/m) Σj mj E[X̄i X̄j] = (1/m) Σj mj {M² + δij (τ² + η²/mi)}
= (1/m) {m M² + mi τ² + η²} = M² + τ² mi/m + η²/m, with the sums over j running from 1 to C.

E[X̄²] = E[X̄ X̄] = E[(1/m) Σi mi X̄i X̄] = (1/m) Σi mi E[X̄i X̄] = (1/m) Σi mi (M² + τ² mi/m + η²/m)
= M² + τ² Σi mi²/m² + η²/m.

Then E[ξ²] = Σi mi E[(X̄i - X̄)²] = Σi mi {E[X̄i²] - 2 E[X̄i X̄] + E[X̄²]}
= Σi mi {M² + τ² + η²/mi - 2(M² + τ² mi/m + η²/m) + M² + τ² Σj mj²/m² + η²/m}
= τ² (Σi mi - 2 Σi mi²/m + Σj mj²/m) + η² Σi (1 - 2mi/m + mi/m).

Therefore: E[ξ²] = τ² (m - Σi mi²/m) + η² (C - 1).

Thus, E[ {ξ² - η² (C - 1)} / {m - Σi mi²/m} ] = τ².
Therefore, if one takes EPV = (1/C) Σ vi, which is an unbiased estimator of η², and Π = m - Σ mi²/m (with the sums over i running from 1 to C), then an unbiased estimator of the Variance of the Hypothetical Means, τ², is:

VHM = {Σ mi (X̄i - X̄)² - EPV (C - 1)} / Π.

General Remarks:

The arrays of data have two dimensions. In my formulas the first dimension, the rows of the array, are different classes. However, rather than classes they could equally well be different policyholders, different territories, different states, different groups, different experiments, etc. While exam questions often have for simplicity only two or three classes, these techniques are usually not applied in practical applications unless there are at least 4 classes.47

In my formulas the second dimension, the columns of the array, are different years. However, rather than years they could equally well be different observations, different policyholders, etc.

It should be noted that while we have obtained unbiased estimators of the EPV and the VHM, their ratio is not necessarily an unbiased estimator of the Bühlmann Credibility Parameter K = EPV/VHM.48

Also it should be noted that the Bühlmann-Straub technique assumes a certain behavior of the covariances with size of risk and that the risk parameters are (approximately) stable over time. While these are the most commonly used assumptions, they do not hold in some real world applications.49 Thus as with all techniques, in practical applications one must be careful to only apply the Bühlmann-Straub empirical Bayes technique in appropriate circumstances.

47 See for example Gary Venterʼs Credibility Chapter of Editions 1 to 3 of Foundations of Casualty Actuarial Science.
48 See for example Gary Venterʼs Credibility Chapter of Editions 1 to 3 of Foundations of Casualty Actuarial Science.
49 See for example William R. Gillamʼs "Parametrizing the Workers Compensation Experience Rating Plan," PCAS 1992, and Howard Mahlerʼs discussion in PCAS 1993; "A Markov Chain Model of Shifting Risk Parameters," by Howard Mahler, PCAS 1997; and "Credibility with Shifting Risk Parameters, Risk Heterogeneity and Parameter Uncertainty," by Howard Mahler, PCAS 1998.
Summary of Bühlmann-Straub Empirical Bayes Estimation:

Name                                                   Symbol
Number of Classes50                                    C
Frequency for Class i in Year t51                      Xit
Exposures for Class i in Year t52                      mit
Number of Years of data for Class i                    Yi
Sum of Exposures for Class i                           mi = Σt mit  (t = 1 to Yi)
Overall Sum of Exposures                               m = Σi Σt mit
Weighted Average for Class i                           X̄i = Σt mit Xit / mi
Overall Weighted Average                               X̄ = Σi Σt mit Xit / m
Expected Value of the Process Variance                 EPV = Σi Σt mit (Xit - X̄i)² / Σi (Yi - 1)
Total Exposures "Adjusted for Degrees of Freedom"      Π = m - Σi mi²/m
Variance of the Hypothetical Means                     VHM = {Σi mi (X̄i - X̄)² - EPV (C - 1)} / Π
Bühlmann Credibility Parameter                         K = EPV / VHM
Credibility for Class i                                Zi = mi / (mi + K)

50 Or number of policyholders or number of groups.
51 Or pure premium or loss ratio.
52 Or premiums or number of members in a group.
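The summary translates directly into code. The short Python sketch below is added for illustration only (the function name buhlmann_straub is just a label, not anything from the text); it implements the quantities in the table for data with varying exposures and varying numbers of years.

```python
# Illustrative sketch of the Bühlmann-Straub empirical Bayes summary above.
# m[i][t] holds the exposures and x[i][t] the frequency (or pure premium, loss
# ratio, ...) for class i in year t; classes may have different numbers of years.
def buhlmann_straub(m, x):
    classes = list(m)
    C = len(classes)
    m_i   = {i: sum(m[i].values()) for i in classes}                 # exposures by class
    m_tot = sum(m_i.values())                                        # overall exposures
    xbar_i = {i: sum(m[i][t] * x[i][t] for t in m[i]) / m_i[i] for i in classes}
    xbar   = sum(m_i[i] * xbar_i[i] for i in classes) / m_tot        # overall weighted mean

    # EPV: sum of m_it (X_it - Xbar_i)^2 over all cells, divided by sum of (Y_i - 1).
    epv = (sum(m[i][t] * (x[i][t] - xbar_i[i]) ** 2 for i in classes for t in m[i])
           / sum(len(m[i]) - 1 for i in classes))

    pi  = m_tot - sum(v ** 2 for v in m_i.values()) / m_tot          # "adjusted" exposures
    vhm = (sum(m_i[i] * (xbar_i[i] - xbar) ** 2 for i in classes) - epv * (C - 1)) / pi
    k   = epv / vhm
    z   = {i: m_i[i] / (m_i[i] + k) for i in classes}
    return {"EPV": epv, "VHM": vhm, "K": k, "Z": z,
            "class means": xbar_i, "overall mean": xbar}
```

For instance, called with the Poultry/Dairy data from the exercise earlier in this section (with the missing Dairy year simply omitted from its dictionary), it reproduces EPV of about 339, K of about 400, and credibilities of roughly 21% and 22%.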
Problems: Use the following information for the next 3 questions: Past data for a portfolio of group health policyholders are given below: Policyholder 1
Year 2 20 10
Sum 3 16 13
36 23
Losses # in group
1
Losses # in group
2
19 11
23 8
17 7
59 26
Losses # in group
3
26 14
30 17
35 18
91 49
Losses # in group
Sum
45 25
73 35
68 38
186 98
Note that there is no data from policyholder #1 in year 1. 5.1 (5 points) Estimate the Bühlmann-Straub credibility premium to be charged policyholder #1 in year 4, if you expect 15 in the group. A. Less than 28 B. At least 28 but less than 30 C. At least 30 but less than 32 D. At least 32 but less than 34 E. At least 34 5.2 (1 point) Estimate the Bühlmann-Straub credibility premium to be charged policyholder #2 in year 4, if you expect 10 in the group. A. Less than 17 B. At least 17 but less than 18 C. At least 18 but less than 19 D. At least 19 but less than 20 E. At least 20 5.3 (1 point) Estimate the Bühlmann-Straub credibility premium to be charged policyholder #3 in year 4, if you expect 20 in the group. A. Less than 36 B. At least 36 but less than 37 C. At least 37 but less than 38 D. At least 38 but less than 39 E. At least 39
Use the following information for the next three questions: There is data from private passenger automobile insurance policies, divided into 4 classes. Let Xij be the pure premium for policy j in class i. Let mij be the exposures for policy j in class i. Then Lij = mij Xij is the loss for policy j in class i. Class Number of Policies 1 2 3 4 Total
1140 1000 960 1060 4160
Number of Exposures 1741 1514 1456 1609 6320
Losses
Σ mij Xij2
1266 1390 1359 1846 5861
12,883 16,157 15,133 22,805 66,978
5.4 (3 points) Use nonparametric empirical Bayes methods to estimate the Expected Value of the Process Variance. A. 11 B. 12 C. 13 D. 14 E. 15 5.5 (3 points) Use nonparametric empirical Bayes methods to estimate the Variance of the Hypothetical Means. A. 0.010 B. 0.015 C. 0.020 D. 0.025 E. 0.030 5.6 (2 points) Determine the nonparametric empirical Bayes credibility premium for class 4, using the method that preserves total losses. A. 1.02 B. 1.04 C. 1.06 D. 1.08 E. 1.10 5.7 (4 points) ABC Insurance Company offers a policy for maid services that is rated on a per employee basis. The two insureds shown in the table below were randomly selected from ABCʼs policyholder database. Over a four-year period the following was observed: Year Insured 1 2 3 4 A Number of Claims 1 2 1 3 No. of Employees 20 22 20 18 B Number of Claims 1 0 1 No. of Employees 14 15 16 Estimate the expected annual claim frequency per employee for insured A using the empirical Bayes Bühlmann-Straub estimation model with the method that preserves total claims. A. 7.5% B. 7.7% C. 7.9% D. 8.1% E. 8.3%
Solutions to Problems: 5.1. A , 5.2. E , & 5.3. C. The mean pure premium for policyholder #1 is: 36/23 = 1.565. The mean pure premium for policyholder #2 is: 59/26 = 2.269. The mean pure premium for policyholder #3 is: 91/49 = 1.857. 1 1 EPV = C Y - 1
C
Y
∑ ∑ mij (Xij - Xi)2 = i=1 j=1
({(10)(2-1.565)2 + (13)(1.231 - 1.565)2 } + {(11)(1.727 - 2.269)2 + (8)(2.875 - 2.269)2 + (7)(2.429 - 2.269)2 } + {(14)(1.857 - 1.857)2 + (17)(1.765 - 1.857)2 + (18)(1.944 - 1.857)2 } ) / (1 + 2 + 2) = 1.994. Π = m - Σ mi2 / m = 98 - {232 + 262 + 492 } / 98 = 61.2. X = overall average = 186/98 = 1.898.
Σ mi ( X i - X )2 = (23)(1.565 - 1.898)2 + (26)(2.269 - 1.898)2 + (49)(1.857 - 1.898)2 = 6.211. Thus the estimated VHM = {6.211 - (1.994)(3 - 1)} / 61.2 = .0363. K = 1.994/.0363 = 54.9. Policyholder 1 2 3 Overall
Exposures Losses 23 36 26 59 49 91 98
186
Pure Premium 1.565 2.269 1.857
Z 29.5% 32.1% 47.2%
1.898
Comment: The losses could be in units of $1000.
Estimated P.P. 1.800 2.017 1.879
Year 4 Expos. 15 10 20
Premium 27.00 20.17 37.57
2013-4-12
Empirical Bayes Cred. §5 Varying Exposures & Years, HCM 10/22/12, Page 105 Y
∑ mij (Xij -
5.4. E, 5.5. C, 5.6. D.
Y
Xi)2
j=1
∑ mij Xij2 - 2mij Xij X i + mij X i2 = j=1
Y
Y
∑
=
mij X ij2
- 2Li X i +
mi X i 2
=
∑ mij Xij2 - Li2/mi. j=1
j=1
Σ Σ mij (Xij - X i )2 = ΣΣmij Xij2 - ΣLi2 /mi = 66,978 - {12662 /1741 + 13902 /1514 + 13592 /1456 + 18462 /1609} = 61,395. 1 1 EPV = C Y - 1
C
Y
∑ ∑ mij (Xij - Xi)2 = 61,395/(4160 - 4) = 14.77. i=1 j=1
Π = m - Σ mi2 / m = 6320 - (17412 + 15142 + 14562 + 16092 ) / 6320 = 4732.6. X = 5861/6320 = .927. X1 = 1266/1741 = .727. X2 = 1390/1514 = .918.
X 3 = 1359/1456 = .933. X 4 = 1846/1609 = 1.147.
Σ mi ( X i - X )2 = (1741)(0.727 - 0.927)2 + (1514)(0.918 - 0.927)2 + (1456)(0.933 - 0.927)2 + (1609)(1.147 - 0.927)2 = 147.7. C
∑ mi ( Xi -
X)2 - EPV (C - 1)
VHM = i=1
= {147.7 - (14.77)(4 - 1)}/4732.6 = 0.0218.
Π
K = EPV/VHM = 14.77/.0218 = 678. Estimated P.P. for Class 4: (70.4%)(1.147) + (1 - 70.4%)(.930) = 1.083. Class
Expos
Losses
P.P.
Z
Estimate
1 2 3 4
1741 1514 1456 1609
1266 1390 1359 1846
0.727 0.918 0.933 1.147
72.0% 69.1% 68.2% 70.4%
0.784 0.922 0.932 1.083
Cred. Wght.
0.930
Comment: The number of policies acts as the number of observations for each class. Taken from “Credible Risk Classification,” by Benjamin Joel Turner, Winter 2004 CAS Forum; the losses and pure premiums have been all divided by 1000.
5.7. A. The mean frequency for insured A is: 7/80 = .0875. The mean frequency for insured B is: 2/45 = .0444. 1 1 EPV = C Y - 1
C
Y
∑ ∑ mij (Xij - Xi)2 = i=1 j=1
{(20)(0.05 - 0.0875)2 + (22)(0.0909 - 0.0875)2 + (20)(0.05 - 0.0875)2 + (18)(0.1667 - 0.0875)2 + (14)(0.0714 - 0.0444)2 + (15)(0 - 0.0444)2 + (16)(0.0625 - 0.0444)2 } / (3 + 2) = 0.0429. Π = m - Σ mi2 / m = 125 - {802 + 452 }/125 = 57.6. X = overall average = 9/125 = .072.
Σ mi ( X i - X )2 = (80)(0.0875 - 0.072)2 + (45)(0.0444 - 0.072)2 = 0.0535. Thus the estimated VHM = {0.0535 - (0.0429)(2 - 1)} / 57.6 = 0.000184. K = 0.0429/0.000184 = 233. The credibilities are: 80/(80 + 233) = 0.256, and 45/(45 + 233) = 0.162. Credibility weighted average is: {(0.256)(0.0875) + (0.162)(0.0444)}/(0.256 + 0.162) = 0.0708. Estimated frequency for Insured A: (.256)(0.0875) + (1 - 0.256)(0.0708) = 0.0751. Comment: Similar to Exercise 15 in “Topics in Credibility” by Dean. Estimated frequency for Insured B: (0.162)(0.0444) + (1 - 0.162)(0.0708) = 0.0665.
Section 6, Using an A Priori Mean

One can use similar techniques to those discussed in the previous sections, but employing an a priori estimate of the overall mean, µ.53 In that case, one uses the same estimator for the EPV, but uses a different and somewhat more stable estimator for the VHM. Also, (1 - Z) is applied to the a priori mean, µ.

No Variation in Exposures:

When there is no variation in exposures (or years), then when using an a priori estimate of the overall mean, µ, the estimate of the VHM is:

VHM = Σ (X̄i - µ)² / C - EPV/Y, with the sum over i running from 1 to C.

Exercise: You have data for 2 policies over three years. Both policies are currently charged a rate based on an expected annual loss of $587. Use nonparametric Bayesian Estimation to estimate the future annual loss for each policy.

Policy     Losses: Year 1   Year 2   Year 3
A                    404      433      537
B                    632      551      660

53 Loss Models sometimes refers to this prior estimate, µ, as the manual rate.
[Solution: The EPV is estimated as 4048, in the usual manner, making no use of the a priori estimate of the mean:

Policy     Losses (Years 1-3)     Mean     Sample Variance
A          404, 433, 537          458.0         4,891
B          632, 551, 660          614.3         3,204
Average                                         4,048

The mean for Policy A is 458. The mean for Policy B is 614.3. We are given µ = 587.
VHM = Σ (X̄i - µ)²/C - EPV/Y = {(458 - 587)² + (614.3 - 587)²}/2 - 4048/3 = 7344.
K = 4048/7344 = 0.55. There are 3 years of data. Thus Z = 3/(3 + 0.55) = 84.5%.
The estimated future annual loss for Policy A is: (0.845)(458) + (0.155)(587) = 478.
The estimated future annual loss for Policy B is: (0.845)(614.3) + (0.155)(587) = 610.
Comment: Note that we apply the complement of credibility to the a priori mean of 587, rather than to the observed overall mean of 536.]

Summary:

µ = a priori estimate of the mean.
The estimate of the EPV is the same as without an a priori mean:
EPV = Σ si²/C = ΣΣ (Xit - X̄i)² / {C (Y - 1)}, with the outer sum over classes i = 1 to C and the inner sum over years t = 1 to Y.
The estimate of the VHM differs from that without an a priori mean:
VHM = Σ (X̄i - µ)²/C - EPV/Y; if the estimated VHM is negative, set Z = 0.
Estimate is: Z(observation) + (1 - Z) µ.
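A short illustrative Python sketch of this method follows (it is not part of the text); it uses the two-policy exercise above, with the a priori mean of 587.

```python
# Illustrative sketch of the Section 6 method (a priori mean, no variation in exposures).
losses = {"A": [404, 433, 537], "B": [632, 551, 660]}
mu, Y, C = 587, 3, 2

means = {p: sum(x) / Y for p, x in losses.items()}
sample_var = {p: sum((v - means[p]) ** 2 for v in x) / (Y - 1) for p, x in losses.items()}

epv = sum(sample_var.values()) / C                               # 4048
vhm = sum((means[p] - mu) ** 2 for p in means) / C - epv / Y     # 7344
K = epv / vhm                                                    # 0.55
Z = Y / (Y + K)                                                  # 84.5%

for p in means:
    print(p, round(Z * means[p] + (1 - Z) * mu))                 # 478 and 610
```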
Working with Only One Policy, No Variation in Exposures:

In the case where you use an assumed a priori mean, one could even apply this technique to a single policy or class. With C = 1 and no variation in exposures, the estimators for the EPV and VHM simplify to:

EPV = Σ (Xt - X̄)² / (Y - 1), with the sum over t running from 1 to Y.
VHM = (X̄ - µ)² - EPV/Y.

Exercise: You have data for a policy over three years. The policy is currently charged a rate based on an expected annual loss of $587. Use nonparametric Bayesian Estimation to estimate the future annual loss for this policy.
Losses: Year 1: 404, Year 2: 433, Year 3: 537.

[Solution: The EPV is estimated as 4891, from the sample variance, making no use of the a priori estimate of the mean.
Losses: 404, 433, 537.  Mean: 458.0.  Sample Variance: 4,891.
The observed mean is 458. We are given µ = 587.
VHM = (X̄ - µ)² - EPV/Y = (458 - 587)² - 4891/3 = 15,011.
K = 4891/15,011 = 0.33. There are 3 years of data. Thus Z = 3/(3 + 0.33) = 90.1%.
The estimated future annual loss for this policy is: (0.901)(458) + (0.099)(587) = 471.]
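The single-policy case is compact enough to sketch in a few lines of Python. The snippet below is added for illustration only and simply re-does the exercise above.

```python
# Illustrative sketch of the single-policy case above (C = 1, a priori mean mu = 587).
losses = [404, 433, 537]
mu = 587                                                # a priori mean ("manual rate")
Y = len(losses)

xbar = sum(losses) / Y                                  # 458
epv = sum((v - xbar) ** 2 for v in losses) / (Y - 1)    # 4891 (the sample variance)
vhm = (xbar - mu) ** 2 - epv / Y                        # 15,011
K = epv / vhm                                           # about 0.33
Z = Y / (Y + K)                                         # about 90%
print(round(Z * xbar + (1 - Z) * mu))                   # 471
```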
Problems: 6.1 (2 points) For a particular policyholder, the manual rate is 15 per year. The past claims experience is: Year 1 2 3 Claims 13 21 19 Estimate the Bühlmann credibility premium for the next year for the policyholder. A. Less than 15 B. At least 15 but less than 16 C. At least 16 but less than 17 D. At least 17 but less than 18 E. At least 18 6.2 (3 points) An insurer has data on losses for four policyholders for seven years. Xij is the loss from the ith policyholder for year j. 4
7
∑ ∑(Xij - Xi )2
= 33.60
i=1 j=1
X1 = 1.21, X 2 = 2.98, X 3 = 0.49, X 4 = 1.72. Each policyholder is charged an annual rate based on 1.70 in expected losses. Calculate the Bühlmann credibility factor for an individual policyholder using nonparametric empirical Bayes estimation. (A) Less than 0.74 (B) At least 0.74, but less than 0.77 (C) At least 0.77, but less than 0.80 (D) At least 0.80, but less than 0.83 (E) At least 0.83 6.3 (3 points) You have data for 3 policies over four years. Policy A B C
1 198 203 177
Losses 2 249 227 210
Total
578
686
3 205 220 185
4 212 231 192
Total 864 881 764
610
635
2509
For each policy the manual rate is $210 per year. Estimate the Bühlmann credibility premium for the next year for policy C. A. 193 B. 195 C. 197 D. 199 E. 201
6.4 (3 points) For each of 3 years, we have the number of claims for each of 2 insureds: A B
1
2
3
100 90
110 80
120 100
A priori you expected 95 claims per year for each insured. Estimate the number of claims for insured A next year, using Nonparametric Empirical Bayes estimation. A. 103 or less B. 104 C. 105 D. 106 E. 107 or more 6.5 (3 points) Prior to any observations you assume the average survival time is 20 years. Survival times are available for six insureds, three from Class A and three from Class B. The three from Class A died at times t = 17, t = 32, and t = 39. The three from Class B died at times t = 12, t = 20, and t = 24. Nonparametric Empirical Bayes estimation is used to estimate the mean survival time for each class. What is the estimated mean survival time for Class A? (A) Less than 23 (B) At least 23, but less than 24 (C) At least 24, but less than 25 (D) At least 25, but less than 26 (E) At least 26 6.6 (3 points) A priori you project $400 in losses in 2005 for each insured. Two insureds produced the following losses over a three-year period: Annual Losses Insured 2001 2002 2003 Thelma $350 $230 $300 Louise $270 $510 $390 Inflation is 10% per year. Using the nonparametric empirical Bayes method, estimate the losses in 2005 for Louise. A. 415 B. 420 C. 425 D. 430 E. 435
Solutions to Problems: 6.1. B. The mean annual claims are: (13 + 21 + 19)/3 = 17.67. The estimated EPV = sample variance = {(13 - 17.67)2 + (21 - 17.67)2 + (19 - 17.67)2 } / (3 - 1) = 17.3. The estimated VHM = (17.67 - 15)2 - 17.3 / 3 = 1.36. K = 17.3/1.36 = 12.7. For the three years of data, Z = 3/(3 + 12.7) = 19.1%. The estimate for the next year is: (0.191)(17.67) + (0.809)(15) = 15.50. 6.2. B. Estimated EPV = average of the sample variances for each policyholder = 1 1 C Y - 1
C
Y
∑ ∑ (Xij - X i)2 = (1/4)(1/6)(33.60) = 1.4. i=1 j=1
1 Estimated VHM = C
C
∑
(Xi - µ)2 -
i=1
_ EPV = (1/4) Σ( Xi - 1.72)2 - 1.4/7 = Y
(1/4) {(1.21 - 1.70)2 + (2.98 - 1.70)2 + (0.49 - 1.70)2 + (1.72 - 1.70)2 } - 0.2 = 0.636. K = EPV/VHM = 1.4/0.636 = 2.20. With 7 years of data, Bühlmann credibility factor = Z = 7/(7 + K) = 7/9.2 = 76.1%. 6.3. D. The EPV is estimated as 289, in the usual manner making no use of the a priori estimate of the mean: Policy 1 198 203 177
A B C
Losses 2 249 227 210
3 205 220 185
Average
1 VHM = C
4 212 231 192
Mean
Sample Variance
216.0 220.2 191.0
517 153 198 289
C
= ∑ (Xi - µ)2 - EPV Y i=1
{(216 - 210)2 + (220.2 - 210)2 + (191 - 210)2 }/3 - 289/4 = 94.8. K = 289/94.8 = 3.05. There is 4 years of data. Thus Z = 4/(4 + 3.05) = 56.7%. The estimated future annual loss for Policy C is: (0.567)(191) + (0.433)(210) = 199.2. Comment: Since we are given an a priori mean, the formula for the estimated VHM is different.
6.4. D. EPV = (100 + 100)/2 = 100.
A B
1
2
3
Mean
100 90
110 80
120 100
110.00 90.00
Sample Variance 100.00 100.00
100.000
100.000
Mean
1 VHM = C
C
= {(110 - 95)2 + (90 - 95)2 }/2 - 100/3 = 91.67. ∑ (Xi - µ)2 - EPV Y i=1
K = EPV/VHM = 100/91.67 = 1.09. Z = 3/(3 + 1.09) = 0.733. Estimated frequency for insured A: (0.733)(110) + (1 - 0.733)(95) = 106.0. 6.5. B. EPV = average of the sample variances for each class = 81.833. Class
First Survival Time
Second Survival Time
Third Survival Time
Mean
Sample Variance
A B
17 12
32 20
39 24
29.333 18.667
126.333 37.333
24.000
81.833
Average
1 Estimated VHM = C
C
= ∑ (Xi - µ)2 - EPV Y i=1
{(29.333 - 20)2 + (18.667 - 20)2 }/2 - 81.833/3 = 17.16. Estimated K = 81.833/17.16 = 4.77. Z = 3/(3 + 4.77) = 0.386. Estimated future mean survival time for class A: (0.386)(29.333) + (1 - 0.386)(20) = 23.6. 6.6. B. First, inflate all of the losses up to the 2005 level. For example, (1.14 )(350) = 512. Insured
2001
Year 2002
2003
Mean
Sample Variance
Thelma Louise
512 395
306 679
363 472
394 515
11,354 21,509
Average
Estimated EPV = 16,432. Estimated VHM = {(394 - 400)2 + (515 - 400)2 }/2 - 16,432/3 = 1153. K = 16,432/1153 = 14.25. Z = 3/(3 + 14.25) = 17.4%. The estimate for Louise is: (0.174)(515) + (1 - 0.174)(400) = 420.
16,432
Section 7, Using an A Priori Mean, Variation in Exposures54

When one employs an a priori estimate of the overall mean and there is variation in exposures, then one uses the same estimator for the EPV as was used in the absence of an a priori mean, but a somewhat different estimator for the VHM:

VHM = {Σ mi (X̄i - µ)² - (C)(EPV)} / m, with the sum over i running from 1 to C.

Exercise: You have data for 2 classes over three years. Both classes are currently charged a rate based on a pure premium of 2.50. Use nonparametric Bayesian Estimation to estimate the future pure premium for each class.

                     Exposures                   Losses
Class                Year 1   Year 2   Year 3    Year 1   Year 2   Year 3
Poultry Farms          41       37       29        232      33       237
Dairy Farms            58       59       53        104      60       151

[Solution: The EPV was previously estimated as 254. The mean pure premium for Poultry is: (232 + 33 + 237)/(41 + 37 + 29) = 502/107 = 4.69. The mean pure premium for Dairy is: (104 + 60 + 151)/(58 + 59 + 53) = 315/170 = 1.85. m = overall exposures = 277.
Σ mi (X̄i - µ)² = (107)(4.69 - 2.50)² + (170)(1.85 - 2.50)² = 585.
Thus the estimated VHM = {585 - (2)(254)}/277 = 0.278. K = 254/0.278 = 914.
For the Poultry class the three years of data have a total of 107 exposures. Thus Z = 107/(107 + 914) = 10.5%.
For the Dairy class the three years of data have a total of 170 exposures. Thus Z = 170/(170 + 914) = 15.7%.
The estimated future pure premium for Poultry Farms is: (0.105)(4.69) + (0.895)(2.50) = 2.73.
The estimated future pure premium for Dairy Farms is: (0.157)(1.85) + (0.843)(2.50) = 2.40.

Class              Exposures   Losses   Pure Premium     Z       Estimated P.P.
Poultry Farms         107        502        4.69        10.5%         2.73
Dairy Farms           170        315        1.85        15.7%         2.40

Comment: Note that we apply the complement of credibility to the a priori mean pure premium of 2.50, rather than to the observed overall mean of 2.95. Loss Models would refer to this a priori mean of 2.50 as the "manual rate."]

54 See pages 629-630, including Example 20.35, in Loss Models.
Comparison to the Previous Case with No A Priori Mean:

In a previous section, when we did not have an a priori mean, the formulas for estimating the VHM were more complicated:

VHM = {Σ mi (X̄i - X̄)² - EPV (C - 1)} / Π, with Π = m - Σ mi²/m (sums over i from 1 to C).

The use of what I have called Π in the denominator rather than m, and the multiplication of the EPV by C - 1 rather than C, were needed in order to make the estimator of the VHM unbiased. They are analogous to the denominator of N - 1 in the sample variance. In order to get an unbiased estimator of the variance, when we use X̄ rather than µ, we adjust for the number of degrees of freedom, and use N - 1 rather than N in the denominator of the sample variance. If we know the mean µ, then we can estimate the variance instead by: Σ(Xi - µ)²/N. Similarly, here when we are given an a priori mean, in the formula to estimate the VHM there is no need to adjust for an analog of the number of degrees of freedom.

Summary:

µ = a priori estimate of the mean.
The estimate of the EPV is the same as without an a priori mean:
EPV = Σ vi / C = ΣΣ mit (Xit - X̄i)² / {C (Y - 1)}, with the outer sum over classes i = 1 to C and the inner sum over years t = 1 to Y.
The estimate of the VHM differs from that without an a priori mean:
VHM = {Σ mi (X̄i - µ)² - (C)(EPV)} / m; if the estimated VHM is negative, set Z = 0.
Estimate is: Z(observation) + (1 - Z) µ.
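Here is a short illustrative Python sketch of this method, added for reference and not part of the text. It uses the Poultry/Dairy exercise above with µ = 2.50, and recomputes the per-class process variances from the year-by-year data rather than taking EPV = 254 as given.

```python
# Illustrative sketch of the Section 7 method (a priori mean, varying exposures).
exposures = {"Poultry": [41, 37, 29], "Dairy": [58, 59, 53]}
losses    = {"Poultry": [232, 33, 237], "Dairy": [104, 60, 151]}
mu, Y = 2.50, 3

m_i  = {c: sum(v) for c, v in exposures.items()}                 # 107 and 170
pp_i = {c: sum(losses[c]) / m_i[c] for c in exposures}           # 4.69 and 1.85
m    = sum(m_i.values())                                         # 277

# EPV is estimated exactly as without an a priori mean:
v_i = {c: sum(e * (l / e - pp_i[c]) ** 2
              for e, l in zip(exposures[c], losses[c])) / (Y - 1) for c in exposures}
epv = sum(v_i.values()) / len(v_i)                               # about 254

# The VHM uses the a priori mean, with no degrees-of-freedom adjustment:
vhm = (sum(m_i[c] * (pp_i[c] - mu) ** 2 for c in m_i) - len(m_i) * epv) / m   # about 0.28
K = epv / vhm                                                    # about 920 (914 in the text, with rounded intermediates)

for c in m_i:
    Z = m_i[c] / (m_i[c] + K)
    print(c, round(Z * pp_i[c] + (1 - Z) * mu, 2))               # 2.73 and 2.40
```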
Working with Only One Class, Variation in Exposures:

In the case where exposures vary and you use an assumed a priori mean, one can apply this technique to a single class. With C = 1, the estimators for the EPV and VHM simplify to:

EPV = Σ mt (Xt - X̄)² / (Y - 1), with the sum over t running from 1 to Y.
VHM = (X̄ - µ)² - EPV/m.

Exercise: You have data from one class over three years. The class is currently charged a rate based on a pure premium of 2.50. Use nonparametric Bayesian Estimation to estimate the future pure premium for the class.

                     Exposures                   Losses
Class                Year 1   Year 2   Year 3    Year 1   Year 2   Year 3
Poultry Farms          41       37       29        232      33       237

[Solution: The mean pure premium for Poultry is: (232 + 33 + 237) / (41 + 37 + 29) = 502/107 = 4.69.
For Poultry the estimated process variance is:
{(41)(5.66 - 4.69)² + (37)(0.89 - 4.69)² + (29)(8.17 - 4.69)²} / (3 - 1) = 462.0.
Thus the estimated VHM = (4.69 - 2.50)² - 462/107 = 0.478. K = 462/0.478 = 967.
For the Poultry class the three years of data have a total of 107 exposures. Thus Z = 107/(107 + 967) = 10.0%.
The estimated future pure premium for Poultry Farms is: (0.100)(4.69) + (0.900)(2.50) = 2.72.]
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 117 Problems: Use the following information for the next four questions: You have data for 3 classes over four years: Class A B C
Exposures 2001 2002 11 14 20 21 15 14
2003 15 19 12
2004 18 20 12
2001 $2442 $4020 $2880
Losses 2002 $2926 $4725 $2632
2003 $2970 $4028 $2472
2004 $4230 $4120 $2328
Each class is currently charged a rate based on expected losses of $200 per exposure. 7.1 (4 points) Use nonparametric Bayesian Estimation to estimate the expected value of the process variance of the pure premiums. A. 1800 B. 2000 C. 2200 D. 2400 E. 2600 7.2 (2 points) Use nonparametric Bayesian Estimation to estimate the variance of the hypothetical mean pure premiums. A. Less than 110 B. At least 110 but less than 120 C. At least 120 but less than 130 D. At least 130 but less than 140 E. At least 140 7.3 (1 point) Use nonparametric Bayesian Estimation to estimate the amount of credibility to be assigned to the data for Class A. A. Less than 65% B. At least 65% but less than 70% C. At least 70% but less than 75% D. At least 75% but less than 80% E. At least 80% 7.4 (1 point) Use nonparametric Bayesian Estimation to estimate the expected losses in the year 2006 for Class B, assuming exposures of 21 for Class B in the year 2006. A. Less than 4250 B. At least 4250 but less than 4300 C. At least 4300 but less than 4350 D. At least 4350 but less than 4400 E. At least 4400
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 118 7.5 (2 points) You are given the following data for a single group: Year 1 Year 2 Aggregate Losses 110,000 160,000 Number of Members 400 500 Using an a priori mean, the variance of the hypothetical means has been estimated as 120. Using the nonparametric empirical Bayes method, determine the credibility factor to be assigned to the data for this group for purposes of predicting the pure premium of this group in Year 3. (A) 10% (B) 15% (C) 20% (D) 25% (E) 30% 7.6 (3 points) Use the following data for 200 classes over 5 years: Pure Premium for Class i in year t Xit = = relativity for class i in year t. Pure premium in Year t for all classes mit = exposures for class i in year t. 5
mi =
∑ mit = total exposures over 5 years for class i. t=1 5
∑ mit Xit _ Xi = t=1 = weighted average relativity for class i. mi 200 5
∑ ∑ mit (Xit - Xi
)2
200
= 8.350 x
1011.
i=1 t=1
i=1 200
200
∑ mi (Xi i=1
∑ mi = 2.351 x 108.
-
1)2
= 2.072 x
1012.
∑ mi2 = 9.423 x 1012. i=1
You estimate the mean future pure premium for all classes to be $3.15. Using nonparametric Bayesian Estimation, what is the estimated future pure premium for a class with a weighted average relativity of 0.872 and a total of 183,105 exposures over five years? Hint: The a priori mean relativity compared to average is 1. A. Less than $2.95 B. At least $2.95 but less than $3.00 C. At least $3.00 but less than $3.05 D. At least $3.05 but less than $3.10 E. At least $3.10
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 119 7.7 (3 points) For a group policyholder, we have the following data available: 2001 2002 2003 2004 Losses $38,000 $45,000 $42,000 Number in Group 100 110 90 80 (estimate) If the manual rate per person is $370 per year, estimate the total credibility premium for the year 2004. A. 31,750 B. 32,000 C. 32,250 D. 32,500 E. 32,750 7.8 (3 points) Use the following information for two classes. Number of Claims Class Year 1 Year 2 Year 3 Total A 2 4 3 9 B 4 9 6 19 Number of Exposures Class Year 1 Year 2 Year 3 Total A 100 150 200 450 B 200 200 200 600 Frequency Class Year 1 Year 2 Year 3 Total A 0.02000 0.02667 0.01500 0.02000 B 0.02000 0.04500 0.03000 0.03167 A priori, one assumes each class has a mean frequency of 3%. In year 4, there are 300 exposures for Class A and 200 exposures for Class B. Using nonparametric empirical Bayes credibility, estimate the number of claims in year 4 for Class A. A. 6.5 B. 7.0 C. 7.5 D. 8.0 E. 8.5
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 120 7.9 (3 points) You are a pricing actuary in the group insurance line at Big Ticket Insurance. The Chief Actuary requests that you perform a credibility analysis and determine a claims rate for Product XYZ. You are provided the following information on Product XYZ for the previous calendar year: • There was data on 110 different Members. • Average Number of Members per month: 100 • Claims Cost Per Member Per Month (PMPM): $300 • Manual Rate Cost PMPM: $550 • mi = the number of months of data for member i • Xij is the claims cost for member i in month j. • X i is the average claim cost for member i. 110 mi
∑ ∑(Xit - X i )2
= 8000 million.
i=1 t=1 110
∑ mi (X i
- 550)2 = 820 million.
i=1
Using nonparametric Empirical Bayes credibility, estimate the claims cost (PMPM) associated with Product XYZ. (A) Less than 400 (B) At least 400, but less than 420 (C) At least 420, but less than 440 (D) At least 440, but less than 460 (E) At least 460 7.10 (4, 11/05, Q.11 & 2009 Sample Q.223) (2.9 points) You are given the following data: Year 1 Year 2 Total Losses 12,000 14,000 Number of Policyholders 25 30 The estimate of the variance of the hypothetical means is 254. Determine the credibility factor for Year 3 using the nonparametric empirical Bayes method. (A) Less than 0.73 (B) At least 0.73, but less than 0.78 (C) At least 0.78, but less than 0.83 (D) At least 0.83, but less than 0.88 (E) At least 0.88
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 121 Solutions to Problems: 7.1. D. The EPV is estimated as 2369 in the usual manner making no use of the a priori estimate of 1 the mean: vi = Y - 1
Y
∑ mit (Xit - Xi )2 . t=1
Estimated EPV = (1/C) Σvi = (1/3)(4137 + 2211 + 758) = 2369. Class C B A
2001 15 20 11
A B C
2001 $222.00 $201.00 $192.00
A B C
Exposures 2002 14 21 Pure Premium 2002 $209.00 $225.00 $188.00
2003 12 19 15
2004 12 20 18
$2880 $4020 $2442 2001
2003 $198.00 $212.00 $206.00
2004 $235.00 $206.00 $194.00
Overall $216.690 $211.162 $194.566
(exposures)(p.p. - overall p.p.)^2 310.2 827.8 5239.5 2065.5 4021.0 13.3 98.8 603.6 1568.8
6034.8 533.0 3.8
(sigma i)^2 4137 2211 758
Losses
2369
7.2. A. Using the EPV, the pure premiums by class, and exposures by class, all calculated in the previous solution: C
∑ mi (X i - µ)2 - (C)(EPV)
VHM = i=1
m
=
{{(58)(216.69 - 200)2 + (80)(211.16 - 200)2 + (53)(194.57 - 200)2 } - (3)(2369)} / 191 = 107.7. 7.3. C. K = 2369 / 107.7 = 22.0. Class A had 58 exposures. Z = 58/(58 + 22) = 72.5%. 7.4. D. K = 2369 / 107.7 = 22.0. Class B had 80 exposures. Z = 80/(80 + 22.0) = 78.4%. Estimated P.P. = (0.784)(211.16) + (0.216)(200) = 208.75. Thus the estimated losses for 21 exposures are: (21)(208.75) = $4384.
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 122 7.5. C. The pure premiums for the two years are: 110000/400 = 275 and 160000/500 = 320. The overall pure premium for two years is: 270000/900 = 300. 1 vi = Y - 1
Y
∑ mit (Xit - Xi )2 = (1/(2 - 1)) {(400)(275 - 300)2 + (500)(320 - 300)2} = 450,000. t=1
Since there is only one group this 450,000 is the estimate of the EPV. K = EPV/ VHM = 450,000/120 = 3750. There are 900 exposures in the two years of data. Z = 900/(900 + 3750) = 19.4%. Comment: Similar to 4, 11/05, Q.11. 1 1 7.6. A. EPV = C Y - 1
C
Y
∑ ∑ mij (Xij - Xi)2 = 8.350 x 1011/ {(200)(5-1)} = i=1 j=1
1.044 x 109 . A priori we assume a relativity of 1 for a class chosen at random. C
∑ mi (X i -
VHM = i=1
C
µ)2 m
- (C)(EPV)
∑ mi (X i - 1)2 - (C)(EPV)
= i=1
m
{2.072 x 1012 - (200)(1.044 x 109 )} / (2.351 x 108 ) = 7925. Estimated K = 1.044 x 109 / 7925 = 131,735. For given class, Z = 183105/(183105 + 131735) = 58.2%. Estimated future relativity for given class = (0.582)(0.872) + (0.418)(1) = 0.926. Estimated future pure premium for given class = (0.926)($3.15) = $2.92. 7.7. C. The mean dollars per person are: ($38000 + $45000 + $42000)/(100 + 110 + 90) = $416.67. The observed pure premiums by year are: 38000/100 = 380, 45000/110 = 409.09, and 42000/90 = 466.67. Therefore, the estimated EPV = {(100)(380 - 416.67)2 + (110)(409.09 - 416.67)2 + (90)(466.67 - 416.67)2 }/(3 - 1) = 182,879. The estimated VHM = (416.67 - 370)2 - 182,879 / (100 + 110 + 90) = 1568. K = 182,879 / 1568 = 116.6. A total of 300 exposures, Z = 300/(300 + 116.6) = 72.0%. The estimated pure premium for the next year is: (0.720)(416.67) + (0.280)(370) = 403.60. With 80 persons expected in 2004, the estimated premium is: (80)(403.60) = 32,288.
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 123

7.8. E. v_i = Σ_{t=1 to Y} m_it (X_it - X̄_i)² / (Y - 1) = estimated process variance for class i.
Estimated EPV = (1/C) Σ v_i = (0.00583 + 0.03167)/2 = 0.01875.

Number of Claims:
Class   Year 1   Year 2   Year 3   Sum
A          2        4        3       9
B          4        9        6      19

Number of Exposures:
Class   Year 1   Year 2   Year 3   Sum
A        100      150      200     450
B        200      200      200     600

Frequency:
Class   Year 1    Year 2    Year 3    Weighted Avg.
A       0.02000   0.02667   0.01500   0.02000
B       0.02000   0.04500   0.03000   0.03167

(# of Exposures)(Frequency - 3 year average)²:
Class   Year 1    Year 2    Year 3    Process Variance
A       0.00000   0.00667   0.00500   0.00583
B       0.02722   0.03556   0.00056   0.03167
Average process variance = 0.01875.

X̄_1 = 9/450 = 0.02000. X̄_2 = 19/600 = 0.03167. µ = 0.03.
VHM = {Σ_{i=1 to C} m_i (X̄_i - µ)² - (C)(EPV)} / m
= {(450)(0.02000 - 0.03)² + (600)(0.03167 - 0.03)² - (2)(0.01875)} / 1050 = 0.00000874.
K = EPV/VHM = 0.01875/0.00000874 = 2145. Z_A = 450/(450 + 2145) = 0.173.
Estimated frequency for Class A is: (0.173)(0.0200) + (1 - 0.173)(0.0300) = 0.0283.
(300 exposures)(0.0283) = 8.49 claims.
Comment: Note that since we are given an a priori mean, the formula for the estimated VHM is different than when we did not have an a priori mean.
2013-4-12 Emp. Bayes Cred. §7 A Priori Mean & Varying Expos., HCM 10/22/12, Page 124

7.9. A. Σ_{i=1 to C} (m_i - 1) = Σ_{i=1 to 110} (m_i - 1) = m - 110 = (12)(100) - 110 = 1090.
EPV = Σ_{i=1 to C} Σ_{j=1 to m_i} (X_ij - X̄_i)² / Σ_{i=1 to C} (m_i - 1) = 8000 million / 1090 = 7.34 million.
VHM = {Σ_{i=1 to C} m_i (X̄_i - µ)² - (C)(EPV)} / m = {820 million - (110)(7.34 million)} / ((12)(100)) = 0.0105 million.
K = EPV/VHM = 7.34 / 0.0105 = 699.
We have: (12)(100) = 1200 member-months of data. Z = 1200/(1200 + 699) = 63.2%.
Estimated claims cost PMPM is: (300)(63.2%) + (550)(1 - 63.2%) = $392.
Comment: Loosely based on Q. 18d of the Fall 2008 Group and Health - Design and Pricing Exam of the SOA.

7.10. D. The pure premiums for the two years are: 12000/25 = 480 and 14000/30 = 466.67.
The overall pure premium for two years is: 26000/55 = 472.73.
v_i = Σ_{t=1 to Y} m_it (X_it - X̄_i)² / (Y - 1) = (1/(2 - 1)) {(25)(480 - 472.73)² + (30)(466.67 - 472.73)²} = 2423.
Since there is only one group this 2423 is the estimate of the EPV.
K = EPV/VHM = 2423/254 = 9.5. There are 55 exposures in the two years of data.
Z = 55/(55 + 9.5) = 85.3%.
Comment: I assumed that by “the credibility factor for Year 3” they meant the credibility assigned to the two years of data, in order to estimate year 3. It is unclear what other estimate of the pure premium for year 3 would be getting weight 1 - Z; perhaps it would be given to an a priori mean not mentioned in the question. One can not use nonparametric empirical Bayes methods to estimate the VHM from the data for one group, unless one is given an a priori mean; therefore I have included it in this section of the study guide. In my opinion, this is a poorly written exam question.
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 125

Section 8, Assuming a Poisson Frequency55

As in Semiparametric Estimation, one can assume a Poisson Frequency, in which case, the estimated EPV is equal to the observed overall mean.56

No Exposures, Same Number of Years of Data:

As previously, assume there are 3 drivers in a particular rating class.
For each of 5 years, we have the number of claims for each of these 3 drivers:

Driver   Year 1   Year 2   Year 3   Year 4   Year 5
Hugh        0        0        0        0        0
Dewey       0        1        0        0        0
Louis       0        0        2        1        0

However, now assume each driver has a Poisson frequency.
Then, the estimated EPV = X̄ = 4/15 = 0.2667.
As previously, the estimated VHM = (the sample variance of the class means) - EPV / Y
= Σ_{i=1 to C} (X̄_i - X̄)² / (C - 1) - EPV/Y = 0.0933 - 0.2667/5 = 0.0400.57
K = EPV/VHM = .2667/.0400 = 6.667.58 Z = 5/(5 + K) = 0.429. The estimated future claim frequency of each driver: Hugh: (0.429)(0) + (1 - 0.429)(0.2667) = 0.152. Dewey : (0.429)(.2) + (1 - 0.429)(0.2667) = 0.238. Louis : (0.429)(.6) + (1 - 0.429)(0.2667) = 0.410. We note that the resulting estimates are in balance: (0.152 + 0.238 + 0.410) / 3 = 0.2667 = the observed mean claims frequency.
55 See page 29 and Exercise 20 in “Topics in Credibility” by Dean.
56 See “Mahlerʼs Guide to Semiparametric Estimation” and “Mahlerʼs Guide to Conjugate Priors.”
57 The means are: 0, 0.2, and 0.6; their sample variance is: {(0 - 0.2667)² + (0.2 - 0.2667)² + (0.6 - 0.2667)²}/(3 - 1) = 0.0933.
58 In the absence of the Poisson assumption, the estimated K was 12.5.
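The arithmetic above is easy to check with a short script. The following is merely an illustrative sketch (in Python; the variable names are my own choices, not anything from the syllabus), reproducing the EPV, VHM, K, Z, and the three estimated frequencies for Hugh, Dewey, and Louis:

# Sketch: empirical Bayes credibility with a Poisson frequency assumption,
# no exposures, same number of years of data for each driver.
claims = {"Hugh": [0, 0, 0, 0, 0],
          "Dewey": [0, 1, 0, 0, 0],
          "Louis": [0, 0, 2, 1, 0]}

C = len(claims)                        # number of drivers
Y = 5                                  # years of data per driver
means = {name: sum(x) / Y for name, x in claims.items()}
grand_mean = sum(sum(x) for x in claims.values()) / (C * Y)

epv = grand_mean                       # Poisson assumption: EPV = overall mean = 0.2667
sample_var_of_means = sum((m - grand_mean) ** 2 for m in means.values()) / (C - 1)
vhm = sample_var_of_means - epv / Y    # 0.0933 - 0.2667/5 = 0.0400
k = epv / vhm                          # 6.667
z = Y / (Y + k)                        # 0.429

for name, m in means.items():
    estimate = z * m + (1 - z) * grand_mean
    print(f"{name}: {estimate:.3f}")   # Hugh 0.152, Dewey 0.238, Louis 0.410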
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 126

Connection to Semiparametric Estimation:59

In this situation with no exposures and without differing number of years of data, one could get the same result via Semiparametric Estimation.
Treat five years of data from one driver as one draw from the risk process. Over 5 years we have one driver with 0 claims, one driver with 1 claim and one driver with 3 claims.
Estimated EPV = X̄ = (0 + 1 + 3)/3 = 4/3.
The sample variance is: {(0 - 4/3)² + (1 - 4/3)² + (3 - 4/3)²} / (3 - 1) = 7/3.
Estimated VHM = 7/3 - 4/3 = 1.
K = EPV/VHM = (4/3)/1 = 4/3, where one draw from the risk process is five years of data.
Z = 1/(1 + K) = 3/7 = 0.429, matching the previous result.
If one had one year of data from each insured, with one exposure from each insured, then using Empirical Bayes Estimation:
VHM = Σ_{i=1 to C} (X̄_i - X̄)² / (C - 1) - EPV/Y = Σ_{i=1 to C} (X̄_i - X̄)² / (C - 1) - EPV = sample variance - EPV.
This matches what is done in Semiparametric Estimation.

Differing Number of Years of Data:

As in a previous section, assume there are 3 drivers with differing number of years of data:

Driver   Year 1   Year 2   Year 3   Year 4   Year 5
Hugh       ---      ---       0        0        0
Dewey       0        1        0        0        0
Louis      ---       0        2        1        0

Assume each driver has a Poisson frequency. Then, the estimated EPV = X̄ = 4/12 = 1/3.
The VHM is estimated as when done previously with differing number of years of data:
Π = Σ_{i=1 to 3} Y_i - (Σ_{i=1 to 3} Y_i²) / (Σ_{i=1 to 3} Y_i) = (3 + 5 + 4) - (3² + 5² + 4²) / (3 + 5 + 4) = 12 - 50/12 = 7.833.

59 See “Mahlerʼs Guide to Semiparametric Estimation.”
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 127

The means for the three individual drivers are: 0, 0.20, and 0.75.
VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π
= ({(3)(0 - 1/3)² + (5)(0.20 - 1/3)² + (4)(0.75 - 1/3)²} - (1/3)(2)) / 7.833 = 0.05745.
K = EPV/VHM = (1/3)/0.05745 = 5.80.60

Exercise: Estimate the future claim frequency for each driver.
[Solution: The credibilities are: 3/8.8 = 34.1%, 5/10.8 = 46.3%, and 4/9.8 = 40.8%.
The estimates of future claim frequency are:
Hugh: (34.1%)(0) + (1 - 34.1%)(1/3) = 0.220.
Dewey: (46.3%)(0.2) + (1 - 46.3%)(1/3) = 0.272.
Louis: (40.8%)(0.75) + (1 - 40.8%)(1/3) = 0.503.]
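Here is a similar illustrative sketch (again with variable names of my own choosing) for the differing-years case just worked, using Π = ΣY_i - ΣY_i²/ΣY_i:

# Sketch: Poisson-assumption empirical Bayes with differing numbers of years of data.
years  = {"Hugh": 3, "Dewey": 5, "Louis": 4}
claims = {"Hugh": 0, "Dewey": 1, "Louis": 3}

C = len(years)
total_years = sum(years.values())
grand_mean = sum(claims.values()) / total_years             # EPV = 4/12 = 1/3 (Poisson)
means = {n: claims[n] / years[n] for n in years}             # 0, 0.20, 0.75

epv = grand_mean
pi = total_years - sum(y * y for y in years.values()) / total_years   # 12 - 50/12 = 7.833
vhm = (sum(years[n] * (means[n] - grand_mean) ** 2 for n in years)
       - epv * (C - 1)) / pi                                 # 0.05745
k = epv / vhm                                                # 5.80

for n in years:
    z = years[n] / (years[n] + k)
    print(f"{n}: Z = {z:.3f}, estimate = {z * means[n] + (1 - z) * grand_mean:.3f}")
# Hugh: 0.220,  Dewey: 0.272,  Louis: 0.503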
Exposures:

The situation with exposures can be handled in a manner parallel to that with differing numbers of years of data.
Assume we have three policies with differing numbers of vehicles over four years:

Policy               Year 1   Year 2   Year 3   Year 4   Total
A      Claims           1        0        0        0       1
       Vehicles         2        2        3        3      10
B      Claims           0        0        2        1       3
       Vehicles         1        1        2        2       6
C      Claims           0        1        0        0       1
       Vehicles         1        1        1        1       4
Assume that for each policy, each vehicle has a Poisson frequency.61
Then, the estimated EPV = X̄ = 5/20 = 0.25.
When there are exposures, the VHM is estimated using the method from a previous section:
Π = m - (Σ_{i=1 to C} m_i²) / m = 20 - (10² + 6² + 4²)/20 = 12.4.
The means for the three policies are: 0.10, 0.50, and 0.25.

60 In the absence of the Poisson assumption, the estimated K would turn out to be 9.4.
61 The vehicles on a single policy are all assumed to have the same Poisson Distribution.
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 128

VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π
= [{(10)(0.1 - 0.25)² + (6)(0.5 - 0.25)² + (4)(0.25 - 0.25)²} - (0.25)(2)] / 12.4 = (0.6 - 0.5)/12.4 = 0.008065.
K = EPV/VHM = 0.25 / 0.008065 = 31.0.

Exercise: Estimate the future claim frequency per vehicle for each policy.
[Solution: The credibilities are: 10/41 = 24.4%, 6/37 = 16.2%, and 4/35 = 11.4%.
The estimates of future claim frequency are:
A: (24.4%)(0.1) + (1 - 24.4%)(0.25) = 0.213.
B: (16.2%)(0.5) + (1 - 16.2%)(0.25) = 0.291.
C: (11.4%)(0.25) + (1 - 11.4%)(0.25) = 0.250.]

As discussed in a previous section, one could use the method that preserves total claims, and give the complement of credibility to the credibility weighted mean frequency:
{(24.4%)(0.1) + (16.2%)(0.5) + (11.4%)(0.25)} / {24.4% + 16.2% + 11.4%} = 0.2575.
In this case, the estimates of future claim frequency are very similar to before:
A: (24.4%)(0.1) + (1 - 24.4%)(0.2575) = 0.219.
B: (16.2%)(0.5) + (1 - 16.2%)(0.2575) = 0.297.
C: (11.4%)(0.25) + (1 - 11.4%)(0.2575) = 0.257.
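The exposure example above, including the variant that preserves total claims by giving the complement of credibility to the credibility-weighted mean, can be checked with the following illustrative sketch (variable names are my own):

# Sketch: Poisson-assumption empirical Bayes with exposures (the three policies above).
claims   = {"A": 1, "B": 3, "C": 1}      # total claims over the four years
vehicles = {"A": 10, "B": 6, "C": 4}     # total vehicle-years (exposures)

C = len(claims)
m = sum(vehicles.values())                               # 20
grand_mean = sum(claims.values()) / m                    # EPV = 5/20 = 0.25 (Poisson)
means = {p: claims[p] / vehicles[p] for p in claims}     # 0.10, 0.50, 0.25

epv = grand_mean
pi = m - sum(v * v for v in vehicles.values()) / m       # 20 - 152/20 = 12.4
vhm = (sum(vehicles[p] * (means[p] - grand_mean) ** 2 for p in claims)
       - epv * (C - 1)) / pi                             # 0.008065
k = epv / vhm                                            # 31.0

z = {p: vehicles[p] / (vehicles[p] + k) for p in claims} # 24.4%, 16.2%, 11.4%
cw_mean = sum(z[p] * means[p] for p in claims) / sum(z.values())   # 0.2575

for p in claims:
    est = z[p] * means[p] + (1 - z[p]) * cw_mean
    print(f"Policy {p}: {est:.3f}")      # A 0.219, B 0.297, C 0.257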
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 129

Problems:

Use the following information for the next three questions:
A portfolio of 8 policyholders had the following number of claims by year:

Policyholder   2001   2002   2003   Total
1                1      1      0      2
2                0      0      0      0
3                1      0      3      4
4                1      0      0      1
5                0      2      0      2
6                0      0      0      0
7                0      0      1      1
8                0      0      0      0
Total            3      3      4     10
8.1 (2 points) Assume that 75% of the policyholders in the portfolio have a Poisson frequency with annual mean of 0.3, while the remaining 25% have a Poisson frequency with annual mean of 0.5.
Using Buhlmann Credibility, what is the estimated future annual claim frequency for policyholder 3?
A. 0.4   B. 0.5   C. 0.6   D. 0.7   E. 0.8

8.2 (3 points) Assume each insuredʼs frequency is Poisson distributed.
The mean of each insuredʼs Poisson Distribution does not change over time.
What is the estimated future annual claim frequency for policyholder 3?
Treat the 3 years of data from an individual policyholder as a single observation for purposes of estimating the Buhlmann Credibility Parameter.
A. 0.4   B. 0.5   C. 0.6   D. 0.7   E. 0.8

8.3 (4 points) Using nonparametric Bayesian Estimation, what is the estimated future annual claim frequency for policyholder 3?
A. 0.4   B. 0.5   C. 0.6   D. 0.7   E. 0.8

8.4 (2 points) For each of 7 years, we have the number of claims for each of 2 insureds:
Insured   Year: 1   2   3   4   5   6   7   Total
A               0   0   1   0   0   0   0     1
B               1   0   0   0   2   0   1     4
The number of claims for each insured each year has a Poisson distribution. Estimate the number of claims for insured B over the next 7 years, using Nonparametric Empirical Bayes estimation. A. 2.6 B. 2.8 C. 3.0 D. 3.2 E. 3.4
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 130 8.5 (3 points) Two insurance policies produced the following claims during a 4 year period: Year Insured 2000 2001 2002 2003 A Number of Claims 2 0 1 0 Insured Vehicles 2 2 2 1 B Number of Claims 0 1 0 --Insured Vehicles 3 3 3 --Assume that the number of claims for each vehicle each year has a Poisson distribution and that each vehicle on a policy has the same expected claim frequency. Estimate the expected annual number of claims per vehicle for Insured A. Use the method that preserves total claims. A. 0.28 B. 0.30 C. 0.32 D. 0.34 E. 0.36 8.6 (2 points) You are given: (i) A region is comprised of four territories. Claims experience is as follows: Territory Number of Exposures Number of Claims A 100 3 B 200 5 C 400 4 D 300 3 (ii) The number of claims for each exposure each year has a Poisson distribution. (iii) Each exposure in a territory has the same expected claim frequency. Determine the empirical Bayes estimate of the credibility to be assigned to the data from Territory D, for purposes of estimating the claim frequency for that territory next year. (A) 30% (B) 35% (C) 40% (D) 45% (E) 50%
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 131

8.7 (3 points) You are given:
(i) Over a four-year period, the following claim experience was observed for three insureds:
Insured                Year 1   Year 2   Year 3   Year 4
A   Number of Claims      0        0        0        0
B   Number of Claims      0        0        0        1
C   Number of Claims      0        2        1        0
(ii) The number of claims for each insured each year follows a Poisson distribution.
Determine the semiparametric empirical Bayes estimate of the claim frequency for Insured C in Year 5.
(A) Less than 0.40
(B) At least 0.40, but less than 0.45
(C) At least 0.45, but less than 0.50
(D) At least 0.50, but less than 0.55
(E) At least 0.55

8.8 (3 points) Use the following information for two classes and three years.
Number of Claims:
Class   Year 1   Year 2   Year 3   Total
A          2        4        3       9
B          4        9        6      19
Number of Exposures:
Class   Year 1   Year 2   Year 3   Total
A        100      150      200     450
B        200      200      200     600
Assume that the number of claims for each exposure each year has a Poisson distribution and that each exposure in a class has the same expected claim frequency.
Using nonparametric Empirical Bayes credibility, estimate the future frequency for Class A, using the method that preserves total claims.
A. 2.2%   B. 2.3%   C. 2.4%   D. 2.5%   E. 2.6%
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 132 8.9 (4 points) PrizeCo sells hole-in-one insurance. (The sponsor of a golf tournament will pay a cash prize to any amateur golfer who makes a hole-in-one during the tournament. PrizeCo will reimburse the sponsor for any prize payments.) For six golf courses at which PrizeCo has provided such insurance for many events over several years: Golf Course Number of Amateur Golfers Numbers of Holes-in-One Amelia Island 8000 6 Clovernook 10,000 9 Donegal Highlands 20,000 8 Foxford Hills 12,000 6 Garden City 5000 1 Victoria 3000 3 Two hundred amateurs will be in an insured tournament at Garden City. The prize for a hole-in-one is $100,000. Determine the empirical Bayes estimate of the amount PrizeCo expects to pay. A. $9000 B. $9500 C. $10,000 D. $10,500 E. $11,000 8.10 (4, 11/05, Q.22 & 2009 Sample Q.233) (2.9 points) You are given: (i) A region is comprised of three territories. Claims experience for Year 1 is as follows: Territory Number of Insureds Number of Claims A 10 4 B 20 5 C 30 3 (ii) The number of claims for each insured each year has a Poisson distribution. (iii) Each insured in a territory has the same expected claim frequency. (iv) The number of insureds is constant over time for each territory. Determine the Bühlmann-Straub empirical Bayes estimate of the credibility factor Z for Territory A. (A) Less than 0.4 (B) At least 0.4, but less than 0.5 (C) At least 0.5, but less than 0.6 (D) At least 0.6, but less than 0.7 (E) At least 0.7
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 133 8.11 (4, 11/06, Q.13 & 2009 Sample Q.257) (2.9 points) You are given: (i) Over a three-year period, the following claim experience was observed for two insureds who own delivery vans: Year Insured 1 2 3 A Number of Vehicles 2 2 1 Number of Claims 1 1 0 B Number of Vehicles N/A 3 2 Number of Claims N/A 2 3 (ii) The number of claims for each insured each year follows a Poisson distribution. Determine the semiparametric empirical Bayes estimate of the claim frequency per vehicle for Insured A in Year 4. (A) Less than 0.55 (B) At least 0.55, but less than 0.60 (C) At least 0.60, but less than 0.65 (D) At least 0.65, but less than 0.70 (E) At least 0.70
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 134

Solutions to Problems:

8.1. A. The Expected Value of the Process Variance = 0.35.

Type of Risk   A Priori Probability   Process Variance   Mean    Square of Mean
A                     0.75                  0.3           0.3        0.09
B                     0.25                  0.5           0.5        0.25
Average                                     0.350         0.350      0.130
The Variance of the Hypothetical Means = 0.130 - 0.35² = 0.0075.
Thus the Bühlmann credibility parameter, K = EPV / VHM = 0.35 / 0.0075 = 46.7.
Therefore, for 3 years of data Z = 3 / (3 + 46.7) = 6.0%.
The estimated future annual frequency for policyholder 3 is: (6.0%)(4/3) + (1 - 6.0%)(0.350) = 0.409.

8.2. D. Define the unit of data as one insured over three years.
3 Year Claim Count:    0   1   2   3   4   5 or more
Number of Insureds:    3   2   2   0   1   0
For the observed data the mean frequency (per 3 years) is: {(3)(0) + (2)(1) + (2)(2) + (1)(4)} / 8 = 1.25.
The second moment is: {(3)(0²) + (2)(1²) + (2)(2²) + (1)(4²)} / 8 = 3.25.
Thus the sample variance = (8/7)(3.25 - 1.25²) = 1.929.
EPV = mean = 1.25. VHM = Total Variance - EPV = 1.929 - 1.25 = 0.679.
K = EPV/VHM = 1.25/0.679 = 1.84.
Three years of data has been defined as N = 1; therefore, Z = 1 / (1 + 1.84) = 35.2%.
The observed 3 year frequency for policyholder 3 is 4 and the overall mean is 1.25.
The estimated future 3 year frequency for policyholder 3 is: (35.2%)(4) + (1 - 35.2%)(1.25) = 2.22.
The estimated future annual frequency is: 2.22/3 = 0.739.
Alternately, EPV = mean annual frequency = 10/24 = 5/12.
The individual mean annual frequencies are: 2/3, 0, 4/3, 1/3, 2/3, 0, 1/3, 0.
The sample variance of the class means is:
{3(0 - 5/12)² + 2(1/3 - 5/12)² + 2(2/3 - 5/12)² + (4/3 - 5/12)²}/(8 - 1) = 3/14.
VHM = (the sample variance of the class means) - EPV / Y = 3/14 - (5/12)/3 = 19/252.
K = EPV/VHM = (5/12) / (19/252) = 105/19. For 3 years of data, Z = 3/(3 + K) = 57/162 = 35.2%.
Proceed as before.
Comment: Over three years, each insuredʼs frequency is Poisson.
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 135

8.3. B. EPV = average of the sample variances = 0.5833.
VHM = sample variance of the means - EPV / (# years) = 0.2143 - 0.5833/3 = 0.0199.

Policyholder   2001   2002   2003    Mean     Sample Variance
1                1      1      0    0.6667        0.3333
2                0      0      0    0.0000        0.0000
3                1      0      3    1.3333        2.3333
4                1      0      0    0.3333        0.3333
5                0      2      0    0.6667        1.3333
6                0      0      0    0.0000        0.0000
7                0      0      1    0.3333        0.3333
8                0      0      0    0.0000        0.0000
Average                             0.4167        0.5833
Sample variance of the means:       0.2143
K = EPV/VHM = 0.5833/0.0199 = 29.3. For three years of data, Z = 3/(3 + 29.3) = 9.3%.
The observed annual frequency for policyholder 3 is 4/3 and the overall mean is 0.4167.
Estimated future annual freq. for policyholder 3 is: (9.3%)(4/3) + (1 - 9.3%)(0.4167) = 0.502.

8.4. D. EPV = overall mean = 5/14.
VHM = {(1/7 - 5/14)² + (4/7 - 5/14)²}/(2 - 1) - EPV/(# years of data) = 0.09184 - (5/14)/7 = 0.0408.
K = (5/14)/0.0408 = 8.75. Z = 7/(7 + 8.75) = 0.444.
Estimated frequency for insured B: (0.444)(4/7) + (1 - 0.444)(5/14) = 0.452. (7)(0.452) = 3.16.

8.5. C. Estimated EPV = overall mean = 4/16 = 1/4. Π = m - Σm_i²/m = 16 - (7² + 9²)/16 = 7.875.
X̄_1 = 3/7. X̄_2 = 1/9.
VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π
= {7(3/7 - 1/4)² + 9(1/9 - 1/4)² - (2 - 1)(1/4)} / 7.875 = 0.01864.
K = EPV/VHM = 0.25/0.01864 = 13.4. Z_A = 7/(7 + 13.4) = 0.343. Z_B = 9/(9 + 13.4) = 0.402.
Credibility weighted mean = {(0.343)(3/7) + (0.402)(1/9)} / (0.343 + 0.402) = 0.257.
Estimated frequency for Insured A is: (0.343)(3/7) + (1 - 0.343)(0.257) = 0.316.
Comment: Similar to Exercise 20 in “Topics in Credibility” by Dean. The fact that the insureds have different numbers of years of data does not complicate the calculation. If we had not assumed a Poisson frequency, then the computation of the EPV would have had to take into account the differing number of years of data.
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 136

8.6. A. Since we have assumed a Poisson frequency, the estimated EPV = X̄ = 15/1000 = 0.015.
Π = m - Σm_i²/m = 1000 - (100² + 200² + 400² + 300²)/1000 = 700.
VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π
= {(100)(0.03 - 0.015)² + (200)(0.025 - 0.015)² + (300)(0.01 - 0.015)² + (400)(0.01 - 0.015)² - (4 - 1)(0.015)}/700 = 0.00002143.
K = EPV/VHM = 0.015/0.00002143 = 700.
Territory D has 300 exposures, and therefore Z = 300/(300 + 700) = 30%.
Comment: Similar to 4, 11/05, Q.22.

8.7. D. Since the frequency is assumed to be Poisson, estimated EPV = mean = (0 + 1 + 3)/(4 + 4 + 4) = 4/12 = 1/3.
X̄_1 = 0. X̄_2 = 0.25. X̄_3 = 0.75.
Estimated VHM = {(0 - 1/3)² + (0.25 - 1/3)² + (0.75 - 1/3)²}/(3 - 1) - EPV/4 = 0.1458 - 1/12 = 0.0625.
K = (1/3)/0.0625 = 5.33. Z = 4/(4 + 5.33) = 42.9%.
Estimated frequency for Insured C is: (42.9%)(0.75) + (1 - 42.9%)(1/3) = 0.512.

8.8. D. Estimated EPV = overall mean = (9 + 19)/(450 + 600) = 28/1050 = 0.02667.
Π = m - Σm_i²/m = 1050 - (450² + 600²)/1050 = 514.3.
X̄_1 = 9/450 = 0.02000. X̄_2 = 19/600 = 0.03167.
VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π
= {450(0.02000 - 0.02667)² + 600(0.03167 - 0.02667)² - (2 - 1)(0.02667)}/514.3 = 0.00001624.
K = EPV/VHM = 0.02667/0.00001624 = 1642. Z_A = 450/(450 + 1642) = 0.215. Z_B = 600/(600 + 1642) = 0.268.
Credibility weighted mean = {(0.215)(0.0200) + (0.268)(0.03167)} / (0.215 + 0.268) = 0.0265.
Estimated frequency for Class A is: (0.215)(0.0200) + (1 - 0.215)(0.0265) = 0.0251.
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 137

8.9. E. One can either assume that the number of holes-in-one is Poisson or Bernoulli; since the mean frequency per golfer is so small, there is no practical difference.
For convenience, take the unit of exposure as 1000 golfers.
Then, the overall observed mean frequency is: 33/58 = 56.9% = EPV.
The six observed frequencies are: 75%, 90%, 40%, 50%, 20%, and 100%.
Π = Σm_i - Σm_i²/Σm_i = 58 - (8² + 10² + 20² + 12² + 5² + 3²)/58 = 45.207.
Σ m_i (X̄_i - X̄)² = {(8)(75% - 56.9%)² + (10)(90% - 56.9%)² + (20)(40% - 56.9%)² + (12)(50% - 56.9%)² + (5)(20% - 56.9%)² + (3)(100% - 56.9%)²} = 3.224.
VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π = {3.224 - (0.569)(6 - 1)}/45.207 = 0.00838.
K = EPV/VHM = 0.569/0.00838 = 67.9.
The data for Garden City had 5000 golfers, or 5 exposure units. Z = 5/(5 + 67.9) = 6.9%.
Estimated future frequency for Garden City is: (6.9%)(20%) + (1 - 6.9%)(56.9%) = 54.4%.
The given tournament has 200 golfers, or 0.2 exposures.
The expected number of holes-in-one is: (0.2)(54.4%) = 0.109.
Expected payment is: ($100,000)(0.109) = $10,900.

8.10. A. Since we have assumed a Poisson frequency, the estimated EPV = X̄ = 12/60 = 1/5.
Π = m - Σm_i²/m = 60 - (10² + 20² + 30²)/60 = 36.67.
VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π
= {(10)(0.4 - 0.2)² + (20)(0.25 - 0.2)² + (30)(0.1 - 0.2)² - (0.2)(3 - 1)}/36.67 = 0.00954.
K = EPV/VHM = 0.20/0.00954 = 21.0.
Territory A has 10 exposures, and therefore Z = 10/(10 + 21) = 32.3%.
Comment: The wording of this past exam question should have been better. One has to use a combination of semiparametric and empirical Bayes estimation. If there are no exposures and no differing number of years of data, then what I have done is similar to semiparametric estimation. However, in this question we have differing numbers of years of data and we have exposures. For your exam, I would be on the lookout for exposures.
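As an illustration, solution 8.10 can be reproduced with a few lines of code; the sketch below (names chosen here for clarity, not anything official) combines the Poisson estimate of the EPV with the Bühlmann-Straub estimate of the VHM:

# Sketch: solution 8.10, territories A, B, C with 10, 20, 30 insureds.
insureds = {"A": 10, "B": 20, "C": 30}
claims   = {"A": 4, "B": 5, "C": 3}

C = len(insureds)
m = sum(insureds.values())                                  # 60
grand_mean = sum(claims.values()) / m                       # 12/60 = 0.2
epv = grand_mean                                            # Poisson assumption
freq = {t: claims[t] / insureds[t] for t in insureds}       # 0.4, 0.25, 0.1

pi = m - sum(n * n for n in insureds.values()) / m          # 60 - 1400/60 = 36.67
vhm = (sum(insureds[t] * (freq[t] - grand_mean) ** 2 for t in insureds)
       - epv * (C - 1)) / pi                                # 0.00954
k = epv / vhm                                               # 21.0
z_A = insureds["A"] / (insureds["A"] + k)
print(f"Z for Territory A = {z_A:.3f}")                     # 0.323, answer A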
2013-4-12 Empirical Bayesian Cred. §8 Assuming Poisson Freq., HCM 10/22/12, Page 138

8.11. C. Since the frequency is assumed to be Poisson, estimated EPV = mean = (1 + 1 + 0 + 2 + 3)/(2 + 2 + 1 + 3 + 2) = 7/10 = 0.7.
X̄_1 = (1 + 1 + 0)/(2 + 2 + 1) = 0.4. X̄_2 = (2 + 3)/(3 + 2) = 1.
Estimated VHM = {5(0.4 - 0.7)² + 5(1 - 0.7)² - (2 - 1)(0.7)}/{10 - (5² + 5²)/10} = 0.2/5 = 0.04.
K = 0.70/0.04 = 17.5. Z_1 = 5/(5 + 17.5) = 0.222.
Estimated frequency for A is: (0.222)(0.4) + (1 - 0.222)(0.7) = 0.633.
Comment: Notice the Poisson assumption in this past exam question. Thus one uses a combination of semiparametric and empirical Bayes estimation.
2013-4-12   Empirical Bayesian Cred. §9 Important Ideas,   HCM 10/22/12,   Page 139

Section 9, Important Formulas and Ideas

No Variation in Exposures (Section 2)

s_i² = sample variance for the data from a single class i = Σ_{t=1 to Y} (X_it - X̄_i)² / (Y - 1).

EPV = average of the s_i² = Σ_{i=1 to C} s_i² / C = Σ_{i=1 to C} Σ_{t=1 to Y} (X_it - X̄_i)² / {C (Y - 1)}.

VHM = (the sample variance of the class means) - EPV / (# years of data)
= Σ_{i=1 to C} (X̄_i - X̄)² / (C - 1) - EPV/Y.

If estimated VHM is negative, set Z = 0.
2013-4-12   Empirical Bayesian Cred. §9 Important Ideas,   HCM 10/22/12,   Page 140

No Variation in Exposures, Variation in Years (Section 3)

Y_i = the number of years of data for class i.

s_i² = Σ_{t=1 to Y_i} (X_it - X̄_i)² / (Y_i - 1) = the usual sample variance for the data from a single class i.

EPV = weighted average of these sample variances
= Σ_{i=1 to C} (Y_i - 1) s_i² / Σ_{i=1 to C} (Y_i - 1) = Σ_{i=1 to C} Σ_{t=1 to Y_i} (X_it - X̄_i)² / Σ_{i=1 to C} (Y_i - 1).

Let Π = Σ_{i=1 to C} Y_i - (Σ_{i=1 to C} Y_i²) / (Σ_{i=1 to C} Y_i).

VHM = {Σ_{i=1 to C} Y_i (X̄_i - X̄)² - (C - 1) EPV} / Π; if estimated VHM is negative, set Z = 0.
2013-4-12   Empirical Bayesian Cred. §9 Important Ideas,   HCM 10/22/12,   Page 141

Variation in Exposures (Section 4)

X_it = pure premium (or frequency) for class i and year t. Observe C classes each for Y years.
m_it = exposures for class i and year t.
m_i = Σ_{t=1 to Y} m_it = total exposures for class i.
m = Σ_{i=1 to C} m_i = total exposures.
X̄_i = Σ_{t=1 to Y} m_it X_it / m_i.
X̄ = Σ_{i=1 to C} Σ_{t=1 to Y} m_it X_it / m.

v_i = Σ_{t=1 to Y} m_it (X_it - X̄_i)² / (Y - 1).

EPV = Σ_{i=1 to C} v_i / C = Σ_{i=1 to C} Σ_{t=1 to Y} m_it (X_it - X̄_i)² / {C (Y - 1)}.

Π = m - (Σ_{i=1 to C} m_i²) / m.

VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π; if the estimated VHM is negative, set Z = 0.

K = EPV / VHM.     Z_i = m_i / (m_i + K).

When there is variation in exposures, in order to preserve the total losses, apply the complement of credibility, 1 - Z, to the credibility weighted average of the class means.
2013-4-12   Empirical Bayesian Cred. §9 Important Ideas,   HCM 10/22/12,   Page 142

Variation in Years and Exposures (Section 5)

EPV = Σ_{i=1 to C} (Y_i - 1) v_i / Σ_{i=1 to C} (Y_i - 1) = Σ_{i=1 to C} Σ_{t=1 to Y_i} m_it (X_it - X̄_i)² / Σ_{i=1 to C} (Y_i - 1).

Π = m - (Σ_{i=1 to C} m_i²) / m.

VHM = {Σ_{i=1 to C} m_i (X̄_i - X̄)² - EPV (C - 1)} / Π; if estimated VHM is negative, set Z = 0.

The formulas in previous sections are special cases of the formulas in this section; the formulas using an a priori mean are not.
2013-4-12   Empirical Bayesian Cred. §9 Important Ideas,   HCM 10/22/12,   Page 143

Using an A Priori Mean (Section 6)

µ = a priori estimate of the mean.
The estimate of the EPV is the same as that in the absence of relying on an a priori estimate of the mean.
(1 - Z) is applied to the a priori mean, µ.

No Variation in Exposures, More than One Class:
VHM = Σ_{i=1 to C} (X̄_i - µ)² / C - EPV/Y; if estimated VHM is negative, set Z = 0.

No Variation in Exposures, One Class:
EPV = Σ_{t=1 to Y} (X_t - X̄)² / (Y - 1).    VHM = (X̄ - µ)² - (EPV / Y); if estimated VHM is negative, set Z = 0.

Using an A Priori Mean, Variation in Exposures (Section 7)

VHM = {Σ_{i=1 to C} m_i (X̄_i - µ)² - (C)(EPV)} / m; if the estimated VHM is negative, set Z = 0.

Variation in Exposures, One Class:
EPV = Σ_{t=1 to Y} m_t (X_t - X̄)² / (Y - 1).    VHM = (X̄ - µ)² - EPV/m; if estimated VHM < 0, set Z = 0.

Assuming a Poisson Frequency (Section 8)

EPV = X̄. Estimate the VHM using the same formula as one would otherwise.
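As an illustration of the Section 7 formula above, the following sketch (merely illustrative; the class summaries and the EPV of 2369 are taken from Solutions 7.1-7.4, and the variable names are my own) reproduces K = 22.0 and the estimate for Class B:

# Sketch: VHM with an a priori mean and varying exposures.
mu = 200.0                      # a priori mean pure premium
epv = 2369.0                    # EPV estimated from the within-class variation (Solution 7.1)
exposures = {"A": 58, "B": 80, "C": 53}
class_pp  = {"A": 216.69, "B": 211.16, "C": 194.57}

C = len(exposures)
m = sum(exposures.values())     # 191
vhm = (sum(exposures[c] * (class_pp[c] - mu) ** 2 for c in exposures)
       - C * epv) / m           # 107.7
k = epv / vhm                   # 22.0

z_B = exposures["B"] / (exposures["B"] + k)          # 78.4%
est_pp_B = z_B * class_pp["B"] + (1 - z_B) * mu      # 208.75
print(round(21 * est_pp_B))     # estimated losses for 21 exposures: about 4384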
Mahlerʼs Guide to
Simulation Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-13 Howard Mahler
[email protected] www.howardmahler.com/Teaching
2013-4-13,   Simulation,   HCM 10/25/12,   Page 1

Mahlerʼs Guide to Simulation
Copyright 2013 by Howard C. Mahler.

Information in bold and sections whose titles are in bold, are more important to pass your exam. Larger bold type indicates it is extremely important.
Information presented in italics (and sections whose titles are in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications.
Highly Recommended problems are double underlined. Recommended problems are underlined.
Solutions to the problems in each section are at the end of that section.1

Section #   Section Name                                        Pages
 1          Introduction                                          4-5
 2          Uniform Random Numbers                                6-9
 3          Continuous Distributions, Inversion Method           10-30
 4          Discrete Distributions, Inversion Method             31-63
 5          Simulating Normal and LogNormal Distributions        64-79
 6          Simulating Brownian Motion                           80-90
 7          Simulating Lifetimes                                 91-111
 8          Miscellaneous, Inversion Method                     112-118
 9          Simulating a Poisson Process                        119-136
10          Simulating a Compound Poisson Process               137-158
11          Simulating Aggregate Losses and Compound Models     159-185
12          Deciding How Many Simulations to Run                186-218
13          Simulating a Gamma and Related Distributions        219-229
14          Simulating Mixtures of Models                       230-248
15          Simulating Splices                                  249-251
16          Bootstrapping                                       252-277
17          Bootstrapping via Simulation                        278-308
18          Estimating p-values via Simulation                  309-337
19          An Example of a Simulation Experiment               338-341
20          Summary and Important Ideas                         342-351

1 Note that problems include both some written by me and some from past exams. The latter are copyright by the CAS and SOA and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In some cases Iʼve rewritten past exam questions in order to match the notation in the current Syllabus.
2013-4-13,
Simulation,
HCM 10/25/12,
Page 2
Course 3 Exam Questions by Section of this Study Aid
Section Sample
5/00
1 2 3 4 5 6 7 8
11/00
5/01
11/01
11
13
8 12
32
11/02
CAS
SOA
CAS
CAS
SOA
11/03
11/03
5/04
11/04
11/04
30
40 39 38
38-40
32 22
9 10 11 13 14 15
19
6
43-44 37
5
33 6 34 5
10 40
Questions no longer on the syllabus: Sample 3, Q.2, Q.7; 3, 5/00, Q.14; 11/00, Q.37-38; 11/01, Q.14; 11/02 Q. 23. CAS3, 11/03, Q.36-37. I have rewritten: CAS3, 11/03, Q.40; CAS3, 11/04, Q.38-39. The CAS/SOA did not release the 5/02 and 5/03 Course 3 exams. The SOA did not release its 5/04 Course 3 exam. From 5/00 to 5/03, the Course 3 Exam was jointly administered by the CAS and SOA. Starting in 11/03, the CAS and SOA gave separate exams.
Course 4 Exam Questions by Section of this Study Aid Section
Sample
12
14
16
38
5/00
11/00
5/01
11/01
11/02
11/03
35
26
11/04
17 17
26
17 18
Questions no longer on the syllabus: 4, 5/01, Q.29; 4, 11/03, Q.10. The CAS/SOA did not release the 5/02, 5/03, and 5/04 Course 4 exams.
16
2013-4-13,
Simulation,
HCM 10/25/12,
Joint Exam 4/C Questions by Section of this Study Aid Section 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
5/05
11/05
11/06
5/07
32 12 34
4
27
21
8 16
4 11
9 37
29
The CAS/SOA did not release the 5/06, 11/07, and subsequent exams.
Page 3
2013-4-13,
Simulation §1 Introduction,
HCM 10/25/12,
Page 4
Section 1, Introduction

The simulation concepts in chapter 21 of Loss Models are demonstrated. Some additional simulation concepts are discussed in “Mahlerʼs Guide to Risk Measures.”
Many of you will find this study guide a good review of different ideas; many actuaries understand a concept much more clearly once they know how to simulate it.
Simulation allows the actuary to analyze complex situations that can not be handled in closed form, or that would be difficult to handle by analytical techniques. As long as one can simulate each piece of a process, one can simulate a complicated model, such as a model of an insurer. “Once a simulation model is created, little creative thought is required.”2
Simulation can also be useful to check the results of work done by other techniques. Simulation experiments can help an actuary to develop his intuition, so he will be better able to apply actuarial judgement to real world problems.
If one wishes to simulate a random variable S, then there are 4 steps:3
1. Build a model for S which depends on random variables.
   We need to know the distributions and dependencies of these random variables.
2a. Simulate the variables from step one.4
2b. Compute S for this simulated set of values.
2c. Repeat steps 2a and 2b many times, recording the output value of S each time.5
3. Use the outputs from step 2 as an approximation to the distribution of S.6
4. Estimate quantities of interest such as the mean of S, the variance of S, the 99th percentile of S, the limited expected value of S at 7, the probability that S is greater than 10, etc.
Hopefully, this will all be much more concrete after going through the many examples in this study guide.
For example, S might be the aggregate losses of an insured for a given year.

2 See Section 21.1.1 of Loss Models.
3 See Section 21.1.1 of Loss Models. These steps describe in a general way many, but not all, uses of simulation by actuaries.
4 Some of these variables may have to be simulated more than once.
5 Each time you did steps 2a and 2b would be referred to as one simulation run. For a given application, one might perform 1000 simulation runs.
6 If one has done the modeling correctly, and one has performed enough simulation runs, one should have a good approximation to the true distribution of S.
2013-4-13,
Simulation §1 Introduction,
HCM 10/25/12,
Page 5
For example, assume that frequency is Binomial with m = 5 and q = 0.1, and severity is Weibull with θ = 100 and τ = 2. Frequency and severity are independent. The size of each claim is independent of that of any other claim.7
Then one would simulate the annual aggregate losses as follows:
I. Simulate the number of claims from the Binomial Distribution.8
II. Simulate the size of each claim separately from the Weibull Distribution.9
III. S, the annual aggregate losses for this simulation run, is the sum of the values from step II.
IV. Return to step I and repeat the process.

7 This is an example of the usual collective risk model of aggregate losses. See “Mahlerʼs Guide to Aggregate Distributions.”
8 How to simulate a random draw from a Binomial will be discussed in a later section.
9 How to simulate a random draw from a Weibull will be discussed in a later section. If one had simulated 3 claims in step I, one would simulate 3 independent identically distributed Weibulls. If zero claims were simulated, then S = 0.
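Steps I through IV can be written as a short program. The sketch below (in Python; the function name and the choice of 10,000 runs are mine, and the Weibull inversion formula anticipates the later sections on simulating these distributions) is one way to carry out this collective risk model simulation:

import math
import random

def simulate_aggregate_losses(n_runs, m=5, q=0.1, theta=100.0, tau=2.0, seed=1):
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # Step I: simulate the number of claims from the Binomial Distribution.
        n_claims = sum(rng.random() < q for _ in range(m))
        # Step II: simulate each claim size from the Weibull by inversion:
        # u = F(x) = 1 - exp[-(x/theta)^tau]  =>  x = theta * (-ln(1 - u))^(1/tau)
        severities = [theta * (-math.log(1.0 - rng.random())) ** (1.0 / tau)
                      for _ in range(n_claims)]
        # Step III: S = aggregate losses for this run (0 if no claims were simulated).
        results.append(sum(severities))
    # Step IV is the loop itself; the outputs approximate the distribution of S.
    return results

sims = simulate_aggregate_losses(10000)
print(sum(sims) / len(sims))   # should be near the true mean, (5)(0.1)(100)Gamma(1.5) = 44.3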
2013-4-13,
Simulation §2 Uniform Random Numbers,
HCM 10/25/12,
Page 6
Section 2, Uniform Random Numbers

Uniform random numbers from the interval [0,1] are the basis of all the simulation techniques covered on the exam. While in practice these “random” numbers are produced by an algorithm, they are intended to be a reasonable approximation to independent draws from the interval [0,1], where any number is as likely to come up as any other number.10 11
If [c, d] is any arbitrary subinterval of [0,1], then the chance of the next random draw being in the interval [c, d] should be equal to its length (d - c).
You are not responsible for knowing how these random numbers are produced.
In exam questions, you will be given a random number or a series of random numbers from [0, 1] to use.

Other Intervals than [0,1]:

Uniform random numbers from [0,1] can be used to easily generate uniform random numbers from an arbitrary interval [r, s]. One needs to multiply by (s - r), the width of the desired interval, and then add the appropriate constant r so as to shift to the desired interval.
To get a uniform random number from [r, s], take (s - r) u + r, where u is a random number from [0, 1].
For example, assume we are given 0.839 as a random number from [0,1]. If instead we needed numbers from the interval [-10, +15], then one multiplies by 25, the width of the desired interval, and then one adds -10: (25)(0.839) - 10 = 10.975.

Exercise: 0.516, 0.854, 0.129, and 0.731, are four uniform random numbers from [0, 1].
Use these to produce four uniform random numbers from [100, 500].
[Solution: (400)(0.516) + 100 = 306.4, (400)(0.854) + 100 = 441.6, (400)(0.129) + 100 = 151.6, and (400)(0.731) + 100 = 392.4.]
10 These are pseudorandom numbers. While they will be treated as if they are random, since they are deterministically generated they can not really be random; there is always some pattern remaining. Although I will refer to random numbers for simplicity, I should really say pseudorandom numbers.
11 There are practical difficulties of assuring that consecutive draws or series of consecutive draws do not have a pattern. In practical applications any such pattern may have to be overcome by some sort of shuffling. Unfortunately, one popular technique, congruential generators, generally exhibits such patterns. See for example, pages 317-320 of A New Kind of Science by Stephen Wolfram. Therefore, the Mathematica software package, developed by Dr. Wolfram, uses instead a cellular automaton rule in order to produce random numbers.
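A one-line helper makes the rescaling concrete; this small sketch (the function name is my own) reproduces the example and the exercise above:

def rescale(u, r, s):
    # Map a uniform random number u on [0, 1] to the interval [r, s].
    return (s - r) * u + r

print(round(rescale(0.839, -10, 15), 3))                       # 10.975, matching the example
print([round(rescale(u, 100, 500), 1) for u in (0.516, 0.854, 0.129, 0.731)])
# [306.4, 441.6, 151.6, 392.4], matching the exercise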
2013-4-13,
Simulation §2 Uniform Random Numbers,
HCM 10/25/12,
Page 7
Use of Uniform Random Numbers from [0, 1]: Every simulation technique discussed subsequently will use as the input either a single random number from [0, 1] or a series of independent random numbers from [0, 1] . When given a series of random numbers from [0, 1], use them in the order given, and use each one at most once. Once a random number is used, it is used up.12
12
Reusing a random number would normally make the output nonrandom and/or would introduce a dependence between pieces of the output. While in some particular situations there exist “clever” methods of avoiding this problem, they are not covered on the syllabus .
2013-4-13,
Simulation §2 Uniform Random Numbers,
HCM 10/25/12,
Page 8
Problems: 2.1 (1 point) You are given that .426 is a random number from the interval [0, 1]. Generate a random number in the interval [20, 30]. A. less than 21 B. at least 21 but less than 22 C. at least 22 but less than 23 D. at least 23 but less than 24 E. at least 24 2.2 (3 points) Let X and Y be independent, identically distributed variables, each uniformly distributed on [0, 100]. Let Z = Minimum[X, Y]. Simulate 10 values of Z using the following 20 random numbers from [0, 1]: 0.574, 0.079, 0.803, 0.382, 0.507, 0.848, 0.090, 0.631, 0.246, 0.724, 0.968, 0.372, 0.653, 0.736, 0.329, 0.757, 0.915, 0.177, 0.770, 0.403. (Use the first pair to simulate one value of Z, the second pair to simulate a second value of Z, etc.) What is the average of the ten simulated values of Z? A. 30 B. 32 C. 34 D. 36 E. 38 2.3 (CAS3, 11/04, Q.40) (2.5 points) An actuary uses the following algorithm, where U is a random number generated from the uniform distribution on [0,1], to simulate a random variable X: (1) If U < 0.40, set X = 2, then stop. (2) If U < 0.65, set X = 1, then stop. (3) If U < 0.85. set X = 3, then stop. (4) Otherwise, set X = 4, then stop. What are the probabilities for X = 1, 2, 3, 4, respectively? A. 0.40, 0.25, 0.15, 0.20 B. 0.25, 0.40, 0.20, 0.15 C. 0.15, 0.25, 0.20, 0.40 D. 0.15, 0.20, 0.40, 0.25 E. 0.20, 0.25, 0.15, 0.40
2013-4-13,
Simulation §2 Uniform Random Numbers,
HCM 10/25/12,
Page 9
Solutions to Problems:

2.1. E. Multiply by the width of the desired interval and then add a constant to translate the interval.
In this case, v = 10u + 20 = 4.26 + 20 = 24.26.

2.2. B. Using the first pair of 0.574 and 0.079, one gets a simulated x = (100)(0.574) = 57.4 and y = (100)(0.079) = 7.9. z = Min[x, y] = 7.9.
One obtains the remaining simulated values of Z similarly, and the average is:
(7.9 + 38.2 + 50.7 + 9.0 + 24.6 + 37.2 + 65.3 + 32.9 + 17.7 + 40.3)/10 = 32.38.
Comment: The mean of either X or Y is 50. However, we expect the minimum of two identically distributed values to have a smaller mean. (We would expect the maximum of two identically distributed variables to have a larger mean.)
The distribution of the smallest of two identical, independent variables is: 1 - S(t)².
In this example, the distribution of Z is: 1 - (1 - z/100)² = z/50 - z²/10000, 0 < z < 100.
The density of Z is 1/50 - z/5000, 0 < z < 100:
[Graph of the density of Z: f(z) decreases linearly from 0.020 at z = 0 to 0 at z = 100.]
Z has a mean of 100/3 = 33.333 < 50. 2.3. B. There is a 40% chance we stop at step 1, and thus a 40% chance of a 2. There is a 65% chance we stop at step 2 or before; therefore, there is a 65% - 40% = 25% chance that we stop at step 2. There is a 25% chance of a 1. Probability of a 3 is: 85% - 65% = 20%. Probability of a 4 is: 100% - 85% = 15%. The probabilities for X = 1, 2, 3, 4, respectively are: 0.25, 0.40, 0.20, 0.15.
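For those who like to check solutions numerically, the calculation in solution 2.2 above can be reproduced with the following short sketch (merely illustrative):

u = [0.574, 0.079, 0.803, 0.382, 0.507, 0.848, 0.090, 0.631, 0.246, 0.724,
     0.968, 0.372, 0.653, 0.736, 0.329, 0.757, 0.915, 0.177, 0.770, 0.403]

# Pair consecutive random numbers, scale each to [0, 100], and take the minimum.
z_values = [min(100 * a, 100 * b) for a, b in zip(u[0::2], u[1::2])]
print(round(sum(z_values) / len(z_values), 2))   # 32.38, answer B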
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 10
Section 3, Continuous Distributions, Inversion Method

There are two related simulation methods called the inversion method. One applies to continuous distributions, which will be discussed in this section, while the other applies to discrete distributions, to be discussed in the next section.
Letʼs assume we wish to simulate a random draw from an Exponential Distribution with mean 1000, F(x) = 1 - e^(-x/1000).
Let 0.90 be a random number from [0,1]. Set F(x) = 0.9 and solve for x.
0.9 = 1 - e^(-x/1000) ⇒ x = -1000 ln(1 - 0.9) = 2303.
If instead we set S(x) = 0.9, then 0.9 = e^(-x/1000) ⇒ x = -1000 ln(0.9) = 105.
F(x) and S(x) are shown below:
[Graph of F(x) and S(x) for the Exponential Distribution with θ = 1000, with probability from 0 to 1 on the vertical axis and x from 0 to 3000 on the horizontal axis.]
2303 is where the Distribution Function is 0.9, while 105 is where the Survival Function is 0.9.
Either technique is valid. In all problems involving simulation via inversion, there are always (at least) two ways to proceed.
This is an example of simulation of a continuous distribution by the inversion method.
The key idea is that for any continuous distribution y = F(x) is uniformly distributed on the interval [0,1].13

13 y = S(x) is also uniformly distributed on the interval [0,1].
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 11
Assume u is a random draw from the uniform distribution on [0,1]. To simulate a random draw from the distribution function F(x), we either set u = F(x) or u = S(x), and solve for x.
Note that if we set u = F(x), large random numbers correspond to large simulated losses.
If instead we set u = S(x), then large random numbers correspond to small simulated losses.
For the Exponential Distribution, if one sets u = F(x) = 1 - e^(-x/θ), then x = -θ ln(1-u).
If one instead sets u = 1 - F(x) = S(x), then x = -θ ln(u).
Let u be a random number from (0,1). While ideally exam questions should specify whether large random numbers correspond to large or small losses, letting you know whether to set u equal to F(x) or S(x), recent questions do not. If in an exam question it is not stated which way to perform the method of inversion, set F(x) = u, since this is the manner shown in Loss Models, and then solve for x.14
Setting F(x) = u is the same mathematics as solving for a percentile. Therefore, for those cases where formulas are given in Appendix A, we can determine VaR_p(X), for p = u.
For example, for the Exponential Distribution, VaR_p(X) = -θ ln(1-p). Let p = u, we get x = -θ ln(1-u), matching the previous result.
Exercise: Loss sizes are distributed via F(x) = 1 / {1 + (1000/x)^0.7}.
Let 0.312 be a random draw from the uniform distribution on [0,1].
Use this random draw to simulate a random loss size. Assume large random numbers correspond to large losses.
[Solution: 0.312 = F(x) = 1 / {1 + (1000/x)^0.7}. Solve for x: x = 1000 {(1/0.312) - 1}^(-1/0.7) = 323.
Alternately, this is a LogLogistic Distribution with parameters γ = 0.7 and θ = 1000.
For the LogLogistic Distribution, VaR_p(X) = θ {p⁻¹ - 1}^(-1/γ) = (1000) {1/0.312 - 1}^(-1/0.7) = 323.
Comment: One can check as follows: F(323) = 1 / {1 + (1000/323)^0.7} = 0.312.]

14 In practical applications, use whichever manner you wish.
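The two worked examples so far (the Exponential with θ = 1000 and the LogLogistic exercise) can be reproduced by inverting F in code; the sketch below (function names are my own) sets u = F(x) and solves for x:

import math

def sim_exponential(u, theta):
    # Set u = F(x) = 1 - exp(-x/theta) and solve:  x = -theta * ln(1 - u).
    return -theta * math.log(1 - u)

def sim_loglogistic(u, theta, gamma):
    # Set u = F(x) = 1 / (1 + (theta/x)^gamma) and solve:  x = theta * (1/u - 1)^(-1/gamma).
    return theta * (1 / u - 1) ** (-1 / gamma)

print(round(sim_exponential(0.90, 1000)))         # 2303, as in the Exponential example
print(round(sim_loglogistic(0.312, 1000, 0.7)))   # 323, as in the LogLogistic exercise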
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 12
This inversion method makes sense, since both u and F(x) are uniformly distributed on the interval [0,1]. By setting u = F(x), the chance of x being in an arbitrary subinterval [c,d] of [0,1], will be equal to the chance that u is in the interval [F(c), F(d)]. This chance is F(d) - F(c), since u is uniformly distributed on [0,1]. This is precisely what is desired; thatʼs what is meant by x being a random draw from F(x). Setting u = F(x) is only useful provided one can solve for x. We could do so in the case of the Exponential and LogLogistic Distributions, because these Distribution functions can be inverted (in closed algebraic form.) Simulation by the inversion method will work for any Distribution Function, F(x), that can be algebraically inverted.15 Examples of distributions that may be simulated in this manner include the: Exponential, Inverse Exponential, Weibull, Inverse Weibull, Pareto, Inverse Pareto, Burr, Inverse Burr, ParaLogistic, Inverse ParaLogistic, LogLogistic, and Single Parameter Pareto.16 Exercise: Let 0.85 be a random number from [0,1]. Using the inversion method, with large random numbers corresponding to large losses, simulate a random loss from a Pareto Distribution, with parameters α = 3 and θ = 500. ⎛ 500 ⎞ 3 [Solution: Set 0.85 = F(x) = 1 - ⎜ ⎟ . Solve for x. (1 - 0.85)-1/3 = (500+x)/500. ⎝ 500 + x ⎠ x = (500)(0.15-1/3) - 500 = 441. Alternately, for the Pareto Distribution, VaRp (X) = θ {(1 - p)−1/α - 1} = (500) {(1 - 0.85)-1/3 - 1} = (500) (0.8821) = 441. Comment: 1 - {500 / (500 + 441)}3 = 0.85.]
15 16
Also called Simulation by Inversion or Simulation by Algebraic Inversion. Each of these distributions has a formula for VaRp (X) in Appendix A attached to the exam.
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 13
F(x) is uniform on [0,1]: For example, take an Exponential Distribution with θ = 1000 and let y = 1 - e-x/1000. Then the chance of y being in [0.65, 0.66] will be equal to the chance that x is such that 0.65 ≤ 1 - e-x/1000 ≤ 0.66. Since x was assumed to follow an Exponential Distribution with θ = 1000, by the definition of the cumulative distribution function, this chance is: 0.66 - 0.65 = 0.01. Thus the chance of y being in the subinterval [0.65, 0.66] is equal to the width of that interval. More generally assume x follows the Distribution Function F(x). Then for y = F(x), the chance of y being in an arbitrary subinterval [c, d] of [0, 1], will be equal to the chance that x is such that: c ≤ F(x) ≤ d. By the definition of the cumulative distribution function, this chance is: d - c. Thus the chance of y being in any subinterval of [0, 1] is proportional to the width of that subinterval, with the same proportionality constant of 1 regardless of the subinterval. Thus y is uniformly distributed, with density function h(y) = 1 for 0 ≤ y ≤ 1.
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 14
Problems: 3.1 (1 point) Using the inversion method, you are simulating a random loss from the exponential distribution F(x) = 1 - exp(-x/1000). You draw the value 0.988 from the uniform distribution on [0,1]. What is the corresponding size of loss? A. less than 4100 B. at least 4100 but less than 4200 C. at least 4200 but less than 4300 D. at least 4300 but less than 4400 E. at least 4400 3.2 (1 point) An Inverse Weibull Distribution has parameters θ = 15 and τ = 3. Simulate a random draw from this distribution using the random number 0.215. A. 9 B. 10 C. 11 D. 12 E. 13 3.3 (1 point) You wish to simulate via the inversion method a random variable, X, with probability density function: f(x) = 3000 x-4 , 10 < x < ∞. To do so, you use a random variable Y, with probability density function: f(y) = 1; 0 < y < 1. Which simulated value of X corresponds to a sample value of Y equal to 0.35? A. Less than 11.0 B. At least 11.0, but less than 11.2 C. At least 11.2, but less than 11.4 D. At least 11.4, but less than 11.6 E. 11.6 or more 3.4 (1 point) An Inverse Paralogistic Distribution has parameters τ = 6 and θ = 2000. Simulate a random draw from this distribution using the random number 0.396. A. 1900 B. 2100 C. 2300 D. 2500 E. 2700 3.5 (1 point) You wish to model a Pareto distribution with α = 1.5 and θ = 3 via simulation. A value of 0.6 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from the Pareto distribution corresponds to this 0.6? A. Less than 2.6 B. At least 2.6, but less than 2.7 C. At least 2.7, but less than 2.8 D. At least 2.8, but less than 2.9 E. 2.9 or more
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 15
3.6 (1 point) An Inverse Exponential Distribution has θ = 400. Simulate a random draw from this distribution using the random number 0.092. A. 150 B. 170 C. 190 D. 210 E. 230 3.7 (1 point) Suppose you wish to model a distribution of the form f(x) = qxq-1, 0 < x < 1, with q = 0.3, by simulation. A value of 0.8 is randomly generated from a uniform distribution on the interval (0,1). What simulated value corresponds to this 0.8? A. Less than 0.45 B. At least 0.45, but less than 0.50 C. At least 0.50, but less than 0.55 D. At least 0.55, but less than 0.60 E. 0.60 or more 3.8 (1 point) An Inverse Pareto Distribution has parameters τ = 4 and θ = 300. Simulate a random draw from this distribution using the random number 0.774. A. 3000 B. 3500 C. 4000 D. 4500 E. 5000 3.9 (1 point) You want to generate a random variable, X, with probability density function: f(x) = 0.25 x3 for 0 < x < 2. To do so, you first use a random number, Y, which is uniformly distributed on (0,1). What simulated value of X corresponds to a sample value of Y equal to 0.125? A. 0.750 B. 0.879 C. 1.000 D. 1.189 E. 1.250 3.10 (1 point) An Inverse Burr Distribution has parameters τ = 5, θ = 700, and γ = 3. Simulate a random draw from this distribution using the random number 0.871. A. 1900 B. 2100 C. 2300 D. 2500 E. 2700 3.11 (1 point) You wish to model a Weibull distribution with parameters θ = 464 and τ = 3 by simulation. A value of 0.4 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from the Weibull distribution corresponds to this 0.4? A. Less than 360 B. At least 360, but less than 380 C. At least 380, but less than 400 D. At least 400, but less than 420 E. 420 or more
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 16
3.12 (1 point) You wish to simulate a LogLogistic Distribution as per Loss Models with parameters θ = 250 and γ = 4. A value of 0.37 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from the LogLogistic distribution corresponds to this 0.37? A. Less than 220 B. At least 220, but less than 225 C. At least 225, but less than 230 D. At least 230, but less than 235 E. 235 or more 3.13 (1 point) You wish to simulate a random value from the distribution F(x) = 1 - exp[-0.1(1.05x - 1)]. A value of 0.61 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from this distribution corresponds to this 0.61? A. Less than 35 B. At least 35, but less than 40 C. At least 40, but less than 45 D. At least 45, but less than 50 E. 50 or more 3.14 (1 point) You wish to simulate a ParaLogistic Distribution as per Loss Models with parameters α = 6 and θ = 50. A value of 0.89 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from the ParaLogistic distribution corresponds to this 0.89? A. Less than 35 B. At least 35, but less than 40 C. At least 40, but less than 45 D. At least 45, but less than 50 E. 50 or more 3.15 (3 points) Using the inversion method, simulate two random draws from the distribution: 1 x F(x) = + , -∞ < x < ∞. Use the random numbers 0.274 and 0.620. 2 8 + 4x2 What is the sum of these two simulated values? A. Less than -0.4 B. At least -0.4, but less than -0.3 C. At least -0.3, but less than -0.2 D. At least -0.2, but less than -0.1 E. -0.1 or more
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 17
3.16 (3 points) You are given the following graph of a cumulative distribution function: F(x) 1 (10000, 1.0)
0.9 (1000, 0.7)
(5000, 0.9)
0.7
(0, 0)
x 1000
5000
10000
You simulate 4 losses from this distribution using the following random numbers from (0, 1): 0.812, 0.330, 0.626, 0.941. For a policy with a $500 deductible, determine the simulated average payment per payment. A. Less than 2500 B. At least 2500, but less than 3000 C. At least 3000, but less than 3500 D. At least 3500, but less than 4000 E. 4000 or more ⎛ ⎞α 1 3.17 (1 point) You wish to model a Burr distribution, F(x) = 1 - ⎜ γ ⎟ , by simulation. ⎝ 1 + (x / θ) ⎠ The Burr distribution has parameters α = 3, θ = 100, and γ = 2. A value of 0.75 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from the Burr distribution corresponds to this 0.75? A. Less than 77 B. At least 77, but less than 78 C. At least 78, but less than 79 D. At least 79, but less than 80 E. 80 or more
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 18
3.18 (3 points) Annual prescription drug costs are modeled by a two-parameter Pareto distribution with θ = 2000 and α = 2. A prescription drug plan pays annual drug costs for an insured member subject to the following provisions: (i) The insured pays 100% of costs up to the ordinary annual deductible of 250. (ii) The insured then pays 25% of the costs between 250 and 2250. (iii) The insured pays 100% of the costs above 2250 until the insured has paid 3600 in total. This occurs when annual costs reach 5100. (iv) The insured then pays 5% of the remaining costs. Use the following random numbers to simulate four years: 0.58, 0.94, 0.13, 0.80. What are the total plan payments for the four simulated years? A. Less than 4500 B. At least 4500, but less than 4600 C. At least 4600, but less than 4700 D. At least 4700, but less than 4800 E. 4800 or more 3.19 (3 points) The Cauchy Distribution has density f(x) =
1 , -∞ < x < ∞. π {1 + (x - µ)2 }
Using the inversion method, simulate two random draws from a Cauchy Distribution with µ = 2. Use the random numbers 0.313 and 0.762. What is the product of these two simulated values? d tan -1(x) 1 Hint: = . dx 1 + x2 A. 2.5
B. 3.0
C. 3.5
D. 4.0
E. 4.5
3.20 (2 points) Loss sizes for liability risks follow a Pareto distribution, with parameters θ = 300 and α = 4. Loss sizes for property risks follow a Pareto distribution, with parameters θ = 1,000 and α = 3. Using the inversion method, a loss of each type is simulated. Use the random number 0.733 to simulate the liability loss and the random number 0.308 to simulate the property loss. What is the size of the larger simulated loss? A. Less than 120 B. At least 120, but less than 125 C. At least 125, but less than 130 D. At least 130, but less than 135 E. At least 135
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 19
3.21 (2 points) Losses follow a LogLogistic Distribution with parameters θ = 2000 and γ = 3. The following are three uniform (0,1) random numbers: 0.5217 0.1686 0.9485 Using these numbers and the inversion method, simulate three losses. Calculate the average payment per payment for a contract with a deductible of 1000. A. Less than 2000 B. At least 2000, but less than 2250 C. At least 2250, but less than 2500 D. At least 2500, but less than 2750 E. 2750 or more 3.22 (2 points) Use the following:
• Two actuaries each simulate one loss from a Weibull Distribution with parameters τ = 2 and θ = 1500.
• Each uses the random number 0.312 from (0, 1) and the Inverse Transform Method. • Laurel has large random numbers correspond to large losses; her simulated loss is L. • Hardy has large random numbers correspond to small losses; his simulated loss is H. What is H/L? A. Less than 1.60 B. At least 1.60, but less than 1.65 C. At least 1.65, but less than 1.70 D. At least 1.70, but less than 1.75 E. 1.75 or more 3.23 (2 points) X follows a probability density function: 3 / exp[3x + 12], x > -4. Using the random number 0.2203 and the inversion method, simulate a random value of X. A. -3.9 B. -3.8 C. -3.7 D. -3.6 E. -3.5 3.24 (4, 5/86, Q.53) (1 point) You wish to generate, via the method of simulation, a random variable, X, with probability density function: f(x) = 20,000 x-3; 100 < x < ∞. To do so, you use a random variable Y, with probability density function: f(y) = 1; 0 < y < 1. Which simulated value of X corresponds to a sample value of Y equal to 0.25? 100 100 100 A. B. C. D. (80,000)1/3 E. None of the above. 0.75 0.25 0.25
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 20
3.25 (4, 5/87, Q.49) (1 point) You wish to model a Pareto distribution by simulation. The Pareto distribution has parameters α = 0.50 and θ = 3. A value of 0.5 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value from the Pareto distribution corresponds to this 0.5? A. Less than 7.5 B. At least 7.5, but less than 8.5 C. At least 8.5, but less than 9.5 D. At least 9.5, but less than 10.5 E. 10.5 or more 3.26 (4, 5/88, Q.52) (1 point) Suppose you wish to model a distribution of the form f(x) = qxq-1, 0 < x < 1, with q = 0.5, by simulation. A value of 0.75 is randomly generated from a uniform distribution on the interval (0,1). What simulated value corresponds to this 0.75? A. Less than 0.45 B. At least 0.45, but less than 0.50 C. At least 0.50, but less than 0.55 D. At least 0.55, but less than 0.60 E. 0.60 or more 3.27 (4, 5/90, Q.26) (1 point) You want to generate a random variable, X, with probability density function f(x) = (3/8) x2 for 0 < x < 2 by simulation. To do so, you first use a random number, Y, which is uniformly distributed on (0,1). What simulated value of X corresponds to a sample value of Y equal to 0.125? A. 0.125 B. 0.250 C. 0.500 D. 0.750 E. 1.000 3.28 (4, 5/91, Q.24) (1 point) A random number generator producing numbers in the unit interval [0,1] is used to produce an Exponential distribution with parameter θ = 20. The number produced by the random number generator is y = 0.21. Use the inversion method, to calculate the corresponding value of the exponential distribution. A. x = 4.71 B. x = 0.01 C. x = 28.70 D. x = 0.24 E. x = 1.18 3.29 (4B, 11/92, Q.29) (1 point) You are given the following:
• •
The random variable X has the density function f(x) = 4x-5 , x > 1.
The random variable Y having the uniform distribution on (0,1) is used to simulate outcomes for X. Determine the simulated value of X that corresponds to an observed value of Y = 0.250. A. 0.931 B. 0.944 C. 1.000 D. 1.059 E. 1.075
2013-4-13,
Simulation §3 Continuous Dist. Inversion Method, HCM 10/25/12, Page 21
3.30 (4B, 5/94, Q.11) (2 points) You are given the following:
• The random variable X for the amount of an individual claim has the density function f(x) = 2x^(-3), x ≥ 1.
• The random variable R for the ratio of loss adjustment expense (LAE) to loss is uniformly distributed on the interval [0.01, 0.21].
• The amount of an individual claim is independent of the ratio of LAE to loss.
• The random variable Y having the uniform distribution on [0,1) is used to simulate outcomes for X and R, respectively.
• Observed values of Y are Y1 = 0.636 and Y2 = 0.245.
Using Y1 to simulate X and Y2 to simulate R, determine the simulated value for X(1+R).
A. Less than 0.65
B. At least 0.65, but less than 1.25
C. At least 1.25, but less than 1.85
D. At least 1.85, but less than 2.45
E. At least 2.45
3.31 (4B, 11/94, Q.18) (1 point) You are given the following: The random variable X has the density function f(x) = 2 x-3, x > 1. The random variable Y having the uniform distribution on [0,1] is used to simulate outcomes for X. Determine the simulated value of X that corresponds to an observed value of Y = 0.50. A. 0.630 B. 0.707 C. 1.414 D. 1.587 E. 2.000 3.32 (4B, 11/96, Q.27) (2 points) You are given the following:
• The random variable X has a Burr distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, with parameters θ = 1,000,000, α = 2, and γ = 0.5.
• A random number is generated from a uniform distribution on the interval (0, 1). The resulting number is 0.36.
Determine the simulated value of X.
A. Less than 50,000
B. At least 50,000, but less than 100,000
C. At least 100,000, but less than 150,000
D. At least 150,000, but less than 200,000
E. At least 200,000
3.33 (4B, 5/97, Q.26) (2 points) You are given the following: • The random variable X has a distribution that is a mixture of a distribution with density function f1 (x) = 4x3 , 0 < x < 1 and a distribution with density function f2 (x) = 12x2 (1 - x), 0 < x < 1.
• f1(x) has a weight of 0.75 and f2(x) has a weight of 0.25.
• A random number is generated from a uniform distribution on the interval (0,1). The resulting number is 0.064.
Determine the simulated value of X.
A. Less than 0.15 B. At least 0.15, but less than 0.25 C. At least 0.25, but less than 0.35 D. At least 0.35, but less than 0.45 E. At least 0.45
3.34 (4B, 5/98, Q.1) (1 point) You are given the following:
• The random variable X has cumulative distribution function F(x) = 1 - 1/{1 + exp[(x - 1)/2]}, -∞ < x < ∞.
• A random number is generated from a uniform distribution on the interval (0, 1). The resulting number is 0.4.
Determine the simulated value of X.
A. Less than 0.25 B. At least 0.25, but less than 0.75 C. At least 0.75, but less than 1.25 D. At least 1.25, but less than 1.75 E. At least 1.75
3.35 (CAS3, 11/04, Q.39) (2.5 points) The Gumbel distribution is given by: F(x) = exp[-e^(-x)], for all x.
Use the random number from a uniform distribution on the interval [0,1], 0.9833, in order to simulate a random value from the Gumbel distribution. Use the inversion method, with large random numbers corresponding to large results.
A. Less than -2.0 B. At least -2.0, but less than 0 C. At least 0, but less than +2.0 D. At least +2.0, but less than +4.0 E. At least +4.0
Note: I have rewritten this exam question in order to match the current syllabus.
3.36 (4, 11/06, Q.32 & 2009 Sample Q.275) (2.9 points) A dental benefit is designed so that a deductible of 100 is applied to annual dental charges. The reimbursement to the insured is 80% of the remaining dental charges subject to an annual maximum reimbursement of 1000. You are given: (i) The annual dental charges for each insured are exponentially distributed with mean 1000. (ii) Use the following uniform (0, 1) random numbers and the inversion method to generate four values of annual dental charges: 0.30 0.92 0.70 0.08 Calculate the average annual reimbursement for this simulation. (A) 522 (B) 696 (C) 757 (D) 947 (E) 1042
Solutions to Problems:
3.1. E. u = F(x) = 1 - e^(-x/1000). Therefore, x = -θ ln(1 - u) = -1000 ln(1 - 0.988) = 4423.
Alternately, for the Exponential, VaR_p(X) = -θ ln(1-p) = -1000 ln(1 - 0.988) = 4423.
3.2. E. For the Inverse Weibull Distribution, VaR_p(X) = θ {-ln(p)}^(-1/τ) = (15){-ln(0.215)}^(-1/3) = 13.0.
Alternately, F(x) = exp[-(θ/x)^τ] = exp[-(15/x)^3]. 0.215 = exp[-(15/x)^3]. ⇒ 1.5371 = (15/x)^3. ⇒ x = 13.0.
3.3. D. F(x) = 1 - 1000x^(-3), 10 < x. Set 0.35 = F(x) = 1 - 1000x^(-3).
x = {1000/(1 - 0.35)}^(1/3) = 11.54.
Alternately, this is a Single Parameter Pareto Distribution with α = 3 and θ = 10.
For the Single Parameter Pareto Distribution, VaR_p(X) = θ (1-p)^(-1/α) = (10)(1 - 0.35)^(-1/3) = 11.54.
3.4. E. For the Inverse Paralogistic Distribution, VaR_p(X) = θ {p^(-1/τ) - 1}^(-1/τ) = (2000){0.396^(-1/6) - 1}^(-1/6) = 2695.
Alternately, F(x) = {(x/θ)^τ / (1 + (x/θ)^τ)}^τ = {(x/2000)^6 / (1 + (x/2000)^6)}^6.
0.396 = {(x/2000)^6 / (1 + (x/2000)^6)}^6. ⇒ 0.85694 = (x/2000)^6 / {1 + (x/2000)^6}.
⇒ 0.16694 = (x/2000)^(-6). ⇒ x = 2695.
3.5. A. F(x) = 1 - {θ/(θ + x)}^α = 1 - {3/(3+x)}^1.5. Set 0.6 = F(x) = 1 - {3/(3+x)}^1.5.
3/(3+x) = 0.4^(1/1.5) = 0.543. x = 2.52.
Alternately, for the Pareto Distribution, VaR_p(X) = θ {(1 - p)^(-1/α) - 1} = (3){(1 - 0.6)^(-1/1.5) - 1} = (3)(0.8420) = 2.52.
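As a quick numeric check of the closed-form inversions used in solutions 3.1 through 3.5, here is a minimal Python sketch (Python is my illustration choice, not part of the syllabus readings; only the formulas already quoted above are used):

import math

print(-1000 * math.log(1 - 0.988))              # 3.1: Exponential, about 4423
print(15 * (-math.log(0.215)) ** (-1/3))        # 3.2: Inverse Weibull, about 13.0
print(10 * (1 - 0.35) ** (-1/3))                # 3.3: Single Parameter Pareto, about 11.54
print(2000 * (0.396 ** (-1/6) - 1) ** (-1/6))   # 3.4: Inverse Paralogistic, about 2695
print(3 * ((1 - 0.6) ** (-1/1.5) - 1))          # 3.5: Pareto, about 2.52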
3.6. B. For the Inverse Exponential Distribution, VaR_p(X) = θ {-ln(p)}^(-1) = (400){-ln(0.092)}^(-1) = 168.
Alternately, F(x) = exp[-θ/x] = exp[-400/x]. 0.092 = exp[-400/x]. ⇒ x = 168.
3.7. B. F(x) = x^q, 0 < x < 1. Set 0.8 = F(x) = x^q = x^0.3. Thus x = 0.8^(1/0.3) = 0.475.
Comment: This is a Beta Distribution with parameters a = q, b = 1, and θ = 1, although this doesnʼt help in the solution. While we can simulate this particular Beta Distribution via the Inverse Transform Algorithm, in general one would use the rejection method, not on the syllabus.
3.8. D. For the Inverse Pareto Distribution, VaR_p(X) = θ {p^(-1/τ) - 1}^(-1) = (300){0.774^(-1/4) - 1}^(-1) = 4536.
Alternately, F(x) = {x/(x+θ)}^τ = {x/(x+300)}^4. 0.774 = {x/(x+300)}^4. ⇒ x = 4536.
3.9. D. F(x) = x^4/16. Set 0.125 = F(x) = x^4/16. ⇒ x = 2^0.25 = 1.189.
3.10. C. For the Inverse Burr Distribution, VaR_p(X) = θ {p^(-1/τ) - 1}^(-1/γ) = (700){0.871^(-1/5) - 1}^(-1/3) = 2305.
Alternately, F(x) = {(x/θ)^γ / (1 + (x/θ)^γ)}^τ = {(x/700)^3 / (1 + (x/700)^3)}^5.
0.871 = {(x/700)^3 / (1 + (x/700)^3)}^5. ⇒ 0.97276 = (x/700)^3 / {1 + (x/700)^3}. ⇒ 0.028003 = (x/700)^(-3). ⇒ x = 2305.
3.11. B. F(x) = 1 - exp(-(x/θ)^τ) = 1 - exp(-(x/464)^3). Set 0.4 = F(x) = 1 - exp(-(x/464)^3). x = (464)(-ln 0.6)^(1/3) = 371.
Alternately, for the Weibull Distribution, VaR_p(X) = θ {-ln(1-p)}^(1/τ) = (464){-ln(1 - 0.4)}^(1/3) = 371.
3.12. A. F(x) = 1 - 1/{1 + (x/θ)^γ} = 1 - 1/{1 + (x/250)^4}. Set 0.37 = F(x) = 1 - 1/{1 + (x/250)^4}. x = (250){1/0.63 - 1}^(1/4) = 219.
Alternately, for the Loglogistic Distribution, VaR_p(X) = θ {p^(-1) - 1}^(-1/γ) = (250){1/0.37 - 1}^(-1/4) = 219.
3.13. D. Set 0.61 = F(x) = 1 - exp(-0.1(1.05^x - 1)). x = ln(1 - ln(1 - 0.61)/0.1)/ln(1.05) = 48.0.
Comment: A Gompertz Distribution as per Actuarial Mathematics, with c = 1.05 and m = 0.1 (B = 0.1 ln(1.05) = 0.0049). We have simulated an age at death of 48.
3.14. C. F(x) = 1 - 1/(1 + (x/θ)^α)^α = 1 - 1/(1 + (x/50)^6)^6. Set 0.89 = F(x) = 1 - 1/(1 + (x/50)^6)^6.
x = (50){(1 - 0.89)^(-1/6) - 1}^(1/6) = 43.7.
Alternately, for the ParaLogistic Distribution, VaR_p(X) = θ {(1-p)^(-1/α) - 1}^(1/α) = (50){(1 - 0.89)^(-1/6) - 1}^(1/6) = 43.7.
3.15. B. u = F(x) = 1/2 + x/√(8 + 4x²). ⇒ x = (u - 1/2)√(8 + 4x²). ⇒ x² = (2u - 1)²(2 + x²).
⇒ x² = 2(2u - 1)²/{1 - (2u - 1)²}. ⇒ x = (2u - 1)/√(2u - 2u²).
Note that this has the right property that when u < 1/2, x < 0, and when u > 1/2, x > 0.
The simulated values are: (2(0.274) - 1)/√(2(0.274) - 2(0.274)²) = -0.717, and (2(0.62) - 1)/√(2(0.62) - 2(0.62)²) = 0.350.
Their sum is: -0.717 + 0.350 = -0.367.
Comment: Check that F(-0.717) = 0.274 and F(0.350) = 0.620. This is the t-distribution with two degrees of freedom. One can not in general use the method of inversion to simulate a t-distribution.
3.16. C. Linearly interpolating, we find where F(x) = 0.812; 0.812 corresponds to a loss of: 1000 + (0.112/0.2)(4000) = 3240.
0.330 corresponds to a loss of: (0.330/0.7)(1000) = 471. 0.626 corresponds to a loss of: (0.626/0.7)(1000) = 894.
0.941 corresponds to a loss of: 5000 + (0.041/0.1)(5000) = 7050.
Payments are: 2740, 0, 394, 6550. Average payment per (non-zero) payment is: (2740 + 394 + 6550)/3 = 3228.
3.17. A. F(x) = 1 - (1/(1 + (x/θ)^γ))^α = 1 - (1/(1 + (x/100)^2))^3. Set 0.75 = F(x) = 1 - (1/(1 + (x/100)^2))^3. ⇒ x = (100){(0.25)^(-1/3) - 1}^(1/2) = 76.6.
Alternately, for the Burr Distribution, VaR_p(X) = θ {(1-p)^(-1/α) - 1}^(1/γ) = (100){(1 - 0.75)^(-1/3) - 1}^(1/2) = 76.6.
3.18. C. For the Pareto Distribution, VARp [X] = θ {(1-p)−1/α - 1} = (2000){(1-p)-1/2 - 1}. Thus the drug costs for the first simulated year is: (2000){(1 - 0.58)-1/2 - 1} = 1086. The plan pays: (75%)(1086 - 250) = 627. The drug costs for the second simulated year is: (2000){(1 - 0.94)-1/2 - 1} = 6165. The plan pays: (75%)(2250 - 250) + (95%)(6165 - 5100) = 2512. The drug costs for the third simulated year is: (2000){(1 - 0.13)-1/2 - 1} = 144. The plan pays nothing. The drug costs for the fourth simulated year is: (2000){(1 - 0.80)-1/2 - 1} = 2472. The plan pays: (75%)(2250 - 250) = 1500 Total plan payments for the four simulated years: 627 + 2512 + 0 + 1500 = 4639. Comment: Setup taken from SOA3, 11/04, Q.7 in “Mahlerʼs Guide to Loss Distributions.” 3.19. D.
F(x) = (1/π) ∫ from -∞ to x of 1/{1 + (t - µ)²} dt = (1/π){tan^(-1)(x - µ) - (-π/2)} = 1/2 + (1/π) tan^(-1)(x - µ) = 1/2 + (1/π) tan^(-1)(x - 2).
u = F(x) = 1/2 + (1/π) tan^(-1)(x - 2). x = 2 + tan[π(u - 1/2)].
The simulated values are: 2 + tan[π(0.313 - 0.5)] = 2 - 0.666 = 1.334, and 2 + tan[π(0.762 - 0.5)] = 2 + 1.078 = 3.078.
Their product is: (1.334)(3.078) = 4.11.
Comment: Note that F(1.334) = 1/2 + (1/π) tan^(-1)(-0.666) = 1/2 - 0.5875/3.14159 = 0.313, and F(3.078) = 1/2 + (1/π) tan^(-1)(1.078) = 1/2 + 0.8229/3.14159 = 0.762.
An integral you are unlikely to need to know how to do for your exam. The Cauchy Distribution for µ = 0 is a t-distribution with one degree of freedom.
3.20. D. Set u = F(x) = 1 - (1 + x/θ)^(-α). x = θ{(1-u)^(-1/α) - 1}.
The liability loss is: 300{(1 - 0.733)^(-1/4) - 1} = 117. The property loss is: 1000{(1 - 0.308)^(-1/3) - 1} = 131.
The larger simulated loss is 131.
Comment: In each case, we could use that for the Pareto, VaR_p(X) = θ {(1 - p)^(-1/α) - 1}.
3.21. A. F(x) = 1 - 1/{1 + (x/θ)^γ} = 1 - 1/{1 + (x/2000)^3} = u. ⇒ x = (2000){1/(1-u) - 1}^(1/3).
x = (2000)(1/0.4783 - 1)^(1/3) = 2059. x = (2000)(1/0.8314 - 1)^(1/3) = 1175. x = (2000)(1/0.0515 - 1)^(1/3) = 5282.
Payments are: 1059, 175, 4282. Average payment per (non-zero) payment is: (1059 + 175 + 4282)/3 = 1839.
Comment: If the smallest simulated loss had been somewhat smaller and less than 1000, then there would have been only two non-zero payments. In this case, there are 3 losses that result in 3 (non-zero) payments. When all of the losses in a sample result in a non-zero payment, the average payment per payment = mean - deductible.
3.22. E. Laurel sets u = F(x) = 1 - exp[-(x/θ)^τ]. Solving, x = θ(-ln[1-u])^(1/τ) = 1500(-ln[1 - 0.312])^(1/2) = 917.
Hardy sets u = S(x) = exp[-(x/θ)^τ]. Solving, x = θ(-ln[u])^(1/τ) = 1500(-ln[0.312])^(1/2) = 1619.
H/L = 1619/917 = 1.77. Comment: The answer does not depend on theta.
3.23. A. f(x) = 3 e^(-3x) e^(-12), x > -4. By integration: F(x) = 1 - e^(-3x) e^(-12), x > -4.
Set: 0.2203 = 1 - e^(-3x) e^(-12). ⇒ ln[0.7797] = -3x - 12. ⇒ x = -3.917.
3.24. E. F(x) = 1 - 10,000 x^(-2). Setting 0.25 = F(x) = 1 - 10,000 x^(-2). x^(-2) = 0.75/10,000. Therefore, x = 100/√0.75.
Alternately, this is a Single Parameter Pareto Distribution with α = 2 and θ = 100, with VaR_p(X) = θ (1-p)^(-1/α) = (100)(1 - 0.25)^(-1/2) = 100/√0.75.
3.25. C. F(x) = 1 - {θ/(θ+x)}^α = 1 - {3/(3+x)}^0.5. Setting F(x) equal to the uniform random number from (0,1): 0.5 = 1 - {3/(3+x)}^0.5. Therefore 3/(3+x) = 0.5² = 0.25. Thus x = 9.
Alternately, for the Pareto Distribution, VaR_p(X) = θ {(1 - p)^(-1/α) - 1} = (3){(1 - 0.5)^(-1/0.5) - 1} = (3)(3) = 9.
3.26. D. Integrating f(x), F(x) = x^q = x^0.5. Setting F(x) = 0.75, x = 0.75² = 0.5625.
3.27. E. Since f(x) = (3/8)x² for 0 < x < 2, then by integration F(x) = x³/8. Set 0.125 = F(x) = x³/8. Solving, x = {(0.125)(8)}^(1/3) = 1.
3.28. A. Set the Distribution function for the exponential with θ = 20 equal to the random number from [0,1]. 0.21 = F(x) = 1 - e^(-x/20). Therefore, ln(0.79) = -x/20. ⇒ x = -20 ln(0.79) = 4.71.
Alternately, for the Exponential, VaR_p(X) = -θ ln(1-p) = -20 ln(1 - 0.21) = 4.71.
3.29. E. Integrating f(t) from t = 1 to x, F(x) = 1 - x^(-4), x > 1. Simulate via the method of inversion by setting Y = F(x). 0.250 = 1 - x^(-4). ⇒ x = 0.75^(-1/4) = 1.075.
3.30. C. The Distribution Function for losses is obtained from the given density function via integration: F(x) = 1 - x^(-2), x ≥ 1. Set F(x) = Y1. 1 - x^(-2) = 0.636. Thus x = 1.66.
To get a random number in the interval [0.01, 0.21] from a random number in the interval [0,1], one must change the width of the latter interval and translate it until it covers the former interval. This is achieved by multiplying by 0.2 and adding 0.01. Thus the simulated value of R is: 0.2 Y2 + 0.01 = 0.059.
Thus the simulated value of X(1 + R) is: (1.66)(1 + 0.059) = 1.76.
Comment: The size of loss distribution is a Single Parameter Pareto Distribution with α = 2 and θ = 1.
For the Single Parameter Pareto Distribution, VaR_p(X) = θ (1-p)^(-1/α) = (1)(1 - 0.636)^(-1/2) = 1.66.
This question is a simplified form of a possible real world use of simulation.
3.31. C. f(x) = 2x^(-3), x > 1, and therefore integrating from 1 to x: F(x) = 1 - 1/x², x > 1.
For a random number on [0,1] of 0.5, using the method of inversion we set 0.5 = F(x). 0.5 = 1 - 1/x², and therefore x = 1.414.
Alternately, this is a Single Parameter Pareto Distribution with α = 2 and θ = 1, with VaR_p(X) = θ (1-p)^(-1/α) = (1)(1 - 0.5)^(-1/2) = √2 = 1.414.
3.32. B. Set F(x) = 0.36 and solve for x:
0.36 = F(x) = 1 - {1/(1 + (x/θ)^γ)}^α = 1 - {1/(1 + (x/1,000,000)^0.5)}^2. ⇒
1000/(1000 + x^0.5) = (1 - 0.36)^(1/2) = 0.8. ⇒ 200 = 0.8 x^0.5. ⇒ x = 250² = 62,500.
Alternately, for the Burr Distribution, VaR_p(X) = θ {(1-p)^(-1/α) - 1}^(1/γ) = (1,000,000){(1 - 0.36)^(-1/2) - 1}^(1/0.5) = 62,500.
3.33. D. Via integration, the two Distribution Functions are: F1(x) = x^4 and F2(x) = 4x³ - 3x^4.
Thus the mixed Distribution Function is: 0.75 F1(x) + 0.25 F2(x) = 0.75x^4 + 0.25(4x³ - 3x^4) = x³.
Simulating by algebraic inversion, we set the random number equal to the (mixed) Distribution Function: 0.064 = x³. Thus x = 0.4.
Comment: Since the mixed distribution simplifies in this case, we can simulate it via algebraic inversion using only one random number. The general method of simulating mixed distributions is discussed in a subsequent section.
3.34. A. Set 0.4 = F(x) and solve for x. 0.4 = 1 - 1/{1 + exp((x-1)/2)}. ⇒ 1 + exp((x-1)/2) = 1/0.6. ⇒ (x-1)/2 = ln(0.6667). ⇒ x = 0.189.
Comment: Check, F(0.189) = 1 - 1/{1 + exp(-0.4055)} = 1 - 1/1.667 = 0.4.
3.35. E. Set 0.9833 = F(x) = exp[-e^(-x)]. ⇒ e^(-x) = 0.01684. ⇒ x = 4.08.
Comment: While the Gumbel Distribution is in Appendix A.4 of Loss Models, you are very unlikely to be asked about it on your exam. The Gumbel Distribution is a special case of the extreme value distribution. Note that F(-∞) = exp[-∞] = 0 and F(∞) = exp[0] = 1. f(x) = e^(-x) exp[-e^(-x)] > 0.
3.36. A. Set u = F(x) = 1 - e^(-x/1000). ⇒ x = -1000 ln(1 - u).
The four simulated annual dental charges are: -1000 ln(0.7) = 357, -1000 ln(0.08) = 2526, -1000 ln(0.3) = 1204, -1000 ln(0.92) = 84.
Reimbursements are: (0.8)(257) = 206, 1000, (0.8)(1104) = 883, 0.
Average annual reimbursement is: (206 + 1000 + 883 + 0)/4 = 522.
Comment: For the Exponential Distribution, VaR_p(X) = -θ ln(1-p). Each year there is a total amount reimbursed, including the possibility of zero. In order to get the average per year, we calculate the total reimbursements over the four years and divide by four. We are taking an average per year, rather than an average per reimbursement.
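The whole of solution 3.36 fits in a few lines of code. The following Python sketch (an illustration, not part of the original question) uses only the 100 deductible, 80% coinsurance, and 1000 annual maximum given in the problem, and reproduces the average of 522:

import math

def reimbursement(u, mean=1000, deductible=100, coinsurance=0.8, maximum=1000):
    charge = -mean * math.log(1 - u)                       # Exponential inversion
    return min(coinsurance * max(charge - deductible, 0), maximum)

us = [0.30, 0.92, 0.70, 0.08]
payments = [reimbursement(u) for u in us]
print(payments, sum(payments) / len(payments))             # roughly [206, 1000, 883, 0] and 522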
Section 4, Discrete Distributions, the Inversion Method The inversion method can be applied to simulate any discrete distribution.17 In particular it can be used for any frequency distribution, including the Binomial,18 Poisson,19 Geometric, and Negative Binomial.20 One random number produces one random draw from the discrete distribution. Bernoulli Distribution: Assume we wish to simulate a random draw from a Bernoulli Distribution with mean .2. Then if 0 ≤ u ≤ 0.8 there is no claim, while if 1 ≥ u > 0.8 there is a claim.21 In general for a Bernoulli with parameter q: for u ≤ 1-q there is no claim, while for u > 1-q there is a claim. Exercise: You wish to simulate via the inversion method, random draws from a Bernoulli Distribution with q = 0.2. You are given the following sequence of random numbers from [0,1]: 0.6, 0.2, 0.05, 0.9, 0.995, 0.4, 0.5. Determine the corresponding sequence of random draws from the Bernoulli Distribution. [Solution: f(0) = 0.8, f(1) = 0.2. F(0) = 0.8, F(1) = 1. If the random number is less than or equal to .8 then there is no claim; if the random number is greater than .8 then there is a claim. The corresponding sequence of random draws from the Bernoulli Distribution is: 0, 0, 0, 1, 1, 0, 0.] General Case of a Discrete Distribution: In general, for a discrete distribution, one first constructs a table of the Distribution Function, by cumulating densities. Then one looks for the first place the Distribution Function is greater than the random number. We want the smallest x, such that F(x) > u.22 Using this technique, large random numbers correspond to large simulated values. 17
As discussed subsequently, one can also apply the inversion method to a life table, in order to simulate times of death and the present values of benefits paid for life insurances and annuities. The inversion method is also called the inverse transform method. I refer to the application to discrete distributions as the “Table Look-Up Method”. I refer to the application to continuous distributions as algebraic inversion. 18 One can also simulate a Binomial with parameters m and q, by summing the result of simulating m independent random draws from a Bernoulli Distribution with parameter q. 19 One can also simulate the Poisson via the special algorithm for the Poisson Process to be described in a subsequent section. 20 One can also simulate a Negative Binomial with parameters r and β, for r integer, by summing the result of simulating r independent random draws from a Geometric Distribution with parameter β. 21 Note that one could also simulate the same Bernoulli via a claim when u < 0.2 and no claim for u ≥ 0.2. In general there are many ways to simulate a given distribution. 22 The method in Loss Models ⇔ F(x) > u, although the explanation is far from clear. It would be equally valid to instead require F(x) ≥ u. While Loss Models discusses this distinction as if it were important, there should be no practical difference, since using a computer we would not expect to get a random number u exactly equal to F(x).
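To make the table look-up concrete, here is a minimal Python sketch (Python is my choice of illustration, not something from the syllabus readings) of the general rule: cumulate the densities to get F(x), then return the smallest x such that F(x) > u. Run on the Bernoulli exercise above with q = 0.2, it reproduces the draws 0, 0, 0, 1, 1, 0, 0.

def simulate_discrete(u, outcomes, probs):
    """Inversion (table look-up): return the smallest outcome x with F(x) > u."""
    cumulative = 0.0
    for x, p in zip(outcomes, probs):
        cumulative += p              # cumulative is now F(x)
        if cumulative > u:
            return x
    return outcomes[-1]              # guard against round-off when u is extremely close to 1

# Bernoulli with q = 0.2: f(0) = 0.8, f(1) = 0.2.
random_numbers = [0.6, 0.2, 0.05, 0.9, 0.995, 0.4, 0.5]
print([simulate_discrete(u, [0, 1], [0.8, 0.2]) for u in random_numbers])   # [0, 0, 0, 1, 1, 0, 0]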
While for discrete distributions one usually does things in this manner, if instead one wants small random numbers corresponding to large simulated values, then we want the smallest x such that F(x) > 1 - u, or equivalently S(x) < u.
Binomial Distribution:23
For example, take a Binomial Distribution with q = 0.2 and m = 3.
We can compute a table of F(x), and then invert by looking up the proper values in this table:24
Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   51.200%                         0.51200
1                   38.400%                         0.89600
2                   9.600%                          0.99200
3                   0.800%                          1.00000
For example, if u is 0.95, then we simulate 2 claims, because 2 is the first value such that F(x) > 0.95.
For u in the interval [0.896, 0.992), we will simulate 2 claims. Since u is uniformly distributed on [0,1], the chance of simulating 2 claims will be: 0.992 - 0.896 = 0.096, as desired.
Exercise: You wish to simulate via the inversion method, random draws from a Binomial Distribution with m = 3 and q = 0.2. Large random numbers correspond to a large number of claims.
You are given the following sequence of random numbers from [0,1]: 0.6, 0.2, 0.8, 0.05, 0.9, 0.995, 0.4, 0.5.
Determine the corresponding sequence of random draws from the Binomial Distribution.
[Solution: Using the table calculated previously above: 1, 0, 1, 0, 2, 3, 0, 0.]
(a, b, 0) Relationship:
Note that one could successively calculate the Binomial densities by using the relationship
f(x+1) = f(x) (m - x) q / {(x + 1)(1 - q)} = f(x) {-q/(1 - q) + (m + 1) q / ((x + 1)(1 - q))}, together with f(0) = (1 - q)^m.
In some cases, this method of calculating densities can be faster. This technique can be applied to any member of the (a, b, 0) class of frequency distributions as defined in Loss Models.
23 The Bernoulli Distribution is a special case of the Binomial Distribution, with m = 1.
24 For example, the chance of two claims is (3)(0.2²)(0.8) = 0.096.
Recall that for each member of the (a, b, 0) class of frequency distributions:
f(x+1) / f(x) = a + b/(x+1), x = 0, 1, 2, 3, ..., where a and b depend on the parameters of the distribution:25
Distribution            a               b                   f(0)
Binomial                -q/(1-q)        (m+1)q/(1-q)        (1-q)^m
Poisson                 0               λ                   e^(-λ)
Negative Binomial       β/(1+β)         (r-1)β/(1+β)        1/(1+β)^r
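The (a, b, 0) recursion combines naturally with the table look-up. The Python sketch below (again just an illustration under the formulas above) builds the densities of the Binomial with m = 3 and q = 0.2 from f(0) and the recursion, then inverts; it reproduces the draws 1, 0, 1, 0, 2, 3, 0, 0 from the earlier exercise.

def ab0_densities(a, b, f0, n_max):
    """Build f(0), ..., f(n_max) via the (a, b, 0) recursion f(x+1) = f(x) {a + b/(x+1)}."""
    f = [f0]
    for x in range(n_max):
        f.append(f[-1] * (a + b / (x + 1)))
    return f

def invert(u, f):
    """Smallest x such that F(x) > u."""
    F = 0.0
    for x, px in enumerate(f):
        F += px
        if F > u:
            return x
    return len(f) - 1

m, q = 3, 0.2
a, b, f0 = -q / (1 - q), (m + 1) * q / (1 - q), (1 - q) ** m
f = ab0_densities(a, b, f0, m)                 # [0.512, 0.384, 0.096, 0.008]
us = [0.6, 0.2, 0.8, 0.05, 0.9, 0.995, 0.4, 0.5]
print([invert(u, f) for u in us])              # [1, 0, 1, 0, 2, 3, 0, 0]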
Discrete Severity Distributions:
The inversion method can also be used with any severity distribution that takes on only finite values.26
For example, assume we have the distribution:
Size of Claim       Probability     Cumulative Distribution Function
$100                30%             0.30
$500                60%             0.90
$2500               10%             1.00
Then if u = 0.72, we simulate a claim of size $500, since F($500) > 0.72 and $500 is the first such value of x. If instead u = 0.94, we simulate a claim of size $2500. If u = 0.270367103, we simulate a claim of size $100.
Efficiency:
The simulation of this severity example can be written as an algorithm:
Generate a random number U from (0, 1).
If U < 0.3, set X = $100 and stop.
If U < 0.9, set X = $500 and stop.
Otherwise set X = $2500.
One could perform this simulation in a more efficient manner. The most efficient way to simulate X ⇔ doing the fewest comparisons on average ⇒ testing for the largest probabilities first. In this case, $500 has the largest probability, so we would test for it first. $100 has the second largest probability, so we would test for it second.
25 See “Mahlerʼs Guide to Frequency Distributions.”
26 One can approximate numerically any continuous distribution by a discrete distribution; the more values of x, the better the approximation can be. One can then simulate the approximating discrete distribution via the inversion method (Table Look-up) and use the result as an approximate simulation of the original continuous distribution. This can be a useful technique in some practical applications.
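As a rough Python sketch of these two orderings (the thresholds are just the cumulative probabilities from the table above; the reordered version matches the “more efficient algorithm” spelled out next):

def simulate_severity(u):
    """Straightforward order: thresholds F(100) = 0.3 and F(500) = 0.9."""
    if u < 0.3:
        return 100
    if u < 0.9:
        return 500
    return 2500

def simulate_severity_fast(u):
    """Test the most likely outcomes first: $500 (60%), then $100 (30%), then $2500 (10%)."""
    if u < 0.6:
        return 500
    if u < 0.9:
        return 100
    return 2500

print(simulate_severity(0.72), simulate_severity(0.94))             # 500 2500
print(simulate_severity_fast(0.72), simulate_severity_fast(0.94))   # 100 2500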
An equally valid, but more efficient algorithm:
Generate a random number U from (0, 1).
If U < 0.6, set X = $500 and stop.
If U < 0.9, set X = $100 and stop.
Otherwise set X = $2500.
Exercise: Using this more efficient algorithm, and the random numbers 0.72 and 0.94, simulate two random severity values.
[Solution: $100, and $2500.]
While the two algorithms usually produce different results for a short list of random numbers, for a very long list of random numbers, the distribution of simulated claim sizes will be the same. Therefore, the two algorithms are equally valid.
Geometric Distribution:
For a Geometric Distribution with β = 3, f(0) = 1/(1+β) = 1/4. f(x+1) = f(x) β/(1+β) = (3/4) f(x).
x       f(x)        F(x)        S(x)
0       0.25000     0.25000     0.75000
1       0.18750     0.43750     0.56250
2       0.14062     0.57812     0.42188
3       0.10547     0.68359     0.31641
4       0.07910     0.76270     0.23730
5       0.05933     0.82202     0.17798
6       0.04449     0.86652     0.13348
7       0.03337     0.89989     0.10011
8       0.02503     0.92492     0.07508
9       0.01877     0.94369     0.05631
10      0.01408     0.95776     0.04224
Exercise: For the random numbers 0.84 and 0.49, simulate two random frequency values, with large random numbers corresponding to large numbers of claims. [Solution: 0.84 is first exceeded by F(6). 0.49 is first exceeded by F(2). We simulate 6 and 2 claims.] We note that for a Geometric, S(x) = {β/(1+β)}x+1. For example, S(6) = (3/4)7 = 0.13348. If large random numbers are to correspond to large numbers of claims, then given a random number u, we want the first x such that F(x) > u ⇔ S(x) < 1- u ⇔ {β/(1+β)}x+1 < 1- u ⇔ x+1 > {ln[1 - u] / ln[β/(1+β)]} ⇔ x = largest integer in:
ln[1 - u] / ln[β/(1+β)].
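In code, this closed form is a one-liner; the following Python sketch (an illustration, nothing beyond the formula just derived) reproduces the draws of 6 and 2 from the exercise above:

import math

def simulate_geometric(u, beta):
    """Smallest x with F(x) > u: the largest integer in ln(1-u) / ln(beta/(1+beta))."""
    return math.floor(math.log(1 - u) / math.log(beta / (1 + beta)))

print(simulate_geometric(0.84, 3), simulate_geometric(0.49, 3))   # 6 and 2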
For example, for u = 0.84, the largest integer in {ln(1 - 0.84)/ln(3/4)} = largest integer in 6.37 = 6, matching the number of claims simulated above.
Series of Bernoulli Trials:
For a series of independent identical Bernoulli trials, the chance of the first success following x failures is given by a Geometric Distribution with mean β = chance of a failure / chance of a success. The number of trials = 1 + number of failures = 1 + Geometric.27
Therefore, if one wants to simulate the number of trials until the first success, with large random numbers corresponding to large numbers of trials,
x = 1 + largest integer in: {ln[1-u] / ln[β/(1+β)]} = 1 + largest integer in: ln[1-u] / ln[chance of failure].
Exercise: There is a series of independent trials, each with an 80% success rate. Let X represent the number of trials until the first success. Use the inversion method to simulate the random variable X, where large numbers correspond to a high number of trials. Use the following random number: 0.98.
[Solution: x = 1 + largest integer in: ln[1-u] / ln[chance of failure] = 1 + largest integer in: ln(0.02)/ln(0.2) = 1 + largest integer in: 2.43 = 3.
Alternately, the number of failures is Geometric with β = 0.2/0.8 = 1/4.
x       f(x)        F(x)
0       0.80000     0.80000
1       0.16000     0.96000
2       0.03200     0.99200
3       0.00640     0.99840
4       0.00128     0.99968
5       0.00026     0.99994
6       0.00005     0.99999
7       0.00001     1.00000
We need to see where u is first exceeded by F(x). 0.98 is first exceeded by F(2).28 Thus we simulate 2 failures, or 3 trials.]
27 1 + a Geometric Distribution is a zero-truncated Geometric Distribution. See “Mahlerʼs Guide to Frequency Distributions.”
28 Equivalently, S(2) is the first value of S(x) less than 1 - u = 0.02.
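A corresponding Python sketch for the number of trials until the first success (an illustration; nothing here beyond the formula used in the solution above):

import math

def trials_until_first_success(u, p_success):
    """1 + largest integer in ln(1-u) / ln(chance of failure)."""
    return 1 + math.floor(math.log(1 - u) / math.log(1 - p_success))

print(trials_until_first_success(0.98, 0.8))   # 3, matching the exercise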
Uniform Discrete Variables: A special case of a discrete distribution, is a variable with equal probability on the integers from 1 to n. Exercise: X is distributed with a 20% chance of each of 1, 2, 3, 4, and 5. You are given the following sequence of random numbers from [0,1]: 0.65, 0.21, 0.83, 0.05, 0.92, 0.43. Determine the corresponding sequence of random draws from X. [Solution: F(1) = 0.2, F(2) = 0.4, F(3) = 0.6, F(4) = 0.8, F(5) = 1. Since F(4) > 0.65, 0.65 corresponds to 4. 0.21 corresponds to 2. 0.83 corresponds to 5. 0.05 corresponds to 1. 0.92 corresponds to 5. 0.43 corresponds to 3. Equivalently, in each case take 1 + (largest integer in 5u). For example, when u = 0.43, we get 1 + (largest integer in 2.15) = 1 + 2 = 3.] In general, in order to simulate a variable with equal probability on the integers from 1 to n, let x = 1 + largest integer in nu. Random Permutations and Subsets: The algorithm to simulate random uniform discrete variables can be used to simulate random permutations of the numbers from 1 to n. Exercise: You are given the following sequence of random numbers from [0,1]: 0.65, 0.29, 0.83, 0.05. Simulate a random permutation of the numbers from 1 to 5. [Solution: First simulate a random integer from 1 to 5. 1 + largest integer in (5)(0.65) = 4. Now exchange 4 with 5 to get: 1, 2, 3, 5, 4. Now simulate a random integer from 1 to 4: 1 + largest integer in (4)(0.29) = 2. Exchange the number in the 2nd position with the number in the 4th position to get: 1, 5, 3, 2, 4. 1 + largest integer in (3)(0.83) = 3. Exchange the 3rd position with the 3rd position; the sequence remains the same at: 1, 5, 3, 2, 4. 1 + largest integer in (2)(0.05) = 1. Exchange the 1st position with the 2nd position to get: 5, 1, 3, 2, 4. The simulated random permutation is: 5, 1, 3, 2, 4.] One can simulate random subsets of the numbers from 1 to n, by just stopping partway through the algorithm to simulate a random permutation of the numbers from 1 to n. Exercise: You are given the following sequence of random numbers from [0,1]: 0.35, 0.59. Simulate a random subset of size 2 from the integers from 1 to 5. [Solution: First simulate a random number from 1 to 5: 1 + largest integer in (5)(0.35) = 2. Now exchange 2 with 5 to get: 1, 5, 3, 4, 2. Now simulate a random number from 1 to 4: 1 + largest integer in (4)(0.59) = 3. Now exchange the number in the 3rd position with the number in the 4th position to get: 1, 5, 4, 3, 2. The random subset is the last 2 numbers, {3, 2}.]
One could go through the exact same steps to simulate a random subset of size 3 from the integers from 1 to 5. However, at the final step, we would take the first 3 numbers {1,5,4}. By proceeding in this manner we use only 2 random numbers rather than 3 in order to simulate a random subset of size 3. In general, in order to simulate a random subset of size r from the integers from 1 to n, one needs either r or n-r random numbers, whichever is smaller. Exercise: Your company has 17 branch offices, numbered from 1 to 17. You wish to visit 3 branch offices at random in order to observe how they are using a new underwriting tool you helped to develop. 29 You are given the following sequence of random numbers from [0,1]: 0.53, 0.42, 0.13. Which branch offices do you visit? [Solution: First simulate a random number from 1 to 17: 1 + largest integer in (17)(0.53) = 10. Now exchange 10 with 17. Now simulate a random number from 1 to 16: 1 + largest integer in (16)(0.42) = 7. Now exchange the number in the 7th position with the number in the 16th position. Now simulate a random number from 1 to 15: 1 + largest integer in (15)(0.13) = 2. Now exchange the number in the 2nd position with the number in the 15th position to get: 1, 15, 3, 4, 5, 6, 16, 8, 9, 17, 11, 12, 13, 14, 2, 7, 10. The random subset is the last 3 numbers: {2, 7, 10}.]
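Here is a Python sketch of the permutation/subset algorithm described above, driven by a supplied list of uniform random numbers rather than Python's own generator, so that the branch office example can be reproduced exactly:

import math

def random_subset(n, size, uniforms):
    """Simulate a random subset of the given size from 1..n, as in the exercises above."""
    items = list(range(1, n + 1))
    m = n
    for u in uniforms[:size]:
        j = math.floor(m * u)               # 0-based position of "1 + largest integer in m*u"
        items[j], items[m - 1] = items[m - 1], items[j]   # swap with the last unfixed position
        m -= 1
    return items[n - size:]                 # the last `size` positions hold the subset

print(random_subset(17, 3, [0.53, 0.42, 0.13]))   # [2, 7, 10], as in the branch office exercise

For a full random permutation of 1 to n, run the loop n - 1 times and return the whole list; and, as noted above, when the subset size exceeds n minus the size, it is cheaper to perform n minus the size swaps and read off the other end of the list.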
Poisson Distribution:
One can simulate the Poisson Distribution using the inversion method, as well as a special algorithm based on interarrival times to be discussed in a subsequent section.
Exercise: You wish to simulate via the inversion method, random draws from a Poisson Distribution with λ = 3.2.
You are given the following sequence of random numbers from [0,1]: 0.6, 0.2, 0.8, 0.05, 0.9, 0.995, 0.4, 0.5.
Large random numbers correspond to large numbers of claims.
Determine the corresponding sequence of random draws from the Poisson Distribution.
29 You do not visit all 17 offices due to time and money constraints. You might have to spend a week at each office in order to estimate the effectiveness of the new underwriting tool. An average of these three estimates could serve as an estimate of the average effectiveness of the new tool throughout your whole company.
[Solution: The first step is to calculate a table of densities for this Poisson,30 using the relationship f(x+1) = f(x) {λ/(x+1)} = f(x) 3.2/(x+1), and f(0) = e^(-λ) = e^(-3.2) = 0.04076.
Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   4.076%                          0.04076
1                   13.044%                         0.17120
2                   20.870%                         0.37990
3                   22.262%                         0.60252
4                   17.809%                         0.78061
5                   11.398%                         0.89459
6                   6.079%                          0.95538
7                   2.779%                          0.98317
8                   1.112%                          0.99429
9                   0.395%                          0.99824
10                  0.126%                          0.99950
One cumulates the densities to get the distribution function. The first random number is 0.6; one sees the first time F(x) > 0.6, which occurs when x = 3 and F(3) = 0.60252. Similarly, F(2) = 0.37990 > 0.2. Proceeding similarly, the complete sequence of random draws from this Poisson Distribution, corresponding to 0.6, 0.2, 0.8, 0.05, 0.9, 0.995, 0.4, 0.5, is: 3, 2, 5, 1, 6, 9, 3, 3.]
One could write the above simulation as an algorithm:
Generate a random number U from (0, 1).
If U < 0.04076, set X = 0 and stop.
If U < 0.17120, set X = 1 and stop.
If U < 0.37990, set X = 2 and stop.
etc.
Efficiency for the Poisson:
First comparing u to F(0), then F(1), etc., is not an efficient way to program this algorithm. Simulating x claims requires 1 + x comparisons. Therefore, the expected number of comparisons is: 1 + λ. This could slow down execution of the computer program for large λ.
Most of the probability is concentrated near the mean, λ, so our computer program should start our search at the largest integer in λ, in this case 3. We compare u to F(3). If F(3) > u, we then compare u to F(2). If instead F(3) ≤ u, then we compare u to F(4). We proceed until we have determined the smallest x such that F(x) > u.
30 I have only displayed values up to 10. In practical applications, one should include values up to a point where the cumulative distribution function is sufficiently close to 1.
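The exercise above can be reproduced with a short Python sketch (the straightforward look-up; the more efficient search that starts near the mean is discussed next):

import math

def poisson_table(lam, n_max):
    """Densities f(0), ..., f(n_max) built via f(x+1) = f(x) lam/(x+1)."""
    f = [math.exp(-lam)]
    for x in range(n_max):
        f.append(f[-1] * lam / (x + 1))
    return f

def invert(u, f):
    """Smallest x such that F(x) > u."""
    F = 0.0
    for x, px in enumerate(f):
        F += px
        if F > u:
            return x
    return len(f) - 1        # extend n_max if this is ever reached in practice

f = poisson_table(3.2, 15)
us = [0.6, 0.2, 0.8, 0.05, 0.9, 0.995, 0.4, 0.5]
print([invert(u, f) for u in us])    # [3, 2, 5, 1, 6, 9, 3, 3]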
One could perform this simulation in a more efficient manner:
If U < F(3) = 0.60252, then X ≤ 3, and then check successively whether X ≤ 2, etc.
If U ≥ F(3) = 0.60252, then X ≥ 4, and then check successively whether X ≥ 5, etc.
So for example, if the random number is 0.8: 0.8 ≥ 0.60252 ⇒ X > 3. 0.8 ≥ 0.78061 ⇒ X > 4. 0.8 < 0.89459 ⇒ X ≤ 5. ⇒ X = 5.
If instead the random number is 0.2: 0.2 < 0.60252 ⇒ X ≤ 3. 0.2 < 0.37990 ⇒ X ≤ 2. 0.2 ≥ 0.17120 ⇒ X > 1. ⇒ X = 2.
This is more efficient, since the largest probabilities for a Poisson density are near its mean. In general, a more efficient algorithm would start the search at F(largest integer in λ).31
Starting to make comparisons near the mean of the distribution will be more efficient whenever applying the Inversion Method to any discrete distribution whose graph resembles that of the Poisson Distribution, such as either a Binomial Distribution with a large mean or a Negative Binomial Distribution with a large mean.
Estimating Averages:
For example, assume you had a list of 1000 equally likely interest rate scenarios over the next 10 years. For each interest rate scenario, it would take one minute to calculate the expected profits on a large portfolio of GICs. If you had 1000 minutes, or almost 17 hours, you could calculate the expected profits under each scenario.
However, if one had only half an hour, one could pick 30 scenarios at random, by simulating a random subset of size 30 from the integers from 1 to 1000. Then one could average the expected profits for these 30 scenarios, and use this as an estimate of the expected profits for this portfolio.
This is an example of a more general technique. One can simulate a random set of outcomes, in order to estimate the average, variance, percentiles, etc., of a given random process.32 33
31 Using this more efficient method, as λ gets large, the number of comparisons goes up as √λ rather than λ.
32 Such a simulation experiment is shown in a subsequent section. Estimating the probability of ruin via simulation is discussed in a subsequent section. Bootstrapping via simulation is discussed in a subsequent section.
33 There can be advantages to taking a stratified random sample. See for example “Implications of Sampling Theory for Package Policy Ratemaking,” by Jeffrey Lange, PCAS 1966.
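A Python sketch of the search that starts at the largest integer in λ (this only encodes the comparison scheme described above; the cumulative probabilities are those from the λ = 3.2 table):

def invert_from_mean(u, F, start):
    """Return the smallest x with F[x] > u, beginning the comparisons at x = start."""
    x = start
    if F[x] > u:
        while x > 0 and F[x - 1] > u:
            x -= 1
    else:
        while F[x] <= u:
            x += 1
    return x

# Cumulative distribution function of the Poisson with lambda = 3.2, from the table above
# (padded with 1.0 so the upward search always terminates):
F = [0.04076, 0.17120, 0.37990, 0.60252, 0.78061, 0.89459,
     0.95538, 0.98317, 0.99429, 0.99824, 0.99950, 1.0]
print(invert_from_mean(0.8, F, 3), invert_from_mean(0.2, F, 3))   # 5 and 2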
Problems:
4.1 (2 points) You are given the following:
• P(k) is a cumulative probability distribution function of a Binomial distribution with q = 0.10 and m = 5.
• An observation from the random variable U, having the uniform distribution on [0, 1], is 0.995.
Use the inversion method to determine the simulated number of claims from the Binomial distribution.
A. 0 B. 1 C. 2 D. 3 E. 4
4.2 (2 points) You are using the inversion method to simulate Z, the present value random variable for a special two-year term insurance on (60). You are given:
(i) (60) is subject to only two causes of death, with
k       k|q60^(1)    k|q60^(2)
0       0.05         0.03
1       0.06         0.04
(ii) Death benefits, payable at the end of the year of death, are:
During year     Benefit for Cause 1     Benefit for Cause 2
1               1000                    2000
2               1100                    2200
(iii) i = 0.05. (iv) For this trial your random number, from the uniform distribution on [0, 1], is 0.923. (v) High random numbers correspond to high values of Z. Calculate the simulated value of Z for this trial. (A) 0 (B) 952 (C) 998 (D) 1905 (E) 1995 4.3 (3 points) You are given the following: • P(k) is a cumulative probability distribution function of a Negative Binomial Distribution as in Loss Models with parameters β = 2/3 and r = 8. • An observation from the random variable U, having the uniform distribution on [0, 1], is 0.35. Use the inversion method to determine the simulated number of claims. A. 2 or less B. 3 C. 4 D. 5 E. 6 or more
4.4 (2 points) You wish to simulate via the inversion method three independent random draws from a Poisson Distribution with λ = 0.7. You are given the following sequence of independent random numbers from (0,1): 0.681, 0.996, 0.423. Determine the sum of the three random draws from the Poisson Distribution. A. 2 or less B. 3 C. 4 D. 5 E. 6 or more 4.5 (2 point) A discrete empirical distribution X is created from the following observations: $X of Loss Frequency 100 3 200 4 400 6 700 3 1,200 2 2,000 2 (Assume that these six loss values are the only possible loss amounts.) Using this distribution and random numbers Y from the interval (0,1), random losses can be simulated. Calculate the sum of the three independent losses generated by the random Y values of 0.78, 0.31 and 0.60. A. Less than 800 B. At least 800, but less than 1000 C. At least 1000, but less than 1200 D. At least 1200, but less than 1400 E. 1400 or more 4.6 (1 point) Severity is equally likely to be $1000, $2000, $3000 or $4000. A value of 0.61 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated severity corresponds to this 0.61? A. $1000 B. $2000 C. $3000 D. $4000 E. None of the above. 4.7 (2 points) You wish to model a distribution by simulation. f(x) = (0.4)(0.6x), x = 0, 1, 2, 3, ... A value of 0.80 is randomly generated from a distribution which is uniform on the interval (0,1). What simulated value corresponds to this 0.80? A. 0 B. 1 C. 2 D. 3 E. 4 or more
4.8 (1 point) You wish to simulate a random value from a Binomial Distribution as per Loss Models with m = 4 and q = 0.3. You will do so by first simulating four independent Bernoulli random variables via the inversion method. You use the following four values: 0.1, 0.9, 0.2, 0.6, independently randomly generated from a distribution which is uniform on the interval (0,1). What is the resulting random value from the Binomial Distribution? A. 0 B. 1 C. 2 D. 3 E. 4 4.9 (2 points) You are given the following sequence of five random numbers from [0,1]: 0.125, 0.027, 0.614, 0.850, 0.261. You simulate a random permutation of the numbers from 1 to 5. What is the first of the five values in this permutation? A. 1 B. 2 C. 3 D. 4 E. 5 4.10 (3 points) Mortality follows the Illustrative Life Table from Actuarial Mathematics: l45 = 9164051, l65 = 7533964. For a group of 10 independent lives each age 45, the number of deaths by age 65 is simulated from the binomial distribution using the inversion method. Using the random number 0.93, how many deaths are there by age 65? (A) 2 or fewer (B) 3 (C) 4 (D) 5 (E) 6 or more Use the following information for the next two questions: A discrete random variable X has the following distribution: k Prob(X=k) 1 0.05 2 0.20 3 0.30 4 0.35 5 0.10 A random number from (0, 1) is 0.83. 4.11 (1 point) Using the inversion method, simulate a random value of X. A. 1 B. 2 C. 3 D. 4 E. 5 4.12 (2 points) Using the most efficient algorithm, simulate a random value of X. A. 1 B. 2 C. 3 D. 4 E. 5 4.13 (3 points) Hobbs is a baseball player. The probability that Hobbs gets a hit in any given attempt is 30%. The results of his attempts are independent of each other. Let X represent the number of attempts until his first hit. Use the inversion method in order to simulate the random variable X. Generate the total number of attempts until four hits result, by simulating 4 independent values of X. Use the following random numbers: 0.745, 0.524, 0.941, 0.038. A. fewer than 9 B. 9, 10, or 11 C. 12, 13, or 14 D. 15, 16, or 17 E. more than 17
4.14 (3 points) You wish to simulate via the inversion method, four random draws from a Poisson Distribution with λ = 5.4. You are given the following sequence of four random numbers from [0,1]: 0.5859, 0.9554, 0.1620, 0.3532. Calculate the sample variance of the corresponding sequence of random draws from the Poisson Distribution. A. less than 9 B. at least 9, but less than 10 C. at least 10, but less than 11 D. at least 11, but less than 12 E. at least 12 4.15 (3 points) Using the inversion method, a Negative Binomial random variable with r = 4 and β = 2 is generated, with 0.64 as a random number from (0, 1). Determine the simulated result. A. 6 or less B. 7
C. 8
D. 9
E. 10 or more
4.16 (2 points) You are given the following information about credit scores of individuals: Interval Probability 400 to 499 2% 500 to 549 5% 550 to 599 8% 600 to 649 12% 650 to 699 15% 700 to 749 18% 750 to 799 27% 800 13% Credit scores are integers. Assume that on each interval scores are distributed uniformly. Simulate 3 credit scores using the following random numbers from (0, 1): 0.528, 0.342, 0.914. 4.17 (2 points) Weather is modeled by a Markov chain, with State 1 rain, and State 2 no rain.
• The probability that it rains on a day is 0.50 if it rained on the prior day. • The probability that it rains on a day is 0.20 if it did not rain on the prior day. It is raining today, Sunday. You simulate the weather for Monday through Saturday. Use the following random numbers from (0, 1): 0.661, 0.529, 0.301, 0.132, 0.378, 0.792, 0.995. How many simulated days did it rain during Monday through Saturday? (A) 1 or less (B) 2 (C) 3 (D) 4 (E) 5 or more
4.18 (2 points) Use the following information: • Annual aggregate losses can take on one of five values: 0, 1000, 2000, 3000, 4000, with probabilities 15%, 25%, 35%, 20%, 5%, respectively. • Each year is independent of every other year.
• Use the following ten random numbers from (0, 1): 0.679, 0.519, 0.148, 0.206, 0.824, 0.249, 0.392, 0.980, 0.501, 0.844, in order to simulate this model.
What are the total aggregate losses paid in ten years?
A. 18,000 B. 19,000 C. 20,000 D. 21,000 E. 22,000
4.19 (3 points) A machine is in one of four states (F, G, H, I) and migrates once per day among them according to a Markov process with transition matrix (rows and columns in the order F, G, H, I):
F:  0.20  0.80  0     0
G:  0.50  0     0.50  0
H:  0.75  0     0     0.25
I:  1     0     0     0
The daily production of the machine depends on the state:
State:        F    G    H    I
Production:   100  90   70   0
The machine is in State F on day 0. Days 1 to 7 are simulated, using the following random numbers from [0,1]: 0.834, 0.588, 0.315, 0.790, 0.941, 0.510, 0.003.
What is the total production of the machine on these 7 days?
(A) 600 (B) 610 (C) 620 (D) 630 (E) 640
4.20 (2 points) Use the following table of distribution function values for the annual number of claims: x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
f(x)
F(x) 0.015625 0.062500 0.144531 0.253906 0.376953 0.500000 0.612793 0.709473 0.788025 0.849121 0.894943 0.928268 0.951874 0.968216 0.979305 0.986698 0.991550 0.994689 0.996695 0.997961 0.998753 0.999243 0.999544 0.999727 0.999838 0.999904 0.999943 0.999967 0.999981 0.999989 0.999994
Simulate five years, using the following random numbers: 0.325, 0.072, 0.956, 0.565, 0.899. What is the sum of the simulated number of claims for the five years? (A) 31 or less (B) 32 (C) 33 (D) 34 (E) 35 or more 4.21 (3 points) Claims follow a Zero-Modified Poisson Distribution with pM 0 = 30% and λ = 1.8. Use the inversion method to simulate the number of claims. Do this three times using: u1 = 0.98 u2 = 0.37 u3 = 0.68 Calculate the sum of the simulated values. (A) 5 (B) 6 (C) 7 (D) 8
(E) 9
4.22 (3 points) The size of a family follows a Zero-Truncated Negative Binomial Distribution with r = 6 and β = 0.4.
Use the inversion method to simulate the size of a family. Do this three times using: u1 = 0.08, u2 = 0.75, u3 = 0.47.
Calculate the sum of the simulated values.
(A) 5 (B) 6 (C) 7 (D) 8 (E) 9
4.23 (3 points) N follows a Zero-Modified Binomial Distribution with pM0 = 20%, m = 10, and q = 0.3.
Use the inversion method to simulate N. Do this three times using: u1 = 0.4768, u2 = 0.9967, u3 = 0.3820.
Calculate the sum of the simulated values.
(A) 10 (B) 11 (C) 12 (D) 13 (E) 14
4.24 (4, 5/89, Q.43) (1 point) A discrete empirical distribution X is created from the following observations:
$X of Loss      Frequency
100             2
200             5
400             5
600             4
1,000           4
(Assume that these five loss values are the only possible loss amounts.)
Using this distribution and random numbers Y from the interval (0,1), random losses can be simulated.
Calculate the sum of the two independent losses generated by the random Y values of 0.3 and 0.7.
A. 400 B. 500 C. 800 D. 1200 E. None of the above
4.25 (4B, 5/93, Q.2) (2 points) You are given the following:
• P(k) is a cumulative probability distribution function for the binomial distribution, where
P(k) = Σ from j = 0 to k of (m choose j) q^j (1 - q)^(m-j), k = 0, 1, ..., m.
• q = 0.25 and m = 5.
• An observation from the random variable U, having the uniform distribution on [0, 1], is 0.7.
Use the inversion method to determine the simulated number of claims from the binomial distribution.
A. 0 B. 1 C. 2 D. 3 E. 4
4.26 (3, 5/01, Q.11) (2.5 points) You are using the inversion method to simulate Z, the present value random variable for a special two-year term insurance on (70). You are given:
(i) (70) is subject to only two causes of death, with
k       k|q70^(1)    k|q70^(2)
0       0.10         0.10
1       0.10         0.50
(ii) Death benefits, payable at the end of the year of death, are:
During year     Benefit for Cause 1     Benefit for Cause 2
1               1000                    1100
2               1100                    1200
(iii) i = 0.06 (iv) For this trial your random number, from the uniform distribution on [0, 1], is 0.35. (v) High random numbers correspond to high values of Z. Calculate the simulated value of Z for this trial. (A) 943 (B) 979 (C) 1000 (D) 1038 (E) 1068 4.27 (3, 11/01, Q.13) (2.5 points) We have 100 independent lives age 70. You are given: (i) Mortality follows the Illustrative Life Table in Actuarial Mathematics: q70 = 0.03318. (ii) i = 0.08 (iii) A life insurance pays 10 at the end of the year of death. The number of claims in the first year is simulated from the binomial distribution using the inversion method (where smaller random numbers correspond to fewer deaths). The random number for the first trial, generated using the uniform distribution on [0, 1], is 0.18. Calculate the simulated claim amount. (A) 0 (B) 10 (C) 20 (D) 30 (E) 40 Note: I have rewritten the original exam question. 4.28 (CAS3, 11/03, Q.38) (2.5 points) Using the inversion method, a Binomial (10, 0.20) random variable is generated, with 0.65 from U(0 ,1) as the initial random number. Determine the simulated result. A. 0 B. 1 C. 2 D. 3 E. 4
4.29 (CAS3, 11/03, Q.39) (2.5 points) When generating random variables, it is important to consider how much time it takes to complete the process. Consider a discrete random variable X with the following distribution: k Prob(X=k) 1 0.15 2 0.10 3 0.25 4 0.20 5 0.30 Of the following algorithms, which is the most efficient way to simulate X? A. If U<0.15 set X = 1 and stop. If U<0.25 set X = 2 and stop. If U<0.50 set X = 3 and stop. If U<0.70 set X = 4 and stop. Otherwise set X = 5 and stop. B. If U<0.30 set X = 5 and stop. If U<0.50 set X = 4 and stop. If U<0.75 set X = 3 and stop. If U<0.85 set X = 2 and stop. Otherwise set X = 1 and stop. C. If U<0.10 set X =2 and stop. If U<0.25 set X = 1 and stop. If U<0.45 set X = 4 and stop. If U<0.70 set X = 3 and stop. Otherwise set X = 5 and stop. D. If U<0.30 set X = 5 and stop. If U<0.55 set X = 3 and stop. If U<0.75 set X = 4 and stop. If U<0.90 set X = 1 and stop. Otherwise set X = 2 and stop. E. If U<0.20 set X = 4 and stop. If U<0.35 set X = 1 and stop. If U<0.45 set X = 2 and stop. If U<0.75 set X = 5 and stop. Otherwise set X = 3 and stop. 4.30 (CAS3, 11/03, Q.40) (2.5 points) W is a geometric random variable with β = 7/3. Three uniform random number from (0, 1) are: 0.68, 0.08, and 0.48. Use the inversion method. Calculate W3 , the third randomly generated value of W. Comment: I have revised the original exam question.
4.31 (CAS3, 5/04, Q.30) (2.5 points) A scientist performs experiments, each with a 60% success rate. Let X represent the number of trials until the first success. Use the inversion method to simulate the random variable, X, and the following random numbers (where low numbers correspond to a high number of trials): 0.15, 0.62, 0.37, 0.78. Generate the total number of trials until three successes result. A. 3 B. 4 C. 5 D. 6 E. 7 4.32 (CAS3, 11/04, Q.38) (2.5 points) A uniform random number from [0, 1] is 0.7885. The inversion method is used to compute the number of failures, F, before a success in a series of independent trials each with success probability p = 0.70. Low random numbers correspond to a small number of failures. Evaluate F. A. 0 B. 1 C. 2 D. 3 E. 4 Comment: I have revised the original exam question in order to match the current syllabus. 4.33 (4, 5/05, Q.12 & 2009 Sample Q.182) (2.9 points) A company insures 100 people age 65. The annual probability of death for each person is 0.03. The deaths are independent. Use the inversion method to simulate the number of deaths in a year. Do this three times using: u1 = 0.20 u2 = 0.03 u3 = 0.09 Calculate the average of the simulated values. (A) 1/3 (B) 1 (C) 5/3 (D) 7/3
(E) 3
Solutions to Problems:
4.1. D. Calculate a table of values for the distribution function and then determine the first value at which F(x) > 0.995. This first occurs at x = 3.
Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   59.049%                         0.59049
1                   32.805%                         0.91854
2                   7.290%                          0.99144
3                   0.810%                          0.99954
4                   0.045%                          0.99999
5                   0.001%                          1.00000
Comment: In order to construct the table, one could use the relationship: f(x+1)/f(x) = a + b/(x+1), with for the Binomial a = -q/(1-q), b = (m+1)q/(1-q), and f(0) = (1-q)^m.
4.2. C. The present values of the benefits are either 0, 1000/1.05 = 952, 1100/1.05² = 998, 2000/1.05 = 1905, or 2200/1.05² = 1995. Arrange these from smallest to largest:
P.V. of Benefits    Probability     Cumulative Distribution
0                   0.82            0.82
952                 0.05            0.87
998                 0.06            0.93
1905                0.03            0.96
1995                0.04            1.00
See where the cumulative distribution first exceeds the random number .923. F(998) = .93 > .923, so the simulated value is 998. Comment: Similar to 3, 5/01, Q.11.
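A Python sketch of this solution (the benefit amounts, probabilities, and interest rate are those given in the problem; sorting by Z makes high random numbers correspond to high values of Z):

outcomes = [
    (0.82, 0.0),                 # survives both years
    (0.05, 1000 / 1.05),         # cause 1, year 1: 952
    (0.06, 1100 / 1.05 ** 2),    # cause 1, year 2: 998
    (0.03, 2000 / 1.05),         # cause 2, year 1: 1905
    (0.04, 2200 / 1.05 ** 2),    # cause 2, year 2: 1995
]
outcomes.sort(key=lambda pz: pz[1])

def simulate_Z(u):
    F = 0.0
    for p, z in outcomes:
        F += p                   # F is the cumulative distribution of Z
        if F > u:
            return z
    return outcomes[-1][1]

print(round(simulate_Z(0.923)))  # 998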
4.3. C. Calculate a table of values for the distribution function and then determine the first value at which F(x)>.35. This first occurs at x = 4. Number of Claims 0 1 2 3 4 5 6 7
f(x) 0.0167962 0.0537477 0.0967459 0.1289945 0.1418940 0.1362182 0.1180558 0.0944446
F(x) 0.01680 0.07054 0.16729 0.29628 0.43818 0.57440 0.69245 0.78690
Comment: Iʼve only displayed the first part of the Distribution function; for this problem one need only calculate up to F(4). In order to calculate the densities, one could use the relationship: f(x+1) / f(x) = a + b/(x+1), with for the Negative Binomial a = β/(1+β), b = (r-1)β/(1+β) and f(0) = 1/(1+β)r. 4.4. D. The first step is to calculate a table of densities for this Poisson, using the relationship f(x+1) = f(x) {λ / (x+1)} = .7 f(x) /(x+1) and f(0) = e−λ = e-.7 = .49659. 0.7 Number of Claims 0 1 2 3 4 5 6
Probability Density Function 49.659% 34.761% 12.166% 2.839% 0.497% 0.070% 0.008%
Cumulative Distribution Function 0.49659 0.84420 0.96586 0.99425 0.99921 0.99991 0.99999
One cumulates the densities to get the distribution function. The first random number is 0.681; one sees the first time F(x) > 0.681, which occurs when x = 1. Similarly, F(4) > 0.996. Finally F(0) > 0.423. Thus the random draws from this Poisson Distribution are: 1, 4, 0. Their sum is: 1 + 4 + 0 = 5. Comment: If one were only interested in the total number of claims over three years, one could instead simulate a single draw from a Poisson with mean (3)(.7) = 2.1. However, that is not what you were asked to do here.
4.5. D. Calculate a table of the Distribution Function. Since $700 is the first value at which the distribution function is >.78, $700 is the first simulated claim. Since $200 is the first value at which the distribution function is >.31, $200 is the 2nd simulated claim. Since $400 is the first value at which the distribution function is >.60, $400 is the 3rd simulated claim. $700 + $200 + $400 = $1300. Size of Loss 100 200 400 700 1200 2000
Number Observed 3 4 6 3 2 2
f(x) 0.15 0.2 0.3 0.15 0.1 0.1
F(x) 0.15000 0.35000 0.65000 0.80000 0.90000 1.00000
4.6. C. In thousands of dollars take: 1 + largest integer in: (4)(.61) = 1 + 2 = 3. Alternately, F(1000) = .25, F(2000) = .5, F(3000) = .75, F(4000) = 1. Since F(3000) > .61, .61 corresponds to $3000. 4.7. D. This is a Geometric Distribution. One can just calculate a table of values for f(x) and then take the cumulative sum to get F(x). F(x) first exceeds 0.8 when x is 3. x 0 1 2 3 4
f(x) 0.4000 0.2400 0.1440 0.0864 0.0518
F(x) 0.4000 0.6400 0.7840 0.8704 0.9222
Alternately, one sum up the power series: i=x
i=x
F(x) = Σ f(i) = (.4)Σ .6i = (.4) (1 - .6x+1) / (1-.6) = 1 - .6x+1. i=0
i=0
Want .8 < F(x) = 1 - .6x+1. Solving, x > 2.15. Smallest integer x > 2.15 is 3. Therefore, we simulate 3 claims. 4.8. B. One needs to generate four random draws from a Bernoulli Distribution with the same q parameter as the Binomial. For q = .3 the Bernoulli Distribution has F(0) = .7 and F(1) = 1. Therefore, the four random draws corresponding to: .1, .9, .2, .6, are: 0, 1, 0, and 0. The sum of the four Bernoullis is 0 + 1 + 0 + 0 = 1, the random draw from a Binomial. Comment: This is an alternate method of generating a random draw from a Binomial Distribution. One adds up m independent random draws from a Bernoulli Distribution with parameter q.
4.9. D. First simulate a random integer from 1 to 5: 1 + largest integer in (5)(.125) = 1. Now exchange 1 with 5 to get: 5,2,3,4,1. Now simulate a random integer from 1 to 4: 1 + largest integer in (4)(.027) = 1. Now exchange the number in the 1st position with the number in the 4th position to get: 4,2,3,5,1. Now simulate a random integer from 1 to 3: 1 + largest integer in (3)(.614) = 2. Now exchange the number in the 2nd position with the number in the 3rd position to get: 4,3,2,5,1. Now simulate a random integer from 1 to 2: 1 + largest integer in (2)(.850) = 2. Now exchange the number in the 2nd position with the number in the 2nd position; the sequence remains the same at: 4,3,2,5,1. The simulated random permutation is: 4, 3, 2, 5, 1. Comment: We only use the first four random numbers. If we had been asked to simulate a random subset of size two, without replacement, the answer would have been {5, 1}, and we could have stopped after using just two random numbers.
4.10. C. The probability of death is: q = 1 - l65/l45 = 1 - 7,533,964/9,164,051 = .1779. m = 10. The Distribution Function of the Binomial first exceeds .93 when there are 4 deaths.
Number of Deaths   Probability Density Function   Cumulative Distribution Function
0                  14.100889%                     0.1410089
1                  30.513904%                     0.4461479
2                  29.714033%                     0.7432883
3                  17.146743%                     0.9147557
4                  6.493382%                      0.9796895
5                  1.686178%                      0.9965513
6                  0.304070%                      0.9995920
7                  0.037600%                      0.9999680
8                  0.003051%                      0.9999985
9                  0.000147%                      1.0000000
10                 0.000003%                      1.0000000
4.11. D. F(x) first exceeds .83 at x = 4.
x    f(x)    F(x)
1    0.05    0.05
2    0.20    0.25
3    0.30    0.55
4    0.35    0.90
5    0.10    1.00
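The random-permutation algorithm used in 4.9 can also be written out as a short Python sketch (my own illustration, not part of the original solution): at each step draw a random position from 1 to k, swap it with position k, and decrease k.

import math

def simulate_permutation(seq, uniforms):
    seq = list(seq)
    k = len(seq)
    for u in uniforms:
        j = 1 + math.floor(k * u)                         # random integer from 1 to k
        seq[j - 1], seq[k - 1] = seq[k - 1], seq[j - 1]   # exchange positions j and k
        k -= 1
        if k == 1:
            break
    return seq

# Reproduces 4.9; only the first four random numbers are needed.
print(simulate_permutation([1, 2, 3, 4, 5], [0.125, 0.027, 0.614, 0.850]))  # [4, 3, 2, 5, 1]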
4.12. B. The most efficient way to simulate X ⇔ doing the fewest comparisons on average ⇒ testing for the largest probabilities first. In this case, f(4) > f(3) > f(2) > f(5) > f(1). So we take the cumulative sums of f(4), f(3), f(2), f(5), and f(1):
x    f(x)    F(x)
4    0.35    0.35
3    0.30    0.65
2    0.20    0.85
5    0.10    0.95
1    0.05    1.00
Let U be a random number from (0, 1). If U < 0.35 set X = 4 and stop. If U < 0.65 set X = 3 and stop. If U < 0.85 set X = 2 and stop. If U < 0.95 set X = 5 and stop. Otherwise set X = 1 and stop. In this case, we stop when .83 < .85 and X = 2. Comment: Similar to CAS3, 11/03, Q.39.
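As a rough Python sketch of 4.12 (not part of the original solution; names are mine), the inversion can test the outcomes in order of decreasing probability:

def simulate_discrete(u, pmf):
    # pmf is a dict {value: probability}; test the largest probabilities first.
    cumulative = 0.0
    for x, p in sorted(pmf.items(), key=lambda item: item[1], reverse=True):
        cumulative += p
        if u < cumulative:
            return x
    return x  # guard against rounding error; the last value is returned

# Reproduces 4.12: u = 0.83 falls in the third interval tested, so X = 2.
print(simulate_discrete(0.83, {1: 0.05, 2: 0.20, 3: 0.30, 4: 0.35, 5: 0.10}))  # 2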
4.13. D. First construct a table of the distribution of X. f(1) = .3, f(2) = (.3){1 - f(1)} = (.3)(.7) = .21. f(3) = .7 f(2) = .147. f(x+1) = .7 f(x).
x    f(x)       F(x)
1    0.3        0.3
2    0.21       0.51
3    0.147      0.657
4    0.1029     0.7599
5    0.07203    0.83193
6    0.05042    0.88235
7    0.03529    0.91765
8    0.02471    0.94235
We need to see where u is first exceeded by F(x). .745 is first exceeded by F(4). .524 is first exceeded by F(3). .941 is first exceeded by F(8). .038 is first exceeded by F(1). Total number of attempts until four hits result: 4 + 3 + 8 + 1 = 16.
Alternately, X is 1 plus a Geometric with β = probability of failure / probability of success = 7/3. For a Geometric, S(n) = (β/(1+β))^(n+1). Therefore, S(x) = (β/(1+β))^x = .7^x. Given a random number u, we want the first x such that F(x) > u ⇔ 1 - .7^x > u ⇔ 1 - u > .7^x ⇔ ln(1 - u) > x ln(.7). Since ln(.7) < 0, ln(1 - u) > x ln(.7) ⇔ x > ln(1 - u)/ln(.7). Therefore we want: x = 1 + largest integer in ln(1 - u)/ln(.7).
ln(.255)/ln(.7) = 3.83 ⇒ x = 4. ln(.476)/ln(.7) = 2.08 ⇒ x = 3. ln(.059)/ln(.7) = 7.94 ⇒ x = 8. ln(.962)/ln(.7) = 0.11 ⇒ x = 1.
Total number of simulated attempts until four hits result: 4 + 3 + 8 + 1 = 16. Comment: Similar to CAS3, 5/04, Q.30. Note that X is the number of trials rather than the number of failures until the first success. X is a zero-truncated Geometric; X = 1, 2, 3, .... The number of failures, X - 1, is a Geometric with β = 7/3.
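The closed form in the alternate solution is easy to code; here is a minimal Python sketch (my own, not part of the original solution):

import math

def trials_until_first_hit(u, p_hit=0.3):
    # S(x) = 0.7^x for this problem, so x = 1 + largest integer in ln(1 - u) / ln(0.7).
    return 1 + math.floor(math.log(1.0 - u) / math.log(1.0 - p_hit))

draws = [trials_until_first_hit(u) for u in (0.745, 0.524, 0.941, 0.038)]
print(draws, sum(draws))  # [4, 3, 8, 1] 16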
4.14. B. The first step is to calculate a table of densities for this Poisson, using the relationship f(x+1) = f(x) {λ/(x+1)} = f(x) 5.4/(x+1), and f(0) = e^−λ = e^−5.4 = 0.004517.
Number of Claims   Probability Density Function   Cumulative Distribution Function
0                  0.4517%                        0.004517
1                  2.4390%                        0.028906
2                  6.5852%                        0.094758
3                  11.8533%                       0.213291
4                  16.0020%                       0.373311
5                  17.2821%                       0.546132
6                  15.5539%                       0.701671
7                  11.9987%                       0.821659
8                  8.0991%                        0.902650
9                  4.8595%                        0.951245
10                 2.6241%                        0.977486
11                 1.2882%                        0.990368
One cumulates the densities to get the distribution function. The first random number is 0.5859; one sees the first time F(x) > 0.5859, which occurs when x = 6 and F(6) = .701671. Similarly, F(10) = 0.977486 > 0.9554. Proceeding similarly, the four random draws from this Poisson Distribution are: 6, 10, 3, and 4. The sample mean is: (6 + 10 + 3 + 4)/4 = 5.75. The sample variance is: {(6 - 5.75)^2 + (10 - 5.75)^2 + (3 - 5.75)^2 + (4 - 5.75)^2}/(4 - 1) = 9.58.
4.15. D. Calculate a table of values for the distribution function and then determine the first value at which F(x) > 0.64. This first occurs at x = 9.
Number of Claims   f(x)        F(x)
0                  0.0123457   0.01235
1                  0.0329218   0.04527
2                  0.0548697   0.10014
3                  0.0731596   0.17330
4                  0.0853528   0.25865
5                  0.0910430   0.34969
6                  0.0910430   0.44074
7                  0.0867076   0.52744
8                  0.0794820   0.60693
9                  0.0706507   0.67758
Comment: Iʼve only displayed the first part of the Distribution Function; for this problem one need only calculate up to F(9). In order to calculate the densities, one could use the relationship f(x+1)/f(x) = a + b/(x+1), where for the Negative Binomial a = β/(1+β), b = (r-1)β/(1+β), and f(0) = 1/(1+β)^r. Thus f(x+1)/f(x) = {β/(1+β)}(x + r)/(x + 1). In this case, a = 2/3, b = 6/3, and f(x+1) = f(x){2/3 + (6/3)/(x+1)} = f(x)(2/3)(x + 4)/(x + 1).
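For those who want to automate this, here is a short Python sketch (not from the study guide; the function name is mine) that builds F(x) with the (a, b, 0) recursion f(x+1) = f(x){a + b/(x+1)} and inverts it, as in 4.14 and 4.15:

import math

def invert_ab0(u, a, b, f0):
    # Return the first x with F(x) > u, accumulating densities only as far as needed.
    x, f, F = 0, f0, f0
    while F <= u:
        f *= a + b / (x + 1)
        F += f
        x += 1
    return x

# A Poisson with lambda = 5.4 has a = 0, b = 5.4, f(0) = exp(-5.4);
# this reproduces the first two draws in 4.14.
print([invert_ab0(u, 0.0, 5.4, math.exp(-5.4)) for u in (0.5859, 0.9554)])  # [6, 10]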
4.16. The cumulative distribution function is:
Credit Score   499   549   599   649   699   749   799   800
F              2%    7%    15%   27%   42%   60%   87%   100%
Therefore, .528 corresponds to the interval from 700 to 749. (.528 - 42%)/18% = 0.6. Thus .528 corresponds to: 700 + (50)(.6) = 730. .342 corresponds to the interval from 650 to 699. (.342 - 27%)/15% = 0.48. Thus .342 corresponds to: 650 + (50)(.48) = 674. .914 is greater than 87%, and corresponds to a credit score of 800. Comment: Depending on details, your simulated credit scores could differ by 1 from what I have.
4.17. B. State 1 is rain and State 2 is no rain. The transition matrix is:
( 0.5  0.5 )
( 0.2  0.8 )
Since the chain starts in state 1, we compare .661 to the cumulative sums across the 1st row of the transition matrix. .5 ≤ .661 < 1.0, so the chain goes to state 2. It does not rain on Monday. Since the chain is now in state 2, we compare .529 to the cumulative sums across the 2nd row of the transition matrix. .2 ≤ .529 < 1.0, so the chain remains in state 2. It does not rain on Tuesday. .2 ≤ .301, so it does not rain on Wednesday. .132 < .2, so it rains on Thursday. Since the chain is now in state 1, and .378 < .5, it remains in state 1. It rains Friday. .5 ≤ .792, so it does not rain on Saturday. It rains Thursday and Friday, 2 days.
4.18. B.
L      0      1000   2000   3000   4000
F(L)   0.15   0.40   0.75   0.95   1.00
Year   u       Aggregate Loss
1      0.679   2000
2      0.519   2000
3      0.148   0
4      0.206   1000
5      0.824   3000
6      0.249   1000
7      0.392   1000
8      0.980   4000
9      0.501   2000
10     0.844   3000
Total          19,000
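The Markov chain mechanics of 4.17 (and of 4.19, which follows) can be sketched in a few lines of Python; this is my own illustration, not part of the original solutions, and the states are numbered from zero:

def simulate_chain(start, transition, uniforms):
    # transition[i][j] = probability of moving from state i to state j; one uniform per step.
    path, state = [], start
    for u in uniforms:
        cumulative = 0.0
        for j, p in enumerate(transition[state]):
            cumulative += p
            if u < cumulative:
                state = j
                break
        path.append(state)
    return path

# Reproduces 4.17 with state 0 = rain and state 1 = no rain:
# it rains on the 4th and 5th days (Thursday and Friday).
print(simulate_chain(0, [[0.5, 0.5], [0.2, 0.8]], [0.661, 0.529, 0.301, 0.132, 0.378, 0.792]))
# [1, 1, 1, 0, 0, 1]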
4.19. C. In State F there is a 20% chance of staying in F and an 80% chance of going to G. The first random number, .834 ≥ .20, so day one is in State G. In State G there is a 50% chance of going to F and a 50% chance of going to H. The next random number, .588 ≥ .50, so day two is in State H. In State H there is a 75% chance of going to F and a 25% chance of going to I. The next random number, .315 < .75, so day three is in State F. In State F there is a 20% chance of staying in F and an 80% chance of going to G. The next random number, .790 ≥ .20, so day four is in State G. In State G there is a 50% chance of going to F and a 50% chance of going to H. The next random number, .941 ≥ .50, so day five is in State H. In State H there is a 75% chance of going to F and a 25% chance of going to I. The next random number, .510 < .75, so day six is in State F. In State F there is a 20% chance of staying in F and an 80% chance of going to G. The next random number, .003 < .20, so day seven is in State F. Thus the machine is in: G, H, F, G, H, F, F. The total production is: 90 + 70 + 100 + 90 + 70 + 100 + 100 = 620.
4.20. E. F(4) = 0.376953 > 0.325. F(2) > 0.072. F(13) > 0.956. F(6) > 0.565. F(11) > 0.899. 4 + 2 + 13 + 6 + 11 = 36. Comment: Based on a Negative Binomial Distribution with r = 10 and β = 1.
4.21. D. A table of the Zero-Modified Poisson Distribution with p0M = 30% and λ = 1.8:
Number of Claims   Probability    Distribution Function
0                  30.000000%     30.000000%
1                  24.952237%     54.952237%
2                  22.457013%     77.409250%
3                  13.474208%     90.883458%
4                  6.063394%      96.946852%
5                  2.182822%      99.129673%
6                  0.654847%      99.784520%
7                  0.168389%      99.952909%
8                  0.037888%      99.990797%
0.98 is first exceeded when n = 5. 0.37 is first exceeded when n = 1. 0.68 is first exceeded when n = 2. 5 + 1 + 2 = 8.
4.22. C. For the non-truncated Negative Binomial Distribution, f(0) = 1/1.4^6 = 0.13281.
pkT = pk / (1 - 0.13281) = {(6)(7)...(5 + k) / k!} {0.4^k / 1.4^(6 + k)} / (1 - 0.13281).
A table of the Zero-Truncated Negative Binomial Distribution with r = 6 and β = 0.4:
Size of Family   Probability    Distribution Function
1                26.254327%     26.254327%
2                26.254327%     52.508653%
3                20.003297%     72.511950%
4                12.859262%     85.371212%
5                7.348150%      92.719362%
6                3.849031%      96.568393%
7                1.885240%      98.453632%
8                0.875290%      99.328922%
9                0.389018%      99.717940%
0.08 is first exceeded when n = 1. 0.75 is first exceeded when n = 4. 0.47 is first exceeded when n = 2. 1 + 4 + 2 = 7.
4.23. B. A table of the Zero-Modified Binomial Distribution:
Number of Claims   Probability    Distribution Function
0                  20.000000%     20.000000%
1                  9.966392%      29.966392%
2                  19.220898%     49.187290%
3                  21.966741%     71.154030%
4                  16.475055%     87.629086%
5                  8.472886%      96.101971%
6                  3.026031%      99.128002%
7                  0.741069%      99.869071%
8                  0.119100%      99.988171%
9                  0.011343%      99.999514%
10                 0.000486%      100.000000%
0.4768 is first exceeded when n = 2. 0.9967 is first exceeded when n = 7. 0.3820 is first exceeded when n = 2. 2 + 7 + 2 = 11.
4.24. C. There are 20 observations and the cumulative Distribution Function is:
x      100   200   400   600   1000
F(x)   .10   .35   .60   .80   1.00
For each random number u from (0,1) one wants the smallest x such that F(x) > u. For u = .3, F(200) = .35 > .3, so we simulate a loss of size 200. For u = .7, F(600) = .80 > .7, so we simulate a loss of size 600. The sum of the simulated values is 200 + 600 = 800.
4.25. C. Calculate a table of values for the distribution function and then determine the first value at which F(x) > .7. This first occurs at x = 2.
Number of Claims   Probability Density Function   Cumulative Distribution Function
0                  23.730%                        0.23730
1                  39.551%                        0.63281
2                  26.367%                        0.89648
3                  8.789%                         0.98438
4                  1.465%                         0.99902
5                  0.098%                         1.00000
4.26. B. The present value of benefits is either 0, 1000/1.06 = 943, 1100/1.06^2 = 979, 1100/1.06 = 1038, or 1200/1.06^2 = 1068. Arrange these from smallest to largest:
P.V. of Benefits   Probability   Cumulative Distribution
0                  0.2           0.2
943                0.1           0.3
979                0.1           0.4
1038               0.1           0.5
1068               0.5           1.0
See where the cumulative distribution first exceeds the random number .35. F(979) = .40 > .35, so the simulated value is 979.
4.27. C. q70 = .03318. # of deaths the first year is Binomial with q = .03318 and m = 100:
f(0) = (1 - .03318)^100 = .0342. F(0) = .0342.
f(1) = 100(.03318)(1 - .03318)^99 = .1175. F(1) = F(0) + f(1) = .1517.
f(2) = {(100)(99)/2}(.03318^2)(1 - .03318)^98 = .1996. F(2) = F(1) + f(2) = .3513.
See where the distribution function first exceeds the random number of .18: Since F(1) ≤ .18 < F(2), there are 2 deaths. The payment is: (2)(10) = 20. Comment: Some of the given information was used to answer the previous question on this exam.
4.28. C. For a Binomial with m = 10 and q = .2, F(x) first exceeds .65 when x = 2.
x    f(x)        F(x)
0    0.1073742   0.1073742
1    0.2684355   0.3758096
2    0.3019899   0.6777995
3    0.2013266   0.8791261
4    0.0880804   0.9672065
5    0.0264241   0.9936306
6    0.0055050   0.9991356
7    0.0007864   0.9999221
8    0.0000737   0.9999958
9    0.0000041   0.9999999
10   0.0000001   1.0000000
Comment: One can stop computing the densities and distribution function, when one gets to x = 2 and notices that F(2) > .65, the given random number.
4.29. D. The most efficient way to simulate X ⇔ doing the fewest comparisons on average ⇒ testing for the largest probabilities first. In this case, f(5) > f(3) > f(4) > f(1) > f(2). So we take the cumulative sums of f(5), f(3), f(4), f(1), and f(2): .3, .3 + .25 = .55, .3 + .25 + .2 = .75, .3 + .25 + .2 + .15 = .90, .3 + .25 + .2 + .15 + .1 = 1.00. Let U be a random number from (0, 1). If U < 0.30 set X = 5 and stop. If U < 0.55 set X = 3 and stop. If U < 0.75 set X = 4 and stop. If U < 0.90 set X = 1 and stop. Otherwise set X = 2 and stop. Comment: All of these are valid algorithms. Expected number of comparisons: A. (1)(.15) + (2)(.1) + (3)(.25) + (4)(.5) = 3.1. B. (1)(.3) + (2)(.2) + (3)(.25) + (4)(.25) = 2.45. C. (1)(.1) + (2)(.15) + (3)(.2) + (4)(.55) = 3.2. D. (1)(.3) + (2)(.25) + (3)(.2) + (4)(.25) = 2.4. E. (1)(.2) + (2)(.15) + (3)(.1) + (4)(.55) = 3.0.
4.30. The third random number is .480. For a Geometric Distribution with β = 7/3, F(x) first exceeds .480 when x = 1.
x    f(x)        F(x)
0    0.3000000   0.3000000
1    0.2100000   0.5100000
2    0.1470000   0.6570000
3    0.1029000   0.7599000
4.31. D. X is the number of trials until the first success. Prob[X = 1] = Prob[success on first trial] = .6. Prob[X = 2] = Prob[failure on first trial] Prob[success on second trial] = (.4)(.6) = .24. Prob[X = 3] = Prob[failure on 1st trial] Prob[failure on 2nd trial] Prob[success on 3rd trial] = (.4)(.4)(.6) = .096. Note that Prob[X = 3] = .4 Prob[X = 2]. f(x+1) = .4 f(x).
x    f(x)         F(x)
1    0.6          0.6
2    0.24         0.84
3    0.096        0.936
4    0.0384       0.9744
5    0.01536      0.98976
6    0.006144     0.99590
7    0.0024576    0.99836
8    0.00098304   0.99934
Since low numbers correspond to a high number of trials, we need to see where 1 - u first exceeds F(x), rather than the more usual where u first exceeds F(x). 1 - .15 = .85, first exceeded by F(3) = .936. 1 - .62 = .38, first exceeded by F(1) = .6. 1 - .37 = .63, first exceeded by F(2) = .84. Total number of trials until three successes: 3 + 1 + 2 = 6.
Alternately, X is 1 plus a Geometric with β = probability of failure / probability of success = .4/.6 = 2/3. For a Geometric as per Loss Models, S(n) = Prob[# of failures > n] = (β/(1+β))^(n+1). ⇒ S(x) = Prob[# of trials > x] = Prob[# of failures > x - 1] = (β/(1+β))^x = {(2/3)/(5/3)}^x = 0.4^x. Given a random number u, we want the first integer x such that: S(x) < u ⇔ .4^x < u ⇔ x > ln(u)/ln(.4). ln(.15)/ln(.4) = 2.07 ⇒ x = 3. ln(.62)/ln(.4) = .52 ⇒ x = 1. ln(.37)/ln(.4) = 1.09 ⇒ x = 2. Total number of trials until three successes: 3 + 1 + 2 = 6. Comment: The number of failures, X - 1, is a Geometric with β = 2/3.
4.32. B. f(0) = .7. f(1) = Prob[failure first trial] Prob[success second trial] = (.3)(.7) = .21. f(2) = Prob[failure first trial] Prob[failure second trial] Prob[success third trial] = (.3^2)(.7) = .063. F(0) = .7 ≤ .7885. F(1) = .91 > .7885. Thus we simulate 1 failure. Comment: The number of failures before the first success is Geometric, with β = .3/.7 = 3/7.
4.33. B. The number of deaths in a year is Binomial with m = 100 and q = 0.03. f(0) = .97^100 = 0.04755. f(1) = 100(.97^99)(.03) = 0.14707. f(2) = {(100)(99)/2}(.97^98)(.03^2) = 0.22515. F(0) = 0.04755. F(1) = 0.19462. F(2) = 0.41977. u1 = 0.20 is first exceeded by F(2) = 0.41977, simulate two deaths. u2 = 0.03 is first exceeded by F(0) = 0.04755, simulate zero deaths. u3 = 0.09 is first exceeded by F(1) = 0.19462, simulate one death. Average number of deaths simulated is: (2 + 0 + 1)/3 = 1. Comment: Given larger random numbers, one would need to calculate more of the distribution function of the Binomial.
Section 5, Simulating Normal and LogNormal Distributions
One can simulate a random draw from a Standard Unit Normal many different ways, including the inversion method (Table Lookup), the Polar Normal Method,34 and the Rejection Method.35 A random draw from a Standard Unit Normal can in turn be used to simulate a random Normal or a random LogNormal.
Inversion Method (Table Lookup):
In order to use the inversion method (Table Lookup) in order to simulate a random draw from a Standard Unit Normal,36 one uses a Table of the (standard unit) Normal Distribution, and a random number from (0, 1).
Exercise: Given a random number 0.95 from (0,1), use the inversion method in order to simulate a random draw from a Standard Unit Normal.
[Solution: Φ(1.645) = 0.95, therefore, the random draw from a Standard Unit Normal is 1.645.]
In general, one looks up in a Normal Table, the place where the Distribution equals the random number from (0,1). Given a random number u, find Z such that Φ(Z) = u. One random number from (0,1), gives one random draw from a Standard Unit Normal.
Exercise: Given a random number 0.3974 from (0,1), use the inversion method in order to simulate a random draw from a Standard Unit Normal.
[Solution: Φ(0.260) = 0.6026, Φ(-0.260) = 1 - 0.6026 = 0.3974. Therefore, the random draw from a Standard Unit Normal is -0.260.]
Non-Unit Normals:
Assume we have simulated a random draw from a Standard Unit Normal, with a mean of zero and a standard deviation of one. Then we can convert this to a random draw from a Normal with mean µ and standard deviation σ by multiplying the standard normal variable by σ and then adding µ. Simulate a Unit Normal Z, then X = σZ + µ. For example, assume we are trying to simulate heights of human males, which are assumed to be normally distributed with mean 69 inches and a standard deviation of 4 inches. Then a standard normal draw of 1.2 would translate to a height of (1.2)(4) + 69 = 73.8.37
34 See for example Simulation by Ross, not on the syllabus.
35 See for example Simulation by Ross, not on the syllabus.
36 With a mean of zero and a standard deviation of one.
37 This is the inverse of the usual method of standardizing variables so that one can use the Standard Normal Table.
Exercise: Simulate a random draw from a Normal Distribution with parameters µ = 2, σ = 7. Use the random number from zero to one: 0.6406. [Solution: Φ(0.36) = 0.6406. ⇒ Z = 0.36. ⇒ X = (7)(0.36) + 2 = 4.52. Comment: If we standardize 4.52: (4.52 - 2) / 7 = 0.36.] Simulating a LogNormal Distribution: Assume we want to simulate a random variable Y from a LogNormal Distribution with µ = 10 and σ = 4. Then by the definition of the LogNormal Distribution, ln(Y) is Normally distributed with mean 10 and standard deviation 4. Above we saw how to simulate such a variable; if Z is distributed as per a Standard Normal, then 4Z + 10 is distributed as per a Normal Distribution with standard deviation 4 and mean 10. Set ln(Y) = 4Z + 10. Therefore, Y = exp(4Z + 10). For example, if a random draw from a Unit Normal is -0.211, then exp[(-0.211)(4) +10] = 9471 is a random draw from a LogNormal Distribution with µ = 10 and σ = 4. In general, if Z is a random draw from a Unit Normal, then exp(σZ + µ) is a random draw from a LogNormal Distribution with parameters µ and σ. In order to simulate a LogNormal Distribution with parameters µ and σ: 1. Simulate a Unit Normal, Z. 2. Get a random Normal variable with parameters µ and σ. X = σZ + µ. 3. Exponentiate to get a random LogNormal variable. Y = exp(σZ + µ). Exercise: Simulate a random draw from a LogNormal Distribution with parameters µ = 5, σ = 2. Use the random number from zero to one: 0.4286. [Solution: Φ(0.18) = 0.5714 = 1 - 0.4286. ⇒ Z = -0.18. Exp[(2)(-0.18) + 5] = e4.64 = 103.5.]
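The three steps can be checked on a computer; here is a minimal Python sketch (my own, not part of the study guide), using the standard library's NormalDist.inv_cdf in place of the printed Normal table:

from math import exp
from statistics import NormalDist

def simulate_normal(u, mu, sigma):
    z = NormalDist().inv_cdf(u)     # random draw from a Standard Unit Normal
    return sigma * z + mu           # X = sigma Z + mu

def simulate_lognormal(u, mu, sigma):
    return exp(simulate_normal(u, mu, sigma))   # Y = exp(sigma Z + mu)

# Matches the exercises above, up to the rounding used in the table lookups:
print(simulate_normal(0.6406, 2, 7))      # about 4.52
print(simulate_lognormal(0.4286, 5, 2))   # about 103.5

Because inv_cdf is exact rather than rounded to two decimal places, the computer answers can differ slightly from the table-lookup answers.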
Normal Distribution Table
Entries represent the area under the standardized normal distribution from -∞ to z, Pr(Z < z). The value of z to the first decimal place is given in the left column. The second decimal is given in the top row.
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.5   0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998
3.6   0.9998  0.9998  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.7   0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.8   0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.9   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
Values of z for selected values of Pr(Z < z):
z           0.842   1.036   1.282   1.645   1.960   2.326   2.576
Pr(Z < z)   0.800   0.850   0.900   0.950   0.975   0.990   0.995
Problems: 5.1 (1 point) Assume -2.153 is a random draw from a Normal Distribution with a mean of zero and standard deviation of one. Use this value to simulate a random draw from a Normal Distribution with a mean of 10 and a standard deviation of 4. A. less than 1 B. at least 1 but less than 2 C. at least 2 but less than 3 D. at least 3 but less than 4 E. at least 4 5.2 (1 point) Assume -2.153 is a random draw from a Normal Distribution with a mean of zero and standard deviation of one. Use this value to simulate a random draw from a LogNormal Distribution with µ = 10 and σ = 4. A. less than 1 B. at least 1 but less than 2 C. at least 2 but less than 3 D. at least 3 but less than 4 E. at least 4 5.3 (1 point) A random number 0.0228 is generated from a uniform distribution on the interval (0, 1). Using the Method of Inversion, determine the simulated value of a random draw from a Normal Distribution with µ = 5 and σ = 17. A. Less than -20 B. At least -20, but less than -10 C. At least -10, but less than 0 D. At least 0, but less than 10 E. At least 10 5.4 (2 points) A random number 0.9772 is generated from a uniform distribution on the interval (0, 1). Using the Method of Inversion, determine the simulated value of a random draw from a LogNormal Distribution with µ = 5 and σ = 3. A. Less than 55,000 B. At least 55,000, but less than 60,000 C. At least 60,000, but less than 65,000 D. At least 65,000, but less than 70,000 E. At least 70,000
5.5 (3 points) Insurance for a city's snow removal costs covers four winter months.
• There is a deductible of 5000 per month. • The city's monthly costs are independent. • The cost for each month is LogNormal with parameters µ = 9 and σ = 1.5. • To simulate four months of claim costs, the insurer uses the Method of Inversion. • The 4 numbers drawn from the uniform distribution on [0,1] are: 0.6879, 0.1515, 0.2743, 0.8078. Calculate the insurer's simulated claim cost. A. 29,000 B. 31,000 C. 33,000 D. 35,000
E. 37,000
Use the following information for the next 3 questions: X(0) = 0, X(1) - X(0) is Normally distributed with µ = 0 and σ = 5, X(2) - X(1) is Normally distributed with µ = 0 and σ = 5, X(3) - X(2) is Normally distributed with µ = 0 and σ = 5, etc. X(1) - X(0) is independent of X(2) - X(1), X(2) - X(1) is independent of X(3) - X(2), etc. 5.6 (1 point) Simulate X(1). Use the following random number from [0, 1]: 0.3085. A. less than -4 B. at least -4 but less than -3 C. at least -3 but less than -2 D. at least -2 but less than -1 E. at least -1 5.7 (1 point) Given the solution to the previous question, simulate X(2). Use the following random number from [0, 1]: 0.8159. A. less than -1 B. at least -1 but less than 0 C. at least 0 but less than 1 D. at least 1 but less than 2 E. at least 2 5.8 (1 point) Given the solution to the previous question, simulate X(3). Use the following random number from [0, 1]: 0.1151. A. less than -5 B. at least -5 but less than -3 C. at least -3 but less than -1 D. at least -1 but less than 1 E. at least 1
5.9 (2 points) In Munchkin Land, the heights of adult males are Normally Distributed with mean 40 and standard deviation 5. Dorothy Gale meets three adult male Munchkins. Simulate their heights using the following random numbers from [0, 1]: 0.7486, 0.1210, 0.5319. What is the sum of these three simulated heights? A. 117 B. 118 C. 119 D. 120 E. 121 5.10 (3 points) A retrospectively rated workers compensation policy is written for the Phil Fish Canning Company. The insurance premium paid by the canning company depends on its annual aggregate losses, L: P = (1.03) (1.15L + 80,000), subject to a minimum premium of 300,000 and a maximum premium of 600,000. Phil Fish Canning Companyʼs annual aggregate losses are LogNormal with µ = 12.3 and σ = 0.8. You simulate 5 separate years of losses, using the following random draws from a Standard Normal Distribution with mean zero and standard deviation one: 0.1485, -1.5499, 0.3249, -0.1484, 1.8605. What is the average of the five simulated premiums? A. 400,000 B. 425,000 C. 450,000 D. 475,000 E. 500,000 5.11 (3 points) Mr. Kotterʼs class takes a standardized statewide reading test. A score of at least 60 passes the test. The scores of Mr. Kotterʼs students on this reading test are Normally Distributed with µ = 50 and σ = 10. The following are four uniform (0, 1) random numbers: 0.5596 0.3821 0.8643 0.0495 Using these numbers and the inversion method, simulate the scores of four students: Vinnie Barbarino, Arnold Horshack, Freddie Washington, and Juan Epstein. Determine the difference between the average simulated score of those students who passed and the average simulated score of those students who failed. A. 14 B. 15 C. 16 D. 17 E. 18 5.12 (2 points) Assume the following model:
• Annual aggregate losses follow a LogNormal Distribution with µ = 10 and σ = 2.
• Each year is independent of the others.
Use the following random numbers from (0, 1): 0.9099, 0.3483, 0.5000, in order to simulate this model. What is the simulated total aggregate loss for three years? A. Less than 360,000 B. At least 360,000, but less than 370,000 C. At least 370,000, but less than 380,000 D. At least 380,000, but less than 390,000 E. At least 390,000
5.13 (3 points) Losses for Medical Malpractice Insurance are assumed to follow a LogNormal Distribution with median of 59,874, and coefficient of variation of 4. The amount of Allocated Loss Adjustment Expense (ALAE) has the following relationship to the amount of loss: ln[ALAE] = 4.6 + ln[Loss] / 2. A random number 0.7224 is generated from a uniform distribution on the interval (0, 1). Using the Method of Inversion, simulate a random Medical Malpractice Loss. What is the resulting ratio of ALAE to Loss? A. 13% B. 16% C. 19% D. 22% E. 25% 5.14 (2 points) Assume the following model:
• Annual aggregate losses follow a Normal Distribution with µ = 900 and σ = 150.
• Each year is independent of the others.
Use the following random numbers from (0, 1): 0.063, 0.834, 0.648, in order to simulate this model. What is the simulated total aggregate loss for three years? A. Less than 2,000 B. At least 2,000, but less than 2,500 C. At least 2,500, but less than 3,000 D. At least 3,000, but less than 3,500 E. At least 3,500
5.15 (4 points) Using the “Antithetic Variate Method”, if one has a random number u, then one uses both u and 1 - u in your simulation. This produces two random outputs instead of one. (a) (1 point) Applying the Antithetic Variate Method to a LogNormal Distribution, using a random number of 0.90, what are the two outputs? (b) (3 points) Applying the Antithetic Variate Method to a LogNormal Distribution, let X and Y be the two outputs from a single random number u. What is the correlation of X and Y?
5.16 (4B, 5/98, Q.17) (2 points) You are given the following:
• In 1997, claims follow a lognormal distribution with parameters µ = 7 and σ = 2.
• Inflation of 5% affects all claims uniformly from 1997 to 1998.
• A random number is generated from a uniform distribution on the interval (0, 1). The resulting number is 0.6915.
Using the inversion method, determine the simulated value of a claim in 1998. A. Less than 2,000 B. At least 2,000, but less than 3,000 C. At least 3,000, but less than 4,000 D. At least 4,000, but less than 5,000 E. At least 5,000
5.17 (4B, 5/99, Q.9) (1 point) Two random numbers are generated from a uniform distribution on the interval (0, 1). The resulting numbers are 0.1587 and 0.8413. Using the inversion method, determine the sum of two simulated values from a normal distribution with mean zero and variance one. A. Less than -1.5 B. At least -1.5, but less than -0.5 C. At least -0.5, but less than 0.5 D. At least 0.5, but less than 1.5 E. At least 1.5 5.18 (3, 5/00, Q.32) (2.5 points) Insurance for a city's snow removal costs covers four winter months. (i) There is a deductible of 10,000 per month. (ii) The insurer assumes that the city's monthly costs are independent and normally distributed with mean 15,000 and standard deviation 2,000. (iii) To simulate four months of claim costs, the insurer uses the Inverse Transform Method (where small random numbers correspond to low costs). (iv) The four numbers drawn from the uniform distribution on [0,1] are: 0.5398 0.1151 0.0013 0.7881 Calculate the insurer's simulated claim cost. (A) 13,400 (B) 14,400 (C) 17,800 (D) 20,000 (E) 26,600 5.19 (4, 5/05, Q.34 & 2009 Sample Q.202) (2.9 points) Unlimited claim severities for a warranty product follow the lognormal distribution with parameters µ = 5.6 and σ = 0.75. You use simulation to generate severities. The following are six uniform (0, 1) random numbers: 0.6179 0.4602 0.9452 0.0808 0.7881 0.4207 Using these numbers and the inversion method, calculate the average payment per claim for a contract with a policy limit of 400. (A) Less than 300 (B) At least 300, but less than 320 (C) At least 320, but less than 340 (D) At least 340, but less than 360 (E) At least 360
5.20 (4, 11/05, Q.27 & 2009 Sample Q.237) (2.9 points) Losses for a warranty product follow the lognormal distribution with underlying normal mean and standard deviation of 5.6 and 0.75 respectively. You use simulation to estimate claim payments for a number of contracts with different deductibles. The following are four uniform (0,1) random numbers: 0.6217 0.9941 0.8686 0.0485 Using these numbers and the inversion method, calculate the average payment per loss for a contract with a deductible of 100. (A) Less than 630 (B) At least 630, but less than 680 (C) At least 680, but less than 730 (D) At least 730, but less than 780 (E) At least 780
5.21 (4, 11/06, Q.21 & 2009 Sample Q.265) (2.9 points) For a warranty product you are given: (i) Paid losses follow the lognormal distribution with µ = 13.294 and σ = 0.494. (ii) The ratio of estimated unpaid losses to paid losses, y, is modeled by y = 0.801 x^0.851 e^(-0.747x), where x = 2006 - contract purchase year. The inversion method is used to simulate four paid losses with the following four uniform (0,1) random numbers: 0.2877 0.1210 0.8238 0.6179 Using the simulated values, calculate the empirical estimate of the average unpaid losses for purchase year 2005. (A) Less than 300,000 (B) At least 300,000, but less than 400,000 (C) At least 400,000, but less than 500,000 (D) At least 500,000, but less than 600,000 (E) At least 600,000
Solutions to Problems:
5.1. B. (-2.153)(4) + 10 = 1.388. Comment: Reverse of the usual process of “standardizing” a variable to enable one to use the standard Normal table.
5.2. E. exp[(-2.153)(4) + 10] = e^1.388 = 4.007. Comment: If x follows a LogNormal, then ln(x) follows a Normal. Thus if y follows a Normal, e^y follows a LogNormal Distribution.
5.3. A. Set 0.0228 = F(x) = Φ[(x - 5)/17]. Using the Standard Normal Table, Φ(2) = .9772, and thus Φ(-2) = 1 - .9772 = .0228. ⇒ -2 = (x - 5)/17. ⇒ x = -29. Alternately, x = σZ + µ = (17)(-2) + 5 = -29.
5.4. B. Set 0.9772 = F(x) = Φ[(ln(x) - 5)/3]. Using the Standard Normal Table, Φ(2) = .9772. Therefore 2 = (ln(x) - 5)/3. Therefore x = e^11 = 59,874.
5.5. E. Since Φ(.49) = .6879, the first random number of .6879 corresponds to a random Unit Normal of 0.49. This in turn corresponds to a month with costs of: exp[9 + (1.5)(0.49)] = 16,899. Similarly, the other months correspond to: exp[9 + (1.5)Φ^-1(0.1515)] = exp[9 + (1.5)(-1.03)] = 1728, exp[9 + (1.5)Φ^-1(0.2743)] = exp[9 + (1.5)(-0.60)] = 3294, and exp[9 + (1.5)Φ^-1(0.8078)] = exp[9 + (1.5)(0.87)] = 29,882. After applying the 5000 per month deductible, the insurer pays: 11,899 + 0 + 0 + 24,882 = 36,781. Comment: Similar to 3, 5/00, Q.32.
5.6. C. Φ(-0.5) = .3085. Therefore X(1) = (-0.5)(5) = -2.5.
5.7. E. Φ(0.9) = .8159. Therefore X(2) = X(1) + (0.9)(5) = -2.5 + 4.5 = 2.0.
5.8. B. Φ(-1.2) = .1151. Therefore X(3) = X(2) + (-1.2)(5) = 2 - 6 = -4.0. Comment: A Brownian Motion.
5.9. B. Φ(.67) = .7486. ⇒ The first height is: 40 + (5)(.67) = 43.35. Φ(-1.17) = .1210. ⇒ The second height is: 40 + (5)(-1.17) = 34.15. Φ(0.08) = .5319. ⇒ The third height is: 40 + (5)(0.08) = 40.40. 43.35 + 34.15 + 40.40 = 117.90.
5.10. A. L = Exp[0.8Z + 12.3]. Thus the first simulated annual aggregate loss is: Exp[(0.8)(.1485) + 12.3] = 247,409. P = (1.03){(1.15)(247409) + 80000} = 375,457.
Z         Aggregate Loss   Preliminary Premium   Paid Premium
0.1485    247,409          375,457               375,457
-1.5499   63,582           157,712               300,000
0.3249    284,908          419,873               419,873
-0.1484   195,102          313,499               313,499
1.8605    973,254          1,235,219             600,000
Average                                          401,766
Comment: No premium paid is less than the minimum of 300,000 or more than the maximum of 600,000.
5.11. D. Φ[0.15] = 0.5596. Φ[-0.30] = 0.3821. Φ[1.10] = 0.8643. Φ[-1.65] = 0.0495. Therefore, the four test scores are: 50 + (0.15)(10) = 51.5, 50 + (-0.30)(10) = 47.0, 50 + (1.10)(10) = 61.0, 50 + (-1.65)(10) = 33.5. The average failing score is: (51.5 + 47.0 + 33.5)/3 = 44. The average passing score is 61. The difference is: 61 - 44 = 17.
5.12. A. Φ[1.34] = 0.9099. The simulated aggregate losses for the first year are: exp[10 + (1.34)(2)] = 321,258. Φ[-0.39] = 0.3483. The simulated aggregate losses for the second year are: exp[10 + (-0.39)(2)] = 10,097. Φ[0] = 0.5000. The simulated aggregate losses for the third year are: exp[10 + (0)(2)] = 22,026. Three year total is: 321,258 + 10,097 + 22,026 = 353,381.
5.13. E. 1 + CV^2 = E[X^2]/E[X]^2 = exp[2µ + 2σ^2] / exp[µ + σ^2/2]^2 = exp[σ^2]. Thus, 1 + 4^2 = 17 = exp[σ^2]. ⇒ σ = √ln(17) = 1.683. For the LogNormal, let x be the median. 0.5 = Φ[(ln[x] - µ)/σ]. ⇒ (ln[x] - µ)/σ = 0. ⇒ x = exp[µ]. Therefore, 59,874 = exp[µ]. ⇒ µ = 11. Given u = 0.7224, we wish to find Z such that Φ[Z] = 0.7224. ⇒ Z = 0.59. The simulated loss is: exp[11 + (1.683)(0.59)] = 161,615. ln[ALAE] = 4.6 + ln[Loss]/2 = 4.6 + ln[161,615]/2 = 10.5965. ⇒ ALAE = 39,995. ALAE / Loss = 39,995 / 161,615 = 24.7%. Comment: Loosely based on “Illinois Tort Reform and the Cost of Medical Liability Claims,” by Susan J. Forray and Chad C. Karls, in the July 2010 Contingencies.
5.14. C. Simulate a random Normal. Φ(-1.53) = 0.0630, so the corresponding random Normal is: 900 + (-1.53)(150) = 671. Φ(0.97) = 0.8340, so the corresponding random Normal is: 900 + (.97)(150) = 1046. Φ(0.38) = 0.6480, so the corresponding random Normal is: 900 + (.38)(150) = 957. Total of the three years is: 671 + 1046 + 957 = 2674.
5.15. (a) u = 0.90 corresponds to a random Standard Normal of 1.282. u = 1 - 0.90 = 0.10 corresponds to a random Standard Normal of -1.282. Thus the two outputs are: exp[µ + 1.282 σ] and exp[µ - 1.282 σ].
(b) With Z a random draw from a Standard Normal Distribution, the two outputs are: exp[µ + Z σ] and exp[µ - Z σ]. Each of the outputs is a random draw from a LogNormal Distribution. Thus, E[X] = E[Y] = exp[µ + σ^2/2]. E[X^2] = E[Y^2] = exp[2µ + 2σ^2]. Thus, Var[X] = Var[Y] = exp[2µ + σ^2] (exp[σ^2] - 1).
E[XY] = ∫ from -∞ to ∞ of exp[µ + zσ] exp[µ - zσ] φ(z) dz = e^(2µ) ∫ from -∞ to ∞ of φ(z) dz = e^(2µ).
Cov[X, Y] = E[XY] - E[X]E[Y] = e^(2µ) - exp[µ + σ^2/2] exp[µ + σ^2/2] = e^(2µ) (1 - exp[σ^2]).
Thus, Corr[X, Y] = e^(2µ) (1 - exp[σ^2]) / {exp[2µ + σ^2] (exp[σ^2] - 1)} = -exp[-σ^2].
Comment: The two outputs from the Antithetic Variate Method are negatively correlated. As σ approaches zero, the correlation approaches -1. The Antithetic Variate Method is on the syllabus of joint Exam MFE/3F.
5.16. C. Working in 1997 dollars, set .6915 = F(x) = Φ[{ln(x) - 7}/2]. Using the Standard Normal Table, Φ(.5) = .6915. Therefore .5 = {ln(x) - 7}/2. Therefore x = e^8 = 2981 in 1997 dollars. Inflating by 5%, this corresponds to a claim of size: (1.05)(2981) = 3130 in 1998. Alternately, working in 1998 dollars, after 5% inflation one has a LogNormal Distribution with parameters µ = 7 + ln(1.05) = 7.049 and σ = 2. Set .6915 = F(x) = Φ[{ln(x) - 7.049}/2]. Using the Standard Normal Table, Φ(.5) = .6915. Therefore .5 = {ln(x) - 7.049}/2. Therefore x = e^8.049 = 3131 in 1998 dollars.
5.17. C. Looking in the Normal Distribution Table, Φ(-1) is .1587. Thus the first random Normal is: -1. Φ(1) = .8413, so the second random Normal is: 1. The sum of the two simulated values is: (-1) + 1 = 0.
5.18. B. Φ(.1) = .5398. Thus the first random number of .5398 corresponds to a random Unit Normal of .1. This in turn corresponds to a month with costs of: 15,000 + (.1)(2000) = 15,200. Similarly, the other months correspond to: 15,000 + Φ^-1(.1151)(2000) = 15,000 + (-1.2)(2000) = 12,600, 15,000 + Φ^-1(.0013)(2000) = 15,000 + (-3)(2000) = 9000, and 15,000 + Φ^-1(.7881)(2000) = 15,000 + (.8)(2000) = 16,600. After applying the 10,000 per month deductible, the insurer pays: 5200 + 2600 + 0 + 6600 = 14,400. Comment: Note how the simulation would have been not much more difficult if there had been additional coverage modifications, such as an aggregate policy limit. In the absence of the deductible per month, the sum of the losses for the four months would be normally distributed with mean of (4)(15000) = 60,000 and standard deviation of 2000√4 = 4,000. Therefore, in the absence of a deductible, one could use a single random number to directly simulate the aggregate losses.
5.19. A. Let Z be such that Φ(Z) = u. Then the random LogNormal is exp[5.6 + 0.75Z].
Random Number   Standard Normal   Normal   LogNormal   Limited to 400
0.6179          0.3               5.825    339         339
0.4602          -0.1              5.525    251         251
0.9452          1.6               6.800    898         400
0.0808          -1.4              4.550    95          95
0.7881          0.8               6.200    493         400
0.4207          -0.2              5.450    233         233
With the policy limit of 400, the mean is: (339 + 251 + 400 + 95 + 400 + 233)/6 = 286.
5.20. A. Assuming that large random numbers correspond to large claims, set F(x) = u. u = Φ[(ln(x) - 5.6)/.75]. x = exp[5.6 + .75 Φ^-1[u]]. Φ^-1[.6217] = 0.31. ⇒ x = exp[5.6 + (.75)(0.31)] = 341. The payment is 241. Φ^-1[.9941] = 2.52. ⇒ x = exp[5.6 + (.75)(2.52)] = 1790. The payment is 1690. Φ^-1[.8686] = 1.12. ⇒ x = exp[5.6 + (.75)(1.12)] = 626. The payment is 526. Φ^-1[.0485] = -1.66. ⇒ x = exp[5.6 + (.75)(-1.66)] = 78. The payment is 0. The average payment per loss is: (241 + 1690 + 526 + 0)/4 = 614. Comment: The question should have specified whether large random numbers correspond to large simulated values or small simulated values. However, when using a table, such as that of the Normal Distribution, one commonly has large random numbers correspond to large simulated values. The average payment per payment is: (241 + 1690 + 526)/3 = 819.
5.21. A. Φ[-0.56] = 0.2877. Thus the first simulated Standard Normal is -0.56. The first simulated paid loss is: exp[13.294 + (0.494)(-.56)] = 450,161. The second simulated paid loss is: exp[13.294 + (0.494)(-1.17)] = 333,041. The third simulated paid loss is: exp[13.294 + (0.494)(0.93)] = 939,798. The fourth simulated paid loss is: exp[13.294 + (0.494)(0.30)] = 688,451. Average paid loss: (450,161 + 333,041 + 939,798 + 688,451)/4 = 602,863. x = 2006 - contract purchase year = 2006 - 2005 = 1.
⇒ y = 0.801 (1^0.851) e^-0.747 = 0.3795. Estimated average unpaid losses: (0.3795)(602,863) = 228,863. Comment: The part of this exam question related to unpaid losses is an unnecessary complication.
Section 6, Simulating Brownian Motion38
In order to simulate the outcomes of Brownian Motion, one needs to simulate random draws from Normal Distributions. Questions about simulating Brownian Motion can be rephrased in terms of simulating Normal Distributions.
Arithmetic Brownian Motion:
An Arithmetic Brownian Motion, X(t), is Normal with mean X(0) + µt, and variance σ^2 t. Arithmetic Brownian Motion is a stochastic process, with the following properties:
1. X(t + s) - X(t) is Normally Distributed with mean µs, and variance σ^2 s.
2. The increments for disjoint time intervals are independent.
3. X(t) is continuous.
One can simulate Arithmetic Brownian Motion, by simulating the increments as independent Normals.
Exercise: Use the Method of Inversion in order to get random draws from a Standard Unit Normal. Use the following random numbers from [0, 1]: 0.4207, 0.7881, 0.0107, 0.9332, 0.3085.
[Solution: Φ(-0.2) = 0.4207, and therefore the first random Standard Normal is: -0.2. Similarly, the remaining random Standard Normals are: 0.8, -2.3, 1.5, -0.5. Comment: The “random” numbers from [0, 1] were chosen for illustrative purposes so that it would be easy to look up the corresponding Standard Normals on the Normal Distribution Table provided with the exam.]
Let X(t) be the position of a particle at time t. Assume X(t) is an Arithmetic Brownian Motion with µ = 0 and σ = 3. Assume X(0) = 0. Then X(1) - X(0) is Normally distributed with µ = 0 and σ = 3, X(2) - X(1) is Normally distributed with µ = 0 and σ = 3, X(3) - X(2) is Normally distributed with µ = 0 and σ = 3, etc. X(1) - X(0) is independent of X(2) - X(1), X(2) - X(1) is independent of X(3) - X(2), etc.
Exercise: Use the Method of Inversion in order to get a random draw from a Normal Distribution with µ = 0 and σ = 3. Use the following random number from [0, 1]: 0.4207.
[Solution: The Standard Unit Normal is -0.2. The corresponding Normal is: (-0.2)(3) = -0.6.]
38 Brownian Motion is covered in Derivatives Markets by McDonald, Introduction to Probability Models by Ross, or Loss Models.
Thus the simulated value of X(1) is -0.6. Now let us assume we want to also simulate X(2). X(2) - X(1) is Normally distributed with µ = 0 and σ = 3. Using the random number 0.7881, the simulated Normal is: (0.8)(3) = 2.4. The simulated value of X(2) = X(1) + 2.4 = -0.6 + 2.4 = 1.8. Using a sequence of random numbers, one could simulate X(t) in this sequential manner:
Time   Random Number   Standard Unit Normal   Normal (µ = 0, σ = 3)   X(t)
0                                                                     0
1      0.4207          -0.2                   -0.6                    -0.6
2      0.7881          0.8                    2.4                     1.8
3      0.0107          -2.3                   -6.9                    -5.1
4      0.9332          1.5                    4.5                     -0.6
5      0.3085          -0.5                   -1.5                    -2.1
For example, X(3) = X(2) + (-2.3)(3) = 1.8 - 6.9 = -5.1. In order to simulate X(t) at: 1, 2, 3, ..., 100, we would simulate 100 independent Normal Distributions each with standard deviation 3. We would take a running sum of these simulated Normals, in order to get the simulated Arithmetic Brownian Motion. For example X(44) would be the sum of the first 44 random Normals. Here is an example of such a simulation, for t = 1, 2, 3, ..., 100:
[Graph: one simulated path of X(t) for t = 1 to 100.]
Here is another simulation of the same process:
[Graph: a second simulated path of the same process.]
Each time one simulates such a stochastic process, one gets a different result.
Exercise: Let X(t) be the value of a currency at time t. Assume X(0) = 50. X(t) follows an Arithmetic Brownian Motion with µ = 0.7 and σ = 2. Sequentially simulate X(1) and X(5). Use the following random numbers from (0, 1): 0.6808, 0.1788.
[Solution: X(1) - X(0) is Normally distributed with µ = 0.7 and σ = 2. X(5) - X(1) is Normally distributed with µ = (0.7)(5 - 1) = 2.8, and σ = 2√(5 - 1) = 4. X(1) - X(0) is independent of X(5) - X(1).
Time   Random Number   Standard Normal   Increment   X(t)
0                                                    50
1      0.6808          0.47              1.64        51.64
5      0.1788          -0.92             -0.88       50.76
The random number 0.6808 corresponds to a Standard Unit Normal of 0.47. The corresponding Normal with mean .7 and standard deviation 2 is: (0.47)(2) + 0.7 = 1.64. X(1) = 50 + 1.64 = 51.64. The random number 0.1788 corresponds to a Standard Unit Normal of -0.92. The corresponding Normal with mean 2.8 and standard deviation 4 is: (-0.92)(4) + 2.8 = -0.88. X(5) = X(1) - 0.88 = 50.76.]
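Here is a minimal Python sketch of the sequential simulation of an Arithmetic Brownian Motion (my own illustration, not from the study guide; NormalDist.inv_cdf stands in for the Normal table):

from math import sqrt
from statistics import NormalDist

def simulate_abm(x0, mu, sigma, times, uniforms):
    values, x, prev_t = [], x0, 0.0
    for t, u in zip(times, uniforms):
        dt = t - prev_t
        z = NormalDist().inv_cdf(u)              # Standard Unit Normal
        x += mu * dt + sigma * sqrt(dt) * z      # increment is Normal with mean mu*dt, variance sigma^2*dt
        values.append(x)
        prev_t = t
    return values

# Matches the currency exercise above: X(1) is about 51.64 and X(5) is about 50.76.
print(simulate_abm(50, 0.7, 2, [1, 5], [0.6808, 0.1788]))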
Standard Brownian Motion:
In Derivative Markets, not on the syllabus of this exam, McDonald refers to an Arithmetic Brownian Motion with µ = 0 and σ = 1 as a Brownian Motion, Z(t). Many other textbooks refer to this as a Standard Brownian Motion. It is assumed that Z(0) = 0.
Geometric Brownian Motion:
If ln(X(t)) is an Arithmetic Brownian Motion, then X(t) is a Geometric Brownian Motion. For a Geometric Brownian Motion, X(t)/X(0) is LogNormal with parameters µt and σ√t. Geometric Brownian Motion is a stochastic process, with the following properties:
1. X(t + s) / X(t) is LogNormally Distributed with parameters µs and σ√s.
2. The ratios for disjoint time intervals are independent.
3. X(t) is continuous.
Standard Normal ⇔ Standard Brownian Motion
Normal Distribution ⇔ Arithmetic Brownian Motion
LogNormal Distribution ⇔ Geometric Brownian Motion
Simulating Geometric Brownian Motion:39 One can simulate Geometric Brownian Motion, by simulating successive ratios as independent LogNormals, or by simulating an Arithmetic Brownian Motion and then exponentiating. Let X(t) be the price of a stock at time t, where time is measured in months. X(0) = $100. Assume X(t) follows a Geometric Brownian Motion with µ = 0.003 and σ = 0.06. Then X(1)/X(0) is LogNormally distributed with µ = 0.003 and σ = 0.06, X(2)/X(1) is LogNormally distributed with µ = 0.003 and σ = 0.06, X(3)/X(2) is LogNormally distributed with µ = 0.003 and σ = 0.06, etc. X(1)/X(0) is independent of X(2)/X(1), X(2)/X(1) is independent of X(3)/X(2), etc. X(1)/X(0) is LogNormally distributed with µ = 0.003 and σ = 0.06. ⇔ ln(X(1)) - ln(X(0)) is Normally distributed with µ = 0.003 and σ = 0.06. 39
See Exercise 21.8 in Loss Models, 3, 5/01, Q.8 rewritten. Simulating stock prices is discussed in “Mahlerʼs Guide to Two Topics in Financial Economics.”
Thus one can simulate ln(X(t)) as previously, and then exponentiate to get X(t). Using a sequence of random numbers, one could simulate ln[X(t)] in a sequential manner:
Time (months)   Random Number   Standard Unit Normal   Normal (µ = 0.003, σ = 0.06)   ln(X(t))   X(t)
0                                                                                     4.605      $100.00
1               0.3085          -0.5                   -0.027                         4.578      $97.34
2               0.7881          0.8                    0.051                          4.629      $102.43
3               0.5793          0.2                    0.015                          4.644      $103.98
4               0.9192          1.4                    0.087                          4.731      $113.43
5               0.2420          -0.7                   -0.039                         4.692      $109.09
In a similar manner one could simulate many months of the stock price. Here is an example of a simulation of the stock price over 10 years (120 months):
Price 200 180 160 140 120 100
20
40
60
80
100
120
Months
It is important to note that the movements of the simulated price from month to month are random and independent.40 If one simulated the same process again, one would get a different result. Nevertheless, the expected stock price at time t (in months) is: 100 exp(µt + tσ^2/2) = 100 {exp(µ + σ^2/2)}^t = 100 {exp(0.003 + 0.06^2/2)}^t = 100 (1.00481^t). After 10 years, the expected price is: 100 (1.00481^120) = 178. In this particular simulation run, the price at 10 years turned out to be less than expected. In different simulation runs the price at 10 years will vary; Geometric Brownian Motion is a Stochastic Process.
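A minimal Python sketch of the monthly stock price simulation (my own, not from the study guide; it simulates ln(X) as an Arithmetic Brownian Motion and exponentiates):

from math import exp
from statistics import NormalDist

def simulate_gbm(x0, mu, sigma, uniforms):
    prices, log_ratio = [], 0.0
    for u in uniforms:
        log_ratio += mu + sigma * NormalDist().inv_cdf(u)   # increment of ln(X) over one month
        prices.append(x0 * exp(log_ratio))
    return prices

# Matches the table above, up to the two-decimal rounding of the Normal table lookups:
print(simulate_gbm(100, 0.003, 0.06, [0.3085, 0.7881, 0.5793, 0.9192, 0.2420]))
# roughly [97.3, 102.4, 104.0, 113.4, 109.1]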
Problems: Use the following information for the next 2 questions: An Arithmetic Brownian Motion is zero at time 0 and has µ = 0 and σ = 7. 6.1 (1 point) Simulate the Arithmetic Brownian Motion at time 5. Use the following random number from [0, 1]: 0.2119. A. less than -20 B. at least -20 but less than -15 C. at least -15 but less than -10 D. at least -10 but less than -5 E. at least -5 6.2 (1 point) Given the solution to the previous question, simulate the Arithmetic Brownian Motion at time 15. Use the following random number from [0, 1]: 0.6554. A. less than -5 B. at least -5 but less than -2 C. at least -2 but less than 2 D. at least 2 but less than 5 E. at least 5
6.3 (3 points) You are to simulate an Arithmetic Brownian Motion that is zero at time zero and with σ = 0.22 and µ = 0.06, at times 1, 5, 10, and 25. You are given the following independent random draws from a Standard Unit Normal Distribution: -1.1160, 0.2761, 1.4980, 0.6966, -0.5783, -0.8267, 0.9787, 0.4923. What is the simulated value of this Brownian Motion at time = 25? A. 2.7 B. 2.8 C. 2.9 D. 3.0 E. 3.1
Use the following information for the next 3 questions: For a simulation of the movement of a stockʼs price, X(t):
• X(t+1)/X(t) is LogNormally distributed with µ = 0.08 and σ = 0.36. • X(t+1)/X(t) is independent of X(t+2)/X(t+1). • The simulation projects the stock price in steps of time 1. • Simulated price movements are determined using the inversion method. • The price at t = 0 is 80. 6.4 (2 points) Simulate the price of the stock at time 1. Use the following random number from [0, 1]: 0.9641. A. less than 160 B. at least 160 but less than 170 C. at least 170 but less than 180 D. at least 180 but less than 190 E. at least 190 6.5 (2 points) Given the solution to the previous question, simulate the stock price at time 2. Use the following random number from [0, 1]: 0.0139. A. less than 60 B. at least 60 but less than 70 C. at least 70 but less than 80 D. at least 80 but less than 90 E. at least 90 6.6 (2 points) Given the solution to the previous question, simulate the stock price at time 3. Use the following random number from [0, 1]: 0.6179. A. less than 100 B. at least 100 but less than 110 C. at least 110 but less than 120 D. at least 120 but less than 130 E. at least 130
6.7 (3 points) You are to simulate a Standard Brownian Motion (starts at 0 at time 0 and σ = 1 and µ = 0) successively at times: 1, 2, 3, 4, and 5. You are given the following independent random numbers from (0 ,1): 0.0351, 0.3520, 0.8749, 0.1112, 0.6844. What is the simulated value of this Brownian Motion at time 5? (A) -1.8 (B) -1.6 (C) -1.4 (D) -1.2 (E) -1.0 6.8 (1 point) Using the results of the previous question, simulate a Geometric Brownian Motion with µ = 0.2 and σ = 0.1. Assume the Geometric Brownian Motion is one at time zero. What is the simulated value of this Geometric Brownian Motion at time 5? A. less than 1 B. at least 1 but less than 2 C. at least 2 but less than 3 D. at least 3 but less than 4 E. at least 4
6.9 (3, 5/01, Q.8) (2.5 points) For a simulation of the movement of a stockʼs price: (i) The price follows geometric Brownian motion, with drift coefficient µ = 0.01 and variance parameter σ2 = 0.0004. (ii) The simulation projects the stock price in steps of time 1. (iii) Simulated price movements are determined using the inverse transform method. (iv) The price at t = 0 is 100. (v) The random numbers, from the uniform distribution on [0,1], for the first 2 steps are 0.1587 and 0.9332, respectively. (vi) F is the price at t = 1; G is the price at t = 2. Calculate G - F. (A) 1 (B) 2 (C) 3 (D) 4 (E) 5 6.10 (3 points) Continuing the simulation in the previous question, the random numbers, from the uniform distribution on [0,1], for the next 2 steps are 0.6554 and 0.0359, respectively. Calculate the difference between the simulated price at t = 4 and the simulated price at t = 3. A. less than -1 B. at least -1 but less than 0 C. at least 0 but less than 1 D. at least 1 but less than 2 E. at least 2
Solutions to Problems:
6.1. C. Φ(-0.8) = 0.2119. The Arithmetic Brownian Motion at time 5 is Normal with mean zero and standard deviation 7√5. Therefore X(5) = (-0.8)(7)√5 = -12.52.
6.2. B. Φ(0.4) = 0.6554. Therefore X(15) = X(5) + (0.4)(7)√(15 - 5) = -12.52 + 8.85 = -3.67.
6.3. A. One can simulate the corresponding Brownian Motion without drift, X(t), and then add µt at the end in order to get the Brownian Motion with drift. X(1) = (-1.1160)(.22)√1 = -.2455. X(5) = X(1) + (.2761)(.22)√(5 - 1) = -.1240. X(10) = X(5) + (1.498)(.22)√(10 - 5) = .6129. X(25) = X(10) + (0.6966)(.22)√(25 - 10) = 1.2064. Thus, the simulated value of the Brownian Motion with drift at time = 25 is: 1.2064 + (25)(.06) = 2.7064. Comment: At t = 10, the simulated value of the Brownian Motion with drift is: .6129 + (10)(.06) = 1.2129.
6.4. B. Φ(1.8) = .9641. A simulated Normal with µ = .08 and σ = 0.36 is: .08 + (1.8)(.36) = .728. The corresponding LogNormal is: e^.728 = 2.0709. Therefore the price of the stock at time 1 is: (80)(2.0709) = 165.67.
6.5. D. Φ(-2.2) = .0139. A simulated Normal with µ = .08 and σ = 0.36 is: .08 + (-2.2)(.36) = -0.712. The corresponding LogNormal is: e^-.712 = .49066. Therefore X(2) = X(1)(.49066) = (165.67)(.49066) = 81.29.
6.6. A. Φ(0.3) = .6179. A simulated Normal with µ = .08 and σ = 0.36 is: .08 + (.3)(.36) = .188. The corresponding LogNormal is: e^.188 = 1.20683. Therefore X(3) = X(2)(1.20683) = (81.29)(1.20683) = 98.10. Comment: A Geometric Brownian Motion.
6.7. A. A Standard Brownian Motion has σ = 1 and µ = 0. X(1), X(2) - X(1), X(3) - X(2), X(4) - X(3), and X(5) - X(4), are independent Normals each with mean 0 and standard deviation 1. Φ(-1.81) = .0351. Φ(-.38) = .3520. Φ(1.15) = .8749. Φ(-1.22) = .1112. Φ(0.48) = .6844. Therefore, X(1) = -1.81. X(2) = X(1) - .38 = -2.19. X(3) = X(2) + 1.15 = -1.04. X(4) = X(3) - 1.22 = -2.26. X(5) = X(4) + 0.48 = -1.78.
6.8. C. From the previous solution, Standard Brownian Motion: -1.78 @5. Brownian Motion without drift: -1.78σ = (-1.78)(.1) = -.178 @5. Brownian Motion with drift: -.178 + 5µ = -.178 + (5)(.2) = .822 @5. Geometric Brownian Motion with drift: e^.822 = 2.275 @5.
6.9. D. µ = 0.01 and σ = 0.02. First simulate the Brownian Motion without drift, by simulating a Normal variable. Then add the drift. Then exponentiate to get the Geometric Brownian Motion. Then multiply by 100 in order to get the stock prices. The Brownian Motion without drift at time 1 is Normally Distributed with mean zero, and standard deviation: σ√1 = 0.02√1 = 0.02. Φ(-1) = .1587. Therefore, the corresponding simulated Normal is: (-1)(.02) = -.02. Given the Brownian Motion without drift at time = 1 is -.02, the Brownian Motion without drift at time = 2 is Normally Distributed with mean -.02 and standard deviation: .02√(2 - 1) = .02. Φ(1.5) = .9332. The corresponding simulated Normal is: -.02 + (1.5)(.02) = .01. Brownian Motion without drift: 0 @0, -.02 @1, .01 @2. Brownian Motion with drift: 0 @0, -.02 + µ = -.01 @1, .01 + 2µ = .03 @2. Geometric Brownian Motion with drift: e^0 = 1 @0, e^-.01 = .9900 @1, e^.03 = 1.0305 @2. Simulated Stock Price: 100 @0, 99.00 @1, 103.05 @2. The difference in the two stock prices is: 103.05 - 99.00 = 4.05. Comment: Professor Klugman rewrote this exam question as Exercise 21.8 in Loss Models. Note that the price at time 2 depends on the price at time 1. One must first simulate what happens at time 1 and then what happens between times 1 and 2. One could get the price @2 by noting that the difference of the log stock price at 2 and the log stock price at 1 is the difference in the Brownian Motion with drift: .03 - (-.01) = .04. ⇒ The ratio of stock price @2 to stock price @1 is: e^.04 = 1.0408. ⇒ Stock Price @2: (99.00)(1.0408) = 103.04, the same answer subject to rounding.
6.10. A. µ = 0.01 and σ = 0.02. From the previous solution, the Brownian Motion without drift at time = 2 is 0.01.
Therefore, the Brownian Motion without drift at time = 3 is Normally Distributed with mean 0.01 and standard deviation: 0.02√(3 - 2) = 0.02. Φ(0.4) = 0.6554. The corresponding simulated Normal is: 0.01 + (0.4)(0.02) = 0.018.
Given the Brownian Motion without drift at time = 3 is 0.018, the Brownian Motion without drift at time = 4 is Normally Distributed with mean 0.018 and standard deviation: 0.02√(4 - 3) = 0.02. Φ(-1.8) = 0.0359. The corresponding simulated Normal is: 0.018 + (-1.8)(0.02) = -0.018.
Brownian Motion without drift: 0.018 @3, -0.018 @4.
Brownian Motion with drift: 0.018 + 3µ = 0.048 @3, -0.018 + 4µ = 0.022 @4.
Geometric Brownian Motion with drift: e^0.048 = 1.0492 @3, e^0.022 = 1.0222 @4.
Simulated Stock Price: 104.92 @3, 102.22 @4. The difference in the two stock prices is: 102.22 - 104.92 = -2.70.
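Here is a minimal Python sketch (illustrative only; the function name is mine) of the stock-price procedure described in 6.9 and 6.10: simulate the Brownian Motion without drift, add the drift, exponentiate, and multiply by the initial price.

```python
import math

def simulate_stock_prices(z_values, mu, sigma, s0):
    """Geometric Brownian Motion: stock prices at times 1, 2, ...,
    driven by one standard Normal draw per unit-time increment."""
    x = 0.0                       # Brownian Motion without drift
    prices = []
    for t, z in enumerate(z_values, start=1):
        x += z * sigma            # increment over unit time has standard deviation sigma
        prices.append(s0 * math.exp(x + mu * t))   # add drift, exponentiate, scale by s0
    return prices

# Problem 6.9: mu = 0.01, sigma = 0.02, initial price 100, z-values -1 and 1.5:
p1, p2 = simulate_stock_prices([-1.0, 1.5], 0.01, 0.02, 100)
print(round(p2 - p1, 2))   # difference of the prices at times 2 and 1: about 4.0, matching 6.9 up to rounding
```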
Section 7, Simulating Lifetimes
One can also apply the inversion method to a life table, in order to simulate times of death and the present values of benefits paid for life insurances and annuities.41
For life contingencies the following are all the same:
small random numbers ⇔ early deaths.
large random numbers ⇔ large lifetimes.
Setting u = F(x).
Setting 1 - u = S(x) = (number still alive) / (number originally alive).
⇒ number still alive = (1 - u)(number alive at starting age).
Exercise: One is simulating future lifetimes for a person age 70 using the Illustrative Life Table from Actuarial Mathematics,42 where small random numbers correspond to early deaths. For a random number of 0.10, when does this simulated person die?
[Solution: l70 = 6,616,155. (1 - u)l70 = (0.9)(6,616,155) = 5,954,540. l72 = 6,164,663 > 5,954,540 > 5,920,394 = l73. Therefore, the person dies between age 72 and age 73.
Comment: Note that a small random number did result in an early simulated death.]
Unless specifically stated otherwise, on your exam set F(x) = u. Simulation of lifetimes can also be done the other way around, by setting u = S(x). Check with a small random number, such as u = 0.01, whether your result matches the statement in the question.
Exercise: One is simulating future lifetimes for a person age 70 using the Illustrative Life Table, where small random numbers correspond to late deaths. For a random number of 0.10, when does this simulated person die?
[Solution: l70 = 6,616,155. u l70 = (0.10)(6,616,155) = 661,616. l92 = 682,707 > 661,616 > 530,959 = l93. Therefore, the person dies between age 92 and age 93.
Comment: Note that a small random number did result in a late simulated death.]
41 I expect a lower average frequency of questions on simulating lifetimes, than when simulation was on the same exam as life contingencies on the old Exam 3.
42 This table is not attached to your exam. An excerpt from this table is given with the problems for this section.
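A minimal Python sketch of this life-table inversion may help; the function and variable names are illustrative, and the table below is just the fragment of the Illustrative Life Table around age 70 used in the exercise above.

```python
def simulate_curtate_age_at_death(l, start_age, u):
    """Inversion with small random numbers corresponding to early deaths:
    find the last age x with l[x] > (1 - u) * l[start_age].
    l is a dict mapping age -> number of lives."""
    survivors = (1.0 - u) * l[start_age]
    age = start_age
    while (age + 1) in l and l[age + 1] > survivors:
        age += 1
    return age   # death occurs between age and age + 1

# Fragment of the Illustrative Life Table around age 70:
l = {70: 6616155, 71: 6396609, 72: 6164663, 73: 5920394, 74: 5664051}
print(simulate_curtate_age_at_death(l, 70, 0.10))   # 72: dies between ages 72 and 73
```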
Annuity Example:43
Fred age (65) and his wife Ethel age (60), two independent lives, purchase a special annual whole life annuity-due. The benefit is 30,000 while both are alive and 20,000 while only one is alive. Mortality follows the Illustrative Life Table.44 i = 0.06.
Here is how we would simulate the actuarial present value of this annuity. First one needs two random numbers from (0, 1), for example: 0.668 and 0.222.
Exercise: Simulate the death of Fred using the inversion method, where small random numbers correspond to early deaths, using the random number 0.668.
[Solution: (1-u)l65 = (1 - 0.668)(7,533,964) = 2,501,276. l84 = 2,660,734 > 2,501,276 > 2,358,246 = l85. Fred dies between age 84 and 85.]
Exercise: Simulate the death of Ethel using the inversion method, where small random numbers correspond to early deaths, using the random number 0.222.
[Solution: (1-u)l60 = (1 - 0.222)(8,188,074) = 6,370,322. l71 = 6,396,609 > 6,370,322 > 6,164,663 = l72. Ethel dies between age 71 and 72.]
Fred lives 84 - 65 = 19 complete years and Ethel lives 71 - 60 = 11 complete years.
time:     0 .... 11   12 .... 19   20
payment: 30     30   20      20    0
There are 12 payments of 30,000, followed by 8 payments of 20,000.
This is a sum of two certain annuity-dues, one with 20 payments of 20,000 and another with 12 payments of 10,000. The interest rate is 6%, so v = 1/1.06.
The present value is: (20000)(1 - v^20)/(1-v) + (10000)(1 - v^12)/(1-v) = (20000)(12.1581) + (10000)(8.8869) = 332,031.
I ran a simulation of this situation 10,000 times. The minimum present value was 30,000 and the maximum was 476,349. The present value had mean of 337,375 and sample standard deviation of 70,351.
43 I would be extremely surprised to see this type of thing on your exam.
44 An excerpt from this table is given with the problems for this section.
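Here is a minimal Python sketch, for illustration only, of how one trial of the Fred and Ethel simulation could be coded once the complete future lifetimes have been simulated from the life table; the helper names are mine.

```python
def annuity_due_pv(n, payment, v):
    """Present value of a certain annuity-due with n payments of the given amount."""
    return payment * (1 - v**n) / (1 - v)

def simulate_trial(complete_years_fred, complete_years_ethel, v=1/1.06):
    """PV of the special annuity-due: 30,000 while both are alive, 20,000 while one is alive.
    Inputs are the simulated complete (curtate) future lifetimes."""
    both = min(complete_years_fred, complete_years_ethel) + 1    # annuity-due payments while both alive
    total = max(complete_years_fred, complete_years_ethel) + 1   # payments while at least one is alive
    # 20,000 for every payment, plus an extra 10,000 while both are alive:
    return annuity_due_pv(total, 20000, v) + annuity_due_pv(both, 10000, v)

# The trial worked above: Fred lives 19 complete years, Ethel lives 11:
print(round(simulate_trial(19, 11)))   # about 332,031
```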
Here is a graph of the survival function of the present value:
[Graph: the survival function declines from 1 toward 0 as the present value increases from 0 to about 500,000.]
For example, Prob[PV > 300,000] = 73.75% and Prob[PV > 400,000] = 19.16%. De Moivreʼs Law:45 For De Moivreʼs Law, the survival function is uniform: S(t) = 1 - t/ω, 0 < t < ω. For a life aged x, the age of death is uniform from x to ω. Therefore, one can simulate the age of death as: x + u(ω-x).46 Exercise: John is age 25. Mortality follows De Moivreʼs law with ω = 80. Simulate Johnʼs age at death, using the inversion method, where small random numbers correspond to early deaths, and using the random number 0.615. [Solution: Age of death = 25 + (0.615)(80-25) = 58.8.]
45 See page 78 of Actuarial Mathematics.
46 As discussed in a previous section, in order to simulate a random number uniformly distributed on [a, b], one takes: a + (b-a)u, where u is a random number from [0, 1].
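A minimal Python sketch (illustrative names) of simulating ages at death under De Moivreʼs law, and of taking the smaller of two simulated future lifetimes for a joint-life status, as is done in several of the problems below.

```python
def de_moivre_age_at_death(x, omega, u):
    """Under De Moivre's law the age at death of a life aged x is uniform on (x, omega),
    so the inversion method (small u <=> early death) gives x + u*(omega - x)."""
    return x + u * (omega - x)

def joint_life_future_lifetime(x, y, omega, u1, u2):
    """Future lifetime of the joint-life status: the smaller of the two future lifetimes."""
    tx = de_moivre_age_at_death(x, omega, u1) - x
    ty = de_moivre_age_at_death(y, omega, u2) - y
    return min(tx, ty)

print(de_moivre_age_at_death(25, 80, 0.615))                   # 58.825, about 58.8 as in the exercise above
print(joint_life_future_lifetime(65, 60, 90, 0.561, 0.432))    # 12.96, as in problem 7.5 below
```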
Gompertzʼs Law:47
For Gompertzʼs Law, the survival function is: S(t) = exp[-m(c^t - 1)].
For a life age x, one can simulate the age at death, y, via the inversion method. If small random numbers correspond to early deaths, then we set: 1 - u = ly/lx.
1 - u = exp[-m(c^y - 1)] / exp[-m(c^x - 1)] = exp[-m(c^y - c^x)]. ln(1 - u) = -m(c^y - c^x). c^y = c^x - ln(1 - u)/m.
y = ln[c^x - ln(1 - u)/m] / ln(c).
Exercise: Mortality follows Gompertzʼs Law with m = 0.0008 and c = 1.09. For a life aged 50, use the random number 0.18 to simulate the age of death using the inversion method, where small random numbers correspond to early deaths.
[Solution: y = ln[c^x - ln(1 - u)/m] / ln(c) = ln[1.09^50 - ln(1 - 0.18)/0.0008] / ln(1.09) = 67.0.
Check: S(67.0)/S(50) = exp[-0.0008(1.09^67.0 - 1)] / exp[-0.0008(1.09^50 - 1)] = 0.77365/0.94300 = 0.820 = 1 - 0.18.
Comment: If u = 0.01, then y = ln(1.09^50 - ln(1 - 0.01)/0.0008)/ln(1.09) = 51.8. A small random number does correspond to an early death.]
Makehamʼs Law, S(t) = exp[-At - m(c^t - 1)], cannot be algebraically inverted as can Gompertzʼs Law.
47 See page 78 of Actuarial Mathematics.
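A minimal Python sketch of the algebraic inversion for Gompertzʼs Law just derived; the parameters are those of the exercise above, and the function name is illustrative.

```python
import math

def gompertz_age_at_death(x, m, c, u):
    """Inversion method for Gompertz's Law, small random numbers <=> early deaths:
    set 1 - u = S(y)/S(x) = exp[-m(c**y - c**x)] and solve for y."""
    return math.log(c**x - math.log(1.0 - u) / m) / math.log(c)

print(round(gompertz_age_at_death(50, 0.0008, 1.09, 0.18), 1))   # 67.0, as in the exercise
print(round(gompertz_age_at_death(50, 0.0008, 1.09, 0.01), 1))   # 51.8: a small u gives an early death
```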
Problems:
In answering the questions in this section, you may use the following values from the Illustrative Life Table in Actuarial Mathematics:
x:  0 5 10 15 20 21 22 23
lx: 10,000,000 9,749,503 9,705,588 9,663,731 9,617,802 9,607,896 9,597,695 9,587,169
x:  24 25 26 27 28 29 30 31
lx: 9,576,288 9,565,017 9,553,319 9,541,153 9,528,475 9,515,235 9,501,381 9,486,854
x:  32 33 34 35 36 37 38 39
lx: 9,471,591 9,455,522 9,438,571 9,420,657 9,401,688 9,381,566 9,360,184 9,337,427
x:  40 41 42 43 44 45 46 47
lx: 9,313,166 9,287,264 9,259,571 9,229,925 9,198,149 9,164,051 9,127,426 9,088,049
x:  48 49 50 51 52 53 54 55
lx: 9,045,679 9,000,057 8,950,901 8,897,913 8,840,770 8,779,128 8,712,621 8,640,861
x:  56 57 58 59 60 61 62 63
lx: 8,563,435 8,479,908 8,389,826 8,292,713 8,188,074 8,075,403 7,954,179 7,823,879
x:  64 65 66 67 68 69 70 71
lx: 7,683,979 7,533,964 7,373,338 7,201,635 7,018,432 6,823,367 6,616,155 6,396,609
x:  72 73 74 75 76 77 78 79
lx: 6,164,663 5,920,394 5,664,051 5,396,081 5,117,152 4,828,182 4,530,360 4,225,163
x:  80 81 82 83 84 85 86 87
lx: 3,914,365 3,600,038 3,284,542 2,970,496 2,660,734 2,358,246 2,066,090 1,787,299
x:  88 89 90 91 92 93 94 95 96
lx: 1,524,758 1,281,083 1,058,491 858,676 682,707 530,959 403,072 297,981 213,977
x:  97 98 99 100 101 102 103 104 105 106 107 108 109 110
lx: 148,832 99,965 64,617 40,049 23,705 13,339 7,101 3,558 1,668 727 292 108 36 11
7.1 (1 point) Mortality follows the Illustrative Life Table in Actuarial Mathematics. For a life aged 62, simulate the age of death using the inversion method, where small random numbers correspond to early deaths. Using the random number 0.86, what is the age at death?
(A) 75 (B) 80 (C) 85 (D) 90 (E) 95
7.2 (1 point) Mortality follows De Moivreʼs law with ω = 100. Simulate the age at death of a life aged 40, with small random numbers corresponding to early deaths and using the random number 0.316. What is the simulated age at death?
A. Less than 60
B. At least 60, but less than 65
C. At least 65, but less than 70
D. At least 70, but less than 75
E. At least 75
7.3 (3 points) Mortality follows Gompertzʼs Law with m = 0.0007 and c = 1.1. For a life aged 55, simulate the age of death using the inversion method, where small random numbers correspond to early deaths. Using the random number 0.23, what is the age at death?
Hint: For Gompertzʼs Law, S(x) = exp[-m(c^x - 1)].
A. Less than 68
B. At least 68, but less than 70
C. At least 70, but less than 72
D. At least 72, but less than 74
E. At least 74
7.4 (1 point) Lifetimes follow a Weibull Distribution with τ = 6 and θ = 80. Using the inversion method (where small random numbers correspond to early deaths), you simulate the future lifetime of Samson who is age 60. A random number from the uniform distribution on [0,1] is 0.2368. What is the simulated future lifetime for Samson? A. 7 B. 8 C. 9 D. 10 E. 11 7.5 (2 points) Mortality follows De Moivreʼs law with ω = 90. Rob is aged 65. Laura is aged 60. Simulate their future lifetimes, with small random numbers corresponding to early deaths. Use the random number 0.561 for Rob and the random number 0.432 for Laura. What is the simulated future lifetime of the joint-life status of Rob and Laura? (A) 11 (B) 12 (C) 13 (D) 14 (E) 15
7.6 (3 points) You are simulating one year of death and surrender benefits for 4 policies. Mortality follows the Illustrative Life Table. The surrender rate, occurring at the end of the year, is 12% for all ages. The simulation procedure is the inverse transform algorithm, with low random numbers corresponding to the decrement occurring. You perform the following steps for each policy:
(1) Simulate if the policy is terminated by death. If not, go to Step 2; if yes, continue with the next policy.
(2) Simulate if the policy is terminated by surrender.
Policy #   Age   Death Benefit   Surrender Benefit
1          95    500             450
2          89    400             350
3          82    300             250
4          75    200             100
The following values are successively generated from the uniform distribution on [0, 1]: 0.13, 0.25, 0.61, 0.44, 0.08, 0.52, 0.93, 0.74, 0.36, 0.47,….
Calculate the total benefits generated by the simulation.
(A) 650 or less (B) 700 (C) 750 (D) 800 (E) 850 or more
7.7 (3 points) You are given the following:
• You are using simulation to study the present value of a special annual whole life annuity-immediate on independent lives (57) and (52). • The benefit is 10,000 while both are alive and 7000 while only one is alive.
• Mortality follows the Illustrative Life Table.
• i = 0.05
• You are using the inversion method, where small random numbers correspond to early deaths.
• In your first trial, your random numbers from the uniform distribution on [0,1] are: 0.815 and 0.341.
In this first trial, what is the simulated value of the present value of the benefits?
A. Less than 130,000
B. At least 130,000, but less than 135,000
C. At least 135,000, but less than 140,000
D. At least 140,000, but less than 145,000
E. At least 145,000
7.8 (2 points) Lifetimes follow a Weibull Distribution with τ = 6 and θ = 80. Using the inversion method (where small random numbers correspond to early deaths), you simulate the future lifetimes of Samson who is age 60 and Delilah who is age 55. Random numbers from the uniform distribution on [0,1] are: 0.7836 and 0.4435. What is the simulated future lifetime of the joint-life status of Samson and Delilah?
A. 20 B. 22 C. 24 D. 26 E. 28
7.9 (2 points) You are given the following Life Table for dogs:
Age   Number of Lives   Age   Number of Lives   Age   Number of Lives
0     1,000,000         7     190,135           14    2,298
1     860,708           8     127,579           15    795
2     724,336           9     80,631            16    235
3     594,001           10    47,570            17    58
4     472,836           11    25,929            18    11
5     363,725           12    12,904            19    2
6     268,995           13    5,738             20    0
You use the inversion method, with large random numbers corresponding to long lives, to simulate random ages at death for dogs. Assume for simplicity that dogs only die at positive integral ages. The first four independent random numbers uniform on the interval (0,1) are: 0.9610, 0.8936, 0.3712, and 0.1348. What is the sum of the corresponding four simulated ages at death?
A. 23 or less B. 24 C. 25 D. 26 E. 27 or more
7.10 (3 points) You are given the following:
• You are using simulation to study the present value of a special annual whole life annuity-immediate on independent lives Sarek who is a Vulcan and age 120, and Amanda who is a human and age 65. • The benefit is 600 while both are alive and 400 while only one is alive.
• Lifetimes of Vulcans follow a Weibull Distribution with τ = 5 and θ = 150. • Lifetimes of humans follow a Weibull Distribution with τ = 6 and θ = 80. • i = 0.04. • Use the inversion method, where small random numbers correspond to early deaths. • In your first trial, your random numbers from the uniform distribution on [0,1] are: 0.273 and 0.706. In this first trial, what is the simulated present value of the benefits? A. 7,000 B. 8,000 C. 9,000 D. 10,000 E. 11,000 7.11 (2 points) Mortality follows the Illustrative Life Table. Humphrey is aged 51. Lauren is aged 26. Simulate their future lifetimes, with small random numbers corresponding to early deaths. Use the random number 0.851 for Humphrey and the random number 0.243 for Lauren. What is the simulated future lifetime of the joint-life status of Humphrey and Lauren? A. Less than 25 B. At least 25, but less than 30 C. At least 30, but less than 35 D. At least 35, but less than 40 E. At least 40
Use the following information for the next two questions: You are simulating L, the loss-at-issue random variable for a fully continuous whole life insurance of 1000 on a life age (25). The policy has a double indemnity provision which provides that an additional benefit of 1000 will be paid if death is by accidental means. The contract premium is 6. Mortality follows the illustrative life table. Assume a constant force of mortality within each year of age. 0.0004 is the constant force of decrement due to death by accidental means.
δ = 0.06. 7.12 (4 points) Your random number for simulating the time of death is 0.873, where low random numbers correspond to long times until death. Your random number for simulating the cause of death is 0.912, where high random numbers correspond to deaths by accidental means. Calculate the simulated value of L. A. Less than 0 B. At least 0, but less than 100 C. At least 100, but less than 500 D. At least 500, but less than 1000 E. At least 1000 7.13 (4 points) Your random number for simulating the time of death is 0.989, where low random numbers correspond to long times until death. Your random number for simulating the cause of death is 0.787, where high random numbers correspond to deaths by accidental means. Calculate the simulated value of L. A. Less than 0 B. At least 0, but less than 100 C. At least 100, but less than 500 D. At least 500, but less than 1000 E. At least 1000
7.14 (3 points) You are given the following: • You are using simulation to study the present value of a special annual whole life annuity-due on independent lives (65) and (62). • The benefit is 20,000 while both are alive and 13,000 while only one is alive.
• Mortality follows De Moivreʼs law with ω = 100.
• i = 0.04
• You are using the Inverse Transform Method, where small random numbers correspond to early deaths.
• In your first trial, your random numbers from the uniform distribution on [0,1] are: 0.711 and 0.320.
In this first trial, what is the simulated value of the present value of the benefits?
A. less than 290,000
B. at least 290,000 but less than 295,000
C. at least 295,000 but less than 300,000
D. at least 300,000 but less than 305,000
E. at least 305,000
7.15 (3 points) You are simulating one year of death and surrender benefits for 3 policies. Mortality follows the Illustrative Life Table: q50 = 0.00592, q60 = 0.01376, q70 = 0.03318. The surrender rate, occurring at the end of the year, varies by age. The simulation procedure is the inverse transform algorithm, with low random numbers corresponding to the decrement occurring. You perform the following steps for each policy:
(1) Simulate if the policy is terminated by death. If not, go to Step 2; if yes, continue with the next policy.
(2) Simulate if the policy is terminated by surrender.
The following values are successively generated from the uniform distribution on [0, 1]: 0.754, 0.413, 0.249, 0.192, 0.005, 0.987, 0.526, 0.644, 0.795, 0.684.
You are given:
Policy #   Age   Death Benefit   Surrender Benefit   Surrender Rate
1          70    10              8                   10%
2          60    20              12                  15%
3          50    15              6                   20%
Calculate the total benefits generated by the simulation. (A) 10 or less (B) 12 (C) 14 (D) 15
(E) 16 or more
7.16 (2 points) Mortality follows the Illustrative Life Table. There are two independent lives (61) and (64). Simulate their future lifetimes, with small random numbers corresponding to early deaths. Use the random number 0.371 for the first life and the random number 0.403 for the second life. What is the simulated future lifetime of their joint-life status?
A. Less than 10
B. At least 10, but less than 15
C. At least 15, but less than 20
D. At least 20, but less than 25
E. At least 25
7.17 (3, 5/00, Q.22) (2.5 points) For a special annual whole life annuity-due on independent lives (30) and (50):
(i) Y is the present-value random variable.
(ii) The benefit is 1000 while both are alive and 500 while only one is alive.
(iii) Mortality follows the Illustrative Life Table.
(iv) i = 0.06
(v) You are doing a simulation of K(30) and K(50) to study the distribution of Y, using the inversion method (where small random numbers correspond to early deaths).
(vi) In your first trial, your random numbers from the uniform distribution on [0,1] are 0.63 and 0.40 for generating K(30) and K(50) respectively.
(vii) F is the simulated value of Y in this first trial.
Calculate F.
(A) 15,150 (B) 15,300 (C) 15,450 (D) 15,600 (E) 15,750
7.18 (3, 5/01, Q.12) (2.5 points) You are simulating one year of death and surrender benefits for 3 policies. Mortality follows the Illustrative Life Table. The surrender rate, occurring at the end of the year, is 15% for all ages. The simulation procedure is the inverse transform algorithm, with low random numbers corresponding to the decrement occurring. You perform the following steps for each policy:
(1) Simulate if the policy is terminated by death. If not, go to Step 2; if yes, continue with the next policy.
(2) Simulate if the policy is terminated by surrender.
The following values are successively generated from the uniform distribution on [0, 1]: 0.3, 0.5, 0.1, 0.4, 0.8, 0.2, 0.3, 0.4, 0.6, 0.7,….
You are given:
Policy #   Age   Death Benefit   Surrender Benefit
1          100   10              10
2          91    25              20
3          96    20              15
Calculate the total benefits generated by the simulation. (A) 30 (B) 35 (C) 40 (D) 45 (E) 50
7.19 (1 point) Two independent, identical lives (x) and (y) each have a 70% chance of a future lifetime of 10 years and a 30% chance of a future lifetime of 20 years. What is the expected value of T(xy), the future lifetime of their joint-life status? (A) 11 (B) 12 (C) 13 (D) 14 (E) 15 7.20 (2 points) Two independent, identical lives (x) and (y) each have future lifetime uniformly distributed over [0, 40]. What is the expected value of T(xy), the future lifetime of their joint-life status? (A) 11 (B) 12 (C) 13 (D) 14 (E) 15 7.21 (3, 11/01, Q.32) (2.5 points) An actuary is evaluating two methods for simulating T(xy), the future lifetime of the joint-life status of independent lives (x) and (y): (i) Mortality for (x) and (y) follows De Moivreʼs law with ω = 100. (ii) 0 < x ≤ y < 100 (iii) Both methods select random numbers R1 and R2 independently from the uniform distribution on [0, 1]. (iv) Method 1 sets: (a) T(x) = (100 - x)R1 (b) T(y) = (100 - y)R2 (c) T(xy) = smaller of T(x) and T(y) (v) Method 2 first determines which lifetime is shorter: (a) If R1 ≤ 0.50, it chooses that (x) is the first to die, and sets T(xy) = T(x) = (100 - x)R2 (b) If R1 > 0.50, it chooses that (y) is the first to die, and sets T(xy) = T(y) = (100 - y)R2 Which of the following is correct? (A) Method 1 is valid for x = y but not for x < y; Method 2 is never valid. (B) Method 1 is valid for x = y but not for x < y; Method 2 is valid for x = y but not for x < y. (C) Method 1 is valid for x = y but not for x < y; Method 2 is valid for all x and y. (D) Method 1 is valid for all x and y; Method 2 is never valid. (E) Method 1 is valid for all x and y; Method 2 is valid for x = y but not for x < y.
7.22 (3, 11/02, Q.19) (2.5 points) For a fully discrete whole life insurance of 1000 on Glenn:
(i) Glenn is now age 80. The insurance was issued 30 years ago, at a contract premium of 20 per year.
(ii) Mortality follows the Illustrative Life Table.
(iii) i = 0.06
(iv) You are simulating 30L, the prospective loss random variable at time 30 for this insurance based on the contract premium. You are using the inverse transform method, where low random numbers correspond to early deaths (soon after age 80).
(v) Your first random number from the uniform distribution on [0, 1] is 0.42.
Calculate your first simulated value of 30L.
(A) 532 (B) 555 (C) 578 (D) 601 (E) 624
7.23 (SOA3, 11/03, Q.6) (2.5 points) You are simulating L, the loss-at-issue random variable for a fully continuous whole life insurance of 1 on (x). The policy has a double indemnity provision which provides that an additional benefit of 1 will be paid if death is by accidental means.
(i) The contract premium is 0.025.
(ii) µx(τ)(t) = 0.02, t > 0.
(iii) µx(adb)(t) = 0.005, t > 0, is the force of decrement due to death by accidental means.
(iv) δ = 0.05.
(v) Your random number for simulating the time of death is 0.350, where low random numbers correspond to long times until death.
(vi) Your random number for simulating the cause of death is 0.775, where high random numbers correspond to deaths by accidental means.
Calculate the simulated value of L.
(A) -0.391 (B) -0.367 (C) -0.341 (D) -0.319 (E) -0.297
Solutions to Problems:
7.1. D. Set 1 - u = lx/l62. lx = (1 - 0.86)(7,954,179) = 1,113,585. l89 = 1,281,083 > 1,113,585 > 1,058,491 = l90. So the age of death is between 89 and 90.
7.2. A. The future lifetime is uniform from 0 to 100 - 40 = 60. Therefore the future lifetime is: (60)(0.316) = 18.96. Age at death is: 40 + 18.96 = 58.96.
7.3. A. Set 1 - u = lx/l55 = exp[-0.0007(1.1^x - 1)] / exp[-0.0007(1.1^55 - 1)] = exp[0.13234 - 0.0007(1.1^x)].
⇒ ln(1 - u) = 0.13234 - 0.0007(1.1^x). ⇒ 1.1^x = (0.13234 - ln(1 - u))/0.0007 = (0.13234 - ln(1 - 0.23))/0.0007 = 562.44.
⇒ x = ln(562.44)/ln(1.1) = 66.44.
7.4. D. Samsonʼs age at death follows the Weibull Distribution truncated from below at 60.
1 - H(x) = S(x)/S(60) = exp[-(x/80)^6] / exp[-(60/80)^6] = 1.195 exp[-(x/80)^6]. Set 0.2368 = H(x) = 1 - 1.195 exp[-(x/80)^6].
⇒ (x/80)^6 = -ln(0.7632/1.195) = 0.4484. ⇒ x = 70.0. Future lifetime is: 70.0 - 60 = 10 years.
7.5. C. Robʼs future lifetime is uniform from 0 to 90 - 65 = 25. Therefore his future lifetime is: (25)(0.561) = 14.025.
Lauraʼs future lifetime is uniform from 0 to 90 - 60 = 30. Her future lifetime is: (30)(0.432) = 12.96.
Their joint-life status fails when Laura dies first, a future lifetime of 12.96.
7.6. C. The chance of death at age 95 is q95 = 0.28191. Since 0.13 ≤ 0.28191 there is death and a benefit of 500 paid on policy #1.
q89 = 0.17375 < 0.25, so there is not death. 0.12 < 0.61 so there is not surrender. No benefit is paid on policy #2, the first year.
q82 = 0.09561 < 0.44, so there is not death. 0.08 ≤ 0.12 so there is surrender and a benefit of 250 is paid on policy #3.
q75 = 0.05169 < 0.52, so there is not death. 0.12 < 0.93 so there is not surrender. No benefit is paid on policy #4, the first year.
The total benefit paid is: 500 + 0 + 250 + 0 = 750. Comment: Similar to 3, 5/01, Q.12.
7.7. E. Let u be the random number from [0, 1]. Then using the Method of Inversion (Table Lookup) we simulate by calculating (1-u)lx, and seeing where the number of lives fits in the table.
(0.185)l57 = (0.185)(8,479,908) = 1,568,783. l87 = 1,787,299 > 1,568,783 > 1,524,758 = l88; therefore, the life (57) lives until age 87 and dies before age 88.
(0.659)l52 = (0.659)(8,840,770) = 5,826,067. l73 = 5,920,394 > 5,826,067 > 5,664,051 = l74; therefore, the life (52) lives until age 73 and dies before age 74.
For the special annual whole life annuity-immediate, there are payments of 10,000 when the first life is: 58, 59, ..., 78, and the second life is: 53, 54, ..., 73. Then there are payments of 7,000 when the first life is: 79, 80, ..., 87.
Thus there are 21 payments of 10,000 while they are both alive, and 9 more payments of 7000 while the life (57) remains alive after the life (52) has died.
This is a sum of two certain annuities-immediate, one with 30 payments of 7000 and another with 21 payments of 3000. The interest rate is 5%, so v = 1/1.05.
The present value is: (7000)(1 - v^30)/i + (3000)(1 - v^21)/i = (7000)(15.3725) + (3000)(12.8212) = $146,070.
Comment: Similar to 3, 5/00, Q.22. In practical applications, it would be equally valid to simulate by calculating u lx; that is why this question specified that "small random numbers correspond to early deaths." (In my solution, if u were 0.01, then the person would die quickly.)
The present value of a certain annuity-immediate with n payments is: v(1 - v^n)/(1-v) = (1 - v^n)/i. (One can add up a geometric series.)
7.8. A. Samsonʼs age at death follows the Weibull Distribution truncated from below at 60.
1 - H(x) = S(x)/S(60) = exp[-(x/80)^6]/exp[-(60/80)^6] = 1.195 exp[-(x/80)^6]. Set 0.7836 = H(x) = 1 - 1.195 exp[-(x/80)^6].
⇒ (x/80)^6 = -ln(0.2164/1.195) = 1.7088. ⇒ x = 87.5. Future lifetime is: 87.5 - 60 = 27.5.
Delilahʼs age at death follows the Weibull Distribution truncated from below at 55.
1 - H(x) = S(x)/S(55) = exp[-(x/80)^6]/exp[-(55/80)^6] = 1.111 exp[-(x/80)^6]. Set 0.4435 = H(x) = 1 - 1.111 exp[-(x/80)^6].
⇒ (x/80)^6 = -ln(0.5565/1.111) = 0.6913. ⇒ x = 75.2. Future lifetime is: 75.2 - 55 = 20.2.
Simulated future lifetime of the joint-life status is: Min[27.5, 20.2] = 20.2 years.
7.9. B. For the first age at death we want the first age at which the distribution function F(x) > 0.9610. This occurs when the survival function S(x) = 1 - F(x) < 1 - 0.9610 = 0.0390; which occurs when there are fewer than (1 million)(0.0390) = 39,000 lives. This first occurs at age 11.
Similarly, the first age at which there are fewer than (1 million)(1 - 0.8936) = 106,400 lives is 9.
The first age at which there are fewer than (1 million)(1 - 0.3712) = 628,800 lives is 3.
The first age at which there are fewer than (1 million)(1 - 0.1348) = 865,200 lives is 1.
Thus the four ages at death are: 11, 9, 3, and 1, which sum to 24.
Comment: Any resemblance to a realistic life table for dogs is coincidental.
7.10. B. Sarekʼs age at death follows the Weibull Distribution truncated from below at 120.
1 - H(x) = S(x)/S(120) = exp[-(x/150)^5]/exp[-(120/150)^5] = 1.388 exp[-(x/150)^5]. Set 0.273 = H(x) = 1 - 1.388 exp[-(x/150)^5].
⇒ (x/150)^5 = -ln(0.727/1.388) = 0.647. ⇒ x = 137.5. Future lifetime is: 137.5 - 120 = 17.5.
Amandaʼs age at death follows the Weibull Distribution truncated from below at 65.
1 - H(x) = S(x)/S(65) = exp[-(x/80)^6]/exp[-(65/80)^6] = 1.333 exp[-(x/80)^6]. Set 0.706 = H(x) = 1 - 1.333 exp[-(x/80)^6].
⇒ (x/80)^6 = -ln(0.294/1.333) = 1.512. ⇒ x = 85.7. Future lifetime is: 85.7 - 65 = 20.7.
There are 17 payments of size 600 while they are both alive, the first one a year from now. There are 3 additional payments of 400 while only Amanda is alive. v = 1/1.04.
Present value of simulated benefits is the sum of 400 times an annuity-immediate of 20 payments plus 200 times an annuity-immediate of 17 payments:
400(1 - v^20)/i + 200(1 - v^17)/i = (400)(13.590) + (200)(12.166) = 7869.
7.11. D. Set 1 - u = lx/l51. lx = (1 - 0.851)(8,897,913) = 1,325,789. l88 = 1,524,758 > 1,325,789 > 1,281,083 = l89. So Humphrey lives between 37 and 38 more years.
Set 1 - u = lx/l26. lx = (1 - 0.243)(9,553,319) = 7,231,862. l66 = 7,373,338 > 7,231,862 > 7,201,635 = l67. So Lauren lives between 40 and 41 more years.
The joint-life status fails when Humphrey dies first in between 37 and 38 years.
7.12. B. Low random numbers correspond to long times until death ⇔ u l25 = # alive when dies. 0.873 l25 = (0.873)(9,565,017) = 8,350,260.
Since l58 = 8,389,826 > 8,350,260 > 8,292,713 = l59, death occurs between age 58 and 59.
For a constant force of mortality within each year of age, one can linearly interpolate on the logs of the survival function, in order to get a more exact age at death.
ln(8,389,826) = 15.9425. ln(8,350,260) = 15.9378. ln(8,292,713) = 15.9309.
Age of death is: 58 + (15.9425 - 15.9378)/(15.9425 - 15.9309) = 58.41.
q58 = 0.01158, so the force of mortality is -ln(1 - 0.01158) = 0.01165. Probability of death by accident is: 0.0004/0.01165 = 0.0343.
High random numbers correspond to deaths by accidental means, and 0.912 ≤ 1 - 0.0343 ⇒ there is not death by accident.
Present Value of Benefit Payment of 1000 is: 1000e^(-(33.41)(0.06)) = 134.71.
Present Value of Premiums = 6(PV of a continuous annuity) = 6(1 - e^(-(0.06)(33.41)))/0.06 = 86.53.
L = PV(losses) - PV(premiums) = 134.71 - 86.53 = 48.2.
Comment: Similar to SOA3, 11/03, Q.6. See Section 3.6 of Actuarial Mathematics.
7.13. E. Low random numbers correspond to long times until death ⇔ u l25 = # alive when dies. 0.989 l25 = (0.989)(9,565,017) = 9,459,802.
Since l32 = 9,471,591 > 9,459,802 > 9,455,522 = l33, death occurs between age 32 and 33.
For a constant force of mortality within each year of age, one can linearly interpolate on the logs of the survival function, in order to get a more exact age at death.
ln(9,471,591) = 16.06381. ln(9,459,802) = 16.06256. ln(9,455,522) = 16.06211.
Age of death is: 32 + (16.06381 - 16.06256)/(16.06381 - 16.06211) = 32.74.
q32 = 0.00170, so the force of mortality is -ln(1 - 0.00170) = 0.00170. Probability of death by accident is: 0.0004/0.00170 = 0.235.
High random numbers correspond to deaths by accidental means, and 0.787 > 1 - 0.235 ⇒ there is death by accident.
Present Value of Benefit Payment of 2000 is: 2000e^(-(7.74)(0.06)) = 1257.02.
Present Value of Premiums = 6(PV of a continuous annuity) = 6(1 - e^(-(0.06)(7.74)))/0.06 = 37.15.
L = PV(losses) - PV(premiums) = 1257.02 - 37.15 = 1220.
Comment: Similar to SOA3, 11/03, Q.6.
7.14. A. The first life dies at age: 65 + (0.711)(35) = 89.89. The second life dies at age: 62 + (0.320)(38) = 74.16.
The first life receives payments at its ages 65, 66, ..., 89; the second life is still alive for the payments at ages 65 through 77 of the first life (its own ages 62 through 74).
Thus there are 13 payments of 20,000 while they are both alive (including one immediately), and 12 more payments of 13,000 while the life (65) remains alive after the life (62) has died.
This is a sum of two certain annuity-dues, one with 25 payments of 13,000 and another with 13 payments of 7,000. The interest rate is 4%, so v = 1/1.04.
The present value is: (13000)(1 - v^25)/(1 - v) + (7000)(1 - v^13)/(1 - v) = (13000)(16.2470) + (7000)(10.3851) = $283,907.
Comment: Similar to 3, 5/00, Q.22.
7.15. D. The chance of death at age 70 is q70 = 0.03318. Since 0.754 > 0.03318 there is not death. 0.413 > 0.10 so there is not surrender on policy #1.
q60 = 0.01376 < 0.249, so there is not death. 0.192 > 0.15 so there is not surrender on policy #2.
Since 0.005 ≤ 0.00592 = q50, there is death. Death benefit of 15 is paid on policy #3, the first year.
The total benefit paid is: 0 + 0 + 15 = 15. Comment: Similar to 3, 5/01, Q.12.
7.16. B. Set 1 - u = lx/l61. lx = (1 - 0.371)(8,075,403) = 5,079,428. l76 = 5,117,152 > 5,079,428 > 4,828,182 = l77. So the first life lives between 15 and 16 more years.
Set 1 - u = lx/l64. lx = (1 - 0.403)(7,683,979) = 4,587,335. l77 = 4,828,182 > 4,587,335 > 4,530,360 = l78. So the second life lives between 13 and 14 more years.
The joint-life status fails when the life age 64 dies first, between 13 and 14 years from now.
7.17. B. Let u be the random number from [0, 1]. Then using the Method of Inversion (Table Lookup) we simulate the curtate future lifetime K(x), by calculating (1-u)lx, and seeing where the number of lives fits in the table.
(1 - 0.63)l30 = (0.37)(9,501,381) = 3,515,511. l81 = 3,600,038 > 3,515,511 > 3,284,542 = l82; therefore, the life (30) lives until age 81 and dies before age 82.
(1 - 0.40)l50 = (0.60)(8,950,901) = 5,370,541. l75 = 5,396,081 > 5,370,541 > 5,117,152 = l76; therefore, the life (50) lives until age 75 and dies before age 76.
Thus there are 26 payments of 1000 while they are both alive (including one immediately), and 26 more payments of 500 while the life (30) remains alive after the life (50) has died.
This is a sum of two certain annuity-dues, one with 52 payments of 500 and another with 26 payments of 500. The interest rate is 6%, so v = 1/1.06.
The present value is: (500)(1 - v^52)/(1-v) + (500)(1 - v^26)/(1-v) = (500)(16.8131) + (500)(13.7834) = $15,299.
Comment: This type of simulation could be easily modified to incorporate more lives or more complicated benefit payments. Simulation would be particularly useful in such complex situations.
In practical applications, it would be equally valid to simulate K(x) by calculating u lx; that is why this question specified that "small random numbers correspond to early deaths." (In my solution, if u were 0.01, then the person would die quickly.)
The present value of a certain annuity-due with n payments is: (1 - v^n)/(1-v) = (1 - v^n)/d. (If one does not remember the formula, one can add up a geometric series.)
7.18. A. The chance of death at age 100 is q100 = 0.40812. Since 0.3 ≤ 0.40812 there is death and a benefit of 10 paid on policy #1.
q91 = 0.20493 < 0.5, so there is not death. 0.1 ≤ 0.15 so there is surrender and a benefit of 20 is paid on policy #2.
q96 = 0.30445 < 0.4, so there is not death. 0.15 < 0.8 so there is not surrender. No benefit is paid on policy #3, the first year.
The total benefit paid is: 10 + 20 + 0 = 30.
7.19. A. T(xy) is the shorter future lifetime. Mean of T(xy) = (0.91)(10) + (0.09)(20) = 10.9.
T(x)   T(y)   Probability   T(xy)
10     10     49%           10
10     20     21%           10
20     10     21%           10
20     20     9%            20
Comment: E[T(x)] = E[T(y)] = (0.7)(10) + (0.3)(20) = 13. The future lifetime of the last-survivor status is the longer of the two future lifetimes; its mean is (0.49)(10) + (0.51)(20) = 15.1. (10.9 + 15.1)/2 = 13.
7.20. C. Survival Function of their joint-life status = Prob[T(xy) > t] = Prob[T(x) > t and T(y) > t] = Prob[T(x) > t] Prob[T(y) > t] = (1 - t/40)(1 - t/40) = 1 - t/20 + t^2/1600, 0 < t < 40.
E[T(xy)] = integral of its survival function = ∫ from 0 to 40 of (1 - t/20 + t^2/1600) dt = 40 - 40^2/40 + 40^3/4800 = 13.33 < 20.
Comment: The density of T(xy) is 1/20 - t/800, 0 < t < 40.
[Graph: this density declines linearly from 0.05 at t = 0 to 0 at t = 40.]
7.21. D. A joint-life status fails when either life fails. For De Moivreʼs law, the survival function is uniform. For a life aged x and ω = 100, the age of death is uniform from x to 100. Therefore, one can simulate the age of death as: x + u(100 - x). Then the future lifetime, T(x) = u(100 - x).
Therefore, Method 1 is a valid method of simulating the two independent future lifetimes. Then by definition of a joint-life status, T(xy) = smaller of T(x) and T(y). Thus, Method 1 is valid.
If the two ages are not equal, then there is not a 50% chance that each one will die first. Thus, if x ≠ y, Method 2 is not valid.
If x = y, then there is a 50% chance that either one would die first. However, the distribution of the smallest future lifetime is not uniform, rather it is given by an order statistic. Therefore, Method 2 is not valid.
Comment: The minimum of two identically distributed values has a smaller mean. The distribution of the smallest of two identical, independent variables is: 1 - S(t)^2.
For example, if we have two lives each aged 60, each has a survival function for future lifetimes of: 1 - t/40. Thus the distribution of the minimum of the two future lifetimes is: 1 - (1 - t/40)^2 = t/20 - t^2/1600, 0 < t < 40. The density is 1/20 - t/800, 0 < t < 40, with mean 40/3 = 13.33 < 20.
7.22. D. l80 = 3,914,365. Low random numbers correspond to early deaths, so # still alive = (1-u)(# alive at starting age) = (1 - 0.42)(3,914,365) = 2,270,332.
l85 = 2,358,246 > 2,270,332 > l86 = 2,066,090 ⇒ death between ages 85 and 86.
30L = PV(1000 paid in 6 years) - PV(6 annual premium payments of 20 each) =
1000/1.06^6 - (20)(1 + 1/1.06 + 1/1.06^2 + 1/1.06^3 + 1/1.06^4 + 1/1.06^5) = 705.0 - (20)(1 - 1/1.06^6)/(1 - 1/1.06) = 705.0 - (20)(5.212) = 600.8.
Comment: See Section 7.4 of Actuarial Mathematics. Fully discrete whole life insurance ⇔ annual premium payments and payment of the benefit at the end of the year of death.
7.23. D. With a constant force of mortality, the survival function is Exponential, S(t) = e^(-0.02t).
Low random numbers correspond to long times until death ⇔ u = S(t). u = e^(-0.02t). ⇒ t = -50 ln(u) = -50 ln(0.350) = 52.5.
Probability of death by accident is: 0.005/0.02 = 1/4 (constant over time).
High random numbers correspond to deaths by accidental means, and 0.775 > 0.75 ⇒ there is death by accident.
Present Value of Benefit Payment of 2 is: 2e^(-(52.5)(0.05)) = 0.1449.
Present Value of Premiums = 0.025(PV of a continuous annuity) = 0.025(1 - e^(-(0.05)(52.5)))/0.05 = 0.4639.
L = PV(losses) - PV(premiums) = 0.1449 - 0.4639 = -0.319.
Comment: Involves more Life Contingencies than simulation. In general, given that death occurred at time t, the probability of death by accident is: µx(adb)(t) / µx(τ)(t).
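A minimal Python sketch, with illustrative names, of the simulation in 7.23: simulate the time of death from the Exponential survival function (low random numbers correspond to long times, so u = S(t)), decide whether death is accidental, and take L = PV(benefit) - PV(premiums).

```python
import math

def simulate_L(u_time, u_cause, total_force=0.02, accident_force=0.005,
               premium=0.025, delta=0.05):
    """Loss at issue for a fully continuous whole life of 1 with double indemnity.
    Low u_time <=> long time until death, so u_time = S(t);
    high u_cause <=> death by accidental means."""
    t = -math.log(u_time) / total_force                 # simulated time of death
    accidental = u_cause > 1.0 - accident_force / total_force
    benefit = 2.0 if accidental else 1.0                 # double benefit if accidental
    pv_benefit = benefit * math.exp(-delta * t)
    pv_premiums = premium * (1.0 - math.exp(-delta * t)) / delta   # continuous annuity
    return pv_benefit - pv_premiums

print(round(simulate_L(0.350, 0.775), 3))   # about -0.319, answer D
```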
Section 8, Miscellaneous, Inversion Method
There are some special situations that will be discussed, with respect to using the inversion method.
Continuous Distributions that are constant on an interval:48
Take f(x) = 0.1 for 0 ≤ x ≤ 4, f(x) = 0 for 4 < x < 6, f(x) = 0.2 for 6 ≤ x ≤ 9.
Then F(x) = 0.1x for 0 ≤ x ≤ 4, F(x) = 0.4 for 4 ≤ x ≤ 6, F(x) = 0.4 + 0.2(x - 6) for 6 ≤ x ≤ 9.
If we have a random number u = 0.3, then since F(3) = 0.3, using the inversion method49 we get x = 3. If instead u = 0.6, then since F(7) = 0.6, using the inversion method we get x = 7.
However, what does one do if u = 0.4? F(4) = 0.4, F(5) = 0.4, F(5.1243) = 0.4, F(6) = 0.4. Since f is zero on the interval 4 to 6, F is constant on the interval 4 to 6, and the inversion method does not provide a unique answer.
Loss Models takes the largest possible value of x that would work.50 Applying this rule in this case, if u = 0.4, then take x = 6.
Some of you may find it convenient to think of this rule as an extension of the rule used when simulating discrete distributions. Find the smallest x, such that F(x) > u. In this case, F(6 + ε) > 0.4, so one would take x = 6 + ε ⇔ x = 6.51
Point Masses of Probability:52
Assume losses are Exponential with mean 2000. However, there is a 5000 policy limit, so payments are censored from above at 5000. Then at 5000 there is a point mass of probability: S(5000) = e^(-5000/2000) = 8.21%.
The distribution function of the payments is continuous below 5000, but then has a jump discontinuity at 5000. F(x) = 1 - e^(-x/2000), x < 5000. F(5000) = 1.
We can use the inversion method to simulate the payments. If u = 0.90, then using the method of inversion53 we get x = (-2000) ln(1 - 0.9) = 4605.
48 See Example 21.3 in Loss Models.
49 With large u corresponding to large x.
50 This is not the only possible rule that could be used.
51 Where ε is a very, very small positive number.
52 See Example 21.2 in Loss Models.
53 With large u corresponding to large x.
If however u = 0.95, we do not take x = (-2000) ln(1 - 0.95) = 5991 > 5000. Rather, if u = 0.95, we take x = 5000.
Some of you may find it convenient to think of this as an extension of the rule used when simulating discrete distributions. Find the smallest x, such that F(x) > u. In this case, F(5000) = 1 > 0.95.
Let F(x) = 0.1x for 0 ≤ x < 4, F(x) = 0.2x - 0.3 for 4 ≤ x < 5, F(x) = 0.1x + 0.3 for 5 ≤ x ≤ 7.
Note that F has jump discontinuities at x = 4 and x = 5. Here is a graph of F:
[Graph: F rises linearly from (0, 0) to (4, 0.4), jumps to (4, 0.5), rises linearly to (5, 0.7), jumps to (5, 0.8), and rises linearly to (7, 1).]
Exercise: Use the inversion method, with large random numbers corresponding to large simulated values, in order to simulate six random draws from this distribution. Use the following six random numbers from (0, 1): 0.91, 0.44, 0.68, 0.50, 0.02, 0.75. [Solution: F(5) = 0.8 and F(7) = 1, so for u = .91 we look at the last interval. F(6.1) = 0.91 ⇒ u = 0.91 ⇔ x = 6.1. F(4-) = 0.4 and F(4) = 0.5. F(x) first exceeds 0.44 at x = 4. ⇒ u = 0.44 ⇔ x = 4. For u = 0.68 we look at the second interval. F(4.9) = 0.68 ⇒ u = 0.68 ⇔ x = 4.9. F(4-) = 0.4 and F(4) = 0.5. F(x) first exceeds 0.5 at x = 4 + ε. ⇒ u = 0.5 ⇔ x = 4. For u = 0.02 we look at the first interval. F(0.2) = 0.02 ⇒ u = 0.02 ⇔ x = 0.2. F(5-) = 0.7 and F(5) = 0.8. F(x) first exceeds 0.75 at x = 5. ⇒ u = 0.75 ⇔ x = 5. Therefore, the six simulated values are: 6.1, 4, 4.9, 4, 0.2, and 5.]
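The convention used in this exercise - a jump point is returned when u falls inside the jump, and the largest x is taken where F is flat - can be coded for any piecewise linear F. A minimal sketch, assuming F is given as a list of (x, F(x)) points with a jump represented by two points at the same x:

```python
def invert_cdf(points, u):
    """points: (x, F(x)) pairs in nondecreasing order of x and F, with a jump at x
    represented by two consecutive pairs (x, F(x-)) and (x, F(x)).
    Takes the largest x consistent with F(x) = u where F is flat,
    and the jump point itself when u falls inside a jump."""
    result = points[-1][0]
    for (x0, f0), (x1, f1) in zip(points, points[1:]):
        if f0 <= u <= f1:
            if x1 == x0 or f1 == f0:     # jump, or a flat piece: take the right endpoint
                result = x1
            else:                        # strictly increasing linear piece: interpolate
                result = x0 + (u - f0) * (x1 - x0) / (f1 - f0)
    return result

# The distribution of the exercise above:
F = [(0, 0.0), (4, 0.4), (4, 0.5), (5, 0.7), (5, 0.8), (7, 1.0)]
print([round(invert_cdf(F, u), 1) for u in [0.91, 0.44, 0.68, 0.50, 0.02, 0.75]])
# [6.1, 4, 4.9, 4, 0.2, 5], matching the solution above
```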
Truncation and Censoring:54
Assume that prior to truncation, the losses are from a density f(x). Then in order to simulate a draw from data truncated from below at d, we first simulate y, a random draw from f. We then reject it if y ≤ d, and accept it if y > d.55
Exercise: Assume we wish to simulate data from a Weibull Distribution truncated from below at 1000. Assume we have simulated 5 random draws from this Weibull Distribution, without any truncation: 4752, 87,321, 759, 7302, 14,082. What are the random claims from the Weibull Distribution truncated from below at 1000?
[Solution: We reject 759 since it is less than or equal to the truncation point from below of 1000. Thus the four random claims from the truncated Weibull are: 4752, 87,321, 7302, and 14,082.]
These same ideas can be used for data censored rather than truncated. Instead of rejecting a claim one adjusts its value to that of the censorship point.
Exercise: Assume we have simulated 5 random draws from this Weibull Distribution, without any truncation or censoring: 4752, 87,321, 759, 7302, 14,082. What are the random losses from the Weibull Distribution censored from above at 10,000?
[Solution: We replace 87,321 and 14,082 with 2 losses at 10,000, since they are greater than or equal to the censorship point from above of 10,000. Thus the random losses from the Weibull censored from above at 10,000 are: 759, 4752, 7302, 10,000 and 10,000.
Comment: Simulate the unlimited ground up losses, and then apply the maximum covered loss of 10,000.]
Exercise: Assume we wish to simulate data from a Weibull Distribution truncated from below at 1000 and censored from above at 10,000. Assume we have simulated 5 random draws from this Weibull Distribution, without any truncation or censoring: 4752, 87,321, 759, 7302, 14,082. What are the random losses from the Weibull Distribution truncated from below at 1000 and censored from above at 10,000?
[Solution: We replace 87,321 and 14,082 with 2 losses at 10,000, since they are greater than or equal to the censorship point from above of 10,000. We reject 759 since it is less than or equal to the truncation point from below of 1000. Thus the random losses from the Weibull truncated from below at 1000 and censored from above at 10,000 are: 4752, 7302, 10,000 and 10,000.]
54 See "Mahlerʼs Guide to Fitting Loss Distributions."
55 Note that for some distributions, such as the Pareto or Exponential, one could instead apply the Inverse Transform Method (algebraic inversion) to the truncated distribution. This would be more efficient.
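A minimal Python sketch of the reject-and-cap approach just described; the five draws are those of the exercises above, and the function name is illustrative.

```python
def truncate_and_censor(draws, d=None, limit=None):
    """Truncation from below at d: reject draws <= d.
    Censoring from above at limit: cap draws at limit."""
    out = []
    for x in draws:
        if d is not None and x <= d:
            continue                     # rejected: at or below the truncation point
        if limit is not None and x > limit:
            x = limit                    # censored at the maximum covered loss
        out.append(x)
    return out

draws = [4752, 87321, 759, 7302, 14082]   # simulated ground-up Weibull draws from the exercises
print(truncate_and_censor(draws, d=1000, limit=10000))
# [4752, 10000, 7302, 10000] - the same four values as in the last exercise
```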
Problems:
8.1 (2 points) F(x) = 0.2x for 0 ≤ x < 3, and F(x) = 0.5 + 0.1x for 3 ≤ x ≤ 5. Use the inversion method in order to simulate three random draws from this distribution. Use the following three random numbers from (0, 1): 0.9, 0.3, 0.7. What is the sum of the three simulated values?
A. less than 8
B. at least 8 but less than 9
C. at least 9 but less than 10
D. at least 10 but less than 11
E. at least 11
8.2 (1 point) Prob[X = 0] = 20%. f(x) = 0.008e^(-0.01x), for x > 0. Use the inversion method in order to simulate two random draws from this distribution. Use the following two random numbers from (0, 1): 0.70, 0.15. What is the difference between the first simulated value and the second simulated value?
A. less than 80
B. at least 80 but less than 90
C. at least 90 but less than 100
D. at least 100 but less than 110
E. at least 110
8.3 (2 points) f(x) = 0.04 for 0 ≤ x ≤ 10, and f(x) = 0.06 for 40 ≤ x ≤ 50. Use the inversion method in order to simulate three random draws from this distribution. Use the following three random numbers from (0, 1): 0.2, 0.7, 0.4. What is the product of the three simulated values?
A. less than 6000
B. at least 6000 but less than 7000
C. at least 7000 but less than 8000
D. at least 8000 but less than 9000
E. at least 9000
8.4 (1 point) Assume losses prior to the effects of deductibles and maximum covered losses follow a distribution F(x). Assume we have simulated the following random draws from F(x): 12, 4, 80, 37, 9, 24. What is the simulated sum of payments for a policy with a deductible of 10 and a maximum covered loss of 50?
A. less than 80
B. at least 80 but less than 90
C. at least 90 but less than 100
D. at least 100 but less than 110
E. at least 110
8.5 (3 points) You are given the following graph of a cumulative distribution function:
[Graph: F(x) rises linearly from (0, 0) to (1000, 0.3), jumps to (1000, 0.5), rises linearly to (3000, 0.8), stays level at 0.8 from x = 3000 to x = 5000, and rises linearly from (5000, 0.8) to (10000, 1).]
You simulate 5 losses from this distribution using the following random numbers from (0, 1): 0.941, 0.800, 0.123, 0.438, 0.762. Determine the sum of the simulated losses. A. Less than 16,000 B. At least 16,000, but less than 16,500 C. At least 16,500, but less than 17,000 D. At least 17,000, but less than 17,500 E. 17,500 or more
Solutions to Problems:
8.1. B. F(3-) = 0.6. F(3) = 0.8. u = 0.9 involves the second interval. F(4) = 0.9, so x = 4. u = 0.3 involves the first interval. F(1.5) = 0.3, so x = 1.5. F(3) = 0.8 > 0.7, so x = 3. 4 + 1.5 + 3 = 8.5.
Comment: A graph of F(x) rises linearly from (0, 0) to (3, 0.6), jumps to (3, 0.8), and rises linearly to (5, 1).
8.2. C. F(x) = 0.2 + 0.8(1 - e^(-0.01x)) for x > 0. For u = 0.7, set F(x) = u. 0.7 = 0.2 + 0.8(1 - e^(-0.01x)). ⇒ x = 98. For u = 0.15, F(0) = 0.2 > 0.15, so x = 0. 98 - 0 = 98.
8.3. E. F(5) = 0.2. F(45) = 0.7. F(40 + ε) > 0.4. The three simulated values are: 5, 45, and 40, with product of 9000.
Comment: Similar to Exercise 21.2 in Loss Models. F(10) = 0.4, F(20) = 0.4, F(30) = 0.4, and F(40) = 0.4. Thus any of the numbers between 10 and 40 inclusive would satisfy F(x) = u = 0.4. Loss Models takes the largest value such that F(x) = u. A picky detail from the textbook on the syllabus.
8.4. B. We replace 80 with the censorship point of 50. The losses of 4 and 9 are less than the truncation point from below so they are not in our truncated data. So the data truncated from below at 10 and censored at 50 is: 12, 50, 37, 24. The payments are: 2, 40, 27, 14. They sum to 83. 8.5. E. Linearly interpolating, we find where F(x) = 0.941; 0.941 corresponds to a loss of: 5000 + (.141/.2)(5000) = 8525. F(x) first exceeds .8 at 5000 + ε, so 0.800 corresponds to a loss of 5000. 0.123 corresponds to a loss of: (.123/.3)(1000) = 410. F(x) first exceeds 0.438 at 1000. 0.762 corresponds to a loss of: 1000 + (.262/.3)(2000) = 2747. Sum of the losses is: 8525 + 5000 + 410 + 1000 + 2747 = 17,682.
Section 9, Simulating a Poisson Process56
As was discussed previously, one can use the inversion method in order to simulate a Poisson Distribution; this is the method to use on your exam unless stated otherwise. There is a special algorithm based on waiting times that can also be used in order to simulate a Poisson Process; if one does not keep track of when the claims occur, the result is a simulated Poisson Distribution.
Poisson Process:57
Before discussing how to simulate a (homogeneous) Poisson Process, first a few facts about Poisson Processes will be reviewed.
A (homogeneous) Poisson Process has a constant independent claims intensity λ. The number of claims58 observed in time interval (0, T)59 is given by the Poisson Distribution with mean Tλ.
For example, assume we have a Poisson Process on (0, 5) with λ = 0.03. Then the total number of claims is given by a Poisson Distribution with mean (5)(0.03) = 0.15.
The Poisson Process is a random process. Sometimes you have to wait a long time for the next claim, and sometimes the next claim shows up right away.
The interevent times are the times between claims. X1 is the time until the first claim. X2 is the time from the first claim until the second claim. X3 is the time from the second claim until the third claim, etc.
The interevent times are independent Exponential Distributions each with mean 1/λ. The time until the nth event is the sum of n independent Exponential Variables, each with mean 1/λ.
[Diagram: Time 0 → Claim #1 → Claim #2 → Claim #3, with each gap between successive claims an Exponential interevent time with mean 1/λ.]
We can simulate each Exponential interevent time via the method of inversion. 56
While I do not expect them to ask a question on this, it is part of Example 21.8 in Loss Models. See “Mahlerʼs Guide to Stochastic Models” for CAS Exam 3L. 58 Mathematicians refers to “events.” I will instead refer to “claims”, which is the most common application of these ideas for actuaries. The Interevent times are also referred to as interarrival times. 59 Note that by changing the time scale and therefore the claims intensity, one can always reduce to a mathematically equivalent situation where the interval is (0,1) . 57
Special Algorithm for Simulation of a (Homogeneous) Poisson Process: By simulating a Poisson Process we mean not just the total number of claims observed, which is given by a Poisson Distribution, but also determining the times at which the claims occur. The technique that will be discussed makes use of the fact that the interevent times are independent Exponential Distributions. For example, assume we wish to simulate a Poisson Process on [0,1] for λ = 2.5. Assume we have the following independent random numbers from zero to one: 0.0252, 0.6593, 0.7030, 0.0818, 0.4301. We simulate the following exponential interevent times. Use the Inversion Method, with large random numbers corresponding to large times between claims. ui = F(Xi) = 1 - exp(-2.5Xi). Then we have, Xi = (-1/2.5) ln(1 - ui). The first interevent time is (-1/2.5) ln(0.9748) = 0.0102. Thus the first claim occurs at time 0.0102. The second interevent time is: (-1/2.5) ln(0.3407) = 0.4307. Thus the second claim occurs at: 0.0102 + 0.4307 = 0.4409. In spreadsheet form, the calculations are:
i   Uniform Random Number ui   Interevent Time Xi = -ln(1-ui)/λ   Cumulative Interevent Time (Sum of Xi)
1   0.0252                     0.0102                             0.0102
2   0.6593                     0.4307                             0.4409
3   0.7030                     0.4856                             0.9265
4   0.0818                     0.0341                             0.9607
5   0.4301                     0.2249                             1.1856
For this set of random inputs, we compare to unity the cumulative sum of the interevent times. We have at least 4 claims between t = 0 and t = 1, since we observe the 4th claim at 0.9607 < 1. We have fewer than 5 claims, since we observe the 5th claim at 1.1856 > 1. Thus we have simulated 4 claims in this case, with times: 0.0102, 0.4409, 0.9265, and 0.9607. If instead the Poisson Process had been on (0, 0.5), rather than (0,1), then we would have simulated two claims with times: 0.0102 and 0.4409.
In general, one can simulate a Poisson Process on [0, T] with claims intensity λ as follows:
1. Set i = 1 and Y0 = 0.
2. Simulate a random number from [0, 1], ui.
3. Simulate an exponentially distributed interevent time, Xi = -ln(1 - ui)/λ.
4. Let Yi = Xi + Yi-1 (the previous cumulative time plus the new interevent time).
5. If Yi > T, then reject Yi and exit the algorithm, having generated i-1 claims at times Yj for j < i.
6. If Yi ≤ T, then let i = i + 1 and return to Step 2.
Note that we always use up an extra random number in order to determine that our time period has been exceeded.
Exercise: Using the above algorithm and the following independent random numbers from (0, 1): 0.66, 0.14, 0.88, 0.55, 0.67, simulate a Poisson Process on (0, 3) with λ = 0.7.
[Solution: We get two claims, at times: 1.5412 and 1.7566.
i   Uniform Random Number ui   Interevent Time Xi = -ln(1-ui)/λ   Cumulative Waiting Time (Sum of Xi)
1   0.66                       1.5412                             1.5412
2   0.14                       0.2155                             1.7566
3   0.88                       3.0289                             4.7856
Note that we exit when the cumulative interevent time is 4.7856 > 3, since this Poisson Process is on the interval (0, 3).] Note that the above algorithm assumes that large random numbers correspond to large times between claims; u = F(t) for the Exponential Distribution of interevent times.60 This is the default, unless stated otherwise in a question. If instead large random numbers correspond to small times between claims, then set S(t) = u and therefore, t = -ln(u)/λ. One can use the above algorithm in order to simulate a Poisson Distribution. If one wanted to simulate from a Poisson Distribution with mean λ, then simulate a Poisson Process with claims intensity λ from time 0 to 1, and record the number of simulated claims.61 60
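A minimal Python sketch of the six-step algorithm above; feeding in the random numbers of the exercise reproduces its two claim times, while omitting them uses freshly generated uniform random numbers. The function name is illustrative.

```python
import math
import random

def simulate_poisson_process(T, lam, uniforms=None):
    """Simulate the claim times of a Poisson Process on [0, T] with claims intensity lam,
    using exponential interevent times x_i = -ln(1 - u_i)/lam
    (large random numbers <=> large times between claims)."""
    times = []
    y = 0.0
    while True:
        # consume the supplied random numbers in order, else draw a fresh uniform
        u = uniforms.pop(0) if uniforms else random.random()
        y += -math.log(1.0 - u) / lam
        if y > T:                # the time period has been exceeded; reject this claim time
            return times
        times.append(y)

# The exercise above: lambda = 0.7 on (0, 3) with random numbers 0.66, 0.14, 0.88, ...:
print(simulate_poisson_process(3, 0.7, [0.66, 0.14, 0.88, 0.55, 0.67]))
# two claims, at about 1.5412 and 1.7566
```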
This is what is done in Example 21.8 of Loss Models. For a Poisson Distribution, we only care about the total number of claims on the time interval [0, 1], and not their individual times. 61
Exercise: Using the above algorithm and the following independent random numbers from (0, 1): 0.36, 0.54, 0.61, 0.25, 0.07, simulate a Poisson Distribution with λ = 2.2.
[Solution: We get three claims.
i   Uniform Random Number ui   Interevent Time Xi = -ln(1-ui)/λ   Cumulative Waiting Time (Sum of Xi)
1   0.36                       0.2029                             0.2029
2   0.54                       0.3530                             0.5558
3   0.61                       0.4280                             0.9838
4   0.25                       0.1308                             1.1146
We simulated a Poisson Process with λ = 2.2 on [0, 1]. ] Below are shown the results of 15 separate simulations of a Poisson Process with λ = 0.7, from time 0 to 10. Each row is one simulation run. For example the bottom row shows 3 claims at times: 0.32112, 3.27607, and 8.45428.
[Graph: 15 rows, one per simulation run, each showing its simulated claim times on a time axis from 0 to 10.]
Problems: 9.1 (1 point) You simulate a Poisson Process with claims intensity λ = 0.03 by applying the Inversion Method. Use the following independent random numbers from (0,1): 0.660, 0.203. What is the time of the second claim? A. less than 40 B. at least 40 but less than 41 C. at least 41 but less than 42 D. at least 42 but less than 43 E. at least 43 9.2 (2 points) A random number generator produces the following numbers in the unit interval [0,1]: 0.273, 0.661, 0.790, 0.821, 0.524, 0.036, 0.112. The algorithm based on the interevent times is used to simulate one random observation drawn from a Poisson distribution with a mean of 5. What is the resulting number of claims? A. 2 B. 3 C. 4 D. 5 E. 6 or more 9.3 (2 points) Simulate a Poisson Process with claims intensity λ = 5 by applying the Inversion Method. Use the following independent random numbers from (0,1): 0.627, 0.439, 0.110, 0.379, 0.376, 0.864, 0.601. How many claims occur by time 1? A. 2 B. 3 C. 4 D. 5 E. 6 or more 9.4 (3 points) A random number generator produces the following numbers in the unit interval [0,1]: 0.238, 0.066, 0.799, 0.261, 0.264, 0.532, 0.902, 0.447, 0.634, 0.163, 0.558. The algorithm based on the interevent times is used to simulate a Poisson Process with λ = 3 on the interval [0, 2]. What is the time of the next to last claim? A. 1.1 B. 1.3 C. 1.5 D. 1.7 E. 1.7
9.5 (2 points) Use the following information:
• One wishes to simulate a Poisson Process with claims intensity λ = 0.3. • Use the following independent random numbers from (0, 1): 0.59, 0.32, 0.79, 0.45, 0.24, 0.83, 0.18, 0.33, 0.62, 0.06. You simulate the interevent times between claims, by applying the Inversion Method. What is the time of the fourth claim? A. less than 12 B. at least 12 but less than 13 C. at least 13 but less than 14 D. at least 14 but less than 15 E. at least 15 9.6 (2 points) Given the following random numbers in the unit interval [0,1]: Generational Sequence 1 2 3 4 5 6
Random Number 0.747 0.427 0.478 0.466 0.888 0.716
The algorithm based on the interevent times is used to simulate one random observation drawn from a Poisson distribution with a mean of 2.5. What is the resulting number of claims? A. 2 B. 3 C. 4 D. 5 E. 6 or more 9.7 (4 points) Claims occur via a Poisson Process with constant intensity 3. Claims are reported after a lag. The lag from occurrence to report is given by an Exponential Distribution with a mean of 0.6. Simulate the time of the occurrence of each claim, and then the time at which it is reported, before simulating another claim. Use the following independent random numbers from (0,1): 0.812, 0.876, 0.884, 0.136, 0.665, 0.197, 0.626, 0.336, 0.717, 0.023, 0.297, 0.206, 0.852, 0.834. What is the time of occurrence of the second claim to be reported? A. less than 1.5 B. at least 1.5 but less than 1.6 C. at least 1.6 but less than 1.7 D. at least 1.7 but less than 1.8 E. at least 1.8
9.8 (2 points) Simulate a Poisson Process with claims intensity λ = 1.31 by applying the Inversion Method, where small random numbers correspond to large times between claims. What is the time of the sixth claim? Use the following independent random numbers from (0,1): 0.71, 0.28, 0.33, 0.26, 0.60, 0.14, 0.86, 0.12, 0.55, 0.67 A. 3.0 B. 3.5 C. 4.0 D. 4.5 E. 5.0 Use the following information for the next two questions: Claims follow a Poisson Process with λ = 0.4 per year. Corky and Smokey are each simulating 100 independent years of data. (i) Corky Sherwood Forrest begins by simulating the time of the first claim from an Exponential Distribution with mean of 2.5. Then if necessary she simulates the times between claims from an Exponential Distribution with mean of 2.5, using the procedure of repeatedly simulating the time until the next claim, until the end of the year is exceeded. When she is done simulating the first year, she starts again, doing a second year in the same manner; etc. The expected number of random numbers she needs is F. (ii) Smokey Bear uses the same algorithm for simulating the times between claims. However, he simulates the times of claims until a century is exceeded. Then he divides the century up into 100 separate years. The expected number of random numbers he needs is G. 9.9 (2 points) Which of the following statements is true? (A) Neither is a valid method for simulating the process. (B) Corkyʼs method is valid; Smokeyʼs is not. (C) Smokeyʼs method is valid; Corkyʼs is not. (D) Both methods are valid, and will produce identical results from the same sequence of random numbers. (E) Both methods are valid, but they may produce different results from the same sequence of random numbers. 9.10 (1 point) Determine which of the following ranges contains the ratio F/G. (A) 0 < F/G ≤ 1 (B) 1 < F/G ≤ 2 (C) 2 < F/G ≤ 3 (D) 3 < F/G ≤ 4 (E) 4 < F/G
9.11 (3 points) An insurance company has two insurance portfolios. Claims in Portfolio P occur in accordance with a Poisson process with mean 3 per year. Claims in Portfolio Q occur in accordance with a Poisson process with mean 5 per year. The two processes are independent. Simulate each process by applying the Inversion Method to the exponential times between claims. Use the following independent random numbers from (0,1) to simulate Portfolio P: 0.190, 0.554, 0.108, 0.791, 0.069, 0.631. Use the following independent random numbers from (0,1) to simulate Portfolio Q: 0.051, 0.254, 0.525, 0.110, 0.513, 0.337. How many claims occur in Portfolio P, before three claims occur in Portfolio Q? (A) 0 (B) 1 (C) 2 (D) 3 (E) 4 or more Use the following information for the next two questions:
• Losses occur via a homogeneous Poisson process with intensity 0.25. • Use the following random numbers from (0, 1): 0.166, 0.608, to simulate the times of occurrence of losses, using the Inversion Method to simulate the times between losses. • The delay from occurrence of the loss to the reporting of the loss to the insurer is given by a Weibull Distribution with parameters τ = 1.5 and θ = 2.
• Use the following random numbers from (0, 1): 0.379, 0.240, to simulate the reporting delays, using the Inversion Method.
• The time from reporting of the loss to payment of the loss is given by an Inverse Weibull Distribution with parameters τ = 3 and θ = 1.7.
• Use the following random numbers from (0, 1): 0.861, 0.668, to simulate the payment delays, using the Inversion Method. 9.12 (2 points) At what time is the first loss to occur paid? A. 3.0 B. 3.5 C. 4.0 D. 4.5 E. 5.0 9.13 (2 points) At what time is the second loss to occur paid? A. 7.0 B. 7.5 C. 8.0 D. 8.5 E. 9.0
9.14 (2 points) A random number generator produces the following numbers in the interval [0,1]:
Generational Sequence:  1       2       3       4       5       6       7       8
Random Number:          0.5148  0.0185  0.5998  0.3295  0.2804  0.9242  0.8456  0.1808
The algorithm based on the interevent times is used to simulate one random observation drawn from a Poisson distribution with a mean of 3.2. Large random numbers correspond to large times between claims. What is the resulting number of claims? A. 1 or less B. 2 C. 3 D. 4 E. 5 or more 9.15 (2 points) You simulate the interevent times between claims of a Poisson Process with claims intensity λ = 4.2, by applying the Inverse Transform Method. Use the following random numbers from (0, 1): 0.72, 0.67, 0.54, 0.40, 0.88, 0.11, 0.43, 0.29. What is the sum of the times of the first three claims? A. less than 1.7 B. at least 1.7 but less than 1.8 C. at least 1.8 but less than 1.9 D. at least 1.9 but less than 2.0 E. at least 2.0 9.16 (4 points) Workers' Compensation claims are either Medical Only or Lost Time. The number of Medical Only and Lost Time claims are independent. The time between Medical Only claims is Exponential with mean 3 days. The time between Lost Time claims is Exponential with mean 7 days. Use the following random numbers to simulate the interevent times of Medical Only claims: 0.597, 0.132, 0.586, 0.752, 0.494, 0.556, 0.332, 0.559, 0.678, 0.806, 0.367, 0.027, 0.327. Use the following random numbers to simulate the interevent times of Lost Time claims: 0.144, 0.147, 0.530, 0.267, 0.044, 0.318, 0.607, 0.770, 0.897, 0.531, 0.509. How many claims are there during the first 20 days? A. 11 or less B. 12 C. 13 D. 14 E. 15 or more
9.17 (4B, 5/94, Q.19) (2 points) A random number generator produces the following numbers in the unit interval [0,1]: 0.5652, 0.7391, 0.9565, 0.4783, 0.1304.
Using the procedure based on the interarrival times, determine the first random number generated for a Poisson distribution with mean equal to 3. A. 0 B. 1 C. 2 D. 3 E. 4
9.18 (4B, 11/94, Q.21) (2 points) A random number generator produces the following numbers in the unit interval [0,1]:
Generational Sequence:  1      2      3      4      5
Random Number:          0.909  0.017  0.373  0.561  0.890
Using the algorithm based on the interarrival times, calculate the simulated number of observed claims from a Poisson distribution with mean = 3. A. 1 B. 2 C. 3 D. 4 E. 5 9.19 (4B, 11/95, Q.1) (2 points) You wish to generate a single random number from a Poisson distribution with mean 1.5. A random number generator produces the following numbers in the unit interval (0, 1): Position in Generation Sequence Random Number 1 0.811 2 0.413 3 0.027 4 0.616 5 0.546 Using the algorithm based on the interarrival times, determine the simulated number from the Poisson Distribution. A. 0 B. 1 C. 2 D. 3 E. 4
9.20 (4B, 5/97, Q.1) (1 point) You wish to generate a single random number from a Poisson distribution with mean 3. A random number generator produces the following numbers in the unit interval (0, 1): Position in Generation Sequence Random Number 1 0.30 2 0.75 3 0.65 4 0.60 5 0.95 Using the algorithm specifically designed for the Poisson Process based on the interarrival times, determine the simulated number. A. 0 B. 1 C. 2 D. 3 E. 4 9.21 (4B, 11/98 Q.15) (1 point) You wish to generate a single random number from a Poisson distribution with mean 3. A random number generator produces the following numbers in the unit interval (0, 1): Position in Generation Sequence Random Number 1 0.30 2 0.70 3 0.30 4 0.50 5 0.90 Using the algorithm specifically designed for the Poisson Process based on the interarrival times, determine the simulated number. A. 0 B. 1 C. 2 D. 3 E. 4
Solutions to Problems:
9.1. E. The second claim occurs at time 43.5. (λ = 0.03)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.660                       35.9603                             35.9603
2    0.203                       7.5634                              43.5237
9.2. C. The fifth claim occurs after time 1. There are 4 claims by time 1. (λ = 5)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.273                       0.0638                              0.0638
2    0.661                       0.2164                              0.2801
3    0.790                       0.3121                              0.5922
4    0.821                       0.3441                              0.9363
5    0.524                       0.1485                              1.0848
6    0.036                       0.0073                              1.0921
9.3. E. The seventh claim occurs after time 1. There are 6 claims by time 1. (λ = 5)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.627                       0.1972                              0.1972
2    0.439                       0.1156                              0.3128
3    0.110                       0.0233                              0.3361
4    0.379                       0.0953                              0.4314
5    0.376                       0.0943                              0.5258
6    0.864                       0.3990                              0.9248
7    0.601                       0.1838                              1.1085
Comment: Small random numbers correspond to small times between claims, and therefore we set u = F(t) = 1 - e^(-λt). t = -ln(1-u)/λ. Similar to Example 21.8 in Loss Models.
9.4. A. There are 7 claims before time 2. (λ = 3)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.238                       0.0906                              0.0906
2    0.066                       0.0228                              0.1134
3    0.799                       0.5348                              0.6482
4    0.261                       0.1008                              0.7490
5    0.264                       0.1022                              0.8512
6    0.532                       0.2531                              1.1043
7    0.902                       0.7743                              1.8785
8    0.447                       0.1975                              2.0760
The next to last claim on [0,2] occurs at time 1.1043.
9.5. A. The fourth claim occurs at time 11.4525. (λ = 0.3)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.59                        2.9720                              2.9720
2    0.32                        1.2855                              4.2575
3    0.79                        5.2022                              9.4597
4    0.45                        1.9928                              11.4525
9.6. A. The third claim is after time 1. We get 2 claims. (λ = 2.5)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.747                       0.5497                              0.5497
2    0.427                       0.2227                              0.7725
3    0.478                       0.2600                              1.0325
4    0.466                       0.2509                              1.2835
9.7. C. The 3rd claim to occur, at time 1.6397, is the 2nd to be reported. (λ = 3)
i    U1       Interevent Time Xi = -ln(1-U1)/λ    Occurrence Time (Sum of Xi)    U2       Report Lag = -0.6 ln(1-U2)    Report Time
1    0.812    0.5571                              0.5571                         0.876    1.2525                        1.8096
2    0.884    0.7181                              1.2752                         0.136    0.0877                        1.3629
3    0.665    0.3645                              1.6397                         0.197    0.1316                        1.7713
4    0.626    0.3278                              1.9675                         0.336    0.2457                        2.2132
9.8. E. The sixth claim occurs at time 5.00. (λ = 1.31)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.71                        0.2614                            0.2614
2    0.28                        0.9717                            1.2332
3    0.33                        0.8463                            2.0795
4    0.26                        1.0283                            3.1078
5    0.60                        0.3899                            3.4977
6    0.14                        1.5008                            4.9986
Alternately, time of sixth claim is: -ln[(0.71)(0.28)(0.33)(0.26)(0.60)(0.14)]/1.31 = 5.00. Comment: Since small random numbers correspond to large times between claims, we set S(x) = u. x = -ln(u)/λ. 9.9. E. Corkyʼs method is the straightforward valid method. Smokeyʼs method is also valid. He is making use of the properties of a Poisson process. What happens during disjoint intervals, (0, 1], (1, 2], (2, 3], is independent. Also since the intensity is constant, the separate years each are a simulation of a Poisson Process with λ = 0.4. Both methods are valid, but they may produce different results from the same sequence of random numbers. 9.10. D. Each year has on average 0.4 claims. Corky uses for each year one random number for each simulated claim, plus one more random number to know that the year is exceeded. Thus on average she uses 1.4 random numbers per year, or 140 random numbers in total. The century has on average 40 claims. Therefore, Smokey uses on average 41 random numbers in total. F/G = 140/41 = 3.41. Comment: Smokeyʼs method is more efficient. The increase in efficiency is greater, the smaller is λ.
9.11. B. The third claim from Portfolio Q occurs at time 0.2180. (λ = 5)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.051                       0.0105                              0.0105
2    0.254                       0.0586                              0.0691
3    0.525                       0.1489                              0.2180
The second claim from Portfolio P occurs after 0.2180. There is 1 claim before 0.2180. (λ = 3)
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Interevent Time (Sum of Xi)
1    0.190                       0.0702                              0.0702
2    0.554                       0.2691                              0.3394
Comment: Situation taken from 3, 11/00, Q.6.
9.12. E. & 9.13. B. Large random numbers correspond to large times between losses. u = F(t) = 1 - exp(-0.25t). Then we have, t = -ln(1 - u)/0.25 = -4 ln(1 - u).
u1       Interevent Time -4 ln(1-u1)    Time of Occurrence
0.166    0.726                          0.726
0.608    3.746                          4.472
Large random numbers correspond to large reporting delays. u = F(t) = 1 - exp[-(t/2)^1.5]. Then we have, t = 2 {-ln(1-u)}^(1/1.5).
Time of Occurrence    u2       Reporting Delay 2 {-ln(1-u2)}^(2/3)    Time of Reporting
0.726                 0.379    1.220                                  1.946
4.472                 0.240    0.845                                  5.317
Large random numbers correspond to large payment delays. For the Inverse Weibull, u = F(t) = exp[-(1.7/t)^3]. Then we have, t = 1.7 {-ln(u)}^(-1/3).
Time of Reporting    u3       Payment Delay 1.7 {-ln(u3)}^(-1/3)    Time of Payment
1.946                0.861    3.202                                 5.148
5.317                0.668    2.301                                 7.617
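Solutions 9.12 and 9.13 chain three inversions together: an Exponential for the occurrence times, a Weibull for the reporting delay, and an Inverse Weibull for the payment delay. As a hedged illustration (mine, not the textbook's), the three inversion formulas can be coded as follows; the printed values match the tables above.

    import math

    def exp_time(u, lam=0.25):
        # Occurrence: u = 1 - exp(-lam*t), so t = -ln(1-u)/lam.
        return -math.log(1.0 - u) / lam

    def weibull_delay(u, tau=1.5, theta=2.0):
        # Reporting delay: u = 1 - exp(-(t/theta)^tau).
        return theta * (-math.log(1.0 - u)) ** (1.0 / tau)

    def inverse_weibull_delay(u, tau=3.0, theta=1.7):
        # Payment delay: u = exp(-(theta/t)^tau).
        return theta * (-math.log(u)) ** (-1.0 / tau)

    occurrence, t = [], 0.0
    for u in [0.166, 0.608]:
        t += exp_time(u)
        occurrence.append(t)
    report  = [o + weibull_delay(u) for o, u in zip(occurrence, [0.379, 0.240])]
    payment = [r + inverse_weibull_delay(u) for r, u in zip(report, [0.861, 0.668])]
    print(occurrence)   # about [0.726, 4.472]
    print(report)       # about [1.946, 5.317]
    print(payment)      # about [5.148, 7.617]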
9.14. E. Simulate a Poisson Process from time zero to one, with λ = 3.2. Since large random numbers correspond to large interarrival times, u = F(x) = 1 - exp[-λx]. x = -ln(1-u)/λ = -ln(1-u)/3.2. The sixth claim occurs after time 1. There are 5 claims by time 1.
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.5148                      0.2260                                0.2260
2    0.0185                      0.0058                                0.2318
3    0.5998                      0.2862                                0.5180
4    0.3295                      0.1249                                0.6429
5    0.2804                      0.1028                                0.7458
6    0.9242                      0.8061                                1.5519
9.15. A. The first three claims are at times: 0.3031, 0.5671, and 0.7519. Their sum is: 1.622. (λ = 4.2)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.72                        0.3031                                0.3031
2    0.67                        0.2640                                0.5671
3    0.54                        0.1849                                0.7519
4    0.40                        0.1216                                0.8736
Comment: Large random numbers correspond to large interarrival times; F(t) = u. This is the way to do it on the exam unless specifically told otherwise.
9.16. E. For each Exponential, u = 1 - e^(-t/θ). ⇒ t = -θ ln(1 - u). Medical Only claims (θ = 3):
i    Uniform Random Number Ui    Interarrival Time Xi = -θ ln(1-Ui)    Cumulative Interarrival Time (Sum of Xi)
1    0.597                       2.7265                                2.7265
2    0.132                       0.4247                                3.1511
3    0.586                       2.6457                                5.7968
4    0.752                       4.1830                                9.9798
5    0.494                       2.0437                                12.0235
6    0.556                       2.4358                                14.4592
7    0.332                       1.2104                                15.6696
8    0.559                       2.4561                                18.1258
9    0.678                       3.3996                                21.5254
There are 8 Medical Only claims prior to time 20. Lost Time claims (θ = 7):
i    Uniform Random Number Ui    Interarrival Time Xi = -θ ln(1-Ui)    Cumulative Interarrival Time (Sum of Xi)
1    0.144                       1.0884                                1.0884
2    0.147                       1.1130                                2.2014
3    0.530                       5.2852                                7.4865
4    0.267                       2.1743                                9.6608
5    0.044                       0.3150                                9.9758
6    0.318                       2.6791                                12.6549
7    0.607                       6.5376                                19.1925
8    0.770                       10.2877                               29.4802
There are 7 Lost Time claims prior to time 20. There are a total of 15 claims by time 20.
9.17. C. We simulate 2 claims. (λ = 3)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.5652                      0.2776                                0.2776
2    0.7391                      0.4479                                0.7255
3    0.9565                      1.0450                                1.7705
9.18. C. We simulate 3 claims. (λ = 3)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.909                       0.7990                                0.7990
2    0.017                       0.0057                                0.8047
3    0.373                       0.1556                                0.9603
4    0.561                       0.2744                                1.2347
9.19. A. We simulate zero claims. (λ = 1.5)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.811                       1.1107                                1.1107
2    0.413                       0.3552                                1.4658
3    0.027                       0.0182                                1.4841
4    0.616                       0.6381                                2.1221
5    0.546                       0.5264                                2.6486
9.20. D. We simulate 3 claims. (λ = 3)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.30                        0.1189                                0.1189
2    0.75                        0.4621                                0.5810
3    0.65                        0.3499                                0.9309
4    0.60                        0.3054                                1.2364
5    0.95                        0.9986                                2.2349
9.21. E. We simulate 4 claims. (λ = 3)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)
1    0.30                        0.1189                                0.1189
2    0.70                        0.4013                                0.5202
3    0.30                        0.1189                                0.6391
4    0.50                        0.2310                                0.8702
5    0.90                        0.7675                                1.6377
Section 10, Simulating a Compound Poisson Process62
Once one has simulated when the claims from a Poisson Process have occurred, one can also simulate the size of those claims, in order to simulate a Compound Poisson Process.63
For example, in the previous section a Poisson Process with λ = 2.5 was simulated, obtaining 4 claims prior to time 1, at times: 0.0102, 0.4409, 0.9265, and 0.9607.
Exercise: Severity follows a Single Parameter Pareto Distribution with α = 3 and θ = 1000. Using the Inversion Method, with large random numbers corresponding to large sizes, simulate the size of these four claims. Use the following random numbers from [0, 1]: 0.3908, 0.5383, 0.8037, 0.6026.
[Solution: u = F(x) = 1 - (x/1000)^(-3). x = 1000 (1 - u)^(-1/3).
1000 (1 - 0.3908)^(-1/3) = 1179.6. 1000 (1 - 0.5383)^(-1/3) = 1293.8.
1000 (1 - 0.8037)^(-1/3) = 1720.7. 1000 (1 - 0.6026)^(-1/3) = 1360.2.]
We have simulated a claim of size 1179.6 at time 0.0102, size 1293.8 at time 0.4409, size 1720.7 at time 0.9265, and size 1360.2 at time 0.9607. The simulated sum of claims by time 1 is: 1179.6 + 1293.8 + 1720.7 + 1360.2 = 5554.3.
As another example of a compound Poisson Process, assume Taxicabs leave a hotel with a group of passengers at a Poisson rate λ = 10 per hour. The number of people in each group taking a cab is independent and has the following probabilities:
Number of People    Probability
1                   0.60
2                   0.30
3                   0.10
Exercise: Simulate the time of each cab using the Inversion Method, where large random numbers correspond to large times between cabs. Use the following independent random numbers from (0,1): 0.771, 0.710, 0.647, 0.534, 0.313, 0.306, 0.573, 0.114, 0.518, 0.137, 0.812, 0.587, 0.714, 0.406. Simulate the number of people for each cab, with small random numbers corresponding to small numbers of people. Use the following independent random numbers from (0,1): 0.765, 0.520, 0.154, 0.579, 0.111, 0.441, 0.763, 0.575, 0.508, 0.638, 0.992, 0.865, 0.861, 0.098.
62 While I do not expect them to ask a question on this, it is part of Example 21.8 in Loss Models.
63 Compound Poisson Processes are discussed in "Mahlerʼs Guide to Stochastic Models" and in an Introduction to Probability Models by Ross.
[Solution: For example, 0.60 < 0.765 ≤ 0.90, so the 1st cab has 2 people.
i     u1       -ln(1-u1)/10    Time      u2       Number of People
1     0.771    0.1474          0.1474    0.765    2
2     0.710    0.1238          0.2712    0.520    1
3     0.647    0.1041          0.3753    0.154    1
4     0.534    0.0764          0.4517    0.579    1
5     0.313    0.0375          0.4892    0.111    1
6     0.306    0.0365          0.5257    0.441    1
7     0.573    0.0851          0.6108    0.763    2
8     0.114    0.0121          0.6229    0.575    1
9     0.518    0.0730          0.6959    0.508    1
10    0.137    0.0147          0.7107    0.638    2
11    0.812    0.1671          0.8778    0.992    3
12    0.587    0.0884          0.9662    0.865    2
13    0.714    0.1252          1.0914    0.861    2
14    0.406    0.0521          1.1435    0.098    1
During the first hour, we simulate 12 cabs with a total of 18 people leaving the hotel.]
In general we would:
1. Simulate when the losses or events from a Poisson Process have occurred.
2. Simulate the size of each loss or event.
Thinning a Poisson Process:
One can instead simulate the previous situation by first thinning the original Poisson Process. The cabs leaving with 1 person, 2 people, and 3 people are three independent Poisson Processes, with means: (0.6)(10) = 6, (0.3)(10) = 3, and (0.1)(10) = 1. Using the following independent random numbers from (0,1): 0.771, 0.710, 0.647, 0.534, 0.313, 0.306, 0.573, 0.114, 0.518, 0.137, 0.812, 0.587, 0.714, 0.406, simulating the first hour would proceed as follows. First simulate the number of cabs leaving with one person, a Poisson Process with mean 6:
u        -ln(1-u)/6    Time
0.771    0.2457        0.2457
0.710    0.2063        0.4520
0.647    0.1735        0.6255
0.534    0.1273        0.7528
0.313    0.0626        0.8154
0.306    0.0609        0.8762
0.573    0.1418        1.0181
There are 6 such cabs with 1 person each during the first hour.
Next simulate the number of cabs leaving with 2 people, a Poisson Process with mean 3:
u        -ln(1-u)/3    Time
0.114    0.0403        0.0403
0.518    0.2433        0.2836
0.137    0.0491        0.3327
0.812    0.5571        0.8898
0.587    0.2948        1.1846
There are 4 such cabs with 2 people each during the first hour. Finally simulate the number of cabs leaving with 3 people, a Poisson Process with mean 1:
u        -ln(1-u)/1    Time
0.714    1.2518        1.2518
There are no such cabs with 3 people each during the first hour. Note that this alternate method produces different results than previously using the same random numbers. The times of cabs is different, the number of people is different, the number of cabs of each type is different, the total number of cabs is different, and the total number of people is different. Both methods are valid, but they usually produce different results from the same sequence of random numbers. Exercise: Using the first method, how many random numbers do we use on average in order to simulate the cabs and people for 5 hours? [Solution: The mean number of cabs over 5 hours is: (10)(5) = 50. Therefore, we expect to use 51 random numbers in order to simulate times of cabs, where the last time exceeds 5 hours. For each cab, we expect to use a random number in order to simulate the number of people. So in total we expect 51 + 50 = 101 random numbers.] Exercise: Using the second method, how many random numbers do we use on average in order to simulate the cabs and people for 5 hours? [Solution: The mean number of cabs over 5 hours with 1 person is: (0.6)(10)(5) = 30. Therefore, we expect to use 31 random numbers in order to simulate times of such cabs, where the last time exceeds 5 hours. We expect to use 1 + (0.3)(10)(5) = 16 random numbers to simulate cabs with 2 people each. We expect to use 1 + (0.1)(10)(5) = 6 random numbers to simulate cabs with 3 people each. So in total we expect to use: 31 + 16 + 6 = 53 random numbers.] The second method uses fewer random numbers on average than the first method. By using the mathematical fact that one can thin a Poisson Process, one can create a more efficient algorithm to simulate this type of situation.
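The two ways of simulating the taxicab example, the direct compound simulation and thinning, can be sketched side by side in Python. This is my own illustration; the rate and group-size distribution come from the example above, while the seed and function names are arbitrary choices, and a single run of the two methods will usually disagree, just as noted above.

    import math
    import random

    rate = 10.0                                    # cabs per hour
    group_cdf = [(0.60, 1), (0.90, 2), (1.00, 3)]  # people per cab, cumulative

    def arrival_times(lam, horizon, rng):
        # Event times on [0, horizon] built from Exponential interevent times.
        times, t = [], 0.0
        while True:
            t += -math.log(1.0 - rng.random()) / lam
            if t > horizon:
                return times
            times.append(t)

    def draw_group_size(rng):
        u = rng.random()
        for cum, size in group_cdf:
            if u <= cum:
                return size

    rng = random.Random(2013)                      # arbitrary seed

    # Method 1: simulate the cab times, then a group size for each cab.
    cabs = arrival_times(rate, 1.0, rng)
    people_direct = sum(draw_group_size(rng) for _ in cabs)

    # Method 2 (thinning): three independent processes with rates 6, 3 and 1.
    people_thinned = sum(size * len(arrival_times(rate * p, 1.0, rng))
                         for p, size in [(0.6, 1), (0.3, 2), (0.1, 3)])

    print(people_direct, people_thinned)           # typically two different values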
Problems: 10.1 (3 points) You are given the following:
• The number of families who immigrate to Vancougar from Honwan is given by a Poisson Process with intensity per day of 1.8.
• The sizes of these families are distributed as follows:
  Size of Family:  1     2     3     4     5     6    7    8    9    10
  Probability:     10%   15%   22%   20%   12%   8%   6%   4%   2%   1%
• The number of families and the sizes of the families are independent.
Simulate how many people immigrate tomorrow. Simulate the time of each family using the Inversion Method to simulate times between families. Use the following independent random numbers from (0,1): 0.3869, 0.6736, 0.6672, 0.2580, 0.4723. Simulate the number of people for each family. Use the following independent random numbers from (0,1): 0.7945, 0.3583, 0.2452, 0.1726, 0.0429. A. 7 or less B. 8 C. 9 D. 10 E. 11 or more 10.2 (3 points) For a particular automobile insurance policyholder, the number of physical damage claims is Poisson with mean 0.04 per year. If a physical damage claim were made today, the cost of the damages would be uniformly distributed over the interval (0, 5000). Annual inflation in physical damage costs is 3%. Simulate the present value of this policyholderʼs physical damage claims over the next 30 years, using an interest rate of 7%. Simulate the time of each claim using the Inversion Method to simulate times between claims. Use the following independent random numbers from (0,1): 0.166, 0.389, 0.581, 0.172, 0.961, 0.607. Simulate the size of each claim. Use the following independent random numbers from (0,1): 0.217, 0.590, 0.999, 0.433, 0.661. A. 500 B. 1000 C. 1500 D. 2000 E. 2500
Use the following information for the next five questions: The number of accidents follows a Poisson distribution with mean 3 per day. Number of claimants per accident Probability 1 60% 2 20% 3 10% 4 10% You simulate tomorrow. 10.3 (3 points) Simulate the time of each accident using the Inversion Method to simulate times between accidents. Use the following independent random numbers from (0,1): 0.8094, 0.7337, 0.5524, 0.1900, 0.7752, 0.0802, 0.9199, 0.1420, 0.7529. Simulate the number of claimants for each accident. Use the following independent random numbers from (0,1): 0.6754, 0.5523, 0.5554, 0.1149, 0.0622, 0.2268, 0.2235, 0.3565. How many simulated claimants are there tomorrow? A. 2 or less B. 3 C. 4 D. 5 E. 6 or more 10.4 (1 point) In the previous question, what is the number of claimants for the third accident to occur tomorrow? A. 1 B. 2 C. 3 D. 4 E. None of A, B, C, or D 10.5 (3 points) First simulate the time of each 1-claimant accident using the Inversion Method to simulate times between accidents. Then simulate the time of each 2-claimant accident. Then simulate the time of each 3-claimant accident. Then simulate the time of each 4-claimant accident. Use the following independent random numbers from (0,1): 0.8094, 0.7337, 0.5524, 0.1900, 0.7752, 0.0802, 0.9199, 0.1420, 0.7529, 0.2705. How many simulated claimants are there tomorrow? A. 2 or less B. 3 C. 4 D. 5 E. 6 or more 10.6 (1 point) In the previous question, what is the number of claimants for the third accident to occur tomorrow? A. 1 B. 2 C. 3 D. 4 E. None of A, B, C, or D 10.7 (3 points) First simulate the number of accidents by applying the Inversion Method to the Poisson Distribution. Then simulate the number of claimants for each accident. Use the following independent random numbers from (0,1): 0.8094, 0.7337, 0.5524, 0.1900, 0.7752, 0.0802, 0.9199, 0.1420, 0.7529, 0.2705. How many simulated claimants are there tomorrow? A. 2 or less B. 3 C. 4 D. 5 E. 6 or more
Use the following information for the next four questions: As he walks, Clumsy Klem loses coins at a Poisson rate of 0.2 coins/minute. The denominations are randomly distributed: (i) 50% of the coins are worth 5; (ii) 30% of the coins are worth 10; and (iii) 20% of the coins are worth 25. Two actuaries, Donny and Marie, having played computer games all morning, decide after lunch to simulate Klemʼs 60 minute walk to work. (i) Donny begins by simulating the number of coins lost, using the procedure of repeatedly simulating the time until the next coin is lost, where large random numbers correspond to large times between coins, until the length of the walk has been exceeded. Then, for each coin lost, he simulates its denomination, using the inversion method, with large random numbers corresponding to large denominations. The expected number of random numbers he needs for one simulation of the walk is D. (ii) Marie uses the same algorithm for simulating the times between events. However, she first simulates losing coins worth 5, then simulates losing coins worth 10, then simulates losing coins worth 25, in each case simulating until the 60 minutes are exceeded. The expected number of random numbers she needs for one complete simulation of the walk is M. 10.8 (1 point) Which of the following statements is true? (A) Both methods are valid, and will produce identical results from the same sequence of random numbers. (B) Both methods are valid, but they may produce different results from the same sequence of random numbers. (C) Marieʼs method is valid; Donnyʼs is not. (D) Donnyʼs method is valid; Marieʼs is not. (E) Neither is a valid method for simulating the process. 10.9 (2 points) Determine the ratio M/D. (A) 0.50 (B) 0.53 (C) 0.55 (D) 0.58
(E) 0.60
10.10 (4 points) How much money did Clumsy Klem lose in Donnyʼs simulation? Donny uses the following independent random numbers from (0,1): 0.538, 0.664, 0.502, 0.937, 0.764, 0.636, 0.607, 0.226, 0.373, 0.392, 0.923, 0.080, 0.142, 0.291, 0.417, 0.511, 0.726, 0.929, 0.202, 0.322, 0.433, 0.095, 0.381, 0.206, 0.630, 0.893. (A) 75 or less (B) 80 to 100 (C) 105 to 125 (D) 130 to 150 (E) 155 or more 10.11 (4 points) If Marie uses the same random numbers as Donny, how much money did Clumsy Klem lose in her simulation? (A) 75 or less (B) 80 to 100 (C) 105 to 125 (D) 130 to 150 (E) 155 or more
Use the following information for the next two questions: Claims are reported according to a Poisson process with intensity 25. The number of claims reported and the claim amounts are independently distributed. Claim amounts are distributed via a Weibull with τ = 0.7 and θ = 1000. You simulate the time of each claim using the Inversion Method to simulate the times between claims. Use the following independent random numbers from (0,1): 0.128, 0.336, 0.192, 0.946, 0.744, 0.390, 0.661. You simulate the size of each claim. Use the following independent random numbers from (0,1): 0.830, 0.583, 0.268, 0.865, 0.467, 0.921, 0.084. 10.12 (3 points) What is the sum of the sizes of claims prior to time 0.2? (A) 5400 (B) 5600 (C) 5800 (D) 6000 (E) 6200 10.13 (2 points) When does the second claim of size greater than 2000 occur? A. less than 0.08 B. at least 0.08 but less than 0.10 C. at least 0.10 but less than 0.12 D. at least 0.12 but less than 0.14 E. at least 0.14
10.14 (3 points) Taxicabs leave a hotel with a group of passengers at a Poisson rate λ = 0.5 per minute. The number of people in each group taking a cab is independent and has the following probabilities: Number of People Probability 1 0.30 2 0.30 3 0.20 4 0.10 5 0.10 Simulate how many people leave via taxicab during the next 10 minutes. Simulate the time between cabs using the Inversion Method to simulate the times between cabs. Use the following independent random numbers from (0,1): 0.600, 0.354, 0.529, 0.959, 0.092, 0.716, 0.536, 0.079, 0.231, 0.618. Simulate the number of people for each cab. Use the following independent random numbers from (0,1): 0.757, 0.372, 0.945, 0.327, 0.157, 0.018, 0.430, 0.383, 0.065, 0.301. A. 7 or less B. 8 C. 9 D. 10 E. 11 or more
10.15 (3 points) The claims department of an insurance company receives envelopes with claims for insurance coverage at a Poisson rate of λ = 50. For any period of time, the number of envelopes and the numbers of claims in the envelopes are independent. The numbers of claims in the envelopes have the following distribution: Number of Claims Probability 1 0.20 2 0.25 3 0.40 4 0.15 Simulate when the insurer has first received a total of 10 or more claims. Simulate the time of each envelope using the Inversion Method, where small random numbers correspond to large times between envelopes. Use the following independent random numbers from (0,1): 0.424, 0.844, 0.562, 0.612, 0.559, 0.737, 0.344, 0.221. Simulate the number of claims for each envelope, with small random numbers corresponding to small numbers of claims. Use the following independent random numbers from (0,1): 0.574, 0.539, 0.434, 0.853, 0.883, 0.715 0.231, 0.538. A. 0.04 B. 0.06 C. 0.08 D. 0.10 E. 0.12 10.16 (4 points) Claims arrive at an office of an insurer at a Poisson rate of 3 per hour. Klem Handlerʼs job is to initially process each claim. The time it takes Klem to initially process each claim is Exponential with mean 15 minutes. Klem works on claims in the order in which they arrive, and works on a claim until he is done with it. When Klem starts works, there are no claims waiting to be processed. Simulate the time of each claim using the Inversion Method to simulate the times between claims. Use the following independent random numbers from (0,1): 0.282, 0.046, 0.761, 0.171, 0.851, 0.400, 0.278, 0.650. Simulate the time to process claim using the method of inversion. Use the following independent random numbers from (0,1): 0.084, 0.889, 0.201, 0.671, 0.994, 0.443, 0.589, 0.249. What is the simulated the state of the claims processing system at the end of 1 hour? (A) There are no claims waiting to be processed. (B) Klem is working on a claim; there are no others waiting to be processed. (C) Klem is working on a claim; there is one other claim waiting to be processed. (D) Klem is working on a claim; there are two others waiting to be processed. (E) Klem is working on a claim; there are at least three others waiting to be processed.
10.17 (3 points) Claims follow a Poisson process with mean 0.35 per year. Currently, claim sizes follow a Pareto Distribution with α = 4 and θ = 10,000. Costs are increasing at a rate of 10% per year. Simulate the total value of claims over the next 5 years. Simulate the time of each claim using the Inversion Method to simulate the times between claims. Use the following independent random numbers from (0,1): 0.077, 0.333, 0.256, 0.857, 0.431, 0.144, 0.562, 0.876, 0.888, 0.678. Simulate the size of each claim. Use the following independent random numbers from (0,1): 0.833, 0.628, 0.684, 0.839, 0.053, 0.082, 0.272, 0.630, 0.838, 0.490. A. 11,000 B. 12,000 C. 13,000 D. 14,000 E. 15,000 10.18 (3 points) XYZ Re provides reinsurance to Bigskew Insurance Company. XYZ agrees to pay Bigskew for all losses resulting from “events”, subject to:
• a $500 deductible per event and • a $1000 annual aggregate deductible, which applies after the per event deductible. Events occur via a Poisson Process with mean annual frequency of 2 events. Event severity is from the following distribution: Loss Probability $250 0.10 500 0.25 800 0.30 1,000 0.25 2,000 0.05 3,000 0.05 Simulate one year and determine how much XYZ Re pays Bigskew Insurance Company. Use the following random numbers to simulate the times events: 0.188, 0.326, 0.272, 0.551, 0.294, 0.690, 0.738. Use the following random numbers to simulate the size of events: 0.539, 0.933, 0.721, 0.316, 0.970, 0.037. A. 0 B. more than 0 but less than 500 C. at least 500 but less than 1000 D. at least 1000 but less than 1500 E. at least 1500
10.19 (3 points) Losses follow a Poisson process with mean 2.2 per year. Sizes of loss follow a Single Parameter Pareto Distribution with α = 3 and θ = 5000. Simulate the total value of losses over the next 3 years. Simulate the time of each loss using the Inversion Method. Use the following independent random numbers from (0,1): 0.510, 0.167, 0.947, 0.316, 0.161, 0.693, 0.918, 0.272, 0.630, 0.838, 0.628. Simulate the size of each loss. Use the following independent random numbers from (0,1): 0.143, 0.431, 0.888, 0.562, 0.876, 0.678, 0.923, 0.144, 0.667, 0.744. A. 40,000 B. 45,000 C. 50,000 D. 55,000 E. 60,000 10.20 (6 points) Use the following information:
• A loss occurrence may be caused by fire or theft. Fire and theft losses occur independently of one another.
• Fire losses follow a Poisson Process. The expected amount of time between fire losses is 20 days.
• Theft losses follow a Poisson Process. The expected amount of time between theft losses is 10 days.
• The size of fire losses follows a Pareto Distribution with parameters α = 3 and θ = 2000.
• The size of theft losses follows a Weibull Distribution with parameters τ = 2 and θ = 400.
Use the following random numbers to simulate the times between Fire claims: 0.525, 0.559, 0.189, 0.902, 0.256, 0.515, 0.872, 0.295, 0.634, 0.318. Use the following random numbers to simulate the sizes of Fire claims: 0.700, 0.226, 0.041, 0.178, 0.614, 0.482, 0.989, 0.712, 0.445, 0.966. Use the following random numbers to simulate the times between Theft claims: 0.251, 0.609, 0.253, 0.217, 0.506, 0.078, 0.525, 0.744, 0.514, 0.046. Use the following random numbers to simulate the sizes of Theft claims: 0.117, 0.416, 0.811, 0.648, 0.866, 0.892, 0.202, 0.396, 0.649, 0.386. What is the simulated aggregate loss in the first 30 days? A. 2000 B. 2500 C. 3000 D. 3500 E. 4000
Use the following information for the next two questions: Lucky Tom finds coins on his 60 minute walk to work at a Poisson rate of 2 per minute. 60% of the coins are worth 1 each; 20% are worth 5 each; 20% are worth 10 each. The denominations of the coins found are independent. Two actuaries are simulating Tomʼs 60 minute walk to work. (i) The first actuary begins by simulating the number of coins found, using the procedure of repeatedly simulating the time until the next coin is found, until the length of the walk has been exceeded. For each coin found, he simulates its denomination, using the inversion method. The expected number of random numbers he needs for one simulation of the walk is F. (ii) The second actuary uses the same algorithm for simulating the times between events. However, she first simulates finding coins worth 1, then simulates finding coins worth 5, then simulates finding coins worth 10, in each case simulating until the 60 minutes are exceeded. The expected number of random numbers she needs for one complete simulation of Tomʼs walk is G. 10.21 (3, 11/00, Q.43) (1.25 points) Which of the following statements is true? (A) Neither is a valid method for simulating the process. (B) The first actuaryʼs method is valid; the second actuaryʼs is not. (C) The second actuaryʼs method is valid; the first actuaryʼs is not. (D) Both methods are valid, but they may produce different results from the same sequence of random numbers. (E) Both methods are valid, and will produce identical results from the same sequence of random numbers. 10.22 (3, 11/00, Q.44) (1.25 points) Determine which of the following ranges contains the ratio F/G. (A) 0.0 < F/G ≤ 0.4 (B) 0.4 < F/G ≤ 0.8 (C) 0.8 < F/G ≤ 1.2 (D) 1.2 < F/G ≤ 1.6 (E) 1.6 < F/G
10.23 (SOA3, 11/04, Q.33 & 2009 Sample Q.131) (2.5 points) You are simulating the gain/loss from insurance where: (i) Claim occurrences follow a Poisson process with λ = 2/3 per year. (ii) Each claim amount is 1, 2 or 3 with p(1) = 0.25, p(2) = 0.25, and p(3) = 0.50. (iii) Claim occurrences and amounts are independent. (iv) The annual premium equals expected annual claims plus 1.8 times the standard deviation of annual claims. (v) i = 0 You use 0.75, 0.60, 0.40, and 0.20 from the unit interval to simulate time between claims, where small numbers correspond to longer times. You use 0.30, 0.60, 0.20, and 0.70 from the unit interval to simulate claim size, where small numbers correspond to smaller claims. Calculate the gain or loss during the first 2 years from this simulation. (A) loss of 5 (B) loss of 4 (C) 0 (D) gain of 4 (E) gain of 5
Solutions to Problems:
10.1. C. Small random numbers correspond to small times between families, so set u = F(t) = 1 - e^(-1.8t). t = -ln(1 - u)/1.8. The distribution function of family size is:
Size of Family:  1      2      3      4      5      6      7      8      9      10
f:               0.10   0.15   0.22   0.20   0.12   0.08   0.06   0.04   0.02   0.01
F:               0.10   0.25   0.47   0.67   0.79   0.87   0.93   0.97   0.99   1.00
i    u1        Interarrival Time -ln(1-u1)/1.8    Time of Family    u2        # of People
1    0.3869    0.2718                             0.2718            0.7945    6
2    0.6736    0.6220                             0.8938            0.3583    3
3    0.6672    0.6112                             1.5050
                                                                    SUM       9
We simulate 2 families with a total of 9 people.
10.2. E. Large random numbers correspond to large times between accidents, so set u = F(t) = 1 - e^(-λt). t = -ln(1 - u)/λ = -25 ln(1-u).
i    u1       Interarrival Time -25 ln(1-u1)    Time of Claim    u2       Size of Claim (no inflation)    Inflation Factor    Discount Factor    Present Value
1    0.166    4.538                             4.538            0.217    1085                            1.144               0.736              913
2    0.389    12.316                            16.855           0.590    2950                            1.646               0.320              1552
3    0.581    21.747                            38.602
                                                                                                                              SUM                2465
For example, the first claim is at time 4.538 years, 1.03^4.538 = 1.144, and 1/1.07^4.538 = 0.736.
10.3. B. Large random numbers correspond to large times between accidents, so set u = F(t) = 1 - e^(-3t). t = -ln(1-u)/3.
i    u1        Interarrival Time -ln(1-u1)/3    Time of Accident    u2        # of Claimants
1    0.8094    0.5525                           0.5525              0.6754    2
2    0.7337    0.4410                           0.9936              0.5523    1
3    0.5524    0.2680                           1.2615
                                                                    SUM       3
We simulate 2 accidents with a total of 3 claimants. 10.4. E. In the previous solution, only 2 accidents occur tomorrow.
10.5. E. Large random numbers correspond to large times between accidents, so set u = F(t) = 1 - e^(-λt). t = -ln(1-u)/λ. We have four independent Poisson Processes. 1-claimant accidents have λ = 1.8. 2-claimant accidents have λ = 0.6. 3-claimant accidents have λ = 0.3. 4-claimant accidents have λ = 0.3.
1-claimant accidents (λ = 1.8):
i    u         Interarrival Time -ln(1-u)/λ    Time of Accident
1    0.8094    0.9209                          0.9209
2    0.7337    0.7351                          1.6560
2-claimant accidents (λ = 0.6):
1    0.5524    1.3398                          1.3398
3-claimant accidents (λ = 0.3):
1    0.1900    0.7024                          0.7024
2    0.7752    4.9751                          5.6776
4-claimant accidents (λ = 0.3):
1    0.0802    0.2787                          0.2787
2    0.9199    8.4149                          8.6936
We simulate 1 accident with 1 claimant, 0 accidents with 2 claimants, 1 accident with 3 claimants, and 1 accident with 4 claimants, for a total of 3 accidents with 8 claimants.
10.6. A. In the previous solution, there are accidents at times: 0.2787, 0.7024, and 0.9209. The third accident to occur, at 0.9209, has 1 claimant.
10.7. E. Construct a Distribution Table of the Poisson Distribution, for λ = 3.
n    f(n)      F(n)
0    0.0498    0.0498
1    0.1494    0.1991
2    0.2240    0.4232
3    0.2240    0.6472
4    0.1680    0.8153
5    0.1008    0.9161
6    0.0504    0.9665
7    0.0216    0.9881
8    0.0081    0.9962
The first random number .8094 is first exceeded by F(4) = .8153, so we simulate 4 accidents. Using the next random number .7337 < .8, the first accident has 2 claimant. .5524 < .6, thus the second accident has 1 claimant. .1900 < .6, thus the third accident has 1 claimant. .7752 < .8, thus the fourth accident has 2 claimants. We have simulated a total of 6 claimants. Comment: Unlike the previous questions, we have not determined when each accident occurred.
10.10. B. There are 10 coins lost in the 60 minutes, for a total of 80. (λ = 0.2)
i     U1       Interarrival Time Xi = -ln(1-U1)/λ    Time of Claim (Sum of Xi)    U2       Type of Coin
1     0.538    3.8610                                3.8610                       0.080    5
2     0.664    5.4532                                9.3142                       0.142    5
3     0.502    3.4858                                12.7999                      0.291    5
4     0.937    13.8231                               26.6231                      0.417    5
5     0.764    7.2196                                33.8427                      0.511    10
6     0.636    5.0530                                38.8957                      0.726    10
7     0.607    4.6697                                43.5654                      0.929    25
8     0.226    1.2809                                44.8463                      0.202    5
9     0.373    2.3340                                47.1804                      0.322    5
10    0.392    2.4879                                49.6683                      0.433    5
11    0.923    12.8197                               62.4880
                                                                                  SUM      80
Comment: We expect 12 coins.
10.11. E. The coins worth 5 have an intensity of: (.5)(.2) = .1.
i    Uniform Random Number U1    Interarrival Time Xi = -ln(1-Ui)/λ    Time of Claim (Sum of Xi)
1    0.538                       7.7219                                7.7219
2    0.664                       10.9064                               18.6283
3    0.502                       6.9716                                25.5999
4    0.937                       27.6462                               53.2461
5    0.764                       14.4392                               67.6853
There are 4 coins worth 5 in the 60 minutes. The coins worth 10 have an intensity of: (.3)(.2) = .06.
i    Uniform Random Number U1    Interarrival Time Xi = -ln(1-Ui)/λ    Time of Claim (Sum of Xi)
1    0.636                       16.8434                               16.8434
2    0.607                       15.5658                               32.4091
3    0.226                       4.2697                                36.6788
4    0.373                       7.7801                                44.4590
5    0.392                       8.2930                                52.7520
6    0.923                       42.7325                               95.4845
There are 5 coins worth 10 in the 60 minutes. The coins worth 25 have an intensity of: (.2)(.2) = .04.
i    Uniform Random Number U1    Interarrival Time Xi = -ln(1-Ui)/λ    Time of Claim (Sum of Xi)
1    0.080                       2.0845                                2.0845
2    0.142                       3.8288                                5.9133
3    0.291                       8.5975                                14.5108
4    0.417                       13.4892                               28.0000
5    0.511                       17.8848                               45.8848
6    0.726                       32.3657                               78.2505
There are 5 coins worth 25 in the 60 minutes. Total money lost: (4)(5) + (5)(10) + (5)(25) = 195.
10.12. D. and 10.13. E. For the size of claim, u = F(x) = 1 - exp[-(x/1000)^0.7]. x = 1000 {-ln(1-u)}^(1/0.7). (λ = 25)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/λ    Cumulative Interarrival Time (Sum of Xi)    Uniform Random Number    Size of Claim
1    0.128                       0.0055                                0.0055                                      0.830                    2264
2    0.336                       0.0164                                0.0219                                      0.583                    826
3    0.192                       0.0085                                0.0304                                      0.268                    189
4    0.946                       0.1168                                0.1471                                      0.865                    2697
5    0.744                       0.0545                                0.2016                                      0.467                    516
2264 + 826 + 189 + 2697 = 5976. The second claim of size greater than 2000 occurs at time 0.1471.
10.14. D. Small random numbers correspond to small times between cabs, so set u = F(t) = 1 - e^(-.5t). t = -2 ln(1-u).
i    u1       Interarrival Time -2 ln(1-u1)    Time of Cab    u2       # of People
1    0.600    1.8326                           1.8326         0.757    3
2    0.354    0.8739                           2.7065         0.372    2
3    0.529    1.5058                           4.2123         0.945    5
4    0.959    6.3884                           10.6007
                                                              SUM      10
We simulate 3 cabs with a total of 10 people during the first 10 minutes.
10.15. A. Small random numbers correspond to large times between envelopes, so set u = S(t) = e^(-50t). t = -ln(u)/50.
i    u1       Interarrival Time -ln(u1)/50    Time of Envelope    u2       # of Claims    Total # of Claims
1    0.424    0.0172                          0.0172              0.574    3              3
2    0.844    0.0034                          0.0206              0.539    3              6
3    0.562    0.0115                          0.0321              0.434    2              8
4    0.612    0.0098                          0.0419              0.853    4              12
We first have 10 or more claims at t = 0.0419. Comment: Situation taken from 3, 11/01, Q.30.
10.16. B. The times between arrival of claims are Exponential with mean 1/3 of an hour. (λ = 3)
i    Uniform Random Number Ui    Interarrival Time Xi = -ln(1-Ui)/3    Time of Claim (Sum of Xi)
1    0.282                       0.1104                                0.1104
2    0.046                       0.0157                                0.1261
3    0.761                       0.4771                                0.6032
4    0.171                       0.0625                                0.6657
5    0.851                       0.6346                                1.3003
4 claims arrive during the first hour, at times: 0.1104, 0.1261, 0.6032, 0.6657. The time to process the first claim is: -(1/4) ln[1 - 0.084] = 0.0219. So Klem is done processing the first claim at time: 0.1104 + 0.0219 = 0.1323, and can start processing the second claim which arrived at time 0.1261. The time to process the second claim is: -(1/4) ln[1 - 0.889] = 0.5496. So Klem is done processing the second claim at time: 0.1323 + 0.5496 = 0.6819, and can start processing the third claim which arrived at time 0.6032. The time to process the third claim is: -(1/4) ln[1 - 0.201] = 0.0561. So Klem is done processing the third claim at time: 0.6819 + 0.0561 = 0.7380, and can start processing the fourth claim which arrived at time 0.6657. The time to process the fourth claim is: -(1/4) ln[1 - 0.671] = 0.2779. So Klem is done processing the fourth claim at time: 0.7380 + 0.2779 = 1.0159. Thus at time 1, Klem is working on a claim, and there are no others waiting to be processed. Comment: An example of a queue with a single server.
10.17. C. Large random numbers correspond to large times between accidents, so set u = F(t) = 1 - e^(-λt). t = -ln(1-u)/.35. Large random numbers correspond to large sizes, so set u = F(x) = 1 - {10000/(10000 + x)}^4. ⇒ x = 10000 {(1-u)^(-1/4) - 1}.
i    u1       Interarrival Time -ln(1-u1)/.35    Time of Claim    u2       Size of Claim (no inflation)    Inflation Factor    Size of Claim
1    0.077    0.229                              0.229            0.833    5643                            1.022               5768
2    0.333    1.157                              1.386            0.628    2805                            1.141               3201
3    0.256    0.845                              2.231            0.684    3338                            1.237               4128
4    0.857    5.557                              7.788
                                                                                                           SUM                 13,096
10.18. D. Simulate the waiting times between events, using -ln(1-u)/2, since large ⇔ large:
u        -ln(1-u)/2    Time
0.188    0.1041        0.1041
0.326    0.1973        0.3014
0.272    0.1587        0.4601
0.551    0.4004        0.8605
0.294    0.1741        1.0346
The fifth simulated event is beyond the end of the first year. There are 4 events the first year. The severity distribution is:
Loss    Probability    Distribution
250     0.10           0.10
500     0.25           0.35
800     0.30           0.65
1,000   0.25           0.90
2,000   0.05           0.95
3,000   0.05           1.00
.539 is first exceeded by F(800), so the first event is of size 800. .933 is first exceeded by F(2000), so the second event is of size 2000. .721 is first exceeded by F(1000), so the third event is of size 1000. .316 is first exceeded by F(500), so the fourth event is of size 500. The reinsurer would pay after the per event deductible: 300 + 1500 + 500 + 0 = 2300. Subtracting the annual aggregate deductible, the reinsurer pays 2300 - 1000 = 1300.
10.19. B. Large random numbers correspond to large times between accidents, so set u = F(t) = 1 - e^(-λt). t = -ln(1-u)/2.2. Large random numbers correspond to large sizes, so set u = F(x) = 1 - (5000/x)^3. ⇒ x = 5000 (1-u)^(-1/3).
i    u1       Interarrival Time -ln(1-u1)/2.2    Time of Loss    u2       Size of Loss
1    0.510    0.324                              0.324           0.143    5,264
2    0.167    0.083                              0.407           0.431    6,034
3    0.947    1.335                              1.743           0.888    10,373
4    0.316    0.173                              1.915           0.562    6,584
5    0.161    0.080                              1.995           0.876    10,027
6    0.693    0.537                              2.532           0.678    7,295
7    0.918    1.137                              3.669
                                                                 SUM      45,576
10.20. D. For each Exponential interevent time, u = 1 - e^(-t/θ). ⇒ t = -θ ln(1 - u). Fire claims (θ = 20):
i    Uniform Random Number Ui    Interarrival Time Xi = -θ ln(1-Ui)    Cumulative Interarrival Time (Sum of Xi)
1    0.525                       14.8888                               14.8888
2    0.559                       16.3742                               31.2630
There is 1 fire claim prior to time 30. For the Pareto, u = 1 - {2000/(x+2000)}^3. ⇒ (1 - u)^(-1/3) = (x+2000)/2000.
⇒ x = 2000 {(1 - u)^(-1/3) - 1} = 2000 {(1 - .700)^(-1/3) - 1} = 988.
Theft claims (θ = 10):
i    Uniform Random Number Ui    Interarrival Time Xi = -θ ln(1-Ui)    Cumulative Interarrival Time (Sum of Xi)
1    0.251                       2.8902                                2.8902
2    0.609                       9.3905                                12.2806
3    0.253                       2.9169                                15.1975
4    0.217                       2.4462                                17.6438
5    0.506                       7.0522                                24.6960
6    0.078                       0.8121                                25.5081
7    0.525                       7.4444                                32.9525
There are 6 theft claims prior to time 30. For the Weibull, u = 1 - exp[-(x/400)^2]. ⇒ x = 400 {-ln(1 - u)}^(1/2).
The sizes of the 6 theft claims are:
400 {-ln(1 - 0.117)}^(1/2) = 141. 400 {-ln(1 - 0.416)}^(1/2) = 293. 400 {-ln(1 - 0.811)}^(1/2) = 516.
400 {-ln(1 - 0.648)}^(1/2) = 409. 400 {-ln(1 - 0.866)}^(1/2) = 567. 400 {-ln(1 - 0.892)}^(1/2) = 597.
Thus including the fire claim, the simulated aggregate loss in the first 30 days is: 988 + 141 + 293 + 516 + 409 + 567+ 597 = 3511. 10.21. D. These are two valid techniques. The second actuary makes use of thinning, the mathematical fact that finding the different types of coins are independent Poissons. The two methods will almost always give different answers for a single simulation run using the same sequence of random numbers. Comment: As seen in the next solution, the two methods do not even employ on average the same number of random numbers, let alone produce the same result. While the results for a given simulation run differ, the results of a simulation experiment consisting of a very large number of simulation runs would be the same using either technique.
10.22. E. We expect 120 coins on average. Therefore, the first actuary needs to simulate on average 121 interevent times, 120 of which he uses and one of which corresponds to an arrival time which exceeds the 60 minute length of Tomʼs walk. Each interevent time is simulated as: -ln(u)/2, and uses one random number from [0, 1]. Then the first actuary on average needs an additional 120 random numbers to determine the denominations of the coins that have been found, for a total of: 120 + 121 = 241 random numbers. We expect (60%)(120) = 72 coins worth 1, (20%)(120) = 24 coins worth 5, and (20%)(120) = 24 coins worth 10. Therefore, the second actuary expects to simulate on average 73 interevent times for coins worth 1, 25 interevent times for coins worth 5, and 25 interevent times for coins worth 10. This requires: 73 + 25 + 25 = 123 random numbers. (# random numbers needed by actuary 1)/(# random numbers needed by actuary 2) = 241/123 = 1.96. Comment: The second method is more efficient. This increased efficiency is made possible by use of the mathematical fact that the process for each type of coin is an independent Poisson Process.
10.23. E. Simulate the interevent times by setting e^(-2t/3) = u. ⇒ t = -1.5 ln(u). u = .75 ⇒ t = .432. u = .6 ⇒ t = .766. u = .4 ⇒ t = 1.374. Thus claims occur at times .432, .432 + .766 = 1.198, and 1.198 + 1.374 = 2.572, where the last one is beyond the 2 year time frame. For the discrete severity, F(1) = .25, F(2) = .5, F(3) = 1. Simulate size of claims: .25 ≤ u = .3 < .5 ⇒ size = 2. .5 ≤ u = .6 < 1 ⇒ size = 3. Simulated Aggregate Loss in the two years is: 2 + 3 = 5. Mean severity is 2.25. Second moment of severity is 5.75. Mean annual aggregate is: (2/3)(2.25) = 1.5. Variance of annual aggregate is: (2/3)(5.75) = 3.833. Annual premium is: 1.5 + (1.8)√3.833 = 5.02. For two years: Premiums - losses = (2)(5.02) - 5 = 5.04.
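The premium arithmetic in solution 10.23 is easy to check in a few lines. Here is a small Python sketch of that check, my own illustration of the compound Poisson moment formulas used above:

    import math

    lam = 2.0 / 3.0                                # Poisson claim frequency
    severity = {1: 0.25, 2: 0.25, 3: 0.50}         # claim amount distribution

    mean_sev   = sum(x * p for x, p in severity.items())      # 2.25
    second_sev = sum(x * x * p for x, p in severity.items())  # 5.75

    agg_mean = lam * mean_sev                      # mean of annual aggregate losses
    agg_var  = lam * second_sev                    # variance of a compound Poisson
    premium  = agg_mean + 1.8 * math.sqrt(agg_var)
    print(round(premium, 2))                       # about 5.02, as in the solution above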
Section 11, Simulating Aggregate Losses and Compound Models
The simulation methods already discussed can be extended to compound models, such as compound frequency distributions or aggregate losses.
Aggregate Losses:64
Assume frequency and severity are independent. Then if one can simulate a random draw from the frequency distribution and random draws from the severity distribution, one can simulate the total losses in a year.
1. Simulate the number of losses (for a year).
2. Simulate the size of each loss.
3. Apply any policy provisions.
For example, assume we simulate that there are 3 claims in a year. Then we simulate 3 random draws from the severity distribution and sum the resulting claims. This technique works equally well for either simple examples with discrete severity distributions65 or in an example involving distributions from the Appendices of Loss Models.
Exercise: Assume frequency and severity are independent. The number of claims in a year has density:
Number of Claims:  0     1     2     3
Probability:       30%   40%   20%   10%
The severity density is:
Size of Claim:  100   200   300
Probability:    25%   60%   15%
Given the following random numbers from (0,1): 0.82, 0.54, 0.17, 0.91, simulate the total losses in a year.
64 Note that Loss Models presents a number of techniques of calculating the distribution of aggregate losses. Here we are ignoring those clever techniques in favor of using simulation. It should be noted that if one had the (approximate) distribution function of the aggregate losses in a table, one could use the Inversion Method (Table Lookup) in order to simulate years of losses.
65 Think of an example with dice and spinners that is used to illustrate credibility ideas.
[Solution: Apply the Inverse Transform Algorithm to first simulate the number of losses and then to simulate the amount of each loss. The Distribution Function for frequency is:
x    f(x)    F(x)
0    0.3     0.3
1    0.4     0.7
2    0.2     0.9
3    0.1     1
Thus since 0.7 ≤ 0.82 and 0.9 > 0.82, we simulate 2 claims. The Distribution Function for severity is:
x      f(x)    F(x)
100    0.25    0.25
200    0.6     0.85
300    0.15    1
Therefore, since 0.25 ≤ 0.54 and 0.85 > 0.54, the first simulated loss is of size 200. Since 0.25 > 0.17, the second simulated loss is of size 100. The simulated losses total to: 200 + 100 = 300.]

Compound Frequency Distributions:

A compound frequency distribution has a primary and secondary distribution, each of which is a frequency distribution.66 First one simulates a random draw from the primary distribution, N. Then one simulates N independent random draws from the secondary distribution, and sums the result.67

For example, assume we have a Poisson-Binomial distribution, with parameters λ = 3.3, q = 0.4, m = 5. First simulate a random draw from the Poisson with λ = 3.3, via the Inverse Transform Algorithm. Then simulate random draw(s) from the Binomial Distribution with q = 0.4, m = 5, via the Inverse Transform Method. Assume we have the following random numbers from (0,1): 0.402, 0.603, 0.184, 0.877, 0.734, 0.439, 0.518, 0.121. For the primary Poisson Distribution, calculate a table of values for f(x) and take the cumulative sum to get F(x). F(x) first exceeds 0.402 at 3.
66 See Loss Models or “Mahlerʼs Guide to Frequency Distributions.”
67 Thus the secondary distribution takes on the same role as a severity distribution in the simulation of aggregate losses.
Poisson with λ = 3.3:
Number of Claims    Probability Density Function    Cumulative Distribution Function
0                    3.688%                         0.03688
1                   12.171%                         0.15860
2                   20.083%                         0.35943
3                   22.091%                         0.58034
4                   18.225%                         0.76259
5                   12.029%                         0.88288
6                    6.616%                         0.94903
7                    3.119%                         0.98022
8                    1.287%                         0.99309
9                    0.472%                         0.99781
10                   0.156%                         0.99936
11                   0.047%                         0.99983
Thus we need to simulate 3 random draws from the secondary Binomial Distribution with q = 0.4, and m = 5.
Number of Claims    Probability Density Function    Cumulative Distribution Function
0                    7.776%                         0.07776
1                   25.920%                         0.33696
2                   34.560%                         0.68256
3                   23.040%                         0.91296
4                    7.680%                         0.98976
5                    1.024%                         1.00000
One cumulates the densities to get the distribution function. The next random number is 0.603; F(2) = 0.68256 > 0.603, so we simulate two losses. We simulate two more random draws from this Binomial: F(1) > 0.184, F(3) > 0.877. Thus the total number of losses is: 2 + 1 + 3 = 6.

Simulating a Per Claim Deductible and Maximum Aggregate Annual Payment (Stop Loss):68
• Frequency is Negative Binomial with r = 3 and β = 2.69
• Severity is Weibull with θ = 600 and τ = 2.70
• A deductible of 250 is applied to each claim.
• The insured pays (retains) no more than a total of 1000 in any given year.71

68 See Examples 21.7 and 21.9 in Loss Models. We can have other combinations of policy provisions.
69 We could have different frequency distributions.
70 We could have different severity distributions.
71 We note that in any year in which the insured has four or fewer losses, the stop loss can not have an effect.
The steps in order to simulate a year:
• Simulate a random draw from the Negative Binomial in order to determine the number of losses.72
• Simulate each size of loss from the Weibull.73
• Apply the deductible of 250 to each loss, keeping track of how much the insured retains.
• Limit the insured retention for the year to 1000.
• Insurer pays: sum of losses minus total amount retained by the insured.

Exercise: In a year we have 6 simulated losses: 245, 285, 220, 590, 65, 1090. How much is retained by the insured and how much is paid by the insurer?
[Solution: Applying the deductible, the insured would retain: 245, 250, 220, 250, 65, 250. However, this totals 1280, more than the 1000 retention limit; the insured retains a total of 1000. The losses sum to 2495. Thus the insurer pays a total of: 2495 - 1000 = 1495.
Comment: In this simulated year, the insurer paid more due to the maximum annual retention.]

Here is a histogram of the annual amount paid by the insurer for 100,000 simulations:
[Histogram omitted: relative frequency (vertical axis, roughly 0.02 to 0.10) versus the annual amount paid by the insurer (horizontal axis, ticks at 5000, 10,000, and 15,000).]
The estimated 95th percentile of the distribution of the annual amount paid by the insurer is 6692.74

72 Using the method of inversion.
73 Using the method of inversion.
74 This is similar to 6668 estimated by the authors of Loss Models using 11,476 simulated years.
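For readers who want to experiment, here is a short Python sketch of this stop-loss simulation. It is only an illustration of the steps above, under the stated Negative Binomial and Weibull assumptions; the function names, the number of simulated years, and the seed are arbitrary choices of mine rather than anything from Loss Models.

import math
import random

def simulate_negative_binomial(r, beta, u):
    """Invert the Negative Binomial CDF: return the smallest n with F(n) >= u."""
    f = (1.0 + beta) ** (-r)        # f(0)
    F, n = f, 0
    while F < u:
        # Recursion: f(n+1) = f(n) * (beta/(1+beta)) * (n+r)/(n+1)
        f *= (beta / (1.0 + beta)) * (n + r) / (n + 1)
        n += 1
        F += f
    return n

def simulate_weibull(theta, tau, u):
    """Invert the Weibull CDF F(x) = 1 - exp(-(x/theta)^tau)."""
    return theta * (-math.log(1.0 - u)) ** (1.0 / tau)

def simulate_year(r=3, beta=2, theta=600, tau=2, deductible=250, agg_max=1000, rng=random):
    """One simulated year: the insurer pays total losses minus the insured's capped retention."""
    n = simulate_negative_binomial(r, beta, rng.random())
    losses = [simulate_weibull(theta, tau, rng.random()) for _ in range(n)]
    retained = sum(min(x, deductible) for x in losses)   # per-claim deductible
    retained = min(retained, agg_max)                    # annual aggregate maximum retention
    return sum(losses) - retained                        # amount paid by the insurer

# Estimate the 95th percentile of the insurer's annual payment from many simulated years.
random.seed(1)
results = sorted(simulate_year() for _ in range(100_000))
print("Estimated 95th percentile:", results[int(0.95 * len(results))])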
Simulating the Present Value of Aggregate Losses:75

We have already seen how to simulate aggregate losses. However, sometimes one is interested in more than just the total aggregate losses. For example, one might be interested in not only how much was paid, but when it was paid. Loss Models gives an example of simulating the present value of aggregate losses. To calculate present values, one must know the timing of payments and the relevant discount rates. In the simpler case, the discount rates are fixed; in a more complicated case the discount rates are stochastic.

The steps in order to simulate the present value of aggregate losses are:76
1. Simulate the time of occurrence of each loss (as well as how many there are).
2. Simulate the amount of payment.77
3. For each loss, simulate the time from occurrence to payment.78
4. Simulate the discount rate to apply to each payment (if interest rates are stochastic).
For example, assume the times of occurrence are given by a homogeneous Poisson Process.

Exercise: Assume the occurrence of losses is given by a homogeneous Poisson Process on [0,1] with intensity λ = 1.5. Assume we have the following independent random numbers from zero to one: 0.0252, 0.6593, 0.2301, 0.7030, 0.0818. Simulate the timing of losses that occur from time = 0 to time = 1. Use the Inverse Transform Method to simulate the exponential interevent times.
[Solution: Ui = F(Xi) = 1 - exp(-1.5Xi). Then we have, Xi = (-1/1.5) ln(1 - Ui).
i    Uniform Random Number Ui    Interevent Time Xi = -ln(1-Ui)/λ    Cumulative Time (Sum of Xi)
1    0.0252                      0.0170                              0.0170
2    0.6593                      0.7178                              0.7349
3    0.2301                      0.1743                              0.9092
4    0.7030                      0.8093                              1.7185
Thus we have 3 losses at times: 0.0170, 0.7349, and 0.9092.]
75 See Examples 21.6 and 21.8 in Loss Models. While this is a good example of how you can easily add complications to a simulation model in order to better match the real world, in total it is too long for a reasonable exam question. However, pieces of these examples could show up in exam questions.
76 The order of operations can vary somewhat.
77 One would simulate a random draw from a loss distribution. Then one may have to apply coverage modifications. In more complicated simulations, the size of loss may depend on the time of occurrence.
78 In more complicated simulations, the time from occurrence to payment may depend on the amount of the loss. In more complicated simulations there may be multiple payments per loss. In more complicated simulations, the size of payment may depend on the time from occurrence to payment.
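A brief Python sketch of this timing step, shown only as an illustration; the function name is an arbitrary choice of mine, and in practice one would generate the uniform random numbers as needed rather than passing a fixed list.

import math

def poisson_process_times(lam, horizon, uniforms):
    """Event times of a homogeneous Poisson Process on [0, horizon], built from
    exponential interevent times via the inverse transform x = -ln(1 - u)/lam."""
    times, t = [], 0.0
    for u in uniforms:
        t += -math.log(1.0 - u) / lam
        if t > horizon:
            break               # the first arrival beyond the horizon is discarded
        times.append(t)
    return times

# Reproduces the exercise: lambda = 1.5 on [0, 1] with the given random numbers.
print(poisson_process_times(1.5, 1.0, [0.0252, 0.6593, 0.2301, 0.7030, 0.0818]))
# -> approximately [0.0170, 0.7349, 0.9092]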
Next we need to simulate the size of each loss. This can be from any of the loss distributions we have learned how to simulate. Assume for example, losses follow a Pareto Distribution.

Exercise: Losses follow a Pareto Distribution with parameters α = 2 and θ = 10,000. Simulate three losses, with large random numbers corresponding to large losses, given the following random numbers from (0,1): 0.5472, 0.9311, 0.4009.
[Solution: u = 1 - (1 + x/10,000)^(-2). ⇒ x = 10,000 {(1-u)^(-0.5) - 1}. ⇒ x = 4861, 28,097, and 2920.]
Thus we have three losses of sizes: $4861, $28,097, and $2920.79 Let us assume a deductible of $1000 and a maximum covered loss of $25,000.80 Then the insurerʼs payments will be $3861, $24,000, and $1920.81

Next we need to simulate the time from occurrence of each loss to its payment. We can use any distribution, but the Weibull (or its special case the Exponential) is commonly used. One can either have the delay independent of the size of loss, or have it depend on the size of loss and/or payment.

Exercise: Assume in each case that the time from occurrence to payment is given by a Weibull Distribution. The parameters of the Weibull are τ = 0.5 and θ = ln[size of loss/1000]/2. (Thus there is a longer average delay until the payment of larger losses.) Simulate the times from occurrence to payment, with large random numbers corresponding to large delays. Use the following random numbers from (0,1): 0.0629, 0.6804, 0.5173.
[Solution: u = 1 - exp[-(x/θ)^0.5]. ⇒ x = θ {-ln(1-u)}². For the first loss, of size $4861, θ = ln(4.861)/2 = 0.7906. Thus the delay is: (0.7906) {-ln(1-.0629)}² = 0.0033. Similarly, the other delays are: {ln(28.097)/2} {-ln(1-0.6804)}² = 2.1701, and {ln(3.920)/2} {-ln(1-.5173)}² = 0.3624.]
Thus we have simulated three payments: $3861 at time 0.0170 + 0.0033 = 0.0203, $24,000 at time 0.7349 + 2.1701 = 2.9050, and $2920 at time 0.9092 + 0.3624 = 1.2716.82

79 For simplicity we have assumed the sizes of loss do not depend on the time of occurrence. For many applications this is not an unreasonable approximation over the course of a single year.
80 In Examples 21.6 and 21.8 in Loss Models, there are no coverage modifications. As discussed previously, Examples 21.7 and 21.9 in Loss Models illustrate the effect of a per loss deductible combined with an annual aggregate maximum.
81 For simplicity we have ignored any dependence of the size of payment on the time of payment.
82 If we ran the simulation again, we would probably get a different number of losses, different sizes of loss, and different timings. The present value of aggregate losses would vary from run to run, hopefully approximating the variation we would expect to see in the real world situation we are attempting to model.
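Here is a brief Python sketch of these two steps: simulating loss sizes by inverting the Pareto, applying the deductible and maximum covered loss, and then simulating the payment delays from the Weibull with θ = ln[size of loss/1000]/2. It is only an illustration; the helper function names are my own.

import math

def pareto_inverse(alpha, theta, u):
    """Invert the Pareto CDF F(x) = 1 - (theta/(x + theta))^alpha."""
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def weibull_inverse(tau, theta, u):
    """Invert the Weibull CDF F(x) = 1 - exp(-(x/theta)^tau)."""
    return theta * (-math.log(1.0 - u)) ** (1.0 / tau)

deductible, max_covered = 1000, 25000
loss_us  = [0.5472, 0.9311, 0.4009]      # random numbers for loss sizes
delay_us = [0.0629, 0.6804, 0.5173]      # random numbers for payment delays

losses   = [pareto_inverse(2, 10000, u) for u in loss_us]
payments = [min(x, max_covered) - deductible for x in losses]
# Delay until payment: Weibull with tau = 0.5 and theta = ln(loss/1000)/2,
# so larger losses tend to be paid later.
delays = [weibull_inverse(0.5, math.log(x / 1000.0) / 2.0, u)
          for x, u in zip(losses, delay_us)]

for x, p, d in zip(losses, payments, delays):
    print(f"loss {x:,.0f}  payment {p:,.0f}  delay {d:.4f}")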
Exercise: Assuming a 3% rate of interest, what is the sum of the present values of the above payments?
[Solution: $3861/1.03^0.0203 + $24,000/1.03^2.9050 + $2920/1.03^1.2716 = 3859 + 22,025 + 2812 = $28,696.]

Stochastic Interest Rates:83

Rather than a fixed interest rate, one can have the discount factors be random. The discount factors come from some model, but their actual value varies stochastically from simulation run to simulation run.84 For example, assume that the discount factor to get from time t to time t + Δ is given by: 1 / (a random draw from a LogNormal Distribution with µ = 0.03Δ and σ = 0.01√Δ). Also assume that discount factors over disjoint time intervals are independent.85

Exercise: What is the average annual interest rate?
[Solution: Over one year, the inverse discount factor, 1+i, is distributed via a LogNormal with µ = 0.03 and σ = 0.01, with mean exp(0.03 + 0.01²/2) = 1.0305. Thus the average annual interest rate is 3.05%.]

Then in order to get the discounted payments, one needs to first get the discount factor from time 0 to 0.0203, then the discount factor to go from time 0.0203 to time 1.2716, and finally the discount factor from 1.2716 to 2.9050.86

Exercise: Using the random numbers from (0,1): 0.9015, 0.2327, 0.7794, and the inverse transform method, simulate three random draws from a Unit Normal.
[Solution: Consulting the Normal Table: 1.290, -0.730, and 0.770.]

Now the discount factor from time 0 to 0.0203 involves a LogNormal with parameters µ = (0.03)(0.0203) = 0.000609 and σ = (0.01)√0.0203 = 0.00142. Thus the LogNormal corresponding to the first simulated Normal is: exp[(0.00142)(1.290) + 0.000609] = 1.00244. Thus, the discount factor from time 0 to 0.0203 is: 1/1.00244 = 0.9976.

83 There are many ways to model stochastic interest rates. See for example, Derivative Markets by McDonald on Joint Exam MFE/3F, Practical Risk Theory for Actuaries, by Daykin, et. al., or “The Markov Chain Interest Rate Scenario Generator Revisited”, by Sarah Christiansen, Journal of Actuarial Practice, Vol. 2, No. 1, 1994.
84 In the same manner that the timing of loss events was modeled by a single fixed Poisson Process, but the number of loss events and their times will vary from simulation run to simulation run.
85 Thus interest rates are from a Geometric Brownian Motion with drift, as in Example 21.8 in Loss Models.
86 We need to simulate the discount factors over disjoint intervals. The discount factors from 0 to 0.0203 and from 0 to 1.2716 are not independent.
Next we need to simulate the discount factor from time 0.0203 to time 1.2716. The time interval is: 1.2716 - 0.0203 = 1.2513. Thus, we need a LogNormal with µ = (1.2513)(0.03) = 0.03754 and σ = (0.01)√1.2513 = 0.01119. Using a random Normal of -0.730, the random LogNormal is: exp[(0.01119)(-0.730) + 0.03754] = 1.02778. The discount factor is: 1/1.02778 = 0.9730.

Exercise: Simulate the discount factor from time 1.2716 to 2.9050. Use a random Unit Normal of 0.770.
[Solution: The time interval is 2.9050 - 1.2716 = 1.6334. Thus the discount factor is: 1 / exp[(0.01√1.6334)(0.770) + (0.03)(1.6334)] = 0.9429.]

Now that we have the discount factors for each interval we can compute the sum of the discounted payments, as $28,650:
Time      Interval Discount Factor    Cumulative Discount Factor    Payment    Discounted Payment
0.0203    0.9976                      0.9976                         3,859      3,850
1.2716    0.9730                      0.9707                         2,920      2,834
2.9050    0.9429                      0.9152                        24,000     21,966
Sum                                                                 30,779     28,650
For example, the discount factor applied to the third payment at time 2.9050 is: (0.9976)(0.9730)(0.9429) = 0.9152. The discounted value of this payment is: (0.9152)($24,000) = $21,966.
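The bookkeeping above is easy to automate. Below is a brief Python sketch of discounting a set of payments with stochastic LogNormal accumulation factors over disjoint intervals. It is only an illustration of the recipe; since it draws its own Unit Normals (and the seed is arbitrary), it will not reproduce the hand-worked discount factors above.

import math
import random

def discount_payments(payment_times, payments, mu_rate=0.03, sigma_rate=0.01, rng=random):
    """Discount payments using stochastic interest: over an interval of length dt,
    the accumulation factor 1+i is LogNormal with mu = mu_rate*dt and
    sigma = sigma_rate*sqrt(dt), and factors over disjoint intervals are independent."""
    total, cumulative_discount, prev_t = 0.0, 1.0, 0.0
    for t, payment in sorted(zip(payment_times, payments)):
        dt = t - prev_t
        z = rng.normalvariate(0.0, 1.0)                    # random draw from a Unit Normal
        growth = math.exp(sigma_rate * math.sqrt(dt) * z + mu_rate * dt)
        cumulative_discount /= growth                      # discount factor for this interval
        total += payment * cumulative_discount
        prev_t = t
    return total

random.seed(1)
pv = discount_payments([0.0203, 1.2716, 2.9050], [3861, 2920, 24000])
print(f"Present value of the three payments: {pv:,.0f}")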
Uses of the Output: One could run this simulation repeatedly, holding all of the assumptions and inputs constant. Each simulation run would have a possibly different number of losses, different sizes of losses, different timings of losses, and different discount factors. For each run we could record the resulting present value of the aggregate losses.87 Then one could tabulate these results in order to estimate the distribution function of the present value of the aggregate losses. The same thing could be done with any complicated situation. As long as we can (approximately) model the pieces, we can simulate the whole situation. Then for each simulation run we can record appropriate summary statistics. In the case of this example, we recorded the present value of aggregate losses. In another example, we might record the reserve deficiency/redundancy from using a certain method of loss reserving. In another situation, we might simulate and record the surplus of a model insurer after 5 years. In another situation, we might record the insurerʼs payments in a year due to hurricanes hitting the state of Florida. One advantage of the simulation technique is that we can estimate the impact of varying the assumptions or inputs. We can also easily add extra levels of complication. In the example, we were able to add policy modifications, random delays between occurrence and payment, and random interest rates, to a basic simulation of aggregate losses. We could have added additional refinements, testing the impact at each stage. We can also incorporate advances in knowledge. For example, as engineers develop more detailed models of how different aspects of buildings affect their damageability by hurricanes, that can be incorporated into a simulation model of hurricane losses. The simulation technique has the major advantage that we can run a simulation model many times.88 We can run the model as many times as we want in order to get a needed level of accuracy of an estimated quantity of interest.89
87 Plus any other quantities of interest, such as the undiscounted aggregate losses, the ratio of the discounted to undiscounted losses, etc.
88 In contrast one can only observe the insurance data from a given year (at a given report) once.
89 How many simulations to run in order to get a certain level of accuracy is discussed in a subsequent section.
Problems: 11.1 (3 points) Claim frequency is assumed to be given by a Binomial Distribution as per Loss Models with parameters m = 6 and q = 0.6. Claim severity is assumed to be given by a Pareto Distribution as per Loss Models with parameters: α = 4 and θ = 100. Frequency and severity are independent. Use the following set of random numbers from (0,1) : 0.402, 0.264, 0.306, 0.917, 0.877, 0.734, 0.601, 0.439, in order to simulate the total dollars of loss in a year. Apply the Inverse Transform Algorithm to first simulate the number of claims and then to simulate the amount of each claim. A. less than 100 B. at least 100 but less than 105 C. at least 105 but less than 110 D. at least 110 but less than 115 E. at least 115 11.2 (3 points) Simulate a random draw from a compound Geometric-Poisson distribution, as per Loss Models, with parameters β = 3.7 and λ = 2.7. (Frequency is Geometric with β = 3.7, and the amount of each event is Poisson with λ = 2.7.) Use the following random numbers from (0,1): 0.725, 0.184, 0.877, 0.734, 0.439, 0.518, 0.121, 0.402, 0.601. Hint: First simulate a random draw from the Geometric via the Inverse Transform Algorithm. Then simulate random draw(s) from the Poisson via the Inverse Transform Method. A. 9 or less B. 10 or 11 C. 12 or 13 D. 14 or 15 E. 16 or more 11.3 (3 points) For an individual over 65: (i) The number of pharmacy claims per month is Poisson with mean 2.1. (ii) The amount of each pharmacy claim is uniformly distributed between 5 and 95. (iii) The amounts of the claims and the number of claims are mutually independent. Simulate the individualʼs total cost of pharmacy claims for the next month. Simulate the number of claims using the Inverse Transform Method using the following random number from (0, 1): 0.949. Simulate the claim amounts using the following random numbers from (0, 1): 0.157, 0.013, 0.879, 0.583, 0.313, 0.842, 0.108, 0.761, 0.775, 0.516. (A) 50 (B) 100 (C) 150 (D) 200 (E) 250
11.4 (3 points) A television quiz show gives cash prizes. The number of prizes per show, N, and prize amounts, X, have the following distributions:
n    Pr(N = n)        x       Pr(X = x)
1    0.8              0       0.2
2    0.2              200     0.7
                      1000    0.1
An insurer sells stop-loss insurance for prizes with a deductible of 300 (per show). Simulate the sum of the insurerʼs costs for the next three shows.
Simulate the number of prizes using the following random numbers from (0, 1): 0.865, 0.225, 0.940, 0.606, 0.747.
Simulate the prize amounts using the following random numbers from (0, 1): 0.305, 0.563, 0.993, 0.268, 0.127, 0.815, 0.074, 0.168, 0.094, 0.658.
(A) 0 (B) 100 (C) 700 (D) 800 (E) 900 or more
Use the following information for the next 2 questions:
• An insured has a $25,000 per loss deductible. • The insured has an $100,000 annual aggregate maximum; the insured will pay out of pocket no more than $100,000 per year due to his per loss deductible.
• Frequency is assumed to be given by a Poisson Distribution with mean 3.8. • Severity is given by an Inverse ParaLogistic Distribution as per Loss Models with parameters τ = 3 and θ = 10,000. 11.5 (1 point) Given the random number from (0, 1): 0.833, using the method of inversion, simulate the number of losses the insured had in a year. A. 3 or less B. 4 C. 5 D. 6 E. 7 or more 11.6 (3 points) Using the solution to the previous question, the random numbers from (0, 1): 0.271, 0.382, 0.769, 0.910, 0.671, 0.183, 0.909, 0.554, and the method of inversion simulate the size of each loss the insured had. How much does the insurer have to pay in total for the year? A. less than 7,000 B. at least 7,000 but less than 8,000 C. at least 8,000 but less than 9,000 D. at least 9,000 but less than 10,000 E. at least 10,000
11.7 (3 points) You are given: • An insuredʼs claim severity distribution is described by an exponential distribution: F(x) = 1 - e-x/1000.
• The insuredʼs number of claims is described by a negative binomial distribution with: β = 1.2 and r = 2.
• A 500 per claim deductible is in effect. For the next year, simulate the aggregate losses in excess of the deductible. Simulate the number of claims using the Inverse Transform Method and using the following random number from (0, 1): 0.627. Simulate the claim amounts using the following random numbers from (0, 1): 0.005, 0.627, 0.459, 0.904, 0.674, 0.707, 0.055, 0.539, 0.046, 0.401. (A) 500 or less (B) 600 (C) 700 (D) 800 (E) 900 or more 11.8 (3 points) You are given the following information:
• The inverse of the discount factor to get from time t to time t + Δ follows a LogNormal Distribution with parameters µ = 0.05Δ and σ = 0.02√Δ.
• Discount factors over disjoint time intervals are independent. • There are three payments: $1300 at time 0.7, $920 at time 1.2, and $1770 at time 2.8. Using the following random numbers from (0,1): 0.6103, 0.4483, 0.8186, what is the simulated sum of the present values of the payments? A. less than 3,300 B. at least 3,300 but less than 3,400 C. at least 3,400 but less than 3,500 D. at least 3,500 but less than 3,600 E. at least 3,600
11.9 (3 points) A company provides insurance to a concert hall for losses due to power failure. You are given:
(i) The number of power failures in a year has a Poisson distribution with mean 1.6.
(ii) The distribution of ground up losses due to a single power failure is:
x     Probability of x
10    0.3
20    0.3
50    0.4
(iii) The number of power failures and the amounts of losses are independent.
(iv) There is an annual deductible of 30.
Simulate the insurerʼs costs for the next year. Simulate the number of losses using the method of inversion, using the following random number from (0, 1): 0.7712. Simulate the loss amounts using the following random numbers from (0, 1): 0.7030, 0.6433, 0.3163, 0.4930, 0.1484, 0.4551, 0.3824, 0.9541, 0.4858, 0.5900.
(A) 0 (B) 10 (C) 20 (D) 30 (E) 40 or more

11.10 (3 points) N is the random variable for the number of accidents in a single year.
N follows the distribution: Pr(N = n) = 0.3 (0.7)^(n-1), n = 1, 2,…
Xi is the random variable for the claim amount of the ith accident. Each Xi follows a Weibull Distribution, with θ = 1000 and τ = 0.5.
Let U and V1, V2,... be independent random variables following the uniform distribution on (0, 1). You use the inverse transformation method with U to simulate N and Vi to simulate Xi.
You are given the following random numbers for the first simulation:
u        v1       v2       v3       v4       v5       v6
0.712    0.203    0.098    0.546    0.471    0.820    0.675
Calculate the total amount of claims during the year for the first simulation.
A. less than 100 B. at least 100 but less than 250 C. at least 250 but less than 500
D. at least 500 but less than 1000 E. at least 1000
11.11 (2 points) You are given the following information:
• A six-sided die is used to determine the number of losses in a year. • The die has three sides marked with a 0, two sides marked with a 1, and one side marked with a 2. • In addition a spinner represents severity.
• An area marked $30 takes up 75% of the spinner, while an area marked $50 takes up the remaining 25% of the spinner.
You simulate a year of losses, followed by another year, followed by a third year. Use the random numbers from (0,1): 0.481, 0.934, 0.815, 0.333, 0.778, 0.121, 0.602. What is the sum of the losses for all three years?
A. $80 or less B. $90 C. $100 D. $110 E. $120 or more

11.12 (3 points) You are given:
(i) The cumulative distribution for the annual number of losses for a policyholder is:
n    FN(n)
0    0.3015
1    0.4031
2    0.5557
3    0.6936
4    0.7924
5    0.8634
6    0.9141
7    0.9476
8    0.9688
9    0.9819
...  ...
(ii) The loss amounts follow the Single Parameter Pareto distribution with θ = 1000 and α = 3.
(iii) The inversion method is used to simulate the number of losses and loss amounts for a policyholder.
(iv) For the number of losses use the random number 0.6645.
(v) For loss amounts use the random numbers: 0.7283 0.1525 0.5773 0.4618 0.1335 0.2380 0.8814
Use the random numbers in order and only as needed.
Based on the simulation, calculate the insurerʼs aggregate losses for this policyholder.
A. less than 3600 B. at least 3600 but less than 3700 C. at least 3700 but less than 3800
D. at least 3800 but less than 3900 E. at least 3900
11.13 (3 points) An insurance company sold 20 fire insurance policies as follows:
Number of Policies    Policy Maximum    Probability of Claim Per Policy
8                     500               0.05
12                    300               0.10
You are given:
(i) The claim amount for each policy is uniformly distributed between 0 and the policy maximum.
(ii) The probability of more than one claim per policy is 0.
(iii) Claim occurrences are independent.
Simulate a year of aggregate losses. Simulate the number of losses using the Inverse Transform Method using the following random numbers from (0, 1): 0.767, 0.700, 0.859, 0.835, 0.046, 0.789, 0.259, 0.657, 0.551, 0.658, 0.556, 0.150, 0.969, 0.329, 0.015, 0.206, 0.195, 0.492, 0.473, 0.271.
Simulate the loss amounts using the following random numbers from (0, 1): 0.414, 0.582, 0.336, 0.083, 0.647, 0.881, 0.477, 0.248, 0.550, 0.092.
A. 0 B. 125 C. 200 D. 300 E. 375

11.14 (3 points) Aggregate losses for a policy are modeled as follows:
(i) The number of losses has a Poisson distribution with λ = 4.
(ii) The amount of each loss has a Burr distribution with α = 3, θ = 100,000, and γ = 2.
(iii) The number of losses and the amounts of the losses are mutually independent.
(iv) Each claim is subject to a maximum covered loss of 50,000.
Use the random number 0.773 to simulate the number of losses. Use the following random numbers to simulate the sizes of loss: 0.062, 0.217, 0.355, 0.781, 0.585, 0.808, 0.485, 0.624.
What are the simulated total payments on this policy by the insurer?
A. 140,000 B. 160,000 C. 180,000 D. 200,000 E. 220,000
11.15 (SOA3, 11/03, Q.5 & 2009 Sample Q.82) (2.5 points) N is the random variable for the number of accidents in a single year. N follows the distribution: Pr(N = n) = 0.9 (0.1)n-1, n = 1, 2,… Xi is the random variable for the claim amount of the ith accident. Xi follows the distribution: g(xi) = 0.01 e-0.01xi , xi > 0, i = 1, 2,... Let U and V1 , V2 ,... be independent random variables following the uniform distribution on (0, 1). You use the inverse transformation method with U to simulate N and Vi to simulate Xi with small values of random numbers corresponding to small values of N and Xi. You are given the following random numbers for the first simulation: u v1 v2 v3 v4 0.05 0.30 0.22 0.52 0.46 Calculate the total amount of claims during the year for the first simulation. (A) 0 (B) 36 (C) 72 (D) 108 (E) 144 11.16 (CAS3, 11/04, Q.37) (2.5 points) An actuary is simulating annual aggregate loss for a product liability policy, where claims occur according to a binomial distribution with parameters n = 4 and p = 0.5, and severity is given by an exponential distribution with parameter θ = 500,000. The number of claims is simulated using the inverse transform method (where small random numbers correspond to small claim sizes) and a random value of 0.58 from the uniform distribution on [0,1]. The claim severities are simulated using the inverse transform method (where small random numbers correspond to small claim sizes) using the following values from the uniform distribution on [0,1]: 0.35, 0.70, 0.61, 0.20. Calculate the simulated annual aggregate loss for the product liability policy. A Less than 250,000 B. At least 250,000, but less than 500,000 C. At least 500,000, but less than 750,000 D. At least 750,000, but less than 1,000,000 E. At least 1,000,000
11.17 (SOA3, 11/04, Q.6 & 2009 Sample Q.122) (2.5 points) You are simulating a compound claims distribution: (i) The number of claims, N, is binomial with m = 3 and mean 1.8. (ii) Claim amounts are uniformly distributed on {1, 2, 3, 4, 5}. (iii) Claim amounts are independent, and are independent of the number of claims. (iv) You simulate the number of claims, N, then the amounts of each of those claims, X1 , X2 ,… , XN. Then you repeat another N, its claim amounts, and so on until you have performed the desired number of simulations. (v) When the simulated number of claims is 0, you do not simulate any claim amounts. (vi) All simulations use the inverse transform method, with low random numbers corresponding to few claims or small claim amounts. (vii) Your random numbers from (0, 1) are 0.7, 0.1, 0.3, 0.1, 0.9, 0.5, 0.5, 0.7, 0.3, and 0.1. Calculate the aggregate claim amount associated with your third simulated value of N. (A) 3 (B) 5 (C) 7 (D) 9 (E) 11 11.18 (SOA3, 11/04, Q.34 & 2009 Sample Q.132) (2.5 points) Annual dental claims are modeled as a compound Poisson process where the number of claims has mean 2 and the loss amounts have a two-parameter Pareto distribution with θ = 500 and α = 2. An insurance pays 80% of the first 750 of annual losses and 100% of annual losses in excess of 750. You simulate the number of claims and loss amounts using the inverse transform method with small random numbers corresponding to small numbers of claims or small loss amounts. The random number to simulate the number of claims is 0.8. The random numbers to simulate loss amounts are 0.60, 0.25, 0.70, 0.10 and 0.80. Calculate the total simulated insurance claims for one year. (A) 294 (B) 625 (C) 631 (D) 646 (E) 658 11.19 (4, 11/05, Q.8 & 2009 Sample Q.220) (2.9 points) Total losses for a group of insured motorcyclists are simulated using the aggregate loss model and the inversion method. The number of claims has a Poisson distribution with λ = 4. The amount of each claim has an exponential distribution with mean 1000. The number of claims is simulated using u = 0.13. The claim amounts are simulated using u1 = 0.05, u2 = 0.95, and u3 = 0.10 in that order, as needed. Determine the total losses. (A) 0 (B) 51 (C) 2996
(D) 3047
(E) 3152
11.20 (4, 11/06, Q.4 & 2009 Sample Q.249) (2.9 points) You are given:
(i) The cumulative distribution for the annual number of losses for a policyholder is:
n    FN(n)
0    0.125
1    0.312
2    0.500
3    0.656
4    0.773
5    0.855
...  ...
(ii) The loss amounts follow the Weibull distribution with θ = 200 and τ = 2. (iii) There is a deductible of 150 for each claim subject to an annual maximum out-of-pocket of 500 per policy. The inversion method is used to simulate the number of losses and loss amounts for a policyholder. (a) For the number of losses use the random number 0.7654. (b) For loss amounts use the random numbers: 0.2738 0.5152 0.7537 0.6481 0.3153 Use the random numbers in order and only as needed. Based on the simulation, calculate the insurerʼs aggregate payments for this policyholder. (A) 106.93 (B) 161.32 (C) 224.44 (D) 347.53 (E) 520.05 11.21 (4, 5/07, Q.9) (2.5 points) You are given: (i) For a company, the workers compensation lost time claim amounts follow the Pareto distribution with α = 2.8 and θ = 36. (ii) The cumulative distribution of the frequency of these claims is: Frequency Cumulative Probability 0 0.5556 1 0.8025 2 0.9122 3 0.9610 4 0.9827 5 0.9923 ... ... (iii) Each claim is subject to a deductible of 5 and a maximum payment of 30. Use the uniform (0, 1) random number 0.981 and the inversion method to generate the simulated number of claims. Use as many of the following uniform (0, 1) random numbers as necessary, beginning with the first, and the inversion method to generate the claim amounts. 0.571 0.932 0.303 0.471 0.878 Calculate the total of the companyʼs simulated claim payments. (A) 37.7 (B) 41.9 (C) 56.8 (D) 64.9 (E) 84.9
Solutions to Problems:

11.1. B. The first step is to generate a random draw from the given Binomial. For m = 6 and q = .6, the Distribution Function is:
Number of Claims    Probability Density Function    Cumulative Distribution Function
0                    0.410%                         0.00410
1                    3.686%                         0.04096
2                   13.824%                         0.17920
3                   27.648%                         0.45568
4                   31.104%                         0.76672
5                   18.662%                         0.95334
6                    4.666%                         1.00000
The Distribution first exceeds .402 at 3, so we simulate 3 claims. Then we simulate three claim sizes from the given Pareto via algebraic inversion. Set u = F(x) = 1 - {θ/(x+θ)}^α. Therefore, x = θ{(1-u)^(-1/α) - 1}. Therefore, using the next three random numbers, we get 3 claims of sizes: 100(.736^(-1/4) - 1) = 8.0, 100(.694^(-1/4) - 1) = 9.6, and 100(.083^(-1/4) - 1) = 86.3. Thus the simulated total dollars of loss for the year are: 8.0 + 9.6 + 86.3 = 103.9.
Comment: Given additional random numbers we could continue to simulate additional years of data in the same manner.
11.2. D. For the primary Geometric Distribution, calculate a table of values for f(x) and the take the cumulative sum to get F(x). F(x) first exceeds 0.725 at 5. x 0 1 2 3 4 5 6 7 8 9 10 11 12
f(x) 0.2128 0.1675 0.1319 0.1038 0.0817 0.0643 0.0506 0.0399 0.0314 0.0247 0.0195 0.0153 0.0121
F(x) 0.2128 0.3803 0.5121 0.6159 0.6976 0.7620 0.8126 0.8525 0.8839 0.9086 0.9280 0.9433 0.9554
Thus we need to simulate 5 draws from the secondary Poisson Distribution. We calculate a table of densities for this Poisson, using the relationship f(x+1) = f(x) {λ / (x+1)} = 2.7 f(x) /(x+1) and f(0) = e−λ = e-2.7 = .0672. 2.7 Number of Claims 0 1 2 3 4 5 6
Probability Density Function 6.721% 18.145% 24.496% 22.047% 14.882% 8.036% 3.616%
Cumulative Distribution Function 0.06721 0.24866 0.49362 0.71409 0.86291 0.94327 0.97943
One cumulates the densities to get the distribution function. The next random number is .184, F(1) = .24866 ≥ .184, so we simulate one loss. We simulate four more random draws or this Poisson: F(5) ≥ .877 , F(4) ≥ .734, F(2) ≥ .439, F(3) ≥ .518. Thus the total number of losses is: 1 + 5 + 4 + 2 + 3 = 15. 11.3. D. First calculate a distribution table of the Poisson with λ = 2.1. n f(n) F(n)
0 0.1225 0.1225
1 0.2572 0.3796
2 0.2700 0.6496
3 0.1890 0.8386
4 0.0992 0.9379
5 0.0417 0.9796
.949 is first exceeded by .9796 so we simulate 5 losses. To simulate a uniform distribution from 5 to 95: x = 90u + 5. (90)(.157) + 5 = 19.13. (90)(.013) + 5 = 6.17. (90)(.879) + 5 = 84.11. (90)(.583) + 5 = 57.47. (90)(.313) + 5 = 33.17. Total amount is: 19.13 + 6.17 + 84.11 + 57.47 + 33.17 = 200.05.
Comment: Situation similar to 3, 11/00, Q.32, not a simulation question.
11.4. D. First show. .865 ≥ .8, so we simulate two prizes. .9 > .305 ≥ .2, so the first prize is worth 200. .9 > .563 ≥ .2, so the 2nd prize is worth 200. Total amount is 400, insurer pays 100. Second show. .225 < .8, so we simulate one prize. .993 ≥ .9, so the prize is worth 1000. Insurer pays 700. Third show. .940 ≥ .8, so we simulate two prizes. 9 > .268 ≥ .2, so the first prize is worth 200. .2 > .127, so the second prize is worth 0. Total amount is 200, insurer pays nothing. Total amount paid by the insurer: 100 + 700 + 0 = 800. Comment: Situation similar to 3, 5/01, Q.30, not a simulation question. 11.5. D. We calculate a table of Poisson densities. 3.8 Number of Claims 0 1 2 3 4 5 6 7
Probability Density Function 2.237% 8.501% 16.152% 20.459% 19.436% 14.771% 9.355% 5.079%
Cumulative Distribution Function 0.02237 0.10738 0.26890 0.47348 0.66784 0.81556 0.90911 0.95989
We simulate 6 losses since F(6) = .909 > .833. 11.6. D. The Distribution Function for the Inverse ParaLogistic is F(x) = {1+(10000/x)3 }-3. Setting u = F(x), x = 10000 (u-1/3 -1)-1/3. We simulate 6 losses. u
x
0.271 0.382 0.769 0.910 0.671 0.183
12,240 13,828 22,192 31,519 19,157 10,951
Sum
Insureds Payment 12,240 13,828 22,192 25,000 19,157 10,951
Insurer Payment 0 0 0 6,519 0 0
103,368
6,519
Without the aggregate maximum, the insured would pay $103,368. However, since the insured is required to pay no more than $100,000 per year, the insurer must pick up an extra $3,368. So the insurer pays $6,519 + $3,368 = $9,887.
11.7. B. First calculate a distribution table of the Negative Binomial with β = 1.2 and r = 2. f(0) = 1/(1+β)r. f(x+1) = f(x){β/(1+β)}{(x+r)/(x + 1)}. n f(n) F(n)
0 0.2066 0.2066
1 0.2254 0.4320
2 0.1844 0.6164
3 0.1341 0.7505
4 0.0914 0.8420
5 0.0599 0.9018
6 0.0381 0.9399
.627 is first exceeded by .7505, so we simulate 3 losses. Set u = F(x) = 1 - e-x/1000. x = -1000ln(1 - u). -1000ln(1 - .005) = 5. Insurer pays nothing. -1000ln(1 - .627) = 986. Insurer pays: 986 - 500 = 486. -1000ln(1 - .459) = 614. Insurer pays: 614 - 500 = 114. Total amount paid is: 0 + 486 + 114 = 600. Comment: Situation similar to Course 3 Sample Exam, Q.20, not a simulation question. 11.8. E. Using the random numbers from (0,1): .6103, .4483, .8186, and the inverse transform method, simulate three random draws from a Unit Normal. Consulting the Normal Table: Φ(.28) = .6103, Φ(-.13) = .4483, and Φ(.91) = .8186. Thus the random Unit Normals are: .28, -.13, and .91. Now the discount factor from time 0 to .7 involves a LogNormal with parameters µ = (.05)(.7) = .035, and σ = (.02) 0.7 = .0167. Thus the LogNormal corresponding to the first simulated Normal of .28 is: exp[(.0167)(.28) + .035] = 1.0405 Thus, the discount factor from time 0 to .7 is 1/1.0405 = .9611. Next we need to simulate the discount factor from time .7 to time 1.2. The time interval is .5. The discount factor is: 1/exp[(.02) 0.5 (-0.13) + (.05)(.5)] = .9771. Finally, the discount factor from time 1.2 to time 2.8 is: 1/exp[(.02) 1.6 (.91) + (.05)(1.6)] = .9021. Now that we have the discount factors for each interval we can compute the sum of the discounted payments, as $3,613: Time
Interval Discount Factor
Cumulative Discount Factor
Payment
Discounted Payment
0.7 1.2 2.8
0.9611 0.9771 0.9021
0.9611 0.9391 0.8472
$1,300 $920 $1,770
$1,249 $864 $1,499
$3,990
$3,613
Sum
11.9. E. First calculate a distribution table of the Poisson with λ = 1.6. n f(n) F(n)
0 0.2019 0.2019
1 0.3230 0.5249
2 0.2584 0.7834
3 0.1378 0.9212
4 0.0551 0.9763
.7712 is first exceeded by .7834, so we simulate 2 losses. .7030 ≥ .6 so the first loss is 50. .6433 ≥ .6 so the second loss is 50. Total amount is: 50 + 50 = 100. Insurer pays: 100 - 30 = 70. Comment: Situation similar to 3, 5/00, Q.11. 11.10. E. Put together a table for the given frequency distribution: n
f(n)
F(n)
1 2 3 4 5 6
0.3 0.21 0.147 0.1029 0.07203 0.050421
0.3 0.51 0.657 0.7599 0.83193 0.882351
Since u = .712 is first exceed by F(4) = .7599, we simulate four accidents. Set v = 1 - exp(-(x/1000).5). ⇒ x = 1000(ln(1 - v))2 . X1 = 1000(ln(1 - .203))2 = 51.5. X2 = 1000(ln(1 - .098))2 = 10.6. X3 = 1000(ln(1 - .546))2 = 623.6. X4 = 1000(ln(1 - .471))2 = 405.5. Aggregate loss = 51.5 + 10.6 + 623.6 + 405.5 = 1091.2. Comment: Frequency is a zero-truncated Geometric Distribution with β = 7/3 . 11.11. D. The frequency distribution is: Number of Losses: f(x) F(x) 0 3/6 .5000 1 2/6 .8333 2 1/6 1.0000 Since .481 < .5000, we simulate no losses in year 1. Since .934 is first exceeded by 1.000, we simulate 2 losses in year 2. Since .815 ≥ .75, the first loss is of size $50. Since .333 < .75, the second loss is of size $30. Since .778 is first exceeded by .8333, we simulate 1 loss in year 3. Since .121 < .75, the loss is of size $30. The sum of the losses for all three years is: $50 + $30 + $30 = $110.
11.12. E. 0.6645 is first exceed by F(3) = 0.6936. ⇒ There are three simulated losses. For the Single Parameter Pareto Distribution, set u = F(x) = 1 - (θ/x)α.
⇒ x = θ/(1 - u)1/α = 1000/(1 - u)1/3 . The three sizes of loss are: 1000/(1 - .7283)1/3 = 1544, 1000/(1 - .1525)1/3 = 1057, 1000/(1 - .5773)1/3 = 1332. Their sum is: 1544 + 1057 + 1332 = 3933. Comment: Similar to 4, 11/06, Q.4. 11.13. B. For the first 8 policies, there is a claim if u > 0.95. (Small random numbers correspond to small numbers of losses.) Using the first 8 random numbers, there are no claims on these 8 polices. For the next 12 policies, there is a claim if u > .90. Using the next 12 random numbers, there is a claim on one of these 12 polices. It has size 300u = (300)(.414) = 124. 11.14. C. First construct a table of the Poisson Distribution: x
f(x)
F(x)
0 1 2 3 4 5 6 7 8
0.018316 0.073263 0.146525 0.195367 0.195367 0.156293 0.104196 0.059540 0.029770
0.018316 0.091578 0.238103 0.433470 0.628837 0.785130 0.889326 0.948866 0.978637
The random number 0.773 is first exceeded at F(5), so we simulate five claims. For the Burr, VaRp [X] = θ {(1-p)−1/α - 1}1/γ = (1000) {(1-p)-1/3 - 1}1/2. The first loss is: (100,000) {0.938-1/3 - 1}1/2 = 14,685. The second loss is: (100,000) {0.783-1/3 - 1}1/2 = 29,147. The third loss is: (100,000) {0.645-1/3 - 1}1/2 = 39,673. The fourth loss is: (100,000) {0.219-1/3 - 1}1/2 = 81,180. The fifth loss is: (100,000) {0.415-1/3 - 1}1/2 = 58,366. The payment on each of the two largest losses is capped at 50,000. The sum of the five payments: 183,505.
11.15. B. Put together a table for the given frequency distribution: n
f(n)
F(n)
1 2 3 4 5
0.9 0.09 0.009 0.0009 0.00009
0.9 0.99 0.999 0.9999 0.99999
Since u = .05 is first exceed by F(1) = .9, we simulate one accident. Severity is Exponential with mean 100. G(x) = 1 - e-x/100. Set G(x) = v1 = .30. ⇒ x = -100ln(1 - .3) = 35.7. Aggregate loss = 35.7. Comment: An overly elaborate description of a typical aggregate situation. Frequency is a zero-truncated Geometric Distribution with β = 1/9. 11.16. D. For the Binomial, F(0) = f(0) = .54 = .0625, f(1) = (4)(.54 ) = .25, F(1) = F(0) + f(1) = .3125, f(2) = (6)(.54 ) = .375, F(2) = F(1) + f(2) = .6875 > .58. Therefore, we simulate two claims. The sizes of the two claims are simulated from the Exponential Distribution. u = F(x) = 1 - e-x/500000. x = -500000 ln(1 - u). u = .35 ⇒ x = 215,391. u = .70 ⇒ x = 601,986. 215,391 + 601,986 = 817,377. 11.17. C. For Binomial, m = 3 and q = .6: f(0) = .43 = .064.
F(0) = .064.
f(1) = (3)(.42 )(.6) = .288.
F(1) = .352.
f(2) = (3)(.4)(.62 ) = .432.
F(2) = .784.
f(3) = .63 = .216.
F(3) = 1.
.352 ≤ u = .7 < .784 ⇒ 2 claims. Use up .1 and .3 (size of claims are 1 and 2). .064 ≤ u = .1 < .352 ⇒ 1 claim. Use up .9 (size of claim 5). .352 ≤ u = .5 < .784 ⇒ 2 claims. u = .5 ⇒ 3. u = .7 ⇒ 4. 3 + 4 = 7.
11.18. C. For a Poisson with λ = 2, f(0) = e-2 = .1353, f(1) = 2e-2 = .2707, f(2) = 4e-2/2 = .2707. f(3) = 8e-2/6 = .1804. F(0) = .1353. F(1) = .4060. F(2) = .6767. F(3) = .8571. .6767 ≤ .8 < .8571, so we simulate 3 claims. Set u = 1 - {500/(500 + x)}2 . ⇒ x = 500{(1-u)-0.5 - 1}. u = .6 ⇒ x = 291. u = .25 ⇒ x = 77. u = .7 ⇒ x = 413. Simulated annual loss: 291 + 77 + 413 = 781. Payment = (80%)(750) + 31 = 631. 11.19. D. Construct a table for the Poisson Distribution with mean 4: n f(n) F(n) 0 .01832 .01832 1 .07326 .09158 2 .14653 .23811 F(n) first exceeds u = 0.13 when n = 2, so we simulate two claims. For the Exponential, set F(x) = 1 - e-x/1000 = ui. ⇒ x = -1000ln(1 - ui). Thus the two simulated values are: -1000ln(.95) = 51 and -1000ln(.05) = 2996. 51 + 2996 = 3047. If instead large random numbers result in small simulated claim amounts, then set S(x) = e-x/1000 = ui. ⇒ x = -1000ln(ui). Then the two simulated values are: -1000ln(.05) = 2996 and -1000ln(.95) = 51. 2996 + 51 = 3047. Comment: The question should have specified whether large random numbers result in large or small simulated values from the Exponential. 11.20. C. Find where F(n) first exceeds u. F(4) = .773 > .7654. ⇒ Simulate 4 losses. To simulate the Weibull, u = F(x) = 1 - exp[-(x/200)2 ]. x = 200 -ln[(1 - u)] . u = 0.2738. ⇒ x = 113. Insurer pays 0. u = 0.5152. ⇒ x = 170. Insurer pays 20. u = 0.7537. ⇒ x = 237. Insurer pays 87. u = 0.6481. ⇒ x = 204. Insurer pays 54. Insurer would pay: 0 + 20 + 87 + 54 = 161. Insured would retain: (113 + 170 + 237 + 204) - 161 = 724 - 161 = 563, too much. Insured retains 500, and insurer pays: 724 - 500 = 224.
11.21. B. 0.981 is first exceeded by F(4) = 0.9827. ⇒ There are four simulated losses. For the Pareto Distribution, set u = F(x) = 1 - {θ/(x + θ)}α.
⇒ x = θ{1/(1 - u)^(1/α) - 1} = 36{1/(1 - u)^(1/2.8) - 1}. The four sizes of loss are: (36){1/(1 - .571)^(1/2.8) - 1} = 12.70, (36){1/(1 - .932)^(1/2.8) - 1} = 58.03, (36){1/(1 - .303)^(1/2.8) - 1} = 4.95, (36){1/(1 - .471)^(1/2.8) - 1} = 9.19. The corresponding four claim payments are: 7.70, 30, 0, 4.19. Their sum is: 41.89.
Comment: The maximum covered loss is: 5 + 30 = 35. Workersʼ compensation claims do not usually have a maximum covered loss. Sometimes workersʼ compensation claims are subject to a deductible from the point of view of the insured business; they are not subject to a deductible from the point of view of the injured worker.
Section 12, Deciding How Many Simulations to Run

Quite often an actuary will attempt to estimate a quantity by repeatedly running a simulation model. For each simulation run, one records the quantity of interest. Then if one has performed enough simulation runs, one can estimate the quantity of interest to a desired degree of accuracy and to a desired confidence level.90 91

The quantity of interest can vary greatly depending on the application. For example, it could be the expected present value of the benefits paid to a totally and permanently disabled worker. It could be the p-value associated with a Kolmogorov-Smirnov statistic. It could be the 80th percentile of the distribution of aggregate annual losses for a self-insured employer. The common idea is that we wish to determine this quantity to some degree of accuracy and therefore wish to know how many simulations to run.92

How many simulations you should run depends on the particular situation, the quantity of interest, and the desired level of accuracy. Conversely, in situations where you can only run a limited number of simulations, you want to be able to determine the level of accuracy of the resulting estimate.

Estimation of the Mean:

For example, assume we ran a simulation model for the present value of aggregate losses 1000 times and got the following: Sum of results: $25,951,024. Sum of squares of results: 915,107,536,241. Then the estimated mean is: X̄1000 = $25,951, and the estimated second moment is 915,107,536,241 / 1000 = 915,107,536. Thus the sample variance is: S1000² = (1000/999)(915,107,536 - 25,951²) = 241,895,030.93 The sample standard deviation is: S1000 = √241,895,030 = 15,553.
90 The process variance due to random fluctuation generally goes down as the inverse of the number of simulation runs. Note that we are not considering any errors due to the use of inaccurate inputs, incomplete or incorrect models of reality, or programming mistakes.
91 This is analogous to ideas behind Classical Credibility.
92 This is of course only an issue if each simulation run is time consuming. What constitutes time consuming depends on how often you plan to perform the given task, and under what circumstances. If you are lucky enough to be able to easily run 100 or 10,000 times as many simulations as you think you might need, go right ahead and do so.
93 Notice the multiplication by 1000/999 in order to get the sample variance. For a data set of size i, multiplying by i/(i-1) results in an unbiased estimator of the variance. For a sample of size i, the sample variance = S² = Σ(Xj - X̄)²/(i-1).
Let Si be the sample standard deviation after i simulation runs. In this case, S1000 = 15,553. Similarly, let X̄i be the sample mean after i simulation runs. For the average of i independent, identically distributed variables, Var[X̄i] = Var[X]/i.94

Standard deviation of the estimated mean = StdDev[X̄i] = Si/√i.

Thus with 1000 simulation runs, the standard deviation of the estimated mean is: S1000/√1000 = 15,553/√1000 = 492. Thus an approximate 95% confidence interval for the expected present value of aggregate losses would be: $25,951 ± (1.96)(492) = $25,951 ± $964. $964/$25,951 = 3.7%, thus these error bars are: ±3.7%. If that was sufficiently accurate for our application, we could use this estimate. If more accuracy was needed, we could go back and run more simulations. For example, we expect that 100 times as many simulations would reduce the error bars by about a factor of 10.95 Therefore, with 100,000 simulations we expect error bars of about ±0.4%.

Exercise: Given the above information, how many simulations would you expect to have to run, in order to be 95% confident of estimating the present value of aggregate losses within ±1%?
[Solution: Given n simulations, the error bars in percents are about: {(1.960)(15,553)/√n}/25,951 = 1.1746/√n. We want 0.01 = 1.1746/√n. Thus n = (1.1746/0.01)² ≅ 13,800.]
94 See “Mahlerʼs Guide to Fitting Loss Distributions.” The standard deviation of the estimate goes down as 1/√i.
95 The sample standard deviation would change a little, thus the error bars would not be smaller by exactly a factor of 10.
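These calculations are easily coded. Here is a brief Python sketch that turns a sum and sum of squares of simulation results into the sample mean, sample standard deviation, standard error, an approximate confidence interval, and the number of runs needed for a percent criterion. It is only an illustration; the function name and the rounding are my own choices.

import math

def summarize_runs(total, total_sq, n, z=1.96):
    """Mean, sample standard deviation, standard error, and an approximate
    confidence interval, computed from the sum and sum of squares of n runs."""
    mean = total / n
    sample_var = (n / (n - 1)) * (total_sq / n - mean ** 2)   # unbiased sample variance
    s = math.sqrt(sample_var)
    std_err = s / math.sqrt(n)                                # StdDev of the estimated mean
    return mean, s, std_err, (mean - z * std_err, mean + z * std_err)

# The numbers from the present value example above.
mean, s, se, ci = summarize_runs(25_951_024, 915_107_536_241, 1000)
print(f"mean {mean:,.0f}, S {s:,.0f}, std err {se:,.0f}, 95% CI {ci[0]:,.0f} to {ci[1]:,.0f}")

# Number of simulations needed to be within +/-1% of the mean with 95% confidence.
k, y = 0.01, 1.960
print("needed runs:", math.ceil((y / k) ** 2 * (s / mean) ** 2))   # about 13,800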
Exercise: You run a simulation model of hurricanes hitting the state of Florida. For each simulation run, you record the losses paid by the insurance industry in one year. The resulting statistics are shown below through selected numbers of simulation runs. For example, after 1000 runs, the sum of the runs is: 369,103, and the sum of the squares of the runs is: 28,730,114,814. Thus X̄1000 = 369,103/1000 = 369.103, and S1000 = √[(1000/999)(28,730,115 - 369.103²)] = 5350.

Number of Runs    Running Average of Xi    Sample Standard Deviation    Si/√i    (Si/X̄i)/√i
1000              369                      5350                         169.2    0.4585
2000              356                      5832                         130.4    0.3663
3000              352                      6164                         112.5    0.3197
4000              347                      6027                          95.3    0.2746
5000              340                      5840                          82.6    0.2429
6000              334                      5733                          74.0    0.2216
7000              335                      5631                          67.3    0.2009
8000              341                      5439                          60.8    0.1783
9000              355                      5345                          56.3    0.1587
10,000            337                      5148                          51.5    0.1528
15,000            359                      5592                          45.7    0.1272
20,000            360                      5471                          38.7    0.1075
25,000            357                      5520                          34.9    0.0978
30,000            354                      5544                          32.0    0.0904
35,000            365                      5558                          29.7    0.0814
40,000            345                      5497                          27.5    0.0797
45,000            348                      5513                          26.0    0.0747
50,000            347                      5501                          24.6    0.0709
60,000            360                      5580                          22.8    0.0633
70,000            355                      5545                          21.0    0.0590
80,000            362                      5582                          19.7    0.0545
90,000            372                      5597                          18.7    0.0502
100,000           368                      5572                          17.6    0.0479

You are asked to estimate the average annual loss. You want to be 90% certain that your estimate will not differ from the true value by more than 10%. Determine the minimum number of simulation runs you could have reviewed in order to meet your objective.
[Solution: A 90% confidence interval corresponds to ±1.645 standard deviations on the Normal Distribution. Therefore, we want 1.645 Si/√i ≤ 0.1 X̄i. ⇔ (Si/X̄i)/√i ≤ 0.0608. This first occurs for i = 70,000.
Alternately, we can estimate the coefficient of variation as: 5572/368 = 15.1. We want i ≥ (Si²/X̄i²)(y/k)² = (CV²)(y/k)² = (15.1²)(1.645/0.1)² ≅ 62,000. Of the values shown, this first occurs for i = 70,000.
Comment: We could have stopped after 70,000 simulation runs, and gotten an estimate of: 355 ± (1.645)(5545)/√70,000 = 355 ± 34.]
In general, if we want to have (at least) P chance of being within ±k of the true value, where k is a percentage, let y be such that Φ(y) = (1+P)/2. Then we want:
y Si/√i ≤ k X̄i ⇔ (Si/X̄i)/√i ≤ k/y ⇔ i ≥ (Si²/X̄i²)(y/k)². ⇔
Needed Number of Simulations = (y/k)² (estimated variance)/(estimated mean)².
In the above exercise, k = 10%, P = 90%, y = 1.645, k/y = 10%/1.645 = 0.0608, and we required: (Si/X̄i)/√i ≤ 0.0608.

Relation to Classical Credibility:96

Needed Number of Simulations = (y/k)² (estimated variance)/(estimated mean)² = n0 CV².
This is the same as the number of claims needed for full credibility for severity. For example, with k = 1%, P = 95% (y = 1.960), and CV = 15,553/25,951 = 0.5993, n = (0.5993²)(1.960/0.01)² = 13,798.97

Criterion in terms of Amounts Rather than Percents:

Sometimes one will require a probability of being within a certain dollar amount of the true value. For example, in the previous exercise involving simulation of hurricanes, let us assume we want to be 90% certain that our estimate will not differ from the true value by more than 50. Then we want 1.645 Si/√i ≤ 50 ⇔ Si/√i ≤ 30.4. This first occurs at 35,000 simulations.
If we want to have at least P chance of being within ±A of the true value, where A is an amount, y Si/√i ≤ A ⇔ Si/√i ≤ A/y.
Note that the formula to use when the criterion is in amounts is different than when the criterion was in percents.
96 See “Mahlerʼs Guide to Classical Credibility.”
97 This matches a result obtained previously, subject to rounding.
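Here is a brief Python sketch of these two stopping criteria, checked from the running mean and sample standard deviation. It is only an illustration; it uses the NormalDist class from Pythonʼs statistics module (available in Python 3.8 and later) to get the quantile y, and the numbers in the examples are taken from the hurricane table above.

import math
from statistics import NormalDist

def enough_runs(mean, s, n, P=0.90, k=None, A=None):
    """Check the stopping criteria of this section after n simulation runs:
    percent criterion (within +/-k of the mean) or amount criterion (within +/-A).
    y is the Normal quantile with Phi(y) = (1 + P)/2, e.g. y = 1.645 for P = 90%."""
    y = NormalDist().inv_cdf((1.0 + P) / 2.0)
    if k is not None:
        return (s / mean) / math.sqrt(n) <= k / y     # percent criterion
    return s / math.sqrt(n) <= A / y                  # amount criterion

# At 70,000 hurricane runs (mean 355, S 5545): within +/-10% with 90% confidence?
print(enough_runs(355, 5545, 70_000, P=0.90, k=0.10))   # True
# At 35,000 runs (mean 365, S 5558): within +/-50 with 90% confidence?
print(enough_runs(365, 5558, 35_000, P=0.90, A=50))     # True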
Yes/No Simulations:

Exercise: Frequency is Bernoulli. You observe 17 claims on 1000 exposures. Determine a 90% confidence interval for the fitted q, using the method of moments.
[Solution: q̂ = 17/1000 = 0.017. Var[q̂] = Var[X̄] = Var[X]/1000 = q(1-q)/1000 = (0.017)(1 - 0.017)/1000 = 0.00001671. StdDev[q̂] = 0.00409. A 90% confidence interval is: 0.017 ± (1.645)(0.00409) = 0.017 ± 0.007.]

Sometimes the result of simulation is either yes or no. For example, a model may simulate whether an insurer is ruined or not. In those cases, one has a series of Bernoulli trials. Therefore one can estimate the variance, without using the sample variance.

Exercise: Out of 1000 simulation runs, a model insurer is ruined 17 times. What is a 90% confidence interval for the probability of ruin?
[Solution: We have 17 yeses on 1000 Bernoulli trials. Using the solution to the previous exercise, a 90% confidence interval for the probability of ruin is: 0.017 ± 0.007.]

If we ran 100,000 runs rather than 1000, we would expect to get a better estimate of the probability of ruin. The 90% confidence interval would be instead: 0.017 ± 0.007/√100 = 0.017 ± 0.0007. This 90% confidence interval is within: ±0.0007/0.017 = ±4.1%.

Exercise: Out of 1000 simulation runs, a model insurer is ruined 17 times. How many simulations do you expect to have to run in order to be 90% confident of estimating the probability of ruin within 10% of the true value?
[Solution: We want (10%)(0.017) ≥ 1.645 √[(0.017)(1 - 0.017)/n]. ⇒ n ≥ 15,647.19. ⇒ n = 15,648.]

Yes/No Simulations and Classical Credibility:

The number of yes/no simulations needed can also be related to ideas from Classical Credibility. The number of exposures needed for full credibility for frequency is: (y²/k²) σf²/µf².98 In a yes/no simulation, each simulation is Bernoulli, and σf²/µf² = q(1-q)/q² = (1-q)/q. For example, with k = 10%, P = 90% (y = 1.645), and q = 0.017, n = {(1 - 0.017)/0.017} (1.645/0.1)² = 15,647, matching the previous result subject to rounding.

98 See “Mahlerʼs Guide to Classical Credibility.” When estimating frequency, the number of observations of the risk process is the number of exposures, and therefore the number of simulations corresponds to the number of exposures. In contrast, when estimating severity, the number of observations of the risk process is the number of claims, and therefore the number of simulations corresponds to the number of claims.
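A brief Python sketch of these yes/no calculations, shown only as an illustration; the function names are arbitrary choices of mine.

import math

def ruin_probability_ci(ruins, runs, y=1.645):
    """Estimate and half-width of a confidence interval for a yes/no (Bernoulli)
    simulation, such as the probability of ruin."""
    q = ruins / runs
    half_width = y * math.sqrt(q * (1.0 - q) / runs)
    return q, half_width

def runs_needed(q, k=0.10, y=1.645):
    """Runs needed so that y standard deviations is within +/-k (a percent) of q."""
    return math.ceil((y / k) ** 2 * (1.0 - q) / q)

q, hw = ruin_probability_ci(17, 1000)
print(f"90% confidence interval: {q:.3f} +/- {hw:.3f}")    # 0.017 +/- 0.007
print("runs needed for +/-10%:", runs_needed(0.017))       # about 15,648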
Estimating the Distribution Function:

When using simulation in order to estimate a distribution function at x, one asks for each simulation run whether the result is ≤ x. This is a yes/no question. Therefore, one can apply similar ideas to those that were just discussed.

The Empirical Distribution Function at x, Fn(x), is the observed number of losses less than or equal to x divided by the total number of losses. Assume that the losses are drawn from a Distribution Function F(x). Then the observed number of losses ≤ x is Binomial with parameters n and F(x). Thus the Empirical Distribution Function is (1/n) times a Binomial Distribution with parameters n and F(x). Thus it has mean of F(x) and a variance of: n F(x){1 - F(x)}/n² = F(x)S(x)/n. As the number of claims, n, increases, the variance of the estimate of the distribution decreases as 1/n. We can use these ideas to get error bars for estimated values of the distribution function.

Exercise: Assume one has run a simulation 2500 times and recorded the annual amount paid by the insurer for each simulation. Assume that in 1731 simulations the amount paid was less than or equal to $100,000. Estimate F(100,000) for the annual amount paid by the insurer.
[Solution: F(100,000) ≅ 1731/2500 = 0.6924.]

The estimated variance of this estimate is: F(100,000)S(100,000)/N = (0.6924)(1 - 0.6924)/2500 = 0.00008519.99 Thus the standard deviation is 0.0092. Thus an approximate 95% confidence interval for F(100,000) is: 0.69 ± 0.02.

Again, since the variance of the estimate goes down as 1/n, the more simulations we run, the more accurate the estimate. So if we want more accuracy, we can just run more simulations. The error bars get narrower as per 1/√n.

Exercise: Assume that each simulation run takes about 0.18 seconds. About how long would we have to run this simulation model in order to be able to estimate with 95% confidence F(100,000) within ±0.001?
[Solution: Since the error bars get narrower as per 1/√n, we would need to run about (0.02/0.001)² = 400 times as many simulations as above. (400)(2500) = 1,000,000 simulations. At 0.18 seconds per simulation run, this is 180,000 seconds or 50 hours, a little over two days.
Comment: One could leave the simulation running on your computer over the weekend, and you should have enough accuracy when you come in on Monday. If instead each simulation run took 18 seconds, one would need 5000 hours or almost seven months. This is a situation where a more efficient algorithm, a better computer program, or a faster computer would be helpful.]
99 Note that the variance is relatively insensitive to the estimated value of F(100,000), provided F(100,000) is not very close to either 0 or 1.
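The procedure is easy to automate. Here is a small, self-contained Python sketch; the Exponential payment model, the seed, and the names are illustrative assumptions rather than anything from the text.

import random

def estimate_cdf(simulate_one, x, n, y=1.960, seed=1):
    # Estimate F(x) from n simulation runs, with an approximate confidence
    # interval of +/- y standard deviations, using Var = Fn(x) Sn(x) / n.
    random.seed(seed)
    hits = sum(1 for _ in range(n) if simulate_one() <= x)
    F = hits / n
    half_width = y * (F * (1.0 - F) / n) ** 0.5
    return F, half_width

# Illustrative model: Exponential annual payments with mean 100,000 (an assumption).
F, hw = estimate_cdf(lambda: random.expovariate(1.0 / 100_000), 100_000, 2500)
print(F, hw)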
You are estimating the distribution function at $1 million via simulation. You want to be 90% certain of being within ±5% of the true value. By within ±5%, we mean between 0.95F($1 million) and 1.05F($1 million). How many simulations do you need to run?

We want: 1.645 √[Fn($1 million) Sn($1 million) / n] ≤ 0.05 Fn($1 million).
⇔ n ≥ (1.645/0.05)² Sn($1 million)/Fn($1 million) ⇔ n ≥ 1082 Sn($1 million)/Fn($1 million).
We can check at each n whether n Fn($1 million)/Sn($1 million) ≥ 1082.

Alternately, suppose we know that F($1 million) ≥ 0.8; then S($1 million)/F($1 million) ≤ 0.2/0.8 = 0.25. Therefore, we need n ≥ (1082)(0.25) = 271. The first method is more exact and will allow us to run somewhat fewer simulations.

In general, in order to estimate the distribution function at x via simulation, with probability P of being within ±k of the true value, we want: y √[Fn(x) Sn(x) / n] ≤ k Fn(x) ⇔

n Fn(x) / Sn(x) ≥ (y/k)² = n0.
Exercise: For a book of business, you are simulating the retained losses net of a proposed reinsurance arrangement. You wish to estimate the probability that the retained losses will be less than or equal to $5 million. You want to be 90% certain of being within ±1% of the true value. You get the following results for various numbers of simulations.

Number of Runs   # of Results ≤ $5 million   Fn($5 million)   n Fn / Sn
500              452                         90.40%           4,708
1000             896                         89.60%           8,615
2000             1803                        90.15%           18,305
3000             2711                        90.37%           28,142
4000             3599                        89.97%           35,900
5000             4504                        90.08%           45,403

How many simulations could you have run?
[Solution: We want: n Fn($5 million) / Sn($5 million) ≥ (y/k)² = (1.645/0.01)² = 27,060. This first occurs when n = 3000.
Comment: The resulting estimate would have been: 90.37% ± 0.89%. Note that once we had done enough simulations to guess that F($5 million) ≅ 0.9, then n ≅ (y/k)² (1 - 0.9)/0.9 = 3007.]
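One way to implement the stopping rule n Fn(x)/Sn(x) ≥ (y/k)² while simulating is sketched below in Python; the severity model, the checking interval, and the names are assumptions made for illustration.

import random

def runs_until_cdf_criterion(simulate_one, x, y, k, check_every=500, max_runs=10**7):
    # Simulate until n * Fn(x) / Sn(x) >= (y/k)^2, checking every check_every runs.
    target = (y / k) ** 2
    hits = 0
    for n in range(1, max_runs + 1):
        if simulate_one() <= x:
            hits += 1
        if n % check_every == 0 and 0 < hits < n:
            Fn = hits / n
            if n * Fn / (1.0 - Fn) >= target:
                return n, Fn
    raise RuntimeError("criterion not met within max_runs")

# Illustrative: retained losses modeled as Exponential with mean 2 million (an assumption),
# estimating F(5 million) within +/-1% with 90% confidence (y = 1.645, k = 0.01).
random.seed(1)
print(runs_until_cdf_criterion(lambda: random.expovariate(1 / 2e6), 5e6, 1.645, 0.01))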
Suppose that instead we wished to estimate the probability that the retained losses will be greater than $5 million, and we wanted to be 90% certain of being within ±1% of the true value. Then we want: y √[Fn(x) Sn(x) / n] ≤ k Sn(x) ⇔ n Sn(x) / Fn(x) ≥ (y/k)² = 27,060. Setting n Sn(x)/Fn(x) = (y/k)², n = (y/k)² Fn(x)/Sn(x) = 27,060 Fn(x)/Sn(x). Assuming F(x) is about 0.9008, we would need about: (27,060)(0.9008/0.0992) = 245,722 simulations!

Thus the manner in which we state the requirement, in terms of the distribution or survival function, makes a very big difference! In this example, within ±1% of the true value of the distribution function is about 0.90 ± 0.009, while within ±1% of the true value of the survival function is about 0.10 ± 0.001. The latter is the same as 0.90 ± 0.001 for the estimate of the distribution function, and requires many more simulations to achieve 90% confidence.

Estimating Percentiles:

If Φ(y) = (1+P)/2, then the probability covered by the order statistics np ± y√[n p (1−p)] is about P.100 For example, if P = 95%, then y = 1.960, and an approximate 95% confidence interval in terms of order statistics is: np ± 1.96 √[n p (1−p)].

Exercise: Assume we have run a simulation 2500 times and recorded the annual amount paid by the insurer for each simulation. Assume we order the results from smallest to largest. What is an approximate 95% confidence interval for the 70th percentile?
[Solution: np ± 1.96 √[n p (1−p)] = (0.7)(2500) ± 1.96 √[(2500)(0.7)(0.3)] = 1750 ± 44.9. Thus an approximate 95% confidence interval for the 70th percentile is from the 1705th observation to the 1795th observation.]

For example, the 1705th observation from smallest to largest might be $93,830, while the 1795th observation might be $112,163. In that case, an approximate 95% confidence interval for the 70th percentile is 103 thousand ± 9 thousand. The error bars correspond to ±9%. If we desire more accuracy, then we can run more simulations. For example, if we want to be within ±1% of the true value, we would expect to have to run about (9%/1%)² = 81 times as many simulation runs as we have done so far.
100 See "Mahlerʼs Guide to Fitting Loss Distributions."
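The order-statistic interval np ± y√[n p (1-p)] is simple to code. The Python sketch below is illustrative; the LogNormal model is an assumption, not from the text. With 2500 runs and p = 0.70 it brackets the 70th percentile by the 1705th and 1795th ordered values, as in the exercise above.

import math, random

def percentile_ci(values, p, y=1.960):
    # Approximate confidence interval for the p-th percentile, using the order
    # statistics around n*p +/- y*sqrt(n*p*(1-p)), rounded outward as in the text.
    xs = sorted(values)
    n = len(xs)
    half = y * math.sqrt(n * p * (1.0 - p))
    lo = max(math.floor(n * p - half), 1)
    hi = min(math.ceil(n * p + half), n)
    return xs[lo - 1], xs[hi - 1]

# Illustrative: 2500 simulated annual payments from a LogNormal model (an assumption).
random.seed(1)
sims = [random.lognormvariate(11, 1) for _ in range(2500)]
print(percentile_ci(sims, 0.70))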
Let Π̂p = the smoothed empirical estimate of the pth percentile.101
a = np - y√[n p (1−p)], rounded down to the nearest integer, and
b = np + y√[n p (1−p)], rounded up to the nearest integer.102
X(a) = the ath value from smallest to largest. X(b) = the bth value from smallest to largest.
Then (X(a), X(b)) is an approximate P confidence interval for the pth percentile, Πp.

If we want to have a probability P that our estimate of the pth percentile is within ±k of the true value, then we require that: X(a) ≥ Π̂p (1 - k) and X(b) ≤ Π̂p (1 + k).103 In other words, we want the interval Π̂p ± k Π̂p to be at least as wide as our P confidence interval, and thus cover at least a probability of P.

Exercise: You are running a simulation in order to estimate the 99th percentile. You wish to have a 90% probability of being within ±5% of the true value. After n simulation runs, what comparisons would you make in order to determine if your criterion was met?
[Solution: y = 1.645. a = 0.99n - 1.645 √[n (0.99)(0.01)] = 0.99n - 0.1637 √n, rounded down.
b = 0.99n + 0.1637 √n, rounded up. k = 5%. You want: X(a) ≥ 0.95 Π̂0.99 and X(b) ≤ 1.05 Π̂0.99.]

For example, for n = 1 million, a = 0.99n - 0.1637 √n = 990,000 - 163.7 = 989,836 rounding down, and b = 0.99n + 0.1637 √n = 990,000 + 163.7 = 990,164 rounding up to the nearest integer.

Assume you define a and b as in the solution to this exercise, and the results after selected numbers of simulation runs are as follows:
Number of Runs   Estimate of 99th percentile   a           X(a)   b           X(b)
500,000          3577                          494,884     3324   495,116     3813
1,000,000        3603                          989,836     3390   990,164     3801
2,000,000        3668                          1,979,768   3512   1,980,232   3833
3,000,000        3655                          2,969,716   3548   2,970,284   3787

101 See "Mahlerʼs Guide to Fitting Loss Distributions."
102 We have made the interval of order statistics a little wider, in order to cover a little more than the desired probability, rather than a little less.
103 See Example 21.5 of Loss Models.
Exercise: How many simulations could you have run in order to satisfy your criterion?
[Solution: Check whether: X(a) ≥ 0.95 Π̂0.99 and X(b) ≤ 1.05 Π̂0.99.

Number of Runs   Estimate of 99th percentile   X(a)   0.95 times estimate   Comparing to X(a)   X(b)   1.05 times estimate   Comparing to X(b)
500,000          3577                          3324   3398                  FALSE               3813   3756                  FALSE
1,000,000        3603                          3390   3423                  FALSE               3801   3783                  FALSE
2,000,000        3668                          3512   3485                  TRUE                3833   3851                  TRUE
3,000,000        3655                          3548   3472                  TRUE                3787   3838                  TRUE

Thus 2 million simulation runs would have been enough.
Comment: After 2 million runs, Π̂0.99 ± 0.05 Π̂0.99 = (3485, 3851) is wider than (X(a), X(b)) = (X(1979768), X(1980232)) = (3512, 3833). Since (3512, 3833) is a 90% confidence interval for Π0.99, (3485, 3851) must cover more than 90% probability, and the criterion is met.]
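As a sketch of the comparison just described, the following Python function checks the criterion for a set of sorted simulated values. The simple order-statistic percentile estimate used here is a stand-in for the smoothed empirical estimate and, like the names, is an assumption of the sketch.

import math

def percentile_criterion_met(sorted_sims, p, k, y=1.645):
    # Check whether X(a) >= (1-k)*estimate and X(b) <= (1+k)*estimate,
    # where a and b bracket the estimate of the p-th percentile.
    n = len(sorted_sims)
    half = y * math.sqrt(n * p * (1.0 - p))
    a = max(math.floor(n * p - half), 1)
    b = min(math.ceil(n * p + half), n)
    estimate = sorted_sims[math.ceil(n * p) - 1]
    return sorted_sims[a - 1] >= (1 - k) * estimate and sorted_sims[b - 1] <= (1 + k) * estimate

# Illustrative check for the 99th percentile, within +/-5%, with 90% probability:
# percentile_criterion_met(sorted(sims), p=0.99, k=0.05, y=1.645)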
Practical Issues:

This approach to deciding how many simulations to run when estimating percentiles can be rather time consuming.104 At each simulation we need to order the simulated values from smallest to largest in order to get X(a), X(b), and Π̂p, and do a comparison.105

Both here and when determining other quantities of interest, it would be faster to only do the appropriate comparison at selected numbers of simulations. For a particular application, this could be for example once every 1000 simulation runs.

If one insisted on doing comparisons at every simulation run, one can speed things up as follows. As each new value is simulated, find where it fits and place it in the proper order. So for example, after 541 simulations we would have a set of 541 values ordered from smallest to largest. Then if the next value simulated were $10,832, we would see where this $10,832 fit among the 541 values. We would then place the $10,832 in its proper order and have a set of 542 values.

When estimating the mean, if one insisted on doing comparisons at every simulation run, one could speed things up by using recursion formulas to calculate the sample mean and variance at each stage:
X̄j+1 = X̄j + (Xj+1 - X̄j) / (j+1).
S²j+1 = (1 - 1/j) S²j + (j+1) (X̄j+1 - X̄j)².106

104 This is the approach shown in Loss Models and thus is the one on the syllabus.
105 Sorts of large data sets can be time consuming.
106 See Simulation by Ross, not on the syllabus.
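These recursions are easy to implement. The Python sketch below (names are mine) updates the running sample mean and sample variance one simulated value at a time, and the short check at the end compares the result against the ordinary formulas.

import statistics

def update_mean_var(mean_j, var_j, j, x_next):
    # Update the running sample mean and sample variance when the (j+1)-th value
    # arrives, using the recursions quoted above (valid for j >= 1):
    #   Xbar_{j+1} = Xbar_j + (X_{j+1} - Xbar_j) / (j+1)
    #   S^2_{j+1}  = (1 - 1/j) S^2_j + (j+1) (Xbar_{j+1} - Xbar_j)^2
    mean_next = mean_j + (x_next - mean_j) / (j + 1)
    var_next = (1.0 - 1.0 / j) * var_j + (j + 1) * (mean_next - mean_j) ** 2
    return mean_next, var_next

data = [5.0, 7.0, 3.0, 8.0, 6.0]
m, v = data[0], 0.0
for j, x in enumerate(data[1:], start=1):
    m, v = update_mean_var(m, v, j, x)
print(m, v)                                                # 5.8 and 3.7
print(statistics.mean(data), statistics.variance(data))    # the same values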
When Estimating Percentiles, an Alternate Method to Determine How Many Simulations:

Alternately, one could proceed as follows. Var[Π̂p] ≅ p(1-p) / {n f(Πp)²}.107 108
Set y² Var[Π̂p] = k² Πp². ⇒ y² p(1-p) / {n f(Πp)²} = k² Πp². ⇒ n = (y/k)² p(1-p) / {Πp f(Πp)}².
Exercise: For an Exponential Distribution, what is Πp f(Πp)?
[Solution: p = F(Πp) = 1 - exp[-Πp/θ]. ⇒ Πp = -θ ln(1-p). f(Πp) = exp[-Πp/θ]/θ = exp[ln(1-p)]/θ = (1-p)/θ.
Πp f(Πp) = -(1-p) ln(1-p).]

Thus for an Exponential Distribution, n = (y/k)² p(1-p) / {-(1-p) ln(1-p)}² = (y/k)² {p/(1-p)} / ln(1-p)².

Exercise: We are simulating random draws from an Exponential Distribution. We are estimating the 90th percentile, and wish to have a probability of 95% of being within ±1% of the true value. Determine how many simulations we should run.
[Solution: P = 95%. y = 1.96. k = 0.01. (y/k)² = (1.96/0.01)² = 38,416.
n = (y/k)² {p/(1-p)} / ln(1-p)² = (38,416)(0.9/0.1) / ln(0.1)² = 65,211.]

While n = (y/k)² {p/(1-p)} / ln(1-p)² is a reasonable number of simulations to run when one has a distribution with a right-hand tail similar to an Exponential, we would expect to need to run more simulations when there is a heavier right-hand tail, such as in the case of a Pareto.

Exercise: For a Pareto Distribution, what is Πp f(Πp)?
[Solution: p = F(Πp) = 1 - (θ/(θ + Πp))^α. ⇒ Πp = θ{(1-p)^(-1/α) - 1}. f(Πp) = αθ^α/(θ + Πp)^(α+1) = α(1-p)^(1+1/α)/θ.
⇒ Πp f(Πp) = α(1-p){1 - (1-p)^(1/α)}.]

Thus for a Pareto Distribution, n = (y/k)² p(1-p) / (α(1-p){1 - (1-p)^(1/α)})² = (y/k)² {p/(1-p)} / (α{1 - (1-p)^(1/α)})².

Exercise: We are simulating random draws from a Pareto Distribution with α = 3 and θ = 1000. We are estimating the 90th percentile, and wish to have a probability of 95% of being within ±1% of the true value. Determine how many simulations we should run.
107 See Kendallʼs Advanced Theory of Statistics, Volume 1.
108 Π̂p is asymptotically Normally Distributed with mean Πp and this variance. See Albert Beerʼs Discussion of "Estimating Probable Maximum Loss with Order Statistics," PCAS 1983.
[Solution: P = 95%. y = 1.96. k = 0.01. (y/k)² = (1.96/0.01)² = 38,416.
n = (y/k)² {p/(1-p)} / (α{1 - (1-p)^(1/α)})² = (38,416)(0.9/0.1) / (3{1 - 0.1^(1/3)})² ≅ 134,000.
Comment: This is reasonably close to the 126,364 simulation runs needed by the authors of Loss Models, when they performed this simulation in their Example 21.5.]

Of course, if we already know the distribution, we would not need to do the simulation in order to estimate a percentile. When the distribution is unknown, one could proceed as follows. Run a reasonable number of simulations, in order to get a preliminary estimate of Πp. For example, after 999 runs, X(900) is the smoothed empirical estimate of Π90. We can estimate f(Π90) as, for example: (10/999) / {X(905) - X(895)}. Then estimate the needed number of simulation runs: n = (y/k)² p(1-p) / {Πp f(Πp)}².
After doing at least n runs, one could check whether: X(a) ≥ Π̂p (1 - k), and X(b) ≤ Π̂p (1 + k).
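A rough Python sketch of this two-step procedure follows. The ±5 order-statistic window used to estimate the density, the Pareto model used for the preliminary runs, and the names are all illustrative assumptions.

import math, random

def runs_needed_for_percentile(sorted_sims, p, y, k, window=5):
    # n = (y/k)^2 p(1-p) / {pi_p f(pi_p)}^2, with pi_p and f(pi_p) estimated
    # from the preliminary runs: the density is the probability in a small
    # window of order statistics divided by the width of that window.
    n = len(sorted_sims)
    idx = round(n * p)
    pi_p = sorted_sims[idx - 1]
    width = sorted_sims[idx + window - 1] - sorted_sims[idx - window - 1]
    f_hat = (2 * window / n) / width
    return (y / k) ** 2 * p * (1 - p) / (pi_p * f_hat) ** 2

# Illustrative: 999 preliminary draws from a Pareto with alpha = 3, theta = 1000 (assumed
# model), estimating the 90th percentile within +/-1% with 95% probability.
random.seed(1)
prelim = sorted(1000 * ((1 - random.random()) ** (-1 / 3) - 1) for _ in range(999))
print(round(runs_needed_for_percentile(prelim, p=0.90, y=1.960, k=0.01)))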
A Practical Rule of Thumb:

When using a very complicated simulation model, one could decide whether a given number of simulation runs is enough as follows. If for example you thought 500 simulation runs might be enough,109 then run at least 5 separate sets of 500 runs each.110 Compare the results you would intend to use between the different sets.111 If the variation in results seems quite acceptable for the intended application, then 500 runs is enough. If not, then one needs more runs; check whether ten times as many runs is acceptable. Continue to increase the number of runs until acceptable results are achieved. If this number of simulations would take too long for the given application, see if there is some way to speed up the computer program112 or simplify the simulation model.113

109 If you have no clue as to how many runs would be enough, start by seeing how many runs can be done in a reasonable period of time for the intended application. For example, if a minute is a reasonable period of time, and that allows about 150 runs, then check to see whether 150 runs works okay.
110 Some actuaries would run two separate sets. I recommend at least five; ten is better. While two sets may demonstrate a problem, it is also easy for fluctuation in results to remain hidden with only two sets to compare.
111 "Results" may consist of an estimate of the mean, estimate of the distribution function, estimate of a percentile, etc. It could also consist of one or more numbers from each run. For example, maybe you tabulate the present value of benefits at several different interest rates.
112 For example, use a faster computer, revise the computer code, find a more efficient special algorithm for a part of the simulation that is taking a lot of time, rewrite the simulation in a different computer language, etc.
113 For example, perhaps one does not need to look at as fine gradations of time.
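A minimal Python sketch of this rule of thumb: run several independent sets of the same size and compare the quantity of interest across the sets. The model, set size, and names are illustrative assumptions.

import random, statistics

def compare_run_sets(simulate_one, runs_per_set, n_sets=5, seed=0):
    # Run n_sets independent sets and return the result of interest from each
    # (here, the set mean), so the variation between sets can be examined.
    results = []
    for s in range(n_sets):
        random.seed(seed + s)
        sims = [simulate_one() for _ in range(runs_per_set)]
        results.append(statistics.mean(sims))
    return results

# Illustrative model; if the spread across sets is too wide for the intended
# application, repeat with ten times as many runs per set.
print(compare_run_sets(lambda: random.expovariate(1 / 1000), runs_per_set=500))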
Problems:

Use the following information for the next two questions:
You run a simulation model 500 times and record for each run the annual amount paid by the insurer. The sum of these 500 values is $421,952. The sum of the squares of these 500 values is 1,267,383,362.

12.1 (2 points) What is an approximate 90% confidence interval for the average annual payment by the insurer?
A. $844 ± $100  B. $844 ± $110  C. $844 ± $120  D. $844 ± $130  E. $844 ± $140

12.2 (2 points) How many simulations should you run in total, in order to have a 90% chance of the estimated average annual payment being within ±3% of the true value?
A. less than 7,000
B. at least 7,000 but less than 8,000
C. at least 8,000 but less than 9,000
D. at least 9,000 but less than 10,000
E. at least 10,000

12.3 (2 points) You run a simulation model for fatal workers compensation claims in the State of Franklin. For each simulation run you record the present value of the benefits paid by the insurer on one fatal claim. You are asked to estimate the average present value of fatal claims in the State of Franklin. You want to be 90% certain that your estimate will not differ from the true value by more than $5000. The resulting statistics are shown below through selected numbers of simulation runs. (Si² is the sample variance after i simulation runs.)

i      Running Average of Xi   Si²
3000   $242,362                31,940,838,400
4000   $257,004                32,137,732,900
5000   $255,810                32,443,214,400
6000   $255,621                32,231,020,900
Determine the minimum number of simulation runs you could have reviewed in order to meet your objective. A. 3000 B. 4000 C. 5000 D. 6000 E. More than 6000
Use the following information for the next two questions:
You have created a simulation model of the operations of the Independent Morticians self-insurance group over the next 7 years. You run the simulation 5000 times. On 171 of those runs the self-insurance group goes insolvent.

12.4 (2 points) What is an 80% confidence interval for the probability that this self-insurance group will go insolvent over the next 7 years?
A. [3.19%, 3.65%]  B. [3.09%, 3.75%]  C. [2.99%, 3.85%]  D. [2.89%, 3.95%]  E. [2.79%, 4.05%]

12.5 (1 point) Assume one wants to be 80% confident that the simulated probability that this self-insurance group will go insolvent over the next 7 years is within ±0.001 of the probability one would get by running an infinite number of simulation runs, the so-called true value. How many simulation runs should one expect to have to run?
(A) 55,000  (B) 60,000  (C) 65,000  (D) 70,000  (E) 75,000
12.6 (2 points) Samantha is running a simulation and wants to be 95% confident of being within ±5% of the true value. The resulting statistics are shown below through selected numbers of simulation runs.

Number of Runs   Running Average   Sample Standard Deviation
1,000            837               643
5,000            852               698
10,000           849               672
25,000           846               665
Determine the minimum number of simulation runs Samantha could have reviewed in order to meet her objective. A. 1000 B. 5000 C. 10,000 D. 25,000 E. More than 25,000
12.7 (2 points) You are simulating observations from a Gamma Distribution with α = 3 and θ = 1000. How many simulations are needed to be 99% certain of being within ±2% of the true mean? (A) Less than 2000 (B) At least 2000, but less than 3000 (C) At least 3000, but less than 4000 (D) At least 4000, but less than 5000 (E) At least 5000
12.8 (2 points) You are running a simulation in order to estimate the 80th percentile. You wish to have a 95% probability of being within ±2% of the true value. After n simulation runs, you let a = [0.8n - 1.960 √(n(0.8)(0.2))], and b = 1 + [0.8n + 1.960 √(n(0.8)(0.2))], where [x] is the largest integer in x.
X(a) = the ath value from smallest to largest. X(b) = the bth value from smallest to largest.
The results after selected numbers of simulation runs are as follows.

Number of Runs   Estimate of 80th percentile   a        X(a)   b        X(b)
5,000            699                           3,944    671    4,056    725
10,000           720                           7,921    703    8,079    738
20,000           718                           15,889   706    16,111   726
30,000           702                           23,864   694    24,136   709
Determine the smallest number of simulations that you could have run in order to satisfy your criterion.
A. 5000  B. 10,000  C. 20,000  D. 30,000  E. more than 30,000

12.9 (2 points) Joan is running a computer model that simulates the course of rivers. For each simulation run she records the ratio:
(the length of the river) / (the air distance between the source and the mouth of the river).
Joan wants to be 99% certain that her estimate of the average of this ratio will not differ from the true value by more than 0.005. The resulting statistics are shown below through selected numbers of simulation runs.

Number of Runs   Total of Ratios   Sample Variance
100              318.5             0.0213
500              1,575.7           0.0198
1000             3,130.1           0.0209
5000             15,723.2          0.0218
Determine the minimum number of simulation runs Joan could have reviewed in order to meet her objective. A. 100 B. 500 C. 1000 D. 5000 E. More than 5000
12.10 (2 points) You are simulating observations from a Weibull Distribution with τ = 2 and θ = 100. How many simulations are needed to be 98% certain of being within ±1% of the true value of the probability of a loss being of size greater than 200?
(A) Less than 1,000,000
(B) At least 1,000,000, but less than 1,500,000
(C) At least 1,500,000, but less than 2,000,000
(D) At least 2,000,000, but less than 2,500,000
(E) At least 2,500,000

12.11 (2 points) To estimate E[X] you have simulated X1, X2, X3, ..., X50.
∑ Xi = 372 and ∑ Xi² = 4906, where both sums run from i = 1 to 50.
You want the standard deviation of the estimator of E[X] to be less than 0.1. Estimate the total number of simulations needed.
(A) Less than 4000
(B) At least 4000, but less than 5000
(C) At least 5000, but less than 6000
(D) At least 6000, but less than 7000
(E) At least 7000

12.12 (2 points) You are simulating the results next year on a book of private passenger automobile business written by an independent agent on behalf of your insurance company. You are trying to estimate the probability that the losses plus allocated loss adjustment expenses will be less than or equal to $500,000. You want to be 90% certain of being within ±5% of the true value. You get the following results for various numbers of simulations.

Number of Runs   # of Results ≤ $500,000   Fn       n Fn / Sn
500              452                       90.40%   4,708
1000             362                       36.20%   567
2000             732                       36.60%   1,155
3000             1090                      36.33%   1,712
4000             1459                      36.48%   2,297
5000             1827                      36.54%   2,879
What is the smallest number of simulations you could have run in order to meet your criterion? A. 1000 B. 2000 C. 3000 D. 4000 E. 5000
12.13 (2 points) You are simulating the value one year from now of a portfolio of stocks. You are trying to estimate the probability that the value will be less than or equal to $10 million. You want to be 99% certain of being within ±1% of the true value. You get the following results for various numbers of simulations.

Number of Runs   # of Results ≤ $10 million   Fn       n Fn / Sn
500              452                          90.40%   4,708
10,000           5,963                        59.63%   14,771
20,000           12,120                       60.60%   30,761
30,000           18,041                       60.14%   45,257
40,000           23,892                       59.73%   59,330
50,000           29,801                       59.60%   73,769
What is the smallest number of simulations you could have run in order to meet your criterion?
A. 10,000  B. 20,000  C. 30,000  D. 40,000  E. 50,000

12.14 (2 points) You are running a simulation in order to estimate the median. You wish to have a 98% probability of being within ±1% of the true value. After n simulation runs, you let a = [0.5n - 2.326 √(0.25n)], and b = 1 + [0.5n + 2.326 √(0.25n)], where [x] is the largest integer in x.
X(a) = the ath value from smallest to largest. X(b) = the bth value from smallest to largest.
The results after selected numbers of simulation runs are as follows.

Number of Runs   Estimate of median   a      X(a)    b      X(b)
1,000            $1306                463    $1207   537    $1362
2,000            $1272                947    $1193   1053   $1311
3,000            $1312                1436   $1274   1564   $1340
4,000            $1321                1926   $1309   2074   $1336
Determine the smallest number of simulations that you could have run in order to satisfy your criterion.
A. 1000  B. 2000  C. 3000  D. 4000  E. more than 4000

12.15 (3 points) For large sample size n, the variance of the sample variance is approximately: σ⁴ (kurtosis - 1) / n.
You are running a simulation of the annual aggregate losses of an insurance company. You will use the sample variance of the results of the simulation runs in order to estimate the variance of the aggregate annual losses. You wish to have a 90% probability of being within ±5% of the true value of the variance of the aggregate annual losses. Assuming that the distribution of annual aggregate losses has a kurtosis of 10, how many simulations should one run?
A. 7000  B. 8000  C. 9000  D. 10,000  E. 11,000
12.16 (2 points) You are running a simulation of the expected total amount your insurance company will have to pay for earthquake claims in a state over the coming year. You want to be 90% confident of being within ±2% of the true value. The resulting statistics are shown below through selected numbers of simulation runs.

Number of Runs   Running Average   Sample Standard Deviation
5,000            131               412
25,000           133               403
100,000          138               406
500,000          137               408
Determine the minimum number of simulation runs you could have reviewed in order to meet your objective.
A. 5000  B. 25,000  C. 100,000  D. 500,000  E. More than 500,000

12.17 (2 points) You are simulating observations from a Weibull Distribution with τ = 2 and θ = 100. How many simulations are needed to be 98% certain of being within ±1% of the true value of the probability of a loss being of size less than or equal to 50?
(A) Less than 50,000
(B) At least 50,000, but less than 100,000
(C) At least 100,000, but less than 150,000
(D) At least 150,000, but less than 200,000
(E) At least 200,000

12.18 (2 points) XYZ Manufacturing Company is no longer in business. Gillian is running a computer model that simulates the present value of retirement benefits that remain to be paid out by the pension plan of XYZ Manufacturing Company. The resulting statistics are shown below through selected numbers of simulation runs.

Number of Runs   Running Average   Sample Standard Deviation
50               39,423,980        5,620,848
100              38,898,538        5,737,898
250              38,990,035        5,703,661
500              39,173,131        5,670,152
Gillian wants to be 95% certain that her estimate of the expected present value will not differ from the true value by more than 1,000,000. Determine the minimum number of simulation runs Gillian could have reviewed in order to meet her objective. A. 50 B. 100 C. 250 D. 500 E. More than 500
Use the following information for the next three questions: You are simulating observations from a Pareto Distribution with α = 1.5 and θ = 1000. 12.19 (2 points) How many simulations are needed to be 95% certain of being within ±10% of the true value of the probability of a loss being of size less than or equal to 500? (A) Less than 300 (B) At least 300, but less than 400 (C) At least 400, but less than 500 (D) At least 500, but less than 600 (E) At least 600 12.20 (2 points) How many simulations are needed to be 95% certain of being within ±10% of the true value of the probability of a loss being of size greater than 5000? (A) Less than 10,000 (B) At least 10,000, but less than 20,000 (C) At least 20,000, but less than 30,000 (D) At least 30,000, but less than 40,000 (E) At least 40,000 12.21 (2 points) How many simulations are needed to be 95% certain of being within ±10% of the true value of the mean? A. 10,000 B. 50,000 C. 100,000 D. 500,000 E. Can not be determined
12.22 (2 points) XYZ Manufacturing Company is no longer in business. Gordon is running a computer model that simulates whether the current assets held by the pension plan will be sufficient in order to pay the retirement benefits that remain to be paid out by the pension plan of XYZ Manufacturing Company. Gordon runs the simulation 1000 times. On 141 of those runs the pension plan runs out of money prior to paying all benefits. Gordon wants to be 90% certain that his estimate of the probability of the pension plan running out of money is correct within 0.005. Determine the minimum number of simulation runs Gordon should have reviewed in order to meet his objective. (A) Less than 2000 (B) At least 2000, but less than 5000 (C) At least 5000, but less than 10,000 (D) At least 10,000, but less than 15,000 (E) At least 15,000
12.23 (2 points) You are simulating the present value of the losses that will be paid by a reinsurer on a portfolio of reinsurance contracts. You wish to estimate the probability that this present value will be greater than $1 billion. You want to be 95% certain of being within ±5% of the true value. You get the following results for various numbers of simulations.

Number of Runs   # of Results > $1 billion   Sn($1 billion)   n Sn / Fn
500              452                         90.40%           4,708
10,000           28                          0.2800%          28
50,000           142                         0.2840%          142
100,000          289                         0.2890%          290
500,000          1403                        0.2806%          1,407
1,000,000        2837                        0.2837%          2,845
What is the smallest number of simulations you could have run in order to meet your criterion? A. 10,000 B. 50,000 C. 100,000 D. 500,000 E. 1,000,000 12.24 (3 points) You simulate observations from a specific distribution F(x), such that the number of simulations N is sufficiently large to be at least 90 percent confident of estimating F(1000) correctly within 2.5 percent. Let D represent the number of simulated values less than 1000. Determine which of the following could not be values of N and D. (A) N = 1000 D = 815 (B) N = 2000 D = 1330 (C) N = 3000 D = 1775 (D) N = 4000 D = 2080 (E) N = 5000 D = 2325 12.25 (2 points) For a simulation, you desire that the estimated mean be within 1% of the correct value 90% of the time. If 435,000 simulation runs is the smallest number that will achieve this goal, what is the coefficient of variation of the distribution that is being simulated? A. 2 B. 3 C. 4 D. 5 E. 6
12.26 (2 points) You are given the following:
• You use a simulation model of an insurer in order to estimate the average policyholder surplus five years from now.
• For each simulation run, starting with the same given initial conditions for the model insurer, you record the surplus of the model insurer five years from now.
• You want to be 95% certain that your estimate will not differ from the true value by more than 60.
• The resulting statistics are shown below through selected numbers of simulation runs. Si² is the sample variance after i simulation runs.

i      Running Average of Xi   Si²
100    5,352                   1,105,260
250    5,026                   1,130,872
500    4,965                   1,132,637
1000   5,142                   1,094,928
2500   5,093                   1,100,705
Determine the minimum number of simulation runs you could have reviewed in order to meet your objective. A. 100 B. 250 C. 500 D. 1000 E. 2500 12.27 (2 points) You are planning a simulation to estimate the mean aggregate annual loss. You assume that the distribution of aggregate losses has a coefficient of variation of about 3. Use the central limit theorem to estimate the smallest number of trials needed so that you will be at least 90% confident that the simulated mean is within 2.5% of the population mean. (A) 30,000 (B) 40,000 (C) 50,000 (D) 60,000 (E) 70,000 12.28 (2 points) Simulation is used in order to estimate the value of the cumulative distribution function at 1000. After some simulation runs you estimate F(1000) to be about 82%. Estimate the minimum total number of simulations needed so that there is at least a 98% probability that the estimate is within ±5% of the correct value. (A) Less than 200 (B) At least 200, but less than 400 (C) At least 400, but less than 600 (D) At least 600, but less than 800 (E) At least 800
12.29 (2 points) Simulation is used to estimate the value of F(800). One hundred independent random draws from F are simulated. Of these one hundred simulated values, seventy are less than or equal to 800. Determine the minimum total number of simulations so that there is at least a 99% probability that the estimate is within ±2% of the correct value.
(A) Less than 10,000
(B) At least 10,000, but less than 15,000
(C) At least 15,000, but less than 20,000
(D) At least 20,000, but less than 25,000
(E) At least 25,000

12.30 (Course 4 Sample Exam 2000, Q.14) You are given a tool that determines the expected prospective profitability of any insurance policy. You are asked to use this tool to estimate the average profitability of a large number of policies. You want to be 95% certain that your estimate will not differ from the true value by more than 0.02 units. Your estimates of profitability, Xi, for various numbers of policies reviewed, together with the indicated statistics, are shown below for i = 30 through 40:

i    Xi       Running Average of Xi   Si²          Si       Si/√i
30   1.2350   1.0928                  0.00314274   0.0561   0.01024
31   1.0478   1.0914                  0.00310330   0.0557   0.01001
32   1.0875   1.0912                  0.00300365   0.0548   0.00969
33   1.1149   1.0919                  0.00292673   0.0541   0.00942
34   1.1591   1.0939                  0.00297075   0.0545   0.00935
35   1.0226   1.0919                  0.00302872   0.0550   0.00930
36   0.9668   1.0884                  0.00337693   0.0581   0.00969
37   1.1487   1.0900                  0.00338141   0.0581   0.00956
38   1.1887   1.0926                  0.00354638   0.0596   0.00966
39   1.1303   1.0936                  0.00348950   0.0591   0.00946
40   1.0484   1.0925                  0.00345104   0.0587   0.00929
Determine the minimum number of policies you could have reviewed to meet your objective.

12.31 (4, 11/01, Q.17 & 2009 Sample Q.66) (2.5 points) To estimate E[X] you have simulated X1, X2, X3, X4, and X5 with the following results: 1, 2, 3, 4, 5.
You want the standard deviation of the estimator of E[X] to be less than 0.05. Estimate the total number of simulations needed.
(A) Less than 150
(B) At least 150, but less than 400
(C) At least 400, but less than 650
(D) At least 650, but less than 900
(E) At least 900
12.32 (4, 11/05, Q.16 & 2009 Sample Q.227) (2.9 points) You simulate observations from a specific distribution F(x), such that the number of simulations N is sufficiently large to be at least 95 percent confident of estimating F(1500) correctly within 1 percent. Let P represent the number of simulated values less than 1500. Determine which of the following could be values of N and P. (A) N = 2000 P = 1890 (B) N = 3000 P = 2500 (C) N = 3500 P = 3100 (D) N = 4000 P = 3630 (E) N = 4500 P = 4020 12.33 (4, 11/06, Q.11 & 2009 Sample Q.255) (2.9 points) You are planning a simulation to estimate the mean of a non-negative random variable. It is known that the population standard deviation is 20% larger than the population mean. Use the central limit theorem to estimate the smallest number of trials needed so that you will be at least 95% confident that the simulated mean is within 5% of the population mean. (A) 944 (B) 1299 (C) 1559 (D) 1844 (E) 2213 12.34 (4, 5/07, Q.37) (2.5 points) Simulation is used to estimate the value of the cumulative distribution function at 300 of the exponential distribution with mean 100. Determine the minimum number of simulations so that there is at least a 99% probability that the estimate is within ±1% of the correct value. (A) 35 (B) 100 (C) 1418 (D) 2013 (E) 3478
Solutions to Problems:

12.1. A. The estimated mean is $421,952/500 = 844. The estimated second moment is 1,267,383,362/500 = 2,534,766. Thus the estimated standard deviation for one observation is: √[(500/499)(2,534,766 - 844²)] = 1351. Thus with 500 observations, the standard deviation of the estimated mean is 1351/√500 = 60.4. Thus an approximate 90% confidence interval for the average annual payment by the insurer would be $844 ± (1.645)(60.4) ≅ $844 ± $100.

12.2. B. The estimated mean is $421,952/500 = 844. The estimated second moment is 1,267,383,362/500 = 2,534,766. Thus the estimated CV² = 2,534,766/844² - 1 = 2.558. P = 90%, so y = 1.645. k = 0.03. Thus we need CV² (y/k)² = (2.558)(1.645/0.03)² = 7691 simulations.
Alternately, if we use the sample variance, (500/499)(2,534,766 - 844²) = 1,826,082. Thus we need CV² (y/k)² = (1,826,082/844²)(1.645/0.03)² = 7708 simulations.
Comment: Since one has 500 data points, it does not make much difference whether one uses the sample variance.

12.3. B. An approximate 90% confidence interval is ±1.645 Si/√i. Thus we want 1.645 Si/√i ≤ $5000, or Si/√i ≤ 5000/1.645 = 3040. This first occurs at 4000 simulations.

i      Running Average of Xi   Si²              Si         Si/√i
3000   $242,362                31,940,838,400   $178,720   $3,263
4000   $257,004                32,137,732,900   $179,270   $2,835
5000   $255,810                32,443,214,400   $180,120   $2,547
6000   $255,621                32,231,020,900   $179,530   $2,318
Comment: Φ(1.645) = (1+.9)/2. Using the first 4000 simulation runs, the estimated average size of fatal claims is: $257,004 ± $4664, or about $252 to $262 thousand. Note that the criterion is in terms of an amount rather than a percent.

12.4. B. The mean chance of insolvency is 171/5000 = .0342. This is a Bernoulli process (there is either an insolvency or there is not) with variance (.0342)(1 - .0342); thus the average of 5000 observations has a variance of (.0342)(1 - .0342)/5000 = .000006606. The standard deviation is: .00257. For an 80% confidence interval we want ±1.282 standard deviations, since Φ(1.282) = .90. Thus the estimated chance of an insolvency is: 3.42% ± (1.282)(.257%) = 3.42% ± 0.33%.
12.5. A. Since running 5000 simulations gives error bars of ±0.33% = .0033 for an 80% confidence interval, in order to get error bars of ±0.001 we need to run more simulations. Since the error bars go down as 1/√n, we expect to need to run 5000(.33/.1)² = 54,450 simulation runs.
Alternately, we want to be within ±.001, 80% of the time. Want: 0.001 = 1.282 (standard deviation of the mean) = 1.282 √[(0.0342)(1 - 0.0342)/N].
⇒ N = (1.282/.001)² (.0342)(1 - .0342) = 54,286.

12.6. A. y = 1.960. k = .05. Want n ≥ (y/k)² (Sn/X̄n)². ⇔ (Sn/X̄n)/√n ≤ k/y = 0.05/1.96 = .0255.

Number of Runs   Running Average   Sample Standard Deviation   (Sn/X̄n)/√n
1,000            837               643                         0.0243
5,000            852               698                         0.0116
10,000           849               672                         0.0079
25,000           846               665                         0.0050
This is first true when n = 1000.

12.7. E. For P = 99%, y = 2.576. For the Gamma Distribution, CV² = αθ²/(αθ)² = 1/α = 1/3. Required number of simulations = (y/k)² CV² = (2.576/.02)²/3 = 5530.
Comment: Similar to Exercise 21.4 in Loss Models. Of course we already know that the mean of this Gamma Distribution is: αθ = 3000, so we would not actually be estimating it via simulation.

12.8. C. Check whether: X(a) ≥ .98 Π̂0.8 and X(b) ≤ 1.02 Π̂0.8.

Number of Runs   Estimate of 80th percentile   X(a)   0.98 times estimate   Comparing to X(a)   X(b)   1.02 times estimate   Comparing to X(b)
5,000            699                           671    685                   FALSE               725    713                   FALSE
10,000           720                           703    706                   FALSE               738    734                   FALSE
20,000           718                           706    704                   TRUE                726    732                   TRUE
30,000           702                           694    688                   TRUE                709    716                   TRUE
Thus 20,000 simulation runs would have been enough. Comment: For P = 95%, y = 1.960.
12.9. E. An approximate 99% confidence interval is ±2.576 Si/√i, where Si² is the sample variance. Thus we want 2.576 Si/√i ≤ .005, or Si/√i ≤ .005/2.576 = 0.00194. This first occurs for more than 5000 simulations.

i      Total of Ratios   Running Average of Ratios   Si²      Si       Si/√i
100    318.5             3.1850                      0.0213   0.1459   0.01459
500    1,575.7           3.1514                      0.0198   0.1407   0.00629
1000   3,130.1           3.1301                      0.0209   0.1446   0.00457
5000   15,723.2          3.1446                      0.0218   0.1476   0.00209
Comment: Φ(2.576) = (1+.99)/2 = .995. According to Darwinʼs Ghost by Steve Jones, the expected value of this ratio for rivers is π. Note that the criterion is in terms of an amount rather than a percent.

12.10. E. For P = 98%, y = 2.326. For the Weibull Distribution, S(200) = exp[-(200/100)²] = 1.83%. Required number of simulations = (y/k)² F(200)/S(200) = (2.326/.01)² (98.17%/1.83%) = 2,902,332.
Comment: We are estimating S. Therefore we want: y √(F S/n) ≤ k S. ⇔ (y/k)² F/S ≤ n. It is very hard to get an accurate estimate of the probability in the extreme right hand tail; therefore it would take a lot of simulations in order to get an accurate estimate of S(200). For 2,902,332 simulations, 2.326 √[F(200) S(200)/n] = 2.326 √[(98.17%)(1.83%)/2,902,332] = 0.000183 = (1.83%)(1%).

12.11. B. Mean is: 372/50 = 7.44. Second Moment is: 4906/50 = 98.12. Sample Variance is: (50/49)(98.12 - 7.44²) = 43.64. Sample Standard Deviation = 6.606. Want 6.606/√i < 0.1. Therefore, i > (6.606/.1)² = 4364. Note that the criterion is in terms of an amount rather than a percent.

12.12. B. The variance of the empirical distribution is F(x)S(x)/n. We want 1.645 √(Fn Sn/n) ≤ .05 Fn. ⇒ 1.645² Fn Sn/n ≤ .05² Fn².
⇒ n Fn/Sn ≥ (1.645/.05)² = 1082. This first occurs when n = 2000.
Comment: For 2000 simulations, 1.645 √[(0.366)(0.634)/2000] = .0177 ≤ .0183 = (36.60%)(5%). We want n Fn/Sn ≥ (y/k)².
12.13. E. We want n Fn/Sn ≥ (y/k)² = (2.576/.01)² = 66,358. This first occurs when n = 50,000.
Comment: We are estimating F. Therefore we want: y √(Fn Sn/n) ≤ k F. ⇔ (y/k)² ≤ n F/S. For 50,000 simulations, 2.576 √[(0.5960)(0.4040)/50,000] = 0.00565 ≤ 0.00596 = (59.60%)(1%). This is an example of a yes/no simulation.

12.14. E. Check whether: X(a) ≥ .99(estimated median) and X(b) ≤ 1.01(estimated median).

Number of Runs   Estimate of median   X(a)    0.99 times estimate   Comparing to X(a)   X(b)    1.01 times estimate   Comparing to X(b)
1,000            $1306                $1207   $1293                 FALSE               $1362   $1319                 FALSE
2,000            $1272                $1193   $1259                 FALSE               $1311   $1285                 FALSE
3,000            $1312                $1274   $1299                 FALSE               $1340   $1325                 FALSE
4,000            $1321                $1309   $1308                 TRUE                $1336   $1334                 FALSE
Thus 4000 simulation runs is not enough.
Comment: p = .5. p(1-p) = .25. For P = 98%, y = 2.326.

12.15. D. P = 90%. ⇒ y = 1.645. The standard deviation of the sample variance is: σ² √[(kurtosis - 1)/n].
We want: 1.645 σ² √[(kurtosis - 1)/n] = 0.05 σ².
2013-4-13,
Simulation §12 How Many Simulations to Run, HCM 10/25/12, Page 213
12.16. C. y = 1.645. k = .02. Want n ≥ (y/k)2 S n 2 / X n 2 .
⇔ (Sn / X n )/ n ≤ k/y = .02/1.645 = .0122. This is first true when n = 100,000. Number of Runs
Running Average
Sample Standard Deviation
(Sn/Xbar)/n^.5
5,000
131
412
0.0445
25,000
133
403
0.0192
100,000
138
406
0.0093
500,000
137
408
0.0042
Alternately, we want: y Sn / n ≤ k X n ⇔ (Sn / X n )/ n ≤ k/y. Proceed as before. Alternately, we want n ≥ (y/k)2 S n 2 / X n 2 = (1.645/.02)2 CV2 = 6765 CV2 . We can estimate the CV from any of the given information. For example CV ≅ 408/137 = 3. n ≥ (6765)(32 ) ≅ 61,000. Thus of the choices given, we want at least 100,000 runs. 12.17. D. For P = 98%, y = 2.326. For the Weibull Distribution, F(50) = 1 - exp[-(50/100)2 ] = .221. Required number of simulations = (y/k)2 S(50)/F(50) = (2.326/.01)2 (.779/.221) = 190,706. Comment: Similar to Exercise 21.4 in Loss Models. Of course we already know F(50), so we would not actually be estimating it via simulation. We are estimating F. Reasoning it out, we want: y(F S / n).5 ≤ k F. ⇔ (y/k)2 S/F ≤ n. 12.18. C. 95% corresponds to ±1.960 standard deviations. We want 1.960 Si/ n < 1,000,000. Want Si/ n < 1,000,000/1.96 = 510,204. This first occurs for 250 simulation runs. Number of Runs
Si
Si/√i
50
5,620,848
794,908
100
5,737,898
573,790
250
5,703,661
360,731
500
5,670,152
253,577
Alternately, after 50 runs, we can estimate the sample standard deviation as about 5.62 million. We want (1.96)(5.62 million)/ i < 1,000,000. i > {(5.62)(1.96)}2 = 121. Of the choices given, this first occurs for 250 simulation runs. Comment: Note that the criterion is in terms of an amount rather than a percent. 12.19. C. For P = 95%, y = 1.960. F(500) = 1 - (1000/1500)1.5 = .456. Required number of simulations = (y/k)2 S(500)/F(500) = (1.960/.1)2 (.544/.456) = 458.
2013-4-13,
Simulation §12 How Many Simulations to Run, HCM 10/25/12, Page 214
12.20. A. For P = 95%, y = 1.960. S(5000) = (1000/6000)1.5 = .0680. Required number of simulations = (y/k)2 F(200)/S(200) = (1.960/.1)2 (.9320/.0680) = 5265. 12.21. E. For this Pareto, since α < 2, there is no finite variance. Therefore, one can not use of the Normal Approximation in order to determine a number of simulations that will meet this criteria. Comment: However, the Law of Large Numbers applies, since we have identically distributed distributions with a finite mean, whether or not the variance is finite. See for example, An Introduction to Probability Theory and its Applications by Feller. Therefore, for a Pareto with alpha = 1.5 the sample mean is a consistent estimator. The probability of a large error will indeed decline to zero as n goes to infinity. 12.22. D. The mean chance of running out of money is: 141/1000 = .141. This is Bernoulli process (there is either enough money or there is not) with variance: (.141)(1 - .141) = .1211 and standard deviation: 0.1211 = .348. For a 90% confidence interval we want ±1.645 standard deviations. We want (1.645)(.348/ i ) < .005. i > {(1.645)(.348)/.005}2 = 13,108. Comment: As example of a yes/no simulation. Also the requirement is not stated as a percent of the true value. 12.23. E. We want nSn ($1 billion)/Fn ($1 billion) ≥ (y/k)2 = (1.960/.05)2 = 1537. This first occurs when n = 1,000,000. Comment: 1.96
(1 - 0.002837)(0.002837) / 1,000,000 = .0104%.
After 1 million simulations, the resulting estimate that the probability that this present value will be greater than $1 billion would have been: .2837% ± .0104%.
2013-4-13,
Simulation §12 How Many Simulations to Run, HCM 10/25/12, Page 215
12.24. B. The variance of the empirical distribution F(1000) is: (D/N)(1 - D/N)/N. For 90% confidence, y = 1.645. Therefore we want: (D / N) (1 - D / N) / N ≤ .025D/N. ⇒ ND/{N - D) ≥ 4330.
1.645
Trying the choices, only choice B is not okay. Choice
N
D
ND/(N-D)
A B C D E
1000 2000 3000 4000 5000
815 1330 1775 2080 2325
4,405 3,970 4,347 4,333 4,346
Alternately, for each choice compute: (.025D/N)/ (D / N) (1 - D / N) / N = .025 D / 1 - D/ N , and check when this is greater than or equal to 1.645. Choice
N
D
.025 {D/(1 - D/N)}^.5
A B C D E
1000 2000 3000 4000 5000
815 1330 1775 2080 2325
1.659 1.575 1.648 1.646 1.648
Comment: In general, we want: nFn (x)/Sn (x) ≥ (y/k)2 . Here y = 1.645, k = .025, Fn (x) = D/N, and S n (x) = 1 - D/N. Thus the requirement is: D/(1 - D/N) ≥ (1.645/.025)2 . ⇔ ND/{N - D) ≥ 4330. Similar to 4, 11/05, Q.16. 12.25. C. y = 1.645. k = .01. (y/k)2 = 27060. We are given that 435000 = (y/k)2 S n 2 / X n 2 .
⇒ Sn 2 / X n 2 = 435000/27060 = 16.08. ⇒ Sn / X n = 4.01. 12.26. E. An approximate 95% confidence interval is ±1.960 Si / i . Thus we want 1.960Si / i ≤ 60. We want: Si / i ≤ 60/1.960 = 30.6. We check and this first occurs at 2500 simulations. i
Running Average of Xi
Si^2
Si
Si/√i
100
5,352
1,105,260
1,051.3
105.1
250
5,026
1,130,872
1,063.4
67.3
500
4,965
1,132,637
1,064.3
47.6
1000
5,142
1,094,928
1,046.4
33.1
2500
5,093
1,100,705
1,049.1
21.0
Comment: You can assume that the amounts of surplus are actually in millions of dollars. The plus or minus is given in terms of an amount rather than a percent of the true value.
2013-4-13,
Simulation §12 How Many Simulations to Run, HCM 10/25/12, Page 216
12.27. B. P = 90%. ⇒ y = 1.645. k = .025. n0 = y2 /k2 = (1.645/.025)2 = 4330. Standard for Full Credibility for Severity: n0 C V2 = (4330)32 = 38,970. Alternately, for N simulations, X has variance of σ2/N. An approximate 90% confidence interval for µ is: X ± 1.645σ/ N . Set .025µ = 1.645σ/ N . ⇒ N = (1.645/.025)2 (σ/µ)2 = (4330) 32 = 38,970. Comment: Similar to 4, 11/06, Q.11. 12.28. C. The variance of the empirical distribution function at 1000 is: F(1000)S(1000)/n. Since Φ[2.326] = 0.99, 98% probability ⇔ y = 2.326. We want, 2.326
F(1000) S(1000) ≤ .05 F(1000). n
⇒ n ≥ (2.326/.05)2 S(1000)/F(1000) ≅ (2164)(18%/82%) = 475. Comment: Similar to 4, 5/07, Q.37. 12.29. A. The variance of the empirical distribution function at 800 is: F(800)S(800)/n. Since Φ[2.576] = 0.995, 99% probability ⇔ y = 2.576. We want, 2.576 F(800)S(800)/ n ≤ 0.02 F(800).
⇒ n ≥ 16,589 S(800)/F(800). From the 100 runs we estimate that F(800) is approximately: 70/100 = 0.7.
⇒ n ≥ (16,589)(0.3/0.7) = 7110. 12.30. An approximate 95% confidence interval is ±1.96 Si / i . Thus we want 1.96 Si / i ≤ .02 or Si / i ≤ .02/1.96 = .01020. This first occurs at 31 policies. Comment: You can ignore all but the first and last columns of the table (since they were nice enough to do all of the required calculations for you.) Note that the criterion is in terms of an amount rather than a percent.
2013-4-13,
Simulation §12 How Many Simulations to Run, HCM 10/25/12, Page 217
12.31. E. The sample mean is 3 and the sample variance is: {(1 - 3)2 + (2 - 3)2 + (3 - 3)2 + (4 - 3)2 + (5 - 3)2 }/(5 - 1) = 2.5. Sample standard deviation =
2.5 .
With i simulations, standard deviation of the estimate of E[X] is:
2.5 / i .
We want 2.5 / i ≤ .05. i ≥ 2.5/.052 = 1000. Comment: Note that one uses the sample variance, with n - 1 in the denominator. The sample variance estimated from only 5 values is subject to very significant random fluctuation, and should not be used for this purpose in practical applications. Note that the criterion is in terms of an amount rather than a percent. 12.32. D. The variance of the empirical distribution F(1500) is: (P/N)(1 - P/N)/N. For 95% confidence, y = 1.960. Therefore we want: (P / N) (1 - P / N) / N ≤ .01P/N. ⇒ NP/{N - P) ≥ 38,416.
1.960
Trying the choices, NP/{N - P) is: 34364, 15000, 27125, 39243, and 37688, respectively. Choice
N
P
NP/(N-P)
A B C D E
2000 3000 3500 4000 4500
1890 2500 3100 3630 4020
34,364 15,000 27,125 39,243 37,688
Therefore, only N = 4000 and P = 3630, choice D, is okay. Alternately, for each choice compute: (.01P/N)/ (P / N) (1 - P / N) / N = .01 P / 1 - P / N , and check when this is greater than or equal to 1.960. Choice
N
P
.01 {P/(1 - P/N)}^.5
A B C D E
2000 3000 3500 4000 4500
1890 2500 3100 3630 4020
1.854 1.225 1.647 1.981 1.941
Comment: In general, we want: nFn (x)/Sn (x) ≥ (y/k)2 . Here y = 1.960, k = .01, Fn (x) = P/N, and S n (x) = 1 - P/N. Thus the requirement is: P/(1 - P/N) ≥ (1.960/.01)2 . ⇔ NP/(N - P) ≥ 38,416.
2013-4-13,
Simulation §12 How Many Simulations to Run, HCM 10/25/12, Page 218
12.33. E. P = 95%. ⇒ y = 1.960. k = .05. n0 = y2 /k2 = 1537. CV = standard deviation / mean = 1.2. Standard for Full Credibility for Severity: n0 C V2 = (1537)1.22 = 2213. Alternately, for N simulations, X has variance of σ2/N. An approximate 95% confidence interval for µ is: X ± 1.960σ/ N . Set .05µ = 1.960σ/ N . ⇒ N = (1.960/.05)2 (σ/µ)2 = (1537)1.22 = 2213. Comment: See “Mahlerʼs Guide to Classical Credibility.” 12.34. E. The variance of the empirical distribution function at 300 is: F(300)S(300)/n. Since Φ[2.576] = 0.995, 99% probability ⇔ y = 2.576. We want, 2.576 F(300)S(300)/ n ≤ .01 F(300).
⇒ n ≥ 66,358 S(300)/F(300). For an Exponential Distribution with θ = 100, F(300) = 1 - e-3 = 0.9502, and S(300) = e-3 = 0.0498. n ≥ (66,358)(.0498/.9502) = 3478. Comment: If we knew we had a Exponential Distribution with θ = 100, then F(300) = 1 - e-3, without any need for using simulation. We actually only make use of the fact that F(300) ≅ 95%.
2013-4-13,
Simulation §13 Gamma and Related Distributions, HCM 10/25/12, Page 219
Section 13 Simulating a Gamma and Related Distributions While it is not discussed in Loss Models, one can simulate a Gamma Distribution as well as the Chi-Square, Transformed Gamma, Inverse Gamma, Beta, and Generalized Pareto Distributions. Gamma, Integer Shape Parameter:114 One can simulate a Gamma Distribution as a sum of Exponentials. For example, assume that 17.1, 8.5, 22.9, and 3.7 are four independent random draws from an Exponential distribution with θ = 10, using the inversion method. Then their sum 17.1 + 8.5 + 22.9 + 3.7 = 52.2 is a random draw from a Gamma Distribution with α = 4 and θ = 10. Exercise: Let 0.54, 0.16, 0.25 be independent random numbers from [0, 1]. Simulate a random draw from a Gamma Distribution with α = 3 and θ = 1. Assume large random numbers correspond to large losses. [Solution: Simulate three independent draws from an Exponential Distribution with θ = 1. Simulate the Exponentials via the Inversion Method, u = 1- e-x, or x = -ln(1-u). Thus the three independent random Exponentials are: -ln(0.46), -ln(0.84), and -ln(0.75). The random Gamma is their sum: -{ln(0.46) + ln(0.84) + ln(0.75)} = 1.239. Comment: Note that alternately we can take: -ln[(0.46)(0.84)(0.75)] = 1.239. ] Exercise: Let 1.239 be a random draw from a Gamma Distribution with α = 3 and θ = 1. Use it to get a random draw from a Gamma Distribution with α = 3 and θ = 10. [Solution: If 1.239 is a random draw from a Gamma Distribution with α = 3 and θ = 1, then (10)(1.239) = 12.39 is a random draw from a Gamma with α = 3 and θ = 10.] A More Efficient Algorithm: There is a somewhat more efficient way to simulate a Gamma for integer alpha. Assume for example we wish to generate 4 independent exponential variables each with mean θ. Then since 1-u is also uniformly distributed on (0,1) if u is, we can simulate each Exponential via x = -θ ln(u), where large random numbers correspond to small simulated values. Then simulate 4 random numbers from (0,1): u1 , u2 , u3 , u4 . Let x1 = -θ ln (u1 ), ..., x4 = -θ ln(u4 ). The desired Gamma variable is: x1 + x2 + x3 + x4 = -θ ln(u1 u2 u3 u4 ). 114
This is simple enough to be on your exam. As discussed in Simulation by Ross, not on the syllabus, the Gamma Distribution can be simulated via the rejection method, for any value of the shape parameter alpha.
2013-4-13,
Simulation §13 Gamma and Related Distributions, HCM 10/25/12, Page 220
Exercise: Let 0.38, 0.46, 0.84, 0.75 be independent random numbers from (0,1). Use the above algorithm to simulate a random draw from a Gamma Distribution with α = 4 and θ = 10. [Solution: The random Gamma is: -10 ln[(0.38)(0.46)(0.84)(0.75)] = 22.06. ] In general, in order to simulate a Gamma with shape parameter α = n and scale parameter θ, simulate n random numbers from (0,1): u1 , ..., un . Then the desired Gamma variable is: -θ ln(u1 u2 ... un ). This revised algorithm requires we take only one logarithm rather than n, at the cost of additional multiplications, and thus may save some computer execution time. Chi-Square Distribution, (Gamma with half-integer shape parameter): A Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν/2 and θ = 2. The Chi-Square Distribution with ν degrees of freedom is the sum of ν squares of independent Standard Unit Normals. Thus if for example, -1.228, 1.144, and -0.257 are three independent random draws from a Standard Normal, then (-1.228)2 + (1.144)2 + (-0.257)2 = 2.883 is a random draw from a Chi-Square Distribution with three degrees of freedom. Let Y be a random draw from a Chi-Square Distribution with three degrees of freedom, which is a Gamma Distribution as per the Loss Models with α = 3/2 and θ = 2. This has mean (3/2)/(2) = 3. Then 1000Y is from a Gamma with α = 3/2 and θ = 2000, with mean 3000 rather than 3. Thus if 2.883 is a random draw from a Chi-Square with 3 degrees of freedom, then 2883 is a random draw from a Gamma Distribution with α = 3/2 and θ = 2000. In general in order to simulate a Gamma Distribution with parameters α = ν /2 and θ, one simulates a random draw from a Chi-Square with ν degrees of freedom and multiplies by θ/2. Exercise: Let 0.8, 0.2, -1.4, 0.5, and -0.3 be five independent random draws from a Standard Unit Normal. Simulate a random draw from a Gamma Distribution with α = 2.5 and θ = 1/100. [Solution: (0.8)2 + (0.2)2 + (-1.4)2 + (0.5)2 + (-0.3)2 = 2.98 is a random draw from a Chi-Square Distribution with five degrees of freedom. This Chi-Square Distribution is a Gamma Distribution with a shape parameter of 5/2 = 2.5, but a mean of 5. The desired Gamma has a mean of 2.5/100 = 0.025. Thus we multiply by 0.025/5 = 1/200 = θ/2, in order to get a random draw from the desired Gamma: 2.98/200 = 0.0149.]
2013-4-13,
Simulation §13 Gamma and Related Distributions, HCM 10/25/12, Page 221
Inverse Gamma Distribution: Assume we want to simulate a random variable Y from an Inverse Gamma Distribution as per Loss Models with parameters α = 4, and θ = 10. Then by the definition of the Inverse Gamma Distribution: F(y) = 1 - Γ[4; (10/y)]. Thus θ/Y = 10/Y is Gamma distributed with α = 4 and θ = 1. Above we saw how to simulate such a variable X. Set 10/Y = X. Therefore, Y = 10/X. If for example, a random draw from a Gamma with α = 4 and θ = 1 is 0.0522, then 10/0.0522 = 191.6 is a random draw from an Inverse Gamma with α = 4 and θ = 10. In general in order to simulate a random draw from an Inverse Gamma, one simulates a random draw from a Gamma with the same α parameter and unity scale parameter, inverts, and then multiplies that result by θ.115 LogGamma: As discussed previously, if we wish to simulate a random LogNormal, we simulate a random Normal with corresponding parameters and then exponentiate. Similarly, if we wish to simulate a random LogGamma, we simulate a random Gamma with corresponding parameters and then exponentiate. Assume we want to simulate a random variable Y from a LogGamma Distribution, with α = 4 and θ = 2, and Distribution Function F(x) = Γ[α, ln(x) / θ]. Then, ln(Y) is Gamma distributed with α = 4 and θ = 2. Above we saw how to simulate such a variable X. Set ln(Y) = X. Therefore, Y = exp(X). If for example, a random draw from a Gamma with α = 4 and θ = 2 is 10.44, then e10.44 = 34,201 is a random draw from a LogGamma with α = 4 and θ = 2. Exercise: Use 1.239 a random draw from a Gamma Distribution with α = 3 and θ = 1, to get a random draw from a LogGamma Distribution with α = 3 and θ = 10. [Solution: exp[(10)(1.239)] = 240,386 is a random draw from a LogGamma with α = 3 and θ = 10.]
115
It may be helpful to think of this as first simulating a random draw from an Inverse Gamma Distribution with scale parameter of unity and then adjusting the scale to desired value at the end, by multiplying by the desired scale parameter theta.
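A minimal sketch of the Inverse Gamma and LogGamma recipes above, assuming a Gamma draw with the appropriate parameters is already in hand (the function names are mine):

```python
import math

def inverse_gamma_from_unit_gamma(x, theta):
    # x is a random draw from a Gamma with the same alpha and scale 1;
    # invert and multiply by the desired scale theta.
    return theta / x

def loggamma_from_gamma(x):
    # x is a random draw from a Gamma with the corresponding alpha and theta;
    # exponentiate to get a LogGamma draw.
    return math.exp(x)

# The examples and exercise in the text:
print(inverse_gamma_from_unit_gamma(0.0522, 10))   # about 191.6
print(loggamma_from_gamma(10.44))                  # about 34,201
print(loggamma_from_gamma(10 * 1.239))             # about 240,386
```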
Transformed Gamma: Assume we want to simulate a random variable Y from a Transformed Gamma Distribution as per Loss Models with parameters α = 4, θ = 10, and τ = 3. Then by the definition of the Transformed Gamma Distribution: F(y) = Γ[4; (y/10)3 ] = Γ[4; (0.001y3 )]. Thus (Y/θ)τ = 0.001Y3 is Gamma distributed with α = 4 and θ = 1. Above we saw how to simulate such a variable X. Set 0.001Y3 = X. Therefore, Y = 10X1/3. If for example, a random draw from a Gamma with α = 4 and θ = 1 is 0.0522, then 10(0.0522)1/3 = 3.74 is a random draw from a Transformed Gamma with α = 4, θ = 10, and τ = 3. In general, in order to simulate a random draw from a Transformed Gamma, one simulates a random draw from a Gamma with the same α parameter and unity scale parameter, takes the result to the power 1/τ, and then multiplies that result by θ.116 In the above: (10)(0.0522)1/3 = 3.74. Exercise: Let 1.239 be a random draw from a Gamma Distribution with α = 3 and θ = 1. Use it to get a random draw from a Transformed Gamma Distribution with α = 3, θ = 100 and τ = 4. [Solution: (100)(1.239)1/4 = 105.5. ] Inverse Transformed Gamma Distribution: Assume we want to simulate a random variable Y from an Inverse Transformed Gamma Distribution as per Loss Models with parameters α = 4, θ = 10 and τ = 3. Then by the definition of the Inverse Transformed Gamma Distribution: F(y) = Γ[4; (10/y)3 ] = Γ[4; (1000/y3 )]. Thus (θ / Y)τ = 1000 /Y3 is Gamma distributed with α = 4 and θ = 1. Above we saw how to simulate such a variable X. Set 1000 /Y3 = X.⇒ Y3 = 1000/X. Therefore, Y = 10/X1/3. If for example, a random draw from a Gamma with α = 4 and θ = 1 is 0.0522, then 10/(0.0522)1/3 = 26.76 is a random draw from an Inverse Transformed Gamma with α = 4, θ = 10, and τ = 3.
116
First simulating a random draw from a Transformed Gamma Distribution with scale parameter of unity and then adjusting the scale to desired value at the end, by multiplying by the desired scale parameter theta.
In general in order to simulate a random draw from an Inverse Transformed Gamma, one simulates a random draw from a Gamma with the same α parameter and unity scale parameter, takes the result to the power -1/τ, and then multiplies that result by θ.117 In the above example: 10(0.0522)-1/3 = 26.76. Exercise: Let 1.239 be a random draw from a Gamma Distribution with α = 3 and θ = 1. Use it to get a random draw from an Inverse Transformed Gamma Distribution with α = 3, θ = 100, and τ = 4. [Solution: (100)(1.239)-1/4 = 94.8 is a random draw from an Inverse Transformed Gamma Distribution with α = 3, θ = 100, and τ = 4. ] Beta Distribution:118 A Beta Distribution can be expressed in terms of two Gamma Distributions.119 If X is a random draw from a Gamma Distribution with shape parameter a and scale parameter θ, and Y is a random draw from a Gamma Distribution with shape parameter b and same scale parameter θ, then Z = X / (X+Y) is a random draw from a Beta Distribution with parameters a and b (and scale parameter 1.) While this method works for non-integer parameters, it is particularly useful when a and b are integers. Exercise: Let 0.49, 0.92 be independent random numbers from [0,1]. Simulate a random draw from a Gamma Distribution with α = 2 and θ = 1. [Solution: Simulate two independent draws from an Exponential Distribution with θ = 1. The random Gamma is their sum: -{ln(0.51) + ln(0.08)} = 3.199.] Exercise: Let 1.239 be a random draw from a Gamma Distribution with α = 3 and θ = 1 and let 3.199 be a random draw from a Gamma Distribution with α = 2 and θ = 1. Use them to simulate a random draw from a Beta Distribution with parameters a = 3, and b = 2, θ = 1. [Solution: 1.239 / (1.239 + 3.199) = 0.279. ]
117
It may be helpful to think of this as first simulating a random draw from an Inverse Transformed Gamma Distribution with scale parameter of unity and then adjusting the scale to desired value at the end, by multiplying by the desired scale parameter theta. 118 As discussed in Simulation by Ross, not on the syllabus, the Beta Distribution can be simulated via the rejection method. 119 See for example page 48 of Hogg & Klugman, not on the Syllabus.
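A minimal sketch of the last three recipes — the 1/τ power for the Transformed Gamma, the -1/τ power for the Inverse Transformed Gamma, and the ratio construction for the Beta — again assuming the needed unit-scale Gamma draws are already in hand (the function names are mine):

```python
def transformed_gamma_from_unit_gamma(x, theta, tau):
    # x is a draw from a Gamma with the same alpha and scale 1:
    # take the 1/tau power, then multiply by the desired scale theta.
    return theta * x ** (1.0 / tau)

def inverse_transformed_gamma_from_unit_gamma(x, theta, tau):
    # Same idea, but take the -1/tau power before rescaling by theta.
    return theta * x ** (-1.0 / tau)

def beta_from_two_gammas(x, y):
    # x ~ Gamma with shape a, y ~ Gamma with shape b (same scale):
    # x / (x + y) is a Beta draw with parameters a and b (theta = 1).
    return x / (x + y)

# The exercises in the text:
print(transformed_gamma_from_unit_gamma(1.239, 100, 4))           # about 105.5
print(inverse_transformed_gamma_from_unit_gamma(1.239, 100, 4))   # about 94.8
print(beta_from_two_gammas(1.239, 3.199))                         # about 0.279
```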
Generalized Pareto Distribution:120 The Generalized Pareto Distribution as per Loss Models with parameters α, θ, and τ, with both α and τ integer or half-integer has Distribution Function:121 β(τ, α; x/(x+θ)) = F(2τ),(2α) [αx/(θτ)], where this is the F-Distribution with 2τ and 2α degrees of freedom. This F-Distribution is the ratio of two Chi-Square Distributions each divided by their degrees of freedom: {(χ2 2τ) / 2τ} / {(χ2 2α) / 2α}. Since αx/(θτ) follows this distribution, x follows: θ {(χ2 2τ) / (χ2 2α) }. Thus one can simulate a Generalized Pareto Distribution as a ratio of two Chi-Square Distributions, provided 2τ and 2α are both integers. Exercise: Assume one has six independent random draws from a Standard Unit Normal Distribution with mean zero and standard deviation one: 0.1, 0.5, -0.9, -1.8, 1.8, and -0.2. Use these values to simulate a random draw from a Chi-Square Distribution with six degrees of freedom. [Solution: A random draw from a Chi-Square with 6 degrees of freedom is a sum of 6 squares of random independent Unit Normals: (0.1)2 + (0.5)2 + (-0.9)2 + (-1.8)2 + (1.8)2 + (-0.2)2 = 7.59.] Exercise: Assume one has four independent random draws from a Standard Unit Normal Distribution with mean zero and standard deviation one: 0.2, -1.4, 0.5, and 0.8. Use these values to simulate a random draw from a Chi-Square Distribution with four degrees of freedom. [Solution: A random draw from a Chi-Square Distribution with four degrees of freedom is a sum of four squared random independent draws from Unit Normals: (0.2)2 + (-1.4)2 + (0.5)2 + (0.8)2 = 2.89.] Exercise: Assume 7.59 is a random draw from a Chi-Square Distribution with six degrees of freedom, and 2.89 is a random draw from a Chi-Square Distribution with four degrees of freedom. Use these values to simulate a random draw from a Generalized Pareto Distribution with parameters: α = 2, θ = 0.01, and τ = 3. [Solution: This Generalized Pareto Distribution has Distribution Function: β(3,2; x/(x+.01)) = F6,4[2x/{(3)(.01)}] = F6,4[66.6667x]; this is the F-Distribution with 6 and 4 d.f., a ratio of two Chi-Square Distributions divided by their degrees of freedom: {(χ2 6 ) / 6} / {(χ2 4 ) / 4} = (2/3)(χ2 6 )/(χ2 4 ). Since 66.6667x follows this distribution, x follows: {(2/3)/66.6667)} (χ2 6 )/(χ2 4 ) = (0.01) {(χ2 6 ) / (χ2 4 )} = θ {(χ2 2k) / (χ2 2α)}. Thus using the given random draws from Chi-Square Distributions, the random draw from the desired Generalized Pareto Distribution is: (0.01)(7.59 / 2.89) = 0.0263.] 120
The Generalized Pareto Distribution can be simulated as a special case of Transformed Beta Distribution, simulated by the rejection method, not on the syllabus. 121 See for example, Loss Distributions by Hogg & Klugman, p.217, 223, not on the syllabus.
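A minimal sketch of the Chi-Square ratio recipe for the Generalized Pareto, usable when 2τ and 2α are integers so that each Chi-Square can be built from squared Standard Normal draws (the function name is mine):

```python
def generalized_pareto_from_normals(alpha, theta, tau, normals):
    # Use the first 2*tau normals for the numerator Chi-Square and the next
    # 2*alpha normals for the denominator Chi-Square; the Generalized Pareto
    # draw is theta * (Chi-Square with 2*tau d.f.) / (Chi-Square with 2*alpha d.f.).
    chi_sq_num = sum(z * z for z in normals[:2 * tau])
    chi_sq_den = sum(z * z for z in normals[2 * tau: 2 * tau + 2 * alpha])
    return theta * chi_sq_num / chi_sq_den

# The exercise in the text: alpha = 2, theta = 0.01, tau = 3.
normals = [0.1, 0.5, -0.9, -1.8, 1.8, -0.2, 0.2, -1.4, 0.5, 0.8]
print(generalized_pareto_from_normals(2, 0.01, 3, normals))   # about 0.0263
```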
Problems: 13.1 (2 points) Simulate a random draw from a Gamma Distribution with α = 2 and θ = 500, by first simulating random draws from an Exponential Distribution. Use the following independent random numbers from [0,1] : 0.53, 0.76, 0.91, 0.38 (A) 800 (B) 900 (C) 1000 (D) 1100 (E) 1200 13.2 (1 point) One has three independent random draws from a Normal Distribution with mean zero and standard deviation one: 0.635, -0.011, 0.701. Use these three values to simulate a random draw from a Chi-Square Distribution with 3 degrees of freedom. (A) 0.9 (B) 1.0 (C) 1.1 (D) 1.2 (E) 1.3 13.3 (1 point) One has three independent random draws from a Normal with mean zero and standard deviation one: 0.635, -0.011, 0.701. Use these three values to simulate a random draw from a Gamma Distribution with parameters α = 3/2 and θ = 1000. A. less than 400 B. at least 400 but less than 420 C. at least 420 but less than 440 D. at least 440 but less than 460 E. at least 460 13.4 (1 point) Assume 0.38 is a random draw from a Gamma Distribution with parameters α = 3 and θ = 2. Use this value to simulate a random draw from an Inverse Gamma Distribution with parameters α = 3 and θ = 2. A. less than 4 B. at least 4 but less than 6 C. at least 6 but less than 8 D. at least 8 but less than 10 E. at least 10 13.5 (1 point) Assume 11.2 is a random draw from a Gamma Distribution with parameters
α = 3 and θ = 1. Use this value to simulate a random draw from a Transformed Gamma Distribution with parameters α = 3, θ = 2 and τ = 4. A. less than 4.0 B. at least 4.0 but less than 4.2 C. at least 4.2 but less than 4.4 D. at least 4.4 but less than 4.6 E. at least 4.6
13.6 (3 points) Use the following set of random numbers from [0,1] : 0.8150 0.7484 0.0638 0.9247 0.2164. in order to generate a random draw from a Beta Distribution as per Loss Models with parameters a = 2, b = 3, and θ = 1. Hint: Simulate Gamma Distributions as a sum of Exponential Distributions. Use the fact that if X is a Gamma Distribution with shape parameter a and scale parameter θ, and Y is a Gamma Distribution with shape parameter b and the same scale parameter θ, then Z = X / (X+Y) is a Beta Distribution with parameters a, b, and scale parameter 1. A. less than 0.2 B. at least 0.2 but less than 0.3 C. at least 0.3 but less than 0.4 D. at least 0.4 but less than 0.5 E. at least 0.5 13.7 (4 points) Assume one has six independent random draws from a Normal Distribution with mean zero and standard deviation one: 0.063, 0.460, -0.949, -1.770, 1.787, -0.189. Use these values to simulate a random draw from a Generalized Pareto Distribution as per Loss Models with parameters α = 1, θ = 10, and τ = 2. Hint: Use the relationship between the F-Distribution and the Incomplete Beta Function,
β(τ,α; x/(x+θ)) = F(2τ),(2α) [αx/(θτ)], where this is the F-Distribution with 2τ and 2α degrees of freedom. This F-Distribution is the ratio of two Chi-Square Distributions each divided by their degrees of freedom: {(χ2 2τ) / 2τ} / {(χ2 2α) / 2α}. A. less than 11 B. at least 11 but less than 12 C. at least 12 but less than 13 D. at least 13 but less than 14 E. at least 14 13.8 (2 points) Let 0.72, 0.34, 0.06, and 0.55 be independent random numbers from [0,1]. Simulate a random draw from a Gamma Distribution with α = 4 and θ = 1000, by first simulating random draws from an Exponential Distribution. A. less than 2500 B. at least 2500 but less than 2600 C. at least 2600 but less than 2700 D. at least 2700 but less than 2800 E. at least 2800
Solutions to Problems: 13.1. D. Let large random numbers correspond to large losses.
⇒ To simulate an Exponential with mean 500, u = F(x) = 1 - e-x/500. ⇒ x = -500ln(1-u). -500ln.47 = 377.5. -500ln.24 = 713.6. The random Gamma is: 377.5 + 713.6 = 1091. Comment: In order to simulate a Gamma with α = 2, we need to simulate two Exponentials, requiring two random numbers. 13.2. A. The Chi-Square Distribution with 3 degrees of freedom is a sum of squares of three independent random unit Normals: (.635)2 + (-.011)2 + (.701)2 = 0.895. 13.3. D. Using the previous solution, a random draw from a Chi-Square Distribution with three degrees of freedom is .895. This is also a Gamma Distribution with α = 3/2 and θ = 2: Γ[1.5; x/2]. If we instead want a Gamma with θ = 1000: Γ[1.5; x/1000], then x is multiplied by 1000/2 = 500.
⇒ The random draw is: (500)(.895) = 447.5. 13.4. E. If .38 is a random draw from a Gamma Distribution with parameters α = 3 and θ = 2, then .38/2 = .19 is a random draw from a Gamma Distribution with parameters α = 3 and θ = 1. Thus a random draw from an Inverse Gamma with parameters α = 3 and θ = 2 is: 2/.19 = 10.53. 13.5. A. The Transformed Gamma distribution F(x) = Γ[3 ; (x/2)4 ] = Γ[3 ; x4 /16]. So if x4 comes from a Gamma α = 3 and θ = 16, then x comes from the Transformed Gamma with parameters α = 3, θ = 2 and τ = 4. A draw from a Gamma with α = 3 and θ = 16 is 16 times a random draw from a Gamma with α = 3 and θ = 1. Thus (11.2)(16) = 179.2 is a draw from a Gamma with α = 3 and θ = 16. Set x4 = 179.2. Therefore, x = 179.20.25 = 3.659.
13.6. E. First simulate a series of Exponentials, with scale parameter of 1, via Inversion: 1 - e^(-x) = u. Therefore x = -ln(1 - u).

Random Number    Random Exponential
0.8150           1.687
0.7484           1.380
0.0638           0.066
0.9247           2.586
0.2164           0.244
A Gamma with α = 2 is the sum of two random Exponentials: 1.687 + 1.380 = 3.067. A Gamma with α = 3 is the sum of three random Exponentials: .066 + 2.586 + .244 = 2.896. If X is a random draw from a Gamma Distribution with shape parameter 2 and scale parameter of 1, and Y is a random draw from a Gamma Distribution with shape parameter 3 and the same scale parameter 1, then Z = X / (X+Y) is a random draw from a Beta Distribution with parameters 2 and 3. Thus the desired random draw from a Beta is: 3.067 / (3.067 + 2.896) = 0.514. Comment: Choose any consistent scale parameter for the simulated Gammas.
13.7. D. This Generalized Pareto Distribution has Distribution Function: β(2, 1; x/(x+10)) = F4,2[{x/(x+10)} / {(2)(10)/(x+10)}] = F4,2[x/20], where this is the F-Distribution with 4 and 2 degrees of freedom. This F-Distribution is the ratio of two Chi-Square Distributions divided by their degrees of freedom: {χ²(4)/4} / {χ²(2)/2}. Since x/20 follows this distribution, x follows: (10) χ²(4)/χ²(2). A random draw from a Chi-Square Distribution with four degrees of freedom is a sum of four squared random independent draws from Unit Normals: (.063)² + (.460)² + (-.949)² + (-1.770)² = 4.249. A random draw from a Chi-Square Distribution with two degrees of freedom is a sum of two squared random independent draws from Unit Normals: (1.787)² + (-.189)² = 3.229. Thus the random draw from a Generalized Pareto Distribution is (10)(4.249 / 3.229) = 13.16. Comment: This relationship will only be useful when α and τ are each integer or half-integer. Since both α and τ are integers in this case, both the numerator and denominator could have been simulated as a sum of exponential distributions.
13.8. B. Simulate four independent draws from an Exponential Distribution with θ = 1000. Simulate the Exponentials via Algebraic Inversion, u = 1- e-x/θ, or x = -θ ln (1-u) . Thus the four independent random Exponentials are: -1000ln(1-.72), -1000ln(1-.34), -1000ln(1-.06), and -1000ln(1-.55). The random Gamma is their sum: -1000{ln(.28) + ln(.66) + ln(.94) + ln(.45)} = -1000 ln{(.28)(.66)(.94)(.45)} = 2549.
Section 14, Simulating Mixtures of Models The methods already discussed can be extended in order to simulate mixtures of models. This can be done either for n-point mixtures or continuous mixtures. N-Point Mixtures (Mixed Distributions): Assume one has a mixed distribution with weight p to one distribution A(x) and weight 1 - p to another distribution B(x).122 This is referred to in Loss Models as a 2-point mixture. In order to simulate a random draw from this mixed distribution, first one would simulate a random number from 0 to 1 and see whether it is less than p, the weight applied to the first distribution.123 If so then one simulates a random draw from the first distribution A; if not then one simulates a random draw from the second distribution B. For example, one could have a mixed “Pareto-LogNormal” Distribution, with A = Pareto Distribution, B = LogNormal Distribution, and p = 0.10, the weight for the Pareto Distribution. Then in order to simulate a draw from this Pareto-LogNormal first simulate a random number from 0 to 1. If this random number is less than 0.1, then simulate a random draw from the Pareto, using a new random number and the Inverse Transform Method. If this random number is greater than or equal to 0.1, then simulate a random draw from the LogNormal. Thus simulating mixed distributions involves first simulating a random number from zero to one in order to decide which of the underlying distributions the claim comes from.124 Then one proceeds to simulate a random draw from the appropriate underlying distribution using one of the techniques discussed previously. Exercise: Let a distribution function be given by: (0.3){1 - 100/x2 } + (0.7){1 - 1000/x3 }, for x > 10. Then given the sequence of random numbers: 0 .41, 0.93, 0.12, 0.62, simulate two random draws from this distribution. Use the first number of a pair to decide which of the two components of the mixture to work with. Small random numbers result in working with the first component. Use the second number of a pair to simulate the component via the method of inversion. Large random numbers correspond to large losses.
122 See "Mahlerʼs Guide to Loss Distributions."
123 This is mathematically equivalent to a Bernoulli distribution with f(0) = p and f(1) = 1 - p. F(0) = p and F(1) = 1. We see where u is first exceeded by F. If u < p, then u < F(0), and we simulate a random draw from the first component of the mixture.
124 Note that this technique applies equally well even if more than two distributions are weighted together.
[Solution: Since the first random number 0.41 is ≥ 0.3, we simulate a random draw from the second distribution: 1 - 1000/x3 . Using the second random number and the Inverse Transform Method, 0.93 = 1 - 1000/x3 . Therefore x = 24.26. Since the third random number 0.12 is < 0.3, we simulate a random draw from the first distribution: 1 - 100/x2 . Using the fourth random number and the Inverse Transform Method, 0.62 = 1 - 100/x2 . Therefore x = 16.22. Comment: Note that the given distribution is a mixture of two Single Parameter Pareto Distributions, the first longer-tailed than the second, each of which could be simulated by the Inverse Transform Method.] Given more random numbers from [0, 1], one can simulate more random draws from the mixed distribution. In each case one uses a random number to check which of the two distributions to simulate from, and then uses another random number to simulate a single draw from that distribution. To simulate a random draw from the two point mixture pA(x) + (1 - p)B(x): If the random number u1 is less than p, then simulate a random draw from A. If the random number u1 is greater than or equal to p, then simulate a random draw from B. One would need an additional random number u2 to simulate a random draw from A or B. One can proceed similarly with mixtures of 3, 4, or more distributions. First one uses a random number number from (0, 1) in order to decide which of the distributions to simulate, and then one simulates a single random draw from that distribution.
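A minimal Python sketch of this two-step procedure, using the two Single Parameter Pareto components of the exercise above (the function names are mine):

```python
def simulate_single_parameter_pareto(u, theta, alpha):
    # Inverse Transform Method: u = 1 - (theta/x)^alpha  =>  x = theta / (1 - u)^(1/alpha).
    return theta / (1.0 - u) ** (1.0 / alpha)

def simulate_two_point_mixture(u1, u2, p):
    # u1 decides which component (u1 < p => first component);
    # u2 is then used to invert that component's distribution function.
    if u1 < p:
        return simulate_single_parameter_pareto(u2, theta=10, alpha=2)   # first component
    else:
        return simulate_single_parameter_pareto(u2, theta=10, alpha=3)   # second component

# The exercise in the text, with weight p = 0.3 on the first component:
print(simulate_two_point_mixture(0.41, 0.93, 0.3))   # about 24.26
print(simulate_two_point_mixture(0.12, 0.62, 0.3))   # about 16.22
```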
Mixtures versus Linear Combinations: A two-point mixture differs from a linear combination of two variables. They are simulated differently and have different variances. Let X be a 4-sided die. X has mean 2.5, variance 1.25, and 2nd moment 7.5. Let Y be an 8-sided die. Y has mean 4.5, variance 5.25, and 2nd moment 25.5. Let Z be a two-point mixture, with 70% weight to X and 30% weight to Y. Z has mean: (0.7)(2.5) + (0.3)(4.5) = 3.1, second moment: (0.7)(7.5) + (0.3)(25.5) = 12.9, and variance: 12.9 - 3.12 = 3.29.
Exercise: Simulate a random draw from Z, given two random numbers: 0.57, 0.32. Use the first random number to decide between X and Y, where low values correspond to X. [Solution: 0.57 < 0.70, and therefore simulate a draw from X. 0.32 is first exceeded by F(2) = 2/4 = 0.50, so the simulated value is 2.] Exercise: Simulate a random draw from Z, given two random numbers: 0.81, 0.66. Use the first random number to decide between X and Y, where low values correspond to X. [Solution: 0.81 ≥ 0.70, and therefore simulate a draw from Y. 0.66 is first exceeded by F(6) = 6/8 = 0.75, so the simulated value is 6.] As before, let X be a 4-sided die and let Y be an 8-sided die. Let W = 0.7X + 0.3Y. (Sum the result of rolling a 4 sided-die multiplied by 0.7 and the result of rolling an 8 sided-die multiplied by 0.3.) W is a linear combination rather than a mixture. W has mean: (0.7)(2.5) + (0.3)(4.5) = 3.1. Var[W] = 0.72 Var[X] + 0.32 Var[Y] = (0.49)(1.25) + (0.09)(5.25) = 1.085. Exercise: Simulate a random draw from W, given two random numbers: 0.81, 0.66. Use the first random number for X and the second random number for Y. [Solution: 0.81 is first exceeded by FX(4) = 4/4 = 1.00, so the simulated value from X is 4. 0.66 is first exceeded by FY(6) = 6/8 = 0.75, so the simulated value from Y is 6. The simulated value from W is: (0.7)(4) + (0.3)(6) = 4.6. Comment: In this example, the linear combination can take on a noninteger value. As seen in the two prior exercises, the mixture only takes on integer values.] Let X be a 4-sided die. X has mean 2.5, variance 1.25, 2nd moment 7.5. V = X + X. (Roll two 4-sided dice and add the result.) V is the sum of two independent, identically distributed variables. Mean of V is: 2.5 + 2.5 = 5. Variance of V is: 1.25 + 1.25 = 2.5. Exercise: Simulate a random draw from V, given two random numbers: 0.81, 0.66. (Simulate two independent rolls of 4-sided dice and sum them.) [Solution: 0.81 is first exceeded by F(4) = 4/4 = 1.00, so the first simulated value from X is 4. 0.66 is first exceeded by F(3) = 3/4 = 0.75, so the second simulated value from X is 3. The simulated value from V is: 4 + 3 = 7. ]
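A small sketch contrasting the two constructions for the dice example, simulating each die roll by the Inverse Transform Method (the function names are mine):

```python
import math

def roll_die(u, sides):
    # Discrete Inverse Transform for a die with faces 1, ..., sides:
    # return the first n with F(n) = n/sides exceeding u.
    return math.floor(u * sides) + 1

def simulate_mixture(u1, u2):
    # Z: with probability 0.7 roll the 4-sided die, otherwise the 8-sided die.
    return roll_die(u2, 4) if u1 < 0.7 else roll_die(u2, 8)

def simulate_linear_combination(u1, u2):
    # W = 0.7 X + 0.3 Y: roll both dice and weight the results.
    return 0.7 * roll_die(u1, 4) + 0.3 * roll_die(u2, 8)

print(simulate_mixture(0.81, 0.66))             # 6, as in the exercise above
print(simulate_linear_combination(0.81, 0.66))  # 4.6, as in the exercise above
```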
Continuous Mixtures:

One can simulate continuous mixtures125 by first simulating a random draw from the Distribution of the parameter126 and then simulating a random draw from the likelihood127 given that value of the parameter.

For example, assume that each insured has an Exponential severity with mean δ. Assume that in turn the distribution of δ over a portfolio of insureds is given by an Inverse Weibull Distribution as per Loss Models with parameters θ = 10 and τ = 7. Assume we wish to simulate random sizes of loss from this portfolio,128 using random numbers from (0,1): 0.518, 0.121, 0.847, 0.272, 0.184, 0.902.

First one simulates via the Inverse Transform Algorithm a random draw from the Inverse Weibull, which has Distribution Function: F(δ) = exp[-(10/δ)^7]. Setting u = exp[-(10/δ)^7]: ln(u) = -(10/δ)^7, so δ = 10/(-ln u)^(1/7). Thus the first random draw from the Inverse Weibull Distribution is 10/(-ln 0.518)^(1/7) = 10.62. Then we simulate a draw from an Exponential Distribution with mean 10.62: (10.62)(-ln(1 - 0.121)) = 1.370. Continuing, we simulate another random draw from the Inverse Weibull, 10/(-ln 0.847)^(1/7) = 12.92. Then we simulate a draw from an Exponential Distribution with mean 12.92: (12.92)(-ln(1 - 0.272)) = 4.10. Continuing, we simulate another random draw from the Inverse Weibull, 10/(-ln 0.184)^(1/7) = 9.28. Then we simulate a draw from an Exponential Distribution with mean 9.28: (9.28)(-ln(1 - 0.902)) = 21.56. Thus we have simulated three losses of sizes: 1.37, 4.10, and 21.56.

Given more random numbers from (0,1) we can proceed to simulate more random severities in a similar manner. The same technique can be applied to mixtures of frequency distributions, such as the Gamma-Poisson and Beta-Binomial, as well as mixtures of severity distributions such as the Inverse Gamma-Exponential and the Normal-Normal.

125 For example, the mixed distribution of the Gamma-Poisson.
126 In the Gamma-Poisson, the Gamma Distribution of Poisson means.
127 In the Gamma-Poisson, a Poisson frequency distribution.
128 Note that in this case the Mixed Distribution is not one of the named distributions in Appendix A of Loss Models. However, this has no effect on our ability to use this technique to simulate it.
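A minimal sketch of this two-stage simulation for the Inverse Weibull-Exponential example above (the function names are mine):

```python
import math

def simulate_inverse_weibull(u, theta, tau):
    # Inverse Transform: u = exp[-(theta/delta)^tau]  =>  delta = theta / (-ln u)^(1/tau).
    return theta / (-math.log(u)) ** (1.0 / tau)

def simulate_exponential(u, mean):
    # Inverse Transform: u = 1 - exp(-x/mean)  =>  x = -mean * ln(1 - u).
    return -mean * math.log(1.0 - u)

# The three losses simulated in the text (theta = 10, tau = 7):
pairs = [(0.518, 0.121), (0.847, 0.272), (0.184, 0.902)]
for u_mix, u_sev in pairs:
    delta = simulate_inverse_weibull(u_mix, 10, 7)   # the insured's Exponential mean
    print(simulate_exponential(u_sev, delta))        # about 1.37, 4.10, 21.56
```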
Problems: 14.1 (1 point) Losses are given by the cumulative distribution: F(x) = 1 - 0.8e-0.02x - 0.2e-0.07x, x ≥ 0. Simulate a loss using the method of inversion. Use the following two random numbers from (0, 1): 0.531, 0.213. A. less than 4 B. at least 4 but less than 6 C. at least 6 but less than 8 D. at least 8 but less than 10 E. at least 10 Use the following information for the next two questions:
• V follows a Pareto Distribution with parameters α = 4, θ = 10.
• W follows an Exponential Distribution with mean = 0.8: F(w) = 1 - e^(-w/0.8).
• Y is a two-point mixture of V and W, with 5% weight to the Pareto Distribution and 95% weight to the Exponential Distribution.
• You wish to simulate a value of Y, from this two point mixture.
• You simulate the mixing variable where high values correspond to the Exponential distribution.
• Then you simulate the value of Y, where high random numbers correspond to high values of Y.
14.2 (1 point) With large random numbers corresponding to large losses, simulate a random draw from Y, using the sequence of random numbers from [0,1] : 0.27 , 0.64. A. less than 0.5 B. at least 0.5 but less than 1.0 C. at least 1.0 but less than 1.5 D. at least 1.5 but less than 2.0 E. at least 2.0 14.3 (1 point) With large random numbers corresponding to large losses, simulate a random draw from Y, using the sequence of random numbers from [0,1] : 0.04, 0.47. A. less than 0.5 B. at least 0.5 but less than 1.0 C. at least 1.0 but less than 1.5 D. at least 1.5 but less than 2.0 E. at least 2.0
14.4 (1 point) We have a 6-sided die and an 8-sided die. Briefly describe how one would simulate a 50%-50% mixture of these two variables. Briefly describe how one would simulate a 50%-50% weighting of these two variables. 14.5 (3 points) The number of claims for each individual insured has a Poisson distribution and the means of these Poisson distributions are exponentially distributed over the population of insureds, with mean 1.3. Use the method of inversion in order to simulate the number of claims from a randomly selected individual. Use the following two random numbers from (0, 1): 0.3244, 0.8731. (A) 0 (B) 1 (C) 2 (D) 3 (E) 4 or more 14.6 (3 points) For a life insurance: (i) 1/3 of the population are smokers and 2/3 are nonsmokers. (ii) The future lifetimes follow a Weibull distribution with: τ = 2 and θ = 15 for smokers τ = 2 and θ = 20 for nonsmokers (iii) The death benefit is 100,000 payable at time of death. (iv) i = 0.05. You are given the following 6 random numbers from [0, 1]: 0.17, 0.48, 0.67, 0.94, 0.57, 0.23. Simulate the future lifetimes of 3 random individuals, with small random numbers corresponding to early deaths. (Simulate one lifetime, then the next, and then the last one.) What is the present value of the total benefits paid? A. Less than 125,000 B. At least 125,000, but less than 130,000 C. At least 130,000, but less than 135,000 D. At least 135,000, but less than 140,000 E. At least 140,000 14.7 (2 points) Each insuredʼs claim frequency follows a Binomial Distribution, with m = 5. There are three types of insureds as follows: Type A Priori Probability Binomial Parameter q A 60% 0.1 B 30% 0.2 C 10% 0.3 Simulate the number of claims from an insured picked at random. Use the following random numbers from [0, 1]: 0.67, 0.91. (A) 0 (B) 1 (C) 2 (D) 3 (E) 4 or more
14.8 (2 points) You are given the following:
• A portfolio consists of 75 liability risks and 25 property risks.
• The risks have identical claim count distributions.
• Loss sizes for liability risks follow a Pareto distribution, with parameters θ = 300 and α = 4.
• Loss sizes for property risks follow a Pareto distribution, with parameters θ = 1,000 and α = 3.
Simulate a single size of loss. You are given the following 2 random numbers from [0, 1]: 0.43, 0.89. A. Less than 200 B. At least 200, but less than 250 C. At least 250, but less than 300 D. At least 300, but less than 350 E. At least 350 14.9 (3 points) You are given the following:
• The claim count N for an individual insured has a Poisson distribution with mean λ. • λ is uniformly distributed between 1 and 3. Simulate the number of claims from an insured picked at random. Use the following 2 random numbers from [0, 1]: 0.85, 0.52. (A) 0 (B) 1 (C) 2 (D) 3 (E) 4 or more 14.10 (3 points) F(x) = (0.2)(1-e-x/10) + (0.5)(1-e-x/25) + (0.3)(1-e-x/100). Simulate 4 sizes of loss, with large random numbers corresponding to large losses, using the following 8 random numbers from [0, 1]: 0.88, 0.40, 0.34, 0.92, 0.73, 0.86, 0.09, 0.51. What is the sum of the four simulated losses? A. Less than 300 B. At least 300, but less than 325 C. At least 325, but less than 350 D. At least 350, but less than 375 E. At least 375
Use the following information for the next 3 questions:

Risk Type    Number of Risks    Frequency           Size of Loss Distribution
I            600                Poisson, λ = 4%.    Single Parameter Pareto, θ = 10, α = 4
II           400                Poisson, λ = 7%.    Single Parameter Pareto, θ = 50, α = 3
Single Parameter Pareto, θ = 10, α = 4. Mean Severity is: αθ/(α − 1) = (4)(10)/3 = 13.3333. 2nd moment: αθ2/(α - 2) = (4)(100)/2 = 200. Single Parameter Pareto, θ = 50, α = 3. Mean Severity is: αθ/(α - 1) = (3)(50)/2 = 75. 2nd moment: αθ2/(α - 2) = (3)(2500)/1 = 7500. 14.11 (2 points) You independently simulate the aggregate loss for a single year for each risk. Let S be the sum of these 1000 amounts. You repeat this process many times. What is the variance of S? A. Less than 200,000 B. At least 200,000, but less than 210,000 C. At least 210,000, but less than 220,000 D. At least 220,000, but less than 230,000 E. At least 230,000 14.12 (3 points) A risk is selected at random from one of the 1000 risks. You simulate the aggregate loss for a single year for this risk. This risk is replaced, and a new risk is selected at random from one of the 1000 risks. You simulate the aggregate loss for a single year for this new risk. You repeat this process many times, each time picking a new risk at random. What is the variance of the outcomes? A. Less than 200 B. At least 200, but less than 210 C. At least 210, but less than 220 D. At least 220, but less than 230 E. At least 230 14.13 (2 points) A risk is selected at random from one of the 1000 risks. You simulate the aggregate loss for a single year for this risk. You then simulate another year for this same risk. You repeat this process many times. What is the expected variance of the outcomes? A. 190 B. 200 C. 210 D. 220 E. 230
Use the following information for the next four questions: X follows an Exponential Distribution with mean 30. Y follows a Pareto Distribution with α = 3 and θ = 100. X and Y are independent. Let Z be a two-point mixture with 60% weight to X and a 40% weight to Y. Let W = 0.6X + 0.4Y. 14.14 (2 points) What is the variance of Z? A. Less than 2500 B. At least 2500, but less than 3000 C. At least 3000, but less than 3500 D. At least 3500, but less than 4000 E. At least 4000 14.15 (1 point) Use the random numbers 0.728 and 0.184 to simulate a random draw from Z, with a small random number indicating a random draw from the Exponential Distribution, and with large random numbers corresponding to large losses. (A) 4 (B) 5 (C) 6 (D) 7 (E) 8 14.16 (2 points) What is the variance of W? (A) 1500 (B) 2000 (C) 2500 (D) 3000
(E) 3500
14.17 (2 points) Use the random numbers 0.728 and 0.184 to simulate a random draw from W, with large random numbers corresponding to large losses. (A) 20 (B) 25 (C) 30 (D) 35 (E) 40
14.18 (3 points) For each insured, let the number of losses be given by a Poisson distribution with mean λ. The density function of λ over the portfolio is given by a Gamma Distribution f(λ) = λ2 e- λ/4 /128. Simulate the number of losses from an insured picked at random. First simulate a random λ by simulating three random Exponentials, in each case setting the distribution function equal to a random number from (0, 1). Then simulate a random draw from a Poisson with that mean, via the Inverse Transform Method. Use the following random numbers from (0,1): 0.518, 0.121, 0.402, 0.601. A. 2 or less B. 3 C. 4 D. 5 E. 6 or more
14.19 (2 points) Tom will generate via simulation 100,000 values of the random variable X as follows: (i) He will generate the value of λ from a distribution that has Prob[2] = 60%, and Prob[3] = 40%. (ii) He then generates x from the Poisson distribution with mean λ. (iii) He repeats the process 99,999 more times: first generating a value λ, then generating x from the Poisson distribution with mean λ. Calculate the expected number of Tomʼs 100,000 simulated values of X that are 4. A. less than 12,200 B. at least 12,200 but less than 12,300 C. at least 12,300 but less than 12,400 D. at least 12,400 but less than 12,500 E. at least 12,500 14.20 (2 points) In the previous question, let V = the variance of a single simulated set of 100,000 values. What is the expected value of V? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6 14.21 (2 points) Dick will generate via simulation 100,000 values of the random variable X as follows: (i) He will generate the value of λ from a distribution that has Prob[2] = 60%, and Prob[3] = 40%. (ii) He will then generate 100,000 independent values from the Poisson distribution with mean λ. Calculate the expected number of Dickʼs 100,000 simulated values of X that are 4. A. 12.000 B. 12,500 C. 13,000 D. 13,500 E. 14,000 14.22 (2 points) In the previous question, let V = the variance of a single simulated set of 100,000 values. What is the expected value of V? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6
14.23 (1 point) Harry will generate via simulation 100,000 values of the random variable X as follows: (i) He will generate the value of λ from a distribution that has Prob[2] = 60%, and Prob[3] = 40%. (ii) He then generates x from the Poisson distribution with mean λ. (iii) He will then copy 99,999 times this value of x. Calculate the expected number of Harryʼs 100,000 simulated values of X that are 4. A. less than 12,200 B. at least 12,200 but less than 12,300 C. at least 12,300 but less than 12,400 D. at least 12,400 but less than 12,500 E. at least 12,500 14.24 (1 point) In the previous question, let V = the variance of a single simulated set of 100,000 values. What is the expected value of V? A. less than 2.0 B. at least 2.0 but less than 2.2 C. at least 2.2 but less than 2.4 D. at least 2.4 but less than 2.6 E. at least 2.6
14.25 (2 points) You wish to simulate a value, Y, from a two point mixture. With probability 0.4, Y follows a Single Parameter Pareto Distribution with α = 3 and θ = 1000. With probability 0.6, Y follows a Single Parameter Pareto Distribution with α = 4 and θ = 1000. You simulate the mixing variable where low values correspond to the first distribution. Then you simulate the value of Y, where low random numbers correspond to low values of Y. Your uniform random numbers from [0, 1] are 0.66 and 0.32 in that order. Calculate the simulated value of Y. (A) 1100 (B) 1150 (C) 1200 (D) 1250 (E) 1300
14.26 (3, 11/02, Q.10 & 2009 Sample Q.81) (2.5 points) You wish to simulate a value, Y, from a two point mixture. With probability 0.3, Y is exponentially distributed with mean 0.5. With probability 0.7, Y is uniformly distributed on [-3, 3]. You simulate the mixing variable where low values correspond to the exponential distribution. Then you simulate the value of Y, where low random numbers correspond to low values of Y. Your uniform random numbers from [0, 1] are 0.25 and 0.69 in that order. Calculate the simulated value of Y. (A) 0.19 (B) 0.38 (C) 0.59 (D) 0.77 (E) 0.95 14.27 (2 points) In the previous question, if your uniform random numbers from [0, 1] were instead 0.44 and 0.82 in that order, calculate the simulated value of Y. (A) 0.10 (B) 0.86 (C) 1.60 (D) 1.92 (E) None of A, B, C, or D 14.28 (SOA3, 11/04, Q.5) (2.5 points) You are simulating the future lifetimes of newborns in a population. (i) For any given newborn, mortality follows De Moivreʼs law with maximum lifetime Ω. (ii) Ω has distribution function F(ω) = (ω/80)2 , 0 ≤ ω ≤ 80. (iii) You are using the inverse transform method, with small random numbers corresponding to small values of Ω or short future lifetimes. (iv) Your first random numbers from [0,1] for simulating Ω and the future lifetime are 0.4 and 0.7 respectively. Calculate your first simulated value of the future lifetime. (A) 22 (B) 35 (C) 46 (D) 52 (E) 56 14.29 (2 points) If in the previous question, the random numbers from [0,1] for simulating Ω and the future lifetime were instead 0.8 and 0.6 respectively, what would be the simulated value of the future lifetime? (A) 35 (B) 37 (C) 39 (D) 41 (E) 43
Solutions to Problems:
14.1. E. This is a two-point mixture of Exponential Distributions with means 50 and 1/.07: F(x) = (.8)(1 - e^(-x/50)) + (.2)(1 - e^(-.07x)). Since .531 < .8, we simulate a random draw from the first Exponential Distribution. Set .213 = 1 - e^(-x/50). x = -50 ln(1 - .213) = 11.98.
14.2. B. Since the first random number .27 is ≥ .05, we simulate a random draw from the Exponential Distribution. Using the second random number of .64 and algebraic inversion: .64 = 1 - e^(-y/.8). Thus y = -.8 ln(.36) = 0.82.
14.3. D. Since the first random number .04 is < .05, we simulate a random draw from the Pareto Distribution. Using the second random number of .47 and algebraic inversion: .47 = 1 - {10/(10+y)}^4. Thus, .53^(-1/4) = (10+y)/10. y = 1.72.
14.4. Simulating a 50%-50% mixture of the two variables: 50% of the time we roll one of the dice and 50% of the time we roll the other die. Each time we randomly select which of the two dice to roll (perhaps by flipping a coin.) Simulating a 50%-50% weighting of the two variables: We roll each die once, and take an average of the results.
14.5. B. First simulate the value of λ from the Exponential Distribution with mean 1.3: .3244 = 1 - e^(-λ/1.3). λ = -1.3 ln(1 - .3244) = .5098. Construct a table of densities for a Poisson with mean 0.5098:

Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   60.062%                         0.60062
1                   30.619%                         0.90681
2                   7.805%                          0.98486
3                   1.326%                          0.99812
One cumulates the densities to get the distribution function. The next random number is 0.8731; one sees the first time F(n) > 0.8731, which occurs when n = 1.
14.6. D. .17 < 1/3, so the first person is a smoker. .48 = F(x) = 1 - exp[-(x/15)²]. x = 15 √(-ln(1 - .48)) = 12.13. .67 ≥ 1/3, so the second person is a nonsmoker. .94 = F(x) = 1 - exp[-(x/20)²]. x = 20 √(-ln(1 - .94)) = 33.55. .57 ≥ 1/3, so the third person is a nonsmoker. .23 = F(x) = 1 - exp[-(x/20)²]. x = 20 √(-ln(1 - .23)) = 10.22. The present value of the total benefits paid is: (100000)(1.05^-12.13 + 1.05^-33.55 + 1.05^-10.22) = 135,526. Comment: The situation is similar to 3, 5/00, Q.8.
14.7. C. The first random number is .67 and .6 < .67 ≤ .9, so we simulate a Type B insured. Construct a table of densities for a Binomial with m = 5 and q = 0.2.

Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   32.768%                         0.32768
1                   40.960%                         0.73728
2                   20.480%                         0.94208
3                   5.120%                          0.99328
4                   0.640%                          0.99968
5                   0.032%                          1.00000
Sum                 100.000%
F(n) first exceeds the next random number, .91, when n = 2.
14.8. B. .43 < .75, so the risk chosen is a liability risk. .89 = F(x) = 1 - (300/(300 + x))^4. x = 300((1 - .89)^(-1/4) - 1) = 221. Comment: The situation is from 4B, 11/98 Q.8.
14.9. D. The first random number from [0, 1] is 0.85. This produces a random number from 1 to 3 of: 1 + (2)(.85) = 2.7. Thus we simulate from a Poisson Distribution with λ = 2.7. Construct a table of densities for a Poisson with λ = 2.7:

Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   6.721%                          0.06721
1                   18.145%                         0.24866
2                   24.496%                         0.49362
3                   22.047%                         0.71409
4                   14.882%                         0.86291
5                   8.036%                          0.94327
F(n) first exceeds the next random number, 0.52, when n = 3. Comment: Situation taken from 4, 5/91, Q.42.
14.10. B. u = F(x) = 1 - e-x/θ. x = -θln(1-u). .7 < .88 ≤ 1, so we simulate the first loss from the Exponential with mean 100. -100ln(1-.40) = 51.1. .2 < .34 ≤ .7, so we simulate the second loss from the Exponential with mean 25. -25ln(1-.92) = 63.1. .7 < .73 ≤ 1, so we simulate the third loss from the Exponential with mean 100. -100ln(1-.86) = 196.6. .09 < .2 , so we simulate the fourth loss from the Exponential with mean 10. -10ln(1-.51) = 7.1. 51.1 + 63.1 + 196.6 + 7.1 = 317.9. 14.11. C. For Type I, the Single Parameter Pareto has 2nd moment: αθ2/(α − 2) = (4)(100)/2 = 200. Therefore, the aggregate has variance: λ (2nd moment of severity) = (.04)(200) = 8. For Type II, the Single Parameter Pareto has 2nd moment: αθ2/(α − 2) = (3)(2500)/1 = 7500. Therefore, the aggregate has variance: (.07)(7500) = 525. The sum of 600 risks of Type I and 400 risks of Type II has variance: (600)(8) + (400)(525) = 214,800. Comment: We know exactly how many of each type we have, rather than picking a certain number of risks at random. We are simulating the annual aggregate loss for this entire portfolio of risks, over and over again. 14.12. D. For Type I, the Single Parameter Pareto has mean αθ/(α − 1) = (4)(10)/3 = 13.3333, and 2nd moment αθ2/(α − 2) = (4)(100)/2 = 200. Therefore, the aggregate has mean: (.04)(13.3333) = .53333, variance: (.04)(200) = 8, and second moment: 8 + .533332 = 8.2844. For Type II, the Single Parameter Pareto has mean αθ/(α − 1) = (3)(50)/2 = 75, and second moment αθ2/(α − 2) = (3)(2500)/1 = 7500. Therefore, the aggregate has mean: (.07)(75) = 5.25, variance: (.07)(7500) = 525, and second moment: 525 + 5.252 = 552.5625. This is a 60%-40% mixture of the two aggregate distributions. The mixture has mean: (.6)(.53333) + (.4)(5.25) = 2.42. The mixture has second moment: (.6)(8.2844) + (.4)(552.5625) = 226. The mixture has variance: 226 - 2.422 = 220.14.
14.13. C. If the risk is of Type I, then the variance of outcomes is 8. If the risk is of Type II, then the variance of outcomes is 525. The expected value of this variance is: (.6)(8) + (.4)(525) = 214.8. Comment: This is the Expected Value of the Process Variance. The Variance of the Hypothetical Means is: .6(.53333 - 2.42)2 + (.4)(5.25 - 2.42)2 = 5.34. Expected Value of the Process Variance + Variance of the Hypothetical Means = 214.8 + 5.34 = 220.14 = Total Variance. We have less randomness here than in the previous question, where we had the mixture. Note that the variance of the sum of one loss from each of the 1000 risks is: (1000)(214.8) = 214,800, the solution to a previous question. 14.14. D. E[X] = 30. E[X2 ] = (2)(302 ) = 1800. E[Y] = θ/(α−1) = 100/2 = 50. E[Y2 ] = 2θ2 /{(α−1) (α−2)} = 10,000. E[Z] = (60%)(30) + (40%)(50) = 38. E[Z2 ] = (60%)(1800) + (40%)(10,000) = 5080. Var[Z] = 5080 - 382 = 3636.
14.15. D. 0.728 ≥ 60%, so we simulate a random draw from the Pareto. Set .184 = 1 - (100/(z + 100))3 . ⇒ z = 7.0.
14.16. A. Var[X] = 1800 - 302 = 900. Var[Y] = 10,000 - 502 = 7500. Var[W] = Var[ .6X + .4Y] = Var[.6X] + Var[.4Y] = .62 Var[X] + .42 Var[Y] = (.62 )(900) + (.42 )(7500) = 1524.
14.17. B. Simulate a random draw from the Exponential: Set 0.728 = 1 - exp[-x/30]. ⇒ x = 39.1. Simulate a random draw from the Pareto: Set 0.184 = 1 - (100/(y + 100))3 . ⇒ y = 7.0. w = (.6)(39.1) + (.4)(7.0) = 26.3. Comment: A two-point mixture differs from a linear combination of two variables. They are simulated differently and have different variances.
14.18. E. The distribution of lambda is a Gamma Distribution as per Loss Models with α = 3 and θ = 4. Thus we can simulate a random λ as the sum of three random Exponentials with mean 4. -4 ln(1 - .518) = 2.919. -4 ln(1 - .121) = .516. -4 ln(1 - .402) = 2.057. λ = 2.919 + .516 + 2.057 = 5.492. We now simulate a random Poisson with mean 5.492. We calculate a table of densities for this Poisson, using the relationship f(x+1) = f(x) {λ/(x+1)} = 5.492 f(x)/(x+1), and f(0) = e^(-λ) = e^(-5.492) = .00412.

Number of Claims    Probability Density Function    Cumulative Distribution Function
0                   0.412%                          0.00412
1                   2.262%                          0.02674
2                   6.213%                          0.08887
3                   11.374%                         0.20261
4                   15.616%                         0.35877
5                   17.152%                         0.53029
6                   15.700%                         0.68729
7                   12.318%                         0.81047
8                   8.456%                          0.89503
9                   5.160%                          0.94664
One cumulates the densities to get the distribution function. The next random number is 0.601; one sees the first time F(x) > 0.601, which occurs when x = 6. Thus we simulate 6 losses. Comment: The mixture of Poisson Distributions via a Gamma is a Negative Binomial. This illustrates another way to simulate a Negative Binomial, as a continuous mixture of Poisson Distributions. 14.19. A. Of the simulated values of lambda, on average 60,000 are 2 and 40,000 are 3. For a Poisson with λ = 2, f(4) = e-2 24 / 4! = 0.090223522. Therefore, of the expected 60,000 with λ = 2, we expect (0.090223522)(60,000) = 5413.4 to have X = 4. For a Poisson with λ = 3, f(4) = e-3 34 / 4! = 0.168031356. Therefore, of the expected 40,000 with λ = 3, we expect (0.168031356)(40,000) = 6721.3 to have X = 4. In total, we expect: 5413.4 + 6721.3 = 12,135 to have X = 4.
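The table-building step used in solutions 14.5, 14.9, and 14.18 is the discrete Inverse Transform Method; a minimal Python sketch (the function name is mine):

```python
import math

def simulate_poisson(u, lam):
    # Discrete Inverse Transform: accumulate Poisson densities f(0), f(1), ...
    # and return the first n at which the distribution function exceeds u.
    f = math.exp(-lam)          # f(0)
    cumulative = f
    n = 0
    while cumulative <= u:
        n += 1
        f *= lam / n            # f(n) = f(n-1) * lam / n
        cumulative += f
    return n

print(simulate_poisson(0.8731, 0.5098))   # 1, as in solution 14.5
print(simulate_poisson(0.52, 2.7))        # 3, as in solution 14.9
print(simulate_poisson(0.601, 5.492))     # 6, as in solution 14.18
```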
14.20. E. Expected Value of the Process Variance is: E[P.V. | λ] = E[λ] = (60%)(2) + (40%)(3) = 2.4. Variance of the Hypothetical Means is: Var[Mean | λ] = Var[λ] = (60%)(22 ) + (40%)(32 ) - 2.42 = 0.24. Total Variance is: EPV + VHM = 2.4 + 0.24 = 2.64. Comment: I used Var[X] = E[Var[X|Y] + Var[E[X|Y]]. What Tom did was simulate one year each from 100,000 randomly selected insureds. What Dick did was pick a random insured and simulate 100,000 years for that insured; each year is an independent random draw from the same Poisson distribution with unknown λ. Even though the situations simulated by Tom and Dick have the same mean, they have different variances. 14.21. A. For a Poisson with λ = 2, f(4) = e-2 24 / 4! = .090223522. For a Poisson with λ = 3, f(4) = e-3 34 / 4! = .168031356. For a given simulation, Prob[X = 4] = (60%)(.090223522) + (40%)(.168031356) = .12315. Therefore, in total we expect: (100,000)(.12315) = 12,135 to have X = 4. 14.22. D. If λ = 2, we simulate a data set drawn from a Poisson Distribution with variance 2. If λ = 3, we simulate a data set drawn from a Poisson Distribution with variance 3. E[V] = (60%)(2) + (40%)(3) = 2.4. Comment: One need know nothing about simulation, in order to answer these questions. 14.23. A. For a Poisson with λ = 2, f(4) = e-2 24 / 4! = .090223522. For a Poisson with λ = 3, f(4) = e-3 34 / 4! = .168031356. For a given simulation, Prob[X = 4] = (60%)(.090223522) + (40%)(.168031356) = .12315. Therefore, in total we expect: (100,000)(.12315) = 12,135 to have X = 4. Comment: Either all of the values of X will be 4, or none of them will be. 14.24. A. Since the data set consists of 100,000 identical items, it has a variance of zero. 14.25. A. Since the first random number .66 ≥ .4, we simulate from the second distribution. Set .32 = F(x) = 1 - (1000/x)4 . ⇒ x = 1101. 14.26. C. Since the first random number .25 < .3, we simulate a random exponential. Set .69 = F(x) = 1 - e-x/.5. ⇒ x = (.5){-ln(1 - .69)} = 0.586. 14.27. D. .44 ≥ .3 ⇒ we simulate a random number uniform on [-3, 3]. (.82)(6) - 3 = 1.92, is the next simulated value from the mixed distribution.
14.28. B. Set .4 = F(ω) = (ω/80)². ⇒ ω = 50.6. Lifetime = (.7)(50.6) = 35.4.
14.29. E. Set .8 = F(ω) = (ω/80)². ⇒ ω = 71.6. Lifetime = (.6)(71.6) = 43.0.
Section 15, Simulating Splices

Assume one had the spliced density:129
0.01619 e^(-x/50) on (0, 100), and 0.01519/(1 + x/200)^4 on (100, ∞)
⇔ (0.7)(0.02313 e^(-x/50)) on (0, 100), and (0.3)(0.050625/(1 + x/200)^4) on (100, ∞)
⇔ (0.7)(e^(-x/50)/50)/(1 - e^(-100/50)) on (0, 100), and (0.3){(3)(200³)/(200 + x)^4}/{(200/300)³} on (100, ∞)
⇔ (0.8096)(Exponential[50]) on (0, 100), and (1.0125)(Pareto[3, 200]) on (100, ∞).

Then at the breakpoint of 100, the spliced distribution function is 0.7.
For x < 100, F(x) = (0.8096)(the Exponential Distribution at x) = (0.8096)(1 - e^(-x/50)).
For x > 100, S(x) = (1.0125)(the Pareto Survival Function at x) = (1.0125)(200/(200 + x))³.
For x > 100, F(x) = 1 - (1.0125)(200/(200 + x))³.

Thus one can simulate a random draw from this splice via algebraic inversion.130 If large random numbers correspond to large losses, then if u < 0.7 we work with the Exponential portion of the splice.

Exercise: Using the random number 0.4, what is the random draw from the splice, if large random numbers correspond to large losses?
[Solution: 0.4 = (0.8096)(1 - e^(-x/50)). ⇒ x = 34.07.]

If large random numbers correspond to large losses, then if u > 0.7 we work with the Pareto portion of the splice.

Exercise: Using the random number 0.8, what is the random draw from the splice, if large random numbers correspond to large losses?
[Solution: 0.8 = 1 - (1.0125)(200/(200 + x))³. ⇒ x = 143.4.]

In general, if large random numbers correspond to large losses, then for u < w1 we work with the first component of the splice and for u > w1 we work with the second component of the splice.131

129 See Section 5.2.6 of Loss Models and "Mahlerʼs Guide to Loss Distributions."
130 In this case, each of the components of the splice can be simulated by algebraic inversion.
131 For u = w1, the simulated value is the breakpoint.
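A minimal sketch of simulating this particular splice by algebraic inversion; the breakpoint probability 0.7 and the constants 0.8096 and 1.0125 come from the example above, while the function name is mine:

```python
import math

def simulate_splice(u):
    # For u < 0.7, invert the rescaled Exponential piece: u = 0.8096 (1 - e^(-x/50)).
    # For u > 0.7, invert the rescaled Pareto piece: u = 1 - 1.0125 (200/(200 + x))^3.
    if u < 0.7:
        return -50.0 * math.log(1.0 - u / 0.8096)
    else:
        return 200.0 * ((1.0 - u) / 1.0125) ** (-1.0 / 3.0) - 200.0

print(simulate_splice(0.4))   # about 34.07
print(simulate_splice(0.8))   # about 143.4
```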
Problems: 15.1 (3 points) f(x) = 0.12 for x ≤ 5, and f(x) = 0.06595e-x/10 for x > 5. Simulate two losses via the method of inversion. Use the two random numbers: 0.43 and 0.77. What is the sum of the two simulated losses? A. 13 B. 14 C. 15 D. 16 E. 17 15.2 (3 points) There is a 60% probability X is zero. If X is not zero, then X follows a Single Pareto Distribution with θ = 100 and α = 3. You use the inverse transformation method to simulate the outcome. Your random numbers for the first five trials are: 0.114, 0.657, 0.770, 0.402, 0.899. Calculate the average of the outcomes of these first five trials. A. Less than 74 B. At least 74, but less than 75 C. At least 75, but less than 76 D. At least 76, but less than 77 E. At least 77 15.3 (3 points) One has a three component splice, with each component a uniform distribution on an interval. The first component on [0, 5] has a total of 50% probability. The second component on [5, 25] has a total of 30% probability. The third component on [25, 100] has a total of 20% probability. You use the inverse transformation method to simulate random draws from this splice. Your random numbers for the first four trials are: 0.931, 0.561, 0.785, 0.108. Calculate the average of the outcomes of these first four trials. A. 27 B. 28 C. 29 D. 30 E. 31 15.4 (SOA3, 11/03, Q.40 & 2009 Sample Q.83) (2.5 points) You are the consulting actuary to a group of venture capitalists financing a search for pirate gold. Itʼs a risky undertaking: with probability 0.80, no treasure will be found, and thus the outcome is 0. The rewards are high: with probability 0.20 treasure will be found. The outcome, if treasure is found, is uniformly distributed on [1000, 5000]. You use the inverse transformation method to simulate the outcome, where large random numbers from the uniform distribution on [0, 1] correspond to large outcomes. Your random numbers for the first two trials are 0.75 and 0.85. Calculate the average of the outcomes of these first two trials. (A) 0 (B) 1000 (C) 2000 (D) 3000 (E) 4000
Solutions to Problems: 15.1. B. F(x) = .12 x, x ≤ 5. F(5) = .6. S(x) = .6595e-x/10, x > 5. Setting F(x) = .43 ⇒ x = .43/.12 = 3.58. Setting F(x) = .77 ⇒ S(x) = .23 ⇒ .6595e-x/10 = .23 ⇒ e-x/10 = .349 ⇒ x = 10.53. Sum of the two simulated losses is: 3.58 + 10.53 = 14.11. 15.2. D. This can be thought of as a two component splice between a point mass at 0 of 60% and a Single Pareto Distribution with weight 40%. F(x) = .6, for 0 ≤ x ≤ 100. F(x) = .6 + .4{1 - (100/x)3 }, for 100 < x. .114 ≤ .6, so that the first simulated x = 0. Set .657 = F(x) = .6 + .4{1 - (100/x)3 }. ⇒ x = 105.26. Set .770 = F(x) = .6 + .4{1 - (100/x)3 }. ⇒ x = 120.26. .402 ≤ .6, so that the fourth simulated x = 0. Set .899 = F(x) = .6 + .4{1 - (100/x)3 }. ⇒ x = 158.21. Average of the five simulated values is: (0 + 105.26 + 120.26 + 0 + 158.21)/5 = 76.75. Comment: Similar to SOA3, 11/03, Q.40. 15.3. A. .931 > .8, so we are in the third component; 25 + (100 - 25)(.931 - .8)/.2 = 74.125. .8 ≥ .561 > .5, so we are in the second component; 5 + (25 - 5)(.561 - .5)/.3 = 9.067. .8 ≥ .785 > .5, so we are in the second component; 5 + (25 - 5)(.785 - .5)/.3 = 24. .108 ≤ .5, so we are in the first component; (5)(.108)/.5 = 1.08. (74.125 + 9.067 + 24 + 1.08)/4 = 27.07. 15.4. B. This can be thought of as a two component splice between a point mass at 0 of 80% and a uniform distribution with weight 20%. F(x) = 0.8, for 0 ≤ x ≤ 1000. F(x) = 0.8 + (0.2)(x - 1000)/4000, for 1000 ≤ x ≤ 5000. 0.75 ≤ 0.8, so that the first simulated x = 0. F(2000) = 0.85, so that the second simulated x is 2000. Average of the two simulated values is: (0 + 2000)/2 = 1000.
Section 16, Bootstrapping One important property of an estimator is its Mean Squared Error (MSE).132 Bootstrapping is a general technique of estimating the mean squared error of an estimator.133 One has to have a set of data of size n. Then one produces new sets of size m, the number of values used in the estimator, by drawing elements from the data set with replacement, retaining the order in which the values are picked. One then calculates the squared error of the estimator for each of these new sets; the average of all of these squared errors is our estimate of the MSE. There are two somewhat different versions. In the first method of Bootstrapping, we take all possible new sets of size m and average the resulting squared errors. Exercise: One has a data set of 3 sizes of loss: $3, $5, and $14. List all of the possible sets one could get by drawing 2 losses with replacement from the data set. Retain the order in which the losses occurred. [Solution: There are 32 = 9 such sets: {3, 3}; {3, 5}; {3,14}; {5, 3}; {5, 5}; {5,14}; {14, 3}; {14, 5}; {14, 14}.] Assume we estimate the median (of the underlying distribution) by taking an average of the next two losses we observe. This is an example of an estimator of a quantity of interest. If the next two losses are 14 and 3, then our estimate of the median would be: (14 + 3)/2 = 8.5. First, the quantity of interest, in this case the median, is calculated assuming a distribution that is uniform and discrete on the given data set, in other words using the empirical distribution function. In this case we assume 1/3 probability on each of $3, $5, and $14.134 Our estimate of the median is the middle value of 5.135 If we treat 5 as the “true” median and 8.5 as our estimate, then the squared error is: (8.5 - 5)2 = 12.25. Exercise: For each of the sets in the previous exercise, calculate the corresponding squared error in the estimated median, assuming 5 is the “true value” of the median.
132
The mean square error (MSE) of an estimator is the expected value of the squared difference between the estimate and the true value. See “Mahlerʼs Guide to Fitting Loss Distributions.” 133 Asymptotically, i.e., as the sample size approaches infinity, the bootstrap estimate converges to the true value. See Example 21.15 of Loss Models. 134 1/3 probability on each of 3, 5, and 14, is the empirical distribution. 135 Using Definition 3.7 in Loss Models, the median of the uniform discrete distribution on 3, 5, and 14 is 5, since 1/3 = F(5-) ≤ 50% ≤ F(5) = 2/3. I would take as the median the first value such that F(x) ≥ 50%, which in this case is 5.
[Solution:
First Loss   Second Loss   Estimated Median   True Value   Squared Error
    3             3               3                5             4.00
    3             5               4                5             1.00
    3            14               8.5              5            12.25
    5             3               4                5             1.00
    5             5               5                5             0.00
    5            14               9.5              5            20.25
   14             3               8.5              5            12.25
   14             5               9.5              5            20.25
   14            14              14                5            81.00
Average       7.333           7.333                            16.889]
Having computed the squared error for all 9 possible sets of size two, one can then average them. The result of 16.889 could be taken as an estimate of the MSE of this estimator of the median.

The steps in general for Bootstrapping (not via Simulation) are:
0. You are given a data set and an estimator of a quantity of interest.
1. The quantity of interest is calculated assuming the distribution was uniform and discrete on the given data set, in other words using the empirical distribution function.
2. List all the subsets, with replacement, of appropriate size of the original set.
3. For each subset from step 2, calculate the estimate using the estimator.
4. Compute the mean squared difference between the values from step 3 and the value from step 1.
The result of step 4 is the bootstrap estimate of the Mean Squared Error of the estimator.
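To make these steps concrete, here is a minimal Python sketch (my own illustration, not part of the original guide) that lists every ordered resample and averages the squared errors; applied to the median example above it reproduces the 16.889.

```python
# A minimal sketch of the exhaustive bootstrap described above (my own illustration).
# The data set, the estimator (the average of two draws, used to estimate the median),
# and the "true" median of 5 all come from the worked example in the text.
from itertools import product

def bootstrap_mse(data, estimator, true_value, m):
    """Average squared error of the estimator over all len(data)**m ordered resamples."""
    sq_errors = [(estimator(s) - true_value) ** 2 for s in product(data, repeat=m)]
    return sum(sq_errors) / len(sq_errors)

data = [3, 5, 14]
estimator = lambda s: sum(s) / len(s)   # estimate the median by the average of the next two losses
print(bootstrap_mse(data, estimator, true_value=5, m=2))   # about 16.889
```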
Bootstrapping Estimates of the MSE of estimates of the Mean:136

One can apply bootstrapping to the usual estimator of the mean. However, using Bootstrapping (not via simulation) to estimate the MSE of the mean of n numbers gives the same answer as: (E[X²] - E[X]²)/n.

136 See Example 21.15 in Loss Models.
For example, take two losses of sizes 4 and 10. The estimator of the mean will be the average of the next two numbers. We would estimate the mean as: (4 + 10)/2 = 7. Here is how we could use Bootstrapping in order to estimate the MSE of this estimate. For each of the 2² = 4 possible combinations we compute the average and compare it to the “true value” of 7:

First Loss   Second Loss   Average   Squared Error
    4             4           4.0         9.0
    4            10           7.0         0.0
   10             4           7.0         0.0
   10            10          10.0         9.0
Average                                   4.5
We estimate the MSE as 4.5. Alternately, E[X²] = (4² + 10²)/2 = 58. (E[X²] - E[X]²)/n = (58 - 7²)/2 = 4.5, getting the same result.

In general, using Bootstrapping to estimate the MSE of the mean of n numbers gives the same answer as (E[X²] - E[X]²)/n.137 138 This is equivalent to:

Σ(xi − X̄)² / n², where X̄ = Σxi / n, with both sums running from i = 1 to n.

This bootstrap estimate of the MSE is somewhat different than the unbiased estimate of:

S²/n = Σ(xi − X̄)² / {n(n - 1)}, where X̄ = Σxi / n.
137 Where n is the number of numbers averaged in the estimator of the mean. There is in practical applications no need to use the Bootstrap in order to estimate the MSE of the estimated mean. Rather, Bootstrapping can be useful to estimate the MSE of estimators of more complicated quantities such as the variance, Loss Elimination Ratio, the Layer Average Severity, actuarial present values, etc.
138 Since the sample mean is unbiased, MSE = Variance.
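A quick numerical check of this shortcut (my own sketch, not the author's): for the two losses 4 and 10 used above, the exhaustive bootstrap MSE of the sample mean and (E[X²] - E[X]²)/n both come out to 4.5.

```python
# Sketch comparing the exhaustive bootstrap MSE of the sample mean with (E[X^2] - E[X]^2)/n,
# using the two losses 4 and 10 from the example above.
from itertools import product

data = [4, 10]
n = len(data)
true_mean = sum(data) / n                                    # 7, the empirical "true value"

sq_errors = [(sum(s) / n - true_mean) ** 2 for s in product(data, repeat=n)]
bootstrap_mse = sum(sq_errors) / len(sq_errors)              # 4.5

second_moment = sum(x * x for x in data) / n                 # E[X^2] = 58
shortcut = (second_moment - true_mean ** 2) / n              # (58 - 49)/2 = 4.5
print(bootstrap_mse, shortcut)
```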
Exercise: We have three losses of sizes: 14, 21, and 37. The estimator of the mean used is the average of the next three numbers. Use Bootstrapping in order to estimate the MSE of this estimator.
[Solution: The “true value” is: (14 + 21 + 37)/3 = 24. For each of the 3³ = 27 possible combinations we compute the average and compare it to 24:

First Loss   Second Loss   Third Loss   Average   Squared Error
    14            14            14        14.00       100.00
    14            14            21        16.33        58.78
    14            14            37        21.67         5.44
    14            21            14        16.33        58.78
    14            21            21        18.67        28.44
    14            21            37        24.00         0.00
    14            37            14        21.67         5.44
    14            37            21        24.00         0.00
    14            37            37        29.33        28.44
    21            14            14        16.33        58.78
    21            14            21        18.67        28.44
    21            14            37        24.00         0.00
    21            21            14        18.67        28.44
    21            21            21        21.00         9.00
    21            21            37        26.33         5.44
    21            37            14        24.00         0.00
    21            37            21        26.33         5.44
    21            37            37        31.67        58.78
    37            14            14        21.67         5.44
    37            14            21        24.00         0.00
    37            14            37        29.33        28.44
    37            21            14        24.00         0.00
    37            21            21        26.33         5.44
    37            21            37        31.67        58.78
    37            37            14        29.33        28.44
    37            37            21        31.67        58.78
    37            37            37        37.00       169.00
Average                                                 30.89

We estimate the MSE as 30.89.
Alternately, the first moment is: (14 + 21 + 37)/3 = 24. The second moment is: (14² + 21² + 37²)/3 = 668.67. The biased estimate of the variance is: 668.67 - 24² = 92.67. Thus the variance of the average of three losses from this distribution is 92.67/3 = 30.89.]
Exercise: Given a set of three losses, the estimator of the mean is the average of the next three numbers. Demonstrate that using Bootstrapping to estimate the MSE gives the same answer as (E[X²] - E[X]²)/n.
[Solution: The “true value” is E[X] for the original data set. Let the three losses in the random subset be a, b and c. Then the estimator applied to the random subset is the average: (a + b + c)/3.
The squared difference between the “true value” and the estimator applied to the random subset is:
{E[X] - (a + b + c)/3}² = E[X]² - 2E[X](a + b + c)/3 + (a² + b² + c² + 2ab + 2ac + 2bc)/9.
Now a, b, and c vary over all of their possibilities independently of each other.
E[a] = E[b] = E[c] = E[X]. E[a²] = E[b²] = E[c²] = E[X²].
Therefore, the expected value of the squared errors is:
E[X]² - 2E[X]E[X] + 3E[X²]/9 + 6E[X]E[X]/9 = (E[X²] - E[X]²)/3.]
Estimating the Probability of the Sample Mean Being Close to the Underlying Mean:

One can apply the bootstrapping approach to estimate probabilities. For example, assume you have three values: 67, 78, and 101. Then the sample mean is: (67 + 78 + 101)/3 = 82. We assume these are three independent, identically distributed variables from some unknown distribution. Assume we wish to estimate the probability that the mean of this unknown distribution is within 5 of the sample mean of 82. Then take all subsets of {67, 78, 101} and compute the mean:

 X1    X2    X3    Mean        X1    X2    X3    Mean        X1    X2    X3     Mean
 67    67    67    67.00       78    67    67    70.67      101    67    67     78.33
 67    67    78    70.67       78    67    78    74.33      101    67    78     82.00
 67    67   101    78.33       78    67   101    82.00      101    67   101     89.67
 67    78    67    70.67       78    78    67    74.33      101    78    67     82.00
 67    78    78    74.33       78    78    78    78.00      101    78    78     85.67
 67    78   101    82.00       78    78   101    85.67      101    78   101     93.33
 67   101    67    78.33       78   101    67    82.00      101   101    67     89.67
 67   101    78    82.00       78   101    78    85.67      101   101    78     93.33
 67   101   101    89.67       78   101   101    93.33      101   101   101    101.00
We check how many of the 27 computed means for subsets are between 82 - 5 = 77 and 82 + 5 = 87. There are 13 out of 27. We estimate the probability that the mean of the distribution from which these values were drawn is within ±5 of the observed mean as: 13/27 = 48%. Note that one could equivalently check whether the sum of the three numbers is within (3)(5) = 15 of the sum of the original data set: 67 + 78 + 101 = 246. Since 67 + 67 + 78 = 212 < 231, this set is not okay. Since 67 + 67 + 101 = 235 is ≥ 231 and ≤ 261, this set is okay. Since 67 + 101 + 101 = 269 > 261, this set is not okay.
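The same count can be done by brute force. Here is a short Python sketch (mine, not the author's) that checks all 27 ordered resamples of {67, 78, 101} and recovers the 13/27.

```python
# Sketch of the enumeration above: the fraction of all 27 ordered resamples of {67, 78, 101}
# whose mean lies within 5 of the observed mean of 82.
from itertools import product

data = [67, 78, 101]
n = len(data)
observed_mean = sum(data) / n                                 # 82

within = sum(1 for s in product(data, repeat=n)
             if abs(sum(s) / n - observed_mean) <= 5)
print(within, within / n ** n)                                # 13, about 0.481
```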
Problems:

16.1 (3 points) You are given a random sample of 3 values from a distribution: 6, 8, 13.
You will estimate the median of the distribution by taking the average of the next two observations.
Determine the bootstrap approximation to the mean square error of this estimator.
A. less than 2
B. at least 2 but less than 3
C. at least 3 but less than 4
D. at least 4 but less than 5
E. at least 5

16.2 (3 points) You are given a random sample of three values: 2, 5, 11.
You estimate the variance using the estimator g(X1, X2) = {Σ(Xi − X̄)²}/2, summed over i = 1 to 2, where X̄ = (X1 + X2)/2.
Determine the bootstrap approximation to the mean square error.
(A) 100   (B) 110   (C) 120   (D) 130   (E) 140

16.3 (5 points) You are given a random sample of three values: 5, 7, 16.
You estimate the variance using the estimator g(X1, X2, X3) = {Σ(Xi − X̄)²}/3, summed over i = 1 to 3, where X̄ = (X1 + X2 + X3)/3.
Determine the bootstrap approximation to the mean square error.
A. less than 150
B. at least 150, but less than 160
C. at least 160, but less than 170
D. at least 170, but less than 180
E. at least 180

16.4 (3 points) You are given a random sample of five values from a distribution F: 575, 1043, 1980, 248, 697.
You estimate θ(F) = E[X] using the estimator: g(X1, X2, X3, X4, X5) = (X1 + X2 + X3 + X4 + X5)/5.
Determine the bootstrap approximation to the mean square error.
A. 60,000   B. 70,000   C. 80,000   D. 90,000   E. 100,000
16.5 (2 points) You are given a random sample of two values from a distribution: 3, 9.
You estimate the loss elimination ratio at 5.
Determine the bootstrap approximation to the mean square error of this estimate.
A. 0.01   B. 0.02   C. 0.03   D. 0.04   E. 0.05

16.6 (3 points) You are given a random sample of three values: 5, 7, 16.
You estimate the variance using the estimator g(X1, X2) = Σ(Xi − X̄)², summed over i = 1 to 2, where X̄ = (X1 + X2)/2.
Determine the bootstrap approximation to the mean square error.
A. less than 400
B. at least 400, but less than 500
C. at least 500, but less than 600
D. at least 600, but less than 700
E. at least 700

16.7 (3 points) You are given a random sample of three values: 2, 6, 17.
Use the bootstrap approach to estimate the probability that the mean of the distribution from which these values were drawn is within ±4 of the observed mean.
(A) 70%   (B) 75%   (C) 80%   (D) 85%   (E) 90%

16.8 (3 points) You are given a random sample of four values from a distribution F: 15, 21, 29, 44.
You estimate θ(F) = E[X] using the estimator g(X1, X2, X3, X4) = (X1 + X2)/2.
Determine the bootstrap approximation to the mean square error.
A. less than 56
B. at least 56, but less than 58
C. at least 58, but less than 60
D. at least 60, but less than 62
E. at least 62

16.9 (3 points) You are given a random sample of three values: 3, 7, 14.
Use the bootstrap approach to estimate the probability that the mean of the distribution from which these values were drawn is within ±2 of the observed mean.
A. less than 35%
B. at least 35%, but less than 40%
C. at least 40%, but less than 45%
D. at least 45%, but less than 50%
E. at least 50%
16.10 (3 points) You observe the number of claims during a year, X, for three insureds.
Alice had no claims. Bob had no claims. Carol had 3 claims.
You estimate the variance of X using the estimator: g(X1, X2, X3) = {Σ(Xi − X̄)²}/2, summed over i = 1 to 3, where X̄ = (X1 + X2 + X3)/3.
Determine the bootstrap estimate of the mean-squared error of g.
(A) 1   (B) 2   (C) 4   (D) 8   (E) 16

16.11 (2 points) You are given a random sample of two values from a distribution: 7, 42.
You estimate the limited expected value at 25.
Determine the bootstrap approximation to the mean square error of this estimate.
A. less than 35
B. at least 35, but less than 40
C. at least 40, but less than 45
D. at least 45, but less than 50
E. at least 50

16.12 (3 points) You are given a random sample of five values from a distribution F: 7, 10, 19, 24, 69.
You estimate θ(F) = E[X] using the estimator g(X1, X2, X3) = (X1 + X2 + X3)/3.
Determine the bootstrap approximation to the mean square error.
A. less than 100
B. at least 100, but less than 120
C. at least 120, but less than 140
D. at least 140, but less than 160
E. at least 160
16.13 (5 points) Three observed values of the random variable X are: 0   1   8
You estimate the third central moment of X using the estimator: g(X1, X2, X3) = {Σ(Xi − X̄)³}/3, summed over i = 1 to 3.
Determine the bootstrap estimate of the mean-squared error of g.
(A) Less than 500
(B) At least 500, but less than 750
(C) At least 750, but less than 1000
(D) At least 1000, but less than 1250
(E) At least 1250

16.14 (3 points) For a policy that covers both fire and wind losses, you are given:
(i) A sample of fire losses was 8 and 20.
(ii) A sample of wind losses was 6 and 30.
(iii) Fire and wind losses are independent, but do not have identical distributions.
Based on the sample, you estimate that adding a policy deductible of 10 per wind claim will eliminate 25% of the insured loss.
Determine the bootstrap approximation to the mean square error of the estimate.
(A) Less than 0.004
(B) At least 0.004, but less than 0.006
(C) At least 0.006, but less than 0.008
(D) At least 0.008, but less than 0.010
(E) At least 0.010

16.15 (4 points) You observe the number of claims during a year, X, for five insureds.
Anthony had no claims. Danielle had no claims. Kevin had 1 claim. Ryan had no claims. Samantha had 1 claim.
You estimate the mode of X by taking the most common value in a set of size five.
Determine the bootstrap estimate of the mean-squared error of this estimator of the mode.
(A) 0.32   (B) 0.34   (C) 0.36   (D) 0.38   (E) 0.40
16.16 (3 points) Three observed values of the random variable X are: 1   3   8
You estimate the median of X by taking the middle value in size of the next three values.
Determine the bootstrap estimate of the mean-squared error of this estimator.
A. 7.5   B. 8   C. 8.5   D. 9.0   E. 9.5

16.17 (Course 4 Sample Exam 2000, Q.38) You are given a random sample of four values from a distribution F: 4, 5, 9, 14.
You estimate θ(F) = E[X] using the estimator g(X1, X2, X3, X4) = X1.
Determine the bootstrap approximation to the mean square error.

16.18 (4, 5/00, Q.17) (2.5 points) You are given a random sample of two values from a distribution function F: 1   3
You estimate θ(F) = Var(X) using the estimator g(X1, X2) = {Σ(Xi − X̄)²}/2, summed over i = 1 to 2, where X̄ = (X1 + X2)/2.
Determine the bootstrap approximation to the mean square error.
(A) 0.0   (B) 0.5   (C) 1.0   (D) 2.0   (E) 2.5

16.19 (4, 11/00, Q.26) (2.5 points) You are given a random sample of two values from a distribution function F: 1   3
You estimate θ(F) = Var(X) using the estimator g(X1, X2) = Σ(Xi − X̄)², summed over i = 1 to 2, where X̄ = (X1 + X2)/2.
Determine the bootstrap approximation to the mean square error.
(A) 0.0   (B) 0.5   (C) 1.0   (D) 2.0   (E) 2.5

16.20 (4, 11/02, Q.35 & 2009 Sample Q.52) (2.5 points) With the bootstrapping technique, the underlying distribution function is estimated by which of the following?
(A) The empirical distribution function
(B) A normal distribution function
(C) A parametric distribution function selected by the modeler
(D) Any of (A), (B) or (C)
(E) None of (A), (B) or (C)
16.21 (4, 11/03, Q.26 & 2009 Sample Q.20) (2.5 points) You are given a sample of two values, 5 and 9.
You estimate Var(X) using the estimator g(X1, X2) = {Σ(Xi − X̄)²}/2, summed over i = 1 to 2, where X̄ = (X1 + X2)/2.
Determine the bootstrap approximation to the mean square error of g.
(A) 1   (B) 2   (C) 4   (D) 8   (E) 16

16.22 (4, 5/05, Q.4 & 2009 Sample Q.175) (2.9 points) Three observed values of the random variable X are: 1   1   4
You estimate the third central moment of X using the estimator: g(X1, X2, X3) = {Σ(Xi − X̄)³}/3, summed over i = 1 to 3.
Determine the bootstrap estimate of the mean-squared error of g.
(A) Less than 3.0
(B) At least 3.0, but less than 3.5
(C) At least 3.5, but less than 4.0
(D) At least 4.0, but less than 4.5
(E) At least 4.5

16.23 (4, 5/07, Q.29) (2.5 points) For a policy that covers both fire and wind losses, you are given:
(i) A sample of fire losses was 3 and 4.
(ii) Wind losses for the same period were 0 and 3.
(iii) Fire and wind losses are independent, but do not have identical distributions.
Based on the sample, you estimate that adding a policy deductible of 2 per wind claim will eliminate 20% of the insured loss.
Determine the bootstrap approximation to the mean square error of the estimate.
(A) Less than 0.006
(B) At least 0.006, but less than 0.008
(C) At least 0.008, but less than 0.010
(D) At least 0.010, but less than 0.012
(E) At least 0.012
Solutions to Problems:

16.1. E. We want the expected value of: (estimate - true value)². Assuming we use the observed empirical distribution function as a substitute for the underlying distribution function, then the “true” value of the quantity of interest, in this case the median, would be 8. Then we compute E[(estimate - 8)²] for the empirical distribution function, by writing down all of the possible pairs of values one can get and averaging the squared errors.

First Observation   Second Observation   Estimated Median   Squared Error
        6                   6                  6.0               4.00
        8                   6                  7.0               1.00
       13                   6                  9.5               2.25
        6                   8                  7.0               1.00
        8                   8                  8.0               0.00
       13                   8                 10.5               6.25
        6                  13                  9.5               2.25
        8                  13                 10.5               6.25
       13                  13                 13.0              25.00
Average                                                          5.33

16.2. B. Take all possible subsets of size two of {2, 5, 11} and compute the estimator:
g(2, 2) = 0. g(2, 5) = 2.25. g(2, 11) = 20.25. g(5, 2) = 2.25. g(5, 5) = 0. g(5, 11) = 9.
g(11, 2) = 20.25. g(11, 5) = 9. g(11, 11) = 0.
2, 5 and 11 are the observed points. For the empirical distribution, the mean is 6 and the variance is: {(2 - 6)² + (5 - 6)² + (11 - 6)²}/3 = 14; treat 14 as the “true” value of the variance.
Then the estimated mean squared error is:
{(14 - 0)² + (14 - 2.25)² + (14 - 20.25)² + (14 - 2.25)² + (14 - 0)² + (14 - 9)² + (14 - 20.25)² + (14 - 9)² + (14 - 0)²}/9 = 110.25.
16.3. D. Take all possible subsets of size three of {5, 7, 16} and compute the estimator:
g(5, 5, 5) = 0. g(5, 5, 7) = 0.889. g(5, 5, 16) = 26.889. g(5, 7, 5) = 0.889. g(5, 7, 7) = 0.889.
g(5, 7, 16) = 22.889. g(5, 16, 5) = 26.889. g(5, 16, 7) = 22.889. g(5, 16, 16) = 26.889.
g(7, 5, 5) = 0.889. g(7, 5, 7) = 0.889. g(7, 5, 16) = 22.889. g(7, 7, 5) = 0.889. g(7, 7, 7) = 0.
g(7, 7, 16) = 18. g(7, 16, 5) = 22.889. g(7, 16, 7) = 18. g(7, 16, 16) = 18.
g(16, 5, 5) = 26.889. g(16, 5, 7) = 22.889. g(16, 5, 16) = 26.889. g(16, 7, 5) = 22.889.
g(16, 7, 7) = 18. g(16, 7, 16) = 18. g(16, 16, 5) = 26.889. g(16, 16, 7) = 18. g(16, 16, 16) = 0.
If the distribution were uniform and discrete on 5, 7 and 16, the observed points, then the mean would be 28/3 and the variance would be: {(5 - 28/3)² + (7 - 28/3)² + (16 - 28/3)²}/3 = 22.889; treat this as the true value.
Then the estimated mean squared error is:
{3(22.889 - 0)² + 6(22.889 - 0.889)² + 6(22.889 - 18)² + 6(22.889 - 22.889)² + 6(22.889 - 26.889)²}/27 = 174.6.
Comment: With replacement, there are 3³ = 27 subsets.

16.4. B. Using Bootstrapping to estimate the MSE of the mean of n numbers gives the same answer as: (E[X²] - E[X]²)/n.
E[X] = (575 + 1043 + 1980 + 248 + 697)/5 = 908.6.
E[X²] = (575² + 1043² + 1980² + 248² + 697²)/5 = 1,177,237.
The estimated variance = 1,177,237 - 908.6² = 351,683.
The estimator for the mean is an average of five observations, thus in this case we divide by n = 5: 351,683/5 = 70,337.
Comment: Doing the usual bootstrap calculation would involve in this case listing 5⁵ = 3125 possible sets of size five. This would be very time consuming.

16.5. C. If the distribution were uniform and discrete on 3 and 9, the observed points, then the loss elimination ratio at 5 would be: {(0.5)(3) + (0.5)(5)}/{(0.5)(3) + (0.5)(9)} = 2/3. Treat 2/3 as the “true value”.
Take all possible subsets of size two of {3, 9}, and compute the loss elimination ratio at 5.
For 3 and 3, LER(5) = 1. For 3 and 9, LER(5) = (3 + 5)/(3 + 9) = 2/3.
For 9 and 3, LER(5) = (3 + 5)/(3 + 9) = 2/3. For 9 and 9, LER(5) = (5 + 5)/(9 + 9) = 5/9.
Then the estimated mean squared error is:
{(1 - 2/3)² + (2/3 - 2/3)² + (2/3 - 2/3)² + (5/9 - 2/3)²}/4 = 0.0309.
16.6. D. If the distribution were uniform and discrete on 5, 7 and 16, the observed points, then the mean would be 28/3 and the variance would be: {(5 - 28/3)² + (7 - 28/3)² + (16 - 28/3)²}/3 = 22.889; treat 22.889 as the “true value”.
Take all possible subsets of size two of {5, 7, 16} and compute the estimator g(X1, X2) = Σ(Xi - X̄)², summed over i = 1 to 2, where X̄ = (X1 + X2)/2.

First Loss   Second Loss   Mean    Estimator   "True Value"   Squared Error
     5            5          5        0.00        22.889          523.91
     5            7          6        2.00        22.889          436.35
     5           16        10.5      60.50        22.889         1414.59
     7            5          6        2.00        22.889          436.35
     7            7          7        0.00        22.889          523.91
     7           16        11.5      40.50        22.889          310.15
    16            5        10.5      60.50        22.889         1414.59
    16            7        11.5      40.50        22.889          310.15
    16           16         16        0.00        22.889          523.91
Average                                                           654.88

Then the estimated mean squared error is: 655.
Comment: Note that the “true value” only depends on the quantity we are trying to estimate, not on the form of g. For example, if g instead were: (1/2){(X1 - X̄)² + (X2 - X̄)²}, the “true value” would still be 22.889.
16.7. A. Take all subsets of {2, 6, 17} and compute the mean:

 X1    X2    X3    Mean        X1    X2    X3    Mean        X1    X2    X3    Mean
  2     2     2    2.00         6     2     2    3.33        17     2     2    7.00
  2     2     6    3.33         6     2     6    4.67        17     2     6    8.33
  2     2    17    7.00         6     2    17    8.33        17     2    17   12.00
  2     6     2    3.33         6     6     2    4.67        17     6     2    8.33
  2     6     6    4.67         6     6     6    6.00        17     6     6    9.67
  2     6    17    8.33         6     6    17    9.67        17     6    17   13.33
  2    17     2    7.00         6    17     2    8.33        17    17     2   12.00
  2    17     6    8.33         6    17     6    9.67        17    17     6   13.33
  2    17    17   12.00         6    17    17   13.33        17    17    17   17.00
The observed mean is (2 + 6 + 17)/3 = 8.33. We check how many of the 27 computed means for subsets are between 8.33 - 4 = 4.33 and 8.33 + 4 = 12.33. There are 19 out of 27, so we estimate the probability that the mean of the distribution from which these values were drawn is within ±4 of the observed mean as: 19/27 = 70.4%.
16.8. C. We want the expected value of: (estimate - true value)². We use the observed empirical distribution function as a substitute for the underlying distribution function; then the “true” value of the quantity of interest, in this case the mean, is: (15 + 21 + 29 + 44)/4 = 27.25.
For each of the 4² = 16 possible sets of size two, we apply the estimator and compare it to the “true value” of 27.25:

First Loss   Second Loss   Average   "True Value"   Squared Error
    15            15        15.00        27.25          150.06
    15            21        18.00        27.25           85.56
    15            29        22.00        27.25           27.56
    15            44        29.50        27.25            5.06
    21            15        18.00        27.25           85.56
    21            21        21.00        27.25           39.06
    21            29        25.00        27.25            5.06
    21            44        32.50        27.25           27.56
    29            15        22.00        27.25           27.56
    29            21        25.00        27.25            5.06
    29            29        29.00        27.25            3.06
    29            44        36.50        27.25           85.56
    44            15        29.50        27.25            5.06
    44            21        32.50        27.25           27.56
    44            29        36.50        27.25           85.56
    44            44        44.00        27.25          280.56
Average                                                  59.09

We estimate the MSE as: 59.1.
Alternately, in general, using Bootstrapping to estimate the MSE of the mean of n numbers gives the same answer as (E[X²] - E[X]²)/n. The 2nd moment is: (15² + 21² + 29² + 44²)/4 = 860.75. The estimated variance = 860.75 - 27.25² = 118.2. The estimator for the mean is an average of two observations, thus in this case we divide by n = 2: 118.2/2 = 59.1.
Comment: Similar to Course 4 Sample Exam, Q.38. Estimating the mean by using the next two observed values, the mean squared error is the variance of the distribution divided by the number of data points used in the estimator, 2.
16.9. D. Take all subsets of {3, 7, 14} and compute the mean:

 X1    X2    X3    Mean        X1    X2    X3    Mean        X1    X2    X3    Mean
  3     3     3    3.00         7     3     3    4.33        14     3     3    6.67
  3     3     7    4.33         7     3     7    5.67        14     3     7    8.00
  3     3    14    6.67         7     3    14    8.00        14     3    14   10.33
  3     7     3    4.33         7     7     3    5.67        14     7     3    8.00
  3     7     7    5.67         7     7     7    7.00        14     7     7    9.33
  3     7    14    8.00         7     7    14    9.33        14     7    14   11.67
  3    14     3    6.67         7    14     3    8.00        14    14     3   10.33
  3    14     7    8.00         7    14     7    9.33        14    14     7   11.67
  3    14    14   10.33         7    14    14   11.67        14    14    14   14.00
The observed mean is: (3 + 7 + 14)/3 = 8. We check how many of the 27 computed means for subsets are between 8 - 2 = 6 and 8 + 2 = 10. There are 13 out of 27, so we estimate the probability that the mean of the distribution from which these values were drawn is within ±2 of the observed mean as: 13/27 = 48.1%. Comment: One could equivalently check whether the sums are between 18 and 30.
16.10. B. For a discrete distribution, Prob[0] = 2/3 and Prob[3] = 1/3, the mean is 1, and the variance is: (2/3)(0 - 1)² + (1/3)(3 - 1)² = 2. Treat 2 as the “true value”.
We need to list all 27 subsets of size three from the original set. Since the value 0 shows up twice, this can be confusing. Let us denote the original set as: 0, zero, 3. Then the 27 subsets are:
{0, 0, 0}, {0, 0, zero}, {0, 0, 3}, {0, zero, 0}, {0, 3, 0}, {0, zero, zero}, {0, zero, 3}, {0, 3, zero}, {0, 3, 3};
{zero, 0, 0}, {zero, 0, zero}, {zero, 0, 3}, {zero, zero, 0}, {zero, 3, 0}, {zero, zero, zero}, {zero, zero, 3}, {zero, 3, zero}, {zero, 3, 3};
{3, 0, 0}, {3, 0, zero}, {3, 0, 3}, {3, zero, 0}, {3, 3, 0}, {3, zero, zero}, {3, zero, 3}, {3, 3, zero}, {3, 3, 3}.
g(0, 0, 0) = 0. g(0, 0, 3) = (1/2){(0 - 1)² + (0 - 1)² + (3 - 1)²} = 3 = g(0, 3, 3).
The corresponding values of the estimator g are:
0, 0, 3, 0, 3, 0, 3, 3, 3; 0, 0, 3, 0, 3, 0, 3, 3, 3; 3, 3, 3, 3, 3, 3, 3, 3, 0.
The sum of squared differences from the “true value” of 2 is: 9(0 - 2)² + 18(3 - 2)² = 54. ⇒ MSE = 54/27 = 2.
Alternately, one could denote the 27 subsets based on the insureds:
{A, A, A}, {A, A, B}, {A, A, C}, {A, B, A}, {A, C, A}, {A, B, B}, {A, B, C}, {A, C, B}, {A, C, C};
{B, A, A}, {B, A, B}, {B, A, C}, {B, B, A}, {B, C, A}, {B, B, B}, {B, B, C}, {B, C, B}, {B, C, C};
{C, A, A}, {C, A, B}, {C, A, C}, {C, B, A}, {C, C, A}, {C, B, B}, {C, B, C}, {C, C, B}, {C, C, C}.
Proceed as before.
Comment: Long! Similar to 4, 5/05, Q.4. There are 2³ = 8 subsets with three zeros. There are (3)(2²) = 12 subsets with two zeros and a three. There are (3)(2) = 6 subsets with two threes and a zero. There is 1 subset with three threes.

16.11. C. If the distribution were discrete and uniform on 7 and 42, the observed points, then the limited expected value at 25 would be: (7 + 25)/2 = 16. Treat 16 as the “true value”.
Take all possible subsets of size two of {7, 42}, and compute the limited expected value at 25.
For 7 and 7, E[X ∧ 25] = 7. For 7 and 42, E[X ∧ 25] = 16.
For 42 and 7, E[X ∧ 25] = 16. For 42 and 42, E[X ∧ 25] = 25.
The estimated mean squared error is: {(7 - 16)² + (16 - 16)² + (16 - 16)² + (25 - 16)²}/4 = 40.5.

16.12. E. Using Bootstrapping to estimate the MSE of the mean of n numbers gives the same answer as: (E[X²] - E[X]²)/n.
E[X] = (7 + 10 + 19 + 24 + 69)/5 = 25.8. E[X²] = (7² + 10² + 19² + 24² + 69²)/5 = 1169.4.
The estimated variance = 1169.4 - 25.8² = 503.76.
The estimator for the mean is an average of three observations, thus in this case we divide by n = 3: 503.76/3 = 167.92.
Comment: Doing the usual bootstrap calculation would involve in this case listing 5³ = 125 possible sets of size three. This would be time consuming.
16.13. D. For a discrete distribution, Prob[0] = 1/3, Prob[1] = 1/3 and Prob[8] = 1/3, the mean is 3, and the third central moment is: (1/3)(0 - 3)³ + (1/3)(1 - 3)³ + (1/3)(8 - 3)³ = 30. Treat 30 as the “true value”.
We need to list all 27 subsets of size three from the original set: {0, 0, 0}, {1, 1, 1}, {8, 8, 8}, three subsets with two 0 and one 1, three subsets with two 0 and one 8, three subsets with two 1 and one 0, three subsets with two 1 and one 8, three subsets with two 8 and one 0, three subsets with two 8 and one 1, and six subsets with one 0, one 1, and one 8.
g(0, 0, 0) = 0. g(1, 1, 1) = 0. g(8, 8, 8) = 0.
g(0, 0, 1) = (1/3){(0 - 1/3)³ + (0 - 1/3)³ + (1 - 1/3)³} = 0.074.
g(0, 0, 8) = (1/3){(0 - 8/3)³ + (0 - 8/3)³ + (8 - 8/3)³} = 37.926.
g(1, 1, 0) = (1/3){(1 - 2/3)³ + (1 - 2/3)³ + (0 - 2/3)³} = -0.074.
g(1, 1, 8) = (1/3){(1 - 10/3)³ + (1 - 10/3)³ + (8 - 10/3)³} = 25.407.
g(8, 8, 0) = (1/3){(8 - 16/3)³ + (8 - 16/3)³ + (0 - 16/3)³} = -37.926.
g(8, 8, 1) = (1/3){(8 - 17/3)³ + (8 - 17/3)³ + (1 - 17/3)³} = -25.407.
g(0, 1, 8) = (1/3){(0 - 3)³ + (1 - 3)³ + (8 - 3)³} = 30.
The sum of squared differences from the “true value” of 30 is:
3(0 - 30)² + 3(30 - 0.074)² + 3(30 - 37.926)² + 3(30 + 0.074)² + 3(30 - 25.407)² + 3(30 + 37.926)² + 3(30 + 25.407)² + 6(30 - 30)² = 31,403.
⇒ MSE = 31,403/27 = 1163.
Comment: Similar to 4, 5/05, Q.4.
16.14. A. We take all possible sets of two wind losses and two fire losses, and compute the loss elimination ratio. We then take the squared difference between this loss elimination ratio and the given 0.25.

First Fire   Second Fire   First Wind   Second Wind   Losses       Total    Loss Elim.   Squared
  Loss          Loss          Loss          Loss      Eliminated   Losses     Ratio       Error
    8             8             6             6           12          28      0.4286     0.031888
    8            20             6             6           12          40      0.3000     0.002500
   20             8             6             6           12          40      0.3000     0.002500
   20            20             6             6           12          52      0.2308     0.000370
    8             8             6            30           16          52      0.3077     0.003328
    8            20             6            30           16          64      0.2500     0.000000
   20             8             6            30           16          64      0.2500     0.000000
   20            20             6            30           16          76      0.2105     0.001558
    8             8            30             6           16          52      0.3077     0.003328
    8            20            30             6           16          64      0.2500     0.000000
   20             8            30             6           16          64      0.2500     0.000000
   20            20            30             6           16          76      0.2105     0.001558
    8             8            30            30           20          76      0.2632     0.000173
    8            20            30            30           20          88      0.2273     0.000517
   20             8            30            30           20          88      0.2273     0.000517
   20            20            30            30           20         100      0.2000     0.002500
Average                                                                                   0.003171

Comment: Similar to 4, 5/97, Q.29. Assume we have two fire losses and two wind losses, with the following distribution of fire losses: 50% chance of an 8 and a 50% chance of a 20, and with the following distribution of wind losses: 50% chance of a 6 and a 50% chance of a 30. As stated, the distribution of fire losses is independent of the distribution of wind losses. Then the expected losses eliminated by a wind deductible of 10 are: (2){(0.5)(6) + (0.5)(10)} = 16. The expected total losses are: (2){(0.5)(6) + (0.5)(30)} + (2){(0.5)(8) + (0.5)(20)} = 64. The loss elimination ratio for a wind deductible of 10 is: 16/64 = 0.25, as given in the question. We take this as the “true value”.
16.15. A. The empirical model assigns probability 3/5 to 0 and 2/5 to one. The mode of the empirical model is 0. Treat zero as the “true value.”
The mode of a subset will be zero if there are 2 or fewer ones selected. If there are 3 or more ones, then the mode is 1. The number of ones in a subset of size five is Binomial with m = 5 and q = 2/5 = 0.4.
For this Binomial: f(0) + f(1) + f(2) = (0.6)⁵ + (5)(0.4)(0.6)⁴ + (10)(0.4)²(0.6)³ = 0.68256.
Thus 0.68256 of the time the estimate will be zero, and the remainder of the time it will be 1.
The average squared difference between the “true value” and the estimate is:
(0.68256)(0 - 0)² + (1 - 0.68256)(1 - 0)² = 0.31744.
Alternately, there are a total of 5⁵ = 3125 subsets with replacement from the original set of size five. The mode will be zero, unless there are 3, 4, or 5 ones selected.
In order to get all ones, one can have each element two different ways: 2⁵ = 32.
In order to get 4 ones, one can have each element that is one two different ways, and the single zero three different ways. Then the zero can appear in one of five locations: (5)(3)(2⁴) = 240.
In order to get 3 ones, one can have each element that is one two different ways, and each zero three different ways. Then the zeros can appear in 4 + 3 + 2 + 1 = 10 = C(5, 2) different combinations of locations: (10)(3²)(2³) = 720.
Thus the total number of ways the mode is one is: 32 + 240 + 720 = 992.
So 992/3125 of the time the mode is one, while the remaining 2133/3125 the mode is zero. Proceed as before.
Comment: In order to get 2 ones, one can have each element that is one two different ways, and each zero three different ways. Then the ones can appear in 10 different combinations of locations: (10)(3³)(2²) = 1080.
In order to get 1 one, one can have the one two different ways, and each zero three different ways. Then the one can appear in 5 different locations: (5)(3⁴)(2) = 810.
In order to get all zeros, each zero can occur three different ways: 3⁵ = 243.
Thus the total number of ways the mode is zero is: 1080 + 810 + 243 = 2133. 992 + 2133 = 3125.
16.16. A. For the empirical model on {1, 3, 8}, the median is 3. Take 3 as the “true value”.
For the estimate to be 1, there must be 2 or 3 ones in the subset. The number of ones is Binomial with m = 3 and q = 1/3. The density of this Binomial at 2 is: (3)(1/3)²(2/3) = 6/27. The density of this Binomial at 3 is: (1/3)³ = 1/27. Thus 7/27 of the time the estimate is 1.
By symmetry, 7/27 of the time the estimate is 8. Thus, the remaining 13/27 of the time the estimate is 3.
Therefore, the bootstrap estimate of the mean-squared error is:
(7/27)(1 - 3)² + (13/27)(3 - 3)² + (7/27)(8 - 3)² = 7.52.
Alternately, list all 27 subsets of size 3, and the result of applying the estimator:
{1, 1, 1} 1   {3, 1, 1} 1   {8, 1, 1} 1
{1, 1, 3} 1   {3, 1, 3} 3   {8, 1, 3} 3
{1, 1, 8} 1   {3, 1, 8} 3   {8, 1, 8} 8
{1, 3, 1} 1   {3, 3, 1} 3   {8, 3, 1} 3
{1, 3, 3} 3   {3, 3, 3} 3   {8, 3, 3} 3
{1, 3, 8} 3   {3, 3, 8} 3   {8, 3, 8} 8
{1, 8, 1} 1   {3, 8, 1} 3   {8, 8, 1} 8
{1, 8, 3} 3   {3, 8, 3} 3   {8, 8, 3} 8
{1, 8, 8} 8   {3, 8, 8} 8   {8, 8, 8} 8
There are 7 sets where the estimate is a one, 13 sets where the estimate is a three, and 7 sets where the estimate is an eight.
Therefore, the bootstrap estimate of the mean-squared error is:
{(7)(1 - 3)² + (13)(3 - 3)² + (7)(8 - 3)²}/27 = 7.52.

16.17. We want the expected value of: (estimate - true value)². We use the observed empirical distribution function as a substitute for the underlying distribution function; then the “true” value of the quantity of interest, in this case the mean, is: (4 + 5 + 9 + 14)/4 = 8.
For each of the 4 possible sets of size one, we apply the estimator and compare it to the “true value” of 8, then take the average of these squared differences:
{(4 - 8)² + (5 - 8)² + (9 - 8)² + (14 - 8)²}/4 = (16 + 9 + 1 + 36)/4 = 15.5.
Comment: Without all the fancy notation, we are estimating the mean of a distribution by using the next observed value. In this case, the mean squared error of our estimate is the variance of the distribution. The second moment is: (16 + 25 + 81 + 196)/4 = 79.5. The estimated variance = 79.5 - 8² = 15.5. The estimator is a single observation, rather than an average of n observations. Thus in this case there is no need to divide by n at the end; alternately we divide by n = 1.
16.18. B. Take all possible subsets of size two of {1, 3} and compute the estimator:
g(1, 1) = 0. g(1, 3) = 1. g(3, 1) = 1. g(3, 3) = 0.
If the distribution were uniform on 1 and 3, the observed points, then the mean would be 2 and the variance would be 1: 0.5(3 - 2)² + 0.5(1 - 2)² = 1. Treat 1 as the “true value”.
Then the estimated mean squared error is: {(1 - 0)² + (1 - 1)² + (1 - 1)² + (1 - 0)²}/4 = 1/2.

16.19. C. Use the empirical distribution in order to get the “true value” of the variance:
50% chance of a 1 and a 50% chance of a 3. ⇒ mean = (0.5)(1) + (0.5)(3) = 2.
Second moment = (0.5)(1²) + (0.5)(3²) = 5. Variance = 5 - 2² = 1.
Note that this did not depend on the form of the estimator g.
Take all possible subsets of size two of {1, 3} and compute the estimator:
g(1, 1) = 0. g(1, 3) = 2. g(3, 1) = 2. g(3, 3) = 0.
Take squared differences from the “true value” of 1. The estimated mean squared error is:
{(1 - 0)² + (1 - 2)² + (1 - 2)² + (1 - 0)²}/4 = 1.
Comment: Note that the “true value” only depends on the quantity we are trying to estimate, not on the form of g. For example, if g instead were: (1/2){(X1 - X̄)² + (X2 - X̄)²}, the “true value” would still be 1.

16.20. A. Using bootstrapping, the quantity of interest is calculated assuming the distribution was uniform (and discrete) on the given data set, in other words using the empirical distribution function.

16.21. D. Use the empirical distribution in order to get the “true value” of the variance:
50% chance of a 5 and a 50% chance of a 9. ⇒ mean = (0.5)(5) + (0.5)(9) = 7.
Second moment = (0.5)(5²) + (0.5)(9²) = 53. Variance = 53 - 7² = 4.
Note that this did not depend on the form of the estimator g.
Take all possible subsets of size two of {5, 9} and compute the estimator:
g(5, 5) = 0. g(5, 9) = 4. g(9, 5) = 4. g(9, 9) = 0.
Take squared differences from the “true value” of 4. The estimated mean squared error is:
{(0 - 4)² + (4 - 4)² + (4 - 4)² + (0 - 4)²}/4 = 8.
16.22. E. For a discrete distribution, Prob[1] = 2/3 and Prob[4] = 1/3, the mean is 2, and the third central moment is: (2/3)(1 - 2)³ + (1/3)(4 - 2)³ = 2. Treat 2 as the “true value”.
The number of 4s in a subset of size three is Binomial with m = 3 and q = 1/3.
Prob[no fours] = (2/3)³ = 8/27. Prob[one four] = (3)(1/3)(2/3)² = 12/27.
Prob[two fours] = (3)(1/3)²(2/3) = 6/27. Prob[three fours] = (1/3)³ = 1/27.
Thus out of 27 subsets: 3 ones: 8 times, 2 ones and 1 four: 12 times, 1 one and 2 fours: 6 times, 3 fours: 1 time.
Alternately, list all 27 subsets of size three from the original set. Since the value 1 shows up twice, this can be confusing. Let us denote the original set as: 1, one, 4. Then the 27 subsets are:
{1, 1, 1}, {1, 1, one}, {1, 1, 4}, {1, one, 1}, {1, 4, 1}, {1, one, one}, {1, one, 4}, {1, 4, one}, {1, 4, 4};
{one, 1, 1}, {one, 1, one}, {one, 1, 4}, {one, one, 1}, {one, 4, 1}, {one, one, one}, {one, one, 4}, {one, 4, one}, {one, 4, 4};
{4, 1, 1}, {4, 1, one}, {4, 1, 4}, {4, one, 1}, {4, 4, 1}, {4, one, one}, {4, one, 4}, {4, 4, one}, {4, 4, 4}.
Thus as before out of 27 subsets: 3 ones: 8 times, 2 ones and 1 four: 12 times, 1 one and 2 fours: 6 times, 3 fours: 1 time.
g(1, 1, 1) = 0. g(1, 1, 4) = 2. g(1, 4, 4) = (1/3){(1 - 3)³ + (4 - 3)³ + (4 - 3)³} = -2. g(4, 4, 4) = 0.
The sum of squared differences from the “true value” of 2 is:
8(0 - 2)² + 12(2 - 2)² + 6(-2 - 2)² + (0 - 2)² = 132. ⇒ MSE = 132/27 = 4.89.
Comment: Difficult under exam conditions. This question is longer because of:
3 items in the data set.
3 items used in the estimator.
Dealing with the third central moment.
Two items in the data set are equal!
16.23. E. Assume we have two fire losses and two wind losses, with the following distribution of fire losses: 50% chance of a 3 and a 50% chance of a 4, and with the following distribution of wind losses: 50% chance of a 0 and a 50% chance of a 3. As stated, the distribution of fire losses is independent of the distribution of wind losses.
Then the expected losses eliminated by a wind deductible of 2 are: (2){(0.5)(0) + (0.5)(2)} = 2.
The expected total losses are: (2){(0.5)(0) + (0.5)(3)} + (2){(0.5)(3) + (0.5)(4)} = 10.
The loss elimination ratio for a wind deductible of 2 is: 2/10 = 0.2, as given in the question. We take this as the “true value”.
We take all possible sets of two wind losses and two fire losses, and compute the loss elimination ratio. We then take the squared difference between this loss elimination ratio and 0.2.

First Fire   Second Fire   First Wind   Second Wind   Losses       Total    Loss Elim.   Squared
  Loss          Loss          Loss          Loss      Eliminated   Losses     Ratio       Error
    3             3             0             0            0           6      0.0000     0.040000
    3             4             0             0            0           7      0.0000     0.040000
    4             3             0             0            0           7      0.0000     0.040000
    4             4             0             0            0           8      0.0000     0.040000
    3             3             0             3            2           9      0.2222     0.000494
    3             4             0             3            2          10      0.2000     0.000000
    4             3             0             3            2          10      0.2000     0.000000
    4             4             0             3            2          11      0.1818     0.000331
    3             3             3             0            2           9      0.2222     0.000494
    3             4             3             0            2          10      0.2000     0.000000
    4             3             3             0            2          10      0.2000     0.000000
    4             4             3             0            2          11      0.1818     0.000331
    3             3             3             3            4          12      0.3333     0.017778
    3             4             3             3            4          13      0.3077     0.011598
    4             3             3             3            4          13      0.3077     0.011598
    4             4             3             3            4          14      0.2857     0.007347
Average                                                                                   0.013123
Comment: One can save some time by noting that some of the possible sets of losses produce the same loss elimination ratio.
Section 17, Bootstrapping via Simulation139

One could perform a bootstrap calculation for any estimator that uses m values from a set of size n. However, the number of subsets that need to be listed is in general nᵐ, and grows very quickly as m and/or n grow. It soon becomes impractical to list all the possible subsets and perform the needed calculations. If one had a data set of size 10 and the estimator used 10 values, if one did not simulate random subsets, one would need to list 10¹⁰ subsets!
Therefore, in practical applications, one usually uses the second form of the Bootstrap. One simulates a sufficient number of different subsets of size m, with replacement; each element is a random draw from the original data set.

Exercise: One has a data set of 5 sizes of loss: $3, $5, $8, $14, and $22.
Assume 2, 4, 2 are three independent random numbers from the integers from 1 to 5.
What is the corresponding set of three losses from this data set?
[Solution: $5, $14, $5.]

For this set of $5, $14, $5, the middle size loss is $5; one could use this as the estimate of the median. For the empirical model on the original set, the median is the first place the distribution function is at least 50%, in this case $8; as usual we take this as the “true value.” Then the squared error is (8 - 5)² = 9.
We could simulate as many sets of three losses as we wanted. Then we could average the resulting squared errors and use this as an estimate of the mean squared error of this estimator of the median.
139 See Example 21.16 of Loss Models.
The steps in general for Bootstrapping via Simulation are:
0. You are given a data set and an estimator of a quantity of interest.
1. The quantity of interest is calculated assuming the distribution was uniform and discrete on the given data set, in other words using the empirical distribution function.
2. Simulate with replacement a data set of appropriate size from the given set.
3. For each subset from step 2, calculate the estimate using the estimator.
4. Tabulate the squared difference between the estimate for this run from step 3 and the value from step 1. Return to step 2.
After performing sufficient simulations, the mean of the squared differences from step 4 is the bootstrap estimate of the Mean Squared Error of the estimator.140

Bootstrapping Via Simulation Applied to an Estimator of the Survival Function:

Here is an example, where the quantity of interest is the survival function at two hundred, S(200). The estimator used will be S(200) for a LogNormal Distribution fit via the method of moments. The data set will consist of the following 20 values:
15, 17, 18, 24, 60, 64, 66, 73, 75, 80, 95, 103, 113, 125, 139, 153, 178, 238, 318, 530.

Exercise: Fit a LogNormal Distribution to the above data via the method of moments.
[Solution: The mean is 124.2, and the second moment is 29,596.5.
Matching the first two moments: 124.2 = exp(µ + σ²/2), and 29,596.5 = exp(2µ + 2σ²).
Dividing the second equation by the square of the first eliminates mu:
29,596.5 / 124.2² = 1.91866 = exp(σ²). ⇒ σ = 0.8072. ⇒ µ = ln(124.2) - 0.8072²/2 = 4.4961.]

Exercise: Use the fitted LogNormal to estimate S(200).
[Solution: S(200) = 1 - F(200) = 1 - Φ[(ln(200) - 4.4961)/0.8072] = 1 - Φ(0.99) = 0.1611.
Comment: This estimate of 16.11% differs only slightly from the observed 3/20 = 15% of values greater than 200.]

Then for a bootstrap estimate of the MSE of this estimator, we will compute the squared differences from 0.15, the empirical survival function at 200 for the given data set.

140 Loss Models does not state how many simulations would be sufficient. According to Kendallʼs Advanced Theory of Statistics, one should run at least 200 simulations. In many situations one requires considerably more.
Exercise: Given the following 20 random integers from 1 to 20: 5, 13, 9, 5, 9, 20, 2, 10, 8, 17, 1, 9, 11, 20, 6, 19, 16, 10, 14, 1, what is the corresponding simulated subset (with replacement) from the original data given above?
[Solution: The first simulated value is the 5th element of the original data set, 60. The second simulated value is the 13th element of the original data set, 113. The simulated subset is:
60, 113, 75, 60, 75, 530, 17, 80, 73, 178, 15, 75, 95, 530, 64, 318, 153, 80, 125, 15.
Comment: On your exam, they are extremely likely to give you the simulated subsets to use, as in 4, 11/04, Q.16, which is also 2009 Sample Q.144.]

Exercise: Fit a LogNormal Distribution to the simulated data via the method of moments.
[Solution: The mean is 136.55, and the second moment is 40,123.8.
Matching the first two moments: 136.55 = exp(µ + σ²/2), and 40,123.8 = exp(2µ + 2σ²).
Dividing the second equation by the square of the first eliminates mu:
40,123.8 / 136.55² = 2.15188 = exp(σ²). ⇒ σ = 0.8754. ⇒ µ = ln(136.55) - 0.8754²/2 = 4.5335.]

Exercise: Use this fitted LogNormal to estimate S(200).
[Solution: S(200) = 1 - Φ[(ln(200) - 4.5335)/0.8754] = 1 - Φ(0.87) = 0.1922.]

Thus from the first simulation run, we get a squared difference of (0.1922 - 0.15)² = 0.00178.
Assume that in a similar manner the next four simulation runs resulted in estimates for S(200) of: 0.1788, 0.0548, 0.3192, and 0.1660. Then based on these five simulation runs, the bootstrap estimate of the Mean Squared Error would be:
{(0.1922 - 0.15)² + (0.1788 - 0.15)² + (0.0548 - 0.15)² + (0.3192 - 0.15)² + (0.1660 - 0.15)²}/5 = 0.00811.

In order to use a bootstrap estimate we need to run a sufficient number of simulations. I ran 10,000 rather than just 5 simulation runs, and the estimated MSE for the estimate of S(200) was 0.00518. The square root of this MSE is 0.07. The root mean squared error is kind of like the standard deviation of the estimate. In this case, our point estimate of the survival function at 200 is 0.16. The standard deviation of the estimate is 0.07. Thus an approximate 95% interval is: 0.16 ± (1.96)(0.07) = 0.16 ± 0.14 = 0.02 to 0.30. This is a rather wide interval for an estimate of the value of the survival function at 200.141

141 This is probably unusable for any practical application. We would need a larger initial data set than 20 values, in order to accurately estimate S(200).
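For readers who want to reproduce a calculation like this, here is a rough Python sketch (my own code, not the author's) of bootstrapping via simulation for the method-of-moments LogNormal estimate of S(200). The 20 observed values and the empirical "true value" of 0.15 come from the example above; the number of runs and the implementation details are my own choices.

```python
# Sketch of bootstrapping via simulation for the LogNormal estimate of S(200),
# following the example above.
import math, random, statistics

data = [15, 17, 18, 24, 60, 64, 66, 73, 75, 80, 95, 103, 113,
        125, 139, 153, 178, 238, 318, 530]

def lognormal_s200(sample):
    """Fit a LogNormal by matching the first two moments, then return S(200)."""
    m1 = statistics.fmean(sample)
    m2 = statistics.fmean([x * x for x in sample])
    sigma = math.sqrt(math.log(m2 / m1 ** 2))                # exp(sigma^2) = m2 / m1^2
    mu = math.log(m1) - sigma ** 2 / 2
    z = (math.log(200) - mu) / sigma
    return 1 - statistics.NormalDist().cdf(z)

true_value = sum(x > 200 for x in data) / len(data)          # empirical S(200) = 0.15

runs = 10_000
sq_errors = [(lognormal_s200(random.choices(data, k=len(data))) - true_value) ** 2
             for _ in range(runs)]
print(sum(sq_errors) / runs)                                 # roughly 0.005, in line with the text
```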
Estimating the Probability of the Sample Mean Being Close to the Underlying Mean:

Here is an example involving the probability of the sample mean being close to the true underlying mean. Take the set of 10 points: 56, 101, 78, 67, 93, 87, 64, 72, 80, 69.
We wish to estimate using bootstrapping via simulation:
Prob( -5 < ΣXi/10 - µ < 5 ), with the sum running from i = 1 to 10,
where the Xi are independent identically distributed variables with unknown mean µ.142

Exercise: Use the following random integers from 1 to 10: 7, 9, 6, 4, 10, 7, 5, 2, 10, 4, in order to simulate a random set of size 10 from the given data set.
[Solution: 64, 80, 87, 67, 69, 64, 93, 101, 69, 67.]

For this simulated set, the sample mean is: 76.1. For the original data set, the mean is 76.7.
If we take this 76.7 as the “true mean”, then -5 < 76.1 - 76.7 < 5. Thus for this simulation run, the sample mean is inside the desired interval.
Running 100,000 such simulations, the sample mean was inside this interval 76,139 times. Thus for this situation, we estimate the probability of | sample mean - true mean | < 5, as about 76.1%.
142 One could approximate this probability by using the Studentʼs t distribution. In this case the sample mean is 76.7 and the sample variance is 193.35. The standard deviation of the sample mean is: √(193.35/10) = 4.397. Thus -5 corresponds to -4.5/4.397 = -1.023. The CDF at -1.023 of the t-distribution with 9 degrees of freedom is: 0.1665. Thus the desired probability is approximately: 1 - (2)(0.1665) = 0.667. However, the given data set is significantly skewed; the skewness is about 3. Therefore, the assumption underlying the t-test, that each variable is Normal, does not apply. However, if n, the number of data points, were larger, the approximation would be better.
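A sketch of the simulation just described (my own code, not the author's): resample the ten values with replacement many times and count how often the resampled mean falls within 5 of 76.7.

```python
# Sketch of the simulation above: estimate Prob(|sample mean - true mean| < 5) by resampling.
import random

data = [56, 101, 78, 67, 93, 87, 64, 72, 80, 69]
observed_mean = sum(data) / len(data)                         # 76.7, taken as the "true mean"

runs = 100_000
hits = sum(abs(sum(random.choices(data, k=len(data))) / len(data) - observed_mean) < 5
           for _ in range(runs))
print(hits / runs)                                            # roughly 0.76, in line with the text
```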
A Practical Example, MSE of Estimated Excess Ratio at $1 million:

I will now apply this simulation version of the Bootstrap to a somewhat larger data set, the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”, with 130 losses. While this approaches a practical application, the calculations are much too long to perform under exam conditions. I will illustrate how to apply Bootstrapping to estimated Excess Ratios.143

Exercise: What is the Excess Ratio at $1 million for the ungrouped data in Section 2?
[Solution: Only the 7 largest losses contribute to the layer excess of $1 million. They contribute:
78,800 + 117,600 + 546,800 + 1,211,000 + 1,229,700 + 2,961,000 + 3,802,200 = $9,947,100.
The total losses are $40,647,700. Thus the Excess Ratio at $1 million, R($1 million) = 9,947,100 / 40,647,700 = 0.2447.]

We use simulation in order to estimate the MSE of this estimate of R($1 million) = 0.2447.144 For each simulation run, simulate 130 random numbers, with replacement, from the integers from 1 to 130. Then take the corresponding losses in the data set in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”. So for example, if the number 9 showed up twice, then the simulated set of losses would have two losses of size $10,400, since the 9th loss in the data set in Section 2 is $10,400. Then for this simulated set of 130 losses compute R($1 million).145

Letʼs say the first simulated data set produced an estimate of R($1 million) of 0.2085. Then treating 0.2447 as the true value of R($1 million), the squared error is: (0.2085 - 0.2447)² = 0.001310. Then we perform many simulation runs, and record for each run the squared error. The average of all these squared errors is the estimate of the MSE of our estimator.

I ran this simulation 500 times.146 The sum of the squared errors was 3.8475. Thus the estimated mean squared error was 3.8475/500 = 0.00775. Thus an approximate 95% confidence interval for R($1 million) would be: 0.2447 ± 1.96√0.00775 = 0.24 ± 0.17.147 Note that here we have made no parametric assumptions.
143 The Excess Ratio is one minus the Loss Elimination Ratio. Excess Ratios were chosen as an example for illustrative purposes. One could apply these techniques to other quantities of interest such as Limited Expected Values, the 75th percentile, etc.
144 0.2447 is an estimate of the future excess ratio at $1 million we expect to observe for data from the same source as that in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”. 0.2447 is the empirical R($1 million) for this data set; in that sense it is exact. However, as an estimate of the underlying expected excess ratio it has an associated MSE.
145 Just as in the above exercise.
146 I used Mathematica. The first 10 simulations produced estimates of R($1 million) of: 0.208493, 0.215869, 0.264728, 0.0358846, 0.0499248, 0.183025, 0.256293, 0.219231, 0.188767, and 0.253879. The average excess ratio for the 500 simulation runs was 0.228.
147 As seen previously when estimating the variance of the estimates of R($1 million) from using maximum likelihood, with only 130 data points, it is very difficult to estimate R($1 million).
Alternately, one could estimate a 95% confidence interval for R($1 million) by ordering the simulation results from smallest to largest. Then the (0.025)(501) = 12.5th result is an estimate of the 2.5th percentile, while the (0.975)(501) = 488.5th result is an estimate of the 97.5th percentile.
The 12th and 13th results are: 0.0555174, 0.0580671. Their average is: 0.057.
The 488th and 489th results are: 0.392122, 0.392282. Their average is: 0.392.
Thus an approximate 95% confidence interval for R($1 million) is [0.057, 0.392].

Note that the use of Bootstrapping to estimate the MSE is distribution free. We made no assumption about what type of distribution the data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions” would follow. Rather, we are relying solely on the data that we actually observe. For example, even though the data in Section 2 contains no loss greater than $5,000,000, I assume such a loss is possible in the future. Applying Bootstrapping ignores this possibility.148

Exercise: What is the empirical R($5 million) for the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”?
[Solution: Since there are no losses > $5 million, R($5 million) = 0.]

Exercise: Using Bootstrapping, what is the estimated MSE of this estimate of R($5 million)?
[Solution: Every selected set of data contains no loss greater than $5 million. Therefore, in each case, the estimate of R($5 million) = 0. Therefore, in each case the squared error is (0 - 0)² = 0. Thus the estimated MSE is 0.]

Clearly, the actual MSE of this estimator is positive rather than zero. This points out the limitations of Bootstrapping, when one applies it to quantities such as R($5 million), which are very sensitive to the righthand tail of the distribution. In such cases, one needs a large data set in order to usefully apply Bootstrapping.

An Example involving Actuarial Present Values:149

One observes a group of 100 lives all the same age. 3 die during the first year. 4 die during the second year. 4 die during the third year. 3 die during the fourth year. 6 die during the fifth year. The 80 lives remaining survive beyond the fifth year.150
An Example involving Actuarial Present Values:149 One observes a group of 100 lives all the same age. 3 die during the first year. 4 die during the second year. 4 die during the third year. 3 die during the fourth year. 6 die during the fifth year. The 80 lives remaining survive beyond the fifth year.150 148
In contrast, fitting a size of loss distribution, results in a value of S($5 million) > 0. See Example 21.16 in Loss Models, as well as the paragraph that follows this example. 150 Thus these 80 values are censored from above at 5. Estimation of the distribution function in more complicated situations are considered in “Mahlerʼs Guide to Survival Analysis.” 149
Assume that each of these lives was covered by a 5-year term insurance that paid at the end of the year of death.151 Assume the face amount was $100,000 on each policy.
Then the insurer would pay a total of $2,000,000 in benefits. At 5%, the present value of those benefits is: (100,000){3/1.05 + 4/1.05² + 4/1.05³ + 3/1.05⁴ + 6/1.05⁵} = $1,710,988.
Therefore, one could estimate the actuarial present value of the benefits of a 5-year term insurance provided to lives this age as: $1,710,988 / 100 = $17,110.
Since this estimate is based on only 100 lives, it has a large mean squared error. Here is how one could use bootstrapping via simulation, in order to estimate the mean squared error:
1. $17,100 is the estimate of the actuarial present value, assuming the empirical distribution.
2. Simulate a random subset of size 100 from the original set of 100 lives, with replacement.
3. Calculate the present value of the benefits for this simulated set of lives. Divide by 100.
4. Tabulate the squared difference between the result of step 3 and $17,110.
5. Return to step 2.
For example, assume that the first random set of 100 lives is:152
30, 39, 11, 8, 93, 11, 19, 54, 38, 94, 62, 18, 95, 30, 24, 1, 11, 29, 95, 24, 14, 77, 94, 26, 11, 17, 35, 96, 42, 79, 77, 21, 52, 61, 66, 75, 93, 15, 42, 91, 25, 79, 68, 86, 71, 42, 60, 71, 81, 10, 1, 50, 3, 56, 21, 40, 40, 24, 88, 35, 42, 31, 78, 98, 64, 45, 6, 24, 94, 56, 83, 52, 15, 14, 48, 61, 58, 74, 56, 31, 47, 45, 68, 61, 12, 12, 72, 63, 23, 40, 32, 6, 90, 11, 92, 19, 2, 85, 96, 64.
Then the times until death, where C means censored from above at 5, are:
C, C, 3, 3, C, 3, 5, C, C, C, C, 5, C, C, C, 1, 3, C, C, C, 4, C, C, C, 3, 5, C, C, C, C, C, C, C, C, C, C, C, 5, C, C, C, C, C, C, C, C, C, C, C, 3, 1, C, 1, C, C, C, C, C, C, C, C, C, C, C, C, C, 2, C, C, C, C, C, 5, 4, C, C, C, C, C, C, C, C, C, C, 4, 4, C, C, C, C, C, 2, C, 3, C, 5, 1, C, C, C.
Thus there are: 4 payments at time 1, 2 payments at time 2, 7 payments at time 3, 4 payments at time 4, and 6 payments at time 5.
The present value of these payments of 100,000 each is: (100,000){4/1.05 + 2/1.05² + 7/1.05³ + 4/1.05⁴ + 6/1.05⁵} = $1,966,241.
Dividing by 100 gives an actuarial present value of $19,662. The squared difference from the “true value” of 17,100 is: (19,662 - 17,100)² = 6,563,844.
This squared difference from the first run was tabulated. Then the process is repeated many times.

151 Payment at the end of the year of death was chosen for simplicity. A five-year term was chosen to keep things simple.
152 Where 19 would be the 19th life from the original set, which died in the fifth year.
Performing 1000 such simulations, the mean squared error was: 11,623,616.153 154
Therefore, the estimated root mean squared error is: √11,623,616 = 3409.
Thus an approximate 95% confidence interval for the actuarial present value is: $17,100 ± $6682.155 As expected, 100 lives is not enough to get a good estimate of the actuarial present value. One could apply the same techniques to more complicated situations with multiple truncation points from below and multiple censorship points from above.156 If one used the Kaplan-Meier Product Limit Estimator in order to estimate the survival function, then the bootstrap estimate of the variance of that estimate of the Survival Function is asymptotically equal to Greenwoodʼs Approximation.157
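Here is a rough Python sketch (my own, not the author's code) of the actuarial-present-value bootstrap described above, assuming as in the text a 5-year term, a face amount of 100,000, 5% interest, and payment at the end of the year of death.

```python
# Sketch of bootstrapping via simulation for the actuarial present value example above.
import random, statistics

# Year of death for the 100 observed lives (0 means the life survived the five-year term).
lives = [1] * 3 + [2] * 4 + [3] * 4 + [4] * 3 + [5] * 6 + [0] * 80

def apv(sample, face=100_000, v=1 / 1.05):
    """Present value of benefits per life for a group of lives."""
    return sum(face * v ** t for t in sample if t > 0) / len(sample)

true_value = apv(lives)                                       # about 17,110

sq_errors = []
for _ in range(1_000):
    resample = random.choices(lives, k=len(lives))
    sq_errors.append((apv(resample) - true_value) ** 2)
rmse = statistics.fmean(sq_errors) ** 0.5
print(round(true_value), round(rmse))                         # RMSE roughly 3,400, in line with the text
```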
An Example of Estimating the Variance of an Estimator:158

Assume we have a sample of size n from a uniform distribution on (0, b). Then it turns out that the expected value of the maximum of the sample is: nb/(n+1).159 Therefore, an unbiased estimator of b is: {(n+1)/n}(maximum of the sample).160
Since this estimator of b is unbiased, its variance is equal to its Mean Squared Error.161 It turns out that the variance of this estimator of b is: b²/{n(n+2)}.162

One has a sample of 20 values, assumed to be drawn from a uniform distribution on (0, b):
46, 114, 89, 602, 612, 813, 222, 826, 365, 279, 344, 417, 28, 69, 412, 311, 509, 244, 855, 473.

Exercise: Use the estimator discussed above to get an unbiased estimator of b.
[Solution: (21/20)(855) = 897.75.]
153 The average of the 1000 tabulated squared differences.
154 For these 1000 simulation runs, the smallest actuarial present value was 6886, while the largest was 31,628.
155 Using plus or minus 1.96 standard errors.
156 See “Mahlerʼs Guide to Survival Analysis.”
157 See the last sentence on page 658 of Loss Models. As discussed in “Mahlerʼs Guide to Survival Analysis,” Greenwoodʼs Approximation is a way to estimate the variance of the Kaplan-Meier Product Limit Estimator of the Survival Function.
158 See Exercise 21.13 in Loss Models.
159 See Example 12.5 in Loss Models.
160 See the section on Properties of Estimators in “Mahlerʼs Guide to Fitting Loss Distributions.”
161 MSE = Variance + Bias².
162 See Example 12.10 in Loss Models.
Exercise: What is the variance of that estimate?
[Solution: 897.75² / {(20)(22)} = 1832.
Comment: By substituting b̂ = 897.75 for b in the formula for the variance of this estimator, we get an estimate of the variance of this estimator.]
One could instead use bootstrapping in order to estimate the variance of this estimator of b.163
1. Simulate a random subset of size 20 from the original set of 20, with replacement.
2. Calculate: (21/20) (maximum of the set from step 1).
3. Tabulate the result of step 2.
4. Return to step 1.
After doing many simulation runs, the estimated variance of the estimator of b is the sample variance of the tabulated values. For example, the first random subset was: 412, 69, 417, 344, 279, 417, 509, 365, 222, 365, 114, 69, 412, 612, 813, 46, 813, 813, 826, 826. The resulting estimate of b was: (21/20) (826) = 867.3. Performing 1000 such simulations, the estimated variance of this estimator of b was: 2827.164
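Here is a minimal sketch in Python, assuming numpy, of this bootstrap of the estimator {(n+1)/n}(maximum of the sample). The seed is arbitrary, so the resulting variance will differ somewhat from the 2827 obtained above.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

data = np.array([46, 114, 89, 602, 612, 813, 222, 826, 365, 279,
                 344, 417, 28, 69, 412, 311, 509, 244, 855, 473])
n = len(data)

estimates = []
for _ in range(1000):
    resample = rng.choice(data, size=n, replace=True)   # step 1
    estimates.append((n + 1) / n * resample.max())      # step 2: (21/20) times the maximum

# The sample variance of the tabulated estimates approximates the variance of the estimator.
print(np.var(estimates, ddof=1))
```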
Histograms and Confidence Intervals: One can also display the output of the simulation version of Bootstrapping in a histogram. For example, one might be interested in the mean of the distribution from which was drawn the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”, with 130 losses.165 I simulated a random subset of size 130, with replacement, and then calculated the mean. I repeated this process 100,000 times. The results are displayed in the following histogram:
163
As pointed out in Loss Models, in this example one can not use bootstrapping to estimate the MSE, since given a data sample, there is no corresponding value of b for the empirical distribution. 164 The sample variance of the 1000 tabulated values. For these 1000 simulations, the smallest estimate of b was 535, while the largest was 898. Note that due to the nature of this particular estimator, we never get an estimate larger than: (21/20)(855) = 898. 165 The sample mean for this data is 312,675.
[Histogram of the 100,000 simulated sample means; x-axis: X̄, from about 200,000 to 600,000; y-axis: probability.]
This histogram is an estimate of the distribution of the sample mean X̄.166 By sorting the output from smallest to largest one can estimate the percentiles of the distribution of X̄.167 For example, the estimated 5th percentile would be the (0.05)(100,001) ≅ 5000th value, while the estimated 95th percentile would be the (0.95)(100,001) ≅ 95,001st value. The estimated 5th percentile is 229,445, and the estimated 95th percentile is 409,627. Therefore, an approximate 90% confidence interval for the mean is (229,445, 409,627). One can apply similar techniques to other quantities of interest such as the variance, limited expected value, loss elimination ratio, etc.
166
For a sample of size 130 from a distribution, X̄ is a statistic subject to random fluctuation. For some samples, X̄ will be close to the mean of the distribution µ, while for other samples it will not. 167 The smallest of the 100,000 simulated values was 165,740. The largest value was 562,722.
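A minimal sketch, assuming numpy and that the 130 ungrouped losses are available in an array, of how such a percentile confidence interval can be computed; the function name and seed are illustrative, not part of the text.

```python
import numpy as np

def bootstrap_ci(losses, stat=np.mean, n_sims=100_000, level=0.90, seed=3):
    """Percentile bootstrap confidence interval for a statistic of the data."""
    rng = np.random.default_rng(seed)
    n = len(losses)
    sims = np.array([stat(rng.choice(losses, size=n, replace=True))
                     for _ in range(n_sims)])
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    return np.quantile(sims, [lo, hi])

# With the 130 ungrouped losses in an array `losses`, bootstrap_ci(losses)
# would give roughly (229,000, 410,000) for the mean, as in the text.
```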
For example, one could estimate the median of the distribution from which was drawn the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”. Out of the 130 values, the 65th is 119,300, while the 66th is 122,000. Thus the smoothed empirical estimate of the median is: (119,300 + 122,000)/2 = 120,650. I simulated a random subset of size 130 from this ungrouped data, with replacement, and then calculated the smoothed empirical estimate of the median. I repeated this process 100,000 times. The results are displayed in the following histogram:
[Histogram of the 100,000 simulated smoothed empirical medians; x-axis: median, from about 100,000 to 200,000; y-axis: probability.]
This histogram is an estimate of the distribution of the empirical median for a sample of size 130 drawn from the distribution underlying the ungrouped data. There is random fluctuation; sometimes the estimated median will be close to the median of the underlying distribution, and sometimes it will not. However, we notice that there is much less random fluctuation than was the case for the mean. The median is less affected by the presence or absence of a few very large values in the data set than is the mean. The median is more robust than the mean; the empirical median is less affected by several unusual values in the data.168 168
The trimmed mean, in which one excludes for example 5% in each tail, is a more robust estimator of the underlying mean than is the sample mean. Unfortunately, the trimmed mean should not be used in practical applications which involve a distribution with skewness far from zero, as is the case for size of loss distributions and many other actuarial applications.
As discussed in “Mahlerʼs Guide to Fitting Loss Distributions,” the sample variance, with N - 1 in the denominator, is an unbiased estimate of the variance of the distribution from which the data was drawn. For the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”, there are 130 values, with first moment = 312,674.6, and second moment = 4.9284598 × 10¹¹. Therefore, the sample variance is: (130/129)(4.9284598 × 10¹¹ - 312,674.6²) = 398,143 million. I simulated a random subset of size 130, with replacement, and then calculated the sample variance. I repeated this process 100,000 times. The results are displayed in the following histogram:
[Histogram of the 100,000 simulated sample variances; x-axis: 0 to about 1400 (in units of 1000 million); y-axis: probability.]
The mean of the simulations is 395,805. This ungrouped data could have come from a discrete distribution equal to the empirical model. The variance of this discrete distribution is: 4.9284598 × 10¹¹ - 312,674.6² = 395,081 million. Since the sample variance is an unbiased estimator of the variance of the distribution from which the data was drawn, its expected value for a simulation equals the variance of this discrete distribution, 395,081 million. The mean of the simulations differs slightly due to the finite number of simulation runs. This histogram is an estimate of the distribution of the sample variance for a random sample of 130 from the distribution that generated the original ungrouped data. We can see that for this small data set from a distribution with a heavy right-hand tail, the estimate of the variance is far from precise.
As discussed in a question in “Mahlerʼs Guide to Fitting Loss Distributions,” for a sample of size n > 2, it can be shown that
n Σi=1 to n (Xi - X̄)³ / {(n - 1)(n - 2)}
is an unbiased estimate of the third central moment of the distribution from which the data was drawn. For the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”, there are 130 values, with first moment = 312,674.6, second moment = 4.9284598 × 10¹¹, and third moment = 1.60022 × 10¹⁸.
(1/n) Σi=1 to n (Xi - X̄)³ = E[X³] - 3 E[X²] E[X] + 2 E[X]³. Therefore, this estimator is:
{130²/((129)(128))} {1.60022 × 10¹⁸ - (3)(4.9284598 × 10¹¹)(312,674.6) + (2)(312,674.6³)} = 1.22724 × 10¹⁸.
I simulated a random subset of size 130, with replacement, and then calculated this estimator. I repeated this process 100,000 times. The results are displayed in the following histogram:
[Histogram of the 100,000 simulated estimates of the third central moment; x-axis: about 100 to 500 in units of 10,000 million million (10¹⁶); y-axis: probability.]
The mean of the simulations is 1.19935 × 10¹⁸. Again the estimates vary considerably between the simulation runs.
As another example, one could estimate the coefficient of variation of the ungrouped data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions” as:169 (Standard Deviation) / Mean = 628,554 / 312,675 = 2.010. I simulated a random subset of size 130, with replacement, and then calculated the CV. I repeated this process 100,000 times. The results are displayed in the following histogram:
[Histogram of the 100,000 simulated coefficients of variation; x-axis: CV, from about 1.5 to 2.5; y-axis: probability.]
This histogram is an estimate of the distribution of the observed coefficient of variation for a random sample of 130 from the distribution that generated the original ungrouped data. By sorting the output from smallest to largest one can estimate the percentiles of this distribution. The estimated 2.5th percentile is 1.326, and the estimated 97.5th percentile is 2.353. Therefore, an approximate 95% confidence interval for the CV is: (1.326, 2.353).170
169
I have used the biased estimate of the variance, rather than the sample variance.
170 One would prefer to have a significantly larger sample than 130, in order to estimate the CV of a heavier-tailed distribution.
I simulated a random subset of size 130, with replacement, and then calculated the skewness. I repeated this process 100,000 times. The results are displayed in the following histogram:
[Histogram of the 100,000 simulated skewnesses; x-axis: skewness, from about 2 to 9; y-axis: probability.]
This histogram is an estimate of the distribution of the observed skewness for a random sample of 130 from the distribution that generated the original ungrouped data. By sorting the output from smallest to largest one can estimate the percentiles of this distribution. The estimated 2.5th percentile is 2.582, and the estimated 97.5th percentile is 6.410. Therefore, an approximate 95% confidence interval for the skewness is: (2.582, 6.410).171
171
It is very difficult to estimate the skewness of a heavier-tailed distribution. The observed skewness for this ungrouped data set is 4.829.
Problems: 17.1 (3 points) You are given the following set of 10 losses: 14, 16, 18, 19, 20, 22, 26, 37, 55, 81. You will use simulation and Bootstrapping to estimate the MSE of the limited expected value at 25. In the first simulation you use the ten random numbers from (0,1) of: 0.847, 0.136, 0.936, 0.309, 0.876, 0.478, 0.738, 0.964, 0.696, 0.337. What is the squared error for this simulation run? A. less than 2 B. at least 2 but less than 3 C. at least 3 but less than 4 D. at least 4 but less than 5 E. at least 5 17.2 (2 points) For 5 days, you observe the line waiting for tellers in a bank. The number of customers and the total time (in minutes) they spent in line was as follows: Day # of customers Time in line
1 872 2589
2 951 2934
3 752 1987
4 814 2317
5 738 1855
You will use simulation and Bootstrapping to estimate the MSE of the average time spent waiting in line. In the first simulation you use the five random numbers from (0,1) of: 0.241, 0.922, 0.718, 0.207, 0.152. What is the squared error for this simulation run? A. less than 0.01 B. at least 0.01 but less than 0.02 C. at least 0.02 but less than 0.03 D. at least 0.03 but less than 0.04 E. at least 0.04
17.3 (4 points) You are given the following: • A set of five values: 9, 17, 6, 15, 8.
• You will estimate the variance as: Σi=1 to 5 (Xi − X̄)² / 4, where X̄ is the observed mean.
•
The following 15 random numbers from (0, 1): 0.984, 0.774, 0.353, 0.445, 0.029, 0.851, 0.364, 0.103, 0.817, 0.973, 0.314, 0.976, 0.195, 0.113, 0.758. Use simulation to determine a bootstrap estimate of the MSE of the estimated sample variance. A. less than 10 B. at least 10, but less than 15 C. at least 15, but less than 20 D. at least 20, but less than 25 E. at least 25 17.4 (3 points) One has four values: 0, 2, 5, 9. You simulate the following 6 random subsets of the original set: 9, 0, 5, 2 0, 9, 0, 9 0, 2, 2, 0 5, 2, 5, 5 9, 9, 5, 9 9, 2, 5, 2 Based on these 6 simulations, what is the estimate of the Mean Squared Error of the usual estimator of the mean using 4 values, X = (X1 + X2 + X3 + X4 )/4? A. less than 2 B. at least 2, but less than 3 C. at least 3, but less than 4 D. at least 4, but less than 5 E. at least 5
Use the following set of 10 values for the next 5 questions: 2, 5, 12, 13, 17, 20, 21, 32, 40, 63.
17.5 (1 point) What is the Limited Expected Value at 25? A. less than 14 B. at least 14, but less than 15 C. at least 15, but less than 16 D. at least 16, but less than 17 E. at least 17 17.6 (2 points) Use the following ten random numbers from (0, 1) to generate a set of ten random elements (with replacement) from the original set: 0.072, 0.711, 0.492, 0.647, 0.803, 0.869, 0.856 , 0.627, 0.049, 0.128. What is the Limited Expected Value at 25 of this new random set? A. less than 14 B. at least 14, but less than 15 C. at least 15, but less than 16 D. at least 16, but less than 17 E. at least 17 17.7 (2 points) Use the following ten random numbers from (0, 1) to generate a set of ten random elements (with replacement) from the original set: 0.578, 0.536, 0.103, 0.521, 0.296, 0.177, 0.646, 0.029, 0.488, 0.320. What is the Limited Expected Value at 25 of this new random set? A. 13.5 B. 14.0 C. 14.5 D. 15.0 E. 15.5 17.8 (2 points) Use the following ten random numbers from (0, 1) to generate a set of ten random elements (with replacement) from the original set: 0.835, 0.757, 0.402, 0.410, 0.293, 0.268, 0.673, 0.428, 0.475, 0.216. What is the Limited Expected Value at 25 of this new random set? A. 15.5 B. 16.0 C. 16.5 D. 17.0 E. 17.5 17.9 (1 point) Based on the solutions to the previous 4 questions, estimate the Mean Squared Error of the estimated Limited Expected Value at 25. A. less than 2 B. at least 2, but less than 3 C. at least 3, but less than 4 D. at least 4, but less than 5 E. at least 5
17.10 (2 points) A sample of three loss amounts is: 5, 10, and 50. You estimate the loss elimination ratio for a deductible of 10. You are given the following simulations from the sample: Simulation Loss Amounts 1 50 5 10 2 10 10 10 3 5 10 50 4 5 5 50 5 50 10 10 Determine the bootstrap approximation to the mean square error of the estimate. (A) 0.04 (B) 0.06 (C) 0.08 (D) 0.10 (E) 0.12 17.11 (3 points) One has six independent random draws from an unknown distribution: 1, 2, 4, 5, 7, 10. You simulate the following 10 random subsets of the original set: 4, 1, 7, 7, 5, 1 1, 4, 10, 1, 4, 10 7, 10, 10, 10, 5, 7 5, 7, 2, 10, 10, 5 10, 7, 7, 7, 1, 2 2, 5, 1, 5, 4, 7 1, 10, 5, 2, 10, 1 5, 1, 7, 4, 4, 2 10, 2, 7, 1, 2, 7 5, 10, 2, 5, 2, 2 Based on these 10 simulations, using the bootstrapping approach, what is the estimate of the probability that the mean of this unknown distribution is in [4, 5]? A. 50% B. 60% C. 70% D. 80% E. 90%
17.12 (3 points) A sample of four loss amounts is: 10, 30, 50, and 80. You estimate, e(20), the mean excess loss at 20. You are given the following simulations from the sample: Simulation Loss Amounts 1 30 50 50 30 2 80 10 80 10 3 30 50 30 30 4 30 10 30 50 5 50 50 50 80 Determine the bootstrap approximation to the mean square error of the estimate. A. less than 150 B. at least 150, but less than 200 C. at least 200, but less than 250 D. at least 250, but less than 300 E. at least 300 17.13 (4 points) You are given the following information on losses for a line of insurance: Accident Year Loss Reserves as of 12 months Losses at Ultimate 2000 8 11 2001 9 13 2002 10 12 Let Xi be the loss reserve as of 12 months for year i. xi = Xi - X . Let Yi be the ultimate losses for year i. yi = Yi - Y . You calculate the sample correlation between X and Y as:
Σ xi yi / √(Σ xi² Σ yi²).
For the observed data, the sample correlation is 1/2. You are given the following simulations of pairs taken from the sample: Simulation (Xi, Yi) 1 (8, 11), (8,11), (9, 13) 2 (10, 12), (9, 13), (9, 13) 3 (8, 11), (10, 12), (8, 11) 4 (9, 13), (8, 11), (9, 13) Determine the bootstrap approximation to the mean square error of the estimate. A. less than 0.4 B. at least 0.4, but less than 0.5 C. at least 0.5, but less than 0.6 D. at least 0.6, but less than 0.7 E. at least 0.7
17.14 (2 points) The annual losses from five insureds are: 0, 1, 2, 5, 30. You estimate the mean as the average of these five values. You are given the following simulations from the sample: Simulation Claim Amounts 1 30, 2, 2, 0, 2 2 2, 0, 30, 5, 30 3 30, 30, 5, 0, 5 4 0, 30, 0, 2, 1 Determine the bootstrap approximation to the mean square error of the estimate. A. less than 20 B. at least 20, but less than 25 C. at least 25, but less than 30 D. at least 30, but less than 35 E. at least 35 17.15 (5 points) Four individuals are all the same age at the beginning of a ten year study: Individual Time of Termination Cause of Termination A 4 Death B 6 Withdrew from Study C 9 Death D 10 Study Ended You estimate the survival function using the Kaplan-Meier Product Limit Estimator. You then use this estimated Survival Function to estimate at 5% the actuarial present value of a 10 year term insurance with a death benefit of 1. You are given the following simulations from the sample: Simulation Insured 1 C, C, A, B 2 C, B, D, C 3 A, A, C, B Determine the bootstrap approximation to the mean square error of this estimate of the actuarial present value of this term insurance. A. less than 0.02 B. at least 0.02, but less than 0.03 C. at least 0.03, but less than 0.04 D. at least 0.04, but less than 0.05 E. at least 0.05
17.16 (4 points) Three observed values of the random variable X are: 2, 7, 24. You estimate the third central moment of X using the estimator:
g(X1, X2, X3) = Σi=1 to 3 (Xi − X̄)³ / 3.
You are given the following simulations from the sample: Simulation Simulated Sample 1 24, 7, 24 2 7, 24, 2 3 2, 2, 2 4 7, 2, 7 Determine the bootstrap approximation to the mean square error of g. A. less than 200,000 B. at least 200,000, but less than 250,000 C. at least 250,000, but less than 300,000 D. at least 300,000, but less than 350,000 E. at least 350,000 17.17 (4 points) The observed continuously compound monthly returns for ABC stock have been: -10%, -5%, -2%, -1%, 0, 1%, 2%, 3%, 5%, 10%, 20%. You simulated eight sets of size three from the observed sample: Simulation Simulated Sample 1 {0%, 1%, -5%} 2 {3%, -5%, -5%} 3 {5%, 3%, 1%} 4 {20%, 3%, -2%} 5 {10%, -10%, -5%} 6 {10%, 2%, 1%} 7 {-5%, -10%, 20%} 8 {3%, 5%, -2%} The current price of ABC stock is 100. Estimate the expected payoff of a 3 month 105 strike Arithmetic Average Asian Price Call Option on ABC stock. (The stock prices 1, 2, and 3 months from now will be averaged. The payoff is: (Average - 105)+.) A. less than 4 B. at least 4, but less than 5 C. at least 5, but less than 6 D. at least 6, but less than 7 E. at least 7
17.18 (6 points) One has seven values: 45, 71, 77, 78, 82, 95, 121. You simulate the following 10 random subsets of the original set, each of size seven: {78, 78, 82, 95, 71, 121, 78}
{95, 82, 71, 82, 82, 82, 71}
{71, 77, 121, 45, 95, 78, 95}
{77, 121, 77, 45, 95, 82, 95}
{77, 82, 77, 78, 121, 82, 78}
{121, 121, 121, 121, 71, 45, 95}
{121, 82, 78, 121, 82, 71, 82}
{45, 45, 45, 45, 45, 71, 95}
{121, 78, 82, 95, 95, 78, 78}
{45, 78, 71, 77, 95, 78, 45}
Define the quartiles as the 25th, 50th, and 75th percentiles. Define the interquartile range as the difference between the third and first quartiles. You will estimate the standard deviation using: (interquartile range) / 1.349. Based on these 10 simulations, what is the estimate of the Mean Squared Error of this estimator? A. 80 B. 90 C. 100 D. 110 E. 120 17.19 (4, 11/04, Q.16 & 2009 Sample Q.144) (2.5 points) A sample of claim amounts is {300, 600, 1500}. By applying the deductible to this sample, the loss elimination ratio for a deductible of 100 per claim is estimated to be 0.125. You are given the following simulations from the sample: Simulation Claim Amounts 1 600 600 1500 2 1500 300 1500 3 1500 300 600 4 600 600 300 5 600 300 1500 6 600 600 1500 7 1500 1500 1500 8 1500 300 1500 9 300 600 300 10 600 600 600 Determine the bootstrap approximation to the mean square error of the estimate. (A) 0.003 (B) 0.010 (C) 0.021 (D) 0.054 (E) 0.081 17.20 (3 points) In the previous question, determine the bootstrap approximation to the mean square error of the estimate of E[X ∧ 1000]. (A) 15,000 (B) 20,000 (C) 25,000 (D) 30,000 (E) 35,000
Solutions to Problems:
17.1. B. The empirical limited expected value at 25 is: (14 + 16 + 18 + 19 + 20 + 22 + 25 + 25 + 25 + 25)/10 = 20.9. We need 10 random integers from 1 to 10. Let u be a random number from (0,1), then in each case we take: 1 + largest integer in (10u). Thus the 10 random integers are: 9, 2, 10, 4, 9, 5, 8, 10, 7, 4. The corresponding losses are: 55, 16, 81, 19, 55, 20, 37, 81, 26, 19. For this set of losses, the limited expected value at 25 is: (25 + 16 + 25 + 19 + 25 + 20 + 25 + 25 + 25 + 19)/10 = 22.4. Thus the squared error is: (22.4 - 20.9)² = 2.25.
Comment: One would run many more simulations and then average the squared errors from the different runs, in order to estimate the MSE.
17.2. A. The estimated average waiting time in line: (2589 + 2934 + 1987 + 2317 + 1855)/(872 + 951 + 752 + 814 + 738) = 11682/4127 = 2.831. We need 5 random integers from 1 to 5. Let u be a random number from (0,1), then in each case we take: 1 + largest integer in (5u). Thus the 5 random integers are: 2, 5, 4, 2, 1. Using the data from these days, the estimated average waiting time in line is: (2934 + 1855 + 2317 + 2934 + 2589)/(951 + 738 + 814 + 951 + 872) = 12629/4326 = 2.919. Thus the squared error is: (2.919 - 2.831)² = 0.0077.
Comment: Note that one picks random days, rather than separately picking # of customers and time spent in line, since we expect more time in total spent in line when there are more customers. In general, one should avoid separately simulating correlated variables.
17.3. A. Assuming a distribution uniform on the original data set, the variance is: (1/5){(9 - 11)² + (17 - 11)² + (6 - 11)² + (15 - 11)² + (8 - 11)²} = 18. The first set of five random numbers: 0.984, 0.774, 0.353, 0.445, 0.029, corresponds to the following integers from 1 to 5: 5, 4, 2, 3, 1. These correspond to the five values from the given set: 8, 15, 17, 6, 9; the ordered sample is: 6, 8, 9, 15, 17. The sample variance is 22.5. The next set of five random numbers produces values: 8, 17, 9, 8, 8, with a sample variance of 15.5. The third set of five random numbers produces values: 17, 8, 9, 9, 15, with a sample variance of 16.8. Thus the MSE = {(22.5 - 18)² + (15.5 - 18)² + (16.8 - 18)²}/3 = 9.3.
17.4. D. The “true value” is: (0 + 2 + 5 + 9)/4 = 4.
Set            Mean   True Value   Squared Difference
9, 0, 5, 2     4.0    4            0
0, 9, 0, 9     4.5    4            0.25
0, 2, 2, 0     1.0    4            9
5, 2, 5, 5     4.25   4            0.0625
9, 9, 5, 9     8.0    4            16
9, 2, 5, 2     4.5    4            0.25
MSE = (0 + 0.25 + 9 + 0.0625 + 16 + 0.25)/6 = 4.26.
Comment: In actual applications one would perform at least 250 simulations. Performing 100,000 simulations, the estimated MSE was 2.872. For the usual estimator of the mean, there is no need to do bootstrapping via simulation. For this data set, E[X²] = (0 + 4 + 25 + 81)/4 = 27.5. Thus the MSE of the usual estimator of the mean is: (E[X²] - E[X]²)/N = (27.5 - 4²)/4 = 2.875.
17.5. D. (2 + 5 + 12 + 13 + 17 + 20 + 21 + 25 + 25 + 25)/10 = 16.5.
17.6. D. The random numbers from (0, 1) are converted to random integers from 1 to 10, by taking the largest integer in 1 + 10u: {1, 8, 5, 7, 9, 9, 9, 7, 1, 2}. This corresponds to the elements: {2, 32, 17, 21, 40, 40, 40, 21, 2, 5}. The LEV at 25 is: (2 + 25 + 17 + 21 + 25 + 25 + 25 + 21 + 2 + 5)/10 = 16.8.
17.7. A. The random numbers from (0, 1) are converted to random integers from 1 to 10, by taking the largest integer in 1 + 10u: {6, 6, 2, 6, 3, 2, 7, 1, 5, 4}. This corresponds to the elements: {20, 20, 5, 20, 12, 5, 21, 2, 17, 13}. The LEV at 25 is: (20 + 20 + 5 + 20 + 12 + 5 + 21 + 2 + 17 + 13)/10 = 13.5.
17.8. E. The random numbers from (0, 1) are converted to random integers from 1 to 10, by taking the largest integer in 1 + 10u: {9, 8, 5, 5, 3, 3, 7, 5, 5, 3}. This corresponds to the elements: {40, 32, 17, 17, 12, 12, 21, 17, 17, 12}. The LEV at 25 is: (25 + 25 + 17 + 17 + 12 + 12 + 21 + 17 + 17 + 12)/10 = 17.5.
17.9. C. Take 16.5 as the true value. The three estimates are: 16.8, 13.5 and 17.5. The mean squared error is: {(16.8 - 16.5)² + (13.5 - 16.5)² + (17.5 - 16.5)²}/3 = 3.36.
Comment: In actual application one would run many more simulations than 3. When I ran 10,000 simulations, I got an estimated mean squared error of 6.19. I then applied the same idea to the LEV at $1 million of the ungrouped data set in Section 1. The estimate is $236,158. Running 10,000 simulations, I got an estimated mean squared error of 623.4 million; or a root mean squared error of about 25 thousand. Thus the estimated E[X ∧ 1,000,000], using ± 2 root mean squared errors, is: $236 ± 50 thousand. Similarly, the estimated E[X ∧ 100,000] is $74,995, and the estimated MSE is 9.099 million; the estimated E[X ∧ 100,000] is $75 ± 6 thousand. Note that in terms of percentages, the interval estimate of E[X ∧ 100,000] is narrower than that of E[X ∧ 1,000,000].
17.10. C. For the sample, the empirical Loss Elimination Ratio at 10 is: (5 + 10 + 10)/(5 + 10 + 50) = 25/65 = 0.3846.
Simulated Losses   Losses Eliminated   Total of Losses   LER      (LER - 0.3846)²
50, 5, 10          25                  65                0.3846   0.0000
10, 10, 10         30                  30                1.0000   0.3787
5, 10, 50          25                  65                0.3846   0.0000
5, 5, 50           20                  60                0.3333   0.0026
50, 10, 10         30                  70                0.4286   0.0019
Sum                                                               0.3833
The bootstrap approximation to the mean square error of the estimate: .3833/5 = 0.0767. Comment: Similar to 4, 11/04, Q.16. In a practical application, one would have more than 3 values in the sample and one would simulate hundreds of random subsets. 17.11. B. Six of the ten means are in the interval [4, 5], for an estimated probability of 60%. 4, 1, 7, 7, 5, 1 has mean 4.17 1, 4, 10, 1, 4, 10 has mean 5 7, 10, 10, 10, 5, 7 has mean 8.17 5, 7, 2, 10, 10, 5 has mean 6.5 10, 7, 7, 7, 1, 2 has mean 5.67 2, 5, 1, 5, 4, 7 has mean 4 1, 10, 5, 2, 10, 1 has mean 4.83 5, 1, 7, 4, 4, 2 has mean 3.83 10, 2, 7, 1, 2, 7 has mean 4.83 5, 10, 2, 5, 2, 2 has mean 4.33 Comment: It would be faster to just check whether the sum of each set is in [24, 30].
17.12. E. For the sample, the empirical mean excess loss at 20 is: (10 + 30 + 60)/3 = 33.33.
Simulated Losses   Number of Large Losses   Excess of 20   Mean Excess Loss at 20   (e(20) - 33.33)²
30, 50, 50, 30     4                        80             20.00                    177.7
80, 10, 80, 10     2                        120            60.00                    711.3
30, 50, 30, 30     4                        60             15.00                    336.0
30, 10, 30, 50     3                        50             16.67                    277.7
50, 50, 50, 80     4                        150            37.50                    17.4
Sum                                                                                 1520.0
The bootstrap approximation to the mean square error of the estimate: 1520/5 = 304. Comment: In a practical application, one would have more values in the sample and one would simulate hundreds of random subsets. 17.13. E. For the first simulation, x = (-1/3, -1/3, 2/3) and y = (-2/3, -2/3, 4/3).
Σ xi yi / √(Σ xi² Σ yi²) = (4/3)/√{(2/3)(8/3)} = 1.
For the second simulation, x = (2/3, -1/3, -1/3) and y = (-2/3, 1/3, 1/3).
Σ xi yi / √(Σ xi² Σ yi²) = (-2/3)/√{(2/3)(2/3)} = -1.
For the third simulation, x = (-2/3, 4/3, -2/3) and y = (-1/3, 2/3, -1/3).
Σ xi yi / √(Σ xi² Σ yi²) = (4/3)/√{(8/3)(2/3)} = 1.
For the fourth simulation, x = (1/3, -2/3, 1/3) and y = (2/3, -4/3, 2/3).
Σ xi yi / √(Σ xi² Σ yi²) = (4/3)/√{(2/3)(8/3)} = 1.
MSE = {(1 - 1/2)² + (-1 - 1/2)² + (1 - 1/2)² + (1 - 1/2)²}/4 = 0.75.
Comment: In a practical application, one would have more values in the sample and one would simulate hundreds of random subsets.
30 2 30 0
2 0 30 30
2 30 5 0
0 5 0 2
2 30 5 1
Average of Losses
(mean - 7.6)^2
7.2 13.4 14 6.6
0.16 33.64 40.96 1.00 75.76
The bootstrap approximation to the mean square error of the estimate is: 75.76/4 = 18.94. Comment: As discussed in the previous section, one would not usually use bootstrapping to estimate the MSE of the usual estimator of the mean.
17.15. D. For the original data set:
t   s   r   (r-s)/r   S(t)
4   1   4   3/4       3/4
9   1   2   1/2       3/8
APV = (1 - 3/4)/1.05⁴ + (3/4 - 3/8)/1.05⁹ = 0.4474.
For the simulated sample C, C, A, B:
t   s   r   (r-s)/r   S(t)
4   1   4   3/4       3/4
9   2   2   0         0
APV = (1 - 3/4)/1.05⁴ + (3/4 - 0)/1.05⁹ = 0.6891.
For the simulated sample C, B, D, C:
t   s   r   (r-s)/r   S(t)
9   2   3   1/3       1/3
APV = (1 - 1/3)/1.05⁹ = 0.4297.
For the simulated sample A, A, C, B:
t   s   r   (r-s)/r   S(t)
4   2   4   1/2       1/2
9   1   1   0         0
APV = (1 - 1/2)/1.05⁴ + (1/2 - 0)/1.05⁹ = 0.7337.
The bootstrap approximation to the mean square error of the estimate is:
{(0.6891 - 0.4474)² + (0.4297 - 0.4474)² + (0.7337 - 0.4474)²}/3 = 0.0469.
Comment: See Example 21.16 in Loss Models.
17.16. C. For a uniform and discrete distribution on 2, 7, and 24, the mean is 11 and the third central moment is: {(2 - 11)³ + (7 - 11)³ + (24 - 11)³}/3 = 468.
Simulated Sample   Mean     g          (g - 468)²
24, 7, 24          18.333   -363.926   692,101
7, 24, 2           11.000   468.000    0
2, 2, 2            2.000    0.000      219,024
7, 2, 7            5.333    -9.259     227,776
Sum                                    1,138,901
The bootstrap approximation to the mean square error of the estimate is: 1,138,901/4 = 284,725.
17.17. A. For each sample we determine the prices and then average them. The payoff is: (Average - 105)+. For example, for the third simulation, the prices are: 100e^0.05 = 105.13, 100e^0.08 = 108.33, and 100e^0.09 = 109.42. (105.13 + 108.33 + 109.42)/3 ≅ 107.62. (107.62 - 105)+ = 2.62.
Simulation   Returns              Prices                     Average   Payoff
1            0, 0.01, -0.05       100.00, 101.01, 96.08      99.03     0.00
2            0.03, -0.05, 0.05    103.05, 98.02, 103.05      101.37    0.00
3            0.05, 0.03, 0.01     105.13, 108.33, 109.42     107.62    2.62
4            0.2, 0.03, -0.02     122.14, 125.86, 123.37     123.79    18.79
5            0.1, -0.1, -0.05     110.52, 100.00, 95.12      101.88    0.00
6            0.1, 0.02, 0.01      110.52, 112.75, 113.88     112.38    7.38
7            -0.05, -0.1, 0.2     95.12, 86.07, 105.13       95.44     0.00
8            0.03, 0.05, -0.02    103.05, 108.33, 106.18     105.85    0.85
Average payoff: 3.71.
Comment: We could instead take random actual sets of consecutive three months returns, in order to preserve any correlation between monthly returns. See p. 831 of Derivative Markets by McDonald, not on the syllabus.
17.18. D. For the empirical distribution, mean = (45 + 71 + ... + 121)/7 = 81.2857, second moment = (45² + 71² + ... + 121²)/7 = 7067, variance = 7067 - 81.2857² = 459.633, and standard deviation = 21.439. Treat 21.439 as the “true value” of σ.
(0.25)(7+1) = 2. (0.75)(7+1) = 6. Thus for each of the subsets, we will estimate the 25th percentile as the 2nd value from smallest to largest, while we will estimate the 75th percentile as the 6th value from smallest to largest. Then, as stated in the question, for each simulated subset we estimate σ using: (interquartile range)/1.349 = {(6th value) - (2nd value)}/1.349.
Subset                              25th percentile   75th percentile   Estimate of σ
{78, 78, 82, 95, 71, 121, 78}       78                95                12.602
{95, 82, 71, 82, 82, 82, 71}        71                82                8.154
{71, 77, 121, 45, 95, 78, 95}       71                95                17.791
{77, 121, 77, 45, 95, 82, 95}       77                95                13.343
{77, 82, 77, 78, 121, 82, 78}       77                82                3.706
{121, 121, 121, 121, 71, 45, 95}    71                121               37.064
{121, 82, 78, 121, 82, 71, 82}      78                121               31.875
{45, 45, 45, 45, 45, 71, 95}        45                71                19.274
{121, 78, 82, 95, 95, 78, 78}       78                95                12.602
{45, 78, 71, 77, 95, 78, 45}        45                78                24.462
The bootstrap estimate of the MSE of the estimator is:
{(12.602 - 21.439)² + (8.154 - 21.439)² + ... + (24.462 - 21.439)²}/10 = 109.3.
Comment: This estimator of the standard deviation is sometimes used for small samples. It is based on the fact that for a Normal Distribution, the interquartile range is 1.349σ.
17.19. A. Since all of the losses are greater than 100, the losses eliminated are 300 and the LER is: 300/(total losses).
Claim Amounts      Total of Losses   LER        (LER - 0.125)²
600, 600, 1500     2700              0.111111   0.000193
1500, 300, 1500    3300              0.090909   0.001162
1500, 300, 600     2400              0.125000   0.000000
600, 600, 300      1500              0.200000   0.005625
600, 300, 1500     2400              0.125000   0.000000
600, 600, 1500     2700              0.111111   0.000193
1500, 1500, 1500   4500              0.066667   0.003403
1500, 300, 1500    3300              0.090909   0.001162
300, 600, 300      1200              0.250000   0.015625
600, 600, 600      1800              0.166667   0.001736
Sum                                             0.029099
The bootstrap approximation to the mean square error of the estimate: 0.0291/10 = 0.0029.
Comment: There is no reason why a given subset could not happen to be simulated twice, as happened in this case for {600, 600, 1500}. In a practical application one would have more than 3 values in the sample and one would simulate hundreds of random subsets.
17.20. C. For the sample, E[X ∧ 1000] = (300 + 600 + 1000)/3 = 633.3.
Claim Amounts      Total of Losses Limited to 1000   LEV at 1000   (LEV - 633.3)²
600, 600, 1500     2200                              733.3         10,007
1500, 300, 1500    2300                              766.7         17,787
1500, 300, 600     1900                              633.3         0
600, 600, 300      1500                              500.0         17,769
600, 300, 1500     1900                              633.3         0
600, 600, 1500     2200                              733.3         10,007
1500, 1500, 1500   3000                              1000.0        134,469
1500, 300, 1500    2300                              766.7         17,787
300, 600, 300      1200                              400.0         54,429
600, 600, 600      1800                              600.0         1,109
Sum                                                                263,362
The bootstrap approximation to the mean square error of the estimate: 263362/10 = 26,336.
Section 18, Estimating the p-value via Simulation
p-value = Prob[test statistic takes on a value equal to its calculated value or a value less in agreement with H0 (in the direction of H1) | H0].
Rather than consulting the appropriate statistical table,172 one can estimate the probability-values or p-values of statistical tests via simulation. I will discuss how to do so for the Chi-Square test, the Kolmogorov-Smirnov test, and other statistical tests.173 One has to deal somewhat differently with those situations where one has an assumed distribution and where one has fit a distribution to the given data.174 The former situation will be discussed first.
Chi-square Test, Assumed Distribution:
In each simulation run, one would take the observed number of random draws from the assumed distribution. One would then group the simulated data and compute the Chi-Square Statistic. One would record the results of each of the many simulation runs. Then for example, the p-value for 8.23 would be estimated by the percentage of the simulation runs in which the Chi-Square statistic was greater than or equal to 8.23.
Here is a specific example involving comparing an assumed Burr with the Grouped Data in Section 3 of “Mahlerʼs Guide to Fitting Loss Distributions.”
Interval ($000)   Number of Accidents
0-5               2208
5-10              2247
10-15             1701
15-20             1220
20-25             799
25-50             1481
50-75             254
75-100            57
100-∞             33
172
See “Mahlerʼs Guide to Fitting Loss Distributions.” In the case of the Chi-Square test, one could either consult a statistical table, or compute an incomplete Gamma Function and use the fact that a Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with parameters α = ν/2 and θ = 2. 173 While you are unlikely in your work to apply this technique to tests for which there are statistical tables, these general techniques can be applied to other cases for which there are not tables, for example, the CDF Distance. The Chi-Square and K-S tables are based on a large sample approximation, and therefore the simulation techniques may be more accurate in the case of small sample sizes. Also, when fitting by methods other than Maximum Likelihood applied to ungrouped data or Minimum Chi-Square, the number of degrees of freedom lost is only approximately the number of fitted parameters, and therefore the p-value from simulation may be a little more accurate. 174 In the case of applying the Chi-Square test to fitted distributions, we lose a number of degrees of freedom equal to the number of fitted parameters. All else being equal, this lowers the p-value corresponding to a given value of the Chi-Square Statistic. Thus it makes sense that the simulation experiment would differ somewhat when one has a fitted distribution, rather than an assumed distribution.
Exercise: For a Burr Distribution, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, with parameters α = 4.3, θ = 42,000, and γ = 1.3, and the above grouped data, calculate the Chi-Square Statistic.
[Solution: The Chi-square Statistic is 19.484.
Bottom of   Top of      # claims in     F(lower)   F(upper)   F(upper) minus   Fitted     (Observed - Fitted)²
Interval    Interval    the Interval                          F(lower)         # claims   / Fitted
($000)      ($000)
0           5           2208            0.00000    0.23063    0.23063          2306.3     4.187
5           10          2247            0.23063    0.46146    0.23083          2308.3     1.630
10          15          1701            0.46146    0.63264    0.17118          1711.8     0.068
15          20          1220            0.63264    0.75057    0.11794          1179.4     1.401
20          25          799             0.75057    0.82975    0.07918          791.8      0.066
25          50          1481            0.82975    0.96966    0.13991          1399.1     4.791
50          75          254             0.96966    0.99255    0.02289          228.9      2.761
75          100         57              0.99255    0.99765    0.00510          51.0       0.694
100         ∞           33              0.99765    1.00000    0.00235          23.5       3.887
Total                   10000                                 1.00000          10000      19.484]
The grouped data has 9 intervals; so one could consult the Chi-Square Table for 8 degrees of freedom:
Degrees of Freedom   Significance Levels: 0.050   0.025   0.010   0.005
8                                         15.51   17.53   20.09   21.96
Since 17.53 < 19.48 < 20.09, the p-value is between 2.5% and 1%.
In order to estimate the p-value by simulation, one would perform many runs. On each run, one would first simulate 10,000 random losses from the assumed Burr with parameters α = 4.3, θ = 42,000, and γ = 1.3.175 Doing so and grouping them into the same intervals as above, my first simulation run yielded:
Interval ($000)   Number of Accidents
0-5               2329
5-10              2273
10-15             1644
15-20             1216
20-25             782
25-50             1447
50-75             241
75-100            42
100-∞             26
175
If u is a random number from zero to one, set u = F(x) and solve for x: x = θ{(1-u)^(-1/α) - 1}^(1/γ).
Next, one calculates the Chi-Square Statistic for this simulated data set and the assumed Burr Distribution.
Bottom of   Top of      # claims in     F(lower)   F(upper)   F(upper) minus   Fitted     (Observed - Fitted)²
Interval    Interval    the Interval                          F(lower)         # claims   / Fitted
($000)      ($000)
0           5           2329            0.00000    0.23063    0.23063          2306.3     0.224
5           10          2273            0.23063    0.46146    0.23083          2308.3     0.541
10          15          1644            0.46146    0.63264    0.17118          1711.8     2.684
15          20          1216            0.63264    0.75057    0.11794          1179.4     1.139
20          25          782             0.75057    0.82975    0.07918          791.8      0.121
25          50          1447            0.82975    0.96966    0.13991          1399.1     1.638
50          75          241             0.96966    0.99255    0.02289          228.9      0.643
75          100         42              0.99255    0.99765    0.00510          51.0       1.604
100         ∞           26              0.99765    1.00000    0.00235          23.5       0.277
Total                   10000                                 1.00000          10000      8.870
The resulting Chi-Square Statistic for this first simulation run is 8.870. One would continue to perform simulation runs. On each run, one simulates a set of 10,000 losses from the assumed Burr, and computes the Chi-Square statistic for the simulated data and the assumed Burr. One then tabulates the results of many runs. For example, the next ten simulation runs produced Chi-Square statistics of: 5.006, 5.606, 3.282, 6.188, 7.606, 13.488, 9.336, 9.718, 5.237, and 5.001. The percentage of runs in which the Chi-Square is greater than or equal to 19.48 is the estimated p-value for 19.48.176 When I simulated 1000 runs, 4 of them had Chi-Square statistics greater than or equal to 19.48. Therefore, the estimated p-value for 19.48 is: 4/1000 = 0.4%.
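Here is a rough sketch in Python (assuming numpy, and not part of the syllabus) of this simulation for the assumed Burr. The Burr is inverted as in the footnote, and the interval boundaries and parameters are those of the example; the seed is arbitrary, so the estimated p-value will fluctuate around the 0.4% found above.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

alpha, theta, gamma = 4.3, 42_000, 1.3
bounds = np.array([0, 5, 10, 15, 20, 25, 50, 75, 100, np.inf]) * 1000

def burr_cdf(x):
    return 1 - (1 / (1 + (x / theta) ** gamma)) ** alpha

# Expected counts per interval for 10,000 losses under the assumed Burr.
cdf_at_bounds = np.append(burr_cdf(bounds[:-1]), 1.0)
expected = 10_000 * np.diff(cdf_at_bounds)

def chi_square(observed):
    return np.sum((observed - expected) ** 2 / expected)

stats = []
for _ in range(1000):
    u = rng.random(10_000)
    x = theta * ((1 - u) ** (-1 / alpha) - 1) ** (1 / gamma)  # invert the Burr CDF
    observed, _ = np.histogram(x, bins=bounds)
    stats.append(chi_square(observed))

print(np.mean(np.array(stats) >= 19.484))   # estimated p-value for 19.484
```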
Chi-square Test, Fitted Distribution: One has to handle the case of a fitted distribution a little differently than that of an assumed distribution. Again, one simulates a data set of appropriate size from the distribution, in this case the fitted distribution we are testing. However, in this case one fits the same type of distribution to the simulated data via the same method.
176
The p-value is the chance that the statistic would be ≥ 19.48 for 10,000 random losses from the assumed Burr. The null hypothesis is that the data comes from a Burr with parameters α = 4.3, θ = 42,000, and γ = 1.3. For a small p-value, the null hypothesis is rejected. Note that once I have the tabulated Chi-Square values I could use them to estimate the p-value if I was comparing this Burr to another set of 10,000 random losses grouped in the same manner. The original grouped data was only used to get the groupings and to compute the Chi-Square statistic of 19.48. The original grouped data did not specifically enter into the simulations.
For example, the maximum likelihood Burr, F(x) = 1 - {1/(1 + (x/θ)^γ)}^α, fit to the grouped data in Section 3 of “Mahlerʼs Guide to Fitting Loss Distributions,” has parameters α = 3.9913, θ = 40,467, and γ = 1.3124. The Chi-Square Statistic is 1.458.177 In order to estimate the p-value by simulation, one would perform many runs. On each run, one would first simulate 10,000 random losses from this Burr with parameters α = 3.9913, θ = 40,467, and γ = 1.3124. Then group the simulated data into the same 9 intervals and fit a Burr to this grouped data using the method of maximum likelihood. This Burr Distribution might have parameters α = 4.0102, θ = 39,510, and γ = 1.3347. Then one computes the Chi-Square Statistic for the simulated grouped data and the Burr Distribution with parameters α = 4.0102, θ = 39,510, and γ = 1.3347. One would record the results of each of the many simulation runs. Then the p-value for 1.458 would be estimated by the percentage of the simulation runs in which the Chi-Square statistic was greater than or equal to 1.458.178
In General, when Data Has Not Been Fit to the Distribution: You are given a distribution you wish to compare to a given data set. 1. Simulate a data set of appropriate size from the given distribution.179 2. Compute the value of the statistic. 3. Tabulate the value of the statistic and return to step 1. When enough simulations have been run, the tabulated values of the statistic can be used to compute a p-value. The percentage of simulation runs in which the statistic is ≥ the given value is the corresponding p-value.180
177
See “Mahlerʼs Guide to Fitting Loss Distributions.” 178 Since 1.458 is an extremely small Chi-Square statistic for 9 - 1 - 3 = 5 degrees of freedom, the p-value is about 90%, representing an excellent fit. 179 For the case of the K-S statistic, one can use the uniform distribution on [0,1]. 180 While the Chi-Square and K-S are one-sided tests, the same technique can be applied to two-sided tests.
In General, when Data Has Been Fit to the Distribution: You are given a distribution fit to a given data set. 1. Simulate a data set of appropriate size from the given distribution. 2. Fit a new distribution of the same family to the simulated data using whatever technique had been used to fit the original distribution. 3. Compute the value of the statistic using the distribution from step 2 and the simulated data from step 1. 4. Tabulate the value of the statistic and return to step 1. When enough simulations have been run, the tabulated values of the statistic can be used to compute a p-value. The percentage of simulation runs in which the statistic is ≥ the given value is the corresponding p-value.
Kolmogorov-Smirnov Statistic, Assumed Distribution: Since the K-S Statistic is distribution-free, we can assume for simulation purposes that the data comes from a uniform distribution on (0,1). We run many simulations. For each run we simulate n random numbers from (0,1) and then compute the K-S Statistic, comparing to the uniform distribution on (0,1). Exercise: For the first simulation run, you get the following 6 random numbers from (0,1): 0.454, 0.116, 0.032, 0.669, 0.783, 0.278 . What is the K-S Statistic for this simulation run? [Solution: The maximum absolute difference occurs just after 0.278; the K-S Stat. is 0.2220.
x       Assumed F(x)   Empirical F just below x   |F(x) - Empirical|   Empirical F just above x   |F(x) - Empirical|
0.032   0.0320         0.0000                     0.0320               0.1667                     0.1347
0.116   0.1160         0.1667                     0.0507               0.3333                     0.2173
0.278   0.2780         0.3333                     0.0553               0.5000                     0.2220
0.454   0.4540         0.5000                     0.0460               0.6667                     0.2127
0.669   0.6690         0.6667                     0.0023               0.8333                     0.1643
0.783   0.7830         0.8333                     0.0503               1.0000                     0.2170]
We record the K-S statistic computed for each of the many simulation runs. Then, for example, the estimated p-value for 0.180 is the percentage of runs in which the K-S Statistic was greater than or equal to 0.180.
Exercise: We perform 1000 such simulation runs, in each run taking a sample size of 100. Of these 1000 simulation runs, 840 have a K-S Statistic less than 0.110. Estimate the p-value for 0.110 with 100 data points.
[Solution: (1000 - 840)/1000 = 16.0%.]
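A minimal sketch in Python, assuming numpy, of computing the K-S statistic against the uniform distribution on (0,1) and of estimating p-values from many such runs; the function name and seed are illustrative only.

```python
import numpy as np

def ks_statistic(sample):
    """K-S statistic comparing a sample of uniform(0,1) draws to the uniform CDF."""
    x = np.sort(sample)
    n = len(x)
    upper = np.arange(1, n + 1) / n   # empirical CDF just above each point
    lower = np.arange(0, n) / n       # empirical CDF just below each point
    return max(np.abs(x - upper).max(), np.abs(x - lower).max())

# The exercise above: six random numbers, K-S statistic 0.2220.
print(ks_statistic(np.array([0.454, 0.116, 0.032, 0.669, 0.783, 0.278])))

# 1000 runs of size 100; the share of runs with statistic >= 0.110 estimates
# the p-value for 0.110 with 100 data points.
rng = np.random.default_rng(5)  # arbitrary seed
stats = np.array([ks_statistic(rng.random(100)) for _ in range(1000)])
print(np.mean(stats >= 0.110))
```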
Simulation Experiment, Kolmogorov-Smirnov Statistic: I performed 1000 simulation runs. For each run I simulated 100 random numbers from (0,1) and then computed the K-S Statistic, comparing to the uniform distribution on (0,1). Selected results were as follows, going from smallest to largest:
Order Statistic (1000 runs)   K-S Statistic (Data Set of Size 100)
800                           0.105145
801                           0.105326
900                           0.119011
901                           0.119588
950                           0.130744
951                           0.130765
990                           0.150539
991                           0.150877
992                           0.155919
993                           0.160689
994                           0.162398
995                           0.162656
996                           0.163302
997                           0.163891
998                           0.172224
999                           0.173012
1000                          0.183574
Out of 1000 runs, 201 gave a K-S Statistic greater than 0.105145, while 200 gave one greater than 0.105326. Therefore, the estimated critical value for 20% is about 10.5%.
Significance Level   Estimated Critical Value   Critical Value Using K-S Table (n = 100)181
20%                  0.105                      0.107
10%                  0.119                      0.122
5%                   0.131                      0.136
1%                   0.151                      0.163
Somewhat more precisely, for the 20% critical value, we want to estimate the 80th percentile of the K-S distribution. The smoothed empirical estimate of the 80th percentile is the (1000 + 1)(0.8) = 800.8th value: (0.105145)(0.2) + (0.105326)(0.8) = 0.10529. I performed 1000 additional simulation runs; however, for each run I simulated instead 10,000 random numbers from (0,1) and then computed the K-S Statistic, comparing to the uniform distribution on (0,1). Selected results were as follows, going from smallest to largest:
Somewhat more precise critical values can be obtained by taking c/(√n + 0.12 + 0.11/√n) rather than c/√n. See page 105 of Goodness-of-Fit-Techniques by DʼAgostino and Stephens. For n = 100, we would take c/10.131 rather than c/10. This would result instead in critical values of: 10.6%, 12.1%, 13.4%, and 16.1%.
Order Statistic (1000 runs)   K-S Statistic (Data Set of Size 10,000)
800                           0.0107684
801                           0.0107788
900                           0.0123457
901                           0.0123822
950                           0.0135538
951                           0.0135566
990                           0.0164725
991                           0.0164932
992                           0.0165467
993                           0.0166142
994                           0.0168428
995                           0.0178481
996                           0.0189834
997                           0.0196795
998                           0.0201126
999                           0.0208079
1000                          0.0226108
Out of 1000 runs, 201 gave a K-S Statistic greater than or equal to 0.0107684, while 200 gave one greater than or equal to 0.0107788. Therefore, the estimated critical value for 20% is about 0.01077.
Significance Level   Estimated Critical Value   Critical Value Using K-S Table (n = 10,000)182
20%                  0.01077                    0.010727
10%                  0.01236                    0.012238
5%                   0.01356                    0.013581
1%                   0.01648                    0.016276
There is a reasonable match between the results of these simulations and the table of Kolmogorov-Smirnov critical values. The match is better than for the previous set of simulations, which used data sets of size 100 rather than 10,000.
182 Using critical values with more accuracy than commonly displayed in a K-S table.
183 The K-S table is based on an asymptotic result for large sample sizes. Also, the estimated critical values are subject to random fluctuation, since they were based on a finite number (1000) of simulation runs.
Critical Values versus p-values:
For a given value of the K-S statistic, the corresponding p-value is the value of the K-S Survival Function at that value of the K-S statistic.
Exercise: For the K-S statistic with a sample size of 10,000, use the results of the previous simulation in order to estimate the p-value for 0.0165.
[Solution: Out of 1000 runs, 9 have a statistic greater than 0.0165. 9/1000 = 0.9%.]
p-value: given x find S(x).184
In contrast, for a given value of the Survival Function, the corresponding critical value is the value of the K-S statistic such that we get the given value of the K-S Survival Function. For example, for 10% we estimated the critical value as 0.01236, since S(0.01236) = 10%.
critical value: given S(x) find x.185
Kolmogorov-Smirnov Statistic, Fitted Distribution:
One can also compare a distribution to data to which it has been fit.186 In that case, since we have picked the “best” parameters of the distribution so that it will come closest to matching the data, the distribution matches the data better than it would otherwise.187 Therefore, the K-S statistic will be smaller than it otherwise would be. Therefore, the probability of seeing a large K-S statistic due to random fluctuation is smaller than it otherwise would be. In other words, the correct probability value for a given value of the K-S statistic is smaller when one compares a distribution to data to which it has been fit, than when one has an assumed distribution. By the same reasoning, the 95th percentile of the distribution of the K-S statistic will be smaller for the case when one compares a distribution to data to which it has been fit. The 95th percentile of the distribution of the K-S statistic is the 5% critical value. Therefore, the correct 5% critical value will be smaller for the case when one compares a distribution to data to which it has been fit, than when one has an assumed distribution.188
184
This is for a one-sided test such as the K-S. For a two-sided test one would add the area in both tails. 185 This is for a one-sided test such as the K-S. For a two-sided test one would add the area in both tails. 186 See “Mahlerʼs Guide to Fitting Loss Distributions.” 187 The criteria of what are the best parameters varies based on the method of fitting. 188 This is the same reason we reduce the number of degrees of freedom for the Chi-Square Goodness-of-Fit Test, when one compares a distribution to data to which it has been fit. See “Mahlerʼs Guide to Fitting Loss Distributions.”
The critical values in the usual K-S Table are based on the case of an assumed rather than fitted distribution. Therefore, if one were to apply the usual K-S Table in the case of a fitted distribution, one would be using critical values that are too large. As was done previously, using the uniform distribution in the simulation, corresponds to the case of an assumed distribution. Using the uniform distribution produces results similar to the usual K-S Table. Therefore, using the uniform distribution in the simulation and then incorrectly applying the results to a case of a distribution fit to the data to which it is compared, would produce critical values that are too large, and would overstate the p-value.189 Thus if more accuracy is desired, one should instead use a somewhat more complicated simulation procedure. For example, assume a Pareto Distributions with parameters α = 2 and θ = 100 has been fit via maximum likelihood to 1000 data points. Then in each simulation run we would generate 1000 random draws from this Pareto with parameters α = 2 and θ = 100. Then fit a Pareto via maximum likelihood to the simulated data; for example, for the first run the fitted Pareto had parameters α = 2.45465 and θ = 140.716. Then we compute the K-S Statistic for the empirical distribution corresponding to these 1000 simulated points versus the Pareto fit to the simulated data with parameters α = 2.45465 and θ = 140.716; the K-S Statistic for the first run turns out to be 0.0140239. Then we record the K-S statistic computed for each of many simulation runs. As before, I performed 1000 simulation runs. For example, the estimated p-value for 0.023 is the percentage of simulation runs in which the K-S Statistic was greater than or equal to 0.023; for the values shown below, this is: 200/1000 = 20%. In addition, I ran a simulation similar to the previous ones in which one compared to the uniform distribution, corresponding to an assumed rather than fitted distribution. Exercise: Using the values shown subsequently, estimate the probability value for this situation for a K-S statistic of 0.033. Use both the results of the more complicated simulation fitting a Pareto, and the simpler simulation using a Uniform Distribution. [Solution: For the more complicated simulation, 5 out of 1000 values are greater than 0.033; thus the estimated p-value is 0.5%. For the simpler simulation, somewhat more than 200 out of 1000 values are greater than 0.033; thus the estimated p-value is somewhat more than 20%. Comment: The estimated p-value of about 20% from the simulation using the Uniform Distribution is erroneous; it is much bigger than 0.5%, the p-value approximately appropriate in this situation for a K-S Statistic of 0.033.]
189
See pages 450 and 657 of Loss Models.
Order Statistic   K-S Statistic, Fitting Pareto   K-S Statistic, versus Uniform
(1000 runs)       (Data Set of Size 1000)         (Data Set of Size 1000)
800               0.0229799                       0.0335282
801               0.0230078                       0.0335376
900               0.0254673                       0.0384360
901               0.0255171                       0.0384896
950               0.0274450                       0.0441382
951               0.0274915                       0.0442205
990               0.0310457                       0.0510510
991               0.0317198                       0.0511214
992               0.0319260                       0.0512179
993               0.0319718                       0.0521539
994               0.0324317                       0.0522656
995               0.0326460                       0.0523428
996               0.0333961                       0.0533708
997               0.0337541                       0.0556483
998               0.0348501                       0.0561995
999               0.0354379                       0.0570298
1000              0.0405292                       0.0604800
Estimated Critical Values for the K-S Statistic:
Significance Level   Fitting Pareto   versus Uniform   K-S Table (n = 1000)
20%                  0.0230           0.0335           0.0339
10%                  0.0255           0.0385           0.0387
5%                   0.0275           0.0442           0.0429
1%                   0.0314           0.0511           0.0515
In the case of the fitted distribution, the critical values are significantly smaller than when comparing to an assumed distribution.190 The critical values for the simulation of an assumed distribution are comparable to those from the K-S Table, while those from the simulation of a fitted distribution are significantly smaller.191 Using the K-S Table in this case of a fitted distribution, would result in rejecting in too few situations and rejecting at the wrong significance level. For example, if the K-S Statistic were 0.03 for this fitted case, we should reject at 5%, while using the K-S Table we would not reject at 20%. 190
When one is comparing data to a distribution fit to that data, one gets a smaller K-S statistic or Chi-Square Statistic on average. This is why we subtract the number of fitted parameters from the degrees of freedom when doing a Chi-Square Test. This moves us up rows on the Chi-Square Table, resulting in smaller critical values. 191 Here we have fit 2 parameters. In the case of more fitted parameters, the difference will be larger.
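For the case of a fitted distribution, here is a rough sketch, assuming numpy and scipy are available, of the procedure described above: simulate from the fitted Pareto, refit a Pareto to each simulated sample by maximum likelihood, and compare each sample to its own refit Pareto. The optimizer, starting values, and seed are illustrative choices, not part of the text; the Pareto is in the Loss Models parameterization F(x) = 1 - {θ/(x + θ)}^α.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)  # arbitrary seed

def pareto_cdf(x, alpha, theta):
    # Loss Models parameterization: F(x) = 1 - (theta / (x + theta))^alpha
    return 1 - (theta / (x + theta)) ** alpha

def fit_pareto_mle(x):
    """Maximum likelihood fit of the two-parameter Pareto, by numerical optimization."""
    def nll(p):
        a, t = np.exp(p)                       # work on the log scale to keep a, t > 0
        return -np.sum(np.log(a) + a * np.log(t) - (a + 1) * np.log(x + t))
    res = minimize(nll, x0=np.log([2.0, np.median(x)]), method="Nelder-Mead")
    return np.exp(res.x)

def ks_statistic(x, alpha, theta):
    f = pareto_cdf(np.sort(x), alpha, theta)
    n = len(x)
    return max(np.abs(f - np.arange(1, n + 1) / n).max(),
               np.abs(f - np.arange(0, n) / n).max())

stats = []
for _ in range(1000):
    u = rng.random(1000)
    sample = 100 * ((1 - u) ** (-1 / 2) - 1)          # draw from the Pareto with alpha = 2, theta = 100
    a_hat, t_hat = fit_pareto_mle(sample)             # refit a Pareto to the simulated data
    stats.append(ks_statistic(sample, a_hat, t_hat))  # K-S versus the refit Pareto

# The share of runs with statistic >= 0.023 estimates the p-value for 0.023.
print(np.mean(np.array(stats) >= 0.023))
```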
Simulation Experiment, Censoring from Above, Kolmogorov-Smirnov Statistic:
For each run I simulated 10,000 random numbers from (0,1). Then I censored from above at 0.6, by limiting each value to 0.6. Then I computed the K-S Statistic, comparing to the uniform distribution on (0,1), altered by censoring from above at 0.6. I performed 1000 simulation runs. Selected results were as follows, going from smallest to largest:
Order Statistic (1000 runs)   K-S Statistic (Data Set of Size 10,000)
800                           0.0102941
801                           0.0103052
900                           0.0120218
901                           0.0120372
950                           0.0133836
951                           0.0134078
990                           0.0158420
991                           0.0160664
992                           0.0162462
993                           0.0169515
994                           0.0170816
995                           0.0171526
996                           0.0173295
997                           0.0174351
998                           0.0175825
999                           0.0183527
1000                          0.0195802
Estimated Critical Value (Data Set of Size 10,000):
Significance Level   No Censoring   Censored from Above at 0.6   From a Table192
20%                  0.01077        0.01030                      —
10%                  0.01236        0.01203                      0.011813
5%                   0.01356        0.01340                      0.013211
1%                   0.01648        0.01595                      0.015996
From a Table192 0.011813 0.013211 0.015996
As stated by Loss Models, with censoring from above the Kolmogorov-Smirnov critical values are somewhat less than those in the absence of censoring.193

192 For censoring from above at about the 60th percentile. See page 112 of Goodness-of-Fit-Techniques by DʼAgostino and Stephens, not on the syllabus.
193 Both columns of figures are subject to random fluctuation, since they were based on a finite number of simulation runs.
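Here is a rough sketch of one way the censoring experiment above could be coded. Python with NumPy and the seed are my own assumptions, and restricting the K-S comparison to values below the censoring point is one reasonable implementation, not necessarily the exact one used for the table.

import numpy as np

rng = np.random.default_rng(2)   # assumed seed

def ks_censored(u_sample, cens):
    # censor each simulated value from above at cens
    x = np.minimum(np.sort(u_sample), cens)
    below = x[x < cens]                       # the uncensored observations
    n = x.size
    i = np.arange(1, below.size + 1)
    F = below                                 # uniform(0,1) cdf: F(x) = x below cens
    d = max(np.max(i / n - F), np.max(F - (i - 1) / n))
    return max(d, cens - below.size / n)      # gap just below the censoring point

ks = np.array([ks_censored(rng.random(10_000), 0.6) for _ in range(1000)])
print(np.quantile(ks, [0.80, 0.90, 0.95, 0.99]))   # estimated 20%, 10%, 5%, 1% critical values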
Simulation Experiment, Small Sample, Kolmogorov-Smirnov Statistic:

For each run I simulated 5 random numbers from (0,1). Then I computed the K-S Statistic, comparing to the uniform distribution on (0,1). I performed 100,000 simulation runs. Selected results were as follows, ordered from smallest to largest:

Order Statistic     K-S Statistic
(100,000 runs)      (Data Set of Size 5)
   79,752               0.445537
   80,000               0.446726
   80,001               0.446737
   80,248               0.447947
   89,814               0.507542
   90,000               0.509055
   90,001               0.509057
   90,186               0.510744
   94,864               0.560045
   95,000               0.562347
   95,001               0.562362
   95,136               0.564447
   98,938               0.663386
   99,000               0.666978
   99,001               0.667001
   99,062               0.670624
   99,990               0.853502
   99,991               0.853916
   99,992               0.855839
   99,993               0.858562
   99,994               0.859347
   99,995               0.863762
   99,996               0.873532
   99,997               0.874698
   99,998               0.875638
   99,999               0.929179
  100,000               0.937643
Critical Values for the K-S Statistic (Data Set of Size 5)
Significance Level    Estimated via Simulation    Exact194    Using the Asymptotic Formulas for Large Samples195
        20%                   0.447                 0.446                    0.480
        10%                   0.509                 0.51                     0.547
         5%                   0.562                 0.564                    0.607
         1%                   0.667                 0.67                     0.728

There is a match between the critical values estimated via simulation and the “exact” critical values for a set of size 5 taken from a statistical table. The critical values calculated by applying the formulas that are correct for large sample sizes are somewhat too large.

Based on this simulation, here is a histogram of the density of the Kolmogorov-Smirnov Statistic for a sample size of 5:

[Histogram of the density of the K-S Statistic for a sample size of 5; horizontal axis x from 0.15 to 0.95, vertical axis density from 0 to about 0.15.]

194 Taken from Table VIII in Probability and Statistical Inference, by Hogg and Tanis, which has values displayed to only two decimal places, and from “Kolmogorov-Smirnov: A Goodness of Fit Test for Small Samples,” by J. Romeu.
195 1.0727/√n, 1.2238/√n, 1.3581/√n, and 1.6276/√n. See “Mahlerʼs Guide to Fitting Loss Distributions.” These critical values are approximate and should be good for 15 or more data points.
Based on this simulation, for the Kolmogorov-Smirnov Statistic for a sample size of 5, the mean is 0.3583, the variance is 0.01192, the skewness is 0.749, and the kurtosis is 3.473.

Based on this simulation, here is a graph of the Survival Function of the Kolmogorov-Smirnov Statistic for a sample size of 5, with the vertical axis on a log scale:

[Graph of 1000 S(x), on a log scale from 0.2 to 20.0, versus x from 0.65 to 0.85.]
The estimate of the critical value for 20% is an estimate of the 80th percentile of the distribution function of the K-S Statistic for a sample size of 5. Similarly, the estimate of the critical value for 10% is an estimate of the 90th percentile of this distribution function.

As discussed in a previous section, (X(a), X(b)) is an approximate P confidence interval for the pth percentile, Πp, where Φ(y) = (1 + P)/2,
a = np - y√(np(1 - p)), rounded down to the nearest integer,
b = np + y√(np(1 - p)), rounded up to the nearest integer,
X(a) = the ath value from smallest to largest, and X(b) = the bth value from smallest to largest.
For a 95% confidence interval, y = 1.960. In this case with 100,000 simulations run, n = 100,000.
Therefore, a = 100,000p - 1.96√(100,000 p(1 - p)) = 100,000p - 619.81√(p(1 - p)), rounded down to the nearest integer.
b = 100,000p + 619.81√(p(1 - p)), rounded up to the nearest integer.
For the 80th percentile, a = (100,000)(0.8) - 619.81√((0.8)(0.2)) = 80,000 - 247.9 = 79,752, and b = 80,000 + 247.9 = 80,248.
X(a) = X(79,752) = 0.445537, and X(b) = X(80,248) = 0.447947.
Therefore, an approximate 95% confidence interval for the 20% critical value is: (0.4455, 0.4479).

Exercise: Determine an approximate 95% confidence interval for the 10% critical value.
[Solution: a = (100,000)(0.9) - 619.81√((0.9)(0.1)) = 90,000 - 185.9 = 89,814, and b = 90,000 + 185.9 = 90,186.
X(a) = X(89,814) = 0.507542, and X(b) = X(90,186) = 0.510744.
An approximate 95% confidence interval for the 10% critical value is: (0.5075, 0.5107).]

Exercise: Determine an approximate 95% confidence interval for the 5% critical value.
[Solution: a = (100,000)(0.95) - 619.81√((0.95)(0.05)) = 95,000 - 135.1 = 94,864, rounding down to the nearest integer.
b = 95,000 + 135.1 = 95,136, rounding up to the nearest integer.
X(a) = X(94,864) = 0.560045, and X(b) = X(95,136) = 0.564447.
An approximate 95% confidence interval for the 5% critical value is: (0.5600, 0.5644).]

Exercise: Determine an approximate 95% confidence interval for the 1% critical value.
[Solution: a = (100,000)(0.99) - 619.81√((0.99)(0.01)) = 99,000 - 61.7 = 98,938, and b = 99,000 + 61.7 = 99,062.
X(a) = X(98,938) = 0.663386, and X(b) = X(99,062) = 0.670624.
An approximate 95% confidence interval for the 1% critical value is: (0.6634, 0.6706).
Comment: If one desires or requires a narrower 95% confidence interval, then one would have to run more simulations than 100,000.]
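A small sketch of the order-statistic confidence interval used above (Python/NumPy is my own assumption):

import numpy as np

def percentile_ci(sorted_vals, p, y=1.960):
    # approximate 95% confidence interval (X(a), X(b)) for the pth percentile
    n = len(sorted_vals)
    half = y * np.sqrt(n * p * (1 - p))
    a = int(np.floor(n * p - half))           # rounded down to the nearest integer
    b = int(np.ceil(n * p + half))            # rounded up to the nearest integer
    return sorted_vals[a - 1], sorted_vals[b - 1]   # X(a) and X(b), 1-indexed

# With 100,000 sorted simulated statistics and p = 0.80, this reproduces
# the indices a = 79,752 and b = 80,248 used above.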
Simulation Experiment, Anderson-Darling Statistic: One can apply the same techniques to the Anderson-Darling Statistic.196 For example, I simulated 100 random draws from an Exponential with mean 500. Then I computed the Anderson-Darling Statistic, comparing the simulated data to this Exponential Distribution. I repeated this experiment 100 times and got the following results, sorted from smallest to largest: 0.150793, 0.202866, 0.217615, 0.225211, 0.237629, 0.270732, 0.27209, 0.274993, 0.309209, 0.316596, 0.330829, 0.333732, 0.349364, 0.353056, 0.360744, 0.366817, 0.36963, 0.370551, 0.372486, 0.383931, 0.387847, 0.396176, 0.398085, 0.401829, 0.411677, 0.428012, 0.438676, 0.439, 0.444106, 0.476288, 0.479759, 0.493281, 0.512119, 0.52198, 0.537032, 0.544232, 0.559708, 0.563882, 0.566465, 0.578422, 0.582152, 0.60815, 0.614244, 0.61862, 0.630172, 0.660967, 0.679364, 0.688806, 0.716474, 0.750722, 0.757633, 0.758422, 0.764534, 0.82453, 0.838526, 0.84205, 0.852554, 0.882573, 0.888872, 0.909104, 0.909258, 0.923411, 0.924255, 0.976673, 1.00923, 1.0141, 1.01463, 1.03914, 1.05707, 1.07163, 1.1203, 1.16195, 1.2021, 1.22746, 1.2637, 1.30529, 1.31522, 1.32201, 1.3232, 1.37913, 1.44438, 1.4451, 1.47229, 1.4864, 1.49273, 1.5304, 1.54589, 1.63668, 1.76114, 1.77371, 1.7916, 1.8953, 1.8961, 1.94726, 2.21822, 2.2395, 2.24672, 2.67465, 3.08156, 3.22387. Since out of 100, 10 values are greater than 1.78, the estimated critical value for 10% is about 1.78. However, 100 simulations are not enough to get a good estimate of this critical value. Running 10,000 rather than 100 simulations, resulted in an estimated 10% critical value of 1.934. The same type of simulation experiment was run for a Pareto Distribution, with α = 3 and θ = 5000. Running 10,000 simulations, resulted in an estimated 10% critical value of 1.935. Simulation Experiment, Fitting a Distribution, Anderson-Darling Statistic: Next an experiment was run in which 100 random draws were simulated from an Exponential with mean 500, but then these 100 values were fit to an Exponential via method of moments. Then the Anderson-Darling Statistic was computed for this data versus this fitted Exponential. Running 10,000 such simulations, resulted in an estimated 10% critical value of 1.038.197 Comparing to a distribution fit to the data results in lower values for the Anderson-Darling Statistic, resulting in a smaller critical value than when a distribution was not fit to the data.198
196 See “Mahlerʼs Guide to Fitting Loss Distributions.”
197 Based on Table 4.11 in Goodness-of-Fit-Techniques by DʼAgostino and Stephens, not on the syllabus, the 10% critical value for this situation is 1.056.
198 See page 450 of Loss Models.
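As a concrete illustration, here is a minimal sketch of one run of the Anderson-Darling experiment, using the classical complete-data form of the statistic, A² = -n - (1/n) Σ (2i - 1)[ln F(x(i)) + ln S(x(n+1-i))]. Python with NumPy/SciPy and the seed are my own assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)   # assumed seed
n, mean = 100, 500.0

def anderson_darling(data, cdf):
    # classical A^2 statistic for complete individual data
    x = np.sort(data)
    m = len(x)
    F = cdf(x)
    i = np.arange(1, m + 1)
    return -m - np.sum((2 * i - 1) * (np.log(F) + np.log(1 - F[::-1]))) / m

sample = stats.expon.rvs(scale=mean, size=n, random_state=rng)
print(anderson_darling(sample, lambda x: stats.expon.cdf(x, scale=mean)))

Repeating this many times and taking the 90th percentile of the resulting statistics gives the estimated 10% critical value, as in the experiment above.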
Simulation Experiment, Censoring from Above, Anderson-Darling Statistic:

Next a simulation experiment was run involving censoring from above at 1000, and an Exponential with mean 500. 100 random draws were simulated from an Exponential with mean 500, but then these values were censored from above, by limiting them to 1000. The Anderson-Darling Statistic was computed for each run, by comparing the censored data to the effect of censoring the Exponential with mean 500. 10,000 runs were performed. From smallest to largest, the 9000th result was 1.73964 and the 9001st result was 1.74041. Therefore, the estimated critical value for a 10% significance level is 1.740.199 As expected, with censoring the critical value is less than the 1.934 estimated without censoring.

Simulation Experiment, Likelihood Ratio Test:

100 random draws were simulated from an Exponential with mean 1. Then both an Exponential and a Weibull Distribution were fit via maximum likelihood to the simulated data set. Then the likelihood ratio test statistic was computed: twice the difference in the loglikelihood of the maximum likelihood Weibull and the maximum likelihood Exponential. 1000 simulations were run; the results were tabulated and sorted from smallest to largest. The 60 largest values were:
3.63825, 3.67433, 3.70577, 3.71951, 3.78715, 3.81249, 3.86028, 3.88607, 3.90208, 3.91491, 3.92111, 3.94456, 3.98428, 4.07159, 4.0956, 4.11507, 4.13306, 4.13986, 4.23185, 4.2408, 4.25685, 4.30396, 4.31158, 4.34111, 4.35435, 4.3743, 4.42301, 4.43565, 4.47076, 4.50084, 4.57997, 4.62432, 4.64379, 4.74315, 4.78544, 4.86269, 4.90033, 4.92223, 5.04541, 5.1575, 5.17913, 5.59052, 5.67148, 5.85721, 5.9493, 5.98434, 6.04156, 6.56074, 6.75665, 6.81564, 7.13955, 7.60975, 8.06785, 8.52695, 8.66458, 8.78946, 8.87052, 9.25884, 9.48098, 10.8397.

The statistic should approximately follow a Chi-Square Distribution with one degree of freedom. Here is a comparison:

Value of the Statistic    p-value from Simulation    p-value from Chi-Square Table
        3.84                       5.4%                         5.0%
        5.02                       2.2%                         2.5%
        6.64                       1.2%                         1.0%
        7.88                       0.8%                         0.5%

199 Based on Table 4.4 in Goodness-of-Fit-Techniques by DʼAgostino and Stephens, not on the syllabus, the 10% critical value for this situation is about 1.74.
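A rough sketch of one run of the likelihood ratio experiment follows; Python with SciPy and the seed are my own assumptions, and the fits fix the location parameter at zero.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)   # assumed seed
data = stats.expon.rvs(scale=1.0, size=100, random_state=rng)

loc_e, scale_e = stats.expon.fit(data, floc=0)             # maximum likelihood Exponential
c_w, loc_w, scale_w = stats.weibull_min.fit(data, floc=0)  # maximum likelihood Weibull

ll_expon = np.sum(stats.expon.logpdf(data, loc_e, scale_e))
ll_weibull = np.sum(stats.weibull_min.logpdf(data, c_w, loc_w, scale_w))
print(2 * (ll_weibull - ll_expon))   # compare to a Chi-Square with one degree of freedom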
Simulation Experiment, Schwarz Bayesian Criterion:

Using the Schwarz Bayesian Criterion,200 one adjusts the loglikelihoods by subtracting in each case the penalty: (number of fitted parameters) ln(number of data points) / 2. One then compares these penalized loglikelihoods directly; larger is better.

First I simulated n random draws from an Exponential Distribution with mean 665. An Exponential and a LogNormal Distribution were each fit via maximum likelihood to the same simulated data. Then each maximum loglikelihood was penalized; for the Exponential the penalty was ln(n) / 2, while for the LogNormal with 2 parameters the penalty was ln(n). Then the two penalized loglikelihoods were compared. This was repeated 1000 times.

Exponential, number of         Number of times the Exponential fit
simulated data points          was judged better, out of 1000
        10                                803
       100                                989
      1000                               1000

One would hope that most of the time for data simulated from an Exponential Distribution, the Exponential fit would be judged better than the LogNormal fit. The Schwarz Bayesian Criterion did relatively well for the simulated Exponential data, even for small data sets.

Next I simulated n random draws from a LogNormal Distribution with µ = 6, σ = 1, and mean 665. An Exponential and a LogNormal Distribution were each fit via maximum likelihood to the same simulated data. Then each maximum loglikelihood was penalized; for the Exponential the penalty was ln(n) / 2, while for the LogNormal with 2 parameters the penalty was ln(n). Then the two penalized loglikelihoods were compared. This was repeated 1000 times.

LogNormal, number of           Number of times the LogNormal fit
simulated data points          was judged better, out of 1000
        10                                465 (see footnote 201)
       100                                944
      1000                               1000

One would hope that most of the time for data simulated from a LogNormal Distribution, the LogNormal fit would be judged better than the Exponential fit. The Schwarz Bayesian Criterion did relatively well for the simulated LogNormal data, except for small data sets (size = 10).
200 See “Mahlerʼs Guide to Fitting Loss Distributions.”
201 856 out of 1000 times the (non-penalized) loglikelihood of the fitted LogNormal was better than the (non-penalized) loglikelihood of the fitted Exponential.
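Here is a minimal sketch of one trial of the Schwarz Bayesian Criterion comparison; Python with SciPy and the seed are my own assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)   # assumed seed
n = 100
data = stats.expon.rvs(scale=665.0, size=n, random_state=rng)

loc_e, scale_e = stats.expon.fit(data, floc=0)               # 1 fitted parameter
s_ln, loc_ln, scale_ln = stats.lognorm.fit(data, floc=0)     # 2 fitted parameters

# penalty = (number of fitted parameters) * ln(n) / 2
sbc_expon = np.sum(stats.expon.logpdf(data, loc_e, scale_e)) - np.log(n) / 2
sbc_lognorm = np.sum(stats.lognorm.logpdf(data, s_ln, loc_ln, scale_ln)) - np.log(n)
print("Exponential judged better:", sbc_expon > sbc_lognorm)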
Minimum CDF (Cumulative Distribution Function) Distance:

One can fit distributions to grouped data by minimizing the CDF distance.202 The CDF distance (cumulative distribution function distance) is a (weighted) sum of squared differences between the observed and fitted distribution functions:
Σ wi {Observed Distribution Function - Fitted Distribution Function}²
The differences are computed at the ends of the intervals for the grouped data. For each set of weights one gets a somewhat different statistic. One can use numerical methods to find the distribution with the minimum CDF distance.

For example, the Minimum CDF Distance Pareto fit to the Ungrouped Data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions” has α = 1.9618 and θ = 292,923, with a CDF distance of 0.0000470.203
A Practical Example, CDF Distance: One can use the simulation techniques discussed in order to determine whether the value of 0.0000470 for the CDF distance indicates a good fit to this data. For each simulation run: 1. Simulate 130 random draws from a Pareto Distribution with α = 1.9618 and θ = 292,923.204 205 2. Fit a Pareto Distribution to the simulated data from step 1, via Minimum CDF Distance. 3. Tabulate the CDF Distance for the simulated data from step 1 and the fitted distribution from step 2. I performed 100 such simulations and sorted the resulting 100 values of the CDF Distance.
202 See page 53 of the First Edition of Loss Models, not on the syllabus.
203 The values at which the fitted and empirical CDF were compared are: 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, 1 million, and infinity.
204 Via the Method of Inversion. Where u is a random number from (0, 1), set F(x) = u and solve for x: x = θ{(1 - u)^(-1/α) - 1}.
205 There are 130 data points in the Ungrouped data set in Section 2.
The results are as follows, where all values need to be multiplied by 10-9 :206 44427, 56851, 58901, 59165, 66326, 66937, 76780, 79694, 81038, 92967, 115125, 129143, 130573, 131644, 132761, 132995, 133181, 134106, 135351, 141344, 142388, 144569, 145445, 149529, 152884, 156233, 157347, 163293, 165990, 169973, 175048, 177383, 178215, 185927, 192738, 194795, 194827, 197315, 197660, 198718, 199051, 199864, 201582, 202875, 206989, 208905, 210559, 210903, 214341, 222482, 226810, 229646, 233054, 238576, 244605, 262613, 263703, 263774, 264308, 276260, 278471, 287788, 302483, 308810, 311096, 320591, 323722, 325263, 333364, 333959, 337106, 338029, 343113, 347647, 356982, 362409, 372792, 375200, 388171, 394290, 441630, 446376, 454569, 456119, 483549, 490871, 491389, 525274, 530663, 569361, 580755, 586309, 591063, 595556, 658635, 660383, 697401, 730600, 863255, 1449623. Only one of these 100 values were smaller than 0.0000470. Since 99 out of 100 values of the CDF Distance were greater than 0.0000470, the estimated p-value is 99%, indicating a very good fit. Similarly, the Minimum CDF Distance Weibull fit to the Ungrouped Data in Section 2 of “Mahlerʼs Guide to Fitting Loss Distributions”, has τ = 0.814457 and θ = 207,920, with a CDF distance of 0.0002778. This CDF Distance is greater than that for the Minimum Distance Pareto, which was 0.0000470. Thus the Weibull is not as good a fit as the Pareto. One can use simulation again in order to estimate the p-value of the fitted Weibullʼs CDF Distance. Now the null hypothesis is that the data was generated from the Minimum Distance Weibull. Therefore, for each simulation run: 1. Simulate 130 random draws from a Weibull Distribution with τ = 0.814457 and θ = 207,920 2. Fit a Weibull Distribution to the simulated data from step 1, via Minimum CDF Distance. 3. Tabulate the CDF Distance for the simulated data from step 1 and the fitted distribution from step 2. I performed 100 such simulations and sorted the resulting 100 values of the CDF Distance.
206 Thus a CDF Distance of 0.0000470 corresponds to a listed value of 47,000.
The results are as follows, where all values need to be multiplied by 10-9 :207 20240, 22826, 33098, 45396, 49823, 58252, 65495, 68045, 68636, 70697, 71402, 75976, 76382, 77036, 78687, 84228, 87007, 88455, 88696, 88767, 89787, 94003, 94349, 97183, 97697, 107869, 113878, 117826, 118097, 118978, 122119, 124086, 126639, 126747, 130089, 131226, 135233, 136631, 138205, 142211, 148924, 149653, 153282, 160963, 171136, 173441, 176140, 188352, 201739, 204431, 208881, 214158, 219629, 220924, 222870, 225136, 228748, 236843, 238471, 246924, 248368, 253477, 254010, 261447, 262002, 262067, 262146, 262452, 265055, 276222, 276264, 280051, 292734, 301634, 313252, 331887, 343889, 351149, 351363, 356536, 360514, 372618, 388119, 397019, 404086, 405079, 412168, 420707, 426765, 441663, 444418, 452379, 455205, 496103, 560832, 577875, 579196, 597995, 615813, 779607. Since 29 out of 100 values of the CDF Distance were greater than 0.0002778, the estimated p-value is 29%.
207 Thus a CDF Distance of 0.0002778 corresponds to a listed value of 277,800.
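Before turning to the problems, here is a rough sketch of how a Minimum CDF Distance fit could be carried out numerically with unit weights. Python with NumPy/SciPy is assumed, and the interval counts below are hypothetical placeholders, not the data from Section 2.

import numpy as np
from scipy import optimize

bounds = np.array([10e3, 25e3, 50e3, 100e3, 250e3, 500e3, 1e6])   # ends of the intervals
counts = np.array([40, 25, 20, 18, 15, 7, 3, 2])                  # hypothetical grouped counts
obs_cdf = np.cumsum(counts)[:-1] / counts.sum()                   # empirical F at each boundary

def pareto_cdf(x, alpha, theta):
    return 1.0 - (theta / (theta + x)) ** alpha

def cdf_distance(params):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    return np.sum((obs_cdf - pareto_cdf(bounds, alpha, theta)) ** 2)

fit = optimize.minimize(cdf_distance, x0=[2.0, 200e3], method="Nelder-Mead")
print(fit.x, fit.fun)   # fitted (alpha, theta) and the minimized CDF distance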
Problems: Use the following information for the next four questions: • In each run, a set of 50 independent random numbers from (0,1) has been simulated and then the Kolmogorov-Smirnov Statistic has been calculated by comparing to the uniform distribution on (0,1). • 1000 such simulations have been run.
• The K-S statistics from each run were recorded and ordered from smallest to largest: k1 , k2 , k3 , ..., k1000.
• Selected values are:
k970 = 0.194753, k971 = 0.195407, k972 = 0.197153, k973 = 0.197611, k974 = 0.198487,
k975 = 0.199487, k976 = 0.200685, k977 = 0.203016, k978 = 0.204902, k979 = 0.205828,
k980 = 0.206179, k981 = 0.207174, k982 = 0.207572, k983 = 0.210561, k984 = 0.212629,
k985 = 0.214446, k986 = 0.214500, k987 = 0.216814, k988 = 0.218258, k989 = 0.220447,
k990 = 0.225803, k991 = 0.230389, k992 = 0.230480, k993 = 0.241209, k994 = 0.242379,
k995 = 0.249352, k996 = 0.250113, k997 = 0.260620, k998 = 0.263153, k999 = 0.267335,
k1000 = 0.319145.

18.1 (1 point) For 50 data points, estimate the p-value of a K-S Statistic of 0.25.
(A) 0.1% (B) 0.5% (C) 1% (D) 2% (E) 3%

18.2 (1 point) For 50 data points, for the K-S Statistic estimate the critical value for 2.5%.
(A) 0.19 (B) 0.20 (C) 0.21 (D) 0.22 (E) 0.23

18.3 (1 point) For 50 data points, estimate the p-value of a K-S Statistic of 0.21.
(A) 0.1% (B) 0.5% (C) 1% (D) 2% (E) 3%

18.4 (1 point) For 50 data points, for the K-S Statistic estimate the critical value for 1.0%.
(A) 0.19 (B) 0.20 (C) 0.21 (D) 0.22 (E) 0.23
18.5 (2 points) You are given the following:
• 10,000 losses are grouped into 11 intervals. • A Weibull Distribution has been fit to this grouped data via Maximum Likelihood. The Chi-Square Statistic is 14.63. • In each run, a set of 10,000 independent random numbers from this Weibull Distribution has been simulated, and grouped into the same 11 intervals as the original data. Then a new Weibull Distribution has been fit via Maximum Likelihood to this simulated data. The Chi-Square Statistic has been calculated comparing this new Weibull Distribution to the simulated data.
• 1000 such simulations have been run. • The Chi-Square statistics from each run were recorded and ordered from smallest to largest: x1 , x2 , x3 , ... x1000.
• Selected values are: x800 = 12.7102, x900 = 14.1681, x925 = 14.9048, x950 = 16.3975, x975 = 17.5641. Estimate the p-value. A. less than 5% B. at least 5% but less than 6% C. at least 6% but less than 7% D. at least 7% but less than 8% E. at least 8%
Use the following information for the next two questions:
During a one-year period, the number of accidents per day was distributed as follows:

Number of Accidents     Days
        0                209
        1                111
        2                 33
        3                  7
        4                  3
        5                  2

18.6 (2 points) For the above data and a Poisson Distribution with λ = 0.7, the Chi-Square Statistic is 9.18. (The data was grouped into four intervals, 0, 1, 2, and “3 and over”, in order to compute the Chi-Square Statistic.) Explain how you would use simulation to estimate the p-value.

18.7 (2 points) For the above data, the maximum likelihood Poisson has λ = 0.6. The Chi-Square Statistic is 2.84. (The data was grouped into four intervals, 0, 1, 2, and “3 and over”, in order to compute the Chi-Square Statistic.) Explain how you would use simulation to estimate the p-value.
18.8 (2 points) For each run, 10,000 random numbers were simulated from (0,1). Then these simulated values were censored from above at 0.5, by limiting each value to 0.5. Then the Kolmogorov-Smirnov Statistic was computed by comparing to the uniform distribution on (0,1), altered by censoring from above at 0.5. 1000 simulation runs were performed. Selected results were as follows, ordered from smallest to largest:

Order Statistic    K-S Statistic
(1000 runs)        (Data Set of Size 10,000)
   800                 0.0095348
   801                 0.0095477
   900                 0.0111373
   901                 0.0111643
   950                 0.0125958
   951                 0.0126018
   990                 0.0150013
   991                 0.0150674
   992                 0.0155057
   993                 0.0155918
   994                 0.0156372
   995                 0.0159683
   996                 0.0161213
   997                 0.0173629
   998                 0.0177408
   999                 0.0186538
  1000                 0.0204895
Estimate the critical values for the 20%, 10%, 5%, 1% and 0.5% significance levels, for the Kolmogorov-Smirnov Statistic and 10,000 data points, when there is censoring from above at (approximately) the median.
Solutions to Problems:

18.1. B. p-value = (# of simulations with value ≥ 0.25) / (total # of simulations) = 5/1000 = 0.5%.
Comment: Given a value of the test statistic, we estimate the survival function at that value.

18.2. B. (2.5%)(1000) = 25. There are 25 simulation runs with values ≥ 0.20. Thus the estimated critical value for 2.5% is 0.20.
More precisely, we wish to estimate the 97.5th percentile of the K-S distribution, in other words, where the survival function is 2.5%. (1001)(0.975) = 975.975. Thus we want the 975.975th value from smallest to largest. Linearly interpolating between the 975th and 976th values: (0.025)(0.199487) + (0.975)(0.200685) = 0.200655.
Comment: Given a value of the survival function, we estimate the corresponding value of the test statistic.

18.3. D. p-value = (# of simulations with value ≥ 0.21) / (total # of simulations) = 18/1000 = 1.8%.
Comment: Given a value of the test statistic, we estimate the survival function at that value.

18.4. E. (1.0%)(1000) = 10. There are 10 simulation runs with values ≥ 0.23. Thus the estimated critical value for 1.0% is 0.23.
More precisely, we wish to estimate the 99th percentile of the K-S distribution, in other words, where the survival function is 1%. (1001)(0.99) = 990.99. Thus we want the 990.99th value from smallest to largest. Linearly interpolating between the 990th and 991st values: (0.01)(0.225803) + (0.99)(0.230389) = 0.230343.
Comment: Given a value of the survival function, we estimate the corresponding value of the test statistic.

18.5. E. Since x900 = 14.1681 and x925 = 14.9048, at least 76 and at most 100 tabulated Chi-Square values are ≥ 14.63. Linearly interpolating, about 85 values are greater than 14.63 out of 1000. Therefore, p ≅ 85/1000 = 8.5%.
18.6. 1. Simulate 365 random draws from a Poisson Distribution with λ = 0.7.
2. Group this simulated data into the four intervals 0, 1, 2, and “3 and over”.
3. Compute the Chi-Square Statistic for the simulated data from step 1 and a Poisson Distribution with λ = 0.7.
4. After a sufficient number of simulation runs (steps 1 to 3), for example 1000, the estimated p-value is: (number of runs with χ² > 9.18) / (total number of runs).
Comment: When I ran 1000 such simulations, 23 had χ² > 9.18, for an estimated p-value of 23/1000 = 2.3%. This compares to a p-value from a Chi-Square Distribution with 4 - 1 = 3 degrees of freedom of 2.7%.

18.7. 1. Simulate 365 random draws from a Poisson Distribution with λ = 0.6.
2. Fit a Poisson Distribution to the simulated data from step 1, via Maximum Likelihood (which gives the same result as the Method of Moments).
3. Group this simulated data into the four intervals 0, 1, 2, and “3 and over”. Compute the Chi-Square Statistic for the simulated data from step 1 and the fitted distribution from step 2.
4. After a sufficient number of simulation runs (steps 1 to 3), for example 1000, the estimated p-value is: (number of runs with χ² > 2.84) / (total number of runs).
Comment: The situation was taken from 4, 5/01, Q.19-20. When I ran 1000 such simulations, 257 had χ² > 2.84, for an estimated p-value of 257/1000 = 25.7%. This compares to a p-value from a Chi-Square Distribution with 4 - 1 - 1 = 2 degrees of freedom of 24.2%.
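For concreteness, here is a minimal sketch of the procedure described in the solution to 18.7; Python with NumPy/SciPy and the seed are my own assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)   # assumed seed
days, lam_fit, observed_stat = 365, 0.6, 2.84

def chi_square(counts, lam):
    # actual and expected days in the cells 0, 1, 2, and "3 and over"
    cells = np.array([np.sum(counts == 0), np.sum(counts == 1),
                      np.sum(counts == 2), np.sum(counts >= 3)])
    p = stats.poisson.pmf([0, 1, 2], lam)
    expected = days * np.append(p, 1 - p.sum())
    return np.sum((cells - expected) ** 2 / expected)

sim = []
for _ in range(1000):
    counts = rng.poisson(lam_fit, size=days)
    lam_hat = counts.mean()                  # maximum likelihood = method of moments
    sim.append(chi_square(counts, lam_hat))

print(np.mean(np.array(sim) > observed_stat))   # estimated p-value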
18.8. Out of 1000 runs, 6 results of runs were greater than or equal to 0.0159683, while 5 results of runs were greater than or equal to 0.0161213. Therefore, the estimated critical value for 0.5% is about 0.0161.

Significance Level    Estimated Critical Value
        20%                    0.96%
        10%                    1.12%
         5%                    1.26%
         1%                    1.51%
         0.5%                  1.61%

For the 1% critical value, we want to estimate the 99th percentile of the K-S distribution. The smoothed empirical estimate of the 99th percentile is the (1000 + 1)(0.99) = 990.99th value: (0.0150013)(0.01) + (0.0150674)(0.99) = 0.0150667. In a similar manner, one can obtain more precise estimates of the other critical values:

Significance    Percentile    Order Stat.    K-S below     K-S above     Critical Value
   20.0%           80.0%        800.8        0.0095348     0.0095477        0.00955
   10.0%           90.0%        900.9        0.0111373     0.0111643        0.01116
    5.0%           95.0%        950.95       0.0125980     0.0126018        0.01260
    1.0%           99.0%        990.99       0.0150013     0.0150674        0.01507
    0.5%           99.5%        995.995      0.0159683     0.0161213        0.01612
Comment: If one desired more accuracy, one would need to run more than 1000 simulations.
Section 19, An Example of a Simulation Experiment

Simulation can be used to illustrate points and to improve oneʼs understanding of the material on the syllabus. Here is an extremely simple example of the use of simulation to illustrate percentile estimation. This example assumes that claims are drawn from a Pareto Distribution as per Loss Models with parameters α = 3 and θ = 50,000. This distribution has a mean of: 50,000 / (3 - 1) = 25,000.

Exercise: What are the 10th, 50th, and 90th percentiles of this Pareto Distribution?
[Solution: F(π10) = 0.1. 1 - (1 + π10/50,000)^(-3) = 0.1. Therefore, the 10th percentile is π10 = 50,000{(1 - 0.1)^(-1/3) - 1} = 1787.
The median or 50th percentile is: 50,000{(1 - 0.5)^(-1/3) - 1} = 12,996.
The 90th percentile is: 50,000{(1 - 0.9)^(-1/3) - 1} = 57,722.]

Exercise: Assume one observed the following nine claims: 10278, 28633, 945, 1231, 53871, 309, 23776, 456, 12233. Estimate the 10th, 50th and 90th percentiles.
[Solution: Sort the claims from smallest to largest: 309, 456, 945, 1231, 10278, 12233, 23776, 28633, 53871.
The 10th percentile is estimated by the (0.10)(9+1) = 1st claim, which is 309.
The 50th percentile is estimated by the (0.50)(9+1) = 5th claim, which is 10,278.
The 90th percentile is estimated by the (0.90)(9+1) = 9th claim, which is 53,871.]

Exercise: Assume 0.0546 is a random number from [0,1]. Using the Inversion Method, with large random numbers corresponding to large losses, what is the simulated claim from the assumed Pareto Distribution?
[Solution: x = 50,000{(1 - 0.0546)^(-1/3) - 1} = 945.]

One can simulate 9 claims from the assumed Pareto (via the Inversion Method) and then estimate the 10th, 50th, and 90th percentiles. I simulated this situation 100 separate times and tabulated the results. A sketch of this kind of experiment in code is shown below, followed by graphs of the resulting estimates sorted from smallest to largest:
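The following is a minimal sketch of such an experiment; Python with NumPy and the seed are my own assumptions.

import numpy as np

rng = np.random.default_rng(7)    # assumed seed
alpha, theta = 3.0, 50_000.0

def simulate_pareto(n):
    u = rng.random(n)
    return theta * ((1 - u) ** (-1 / alpha) - 1)    # inversion method

estimates = []
for _ in range(100):
    x = np.sort(simulate_pareto(9))
    # smoothed empirical percentiles: the (n+1)p-th order statistic, which for
    # n = 9 and p = 0.1, 0.5, 0.9 is the 1st, 5th, and 9th value
    estimates.append((x[0], x[4], x[8]))

est = np.array(estimates)
print(est.mean(axis=0))   # compare with the true values 1787, 12,996, and 57,722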
Estimates of the 10th percentile based on 9 claims (actual answer is 1787):
[Graph of the 100 sorted simulated estimates; vertical axis roughly 0 to 8000, horizontal axis 1 to 100.]

Estimates of the 50th percentile based on 9 claims (actual answer is 12,996):
[Graph of the 100 sorted simulated estimates; vertical axis roughly 0 to 40,000, horizontal axis 1 to 100.]
Estimates of the 90th percentile based on 9 claims (actual answer is 57,722):
[Graph of the 100 sorted simulated estimates; vertical axis roughly 0 to 200,000, horizontal axis 1 to 100.]
It is evident from these graphs that in a situation such as this, one cannot rely on the estimates of percentiles obtained from observing only 9 claims.208 Assume one were to ask an experienced actuary how reliable the estimates of the 10th, 50th or 90th percentiles would be based on observing only 9 claims. One would hope that he would say “not very”. By simulating this experiment 100 times you can get a feel for exactly how unreliable such estimates are likely to be in situations where the underlying size of claim distribution is somewhat like the assumed Pareto Distribution.209

In my opinion, developing an intuition for practical situations is a very important part of your training as an actuary. A well trained actuary should not have to perform any simulation or algebraic analysis in order to answer the easy question asked here. Simulating situations and testing the sensitivity of the results to the inputs is a quick way to develop and improve your own intuition. For example, how would the results change if the Pareto was longer-tailed (smaller alpha) or shorter-tailed (larger alpha), or if instead the distribution had been an Exponential? How would the results have differed if instead of observing 9 claims one had observed 99 or 9999 claims? You should be able to perform simulations to answer these questions on your own.

The estimates are much more reliable if the number of claims is 9999 rather than 9:

208 The more claims one has observed, the smaller the variance of the estimate, all other things being equal. In general the higher the percentile, the more the variance of the estimate. In general the longer-tailed the distribution, the more the variance of the estimate.
209 The number of simulation runs needed was discussed in a previous section. For an algebraic development of this material see pp. 107-108 of Insurance Risk Models by Panjer & Willmot.
Estimates of the 10th percentile based on 9999 claims (actual answer is 1787):
[Graph of the 100 sorted simulated estimates; vertical axis roughly 1700 to 1950, horizontal axis 1 to 100.]

Estimates of the 50th percentile based on 9999 claims (actual answer is 12,996):
[Graph of the 100 sorted simulated estimates; vertical axis roughly 12,600 to 13,200, horizontal axis 1 to 100.]
Estimates of the 90th percentile based on 9999 claims (actual answer is 57,722):
[Graph of the 100 sorted simulated estimates; vertical axis roughly 55,000 to 60,000, horizontal axis 1 to 100.]
Section 20, Summary and Important Ideas Each of the distributions that are covered on the exam, can be simulated by one or more techniques. Binomial: Inversion Method (table look up), or as a sum of Bernoulli Distributions. Poisson: Inversion Method (table look up), or Special Algorithm based on Interevent Times. Geometric: Inversion Method (table look up) Negative Binomial: Inversion Method (table look up), or as a sum of independent Geometric Distributions (for r integer), Frequency Distributions: Inversion Method (table look up). Compound Frequency Distributions: Simulate a random draw from the primary distribution, N; then simulate N independent random draws from the secondary distribution. Mixed Frequency Distributions: Simulate a random draw δ, from the mixing distribution. Then for the relevant parameter equal to δ, simulate a random draw from the distribution that is subject to being mixed. Standard Unit Normal: Inversion Method (table look up), Rejection Method, or the Polar Normal Method. Normal: Multiply a Standard Unit Normal by the standard deviation and add the mean. LogNormal: Exponentiate the corresponding Normal variable. Multivariate Normal: Use the Choleski Square Root Method to get a matrix C such that C CT = Σ, where Σ is the variance-covariance matrix. Then let Z be a vector of independent draws from a Standard Unit Normal, then the desired multivariate Normal is CZ + µ, where µ is the vector of means. Chi-Square Distribution, ν degrees of freedom: Sum of ν squares of Standard Unit Normals. Exponential: Inversion Method.
Inverse Exponential: Inversion Method. Single Parameter Pareto: Inversion Method. Weibull: Inversion Method. Inverse Weibull: Inversion Method. Pareto: Inversion Method. Inverse Pareto: Inversion Method. LogLogistic: Inversion Method. ParaLogistic: Inversion Method. Inverse ParaLogistic: Inversion Method. Burr: Inversion Method. Inverse Burr: Inversion Method. Gamma: Sum of Exponentials (for α integer), from a Chi-Square (for 2α integer), or via the Rejection Method . Transformed Gamma: Take a Gamma variable with same α and unity scale parameter to the power 1/τ, and then multiply the result by θ. Inverse Transformed Gamma: Take a Gamma variable with same α and unity scale parameter to the power -1/τ, and then multiply the result by θ. Inverse Gamma: Take the inverse of a random draw from a Gamma Distribution with same α and unity scale parameter and then multiply the result by θ. Generalized Pareto: Rejection Method (as a special case of the Transformed Beta), as an F-distribution which is a ratio of Chi-Squares (only for 2τ and 2α integer), or as a mixture of Gammas via a Gamma.
Beta: The Rejection Method, or as a combination of two Gammas. Transformed Beta: Let y be a random draw from a Beta Distribution with shape parameters τ and α. Then x = θ / (1/y -1)1/γ is a random draw from a Transformed Beta Distribution with parameters α, θ, γ, and τ. Inverse Gaussian: The Rejection Method.
Uniform Random Numbers (Section 2) Uniform random numbers from the interval [0,1] are the basis of all the simulation techniques covered on the exam. Uniform random number from [r, s]: (s - r)u + r, where u is a random number from [0, 1]. When given a series of random numbers from [0, 1], use them in the order given, and use each one at most once.
Continuous Distributions, Inversion Method (Section 3) Simulation by the inversion method will work for any Distribution Function, F(x), that can be algebraically inverted. Let u be a random number from (0,1). If it is not stated which way to perform the method of inversion, set F(x) = u. Equivalently, determine VaRp (X), for p = u. Setting F(x) = u. ⇔ large random numbers correspond to large losses. Discrete Distributions, Inversion Method (Section 4) Construct a table of the Distribution Function. We want the smallest x, such that F(x) > u. ⇔ large random numbers correspond to large simulated values.
Simulating Normal and LogNormal Distributions (Section 5) In order to simulate a Unit Normal, µ = 0, σ = 1: Given a random number u, find Z such that Φ(Z) = u. In order to simulate a Normal Distribution with parameters µ and σ: Simulate a Unit Normal Z, then X = σZ + µ. In order to simulate a LogNormal Distribution with parameters µ and σ: 1. Simulate a Unit Normal, Z. 2. Get a random Normal variable with parameters µ and σ. X = σZ + µ. 3. Exponentiate to get a random LogNormal variable. Y = exp(σZ + µ). Simulating Brownian Motion (Section 6) An Arithmetic Brownian Motion, X(t) is Normal with mean X(0) + µ t, and variance σ2 t. Arithmetic Brownian Motion is a stochastic process, with the following properties: 1. X(t + s) - X(t) is Normally Distributed with mean µ s, and variance σ2 s. 2. The increments for disjoint time intervals are independent. 3. X(t) is continuous. One can simulate Arithmetic Brownian Motion, by simulating the increments as independent Normals. A Standard Brownian Motion, Z(t) is an Arithmetic Brownian Motion with µ = 0 and σ = 1. If ln(X(t)) is an Arithmetic Brownian Motion, then X(t) is a Geometric Brownian Motion. For a Geometric Brownian Motion, X(t)/X(0) is LogNormal with parameters µ t and σ t . Geometric Brownian Motion is a stochastic process, with the following properties: 1. X(t + s) / X(t) is LogNormally Distributed with parameters µ s and σ s . 2. The ratios for disjoint time intervals are independent. 3. X(t) is continuous. One can simulate Geometric Brownian Motion, by simulating successive ratios as independent LogNormals, or by simulating a Arithmetic Brownian Motion and then exponentiating.
Simulating Lifetimes (Section 7) For life contingencies the following are all the same: small random numbers ⇔ early deaths. large random numbers ⇔ large lifetimes. Setting u = F(x). Setting 1 - u = S(x) = (# still alive) / (# originally alive). # still alive = (1-u) (# alive at starting age). Simulating a Poisson Process (Section 9) 1. Set i = 1 and Y0 = 0. 2. Simulate a random number from [0, 1], ui. 3. Simulate an exponentially distributed interevent time, Xi = -ln(1 - ui)/λ. 4. Let Yi = Xi + Yi-1. 5. If Yi > T, then reject Yi and exit the algorithm, having generated i-1 claims at times Yj for j < i. 6. If Yi ≤ T, then let i = i +1 and return to Step 2. Xi = -ln(1 - ui)/λ. ⇔ Large random numbers correspond to large interevent times. This is the default, equivalent to F(x) = u. If one simulates a Poisson Process on [0,1] with claims intensity λ, then the number of simulated claims is a random draw from a Poisson Distribution with mean λ. Simulating a Compound Poisson Process (Section 10) First simulate the Poisson Process, then simulate the size of each claim. Simulating Aggregate Losses and Compound Models (Section 11) Aggregate: First simulate the number of claims, then simulate the size of each claim. Compound: First simulate the primary distribution, then simulate those number of independent random draws from the secondary distribution.
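As an illustration of the two recipes above, here is a minimal sketch that simulates a compound Poisson Process: claim times via exponential interevent times, with a simulated size attached to each claim. Python/NumPy, the seed, and the Exponential severity are my own assumptions.

import numpy as np

rng = np.random.default_rng(8)   # assumed seed

def compound_poisson(lam, T, sev_mean):
    times, t = [], 0.0
    while True:
        t += -np.log(1 - rng.random()) / lam         # exponential interevent time
        if t > T:
            break                                    # reject the time beyond the horizon
        times.append(t)
    sizes = -sev_mean * np.log(1 - rng.random(len(times)))   # assumed Exponential severities
    return np.array(times), sizes

times, sizes = compound_poisson(lam=3.0, T=1.0, sev_mean=1000.0)
print(len(times), sizes.sum())   # number of claims and the aggregate losses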
Deciding How Many Simulations to Run (Section 12)
Si = the sample standard deviation after i simulation runs.
The standard deviation of the estimated mean = Si/√i.
Where we want a confidence interval covering probability P, Φ(y) = (1 + P)/2.
If we want to have at least P chance (example P = 90%) of being within ±k (example k = 5%) of the true value of the mean, then we want: (Si/X̄i)/√i ≤ k/y.
⇔ Needed Number of Simulations = (y/k)² (estimated variance)/(estimated mean)².
If we want to have at least P chance (example P = 90%) of being within ±k (example k = 5%) of the true value of the distribution function, then we want: n Fn(x)/Sn(x) ≥ (y/k)².
If we want to have at least P chance (example P = 90%) of being within ±k (example k = 5%) of the true value of the survival function, then we want: n Sn(x)/Fn(x) ≥ (y/k)².
Let Π̂p = the smoothed empirical estimate of the pth percentile,
a = np - y√(np(1 - p)), rounded down to the nearest integer, and
b = np + y√(np(1 - p)), rounded up to the nearest integer.
X(a) = the ath value from smallest to largest. X(b) = the bth value from smallest to largest.
Then (X(a), X(b)) is an approximate P confidence interval for the pth percentile, Πp.
If we want to have a probability P that our estimate of the pth percentile is within ±k of the true value, then we require that: X(a) ≥ Π̂p(1 - k) and X(b) ≤ Π̂p(1 + k).
Mixtures of Models (Section 14) N-Point Mixtures: First use a random number from (0,1) to determine at random which of the underlying distributions to use. Then simulate a random draw from that randomly selected distribution. Continuous Mixtures: First simulating a random draw from the Distribution of the parameter and then simulate a random draw from the likelihood given that value of the parameter.
Bootstrapping (Section 16) The steps in general for Bootstrapping (not via Simulation) are: 0. You are a given a set and an estimator of a quantity of interest. 1. The quantity of interest is calculated assuming the distribution was uniform and discrete on the given data set, in other words using the empirical distribution function. 2. List all the subsets, with replacement, of appropriate size of the original set. 3. For each subset from step 2, calculate the estimate using the estimator. 4. Compute the mean squared difference between the values from step 3 and the value from step 1. The result of step 4 is the bootstrap estimate of the Mean Squared Error of the estimator. The use of Bootstrapping to estimate the MSE is distribution free. Using Bootstrapping (not via simulation) to estimate the MSE of the mean of n numbers gives the same answer as: (E[X2 ] - E[X]2 ) / n. Bootstrapping via Simulation (Section 17) The steps in general for Bootstrapping via Simulation are: 0. You are a given a set and an estimator of a quantity of interest. 1. The quantity of interest is calculated assuming the distribution was uniform and discrete on the given data set, in other words using the empirical distribution function. 2. Simulate with replacement a data set of appropriate size from the given set. 3. For each subset from step 2, calculate the estimate using the estimator. 4. Tabulate the squared difference between the estimate for this run from step 3 and the value from step 1. Return to step 2. After performing a large number of simulations, the mean of the squared differences from step 4 is the bootstrap estimate of the Mean Squared Error of the estimator.
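Here is a minimal sketch of bootstrapping via simulation for the mean; Python/NumPy, the seed, and the data set are my own hypothetical choices.

import numpy as np

rng = np.random.default_rng(9)                       # assumed seed
data = np.array([3.0, 5.0, 7.0, 11.0, 20.0])         # hypothetical data set
target = data.mean()                                 # step 1: value from the empirical distribution

boot = np.array([rng.choice(data, size=data.size, replace=True).mean()
                 for _ in range(10_000)])            # steps 2 and 3, many times
mse = np.mean((boot - target) ** 2)                  # step 4

# For the mean, this should be close to the closed form (E[X^2] - E[X]^2)/n:
print(mse, (np.mean(data ** 2) - target ** 2) / data.size)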
Estimating the p-value via Simulation (Section 18) When Data Has Not Been Fit to the Distribution: 0. You are given a distribution you wish to compare to a given data set. 1. Simulate a data set of appropriate size from the given distribution 2. Compute the value of the statistic. 3. Tabulate the value of the statistic and return to step 1. When enough simulations have been run, the percentage of simulation runs in which the statistic is ≥ the given value is the corresponding p-value. When Data Has Been Fit to the Distribution: 0. You are given a distribution fit to a given data set. 1. Simulate a data set of appropriate size from the given distribution. 2. Fit a new distribution of the same family to the simulated data using whatever technique had been used to fit the original distribution. 3. Compute the value of the statistic using the distribution from step 2 and the simulated data from step 1. 4. Tabulate the value of the statistic and return to step 1. When enough simulations have been run, the percentage of simulation runs in which the statistic is ≥ the given value is the corresponding p-value. p-value: given x find S(x). critical value: given S(x) find x.
Further Reading: Simulation is widely used in many different actuarial applications. For example, here are some papers demonstrating practical uses of simulation by property/casualty actuaries:210 “Computer Simulation and the Actuary: A Study of the Realizable Potential”, by David A. Arata, PCAS LXVIII, 1981, pp. 24. “A Simulation Test of Prediction Errors of Loss Reserve Estimation Techniques,” by James N. Stanard, PCAS LXXII, 1985, pp. 124. “Simulating Serious Workersʼ Compensation Claims,” by Gary G. Venter and William R. Gillam, CAS Discussion Paper Program, 1986. “A Simulation Procedure for Comparing Different Claims Reserving Methods,” by Teivo Pentikainen and Jukka Rantala, CAS Forum, Fall 1995. “Simulation Models for Self-Insurance,” Trent R. Vaughn, CAS Forum, Spring 1996. “A Simulation Approach in Excess Reinsurance Pricing,” by Dmitry E. Papush, CAS Forum, Spring 1997. “A Comparative Study of the Performance of Loss Reserving Methods Through Simulation,” by Prakash Narayan and Thomas V. Warthen III, CAS Forum, Summer 1997. “Dynamic Financial Analysis of a Workersʼ Compensation Insurer,” by David Appel, Mark W. Mulvaney, and Susan E. Witcraft, CAS Forum, Summer 1997. “Building a Public Access PC-Based DFA Model,” by Stephen P. DʼArcy, Richard W. Gorvett, Joseph A. Herbers, Thomas E. Hettinger, Steven G,. Lehmann, and Michael J. Miller, CAS Forum, Summer 1997. “Performance Testing Aggregate and Structural Reserving Methods: A Simulation Approach,” by John W. Rollins, CAS Forum, Summer 1997. “Homeowners Ratemaking Revisited (Use of Computer Models to Estimate Catastrophe Loss Costs),“ by Michael A. Walters, and Francois Morin, PCAS LXXIV, 1997. “Estimating the Actuarial Value of the Connecticut Second Injury Fund Loss Portfolio,” by Abbe Sohne Bensimon, CAS Discussion Paper Program, 1998. “Using the Public Access DFA Model: A Case Study,” by Stephen P. DʼArcy, Richard W. Gorvett, Joseph A. Herbers, Thomas E. Hettinger, and Robert J. Walling III, CAS Forum, Summer 1998. ”Estimating the Variability of Loss Reserves,” by Richard E. Sherman, CAS Forum, Fall 1998. “Some Extensions of J.N. Standardʼs Simulation Model for Loss Reserving,” by Richard L. Vaughan, CAS Forum, Fall 1998. “Statistical Modeling Techniques for Reserve Ranges; A Simulation Approach,” by Chandu C. Patel and Alfred Raws III, CAS Forum, Fall 1998. “The Mechanics of a Stochastic Corporate Financial Model,” by Gerald S. Kirschner and William C. Scheel, PCAS LXXXV, 1998, pp. 404. “Implications of Dynamic Financial Analysis on Demutualization,” by Jan Lommele and Kevin Bingham, CAS Forum, Winter 1999. 210
All types of actuaries use simulation. However, I am a property/casualty actuary. I have not updated the list since 1999, but there continue to be many actuarial papers which use simulation.
“Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates,” by David B. Speights, Joel B. Brodsky, and Darya L. Chudora, CAS Forum, Winter 1999. “Random Number Generation Using Low Discrepancy Points,” by Donald F. Mango, CAS Forum, Spring 1999. “A Practical Suggestion for Log-Linear Workers Compensation Cost Models,” by Dan Corro, CAS Forum, Spring 1999. “Parameterizing Interest Rate Models,” by Kevin C. Ahlgrim, Stephen P. DʼArcy and Richard W. Gorvett, CAS Forum, Summer 1999. “A Comprehensive System for Selecting and Evaluating DFA Model Parameters,” by Adam J. Berger and Chris K. Madsen, CAS Forum, Summer 1999. “Modeling Parameter Uncertainty in Cash Flow Projections,” by Roger M. Hayne, CAS Forum, Summer 1999. “Estimating Between Line Correlations Generated by Parameter Uncertainty,” by Glenn Meyers, CAS Forum, Summer 1999. “Calibration of Stochastic Scenario Generators for DFA,” by John M. Mulvey, Francois Morin, and Bill Pauling, CAS Forum, Summer 1999. “Levels of Determinism in Workers Compensation Reinsurance Commutations,” by Gary Blumsohn, PCAS LXXXVI, 1999, pp. 1. “Workers Compensation Reserve Uncertainty,” by Douglas M. Hodes, Sholom Feldblum, and Gary Blumsohn, PCAS LXXXVI, 1999, pp. 263. “Modeling Losses with the Mixed Exponential Distribution,” by Clive Keatinge, PCAS LXXXVI, 1999, pp. 654. “Downward Bias of Using High-Low Averages for Development Factors,” by Cheng-Sheng Peter Wu, PCAS LXXXVI, 1999, pp. 699.
Breakdown of Past Exams and Projected Weighting by Topic Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-14 Howard Mahler
[email protected] www.howardmahler.com/Teaching
Below is a table of the frequency of the letter solutions on the Course 4 Exams:

Course 4 Exam     A     B     C     D     E
May 2000          6     9     8     9     8
Nov. 2000         8     6     9     9     8
May 2001          7     7     9     7    10
Nov. 2001         6    10     7     8     9
Nov. 2002         6     9     8     9     8
Nov. 2003         7     8     8     8     9
Nov. 2004         8     9     7     8     8
May 2005          8     6     7     8     6
Nov. 2005         8     7     7     6     7
Nov. 2006         8     7     7     7     6
May 2007          7     7     7     8    11
Sum              79    85    84    87    90
Below is a table of the points per topic on the pre-2005 Course 4 Exams, assuming a 100 point exam.1 (The May 2002, 2003, and 2004 exams were not released.)

Course 4    Fitting      Fitting      Classical   Buhlmann   Conjugate   Simul. in   Semi-Param.   Emp. Bayes.
Exam        Freq. Dist.  Loss Dist.   Cred.       Cred.      Priors      Fitting     Est.          Cred.         Total
Sample         2.5          22.5         2.5        12.5        5           5            2.5           2.5         55
May 2000       5            22.5         2.5        12.5        5           2.5          2.5           2.5         55
Nov. 2000      0            20           2.5        12.5        7.5         2.5          2.5           5           52.5
May 2001       5            20           0          12.5       10           2.5          0             2.5         52.5
Nov. 2001      2.5          17.5         2.5        20          5           2.5          0             2.5         52.5
Nov. 2002      5            20           5          17.5        2.5         2.5          0             2.5         55
Nov. 2003      5            25           5          17.5        5           5            0             2.5         65
Nov. 2004      2.5          32.5         2.5        15          2.5         2.5          2.5           2.5         62.5
Average        3.4          22.5         2.8        15.0        5.3         3.1          1.2           2.8         56.2

1 There were cuts in the Syllabus for 2001, particularly in Survival Analysis. The Survival Analysis syllabus was also substantially reduced effective with the Spring 2003 exam. It was increased somewhat in 2005.
Course 4    Fitting Freq.   Simulation
Exam        & Loss Dist.    in Fitting   Credibility   Regression   Time Series   Survival Analysis   Total
Sample          25              5            25            15          12.5             17.5           100
May 2000        27.5            2.5          25            15          12.5             17.5           100
Nov. 2000       20              2.5          30            15          15               17.5           100
May 2001        25              2.5          25            15          12.5             20             100
Nov. 2001       20              2.5          30            12.5        15               20             100
Nov. 2002       25              2.5          27.5          15          12.5             17.5           100
Nov. 2003       30              5            30            12.5        12.5             10             100
Nov. 2004       35              2.5          25            15          10               12.5           100
Average         25.9            3.1          27.2          14.4        12.8             16.6           100.0
Below is a table of the points per topic on the Joint 4/C Exams from 2005 and 2006, assuming a 100 point exam.2 It may aid you in directing your study efforts.3 While the past breakdown of points may help to estimate the future, it will not directly reflect the mix of questions that will occur on your exam. The May 2006 exam was not released.

Joint       Fitting      Fitting      Classical   Buhlmann   Conjugate   Semi-Param.   Emp. Bayes.            Surv.    Inter. &
Exam 4/C    Freq. Dist.  Loss Dist.   Cred.       Cred.      Priors      Est.          Cred.        Simul.    Anal.    Smooth.
May 2005       0.0          34.3         2.9        17.1        8.6          2.9           2.9         8.6      14.3      8.6
Nov. 2005      5.7          28.6         2.9        11.4        5.7          2.9           5.7         8.6      20.0      8.6
Nov. 2006      5.7          25.7         2.9         5.8       17.1          0.0           5.7        11.4      20.0      5.7
Average        3.8          29.5         2.9        11.5       10.5          1.9           4.8         9.5      18.1      7.6
2 Interpolation and Smoothing is no longer on the syllabus.
3 Note that the division between topics is somewhat arbitrary for some exam questions. Iʼve chosen the Study Guide that most closely matches each question. In many cases more elementary material is assumed in an exam question covering more advanced material. Many of the questions in the Conjugate Priors Study Guide are applications of ideas presented in the Buhlmann Credibility Study Guide.
Below is a table of the points per topic on the Joint 4/C Exams starting in 2007, assuming a 100 point exam.4

May 2007
Frequency Distributions                0
Loss Distributions                     7.5
Aggregate Distributions                5.0
Ruin Theory                            2.5
Fitting Frequency Distributions        5.0
Fitting Loss Distributions            22.5
Survival Analysis                     12.5
Classical Credibility                  0
Buhlmann Credibility                  12.5
Conjugate Priors                      10.0
Semiparametric Estimation              2.5
Empirical Bayesian Credibility         2.5
Simulation                             7.5
Two Topics in Financial Economics      7.5
Risk Measures                          2.5

Estimate of Average Breakdown on Joint Exam 4/C:5

                                  My Estimate
Frequency Distributions                6%
Loss Distributions                     8%
Aggregate Distributions                7%
Risk Measures                          3%
Fitting Frequency Distributions        4%
Fitting Loss Distributions            26%
Survival Analysis                     10%
Credibility6                          28%
Simulation                             8%
4 The subsequent exams were not released.
5 This is my best estimate, which should be used with appropriate caution, particularly in light of the changes in the syllabus. In any case, the number of questions by topic varies from exam to exam.
6 With about 3% for Classical Credibility, 14% for Buhlmann Credibility, 7% for Conjugate Priors, 1% for Semiparametric Estimation, and 3% for Empirical Bayes Credibility.
In the following breakdowns of the past Course 4 exams I have excluded questions on:
Regression: moved to Validation by Educational Experience in 2005.
Time Series: moved to Validation by Educational Experience in 2005.
Prior to 2005, much of the Simulation subject was on Course 3, although some was on Course 4. See that study aid or the breakdown of past Course 3 exams shown below.
Prior to the Spring 2008 Exam, the CAS/SOA posted a file of Sample Questions Exam 4/C. This consisted of past exam questions; those that are still on the syllabus are included in my Study Guides. Their file contains no questions on Risk Measures, which was added to the syllabus for 2007. The correspondence of these sample questions and past exams is shown below, as well as their location in my study guides.

Number    Mahler Study Guide
   1      Frequency Distributions
   2      Loss Distributions
   3      Aggregate Distributions
   4      Risk Measures
   5      Fitting Frequency Distributions
   6      Fitting Loss Distributions
   7      Survival Analysis
   8      Classical Credibility
   9      Buhlmann Credibility & Bayesian Analysis
  10      Conjugate Priors
  11      Semiparametric Estimation
  12      Empirical Bayesian Credibility
  13      Simulation
2009 Sample Qs    Location in my Study Guides    Original Source

1     6, Section 8      4, 11/03, Q.2
2     8, Section 5      4, 11/03, Q.3
3     6, Section 6      4, 11/03, Q.4
4     6, Section 24     4, 11/03, Q.6
5     10, Section 6     4, 11/03, Q.7
6     6, Section 9      4, 11/03, Q.8
7     Deleted
8     9, Section 10     4, 11/03, Q.11
9     Deleted
10    Deleted
11    9, Section 5      4, 11/03, Q.14
12    12, Section 2     4, 11/03, Q.15
13    5, Section 4      4, 11/03, Q.16
14    6, Section 29     4, 11/03, Q.18
15    9, Section 6      4, 11/03, Q.19
16    7, Section 3      4, 11/03, Q.21
17    7, Section 7      4, 11/03, Q.22
18    9, Section 9      4, 11/03, Q.23
19    6, Section 9      4, 11/03, Q.24
20    13, Section 16    4, 11/03, Q.26
21    10, Section 4     4, 11/03, Q.27
22    6, Section 13     4, 11/03, Q.28
23    6, Section 32     4, 11/03, Q.30
24    9, Section 6      4, 11/03, Q.31
25    1, Section 12     4, 11/03, Q.32
26    6, Section 10     4, 11/03, Q.34
27    8, Section 6      4, 11/03, Q.35
28    2, Section 32     4, 11/03, Q.37
29    9, Section 5      4, 11/03, Q.39
30    7, Section 5      4, 11/03, Q.40
31    6, Section 7      4, 11/02, Q.2
32    10, Section 4     4, 11/02, Q.3
33    7, Section 5      4, 11/02, Q.4
34    5, Section 3      4, 11/02, Q.6
35    9, Section 11     4, 11/02, Q.7
36    7, Section 7      4, 11/02, Q.8
37    6, Section 10     4, 11/02, Q.10
38    12, Section 2     4, 11/02, Q.11
39    8, Section 5      4, 11/02, Q.14
40    6, Section 16     4, 11/02, Q.17
41    9, Section 10     4, 11/02, Q.18
42    Deleted
43    9, Section 16     4, 11/02, Q.21
44    6, Section 11     4, 11/02, Q.23
45    9, Section 6      4, 11/02, Q.24
46    7, Section 2      4, 11/02, Q.25
47    5, Section 4      4, 11/02, Q.28
48    9, Section 9      4, 11/02, Q.29
49    6, Section 26     4, 11/02, Q.31
50    9, Section 9      4, 11/02, Q.32
51    Deleted
52    13, Section 16    4, 11/02, Q.35
53    8, Section 4      4, 11/02, Q.36
54    6, Section 8      4, 11/02, Q.37
55    9, Section 5      4, 11/02, Q.39
56    6, Section 24     4, 11/02, Q.40
57    2, Section 14     4, 11/01, Q.2
58    10, Section 4     4, 11/01, Q.3
59    6, Section 18     4, 11/01, Q.6
60    9, Section 4      4, 11/01, Q.7
61    6, Section 22     4, 11/01, Q.10
62    9, Section 9      4, 11/01, Q.11
63    Deleted
64    9, Section 6      4, 11/01, Q.14
65    8, Section 6      4, 11/01, Q.15
66    13, Section 12    4, 11/01, Q.17
67    9, Section 10     4, 11/01, Q.18
68    7, Section 2      4, 11/01, Q.19
69    6, Section 28     4, 11/01, Q.22
70    9, Section 13     4, 11/01, Q.23
71    5, Section 4      4, 11/01, Q.25
72    9, Section 9      4, 11/01, Q.26
73    7, Section 4      4, 11/01, Q.27
74    Deleted
75    6, Section 9      4, 11/01, Q.33
76    10, Section 4     4, 11/01, Q.34
77    7, Section 7      4, 11/01, Q.37
78    9, Section 9      4, 11/01, Q.38
79    6, Section 10     4, 11/01, Q.40
80    Deleted
81    13, Section 14    3, 11/02, Q.10
82    13, Section 11    SOA3, 11/03, Q.5
83    13, Section 15    SOA3, 11/03, Q.40
84    2, Section 31     SOA3, 11/03, Q.3
85    3, Section 5      SOA3, 11/03, Q.4
86    3, Section 1      SOA3, 11/03, Q.19
87    2, Section 35     SOA3, 11/03, Q.29
88    3, Section 5      SOA3, 11/03, Q.33
89    2, Section 22     SOA3, 11/03, Q.34
90    1, Section 19     3, 11/02, Q.5
91    3, Section 5      3, 11/02, Q.6
92    3, Section 11     3, 11/02, Q.16
93    1, Section 16     3, 11/02, Q.27
94    1, Section 11     3, 11/02, Q.28
95    3, Section 7      3, 11/02, Q.36
96    2, Section 31     3, 11/02, Q.37
97    2, Section 36     3, 11/01, Q.6
98    3, Section 5      3, 11/01, Q.7
99    3, Section 11     3, 11/01, Q.18
100   2, Section 38     3, 11/01, Q.28
101   2, Section 8      3, 11/01, Q.35
102   3, Section 1      3, 11/01, Q.36
103   2, Section 23     3, 11/01, Q.37
104   1, Section 19     3, 5/01, Q.3
105   1, Section 19     3, 5/01, Q.15
106   1, Section 16     3, 5/01, Q.16
107   3, Section 11     3, 5/01, Q.19
108   1, Section 11     3, 5/01, Q.25
109   3, Section 1      3, 5/01, Q.26
110   3, Section 5      3, 5/01, Q.29
111   1, Section 16     3, 5/01, Q.36
112   1, Section 16     3, 11/00, Q.2
113   3, Section 5      3, 11/00, Q.8
114   1, Section 17     3, 11/00, Q.13
115   2, Section 32     3, 11/00, Q.21
116   2, Section 31     3, 11/00, Q.27
117   2, Section 14     3, 11/00, Q.31
118   3, Section 5      3, 11/00, Q.32
119   2, Section 36     3, 11/00, Q.41
120   2, Section 36     3, 11/00, Q.42
121   Deleted
122   13, Section 11    SOA3, 11/04, Q.6
123   2, Section 31     SOA3, 11/04, Q.7
124   1, Section 3      SOA3, 11/04, Q.8
125   3, Section 5      SOA3, 11/04, Q.15
126   3, Section 1      SOA3, 11/04, Q.17
127   2, Section 36     SOA3, 11/04, Q.18
128   Deleted
129   Deleted
130   1, Section 17     SOA3, 11/04, Q.32
131   13, Section 10    SOA3, 11/04, Q.33
132   13, Section 11    SOA3, 11/04, Q.34
133   10, Section 6     4, 11/04, Q.1
134   6, Section 7      4, 11/04, Q.2
135   7, Section 2      4, 11/04, Q.4
136   9, Section 5      4, 11/04, Q.5
137   6, Section 10     4, 11/04, Q.6
138   5, Section 2      4, 11/04, Q.8
139   9, Section 9      4, 11/04, Q.9
140   6, Section 12     4, 11/04, Q.10
141   7, Section 7      4, 11/04, Q.12
142   9, Section 2      4, 11/04, Q.13
143   6, Section 9      4, 11/04, Q.14
144   13, Section 17    4, 11/04, Q.16
145   12, Section 4     4, 11/04, Q.17
146   6, Section 23     4, 11/04, Q.18
147   6, Section 6      4, 11/04, Q.20
148   8, Section 2      4, 11/04, Q.21
149   6, Section 19     4, 11/04, Q.22
150   6, Section 24     4, 11/04, Q.24
151   9, Section 9      4, 11/04, Q.25
152   6, Section 22     4, 11/04, Q.26
153   Deleted
154   9, Section 10     4, 11/04, Q.29
155   6, Section 8      4, 11/04, Q.30
156   6, Section 22     4, 11/04, Q.32
157   9, Section 6      4, 11/04, Q.33
158   7, Section 5      4, 11/04, Q.36
159   11, Section 2     4, 11/04, Q.37
160   6, Section 16     4, 11/04, Q.38
161   6, Section 26     4, 11/04, Q.40
162   2, Section 33     SOAM, 5/05, Q.9
163   2, Section 39     SOAM, 5/05, Q.10
164   3, Section 5      SOAM, 5/05, Q.17
165   3, Section 11     SOAM, 5/05, Q.18
166   1, Section 11     SOAM, 5/05, Q.19
167   3, Section 5      SOAM, 5/05, Q.31
168   2, Section 15     SOAM, 5/05, Q.32
169   2, Section 38     SOAM, 5/05, Q.34
170   10, Section 2     SOAM, 5/05, Q.39
171   3, Section 5      SOAM, 5/05, Q.40
172   6, Section 16     4, 5/05, Q.1
173   8, Section 5      4, 5/05, Q.2
174   7, Section 5      4, 5/05, Q.3
175   13, Section 16    4, 5/05, Q.4
176   6, Section 18     4, 5/05, Q.5
177   10, Section 2     4, 5/05, Q.6
178   Deleted
179   6, Section 30     4, 5/05, Q.9
180   6, Section 30     4, 5/05, Q.10
181   9, Section 10     4, 5/05, Q.11
182   13, Section 4     4, 5/05, Q.12
183   9, Section 7      4, 5/05, Q.13
184   10, Section 2     4, 5/05, Q.14
185   7, Section 6      4, 5/05, Q.15
186   6, Section 26     4, 5/05, Q.16
187   9, Section 10     4, 5/05, Q.17
188   Deleted
189   6, Section 19     4, 5/05, Q.19
190   9, Section 9      4, 5/05, Q.20
191   10, Section 4     4, 5/05, Q.21
192   6, Section 6      4, 5/05, Q.22
193   6, Section 9      4, 5/05, Q.24
194   12, Section 4     4, 5/05, Q.25
195   6, Section 5      4, 5/05, Q.26
196   6, Section 25     4, 5/05, Q.27
197   11, Section 2     4, 5/05, Q.28
198   Deleted
199   6, Section 4      4, 5/05, Q.31
200   9, Section 8      4, 5/05, Q.32
201   6, Section 12     4, 5/05, Q.33
202   13, Section 5     4, 5/05, Q.34
203   9, Section 5      4, 5/05, Q.35
204   2, Section 39     SOAM, 11/05, Q.17
205   1, Section 16     SOAM, 11/05, Q.18
206   3, Section 11     SOAM, 11/05, Q.19
207   2, Section 18     SOAM, 11/05, Q.26
208   1, Section 15     SOAM, 11/05, Q.27
209   2, Section 36     SOAM, 11/05, Q.28
210   3, Section 5      SOAM, 11/05, Q.34
211   2, Section 40     SOAM, 11/05, Q.35
212   3, Section 5      SOAM, 11/05, Q.38
213   1, Section 16     SOAM, 11/05, Q.39
214   7, Section 1      4, 11/05, Q.1
215   10, Section 4     4, 11/05, Q.2
216   6, Section 8      4, 11/05, Q.3
217   7, Section 2      4, 11/05, Q.5
218   7, Section 8      4, 11/05, Q.6
219   9, Section 10     4, 11/05, Q.7
220   13, Section 11    4, 11/05, Q.8
221   6, Section 6      4, 11/05, Q.9
222   5, Section 4      4, 11/05, Q.10
223   12, Section 7     4, 11/05, Q.11
224   Deleted
225   6, Section 30     4, 11/05, Q.14
226   9, Section 5      4, 11/05, Q.15
227   13, Section 12    4, 11/05, Q.16
228   7, Section 6      4, 11/05, Q.17
229   6, Section 28     4, 11/05, Q.18
230   10, Section 2     4, 11/05, Q.19
231   7, Section 6      4, 11/05, Q.20
232   6, Section 9      4, 11/05, Q.21
233   12, Section 8     4, 11/05, Q.22
234   Deleted
235   6, Section 13     4, 11/05, Q.25
236   9, Section 17     4, 11/05, Q.26
237   13, Section 5     4, 11/05, Q.27
238   6, Section 26     4, 11/05, Q.28
239   5, Section 3      4, 11/05, Q.29
240   11, Section 2     4, 11/05, Q.30
241   6, Section 18     4, 11/05, Q.31
242   9, Section 6      4, 11/05, Q.32
243   6, Section 5      4, 11/05, Q.33
244   6, Section 19     4, 11/05, Q.34
245   8, Section 5      4, 11/05, Q.35
246   6, Section 8      4, 11/06, Q.1
247   10, Section 2     4, 11/06, Q.2
248   2, Section 3      4, 11/06, Q.3
249   13, Section 11    4, 11/06, Q.4
250   6, Section 24     4, 11/06, Q.5
251   9, Section 8      4, 11/06, Q.6
252   7, Section 3      4, 11/06, Q.7
253   10, Section 6     4, 11/06, Q.9
254   10, Section 4     4, 11/06, Q.10
255   13, Section 12    4, 11/06, Q.11
256   5, Section 3      4, 11/06, Q.12
257   12, Section 8     4, 11/06, Q.13
258   7, Section 5      4, 11/06, Q.14
259   5, Section 3      4, 11/06, Q.15
260   9, Section 5      4, 11/06, Q.16
261   7, Section 9      4, 11/06, Q.17
262   7, Section 8      4, 11/06, Q.18
263   10, Section 2     4, 11/06, Q.19
264   7, Section 5      4, 11/06, Q.20
265   13, Section 5     4, 11/06, Q.21
266   6, Section 15     4, 11/06, Q.22
267   10, Section 2     4, 11/06, Q.23
268   6, Section 6      4, 11/06, Q.24
269   2, Section 22     4, 11/06, Q.26
270   12, Section 2     4, 11/06, Q.27
271   Deleted
272   10, Section 6     4, 11/06, Q.29
273   8, Section 5      4, 11/06, Q.30
274   7, Section 5      4, 11/06, Q.31
275   13, Section 3     4, 11/06, Q.32
276   6, Section 11     4, 11/06, Q.33
277   6, Section 30     4, 11/06, Q.34
278   6, Section 5      4, 11/06, Q.35
279   2, Section 18     SOAM, 11/06, Q.6
280   3, Section 11     SOAM, 11/06, Q.7
281   2, Section 31     SOAM, 11/06, Q.20
282   3, Section 5      SOAM, 11/06, Q.21
283   1, Section 6      SOAM, 11/06, Q.22
284   2, Section 35     SOAM, 11/06, Q.29
285   1, Section 16     SOAM, 11/06, Q.30
286   2, Section 31     SOAM, 11/06, Q.31
287   3, Section 5      SOAM, 11/06, Q.32
288   1, Section 17     SOAM, 11/06, Q.39
289   3, Section 1      SOAM, 11/06, Q.40
Course 4 Sample Exam, 2000:
Question   Study Aid     Subject
2.    7, Sec. 5     Nelson-Aalen Estimator
3.    5, Sec. 3     Frequency, Maximum Likelihood
4.    10, Sec. 4    Gamma-Poisson
6.    7, Sec. 7     Log-Transformed Confidence Interval
7.    2, Sec. 12    Empirical Limited Expected Value
8.    6, Sec. 9     Methods of Moments, Grouped Data
9.    6, Sec. 12    Chi-Square
10.   No Longer on the Syllabus, Method of Scoring
11.   9, Sec. 5     Bayesian Analysis (4B, 11/95, Q.18)
13.   No Longer on the Syllabus, Excess Mortality
14.   13, Sec. 12   How Many Simulations to Run
15.   8, Sec. 5     Full Credibility, Aggregate Losses (4B, 11/97, Q.24)
17.   No Longer on the Syllabus, Cox Proportional Hazard
18.   6, Sec. 25    Maximum Likelihood, Truncated & Censored Data
19.   9, Sec. 13    Bayesian Analysis (4B, 11/95, Q.14)
20.   9, Sec. 13    Buhlmann Credibility (4B, 11/95, Q.15)
22.   7, Sec. 2     Kaplan-Meier Product Limit Estimator
23.   6, Sec. 16    Kolmogorov-Smirnov Statistic (4B, 5/97, Q.28)
24.   9, Sec. 9     Buhlmann Credibility
26.   No Longer on the Syllabus, Epanechnikov Kernel Smoothing
27.   No Longer on the Syllabus, Maximum Like., Markov Processes
28.   9, Sec. 8     Buhlmann Credibility
31.   12, Sec. 2    Nonparametric Estimation
32.   No Longer on the Syllabus, Modified Product Limit Estimator
33.   6, Sec. 30    Variance of Functions of Estimated Pars. (4B, 11/96, Q.16)
34.   10, Sec. 6    Beta-Bernoulli (4B, 5/97, Q.9)
37.   6, Sec. 11    Maximum Likelihood, Grouped Data
38.   13, Sec. 16   Bootstrapping
39.   11, Sec. 2    Semiparametric Estimation (4B, 11/97, Q.7)
Regression: 5, 12, 29, 30, 35, (40).
Time Series: 1, 16, 21, 25, 36.
Spring 2000, Course 4 Exam:
Question   Study Aid     Subject
2.    6, Sec. 7     Estimating Percentiles
3.    9, Sec. 9     Buhlmann Credibility
4.    7, Sec. 5     Nelson-Aalen Estimator
6.    2, Sec. 18    Expected Amounts per Loss and per Payment
7.    9, Sec. 5     Bayesian Analysis
8.    7, Sec. 5     Nelson-Aalen Estimator
10.   10, Sec. 12   Exponential-Inverse Exponential Conjugate Prior
11.   6, Sec. 16    Kolmogorov-Smirnov Statistic
12.   No Longer on the Syllabus, Log-rank Test
14.   6, Sec. 26    Properties of Estimators
15.   12, Sec. 2    Nonparametric Estimation
17.   13, Sec. 16   Bootstrapping
18.   9, Sec. 17    Minimizing Variance of a Combination of Estimators
19.   7, Sec. 6     Variance of Nelson-Aalen Estimator
21.   6, Sec. 23    Maximum Likelihood, Single Parameter Pareto
22.   9, Sec. 5     Bayesian Analysis
23.   No Longer on the Syllabus, Cox Proportional Hazard
25.   6, Sec. 30    Variance of Functions of Estimated Parameters
26.   8, Sec. 6     Classical Credibility Estimate of Aggregate Loss
27.   No Longer on the Syllabus, Product Limit Estimator, Right Truncation
29.   5, Sec. 4     Chi-Square, Frequency
30.   10, Sec. 4    Gamma-Poisson
32.   6, Sec. 8     Percentile Matching
33.   11, Sec. 2    Semiparametric Estimation
34.   6, Sec. 30    Covariance of Functions of Estimated Parameters
36.   6, Sec. 9     Method of Moments
37.   9, Sec. 10    Buhlmann Credibility
38.   7, Sec. 3     Variance of Product Limit Estimator
40.   No Longer on the Syllabus, Frequency, Accident Profiles
Regression: 1, 9, 16, 24, 31, 35.
Time Series: 5, 13, 20, 28, 39.
Fall 2000, Course 4 Exam:
Question   Study Aid     Subject
2     6, Sec. 9     Method of Moments
3     10, Sec. 2    Bayes Analysis
4     7, Sec. 2     Kaplan-Meier Product Limit Estimator
6     6, Sec. 10    Maximum Likelihood
7     11, Sec. 2    Semiparametric Estimation
8     No Longer on the Syllabus, Excess Mortality
10    6, Sec. 15    Schwarz Bayesian Criterion
11    10, Sec. 6    Beta-Bernoulli
13    6, Sec. 29    Covariance Matrix
14    8, Sec. 5     Full Credibility for Aggregate Losses
15    No Longer on the Syllabus, Cox Proportional Hazard
16    12, Sec. 2    Nonparametric Estimation
18    2, Sec. 18    Expected Amounts per Loss and per Payment
19    9, Sec. 9     Buhlmann Credibility
20    7, Sec. 6     Variance of Nelson-Aalen Estimator
22    6, Sec. 24    Maximum Likelihood for Censored Data
23    10, Sec. 9    Inverse Gamma-Exponential
24    No Longer on the Syllabus, Survival Function Confidence Bands
26    13, Sec. 16   Bootstrapping
27    12, Sec. 4    Nonparametric Estimation, Varying Exposures
28    9, Sec. 5     Bayes Analysis
29    No Longer on the Syllabus, Bayes Theorem and Dirichlet Distribution
32    9, Sec. 3     Covariances
33    9, Sec. 13    Bayes Analysis
34    6, Sec. 10    Maximum Likelihood
36    No Longer on the Syllabus, Turnbullʼs Modification of the Product-Limit Est.
38    9, Sec. 9     Buhlmann Credibility
39    6, Sec. 8     Percentile Matching
Regression: 5, 12, 21, 31, 35, 37.
Time Series: 1, 9, 17, 25, 30, 40.
Spring 2001, Course 4 Exam:
Question   Study Aid     Subject
2     10, Sec. 4    Gamma-Poisson
3     2, Sec. 3     Kurtosis
4     7, Sec. 2     Kaplan-Meier Product Limit Estimator
6     9, Sec. 8     Buhlmann Credibility vs. Classical Credibility
7     6, Sec. 24    Maximum Likelihood, Censored Data
8     No Longer on the Syllabus, Excess Mortality
10    9, Sec. 13    Bayes Analysis
11    9, Sec. 13    Buhlmann Credibility
12    6, Sec. 16    Kolmogorov-Smirnov Statistic
14    7, Sec. 6     Variance of Nelson-Aalen Estimator
15    No Longer on the Syllabus, Confidence Interval for the Mean Survival Time
16    6, Sec. 10    Maximum Likelihood
18    10, Sec. 2    Bayes Analysis
19    5, Sec. 4     Chi-Square Statistic
20    5, Sec. 5     Likelihood Ratio Test
22    No Longer on the Syllabus, Log-rank Test
23    9, Sec. 18    Normal Equations for Credibilities
25    6, Sec. 30    Delta Method
26    No Longer on the Syllabus, Right Truncated Data
28    9, Sec. 13    Bayes Analysis
29    No Longer on the Syllabus, Ranksum Test
30    6, Sec. 10    Maximum Likelihood
31    No Longer on the Syllabus, Cox Proportional Hazard
32    12, Sec. 4    Nonparametric Estimation, Varying Exposures
34    6, Sec. 22    Maximum Likelihood, Truncated Frequency Data
35    No Longer on the Syllabus, Biweight Kernel Smoothing
37    10, Sec. 2    Bayes Analysis
38    10, Sec. 2    Buhlmann Credibility
39    6, Sec. 9     Method of Moments
Regression: 5, 13, 21, 24, 33, 40.
Time Series: 1, 9, 17, 27, 36.
Fall 2001, Course 4 Exam:
Question   Study Aid     Subject
2     2, Sec. 14    Working with the Uniform Distribution
3     10, Sec. 4    Gamma-Poisson
4     7, Sec. 2     Kaplan-Meier Product Limit Estimator
6     6, Sec. 18    p-p plots
7     9, Sec. 4     Bayes Analysis
8     No Longer on the Syllabus, Cox Proportional Hazard
10    6, Sec. 22    Maximum Likelihood, Truncated Data
11    9, Sec. 9     Buhlmann Credibility
12    No Longer on the Syllabus, Right Truncated Data
14    9, Sec. 6     Bayes Analysis
15    8, Sec. 6     Classical Credibility
17    13, Sec. 12   How many Simulations to run
18    9, Sec. 10    Buhlmann Credibility
19    7, Sec. 2     Kaplan-Meier Product Limit Estimator
20    No Longer on the Syllabus, Excess Mortality
22    6, Sec. 28    Variance of Estimated Parameters
23    9, Sec. 13    Buhlmann Credibility
25    5, Sec. 4     Chi-Square
26    9, Sec. 9     Buhlmann Credibility
27    7, Sec. 4     Survival Analysis, Grouped Data
29    9, Sec. 3     Correlations
30    12, Sec. 4    Nonparametric Estimation, Varying Exposures
31    No Longer on the Syllabus, Log-rank Test
33    6, Sec. 9     Method of Moments
34    10, Sec. 4    Gamma-Poisson
36    2, Sec. 5     Limited Expected Value, Thinning Poissons
37    7, Sec. 7     Log-Transformed Confidence Interval
38    9, Sec. 9     Buhlmann Credibility
40    6, Sec. 10    Maximum Likelihood
Regression: 5, 13, 21, 28, 35.
Time Series: 1, 9, 16, 24, 32, 39.
Fall 2002, Course 4 Exam:
Question   Study Aid     Subject
2     6, Sec. 7     Empirical Estimate of Percentiles
3     10, Sec. 4    Gamma-Poisson
4     7, Sec. 5     Nelson-Aalen Estimator
6     5, Sec. 3     Maximum Likelihood, Negative Binomial
7     9, Sec. 11    Buhlmann Credibility as linear approximation to Bayes
8     7, Sec. 7     Log-Transformed Confidence Interval
10    6, Sec. 10    Maximum Likelihood, ungrouped data, Pareto
11    12, Sec. 2    Nonparametric Estimation, No Varying Exposures
13    2, Sec. 38    Moments of Mixed Exponential Distribution
14    8, Sec. 5     Classical Credibility, Standard for Full Credibility
15    No Longer on the Syllabus, Waldʼs Test
17    6, Sec. 16    Kolmogorov-Smirnov Statistic
18    9, Sec. 10    Buhlmann Credibility
19    No Longer on the Syllabus, Right Truncated Data
21    9, Sec. 6     Bayes Analysis
23    6, Sec. 11    Maximum Likelihood, grouped data
24    9, Sec. 6     Bayes Analysis
25    7, Sec. 2     Kaplan-Meier Product Limit Estimator
26    No Longer on the Syllabus, Log-rank Test
28    5, Sec. 4     Chi-Square
29    9, Sec. 9     Buhlmann Credibility
31    6, Sec. 26    Bias
32    9, Sec. 9     Buhlmann Credibility
33    7, Sec. 3     Confidence Interval for Percentile of the Survival Function
35    13, Sec. 16   Bootstrapping
36    8, Sec. 4     Variance of Aggregate Losses
37    6, Sec. 8     Percentile Matching
39    9, Sec. 5     Bayes Analysis
40    6, Sec. 24    Maximum Likelihood, data censored from above
Regression: 5, 12, 20, 27, 30, 38.
Time Series: 1, 9, 16, 22, 34.
Fall 2003, Course 4 Exam:
Question   Study Aid     Subject
2     6, Sec. 8     Percentile Matching
3     8, Sec. 5     Standard for Full Credibility
4     6, Sec. 6     Triangular Kernel Smoothing
6     6, Sec. 24    Maximum Likelihood
7     10, Sec. 7    Beta-Bernoulli
8     6, Sec. 9     Method of Moments
10    No Longer on the Syllabus, Ranksum Test
11    9, Sec. 10    Buhlmann Credibility
12    No Longer on the Syllabus, Cox Proportional Hazard
13    9, Sec. 3     Conditional Expected Values
14    9, Sec. 5     Bayes Analysis
15    12, Sec. 2    Nonparametric Estimation
16    5, Sec. 4     Chi-Square Test
18    6, Sec. 29    Information Matrix
19    9, Sec. 6     Bayes Analysis
21    7, Sec. 3     Variance of Kaplan-Meier Product Limit Estimator
22    7, Sec. 7     Log-Transformed Confidence Interval
23    9, Sec. 9     Buhlmann Credibility
24    6, Sec. 9     Method of Moments
26    13, Sec. 16   Bootstrapping
27    10, Sec. 4    Gamma-Poisson
28    6, Sec. 13    Likelihood Ratio Test
30    6, Sec. 32    Minimum Modified Chi-Square, Not on the Syllabus
31    9, Sec. 6     Bayes Analysis
32    1, Sec. 12    Frequency, Accident Profiles
34    6, Sec. 10    Maximum Likelihood
35    8, Sec. 6     Partial Credibility
37    2, Sec. 32    Grouped Data, Uniform Assumption
39    9, Sec. 5     Bayes Analysis
40    7, Sec. 5     Nelson-Aalen Estimator
Regression: 5, 9, 20, 29, 36.
Time Series: 1, 17, 25, 33, 38.
Fall 2004, Course 4 Exam:
Question   Study Aid     Subject
1.    10, Sec. 7    Beta-Bernoulli
2.    6, Sec. 7     Smoothed Empirical Estimates of Percentiles
4.    7, Sec. 2     Kaplan-Meier Product Limit Estimator
5.    9, Sec. 5     Bayes Analysis
6.    6, Sec. 10    Maximum Likelihood
8.    5, Sec. 2     Method of Moments and Percentile Matching
9.    9, Sec. 9     Buhlmann Credibility
10.   6, Sec. 12    Chi-Square
12.   7, Sec. 7     Log-Transformed Confidence Interval
13.   9, Sec. 2     Conditional Distributions & Mixtures
14.   6, Sec. 9     Method of Moments
16.   13, Sec. 17   Bootstrapping via Simulation
17.   12, Sec. 4    Empirical Bayes, Varying Exposures
18.   6, Sec. 23    Maximum Likelihood
20.   6, Sec. 6     Kernel Smoothing
21.   8, Sec. 2     Classical Credibility
22.   6, Sec. 19    Various Test Statistics
24.   6, Sec. 24    Method of Moments
25.   9, Sec. 9     Buhlmann Credibility
26.   6, Sec. 22    Maximum Likelihood
28.   No Longer on the Syllabus, Cox Proportional Hazard
29.   9, Sec. 10    Buhlmann Credibility
30.   6, Sec. 8     Percentile Matching
32.   6, Sec. 22    Maximum Likelihood
33.   9, Sec. 6     Bayes Analysis
34.   No Longer on the Syllabus, Generalized Linear Models
36.   7, Sec. 5     Nelson-Aalen Estimator
37.   11, Sec. 2    Semiparametric Estimation
38.   6, Sec. 16    Kolmogorov-Smirnov Statistic
40.   6, Sec. 26    Properties of Estimators
Regression: 3, 11, 19, 23, 27, 35.
Time Series: 7, 15, 31, 39.
Spring 2005, Exam 4/C:
Question   Study Aid     Subject
1     6, Sec. 16    Kolmogorov-Smirnov Statistic
2     8, Sec. 5     Classical Credibility, Full Credibility for Aggregate Losses
3     7, Sec. 5     Nelson-Aalen Estimator
4     13, Sec. 16   Bootstrapping
5     6, Sec. 18    p-p plots
6     10, Sec. 2    Buhlmann Credibility
7     7, Sec. 4     Large Sample Approximation to the Kaplan-Meier Estimator
8     No Longer on the Syllabus, Cubic Splines and Curvature
9     6, Sec. 30    Maximum Likelihood
10    6, Sec. 30    Delta Method
11    9, Sec. 10    Buhlmann Credibility
12    13, Sec. 4    Simulating a Binomial Distribution
13    9, Sec. 7     Mixtures of Frequency Distributions
14    10, Sec. 2    Bayes Analysis
15    7, Sec. 6     Nelson-Aalen Estimator, linear confidence interval
16    6, Sec. 26    Bias and Mean Squared Error
17    9, Sec. 10    Buhlmann Credibility
18    No Longer on the Syllabus, Cox proportional hazards model, loglikelihood
19    6, Sec. 19    Various tests of fits
20    9, Sec. 9     Buhlmann Credibility
21    10, Sec. 4    Gamma-Poisson
22    6, Sec. 6     Kernel Smoothing
23    No Longer on the Syllabus, Cubic Splines
24    6, Sec. 9     Method of Moments
25    12, Sec. 4    Empirical Bayes, Varying Exposures
26    6, Sec. 5     Ogives
27    6, Sec. 25    Maximum Likelihood
28    11, Sec. 2    Semiparametric Estimation
29    No Longer on the Syllabus, Cox proportional hazards model, maximum likelihood
30    No Longer on the Syllabus, Cubic Splines
31    6, Sec. 24    Maximum Likelihood
32    9, Sec. 8     Buhlmann Credibility versus Bayes Analysis
33    6, Sec. 12    Chi-Square
34    13, Sec. 5    Simulating a LogNormal Distribution
35    9, Sec. 5     Bayes Analysis
Fall 2005, Exam 4/C:
Question   Study Aid     Subject
1     7, Sec. 1     Empirical Distribution
2     10, Sec. 4    Gamma-Poisson
3     6, Sec. 8     Percentile Matching
4     No Longer on the Syllabus, Cubic Splines
5     7, Sec. 2     Product Limit Estimator and Maximum Likelihood
6     7, Sec. 8     Maximum Likelihood
7     9, Sec. 10    Buhlmann Credibility
8     13, Sec. 11   Simulating Aggregate Losses
9     6, Sec. 6     Kernel Smoothing
10    5, Sec. 4     Chi-Square
11    12, Sec. 7    Empirical Bayes, A Priori Mean and Varying Exposures
12    No Longer on the Syllabus, Squared Norm Smoothness
13    No Longer on the Syllabus, Cox Proportional Hazards Model
14    6, Sec. 30    Delta Method
15    9, Sec. 5     Bayes Analysis
16    13, Sec. 12   Number of Simulations Needed
17    7, Sec. 6     Nelson-Aalen Estimator
18    6, Sec. 28    Asymptotic Variance of Maximum Likelihood
19    10, Sec. 2    Buhlmann Credibility
20    7, Sec. 6     Delta Method
21    6, Sec. 9     Method of Moments
22    12, Sec. 8    Empirical Bayes with Poisson Assumption
23    No Longer on the Syllabus, Cubic Splines
24    No Longer on the Syllabus, Cox Proportional Hazards Model
25    6, Sec. 13    Likelihood Ratio Test
26    9, Sec. 17    Covariance Structure Underlying Buhlmann Credibility
27    13, Sec. 5    Simulating a LogNormal
28    6, Sec. 26    Mean Squared Error
29    5, Sec. 3     Method of Moments and Maximum Likelihood
30    11, Sec. 2    Semiparametric Estimation
31    6, Sec. 18    p-p plots and difference graphs
32    9, Sec. 6     Bayes Analysis
33    6, Sec. 5     Ogive
34    6, Sec. 19    Chi-Square, Kolmogorov-Smirnov, Anderson-Darling
35    8, Sec. 5     Classical Credibility
Fall 2006, Exam 4/C:
Question   Study Aid     Subject
1     6, Sec. 8     Percentile Matching
2     10, Sec. 2    Bayes Analysis
3     2, Sec. 3     Skewness
4     13, Sec. 11   Simulation of Aggregate Losses
5     6, Sec. 24    Maximum Likelihood
6     9, Sec. 8     Buhlmann Credibility
7     7, Sec. 3     Greenwoodʼs Approximation
8     No Longer on the Syllabus, Lagrange Interpolation
9     10, Sec. 7    Beta-Bernoulli
10    10, Sec. 4    Gamma-Poisson
11    13, Sec. 12   Number of Simulations
12    5, Sec. 3     Maximum Likelihood
13    12, Sec. 8    Empirical Bayes and Semiparametric Estimation
14    7, Sec. 5     Nelson-Aalen
15    5, Sec. 3     Variance of Maximum Likelihood Estimator
16    9, Sec. 5     Bayes Analysis
17    7, Sec. 9     Double Decrement
18    7, Sec. 8     Maximum Likelihood
19    10, Sec. 2    Buhlmann Credibility
20    7, Sec. 5     Nelson-Aalen
21    13, Sec. 5    Simulation
22    6, Sec. 15    Schwarz Bayesian Criterion
23    10, Sec. 2    Bayes Analysis and Buhlmann Credibility
24    6, Sec. 6     Uniform Kernel
25    No Longer on the Syllabus, Cubic Spline, Extrapolation
26    2, Sec. 22    Exponential Distribution
27    12, Sec. 2    Empirical Bayes
28    No Longer on the Syllabus, Cox Proportional Hazard Model
29    10, Sec. 7    Beta-Binomial
30    8, Sec. 5     Classical Credibility
31    7, Sec. 5     Nelson-Aalen
32    13, Sec. 3    Simulation
33    6, Sec. 11    Maximum Likelihood, Grouped Data
34    6, Sec. 30    Delta Method
35    6, Sec. 5     Ogive
May 2007, Exam 4/C:
Question   Study Aid     Subject
1     6, Sec. 25    Maximum Likelihood, Truncated and Censored Data
2     9, Sec. 8     Buhlmann Credibility
3     10, Sec. 7    Beta-Bernoulli
4     No Longer on the Syllabus, LogNormal Model of Stock Prices
5     6, Sec. 12    Chi-Square
6     10, Sec. 2    Buhlmann Credibility
7     2, Sec. 14    Using the Uniform Distribution, Second Limited Moment
8     3, Sec. 7     Aggregate Distributions
9     13, Sec. 11   Simulation of Aggregate Losses
10    6, Sec. 9     Method of Moments and Percentile Matching
11    12, Sec. 2    Empirical Bayes
12    7, Sec. 3     Survival Analysis, Greenwoodʼs Approximation
13    2, Sec. 32    Variance of Excess Losses
14    6, Sec. 13    Likelihood Ratio Test
15    10, Sec. 7    Beta-Binomial
16    6, Sec. 6     Kernel Smoothing
17    3, Sec. 5     Variance of Aggregate Distribution
18    5, Sec. 3     Maximum Likelihood
19    No Longer on the Syllabus, Simulating Stock Prices
20    6, Sec. 16    Kolmogorov-Smirnov
May 2007, Exam 4/C, continued:
Question   Study Aid     Subject
21    9, Sec. 8     Buhlmann Credibility
22    No Longer on the Syllabus, Cox Model fit via Maximum Likelihood
23    No Longer on the Syllabus, Ruin Theory
24    6, Sec. 8     Percentile Matching
25    11, Sec. 2    Semiparametric Estimation
26    7, Sec. 4     Survival Analysis, Grouped Data
27    4, Sec. 5     Risk Measures, Distortion Functions, No Longer on the Syllabus
28    6, Sec. 8     Percentile Matching, Empirical Distribution Function
29    13, Sec. 16   Bootstrapping
30    10, Sec. 9    Inverse Gamma-Exponential
31    6, Sec. 22    Maximum Likelihood, Truncated Data
32    9, Sec. 17    Buhlmann Credibility
33    7, Sec. 6     Survival Analysis, Nelson-Aalen
34    No Longer on the Syllabus, Estimating the Annualized Expected Return on a Stock
35    9, Sec. 4     Bayes Theorem
36    9, Sec. 9     Buhlmann Credibility
37    13, Sec. 12   Number of Simulations Needed
38    7, Sec. 2     Survival Analysis, Kaplan-Meier
39    2, Sec. 24    Weibull Distribution and Negative Binomial Distribution
40    5, Sec. 2     Confidence Interval for a Fitted Parameter
Course 3 Sample Exam 2000:  Q.5: 2, Sec. 18;  Q.10: 2, Sec. 38;  Q.12: 1, Sec. 19;  Q.14: 3, Sec. 11;  Q.15: 3, Sec. 11;  Q.17: 2, Sec. 35;  Q.18: 2, Sec. 36;  Q.20: 3, Sec. 5;  Q.25: 3, Sec. 5;  Q.41: 3, Sec. 7;  Q.42: 3, Sec. 7.
Course 3 Spring 2000:  Q.4: 1, Sec. 19;  Q.8: 2, Sec. 24;  Q.11: 3, Sec. 11;  Q.16: 3, Sec. 5;  Q.17: 2, Sec. 39;  Q.19: 3, Sec. 5;  Q.22: 13, Sec. 7;  Q.25: 2, Sec. 31;  Q.30: 2, Sec. 36;  Q.32: 13, Sec. 5;  Q.37: 1, Sec. 14.
Course 3 Fall 2000:  Q.2: 1, Sec. 16;  Q.8: 3, Sec. 5;  Q.13: 1, Sec. 17;  Q.21: 2, Sec. 32;  Q.27: 2, Sec. 31;  Q.31: 2, Sec. 14;  Q.32: 3, Sec. 5;  Q.41: 2, Sec. 36;  Q.42: 2, Sec. 36;  Q.43: 13, Sec. 10;  Q.44: 13, Sec. 10.
Course 3 Spring 2001:  Q.3: 1, Sec. 19;  Q.8: 13, Sec. 6;  Q.11: 13, Sec. 4;  Q.12: 13, Sec. 7;  Q.15: 1, Sec. 19;  Q.16: 1, Sec. 16;  Q.19: 3, Sec. 11;  Q.24: 2, Sec. 24;  Q.25: 1, Sec. 11;  Q.26: 3, Sec. 1;  Q.28: 2, Sec. 39;  Q.29: 3, Sec. 5;  Q.30: 3, Sec. 11;  Q.36: 1, Sec. 16.
Course 3 Fall 2001:  Q.6: 2, Sec. 36;  Q.7: 3, Sec. 5;  Q.13: 13, Sec. 4;  Q.18: 3, Sec. 11;  Q.27: 1, Sec. 19;  Q.28: 2, Sec. 38;  Q.30: 1, Sec. 16;  Q.32: 13, Sec. 7;  Q.35: 2, Sec. 8;  Q.36: 3, Sec. 1;  Q.37: 2, Sec. 23.
Course 3 Fall 2002:  Q.5: 1, Sec. 19;  Q.6: 3, Sec. 5;  Q.10: 13, Sec. 14;  Q.16: 3, Sec. 11;  Q.19: 13, Sec. 7;  Q.27: 1, Sec. 16;  Q.28: 1, Sec. 11;  Q.33: 2, Sec. 13;  Q.36: 3, Sec. 7;  Q.37: 2, Sec. 31.
CAS Fall 2003, Exam 3:  Q.5: 2, Sec. 13;  Q.12: 9, Sec. 5 (Bayes Analysis);  Q.13: 9, Sec. 5 (Bayes Analysis);  Q.14: 1, Sec. 3;  Q.15: 1, Sec. 19;  Q.16: 2, Sec. 30;  Q.17: 2, Sec. 22;  Q.18: 1, Sec. 6;  Q.19: 2, Sec. 34;  Q.20: 2, Sec. 37;  Q.21: 2, Sec. 31;  Q.22: 2, Sec. 18;  Q.23: 2, Sec. 23;  Q.24: 3, Sec. 5;  Q.25: 3, Sec. 5;  Q.38: 13, Sec. 4;  Q.39: 13, Sec. 4;  Q.40: 13, Sec. 4.
SOA Fall 2003, Exam 3:  Q.3: 2, Sec. 31;  Q.4: 3, Sec. 5;  Q.5: 13, Sec. 11;  Q.6: 13, Sec. 7;  Q.18: 2, Sec. 38;  Q.19: 3, Sec. 1;  Q.28: 2, Sec. 32;  Q.29: 2, Sec. 35;  Q.33: 3, Sec. 5;  Q.34: 2, Sec. 22;  Q.39: 2, Sec. 13;  Q.40: 13, Sec. 15.
CAS Spring 2004, Exam 3:  Q.16: 1, Sec. 4;  Q.17: 2, Sec. 36;  Q.19: 3, Sec. 5;  Q.20: 2, Sec. 22;  Q.21: 2, Sec. 24;  Q.22: 3, Sec. 5;  Q.26: 1, Sec. 16;  Q.28: 1, Sec. 8;  Q.29: 2, Sec. 36;  Q.30: 13, Sec. 4;  Q.32: 1, Sec. 11;  Q.33: 2, Sec. 37;  Q.34: 2, Sec. 36;  Q.35: 2, Sec. 15;  Q.37: 3, Sec. 3;  Q.38: 3, Sec. 5;  Q.39: 3, Sec. 5;  Q.40: 3, Sec. 7.
CAS Fall 2004, Exam 3:  Q.7: 2, Sec. 34;  Q.21: 1, Sec. 6;  Q.22: 1, Sec. 3;  Q.23: 1, Sec. 4;  Q.24: 1, Sec. 3;  Q.25: 2, Sec. 24;  Q.26: 2, Sec. 12;  Q.27: 2, Sec. 34;  Q.28: 2, Sec. 38;  Q.29: 2, Sec. 38;  Q.30: 2, Sec. 37;  Q.31: 3, Sec. 5;  Q.32: 3, Sec. 5;  Q.33: 2, Sec. 36;  Q.37: 13, Sec. 11;  Q.38: 13, Sec. 4;  Q.39: 13, Sec. 3;  Q.40: 13, Sec. 2.
SOA Fall 2004, Exam 3:  Q.5: 13, Sec. 14;  Q.6: 13, Sec. 11;  Q.7: 2, Sec. 31;  Q.8: 1, Sec. 3;  Q.15: 3, Sec. 5;  Q.17: 3, Sec. 1;  Q.18: 2, Sec. 36;  Q.24: 2, Sec. 33;  Q.32: 1, Sec. 17;  Q.33: 13, Sec. 10;  Q.34: 13, Sec. 11.
CAS Spring 2005 Exam 3: Question
Study Aid
Subject
4.
2, Sec. 33
Mean Excess Loss
6.
3, Sec. 1
Aggregate Losses
8.
3, Sec. 5
Variance of Aggregate Distribution
9.
3, Sec. 5
Variance of Aggregate Distribution
10.
1, Sec. 16
Gamma-Poisson
15.
1, Sec. 3
Binomial Distribution
16.
1, Sec. 11
Adding Frequency Distributions
17.
10, Sec. 2
Bayes Analysis on Poisson Processes
18.
6, Sec. 10
Maximum Likelihood
19.
6, Sec. 9
Method of Moments
20.
5, Sec. 3
Maximum Likelihood
21.
6, Sec. 26
Properties of Estimators
24.
6, Sec. 14
Hypothesis Testing
28.
1, Sec. 6
Negative Binomial Distribution
30.
2, Sec. 34
Hazard Rate
35.
2, Sec. 24
Pareto Distribution
40.
3, Sec. 5
Variance of Aggregate Distribution
SOA Spring 2005 Exam M: Question
Study Aid
Subject
9.
2, Sec. 33
Mean Excess Loss
10.
2, Sec. 39
Mixing a Normal Distribution via another Normal
17.
3, Sec. 5
Variance of Aggregate Distribution
18.
3, Sec. 11
Stop Loss Premium
19.
1, Sec. 11
(a, b, 0) Class of Frequency Distributions
31.
3, Sec. 5
Variance of Aggregate Distribution
32.
2, Sec. 15
Payment per Payment Variable
34.
2, Sec. 38
Mixed Distributions
39.
10, Sec. 2
Bayes Analysis on Poisson Processes
40.
3, Sec. 5
Variance of Aggregate Distribution
CAS Fall 2005, Exam 3: Question
Study Aid
Subject
1.
6, Sec. 10
Maximum Likelihood and Method of Moments
4.
5, Sec. 3
Maximum Likelihood
10.
2, Sec. 33
Mean Residual Life (intended as Life Contingencies Q.)
11.
2, Sec. 34
Hazard Rates (intended as Life Contingencies Q.)
19.
2, Sec. 29
Creating Additional Distributions
20.
2, Sec. 22
Payment per Loss, Exponential Distribution
21.
2, Sec. 36
Inflation
22.
2, Sec. 31
Agentʼs Bonus
24.
1, Sec. 4
Thinning a Poisson Distribution
30.
3, Sec. 5
Variance of Aggregate Distribution
32.
2, Sec. 38
Two-point Mixture
33.
2, Sec. 36
Inflation, Loss Elimination for a Franchise Deductible
34.
3, Sec. 5
LogNormal Approximation to an Aggregate Distribution
SOA Fall 2005, Exam M: Question
Study Aid
Subject
8.
2, Sec. 24
Exponential Distribution and Gamma Distribution
13.
2, Sec. 34
Hazard Rates (intended as Life Contingencies Q.)
14.
2, Sec. 31
Limited Expected Values
17.
2, Sec. 39
Inverse Gamma - Exponential
18.
1, Sec. 16
Compound Frequency Distribution, Normal Approximation
19.
3, Sec. 11
Stop Loss Premium
20.
2, Sec. 39
Continuous Mixtures (intended as Life Contingencies Q.)
26.
2, Sec. 18
Average Payment per Payment
27.
1, Sec. 15
Probability Generating Functions of Compound Distribs.
28.
2, Sec. 36
Inflation
32.
2, Sec. 38
Two-Point Mixtures (intended as Life Contingencies Q.)
34.
3, Sec. 5
Aggregate Distributions
35.
2, Sec. 40
Splices
38.
3, Sec. 5
Variance of Aggregate Distribution
39.
1, Sec. 16
Variance of Compound Frequency Distribution
40.
3, Sec. 5
Variance of Compound Poisson
CAS Spring 2006 Exam 3: Question
Study Aid
Subject
10. 11.
2, Sec. 34 2, Sec. 34
Hazard Rate, Mean Residual Life (intended as Life Con. Q.) Hazard Rate
16. 25.
2, Sec. 34 2, Sec. 24
Hazard Rate (intended as Life Contingencies Q.) Skewness of a Pareto Distribution
26. 27.
2, Sec. 36 2, Sec. 29
Inflation Creating Additional Distributions
28. 30.
2, Sec. 37 9, Sec. 5
Lee Diagram, Inflation Bayes Analysis
31. 32.
1, Sec. 11 1, Sec. 6
(a, b, 0) class Negative Binomial Distribution, change of exposures
35. 36.
1, Sec. 16 2, Sec. 24
Compound Frequency Distribution Gamma Distribution, Mode
37. 38.
2, Sec. 31 2, Sec. 33
Average Size of Losses within a Size Interval Mean Residual Life
39.
2, Sec. 36
Inflation
CAS Fall 2006 Exam 3:  Q.18: 2, Sec. 40;  Q.19: 1, Sec. 17;  Q.20: 2, Sec. 38;  Q.23: 1, Sec. 6;  Q.24: 1, Sec. 6;  Q.25: 1, Sec. 6;  Q.29: 3, Sec. 5;  Q.30: 2, Sec. 36;  Q.31: 1, Sec. 10;  Q.32: 1, Sec. 4.
SOA Fall 2006 Exam M:  Q.6: 2, Sec. 18;  Q.7: 3, Sec. 11;  Q.20: 2, Sec. 31;  Q.21: 3, Sec. 5;  Q.22: 1, Sec. 6;  Q.29: 2, Sec. 35;  Q.30: 1, Sec. 16;  Q.31: 2, Sec. 31;  Q.32: 3, Sec. 5;  Q.39: 1, Sec. 17;  Q.40: 3, Sec. 1.
CAS Fall 2007 Exam 3:  Q.7: 6, Sec. 14 (Hypothesis Testing);  Q.30: 2, Sec. 41 (Life Contingencies).
CAS Fall 2008 Exam 3L: 12.
7, Sec. 1
Force of Mortality
Mahlerʼs Guide to Fitting Frequency Distributions
Joint Exam 4/C
prepared by Howard C. Mahler, FCAS
Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-5
Howard Mahler
[email protected]
www.howardmahler.com/Teaching
Mahlerʼs Guide to Fitting Frequency Distributions Copyright 2013 by Howard C. Mahler. This Study Aid will review what a student needs to know about fitting frequency distributions in Loss Models.1 Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (or sections whose title is in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Highly Recommended problems are double underlined. Recommended problems are underlined. Solutions to the problems in each section are at the end of that section. Note that problems include both some written by me and some from past exams.2 The latter are copyright by the Casualty Actuarial Society and the SOA and are reproduced here solely to aid students in studying for exams.3
Section #   Pages      Section Name
1           3-5        Introduction
2           6-27       Method of Moments
3           28-79      Method of Maximum Likelihood
4           80-134     Chi-Square Test
5           135-152    Likelihood Ratio Test
6           153-167    Fitting to the (a, b, 1) class
7           168-170    Important Formulas and Ideas

1 CAS Exam 3 and SOA Exam M used to include material preliminary to that on joint Exam 4/C, which is summarized in the Introduction to this study guide and covered in more detail in “Mahlerʼs Guide to Frequency Distributions.”
2 In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam questions, but it will not be specifically tested.
3 The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Course 4 Exam Questions by Section of this Study Aid4

Section    Sample   5/00   11/00   5/01   11/01   11/02   11/03   11/04
1
2                                                                   8
3          3                                       6
4                   29              19      25     28       16
5                                   20
6

Section    5/05   11/05   11/06    5/07
1
2                                   40
3                  29     12, 15   18
4                  10
5
6

The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07, 5/08, 11/08, and 5/09 exams. Fitting to the (a, b, 1) distributions was added to the syllabus for the Fall 2009 exam.
4 Excluding any questions that are no longer on the syllabus.
Section 1, Introduction One can be asked many of the same questions about fitting Frequency Distributions as one can about fitting size of Loss Distributions. However, the Syllabus is limited to fitting the three common frequency distributions: Binomial, Poisson, and Negative Binomial. These are the members of the (a,b,0) class of frequency distributions. This includes special cases such as the Bernoulli (Binomial with m =1), Binomial with m fixed, Geometric (Negative Binomial with r =1), and Negative Binomial for r fixed. The two methods of fitting one needs to know how to apply to Frequency Distributions are the Method of Moments and the Method of Maximum Likelihood. In addition one has to know how to apply the Chi-Square statistic to test a fit and how to use the loglikelihood to compare fits. All of these topics are covered in more detail in “Mahlerʼs Guide to Fitting Loss Distributions.” Here is a summary of each of the three common frequency distributions.5
Binomial Distribution
Support: x = 0, 1, 2, 3, ..., m.      Parameters: 1 > q > 0, m ≥ 1. m = 1 is a Bernoulli Distribution.
P.d.f.: f(x) = [m! / {x! (m-x)!}] q^x (1-q)^(m-x) = (m choose x) q^x (1-q)^(m-x).
Mean = mq.      Variance = mq(1-q).
Mode = largest integer in mq + q (if mq + q is an integer, then f(mq + q) = f(mq + q - 1) and both mq + q and mq + q - 1 are modes.)
Probability Generating Function: P(z) = {1 + q(z-1)}^m.      f(0) = (1-q)^m.
f(x+1)/f(x) = a + b/(x+1), with a = -q/(1-q) and b = (m+1) q/(1-q).
The sum of m independent Bernoullis with the same q is a Binomial with parameters m and q. The sum of two independent Binomials with parameters (m1, q) and (m2, q) is also Binomial with parameters m1 + m2 and q.5
5 See “Mahlerʼs Guide to Frequency Distributions”, or Loss Models.
Poisson Distribution
Support: x = 0, 1, 2, 3, ...      Parameters: λ > 0.
P.d.f.: f(x) = λ^x e^(-λ) / x!.
Mean = λ.      Variance = λ.
Mode = largest integer in λ (if λ is an integer then f(λ) = f(λ-1) and both λ and λ-1 are modes.)
Probability Generating Function: P(z) = e^{λ(z-1)}, λ > 0.      f(0) = e^(-λ).
f(x+1)/f(x) = a + b/(x+1), with a = 0 and b = λ.
The sum of two independent variables each of which is Poisson with parameters λ1 and λ2 is also Poisson, with parameter λ1 + λ2. If frequency is given by a Poisson and severity is independent of frequency, then the number of claims above a certain amount (in constant dollars) is also a Poisson. This is an example of thinning a Poisson Distribution. If claims are from a Poisson Process, and one divides these claims into subsets in a manner independent of the frequency process, then the claims in each subset are independent Poisson Processes.
Negative Binomial Distribution
Support: x = 0, 1, 2, 3, ...      Parameters: β > 0, r ≥ 0. r = 1 is a Geometric Distribution.
P.d.f.: f(x) = (x+r-1 choose x) β^x / (1+β)^(x+r) = [r(r+1)...(r+x-1) / x!] β^x / (1+β)^(x+r).
Mean = rβ.      Variance = rβ(1+β).
Mode = largest integer in (r-1)β (if (r-1)β is an integer, then both (r-1)β and (r-1)β - 1 are modes.)
Probability Generating Function: P(z) = {1 - β(z-1)}^(-r).      f(0) = 1/(1+β)^r.
f(x+1)/f(x) = a + b/(x+1), with a = β/(1+β) and b = (r-1) β/(1+β).
The sum of r independent Geometric Distributions with the same β is a Negative Binomial Distribution with parameters r and β. If X is Negative Binomial(r1, β) and Y is Negative Binomial(r2, β), with X and Y independent, then X + Y is Negative Binomial(r1 + r2, β).
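All three distributions satisfy the (a, b, 0) recursion f(x+1) = {a + b/(x+1)} f(x) shown above. As a quick illustration, here is a minimal Python sketch (mine, not part of the original study guide) that builds the probabilities of each distribution from f(0) using that recursion; the parameter values are simply the ones used in the examples later in this guide.

```python
import math

def ab0_probabilities(a, b, f0, n_max):
    """Return [f(0), f(1), ..., f(n_max)] using the (a, b, 0) recursion."""
    probs = [f0]
    for x in range(n_max):
        probs.append(probs[-1] * (a + b / (x + 1)))
    return probs

# Poisson(lambda = 0.3346): a = 0, b = lambda, f(0) = e^(-lambda).
lam = 0.3346
poisson = ab0_probabilities(0.0, lam, math.exp(-lam), 8)

# Negative Binomial(r = 1.455, beta = 0.230): a = beta/(1+beta), b = (r-1)beta/(1+beta).
r, beta = 1.455, 0.230
neg_bin = ab0_probabilities(beta / (1 + beta), (r - 1) * beta / (1 + beta),
                            (1 + beta) ** -r, 8)

# Binomial(m = 7, q = 0.204): a = -q/(1-q), b = (m+1)q/(1-q), f(0) = (1-q)^m.
m, q = 7, 0.204
binom = ab0_probabilities(-q / (1 - q), (m + 1) * q / (1 - q), (1 - q) ** m, 7)
```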
Section 2, Method of Moments

One can fit a type of distribution to data via the Method of Moments, by finding that set of parameters such that the moments (about the origin) match the observed moments. If one has a single parameter, such as in the case of the Poisson Distribution, then one matches the observed mean to the theoretical mean of the distribution. In the case of two parameters, one matches the first two moments, or equivalently one matches the mean and variance.

Poisson Distribution:

Fitting via the method of moments is easy for the Poisson; one merely sets the single parameter λ equal to the observed mean X̄: fitted λ = X̄.

Exercise: Assume one has observed insureds and gotten the following distribution of insureds by number of claims:
Number of Claims:    0       1      2      3    4   5   6   7   8   All
Number of Insureds:  17,649  4,829  1,106  229  44  9   4   1   1   23,872
Fit this data to a Poisson Distribution via the Method of Moments.
[Solution: By taking the average value of the number of claims, one can calculate that the first moment is:
X̄ = {(0)(17,649) + (1)(4829) + (2)(1106) + (3)(229) + (4)(44) + (5)(9) + (6)(4) + (7)(1) + (8)(1)} / 23,872 = 0.3346.
Thus the fitted Poisson has λ = 0.3346.]
As discussed in the following section, for the Poisson the Method of Maximum Likelihood equals the Method of Moments.
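The arithmetic is easy to mechanize. Below is a minimal Python sketch (mine, not the study guide's) that computes the sample mean from the claim-count table in the exercise above and uses it as the method-of-moments (and maximum likelihood) estimate of λ, along with the fitted expected counts.

```python
import math

# Observed number of insureds by number of claims (from the exercise above).
counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}

n = sum(counts.values())                          # 23,872 insureds
mean = sum(k * v for k, v in counts.items()) / n  # first moment, about 0.3346

lambda_hat = mean  # method of moments (= maximum likelihood) for the Poisson

# Expected number of insureds at each claim count under the fitted Poisson.
fitted = {k: n * math.exp(-lambda_hat) * lambda_hat**k / math.factorial(k)
          for k in counts}
```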
Binomial Distribution, m fixed:

If, as is common when fitting a Binomial, m is taken as fixed, then there is only one parameter q, and solving via the method of moments for the Binomial is easy: fitted q = X̄ / m.
As discussed in the following section, for m fixed, this is also the Method of Maximum Likelihood solution for q.

Exercise: Assume one has observed insureds and got the following distribution of insureds by number of claims:
Number of Claims:    0    1    2    3    4   5  6  7&+  All
Number of Insureds:  208  357  274  126  31  3  1  0    1000
Fit this data to a Binomial Distribution with m = 6 via the Method of Moments.
[Solution: By taking the average value of the number of claims, one can calculate that the first moment is: X̄ = 1428/1000 = 1.428. Then the fitted q = 1.428/6 = 0.238.
Comment: If this data is from a Binomial Distribution, then we know that m ≥ 6, since for the Binomial Distribution x ≤ m.]

Binomial Distribution:

Fitting via the method of moments for the Binomial Distribution consists of writing two equations for the two parameters m and q in order to match the mean and variance:
Mean = X̄ = mq.      Variance = E[X²] - X̄² = mq(1-q).
Solving for m and q gives:
m = X̄² / (X̄² + X̄ - E[X²]),      q = X̄ / m.
Since m has to be integer, one usually needs to round the fitted m to the nearest integer and then take q = X̄ / m. (After rounding m, the observed and fitted second moments will no longer be equal.)
Exercise: Assume one has observed insureds and got the following distribution of insureds by number of claims:
Number of Claims:    0    1    2    3    4   5  6  7&+  All
Number of Insureds:  208  357  274  126  31  3  1  0    1000
Fit this data to a Binomial Distribution via the Method of Moments.
[Solution: By taking the average value of the number of claims, one can calculate that the first moment is: 1428/1000 = 1.428. By taking the average value of the square of the number of claims, one can calculate that the 2nd moment is: 3194/1000 = 3.194.
Thus the fitted Binomial has m = X̄² / (X̄² + X̄ - E[X²]) = 1.428² / (1.428² + 1.428 - 3.194) = 7.465.
Rounding m to the nearest integer, m = 7. Then q = 1.428/7 = 0.204.]

One can compare the Binomial distribution fitted via method of moments, with m = 7 and q = 0.204, to the observed distribution:

Number of Claims    Observed    Method of Moments Binomial
0                   208         202.5
1                   357         363.3
2                   274         279.3
3                   126         119.3
4                   31          30.6
5                   3           4.7
6                   1           0.4
7                   0           0.01
Sum                 1000.0      1000.0

For example, f(1) = 7 (1 - 0.204)^6 (0.204) = 0.36325. 1000 f(1) = 363.3.
The fit via method of moments seems to be a reasonable first approximation. How to test the fit will be discussed in a subsequent section.
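Here is a minimal Python sketch (mine, not the study guide's) of the two-parameter Binomial method-of-moments fit above, including the rounding of m and the fitted expected counts used in the comparison table.

```python
from math import comb

# Observed insureds by number of claims (from the exercise above).
counts = {0: 208, 1: 357, 2: 274, 3: 126, 4: 31, 5: 3, 6: 1}
n = sum(counts.values())

m1 = sum(k * v for k, v in counts.items()) / n        # first moment, 1.428
m2 = sum(k * k * v for k, v in counts.items()) / n    # second moment, 3.194

m_fit = m1**2 / (m1**2 + m1 - m2)   # about 7.465 before rounding
m_int = round(m_fit)                # m must be an integer: 7
q_fit = m1 / m_int                  # 0.204

# Fitted expected counts, e.g. fitted[1] is about 363.3.
fitted = {k: n * comb(m_int, k) * q_fit**k * (1 - q_fit)**(m_int - k)
          for k in range(m_int + 1)}
```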
Negative Binomial, r fixed:

If one takes r as fixed in the Negative Binomial, then solving via the method of moments is straightforward: X̄ = rβ, so the fitted β = X̄ / r.

Exercise: For r = 1.5, fit the following data to a Negative Binomial Distribution via the method of moments.
Number of Claims:    0       1      2      3    4   5  6  7  8  All
Number of Insureds:  17,649  4,829  1,106  229  44  9  4  1  1  23,872
[Solution: X̄ = 0.3346. The fitted β = X̄/r = 0.3346/1.5 = 0.223.]
As discussed in the following section, for r fixed, this is also the Method of Maximum Likelihood solution for β.

Negative Binomial:

Assume one has the following distribution of insureds by number of claims:
Number of Claims:    0       1      2      3    4   5  6  7  8  All
Number of Insureds:  17,649  4,829  1,106  229  44  9  4  1  1  23,872
By taking the average of the number of claims, as was calculated, the first moment is 0.3346. By taking the average of the square of the number of claims observed for each insured, one can calculate that the second moment is:
{(0²)(17,649) + (1²)(4829) + (2²)(1106) + (3²)(229) + (4²)(44) + (5²)(9) + (6²)(4) + (7²)(1) + (8²)(1)} / 23,872 = 0.5236.
Thus the estimated variance is: 0.5236 - 0.3346² = 0.4116. Since the estimated variance is greater than the estimated mean, it might make sense to fit a Negative Binomial Distribution to this data. Using the method of moments one would try to match the first two moments by fitting the two parameters of the Negative Binomial Distribution, r and β.

Exercise: Fit the above data to a Negative Binomial Distribution via the Method of Moments.
[Solution: One can write down two equations in two unknowns, by matching the mean and the variance: X̄ = rβ. Variance = rβ(1+β).
⇒ 1 + β = Variance / X̄ = 0.4116/0.3346 = 1.230. ⇒ β = 0.230. ⇒ r = X̄/β = 0.3346/0.230 = 1.455.]
In general for the Negative Binomial, the method of moments consists of writing two equations for the two parameters r and β by matching the first two moments, or equivalently, one can match the mean and variance: X̄ = rβ. E[X²] - X̄² = rβ(1+β).
The solution is:
fitted β = (E[X²] - X̄² - X̄) / X̄.      fitted r = X̄² / (E[X²] - X̄² - X̄) = X̄ / β.

One can compare the Negative Binomial distribution fitted via method of moments, with β = 0.230 and r = 1.455, to the observed distribution:

Number of Claims    Observed    Method of Moments Negative Binomial
0                   17,649      17,663.5
1                   4,829       4,805.8
2                   1,106       1,103.1
3                   229         237.6
4                   44          49.5
5                   9           10.1
6                   4           2.0
7                   1           0.4
8                   1           0.1
9                   0           0.0
10                  0           0.0
Sum                 23,872.0    23,872.0

The fit via method of moments seems to be a reasonable first approximation. How to test the fit will be discussed in a subsequent section.
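For reference, here is a minimal Python sketch (mine, not the study guide's) of the Negative Binomial method-of-moments fit for the data above, using the (a, b, 0) recursion to produce the fitted expected counts.

```python
# Observed insureds by number of claims (from the exercise above).
counts = {0: 17649, 1: 4829, 2: 1106, 3: 229, 4: 44, 5: 9, 6: 4, 7: 1, 8: 1}
n = sum(counts.values())

m1 = sum(k * v for k, v in counts.items()) / n          # 0.3346
m2 = sum(k * k * v for k, v in counts.items()) / n      # 0.5236
variance = m2 - m1**2                                   # 0.4116

beta_hat = variance / m1 - 1          # about 0.230
r_hat = m1 / beta_hat                 # about 1.455

# Fitted expected counts via the (a, b, 0) recursion for the Negative Binomial.
a = beta_hat / (1 + beta_hat)
b = (r_hat - 1) * beta_hat / (1 + beta_hat)
f = (1 + beta_hat) ** -r_hat
fitted = []
for x in range(11):
    fitted.append(n * f)              # fitted[0] is about 17,663.5
    f *= a + b / (x + 1)
```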
Variance of an Estimated Mean:

Let us assume we have X1, X2 and X3, three independent, identically distributed variables. Since they are independent, their variances add, and since they are identical they each have the same variance, Var[X]:
Var[X1 + X2 + X3] = Var[X1] + Var[X2] + Var[X3] = 3 Var[X].
Let the estimated mean be: X̄ = (X1 + X2 + X3)/3.
Var[X̄] = Var[(X1 + X2 + X3)/3] = Var[X1 + X2 + X3]/3² = (3 Var[X])/3² = Var[X]/3.

Exercise: We generate four independent random variables, each from a Poisson with λ = 5.6. What is the variance of their average?
[Solution: Var[X̄] = Var[(X1 + X2 + X3 + X4)/4] = Var[X1 + X2 + X3 + X4]/4² = (4 Var[X])/4² = Var[X]/4 = 5.6/4 = 1.4.]

For n independent, identically distributed variables:
Var[X̄] = Var[(1/n) Σ Xi] = Var[Σ Xi]/n² = Σ Var[Xi]/n² = n Var[X]/n² = Var[X]/n.

Var[X̄] = Var[X] / n. The variance of an average declines as 1/(the number of data points).
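For readers who like to check such identities numerically, here is a small simulation sketch (mine, not the study guide's). It draws many samples of size n from a Poisson and compares the empirical variance of the sample means with Var[X]/n; the Poisson draw is done by inverting the CDF with the (a, b, 0) recursion, since the standard library has no Poisson generator.

```python
import math
import random

# Simulation check of Var[X-bar] = Var[X]/n, using Poisson(lambda = 5.6), n = 4.
lam, n, trials = 5.6, 4, 100_000

def poisson_draw(lam):
    """Draw one Poisson value by inverting the CDF."""
    u, x = random.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf:
        x += 1
        p *= lam / x
        cdf += p
    return x

means = [sum(poisson_draw(lam) for _ in range(n)) / n for _ in range(trials)]
grand = sum(means) / trials
var_of_mean = sum((m - grand) ** 2 for m in means) / (trials - 1)
# var_of_mean should be close to lam / n = 1.4
```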
Variance of Estimated Parameters:

Previously, we fit a Poisson Distribution to 23,872 observations via the Method of Moments and obtained a point estimate λ = 0.3346. Then one might ask how reliable this estimate is, assuming this data actually came from a Poisson Distribution. In other words, what is the variance one would expect in this estimate solely due to random fluctuation in the data set?
Using the Method of Moments, we set the fitted λ = X̄. Therefore Var[fitted λ] = Var[X̄]. Thus in this case we have reduced the problem of calculating the variance of the estimated λ to that of estimating the variance of the estimated mean. For X from a Poisson Distribution, with λ = 0.3346: Var(X) = λ = 0.3346.
Therefore, Var[fitted λ] = Var[X̄] = Var[X]/n = λ/n = 0.3346/23,872 = 0.00001402. The standard deviation of the estimated λ is thus 0.00374. Using plus or minus 1.96 standard deviations, an approximate 95% confidence interval for λ is: 0.3346 ± 0.0073.6
The larger the data set, the larger n, and the smaller the variance of the estimate of λ.

Exercise: A Binomial Distribution with fixed m = 6 has been fit to 1000 observations via the Method of Moments. The resulting estimate of q is 0.238. What is an approximate 95% confidence interval for q?
[Solution: fitted q = X̄/6. Var[fitted q] = Var[X̄]/36 = (Var[X]/1000)/36 = Var[X]/36,000 = 6 q (1-q)/36,000 = (0.238)(1 - 0.238)/6000 = 0.00003023. The Standard Deviation is: √0.00003023 = 0.0055. Thus using plus or minus 1.96 standard deviations, an approximate 95% confidence interval for q is: 0.238 ± 0.011.]
6 Since for a Poisson, the Method of Moments is equal to the Method of Maximum Likelihood, this is also the interval estimate from the Method of Maximum Likelihood.
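As a quick sketch (mine, not the study guide's) of the two confidence-interval calculations above, in Python:

```python
import math

# Poisson: fitted lambda = sample mean; Var[fitted lambda] = lambda / n.
lam_hat, n = 0.3346, 23_872
se_lam = math.sqrt(lam_hat / n)
ci_lam = (lam_hat - 1.96 * se_lam, lam_hat + 1.96 * se_lam)   # about 0.3346 +/- 0.0073

# Binomial with m fixed: fitted q = sample mean / m;
# Var[fitted q] = Var[X]/(n m^2) = m q (1-q) / (n m^2) = q (1-q) / (n m).
q_hat, m, n_b = 0.238, 6, 1000
se_q = math.sqrt(q_hat * (1 - q_hat) / (n_b * m))
ci_q = (q_hat - 1.96 * se_q, q_hat + 1.96 * se_q)             # about 0.238 +/- 0.011
```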
Exercise: Assume we have fit 23,872 observations via the Method of Moments to a Negative Binomial Distribution with fixed r = 1.5 and obtained a point estimate β = 0.223. What is an approximate 95% confidence interval for β?
[Solution: fitted β = X̄/1.5. Var[fitted β] = Var[X̄]/2.25 = Var[X]/{(2.25)(23,872)} = 1.5 β (1+β)/{(2.25)(23,872)} = (0.223)(1.223)/{(1.5)(23,872)} = 0.00000762.
Standard Deviation is: √0.00000762 = 0.00276.
Thus an approximate 95% confidence interval for β is: 0.223 ± 0.005.
Comment: The confidence interval is so narrow due to the large amount of data. Its width goes down as the inverse of the square root of the amount of data.]

Variance of Functions of the Parameters:

Exercise: Assume we have fit a Negative Binomial Distribution with fixed r = 1.5 to 23,872 observations via the Method of Moments and obtained a point estimate β = 0.223. What is an estimate of the variance of the frequency process?
[Solution: rβ(1+β) = (1.5)(0.223)(1.223) = 0.409.]

Assume we have a function h(θ) of the estimated parameter θ. Then Δh(θ) ≅ (∂h/∂θ) (Δθ).
Therefore, Var[h] ≅ (Δh(θ))² ≅ (∂h/∂θ)² (Δθ)² ≅ (∂h/∂θ)² Var[fitted θ]. Thus, Var[h(θ)] ≅ (∂h/∂θ)² Var[fitted θ].7

Exercise: Assume we have fit a Negative Binomial Distribution with fixed r = 1.5 to 23,872 observations via the Method of Moments and obtained a point estimate β = 0.223. What is an approximate 95% confidence interval for the variance of the frequency process?
[Solution: h(β) = rβ(1 + β) = (1.5)(β + β²). ∂h/∂β = 1.5(1 + 2β) = (1.5){1 + (2)(0.223)} = 2.169.
Thus Var[h(β)] ≅ (∂h/∂β)² Var[fitted β] = 2.169² (0.00000762).
Standard Deviation = (2.169)(0.00276) = 0.00599.
Thus an approximate 95% confidence interval for the variance of the frequency process is: 0.409 ± 0.012.]
7 This is a special case of the delta method. See “Mahlerʼs Guide to Fitting Loss Distributions.”
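Here is a minimal Python sketch (mine, not the study guide's) of the delta-method calculation in the last exercise: a 95% confidence interval for h(β) = rβ(1+β) with r fixed.

```python
import math

r, beta_hat, n = 1.5, 0.223, 23_872

var_beta = beta_hat * (1 + beta_hat) / (r * n)   # Var of the fitted beta, about 7.6e-6
h = r * beta_hat * (1 + beta_hat)                # estimated process variance, about 0.409
dh_dbeta = r * (1 + 2 * beta_hat)                # derivative of h at beta_hat, about 2.169

se_h = abs(dh_dbeta) * math.sqrt(var_beta)       # about 0.006
ci = (h - 1.96 * se_h, h + 1.96 * se_h)          # about 0.409 +/- 0.012
```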
Problems:

Use the following information for the next two questions. Over one year, the following claim frequency observations were made for a group of 1,000 policies:
# of Claims    # of Policies
0              800
1              180
2              19
3              1

2.1 (2 points) You fit a Poisson Distribution via the method of moments. Estimate the probability that a policy chosen at random will have more than 1 claim next year.
(A) Less than 1.6% (B) At least 1.6%, but less than 1.8% (C) At least 1.8%, but less than 2.0% (D) At least 2.0%, but less than 2.2% (E) At least 2.2%

2.2 (2 points) You fit a Binomial Distribution with m = 3 via the method of moments. Estimate the probability that a policy chosen at random will have more than 1 claim next year.
(A) Less than 1.6% (B) At least 1.6%, but less than 1.8% (C) At least 1.8%, but less than 2.0% (D) At least 2.0%, but less than 2.2% (E) At least 2.2%

2.3 (3 points) You have the following data from 10,000 insureds for one year. Xi is the number of claims from the ith insured.
Σ Xi = 431.      Σ Xi² = 634.
The number of claims of a given insured during the year is assumed to be Poisson distributed with an unknown mean that varies by insured via a Gamma Distribution. Of these 10,000 insureds, how many should one expect to observe with 3 claims next year?
A. 4   B. 6   C. 8   D. 10   E. 12
Use the following information for the next three questions: One has observed the following distribution of insureds by number of claims:
Number of Claims:    0    1    2    3    4   5&+  All
Number of Insureds:  490  711  572  212  15  0    2000

2.4 (2 points) A Binomial Distribution with m = 4 is fit via the Method of Moments. Which of the following is an approximate 95% confidence interval for q?
A. [0.317, 0.321]   B. [0.315, 0.323]   C. [0.313, 0.325]   D. [0.311, 0.327]   E. [0.309, 0.329]

2.5 (2 points) A Poisson Distribution is fit via the Method of Moments. Which of the following is an approximate 95% confidence interval for λ?
A. [1.251, 1.301]   B. [1.226, 1.326]   C. [1.201, 1.351]   D. [1.176, 1.376]   E. [1.151, 1.401]

2.6 (2 points) A Negative Binomial Distribution with r = 2 is fit via the Method of Moments. Which of the following is an approximate 95% confidence interval for β?
A. [0.636, 0.640]   B. [0.626, 0.650]   C. [0.606, 0.670]   D. [0.576, 0.700]   E. [0.526, 0.675]
Use the following information for the next two questions. Over one year, the following claim frequency observations were made for a group of 13,000 policies, where ni is the number of claims observed for policy i:
Σ ni = 671.      Σ ni² = 822.

2.7 (2 points) You fit a Poisson Distribution via the method of moments. Estimate the probability that a policy chosen at random will have at least one claim next year.
A. 4.4%   B. 4.6%   C. 4.8%   D. 5.0%   E. 5.2%

2.8 (3 points) You fit a Negative Binomial Distribution via the method of moments. Estimate the probability that a policy chosen at random will have at least one claim next year.
A. 4.4%   B. 4.6%   C. 4.8%   D. 5.0%   E. 5.2%
Use the following information for the next 10 questions: During a year, 10,000 insureds have a total of 4200 claims. 2.9 (1 point) You fit a Binomial Distribution with m = 10 via the method of moments. What is the fitted q? (A) Less than 0.04 (B) At least 0.04, but less than 0.05 (C) At least 0.05, but less than 0.06 (D) At least 0.06, but less than 0.07 (E) At least 0.07 2.10 (2 points) What is the standard deviation of q^ ? (A) Less than 0.0004 (B) At least 0.0004, but less than 0.0005 (C) At least 0.0005, but less than 0.0006 (D) At least 0.0006, but less than 0.0007 (E) At least 0.0007 2.11 (1 point) You fit a Poisson Distribution via the method of moments. What is the fitted λ? (A) Less than 0.4 (B) At least 0.4, but less than 0.5 (C) At least 0.5, but less than 0.6 (D) At least 0.6, but less than 0.7 (E) At least 0.7 ^
2.12 (1 point) What is the standard deviation of λ ? (A) Less than 0.007 (B) At least 0.007, but less than 0.008 (C) At least 0.008, but less than 0.009 (D) At least 0.009, but less than 0.010 (E) At least 0.010 2.13 (1 point) Using the fitted Poisson, what is the probability that an insured picked at random will have at least one claim next year? (A) Less than 25% (B) At least 25%, but less than 30% (C) At least 30%, but less than 35% (D) At least 35%, but less than 40% (E) At least 40%
2.14 (2 points) What is the standard deviation of the estimate in the previous question?
(A) 0.0005   (B) 0.001   (C) 0.002   (D) 0.003   (E) 0.004
2.15 (1 point) You fit a Negative Binomial Distribution with r = 3 via the method of moments. What is the fitted β? (A) Less than 0.2 (B) At least 0.2, but less than 0.3 (C) At least 0.3, but less than 0.4 (D) At least 0.4, but less than 0.5 (E) At least 0.5 ^
2.16 (2 points) What is the standard deviation of the fitted β?
(A) Less than 0.001 (B) At least 0.001, but less than 0.002 (C) At least 0.002, but less than 0.003 (D) At least 0.003, but less than 0.004 (E) At least 0.004

2.17 (1 point) Using the fitted Negative Binomial, what is the probability that an insured picked at random will have one claim next year?
(A) 10%   (B) 15%   (C) 20%   (D) 25%   (E) 30%

2.18 (2 points) What is the standard deviation of the estimate in the previous question?
(A) 0.0001   (B) 0.0002   (C) 0.0005   (D) 0.0010   (E) 0.0020

2.19 (2 points) You are given the following information on a block of similar Homeowners policies:
                Number of Policies
Renewed         90
Not renewed     10
Using the normal approximation, determine the lower bound of the symmetric 95% confidence interval for the probability that such a Homeowners policy will be renewed.
(A) 0.84   (B) 0.85   (C) 0.86   (D) 0.87   (E) 0.88
2.20 (2 points) You observe 39 successes in 1000 trials. Using the normal approximation, determine the upper bound of the symmetric 95% confidence interval for the probability of a success. (A) 0.050 (B) 0.051 (C) 0.052 (D) 0.053 (E) 0.054
2.21 (3 points) You are given the following:
• A discrete random variable X has the density function:
  f(x) = 0.2 β1^x / (1+β1)^(x+1) + 0.8 β2^x / (1+β2)^(x+1),  x = 0, 1, 2, ... ,  0 < β1 < β2.
• A random sample taken of the random variable X has mean 0.14 and variance 0.16.
Determine the method of moments estimate of β1. A. 0.11
B. 0.13
C. 0.15
D. 0.17
E. 0.19
2.22 (4, 5/89, Q.45) (2 points) The number of claims X that a randomly selected policyholder has in a year follows a negative binomial distribution, as per Loss Models. The following data was compiled for 10,000 policyholders in one year: Number of Number of Claims Policyholders 0 8,200 1 1,000 2 600 3 200 What is the method of moments estimate for the parameter β? A. β < 0.35
B. 0.35 ≤ β < 0.40 C. 0.40 ≤ β < 0.45 D. 0.45 ≤ β < 0.50
E. 0.50 ≤ β
2.23 (4, 5/90, Q.24) (1 point) Assume that the number of claims for an insured has a Poisson distribution: p(n) = e^-θ θ^n / n!. Using the observations 3, 1, 2, 1, taken from a random sample, what is the method of moments estimate, θ~, of θ?
A. θ~ < 1.40  B. 1.40 ≤ θ~ < 1.50  C. 1.50 ≤ θ~ < 1.60  D. 1.60 ≤ θ~ < 1.70  E. 1.70 ≤ θ~
2.24 (4B, 5/92, Q.5) (2 points) You are given the following information: • Number of large claims follows a Poisson distribution. • Exposures are constant and there are no inflationary effects. • In the past 5 years, the following number of large claims has occurred: 12, 15, 19, 11, 18 Estimate the probability that more than 25 large claims occur in one year. (The Poisson distribution should be approximated by the normal distribution.) A. Less than 0.002 B. At least 0.002 but less than 0.003 C. At least 0.003 but less than 0.004 D. At least 0.004 but less than 0.005 E. At least 0.005 2.25 (4B, 11/92, Q.13) (2 points) You are given the following information: • The occurrence of hurricanes in a given year has a Poisson distribution. • For the last 10 years, the following number of hurricanes has occurred: 2, 4, 3, 8, 2, 7, 6, 3, 5, 2 Using the normal approximation to the Poisson, determine the probability of more than 10 hurricanes occurring in a single year. A. Less than 0.0005 B. At least 0.0005 but less than 0.0025 C. At least 0.0025 but less than 0.0045 D. At least 0.0045 but less than 0.0065 E. At least 0.0065 2.26 (2, 2/96, Q.26) (1.7 points) Let X1 ,..., Xn be a random sample from a discrete distribution with probability function: p(1) = θ, p(2) = θ, and p(3) = 1 - 2θ, where 0 < θ < 1/2. Determine the method of moments estimator of θ . A. (3 - X )/3
B. ( X - 1)/4
C. (2 X - 3)/6
D. X
E. X /2
2.27 (4, 11/04, Q.8 & 2009 Sample Q.138) (2.5 points) You are given the following sample of claim counts: 0 0 1 2 2 You fit a binomial(m, q) model with the following requirements: (i) The mean of the fitted model equals the sample mean. (ii) The 33rd percentile of the fitted model equals the smoothed empirical 33rd percentile of the sample. Determine the smallest estimate of m that satisfies these requirements. (A) 2 (B) 3 (C) 4 (D) 5 (E) 6
2.28 (CAS3, 5/06, Q.1) (2.5 points) The number of goals scored in a soccer game follows a Negative Binomial distribution. A random sample of 20 games produced the following distribution of the number of goals scored: Goals Scored Frequency Goals Scored Frequency 0 1 5 2 1 3 6 1 2 4 7 0 3 5 8 1 4 3 Use the sample data and the method of moments to estimate the parameter β of the Negative Binomial distribution. A. Less than 0.25 B. At least 0.25, but less than 0.50 C. At least 0.50, but less than 0.75 D. At least 0.75, but less than 1.00 E. At least 1.00 2.29 (4, 5/07, Q.40) (2.5 points) You are given: Loss Experience Number of Policies 0 claims 1600 1 or more claims 400 Using the normal approximation, determine the upper bound of the symmetric 95% confidence interval for the probability that a single policy has 1 or more claims. (A) 0.200 (B) 0.208 (C) 0.215 (D) 0.218 (E) 0.223
2.30 (CAS3L, 11/09, Q.17) (2.5 points) You are creating a model to describe exam progress. You are given the following information:
• Let X be the number of exams passed in a given year. • The probability mass function is defined as follows: P(X = 0) = 1 - p - q P(X = 1) = p P(X = 2) = q
• Over the last 5 years, you observe the following values of X: 0 0 1 2 2 Calculate the method of moments estimate of p. A. Less than 0.15 B. At least 0.15, but less than 0.21 C. At least 0.21, but less than 0.27 D. At least 0.27, but less than 0.33 E. At least 0.33
Solutions to Problems: 2.1. D. λ = mean = ((0)(800) + (1)(180) + (2)(19) + (3)(1)) /1000 = 0.221. 1 - {f(0) + f(1)} = 1 - e-0.221(1 + 0.221) = 2.11%. 2.2. A. 3q = mean = 0.221. q = 0.07367. f(2) + f(3) = (3)(0.073672 )(1 -0 .07367) + 0.073673 = 1.55%. 2.3. D. The mixed distribution for a Gamma-Poisson is a Negative Binomial. Fit a Negative Binomial via Method of Moments. second moment = 634/10000 = 0.0634. mean = 0.0431 = rβ. variance = 0.0634 - 0.04312 = 0.06154 = rβ(1+β). (1+β) = 0.06154/0.0431 = 1.4278. ⇒ β = 0.4278. r = 0.1007. f(3) = (r(r+1)(r+2)/3!)β3 /(1+β)r+3 = {(0.1007)(1.1007)(2.1007)/6) 0.42783 /1.42783.1007 = 0.1007%.
10,000 f(3) = 10.1.
2.4. E. X = {(0)(490) + (1)(711) +(2)(572) + (3)(212) + (4)(15)}/2000 = 1.2755. q^ = X /m = 1.2755/4 = .3189. Var[ q^ ] = Var[ X ]/16 = (Var[X]/2000)/ 16 = 4(q)(1-q) / 32,000 = (0.3189)(1 - 0.3189)/8000 = 0.00002715. StdDev[ q^ ] = 0.00002715 = 0.0052. Thus an approximate 95% confidence interval for q is: 0.319 ± (1.96)(0.0052) = 0.319 ± 0.010. ^
2.5. B. λ^ = X̄ = 1.2755. Var[λ^] = Var[X̄] = Var[X]/2000 = λ/2000 = 1.2755/2000 = 0.0006378. Standard Deviation is: √0.0006378 = 0.0253.
Thus an approximate 95% confidence interval for λ is: 1.276 ± (1.96)(0.0253) = 1.276 ± 0.050.
2.6. C. β^ = X̄/r = 1.2755/2 = 0.6378. Var[β^] = Var[X̄]/2² = Var[X]/{(4)(2000)} = 2β(1+β)/8000 = (0.6378)(1.6378)/4000 = 0.0002611. Standard Deviation is: √0.0002611 = 0.0162.
Thus an approximate 95% confidence interval for β is: 0.638 ± 0.032. 2.7. D. λ = mean = 671/ 13000 = .05162. 1 - f(0) = 1 - e-.05162 = 1 - 95.0% = 5.0%.
2.8. B. second moment = 822/13000 = 0.06323. mean = 0.05162 = rβ. variance = 0.06323 - 0.051622 = 0.06057 = rβ(1 + β). (1 + β) = 0.06057/0.05162 = 1.1734. ⇒ β = 0.173. ⇒ r = 0.2977. 1 - f(0) = 1 - 1/1.1734.2977 = 1 - 95.35% = 4.65%. 2.9. B. X = 4200/10000 = .42. q^ = X /m = 0.42/10 = 0.042.
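As a check on this sort of arithmetic, here is a brief Python sketch (an illustration of mine, not part of the syllabus readings) that reproduces solutions 2.7 and 2.8 from the summary statistics Σ ni = 671 and Σ ni² = 822 for the 13,000 policies; it only uses the standard math module.

import math

n = 13000        # policies observed
sum_ni = 671.0   # total claims
sum_ni2 = 822.0  # sum of squared claim counts

mean = sum_ni / n                      # 0.05162
variance = sum_ni2 / n - mean ** 2     # 0.06057

# Poisson fit by method of moments (solution 2.7): lambda = mean.
print(1 - math.exp(-mean))             # Prob[at least one claim], about 5.0%

# Negative Binomial fit by method of moments (solution 2.8):
# mean = r*beta and variance = r*beta*(1+beta), so 1 + beta = variance/mean.
beta = variance / mean - 1             # about 0.173
r = mean / beta                        # about 0.298
print(1 - (1 + beta) ** (-r))          # Prob[at least one claim], about 4.65%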
2.10. D. Var[ q^ ] = Var[ X /m] = Var[ X /10] = Var[ X ]/100 = (Var[X]/n)/100 = (10q(1-q)/10000)/100 = (.042)(1-.042)/100000 = 0.000000402. StdDev[ q^ ] = 0.000000402 = 0.000634.
^
2.11. B. λ = X = 0.42. ^
2.12. A. Var[ λ ] = Var[ X ] = Var[X]/n = λ/10000 = .42/10000. ^
StdDev[ λ ] = 0.42 / 100 = 0.00648. 2.13. C. Prob[at least one claim] = 1 - e−λ = 1 - e-.42 = 34.30%. 2.14. E. h(λ) = 1 - e−λ. ∂h/∂λ = e−λ = e-.42 = 0.6570. ^
Var[h(λ)] ≅ (∂h/∂λ)2 Var[ λ ] = (.65702 )(0.000042). StdDev[h(λ)] = .6570 0.000042 = 0.00426. Comment: Thus an approximate 95% confidence interval for the probability of at least one claim is: 34.30% ± (1.96)(.426%) ≅ 34.3% ± 0.8%. ^
2.15. A. β = X /r = .42/3 = 0.14. ^
2.16. C. Var[ β] = Var[ X /3] = Var[ X ]/9 = (Var[X]/n)/9 = (3β(1+β)/10000)/9 = (.14)(1.14)/30000 = .00000532.
^
StdDev[ β] = 0.00000532 = 0.00231.
2.17. D. f(1) = rβ / (1+β)1+r = (3)(.14)/(1.14)4 = 24.87%. 2.18. E. h(β) = 3β / (1+β)4 . ∂h/∂β = 3/(1+β)4 - 12β/(1+β)5 = 3/1.144 - 12(.14)/1.145 = .9037. ^
Var[h(β)] ≅ (∂h/∂β)2 Var[ β] = (.90372 )(.00000532). StdDev[h(β)] = 0.9037 0.00000532 = 0.00208. Comment: Thus an approximate 95% confidence interval for f(1) is: 24.87% ± (1.96)(.208%) ≅ 24.9% ± 0.4%. 2.19. A. Treat this as a yes/no situation or Bernoulli, with q the probability of renewal. We fit a Bernoulli using the method of moments: q^ = X = 90/100 = 0.9. Var[ q^ ] = Var[ X ] = Var[X]/n = q(1 - q)/n = (0.9)(0.1)/100 = 0.0009. An approximate 95% confidence interval for q^ is: 0.9 ± 1.960 0.0009 = 0.9 ± 0.0588 = [0.8412 , 0.9588]. Comment: Similar to 4, 5/07, Q.40. One would get the same result using maximum likelihood. 2.20. B. We fit a Bernoulli using the method of moments: q^ = X = 39/1000 = 0.039. Var[ q^ ] = Var[ X ] = Var[X]/n = q(1 - q)/n = (0.039)(1 - 0.039)/1000 = 0.0000375. An approximate 95% confidence interval for q^ is: 0.0390 ± 1.960 0.0000375 = 0.0390 ± 0.0120 = [0.0270 , 0.0510]. Comment: Similar to 4, 5/07, Q. 40.
2.21. A. This is a mixture of two Geometric Distributions, with weights of 20% and 80%. E[X] = 0.2 β1 + 0.8 β2. A Geometric Distribution has a mean of β, and a variance of: β(1+β) = β + β2. Therefore, a Geometric Distribution has a second moment of: β + β2 + β2 = β + 2β2. Therefore, the mixture has a second moment of: (0.2)(β1 + 2β12) + (0.8)(β2 + 2β22) = 0.2 β1 + 0.4 β12 + 0.8 β2 + 1.6 β22. Setting the theoretical and observed moments equal, gives 2 equations in 2 unknowns: 0.2 β1 + 0.8 β2 = 0.14. ⇒ β2 = 0.175 - 0.25 β1. 0.2 β1 + 0.4 β12 + 0.8 β2 + 1.6 β22 = 0.16 + 0.142 = 0.1796. ⇒ 0.2 β1 + 0.4 β12 + (0.8) (0.175 - 0.25 β1) + (1.6) (0.175 - 0.25 β1)2 = 0.1796. ⇒ 0.5 β12 - 0.14 β1 + 0.0094 = 0. ⇒ β1 =
{0.14 ± √[(-0.14)² - (4)(0.5)(0.0094)]} / {(2)(0.5)} = 0.1117 or 0.1683.
If β1 = 0.1117, then β2 = 0.175 - (0.25)(0.1117) = 0.1471 > 0.1117. OK. If β1 = 0.1683, then β2 = 0.175 - (0.25)(0.1683) = 0.1329 < 0.1683. Not OK. Comment: Similar to 4, 11/01, Q.23 in “Mahlerʼs Guide to Fitting Loss Distributions.” We are told that β1 < β2. 2.22. E. X = {(8200)(0) +(1000)(1)+(600)(2)+(200)(3)} / 10,000 = .28. The estimated 2nd moment is: {(8200)(02 ) +(1000)(12 )+(600)(22 )+(200)(32 )} / 10,000 = .52. Thus the estimated variance is: .52 - .282 = .4416. Setting the estimated mean and variance equal to the theoretical mean and variance for the Negative Binomial: Variance/Mean = rβ(1 + β) / rβ = 1 + β = .4416/ .28 = 1.577. ⇒ β = 0.577. Comment: One can use the value of β to then solve for r = .28/.577 = .485. For β =.577 and r = .485, one would expect 10000 policyholders to be distributed as follows: Number of Claims Fitted Number of Policyholders Observed # of Policyholders
0    8017    8200
1    1423    1000
2    387     600
3    117     200
4    37      0
5    12      0
6    4       0
7    1       0
Thus the Negative Binomial is not a good fit to this data.
2.23. E. Since the Poisson has one parameter θ, one sets X̄ equal to the mean of the Poisson, θ. X̄ = (3+1+2+1)/4 = 1.75. Thus θ = 1.75.
2.24. C. The average number of large claims observed per year is: (12+15+19+11+18)/5 = 15. Thus we estimate that the Poisson has a mean of 15 and thus a variance of 15. We wish to estimate the probability of 26 large claims or more; using the continuity correction, we wish to standardize 25.5 by subtracting the mean of 15 and dividing by the standard deviation of √15.
Thus Prob(N > 25) ≅ 1 - Φ((25.5 - 15)/√15) = 1 - Φ(2.71) ≅ 1 - 0.9966 = 0.0034.
2.25. B. The observed mean is 42/10 = 4.2. Assume a Poisson with mean of 4.2 and therefore variance of 4.2. Using the “continuity correction”, P(N > 10) ≅ 1 - Φ[(10.5 - 4.2)/√4.2] = 1 - Φ[3.07] = 1 - 0.9989 = 0.0011.
2.26. A. E[X] = (1)θ + (2)θ + (3)(1 - 2θ) = 3 - 3θ. Set X = E[X]. X = 3 - 3θ. ⇒ θ = (3 - X )/3. 2.27. E. Sample Mean = (0 + 0 + 1 + 2 + 2)/5 = 1. (.33)(5 + 1) = 1.98. The 1.98th value is 0, the smoothed empirical 33rd percentile. Thus for the Binomial we want mq = 1 and the 33rd percentile to be 0.
⇒ q = 1/m, and F(0) ≥ .33. F(0) = f(0) = (1 - q)m = (1 - 1/m)m. ⇒ (1 - 1/m)m ≥ .33. Try values of m. For m = 5, (4/5)5 = .3277. For m = 6, (5/6)6 = 0.3349. Comments: A mixture of method of moments and percentile matching, not a typical real world application. Percentile matching is discussed in “Mahlerʼs Guide to Fitting Loss Distributions.” For a discrete distribution, take the 100pth percentile as the first value at which F(x) ≥ p. In this case, the 33rd percentile of the Binomial Distribution is the smallest x such that F(0) ≥ .33. For m = 5 and q = 1/5, the 33rd percentile is 1. For m = 6 and q = 1/6, the 33rd percentile is 0. 2.28. A. Mean is: {(1)(0) + (3)(1) + (4)(2) + (5)(3) + (3)(4) + (2)(5) + (1)(6) + (0)(7) + (1)(8)}/20 = 3.1 2nd moment is: {(1)(02 ) + (3)(12 ) + (4)(22 ) + (5)(32 ) + (3)(42 ) + (2)(52 ) + (1)(62 ) + (0)(72 ) + (1)(82 )}/20 = 262/20 = 13.1. Variance is: 13.1 - 3.12 = 3.49. Matching the mean and the variance: rβ = 3.1, and rβ(1 + β) = 3.49.
⇒ 1 + β = 3.49/3.1 = 1.1258. ⇒ β = 0.1258. ⇒ r = 3.1/0.1258 = 24.64.
2.29. D. Treat this as a yes/no situation or Bernoulli. We fit a Bernoulli using the method of moments: q^ = X = 400/2000 = 1/5. Var[ q^ ] = Var[ X ] = Var[X]/n = q(1 - q)/n = (1/5)(4/5)/2000 = 0.00008. An approximate 95% confidence interval for q^ is: 0.2 ± 1.960 0.00008 = 0.2 ± 0.0175 = [0.1825 , 0.2175]. Comment: q here is the probability of one or more claim. One would get the same result using maximum likelihood. 2.30. B. In order to fit the two parameters, p and q, we match the first two moments: (0)(1 - p - q) + (1)(p) + (2)(q) = (0 + 0 + 1 + 2 + 2)/5. ⇒ p + 2q = 1. (02 )(1 - p - q) + (12 )(p) + (22 )(q) = (02 + 02 + 12 + 22 + 22 )/5. ⇒ p + 4q = 9/5. Solving the two equations in two unknowns: p = 1/5, and q = 2/5. Comment: In general this intuitive result holds: p^ = the observed proportion of ones, and q^ = the observed proportions of twos. Let n be the total number of observations, with n1 ones, n2 twos, and n - n1 - n2 zeros. Then the two equations are: p + 2q = (n1 + 2n2 )/n. p + 4q = (n1 + 4n2 )/n. The solutions of these two equations is: p^ = n1 /n, and q^ = n2 /n.
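The normal-approximation confidence intervals in solutions 2.19, 2.20, and 2.29 all follow the same pattern, sketched below in Python (my illustration, not part of the syllabus; only the math module is used).

import math

def bernoulli_ci(successes, trials, z=1.960):
    # Fit q by method of moments / maximum likelihood, then use the
    # normal approximation: q_hat +/- z * sqrt(q_hat (1 - q_hat) / n).
    q = successes / trials
    half_width = z * math.sqrt(q * (1 - q) / trials)
    return q - half_width, q + half_width

print(bernoulli_ci(90, 100))     # about (0.841, 0.959), as in solution 2.19
print(bernoulli_ci(39, 1000))    # about (0.027, 0.051), as in solution 2.20
print(bernoulli_ci(400, 2000))   # about (0.183, 0.218), as in solution 2.29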
Section 3, Method of Maximum Likelihood For ungrouped data {x1 , x2 , ... , xn } define: Likelihood = Π f(xi) .
Loglikelihood = Σ ln f(xi) .
During a year, 5 policies have 0 claims each, 3 policies have 1 claim each, and 2 policies have 2 claims each. Then the likelihood is: f(0)f(0)f(0)f(0)f(0)f(1)f(1)f(1)f(2)f(2) = f(0)5 f(1)3 f(2)2 . The loglikelihood is: lnf(0) + lnf(0) + lnf(0) + lnf(0) + lnf(0) + lnf(1) + lnf(1) + lnf(1) + lnf(2) + lnf(2) = 5lnf(0) + 3lnf(1) + 2lnf(2). For a given data set and type of distribution, the likelihood and the loglikelihood are functions of the parameter(s) of the distribution. In order to fit a chosen type distribution by maximum likelihood you maximize the likelihood or equivalently maximize the loglikelihood. In other words for ungrouped data, you find the set of parameters such that either Π f(xi) or Σ ln f(xi) is maximized. For single parameter frequency distributions one can solve for the parameter value by taking the derivative of the loglikelihood and setting it equal to zero.8 Poisson: Exercise: For the above data, what is the loglikelihood for a Poisson distribution? [Solution: For the Poisson: f(x) = λx e−λ / x!. ln f(0) = ln(e−λ) = −λ. ln f(1) = ln(λe−λ) = lnλ − λ. ln f(2) = ln(λ2 e−λ/2) = 2lnλ − λ − ln(2). loglikelihood = 5lnf(0) + 3lnf(1) + 2lnf(2) = 5{- λ} + 3{lnλ - λ} + 2{2lnλ − λ − ln(2)} = 7lnλ − 10λ - 2ln(2).]
8
For two parameters, one can apply numerical techniques. See for example Appendix F of Loss Models. Many computer programs such as Excel or Mathematica come with algorithms to maximize or minimize functions, which can be used to maximize the loglikelihood or minimize the negative loglikelihood.
The loglikelihood as a function of the parameter λ:
[Figure omitted: the loglikelihood 7lnλ - 10λ - 2ln(2), plotted for λ from 0 to 2; it peaks near λ = 0.7.]
Exercise: For the above example, for what value of λ is the loglikelihood maximized? [Solution: loglikelihood = 7lnλ − 10λ - 2ln(2). Set the partial derivative with respect to λ equal to 0: 0 = 7/λ - 10. ⇒ λ = 7/10 = 0.7.] Exercise: For the above example, using the method of moments what is the fitted Poisson? [Solution: λ = X = {(5)(0) + (3)(1) + (2)(2)}/10 = 7/10 = 0.7.] Maximum likelihood and the method of moments each result in a Poisson with λ = 0.7. For the Poisson: f(x) = λx e−λ / x!. Thus the loglikelihood for the observations x1 , x2 , ... xN is: Σ ln f(xi) = Σ {xi lnλ - λ - ln( xi!)} Taking the partial derivative with respect to λ and setting it equal to zero, gives: Σ {(xi / λ) - 1} = 0. Therefore, λ = Σ xi / N = X . For the Poisson Distribution the Method of Maximum Likelihood equals the Method of Moments, provided the frequency data has not been grouped into intervals.
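If you want to see this numerically, here is a minimal Python sketch (my own illustration, not from Loss Models; it assumes numpy and scipy are available) that maximizes the Poisson loglikelihood for the ten policies above and recovers the sample mean of 0.7.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Five policies with 0 claims, three with 1 claim, two with 2 claims.
data = np.array([0]*5 + [1]*3 + [2]*2)

def negative_loglikelihood(lam):
    # Poisson loglikelihood: sum over policies of x ln(lambda) - lambda - ln(x!).
    return -np.sum(data * np.log(lam) - lam - gammaln(data + 1))

result = minimize_scalar(negative_loglikelihood, bounds=(0.01, 5), method="bounded")
print(result.x, data.mean())   # both about 0.7: maximum likelihood = method of moments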
Binomial, m fixed: Exercise: Assume the following distribution of insureds by number of claims: Number of Claims Number of Insureds
0 208
1 357
2 274
3 126
4 31
5 3
6 1
7&+ 0
All 1000
What is the loglikelihood for a Binomial Distribution with fixed m = 6 and q unknown? [Solution: For the Binomial with m = 6, f(x) = {6!/ (6-x)!x!} qx (1-q)6-x. ln f(x) = x ln (q) + (6-x) ln(1-q) + ln(6!) - ln[(6-x)!] - ln(x!). The loglikelihood is the sum of the loglikelihood at each point; alternately each number of claims contributes the product of its number of insureds times the loglikelihood for those number of claims. 208 ln f(0) + 357 ln f(1) + 274 ln f(2) + 126 ln f(3) + 31 ln f(4) + 3 ln f(5) + 1 ln f(6) = 208 {6 ln(1-q)} + 357 {ln (q) + 5 ln(1-q) + ln(6!) - ln(5!)} + 274 {2 ln (q) + 4 ln(1-q) + ln(6!) - ln(4!) - ln(2!)} + 126 {3 ln (q) + 3 ln(1-q) + ln(6!) - ln(3!) - ln(3!)} + 31{4 ln (q) + 2 ln(1-q) + ln(6!) - ln(2!) - ln(4!)} + 3 {5 ln (q) + ln(1-q) + ln(6!) - ln(5!)} + 1{6 ln q}.] Exercise: Fit the above data to a Binomial with m = 6 via the Method of Maximum Likelihood. [Solution: Set equal to zero the partial derivative of the loglikelihood with respect to the single parameter q. 0 = {(1)(357) + (2)(274) +(3)(126) + (4)(31) + (5)(3) + (6)(1)}/q {(6)(208) + (5)(357) + (4)(274) + (3)(126) + (2)(31) + (1)(3)}/(1-q). ⇒ 1428/q = 4572/(1-q). ⇒ q = 1428/(4572 + 1428) = 1428/6000 = .238. Comment: This is the same solution as fitting by the Method of Moments. X = 1428/1000 = 1.428. Then q = X /m = 1.428/6 = .238. ] For a Binomial Distribution with m fixed, the Method of Maximum Likelihood solution for q is the same as the Method of Moments, provided the frequency data has not been grouped into intervals. Binomial, m not fixed: Assume we want to fit the above data via Maximum Likelihood to Binomial (with m not fixed.) Then the first step is to maximize the likelihood for various fixed values of m,9 and then let m vary. We note that m is always an integer10 greater than or equal to the largest number of claims observed. For this data, m ≥ 6 and integer. For each value of m, q = X /m = 1.428/m. Then one can compute the loglikelihood for these values of q and m. 9
As discussed above, for m fixed, the maximum likelihood estimate of q is observed mean / m. This is the same as the method of moments estimate of q for m fixed. 10 In this respect the Binomial differs from the Negative Binomial, in which r can be any positive number, integer or not.
Exercise: What is the loglikelihood for a Binomial Distribution with m = 6 and q = 0.238? [Solution: 208 { 6 ln(.762) } + 357 { ln (.238) + 5 ln(.762) + ln(6!) - ln (5!) } + 274 {2 ln (.238) + 4 ln(.762) + ln(6!) - ln(4!) - ln(2!)} + 126 {3 ln (.238) + 3 ln(.762) + ln(6!) - ln(3!) - ln(3!)} + 31{4 ln (.238) + 2 ln(.762) + ln(6!) - ln(2!) - ln(4!)} + 3 {5 ln (.238) + ln(.762) + ln(6!) - ln(5!)} + 1{6 ln .238} = (208)(-1.63085) + (357)(-1.00277) + (274)(-1.25015) + (126)(-2.12615) + (31)(-3.57751) + (3)(-5.65747) + (1)(-8.61291) = -1444.13.] For this data set, here are the loglikelihoods for m = 6, 7, 8 and 9, with q = 1.428/m: m
     q        loglikelihood
6    0.2380   -1444.13
7    0.2040   -1443.13
8    0.1785   -1443.20
9    0.1587   -1443.63
[Figure omitted: the loglikelihood plotted against m for m = 6 to 11; the maximum is at m = 7.]
The largest loglikelihood occurs when m = 7.11 Thus the maximum likelihood fit is at m = 7 and q = 0.2040.
11
The largest loglikelihood corresponds to the smallest negative loglikelihood. Above m = 7 the loglikelihood gets smaller, so in this case the loglikelihood has been computed for enough values of m. If not, one would do the computation at larger values of m, until one found the maximum likelihood.
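This search over m is easy to automate. The following Python sketch (my illustration, assuming scipy is installed) profiles the loglikelihood over integer m, with q = (observed mean)/m at each step, and picks out m = 7.

import numpy as np
from scipy.stats import binom

# Insureds with 0, 1, ..., 6 claims (1000 insureds, mean 1.428).
counts = np.array([208, 357, 274, 126, 31, 3, 1])
claims = np.arange(len(counts))
mean = (claims * counts).sum() / counts.sum()

best = None
for m in range(6, 15):                       # m is an integer >= largest observation
    q = mean / m                             # maximum likelihood q for this fixed m
    loglik = (counts * binom.logpmf(claims, m, q)).sum()
    if best is None or loglik > best[2]:
        best = (m, q, loglik)

print(best)   # approximately (7, 0.2040, -1443.13)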
Negative Binomial, r fixed: Exercise: During a year, 5 policies have 0 claims each, 3 policies have 1 claim each, and 2 policies have 2 claims each. For r = 3, what is the maximum likelihood Negative Binomial? [Solution: For the Negative Binomial: f(x) = {r(r+1)..(r+x-1)/x!} β x / (1+β)x+r. f(0) = 1/(1+β)3 . f(1) = 3β/(1+β)4 . f(2) = 6β2/(1+β)5 . ln f(0) = -3ln(1+β). ln f(1) = ln(3) + ln(β) - 4ln(1+β). ln f(2) = ln(6) + 2ln(β) - 5ln(1+β). loglikelihood = 5lnf(0) + 3lnf(1) + 2lnf(2) = 5{-3ln(1+β)} + 3{ln(3) + ln(β) - 4ln(1+β)} + 2{ln(6) + 2ln(β) - 5ln(1+β)} = 3ln(3) + 2ln(6) + 7ln(β) - 37ln(1+β). Setting the partial derivative of the loglikelihood with respect to β equal to zero: 7/β - 37/(1+β) = 0. ⇒ 37β = 7(1+β). ⇒ β = 7/30.] Exercise: During a year, 5 policies have 0 claims each, 3 policies have 1 claim each, and 2 policies have 2 claims each. For r = 3, what is the method of moments Negative Binomial? [ X = {(5)(0) + (3)(1) + (2)(2)}/10 = 7/10. rβ = X . ⇒ β = X /r = (7/10)/3 = 7/30.] If one takes r as fixed in the Negative Binomial, then the method of maximum likelihood equals the method of moments, provided the frequency data has not been grouped into intervals. Negative Binomial: When r and β are both allowed to vary, then one must maximize the likelihood or loglikelihood via numerical methods. For example, assume the data to which we had previously fit a Negative Binomial via Method of Moments:12 Number of Claims 0 Number of Insureds 17649
1 4829
2 1106
3 229
4 44
5 9
6 4
7 1
8 1
All 23872
For given values of r and β, the loglikelihood is: 17649 ln f(0) + 4829 ln f(1) + 1106 ln f(2) + 229 ln f(3) + 44 ln f(4) + 9 ln f(5) + 4 ln f(6) + ln f(7) + ln f(8). For given values of r and β, one can calculate the densities of the Negative Binomial Densities and thus the loglikelihood. 12
The Method of Moments Negative Binomial had parameters r = 1.455 and β = 0.230.
A table of such loglikelihoods is:
β = 0.210 β = 0.215 β = 0.220 β = 0.225 β = 0.230 β = 0.235 β = 0.240
r = 1.40 -17,968.4 -17,951.2 -17,937.6 -17,927.5 -17,920.6 -17,916.8 -17,915.9
r = 1.45 -17,943.8 -17,931.5 -17,922.8 -17,917.5 -17,915.5 -17,916.5 -17,920.5
r = 1.50 -17,927.1 -17,919.8 -17,916.0 -17,915.6 -17,918.4 -17,924.3 -17,933.1
r = 1.55 -17,918.1 -17,915.6 -17,916.8 -17,921.2 -17,928.9 -17,939.7 -17,953.3
r = 1.60 -17,916.1 -17,918.6 -17,924.6 -17,934.0 -17,946.5 -17,962.0 -17,980.5
Based on this table, the maximum likelihood fit is approximately β = .23 and r = 1.45.13 It turns out that the maximum likelihood Negative Binomial has parameters: β = 0.2249 and r = 1.4876.14 This maximum likelihood Negative Binomial seems to be a reasonable first approximation to the observed data. In this case the maximum likelihood and method of moments distributions are relatively similar. Number of Claims
          Observed    Method of Moments Negative Binomial    Maximum Likelihood Negative Binomial
0         17,649      17,663.5                               17,653.5
1         4,829       4,805.8                                4,821.8
2         1,106       1,103.1                                1,101.1
3         229         237.6                                  235.0
4         44          49.5                                   48.4
5         9           10.1                                   9.8
6         4           2.0                                    1.9
7         1           0.4                                    0.4
8         1           0.1                                    0.1
9         0           0.0                                    0.0
10        0           0.0                                    0.0
Sum       23,872      23,872.0                               23,872.0
Rather than directly maximizing the loglikelihood by numerical techniques, one can set the partial derivatives of the loglikelihood equal to zero. For the Negative Binomial, f(x) = {(r+x-1)...(r+1)(r)/x!} β^x / (1+β)^(x+r).
ln f(x) = ln(r+x-1) + ... + ln(r+1) + ln(r) + x ln(β) - (x+r) ln(1+β) - ln(x!).
∂ ln f(x) / ∂β = x/β - (x+r)/(1+β) = x{1/β - 1/(1+β)} - r/(1+β).
13 Note that the loglikelihood is also large for β = 0.225 and r = 1.50, and for β = 0.215 and r = 1.55. One would need to refine the grid in order to achieve more accuracy for β and r.
14 For the Negative Binomial one solves via numerical methods. I used the Nelder-Mead algorithm, the “Simplex Method” described in Appendix F of Loss Models, not on the syllabus.
∂ ln f(x) / ∂r = 1/(r+x-1) + ... + 1/(r+1) + 1/r - ln(1+β) = Σ_{i=0}^{x-1} 1/(r+i) - ln(1+β).
Where the summation is zero for x = 0. For example, for the Negative Binomial and the given data set the loglikelihood is: 17649 ln f(0) + 4829 ln f(1) + 1106 ln f(2) + 229 ln f(3) + 44 ln f(4) + 9 ln f(5) + 4 ln f(6) + ln f(7) +ln f(8).
∂(loglikelihood)/∂β = {4829 + (2)(1106) + (3)(229) + (4)(44) + (5)(9) + (6)(4) + (7)(1) + (8)(1)} {1/β - 1/(1+β)} - 23,872 r/(1+β) = 7988/{β(1+β)} - 23,872 r/(1+β).
∂(loglikelihood)/∂r = 4829/r + 1106{1/r + 1/(r+1)} + 229{1/r + 1/(r+1) + 1/(r+2)} + 44{1/r + ... + 1/(r+3)} + 9{1/r + ... + 1/(r+4)} + 4{1/r + ... + 1/(r+5)} + {1/r + ... + 1/(r+6)} + {1/r + ... + 1/(r+7)} - 23,872 ln(1+β).
One could set these two equations equal to zero and solve numerically. The first equation becomes: 0 = 7988/{β(1+β)} - 23,872 r/(1+β). ⇒ rβ = 7988/23,872. Observed mean = theoretical mean.
More generally, let nx be the number of observations of x claims. Σ nx = n.
µ = observed mean = Σ x nx / n. Then:
∂(loglikelihood)/∂β = Σ x nx / {β(1+β)} - n r/(1+β) = {n/(1+β)} (µ/β - r).
∂(loglikelihood)/∂r = Σ_{x=1}^{∞} nx Σ_{i=0}^{x-1} 1/(r+i) - n ln(1+β).
Then setting the two partial derivatives equal to zero gives the following two equations:15
rβ = Σ x nx / n = observed mean = µ.
ln(1+β) = (1/n) Σ_{x=1}^{∞} nx Σ_{i=0}^{x-1} 1/(r+i).
15 See Equations 15.14 and 15.15 in Loss Models.
One could substitute β from the first equation into the second equation:16
ln(1 + µ/r) = (1/n) Σ_{x=1}^{∞} nx Σ_{i=0}^{x-1} 1/(r+i).
Then one could solve this equation for r, via numerical means. Unless you require an extraordinary amount of accuracy, I recommend that instead you either just directly maximize the loglikelihood via numerical techniques, such as the Nelder-Mead Simplex method, or the Scoring Method. Exercise: For the following data and a Negative Binomial Distribution with parameters
β = 0.22494 and r = 1.48759, verify that rβ = observed mean = µ, and that
ln(1 + µ/r) = (1/n) Σ_{x=1}^{∞} nx Σ_{i=0}^{x-1} 1/(r+i).
Number of Claims 0 Number of Insureds 17649
1 4829
2 1106
3 229
4 44
5 9
6 4
7 1
8 1
All 23872
[Solution: The observed mean = 7988/23872 = 0.33462. (0.22494)(1.48759) = 0.33462. Thus rβ = observed mean.
ln(1 + µ/r) = ln(1 + 0.33462/1.48759) = 0.20289.
With r = 1.48759:
Number of Claims   Observed   1/(r+x)   Sum of 1/(r+i), i = 0 to x-1   Observed times Sum
0                  17,649     0.67223
1                  4,829      0.40200   0.67223                        3246.190
2                  1,106      0.28673   1.07422                        1188.091
3                  229        0.22284   1.36095                        311.659
4                  44         0.18223   1.58379                        69.687
5                  9          0.15414   1.76602                        15.894
6                  4          0.13355   1.92016                        7.681
7                  1          0.11782   2.05372                        2.054
8                  1          0.10540   2.17153                        2.172
Sum                23,872                                              4843.427
Therefore, (1/n) Σ_{x=1}^{∞} nx Σ_{i=0}^{x-1} 1/(r+i) = 4843.427/23872 = 0.20289 = ln(1 + µ/r). ]
Thus we have shown that the maximum likelihood Negative Binomial has parameters:
β = 0.22494 and r = 1.48759. 16
See Equation 15.16 in Loss Models.
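As an illustration of the direct numerical approach (mine, not part of the syllabus; it assumes numpy and scipy, whose Negative Binomial is parameterized by (r, p) with p = 1/(1+β)), the following Python sketch maximizes the loglikelihood for the 23,872 insureds and reproduces β ≈ 0.2249 and r ≈ 1.4876.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom

# Insureds with 0, 1, ..., 8 claims; 23,872 insureds in total.
counts = np.array([17649, 4829, 1106, 229, 44, 9, 4, 1, 1])
claims = np.arange(len(counts))

def negative_loglikelihood(params):
    r, beta = params
    if r <= 0 or beta <= 0:            # keep the search in the valid region
        return np.inf
    p = 1.0 / (1.0 + beta)             # scipy's success probability
    return -(counts * nbinom.logpmf(claims, r, p)).sum()

result = minimize(negative_loglikelihood, x0=[1.5, 0.2], method="Nelder-Mead")
print(result.x)     # approximately [1.488, 0.225]
print(-result.fun)  # maximum loglikelihood, approximately -17,915.4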
Reparameterizing Distributions: As pointed out in Loss Models, there are many different ways to parameterize a distribution. For example, for a Negative Binomial, one could take µ = rβ and β as the parameters, rather than r and β. In that case µ would be the mean of the Negative Binomial Distribution, and rβ(1+β) = µ(1+β) would be the variance. Exercise: For a Negative Binomial with µ = 2.6 and β = 1.3, what is the density at 4. [Solution: Take r = µ/β = 2.6/1.3 = 2. f(4) = {r(r+1)(r+2)(r+3)/4!}β4 / (1+β)4+r = {(2)(3)(4)(5)/24}(1.34 )/(2.36 ) = (5)(2.8561)/ 148.04 = 0.09647.] Thus we see that the density at 4 is the same for either µ = 2.6 and β = 1.3 or r = 2.6/1.3 = 2 and β = 1.3. Thus changing the way we parameterize the distribution can not effect which distribution one gets via fitting by Maximum Likelihood. Exercise: Assume the Maximum Likelihood Negative Binomial has µ = 6 (mean of 6) and β = 1.7 (variance of (2.7)(6).) Then what is the Maximum Likelihood Negative Binomial parameterized as in Loss Models, fit to this same data? [Solution: r = µ/β = 6/1.7 and β = 1.7.] The manner in which we parameterize the distribution has no effect on the result of applying the Method of Maximum Likelihood. However, in some circumstances, certain forms of parameterization may enhance the ability of numerical methods to quickly and easily find the Maximum Likelihood fit.17 Linear Exponential Families:18 For Linear Exponential Families, the Method of Maximum Likelihood and the Method Moments produce the same result when applied to ungrouped data19. Thus there are many cases where one can apply the method of maximum likelihood to ungrouped data by instead performing the simpler method of moments: Exponential, Poisson, Normal for fixed σ, Binomial for m fixed (including the special case of the Bernoulli) , Negative Binomial for r fixed (including the special case of the Geometric), and the Gamma for α fixed. 17
Also the manner of parametrizing the distribution will affect the form of the Information Matrix. See “Mahlerʼs Guide to Fitting Loss Distributions.” For a large class of distributions, if the mean is taken as one parameter, then for the Method of Maximum Likelihood, the estimate of the mean will be asymptotically independent of the estimate of the other parameter(s). 18 See "Mahler's Guide to Conjugate Priors". 19 This useful fact is demonstrated in "Mahler's Guide to Conjugate Priors".
Variance of Estimated Single Parameters:
Assuming the form of the distribution and the other parameters are fixed, the approximate variance of the estimate of a single parameter using the method of maximum likelihood is given by negative the inverse of the product of the number of points times the expected value of the second partial derivative of the log likelihood:20
Variance of θ^ ≅ -1 / {n E[∂² ln f(x) / ∂θ²]}.21
Exercise: A Poisson Distribution, f(x) = e−λ λ x / x!, is fit to 23,872 observations via the Method of Maximum Likelihood and obtained a point estimate of λ = 0.3346. What is the variance of this estimate of λ? [Solution: ln f(x) = xlnλ - λ - ln(x!).
∂ ln f(x) / ∂λ = x /λ - 1.
∂2 ln f(x) / ∂λ2 = -x/λ2.
E[∂2 ln f(x) / ∂λ2] = E[ -x/λ2] = -E[x]/λ2 = -λ/λ2 = -1/λ. ^
Thus Var[ λ ] ≅ -1/ {nE[∂2 ln f(x) / ∂λ2]} = λ/n = 0.3346 / 23,872 = 0.00001402.] Since for the Poisson the Method of Moments equals the method of maximum likelihood, we get the same result as was obtained for the Method of Moments in the previous section. As discussed, one can hold all but one parameter in a distribution fixed, and estimate the remaining parameter by maximum likelihood. For example, assume we have a Negative Binomial with r = 1.5. The second partial derivative of the loglikelihood with respect to β is obtained as follows: f(x) = {r(r+1)...(r+x-1)/x!} β x / (1+β)x+r = {(1.5)(2.5)...(0.5 + x)/x!} βx / (1+β)x+1.5. ln f(x) = xln(β) - (x+1.5)ln(1+β) + ln((1.5)(2.5)...(0.5 + x)/x!).
∂ln f(x) / ∂β = x/β - (x+1.5)/(1+β). ∂2 ln f(x) / ∂β2 = (x+1.5)/(1+β)2 - x/β2. Exercise: Assume we have fit a Negative Binomial Distribution with r = 1.5 to 23,872 observations ^
via the Method of Maximum Likelihood and obtained a point estimate β = 0.223. ^
What is an approximate 95% confidence interval for β? 20 21
This is the Cramer-Rao (Rao-Cramer) lower bound. This is a special case of the 2 parameter case, as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
[Solution: E[∂2 ln f(x) / ∂β2] = E[(x+1.5)/(1+β)2 - x/β2] = (E[x]+1.5)/(1+β)2 - E[x]/β2 = (1.5β+1.5)/(1+β)2 - 1.5β/β2 = 1.5{1/(1+β) - 1/β} = -1.5 / {β(1+β)}. ^
Thus Var[ β] ≅ -1/ {nE[∂2 ln f(x) / ∂β2]} = β(1+β)/ (1.5n) = (.223)(1.223) / {(1.5)(23,872)} = 0.00000762. Standard Deviation is:
0.00000762 = 0.00276. (1.96)(0.00276) = 0.005. ^
Thus an approximate 95% confidence interval for β is: 0.223 ± 0.005. Comment: Since for the Negative Binomial with r fixed the Method of Moments equals the method of maximum likelihood, we get the same result as was obtained for the Method of Moments in the previous section.] In this case as well as in general, the variance of the estimate is inversely proportional to the number of data points used to fit the distribution. This can be a useful final check of your solution. Here is the approximate variance of the estimated parameter for the following single parameter frequency distributions: Distribution
                                   Parameter    Approximate Variance
Bernoulli                          q            q(1-q)/n
Binomial (m fixed)                 q            q(1-q)/(mn)
Poisson                            λ            λ/n
Geometric                          β            β(1+β)/n
Negative Binomial (r fixed)        β            β(1+β)/(rn)
Negative Binomial (β fixed)        r            1 / {Σ_{x=1}^{∞} nx Σ_{i=0}^{x-1} 1/(r+i)²}
For two parameters distributions, one would make use of the “Information Matrix”, as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
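For example, here is a short Python sketch (mine, not from the text; only the math module is used) that evaluates two rows of the table above for the fitted values used in the exercises, with n = 23,872 observations.

import math

n = 23872

# Poisson: Var[lambda-hat] is approximately lambda / n.
lam_hat = 0.3346
print(lam_hat / n)                              # about 0.0000140

# Negative Binomial with r fixed at 1.5: Var[beta-hat] is approximately beta(1+beta)/(rn).
beta_hat, r = 0.223, 1.5
var_beta = beta_hat * (1 + beta_hat) / (r * n)  # about 0.00000762
half_width = 1.96 * math.sqrt(var_beta)
print(beta_hat - half_width, beta_hat + half_width)   # about 0.223 +/- 0.005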
Fisher's Information:
For one parameter distributions, the Information or Fisher's Information is: -n E[∂² ln f(x) / ∂θ²].
Fisher's Information is the reciprocal of the Cramer-Rao lower bound.
For the Poisson, method of moments is equal to maximum likelihood.22
Therefore, Var[λ^] = Var[X̄] = Var[X]/n = λ/n.
However, Var[λ^] = 1/Fisher's Information. ⇒ Fisher's Information = n/λ.
Alternately, ln f(x) = x lnλ - λ - ln(x!). ∂ ln[f(x)]/∂λ = x/λ - 1. ∂² ln[f(x)]/∂λ² = -x/λ².
E[∂² ln[f(x)]/∂λ²] = E[-x/λ²] = -E[x]/λ² = -λ/λ² = -1/λ.
Fisher's Information = -n E[∂² ln f(x) / ∂θ²] = n/λ.
Frequency Distribution             Fisher's Information
Binomial, m fixed23                mn / {q(1-q)}
Poisson                            n/λ
Negative Binomial, r fixed24       rn / {β(1+β)}
22 In the absence of grouping, truncation, or censoring.
23 m = 1 is a Bernoulli.
24 r = 1 is a Geometric.
Variance of Functions of Estimated Parameters:
When there is only one parameter θ, the variance of the estimate of a function of θ, h(θ), is:25
Var[h(θ)] ≅ (∂h/∂θ)² Var[θ^].
For example, assume one has a Poisson Distribution. Assume we wish to estimate the chance of having a single claim. Then h(λ) = λe^-λ, and ∂h/∂λ = e^-λ - λe^-λ = e^-λ(1 - λ).
As shown above Var[λ^] ≅ λ/n. Thus, Var[h] ≅ (∂h/∂λ)² Var[λ^] = e^-2λ (1-λ)² λ/n.
Exercise: A Poisson Distribution, f(x) = e−λ λ x / x!, has been fit to 23,872 observations via the Method of Maximum Likelihood and one obtained a point estimate λ = 0.3346. What is the variance of the resulting estimate of the density function at 1? [Solution: Var[h] ≅ e-2λ (1- λ )2 λ / n = e-(2)(0.3346) (1- 0.3346)2 0.3346 / 23,872 = 0.00000318. Comment: A one-dimensional example of the delta method, as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”] Exercise: A Poisson Distribution, f(x) = e−λ λ x / x!, has been fit to 23,872 observations via the Method of Maximum Likelihood and one obtained a point estimate λ = 0.3346. What is the an approximate 95% confidence interval for the chance of 1 claim? [Solution: Var[h] ≅ 0.00000318. Standard Deviation is 0.0018. The point estimate is: f(1) = λe-λ = 0.3346 e-0.3346 = 0.239. Thus an approximate 95% confidence interval is: 0.239 ± 0.004.]
25
This is the same formula used in the section on Method of Moments. It is a special case of the delta method, discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
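Here is the same delta-method calculation carried out in a few lines of Python (my sketch, not from the text; only the math module is needed), for the fitted λ = 0.3346 based on 23,872 observations.

import math

n, lam = 23872, 0.3346

h = lam * math.exp(-lam)                 # point estimate of f(1), about 0.239

# Delta method: Var[h] is approximately (dh/d lambda)^2 Var[lambda-hat], with Var[lambda-hat] = lambda/n.
dh_dlam = math.exp(-lam) * (1 - lam)
var_h = dh_dlam ** 2 * lam / n           # about 0.00000318
half_width = 1.96 * math.sqrt(var_h)

print(h, var_h)                          # about 0.239 and 3.2e-06
print(h - half_width, h + half_width)    # roughly 0.239 +/- 0.004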
Data Grouped into Intervals: Sometimes frequency data will be grouped into intervals.26 For grouped data, the likelihood for an interval [ai, bi] with ni observations is: {F(bi) -F(ai)} n i . The loglikelihood is: ni ln[F(bi) -F(ai)]. For grouped data, the loglikelihood is a sum of terms over the intervals: (number of observations in the interval) ln(probability covered by the interval). For example, assume we have the following data:27 Number of Claims Number of Insureds
0 17649
1 4829
2 1106
3 229
4 44
5 or more 15
All 23872
Then the loglikelihood is: 17649 ln f(0) + 4829 ln f(1) + 1106 ln f(2) + 229 ln f(3) + 44 ln f(4) + 15 ln[1 - F(4)]. Where rather than use the density for the final interval, we use the sum of the densities for 5 or more, that is 1 - F(4) = 1 - {f(0) + f(1) + f(2) + f(3) + f(4)}. Exercise: What is the loglikelihood for the following data? Number of Claims Number of Insureds
0 17649
1 4829
2 to 4 1379
5 or more 15
All 23872
[Solution: The loglikelihood is: 17649 ln f(0) + 4829 ln f(1) + 1379ln(f(2) + f(3) + f(4)) + 15 ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}].]
26
This is more common for severity data. The most common grouping of frequency data is to have a final interval, such as 10 or more claims. 27 Note that this is a previous data set, with a final interval of 5 or more claims. This grouping has removed some information, compared to the previous version of the data set.
Exercise: What is the loglikelihood for the following data and a Negative Binomial Distribution with r = 1.5 and β = 0.2? Number of Claims Number of Insureds
0 17649
1 4829
2 to 4 1379
5 or more 15
All 23872
[Solution: The densities of the Negative Binomial are: x
     f(x)
0    0.7607258
1    0.1901814
2    0.0396211
3    0.0077041
4    0.0014445
5    0.0002648
6    0.0000478
ln[f(0)] = -0.273482. ln[f(1)] = -1.659777. ln[f(2) + f(3)+ f(4)] = ln(0.0487697) = -3.02065. ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)] = ln(0.0003231) = -8.038. The loglikelihood is: 17,649 ln f(0) + 4829 ln f(1) + 1379ln[f(2) + f(3) + f(4)] + 15 ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}] = (17.649)(-0.273482) + (4829)(-1.659777) + (1379)(-3.02065) + (15)(-8.038) = -17,127.8.] Given a particular form of the density and a set of data, the loglikelihood is a function of the parameters. For example for a Poisson distribution and the following data Number of Claims Number of Insureds
0 17649
1 4829
2 to 4 1379
5 or more 15
All 23872
the loglikelihood is: 17,649 ln f(0) + 4829 ln f(1) + 1379 ln[f(2) + f(3) + f(4)] + 15 ln[1 - {f(0) + f(1) + f(2) + f(3) + f(4)}] = 17,649 ln(e−λ) + 4829 ln(λe−λ) + 1379 ln(λ2e−λ/2 + λ3e−λ/6 + λ4e−λ/24) + 15 ln(1 - e−λ - λe−λ - λ2e−λ/2 - λ3e−λ/6 - λ4e−λ/24).
Here is a graph of the loglikelihood as a function of lambda:
[Figure omitted: the loglikelihood plotted as a function of λ, for λ from 0 to 1; it is maximized near λ = 0.33.]
To fit a distribution via maximum likelihood to grouped data you find the set of parameters such that either Π{F(bi) -F(ai) } n i or Σ ni ln[F(bi) - F(ai)] is maximized. While in this case one can not solve for the maximum loglikelihood in closed form, using a computer one can determine that the maximum loglikelihood is -17243.8, for λ = 0.327562. ^
For this example, the maximum likelihood Poisson has λ = 0.328.
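The numerical maximization for this grouped data set can be done, for instance, with the following Python sketch (my own illustration, assuming numpy and scipy); it recovers λ ≈ 0.3276 and a maximum loglikelihood of about -17,243.8.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Grouped data: 0 claims, 1 claim, 2 to 4 claims, and 5 or more claims.
counts = np.array([17649, 4829, 1379, 15])

def negative_loglikelihood(lam):
    probs = np.array([
        poisson.pmf(0, lam),                        # f(0)
        poisson.pmf(1, lam),                        # f(1)
        poisson.cdf(4, lam) - poisson.cdf(1, lam),  # f(2) + f(3) + f(4)
        1 - poisson.cdf(4, lam),                    # 1 - F(4)
    ])
    return -(counts * np.log(probs)).sum()

result = minimize_scalar(negative_loglikelihood, bounds=(0.01, 2), method="bounded")
print(result.x, -result.fun)   # about 0.3276 and -17,243.8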
Years of Data:28 Sometimes one will only have limited information, the number of exposures and number of claims, from each of several years. For example: Year 1995 1996 1997 1998
Exposures   1257   1025   1452   1311
Claims      16     12     18     17
Total: 5045 exposures and 63 claims.
Note that the mean observed claim frequency is: 63/5045 = 0.01249. Assume that each exposure has a Poisson frequency process, and that each exposure in every year has the same expected claim frequency, λ. Assume that each Poisson frequency process is independent across exposures and years. Then in 1995, the number of claims is a Poisson frequency process with mean 1257λ. Similarly, in 1998, the number of claims is a Poisson frequency process with mean 1311λ. In 1998, the likelihood is: f(17) = e− 1311λ (1311 λ)17 / 17!. In 1998, the loglikelihood is: ln[f(17)] = 17 ln(λ) + 17 ln(1311) - 1311 λ - ln(17!). Thus the sum of the loglikelihoods is: (16+12+18+17) ln(λ) + 16 ln(1257) + 12 ln(1025) + 18 ln(1452) + 17 ln(1311) - (1257 + 1025 + 1452 + 1311)λ - ln(16!) - ln(12!) - ln(18!) - ln(17!) = 63ln(λ) + 16ln(1257) + 12ln(1025) + 18ln(1452) + 17ln(1311) - 5045λ - ln(16!) - ln(12!) - ln(18!) ln(17!). Setting the partial derivative of the loglikelihood equal to zero, one obtains 0 = 63/λ - 5045. ⇒ λ = 63/5045 = X . In general, when applied to years of data, the Method of Maximum Likelihood applied to the Poisson, produces the same result as the Method of Moments. For either the Binomial with m fixed, or the Negative Binomial with r fixed, when applied to years of data, the Method of Maximum Likelihood produces the same result as the Method of Moments. 28
See Example 15.35 in Loss Models.
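In code, the years-of-data estimate is just total claims divided by total exposures; a short Python check (my sketch) for the table above:

exposures = [1257, 1025, 1452, 1311]
claims = [16, 12, 18, 17]

# Each year's claim count is Poisson with mean (exposures in that year) * lambda,
# so the maximum likelihood (and method of moments) estimate is:
print(sum(claims) / sum(exposures))   # 63/5045 = 0.01249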
Exercise: Assume that each exposure in every year has a Negative Binomial frequency process, with the same parameters β and r. Assume that each Negative Binomial frequency process is independent across exposures and years. What is the loglikelihood for the following data: Year Exposures Claims 1995 1257 16 1996 1025 12 1997 1452 18 1998 1311 17 Total 5045 63 [Solution: In 1998, the number of claims is the sum of 1311 independent Negative Binomial Distributions each with the same parameters β and r, which is a Negative Binomial frequency process with parameters β and 1311r. In 1998 the likelihood is: f(17) = {(1311r + 16)!/(17!)(1311r -1)!} β17/(1+β)17+1311r. ln f(17) = ln(1311r + 16)! - ln(17!) - ln((1311r -1)!) + 17ln β - (17+1311r) ln((1+β)). Thus the sum of the loglikelihoods is: ln((1257r + 15)!) + ln((1025r + 11)!) + ln((1452r + 17)!) + ln((1311r + 16)!) - ln(16!) - ln(12!) ln(18!) - ln(17!) - ln((1257r -1)!) - ln((1025r -1)!) - ln((1452r -1)!) - ln((1311r -1)!) + 63lnβ (63+5045r) ln(1+β). Comment: One could use the form of the Negative Binomial density shown in the Appendix B attached to the exam; I found the alternate form involving factorials easier to work with here.] Then one could maximize this loglikelihood via numerical techniques, in order to fit the maximum likelihood Negative Binomial to this data. Restricted Maximum Likelihood:29 Assume the following numbers of claims: Year 1st quarter 2nd quarter 3rd quarter 1 26 22 25 2 22 25 23
4th quarter 27 22
Total 100 92
Assume that claims the first year are Poisson with parameter λ1. Assume that claims the second year are Poisson with parameter λ2.
29
See 4, 11/00, Q. 34, involving exponential severities, in “Mahlerʼs Guide to Fitting Loss Distributions.”
Exercise: Use maximum likelihood to estimate λ1. [Solution: It is the same as the method of moments. Estimated λ1 = 100. Alternately, each quarter is Poisson with parameter λ1/4. The loglikelihood is: ln f(26) + ln f(22) + ln f(25) + ln f(27) = ln( e-λ1 /4 (λ1/4)26/26!) + ln( e-λ1 /4 (λ1/4)22/22!) + ln( e-λ1 /4 (λ1/4)25/25!) + ln( e-λ1 /4 (λ1/4)27/27!) = −λ1 + 100ln(λ1) − 100ln(4) - ln(26!) - ln(22!) - ln(25!) - ln(27!). Setting the partial derivative with respect to λ1 equal to zero: 0 = -1 + 100/λ1. ⇒ λ1 = 100.] ^
Similarly applying maximum likelihood to the data for Year 2: λ2 = 92. Instead of separately estimating λ1 and λ2, one can assume some sort of relationship between them. For example, let us assume λ2 = 0.9λ1.30 For a Poisson Distribution, f(x) = e−λλ x/x!. ln f(x) = -λ + xln(λ) - ln(x!). Year 1 Loglikelihood is: -λ1 + 100ln(λ1) - ln(100!). Assuming λ2 = .9λ1, Year 2 Loglikelihood is: -λ2 + 92ln(λ2) - ln(92!) = -.9λ1 + 92ln(.9λ1) - ln(92!) = -.9λ1 + 92ln(λ1) + 92ln(.9) - ln(92!). Total Loglikelihood = -λ1 + 100ln(λ1 ) - ln(100!) - .9λ1 + 92ln(λ1) + 92ln(.9) - ln(92!) = -1.9λ1 + 192ln(λ1) - ln(100!) + 92ln(.9) - ln(92!). Setting the partial derivative with respect to λ1 equal to zero: 0 = -1.9 + 192/λ1. ⇒ λ1 = 192/1.9 = 101.05. Alternately, Year 2 is expected to produce the same number of claims as 0.9 of Year 1. Therefore, in total we have 1.9 exposures on the Year 1 level. For the Poisson, maximum likelihood = method of moments.31 λ1 = 192/1.9 = 101.05.
30
In practical applications, this type of assumption may come from many places. For example, it may come from examining a much larger similar set of data. 31 This applies here due to the very special properties of the Poisson Distribution.
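The restricted fit can also be checked numerically; the Python sketch below (my illustration, assuming scipy is available) maximizes the combined loglikelihood under the assumption λ2 = 0.9 λ1 and reproduces λ1 ≈ 101.05.

import math
from scipy.optimize import minimize_scalar

year1_claims, year2_claims = 100, 92

def negative_loglikelihood(lam1):
    # Year 1 is Poisson(lam1); Year 2 is assumed to be Poisson(0.9 lam1).
    # Terms like ln(100!) do not depend on lam1 and are omitted.
    lam2 = 0.9 * lam1
    loglik = (year1_claims * math.log(lam1) - lam1) + (year2_claims * math.log(lam2) - lam2)
    return -loglik

result = minimize_scalar(negative_loglikelihood, bounds=(1, 300), method="bounded")
print(result.x)   # about 101.05 = 192/1.9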
Problems: Use the following information for the next three questions: Number of Claims 0 1 2 3 4 5 6
Number of Insureds 301 361 217 87 27 6 1 1000
3.1 (1 point) The data given above is fit to a Poisson distribution using the method of maximum likelihood. What is the mean number of claims for this fitted Poisson distribution? A. less than 1.15 B. at least 1.15 but less than 1.25 C. at least 1.25 but less than 1.35 D. at least 1.35 but less than 1.45 E. at least 1.45 3.2 (2 points) What is the upper end of an approximate 95% confidence interval for the estimate in the previous question? A. less than 1.15 B. at least 1.15 but less than 1.25 C. at least 1.25 but less than 1.35 D. at least 1.35 but less than 1.45 E. at least 1.45 3.3 (3 points) What is an approximate 95% confidence interval for the chance of having 2 or more claims? A. [0.332, 0.342] B. [0.327, 0.347] C. [0.322, 0.352] D. [0.317, 0.357] E. [0.312, 0.362]
3.4 (2 points) You are given the following accident data from 1000 insurance policies: Number of accidents Number of policies 0 100 1 267 2 311 3 208 4 87 5 23 6 4 7+ 0 You fit a Binomial Distribution with m = 7 to this data via Maximum Likelihood. What is the fitted value of q? A. less than .25 B. at least .25 but less than .30 C. at least .30 but less than .35 D. at least .35 but less than .40 E. at least .40 3.5 (2 points) A fleet of cars has had the following experience for the last three years: Year Cars Number of Claims 1 1500 60 2 1700 80 3 2000 100 Using maximum likelihood, estimate of the annual Poisson parameter for a single car. A. 4.6% B. 4.8% C. 5.0% D. 5.2% E. 5.4% 3.6 (2 points) You are given the following: • The number of claims per year for a given risk follows a distribution with probability function p(n) = λn e−λ / n! , n = 0, 1,..., λ > 0. •
Five claims were observed for this risk during Year 1, nine claims were observed for this risk during Year 2, and three claims were observed for this risk during Year 3.
If λ is known to be an integer, determine the maximum likelihood estimate of λ. A. 4
B. 5
C. 6
D. 7
E. 8
3.7 (1 point) A data set has an empirical mean of 1.5 and no individual with more than 4 claims. For a Binomial Distribution and this data set:
m    q        loglikelihood
4    0.3750   -1633.20
5    0.3000   -1631.08
6    0.2500   -1629.87
7    0.2143   -1629.41
8    0.1875   -1630.63
Using the maximum likelihood Binomial Distribution, what is the probability of zero claims? A. 15.3% B. 16.8% C. 17.8% D. 18.5% E. 19.0% 3.8 (3 points) You observe the following grouped data on 1000 insureds: Number of Claims Number of Insureds 0-5 372 6-10 549 11-15 78 16-20 1 Which of the following expressions should be maximized in order to fit a Poisson Distribution with parameter λ to the above data via the method of maximum likelihood? A.
e−1000λ
⎛5 ⎞ 372 ⎛ 10 ⎞ 549 ⎛ 15 ⎞ 78 i i i ⎜ ∑ λ / i!⎟ ⎜ ∑ λ / i!⎟ ⎜ ∑ λ / i!⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝i=0 ⎠ ⎝i=6 ⎠ ⎝i=11 ⎠
B.
e−1000λ
⎛5 ⎞ 372 ⎛ 10 ⎞ 921 ⎛ 15 ⎞ 999 ⎛ 20 ⎞ 1000 i i ⎜ ∑ λ / i!⎟ ⎜ ∑ λ / i!⎟ ⎜ ∑ λ i / i!⎟ ⎜ ∑ λ i / i!⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝i=0 ⎠ ⎝i=6 ⎠ ⎝i=11 ⎠ ⎝i=16 ⎠
C.
e−1000λ
⎛5 ⎞ 628 ⎜ ∑ λ i / i!⎟ ⎜ ⎟ ⎝i=0 ⎠
D.
⎛5 ⎞ ⎛ 10 ⎞ ⎛ 15 ⎞ ⎛ 20 ⎞ i i i i ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ -1000λ + ln ⎜ ∑ λ / i!⎟ + ln ⎜ ∑ λ / i!⎟ + ln ⎜ ∑ λ / i!⎟ + ln ⎜ ∑ λ / i!⎟⎟ ⎝ i=0 ⎠ ⎝ i=6 ⎠ ⎝ i=11 ⎠ ⎝ i=16 ⎠
E.
⎛ 10 ⎞ 79 ⎜ ∑ λ i / i!⎟ ⎜ ⎟ ⎝i=6 ⎠
⎛ 20 ⎞ ⎜ ∑ λ i / i!⎟ ⎜ ⎟ ⎝ i=16 ⎠
⎛ 15 ⎞ ⎜ ∑ λ i / i!⎟ ⎜ ⎟ ⎝i=11 ⎠
None of the above.
3.9 (3 points) You observe 100 insureds. 85 of them have no claims and 15 of them have at least one claim. Fit a Poisson Distribution with parameter λ to this data via the method of maximum likelihood. What is the fitted value of λ? A. 0.15
B. 0.16
C. 0.17
D. 0.18
E. 0.19
3.10 (3 points) You have the following data from the state of East Carolina: Region Number of Claims Number of Exposures Claim Frequency Rural 5000 250,000 2.0% Urban 10,000 312,500 3.2% You assume that the distribution of numbers of claims is Poisson. Based on data from other states, you assume that the mean claim frequency (expected number of claims per exposure) for urban insureds is 1.5 times that for rural insureds. Via the method of maximum likelihood applied to all the data, estimate the expected number of claims for the rural region of East Carolina next year, if there are again 250,000 exposures. A. 5000 B. 5050 C. 5100 D. 5150 E. 5200 3.11 (3 points) Claim counts follow a Negative Binomial distribution. Determine the likelihood for the following four independent observations of claim counts: 2, 0, 4, 3. r3 (r + 1) (r + 2) (r + 3) β9 r3 (r + 1)2 (r + 2)2 (r + 3) β9 A. B. 288 (1+ β)r + 9 288 (1+ β) r + 9 C.
r3 (r + 1)3 (r + 2)2 (r + 3) β9 288 (1+ β) 4r + 9
D.
r3 (r + 1)3 (r + 2)2 (r + 3)2 β9 288 (1+ β) 4r + 9
E. None of the above 3.12 (3 points) You observe the following data on 1000 policies: Number of Accidents Number of Policies 0 100 1 267 2 311 3 208 4 or more 114 Which of the following expressions should be maximized in order to fit a Poisson Distribution with parameter λ to the above data via the method of maximum likelihood? A.
-1000λ + 1513 ln(λ) + 114 ln(eλ - 1 - λ - λ2/2 - λ3/6) - 519 ln2 - 208 ln3.
B.
-1000λ + 886 ln(λ) + 114 ln(eλ - 1 - λ - λ2/2 - λ3/6) - 519 ln2 - 208 ln3.
C.
-886λ + 1513 ln(λ) + 114 ln(eλ - 1 - λ - λ2/2 - λ3/6) - 519 ln2 - 208 ln3.
D.
-886λ + 886 ln(λ) + 114 ln(eλ - 1 - λ - λ2/2 - λ3/6) - 519 ln2 - 208 ln3.
E.
None of the above.
Use the following information for the next two questions: One has observed the following distribution of insureds by number of claims: Number of Claims 0 1 2 3 4 5&+ All Number of Insureds 390 324 201 77 8 0 1000 3.13 (2 points) A Poisson Distribution is fit via the Method of Maximum Likelihood. ^
Which of the following is the variance of λ ? A. less than 0.0008 B. at least 0.0008 but less than 0.0009 C. at least 0.0009 but less than 0.0010 D. at least 0.0010 but less than 0.0011 E. at least 0.0011 3.14 (2 points) A Negative Binomial Distribution with r = 3 is fit via the Method of Maximum ^
Likelihood. Which of the following is the variance of β? A. less than 0.00012 B. at least 0.00012 but less than 0.00013 C. at least 0.00013 but less than 0.00014 D. at least 0.00014 but less than 0.00015 E. at least 0.00015 3.15 (2 points) Claim counts follow a Poisson distribution. Determine the likelihood for the following four independent observations of claim counts: 5, 1, 2, 4. A. λ4 e−λ / 5760 B. λ12 e−λ / 5760 C. λ e−4λ / 5760 D. λ4 e−4λ / 5760 E. None of the above 3.16 (3 points) You are given: (i) An insurance policy has experienced the following numbers of claims over a 5-year period: 10 2 4 0 6 (ii) In the sixth year this insurance policy had at most one claim. (iii) Numbers of claims are independent from year to year. (iii) You use the method of maximum likelihood to fit a Poisson model. Determine the estimated Poisson parameter. (A) 3.60 (B) 3.65 (C) 3.70 (D) 3.75 (E) 3.80
3.17 (3 points) You are given the following data for the number of claims during a one-year period: Number of Claims Number of Policies 0 9000 1 800 2 180 3 20 4+ 0 Total 10,000 A Poisson distribution is fitted to the data using maximum likelihood estimation. Let P = probability of at least one claim using the fitted Poisson model. A Negative Binomial distribution is fitted to the data using the method of moments. Let Q = probability of at least one claim using the fitted Negative Binomial model. Calculate |P - Q|. (A) 0.00 (B) 0.01 (C) 0.02 (D) 0.03 (E) 0.04 Use the following information for the next four questions. Over one year, the following claim frequency observations were made for a group of 4000 policies, where ni is the number of claims observed for policy i:
Σ ni = 372. Σ ni2 = 403. 3.18 (2 points) You fit a Binomial Distribution with m = 2 via maximum likelihood. Estimate the number of these 4000 policies that will have exactly two claims next year. A. 9 or less B. 10 C. 11 D. 12 E. 13 or more 3.19 (2 points) You fit a Poisson Distribution via maximum likelihood. Estimate the number of these 4000 policies that will have exactly two claims next year. A. 14 or less B. 15 C. 16 D. 17 E. 18 or more 3.20 (2 points) You fit a Geometric Distribution via maximum likelihood. Estimate the number of these 4000 policies that will have exactly two claims next year. A. 22 or less B. 23 C. 24 D. 25 E. 26 or more 3.21 (2 points) You fit a Negative Binomial Distribution with r = 0.5 via maximum likelihood. Estimate the number of these 4000 policies that will have exactly two claims next year. A. 31 or less B. 32 C. 33 D. 34 E. 35 or more
Use for the next four questions, the following data for three years: Year Exposures Number of Claims 1 632,121 16,363 2 594,380 15,745 3 625,274 16,009 3.22 (2 points) You assume each exposure each year has the same Poisson Distribution. ^
Fit a Poisson Distribution via maximum likelihood. What is λ ? A. 2.6%
B. 2.8%
C. 3.0%
D. 3.2%
E. 3.4%
3.23 (2 points) You assume each exposure each year has the same Binomial Distribution with m = 2. Fit a Binomial Distribution via maximum likelihood. What is q^ ? 3.24 (2 points) You assume each exposure each year has the same Geometric Distribution. ^
Fit a Geometric Distribution via maximum likelihood. What is β ? 3.25 (2 points) You assume each exposure each year has the same Negative Distribution with ^
r = 1.5. Fit a Negative Distribution via maximum likelihood. What is β ?
3.26 (2 points) In baseball, a pitcherʼs earned run average (ERA) is the number of earned runs he allows per 9 innings. Chris Cross is a pitcher who has allowed 40 earned runs in 100 innings. Assume that the number of earned runs a pitcher allows per inning is Poisson. Determine a 90% confidence interval for the underlying mean ERA of Chris Cross. 3.27 (2 points) Data has been collected on 1000 insurance contracts, and a distribution has been fit by maximum likelihood. Determine the corresponding loglikelihood. number of claims per contract observed number fitted number 0 852 851 1 113 117 2 28 24 3 7 6 4 and more 0 2 A. -600 B. -580 C. -560 D. -540 E. -520
3.28 (3 points) You are given the following data on the annual number of claims: Number of Claims Number of Policies 0 70 1 25 2 5 3 or more 0 You assume a Negative Distribution with β = 0.2. Fit a Negative Distribution via maximum likelihood. What is ^r ? A. Less than 1.70 B. At least 1.70, but less than 1.75 C. At least 1.75, but less than 1.80 D. At least 1.80, but less than 1.85 E. 1.85 or more 3.29 (3 points) Claim counts follow a Poisson distribution with mean lambda. You observe five years of data: 3, 0, 2, at most 1, and 1. Determine the maximum likelihood estimate of lambda. A. 1.29 B. 1.31 C. 1.33 D. 1.35 E. 1.37 3.30 (2 points) For a data set of size 700 from a Negative Binomial Distribution with r = 3, determine Fisherʼs Information. 3.31 (3 points) You observe 100 insureds. 20 of them have fewer than 2 claim. Fit a Binomial Distribution with parameter m = 10 to this data via the method of maximum likelihood. What is the fitted value of q? A. 0.23 B. 0.25 C. 0.27 D. 0.29 E. 0.31
3.32 (2 points) You are given the following data for the number of claims during a one-year period:
Number of Claims    Number of Policies
0                              104
1                                37
2                                11
3                                  6
4                                  2
5+                                0
Total                         160
A Geometric distribution is fitted to the data using the method of moments.
Let G = probability of two claims using the fitted Geometric model.
A Poisson distribution is fitted to the data using maximum likelihood estimation.
Let P = probability of two claims using the fitted Poisson model.
Calculate 10,000 |G - P|.
(A) Less than 40
(B) At least 40, but less than 50
(C) At least 50, but less than 60
(D) At least 60, but less than 70
(E) At least 70

3.33 (2 points) For a data set of size 100 from a Binomial Distribution with m = 5, determine Fisherʼs Information.

3.34 (4 points) You observe the following data on 15 insureds:
Number of Claims    Number of Insureds
0 or 1                            5
2 or 3                            7
more than 3                   3
You fit a Geometric Distribution via the method of maximum likelihood. What is the fitted β?
A. 1.8    B. 2.0    C. 2.2    D. 2.4    E. 2.6
3.35 (2 points) You are given the following information:
• Annual claim counts for policies follow a Poisson distribution with mean λ.
• 70 out of 200 policies have zero claims.
Calculate the maximum likelihood estimate of λ.
A. 0.90    B. 0.95    C. 1.00    D. 1.05    E. 1.10
3.36 (2, 5/85, Q.20) (1.5 points) Let X1, X2, X3, and X4 be a random sample from the discrete distribution X such that P[X = x] = θ^(2x) exp[-θ²]/x!, for x = 0, 1, 2, . . . , where θ > 0.
If the data are 17, 10, 32, and 5, what is the maximum likelihood estimate of θ?
A. 4    B. 8    C. 16    D. 32    E. 64
3.37 (2, 5/88, Q.42) (1.5 points) Let X be the number of customers contacted in a given week before the first sale is made. The random variable X is assumed to have probability function f(x) = p(1 - p)^(x-1) for x = 1, 2, . . . , where p is the probability of a sale on any contact.
For three independently selected weeks, the values of X were 7, 9, and 2.
What is the maximum likelihood estimate of p?
A. 1/18    B. 1/15    C. 1/6    D. 1/5    E. 6

3.38 (2, 5/90, Q.25) (1.7 points) Let X1, X2, X3 be independent Poisson random variables with means θ, 2θ, and 3θ, respectively. What is the maximum likelihood estimator of θ?
A. X̄/2    B. X̄    C. (X1 + 2X2 + 3X3)/6    D. (3X1 + 2X2 + X3)/6    E. (6X1 + 3X2 + 2X3)/11
3.39 (165, 5/96, Q.3) (1.9 points) A coin is believed by Mark to be biased such that the probability of heads is 0.9. Mark tosses the coin 100 times and 50 heads are observed.
Mark decides to revise his estimate of the probability of heads by taking a weighted average of his a priori estimate and the maximum likelihood estimate from his experiment. The weight given to his a priori estimate is the a priori standard deviation and the weight given to his maximum likelihood estimate is the estimated standard deviation.
Determine the revised estimate of the probability of heads.
(A) 0.61    (B) 0.65    (C) 0.70    (D) 0.75    (E) 0.79

3.40 (4B, 11/98, Q.17) (2 points) You are given the following:
• The number of claims per year for a given risk follows a distribution with probability function p(n) = λ^n e^(-λ)/n!, n = 0, 1, ..., λ > 0.
• Two claims were observed for this risk during Year 1 and one claim was observed for this risk during Year 2.
If λ is known to be an integer, determine the maximum likelihood estimate of λ.
A. 1    B. 2    C. 3    D. 4    E. 5
3.41 (Course 4 Sample Exam 2000, Q.3) A fleet of cars has had the following experience for the last three years:
Earned Car Years    Number of Claims
500                              70
750                              60
1000                           100
The Poisson distribution is used to model this process.
Determine the maximum likelihood estimate of the Poisson parameter for a single car year.

3.42 (4, 11/02, Q.6 & 2009 Sample Q.34) (2.5 points) The number of claims follows a negative binomial distribution with parameters β and r, where β is unknown and r is known. You wish to estimate β based on n observations, where x̄ is the mean of these observations.
Determine the maximum likelihood estimate of β.
(A) x̄/r²    (B) x̄/r    (C) x̄    (D) r x̄    (E) r² x̄
3.43 (CAS3, 5/05, Q.20) (2.5 points) Blue Sky Insurance Company insures a portfolio of 100 automobiles against physical damage. The annual number of claims follows a binomial distribution with m = 100.
For the last 5 years, the number of claims in each year has been:
Year 1: 5    Year 2: 4    Year 3: 4    Year 4: 9    Year 5: 3
Two methods for estimating the variance in the annual claim count are:
Method 1: Unbiased Sample Variance
Method 2: Maximum Likelihood Estimation
Use each method to calculate an estimate of the variance. What is the difference between the two estimates?
A. Less than 0.50
B. At least 0.50, but less than 0.60
C. At least 0.60, but less than 0.70
D. At least 0.70, but less than 0.80
E. 0.80 or more

3.44 (CAS3, 11/05, Q.4) (2.5 points) When Mr. Jones visits his local race track, he places three independent bets. In his last 20 visits, he lost all of his bets 10 times, won one bet 7 times, and won two bets 3 times. He has never won all three of his bets.
Calculate the maximum likelihood estimate of the probability that Mr. Jones wins an individual bet.
A. 13/60    B. 4/15    C. 19/60    D. 11/30    E. 5/12
3.45 (4, 11/05, Q.29 & 2009 Sample Q.239) (2.9 points) You are given the following data for the number of claims during a one-year period:
Number of Claims    Number of Policies
0                              157
1                                66
2                                19
3                                  4
4                                  2
5+                                0
Total                         248
A geometric distribution is fitted to the data using maximum likelihood estimation.
Let P = probability of zero claims using the fitted geometric model.
A Poisson distribution is fitted to the data using the method of moments.
Let Q = probability of zero claims using the fitted Poisson model.
Calculate |P - Q|.
(A) 0.00    (B) 0.03    (C) 0.06    (D) 0.09    (E) 0.12

3.46 (CAS3, 5/06, Q.2) (2.5 points) Annual claim counts follow a Negative Binomial distribution.
The following claim count observations are available:
Year    Claim Count
2005          0
2004          3
2003          5
Assuming each year is independent, calculate the likelihood function of this sample.
A. [1/(β + 1)]^(3r) [β/(β + 1)]^8 r²(r + 2)²(r + 4) / (3! 5!)
B. [1/(β + 1)]^(3r) [β/(β + 1)]^8 r²(r + 2)²(r + 4) / (2! 4!)
C. [1/(β + 1)]^(3r) [β/(β + 1)]^8 r²(r + 1)²(r + 2)²(r + 3) / (2! 4!)
D. [1/(β + 1)]^(3r) [β/(β + 1)]^8 r²(r + 1)²(r + 2)²(r + 3)(r + 4) / (2! 4!)
E. [1/(β + 1)]^(3r) [β/(β + 1)]^8 r²(r + 1)²(r + 2)²(r + 3)(r + 4) / (3! 5!)
3.47 (4, 11/06, Q.12 & 2009 Sample Q.256) (2.9 points) You are given:
(i) The distribution of the number of claims per policy during a one-year period for 10,000 insurance policies is:
Number of Claims per Policy    Number of Policies
0                                              5000
1                                              5000
2 or more                                     0
(ii) You fit a binomial model with parameters m and q using the method of maximum likelihood.
Determine the maximum value of the loglikelihood function when m = 2.
(A) -10,397    (B) -7,781    (C) -7,750    (D) -6,931    (E) -6,730

3.48 (4, 11/06, Q.15 & 2009 Sample Q.259) (2.9 points) You are given:
(i) A hospital liability policy has experienced the following numbers of claims over a 10-year period:
10  2  4  0  6  2  4  5  4  2
(ii) Numbers of claims are independent from year to year.
(iii) You use the method of maximum likelihood to fit a Poisson model.
Determine the estimated coefficient of variation of the estimator of the Poisson parameter.
(A) 0.10    (B) 0.16    (C) 0.22    (D) 0.26    (E) 1.00

3.49 (4, 5/07, Q.18) (2.5 points) You are given:
(i) The distribution of the number of claims per policy during a one-year period for a block of 3000 insurance policies is:
Number of Claims per Policy    Number of Policies
0                                              1000
1                                              1200
2                                                600
3                                                200
4+                                                  0
(ii) You fit a Poisson model to the number of claims per policy using the method of maximum likelihood.
(iii) You construct the large-sample 90% confidence interval for the mean of the underlying Poisson model that is symmetric around the mean.
Determine the lower end-point of the confidence interval.
(A) 0.95    (B) 0.96    (C) 0.97    (D) 0.98    (E) 0.99
3.50 (CAS3L, 11/08, Q.6) (2.5 points) You are given the following:
• An insurance company provides a coverage which can result in only three loss amounts in the event that a claim is filed: $0, $500 or $1,000.
• The probability, p, of a loss being $0 is the same as the probability of it being $1,000.
• The following 3 claims are observed: $0, $0, $1,000.
What is the maximum likelihood estimate of p?
A. Less than 0.20
B. At least 0.20, but less than 0.40
C. At least 0.40, but less than 0.60
D. At least 0.60, but less than 0.80
E. At least 0.80

3.51 (CAS3L, 11/09, Q.19) (2.5 points) You are given the following information:
• The number of trials before success follows a geometric distribution.
• A random sample of size 10 from that process is: 0 1 2 3 4 4 5 6 7 8
Calculate the maximum likelihood estimate of the variance for the underlying geometric distribution.
A. Less than 10
B. At least 10, but less than 12
C. At least 12, but less than 14
D. At least 14, but less than 16
E. At least 16

3.52 (CAS3L, 5/10, Q.21) (2.5 points) You are given the following information:
• Daily claim counts follow a Poisson distribution with mean λ.
• Exactly five of the last nine days have zero claims.
Calculate the maximum likelihood estimate of λ.
A. Less than 0.25
B. At least 0.25, but less than 0.35
C. At least 0.35, but less than 0.45
D. At least 0.45, but less than 0.55
E. At least 0.55

3.53 (2 points) In the previous question, CAS3L, 5/10, Q.21, assume instead that daily claim counts follow a Geometric distribution with mean β.
Calculate the maximum likelihood estimate of β.
A. 0.4    B. 0.5    C. 0.6    D. 0.7    E. 0.8
Solutions to Problems:

3.1. B. For the Poisson with parameter λ, the method of maximum likelihood gives the same result as the method of moments: λ = Σni / N = 1200/1000 = 1.2.

3.2. C. The estimated mean is λ = 1.2. ln f(x) = x ln(λ) - λ - ln(x!).
∂ ln f(x)/∂λ = x/λ - 1. ∂² ln f(x)/∂λ² = -x/λ². E[∂² ln f(x)/∂λ²] = E[-x/λ²] = -E[x]/λ² = -λ/λ² = -1/λ.
Thus Var[λ̂] ≅ -1/{n E[∂² ln f(x)/∂λ²]} = λ/n = 1.2/1000 = 0.0012. Standard Deviation = √0.0012 = 0.0346.
An approximate 95% confidence interval for the mean frequency is the estimated mean ± 1.96 standard deviations: 1.20 ± (1.96)(0.0346) = 1.20 ± 0.07. The upper end is 1.27.
Alternately, maximum likelihood is equal to the method of moments.
Var[λ̂] = Var[X̄] = Var[X]/n = λ/n = 1.2/1000 = 0.0012. Proceed as before.

3.3. E. The estimated chance of two or more claims is 1 - e^(-λ) - λe^(-λ) = 1 - 2.2e^(-1.2) = 0.337.
h = 1 - (1 + λ)e^(-λ). ∂h/∂λ = (1 + λ)e^(-λ) - e^(-λ) = λe^(-λ) = 1.2e^(-1.2) = 0.361.
Var[h] = (∂h/∂λ)² Var[λ] = (0.361)²(0.0012) = 0.000156. Standard Deviation = √0.000156 = 0.0125.
An approximate 95% confidence interval for the chance of two or more claims is: 0.337 ± 0.025.
Comment: A one-dimensional example of the delta method, as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”

3.4. B. For m fixed, maximum likelihood equals the method of moments.
The mean is: {(100)(0) + (267)(1) + (311)(2) + (208)(3) + (87)(4) + (23)(5) + (4)(6)}/1000 = 2.000.
q = mean/m = 2.000/7 = 0.286.
3.5. A. For the Poisson, the method of maximum likelihood equals the method of moments.
λ = (60 + 80 + 100)/(1500 + 1700 + 2000) = 0.0462.
Alternately, in year one, the number of claims is a Poisson frequency process with mean 1500λ.
In year one, the likelihood is f(60) = e^(-1500λ) (1500λ)^60 / 60!.
In year one, the loglikelihood is ln(f(60)) = 60 ln(λ) + 60 ln(1500) - 1500λ - ln(60!).
Thus the sum of the loglikelihoods over the three years is:
(60 + 80 + 100) ln(λ) + 60 ln(1500) + 80 ln(1700) + 100 ln(2000) - (1500 + 1700 + 2000)λ - ln(60!) - ln(80!) - ln(100!).
Taking the partial derivative with respect to λ and setting it equal to zero: 0 = 240/λ - 5200. ⇒ λ = 240/5200 = 0.0462.
Comment: Similar to Course 4 Sample Exam, question 3.

3.6. C. The likelihood is the product of: f(5)f(9)f(3) = (λ⁵e^(-λ)/5!)(λ⁹e^(-λ)/9!)(λ³e^(-λ)/3!) = λ^17 e^(-3λ)/261,273,600.
Trying all the given values of lambda, the maximum likelihood occurs at λ = 6.
Lambda:       4          5          6          7          8
Likelihood:  0.00040  0.00089  0.00099  0.00068  0.00033
Comments: Similar to 4B, 11/98, Q.17.

3.7. D. For m fixed, maximum likelihood equals the method of moments.
So for fixed m, q = mean/m = 1.5/m. Thus the table contains the best set of parameters for each m.
The best loglikelihood is for m = 7 and q = 0.2143. f(0) = (1 - 0.2143)^7 = 0.1848.
3.8. A. The likelihood is the product of terms, one for each interval. For each interval one takes the probability covered by the interval, to the power equal to the number of insureds observed for that interval.
For example, the probability covered by the interval from 0 to 5 is:
f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = e^(-λ) {1 + λ + λ²/2! + λ³/3! + λ⁴/4! + λ⁵/5!} = e^(-λ) Σ_{i=0}^{5} λ^i/i!.
This term will be taken to the power 372, since that is the number of insureds observed in the interval from 0 to 5:
e^(-372λ) {Σ_{i=0}^{5} λ^i/i!}^372.
In total the likelihood function is the product of the contributions of each interval:
e^(-1000λ) {Σ_{i=0}^{5} λ^i/i!}^372 {Σ_{i=6}^{10} λ^i/i!}^549 {Σ_{i=11}^{15} λ^i/i!}^78 {Σ_{i=16}^{20} λ^i/i!}.
3.9. B. The loglikelihood is the sum of terms, one for each interval. For each interval one takes the log of the probability covered by the interval times the number of insureds observed for that interval.
Interval        Number of Insureds    Probability        Contribution to the Loglikelihood
0                            85                  e^(-λ)               (85)(-λ)
1 or more               15                  1 - e^(-λ)          (15) ln(1 - e^(-λ))
Loglikelihood = -85λ + 15 ln(1 - e^(-λ)).
In order to maximize the loglikelihood, take its derivative and set it equal to zero:
0 = -85 + 15e^(-λ)/(1 - e^(-λ)). ⇒ e^(-λ) = 0.85. ⇒ λ = 0.163.
Alternately, let 1 - q be the probability covered by the first interval and q be the probability covered by the second interval. Then the likelihood is: (1-q)^85 q^15.
This is the same as the likelihood from a Bernoulli when we observe 15 events out of 100 trials.
Therefore, the maximum likelihood value of q is 15/100 = 0.15. e^(-λ) = 1 - q = 0.85. ⇒ λ = 0.163.
Comment: One could have instead maximized the likelihood.
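For readers who like to confirm such answers numerically, here is a minimal Python sketch (standard library only; the grid search is my own rough check, not part of the exam technique) that reproduces the estimate both from the closed form and from the grouped loglikelihood:

    import math

    # Closed form from the solution: e^(-lambda) = 0.85
    lam_closed = -math.log(0.85)

    # Numerical check: maximize the grouped loglikelihood -85*lam + 15*ln(1 - e^(-lam))
    def loglik(lam):
        return -85 * lam + 15 * math.log(1 - math.exp(-lam))

    lam_grid = max((i / 10000 for i in range(1, 10001)), key=loglik)
    print(lam_closed, lam_grid)   # both approximately 0.163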
3.10. E. For a Poisson Distribution, f(x) = e^(-λ) λ^x/x!. ln f(x) = -λ + x ln(λ) - ln(x!).
The Rural Loglikelihood is: Σ over rural insureds of {-λR + xi ln(λR) - ln(xi!)} = -250,000λR + 5000 ln(λR) - Σ ln(xi!).
Assuming λU = 1.5λR, the Urban Loglikelihood is: Σ over urban insureds of {-1.5λR + xi ln(1.5λR) - ln(xi!)} = -312,500(1.5λR) + 10,000 ln(1.5λR) - Σ ln(xi!).
Total Loglikelihood = -{250,000 + (1.5)(312,500)}λR + 15,000 ln(λR) + 10,000 ln(1.5) - Σ ln(xi!).
Setting the partial derivative with respect to λR equal to zero: 0 = -718,750 + 15,000/λR. ⇒ λR = 15,000/718,750 = 2.087%. ⇒ 250,000λR = 5217.
Alternately, 312,500 urban exposures are expected to produce as many claims as (1.5)(312,500) = 468,750 rural exposures. For the Poisson, maximum likelihood = method of moments.
λR = (5000 + 10,000)/(250,000 + 468,750) = 2.087%. ⇒ 250,000λR = 5217.
Comment: This trick of adjusting exposures works for a Poisson frequency, but does not work in general for other frequency distributions. Similar to 4, 11/00, Q.34, involving exponential severities, in “Mahlerʼs Guide to Fitting Loss Distributions.” A similar trick of adjusting losses works for an Exponential severity, but does not work in general for other severity distributions.

3.11. C. f(0) = 1/(1 + β)^r. f(2) = {r(r+1)/2!} β²/(1 + β)^(r+2). f(3) = {r(r+1)(r+2)/3!} β³/(1 + β)^(r+3). f(4) = {r(r+1)(r+2)(r+3)/4!} β⁴/(1 + β)^(r+4).
Likelihood is: f(0)f(2)f(3)f(4) = {r³(r+1)³(r+2)²(r+3)/288} β⁹/(1 + β)^(4r + 9).
Comment: One would need a computer in order to maximize the likelihood. Similar to CAS3, 5/06, Q.2.
3.12. A. For grouped data, the loglikelihood is: Σ ni ln[F(bi) - F(ai)] =
100 ln f(0) + 267 ln f(1) + 311 ln f(2) + 208 ln f(3) + 114 ln(1 - f(0) - f(1) - f(2) - f(3)) =
100 ln(e^(-λ)) + 267 ln(λe^(-λ)) + 311 ln(λ²e^(-λ)/2) + 208 ln(λ³e^(-λ)/6) + 114 ln(1 - e^(-λ) - λe^(-λ) - λ²e^(-λ)/2 - λ³e^(-λ)/6) =
-100λ - 267λ + 267 ln(λ) - 311λ + (2)(311) ln(λ) - 311 ln(2) - 208λ + (3)(208) ln(λ) - 208 ln(6) - 114λ + 114 ln(e^λ - 1 - λ - λ²/2 - λ³/6) =
-1000λ + 1513 ln(λ) + 114 ln(e^λ - 1 - λ - λ²/2 - λ³/6) - 519 ln2 - 208 ln3.
Comment: For the Poisson, for grouped data, the method of maximum likelihood generally differs from the method of moments. One could instead maximize the likelihood:
λ^1513 e^(-886λ) (1 - e^(-λ) - λe^(-λ) - λ²e^(-λ)/2 - λ³e^(-λ)/6)^114 / {2^311 6^208}.
[A graph of the loglikelihood as a function of lambda, for lambda from 1.5 to 4, is omitted here.]
The maximum loglikelihood is -1533.12 at lambda = 2.03011.

3.13. C. X̄ = {(0)(390) + (1)(324) + (2)(201) + (3)(77) + (4)(8)}/1000 = 0.989.
For the Poisson, for ungrouped data, Method of Maximum Likelihood = Method of Moments.
λ̂ = X̄ = 0.989. Var[λ̂] = Var[X̄] = Var[X]/n = λ/1000 = 0.989/1000 = 0.000989.
3.14. D. For the Negative Binomial with r fixed, for ungrouped data, Method of Maximum Likelihood = Method of Moments.
β̂ = X̄/r = 0.989/3 = 0.330.
Var[β̂] = Var[X̄/3] = Var[X̄]/9 = Var[X]/9n = rβ(1+β)/9000 = (3)(0.330)(1.330)/9000 = 0.000146.
Comment: Var[β̂] = β(1+β)/(rn) = (0.330)(1.330)/3000 = 0.000146.

3.15. E. f(1) = λe^(-λ). f(2) = λ²e^(-λ)/2. f(4) = λ⁴e^(-λ)/24. f(5) = λ⁵e^(-λ)/120.
Likelihood is: f(1)f(2)f(4)f(5) = λ^12 e^(-4λ)/5760.
Comment: The likelihood is maximized for λ = X̄ = 3.

3.16. E. The first five years each contribute the appropriate Poisson density to the likelihood.
The sixth year contributes the probability of either 0 or 1 claim: e^(-λ) + λe^(-λ) = e^(-λ)(1 + λ).
The likelihood is: {e^(-λ) λ^10/10!}{e^(-λ) λ²/2!}{e^(-λ) λ⁴/4!}{e^(-λ)}{e^(-λ) λ⁶/6!} e^(-λ)(1 + λ).
Ignoring the factorials, this is proportional to: e^(-6λ) λ^22 + e^(-6λ) λ^23.
Setting the derivative with respect to lambda equal to zero:
0 = -6e^(-6λ) λ^22 + 22e^(-6λ) λ^21 - 6e^(-6λ) λ^23 + 23e^(-6λ) λ^22.
0 = -6λ + 22 - 6λ² + 23λ. ⇒ 6λ² - 17λ - 22 = 0.
⇒ λ = {17 + √(17² - (4)(6)(-22))}/{(2)(6)} = 3.80.
Comment: The first five observations are ungrouped data, while the sixth observation is grouped.
If there were 0 claims in year 6, then maximum likelihood is equal to method of moments: λ̂ = (10 + 2 + 4 + 0 + 6 + 0)/6 = 3.67.
If there were 1 claim in year 6, then maximum likelihood is equal to method of moments: λ̂ = (10 + 2 + 4 + 0 + 6 + 1)/6 = 3.83.
Thus we expect the answer to this question to be somewhere between 3.67 and 3.83.
3.17. B. Mean = {(9000)(0) + (800)(1) + (180)(2) + (20)(3)}/10,000 = 1220/10,000 = 0.1220.
2nd moment = {(9000)(0) + (800)(1) + (180)(4) + (20)(9)}/10,000 = 1700/10,000 = 0.1700.
Variance = 0.1700 - 0.1220² = 0.15512.
For the Poisson, maximum likelihood is equal to the method of moments. λ = 0.1220.
P = Prob[N > 0] = 1 - e^(-λ) = 1 - e^(-0.1220) = 0.11485.
For the Negative Binomial, rβ = 0.1220, and rβ(1+β) = 0.15512. ⇒ β = 0.2715. ⇒ r = 0.4494.
Q = Prob[N > 0] = 1 - 1/(1+β)^r = 1 - 1/1.2715^0.4494 = 0.10232.
|P - Q| = |0.11485 - 0.10232| = 0.01253.
Comment: Similar to 4, 11/05, Q.29.

3.18. A. X̄ = 372/4000 = 0.093. For m fixed, maximum likelihood is equal to the method of moments.
mq = 2q = 0.093. q = 0.0465. f(2) = 0.0465² = 0.216%. (4000)(0.216%) = 8.6.

3.19. C. X̄ = 372/4000 = 0.093. Maximum likelihood is equal to the method of moments. λ = 0.093.
f(2) = 0.093² e^(-0.093)/2! = 0.394%. (4000)(0.394%) = 15.8.

3.20. E. X̄ = 372/4000 = 0.093. Maximum likelihood is equal to the method of moments. β = 0.093.
f(2) = β²/(1 + β)³ = 0.093²/(1.093)³ = 0.662%. (4000)(0.662%) = 26.5.

3.21. D. X̄ = 372/4000 = 0.093. For r fixed, maximum likelihood is equal to the method of moments.
rβ = β/2 = 0.093. β = 0.186. f(2) = {(0.5)(1.5)/2!} 0.186²/(1.186)^2.5 = 0.847%. (4000)(0.847%) = 33.9.

3.22. A. In this situation, the method of moments is equal to maximum likelihood.
λ̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) = 48,117/1,851,775 = 2.60%.
3.23. In this situation, the method of moments is equal to maximum likelihood.
2q̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) = 48,117/1,851,775 = 2.60%. ⇒ q̂ = 2.60%/2 = 1.30%.
Alternately, the first year has a Binomial with m = (2)(632,121) = 1,264,242.
f(16,363) = {1,264,242!/(16,363! (1,264,242 - 16,363)!)} q^(16,363) (1 - q)^((2)(632,121) - 16,363).
The contribution to the loglikelihood is: 16,363 ln q + {(2)(632,121) - 16,363} ln(1 - q) + constants.
Summing up the contributions from the years, the loglikelihood is:
48,117 ln q + {(2)(1,851,775) - 48,117} ln(1 - q) + constants.
Setting the partial derivative with respect to q equal to zero:
0 = 48,117/q - {(2)(1,851,775) - 48,117}/(1 - q). ⇒ 0 = 48,117(1 - q) - q{(2)(1,851,775) - 48,117}.
⇒ q̂ = 48,117/{(2)(1,851,775)} = 1.30%.

3.24. In this situation, the method of moments is equal to maximum likelihood.
β̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) = 48,117/1,851,775 = 2.60%.
Alternately, the first year has a Negative Binomial with r = 632,121.
f(16,363) = {(632,121 + 16,363 - 1)!/((632,121 - 1)! 16,363!)} β^(16,363)/(1 + β)^(16,363 + 632,121).
The contribution to the loglikelihood is: 16,363 ln β - (16,363 + 632,121) ln(1 + β) + constants.
Summing up the contributions from the years, the loglikelihood is:
48,117 ln β - (1,851,775 + 48,117) ln(1 + β) + constants.
Setting the partial derivative with respect to β equal to zero:
0 = 48,117/β - (1,851,775 + 48,117)/(1 + β). ⇒ 0 = 48,117(1 + β) - β(1,851,775 + 48,117).
⇒ β̂ = 48,117/1,851,775 = 2.60%.
3.25. In this situation, the method of moments is equal to maximum likelihood.
1.5 β̂ = (16,363 + 15,745 + 16,009)/(632,121 + 594,380 + 625,274) = 2.60%. ⇒ β̂ = 2.60%/1.5 = 1.73%.
Alternately, the first year has a Negative Binomial with r = (1.5)(632,121).
f(16,363) = (constants) β^(16,363)/(1 + β)^(16,363 + (1.5)(632,121)).
The contribution to the loglikelihood is: 16,363 ln β - {16,363 + (1.5)(632,121)} ln(1 + β) + constants.
Summing up the contributions from the years, the loglikelihood is:
48,117 ln β - {(1.5)(1,851,775) + 48,117} ln(1 + β) + constants.
Setting the partial derivative with respect to β equal to zero:
0 = 48,117/β - {(1.5)(1,851,775) + 48,117}/(1 + β). ⇒ 0 = 48,117(1 + β) - β{(1.5)(1,851,775) + 48,117}.
⇒ β̂ = 48,117/{(1.5)(1,851,775)} = 1.73%.

3.26. Using either the method of moments or maximum likelihood, λ̂ = 40/100 = 0.4 per inning.
Var[λ̂] = (Variance for a single inning)/(number of innings observed) = λ/100 = 0.004.
A 90% confidence interval for λ is: 0.40 ± 1.645 √0.004 = 0.400 ± 0.104 = 0.296 to 0.504.
To convert to an ERA we multiply by 9: 2.66 to 4.54.
Comment: There is a lot of random fluctuation in the results of pitching only 100 innings.

3.27. E. From the fitted numbers of contracts in the table, for the fitted distribution:
f(0) = 851/1000 = 0.851, f(1) = 0.117, f(2) = 0.024, f(3) = 0.006, Prob[4 or more] = 0.002.
Based on the observations, the loglikelihood is: 852 ln[f(0)] + 113 ln[f(1)] + 28 ln[f(2)] + 7 ln[f(3)] =
852 ln[0.851] + 113 ln[0.117] + 28 ln[0.024] + 7 ln[0.006] = -520.16.
Comment: The fitted model is a Negative Binomial with r = 0.501 and β = 0.379. See Table 16.19 in Loss Models.
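Since each of the fits in 3.22 through 3.25 reduces to matching the mean claim frequency per exposure, they can all be verified with a few lines of arithmetic. A rough Python sketch (standard library only; the variable names are mine, not from the text):

    exposures = [632121, 594380, 625274]
    claims = [16363, 15745, 16009]

    mean = sum(claims) / sum(exposures)   # 48,117 / 1,851,775 = 0.0260

    lambda_hat = mean          # 3.22: Poisson, about 2.60%
    q_hat = mean / 2           # 3.23: Binomial with m = 2, about 1.30%
    beta_geometric = mean      # 3.24: Geometric, about 2.60%
    beta_nb = mean / 1.5       # 3.25: Negative Binomial with r = 1.5, about 1.73%
    print(lambda_hat, q_hat, beta_geometric, beta_nb)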
3.28. D. f(0) = 1/(1 + β)^r. f(1) = rβ/(1 + β)^(r+1). f(2) = {r(r + 1)/2} β²/(1 + β)^(r+2).
The loglikelihood is: 70 ln[f(0)] + 25 ln[f(1)] + 5 ln[f(2)] =
-70 r ln(1.2) + (25){ln(r) + ln(0.2) - (r+1) ln(1.2)} + (5){ln(r) + ln(r+1) - ln(2) + 2 ln(0.2) - (r+2) ln(1.2)}.
Setting the partial derivative of the loglikelihood with respect to r equal to 0:
0 = -70 ln(1.2) + 25/r - 25 ln(1.2) + 5/r + 5/(r+1) - 5 ln(1.2).
0 = r(r+1) 100 ln(1.2) - 30(r+1) - 5r.
0 = 100 ln(1.2) r² + {100 ln(1.2) - 35} r - 30. 18.232 r² - 16.768 r - 30 = 0.
⇒ r = {16.768 + √(16.768² - (4)(18.232)(-30))}/{(2)(18.232)} = 1.823. (r has to be positive.)
Comment: Since there are no policies with more than 2 claims, there is no contribution from ln[f(3)], etc., and therefore we can solve in closed form for r. The Method of Moments fit is: r = 0.35/0.2 = 1.75.
[A graph of the loglikelihood as a function of r, for r from 1 to 3, is omitted here; it peaks near r = 1.82.]
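As a check on 3.28, the quadratic root and a brute-force search over the loglikelihood can be compared with a short Python sketch (standard library only; this is a rough illustration, not something available at the exam):

    import math

    ln12 = math.log(1.2)   # beta = 0.2 is fixed, so 1 + beta = 1.2

    def loglik(r):
        # 70, 25, and 5 policies with 0, 1, and 2 claims
        f0 = -r * ln12
        f1 = math.log(r) + math.log(0.2) - (r + 1) * ln12
        f2 = math.log(r) + math.log(r + 1) - math.log(2) + 2 * math.log(0.2) - (r + 2) * ln12
        return 70 * f0 + 25 * f1 + 5 * f2

    # Positive root of 100 ln(1.2) r^2 + {100 ln(1.2) - 35} r - 30 = 0
    a, b, c = 100 * ln12, 100 * ln12 - 35, -30
    r_closed = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

    r_grid = max((i / 1000 for i in range(1000, 3001)), key=loglik)
    print(r_closed, r_grid)   # both approximately 1.82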
3.29. B. The likelihood associated with “at most one” is: F(1) = f(0) + f(1). Thus the likelihood is:
f(3) f(0) f(2) F(1) f(1) = f(3) f(0) f(2) {f(0) + f(1)} f(1) = (λ³e^(-λ)/6) e^(-λ) (λ²e^(-λ)/2) (e^(-λ) + λe^(-λ)) (λe^(-λ)) = λ⁶ e^(-5λ) (1 + λ)/12.
Thus the loglikelihood is: 6 ln(λ) - 5λ + ln(1+λ) - ln(12).
Setting the derivative of the loglikelihood equal to zero:
6/λ - 5 + 1/(1+λ) = 0. ⇒ 6(1+λ) - 5(1+λ)λ + λ = 0. ⇒ 5λ² - 2λ - 6 = 0.
⇒ λ = {2 + √(2² - (4)(5)(-6))}/{(2)(5)} = 1.314.
Comment: If we were to treat the fourth year as an observation of 1/2, then the sample mean would be: 6.5/5 = 1.3, close to the right answer. When dealing with the Poisson, if you do not know what else to do, try something intuitive. The fourth year acts like grouped data.

3.30. f(x) = {(3)(4)···(2 + x)/x!} β^x/(1+β)^(x+3).
ln f(x) = ln[3] + ln[4] + ... + ln[2 + x] - ln[x!] + x ln[β] - (x+3) ln[1+β].
∂ ln[f(x)]/∂β = x/β - (x + 3)/(1+β). ∂² ln[f(x)]/∂β² = -x/β² + (x + 3)/(1+β)².
E[X] = 3β. Thus E[∂² ln[f(x)]/∂β²] = -3β/β² + (3β + 3)/(1+β)² = 3{1/(1+β) - 1/β} = -3/{β(1+β)}.
Fisherʼs Information = -n E[∂² ln[f(x)]/∂β²] = -(700){-3/(β(1+β))} = 2100/{β(1+β)}.
Alternately, for the Negative Binomial with r fixed, maximum likelihood is equal to the method of moments.
β̂ = X̄/3. ⇒ Var[β̂] = Var[X̄]/9 = Var[X]/6300 = {(3)β(1+β)}/6300 = β(1+β)/2100.
However, Var[β̂] = 1/(Fisherʼs Information). ⇒ Fisherʼs Information = 2100/{β(1+β)}.
Comment: For a Negative Binomial with r fixed, for sample size n, Fisherʼs Information = rn/{β(1+β)}.
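Returning to 3.29, the estimate with the censored fourth year is easy to confirm numerically. A minimal Python sketch (standard library only), comparing the quadratic root with a crude grid search over the loglikelihood:

    import math

    # Closed form: positive root of 5*lam^2 - 2*lam - 6 = 0
    lam_closed = (2 + math.sqrt(4 + 120)) / 10

    # Loglikelihood 6*ln(lam) - 5*lam + ln(1 + lam), with constants dropped
    def loglik(lam):
        return 6 * math.log(lam) - 5 * lam + math.log(1 + lam)

    lam_grid = max((i / 1000 for i in range(1, 5001)), key=loglik)
    print(lam_closed, lam_grid)   # both approximately 1.31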
3.31. C. Let p = f(0) + f(1). Then the loglikelihood is: 20 ln[p] + 80 ln[1 - p].
Setting the derivative with respect to p equal to zero: 20/p - 80/(1-p) = 0. ⇒ p = 0.2.
⇒ 0.2 = f(0) + f(1) = (1-q)^10 + 10q(1 - q)^9 = (1 + 9q)(1 - q)^9.
Try the choices, and for q = 0.27, (1 + 9q)(1 - q)^9 = 0.202.

3.32. B. X̄ = {(0)(104) + (1)(37) + (2)(11) + (3)(6) + (4)(2)}/160 = 85/160 = 0.53125.
For the Geometric, β = X̄ = 0.53125. f(2) = β²/(1 + β)³ = 0.07861.
For the Poisson, maximum likelihood is equivalent to method of moments, λ = X̄ = 0.53125. f(2) = λ²e^(-λ)/2 = 0.08296.
10,000 |0.07861 - 0.08296| = 43.5.
Comment: Similar to 4, 11/05, Q.29.

3.33. f(x) = {5!/(x! (5 - x)!)} q^x (1-q)^(5-x).
ln f(x) = ln[5!] - ln[x!] - ln[(5-x)!] + (5-x) ln[1 - q] + x ln[q].
∂ ln[f(x)]/∂q = -(5 - x)/(1 - q) + x/q. ∂² ln[f(x)]/∂q² = -(5 - x)/(1 - q)² - x/q².
E[X] = 5q. Thus E[∂² ln[f(x)]/∂q²] = -(5 - 5q)/(1 - q)² - 5q/q² = -5{1/(1 - q) + 1/q} = -5/{q(1 - q)}.
Fisherʼs Information = -n E[∂² ln[f(x)]/∂q²] = -(100){-5/(q(1 - q))} = 500/{q(1 - q)}.
Alternately, for the Binomial with m fixed, maximum likelihood is equal to the method of moments.
q̂ = X̄/5. ⇒ Var[q̂] = Var[X̄]/25 = Var[X]/2500 = {(5)q(1-q)}/2500 = q(1-q)/500.
However, Var[q̂] = 1/(Fisherʼs Information). ⇒ Fisherʼs Information = 500/{q(1-q)}.
Comment: For a Binomial with m fixed, for sample size n, Fisherʼs Information = mn/{q(1-q)}.
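Returning to 3.31, the equation (1 + 9q)(1 - q)^9 = 0.2 has no nice closed form, but it is easy to solve numerically. A minimal bisection sketch in Python (the function g and the bracketing interval are my own choices for illustration):

    def g(q):
        return (1 + 9 * q) * (1 - q) ** 9 - 0.2

    # g is decreasing in q on (0, 1), so bisect on a bracketing interval
    lo, hi = 0.01, 0.99
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid

    print((lo + hi) / 2)   # about 0.271; the closest answer choice is 0.27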
3.34. E. The probability covered by the first interval is:
f(0) + f(1) = 1/(1+β) + β/(1+β)² = (1 + 2β)/(1+β)².
The probability covered by the second interval is:
f(2) + f(3) = β²/(1+β)³ + β³/(1+β)⁴ = (1 + 2β)β²/(1+β)⁴.
The probability covered by the final interval is:
f(4) + f(5) + ... = β⁴/(1+β)⁵ + β⁵/(1+β)⁶ + ... = {β⁴/(1+β)⁵}/{1 - β/(1+β)} = β⁴/(1+β)⁴.
Therefore, the loglikelihood is:
5 ln[(1 + 2β)/(1+β)²] + 7 ln[(1 + 2β)β²/(1+β)⁴] + 3 ln[β⁴/(1+β)⁴] = 12 ln[1 + 2β] + 26 ln[β] - 50 ln[1 + β].
Setting the derivative with respect to beta equal to zero:
0 = 24/(1 + 2β) + 26/β - 50/(1 + β). ⇒ 24β² - 52β - 26 = 0. ⇒ 12β² - 26β - 13 = 0.
⇒ β = {26 + √(26² - (4)(12)(-13))}/{(2)(12)}. Taking the positive root, β = 2.586.

3.35. D. The probability of seeing zero claims is: e^(-λ). The probability of seeing more than zero claims is: 1 - e^(-λ).
Thus the likelihood is: (e^(-λ))^70 (1 - e^(-λ))^130.
Therefore, the loglikelihood is: 70(-λ) + 130 ln(1 - e^(-λ)).
Setting the partial derivative of the loglikelihood with respect to λ equal to zero:
-70 + 130e^(-λ)/(1 - e^(-λ)) = 0. ⇒ 130e^(-λ) = (70)(1 - e^(-λ)). ⇒ e^(-λ) = 70/200 = 0.35. ⇒ λ = 1.05.
Alternately, when we have data grouped into only two intervals, one can fit a single parameter via maximum likelihood by setting the theoretical and empirical distribution functions equal at the boundary between the two intervals. (The empirical distribution function at x is the percentage of observations that are less than or equal to x.)
In this case, we set e^(-λ) = F(0) = 0.35. ⇒ λ = 1.05.
Comment: Similar to CAS3L, 5/10, Q.21. This is an example of data grouped into intervals; in this case there are two intervals: 0 claims, and at least one claim.
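Returning to 3.34, the quadratic root can be checked by plugging it back into the three interval probabilities. A short Python sketch (standard library only; a rough check rather than a required technique):

    import math

    beta = (26 + math.sqrt(26 ** 2 + 4 * 12 * 13)) / 24      # about 2.586

    def geometric_pf(k, b):
        return b ** k / (1 + b) ** (k + 1)

    p_first = geometric_pf(0, beta) + geometric_pf(1, beta)   # 0 or 1 claims
    p_second = geometric_pf(2, beta) + geometric_pf(3, beta)  # 2 or 3 claims
    p_tail = 1 - p_first - p_second                           # more than 3 claims

    loglik = 5 * math.log(p_first) + 7 * math.log(p_second) + 3 * math.log(p_tail)
    print(beta, p_tail, (beta / (1 + beta)) ** 4, loglik)     # p_tail matches beta^4/(1+beta)^4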
3.36. A. ln f(x) = 2x ln θ - θ² - ln(x!). The loglikelihood is: 2 ln(θ) Σxi - nθ² - Σ ln(xi!).
Set the partial derivative with respect to θ equal to zero: 0 = 2Σxi/θ - 2nθ. ⇒ θ² = Σxi/n = (17 + 10 + 32 + 5)/4 = 16. ⇒ θ̂ = 4.
Alternately, let θ² = λ. Then we have a Poisson, and maximum likelihood is the same as the method of moments: λ̂ = X̄ = 16. ⇒ θ̂ = 4.

3.37. C. ln f(x) = ln(p) + (x - 1) ln(1-p). The loglikelihood is: 3 ln(p) + (6 + 8 + 1) ln(1-p).
Set the derivative with respect to p equal to zero: 0 = 3/p - 15/(1-p). ⇒ 3(1-p) = 15p. ⇒ p = 1/6.
Alternately, X follows one plus a Geometric Distribution as per Loss Models.
⇒ X - 1 follows a Geometric Distribution as per Loss Models, with p = 1/(1+β).
For the Geometric, the method of moments is equal to maximum likelihood.
⇒ Working with X - 1, β̂ = (6 + 8 + 1)/3 = 5. p = 1/(1+β) = 1/6.

3.38. A. f(x) = λ^x e^(-λ)/x!. ln f(x) = x ln(λ) - λ - ln(x!). Therefore, the loglikelihood is:
X1 ln(θ) - θ - ln(X1!) + X2 ln(2θ) - 2θ - ln(X2!) + X3 ln(3θ) - 3θ - ln(X3!) =
X1 ln(θ) + X2 ln(θ) + X2 ln(2) + X3 ln(θ) + X3 ln(3) - 6θ - ln(X1!) - ln(X2!) - ln(X3!).
Setting the partial derivative with respect to θ equal to zero: (X1 + X2 + X3)/θ - 6 = 0. ⇒ θ̂ = (X1 + X2 + X3)/6 = X̄/2.

3.39. B. A priori variance = (0.9)(0.1)/100 = 0.0009. Standard deviation = 0.03.
Maximum likelihood estimate ⇔ method of moments estimate: q = 50/100 = 0.5.
Estimated variance = (0.5)(0.5)/100 = 0.0025. Standard deviation = 0.05.
Revised estimate is: {(0.03)(0.9) + (0.05)(0.5)}/(0.03 + 0.05) = 0.65.
3.40. B. The likelihood is the product of: f(2)f(1) = (λ²e^(-λ)/2)(λe^(-λ)) = λ³e^(-2λ)/2.
Trying all the given values of lambda, the maximum likelihood occurs at λ = 2.
Lambda:       1          2          3          4          5
Likelihood:  0.0677  0.0733  0.0335  0.0107  0.0028
Comment: The method of maximum likelihood applied to the Poisson is equal to the method of moments, so the result would ordinarily be λ = 3/2. However, here λ has been restricted to be an integer. Thus one expects the answer to be either 1 or 2.

3.41. For the Poisson, the method of maximum likelihood equals the method of moments.
λ = (70 + 60 + 100)/(500 + 750 + 1000) = 0.102.
Alternately, in year one, the number of claims is a Poisson frequency process with mean 500λ.
In year one, the likelihood is f(70) = e^(-500λ) (500λ)^70 / 70!.
In year one, the loglikelihood is ln(f(70)) = 70 ln(λ) + 70 ln(500) - 500λ - ln(70!).
Thus the sum of the loglikelihoods over the three years is:
(70 + 60 + 100) ln(λ) + 70 ln(500) + 60 ln(750) + 100 ln(1000) - (500 + 750 + 1000)λ - ln(70!) - ln(60!) - ln(100!).
Taking the partial derivative with respect to λ and setting it equal to zero: 0 = 230/λ - 2250. λ = 230/2250 = 0.102.
3.42. B. For r fixed, for the Negative Binomial Distribution, maximum likelihood equals the method of moments. Set rβ = x̄. ⇒ β = x̄/r.
Alternately, f(x) = {r(r+1)...(r+x-1)/x!} β^x/(1+β)^(x+r).
ln f(x) = ln(r) + ln(r+1) + ... + ln(r+x-1) - ln(x!) + x ln(β) - (x+r) ln(1+β).
Σ ln f(xi) = Σ{ln(r) + ln(r+1) + ... + ln(r+xi-1) - ln(xi!)} + ln(β) Σxi - ln(1+β) Σ(xi + r).
Taking the partial derivative with respect to β and setting it equal to zero:
0 = Σxi/β - Σ(xi + r)/(1 + β). ⇒ β Σ(xi + r) = (1 + β) Σxi. ⇒ β = Σxi/Σr = Σxi/(nr) = x̄/r.
Alternately, f(x) = {r(r+1)...(r+x-1)/x!} β^x/(1+β)^(x+r). ln f(x) = x ln β - (x + r) ln(1 + β) + “constants”, where since r is known we treat terms not involving β as a “constant” that will drop out when we take the partial derivative with respect to β.
Loglikelihood: ln(β) Σxi - ln(1 + β)(nr + Σxi) + constants.
Set the partial derivative with respect to β equal to 0: 0 = Σxi/β - (nr + Σxi)/(1 + β). ⇒ (1 + β)Σxi = β(nr + Σxi). ⇒ β = Σxi/(nr) = x̄/r.

3.43. D. Mean = (5 + 4 + 4 + 9 + 3)/5 = 5.
Sample Variance = {(5 - 5)² + (4 - 5)² + (4 - 5)² + (9 - 5)² + (3 - 5)²}/(5 - 1) = 5.5.
For m fixed, Maximum Likelihood ⇔ Method of Moments. Set 100q = 5. ⇒ q = 0.05.
Thus for the Binomial, the variance is: q(1 - q)(100) = (0.05)(0.95)(100) = 4.75. 5.5 - 4.75 = 0.75.

3.44. A. The number of bets won on each visit to the track is a Binomial with m = 3.
Maximum likelihood is equal to the method of moments.
mq = 3q = X̄ = {(10)(0) + (7)(1) + (3)(2) + (0)(3)}/(10 + 7 + 3 + 0) = 13/20. ⇒ q = 13/60.

3.45. C. X̄ = {(0)(157) + (1)(66) + (2)(19) + (3)(4) + (4)(2)}/(157 + 66 + 19 + 4 + 2) = 0.5.
For the Geometric, maximum likelihood is equivalent to the method of moments. β = X̄ = 0.5. f(0) = 1/(1 + β) = 1/1.5 = 0.6667.
For the Poisson, method of moments, λ = X̄ = 0.5. f(0) = e^(-λ) = e^(-0.5) = 0.6065.
The absolute difference is: |0.6667 - 0.6065| = 0.0601.
Comment: For the Poisson, maximum likelihood is equivalent to the method of moments.
3.46. E. f(0) = 1/(1 + β)^r. f(3) = {r(r+1)(r+2)/3!} β³/(1 + β)^(r+3). f(5) = {r(r+1)(r+2)(r+3)(r+4)/5!} β⁵/(1 + β)^(r+5).
Likelihood is: f(0)f(3)f(5) = [1/(β + 1)]^(3r) [β/(β + 1)]^8 r²(r + 1)²(r + 2)²(r + 3)(r + 4)/(3! 5!).
Comment: Using a computer one can maximize the loglikelihood: r = 1.871, β = 1.425.

3.47. B. X̄ = {(5000)(0) + (5000)(1)}/(5000 + 5000) = 0.5.
For the Binomial with m fixed, the method of maximum likelihood is equal to the method of moments. mq = 0.5. ⇒ q̂ = 0.5/2 = 0.25.
The loglikelihood is: 5000 ln[f(0)] + 5000 ln[f(1)] = 5000 ln[(1 - q)²] + 5000 ln[2(1-q)(q)] = 15000 ln[1 - q] + 5000 ln[q] + 5000 ln[2].
At q̂ = 0.25, the loglikelihood is: 15000 ln[0.75] + 5000 ln[0.25] + 5000 ln[2] = -7781.
3.48. B. λ̂ = X̄ = (10 + 2 + 4 + 0 + 6 + 2 + 4 + 5 + 4 + 2)/10 = 3.9.
Var[λ̂] = Var[X̄] = Var[X]/N = λ̂/10 = 0.39.
CV[λ̂] = √0.39 / 3.9 = 0.160.
Alternately, f(x) = e^(-λ) λ^x/x!. ln f(x) = -λ + x ln(λ) - ln(x!).
∂ ln[f(x)]/∂λ = -1 + x/λ. ∂² ln[f(x)]/∂λ² = -x/λ². E[∂² ln[f(x)]/∂λ²] = -E[X]/λ² = -λ/λ² = -1/λ.
Var[λ̂] = -1/{N E[∂² ln[f(x)]/∂λ²]} = λ/N. Proceed as before.

3.49. C. The method of maximum likelihood is equivalent to the method of moments in this case.
λ̂ = X̄ = {(0)(1000) + (1)(1200) + (2)(600) + (3)(200)}/3000 = 1.
Var[λ̂] = Var[X̄] = Var[X]/n = λ/n = 1/3000.
Therefore, a 90% confidence interval for λ is: 1 ± (1.645)√(1/3000) = 0.970 to 1.030.

3.50. C. Prob[X = 0] = Prob[X = 1000] = p. Therefore, Prob[X = 500] = 1 - 2p. Therefore, 0 ≤ p ≤ 1/2.
The likelihood of the observation is: p p p = p³, an increasing function of p.
This is maximized for p as big as possible; p = 1/2.
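The numerical pieces of 3.48 and 3.49 are quick to verify. A minimal Python sketch (standard library only; the layout of the check is mine):

    import math

    # 3.48: ten years of claim counts
    counts = [10, 2, 4, 0, 6, 2, 4, 5, 4, 2]
    lam = sum(counts) / len(counts)              # 3.9
    cv = math.sqrt(lam / len(counts)) / lam      # sqrt(0.39)/3.9, about 0.160

    # 3.49: 3000 policies with sample mean 1; 90% confidence interval
    lam2 = (0 * 1000 + 1 * 1200 + 2 * 600 + 3 * 200) / 3000
    half_width = 1.645 * math.sqrt(lam2 / 3000)
    print(cv, lam2 - half_width, lam2 + half_width)   # 0.160, 0.970, 1.030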
3.51. E. For the Geometric Distribution, maximum likelihood is equal to the method of moments.
β̂ = (0 + 1 + 2 + 3 + 4 + 4 + 5 + 6 + 7 + 8)/10 = 4. The variance of the Geometric is: β(1+β) = (4)(5) = 20.

3.52. E. The probability of seeing zero claims in a day is: e^(-λ). The probability of seeing more than zero claims in a day is: 1 - e^(-λ).
There were 5 days with zero claims and 4 days with more than zero claims, thus the likelihood is: (e^(-λ))⁵ (1 - e^(-λ))⁴.
Therefore, the loglikelihood is: 5(-λ) + 4 ln(1 - e^(-λ)).
Setting the partial derivative of the loglikelihood with respect to λ equal to zero:
-5 + 4e^(-λ)/(1 - e^(-λ)) = 0. ⇒ e^(-λ) = 5/9. ⇒ λ = 0.588.
Alternately, set y = e^(-λ). Then the likelihood is: y⁵(1-y)⁴. Since this is a one-to-one monotonic transformation, we can find the y that maximizes y⁵(1-y)⁴, and the likelihood will be maximized for the corresponding λ.
Setting the derivative with respect to y equal to zero: 0 = 5y⁴(1-y)⁴ - 4y⁵(1-y)³. ⇒ 5(1-y) = 4y. ⇒ y = 5/9. ⇒ e^(-λ) = 5/9. ⇒ λ = 0.588.
Alternately, when we have data grouped into only two intervals, one can fit a single parameter via maximum likelihood by setting the theoretical and empirical distribution functions equal at the boundary between the two intervals. (The empirical distribution function at x is the percentage of observations that are less than or equal to x.) In this case, we set e^(-λ) = F(0) = 5/9. ⇒ λ = 0.588.
Comment: This is an example of data grouped into intervals; in this case there are two intervals: 0 claims, and at least one claim. In general, maximum likelihood is invariant under monotonic one-to-one changes of variables, such as: x², √x, 1/x, e^x, e^(-x), and ln(x).
3.53. E. The probability of seeing zero claims in a day is: 1/(1+β). The probability of seeing more than zero claims in a day is: 1 - 1/(1+β) = β/(1+β).
There were 5 days with zero claims and 4 days with more than zero claims, thus the likelihood is: {1/(1+β)⁵}{β⁴/(1+β)⁴} = β⁴/(1+β)⁹.
Therefore, the loglikelihood is: 4 ln(β) - 9 ln(1+β).
Setting the partial derivative of the loglikelihood with respect to β equal to zero:
4/β - 9/(1+β) = 0. ⇒ 4(1+β) = 9β. ⇒ β = 0.8.
Alternately, when we have data grouped into only two intervals, one can fit a single parameter via maximum likelihood by setting the theoretical and empirical distribution functions equal at the boundary between the two intervals. In this case, we set 1/(1+β) = F(0) = 5/9. ⇒ β = 0.8.
Section 4, Chi-Square Test³²

The principal method to test the fits of frequency distributions is the Chi-Square Test. A smaller Chi-Square statistic is better, assuming the same degrees of freedom; as discussed below we can compare fits using the corresponding p-values.

Chi-Square Statistic:

One can use the Chi-Square Statistic to test the fit of Frequency Distributions or Loss Distributions. The Chi-Square Statistic is computed as a sum of terms, one for each interval. For each interval one computes: (observed number - expected number)²/expected number.
For example, compare a Poisson Distribution with λ = 1/2 to the following data:
Number of Claims    Observed # Insureds
0                              22,281
1                              10,829
2                                2,706
3                                   429
4                                     44
5 and over                         6
Sum                          36,295
Then the expected number of insureds with n claims is: 36,295 f(n) = 36,295 e^(-λ) λ^n/n!.
For example, for n = 3: (36,295)(e^(-0.5) 0.5³/3!) = (36,295)(0.012636) = 458.6.

Number of Claims    Observed # Insureds    Poisson Distribution    Expected # Insureds    Chi-Square = (observed - expected)²/expected
0                              22,281                        0.606531                   22,014.03                   3.238
1                              10,829                        0.303265                   11,007.02                   2.879
2                                2,706                        0.075816                     2,751.75                   0.761
3                                   429                        0.012636                        458.63                   1.914
4                                     44                        0.001580                          57.33                   3.099
5 and over                           6                        0.000172                            6.25                   0.010
Sum                          36,295                        1.000000                   36,295.00                 11.900

32 See also “Mahlerʼs Guide to Fitting Loss Distributions.”
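The table above is also easy to reproduce by machine, which can be useful for checking hand computations when practicing. A rough Python sketch (standard library only):

    import math

    observed = [22281, 10829, 2706, 429, 44, 6]   # 0, 1, 2, 3, 4, 5 and over
    n = sum(observed)                              # 36,295
    lam = 0.5

    densities = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(5)]
    densities.append(1 - sum(densities))           # probability of 5 or more

    expected = [n * p for p in densities]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(chi_square)                               # about 11.9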
The contribution to the Chi-Square Statistic from the n = 3 interval is: (429 - 458.63)²/458.63 = 1.914.
The Chi-Square Statistic is: 3.238 + 2.879 + 0.761 + 1.914 + 3.099 + 0.010 = 11.900.
The closer the match between the observed and expected numbers in an interval, the smaller the contribution to the Chi-Square Statistic. A small Chi-Square Statistic indicates a good match between the data and the assumed distribution.
Note that the sum of the assumed column is equal to the sum of the observed column. That should always be the case when you compute a Chi-Square Statistic.
In this case, we just assumed the given Poisson Distribution, and did not fit it to this data. In the case of a distribution fit to a data set, one can calculate the Chi-Square Statistic in the same manner.
The Negative Binomial distribution previously fit by Method of Moments to the following data has parameters β = 0.230 and r = 1.455.
Number of Claims:    0         1        2        3      4     5    6    7    8    All
Number of Drivers:   17649  4829  1106  229  44   9    4    1    1    23872
Here is the computation of the Chi-Square Statistic for this case:
Number of Claims    Observed    Fitted Neg. Binomial    Fitted # Drivers    Chi-Square = (observed # - fitted #)²/fitted #
0                              17,649          0.7399149                 17,663.25               0.01
1                                4,829          0.2013197                   4,805.90               0.11
2                                1,106          0.0462114                   1,103.16               0.01
3                                   229          0.0099522                      237.58               0.31
4                                     44          0.0020728                        49.48               0.61
5 and over                         15          0.0005291                        12.60               0.46
Sum                          23,872          1.0000000                 23,871.97               1.50
For example, for 3 claims the fitted number of drivers is: (total # of drivers observed)(f(3) of the fitted Negative Binomial Distribution) = (23,872)(0.0099522) = 237.58.
The totals of the observed and fitted columns are equal, as they should be.
Note that in this case I have chosen to group the intervals for 5 and over, so as to have at least 5 expected observations.³³ However, unless an exam question has specifically told you which groups to use, use the groups for the data given in the exam question.³⁴

Hypothesis Testing:³⁵

One could test the fit of the method of moments Negative Binomial Distribution to the observed data by comparing the computed Chi-Square Statistic of 1.50 to the Chi-Square Distribution with degrees of freedom equal to: the number of intervals - 1 - number of fitted parameters.³⁶
In this case with six intervals and two fitted parameters, one would compare 1.50 to the Chi-Square for 6 - 1 - 2 = 3 degrees of freedom.
Degrees of Freedom    Significance Levels:    0.100      0.050      0.025      0.010      0.005
3                                                               6.251      7.815      9.348      11.345    12.838
One would not reject this fit at a 10% significance level, since 1.50 < 6.251. We do not reject at 10% the null hypothesis that the data was generated by this Negative Binomial Distribution. The alternate hypothesis is that it was not.
In general, the number of degrees of freedom is used to determine which row of the Chi-Square Table to consult. We then see which two columns bracket the value of the Chi-Square statistic; we reject at the significance level of the left hand of the two columns, and we do not reject at the significance level of the right hand of the two columns. In this case we do not reject even at 10%. In general, reject to the left and do not reject to the right.
If instead the Chi-Square statistic had turned out to be 11, since 9.348 < 11 < 11.345, we would have rejected at 2.5% and not rejected at 1.0%.
33 Six and over would have only 2.5 expected observations. In practical applications, there are a number of different rules of thumb one can use for determining the groups to use. See page 452 of Loss Models. I use one of the rules mentioned in Loss Models: one should have an expected number of claims in each interval of 5 or more, so that the normal approximation that underlies the theory is reasonably close; therefore, some of the given intervals for grouped data may be combined for purposes of applying the Chi-Square test.
34 As in, for example, 4, 5/00, Q.29.
35 See “Mahlerʼs Guide to Fitting Loss Distributions” for a more detailed discussion of Hypothesis Testing.
36 In computing the number of degrees of freedom, we use the number of intervals used to compute the Chi-Square statistic. As discussed above, this may be less than the number of intervals in the original data set, if we have combined some intervals.
In determining the number of degrees of freedom, we only subtract the number of parameters when the distribution has been fit to the data set we are using to compute the Chi-Square. We do not decrease the number of degrees of freedom if this distribution has been fit to some similar but different data set. If the distribution is assumed, then the number of degrees of freedom is: number of intervals minus one.

Exercise: Use the Chi-Square Statistic computed previously as 11.900, in order to test the null hypothesis H0: that the data with 36,295 insureds is a random sample from the Poisson Distribution with λ = 1/2, against the alternative hypothesis H1: this data is not a random sample from the Poisson Distribution with λ = 1/2.
[Solution: With six intervals and no fitted parameters, there are: 6 - 1 = 5 degrees of freedom.
Degrees of Freedom    Significance Levels:    0.100      0.050      0.025      0.010      0.005
5                                                               9.236      11.070    12.832    15.086    16.750
11.070 < 11.900 < 12.832; reject H0 at 5%, and do not reject H0 at 2.5%.]

To compute the number of Degrees of Freedom:
1. Determine the groups to use in computing the Chi-Square statistic. Unless the exam question has specifically told you which groups to use, use the groups for the data given in the question.
2. Determine whether any parameters have been fit to this data, and if so how many.
3. Degrees of freedom = (# intervals from step 1) - 1 - (# of fitted parameters, if any, from step 2).
Exercise: Assume one has observed drivers over nine years and got the following distribution of drivers by number of claims over the nine-year period:
Number of Claims:    0         1        2        3      4     5    6    7    8    All
Number of Drivers:   17649  4829  1106  229  44   9    4    1    1    23872
Use the Chi-Square Statistic to test the fit of a Poisson Distribution fit to this data via the Method of Maximum Likelihood. Group the intervals for 4 and over in computing the Chi-Square Statistic.
[Solution: For the Poisson, the Method of Maximum Likelihood and the Method of Moments applied to ungrouped data produce the same result. By taking the average value of the number of claims, one can calculate the first moment:
{(0)(17649) + (1)(4829) + (2)(1106) + (3)(229) + (4)(44) + (5)(9) + (6)(4) + (7)(1) + (8)(1)}/23,872 = 7988/23,872 = 0.3346.
The fitted Poisson parameter λ is equal to the observed mean of 0.3346.
Number of Claims    Observed    Fitted Poisson Distribution    Chi-Square = (observed number - fitted number)²/fitted #
0                              17,649              17,083.4                            18.73
1                                4,829                5,716.1                          137.67
2                                1,106                   956.3                            23.43
3                                   229                   106.7                          140.33
4 and over                         59                       9.5                          257.92
Sum                          23,872              23,872.0                          578.08
Note that the totals of the observed and fitted columns are equal, as they always should be for the computation of a Chi-Square.
For example, for 2 claims, the fitted number of observations is the total number of drivers observed times the Poisson density at 2:
(λ² e^(-λ)/2!)(23,872) = {(0.3346²)(e^(-0.3346))/2}(23,872) = (0.04006)(23,872) = 956.3.
Then the contribution to the Chi-Square is: (1106 - 956.3)²/956.3 = 23.43.
One tests the fit by comparing the computed value of 578.08 to the Chi-Square Distribution with degrees of freedom equal to the number of intervals - 1 - number of fitted parameters. In this case with one fitted parameter and 5 intervals, one would compare to the Chi-Square for 5 - 1 - 1 = 3 degrees of freedom.
Degrees of Freedom    Significance Levels:    0.100      0.050      0.025      0.010      0.005
3                                                               6.251      7.815      9.348      11.345    12.838
One would reject this fit at a 0.5% significance level, since 12.838 < 578.08.
Comment: The Poisson is too short-tailed to fit this data.]
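The whole exercise, from fitting lambda to computing the statistic with the 4-and-over group, can be reproduced in a few lines. A rough Python sketch (standard library only; the grouping mirrors the exercise):

    import math

    drivers = [17649, 4829, 1106, 229, 44, 9, 4, 1, 1]       # 0 through 8 claims
    n = sum(drivers)                                          # 23,872
    lam = sum(k * d for k, d in enumerate(drivers)) / n       # 0.3346

    observed = drivers[:4] + [sum(drivers[4:])]               # group 4 and over
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(4)]
    probs.append(1 - sum(probs))
    fitted = [n * p for p in probs]

    chi_square = sum((o - f) ** 2 / f for o, f in zip(observed, fitted))
    print(lam, chi_square)                                    # about 0.335 and 578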
p-values:

The probability value or p-value is the value of the Survival Function of the Chi-Square Distribution (for the appropriate number of degrees of freedom) at the value of the Chi-Square Statistic. If the data set came from the fitted or assumed distribution, then the p-value is the probability that the Chi-Square statistic would be greater than its observed value, due to random fluctuation.
Good match between data and distribution. ⇒ Small Chi-Square Statistic. ⇒ A small Chi-Square distribution function value. ⇒ A large Chi-Square survival function value. ⇒ A large p-value.
A large p-value indicates a good fit. Thus one can compare the fit of two distributions to a data set by comparing the p-values corresponding to their Chi-Square Statistics; the distribution with the larger p-value is the better fit. In general, the p-value is the value at which one can reject the fit or the null hypothesis.³⁷
For the previous exercise, the p-value was less than 0.5%, since 12.838 < 578.08. If instead the Chi-Square statistic with 3 degrees of freedom had been 8.2, then the p-value would have been 5% > p > 2.5%, since 7.815 < 8.2 < 9.348.

Exercise: The Chi-Square statistic with 3 degrees of freedom is 10. Estimate the corresponding p-value.
[Solution: 2.5% > p > 1%, since 9.348 < 10 < 11.345.
Comment: Thus we reject at 2.5% and do not reject at 1%.]

Note that using the Chi-Square table one can only get interval estimates of p. Using computer software one can instead get specific values for p. For example, for the Chi-Square statistic of 10 with 3 degrees of freedom, the corresponding p-value is 1.86%.³⁸

Exercise: A Negative Binomial distribution has a Chi-Square statistic of 14.8 with 6 degrees of freedom, while a Poisson distribution has a Chi-Square statistic of 15.4 with 7 degrees of freedom. Which one has the better p-value?
[Solution: Since 14.449 < 14.8 < 16.812, the p-value for the Negative Binomial is between 2.5% and 1%. Since 14.067 < 15.4 < 16.013, the p-value for the Poisson is between 5% and 2.5%. The Poisson has a larger and therefore better p-value.]

37 When using the Chi-Square Table, one rejects at the significance value in the table that first exceeds the p-value. For example, with a p-value of 0.6% one rejects at 1%. Reject to the left and do not reject to the right.
38 The Chi-Square Distribution for 3 d.f. is a Gamma Distribution with α = 3/2 and θ = 2. The survival function at 10 is 1 - Γ[1.5; 10/2] = 0.0186.
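If software is available, the survival function of the Chi-Square Distribution gives the p-value directly. A minimal sketch, assuming SciPy is installed (the exact printed values depend on the library version only in the trailing digits):

    from scipy.stats import chi2

    print(chi2.sf(10, df=3))     # about 0.0186, i.e. 1.86%
    print(chi2.sf(14.8, df=6))   # Negative Binomial example, roughly 0.02
    print(chi2.sf(15.4, df=7))   # Poisson example, roughly 0.03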
Years of Data:³⁹

Exercise: Use the following data for three years:
Year    Exposures    Claims
2001        900           158
2002      1000           160
2003      1100           162
Use the Chi-Square Goodness of Fit Test to test the null hypothesis that the frequency per exposure is the same for all three years.
[Solution: The observed overall frequency is: (158 + 160 + 162)/(900 + 1000 + 1100) = 0.16.
The expected number of claims in 2001 is: (900)(0.16) = 144.
Year    Observed    Exposures    Assumed Freq.    Expected    Chi-Square
2001        158               900                0.16                   144            1.361
2002        160            1,000                0.16                   160            0.000
2003        162            1,100                0.16                   176            1.114
Sum         480            3,000                0.16                   480            2.475
There are 3 - 1 = 2 degrees of freedom. The 10% critical value is 4.605.
2.475 < 4.605, so we do not reject H0 at 10%.
Comment: Using a computer, the p-value is 29.0%.]

When given years of data, the Chi-Square test statistic is computed as: χ² = Σk (nk - Ek)²/Vk,
where nk is the number of claims for year k, Ek is the expected number for year k, and Vk is the variance for year k.⁴⁰
For the above data, n1 = 158 and E1 = (900)(mean frequency).
With no assumption as to the form of the frequency distribution, we take Vk = Ek.
If frequency is assumed to be Poisson, then E1 = 900λ, and V1 = 900λ = E1.
If frequency is assumed to be Binomial, then E1 = 900mq, and V1 = 900mq(1-q) = E1(1-q).
If frequency is assumed to be Negative Binomial, then E1 = 900rβ, and V1 = 900rβ(1+β) = E1(1+β).

39 See Example 16.8 and Exercise 16.14 in Loss Models.
40 nk is approximately Normally distributed. (nk - Ek)/√Vk follows approximately a Standard Normal Distribution. The sum of the squares of Standard Normal Distributions follows a Chi-Square Distribution. We lose one degree of freedom because the total of the expected Ek is equal to the total of the observed nk.
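A rough Python sketch of the years-of-data computation above (assuming SciPy for the p-value; otherwise drop that line):

    from scipy.stats import chi2

    exposures = [900, 1000, 1100]
    claims = [158, 160, 162]

    freq = sum(claims) / sum(exposures)          # 0.16
    expected = [freq * e for e in exposures]     # 144, 160, 176

    statistic = sum((n - E) ** 2 / E for n, E in zip(claims, expected))
    p_value = chi2.sf(statistic, df=len(claims) - 1)
    print(statistic, p_value)                    # about 2.475 and 0.290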
Exercise: Use the same data for three years:
Year    Exposures    Claims
2001        900           158
2002      1000           160
2003      1100           162
Fit a Poisson Distribution to this data via method of moments. Use the Chi-Square Goodness of Fit Test to test the null hypothesis that the frequency per exposure is the same for all three years.
[Solution: The observed overall frequency is: (158 + 160 + 162)/(900 + 1000 + 1100) = 0.16. λ̂ = 0.16.
We get the same expected numbers of claims by year and the same Chi-Square Statistic as in the previous exercise.
Year    Observed    Exposures    Assumed Freq.    Expected    Chi-Square
2001        158               900                0.16                   144            1.361
2002        160            1,000                0.16                   160            0.000
2003        162            1,100                0.16                   176            1.114
Sum         480            3,000                0.16                   480            2.475
As before, there are 3 - 1 = 2 degrees of freedom. The 10% critical value is 4.605.
2.475 < 4.605, so we do not reject H0 at 10%.
Comment: For years of data, for the Poisson, maximum likelihood is equal to the method of moments.]

Even though in the second exercise we fit a Poisson Distribution, given the form of the data we compared to, we only used the mean of that Poisson Distribution. Unlike in previous situations, we did not compare the expected and observed numbers of insureds with zero claims, one claim, two claims, etc. We did not use the density at 0, 1, 2, etc., of the fitted Poisson Distribution. Therefore, we were really only assuming the same mean frequency in each year, just as in the previous exercise. Therefore, in this situation, there is no need to subtract an additional degree of freedom for a fitted parameter.
In general, if applying the Chi-Square Goodness of Fit Test to data with total claims and exposures by year, then the number of degrees of freedom is the number of years minus one.
Exercise: Use the same data for three years:
Year    Exposures    Claims
2001        900           158
2002      1000           160
2003      1100           162
Fit a Geometric Distribution to this data via method of moments. Use the Chi-Square Goodness of Fit Test to test the null hypothesis that the frequency per exposure is the same for all three years.
[Solution: The observed overall frequency is: (158 + 160 + 162)/(900 + 1000 + 1100) = 0.16. β̂ = 0.16.
Thus, we get the same expected numbers of claims by year as in the previous exercise.
However, the denominator is Vk = β(1+β)(exposures) = Ek(1+β).
χ² = Σk (nk - Ek)²/Vk = (158 - 144)²/{(144)(1.16)} + (160 - 160)²/{(160)(1.16)} + (162 - 176)²/{(176)(1.16)} = 2.133.
As before, there are 3 - 1 = 2 degrees of freedom. The 10% critical value is 4.605.
2.133 < 4.605, so we do not reject H0 at 10%.
Comment: For years of data, for the Geometric, maximum likelihood is equal to the method of moments.
The Chi-Square statistic here is that for the Poisson assumption divided by 1.16, subject to rounding: 2.475/1.16 = 2.134.]
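The only change from the Poisson case is the variance in each denominator, so the earlier sketch needs just one extra factor (again a rough illustration, not part of the syllabus):

    beta = 0.16
    claims = [158, 160, 162]
    expected = [144, 160, 176]

    statistic = sum((n - E) ** 2 / (E * (1 + beta)) for n, E in zip(claims, expected))
    print(statistic)   # about 2.13, i.e. the Poisson statistic divided by 1.16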
Kolmogorov-Smirnov Statistic:⁴¹

The Kolmogorov-Smirnov Statistic is computed by finding the maximum absolute difference between the observed distribution function and the fitted distribution function:
Max over x of |observed distribution function at x - theoretical distribution function at x|.
Shown below is the computation for the Negative Binomial fit above to data via the Method of Moments. In this case, the maximum absolute difference between the Fitted and Observed Distribution Functions is 0.00060 and occurs at 0. This K-S statistic of 0.0006 can be used to compare the fit of the distribution to that of some other type of distribution. The smaller the K-S Statistic the better the fit.
Number of Claims    Observed    Observed Distribution Function    Method of Moments    Fitted Distribution Function    Absolute Difference
0                              17,649          0.73932                                       0.73991                       0.73991                                0.00060
1                                4,829          0.94161                                       0.20132                       0.94123                                0.00037
2                                1,106          0.98794                                       0.04621                       0.98745                                0.00049
3                                   229          0.99753                                       0.00995                       0.99740                                0.00013
4                                     44          0.99937                                       0.00207                       0.99947                                0.00010
5                                       9          0.99975                                       0.00042                       0.99989                                0.00015
6                                       4          0.99992                                       0.00009                       0.99998                                0.00006
7                                       1          0.99996                                       0.00002                       1.00000                                0.00004
8                                       1          1.00000                                       0.00000                       1.00000                                0.00000
9                                       0          1.00000                                       0.00000                       1.00000                                0.00000
10                                     0          1.00000                                       0.00000                       1.00000                                0.00000
Note that we do not attach any specific significance level, as we did with the Chi-Square Statistic. The use of the K-S Statistic to reject or not reject a fit should be confined to ungrouped data and continuous distributions. The K-S Statistic should not be applied to discrete distributions or grouped data in order to reject or not reject a fit. Nevertheless, even when dealing with discrete distributions, a smaller value of the K-S Statistic does indicate a better fit.

41 See “Mahlerʼs Guide to Fitting Loss Distributions” for a more extensive discussion of the Kolmogorov-Smirnov Statistic. For a discrete distribution one compares at all the (available) points, rather than “just before” and “just after” each observed claim, as with a continuous distribution.
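The K-S computation for this discrete fit is just a running comparison of two distribution functions. A rough Python sketch (standard library only; the Negative Binomial density is written out by hand rather than taken from a library):

    import math

    r, beta = 1.455, 0.230
    observed = [17649, 4829, 1106, 229, 44, 9, 4, 1, 1, 0, 0]
    n = sum(observed)

    def nb_pf(k):
        # f(k) = [Gamma(r+k) / (Gamma(r) k!)] beta^k / (1+beta)^(r+k)
        coefficient = math.gamma(r + k) / (math.gamma(r) * math.factorial(k))
        return coefficient * beta ** k / (1 + beta) ** (r + k)

    obs_cdf, fit_cdf, ks = 0.0, 0.0, 0.0
    for k, count in enumerate(observed):
        obs_cdf += count / n
        fit_cdf += nb_pf(k)
        ks = max(ks, abs(obs_cdf - fit_cdf))

    print(ks)   # about 0.0006, attained at 0 claims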
Chi-Square Table attached to Exam 4/C: The Chi-Square Table that has been attached to Exam 4/C is shown on the next page. As usual there are different rows corresponding to different degrees of freedom; in this case the degrees of freedom extend from 1 to 20. The values shown in each row are the places where the Chi-Square Distribution Function for that numbers of degrees of freedom has the stated P values. The value of the distribution function is denoted by P (capital p.) For example, for 4 degrees of freedom, F(9.488) = 0.950.
[Figure: the Chi-Square density for 4 degrees of freedom, with 95% of the probability to the left of 9.488.]
Unity minus the distribution function is the Survival Function; the value of the Survival Function is the p-value (small p) or the significance level, sometimes denoted by α. For example, for 4 degrees of freedom, 9.488 is the critical value corresponding to a significance level of 1 - 0.95 = 5%. The critical values corresponding to a 5% significance level are in the column labeled P = 0.950. Similarly, the critical values corresponding to a 1% significance level are in the column labeled P = 0.990.
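If you have software handy while studying, the table entries and the corresponding p-values can be checked directly; here is a minimal sketch assuming scipy is available (an added illustration, not part of the study guide):

```python
# Reproduce entries of the attached Chi-Square table and convert a statistic to a p-value.
from scipy.stats import chi2

print(chi2.ppf(0.950, df=4))   # about 9.488: the critical value for a 5% significance level
print(chi2.ppf(0.990, df=4))   # about 13.277: the critical value for a 1% significance level
print(chi2.sf(9.488, df=4))    # about 0.05: the p-value of a statistic equal to 9.488
```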
For the following questions, use the following Chi-Square table:
χ2 Distribution
[Figure: the χ2 density; the area to the left of χ0^2 is P and the area to the right is 1 - P.]
The table below gives the value of χ0^2 for which Prob[χ2 < χ0^2] = P for a given number of degrees of freedom and a given value of P.
                                          Value of P
Degrees of
Freedom      0.005    0.010    0.025    0.050    0.900    0.950    0.975    0.990    0.995
 1           0.000    0.000    0.001    0.004    2.706    3.841    5.024    6.635    7.879
 2           0.010    0.020    0.051    0.103    4.605    5.991    7.378    9.210   10.597
 3           0.072    0.115    0.216    0.352    6.251    7.815    9.348   11.345   12.838
 4           0.207    0.297    0.484    0.711    7.779    9.488   11.143   13.277   14.860
 5           0.412    0.554    0.831    1.145    9.236   11.070   12.832   15.086   16.750
 6           0.676    0.872    1.237    1.635   10.645   12.592   14.449   16.812   18.548
 7           0.989    1.239    1.690    2.167   12.017   14.067   16.013   18.475   20.278
 8           1.344    1.647    2.180    2.733   13.362   15.507   17.535   20.090   21.955
 9           1.735    2.088    2.700    3.325   14.684   16.919   19.023   21.666   23.589
10           2.156    2.558    3.247    3.940   15.987   18.307   20.483   23.209   25.188
11           2.603    3.053    3.816    4.575   17.275   19.675   21.920   24.725   26.757
12           3.074    3.571    4.404    5.226   18.549   21.026   23.337   26.217   28.300
13           3.565    4.107    5.009    5.892   19.812   22.362   24.736   27.688   29.819
14           4.075    4.660    5.629    6.571   21.064   23.685   26.119   29.141   31.319
15           4.601    5.229    6.262    7.261   22.307   24.996   27.448   30.578   32.801
16           5.142    5.812    6.908    7.962   23.542   26.296   28.845   32.000   34.267
17           5.697    6.408    7.564    8.672   24.769   27.587   30.191   33.409   35.718
18           6.265    7.015    8.231    9.390   25.989   28.869   31.526   34.805   37.156
19           6.844    7.633    8.907   10.117   27.204   30.144   32.852   36.191   38.582
20           7.434    8.260    9.591   10.851   28.412   31.410   34.170   37.566   39.997
Problems:
Use the following information for the next four questions:
You observe 10,000 trials of a process you believe is Binomial with parameters m = 8 and q unknown.
Number of Claims   Number of Observations
0                  188
1                  889
2                  2123
3                  2704
4                  2362
5                  1208
6                  426
7                  88
8                  12
Total              10000
4.1 (2 points) Determine q by the method of moments.
A. 0.40
B. 0.42
C. 0.44
D. 0.46
E. 0.48
4.2 (4 points) What is the Chi-Square statistic for the fitted Binomial Distribution from the previous question? A. less than 11 B. at least 11 but less than 12 C. at least 12 but less than 13 D. at least 13 but less than 14 E. at least 14 4.3 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the hypothesis H0 that the data is drawn from the fitted Binomial Distribution. Which of the following is true? A. Reject H0 at 0.005. B. Do not reject H0 at 0.005. Reject H0 at 0.010. C. Do not reject H0 at 0.010. Reject H0 at 0.025. D. Do not reject H0 at 0.025. Reject H0 at 0.050. E. Do not reject H0 at 0.050. Reject H0 at 0.100.
4.4 (3 points) What is the Kolmogorov-Smirnov Statistic for the fitted Binomial Distribution? A. less than 0.010 B. at least 0.010 but less than 0.015 C. at least 0.015 but less than 0.020 D. at least 0.020 but less than 0.025 E. at least 0.025 4.5 (2 points) Assume one has observed drivers over nine years and got the following distribution of drivers by number of claims over the nine year period: Number of Claims Number of Drivers
0 17649
1 4829
2 1106
3 229
4 44
5 9
6 4
7 1
8 1
All 23872
One fits a Negative Binomial Distribution via the method of moments to this data, assuming r = 1.8. What is the value of β? A. less than 0.16 B. at least 0.16 but less than 0.17 C. at least 0.17 but less than 0.18 D. at least 0.18 but less than 0.19 E. at least 0.19 4.6 (3 points) What is the Chi-Square statistic for the fitted Negative Binomial Distribution from the previous question? Group the intervals so that each has at least 5 expected observations. A. less than 4 B. at least 4 but less than 6 C. at least 6 but less than 8 D. at least 8 but less than 10 E. at least 10 4.7 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the hypothesis H0 that the data is drawn from the fitted Negative Binomial Distribution. Which of the following is true? A. Reject H0 at 0.005. B. Do not reject H0 at 0.005. Reject H0 at 0.010. C. Do not reject H0 at 0.010. Reject H0 at 0.025. D. Do not reject H0 at 0.025. Reject H0 at 0.050. E. Do not reject H0 at 0.050. Reject H0 at 0.100.
Use the following information for the next nine questions: Number of Claims Number of Policies
0 90,000
1 9,000
2 700
3 50
4 5
All 99,755
4.8 (2 points) One fits a Poisson Distribution via the method of moments to this data. What is the value of the fitted parameter λ? A. less than 0.10 B. at least 0.10 but less than 0.11 C. at least 0.11 but less than 0.12 D. at least 0.12 but less than 0.13 E. at least 0.13 4.9 (3 points) What is the Chi-Square statistic for the fitted Poisson Distribution from the previous question? Group the intervals so that each has at least 5 expected observations. A. less than 10 B. at least 10 but less than 20 C. at least 20 but less than 30 D. at least 30 but less than 40 E. at least 40 4.10 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the hypothesis H0 that the data is drawn from the fitted Poisson Distribution. How many degrees of freedom does one use? A. 1 B. 2 C. 3 D. 4
E. 5
4.11 (1 point) Based on the Chi-Square statistic computed previously, one tests the hypothesis H0 that the data is drawn from the fitted Poisson Distribution. Which of the following is true? A. Reject H0 at 0.005. B. Do not reject H0 at 0.005. Reject H0 at 0.010. C. Do not reject H0 at 0.010. Reject H0 at 0.025. D. Do not reject H0 at 0.025. Reject H0 at 0.050. E. Do not reject H0 at 0.050. Reject H0 at 0.100.
4.12 (2 points) One fits a Negative Binomial Distribution via the method of moments to this data. What is the value of the fitted parameter β? A. less than 0.06 B. at least 0.06 but less than 0.08 C. at least 0.08 but less than 0.10 D. at least 0.10 but less than 0.12 E. at least 0.12 4.13 (1 point) One fits a Negative Binomial Distribution via the method of moments to this data. What is the value of the fitted parameter r? A. less than 1.5 B. at least 1.5 but less than 1.6 C. at least 1.6 but less than 1.7 D. at least 1.7 but less than 1.8 E. at least 1.8 4.14 (3 points) What is the Chi-Square statistic for the fitted Negative Binomial Distribution from the previous two questions? Group the intervals so that each has at least 5 expected observations. A. less than 1 B. at least 1 but less than 3 C. at least 3 but less than 5 D. at least 5 but less than 7 E. at least 7 4.15 (1 point) Based on the Chi-Square statistic computed in the previous question, what is the p-value of the fitted Negative Binomial Distribution. A. 0.005 < p < 0.010 B. 0.010 < p < 0.025 C. 0.025 < p < 0.050 D. 0.050 < p < 0.100 E. 0.100 < p 4.16 (3 points) What is the loglikelihood to be maximized in order to fit a Negative Binomial Distribution, with parameters β and r, to the above data via the method of maximum likelihood? A. 10570ln(β) + 9755ln(r) + 755ln(r+1) + 55ln(r+2) + 5ln(r+3) - 110325 ln(1+β) B. 10570ln(β) + 9000ln(r) + 700ln(r+1) + 50ln(r+2) + 5ln(r+3) - 110325 ln(1+β) C. 10570ln(β) + 9755ln(r) + 755ln(r+1) + 55ln(r+2) + 5ln(r+3) - (99755r + 10570)ln(1+β) D. 10570ln(β) + 9000ln(r) + 700ln(r+1) + 50ln(r+2) + 5ln(r+3) - (99755r + 10570)ln(1+β) E. None of the above.
4.17 (2 points) A Geometric, a compound Poisson-Binomial, and a mixed Poisson-Inverse Gaussian Distribution have each been fit to the same set of data. The set of data has 8 intervals. (For each distribution each interval has more than 5 expected observations.) Use the Chi-Square Statistics to rank the fits from best to worst.
Distribution                         # of Parameters   Chi-Square
1. Geometric                         1                 15.5
2. Compound Poisson-Binomial         3                 14.6
3. Mixed Poisson-Inverse Gaussian    2                 12.7
A. 1, 2, 3   B. 1, 3, 2   C. 2, 1, 3   D. 2, 3, 1   E. 3, 1, 2
4.18 (3 points) The following data is for the number of hurricanes hitting the continental United States from 1900 to 1999.
Decade:    0   1   2   3   4   5   6   7   8   9   Sum
Observed:  15  20  15  17  23  18  15  12  16  13  164
Let H0 be the hypothesis that the mean frequency during each decade is the same. Using the Chi-Square test, which of the following are true? A. Do not reject H0 at 0.5%. Reject H0 at 1.0%. B. Do not reject H0 at 1.0%. Reject H0 at 2.5%. C. Do not reject H0 at 2.5%. Reject H0 at 5.0%. D. Do not reject H0 at 5.0%. Reject H0 at 10.0%. E. Do not reject H0 at 10.0%.
Use the data shown below for the next two questions:
Region   Exposures   Claims
1        1257        124
2        1025        119
3        1452        180
4        1311        177
Total    5045        600
4.19 (3 points) Assume that each exposure in every region has a Poisson Distribution, with the same mean λ. Assume that each Poisson Distribution process is independent across exposures and regions. Fit λ via Maximum Likelihood. Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Poisson Distribution. Which of the following statements is true? A. H0 will be rejected at the 1% significance level. B. H0 will be rejected at the 2.5% significance level, but not at the 1% level. C. H0 will be rejected at the 5% significance level. but not at the 2.5% level. D. H0 will be rejected at the 10% significance level, but not at the 5% level. E. H0 will not be rejected at the 10% significance level. 4.20 (4 points) Assume that each exposure in every region has a Binomial Distribution, with m = 2 and the same parameter q. Assume that each Binomial frequency process is independent across exposures and regions. Fit q via Maximum Likelihood. Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Binomial Distribution. Which of the following statements is true? A. H0 will be rejected at the 1% significance level. B. H0 will be rejected at the 2.5% significance level, but not at the 1% level. C. H0 will be rejected at the 5% significance level. but not at the 2.5% level. D. H0 will be rejected at the 10% significance level, but not at the 5% level. E. H0 will not be rejected at the 10% significance level.
4.21 (5 points) You are given the following data on the number of runs scored during half innings of major league baseball games from 1980 to 1998: Runs Number of Occurrences 0 518,228 1 105,070 2 47,936 3 21,673 4 9736 5 4033 6 1689 7 639 8 274 9 107 10 36 11 25 12 5 13 7 14 1 15 0 16 1 Total 709,460 Fit a Negative Binomial via the method of moments. Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Negative Binomial Distribution. Group the intervals so that each has at least 5 expected observations. Which of the following statements is true? A. H0 will be rejected at the 0.005 significance level. B. H0 will be rejected at the 0.01 significance level, but not at the .005 level. C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level. D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level. E. H0 will not be rejected at the 0.05 significance level.
4.22 (4 points) You are given the following data on the number of injuries per claim on automobile bodily injury liability insurance:
Number of Injuries   Number of Policies
1                    4121
2                    430
3                    71
4                    19
5                    6
6                    4
7 and more           1
Total                4652
Use the Chi-Square Statistic to test H0, the hypothesis that the data came from the following distribution:
f(n) = 0.2^n / {-n ln(0.8)}.
Group the intervals so that each has at least 5 expected observations. Which of the following statements is true? A. H0 will be rejected at the 0.005 significance level. B. H0 will be rejected at the 0.01 significance level, but not at the .005 level. C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level. D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level. E. H0 will not be rejected at the 0.05 significance level. 4.23 (3 points) Ten thousand insureds are assigned to one of five classes based on their total number of claims over the prior 5 years, as follows: Class Number of Claims Number of Insureds in Class A No Claims 6825 B One Claim 2587 C Two Claims 499 D Three Claims 78 E Four or More Claims 11 The null hypothesis is that annual claim frequency follows a Poisson distribution with mean 0.08, which implies over 5 years a Poisson distribution with mean 0.4. Which of these classes has the largest contribution to the Chi-square goodness-of-fit statistic?
4.24 (4 points) You are given the following data on the number of claims per year on automobile insurance: Number of Claims Number of Policies 0 20,592 1 2,651 2 297 3 41 4 7 5 0 6 1 7 and more 0 Total 23,589 Fit a Negative Binomial via the method of moments. Use the Chi-Square Statistic to test H0 , the hypothesis that the data came from the fitted Negative Binomial Distribution. Group the intervals so that each has at least 5 expected observations. Which of the following statements is true? A. H0 will be rejected at the 0.01 significance level. B. H0 will be rejected at the 0.025 significance level, but not at the .01 level. C. H0 will be rejected at the 0.05 significance level. but not at the 0.025 level. D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level. E. H0 will not be rejected at the 0.10 significance level. 4.25 (2 points) You are given the following data on the number of claims: Number of Claims Number of Insureds 0 91,000 1 8,100 2 800 3 or more 100 Total 100,000 You compare this data to a Geometric Distribution with β = 0.1. Compute the chi-square goodness-of-fit statistic. A. less than 11 B. at least 11 but less than 12 C. at least 12 but less than 13 D. at least 13 but less than 14 E. at least 14
Use the data shown below for the next three questions:
Year    Exposures   Claims
1       3000        200
2       3300        250
3       3700        310
Total   10,000      760
4.26 (2 points) Assume that the expected frequency per exposure is the same in every year. Conduct a Chi-Square goodness-of-fit test of this hypothesis. Which of the following statements is true? A. H0 will be rejected at the 1% significance level. B. H0 will be rejected at the 2.5% significance level, but not at the 1% level. C. H0 will be rejected at the 5% significance level. but not at the 2.5% level. D. H0 will be rejected at the 10% significance level, but not at the 5% level. E. H0 will not be rejected at the 10% significance level. 4.27 (2 points) Determine the maximum likelihood estimate of the Poisson parameter for the above data. Conduct a Chi-Square goodness-of-fit test for this fitted Poisson model. Which of the following statements is true? A. H0 will be rejected at the 1% significance level. B. H0 will be rejected at the 2.5% significance level, but not at the 1% level. C. H0 will be rejected at the 5% significance level. but not at the 2.5% level. D. H0 will be rejected at the 10% significance level, but not at the 5% level. E. H0 will not be rejected at the 10% significance level. 4.28 (2 points) Fit instead a Geometric Distribution via maximum likelihood to the above data. Conduct a Chi-Square goodness-of-fit test for this fitted model. Which of the following statements is true? A. H0 will be rejected at the 1% significance level. B. H0 will be rejected at the 2.5% significance level, but not at the 1% level. C. H0 will be rejected at the 5% significance level. but not at the 2.5% level. D. H0 will be rejected at the 10% significance level, but not at the 5% level. E. H0 will not be rejected at the 10% significance level.
4.29 (3 points) Use the following information on the number of accidents over a nine year period for a set of drivers: Number of Accidents Number of Drivers 0 17,649 1 4,829 2 1,106 3 229 4 44 5 9 6 4 7 1 8 1 9 0 Total 23,872 A Negative Binomial has been fit to this data via maximum likelihood, with fitted parameters are r = 1.4876 and β = 0.2249. Calculate the Chi-Square Goodness of Statistic. Group the data using the largest number of groups such that the expected number of drivers in each group is at least 5. A. less than 1 B. at least 1 but less than 2 C. at least 2 but less than 3 D. at least 3 but less than 4 E. at least 4 4.30 (3 points) For a set of insurance policies, there is at most one claim per policy per year. Year Number of Policies Number of Claims 2006 9,000 842 2007 10,500 1016 2008 12,000 1258 2009 13,500 1380 2010 15,000 1594 Use Maximum Likelihood to fit a Bernoulli Distribution. Test this fit using the Chi-Square Goodness-of-Fit Test. What is your conclusion? A. Reject H0 at 0.005. B. Do not reject H0 at 0.005. Reject Ho at 0.010. C. Do not reject H0 at 0.010. Reject Ho at 0.025. D. Do not reject H0 at 0.025. Reject Ho at 0.050. E. Do not reject H0 at 0.050.
4.31 (4, 5/89, Q.51) (3 points) The following claim frequency observations were made for a group of 1,000 policies: # of Claims # of Policies 0 800 1 180 2 19 3 or more 1 Your hypothesis is that the number of claims per policy follows a Poisson distribution with a mean of µ = 0.20. What is the Chi-square statistic for this data under your hypothesis? Group the intervals so that each has at least 5 expected observations. A. Less than 2.4 B. At least 2.4, but less than 2.6 C. At least 2.6, but less than 2.8 D. At least 2.8, but less than 3.0 E. 3.0 or more
4.32 (4B, 5/92, Q.30) (3 points) You are given the following information for 10,000 risks grouped by number of claims:
• A negative binomial distribution was fit to the grouped risks.
• Minimum chi-square estimation was used to estimate the two parameters of the negative binomial.
• The results are as follows:
Number of Claims   Actual Number of Risks   Estimated Number of Risks Using Negative Binomial
0                  8,725                    8,738
1                  1,100                    1,069
2                  135                      162
3                  35                       26
4                  3                        4
5                  2                        1
Total              10,000                   10,000
You are to use the Chi-square statistic to test the hypothesis, H0, that the negative binomial provides an acceptable fit.
Group the intervals so that each has at least 5 expected observations.
Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050.
4.33 (4B, 11/92, Q.22) (3 points) You are given the following information for 10,000 risks grouped by number of claims.
• A Poisson distribution with mean λ was fit to the grouped risks.
• Minimum chi-square estimation has been used to estimate λ.
• The results are as follows:
Number of Claims   Actual Number of Risks   Estimated Number of Risks Using Poisson
0                  7,673                    7,788
1                  2,035                    1,947
2                  262                      243
3 or more          30                       22
Total              10,000                   10,000
You are to use the Chi-square statistic to test the hypothesis, H0, that the Poisson provides an acceptable fit.
Which of the following is true?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005. Reject H0 at 0.010.
C. Do not reject H0 at 0.010. Reject H0 at 0.025.
D. Do not reject H0 at 0.025. Reject H0 at 0.050.
E. Do not reject H0 at 0.050.
Use the following information for the next three questions: A portfolio of 5000 insureds are grouped by number of claims as follows: Number of Claims Number of Insureds 0 4,101 1 806 2 85 3 8 The underlying distribution for number of claims is assumed to be Poisson with mean µ. Maximum likelihood estimation is used to estimate µ for the above data. 4.34 (4B, 5/93, Q.16) (2 points) Determine the Chi-square statistic, χ2, using the above information. A. Less than 0.50 B. At least 0.50 but less than 1.50 C. At least 1.50 but less than 2.50 D. At least 2.50 but less than 3.50 E. At least 3.50 4.35 (4B, 5/93, Q.17) (1 point) How many degrees of freedom are associated with the χ2 statistic that was calculated for the previous question? A. 1 B. 2 C. 3 D. 4
E. 5
4.36 (4B, 5/93, Q.18) (2 points) Determine the Kolmogorov-Smirnov statistic using the above information. A. Less than 0.0003 B. At least 0.0003 but less than 0.0008 C. At least 0.0008 but less than 0.0013 D. At least 0.0013 but less than 0.0018 E. At least 0.0018
4.37 (4B, 5/94, Q.7) (2 points) You are given the following information for a portfolio of 10,000 risks grouped by number of claims: A Poisson distribution is fitted to the grouped risks with these results: Number of Risks Number of Claims Actual Expected 0 9,091 9,048 1 838 905 2 51 45 3 or more 20 2 Total 10,000 10,000 Determine the Kolmogorov-Smirnov statistic for the fitted Poisson distribution. A. Less than 0.001 B. At least 0.001, but less than 0.003 C. At least 0.003, but less than 0.005 D. At least 0.005, but less than 0.007 E. At least 0.007 4.38 (4B, 11/96, Q.9) (3 points) You are given the following: • The observed number of claims for a group of 1,000 risks has been recorded as follows: Number of Claims Number of Risks 0 729 1 242 2 29 3 or more 0 • The null hypothesis, H0 , is that the number of claims per risk follows a Poisson distribution. • A chi-square test is performed with three classes. The first class contains those risks with 0 claims, the second contains those risks with 1 claim, and the third contains those risks with 2 or more claims. • The minimum chi-square estimate of the mean of the Poisson distribution is 0.3055. Which of the following statements is true? A. H0 will be rejected at the 0.005 significance level. B. H0 will be rejected at the 0.01 significance level, but not at the .005 level. C. H0 will be rejected at the 0.025 significance level. but not at the 0.01 level. D. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level. E. H0 will not be rejected at the 0.05 significance level.
4.39 (4B, 11/97, Q.20) (3 points) You are given the following:
• The observed number of claims for a group of 100 risks has been recorded as follows:
Number of Claims   Number of Risks
0                  80
1                  20
• The null hypothesis, H0, is that the number of claims per risk follows a Bernoulli distribution with mean q.
• A chi-square test is performed.
Determine the smallest value of q for which H0 will not be rejected at the 0.01 significance level. A. Less than 0.08 B. At least 0.08, but less than 0.09 C. At least 0.09, but less than 0.10 D. At least 0.10, but less than 0.11 E. At least 0.11
Use the following information for the next two questions: You are given the following: • The observed number of claims for a group of 50 risks has been recorded as follows: Number of Claims Number of Risks 0 7 1 10 2 12 3 17 4 4 • The null hypothesis, H0 , is that the number of claims per risk follows a uniform distribution on 0, 1, 2, 3, and 4. 4.40 (4B, 5/98, Q.10) (2 points) A chi-square test is performed with five classes. Which of the following statements is true? A. H0 will be rejected at the 0.01 significance level. B. H0 will be rejected at the 0.025 significance level, but not at the 0.01 level. C. H0 will be rejected at the 0.05 significance level, but not at the 0.025 level. D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level. E. H0 will not be rejected at the 0.10 significance level. 4.41 (4B, 5/98, Q.11) (2 points) Two adjacent classes of the five classes above are combined, and a chi-square test performed with four classes. Determine which of the following combinations will result in a p-value of the Chi-Square test most different from the one in the previous question. A. Combining the risks with 0 claims and the risks with 1 claim B. Combining the risks with 1 claim and the risks with 2 claims C. Combining the risks with 2 claims and the risks with 3 claims D. Combining the risks with 3 claims and the risks with 4 claims E. Can not be determined. Note: This exam question has been rewritten.
4.42 (4, 5/00, Q.29 & 4, 11/02, Q.28 & 2009 Sample Q. 47) (2.5 points) You are given the following observed claim frequency data collected over a period of 365 days: Number of Claims per Day Observed Number of Days 0 50 1 122 2 101 3 92 4+ 0 Fit a Poisson distribution to the above data, using the method of maximum likelihood. Regroup the data, by number of claims per day, into four groups: 0 1 2 3+ Apply the chi-square goodness-of-fit test to evaluate the null hypothesis that the claims follow a Poisson distribution. Determine the result of the chi-square test. (A) Reject at the 0.005 significance level. (B) Reject at the 0.010 significance level, but not at the 0.005 level. (C) Reject at the 0.025 significance level, but not at the 0.010 level. (D) Reject at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject at the 0.050 significance level.
4.43 (IOA 101, 4/01, Q.14) (12 points) Consider a group of 1000 policyholders, all of the same age, and each of whose lives is insured under one or more policies. The following frequency distribution gives the number of claims per policyholder in 1999 for this group. Number of claims per policyholder (i) 0 1 2 3 ≥4 Number of policyholders (fi) 826 128 39 7 0 A statistician argues that an appropriate model for the distribution of X, the number of claims per policyholder, is X ~ Poisson. Under this proposal, the frequencies expected are as follows (you are not required to verify these): Number of claims per policyholder 0 1 2 3 ≥4 Expected number of policyholders 796.9 180.9 20.5 1.6 0.1 A second statistician argues that a more appropriate model for the distribution of X is given by: P(X = x) = p(1 - p)x, x = 0, 1, 2,... (i) (1.5 points) Without doing any further calculations, comment on the first statisticianʼs proposed model for the data. (ii) (3 points) Consider the second statistician's proposed model. Verify that the mean of the distribution of X is (1 - p)/p and hence calculate the method of moments estimate of p. (Note: this estimate is also the maximum likelihood estimate.) (iii) (2.25 points) Verify that the frequencies expected under the second statisticianʼs proposed model are as follows: Number of claims per policyholder 0 1 2 3 ≥4 Expected number of policyholders 815.0 150.8 27.9 5.2 1.2 (iv) (5.25 points) (a) Test the goodness-of-fit of the second statisticianʼs proposed model to the data, quoting the p-value of your test statistic and your conclusion. (b) Assuming that you had been asked to test the goodness-of-fit “at the 1% level”, state your conclusion.
4.44 (4, 5/01, Q.19) (2.5 points) During a one-year period, the number of accidents per day was distributed as follows: Number of Accidents Days 0 209 1 111 2 33 3 7 4 3 5 2 You use a chi-square test to measure the fit of a Poisson distribution with mean 0.60. The minimum expected number of observations in any group should be 5. The maximum possible number of groups should be used. Determine the chi-square statistic. (A) 1 (B) 3 (C) 10 (D) 13 (E) 32 4.45 (4, 11/01, Q.25 & 2009 Sample Q.71) (2.5 points) You are investigating insurance fraud that manifests itself through claimants who file claims with respect to auto accidents with which they were not involved. Your evidence consists of a distribution of the observed number of claimants per accident and a standard distribution for accidents on which fraud is known to be absent. The two distributions are summarized below: Number of Claimants per Accident Standard Probability Observed Number of Accidents 1 0.25 235 2 .35 335 3 .24 250 4 .11 111 5 .04 47 6+ .01 22 Total 1.00 1000 Determine the result of a chi-square test of the null hypothesis that there is no fraud in the observed accidents. (A) Reject at the 0.005 significance level. (B) Reject at the 0.010 significance level, but not at the 0.005 level. (C) Reject at the 0.025 significance level, but not at the 0.010 level. (D) Reject at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject at the 0.050 significance level.
4.46 (4, 11/03, Q.16 & 2009 Sample Q.13) (2.5 points) A particular line of business has three types of claims. The historical probability and the number of claims for each type in the current year are: Historical Number of Claims Type Probability in Current Year A 0.2744 112 B 0.3512 180 C 0.3744 138 You test the null hypothesis that the probability of each type of claim in the current year is the same as the historical probability. Calculate the chi-square goodness-of-fit test statistic. (A) Less than 9 (B) At least 9, but less than 10 (C) At least 10, but less than 11 (D) At least 11, but less than 12 (E) At least 12 4.47 (4, 11/05, Q.10 & 2009 Sample Q.222) (2.9 points) 1000 workers insured under a workers compensation policy were observed for one year. The number of work days missed is given below: Number of Days of Work Missed Number of Workers 0 818 1 153 2 25 3 or more 4 Total 1000 Total Number of Days Missed 230 The chi-square goodness-of-fit test is used to test the hypothesis that the number of work days missed follows a Poisson distribution where: (i) The Poisson parameter is estimated by the average number of work days missed. (ii) Any interval in which the expected number is less than one is combined with the previous interval. Determine the results of the test. (A) The hypothesis is not rejected at the 0.10 significance level. (B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05 significance level. (C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the 0.025 significance level. (D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the 0.01 significance level. (E) The hypothesis is rejected at the 0.01 significance level.
Solutions to Problems:
4.1. A. The observed mean is 32003 / 10000 = 3.200. Set mq = 8q = 3.2. Thus q = 0.400.
Number of Claims   Observed   Observed times # of Claims
0                  188        0
1                  889        889
2                  2123       4246
3                  2704       8112
4                  2362       9448
5                  1208       6040
6                  426        2556
7                  88         616
8                  12         96
Sum                10000      32003
Comment: This is also the solution for the Method of Maximum Likelihood.
4.2. C. One needs to compute 10000 times the fitted Binomial with parameters m = 8 and q = 0.40.
Then, in order to compute the Chi-Square, one sums: (observed number - fitted number)^2 / fitted number.
Number of Claims   Observed   Fitted via Method of Moments   Chi Square
0                  188        168.0                          2.39
1                  889        895.8                          0.05
2                  2123       2090.2                         0.52
3                  2704       2786.9                         2.47
4                  2362       2322.4                         0.67
5                  1208       1238.6                         0.76
6                  426        412.9                          0.42
7                  88         78.6                           1.11
8                  12         6.6                            4.53
Sum                10000      10000                          12.91
4.3. E. One tests the significance by using the Chi-Square Distribution for 9 - 1 - 1 = 7 degrees of freedom, # of intervals - 1 - number of fitted parameters. Using the 12.91 Chi-Square Statistic computed in the previous question, we reject at 10% and do not reject at 5%, since 14.067 > 12.91 > 12.017.
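For readers who like to check such calculations numerically, here is a minimal sketch of solutions 4.1-4.3 in Python (an added illustration; it assumes scipy is available):

```python
# Method of moments fit of a Binomial (m = 8), chi-square statistic, and test decision.
from scipy.stats import binom, chi2

observed = [188, 889, 2123, 2704, 2362, 1208, 426, 88, 12]   # counts of 0 to 8 claims
n, m = sum(observed), 8

q_hat = sum(k * o for k, o in enumerate(observed)) / (n * m)             # about 0.400
expected = [n * binom.pmf(k, m, q_hat) for k in range(m + 1)]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))   # about 12.9

dof = len(observed) - 1 - 1     # (# of intervals) - 1 - (# of fitted parameters) = 7
print(chi_square)
print(chi2.ppf(0.90, dof), chi2.ppf(0.95, dof))   # 12.017 and 14.067: reject at 10%, not at 5%
```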
4.4. A. Calculate the densities for the Binomial and then cumulate them to get the Distribution Function. For example, F(2) = .0168 + .0896 + .2090 = .3154. The Kolmogorov-Smirnov Statistic is computed by finding the maximum absolute difference between the observed distribution function and the fitted distribution function:
Number of   Observed   Observed Distribution   Method of Moments   Fitted Distribution   Absolute
Claims                 Function                Binomial            Function              Difference
0           188        0.0188                  0.0168              0.0168                0.0020
1           889        0.1077                  0.0896              0.1064                0.0013
2           2123       0.3200                  0.2090              0.3154                0.0046
3           2704       0.5904                  0.2787              0.5941                0.0037
4           2362       0.8266                  0.2322              0.8263                0.0003
5           1208       0.9474                  0.1239              0.9502                0.0028
6           426        0.9900                  0.0413              0.9915                0.0015
7           88         0.9988                  0.0079              0.9993                0.0005
8           12         1.0000                  0.0007              1.0000                0.0000
This maximum absolute difference is 0.0046 and occurs at 2 claims.
Comment: Note that we do not attach any specific significance level as we did with the Chi-Square Statistic. Nevertheless, even when dealing with discrete distributions, a smaller value of the K-S Statistic does indicate a better fit.
4.5. D. By taking the average value of the number of claims, one can calculate that the first moment is 7988 / 23872 = .3346. Set the mean = rβ = 1.8β = 0.3346. ⇒ β = .3346/1.8 = 0.186.
Number of Claims   Observed   Observed times # of claims
0                  17,649     0
1                  4,829      4829
2                  1,106      2212
3                  229        687
4                  44         176
5                  9          45
6                  4          24
7                  1          7
8                  1          8
Sum                23872      7988
4.6. D. One needs to compute 23,872 times the fitted Negative Binomial with parameters r = 1.8 and β = .186.
Then, in order to compute the Chi-Square, one sums: (observed number - fitted number)^2 / fitted number.
Number of Claims   Observed   Negative Binomial r = 1.8, β = .186   Fitted via Method of Moments   Chi Square
0                  17,649     0.73561                               17,560.5                       0.45
1                  4,829      0.20766                               4,957.2                        3.32
2                  1,106      0.04559                               1,088.4                        0.28
3                  229        0.00906                               216.2                          0.76
4                  44         0.00170                               40.7                           0.27
5 and over         15         0.00038                               9.0                            4.01
Sum                23872      1                                     23,872.0                       9.08
Comment: We were told to group intervals so that the fitted column has entries that are ≥ 5 on each row. If we instead grouped as “6 and over”, the observed column would be ≥ 5, but not the fitted column.
4.7. E. We have a number of degrees of freedom = (# intervals - 1) - (# fitted parameters) = (6 - 1) - 1 = 4.
Degrees of Freedom   Significance Levels:   0.100   0.050   0.025    0.010    0.005
4                                           7.779   9.488   11.143   13.277   14.860
The computed Chi-Square of 9.08 is less than 9.488 and greater than 7.779. Do not reject at 5%. Reject at 10%.
4.8. B.
Number of Claims   Observed   Number of Claims times Observed
0                  90,000     0
1                  9,000      9000
2                  700        1400
3                  50         150
4                  5          20
Sum                99755      10570
By taking the average value of the number of claims, one can calculate that the mean is 10570 / 99755 = .10596. Using the method of moments, match the first moment: λ = 0.106.
4.9. E. One needs to compute 99,755 times the fitted Poisson Distribution with parameter λ = 0.106.
Then, in order to compute the Chi-Square, one sums: (observed number - fitted number)^2 / fitted number.
Number of Claims   Observed   Fitted Poisson Distribution   Fitted Number   Chi Square
0                  90,000     0.89942                       89,722.1        0.86
1                  9,000      0.09534                       9,510.5         27.41
2                  700        0.00505                       504.1           76.17
3 and over         55         0.00018                       18.3            73.66
Sum                99755                                    99755.0         178.1
Comment: We were told to group intervals so that the fitted column has entries that are ≥ 5 on each row. The intervals for 3 and over were grouped together so as to get at least 5 expected insureds. (The interval 4 and over would have had only 0.5 expected insureds.)
4.10. B. Number of degrees of freedom = (number of intervals) - 1 - (number of fitted parameters) = 4 - 1 - 1 = 2.
4.11. A. With two degrees of freedom, we reject at .005 since 178.1 > 10.597.
Comment: This data is fit well by a Negative Binomial rather than a Poisson Distribution; the variance is significantly greater than the mean.
4.12. B. and 4.13. D. By taking the average value of the number of claims, one can calculate that the first moment is: 10570 / 99755 = 0.1060. By taking the average value of the square of the number of claims observed for each policy, one can calculate that the second moment is: 12330 / 99755 = 0.1236. Thus the estimated variance is: 0.1236 - 0.1060^2 = 0.1124.
Number of Claims   Observed   Number of Claims times Observed   Square of Number of Claims times Observed
0                  90,000     0                                 0
1                  9,000      9000                              9000
2                  700        1400                              2800
3                  50         150                               450
4                  5          20                                80
Sum                99755      10570                             12330
Using the method of moments one would try to match the first two moments by fitting the two parameters of the Negative Binomial Distribution β and r. One can write down two equations in two unknowns, by matching the mean and the variance: mean = .1060 = rβ. variance = .1124 = rβ(1+β). 1 + β = Variance / Mean = .1124 / .1060 = 1.0604. β = 0.060. r = mean/β = .1060 / .0604 = 1.76.
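A minimal Python sketch of this two-moment fit (an added illustration, not part of the original solution; it uses only the standard library):

```python
# Negative Binomial method of moments from grouped counts: match mean and variance.
counts = {0: 90000, 1: 9000, 2: 700, 3: 50, 4: 5}
n = sum(counts.values())

mean = sum(k * c for k, c in counts.items()) / n               # 10570/99755, about 0.1060
second_moment = sum(k * k * c for k, c in counts.items()) / n  # 12330/99755, about 0.1236
variance = second_moment - mean ** 2                           # about 0.1124

beta_hat = variance / mean - 1     # about 0.06
r_hat = mean / beta_hat            # about 1.75; the solution's 1.76 reflects intermediate rounding
print(beta_hat, r_hat)
```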
4.14. A. One needs to compute 99,755 times the fitted Negative Binomial with parameters r = 1.755 and β = .0604.
Then, in order to compute the Chi-Square, one sums: (observed number - fitted number)^2 / fitted number.
Number of Claims   Observed   Fitted via Method of Moments   Chi Square
0                  90,000     89,998.5                       0.0000
1                  9,000      8,996.6                        0.0013
2                  700        705.9                          0.0492
3 and over         55         54.0                           0.0196
Sum                99755      99755                          0.0700
Comment: We were told to group intervals so that the fitted column has entries that are ≥ 5 on each row. The number of expected observations for 4 and more would be only 3.65, thus we group the final interval as 3 and more.
4.15. E. Number of degrees of freedom = (number of intervals) - 1 - (number of fitted parameters) = 4 - 1 - 2 = 1. With one degree of freedom, since .070 < 2.706, 0.100 < p.
Comment: In other words, we do not reject at 10% since .070 < 2.706. The p-value is the survival function of this Chi-Square Distribution at .0700. Using a computer, the p-value is 79.1%. This data is fit very closely by a Negative Binomial.
4.16. C. The likelihood is the product of terms:
f(0)^90000 f(1)^9000 f(2)^700 f(3)^50 f(4)^5 =
(1+β)^(-90000r) {rβ/(1+β)^(r+1)}^9000 {r(r+1)β^2 / [2(1+β)^(r+2)]}^700 {r(r+1)(r+2)β^3 / [6(1+β)^(r+3)]}^50 {r(r+1)(r+2)(r+3)β^4 / [24(1+β)^(r+4)]}^5.
Which is proportional to: β^10570 r^9755 (r+1)^755 (r+2)^55 (r+3)^5 / (1+β)^(99755r + 10570).
Thus, other than an additive constant, the loglikelihood is:
10570 ln(β) + 9755 ln(r) + 755 ln(r+1) + 55 ln(r+2) + 5 ln(r+3) - (99755r + 10570) ln(1+β).
Comment: Maximizing the loglikelihood will also maximize the likelihood. Any terms which do not depend on the parameters do not affect which parameters produce the maximum likelihood.
4.17. E. The degrees of freedom are 7 - number of fitted parameters.
Geometric Distribution has 1 parameter, so 6 d.f.; 14.449 < 15.5 < 16.812, so 1% < p < 2.5%.
Compound Poisson-Binomial Distribution has 3 parameters, so 4 d.f.; 13.277 < 14.6 < 14.860, so 1/2% < p < 1%.
Mixed Poisson-Inverse Gaussian Distribution has 2 parameters, so 5 d.f.; 11.070 < 12.7 < 12.832, so 2.5% < p < 5%.
The larger the p-value, the better the fit. Therefore, the fits from best to worst are: Mixed Poisson-Inverse Gaussian Distribution, Geometric, Compound Poisson-Binomial Distribution, or 3, 1, 2.
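As an aside (added here, not part of the original solution), the loglikelihood in solution 4.16 can be maximized numerically; the following is a minimal sketch assuming scipy is available, with the moment estimates as a starting point:

```python
# Grouped-data Negative Binomial loglikelihood (up to an additive constant) and its maximization.
import numpy as np
from scipy.optimize import minimize

def negative_loglikelihood(params):
    beta, r = params
    if beta <= 0 or r <= 0:
        return np.inf
    loglik = (10570 * np.log(beta) + 9755 * np.log(r) + 755 * np.log(r + 1)
              + 55 * np.log(r + 2) + 5 * np.log(r + 3)
              - (99755 * r + 10570) * np.log(1 + beta))
    return -loglik

result = minimize(negative_loglikelihood, x0=[0.06, 1.76], method="Nelder-Mead")
print(result.x)   # maximum likelihood (beta, r); expected to be close to the moment estimates
```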
4.18. E. Chi-Square of 5.88 with 10 - 1 = 9 degrees of freedom. (There is an assumed rather than fitted distribution. Thus there are no fitted parameters.) Decade
Observed Number
Assumed Distribution
Assumed Number
Chi-Square ((Observed - Assumed)^2)/Assumed
0 1 2 3 4 5 6 7 8 9
15 20 15 17 23 18 15 12 16 13
0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
16.4 16.4 16.4 16.4 16.4 16.4 16.4 16.4 16.4 16.4
0.12 0.79 0.12 0.02 2.66 0.16 0.12 1.18 0.01 0.70
Sum
164
1
164.0
5.88
Since 5.88 < 14.684, therefore we do not reject at 10%. Comment: The p-value is 75%; we do not reject the hypothesis that the expected frequency by decade is the same. The data was taken from “A Macro Validation Dataset for U.S. Hurricane Models”, by Douglas J. Collins and Stephen P. Lowe, CAS Forum, Winter 2001. 4.19. D. The overall mean is: 600/5045 = .119. Claim data by region is mathematically the same as claim data by years. When applied to years of data, the Method of Maximum Likelihood applied to the Poisson, produces the same result as the Method of Moments. λ = 0.119. In each case, the expected number of claims is the number of exposures times λ = 0.119. Region Observed Number Exposures 1 124 1257 2 119 1025 3 180 1452 4 177 1311 Sum
600
5045
Expected Number ((Observed - Expected)^2)/Expected 149.49 4.348 121.90 0.069 172.69 0.310 155.92 2.851 600
7.578
We have 4 years (regions) of data, so there are 4 - 1 = 3 degrees of freedom. Since 6.251 < 7.578 < 7.815, H0 is rejected at the 10% significance level, but not at 5%. Comment: We are really only assuming the same mean frequency for every exposure in every region; we are not really using the fact that the frequency is assumed to be Poisson. Therefore, we do not lose a degree of freedom for the fitted parameter.
4.20. C. In region 1, the number of claims is the sum of 1311 independent Binomial Distributions each with the same parameters m = 2 and q, which is a Binomial frequency process with parameters m = (2)(1311) = 2622 and q. In region 1 the likelihood is f(170) = {(2622!)/(170!)(2622-170)!} q170 (1-q)2622-170. ln f(170) = ln[2622!] - ln[170!] - ln[(2622-170)!] + 170lnq + (2622-170)ln(1-q). Sum of the loglikelihoods is: ln[2514!] + ln[2050!] + ln[2904!] + ln[2622!] - ln[130!] - ln[120!] - ln[180!] - ln[170!] - ln[(2514-130)!] - ln[(2050-120)!] - ln[(2904-180)!] - ln[(2622-170)!] + 600lnq + {(2)(5045)-600}ln(1-q). Taking the partial derivative with respect to q and setting it equal to zero: 600/q = {(2)(5045)-600}/(1-q). 600(1-q) = q((2)(5045)-600). q = 600/{(2)(5045)} = 0.0595. In each case, the fitted number of claims is the number of exposures times m = 2 times q = 0.0595. χ 2 = Σ (nk - Ek)2 /Vk, where the denominator is based on the variance of the fitted Binomial. While the expected number would be the variance of a Poisson, the variance of a Binomial is: mq(1-q)(total exposures) = (1-q) (Expected Number). The Chi-Square Statistic would be somewhat bigger: Region Observed Number Exposures Expected Number 1 124 1257 149.49 2 119 1025 121.90 3 180 1452 172.69 4 177 1311 155.92 Sum
600
5045
600
4.623 0.073 0.329 3.031 8.057
We have 4 years (regions) of data, so there are 4 - 1 = 3 degrees of freedom. Since 7.815 < 8.057 < 9.348, H0 is rejected at the 5% significance level, but not at 2.5%. Comment: The Chi-Square Statistic here with the Binomial assumption is that for the Poisson divided by 1 - q: 8.057 = 7.578/(1 - 0.0595). For years of data, for the Binomial, method moments is equal to maximum likelihood.
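A minimal Python sketch of this variance adjustment (an added illustration, using only the standard library):

```python
# Chi-square statistic for regions of data under a Binomial (m = 2) assumption:
# each region's variance is (1 - q) times its expected number of claims.
exposures = [1257, 1025, 1452, 1311]
claims = [124, 119, 180, 177]

q_hat = sum(claims) / (2 * sum(exposures))            # 600/10090, about 0.0595
expected = [2 * e * q_hat for e in exposures]         # 149.49, 121.90, 172.69, 155.92

chi_square = sum((n - E) ** 2 / (E * (1 - q_hat))
                 for n, E in zip(claims, expected))
print(round(chi_square, 3))   # about 8.057, versus 7.578 under the Poisson assumption
```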
4.21. A. The mean is: {(518228)(0) + (105070)(1) + ... + (1)(16)}/709,460 = 0.48438. The 2nd moment is: {(518228)(02 ) + (105070)(12 ) + ... + (1)(162 )}/709,460 = 1.23442. rβ = .48438, and rβ(1 + β) = 1.23442 - .484382 = 0.99980.
⇒ 1 + β = .99980/.48438 = 2.0641. ⇒ β = 1.0641. ⇒ r = .48438/1.0641 = 0.4552. So that each interval has at least 5 expected observations, we group 15 or more. Number of runs
Observed Number
Fitted Neg. Bin.
Fitted Number
Chi Square
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 or more
518,228 105,070 47,936 21,673 9,736 4,033 1,689 639 274 107 36 25 5 7 1 1
0.71900940 0.16872853 0.06328968 0.02670240 0.01189091 0.00546216 0.00256021 0.00121713 0.00058474 0.00028320 0.00013804 0.00006764 0.00003329 0.00001644 0.00000815 0.00000808
510,108.4 119,706.1 44,901.5 18,944.3 8,436.1 3,875.2 1,816.4 863.5 414.8 200.9 97.9 48.0 23.6 11.7 5.8 5.7
129.2 1789.5 205.1 393.0 200.3 6.4 8.9 58.4 47.8 43.9 39.2 11.0 14.7 1.9 4.0 3.9
Sum
709,460
1
709,460
2957.2
There are 16 intervals, and 16 - 1 - 2 = 13 degrees of freedom. For 13 degrees of freedom, the critical value for 1/2% is 29.819. Since 2957.2 > 29.819, reject at 1/2%! Comment: Even though the variance is greater than the mean, the Negative Binomial is a terrible fit to his data. The data is taken from “An Analytic Model for Per-inning Scoring Distributions,” by Keith Woolner. 4.22. A. So that each interval has at least 5 expected observations, we group 4 or more. Number of Claims
Observed Number
Assumed Density
Assumed Number
Chi Square
1 2 3 4 or more
4,121 430 71 30
0.89628 0.08963 0.01195 0.00214
4,169.5 417.0 55.6 9.9
0.564 0.408 4.270 40.468
Sum
4,652
1
4,652
45.710
There are 4 intervals, and 4 - 1 = 3 degrees of freedom; the critical value for 0.5% is 12.838. Since 12.838 < 45.710, reject at 0.5%. Comment: Data set taken from page 301 of Insurance Risk Models by Panjer and WiIlmot.
4.23. B. For the Poisson, f(0) = e-0.4 = .670320. f(1) = 0.4e-0.4 = 0.268128. f(2) = 0.42 e-0.4/2 = .053626. f(3) = 0.43 e-0.4/6 = 0.007150. Prob[4 or more] = 1 - 0.670320 - 0.268128 - 0.053626 - 0.007150 = 0.000776. There are a total of 10,000 observed insureds, and therefore, the expected number by class are: 6703.20, 2681.29, 536.26, 71.50, and 7.76. Chi-square statistic is: Σ (Observed - Expected)2 /Expected = 10.058. Class
Observed Number
Assumed Poisson
Expected Number
Chi-Square ((Observed - Expected)^2)/Expected
A B C D E
6825 2587 499 78 11
0.670320 0.268128 0.053626 0.007150 0.000776
6,703.20 2,681.28 536.26 71.50 7.76
2.213 3.315 2.588 0.591 1.350
Sum
10000
1.000000
10,000.00
10.058
The largest contribution is from Class B: (2587 - 2681.28)2 /2681.28 = 3.315. 4.24. E. The mean is: {(20,592)(0) + (2,651)(1) + (297)(2) + (41)(3) + (7)(4) + (1)(6)}/23,589 = 0.14422. The 2nd moment is: {(20,592)(02 ) + (2,651)(12 ) + (297)(22 ) + (41)(32 ) + (7)(42 ) + (1)(62 )}/23,589 = 0.18466. rβ = 0.14422, and rβ(1 + β) = 0.18466 - 0.144222 = 0.16386.
⇒ 1 + β = 0.16386/0.14422 = 1.1362. ⇒ β = 0.1362. ⇒ r = 0.14422/0.1362 = 1.0589. So that each interval has at least 5 expected observations, we group 4 or more. Number of Claims
Observed Number
Fitted Neg. Bin.
Fitted Number
Chi Square
0 1 2 3 4 or more
20,592 2,651 297 41 8
0.873532 0.110881 0.013683 0.001672 0.000232
20,605.8 2,615.6 322.8 39.5 5.5
0.009 0.480 2.057 0.061 1.181
Sum
23,589
1
23,589
3.788
There are 5 intervals, and 5 - 1 - 2 = 2 degrees of freedom. For 2 degrees of freedom, the critical value for 10% is 4.605. Since 3.788 < 4.605, do not reject H0 at 10%. Comment: Taken from pages 302-303 of Insurance Risk Models by Panjer and WiIlmot. See also Example 15.31 in Loss Models.
4.25. E. For the Geometric: f(0) = 1/1.1 = 0.909091, f(1) = 0.1/1.12 = 0.082645, f(2) = 0.12 /1.13 = 0.007513. Prob[3 or more] = 1 - 0.909091 - 0.082645 - 0.007513 = 0.000751. Number of Claims
Observed Number
Assumed Geometric
Expected Number
Chi-Square ((Observed - Expected)^2)/Expected
0 1 2 3 or more
91,000 8,100 800 100
0.9090909 0.0826446 0.0075131 0.0007513
90,909.09 8,264.46 751.31 75.13
0.091 3.273 3.155 8.231
Sum
100,000
1.0000000
100,000.00
14.750
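A minimal Python sketch of this comparison to a fully specified Geometric (an added illustration, standard library only):

```python
# Chi-square statistic versus a Geometric with beta = 0.1, grouping the tail as "3 or more".
beta = 0.1
densities = [beta ** k / (1 + beta) ** (k + 1) for k in range(3)]   # f(0), f(1), f(2)
probabilities = densities + [1 - sum(densities)]                    # plus Prob[3 or more]

observed = [91000, 8100, 800, 100]
total = sum(observed)
expected = [total * p for p in probabilities]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))   # about 14.75
```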
4.26. C. The overall mean is: 760/10000 = 7.6%. In each case, the expected number of claims is the number of exposures times λ = .076. Year 1 2 3 Sum
Observed Number Exposures 200 3000 250 3300 310 3700 760
10000
Expected Number 228.00 250.80 281.20
((Observed - Expected)^2)/Expected 3.439 0.003 2.950
760
6.391
We have 3 years of data, so there are 3 - 1 = 2 degrees of freedom. Since 5.991 < 6.391 < 7.378, H0 is rejected at the 5% significance level, but not at 2.5%. 4.27. C. The overall mean is: 760/10000 = 7.6%. When applied to years of data, the Method of Maximum Likelihood applied to the Poisson, produces the same result as the Method of Moments. λ = .076. In each case, the expected number of claims is the number of exposures times λ = .076. Year 1 2 3 Sum
Observed Number Exposures 200 3000 250 3300 310 3700 760
10000
Expected Number 228.00 250.80 281.20
((Observed - Expected)^2)/Expected 3.439 0.003 2.950
760
6.391
We have 3 years of data, so there are 3 - 1 = 2 degrees of freedom. Since 5.991 < 6.391 < 7.378, H0 is rejected at the 5% significance level, but not at 2.5%. Comment: The same answer as in the previous question. We are really only assuming the same mean frequency for every exposure in every year. Therefore, we do not lose a degree of freedom for the fitted parameter. Similar to Example 16.8 in Loss Models.
4.28. D. The overall mean is: 760/10000 = 7.6%. When applied to years of data, the Method of Maximum Likelihood applied to the Geometric, produces the same result as the Method of Moments. β = 0.076. In each case, the expected number of claims is the number of exposures times β = 0.076. χ 2 = Σ (nk - Ek)2 /Vk, where the denominator is the variance based on the fitted Geometric: β(1+β)(exposures) = (1+β)(Expected Number). The Chi-Square Statistic would be somewhat smaller than in the Poisson case: (200 - 228)2 /{(228)(1.076)} + (250 - 250.8)2 /{(250.8)(1.076)} + (310 - 281.2)2 /{(281.2)(1.076)} = 5.939. We have 3 years of data, so there are 3 - 1 = 2 degrees of freedom. Since 4.605 < 5.939 < 5.991, H0 is rejected at the 10% significance level, but not at 5%. Comment: The Chi-Square statistic here is that for the Poisson assumption divided by 1.076, subject to rounding: 6.391/1.076 = 5.940. Similar to Exercise 16.14 in Loss Models. 4.29. B. The last group is 5 and over, since 6 and over would have had only 2.4 expected drivers. Number of Claims
Observed
Maximum Likelihood Negative Binomial
Expected
Chi Square
0 1 2 3 4 5 and over
17,649 4,829 1,106 229 44 15
0.7395056 0.2019838 0.0461271 0.0098458 0.0020281 0.0005096
17,653.5 4,821.8 1,101.1 235.0 48.4 12.2
0.001 0.011 0.021 0.155 0.403 0.660
Sum
23,872
1
23,872.0
1.252
Comment: Data for female drivers in California, taken from Table 2 of “A Markov Chain Model of Shifting Risk Parameters,” by Howard C. Mahler, PCAS 1997. With 2 fitted parameters, we have 6 - 1 - 2 = 3 degrees of freedom. For the Chi-Square with 3 d.f. the critical value for 10% is 6.251. 1.252 < 6.251. Thus we do not reject this fit at 10%.
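Here is a minimal Python sketch of the grouping step in this solution (an added illustration; it assumes scipy is available, whose nbinom distribution uses parameters (r, 1/(1+β))):

```python
# Expected counts from the fitted Negative Binomial, with the tail collapsed into "5 and over"
# so that every group has an expected count of at least 5.
from scipy.stats import nbinom

r, beta = 1.4876, 0.2249               # maximum likelihood estimates given in the problem
p = 1 / (1 + beta)
observed = [17649, 4829, 1106, 229, 44, 9, 4, 1, 1, 0]
total = sum(observed)                  # 23,872 drivers

expected = [total * nbinom.pmf(k, r, p) for k in range(len(observed))]
cut = 5                                # "6 and over" would have an expected count under 5
observed_grouped = observed[:cut] + [sum(observed[cut:])]
expected_grouped = expected[:cut] + [total - sum(expected[:cut])]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed_grouped, expected_grouped))
print(round(chi_square, 2))   # about 1.25, with 6 - 1 - 2 = 3 degrees of freedom
```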
4.30. B. For the Bernoulli, Method of Moments is equal to Maximum Likelihood. The mean of the data is: (842 + 1016 + 1258 + 1380 + 1594) / (9000 + 10,500 + 12,000 + 13,500 + 15,000) = 0.1015. Thus q^ = 0.1015. In each case, the expected number of claims is the number of exposures times 0.1015. With years of data, χ2 = Σ (nk - Ek)2 /Vk, where the denominator is the variance based on the fitted Bernoulli: Vk = q (1-q) (exposures) = (1-q) (Expected Number) = (0.8985) (Expected Number). For example, for 2006: (842 - 913.50)2 / {(0.8985)(913.50)} = 6.229. Year 2006 2007 2008 2009 2010
Observed Number 842 1016 1258 1380 1594
Exposures 9,000 10,500 12,000 13,500 15,000
Expected Number 913.50 1065.75 1218.00 1370.25 1522.50
Chi-Square 6.229 2.585 1.462 0.077 3.737
Sum
6090
60,000
6090
14.090
The Chi-Square Good-of-Fit statistic is: 14.090. We have 5 years of data, so there are: 5 - 1 = 4 degrees of freedom. Since 13.277 < 14.090 < 14.860, H0 is rejected at the 1% significance level, but not at 0.5%. Comment: We are really only assuming the same mean frequency for every exposure in every year. Therefore, we do not lose a degree of freedom for the fitted parameter. Similar to Example 16.8 in Loss Models. 4.31. A. For each interval one computes: (observed number - fitted number)2 / fitted number Group the last two intervals so as to have at least 5 expected observations per interval. Number of Claims
Observed
Fitted
Chi Square
0 1 2 or more
800 180 20
818.7 163.7 17.5
0.43 1.61 0.35
Sum
1000
1000.0
2.39
4.32. C. Group the last two intervals, so that one has at least 5 expected observations per interval, as recommended. In that case, one has 5 intervals, and one has fit two parameters, therefore one has 5 - 1 -2 = 2 degrees of freedom. One computes the Chi-Square Statistic as 8.53 as shown below. Since 8.53 > 7.378, one rejects at 2.5%. Since 8.53 < 9.210, one does not reject at 1%. # Claims 0 1 2 3 4 & over
Observed Number Fitted Number 8725 8738 1100 1069 135 162 35 26 5 5
Sum
10000
((Observed - Fitted)^2)/Fitted 0.02 0.90 4.50 3.12 0.00
10000
8.53
4.33. B. For each of the groupings one computes: (observed number of risks - fitted number of risks )2 / fitted number of risks Taking the sum one gets a Chi-Square Statistic of 10.07. Number of Claims 0 1 2 3 or more
Observed # Risks 7673 2035 262 30
Assumed # Risks 7788.00 1947.00 243.00 22.00
Chi Square 1.70 3.98 1.49 2.91
10000
10000
10.07
There are four groupings. Since one parameter (the Poisson distribution is described by one parameter) was fit to the data, we have 4 - 1 - 1 = 2 degrees of freedom. Use the row of the table for 2 degrees of freedom. Since 10.07 > 9.210 one can reject at α = .010. On the other hand, 10.07 < 10.597, so one can not reject at α = .005. Comment: The fitted Poisson has parameter .25, with density function: Number of Claims Probability
0 0.778801
1 0.194700
2 0.024338
3 0.002028
4 0.000127
5 0.000006
4.34. B. The first step is to estimate µ. For the Poisson Distribution, maximum likelihood is the same as the method of moments. The observed mean is (806 + (2)(85) + (3)(8)) / 5000 = 1000 / 5000 = 0.2. Thus µ = 0.2. Then for example 5000f(2) = 5000(.22 )e-0.2 / 2! = 81.87. The final interval of three or more is obtained as: 5000 - (4093.65 + 818.73 + 81.87) = 5.74. The Chi-Square statistic of 1.218 is computed as follows: Result Observed Number 0 4101 1 806 2 85 3 or more 8 Sum
Fitted Number 4093.65 818.73 81.87 5.74
((Observed - Fitted)^2)/Fitted 0.013 0.198 0.119 0.888
5000.00
1.218
5000
Comment: The solution is sensitive to rounding. 4.35. B. Degrees of freedom = (# of intervals) - 1 - (#estimated parameters) = 4 - 1 - 1 = 2. 4.36. D. First, one calculates the Empirical Distribution Function. For example at x = 2, the Empirical Distribution Function is (4101 + 806 + 85) / 5000 = 0.9984. Then one calculates the fitted distribution function by summing the density function for a Poisson with mean of 0.2. For example, F(1) = f(0) + f(1) = e-0.2 + .2 e-0.2 = 0.9825. One then computes the absolute difference of the empirical and fitted distribution functions. The K-S statistic is the maximum absolute difference of 0.0015.
x 0 1 2 3
Empirical Distribution Function 0.8202 0.9814 0.9984 1.0000
Fitted Distribution Function 0.8187 0.9825 0.9989 0.9999
Absolute Difference 0.0015 0.0011 0.0005 0.0001
Comment: For a discrete distribution such as the Poisson, one compares at the actual points. For a continuous distribution, one would compare each fitted probability to the observed distribution function just before and just after each observed claim value. See “Mahlerʼs Guide to Fitting Loss Distributions.”
4.37. C. The maximum absolute difference between the empirical and fitted distributions occurs at x = 0, and is 0.0043.
x 0 1 2
Empirical Distribution Function 0.9091 0.9929 0.9980
Fitted Distribution Function 0.9048 0.9953 0.9998
Absolute Difference 0.0043 0.0024 0.0018
Comment: For a discrete distribution one compares at all the (available) points. Due to the grouping of data, comparisons canʼt be made for x ≥ 3. (Both distribution functions equal one at infinity.) 4.38. E. There are 3 intervals and weʼve fit one parameter, so we have 3 - 1 - 1 = 1 degree of freedom. The chance of 0 claims is e-0.3055 = 0.73675. The chance of 1 claim is: (0.3055)e-0.3055 = 0.22508. The chance of 2 or more claims is: 1 - 0.73675 - 0.22508 = 0.03817. Chi-Square Statistic is 3.556. 3.556 < 3.841, so do not reject at 5%. Number of Claims 0 1 2 or more Sum
Number of Risks 729 242 29 1000.00
Fitted Number 736.75 225.08 38.17 1000.00
((Observed - Fitted)^2)/Fitted Chi-Square 0.082 1.272 2.203 3.556
Comment: I have used the groups we were told to use.
4.39. E.

Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  80                100 - 100q      (20 - 100q)^2 / (100 - 100q)
1                  20                100q            (20 - 100q)^2 / 100q

There are 2 intervals, so we have 2 - 1 = 1 degree of freedom. The critical value for a 1% significance level is 6.635. The Chi-Square Statistic is computed as the sum of the two contributions: (20 - 100q)^2 {1/(100 - 100q) + 1/(100q)} = (2 - 10q)^2 / {q(1 - q)}. Setting the Chi-Square Statistic equal to the critical value of 6.635, we solve for q: (2 - 10q)^2 / {q(1 - q)} = 6.635. (2 - 10q)^2 = 6.635 q(1 - q). 106.635 q^2 - 46.635 q + 4 = 0.
Thus q = {46.635 ± √[46.635^2 - (4)(106.635)(4)]} / {(2)(106.635)} = 0.2187 ± 0.1015 = 0.117 or 0.320.
We have a bad fit for q far from the observed mean of 0.2. Thus we reject the fit at 1% for q < 0.117 or q > 0.320. Thus the smallest value of q for which H0 will not be rejected at the 0.01 significance level is about 0.117.
Comment: Note that we have assumed various values of q, rather than estimating q by fitting a Bernoulli to this data. (For example, using the method of moments one would estimate q = 20/100 = 0.2.) Therefore, we do not subtract any fitted parameters in order to determine the number of degrees of freedom.

4.40. C. For 5 classes we have 5 - 1 = 4 degrees of freedom. The Chi-Square Statistic is 9.80. Since 9.80 > 9.488, we reject at 5%; since 9.80 < 11.143, we do not reject at 2.5%.

Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  7                 10              0.90
1                  10                10              0.00
2                  12                10              0.40
3                  17                10              4.90
4                  4                 10              3.60
Sum                50                50              9.80
4.41. D. From the solution to the previous question, the overwhelming majority of the Chi-Square Statistic came from the last two intervals. In addition, the differences between fitted and observed were in opposite directions. Thus combining the risks with 3 claims and the risks with 4 claims greatly reduces the Chi-Square Statistic.

Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  7                 10              0.90
1                  10                10              0.00
2                  12                10              0.40
3 or 4             21                20              0.05
Sum                50                50              1.35

For 4 classes we have 4 - 1 = 3 degrees of freedom. The Chi-Square Statistic is 1.35, which is less than 6.251, so the p-value is greater than 10%. Thus combining the risks with 3 claims and the risks with 4 claims results in a much different p-value from the p-value in the previous question, which was between 2.5% and 5%. The other listed combinations do not alter the p-value as much as does combining the risks with 3 claims and the risks with 4 claims.

A. Combining the risks with 0 claims and the risks with 1 claim:
Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0 or 1             17                20              0.450
2                  12                10              0.400
3                  17                10              4.900
4                  4                 10              3.600
Sum                50                50              9.350
Since 9.348 < 9.350 < 11.345, the p-value is between 1% and 2.5%.

B. Combining the risks with 1 claim and the risks with 2 claims:
Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  7                 10              0.90
1 or 2             22                20              0.20
3                  17                10              4.90
4                  4                 10              3.60
Sum                50                50              9.60
Since 9.348 < 9.60 < 11.345, the p-value is between 1% and 2.5%.

C. Combining the risks with 2 claims and the risks with 3 claims:
Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  7                 10              0.90
1                  10                10              0.00
2 or 3             29                20              4.05
4                  4                 10              3.60
Sum                50                50              8.55
Since 7.815 < 8.55 < 9.348, the p-value is between 2.5% and 5%.

Comment: Combining classes after one has seen the data, solely in order to decrease the computed Chi-Square Statistic, would defeat the whole purpose of the test.
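Here is a small Python sketch of my own (not from the original solution) that recomputes the Chi-Square Statistic of 4.40 under each of the groupings considered in 4.41. The fitted count of 10 per original class is taken from the question.

observed = [7, 10, 12, 17, 4]        # risks with 0, 1, 2, 3, 4 claims
fitted = [10, 10, 10, 10, 10]

def chi_square(groupings):
    # groupings: list of tuples of original class indices to merge into one interval.
    stat = 0.0
    for group in groupings:
        o = sum(observed[i] for i in group)
        f = sum(fitted[i] for i in group)
        stat += (o - f) ** 2 / f
    return stat

print(round(chi_square([(0,), (1,), (2,), (3,), (4,)]), 2))   # 9.80, the original 4 degrees of freedom
print(round(chi_square([(0,), (1,), (2,), (3, 4)]), 2))       # 1.35, combining 3 and 4 claims
print(round(chi_square([(0, 1), (2,), (3,), (4,)]), 2))       # 9.35
print(round(chi_square([(0,), (1, 2), (3,), (4,)]), 2))       # 9.60
print(round(chi_square([(0,), (1,), (2, 3), (4,)]), 2))       # 8.55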
4.42. C. For the Poisson, the method of Maximum Likelihood is equal to the Method of Moments. The observed mean is: {(0)(50) + (1)(122) + (2)(101) + (3)(92)} / (50 + 122 + 101 + 92) = 600/365 = 1.644. Thus λ = 1.644. Thus, for example, the fitted number of days with 2 claims is: (365)(e^-λ λ^2/2!) = (365)(e^-1.644 1.644^2/2) = (365)(0.2611) = 95.30. Using the groupings specified in the question, the Chi-Square Statistic is 7.55.

Number of Claims   Observed   Fitted Poisson   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  50         19.32%           70.52           5.97
1                  122        31.76%           115.93          0.32
2                  101        26.11%           95.30           0.34
3 or more          92         22.81%           83.25           0.92
Sum                365        100.00%          365             7.55

For 4 classes and one fitted parameter we have 4 - 1 - 1 = 2 degrees of freedom. Since 7.55 > 7.378, we reject at 2.5%; since 7.55 < 9.210, we do not reject at 1%.
Comment: 7.378 < 7.55 < 9.210. Reject to the left (at 100% - 97.5% = 2.5%) and do not reject to the right (at 100% - 99% = 1%). Since an interval of 6 or more would only have about 2.5 fitted insureds, the six groupings below are what I would have used in the absence of any special instructions, such as those that were given in this question.

Number of Claims   Observed Number   Fitted Number   ((Observed - Fitted)^2)/Fitted
0                  50                70.52           5.97
1                  122               115.93          0.32
2                  101               95.30           0.34
3                  92                52.22           30.30
4                  0                 21.46           21.46
5 or more          0                 9.56            9.56
Sum                365               365             67.95
For 6 - 1 - 1 = 4 degrees of freedom, one would reject at 0.5%. The assumed distribution does a terrible job of fitting the data in the righthand tail!
4.43. (i) The Poisson has way too large a probability of one claim, while greatly underestimating the number of insureds who have multiple claims. The Poisson is a poor match for this data.
(ii) E[X] = 0p + (1)p(1 - p) + 2p(1 - p)^2 + 3p(1 - p)^3 + ...
= p(1 - p){{1 + (1 - p) + (1 - p)^2 + ...} + {(1 - p) + (1 - p)^2 + ...} + {(1 - p)^2 + (1 - p)^3 + ...} + ...}
= p(1 - p){1/(1 - (1 - p)) + (1 - p)/(1 - (1 - p)) + (1 - p)^2/(1 - (1 - p)) + ...}
= (1 - p){1 + (1 - p) + (1 - p)^2 + ...} = (1 - p){1/(1 - (1 - p))} = (1 - p)/p.
Alternately, this is a Geometric Distribution with: (1 - p) = β/(1 + β). ⇒ Mean = β = (1 - p)/p.
X̄ = {(0)(826) + (1)(128) + (2)(39) + (3)(7)}/1000 = 0.227. (1 - p)/p = 0.227. ⇒ p = 0.815.
(iii) f(0) = p = 0.815. f(1) = p(1 - p) = 0.150775. f(2) = p(1 - p)^2 = 0.02789. f(3) = p(1 - p)^3 = 0.005160.
The expected number of policyholders with four or more claims is: (1000){1 - (0.815 + 0.150775 + 0.02789 + 0.005160)} = 1.2.
Multiplying by 1000 one gets the other expected numbers of policyholders: 815.0 @ 0, 150.8 @ 1, 27.9 @ 2, and 5.2 @ 3.
(iv) (a) The number of degrees of freedom is: 5 - 1 - 1 = 3.

Number of Claims   Number of Insureds Observed   Geometric Distribution   Fitted Number of Insureds   Chi-Square = (observed # - fitted #)^2 / fitted #
0                  826                           0.81500                  815.00                      0.148
1                  128                           0.15077                  150.78                      3.440
2                  39                            0.02789                  27.89                       4.422
3                  7                             0.00516                  5.16                        0.656
4 and more         0                             0.00117                  1.17                        1.171
Sum                1000                          1.00000                  1,000.00                    9.838
For 3 degrees of freedom, the 2.5% critical value is 9.348, while the 1% critical value is 11.345. Since 9.348 < 9.838 < 11.345, the p-value is between 1% and 2.5%.
(b) Since 9.838 < 11.345, we do not reject the fit at the 1% significance level. (Since 9.348 < 9.838, we reject the fit at the 2.5% significance level.) Alternately, since the p-value is greater than 1%, we do not reject the fit at 1%.
Comment: In part (iv) one could combine the rows 3 and 4 or more, and arrive at the same conclusion, based on a Chi-Square Statistic of 8.082 for 2 degrees of freedom.

Number of Claims   Number of Insureds Observed   Geometric Distribution   Fitted Number of Insureds   Chi-Square = (observed # - fitted #)^2 / fitted #
0                  826                           0.81500                  815.00                      0.148
1                  128                           0.15077                  150.78                      3.440
2                  39                            0.02789                  27.89                       4.422
3 or more          7                             0.00633                  6.33                        0.071
Sum                1000                          1.00000                  1,000.00                    8.082
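A rough Python check of my own (under the same assumptions as 4.43) of the Geometric fit and the two groupings of the Chi-Square table:

observed = [826, 128, 39, 7]                       # policyholders with 0, 1, 2, 3 claims
n = sum(observed)                                  # 1000
beta = sum(k * o for k, o in enumerate(observed)) / n    # method of moments: 0.227

def geometric_pmf(k, beta):
    return beta ** k / (1 + beta) ** (k + 1)

# Five groups: 0, 1, 2, 3, and "4 and more" (which has 0 observed policyholders).
probs = [geometric_pmf(k, beta) for k in range(4)]
probs.append(1 - sum(probs))
obs5 = observed + [0]
chi5 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(obs5, probs))

# Alternative grouping: combine 3 and "4 and more" into a single "3 or more" group.
probs4 = probs[:3] + [probs[3] + probs[4]]
obs4 = observed[:3] + [7]
chi4 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(obs4, probs4))

print(round(chi5, 3), round(chi4, 3))              # about 9.838 and 8.082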
4.44. B. The fitted number of days with n accidents is: (365)(e^-0.6 0.6^n / n!).

Number of Claims   Number of Days Observed   Fitted Poisson Distribution   Fitted Number of Days   Chi-Square = (observed # - fitted #)^2 / fitted #
0                  209                       0.549                         200.316                 0.38
1                  111                       0.329                         120.190                 0.70
2                  33                        0.099                         36.057                  0.26
3 and +            12                        0.023                         8.437                   1.50
Sum                365                       1.000                         365.000                 2.84
The Chi-Square Statistic is: (209 - 200.316)^2/200.316 + (111 - 120.190)^2/120.190 + (33 - 36.057)^2/36.057 + (12 - 8.437)^2/8.437 = 2.84.
Comment: In general, you should compute your Chi-Square Statistics to more accuracy than the nearest integer. (I have shown a little more accuracy than needed.) An interval of 4 and over would have had only 1.2 expected observations. Thus the final group used is “3 and over”. This final groupʼs expected number of days is: 365 - (200.316 + 120.190 + 36.057) = 8.437.

4.45. A. Since there are no fitted parameters, there are 6 - 1 = 5 degrees of freedom.

# Claimants   Observed Number   Standard Probability   Expected Number   ((Observed - Expected)^2)/Expected
1             235               25%                    250               0.90
2             335               35%                    350               0.64
3             250               24%                    240               0.42
4             111               11%                    110               0.01
5             47                4%                     40                1.23
6 or more     22                1%                     10                14.40
Sum           1000              100%                   1000              17.59
Since 17.59 > 16.750, reject the null hypothesis at 1/2%.

4.46. B. For example, (0.2744)(430) = 117.99. (112 - 117.99)^2/117.99 = 0.304.

Type   Number of Claims   Historical Probability   Assumed Number   ((Observed - Assumed)^2)/Assumed
A      112                0.2744                   117.99           0.304
B      180                0.3512                   151.02           5.563
C      138                0.3744                   160.99           3.284
Sum    430                1.00                     430.00           9.151
Comment: For 3 - 1 = 2 d.f., 7.378 < 9.151 < 9.210; reject at 2.5%, but not at 1%.
4.47. E. The estimate of λ is: 230/1000 = 0.23.

Number of Days   Number of Workers Observed   Fitted Poisson Distribution   Fitted Number   Chi-Square = (observed # - fitted #)^2 / fitted #
0                818                          0.79453                       794.53          0.69
1                153                          0.18274                       182.74          4.84
2                25                           0.02102                       21.02           0.76
3 and +          4                            0.00171                       1.71            3.08
Sum              1000                         1.000                         1,000.00        9.36
We have 4 intervals and fit 1 parameter, and thus there are 4 - 1 - 1 = 2 degrees of freedom. 9.210 < 9.36 < 10.597, reject H0 at 1% (but not at 1/2%.) Comment: The CAS/SOA also accepted choice D, presumably to allow for intermediate rounding in computing the Chi-Square Statistic. For example, if one rounds the fitted values to the nearest tenth, 794.5, 182.7, 21.0, and 1.8, where the final value is gotten by subtraction from 1000, then the computed statistic is instead 8.97. In general, when computing the Chi-Square Statistic avoid intermediate rounding. Since the expected number in each interval is at least one, bullet number ii has no effect.
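A quick Python sketch of my own illustrating the rounding sensitivity noted in the comment to 4.47, under the same data and grouping as the solution:

import math

observed = [818, 153, 25, 4]                       # workers with 0, 1, 2, 3-or-more days
n = sum(observed)                                  # 1000
lam = 230 / n                                      # 0.23

def fitted_counts(round_to=None):
    cells = [n * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
    if round_to is not None:
        cells = [round(c, round_to) for c in cells]
    cells.append(n - sum(cells))                   # final "3 and over" cell by subtraction
    return cells

def chi_square(fitted):
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

print(round(chi_square(fitted_counts()), 2))       # about 9.36: reject at 1%, not at 1/2%
print(round(chi_square(fitted_counts(1)), 2))      # about 8.97 if the fitted values are rounded first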
Section 5, Likelihood Ratio Test42
Another way to rank fits of distributions is to compare the likelihoods or the loglikelihoods. As discussed previously, the larger the likelihood or loglikelihood, the better the fit, all other things being equal. While more parameters usually result in a better fit, the improvement from additional parameters may or may not be significant. It turns out that one can use the Chi-Square Statistic to test whether a fit is significantly better.
For example, assume we have fit via Maximum Likelihood a Negative Binomial Distribution and a Poisson Distribution to the same data. Assume the loglikelihoods are: Poisson: -3112.2 and Negative Binomial: -3110.4. The Negative Binomial has a larger loglikelihood, which is not surprising since we can obtain a Poisson as a limit of Negative Binomials. The Negative Binomial has two parameters and thus a greater ability to fit the peculiarities of a particular data set.
In order to determine whether the Negative Binomial fit is significantly better, we take twice the difference of the loglikelihoods: 2{-3110.4 - (-3112.2)} = 3.6. Since the difference in the number of parameters is one, we compare to the Chi-Square Distribution with one degree of freedom:

Significance Level:   0.100   0.050   0.025   0.010   0.005
Critical Value:       2.706   3.841   5.024   6.635   7.879

Since 2.706 < 3.6 < 3.841, we reject at 10%, but not at the 5% significance level, the hypothesis that the Poisson is a more appropriate model than the Negative Binomial.
The Principle of Parsimony says we should use the smallest number of parameters that get the job done. At a 5% level of significance, the one parameter Poisson Distribution is preferred to the two parameter Negative Binomial Distribution, even though the Negative Binomial Distribution has a somewhat larger loglikelihood. At the 5% significance level, the improvement in the loglikelihood would have to be somewhat larger in order to abandon the simpler Poisson model in favor of the more complicated Negative Binomial model.
42 See Section 16.4.4 in Loss Models. The Likelihood Ratio Test is also discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
On the other hand, at a 10% level of significance, the two parameter Negative Binomial Distribution is preferred to the one parameter Poisson Distribution. At the 10% significance level, the improvement in the loglikelihood is large enough to abandon the simpler Poisson model in favor of the more complicated Negative Binomial model.
In general the Likelihood Ratio Test (or Loglikelihood Difference Test) proceeds as follows:43
1. One has two distributions, one with more parameters than the other, both fit to the same data via Maximum Likelihood.
2. One of the distributions is a special case of the other, with fewer parameters.44
3. One computes twice the difference in the loglikelihoods.45
4. One compares the result of step 3 to the Chi-Square Distribution, with a number of degrees of freedom equal to the difference of the number of fitted parameters of the two distributions.
5. One draws a conclusion as to whether the more general distribution fits significantly better than its special case.
H0 is that the distribution with fewer parameters is appropriate. The alternative hypothesis H1 is that the distribution with more parameters is appropriate. Unlike some other hypothesis tests, the likelihood ratio test is set up to compare two possibilities.
H0: We use the simpler distribution with fewer parameters.
H1: We use the more complicated distribution with more parameters.
We always prefer a model with fewer parameters, unless a model with more parameters is a significantly better fit. For example,
H0: Data was drawn from a Geometric Distribution (Negative Binomial with r = 1).
H1: Data was drawn from a Negative Binomial.
The best fitting Negative Binomial has to have a loglikelihood that is greater than or equal to that of the best fitting Geometric.
43 Note that twice the difference of the loglikelihoods approaches a Chi-Square Distribution as the data set gets larger and larger. Thus one should be cautious about drawing any conclusion concerning fits to small data sets.
44 This test is often applied when one distribution is the limit of the other. For example, the Poisson is a limit of Negative Binomial Distributions. Loss Models at page 456 states that in this case the test statistic has a mixture of Chi-Square Distributions. Even though there is no theorem to justify it, actuaries often use the Likelihood Ratio Test to compare fits of distributions with different numbers of parameters, even when one is not a special case of the other.
45 Equivalently, one computes twice the log of the ratio of the likelihoods.
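The following is a minimal Python sketch of my own (not from the text) of the mechanics of this test, using the Poisson versus Negative Binomial loglikelihoods quoted earlier in this section:

CHI_SQUARE_CRITICAL_1DF = {0.10: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.005: 7.879}

def likelihood_ratio_statistic(loglik_restricted, loglik_general):
    # Twice the difference in loglikelihoods; compare to a chi-square with
    # (difference in number of fitted parameters) degrees of freedom.
    return 2 * (loglik_general - loglik_restricted)

stat = likelihood_ratio_statistic(-3112.2, -3110.4)   # Poisson (H0) vs Negative Binomial (H1)
for level, critical in sorted(CHI_SQUARE_CRITICAL_1DF.items(), reverse=True):
    print(level, "reject H0" if stat > critical else "do not reject H0")
# stat = 3.6: reject at 10%, do not reject at 5% and below.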
Exercise: Both a Geometric Distribution and a mixed Poisson-Transformed Gamma Distribution (a Poisson mixed via a Transformed Gamma Distribution) have been fit to the same data via the Method of Maximum Likelihood. The loglikelihoods are -1737.3 for the mixed Poisson-Transformed Gamma (3 parameters) and -1740.8 for the Geometric (1 parameter). Use the Likelihood Ratio Test to determine whether the mixed Poisson-Transformed Gamma fits this data significantly better than the Geometric.
[Solution: The Geometric is a special case of the Negative Binomial. A mixed Poisson-Gamma is a Negative Binomial. Thus a Negative Binomial is a special case of a mixed Poisson-Transformed Gamma Distribution. Therefore, a Geometric is a special case of a mixed Poisson-Transformed Gamma Distribution. The Geometric has one parameter while the mixed Poisson-Transformed Gamma Distribution has three parameters, those of the Transformed Gamma. Therefore we compare to the Chi-Square Distribution with 3 - 1 = 2 degrees of freedom.

Degrees of Freedom   Significance Level:   0.100   0.050   0.025   0.010   0.005
2                    Critical Value:       4.605   5.991   7.378   9.210   10.597

Twice the difference of the loglikelihoods is: 2{-1737.3 - (-1740.8)} = 7.0. 5.991 < 7.0 < 7.378. Thus we reject at the 5% level, but do not reject at the 2.5% level, the hypothesis that the simpler Geometric is appropriate. In other words, at the 2.5% level we do not reject the null hypothesis that the simpler Geometric model is appropriate rather than the alternative hypothesis that the more complex mixed Poisson-Transformed Gamma model is appropriate.]

Exercise: One observes 663 claims from 2000 insureds. A Negative Binomial distribution is fit via maximum likelihood, obtaining β = 0.230 and r = 1.442, with corresponding maximum loglikelihood of -1793.24. One then takes r = 1.5 and finds the maximum likelihood β is 0.221, with corresponding maximum loglikelihood of -1796.11. Use the likelihood ratio test in order to test the hypothesis that r = 1.5.
[Solution: The Negative Binomial with r and β has two parameters, while the Negative Binomial with r fixed at 1.5 is a special case with one parameter β. The difference in number of parameters is 1. We compare to the Chi-Square Distribution with one degree of freedom. Twice the difference in loglikelihoods is: 2{-1793.24 - (-1796.11)} = 5.74. 5.024 < 5.74 < 6.635 ⇒ reject at 2.5% and do not reject at 1%, the hypothesis that r = 1.5.
Comment: See 4, 11/03, Q.28, involving a Pareto Distribution, in “Mahlerʼs Guide to Loss Distributions.” H0 is that the distribution with fewer parameters is appropriate ⇔ r = 1.5.]
Exercise: One observes 2100 claims on 10,000 exposures. You fit a Poisson distribution via maximum likelihood. Use the likelihood ratio test in order to test the hypothesis that λ = 0.2.
[Solution: The maximum likelihood fit is: λ = 2100/10000 = 0.21. For 10,000 exposures we have a Poisson with mean 10000λ.
Loglikelihood = ln f(2100) = ln(e^-10000λ (10000λ)^2100 / 2100!) = -10000λ + 2100 ln(λ) + 2100 ln(10000) - ln(2100!).
For λ = 0.2, loglikelihood = (-10000)(0.2) + 2100 ln(0.2) + 2100 ln(10000) - ln(2100!) = -5379.820 + 2100 ln(10000) - ln(2100!).
For λ = 0.21, loglikelihood = (-10000)(0.21) + 2100 ln(0.21) + 2100 ln(10000) - ln(2100!) = -5377.360 + 2100 ln(10000) - ln(2100!).
Twice the difference in loglikelihoods is: 2{-5377.360 - (-5379.820)} = 4.92.
The restriction that λ be 0.2 is one dimensional. Alternately, the Poisson with λ unknown has one parameter, while the Poisson with λ = 0.2 is a special case with zero parameters; the difference in number of parameters is 1. In any case, we compare to the Chi-Square Distribution with one degree of freedom. 3.841 < 4.92 < 5.024 ⇒ reject at 5% and do not reject at 2.5%, the hypothesis that λ = 0.2.]

Testing Other Hypotheses:
Assume you observe 100,000 claims in Year 1 and 92,000 claims in Year 2. Assume that claims the first year are Poisson with parameter λ1, and that claims the second year are Poisson with parameter λ2.
Using maximum likelihood to estimate λ1 is the same as the method of moments. Estimated λ1 = 100,000. Year 1 maximum loglikelihood is: -λ1 + 100000 ln(λ1) - ln(100000!) = -100000 + 100000 ln(100000) - ln(100000!) = 1,051,292.55 - ln(100000!).
Similarly applying maximum likelihood to the data for Year 2: λ2 = 92,000. Year 2 maximum loglikelihood is: -92000 + 92000 ln(92000) - ln(92000!) = 959,518.03 - ln(92000!).
As discussed previously, instead of separately estimating λ1 and λ2, one can assume some sort of relationship between them. For example, let us assume λ2 = 0.9λ1.
For a Poisson Distribution, f(x) = e−λλ x/x!. ln f(x) = -λ + xln(λ) - ln(x!). Year 1 Loglikelihood is: -λ1 + 100000ln(λ1) - ln(100000!). Assuming λ2 = 0.9λ1, Year 2 Loglikelihood is: -λ2 + 92000ln(λ2) - ln(92000!) = -0.9λ1 + 92000ln(.9λ1) - ln(92000!) = -0.9λ1 + 92000ln(λ1) + 92000ln(0.9) - ln(92000!). Total Loglikelihood = -λ1 + 100000ln(λ1 ) - ln(100000!) - 0.9λ1 + 92000ln(λ1) + 92000ln(0.9) - ln(92000!) = -1.9λ1 + 192000ln(λ1) - ln(100000!) + 92000ln(0.9) - ln(92000!) . Setting the partial derivative with respect to λ1 equal to zero: 0 = -1.9 + 192000/λ1. ⇒ λ1 = 192000/1.9 = 101,052.63. Maximum loglikelihood is: -1.9(101,052.63) + 192000ln(101,052.63) - ln(100000!) + 92000ln(0.9) - ln(92000!) = 2,010,799.01 - ln(100000!) - ln(92000!). The unrestricted maximum loglikelihood is: 1,051,292.55 - ln(100000!) + 959,518.03 - ln(92000!) = 2,010,810.58 - ln(100000!) - ln(92000!), better than the restricted maximum loglikelihood of 2,010,799.01 - ln(100000!) - ln(92000!). It is not surprising that without the restriction we can do a somewhat better job of fitting the data. The unrestricted model involves two Poissons, while the restricted model is a special case in which one of the Poissons has 0.9 times the mean of the other. Let the null hypothesis H0 be that λ2 = 0.9λ1. Let the alternative H1 be that H0 is not true. Then we can use the likelihood ratio test as follows. We use the loglikelihood for the unrestricted model of 2,010,810.58 - ln(100000!) - ln(92000!), and the loglikelihood for the restricted model of 2,010,799.01 - ln(100000!) - ln(92000!). The test statistic is as usual twice the difference in the loglikelihoods: (2)(2,010,810.58 - ln(100000!) - ln(92000!) - {2,010,799.01 - ln(100000!) - ln(92000!)}) = 23.14. We compare to the Chi-Square Distribution with one degree of freedom, since the restriction is one dimensional. Since 7.879 < 23.14, we reject H0 at 0.5%.
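Here is a short Python sketch of my own (not part of the original derivation) that verifies the test statistic of 23.14. The ln(x!) terms are common to both models, so they cancel in the likelihood ratio test and are dropped.

import math

n1, n2 = 100_000, 92_000        # Year 1 and Year 2 claim counts, each a single Poisson observation

def loglik(lam, n):
    # log of the Poisson density at n, dropping the -ln(n!) constant
    return -lam + n * math.log(lam)

# Unrestricted model: fit lambda_1 and lambda_2 separately (MLE = observed count).
unrestricted = loglik(n1, n1) + loglik(n2, n2)

# Restricted model (H0): lambda_2 = 0.9 * lambda_1.
lam1 = (n1 + n2) / 1.9          # from setting the derivative of the loglikelihood to zero
restricted = loglik(lam1, n1) + loglik(0.9 * lam1, n2)

test_statistic = 2 * (unrestricted - restricted)
print(round(test_statistic, 2))  # about 23.14; compare to the chi-square with 1 degree of freedom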
Problems: 5.1 (1 point) For some data, a Poisson distribution and a Negative Binomial distribution have each been fit by maximum likelihood. The fitted Poisson has a loglikelihood of -725.3, and the fitted Negative Binomial has a loglikelihood of -722.9. Treating the Poisson distribution as the null hypothesis which of the following is true? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level. 5.2 (2 points) A Binomial, Poisson, and Negative Binomial Distribution have each been fit to the same data via the method of maximum likelihood. The negative loglikelihoods are as follows: Distribution Negative Loglikelihood Binomial 28,903.6 Poisson 28,905.2 Negative Binomial 28,903.1 Which of the following statements are true at a 5% significance level? 1. The Binomial is a significantly better fit to this data than the Poisson. 2. The Negative Binomial is a significantly better fit to this data than the Binomial. 3. The Negative Binomial is a significantly better fit to this data than the Poisson. A. None of 1,2,3 B. 1 C. 2 D. 3 E. None of A, B, C, or D 5.3 (1 point) A Poisson distribution and a Binomial distribution have each been fit by maximum likelihood to the same data. The fitted Poisson has a loglikelihood of -1052.7, and the fitted Binomial has a loglikelihood of -1049.1. Which of the following is true? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level.
5.4 (3 points) You have the following data from the state of West Carolina: Region Number of Claims Number of Exposures Claim Frequency Rural 5000 250,000 2.000% Urban 10,000 320,000 3.125% You assume that the distribution of numbers of claims is Poisson. Based on data from other states, you assume that the mean claim frequency for Urban insureds is 1.5 times that for Rural insureds. Let H0 be the hypothesis that the mean claim frequency in West Carolina for Urban is 1.5 times that for Rural. Using the likelihood ratio test, one tests the hypothesis H0 . Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. Do not reject H0 at 5%. 5.5 (3 points) To the following data on the number of claims from each of 1000 policyholders, a Poisson distribution and a Negative Binomial distribution have each been fit by maximum likelihood. Expected Number of Expected Number of Number Number of Policyholders based Policyholders based on the of Claims Policyholders on the Fitted Poisson Fitted Negative Binomial 0 886 878.095 885.981 1 100 114.152 100.065 2 12 7.420 12.207 3 2 0.322 1.526 4 or more 0 0.011 0.221 Using the likelihood ratio test, which of the following is true? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level.
5.6 (1 point) A Negative Binomial Distribution and a Geometric Distribution have each been fit by maximum likelihood to the same accident data. The fitted Negative Binomial has a loglikelihood of -2602.78, and the fitted Geometric Distribution has a loglikelihood of -2604.56. If one had had twice as much data, with the same proportion of insureds with a given number of accidents, which of the following would have been the conclusion of the likelihood ratio test? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level. 5.7 (3 points) You have the following data on automobile insurance theft claims: Color of Car Number of Claims Number of Cars Black 1200 30,000 Other 2250 75,000 You assume that the distribution of numbers of claims for each car is Geometric. Using the likelihood ratio test, one tests the hypothesis H0 that the mean claim frequency for black cars is 25% more than that for the other colored cars. Which of the following is true? A. Reject H0 at 1%. B. Reject H0 at 2.5%. Do not reject H0 at 1%. C. Reject H0 at 5%. Do not reject H0 at 2.5%. D. Reject H0 at 10%. Do not reject H0 at 5%. E. Do not reject H0 at 10%.
5.8 (3 points) To the following data on the number of claims from each of 400 policyholders, a Poisson distribution and a mixture of two Poisson distributions have each been fit by maximum likelihood. Expected Number of Expected Number of Number Number of Policyholders based Policyholders based on the of Claims Policyholders on the Fitted Poisson Fitted Mixture of Two Poissons 0 111 100.883 111.454 1 135 138.967 133.527 2 85 95.713 86.771 3 43 43.948 41.971 4 17 15.135 17.180 5 6 4.170 6.251 6 2 0.957 2.043 7 1 0.188 0.599 8 or more 0 0.039 0.204 Using the likelihood ratio test, which of the following is true? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level. 5.9 (3 points) You are given the following data from 1000 insurance policies: Number of Accidents Number of Policies 0 900 1 80 2 15 3 5 4+ 0 The null hypothesis is that the data was drawn from a Poisson Distribution with λ = 10%. The alternate hypothesis is that the data was drawn from the maximum likelihood Poisson. Using the likelihood ratio test, what is the conclusion? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level.
5.10 (6 points) You are given the following data from 1000 insurance policies: Number of Claims Number of Policies 0 657 1 233 2 79 3 27 4 3 5 1 H0 : The data was drawn from a Geometric Distribution. H1 : The data was drawn from a Negative Binomial Distribution. The maximum likelihood Negative Binomial has r = 1.55 and β = 0.315. What is the conclusion of the Likelihood Ratio Test? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level.
5.11 (4, 5/01, Q.20) (2.5 points) During a one-year period, the number of accidents per day was distributed as follows: Number of Accidents Days 0 209 1 111 2 33 3 7 4 3 5 2 For these data, the maximum likelihood estimate for the Poisson distribution is ^ λ = 0.60, and for the negative binomial distribution, it is r^ = 2.9 and β^ = 0.21. The Poisson has a negative loglikelihood value of 385.9, and the negative binomial has a negative loglikelihood value of 382.4. Determine the likelihood ratio test statistic, treating the Poisson distribution as the null hypothesis. (A) -1 (B) 1 (C) 3 (D) 5 (E) 7 5.12 (1 point) In the previous question, 4, 5/01, Q.20, which of the following is true? A. H0 is rejected at the 0.005 significance level. B. H0 is rejected at the 0.010 significance level, but not at the 0.005 level. C. H0 is rejected at the 0.025 significance level, but not at the 0.010 level. D. H0 is rejected at the 0.050 significance level, but not at the 0.025 level. E. H0 is not rejected at the 0.050 significance level. 5.13 (1 point) In, 4, 5/01, Q.20, if we had three times as much data with the same proportion of days with given numbers of accidents, what would be the likelihood ratio test statistic?
Solutions to Problems: 5.1. D. The likelihood ratio test statistic = twice the difference in the loglikelihoods = (2)(-722.9 - (-725.3)) = 4.8. Since the Negative Binomial has one more parameter, compare to the Chi-Square distribution for one degree of freedom. Since 3.841 < 4.8 < 5.024, reject at 5% and do not reject at 2.5%, the null hypothesis that the simpler Poisson Distribution should be used rather than the more complicated Negative Binomial. 5.2. D. Since there is a difference of one parameter between the Poisson and Binomial Distributions, we compare to the Chi-Square Distribution with one degree of freedom. Twice the difference of the loglikelihoods is: (2)(28,905.2 - 28,903.6) = 3.2. Degrees Significance Levels of Freedom 0.100 0.050 0.025 0.010 0.005 1 2.706 3.841 5.024 6.635 7.879 Since 2.706 < 3.2 < 3.841, the Binomial fit is significantly better than the Poisson at the 10% level, but it is not significantly better than the Poisson at the 5% level. Since there is a difference of one parameter between the Poisson and Negative Binomial Distributions, we compare to the Chi-Square Distribution with one degree of freedom. Twice the difference of the loglikelihoods is: (2)(28,905.2 - 28,903.1) = 4.2. Since 3.841 < 4.2 < 5.024, the Negative Binomial fit is significantly better than the Poisson at the 5% level, but not significantly better at the 2.5% level. One can not use the likelihood ratio test to compare the fits of the Negative Binomial and Binomial Distributions. The Negative Binomial is a somewhat better fit, but we donʼt know whether it is significantly better. 5.3. B. The likelihood ratio test statistic = twice the difference in the loglikelihoods = (2)(-1049.1 - (-1052.7)) = 7.2. Since the Binomial has one more parameter, compare to the ChiSquare distribution for one degree of freedom. Since 6.635 < 7.2 < 7.879, reject at 1% and do not reject at 1/2%, the null hypothesis that the simpler Poisson Distribution should be used rather than the more complicated Binomial. Comment: For the likelihood ratio test, the null hypothesis is always to use the simpler distribution.
5.4. C. For a Poisson Distribution, f(x) = e^-λ λ^x / x!. ln f(x) = -λ + x ln(λ) - ln(x!).
The loglikelihood is: Σ[-λ + xi ln(λ) - ln(xi!)] = -λE + N ln(λ) - Σln(xi!), where E = exposures and N = total number of claims.
Separate estimate of λ for the Rural Poisson Distribution: λ = 0.02, the same as the method of moments. The corresponding maximum loglikelihood is: -(0.02)(250,000) + 5000 ln(0.02) - ΣRural ln(xi!) = -24,560.12 - ΣRural ln(xi!).
Separate estimate of λ for the Urban Poisson Distribution: λ = 0.03125. The corresponding maximum loglikelihood is: -(0.03125)(320,000) + 10,000 ln(0.03125) - ΣUrban ln(xi!) = -44,657.36 - ΣUrban ln(xi!).
Restricted by H0, λU = 1.5λR, the loglikelihood for the combined sample is: -250,000λR + 5000 ln(λR) - 320,000(1.5λR) + 10,000 ln(1.5λR) - Σln(xi!). Setting the partial derivative with respect to λR equal to zero and solving: λR = (5000 + 10,000) / {250,000 + (320,000)(1.5)} = 0.020548. λU = (1.5)(0.020548) = 0.030822. The corresponding maximum loglikelihood is: -250,000(0.020548) + 5000 ln(0.020548) - 320,000(0.030822) + 10,000 ln(0.030822) - Σln(xi!) = -69,220.27 - Σln(xi!).
The unrestricted maximum loglikelihood is: -24,560.12 - 44,657.36 - Σln(xi!) = -69,217.48 - Σln(xi!).
Twice the difference in the loglikelihoods: (2){-69,217.48 - (-69,220.27)} = 5.58. The restriction is one dimensional, so compare to the Chi-Square with one degree of freedom. Alternately, the unrestricted model has two parameters, while the restricted model has one parameter; the difference in number of parameters is one, so compare to the Chi-Square with one degree of freedom. Since 5.024 < 5.58 < 6.635, we reject H0 at 2.5% and do not reject H0 at 1%.
5.5. B. For each fitted distribution the loglikelihood is: Σ ni ln f(xi) = 886 ln(E0 /1000) + 100 ln(E1 /1000) + 12 ln(E2 /1000) + 2 ln(E3 /1000) = 886 ln(E0 ) + 100 ln(E1 ) + 12 ln(E2 ) + 2 ln(E3 ) - 1000 ln(1000). Thus the difference of the loglikelihoods for the Negative Binomial and the Poisson is: 886ln(885.981/878.095) + 100ln(100.065/114.152) + 12ln(12.207/7.420) + 2ln(1.526/.322) = 3.836. The likelihood ratio test statistic is: (2)(3.836) = 7.672. Since there is a difference of one parameter, compare to the Chi-Square with one degree of freedom. 6.635 < 7.672 < 7.879. Reject H0 at 1%, but not at 0.5%. Comment: The loglikelihood for the fitted Negative Binomial, with r = 0.8619 and β = 0.1508, is: -403.292. The loglikelihood for the fitted Poisson, with λ = 0.130, is: -407.130. The null hypothesis is to use the simpler Poisson distribution; the alternative hypothesis is to use the more complicated Negative Binomial distribution. 5.6. B. Each of the loglikelihoods would be twice as much. The maximum likelihood fitted parameters would be the same. Thus now the maximum likelihood Negative Binomial has a loglikelihood of: (2)(-2602.78) = -5205.56. Now the maximum likelihood Geometric has a loglikelihood of: (2)(-2604.56) = -5209.12. The likelihood ratio test statistic = twice the difference in the loglikelihoods = (2){-5205.56 - (-5209.12)} = 7.12. The difference in number of parameters is: 2 - 1 = 1. Comparing to the Chi-Square Distribution with one degrees of freedom: 6.635 < 7.12 < 7.879. Thus we reject H0 at 1% and not at 0.5%.
5.7. D. For a Geometric Distribution, f(x) = βx / (1+β)x+1. ln f(x) = x ln(β) - (x+1) ln(1+β). Loglikelihood is: Σ xi ln(β) - (xi+1) ln(1+β) = N ln(β) - N ln(1+β) - E ln(1+β), where E = exposures, and N = total # of claims. Separate estimate of β for the Black Geometric Distribution, β = 1200/30,000 = 0.04, the same as the method of moments. The corresponding maximum loglikelihood is: 1200 ln(0.04) - 1200 ln(1.04) - 30,000 ln(1.04) = -5086.337. Separate estimate of β for the Other Geometric Distribution, β = 2250/75,000 = 0.03. The corresponding maximum loglikelihood is: 2250 ln(0.03) - 2250 ln(1.03) - 75,000 ln(1.03) = -10,173.173. Restricted by H0 , βB = 1.25βO, the loglikelihood for the combined sample is: 1200 ln(1.25βO) - 1200 ln(1 + 1.25βO) - 30,000 ln(1 + 1.25βO) + 2250 ln(βO) - 2250 ln(1 + βO) - 75,000 ln(1 + βO) = 1200 ln(1.25) + 3450 ln(βO) - 31,200 ln(1 + 1.25βO) - 77,250 ln(1 + βO). Setting the partial derivative with respect to βO equal to zero: 0 = 3450 / βO - 39,000 / (1 + 1.25βO) - 77,250 / (1 + βO).
⇒ 131,250 βO2 + 108,487.5 βO - 3450 = 0. ⇒ βO = 0.030664. The corresponding maximum loglikelihood is: 1200 ln(1.25) + 3450 ln(0.030664) - 31,200 ln[1 + (1.25)(0.030664)] - 77,250 ln(1.030664) = -15,261.073. The unrestricted maximum loglikelihood is: -5086.337 + (-10,173.173) = -15,259.510. Twice the difference in the loglikelihoods: (2){(-15,261.073) - (-15,259.510)} = 3.126. The restriction is one dimensional, so compare to the Chi-Square with one degree of freedom. Alternately, the unrestricted model has two parameters, while the restricted model has one parameter; the difference in number of parameters is one, so compare to the Chi-Square with one degree of freedom. Since 2.706 < 3.126 < 3.841, we reject H0 at 10% and do not reject H0 at 5%.
5.8. E. For each fitted distribution the loglikelihood is: Σ ni ln f(xi) = 111 ln(E0 /400) + 135 ln(E1 /400) + 85 ln(E2 /400) + 43 ln(E3 /400) + 17 ln(E4 /400) + 6 ln(E5 /400) + 2 ln(E6 /400) + 1 ln(E7 /400) = 111 ln(E0 ) + 135 ln(E1 ) + 85 ln(E2 ) + 43 ln(E3 ) + 17 ln(E4 ) + 6 ln(E5 ) + 2 ln(E6 ) + ln(E7 ) 400 ln(400). Thus the difference of the loglikelihoods for the mixture of two Poissons and the Poisson is: 111 ln(111.454/100.883) + 135 ln(133.527/138.967) + 85 ln(86.771/95.713) + 43 ln(41.971/43.948) + 17 ln(17.180/15.135) + 6 ln(6.251/4.170) + 2 ln(2.043/0.957) + ln(0.599/0.188) = 2.613. The likelihood ratio test statistic is: (2)(2.613) = 5.226. A mixture of Poissons has three parameters: λ1, λ2, and p the weight to the first Poisson. Since there is a difference of two parameters, (three versus one), we compare to the Chi-Square with two degrees of freedom. 4.605 < 5.226 < 5.991. ⇒ Reject H0 at 10%, but not at 5%. Comment: The null hypothesis is to use the simpler Poisson distribution; the alternative hypothesis is to use the more complicated mixture of two Poissons. The loglikelihood for the fitted mixture of two Poissons is -612.319. The loglikelihood for the fitted Poison is -614.932. The fitted Poisson has λ = sample mean = 1.3775. ^
5.9. C. X = {(900)(0) + (80)(1) + (15)(2) + (5)(3)} / 1000 = 12.5%. Thus λ = 0.125. We can think of the Poisson Distribution with λ = 10% as no fitted parameters, and thus a special case of the Poisson fit via maximum likelihood. The loglikelihood is: 900 ln[f(0)] + 80 ln[f(1)] + 15 ln[f(2)] + 5 ln[f(3)] = -900λ + (80){ln(λ) -λ} + (15){2ln(λ) - λ - ln(2)} + (5){3ln(λ) - λ - ln(6)}. Therefore, the difference between the maximum loglikelihood and that for λ = 0.10 is: (-900)(0.125 - 0.10) + (80){ln(1.25) - 0.025} + (15){2 ln(1.25) - 0.025} + (5){3 ln(1.25) - 0.025} = (1000)(-0.025) + 125 ln(1.25) = 2.893. Thus the Likelihood Ratio Test Statistic is: (2)(2.893) = 5.786. We are comparing zero and one fitted parameter, so we have one degree of freedom. Comparing to the Chi-Square Distribution with one degrees of freedom: 5.024 < 5.786 < 6.635. Thus we reject H0 at 2.5% and not at 1%.
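A short Python check of my own of solution 5.9: the Poisson loglikelihood at the maximum likelihood estimate versus at λ = 0.10. The ln(x!) terms cancel and are dropped.

import math

observed = {0: 900, 1: 80, 2: 15, 3: 5}
n = sum(observed.values())
total_claims = sum(k * count for k, count in observed.items())   # 125
lam_mle = total_claims / n                                       # 0.125

def loglik(lam):
    return total_claims * math.log(lam) - n * lam

statistic = 2 * (loglik(lam_mle) - loglik(0.10))
print(round(statistic, 3))   # about 5.786; 5.024 < 5.786 < 6.635, so reject at 2.5% and not at 1%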
5.10. D. X̄ = {(657)(0) + (233)(1) + (79)(2) + (27)(3) + (3)(4) + (1)(5)}/1000 = 0.489.
Thus the maximum likelihood Geometric Distribution has β = 0.489.
f(0) = 1/(1 + β) = 0.671592.
f(1) = f(0) β/(1 + β) = 0.220556.
f(2) = f(1) β/(1 + β) = 0.072433.
f(3) = f(2) β/(1 + β) = 0.023787.
f(4) = f(3) β/(1 + β) = 0.007812.
f(5) = f(4) β/(1 + β) = 0.002566.
The corresponding loglikelihood is: 657 ln(0.671592) + 233 ln(0.220556) + 79 ln(0.072433) + 27 ln(0.023787) + 3 ln(0.007812) + 1 ln(0.002566) = -942.605.
We are told that the maximum likelihood Negative Binomial has r = 1.55 and β = 0.315.
f(0) = 1/(1 + β)^r = 0.654132.
f(1) = rβ/(1 + β)^(r+1) = 0.242874.
f(2) = {r(r + 1)/2} β^2/(1 + β)^(r+2) = 0.074178.
f(3) = {r(r + 1)(r + 2)/6} β^3/(1 + β)^(r+3) = 0.0210266.
f(4) = {r(r + 1)(r + 2)(r + 3)/24} β^4/(1 + β)^(r+4) = 0.005729.
f(5) = {r(r + 1)(r + 2)(r + 3)(r + 4)/120} β^5/(1 + β)^(r+5) = 0.001523.
The corresponding loglikelihood is: 657 ln(0.654132) + 233 ln(0.242874) + 79 ln(0.074178) + 27 ln(0.0210266) + 3 ln(0.005729) + 1 ln(0.001523) = -940.354.
Thus the Likelihood Ratio Test Statistic is: (2){-940.354 - (-942.605)} = 4.502.
Comparing to the Chi-Square Distribution with one degree of freedom: 3.841 < 4.502 < 5.024. Thus we reject H0 at 5% and not at 2.5%.
Comment: Using a computer, the probability value is 3.4%.
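A numerical check of my own of solution 5.10, using the Negative Binomial density f(k) = Γ(r + k) / {Γ(r) k!} β^k / (1 + β)^(r+k); the Geometric is the r = 1 case.

import math

counts = {0: 657, 1: 233, 2: 79, 3: 27, 4: 3, 5: 1}

def nb_log_pmf(k, r, beta):
    return (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
            + k * math.log(beta) - (r + k) * math.log(1 + beta))

def loglik(r, beta):
    return sum(n * nb_log_pmf(k, r, beta) for k, n in counts.items())

beta_geo = sum(k * n for k, n in counts.items()) / sum(counts.values())   # 0.489
ll_geometric = loglik(1.0, beta_geo)            # about -942.605
ll_neg_binomial = loglik(1.55, 0.315)           # about -940.354, at the given fitted parameters
print(round(2 * (ll_neg_binomial - ll_geometric), 3))   # about 4.502; reject H0 at 5%, not at 2.5%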
5.11. E. The likelihood ratio test statistic = twice the difference in the loglikelihoods = (2)(385.9 - 382.4) = 7.0.
Comment: We would compute the loglikelihoods in each case as Σ ni ln f(xi):

Number of Claims   Number of Days   Poisson Density   Contribution to Loglikelihood   Neg. Binomial Density   Contribution to Loglikelihood
0                  209              0.54881           -125.40                         0.57534                 -115.53
1                  111              0.32929           -123.30                         0.28957                 -137.57
2                  33               0.09879           -76.39                          0.09800                 -76.65
3                  7                0.01976           -27.47                          0.02778                 -25.08
4                  3                0.00296           -17.46                          0.00711                 -14.84
5                  2                0.00036           -15.88                          0.00170                 -12.75
Sum                365              0.99996           -385.91                         0.99950                 -382.43
The maximum likelihood Poisson is the same as the method of moments: λ = {(209)(0) + (111)(1) + (33)(2) + (7)(3) + (3)(4) + (2)(5)}/365 = 0.6027. 5.12. B. Since the Negative Binomial has one more parameter, compare to the Chi-Square distribution for one degree of freedom. Since 6.635 < 7.0 < 7.879, reject at 1% and do not reject at 1/2%, the null hypothesis that the simpler Poisson Distribution should be used rather than the more complicated Negative Binomial Distribution. 5.13. Each of the loglikelihoods would be three times as much. The maximum likelihood fitted parameters would be the same. Thus now the maximum likelihood Poisson has a loglikelihood of: (3)(-385.9) = -1157.7. Now the maximum likelihood Negative Binomial has a loglikelihood of: (3)(-382.4) = -1147.2. The likelihood ratio test statistic = twice the difference in the loglikelihoods = (2){-1147.2 - (-1157.7)} = 21, 3 times what it was when we had the original data. Comment: Assuming the null hypothesis were true, in other words the data was drawn from a Poisson, then if we took a second sample of size equal to the first, we would not expect to get the same proportion of days with given numbers of accidents, as we did in the first sample. The power of a test is the probability of rejecting H0 when it is false. As the total sample size increases, the power of the test increases. For a relatively small sample, even if H0 is not true, there may not be enough statistical evidence to reject H0 . It is not unusual to get a small sample which is not too bad a match to H0 , even though the data was not drawn from the assumed distribution. For a very large sample, if H0 is not true, there is likely to be enough statistical evidence to reject H0 . When the data was not drawn from the assumed distribution, it is very unusual to get a large sample which is a good match to H0 . In practical applications, as one has more data, one can justify fitting a somewhat more complicated model (more parameters.) Also, as the amount of data increases, one can often rely on an empirical model in whole or in part.
Section 6, Fitting to the (a, b, 1) Class46
Members of the (a, b, 1) family include: all the members of the (a, b, 0) family, Zero-Truncated Binomial, Zero-Truncated Poisson, Extended Truncated Negative Binomial, the Logarithmic Distribution, and the corresponding zero-modified distributions.47
As with the members of the (a, b, 0) family, one can fit these distributions to data via Method of Moments or Maximum Likelihood.
Method of Moments, Zero-Truncated Distributions:
Assume we have the following data on the number of persons injured in bodily injury accidents:
Number of People Injured in the Accident:   1      2     3   4 & +
Number of Accidents:                        1256   100   6   0
The mean of this data is: X = (1256 + 200 + 18) / (1256 + 100 + 6) = 1.08223. The mean of a zero-truncated Poisson is: λ/{1 - f(0)} = λ/(1 - e−λ). Therefore, using the Method of Moments to fit a zero-truncated Poisson: λ / (1 - e−λ) = 1.08223. ⇒ 1.08223 - 1.08223e−λ - λ = 0. One can solve this equation numerically. The result is λ = 16.02%.48
46 See Section 15.6.4 in Loss Models.
47 The (a, b, 1) class is discussed in “Mahlerʼs Guide to Frequency Distributions.”
48 λ = 0 is also a root, but this makes no sense for a zero-truncated Poisson.
Here is a graph of the lefthand side of this equation as a function of lambda:
[Figure: plot of the lefthand side of this equation as a function of lambda, for lambda between 0.05 and 0.2; the curve crosses zero near lambda = 0.16.]
We can see that the lefthand side of this equation is zero at about λ = 16%.
Since the fitted lambda is small, we can use the approximation: e^-λ ≅ 1 - λ + λ^2/2 - λ^3/6. Therefore, (1 - e^-λ)/λ ≅ 1 - λ/2 + λ^2/6. Therefore, the equation for the method of moments becomes: 1 - λ/2 + λ^2/6 ≅ 1/1.08223. Solving this quadratic equation, λ ≅ 16.1%.49
Exercise: We observe 1936 ones, 449 twos, and 37 threes. Fit via Method of Moments to the above data a zero-truncated Binomial with m = 3.
[Solution: X̄ = {1936 + (2)(449) + (3)(37)}/(1936 + 449 + 37) = 2945/2422. The mean of the zero-truncated Binomial is: 3q/{1 - (1 - q)^3} = 1/(1 - q + q^2/3). Set the theoretical mean equal to the sample mean: 1 - q + q^2/3 = 2422/2945. ⇒ 2945q^2 - 8835q + 1569 = 0. ⇒ q = 18.96%.
Comment: For larger values of m, one would have to solve numerically.]
Exercise: We observe 1200 ones, 310 twos, 70 threes, 15 fours, and 5 fives. Fit via Method of Moments to the above data a zero-truncated Geometric.
[Solution: X̄ = {1200 + (2)(310) + (3)(70) + (4)(15) + (5)(5)}/(1200 + 310 + 70 + 15 + 5) = 2115/1600 = 1.322. The mean of the zero-truncated Geometric is: β/{1 - 1/(1 + β)} = 1 + β. Set the theoretical mean equal to the sample mean: β = 0.322.]
2.84 is also a root to the approximate equation, but is not a solution to the original equation.
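A small numerical sketch of my own of solving the zero-truncated Poisson method of moments equation above, λ/(1 - e^-λ) = 1.08223, by bisection (the zero-truncated mean is increasing in lambda, so bisection works):

import math

def mean_zero_truncated_poisson(lam):
    return lam / (1 - math.exp(-lam))

target = 1.08223
low, high = 1e-6, 2.0
for _ in range(60):
    mid = (low + high) / 2
    if mean_zero_truncated_poisson(mid) < target:
        low = mid
    else:
        high = mid
print(round(low, 4))           # about 0.1602, i.e. lambda = 16.02%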
A zero-truncated Negative Binomial has a mean of: rβ/{1 - 1/(1 + β)^r}. Therefore, for r fixed, X̄ = rβ/{1 - 1/(1 + β)^r}. One could solve numerically for β. Similarly, one could fit a zero-truncated Negative Binomial with both r and β varying by matching the first and second moments, and then solving numerically.
Method of Moments, Logarithmic Distribution:
Exercise: We observe 889 ones, 97 twos, 9 threes, 3 fours, and 2 fives. Fit via Method of Moments to the above data a Logarithmic Distribution.
[Solution: X̄ = {889 + (2)(97) + (3)(9) + (4)(3) + (5)(2)}/(889 + 97 + 9 + 3 + 2) = 1132/1000 = 1.132. The mean of the Logarithmic Distribution is: β / ln[1 + β]. Set the theoretical mean equal to the sample mean: β = 1.132 ln[1 + β]. We can either solve numerically, or use the approximation that: ln[1 + β] ≅ β - β^2/2 + β^3/3. Thus β ≅ 1.132β - 1.132β^2/2 + 1.132β^3/3. ⇒ β ≅ 0.29.
Comment: Solving numerically, β = 0.2751.]
Maximum Likelihood, Zero-Truncated Distributions:
When applied to individual ungrouped data, for the zero-truncated Poisson, zero-truncated Binomial with m fixed, and zero-truncated Negative Binomial with r fixed, Maximum Likelihood is equivalent to Method of Moments.
Exercise: Verify that for a zero-truncated Poisson, fitting to individual ungrouped data is the same for Maximum Likelihood and the Method of Moments.
[Solution: f(x) = λ^x e^-λ/x!. ln f(x) = x ln(λ) - λ + constants. The zero-truncated density is: h(x) = f(x)/(1 - e^-λ). ln h(x) = x ln(λ) - λ - ln[1 - e^-λ] + constants.
∂ ln h(x)/∂λ = x/λ - 1 - e^-λ/(1 - e^-λ). Set the partial derivative of the loglikelihood with respect to λ equal to zero: 0 = Σxi/λ - n - n e^-λ/(1 - e^-λ). ⇒ X̄/λ = 1 + e^-λ/(1 - e^-λ) = 1/(1 - e^-λ). ⇒ X̄ = λ/(1 - e^-λ).
The righthand side of the equation is the mean of the zero-truncated Poisson. Thus Maximum Likelihood is the same as the Method of Moments.
Comment: See equation 15.19 in Loss Models, with p0^M = 0.]
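Returning to the Logarithmic Distribution exercise above, here is a small sketch of my own that solves β/ln(1 + β) = 1.132 numerically (the mean is increasing in β, so bisection works):

import math

def mean_logarithmic(beta):
    return beta / math.log(1 + beta)

target = 1.132
low, high = 1e-6, 5.0
for _ in range(60):
    mid = (low + high) / 2
    if mean_logarithmic(mid) < target:
        low = mid
    else:
        high = mid
print(round(low, 4))    # about 0.2751, versus 0.29 from the series approximation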
Method of Maximum Likelihood, Logarithmic Distribution:
f(x) = β^x/{x (1 + β)^x ln(1 + β)}. ln f(x) = x ln(β) - ln(x) - x ln(1 + β) - ln[ln(1 + β)].
Then the loglikelihood is: Σxi ln(β) - Σln(xi) - Σxi ln(1 + β) - n ln[ln(1 + β)].
Set the partial derivative of the loglikelihood with respect to β equal to zero:
0 = Σxi/β - Σxi/(1 + β) - n/{(1 + β) ln(1 + β)}. ⇒ Σxi/{β(1 + β)} = n/{(1 + β) ln(1 + β)}. ⇒ X̄ = β/ln(1 + β).
Since the righthand side is the mean of the Logarithmic Distribution, this is the same as the method of moments. When applied to individual ungrouped data, for the Logarithmic Distribution, Maximum Likelihood is equivalent to Method of Moments.
Method of Maximum Likelihood, Zero-Modified Distributions: Exercise: Fit a Zero-Modified Poisson to the following data using maximum likelihood. Number of Claims: 0 1 2 3 4&+ Number of Policies: 18638 1256 100 6 0
[Solution: f(x) = λ^x e^-λ/x!. ln f(x) = x ln(λ) - λ + constants.
The zero-modified density is: h(x) = f(x)(1 - p0^M)/(1 - e^-λ), for x > 0.
ln h(x) = x ln(λ) - λ + ln[1 - p0^M] - ln[1 - e^-λ] + constants. ln h(0) = ln(p0^M).
Then the loglikelihood is: 18638 ln(p0^M) + ln(λ)Σxi - 1362λ + 1362 ln[1 - p0^M] - 1362 ln[1 - e^-λ] + constants.
Set the partial derivative of the loglikelihood with respect to p0^M equal to zero:
0 = 18638/p0^M - 1362/(1 - p0^M). ⇒ p0^M = 18638/(18638 + 1362) = 18638/20,000 = 93.19%.
Set the partial derivative of the loglikelihood with respect to λ equal to zero:
0 = Σxi/λ - 1362 - 1362 e^-λ/(1 - e^-λ). ⇒ Σxi/λ = 1362/(1 - e^-λ). ⇒ Σxi/1362 = λ/(1 - e^-λ).
This is the same equation as for fitting via maximum likelihood a zero-truncated Poisson to the data x ≥ 1. (1256 + 200 + 18)/1362 = 1.0822 = λ/(1 - e^-λ). ⇒ λ = 16.01%.]
Thus in this example, we would assign to zero the observed probability of zeros, and fit lambda as one would fit a zero-truncated Poisson to the observations other than zeros. The latter is the same as the method of moments.
In the above example, the mean of the zero-modified distribution is: λ(1 - p0^M)/(1 - e^-λ). If we substitute the fitted value p0^M = 18,638/20,000, then the mean of the zero-modified distribution is: λ(1362/20,000)/(1 - e^-λ). Thus the equation for the fitted lambda can be rewritten as: (1362/20,000)Σxi/1362 = λ(1362/20,000)/(1 - e^-λ). ⇒ X̄ = λ(1 - p0^M)/(1 - e^-λ).
In other words, we match the observed mean to the mean of the zero-modified distribution.
When applied to individual ungrouped data, for the zero-modified Poisson, zero-modified Binomial with m fixed, and zero-modified Negative Binomial with r fixed, Maximum Likelihood is equivalent to assigning to zero the observed proportion of zeros and matching the mean of the zero-modified distribution to the sample mean. Set p0^M = the proportion of zeros in the data.
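A sketch of my own of the maximum likelihood fit of the zero-modified Poisson described in the exercise above: p0^M is the observed proportion of zeros, and lambda solves the zero-truncated method of moments equation for the policies with one or more claims.

import math

counts = {0: 18638, 1: 1256, 2: 100, 3: 6}
n = sum(counts.values())                              # 20,000 policies
n_positive = n - counts[0]                            # 1362 policies with at least one claim
p0_modified = counts[0] / n                           # 0.9319

target = sum(k * c for k, c in counts.items()) / n_positive   # 1.0822
low, high = 1e-6, 2.0
for _ in range(60):                                   # bisection on lambda / (1 - exp(-lambda)) = target
    mid = (low + high) / 2
    if mid / (1 - math.exp(-mid)) < target:
        low = mid
    else:
        high = mid
lam = low                                             # about 0.1601

print(round(p0_modified, 4), round(lam, 4))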
Variance of Maximum Likelihood Estimates:
As discussed previously, the approximate variance of the estimate of a single parameter using the method of maximum likelihood is given by negative the inverse of the product of the number of points times the expected value of the second partial derivative of the log likelihood:
Variance of θ̂ ≅ -1/{n E[∂² ln f(x)/∂θ²]}.
For the zero-truncated Poisson we had: ln f(x) = x ln(λ) - λ - ln[1 - e^-λ] + constants.
∂ ln f(x)/∂λ = x/λ - 1 - e^-λ/(1 - e^-λ) = x/λ - 1 - 1/(e^λ - 1).
∂² ln f(x)/∂λ² = -x/λ² + e^λ/(e^λ - 1)².
E[∂² ln f(x)/∂λ²] = -E[x]/λ² + e^λ/(e^λ - 1)² = -1/{λ(1 - e^-λ)} + e^-λ/(1 - e^-λ)² = (λe^-λ - 1 + e^-λ)/{λ(1 - e^-λ)²}.
Variance of λ̂ ≅ -1/{n E[∂² ln f(x)/∂λ²]} = λ(1 - e^-λ)²/{n(1 - λe^-λ - e^-λ)}.
For example, the following data was fit previously to a zero-truncated Poisson:
Number of People Injured in the Accident:   1      2     3   4 & +
Number of Accidents:                        1256   100   6   0
λ̂ = 16%.
Variance of λ̂ = λ(1 - e^-λ)²/{n(1 - λe^-λ - e^-λ)} = (0.16)(1 - e^-0.16)²/{(1362)(1 - 0.16e^-0.16 - e^-0.16)} = 0.000223.
Standard Deviation of λ̂ = 1.5%.
For the zero-modified Poisson we had: ln f(x) = x ln(λ) - λ + ln[1 - p0^M] - ln[1 - e^-λ] + constants, x > 0.
ln f(0) = ln(p0^M). When we observe n0 values of zero out of a sample of size n, the loglikelihood is:
n0 ln(p0^M) + ln(λ)Σxi - (n - n0)λ + (n - n0) ln[1 - p0^M] - (n - n0) ln[1 - e^-λ] + constants.
∂ loglikelihood/∂λ = Σxi/λ - (n - n0) - (n - n0)e^-λ/(1 - e^-λ) = Σxi/λ - (n - n0) - (n - n0)/(e^λ - 1).
∂² loglikelihood/∂λ² = -Σxi/λ² + (n - n0)e^λ/(e^λ - 1)².
The mean of the zero-modified Poisson is: (1 - p0^M)λ/(1 - e^-λ). Therefore,
E[∂² loglikelihood/∂λ²] = -n(1 - p0^M)/{λ(1 - e^-λ)} + (n - n0)e^-λ/(1 - e^-λ)².
Substituting for p0^M its estimator n0/n:
E[∂² loglikelihood/∂λ²] = (n - n0)(λe^-λ + e^-λ - 1)/{λ(1 - e^-λ)²}.
Variance of λ̂ ≅ -1/E[∂² loglikelihood/∂λ²] = λ(1 - e^-λ)²/{(n - n0)(1 - λe^-λ - e^-λ)}.50
∂ loglikelihood/∂p0^M = n0/p0^M - (n - n0)/(1 - p0^M).
∂² loglikelihood/∂p0^M ∂λ = 0. Therefore, Cov[λ̂, p̂0^M] = 0.51
∂² loglikelihood/∂(p0^M)² = -n0/(p0^M)² - (n - n0)/(1 - p0^M)².
Substituting for p0^M its estimator n0/n:
∂² loglikelihood/∂(p0^M)² = -n²/n0 - (n - n0)/(1 - n0/n)² = -n²/n0 - n²/(n - n0) = -n³/{n0(n - n0)}.
Variance of p̂0^M ≅ {n0(n - n0)}/n³ = {(n0/n)(1 - n0/n)}/n = p̂0^M(1 - p̂0^M)/n.
This is the same result one would get by noting that each observation has a chance of p0^M of being zero and a chance of 1 - p0^M of not being zero. Thus we have a Bernoulli for a single draw, and the variance of the average goes down as 1/n.
50 By making use of the loglikelihood rather than the log density, we have used the “observed information” as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
51 As discussed in “Mahlerʼs Guide to Fitting Loss Distributions,” the information matrix would have zeros on the off-diagonal, and thus so would its inverse, the covariance matrix.
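A numeric check of my own of the variance formulas above, using the bodily injury data (n = 1362 accidents with at least one person injured, fitted lambda of about 0.16) and the zero-modified Poisson exercise (20,000 policies, 18,638 of them with zero claims):

import math

n, lam = 1362, 0.16
var_lambda = lam * (1 - math.exp(-lam)) ** 2 / (n * (1 - lam * math.exp(-lam) - math.exp(-lam)))
print(round(var_lambda, 6), round(math.sqrt(var_lambda), 3))   # about 0.000223 and 0.015

# For the zero-modified Poisson, the estimate of p0^M is a sample proportion,
# so its variance is the Bernoulli variance divided by the sample size.
n_total, n_zero = 20_000, 18_638
p0 = n_zero / n_total
print(p0 * (1 - p0) / n_total)   # about 3.2e-06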
Problems: 6.1. (3 points) The following data is for the number of people riding in each car crossing the Washington Bridge: Number of People 1 2 3 4 5 6 Number of Cars 200 300 150 60 30 10 Fit a Zero-Truncated Binomial Distribution with m = 6. What is the fitted q? A. 32% B. 35% C. 38% D. 41% E. 44% 6.2 (4 points) The average number of claims is 0.2. The proportion of insureds with zero claims is 85%. Fit via Maximum Likelihood a zero-modified Poisson Distribution to this data. What is the fitted density at 2? A. less than 1.5% B. at least 1.5% but less than 2.0% C. at least 2.0% but less than 2.5% D. at least 2.5% but less than 3.0% E. at least 3.0% 6.3 (2 points) The following data is for the number of pages for the resumes submitted with each of 138 job applications to the 19th Century Insurance Company: Number of Pages 1 2 3 4 5 6 7 8 Number of Resumes 60 35 19 11 6 4 2 1 Fit a Zero-Truncated Geometric Distribution. What is the fitted β? A. 1.2
B. 1.3
C. 1.4
D. 1.5
E. 1.6
6.4 (3 points) What is the standard deviation of the estimate in the previous question?
A. 0.10 B. 0.12 C. 0.14 D. 0.16 E. 0.18

6.5 (3 points) The following data is for the number of people in each of 500 households:
Number of People: 1 2 3 4 5 6 7 8 9 10
Number of Households: 50 90 110 100 70 45 20 10 4 1
Fit a Zero-Truncated Poisson Distribution. What is the fitted λ?
A. 3.3 B. 3.4 C. 3.5 D. 3.6 E. 3.7

6.6 (4 points) What is the standard deviation of the estimate in the previous question?
A. 0.06 B. 0.08 C. 0.10 D. 0.12 E. 0.14
6.7. (3 points) Actuaries buy on average 0.2 computers per year. The proportion of actuaries who do not buy a computer during a given year is 85%. Fit via Maximum Likelihood a zero-modified Geometric Distribution to this data. Les N. DeRisk is an actuary. Using the fitted distribution, what is the probability that Les buys two computers next year?
A. 2.2% B. 2.4% C. 2.6% D. 2.8% E. 3.0%

6.8 (4 points) Verify that for a zero-truncated Binomial with m fixed, fitting to individual ungrouped data is the same for Maximum Likelihood and the Method of Moments.

6.9 (3 points) Fit the following data for the number of persons injured in bodily injury accidents via maximum likelihood to a zero-truncated Negative Binomial Distribution with r = 2.
Number of People: 1 2 3 4
Number of Accidents: 825 150 20 5
What is the fitted β?
A. 0.13 B. 0.15 C. 0.17 D. 0.19 E. 0.21

6.10 (3 points) Fit the following data via maximum likelihood to a zero-modified Poisson.
Number of claims: 0 1 2 3 4 5
Count: 1706 351 408 268 74 5
What is the density at three for the fitted distribution?
A. 6% B. 7% C. 8% D. 9% E. 10%

6.11 (3 points) The following data is for the number of automobiles insured on private passenger automobile policies:
Number of Automobiles: 1 2 3 4
Number of Policies: 100 75 20 5
Fit the above data to a Logarithmic Distribution. What is the fitted β?
A. less than 1.2
B. at least 1.2 but less than 1.3
C. at least 1.3 but less than 1.4
D. at least 1.4 but less than 1.5
E. at least 1.5

6.12 (4 points) Verify that for a zero-truncated Negative Binomial with r fixed, fitting to individual ungrouped data is the same for Maximum Likelihood and the Method of Moments.
6.13 (3 points) The following data is for the number of strokes needed on the windmill hole of the Gulliver Miniature Golf Course for 1000 golfers:
Number of Strokes: 1 2 3 4 5 6
Number of Golfers: 100 200 300 250 100 50
You model the number of strokes as a Zero-Truncated Geometric Distribution. You fit β via maximum likelihood. What is the coefficient of variation of this estimate?
A. 0.01 B. 0.02 C. 0.04 D. 0.08 E. 0.16
Solutions to Problems:

6.1. B. Mean is: {(200)(1) + (300)(2) + (150)(3) + (60)(4) + (30)(5) + (10)(6)}/750 = 2.267.
Set this equal to the mean of the zero-truncated Binomial: 2.267 = 6q/{1 - (1 - q)^6}.
⇒ 1 - (1 - q)^6 = 2.647q. Let x = 1 - q. ⇒ x^6 + 2.647(1 - x) = 1. ⇒ x^6 - 2.647x + 1.647 = 0.
Solving numerically, x = 0.651 and q = 0.349.

6.2. E. For the zero-modified Poisson, Maximum Likelihood is equivalent to assigning to zero the observed proportion of zeros and matching the mean of the zero-modified distribution to the sample mean. p0^M = 0.85.
0.2 = mean of the zero-modified Poisson = (1 - 0.85)λ/(1 - e^-λ) = (0.15)λ/(1 - e^-λ). ⇒ 1 - 0.75λ - e^-λ = 0.
Using e^-λ = 1 - λ + λ²/2 - λ³/6 ...: 0 ≅ 0.25λ - λ²/2 + λ³/6. ⇒ 0 ≅ 1.5 - 3λ + λ². ⇒ λ ≅ (3 - √3)/2 = 0.634.
f(2) ≅ {e^-λ λ²/2}(1 - p0^M)/{1 - e^-λ} = (e^-0.634 0.634²/2)(0.15)/(1 - e^-0.634) = 3.4%.
Comment: Solving numerically, λ = 0.606. f(2) = (e^-0.606 0.606²/2)(0.15)/(1 - e^-0.606) = 3.3%.

6.3. A. The mean of a Zero-Truncated Geometric Distribution is: β/{1 - 1/(1+β)} = 1 + β.
The mean of the data is: 307/138 = 2.2246. Set this theoretical mean equal to the observed mean: 1 + β = 2.2246. ⇒ β = 1.2246.
Comment: The Method of Maximum Likelihood gives the same answer as the method of moments.

6.4. C. The second moment of a Zero-Truncated Geometric Distribution is: {β(1 + β) + β²}/{1 - 1/(1+β)} = (1 + β)² + β(1+β).
Therefore, the variance of a Zero-Truncated Geometric Distribution is: (1 + β)² + β(1+β) - (1 + β)² = β(1+β).
β̂ = X̄ - 1. ⇒ Var[β̂] = Var[X̄] = Var[X]/N = β(1+β)/N = (1.2246)(2.2246)/138 = 0.01974.
StdDev[β̂] = √0.01974 = 0.1405.
Comment: Due to the memoryless property of the Geometric Distribution, the zero-truncated version has the same variance as the regular version.
6.5. D. The mean of a Zero-Truncated Poisson Distribution is: λ/(1 - e^-λ).
The mean of the data is: 1846/500 = 3.692. Set this theoretical mean equal to the observed mean: λ/(1 - e^-λ) = 3.692.
Try values of lambda. For example, 3.7/(1 - e^-3.7) = 3.794. 3.6/(1 - e^-3.6) = 3.701. 3.5/(1 - e^-3.5) = 3.609. Thus λ = 3.6.
Comment: The Method of Maximum Likelihood gives the same answer as the method of moments. More exactly, λ = 3.590.

6.6. B. The zero-truncated density is: h(x) = f(x)/(1 - e^-λ). ln h(x) = x ln(λ) - λ - ln[1 - e^-λ] + constants.
∂ ln h(x)/∂λ = x/λ - 1 - e^-λ/(1 - e^-λ) = x/λ - 1 - 1/(e^λ - 1).
∂² ln h(x)/∂λ² = -x/λ² + e^λ/(e^λ - 1)² = -x/λ² + e^-λ/(1 - e^-λ)².
E[∂² ln h(x)/∂λ²] = -{λ/(1 - e^-λ)}/λ² + e^-λ/(1 - e^-λ)² = -1/{λ(1 - e^-λ)} + e^-λ/(1 - e^-λ)²
= -1/{(3.6)(1 - e^-3.6)} + e^-3.6/(1 - e^-3.6)² = -0.2856 + 0.0289 = -0.2567.
Var[λ̂] = -1/{n E[∂² ln h(x)/∂λ²]} = -1/{(500)(-0.2567)} = 0.00779.
StdDev[λ̂] = √0.00779 = 0.0883.

6.7. D. For the zero-modified Geometric, Maximum Likelihood is equivalent to assigning to zero the observed proportion of zeros and matching the mean of the zero-modified distribution to the sample mean. p0^M = 0.85.
0.2 = mean of the zero-modified Geometric = (1 - p0^M)(mean of zero-truncated Geometric) = (1 - p0^M)(1 + β) = (0.15)(1 + β).
⇒ β = 0.2/0.15 - 1 = 1/3.
f(2) = p2^M = (1 - p0^M) p2^T = (1 - p0^M){β^(2-1)/(1+β)²} = (0.15){(1/3)/(4/3)²} = 2.8%.
Comment: Mike Swaim's Les N. DeRisk actuarial cartoon appears in the "Actuarial Digest."
6.8. f(x) = q^x (1-q)^(m-x) m!/{x!(m-x)!}. ln f(x) = x ln(q) + (m-x) ln(1-q) + constants.
The zero-truncated density is: h(x) = f(x)/{1 - (1-q)^m}.
ln h(x) = x ln(q) + (m-x) ln(1-q) - ln[1 - (1-q)^m] + constants.
∂ ln h(x)/∂q = x/q - (m-x)/(1-q) - m(1-q)^(m-1)/{1 - (1-q)^m}.
Set the partial derivative of the loglikelihood with respect to q equal to zero:
0 = Σxi/q - Σ(m - xi)/(1-q) - Σ m(1-q)^(m-1)/{1 - (1-q)^m}.
0 = nX̄/q - mn/(1-q) + nX̄/(1-q) - mn(1-q)^(m-1)/{1 - (1-q)^m}.
⇒ X̄/q + X̄/(1-q) = m/(1-q) + m(1-q)^(m-1)/{1 - (1-q)^m}.
⇒ X̄/{q(1-q)} = {m/(1-q)}{1 + (1-q)^m/[1 - (1-q)^m]} = {m/(1-q)}/{1 - (1-q)^m}.
⇒ X̄ = mq/{1 - (1-q)^m}.
The righthand side of the equation is the mean of the zero-truncated Binomial. Thus Maximum Likelihood is the same as the Method of Moments.
Comment: See equation 15.20 in Loss Models, with p0^M = 0.

6.9. A. The mean of a zero-truncated Negative Binomial with r = 2 is: 2β/{1 - 1/(1+β)²}.
The observed mean is: 1205/1000 = 1.205. Set the theoretical and observed means equal:
1.205 = 2β/{1 - 1/(1+β)²}. ⇒ 1.205 - 1.205/(1+β)² = 2β. ⇒ 1.205(1+β)² - 1.205 = 2β(1+β)².
⇒ 2.41β + 1.205β² = 2β + 4β² + 2β³. ⇒ 2β² + 2.795β - 0.41 = 0.
β = {-2.795 + √[2.795² - (4)(2)(-0.41)]}/4 = 0.1339.
Comment: The Method of Maximum Likelihood gives the same answer as the method of moments.
6.10. B. Let p0^M = 1706/2812 = 0.6067. Then fit via Method of Moments a zero-truncated distribution to the non-zero observations.
λ/(1 - e^-λ) = {(351)(1) + (408)(2) + (268)(3) + (74)(4) + (5)(5)}/(351 + 408 + 268 + 74 + 5) = 2292/1106 = 2.072.
Trying values: for λ = 2, λ/(1 - e^-λ) = 2.313. For λ = 1.7, λ/(1 - e^-λ) = 2.080. For λ = 1.69, λ/(1 - e^-λ) = 2.072.
The fitted zero-modified Poisson has: p0^M = 0.6067 and λ = 1.69.
The density at three is: (1 - 0.6067)(1.69³ e^-1.69/6)/(1 - e^-1.69) = 7.16%.
Comment: We want λ/(1 - e^-λ) = 2.072. The denominator is less than one, so the function is greater than λ. Depending on how good your first guess is, it may take you a little longer. I tried λ = 2, since ignoring the denominator, that would be approximately okay. One could instead for example start with λ = 1. In my case, my first guess of λ = 2 resulted in a value that was too big by about 0.3, so I reduced lambda by 0.3 and tried again. The guessing part at the end of the solution is not a key skill for your exam.
The mean of the fitted distribution is: (1 - 0.6067)(1.69)/(1 - e^-1.69) = 0.815. This matches the observed mean of: 2292/2812 = 0.815.
Here is a comparison of the fitted distribution to the data:
Number of claims: 0 1 2 3 4 5 6+
Observed Count: 1706 351 408 268 74 5 0
Fitted Count: 1706 423 357 201 85 29 11

6.11. E. X̄ = 330/200 = 1.65. The mean of the Logarithmic Distribution is: β/ln[1 + β].
Set the theoretical mean equal to the sample mean: β/ln[1 + β] = 1.65.
Try values of β: 1.6/ln[2.6] = 1.674. 1.55/ln[2.55] = 1.656. 1.535/ln[2.535] = 1.650. β = 1.535.
Comment: Maximum Likelihood equals method of moments.
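The trial-and-error search in solution 6.10 can also be automated. Here is a minimal sketch of my own (using the target value 2.072 from the solution above) that solves λ/(1 - e^-λ) = 2.072 by bisection:

```python
import math

def zt_mean(lam):
    # Mean of a zero-truncated Poisson with parameter lam; increasing in lam.
    return lam / (1 - math.exp(-lam))

target = 2.072  # observed mean of the non-zero counts, from solution 6.10
lo, hi = 1e-6, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if zt_mean(mid) < target:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)  # roughly 1.69
```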
6.12. f(x) = {β^x/(1+β)^(r+x)} r(r+1)...(r+x-1)/x!. ln f(x) = x ln(β) - (r+x) ln(1+β) + constants.
The zero-truncated density is: h(x) = f(x)/{1 - (1+β)^-r}.
ln h(x) = x ln(β) - (r+x) ln(1+β) - ln[1 - (1+β)^-r] + constants.
∂ ln h(x)/∂β = x/β - (r+x)/(1+β) - r(1+β)^-(r+1)/{1 - (1+β)^-r}.
Set the partial derivative of the loglikelihood with respect to β equal to zero:
0 = Σxi/β - Σ(r + xi)/(1+β) - Σ r(1+β)^-(r+1)/{1 - (1+β)^-r}.
0 = nX̄/β - nr/(1+β) - nX̄/(1+β) - nr(1+β)^-(r+1)/{1 - (1+β)^-r}.
⇒ X̄/β - X̄/(1+β) = r/(1+β) + r(1+β)^-(r+1)/{1 - (1+β)^-r}.
⇒ X̄/{β(1+β)} = {r/(1+β)}{1 + (1+β)^-r/[1 - (1+β)^-r]} = {r/(1+β)}/{1 - (1+β)^-r}.
⇒ X̄ = rβ/{1 - (1+β)^-r}.
The righthand side of the equation is the mean of the zero-truncated Negative Binomial. Thus Maximum Likelihood is the same as the Method of Moments.

6.13. C. The Method of Maximum Likelihood gives the same answer as the method of moments.
The mean of a Zero-Truncated Geometric Distribution is: β/{1 - 1/(1+β)} = 1 + β.
The mean of the data is: 3200/1000 = 3.2. Set this theoretical mean equal to the observed mean: 1 + β = 3.2. ⇒ β = 2.2.
β̂ = X̄ - 1. Var[β̂] = Var[X̄] = Var[X]/1000 = β(1+β)/1000 = (2.2)(3.2)/1000 = 0.00704.
StdDev[β̂] = √0.00704 = 0.0839. The coefficient of variation of β̂ is: 0.0839/2.2 = 0.038.
Alternately, the zero-truncated density is: h(x) = f(x)/{1 - 1/(1+β)} = f(x)(1+β)/β = β^(x-1)/(1+β)^x.
ln h(x) = (x-1) ln(β) - x ln(1+β).
∂ ln h(x)/∂β = (x-1)/β - x/(1+β).
∂² ln h(x)/∂β² = -(x-1)/β² + x/(1+β)².
E[∂² ln h(x)/∂β²] = -(E[x] - 1)/β² + E[x]/(1+β)² = -β/β² + (1+β)/(1+β)² = -1/β + 1/(1+β) = -1/2.2 + 1/3.2 = -0.14205.
Var[β̂] = -1/{n E[∂² ln h(x)/∂β²]} = -1/{(1000)(-0.14205)} = 0.00704. Proceed as before.
Comment: The density, mean, and variance of the Zero-Truncated Geometric Distribution are shown in Appendix B attached to the exam.
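As a numerical cross-check of solution 6.13 (my own addition, not part of the original solution), the two routes above can be redone in a few lines of code:

```python
import math

beta = 2.2   # fitted beta from the method of moments / maximum likelihood
n = 1000     # number of golfers

# Route 1: beta-hat = Xbar - 1, so Var[beta-hat] = Var[X]/n = beta(1+beta)/n.
var_route1 = beta * (1 + beta) / n

# Route 2: -1 / (n * E[second derivative of ln h(x)]), with E[X] = 1 + beta.
expected_second_derivative = -1 / beta + 1 / (1 + beta)
var_route2 = -1 / (n * expected_second_derivative)

cv = math.sqrt(var_route1) / beta
print(var_route1, var_route2, cv)  # 0.00704, 0.00704, about 0.038
```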
Section 7, Important Formulas and Ideas

Here are what I believe are the most important formulas and ideas from this study guide to know for the exam.

Method of Moments (Section 2):
If one has a single parameter, then one matches the observed mean to the theoretical mean of the distribution. In the case of two parameters, one matches the first two moments, or equivalently one matches the mean and variance.
In order to estimate the variance of a single parameter fit by the Method of Moments: write the estimated parameter as a function of the observed mean, X̄, and use the fact that Var(X̄) = Var(X)/n.
Method of Maximum Likelihood (Section 3):
For ungrouped data: Likelihood = Π f(xi). Loglikelihood = Σ ln f(xi).
Find the set of parameters such that the likelihood or the loglikelihood is maximized.
For the Poisson, the Binomial with m fixed, or the Negative Binomial with r fixed, the method of maximum likelihood is equal to the method of moments.
(Fisherʼs) Information = -n E[∂² ln f(x)/∂θ²].
For a single parameter, Var[θ̂] = -1/{n E[∂² ln f(x)/∂θ²]} = 1/(the information).
The variance of the estimate of a function of a single parameter θ, h(θ), is:
Var[h(θ)] ≅ (∂h/∂θ)² Var[θ̂].
For grouped data: Likelihood = Π {F(bi) - F(ai)}^ni. Loglikelihood = Σ ni ln(F(bi) - F(ai)).
When applied to years of data, the Method of Maximum Likelihood applied to the Poisson produces the same result as the Method of Moments.
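A minimal sketch of these formulas, of my own devising: it uses the Exponential Distribution as the single-parameter example, the θ = 15,636 Exponential fit mentioned later in this guide, and an assumed sample size and limit.

```python
import math

# Exponential: ln f(x) = -ln(theta) - x/theta, so
# E[d^2 ln f / d theta^2] = 1/theta^2 - 2 E[X]/theta^3 = -1/theta^2, since E[X] = theta.
theta = 15636.0   # fitted theta (the Exponential fit used elsewhere in this guide)
n = 10000         # assumed sample size

information = n / theta**2          # -n E[d^2 ln f / d theta^2]
var_theta = 1 / information         # = theta^2 / n

# Variance of a function of theta, here h(theta) = S(25000) = exp(-25000/theta):
x0 = 25000.0
dh_dtheta = (x0 / theta**2) * math.exp(-x0 / theta)
var_h = dh_dtheta**2 * var_theta

print(var_theta**0.5, math.exp(-x0 / theta), var_h**0.5)
```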
Chi-Square Test (Section 4):
The Chi-Square Statistic is computed as a sum of terms; for each interval one computes:
(observed number - expected number)²/expected number.
A small Chi-Square Statistic indicates a good match between the data and the distribution.
To compute the number of Degrees of Freedom:
1. Determine the groups to use in computing the Chi-Square statistic. Unless the exam question has specifically told you which groups to use, use the groups for the data given in the question.
2. Determine whether any parameters have been fit to this data, and if so how many.
3. Degrees of freedom = (# intervals from step 1) - 1 - (# of fitted parameters, if any, from step 2).
The degrees of freedom gives the proper row; find the columns that bracket the Chi-Square Statistic and then reject to the left and do not reject (accept) to the right.
The p-value is the value of the Survival Function of the Chi-Square Distribution. A large p-value indicates a good fit.
If applying the Chi-Square Goodness of Fit Test to data with total claims and exposures by year, then the number of degrees of freedom is the number of years minus one, and χ² = Σk (nk - Ek)²/Vk.
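A brief sketch of the mechanics, of my own devising (the observed and expected counts are made up, and scipy's chi2.sf supplies the Survival Function of the Chi-Square Distribution):

```python
from scipy.stats import chi2

# Hypothetical grouped data: observed and expected counts by interval.
observed = [120, 85, 60, 25, 10]
expected = [110.0, 92.0, 58.0, 28.0, 12.0]

chi_sq = sum((o - e)**2 / e for o, e in zip(observed, expected))

fitted_parameters = 1                              # assumed: one parameter fit to this data
dof = len(observed) - 1 - fitted_parameters

p_value = chi2.sf(chi_sq, dof)                     # survival function of the Chi-Square Distribution
print(chi_sq, dof, p_value)
```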
Likelihood Ratio Test (Section 5):
The Likelihood Ratio Test proceeds as follows:
1. One has two distributions, one with more parameters than the other, both fit to the same data via Maximum Likelihood.
2. One of the distributions is a special case or limit of the other.
3. One computes twice the difference in the loglikelihoods.
4. One compares the result of step 3 to the Chi-Square Distribution, with a number of degrees of freedom equal to the difference of the number of parameters of the two distributions.
5. One draws a conclusion as to whether the more general distribution fits significantly better than its special case.
H0 is that the distribution with fewer parameters is appropriate. The alternative hypothesis H1 is that the distribution with more parameters is appropriate.
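A brief sketch of steps 3 and 4, of my own devising (the two maximized loglikelihoods below are hypothetical, purely to show the mechanics):

```python
from scipy.stats import chi2

# Hypothetical maximized loglikelihoods: restricted model (e.g. Exponential, 1 parameter)
# and more general model (e.g. Weibull, 2 parameters), fit to the same data.
loglike_restricted = -1623.5
loglike_general = -1620.1

test_statistic = 2 * (loglike_general - loglike_restricted)   # step 3
dof = 2 - 1                                                   # difference in number of parameters
p_value = chi2.sf(test_statistic, dof)                        # step 4

# Reject H0 (the simpler model) if the statistic exceeds the Chi-Square critical value.
print(test_statistic, p_value, test_statistic > chi2.ppf(0.95, dof))
```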
Fitting to the (a, b, 1) Class (Section 6):
When applied to individual ungrouped data, for the zero-truncated Poisson, zero-truncated Binomial with m fixed, zero-truncated Negative Binomial with r fixed, and the Logarithmic Distribution, Maximum Likelihood is equivalent to the Method of Moments.
When applied to individual ungrouped data, for the zero-modified Poisson, zero-modified Binomial with m fixed, and zero-modified Negative Binomial with r fixed, Maximum Likelihood is equivalent to assigning to zero the observed proportion of zeros and matching the mean of the zero-modified distribution to the sample mean. Set p0^M = the proportion of zeros in the data.
Mahlerʼs Guide to
Fitting Loss Distributions Joint Exam 4/C
prepared by Howard C. Mahler, FCAS Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-6 Howard Mahler
[email protected] www.howardmahler.com/Teaching
Mahlerʼs Guide to Fitting Loss Distributions
Copyright 2013 by Howard C. Mahler.
The Fitting Loss Distributions concepts on the Joint CAS Exam 4 / SOA Exam C from Loss Models, by Klugman, Panjer, and Willmot, are demonstrated.
Information in bold, and sections whose titles are in bold, are more important for passing your exam. Larger bold type indicates it is extremely important. Information presented in italics (and sections whose titles are in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications.
Highly Recommended problems are double underlined. Recommended problems are underlined.
Solutions to the problems in each section are at the end of that section.
Note that the problems include both some written by me and some from past exams.1 The latter are copyright by the Casualty Actuarial Society and the SOA and are reproduced here solely to aid students in studying for exams.2
Greek letters used in Loss Models:
α = alpha, β = beta, γ = gamma, θ = theta, λ = lambda, µ = mu, σ = sigma, τ = tau
β = beta, used for the Beta and incomplete Beta functions.
Γ = Gamma, used for the Gamma and incomplete Gamma functions.
Φ = Phi, used for the Normal distribution. φ = phi, used for the Normal density function.
Π = Pi is used for the continued product just as Σ = Sigma is used for the continued sum
1 In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus. In some cases the material covered is preliminary to the current Syllabus; you will be assumed to know it in order to answer exam questions, but it will not be specifically tested.
2 The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Section #  Pages  Section Name
1  5-6  Introduction
2  7-8  Ungrouped Data
3  9  Grouped Data
4  10-12  The Modeling Process
5  13-44  Ogives and Histograms
6  45-123  Kernel Smoothing
7  124-134  Estimation of Percentiles
8  135-166  Percentile Matching
9  167-213  Method of Moments
10  214-298  Fitting to Ungrouped Data by Maximum Likelihood
11  299-322  Fitting to Grouped Data by Maximum Likelihood
12  323-381  Chi-Square Test
13  382-409  Likelihood Ratio Test
14  410-435  Hypothesis Testing
15  436-444  Schwarz Bayesian Criterion
16  445-505  Kolmogorov-Smirnov Test, Basic
17  506-532  Kolmogorov-Smirnov Test, Advanced
18  533-554  p-p Plots
19  555-586  Anderson-Darling Test
20  587-593  Percentile Matching Applied to Truncated Data
21  594-604  Method of Moments Applied to Truncated Data
22  605-654  Maximum Likelihood Applied to Truncated Data
23  655-668  Single Parameter Pareto Distribution, Data Truncated from Below
24  669-692  Fitting to Censored Data
25  693-715  Fitting to Data Truncated and Censored
26  716-777  Properties of Estimators
27  778-793  Variance of Estimates, Method of Moments
28  794-819  Variance of Estimated Single Parameters, Maximum Likelihood
29  820-846  Information Matrix and Covariance Matrix
30  847-911  Variance of Functions of Maximum Likelihood Parameters
31  912-924  Non-Normal Confidence Intervals
32  925-936  Minimum Modified Chi-Square
33  937-947  Important Ideas & Formulas
Exam 4 Questions by Section of this Study Aid3 4
[Chart: for each of Sections 1-32 of this study aid, the question numbers from the Sample Exam and from the released exams of 5/00, 11/00, 5/01, 11/01, 11/02, 11/03, 11/04, 5/05, 11/05, 11/06, and 5/07 that are covered in that section.]
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams.
I have put past exam questions in the study guide and section where they seem to fit best. However, exam questions often rely on material in more than one section or more than one study guide. Therefore, one should use this chart to direct your study efforts with appropriate caution.
3 Excluding any questions that are no longer on the syllabus.
4 Some former Exam 4 questions that cover more basic material are in "Mahlerʼs Guide to Loss Distributions."
Section 1, Introduction

The material in this study guide uses the ideas in "Mahlerʼs Guide to Loss Distributions." It is strongly recommended you review the important ideas section of that study guide before proceeding any further. It would also be worthwhile to look through again that portion of Appendix A of Loss Models that will be attached to your exam.
In this study guide are discussed a number of related topics on Fitting Loss Distributions:
1. Graphical Techniques to Display or Smooth Data: Ogives, Histograms, Kernel Smoothing, p-p plots
2. Methods of Fitting Distributions: Percentile Matching, Method of Moments, Maximum Likelihood
3. Statistical Tests of Fits: Chi-Square Goodness-of-Fit, Likelihood Ratio Test, Schwarz Bayesian Criterion, Kolmogorov-Smirnov, Anderson-Darling
4. Properties of Estimators: Section 26
5. Variances of Estimates Derived From Fitting Distributions: Sections 27-30
Loss Distributions as per Loss Models

Distribution Name: Distribution Function F(x); Probability Density Function f(x)

Exponential: F(x) = 1 - e^(-x/θ); f(x) = e^(-x/θ)/θ.

Single Parameter Pareto: F(x) = 1 - (θ/x)^α, x > θ; f(x) = α θ^α / x^(α+1).

Weibull: F(x) = 1 - exp[-(x/θ)^τ]; f(x) = τ (x/θ)^τ exp[-(x/θ)^τ] / x.

Gamma: F(x) = Γ[α; x/θ]; f(x) = (x/θ)^α e^(-x/θ) / {x Γ(α)} = x^(α-1) e^(-x/θ) / {θ^α Γ(α)}.

LogNormal: F(x) = Φ[(ln(x) - µ)/σ]; f(x) = exp[-(ln(x) - µ)²/(2σ²)] / {x σ √(2π)}.

Pareto: F(x) = 1 - {θ/(θ + x)}^α; f(x) = α θ^α / (θ + x)^(α+1).

Inverse Gaussian: F(x) = Φ[(x/µ - 1)√(θ/x)] + e^(2θ/µ) Φ[-(x/µ + 1)√(θ/x)];
f(x) = √{θ/(2π x³)} exp[-θ(x/µ - 1)²/(2x)].

Inverse Gamma: F(x) = 1 - Γ[α; θ/x]; f(x) = θ^α e^(-θ/x) / {x^(α+1) Γ(α)}.
Moments of Loss Distributions as per Loss Models

Distribution Name: Mean; Variance; Moments E[X^n]

Exponential: mean θ; variance θ²; E[X^n] = n! θ^n.

Single Parameter Pareto: mean αθ/(α - 1); variance αθ²/{(α - 1)²(α - 2)}; E[X^n] = αθ^n/(α - n), α > n.

Weibull: mean θ Γ[1 + 1/τ]; variance θ²{Γ[1 + 2/τ] - Γ[1 + 1/τ]²}; E[X^n] = θ^n Γ[1 + n/τ].

Gamma: mean αθ; variance αθ²; E[X^n] = θ^n Γ[α + n]/Γ[α] = θ^n (α)(α + 1)...(α + n - 1).

LogNormal: mean exp[µ + σ²/2]; variance exp[2µ + σ²](exp[σ²] - 1); E[X^n] = exp[nµ + n²σ²/2].

Pareto: mean θ/(α - 1); variance αθ²/{(α - 1)²(α - 2)}; E[X^n] = n! θ^n/{(α - 1)...(α - n)}, α > n.

Inverse Gaussian: mean µ; variance µ³/θ; E[X^n] = µ^n e^(θ/µ) √{2θ/(πµ)} K_(n-1/2)(θ/µ), where K is the modified Bessel function.

Inverse Gamma: mean θ/(α - 1); variance θ²/{(α - 1)²(α - 2)}; E[X^n] = θ^n/{(α - 1)...(α - n)}, α > n.
Section 2, Ungrouped Data There are 130 losses of sizes: 300 400 2,800 4,500 4,900 5,000 7,700 9,600 10,400 10,600 11,200 11,400 12,200 12,900 13,400 14,100 15,500 19,300 19,400 22,100 24,800 29,600 32,200 32,500 33,700 34,300
37,300 39,500 39,900 41,200 42,800 45,900 49,200 54,600 56,700 57,200 57,500 59,100 60,800 62,500 63,600 66,400 66,900 68,100 68,900 71,100 72,100 79,900 80,700 83,200 84,500 84,600
86,600 88,600 91,700 96,600 96,900 106,800 107,800 111,900 113,000 113,200 115,000 117,100 119,300 122,000 123,100 126,600 127,300 127,600 127,900 128,000 131,300 132,900 134,300 134,700 135,800 146,100
150,300 171,800 173,200 177,700 183,000 183,300 190,100 209,400 212,900 225,100 226,600 233,200 234,200 244,900 253,400 261,300 261,800 273,300 276,200 284,300 316,300 322,600 343,400 350,700 395,800 406,900
423,200 437,900 442,700 457,800 463,000 469,300 469,600 544,300 552,700 566,700 571,800 596,500 737,700 766,100 846,100 852,700 920,300 981,100 988,300 1,078,800 1,117,600 1,546,800 2,211,000 2,229,700 3,961,000 4,802,200
Each individual value is shown, rather than the data being grouped into intervals. The type of data shown here is called individual or ungrouped data. Some students will find it helpful to put this data set on a computer and follow along with the computations in the study guide to the best of their ability.5 The best way to learn is by doing.
5
Even this data set is far bigger than would be presented on an exam. In many actual applications, there are many thousands of claims, but such a large data set is very difficult to present in a Study Aid. It is important to realize that with modern computers, actuaries routinely deal with such large data sets. There are other situations where all that is available is a small data set such as presented here.
This ungrouped data set is used in many examples throughout this study guide: 300, 400, 2800, 4500, 4900, 5000, 7700, 9600, 10400, 10600, 11200, 11400, 12200, 12900, 13400, 14100, 15500, 19300, 19400, 22100, 24800, 29600, 32200, 32500, 33700, 34300, 37300, 39500, 39900, 41200, 42800, 45900, 49200, 54600, 56700, 57200, 57500, 59100, 60800, 62500, 63600, 66400, 66900, 68100, 68900, 71100, 72100, 79900, 80700, 83200, 84500, 84600, 86600, 88600, 91700, 96600, 96900, 106800, 107800, 111900, 113000, 113200, 115000, 117100, 119300, 122000, 123100, 126600, 127300, 127600, 127900, 128000, 131300, 132900, 134300, 134700, 135800, 146100, 150300, 171800, 173200, 177700, 183000, 183300, 190100, 209400, 212900, 225100, 226600, 233200, 234200, 244900, 253400, 261300, 261800, 273300, 276200, 284300, 316300, 322600, 343400, 350700, 395800, 406900, 423200, 437900, 442700, 457800, 463000, 469300, 469600, 544300, 552700, 566700, 571800, 596500, 737700, 766100, 846100, 852700, 920300, 981100, 988300, 1078800, 1117600, 1546800, 2211000, 2229700, 3961000, 4802200
Section 3, Grouped Data

Unlike the ungrouped data in Section 2, often one is called upon to work with data grouped into intervals.6 In this example, both the number of losses in each interval and the dollars of loss on those losses are shown. Sometimes the latter information is missing or sometimes additional information may be available.

Interval ($000): Number of Losses; Total of Losses in the Interval ($000)
0-5: 2208; 5,974
5-10: 2247; 16,725
10-15: 1701; 21,071
15-20: 1220; 21,127
20-25: 799; 17,880
25-50: 1481; 50,115
50-75: 254; 15,303
75-100: 57; 4,893
100-∞: 33; 4,295
SUM: 10,000; 157,383
The estimated mean is $15,738. As will be seen, in some cases one has to deal with grouped data in a somewhat different manner than ungrouped data. With modern computing power, the actuary is usually better off working with the data in an ungrouped format if available. The grouping process discards valuable information. The wider the intervals, the worse is the loss of information.
6
Note that in this example, for simplicity I have not made a big deal over whether for example the 10-15 interval includes 15 or not. In many real world applications, in which claims cluster at round numbers, that can be important.
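As a quick check of the $15,738 figure, here is a minimal sketch of my own that divides the total dollars of loss by the total number of losses:

```python
# Grouped data from Section 3: losses per interval and total dollars ($000) per interval.
counts = [2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33]
dollars_000 = [5974, 16725, 21071, 21127, 17880, 50115, 15303, 4893, 4295]

estimated_mean = 1000 * sum(dollars_000) / sum(counts)
print(estimated_mean)  # 15738.3
```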
Section 4, The Modeling Process and Parameters of Distributions

Actuaries construct and use many mathematical models of real world situations of interest. Model selection is based on a balance between fit to the observed data and simplicity.

Six Steps of the Modeling Process:7
Loss Models lists six steps to the modeling process.8
1. Model Choice: Choose one or more models to investigate.
2. Model Calibration: Fit the model to the data.9
3. Model Validation: Use statistical tests or other techniques to determine if the fit(s) are good enough to use.
4. Other Models: Possibly add additional models, and in that case return to step 1.
5. Model Selection: Select which model to use.
6. Modify for the Future: Make any changes needed to the selected model, so that it is appropriate to apply to the future.10
7 See Section 1.1.1 of Loss Models, not on the syllabus. As with all such general lists of steps, any real world application may be more closely or less closely approximated by this list. One would not always go through an elaborate procedure, particularly if one desires a rough estimate of something that will only be used once.
8 The actuary makes use of his prior knowledge and experience. Prior to "step one", the actuary should understand the purpose of the model, read any relevant actuarial literature, talk to his colleagues, and investigate what data/information is available or can be obtained.
9 The actuary should examine the quality and reasonableness of any data before it is used.
10 For example, one might need to take into account inflation.
An Example of the Modeling Process:11 An actuary is interested in estimating excess ratios12 for Workers Compensation Insurance in Massachusetts.13 0. The available data is examined and it is determined that several years of Unit Statistical Plan data for Massachusetts at third, fourth, and fifth report, will be appropriate to use. This data has already been used for other purposes and therefore has already been checked for reasonableness and validity. 1. The mean, Coefficient of Variation, Skewness, and Kurtosis are calculated for each year of data. Mean excess losses are also examined. Based on this information, several heavier tailed distributions such as the LogNormal and Pareto are investigated. 2. These models are fit via maximum likelihood to the data.14 3. These fits are tested via the Kolmogorov-Smirnov Statistic and are all rejected.15 4. Various mixtures are considered.16 1. Several 2-point mixtures are chosen. 2. These models are fit via maximum likelihood to the data. 3. These fits are tested via the Kolmogorov-Smirnov Statistic and are all rejected. 4. Splices are considered.17
11 A somewhat simplified version of what I did to develop the method described in "Workers Compensation Excess Ratios: An Alternative Method of Estimation," by Howard C. Mahler, PCAS 1998.
12 The excess ratio is one minus the loss elimination ratio.
13 These excess ratios will be used to determine Excess Loss Factors used in Retrospective Rating.
14 See "Mahlerʼs Guide to Statistics." How to fit via maximum likelihood will be discussed subsequently.
15 The Kolmogorov-Smirnov Statistic will be discussed subsequently.
16 Mixtures are discussed in a subsequent section.
17 Splices are discussed in "Mahlerʼs Guide to Loss Distributions."
1. Several splices between the empirical distribution and continuous distributions are chosen. 2. These models are fit via maximum likelihood to the data. 3. These fits are tested via the Kolmogorov-Smirnov Statistic and are all rejected. 4. A combination of mixtures and splices are considered. 1. Several splices between the empirical distribution and 2-point mixtures are chosen. 2. These models are fit via maximum likelihood to the data. 3. These fits are tested via the Kolmogorov-Smirnov Statistic. 5. A splice between the empirical distribution and a 2-point mixture of an Exponential and a Pareto is selected.18 6. Incorporate the effect of inflation and law amendments. 7. The model was then compared to more recent data not used in the selection process, but otherwise similar to the data used in the selection process. The selected model displayed a good fit to this more recent data.
18
The actual method uses something similar in concept to a splice, but not a traditional splice.
Section 5, Ogives and Histograms19 There are several graphical techniques one can use to display size of loss data. Ogives: An ogive is an approximate graph of the Distribution function. We assume a uniform distribution on each interval, as discussed previously. The ogive is made up of straight line segments. For example, for the grouped data in Section 3, an ogive would look something like this:20 Ogive of Grouped Data in Section 3
[Figure: the ogive rises through the points (5, 0.2208), (10, 0.4455), (15, 0.6156), (20, 0.7376), (25, 0.8175), (50, 0.9656), (75, 0.9910), and (100, 0.9967), with accident size in $000.]
Note that each of the points is connected by a straight line segment.21 There is no specific loss size at which the distribution function reaches unity.22 The ogive is not unique, since it is an approximate graph of the Distribution function.23 In this example, one could draw an ogive that connected fewer of the points.

19 See Section 13.3 in Loss Models.
20 Note that there are 33 accidents larger than $100,000. There is no unique way to represent this since the interval stretches to infinity.
21 For example, F(10) = .4455 and F(15) = .6156, so there is a straight line from (10, .4455) to (15, .6156).
22 Since for this set of grouped data, the last interval extends to infinity.
23 For ungrouped data, the more detailed information allows more possible choices of ogives.
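For readers who want to experiment, here is a small sketch of my own of the straight-line interpolation that defines the ogive, using the endpoints and empirical distribution function values plotted above:

```python
# Ogive for the grouped data in Section 3: empirical distribution function at the endpoints.
endpoints = [0, 5000, 10000, 15000, 20000, 25000, 50000, 75000, 100000]
F = [0, 0.2208, 0.4455, 0.6156, 0.7376, 0.8175, 0.9656, 0.9910, 0.9967]

def ogive(x):
    # Linear interpolation between the endpoints that bracket x.
    for i in range(1, len(endpoints)):
        if x <= endpoints[i]:
            a, b = endpoints[i - 1], endpoints[i]
            return F[i - 1] * (b - x) / (b - a) + F[i] * (x - a) / (b - a)
    return None  # beyond $100,000 the ogive is not determined by this data

print(ogive(12000))  # 0.5135
```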
Connect by straight lines the points: (xi, empirical distribution function at xi) .24 Exercise: What is the height of the above ogive at $30,000? [Solution: (80%)(.8175) + (20%)(.9656) = .8471.] Histograms: A histogram is an approximate graph of the probability density function. We assume a uniform distribution on each interval, as discussed previously. For the grouped data in Section 3, a histogram would look as follows:25 Histogram of Grouped Data in Section 3
[Figure: Histogram of the grouped data in Section 3. The heights are 0.0000442, 0.0000449, 0.0000340, 0.0000244, 0.0000160, 0.0000059, 0.0000010, and 0.0000002 over the intervals 0-5, 5-10, 10-15, 15-20, 20-25, 25-50, 50-75, and 75-100 (accident size in $000).]
For example, for the interval from $25,000 to $50,000 of length $25,000, there are 1481 losses out of a total of 10,000, so that the height is: (1481/10000)/25000 = .0000059. One has to remember to divide by both the total number of losses, 10,000, as well as the width of the interval, 25,000, so that the p.d.f. will integrate to unity. In other words, the total area under the histogram should equal unity.
The height of each rectangle = (# losses in the interval)/{(total # losses)(width of interval)}.26
The histogram is the derivative of the ogive.
24 In Definition 13.8 in Loss Models, Fn(x) = Fn(cj-1)(cj - x)/(cj - cj-1) + Fn(cj)(x - cj-1)/(cj - cj-1), cj-1 ≤ x ≤ cj.
25 Note that there are 33 accidents larger than $100,000. There is no unique way to represent this as a probability density, since the interval stretches to infinity.
26 In Definition 13.9 in Loss Models, fn(x) = {Fn(cj) - Fn(cj-1)}/(cj - cj-1) = nj/{n(cj - cj-1)}, cj-1 ≤ x < cj, where each interval is closed at its bottom and open at its top.
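A companion sketch, again my own illustration, of the histogram formula in footnote 26, using the grouped data of Section 3:

```python
# Histogram for the grouped data in Section 3: height = n_j / {n (c_j - c_{j-1})}.
endpoints = [0, 5000, 10000, 15000, 20000, 25000, 50000, 75000, 100000]
counts = [2208, 2247, 1701, 1220, 799, 1481, 254, 57]
n = 10000

def histogram(x):
    for j in range(len(counts)):
        if endpoints[j] <= x < endpoints[j + 1]:
            return counts[j] / (n * (endpoints[j + 1] - endpoints[j]))
    return None  # the open-ended interval above $100,000 has no well-defined height

print(histogram(30000))  # 0.0000059
```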
Exercise: Draw a histogram of the following grouped data: 0-10: 6, 10-20: 11, 20-25: 3.
[Solution: The heights are: 6/((20)(10)) = .03, 11/((20)(10)) = .055, and 3/((20)(5)) = .03. The histogram is 0.03 high on (0, 10), 0.055 high on (10, 20), and 0.03 high on (20, 25).]
20
40
60
80
100
$000
This Exponential Distribution is not a good match to this data.29 27
Figure 16.2 of Loss Models is a comparison to a histogram. Figures 16.1 and 16.3 are comparisons to the empirical distribution function. Graphs of the difference between the empirical distribution and a continuous distribution are very useful and are discussed in a subsequent section. 28 As discussed in a subsequent section, this is the maximum likelihood Exponential Distribution fit to this data. 29 As discussed in a subsequent section, one can perform a Chi-Square Test.
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 16
It turns out a Burr Distribution with parameters α = 3.9913, θ = 40,467, and γ = 1.3124, is a good match to this data.30 Here is a comparison between the histogram of the grouped data in Section 3, and this Burr Distribution: Prob. 0.00005
0.00004
0.00003
0.00002
0.00001
20
40
60
80
100
$000
Variances:31 In the histogram of the grouped data from Section 3, the height at $12,000 is .00003402. Thus we have an estimated density at $12,000 of .00003402. This was calculated as follows: (# of losses in interval 10,000 to 15,000)/{(total # of losses)(15000 - 10000)} = 1701/{(10000)(5000)} = 0.00003402. The number of losses observed in the interval 10,000 to 15,000 is random. Assuming the number of such losses is Binomial with m = 10,000 and q = 1701/10000 = 0.1701, it has variance: (10000)(0.1701)(1 - 0.1701) = (10,000)(0.1412). The estimate of the density is the number divided by: (10,000)(5000). Thus the estimate of the density has variance: (10000)(0.1412)/{(10000)(5000)}2 = 0.1412 / {(10000)(50002 )}. The estimate of the density has standard deviation of: 0.00000075.
30
As discussed in a subsequent section, this is the maximum likelihood Burr Distribution fit to this data. For the Burr Distribution, F(x) =1 - (1/(1+(x/θ)γ))α, and f(x) = αγ(x/θ)γ(1+(x/θ)γ)−(α + 1) /x. As discussed in a subsequent section, one can perform a Chi-Square Test. 31 See Section 14.2 of Loss Models.
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 17
In general, an estimate of the density from the histogram, fn (x), has variance:32 (losses in interval / N) {1 - (losses in interval / N)} . N (width of the interval)2 Exercise: Given the following grouped data: 0 -10: 6, 10-20: 11, 20-25: 3. Based on the histogram, what is the estimated density at 18 and what is the variance of that estimate? [Solution: f20(18) = 11/{(20)(10)} = .055. The variance of this estimate is: (11/20)(1 - 11/20)/{20 (102 )} = 0.00012375. Comment: Thus an approximate 95% confidence interval is: 0.055 ± 0.022.] In the ogive of the grouped data from Section 3, the height at $30,000 is: (80%)(.8175) + (20%)(.9656) = .8741. This is an estimate of the Distribution Function at $30,000. Since the number of losses observed less than $25,000 and less than $50,000 are each random, there is a variance of this estimate. Let A = # of losses less than $25,000 and let B = # of losses from $25,000 to $50,000. Then we can write the estimate as: Fn (30000) = (80%)(A/N) + (20%)(A+B)/N. Var[Fn (30000)] = {.82 Var[A] + .22 Var[A + B] + (2)(.2)(.8)Cov[A, A + B]}/N2 . A is assumed Binomial with q = .8175. Var[A] = (.8175)(1 - .8175)N = .1492N. B is assumed Binomial with q = .9656 - .8175 = .1481. Var[B] = (.1481)(1 - .1481)N = .1262N. A + B is assumed Binomial with q = .9656. Var[A + B] = (.9656)(1 - .9656)N = .0332N. Var[A + B] = Var[A] + Var[B] + 2Cov[A, B]. ⇒ Cov[A, B] = (Var[A + B] - Var[A] - Var[B])/ 2 = (.0332N - .1492N - .1262N)/2 = -.1211N. In fact, A and B are jointly multinomial distributed, with covariance:33 -N(A/N)(B/N) = -N(.8175)(.1481) = -.1211N. Cov[A, A + B] = Var[A] + Cov[A, B] = .1492N - .1211N = .0281N. Therefore, Var[Fn (30000)] = {.82 (.1492N) + .22 (.0332N) + (2)(.2)(.8)(.0281N)}/N2 = 0.1058/N = 0.1058/10000 = 0.00001058. 32
In general, the variance of the estimated probability covered by an interval is: (Probability in the interval) (1 - Probability in the interval) / N. The estimated density is this estimated probability divided by the width of the interval; its variance is divided by the width squared. 33 See for example, A First Course in Probability by Sheldon Ross.
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 18
In general, if x is in the interval from ai to bi, then Fn (x) = Fn (ai) (bi - x)/(bi - ai) + Fn (bi) (x - ai)/(bi - ai). Var[Fn (ai)] = Fn (ai)Sn (ai)/N. Var[Fn (bi)] = Fn (bi)Sn (bi)/N. Cov[Fn (ai), Fn (bi)] = Cov[A/N, (A+B)/N] = Var[A]/N2 + Cov[A, B]/N2 = N Fn (ai)Sn (ai)/N2 - N Fn (ai){Fn (bi) - Fn (ai)}/N2 = Fn (ai)Sn (bi)/N. Var[Fn (x)] = {Var[Fn (ai)](bi - x)2 + Var[Fn (bi)](x - ai)2 + 2Cov[Fn (ai), Fn (bi)](bi - x)(x - ai)}/(bi - ai)2 . Var[Fn (x)] = {Fn (ai)Sn (ai)(bi - x)2 + Fn (bi)Sn (bi)(x - ai)2 + 2Fn (ai)Sn (bi)(bi - x)(x - ai)}/{N (bi - ai)2 }. Note that since Fn (x) + Sn (x) = 1, Var[Sn (x)] = Var[Fn (x)]. Exercise: Given the following grouped data: 0 -10: 6, 10-20: 11, 20-25: 3. Based on the height of the ogive at 18 and what is the variance of that estimate? [Solution: F20(18) = .2(6/20) + (.8)(17/20) = (.2)(.3) + (.8)(.85) = 0.74. The variance of this estimate is: {.22 (.3)(.7) + .82 (.85)(.15) + (2)(.2)(.8)(.3)(.15)}/20 = 0.00522. Alternately, F(18) = (X + .8Y)/20, where X = number in interval from 0 to 10, and Y is the number in the interval from 10 to 20. Var[X] = (20)(6/20)(14/20) = 4.2. Var[Y] = (20)(11/20)(9/20) = 4.95. Cov[X, Y] = -(20)(6/20)(11/20) = -3.3. Var[F(18)] = Var[X + .8Y]/400 = {Var[X] + .82 Var[Y] + (2)(.8) Cov[X,Y]}/400 = {4.2 + (.64)(4.95) + (1.6)(-3.3)}/400 = 0.00522. Comment: Thus an approximate 95% confidence interval is: 0.74 ± 0.14.]
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 19
Comparing Ogives: One can compare ogives.34 The following data is taken from “Rating by Layer of Insurance,” by Ruth E. Salzmann, PCAS 1963. For four different classes of building, shown are the number of fire losses of size less than or equal to a given percent of value of the building. For example, for Frame Protected Buildings, 4636 out of 4862 fire claims resulted in damage of 10% or less of the value of the building.
34
Percent of Value
Frame Protected
Brick Protected
Frame Unprotected
Brick Unprotected
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90 100
546 1157 1659 2041 2338 2610 2833 3003 3151 3310 3981 4256 4388 4474 4520 4554 4585 4605 4636 4730 4767 4794 4810 4818 4828 4837 4843 4862
210 398 561 670 762 840 916 964 998 1047 1243 1307 1330 1344 1353 1361 1370 1373 1381 1400 1406 1411 1415 1421 1424 1427 1428 1432
169 383 547 662 733 811 867 902 937 968 1095 1170 1203 1217 1224 1237 1239 1240 1254 1272 1280 1287 1294 1298 1300 1305 1308 1333
54 120 155 191 218 237 248 257 272 280 323 344 349 351 353 356 356 358 362 366 370 372 373 374 374 374 375 378
See Exercise 13.6 of Loss Models. I find comparing ogives to usually be of little value in practical applications.
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Exercise: Draw ogives comparing the protected to the unprotected distributions. Put both axis on a log scale. [Solution: For each distribution, we need to divide the given numbers by the total. For example, for frame protected, at 1% of value, the empirical distribution function is: 3310/4862 = .685. Here is a comparison of Frame Protected (solid) to Frame Unprotected (dashed): Distrib. 1 0.7 0.5
0.3 0.2 0.15
0.1
0.5
1
5
10
50
100
% of Value
Here is a comparison of Brick Protected (solid) to Brick Unprotected (dashed): Distrib. 1 0.7 0.5
0.3
0.2 0.15 0.1
0.5
1
5
10
50
100
% of Value
While the Protected and Unprotected distributions appear to differ, it is not very clear.]
Page 20
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 21
Difference graphs, discussed subsequently, would be a better way to display this comparison. Salzmann graphically compares the percent of total loss costs from losses of size less than or equal to a certain value, which is much better at showing any differences between the distributions than any comparison of ogives. The unprotected buildings have a much larger percent of total loss costs from large losses than do the protected buildings, something that is not visible from the ogives.
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 22
Problems: Use the following grouped data for each of the next six questions: Range($) # of losses loss ($000) 0-100 6300 300 100-200 2350 350 200-300 850 200 300-400 320 100 400-500 110 50 over 500 70 50 10000
1050
5.1 (1 point) What is the value of the histogram at $230? A. less than 0.0009 B. at least 0.0009 but less than 0.0010 C. at least 0.0010 but less than 0.0011 D. at least 0.0011 but less than 0.0012 E. at least 0.0012 5.2 (2 points) What is the standard deviation of the estimate in the previous question? A. 0.000002 B. 0.000005 C. 0.00001 D. 0.00002 E. 0.00003 5.3 (1 point) What is the value of the ogive at $120? A. 64%
B. 66%
C. 68%
D. 70%
E. 72%
5.4 (3 points) What is the standard deviation of the estimate in the previous question? A. less than 0.0030 B. at least 0.0030 but less than 0.0035 C. at least 0.0035 but less than 0.0040 D. at least 0.0040 but less than 0.0045 E. at least 0.0045 5.5 (1 point) Use the ogive in order to estimate the 90th percentile of the size of loss distribution. A. 230 B. 240 C. 250 D. 260 E. 270 5.6 (2 points) Use the ogive in order to estimate the probability of a loss being of size between 60 and 230. A. 50% B. 51% C. 52% D. 53% E. 54%
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 23
5.7 (2 points) 100 losses are observed in intervals: cj-1
cj
nj
0 1 20 1 2 15 2 5 25 5 25 40 Let f100(x) be the histogram corresponding to this data. Determine f100(0.5) + f100(1.5) + f100(2.5) + f100(5.5). A. 0.41
B. 0.43
C. 0.45
D. 0.47
E. 0.49
5.8 (8 points) The following data is taken from "Comprehensive Medical Insurance - Statistical Analysis for Ratemaking" by John R. Bevan, PCAS 1963. Shown are the empirical distribution functions of the severity of loss at selected values. (The data has been grouped into intervals) The data is truncated from below by a $25 deductible. Size of Male Female Loss Employees Employees Spouse Child $49 19.3% 15.4% 14.2% 17.2% 99 47.6 39.4 38.5 38.2 199 64.3 59.7 59.7 70.5 299 73.5 72.3 71.2 80.9 399 80.1 80.3 78.6 87.9 499 85.0 85.6 83.1 91.4 999 91.4 95.2 93.8 96.7 1999 95.1 98.3 96.7 99.0 2999 97.0 99.2 98.3 99.2 3999 98.4 99.5 98.8 99.5 4999 98.6 99.6 99.0 99.7 6667 99.1 99.8 99.5 99.9 7499 99.4 99.9 99.5 99.9 10000 100.0 100.0 100.0 100.0 There are four separate distributions shown, based on the person incurring the medical expenses. (They were based on the following numbers of claims: 955, 1291, 915, 994). Draw ogives for each of these distributions, with however the x-axis on a log scale. Does it appear as if some or all of this data came from the same distribution?
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 24
Use the following grouped data for each of the next 6 questions: A random sample of 500 losses is distributed as follows: Loss Range Frequency [0, 10] 150 (10, 25] 90 (25, 100] 260 5.9 (1 point) What is the value of the histogram at 30? A. 0.003 B. 0.004 C. 0.005 D. 0.006 E. 0.007 5.10 (2 points) What is the standard deviation of the estimate in the previous question? A. 0.0002 B. 0.0003 C. 0.0004 D. 0.0005 E. 0.0006 5.11 (1 point) What is the value of the ogive at 15? A. 34% B. 35% C. 36% D. 37%
E. 38%
5.12 (3 points) What is the standard deviation of the estimate in the previous question? A. 0.02 B. 0.03 C. 0.04 D. 0.05 E. 0.06 5.13 (1 point) Use the ogive in order to estimate the 70th percentile of the size of loss distribution. A. less than 50 B. at least 50 but less than 55 C. at least 55 but less than 60 D. at least 60 but less than 65 E. at least 65 5.14 (2 points) Use the ogive in order to estimate the probability of a loss being of size between 13 and 42. A. 20% B. 22% C. 24% D. 26% E. 28%
5.15 (4 points) You are given the following data on the size of 954 physician professional liability claims, censored from above at 100,000: 0-1000 234 1001-5000 416 5001-10,000 134 10,001-25,000 101 25,001-50,000 36 50,001-100,000 33 Use an ogive to estimate the variance of the size of claims limited to 40,000, Var[X ∧ 40,000].
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 25
Use the following information for the next three questions: •
Twelve losses have been recorded as follows: 1050, 1100, 1300, 1500, 1900, 2100, 2200, 2400, 3000, 3200, 4100, 4400.
•
An ogive and histogram have been fitted to this data grouped using endpoints: 1000, 2500, 5000.
5.16 (1 point) Determine the height of the corresponding relative frequency histogram at x = 4000. A. 0.00009 B. 0.00010 C. 0.00011 D. 0.00012 E. 0.00013 5.17 (1 point) Using the ogive, what is the estimate of the 81st percentile of the distribution function underlying the empirical data? A. 3000 B. 3200 C. 3400 D. 3600 E. 3800 5.18 (2 points) Using the ogive, estimate the hazard rate at 3000, h(3000). A. 0.0001 B. 0.0002 C. 0.0003 D. 0.0004 E. 0.0005 5.19 (4, 5/85, Q.50) (1 point) Which of the following statements are true? 1) Two random variables are independent if their correlation coefficient is zero. 2) An ogive is an estimate of a sample's underlying continuous probability density function. 3) For any random variable X with distribution function F(x), Y = F(X) has a uniform probability density function. A. 2 B. 3 C. 1, 2 D. 1, 3 E. 2, 3 5.20 (4, 5/88, Q.51) (1 point) The ogive H(x) was fit to empirical data. H(x) consists of line segments. and the values of H(x) at the endpoints of these segments are defined below: x H(x) x H(x) 1 0.0 20 0.6 5 0.2 25 0.7 10 0.3 27 0.8 15 0.4 30 1.0 17 0.5 Using the ogive H(x), what is the estimate of the 55th percentile of the distribution function underlying the empirical data? A. Less than 18.0 B. At least 18.0, but less than 18.3 C. At least 18.3, but less than 18.6 D. At least 18.6, but less than 18.9 E. 18.9 or more
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 26
5.21 (2, 5/88, Q.21) (1.5 points) Observations are drawn at random from a continuous distribution. The following histogram is constructed from the data. Frequency per unit x 0.3
0.2
0.1
1
2
3
4
5
6
Oberservation Values (x)
Which set of frequencies could yield this histogram for the intervals 0 < x ≤ 3, 3 < x ≤ 5, and 5 < x ≤ 6, respectively? A. 1; 2; 3 B. 3; 2; 1 C. 3; 3; 3 D. 3; 4; 3 E. 3; 7; 10
5.22 (4, 5/88, Q.54) (1 point) Which of the following statements are true? 1. There is one and only one ogive that fits a given empirical distribution. 2. The Central Limit Theorem applies only to continuous distributions. 3. If the absolute deviation is used as the loss function for a Bayesian point estimate, then the resulting estimator is the median of the posterior distribution. A. 1 B. 2 C. 3 D. 1, 3 E. 2, 3 5.23 (160, 11/89, Q.6) (2.1 points) The observed number of failures in each week are: Week: 1 2 3 4 5 Failures: 3 2 3 1 1 A histogram is constructed, with 5 intervals, one for each week. The probability density function at the midpoint of week 3 is estimated from this histogram. Calculate the estimated variance of this estimated probability density function. (A) 0.016 (B) 0.021 (C) 0.024 (D) 0.035 (E) 0.048
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 27
5.24 (4, 5/91, Q.34) (2 points) The following relative frequency histogram depicts the expected distribution of policyholder claims. The policyholder pays the first $1 of each claim, the insurer pays the next $9 of each claim, and the reinsurer pays the remaining amount if the claim exceeds $10. What is the insurer's average payment per (non-zero) payment by the insurer? Assume the claims are distributed uniformly in each interval, (with probability densities of .10 from 0 to 3, .20 from 3 to 5, and .03 from 5 to 15.) A. 3.7 B. 3.9 C. 4.1 D. 4.3 E. 4.5 0.20 0.19 0.18 0.17
Probability Density
0.16 0.15 0.14 0.13 0.12 0.11 0.10 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00
0
2
4 6 8 Size of Policyholder Loss
10
12
14
16
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 28
5.25 (4B, 5/93, Q.31) (2 points) The following 20 wind losses, recorded in millions of dollars,occurred in 1992: 1, 1, 1, 1, 1, 2, 2, 3, 3, 4 6, 6, 8, 10, 13, 14, 15, 18, 22, 25 To construct an ogive H(x), the losses were segregated into four ranges: (0.5, 2.5), (2.5, 8.5), (8.5, 15.5), (15.5, 29.5). Determine the values of the probability density function h(x), corresponding to H(x), for the values x1 =4 and x2 =10. A. h(x1 ) = 0.300, h(x2 ) = 0.200 B. h(x1 ) = 0.050, h(x2 ) = 0.050 C. h(x1 ) = 0.175, h(x2 ) = 0.050 D. h(x1 ) = 0.500, h(x2 ) = 0.700 E. h(x1 ) = 0.050, h(x2 ) = 0.029 5.26 (4B, 11/94, Q.12) (1 point) You are given the following: Nine observed losses have been recorded in thousands of dollars and are grouped as follows: Interval [0,2) [2,5) [5,∞) Number of claims 2 4 3 Determine the value of the relative frequency histogram (p.d.f) for those losses at x = 3. A. Less than 0.15 B. At least 0.15, but less than 0.25 C. At least 0.25, but less than 0.35 D. At least 0.35, but less than 0.45 E. At least 0.45 5.27 (4B, 5/95, Q.1) (1 point) 50 observed losses have been recorded in millions and grouped by size of loss as follows: Size of Loss (X) Number of Observed Losses ( 0.5, 2.5] 25 ( 2.5, 10.5] 10 ( 10.5, 100.5] 10 (100.5, 1000.5] 5 __ 50 What is the height of the relative frequency histogram, h(x), at x = 50? A. Less than 0.05 B. At least 0.05, but less than 0.10 C. At least 0.10, but less than 0.15 D. At least 0.15, but less than 0.20 E. At least 0.20
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 29
5.28 (4B, 11/96, Q.11) (2 points) You are given the following: • Ten losses (X) have been recorded as follows: 1000, 1000, 1000, 1000, 2000, 2000, 2000, 3000, 3000, 4000. • An ogive, H(x), has been fitted to this data using endpoints for the connecting line segments with x-values as follows: x = c0 = 500, x = c1 = 1500, x = c2 = 2500, x = c3 = 4500 Determine the height of the corresponding relative frequency histogram, h(x), at x = 3000. A. 0.00010 B. 0.00015 C. 0.00020 D. 0.00025 E. 0.00030 5.29 (4B, 5/99, Q.30) (1 point) The derivative of an ogive is an estimate of which of the following functions? A. Probability density function B. Cumulative distribution function C. Limited expected value function D. Mean residual life function E. Loss function
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 30
5.30 (1, 11/00, Q.17) (1.9 points) A stock market analyst has recorded the daily sales revenue for two companies over the last year and displayed them in the histograms below. Numberof Occurences CompanyA
92.5
96.5
98.5 100 101.5 103.5
107.5
Daily SalesRevenue Numberof Occurences CompanyB
92.5
96.5
98.5 100 101.5 103.5
107.5
Daily SalesRevenue
The analyst noticed that a daily sales revenue above 100 for Company A was always accompanied by a daily sales revenue below 100 for Company B, and vice versa. Let X denote the daily sales revenue for Company A and let Y denote the daily sales revenue for Company B, on some future day. Assuming that for each company the daily sales revenues are independent and identically distributed, which of the following is true? (A) Var(X) > Var(Y) and Var(X + Y) > Var(X) + Var(Y). (B) Var(X) > Var(Y) and Var(X + Y) < Var(X) + Var(Y). (C) Var(X) > Var(Y) and Var(X + Y) = Var(X) + Var(Y). (D) Var(X) < Var(Y) and Var(X + Y) > Var(X) + Var(Y). (E) Var(X) < Var(Y) and Var(X + Y) < Var(X) + Var(Y).
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 31
5.31 (4, 5/05, Q.26 & 2009 Sample Q.195) (2.9 points) You are given the following information regarding claim sizes for 100 claims: Claim Size Number of Claims 0 - 1,000 16 1,000 - 3,000 22 3,000 - 5,000 25 5,000 - 10,000 18 10,000 - 25,000 10 25,000 - 50,000 5 50,000 - 100,000 3 over 100,000 1 Use the ogive to estimate the probability that a randomly chosen claim is between 2,000 and 6,000. (A) 0.36 (B) 0.40 (C) 0.45 (D) 0.47 (E) 0.50 5.32 (4, 11/05, Q.33 & 2009 Sample Q.243) (2.9 points) For 500 claims, you are given the following distribution: Claim Size Number of Claims [0, 500) 200 [500, 1,000) 110 [1,000, 2,000) x [2,000, 5,000) y [5,000, 10,000) ? [10,000, 25,000) ? [25,000, ∞) ? You are also given the following values taken from the ogive: F500(1500) = 0.689 F500(3500) = 0.839 Determine y. (A) Less than 65 (B) At least 65, but less than 70 (C) At least 70, but less than 75 (D) At least 75, but less than 80 (E) At least 80
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
5.33 (4, 11/06, Q.35 & 2009 Sample Q.278) (2.9 points) You are given: (i) A random sample of payments from a portfolio of policies resulted in the following: Interval Number of Policies (0, 50] 36 (50, 150] x (150, 250] y (250, 500] 84 (500, 1000] 80 (1000, ∞) 0 Total n (ii) Two values of the ogive constructed from the data in (i) are: Fn (90) = 0.21, and Fn (210) = 0.51 Calculate x. (A) 120
(B) 145
(C) 170
(D) 195
(E) 220
Page 32
2013-4-6,
Fitting Loss Distributions §5 Ogives & Histograms,
HCM 10/14/12,
Page 33
Solutions to Problems:

5.1. A. In the interval 200 to 300 of length 100, there are 850 claims out of a total of 10000, so that the density function is (850 / 10000) / 100 = 0.00085.

5.2. E. Variance = (losses in interval / N){1 - (losses in interval / N)} / {N (width of the interval)²} = (0.085)(1 - 0.085) / {(10000)(100²)}. Standard Deviation = 0.0000279.

5.3. C. F(100) = 0.6300 and F(200) = 0.8650, thus by linear interpolation the ogive at 120 is: (0.8)(0.6300) + (0.2)(0.8650) = 0.6770.

5.4. D. Variance = {0.8²(0.63)(1 - 0.63) + 0.2²(0.865)(1 - 0.865) + (2)(0.8)(0.2)(0.63)(1 - 0.865)} / 10,000. Standard Deviation = 0.00426.
Comment: Beyond what you are likely to be asked on your exam.

5.5. B. F(200) = 0.8650 and F(300) = 0.9500. Thus the estimate of the 90th percentile is between 200 and 300. We want: 0.90 = (0.8650)(300 - x)/100 + (0.9500)(x - 200)/100.
⇒ 90 = 259.5 - 0.8650x + 0.95x - 190. ⇒ x = 20.5/0.085 = 241.2.
Check: (0.588)(0.8650) + (0.412)(0.9500) = 0.900.

5.6. B. F(200) = 0.8650 and F(300) = 0.9500, thus by linear interpolation the ogive at 230 is: (0.7)(0.8650) + (0.3)(0.9500) = 0.8905. F(0) = 0 and F(100) = 0.6300, thus by linear interpolation the ogive at 60 is: (0.4)(0) + (0.6)(0.6300) = 0.3780. Prob[between 60 and 230] = 0.8905 - 0.3780 = 0.5125.
5.7. C. fn(x) = nj / {n (cj - cj-1)}, cj-1 ≤ x < cj.
f100(0.5) = 20/{(100)(1 - 0)} = 0.20. f100(1.5) = 15/{(100)(2 - 1)} = 0.15.
f100(2.5) = 25/{(100)(5 - 2)} = 0.0833. f100(5.5) = 40/{(100)(25 - 5)} = 0.02.
f100(0.5) + f100(1.5) + f100(2.5) + f100(5.5) = 0.20 + 0.15 + 0.0833 + 0.02 = 0.4533.
Comment: [Graph of the histogram: heights 0.2 on (0, 1), 0.15 on (1, 2), 0.0833 on (2, 5), and 0.02 on (5, 25).]
5.8. [Ogives (estimated distribution functions) plotted against claim size from 50 to 10,000, for each of the four groups: Male Employees, Female Employees, Spouses, and Children.]
The incidence of smaller size claims is greater for children than for adults. All of the adult ogives look somewhat similar to me. However, if one looks carefully, each of these distributions is somewhat different than the others. Comment: Similar to Exercise 13.6 in Loss Models. A portion of the same data is analyzed in Example 16.17 in Loss Models. In the early 1960s, spouses of employees who were covered under the employers health insurance plan were all or almost all female. Some of the differences between the adult distributions may be due to age. Comparing ogives is not a very useful way for most of us to distinguish between somewhat similar distributions. One could compare the difference of distribution functions. For example, here is a graph of the difference
between the male and female employee distributions:
[Graph of the difference of the two ogives versus claim size from 50 to 10,000, ranging between about -0.1 and 0.075.]
5.9. E. In the interval 25 to 100 of length 75, there are 260 claims out of a total of 500, so that the density function is: (260 / 500) / 75 = 13/1875 = 0.00693.
Comment: [Graph of the histogram: heights 3/100 on (0, 10), 3/250 on (10, 25), and 13/1875 on (25, 100).]
5.10. B. Variance = (losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²}
= (52%)(1 - 52%) / {(500)(75²)}. Standard Deviation = 0.0002979.
Alternately, the probability in the interval is estimated as: 260/500 = 52%. The variance of the estimated probability in the interval is: (52%)(1 - 52%) / 500 = 0.0004992. The estimate of the histogram is: (probability in the interval) / 75. Thus the variance of the estimate of the density at 30 is: 0.0004992 / 75². Standard Deviation = 0.0002979.

5.11. C. F(10) = 150/500 = 0.30, and F(25) = 240/500 = 0.48, thus by linear interpolation the ogive at 15 is: (2/3)(0.30) + (1/3)(0.48) = 0.36.
Comment: [Graph of the ogive: it rises linearly to 3/10 at 10, to 12/25 at 25, and to 1 at 100.]
5.12. A. The estimate is: (2/3)Fn(10) + (1/3)Fn(25). Therefore, its variance is:
(2/3)² Var[Fn(10)] + (1/3)² Var[Fn(25)] + (2)(2/3)(1/3) Cov[Fn(10), Fn(25)]
= (4/9)(0.3)(1 - 0.3)/500 + (1/9)(0.48)(1 - 0.48)/500 + (4/9)(0.3)(1 - 0.48)/500 = 0.0003808. Standard Deviation = 0.01951.
Alternately, in general, if x is in the interval from ai to bi, then
Var[Fn(x)] = {Fn(ai) Sn(ai) (bi - x)² + Fn(bi) Sn(bi) (x - ai)² + 2 Fn(ai) Sn(bi) (bi - x)(x - ai)} / {N (bi - ai)²}.
Variance = {(0.30)(1 - 0.30)(10²) + (0.48)(0.52)(5²) + (2)(0.30)(0.52)(10)(5)} / {(500)(15²)} = 0.0003808. Standard Deviation = 0.01951.
Comment: Beyond what you are likely to be asked on your exam.

5.13. C. F(25) = 0.48 and F(100) = 1. Thus the estimate of the 70th percentile is between 25 and 100. We want: 0.70 = (0.48)(100 - x)/75 + (1)(x - 25)/75.
⇒ 52.5 = 48 - 0.48x + x - 25. ⇒ x = 29.5/0.52 = 56.73.
Check: (0.48)(100 - 56.73)/75 + (1)(56.73 - 25)/75 = 0.700.

5.14. D. F(10) = 150/500 = 0.30, and F(25) = 240/500 = 0.48. Thus by linear interpolation the ogive at 13 is: (0.3)(12/15) + (0.48)(3/15) = 0.336. F(25) = 240/500 = 0.48, and F(100) = 1. Thus by linear interpolation the ogive at 42 is: (0.48)(58/75) + (1)(17/75) = 0.5979. Prob[between 13 and 42] = 0.5979 - 0.336 = 0.2619.
5.15. An Ogive assumes that the data is uniform on each interval. Of the 36 claims in the interval from 25,001 to 50,000, we assume 40% are of size 40,000 or more; there are (60%)(36) = 21.6 claims in the interval from 25,001 to 40,000, and (40%)(36) = 14.4 claims in the interval from 40,001 to 50,000. Thus there are a total of 33 + 14.4 = 47.4 claims of size more than 40,000.
E[X ∧ 40,000] = {(234)(500) + (416)(2500) + (134)(7500) + (101)(17,500) + (21.6)(32,500) + (47.4)(40,000)} / 954 = 7060.
For each interval from a to b, the second moment of the uniform is: (b³ - a³) / {3(b - a)}.
Lower Endpoint   Upper Endpoint   Number of Claims   First Moment   Second Moment
0                 1,000               234                 500            333,333
1,000             5,000               416               3,000         10,333,333
5,000            10,000               134               7,500         58,333,333
10,000           25,000               101              17,500        325,000,000
25,000           40,000                21.6            32,500      1,075,000,000
40,000           40,000                47.4            40,000      1,600,000,000
Total                                 954               7,060        151,025,507
Var[X ∧ 40,000] = 151,025,507 - 7060² = 101,181,907.
Comment: Data summarized from Table 3 of Sheldon Rosenbergʼs discussion of “On the Theory of Increased Limits and Excess of Loss Pricing”, PCAS 1977. The final interval includes all of the large losses that have been limited to 40,000.

5.16. E. The first interval 1000 to 2500 includes 8/12 = 2/3 of the losses, while the second interval contains 4/12 = 1/3 of the losses. 4000 is in the second interval, and in the second interval the histogram has height (1/3)/(5000 - 2500) = 0.000133.

5.17. D. The first interval 1000 to 2500 includes 8/12 = 2/3 of the losses, while the second interval contains 4/12 = 1/3 of the losses. Thus the second line segment of the Ogive goes from (2500, 2/3) to (5000, 1). It has a slope of (1/3)/(5000 - 2500). Thus for y = 0.81, x = 2500 + (0.81 - 0.6667)/{(1/3)/(5000 - 2500)} = 3575.

5.18. E. The second line segment of the Ogive goes from (2500, 2/3) to (5000, 1). Thus f(3000) = (1 - 2/3) / (5000 - 2500) = 1/7500. F(3000) = (2/3)(4/5) + (1)(1/5) = 11/15.
h(3000) = f(3000) / S(3000) = (1/7500) / (4/15) = 0.0005.
5.19. B. 1. False. Two random variables are independent if and only if their joint probability density function is the product of their individual probability density functions. If X and Y are independent, then E[XY] = E[X]E[Y], and thus both the covariance = E[XY] - E[X]E[Y] and the correlation = Covar[X,Y] / (Var[X] Var[Y])^0.5 are zero. However, the converse is not true. There are cases where the correlation is zero yet X and Y are dependent.
2. False. An Ogive is an estimate of the cumulative distribution function. A Histogram is an estimate of the probability density function.
3. True. F(x) is uniformly distributed on the interval [0, 1].

5.20. C. We wish to estimate the point at which F(x) = 0.55. F(17) = 0.5 and F(20) = 0.6. Linearly interpolating, we estimate F(18.5) = 0.55.

5.21. D. The areas of the rectangles are: (3)(0.1) = 0.3, (2)(0.2) = 0.4, and (1)(0.3) = 0.3. This is consistent with a frequency of 3, 4, 3, which has probabilities of 0.3, 0.4, 0.3.

5.22. C. 1. False. One can choose different ways to group the data, which results in different xi at which you graph points, which produces somewhat different looking ogives.
2. False. The Central Limit Theorem applies to either discrete or continuous distributions. The sum of many independent, identically distributed variables (with finite mean and variance) approaches a Normal Distribution.
3. True.

5.23. B. The estimate of f(3) is the height of the histogram at 3: 3/{(1)(10)} = 0.3. Variance of this estimate is: (0.3)(1 - 0.3)/10 = 0.021.
5.24. C.
Size of Loss        Amount Paid by Insurer
x < 1                       0
1 ≤ x ≤ 10                x - 1
x > 10                      9
The average amount per loss is:
∫[1 to 10] (x - 1) f(x) dx + 9{1 - F(10)} = ∫[1 to 3] (x - 1) f(x) dx + ∫[3 to 5] (x - 1) f(x) dx + ∫[5 to 10] (x - 1) f(x) dx + (9)(0.15)
= (0.10) ∫[1 to 3] (x - 1) dx + (0.20) ∫[3 to 5] (x - 1) dx + (0.03) ∫[5 to 10] (x - 1) dx + 1.35
= (0.10)(2) + (0.20)(6) + (0.03)(32.5) + 1.35 = 3.725.
The average payment per non-zero payment is: 3.725 / 0.9 = 4.139.
Comment: The area under the histogram is: (0.10)(3) + (2)(0.20) + (10)(0.03) = 30% + 40% + 30% = 100%. Thus this is indeed a probability density function. The average size of loss equals 5.05. The average amount paid by an insured per loss is 0.95. The average amount paid by the reinsurer per loss (whether or not the reinsurer makes a payment) is 0.375. Note that 0.95 + 3.725 + 0.375 = 5.05.

5.25. E. The ogive is an approximation to the distribution function; the question asked for the corresponding probability density function or histogram. x1 = 4 is in the second interval of width 6. There are 6 out of 20 claims in this interval. Therefore, h(x1) = (6/20)/6 = 0.050. x2 = 10 is in the third interval of width 7. There are 4 out of 20 claims in this interval. Therefore h(x2) = (4/20)/7 = 0.029.

5.26. A. The interval [2, 5) of length 3 has 4 claims out of the total of 9 claims. Thus the empirical p.d.f. at x = 3 is (4/3)/9 = 0.148.

5.27. A. The histogram is an approximation to the Probability Density Function (p.d.f.). The interval that contains claims of size 50 has 10 claims out of 50 claims, 20% of the total. This interval has a width of 100.5 - 10.5 = 90. So the histogram is 0.2 / 90 = 0.0022.

5.28. B. The interval 2500 to 4500 has a width of 2000. Since 3 out of 10 observed claims are in this interval, the probability covered by this interval is: 3/10. Thus the height of the histogram in this interval is (3/10)/2000 = 0.00015.

5.29. A. Since the Ogive is an approximate cumulative Distribution Function, its derivative is an approximate probability density function.
5.30. E. Company Aʼs share price X is less dispersed about the mean share price of 100 than Company Bʼs share price Y. ⇒ Var(X) < Var(Y). A daily sales revenue above 100 for Company A was always accompanied by a daily sales revenue below 100 for Company B, and vice versa. ⇒ Corr[X, Y] < 0. ⇒ Cov[X, Y] < 0.
⇒ Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X,Y) < Var(X) + Var(Y).

5.31. B. (22/2 + 25 + 18/5)/100 = 0.396.
The empirical distribution function is 0.16 at 1000, 0.38 at 3000, 0.63 at 5000, and 0.81 at 10000. Therefore, the height of the ogive at 2000 is: (0.16 + 0.38)/2 = 0.27. The height of the ogive at 6000 is: (4/5)(0.63) + (1/5)(0.81) = 0.666. Prob[2000 < X < 6000] = 0.666 - 0.27 = 0.396.
Comment: The ogive is an approximate distribution function. [Graph of this ogive, shown up to 10,000.]
The desired probability is the difference of the height of the ogive at 6000, 0.666, and the height of the ogive at 2000, 0.270.
5.32. E. At 1000 the empirical distribution function is: (200 + 110)/500 = 310/500. At 2000 the empirical distribution function is: (310 + x)/500. At 5000 the empirical distribution function is: (310 + x + y)/500. Therefore, linearly interpolating, the height of the ogive at 1500 is: (310/500)(.5) + {(310 + x)/500}(.5) = (310 + .5x)/500. Similarly, the height of the ogive at 3500 is: {(310 + x)/500}(.5) + {(310 + x + y)/500}(.5) = (310 + x + .5y)/500. (310 + .5x)/500 = .689. ⇒ 310 + .5x = 344.5. ⇒ x = 69. (310 + x + .5y)/500 = .839 ⇒ 310 + x+ .5y = 419.5. ⇒ y = 219 - 2x = 81. Comment: Given two outputs, solve for two missing inputs. The ogive is a series of straight lines between the values of the empirical distribution function at the endpoints of the intervals. 5.33. A. Fn (50) = 36/(200 + x + y). Fn (150) = (36 + x)/(200 + x + y). Fn (90) = .6 Fn (50) + .4 Fn (150) = (36 + .4x)/(200 + x + y). Fn (250) = (36 + x + y)/(200 + x + y). Fn (210) = .4 Fn (150) + .6 Fn (250) = (36 + x + .6y)/(200 + x + y). .21 = (36 + .4x)/(200 + x + y). ⇒ (.21)(200 + x + y) = 36 + .4x. ⇒ .19x - .21y = 6. .51 = (36 + x + .6y)/(200 + x + y). ⇒ (.51)(200 + x + y) = 36 + x + .6y. ⇒ .49x + .09y = 66. Solving these 2 equations in 2 unknowns: x = {(6)(.09) + (66)(.21)}/{(.19)(.09) + (.49)(.21)} = 120, and y = {(.19)(120) - 6}/.21 = 80.
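The ogive and histogram calculations above are straightforward to mechanize. Below is a minimal Python sketch (my own illustration, not part of the original text; the function names are mine), using the grouped data of 4, 5/05, Q.26; it reproduces the 0.396 of solution 5.31.

boundaries = [0, 1000, 3000, 5000, 10000, 25000, 50000, 100000]  # interval endpoints
counts     = [16, 22, 25, 18, 10, 5, 3]   # claims in each interval (the one claim over 100,000 is ignored)
n = 100.0

# cumulative empirical distribution function at each interval endpoint
cum = [0.0]
for c in counts:
    cum.append(cum[-1] + c / n)

def ogive(x):
    """Linearly interpolate the empirical distribution function between interval endpoints."""
    for j in range(len(counts)):
        a, b = boundaries[j], boundaries[j + 1]
        if a <= x <= b:
            w = (x - a) / (b - a)
            return (1 - w) * cum[j] + w * cum[j + 1]
    return None

def histogram(x):
    """Height of the histogram (estimated density) on the interval containing x."""
    for j in range(len(counts)):
        a, b = boundaries[j], boundaries[j + 1]
        if a <= x < b:
            return (counts[j] / n) / (b - a)
    return None

print(ogive(6000) - ogive(2000))   # 0.666 - 0.270 = 0.396, as in solution 5.31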
Section 6, Kernel Smoothing³⁵

As discussed previously, the empirical distribution function and corresponding empirical model assigns probability 1/n to each of n observed values. For example, with the following observations: 81, 157, 213, the probability function (pdf) of the corresponding empirical model is: p(81) = 1/3, p(157) = 1/3, p(213) = 1/3.³⁶
In a Kernel Smoothing Model, such a discrete model is smoothed using a “kernel” function. Examples are the uniform kernel, the triangular kernel, and the gamma kernel. In each case, the mean of the smoothed model is the same as the original empirical mean.

Uniform Kernel:

The simplest case uses the uniform kernel. In general, for a uniform kernel of bandwidth b, we have a uniform distribution from yj - b to yj + b, centered at each of the data points yj: a rectangle of height 1/(2b) and width 2b.
Then one weights these uniform distributions together in order to get the smoothed model. In the case of the uniform kernel, the wider the bandwidth of each uniform distribution, the more smoothing.
One could smooth the above empirical model using a uniform kernel with for example a bandwidth of 50. Rather than a point mass of probability at each data point, we spread the probability over an interval. For example, the 1/3 point mass at 81 is spread over the interval from 81 - 50 = 31 to 81 + 50 = 131, of width (2)(50) = 100 centered at 81.
For the uniform kernel, the kernel density is: ky(x) = 1 / (2b), y - b ≤ x ≤ y + b.³⁷
So for example, for a bandwidth of 50, for the kernel centered at 157, k157(x) = 1/100, 107 ≤ x ≤ 207, and zero elsewhere.
The Uniform Kernel smoothing model is: Uniform[31, 131] / 3 + Uniform[107, 207] / 3 + Uniform[163, 263] / 3.
35 See Section 14.3 in Loss Models.
36 Note that the empirical model depends only on the data set, and does not depend on which type of kernel we use for smoothing.
37 Note that the endpoints are included in the uniform kernel.
It has pdf of: 0 for x < 31, 1/300 for 31 ≤ x < 107, 2/300 for 107 ≤ x ≤ 131, 1/300 for 131 < x < 163, 2/300 for 163 ≤ x ≤ 207, 1/300 for 207 < x ≤ 263, 0 for x > 263.³⁸
The three separate uniform kernels, centered at 81, 157 and 213, look as follows:
[Graphs of the three uniform kernels: each is a rectangle of height 0.01 and width 100, centered at 81, 157, and 213 respectively.]
38
Each uniform has density of 1/100. The empirical model is 1/3 at each of the observed points. So each contribution is 1/300. The first uniform starts to contribute at 31. So starting at 31 we have 1/300. The second uniform starts to contribute at 107. So starting at 107 we have: 1/300 + 1/300 = 2/300. The first uniform stops contributing after 131. So after 131 we have 1/300, etc.
Note that each uniform kernel is discontinuous at its endpoints. This uniform kernel smoothed density is an average of the individual uniform kernels, and looks as follows:³⁹
[Graph of the uniform kernel smoothed density from 0 to 300: a step function with heights 1/300 and 2/300.]
Note that the kernel smoothed density has jump discontinuities at: 31, 107, 131, 163, 207, and 263. One could apply smoothing using the uniform kernel to larger data sets in a similar manner. Exercise: What is the density at 95,000 of the kernel smoothed density for the ungrouped data in Section 2, using a uniform distribution with a bandwidth of 5,000? [Solution: The loss of size 91,700 contributes: Uniform Distribution[86,700, 96,700]/130. The loss of size 96,600 contributes: Uniform Distribution[91,600, 101,600]/130. The loss of size 96,900 contributes: Uniform Distribution[91,900, 101,900]/130. Thus the density at 95,000 of the kernel smoothed density is: (3/130)/10,000 = 0.000002308. Comment: The bandwidth is 5000. Since these are the only three loss sizes within 5000 of 95,000, these are the only three that contribute to the kernel smoothed density at 95,000. The ungrouped data in Section 2 has 130 losses.] In general, the larger the bandwidth, the more smoothing. In practical applications, one wants to smooth out the noise (random fluctuation) and retain the signal (useful information). An actuary will try several different bandwidths, and choose one that appears to have an appropriate balance of these two competing goals.
39
In general we would weight the kernels together, with for example a point that appeared twice in the data set getting twice the weight.
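As an illustration, here is a minimal Python sketch (my own, not from the text; the function name is mine) of the uniform kernel smoothed density for the three observations 81, 157, 213 with bandwidth 50; it reproduces the heights 2/300 and 1/300 of the pdf given above.

data = [81, 157, 213]
b = 50.0   # bandwidth

def uniform_kernel_density(x, data, b):
    # Each observation y contributes a Uniform[y - b, y + b] density of height 1/(2b),
    # weighted by its empirical probability 1/n.
    n = len(data)
    return sum(1.0 / (2 * b) for y in data if y - b <= x <= y + b) / n

print(uniform_kernel_density(120, data, b))   # 2/300 = 0.00667: the kernels at 81 and 157 both contribute
print(uniform_kernel_density(145, data, b))   # 1/300 = 0.00333: only the kernel at 157 contributes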
For example, shown out to 200,000, here is 100,000 times the kernel smoothed density for the ungrouped data in Section 2, using a uniform kernel with a bandwidth of 5,000:⁴⁰
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
Here is 100,000 times the uniform kernel smoothed density with a wider bandwidth of 10,000, and thus more smoothing:
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
40 Note that a very small probability has been assigned to negative loss sizes. This could be avoided by using a more complicated kernel which is not on the Syllabus.
Here is 100,000 times the uniform kernel smoothed density with an even wider bandwidth of 25,000, and thus even more smoothing:⁴¹
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
Based on this graph, one would estimate that the mode, the place where the density is largest, is somewhere near 25,000.
41
In practical applications, one would want enough smoothing in order to remove most of the effects of random fluctuations, while avoiding too much smoothing, which removes the informational content. In this case, a bandwidth of 25,000 seems to produce a little too much smoothing, while a bandwidth of 5000 seems to produce not enough smoothing.
Variance of the Estimate of the Density at a Single Point: For the ungrouped data in Section 2, out of 130 losses, 21 are in the interval [100,000, 150,000]. Thus using a bandwidth of 25,000, the kernel smoothed density at 125,000 is: (21/ 130) / 50,000 = 0.000003231. This is an estimate of f(125,000) for the distribution from which this data was drawn. It is mathematically equivalent to the estimate from a histogram with an interval [100,000, 150,000]. The histogram has variance:
(losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²}
= (21/130)(109/130) / {(130)(50,000²)}. The standard deviation of the estimate is 0.000000646.
Thus a 95% confidence interval for f(125,000) is: 0.000003231 ± (1.96)(0.000000646).
For 100,000 times the uniform kernel smoothed density with a bandwidth of 25,000, here is a graph of the point estimate plus or minus 1.96 standard deviations of that estimate:
[Graph of the point estimate and the confidence band versus size of loss, out to 200,000.]
In general, the variance of the estimate of the density at x from using a uniform kernel is:
(z / N) {1 - (z / N)} / {N (2b)²}, where z is the number of items in the interval [x - b, x + b].
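For example, here is a minimal Python sketch (mine, not from the text) of the point estimate of f(125,000) and its standard deviation, under the assumptions just stated (130 losses, 21 of them in [100,000, 150,000], bandwidth 25,000).

N, z, b = 130, 21, 25000

estimate = (z / N) / (2 * b)                          # (21/130)/50,000 = 0.000003231
variance = (z / N) * (1 - z / N) / (N * (2 * b)**2)
std_dev = variance ** 0.5                             # about 0.000000646

lower, upper = estimate - 1.96 * std_dev, estimate + 1.96 * std_dev   # 95% confidence interval
print(estimate, std_dev, (lower, upper))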
Kernels Related to the Uniform Distribution:

The uniform kernel is a member of a family of kernels:⁴²
ky(x) = {1 - ((x - y)/b)²}ⁿ / {b β(n+1, 1/2)}, y - b ≤ x ≤ y + b.
n = 0 ⇔ Uniform Kernel. ky(x) = 1/(2b), y - b ≤ x ≤ y + b.
n = 1 ⇔ Epanechnikov Kernel. ky(x) = {3/(4b)} {1 - ((x - y)/b)²}, y - b ≤ x ≤ y + b.
Here is a graph of an Epanechnikov Kernel, centered at 0 with bandwidth 1, k0(x) = (3/4)(1 - x²), -1 ≤ x ≤ 1:
[Graph of the density (3/4)(1 - x²) on -1 ≤ x ≤ 1, an inverted parabola with maximum 0.75 at 0.]
We can think of the uniform kernel as coming from the density: 1/2, -1 ≤ x ≤ 1. We center each kernel at an observed point, and introduce a scale via the bandwidth b. Similarly, we can think of an Epanechnikov Kernel as coming from the density: (3/4)(1 - x²), -1 ≤ x ≤ 1. We center each kernel at an observed point, and introduce a scale via the bandwidth b.
42 The area under the kernel is 1. Do not memorize the formulas for this family of kernels. See Klein and Moeschberger, Survival Analysis, no longer on the Syllabus.
Exercise: Verify that as with all kernels, the Epanechnikov Kernel has an area of 1.
[Solution: {3/(4b)} ∫[y-b to y+b] {1 - ((x - y)/b)²} dx = {3/(4b)} {2b - b³/(3b²) + (-b)³/(3b²)} = 1.]

Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6. Using an Epanechnikov kernel with bandwidth 1, estimate the density function at 1.5.
[Solution: Only those kernels centered at data points within 1 of 1.5 contribute.
k1.8(1.5) = (3/4)(1 - 0.3²) = 0.6825. k2.2(1.5) = (3/4)(1 - 0.7²) = 0.3825.
The estimate of f(1.5) is: (0 + 0.6825 + 0.3825 + 0)/4 = 0.26625.
Comment: [Graph of the kernel smoothed density from 0 to 4.]]
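Here is a minimal Python sketch (mine, with my own function name) of this Epanechnikov kernel density estimate; it reproduces f(1.5) = 0.26625.

data = [0.3, 1.8, 2.2, 3.6]
b = 1.0

def epanechnikov(x, y, b):
    # kernel centered at y: (3/(4b)) {1 - ((x - y)/b)^2} for |x - y| <= b, else 0
    u = (x - y) / b
    return 0.75 / b * (1 - u * u) if abs(u) <= 1 else 0.0

f_hat = sum(epanechnikov(1.5, y, b) for y in data) / len(data)
print(f_hat)   # (0 + 0.6825 + 0.3825 + 0)/4 = 0.26625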
Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6. Using an Epanechnikov kernel with bandwidth 1, estimate the distribution function at 1.5.
[Solution: All of the kernel centered at 0.3 is to the left of 1.5; it contributes 1 to F(1.5). None of the kernel centered at 3.6 is to the left of 1.5; it contributes 0 to F(1.5).
The contribution of the kernel centered at 1.8 is: (3/4) ∫[0.8 to 1.5] {1 - (x - 1.8)²} dx = (3/4) {0.7 - (-0.3)³/3 + (-1)³/3} = 0.28175.
The contribution of the kernel centered at 2.2 is: (3/4) ∫[1.2 to 1.5] {1 - (x - 2.2)²} dx = (3/4) {0.3 - (-0.7)³/3 + (-1)³/3} = 0.06075.
The estimate of F(1.5) is: (1 + 0.28175 + 0.06075 + 0)/4 = 0.335625.
Comment: In general, for an Epanechnikov kernel centered at y, the contribution to F(x) is:
0, for x ≤ y - b;
1/2 + 3(x - y)/(4b) - (x - y)³/(4b³), for y - b < x < y + b;
1, for x ≥ y + b.]

n = 2 ⇔ Biweight or Quartic Kernel. ky(x) = {15/(16b)} {1 - ((x - y)/b)²}², y - b ≤ x ≤ y + b.
We can think of a Biweight Kernel as coming from the density: (15/16)(1 - x²)², -1 ≤ x ≤ 1. We center each kernel at an observed point, and introduce a scale via the bandwidth b.

Exercise: Verify that as with all kernels, the Biweight Kernel has an area of 1.
[Solution: {15/(16b)} ∫[y-b to y+b] {1 - ((x - y)/b)²}² dx = {15/(16b)} ∫[y-b to y+b] 1 - 2((x - y)/b)² + ((x - y)/b)⁴ dx
= {15/(16b)} {2b - 2b³/(3b²) + 2(-b)³/(3b²) + b⁵/(5b⁴) - (-b)⁵/(5b⁴)} = (15/16)(2 - 4/3 + 2/5) = 1.]
Here is a graph comparing a Biweight Kernel and an Epanechnikov Kernel, each with bandwidth of one, each centered at zero:
[Graph of the two densities on -1 ≤ x ≤ 1; the Biweight has a higher peak (15/16) than the Epanechnikov (3/4).]
The BiWeight kernel is more highly peaked than the Epanechnikov kernel, which in turn is more highly peaked than the uniform kernel (which has no peak).

Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6. Using a BiWeight kernel with bandwidth 1, estimate the density function at 1.5.
[Solution: Only those kernels centered at data points within 1 of 1.5 contribute.
k1.8(1.5) = (15/16)(1 - 0.3²)² = 0.77634. k2.2(1.5) = (15/16)(1 - 0.7²)² = 0.24384.
The estimate of f(1.5) is: (0 + 0.77634 + 0.24384 + 0)/4 = 0.25505.]
Exercise: You are given four losses: 0.3, 1.8, 2.2, 3.6. Using a Biweight kernel with bandwidth 1, estimate the distribution function at 1.5.
[Solution: All of the kernel centered at 0.3 is to the left of 1.5; it contributes 1 to F(1.5). None of the kernel centered at 3.6 is to the left of 1.5; it contributes 0 to F(1.5).
The contribution of the kernel centered at 1.8 is:
(15/16) ∫[0.8 to 1.5] {1 - (x - 1.8)²}² dx = (15/16) ∫[0.8 to 1.5] 1 - 2(x - 1.8)² + (x - 1.8)⁴ dx
= (15/16) {0.7 - (2)(-0.3)³/3 + (2)(-1)³/3 + (-0.3)⁵/5 - (-1)⁵/5} = 0.23517.
The contribution of the kernel centered at 2.2 is:
(15/16) ∫[1.2 to 1.5] {1 - (x - 2.2)²}² dx = (15/16) ∫[1.2 to 1.5] 1 - 2(x - 2.2)² + (x - 2.2)⁴ dx
= (15/16) {0.3 - (2)(-0.7)³/3 + (2)(-1)³/3 + (-0.7)⁵/5 - (-1)⁵/5} = 0.02661.
The estimate of F(1.5) is: (1 + 0.23517 + 0.02661 + 0)/4 = 0.31545.]

One can generalize this family by allowing n to be non-integer:⁴³
ky(x) = {1 - ((x - y)/b)²}ⁿ / {b β(n+1, 1/2)}, y - b ≤ x ≤ y + b.
For n = 1/2, ky(x) = {2/(bπ)} √(1 - {(x - y)/b}²), y - b ≤ x ≤ y + b.⁴⁴
Here is a graph of an example of this kernel, a semicircle, centered at 0 with bandwidth 1:
[Graph of the semicircular density on -1 ≤ x ≤ 1, with its maximum of 2/π at 0.]
43 The Beta Function is discussed in "Mahler's Guide to Conjugate Priors."
44 The area under the kernel is 1. See 4, 5/05, Q.22.
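As a check, here is a minimal Python sketch (mine, not from the text) that numerically verifies that members of this family integrate to 1, writing the Beta function in terms of Gamma functions.

from math import gamma

def beta(a, c):
    return gamma(a) * gamma(c) / gamma(a + c)

def kernel(x, y, b, n):
    u = (x - y) / b
    return (1 - u * u) ** n / (b * beta(n + 1, 0.5)) if abs(u) <= 1 else 0.0

y, b = 0.0, 1.0
for n in (0, 0.5, 1, 2):   # uniform, semicircle, Epanechnikov, Biweight
    steps = 20000
    h = 2 * b / steps
    area = sum(kernel(y - b + (i + 0.5) * h, y, b, n) for i in range(steps)) * h
    print(n, round(area, 4))   # each area is 1.0, to within the numerical error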
Triangular Kernel: The triangular kernel centers a triangular density at each of the observed points. The triangular density has width 2b and height 1/b, where b is the bandwidth.
[Diagram of the triangular kernel: a triangle of height 1/b and width 2b.]
The area of this triangle is 1; the area under any density is 1. One could smooth the above empirical model, p(81) = 1/3, p(157) = 1/3, p(213) = 1/3, using a triangular kernel with a bandwidth of for example 50. Rather than a point mass of probability at each data point, we spread the probability using the triangular density. The 1/3 point mass at 81 is spread over the interval 31 to 131, of width (2)(50) = 100 centered at 81, with more weight to values near 81 and less weight to those near the endpoints of the interval.
Exercise: What is the density at 61 for a triangular density centered at 81 with bandwidth 50?
[Solution: The density is 0 at 81 - 50 = 31 and 1/50 = 0.02 at 81.
[Diagram: the left side of the triangle, rising from 0 at 31 to 0.02 at 81; the height at 61 is 0.012.]
Linearly interpolating, the density at 61 is: (3/5)(0.02) = 0.012.]
Thus the kernel smoothed density at 61 is: 0.012/3 = 0.004. Only the triangle centered at 81 contributes at 61. However at 120, both the triangles centered at 81 and 157 contribute. The density at 120 of the first triangular kernel centered at 81 is: (0.02)(131 - 120)/50 = 0.0044.
[Diagram: the right side of the triangle centered at 81, falling from 0.02 at 81 to 0 at 131; the height at 120 is 0.0044.]
Instead of drawing a diagram, one can use the following formula for the triangular kernel:
ky(x) = (b - |x - y|) / b², y - b ≤ x ≤ y + b.
For example, k81(120) = (50 - |120 - 81|)/50² = 11/50² = 0.0044.
(b - |x - y|) is the distance of x from the closer endpoint, y - b or y + b. (b - |x - y|)/b is the ratio of the distance of x from the closer endpoint to the distance from the center to this endpoint. {(b - |x - y|)/b}(1/b) = (b - |x - y|)/b², is the linearly interpolated height of the triangle at x.
The density at 120 of the 2nd triangular kernel centered at 157 is: (0.02)(120 - 107)/50 = 0.0052.
Therefore, the smoothed density at 120 is: (0.0044)/3 + (0.0052)/3 = 0.0032.
This smoothing model is: Triangle[31, 131] / 3 + Triangle[107, 207] / 3 + Triangle[163, 263] / 3.
It has pdf of:
0 for x < 31,
(x - 31)/7500 for 31 ≤ x < 81,
(131 - x)/7500 for 81 ≤ x < 107,
(131 - x)/7500 + (x - 107)/7500 = 0.0032 for 107 ≤ x ≤ 131,
(x - 107)/7500 for 131 < x < 157,
(207 - x)/7500 for 157 < x < 163,
(207 - x)/7500 + (x - 163)/7500 = 0.00587 for 163 ≤ x ≤ 207,
(x - 163)/7500 for 207 < x ≤ 213,
(263 - x)/7500 for 213 < x ≤ 263,
0 for x > 263.
The three separate triangular kernels, centered at 81, 157 and 213, look as follows:
[Three diagrams: triangles of height 0.02 over (31, 131), (107, 207), and (163, 263).]
Note that the slope of each triangular kernel changes at its endpoints and peak. So for example, the first triangular kernel is not differentiable at: 31, 81, and 131. This triangular kernel smoothed density is an average of the individual triangular kernels, and looks as follows:⁴⁵
[Graph of the triangular kernel smoothed density, piecewise linear with breakpoints at 31, 81, 107, 131, 157, 163, 207, 213, and 263.]
Note that this triangular kernel smoothed density is not differentiable at: 31, 81, 107, 131, 157, 163, 207, 213, and 263. It is level between 107 and 131, as well as between 163 and 207; in these intervals the decreasing contribution from one triangle kernel is offset by the increasing contribution from another triangle kernel.
45 In general we would weight the kernels together, with for example a point that appeared twice in the data set getting twice the weight.
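Here is a minimal Python sketch (mine, not from the text; function names are my own) of the triangular kernel smoothed density for this example; it reproduces the values 0.004 at 61 and 0.0032 at 120 worked out above.

data = [81, 157, 213]
b = 50.0

def triangle_kernel(x, y, b):
    # k_y(x) = (b - |x - y|)/b^2 for |x - y| <= b, else 0
    return max(b - abs(x - y), 0.0) / b ** 2

def smoothed_density(x, data, b):
    return sum(triangle_kernel(x, y, b) for y in data) / len(data)

print(smoothed_density(61, data, b))    # 0.004   (only the kernel at 81 contributes)
print(smoothed_density(120, data, b))   # 0.0032  (the kernels at 81 and 157 contribute)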
Shown out to 200,000, here is 100,000 times the kernel smoothed density for the ungrouped data in Section 2, using a triangular kernel with a bandwidth of 10,000:⁴⁶
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
For the ungrouped data in Section 2, here is 100,000 times the triangular kernel smoothed density with a wider bandwidth of 50,000, and thus more smoothing:
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
46 The computer, in graphing this density, has made the triangles less obvious than they in fact are. Note that a very small probability has been assigned to negative loss sizes. This can be avoided by using a more complicated kernel, not on the Syllabus.
Gamma Kernel:

One can turn a size of loss distribution into a kernel. For example, the Gamma Kernel has a mean equal to an observed point. Specifically, if y is the observed point, the Gamma Kernel is a Gamma density with parameters α and θ = y/α, mean y and coefficient of variation 1/√α. The smaller α, the larger the CV and the more smoothing. For example, here is a Gamma Kernel with α = 10, θ = 8.1, and mean 81:
[Graph of this Gamma density, from 0 to about 200.]
Unlike the previous kernels, the gamma kernel has support 0 to ∞. Therefore, all of the individual gamma distributions contribute something to the gamma kernel smoothed density at any point greater than zero.⁴⁷
One could smooth the above empirical model, p(81) = 1/3, p(157) = 1/3, p(213) = 1/3, using a gamma kernel with for example α = 10. Rather than a point mass of probability at each data point, we spread the probability using the gamma density.
Exercise: What is the density at 120 for a Gamma with mean 81 and α = 10?
[Solution: α = 10 and θ = 81/10 = 8.1. The density of a Gamma is: f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ(α) = 8.1^(-10) x⁹ e^(-x/8.1) / 9! = 2.267 x 10^(-15) x⁹ e^(-x/8.1). f(120) = 0.00431.]
Similarly, the density at 120 for a gamma density with mean 157 and α = 10 is 0.00749, and the density at 120 for a gamma density with mean 213 and α = 10 is 0.00264.
The smoothed density at 120 is: (0.00431 + 0.00749 + 0.00264)/3 = 0.00481.
47 Also, there is no density assigned to negative values, as can be the case with either the uniform or triangular kernels.
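Here is a minimal Python sketch (mine, not from the text) of the gamma kernel smoothed density for this example, with α assumed to be a positive integer so that Γ(α) = (α - 1)!; it reproduces the 0.00481 at 120.

from math import exp, factorial

def gamma_kernel(x, y, alpha):
    # Gamma density with mean y: parameters alpha and theta = y/alpha (alpha a positive integer here)
    theta = y / alpha
    return x ** (alpha - 1) * exp(-x / theta) / (theta ** alpha * factorial(alpha - 1))

data = [81, 157, 213]
alpha = 10
print(sum(gamma_kernel(120, y, alpha) for y in data) / len(data))   # about 0.00481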
The Gamma Kernel smoothing model is: Gamma[10, 8.1] / 3 + Gamma[10, 15.7] / 3 + Gamma[10, 26.3] / 3.
This gamma kernel smoothed density looks as follows:
[Graph of the gamma kernel smoothed density, from 0 to about 400.]
Shown out to 200,000, here is 100,000 times the kernel smoothed density for the ungrouped data in Section 2, using a gamma kernel with α = 4:
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
For the ungrouped data in Section 2, here is 100,000 times the gamma kernel smoothed density with a smaller α of 2, and thus more smoothing:
[Graph of the kernel smoothed density versus size of loss, out to 200,000.]
Distribution Function, Uniform Kernel:⁴⁸

One can also use kernels to smooth distribution functions. One sees how much of the area from a kernel is to the left of the value at which one wants the smoothed distribution.
For the data 81, 157, 213, and a uniform kernel with a bandwidth of 50, here is how one computes the smoothed distribution at 120. The uniform kernel centered at 81 goes from 31 to 131. Thus (120 - 31)/100 = 0.89 of its area is to the left of 120. The uniform kernel centered at 157 goes from 107 to 207. Thus (120 - 107)/100 = 0.13 of its area is to the left of 120. The uniform kernel centered at 213 goes from 163 to 263. Thus none of its area is to the left of 120.
Thus the smoothed distribution at 120 is: 0.89/3 + 0.13/3 + 0/3 = 0.340.
A kernel density estimator of the distribution function corresponding to a discrete p(yj) is:
F̂(x) = Σ p(yj) Kyj(x), with the sum taken over j = 1 to n.
For the uniform kernel, Ky(x) = 0 for x < y - b; {x - (y - b)}/(2b) for y - b ≤ x ≤ y + b; 1 for x > y + b.
For the uniform kernel with bandwidth 50 and centered at 81:
K81(x) = 0 for x < 31; (x - 31)/100 for 31 ≤ x ≤ 131; 1 for x > 131.
48 See 4, 11/04, Q.20, and 4, 11/06, Q.24, and 4, 5/07, Q.16.
For the uniform kernel centered at 157: K157(x) = 0 for x < 107; (x - 107)/100 for 107 ≤ x ≤ 207; 1 for x > 207.
For the uniform kernel centered at 213: K213(x) = 0 for x < 163; (x - 163)/100 for 163 ≤ x ≤ 263; 1 for x > 263.
Then the uniform kernel smoothed Distribution Function is:
K81(x)/3 + K157(x)/3 + K213(x)/3 =
0, for x < 31;
(x - 31)/300, for 31 ≤ x < 107;
(x - 31)/300 + (x - 107)/300, for 107 ≤ x < 131;
1/3 + (x - 107)/300, for 131 ≤ x < 163;
1/3 + (x - 107)/300 + (x - 163)/300, for 163 ≤ x < 207;
2/3 + (x - 163)/300, for 207 ≤ x < 263;
1, for 263 ≤ x.
Exercise: Using this algebraic form, determine the smoothed distribution at 120.
[Solution: (120 - 31)/300 + (120 - 107)/300 = 0.340, matching the previous result.]
Here is a graph of the uniform kernel smoothed distribution function:
[Graph of F(x): connected line segments reaching 19/75 at 107, 31/75 at 131, 13/25 at 163, 61/75 at 207, and 1 at 263.]
The uniform kernel smoothed distribution is a series of connected line segments, where the slope changes where the smoothed density has jump discontinuities. F(131) = 31/75 < 0.5, and F(163) = 13/25 > 0.5, thus the median of the uniform kernel smoothed distribution function is between 131 and 163. Linearly interpolating, the median is at: {(0.5 - 31/75)(163) + (13/25 - 0.5)(131)} / (13/25 - 31/75) = 157. Alternately, use the algebraic form of F(x) for x between 131 and 163. Set 0.5 = 1/3 + (x - 107)/300. ⇒ x = 157.
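Here is a minimal Python sketch (mine, not from the text; names are my own) of this uniform kernel smoothed distribution function; it reproduces F(120) = 0.340 and a median of about 157.

data = [81, 157, 213]
b = 50.0

def K_uniform(x, y, b):
    # fraction of the uniform kernel centered at y that lies to the left of x
    if x < y - b:
        return 0.0
    if x > y + b:
        return 1.0
    return (x - (y - b)) / (2 * b)

def F_hat(x):
    return sum(K_uniform(x, y, b) for y in data) / len(data)

print(F_hat(120))        # 0.340

# crude search for the median: the smallest x with F_hat(x) >= 0.5
x = 31.0
while F_hat(x) < 0.5:
    x += 0.01
print(round(x, 2))       # about 157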
Distribution Function, Triangle Kernel:⁴⁹

For the triangle kernel, it takes more work to compute the areas than for the uniform kernel. However, the idea is the same; we need to determine the area to the left of a vertical line.
Exercise: For the data 81, 157, 213, and a triangular kernel with a bandwidth of 50, what is the smoothed distribution at 180?
[Solution: The triangular kernel centered at 81 goes from 31 to 131. Thus all of its area is to the left of 180.
The triangular kernel centered at 157 goes from 107 to 207.
[Diagram: the triangle over (107, 207) with peak 0.02 at 157, cut by a vertical line at 180.]
The triangle to the right of 180 has width 27, height (27/50)(1/50), and area (27/50)²/2. Thus 1 - (27/50)²/2 = 0.8542 of the large triangle's area is to the left of 180.
The triangular kernel centered at 213 goes from 163 to 263.
[Diagram: the triangle over (163, 263) with peak 0.02 at 213, cut by a vertical line at 180.]
The triangle to the left of 180 has width 17, height (17/50)(1/50), and area (17/50)²/2. Thus (17/50)²/2 = 0.0578 of the large triangle's area is to the left of 180.
The smoothed distribution at 180 is: 1/3 + 0.8542/3 + 0.0578/3 = 0.637.]
49 See 4, 11/05, Q.9.
Fitting Loss Distributions §6 Kernel Smoothing,
HCM 10/14/12,
Page 68
A kernel density estimator of the distribution function corresponding to a discrete p(yj) is: n
F(x) = ∑ p(yj) K yj(x) . ^
j=1
0, for x < y - b ⎧ ⎪ ⎪ {x - (y - b)}2 ⎪ , for y - b ≤ x < y ⎪ 2 b2 For the triangle kernel, Ky(x) = ⎨ . ⎪ {x - (y + b)}2 , for y ≤ x < y + b ⎪1 2 b2 ⎪ ⎪ 1, for x ≥ y + b ⎩
For the triangle kernel with bandwidth 50 and centered at 81: 0, for x < 31 ⎧ ⎪ ⎪ (x - 31)2 ⎪ , for 31 ≤ x < 81 5000 ⎪ K81(x) = ⎨ . 2 ⎪ (x - 131) , for 81 ≤ x < 131 ⎪1 5000 ⎪ ⎪ 1, for x ≥ 131 ⎩
For the triangle kernel centered at 157: 0, for x < 107 ⎧ ⎪ ⎪ (x - 107) 2 ⎪ , for 107 ≤ x < 157 5000 ⎪ K157(x) = ⎨ . ⎪ (x - 207)2 , for 157 ≤ x < 207 ⎪1 5000 ⎪ ⎪ 1, for x ≥ 207 ⎩
2013-4-6,
Fitting Loss Distributions §6 Kernel Smoothing,
HCM 10/14/12,
Page 69
For the triangle kernel centered at 213: 0, for x < 163 ⎧ ⎪ ⎪ (x - 163) 2 ⎪ , for 163 ≤ x < 213 5000 ⎪ K213(x) = ⎨ . ⎪ (x - 263)2 , for 213 ≤ x < 263 ⎪1 5000 ⎪ ⎪ 1, for x ≥ 263 ⎩
Then the triangle kernel smoothed Distribution Function is: 0, for x < 31 ⎧ ⎪ ⎪ (x - 31)2 ⎪ , for 31 ≤ x < 81 15,000 ⎪ ⎪ ⎪ 1 (x - 131)2 , for 81 ≤ x < 107 ⎪ 3 15,000 ⎪ ⎪ 2 2 ⎪ 1 - (x - 131) + (x - 107) , for 107 ≤ x < 131 ⎪3 15,000 15,000 ⎪ ⎪ 1 (x - 107)2 , for 131 ≤ x < 157 ⎪ 3 15,000 ⎪ K81(x)/3 + K157(x)/3 + K213(x)/3 = ⎨ . ⎪ 2 (x - 207)2 , for 157 ≤ x < 163 ⎪ 3 15,000 ⎪ ⎪2 2 2 ⎪ - (x - 207) + (x - 163) , for 163 ≤ x < 207 15,000 15,000 ⎪3 ⎪ ⎪ 2 (x - 163) 2 + , for 207 ≤ x < 213 ⎪ 3 15,000 ⎪ ⎪ (x - 263)2 ⎪ 1 , for 213 ≤ x < 263 ⎪ 15,000 ⎪ ⎪ 1, for x ≥ 263 ⎩
Exercise: Using this algebraic form, determine the smoothed distribution at 180.
[Solution: 2/3 - (180 - 207)²/15,000 + (180 - 163)²/15,000 = 0.637, matching the previous result.]
Here is a graph of the triangle kernel smoothed distribution function:
[Graph of F(x) versus x, from about 50 to 300.]
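Here is a minimal Python sketch (mine, not from the text; the function name is my own) of the triangle kernel smoothed distribution function; it reproduces F(180) = 0.637.

data = [81, 157, 213]
b = 50.0

def K_triangle(x, y, b):
    # area of the triangular kernel centered at y that lies to the left of x
    if x < y - b:
        return 0.0
    if x >= y + b:
        return 1.0
    if x < y:
        return (x - (y - b)) ** 2 / (2 * b ** 2)
    return 1.0 - (x - (y + b)) ** 2 / (2 * b ** 2)

print(sum(K_triangle(180, y, b) for y in data) / len(data))   # about 0.637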
General Formulas:

A kernel density estimator of a discrete density p(yj) is: f̂(x) = Σ p(yj) kyj(x), with the sum taken over j = 1 to n.⁵⁰
For the uniform kernel, ky(x) = 1/(2b), y - b ≤ x ≤ y + b.⁵¹
For the triangular kernel, ky(x) = (b - |x - y|)/b², y - b ≤ x ≤ y + b.⁵²
For the gamma kernel, ky(x) = (α/y)^α x^(α-1) e^(-xα/y) / Γ(α), 0 < x < ∞.
A kernel density estimator of the distribution function corresponding to a discrete p(yj) is:
F̂(x) = Σ p(yj) Kyj(x), with the sum taken over j = 1 to n.
For the uniform kernel, Ky(x) = 0 for x < y - b; {x - (y - b)}/(2b) for y - b ≤ x ≤ y + b; 1 for x > y + b.
For the triangular kernel, Ky(x) = 0 for x < y - b; {x - (y - b)}²/(2b²) for y - b ≤ x ≤ y; 1 - {x - (y + b)}²/(2b²) for y ≤ x ≤ y + b; 1 for x > y + b.
Variances: For the three observations: 81, 157, 213, we constructed the kernel smoothed density with a uniform kernel with bandwidth 50. The Uniform Kernel smoothing model is: Uniform[31, 131] / 3 + Uniform[107, 207] / 3 + Uniform[163, 263] / 3. With pdf of: 0 for x < 31, 1/300 for 31 ≤ x < 107, 2/300 for 107 ≤ x ≤ 131, 1/300 for 131 < x < 163, 2/300 for 163 ≤ x ≤ 207, 1/300 for 207 < x ≤ 263, 0 for x > 263. The mean of each kernel is its corresponding data point. Thus, the mean of the kernel smoothed density is the mean of the data. In this case, the mean is: (81 + 157 + 213)/3 = 150.333. 50
50 See Definition 14.2 in Loss Models.
51 Note that the density is positive at the endpoints; in other words the endpoints are included.
52 Due to the nature of the triangle kernel, the density is zero at the endpoints.
We can compute the second moment of the kernel smoothed density:
(1/3)(2nd moment of Uniform[31, 131]) + (1/3)(2nd moment of Uniform[107, 207]) + (1/3)(2nd moment of Uniform[163, 263])
= (1/3){(131³ - 31³)/300 + (207³ - 107³)/300 + (263³ - 163³)/300} = (1/3){7394.33 + 25482.33 + 46202.33} = 26,359.7.
Therefore, the variance of the kernel smoothed density is: 26,359.7 - 150.333² = 3759.7. This is the same manner in which we would get the variance of a mixture of three uniform distributions.
Alternately, we can think of the kernel smoothed density as 3 equally likely risk types.⁵³ Then the process variance of each type of risk is that of a uniform distribution of width 100, which is 100²/12. Thus the expected value of the process variance is: 100²/12 = 833.33.
The mean of each uniform kernel is the corresponding observed point. Thus the variance of the hypothetical means is the variance of the observed data. In this case, the variance of the data is:⁵⁴
(1/3){(81 - 150.33)² + (157 - 150.33)² + (213 - 150.33)²} = 2926.22.
The variance of the kernel smoothed density is: EPV + VHM = 2926.22 + 833.33 = 3759.6.⁵⁵
In general, the variance of the kernel smoothed density is the variance of the data (the variance of the empirical distribution function) plus the (average) variance of the kernel.⁵⁶
Each uniform kernel has variance: (2b)²/12 = b²/3.
A triangular density from 0 to 2b has mean b, and second moment:
∫[0 to b] x² (x/b²) dx + ∫[b to 2b] x² (2b - x)/b² dx = b²/4 + 11b²/12 = 7b²/6.
Therefore, the variance of this triangular kernel is: 7b²/6 - b² = b²/6. Each triangular kernel has variance: b²/6.
Exercise: For this same example, what is the variance of the triangle kernel smoothed density for a bandwidth of 50?
[Solution: 2926.22 + 50²/6 = 3342.9.]
See "Mahler's Guide to Buhlmann Credibility." We do not take the sample variance. 55 See "Mahler's Guide to Buhlmann Credibility." 54
In the case of the Gamma kernel, each kernel has a variance of αθ2 = y2 /α. So in this case one must take an average of these variances, which is the second moment of the data divided by α. 56
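Here is a minimal Python sketch (mine, not from the text) of the variance of the kernel smoothed density computed as EPV + VHM; it reproduces 3759.6 for the uniform kernel and 3342.9 for the triangle kernel.

data = [81, 157, 213]
b = 50.0
n = len(data)

mean = sum(data) / n
vhm = sum((y - mean) ** 2 for y in data) / n     # variance of the data (empirical, not sample): 2926.22

epv_uniform  = (2 * b) ** 2 / 12                 # variance of each uniform kernel: b^2/3 = 833.33
epv_triangle = b ** 2 / 6                        # variance of each triangular kernel: 416.67

print(vhm + epv_uniform)    # about 3759.6, the variance of the uniform kernel smoothed density
print(vhm + epv_triangle)   # about 3342.9, the variance of the triangle kernel smoothed density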
Pareto Kernel:⁵⁷

One can use other size of loss distributions as the basis of kernels. For example, a Pareto with parameters α > 1 and θ = y(α - 1) has a mean of y, and density: α θ^α (θ + x)^-(α+1).
Therefore, for the Pareto kernel, ky(x) = α {y(α - 1)}^α / {y(α - 1) + x}^(α+1), 0 < x < ∞.
Shown out to 200,000, here is the kernel smoothed density for the ungrouped data in Section 2, using a Pareto kernel with α = 10:
[Graph of the density (times one million) versus size of loss, out to 200,000.]
57 See Exercise 14.27 in Loss Models.
Limited Expected Values, Uniform Kernel:

First, let us assume for simplicity that we have a sample of size one; one value at 213. Let us use a uniform kernel with bandwidth 50:
[Graph of the kernel: a rectangle of height 0.01 from 163 to 263.]
The mean of the kernel is the midpoint, 213, the observed value. Let us calculate some limited expected values for this kernel.
E[X ∧ 300] = 213, since all of the values are less than 300 the limit in this case had no effect.
E[X ∧ 150] = 150, since all of the values become 150 after we apply the limit.
With instead a limit of 200, some of the values are capped at 200 and some are unaffected.
E[X ∧ 200] = ∫[163 to 200] x f(x) dx + (200)(263 - 200)/100 = ∫[163 to 200] (x/100) dx + 126
= [x²/200] evaluated from x = 163 to x = 200, plus 126 = 67.155 + 126 = 193.155.
We note that: E[X ∧ 200] = 193.155 ≤ 213 = E[X], and E[X ∧ 200] = 193.155 ≤ 200.
In order to compute E[X ∧ 200] geometrically, we can divide this kernel by a vertical line at 200:
[Diagram: the rectangle of height 0.01 from 163 to 263, divided by a vertical line at 200.]
There is an area of (200 - 163)/100 = 0.37 to the left of 200, and an area of (263 - 200)/100 = 0.63 to the right of 200.⁵⁸
The values to the left of 200 each contribute their values to E[X ∧ 200]. The average of these small values is: (163 + 200)/2 = 181.5. Thus the contribution of the small losses is: (0.37)(181.5) = 67.155. The values to the right of 200 each contribute 200 to E[X ∧ 200]. Thus the contribution of the large losses is: (0.63)(200) = 126.
Therefore, E[X ∧ 200] = 67.155 + 126 = 193.155, matching the previous result.
58 The area under a kernel is one.
In general, let us assume that we have a uniform kernel centered at y, with bandwidth b: f(x) = 1/(2b), y - b ≤ x ≤ y + b.
Then for L ≥ y + b, since all of the values are less than L, the limit in this case has no effect. Thus E[X ∧ L] = y.
For L ≤ y - b, all of the values become L after we apply the limit. Thus E[X ∧ L] = L.
If y + b > L > y - b, some of the values are capped at L and some are unaffected.
E[X ∧ L] = ∫[y-b to L] x f(x) dx + L (y + b - L)/(2b) = ∫[y-b to L] x/(2b) dx + L (y + b - L)/(2b)
= [x²/(4b)] evaluated from x = y - b to x = L, plus L (y + b - L)/(2b) = {2L(y + b) - (y - b)² - L²} / (4b).
Thus for a uniform kernel centered at y with bandwidth b:⁵⁹
E[X ∧ L] = L, for L ≤ y - b;
{2L(y + b) - (y - b)² - L²} / (4b), for y - b < L < y + b;
y, for L ≥ y + b.
Exercise: Use the above formula to compute E[X ∧ 200] for the uniform kernel centered at 213 with bandwidth 50.
[Solution: {2L(y + b) - (y - b)² - L²} / (4b) = {(2)(200)(213 + 50) - (213 - 50)² - 200²} / {(4)(50)} = 193.155.
Comment: Matching the previous result.]
Exercise: Compute E[X ∧ 200] for the uniform kernel centered at 157 with bandwidth 50.
[Solution: {2L(y + b) - (y - b)² - L²} / (4b) = {(2)(200)(157 + 50) - (157 - 50)² - 200²} / {(4)(50)} = 156.755.
Comment: You can get the same result from first principles, either algebraically or geometrically.]
E[X ∧ 200] for the uniform kernel centered at 81 with bandwidth 50 is its mean of 81.
59 I would not memorize this formula.
Let us return to the example, where we have a data set of size three: 81, 157, and 213. We apply a uniform kernel with bandwidth 50.
The mean of each kernel is the point at which it is centered. Therefore, the mean of the kernel smoothed density is the mean of the data: (1/3)(81) + (1/3)(157) + (1/3)(213) = 150.333.
E[X ∧ 200] for the kernel smoothed density is the average of the limited expected values for each of the individual kernels computed previously: (1/3)(81) + (1/3)(156.755) + (1/3)(193.155) = 143.637.
In general, the limited expected value for a kernel smoothed density is a weighted average of the limited expected values of each of the individual kernels, with weights equal to the number of times each value appears in the original data set.
For this example, here is a graph of the limited expected values of the uniform kernel smoothed density as a function of the limit:
[Graph of E[X ∧ L] versus the limit L; it passes through 143.6 at L = 200 and levels off at the mean of 150.3 for L ≥ 263.]
A limit of more than 213 + 50 = 263 has no effect; so that for such a limit the limited expected value is just the mean of: (81 + 157 + 213) / 3 = 150.333.
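Here is a minimal Python sketch (mine, not from the text; the function name is my own) of the limited expected value of the uniform kernel smoothed density; it reproduces E[X ∧ 200] = 143.637.

data = [81, 157, 213]
b = 50.0

def lev_uniform_kernel(L, y, b):
    # limited expected value at L of a single uniform kernel centered at y
    if L <= y - b:
        return L
    if L >= y + b:
        return y
    return (2 * L * (y + b) - (y - b) ** 2 - L ** 2) / (4 * b)

print(sum(lev_uniform_kernel(200, y, b) for y in data) / len(data))   # 143.637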
Limited Expected Values, Triangle Kernel:

Working with the Triangle Kernel is harder than working with the Uniform Kernel. First, let us assume for simplicity that we have a sample of size one; one value at 213. Let us use a triangle kernel with bandwidth 50:
[Diagram: the triangle of height 0.02 over (163, 263), with a vertical line at 200.]
The mean of this kernel is the midpoint, 213, the observed value.
E[X ∧ 300] = 213, since all of the values are less than 300 the limit in this case had no effect.
E[X ∧ 150] = 150, since all of the values become 150 after we apply the limit.
With instead a limit of 200, some of the values are capped at 200 and some are unaffected.
For 163 ≤ x ≤ 213, f(x) = (0.02)(x - 163)/(213 - 163) = 0.0004x - 0.0652.⁶⁰
Alternately, ky(x) = (b - |x - y|)/b², y - b ≤ x ≤ y + b.
Here, k213(x) = {50 - |x - 213|}/50², 163 ≤ x ≤ 263.
Thus f(x) = {50 - (213 - x)}/50² = 0.0004x - 0.0652, 163 ≤ x ≤ 213,
and f(x) = {50 - (x - 213)}/50² = -0.0004x + 0.1052, 213 ≤ x ≤ 263.
60 One can verify that f(163) = 0 and f(213) = 0.02.
The area to the left of 200 is a fraction of the area of 1/2 that is to the left of 213: {(200 - 163)/(213 - 163)}² (1/2) = 0.2738.
Alternately, the area to the left of 200 is a triangle, with base of 37 and height of: (0.02)(200 - 163)/(213 - 163) = 0.0148. Thus the area of this triangle is: (1/2)(37)(0.0148) = 0.2738.
Therefore, the area to the right of 200 is: 1 - 0.2738 = 0.7262.
E[X ∧ 200] = ∫[163 to 200] x f(x) dx + (200)(1 - 0.2738) = ∫[163 to 200] x (0.0004x - 0.0652) dx + 145.24
= [0.0004x³/3 - 0.0326x²] evaluated from x = 163 to x = 200, plus 145.24 = 51.383 + 145.24 = 196.623.
We note that: E[X ∧ 200] = 196.623 ≤ 213 = E[X], and E[X ∧ 200] = 196.623 ≤ 200.
One can instead use a geometric approach. In order to do so, I will use a result for right triangles. Assume we have a right triangle with base from c to d:
[Diagram: a right triangle over the base from c to d, with the height increasing from 0 at c to its maximum at d.]
Then the average value is: d - (d - c)/3. In other words, the average is one third of the way from the "high end" to the "low end". This result does not depend on the ratio of the base to the height.
Instead, assume we have a right triangle as follows:
[Diagram: a right triangle over the base from d to c, with the height decreasing from its maximum at d to 0 at c.]
Then the average value is: d + (c - d)/3; again one third of the way from the high end to the low end.
[Diagram: the triangle kernel of height 0.02 over (163, 263), centered at 213, cut by a vertical line at 200.]
As determined before, the area of the triangle to the left of 200 is: {(200 - 163)/(213 - 163)}² (1/2) = 0.2738.
The average of this small right triangle is: 200 - (200 - 163)/3 = 187.667.
In computing the limited expected value at 200, any value greater than 200 contributes 200. In contrast, all of the values less than 200 contribute their value to E[X ∧ 200].
Thus E[X ∧ 200] = (0.2738)(187.667) + (200)(1 - 0.2738) = 196.623, matching the previous result.
Exercise: Compute E[X ∧ 200] for the triangle kernel centered at 157 with bandwidth 50.
[Diagram: the triangle kernel of height 0.02 over (107, 207), centered at 157, cut by a vertical line at 200.]
[Solution: The area to the right of 200 is a fraction of the area of 1/2 that is to the right of 157: {(207 - 200)/(207 - 157)}² (1/2) = 0.0098.
The average for this triangle is: 200 + (207 - 200)/3 = 202.333.
For the whole triangle kernel, E[X] = 157. In computing the limited expected value at 200, any value greater than 200 contributes 200. Thus while the small right triangle contributes to E[X] its area times its average, it only contributes 200 times its area to E[X ∧ 200]. In contrast, all of the values less than 200 contribute their value to both E[X] and E[X ∧ 200].
Thus E[X ∧ 200] = 157 - (202.333)(0.0098) + (200)(0.0098) = 156.977.
Alternately, for 107 ≤ x ≤ 157, f(x) = (0.02)(x - 107)/(157 - 107) = 0.0004x - 0.0428,
and for 157 ≤ x ≤ 207, f(x) = (0.02)(207 - x)/(207 - 157) = 0.0828 - 0.0004x.
E[X ∧ 200] = ∫[107 to 200] x f(x) dx + (200)(0.0098)
= ∫[107 to 157] x (0.0004x - 0.0428) dx + ∫[157 to 200] x (0.0828 - 0.0004x) dx + 1.96
= [0.0004x³/3 - 0.0214x²] evaluated from x = 107 to x = 157, plus [0.0414x² - 0.0004x³/3] evaluated from x = 157 to x = 200, plus 1.96
= 70.167 + 84.850 + 1.96 = 156.977.]
For the triangle kernel centered at 81 with bandwidth 50, all of the area is to the left of 200, and thus E[X ∧ 200] is just the average of this kernel: 81.
Let us return to the example, where we have a data set of size three: 81, 157 and 213. We apply a triangle kernel with bandwidth 50.
The mean of each kernel is the point at which it is centered. Therefore, the mean of the kernel smoothed density is the mean of the data: (1/3)(81) + (1/3)(157) + (1/3)(213) = 150.333.
E[X ∧ 200] for the kernel smoothed density is the average of the limited expected values for each of the individual kernels computed previously: (1/3)(81) + (1/3)(156.977) + (1/3)(196.623) = 144.867.
In general, the limited expected value for a kernel smoothed density is a weighted average of the limited expected values of each of the individual kernels, with weights equal to the number of times each value appears in the original data set.
For this example, here is a graph of the limited expected values of the triangle kernel smoothed density as a function of the limit:
[Graph of E[X ∧ L] versus the limit L; it passes through 144.9 at L = 200 and levels off at the mean of 150.3 for L ≥ 263.]
In general, let us assume that we have a triangle kernel centered at y, with bandwidth b.
Then for L ≥ y + b, since all of the values are less than L, the limit in this case has no effect. Thus E[X ∧ L] = y.
For L ≤ y - b, all of the values become L after we apply the limit. Thus E[X ∧ L] = L.
If y + b > L > y - b, some of the values are capped at L and some are unaffected.
First let us assume that y + b > L ≥ y:
[Diagram: the triangle kernel of height 1/b over (y - b, y + b), cut by a vertical line at L, with L to the right of y.]
The area of the triangle to the right of L is: (1/2){(y + b - L)/b}².
The average of this right triangle is: L + (y + b - L)/3.
Each of the values in this small triangle contributes L to E[X ∧ L] rather than its value.
Thus E[X ∧ L] = E[X] - (1/2){(y + b - L)/b}² {L + (y + b - L)/3 - L} = y - (y + b - L)³/(6b²).
If instead y ≥ L > y - b:
[Diagram: the triangle kernel of height 1/b over (y - b, y + b), cut by a vertical line at L, with L to the left of y.]
The area of the triangle to the left of L is: (1/2){(L - y + b)/b}².
The average of this right triangle is: L - (L - y + b)/3.
Each of the values in the triangle contributes its value to E[X ∧ L]. Those values to the right of L contribute L to E[X ∧ L].
Thus E[X ∧ L] = (1/2){(L - y + b)/b}² {L - (L - y + b)/3} + L {1 - (1/2){(L - y + b)/b}²} = L - (L - y + b)³/(6b²).
Putting all the cases together, we have for a triangle kernel centered at y with bandwidth b:⁶¹
E[X ∧ L] = L, for L ≤ y - b;
L - (L - y + b)³/(6b²), for y - b ≤ L ≤ y;
y - (y + b - L)³/(6b²), for y ≤ L ≤ y + b;
y, for L ≥ y + b.
61 I would not memorize this formula.
Exercise: Use the above formula to compute E[X ∧ 200] for the triangle kernel centered at 213 with bandwidth 50.
[Solution: L - (L - y + b)³/(6b²) = 200 - (200 - 213 + 50)³/{(6)(50²)} = 196.623.
Comment: Matching the previous result.]
Exercise: Use the above formula to compute E[X ∧ 200] for the triangle kernel centered at 157 with bandwidth 50.
[Solution: y - (y + b - L)³/(6b²) = 157 - (157 + 50 - 200)³/{(6)(50²)} = 156.977.
Comment: Matching the previous result.]
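Here is a minimal Python sketch (mine, not from the text; the function name is my own) of the limited expected value of the triangle kernel smoothed density, using the formula above; it reproduces E[X ∧ 200] = 144.867 for the three-point example.

data = [81, 157, 213]
b = 50.0

def lev_triangle_kernel(L, y, b):
    # limited expected value at L of a single triangular kernel centered at y
    if L <= y - b:
        return L
    if L >= y + b:
        return y
    if L <= y:
        return L - (L - y + b) ** 3 / (6 * b ** 2)
    return y - (y + b - L) ** 3 / (6 * b ** 2)

print(sum(lev_triangle_kernel(200, y, b) for y in data) / len(data))   # about 144.867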
Problems: Use the following information for the next 8 questions: A random sample of five claims yields the values: 100, 500, 1000, 2000, and 5000. 6.1 (1 point) Using a uniform kernel with bandwidth 1000, what is the smoothed probability density function at 2500? A. 0.0001 B. 0.0002 C. 0.0003 D. 0.0004 E. 0.0005 6.2 (2 points) What is the standard deviation of the estimate in the previous question? A. 0.00006 B. 0.00007 C. 0.00008 D. 0.00009 E. 0.00010 6.3 (2 points) Using a triangular kernel with bandwidth 2000, what is the smoothed probability density function at 6000? A. 0.000025 B. 0.000050 C. 0.000075 D. 0.000100
E. 0.000125
6.4 (4 points) Using a gamma kernel with α = 3, what is the smoothed probability density function at 3000? A. 0.00004 B. 0.00005 C. 0.00006 D. 0.00007 E. 0.00008 6.5 (2 points) Using a uniform kernel with bandwidth 1000, what is the smoothed distribution function at 1700? A. 0.56 B. 0.58 C. 0.60 D. 0.62 E. 0.64 6.6 (2 points) Using a triangular kernel with bandwidth 2000, what is the smoothed distribution function at 3000? A. 0.76 B. 0.78 C. 0.80 D. 0.82 E. 0.84 6.7 (2 points) Using a uniform kernel with bandwidth 3000, what is the variance of the kernel smoothed density? A. 4.5 million B. 5.0 million C. 5.5 million
D. 6.0 million
E. 6.5 million
6.8 (3 points) Using a triangular kernel with bandwidth 3000, what is the variance of the kernel smoothed density? A. 4.5 million B. 5.0 million C. 5.5 million
D. 6.0 million
E. 6.5 million
Use the following information for the next 3 questions:
A random sample of 20 observations of a random variable X yields the following values:
0.5, 1.1, 1.6, 2.2, 2.4, 3.0, 3.6, 4.2, 4.5, 5.1, 6.3, 6.9, 8.2, 8.8, 9.9, 11.1, 12.5, 13.3, 14.2, 15.4.

6.9 (2 points) Using a uniform kernel with bandwidth 1.5, what is the smoothed pdf at 5.2?
A. Less than 0.03
B. At least 0.03, but less than 0.04
C. At least 0.04, but less than 0.05
D. At least 0.05, but less than 0.06
E. At least 0.06

6.10 (2 points) Using a uniform kernel with bandwidth 3, what is the smoothed pdf at 8?
A. 0.03 B. 0.04 C. 0.05 D. 0.06 E. 0.07

6.11 (3 points) Using a triangular kernel with bandwidth 2, what is the smoothed pdf at 10?
A. 0.03 B. 0.04 C. 0.05 D. 0.06 E. 0.07

For the next two questions, use the following information on the time required to close claims:
Time (weeks), tj   Number of closures, sj   Number of open claims, rj
1                         30                       100
2                         20                        70
3                         10                        50
4                          8                        40
5                          3                        32
6                          4                        29
7                          5                        25
8                          2                        20

6.12 (2 points) Using a uniform kernel with a bandwidth of 1.5, determine f̂(4).
A. 0.07 B. 0.08 C. 0.09 D. 0.10 E. 0.11

6.13 (3 points) Using a triangular kernel with a bandwidth of 1.5, determine f̂(5).
A. Less than 0.045
B. At least 0.045, but less than 0.050
C. At least 0.050, but less than 0.055
D. At least 0.055, but less than 0.060
E. At least 0.060
6.14 (2 points) For a data set of size n, what is the difference between the variance of the empirical distribution function and the variance of the uniform kernel smoothed density with bandwidth b?

Use the following information for the next six questions:
There are four losses of size: 500, 1000, 1000, 1500.

6.15 (1 point) Draw the kernel smoothed density for a uniform kernel with bandwidth 200.
6.16 (1 point) Draw the kernel smoothed density for a uniform kernel with bandwidth 500.
6.17 (1 point) Draw the kernel smoothed density for a uniform kernel with bandwidth 800.
6.18 (2 points) Draw the triangular kernel smoothed density with bandwidth 200.
6.19 (2 points) Draw the triangular kernel smoothed density with bandwidth 500.
6.20 (2 points) Draw the triangular kernel smoothed density with bandwidth 800.

6.21 (3 points) From a population having survival function S, you are given the following sample of size six: 20, 20, 32, 44, 50, 50.
Colonel Klink uses a uniform kernel with bandwidth 14 in order to estimate S(35).
Colonel Sanders uses a uniform kernel with bandwidth 8 in order to estimate S(35).
What is the difference between the estimates of Colonel Klink and Colonel Sanders?
A. Less than -0.03
B. At least -0.03, but less than -0.01
C. At least -0.01, but less than 0.01
D. At least 0.01, but less than 0.03
E. At least 0.03

6.22 (2 points) For a data set of size n, what is the difference between the variance of the empirical distribution function and the variance of the triangular kernel smoothed density with bandwidth b?

6.23 (2 points) From a population having density function f, you are given the following sample:
6, 10, 10, 15, 18, 20, 23, 27, 32, 32, 35, 40, 40, 46, 46, 55, 55, 55, 63, 69, 82.
Calculate the kernel density estimate of f, using the uniform kernel with bandwidth 10.
At which of the following values of x is the kernel density estimate of f(x) the largest?
A. 10  B. 20  C. 30  D. 40  E. 50

6.24 (3 points) You are given the following sample: 9, 15, 18, 20, 21.
Using the uniform kernel with bandwidth 4, estimate the median.
(A) 17.0  (B) 17.5  (C) 18.0  (D) 18.5  (E) 19.0
Use the following information for the next 11 questions:
For eight widgets, their times until failure were in months: 7, 11, 13, 17, 19, 22, 28, 31.

6.25 (1 point) Using a uniform kernel with bandwidth 5, what is the smoothed pdf at 20?
A. 0.038  B. 0.040  C. 0.042  D. 0.044  E. 0.046

6.26 (2 points) Using a triangular kernel with bandwidth 5, what is the smoothed pdf at 20?
A. 0.039  B. 0.041  C. 0.043  D. 0.045  E. 0.047

6.27 (2 points) What is the variance of the uniform kernel smoothed density function, using a bandwidth of 10?
A. 55  B. 65  C. 75  D. 85  E. 95

6.28 (2 points) What is the variance of the triangle kernel smoothed density function, using a bandwidth of 10?
A. 70  B. 75  C. 80  D. 85  E. 90

6.29 (2 points) Calculate the kernel density estimate of F(15), using the uniform kernel with bandwidth 5.
A. 0.38  B. 0.40  C. 0.42  D. 0.44  E. 0.46

6.30 (3 points) Calculate the kernel density estimate of F(16), using the triangle kernel with bandwidth 5.
A. 0.38  B. 0.40  C. 0.42  D. 0.44  E. 0.46

6.31 (4 points) Using a gamma kernel with α = 5, what is the smoothed pdf at 20?
A. 0.025  B. 0.027  C. 0.029  D. 0.031  E. 0.033
6.32 (4 points) Using a Pareto kernel with α = 5, what is the smoothed pdf at 20?
A. 0.012  B. 0.014  C. 0.016  D. 0.018  E. 0.020
6.33 (3 points) Calculate the kernel density estimate of F(20), using the Pareto kernel with α = 5.
A. 0.68  B. 0.70  C. 0.72  D. 0.74  E. 0.76
6.34 (3 points) Using a uniform kernel with a bandwidth of 5, what is the median of the kernel smoothed distribution?
A. 17.50  B. 17.75  C. 18.00  D. 18.25  E. 18.50

6.35 (3 points) Using a Normal kernel with σ = 5, estimate F(20).
A. 49%  B. 51%  C. 53%  D. 55%  E. 57%
6.36 (1 point) Which of the following statements are true?
1. For a triangular kernel, as the bandwidth increases, the amount of smoothing increases.
2. For a gamma kernel, as α increases, the amount of smoothing increases.
3. For an exponential kernel, the smoothed density always has a mode of zero.
A. 1  B. 2  C. 3  D. 1, 3  E. None of A, B, C, or D

Use the following information for the next 4 problems.
You observe 4 sizes of loss: 200, 200, 300, 500.

6.37 (3 points) Using a uniform kernel with a bandwidth of 150, for the kernel smoothed density, determine E[X ∧ 250].
A. Less than 210
B. At least 210 but less than 215
C. At least 215 but less than 220
D. At least 220 but less than 225
E. At least 225

6.38 (4 points) Using a triangle kernel with a bandwidth of 150, for the kernel smoothed density, determine E[X ∧ 250].
A. Less than 210
B. At least 210 but less than 215
C. At least 215 but less than 220
D. At least 220 but less than 225
E. At least 225

6.39 (4 points) Determine the algebraic form of F(x) for a uniform kernel with a bandwidth of 150.

6.40 (6 points) Determine the algebraic form of F(x) for a triangle kernel with a bandwidth of 150.
Use the following information for the next 2 problems.
• You observe 5 values: 20, 30, 70, 120, 200.
• Define the Epanechnikov kernel with bandwidth b as:
ky(x) = {3/(4b)} {1 - ((x - y)/b)²}, y - b ≤ x ≤ y + b.

6.41 (2 points) Using an Epanechnikov kernel with bandwidth 50, estimate the density function at 100.
A. 0.0040  B. 0.0042  C. 0.0044  D. 0.0046  E. 0.0048

6.42 (3 points) Using an Epanechnikov kernel with bandwidth 50, calculate the kernel density estimate of F(100).
A. 0.54  B. 0.56  C. 0.58  D. 0.60  E. 0.62
Use the following information for the next 2 problems.
• You observe 4 values: 4, 6, 13, 17.
• g(t) = 0.1 + t³/2000, -5 ≤ t ≤ 5.
• ky(x) = g(x - y).

6.43 (2 points) Using this kernel, estimate the density function at 10.
A. 0.040  B. 0.045  C. 0.050  D. 0.055  E. 0.060

6.44 (3 points) Using this kernel, estimate the distribution function at 10.
A. 0.40  B. 0.45  C. 0.50  D. 0.55  E. 0.60

Use the following information for the next 2 problems.
• You observe 6 losses: 20, 20, 20, 30, 30, 45.
• g(t) = {1 - (t/15)²}² / 16, -15 ≤ t ≤ 15.
• ky(x) = g(x - y).

6.45 (2 points) Using this kernel, estimate the density function at 28.
A. 0.028  B. 0.030  C. 0.032  D. 0.034  E. 0.036

6.46 (3 points) Using this kernel, estimate the distribution function at 40.
A. 0.80  B. 0.82  C. 0.84  D. 0.86  E. 0.88
6.47 (4, 11/03, Q.4 & 2009 Sample Q.3) (2.5 points) You study five lives to estimate the time from the onset of a disease to death. The times to death are: 2 3 3 3 7.
Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.
(A) 8/40  (B) 12/40  (C) 14/40  (D) 16/40  (E) 17/40

6.48 (4, 11/04, Q.20 & 2009 Sample Q.147) (2.5 points) From a population having distribution function F, you are given the following sample: 2.0, 3.3, 3.3, 4.0, 4.0, 4.7, 4.7, 4.7.
Calculate the kernel density estimate of F(4), using the uniform kernel with bandwidth 1.4.
(A) 0.31  (B) 0.41  (C) 0.50  (D) 0.53  (E) 0.63

6.49 (2 points) In the previous question, calculate the kernel density estimate of F(3), using the uniform kernel with bandwidth 1.4.
(A) 0.22  (B) 0.24  (C) 0.26  (D) 0.28  (E) 0.30

6.50 (4, 5/05, Q.22 & 2009 Sample Q.192) (2.9 points) You are given the kernel:
ky(x) = (2/π) √{1 - (x - y)²}, y - 1 ≤ x ≤ y + 1.
You are also given the following random sample: 1 3 3 5
Determine which of the following graphs shows the shape of the kernel density estimator.
6.51 (4, 11/05, Q.9 & 2009 Sample Q.221) (2.9 points) You are given:
(i) The sample: 1 2 3 3 3 3 3 3 3 3
(ii) F̂1(x) is the kernel density estimator of the distribution function using a uniform kernel with bandwidth 1.
(iii) F̂2(x) is the kernel density estimator of the distribution function using a triangular kernel with bandwidth 1.
Determine which of the following intervals has F̂1(x) = F̂2(x) for all x in the interval.
(A) 0 < x < 1  (B) 1 < x < 2  (C) 2 < x < 3  (D) 3 < x < 4  (E) None of (A), (B), (C) or (D)

6.52 (4, 11/06, Q.24 & 2009 Sample Q.268) (2.9 points) You are given the following ages at time of death for 10 individuals:
25 30 35 35 37 39 45 47 49 55
Using a uniform kernel with bandwidth b = 10, determine the kernel density estimate of the probability of survival to age 40.
(A) 0.377  (B) 0.400  (C) 0.417  (D) 0.439  (E) 0.485

6.53 (4, 5/07, Q.16) (2.5 points) You use a uniform kernel density estimator with b = 50 to smooth the following workers compensation loss payments:
82 126 161 294 384
If F̂(x) denotes the estimated distribution function and F5(x) denotes the empirical distribution function, determine | F̂(150) - F5(150) |.
(A) Less than 0.011
(B) At least 0.011, but less than 0.022
(C) At least 0.022, but less than 0.033
(D) At least 0.033, but less than 0.044
(E) At least 0.044
Solutions to Problems:

6.1. A. Only those losses within 1000 of 2500 contribute. There is 1 such loss, at 2000, which contributes Uniform[1000, 3000]/5. The p.d.f. at 2500 is: (1/2000)/5 = 0.0001.

6.2. D. The density estimate is based on the number of items in the interval [1500, 3500]. Let z = number of items in the interval [1500, 3500]. Then the estimated p.d.f. at 2500 is: (z/5)/2000. Now the variance of the probability in an interval is a generalization of the variance of the empirical distribution function: (probability in the interval)(1 - probability in the interval)/N. Thus Var[z/5] = (1/5)(4/5)/5 = 4/125. Var[(z/5)/2000] = (4/125)/2000² = 0.000000008. Standard deviation of the estimate is: 0.0000894.
Equivalently, an estimate of the density from the histogram has variance:
(losses in interval / N) {1 - (losses in interval / N)} / {N (width of the interval)²} = (1/5)(4/5) / {(5)(2000²)} = 0.000000008.
Standard deviation of the estimate is: 0.0000894.
Comment: We only have 5 data points, so the standard deviation of the estimate of 0.0000894 is almost as big as the estimate itself of 0.0001.

6.3. B. Only those losses within 2000 of 6000 contribute. There is 1 such loss. The triangular kernel centered at 5000 has density at 6000 of: (2000 - |6000 - 5000|)/2000² = 0.00025. Thus the smoothed pdf at 6000 is: 0.00025/5 = 0.00005.
Comment: [Figure: the triangular kernel smoothed density, plotted from -2000 to 8000.]
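These first few solutions can be verified with a short Python sketch (the helper names are mine, not from the text) that averages the uniform or triangular kernels over the sample:

data = [100, 500, 1000, 2000, 5000]

def uniform_kde(x, data, b):
    # Each kernel is Uniform[y - b, y + b] with height 1/(2b); average over the sample.
    return sum(1.0 / (2 * b) for y in data if abs(x - y) <= b) / len(data)

def triangle_kde(x, data, b):
    # Each kernel is a triangle on [y - b, y + b] with height 1/b at y.
    return sum((b - abs(x - y)) / b**2 for y in data if abs(x - y) <= b) / len(data)

print(uniform_kde(2500, data, 1000))   # 0.0001, matching 6.1
print(triangle_kde(6000, data, 2000))  # 0.00005, matching 6.3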
6.4. D. The density of a Gamma is: f(x) = θ^(-α) x^(α-1) e^(-x/θ) / Γ(α) = θ⁻³ x² e^(-x/θ) / 2. f(3000) = 4,500,000 e^(-3000/θ)/θ³.
For the kernel with mean at point y, θ = y/α = y/3. f(3000; y) = 121,500,000 e^(-9000/y)/y³.
f(3000; 100) = 1.0 × 10⁻³⁷. f(3000; 500) = 1.5 × 10⁻⁸. f(3000; 1000) = 0.000015. f(3000; 2000) = 0.000169. f(3000; 5000) = 0.000161.
The smoothed density at 3000 is: (1.0 × 10⁻³⁷ + 1.5 × 10⁻⁸ + 0.000015 + 0.000169 + 0.000161)/5 = 0.000069.
Comment: [Figure: the gamma kernel smoothed density, plotted from 0 to 6000.]
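The gamma-kernel calculation is tedious by hand; here is a minimal sketch, assuming θ = y/α as above so that each kernel has mean at the data point (function name is mine):

import math

def gamma_kernel_pdf(x, y, alpha):
    # Gamma density with shape alpha and theta = y/alpha, so the kernel's mean is y.
    theta = y / alpha
    return x**(alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta**alpha)

data = [100, 500, 1000, 2000, 5000]
estimate = sum(gamma_kernel_pdf(3000, y, 3) for y in data) / len(data)
print(estimate)  # about 0.000069, matching 6.4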
6.5. E. The uniform kernel centered at 100 goes up to 1100, thus all of it is to the left of 1700.
The uniform kernel centered at 500 goes up to 1500, thus all of it is to the left of 1700.
The uniform kernel centered at 1000 goes from 0 to 2000, thus 0.85 of it is to the left of 1700.
The uniform kernel centered at 2000 goes from 1000 to 3000, thus 0.35 of its area is to the left of 1700.
The uniform kernel centered at 5000 goes from 4000 to 6000, thus none of it is to the left of 1700.
The smoothed distribution at 1700 is: 1/5 + 1/5 + 0.85/5 + 0.35/5 + 0/5 = 0.64.
6.6. B. The triangular kernel centered at 100 goes up to 2100, thus all of it is to the left of 3000.
The triangular kernel centered at 500 goes up to 2500, thus all of it is to the left of 3000.
The triangular kernel centered at 1000 goes up to 3000, thus all of it is to the left of 3000.
The triangular kernel centered at 2000 goes from 0 to 4000, thus 1 - (1/2)²/2 = 7/8 of its area is to the left of 3000.
The triangular kernel centered at 5000 goes from 3000, thus none of it is to the left of 3000.
The smoothed distribution at 3000 is: 1/5 + 1/5 + 1/5 + (7/8)/5 + 0/5 = 0.775.

6.7. D. The mean of the data is 1720. The second moment of the data is 6,052,000. The variance of the data is: 6,052,000 - 1720² = 3,093,600. The variance of the uniform kernel is: b²/3 = 3000²/3 = 3,000,000. The variance of the kernel smoothed density is the variance of the data plus the variance of the kernel: 3,093,600 + 3,000,000 = 6,093,600.
Comment: A uniform kernel with bandwidth b, and width 2b, has variance: (2b)²/12 = b²/3.

6.8. A. The mean of the data is 1720. The second moment of the data is 6,052,000. The variance of the data is: 6,052,000 - 1720² = 3,093,600. The variance of the triangular kernel is: b²/6 = 3000²/6 = 1,500,000. The variance of the kernel smoothed density is the variance of the data plus the variance of the kernel: 3,093,600 + 1,500,000 = 4,593,600.
Comment: A triangular density from 0 to 2b has mean b, and second moment:
∫[0 to b] x² (x/b²) dx + ∫[b to 2b] x² (2b - x)/b² dx = b²/4 + 11b²/12 = 7b²/6.
Therefore, the variance of this triangular kernel is: 7b²/6 - b² = b²/6.
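A short sketch of the variance calculation in 6.7 and 6.8, using the kernel variances b²/3 and b²/6 derived above (variable names are mine):

data = [100, 500, 1000, 2000, 5000]
n = len(data)
mean = sum(data) / n
data_var = sum(x * x for x in data) / n - mean**2  # variance of the empirical distribution

b = 3000
print(data_var + b**2 / 3)  # uniform kernel: 6,093,600, matching 6.7
print(data_var + b**2 / 6)  # triangular kernel: 4,593,600, matching 6.8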
6.12. A. Only those times within 1.5 of 4 contribute. There are 3 such times: 3, 4, and 5. The uniform kernel of bandwidth 1.5 has density 1/3.
p(3) = 10/100 = 0.1. p(4) = 8/100 = 0.08. p(5) = 3/100 = 0.03.
f̂(4) = Σ p(yj) kyj(4) = (0.1)(1/3) + (0.08)(1/3) + (0.03)(1/3) = 0.07.
Comment: The percentage of the original 100 that are closed in year 3 is 10/100 = 10%. We divide by 100 rather than the 82 claims that are closed by time 8. We assume that the remaining 18 claims will be closed after time 8. This is a similar idea to having 18 out of 100 lives still alive at age 80. We would assume they would die at some time after age 80.

6.13. B. Only those times within 1.5 of 5 contribute. There are 3 such times.
The triangular kernel centered at 4 has density at 5 of: (1.5 - |5 - 4|)/1.5² = 0.2222.
The triangular kernel centered at 5 has density at 5 of: (1.5 - |5 - 5|)/1.5² = 0.6667.
The triangular kernel centered at 6 has density at 5 of: (1.5 - |5 - 6|)/1.5² = 0.2222.
p(4) = 8/100 = 0.08. p(5) = 3/100 = 0.03. p(6) = 4/100 = 0.04.
f̂(5) = Σ p(yj) kyj(5) = (0.08)(0.2222) + (0.03)(0.6667) + (0.04)(0.2222) = 0.0467.

6.14. The variance of the kernel smoothed density is the variance of the data (the variance of the empirical distribution function) plus the variance of the kernel. Therefore, the difference between the variance of the empirical distribution function and the variance of the uniform kernel smoothed density with bandwidth b is minus the variance of the uniform kernel: -(2b)²/12 = -b²/3.
Comment: If you remember that the result does not depend on the sample size n, then you can take n = 1. For n = 1, the variance of the empirical distribution function is zero. For n = 1, the kernel smoothed density is just a single uniform distribution with width 2b.
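When the empirical model puts unequal probabilities p(yj) on the points, as with this closure-time data, the kernel estimate is the p(yj)-weighted sum of the kernels. A minimal sketch of 6.12 and 6.13 (the dictionary and function names are mine):

# Empirical probabilities p(y_j) = s_j / 100 from the closure-time table.
closures = {1: 30, 2: 20, 3: 10, 4: 8, 5: 3, 6: 4, 7: 5, 8: 2}
p = {t: s / 100.0 for t, s in closures.items()}

def uniform_k(x, y, b):
    return 1.0 / (2 * b) if abs(x - y) <= b else 0.0

def triangle_k(x, y, b):
    return max(b - abs(x - y), 0.0) / b**2

print(sum(p[t] * uniform_k(4, t, 1.5) for t in p))   # 0.07, matching 6.12
print(sum(p[t] * triangle_k(5, t, 1.5) for t in p))  # about 0.0467, matching 6.13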
6.15. The value 1000 appears twice, so the corresponding kernel gets twice the weight.
Uniform[300, 700]/4 + Uniform[800, 1200]/2 + Uniform[1300, 1700]/4.
[Figure: the uniform kernel smoothed density with bandwidth 200, plotted from 0 to 2000.]
Comment: With a small bandwidth, the three uniform kernels do not overlap. At each of 6 points: 300, 700, 800, 1200, 1300, and 1700, the kernel smoothed density is discontinuous.

6.16. [Figure: the uniform kernel smoothed density with bandwidth 500, plotted from 0 to 2000.]
Comment: At 4 points: 0, 500, 1500, and 2000, the kernel smoothed density is discontinuous.
6.17. [Figure: the uniform kernel smoothed density with bandwidth 800, plotted from -500 to 2500.]
Comment: At each of 6 points: -300, 200, 700, 1300, 1800, and 2300, the kernel smoothed density is discontinuous.

6.18. The value 1000 appears twice, so the corresponding kernel gets twice the weight.
Triangle[300, 700]/4 + Triangle[800, 1200]/2 + Triangle[1300, 1700]/4.
[Figure: the triangular kernel smoothed density with bandwidth 200, plotted from 0 to 2000.]
Comment: With a small bandwidth, the three triangular kernels do not overlap. The slope changes at: 300, 500, 700, 800, 1000, 1200, 1300, 1500, and 1700. At each of these 9 points the kernel smoothed density is continuous, but not differentiable.
6.19. The three separate triangles are as follows:
[Figures: the three triangle kernels with bandwidth 500, centered at 500, 1000, and 1500, each plotted from 0 to 2000.]
Weighting these three densities together, with weights 1/4, 1/2, and 1/4 gives:
[Figure: the resulting triangle kernel smoothed density, plotted from 0 to 2000.]
Comment: It turns out that in this special case, the three triangles when weighted together form one large triangle. The slope changes at: 0, 1000, and 2000. At each of these 3 points the kernel smoothed density is continuous, but not differentiable.
6.20. [Figure: the triangular kernel smoothed density with bandwidth 800, plotted from -500 to 2500.]
Comment: The slope changes at: -300, 200, 500, 700, 1000, 1300, 1500, 1800, and 2300. At each of these 9 points the kernel smoothed density is continuous, but not differentiable.

6.21. B. The empirical model is: 1/3 at 20, 1/6 at 32, 1/6 at 44, and 1/3 at 50.
For Colonel Klink, using a bandwidth of 14:
The uniform kernel centered at 20 stretches from 6 to 34, and is all to the left of 35, so it contributes nothing to S(35).
The uniform kernel centered at 32 stretches from 18 to 46, and (46 - 35)/28 = 11/28 is to the right of 35 and contributes to S(35).
The uniform kernel centered at 44 stretches from 30 to 58, and (58 - 35)/28 = 23/28 is to the right of 35 and contributes to S(35).
The uniform kernel centered at 50 stretches from 36 to 64, and all of it is to the right of 35 and contributes to S(35).
Colonel Klink's estimate of S(35) is: (0)(1/3) + (11/28)(1/6) + (23/28)(1/6) + (1)(1/3) = 90/168 = 15/28.
For Colonel Sanders, using a bandwidth of 8:
The uniform kernel centered at 20 is all to the left of 35, so it contributes nothing to S(35).
The uniform kernel centered at 32 stretches from 24 to 40, and (40 - 35)/16 = 5/16 is to the right of 35 and contributes to S(35).
The uniform kernel centered at 44 stretches from 36 to 52, and all of it is to the right of 35 and contributes to S(35).
The uniform kernel centered at 50 stretches from 42 to 58, and all of it is to the right of 35 and contributes to S(35).
For Colonel Sanders, his estimate of S(35) is: (0)(1/3) + (5/16)(1/6) + (1)(1/6) + (1)(1/3) = 53/96.
The difference between the estimates of Colonel Klink and Colonel Sanders is: 15/28 - 53/96 = (360 - 371)/672 = -11/672 = -0.0164.
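A sketch of the computation in 6.21; the helper estimates S(x) as the average fraction of each uniform kernel's area to the right of x (the function name is mine):

sample = [20, 20, 32, 44, 50, 50]

def uniform_kernel_survival(x, data, b):
    # For each kernel, the fraction of its area to the right of x; average over the sample.
    total = 0.0
    for y in data:
        if x <= y - b:
            total += 1.0
        elif x < y + b:
            total += (y + b - x) / (2.0 * b)
    return total / len(data)

klink = uniform_kernel_survival(35, sample, 14)   # 15/28 = 0.5357
sanders = uniform_kernel_survival(35, sample, 8)  # 53/96 = 0.5521
print(klink - sanders)  # about -0.0164, matching 6.21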
6.22. The variance of the kernel smoothed density is the variance of the data (the variance of the empirical distribution function) plus the variance of the kernel. Therefore, the difference between the variance of the empirical distribution function and the variance of the triangular kernel smoothed density with bandwidth b is minus the variance of the triangular kernel: -b²/6.
Comment: If you do not remember the variance of the triangular kernel, one can compute it via integration.

6.23. C. We want the largest number of values within 10 of x, since only these values contribute.
x = 10: [0, 20] has 6 values. x = 20: [10, 30] has 7 values. x = 30: [20, 40] has 8 values. x = 40: [30, 50] has 7 values. x = 50: [40, 60] has 7 values.
Comment: Values exactly 10 from x contribute to the uniform kernel density estimate of f(x).

6.24. B. In each case, we need to find what portion of the area of the uniform kernel centered at one of the sample points is to the left of a certain value. For each possibility, all of the area is included for the kernel centered at 9. The kernel centered at 15 goes from 11 to 19.
For F(17.5), 6.5/8 of the area of the kernel centered at 15 is included, 3.5/8 of the area of the kernel centered at 18 is included, 1.5/8 of the area of the kernel centered at 20 is included, and 0.5/8 of the area of the kernel centered at 21 is included.
F(17.5) = (1/5)(1 + 6.5/8 + 3.5/8 + 1.5/8 + 0.5/8) = (1/5)(20/8) = 0.5. Thus 17.5 is the estimated median.
Comment: Trying the other choices, the distribution function is either too small or too big. For example, for F(17), 6/8 of the area of the kernel centered at 15 is included, 3/8 of the area of the kernel centered at 18 is included, 1/8 of the area of the kernel centered at 20 is included, and none of the area of the kernel centered at 21 is included. F(17) = (1/5)(1 + 6/8 + 3/8 + 1/8) = (1/5)(18/8) = 0.45. Similarly, F(18) = (1/5)(1 + 7/8 + 4/8 + 2/8 + 1/8) = (1/5)(22/8) = 0.55.

6.25. A. Only those times within 5 of 20 contribute. There are 3 such times out of 8. Each uniform kernel has a height of: 1/{(2)(5)} = 1/10. So the p.d.f. at 20 is: (3/8)(1/10) = 0.0375.

6.26. D. Only those times within 5 of 20 contribute. There are 3 such times out of 8.
The triangular kernel centered at 17 has density at 20 of: (5 - |20 - 17|)/5² = 0.08.
The triangular kernel centered at 19 has density at 20 of: (5 - |20 - 19|)/5² = 0.16.
The triangular kernel centered at 22 has density at 20 of: (5 - |20 - 22|)/5² = 0.12.
Thus the smoothed pdf at 20 is: (0.08 + 0.16 + 0.12)/8 = 0.045.
Comment: The three triangles that contribute to the kernel smoothed density at 20 are:
[Figures: the triangle kernels centered at 17, 19, and 22, each with bandwidth 5, plotted from 15 to 30.]
Each triangular kernel has base of (2)(5) = 10 and height of 1/5 = 0.2, for an area of one. For example, the triangle centered at 22 has height at 20 of: (3/5)(0.2) = 0.12.
The entire kernel smoothed density is an average of 8 triangular kernels:
[Figure: the full triangular kernel smoothed density.]
6.27. E. The mean of the data is: 18.5. The second moment of the data is: 402.25. Variance of the data is: 402.25 - 18.5² = 60.
The variance of the uniform kernel is: width²/12 = 20²/12 = 100/3.
Variance of the kernel smoothed density is: 60 + 100/3 = 93.33.

6.28. B. The mean of the data is: 18.5. The second moment of the data is: 402.25. Variance of the data is: 402.25 - 18.5² = 60.
The variance of the triangle kernel is: ∫[0 to 10] x² (10 - x) dx / ∫[0 to 10] (10 - x) dx = (10000/3 - 10000/4)/(100/2) = 50/3.
Variance of the kernel smoothed density is: 60 + 50/3 = 76.67.
Comment: The variance of the triangle kernel is: b²/6 = 10²/6.

6.29. A. We need to determine how much of each kernel is to the left of 15.
The kernel centered at 7 is all to the left of 15.
The kernel centered at 11, goes from 6 to 16, and is 90% to the left of 15.
The kernel centered at 13, goes from 8 to 18, and is 70% to the left of 15.
The kernel centered at 17, goes from 12 to 22, and is 30% to the left of 15.
The kernel centered at 19, goes from 14 to 24, and is 10% to the left of 15.
The remaining kernels are all to the right of 15.
The kernel density estimate of F(15) is: (1 + 0.9 + 0.7 + 0.3 + 0.1)/8 = 0.375.
6.30. C. We need to determine how much of each kernel is to the left of 16.
The kernel centered at 7 is all to the left of 16.
The kernel centered at 11 is all to the left of 16.
The kernel centered at 13 goes from 8 to 18, and is: 1 - (2)(0.08)/2 = 92% to the left of 16.
The kernel centered at 17 goes from 12 to 22, and is: (4)(0.16)/2 = 32% to the left of 16.
The kernel centered at 19 goes from 14 to 24, and is: (2)(0.08)/2 = 8% to the left of 16.
The remaining kernels are all to the right of 16.
The kernel density estimate of F(16) is: (1 + 1 + 0.92 + 0.32 + 0.08)/8 = 0.415.
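The distribution-function estimates in 6.29 and 6.30 can be checked with a short sketch; each kernel contributes the fraction of its area to the left of the evaluation point (helper names are mine):

times = [7, 11, 13, 17, 19, 22, 28, 31]

def uniform_cdf_contrib(x, y, b):
    if x <= y - b:
        return 0.0
    if x >= y + b:
        return 1.0
    return (x - (y - b)) / (2.0 * b)

def triangle_cdf_contrib(x, y, b):
    if x <= y - b:
        return 0.0
    if x >= y + b:
        return 1.0
    if x <= y:
        return 0.5 * ((x - y + b) / b) ** 2
    return 1.0 - 0.5 * ((y + b - x) / b) ** 2

print(sum(uniform_cdf_contrib(15, y, 5) for y in times) / len(times))   # 0.375, matching 6.29
print(sum(triangle_cdf_contrib(16, y, 5) for y in times) / len(times))  # 0.415, matching 6.30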
6.31. C. The density of a Gamma is: f(x) = θ^(-α) x^(α-1) e^(-x/θ)/Γ(α) = θ⁻⁵ x⁴ e^(-x/θ)/24. f(20) = 6666.67 e^(-20/θ)/θ⁵.
For the kernel with mean at point y, θ = y/α = y/5. f(20; y) = 20,833,333 e^(-100/y)/y⁵.
f(20; 7) = 0.00077. f(20; 11) = 0.01458. f(20; 13) = 0.02560. f(20; 17) = 0.04091. f(20; 19) = 0.04357. f(20; 22) = 0.04291. f(20; 28) = 0.03403. f(20; 31) = 0.02891.
The smoothed density at 20 is: (0.00077 + 0.01458 + 0.02560 + 0.04091 + 0.04357 + 0.04291 + 0.03403 + 0.02891)/8 = 0.02891.

6.32. B. The density of a Pareto is: f(x) = α θ^α / (x + θ)^(α+1) = 5 θ⁵ / (x + θ)⁶. f(20) = 5 θ⁵ / (20 + θ)⁶.
For the kernel with mean at point y, θ = y(α - 1) = 4y. f(20; y) = (5)(4⁵) y⁵ / (20 + 4y)⁶ = 1.25 y⁵ / (5 + y)⁶.
f(20; 7) = 0.00704. f(20; 11) = 0.01200. f(20; 13) = 0.01365. f(20; 17) = 0.01565. f(20; 19) = 0.01620. f(20; 22) = 0.01663. f(20; 28) = 0.01666. f(20; 31) = 0.01644.
The smoothed density at 20 is: (0.00704 + 0.01200 + 0.01365 + 0.01565 + 0.01620 + 0.01663 + 0.01666 + 0.01644)/8 = 0.01428.

6.33. C. The distribution of a Pareto is: F(x) = 1 - θ^α / (x + θ)^α = 1 - 1/(1 + x/θ)⁵. S(20) = 1/(1 + 20/θ)⁵.
For the kernel with mean at point y, θ = y(α - 1) = 4y. S(20; y) = 1/(1 + 5/y)⁵.
S(20; 7) = 0.06754. S(20; 11) = 0.15359. S(20; 13) = 0.19650. S(20; 17) = 0.27551. S(20; 19) = 0.31096. S(20; 22) = 0.35917. S(20; 28) = 0.43976. S(20; 31) = 0.47347.
The smoothed survival function at 20 is: (0.06754 + 0.15359 + 0.19650 + 0.27551 + 0.31096 + 0.35917 + 0.43976 + 0.47347)/8 = 0.28456.
The smoothed distribution function at 20 is: 1 - 0.28456 = 0.71544.
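The same pattern works for the Pareto kernel of 6.32 and 6.33, with θ = y(α - 1) so that each kernel has mean y; a sketch with illustrative names:

times = [7, 11, 13, 17, 19, 22, 28, 31]
alpha = 5

def pareto_pdf(x, theta, a):
    return a * theta**a / (x + theta) ** (a + 1)

def pareto_sf(x, theta, a):
    return (theta / (x + theta)) ** a

# theta = y(alpha - 1) = 4y, so each kernel has mean y.
pdf_20 = sum(pareto_pdf(20, 4 * y, alpha) for y in times) / len(times)
sf_20 = sum(pareto_sf(20, 4 * y, alpha) for y in times) / len(times)
print(pdf_20)     # about 0.0143, matching 6.32
print(1 - sf_20)  # about 0.715, matching the solution to 6.33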
6.34. B. Since there are an even number of items in the sample, the empirical median is: (17 + 19)/2 = 18.
The contributions to F(18) are for the various uniform kernels: 1, 1, 1, 6/10, 4/10, 1/10, 0, 0.
Thus for the kernel smoothed distribution, F̂(18) = (1 + 1 + 1 + 0.6 + 0.4 + 0.1)/8 = 4.1/8 > 0.5. Thus the median of the kernel smoothed distribution is less than 18.
For x = 18 - c, where c is small, the contributions to F̂(18 - c) are for the various kernels: 1, 1, 1 - c/10, (6 - c)/10, (4 - c)/10, (1 - c)/10, 0, 0.
Thus, F̂(18 - c) = (4.1 - 4c/10)/8 = 0.5125 - c/20.
Setting F̂(18 - c) equal to 0.5: 0.5 = 0.5125 - c/20. ⇒ c = 0.25.
Thus F̂(17.75) = 0.5, and 17.75 is the median of the kernel smoothed distribution.
Comment: F̂(17.75) = (1 + 1 + 0.975 + 0.575 + 0.375 + 0.075 + 0 + 0)/8 = 0.5.

6.35. E. For each data point, we take a Normal Distribution with mean equal to that data point. For example, for the data point 22, F(20) = Φ[(20 - 22)/5] = Φ[-0.4] = 0.3446.
The estimate of F(20) is: (1/8) {Φ[(20 - 7)/5] + Φ[(20 - 11)/5] + Φ[(20 - 13)/5] + Φ[(20 - 17)/5] + Φ[(20 - 19)/5] + Φ[(20 - 22)/5] + Φ[(20 - 28)/5] + Φ[(20 - 31)/5]}
= (1/8) {Φ[2.6] + Φ[1.8] + Φ[1.4] + Φ[0.6] + Φ[0.2] + Φ[-0.4] + Φ[-1.6] + Φ[-2.2]}
= (1/8) {0.9953 + 0.9641 + 0.9192 + 0.7257 + 0.5793 + 0.3446 + 0.0548 + 0.0139} = 0.5746.
Comment: Loss Models does not mention using the Normal Distribution as a kernel, but I have used it in a manner parallel to the Gamma or the Pareto distributions. One could use the density of the Normal to estimate the probability density function.

6.36. D. 1. True. 2. False. As α increases, the amount of smoothing decreases. 3. True. Each exponential kernel (gamma kernel with α = 1) has a mode of zero, therefore so does the smoothed density which is an average of the individual kernels.
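A sketch of the Normal-kernel estimate in 6.35, using the exact Normal distribution function rather than rounded table values, so the result differs slightly in the fourth decimal place (helper name is mine):

import math

times = [7, 11, 13, 17, 19, 22, 28, 31]

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 5
estimate = sum(normal_cdf((20 - y) / sigma) for y in times) / len(times)
print(estimate)  # about 0.575, close to the 0.5746 obtained from table values in 6.35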
6.37. B. The uniform kernel centered at 500 has all of its area to the right of 250, so E[X ∧ 250] = 250.
The uniform kernel centered at 300 runs from 150 to 450. It has 1/3 of its area to the left of 250, and 2/3 to the right of 250. Thus for this kernel, E[X ∧ 250] = (1/3)(200) + (2/3)(250) = 233.333.
The uniform kernel centered at 200 runs from 50 to 350. It has 2/3 of its area to the left of 250, and 1/3 to the right of 250. Thus for this kernel, E[X ∧ 250] = (2/3)(150) + (1/3)(250) = 183.333.
The overall limited expected value is a weighted average of that for the individual kernels:
E[X ∧ 250] = {(2)(183.333) + 233.333 + 250} / (2 + 1 + 1) = 212.5.
6.38. C. The triangle kernel centered at 500 has all of its area to the right of 250, so E[X ∧ 250] = 250.
The triangle kernel centered at 300 runs from 150 to 450. The area of the triangle to the left of 250 is: (1/2)(100/150)² = 2/9. The average of this right triangle is: 250 - 100/3 = 216.667. Thus E[X ∧ 250] = (2/9)(216.667) + (1 - 2/9)(250) = 242.593.
The triangle kernel centered at 200 runs from 50 to 350. The area of the triangle to the right of 250 is: (1/2)(100/150)² = 2/9. The average of this right triangle is: 250 + 100/3 = 283.333. Each value greater than 250 contributes 250 rather than its value to E[X ∧ 250]. Thus E[X ∧ 250] = E[X] - (2/9)(283.333 - 250) = 200 - 7.407 = 192.593.
The overall limited expected value is a weighted average of that for the individual kernels:
E[X ∧ 250] = {(2)(192.593) + 242.593 + 250} / (2 + 1 + 1) = 219.445.
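The limited expected values in 6.37 and 6.38 can also be checked by numerically integrating min(x, 250) against the kernel smoothed density; a brute-force sketch (names are mine):

losses = [200, 200, 300, 500]

def kernel_lev(d, data, b, triangle=False, steps=200000):
    # E[min(X, d)] under the kernel smoothed density, by brute-force integration.
    lo, hi = min(data) - b, max(data) + b
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        if triangle:
            f = sum(max(b - abs(x - y), 0.0) / b**2 for y in data) / len(data)
        else:
            f = sum(1.0 / (2 * b) for y in data if abs(x - y) <= b) / len(data)
        total += min(x, d) * f * dx
    return total

print(kernel_lev(250, losses, 150))                 # about 212.5, matching 6.37
print(kernel_lev(250, losses, 150, triangle=True))  # about 219.4, matching 6.38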
6.39. For the uniform kernel centered at 200: F(x) = 0 for x < 50; (x - 50)/300 for 50 ≤ x ≤ 350; 1 for x > 350.
For the uniform kernel centered at 300: F(x) = 0 for x < 150; (x - 150)/300 for 150 ≤ x ≤ 450; 1 for x > 450.
For the uniform kernel centered at 500: F(x) = 0 for x < 350; (x - 350)/300 for 350 ≤ x ≤ 650; 1 for x > 650.
The sample is: 200, 200, 300, 500. Thus the empirical model assigns 1/2 to 200, 1/4 to 300, and 1/4 to 500.
Thus the kernel smoothed distribution function is:
For x ≤ 50, F(x) = 0.
For 50 ≤ x ≤ 150, F(x) = (1/2)(x - 50)/300 = x/600 - 1/12.
For 150 ≤ x ≤ 350, F(x) = (1/2)(x - 50)/300 + (1/4)(x - 150)/300 = x/400 - 5/24.
For 350 ≤ x ≤ 450, F(x) = (1/2)(1) + (1/4)(x - 150)/300 + (1/4)(x - 350)/300 = x/600 + 1/12.
For 450 ≤ x ≤ 650, F(x) = (1/2)(1) + (1/4)(1) + (1/4)(x - 350)/300 = x/1200 + 11/24.
For x ≥ 650, F(x) = 1.
Comment: [Figure: graph of the uniform kernel smoothed distribution function F(x); F(50) = 0, F(150) = 1/6, F(350) = 2/3, F(450) = 5/6, and F(650) = 1.]
6.40. The triangle kernel centered at 200 has f(x) = (x - 50)/150², for 50 ≤ x ≤ 200. Therefore, F(x) = (x - 50)²/45,000, for 50 ≤ x ≤ 200. F(200) = 1/2. f(x) = (350 - x)/150², for 200 ≤ x ≤ 350. Therefore, F(x) = 1 - (350 - x)²/45,000, for 200 ≤ x ≤ 350.
For the triangle kernel centered at 200: F(x) = 0 for x ≤ 50; (x - 50)²/45,000 for 50 ≤ x ≤ 200; 1 - (350 - x)²/45,000 for 200 ≤ x ≤ 350; 1 for x ≥ 350.
For the triangle kernel centered at 300: F(x) = 0 for x ≤ 150; (x - 150)²/45,000 for 150 ≤ x ≤ 300; 1 - (450 - x)²/45,000 for 300 ≤ x ≤ 450; 1 for x ≥ 450.
For the triangle kernel centered at 500: F(x) = 0 for x ≤ 350; (x - 350)²/45,000 for 350 ≤ x ≤ 500; 1 - (650 - x)²/45,000 for 500 ≤ x ≤ 650; 1 for x ≥ 650.
The sample is: 200, 200, 300, 500. Thus the empirical model assigns 1/2 to 200, 1/4 to 300, and 1/4 to 500.
Thus the kernel smoothed distribution function is:
For x ≤ 50, F(x) = 0.
For 50 ≤ x ≤ 150, F(x) = (x - 50)²/90,000.
For 150 ≤ x ≤ 200, F(x) = (x - 50)²/90,000 + (x - 150)²/180,000.
For 200 ≤ x ≤ 300, F(x) = 1/2 - (350 - x)²/90,000 + (x - 150)²/180,000.
For 300 ≤ x ≤ 350, F(x) = 3/4 - (350 - x)²/90,000 - (450 - x)²/180,000.
For 350 ≤ x ≤ 450, F(x) = 3/4 - (450 - x)²/180,000 + (x - 350)²/180,000.
For 450 ≤ x ≤ 500, F(x) = 3/4 + (x - 350)²/180,000.
For 500 ≤ x ≤ 650, F(x) = 1 - (650 - x)²/180,000.
For x ≥ 650, F(x) = 1.
Comment: [Figure: graph of the triangle kernel smoothed distribution function F(x); F(150) = 1/9, F(200) = 19/72, F(300) = 43/72, F(350) = 25/36, F(450) = 29/36, F(500) = 7/8, and F(650) = 1.]
6.41. C. For a bandwidth of 50, only kernels centered at points within 50 of 100 contribute to the estimate of f(100).
k70(100) = (3/200) {1 - ((100 - 70)/50)²} = (3/200)(1 - 0.36) = 0.0096.
k120(100) = (3/200) {1 - ((100 - 120)/50)²} = (3/200)(1 - 0.16) = 0.0126.
Thus the estimate of f(100) = (0 + 0 + 0.0096 + 0.0126 + 0)/5 = 0.00444.
Comment: [Figure: graph of the Epanechnikov kernel smoothed density, plotted from 0 to 250.]
6.42. E. For a bandwidth of 50, kernels centered at any point less than or equal to 100 - 50 contribute 1 to F(100). Kernels centered at any point greater than or equal to 100 + 50 contribute 0 to F(100).
k70(x) = (3/200) {1 - ((x - 70)/50)²}, 20 ≤ x ≤ 120.
The contribution from the kernel centered at 70 to F(100) is the area to the left of 100, which is:
(3/200) ∫[20 to 100] {1 - ((x - 70)/50)²} dx = (3/200) {80 - (100 - 70)³/[(3)(50²)] + (20 - 70)³/[(3)(50²)]} = 0.896.
k120(x) = (3/200) {1 - ((x - 120)/50)²}, 70 ≤ x ≤ 170.
The contribution from the kernel centered at 120 to F(100) is the area to the left of 100, which is:
(3/200) ∫[70 to 100] {1 - ((x - 120)/50)²} dx = (3/200) {30 - (100 - 120)³/[(3)(50²)] + (70 - 120)³/[(3)(50²)]} = 0.216.
Thus the estimate of F(100) = (1 + 1 + 0.896 + 0.216 + 0)/5 = 0.6224.
Comment: In general, for an Epanechnikov kernel centered at y, the contribution to F(x) is:
0, for x ≤ y - b;
1/2 + 3(x - y)/(4b) - (x - y)³/(4b³), for y - b < x < y + b;
1, for x ≥ y + b.
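A sketch of the Epanechnikov computations in 6.41 and 6.42, using the closed-form contribution to F(x) given in the comment above (helper names are mine):

values = [20, 30, 70, 120, 200]
b = 50

def epan_pdf(x, y, b):
    if abs(x - y) >= b:
        return 0.0
    return 3.0 / (4 * b) * (1 - ((x - y) / b) ** 2)

def epan_cdf_contrib(x, y, b):
    if x <= y - b:
        return 0.0
    if x >= y + b:
        return 1.0
    return 0.5 + 3 * (x - y) / (4 * b) - (x - y) ** 3 / (4 * b**3)

print(sum(epan_pdf(100, y, b) for y in values) / len(values))          # 0.00444, matching 6.41
print(sum(epan_cdf_contrib(100, y, b) for y in values) / len(values))  # about 0.622, matching 6.42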
6.43. D. Only two of the four data points are within 5 of 10. ky(x) = g(x - y) = 0.1 + (x - y)³/2000, y - 5 ≤ x ≤ y + 5.
k6(10) = 0.1 + (10 - 6)³/2000 = 0.132. k13(10) = 0.1 + (10 - 13)³/2000 = 0.0865.
Thus the kernel smoothed density at 10 is: (0.132 + 0.0865)/4 = 0.054625.
Comment: Not a sensible kernel to use for practical applications. ky(x) is the density at x for a kernel "centered" at y. In this case, the mean of the kernel is not y.

6.44. C. All of the kernel centered at the data point 4 is to the left of 10. None of the kernel centered at the data point 17 is to the left of 10.
k6(x) = 0.1 + (x - 6)³/2000, 1 < x < 11.
K6(10) = ∫[1 to 10] k6(t) dt = (9)(0.1) + ∫[1 to 10] (t - 6)³/2000 dt = 0.9 - 0.046125 = 0.853875.
k13(x) = 0.1 + (x - 13)³/2000, 8 < x < 18.
K13(10) = ∫[8 to 10] k13(t) dt = (2)(0.1) + ∫[8 to 10] (t - 13)³/2000 dt = 0.2 - 0.0680 = 0.132.
Thus the kernel smoothed distribution at 10 is: (1 + 0.853875 + 0.132 + 0)/4 = 0.4964.
6.46. D. The empirical model is: 1/2 @20, 1/3 @30, and 1/6 @45.
g(t) = (1 - 2t²/225 + t⁴/50,625)/16, -15 ≤ t ≤ 15.
Integrating g(s) from -15 to t:
G(t) = t/16 - t³/5400 + t⁵/4,050,000 + 15/16 - 15³/5400 + 15⁵/4,050,000 = 0.5 + t/16 - t³/5400 + t⁵/4,050,000.
The kernel centered at 20 is all to the left of 40, so its contribution is one to the distribution.
G(40 - 30) = 0.5 + 10/16 - 1000/5400 + 100,000/4,050,000 = 0.9645.
G(40 - 45) = 0.5 - 5/16 + 125/5400 - 3125/4,050,000 = 0.2099.
Thus the kernel smoothed distribution function at 40 is: (1/2)(1) + (1/3)(0.9645) + (1/6)(0.2099) = 0.8565.
6.47. B. The empirical model is: 1/5 @2, 3/5 @3, and 1/5 @7.
The triangular kernel centered at 2 with bandwidth 2 stretches from 0 to 4. It has height 1/2 at 2, and thus area (1/2)(4)(1/2) = 1. Thus it is: (1/2)(3/4) = 3/8 at 2.5.
The triangular kernel centered at 3 with bandwidth 2 stretches from 1 to 5. It has height 1/2 at 3, and thus area (1/2)(4)(1/2) = 1. 2.5 is 3/4 of the way from 1 to 3, and thus this triangle density is: (1/2)(3/4) = 3/8 at 2.5.
The triangular kernel at 7 with bandwidth 2 stretches from 5 to 9 and does not contribute at 2.5.
The height of each triangular kernel at 2.5 is weighted by the empirical probability of the associated point. Thus the estimated density at 2.5 is: (1/5)(3/8) + (3/5)(3/8) + (1/5)(0) = 3/10 = 12/40.
Comment: [Figures: the triangular kernel centered at 3, and the full triangular kernel smoothed density.]
6.48. D. 2 + 1.4 = 3.4 ≤ 4. The uniform kernel centered at 2.0 runs from 0.6 to 3.4, so all of its area is to the left of 4; it contributes its full value to F(4).
4 - 1.4 ≤ 3.3 ≤ 4 + 1.4. The uniform kernel centered at 3.3 runs from 1.9 to 4.7; (4 - 1.9)/2.8 = 0.75, so it contributes 75% of its value to F(4).
The uniform kernel centered at 4.0 runs from 2.6 to 5.4; half of its area is to the left of 4, so it contributes 50% of its value to F(4).
The uniform kernel centered at 4.7 runs from 3.3 to 6.1; (4.0 - 3.3)/2.8 = 0.25, so it contributes 25% of its value to F(4).
The sample was: 2.0, 3.3, 3.3, 4.0, 4.0, 4.7, 4.7, 4.7. For the empirical model we assign: 1/8 probability to 2, 2/8 probability to 3.3, 2/8 probability to 4, and 3/8 probability to 4.7.
The kernel density estimate of F(4) is: (1)(1/8) + (0.75)(2/8) + (0.5)(2/8) + (0.25)(3/8) = 0.53125.
6.49. B. 3 - 1.4 ≤ 2 ≤ 3 + 1.4. (4.4 - 2)/2.8 = 6/7, so the uniform kernel centered at 2 contributes 6/7 of its value to F(3).
(4.4 - 3.3)/2.8 = 11/28, so the uniform kernel centered at 3.3 contributes 11/28 of its value to F(3).
(4.4 - 4)/2.8 = 1/7, so the uniform kernel centered at 4 contributes 1/7 of its value to F(3).
4.7 > 3 + 1.4, so the uniform kernel centered at 4.7 contributes nothing to F(3).
The kernel density estimate of F(3) is: (6/7)(1/8) + (11/28)(2/8) + (1/7)(2/8) + (0)(3/8) = 0.241.
Comment: [Figure: graph of the kernel smoothed distribution function, plotted from 0 to 7.]
6.50. D. The kernel is the equation for a semicircle of radius 1. Thus the kernels are three semicircles centered at 1, 3, and 5, each with radius 1. However, the value 3 appears twice in the sample. Thus the kernels are multiplied by 1/4, 1/2, and 1/4, with the figure centered at 3 having twice the height of the others, as in figure D.

6.51. B. The Empirical Model is 1/10 @ 1, 1/10 @ 2, and 8/10 @ 3.
Bandwidth 1 means each uniform kernel has height 1/2 and stretches ±1 from each point.
Using the uniform kernel with bandwidth 1, the contributions to the smoothed density are:
first kernel: (0.1)(1/2) = 0.05, 0 ≤ x ≤ 2.
second kernel: (0.1)(1/2) = 0.05, 1 ≤ x ≤ 3.
third kernel: (0.8)(1/2) = 0.40, 2 ≤ x ≤ 4.
Therefore, using the uniform kernel with bandwidth 1, the smoothed density is:
0.05, for 0 ≤ x ≤ 1. 0.05 + 0.05 = 0.1, for 1 ≤ x ≤ 2. 0.05 + 0.40 = 0.45, for 2 ≤ x ≤ 3. (0.8)(1/2) = 0.40, for 3 ≤ x ≤ 4.
[Figure: the uniform kernel smoothed density f(x), a step function on 0 ≤ x ≤ 4.]
Thus the distribution function is: 0.05x for 0 ≤ x ≤ 1, 0.05 + 0.1(x - 1) = 0.1x - 0.05 for 1 ≤ x ≤ 2, 0.15 + 0.45(x - 2) = 0.45x - 0.75 for 2 ≤ x ≤ 3, 0.6 + 0.4(x - 3) = 0.4x - 0.6 for 3 ≤ x ≤ 4, and 1 for x > 4.
[Figure: the uniform kernel smoothed distribution function F(x).]
The triangular kernel with bandwidth one has height 1 and stretches ±1 from each point.
Using the triangular kernel with bandwidth 1, the contributions to the smoothed density are:
first kernel: 0.1x, 0 ≤ x ≤ 1, and 0.1(2 - x), 1 ≤ x ≤ 2.
second kernel: 0.1(x - 1), 1 ≤ x ≤ 2, and 0.1(3 - x), 2 ≤ x ≤ 3.
third kernel: 0.8(x - 2), 2 ≤ x ≤ 3, and 0.8(4 - x), 3 ≤ x ≤ 4.
Therefore, the smoothed density is: 0.1x for 0 ≤ x ≤ 1, 0.1(2 - x) + 0.1(x - 1) = 0.1 for 1 ≤ x ≤ 2, 0.1(3 - x) + 0.8(x - 2) = 0.7x - 1.3 for 2 ≤ x ≤ 3, 0.8(4 - x) = -0.8x + 3.2 for 3 ≤ x ≤ 4.
[Figure: the triangular kernel smoothed density f(x).]
Therefore, integrating the smoothed density, the smoothed distribution function is: 0.05x² for 0 ≤ x ≤ 1, 0.05 + 0.1(x - 1) = 0.1x - 0.05 for 1 ≤ x ≤ 2, 0.35x² - 1.3x + 1.35 for 2 ≤ x ≤ 3, -0.4x² + 3.2x - 5.4 for 3 ≤ x ≤ 4, and 1 for x > 4.
[Figure: the triangular kernel smoothed distribution function F(x).]
Therefore, F̂1(x) = F̂2(x) for 1 ≤ x ≤ 2.
Comment: The intervals in the choices should have included their endpoints. Note that on the interval from 1 to 2, the decline in the contribution from the first triangular kernel is exactly canceled by the increase in the contribution from the second triangular kernel. Therefore, on this interval the triangular kernel smoothed density is constant. The triangular smoothed distribution function can only equal that from the uniform kernel on such an interval where the triangular kernel smoothed density is constant. I do not like this exam question; not only is it long, it seems to mostly test mathematical cleverness rather than the application of kernel smoothing to actuarial work.
6.52. E. For the uniform kernel centered at each point, we need to compute what percent is to the left of 40. For example, the kernel centered at 35 goes from 25 to 45, and 75% contributes to F(40).
Age:            25    30    35    35    37    39    45    47    49    55
% contribution: 100%  100%  75%   75%   65%   55%   25%   15%   5%    0
(1/10)(100% + 100% + 75% + 75% + 65% + 55% + 25% + 15% + 5% + 0) = 0.515. S(40) = 1 - 0.515 = 0.485.

6.53. C. F5(150) = (# ≤ 150)/(total #) = 2/5 = 0.4.
The uniform kernel from 82 - 50 = 32 to 82 + 50 = 132 is all to the left of 150.
The uniform kernel from 126 - 50 = 76 to 126 + 50 = 176 is 74% to the left of 150.
The uniform kernel from 161 - 50 = 111 to 161 + 50 = 211 is 39% to the left of 150.
The uniform kernel from 294 - 50 = 244 to 294 + 50 = 344 is 0% to the left of 150.
The uniform kernel from 384 - 50 = 334 to 384 + 50 = 434 is 0% to the left of 150.
Therefore, F̂(150) = (1/5)(1) + (1/5)(74%) + (1/5)(39%) + (1/5)(0) + (1/5)(0) = 0.426.
| F̂(150) - F5(150) | = | 0.426 - 0.4 | = 0.026.
Section 7, Estimation of Percentiles

For a continuous distribution function, the 95th percentile is x, such that F(x) = 0.95. Similarly, the 80th percentile is x, such that F(x) = 0.80. The median is the 50th percentile; at the median the distribution function is 0.5.

Exercise: Let F(x) = 1 - e^(-x/10). Find the 75th percentile of this distribution.
[Solution: 0.75 = 1 - e^(-x/10). x = -10 ln(1 - 0.75) = 13.86. Check: 1 - e^(-13.86/10) = 1 - 0.250 = 0.75.
As shown in Appendix A: VaRp(X) = -θ ln(1 - p). VaR0.75 = -(10) ln(0.25) = 13.86.]

An actuary often wants to estimate the 90th, 95th, or other percentile of the distribution that generated a data set, without making an assumption as to the type of distribution.
For example, for a data set of 4 losses: 378, 552, 961, 2034, the percentiles are estimated as follows:
Loss:                  378   552   961   2034
Estimated percentile:  20%   40%   60%   80%
Leaving 20% of the probability on either tail; we estimate a 1/(4+1) = 20% chance of a future loss less than 378 and a 20% chance of a future loss greater than 2034. Similarly, we estimate a 20% chance of a future loss of size between 961 and 2034. These four losses divide the real line into 5 intervals, and we assign 20% probability to each of these 5 intervals.
By using the number of losses plus one, 4 + 1 = 5, rather than the number of losses, 4, in the denominator, room is left for a future loss bigger than 2034. Using 4 in the denominator would result in the 4th loss of 2034 being (incorrectly) used to estimate the 100th percentile; we would (incorrectly) assume there could never be a loss greater than 2034 in the future. The use of N+1 in the denominator leaves probability on either tail for future losses larger or smaller than those that happen to appear in a finite sample.
This method is distribution free. While we are assuming that the data consists of independent random draws from the same distribution, we have not assumed the losses came from any specific type of distribution such as a Weibull or Pareto.
One can also estimate percentiles in between the 20th and 80th for this data set of 4 observed losses of sizes: 378, 552, 961, 2034, as follows using linear interpolation:
Estimated percentile:  20th  30th  40th  50th   60th  70th    80th
Value:                 378   465   552   756.5  961   1497.5  2034
Exercise: Given 4 observed losses of sizes: 378, 552, 961, 2034, estimate the 65th percentile.
[Solution: The 60th percentile is estimated as 961 and the 80th percentile is estimated as 2034. Linearly interpolating, the 65th percentile is estimated as: (3/4)(961) + (1/4)(2034) = 1229.25.]
Alternately, one could have multiplied the percentile times the number of losses + 1 to get: (0.65)(5) = 3.25. Then the "3.25th loss" is our estimate of the 65th percentile. The 3.25th loss is linearly interpolated between the sizes of the 3rd and 4th losses: (3/4)(961) + (1/4)(2034) = 1229.25.
This technique is referred to by Loss Models as the "smoothed empirical estimate" of the percentile.62
Given a data set with N points, the smoothed empirical estimate of the pth percentile is the p(N+1) loss from smallest to largest, linearly interpolating between two loss amounts if necessary.

Exercise: Estimate the 55th percentile of the ungrouped data in Section 2.
[Solution: There are 130 losses. (0.55)(130 + 1) = 72.05. The 72nd loss is 128,000. The 73rd loss is 131,300. Thus the estimated 55th percentile is by linear interpolation: (0.95)(128,000) + (0.05)(131,300) = 128,165.]

Exercise: Given 4 observed losses: 378, 552, 961, 2034, estimate the 90th percentile.
[Solution: The 80th percentile is estimated as 2034, the largest observed loss. Thus all we can say is the 90th percentile is greater than 2034.]
In this case, one can not estimate percentiles less than 1/5 = 20% or greater than 4/5 = 80%. In general, with N points, one can not estimate percentiles less than 1/(N+1) or greater than N/(N+1). While this is a significant limitation for the small data sets common in exam questions, it is not significant for large data sets.

Exercise: Estimate the 95th percentile of the ungrouped data in Section 2.
[Solution: There are 130 losses. (0.95)(130 + 1) = 124.45. The 124th loss is 1,078,800. The 125th loss is 1,117,600. Thus the estimated 95th percentile is by linear interpolation: (0.55)(1,078,800) + (0.45)(1,117,600) = 1,096,260.]
This is how one gets point estimates for percentiles.63
62 See Definition 15.3 in Loss Models.
63 How to get interval estimates of percentiles is not on the syllabus. See for example "Mahler's Guide to Statistics," for CAS Exam 3L.
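A minimal Python sketch of the smoothed empirical estimate (the function name is mine): it locates the p(N+1)th loss from smallest to largest, interpolating linearly, and is undefined for percentiles below 1/(N+1) or above N/(N+1):

def smoothed_percentile(data, p):
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1)  # the "p(N+1)th" loss, 1-indexed
    if pos < 1 or pos > n:
        raise ValueError("percentile not estimable from this sample")
    lower = int(pos)
    if lower == pos or lower == n:
        return xs[lower - 1]
    frac = pos - lower
    return (1 - frac) * xs[lower - 1] + frac * xs[lower]

print(smoothed_percentile([378, 552, 961, 2034], 0.65))  # 1229.25, matching the exercise above
# smoothed_percentile([378, 552, 961, 2034], 0.90) raises an error: only the 20th through 80th
# percentiles are estimable from a sample of four.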
Empirical Distribution vs. Smoothed Empirical Estimate of Percentiles:

The smoothed empirical estimate of percentiles differs from the Empirical Distribution Function. The Empirical Distribution Function at x is defined as: (# losses ≤ x)/(total # of losses).

Exercise: What is the Empirical Distribution Function for a data set of 4 losses: 378, 552, 961, 2034?
[Solution: The Empirical Distribution Function is 0 for x < 378. It jumps up to 1/4 at 378. It is 1/4 for 378 ≤ x < 552. There is another jump discontinuity at the next observed value of 552. The Empirical Distribution Function is: 1/2 for 552 ≤ x < 961, 3/4 for 961 ≤ x < 2034, and 1 for 2034 ≤ x.]

Below are shown the Empirical Distribution Function (thinner horizontal lines) and the smoothed empirical estimates of percentiles (thicker line segments) for the data set: 378, 552, 961, 2034.
[Figure: the empirical distribution function (step function) and the smoothed empirical percentile estimates (line segments), plotted over 378 to 2034.]
N versus N+1:

For a data set of size N, there is an N in the denominator of the Empirical Distribution Function.64 In contrast, in the smoothed empirical estimate of percentiles, the ith loss is an estimate of the i/(N+1) percentile; there is N+1 in the denominator.65
In some cases we use in the denominator N, the number of data points, while in others we use N + 1:
Smoothed empirical estimate of percentiles ⇒ N+1 in the denominator.
p-p plots ⇒ N+1 in the denominator.
Empirical Distribution Function ⇒ N in the denominator.
Kolmogorov-Smirnov Statistic ⇒ N in the denominator.

64 The Empirical Distribution Function is used in the computation of the Kolmogorov-Smirnov Statistic discussed in a subsequent section.
65 The estimated percentile is also used in the p-p plots, discussed in a subsequent section. In p-p plots there is N+1 in the denominator of the first component of each point.
Problems:

7.1 (1 point) There are seven losses of sizes: 15, 22, 25, 39, 43, 54, 76.
Estimate the 75th percentile of the size of loss distribution.
A. less than 53
B. at least 53 but less than 55
C. at least 55 but less than 57
D. at least 57 but less than 59
E. at least 59

Use the following information for the next two questions:
• The state of Minnehaha requires that each of its towns and cities budget for snow removal, so as to expect to have enough money to remove all the snow in 19 out of 20 winters.
• You are hired as a consultant by the state of Minnehaha to check the snow removal budgets of each of its towns and cities.
• The town of Frostbite Falls, Minnehaha pays $10,000 per inch for snow removal.
• Over the last 127 winters, the ten with the most snow in Frostbite Falls have had the following numbers of inches: 133, 137, 142, 151, 162, 166, 176, 181, 190, 224.

7.2 (1 point) Determine the single best estimate of how much Frostbite Falls needs to budget for snow removal for the coming winter.
A. Less than $1.50 million
B. At least $1.50 million, but less than $1.55 million
C. At least $1.55 million, but less than $1.60 million
D. At least $1.60 million, but less than $1.65 million
E. At least $1.65 million

7.3 (1 point) You are rehired as a consultant by the state of Minnehaha to do the same job one year later. During the most recent winter, Frostbite Falls had 282 inches of snow! Determine the revised single best estimate of how much Frostbite Falls needs to budget for snow removal for the coming winter.
A. Less than $1.50 million
B. At least $1.50 million, but less than $1.55 million
C. At least $1.55 million, but less than $1.60 million
D. At least $1.60 million, but less than $1.65 million
E. At least $1.65 million
7.4 (2 points) You are given the following random sample of 19 claim amounts from policies with a limit of 100:
5, 5, 5, 5, 10, 10, 15, 20, 25, 25, 25, 30, 40, 50, 75, 90, 100, 100, 100.
Determine the smoothed empirical estimate of the 70th percentile.
(A) 35  (B) 40  (C) 45  (D) 50  (E) can not be determined

7.5 (2 points) You are given the following random sample of 13 claim amounts:
99 133 175 216 250 277 651 698 735 745 791 906 947
Using the smoothed empirical estimates, to which percentile does 500 correspond?
(A) 45th  (B) 46th  (C) 47th  (D) 48th  (E) 49th

7.6 (2 points) You are given the following size of loss data for general liability insurance:
175 200 250 300 300 350 400 450 500 500 550 800 1000 1500
Calculate the smoothed empirical estimate of the 75th percentile.
(A) 550  (B) 600  (C) 650  (D) 700  (E) 750

7.7 (4, 5/86, Q.52) (1 point) You are given the following random sample 1.1, 1.3, 1.8, 2.4, 2.5, 2.6, 2.9, 3.0, 3.2 and 3.7 from an unknown continuous distribution.
Which of the following represents the 90th sample percentile?
A. Undefined  B. 3.20  C. 3.25  D. 3.65  E. 3.70

7.8 (4, 5/87, Q.53) (1 point) You are given the following random sample from an unknown continuous distribution: 34, 61, 20, 16, 91, 85, 6.
What are the 25th and 75th sample percentiles, respectively?
A. 13.4, 67  B. 27.25, 69.75  C. 16, 61  D. 11, 88  E. None of the above.

7.9 (4B, 11/92, Q.11) (1 point) A random sample of 20 observations has been ordered as follows:
12, 16, 20, 23, 26, 28, 30, 32, 33, 35, 36, 38, 39, 40, 41, 43, 45, 47, 50, 57
Determine the 60th sample percentile, Π60.
A. 32.4  B. 36.0  C. 38.0  D. 38.4  E. 38.6
7.10 (4B, 5/93, Q.30) (1 point) The following 20 wind losses, recorded in millions of dollars, occurred in 1992:
1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 6, 6, 8, 10, 13, 14, 15, 18, 22, 25
Calculate the 75th sample percentile.
A. 12.25  B. 13.00  C. 13.25  D. 13.75  E. 14.00

7.11 (4B, 11/97, Q.29) (2 points) You wish to calculate the (100p)th sample percentile based on a random sample of 4 observations. Determine all values of p for which the (100p)th sample percentile is defined.
A. 0 ≤ p ≤ 1
B. 0.20 ≤ p ≤ 0.80
C. 0.25 ≤ p ≤ 0.75
D. 0.33 ≤ p ≤ 0.67
E. p = 0.50

7.12 (IOA 101, 4/00, Q.1) (1.5 points) Fourteen economists were asked to provide forecasts for the percentage rate of inflation for the third quarter of 2002. They produced the forecasts given below.
1.2 1.4 1.5 1.5 1.7 1.8 1.8 1.9 1.9 2.1 2.7 3.2 3.9 5.0
Calculate the median and the 25th and 75th percentiles of these forecasts.

7.13 (4, 5/00, Q.2) (2.5 points) You are given the following random sample of ten claims:
46 121 493 738 775 1078 1452 2054 2199 3207
Determine the smoothed empirical estimate of the 90th percentile.
(A) Less than 2150
(B) At least 2150, but less than 2500
(C) At least 2500, but less than 2850
(D) At least 2850, but less than 3200
(E) At least 3200

7.14 (IOA 101, 4/01, Q.1) (1.5 points) The following amounts are the sizes of claims on homeowners insurance policies for a certain type of repair.
198 221 215 209 224 210 223 215 203 210 220 200 208 212 216
Determine the smoothed empirical estimates of the 25th percentile (lower quartile), median, and 75th percentile (upper quartile) of these claim amounts.

7.15 (4, 11/02, Q.2 & 2009 Sample Q.31) (2.5 points) You are given the following claim data for automobile policies:
200 255 295 320 360 420 440 490 500 520 1020
Calculate the smoothed empirical estimate of the 45th percentile.
(A) 358  (B) 371  (C) 384  (D) 390  (E) 396
7.16 (4, 11/04, Q.2 & 2009 Sample Q.134) (2.5 points) You are given the following random sample of 13 claim amounts:
99 133 175 216 250 277 651 698 735 745 791 906 947
Determine the smoothed empirical estimate of the 35th percentile.
(A) 219.4  (B) 231.3  (C) 234.7  (D) 246.6  (E) 256.8
2013-4-6, Fitting Loss Dists. §7 Estimation of Percentiles, HCM 10/14/12, Page 132 Solutions to Problems: 7.1. B. 75th percentile is about the (.75)(7+1) = 6th claim from smallest to largest, which is 54. 7.2. C. The estimate of the 95th percentile is the (127+1)(.95) = 121.6 winter. The 121st winter is 151, the 122nd winter is 162. Thus the estimated 95th percentile is: (.4)(151) + (.6)(162) = 157.6 inches. (157.6 inches)($10,000 / inch) = $1.576 million. 7.3. D. The estimate of the 95th percentile is the (128+1)(.95) = 122.55 winter. The 122nd winter is 162, the 123rd winter is 166. Thus the estimated 95th percentile is (.45)(162)+(.55)(166) = 164.2 inches. (164.2 inches)($10,000 / inch) = $1.642 million. Comment: Note that we have relied upon an estimation technique that assumes that each year of snow is an independent random draw from the same (unknown) distribution. If in fact the amount of snowfall for two consecutive years is highly correlated, given the very large amount of snow in the most recent winter, we may have underestimated the budget for the next year. In the case of correlated years some sort of Simulation Model might be helpful. Insurance losses for consecutive years tend to be positively correlated. If one has a better than expected year, the next year is also likely to be better than expected, and vice versa. In any case, relying on a 127 or 128 year long record of weather, treating each year as equally relevant regardless of how long ago it is, may be a questionable methodology. However, do not worry about all of these types of issues, when answering what are intended to be straightforward questions on lower numbered actuarial exams. 7.4. D. p(N+1) = (.7)(19 + 1) = 14. We want the 14th claim from smallest to largest: 50. Comment: The 16th claim from smallest to largest, 90, is the estimate of the 16/20 = 80th percentile. Due to the censoring from above, we can not estimate percentiles larger than the 80th. 7.5. C. 277 corresponds to: 6/(13 + 1) = 0.4286. 651 corresponds to: 7/(13 + 1) = 0.5. 500 corresponds to: (.4286)(651- 500)/(651 - 277) + (.5)(500 - 277)/(651 - 277) = 0.471. Comment: Backwards question. (.47)(13 + 1) = 6.58. Thus, the smoothed empirical of the 47th percentile is: (.42)(277) + (.58)(651) = 494. 7.6. B. p(N+1) = (.75)(14 + 1) = 11.25. 11th loss is 550. 12th loss is 800. estimate of the 75th percentile is: (.75)(550) + (.25)(800) = 612.5. 7.7. D. Since one has 10 claims, the 90th percentile is estimated as the (1+10)(.90) = 9.9th claim. The 9th claim is 3.2 while the 10th claim is 3.7. Interpolating linearly, one gets: (.9)(3.7) + (.1)(3.2) = 3.65.
2013-4-6, Fitting Loss Dists. §7 Estimation of Percentiles, HCM 10/14/12, Page 133 7.8. E. Order the data points: 6, 16, 20, 34 , 61, 85, 91. We have 7 data points. The pth percentile is therefore estimated as the p(7+1) point. Thus the 25th percentile is estimated as the (.25)(8) = 2nd point = 16. The 75th percentile is estimated as the (.75)(8) = 6th point: 85. 7.9. E. With 20 observations one estimates the 60th percentile as the (.60)(20+1) = 12.6 claim. The 12th claim (from smallest to largest) is 38, while the 13th claim is 39. Interpolating one estimates the 60th percentile as: (38)(.4) + (39)(.6) = 38.6. 7.10. D. For 20 claims, in order to estimate the 75th percentile, we look at the (20+1)(.75) = 15.75 claim. The 15th claim is of size 13, while the 16th claim is of size 14. Linearly interpolating: (1/4)(13) + (3/4)(14) = 13.75. 7.11. B. Let the 4 claims from smallest to largest be: v, x, y, and z. Then v is an estimate of the 100(1/(1+4)) = 20th percentile. x is an estimate of the 100(2/(1+4)) = 40th percentile. y is an estimate of the 100(3/(1+4)) = 60th percentile. z is an estimate of the 100(4/(1+4)) = 80th percentile. Thus the (100p)th sample percentile is defined for 0.20 ≤ p ≤ 0.80. 7.12. N = 14, so the median is (.5)(14 + 1) = 7.5th value: (1.8 + 1.9)/2 = 1.85. 25th percentile is (.25)(14 + 1) = 3.75th value: (.25)(1.5) + (.75)(1.5) = 1.5. 75th percentile is (.75)(14 + 1) = 11.25th value: (.75)(2.7) + (.25)(3.2) = 2.825. 7.13. D. The estimate of the 90th percentile is the (.9)(10 + 1) = 9.9th claim. Thus we linearly interpolate between the 9th claim of 2199 and the 10th claim of 3207: (.1)(2199) + (.9)(3207) = 3106. 3207 3106
7.14. Sorting the 15 values from smallest to largest: 198, 200, 203, 208, 209, 210, 210, 212, 215, 215, 216, 220, 221, 223, 224. (0.25)(15 + 1) = 4. ⇒ 208 is the estimated 25th percentile. (0.5)(15 + 1) = 8. ⇒ 212 is the estimated median. (0.75)(15 + 1) = 12. ⇒ 220 is the estimated 75th percentile.
7.15. C. N = 11, p = .45, (N+1)p = 5.4. The 5th claim is 360 and the 6th claim is 420. Linearly interpolating with more weight to 360: (.6)(360) + (.4)(420) = 384.
7.16. D. (.35)(13 + 1) = 4.9. Linearly interpolate between the 4th and 5th value: (.1)(216) + (.9)(250) = 246.6.
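All of the smoothed empirical estimates above follow the same recipe: sort the data, take the p(N+1) position, and interpolate linearly between the two bracketing values. As a check on the arithmetic, here is a minimal Python sketch of that recipe; the function name is merely illustrative, and the check uses the data of problem 7.15. (No such tool is available at the exam, of course.)

    def smoothed_percentile(data, p):
        # Smoothed empirical estimate of the 100p-th percentile:
        # position p(N+1), linear interpolation between order statistics.
        x = sorted(data)
        n = len(x)
        pos = p * (n + 1)
        if pos < 1 or pos > n:
            raise ValueError("percentile not defined for this sample size")
        i = int(pos)            # lower order statistic, 1-based
        frac = pos - i
        return x[i - 1] if i == n else (1 - frac) * x[i - 1] + frac * x[i]

    # Check against 7.15: 45th percentile of the 11 automobile claims.
    claims = [200, 255, 295, 320, 360, 420, 440, 490, 500, 520, 1020]
    print(smoothed_percentile(claims, 0.45))    # 384.0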
Section 8, Percentile Matching In order to use Percentile Matching to fit a distribution, one matches a number of percentiles of the data and the distribution equal to the number of parameters of the distribution. Exponential Distribution: To employ percentile matching with a one parameter distribution, one solves for the parameter value such that F(x) equals the chosen percentile. For example, assume one were using percentile matching at the 95th percentile to fit an exponential distribution to the ungrouped data in Section 2. Exercise: For the ungrouped data set in Section 2, estimate the 95th percentile. [Solution: There are 130 claims. The observed 95th percentile is at about the 0.95(131) = 124.45th claim. The 124th claim is 1,078,800. The 125th claim is 1,117,600. Thus the 95th percentile = (0.55)(1,078,800) +(0.45)(1,117,600) = 1,096,260.] The observed 95th percentile is somewhere around 1.096 million. For the fitted Exponential, we want F(1.096 million) = 0.95. Therefore, one solves for the exponential parameter θ, such that 0.95 = 1 - exp(-1.096 x 106 /θ). ⇒ θ = 3.7 x 105 . As shown in Appendix A: VaRp (X) = -θ ln(1-p). 1.096 million = VaR0.95 = - θ ln(0.05). ⇒ θ = 3.7 x 105 . Similarly, for the 90th percentile, the observed 90th percentile is around 7.63 x 105 , and θ = -7.63 x 105 / ln(1-0.9) = 3.3 x 105 . Note that the value of the fitted parameter depends on the percentile at which the matching is done. If the curve really fit the data perfectly, it would not matter what percentile we used; this rarely is the case in the real world. In general for percentile matching to an Exponential, if p1 is the percentile at which one is matching, and x1 is the observed value for that percentile, then θ = -x1 / ln(1 - p1 ). Exercise: For the ungrouped data set in Section 2, estimate the 75th percentile. [Solution: There are 130 claims. The smoothed empirical estimate of the 75th percentile is the (.75)(131) = 98.25th claim. The 98th claim is 284,300. The 99th claim is 316,300. Thus the 75th percentile = (.75)(284,300) + (.25)(316,300) = 292,300.] Exercise: For the ungrouped data set in Section 2, fit an Exponential Distribution by matching to the 75th percentile. [Solution: F(292,300) = 1 - exp[-292,300/θ] = 0.75. ⇒ θ = -292,300 / ln(1 - 0.75) = 210,850.]
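Since the Exponential has a single parameter, the whole fit is one solve: set F at the observed percentile equal to p. A short Python sketch of the two fits just done (the helper name is merely illustrative; the percentile estimates are the ones quoted above):

    import math

    def exponential_theta(x_p, p):
        # F(x_p) = 1 - exp(-x_p/theta) = p   =>   theta = -x_p / ln(1 - p)
        return -x_p / math.log(1 - p)

    print(exponential_theta(1.096e6, 0.95))   # roughly 3.7 x 10^5
    print(exponential_theta(292300, 0.75))    # roughly 210,850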
Two Parameter Distributions: For distributions with two parameters the matching is done at two selected percentiles. Again the fitted parameters would depend on the percentiles selected. For example, let's fit a LogNormal to the ungrouped data set in Section 2, at the 75th and 95th percentiles. As determined above, for this data set the estimated 75th percentile is 292,300 and the estimated 95th percentile is 1,096,260. For the fitted LogNormal we want F(292,300) = 0.75 and F(1,096,260) = 0.95. We have two equations in the two unknown parameters µ and σ. Φ[{ln(292,300)−µ} / σ] = 0.75, and Φ[{ln(1,096,260)−µ} / σ] = 0.95. By use of the Standard Normal Table, Φ[0.674] = 0.75 and Φ[1.645] = 0.95. Thus the two equations are equivalent to: {ln(292,300)−µ} / σ = 0.674 and {ln(1,096,260)−µ} / σ = 1.645. One can solve σ = {ln(1,096,260)-ln(292,300)} / (1.645 - 0.674) = 1.361, and then µ = ln(1,096,260) - (1.361)(1.645) = 11.67. Generally, it is relatively straightforward to check the results of percentile matching. Given the fitted parameters, one goes back and check the resulting values of the Distribution Function. For example, for a LogNormal Distribution with µ = 11.67 and σ = 1.361, F(1,096,260) = Φ[{ln(1,096,260)−11.67} / 1.361] = Φ[1.644] = 0.950, and F(292,300) = Φ[{ln(292,300)−11.67} / 1.361] = Φ[0.673] = 0.750. Thus this checks.66
It checks to the level of accuracy used. With no intermediate rounding I got µ = 11.6667 and σ = 1.36225. This is beyond the level of accuracy usually employed for the crude technique of percentile matching. 66
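Because the LogNormal percentiles are linear in µ and σ on the log scale, the two matching equations above can be solved directly rather than numerically. Here is an illustrative Python sketch of the calculation just done; the standard normal percentiles 0.674 and 1.645 are taken from the Normal Table as above, and the function name is not part of any standard library:

    import math

    def lognormal_match(x1, z1, x2, z2):
        # Match F(x1) = p1 and F(x2) = p2, where z_i is the standard normal
        # percentile Phi^-1(p_i) looked up in the Normal Table.
        sigma = math.log(x2 / x1) / (z2 - z1)
        mu = math.log(x2) - sigma * z2
        return mu, sigma

    mu, sigma = lognormal_match(292300, 0.674, 1096260, 1.645)
    print(mu, sigma)    # about 11.67 and 1.361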
Below are some examples of results of percentile matching applied to the ungrouped data set in Section 2.67

Distrib.        Perc. 1  Perc. 2  Perc. 3    Par. 1     Par. 2     Par. 3    Mean(000)  Coef. Var.  Skewness
Data                                                                            313        2.01        4.83
LogNormal         10       60                 11.50      1.625                  370        3.61        57.8
LogNormal         50       90                 11.70      1.439                  340        2.63        26.1
LogNormal         75       95                 11.67      1.362                  296        2.32        19.5
Loglogistic       10       60                 1.0435     100,768               2323        N.A.        N.A.
Loglogistic       50       90                 1.1911     120,650                659        N.A.        N.A.
Loglogistic       75       95                 1.3964     133,089                385        N.A.        N.A.
Pareto            10       60                 1.3035     145,749                480        N.A.        N.A.
Pareto            50       90                 1.4584     198,286                433        N.A.        N.A.
Pareto            75       95                 1.7548     242,895                322        N.A.        N.A.
Weibull           10       60                 164,384    0.8672                 177        1.16        2.48
Weibull           50       90                 211,889    0.6508                 289        1.59        3.97
Weibull           75       95                 166,908    0.5829                 261        1.82        4.85
Burr              10       50       90        2.0531     336,729    0.8888      365        N.A.        N.A.
In the following, let the percentiles at which the matching is done be p1 and p2 , with corresponding loss amounts x1 and x2 . LogNormal Distribution: Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the 50th percentile. [Solution: For the ungrouped data in Section 2 with 130 values, the 50th percentile is the (131)(0.5) = 65.5th value from smallest to largest. (119,300)(0.5) + (122,000)(0.5) = 120,650.] Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the 90th percentile. [Solution: (131)(0.9) = 117.9. (0.1)(737,700) + (0.9)(766,100) = 763,260..] Exercise: Fit a LogNormal to the ungrouped data in Section 2, matching at the 50th and 90th percentiles. [Solution: F(120,650) = Φ[(ln(120,650) - µ)/σ] = 0.5. ⇒ (11.7001 - µ)/σ = 0. ⇒ µ = 11.7001. F(763,260) = Φ[(ln(763,260) - µ)/σ] = 0.9. ⇒ (13.5454 - µ)/σ = 1.282.
⇒ σ = (13.5454 - 11.7001)/1.282 = 1.439.]
67
I have used the following estimates of the percentiles: 10th 12,270, 50th 120,650, 60th 148,620, 75th 292,300, 90th 763,260, 95th 1,096,260.
Exercise: Verify that a LogNormal Distribution with parameters µ = 11.7001 and σ = 1.439 matches the ungrouped data in Section 2 at the 50th and 90th percentiles. [Solution: F(120,650) = Φ[(ln(120,650) - 11.7001)/1.439] = Φ[0] = 0.50. F(763,260) = Φ[(ln(763,260) - 11.7001)/1.439] = Φ[1.282] = 0.90.] For the LogNormal, the two equations for matching percentiles are: Φ((lnx1 - µ)/σ) = p1 and
Φ((lnx2 - µ)/σ) = p2 . Let ξ1 = Φ-1(p1 ). ξ2 = Φ-1(p2 ). Then (lnx1 - µ)/σ = ξ1 and (lnx2 - µ)/σ = ξ2 . Therefore, σ = {ln(x2 ) - ln(x1 )} / (ξ2 - ξ1 ) = {ln(x2 /x1 )} / (ξ2 - ξ1 ), and µ = ln(x2 ) - σξ2 . For the above example, ξ1 = Φ-1(.5) = 0, and ξ2 = Φ-1(.9) = 1.282.
σ = {ln(763,260/120,650)}/(1.282 - 0) = 1.439. µ = ln(763,260) - (1.439)(1.282) = 11.70. Weibull Distribution: Exercise: Fit the Weibull to the ungrouped data in Section 2, matching at the 50th and 90th percentiles. [Solution: For the Weibull F(x) = 1 - exp(-(x/θ)τ). The estimated 50th percentile is 120,650. The estimated 90th percentile is 763,260. Therefore, exp[-(120,650/θ)τ] = 0.5, and exp[-(763,260/θ)τ] = 0.1.
⇒ (120,650/θ)τ = -ln(0.5), and (763,260/θ)τ = -ln(0.1). τ ln(120,650) - τ ln(θ) = ln(-ln(.5)), and τ ln(763,260) - τ ln(θ) = ln(-ln(0.1)). Therefore,
ln(θ) = {ln[-ln(.1)] ln(120,650) - ln[-ln(.5)] ln(763,260)} / {ln[-ln(.1)] - ln[-ln(.5)]} = 14.7233/1.20055 = 12.2638.
θ = exp[12.2638] = 211,885. τ = ln(-ln(0.1)) / {ln(763,260) - ln(θ)} = 0.6508. Alternately, as shown in Appendix A: VaRp (X) = θ [ -ln(1-p) ]1/τ. Thus, 120,650 = θ [ -ln(0.5) ]1/τ, and 763,260 = θ [ -ln(0.1) ]1/τ]. Dividing these two equations: 6.3262 = [ln(10)/ln(2)]1/τ.
⇒ ln(6.3262) = ln[ln(10)/ln(2)]/τ = ⇒ τ = 0.6508. ⇒ θ = 763,260 / ln[10]1/0.6508 = 211,886.]
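The same two-equation solve can be scripted for the Weibull, using the VaR form quoted from Appendix A. A minimal Python sketch (the function name is merely illustrative; the 50th and 90th percentile estimates are from the text):

    import math

    def weibull_match(x1, p1, x2, p2):
        # x_i = theta * (-ln(1 - p_i))^(1/tau); dividing eliminates theta.
        xi1, xi2 = -math.log(1 - p1), -math.log(1 - p2)
        tau = math.log(xi2 / xi1) / math.log(x2 / x1)
        theta = x2 / xi2 ** (1 / tau)
        return theta, tau

    print(weibull_match(120650, 0.5, 763260, 0.9))   # about (211,886, 0.6508)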
In general, the percentile matching for a Weibull was done as follows: Let ξ1 = -ln(1-p1 ), ξ2 = -ln(1-p2 ). Then
θ = exp[{ln(ξ2 ) ln(x1 ) - ln(ξ1 ) ln(x2 )}/{ ln(ξ2 ) - ln(ξ1 )}] and τ = ln(ξ2 ) / {ln(x2 ) - ln(θ)}. In the previous example, ξ1 = -ln(1-p1 ) = -ln(.5) = ln(2). ln(ξ1 ) = ln(ln(2)) = -.366.
ξ2 = -ln(1-p2) = -ln(.1) = ln(10). ln(ξ2) = ln(ln(10)) = .834. ln(x1) = ln(120,650) = 11.70. ln(x2) = ln(763,260) = 13.55. Then θ = exp[{ln(ξ2) ln(x1) - ln(ξ1) ln(x2)}/{ln(ξ2) - ln(ξ1)}] = exp[{(.834)(11.70) - (-.366)(13.55)}/{.834 - (-.366)}] = 211,889. If p1 = .25 and p2 = .75, then the general formulas become:
ξ1 = -ln(.75) = ln(4/3) and ξ2 = -ln(.25) = ln(4).
θ = exp[{ln(ln(4)) ln(x1) - ln(ln(4/3)) ln(x2)}/{ln(ln(4)) - ln(ln(4/3))}] = exp[{g ln(x1) - ln(x2)}/(g - 1)], where g = ln(ln(4))/ln(ln(4/3)), and
τ = ln(ln(4)) / {ln(x2) - ln(θ)}. These match the formulas in Appendix A of Loss Models.68
Loglogistic: For the Loglogistic Distribution, F(x) = (x/θ)γ / {1 + (x/θ)γ}.
Exercise: Fit a Loglogistic to the ungrouped data in Section 2, matching at the 75th and 95th percentiles.
68
The formulas for percentile matching will not be attached to your exam in the abridged version of Appendix A.
[Solution: Matching, we want F(292,300) = 0.75, and F(1,096,260) = 0.95.
⇒ (292,300/θ)γ / {1 + (292,300/θ)γ} = 0.75, and (1,096,260/θ)γ / {1 + (1,096,260/θ)γ} = 0.95. {1 + (292,300/θ)γ} / (292,300/θ)γ = 4/3, and {1 + (1,096,260/θ)γ} / (1,096,260/θ)γ = 20/19. (292,300/θ)−γ + 1 = 4/3 and (1,096,260/θ)−γ + 1 = 20/19. (292,300/θ)−γ = 1/3, and (1,096,260/θ)−γ = 1/19. (292,300/θ)γ = 3, and (1,096,260/θ)γ = 19. Thus (1,096,260/292,300)γ = 19/3. γ ln(1,096,260/292,300) = ln(19/3). γ = ln(19/3) / ln(1,096,260/292,300) = 1.396. ⇒ θ = 1,096,260 / 19^(1/γ) = 133,015. Alternately, as shown in Appendix A: VaRp(X) = θ {p-1 - 1}-1/γ. Thus, 292,300 = θ {1/0.75 - 1}-1/γ, and 1,096,260 = θ {1/0.95 - 1}-1/γ. Dividing the two equations: 3.7505 = {(1/0.75 - 1) / (1/0.95 - 1)}^(1/γ) = 6.3333^(1/γ).
⇒ γ = ln(6.3333)/ln(3.7505) = 1.396. ⇒ θ = 1,096,260 /(191/γ) = 133,015.] In general, the percentile matching for a Loglogistic is done as follows: Let ξ1 = p1 / (1-p1 ), ξ2 = p2 / (1-p2 ). Then γ = ln(ξ2 / ξ1 ) / ln(x2 / x1 ), and θ = x2 / ξ2 1/γ. If p1 = 0.25 and p2 = 0.75, then the formulas become: ξ1 = 1/3 and ξ2 = 3
γ = ln(9) / ln(x2 / x1 ) and θ = x2 / 31/γ . We can get theta as well from θ = x1 / ξ1 1/γ = x1 31/γ. Thus in this case, θ =
x1 x2 . Writing for the 25th and 75th percentiles x1 = p and x2 = q,
then these formulas become: γ = 2 ln(3) / {ln(q) - ln(p)} and θ = exp[((ln(q) + ln(p))/2].69 Pareto Distribution: Exercise: Set up the equations to be solved, to fit a Pareto to the ungrouped data in Section 2, matching at the 50th and 90th percentiles. [Solution: For the Pareto F(x) = 1 - (1 + x/θ)−α. The estimated 50th percentile is 120,650. The estimated 90th percentile is 763,260. Therefore, (1 + 120,650/θ)−α = 0.5 and (1 + 763,260/θ)−α = 0.1. One could rewrite this as 120,650/θ = 0.5−1/α -1 and 763,260/θ = 0.1−1/α -1.] 69
These match the formulas in Appendix A of Loss Models. The formulas for percentile matching will not be attached to your exam in the abridged version of Appendix A.
If we were to fit a Pareto to the ungrouped data in Section 2, matching at the 50th and 90th percentiles, one could numerically solve for alpha: (0.1−1/α - 1) / (0.5−1/α - 1) = 763,260/120,650. The solution turns out to be α = 1.4584. Then θ = 120,650 / (0.5−1/α - 1) = 198,286. Exercise: Check the above fit via matching at the 50th and 90th percentiles of a Pareto. [Solution: F(x) = 1 - (1 + x/θ)−α. F(120650) = 1 - (1 + 120650/198286)-1.4584 = 1 - 1/1.6085^1.4584 = 0.500. F(763260) = 1 - (1 + 763260/198286)-1.4584 = 0.900.] In general for the Pareto, let ξ1(α) = (1-p1)−1/α, ξ2(α) = (1-p2)−1/α. Then solve the following equation numerically for alpha:70 (ξ2(α) - 1) / (ξ1(α) - 1) - x2/x1 = 0. Then θ = x1 / (ξ1(α) - 1).
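Unlike the previous distributions, the Pareto equation in alpha has no closed form, so in practice one would solve it numerically, for example by bisection. The following rough Python sketch does exactly that; the bracketing interval for alpha and the function name are assumptions of mine for illustration, not part of the procedure in Loss Models:

    def pareto_match(x1, p1, x2, p2, lo=1.01, hi=20.0):
        xi = lambda p, a: (1 - p) ** (-1 / a) - 1        # this is xi(alpha) - 1 above
        f = lambda a: xi(p2, a) / xi(p1, a) - x2 / x1    # zero at the fitted alpha
        for _ in range(200):                             # simple bisection
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        alpha = (lo + hi) / 2
        theta = x1 / xi(p1, alpha)
        return alpha, theta

    print(pareto_match(120650, 0.5, 763260, 0.9))   # roughly (1.458, 198,000)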
Burr Distribution with Alpha Fixed:71 72 The Burr Distribution has parameters α, θ and γ. If alpha is known, then we have two parameters, and therefore we match at two percentiles. Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the 60th percentile. [Solution: For the ungrouped data in Section 2 with 130 values, the 60th percentile is the (131)(0.6) = 78.6th value from smallest to largest. (0.4)(146,100) + (0.6)(150,300) = 148,620.] Exercise: For the ungrouped data in Section 2, determine the smoothed empirical estimate of the 80th percentile. [Solution: (131)0(.8) = 104.8. (0.2)(406,900) +0 (.8)(423,200) = 419,940.] Exercise: Set up the equations that need to be solved in order to fit a Burr Distribution with α = 2 to the ungrouped data in Section 2, by matching at the 60th and 80th percentiles. [Solution: For the Burr, the survival function S(x) = (1/{1 + (x/θ)γ})α = (1/{1 + (x/θ)γ})2 . Thus the two equations are: (1/{1 + (148,620/θ)γ})2 = 0.4, and (1/{1 + (419,940/θ)γ})2 = 0.2.] 70
Note that in the case of the Pareto, as well as the Burr, one can reduce to one equation in one unknown to be solved numerically. In more complicated cases, percentile matching could be performed by solving numerically two equations in two unknowns. However, this is probably wasted effort due to the inherent lack of accuracy of percentile matching. 71 See 4, 11/06, Q.1. 72 The Loglogistic Distribution is a Burr Distribution with α = 1. See 4, 11/05, Q.3.
(1/{1 + (148,620/θ)γ})2 = 0.4. ⇒ (148,620/θ)γ = √2.5 - 1 = 0.5811.
(1/{1 + (419,940/θ)γ})2 = 0.2. ⇒ (419,940/θ)γ = √5 - 1 = 1.2361.
Dividing the two equations eliminates θ: 2.8259γ = 2.1272.
⇒ γ = ln(2.1272)/ln(2.8259) = .7266. ⇒ θ = 419,940/1.23611/.7266 = 313,687. Exercise: Verify that a Burr Distribution with parameters α = 2, θ = 313,687, and γ = .7266, matches the ungrouped data in Section 2 at the 60th, and 80th percentiles. [Solution: F(x) = 1 - (1/{1 + (x/θ)γ})α = 1 - (1/{1 + (x/313,687)}0.7266))2 . F(148,620) = 1 - (1/{1 + (148,620/313,687)0.7266})2 = 0.600. F(419,940) = 1 - (1/{1 + (419,940/313,687)0.7266})2 = 0.800.] Burr Distribution: The Burr Distribution has three parameters α, θ and γ. Thus we match at three percentiles. Exercise: Set up the equations that need to be solved in order to fit a Burr Distribution to the ungrouped data in Section 2, by matching at the 10th, 50th and 90th percentiles. [Solution: For the Burr, the survival function S(x) = (1/{1 + (x/θ)γ})α. For the ungrouped data in Section 2, the 10th percentile is 12,270, the 50th percentile is 120,650 and the 90th percentile is 763,260. Thus the three equations are: (1/{1 + (12,270/θ)γ})α = 0.9, (1/(1 + {120,650/θ)γ})α = 0.5, and (1/{1 + (763,260/θ)γ})α = 0.1. These can be simplified to: (12,270/θ)γ = 0.9−1/α - 1, (120,650/θ)γ = 0.5−1/α - 1, and (763,260/θ)γ) = 0.1−1/α - 1. γ ln(12,270/θ) = ln(0.9−1/α - 1), γ ln(120,650/θ) = ln(0.5−1/α - 1), and γ ln(763,260/θ)γ) = ln(0.1−1/α - 1).]
Let ξ1 (α) = (1-p1 )−1/α - 1, ξ2 (α) = ln(1-p2 )−1/α - 1 , ξ3 (α) = ln(1-p3 )−1/α - 1. Then one could rewrite the equations for percentile matching as: γ ln(x1 /θ) = ln(ξ1 (α)), γ ln(x2 /θ) = ln(ξ2 (α)), γ ln(x3 /θ) = ln(ξ3 (α)). Subtracting the first equation from the second equation gives: γ {ln(x2 /θ) - ln(x1 /θ)} = ln(ξ2 (α)) − ln(ξ1 (α)), or γ = ln(ξ1 (α)/ξ2 (α)) / {ln(x1 /x2 )}. Similarly γ = ln(ξ2 (α)/ξ3 (α)) / {ln(x2 /x3 )}. We need to solve numerically for alpha such that: ln(ξ1 (α)/ξ2 (α)) / ln(x1 /x2 ) = ln(ξ2 (α)/ξ3 (α)) / ln(x2 /x3 ), with ξi(α) = (1-pi)−1/α -1. Then γ = ln(ξ1 (α)/ξ2 (α)) / {ln(x1 /x2 )} and θ = x1 / ξ1 (α)1/γ. Exercise: Verify that a Burr Distribution with parameters α = 2.05305, θ = 336,729, and γ = 0.888834, matches the ungrouped data in Section 2 at the 10th, 50th and 90th percentiles. [Solution: F(x) = 1 - (1/(1 + (x/θ)γ))α. F(12270) = 1 - 1/{(1 + (12270/336729)0.888834)}2.05305 = 0.100 F(120650) = 1 - 1/{(1 + (120650/336729)0.888834)}2.05305 = 0.500 F(763260) = 1 - 1/{(1 + (763260/336729)0.888834)}2.05305 = 0.900.]
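The three-percentile Burr fit can be carried out the same way: a one-dimensional numerical search for alpha, after which gamma and theta follow from the formulas above. A sketch in Python, using the 10th, 50th and 90th percentile estimates for the ungrouped data in Section 2; the bracketing interval for alpha and the function name are my own assumptions for illustration:

    import math

    def burr_match(xs, ps, lo=0.5, hi=10.0):
        x1, x2, x3 = xs
        p1, p2, p3 = ps
        xi = lambda p, a: (1 - p) ** (-1 / a) - 1
        g = lambda a: (math.log(xi(p1, a) / xi(p2, a)) / math.log(x1 / x2)
                       - math.log(xi(p2, a) / xi(p3, a)) / math.log(x2 / x3))
        for _ in range(200):                  # bisection on alpha
            mid = (lo + hi) / 2
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        alpha = (lo + hi) / 2
        gamma = math.log(xi(p1, alpha) / xi(p2, alpha)) / math.log(x1 / x2)
        theta = x1 / xi(p1, alpha) ** (1 / gamma)
        return alpha, theta, gamma

    print(burr_match((12270, 120650, 763260), (0.10, 0.50, 0.90)))
    # roughly (2.05, 336,700, 0.889)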
Summary: One matches at a number of percentiles equal to the number of fitted parameters. Important cases include: Exponential Distribution, LogNormal Distribution, Weibull Distribution, LogLogistic, Burr with fixed alpha, and the Single Parameter Pareto Distribution.73
73
Fitting to the Single Parameter Pareto Distribution will be discussed in a subsequent section.
Grouped Data:
One can also apply percentile matching to grouped data.74 One must somehow estimate the percentile(s) for the data. This is most easily done if one chooses one of the endpoints of the intervals at which to match. For example for the grouped data in Section 3, out of 10,000 there are 8175 accidents of size less than or equal to 25,000.

Interval ($000)   Number of Accidents   Cumulative Number of Accidents
0-5                      2208                  2208
5-10                     2247                  4455
10-15                    1701                  6156
15-20                    1220                  7376
20-25                     799                  8175
25-50                    1481                  9656
50-75                     254                  9910
75-100                     57                  9967
100-∞                      33                 10000
SUM                    10,000

Therefore, if one fit an Exponential Distribution to this data by percentile matching at 25,000, we would set 1 - e-25000/θ = 8175/10000. ⇒ θ = -25000/ln(.1825) = 14,697.
Exercise: Fit a LogNormal Distribution to the above grouped data from Section 3, via percentile matching at 15,000 and 50,000.
[Solution: There are 6156 out of 10000 accidents less than or equal to 15,000. Set Φ[(ln(15,000) - µ)/σ] = 0.6156. ⇒ (9.616 - µ)/σ = 0.294. There are 9656 out of 10000 accidents less than or equal to 50,000. Set Φ[(ln(50,000) - µ)/σ] = 0.9656. ⇒ (10.820 - µ)/σ = 1.820. (10.820 - µ)/(9.616 - µ) = 1.820/.294 = 6.190. ⇒ µ = 9.38. ⇒ σ = 0.79.]
Exercise: Fit a Weibull Distribution to the above grouped data from Section 3, via percentile matching at 20,000 and 100,000.
[Solution: There are 7376 out of 10000 accidents less than or equal to 20,000. Set 1 - exp[-(20000/θ)τ] = 0.7376. ⇒ (20000/θ)τ = 1.338. There are 9967 out of 10000 accidents less than or equal to 100,000. Set 1 - exp[-(100000/θ)τ] = 0.9967. ⇒ (100000/θ)τ = 5.714.
⇒ 5τ = 5.714/1.338 = 4.271. ⇒ τ = 0.902. ⇒ θ = 14,482.] 74
See 4, 11/02, Q. 37.
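With grouped data there is nothing new in the solve itself; one simply reads the empirical percentile off at an interval endpoint. For example, the Exponential fit at 25,000 above takes a few lines of Python (the variable names are merely illustrative):

    import math

    n_total = 10000
    count_le_25000 = 8175                 # accidents of size at most 25,000

    p = count_le_25000 / n_total          # empirical F(25,000) = 0.8175
    theta = -25000 / math.log(1 - p)      # solve 1 - exp(-25000/theta) = p
    print(theta)                          # about 14,697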
Mixtures: For example, one can model losses via a two-point mixture: F(x) = w A(x) + (1 - w) B(x), where A(x) and B(x) are each distribution functions.75 w is an additional parameter. 0 ≤ w ≤ 1. The number of parameters for this mixture is: 1 + (number for parameters for A) + (number of parameters for B). Thus this mixture has at least three parameters. On your exam, when fitting a mixture, all but one or two of these parameters will be fixed, since otherwise one would need a computer to determine the answer. Exercise: Let A(x) be an Exponential Distribution with mean 10. Let B(x) be an Exponential Distribution with mean 20. For some data, the smoothed empirical estimate of the 70th percentile is 15. FIt a two-point mixture via percentile matching. [Solution: 0.7 = w (1 - e-15/10) + (1 - w) (1 - e-15/20). ⇒ w = (0.3 - e-15/20) / (e-15/10 - e-15/20) = 0.692. Comment: Note that for the first Exponential, F(15) = 1 - e-15/10 = 0.777 > 0.7. For the second Exponential, F(15) = 1 - e-15/20 = 0.528 < 0.7. The only way it is possible to get 0.7 by weighting together two numbers, with weights that sum to one and are each at least 0 and at most 1, is if one of the numbers is greater than or equal to 0.7 and the other number is less than or equal to 0.7. More generally, for a two point mixture with all the parameters other than w fixed, in order for percentile matching to result in 0 ≤ w ≤ 1, at the empirical pth percentile, one of the component distributions has to be greater than or equal to p/100, while the other one is less than or equal to p/100.]
75
See “Mahlerʼs Guide to Loss Distributions.” One could mix more than two distributions, and there are also continuous mixtures.
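When only the weight w of a two-point mixture is unknown, the matching equation is linear in w, so no numerical search is needed. A small Python sketch of the exercise above (Exponentials with means 10 and 20, empirical 70th percentile of 15; variable names are merely illustrative):

    import math

    p, x = 0.70, 15.0
    FA = 1 - math.exp(-x / 10)        # first Exponential, mean 10, at x = 15
    FB = 1 - math.exp(-x / 20)        # second Exponential, mean 20, at x = 15
    w = (p - FB) / (FA - FB)          # solve p = w*FA + (1 - w)*FB
    print(w)                          # about 0.692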
Problems: Use the following information to answer each of the following two questions. A data set of claim sizes has its 15th and 85th percentiles at 1000 and 5000 respectively. A LogNormal distribution is fit to this data via percentile matching applied to these two percentiles. 8.1 (2 points) The fitted distribution has a σ parameter in which of the following intervals? A. less than 0.6 B. at least 0.6 but less than 0.7 C. at least 0.7 but less than 0.8 D. at least 0.8 but less than 0.9 E. at least 0.9 8.2 (1 point) The fitted distribution has a µ parameter in which of the following intervals? A. less than 7.2 B. at least 7.2 but less than 7.4 C. at least 7.4 but less than 7.6 D. at least 7.6 but less than 7.8 E. at least 7.8 Use the following information to answer each of the following two questions. A data set of claim sizes has its 35th percentile and 75th percentile at 10 and 20 respectively. A Weibull distribution, is fit to this data via percentile matching applied to these two percentiles. 8.3 (3 points) The fitted distribution has a τ parameter in which of the following intervals? A. less than 1.3 B. at least 1.3 but less than 1.4 C. at least 1.4 but less than 1.5 D. at least 1.5 but less than 1.6 E. at least 1.6 8.4 (2 points) Determine the θ parameter of the fitted distribution. A. 15.0
B. 15.5
C. 16.0
D. 16.5
E. 17.0
8.5 (2 points) You observe the following 6 claims: 162.22, 151.64, 100.42, 174.26, 20.29, 34.36. A Distribution: F(x) = 1 - e-qx, x > 0, is fit to this data via percentile matching at the 57th percentile. Determine the value of q. A. less than 0.006 B. at least 0.006 but less than 0.007 C. at least 0.007 but less than 0.008 D. at least 0.008 but less than 0.009 E. at least 0.009 Use the following information for the next two questions: You observe the following 9 values: 11.2, 11.4, 11.6, 11.7, 11.8, 11.9, 12.0, 12.3, 12.4 A Normal Distribution is fit to this data via percentile matching at the 10th and 80th percentiles. 8.6 (2 points) Determine the value of σ. A. 0.4
B. 0.5
C. 0.6
D. 0.7
E. 0.8
8.7 (1 point) Determine the value of µ. A. 11.5
B. 11.7
C. 11.9
D. 12.1
E. 12.3
8.8 (2 points) You are given the following information about a set of individual claims: (i) 80th percentile = 115,000 (ii) 95th percentile = 983,000 A LogNormal Distribution is fit using percentile matching. Using this LogNormal Distribution, estimate the average size of claim. A. Less than 360,000 B. At least 360,000 but less than 380,000 C. At least 380,000 but less than 400,000 D. At least 400,000 but less than 420,000 E. At least 420,000
8.9 (2 points) You are given the following data: 51 66 94 180 317 502 672 1626 3542
You use the method of percentile matching at the 75th percentiles to fit a mixed distribution to these data: F(x) = w(1 - e-x/500) + (1 - w)(1 - e-x/1000). Determine the estimate of w. (A) Less than 0.35 (B) At least 0.35, but less than 0.40 (C) At least 0.40, but less than 0.45 (D) At least 0.45, but less than 0.50 (E) At least 0.50 8.10 (3 points) The smoothed empirical estimates of the 10th and 60th percentiles are 40 and 80 respectively. You use the method of percentile matching to fit a Gompertz Distribution: F(t) = 1 - exp[-B(ct - 1)/ln(c)], B > 0, c > 1, t ≥ 0. Estimate S(85). (A) Less than 0.24 (B) At least 0.24, but less than 0.26 (C) At least 0.26, but less than 0.28 (D) At least 0.28, but less than 0.30 (E) At least 0.30 8.11 (2 points) Assume that the heights of 15 year old boys are Normally Distributed. The 95th percentile of these heights is 184 centimeters. The 5th percentile of these heights is 158 centimeters. Using percentile matching, estimate σ. A. 7
B. 8
C. 9
D. 10
E. 11
8.12 (4 points) F(10) = 0.58. F(20) = 0.78. You fit via percentile matching an Inverse Burr Distribution with τ = 0.7. Determine F(50). A. 91% B. 92%
C. 93%
D. 94%
E. 95%
8.13 (3 points) Let R be the weekly wage for a worker compared to the statewide average. R follows a LogNormal Distribution, with σ < 2. 97.5% of workers have weekly wages at most twice the statewide average. Determine what percentage of workers have weekly wages less than half the statewide average. A. 2% B. 3% C. 4% D. 5% E. 6%
8.14 (2 points) You are given: (i) Losses follow a Weibull distribution with τ = 0.7 and θ unknown. (ii) A random sample of 200 losses is distributed as follows: Interval Number of Losses x ≤ 100 90 100 < x ≤ 250 60 x > 250 50 Estimate θ by matching at the 75th percentile. (A) Less than 150 (B) At least 150, but less than 175 (C) At least 175, but less than 200 (D) At least 200, but less than 225 (E) At least 225 8.15 (3 points) Assume that when professional golfers try to make a put, their chance of failure is a function of the distance of the golf ball from the cup. Assume that their chance of failure follows a Loglogistic Distribution. Their chance of success at 8 feet is 50%. Their chance of success at 16 feet is 20%. Estimate their chance of success at 40 feet. A. 2% B. 3% C. 4% D. 5% E. 6% 8.16 (4 points) Annual income follows a LogNormal Distribution. The 50th percentile of annual income is $50,000. The 95th percentile of annual income is $250,000. Determine the percentage of total income earned by the top 1% of earners. A. 8% B. 9% C. 10% D. 11% E. 12% 8.17 (2 points) You are given the following data: 42, 123, 140, 151, 209, 327, 435, 479, 721, 1358, 1625. You use the method of percentile matching at the 50th and 85th percentiles to fit a LogNormal distribution to these data. What is the second moment of the fitted distribution?. (A) Less than 500,000 (B) At least 500,000, but less than 1 million (C) At least 1 million, but less than 3 million (D) At least 3 million, but less than 5 million (E) At least 5 million
8.18 (4 points) You have data on the number of Temporary Total claims for 870 Workers Compensation Insurance classes. 358 of these classes have 250 or fewer claims. 440 of these classes have 500 or fewer claims. You fit via percentile matching a mixture of two Exponential Distributions, with weights 55% and 45%. What is the resulting estimate of the number of classes with more than 5000 claims? A. 100 B. 110 C. 120 D. 130 E. 140 8.19 (2 points) For a portfolio of policies, you are given: (i) Losses follow a Weibull distribution with parameters θ and τ. (ii) A sample of 13 losses is: 18 26 28 29
35 43 57 94 119 166 400 569 795
(iii) The parameters are to be estimated by percentile matching using the 50th and 90th smoothed empirical percentiles. Calculate the estimate of θ. (A) Less than 120 (B) At least 120, but less than 125 (C) At least 125, but less than 130 (D) At least 130, but less than 135 (E) At least 135 8.20 (4, 5/85, Q.52) (3 points) It was determined that the 40th percentile of a sample is 1, and that the 75th percentile of the sample is 64. Use percentile matching to estimate the parameter, τ, of a Weibull distribution. Which of the following represents the value of τ? A.
(ln 0.25 - ln 0.6) / ln 64
B. ln 0.25 / {(ln 64) (ln 0.6)}
C. ln[ln 0.25 / ln 0.6] / ln 64
D. ln[ln 0.25 / {(ln 64) (ln 0.6)}]
E. None of the above
8.21 (160, 5/89, Q.16) (2.1 points) A sample of 10 lives was observed from the time of diagnosis until death. You are given: (i) Times of death were: 1, 1, 2, 3, 3, 3, 4, 4, 5 and 5. (ii) The lives were subject to a survival function, S(t) = αt2 + βt + 1, 0 ≤ t ≤ k . Determine the parameter α by the method of percentiles matching, using the 25th and 75th percentiles. (A) -0.04
(B) -0.03
(C) -0.02
(D) -0.01
(E) 0
8.22 (4, 5/90, Q.44) (2 points) A random sample of claims has been drawn from a Loglogistic distribution, with unknown parameters θ and γ. In the sample, 80% of the claim amounts exceed $100 and 20% of the claim amounts exceed $400. Find the estimate of θ by percentile matching. A. Less than 100 B. At least 100 but less than 150 C. At least 150 but less than 200 D. At least 200 but less than 250 E. At least 250
8.23 (160, 5/91, Q.20) (1.9 points) For a complete study of five lives, you are given: (i) Deaths occur at times t = 2, 3, 3, 5, 7. (ii) The underlying survival distribution is S(t) = 4^(-λt), t ≥ 0.
Using percentile matching at the median, calculate λ̂.
(B) 0.125
(C) 0.143
(D) 0.167
(E) 0.333
8.24 (4B, 5/96, Q.17) (2 points) You are given the following:
• Losses follow a Pareto distribution, with parameters θ and α.
• The 10th percentile of the distribution is θ - k, where k is a constant.
• The 90th percentile of the distribution is 5θ - 3k.
Determine α. A. Less than 1.25 B. At least 1.25, but less than 1.75 C. At least 1.75, but less than 2.25 D. At least 2.25, but less than 2.75 E. At least 2.75
8.25 (4B, 11/96, Q.3) (2 points) You are given the following:
• Losses follow a Weibull distribution, with parameters θ and τ.
• The 25th percentile of the distribution is 1,000.
• The 75th percentile of the distribution is 100,000.
Determine τ. A. Less than 0.4 B. At least 0.4, but less than 0.6 C. At least 0.6, but less than 0.8 D. At least 0.8, but less than 1.0 E. At least 1.0 8.26 (Course 160 Sample Exam #1, 1996, Q.13) (1.9 points) From a sample of 10 lives diagnosed with terminal cancer, you are given: (i) The deaths occurred at times 4, 6, 6, 6, 7, 7, 9, 9, 9, 14. (ii) The underlying distribution was Weibull with parameters θ and τ. Determine τ by the method of percentiles matching, using the 25th and 75th percentiles. (A) 2
(B) 3
(C) 4
(D) 5
(E) 6
8.27 (4, 5/00, Q.32) (2.5 points) You are given the following information about a sample of data: (i) Mean = 35,000 (ii) Standard deviation = 75,000 (iii) Median = 10,000 (iv) 90th percentile = 100,000 (v) The sample is assumed to be from a Weibull distribution. Determine the percentile matching estimate of the parameter τ . (A) Less than 0.25 (B) At least 0.25, but less than 0.35 (C) At least 0.35, but less than 0.45 (D) At least 0.45, but less than 0.55 (E) At least 0.55 8.28 (4, 11/00, Q.39) (2.5 points) You are given the following information about a study of individual claims: (i) 20th percentile = 18.25 (ii) 80th percentile = 35.80 Parameters µ and σ of a lognormal distribution are estimated using percentile matching. Determine the probability that a claim is greater than 30 using the fitted lognormal distribution. (A) 0.34 (B) 0.36 (C) 0.38 (D) 0.40 (E) 0.42
8.29 (4, 11/02, Q.37 & 2009 Sample Q. 54) (2.5 points) You are given: (i) Losses follow an exponential distribution with mean θ. (ii) A random sample of losses is distributed as follows: Loss Range Number of Losses (0 – 100] 32 (100 – 200] 21 (200 – 400] 27 (400 – 750] 16 (750 – 1000] 2 (1000 – 1500] 2 Total 100 Estimate θ by matching at the 80th percentile. (A) 249
(B) 253
(C) 257
(D) 260
(E) 263
8.30 (4, 11/03, Q.2 & 2009 Sample Q.1) (2.5 points) You are given: (i) Losses follow a loglogistic distribution with cumulative distribution function: F(x) = (x/θ)γ / {1 + (x/θ)γ}. (ii) The sample of losses is: 10 35 80 86 90 120 158 180 200 210 1500
Calculate the estimate of θ by percentile matching, using the 40th and 80th empirically smoothed percentile estimates. (A) Less than 77 (B) At least 77, but less than 87 (C) At least 87, but less than 97 (D) At least 97, but less than 107 (E) At least 107 8.31 (4, 11/04, Q.30 & 2009 Sample Q.155) (2.5 points) You are given the following data: 0.49 0.51 0.66 1.82 3.71 5.20 7.62 12.66 35.24 You use the method of percentile matching at the 40th and 80th percentiles to fit an Inverse Weibull distribution to these data. Determine the estimate of θ. (A) Less than 1.35 (B) At least 1.35, but less than 1.45 (C) At least 1.45, but less than 1.55 (D) At least 1.55, but less than 1.65 (E) At least 1.65
8.32 (4, 11/05, Q.3 & 2009 Sample Q.216) (2.9 points) A random sample of claims has been drawn from a Burr distribution with known parameter α = 1 and unknown parameters θ and γ. You are given: (i) 75% of the claim amounts in the sample exceed 100. (ii) 25% of the claim amounts in the sample exceed 500. Estimate θ by percentile matching. (A) Less than 190 (B) At least 190, but less than 200 (C) At least 200, but less than 210 (D) At least 210, but less than 220 (E) At least 220 8.33 (4, 11/06, Q.1 & 2009 Sample Q.246) (2.9 points) You are given: (i) Losses follow a Burr distribution with α = 2. (ii) A random sample of 15 losses is: 195 255 270 280 350 360 365 380 415 450 490 550 575 590 615 (iii) The parameters γ and θ are estimated by percentile matching using the smoothed empirical estimates of the 30th and 65th percentiles. Calculate the estimate of γ. (A) Less than 2.9 (B) At least 2.9, but less than 3.2 (C) At least 3.2, but less than 3.5 (D) At least 3.5, but less than 3.8 (E) At least 3.8 8.34 (4, 5/07, Q.24) (2.5 points) For a portfolio of policies, you are given: (i) Losses follow a Weibull distribution with parameters θ and τ. (ii) A sample of 16 losses is: 54 70 75 81 84 88 97 105 109 114 122 125 128 139 146 153 (iii) The parameters are to be estimated by percentile matching using the 20th and 70th smoothed empirical percentiles. Calculate the estimate of θ. (A) Less than 100 (B) At least 100, but less than 105 (C) At least 105, but less than 110 (D) At least 110, but less than 115 (E) At least 115
8.35 (4, 5/07, Q.28) (2.5 points) You are given the following graph of cumulative distribution functions:
Determine the difference between the mean of the lognormal model and the mean of the data. (A) Less than 50 (B) At least 50, but less than 150 (C) At least 150, but less than 350 (D) At least 350, but less than 750 (E) At least 750
Solutions to Problems: 8.1. C. & 8.2. D. Set the distribution function at 5000 of the LogNormal equal to .85 and at 1000 equal to .15: Φ(((ln(5000)-µ)/σ) = 0.85 and Φ(((ln(1000)-µ)/σ) = 0.15. Therefore consulting the Normal Table, ((ln(5000)-µ)/σ = 1.036 and ((ln(1000)-µ)/σ = -1.036. Solving, σ = {ln(5000) - ln(1000)} / (1.036 - (-1.036)) = 0.7768. and µ = ln(1000) - (0.7768)(-1.036) = 7.713. 8.3. E., 8.4. D. As shown in Appendix A: VaRp (X) = θ [ -ln(1-p) ]1/τ. Thus, 10 = θ [ -ln(0.65) ]1/τ, and 20 = θ [ -ln(0.25) ]1/τ]. Dividing these two equations: 2 = [ln(4)/ln(1/0.65)]1/τ = .3.218081/τ.
⇒ τ = ln(3.21808)/ln(2) = 1.686. ⇒ θ = 20 / ln[4]1/1.686 = 16.48. Alternately, for the Weibull F(x) = 1 - exp(-(x/θ)τ). F(10) = 0.35 and F(20) = 0.75. Therefore, exp(-(10/θ)τ) = 1 - .35, and exp(-(20/θ)τ) = 1 - 0.75. (10/θ)τ = -ln(1 - 0.35) = .4308, and (20/θ)τ = -ln(1 - 0.75) = 1.3863. Dividing the second equation by the first: 2τ = 3.2180. ⇒ τln(2) = ln(3.2180) = 1.1688. τ = 1.1688 / .6932 = 1.686. (20/θ)1.686 = 1.3863. ⇒ θ = 20/1.38631/1.686 = 16.48. 8.5. A. First order the claims from smallest to largest. The 4th claim is 151.64 and is an estimate of the 4/(1+ 6) = 0.57 percentile. Set 1 - e-q151.64 = 0.57. Thus q = 0.0056. 8.6. B. & 8.7. C. The 10th percentile is estimated as the (.1)(9+1) = 1st claim, 11.2. The 80th percentile is estimated as the (.80)(9+1) = 8th claim, which is 12.3. The Distribution Function is in terms of the Standard Normal: F(x) = Φ((x - µ)/σ). Set F(11.2) = 0.1 and F(12.3) = 0.8. Thus Φ((11.2 - µ)/σ) = 0.1 and Φ((12.3 - µ)/σ) = 0.8. Now the Standard Normal has Φ(.842) = 0.8 and Φ(1.282) = 1 - 0.9 = 0.1. Therefore: (12.3 - µ)/σ = 0.842, and (11.2 - µ)/σ = -1.282. Thus: 12.3 - µ = .842σ, and 11.2 - µ = -1.282σ. Subtracting the two equations: 1.1 = 2.124σ and thus σ = 0.518. µ = 11.2 + 1.282σ = 11.2 + (1.282)(.518) = 11.86.
8.8. E. One sets the distribution function of the LogNormal equal to the percentiles: .80 = F(115000) = Φ(ln(115000) - µ)/σ), and .95 = F(983,000) = Φ(ln(983,000) - µ)/σ). Consulting the Normal Distribution Table, this implies that: .842 = (11.653 - µ)/σ and 1.645 = (13.798 - µ)/σ. Dividing the second equation by the first: (13.798 - µ)/(11.653 - µ) = 1.954. Solving µ = 9.403 and σ = 2.672. For the fitted LogNormal Distribution, E[X] = exp(µ + σ2/2) = exp(9.403 + 2.6722 /2) = e12.973 = 430,538. Comment: Similar to 4, 11/00, Q.39. 8.9. A. The 75th percentile is the (.75)(9 + 1) = 7.5th claim: (672 + 1626)/2 = 1149. .75 = w(1 - e-1149/500) + (1 - w)(1 - e-1149/1000). ⇒ w = (.25 - e-1.149)/( e-2.298 - e-1.149) = 0.309. Comment: For a two-point mixture in which only the weight is unknown, the pth percentile of the mixture is between the individual pth percentiles. Therefore, in order for the fitted weight, via percentile matching at the pth percentile, to be between 0 and 1, the empirical pth percentile has to be between the individual pth percentiles. In this case, the empirical 75th percentile has to be between -500 ln(.25) = 693 and -1000 ln(.25) = 1386. 8.10. E. .1 = 1 - exp[-B(c40 - 1)/ln(c)]. ⇒ B(c40 - 1)/ln(c) = .105361. 0.6 = 1 - exp[-B(c80 - 1)/ln(c)]. ⇒ B(c80 - 1)/ln(c) = .916291. Dividing the two equations: (c80 - 1)/(c40 - 1) = 8.69672. c80 - 8.69672c40 + 7.69672 = 0. c40 = {8.69672 ±
√[8.69672² - (4)(7.69672)]} / 2 = {8.69672 ± 6.69672}/2 = 7.69672 or 1.
c > 1 ⇒ c40 = 7.69672. ⇒ c = 1.05234.
⇒ B = .916291 ln(1.05234)/(1.0523480 - 1) = 0.0008029. S(85) = exp[-0.0008029(1.0523485 - 1)/ln(1.05234)] = 0.305. 8.11. B. 0.95 = F(184) = Φ[(184 - µ)/σ]. ⇒ (184 - µ)/σ = 1.645. ⇒ 184 - µ = 1.645σ. 0.05 = F(158) = Φ[(158 - µ)/σ]. ⇒ (158 - µ)/σ = -1.645. ⇒ 158 - µ = -1.645σ. Subtracting the two equations: 3.29σ = 26. ⇒ σ = 7.90 centimeters. Comment: µ = 158 + 1.645σ = 171 = (158 + 184)/2.
8.12. C. As shown in Appendix A, VaRp [X] = θ {p-1/τ - 1}-1/γ. Thus, 10 = θ {0.58-1/0.7 - 1}-1/γ = θ 0.84931/γ, and 20 = θ {0.78-1/0.7 - 1}-1/γ = θ 2.34691/γ. Dividing these two equations: 2 = 2.76331/γ. ⇒ γ = ln(2.7633)/ln(2) = 1.466.
⇒ θ = 10 (1.17751/1.466) = 11.18. F(50) = (1 + (11.18/50)1.466)-0.7 = 92.9%. Alternately, for the Inverse Burr, F(x) = {(x /θ)γ/(1 + (x /θ)γ)}τ = {1 + (θ /x)γ}−τ = {1 + (θ /x)γ}-0.7. Therefore, 0.58 = F(10) = {1 + (θ /10)γ}-0.7. ⇒ (θ /10)γ = 1/0.581/.7 - 1 = 1.1775. Also, 0.78 = F(20) = {1 + (θ /20)γ}-0.7. ⇒ (θ /20)γ = 1/0.781/.7 - 1 = 0.4261. Dividing the first equation by the second equation: 2γ = 2.763. ⇒ γ = ln(2.763)/ln(2) = 1.466.
⇒ θ = 10 (1.17751/1.466) = 11.18. F(50) = (1 + (11.18/50)1.466)-0.7 = 92.9%. 8.13. E. Since R is the ratio with respect to the statewide average, E[R] = 1. Therefore, the LogNormal Distribution has mean of 1. exp[µ + σ2/2] = 1. ⇒ µ = -σ2/2. .975 = F(2) = Φ[(ln2 − µ)/σ] . ⇒ (ln2 − µ)/σ = 1.960. ⇒ ln2 − µ = 0.69315 + σ2/2 = 1.960σ. σ2 - 3.842σ + 1.386. ⇒ σ = {3.92 ±
√[3.92² - (4)(1)(1.386)]} / 2 = (3.92 ± 3.134)/2 =
0.393 or 3.527. Since we are given that σ < 2, σ = 0.393. µ = -σ2/2 = -0.0772. F(0.5) = Φ[(ln.5 − µ)/σ] = Φ[(-.6932 + .0772)/.393] = Φ[-1.567] = 5.9%. Comment: Such wage tables are used to price the impact of changes in the laws governing Workers Compensation benefits. 8.14. B. Out of a total of 200 losses, there are 150 losses less than or equal to 250 and 50 greater than 250, so 250 is the best estimate of the 75th percentile, given this grouped data. Matching at the 75th percentile: 1 - exp[-(250/θ)0.7] = .75. ⇒ ln(.25) = -(250/θ)0.7.
⇒ θ = 250/ {-ln(.25)}1/0.7 = 157. Comment: Similar to 4, 11/02, Q.37.
8.15. C. We match at two percentiles, getting two equations in two unknowns. For the Loglogistic, VaRp [X] = θ {p-1 - 1}-1/γ. At 8 feet the chance of failure is 50%, and at 16 feet the chance of failure is 80%. Thus we have, 8 = θ {1/0.5 - 1)-1/γ = θ, and 16 = θ {1/0.8 - 1)-1/γ = θ 41/γ . Therefore, 16 = (8) (41/γ). ⇒ γ = 2. The chance of success is: 1 - F(x) =
1 / {1 + (x/8)2}.
The chance of success at 40 feet is: 1 / {1 + (40/8)2} = 1/26 = 3.85%.
Comment: Check: the chance of success at 8 feet is 1 / {1 + (8/8)2} = 50%, and the chance of success at 16 feet is 1 / {1 + (16/8)2} = 20%.
The chance of failure goes to 1 as x approaches infinity. 8.16. B. 0.50 = Φ[{ln(50,000) - µ}/σ]. ⇒ µ = ln(50,000) = 10.820. 0.95 = Φ[{ln(250,000) - µ}/σ]. ⇒ σ = {ln(250,000) - ln(50,000)} / 1.645 = 0.978. Thus the 99th percentile is: exp[10.820 + (2.326)(0.978)] = 486,420. E[X] = exp[10.820 + 0.9782 /2] = 80,680. E[X ∧ 486,420] = exp(µ + σ2/2) Φ[{ln(486,420) - µ - σ2 }/σ] + (486,420) {1 - Φ[{ln(486,420) - µ}/σ]} = (80,680) Φ[2.326 - 0.978] + (486,420) {1 - Φ[2.326]} = (80,680)(0.9115) + (486,420)(0.01) = 78,404. E[X | X > 486,420] = e(486,420) + 486,420 = (E[X] - E[X ∧ 486,260]) / S(486,420) + 486,420. = (80,680 - 78,404) / 0.01 + 486,420 = 714,020. The percentage of total income earned by the top 1% of earners is: (0.01)E[X | X > 486,420] / E[X] = (0.01)(714,020) / 80,680 = 8.85%. Comment: The distribution of annual incomes in the United States has a heavier righthand tail than this LogNormal.
8.17. E. (0.5)(11 + 1) = 6. (0.85)(11 + 1) = 10.2. The smoothed empirical estimate of the 50th percentile is: 327. The smoothed empirical estimate of the 85th percentile is: (0.8)(1358) + (0.2)(1625) = 1411.4. One sets the distribution function of the LogNormal equal to the percentiles: 0.5 = F(327) = Φ[(ln(327) - µ)/σ], and 0.85 = F(1411.4) = Φ[(ln(1411.4) - µ)/σ]. Consulting the bottom of the Normal Distribution Table, this implies that: 0 = (5.78996 - µ)/σ and 1.036 = (7.25234 - µ)/σ. Solving µ = 5.78996 and σ = 1.41156. For the fitted LogNormal, E[X2 ] = exp[2µ + 2σ2 ] = exp[(2)(5.78996) + (2)(1.411562 )] = 5.75 million. 8.18. D. Matching the survival functions yields two equations in two unknowns: 1 - 358 / 870 = 0.55 exp[-250/θ1] + 0.45 exp[-250/θ2]. 1 - 440 / 870 = 0.55 exp[-500/θ1] + 0.45 exp[-500/θ2]. Let u = exp[-250/θ1] and v = exp[-250/θ2]. Then the two equations can be rewritten as: 0.5885 = 0.55 u + 0.45 v. ⇒ 58.62 = 55u + 45v. ⇒ v = 1.3027 - 1.2222 u. 0.4943 = 0.55 u2 + 0.45 v2 . ⇒ 49.43 = 55 u2 + 45 v2 .
⇒ 49.43 = 55 u2 + 45 (1.3027 - 1.2222 u)2 . ⇒ 122.22 u2 - 143.29 u + 26.94 = 0. ⇒ u =
{143.29 ± √[143.29² - (4)(122.22)(26.94)]} / {(2)(122.22)}.
u = 0.9372 or 0.2352. If u = 0.9372, then v = 1.3027 - (1.2222)(0.9372) = 0.1573. If u = 0.2352, then v = 1.3027 - (1.2222)(0.2352) = 1.0153. However, v = exp[-250/θ2] < 1, so the second set of roots is no good. S(5000) = 0.55 exp[-5000/θ1] + 0.45 exp[-5000/θ2] = 0.55 u20 + 0.45 v20 = (0.55)(0.937220) + (0.45)(0.157320) = 0.1503. The estimated number of classes with more than 5000 claims is: (0.1503)(870) = 131. Comment: The data was taken from “NCCIʼs 2007 Hazard Group Mapping,” by John P. Robertson, Variance, Vol. 3, Issue 2, 2009. exp[-250/θ1] = u = 0.9372. ⇒ θ1 = 3855. exp[-250/θ2] = v = 0.1573. ⇒ θ2 = 135.
8.19. B. (0.5)(13 + 1) = 7. (0.9)(13 + 1) = 12.6. The smoothed empirical estimate of the 50th percentile is: 57. The smoothed empirical estimate of the 90th percentile is: (0.4)(569) + (0.6)(795) = 704.6. From Appendix A of Loss Models, for the Weibull: VaRp (X) = θ {-ln(1-p)}1/τ. ⇒ 57 = θ {-ln(1-0.5)}1/τ = θ 0.693151/τ, and 704.6 = θ {-ln(1-0.9)}1/τ = θ 2.302591/τ. Dividing these two equations: 12.3614 = 3.321921/τ. ⇒ τ = ln[3.32192] / ln[12.3614] = 0.4774.
⇒ θ = 704.6 / 2.302591/0.4774 = 122.8. Comment: Similar to 4, 5/07, Q.24. 8.20. C. 1 - exp[-(1/θ)τ] = 0.4, and 1 - exp[-(64/θ)τ] = 0.75. ln(0.6) = -(1/θ)τ, and ln(.25) = -(64/θ)τ. Dividing the two equations: ln(.25)/ln(.6) = 64τ. ln[ln (0.25) / ln(0.6)] = τln(64). ⇒ τ = ln[ln(0.25) / ln (0.6)] / ln 64 ≅ 0.24. 8.21. D. The smoothed empirical estimate of the 25th percentile is the (.25)(10 + 1) = 2.75th loss
⇔ (.25)(1) + (.75)(2) = 1.75. The smoothed empirical estimate of the 75th percentile is the (.75)(10 + 1) = 8.25th loss
⇔ (.75)(4) + (.25)(5) = 4.25. Set .75 = S(1.75) = α3.0625 + β1.75 + 1. ⇒ 0 = α12.25 + β7 + 1. Set .25 = S(4.25) = α18.0625 + β4.25 + 1. ⇒ 0 = α72.25 + β17 + 3. Multiplying the first equation by 17 and the second equation by 7 and subtracting: 0 = -297.5α - 4. ⇒ α = -0.0134. ⇒ β = -0.1194.
8.22. D. As shown in Appendix A: VaRp (X) = θ {p-1 - 1}-1/γ. Thus, 100 = θ {1/0.2 - 1}-1/γ, and 400 = θ {1/0.8 - 1}-1/γ. Dividing the two equations: 4 = 161/γ. ⇒ γ = 2. ⇒ θ = 400 / 41/γ = 400 / 41/2 = 200. Alternately, F(x) = (x/θ)γ/{1 + (x/θ)γ}. Since there are two parameters, we match percentiles at two points. 0.2 = F(100) = (100/θ)γ/(1+(100/θ)γ), and 0.8 = F(400) = (400/θ)γ/(1+(400/θ)γ) . Therefore, (100/θ)γ = 1/(1-.2) - 1 = 1/4 and (400/θ)γ = 1/(1-.8) - 1 = 4 . Therefore, γ ln100 - γln(θ) = ln(1/4), and γ ln400 - γ ln(θ)= ln(4). Subtracting the two equations: {ln(400) - ln(100)}γ = ln(4) - ln(.25) . Therefore, γ = {ln(4)-ln(.25)}/ {ln(400) - ln(100)} = ln(16)/ln(4) = 2ln(4)/ ln(4) = 2. Therefore θ = 400 / 41/γ = 400 / 41/2 = 200. Alternately, one can divide the two equations, and get (400/100)γ = 4/(1/4). 4γ = 16. Therefore, γ = 2 and θ = 200. Comment: Let ξ1 = p1 / (1 - p1 ) = .2/.8 = 1/4, ξ2 = p2 / (1 - p2 ) = .8/.2 = 4. Then γ = ln(ξ2 / ξ1 ) / ln(x2 / x1 ) = ln(16) / ln(400/100) = 2 ln(4) / ln(4) = 2, and θ = x2 / ξ2 1/γ = 400 / 41/2 = 200. 8.23. D. Set 0.5 = S(3) = 4-λ3. 3λ = 1/2. λˆ = 1/6 = 0.167. 8.24. C. For the Pareto Distribution F(x) = 1 - (θ/(θ+x))α . We are given that .10 = F(θ−k) = 1 - (θ/(θ+θ−k))α and .90 = F(5θ−3k) = 1 - (θ/(θ+5θ−3k))α. Therefore, .90 = (θ/(2θ−k))α and .10 = (θ/(6θ−3k))α = (θ/(2θ−k))α(3−α). Dividing these two equations one gets: 9 = 3α. Therefore α = 2. 8.25. A. One matches at two percentiles: 1 - exp(-(1000 / θ)τ) = .25, and 1 - exp(-(100000 / θ)τ) = .75. Therefore, (1000 / θ)τ = -ln(.75) , and (100000 / θ)τ = -ln(.25). Dividing the equations, 100τ = ln(.25)/ ln(.75) = 4.8188. τ = ln(4.8188)/ln(100) = 0.341.
8.26. C. The empirical 25th percentile is the (.25)(11) = 2.75th value, which is 6. The empirical 75th percentile is the (.75)(11) = 8.25th value, which is 9. F(t) = 1 - exp[-(t/θ)τ]. .25 = F(6) = 1 - exp[-(6/θ)τ]. ⇒ (6/θ)τ = -ln(.75) = .2877. .75 = F(9) = 1 - exp[-(9/θ)τ]. ⇒ (9/θ)τ = -ln(.25) = 1.3863. Dividing the two equations: (9/6)τ = 4.8186. ⇒ τ = ln(4.8186)/ln(1.5) = 3.88. Comment: θ = 9/1.38631/3.88 = 8.27. 8.27. D. Since the Weibull Distribution has two parameters, we need to match at two percentiles, in this case the 50th (the median) and the 90th. 1 - exp(-(10000 / θ)τ) = .5, and 1 - exp(-(100000 / θ)τ) = .90. Therefore, (10000 / θ)τ = -ln(1-.5) = .693, and (100000 / θ)τ = -ln(1-.90) = 2.303. Dividing the equations, 10τ = 2.303/.693 = 3.32. Therefore, τ = log10(3.32) = 0.521. Alternately, for the Weibull: VaRp (X) = θ {-ln(1-p)}1/τ. ⇒ 10,000 = θ {-ln(1-0.5)}1/τ = θ 0.6931/τ, and 100,000 = θ {-ln(1-0.9)}1/τ = θ 2.3031/τ. Proceed as before. Comment: θ^ = 20,166. The given mean and standard deviation are not used. 8.28. A. One sets the distribution function of the LogNormal equal to the percentiles: 0.20 = F(18.25) = Φ[(ln(18.25) - µ)/σ], and 0 = F(35.80) = Φ[(ln(35.80) - µ)/σ]. Consulting the bottom of the Normal Distribution Table, this implies that: -.842 = (2.904 - µ)/σ and .842 = (3.578 - µ)/σ. Solving µ = 3.241 and σ = 0.400. For the fitted LogNormal, S(30) = 1 - Φ[(ln(30) - 3.241)/0.400] = 1 - Φ(0.400) = 1 - 0.6554 = 0.3446. 8.29. A. Out of a total of 100 losses, there are 80 losses less than or equal to 400 and 20 greater than 400, so 400 is the best estimate of the 80th percentile, given this grouped data. Matching at the 80th percentile: 1 - e−400/θ = .8. ⇒ θ = -400/ln(.2) = 249. Comment: When we have grouped data, we do not really know the exact individual sizes. Therefore, we just look for a place where the appropriate percentage of losses are less than or equal to X. When we have ungrouped data, as is more commonly the case, then we use the smoothed empirical estimate of percentiles. The ungrouped case is the one to know well.
8.30. E. The smoothed empirical estimate of the 40th percentile is: (.4)(11 + 1) = 4.8th loss ⇔ (86)(.2) + (90)(.8) = 89.2. Similarly, 80th percentile is: (.8)(11 + 1) = 9.6th loss ⇔ (200)(.4) + (210)(.6) = 206. As shown in Appendix A: VaRp (X) = θ {p-1 - 1}-1/γ. Thus, 89.2 = θ {1/0.4 - 1}-1/γ, and 206 = θ {1/0.8 - 1}-1/γ. Dividing the two equations: 2.30942 = 61/γ. ⇒ γ = 2.141. ⇒ θ = (1.51/2.141)(89.2) = 107.8. Alternately, set F(89.2) = 0.4 and F(206) = 0.8. (89.2/θ)γ / {1 + (89.2/θ)γ} = .4. ⇒ 1.5 = θγ/ 89.2γ. (206/θ)γ / {1 + (206/θ)γ} = .8. ⇒ .25 = θγ/ 206γ.
⇒ 1.5/.25 = (206/89.2)γ. ⇒ γ = ln(6)/ln(206/89.2) = 2.141. ⇒ θ = (1.51/2.141)(89.2) = 107.8. Comment: Check: (89.2/107.8)2.141 / {1 + (89.2/107.8)2.141} = .6666/(1 + .6666) = .4. (206/107.8)2.141 / {1 + (206/107.8)2.141} = 4/(1 + 4) = .8. 8.31. D. (.4)(9 + 1) = 4. ⇒ The 40th percentile is 1.82. (.8)(9 + 1) = 8. ⇒ The 80th percentile is 12.66. As shown in Appendix A: VaRp (X) = θ {-ln(p)}−1/τ. Thus, 1.82 = θ ln(1/.4)-1/τ = θ 1.091361/τ, and 12.66 = θ ln(1/.8)-1/τ = θ 4.481421/τ. Dividing the two equations: 6.95604 = 4.106271/τ. ⇒ τ = 0.728.
⇒ θ = (1.82)(.91631/.728) = 1.61. Alternately, F(x) = exp[-(θ/x)τ]. .4 = F(1.82) = exp[-(θ/1.82)τ]. ⇒ .91629 = (θ/1.82)τ. .8 = F(12.66) = exp[-(θ/12.66)τ]. ⇒ .22314 = (θ/12.66)τ.
⇒ 4.106 = (12.66/1.82)τ = 6.956τ. ⇒ 1.412 = τ 1.9396. ⇒ τ = 0.728. ⇒ θ = (1.82)(.91631/.728) = 1.61.
8.32. E. As shown in Appendix A: VaRp (X) = θ {(1-p)-1/α - 1}1/γ = θ {1/(1-p) - 1}1/γ. 100 = θ (1/.75 - 1)1/γ = θ 0.33331/γ, and 500 = θ (1/.25 - 1)1/γ = θ 31/γ. Dividing the two equations: 5 = 91/τ. ⇒ γ = ln9/ln5 = 1.3652. ⇒ θ = (100)(31/1.3652) = 223.6. Alternately, F(x) = 1 - {θγ/(θγ + xγ)}α = 1 - θγ/(θγ + xγ) = xγ/(θγ + xγ). .25 = F(100) = 100γ/(θγ + 100γ). ⇒ (.75) 100γ = .25θγ. .75 = F(500) = 500γ/(θγ + 500γ). ⇒ (.25) 500γ = .75θγ. Dividing the two equations: 5γ/3 = 3. ⇒ γ = ln9/ln5 = 1.3652. ⇒ θ = (100)(31/1.3652) = 223.6. Comment: A Burr Distribution with α = 1 is a Loglogistic Distribution. 8.33. E. (15 + 1)(.3) = 4.8. ⇔ (280)(.2) + (350)(.8) = 336. (15 + 1)(.65) = 10.4. ⇔ (450)(.6) + (490)(.4) = 466. As shown in Appendix A: VaRp (X) = θ {(1-p)-1/α - 1}1/γ = θ {1/(1-p)0.5 - 1}1/γ. 336 = θ (
1/√0.7 - 1)^(1/γ) = θ (0.19523)^(1/γ), and 466 = θ (1/√0.35 - 1)^(1/γ) = θ (0.69031)^(1/γ).
Dividing the two equations: 1.3869 = 3.5359^(1/γ). ⇒ γ = 3.86. ⇒ θ = 513. Alternately, matching at the 30th percentile: .30 = 1 - {1 + (336/θ)γ}-2. ⇒ (336/θ)γ = .1952. Matching at the 65th percentile: .65 = 1 - {1 + (466/θ)γ}-2. ⇒ (466/θ)γ = .6903. Dividing the two equations: 1.387^γ = 3.536. ⇒ γ = 3.86. ⇒ θ = 513.
8.34. E. (0.2)(16 + 1) = 3.4. (0.7)(16 + 1) = 11.9. The smoothed empirical estimate of the 20th percentile is: (0.6)(75) + (0.4)(81) = 77.4. The smoothed empirical estimate of the 70th percentile is: (0.1)(122) + (0.9)(125) = 124.7. From Appendix A, for the Weibull: VaRp (X) = θ {-ln(1-p)}1/τ. ⇒ 77.4 = θ {-ln(1-0.2)}1/τ = θ 0.223141/τ, and 124.7 = θ {-ln(1-0.7)}1/τ = θ 1.203971/τ. Dividing these two equations: 1.6111 = 5.39561/τ. ⇒ τ = 3.534. ⇒ θ = 118.3. Alternately, by percentile matching we get two equations in two unknowns: 0.2 = 1 - exp[-(77.4/θ)τ]. ⇒ (77.4/θ)τ = 0.22314. 0.7 = 1 - exp[-(124.7/θ)τ]. ⇒ (124.7/θ)τ = 1.20397. Dividing these two equations: 1.6111τ = 5.3956. ⇒ τ = 3.534. ⇒ θ = 118.3. Comment: Check: 1 - exp[-(77.4/118.3)3.534] = 0.200. 1 - exp[-(124.7/118.3)3.534] = 0.700.
8.35. B. From the graph, F(10) = 20% and F(100) = 60%. Therefore, matching percentiles: Φ[(ln10 - µ)/σ] = .2. ⇒ (ln10 - µ)/σ = - 0.842. ⇒ ln10 - µ = -0.842σ. Φ[(ln100 - µ)/σ] = .6. ⇒ (ln100 - µ)/σ = 0.25. ln100 - µ = 0.25σ.
⇒ σ = {ln(100) - ln(10)}/(.25 + .842) = 2.109. ⇒ µ = ln(100) - (.25)(2.109) = 4.078. Mean of the LogNormal is: exp[4.078 + 2.1092 /2] = 546. From the empirical distribution function, 20% of the data is 10, 40% of the data is 100, and 40% of the data is 1000. The mean is: (20%)(10) + (40%)(100) + (40%)(1000) = 442. The difference between the means is: 546 - 442 = 104. Comment: A somewhat unusual question and a little long. If one does not round the 60th percentile of the Standard Normal Distribution to 0.25, then Φ[(ln10 - µ)/σ] = .2. ⇒ (ln10 - µ)/σ = - 0.842. ⇒ ln10 - µ = -0.842σ. Φ[(ln100 - µ)/σ] = .6. ⇒ (ln100 - µ)/σ = 0.253. ln100 - µ = 0.253σ.
⇒ σ = {ln(100) - ln(10)}/(.253 + .842) = 2.103. ⇒ µ = ln(100) - (.253)(2.103) = 4.073. Mean of the LogNormal is: exp[4.073 + 2.1032 /2] = 536. The difference between the means is: 536 - 442 = 94, resulting in the same letter solution.
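The percentile matching calculations above are easy to check by machine. The following short Python sketch (not part of the original study guide; the variable names are mine) reproduces the two-percentile Weibull match of 8.34, starting from the smoothed empirical percentiles derived above.

```python
import math

# Smoothed empirical percentiles from 8.34 (taken as given above).
p1, x1 = 0.2, 77.4    # 20th percentile
p2, x2 = 0.7, 124.7   # 70th percentile

# Weibull: VaR_p = theta * {-ln(1 - p)}^(1/tau).
# Dividing the two VaR equations eliminates theta and isolates tau.
tau = math.log(math.log(1 - p2) / math.log(1 - p1)) / math.log(x2 / x1)
theta = x1 / (-math.log(1 - p1)) ** (1 / tau)

print(tau, theta)   # roughly 3.53 and 118.3, matching the solution
```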
Section 9, Method of Moments
One can fit a type of distribution to data via the Method of Moments, by finding the set of parameters such that the moments (about the origin) of the given distribution match the observed moments. If one has a single parameter, such as in the case of the Exponential Distribution, then one matches the observed mean to the theoretical mean of the loss distribution. Fitting the Exponential to the ungrouped data set in Section 2 using the Method of Moments: θ = E[X] = 312,675. Note that for the Exponential, the Method of Maximum Likelihood applied to ungrouped data matches the result of the Method of Moments. Applying either one of these to fit an Exponential is commonly asked on exams. Know how to do this!
                      Mean      Coefficient of Variation   Skewness
Ungrouped Data        312,675   2.01                       4.83
Fitted Exponential    312,675   1.00                       2.00
Thus we expect the Exponential has much too thin a righthand tail in order to properly fit the ungrouped data in Section 2. Fortunately there are other distributions to choose from, such as those in Appendix A of Loss Models. In general, given a distribution with n parameters, one can try to match the first n moments.76 For many of the two parameter distributions in the Appendix A of Loss Models, formulas are given for fitted parameters via the method of moments in terms of the first two moments.77 If one has a distribution with two parameters, then one can either match the first two moments or match the mean and the variance, whichever is easier.78 So for example, if we try to fit a Gamma distribution to the data in Section 2, then we can match the first two moments, since the Gamma has two parameters alpha and theta: E[X] = 3.12674 x 105 = αθ. E[X2 ] = 4.9284598 x 1011 = α (α+1) θ2. 76
If one has three parameters then one attempts to match the first three moments. You are extremely unlikely to be asked a method of moments question involving more than two parameters. 77 These formulas are not included in the Appendix attached to the exam. Formulas are given for the Pareto, Gamma, Inverse Gamma, LogNormal, Inverse Gaussian, and Beta (for fixed scale parameter). In their formulas in Appendix A of Loss Models, m stands for the first moment and t stands for the second moment. 78 One gets the same answer either way.
Therefore, E[X2 ] - E[X]2 = αθ2 = 3.951 x 1011. Therefore θ = (E[X2 ] - E[X]2 )/ E[X] = 1.264 x 106 . Thus α = E[X2 ] / (E[X2 ] -E[X]2 ) = 0.2475.79 Exercise: Fit the Pareto Distribution to the data in Section 2. [Solution: Set the first moments equal: 3.12674 x 105 = θ/(α−1), and set the second moments equal: 4.9284598 x 1011 = 2θ2/{(α−1)(α−2)}. Dividing the second equation by the square of the first, eliminates θ: 4.9284598 x 1011 / ( 3.12674 x 105 )2 = 5.0411 = 2(α−1)/(α−2). (3.0411)α = 2(4.0411). α = 2.658. θ = (3.12674 x 105 )(2.658 - 1) ≅ 518,000. Alternately, using the formulas for the method of moments in Appendix A of Loss Models:80 m = E[X] = 3.12674 x 105 . t = E[X2 ] = 4.9284598 x 1011 . α = 2 (t - m2 ) /(t - 2m2 ) = 2( 3.951 x 1011)/( 2.973159 x 1011) = 2.658. θ = mt /(t - 2m2 ) = 1.541001 x 1017 / (2.973159 x 1011) = 518,304.] Exercise: Set up the equations to be solved in order to fit a Weibull Distribution to the data in Section 2 using the Method of Moments. [Solution: One matches the first two moments. θ Γ[1+ 1/τ] = 3.12674 x 105 . θ2 Γ[1+ 2/τ] = 4.9284598 x 1011. One could eliminate theta and get an equation to solve numerically for tau:81 Γ[1+ 2/τ] / Γ[1+ 1/τ] 2 = 4.9284598 x 1011 / (3.12674 x 105 )2 = 5.0411. One can numerically solve for τ = .5406 and then θ = 312,674/ Γ[1+ 1/τ] = 312674/1.74907 = 178765.]
This matches the formulas in Appendix A of Loss Models, with m = µ1 and t = µ2 ′. 80 The formulas for method of moments will not be attached to your exam in the abridged version of Appendix A. 81 Note that in the case of the Weibull one can reduce to one equation in one unknown to be solved numerically. In more complicated cases, the method of moments could be performed by solving numerically two equations in two unknowns. However, this may be wasted effort due to the inherent lack of accuracy of the method of moments. 79
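As a check on the closed-form fits above, here is a minimal Python sketch (not from the original text) that reproduces the Gamma and Pareto method of moments fits to the Section 2 ungrouped data, working only from the first moment m and second moment t.

```python
m = 3.12674e5      # first moment of the ungrouped data in Section 2
t = 4.9284598e11   # second moment of the ungrouped data in Section 2

# Gamma: match alpha * theta = m and alpha * theta^2 = t - m^2 (the variance).
gamma_theta = (t - m**2) / m
gamma_alpha = m / gamma_theta

# Pareto: m = theta/(alpha - 1) and t = 2 theta^2 / {(alpha - 1)(alpha - 2)}.
pareto_alpha = 2 * (t - m**2) / (t - 2 * m**2)
pareto_theta = m * t / (t - 2 * m**2)

print(gamma_alpha, gamma_theta)    # about 0.2475 and 1.264e6
print(pareto_alpha, pareto_theta)  # about 2.658 and 518,000
```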
Exercise: Fit the LogNormal Distribution to the ungrouped data in Section 2. [Solution: Set the first moments equal: 3.12674 x 105 = exp(µ + σ2/2), and set the second moments equal: 4.9284598 x 1011 = exp(2µ + 2σ2). Dividing the second equation by the square of the first, eliminates mu: 4.9284598 x 1011 / (3.12674 x 105 )2 = 5.0411 = exp(σ2). ⇒ σ = 1.272. µ = ln(3.12674 x 105 ) - (1.2722 )/2 ≅ 11.84. Alternately, using the method of moments in Appendix A of Loss Models: m = E[X] = 3.12674 x 105 . t = E[X2 ] = 4.9284598 x 1011. σ=
√(ln(t) - 2 ln(m)) = √1.6176 = 1.2718. µ = ln(m) - σ2/2 = 11.844.]
It is relatively easy to check the results of fitting via the method of moments. For example, for a LogNormal with parameters µ = 11.844 and σ = 1.2718, the first moment is: exp(11.844 + 1.27182 /2) = 312,618, and the second moment is: exp[2(11.844) + 2(1.27182 )] = e26.923 = 4.9259 x 1011. These do indeed match, subject to rounding, the first two moments of the ungrouped data in Section 2. Exercise: Using the fitted LogNormal Distribution, estimate the median of the distribution from which the ungrouped data in Section 2 was drawn. [Solution: The median of a LogNormal Distribution is eµ. e11.844 = 139,246. Comment: This differs from the smoothed empirical estimate of the median, the average of the 65th and 66th values: (119,300 + 122,000)/2 = 120,650.] Exercise: Fit via Method of Moments the Inverse Gaussian Distribution to the ungrouped data in Section 2. [Solution: µ = E[X] and µ3/θ = E[X2 ] - E[X]2. Thus θ = E[X]3 / (E[X2 ] - E[X]2 ). E[X] = 3.12674 x 105 . E[X2 ] = 4.9284598 x 1011. µ = 312,674, and θ = 77,373.] Parameters of curves fit to the ungrouped data in Section 2 by the Method of Moments:82 Pareto: α = 2.658, θ = 518,304 Weibull: θ = 178,765, τ = 0.5406 Gamma: α = 0.2475, θ = 1.264 x 106 LogNormal: µ = 11.844, σ =1.2718 Inverse Gaussian µ = 312,674, θ = 77,373. 82
The Weibull had to be fit via computer, since the formula for the moments involves Gamma functions.
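The Weibull is the one fit above that needs a numerical root-finder. A possible sketch, assuming SciPy is available (not part of the original study guide):

```python
from scipy.optimize import brentq
from scipy.special import gamma

m = 3.12674e5      # first moment of the Section 2 ungrouped data
t = 4.9284598e11   # second moment

# Eliminating theta leaves: Gamma(1 + 2/tau) / Gamma(1 + 1/tau)^2 = t / m^2.
ratio = t / m**2   # = 5.0411

def excess(tau):
    return gamma(1 + 2 / tau) / gamma(1 + 1 / tau) ** 2 - ratio

tau = brentq(excess, 0.3, 2.0)      # about 0.5406
theta = m / gamma(1 + 1 / tau)      # about 178,765
print(tau, theta)
```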
Coefficient of Variation: C V2 = Variance / E[X]2 = (E[X2 ] - E[X]2 ) / E[X]2 = E[X2 ] / E[X]2 - 1.
1 + CV2 = E[X2] / E[X]2.
The coefficient of variation depends on the shape parameter(s), not the scale parameter. In many examples of method of moments with two parameters, we divide the second moment by the square of the first moment, eliminating the scale parameter and thus solving for the shape parameter. Grouped Data: Similarly, one can use the method of moments to fit curves to the grouped data in Section 3. Fitting the Exponential to the grouped data set in Section 3 using the Method of Moments, one matches the first moment to the single parameter: θ = E[X] = 15,738.
                      Mean      Coefficient of Variation   Skewness
Grouped Data          15,738    ≈1                         ≈3
Fitted Exponential    15,738    1                          2
Comparing skewness, we expect the exponential has a righthand tail that is a little too thin in order to properly fit the grouped data in Section 3. In order to fit distributions with two parameters or more, one has to use estimates for the moments, such as those that were made in a previous section. For this particular grouped data set, one will run into a problem trying to fit a Pareto, since the Pareto has too heavy a tail to fit this data. For the Pareto: E[X] = θ/ (α−1) = 1.57 x 104 .
E[X2 ] = 2 θ2 / (α−2) (α−1) = 4.88 x 108 .
Solving for alpha and theta: θ = E[X] E[X2 ]/ (E[X2 ]- 2E[X]2 ) = 7.66 x 1012 / (-0.0498 x 108 ) = -1.53 x 106 . α = 2 ( E[X2 ] - E[X]2 ) / (E[X2 ] - 2E[X]2 ) = 4.83 x 108 / (-0.0498 x 108 ) = -97.
This is not a viable solution, since both alpha and theta are supposed to be positive. In general, one will run into difficulty trying to fit a Pareto to a data set with a coefficient of variation less than or equal to one.83 Parameters of curves fit to grouped data in Section 3 by the Method of Moments:84 Weibull: θ = 15774, τ = 1.011 Gamma: α = 1.021 , θ = 15,385 LogNormal: µ = 9.320, σ = 0.8264 Inverse Gaussian: µ = 15,700 and θ = 16,024. Exercise: A set of data has first moment of 1302 and second moment 4,067,183. Fit a Pareto Distribution via the method of moments. [Solution: E[X] = θ/ (α−1) = 1302. E[X2 ] = 2 θ2 / {(α−2) (α−1)} = 4,067,183. Solving for alpha and theta: θ = E[X] E[X2 ] / (E[X2 ] - 2E[X]2 ) = 5,295,472,266 / 676,775 = 7825. α = 2 (E[X2 ] - E[X]2 ) / (E[X2 ] - 2E[X] 2 ) = 4,743,958 /676,775 = 7.01. Comment: These are the same formulas as in Appendix A of Loss Models,85 with m = 1302 and t = 4,067,183.]
83
The Pareto has a coefficient of variation greater than 1. As alpha gets very large, the Pareto approaches an exponential distribution and the coefficient of variation approaches 1. 84 Using estimated first moment of 1.57 x 104 and estimated second moment of 4.88 x 108 . 85 The formulas for method of moments will not be attached to your exam in the abridged version of Appendix A.
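A small function (my own sketch, not from the original text) makes the CV > 1 requirement for the Pareto explicit; it returns the exercise's fit for the data with m = 1302 and t = 4,067,183, and correctly refuses the Section 3 grouped data, whose coefficient of variation is roughly one.

```python
def pareto_mom(m, t):
    """Pareto method of moments from the first two moments.
    Returns (alpha, theta), or None when t <= 2*m*m, i.e. when CV <= 1."""
    if t <= 2 * m * m:
        return None
    alpha = 2 * (t - m * m) / (t - 2 * m * m)
    theta = m * t / (t - 2 * m * m)
    return alpha, theta

print(pareto_mom(1302, 4067183))    # about (7.01, 7825), as in the exercise
print(pareto_mom(1.57e4, 4.88e8))   # None: no viable Pareto for the grouped data
```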
Problems: Use the following information to answer the following four questions: You observe the following five claims: 410, 1924, 2635, 4548, 6142. 9.1 (1 point) Using the method of moments, a LogNormal distribution is fit to this data. What is the value of the fitted µ parameter? A. less than 7.8 B. at least 7.8 but less than 7.9 C. at least 7.9 but less than 8.0 D. at least 8.0 but less than 8.1 E. at least 8.1 9.2 (2 points) Using the method of moments, a LogNormal distribution is fit to this data. What is the value of the fitted σ parameter? A. less than 0.6 B. at least 0.6 but less than 0.7 C. at least 0.7 but less than 0.8 D. at least 0.8 but less than 0.9 E. at least 0.9 9.3 (1 point) Using the method of moments, a Normal distribution is fit to the natural logarithms of this data. What is the value of the fitted µ parameter? A. less than 7.8 B. at least 7.8 but less than 7.9 C. at least 7.9 but less than 8.0 D. at least 8.0 but less than 8.1 E. at least 8.1 9.4 (2 points) Using the method of moments, a Normal distribution is fit to the natural logarithms of this data. What is the value of the fitted σ parameter? A. less than 0.6 B. at least 0.6 but less than 0.7 C. at least 0.7 but less than 0.8 D. at least 0.8 but less than 0.9 E. at least 0.9
9.5 (1 point) An exponential distribution F(x) = 1 - e-x/θ is fit to the following size of claim data by the method of moments. What is the value of the fitted parameter θ?
Range($)     # of claims    loss ($000)
0-100        6300           300
100-200      2350           350
200-300      850            200
300-400      320            100
400-500      110            50
over 500     70             50
Total        10000          1050
A. less than 100
B. at least 100 but less than 110
C. at least 110 but less than 120
D. at least 120 but less than 130
E. at least 130
The following information should be used to answer the next two questions:
10 Claims have been observed: 1500, 5500, 3000, 3300, 2300, 6000, 5000, 4000, 3800, 2500.
The underlying distribution is assumed to be Gamma, with parameters α and θ unknown.
9.6 (2 points) In what range does the method of moments estimator of θ fall?
A. less than 400
B. at least 400 but less than 500
C. at least 500 but less than 600
D. at least 600 but less than 700
E. at least 700
9.7 (1 point) In what range does the method of moments estimator of α fall?
A. less than 4
B. at least 4 but less than 5
C. at least 5 but less than 6
D. at least 6 but less than 7
E. at least 7
9.8 (1 point) An insurer writes a health insurance policy with a coinsurance factor of 80%. The insurer makes 623 payments for a total of $184,013. The insurer assumes the losses, prior to the effect of the coinsurance factor, follow the distribution: F(x) = 1 - 1/{1 + (x/θ)2}2, with E[X] = θπ/4. What is the method of moments fitted value of θ? A. 410
B. 430
C. 450
D. 470
E. 490
9.9 (2 points) You observe the following 6 claims: 162.22, 151.64, 100.42, 174.26, 20.29, 34.36. A Distribution: F(x) = 1 - e-qx, x > 0, is fit to this data via the Method of Moments. Determine the value of q. A. less than 0.006 B. at least 0.006 but less than .007 C. at least 0.007 but less than .008 D. at least 0.008 but less than .009 E. at least 0.009
Use the following information in the next six questions: You observe the following 10 claims: 1729, 101, 384, 121, 880, 3043, 205, 132, 214, 82. 9.10 (2 points) You fit this data via the method of moments to a Pareto Distribution. Determine α. A. 4.0
B. 4.5
C. 5.0
D. 5.5
E. 6.0
9.11 (1 point) You fit this data via the method of moments to a Pareto Distribution. Determine θ. A. 2400
B. 2500
C. 2600
D. 2700
E. 2800
9.12 (1 point) You fit this data via the method of moments to an Inverse Gaussian Distribution. In which of the following intervals is µ? A. less than 400 B. at least 400 but less than 500 C. at least 500 but less than 600 D. at least 600 but less than 700 E. at least 700
9.13 (1 point) You fit this data via the method of moments to an Inverse Gaussian Distribution. In which of the following intervals is θ? A. less than 350 B. at least 350 but less than 360 C. at least 360 but less than 370 D. at least 370 but less than 380 E. at least 380 9.14 (1 point) You fit this data via the method of moments to an Inverse Gamma Distribution. In which of the following intervals is α? A. less than 1 B. at least 1 but less than 2 C. at least 2 but less than 3 D. at least 3 but less than 4 E. at least 4 9.15 (1 point) You fit this data via the method of moments to an Inverse Gamma Distribution. In which of the following intervals is θ? A. less than 1000 B. at least 1000 but less than 1100 C. at least 1100 but less than 1200 D. at least 1200 but less than 1300 E. at least 1300
9.16 (3 points) The following five losses have been observed: $500, $1,000, $1,500, $2,500, $4,500. Use the method of moments to fit a LogNormal Distribution. Use this LogNormal Distribution to estimate the probability that a loss will exceed $4,500. A. Less than 5% B. At least 5% but less than 6% C. At least 6% but less than 7% D. At least 7% but less than 8% E. At least 8%
Use the following information in the next two questions: You observe the following 10 values: 0.21, 0.40, 0.14, 0.65, 0.53, 0.92, 0.30, 0.44, 0.76, 0.07. The underlying distribution is a Beta Distribution as per Loss Models, with θ = 1. The parameters a and b are fit to this data via the Method of Moments. 9.17 (2 points) In which interval is the fitted a? A. less than 1.0 B. at least 1.0 but less than 1.2 C. at least 1.2 but less than 1.4 D. at least 1.4 but less than 1.6 E. at least 1.6 9.18 (2 points) In which interval is the fitted b? A. less than 1.0 B. at least 1.0 but less than 1.2 C. at least 1.2 but less than 1.4 D. at least 1.4 but less than 1.6 E. at least 1.6 9.19 (3 points) You are given the following: • The random variable X has the density function f(x) = 0.4 exp(-x/δ1)/δ1 + 0.6 exp(-x/δ2)/δ2 , 0 < x < ∞, 0 < δ1 < δ2. •
A random sample taken of the random variable X has mean 4 and variance 27.
Determine the method of moments estimate of δ2. A. Less than 4.5 B. At least 4.5, but less than 5.0 C. At least 5.0, but less than 5.5 D. At least 5.5, but less than 6.0 E. At least 6.0 9.20 (3 points) You are given: (i) Claim amounts follow a shifted exponential distribution with probability density function: f(x) = e-(x-δ)/θ/θ, δ < x < ∞. (ii) A random sample of claim amounts X1 , X2 ,..., X10: 15
16 18 20 24 34 41 52 66 75
Estimate δ by matching both the mean of the shifted exponential distribution to the empirical mean, and the median of the shifted exponential distribution to the smoothed empirical estimate of the median. (A) 12.5 (B) 13.0 (C) 13.5 (D) 14.0 (E) 14.5
9.21 (1 point) You are modeling a claim process as a mixture of two independent distributions A and B. Distribution A is exponential with mean 1. Distribution B is exponential with mean 10. Positive weight p is assigned to distribution A. The sample mean is 3. Determine p using the method of moments. A. Less than 0.80 B. At least 0.80, but less than 0.85 C. At least 0.85, but less than 0.90 D. At least 0.90, but less than 0.95 E. At least 0.95 9.22 (3 points) The following data have been collected: Year Number Of Claims Average Claim Size 1 1732 22,141 2 2007 22,703 3 1920 24,112 4 1851 24,987 Inflation is assumed to be 4% per year. A LogNormal distribution with parameters σ = 3 and µ is used to model the claim size distribution. Estimate µ for Year 6 using the Method of Moments. (A) 5.66
(B) 5.67
(C) 5.68
(D) 5.69
(E) 5.70
9.23 (3 points) The portion of games won during a season by each of the 14 baseball teams in the American League were: 0.407, 0.426, 0.426, 0.444, 0.463, 0.469, 0.488, 0.512, 0.543, 0.543, 0.580, 0.580, 0.593, 0.593. Fit via the method of moments a Beta Distribution with θ = 1. 9.24 (2 points) The parameter of the Inverse Exponential distribution is to be estimated using the method of moments based on the following data: 2 5 11 28 65 143 Estimate θ by matching the kth moment with k = -1. (A) Less than 10 (B) At least 10, but less than 15 (C) At least 15, but less than 20 (D) At least 20, but less than 25 (E) At least 25
9.25 (3 points) You are modeling a claim process as a mixture of two independent distributions A and B. You are given: (i) Distribution A is exponential. (ii) Distribution B is exponential. (iii) Weight 0.8 is assigned to distribution A. (iv) The mean of the mixture is 10. (iv) The variance of the mixture is 228. Determine the mean of Distribution B using the method of moments. (A) 22 (B) 24 (C) 26 (D) 28 (E) 30 9.26 (3 points) If Y follows a Poisson Distribution with parameter λ, then for c > 0, cY follows an “Over-dispersed Poisson” Distribution with parameters c and λ. The distribution of Aggregate Losses has a mean of 10 and variance of 200. Fit via the method of moments an Over-dispersed Poisson Distribution to the distribution of aggregate losses. Estimate the probability that the aggregate losses are less than their mean. A. Less than 55% B. At least 55%, but less than 60% C. At least 60%, but less than 65% D. At least 65%, but less than 70% E. At least 70% 9.27 (2 points) From a complete mortality study of 7 laboratory animals, you are given: (i) The times of death, in weeks, are: 2, 3, 4, 5, 6, 8, 14. (ii) The operative survival model is assumed to be Exponential with mean θ. (iii) θ1 is the estimate of the parameter θ using percentile matching at the median. (iv) θ2 is the estimate of the parameter θ using the method of moments. Calculate the absolute difference in the estimated survival functions at 10 using θ1 and θ2. (A) 0.02
(B) 0.03
(C) 0.04
(D) 0.05
(E) 0.06
9.28 (2 points) A set of data has a sample mean and median of 830 and 410, respectively. You fit a LogNormal Distribution by matching these two sample quantities to the corresponding population quantities. What is the fitted value of the parameter σ? (A) 1.1
(B) 1.2
(C) 1.3
(D) 1.4
(E) 1.5
9.29 (2 points) The following claim data were generated from a Pareto distribution: 303 30 35 78 12 Using the method of moments to estimate the parameters of a Pareto distribution, calculate the loss elimination ratio at 10. (A) 2% (B) 4% (C) 6% (D) 8% (E) 10% 9.30 (5 points) You are given following data for 3048 professional liability insurance claims: Claim Size Interval Number of Claims 0 to 5000 1710 5001 to 25,000 968 25,001 to 100,000 343 100,001 to 250,000 23 250,001 to 500,000 4 More than 500,000 0 Assume that the sizes are uniformly distributed on each interval. Fit a LogNormal Distribution via the Method of Moments. 9.31 (3 points) You are given: (i) Losses in Year i follow a gamma distribution with parameters αi and θi. (ii) αi = α, for i = 1, 2, 3,… (iii) The parameters θi vary in such a way that there is an annual inflation rate of 8% for losses. (iv) The following is a sample of six losses: Year 1: 100 300 Year 2: 200 500 Year 3: 100 1000 Using trended losses, determine the method of moments estimate of θ5. (A) 650
(B) 700
(C) 750
(D) 800
(E) 850
9.32 (3 points) The parameters of an Inverse Pareto distribution are to be estimated based on the following data by matching kth moments with k = -1 and k = -2: 10 30 100 300 800 Determine S(1000) for the fitted Inverse Pareto. A. 3% B. 4% C. 5% D. 6% E. 7%
9.33 (3 points) You are given the following data on the sizes of 9156 policies.
X, Policy Premium ($000)    Number of Policies
2                           8000
5                           1000
25                          150
100                         5
250                         1
Let X be the policy premium in thousands of dollars. You assume ln[X] follows a Gamma Distribution. Fit the Gamma Distribution to ln[X] via method of moments.
9.34 (2 points) You are given the following:
• The random variable X has the density function f(x) = α 10^α / (x + 10)^(α+1), 0 < x < ∞, α > 0.
•
A random sample of size 7 is taken of the variable X: 1, 3, 7, 10, 18, 21, 37.
Determine the method of moments estimate of α. A. 1.72
B. 1.74
C. 1.76
D. 1.78
E. 1.80
9.35 (4, 5/86, Q.55) (2 points) X1, X2,..., Xn is an independent sample drawn from a lognormal distribution with parameters µ and σ2.
Let X = (1/n) Σ Xi (sum over i = 1 to n), and S2 = {1/(n-1)} Σ (Xi - X)2 (sum over i = 1 to n).
In terms of X and S2 obtain estimators for µ and σ2 using the method of moments. A. µ = X ; σ2 = S2 / X
B. µ = ln X ; σ2 = ln S2
C. µ = X ; σ2 = ln (S2 + X 2 )
D. µ = 0.5 ln[ X 3 / ( X + S2 )] ; σ2 = ln (S2 - X 2 )
E. µ = ln X - .5 ln (1 + S2 / X 2 ) ; σ2 = ln (1 + S2 / X 2 ) 9.36 (160, 11/86, Q.12) (2.1 points) A random sample of death records yields the following exact ages at death: 30, 50, 60, 60, 70, 90. The age at death of the population from which the sample is drawn follows a gamma distribution. The parameters α and θ are estimated using the method of moments. Determine the estimate of α. (A) 6.0
(B) 7.2
(C) 9.0
(D) 10.8
(E) 12.2
9.37 (4, 5/87, Q.60) (1 point) Using the method of moments what is an estimate of the mean of a lognormal distribution given the sample: 3, 4.5, 6, 6.25, 6.5, 6.75, 7, 7.5, 8.5, 10 A. Less than 6 B. At least 6, but less than 6.25 C. At least 6.25, but less than 6.50 D. At least 6.50, but less than 6.75 E. 6.75 or more. 9.38 (4, 5/88, Q.53) (2 points) Given the distribution f(x) = axa-1, 0 < x < 1, 0 < a < ∞, and the sample 0.7, 0.14, 0.8, 0.9, 0.65; what is the method of moments estimate for a? A. Less than 1.0 B. At least 1.0, but less than 1.3 C. At least 1.3, but less than 1.6 D. At least 1.6, but less than 1.9 E. 1.9 or more 9.39 (160, 5/89, Q.14) (2.1 points) You are given: (i) Five lives are observed from time t = 0 until death. (ii) Deaths occur at t = 3, 4, 4, 11 and 18. Assume the lives are subject to the probability density function f(t) = t e-t/c / c2 , t > 0. Determine c by the method of moments. (A) 1/4 (B) 1/2 (C) 1 (D) 2 (E) 4 9.40 (2, 5/90, Q. 31) (1.7 points) Let X be a continuous random variable with density function f(x ; θ) = x(1−θ)/θ/θ for 0 < x < 1, where θ > 0. What is the method-of-moments estimator of θ? A. (1 - X )/ X
B. ( X - 1)/ X
C. X /(1- X )
D. X /( X - 1)
E. 1/(1 + X )
9.41 (4, 5/90, Q.34) (2 points) The following observations: 1000, 850, 750, 1100, 1250, 900, are a random sample taken from a Gamma distribution with unknown parameters α and θ. In what range does the method of moments estimators of α fall? A. B. C. D. E.
Less than 30 At least 30, but less than 40 At least 40, but less than 50 At least 50, but less than 60 60 or more
9.42 (160, 11/90, Q.19) (1.9 points) From a complete mortality study of 10 laboratory mice, you are given: (i) The times of death, in days, are 2, 3, 4, 5, 5, 6, 8, 10, 11, 11. (ii) The operative survival model is assumed to be uniform from 0 to ω. (iii) ω1 is the estimate of the uniform parameter ω using percentile matching at the median. (iv) ω2 is the estimate of the uniform parameter ω using the method of moments. Calculate ω1 - ω2. (A) -2
(B) -1
(C) 0
(D) 1
(E) 2
9.43 (4, 5/91, Q.41) (2 points) A large sample of claims has an observed average claim size of $2,305 with a variance of 989,544. Assuming the claim severity distribution to be lognormal, estimate the probability that a particular claim exceeds $3,000. (Use the moments of the lognormal distribution.) A. Less than 0.14 B. At least 0.14 but less than 0.18 C. At least 0.18 but less than 0.22 D. At least 0.22 but less than 0.26 E. At least 0.26 9.44 (4, 5/91, Q.46) (2 points) The following sample of 10 claims is observed: 1500 6000 3500 3800 1800 5500 4800 4200 3900 3000. The underlying distribution is assumed to be Gamma, with parameters α and θ unknown. In what range does the method of moments estimators of θ fall? A. Less than 250 B. At least 250, but less than 300 C. At least 300, but less than 350 D. At least 350, but less than 400 E. 400 or more
9.45 (4B, 5/92, Q.10) (3 points) You are given the following information:
• Losses follow a LogNormal distribution with parameters µ and σ.
•
The following five losses have been observed: $500, $1,000, $1,500, $2,500, $4,500. Use the method of moments to fit a Normal Distribution to the natural logs of the loss sizes. Use the corresponding LogNormal Distribution to estimate the probability that a loss will exceed $4,500. A. Less than 0.01 B. At least 0.01 but less than 0.05 C. At least 0.05 but less than 0.09 D. At least 0.09 but less than 0.13 E. At least 0.13 9.46 (4B, 5/92, Q.26) (1 point) The random variable X has the density function with parameter β given by f(x;β) = (1/β2 ) x exp[-.5 (x/β)2 ]; x >0, β > 0. Where E[X] = (β/2)
√(2π), and the variance of X is: 2β2 - (π/2)β2.
You are given the following observations of X: 4.9, 1.8, 3.4, 6.9, 4.0. Determine the method of moments estimate of β. A. Less than 3.00 B. At least 3.00 but less than 3.15 C. At least 3.15 but less than 3.30 D. At least 3.30 but less than 3.45 E. At least 3.45 9.47 (Course 160 Sample Exam #3, 1994, Q.13) (1.9 points) From a complete mortality study of five lives, you are given: (i) The underlying survival function is exponential with hazard rate λ. (ii) Deaths occur at times 1, 2, t3 , t4 , 9, where 2 < t3 < t4 < 9. (iii) The estimate of λ using the Method of Moments is 0.21. (iv) The estimate of λ using the Percentile Matching at the median is 0.21. Calculate t4 . (A) 6.5
(B) 7.0
(C) 7.5
(D) 8.0
(E) 8.5
9.48 (4B, 5/95, Q.5) (2 points) You are given the following: • The random variable X has the density function f(x) = αxα−1, 0 < x < 1, α > 0. • A random sample of three observations of X yields the values 0.40, 0.70, 0.90. Determine the method of moments estimate of α. A. Less than 0.5 B. At least 0.5, but less than 1.5 C. At least 1.5, but less than 2.5 D. At least 2.5, but less than 3.5 E. At least 3.5 9.49 (4B, 5/96, Q.4) (2 points) You are given the following:
• The random variable X has the density function f(x) = 2(θ - x)/θ2, 0 < x < θ.
• A random sample of two observations of X yields the values 0.50 and 0.90.
Determine the method of moments estimator of θ. A. Less than 0.45 B. At least 0.45, but less than 0.95 C. At least 0.95, but less than 1.45 D. At least 1.45, but less than 1.95 E. At least 1.95 9.50 (4B, 5/98, Q.5) (2 points) You are given the following: • The random variables X has the density function f(x) = α(x + 1)−(α+1), 0 < x < ∞, α > 0. •
A random sample of size n is taken of the random variable X.
•
The values in the random sample totals nµ.
Assuming α > 1, determine the method of moments estimator of α. A. µ
B. µ / (µ - 1)
C. µ / (µ + 1)
D. (µ - 1) / µ
E. (µ + 1) / µ
Use the following information for the next two questions: • The random variable X has the density function f(x) = 0.5 exp(-x/λ1)/λ1 + 0.5 exp(-x/λ2)/λ2, 0 < x < ∞, 0 < λ1 ≤ λ2. •
A random sample taken of the random variable X has mean 1 and variance k.
9.51 (4B, 11/98, Q.25) (3 points) If k is 3/2, determine the method of moments estimate of λ1.
A. Less than 1/5
B. At least 1/5, but less than 2/5
C. At least 2/5, but less than 3/5
D. At least 3/5, but less than 4/5
E. At least 4/5
9.52 (4B, 11/98, Q.26) (2 points) Determine the values of k for which method of moments estimates of λ1 and λ2 exist.
A. 0 < k
B. 0 < k < 3
C. 0 < k < 2
D. 1 ≤ k
E. 1 ≤ k < 3
9.53 (4B, 11/99, Q.21) (2 points) You are given the following:
• The random variable X has the density function f(x) = w f1(x) + (1-w) f2(x), 0 < x < ∞, 0 ≤ w ≤ 1.
• A single observation of the random variable X yields the value 1.
• ∫ from 0 to ∞ of x f1(x) dx = 1.
• ∫ from 0 to ∞ of x f2(x) dx = 2.
• f2(x) = 2 f1(x) ≠ 0.
Determine the method of moments estimate of w.
A. 0   B. 1/3   C. 1/2   D. 2/3   E. 1
9.54 (Course 160 Sample Exam #1, 1999, Q.12) (1.9 points) From a laboratory study of ten lives, you are given:
(i) From the observed data, the following values of S(t) are estimated:
t       5     8     9     12    19
Ŝ(t)    0.8   0.6   0.4   0.2   0.0
(ii) A Weibull distribution is to be fitted to the sample data by using percentile matching at the 20th and 60th percentiles.
Calculate S(8), the estimated probability of surviving to time 8, using the fitted Weibull.
(A) 0.45 (B) 0.50 (C) 0.55 (D) 0.60 (E) 0.65
9.55 (Course 4 Sample Exam 2000, Q.8) Summary statistics of 100 losses are:
Interval         Number of Losses   Sum       Sum of Squares
(0, 2000]        39                 38,065    52,170,078
(2000, 4000]     22                 63,816    194,241,387
(4000, 8000]     17                 96,447    572,753,313
(8000, 15000]    12                 137,595   1,628,670,023
(15,000, ∞)      10                 331,831   17,906,839,238
Total            100                667,754   20,354,674,039
A Pareto Distribution is fit to this data using the method of moments. Determine the parameter estimates. 9.56 (4, 5/00, Q.36) (2.5 points) You are given the following sample of five claims: 4 5 21 99 421 You fit a Pareto distribution using the method of moments. Determine the 95th percentile of the fitted distribution. (A) Less than 380 (B) At least 380, but less than 395 (C) At least 395, but less than 410 (D) At least 410, but less than 425 (E) At least 425
9.57 (4, 11/00, Q.2) (2.5 points) The following data have been collected for a large insured: Year Number Of Claims Average Claim Size 1 100 10,000 2 200 12,500 Inflation increases the size of all claims by 10% per year. A Pareto distribution with parameters α = 3 and θ is used to model the claim size distribution. Estimate θ for Year 3 using the method of moments. (A) 22,500
(B) 23,333
(C) 24,000
(D) 25,850
(E) 26,400
9.58 (4, 5/01, Q.39) (2.5 points) You are modeling a claim process as a mixture of two independent distributions A and B. You are given: (i) Distribution A is exponential with mean 1. (ii) Distribution B is exponential with mean 10. (iii) Positive weight p is assigned to distribution A. (iv) The standard deviation of the mixture is 2. Determine p using the method of moments. (A) 0.960 (B) 0.968 (C) 0.972 (D) 0.979 (E) 0.983 9.59 (4, 11/01, Q.33 & 2009 Sample Q.75) (2.5 points) You are given: (i) Claim amounts follow a shifted exponential distribution with probability density function: f(x) = e-(x-δ)/θ/θ, δ < x < ∞. (ii) A random sample of claim amounts X1 , X2 ,..., X10: 5
5 5 6 8 9 11 12 16 23
(iii) Σ Xi = 100 and Σ Xi2 = 1306
Estimate δ using the method of moments.
(A) 3.0 (B) 3.5 (C) 4.0 (D) 4.5 (E) 5.0
9.60 (4, 11/03, Q.8 & 2009 Sample Q.6) (2.5 points) For a sample of dental claims x1 , x2 ,..., x10, you are given: (i) Σ xi = 3860 and Σ xi2 = 4,574,802. (ii) Claims are assumed to follow a lognormal distribution with parameters µ and σ. (iii) µ and σ are estimated using the method of moments. Calculate E[X ∧ 500] for the fitted distribution. (A) Less than 125 (B) At least 125, but less than 175 (C) At least 175, but less than 225 (D) At least 225, but less than 275 (E) At least 275 9.61 (4, 11/03, Q.24 & 2009 Sample Q.19) (2.5 points) You are given: (i) A sample x1 , x2 ,..., x10 is drawn from a distribution with probability density function: {e-x/θ/θ + e-x/σ/σ}/2, 0 < x < ∞. (ii) θ > σ (iii) Σxi = 150 and Σxi2 = 5000 Estimate θ by matching the first two sample moments to the corresponding population quantities. (A) 9
(B) 10
(C) 15
(D) 20
(E) 21
9.62 (4, 11/04, Q.14 & 2009 Sample Q.143) (2.5 points) The parameters of the inverse Pareto distribution F(x) = {x/(x + θ)}τ are to be estimated using the method of moments based on the following data: 15 45 140 250 560 1340 Estimate θ by matching kth moments with k = -1 and k = -2. (A) Less than 1 (B) At least 1, but less than 5 (C) At least 5, but less than 25 (D) At least 25, but less than 50 (E) At least 50 9.63 (2 points) Using the data in the prior question, 4, 11/04, Q.14, the parameters of the Inverse Gamma distribution are to be estimated using the method of moments. Estimate θ by matching kth moments with k = -1 and k = -2. (A) 26
(B) 28
(C) 30
(D) 32
(E) 34
9.64 (CAS3, 5/05, Q.19) (2.5 points) Four losses are observed from a Gamma distribution. The observed losses are: 200, 300, 350, and 450. Find the method of moments estimate for α. A. 0.3
B. 1.2
C. 2.3
D. 6.7
E. 13.0
9.65 (4, 5/05, Q.24 & 2009 Sample Q.193) (2.9 points) The following claim data were generated from a Pareto distribution: 130 20 350 218 1822. Using the method of moments to estimate the parameters of a Pareto distribution, calculate the limited expected value at 500. (A) Less than 250 (B) At least 250, but less than 280 (C) At least 280, but less than 310 (D) At least 310, but less than 340 (E) At least 340 9.66 (4, 11/05, Q.21 & 2009 Sample Q.232) (2.9 points) You are given: (i) Losses on a certain warranty product in Year i follow a lognormal distribution with parameters µi and σi. (ii) σi = σ, for i = 1, 2, 3,… (iii) The parameters µi vary in such a way that there is an annual inflation rate of 10% for losses. (iv) The following is a sample of seven losses: Year 1: 20 40 50 Year 2: 30 40 90 120 Using trended losses, determine the method of moments estimate of µ3. (A) 3.87
(B) 4.00
(C) 30.00
(D) 55.71
(E) 63.01
9.67 (CAS3, 11/06, Q.3) (2.5 points) Claim sizes of 10 or greater are described by a single parameter Pareto distribution, with parameter α. A sample of claim sizes is as follows: 10
12 14 18 21 25
Calculate the method of moments estimate for α for this sample. A. Less than 2.0 B. At least 2.0, but less than 2.1 C. At least 2.1, but less than 2.2 D. At least 2.2, but less than 2.3 E. At least 2.3
9.68 (4, 5/07, Q.10) (2.5 points) A random sample of observations is taken from a shifted exponential distribution with probability density function: f(x) = e-(x-δ)/θ/θ, δ < x < ∞. The sample mean and median are 300 and 240, respectively. Estimate δ by matching these two sample quantities to the corresponding population quantities. (A) Less than 40 (B) At least 40, but less than 60 (C) At least 60, but less than 80 (D) At least 80, but less than 100 (E) At least 100 9.69 (CAS3, 11/07, Q.5) (2.5 points) X is a two-parameter Pareto random variable with parameters θ and α. A random sample from this distribution produces the following four claims:
• x1 = 2,000 • x2 = 17,000 • x3 = 271,000 • x4 = 10,000 Find the Method of Moments estimate for α. A. Less than 2 B. At least 2, but less than 3 C. At least 3, but less than 4 D. At least 4, but less than 5 E. At least 5 9.70 (CAS3L, 5/09, Q.17) (2.5 points) A random variable, X, follows a lognormal distribution. You are given a sample of size n and the following information:
Σ xi / n = 1.8682 Σ xi2 / n = 4.4817 Use the method of moments to estimate the lognormal parameter σ. A. Less than 0.4 B. At least 0.4, but less than 0.8 C. At least 0.8, but less than 1.2 D. At least 1.2 but less than 1.6 E. At least 1.6
9.71 (CAS3L, 11/10, Q.20) (2.5 points) You are given the following information:
• A gamma distribution has mean αθ and variance αθ2. • Five observations from this distribution are: 2
10 12 8 8
Calculate the method of moments estimate for α. A. Less than 4.0 B. At least 4.0, but less than 5.0 C. At least 5.0, but less than 6.0 D. At least 6.0, but less than 7.0 E. At least 7.0 9.72 (CAS3L, 5/11, Q.19) (2.5 points) You are given the following five observations from a single-parameter Pareto distribution: 125 250 300 425 500 The value of the mode, θ, is known in advance to be 100. Calculate the method of moments estimate for the parameter θ. A. Less than 1.1 B. At least 1.1, but less than 1.2 C. At least 1.2, but less than 1.3 D. At least 1.3, but less than 1.4 E. At least 1.4
Solutions to Problems: 9.1. B & 9.2. A. The observed mean is (410 + 1924 + 2635 + 4548 + 6142)/5 = 3131.8. The second moment is: (4102 + 19242 + 26352 + 45482 + 61422 )/5 = 13844314. For the LogNormal Distribution the mean is exp[µ +.5 σ2], while the second moment is exp[2µ + 2σ2]. With 2 parameters, the method of moments consists of matching the first 2 moments. Thus setting exp[µ +.5 σ2] = 3131.8 and exp[2µ + 2σ2] = 13,844,314, we can solve by dividing the square of the 1st equation into the 2nd equation: exp[σ2] = 13844314 / 3131.82 = 1.4115. Thus σ = 0.5871 and thus µ = 7.877. Comment: It is only a coincidence that eµ = e7.877 = 2636, equal to the sample median of 2635, subject to rounding. 9.3. A. & 9.4. E. µ = (Σ ln x i)/5 = 7.72. Σ( ln x i)2 /5 = 60.49. σ =
√(60.49 - 7.72²) = 0.94.
Comment: The mean and variance of the logs of the claim sizes have been matched to those of a Normal Distribution. The method of moments applied to the Normal Distribution underlying the LogNormal, such as in these two questions, is not the same as applying the method of moments directly to the LogNormal, such as in the previous two questions. Applying the method of moments to the Normal Distribution underlying the LogNormal turns out to be the same as applying Maximum Likelihood to the LogNormal Distribution. 9.5. B. For the exponential the mean is θ. Therefore for the method of moments θ = observed mean = 105.
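A short sketch (not part of the original solutions) contrasting 9.1-9.2 with 9.3-9.4: the first pair matches moments of the LogNormal itself, the second matches moments of the Normal underlying the logs.

```python
import math

claims = [410, 1924, 2635, 4548, 6142]
n = len(claims)

# 9.1-9.2: method of moments applied directly to the LogNormal.
m = sum(claims) / n
t = sum(x * x for x in claims) / n
sigma_direct = math.sqrt(math.log(t / m**2))     # about 0.587
mu_direct = math.log(m) - sigma_direct**2 / 2    # about 7.88

# 9.3-9.4: moments matched for the Normal fitted to ln(x),
# which coincides with maximum likelihood for the LogNormal.
logs = [math.log(x) for x in claims]
mu_logs = sum(logs) / n                                            # about 7.72
sigma_logs = math.sqrt(sum(z * z for z in logs) / n - mu_logs**2)  # about 0.94

print(mu_direct, sigma_direct, mu_logs, sigma_logs)
```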
9.6. C. & 9.7. E. Compute the observed first two moments.
Claim Size: 1500, 5500, 3000, 3300, 2300, 6000, 5000, 4000, 3800, 2500. Average = 3690.
Square of Claim Size: 2,250,000, 30,250,000, 9,000,000, 10,890,000, 5,290,000, 36,000,000, 25,000,000, 16,000,000, 14,440,000, 6,250,000. Average = 15,537,000.
Match the observed and theoretical means and variances: αθ = 3690, and αθ2 = 15,537,000 - 36902 = 1,920,900. Therefore, α = 36902 / 1920900 = 7.088 and θ = 3690/7.088 = 521. 9.8. D. The total losses prior to the effect of the coinsurance factor are: 184013/.8 = 230016. Thus the average loss is 230016/623 = 369.2. Set the observed mean equal to the theoretical mean: 369.2 = θπ/4. θ = (4/π)(369.2) = 470. Comment: This is a ParaLogistic Distribution as per Loss Models, with α = 2. 9.9. E. The average of the 6 claims is 107.2. For this Exponential Distribution, the mean is 1/q. Set mean = 1/q = 107.2. Solve for q = 1 / 107.2 = 0.0093. 9.10. B. & 9.11. A. first moment = (1729 + 101 + 384 + 121 + 880 + 3043 + 205 + 132 + 214 + 82)/ 10 = 689.1. 2nd mom. = (17292 + 1012 + 3842 + 1212 + 8802 + 30432 + 2052 + 1322 + 2142 + 822 )/ 10 = 13307957 / 10 = 1330796. Matching first moments: θ/(α-1) = 689.1. Matching 2nd moments: 2θ2 / {(α-1)(α-2)} = 1330796. Dividing the second equation by the square of the first equation: 2(α-1)/(α-2) = 1330796/689.12 = 2.8025. Solving, α = 2(1 - 2.8025)/(2 - 2.8025) = 4.49. Then, θ = 689.1(4.49 - 1) = 2406. 9.12. D. & 9.13. E. Matching first moments: µ = 689.1. Matching variances: µ3 /θ = 1330796 - 689.12 = 855,937 . Then, θ = (689.13 ) / 855,937 = 382.
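The same arithmetic for 9.6-9.7 in a few lines of Python (my sketch, not from the original text):

```python
claims = [1500, 5500, 3000, 3300, 2300, 6000, 5000, 4000, 3800, 2500]
n = len(claims)

m = sum(claims) / n                   # 3690
t = sum(x * x for x in claims) / n    # 15,537,000
variance = t - m * m                  # 1,920,900

alpha = m * m / variance              # about 7.09
theta = m / alpha                     # about 521
print(alpha, theta)
```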
9.14. C. & 9.15. B. Matching first moments: θ/(α−1) = 689.1. Matching second moments: θ2 /{(α−1)(α−2)} = 1330796. Then, dividing the second equation by the square of the first equation: (α−1)/(α−2) = 1330796/689.12 = 2.8025. Solving α = ((2)(2.8025) - 1)/( 2.8025 - 1) = 2.555. θ = (2.555 - 1)(689.1) = 1072. 9.16. B. The observed mean is: (500 + 1000 + 1500 + 2500 + 4500)/5 = 2000. The second moment is: (5002 + 10002 + 15002 + 25002 + 45002 )/5 = 6,000,000. For the LogNormal Distribution the mean is exp[µ +.5 σ2], while the second moment is exp[2µ + 2σ2]. With 2 parameters, the method of moments consists of matching the first 2 moments. Thus set exp[µ +.5 σ2] = 2000 and exp[2µ + 2σ2] = 6,000,000. Divide the square of the 1st equation into the 2nd equation: exp[2µ + 2σ2]/exp[2µ + σ2] = exp[σ2] = 6,000,000 / 20002 = 1.5. ⇒ σ = .637. ⇒ µ = 7.398. 1 - F(4500) = 1 - Φ[(ln(4500) - 7.398)/.637] = 1 - Φ[1.59] = 5.6%. 9.17. B. & 9.18. D. first moment = 0.442, 2nd moment = 0.26396.
Size: 0.21, 0.40, 0.14, 0.65, 0.53, 0.92, 0.30, 0.44, 0.76, 0.07. Average = 0.442.
Square of Size: 0.0441, 0.1600, 0.0196, 0.4225, 0.2809, 0.8464, 0.0900, 0.1936, 0.5776, 0.0049. Average = 0.26396.
This is a Beta Distribution as in Loss Models, with θ = 1. Matching first moments: a/(a+b) = 0.442. Matching second moments: a(a+1) / {(a+b)(a+b+1)} = .26396. From the first equation: b = 1.262a. Dividing the second equation by the first equation: (a+1)/(a+b+1) = .5972. Therefore, (a+1) = (a + 1.262a +1)(.5972).⇒ a = 1.15 and b = 1.45.
9.19. D. For an Exponentials with mean δ, the second moment is 2δ2 . The moments of the mixed distribution are the mixture of those of the individual distributions. Therefore, the mixed distribution has mean: .4δ1 + .6δ2 and second moment: 2(.4δ12 + .6δ22). Using the method of moments with two parameters, we match the first two moments. 4 = 0.4δ1 + 0.6δ2 ⇒ 20 = 2δ1 + 3δ2. ⇒ δ1 = 10 - 1.5δ2. 27 + 42 = 2(0.4δ12 + 0.6δ22) ⇒ 215 = 4δ12 + 6δ22. Substituting into the second equation: 215 = 4(10 - 1.5δ2)2 + 6δ22. ⇒ 15δ22 - 120δ2 + 185 = 0. δ2 = {120 ±
√(120² - (4)(15)(185)) }/{(2)(15)} = 4 ± 1.915 = 5.915 or 2.085.
Note that δ2 > δ1. If δ2 = 5.915, then δ1 = (185 - 3δ22) / 12δ2 = 1.128 < δ2. If instead, δ2 = 2.085, then δ1 = (185 - 3δ22) / 12δ2 = 6.873 > δ2. Comment: Similar to 4B, 11/98, Q.25. 9.20. B. The empirical mean is: 361/10 = 36.1. The smoothed empirical estimate of the median is: (24 + 34)/2 = 29. If X follows a shifted Exponential, then Y = X - δ follows an Exponential. Y has mean θ, and median: -θln(.5) = .693θ. Therefore, X = Y + δ, has mean: θ + δ, and median: .693θ + δ. We want: θ + δ = 36.1 and .693θ + δ = 29. ⇒ θ = (36.1 - 29)/(1 - .693) = 23.1. ⇒ δ = 13.0. 9.21. A. Mean of the mixed distribution is: (1)(p) + (10)(1-p) = 10 - 9p. Setting the mean of the mixture equal to 3: 10 - 9p = 3. p = 7/9 = 0.778. Comment: We do not use the fact that the individual distributions are Exponential. For a two-point mixture in which only the weight is unknown, the mean of the mixture is between the individual means. Therefore, the sample mean has to be between the individual means for the fitted weight via method of moments to be between zero and one. In this case, the empirical mean has to be between 1 and 10.
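For 9.19, the substitution and quadratic can be checked by machine. A sketch (not in the original text; the coefficient algebra mirrors the solution above):

```python
import math

mean, var = 4.0, 27.0
w1, w2 = 0.4, 0.6
t = var + mean**2                     # second moment of the mixture = 43

# With d1 = (mean - w2*d2)/w1 substituted into 2*(w1*d1^2 + w2*d2^2) = t,
# the second-moment equation becomes a quadratic in d2.
a = 2 * (w2**2 / w1 + w2)
b = -4 * w2 * mean / w1
c = 2 * mean**2 / w1 - t
roots = [(-b + s * math.sqrt(b * b - 4 * a * c)) / (2 * a) for s in (1, -1)]

# Keep the root consistent with 0 < delta1 < delta2.
for d2 in roots:
    d1 = (mean - w2 * d2) / w1
    if d2 > d1 > 0:
        print(d1, d2)                 # about 1.13 and 5.92
```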
9.22. E. Since we want to estimate µ for Year 6, inflate all of the data to the cost level of Year 6. For Year 1, (22,141)(1.04)^5 = 26,937.9. So for Year 1, the inflated losses are: (1732)(26,937.9) = 46.656 million.
Year   Number of Claims   Average Size of Claim   Inflated Average Size of Claim   Inflated Dollars of Loss
1      1732               $22,141                 $26,937.9                        $46,656,463
2      2007               $22,703                 $26,559.3                        $53,304,513
3      1920               $24,112                 $27,122.7                        $52,075,624
4      1851               $24,987                 $27,025.9                        $50,025,013
Sum    7510                                                                        $202,061,614
The total inflated losses are: 202.062 million. Average inflated claim size = 202.062 million /7510 = 26,906. Set the observed and theoretical means equal: exp[µ +.5 σ2] = exp[µ + 4.5] = 26,906. µ = ln(26,906) - 4.5 = 5.70. Comment: Similar to 4, 11/00, Q. 2.
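The trend-and-match computation in 9.22 is easy to reproduce; a sketch, not part of the original solutions:

```python
import math

# (year, number of claims, average claim size)
data = [(1, 1732, 22141), (2, 2007, 22703), (3, 1920, 24112), (4, 1851, 24987)]
sigma = 3.0
inflation = 1.04
target_year = 6

total_claims = sum(n for _, n, _ in data)
total_trended = sum(n * s * inflation ** (target_year - yr) for yr, n, s in data)
mean_year6 = total_trended / total_claims       # about 26,906

# LogNormal mean is exp(mu + sigma^2/2); solve for mu with sigma fixed at 3.
mu = math.log(mean_year6) - sigma**2 / 2        # about 5.70
print(mean_year6, mu)
```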
9.23. The mean of the data is: .5048. The second moment of the data is: .2590. .5048 = a/(a + b). ⇒ b = .9810a. .2590 = a(a+1)/(a + b)(a + b + 1). ⇒ .2590(1.9810a)(1.9810a + 1) = a(a + 1).
⇒ 1.0164a + 0.5131 = a + 1. ⇒ a = 29.7. ⇒ b = 29.1. Comment: The winning percentages do not average to 50%, since some games were played against the National League. In this year, which was 2007, the American League did better than average in its games against the National League.
[Figure omitted: the fitted Beta density f(x) on (0, 1), plotted for x from about 0.3 to 0.7, reaching a peak of roughly 6.]
9.24. A. E[1/X] = (1/2 + 1/5 + 1/11 + 1/28 + 1/65 + 1/143)/ 6 = 0.1415. For the Inverse Exponential Distribution, E[X-1] = θ−1 Γ[-1 + 1] = Γ[0]/θ = 1/θ. 1/θ = 0.1415. ⇒ θ = 7.067. Comment: Similar to 4, 11/04, Q.14. 9.25. C. Mean of the mixed distribution is: (θA)(.8) + (θB)(.2) = 10. Second Moment of the mixed distribution is: (2θA2)(.8) + (2θB2)(.2) = 228 + 102 = 328. θA = 12.5 - .25θB. 1.6(12.5 - .25θB)2 + .4θB2 = 328. .5θB2 - 10θB - 78 = 0.
θB = {10 + √(10² + (4)(0.5)(78))} / {(2)(.5)} = 26.
Comment: θA = 12.5 - .25θB = 6.
9.26. C. The mean of c times a Poisson is cλ. The variance of c times a Poisson is c2 λ. cλ = 10. c2 λ = 200. ⇒ c = 20. ⇒ λ = 0.5. 20N < 10. ⇔ N < 1/2. ⇔ N = 0. Density at zero for the Poisson is: e−λ = e-0.5 = 60.6%. Comment: Since Var[cX]/E[cX] = cVar[X]/E[X], for c > 1, the Over-dispersed Poisson Distribution has a variance greater than it mean. See for example “A Primer on the Exponential Family of Distributions”, by David R. Clark and Charles Thayer, CAS 2004 Discussion Paper Program. 9.27. E. Using percentile matching at the median, set F(5) = .5: .5 = 1 - e-5/θ.
⇒ θ1 = 7.2135. S(10) = e-10/7.2135 = .250. Using the method of moments: θ2 = (2 + 3 + 4 + 5 + 6 + 8 + 14)/7 = 6. S(10) = e-10/6 = 0.189. |.250 - .189| = 0.061. 9.28. B. The LogNormal Distribution has mean = exp[µ + σ2/2], and median = exp[µ]. 410 = exp[µ]. ⇒ µ = 6.016. 830 = exp[µ + σ2/2]. ⇒ µ + σ2/2 = 6.721. ⇒ σ = 1.19. Comment: One can solve for the median by setting 0.5 = Φ[(ln(x) - µ)/σ]. 9.29. E. E[X] = (303 + 30 + 35 + 78 + 12)/5 = 91.6 = θ/(α-1). E[X2 ] = (3032 + 302 + 352 + 782 + 122 )/5 = 20032.4 = 2θ2 / {(α-1)(α-2)}. Dividing the second equation by the the square of the first equation: 2(α-1)/(α-2) = 2.387. ⇒ α = 7.17. ⇒ θ = 565.2. E[X] = θ/(α−1). E[X ∧ x] = {θ/(α-1)} {1 - (θ/(θ+x))α−1}. LER(x) = E[X ∧ x]/E[X] = 1 - {θ/(θ+x)}α−1. LER(10) = 1 - {565.2 / (565.2 + 10)}7.17-1 = 10.3%. Comment: Similar to 4, 5/05, Q.24.
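A sketch verifying 9.29 end to end (not from the original text); small differences in the last digit versus the solution above are just rounding.

```python
claims = [303, 30, 35, 78, 12]
n = len(claims)
m = sum(claims) / n
t = sum(x * x for x in claims) / n

alpha = 2 * (t - m * m) / (t - 2 * m * m)        # about 7.2
theta = m * (alpha - 1)                          # about 565

# Pareto loss elimination ratio: LER(d) = 1 - {theta/(theta + d)}^(alpha - 1).
ler_10 = 1 - (theta / (theta + 10)) ** (alpha - 1)
print(alpha, theta, ler_10)                      # LER near 10%, answer E
```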
9.30. For each interval from a to b, the mean of the uniform is: (a + b)/2. For example, (5 + 25)/2 = 15. For each interval from a to b, the second moment of the uniform is: (b3 - a3) / {(b-a)(3)}. For example, (253 - 53)/{(25 - 5)(3)} = 258.333.
Lower Endpoint   Upper Endpoint   Number of Claims   First Moment   Second Moment
0                5                1710               2.500          8.333
5                25               968                15.000         258.333
25               100              343                62.500         4,375.000
100              250              23                 175.000        32,500.000
250              500              4                  375.000        145,833.333
Total                             3048               15.012         1,015.674
Then weight the moments by the number of claims. {(1710)(8.33) + (968)(258.33) + (343)(4375) + (23)(32,500) + (4)(145,833.33)}/3048 = 1015.7. Set the first and second moments of the LogNormal equal to their estimates: exp[µ + σ2/2] = 15,012. exp[2µ + 2σ2] = 1,015,674,000. Dividing the second equation by the square of the first equation: exp[σ2] = 1,015,674,000/15,0122 = 4.5069. ⇒ σ = 1.227. ⇒ µ = 8.864. Comment: Data summarized from Table 2 of Sheldon Rosenbergʼs discussion of “On the Theory of Increased Limits and Excess of Loss Pricing”, PCAS 1977. Estimating the moments in this way is discussed in “Mahlerʼs Guide to Loss Distributions.” 9.31. C. Since we wish to estimate theta for year 5, inflate all of the losses to the year 5 level: (100)(1.084 ) = 136.0. (300)(1.084 ) = 408.1. (200)(1.083 ) = 251.9. (500)(1.083 ) = 629.9. (100)(1.082 ) = 116.6. (1000)(1.082 ) = 1166.4. First moment is: (136.0 + 408.1 + 251.9 + 629.9 + 116.6 + 1166.4) / 6 = 451.5. Second moment is: (136.02 + 408.12 + 251.92 + 629.92 + 116.62 + 1166.42 ) / 6 = 336,559. Matching moments, results in two equations in two unknowns: αθ = 451.5.
αθ2 = 336,559.
Divide the second equation by the square of the first equation: 1/α = 336,559 / 451.52 = 1.651. ⇒ α = 0.606. ⇒ θ = 451.5 / 0.606 = 745. Comment: Similar to 4, 11/05, Q.21.
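The grouped-data moments used in 9.30 follow mechanically from the uniform-on-interval assumption. A sketch (mine, not from the original); the empty top interval is ignored.

```python
import math

# (lower, upper, number of claims), working in dollars
intervals = [(0, 5000, 1710), (5000, 25000, 968), (25000, 100000, 343),
             (100000, 250000, 23), (250000, 500000, 4)]

n = sum(c for _, _, c in intervals)
m = sum(c * (a + b) / 2 for a, b, c in intervals) / n                    # about 15,012
t = sum(c * (b**3 - a**3) / (3 * (b - a)) for a, b, c in intervals) / n  # about 1.016e9

# LogNormal method of moments from (m, t).
sigma = math.sqrt(math.log(t / m**2))    # about 1.23
mu = math.log(m) - sigma**2 / 2          # about 8.86
print(m, t, sigma, mu)
```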
9.32. B. For an Inverse Pareto, E[1/X] = θ−1 / (τ - 1), and E[1/X2 ] = 2θ−2 / {(τ - 1)(τ - 2)}. The negative first moment for the data is: {1/10 + 1/30 + 1/100 + 1/300 + 1/800}/5 = 0.02958. The negative second moment for the data is: {1/102 + 1/302 + 1/1002 + 1/3002 + 1/8002 } / 5 = 0.002245. Matching moments, results in two equations in two unknowns: θ−1 / (τ - 1) = 0.02958.
2θ−2 / {(τ - 1)(τ - 2)} = 0.002245.
Divide the second equation by the square of the first equation: 2 (τ - 1) / (τ - 2) = 0.002245 / 0.029582 = 2.566. ⇒ τ = 5.534.
⇒ θ = 1 / {(0.02958)(5.534 - 1)} = 7.456. S(1000) = 1 - (1000 / 1007.456)5.534 = 4.0%. Comment: Similar to 4, 11/04, Q.14. 9.33. The first moment is: 8000 ln[2] + 1000 ln[5] + 150 ln[25] + 5 ln[100] + ln[250] = 0.83726. 9156 The second moment is: 8000 ln[2]2 + 1000 ln[5]2 + 150 ln[25]2 + 5 ln[100]2 + ln[250]2 = 0.88735. 9156 α θ = 0.83726. α θ2 = 0.88735 - 0.837262 = 0.18635
⇒ θ = 0.22257. ⇒ α = 3.7618. Comment: If ln[X] follows a Gamma Distribution, then X follows a LogGamma Distribution, a distribution which is not discussed in Loss Models. 9.34. A. This is a Pareto distribution with θ = 10, and mean: 10 / (α - 1). X = (1 + 3 + 7 + 10 + 18 + 21 + 37)/7 = 13.857. 10 / (α - 1) = 13.857. ⇒ α = 1.722.
9.35. E. The method of moments consists of matching the first two moments, since we have two parameters. For the LogNormal Distribution the mean is exp[µ +.5 σ2] while the second moment is exp[2µ + 2σ2]. Thus the variance is: exp[2µ + σ2]{exp[σ2] -1}. Taking X as an estimate of the mean and S2 as an estimate of the variance, we set: exp[µ +.5σ2] = X and exp[2µ + σ2]{exp[σ2] -1} = S2. We get: exp[σ2] -1 = S2 / X 2 . Thus σ2 = ln( 1 + S2 / X 2 ). Therefore, µ = ln( X ) - .5σ2 = ln( X ) - .5 ln( 1 + S2 / X 2 ). 9.36. D. αθ = (30 + 50 + 60 + 60 + 70 + 90)/6 = 60. α(α+1)θ2 = (302 + 502 + 602 + 602 + 702 + 902 )/6 = 3933.33. Dividing the second equation by the square of the first equation: 1 + 1/α = 1.09259. α = 10.8. Comment: θ = 5.56. One could instead set: αθ2 = 3933.33 - 602 = 333.33. 9.37. D. The observed mean is: (3 + 4.5 + 6 + 6.25 + 6.5 + 6.75 + 7 + 7.5 + 8.5 + 10)/10 = 6.60. Since the method of moments consists of matching the first two moments, the mean of the fitted distribution will also be 6.60. Comment: If one were to carry through the method of moments, the second moment is: (32 +4.52 + 62 +6.252 +6.52 +6.752 + 72 +7.52 +8.52 +102 )/10 = 46.96. For the LogNormal Distribution the mean is exp[µ +.5 σ2], while the second moment is exp[2µ + 2σ2]. Thus setting exp[µ +.5 σ2] = 6.6 and exp[2µ + 2σ2] = 46.96, we can solve exp[σ2] = 46.96 / 6.62 = 1.078. Thus σ = .274 and therefore µ = 1.850. 9.38. D. Integrating xf(x) = axa from zero to one, the mean is: a/(1+a). The observed mean is .638. Since we have one parameter, the method of moments consists of matching the first moment. Setting a/(1+a) = .638, we get a = 1/((1/.638) -1) = 1.762. Comment: A Beta distribution with b =1 and θ = 1, therefore with mean: θa/(a+b) = a/(1+a). 9.39. E. This is a Gamma Distribution with α = 2 and θ = c. X = (3 + 4 + 4 + 11 + 18)/5 = 8. Set 8 = αθ = 2c. ⇒ c = 4.
9.40. A. The mean is ∫ from 0 to 1 of x f(x) dx = ∫ from 0 to 1 of x^(1/θ)/θ dx = x^(1 + 1/θ)/(θ + 1), evaluated from x = 0 to x = 1, which equals 1/(θ + 1).
Set the theoretical mean equal to the sample mean: 1/(θ+1) = X . ⇒ θ = (1 - X )/ X . 9.41. B. The observed first moment = m = (1000+850+750+1100+1250+900)/6 = 975. The second moment = t = (10002 + 8502 + 7502 + 11002 + 12502 + 9002 )/6 = 977,917. Match the observed and theoretical means and variances: αθ = 975 and αθ2 = 977,917 - 9752 = 27292. Therefore, α = 9752 / 27292 = 34.8 and θ = 975/34.8 = 28.0. 9.42. A. The median of the uniform distribution is ω/2. The empirical median is: (5+6)/2 = 5.5. Setting ω/2 = 5.5, ⇒ ω1 = 11. The mean of the uniform distribution is ω/2. The empirical mean is 6.5. Setting ω/2 = 6.5, ⇒ ω2 = 13. ω1 - ω2 = 11 - 13 = -2. 9.43. C. For the LogNormal Distribution: Mean = exp(µ + .5 σ2), second moment = exp(2µ + 2σ2). Setting these equal to the observed moments: exp(µ + .5 σ2) = 2305, and exp(2µ + 2σ2) = 989,544 +23052 =6302569. Therefore µ + .5 σ2 = ln(2305) = 7.7428 and 2µ + 2σ2 = ln(6,302,569) =15.6565. Therefore, σ2 = 15.6565 - (2)(7.7428) = .1709 and µ = 7.7428 -(.5)(.1709) = 7.657. For the LogNormal Distribution: F(x) = Φ[{ln(x) − µ} / σ]. 1 - F(3000) = 1- Φ[{ln(3000) - 7.657} / 0.1709 ] = 1 - Φ[0.85] = 20%.
9.44. E. first moment = 3800. second moment = 16,332,000.
Claim Size: 1500, 6000, 3500, 3800, 1800, 5500, 4800, 4200, 3900, 3000. Average = 3800.
Square of Claim Size: 2,250,000, 36,000,000, 12,250,000, 14,440,000, 3,240,000, 30,250,000, 23,040,000, 17,640,000, 15,210,000, 9,000,000. Average = 16,332,000.
Match the observed and theoretical means and variances: αθ = 3800, and αθ2 = 16,332,000 - 38002 = 1,892,000. Therefore, α = 38002 / 1,892,000 = 7.632 and θ = 3800/7.632 = 498. Comment: α = mean2 / (2nd moment - mean2 ) = 7.63. 9.45. C. (1/5)Σ ln xj = (1/5)(ln 500 + ln 1000 + ln 1500 + ln 2500 + ln 4500) = 36.6715 /5 = 7.3343. (1/5)Σ ln xj2 = (1/5){(ln 500)2 + (ln 1000)2 + (ln 1500)2 + (ln 2500)2 + (ln 4500)2 } = 271.7963/5 = 54.3593. Then matching the moments of the log claim sizes to a Normal Distribution: µ = 7.3343. σ2 = 54.3593 - 7.33432 . ⇒ σ = .7532. For the LogNormal Distribution: F(x) = Φ((ln x − µ )/σ). 1 - F(4500) = 1 - Φ[(ln 4500 − 7.3343 )/.7532] = 1 - Φ[1.43] = 1 - 0.9236 = 0.0764. 9.46. D. This distribution has one parameter, so under the method of moments one sets the observed mean of (4.9 + 1.8 + 3.4 + 6.9 + 4.0)/5 = 4.2, equal to the theoretical mean. (β/2) 2 π = 4.2. ⇒ β = (4.2)(2) /
√(2π) = 3.35.
Comment: Weibull Distribution, with parameters θ = β√2 and τ = 2. Thus this is an example of a single parameter special case of a two parameter distribution in which one parameter is fixed, although this fact is not particularly helpful in solving this problem. The mean is: θ Γ(1 + 1/τ) = Γ(3/2) β√2 = {√π / 2} β√2 = (β/2)√(2π). The second moment is: θ2 Γ(1 + 2/τ) = Γ(2) 2β2 = 2β2.
9.47. E. The empirical median is t3. 1/2 = S(t3) = exp[-0.21 t3]. ⇒ t3 = 3.3. The empirical mean is: (1 + 2 + 3.3 + t4 + 9)/5 = 3.06 + t4/5. From the method of moments result: 1/0.21 = 3.06 + t4/5. ⇒ t4 = 8.51.

9.48. C. Since this is a one parameter distribution the method of moments involves matching the mean. The observed mean is (0.4 + 0.7 + 0.9)/3 = 0.667 = 2/3. The mean of this distribution is the integral from zero to one of x f(x), which is: ∫₀¹ α x^α dx = α x^(α+1)/(α+1), evaluated from x = 0 to x = 1, = α/(α+1).
Setting this equal to the observed mean of 2/3 gives 2α + 2 = 3α. Therefore α = 2. Comment: A Beta distribution with a = α, b = 1 and θ = 1, therefore with mean θa/(a+b) = α/(1+α).

9.49. E. The mean of the given density is: ∫₀^θ x f(x) dx = ∫₀^θ 2x(θ - x)/θ² dx = (2/θ²)(θx²/2 - x³/3), evaluated from x = 0 to x = θ, = (2/θ²)(θ³/6) = θ/3.
Since one has a single parameter, in order to apply the Method of Moments one sets the mean of the fitted distribution equal to the observed mean: (0.5 + 0.9)/2 = θ/3. ⇒ θ = (3)(0.7) = 2.1.

9.50. E. In order to apply the method of moments one sets the observed mean equal to the theoretical mean and solves for the single parameter α. The theoretical mean is obtained by integration by parts:
∫₀^∞ x f(x) dx = ∫₀^∞ x α (x + 1)^(-α - 1) dx = [-x(x + 1)^(-α)], evaluated from x = 0 to x = ∞, plus ∫₀^∞ (x + 1)^(-α) dx = 0 + 1/(α-1) = 1/(α-1).
Setting µ = 1/(α-1) and solving: α = (µ + 1)/µ. Comment: This is a Pareto Distribution, with θ = 1 fixed, with mean θ/(α-1) = 1/(α-1). Alternately, the tail probability S(x) = (x+1)^(-α) can be obtained by integrating the density function from x to infinity. Then the mean is the integral from 0 to infinity of S(x).
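As a quick numerical check of 9.50 (this sketch is mine, not part of the original solution), the theoretical mean 1/(α-1) can be confirmed by direct numerical integration of x f(x) for a sample value of α, here α = 3.

```python
import numpy as np
from scipy.integrate import quad

alpha = 3.0
density = lambda x: alpha * (x + 1.0) ** (-alpha - 1.0)   # Pareto density with theta = 1 fixed

mean, _ = quad(lambda x: x * density(x), 0, np.inf)
print(mean, 1.0 / (alpha - 1.0))   # both about 0.5, i.e. 1/(alpha - 1)
```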
9.51. C. The mean of these Exponentials is: λ. The variance of these Exponentials is: λ². Therefore, the second moment of these Exponentials is: λ² + λ² = 2λ². The mean and second moment of the mixed distribution are the weighted average of those of the individual distributions. Therefore, the mixed distribution has mean: 0.5(λ1 + λ2) and second moment: 0.5(2λ1² + 2λ2²). Using the method of moments with two parameters, we match the first two moments. 1 = 0.5(λ1 + λ2), or 2 = λ1 + λ2. k + 1² = 0.5(2λ1² + 2λ2²), or k + 1 = λ1² + λ2². Squaring the first equation gives: 4 = λ1² + λ2² + 2λ1λ2. Subtracting the second equation gives: 3 - k = 2λ1λ2. Thus λ2 = (3 - k)/(2λ1). Substituting back into the first equation: 2 = λ1 + (3 - k)/(2λ1). Thus λ1² - 2λ1 + (3 - k)/2 = 0. Letting k = 3/2 as given, this equation becomes λ1² - 2λ1 + 3/4 = 0. λ1 = {2 ± √(2² - (4)(1)(3/4))}/2 = 1 ± 1/2 = 1/2 or 3/2. (Note λ2 was defined to be the larger one.) Check: 0.5(λ1 + λ2) = 0.5(1/2 + 3/2) = 1. 0.5(2λ1² + 2λ2²) = 1/4 + 9/4 = 5/2 = 1 + k.

9.52. E. The solution to the previous question reduces to a quadratic equation: λ1² - 2λ1 + (3 - k)/2 = 0. The solutions are: 1 ± √(1 - (3 - k)/2) = 1 ± √((k - 1)/2). The solutions are complex for k < 1. The solutions are positive (and real) for k < 3. For k ≥ 3, one solution is negative (or zero), which is not allowed since for the given densities both lambdas are positive. (For k > 3, √((k - 1)/2) > √((3 - 1)/2) = 1.) Thus we require 1 ≤ k < 3.
9.53. E. The mean of the mixed distribution is (1)(w) + (2)(1 - w) = 2 - w. Set the theoretical mean equal to the observed mean: 2 - w = 1. w = 1. Comment: One has to assume that f1 has no parameters and that f2 has no parameters; for example f1 might be an Exponential Distribution with mean 1, while f2 might be a Pareto Distribution with α = 3 and θ = 4, and thus mean of 2. Here we have only one remaining parameter w. Otherwise, the question makes no sense. Commonly both f1 and f2 would have non-fixed parameters and thus the means would depend on these unknown parameters; in that case, one would have to match more than the first moment in order to apply the method of moments. One does not commonly apply the method of moments to a single observation. The answer would have been the same if the observed mean of many observations had been 1. If for example, the observed mean had been instead 1.4, then w = 0.6. If the observed mean had been outside [1, 2], then we would not have gotten an answer; w would have been outside [0, 1].

9.54. B. The empirical 20th percentile is 5, where Ŝ(t) = 0.8. The empirical 60th percentile is 9. F(t) = 1 - exp[-(t/θ)^τ]. 0.20 = F(5) = 1 - exp[-(5/θ)^τ]. ⇒ (5/θ)^τ = -ln(0.8) = 0.2231. 0.6 = F(9) = 1 - exp[-(9/θ)^τ]. ⇒ (9/θ)^τ = -ln(0.4) = 0.9163. Dividing the two equations: (9/5)^τ = 4.107. ⇒ τ = ln(4.107)/ln(1.8) = 2.40. θ = 9/0.9163^(1/2.40) = 9.33. S(8) = exp[-(8/9.33)^2.40] = 0.50.

9.55. The observed mean is: 667,754/100 = 6677.54. The observed second moment is: (sum of squared loss sizes)/(number of losses) = 20,354,674,039/100 = 203,546,740.39. The first moment of a Pareto is: θ/(α-1), and the second moment of a Pareto is: 2θ²/{(α-1)(α-2)}. Matching first moments: θ/(α-1) = 6677.54. Matching second moments: 2θ²/{(α-1)(α-2)} = 203,546,740.39. Dividing the second equation by the square of the first equation: 2(α-1)/(α-2) = 203,546,740.39/6677.54² = 4.5649. Solving, α = 2(1 - 4.5649)/(2 - 4.5649) = 2.780. Then, θ = 6677.54(2.780 - 1) = 11,886. Comment: In this case, when it comes to the first two moments, we have enough information to proceed in exactly the same manner as if we had ungrouped data.
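Here is a short Python check of 9.55; it is a sketch of mine rather than part of the original solution, and it uses only the summary statistics quoted in the problem.

```python
# 9.55: Pareto fit by matching the first two moments of the 100 losses.
m1 = 667_754 / 100
m2 = 20_354_674_039 / 100

r = m2 / m1**2                   # = 2(alpha - 1)/(alpha - 2)
alpha = 2 * (r - 1) / (r - 2)    # solve for alpha
theta = m1 * (alpha - 1)
print(round(alpha, 3), round(theta))   # about 2.780 and about 11,885 (11,886 if alpha is rounded first)
```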
9.56. A. The first moment is: (4 + 5 + 21 + 99 + 421)/5 = 110. The second moment is: (4² + 5² + 21² + 99² + 421²)/5 = 37,504.8. Match the first two moments: θ/(α−1) = 110 and 2θ²/((α−1)(α−2)) = 37,504.8. Solving, α = 2(37,504.8/110² - 1)/(37,504.8/110² - 2) = 3.819. θ = (3.819 - 1)(110) = 310.1. 0.95 = 1 - (1 + x/θ)^(−α). Therefore, x = θ(0.05^(−1/α) - 1) = 369. Comment: VaRp(X) = θ{(1 - p)^(−1/α) - 1}.

9.57. E. Since we want to estimate θ for Year 3, inflate all of the data to the cost level of Year 3: 10000(1.1²) = 12,100. 12500(1.1) = 13,750. The total inflated losses are: (100)(12,100) + (200)(13,750) = 3,960,000. Average claim size = 3,960,000/(100 + 200) = 13,200. Set the observed and theoretical means equal: θ/(α-1) = 13,200. θ = (3 - 1)(13,200) = 26,400. Alternately, in Year 1, set θ/(α-1) = 10,000. ⇒ θ = 20,000. In Year 2, set θ/(α-1) = 12,500. ⇒ θ = 25,000. Inflate up to Year 3: (20,000)(1.1²) = 24,200, and (25,000)(1.1) = 27,500. Weight the two inflated values by the number of claims in each year: {(100)(24,200) + (200)(27,500)}/(100 + 200) = 26,400.
9.58. E. Mean of the mixed distribution is: (1)(p) + (10)(1-p) = 10 - 9p. Second moment of the mixed distribution is: {(2)(1²)}(p) + {(2)(10²)}(1-p) = 200 - 198p. Variance of the mixed distribution is: 200 - 198p - (10 - 9p)² = 100 - 18p - 81p². Setting the variance of the mixture equal to 2² = 4: 81p² + 18p - 96 = 0. p = {-18 ± √(18² + (4)(96)(81))}/{(2)(81)} = -0.1111 ± 1.0943 = -1.2054 or 0.9832. p = 0.9832, since p is stated to be positive. Comment: The moments of a mixed distribution are a weighted average of the moments of the individual distributions. An Exponential Distribution has mean θ and second moment 2θ². Usually one would not apply the method of moments to a one parameter situation by matching the standard deviation or variance. Rather, usually one would match the mean, but one is not given enough information to do so in this case.

9.59. D. Solve for the two parameters, θ and δ, by matching the first two moments.
E[X] = ∫_δ^∞ x f(x) dx = ∫_δ^∞ x e^(-(x-δ)/θ)/θ dx = ∫₀^∞ (y + δ) e^(-y/θ)/θ dy = θ + δ.
E[X²] = ∫_δ^∞ x² f(x) dx = ∫_δ^∞ x² e^(-(x-δ)/θ)/θ dx = ∫₀^∞ (y + δ)² e^(-y/θ)/θ dy = ∫₀^∞ (y² + 2yδ + δ²) e^(-y/θ)/θ dy =
2θ² + 2θδ + δ². observed first moment = 100/10 = 10 = theoretical 1st moment = θ + δ. observed 2nd moment = 1306/10 = 130.6 = theoretical 2nd moment = 2θ² + 2θδ + δ². Therefore, θ = 10 - δ. 2(10 - δ)² + 2(10 - δ)δ + δ² = 130.6. δ² - 20δ + 69.4 = 0. ⇒ δ = {20 ± √(400 - (4)(69.4))}/2 = 4.5 or 15.5.
15.5 is inappropriate since we observe claim amounts < 15.5. Thus δ = 4.5. Comment: θ = 5.5. If y = x - δ, y follows an Exponential Distribution. θ = E[Y] = E[X] - δ. 2θ2 = E[Y2 ] = E[(X-δ)2 ] = E[X2 ] - 2δE[X] + δ2 = E[X2 ] - 2δ(δ+θ) + δ2 = E[X2 ] - 2δθ - δ2. ⇒ E[X2 ] = 2θ2 + 2θδ + δ2. The shifted Exponential is a special case of the shifted Gamma. See the “Translated Gamma Distribution”, page 388 of Actuarial Mathematics, not on the Syllabus.
9.60. D. The observed mean is: 3860/10 = 386. The second moment is: 4,574,802/10 = 457,480.2. With 2 parameters, the method of moments consists of matching the first 2 moments. Thus set exp[µ + 0.5σ²] = 386 and exp[2µ + 2σ²] = 457,480.2. Divide the square of the 1st equation into the 2nd equation: exp[2µ + 2σ²]/exp[2µ + σ²] = exp[σ²] = 457,480.2/386² = 3.070. ⇒ σ = 1.059. ⇒ µ = 5.395.
E[X ∧ x] = exp(µ + σ²/2) Φ[(ln x − µ − σ²)/σ] + x{1 − Φ[(ln x − µ)/σ]}.
E[X ∧ 500] = exp(5.395 + 1.059²/2) Φ[(ln 500 − 5.395 − 1.059²)/1.059] + 500{1 - Φ[(ln 500 − 5.395)/1.059]} = 386 Φ[-0.29] + (500){1 - Φ[0.77]} = (386)(1 - 0.6141) + (500)(1 - 0.7794) = 259.

9.61. D. This is a 50-50 mixture of two Exponentials, with means θ and σ. E[X] = (θ + σ)/2. E[X²] = mixture of the 2nd moments = (2θ² + 2σ²)/2 = θ² + σ². Set (θ + σ)/2 = 150/10 = 15, and θ² + σ² = 5000/10 = 500.
⇒ σ = 30 - θ. ⇒ 2θ2 - 60θ + 400 = 0. ⇒ θ = 10 or 20. However, we want θ > σ. ⇒ θ = 20 and σ = 10. Comment: A two point mixture of Exponentials is the most common mixture; you should be able to recognize and work with it, even when unusual letters such as σ are used for a mean.
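A brief Python check of 9.61 follows; it is a sketch of mine, not part of the original solution, and simply solves the quadratic that results from matching the first two moments of the 50-50 mixture.

```python
import math

m1 = 150 / 10     # observed mean = (theta + sigma)/2
m2 = 5000 / 10    # observed second moment = theta^2 + sigma^2

# With sigma = 2*m1 - theta, theta^2 + (2*m1 - theta)^2 = m2 becomes the quadratic
# 2*theta^2 - 4*m1*theta + (4*m1^2 - m2) = 0.
a, b, c = 2.0, -4.0 * m1, 4.0 * m1**2 - m2
disc = math.sqrt(b * b - 4 * a * c)
roots = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
print(roots)      # [10.0, 20.0]; taking theta > sigma gives theta = 20 and sigma = 10
```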
9.62. C. For an Inverse Pareto, E[1/X] = θ−1 / (τ - 1), and E[1/X2 ] = 2θ−2 / {(τ - 1)(τ - 2)}. The negative first moment for the data is: {1/15 + 1/45 + 1/140 + 1/250 + 1/560 + 1/1340}/6 = 0.017093. The negative second moment for the data is: {1/152 + 1/452 + 1/1402 + 1/2502 + 1/5602 + 1/13402 }/6 = 0.0008348. Matching moments, results in two equations in two unknowns: θ−1/(τ - 1) = 0.017093.
2θ−2 / {(τ - 1)(τ - 2)} = 0.0008348.
Divide the second equation by the square of the first equation: 2(τ - 1)/(τ - 2) = 0.0008348/0.0170932 = 2.857. ⇒ τ = 4.334.
⇒ θ = 1 / {(.017093)(4.334 - 1)} = 17.55. Alternately, if X follows an Inverse Pareto with parameters τ and θ, then 1/X follows a Pareto with parameters τ and 1/θ. So we can match the corresponding first and second moments for the Pareto: θ/(α-1) = 0.017093, and 2θ2/{(α-1)(α-2)} = 0.0008348.
⇒ α = 4.334 and θ = 0.05699. Translating back to the parameters of the Inverse Pareto: τ = 4.334 and θ = 1/0.05699 = 17.55. 9.63. D. For an Inverse Gamma, E[1/X] = θ−1Γ(α+1)/Γ(α) = θ−1α, and E[1/X2 ] = θ−2Γ(α+2)/Γ(α) = θ−2α(α + 1). Matching moments, results in two equations in two unknowns: θ−1α = .017093.
θ−2α(α + 1) = .0008348.
Divide the second equation by the square of the first equation: (α + 1)/α = .0008348/.0170932 = 2.857. ⇒ α = .539. ⇒ θ = .539/.017093 = 31.5. Alternately, if X follows an Inverse Gamma with parameters α and θ, then 1/X follows a Gamma with parameters α and 1/θ. So we can match the corresponding first and second moments for the Gamma: θα = .017093, and θ2α(α+1) = .0008348.
⇒ (α+1)/α = 2.857. ⇒ α = .539. ⇒ θ = .017093/.539 = .0317. Translating back to the parameters of the Inverse Gamma: α = .539 and θ = 1/.0317 = 31.5.
9.64. E. X = 1300/4 = 325 = αθ. E[X2 ] = (2002 + 3002 + 3502 + 4502 )/4 = 113,750 = α(α + 1)θ2. Dividing the 2nd equation by the square of the first equation: 1 + 1/α = 1.0769. ⇒ α = 13.0. Alternately, one can match the variances: αθ2 = 113,750 - 3252 = 8125.
⇒ θ = 8125/325 = 25. ⇒ α = 325/25 = 13.0. 9.65. C. E[X] = (130 + 20 + 350 + 218 + 1822)/5 = 508 = θ/(α-1). E[X2 ] = (1302 + 202 + 3502 + 2182 + 18222 )/5 = 701401.6 = 2θ2/{(α-1)(α-2)}. Dividing the second equation by the the square of the first equation: 2(α-1)/(α-2) = 2.718. ⇒ α = 4.786. ⇒ θ = 1923. E[X ∧ x] = {θ/(α−1)}{1 - (θ/(θ+x))α−1}. E[X ∧ 500] = {1923/(4.786 - 1)}{1 - (1923/(1923 + 500))4.786-1} = 296. 9.66. B. Put all the losses on a common level, by inflating them all to year 3. (20)(1.12 ) = 24.2, (40)(1.12 ) = 48.4, (50)(1.12 ) = 60.5, (30)(1.1) = 33, (40)(1.1) = 44, (90)(1.1) = 99, (120)(1.1) = 132. Mean of the inflated losses is: 441.1/7 = 63.01. Second moment of the inflated losses is: 36838.45/7 = 5265.6. Match the first two moments:
exp[µ + σ2/2] = 63.01.
exp[2µ + 2σ2] = 5265.6.
Divide the second equation by the square of the first: exp[σ2] = 1.326. ⇒ σ = .531.
⇒ µ = ln(63.01) - .5312 /2 = 4.00. Comment: Since we have inflated all of the losses to the year 3 level, the resulting estimate of µ is estimate of the mu parameter for year 3, what they have called µ3 . 9.67. E. Since x ≥ 10, we need to take θ = 10. X = (10 + 12 + 14 + 18 + 21 + 25)/6 = 16.667. X = θα/(α-1). ⇒ 16.667 = 10 α/(α-1). ⇒ α = 2.5. Comment: The Single Parameter Pareto distribution has support x > θ, and is set up to work with losses of size greater than a certain value. Since x > θ, it was not correct to have x ≥ 10 and a loss of size 10 in this exam question.
9.68. E. By integrating, F(x) = 1 - e^(-(x-δ)/θ), δ < x < ∞. Solve for the median: 0.5 = 1 - e^(-(x-δ)/θ). median = θ ln2 + δ. If x follows a shifted exponential distribution then x - δ follows an Exponential Distribution. Therefore E[X - δ] = θ. ⇒ E[X] = θ + δ. Set 300 = θ + δ, and 240 = θ ln2 + δ. ⇒ θ = 60/(1 - ln2) = 195.5. ⇒ δ = 300 - 195.5 = 104.5. Comment: Matching one percentile and matching the first moment. Sort of a combination of percentile matching and the method of moments. One can compute the mean of the distribution via integration, with y = x - δ:
∫_δ^∞ x f(x) dx = ∫_δ^∞ x e^(-(x - δ)/θ)/θ dx = ∫₀^∞ (y + δ) e^(-y/θ)/θ dy = ∫₀^∞ y e^(-y/θ)/θ dy + δ ∫₀^∞ e^(-y/θ)/θ dy
= θ + δ, where the first of the final two integrals is the mean of an Exponential Distribution. 9.69. C. First moment = (2,000 + 17,000 + 271,000 + 10,000)/4 = 75,000. Second moment = (2,0002 + 17,0002 + 271,0002 + 10,0002 )/4 = 18,458,500. Set the moments of the Pareto equal to that of the data: 75,000 = θ/(α-1). 18,458,500 = 2θ2/{(α-1)(α-2)}. Divide the second equation by the square of the first equation: 3.2815 = 2(α-1)/(α-2).
⇒ α = 3.561. ⇒ θ = 192.1. 9.70. B. For the LogNormal Distribution the mean is exp[µ + 0.5 σ2], while the second moment is exp[2µ + 2σ2]. With 2 parameters, the method of moments consists of matching the first 2 moments. Thus set exp[µ +.5 σ2] = 1.8682, and exp[2µ + 2σ2] = 4.4817. Divide the square of the 1st equation into the 2nd equation: exp[2µ + 2σ2] / exp[2µ + σ2] = exp[σ2] = 4.4817 / 1.86822 = 1.2841. ⇒ σ = 0.5. ⇒ µ = 0.5.
9.71. C. E[X] = 8. E[X2 ] = (22 + 102 + 122 + 82 + 82 )/5 = 75.2. Match the mean and variance. Set αθ = 8, and αθ2 = 75.2 - 82 = 11.2.
⇒ θ = 1.4. ⇒ α = 8/1.4 = 5.714. Alternately, match the first and second moments: αθ = 8, and α(α+1)θ2 = 75.2. Solving, θ = 1.4, and α = 5.714. 9.72. E. X = (125 + 250 + 300 + 425 + 500) / 5 = 320. The mean of a Single Parameter Pareto Distribution is: θ α / (α-1). Since there is one parameter, theta is given as 100, we match the first moments: 320 = θ α / (α-1) = 100 α / (α-1). ⇒ α = 1.45. Comment: Unusual language to give you the value of theta. Whenever we deal with the single-parameter Pareto distribution, we assume theta is known. Theta should be given and then we can fit the single parameter alpha via either method of moments or maximum likelihood (they are not equal.) Carefully match the exact name of the given distribution with what is shown in the Appendix.
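The two Single Parameter Pareto fits above (9.67 and 9.72) reduce to one line of algebra; the following sketch, which is mine and not part of the original solutions, checks both: with θ known, matching the mean θα/(α - 1) to the sample mean gives α = X/(X - θ).

```python
def sp_pareto_alpha(xbar, theta):
    # Single Parameter Pareto with theta known: method of moments alpha.
    return xbar / (xbar - theta)

print(round(sp_pareto_alpha(sum([10, 12, 14, 18, 21, 25]) / 6, 10), 2))    # 2.5  (9.67)
print(round(sp_pareto_alpha(sum([125, 250, 300, 425, 500]) / 5, 100), 2))  # 1.45 (9.72)
```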
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 214
Section 10, Fitting to Ungrouped Data by Maximum Likelihood For ungrouped data {x1 , x2 , ... , xn } define: Likelihood = Π f(xi)
Loglikelihood = Σ ln f(xi)
In order to fit a chosen type of size of loss distribution by maximum likelihood, you maximize the likelihood or equivalently maximize the loglikelihood. In other words, for ungrouped data you find the set of parameters such that either Π f(xi) or Σ ln f(xi) is maximized.
For single parameter distributions one can usually solve for the parameter value by taking the derivative and setting it equal to zero. For example, take f(x) = α 7^α / (7 + x)^(α + 1).86 Then, ln f(x) = ln(α) + α ln(7) - (α+1) ln(7 + x). If you observe five losses of size: 1, 3, 6, 10, 25, then the loglikelihood = ln f(1) + ln f(3) + ln f(6) + ln f(10) + ln f(25), and is a function of α:
[Graph: the loglikelihood as a function of α, for α between about 1.2 and 2; it rises to a maximum of roughly -16.5 near α = 1.4 and declines on either side, down toward -16.75.]
For α = 2, ln f(1) = ln(2) + 2 ln(7) - (3) ln(8) = -1.65. For α = 2, loglikelihood = -1.65 - 2.32 - 3.11 - 3.91 - 5.81 = -16.8. Graphically, the maximum likelihood corresponds to α ≅ 1.4.
86 This is a Pareto Distribution, but with θ = 7 fixed, leaving α as the sole parameter.
2
Alpha
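The example above can be reproduced with a few lines of Python; this is a sketch of mine, not part of the Study Guide, using the five losses and the θ = 7 Pareto density just defined.

```python
import math

losses = [1, 3, 6, 10, 25]

def loglikelihood(alpha):
    # f(x) = alpha * 7^alpha / (7 + x)^(alpha + 1), theta = 7 fixed
    return sum(math.log(alpha) + alpha * math.log(7) - (alpha + 1) * math.log(7 + x)
               for x in losses)

print(round(loglikelihood(2.0), 1))          # about -16.8, as computed above

# Closed-form maximum likelihood alpha for this one-parameter case:
alpha_mle = len(losses) / sum(math.log((7 + x) / 7) for x in losses)
print(round(alpha_mle, 3))                   # about 1.422
print(round(loglikelihood(alpha_mle), 2))    # about -16.49, the peak of the graph above
```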
Here is how to algebraically find the value of α which maximizes the loglikelihood. ln f(x) = ln(α) + α ln(7) − (α+1) ln(7 + x). Σ ln f(xi) = Σ {ln(α) + α ln(7) − (α+1) ln(7 + xi)}. The derivative with respect to α is: Σ {(1/α) + ln(7) - ln(7 + xi)} = 5/α - Σ ln{(7 + xi)/7}. Setting this derivative equal to zero: 0 = (5/α) − Σ ln{(7 + xi)/7}. Solving for alpha: α = 5 / Σ ln{(7 + xi)/7} = 5 / {ln(8/7) + ln(10/7) + ln(13/7) + ln(17/7) + ln(32/7)} = 5 / {0.1335 + 0.3567 + 0.6190 + 0.8873 + 1.5198} = 1.422. 1.422 is the Maximum Likelihood α.
Exercise: For an Inverse Exponential Distribution, determine the maximum likelihood θ, fit to: 1, 3, 6, 10, 25.
[Solution: f(x) = θ e^(-θ/x)/x². ln f(x) = ln(θ) - θ/x - 2 ln(x). ∂ ln f(x)/∂θ = 1/θ - 1/x.
0 = Σ ∂ ln f(xi)/∂θ = n/θ - Σ 1/xi.
θ = n / Σ(1/xi) = 5/(1/1 + 1/3 + 1/6 + 1/10 + 1/25) = 5/1.64 = 3.049.]
Exercise: For the Gamma Distribution with α fixed, write down the equation that needs to be solved in order to maximize the likelihood.
[Solution: ln f(x) = -α lnθ + (α−1) ln x - (x/θ) - ln(Γ(α)). Setting equal to zero the partial derivative of Σ ln f(xi) with respect to theta, one obtains: θ = {(1/n)Σxi}/α. This is the same equation as for the Method of Moments for the Gamma Distribution with α fixed. Note that for α = 1 fixed, one would have an Exponential Distribution.]
Exercise: You observe ten claims of size: 10, 20, 30, 50, 70, 90, 120, 150, 200, 250. Fit a Gamma Distribution with α = 4 fixed, to this data using the method of Maximum Likelihood.
[Solution: θ = {(1/n)Σxi}/α = (990/10)/4 = 24.75.]
For distributions with more than one parameter, one is still able to set all the partial derivatives equal to zero, but is unlikely to be able to solve for the parameters in closed form.
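The two closed-form fits in these exercises are easy to verify numerically; the following sketch is mine, not the Study Guide's, and uses only the data given in the exercises.

```python
losses = [1, 3, 6, 10, 25]

# Inverse Exponential: theta = n / sum(1/x).
theta_inv_expon = len(losses) / sum(1.0 / x for x in losses)
print(round(theta_inv_expon, 3))    # 3.049

# Gamma with alpha fixed: theta = (sample mean) / alpha, the same as method of moments.
claims = [10, 20, 30, 50, 70, 90, 120, 150, 200, 250]
alpha = 4
theta_gamma = (sum(claims) / len(claims)) / alpha
print(theta_gamma)                  # 24.75
```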
Exercise: For the Gamma Distribution write down two equations that need to be solved in order to maximize the loglikelihood. ψ(α) is the digamma function, d ln(Γ(α))/dα = ψ(α).
[Solution: f(x) = θ^(−α) x^(α−1) e^(−x/θ) / Γ(α). ln f(x) = -α lnθ + (α−1) ln x - (x/θ) - ln(Γ(α)).
∂ Σ ln f(xi)/∂α = -n lnθ + Σ ln xi - n ψ(α) = 0. ⇒ ψ(α) = (1/n)Σ ln xi - lnθ.
∂ Σ ln f(xi)/∂θ = -nα/θ + Σxi/θ² = 0. ⇒ θ = (1/n)Σxi / α. ⇒ lnθ = ln[(1/n)Σxi] - ln(α).
Substituting the second equation into the first ⇒ ψ(α) - ln α = (1/n)Σ ln xi - ln[(1/n)Σxi]. ]
For most distributions with more than one parameter, one maximizes the loglikelihood by standard numerical algorithms,87 rather than by solving the equations in closed form.88 Quite often one uses percentile matching or the method of moments to obtain a good starting point for such a numerical algorithm. In these cases, one should still be able to calculate the likelihood or loglikelihood for a given set of parameters.
Exercise: For a Gamma Distribution with α = 3 and θ = 10, what is the loglikelihood for the following set of data: 20, 30, 40?
[Solution: ln f(x) = -α lnθ + (α-1) ln x - (x/θ) - ln[Γ(α)] = -3 ln(10) + 2 ln(x) - x/10 - ln2. ln f(20) = -3.609. ln f(30) = -3.799. ln f(40) = -4.223. loglikelihood = -3.609 - 3.799 - 4.223 = -11.631.]
Given a list of sets of values for the parameters, one could be asked to see which set produces the largest loglikelihood.89
Pareto Distribution Fit to the Ungrouped Data in Section 2:
For the Pareto Distribution: f(x) = α θ^α / (θ + x)^(α + 1), and ln[f(x)] = ln(α) + α ln(θ) - (α+1) ln(θ + x).
For a particular pair of values of alpha and theta one can compute ln[f(x)] for each observed size of claim and add up the results. For alpha = 1.5 and theta = 100,000, ones gets ln[f(x)] = ln(α) + αln(θ) - (α+1)ln(θ + x) = 17.68 - (2.5) ln(100000 + x). 87
Many commercial software packages will maximize or minimize functions. As discussed in Appendix C of the first edition of Loss Models, one can use the Nelder-Mead simplex algorithm to minimize the negative loglikelihood. Having been given a starting value, this algorithm searches n-dimensional space iteratively finding points where the function of interest is smaller and smaller. 88 The lack of access to a computer, restricts the variety of possible exam questions. 89 For example, five sets of values for the parameters could be given in an exam question.
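The following sketch illustrates the numerical approach described above, minimizing the negative loglikelihood of a two-parameter Pareto with the Nelder-Mead simplex algorithm. It is my own illustration: the losses below are made-up values standing in for a data set (the Section 2 data are not reproduced here), and the starting point plays the role of a method of moments or percentile matching first guess.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical ungrouped losses, for illustration only.
losses = np.array([2300., 8100., 15600., 41000., 77500., 160200., 390000., 1050000.])

def negative_loglikelihood(params):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    # Pareto density: f(x) = alpha * theta^alpha / (theta + x)^(alpha + 1)
    return -np.sum(np.log(alpha) + alpha * np.log(theta)
                   - (alpha + 1) * np.log(theta + losses))

start = np.array([1.5, 100000.])    # starting values for the search
fit = minimize(negative_loglikelihood, start, method="Nelder-Mead")
alpha_hat, theta_hat = fit.x
print(alpha_hat, theta_hat, -fit.fun)   # fitted parameters and the maximized loglikelihood
```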
For five of the loss sizes in Section 2, this is:

x		ln[f(x)] for Pareto with alpha = 1.5 and theta = 100,000
300		-11.11
37,300		-11.90
86,600		-12.67
150,300		-13.40
423,200		-15.24
Exercise: For the data in Section 2, compute the log likelihood for a Pareto distribution with alpha = 1.5 and theta = 100,000. [Solution: Computing ln[f(x)] for each of the 130 losses and summing one gets a log likelihood of -1762.25.] Note that once the type of curve (Pareto) and the data set (Section 2) have been selected, the loglikelihood is a function of the parameters. Here is a chart of the log likelihood for the data in Section 2, for various values of the Pareto parameters alpha and theta: Theta (100 thousand)
1.5
1.0 1.5 2.0 2.5 3.0 3.5 4.0
-1762.25
Alpha 1.6
1.7
1.8
1.9
-1766.59
-1771.44
-1776.74
-1782.45
-1750.55
-1752.32
-1754.59
-1757.32
-1760.44
-1748.03
-1748.19
-1748.87
-1749.00
-1751.53
-1749.30
-1748.36
-1747.93
-1747.94
-1748.36
-1752.38
-1750.61
-1749.35
-1748.54
-1748.14
-1756.36
-1753.95
-1752.05
-1750.60
-1749.55
-1760.79
-1757.87
-1755.45
-1753.49
-1751.92
As seen in the above table and the following graph, the values of alpha and theta which maximize the loglikelihood are close to α =1.7, θ = 250,000.90 Small differences in loglikelihood are significant; over this whole chart the loglikelihood only varies by 2%. For values of alpha and theta near these values, the loglikelihood is near the maximum.
90
A more exact computation yields α =1.702 , θ = 240,151, with loglikelihood of - 1747.87.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 218 Here is a graph of this loglikelihood as a function of the parameters α and θ, with the maximum loglikelihood marked with a dot:
As shown below, for values along the straight line corresponding to an estimated mean of θ/(α-1) which is close to 340 thousand, the loglikelihood is large.

Pareto Estimated Mean ($ thousand)
Theta (100 thousand)	Alpha 1.5	1.6	1.7	1.8	1.9
1.0			200		167	143	125	111
1.5			300		250	214	188	167
2.0			400		333	286	250	222
2.5			500		417	357	312	278
3.0			600		500	429	375	333
3.5			700		583	500	438	389
4.0			800		667	571	500	444
This example has some features which are quite common when fitting curves by maximum likelihood. The loglikelihood surface in three dimensions has a ridge. The slope along that ridge is shallow; values of the loglikelihood along that ridge do not vary quickly. Thus there is a considerable uncertainty around the fitted set of parameters. This will be discussed further in later sections.
Distributions Fit to the Ungrouped Data in Section 2:
Parameters of curves fit to the ungrouped data in Section 2 by the method of Maximum Likelihood:91

Distribution		Parameters Fit via Maximum Likelihood to Ungrouped Data in Section 2
Exponential		θ = 312,675
Pareto			α = 1.702, θ = 240,151
Weibull			θ = 231,158, τ = 0.6885
Gamma			α = 0.5825, θ = 536,801
LogNormal		µ = 11.5875, σ = 1.60326
Inverse Gaussian	µ = 312,675, θ = 15,226
Transformed Gamma	α = 4.8365, θ = 816, τ = 0.30089
Generalized Pareto	α = 1.7700, θ = 272,220, τ = 0.94909
Burr			α = 1.8499, θ = 272,939, γ = 0.97036
LogLogistic		γ = 1.147, θ = 115,737
ParaLogistic		α = 1.125, θ = 134,845
The means, coefficients of variation and skewness are as follows:

		Ungrouped Data	Maximum Likelihood Fitted Curves
				Expon.	Weibull	Gamma	TGam	Pareto	Burr	InvGaus	GenPar	LogNorm
Mean ($000)	313		313	298	313	302	342	334	313	336	389
Coef. Var.	2.01		1.00	1.49	1.31	1.85	N.D.	N.D.	4.53	N.D.	3.47
Skewness	4.83		2.00	3.60	2.62	6.80	N.D.	N.D.	13.59	N.D.	52.3
Some of these distributions fail to have a finite variance and skewness. It is not uncommon for higher moments of fitted size of loss distributions to fail to exist (as finite quantities.) In contrast, any actual finite sample of claims has finite calculated moments. However, samples taken from distributions without finite variance, will tend to have large estimated coefficients of variation. Distributions with infinite skewness or even infinite variance are used in many actual applications, but one should always do so with appropriate caution.
91
Note that in the case of the LogNormal, the Maximum Likelihood fit is equivalent to fitting a Normal Distribution to the log claim sizes via the Method of Moments; µ = the mean of the log claim sizes, while σ is the standard deviation of the log claim sizes.
The survival functions for the data (thick), Gamma Distribution, and Transformed Gamma Distribution, fit by maximum likelihood, are shown below:92
[Graph: S(x) versus x in millions, for x from about 1 to 3 million, with both axes on log scales, comparing the empirical survival function with the fitted Gamma and Transformed Gamma.]
The tail of the Gamma is too light.93 The Transformed Gamma is closer to the data. The survival functions for the data (thick), Pareto Distribution, and LogNormal Distribution: S(x) 0.100 0.050
0.020 0.010
Pareto
0.005 LogNormal 0.002 1.0 92 93
1.5
2.0
3.0
Note that both axes are on a log scale. The tails of the Exponential and Weibull are also too light.
5.0
7.0
10.0
x (million)
Both the Pareto and the LogNormal do a much better job of fitting the data than the Gamma, with the Pareto somewhat better than the LogNormal.

Comparing Loglikelihoods:
The values of the loglikelihood can be used to compare the maximum likelihood curves:

Distribution		Negative Loglikelihood	# Pars.
Generalized Pareto	1747.82			3
Burr			1747.85			3
Pareto			1747.87			2
ParaLogistic		1748.71			2
Transformed Gamma	1748.98			3
LogLogistic		1749.20			2
LogNormal		1752.20			2
Weibull			1753.04			2
Gamma			1759.05			2
Exponential		1774.88			1
Inverse Gaussian	1808.02			2
Of the three parameter distributions, the one with the best loglikelihood is the Generalized Pareto. Of the two parameter distributions, the one with the best loglikelihood is the Pareto. Note how the small difference in loglikelihood from about -1748 to -1759 takes you from a distribution that fits well to one that doesnʼt. Small differences in the loglikelihood are important. Also note the closeness of the values for the first three distributions. The Burr and Generalized Pareto each have the Pareto as a special case.94 Adding additional parameters always allows one to fit the data better,95 but one should only add parameters when they provide a significant gain in accuracy. “The principle of parsimony” states that no more causes should be assumed than will account for the effect.96 97 As applied here, the principle of parsimony, states that one should use the minimum number of parameters that get the job done.98 A simpler model has a number of advantages, including: may smooth irregularities in the data, is more likely to apply over time or in similar situations, each value may be more accurately estimated. A more complex model will closely fit the observed data. 94
In this case the fitted Burr and Generalized Pareto curves are both very close to a Pareto. For example, the Gamma is a special case of the Transformed Gamma, with τ = 1. Therefore, the Maximum Likelihood Gamma is one of the Transformed Gammas we will look at. Therefore Maximum Likelihood over all Transformed Gammas is ≥ that of the Maximum Likelihood Gamma. 96 This principle is also referred to as Occam's Razor. 97 "Everything should be made as simple as possible, but not one bit simpler," Albert Einstein. 98 For example, one can always fit data better with a quadratic curve rather than a linear curve. One can fit exactly any 10 points with a 9th degree polynomial. However, that does not imply that it would be preferable to use the fitted 9th degree polynomial. 95
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 222 For example, an Exponential Distribution with one parameter is simpler than its generalization the Gamma Distribution with two parameters. A Gamma Distribution will always fit the data better. However, it may be that the Gamma is just picking up peculiarities of the data set, which will not be reproduced in the future. One must balance goodness of fit versus simplicity. In this case, the 2 parameter Pareto Distribution is preferable to either of the 3 parameter distributions: the Burr or the Generalized Pareto. The loglikelihood could be used to decide whether the 3 parameter Burr or Generalized Pareto curves provide a statistically significant better fit compared to the 2 parameter special case the Pareto. Taking twice the difference of the loglikelihoods, or twice the log of the ratio of likelihoods, we get 0.04 or 0.10. Comparing to the Chi-Square distribution for 1 degree of freedom we find that the difference is insignificant. Similarly, twice the difference between the loglikelihood of the Transformed Gamma (3 parameters) and the Gamma (2 parameters) is: (2) (10) = 20, which is significant. As will be discussed, the Likelihood Ratio Test can be used to determine whether such an improvement in loglikelihood is statistically significant.99 Based on the loglikelihood the Pareto fits a little better than the Transformed Gamma; the Pareto has fewer parameters and a better loglikelihood then the Transformed Gamma. You should compare the graphs of the distributions versus the ungrouped data in Section 2 to verify that the Pareto and Transformed Gamma fit well, while the Gamma does not fit well. One can also compare various statistics such as Kolmogorov-Smirnov Statistic as is done in a subsequent section. In addition, comparing the mean excess losses and/or the Limited Expected Values provide useful information on the fit of the curves as shown subsequently.
99
The Likelihood Ratio Test will be discussed in a subsequent section.
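The comparison of twice the difference in loglikelihoods to a Chi-Square distribution can be checked with a few lines of Python; this is a sketch of mine (the formal Likelihood Ratio Test is covered in a later section), using the fitted loglikelihoods tabulated above.

```python
from scipy.stats import chi2

loglik_pareto = -1747.87         # 2-parameter special case
loglik_burr = -1747.85           # 3-parameter generalization of the Pareto
loglik_gamma = -1759.05          # 2-parameter special case
loglik_trans_gamma = -1748.98    # 3-parameter generalization of the Gamma

comparisons = [("Burr vs. Pareto", loglik_pareto, loglik_burr),
               ("Transformed Gamma vs. Gamma", loglik_gamma, loglik_trans_gamma)]

for name, restricted, full in comparisons:
    statistic = 2 * (full - restricted)     # twice the gain from the extra parameter
    p_value = chi2.sf(statistic, df=1)      # compare to Chi-Square with 1 degree of freedom
    print(name, round(statistic, 2), round(p_value, 4))
# Burr vs. Pareto: statistic 0.04, clearly insignificant.
# Transformed Gamma vs. Gamma: statistic about 20, highly significant.
```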
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 223 Limited Expected Values, Comparison of Empirical and Fitted: Using the formulas in Appendix A of Loss Models, the values of the Limited Expected Value for various limits for the distributions fit by maximum likelihood to the ungrouped data in Section 2 are:100
Limited Expected Value ($000) Distribution
10 K
100 K
1m
2.5 m
5m
10 m
Data
9.7
75.0
236.2
283.7
312.7
312.7
Pareto
9.7
74.2
234.1
280.3
302.9
317.7
Weibull
9.3
72.5
258.4
293.1
297.3
297.6
Gamma
9.3
74.4
280.8
311.4
312.9
312.9
Trans. Gamma
9.6
73.1
245.4
288.5
300.5
303.5
Gen. Pareto
9.6
73.9
235.0
280.4
302.0
315.4
Burr
9.6
73.9
235.0
280.5
301.9
315.2
LogNormal
9.7
71.2
244.5
312.1
348.3
370.7
Data excluding largest claim Data duplicating largest claim
9.7
74.8
230.2
266.5
277.9
277.9
9.7
75.2
242.0
300.6
346.9
346.9
One can usefully compare the fitted versus observed Limited Expected Values, in order to check the goodness of fit of various distributions. The Pareto does a good job of matching the observed Limited Expected Values. (For the ungrouped data, the Generalized Pareto and Burr fitted curves are very close to the fitted Pareto curve and thus also fit well.) The Transformed Gamma seems next best, followed by the LogNormal. The Gamma and Weibull donʼt seem to match well. It should be noted that for a small data set the observed Limited Expected Values are subject to considerable fluctuation at higher limits. This is illustrated by computing Limited Expected Values with the largest loss either eliminated from the ungrouped data set or duplicated. Given the large difference that results at higher limits, one should be cautious about rejecting a curve based on comparing to the observed Limited Expected Values at the upper limits. (In fact one of the reasons for fitting curve is because at higher limits the data is thinner.)
The parameters of these fitted distributions are: Exponential: θ = 312,675; Pareto: α = 1.702 and θ = 240,151; Weibull: θ = 231,158 and τ = 0.6885; Gamma: α = 0.5825 and θ = 536,801; Transformed Gamma: α = 4.8365, θ = 816, and τ = 0.30089, Generalized Pareto: α = 1.7700, θ = 272,220, and τ =0.94909; Burr: α = 1.8499, θ = 272,939, and γ = 0.97036; LogNormal: µ = 11.59 and σ = 1.603. 100
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 224 Shown below are the Limited Expected Values for the data (thick), Pareto, and the LogNormal fit via Maximum Likelihood to the ungrouped data. Limited Expected Value (000)
Pareto
300
LogNormal
200 1000
1500
2000
3000
5000
Size (000)
Mean Excess Losses, Comparison of Empirical and Fitted: The values of the Mean Excess Loss for various limits for the distributions fit by maximum likelihood to the ungrouped data in Section 2 are :
Mean Excess Loss ($000) Distribution
10 K
100 K
1m
5m
10 m
Data
322.9
423.3
1421.0
Exponential
312.7
312.7
312.7
312.7
312.7
Pareto
356.5
484.7
1766.7
7464.7
14587.2
Weibull
323.4
394.8
606.5
922.0
1124.0
Gamma
341.0
393.6
477.4
518.1
527.0
Trans. Gamma
318.4
418.6
890.0
1965.4
2894.3
Gen. Pareto
351.1
472.8
1644.8
6839.6
13332.3
Burr
349.4
470.9
1621.5
6669.3
12965.7
LogNormal
408.8
614.8
1767.3
5024.4
8298.2
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 225 Below are the empirical mean excess loss (thick), the mean excess loss for the maximum likelihood Pareto and the maximum likelihood Transformed Gamma: e(x) (million)
3.0
Pareto
2.0 1.5
1.0 Transformed Gamma
0.5
1.0
2.0
x (million)
The Transformed Gamma is perhaps somewhat too light-tailed, while the Pareto is perhaps somewhat too heavy-tailed. If one were extrapolating out to large loss sizes, which distribution is used is very important. Below are compared the Mean Excess Losses estimated from the fitted Pareto, LogNormal, and Transformed Gamma, out to $20 million: e(x) (million)
25 20
Pareto
15
LogNormal
10 Transformed Gamma
5
x (million) 5
10
15
20
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 226 I have found the Mean Excess Losses (Mean Residual Lives) particularly useful at distinguishing between the tails of the different distributions when interested in using the curves to estimate Excess Ratios. Comparing the Limited Expected Values seems particularly useful when the distributions are to be used for estimating increased limit factors. Loss Elimination Ratios and Excess Ratios, Comparison of Empirical and Fitted: Here are Loss Elimination Ratios and Excess (Pure Premium) Ratios for the curves fit via maximum likelihood to the ungrouped data in Section 2: Excess Ratios
Loss Elimination Ratios
Distribution
$1 million
$2.5 million
$5 million
$10 K
$100 K
$1 million
Data
0.245
0.093
0.000
0.031
0.240
0.755
Pareto
0.316
0.181
0.115
0.028
0.217
0.684
Weibull
0.132
0.015
0.001
0.031
0.243
0.868
Gamma
0.103
0.005
0.000
0.030
0.238
0.897
Trans. Gamma
0.193
0.051
0.011
0.032
0.241
0.807
Gen. Pareto
0.300
0.164
0.100
0.029
0.220
0.700
Burr
0.297
0.161
0.097
0.029
0.221
0.703
LogNormal
0.374
0.200
0.108
0.025
0.182
0.626
Even though the empirical excess ratio at $5 million is 0, those for most of the fitted distributions are significantly positive. While the largest loss reported in our ungrouped data set of 130 losses is about $5 million, this does not imply that if the experiment were repeated we couldnʼt get a loss of $10 million, $20 million, or more. Even if very rare, large losses can still have a significant impact on the expected value of the excess ratio. The loss dollars from hurricanes are a good example. Storms of an intensity such that they occur less frequently than once in several decades can have a significant impact on the total expected loss as well as the expected value of the excess ratios.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 227 Below are the excess ratios for the data (solid), Pareto (dotted), and LogNormal (dashed): Excess Ratio 0.6 0.5 Pareto 0.4 0.3 Empirical 0.2 LogNormal 0.1 0.50
0.70
1.00
1.50
2.00
3.00
Size (million)
In our example, if the losses are actually being drawn from the Pareto distribution fit by maximum likelihood, then 10% of the expected total dollars are coming from dollars in the layer excess of $5 million! This is in spite of the fact that for the particular sample we observe no dollars in that layer. If we had added a loss of $10 million to the observed ungrouped data, then the observed excess ratio would have been about 10% for this layer. (Remember that a loss of size $10 million only contributes $5 million to the layer excess of $5 million.) For this Pareto distribution, 7.5% of the total expected losses come from losses larger than $20 million, 4% from losses greater than $50 million, 2.5% from losses greater than $100 million, and half a percent from losses greater than $1 billion! This is what is meant by a very heavy tailed distribution. If instead the losses follow the fitted Transformed Gamma, then about 1% of the total loss dollars are expected to come from the layer excess of $5 million, rather than 10% as for the fitted Pareto. The observed excess ratio would be about 1% if we added a single loss of $5.5 million to the ungrouped data set. In comparison to the Pareto, for the Transformed Gamma, 0.6% of the total expected losses come from losses larger than $20 million, and only 4 x 10-6 from losses greater than $50 million. Thus while the fitted Transformed Gamma has a heavy tail, it is not nearly as heavy as the fitted Pareto.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 228 There is no way we can distinguish on a statistical basis between these two distributions solely based on 130 observed claims. Yet they produce substantially different estimates of the expected losses in higher layers. This illustrates the difficulties actuaries have estimating high layers of loss in real world applications. Nevertheless, the techniques in Loss Models form at least the starting point from which most such estimates are made. Below are compared the Excess Ratios estimated from the fitted Pareto, LogNormal (thick), and Transformed Gamma, out to $20 million:
Excess Ratio 0.35 0.30 0.25 0.20 0.15 0.10 Pareto 0.05 Trans. Gamma
LogNormal 5
10
15
20
Size (million)
Linear Exponential Families: For Linear Exponential Families, the Methods of Maximum Likelihood and Moments produce the same result when applied to ungrouped data.101 Thus there are many cases where one can apply the method of maximum likelihood to ungrouped data by instead performing the simpler method of moments: Exponential, Poisson, Normal for fixed σ, Binomial for m fixed (including the special case of the Bernoulli), Negative Binomial for r fixed (including the special case of the Geometric), the Gamma for α fixed, and the Inverse Gaussian for θ fixed.
101
This useful fact is demonstrated in "Mahler's Guide to Conjugate Priors".
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 229 Creating Single Parameter Distributions: It is possible to reduce the number of parameters in any distribution, by assuming one or more of the parameters is a constant. This can be done with any of the distributions in Loss Models. For example, as has already been mentioned, the two parameter Gamma distribution with a shape parameter of unity is the one parameter Exponential distribution. Each of the two parameter distributions discussed above can be made a one parameter distribution by setting either parameter equal to a constant. This is not only useful in the real world applications, but it is likely to be used in exam questions.102 For example, assume we take the Pareto distribution, and set θ = 300,000. To fit via the method of Maximum Likelihood the resulting one parameter distribution to the ungrouped data in Section 2, we set the derivative of the log likelihood with respect to α, equal to zero. The log likelihood is: Σ { ln(α) + α ln(300,000) - (α+1)ln(300,000 + xi) }. The derivative with respect to α is: Σ { (1/α) + ln(300,000) - ln(3000,00 + xi) } Setting this derivative equal to zero: 0 = (n/α) - Σ ln {(300,000 +xi)/300,000} Solving for alpha: α = n / Σ ln[(300,000 +xi)/300,000] = 1.963. Note that this is not the same value obtained for α when the two parameter Pareto distribution was fit by maximum likelihood to the ungrouped data. Remember that in that case the value of both α and θ were allowed to vary freely. The fitted parameters α =1.702, θ = 240 thousand produce the maximum likelihood over all possible pairs of parameters. α = 1.963 only maximizes the likelihood for all possible values of α when θ = 300,000. Invariance of the Method of Maximum Likelihood Under Change of Variables: The Method of Maximum Likelihood is unaffected by change of variables, that are one to one and monotonic, such as: x2 , x , 1/x, ex, e-x, and ln(x). This important result often lets one reduce the method to a simpler case.
102
For example, many of the exam questions on maximum likelihood involve this idea.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 230 For example, assume one is using Maximum Likelihood to fit a Weibull Distribution with θ unknown and τ = 3 fixed. Then f(x) = 3x2 exp(-(x/θ)3 )/θ3 . The loglikelihood is: Σlnf(xi) = -θ-3Σxi3 + 2Σln(xi) -3n ln(θ) + n ln(3). Set the partial derivative with respect to θ equal to zero: 0 = 3θ-4 Σxi3 - 3n/θ. Thus θ = {Σxi3 /n}1/3. Alternately, one could use the change of variables y = x3 , which transforms this Weibull103 into an Exponential with mean θ3 . Then when applied to ungrouped data, the maximum likelihood fit to an Exponential is equal to that of the Method of Moments: θ3 = Σyi / n. Thus transforming back via y = x3 , θ = {Σxi3 /n}1/3. Now the reason this works is that the loglikelihood of the Exponential and the Weibull differ only by terms which don't depend on the parameter θ. The density of the Exponential with mean θ3 is: g(y) = exp(-y/ θ3 )/ θ3 . The log density of the Exponential is: ln g(y) = -y/θ3 -ln(θ3 ) = -x3 / θ3 - 3ln(θ). This differs by: 2ln(x) + n ln(3), from the log density of the Weibull (for τ = 3) of: -θ-3x3 + 2ln(x) - 3ln(θ) + n ln(3). Thus the value of θ that maximizes one of the loglikelihoods, also maximizes the other. This is what is meant by the Maximum Likelihood being invariant under change of variables. The key is that the change of variables can not depend on any of the parameters.104 Exercise: Let x1, x2 , ... , xn be fit to a Normal Distribution via Maximum Likelihood. Determine µ and σ. [Solution: f(x) = (1/σ) exp(-.5{(x-µ)/σ}2 ) / 2 π . ln f(x) = -0.5{(x-µ)2 /σ2 } - ln(σ) - (1/2)ln(2π). Σ ln f(xi) = -0.5{Σ(xi-µ)2 /σ2 } - nln(σ) - (n/2)ln(2π). Set the partial derivatives of the loglikelihood equal to zero.
∂Σ ln f(xi)/∂σ = Σ(xi - µ)²/σ³ - n/σ = 0. ∂Σ ln f(xi)/∂µ = Σ(xi - µ)/σ² = 0. Therefore Σ(xi - µ) = 0. µ = (1/n)Σxi. Therefore σ = √{Σ(xi - µ)²/n}.]
103 104
The Weibull can be obtained from an Exponential via a power transformation. Note that in the example, τ was fixed at 3 and thus was not a parameter.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 231 Note that the fitted µ and σ are the usual estimates for the mean and variance.105 Thus in the case of the Normal, applied to ungrouped data, the Method of Maximum Likelihood produces the same result as the Method of Moments. Exercise: Let x1, x2 , ... , xn be fit to a LogNormal Distribution via Maximum Likelihood. Determine µ and σ. [Solution: f(x) = exp[-.5 ({ln(x) − µ} / σ)2] /{xσ 2 π ) ln f(x) = -.5{(ln(x)-µ)2 /σ2 } - ln(σ) - ln(x) - (1/2)ln(2π) Σ ln f(xi) = -.5{Σ(ln(xi)-µ)2 /σ2 } - nln(σ) - nln(x) - (n/2)ln(2π) Set the partial derivatives of the loglikelihood equal to zero.
∂Σ ln f(xi)/∂σ = Σ(ln[xi] - µ)²/σ³ - n/σ = 0. ∂Σ ln f(xi)/∂µ = Σ(ln[xi] - µ)/σ² = 0.
Therefore, Σ(ln[xi] - µ) = 0. ⇒ µ = Σ ln xi / N. ⇒ σ² = Σ(ln xi)²/N − µ².]
.]
Notice that for the LogNormal Distribution, the Method of Maximum Likelihood gives the same result as the Method of Maximum Likelihood applied to the Normal Distribution and the log of the claim sizes. This is the case because the Method of Maximum Likelihood is invariant under changes of variables such as y = ln(x).106 If a set of parameters maximizes the likelihood of a LogNormal Distribution, then they also maximize the likelihood of the corresponding Normal. In addition, since in the case of the Normal the Method of Maximum Likelihood and the Method of Moments (applied to ungrouped data) produce the same result, applying the Method of Maximum Likelihood to the LogNormal is the same as applying the Method of Moments to the underlying Normal. Another important example of a change of variables is the effect of uniform inflation. If for example, we have 5% annual inflation over 3 years, then if X is the loss amount in the year 2000, Y = 1.053 X is the loss amount in the year 2003. We could fit a distribution using maximum likelihood applied to the year 2000 data, and then adjust this distribution for the effects of uniform inflation.107 Alternately, we could fit a distribution using maximum likelihood applied to the data adjusted to a year 2003 level. Due to the invariance of maximum likelihood under change of variables, the results would be the same in the two alternatives.
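A small illustration of this equivalence follows; it is mine rather than the Study Guide's, and the five claim sizes are hypothetical. Fitting the LogNormal by maximum likelihood amounts to fitting a Normal to the log claim sizes by the method of moments.

```python
import math

claims = [1200, 3400, 8900, 21000, 55000]   # hypothetical claim sizes
logs = [math.log(x) for x in claims]

mu = sum(logs) / len(logs)
sigma2 = sum((y - mu) ** 2 for y in logs) / len(logs)   # note n, not n - 1, in the denominator
print(round(mu, 4), round(math.sqrt(sigma2), 4))        # maximum likelihood mu and sigma
```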
105
Note that this estimate of the variance with n rather than n-1 in the denominator is biased. If x follows a LogNormal, then ln(x) follows a Normal. 107 How to do this is discussed in "Mahler's Guide to Loss Distributions." For example for a Pareto Distribution, α stays the same and θ is multiplied by the inflation factor, in this case 1.053 . 106
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 232 Demonstration of the Invariance of the Method of Maximum Likelihood: The Method of Maximum Likelihood is unaffected by change of variables, that are one to one monotonic such as: x2 , x , and ln(x). Let y = g(x), where g does not depend on any of the parameters to be fitted, and is one-to-one monotonic.108 Then the Distribution Functions are related via FY(g(x)) = FX(x), while the density functions are related via fY(g(x)) g'(x) = fX(x). The loglikelihood in terms of x is: Σ ln fX(xi) = Σ ln fY(yi) + Σ ln g'(xi). However, the second term on the right hand side of the equation does not depend on the parameters of the loss distributions fX and fY. Thus if a set of parameters maximizes the loglikelihood of fX, then it also maximizes the loglikelihood of fY. Thus in general, the Method of Maximum Likelihood is invariant under such changes of variables.
108
y = g(x) is one-to-one if each value of y corresponds to no more than one value of x. y = g(x) is monotonic increasing if as x increases y does not decrease. y = g(x) is monotonic decreasing if as x increases y does not increase.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 233 Beta Distribution: As shown in Appendix A of Loss Models, the Beta Distribution has support from 0 to θ.109
f(x) = =
1 Γ(a + b) (x/θ)a (1 - x/θ)b-1 / x = (x/θ)a (1 - x/θ)b-1 / x β(a, b) Γ(a) Γ(b) (a + b - 1)! (x/θ)a-1 (1 - x/θ)b-1 / θ, 0 ≤ x ≤ θ. (a - 1)! (b- 1)!
For a = 1, b = 1, the Beta Distribution is the uniform distribution from [0, θ]. For various special cases, one can fit a Beta Distribution via maximum Iikelihood on the exam. For b = 1 and θ fixed and known: f(x) = a xa-1 / θa. ln [f(x)] = ln[a] + (a-1)ln[x] - a ln[θ]. Loglikelihood is: n ln[a] + (a-1)
∑ ln[xi] - n a ln[θ].
Setting the partial derivative with respect to a equal to zero:
∑ ln[xi] - n ln[θ]. -n a^ = ln[θ] - n/ ∑ ln[xi] = n . ∑ ln[xi / θ]
0 = n/a +
110
i=1
Exercise: For a = 1 and θ fixed and known, fit a Beta Distribution via maximum likelihood. [Solution: f(x) = b (1 - x/θ)b-1 / θ. Loglikelihood is: n ln[b] + (b-1)
ln [f(x)] = ln[b] + (b-1)ln[1 - x/θ] - ln[θ].
∑ ln[1 - xi / θ] - n ln[θ].
Setting the partial derivative with respect to b equal to zero: -n 0 = n/b + ln[1 - xi / θ] . ⇒ b^ = n .
∑
∑ ln[1 - xi / θ] i=1
^ Comment: The formula for b follows from that for a^ , and the change of variables: y = θ - x.
Note that S(x) = (1 - x/θ)b , 0 < x < θ, which is a Modified DeMoivreʼs Law.] 109 110
I discuss the Beta Distribution in my Guide to Loss Distributions. See CAS3, 11/05, Q.18, and 4, 11/04, Q. 6.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 234 ⎛
Modified DeMoivreʼs Law: S(x) = ⎜1 ⎝
x⎞ α ⎟ , 0 ≤ x ≤ ω, α > 0. ω⎠
^ = In general, for ω fixed, when fitting the exponent α via maximum likelihood: α n
-n
∑ ln[1 - xi / ω]
.111
i=1
For b = 2 and θ fixed and known: f(x) = a(a+1) xa-1 (1 - x/θ) / θa. ln [f(x)] = ln[a] + ln[a+1] + (a-1)ln[x] + ln[1 - x/θ] - a ln[θ]. Loglikelihood is: n ln[a] + n ln[a+1] + (a-1)
∑ ln[xi] - ∑ ln[1 - xi / θ] - n a ln[θ].
Setting the partial derivative with respect to a equal to zero: n/a + n/(a+1) +
∑ ln[xi] - n ln[θ] = 0. ⇔ n/a + n/(a+1) + ∑ ln[xi / θ] = 0.
Exercise: You observe 5 values: 12, 33, 57, 70, 81. For b = 2 and θ = 100, fit a Beta Distribution via maximum likelihood. [Solution: 5/a + 5/(a+1) + ln[12/100] + ln[33/100] + ln[57/100] + ln[70/100] + ln[81/100] = 0 ⇒ 5/a + 5/(a+1) - 4.358 = 0. ⇒ 4.358 a2 - 5.642 a - 5 = 0. 5.642 + Taking the positive root of this quadratic equation: a^ =
5.6422 - (4)(4.358)(-5) = 1.90.] (2)(4.358)
Exercise: You observe 5 values: 21, 33, 47, 60, 71. For a = 2 and θ = 100, fit a Beta Distribution via maximum likelihood. [Solution: f(x) = b(b+1) x (1 - x/θ)b-1 / θ2. ln [f(x)] = ln[b] + ln[b+1] + (b-1)ln[1 - x/θ] + ln[x] - 2 ln[θ]. Setting the partial derivative of the loglikelihood with respect to b equal to zero: n/b+ n/(b+1) +
∑ ln[1 - xi / θ] = 0. ⇒
5/b + 5/(b+1) - 3.425 = 0. ⇒ 3.425 b2 - 6.575 b - 5 = 0. 6.575 + Taking the positive root of this quadratic equation: b^ =
111
See CAS 3L, 11/11, Q. 18.
6.5752 - (4)(3.425)(-5) = 2.50.] (2)(3.425)
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 235 Formulas for Some Examples of Maximum Likelihood fits to Ungrouped Data:112 Distribution
Parameters
Exponential
θ = Σxi/N = X , same as the method of moments
Inverse Exponential
θ=
Single Parameter Pareto
α=
Gamma, α fixed
θ = X /α, same as the method of moments
Pareto, θ fixed
α=
Weibull, τ fixed
θ=
(∑xi τ / N) 1 / τ
Normal
µ=
X , σ2 =
LogNormal
µ=
∑ lnxi , σ2 = ∑ (lnx i
Inverse Gaussian, θ fixed
µ= X
Inverse Gaussian
µ = X, θ=
Inverse Gamma, α fixed
θ=
Uniform on [0, b]
b = maximum of the xi
Uniform on [a, 0]
a = minimum of the xi
Uniform on [a, b]
a = minimum of the xi, b = maximum of the xi
112
N
∑1/ xi N
∑ ln[xi / θ]
N +x ∑ ln[θ θ i
]
∑ (xi
)
- X 2 N
N
, same as the method of moments
)
- X2
N
1 ∑ 1/ xi - 1/ X N
α ∑ 1/ xi N
In the absence of truncation and/or censoring, as well as in the absence of grouping.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 236
Inverse Weibull, τ fixed
⎛ θ= ⎜ ⎜ ⎝
Inverse Pareto, θ fixed
τ=
Beta, b = 1, θ fixed
a= n
⎞ 1/ τ ⎟ xi- τ ⎟⎠
N
∑
N
∑ ln[1 + θ / xi ] -n
∑ ln[xi / θ] i=1
Beta, a = 1, θ fixed
b= n
-n
∑ ln[1 - xi / θ] i=1
Maximum Likelihood versus Method of Moments and Percentile Matching: Method of Moments and Percentile Matching each match one or more statistics of the data to the fitted distribution. They have the advantage of being relatively simple to perform. While Method of Moments and Percentile Matching do a fairly good job for lighter tailed data sets, they generally do not perform as well for data sets with heavy right hand tails, such as are common in casualty insurance. Also the choice of percentiles at which to perform the matching can be somewhat arbitrary. The method of maximum likelihood uses all the information contained in the data set, and thus as will be discussed in a subsequent section has many desirable statistical properties when applied to large samples. Also as will be discussed in subsequent sections, the method of maximum likelihood can be applied to situations involving data combined from different policies with different deductibles and/or different maximum covered losses. In a majority of cases, maximum likelihood requires the use of a computer. The computer program will usually require a starting set of parameters in order to numerically maximize the loglikelihood. Percentile Matching or Method of Moments are often used to provide such a starting set of parameters.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 237 Problems: 10.1 (3 points) You observe 4 claims of sizes: 2, 5, 7 and 10. For this data, which of the following Pareto distributions has the largest likelihood? A. Pareto with α = 1, θ = 10
B. Pareto with α = 1.5, θ = 12
C. Pareto with α = 2, θ = 15
D. Pareto with α = 2.5, θ = 18
E. Pareto with α = 3, θ = 20 10.2 (3 points) A random variable X is given by the density function: f(x) = (q+2)(q+1) xq (1-x), 0 ≤ x ≤ 1. A random sample of three observations of X yields: 0.2, 0.3 and 0.6. Determine the maximum likelihood estimator of q. A. less than 0 B. at least 0 but less than 0.5 C. at least 0.5 but less than 1.0 D. at least 1.0 but less than 1.5 E. at least 1.5 10.3 (2 points) You observe the following 6 claims: 162.22, 151.64, 100.42, 174.26, 20.29, 34.36. A distribution: F(x) = 1 - e-qx, x > 0, is fit to this data via the Method of Maximum Likelihood. Determine the value of q. A. less than 0.006 B. at least 0.006 but less than 0.007 C. at least 0.007 but less than 0.008 D. at least 0.008 but less than 0.009 E. at least 0.009 10.4 (3 points) 10 Claims have been observed: 1500, 5500 3000, 3300, 2300, 6000, 5000, 4000, 3800, 2500. The underlying distribution is assumed to be Gamma, with parameters α = 8 and θ unknown. In what range does the maximum likelihood estimator of θ fall? A. 430
B. 440
C. 450
D. 460
E. 470
10.5 (3 points) 0, 3, and 8 are three independent random draws from Normal Distributions. Each Normal Distribution has the same mean. The variances of the Normal Distributions are respectively: 1/θ, 1/(2θ), and 1/(3θ). Determine the maximum likelihood estimate of θ. A. 0.03
B. 0.04
C. 0.05
D. 0.06
E. 0.07
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 238 10.6 (2 points) Losses are uniformly distributed on [1, b]. You observe 4 losses: 1.7, 3.1, 3.4, 4.6. What is the maximum likelihood estimate of b? A. less than 4 B. at least 4 but less than 5 C. at least 5 but less than 6 D. at least 6 but less than 7 E. at least 7 Use the following information to answer the following two questions: You observe the following five claims: 6.02, 7.56, 7.88, 8.42, 8.72. (x -µ)2 ] 2σ2 . σ 2π
exp[The normal distribution has its probability density function given by: f(x) =
10.7 (1 point) Using the method of maximum likelihood, a Normal distribution is fit to this data. What is the value of the fitted µ parameter? A. less than 7.8 B. at least 7.8 but less than 7.9 C. at least 7.9 but less than 8.0 D. at least 8.0 but less than 8.1 E. at least 8.1 10.8 (2 points) Using the method of maximum likelihood, a Normal distribution is fit to this data. What is the value of the fitted σ parameter? A. less than 0.6 B. at least 0.6 but less than 0.7 C. at least 0.7 but less than 0.8 D. at least 0.8 but less than 0.9 E. at least 0.9 10.9 (3 points) You have the following data from three states: State Number of Claims Dollars of Loss Bay 200 200,000 Empire 400 500,000 Granite 100 75,000 You assume that the mean claim size for Empire State is 1.4 times that for Bay State and 1.7 times that for Granite State. You assume the size of claim distribution for each state is Exponential. Estimate the mean claim size for Empire State via the method of maximum likelihood applied to the data of all three states. (A) 1100 (B) 1150 (C) 1200 (D) 1250 (E) 1300
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 239 Use the following information to answer the following two questions: You observe the following five claims: 410, 1924, 2635, 4548, 6142. 10.10 (1 point) Using the method of maximum likelihood, a LogNormal distribution is fit to this data. What is the value of the fitted µ parameter? A. less than 7.8 B. at least 7.8 but less than 7.9 C. at least 7.9 but less than 8.0 D. at least 8.0 but less than 8.1 E. at least 8.1 10.11 (2 points) Using the method of maximum likelihood, a LogNormal distribution is fit to this data. What is the value of the fitted σ parameter? A. less than 0.6 B. at least 0.6 but less than 0.7 C. at least 0.7 but less than 0.8 D. at least 0.8 but less than 0.9 E. at least 0.9
Use the following information to answer each of the following two questions.
A Pareto distribution has been fit to a set of data xi, i = 1 to n, using the method of maximum likelihood. For the fitted parameter θ, let:
v = {Σ ln(1 + xi/θ)} / n
w = {Σ 1/(1 + xi/θ)} / n
10.12 (2 points) Which of the following equations is satisfied? A. w + 1/v =1 B. (1/w) + v = 1 C. w - v = 1 D. (1/w ) - (1/v) = 1
E. None of A,B, C, or D.
10.13 (2 points) For the fitted parameters, α is equal to which of the following? A. v/(v+w)
B. w/(w+1)
C. 1/v
D. w/(w-1)
E. v/(v-w)
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 240 10.14 (3 points) You observe the following 10 claims: 1729, 101, 384, 121, 880, 3043, 205, 132, 214, 82 You fit to this data via the method of maximum likelihood the Distribution Function ⎛ 1000 ⎞ α F(x) = 1 - ⎜ ⎟ . In which of the following intervals is α? ⎝ 1000 + x ⎠ A. less than 2.5 B. at least 2.5 but less than 2.6 C. at least 2.6 but less than 2.7 D. at least 2.7 but less than 2.8 E. at least 2.8
Use the following information for the next three questions:
• •
5 Claims have been observed: 1500, 5500, 3000, 3300, 2300. An Inverse Gaussian Distribution with parameters µ and θ is fit to this data via maximum likelihood.
10.15 (2 points) Determine the value of the fitted µ. A. less than 3200 B. at least 3200 but less than 3300 C. at least 3300 but less than 3400 D. at least 3400 but less than 3500 E. at least 3500 10.16 (2 points) Determine the value of the fitted θ. A. less than 15,000 B. at least 15,000 but less than 16,000 C. at least 16,000 but less than 17,000 D. at least 17,000 but less than 18,000 E. at least 18,000 10.17 (2 points) If µ is fixed as 4000, determine the value of the fitted θ. A. less than 13,000 B. at least 13,000 but less than 14,000 C. at least 14,000 but less than 15,000 D. at least 15,000 but less than 16,000 E. at least 16,000
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 241 Use the following information to answer the next two questions: • An insurer writes a policy with a coinsurance factor of 90%.
• • •
There are 7 payments: 3236, 3759, 10769, 22832, 28329, 36703, 72369. The losses prior to the effect of the coinsurance factor are assumed to follow a LogNormal Distribution. A LogNormal Distribution is fit via the Method of Maximum Likelihood.
10.18 (2 points) What is the fitted µ parameter?
A. less than 9.7 B. at least 9.7 but less than 9.8 C. at least 9.8 but less than 9.9 D. at least 9.9 but less than 10.0 E. at least 10.0
10.19 (2 points) What is the fitted σ parameter?
A. less than 0.7 B. at least 0.7 but less than 0.8 C. at least 0.8 but less than 0.9 D. at least 0.9 but less than 1.0 E. at least 1.0
10.20 (3 points) A sample of 200 losses has the following statistics:
Σ xi-2 = 0.0045641, Σ xi-1.5 = 0.046162, Σ xi-1 = 0.59041,
Σ xi = 130,348, Σ xi1.5 = 3,832,632, Σ xi2 = 120,252,097, with all sums taken over i = 1 to 200.
You assume that the losses come from a Weibull distribution with τ = 1.5.
Determine the maximum likelihood estimate of the Weibull parameter θ.
(A) Less than 700 (B) At least 700, but less than 800 (C) At least 800, but less than 900 (D) At least 900, but less than 1000 (E) At least 1000
Use the following information for the next two questions:
• You observe 1000 claims of sizes xi, i = 1, 2, 3, ..., 1000.
• Σ xi = 150,000, Σ xi2 = 30 million, Σ lnxi = 4,800, Σ (lnxi)2 = 24,000, with all sums taken over i = 1 to 1000.
• A Gamma Distribution is fit to this data via the method of maximum likelihood.
• Where ψ(y) = d(ln Γ(y))/dy is the digamma function:
y      ln(y) - ψ(y)        y      ln(y) - ψ(y)
2.0    0.270               2.6    0.204
2.1    0.257               2.7    0.196
2.2    0.244               2.8    0.189
2.3    0.233               2.9    0.182
2.4    0.223               3.0    0.176
2.5    0.213
10.21 (3 points) What is the fitted value of α? A. less than 2.5 B. at least 2.5 but less than 2.6 C. at least 2.6 but less than 2.7 D. at least 2.7 but less than 2.8 E. at least 2.8 10.22 (2 points) What is the fitted value of θ? A. less than 35 B. at least 35 but less than 45 C. at least 45 but less than 55 D. at least 55 but less than 65 E. at least 65
10.23 (3 points) Two friends Bert and Ernie work at different insurers. They are each analyzing similar data at their insurers. They have each calculated the following Negative Loglikelihoods for Weibull Distributions using the data at their own insurer.
Negative Loglikelihoods for Bertʼs data:
Theta     τ = 0.3    τ = 0.5    τ = 0.7    τ = 0.9    τ = 1.1
3000      1473.07    1447.75    1477.03    1562.76    1728.75
5000      1473.95    1443.55    1455.49    1500.83    1583.94
7000      1476.27    1446.16    1454.12    1487.66    1545.46
9000      1478.81    1450.43    1458.22    1488.05    1536.25
11000     1481.31    1455.09    1464.11    1493.31    1537.83
Negative Loglikelihoods for Ernieʼs data:
Theta     τ = 0.3    τ = 0.5    τ = 0.7    τ = 0.9    τ = 1.1
3000      1100.98    1061.03    1046.93    1052.74    1079.22
5000      1098.15    1051.90    1026.09    1012.43    1008.35
7000      1097.72    1050.28    1022.01    1004.09    993.23
9000      1098.07    1050.96    1022.82    1004.60    992.77
11000     1098.72    1052.51    1025.44    1008.27    997.36
If they were to fit a Weibull Distribution to their combined data via maximum likelihood, what would be the survival function at 22,000? A. 1% B. 5% C. 10% D. 15% E. 20% 10.24 (3 points) Let x1 , x2 , ..., xn and y1 , y2 , ..., ym denote independent random samples of severities from Region 1 and Region 2, respectively. Pareto distributions with θ = 1, but different values of α, are used to model severities in these regions. Past experience indicates that the average severity in Region 2 is half the average severity in Region 1. You intend to calculate the maximum likelihood estimate of α for Region 1, using the data from both regions. Which of the following equations must be solved? (A) n/α - Σln(xi) + 2m/(2α - 1) - 2Σln(yi) = 0. (B) n/α - Σln(xi) + m/(α - 1) - Σln(yi) = 0. (C) n/α - Σln(1 + xi) + 2m/(2α - 1) - 2Σln(1 + yi) = 0. (D) n/α - Σln(1 + xi) + m/(α - 1) - Σln(1 + yi) = 0. (E) None of the above
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 244 10.25 (3 points) You are given: (i) Claim counts follow a Poisson distribution with mean µ. (ii) Claim sizes follow an Exponential distribution with mean 10µ. (iii) Claim counts and claim sizes are independent, given µ. For a given policyholder you observe the following claims: Year 1: 10, 70. Year 2: No claims. Year 3: 20, 30, 50. Estimate µ for this policyholder, using maximum likelihood. A. 1.0
B. 1.5
C. 2.0
D. 2.5
E. 3.0
10.26 (3 points) You observe the following five sizes of loss: 29 55 61 182 270 Fit via maximum likelihood an Inverse Weibull Distribution with τ = 4. What is the fitted value of θ? A. 40
B. 42
C. 44
D. 46
E. 48
10.27 (3 points) Slippery Elm, Ent and expert treasure finder, searches for treasure in either the ruins of Orthanc or the ruins of Minas Morgul. He has made 12 trips to Orthanc and found a total of 8200 worth of treasure. He has made 7 trips to Minas Morgul and found a total of 3100 worth of treasure. The value of treasure that Slippery Elm finds on a trip to either location has a Gamma Distribution with α = 3. However, the expected value of treasure found on a trip to Minas Morgul is assumed to be 50% more than that on a trip to Orthanc. Determine the maximum likelihood estimate of θ for a trip to Orthanc. A. 160
B. 165
C. 170
D. 175
E. 180
10.28 (3 points) You observe the following five sizes of loss: 11 17 23 38 54 Fit via maximum likelihood a LogNormal Distribution with σ = 0.6 and µ unknown. Use the fitted distribution in order to estimate the survival function at 75. A. 1% B. 2% C. 3% D. 4% E. 5%
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 245 10.29 (3 points) You are given: (i) Low-hazard risks have an exponential claim size distribution with mean 0.8θ. (ii) Medium-hazard risks have an exponential claim size distribution with mean θ. (iii) High-hazard risks have an exponential claim size distribution with mean 1.5θ. (iv) Two claims from low-hazard risks are observed, of sizes 100 and 200. (v) Two claims from medium-hazard risks are observed, of sizes 50 and 300. (vi) Two claims from high-hazard risks are observed, of sizes 150 and 400. Determine the maximum likelihood estimate of θ. (A) 180
(B) 190
(C) 200
(D) 210
(E) 220
10.30 (3 points) You observe the following six sizes of loss: 9 15 25 34 56 90
Fit via maximum likelihood an Inverse Gamma Distribution with α = 3 and θ unknown.
What is the mean of the fitted distribution?
A. less than 30 B. at least 30 but less than 35 C. at least 35 but less than 40 D. at least 40 but less than 45 E. at least 45
10.31 (4 points) You observe the following ten losses: 27, 98, 21, 219, 195, 33, 316, 11, 247, 45.
Σ xi = 1212. Σ xi2 = 260,860. Σ ln[xi] = 42.5536. Σ ln[xi]2 = 193.948. (All sums are over i = 1 to 10.)
Using the method of maximum likelihood, a LogNormal distribution is fit to this data. Using the method of moments, another LogNormal distribution is fit to this data. Each LogNormal distribution is used to estimate the probability that a loss will exceed 100. What is the absolute difference in these two estimates? A. 0.01 B. 0.03 C. 0.05 D. 0.07 E. 0.09
10.32 (2 points) You observe the following five sizes of loss: 19 45 64 186 370 Fit via maximum likelihood an Inverse Exponential Distribution. What is the fitted value of θ? A. 30
B. 35
C. 40
D. 45
E. 50
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 246 10.33 (3 points) The following data have been collected for a large insured: Year Number Of Claims Average Claim Size 1 100 10,000 2 200 12,500 Inflation increases the size of all claims by 10% per year. A Gamma distribution with parameters α = 3 and θ is used to model the claim size distribution. Estimate θ for Year 3 using the method of maximum likelihood. (A) 4000
(B) 4400
(C) 4800
(D) 5200
(E) 5600
10.34 (3 points) You observe the following five sizes of loss: 32 45 71 120 178 Fit via maximum likelihood an Inverse Pareto Distribution with θ = 30. What is the fitted value of τ? A. 2.6
B. 2.8
C. 3.0
D. 3.2
E. 3.4
10.35 (2 points) You are given the following three observations: 1 2 3 You fit a distribution with the following density function to the data: f(x) = (p+1) x2 {1 - (x/6)3 }p / 72, 0 < x < 6, p > -1 Determine the maximum likelihood estimate of p. (A) 12 (B) 14 (C) 16 (D) 18 (E) 20 10.36. (3 points) Assume that the heights of maize plants are Normally Distributed. You measure the heights of 10 mature self-fertilized maize plants: 35, 39, 45, 47, 48, 50, 51, 52, 54, 60. You measure the heights of 10 mature cross-fertilized maize plants: 63, 64, 66, 67, 70, 72, 73, 74, 76, 82. You assume that the two Normal distributions have the same coefficient of variation but the height of cross-fertilized plants is on average 1.4 times the height of self-fertilized plants. You fit via maximum likelihood using the data from both samples. What is the fitted value of σ for the cross-fertilized plants? A. 7
B. 8
C. 9
D. 10
E. 11
10.37 (2, 5/83, Q.26) (1.5 points) Let X1, X2, . . . , Xn be a random sample from a distribution with density function f(x) = √(2/π) exp[-(x - θ)2/2], for x ≥ θ.
What is the maximum likelihood estimator for θ? A. X
B. min(X1 , X2 , . . . , Xn )
C. max(X1 , X2 , . . . , Xn )
D. X /2
E. 2 X
10.38 (2, 5/83, Q.32) (1.5 points) Let X1, X2, X3, and X4 be a random sample from a distribution with density function f(x) = exp[-(x - 4)/β] / β, for x > 4, where β > 0.
If the data from this random sample are 8.2, 9.1, 10.6, and 4.9, respectively, what is the maximum likelihood estimate of β? A. 4.2
B. 7.2
C. 8.2
D. 12.2
E. 28.8
10.39 (2, 5/85, Q.49) (1.5 points) A random sample X1, . . . , Xn is taken from a distribution with density function f(x) = (θ + 1)xθ for 0 < x < 1, where θ > 0. What is the maximum likelihood estimator of θ? (All sums are over i = 1 to n.)
A. -1 - n / Σ ln xi
B. n / Σ ln xi
C. Σ ln xi / n
D. 1 + n / Σ ln xi
E. 1 - Σ ln xi / n
10.40 (4, 5/85, Q.54) (3 points) Fit via maximum likelihood a Weibull distribution with probability density function: f(x; q) = q exp(-q x1/2) / (2 x1/2), x > 0, to the observations 1, 4, 9 and 64. In which of the following ranges is the maximum likelihood estimate of q? A. Less than .28 B. At least .28, but less than .29 C. At least .29, but less than .30 D. At least .30, but less than .31 E. At least .31 10.41 (2, 5/88, Q. 27) (1.5 points) Let X1 , . . ., Xn be a random sample of size n from a continuous distribution with density function f(x) =
θ exp[-θ
x]
2x
for 0 < x, where 0 < θ.
What is the maximum likelihood estimate of θ? n
A. n / ∑ xi i=1
n
B. n / ∏ xi i=1
n
C. ∑ (1/ xi ) i=1
n
D. 2n/ ∑ (1/ xi ) i=1
n
E. ∑ xi / n i=1
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 248 10.42 (4, 5/88, Q.55) (2 points) The following six observations came from a gamma distribution that has its first parameter, α, equal to 3.0: 1.0, 2.0, 2.2, 2.8, 3.0, 4.1. What is the maximum likelihood estimate of the second parameter, θ, assuming that α is 3.0? A. B. C. D. E.
Less than 0.80 At least 0.80 but less than 0.85 At least 0.85 but less than 0.90 At least 0.90 but less than 0.95 0.95 or more
10.43 (4, 5/89, Q.48) (2 points) A sample of n independent observations with values x1 ,...,xn came from a distribution with a probability density function of: f(x; q) = 2 q x exp[-q x2 ], x ≥ 0. What is the maximum likelihood estimator for the unknown parameter q? n
n
∑ xi
A. i=1 n
B.
n
C.
n
∑ xi i=1
n
n
∑ xi2
∑ xi2
D. i=1 n ln2
i=1
E. None of the above 10.44 (2, 5/90, Q.7) (1.7 points) Let X1 , . . . , X4 be a random sample from a normal distribution with mean 3 and unknown variance σ2 > 0. If the sample values are 4, 8, 5, and 3, what is the value of the maximum likelihood estimate of σ2? A. 7/2
B. 9/2
C. 14/3
D. 5
E. 15/2
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 249 10.45 (4, 5/90, Q.45) (1 point) Let x1 , x2 , ... , xn be a random sample taken from a normal distribution with mean µ = 0 and variance σ2 . (x -µ)2 ] 2σ2 . σ 2π
exp[The normal distribution has its probability density function given by: f(x) = Which of the following is the maximum likelihood estimator of σ? n
n
∑ xi2 A.
∑ xi2
i=1
n
n -1
∑ xi2
C. i=1 n
n
∑ xi D.
i=1
B.
n
n
∑ xi2
i=1
E. i=1 n -1
n -1
10.46 (4, 5/91, Q.36) (2 points) Given the cumulative distribution function: F(x) = xp for 0 ≤ x ≤ 1, and a sample of n observations, x1 , x2 , ... xn , what is the maximum likelihood estimator of p? A.
-n n
∑ lnxi
B.
i=1
n
∑ lnxi n
∑ lnxi i=1
n
i=1
n
D.
n
∑ xi E.
i=1
n
⎧⎪ n ⎫⎪ n C. ⎨∏xi⎬ ⎪⎩ i=1 ⎪⎭
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 250 10.47 (4, 5/91, Q.47) (2 points) The following sample of 10 claims is observed: 1500, 6000, 3500, 3800, 1800, 5500, 4800, 4200, 3900, 3000. The underlying distribution is assumed to be Gamma, with parameters α = 12 and θ unknown. In what range does the maximum likelihood estimator of θ fall? A. less than 300 B. at least 300 but less than 310 C. at least 310 but less than 320 D. at least 320 but less than 330 E. at least 330 10.48 (2, 5/92, Q. 41) (1.7 points) Let X be a single observation from a continuous distribution 3 x with density function f(x) = , θ ≤ x ≤ 3θ, where θ > 0. 2θ 2θ2 What is the maximum likelihood estimator of θ? A. X/3
B. 3X/5
C. 2X/3
D. X
E. 3X
10.49 (4B, 5/92, Q.21) (2 points) A random sample of n claims, x1 , x2 , ..., xn , is taken from the following exponential distribution: f(x) = e-x/θ/θ, x > 0. Determine the maximum likelihood estimator for θ. n
∑ lnxi
A. i=1 n
n
∑ xi
B. i=1 n
n
∑ lnxi
C. i=1
n
n
∑ xi
D. i=1 n
n
∑ exp[xi]
E. i=1
n
10.50 (4B, 5/92, Q.27) (2 points) The random variable X has the density function with parameter β given by:
f(x; β) = (x/β2) exp[-(x/β)2 / 2]; x > 0, β > 0,
where E[X] = (β/2)√(2π) and the variance of X is: 2β2 - (π/2)β2.
You are given the following observations of X: 4.9, 1.8, 3.4, 6.9, 4.0.
Determine the maximum likelihood estimate of β.
A. Less than 3.00
B. At least 3.00 but less than 3.15
C. At least 3.15 but less than 3.30
D. At least 3.30 but less than 3.45
E. At least 3.45
10.51 (4B, 5/93, Q.7) (2 points) A random sample of n claims x1, x2, ..., xn, is taken from the probability density function f(xi) = exp[-(xi - 1000)2 / (2θ)] / √(2πθ), -∞ < xi < ∞.
Determine the maximum likelihood estimator of θ.
A.
i=1
n n
∑ ln[(xi D.
- 1000)2 ]
i=1
n
n
n
∑ (xi -1000)2
∑ ln[(xi
∑ (xi -1000)2 B.
i=1
C.
n n
∑ ln[(xi
E. i=1
- 1000)2 ]
i=1
n
- 1000)2 ] n
10.52 (4B, 11/93, Q.8) (2 points) A random sample of 5 claims x1 ,..., x5 is taken from the probability density function αλ α f(xi) = , α, λ, xi > 0. (λ + xi)α + 1 In ascending order the observations are: 43, 145, 233, 396, 775. Given that λ = 1000, determine the maximum likelihood estimate of α. A. B. C. D. E.
Less than 2.2 At least 2.2, but less than 2.7 At least 2.7, but less than 3.2 At least 3.2, but less than 3.7 At least 3.7
10.53 (4B, 5/94, Q.1) (1 point) You are given the following: • The random variable X has the exponential distribution given by f(x) = λe-λx, x > 0. •
A random sample of three observations of X yields the values 0.30, 0.55, 0.80.
Determine the value of the maximum likelihood estimator of λ. A. B. C. D. E.
Less than 0.5 At least 0.5, but less than 1.0 At least 1.0, but less than 1.5 At least 1.5, but less than 2.0 At least 2.0
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 252 10.54 (4B, 11/95, Q.4) (3 points) You are given the following: • The random variable X has the density function f(x) = 2 (θ - x) / θ2 , 0 < x < θ .
•
A random sample of two observations of X yields the values 0.50 and 0.90.
Determine the maximum likelihood estimator of θ. A. Less than 0.45 B. At least 0.45, but less than 0.95 C. At least 0.95, but less than 1.45 D. At least 1.45, but less than 1.95 E. At least 1.95 10.55 (2, 2/96, Q.12) (1.7 points) Let X1 , . . . , Xn be a random sample from a continuous distribution with density function f(x) = α2α/xα+1, x ≥ 2, where α > 0. Determine the maximum likelihood estimator of α. n
∑Xi
A. min( X1 , . . . , Xn )
B. i=1 n
C. n
n
∑ ln[Xi] i=1
D. max(X1 , . . . , Xn )
E. n
∑ ln[Xi]
n - n ln[2]
i=1
10.56 (2, 2/96, Q.33) (1.7 points) Let X1 ,..., Xn be a random sample from a continuous distribution with density f(x) = e-x /(1 - e−θ), for 0 < x < θ, where 0 < θ < ∞. Determine the maximum likelihood estimator of θ. n
A. ( ∏ Xi)1/ n i=1
B. X C. -In(1 - e-x) D. minimum(X1 ,..., Xn ) E. maximum(X1 ,..., Xn )
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 253 10.57 (4B, 5/96, Q.14) (1 point) You are given the following:
• The random variable X has the density function f(x) = (1/θ) e-x/θ , x > 0. • A random sample of three observations of X yields the values x1 , x2 , and x3 . Determine the maximum likelihood estimator of θ. A. (x1 + x2 + x3 )/3 B. (ln x1 + ln x2 + ln x3 )/3 C. (1/x1 + 1/x2 + 1/x3 )/3 D. exp(-(x1 + x2 + x3 )/3) E. (x1 + x2 + x3 )1/3 10.58 (4B, 5/96, Q.26) (1 point) Which of the following statements regarding loss distribution models are true? 1. Method of moments estimators provide good starting values for iterative maximum likelihood estimation. 2. A weight function may be used with minimum distance estimation. 3. A two-parameter model may be preferable to a three-parameter model in some cases. A. 1 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 10.59 (4B, 11/96, Q.5) (3 points) You are given the following: • The random variable X has the density function f(x) =
β exp[-β2 / (2x)] / √(2π x3), 0 < x < ∞, β > 0.
• A random sample of three observations of X yields the values 100, 150, and 200. Determine the maximum likelihood estimate of β . A. Less than 11.5 B. At least 11.5, but less than 12.0 C. At least 12.0, but less than 12.5 D. At least 12.5, but less than 13.0 E. At least 13.0
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 254 10.60 (4B, 11/97, Q.6) (2 points) You are given the following: • The random variable X has one of the following three density functions: f1 (x) = 1, 0 < x < 1 f2 (x) = 2x, 0< x < 1 f3 (x) = 3x2 , 0 < x < 1 • A random sample of two observations of X yields the values 0.50 and 0.60. Using the likelihood function, rank the three density functions from most likely to least likely based on the two observations. A. f1 (x), f2 (x), f3 (x) B. f1 (x), f3 (x), f2 (x) C. f2 (x), f1 (x), f3 (x) D. f2 (x), f3 (x), f1 (x)
E. f3 (x), f2 (x), f1 (x)
10.61 (4B, 5/98, Q.6) (2 points) You are given the following: • The random variables X has the density function α f(x) = , 0 < x < ∞, α > 0. (x +1)α + 1 •
A random sample of size n is taken of the random variable X.
Determine the limit of the maximum likelihood estimator of α, as the sample mean goes to infinity. A. 0
B. 1/2
C. 1
D. 2
E. ∞
10.62 (4B, 5/98, Q.20) (1 point) You are given the following: • The random variable X has the density function f(x) = e-x/θ /θ , 0 < x < ∞ , θ > 0. • θ is estimated by maximum likelihood based on a large random sample of size n. • p is the proportion of the observations in the sample that are greater than 1. • The probability that X is greater than 1 is estimated by the estimator exp(-1/θ) . Determine the estimator for the probability that X is greater than 1. n
∑ xi
A. i=1 n
n B. exp − n ∑ xi
[
]
i=1
C. p
D. - ln p
E. -1/ ln p
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 255 10.63 (4B, 5/99, Q.14) (2 points)You are given the following: •
Claim sizes follow a distribution with density function f(x) = e-x/θ / θ , 0 < x < ∞, θ > 0.
•
A random sample of 100 claims yields total aggregate losses of 12,500.
Using the maximum likelihood estimate of θ, estimate the proportion of claims that are greater than 250. A. Less than 0.11 B. At least 0.11, but less than 0.12 C. At least 0.12, but less than 0.13 D. At least 0.13, but less than 0.14 E. At least 0.14 10.64 (4B, 11/99, Q.22) (2 points) You are given the following: • The random variable X has the density function f(x) = wf1 (x)+(1-w)f2 (x), 0 < x < ∞, 0 ≤ w ≤1. • A single observation of the random variable X, yields the value 1. ∞
• ∫ x f1 (x) dx = 1 0
∞
• ∫ x f2 (x) dx = 2 0
• f2 (x) = 2f1 (x) ≠ 0 Determine the maximum likelihood estimate of w. A. 0 B. 1/3 C. 1/2 D. 2/3
E. 1
10.65 (3, 5/00, Q.28) (2.5 points) For a mortality study on college students: (i) Students entered the study on their birthdays in 1963. (ii) You have no information about mortality before birthdays in 1963. (iii) Dick, who turned 20 in 1963, died between his 32nd and 33rd birthdays. (iv) Jane, who turned 21 in 1963, was alive on her birthday in 1998, at which time she left the study. (v) All lifetimes are independent. (vi) Likelihoods are based upon the Illustrative Life Table in Appendix 2A of Actuarial Mathematics. Selected values of lx are as follows: l20 = 9,617,802
l21 = 9,607,896
l22 = 9,597,695
l23 = 9,587,169
l30 = 9,501,381
l31 = 9,486,854
l32 = 9,471,591
l33 = 9,455,522
l55 = 8,640,861
l56 = 8,563,435
l57 = 8,479,908
l58 = 8,389,826
Calculate the likelihood for these two students. (A) 0.00138 (B) 0.00146 (C) 0.00149 (D) 0.00156
(E) 0.00169
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 256 10.66 (4, 11/00, Q.6) (2.5 points) You have observed the following claim severities: 11.0 15.2 18.0 21.0 25.8 You fit the following probability density function to the data: f(x) =
exp[-
(x - µ)2 ] 2x , x > 0, µ > 0. 2πx
Determine the maximum likelihood estimate of µ. (A) Less than 17 (B) At least 17, but less than 18 (C) At least 18, but less than 19 (D) At least 19, but less than 20 (E) At least 20 10.67 (4, 11/00, Q.34) (2.5 points) Phil and Sylvia are competitors in the light bulb business. Sylvia advertises that her light bulbs burn twice as long as Philʼs. You were able to test 20 of Philʼs bulbs and 10 of Sylviaʼs. You assumed that the distribution of the lifetime (in hours) of a light bulb is ^
^
exponential, and separately estimated Philʼs parameter as θ P = 1000 and Sylviaʼs parameter as θ S = 1500 using maximum likelihood estimation. Determine θ*, the maximum likelihood estimate of θP restricted by Sylviaʼs claim that θS = 2 θP. (A) Less than 900 (B) At least 900, but less than 950 (C) At least 950, but less than 1000 (D) At least 1000, but less than 1050 (E) At least 1050
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 257 10.68 (4, 5/01, Q.16) (2.5 points) A sample of ten losses has the following statistics: 10
∑
X -2
10
= 0.00033674
∑X0.5
i=1
i=1
10
∑X - 1 = 0.023999
10
i=1
i=1
10
∑X - 0.5 = 0.34445
10
i=1
i=1
= 488.97
∑X = 31,939 ∑X 2 = 211,498,983
You assume that the losses come from a Weibull distribution with τ = 0.5. Determine the maximum likelihood estimate of the Weibull parameter θ . (A) Less than 500 (B) At least 500, but less than 1500 (C) At least 1500, but less than 2500 (D) At least 2500, but less than 3500 (E) At least 3500 10.69 (2 points) In the previous question, 4, 5/01, Q.16, instead assume that the losses come from an Inverse Gaussian distribution with θ = 4000. Determine the maximum likelihood estimate of the Inverse Gaussian parameter µ . (A) Less than 500 (B) At least 500, but less than 1500 (C) At least 1500, but less than 2500 (D) At least 2500, but less than 3500 (E) At least 3500 10.70 (4, 5/01, Q.30) (2.5 points) The following are ten ground-up losses observed in 1999: 18 78 125 168 250 313 410 540 677 1100 You are given: (i) The sum of the ten losses equals 3679. (ii) Losses are modeled using an exponential distribution with maximum likelihood estimation. (iii) 5% inflation is expected in 2000 and 2001. (iv) All policies written in 2001 have an ordinary deductible of 100 and a maximum covered loss of 1000. (The maximum payment per loss is 900.) Determine the expected amount paid per loss in 2001. (A) 256 (B) 271 (C) 283 (D) 306 (E) 371
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 258 10.71 (4, 11/01, Q.40 & 2009 Sample Q.79) (2.5 points) Losses come from a mixture of an exponential distribution with mean 100 with probability p and an exponential distribution with mean 10,000 with probability 1- p. Losses of 100 and 2000 are observed. Determine the likelihood function of p. ⎛ p e - 1 (1- p) e- .01⎞ ⎛ p e -20 (1- p) e- 0.2 ⎞ (A) ⎜ ⎟⎜ ⎟ ⎝ 100 10,000 ⎠ ⎝ 100 10,000 ⎠ ⎛ p e - 1 (1- p) e- .01⎞ ⎛ p e- 20 (1- p) e- 0.2 ⎞ (B) ⎜ ⎟ + ⎜ ⎟ ⎝ 100 10,000 ⎠ ⎝ 100 10,000 ⎠ ⎛p e - 1 (1- p) e- .01 ⎞ ⎛ p e- 20 (1- p) e- 0.2 ⎞ (C) ⎜ + + ⎟⎜ ⎟ ⎝ 100 10,000 ⎠ ⎝ 100 10,000 ⎠ ⎛p e - 1 ⎛ p e -20 (1- p) e- .01 ⎞ (1- p) e- 0.2 ⎞ (D) ⎜ + + ⎟ + ⎜ ⎟ ⎝ 100 10,000 ⎠ ⎝ 100 10,000 ⎠ ⎛ e-1 ⎛ e -20 e- .01 ⎞ e- 0.2 ⎞ (E) p ⎜ + + ⎟ + (1- p) ⎜ ⎟ ⎝ 100 10,000 ⎠ ⎝ 100 10,000 ⎠ 10.72 (4, 11/02, Q.10 & 2009 Sample Q. 37) (2.5 points) A random sample of three claims from a dental insurance plan is given below: 225 525 950 Claims are assumed to follow a Pareto distribution with parameters θ = 150 and α. Determine the maximum likelihood estimate of α. (A) Less than 0.6 (B) At least 0.6, but less than 0.7 (C) At least 0.7, but less than 0.8 (D) At least 0.8, but less than 0.9 (E) At least 0.9 10.73 (4, 11/03, Q.34 & 2009 Sample Q.26) (2.5 points) You are given: (i) Low-hazard risks have an exponential claim size distribution with mean θ. (ii) Medium-hazard risks have an exponential claim size distribution with mean 2θ. (iii) High-hazard risks have an exponential claim size distribution with mean 3θ. (iv) No claims from low-hazard risks are observed. (v) Three claims from medium-hazard risks are observed, of sizes 1, 2 and 3. (vi) One claim from a high-hazard risk is observed, of size 15. Determine the maximum likelihood estimate of θ. (A) 1
(B) 2
(C) 3
(D) 4
(E) 5
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 259 10.74 (4, 11/04, Q.6 & 2009 Sample Q.137) (2.5 points) You are given the following three observations: 0.74 0.81 0.95 You fit a distribution with the following density function to the data: f(x) = (p+1) xp , 0 < x < 1, p > -1 Determine the maximum likelihood estimate of p. (A) 4.0 (B) 4.1 (C) 4.2 (D) 4.3
(E) 4.4
10.75 (CAS3, 5/05, Q.18) (2.5 points) The following sample is taken from the distribution f(x, θ) =(1/θ)e-x/θ. Observation 1 2 3 4 5 6 7 x 0.49 1.00 0.47 0.91 2.47 5.03 16.09 Determine the Maximum Likelihood Estimator of c, where P(X > c) = 0.75. A. Less than 1.0 B. At least 1.0 but less than 1.2 C. At least 1.2 but less than 1.4 D. At least 1.4 but less than 1.6 E. 1.6 or more 10.76 (CAS3, 11/05, Q.1) (2.5 points) The following sample was taken from a distribution with probability density function f(x) = θxθ−1, where 0 < x < 1 and θ > 0. 0.21
0.43
0.56
0.67
0.72
Let R and S be the estimators of θ using the maximum likelihood and method of moments, respectively. Calculate the value of R - S. A. Less than 0.3 B. At least 0.3, but less than 0.4 C. At least 0.4, but less than 0.5 D. At least 0.5, but less than 0.6 E. At least 0.6
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 260 10.77 (CAS3, 11/06, Q.2) (2.5 points) Call center response times are described by the cumulative distribution function F(x) = xθ+1, where 0 ≤ x ≤ 1 and θ > -1. A random sample of response times is as follows: 0.56 0.83 0.74 0.68 0.75 Calculate the maximum likelihood estimate of θ. A. Less than 1.4 B. At least 1.4, but less than 1.6 C. At least 1.6, but less than 1.8 D. At least 1.8, but less than 2.0 E. At least 2.0 10.78 (CAS3, 5/07, Q.10) (2.5 points) Let Y1 , Y2 , Y3 , Y4 , ... ,Yn , represent a random sample from the following distribution with p.d.f. f(x) = e-x+θ, θ < x < ∞, -∞ < θ < ∞. Which one of the following is a maximum likelihood estimator for θ? n
A.
∑Yi 1 n
B.
∑Yi2 1
n
C. ∏ Yi 1
D. Minimum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ] E. Maximum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ]
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 261 10.79 (CAS3, 5/07, Q.11) (2.5 points) The proportion of allotted time a student takes to complete an exam, x, is described by the following distribution: f(x) = (θ + 1) xθ, 0 ≤ x ≤ 1 and θ > -1. A random sample of five students produced the following observations: Student Proportion of Allotted Time 1 0.92 2 0.79 3 0.90 4 0.65 5 0.86 Using the sample data, calculate the maximum likelihood estimate of θ. A. Less than 0 B. At least 0, but less than 1.0 C. At least 1.0, but less than 2.0 D. At least 2.0, but less than 3.0 E. At least 3.0 10.80 (CAS3, 11/07, Q.6) (2.5 points) Waiting times at a bank follow an exponential distribution with a mean equal to θ. The first five people in line are observed to have had the following waiting times: 10, 5, 21, 10, 7.
• θ^ A = Maximum Likelihood Estimator of θ • θ^ B = Method of Moments Estimator of θ Calculate θ^ A - θ^ B. A. Less than -0.6 B. At least -0.6, but less than -0.2 C. At least -0.2, but less than 0.2 D. At least 0.2, but less than 0.6 E. At least 0.6
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 262 10.81 (CAS3L, 5/08, Q.3) (2.5 points) You are given the following:
• A random sample of claim amounts: 8,000
10,000
12,000
15,000
• Claim amounts follow an inverse exponential distribution, with parameter θ. Calculate the maximum likelihood estimator for θ. A. Less than 9,000 B. At least 9,000, but less than 10,000 C. At least 10,000, but less than 11,000 D. At least 11,000, but less than 12,000 E. At least 12,000 10.82 (CAS3L, 11/08, Q.4) (2.5 points) You are given the following information:
• A random variable X has probability density function: f(x; θ) = θ xθ−1, where 0 < x < 1 and θ > 0. • A random sample of five observations from this distribution is shown below: 0.25
0.50
0.40
0.80
0.65
Calculate the maximum likelihood estimator for θ. A. Less than 1.00 B. At least 1.00, but less than 1.10 C. At least 1.10, but less than 1.20 D. At least 1.20, but less than 1.30 E. At least 1.30 10.83 (CAS3L, 5/09, Q.19) (2.5 points) You are given the following:
• A random variable, X, has the following probability density function: f(x, θ) = θ xθ−1, 0 < x < 1, 0 < θ < ∞.
• A random sample from this distribution is shown below: 0.10
0.25
0.50
0.60
0.70
Calculate the maximum likelihood estimate of θ. A. Less than 0.4 B. At least 0.4, but less than 0.6 C. At least 0.6, but less than 0.8 D. At least 0.8, but less than 1.0 E. At least 1.0
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 263 10.84 (CAS3L, 11/09, Q.18) (2.5 points) You are given the following five observations from an inverse exponential distribution: 3 9 13 33 51 The probability density function of the inverse exponential distribution is: f(x) =
θ e-θ/x / x2.
Calculate the maximum likelihood estimate for θ. A. Less than 10 B. At least 10, but less than 15 C. At least 15, but less than 20 D. At least 20, but less than 25 E. At least 25 10.85 (CAS3L, 5/11, Q.18) (2.5 points) You are given the following information:
• A distribution has density function: f(x) = (θ + 1)(1 - x)θ for 0 < x < 1
• You observe the following four values from this distribution: 0.05
0.10
0.20
0.50
Calculate the maximum likelihood estimate of the parameter θ. A. Less than 0.5 B. At least 0.5, but less than 1.5 C. At least 1.5, but less than 2.5 D. At least 2.5, but less than 3.5 E. At least 3.5
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 264 10.86 (CAS3L, 11/11, Q.18) (2.5 points) You are given the following information: ⎛ ⎝
• Mortality follows the survival function S(x) = ⎜1 -
x ⎞k ⎟ , 0 ≤ x ≤ 90, k > 0. 90 ⎠
• For a sample size of two, the deaths are recorded as one at age 10 and one at age 50. Calculate the maximum likelihood estimate of k. A. Less than 1.0 B. At least 1.0, but less than 1.5 C. At least 1.5, but less than 2.0 D. At least 2.0, but less than 2.5 E. At least 2.5 10.87 (2 points) In the previous question, CAS3L, 11/11, Q.18, calculate the method of moments estimate of k.
10.88 (CAS3L, 11/11, Q.19) (2.5 points) You are given the following five observations: 2.3 3.3 1.2 4.5 0.7 A uniform distribution on the interval [a,b] is fit to these observations using maximum likelihood estimation. This produces parameter estimates a^ and b^ . Calculate b^ - a^ . A. Less than 4.0 B. At least 4.0, but less than 4.2 C. At least 4.2, but less than 4.4 D. At least 4.4, but less than 4.6 E. At least 4.6 10.89 (CAS3L, 5/12, Q.18) (2.5 points) You are given a distribution with the following probability density function where α is unknown: f(x; α) = (1 +
1 1/α ) x , 0 < x < 1, α > 0. α
You are also given a random sample of four observations: 0.2 0.5 0.6 0.8 Estimate α by the maximum likelihood method. A. Less than 2.9 B. At least 2.9, but less than 3.0 C. At least 3.0, but less than 3.1 D. At least 3.1, but less than 3.2 E. At least 3.2
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 265 Solutions to Problems: 10.1. E. For the Pareto, f(x) = (αθα)(θ + x) − (α + 1). For each given set of parameters, compute the likelihood for each of the four values of claim sizes observed and multiply the result: Size
Pareto 1 10
Pareto 1.5 12
Pareto 2 15
Pareto 2.5 18
Pareto 3 20
2 5 7 10
0.0694 0.0444 0.0346 0.0250
0.0850 0.0523 0.0396 0.0275
0.0916 0.0563 0.0423 0.0288
0.0961 0.0589 0.0440 0.0296
0.1025 0.0614 0.0452 0.0296
0.00000267
0.00000484
0.00000627
0.00000736
0.00000842
Comment: Note that one can work with the sum of the loglikelihoods instead.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 266 10.2. B. Maximize the loglikelihood. ln f(x) = ln(q+2) + ln(q+1) + q ln(x) + ln(1-x).
Σ ln f(x) = 3ln(q+2) + 3ln(q+1) + q {ln(.2) +ln(.3)+ln(.6)} + ln(1-.2) +ln(1-.3)+ ln(1-.4) Setting the partial derivative with respect to q of the sum of the loglikelihoods equal to zero: 0 = 3/(q+2) + 3 /(q+1) + {ln(.2) + ln(.3) + ln(.6)}.
⇒ 0 = 3(q+1) + 3(q+2) - 3.32421(q+1)(q+2) ⇒ q2 + 1.195q - .7074 = 0. Thus q = {-1.195 ± 1.1952 + (4)(0.7074) }/2 = 0.434 or -1.629. However, for q ≤ -1, the density function doesn't integrate to unity over [0,1]; for q ≤ -1 this integral is infinite. Therefore q = 0.434 rather than -1.629. Comment: This is a Beta Distribution, with a = q+1, b = 2, θ =1. One can verify numerically that q = 0.434 corresponds to the maximum likelihood: q -0.100 -0.050 0.000 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.434 0.450 0.500 0.500 0.550
f(.2) 1.6069 1.6062 1.6000 1.5889 1.5733 1.5537 1.5307 1.5047 1.4759 1.4449 1.4120 1.3887 1.3775 1.3416 1.3416 1.3048
f(.3) 1.3502 1.3772 1.4000 1.4187 1.4336 1.4448 1.4525 1.4570 1.4585 1.4571 1.4531 1.4489 1.4466 1.4378 1.4378 1.4269
f(.6) 0.7198 0.7602 0.8000 0.8393 0.8780 0.9160 0.9534 0.9901 1.0261 1.0612 1.0956 1.1185 1.1292 1.1619 1.1619 1.1938
Likelihood 1.5617 1.6815 1.7920 1.8919 1.9802 2.0564 2.1199 2.1707 2.2088 2.2344 2.2480 2.2506 2.2500 2.2413 2.2413 2.2224
10.3. E. The Method of Maximum Likelihood is equal to the Method of Moments for the Exponential Distribution fit to ungrouped data. Thus 1/q = 107.2, q = 1 / 107.2 = 0.0093. Applying the Method of Maximum Likelihood, the density function for this Exponential Distribution is f(x) = qe-qx. The loglikelihood is Σ lnf(xi) = Σ {lnq - qxi}. To maximize this, set the partial derivative with respect to q equal to zero. Σ{1/q - xi} = 0. ⇒ N/q - Σxi = 0. ⇒ q = N / Σxi = 6 / 643.19 = 0.0093.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 267 10.4. D. Maximize the loglikelihood. f(x) = θ−αxα−1 e−x/θ / Γ(α) = θ−8x7 e−x/θ / Γ(8). ln f(x) = -8ln(θ) +7ln(x) -x/θ - ln(7!). loglikelihood = Σ ln f(xi) = -80ln(θ) + 7Σln(xi) - (1/θ)Σxi - 10ln(5040). Setting the partial derivative with respect to θ of the loglikelihood equal to zero: 0 = 80/θ - (1/θ2)Σxi. Therefore, θ = Σxi / 80 = 36900 / 80 = 461. Comment: For the Gamma distribution with α known, the maximum likelihood estimate of θ is: {Σxi /n}/ α = observed mean/ α = method of moments estimator. 10.5. C. With variance v, the density of the Normal Distribution is: f(x) = exp(-.5(xi - µ)2 /v) 2 πv . With variance 1/(θmi), the density of the Normal Distribution is: f(xi) = exp(-.5(xi - µ)2 θmi)
(θmi) / (2π)
ln f(xi) = -.5(xi - µ)2 θmi + .5ln(θ) + .5ln(mi) - .5ln(2π).
Σ∂ln f(xi)/∂µ = θΣ(xi - µ)mi = 0. ⇒ µ = Σximi/Σmi = {(0)(1) + (3)(2) + (8)(3)}/(1 + 2 + 3) = 5. Σ∂ln f(xi)/∂θ = -.5Σ(xi - µ)2 mi + .5Σ(1/θ) = 0. ⇒ θ = n/{Σ(xi - µ)2 mi} = 3/ {(0 - 5)2 (1) + (3 - 5)2 (2) + (8 - 5)2 (3)} = 3/60 = 0.05. Comment: See Exercise 15.60 in Loss Models. 10.6. B. The density is 1/(b - 1). The likelihood is: f(1.7)f(3.1)f(3.4)f(4.6) = 1/(b - 1)4 . Since we observe a loss of size 4.6, b ≥ 4.6. For b ≥ 4.6, the likelihood is maximized for b = 4.6.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 268 10.7. A. & 10.8. E. f(x) = (1/σ) (1/ 2 π ) exp(-.5{(x-µ)/σ}2 ). ln f(x) = -.5{(x-µ)2 /σ2} - ln(σ) - (1/2)ln(2π). Σ ln f(xi) = -.5{Σ(xi-µ)2 /σ2} - nln(σ) - (n/2)ln(2π). Set the partial derivatives of the sum of loglikelihoods equal to zero.
∂Σ ln f(xi) / ∂σ = Σ(xi-µ)2 /σ3 - n/σ = 0. ∂Σ ln f(xi) / ∂µ = Σ(xi-µ)/σ2 = 0. Therefore Σ(xi-µ) = 0. µ = (1/n)Σxi = (6.02 + 7.56 + 7.88 + 8.42 + 8.72) / 5 = 7.72. Therefore σ =
∑ (xi - µ)2 / n =
0.886 = 0.94.
Comment: Notice that for the Normal Distribution the Method of Moments and the Method of Maximum Likelihood applied to ungrouped data give the same result. 10.9. E. For an Exponential Distribution, ln f(x) = -x/θ - ln(θ). Assuming θB = θE/1.4, and θG = θE /1.7, then the loglikelihood is:
Σ(-xi 1.4/θE - ln(θE/1.4)) + Σ(-xi/θE - ln(θE)) + Σ(-xi 1.7/θE - ln(θE/1.7)) = Bay
Empire
Granite
-200000(1.4)/θE - 200ln(θE/1.4) - 500000/θE - 400ln(θE) - 75000(1.7)/θE - 100ln(θE/1.7) = -907500/θE - 700ln(θE) + 200ln(1.4) + 100ln(1.7) . Setting the partial derivative of the loglikelihood with respect to θE equal to zero: 0 = 907500/θE2 - 700/θE. θE = 907500/ 700 = 1296. Comment: Similar to 4, 11/00, Q. 34. One could just multiply the losses observed for Bay State by 1.4 and those for Granite State by 1.7, in order to get them up to the level of Empire State. Then for the Exponential Distribution, the method of maximum likelihood equals the method of moments: ((1.4)(200000) + 500000 + (1.7)(75000))/(200 + 400 + 100) = 1296. Instead of states with different cost levels, it could have been years with different cost levels due to inflation.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 269 (lnx - µ)2 ] 2σ2 . x σ 2π
exp[10.10. A. & 10.11. E. f(x) =
ln f(x) = -.5{(ln(x)-µ)2 /σ2} - ln(σ) - ln(x) - (1/2)ln(2π). Σ ln f(xi) = -.5{Σ(ln(xi)-µ)2 /σ2} - nln(σ) - Σln(xi) - (n/2)ln(2π). Set the partial derivatives of the sum of loglikelihoods equal to zero.
∂Σ ln f(xi) / ∂σ = Σ(ln(xi)-µ)2 /σ3 - n/σ = 0. ∂Σ ln f(xi) / ∂µ = Σ(ln(xi)-µ)/σ2 = 0. Therefore Σ(ln(xi)-µ) = 0. µ = (1/n)Σln(xi) = (6.02 + 7.56 + 7.88 + 8.42 + 8.72) / 5 = 7.72. Therefore σ =
∑ (xi - µ)2 / n =
0.886 = 0.94.
Comment: Notice that for the LogNormal Distribution the Method of Maximum Likelihood gives the same result as the Method of Maximum Likelihood applied to the Normal Distribution and the log of the claim sizes. In general, the Method of Maximum Likelihood is invariant under such changes of variables. In particular, if a set of parameters maximizes the likelihood of a LogNormal Distribution, then they also maximize the likelihood of the corresponding Normal, as seen for example in this pair and the previous pair of questions. In addition, since in the case of the Normal the Method of Maximum Likelihood and the Method of Moments (applied to ungrouped data) produce the same result, applying the Method of Maximum Likelihood to the LogNormal is the same as applying the Method of Moments to the underlying Normal.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 270 10.12. E. and 10.13. C. Setting up the maximum likelihood equations for the Pareto in terms of: v = { Σ ln(1 +xi / θ) }/n, and w = { Σ 1/ (1+xi / θ) }/n. The first step is to write down the likelihood function: f(xi) = α θα / (xi +θ)α+1. Then write the sum of the log likelihoods: Σ {ln(α) + α ln(θ) - (α+1) ln(xi +θ)} Then take the partial derivatives with respect to the two parameters α and θ and set them equal to zero: Σ {1/α + ln(θ) - ln (xi +θ) } = 0, and Σ {α/θ − (α+1) / (xi +θ) } = 0. The second equation becomes: nα/θ = (α+1) Σ{1 / (xi +θ)}. α/(α + 1) = (1/n) Σ{1/(1 + xi / θ)} = w.
Thus 1 + 1/α = 1/w
The first equation becomes: Σ1/α = Σ { ln (xi +θ) - ln(θ)}. 1/α = (1/n)Σ ln {(1 + xi / θ} = v. Thus α = 1/v. So the solution to the second question is C. Putting this into the second equation: 1+ v = 1/w. So the solution to the first question is E (none of the above). Comment: In the case of the ungrouped data in Section 2, the maximum likelihood fit was determined to be α = 1.702 and θ = 240,151. Given θ = 240,151, one can calculate v = .5877 and w = .6298, and verify that in this case, α = 1/v and 1 + v = 1/w.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 271 10.14. A. This is a Pareto Distribution with θ fixed at 1000. The density function is: f(x) = (αθα)(θ + x) − (α + 1) = (α1000α)(1000 + x)−(α + 1). The log likelihood is: Σ { ln(α) + α ln(1000) − (α+1)ln(1000 + xi)}. The derivative with respect to α is: Σ { (1/α) + ln(1000) − ln(1000 + xi)}. Setting this derivative equal to zero: 0 = (n/α) - Σ ln{(1000 +xi)/1000}. Solving for alpha: α = n / Σ ln {(1000 +xi)/1000} = 10 / 4.151 = 2.41. x = Size of Claim 1729 101 384 121 880 3043 205 132 214 82 SUM
1 +(x/1000) 2.729 1.101 1.384 1.121 1.880 4.043 1.205 1.132 1.214 1.082
ln [1 +(x/1000)] 1.004 0.096 0.325 0.114 0.631 1.397 0.186 0.124 0.194 0.079 4.151
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 272 10.15. A. & 10.16. B. f(x) = (θ/ 2πx3 ).5 exp[- θ({x − µ} / µ)2 / 2x]. ln f(x) = .5 ln(θ) -.5ln(2π) - 1.5ln(x) - θ({x − µ} / µ)2 / 2x. Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ /2) Σ(xi /µ2 - 2/µ + 1/ xi). Set the partial derivatives of the loglikelihood equal to zero.
∂Σ ln f(xi) / ∂µ = -(θ /2)Σ(-2xi /µ3 + 2/µ2 ) = 0. ∂Σ ln f(xi) / ∂θ = n/(2θ) - (1/2)Σ(xi /µ2 - 2/µ + 1/xi) = 0. The first equation is: Σ 2/µ2 = Σ2xi /µ3. Therefore, n µ = Σ xi. ⇒ µ = Σ xi /n. The second equation is: θ = n / Σ(xi /µ2 - 2/µ + 1/xi ) = n /{nE[X]/µ2 - 2n/µ + nE[1/X]} = 1 / {E[X]/E[X]2 - 2/E[X] + E[1/X]} = 1 / {E[1/X] -1/E[X]}.
Average
X
1/X
1500 5500 3000 3300 2300
0.00066667 0.00018182 0.00033333 0.00030303 0.00043478
3120
0.00038393
Therefore, µ = 3120. θ = 1 / {E[1/X] - 1/E[X]} = 1/(.00038393 - 1/3120) = 15,769. Comment: For the Inverse Gaussian, one can solve for the maximum likelihood parameters in closed form. The parameter µ, which is equal to the mean of Inverse Gaussian, is set equal to the observed mean. The fitted parameter θ is a function of the observed mean and the observed negative first moment. 10.17. A. Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ /2) Σ(xi /µ2 - 2/µ + 1/ xi). Set the partial derivative of the loglikelihood with respect to θ equal to zero:
∂Σ ln f(xi) / ∂θ = n/(2θ) - (1/2)Σ(xi /µ2 - 2/µ + 1/xi) = 0. θ = n / (Σxi /µ2 - 2n/µ + Σ1/xi ) = 5/(15600/40002 - 10/4000 + .001920) = 12,670. Comment: When the parameter µ is fixed, the fitted value of θ is different than when both µ and θ are fit.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 273 10.18. B. & 10.19. E. Convert the payments to the losses prior to the effect of the coinsurance factor of 90%: (3236, 3759, 10769, 22832, 28329, 36703, 72369) / .9 = 3595, 4177, 11965, 25369, 31477, 40781, 80410. (lnx - µ)2 exp[] 2 2σ f(x) = . x σ 2π ln f(x) = -0.5{(ln(x)-µ)2 /σ2} - ln(σ) - ln(x) - (1/2)ln(2π). Σ ln f(xi) = -.5{Σ(ln(xi)-µ)2 /σ2} - nln(σ) - Σln(xi) - (n/2)ln(2π). Set the partial derivatives of the sum of loglikelihoods equal to zero:
∂Σ ln f(xi) / ∂σ = Σ(ln(xi)-µ)2 /σ3 - n/σ = 0. ∂Σ ln f(xi) / ∂µ = Σ(ln(xi)-µ)/σ2 = 0. ⇒ Σ(ln(xi)-µ) = 0. ⇒ µ = (1/n)Σln(xi) = {ln(3595)+ln(4177)+ln(11965)+ln(25369)+ln(31477)+ln(40781)+ln(80410)}/ 7 = 9.76. Therefore, σ = ( Σ(ln(xi)-µ)2 / n )0.5 = 1.175 = 1.08. Comment: In general, fitting a LogNormal Distribution via Maximum Likelihood is equivalent to fitting a Normal Distribution to the log sizes via the Method of Moments. The original losses are all -ln.9 more than the logs of the payments. Therefore, if one fits a Normal Distribution via Methods of Moments to the log payments, getting a mean of 9.655, one needs to add -ln.9 in order to get the mean of the logs of the original losses. (Adding a constant to a Normal Distribution gives another Normal Distribution; one adds a constant to the mean and leaves the standard deviation the same.) 10.20. B. f(x) = τxτ−1θ−τ exp(-(x/θ)τ). ln f(x) = lnτ + (τ - 1) ln x - τlnθ - (x/θ)τ. Set the partial derivative with respect to θ of the loglikelihood equal to zero:
Σ ∂ln f(xi) / ∂θ = Σ −τ/θ + τxτ/θτ+1 = 0. Nτ/θ = (τ/θτ+1)Σ xτ. θ = {(1/N)Σ xτ }1/τ = (3832632/200)1/1.5 = 716. Comment: Similar to 4, 5/01, Q.16.
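As a rough numerical check (a minimal Python sketch; not needed for the exam), the closed form for a Weibull with τ fixed, θ = {(1/N)Σ xiτ}1/τ, can be evaluated from the given summary statistic:
sum_x_15 = 3_832_632.0                     # the given sum of xi^1.5 over the 200 losses
theta = (sum_x_15 / 200.0) ** (1.0 / 1.5)
print(round(theta))                        # 716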
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 274 10.21. B. & 10.22. D. For the Gamma Distribution, the log density is: ln f(x) = -αlnθ + (α−1)lnxi - (x/θ) - ln(Γ(α)). Setting equal to zero the partial derivative of Σ ln (f(xi)) with respect to α one obtains: 0 = Σ {-lnθ + lnxi - ψ(α) } = -n lnθ - n ψ(α) + Σ lnxi. Setting equal to zero the partial derivative of Σ ln (f(xi)) with respect to θ one obtains: 0 = Σ{-α/θ + xi θ−2} = -nα/θ - θ−2Σ xi. This implies that: θ = {(1/n)Σxi} / α. Substituting for θ in the first equation one obtains: ψ(α) - lnα = (1/n)Σ ln xi - ln ((1/n)Σxi). Substituting in the particular values for this data set of 1000 points: ψ(α) - ln(α) = (1/n) Σln(xi) - ln( (1/n) Σxi ) = 4800/1000 - ln(150000/1000) = -.211. Interpolating in the above table of values of: ln(y) - ψ(y) gives α = 2.52. Therefore, θ = (Σxi / n)/ α = 150 / 2.52 = 59.5. Comment: Beyond what you are likely to be asked on the exam. 10.23. C. The loglikelihood for the combined data is the sum of the loglikelihoods for the individual sets of data: Negative Loglikelihoods for combined data: Theta
          τ = 0.3    τ = 0.5    τ = 0.7    τ = 0.9    τ = 1.1
3000      2574.05    2508.78    2523.96    2615.50    2807.97
5000      2572.10    2495.45    2481.58    2513.26    2592.29
7000      2573.99    2496.44    2476.13    2491.75    2538.79
9000      2576.88    2501.39    2481.04    2492.65    2529.02
11000     2580.03    2507.60    2489.55    2501.58    2535.19
The best loglikelihood is for θ = 7000 and τ = 0.7. S(22000) = Exp[-(22000/7000).7] = 10.8%. Comment: Since we are only shown a grid of parameter values, we only know that the actual maximum likelihood would occur somewhere near θ = 7000 and τ = 0.7.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 275 10.24. C. f(x) = (αθα)(θ + x)−(α + 1) = α(1 + x)−(α + 1). ln[f(x)] = ln(α) − (α+1)ln(1 + x). Let α be the parameter for Region 1, and a be the parameter for Region 2. Mean for region 1 is 1/(α-1). Mean for region 2 is 1/(a-1). We are given: 1/(a-1) = 0.5/(α-1). ⇒ a = 2α - 1. Loglikelihood is: n ln(α) − (α+1)Σln(1 + xi) + m ln(a) − (a+1)Σln(1 + yi) = n ln(α) − (α+1)Σln(1 + xi) + m ln(2α - 1) − (2α)Σln(1 + yi). Set the derivative with respect to α equal to zero: n/α - Σln(1 + xi) + 2m/(2α - 1) - 2Σln(1 + yi) = 0. Comment: Similar to 4, 11/04, Q.18, involving a Single Parameter Pareto Distribution. 10.25. D. The density of the Poisson is: e−µµn /n!. The density of the Exponential is: e-x/(10µ)/(10µ). Therefore, the contributions to the likelihood are: Year 1: {e−µµ2 /2!} {e-10/(10µ)/(10µ)} { e-70/(10µ)/(10µ)}. Year 2: e−µ. Year 3: { e−µµ3 /3!} {e-20/(10µ)/(10µ)} {e-30/(10µ)/(10µ)} {e-50/(10µ)/(10µ)}. Therefore, the likelihood is proportional to: e−3µµ5 e-18/µ/µ5 = e−3µ e-18/µ. Therefore, ignoring constants, the loglikelihood is: -3µ - 18/µ. Setting the derivative equal to zero: 0 = -3 + 18/µ2 . ⇒ µ =
6 = 2.45.
Comment: Setup taken from 4, 11/03, Q.11, a Buhlmann Credibility Question.
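As a rough numerical check (a minimal Python sketch, assuming SciPy is available; not needed for the exam), maximizing the loglikelihood -3µ - 18/µ numerically reproduces the closed form µ = √6:
from scipy.optimize import minimize_scalar
res = minimize_scalar(lambda mu: 3 * mu + 18 / mu, bounds=(0.01, 100), method="bounded")
print(round(res.x, 2))   # about 2.45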
10.26. B. f(x) =
⎛ θ ⎞τ τ θτ exp ⎜ ⎟ . ⎝ x⎠ xτ + 1
∂lnf(x) = τ/θ - τθτ−1/xτ. ∂θ ⎛ θ= ⎜ ⎜ ⎝
[ ]
lnf(x) = ln[τ] + τ ln[θ] - (τ+1)ln[x] - (θ/x)τ.
Setting the partial derivative equal to zero: τN/θ = τθτ−1
1/ 4 ⎞ 1/ τ 5 ⎛ ⎞ ⎟ = = 42.07. ⎝ 29 - 4 + 55 - 4 + 61- 4 + 182 - 4 + 270 - 4 ⎠ xi- τ ⎟⎠
N
∑
Comment: For a Weibull Distribution, with τ fixed: θ =
(∑xi τ / N) 1 / τ .
∑ xi-τ . ⇒
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 276 10.27. E. f(x) = x2 e-x/θ / (2θ3). ln f(x) = 2ln(x) - x/θ - 3ln(θ) - ln(2). Mean for Minas Morgul is 1.5 times that for Orthanc. ⇒ 3θM = 3(1.5θO). ⇒ θM = 1.5θO. Loglikelihood is: 2 Σ ln(xi) - Σ xi / θO - 12{3ln(θO)} - 12ln(2) + 2 Σ ln(xi) - Σ xi / (1.5θO) - 7{3ln(1.5θO)} - 7ln(2). Orthanc
Orthanc
M.M.
M.M.
Set the derivative of the loglikelihood with respect to θO equal to zero: 0 = Σ xi / θO2 - 36/θO + Σ xi / (1.5θO2) - 21/θO. Orthanc
M.M.
57/θO = 8200 / θO2 + 3100 / (1.5θO2). θO = {8200 + (3100 / 1.5)}/57 = 180.1. Alternately, bring the value of the treasure from Minas Morgul to the Orthanc level: 3100/1.5 = 2066.7. For the Gamma with alpha fixed, method of moments is equal to the method of maximum likelihood: 3θ = (8200 + 2066.7)/(12 + 7). ⇒ θ = 180.1. Comment: This trick of adjusting the losses applies to the Gamma with alpha fixed, including its special case the Exponential. It does not apply to severity distributions in general. 10.28. C. f(x) = exp[-.5 ({ln(x)−µ} /σ)2] /{xσ 2 π }. ln f(x) = -.5 ({ln(x)−µ} /σ)2 - ln(x) - ln(σ) - ln(2π)/2 = -1.3889({ln(x)−µ}2 - ln(x) - ln(0.6) - ln(2π)/2. Setting the partial derivative of the loglikelihood with respect to µ equal to zero: 0 = 2.7778Σ{ln(xi)−µ}. µ = Σln(xi)/5 = {ln(11) + ln(17) + ln(23) + ln(38) + ln(54)}/5 = 3.199. S(75) = 1 - Φ[(ln(75) - 3.199)/.6] = 1 - Φ[1.86] = 3.1%. Alternately, take the logs of the losses and fit to a Normal, via maximum likelihood which is the same as the method of moments. µ = average of log losses = {ln(11) + ln(17) + ln(23) + ln(38) + ln(54)}/5 = 3.199. S(75) = 1 - Φ[(ln(75) - 3.199)/.6] = 1 - Φ[1.86] = 3.1%.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 277 10.29. A. For an Exponential with mean µ, f(x) = e-x/µ/µ and ln f(x) = -x/µ - lnµ. Thus the loglikelihood is: {-100/(.8θ) - ln(.8θ)} + {-200/(.8θ) - ln(0.8θ)} + {-50/θ - ln(θ)} + {-300/θ - ln(θ)} + {-150/(1.5θ) - ln(1.5θ)} + {-400/(1.5θ) - ln(1.5θ)}= -1091.67/θ - 6lnθ - 2ln.8 - 2ln1.5. Setting the derivative equal to zero: 0 = 1091.67/θ2 - 6/θ. ⇒ θ = 1091.67/6 = 182. Alternately, convert all of the data to the medium-hazard level (which has mean θ.) 100 and 200 from 0.8θ ⇔ 100/.8 = 125 and 250 from θ. 150 and 400 from 1.5θ ⇔ 150/1.5 = 100 and 266.67 from θ. For the Exponential, Maximum Likelihood Method of Moments: θ = (125 + 250 + 50 + 300 + 100 + 266.67)/6 = 182. Comment: Similar to 4, 11/03 Q.34. 10.30. B. f(x) = (θ/x)3 e-θ/x/(x 2). ln f(x) = 3ln(θ) - 4ln(x) - θ/x - ln(2). Setting the partial derivative of the loglikelihood with respect to θ equal to zero: 0 = N 3/θ - Σ1/xi. θ = (6)(3)/Σ1/xi = 18/(1/9 + 1/15 + 1/25 + 1/34 + 1/56 + 1/90) = 65.2. mean = θ/(α - 1) = 65.2/2 = 32.6. 10.31. D. Method of moments: Mean = 1212/10 = 121.2 = exp[µ + σ2/2]. Second Moment = 260,860/10 = 26,086 = exp[2µ + 2σ2]. Dividing the second equation by the square of the first equation: exp[2µ + 2σ2]/exp[2µ + σ2] = exp[σ2] = 26,086/121.22 = 1.7758.
⇒ σ = 0.758. ⇒ µ = 4.510. S(100) = 1 - Φ[(ln(100) - 4.510)/0.758] = 1 - Φ[0.13] = 44.83%. Fitting the LogNormal via the method of maximum likelihood is equivalent to fitting a Normal Distribution via the method of moments to ln(xi): µ = E[ln(xi)] = 42.5536/10 = 4.255. σ=
E[ln(xi)2 ] - E[ln(xi )]2 = 193.948 / 10 - 4.255362 = 1.134.
S(100) = 1 - Φ[(ln(100) - 4.255)/1.134] = 1 - Φ[0.31] = 37.83%. Absolute difference in the two estimates of S(100) is: |44.83% - 37.83%| = 7.00%.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 278 10.32. E. f(x) = θx-2e-θ/x, x > 0. ln f(x) = ln(θ) - 2ln(x) - θ/x. The loglikelihood is equal to: N ln(θ) - 2 Σ lnxi - θ Σ 1/xi. Setting the partial derivative of the loglikelihood with respect to θ equal to zero: 0 = N/θ - Σ1/xi. ⇒ θ = N/Σ1/xi = 5/(1/19 + 1/45 + 1/64 + 1/186 + 1/370) = 50.7. Alternately, if X is Inverse Exponential with parameter θ, then 1/X is Exponential with mean 1/θ. Fitting an Exponential via maximum likelihood is the same as the method of moments. Fitting an Exponential distribution with mean 1/θ via the method of moments to 1/X: 1/θ = (1/19 + 1/45 + 1/64 + 1/186 + 1/370)/5. ⇒ θ = 50.7. 10.33. B. Since we want to estimate θ for Year 3, inflate all of the data to the cost level of Year 3: 10000(1.1)2 = 12100. 12500(1.1) = 13750. The total inflated losses are: (100)(12100) + (200)(13750) = 3,960,000. Average claim size = 3,960,000 /(100+200) = 13,200. For the Gamma with alpha fixed, maximum likelihood is equal to method of moments. Set the observed and theoretical means equal: 3θ = 13,200. ⇒ θ = 13,200/3 = 4,400. Alternately, f(x) = x2 e-x/θ /(2 θ3). ln f(x) = 2 lnx - x/θ - ln(2) - 3 ln(θ). loglikelihood is: 2 Σ lnxi - Σxi/θi - N ln(2) - 3 Σ ln(θi), where the thetas differ by year. Let θ = the year 3 theta. Let yi be the loss xi inflated to the year 3 level. Then xi/θi = yi/θ. Also, θi = θ/1.1c, where c is either 1 or 2. ⇒ ln(θi) = ln(θ/1.1c) = ln(θ) + constants. loglikelihood is: 2 Σ lnxi - Σyi/θ - N ln(2) - 3 Σ ln(θi) = -3,960,000/θ - (3)(300) ln(θ) + constants. Setting the derivative of the loglikelihood with respect to θ equal to zero: 0 = 3,960,000/θ2 - 900/θ. ⇒ θ = 3,960,000 / 900 = 4,400.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 279 10.34. A. f(x) =
τθ x τ − 1 . (x + θ ) τ + 1
lnf(x) = ln[τ] + ln[θ] + (τ-1)ln[x] - (τ+1)ln[x+θ]. ∂lnf(x) = 1/τ + ln[x] - ln(x+θ). ∂τ Setting the partial derivative equal to zero: N/τ = τ=
∑
N ln[1 + θ / xi ]
∑ ( ln[xi + θ] - ln[xi] ) = ∑ ln[1 + θ / xi ]. ⇒
=
5 = 2.63. ln[1 + 30 / 32] + ln[1 + 30 / 45] + ln[1 + 30 / 71] + ln[1 + 30 / 120] + ln[1 + 30 / 178] Comment: For a Pareto Distribution, with θ fixed: α =
N . θ + xi ∑ ln[ θ ]
10.35 .C. ln f(x) = ln[p+1] + 2 ln[x] + p ln[1 - (x/6)3 ] - ln[72]. ∂ln[f(x)] = 1/(p+1) + ln[1 - (x/6)3 ]. ∂p 0=
i)] = 3/(p+1) + ln[1 - 1/216] + ln[1 - 1/27] + ln[1 - 1/8]. ⇒ ∑ ∂ln[f(x ∂p
3/(p+1) = 0.1759. ⇒ p = 16.05. Comment: A Generalized Beta Distribution as per Appendix A of Loss Models, with a = 1, b = p+1, θ = 6, and τ = 3.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 280
[(
x - µ)2 2σ2 σ 2π
exp 10.36. B. For the Normal Distribution, f(x) =
ln f(x) = -
(x
]
.
- µ)2 - ln(σ) - ln(2π)/2. 2σ2
The first Normal for the self-fertilized plants has parameters: µ/1.4 and σ/1.4. The second Normal for the cross-fertilized plants has parameters: µ and σ. (The Normals have the same CV, but the mean of the second is 1.4 times the mean of the first.) Let the two samples be xi and yi. Then the loglikelihood over the two samples combined is: 10
-
∑
(xi -
µ / 1.4)2 2 (σ / 1.4)2
i=1
10
∑
(y i - µ)2
i=1
2 σ2
- 20ln(σ) + constants.
Setting the partial derivative with respect to µ equal to zero: 10
0=
∑ i=1
(y i - µ) σ2
10
+
∑
(1.4x i - µ) σ2
i=1
. ⇒ 20 µ =
10
10
i=1
i=1
∑ 1.4xi + ∑ yi = (1.4)(481) + 707. ⇒ µ = 69.02.
Setting the partial derivative with respect to σ equal to zero: 10
0=
∑ i=1
(1.4x i - µ)2 σ3
10
+
∑
(y i - µ)2
i=1
σ3
- 20/σ. ⇒
20σ2 = 10
10
∑ (1.4x i - µ)2 + ∑ (y i - µ)2 i=1
+ = (63 - 69.02)2 + (64 - 69.02)2 + ... + {(1.4)(60) - 69.02}2 .
i=1
σ2 = 64.4796. ⇒ σ = 8.03. Alternately, put the self-fertilized heights on the cross-fertilized level by multiplying by 1.4: 49, 54.6, 63, 65.8, 67.2, 70, 71.4, 72.8, 75.6, 84. Now fit to the combined sample. For the Normal Distribution, method of moment is equal to maximum likelihood. µ = (63 + 64 + ... + 82 + 49 + 54.6 + ... + 84) / 20 = 69.02. σ2 = {(63 - 69.02)2 + (64 - 69.02)2 + .. + (84 - 69.02)2 } / 20 = 64.4796. ⇒ σ = 8.03. Comment: One can get the mean and variance of the combined sample using the stat functions of the calculator. However, be careful; here we want the biased estimator of the variance rather than the sample variance.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 281 10.37. B. f(x) = exp[-(x - θ)2 /2]
2 / π , for x ≥ θ. ln f(x) = -(x - θ)2 /2 - ln(2/π)/2.
loglikelihood is: -Σ(xi - θ)2 /2 - n ln(2/π)/2. Set the partial derivative with respect to θ equal to zero: Σ(xi - θ) = 0. ⇒ θ = X . Since x ≥ θ, θ ≤ min(X1 , X2 , . . . , Xn ) ≤ X . The loglikelihood increases as θ increases towards X .
⇒ The maximum likelihood occurs for the largest possible θ, min(X1 , X2 , . . . , Xn ). Comment: A case where one has to carefully check the endpoints, in order to find the maximum. If y has a Normal Distribution with mean 0 and variance 1, then x = θ + |y| has the given distribution. 10.38. A. f(x) = exp[-(x-4)/β]/β. ln f(x) = -(x-4)/β - lnβ. Loglikelihood is: -Σ(xi - 4)/β - n lnβ. Set the partial derivative with respect to β equal to zero:
Σ(xi - 4)/β2 - n/β = 0. ⇒ β^ = Σ(xi - 4)/n = {(8.2 - 4) + (9.1 - 4) + (10.6 - 4) + (4.9 - 4)}/4 = 4.2. Comment: Let y = x - 4, then y follows an Exponential Distribution.
⇒ Maximum likelihood equals the Method of Moments: β^ = Y = 4.2. 10.39. A. ln f(x) = ln(θ + 1) + θ ln x. loglikelihood is: n ln(θ + 1) + θ Σln xi. Set the partial derivative of the loglikelihood with respect to θ equal to zero: 0 = n/(θ + 1) + Σln xi. ⇒ n/(θ + 1) = -Σln xi. ⇒ (θ + 1)/n = -1/Σln xi. ⇒ θ^ = -1 - n/Σln xi. 10.40. B. The log density is: ln f(x;q) = ln(q) - qx1/2 - ln(2) - ln(x1/2). Thus the loglikelihood for the four observed values is: ln f(1) + ln f(4) + ln f(9) + ln f(64) = 4 ln(q) - q(1 + 2 + 3 + 8) - 4ln(2) - ln((1)(2)(3)(8)). To maximize the loglikelihood, set the partial derivative with respect to q equal to zero. 4/q - 14 = 0. Thus q = 4/14 = 0.286. 10.41. A. f(x) = θ exp[-θ x ]/ 2x . ln f(x) = lnθ - θ x - ln(2x)/2. loglikelihood is: n lnθ - θΣ xi - Σln(2xi)/2. ^ Setting the partial derivative with respect to theta equal to zero: 0 = n/θ - Σ xi . ⇒ θ = n/Σ xi .
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 282 10.42. B. f(x) = θ−αxα−1 e−x/θ / Γ(α) = θ−3x2 e−x/θ / 2. ln f(x) = -3 ln(θ) + 2 ln(x) - x/θ - ln(2). ∂ ln f(x) / ∂ θ = -3/θ +x/θ2. Setting the derivative of the loglikelihood equal to zero: 0 = ∂ Σln f(xi) / ∂θ = -3n/θ + (1/θ2)Σxi . Therefore θ = Σxi / 3n = (1+2+2.2+2.8+3+4.1) / (3)(6) = 0.839. Comment: For the Gamma with fixed α, the method of maximum likelihood applied to ungrouped data is equal to the method of moments. 10.43. C. ln f(x;q) = ln(2) + ln(q) + ln(x) - qx2 . ∂ ln f(x;q) / ∂ q = 1/q - x2 .
Σ ∂ ln f(xi ;q) / ∂ q = n / q − Σxi2 . Setting the partial derivative with respect to q of the loglikelihood equal to zero: 0 = n / q − Σxi2 . Thus q = n /
Σxi2 .
Comment: This is a Weibull Distribution with τ = 2 fixed and θ = q-1/2. 10.44. E. f(x) = exp[-.5(x-3)2 /σ2]/{σ 2 π ). loglikelihood = -.5Σ(xi - 3)2 /σ2 - n ln(σ) - n(1/2)ln(2π). Set the partial derivative with respect to σ of the loglikelihood equal to 0: 0 = Σ(xi - 3)2 /σ3 - n/σ. σ2 = (1/n)Σ(xi - 3)2 = {(4-3)2 + (8-3)2 + (5-3)2 + (3-3)2 }/4 = 7.5. Comment: In this case, the maximum likelihood estimate is equal to the method of moments. 10.45. A. For µ = 0, f(x) = (1/σ) (1/ 2 π ) exp(-0.5{x2 /σ2 }). ln f(x) = -.5{x2 /σ2} - ln(σ) - (1/2)ln(2π). Σ ln f(xi) = -.5{Σxi2 /σ2} - nln(σ) - (n/2)ln(2π).
∂Σ ln f(xi) / ∂σ = Σxi2 /σ3 - n/σ = 0. Therefore σ = ( Σxi2 / n )0 . 5. Comment: We get for our estimate of σ2 the usual estimate for the variance of a distribution, which is biased due to the use of n rather than n-1 in the denominator. 10.46. A. F(x) = xp , therefore f(x) = pxp-1. ln f(x) = ln p +(p-1)ln(x). Set 0 = ∂ Σ ln f(xi) / ∂p = Σ (1/p + ln(xi)). ⇒ p = -n / Σ ln(xi). Comment: Note that this is a Beta Distribution with a = p, b = 1, and θ = 1.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 283 10.47. C. For the Gamma Distribution f(x) = θ−αxα−1 e−x/θ / Γ(α) . ln f(x) = -α ln(θ) + (α−1) ln(x) -x/θ - ln Γ(α). Setting the partial derivative with respect to theta of the loglikelihood equal to zero will maximize the likelihood: 0 = ∂ Σ ln f(xi) / ∂ θ = Σ {−α/ θ + xi/ θ2}. ⇒ θ = Σ xi / (nα) = 38000 / {(10)(12)} = 317. Comment: For the Gamma Distribution with fixed α, the method of maximum likelihood is equal to the method of moments. 10.48. C. For a single observation, likelihood = f(x) = 3/(2θ) - x/(2θ2), θ ≤ x ≤ 3θ. θ ≤ x. The largest possible value of θ is x, in which case the likelihood is: 3/(2x) - x/(2x2 ) = 1/x. x ≤ 3θ. The smallest value of θ is x/3, in which case the likelihood is: 9/(2x) - 9x/(2x2 ) = 0.
∂f(x)/∂θ = -3/(2θ2) - x/(θ3) = 0. ⇒ θ = 2x/3. In which case the likelihood is: 9/(4x) - 9/(8x) = 9/(8x) > 1/x. This is indeed the maximum, so the maximum likelihood estimator of θ is 2X/3. Comment: Choice E is not possible, since then θ = 3X ⇒ X = θ/3, which is outside the given domain. One can try the other choices and see which one produces the largest likelihood. For example, if X = 3, then 1 ≤ θ ≤ 3, and here is a graph of the likelihood as a function of θ: Likelihood 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 1.5
2
The maximum likelihood occurs at: 2X/3 = 2.
2.5
3
Theta
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 284 10.49. B. One maximizes the likelihood by maximizing the loglikelihood. n
n
n
The loglikelihood is: Σ ln (f(xi)) = -Σ {ln (θ) + xi/θ} = - n ln (θ) - (1/θ)Σ xi. i=1
i=1
i=1
We set the partial derivative of the loglikelihood with respect to θ equal to zero: n
0 = -n/θ +(1/θ2 )Σ xi. i=1
n
θ = (1/n) Σ xi. i=1
Comment: In this case, maximum likelihood is equivalent to the method of moments. 10.50. C. The log density is: ln f(x) = (-2 lnβ) + lnx - .5 (x/β)2 . The partial derivative with respect to β of the loglikelihood is: Σ{(-2/β) + xi2 / β3 } = (-2n/β) + Σ xi2 / β3 . Setting this equal to zero will maximize the loglikelihood: β = {Σ xi2 / (2n) }1/2. For the observed data: Σ xi2 = 24.01 + 3.24 + 11.56 + 47.62 + 16 = 102.43, and the number of points n = 5. Therefore the estimated β = {102.43 / 10 }1/2 = 3.20 Comment: A Weibull Distribution, with τ = 2 fixed. 10.51. B. The loglikelihood function is: Σ ln f(xi) = (-.5n)ln(2πθ) - Σ{(xi -1000)2 /2θ} = (-n/2)ln(2π) + (-n/2)ln(θ) + (-1/2q)Σ(xi -1000)2 . The partial derivative with respect to θ is: (-n/2)(1/θ) + (1/2θ2)Σ(xi -1000)2 . Setting this partial derivative equal to zero and solving for theta: θ = (1/n) Σ (xi - 1000)2 . Comment: Maximizing the Loglikelihood is equivalent to maximizing the likelihood. This a Normal distribution with fixed mean 1000 and with the variance rather than the standard deviation as the parameter. The maximum likelihood estimate of θ is the usual estimate of the variance from a sample.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 285 10.52. E. Set equal to zero the partial derivative of the loglikelihood with respect to the single parameter alpha; then solve for alpha. f(xi) = α1000α(1000 + xi)−α−1. ln f(xi) = ln(α) + αln(1000) − (α+1)ln(1000 + xi).
∂Σ ln f(xi) / ∂α = Σ {(1/α) + ln(1000) - ln(1000 + xi)} = 0. Therefore, α = n / Σ ln{(1000 + xi) / 1000} = 5 /1.294 = 3.86. xi 43 145 233 396 775
(1000 + xi)/1000 1.043 1.145 1.233 1.396 1.775
SUM
ln((1000 + xi)/1000) 0.042 0.135 0.209 0.334 0.574 1.294
Comment: Note that this is a Pareto distribution with the scale parameter fixed at 1000. 10.53. D. For the exponential distribution with ungrouped data, the method of maximum likelihood equals the method of moments. The mean of the exponential is in this case 1/λ. Therefore λ = 1/ observed mean = 3 / (.3+.55+.80) = 1.82. Comment: One can add up the log densities of ln(λ) - λx, set the partial derivative with respect to λ equal to zero and solve for λ. This use of the method of maximum likelihood takes a little longer, but yields the same solution.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 286 10.54. D. Note that for the single parameter distribution defined in this question, θ appears in the support (0, θ). Thus we desire θ > .9, so that the observed values will both be less than θ and thus within the support of the distribution. One can maximize the likelihood by maximizing the loglikelihood. The loglikelihood is: Σ ln f(xi) = Σ { ln(2) + ln(θ - xi) - 2ln(θ)}.
∂ Σ ln f(xi) / ∂θ = Σ {1/ (θ - xi) - 2/θ} = 1/ (θ - 0.5) + 1/ (θ - 0.9) - 4 / θ. Setting the partial derivative of the loglikelihood equal to zero: 1/ (θ - 0.5) + 1/ (θ - 0.9) = 4 / θ. Thus, θ (θ - 0.9) + θ (θ - 0.5) = 4 (θ - 0.5) (θ - 0.9). Thus, 2θ2 - 4.2 θ + 1.8 = 0. Thus, θ = {4.2 ± {4.22 - (4)(2)(1.8)}.5} / {(2)(2)} = 1.05 ± 0.45. Thus, θ = 1.50 or .60. However, as discussed above θ > .9, so we reject θ = .6 and the maximum likelihood estimate for θ is 1.50. Comment: Note that if θ = 0.6, then the density at .9 is in fact zero, so that the likelihood, which is f(0.5)f(0.9), would be zero rather than a maximum. For θ = 1.50 the likelihood is: f(0.5)f(0.9) = (0.888)(0.533) = 0.473. One can check numerically that this is in fact the maximum likelihood: θ 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
f(.5) 1.0000 0.9917 0.9722 0.9467 0.9184 0.8889 0.8594 0.8304 0.8025 0.7756 0.7500
f(.9) 0.2000 0.3306 0.4167 0.4734 0.5102 0.5333 0.5469 0.5536 0.5556 0.5540 0.5500
Likelihood 0.2000 0.3278 0.4051 0.4482 0.4686 0.4741 0.4700 0.4598 0.4458 0.4297 0.4125
10.55. E. ln f(x) = ln(α) + α ln(2) - (α+1)ln(x). Loglikelihood = Σ ln f(xi) = n ln(α) + n α ln(2) - (α+1)Σln(xi). Set equal to zero the derivative of the loglikelihood with respect to alpha: 0 = n/α + n ln(2) - Σln(xi). α = n/{Σln(xi) - n ln(2)} = n / Σln(xi/2).
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 287 10.56. E. e−θ is a decreasing function of θ. ⇒ 1 - e−θ is an increasing function of θ.
⇒ e-x /(1 - e−θ) is a decreasing function of θ. The likelihood is a decreasing function of θ, so we want θ to be as small as possible. We know each Xi < θ, and therefore θ ≥ maximum(X1 ,..., Xn ). The best θ is maximum(X1 ,..., Xn ). Comment: An example of where it is important to check the endpoints in order to find the maximum. 10.57. A. f(x) = (1/θ)e-x/θ . Σln(f(xi)) = Σ(−lnθ - xi/θ). ∂Σln(f(xi)) /∂θ = Σ(−1/θ + xi/θ2 ). Setting the partial derivative with respect to θ of the loglikelihood equal zero: 0 = Σ(−1/θ + xi/θ2 ) = -n/θ + (1/θ2 )Σ xi . Therefore θ = (1/n)Σ xi = (x1 + x2 + x3 )/3. Comment: For the Exponential Distribution, the Method of Maximum Likelihood applied to ungrouped data is the same as the Method of Moments. 10.58. E. 1. True. 2. True. 3. True. Comment: Adding additional parameters always allows one to fit the data better, but one should only do so when they provide a significant gain in accuracy. “The principle of parsimony,” states one should use the minimum number of parameters that get the job done. Thus in some cases a two-parameter model such as a Gamma may be preferable to a three-parameter model such as a Transformed Gamma. 10.59. B. ln f(x) = ln β - (1/2) ln(2π) - (3/2) ln(x) - β2/(2x).
∂ ln f(x) / ∂β = 1/β - β / x. 0 = ∂ Σ ln f(xi) / ∂β = n/β - βΣ1/xi. Therefore, β2 = n / Σ1/xi = 3/ {(1/100)+ (1/150) + (1/200)} = 3/(.01+.00667 + .005) = 138.44. β = 11.77. Comment: This is an Inverse Gamma Distribution, with α = 1/2 and θ = β2/2.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 288 10.60. C. For ungrouped data, the likelihood function is the product of the density function at the observed points. In this case the likelihood function is f(0.5)f(0.6). Density f(0.5) f(0.6) Likelihood = f(0.5)f(0.6) f1 (x) = 1 1 1 1 f2 (x) = 2x
1
1.2
1.2
f3 (x) = 3x2
.75
1.08
0.81
Thus ranking the likelihoods from most likely to least likely: f2 (x), f1 (x), f3 (x). The second density is the most likely with the largest likelihood of 1.2, while the third density is least likely with the smallest likelihood of 0.81. Comment: Note that 0.5 and 0.6 are both in the support of each of the density functions. 10.61. A. ln f(x) = lnα - (α+1)ln(x+1). ∂ Σ ln f(xi) / ∂α = Σ {1/α - ln(xi+1) } = n/α - Σ ln(xi+1) Setting the partial derivative of the loglikelihood equal to zero and solving: α = 1 / {Σ ln(xi+1)}/n. Since xi > 0, ln(xi+1) > 0. Therefore, Σ ln(xi+1) > ln(1+largest claim) ≥ ln(1+ sample mean) > ln( sample mean).
⇒ {Σ ln(xi+1)}/n > ln(sample mean)/n. ⇒ as sample mean goes to ∞ so does {Σ ln(xi+1)}/n. Thus the denominator of α goes to infinity and α goes to 0. Comment: The denominator of α is the log of the geometric average of one plus the claim sizes: {Σ ln(xi+1)}/n = ln( {Π(xi+1)}1/n). As the sample mean goes to infinity, so does the geometric average of 1 + claim sizes, and therefore so does the log of the geometric average of 1 + claim sizes. Thus the denominator goes to infinity and α goes to 0. This old exam question could have been worded better. If the actual α were less than or equal to one, then the actual mean is infinite. If that were the case, then for many finite samples, the sample mean would be very big. We are asked what happens to our estimate of α as we look at those samples in which the sample mean is very big. 10.62. B. For ungrouped data, the maximum likelihood estimator for the exponential is equal to the method of moments. The theoretical mean of λ is set equal to the observed mean of Σ xi / n. θ = Σ xi / n. The survival function is S(x) = exp(-x/θ). S(1) = exp(-1/θ). Therefore the estimate of S(1) is: exp(-n/ Σ xi ). Comment: The answer is a probability and therefore should always be in the interval [0, 1]. Only choices B and C have that property.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 289 10.63. D. Since for ungrouped data for the exponential distribution the Method of Maximum Likelihood is the same as the Method of Moments, and since the mean is θ, θ = 12500 / 100 = 125. F(x) = 1 - e-x/θ. 1 - F(250) = e-250/125 = 0.135. 10.64. A. The likelihood is wf1 (1) + (1-w)f2 (1) = wf1 (1) + (1-w)(2f1 (1)) = f1 (1)(2-w). For w in [0,1], the likelihood is maximized for w = 0. Comment: One has to assume that f1 has no parameters and that f2 has no parameters; we have only one (remaining) parameter w. Since there is only a single observation, the likelihood is maximized by applying all the weight to whichever density is larger at that observation. In this case, the second density is larger at 1, so all the weight is applied to the second density. 10.65. C. The likelihood associated with Dick is: (l32 - l33)/ l20 = (9,471,591 - 9,455,522)/9,617,802 = 0.001671. The likelihood associated with Jane is: l56/ l21 = 8,563,435 /9,607,896 = 0.8913. Therefore, the combined likelihood is: (.001671)(.8913) = 0.00149. 10.66. A. ln f(x) = -(x-µ)2 / (2x) - ln(x)/2 - ln(2π)/2.
∂ln f(x)/ ∂µ = (x-µ)/ x = 1 - µ/x. Σ∂ln f(xi)/ ∂µ = Σ(1 - µ/xi) = n - µΣ(1/xi). Setting the partial derivative equal to zero and solving for µ: µ = n/Σ(1/xi) = 5/(1/11 + 1/15.2 + 1/18 + 1/21 + 1/25.8) = 16.74. Comment: This is a Reciprocal Inverse Gaussian Distribution. See Insurance Risk Models by Panjer and WiIlmot. Here is a graph of the loglikelihood as a function of µ: Loglikelihood - 15.4 - 15.5 - 15.6 - 15.7 - 15.8 - 15.9 15.0
15.5
16.0
16.5
17.0
17.5
18.0
18.5
mu
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 290 10.67. B. Let TP be the total lifetimes for Philʼs and TS be the total lifetimes for Sylviaʼs bulbs. 20 bulbs for Phil and 10 bulbs for Sylvia. For the Exponential Distribution applied to ungrouped data, the Method of Moments equals the Method of Maximum Likelihood. Therefore, TP/20= 1000 and TS/10 = 1500. So TP = (20)(1000) = 20000 and TS = (10)(1500) = 15000. For the Exponential Distribution, f(x) = e-x/θ/θ. ln f(x) = -x/θ - ln(θ). Assuming θS = 2θP, then the loglikelihood is:
Σ(-xi/θP - ln(θP)) + Σ(-xi/ 2θP - ln(2θP)) = -TP/θP - 20ln(θP) - TS/ 2θP - 10(ln(θP)+ ln(2)). Phil
Sylvia
Setting the partial derivative of the loglikelihood with respect to θ equal to zero: 0 = TP/θP2 - 20/θP + TS/ 2θP2 - 10/θP. θP = (TP + TS/ 2)/(20 + 10) = (20000 + 15000/2)/(20 + 10) = 917. Comment: One could just half the time observed for Sylviaʼs bulbs; 15000 hours would have only been 7500 hours if the bulbs had been Philʼs. Then one can apply the method of moments: (20000 + 15000/2)/(20 + 10) = 917. 10.68. C. f(x) = τxτ−1θ−τ exp(-(x/θ)τ). ln f(x) = lnτ + (τ - 1) ln x - τlnθ - (x/θ)τ.
Σ ∂ln f(xi) / ∂θ = Σ −τ/θ + τxτ/θτ+1 = 0.
Nτ/θ = (τ/θτ+1)Σ xτ.
θ = {(1/N)Σ xτ }1/τ = {(1/10)Σ x0.5 }1/0.5 = (488.97/10)2 = 2391. Comment: For Maximum Likelihood for a Weibull Distribution with τ fixed, θτ = the τth moment of the observed data. 10.69. D. For the Inverse Gaussian distribution with θ fixed, maximum likelihood is equal to the method of moments. µ = X = 31,939/10 = 3194. Alternately, f(x) = (θ/ 2πx3 ).5 exp[- θ({x − µ} / µ)2 / 2x]. ln f(x) = 0.5 ln(θ) - 0.5ln(2π) - 1.5ln(x) - θ({x − µ} / µ)2 / 2x. Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ /2) Σ(xi /µ2 - 2/µ + 1/ xi). Set the partial derivative of the loglikelihood with respect to µ equal to zero:
∂Σ ln f(xi) / ∂µ = -(θ /2)Σ(-2xi /µ3 + 2/µ2 ) = 0. ⇒ Σ 2/µ2 = Σ2xi /µ3. Therefore, n µ = Σ xi. ⇒ µ = Σ xi /n = X = 31,939/10 = 3194.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 291 10.70. C. For the Exponential Distribution, the method of maximum likelihood applied to ungrouped data is the same as the method of moments: θ = mean. Apply the method of maximum likelihood in 1999: θ1999 = 3679/10 = 367.9. A 1000 maximum covered loss in 2001 is equivalent to a 1000/1.052 = 907.03 maximum covered loss in 1999. Deflating the deducible and maximum covered loss, the average payment per loss is: E[ X
∧
907.03] - E[X
∧
90.70] = θ(1 - e−907.03/θ) - θ(1 - e−90.70/θ) =
367.9(e-90.70/367.9 - e-907.03/367.9) = 256.25. Inflating to year 2001 dollars: (256.25)(1.052 ) = 282.5. Alternately, work in the year 2001. All of the loss amounts are multiplied by 1.052 . Thus applying the method of maximum likelihood: θ2001 = (1.052 )(3679)/10 = 405.61. Then the average payment per loss is: E[ X
∧ 1000] - E[X ∧
100] =
θ(1 - e−1000/θ) - θ(1 - e−100/θ) = 405.61(e-100/405.61 - e-1000/405.61) = 282.5. Alternately, if θ1999 = 367.9, this scale parameter is multiplied by the inflation factor over 2 years of 1.052 . Therefore, θ2001 = (1.052 )(367.9) = 405.61. Proceed as above.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 292 10.71. C. f(x) = pe-x/100/100 + (1-p)e-x/10000/10000. The likelihood is: f(100) f(2000) = {(pe- 1/100) + (1-p)e- . 0 1/10000}{(pe- 2 0/100) + (1-p)e- . 2/10000}. Comment: The Maximum Likelihood occurs for p = 0.486. loglikelihood - 16.5 - 17.0 - 17.5 - 18.0 - 18.5
0.2
0.4
0.6
0.8
1.0
p
loglikelihood - 16.3745 - 16.3750 - 16.3755 - 16.3760 - 16.3765
0.47
0.48
0.49
0.50
0.51
p
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 293 10.72. B. f(x) = α θα (θ+x)−(α+1). Σ ln (f(xi)) = Σ {ln(α) + α ln(θ) − (α+1)ln(θ + xi)}. The derivative with respect to α is: Σ {(1/α) + ln(θ) - ln(θ + xi)} = N/α - Σ ln {(θ +xi)/θ}. Setting this derivative equal to zero: 0 = (N/α) − Σ ln {(θ +xi)/θ}. Solving for alpha: α = N / Σln((θ + xi)/θ) = 3/{ln((150 + 225)/150) + ln((150 + 525)/150) + ln((150 + 950)/150)} = 3/{ln(2.5) + ln(4.5) + ln(7.333)} = 0.68. 10.73. B. For an Exponential with mean µ, f(x) = e-x/µ/µ and ln f(x) = -x/µ - lnµ. The claim of size 1 from the medium hazard level contributes: -1/(2θ) - ln(2θ). The claim of size 2 from the medium hazard level contributes: -2/(2θ) - ln(2θ). The claim of size 3 from the medium hazard level contributes: -3/(2θ) - ln(2θ). The claim of size 15 from the high hazard level has µ = 3θ and x = 15, and contributes: -15/(3θ) - ln(3θ). Thus the loglikelihood is: {-1/(2θ) - ln(2θ)} + {-2/(2θ) - ln(2θ)} + {-3/(2θ) - ln(2θ)} + {-15/(3θ) - ln(3θ)} = -8/θ - 4lnθ - 3ln2 - ln3. Setting the derivative equal to zero: 0 = 8/θ2 - 4/θ. ⇒ θ = 2. Alternately, convert all of the data to the low-hazard level. 1, 2, and 3 from 2θ ⇔ 1/2, 1, 3/2 from θ. 15 from 3θ ⇔ 15/3 = 5 from θ. For the Exponential, Maximum Likelihood equals Method of Moments: θ = (1/2 + 1 + 3/2 + 5)/4 = 2. Comment: The absence of any reported claims from the low hazard level, contributes nothing to the loglikelihood. In the log density, x is the size of claim, not the aggregate loss. When estimating severity, the number of claims is the amount of data; no claims provides no data for estimating severity. 10.74. D. ln f(x) = ln(p+1) + p ln(x).
∂ Σ ln f(xi) / ∂p = Σ 1/(p+1) + ln(xi) = 0. ⇒ p = -1 + n/Σln(xi) = -1 - 3/{ln(.74) + ln(.81) + ln(.95)} = 4.327. Comment: A Beta Distribution with a = p+1, b = 1, and θ = 1. 10.75. B. For the Exponential, maximum likelihood is equal to the method of moments. θ^ = X = 26.46/7 = 3.78.
0.75 = Prob[X > c] = e-c/3.78. ⇒ c = 1.087.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 294 10.76. A. This is a Beta Distribution with a = θ, b = 1, and what Loss Models calls θ = 1. The mean is: a/(a + b) = θ/(θ + 1). X = (.21 + .43 + .56 + .67 + .72)/5 = 0.518. θ/(θ + 1) = 0.518. ⇒ θ = 1.0747. This is the method of moments estimator of θ, so that what the question calls S is 1.0747. ln f(x) = lnθ + (θ - 1)ln(x). Loglikelihood = n lnθ + (θ - 1)Σ ln(xi). Set the derivative of the loglikelihood with respect to θ equal to zero: n/θ + Σ ln(xi) = 0.
⇒ θ = -n/Σ ln(xi) = -5/(ln.21 + ln.43 + ln.56 + ln.67 + ln.72) = 1.3465. This is the maximum likelihood estimator of θ, so that what the question calls R is 1.3465. R - S = 1.3465 - 1.0747 = 0.272. Comment: One can calculate the mean of the given density by integrating x f(x) from 0 to 1. 10.77. D. F(x) = xθ+1. f(x) = (θ + 1)xθ. lnf(x) = ln(θ + 1) + θln(x). loglikelihood is: n ln(θ + 1) + θ Σ ln(xi). Set the partial derivative with respect to theta equal to zero: 0 = n/(θ + 1) + Σ ln(xi). θ + 1 = -n/Σ ln(xi) = -5/{ln(0.56) +ln(0.83) + ln(0.74) + ln(0.68) + ln(0.75)} = 2.873. θ = 1.873. Comment: A Beta Distribution with a = θ + 1 and b = 1. 10.78. D. ln f(x) = θ - x, θ < x. The loglikelihood is: n θ - Σxi, θ < xi. The loglikelihood is an increasing function of θ, therefore we want the largest possible value of θ. θ < xi. ⇒ θ < Minimum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ]. Therefore, the largest possible value of θ is Minimum [Y1 , Y2 , Y3 , Y4 , ... ,Yn ]. Comment: One needs to be careful, since θ is part of the support of the density. If any xi ≤ θ, then f(xi) = 0, and the likelihood is zero. A shifted exponential distribution.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 295 10.79. E. ln f(x) = ln(θ + 1) + θ lnx. The loglikelihood is: n ln(θ + 1) + θ Σlnxi. Set the partial derivative with respect to θ equal to zero: 0 = n/(θ + 1) + Σlnxi. θ = -n/ Σlnxi - 1 = - 5/{ln(.92) + ln(.79) + ln(.90) + ln(.65) + ln(.86)} - 1 = 3.97. Comment: A Beta Distribution with a = θ + 1 and b = 1. 10.80. C. For the Exponential Distribution, method of moments ⇔ maximum likelihood. θ^ A = θ^ B. ⇒ θ^ A - θ^ B = 0. Comment: θ^ = (10 + 5 + 21 + 10 + 7)/5 = 10.6. 10.81. C. f(x) = θe-θ/x/x2 . ln f(x) = ln(θ) - θ/x - 2ln(x).
∂ lnf(x) / ∂θ = 1/θ - 1/x. Set the partial derivative of the loglikelihood equal to zero: 0 = Σ ∂ lnf(xi) / ∂θ = n/θ - Σ1/xi. θ = n/Σ1/xi = 4/(1/8000 + 1/10,000 + 1/12,000 + 1/15,000) = 10,667. Alternately, if X is Inverse Exponential with parameter θ, then 1/X is Exponential with mean 1/θ. Fitting an Exponential via maximum likelihood is the same as the method of moments. Fitting an Exponential distribution with mean 1/θ via the method of moments to 1/X: 1/θ = (1/8000 + 1/10,000 + 1/12,000 + 1/15,000)/4. ⇒ θ = 10,667. 10.82. E. ln f(x) = ln(θ) + (θ - 1) ln(x). Loglikelihood is: n ln(θ) + (θ - 1) Σ ln(xi). Setting the partial derivative of the loglikelihood with respect to θ equal to zero: n/θ + Σ ln(xi) = 0. θ = -n/Σ ln(xi) = - 5/{ln(.25) + ln(.5) + ln(.4) + ln(.8) + ln(.65)} = 1.37. Comment: This is a Beta Distribution as per Appendix A of Loss Models, with a = θ, b = 1, and θ = 1.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 296 10.83. D. ln f(x) = ln(θ) + (θ - 1)lnx. Therefore, for the given sample, the loglikelihood is: 5ln(θ) + (θ - 1){ln.1 + ln.25 + ln.5 + ln.6 + ln.7} = 5ln(θ) - 5.2495(θ - 1). Setting the partial derivative with respect to theta equal to zero: 5/θ - 5.2495 = 0. ⇒ θ = 0.952. Comment: A Beta Distribution as per Loss Models with a = θ, b = 1, and θ = 1. 10.84. A. lnf(x) = ln(θ) - θ/x - 2ln(x). n
n
loglikelihood is: n ln(θ) - θ ∑ 1 - 2 ∑ ln[xi] . i=1 xi i=1 Set the partial derivative of the loglikelihood with respect to theta equal to zero: n n 0 = n/θ - ∑ 1 . ⇒ θ^ = n . ⇒ θ^ = 5/(1/3 + 1/9 + 1/13 + 1/33 + 1/51) = 8.75. 1 i=1 xi ∑ xi i=1 Alternately, let Y = 1/X. Then Y follows an Exponential Distribution with mean 1/θ. The maximum likelihood fit to this Exponential is the same as the method of moments: 1/ θ^ = (1/3 + 1/9 + 1/13 + 1/33 + 1/51)/5. ⇒ θ^ = 8.75. 10.85. D. ln f(x) = ln(θ + 1) + θ ln(1-x). ∂ lnf(x) = 1/(θ+1) + ln(1 - x). ∂θ
Σ ∂ lnf(x) = n/(θ+1) + Σln(1 - xi). ∂θ Setting equal to zero, the partial derivative of the log density with respect to theta: 0 = n/(θ+1) + Σln(1 - xi).
⇒θ=
-n
∑ ln(1- xi)
-1=
-4 = 2.728. ln(0.95) + ln(0.9) + ln(0.8) + ln(0.5)
Comment: A Beta Distribution, with parameters a = 1, b = θ + 1, and θ = 1.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 297 10.86. D. f(x) = -
d S(x) k = dx 90
⎛ ⎜1 ⎝
-
x ⎞ k- 1 . ⎟ 90 ⎠
ln f(x) = ln[k] - ln[90] + (k-1) ln[1 - x/90].
∂ ln f(x) ∂k
= 1/k + ln[1 - x/90].
Setting equal to zero the partial derivative of the loglikelihood with respect to k: 0 = 1/k + ln[1 - 10/90] + 1/k + ln[1 - 50/90]. -2 ⇒k= = 2.154. ln[8 / 9] + ln[4 / 9] Comment: This is a Modified DeMoivre's Law, with ω = 90. This is a Beta Distribution with parameters a = 1, b = k, and θ = ω = 90. In general, for ω fixed, when fitting the exponent k via maximum likelihood: k^ = n
-n
∑ ln[1 - xi / ω]
.
i=1
10.87. This is a Beta Distribution with parameters a = 1, b = k, and θ = 90. The mean is: θ a / (a+b) = 90 / (1 + k). Set (10 + 50)/2 = 90 / (1 + k). ⇒ k = 2. 90
Alternately, mean =
90
∫0 S(x) dx = ∫0
x = 90
⎛ ⎜1 ⎝
x ⎞k 90 ⎛ x ⎞ k + 1⎤ ⎟ dx = ⎜1 ⎟ ⎥⎦ k +1 ⎝ 90 ⎠ 90 ⎠
= 90 / (1 + k).
x =0
Set (10 + 50)/2 = 90 / (1 + k). ⇒ k = 2. 10.88. A. For the uniform distribution on [a, b], the estimate of b is the maximum of the sample, and the estimate of a is the minimum of the sample. Thus, b^ - a^ = 4.5 - 0.7 = 3.8. Alternately, if a > 0.7, then the likelihood is zero. If b < 4.5, then the likelihood is zero. For a ≤ 0.7 and b ≥ 4.5, the likelihood is: (b-a)-5. The likelihood is a decreasing function of b, so we want to take the smallest possible b, which is 4.5. The likelihood is an increasing function of a, so we want to take the largest possible a, which is 0.7. ^ Thus, b - a^ = 4.5 - 0.7 = 3.8.
2013-4-6, Fitting Loss Distribs. §10 Ungrouped Max. Likelihood, HCM 10/14/12, Page 298 10.89. D. ln f(x) = ln[1 + 1/α] + ln[x]/α. Loglikelihood is: 4 ln[1 + 1/α] + (ln[0.2] + ln[0.5] + ln[0.6] + ln[0.8])/α = 4 ln[1 + 1/α] + ln[0.048]/α. Setting the partial derivative of the loglikelihood with respect to alpha equal to zero: -1 4 -ln[0.048] -4 + = 0. ⇒ 1 + 1/α = = 1.3173. ⇒ α = 3.15. 2 2 α 1 + 1/ α α ln[0.048] Comment: This is a Beta distribution with b = 1, θ = 1, and a = 1 + 1/α. a^ = n
-n
∑ ln[xi / θ] i=1
=
-4 = 1.317. ⇒ α^ = 1/0.317 = 3.15. ln[0.2] + ln[0.5] + ln[0.6] + ln[0.8]
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 299
Section 11, Fitting to Grouped Data by Maximum Likelihood In order to fit a chosen type of size of loss distribution by maximum likelihood to grouped data, you maximize the likelihood or equivalently you maximize the loglikelihood. For grouped data, the likelihood for an interval [ai, bi] with ni claims is: {F(bi) - F(ai)} n i . The likelihood is the product of terms from each interval. The contribution to the loglikelihood from an interval is: ni ln[F(bi) - F(ai)]. For grouped data, the loglikelihood is a sum of terms over the intervals: (number of observations in the interval) ln(probability covered by the interval). To fit a distribution via maximum likelihood to grouped data you find the set of parameters such that either Π{F(bi) - F(ai) } n i or Σ ni ln[F(bi) - F(ai)] is maximized.113 For example for the Weibull Distribution: F(x) = 1 - exp[-(x/θ)τ]. ln[F(bi) - F(ai)] = ln[exp(-(ai/θ)τ) - exp(-(bi/θ)τ)]. For a particular pair of values of θ and τ one can compute ni ln[F(bi) -F(ai)] for each of the observed intervals and add up the results. For example, the second interval for the grouped data in Section 3 contributes: 2247 ln[ F(10000) - F(5000) ]. For a Weibull with θ = 15,000 and τ = 1.1, this is: 2247 ln[0.4728 - 0.2582] = -3458. Adding up the contributions from all the intervals gives a loglikelihood: -18,638. Bottom of Interval $ Thous. 0 5 10 15 20 25 50 75 100
Top of Interval $ Thous. 5 10 15 20 25 50 75 100 Infinity
# claims in the Interval 2208 2247 1701 1220 799 1481 254 57 33 10000
113
F(lower)
F(upper)
0.0000 0.2582 0.4728 0.6321 0.7465 0.8269 0.9767 0.9972 0.9997
0.2582 0.4728 0.6321 0.7465 0.8269 0.9767 0.9972 0.9997 1.0000
Probability for the Interval 0.2582 0.2146 0.1593 0.1143 0.0805 0.1498 0.0205 0.0025 0.0003
Negative Loglikelihood
1.0000
18,638
2990 3458 3124 2646 2013 2812 988 342 266
ni is just the number of observed claims for the ith interval. Loss Models has the ith interval from (ci-1, ci]. 10,000 might be the top of one interval and then 10,000 would not be included the next interval, rather 10,001 would be the smallest size included in the next interval. For ease of exposition, I have not been that precise here. In practical applications, one would need to know in which interval a loss of size 10,000 is placed.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 300 Note that since we do not know the size of each loss, we could not work with the density. Rather we work with the differences in the Distribution Function at the end points of intervals. For a selected set of parameter values, the loglikelihoods for the Weibull for the ungrouped data in Section 3 are: Tau Theta 14,000 15,000 16,000 17,000 18,000
0.9 -18,864 -18,843 -18,857 -18,899 -18,962
1.0 -18,721 -18,669 -18,663 -18,694 -18,752
1.1 -18,731 -18,638 -18,605 -18,618 -18,667
1.2 -18,881 -18,738 -18,670 -18,659 -18,694
The values of θ and τ which maximize the loglikelihood are close to θ = 16,000 and τ = 1.1.114 It is worthy of note that this process makes no use of any information on dollars of loss in each interval. This is in contrast to the method of moments. Thus for grouped data the method of moments and the method of maximum likelihood do not produce the same fitted exponential distribution, unlike the case with ungrouped data. Usually one maximizes the loglikelihood by standard numerical algorithms,115 rather than by solving the equations in closed form.116
A more exact computation yields θ = 16,148 and τ = 1.100. For grouped data (as well as discrete frequency data) one can use the Method of Scoring. 116 The lack of access to a computer, restricts the variety of possible exam questions. 114 115
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 301 Examples of Fitted Distributions: The calculated parameters of other distributions fit to the grouped data in Section 3 by the Method of Maximum Likelihood are:
Distribution Exponential Pareto Weibull Gamma LogNormal Inverse Gaussian Transformed Gamma Generalized Pareto Burr
Parameters Fit via Maximum Likelihood to the Grouped Data in Section 3 θ = 15,636 N.A. θ = 16,184 α = 1.2286 µ = 9.2801 µ = 15,887 α = 3.2536 α = 8.0769 α = 3.9913
N.A. τ = 1.0997 θ = 12,711 σ = 0.91629 θ = 14,334 θ = 1861 θ = 73,885 θ = 40,467
τ =0 .59689 τ = 1.5035 γ = 1.3124
As discussed with respect to the method of moments, the Pareto distribution has too heavy of a righthand tail to fit the grouped data set from Section 3. The mean, coefficient of variation and skewness for the maximum likelihood curves are as follows: Grouped Data Expon. Mean ($000) 15.7 15.6 ≈1 CV 1.00 ≈3 Skewness 2.00
Maximum Likelihood Fitted Curves Weibull Gamma TGamma Burr 15.6 15.6 15.7 15.7 0.91 0.90 0.95 0.98 1.73 1.80 2.39 3.19
GenPar InvGauss LogNor 15.7 15.9 16.3 0.97 1.05 1.15 2.73 3.16 4.95
Just as with ungrouped data, the values of loglikelihoods can be usefully looked at in order to compare the maximum likelihood curves. The negative loglikelihoods fit to the grouped data in Section 3 are: Distribution Burr Generalized Pareto Transformed Gamma Gamma
Neg. Log Likelihood 18533.9 18536.0 18541.0 18580.5
Distribution Weibull LogNormal Inverse Gaussian Exponential
Neg. Log Likelihood 18604.3 18627.4 18656.4 18660.9
Based on this criterion the Burr is the best fit to the grouped data in Section 3.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 302 Below are shown the survival functions for some of these distributions fit via maximum likelihood, as well as the empirical survival function. Survival Functions at x times 1000 5 10 15 20
25
50
75
100
150
Data
0.7792
0.5545
0.3844
0.2624
0.1825
0.0344
0.0090
0.0033
Exponential
0.7263
0.5275
0.3832
0.2783
0.2021
0.0409
0.0083
0.0017
0.0001
Weibull
0.7597
0.5549
0.3986
0.2830
0.1993
0.0315
0.0045
0.0006
0.0000
Gamma
0.7701
0.5584
0.3971
0.2795
0.1953
0.0308
0.0047
0.0007
0.0000
Trans. Gam.
0.7809
0.5478
0.3801
0.2648
0.1859
0.0360
0.0082
0.0021
0.0002
Burr
0.7798
0.5536
0.3830
0.2636
0.1825
0.0348
0.0091
0.0030
0.0005
Data
0.7792
0.5545
0.3844
0.2624
0.1825
0.0344
0.0090
0.0033
It appears that the Burr distribution fits best. This will be seen more clearly when the mean excess losses are compared. Limited Expected Value, Empirical versus Fitted Values: The empirical Limited Expected Values at the endpoints of the intervals were computed previously for the grouped data in Section 3. For the curves fit by maximum likelihood to this grouped data, the Limited Expected Values are as follows: Limited Expected Value ($000) x $ Thous. Data Expon. Weibull 5 4.5 4.3 4.4 10 7.8 7.4 7.7 15 10.1 9.6 10.0 20 11.7 11.3 11.7 25 12.8 12.5 12.9 50 15.0 15.0 15.2 75 15.5 15.6 15.5 100 15.6 15.6 15.6 150 15.6 15.6 200 15.6 15.6 250 15.6 15.6 Infinity 15.7 15.6 15.6
Gamma 4.4 7.7 10.1 11.8 13.0 15.2 15.6 15.6 15.6 15.6 15.6 15.6
Trans. Gamma 4.5 7.8 10.1 11.7 12.9 15.1 15.6 15.7 15.7 15.7 15.7 15.7
Gen. Pareto 4.5 7.8 10.1 11.7 12.8 15.0 15.5 15.6 15.7 15.7 15.7 15.7
Burr 4.5 7.8 10.1 11.7 12.8 15.0 15.5 15.6 15.7 15.7 15.7 15.7
Log Normal 4.6 7.9 10.1 11.6 12.7 15.0 15.7 16.0 16.2 16.3 16.3 16.3
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 303 For example, for the LogNormal distribution, E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}. With µ = 9.28 and σ = 0.916, E[X ∧ 25000] = exp(9.6995)Φ[0.00833] + 25000 {1 - Φ[0.9243]} = (16310)(0.5033) + (25000)(0.1777) = 12,651.
Mean Excess Loss, Empirical versus Fitted Values: The empirical Mean Excess Losses at the endpoints of the intervals were computed previously for the grouped data in Section 3. For the curves fit by maximum likelihood to this grouped data, the mean excess losses are as follows: e(x) x $ Thousand 0 5 10 15 20 25 50 75 100 150 200 250
Data 15.8 14.4 14.3 14.6 15.2 15.9 21.2 27.1 30.2
$ Thousand Expon. 15.6 15.6 15.6 15.6 15.6 15.6 15.6 15.6 15.6 15.6 15.6 15.6
Weibull 15.6 14.7 14.3 14.0 13.7 13.5 12.8 12.4 12.1 11.7 11.4 11.1
Gamma 15.6 14.5 14.1 13.8 13.7 13.6 13.2 13.1 13.0 12.9 12.9 12.8
Trans. Gamma 15.7 14.3 14.4 14.7 15.0 15.4 17.2 18.8 20.1 22.5 24.5 26.2
Gen. Pareto 15.7 14.3 14.3 14.6 15.1 15.6 18.7 22.0 25.5 32.4 39.5 46.5
Burr 15.7 14.4 14.2 14.5 15.0 15.7 20.0 25.0 30.3 41.3 52.6 64.1
Log Normal 16.3 14.6 15.8 17.4 19.0 20.6 28.1 35.0 41.3 53.2 64.3 74.8
For example, for the LogNormal distribution, e(x) = exp(µ + σ2/2){1 − Φ[(lnx − µ − σ2)/σ] / {1 − Φ[(lnx − µ)/σ]} − x. With µ = 9.28, σ = .916, e(25000) = exp(9.6995)(1-Φ[.00833]) / {1 - Φ[.9243]} - 25000 = (16310)(.4967)/(.1777) - 25000 = 20,589.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 304 The following graph compares the mean excess losses for the Burr (solid), the Transformed Gamma (dashed) and the data (points): 35000
30000
25000
20000
15000 20000
40000
60000
80000
100000
120000
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 305 With Only Two Groups: Exercise: You observe that out of 50 claims, 5 are of size greater than 1000. An Exponential distribution is fit to this data via the method of maximum likelihood. What is the fitted value of θ? [Solution: The likelihood is: (1 - e-1000/θ)45 (e-1000/θ)5 . Let y = e-1000/θ. Then the likelihood is: (1 - y)45 y5 . Set the derivative with respect to y equal to zero: 0 = -45(1 - y)44 y5 + 5(1 - y)45 y4 .
⇒ 0 = -45y + 5(1 - y). ⇒ y = 5/50 = 0.1. ⇒ θ = -1000/ln0.1 = 434.] Notice that for the fitted θ = 434, F(1000) = 1 - e-1000/434 = 90.0% = 45/50 = empirical distribution function at 1000. In general, when we have data grouped into only two intervals, one can fit a single parameter via maximum likelihood by setting the theoretical and empirical distribution functions equal at the boundary between the two intervals. This is mathematically equivalent to percentile matching. Assume there are n1 claims from 0 to b and n2 claims from b to ∞. Then the likelihood is: F(b) n 1 {1 - F(b)} n 2 . Let y = F(b). Then the likelihood is y n 1 (1 - y) n 2 , a Beta Distribution with a = n1 + 1, b = n2 + 1, and θ = 1. The mode of this Beta Distribution is: (a - 1)/(a + b - 2) = n1 /(n1 + n2 ).117 Thus the likelihood is maximized for F(b) = y = n1 /(n1 + n2 ) = empirical distribution function at b.
117
See “Mahlerʼs Guide to Loss Distributions.”
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 306 Fitting a Two Parameter Distribution to Three Groups: Assume we have data grouped into three intervals. Assume there are n1 claims from 0 to a, n2 claims from a to b, and n3 claims from b to ∞. Make the change of variables: β = F(a) and γ = S(b). The loglikelihood is: n1 lnF(a) + n2 ln[F(b) - F(a)] + n3 lnS(b) = n1 lnβ + n2 ln[1 - β - γ] + n3 lnγ. Setting the partial derivatives with respect to β and γ of the loglikelihood equal to zero: 0 = n1 /β - n2 /(1 - β - γ). ⇒ β/(1 - β - γ) = n1 /n2 . 0 = -n2 /(1 - β - γ) + n3 /γ. ⇒ γ/(1 - β - γ) = n3 /n2 . Solving: β = n1 /( n1 + n2 + n3 ), and γ = n3 /( n1 + n2 + n3 ). In other words, set F(a) = n1 /( n1 + n2 + n3 ) = observed proportion from 0 to a, and S(b) = n1 /( n1 + n2 + n3 ) = observed proportion from b to ∞. This is mathematically equivalent to percentile matching. Exercise: Fit a Weibull Distribution via maximum likelihood to the following grouped data. Interval Number of Losses (0, 200) 32 [200, 500) 50 [500, ∞) 18 [Solution: 1 - exp[-(200/θ)τ] = F(200) = 32/100. ⇒ (200/θ)τ = 0.3857. ⇒ τ ln200 - τlnθ = -0.9523. exp[-(500/θ)τ] = S(500) = 18/100. ⇒ (500/θ)τ = 1.7148. ⇒ τ ln500 - τlnθ = 0.5393. Therefore, τ =
0.5393 + 0.9523 = 1.628. ⇒ θ = 359.] ln(500) - ln(200)
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 307 Problems: 11.1 (3 points) You observe that out of 60 claims, 10 are of size greater than 5. A Weibull distribution assuming τ = 3, is fit to this data via the method of maximum likelihood. What is the fitted value of θ? A. less than 3.7 B. at least 3.8 but less than 3.9 C. at least 3.9 but less than 4.0 D. at least 4.0 but less than 4.1 E. at least 4.1 11.2 (3 points) The following 100 losses are observed in the following intervals: Interval Number of Losses [0, 1] 40 [1, 2] 20 [2, 5] 25 [5, ∞) 15 You wish to fit a Pareto Distribution to this data via the method of maximum likelihood. Which of the following functions should be maximized? ⎧ ⎛ θ ⎞ α⎫40 ⎧⎛ θ ⎞ α ⎛ θ ⎞ α⎫60 ⎧⎛ θ ⎞ α ⎛ θ ⎞ α⎫85 A. ⎨1 - ⎜ ⎟ ⎬ ⎨⎜ ⎟ - ⎜ ⎟ ⎬ ⎨⎜ ⎟ - ⎜ ⎟ ⎬ ⎝ θ + 1⎠ ⎭ ⎩⎝ θ + 1⎠ ⎝ θ + 2 ⎠ ⎭ ⎩⎝ θ + 2 ⎠ ⎝θ + 5⎠ ⎭ ⎩ ⎧ ⎛ θ ⎞ α ⎫0.4 ⎧⎛ θ ⎞ α ⎛ θ ⎞ α ⎫0.6 ⎧⎛ θ ⎞ α ⎛ θ ⎞ α⎫0.85 ⎛ θ ⎞ α B. ⎨1 - ⎜ ⎟ ⎬ ⎨⎜ ⎟ - ⎜ ⎟ ⎬ ⎨⎜ ⎟ - ⎜ ⎟ ⎬ ⎜ ⎟ ⎝ θ + 1⎠ ⎭ ⎩⎝ θ + 1⎠ ⎝ θ + 2 ⎠ ⎭ ⎩⎝ θ + 2 ⎠ ⎝θ + 5⎠ ⎭ ⎝θ + 5 ⎠ ⎩ ⎧ ⎛ θ ⎞ α ⎫40 ⎧⎛ θ ⎞ α ⎛ θ ⎞ α⎫20 ⎧⎛ θ ⎞ α ⎛ θ ⎞ α⎫25 ⎛ θ ⎞ 15α C. ⎨1 - ⎜ ⎟ ⎬ ⎨⎜ ⎟ - ⎜ ⎟ ⎬ ⎨⎜ ⎟ - ⎜ ⎟ ⎬ ⎜ ⎟ ⎝ θ + 1⎠ ⎭ ⎩⎝ θ + 1⎠ ⎝ θ + 2 ⎠ ⎭ ⎩⎝ θ + 2 ⎠ ⎝θ + 5⎠ ⎭ ⎝θ + 5⎠ ⎩ ⎛ θ ⎞ 40α ⎛ θ ⎞ 20 α ⎛ θ ⎞ 25 α D. ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎝ θ + 1⎠ ⎝θ + 2⎠ ⎝ θ + 5⎠ E. None of the above.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 308 11.3 (4 points). Ten observed losses have been recorded in thousands of dollars and are grouped as follows: Interval [100,150] [150,200] [200, ∞] Number of claims 6 1 3 The random variable underlying the observed losses has the Distribution function F(x) = 1 - (x/100000)-q, x > 100,000. The value of the parameter q is to be determined via the method of maximum likelihood. Which of the following equations should be solved (numerically)? ln(3) (2 / 3)q (1/ 2)q - ln(3) (2 / 3)q A. 0 = + - 1. 1 - (2 / 3)q (2 / 3)q - (1/ 2)q B. 0 =
(2 / 3)q 1 - (2 / 3)q
C. 0 =
6 ln(1.5) (2 / 3)q ln(2) (1/ 2)q - ln(1.5) (2 / 3)q + - 3 ln(2). 1 - (2 / 3)q (2 / 3)q - (1/ 2)q
D. 0 =
6 (2 / 3)q 1 - (2 / 3)q
+
+
ln(2) (1/ 2)q - 1. (2 / 3)q - (1/ 2)q
(1/ 2)q - 3. (2 / 3)q - (1/ 2)q
E. None of the above. 11.4 (3 points) An exponential distribution F(x) = 1 - e-λx is fit to the following size of claim data by the method of maximum likelihood. Which of the following functions should be maximized? Range # of claims 0-1 6300 1-2 2350 2-3 850 3-4 320 4-5 110 over 5 70 Total 10000 A. {1 - e-λ}3000 {e-λ - e-2λ}3500 {e-2λ - e-3λ}2000 {e-3λ - e-4λ}1000 {e-4λ - e-5λ}500 e-2500λ B. {1 - e-λ}2.1 {e-λ - e-2λ}1.49 {e-2λ - e-3λ}2.35 {e-3λ - e-4λ}3.13 {e-4λ - e-5λ}4.55 e-35.7λ C. {1 - e-λ}6300 {e-λ - e-2λ}4700 {e-2λ - e-3λ}2550 {e-3λ - e-4λ}1280 {e-4λ - e-5λ}550 e-2100λ D. {1 - e-λ} {e-λ - e-2λ} {e-2λ - e-3λ} {e-3λ - e-4λ} {e-4λ - e-5λ} e-5λ E. {1 - e-λ}6300 {e-λ - e-2λ}2350 {e-2λ - e-3λ}850 {e-3λ - e-4λ}320 {e-4λ - e-5λ}110 e-350λ 11.5 (2 points) In the previous question what is the fitted value of λ? (A) .95
(B) 1.00
(C) 1.05
(D) 1.10
(E) 1.15
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 309 11.6 (3 points) There are 45 losses of size less than 10. There are 65 losses of size at least 10. You fit a Pareto Distribution with θ = 20 via maximum likelihood. What is the fitted value of α? (A) 1.1
(B) 1.2
(C) 1.3
(D) 1.4
(E) 1.5
11.7 (3 points) Losses follow a uniform distribution on [0, b]. The following 13 losses have been grouped into intervals: Interval Number of Losses (0, 1) 6 [1, 2) 4 [2, 3) 2 [3, 4) 1 [4, ∞) 0 What is the maximum likelihood estimate of b? A. 3.00 B. 3.25 C. 3.50 D. 3.75 E. 4.00 11.8 (3 points) There are 50 losses of size less than 1000. There are 20 losses of size at least 1000. You fit a Pareto Distribution with α = 3 via maximum likelihood. What is the fitted value of θ? (A) Less than 1900 (B) At least 1900, but less than 2000 (C) At least 2000, but less than 2100 (D) At least 2100, but less than 2200 (E) At least 2200 11.9 (3 points) You are given: (i) Losses follow an Inverse Exponential distribution, as per Loss Models. (ii) A random sample of 400 losses is distributed as follows: Loss Range Frequency [0, 10000] 140 (10000, 20000] 100 (20000, ∞) 160 Calculate the maximum likelihood estimate of θ. (A) Less than 7000 (B) At least 7000, but less than 8000 (C) At least 8000, but less than 9000 (D) At least 9000, but less than 10,000 (E) At least 10,000
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 310 11.10 (3 points) Fit a Loglogistic Distribution via maximum likelihood to the following grouped data. Interval Number of Losses (0, 10) 248 [10, 100) 427 [100, ∞) 325 Use the fitted distribution to estimate S(500). A. 6% B. 8% C. 10% D. 12% E. 14% 11.11 (4B, 5/94, Q.12) (3 points) You are given the following: • Four observations have been made of a random variable having the density function f(x) = 2λx exp(-λx2 ), x > 0. •
Only one of the observations was less than 2.
Determine the maximum likelihood estimate of λ. A. Less than 0.05 B. At least 0.05, but less than 0.06 C. At least 0.06, but less than 0.07 D. At least 0.07 E. Cannot be determined from the given information 11.12 (4B, 11/95, Q.16) (2 points) You are given the following: • Six losses have been recorded in thousands of dollars and are grouped as follows: Interval Number of Losses (0,2) 2 [2,5) 4 [5,∞) 0 • The random variable X underlying the losses, in thousands, has the density function f(x) = λe-λx, x > 0, λ > 0. Which of the following functions must be maximized to find the maximum likelihood estimate of λ? A. (1- e-2λ)2 (e-2λ - e-5λ)4 B. (1- e-2λ)2 (e-2λ - e-5λ)4 ( e-5λ)6 C. (1- e-2λ)2 (e-2λ - e-5λ)4 (1 - e-5λ)6 D. (1- e-2λ)2 (e-2λ - e-5λ)4 ( e-5λ)-6 E. (1- e-2λ)2 (e-2λ - e-5λ)4 (1 - e-5λ)-6
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 311 11.13 (4B, 5/97, Q.8) (2 points) You are given the following:
• •
The random variable X has a uniform distribution on the interval [0, θ]. A random sample of three observations of X has been recorded and grouped as follows: Number of Interval Observations [0, k) 1 [k, 5) 1 [5, θ]
1
Determine the maximum likelihood estimate of θ. A. 5
B. 7.5
C. 10
D. 5+k
E. 10-k
11.14 (4B, 5/97, Q.10) (2 points) You are given the following: • Forty (40) observed losses from a long-term disability policy with a one-year elimination period have been recorded and are grouped as follows; Years of Disability Number of Losses (1, 2) 10 [2, ∞) 30 • You wish to shift the observations by one year and fit them to a Pareto distribution, with parameters θ (unknown) and α = 1. Determine the maximum likelihood estimate of θ. A. 1/3
B. 1/2
C. 1
D. 2
E. 3
11.15 (4B, 5/99, Q.3) (2 points) You are given the following: •
The random variable X has a uniform distribution on the interval (0, θ) , θ > 2.
•
A random sample of four observations of X has been recorded and grouped as follows: Number of Interval Observations (0,1] 1 (1,2] 2 (2, θ)
1
Determine the maximum likelihood estimate of θ. A. 8/3
B. 11/4
C. 14/5
D. 20/7
E. 3
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 312 11.16 (Course 4 Sample Exam 2000, Q.37) Twenty widgets are tested until they fail. Failure times are distributed as follows: Interval Number Failing (0, 1] 2 (1, 2] 3 (2, 3] 8 (3, 4] 6 (4, 5] 1 (5, ∞) 0 The exponential survival function S(t) = exp(-λt) is used to model this process. Determine the maximum likelihood estimate of λ. 11.17 (4, 11/02, Q.23 & 2009 Sample Q. 44) (2.5 points) You are given: (i) Losses follow an exponential distribution with mean θ. (ii) A random sample of 20 losses is distributed as follows: Loss Range Frequency [0, 1000] 7 (1000, 2000] 6 (2000, ∞) 7 Calculate the maximum likelihood estimate of θ. (A) Less than 1950 (B) At least 1950, but less than 2100 (C) At least 2100, but less than 2250 (D) At least 2250, but less than 2400 (E) At least 2400 11.18 (4, 11/06, Q.33 & 2009 Sample Q.276) (2.9 points) For a group of policies, you are given: (i) Losses follow the distribution function F(x) = 1 - θ/x, θ < x < ∞. (ii) A sample of 20 losses resulted in the following: Interval Number of Losses x ≤ 10 9 10 < x ≤ 25 6 x > 25 5 Calculate the maximum likelihood estimate of θ. (A) 5.00
(B) 5.50
(C) 5.75
(D) 6.00
(E) 6.25
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 313 11.19 (CAS 3L, 5/12, Q.17) (2.5 points) You are given the following information:
• Claim severity follows an Inverse Exponential distribution with parameter θ. • One claim is observed, which is known to be between 50 and 500. • Calculate the maximum likelihood estimate of θ. A. Less than 60 B. At least 60, but less than 90 C. At least 90, but less than 120 D. At least 120, but less than 150 E. At least 150
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 314 Solutions to Problems: 11.1. E. F(x) = 1 - exp (-(x/θ)τ). Therefore for τ = 3, F(5) = 1 - exp (-125/θ3). S(5) = exp (-125/θ3). There are 50 claims less than 5 and 10 claims greater than 5. (The likelihood function is the probability of being in each interval, to the power of the number of claims in that interval. Here we have only two intervals.) The loglikelihood function is: 50 ln(1- exp (-125/θ3)) +10(-125/θ3). Setting the derivative with respect to θ equal to zero: (50){(-3θ−4)125exp(-125/θ3)} / {1 - exp(-125/θ3)} - 1250(-3θ−4) = 0. 5 exp(-125/θ3) = 1 - exp(-125/θ3). ⇒ exp(-125/θ3) = 1/6. Therefore, -125/θ3 = -ln(6). θ = (125/ln(6))1/3 = 4.12. Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood by setting the theoretical and empirical distributions functions equal at the boundary between the two groups. 1 - exp (-125/θ3) = 50/60. ⇒ θ = (125/ln(6))1/3 = 4.12.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 315 11.2. C. F(x) = 1 - {θ/(θ+x)}α, a Pareto Distribution. One takes the difference in distribution functions at the top and bottom of each interval and then takes the product to the power of the number of claims in that interval. Bottom Top Number F(Bottom F(Top Contribution of of of of of to Interval Interval Claims Interval) Interval) Likelihood 1 - {θ/(θ+1)}α
0
(1 - {θ/(θ+1)}α)40
0
1
40
1
2
20
1 - {θ/(θ+1)}α 1 - {θ/(θ+2)}α
({θ/(θ+1)}α - {θ/(θ+2)}α)20
2
5
25
1 - {θ/(θ+2)}α 1 - {θ/(θ+5)}α
({θ/(θ+2)}α - {θ/(θ+5)}α)25
5
∞
15
1 - {θ/(θ+5)}α
{θ/(θ+5)}15α
1
Likelihood is the product of the contributions of each of the intervals: (1 - {θ/(θ+1)}α)40({θ/(θ+1)}α - {θ/(θ+2)}α)20({θ/(θ+2)}α - {θ/(θ+5)}α)25 {θ/(θ+5)}15α. 11.3. C. Take the difference in distribution functions at the top and bottom of each interval and then take the product to the power of the number of claims in that interval: Bottom of Top of Number F(Bottom F(Top of Difference Contribution Interval Interval of Claims of Interval) Interval) of Distrib. to Likelihood 100
150
6
0
1 - (2/3)q
1 - (2/3)q
{1 - (2/3)q }6
150
200
1
1 - (2/3)q
1 - (1/2)q
(2/3)q - (1/2)q
(2/3)q - (1/2)q
200
∞
3
1 - (1/2)q
1
(1/2)q
(1/2)3q
The likelihood is: {1 - (2/3)q }6 {(2/3)q - (1/2)q } (1/2)3q. The loglikelihood is: 6 ln{1 - (2/3)q } + ln{(2/3)q - (1/2)q } + 3q ln(1/2). The derivative with respect to q of the loglikelihood is: ln(2/3){- (2/3)q } 6 /{1 - (2/3)q } + { ln(2/3)(2/3)q - ln(1/2)(1/2)q }/{(2/3)q - (1/2)q } + 3 ln(1/2) Setting the derivative of the loglikelihood equal to zero in order to find a maximum: 0 = {ln(1.5) (2/3)q } 6 /{1 - (2/3)q } + {ln(2)(1/2)q - ln(1.5)(2/3)q }/{(2/3)q - (1/2)q } - 3 ln(2) Comment: The numerical solution of the above equation is q ≅ 1.904, which is the maximum likelihood estimate of the parameter q.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 316 11.4. E. & 11.5. B. Take the difference in distribution functions at the top and bottom of each interval and then take the product to the power of the number of claims in that interval: Bottom of Top of Number of F(Bottom F(Top of Difference Contribution Interval Interval Claims of Interval) Interval) of Distrib. to Likelihood 0
1
6300
0
1 - e-λ
1 - e-λ
1
2
2350
1 - e-λ
1 - e-2λ
e-λ - e-2λ
{e-λ - e-2λ}2350
2
3
850
1 - e-2λ
1 - e-3λ
e-2λ - e-3λ
{e-2λ - e-3λ}850
3
4
320
1 - e-3λ
1 - e-4λ
e-3λ - e-4λ
{e-3λ - e-4λ}320
4
5
110
1 - e-4λ
1 - e-5λ
e-4λ - e-5λ
{e-4λ - e-5λ}110
5
∞
70
1 - e-5λ
1
e-5λ
{1 - e-λ}6300
{e-5λ }70
Likelihood = {1 - e-λ}6300 {e-λ - e-2λ}2350 {e-2λ - e-3λ}850 {e-3λ - e-4λ}320 {e-4λ - e-5λ}110 {e-5λ }70. Let y = e-λ. Likelihood = (1 - y)6300(y - y2 )2350(y2 - y3 )850(y3 - y4 )320(y4 - y5 )110(y5 )70 = y 5800(1 - y)9930. Set the derivative equal to zero: 0 = 5800y5799(1 - y)9930 - 9930y5800(1 - y)9929. ⇒ 5800(1 - y) = 9930y.
⇒ y = 5800/15730 = .3687. ⇒ e-λ = .3687 ⇒ λ = 0.998. Comment: Similar to 4, 11/02, Q.23. For grouped data, for an Exponential, method of moments and maximum likelihood are not equal. 11.6. C. F(x) = 1 - {θ/(θ+x)}α = 1 - {20/(20+x)}α. One takes the difference in distribution functions at the top and bottom of each interval, and then takes the product to the power of the number of claims in that interval. Likelihood: (1 - {20/(20+10)}α)45 ({20/(20+10)}α)65. Let y = {20/(20+10)}α = (2/3)α. Then the likelihood is: (1 - y)45 y65. In order to maximize the likelihood, set the derivative with respect to y equal to zero: 0 = -45(1 - y)44 y65 + 65(1 - y)45 y64. ⇒ y = 65/(65 + 45) = 2/7 = .591. ⇒ (2/3)α = .591. ⇒ α = ln(.591)/ln(2/3) = 1.30. Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood by setting the theoretical and empirical distributions functions equal at the boundary between the two groups. 1 - {20/(20+10)}α = 45/110. ⇒ (2/3)α = .591. ⇒ α = 1.30.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 317 11.7. B. The likelihood is: (F(1) - F(0))6 (F(2) - F(1))4 (F(3) - F(2))2 (F(4) - F(3)). Since there is one loss of size at least 3, b ≥ 3. F(1) - F(0) = F(2) - F(1) = F(3) - F(2) = 1/b. F(4) - F(3) = 1/b, provided b ≥ 4. If b ≥ 4, F(4) - F(3) = 1/b. Therefore, if b ≥ 4, the likelihood is 1/b13. This is greatest for b = 4. In contrast, assume for example b = 3.6. Then F(4) - F(3) = .6/3.6, since the uniform on [0, 3.6] has no probability beyond 3.6.. If 3 ≤ b ≤ 4, F(4) - F(3) = (b - 3)/b. If 3 ≤ b ≤ 4, the likelihood is (b-3)/b13. Setting the derivative equal to zero, this is largest for b = (3)(13)/12 = 3.25. The likelihood at 3.25 is: .25/3.2513 = 5.5 x 10-8. The likelihood at 4 is: 1/413 = 1.5 x 10-8. Thus the maximum likelihood occurs for b = 3.25. 11.8. B. F(x) = 1 - {θ/(θ+x)}α = 1 - {θ/(θ+x)}3 . One takes the difference in distribution functions at the top and bottom of each interval, and then takes the product to the power of the number of claims in that interval. Likelihood: (1 - {θ/(θ+1000)}3 )50 ({θ/(θ+1000)}3 )20. Let y = {θ/(θ+1000)}3 . Then the likelihood = (1 - y)50 y20. In order to maximize the likelihood, set the derivative with respect to y equal to zero: 0 = -50(1 - y)49 y20 + 20(1 - y)50 y19. ⇒ y = 20/(50 + 20) = 2/7 = .2857. ⇒ {θ/(θ+1000)}3 = .2857. ⇒ θ = 1929. Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood by setting the theoretical and empirical distributions functions equal at the boundary between the two groups. 1 - {θ/(θ+1000)}3 = 50/70. ⇒ {θ/(θ+1000)}3 = .2857. ⇒ θ = 1929. 11.9. E. F(x) = e-θ/x. The likelihood is: (e-θ/10000)140(e-θ/20000 - e-θ/10000)100(1 - e-θ/20000)160. Let y = e-θ/20000. ⇒ y2 = e-θ/10000. Then the likelihood is: y280(y - y2 )100(1 - y)160 = y380(1 - y)260. Set the derivative with respect to y of the likelihood equal to zero: 380y379(1 - y)260 - 260y380(1 - y)259 = 0. ⇒ 380(1 - y) = 260y. ⇒ y = 19/32.
⇒ e-θ/20000 = 19/32. ⇒ θ = -20000ln(19/32) = 10,426.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 318
11.10. D. Since we are fitting two parameters with only three intervals: Set F(10) = observed proportion from 0 to 10 = 0.248, and S(100) = observed proportion from 100 to ∞ = 0.325.
F(x) = (x/θ)γ / {1 + (x/θ)γ} = 1 / {1 + (θ/x)γ}.
F(10) = 1 / {1 + (θ/10)γ} = 0.248. ⇒ (θ/10)γ = 3.032.
S(100) = 1 / {1 + (100/θ)γ} = 0.325. ⇒ (θ/100)γ = 0.4815.
Dividing: 10γ = 3.032/0.4815 = 6.297. ⇒ γ = 0.799. ⇒ θ = 40.1.
S(500) = 1 / {1 + (500/40.1)0.799} = 11.75%.
Comment: This is mathematically equivalent to percentile matching.
11.11. D. This is grouped data with two intervals, one from 0 to 2 and the other from 2 to ∞. Therefore one uses the Distribution function to compute the likelihood. The distribution function can be obtained either by integrating the given density function or by recognizing that this is a Weibull distribution with parameters θ = λ−1/2, τ = 2. F(x) = 1 - exp(-λx2 ). F(2) = 1 - e-4λ. For each interval in order to compute its likelihood, one takes the difference of the Distribution function at the top and bottom of the interval, and then takes the result to the power equal to the number of claims observed in that interval. Finally the overall likelihood is the product of the likelihoods computed for each interval.
Interval     # Claims     Likelihood
0 to 2       1            F(2) = 1 - e-4λ
2 to ∞       3            [1 - F(2)]3 = e-12λ
overall      4            e-12λ - e-16λ
Thus in this case the overall likelihood is equal to e-12λ - e-16λ . One maximizes the likelihood by setting equal to zero its partial derivative with respect to the single parameter λ: 0 = -12e-12λ + 16 e-16λ. Solving λ = (1/4)ln(4/3) = 0.072. Alternately, when we have data grouped into only two intervals, one can fit via maximum likelihood by setting the theoretical and empirical distributions functions equal at the boundary between the two groups. 1 - e-4λ = 1/4. ⇒ λ = (1/4)ln(4/3) = 0.072. Comment: While one could maximize the loglikelihoods, in this case it seems easier to work with the likelihoods themselves.
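As a quick check of the two-interval shortcut used above (this snippet is mine, not the Guideʼs), solving 1 - e-4λ = 1/4 directly gives the same maximum likelihood estimate:

```python
from math import log

# Two groups only: maximum likelihood sets F(2) equal to the empirical proportion 1/4,
# i.e. 1 - exp(-4 * lambda) = 1/4, so exp(-4 * lambda) = 3/4.
lam = log(4 / 3) / 4
print(round(lam, 4))   # about 0.0719
```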
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 319
11.12. A. F(x) = 1 - e-λx, an exponential distribution. One takes the difference in distribution functions at the top and bottom of each interval and then takes that to the power of the number of claims in that interval. The likelihood is the product of all these terms.
Bottom of    Top of      Number of    F(Bottom        F(Top of     Difference     Contribution
Interval     Interval    Claims       of Interval)    Interval)    of Distrib.    to Likelihood
0            2           2            0               1 - e-2λ     1 - e-2λ       {1 - e-2λ}2
2            5           4            1 - e-2λ        1 - e-5λ     e-2λ - e-5λ    {e-2λ - e-5λ}4
5            ∞           0            1 - e-5λ        1            e-5λ           {e-5λ}0
Likelihood = {1 - e-2λ}2 {e-2λ - e-5λ}4 {e-5λ}0 = {1 - e-2λ }2 {e-2λ - e-5λ }4 . 11.13. B. For grouped data, the likelihood is the product of terms, each of which is the difference of the distribution function at the top and the bottom of an interval taken to the power of the number of claims observed in that interval. The Uniform Distribution Function on [0,θ] has: F(0) = 0, F(k) = k/θ, F(5) = 5/θ, F(θ) = 1. The Likelihood is: (k/θ - 0)1 (5/θ - k/θ)1 (1 - 5/θ)1 = (5k - k2 ) (θ−2 - 5θ−3). In order to maximize the Likelihood as a function of θ, we set the partial derivative with respect to θ of the Likelihood equal to zero. 0 = (5k - k2 ) (-2θ−3 + 15θ−4). Therefore, θ = 15/2 = 7.5. Comment: If one assumed that one had one observation greater than 5, but did not know how big it was (for example one had a maximum covered loss of 5, so the data was censored at 5), then it would reduce to the mathematical situation presented. If [5,θ] had been changed to [5,∞); i.e., greater than or equal to 5, the solution would have been the same. Note that with the uniform distribution, the solution is independent of k. 11.14. E. After shifting we have 10 claims in the interval (0,1) and 30 claims in the interval [1,∞]. For grouped data, the likelihood is the product of terms, each of which is the difference of the distribution function at the top and the bottom of an interval taken to the power of the number of claims observed in that interval. The Pareto Distribution with α = 1 (on the shifted data) is: F(0) = 0, F(1) = 1 - θ/(θ+1) = 1/(θ+1), F(∞) = 1. In this case the Likelihood is: {F(1) - F(0)}10{F(∞) - F(1)}30 = {1/(θ+1)}10{θ/(θ+1)}30 = θ30/(θ+1)40. The loglikelihood is: 30ln(θ) - 40ln(θ+1). In order to maximize the Likelihood, we set the partial derivative with respect to θ of the loglikelihood equal to zero: 0 = 30/θ - 40/(θ+1). Therefore, 30(θ+1) = 40θ, or θ = 3.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 320
11.15. A. The likelihood is the product of terms each of which is the probability covered by an interval taken to the power of the number of claims in that interval. The likelihood = (1/θ) (1/θ)2 ((θ-2)/θ) = θ−3 - 2θ−4. The partial derivative with respect to θ is: -3θ−4 + 8θ−5. Setting that equal to zero and solving, θ = 8/3.
11.16. The contribution to the likelihood of interval (a,b] is: {F(b) - F(a)}# points in interval
Interval     Contribution to Likelihood
(0, 1]       (1 - e−λ)2
(1, 2]       (e−λ - e−2λ)3
(2, 3]       (e−2λ - e−3λ)8
(3, 4]       (e−3λ - e−4λ)6
(4, 5]       (e−4λ - e−5λ)
The likelihood is: (1 - e−λ)2 (e−λ - e−2λ)3 (e−2λ - e−3λ)8 (e−3λ - e−4λ)6 (e−4λ - e−5λ). Let y = e−λ. Then the likelihood is (1 - y)2 y 3 (1 -y)3 y 16(1 - y)8 y 18(1 - y)6 y 4 (1 - y) = y41(1 - y)20. Setting the derivative with respect to y equal to zero: 0 = 41y40(1 - y)20 - 20y41(1 - y)19. 41(1- y) = 20y. y = 41/61.
⇒ λ = - ln y = -ln(41/61) = 0.397. Comment: y41(1 - y)20 is proportional to the density of a Beta Distribution with θ =1, a = 42 and b = 21. This has mode of θ(a - 1)/(a + b - 2) = 41/61. Therefore this expression is maximized for y = 41/61.
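A small numerical confirmation of the Beta-mode shortcut in the Comment above (this check is mine, not the Guideʼs): the loglikelihood 41 ln(y) + 20 ln(1 - y) is maximized at y = 41/61.

```python
import numpy as np

# Mode of a Beta Distribution with a = 42, b = 21 (theta = 1): (a - 1)/(a + b - 2).
a, b = 42, 21
print((a - 1) / (a + b - 2), 41 / 61)

# Direct numerical maximization of the loglikelihood 41*ln(y) + 20*ln(1 - y).
ys = np.linspace(0.001, 0.999, 99999)
print(round(ys[np.argmax(41 * np.log(ys) + 20 * np.log(1 - ys))], 4))   # about 0.6721
```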
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 321
11.17. B. F(x) = 1 - e-x/θ. The likelihood is the product of the contributions from each interval.
Bottom of    Top of      Number of    Difference of           Contribution
Interval     Interval    Claims       Distributions           to Likelihood
0            1000        7            1 - e-1000/θ            {1 - e-1000/θ}7
1000         2000        6            e-1000/θ - e-2000/θ     {e-1000/θ - e-2000/θ}6
2000         ∞           7            e-2000/θ                {e-2000/θ}7
Likelihood = {1 - e-1000/θ}7 {e-1000/θ - e-2000/θ}6 {e-2000/θ}7 . Let y = e-1000/θ. Then, likelihood = (1 - y)7 y 6 (1 - y)6 y 14 = y20(1 - y)13. Maximize the likelihood by setting its derivative equal to zero: 0 = 20y19(1 - y)13 - 13y20(1 - y)12. ⇒ 20(1 - y) = 13y. ⇒ y = 20/33.
⇒ θ = -1000/ln(20/33) = 1997.
Alternately, the loglikelihood is: 7ln[1 - e-1000/θ] + 6ln[e-1000/θ - e-2000/θ] - 14000/θ.
Set the derivative of the loglikelihood with respect to theta equal to zero:
0 = -7e-1000/θ 1000/θ2 / (1 - e-1000/θ) + 6{e-1000/θ 1000/θ2 - e-2000/θ 2000/θ2}/(e-1000/θ - e-2000/θ) + 14000/θ2.
0 = -7e-1000/θ/(1 - e-1000/θ) + 6{e-1000/θ - 2e-2000/θ}/(e-1000/θ - e-2000/θ) + 14.
14(1 - e-1000/θ) = 7e-1000/θ - 6(1 - 2e-1000/θ). ⇒ e-1000/θ = 20/33. ⇒ θ = -1000/ln(20/33) = 1997.
Comment: Note that y20 (1 - y)13 is proportional to a Beta Distribution with a = 21, b = 14, and θ = 1. The mode of this Beta Distribution is:
(a - 1) / (a + b - 2) = (21 - 1) / (21 + 14 - 2) = 20/33.
Thus the likelihood is maximized for y = 20/33. 11.18. B. For θ < 10, likelihood is: F(10)9 {F(25) - F(10)}6 S(25)5 = (1 - θ/10)9 (θ/10 - θ/25)6 (θ/25)5 = (1 - θ/10)9 θ 6 (1/10 - 1/25)6 θ 5 (1/25)5 . loglikelihood is: 9 ln(1 - θ/10) + 11 ln(θ) + constants. Set the derivative with respect to θ equal to zero: 0.9/(1 - θ/10) = 11/θ. ⇒ θ = 5.5.
2013-4-6, Fitting Loss Distributions §11 Grouped Max. Likelihood, HCM 10/14/12, Page 322 11.19. D. This is grouped data; the likelihood is the product of the probability in each interval to the power the number of items in each interval. Here there is only one observation, and the probability covered by the interval from 50 to 500 is: F(500) - F(50). Thus the likelihood is: F(500) - F(50) = e-θ/500 - e-θ/50. Take the partial derivative with respect to θ of the likelihood and set it equal to zero: 0 = -e-θ/500 / 500 + e-θ/50 / 50. ⇒ 10 = exp[-θ/500] / exp[-θ/50] = exp[θ/50 - θ/500] = exp[0.018θ]. ⇒ θ = 128.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 323
Section 12, Chi-Square Test118
The Chi-Square Statistic provides one way to examine the fit of distributions. One sums up the contributions from each interval: (observed number - expected number)2 / expected number.
The Chi-Square Statistic is: Σj=1 to k (Oj - Ej)2 / Ej.
The expected number of claims for the interval [a,b] is N{F(b) - F(a)}, where F(x) is the fitted or assumed distribution function and N is total number of claims. The better the match, the closer the expected and observed values of the Distribution function will be. If there is a reasonable match over all intervals, the sum over all intervals is small. Thus the better the match, the smaller the Chi-Square Statistic.
For example for the Burr distribution, F(x) = 1 - {1/(1 + (x/θ)γ)}α, with parameters α = 3.9913, θ = 40,467, γ = 1.3124 fit to the grouped data in Section 3 by the Method of Maximum Likelihood:119
Bottom of    Top of      # claims in     F(lower)     F(upper)     F(upper) minus    Fitted       (Observed-Fitted)^2
Interval     Interval    the Interval                              F(lower)          # claims     /Fitted
$ Thous.     $ Thous.
0            5           2208            0.000000     0.220189     0.220189          2201.89      0.017
5            10          2247            0.220189     0.446379     0.226190          2261.90      0.098
10           15          1701            0.446379     0.617040     0.170661          1706.61      0.018
15           20          1220            0.617040     0.736355     0.119315          1193.15      0.604
20           25          799             0.736355     0.817547     0.081192          811.92       0.206
25           50          1481            0.817547     0.965227     0.147680          1476.80      0.012
50           75          254             0.965227     0.990915     0.025688          256.88       0.032
75           100         57              0.990915     0.996977     0.006062          60.62        0.216
100          Infinity    33              0.996977     1.000000     0.003023          30.23        0.254
Total                    10000                                     1.000000          10000        1.458
For example, F(25000) = 1 - {1/ (1+ (x/θ)γ)}α = 1 - {1/(1+(25000/40467)1.3124)}3.9913 = 0.81755. 118
See Section 16.4.3 of Loss Models. The Chi-Square Statistic is also referred to as “Pearsonʼs goodness-of-fit statistic.” 119 When computing the Chi-Square Statistic, since the result can be sensitive to rounding, avoid intermediate rounding. I would compute the fitted/assumed values to at least one decimal place; two decimal places would be better when the fitted/assumed values are small.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 324
The fitted number of claims in the interval 25000 to 50000 is: (10,000){F(50000) - F(25000)} = (10000)(0.96523 - 0.81755) = 1476.8, where 10,000 is the total observed claims. The contribution to the Chi-Square Statistic from the interval 25000 to 50000 is: (1481 - 1476.8)2 / 1476.8 = 0.012. The Chi-Square Statistic is the sum of the contributions from each of the intervals, which is 1.458 in this case. This value of 1.458 is small because of the close match of the fitted Burr Distribution to the observed data.
Degrees of Freedom:
For example, when the Chi-Square test is used to test the fairness of dice, one assumes the probability of each face coming up is the same, prior to seeing any data. Another example is when one tests whether the distribution fit to last yearʼs data also fits this yearʼs new data. For an assumed rather than fitted distribution, the computed sum follows a Chi-Square distribution with degrees of freedom equal to the number of intervals minus one.120 121
The number of degrees of freedom tells you what row of the Chi-Square table to use to test the significance of the match. Note that the Chi-Square Statistic is always non-negative; by definition it is greater than or equal to zero.122 The smaller the Chi-Square Statistic, the better the fit between the curve and the observed data.
When you fit a distribution to this data, you lose one degree of freedom per fitted parameter; d.f. = # intervals - 1 - # fitted parameters.123
120 We lose one degree of freedom, because the sum of the expected column always equals the sum of the observed column.
121 If one has 9 intervals as above, then one has taken the sum of 9 Standard Unit Normals squared. This sum would ordinarily have a Chi-Square distribution with 9 degrees of freedom. However, we lose one degree of freedom, since the total number of fitted claims over all intervals is set equal to the total number of observed claims, 10,000. Thus in this case we have 8 degrees of freedom, if there were no fitted parameters. A derivation is given subsequently.
122 Therefore, one performs a one-sided test, rather than a two-sided test as with the use of the Normal distribution.
123 In the case of maximum likelihood applied to grouped data one loses one degree of freedom per fitted parameter. If one applies the method of maximum likelihood to the individual claim values of ungrouped data, then one loses degrees of freedom, but somewhat less than one per fitted parameter. In this case, to be conservative one can assume one loses one degree of freedom per fitted parameter, although the actual loss of degrees of freedom may be less. See Kendallʼs Advanced Theory of Statistics, by Stuart and Ord, 5th edition, Vol. 2, pp. 1166, 1172.
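The Python sketch below is not from the Guide; it simply reproduces the Burr example above, assuming NumPy is available: compute F at the interval boundaries, the expected counts, the Chi-Square Statistic, and the number of degrees of freedom.

```python
import numpy as np

# Burr Distribution fit by maximum likelihood to the grouped data in Section 3.
alpha, theta, gamma = 3.9913, 40467.0, 1.3124
F = lambda x: 1 - (1 / (1 + (x / theta) ** gamma)) ** alpha

boundaries = [0, 5000, 10000, 15000, 20000, 25000, 50000, 75000, 100000, np.inf]
observed = np.array([2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33])
n = observed.sum()

probabilities = np.diff([F(b) if np.isfinite(b) else 1.0 for b in boundaries])
expected = n * probabilities
chi_square = ((observed - expected) ** 2 / expected).sum()
degrees_of_freedom = len(observed) - 1 - 3    # three fitted parameters

print(round(chi_square, 3), degrees_of_freedom)   # about 1.458, and 5
```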
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 325
We only subtract the number of parameters when the distribution has been fit to the data set we are using to compute the Chi-Square. We do not decrease the number of degrees of freedom if this distribution has been fit to some similar but different data set. Note that with fewer degrees of freedom, the mean of the Chi-Square distribution is less, and therefore it is easier to reject a fitted curve.124 The point being that the parameters of the fitted distribution have been selected precisely to fit the particular observed data, so one should be less impressed by a low Chi-Square value than one would be in a situation where the curveʼs parameters were selected prior to seeing the particular data.
Reduce the number of degrees of freedom for the number of fitted parameters when:
1. You are given grouped data or group data into intervals.
2. You fit a distribution to the data in #1.
3. Compare the fitted distribution in #2 to the data in #1.
4. Number of degrees of freedom is: # of intervals - 1 - number of parameters fit in #2.
For example, Kermit is computing a Chi-Square for a LogNormal Distribution (with two parameters µ and σ) compared to the Muppet Insurance Companyʼs individual health insurance size of loss data for the year 2000 from the state of Nebraska. If Kermit fit a LogNormal Distribution to this data, then the number of degrees of freedom are reduced by the number of fitted parameters, two.
However, here are some examples where in comparing a LogNormal Distribution to Muppet Insurance Companyʼs individual health insurance data for the year 2000 from the state of Nebraska, Kermit should not subtract the number of parameters, two, from the number of degrees of freedom:
1. Kermit is comparing this data to a LogNormal fit to the data from the year 1999, a different year.
2. Kermit is comparing this data to a LogNormal fit to data from the state of Kansas, a different state.
3. Kermit is comparing this data to a LogNormal fit to data from some other insurer.
4. Kermit is comparing this data to a LogNormal fit to data for group health rather than individual health insurance, a different line of insurance.
5. Kermit is comparing this data to a LogNormal with parameters picked by Miss Piggy, who has not seen the data Kermit is examining. This is an assumed distribution.
In computing the degrees of freedom, one takes the number of intervals actually used to compute the Chi-Square Statistic.
124 The mean of a Chi-Square Distribution with ν degrees of freedom is ν.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 326
To compute the number of Degrees of Freedom:
1. Determine the groups to use in computing the Chi-Square statistic. Unless the exam question has specifically told you which groups to use, use the groups for the data given in the question.
2. Determine whether any parameters have been fit to this data, and if so how many.
3. Degrees of freedom = (# intervals from step 1) - 1 - (# of fitted parameters, if any, from step #2).
Some Intuition Behind Reducing Degrees of Freedom:
By using a lot of parameters, one can get a good match between a model and the data. For example, one can fit a 10th degree polynomial to pass through any 11 data points. When we fit a distribution to data, we have determined the parameters so that the distribution will look like a good match to the data. Thus the Chi-Square Statistic will be smaller when comparing the distribution to the data to which we have fit it.
Degrees of                      Value of P
Freedom      0.900      0.950      0.975      0.990      0.995
1            2.706      3.841      5.024      6.635      7.879
2            4.605      5.991      7.378      9.210      10.597
3            6.251      7.815      9.348      11.345     12.838
4            7.779      9.488      11.143     13.277     14.860
The critical values increase as we go down each column of the Chi-Square Table. If we have fit 2 parameters, and compare the distribution to the data to which it has been fit, then we have done (approximately) 2 degrees of freedom worth of “cheating,” thereby making the Chi-Square Statistic smaller than it would have otherwise been. Moving up 2 rows in the table, we compare the statistic to smaller values, compensating for our “cheating.” The more parameters we have fit, the larger the needed adjustment. If we had instead fit 3 parameters, we would reduce the degrees of freedom by 3, and move up 3 rows rather than 2.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 327
Which Groups to Use in Computing the Chi-Square Statistic:125
As stated above, unless an exam question has specifically told you how to determine which groups to use, use the groups for the data given in the question. If the exam question gives you some rule that should be followed, you may have to combine some of the given groups.126 According to Loss Models, the Chi-Square Goodness of Fit Test works best when the expected number of items are approximately equal for the different intervals. Loss Models mentions various additional rules of thumb used by different authors:127
• One should have an expected number of items in each interval of at least 5.
• One should have an expected number of items in each interval of at least 1.
• One should have an expected number of items in each interval of at least 1, and an expected number of items of at least 5 in at least 80% of the intervals.
• When testing at a 1% significance level, one should have an average expected number of items in each interval of at least 4.
• When testing at a 5% significance level, one should have an average expected number of items in each interval of at least 2.
• One should have a sample size of at least 10, at least 3 intervals, and (sample size)2 /(number of intervals) ≥ 10.
125 See page 452 of Loss Models.
126 See 4, 11/04, Q.10. In “Mahlerʼs Guide to Fitting Frequency Distributions”, 4, 5/00, Q.29 and 4, 5/01, Q.19.
127 In practical applications, there are a number of different rules of thumb one can use for determining the groups to use. I use one of the rules mentioned in Loss Models: One should have an expected number of claims in each interval of 5 or more, so that the normal approximation that underlies the theory is reasonably close; therefore, some of the given intervals for grouped data may be combined for purposes of applying the Chi-Square test.
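One way to implement the combining of intervals just discussed is sketched below. The code and its sample numbers are illustrative and are not from the Guide; it merges each interval into the next until the running expected count reaches the chosen minimum of 5.

```python
import numpy as np

def combine_groups(observed, expected, minimum=5.0):
    # Merge consecutive intervals until each merged group has an expected
    # count of at least `minimum`; any leftover tail is folded into the last group.
    obs_out, exp_out = [], []
    o_run, e_run = 0.0, 0.0
    for o, e in zip(observed, expected):
        o_run += o
        e_run += e
        if e_run >= minimum:
            obs_out.append(o_run)
            exp_out.append(e_run)
            o_run, e_run = 0.0, 0.0
    if e_run > 0:
        obs_out[-1] += o_run
        exp_out[-1] += e_run
    return np.array(obs_out), np.array(exp_out)

# Illustrative (made-up) counts: the last two expected counts are below 5,
# so the final two intervals end up combined before computing the statistic.
observed = [20, 15, 8, 4, 3]
expected = [18.0, 16.0, 9.0, 4.5, 2.5]
o, e = combine_groups(observed, expected)
print(o, e, round(((o - e) ** 2 / e).sum(), 3))
```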
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 328
A Weibull Distribution Example:
Here is another example of the computation of a Chi-Square Statistic. For the Weibull distribution fit to the grouped data in Section 3 by the Method of Maximum Likelihood, θ = 16,184 and τ = 1.0997, F(x) = 1 - exp[-(x/16184)1.0997]:
Bottom of    Top of      Oi       F(lower)    F(upper)    F(upper) minus    Ei        (Oi - Ei)^2 / Ei
Interval     Interval                                     F(lower)
$ Thous.     $ Thous.
0            5           2208     0.00000     0.24028     0.24028           2402.8    15.8
5            10          2247     0.24028     0.44508     0.20480           2048.0    19.3
10           15          1701     0.44508     0.60142     0.15634           1563.4    12.1
15           20          1220     0.60142     0.71696     0.11553           1155.3    3.6
20           25          799      0.71696     0.80075     0.08379           837.9     1.8
25           50          1481     0.80075     0.96848     0.16774           1677.4    23.0
50           75          254      0.96848     0.99548     0.02700           270.0     0.9
75           100         57       0.99548     0.99939     0.00391           39.1      8.2
100          Infinity    33       0.99939     1.00000     0.00061           6.1       119.9
Total                    10000                            1.00000           10000     204.6
With a very large Chi-Square of 204.6, the Weibull Distribution is a very poor fit to this data. By examining the final column, one can see which intervals contributed significantly to the Chi-Square. In this case, the interval from 100 to infinity contributed an extremely large amount; the righthand tail of the Weibull is too light to fit this data. In this case, the fit is so poor there are also very large contributions from many other intervals.128
Testing a Fit: The p-value (probability value) is the value of the Survival Function of the Chi-Square Distribution (for the appropriate number of degrees of freedom) at the value of the Chi-Square Statistic. If the data came from the fitted/assumed distribution, then the p-value is the probability that the Chi-Square statistic would be greater than its observed value. A large p-value indicates a good fit. The p-value determines the significance level at which one rejects the fit.129 130
128
A contribution from an interval of 2 or more is significant. A contribution of 5 or more is large. When using the Chi-Square Table, one rejects at the significance value in the table that first exceeds the p-value. For example, with a p-value of 0.6% one rejects at 1%. Using a computer, one can get more accurate p-values, than by using a table. 130 See the subsequent section on hypothesis testing for a general discussion of p-values. 129
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 329
For some Distributions fit to the grouped data in Section 3 by the Method of Maximum Likelihood, the values are:131
                         # Parameters    Chi-Square Statistic                   p-value132
Burr                     3               1.46                                   91.8%
Generalized Pareto       3               5.70                                   33.7%
Transformed Gamma        3               16.45     reject at 1%                 0.57%
Gamma                    2               142.7     reject at 1/2%               0133
LogNormal                2               178.3     reject at 1/2%               0
Weibull                  2               207.5     reject at 1/2%               0
Inverse Gaussian         2               238.5     reject at 1/2%               0
Exponential              1               255.4     reject at 1/2%               0
With 9 intervals, there are 8 minus the number of fitted parameters degrees of freedom. Thus for the Transformed Gamma with three fitted parameters, we are interested in the Chi-Square distribution with 5 degrees of freedom.134
Degrees of        Significance Levels (1 - P)
Freedom      0.100       0.050       0.025       0.010       0.005
5            9.236       11.070      12.832      15.086      16.750
6            10.645      12.592      14.449      16.812      18.548
7            12.017      14.067      16.013      18.475      20.278
From the Chi-Square table, for 5 degrees of freedom, as shown above, 1 - F(15.086) = S(15.086) = 1% and S(16.750) = 0.5%. In other words, for five degrees of freedom the critical value for 1% is 15.086, while the critical value for 1/2% is 16.750. In the test of the fit of the Transformed Gamma Distribution, the p-value is the survival function at 16.45. From the Chi-Square for 5 degrees of freedom, S(15.086) = 1% and S(16.750) = 0.5%, and therefore, 1% > S(16.45) > 0.5%. Thus the p-value for the fitted Transformed Gamma is between 1% and 0.5%. Using a computer, one can determine that S(16.45) = 0.57%. Thus the pvalue for the Transformed Gamma fit is 0.57%. Since 16.45 > 15.086, the critical value for 1%, we can reject the Transformed Gamma fit by maximum likelihood at a 1% level. Since 16.45 < 16.750, the critical value for 1/2%, we can not reject the Transformed Gamma fit by maximum likelihood at a 1/2% level. In other words, we reject the fit of the Transformed Gamma at 1% and do not reject at 1/2%. 131
See the section on maximum likelihood fitting to grouped data for the parameters. Obtained using a computer. Using the table, one can determine that for the Transformed Gamma the p-value is between 1% and 1/2 %. 133 Less than 3 x 10-28. 134 The Chi-Square distribution with 5 degrees of freedom is a Gamma Distribution with α = 5/2 and θ = 2. 132
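If a computer is available, the p-value and the reject / do-not-reject decisions can be read off directly. The sketch below is mine (it assumes SciPy is installed) and uses the Transformed Gamma figures from the table above.

```python
from scipy.stats import chi2

# Transformed Gamma fit above: statistic 16.45 with 9 - 1 - 3 = 5 degrees of freedom.
statistic, df = 16.45, 5
p_value = chi2.sf(statistic, df)          # survival function of the Chi-Square = p-value
print(round(p_value, 4))                  # about 0.0057, i.e. 0.57%

# Reject at every significance level greater than or equal to the p-value.
for level in [0.10, 0.05, 0.025, 0.01, 0.005]:
    print(level, "reject" if p_value <= level else "do not reject")
```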
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 330
There is a “mechanical” method of using the Chi-Square Table that may help avoid mistakes on the exam. Once one has the calculated value of Chi-Square for the fitted Transformed Gamma, one would proceed as follows. Looking at the row of the Chi-Square Table135 for 8 - 3 = 5 degrees of freedom, we see which entries bracket the calculated Chi-Square value of 16.45 for the fitted Transformed Gamma. In this case 15.086 < 16.45 < 16.750. We reject at the significance level of the left hand of the two columns, which in this case is 1%. We do not reject at the significance level of the right hand of the two columns, which in this case is 1/2%. In general, look on the correct row of the table, determine which entries bracket the statistic, and reject to the left and do not reject to the right.
Exercise: For 7 degrees of freedom, the Chi-Square Statistic is 15.52. At what significance level do you reject the null hypothesis?
[Solution: 14.067 < 15.52 < 16.013 ⇒ reject at 5%, and do not reject at 2.5%.]
The very poor fits of the two parameter distributions can all be easily rejected at 1/2%, since the critical value for 6 degrees of freedom is 18.548. For example, the Chi-Square value for the fitted Gamma is 142.7, and since 142.7 > 18.548, we reject at the 1/2% significance level.
Low p-values indicate a poor fit. For example, the fits of the Transformed Gamma, Gamma, Weibull, Inverse Gaussian, and Exponential each have p-values less than 1%. A p-value of less than 1% gives a strong indication that the data did not come from the fitted distribution.
Another Formula for the Chi-Square Statistic:
χ2 = Σ (Ei - Oi)2 / Ei = Σ (Ei2 - 2OiEi + Oi2 ) / Ei = Σ (Oi2 / Ei - 2Oi + Ei) = Σ Oi2 / Ei - 2ΣOi + ΣEi = Σ (Oi2 / Ei) - 2n + n = Σ (Oi2 / Ei) - n.136
χ2 = Σi=1 to n Oi2 / Ei - n.
Some people may find this mathematically equivalent alternate formula to be useful.137
135
See the Chi-Square Table given below, prior to the problems. Where I have used the fact that Σ Ei = Σ Oi = n. 137 See for example, 4, 5/07, Q.5. 136
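A quick check of this alternate formula (the snippet is mine, not the Guideʼs), using the observed and fitted counts tabulated in the Burr example above; both forms give the same statistic.

```python
import numpy as np

O = np.array([2208, 2247, 1701, 1220, 799, 1481, 254, 57, 33], dtype=float)
E = np.array([2201.89, 2261.90, 1706.61, 1193.15, 811.92, 1476.80, 256.88, 60.62, 30.23])
n = O.sum()   # total number of claims; note that sum(E) = sum(O) = n here

print(round(((O - E) ** 2 / E).sum(), 3))      # usual form
print(round((O ** 2 / E).sum() - n, 3))        # alternate form: sum(O^2/E) - n
```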
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 331
Chi-Square Statistic, Ungrouped Data:
When dealing with ungrouped data in order to compute the Chi-Square Statistic you must choose intervals in which to group the data. I have chosen 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000 as the endpoints to use with the ungrouped data in Section 2.138 For example, here is the computation of the Chi-Square Statistic for a Weibull Distribution with parameters θ = 230,000, τ = 0.6, compared to the ungrouped data in Section 2.
Lower        Upper        Observed    F(lower)    F(upper)    Assumed     Chi
Endpoint     Endpoint     # claims                            # claims    Square
0            10,000       8           0.00000     0.14135     18.38       5.858
10,000       25,000       13          0.14135     0.23208     11.80       0.123
25,000       50,000       12          0.23208     0.32986     12.71       0.040
50,000       100,000      24          0.32986     0.45484     16.25       3.698
100,000      250,000      35          0.45484     0.65052     25.44       3.595
250,000      500,000      19          0.65052     0.79678     19.01       0.000
500,000      1,000,000    12          0.79678     0.91066     14.80       0.531
1,000,000    infinity     7           0.91066     1.00000     11.61       1.834
Total                     130                                 130         15.678
The computed Chi-Square for the Weibull Distribution is 15.678. In this case, we have not fit a Weibull Distribution to this data; we assume the parameters were selected without looking at this data set.139 Thus since we have (used) 8 intervals, we consult the Chi-Square Table for 8 - 1 = 7 degrees of freedom. Since 15.678 > 14.067, we reject the Weibull at 5%. Since 15.678 ≤ 16.013, we do not reject the Weibull at 2.5%.
One can compute the Chi-Square Statistic for various distributions fit by maximum likelihood to the ungrouped data in Section 2. The values are:140
                         # Parameters    Chi-Square Statistic                    p-value141
Pareto                   2               1.50                                    82.7%
Burr                     3               1.60                                    80.9%
Generalized Pareto       3               1.63                                    80.3%
Transformed Gamma        3               2.78                                    59.5%
LogNormal                2               5.3                                     38.0%
Weibull                  2               7.9                                     16.2%
Gamma                    2               13.0      reject at 2.5%                2.3%
Exponential              1               26.4      reject at 1/2%                0.02%
138
Unfortunately, the computed Chi-Square values depend somewhat on this choice of intervals. Perhaps this Weibull was fit to a similar set of data collected a year before the data in Section 2. 140 Using intervals with 0, 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, 1,000,000, and infinity as endpoints. See the section on Maximum Likelihood Ungrouped Data, for the parameters of the fitted distributions. 141 Via computer. Using the table, one can determine that for the Gamma the p-value is between 2.5% and 1 %. 139
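For the assumed (not fitted) Weibull above, the whole test can be done in a few lines. This sketch is mine, assuming NumPy and SciPy are available; it should reproduce the 15.678 statistic, tested with 8 - 1 = 7 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

theta, tau = 230000.0, 0.6
F = lambda x: 1 - np.exp(-(x / theta) ** tau)   # assumed Weibull, no fitted parameters

endpoints = [0, 10000, 25000, 50000, 100000, 250000, 500000, 1000000, np.inf]
observed = np.array([8, 13, 12, 24, 35, 19, 12, 7])
n = observed.sum()

expected = n * np.diff([F(x) if np.isfinite(x) else 1.0 for x in endpoints])
statistic = ((observed - expected) ** 2 / expected).sum()
print(round(statistic, 3))                  # about 15.678
print(round(chi2.sf(statistic, df=7), 4))   # p-value between 2.5% and 5%
```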
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 332
The p-value of the fitted Exponential is less than 1%, strongly indicating this data did not come from this Exponential. The p-value of the fitted Gamma is between 10% and 1%, providing some indication that this data did not come from this Gamma, but some uncertainty as well. The p-values of the other fits are all above 10%, providing no evidence for H1 , the alternative hypothesis that the data did not come from the fitted distribution.142
Here is the computation of the Chi-Square statistic for the Pareto Distribution fit to the ungrouped data in Section 2 via maximum likelihood, α = 1.702, θ = 240151:
Lower        Upper        Observed    F(lower)    F(upper)    Fitted      Chi
Endpoint     Endpoint     # claims                            # claims    Square
0            10,000       8           0.00000     0.06708     8.72        0.06
10,000       25,000       13          0.06708     0.15511     11.44       0.21
25,000       50,000       12          0.15511     0.27523     15.62       0.84
50,000       100,000      24          0.27523     0.44706     22.34       0.12
100,000      250,000      35          0.44706     0.70308     33.28       0.09
250,000      500,000      19          0.70308     0.85277     19.46       0.01
500,000      1,000,000    12          0.85277     0.93884     11.19       0.06
1,000,000    infinity     7           0.93884     1.00000     7.95        0.11
Total                     130                                 130         1.50
In contrast for the Exponential the Chi-Square statistic is very large at 26.4. With one parameter, we compare to a Chi-Square Distribution with 8 - 1 - 1 = 6 degrees of freedom.143 The Exponential is such a poor fit that we can reject it at a 1/2% significance level.144 For the Gamma distribution with two parameters, we would compare to the Chi-Square Distribution with 8 - 1 - 2 = 5 degrees of freedom.145 Since the Chi-Square for the Gamma is 13.0, which is greater than the 12.832 critical value for 2.5%, we can reject the Gamma at a 2.5% significance level. On the other hand, the Gamma has a Chi-Square 13.0 < 15.086, the value at which the Chi-Square distribution with 5 d.f. is 99%, so we can not reject the Gamma at a 1% significance level.
142
This general statement about p-values could be applied equally well to other hypothesis tests, for example the Kolmogorov-Smirnov Statistic. 143 Due to the one fitted parameter, Iʼve subtracted 1 from the number of intervals minus one. 144 For 6 degrees of freedom the Chi-Square distribution is 99.5% at 18.548. 26.4 > 18.548 so we can reject the Exponential Distribution at 1/2%. 145 Due to the two fitted parameters, Iʼve subtracted 2 from the number of intervals minus one. When as here one applies the method of maximum likelihood to the individual claim values of ungrouped data, then one loses degrees of freedom, but somewhat less than one per fitted parameter. In this case, to be conservative I have assumed one loses the full 2 degree of freedom, although the actual loss of degrees of freedom may be less. See Kendallʼs Advanced Theory of Statistics, by Stuart and Ord, 5th edition, Vol.2 , p. 1172.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 333 The null hypothesis is that the ungrouped data in Section 2 was drawn from the maximum likelihood Gamma Distribution, while the alternative hypothesis is that the data was not drawn from this Gamma Distribution. Hypothesis tests are set up to disprove something.146 At the 10% significance level, we reject the hypothesis that the ungrouped data in Section 2 was drawn from the maximum likelihood Gamma Distribution. In other words, if this data was drawn from the maximum likelihood Gamma Distribution, then there is less than a 10% probability that the ChiSquare Statistic would be at least 13.0, the observed value. At the 10% significance level, we do not reject the hypothesis that the ungrouped data in Section 2 was drawn from the maximum likelihood Weibull Distribution. At the 10% significance level, we do not reject the hypothesis that the ungrouped data in Section 2 was drawn from the maximum likelihood LogNormal Distribution. Clearly, the data can not be drawn from both the LogNormal and the Weibull. It might be drawn from one of them or neither of them, but not both. At most one of these hypotheses is true. However, given the amount of data we have, only 130 data points, we do not have enough information to disprove either of these hypothesis at the 10% significance level. With more similar data to the 130 data points we have, we should be able to get more information on which distribution generated the data. For the LogNormal Distribution, the p-value is 38.0%. In other words, if this data was drawn from the maximum likelihood LogNormal, then there is 38.0% probability that the Chi-Square Statistic would be at least 5.3, the observed value.147 This does not demonstrate that the data came from this LogNormal Distribution. Rather it provides insufficient information to disprove, at a 10% significance level, that the data came from this LogNormal Distribution. Some Intuition for the Chi-Square Test: Assume that the data was drawn from the distribution F; in other words H0 is true. For the interval ai to bi, the probability that each loss will be in that interval is: F(bi) - F(ai). Therefore, for m losses, the number of losses in this interval is Binomial with parameters m and q = F(bi) - F(ai). For small q, this Binomial is approximately a Poisson with mean m{F(bi) - F(ai)}. For large m, this Poisson is approximately Normal with mean m{F(bi) - F(ai)} and variance m{F(bi) - F(ai)}. Number of losses expected in interval i: Ei = m {F(bi) - F(ai)}. 146 147
See the subsequent section on Hypothesis Testing. For the Chi-Square Distribution with 8 - 1 - 2 = 5 degrees of freedom, S(5.3) = 38.0%.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 334
Number of losses observed in interval i: Oi ≅ Normal with µ = Ei and σ2 = Ei.
(Oi - Ei) / √Ei ≅ Normal with µ = 0 and σ = 1.
(Oi - Ei)2 / Ei ≅ Square of Standard Normal = Chi-Square with one degree of freedom.
The sum of ν independent Chi-Square Distributions each with one degree of freedom is a Chi-Square Distribution with ν degrees of freedom. If we have ν intervals, we are summing ν approximately Chi-Square Distributions with one degree of freedom. However, when we add up the contributions from each interval, they are not independent, because ΣOi = ΣEi. Therefore, one loses a degree of freedom. While the above is not a derivation, hopefully it gave you some idea of where the test comes from. Below is a derivation of the fact that as the number of items in the data set gets large, the Chi-Square Statistic approaches a Chi-Square Distribution.
Derivation of the Chi-Square Test:148
For a claims process that followed the assumed distribution, the mean number of claims for the interval [a,b] is N{F(b) - F(a)}, where F(x) is the assumed distribution function and N is total number of claims. Let pi = F(bi) - F(ai) = the probability covered by interval i. Then the assumed mean number of claims for interval i = µi = piN. The observed number of claims for interval i = xi. With k intervals, the observed data has a multinomial distribution with probabilities: p1 , p2 , ..., pk.149 The variance of the number of observations in the first interval is: Np1 (1-p1 ). The covariance of the number of observations in the first interval and the number of observations in the second interval is: -Np1 p 2 . This multinomial distribution has variance-covariance matrix C, with Cii = Npi(1-pi) and Cij = -Npip j. Due to the linear constraint, Σpi = 1, C has rank k - 1. If we eliminate the last row and last column of C, we have a nonsingular matrix C*. Let D = the matrix inverse of C*.150 Then it turns out that Dii = (1/pi + 1/pk)/N and Dij = (1/pk)/N. As N gets large, this multinomial distribution approaches a multivariate Normal Distribution.
See for example, Volume I of Kendallʼs Advanced Theory of Statistics. This is the multivariate analog of the Binomial Distribution, which involves a single variable. 150 Taking the inverse of the variance-covariance matrix in the multivariate case is analogous to taking the inverse of the variance in the case of a Normal Distribution. 149
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 335 Therefore the quadratic form (x - µ)D(x - µ) has approximately a Chi-Square Distribution.151 It has number of degrees of freedom equal to the rank of D, which is k -1 = # of intervals - 1.152 k-1
(x - µ)D(x - µ) = {Σi=1 to k-1 (1/pi + 1/pk) (xi - µi)2 + Σi=1 to k-1 Σj=1 to k-1, j≠i (1/pk) (xi - µi)(xj - µj)} / N
= Σi=1 to k-1 (xi - µi)2 / (N pi) + {Σi=1 to k-1 (xi - µi)}2 / (N pk)
= Σi=1 to k-1 (xi - µi)2 / µi + {Σi=1 to k-1 (xi - µi)}2 / µk.
However, Σi=1 to k xi = N = Σi=1 to k µi. Therefore, Σi=1 to k-1 (xi - µi) = µk - xk. Therefore,
(x - µ)D(x - µ) = Σi=1 to k (xi - µi)2 / µi = the usual Chi-Square test statistic.
Subtracting the mean, squaring, and dividing by the variance is how one would convert a Normal Distribution to the square of a Standard Normal Distribution, which is a Chi-Square Distribution with one degree of freedom. We have done the analog in the multivariate case. 152 We lost one degree of freedom due to the linear constraint Σpi = 1. In the Chi-Square test, the total number of expected claims over all intervals is set equal to the total number of observed claims. 153 This is similar to the Normal Approximation of a Binomial Distribution. This approximation is poor unless the mean of the Binomial is large enough. 154 As discussed previously, statisticians use various different rules of thumb. See page 452 of Loss Models.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 336 Chi-Square Distribution: Let Z1 , Z2 , ..., Zν be independent Unit Normal variables, then Z1 2 + Z2 2 + ... + Zν2 is said to have a Chi-Square Distribution with ν degrees of freedom. A Chi-Square Distribution with ν degrees of freedom is the sum of ν independent Chi-Square Distributions with 1 degree of freedom. A Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν/2 and θ = 2, with mean αθ = ν, and variance αθ2 = 2ν. Therefore a Chi-Square Distribution with 2 degrees of freedom is a Gamma Distribution with α = 2/2 = 1 and θ = 2, an Exponential Distribution with mean 2. Therefore F(.10) = 1 - e-.1/2 = 5%, F(5.99) = 1 - e-5.99/2 = 95%, and F(7.38) = 1 - e-7.38/2 = 97.5%, matching the values shown in the Chi-Square Table.
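A one-line numerical check of this identity (assuming SciPy; not part of the Guide): the Chi-Square Distribution with 2 degrees of freedom matches an Exponential with mean 2 at the tabled values.

```python
from scipy.stats import chi2, expon

# Chi-Square with 2 d.f. is a Gamma with alpha = 1, theta = 2, i.e. Exponential, mean 2.
for x in [0.103, 5.991, 7.378]:
    print(x, round(chi2.cdf(x, df=2), 4), round(expon.cdf(x, scale=2), 4))
# both columns: about 0.05, 0.95, and 0.975
```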
Chi-Square Table: On a subsequent page is the Chi-Square Table to be attached to your exam. The different rows correspond to different degrees of freedom; in this case the degrees of freedom extend from 1 to 20.155 For most exam questions, one first determines how many degrees of freedom one has, and therefore which row of the table to use, and then ignores all of the other rows of the table. The values shown in each row are the places where the Chi-Square Distribution Function for that numbers of degrees of freedom has the stated P values. The value of the distribution function is denoted by P (capital p.) So for example, for 4 degrees of freedom, F(0.484) = 0.025 and F(13.277) = 0.990. Unity minus the distribution function is the Survival Function; the value of the Survival Function is the p-value (small p). For example, for 4 degrees of freedom, 13.277 is the critical value corresponding to a significance level of 1 - 0.990 = 1%. The critical values corresponding to a 1% significance level are in the column labeled P = 0.990. Similarly, the critical values corresponding to a 5% significance level are in the column labeled P = 0.950. 155
One can approximate values beyond those shown in the table. For ν ≥ 40, √(2χ2) is approximately Normal with mean √(2ν - 1) and variance 1. A better approximation is that (χ2/ν)1/3 is approximately Normal with mean 1 - 2/(9ν) and variance 2/(9ν). See Kendallʼs Advanced Theory of Statistics.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 337 For example, for 4 degrees of freedom, as shown in the table, F(0.711) = 0.05.
[Graph: Chi-Square density for 4 degrees of freedom; the 5% of probability below 0.711 is shaded. Horizontal axis from 0 to 10.]
In other words, 0.711 is the 5th percentile of the Chi-Square Distribution with 4 degrees of freedom. Similarly, as shown in the table for 4 degrees of freedom, F(11.143) = 0.975.
[Graph: Chi-Square density for 4 degrees of freedom; the 97.5% of probability below 11.143 is shaded.]
Unity minus the Distribution Function is the Survival Function. The value of the Survival Function is the p-value (small p) in the Chi-Square Goodness of Fit Test.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 338 For example, for 4 degrees of freedom, S(11.143) = 2.5%.
[Graph: Chi-Square density for 4 degrees of freedom; the 2.5% of probability above 11.143 is shaded.]
11.143 is the critical value corresponding to a significance level of 1 - 0.975 = 2.5% in the Chi-Square Goodness of Fit Test. The critical values corresponding to a 1% significance level are in the column labeled P = .990. For 4 degrees of freedom, 13.277 is the critical value corresponding to a significance level of 1 - 0.99 = 1% in the Chi-Square Goodness of Fit Test.
[Graph: Chi-Square density for 4 degrees of freedom; the 1% of probability above 13.277 is shaded.]
Similarly, the critical values corresponding to a 5% significance level for the Chi-Square Goodness of Fit Test are in the column labeled P = 0.950.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 339 For the following questions, use the following Chi-Square table:
χ2 Distribution
[Graph: density of the χ2 Distribution, with probability P to the left of χ02 and probability 1 - P to the right.]
The table below gives the value of χ02 for which Prob[χ2 < χ02] = P for a given number of degrees of freedom and a given value of P.
Degrees of                                    Value of P
Freedom    0.005    0.010    0.025    0.050    0.900     0.950     0.975     0.990     0.995
 1         0.000    0.000    0.001    0.004    2.706     3.841     5.024     6.635     7.879
 2         0.010    0.020    0.051    0.103    4.605     5.991     7.378     9.210     10.597
 3         0.072    0.115    0.216    0.352    6.251     7.815     9.348     11.345    12.838
 4         0.207    0.297    0.484    0.711    7.779     9.488     11.143    13.277    14.860
 5         0.412    0.554    0.831    1.145    9.236     11.070    12.832    15.086    16.750
 6         0.676    0.872    1.237    1.635    10.645    12.592    14.449    16.812    18.548
 7         0.989    1.239    1.690    2.167    12.017    14.067    16.013    18.475    20.278
 8         1.344    1.647    2.180    2.733    13.362    15.507    17.535    20.090    21.955
 9         1.735    2.088    2.700    3.325    14.684    16.919    19.023    21.666    23.589
10         2.156    2.558    3.247    3.940    15.987    18.307    20.483    23.209    25.188
11         2.603    3.053    3.816    4.575    17.275    19.675    21.920    24.725    26.757
12         3.074    3.571    4.404    5.226    18.549    21.026    23.337    26.217    28.300
13         3.565    4.107    5.009    5.892    19.812    22.362    24.736    27.688    29.819
14         4.075    4.660    5.629    6.571    21.064    23.685    26.119    29.141    31.319
15         4.601    5.229    6.262    7.261    22.307    24.996    27.448    30.578    32.801
16         5.142    5.812    6.908    7.962    23.542    26.296    28.845    32.000    34.267
17         5.697    6.408    7.564    8.672    24.769    27.587    30.191    33.409    35.718
18         6.265    7.015    8.231    9.390    25.989    28.869    31.526    34.805    37.156
19         6.844    7.633    8.907    10.117   27.204    30.144    32.852    36.191    38.582
20         7.434    8.260    9.591    10.851   28.412    31.410    34.170    37.566    39.997
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 340 Problems: 12.1 (2 points) You observe the following 600 outcomes of rolling a six-sided die. Result Observed Number 1 121 2 94 3 116 4 97 5 88 6 84 Based on the Chi-Square statistic, one tests the hypothesis H0 that the die is fair. Which of the following is true? A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050. 12.2 (3 points) A baseball team plays 150 games per year and has lost the following number of games (over these 12 periods of 5 years each): Period Number of Games Lost Period Number of Games Lost 1901-05 402 1931-35 391 1906-10 451 1936-40 386 1911-15 412 1941-45 326 1916-20 357 1946-50 344 1921-25 370 1951-55 354 1926-30 389 1956-60 310 Total 4492 Let H0 be the hypothesis that the teamʼs results were drawn from the same distribution over time. Using the Chi-Square statistic, which of the following is true? A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 341
12.3 (3 points) The distribution F(x) = 1 - (5/x)α, x > 5, where x is in units of thousands of dollars, has been fit to the following grouped data, with the resulting estimate of the single parameter α = 2.0.
Bottom of Interval    Top of Interval    # claims in
$ Thous.              $ Thous.           the Interval
5                     10                 7400
10                    15                 1450
15                    20                 475
20                    25                 230
25                    50                 350
50                    75                 50
75                    100                20
100                   Infinity           25
Total                                    10000
Using the Chi-Square statistic, one tests the hypothesis H0 that the data was drawn from this fitted distribution. Which of the following is true?
A. Reject H0 at a significance level of 0.5%. B. Do not reject H0 at 0.5%. Reject H0 at 1%. C. Do not reject H0 at 1%. Reject H0 at 2.5%. D. Do not reject H0 at 2.5%. Reject H0 at 5%. E. Do not reject H0 at 5%.
12.4 (2 points) You observe 5000 children, each of whom has a father with type A blood and a mother with type B blood. 1183 children have Type A blood, 2612 have type AB blood and 1205 have type B blood. Use the Chi-Square statistic to test the hypothesis that children of a father with type A blood and a mother with type B blood are expected to have types: A, AB and B with probabilities 1/4, 1/2, 1/4.
A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 342
12.5 (3 points) Given the grouped data below, what is the Chi-Square statistic for the density function f(x) = 12x2 (1-x) on the interval [0,1]?
Range         # of claims
0 to 0.2      35
0.2 to 0.3    65
0.3 to 0.4    95
0.4 to 0.5    120
0.5 to 0.6    135
0.6 to 0.7    160
0.7 to 0.8    185
0.8 to 0.9    155
0.9 to 1.0    50
Total         1000
A. less than 15 B. at least 15 but less than 17 C. at least 17 but less than 19 D. at least 19 but less than 21 E. at least 21 12.6 (1 point) Based on the Chi-Square statistic computed in the previous question, one tests the hypothesis H0 that the data is drawn from the density function f(x) = 12x2 (1-x) on the interval [0,1]. Which of the following is true? A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 343
12.7 (3 points) 1000 claims have been grouped into intervals:
Bottom of    Top of      # claims in
Interval     Interval    the Interval
0            1           350
1            2           250
2            3           150
3            4           100
4            5           50
5            Infinity    100
An Exponential Distribution F(x) = 1 - e-x/θ has been fit to this grouped data. The resulting estimate of θ is 2.18. What is the p-value of the Chi-Square statistic?
A. Less than 0.010 B. At least 0.010, but less than 0.025 C. At least 0.025, but less than 0.050 D. At least 0.050, but less than 0.100 E. At least 0.100
12.8 (4 points) The Generalized Pareto Distribution, F(x) = β[τ, α; x/(θ+x)], with parameters α = 2, θ = 400, τ = 5, is being compared to the following grouped data:
Range          # of claims
0-1000         450
1000-2000      290
2000-3000      110
3000-4000      50
4000-5000      30
5000-10,000    50
over 10,000    20
Total          1000
Use the following values of the F-distribution with 10 and 4 degrees of freedom:
y          0    .5      1       2       3       4       5       10      20
F10,4[y]   0    .171    .452    .737    .849    .903    .933    .980    .995
What is the value of the Chi-Square statistic?
Hint: β(a,b; x) = F[bx / {a(1-x)}], where F is the F-Distribution with 2a and 2b degrees of freedom.
A. less than 0.5 B. at least 0.5 but less than 1.0 C. at least 1.0 but less than 1.5 D. at least 1.5 but less than 2.0 E. at least 2.0
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 344 Use the following grouped data for the next three questions: Bottom of Top of # claims in the Interval Interval Interval 0 2.5 2625 2.5 5 975 5 10 775 10 25 500 25 Infinity 125 Total
5000
12.9 (3 points) A Pareto Distribution has been fit to the above grouped data. The resulting fitted parameters are α = 2.3 and θ = 6.8. Using the Chi-Square statistic, one tests the hypothesis H0 that the data was drawn from this fitted distribution. Which of the following is true? A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050. 12.10 (3 points) A LogNormal Distribution has been fit to the above grouped data. The resulting fitted parameters are µ = 0.8489 and σ = 1.251. What is the Chi-Square statistic? A. less than 6 B. at least 6 but less than 7 C. at least 7 but less than 8 D. at least 8 but less than 9 E. at least 9 12.11 (1 point) Using the Chi-Square statistic computed in the previous question, one tests the hypothesis H0 that the data was drawn from this fitted distribution. Which of the following is true? A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005. Reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 345 Use the following information for the next 3 questions: • Inverse Gaussian, LogNormal, Transformed Beta, Burr, Inverse Burr, ParaLogistic, Transformed Gamma, Gamma, Weibull, and Exponential Distributions have each been fit via Maximum Likelihood to the same set of data grouped into 10 intervals. • Each interval has many more than 5 claims expected for each of the fitted Distributions.
• The values of the Chi-Square Statistic are as follows:
Distribution         Chi-Square    Number of Parameters
Transformed Beta     11.4          4
Transformed Gamma    11.9          3
Burr                 12.7          3
Inverse Gaussian     13.0          2
Weibull              13.8          2
Gamma                14.3          2
Inverse Burr         15.0          3
ParaLogistic         16.2          2
Exponential          17.0          1
LogNormal            18.7          2
12.12 (2 points) Based on the p-values of the Chi-Square Statistic, rank the following three models from best to worst: Inverse Burr, Weibull, Exponential. A. Inverse Burr, Weibull, Exponential B. Weibull, Inverse Burr, Exponential C. Inverse Burr, Exponential, Weibull D. Exponential, Weibull, Inverse Burr E. None of the above. 12.13 (2 points) Based on the p-values of the Chi-Square Statistic, rank the following three models from best to worst: Transformed Beta, Transformed Gamma, ParaLogistic. A. Transformed Beta, Transformed Gamma, ParaLogistic B. Transformed Gamma, Transformed Beta, ParaLogistic C. Transformed Beta, ParaLogistic, Transformed Gamma D. ParaLogistic, Transformed Gamma, Transformed Beta E. None of the above. 12.14 (2 points) Based on the p-values of the Chi-Square Statistic, rank the following three models from best to worst: Inverse Gaussian, Gamma, LogNormal. A. Inverse Gaussian, Gamma, LogNormal B. Gamma, Inverse Gaussian, LogNormal C. Inverse Gaussian, LogNormal, Gamma D. LogNormal, Gamma, Inverse Gaussian E. None of the above.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 346
12.15 (3 points) Calculate the Chi-Square statistic for the hypothesis that the underlying distribution is Pareto with parameters alpha = 1 and theta = 2, given the 150 observations grouped below.
Class    Range           Frequency
1        0 to 3          75
2        3 to 7          30
3        7 to 10         10
4        10 and above    35
A. Less than 5.0 B. At least 5.0, but less than 6.0 C. At least 6.0, but less than 7.0 D. At least 7.0, but less than 8.0 E. 8.0 or more
12.16 (3 points) One has a large amount of data split into only two intervals. The data is a random sample from the assumed distribution, in other words the null hypothesis is true. Show that as the number of data points goes to infinity, the distribution of the Chi-Square Statistic approaches that of a Chi-Square Distribution with one degree of freedom.
Note: If X follows a Normal Distribution with mean 0 and variance 1, then X2 follows a Chi-Square Distribution with one degree of freedom.
12.17 (3 points) One thousand policyholders were observed from the time they arranged a viatical settlement until their death. 210 die during the first year, 200 die during the second year, 190 die during the third year, 200 die during the fourth year, and 200 die during the fifth year. Use the Chi-Square statistic to test the hypothesis, H0 , that F(t) = t(t + 1)/30, 0 ≤ t ≤ 5, provides an acceptable fit. Which of the following is true?
A. Reject H0 at 0.005. B. Do not reject H0 at 0.005; reject H0 at 0.010. C. Do not reject H0 at 0.010; reject H0 at 0.025. D. Do not reject H0 at 0.025; reject H0 at 0.050. E. Do not reject H0 at 0.050.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 348 12.20 (3 points) For people with a certain type of cancer, who are age 70 at the time of diagnosis, you have hypothesized that q70 = 0.3, q71 = 0.5, and q72 = 0.4. 1000 patients have been diagnosed with this type of cancer at age 70. 270 die within the first year, 376 die during the second year, 161 die during the third year, and the remaining 193 patients survive more than 3 years. Use the Chi-Square statistic to test this hypothesis. A. Reject H0 at 0.010. B. Do not reject H0 at 0.010; reject H0 at 0.025. C. Do not reject H0 at 0.025; reject H0 at 0.050. D. Do not reject H0 at 0.050; reject H0 at 0.010. E. Do not reject H0 at 0.010. 12.21 (4 points) A LogNormal Distribution with µ = 8.3 and µ = 1.7 is compared to the following data: Range($) 1-10,000 10,001-50,000 50,001-100,000 100,001-300,000 300,001-1,000,000 Over 1,000,000
# of claims 1124 372 83 51 5 2
loss($000) 3,082 7,851 5,422 7,607 2,050 3,000
1637 29,012 Using the Chi-Square Goodness-of-fit test, which of the following is true? The minimum expected number of observations in any group should be 5. The maximum possible number of groups should be used. A. Reject H0 at 0.010. B. Do not reject H0 at 0.010; reject H0 at 0.025. C. Do not reject H0 at 0.025; reject H0 at 0.050. D. Do not reject H0 at 0.050; reject H0 at 0.010. E. Do not reject H0 at 0.010.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 349 12.22 (4 points) You are given the following random sample of 50 claims: 988, 1420, 1630, 1702, 1891, 2017, 2037, 2824, 3300, 3601, 3603, 3690, 3734, 4400, 5175, 5200, 5250, 5381, 5550, 6177, 6238, 6350, 6620, 7837, 7941, 8104, 9850, 10180, 10300, 10487, 10554, 11370, 11800, 12350, 12474, 13087, 13682, 13760, 13783, 14800, 16352, 17298, 19193, 19292, 21290, 24422, 28170, 36893, 62841, 85750. You test the hypothesis that these claims follow a continuous distribution F(x) with the following selected values: x 1000 2500 5000 10,000 15,000 20,000 25,000 50,000 F(x) 0.06 0.21 0.41 0.63 0.75 0.82 0.87 0.95 You group the data using the largest number of groups such that the expected number of claims in each group is at least 5. Calculate the chi-square goodness-of-fit statistic. (A) Less than 7 (B) At least 7, but less than 10 (C) At least 10, but less than 13 (D) At least 13, but less than 16 (E) At least 16 12.23 (3 points) You are given: (i) 100 values in the interval from 0 to 1. (ii) This data is then grouped into 5 ranges of equal length covering the interval from 0 to 1. 5
(iii) Σ Oj² = 2169, where the sum is taken over the 5 ranges (j = 1 to 5).
(iv) The Chi-square goodness-of-fit test is performed, with H0 that the data was drawn from the uniform distribution on the interval from 0 to 1. Determine the result of the test. (A) Do not reject H0 at the 0.10 significance level. (B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level. (C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level. (D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level. (E) Reject H0 at the 0.01 significance level. 12.24 (1/2 point) Is the following statement true? If a Pareto distribution has been fit via maximum likelihood to some data grouped into intervals, then no other Pareto Distribution can have a smaller Chi-Square Statistic than this Pareto Distribution.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 350 12.25 (2 points) The following data is from Germany, on the percent of the population born by quarters of the year, where the quarters are measured starting with the cutoff date for youth soccer leagues. Quarter 1 Quarter 2 Quarter 3 Quarter 4 Professional Soccer Players: 30% 25% 22% 23% General Population: 24% 25% 26% 25% Let H0 be the hypothesis that the distribution for professional soccer players is the same as that of the general population. If the p-value of a Chi-square Goodness of Fit test is 1%, how big is the sample of professional soccer players? A. 300 B. 350 C. 400 D. 450 E. 500 12.26 (3 points) You are given the following: • Policies are written with a deductible of 500 and a maximum covered loss of 25,000. • One thousand payments have been recorded as follows: Interval Number of Claims (0, 1000] 165 (1,000, 5,000] 292 (5,000, 10,000] 157 (10,000, 24,500) 200 24,500 186 • The null hypothesis, H0 , is that losses prior to the effects of the deductible and maximum covered loss follow a Weibull Distribution, with parameters τ = 1/2 and θ = 7000. Determine the Chi-Square Goodness of Fit Statistic. A. Less than 1.5 B. At least 1.5, but less than 2.0 C. At least 2.0, but less than 2.5 D. At least 2.5, but less than 3.0 E. 3.0 or more
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 351 12.27 (3 points) You observe 400 losses grouped in intervals. Interval Number of Losses (0, 50] 30 (50, 100] 50 (100, 250] 130 (250, 500] 190 Fit a uniform distribution on (0, ω) to this data via maximum likelihood. Let the null hypothesis be that the data was drawn from this fitted distribution. You test this hypothesis using the chi-square goodness-of-fit test. Determine the result of the test. (A) The hypothesis is not rejected at the 0.10 significance level. (B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05 significance level. (C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the 0.025 significance level. (D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the 0.01 significance level. (E) The hypothesis is rejected at the 0.01 significance level. 12.28 (2 points) You are given a sample of losses: Loss Range Frequency [0, 1000] 70 (1000, 2000] 60 (2000, ∞) 70 An Exponential Distribution has been fit via maximum likelihood to the above data resulting in an estimate of θ of 1997. Using the Chi-Square statistic, one tests the hypothesis H0 that the data was drawn from this fitted distribution. Which of the following is true? A. Reject H0 at a significance level of 1%. B. Do not reject H0 at 1%. Reject H0 at 2.5%. C. Do not reject H0 at 2.5%. Reject H0 at 5%. D. Do not reject H0 at 5%. Reject H0 at 10%. E. Do not reject H0 at 10%.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 352 12.29 (4, 5/87, Q.54) (3 points) You want to test the hypothesis H0 : F(x) = F0 (x), where F0 (x) is a Pareto distribution with parameters α =1.5, θ = 1500. You have 100 observations in the following ranges: 0 - 1000 55 1000 - 2500 15 2500 - 10000 22 10000 and over 8 In which of the following ranges does the Chi-Square statistic fall? The minimum expected number of observations in any group should be 5. The maximum possible number of groups should be used. A. Less than 3 B. At least 3, but less than 4.5 C. At least 4.5, but less than 6 D. At least 6, but less than 7 E. 7 or more. 12.30 (160, 11/87, Q.16) (2.1 points) In the following table, the values of tIqx are calculated from a fully specified survival model, and the values of dx+t are observed deaths from the complete mortality experience of 100 cancer patients age x at entry to the study. t dx+t tIq x 0 0.10 15 1 0.25 30 2 0.25 20 3 0.20 15 4 0.15 10 5 0.05 10 You hypothesize that the mortality of cancer patients is governed by the specified model. Let χ2 be the value of the Chi-Square statistic used to test the validity of this model, and ν be the degrees of freedom. Determine χ2 - ν. (A) 6.4
(B) 7.4
(C) 8.4
(D) 45.3
(E) 46.3
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 353
12.31 (4, 5/90, Q.55) (3 points) Two states provide workers' compensation indemnity claim benefits. One state caps payments at $50,000 and the other state caps payments at $100,000. Size of indemnity claim data is provided in the following table.
Claim Size          Number of Claims
                    State 1    State 2
0 - 24,999          2,900      1,000
25,000 - 49,999     1,200      300
50,000 - ∞          900
50,000 - 59,999                200
60,000 - 99,999                300
100,000 - ∞                    200
We are fitting the combined data for both states to a Pareto distribution, using minimum chi-square estimation. The chi-square statistic is being calculated by combining the number of claims from the two claim size intervals common to both states, which results in six terms for the chi-square statistic. What is the chi-square statistic, Q, corresponding to the parameter estimates α = 3 and θ = 75,000?
A. Q < 200
B. 200 < Q < 400
C. 400 < Q < 600
D. 600 ≤ Q < 800
E. 800 ≤ Q
12.32 (4, 5/91, Q.40) (2 points) Calculate the the Chi-Square statistic, χ2, to test the hypothesis that the underlying distribution is Loglogistic with parameters θ = 2 and γ = 1, given the 25 grouped observations. Range Number of Observations 0≤x<2 8 2≤x<6 5 6 ≤ x < 10 4 10 ≤ x < 14 3 14 ≤ x 5 The minimum expected number of observations in any group should be 5. The maximum possible number of groups should be used. A. χ2 ≤ 3.0
B. 3.0 < χ2 ≤ 5.0
C. 5.0 < χ2 ≤ 7.0
D. 7.0 < χ2 ≤ 9.0
E. 9.0 < χ2
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 354 12.33 (4B, 11/92, Q.14) (3 points) You are given the following: • Basic limits data from two states have been collected. Indiana's data is capped at $25,000 while New Jersey's data is capped at $50,000. Number of claims by size of loss are: Number of Claims Claim Size Indiana New Jersey 0 - 9,999 4,925 2,405 10,000 - 24,999 2,645 1,325 25,000 - ∞ 2,430 25,000 - 34,999 405 35,000 - 49,999 325 50,000 - ∞ 540 10,000 5,000 • The underlying loss distribution is the same in both states.
• A Pareto distribution, with parameters α = 2 and θ = 25,000, has been fit to
the combined data from the two states using minimum chi-square estimation. Calculate the Chi-Square statistic using the six distinct size classes. A. Less than 5.00 B. At least 5.00 but less than 6.00 C. At least 6.00 but less than 7.00 D. At least 7.00 but less than 8.00 E. At least 8.00 12.34 (4B, 11/93, Q.23) (2 points) You are given the following: • A random sample, x1 , ..., x20 is taken from a probability distribution function F(x). 1.07, 1.07, 1.12, 1.35, 1.48, 1.59, 1.60, 1.74, 1.90, 2.02, 2.05, 2.07, 2.15, 2.16, 2.21, 2.34, 2.75, 2.80, 3.64, 10.42. • The probability distribution function F(x) underlying the random sample is assumed to have the form F(x) = 1 - (1/x), x≥1. • The observations from the random sample were segregated into the following intervals: [1, 4/3), [4/3, 2), [2, 4), [4, ∞) You are to use the Chi-Square statistic to test the hypothesis, H0 , that F(x) provides an acceptable fit. Which of the following is true? A. Reject H0 at α = 0.005. B. Do not reject H0 at α = 0.005; reject H0 at α = 0.010. C. Do not reject H0 at α = 0.010; reject H0 at α = 0.025. D. Do not reject H0 at α = 0.025; reject H0 at α = 0.050. E. Do not reject H0 at α = 0.050.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 355 12.35 (4B, 5/94, Q.14) (2 points) You are given the following: • X is a random variable assumed to have a Pareto distribution with parameters α = 2 and θ = 1000. •
A random sample of 10 observations of X yields the values 100, 200, 225, 275, 400, 700, 800, 900, 1500, 3000. Use the intervals [0,250), [250,500), [500,1000), and [1000,∞) to calculate the Chi-Square statistic, Q, to test the Pareto assumption. A. Q < 0.80 B. 0.80 ≤ Q < 1.00 C. 1.00 ≤ Q < 1.20 D. 1.20 ≤ Q < 1.40 E. 1.40 ≤ Q 12.36 (4B, 11/94, Q.4) (3 points) You are given the following: A random sample of 1,000 observations from a loss distribution has been grouped into five intervals as follows: Interval Number of Observations [ 0, 3.0) 180 [3.0, 7.5) 180 [7.5, 15.0) 235 [15.0, 40.0) 255 [40.0, ∞) 150 1000 The loss distribution is believed to be a Pareto distribution and the minimum chi-square technique has been used with the grouped data to estimate the parameters, α = 3.5 and θ = 50. Using the Chi-Square statistic, what is the highest significance level at which you would not reject this fitted distribution? A. Less than 0.005 B. 0.005 C. 0.010 D. 0.025 E. 0.050
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 356 12.37 (4B, 11/95, Q.3) (2 points) You are given the following: • 100 observed losses have been recorded in thousands of dollars and are grouped as follows: Interval Number of Losses (0, 1) 15 [1, 5) 40 [5,10) 20 [10,15) 15 [15, ∞) 10 • The random variable X underlying the observed losses, in thousands, is believed to have the density function: f(x) = (1/5) e-x/5, x > 0. Determine the value of the Chi-Square statistic. A. Less than 2 B. At least 2, but less than 5 C. At least 5, but less than 8 D. At least 8, but less than 11 E. At least 11 12.38 (4B, 5/96, Q.23) (2 points) Forty (40) observed losses have been recorded in thousands of dollars and are grouped as follows: Interval Number of Total Losses ($000) Losses ($000) (1, 4/3) 16 20 [4/3, 2) 10 15 [2, 4) 10 35 [4, ∞) 4 20 The null hypothesis, H0 , is that the random variable X underlying the observed losses, in thousands, has the density function f(x) = 1/x2 , x > 1. Using the Chi-Square statistic, determine which of the following statements is true. A. Reject H0 at α = 0.010. B. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. C. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. D. Do not reject H0 at α = 0.050. Reject H0 at α = 0.100. E. Do not reject H0 at α = 0.100.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 357 12.39 (Course 160 Sample Exam #3, 1997, Q.12) (1.9 points) For a complete study of 150 patients diagnosed with a fatal disease, you are given: (i) For each patient, t = 0 at time of diagnosis. (ii) The number of deaths in each interval is Interval Number of Deaths (0,1] 21 (1,2] 27 (2,3] 39 (3,4] 63 (iii) The χ2 statistic is used to test the fit of the survival model S(t) = 1 - t(t + 1)/20, 0 ≤ t ≤ 4. (iv) The appropriate number of degrees of freedom is denoted by ν. Calculate χ2/ν. (A) 0.5
(B) 0.6
(C) 0.7
(D) 0.9
(E) 1.2
12.40 (4B, 5/99, Q.11) (3 points) You are given the following: • One hundred claims greater than 3,000 have been recorded as follows: Interval Number of Claims (3,000, 5,000] 6 (5,000, 10,000] 29 (10,000, 25,000] 39 (25,000, ∞) 26 • Claims of 3,000 or less have not been recorded. • The null hypothesis, H0 , is that claim sizes follow a Pareto distribution, with parameters α = 2 and θ = 25,000. A chi-square test is performed using the Chi-Square statistic with four classes. Determine which of the following statements is true. A. Reject H0 at α = 0.010. B. Do not reject H0 at α = 0.010. Reject H0 at α = 0.025. C. Do not reject H0 at α = 0.025. Reject H0 at α = 0.050. D. Do not reject H0 at α = 0.050. Reject H0 at α = 0.100. E. Do not reject H0 at α = 0.100.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 358
12.41 (Course 4 Sample Exam 2000, Q.9) Summary statistics of 100 losses are:
Interval          Number of Losses    Sum        Sum of Squares
(0, 2000]         39                  38,065     52,170,078
(2000, 4000]      22                  63,816     194,241,387
(4000, 8000]      17                  96,447     572,753,313
(8000, 15,000]    12                  137,595    1,628,670,023
(15,000, ∞)       10                  331,831    17,906,839,238
Total             100                 667,754    20,354,674,039
When a Pareto Distribution was fit via the method of moments to a different data set, the estimated parameters were α = 2.5 and θ = 10,000. Determine the chi-square statistic and number of degrees of freedom for a test (with five groups) to access the acceptability of fit of the data above to these parameters. 12.42 (4, 11/04, Q.10 & 2009 Sample Q.140) (2.5 points) You are given the following random sample of 30 auto claims: 54 140 230 560 600 1,100 1,500 1,800 1,920 2,000 2,450 2,500 2,580 2,910 3,800 3,800 3,810 3,870 4,000 4,800 7,200 7,390 11,750 12,000 15,000 25,000 30,000 32,300 35,000 55,000 You test the hypothesis that auto claims follow a continuous distribution F(x) with the following percentiles: x 310 500 2,498 4,876 7,498 12,930 F(x) 0.16 0.27 0.55 0.81 0.90 0.95 You group the data using the largest number of groups such that the expected number of claims in each group is at least 5. Calculate the chi-square goodness-of-fit statistic. (A) Less than 7 (B) At least 7, but less than 10 (C) At least 10, but less than 13 (D) At least 13, but less than 16 (E) At least 16
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 359
12.43 (4, 5/05, Q.33 & 2009 Sample Q.201) (2.9 points) You test the hypothesis that a given set of data comes from a known distribution with distribution function F(x). The following data were collected:
Interval     F(xi)    Number of Observations
x < 2        0.035    5
2 ≤ x < 5    0.130    42
5 ≤ x < 7    0.630    137
7 ≤ x < 8    0.830    66
8 ≤ x        1.000    50
Total                 300
where xi is the upper endpoint of each interval.
You test the hypothesis using the chi-square goodness-of-fit test. Determine the result of the test.
(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05 significance level.
(C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the 0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the 0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.
12.44 (4, 5/07, Q.5) (2.5 points) You are given:
(i) A computer program simulates n = 1000 pseudo-U(0, 1) variates.
(ii) The variates are grouped into k = 20 ranges of equal length.
(iii) Σ Oj² = 51,850, where the sum is taken over the k = 20 ranges (j = 1 to 20).
(iv) The Chi-square goodness-of-fit test for U(0, 1) is performed. Determine the result of the test. (A) Do not reject H0 at the 0.10 significance level. (B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level. (C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level. (D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level. (E) Reject H0 at the 0.01 significance level.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 360
Solutions to Problems:
12.1. D. With 6 intervals, we have 6 - 1 = 5 degrees of freedom. The Chi-Square Statistic is 11.42, which is greater than 11.070 but less than 12.832. Since 11.42 > 11.070, we reject at 5%; since 11.42 < 12.832, we do not reject at 2.5%.
Result    Observed Number    Expected Number    ((Observed - Expected)^2)/Expected
1         121                100                4.41
2         94                 100                0.36
3         116                100                2.56
4         97                 100                0.09
5         88                 100                1.44
6         84                 100                2.56
Sum       600                600                11.42
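For readers who want to check this arithmetic by machine, here is a minimal Python sketch (not part of the original study guide; the function name is only illustrative) that reproduces the calculation in solution 12.1:

# A minimal sketch (not from the study guide): the chi-square goodness-of-fit
# statistic for observed counts versus expected counts, as in solution 12.1.
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over the groups."""
    assert len(observed) == len(expected)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [121, 94, 116, 97, 88, 84]
expected = [100] * 6
print(round(chi_square_statistic(observed, expected), 2))  # prints 11.42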
12.2. A. With 12 intervals there are 12 - 1 = 11 degrees of freedom. The Chi-Square Statistic is 45 as computed below. Since 45 > 26.757, one rejects at 1/2%.
Period    Observed Number    Assumed Number    ((Observed - Assumed)^2)/Assumed
1         402                374.33            2.04
2         451                374.33            15.70
3         412                374.33            3.79
4         357                374.33            0.80
5         370                374.33            0.05
6         389                374.33            0.57
7         391                374.33            0.74
8         386                374.33            0.36
9         326                374.33            6.24
10        344                374.33            2.46
11        354                374.33            1.10
12        310                374.33            11.06
Sum       4492               4492              44.93
Comment: We assume a uniform distribution. No parameters have been fit. As always, we lose one degree of freedom, 12 - 1 = 11, since the total of the assumed and observed columns are set equal.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 361
12.3. D. One has 8 intervals and has fit one parameter, therefore one has 8 - 1 - 1 = 6 degrees of freedom. One computes the Chi-Square Statistic as 13.29 as shown below. Since 12.592 < 13.29 < 14.449, one rejects at 5% and does not reject at 2.5%.
Bottom ($000)    Top ($000)    # claims    F(lower)    F(upper)    Fitted # claims    Chi-Square
5                10            7400        0.00000     0.75000     7500.0             1.33
10               15            1450        0.75000     0.88889     1388.9             2.69
15               20            475         0.88889     0.93750     486.1              0.25
20               25            230         0.93750     0.96000     225.0              0.11
25               50            350         0.96000     0.99000     300.0              8.33
50               75            50          0.99000     0.99556     55.6               0.56
75               100           20          0.99556     0.99750     19.4               0.02
100              Infinity      25          0.99750     1.00000     25.0               0.00
SUM                            10000                               10000              13.29
12.4. B. For 3 types we have 3 - 1 = 2 degrees of freedom. The Chi-Square Statistic is 10.23, which is greater than 9.21 but less than 10.60. Since 10.23 > 9.210, we reject at 1%; since 10.23 < 10.597, we do not reject at 1/2%.
Type    Observed Number    Assumed Number    ((Observed - Assumed)^2)/Assumed
A       1183               1250              3.59
AB      2612               2500              5.02
B       1205               1250              1.62
Sum     5000               5000              10.23
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 362
12.5. C. F(x) = ∫₀ˣ f(t) dt = 4x³ - 3x⁴, 0 ≤ x ≤ 1.
The Chi-square statistic is computed by taking the sum of: (fitted - observed)²/fitted numbers of claims. For each interval, the fitted number of claims = {total number of claims}{F(upper) - F(lower)}.
Bottom    Top    # claims    F(lower)    F(upper)    Fitted # claims    Chi-Square
0         0.2    35          0.0000      0.0272      27.2               2.24
0.2       0.3    65          0.0272      0.0837      56.5               1.28
0.3       0.4    95          0.0837      0.1792      95.5               0.00
0.4       0.5    120         0.1792      0.3125      133.3              1.33
0.5       0.6    135         0.3125      0.4752      162.7              4.72
0.6       0.7    160         0.4752      0.6517      176.5              1.54
0.7       0.8    185         0.6517      0.8192      167.5              1.83
0.8       0.9    155         0.8192      0.9477      128.5              5.46
0.9       1      50          0.9477      1.0000      52.3               0.10
SUM              1000                                1000               18.50
12.6. C. Chi-Square is 18.50. There are 9 intervals, so we have 9 - 1 = 8 degrees of freedom. Since 18.50 >17.535, we reject at 2.5%; since 18.50 < 20.090, we do not reject at 1%.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 363
12.7. E. One has 6 intervals and has fit one parameter, therefore one has 6 - 1 - 1 = 4 degrees of freedom. One computes the Chi-Square Statistic as 4.09 as shown below. Since 4.09 < 7.779, the p-value is greater than 10%. This is equivalent to saying we do not reject at a 10% significance level the hypothesis that the data was drawn from the fitted Exponential Distribution.
Bottom    Top         # claims    F(lower)    F(upper)    F(Upper) - F(Lower)    Fitted # claims    (Observed - Fitted)²/Fitted
0         1           350         0.00000     0.36791     0.36791                367.9              0.87
1         2           250         0.36791     0.60046     0.23255                232.6              1.31
2         3           150         0.60046     0.74745     0.14699                147.0              0.06
3         4           100         0.74745     0.84036     0.09291                92.9               0.54
4         5           50          0.84036     0.89910     0.05873                58.7               1.30
5         Infinity    100         0.89910     1.00000     0.10090                100.9              0.01
SUM                   1000                                                       1000               4.09
For example, F(3) = 1 - exp(-3/2.18) = 0.74745. The fitted number of claims in the interval 2 to 3 is: (1000){F(3) - F(2)} = (1000)(0.74745 - 0.60046) = 147.0, where 1000 is the total observed claims. The contribution to the Chi-Square Statistic from the interval 2 to 3 is: (150 - 147)²/147 = 0.06. The Chi-Square Statistic is the sum of the contributions: 4.09.
Comment: For the Chi-Square Distribution with 4 degrees of freedom, S(7.779) = 10%; therefore S(4.09) > S(7.779) = 10%. Using a computer, S(4.09) = 39%.
12.8. B. Use the relationship between the F-Distribution and the Incomplete Beta function: β(a, b; y) = F[by/(a(1-y))], where F is the F-distribution with 2a and 2b degrees of freedom. Let a = 5, b = 2, y = x/(x + 400). ⇒ 1 - y = 400/(x + 400). Therefore, the Generalized Pareto Distribution = β(5, 2; x/(x + 400)) = F10,4[2{x/(x+400)}/{5(400/(x+400))}] = F10,4[2x/2000] = F10,4[x/1000].
So for example, the Distribution function at x = 1000 is F10,4[1] = 0.452. Thus the predicted portion of claims between $1000 and $2000 is F10,4[2] - F10,4[1] = 0.737 - 0.452 = 0.285. With 1000 total claims, the predicted number of claims for this interval is 285. Summing the contributions from each interval, χ² = 0.620.
Interval ($000)    Predicted    Observed    {(Pred - Obs)²}/Pred
0-1                452          450         0.009
1-2                285          290         0.088
2-3                112          110         0.036
3-4                54           50          0.296
4-5                30           30          0.000
5-10               47           50          0.191
over 10            20           20          0.000
Total              1000         1000        0.620
Comment: This is a difficult question!
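Solutions 12.3 through 12.10 all follow the same recipe: convert the fitted distribution function into expected counts for each interval, then sum (observed - expected)²/expected. The Python sketch below (not from the study guide; the helper name grouped_chi_square is an illustrative assumption) automates that recipe and reproduces the 4.09 of solution 12.7:

import math

# A sketch (assumed helper, not from the text): the grouped chi-square statistic
# when expected counts come from a fitted distribution function F, as in 12.7.
def grouped_chi_square(boundaries, observed, cdf):
    """boundaries: endpoints b0 < b1 < ... < bk (use math.inf for the last);
    observed: the k observed counts; cdf: the fitted distribution function F(x)."""
    n = sum(observed)
    statistic = 0.0
    for i, o in enumerate(observed):
        lower, upper = boundaries[i], boundaries[i + 1]
        f_upper = 1.0 if upper == math.inf else cdf(upper)
        expected = n * (f_upper - cdf(lower))
        statistic += (o - expected) ** 2 / expected
    return statistic

# Solution 12.7: Exponential with theta = 2.18 fit to 1000 claims.
theta = 2.18
exp_cdf = lambda x: 1.0 - math.exp(-x / theta)
obs = [350, 250, 150, 100, 50, 100]
print(round(grouped_chi_square([0, 1, 2, 3, 4, 5, math.inf], obs, exp_cdf), 2))  # about 4.09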
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 364
12.9. D. One has 5 intervals and has fit two parameters, therefore one has 5 - 1 - 2 = 2 degrees of freedom. One computes the Chi-Square Statistic as 7.23 as shown below. Since 5.991 < 7.23 < 7.378, one rejects at 5% and does not reject at 2.5%. (Using the row for 2 degrees of freedom, find the columns that bracket the Chi-Square Statistic. Reject at the significance level of the column to the left and do not reject at the significance level of the column to the right.)
Bottom    Top         # claims    F(lower)    F(upper)    F(Upper) - F(Lower)    Fitted # claims    (Observed - Fitted)²/Fitted
0         2.5         2625        0.00000     0.51330     0.51330                2566.5             1.33
2.5       5           975         0.51330     0.71852     0.20522                1026.1             2.55
5         10          775         0.71852     0.87510     0.15658                782.9              0.08
10        25          500         0.87510     0.97121     0.09611                480.6              0.79
25        Infinity    125         0.97121     1.00000     0.02879                143.9              2.49
SUM                   5000                                                       5000               7.23
For example, F(25) = 1 - (θ/(θ + x))^α = 1 - (6.8/(6.8 + 25))^2.3 = 0.97121. The fitted number of claims in the interval 10 to 25 is: (5000){F(25) - F(10)} = (5000)(0.97121 - 0.87510) = 480.6, where 5000 is the total observed claims. The contribution from the interval 10 to 25 is: (500 - 480.6)²/480.6 = 0.79.
12.10. E. F(2.5) = Φ[{ln(x) − µ}/σ] = Φ[{ln(2.5) - 0.8489}/1.251] = Φ[0.05] = 0.5199.
F(5) = Φ[{ln(5) - 0.8489}/1.251] = Φ[0.61] = 0.7291.
F(10) = Φ[{ln(10) - 0.8489}/1.251] = Φ[1.16] = 0.8770.
F(25) = Φ[{ln(25) - 0.8489}/1.251] = Φ[1.89] = 0.9706.
The expected number of claims in the interval 10 to 25 is: (5000){F(25) - F(10)} = (5000)(0.9706 - 0.8770) = 468.0. The contribution to the Chi-Square Statistic from the interval 10 to 25 is: (500 - 468.0)²/468.0 = 2.19. The sum of the contributions is 12.25.
Bottom    Top         # claims    F(lower)    F(upper)    F(Upper) - F(Lower)    Expected # claims    (Observed - Expected)²/Expected
0         2.5         2625        0.0000      0.5199      0.5199                 2599.5               0.25
2.5       5           975         0.5199      0.7291      0.2092                 1046.0               4.82
5         10          775         0.7291      0.8770      0.1479                 739.5                1.70
10        25          500         0.8770      0.9706      0.0936                 468.0                2.19
25        Infinity    125         0.9706      1.0000      0.0294                 147.0                3.29
SUM                   5000                                                       5000                 12.25
Comment: Without rounding prior to looking in the Normal Table, the statistic would be 9.9. The Pareto Distribution in the prior question was a better fit to this data than the LogNormal Distribution, since the Pareto had a smaller Chi-Square Statistic. Note that the Pareto and the LogNormal have the same number of parameters, so it is appropriate to compare them in this way.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 365 12.11. A. One has 5 intervals and has fit two parameters, therefore one has 5 - 1 - 2 = 2 degrees of freedom. One computes the Chi-Square Statistic as 12.25 as shown previously. Since 10.597 < 12.25, one rejects at 1/2%. 12.12. E. The number of degrees of freedom equals the number of intervals minus one minus the number of fitted parameters = 9 - number of fitted parameters. The Inverse Burr has 3 parameters, so 6 d.f. Since 14.449 < 15 < 16.812, the p-value for the Inverse Burr is between 1% and 2.5%. The Weibull has 2 parameters, so 7 d.f. Since 12.017< 13.8 < 14.067, the p-value for the Weibull is between 5% and 10%. The Exponential has 1 parameter, so 8 d.f. Since 15.507 < 17 < 17.535, the Exponential has a p-value between 2.5% and 5%. Thus the Weibull has the biggest p-value, while the Inverse Burr has the smallest. Thus the fits from best to worst are: Weibull, Exponential, Inverse Burr. Comment: Since the Inverse Burr has a larger Chi-square and more parameters than the Weibull, the Inverse Burr is a worse fit than the Weibull and has a smaller p-value. It is only necessary to compute the p-values and compare them if the Distribution with more parameters has a smaller ChiSquare. 12.13. B. The number of degrees of freedom equals the number of intervals minus one minus the number of fitted parameters = 9 - number of fitted parameters. The Transformed Beta has 4 parameters, so 5 d.f. Since 11.070 < 11.4 < 12.832, the Transformed Beta has a p-value between 2.5% and 5%. The Transformed Gamma has 3 parameters, so 6 d.f. Since 10.685 < 11.9 < 12.592, the p-value for the Transformed Gamma is between 5% and 10%. The ParaLogistic has 2 parameters, so 7 d.f. Since 16.013 < 16.2 < 18.475, the p-value for the ParaLogistic is between 1% and 2.5%. Thus the Transformed Gamma has the biggest p-value, while the ParaLogistic has the smallest. Thus the fits from best to worst are: Transformed Gamma, Transformed Beta, ParaLogistic.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 366
12.14. A. Since they all have the same numbers of degrees of freedom, the one with the smallest Chi-square has the biggest p-value and the best fit. Thus the fits from best to worst are: Inverse Gaussian, Gamma, LogNormal.
Comment: When the Distributions have the same number of parameters (and the same number of intervals) one can just rank the Chi-Square values without computing p-values. Smallest Chi-Square is the best fit. For this set of questions, using a computer, the p-values are as follows:
Distribution         χ²      d.f.    p-value
Transformed Beta     11.4    5       4.4%
Transformed Gamma    11.9    6       6.4%
Burr                 12.7    6       4.8%
Inverse Gaussian     13.0    7       7.2%
Weibull              13.8    7       5.5%
Gamma                14.3    7       4.6%
Inverse Burr         15.0    6       2.0%
ParaLogistic         16.2    7       2.3%
Exponential          17.0    8       3.0%
LogNormal            18.7    7       0.9%
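The p-values in the table above were found "using a computer." One way to reproduce them, assuming the scipy library is available (this snippet is illustrative and not part of the study guide), is with the Chi-Square survival function:

# A sketch of how the p-values in the table above could be computed.
from scipy.stats import chi2

fits = {"Transformed Beta": (11.4, 5), "Weibull": (13.8, 7), "LogNormal": (18.7, 7)}
for name, (statistic, df) in fits.items():
    p_value = chi2.sf(statistic, df)  # survival function S(statistic) of the Chi-Square
    print(f"{name}: p-value = {p_value:.3f}")  # roughly 0.044, 0.055, 0.009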
12.15. D. For a Pareto with α = 1 and θ = 2: F(x) = 1 - 2/(x+2) = x/(x+2). The fitted number of claims for each interval is: 150{F(top of interval) - F(bottom of interval)}.
Bottom    Top    F(Bottom)    F(Top)    Fitted Number    Observed Number    (Observed - Fitted)²/Fitted
0         3      0.000        0.600     90.000           75                 2.50
3         7      0.600        0.778     26.667           30                 0.42
7         10     0.778        0.833     8.333            10                 0.33
10        ∞      0.833        1.000     25.000           35                 4.00
Sum                                     150              150                7.25
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 367
12.16. Let p1 = expected probability for the first interval. Let p2 = 1 - p1 = expected probability for the second interval. Let O1 = number observed in the first interval. Let O2 = number observed in the second interval. Let n = O1 + O2 = number of data points. Let E1 = p1 n = number expected in the first interval. Let E2 = p2 n = number expected in the second interval.
Note that O1 + O2 = E1 + E2. ⇒ O1 - E1 = E2 - O2. ⇒ (O1 - E1)² = (O2 - E2)².
χ² = (O1 - E1)²/E1 + (O2 - E2)²/E2 = (O1 - E1)²{1/E1 + 1/E2} = (O1 - E1)²(E1 + E2)/(E1 E2)
= (O1 - E1)² n/{n p1 n(1-p1)} = (O1 - n p1)²/{n p1(1-p1)} = {(O1 - n p1)/√[n p1(1-p1)]}².
Since H0 is true, O1 is Binomially Distributed, with parameters n and p1. O1 has mean n p1 and variance n p1(1-p1). Therefore, (O1 - n p1)/√[n p1(1-p1)] is approximately Standard Normally Distributed. As n approaches infinity, (O1 - n p1)/√[n p1(1-p1)] approaches a Standard Normal Distribution.
⇒ χ² = {(O1 - n p1)/√[n p1(1-p1)]}² approaches a Chi-Square Distribution with 1 degree of freedom.
Comment: When computing the Chi-square goodness-of-fit test statistic, for a given interval, the Expected does approach the Observed as a percent. In other words, |Expected - Observed|/Expected goes to zero for each interval. However, the contribution from each interval, (Expected - Observed)²/Expected, does not go to zero due to the square in the numerator. In the above example, if H0 is true, the sum of the contributions, the Chi-Square Statistic, approaches the square of a Standard Normal Distribution. Even if the assumed distribution is not correct, but the assumed distribution function matches the true distribution function at the breakpoint between the intervals, then the result still holds. With only two intervals, we are really only testing whether the assumed distribution function value at the breakpoint is correct.
12.17. C. There are 5 - 1 = 4 degrees of freedom.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Assumed # claims    Chi-Square
0              1              210             0.0000      0.2582      258.2               9.00
1              2              200             0.2582      0.4472      189.0               0.64
2              3              190             0.4472      0.6325      185.2               0.12
3              4              200             0.6325      0.8165      184.0               1.38
4              5              200             0.8165      1.0000      183.5               1.48
Sum                           1000                                                        12.62
Since 11.143 < 12.62 ≤ 13.277, reject at 2.5% and do not reject at 1%. Comment: Similar to Exercise 16.9 in Loss Models.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 368 12.18. D. 1. True. Both the expected, Ei, and the observed, Oi, for each interval would double. Therefore, Σ (Ei - Oi)2/Ei would also double. The same number of intervals. ⇒ The same numbers of degrees of freedom.
⇒ Look in the same row of the Chi-Square Table. ⇒ The same critical values. 2. True. What Professor Klugman means, quoting George Box, is that any statistical model is only an approximation to reality. 3. False. Number of intervals minus one minus the number of fitted parameters. Comment: Statement 2 is a quote from the textbook taken out of context. Exam questions should not quote from the textbook out of context, but once in a while they do. “For the chi-square goodness-of-fit test, if the sample size were to double, with each number showing up twice instead of once, the test statistic would double and the critical values would remain unchanged.” Assuming the null hypothesis were true, then if we took a second sample of size equal to the first, we would not expect to get the same proportion of values in each interval as we did in the first sample. For example, let us assume a case where the null hypothesis was true, and the first sample had a very unusually large number of items in the first interval. This can happen due to random fluctuation. However, we have no reason to assume that this would again occur for a second sample. In fact, if the null hypothesis was true, if we were to take enough samples, we would expect the total proportion of items observed to approach the expected in each interval. The Chi-Square Statistic would approach a Chi-Square Distribution with the appropriate number of degrees of freedom. If instead the null hypothesis is false, then for most or all intervals the probability covered by an interval would not match that which we calculated from the null hypothesis. Therefore, as we took more samples, the observed total proportions should not approach the expected proportions derived from the null hypothesis. The Chi-Square Statistic would not approach a Chi-Square Distribution. The power of a test is the probability of rejecting H0 when it is false. As the total sample size increases, the power of the test increases. For a relatively small sample, even if H0 is not true, there may not be enough statistical evidence to reject H0 . It is easy to get a small sample which is not too bad a match to H0 , even though the data was not drawn from the assumed distribution. For a very large sample, if H0 is not true, there is likely to be enough statistical evidence to reject H0 . When the data was not drawn from the assumed distribution, there is only a very small probability of getting a large sample which is a good match to H0 ,.
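A quick numeric check of statement 1 of 12.18 (an illustrative sketch, not from the study guide, reusing the counts from solution 12.1):

# Doubling every observed and expected count doubles the chi-square statistic,
# while the degrees of freedom, and hence the critical values, are unchanged.
def chi_sq(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [121, 94, 116, 97, 88, 84]
expected = [100] * 6
single = chi_sq(observed, expected)
doubled = chi_sq([2 * o for o in observed], [2 * e for e in expected])
print(single, doubled)  # 11.42 and 22.84: the statistic doubles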
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 369
12.19. A. In order to have 5 expected claims, we need to group the last 2 intervals together.
Bottom     Top         # claims    F(lower)    F(upper)    F(Upper) - F(Lower)    Fitted # claims    (Observed - Fitted)²/Fitted
1          10,000      1496        0.0000      0.6665      0.66650                1495.0             0.00
10,000     25,000      365         0.6665      0.8261      0.15960                358.0              0.14
25,000     100,000     267         0.8261      0.9562      0.13010                291.8              2.11
100,000    300,000     99          0.9562      0.9898      0.03360                75.4               7.41
300,000    Infinity    16          0.9898      1.0000      0.01020                22.9               2.07
SUM                    2243                                                       2243               11.73
There are two fitted parameters, and thus 5 - 1 - 2 = 2 degrees of freedom. For 2 degrees of freedom, the 1/2% critical value is 10.597. 10.597 < 11.73. ⇒ Reject at 0.5%.
Comment: We make no use of the given dollars of loss to answer the question. We can only try to group consecutive intervals. The over 1 million interval would have an expected number of: 2243 S(1 million) = (2243)(0.0014) = 3.14 < 5. Thus we combine the 300,001-1,000,000 and over 1,000,000 intervals. Having done this, we confirm that the expected number in each interval is at least 5. This same rule of thumb was used in 4, 11/04, Q.10. The Maximum Likelihood LogNormal Distribution has µ = 8.435 and σ = 1.802; fitting done on a computer. The data was taken from AIA Closed Claim Study (1974) in Table IV of “Estimating Pure Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976.
12.20. B. The number expected to die during the first year is: (.3)(1000) = 300. The number expected to die during the second year is: (.5)(1-.3)(1000) = 350. The number expected to die during the third year is: (.4)(1-.5)(1-.3)(1000) = 140. The number expected to survive more than 3 years is: 1000 - 300 - 350 - 140 = 210.
Time           Observed Number    Expected Number    ((Observed - Expected)^2)/Expected
0 to 1         270                300                3.000
1 to 2         376                350                1.931
2 to 3         161                140                3.150
more than 3    193                210                1.376
Sum            1000               1000               9.458
The number of degrees of freedom is: 4 - 1 - 0 = 3. 9.348 < 9.458 < 11.345. ⇒ Do not reject H0 at 0.010; reject H0 at 0.025.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 370
12.21. C. In order to have 5 expected claims, we need to group the last two intervals together.
F(10,000) = Φ[{ln(x) − µ}/σ] = Φ[{ln(10,000) - 8.3}/1.7] = Φ[0.54] = 0.7054.
F(50,000) = Φ[{ln(50,000) - 8.3}/1.7] = Φ[1.48] = 0.9306.
F(100,000) = Φ[{ln(100,000) - 8.3}/1.7] = Φ[1.89] = 0.9706.
F(300,000) = Φ[{ln(300,000) - 8.3}/1.7] = Φ[2.54] = 0.9945.
Bottom     Top         # claims    F(lower)    F(upper)    F(Upper) - F(Lower)    Expected # claims    (Observed - Expected)²/Expected
1          10,000      1124        0.0000      0.7054      0.70540                1154.7               0.82
10,000     50,000      372         0.7054      0.9306      0.22520                368.7                0.03
50,000     100,000     83          0.9306      0.9706      0.04000                65.5                 4.69
100,000    300,000     51          0.9706      0.9945      0.02390                39.1                 3.60
300,000    Infinity    7           0.9945      1.0000      0.00550                9.0                  0.45
SUM                    1637                                                       1637                 9.59
For 5 - 1 = 4 d.f., the 5% and 2.5% critical values are: 9.488 and 11.143. 9.488 < 9.59 < 11.143. ⇒ Reject at 5% and do not reject at 2.5%.
Comment: We make no use of the given dollars of loss to answer the question. Data taken from NAIC Closed Claim Study (1975) in Table VII of “Estimating Pure Premiums by Layer - An Approach” by Robert J. Finger, PCAS 1976. Without rounding prior to looking in the Normal Table, the statistic would be 9.9.
12.22. C. There are 50 observed, so for an interval to have at least 5 expected claims, it must cover at least: 5/50 = 10%. The largest number of groups that accomplishes this is: 0 to 2500, 2500 to 5000, 5000 to 10,000, 10,000 to 15,000, 15,000 to 25,000, and 25,000 to ∞. Each contribution from an interval is: (assumed - observed)²/assumed.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Prob. in Interval    Assumed # claims    Chi-Square
0              2,500          7               0.00        0.21        0.21                 10.5                1.17
2,500          5,000          7               0.21        0.41        0.20                 10.0                0.90
5,000          10,000         13              0.41        0.63        0.22                 11.0                0.36
10,000         15,000         13              0.63        0.75        0.12                 6.0                 8.17
15,000         25,000         6               0.75        0.87        0.12                 6.0                 0.00
25,000         infinity       4               0.87        1.00        0.13                 6.5                 0.96
Sum                           50                                                           50.0                11.56
Comment: Similar to 4, 11/04, Q.10.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 371
12.23. B. Σ Oj = Σ Ej = 100, where the sums are taken over the 5 intervals, j = 1 to 5.
Since the intervals are of the same length and we are assuming a uniform distribution, the expected numbers in each interval are the same. Ej = 100/5 = 20 for all j.
The Chi-Square Statistic is: Σ (Oj - Ej)²/Ej = Σ (Oj - 20)²/20 = Σ Oj²/20 - 2 Σ Oj + 100 = Σ Oj²/20 - 100.
Therefore, the Chi-Square Statistic is: 2169/20 - 100 = 8.45. The number of degrees of freedom is: 5 - 1 = 4. 7.779 < 8.45 < 9.448.
⇒ Reject H0 at the 0.10 significance level, but not at the 0.05 significance level. Comment: Similar to 4, 5/07, Q.5. 12.24. False. The maximum likelihood fit will usually have a relatively small Chi-Square statistic, but it will usually not be the same as the distribution which has the smallest Chi-Square statistic. Comment: As discussed in a subsequent section, one can fit distributions by minimizing the Chi-Square Statistic, although this is not discussed in Loss Models. 12.25. E. Let N be the sample size. Then the Chi-Square Statistic is the sum of (assumed - observed)2 /assumed: (0.06N)2 /0.24N + 0 + (0.04N)2 /0.26N + (0.02N)2 /0.25N = 0.02275N. There are 4 - 1 = 3 degrees of freedom. The 1% critical value is 11.345. p-value is 1%. ⇒ 0.02275N = 11.345. ⇒ N = 499. Comment: See “The Expert Mind,” by Philip E. Ross, Scientific American, August 2006.
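The algebraic shortcut used in solution 12.23 (and again in solutions 12.42 and 12.44), χ² = Σ Oj²/E - n when every interval has the same expected count E and the observed and expected totals agree, can be verified with a small sketch (illustrative only, not from the study guide):

# The shortcut: chi-square = sum(O_j^2)/E - n when each interval has expected count E.
def chi_sq_shortcut(sum_o_squared, expected_per_group, n):
    return sum_o_squared / expected_per_group - n

print(chi_sq_shortcut(2169, 20, 100))    # 12.23: 8.45
print(chi_sq_shortcut(51850, 50, 1000))  # 12.44: 37.0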
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 372
12.26. D. S(x) = exp[-(x/7000)^(1/2)], where x is the size of loss.
S(500) = 0.7655. S(1500) = 0.6294. S(5500) = 0.4121. S(10,500) = 0.2938. S(25,000) = 0.1511.
Payments            Losses              Probability
(0, 1000]           500 to 1500         {S(500) - S(1500)}/S(500) = 0.1778
(1,000, 5,000]      1500 to 5500        {S(1500) - S(5500)}/S(500) = 0.2839
(5,000, 10,000]     5500 to 10,500      {S(5500) - S(10,500)}/S(500) = 0.1545
(10,000, 24,500)    10,500 to 25,000    {S(10,500) - S(25,000)}/S(500) = 0.1864
24,500              25,000 or more      S(25,000)/S(500) = 0.1974
Lower Limit    Upper Limit    Observed Number    Probability    Expected Number    Chi-Square
0              1,000          165                0.1777         177.7              0.908
1,000          5,000          292                0.2839         283.9              0.231
5,000          10,000         157                0.1545         154.5              0.039
10,000         24,500         200                0.1865         186.5              0.982
24,500                        186                0.1974         197.4              0.658
Sum                           1000               1.0000         1000               2.818
(165 - 177.7)²/177.7 + (292 - 283.9)²/283.9 + (157 - 154.5)²/154.5 + (200 - 186.5)²/186.5 + (186 - 197.4)²/197.4 = 2.818.
Comment: The data has been truncated and shifted from below at 500 and censored from above at 25,000. Note that the probabilities for the intervals add to unity, as they always should; the sum of the observed and expected columns are equal.
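The only delicate step in 12.26 is converting payment intervals into conditional loss probabilities, given survival past the 500 deductible and censoring at 25,000. The following Python sketch (not from the study guide; the helper name weibull_survival is an illustrative assumption) reproduces the table above, up to rounding:

import math

# Weibull with tau = 1/2, theta = 7000; deductible 500, maximum covered loss 25,000.
def weibull_survival(x, tau=0.5, theta=7000.0):
    return math.exp(-((x / theta) ** tau))

S500 = weibull_survival(500)
loss_breaks = [500, 1500, 5500, 10500, 25000]  # losses behind payments 0, 1000, 5000, 10000, 24500
probs = [(weibull_survival(a) - weibull_survival(b)) / S500 for a, b in zip(loss_breaks, loss_breaks[1:])]
probs.append(weibull_survival(25000) / S500)   # censored group: payments of exactly 24,500
observed = [165, 292, 157, 200, 186]
expected = [1000 * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 3))  # roughly 2.84; the 2.818 above uses rounded interval probabilities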
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 373
12.27. C. Since we observe losses that may be as big as 500, we have ω ≥ 500. Then the likelihood is:
(50/ω)³⁰ {(100 - 50)/ω}⁵⁰ {(250 - 100)/ω}¹³⁰ {(500 - 250)/ω}¹⁹⁰,
which is proportional to ω⁻⁴⁰⁰. This is a decreasing function of ω, so in order to maximize the likelihood we want the smallest possible ω, which is 500. For a uniform distribution from 0 to 500, the expected number of losses for the four intervals are: 40, 40, 120, and 200. Therefore, the chi-square statistic is:
(30 - 40)²/40 + (50 - 40)²/40 + (130 - 120)²/120 + (190 - 200)²/200 = 6.333.
There were four intervals, and one fitted parameter, so the number of degrees of freedom is: 4 - 1 - 1 = 2. For 2 degrees of freedom, the 5% critical value is 5.991 and the 2.5% critical value is 7.378. Since 5.991 < 6.333 < 7.378, we reject H0 at 5% but not at 2.5%.
12.28. C. F(1000) = 1 - Exp[-1000/1997] = 0.3939. F(2000) = 1 - Exp[-2000/1997] = 0.6327.
Lower Limit    Upper Limit    # obs.    Prob. in Interval    Expected # claims    Chi-Square
0              1000           70        0.3939               78.78                0.980
1000           2000           60        0.2387               47.75                3.143
2000           infinity       70        0.3673               73.47                0.163
Sum                           200       1.0000               200.00               4.286
(78.78 - 70)²/78.78 + (47.75 - 60)²/47.75 + (73.47 - 70)²/73.47 = 4.286.
One fitted parameter. ⇒ Degrees of freedom = 3 - 1 - 1 = 1. 3.841 < 4.286 < 5.024. ⇒ Reject at 5% and not at 2.5%.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 374
12.29. C. The total number of claims is 100. For each interval the expected number of claims is: (100)(F(upper) - F(lower)). For each interval compute: (expected - observed)²/expected.
Bottom    Top         # claims    F(lower)    F(upper)    Expected # claims    Chi-Square
0         1000        55          0.000       0.535       53.5                 0.04
1000      2500        15          0.535       0.770       23.5                 3.08
2500      10000       22          0.770       0.953       18.3                 0.77
10000     Infinity    8           0.953       1.000       4.7                  2.30
                      100                                 100                  6.19
For example, F(2500) = 1 - (1500/(1500+2500))^1.5 = 1 - 0.375^1.5 = 1 - 0.230 = 0.770.
However, the final interval has fewer than 5 expected claims, so group it with the next to last interval:
Bottom    Top         # claims    F(lower)    F(upper)    Expected # claims    Chi-Square
0         1000        55          0.000       0.535       53.5                 0.04
1000      2500        15          0.535       0.770       23.5                 3.08
2500      Infinity    30          0.770       1.000       23.0                 2.16
                      100                                 100                  5.28
Comment: Since the question told us that the minimum expected number of observations in any group should be 5, we did so. This is one common rule of thumb.
12.30. B. ν = the number of degrees of freedom = # of intervals - 1 - # fitted parameters = 6 - 1 - 0 = 5.
t      Observed Number    Expected Number    ((Observed - Expected)^2)/Expected
0      15                 10                 2.500
1      30                 25                 1.000
2      20                 25                 1.000
3      15                 20                 1.250
4      10                 15                 1.667
5      10                 5                  5.000
Sum    100                100                12.417
χ 2 - ν = 12.417 - 5 = 7.417. Comment: There is no mention of having fit the survival model to this data.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 375
12.31. B. For each of the six distinct size classes one computes: (observed number of claims - fitted number of claims)²/fitted number of claims.
Lower Endpoint    Upper Endpoint    Observed # claims    F(lower)    F(upper)    Assumed # claims    (Observed - Fitted)²/Fitted
0                 25000             3900                 0.0000      0.5781      4047                5.3
25000             50000             1500                 0.5781      0.7840      1441                2.4
50000             infinity          900                  0.7840      1.0000      1080                30.0
50000             60000             200                  0.7840      0.8285      89                  138.2
60000             100000            300                  0.8285      0.9213      186                 70.7
100000            infinity          200                  0.9213      1.0000      157                 11.5
                                    7000                                         7000                258.1
For example, F(25000) = 1 - (75000/(25000+75000))³ = 0.5781. The expected number of claims in the interval from 25,000 to 50,000 is: (7000)(F(50000) - F(25000)) = (7000)(0.7840 - 0.5781) = 1441. The total number of claims for State 2 is 2000. The number of claims in State 2 expected from 50,000 to 60,000 is: (2000)(F(60000) - F(50000)) = (2000)(0.8285 - 0.7840) = 89. The total number of claims for State 1 is 5000. The number of claims in State 1 expected from 50,000 to ∞ is: (5000)(F(∞) - F(50000)) = (5000)(1 - 0.7840) = 1080. The contribution to the Chi-Square from this interval is (200 - 89)²/89 = 138.
12.32. D. The Chi-square statistic is computed by taking the sum of: (fitted - observed)²/fitted numbers of claims. For each interval, the fitted number of claims = {total number of claims}{F(upper) - F(lower)}. For the Loglogistic, F(x) = (x/θ)^γ/(1 + (x/θ)^γ) = x/(2+x). Group the last three given intervals together to form an interval from 6 to ∞, in order to get an expected number of claims of at least 5.
Bottom    Top         # claims    F(lower)    F(upper)    Fitted # claims    ((Observed - Fitted)²)/Fitted
0         2           8           0.000       0.500       12.50              1.62
2         6           5           0.500       0.750       6.25               0.25
6         Infinity    12          0.750       1.000       6.25               5.29
SUM                   25                                  25.00              7.16
Comment: Since the question told us to have the minimum expected number of observations in any group should be 5, we did so. This is one common rule of thumb.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 376
12.33. B. For each of the six distinct size classes one computes: (observed number of claims - fitted number of claims)²/fitted number of claims.
Lower Endpoint    Upper Endpoint    Observed # claims    F(lower)    F(upper)    Fitted # claims    Chi-Square
0                 10000             7330                 0.0000      0.4898      7346.94            0.039
10000             25000             3970                 0.4898      0.7500      3903.06            1.148
25000             infinity          2430                 0.7500      1.0000      2500.00            1.960
25000             35000             405                  0.7500      0.8264      381.94             1.392
35000             50000             325                  0.8264      0.8889      312.50             0.500
50000             infinity          540                  0.8889      1.0000      555.56             0.436
                                    15000                                        15000              5.47
For example, F(10000) = 1 - (25000/(10000+25000))² = 0.4898. The fitted number of claims in the interval from 10,000 to 25,000 is (15000)(F(25000) - F(10000)) = (15000)(0.7500 - 0.4898) = 3903.1. The fitted number of claims in New Jersey from 25,000 to 35,000 is: (5000)(F(35000) - F(25000)) = (5000)(0.8264 - 0.7500) = 381.94. The contribution to the Chi-Square from this interval is (405 - 381.94)²/381.94 = 1.392.
12.34. D. For each interval we compute (observed - fitted)²/fitted. There are four intervals, so that the degrees of freedom are 4 - 1 = 3. Since 9.348 > 9.2 we canʼt reject at the 2.5% significance level. Since 7.815 < 9.2 we can reject at the 5% significance level.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Fitted # claims    Chi-Square
1              1.333          3               0.000       0.250       5.0                0.80
1.333          2              6               0.250       0.500       5.0                0.20
2              4              10              0.500       0.750       5.0                5.00
4              infinity       1               0.750       1.000       5.0                3.20
                              20                                                         9.20
Comment: Go to the row for 3 degrees of freedom, and find where 9.2 is bracketed: 7.815 < 9.2 < 9.348. Reject at the significance level of the column to the left, 5%, and do not reject to the right, 2.5%. The distribution is a Single Parameter Pareto distribution.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 377
12.35. A. F(x) = 1 - (1000/(1000+x))². The Chi-Square goodness-of-fit statistic is the sum of terms for each interval: (observed number of claims - fitted number of claims)²/fitted number of claims.
Bottom    Top     F[Top of Interval]    Fitted # of Claims    Observed # of Claims    Chi-Square
0         250     0.360                 3.600                 3                       0.1000
250       500     0.556                 1.960                 2                       0.0008
500       1000    0.750                 1.940                 3                       0.5792
1000      ∞       1.000                 2.500                 2                       0.1000
Sum                                     10.000                10                      0.7800
The fitted number of claims in an interval is 10 times the difference of the Distribution Function at the top and bottom of the interval. For example, the fitted number of claims for the third interval is 10(0.750 - 0.556) = 1.940. The Chi-Square contribution for the third interval is: (3 - 1.94)²/1.94 = 0.5792.
Comment: We use these intervals to compute the Chi-Square statistic, as we were told to do.
12.36. B. For each interval we compute (observed - fitted)²/fitted. As computed below, the sum of the contributions of the intervals is 9.50. There are five intervals, and we have fit two parameters using minimum Chi-Square, so that the degrees of freedom are 5 - 1 - 2 = 2. (Number of degrees of freedom = the number of intervals minus one, minus one degree of freedom for every parameter fit, when we fit via minimum Chi-square.) Since 10.597 > 9.50, we do not reject at the 0.5% significance level. Since 9.210 < 9.50, we can reject at the 1% significance level.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Fitted # claims    Chi-Square
0              3              180             0.000       0.184       184.5              0.11
3              7.5            180             0.184       0.387       202.4              2.47
7.5            15             235             0.387       0.601       213.9              2.08
15             40             255             0.601       0.872       271.4              0.99
40             infinity       150             0.872       1.000       127.8              3.85
                              1000                                                       9.50
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 378
12.37. E. This is an Exponential Distribution, so F(x) = 1 - e^(-x/5). The Chi-square statistic is computed by taking the sum over the intervals of the squared differences of the fitted and observed numbers of claims divided by the expected number of claims: (Fitted - Observed)²/Fitted. For each interval the fitted number of claims is: {total number of claims}{F(upper) - F(lower)}.
Bottom    Top    # claims    F(lower)    F(upper)    Fitted # claims    Chi-Square
0         1      15          0.000       0.181       18.1               0.54
1         5      40          0.181       0.632       45.1               0.57
5         10     20          0.632       0.865       23.3               0.46
10        15     15          0.865       0.950       8.6                4.86
15        ∞      10          0.950       1.000       5.0                5.06
                 100                                                    11.49
12.38. D. f(x) = 1/x², x > 1, therefore integrating gives F(x) = 1 - 1/x, x > 1. The Chi-square statistic is computed by taking the sum of the contributions from each interval: (fitted - observed)²/fitted numbers of claims. For each interval the fitted number of claims = {total number of claims}{F(upper) - F(lower)}.
Bottom    Top      # claims    F(lower)    F(upper)    Fitted # claims    Chi-Square
1         1.333    16          0.000       0.250       10.0               3.60
1.333     2        10          0.250       0.500       10.0               0.00
2         4        10          0.500       0.750       10.0               0.00
4         ∞        4           0.750       1.000       10.0               3.60
                   40                                  40                 7.20
We have 4 intervals, so we have 4 - 1 = 3 degrees of freedom. Since 7.20 < 7.815, we do not reject at 5%. Since 6.251 < 7.20, we reject at 10%. Comment: f(x) is the probability density function for a Single Parameter Pareto Distribution. Since there is no mention of having fit the curve to this data, we do not decrease the degrees of freedom by the number of fitted parameters. Whenever we are doing a Chi-Square goodness of fit test, we compare the expected number of items in each interval to the observed number of items in each interval. In that case, if the null hypothesis is true, the statistic has (approximately) a Chi-Square Distribution. If instead we were to compare the expected dollars in each interval to the observed number of dollars of each interval, that would be a different test. I do not know what the distribution of this different test statistic would be. (If the assumed size of loss distribution had a finite second moment, perhaps someone could figure out what the limit of the test statistic would be as the sample size went to infinity.)
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 379
12.39. E. There is no mention of having fit any parameters, so ν = the number of degrees of freedom = # of intervals - 1 - # fitted parameters = 4 - 1 - 0 = 3.
Time      Observed Number    S(upper end)    Probability Covered by Interval    Expected Number    Chi-Square
0 to 1    21                 0.9             0.1                                15                 2.400
1 to 2    27                 0.7             0.2                                30                 0.300
2 to 3    39                 0.4             0.3                                45                 0.800
3 to 4    63                 0               0.4                                60                 0.150
Sum       150                                1                                  150                3.650
χ²/ν = 3.65/3 = 1.22.
12.40. D. Since the data is truncated from below, we need to adjust the distribution function. G(x) = {F(x) - F(3000)}/S(3000), where F is for a Pareto with α = 2 and θ = 25,000. The Chi-square statistic is computed as usual by taking the sum of: (fitted - observed)²/fitted numbers of claims. For each interval the fitted number of claims = {total number of claims}{G(upper) - G(lower)}.
Bottom    Top      # claims    F(lower)    F(upper)    G(lower)    G(upper)    Fitted # claims    Chi-Square
3000      5000     6           0.2028      0.3056      0.0000      0.1289      12.9               3.68
5000      10000    29          0.3056      0.4898      0.1289      0.3600      23.1               1.50
10000     25000    39          0.4898      0.7500      0.3600      0.6864      32.6               1.24
25000     ∞        26          0.7500      1.0000      0.6864      1.0000      31.4               0.92
                   100                                                         100                7.34
There are 4 intervals, so we have 4 - 1 = 3 degrees of freedom. Since 6.251 < 7.34 < 7.815, we do not reject at 5% and reject at 10%.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 380
12.41. Since the Pareto was not fit to the data set to which we are comparing, subtract no fitted parameters. ⇒ The number of degrees of freedom is # of intervals - 1 = 5 - 1 = 4. For each interval compute: (observed # of claims - assumed # of claims)²/assumed #.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Assumed # claims    Chi-Square
0              2000           39              0.0000      0.3661      36.61               0.16
2000           4000           22              0.3661      0.5688      20.27               0.15
4000           8000           17              0.5688      0.7700      20.12               0.48
8000           15000          12              0.7700      0.8988      12.89               0.06
15000          infinity       10              0.8988      1.0000      10.12               0.00
                              100                                                         0.85
F(x) = 1 - (1 + x/10000)^(-2.5). For example, F(2000) = 1 - (1 + 2000/10000)^(-2.5) = 0.3661. (100)(0.5688 - 0.3661) = 20.27. (22 - 20.27)²/20.27 = 0.15.
Comment: Since 0.85 < 7.779, we do not reject the Pareto at 10%. The method of moments Pareto fit to this data is α = 2.780 and θ = 11,884, as determined in the solution to Course 4 Sample Exam, Q.8. This would have 5 - 1 - 2 = 2 degrees of freedom and a chi-square statistic of 1.45. Since 1.45 < 4.605 we would not reject this Pareto at 10%.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Fitted # claims    Chi-Square
0              2000           39              0.0000      0.3511      35.11              0.43
2000           4000           22              0.3511      0.5536      20.25              0.15
4000           8000           17              0.5536      0.7609      20.73              0.67
8000           15000          12              0.7609      0.8966      13.57              0.18
15000          infinity       10              0.8966      1.0000      10.34              0.01
                              100                                                        1.45
12.42. A. There are 30 observed, so for an interval to have at least 5 expected claims, it must cover at least: 5/30 = 1/6 = 16.67%. The largest number of groups that accomplishes this is: 0 to 500, 500 to 2498, 2498 to 4876, and 4876 to ∞. Each contribution from an interval is: (assumed - observed)²/assumed.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Prob. in Interval    Assumed # claims    Chi-Square
0              500            3               0.00        0.27        0.27                 8.1                 3.21
500            2498           8               0.27        0.55        0.28                 8.4                 0.02
2498           4876           9               0.55        0.81        0.26                 7.8                 0.18
4876           infinity       10              0.81        1.00        0.19                 5.7                 3.24
                              30                                                           30.0                6.66
(8.1 - 3)²/8.1 + (8.4 - 8)²/8.4 + (7.8 - 9)²/7.8 + (5.7 - 10)²/5.7 = 6.66.
Alternately, χ² = Σ(Oi²/Ei) - n = 3²/8.1 + 8²/8.4 + 9²/7.8 + 10²/5.7 - 30 = 6.66.
2013-4-6, Fitting Loss Distributions §12 Chi-Square Test, HCM 10/14/12, Page 381
12.43. C. There are 5 - 1 = 4 degrees of freedom.
Lower Limit    Upper Limit    Observations    F(lower)    F(upper)    Expected Number    Chi-Square
0              2              5               0.0000      0.0350      10.5               2.881
2              5              42              0.0350      0.1300      28.5               6.395
5              7              137             0.1300      0.6300      150.0              1.127
7              8              66              0.6300      0.8300      60.0               0.600
8              ∞              50              0.8300      1.0000      51.0               0.020
                              300                                     300                11.022
(5 - 10.5)2 /10.5 + (42 - 28.5)2 /28.5 + (137 - 150)2 /150 + (66 - 60)2 /60 + (50 - 51)2 /51 = 11.022. Since 9.488 < 11.022 ≤ 11.143, reject at 5% and do not reject at 2.5%.
12.44. E. Σ Oj = Σ Ej = 1000, where the sums are taken over the 20 intervals, j = 1 to 20.
Since the intervals are of the same length and we are assuming a uniform distribution, the expected number for each interval is the same. Ej = 1000/20 = 50 for all j.
The Chi-Square Statistic is: Σ (Oj - Ej)²/Ej = Σ (Oj - 50)²/50 = Σ Oj²/50 - 2 Σ Oj + 1000 = Σ Oj²/50 - 1000.
Therefore, the Chi-Square Statistic is: 51,850/50 - 1000 = 37. The number of degrees of freedom is: 20 - 1 = 19. 36.191 < 37 < 38.582.
⇒ Reject H0 at the 0.01 significance level, but not at the 0.005 significance level. Alternately, χ2 = Σ(Oi2 / Ei) - n = ΣOi2 /50 - 1000 = 51,850/50 - 1000 = 37. Proceed as before. Comment: There are no fitted parameters. This is a case where using a 1% significance level we would make a Type I error; the simulated sample was sufficiently unusual that we rejected the null hypothesis even though it was true.
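Throughout these solutions the degrees of freedom follow one rule: the number of intervals, minus one, minus one for each parameter fit to the same data. A one-line helper (illustrative sketch, not from the study guide) makes the rule explicit:

# Degrees of freedom for the chi-square goodness-of-fit test:
# intervals minus one, minus one for each parameter fit to this same data.
def chi_square_df(num_intervals, num_fitted_parameters=0):
    return num_intervals - 1 - num_fitted_parameters

print(chi_square_df(6, 1))   # solution 12.7: 4 degrees of freedom
print(chi_square_df(5, 2))   # solution 12.9: 2 degrees of freedom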
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 382
Section 13, The Likelihood Ratio Test¹⁵⁶

As discussed previously, the larger the likelihood or loglikelihood the better the fit. One can use the Likelihood Ratio Test to test whether a fit is significantly better. For example we previously fit via Maximum Likelihood both a Transformed Gamma and a Weibull to the ungrouped data in Section 2. Since the Weibull is a special case of a Transformed Gamma, the Transformed Gamma has a larger maximum likelihood. The loglikelihoods were -1748.98 for the Transformed Gamma and -1753.04 for the Weibull. The test statistic is twice the difference of loglikelihoods: (2){-1748.98 - (-1753.04)} = 8.12. One compares the test statistic to the Chi-Square Distribution with one degree of freedom:
Significance Level    0.100    0.050    0.025    0.010    0.005
Critical Value        2.706    3.841    5.024    6.635    7.879
Since 8.12 > 7.879, at the 0.5% significance level we reject the null hypothesis that the data came from the fitted Weibull (2 parameters), as opposed to the alternative hypothesis that the data came from the fitted Transformed Gamma Distribution (3 parameters).
The Likelihood Ratio Test (or Loglikelihood Difference Test) proceeds as follows:¹⁵⁷
1. One has two distributions, one with more parameters than the other, both fit to the same data via Maximum Likelihood.
2. One of the distributions is a special case of the other.¹⁵⁸
3. Compute twice the difference in the loglikelihoods.¹⁵⁹ ¹⁶⁰
4. Compare the result of step 3 to the Chi-Square Distribution, with a number of degrees of freedom equal to the difference of the number of fitted parameters of the two distributions.
5. Draw a conclusion as to whether the more general distribution fits significantly better than its special case.
H0 is the hypothesis that the distribution with fewer parameters is appropriate. The alternative hypothesis H1 is that the distribution with more parameters is appropriate.
156 See Section 16.4.4 in Loss Models.
157 Note that twice the difference of the loglikelihoods approaches a Chi-Square Distribution as the sample size gets larger. Thus one should be cautious about drawing any conclusion concerning fits to small data sets.
158 This test is often applied when one distribution is the limit of the other. Loss Models at page 456 states that in this case the test statistic has a mixture of Chi-Square Distributions.
159 Equivalently, one computes twice the log of the ratio of the likelihoods.
160 The factor of two is a constant, which as discussed subsequently, comes from the 1/2 in the exponent of the density of the Standard Normal Distribution.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 383 Exercise: Both a Weibull (2 parameters) and an Exponential Distribution (1 parameter) have been fit to the same data via the Method of Maximum Likelihood. The loglikelihoods are -737 for the Weibull and -740 for the Exponential. Use the Likelihood Ratio Test to determine whether the Weibull fits this data significantly better than the Exponential. [Solution: The Exponential is a special case of the Weibull, with one less parameter. Therefore we compare to the Chi-Square Distribution with one degree of freedom. Twice the difference of the loglikelihoods is 6. 5.024 < 6 < 6.635. Thus at 2.5% we reject the simpler Exponential model, in favor of H1 that the more complex Weibull model is appropriate. At the 1% level we do not reject H0 that the simpler Exponential model is appropriate.] Exercise: A Weibull Distribution has been fit to some data via the Method of Maximum Likelihood. The fitted parameters are θ^ = 247 and ^τ = 1.54 with loglikelihood of -737. For θ = 300 and τ = 2, the loglikelihood is -741. Use the likelihood ratio test in order to test the hypothesis that θ = 300 and τ = 2. [Solution: The Weibull θ = 300 and τ = 2 is a special case of the general Weibull, with two less parameters. Therefore we compare to the Chi-Square Distribution with two degrees of freedom. Twice the difference of the loglikelihoods is 8. 7.378 < 8 < 9.210. Thus at 2.5% we reject the hypothesis that θ = 300 and τ = 2, while at the 1% level we do not reject. Comment: See Example 16.9 in Loss Models.]
Testing Other Hypotheses:161
One can use the likelihood ratio in order to test other hypotheses. For example, one can test hypotheses involving restrictions on the relationships of the parameters of the distributions of two related data sets, such as in the following example.
Phil and Sylvia are competitors in the light bulb business.162 You were able to test 20 of Philʼs bulbs and 10 of Sylviaʼs bulbs:
           Number of Bulbs    Total Lifetime    Average Lifetime
Phil       20                 20,000            1000
Sylvia     10                 15,000            1500
You assume that the distribution of the lifetime (in hours) of a light bulb is Exponential.
161 The general subject of hypothesis testing is discussed in a subsequent section.
162 See 4, 11/00, Q.34, in the section on Maximum Likelihood.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 384 Exercise: Using maximum likelihood, separately estimate θ for Philʼs Exponential Distribution. What is the corresponding maximum loglikelihood? [Solution: For an Exponential Distribution, ln f(x) = -x/θ - ln(θ). The loglikelihood for Philʼs bulbs is: Σ{-xi /θ - ln(θ)} = (-1/θ)Σxi - nln(θ) = -20000/θ - 20ln(θ). Setting the partial derivative of the loglikelihood with respect to θ equal to zero: 0 = 20000/θ2 - 20/θ. θ = 20000/20 = 1000. (The same result as the method of moments.) The maximum loglikelihood is: -20000/1000 - 20ln(1000) = -158.155.] Similarly, if we separately estimate θ for Sylviaʼs Exponential Distribution, θ = 1500. The corresponding maximum loglikelihood is: -15000/1500 - 10ln(1500) = -83.132.163 Sylvia advertises that her light bulbs burn twice as long as Philʼs. Using maximum likelihood applied to all the data, estimate θ for Philʼs Exponential Distribution restricted by Sylviaʼs claim. What is the corresponding maximum loglikelihood? [Solution: For the Exponential Distribution, f(x) = e-x/θ/θ. ln f(x) = -x/θ - ln(θ). Assuming θS = 2θP, then the loglikelihood is:
Σ over Philʼs bulbs of {-xi/θP - ln(θP)} + Σ over Sylviaʼs bulbs of {-xi/(2θP) - ln(2θP)} = -20000/θP - 20ln(θP) - 15000/(2θP) - 10ln(2θP).
Setting the partial derivative of the loglikelihood with respect to θ equal to zero: 0 = 20000/θP² - 20/θP + 15000/(2θP²) - 10/θP. θP = (20000 + 15000/2)/(20 + 10) = 917. θS = 2θP = 1834. The maximum loglikelihood is: -20000/917 - 20ln(917) - 15000/1834 - 10ln(1834) = -21.810 - 136.422 - 8.179 - 75.143 = -241.554.]
The unrestricted maximum loglikelihood is: -158.155 - 83.132 = -241.287, somewhat better than the restricted maximum loglikelihood of -241.554. It is not surprising that without the restriction we can do a somewhat better job of fitting the data. The unrestricted model involves two Exponentials, while the restricted model is a special case in which one of the Exponentials has twice the mean of the other.
Let the null hypothesis H0 be that Sylviaʼs light bulbs burn twice as long as Philʼs. Let the alternative H1 be that H0 is not true. Then we can use the likelihood ratio test as follows.
163 In general, the maximum loglikelihood for an Exponential and n ungrouped data points is: -n(1 + ln(x̄)).
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 385
We use the loglikelihood for the restricted model of -241.554 and the loglikelihood for the unrestricted model of -241.287. The test statistic is as usual twice the difference in the loglikelihoods: (2) {-241.287 - (-241.554)} = 0.534. We compare to the Chi-Square with one degree of freedom, since the restriction is one dimensional. Since 0.534 < 2.706, we do not reject H0 at 10%. One reason we did not reject Sylviaʼs claim was due to the small sample size.
Exercise: Redo the above example with the following different data:
           Number of Bulbs    Total Lifetime    Average Lifetime
Phil       200                200,000           1000
Sylvia     100                150,000           1500
[Solution: Separate estimate of θ for Philʼs Exponential Distribution, θ = 1000. The corresponding maximum loglikelihood is: -200000/1000 - 200ln(1000) = -1581.55. Separate estimate of θ for Sylviaʼs Exponential Distribution, θ = 1500. The corresponding maximum loglikelihood is: -150000/1500 - 100ln(1500) = -831.32. Restricted by Sylviaʼs claim, θP = (200000 + 150000/2)/(200 + 100) = 917. θS = 2θP = 1834. The maximum loglikelihood is: -200000/917 - 200ln(917) - 150000/1834 - 100ln(1834) = -2415.54. Unrestricted loglikelihood is: -1581.55 - 831.32 = -2412.87. Twice the difference in the loglikelihoods: (2) {-2412.87 - (-2415.54)} = 5.34. Compare to the Chi-Square with one degree of freedom. Since 5.024 < 5.34 < 6.635, we reject H0 at 2.5% and do not reject H0 at 1%. Comment: The sample size was 10 times as large and so were all the loglikelihoods.]
As another example, recall that the Pareto fit by Maximum Likelihood to the Ungrouped Data in Section 2 had parameters α = 1.702 and θ = 240,151 and loglikelihood of -1747.87. The Ungrouped Data in Section 2 had a mean of 312,675. If we were to restrict our review to Pareto Distributions with this mean, then θ = 312,675(α - 1).164 With this restriction, the maximum likelihood Pareto has α = 2.018 and θ = 318,303 with loglikelihood of -1748.12. We test the hypothesis that the mean is 312,675 versus the alternative that it is not, by the likelihood ratio test.165 The test statistic is: (2) {-1747.87 - (-1748.12)} = 0.50. Since the restriction is one dimensional, we compare to a Chi-Square with 1 degree of freedom. Since 0.50 < 2.706, at 10% we do not reject the hypothesis that the mean is 312,675.
164 Such that the mean θ/(α−1) = 312675.
165 This is a special case of the Pareto, with one fewer parameter. See Example 16.9 in Loss Models.
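As a numeric check of the light bulb example, the sketch below (Python, with made-up variable and function names) recomputes the restricted and unrestricted Exponential loglikelihoods from the claim counts and total lifetimes, for both the original data and the ten-times-larger data in the exercise above.

```python
from math import log

def restricted_vs_unrestricted(n_phil, total_phil, n_sylvia, total_sylvia):
    """Exponential loglikelihoods when Sylvia's mean is restricted to be
    twice Phil's, versus fitting each mean separately."""
    # Restricted: theta_S = 2 * theta_P, fit a single theta_P by maximum likelihood.
    theta_p = (total_phil + total_sylvia / 2.0) / (n_phil + n_sylvia)
    theta_s = 2.0 * theta_p
    restricted = (-total_phil / theta_p - n_phil * log(theta_p)
                  - total_sylvia / theta_s - n_sylvia * log(theta_s))
    # Unrestricted: each theta is the sample mean of its own data, loglik = -n(1 + ln(mean)).
    unrestricted = (-n_phil * (1.0 + log(total_phil / n_phil))
                    - n_sylvia * (1.0 + log(total_sylvia / n_sylvia)))
    return restricted, unrestricted, 2.0 * (unrestricted - restricted)

print(restricted_vs_unrestricted(20, 20000, 10, 15000))      # statistic about 0.53
print(restricted_vs_unrestricted(200, 200000, 100, 150000))  # statistic about 5.3
```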
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 386
A Practical Technique of Deciding Which of Many Distributions to Use:166
Assume a number of distributions with various numbers of parameters have been fit via maximum likelihood to the same data. For each number of parameters, the best fit is that with the largest loglikelihood. Choose a significance level.
1. Compare the loglikelihood of the best one parameter distribution with the best two parameter distribution, using the likelihood ratio test.167
2. Compare the best 3 parameter distribution to the result of step 1, using the likelihood ratio test.
3. Compare the best four parameter distribution (if any) to the result of step 2, using the likelihood ratio test.
4. etc.
Exercise: Distributions have been fit to the same data via maximum likelihood.
Distribution                   Negative Loglikelihood    Number of Parameters
Burr                           4508.32                   3
Exponential                    4511.07                   1
Gamma                          4509.82                   2
Generalized Pareto             4508.56                   3
Inverse Burr                   4507.93                   3
Inverse Exponential            4513.12                   1
Inverse Gamma                  4509.41                   2
Inverse Gaussian               4511.27                   2
Inverse Transformed Gamma      4507.59                   3
LogLogistic                    4510.03                   2
LogNormal                      4508.95                   2
Mixture of 2 Exponentials      4509.02                   3
Mixture of 3 Exponentials      4505.91                   5
Mixed Exponential-Pareto       4506.10                   4
Mixed Weibull-Pareto           4504.82                   5
Pareto                         4510.38                   2
Pareto with α = 2              4512.60                   1
Transformed Gamma              4509.23                   3
Transformed Beta               4507.04                   4
Weibull                        4510.18                   2
Weibull with τ = 0.5           4510.74                   1
Based on this information, at a significance level of 5%, using the above technique, which distribution provides the best fit?
166 See Section 16.5.3 of Loss Models.
167 Even though there is no theorem to justify it, actuaries often use the Likelihood Ratio Test to compare fits of distributions with different numbers of parameters, even when one is not a special case of the other.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 387 [Solution: The best one parameter distribution is the Weibull with τ = 0.5, with a loglikelihood of -4510.74. The best two parameter distribution is the LogNormal, with a loglikelihood of -4508.95. Twice the difference in the loglikelihoods is: (2)(1.79) = 3.58. The critical value at 5% for one degree of freedom is 3.841. 3.58 ≤ 3.841. ⇒ use the simpler Weibull with τ = 0.5. The best three parameter distribution is the Inverse Transformed Gamma, with a loglikelihood of -4507.59. Twice the difference in the loglikelihoods between the Inverse Transformed Gamma and the Weibull with τ = 0.5 is: (2)(3.15) = 6.30. The critical value at 5% for two degrees of freedom is 5.991. 6.30 > 5.991. ⇒ use the Inverse Transformed Gamma. The best four parameter distribution is the Mixed Exponential-Pareto, with a loglikelihood of -4506.10. Twice the difference in the loglikelihoods between the Mixed Exponential-Pareto and the Inverse Transformed Gamma is: (2)(1.49) = 2.98. 2.98 ≤ 3.841.
⇒ use the simpler Inverse Transformed Gamma. The best five parameter distribution is the Mixed Weibull-Pareto, with a loglikelihood of -4504.82. Twice the difference in the loglikelihoods between the Mixed Weibull-Pareto and the Inverse Transformed Gamma is: (2)(2.77) = 5.54. 5.54 ≤ 5.991. ⇒ use the simpler Inverse Transformed Gamma. Thus the Inverse Transformed Gamma is the best fit. Comment: At each stage we apply the likelihood ratio test, even when we do not have a special case or a limit.] Derivation of the Likelihood Ratio Test:168 Under regularity conditions, maximum likelihood estimates asymptotically have a Multivariate Normal distribution with Covariance Matrix V, and the likelihood function asymptotically has a Multivariate Normal distribution.169 The likelihood is proportional to: exp[-(1/2)(t - θ)ʼ V-1 (t - θ)], where t is the vector of maximum likelihood parameters, θ is a vector of values of these parameters, probably different than those that produce the maximum likelihood, and V is the covariance matrix.170
168 The derivation is not on the syllabus. See for example, Kendallʼs Advanced Theory of Statistics, Volume 2.
169 Asymptotically, means as the sample size approaches infinity.
170 See Equation 23.25 in Kendallʼs Advanced Theory of Statistics, Volume 2.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 388
For H1, the unrestricted case, the maximum likelihood set of parameters is t.171 The likelihood for θ = t is proportional to: exp[-(1/2)(t - t)ʼ V⁻¹ (t - t)] = e⁰ = 1.
H0 is the restricted case, where for example some of the parameters are fixed.172 Call the maximum likelihood vector of parameters for the restricted case r. The likelihood for θ = r is proportional to: exp[-(1/2)(t - r)ʼ V⁻¹ (t - r)].
Therefore, the ratio of the maximum likelihood in the unrestricted case to the maximum likelihood in the restricted case is: 1/exp[-(1/2)(t - r)ʼ V⁻¹ (t - r)] = exp[(1/2)(t - r)ʼ V⁻¹ (t - r)].
The likelihood ratio test statistic is twice the log of this ratio, which is: (t - r)ʼ V⁻¹ (t - r).
Since for large samples, the vector of maximum likelihood parameters, t, has a Multivariate Normal distribution, the quadratic form (t - r)ʼ V⁻¹ (t - r) has what is called a NonCentral Chi-Square Distribution.173 If H0 is true, then since maximum likelihood is asymptotically unbiased, the expected values of the vectors t and r are equal, and the likelihood ratio test statistic, (t - r)ʼ V⁻¹ (t - r), is a (central) Chi-Square Distribution. The number of degrees of freedom is equal to the difference in the dimensionality of the unrestricted and restricted cases, or the number of parameters fixed in the restricted case.
Note that this result was derived assuming a large sample. As the sample size approaches infinity, the likelihood ratio test statistic approaches a Chi-Square Distribution.174
171 H1 might be for example that the distribution is a Transformed Gamma, with 3 parameters.
172 H0 might be for example that the distribution is an Exponential, which is a special case of a Transformed Gamma with α = 1 and τ = 1.
173 See Section 23.6 of Kendallʼs Advanced Theory of Statistics, Volume 2.
174 See “Mahlerʼs Guide to Simulation” for an example of estimating the p-value via simulation for the Likelihood Ratio Test for a sample of size 100.
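Before turning to the problems, here is a small Python sketch of the practical selection technique from the exercise on page 386. It steps through the best fit at each number of parameters and keeps the more complex model only when the likelihood ratio test rejects the simpler one at the chosen significance level; the 5% chi-square critical values are hard-coded from the table, and the names are illustrative.

```python
# Best fit at each number of parameters, from the exercise above (loglikelihoods).
best_by_params = {
    1: ("Weibull with tau = 0.5", -4510.74),
    2: ("LogNormal", -4508.95),
    3: ("Inverse Transformed Gamma", -4507.59),
    4: ("Mixed Exponential-Pareto", -4506.10),
    5: ("Mixed Weibull-Pareto", -4504.82),
}
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # chi-square, 5% level

def stepwise_selection(best_by_params, critical):
    """Keep the current champion unless a model with more parameters is
    significantly better by the likelihood ratio test."""
    params = sorted(best_by_params)
    champ_k, (champ_name, champ_ll) = params[0], best_by_params[params[0]]
    for k in params[1:]:
        name, ll = best_by_params[k]
        statistic = 2.0 * (ll - champ_ll)
        if statistic > critical[k - champ_k]:
            champ_k, champ_name, champ_ll = k, name, ll
    return champ_name

print(stepwise_selection(best_by_params, CRITICAL_5PCT))
# Inverse Transformed Gamma, as in the solution on page 387.
```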
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 389 Problems: Use the following information for the next 7 questions:
• Define the “Holmes Distribution” to be: F(x) = 1 - (1 + x/θ)⁻², x > 0.
• Various distributions have each been fit to the same data via Maximum Likelihood.
• The loglikelihoods are as follows:
Distribution           Number of Parameters    Loglikelihood
Transformed Beta       4                       -2582
Transformed Gamma      3                       -2583
Generalized Pareto     3                       -2585
Weibull                2                       -2586
Pareto                 2                       -2587
Holmes                 1                       -2588
Exponential            1                       -2592
13.1 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Pareto Distribution is an appropriate model versus the alternative hypothesis H1 that the Generalized Pareto Distribution is an appropriate model for this data. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. None of the above. 13.2 (2 points) Based on the likelihood ratio test, one tests at the 1% significance level. Which of the following statements are true? 1. The Generalized Pareto Distribution is a more appropriate model for this data than the Transformed Beta Distribution. 2. The Pareto Distribution is a more appropriate model for this data than the Transformed Beta Distribution. 3. The Holmes Distribution is a more appropriate model for this data than the Transformed Beta Distribution. A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A,B,C, or D.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 390 13.3 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Holmes Distribution is an appropriate model versus the alternative hypothesis H1 that the Pareto Distribution is an appropriate model for this data. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. None of the above. 13.4 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Generalized Pareto Distribution is not a better model for this data than the Holmes Distribution. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. None of the above. 13.5 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Weibull Distribution is an appropriate model versus the alternative hypothesis H1 that the Transformed Gamma Distribution is an appropriate model for this data. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. None of the above. 13.6 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Exponential Distribution is an appropriate model versus the alternative hypothesis H1 that the Transformed Gamma Distribution is an appropriate model for this data. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. None of the above.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 391 13.7 (1 point) Based on the likelihood ratio test, one tests the hypothesis H0 that the Weibull Distribution is not a better model for this data than the Exponential Distribution. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. None of the above.
Use the following information for the next four questions: You are given the following 4 claims: 100, 500, 2000, 10,000. You fit a LogNormal Distribution to these claims via maximum likelihood. 13.8 (2 points) What is the fitted value of µ? 13.9 (1 point) What is the fitted value of σ? 13.10 (2 points) What is the maximum loglikelihood? 13.11 (3 points) Determine the result of using the likelihood ratio test in order to test the hypothesis that µ = 5 and σ = 1.5. (A) Reject at the 0.005 significance level. (B) Reject at the 0.010 significance level, but not at the 0.005 level. (C) Reject at the 0.025 significance level, but not at the 0.010 level. (D) Reject at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject at the 0.050 significance level.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 392
13.12 (5 points) You have the following data from three states:
State      Number of Claims    Aggregate Losses    Average Size of Loss
Bay        3000                3,000,000           1000
Empire     6000                7,500,000           1250
Granite    1500                1,125,000           750
The size of claim distribution for each state is Exponential. Let H0 be the hypothesis that the mean claim size for Empire State is 1.3 times that for Bay State and 1.6 times that for Granite State. Based on the likelihood ratio test, one tests the hypothesis H0 . Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. Do not reject H0 at 5%.
Use the following information for the next two questions: You are given the following 5 claims: 40, 150, 230, 400, 770. You assume the size of loss distribution is Gamma.
13.13 (3 points) You fit a Gamma Distribution with α = 3 to these claims via maximum likelihood. What is the maximum loglikelihood? A. less than -34.8 B. at least -34.8 but less than -34.7 C. at least -34.7 but less than -34.6 D. at least -34.6 but less than -34.5 E. at least -34.5
13.14 (3 points) Determine the result of using the likelihood ratio test in order to test the hypothesis that α = 3 and θ = 200 versus the alternative α = 3. (A) Reject at the 0.005 significance level. (B) Reject at the 0.010 significance level, but not at the 0.005 level. (C) Reject at the 0.025 significance level, but not at the 0.010 level. (D) Reject at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject at the 0.050 significance level.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 393 13.15 (1 point) A LogNormal Distribution has been fit via maximum likelihood to some data. The corresponding loglikelihood is -2067.83. Then restricting the mean to be 1000, another LogNormal Distribution has been fit via maximum likelihood to the same data. The corresponding loglikelihood is -2072.02. Let H0 be that the mean of the population that produced this data is 1000. Let H1 be that the mean of the population that produced this data is not 1000. Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. Do not reject H0 at 5%. 13.16 (3 points) You have the following data from the state of West Carolina: Region Number of Claims Aggregate Losses Average Size of Claim Rural 5000 500,000 100 Urban 10,000 1,250,000 125 You assume that the distribution of sizes of claims is exponential. Based on data from other states, you assume that the mean claim size for Urban insureds is 1.2 times that for Rural insureds. Let H0 be the hypothesis that the mean claim size in West Carolina for Urban is 1.2 times that for Rural. Let H1 be that H0 is not true. Using the likelihood ratio test, one tests the hypothesis H0 . Which of the following is true? A. Reject H0 at 1/2%. B. Reject H0 at 1%. Do not reject H0 at 1/2%. C. Reject H0 at 2.5%. Do not reject H0 at 1%. D. Reject H0 at 5%. Do not reject H0 at 2.5%. E. Do not reject H0 at 5%. 13.17 (3 points) Let X1 , X2 , ..., Xn be a sample from a Normal Distribution. Let H0 : Normal with µ = 0, and σ = 1. Let H1 : Normal with mean µ, and σ = 1. Demonstrate that in this case the likelihood ratio test statistic has a Chi-Square Distribution with one degree of freedom, if H0 is true.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 394
13.18 (4 points) You are given: (i) 500 claim amounts are randomly selected from a Pareto distribution with θ = 10 and unknown α. (ii) Σ ln(xi + 10) = 1485. You use the likelihood ratio test to test the hypothesis that θ = 10 and α = 1.7. Determine the result of the test. (A) Reject at the 0.005 significance level. (B) Reject at the 0.010 significance level, but not at the 0.005 level. (C) Reject at the 0.025 significance level, but not at the 0.010 level. (D) Reject at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject at the 0.050 significance level.
13.19 (3 points) Distributions have been fit to the same data via maximum likelihood.
Distribution           Negative Loglikelihood    Number of Parameters
Burr                   3020.65                   3
Exponential            3023.88                   1
Gamma                  3021.05                   2
Generalized Pareto     3021.82                   3
Inverse Burr           3019.25                   3
Inverse Exponential    3031.71                   1
Inverse Gaussian       3023.02                   2
LogNormal              3024.20                   2
Pareto                 3022.87                   2
Transformed Gamma      3016.98                   3
Weibull                3022.04                   2
At a significance level of 1%, which distribution provides the best fit?
13.20 (3 points) You observe the following 10 losses from state A: 10, 79, 87, 22, 18, 34, 73, 70, 58, 69. The sum is 520. You observe the following 10 losses from state B: 48, 125, 100, 170, 133, 56, 131, 87, 205, 105. The sum is 1160. Let H0 : The losses from both states follow the same Exponential Distribution. Let H1 : The losses from each state follow an Exponential Distribution. Using the likelihood ratio test, which of the following are true? (A) Do not reject H0 at the 0.10 level of significance. (B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance. (C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance. (D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance. (E) Reject H0 at the 0.01 level of significance.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 395
13.21 (1 point) You fit via maximum likelihood an Exponential and Weibull Distribution to the same data set. The loglikelihood for the fitted Exponential is -1034.71. The loglikelihood for the fitted Weibull is -1031.30. Based on the likelihood ratio test, test the hypothesis H0 that the Exponential Distribution is an appropriate model versus the alternative hypothesis H1 that the Weibull Distribution is an appropriate model for this data. (A) Reject H0 at the 0.005 significance level. (B) Reject H0 at the 0.010 significance level, but not at the 0.005 level. (C) Reject H0 at the 0.025 significance level, but not at the 0.010 level. (D) Reject H0 at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject H0 at the 0.050 significance level.
13.22 (3 points) You are given: (i) A random sample of 100 losses from an Inverse Gamma distribution. (ii) The maximum likelihood estimates are α^ = 3.075 and θ^ = 994. (iii) The natural logarithm of the likelihood function evaluated at the maximum likelihood estimates is -686.084. (iv) When α = 4, the maximum likelihood estimate of θ is 1293. (v) Σln(xi) = 594.968. (vi) Σxi = 50,383. (vii) Σ 1/xi = 0.309383. (viii) You use the likelihood ratio test to test the hypothesis H0 : α = 4. H1 : α ≠ 4. Determine the result of the test. (A) Do not reject H0 at the 0.10 level of significance. (B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance. (C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance. (D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance. (E) Reject H0 at the 0.01 level of significance.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 396
13.23 (4 points) You assume that a data set of size n came from an Inverse Gaussian with θ = 13. You use the likelihood ratio test in order to test H0 : µ = µ0 versus H1 : µ ≠ µ0 . Let X̄ = the sample mean. Show that the likelihood ratio test statistic has the form: {13n / (X̄ µ0²)} (X̄ - µ0)².
Use the following information for the next two questions:
• A random sample of losses from an Inverse Weibull distribution is: 0.1  0.3  0.6  0.8
13.24 (3 points) For τ = 2, determine the maximum loglikelihood. 13.25 (3 points) The maximum likelihood estimates are θ^ = 0.2277 and τ^ = 1.2433. You use the likelihood ratio test to test the hypothesis H0 : τ = 2. H1 : τ ≠ 2. Determine the result of the test. (A) Do not reject H0 at the 0.10 level of significance. (B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance. (C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance. (D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance. (E) Reject H0 at the 0.01 level of significance.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 397 13.26 (4, 11/03, Q.28 & 2009 Sample Q.22) (2.5 points) You fit a Pareto distribution to a sample of 200 claim amounts and use the likelihood ratio test to test the hypothesis that α = 1.5 and θ = 7.8. You are given: (i) The maximum likelihood estimates are α^ = 1.4 and θ^ = 7.6. (ii) The natural logarithm of the likelihood function evaluated at the maximum likelihood estimates is -817.92. (iii) Σln(xi + 7.8) = 607.64. Determine the result of the test. (A) Reject at the 0.005 significance level. (B) Reject at the 0.010 significance level, but not at the 0.005 level. (C) Reject at the 0.025 significance level, but not at the 0.010 level. (D) Reject at the 0.050 significance level, but not at the 0.025 level. (E) Do not reject at the 0.050 significance level. 13.27 (4, 11/05, Q.25 & 2009 Sample Q.235) (2.9 points) You are given: (i) A random sample of losses from a Weibull distribution is: 595 700 789 799 1109 (ii) At the maximum likelihood estimates of θ and τ, Σ ln(f(xi)) = -33.05. (iii) When τ = 2, the maximum likelihood estimate of θ is 816.7. (iv) You use the likelihood ratio test to test the hypothesis H0 : τ = 2. H1 : τ ≠ 2. Determine the result of the test. (A) Do not reject H0 at the 0.10 level of significance. (B) Reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance. (C) Reject H0 at the 0.05 level of significance, but not at the 0.025 level of significance. (D) Reject H0 at the 0.025 level of significance, but not at the 0.01 level of significance. (E) Reject H0 at the 0.01 level of significance.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 398 13.28 (4, 5/07, Q.14) (2.5 points) You are given: (i) Twenty claim amounts are randomly selected from a Pareto distribution with α = 2 and unknown θ. (ii) The maximum likelihood estimate of θ is 7.0. (iii) Σ ln(xi + 7.0) = 49.01. (iv) Σ ln(xi + 3.1) = 39.30. You use the likelihood ratio test to test the hypothesis that θ = 3.1. Determine the result of the test. (A) Do not reject H0 at the 0.10 significance level. (B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level. (C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level. (D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level. (E) Reject H0 at the 0.01 significance level.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 399 Solutions to Problems: 13.1. D. The Pareto is a special case of the Generalized Pareto, with one less parameter. Twice the difference in the loglikelihoods is 2{-2585 -(-2587)} = 4. Since there is a difference of one in the number of parameters we compare to the Chi-Square Distribution with one degree of freedom. Since 3.841 < 4 < 5.024 we reject at 5% and do not reject at 2.5%. 13.2. A. 1. We test the hypothesis H0 that the Generalized Pareto Distribution is an appropriate model versus the alternative hypothesis H1 that the Transformed Beta Distribution is an appropriate model for this data. The Generalized Pareto is a special case of the Transformed Beta, with one less parameter. Twice the difference in the loglikelihoods is: 2{-2582 -(-2585)} = 6. Since there is a difference of one in the number of parameters we compare to the Chi-Square Distribution with one degree of freedom. Since 5.024 < 6 < 6.635, we reject at 2.5% and do not reject at 1%. Thus statement #1 is True. (The Transformed Beta is a better model than the Generalized Pareto at the 2.5% significance level. However, the Generalized Pareto is a better model than the Transformed Beta at the 1% significance level.) 2. We test the hypothesis H0 that the Pareto Distribution is an appropriate model versus the alternative hypothesis H1 that the Transformed Beta Distribution is an appropriate model for this data. The Pareto is a special case of the Transformed Beta, with two less parameters. Twice the difference in the loglikelihoods is: 2{-2582 -(-2587)} = 10. Since there is a difference of two in the number of parameters we compare to the Chi-Square Distribution with two degrees of freedom. Since 9.210 < 10 < 10.597 we reject at 1% and do not reject at 0.5%. Thus statement #2 is False. 3. The “Holmes” Distribution is a Pareto with alpha fixed at 2 and is a special case of the Transformed Beta, with three less parameters. Twice the difference in the loglikelihoods is: 2{-2582 -(-2588)} = 12. Since there is a difference of three in the number of parameters we compare to the Chi-Square Distribution with three degrees of freedom. Since 11.345 < 12 < 12.838 we reject at 1% and do not reject at 0.5%. Thus statement #3 is False. (Statement #3 would be true at a 0.5% significance level. The Transformed Beta is a better model than the Holmes Distribution at the 1% significance level. However, the Holmes Distribution is a better model than the Transformed Beta at the 0.5% significance level.) Comment: I made up the name “Holmes Distribution” solely for the purposes of these questions. It is a special case of the Pareto for alpha = 2, fixed. 13.3. E. The Holmes Distribution is a special case of the Pareto, with one less parameter. Twice the difference in the loglikelihoods is 2{-2587 -(-2588)} = 2. Since there is a difference of one in the number of parameters we compare to the Chi-Square Distribution with one degrees of freedom. Since 2 < 2.706 we do not reject at 10%. Comment: The Pareto Distribution is not a better model than the Holmes Distribution at a 10% significance level. (It is therefore also true that the Pareto Distribution is not a better model than the Holmes Distribution at a 5% significance level, etc.)
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 400
13.4. D. The Holmes Distribution is a special case of the Generalized Pareto, with two less parameters. Twice the difference in the loglikelihoods is 2{-2585 -(-2588)} = 6. Since there is a difference of two in the number of parameters we compare to the Chi-Square Distribution with two degrees of freedom. Since 5.991 < 6 < 7.378, we reject at 5% and do not reject at 2.5%.
13.5. C. The Weibull is a special case of the Transformed Gamma, with one less parameter. Twice the difference in the loglikelihoods is: 2{-2583 -(-2586)} = 6. Since there is a difference of one in the number of parameters we compare to the Chi-Square Distribution with one degree of freedom. Since 5.024 < 6 < 6.635 we reject at 2.5% and do not reject at 1%.
13.6. A. The Exponential is a special case of the Transformed Gamma, with two less parameters. Twice the difference in the loglikelihoods is 2{-2583 -(-2592)} = 18. Since there is a difference of two in the number of parameters we compare to the Chi-Square Distribution with two degrees of freedom. Since 10.597 < 18 we reject at 0.5%.
13.7. A. The Exponential is a special case of the Weibull, with one less parameter. Twice the difference in the loglikelihoods is 2{-2586 -(-2592)} = 12. Since there is a difference of one in the number of parameters we compare to the Chi-Square Distribution with one degree of freedom. Since 7.879 < 12 we reject at 0.5%.
13.8, 13.9, & 13.10. f(x) = exp[-(ln(x) - µ)² / (2σ²)] / {x σ √(2π)}.
ln f(x) = -.5{(ln(x)-µ)2 /σ2 } - ln(σ) - ln(x) - (1/2)ln(2π). Σ ln f(xi) = -.5{Σ(ln(xi)-µ)2 /σ2 } - nln(σ) - Σln(xi) - (n/2)ln(2π). Set the partial derivatives of the sum of loglikelihoods equal to zero:
∂Σ ln f(xi) / ∂σ = Σ(ln(xi) - µ)²/σ³ - n/σ = 0. ∂Σ ln f(xi) / ∂µ = Σ(ln(xi) - µ)/σ² = 0. ⇒ Σ(ln(xi) - µ) = 0. ⇒ µ = Σln(xi) / n = {ln(100) + ln(500) + ln(2000) + ln(10000)}/4 = 6.908. Therefore σ = {Σ(ln(xi) - µ)² / n}^0.5 = √2.887 = 1.699. For µ = 6.908 and σ = 1.699, loglikelihood: -.5{Σ(ln(xi) - µ)²/σ²} - nln(σ) - Σln(xi) - (n/2)ln(2π) = -.5{Σ(ln(xi) - 6.908)²/1.699²} - 4ln(1.699) - Σln(xi) - 2ln(2π) = -2.00 - 2.12 - 27.63 - 3.676 = -35.43.
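A quick numeric check of solution 13.8-13.10, and of the restricted loglikelihood used in solution 13.11 on the next page, as a Python sketch; nothing here is part of the original solution beyond the formulas already shown.

```python
from math import log, sqrt, pi

claims = [100, 500, 2000, 10000]
logs = [log(x) for x in claims]
n = len(claims)

mu = sum(logs) / n                                  # about 6.908
sigma = sqrt(sum((y - mu) ** 2 for y in logs) / n)  # about 1.70 (the text reports 1.699)

def lognormal_loglik(mu, sigma):
    """LogNormal loglikelihood of the four claims at given parameters."""
    return (-0.5 * sum((y - mu) ** 2 for y in logs) / sigma ** 2
            - n * log(sigma) - sum(logs) - 0.5 * n * log(2 * pi))

ll_fitted = lognormal_loglik(mu, sigma)        # about -35.43
ll_restricted = lognormal_loglik(5.0, 1.5)     # about -38.73, as in solution 13.11
print(mu, sigma, ll_fitted, 2 * (ll_fitted - ll_restricted))   # statistic about 6.6
```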
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 401 13.11. D. For µ = 5 and σ = 1.5, loglikelihood: -.5{Σ(ln(xi)-µ)2 /σ2 } - nln(σ) - Σln(xi) - (n/2)ln(2π) = -.5{Σ(ln(xi) - 5)2 /1.52 } - 4ln(1.5) - Σln(xi) - 2ln(2π) = -5.80 - 1.62 - 27.63 - 3.675 = -38.73. The test statistic is twice the difference in the loglikelihoods: (2)(-35.43 - (-38.73)) = 6.60. The difference is number of parameters is: 2 - 0 = 2. So we compare the statistic to the Chi-Square Distribution with 2 degrees of freedom. 5.991 < 6.60 < 7.378. ⇒ Reject at 5%, but not at 2.5%. Comment: Similar to 4, 11/03, Q.28. 13.12. D. For an Exponential Distribution, ln f(x) = -x/θ - ln(θ). Assuming θB = θE/1.3, and θG = θE /1.6, then the loglikelihood is:
Σ over Bay of (-1.3xi/θE - ln(θE/1.3)) + Σ over Empire of (-xi/θE - ln(θE)) + Σ over Granite of (-1.6xi/θE - ln(θE/1.6)) =
-3000000(1.3)/θE - 3000ln(θE/1.3) - 7500000/θE - 6000ln(θE) - 1125000(1.6)/θE - 1500ln(θE/1.6) = -13,200,000/θE - 10500ln(θE) + 3000ln(1.3) + 1500ln(1.6). Setting the partial derivative of the loglikelihood with respect to θE equal to zero: 0 = 13,200,000/θE2 - 10500/θE. θE = 13,200,000/10500 = 1257.14. The corresponding maximum loglikelihood is: -13,200,000/1257.14 - 10500ln(1257.14) + 3000ln(1.3) + 1500ln(1.6) = -83942.13. Separate estimate of θ for Bay Stateʼs Exponential Distribution, θ = 1000. The corresponding maximum loglikelihood is: -3,000,000/1000 - 3000ln(1000) = -23723.27. Separate estimate of θ for Empire Stateʼs Exponential Distribution, θ = 1250. The corresponding maximum loglikelihood is: -7,500,000/1250 - 6000ln(1250) = -48785.39. Separate estimate of θ for Granite Stateʼs Exponential Distribution, θ = 750. The corresponding maximum loglikelihood is: -1,125,000/750 - 1500ln(750) = -11430.11. Unrestricted loglikelihood is: - 23723.27 - 48785.39 - 11430.11 = -83938.77. Twice the difference in the loglikelihoods: (2){-83938.77 - (-83942.13)} = 6.72. The restriction is two dimensional, so compare to the Chi-Square with two degrees of freedom. Since 5.991 < 6.72 < 7.378, we reject H0 at 5% and do not reject H0 at 2.5%.
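The same computation can be scripted; the sketch below (Python, names are illustrative) rebuilds the restricted and unrestricted loglikelihoods of solution 13.12 directly from the claim counts and aggregate losses, with 1.3 and 1.6 being the restrictions from the problem.

```python
from math import log

# (number of claims, aggregate losses, factor by which Empire's mean exceeds this state's mean)
states = {"Bay": (3000, 3_000_000, 1.3), "Empire": (6000, 7_500_000, 1.0),
          "Granite": (1500, 1_125_000, 1.6)}

# Restricted fit: a single theta_E, with theta_state = theta_E / factor.
theta_e = sum(f * t for n, t, f in states.values()) / sum(n for n, t, f in states.values())
restricted = sum(-f * t / theta_e - n * log(theta_e / f) for n, t, f in states.values())

# Unrestricted fit: each state's theta is its own sample mean, loglik = -n(1 + ln(mean)).
unrestricted = sum(-n * (1 + log(t / n)) for n, t, f in states.values())

print(theta_e, 2 * (unrestricted - restricted))
# theta_E about 1257; statistic about 6.8 here, while the solution above,
# carrying rounded intermediate values, reports 6.72 - either way answer D.
```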
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 402 13.13. A. For alpha fixed, maximum likelihood is equal to the method of moments. X = (40 + 150 + 230 + 400 + 770)/5 = 318. θ = X /α = 318/3 = 106. f(x) = e-x/θx2 /(2θ3). Σln f(xi) = Σ{-xi/θ + 2 ln(xi) - ln2 - 3ln(θ)} = -Σxi/θ + 2 Σln(xi) - nln2 - 3nln(θ). For θ = 106 the loglikelihood is: -Σxi/106 + 2 Σln(xi) - 5ln2 - 15ln106 = -15 + 53.55 - 3.47 - 69.95 = -34.87. 13.14. D. For θ = 200 the loglikelihood is: -Σxi/200 + 2 Σln(xi) - 5ln2 - 15ln200 = -7.95 + 53.55 - 3.47 - 79.47 = -37.34. The test statistic is twice the difference in the loglikelihoods: (2)(-34.87 - (-37.34)) = 4.94. The difference is number of parameters is: 1 - 0 = 1. So we compare the statistic to the Chi-Square Distribution with 1 degrees of freedom. 3.841 < 4.94 < 5.024. ⇒ Reject at 5%, but not at 2.5%. Comment: Similar to 4, 11/03, Q.28. In the likelihood ratio test, we determine whether fitting a parameter from the data makes a significant difference. When we are given alpha and theta, we are not estimating any parameters from the data. When we are given alpha = 3, we are only estimating one parameter, theta, from the data. 13.15. A. Use the likelihood ratio test. Twice the difference in the loglikelihoods is: (2)(-2067.83 - (-2072.02)) = 8.38. Fixing the mean is a one dimensional restriction, so we compare to the Chi-Square with one degree of freedom. 8.38 > 7.879, so we reject H0 at 1/2%. Comment: Similar to Example 16.9 in Loss Models.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 403 13.16. C. f(x) = e-x/θ/θ. ln f(x) = -x/θ - ln(θ). Loglikelihood is: -Σxi/θ - nln(θ). Separate estimate of θ for Rural Exponential Distribution, θ = 100, same as the method of moments. The corresponding maximum loglikelihood is: -500,000/100 - 5000ln(100) = -28025.85. Separate estimate of θ for Urban Exponential Distribution, θ = 125. The corresponding maximum loglikelihood is: -1,250,000/125 - 10000ln(125) = -58283.14. Restricted by H0 , θU = 1.2θR, the loglikelihood for the combined sample is: -500,000/θR - 5000ln(θR) -1,250,000/(1.2θR) - 10000ln(1.2θR). Setting the partial derivative with respect to θR equal to zero, and solving: θR = (500000 + 1250000/1.2)/(5000 + 10000) = 102.78. θU = (1.2)(102.78) = 123.33. The corresponding maximum loglikelihood is: -500,000/102.78 - 5000ln(102.78) -1,250,000/123.33 - 10000ln(123.33) = -86311.76. Unrestricted loglikelihood is: -28025.85 - 58283.14 = -86308.99. Twice the difference in the loglikelihoods: (2){-86308.99 - (-86311.76)} = 5.54. The restriction is one dimensional, so compare to the Chi-Square with one degree of freedom. Since 5.024 < 5.54 < 6.635, we reject H0 at 2.5% and do not reject H0 at 1%. 13.17. For σ = 1, f(x) = exp[-(x - µ)2 /2]/ 2 π . loglikelihood is: -Σ(xi - µ)2 /2 - (n/2)ln(2π). The maximum likelihood µ = X , with corresponding loglikelihood: -Σ(xi - X )2 /2 - (n/2)ln(2π). For µ = 0, loglikelihood is: -Σxi2 /2 - (n/2)ln(2π). Twice the difference in the loglikelihoods is: Σxi2 - Σ(xi - X )2 = Σ(2xi X - X 2 ) = n X 2 . If H0 is true, X is Normal with mean zero and standard deviation 1/ n . Therefore,
√n X̄ is a Unit Normal with mean zero and standard deviation 1.
Therefore, n X̄² is the square of a Unit Normal, a Chi-Square Distribution with 1 degree of freedom. Comment: Beyond what you should be asked on your exam. Unlike the usual situation, where the test statistic is approximately Chi-Square, with the approximation improving as n gets larger, here it actually follows a Chi-Square Distribution. σ could have taken any fixed positive value, rather than 1.
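Solution 13.17's exact result is easy to check by simulation; this sketch (Python with numpy, the sample size and seed are arbitrary) draws many samples under H0 and compares the simulated 95th percentile of n X̄² to the tabled 5% critical value of 3.841.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, trials = 25, 100_000

# Under H0 each sample is n standard Normals; the LR statistic is n * (sample mean)^2.
samples = rng.standard_normal((trials, n))
statistics = n * samples.mean(axis=1) ** 2

print(np.quantile(statistics, 0.95))   # close to 3.841, the chi-square(1) critical value
print((statistics > 3.841).mean())     # rejection rate close to 5% under H0
```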
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 404
13.18. A. f(x) = αθ^α/(θ + x)^(α + 1) = α 10^α/(10 + x)^(α + 1). ln f(x) = ln(α) + α ln(10) − (α+1)ln(10 + x). Loglikelihood = Σ ln f(xi) = Σ {ln(α) + α ln(10) − (α+1)ln(10 + xi)} = 500 ln(α) + 500α ln(10) - 1485(α+1). Setting the derivative with respect to α equal to zero: 0 = 500/α + 500 ln(10) - 1485. ⇒ α^ = 500 / {1485 - 500ln(10)} = 1.498. The corresponding loglikelihood is: 500 ln(1.498) + (500)(1.498) ln(10) - (1485)(2.498) = -1782.83. The loglikelihood corresponding to θ = 10 and α = 1.7 is: 500 ln(1.7) + (500)(1.7) ln(10) - (1485)(2.7) = -1786.99. The likelihood ratio test statistic is: (2){-1782.83 - (-1786.99)} = 8.32. We are comparing a situation with one fitted parameter versus zero fitted parameters, or 1 - 0 = 1 degree of freedom in the Chi-Square table. 8.32 > 7.879, so we reject H0 at 1/2%. Comment: Similar to 4, 5/07, Q.25.
13.19. Of the one parameter distributions, the Exponential has the best loglikelihood at -3023.88. Of the two parameter distributions, the Gamma has the best loglikelihood at -3021.05. Applying the likelihood ratio test to the Gamma versus the Exponential, twice the difference is: (2){-3021.05 - (-3023.88)} = 5.66 ≤ 6.635, the 1% critical value for 1 degree of freedom. Thus we do not reject the simpler Exponential in favor of the Gamma. Of the three parameter distributions, the Transformed Gamma has the best loglikelihood at -3016.98. Applying the likelihood ratio test to the Transformed Gamma versus the Exponential, twice the difference is: (2){-3016.98 - (-3023.88)} = 13.80 > 9.210, the 1% critical value for 2 degrees of freedom. Thus we reject the simpler Exponential in favor of the Transformed Gamma. The Transformed Gamma is selected as best. Comment: For a given number of parameters, the distribution with the larger loglikelihood is the better fit. Applying the likelihood ratio test to the Transformed Gamma versus the Gamma, twice the difference is (2){-3016.98 - (-3021.05)} = 8.14 > 6.635, the 1% critical value for 1 degree of freedom. Thus we reject the simpler Gamma in favor of the Transformed Gamma. It may turn out that applying the likelihood ratio test does not show a clear winner as it did here. The Schwarz Bayesian Criterion does not have this potential problem.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 405 13.20. B. f(x) = e-x/θ/θ. loglikelihood: -Σxi/θ - n ln(θ). Maximum likelihood is equal to the method of moments: θ = X . The maximum loglikelihood is: -Σxi/ X - n ln( X ) = -n{1 + ln( X )}. For State A separately, θ = X = 52. The maximum loglikelihood is: -10{1 + ln(52)} = -49.5124. For State B separately, θ = X = 116. The maximum loglikelihood is: -10{1 + ln(116)} = -57.5359. For the two states combined, θ = X = 84. The maximum loglikelihood is: -20{1 + ln(84)} = -108.6163. Twice the difference in loglikelihoods: 2{-(49.5124 + 57.5359) - (-108.6163)} = 3.136. The difference in the number of parameters is 1. Therefore we compare to a Chi-Square Distribution with 1 degree of freedom. Since 2.706 < 3.136 < 3.841, reject H0 at the 0.10 level of significance, but not at the 0.05 level of significance. 13.21. B. The test statistic is twice the difference in the loglikelihoods: (2)( -1031.30 - (-1034.71)) = 6.82. The difference is number of parameters is: 2 - 1 = 1. So we compare the statistic to the Chi-Square Distribution with 1 degrees of freedom. 6.635 < 6.82 < 7.879. ⇒ Reject at 1%, but not at 0.5%. 13.22. C. f(x) = θα e−θ/x / {Γ(α) xα+1}. ln f(x) = αln(θ) - θ/x - lnΓ(α) - (α + 1)ln(x). loglikelihood = Nαln(θ) - θΣ1/xi - N lnΓ(α) - (α + 1)Σln(xi) = (100)αln(θ) - θ(0.309383) - 100lnΓ(α) - (α + 1)(594.968). For α = 4 and θ = 1293, the loglikelihood is: (100)(4)ln(1293) - (1293)(0.309383) - 100ln6 - (4 + 1)(594.968) = -688.160. The test statistic is twice the difference in the loglikelihoods: (2)(L1 - L0 ) = (2){-686.084 - (-688.160)} = 4.152. The difference in the number of parameters is: 2 - 1 = 1. So we compare the statistic to the Chi-Square Distribution with 1 degrees of freedom. 3.841 < 4.152 < 5.024. ⇒ Reject at 5%, but not at 2.5%.
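For solution 13.20 above, the closed form -n{1 + ln(x̄)} makes the whole test a few lines of Python; the helper name is just for illustration.

```python
from math import log

state_a = [10, 79, 87, 22, 18, 34, 73, 70, 58, 69]
state_b = [48, 125, 100, 170, 133, 56, 131, 87, 205, 105]

def exp_max_loglik(data):
    """Maximum loglikelihood of an Exponential fit to ungrouped data: -n(1 + ln(mean))."""
    n, mean = len(data), sum(data) / len(data)
    return -n * (1 + log(mean))

separate = exp_max_loglik(state_a) + exp_max_loglik(state_b)   # about -107.05
combined = exp_max_loglik(state_a + state_b)                   # about -108.62
statistic = 2 * (separate - combined)                          # about 3.14
print(statistic)   # 2.706 < 3.14 < 3.841: reject H0 at 10% but not at 5%
```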
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 406
13.23. For the Inverse Gaussian distribution with θ fixed, maximum likelihood is equal to the method of moments: µ = X̄.
Alternately, f(x) = {θ/(2πx³)}^0.5 exp[-θ{(x − µ)/µ}²/(2x)]. ln f(x) = 0.5 ln(θ) - 0.5ln(2π) - 1.5ln(x) - θ{(x − µ)/µ}²/(2x).
Σ ln f(xi) = (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ/2) Σ(xi/µ² - 2/µ + 1/xi).
Set the partial derivative of the loglikelihood with respect to µ equal to zero:
∂Σ ln f(xi) / ∂µ = -(θ/2)Σ(-2xi/µ³ + 2/µ²) = 0. Σ 2/µ² = Σ 2xi/µ³. Therefore, nµ = Σ xi. ⇒ µ = Σ xi/n = X̄.
The corresponding maximum loglikelihood is: (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ/2) Σ(xi/X̄² - 2/X̄ + 1/xi).
The loglikelihood corresponding to µ = µ0 is: (n/2) ln(θ) - (n/2) ln(2π) - 1.5Σ ln(xi) - (θ/2) Σ(xi/µ0² - 2/µ0 + 1/xi).
Twice the difference in the loglikelihoods is: θ {Σ(xi/µ0² - 2/µ0) - Σ(xi/X̄² - 2/X̄)} = 13 {nX̄/µ0² - 2n/µ0 - nX̄/X̄² + 2n/X̄}
= {13n/(X̄µ0²)} {X̄² - 2X̄µ0 - µ0² + 2µ0²} = {13n/(X̄µ0²)} (X̄ - µ0)².
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 407
13.24. f(x) = τ(θ/x)^τ exp[-(θ/x)^τ]/x. ln f(x) = ln(τ) - (τ + 1)ln(x) + τ ln(θ) - (θ/x)^τ.
For τ = 2, the loglikelihood is: 4 ln(2) - 3 ln(0.1) - 3 ln(0.3) - 3 ln(0.6) - 3 ln(0.8) + 8 ln(θ) - θ² (0.1⁻² + 0.3⁻² + 0.6⁻² + 0.8⁻²).
Set the derivative of the loglikelihood equal to zero: 8/θ = (2θ)(115.451). ⇒ θ = 0.1861.
Thus the maximum loglikelihood is: 4 ln(2) - 3 ln(0.1) - 3 ln(0.3) - 3 ln(0.6) - 3 ln(0.8) + 8 ln(0.1861) - 0.1861² (115.451) = -1.956.
Comment: For an Inverse Weibull distribution with τ fixed, the maximum likelihood θ is: {N / Σ xi^(-τ)}^(1/τ).
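The closed form in the comment above is easy to verify numerically; a small Python sketch, assuming the same four observations:

```python
from math import log

data = [0.1, 0.3, 0.6, 0.8]
tau = 2.0

# Maximum likelihood theta for an Inverse Weibull with tau fixed: (N / sum(x^-tau))^(1/tau).
s = sum(x ** -tau for x in data)         # about 115.451
theta = (len(data) / s) ** (1 / tau)     # about 0.1861

# Inverse Weibull: ln f(x) = ln(tau) + tau ln(theta) - (tau + 1) ln(x) - (theta / x)^tau
loglik = sum(log(tau) + tau * log(theta) - (tau + 1) * log(x) - (theta / x) ** tau
             for x in data)
print(theta, loglik)   # about 0.1861 and -1.956, matching solution 13.24
```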
13.25. A. The loglikelihood is: 4 ln(τ) - (τ + 1) ln[(0.1)(0.3)(0.6)(0.8)] + 4τ ln(θ) - (0.1^(-τ) + 0.3^(-τ) + 0.6^(-τ) + 0.8^(-τ)) θ^τ. For θ = 0.2277 and τ = 1.2433, the loglikelihood is: 4 ln(1.2433) - (1.2433 + 1) ln[(0.1)(0.3)(0.6)(0.8)] + (4)(1.2433) ln(0.2277) - (0.1^(-1.2433) + 0.3^(-1.2433) + 0.6^(-1.2433) + 0.8^(-1.2433)) (0.2277^1.2433) = -0.976. The likelihood ratio statistic is twice the difference: (2){-0.976 - (-1.956)} = 1.960. There is a difference in fitted parameters of: 2 - 1 = 1. Consulting the Chi-Square Table for one degree of freedom, 1.960 < 2.706. Do not reject H0 at 10%. Comment: Similar to 4, 11/05, Q.25.
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 408
13.26. C. f(x) = αθ^α (θ + x)^-(α + 1). ln f(x) = ln(α) + αln(θ) - (α + 1)ln(θ + x). loglikelihood = N ln(α) + N α ln(θ) - (α + 1)Σln(θ + xi). For α = 1.5, θ = 7.8, and N = 200, the loglikelihood is: (200)ln(1.5) + (200)(1.5)ln(7.8) - (2.5)(607.64) = -821.77. The test statistic is twice the difference in the loglikelihoods: (2)(-817.92 - (-821.77)) = 7.70. The difference in the number of fitted parameters is: 2 - 0 = 2. So we compare the statistic to the Chi-Square Distribution with 2 degrees of freedom. 7.378 < 7.70 < 9.210. ⇒ Reject at 2.5%, but not at 1%. Comment: When we fit both alpha and theta, then there are two parameters. When we instead fix both alpha and theta, then there are no fitted parameters. The latter is a special case of the former.
13.27. C. f(x) = τ(x/θ)^τ exp[-(x/θ)^τ]/x. ln f(x) = ln(τ) + (τ - 1)ln(x) - τ ln(θ) - (x/θ)^τ. For τ = 2 and θ = 816.7, ln f(x) = ln(2) + ln(x) - 2ln(816.7) - (x/816.7)². Loglikelihood is: 5ln(2) + Σln(xi) - 10ln(816.7) - Σ(xi/816.7)² = 3.466 + 33.305 - 67.053 - 5.000 = -35.28. We are given that at the maximum likelihood estimate the loglikelihood is -33.05. The likelihood ratio statistic is twice the difference: (2){-33.05 - (-35.28)} = 4.46. There is a difference in number of fitted parameters of 2 - 1 = 1. Consulting the Chi-Square Table for one degree of freedom, 3.841 < 4.46 < 5.024. Reject H0 at 5%, but not at 2.5%. Comment: For τ = 2 known and fixed, you should be able to determine the maximum likelihood value of θ yourself. θ = (Σ xi^τ / N)^(1/τ) = {(595² + 700² + 789² + 799² + 1109²)/5}^(1/2) = 816.7. Using a computer, the maximum likelihood Weibull has θ = 869.6 and τ = 4.757; given these fitted parameters, you should be able to calculate the corresponding maximum loglikelihood of -33.05. H0 is to use the simpler distribution, the Weibull with τ = 2 fixed.
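A numeric check of the Comment to solution 13.27, in Python: with τ = 2 fixed, θ comes from the closed form given above, and the loglikelihood matches the -35.28 used in the solution. Variable names are illustrative.

```python
from math import log

data = [595, 700, 789, 799, 1109]
tau = 2.0

theta = (sum(x ** tau for x in data) / len(data)) ** (1 / tau)   # about 816.7

# Weibull: ln f(x) = ln(tau) + (tau - 1) ln(x) - tau ln(theta) - (x / theta)^tau
loglik = sum(log(tau) + (tau - 1) * log(x) - tau * log(theta) - (x / theta) ** tau
             for x in data)
print(theta, loglik)            # about 816.7 and -35.28
print(2 * (-33.05 - loglik))    # statistic about 4.46; 3.841 < 4.46 < 5.024
```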
2013-4-6, Fitting Loss Distributions §13 Likelihood Ratio Test, HCM 10/14/12, Page 409 13.28. E. f(x) = (αθα)(θ + x)−(α + 1). ln f(x) = ln(α) + αln(θ) - (α + 1)ln(θ + x). loglikelihood = N ln(α) + N α ln(θ) - (α + 1)Σln(θ + xi) = (20)ln(2) + (20)(2) ln(θ) - (2 + 1)Σln(θ + xi). For θ = 7.0, the loglikelihood is: 20 ln(2) + 40 ln(7) - 3 Σln(7 + xi) = 91.6993 - (3)(49.01) = -55.331. For θ = 3.1, the loglikelihood is: 20 ln(2) + 40 ln(3.1) - 3 Σln(3.1 + xi) = 59.1190 - (3)(39.30) = -58.781. The test statistic is twice the difference in the loglikelihoods: (2){-55.331 - (-58.781)} = 6.900. The difference in the number of parameters is: 1 - 0 = 1. So we compare the statistic to the Chi-Square Distribution with 1 degrees of freedom. 6.635 < 6.900 < 7.879. ⇒ Reject at 1%, but not at 0.5%. Comment: When we fix alpha and fit theta, then there is only one parameter. When we instead fix both alpha and theta, then there are no fitted parameters. The latter is a special case of the former.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 410
Section 14, Hypothesis Testing175
You should know how to apply hypothesis testing to the Chi-Square Test, Likelihood Ratio Test, Kolmogorov-Smirnov Test, etc. It is also a good idea to know some of the general terminology.
Chi-Square Example:
The previously discussed application of the Chi-Square Statistic is an example of Hypothesis Testing. For example, for the grouped data in Section 3, for the Transformed Gamma Distribution fit by the Method of Maximum Likelihood, with parameters α = 3.2557, θ = 1861, and τ = 0.59689, the Chi-Square statistic is 16.45. The steps of hypothesis testing are:
1. Choose a level of significance.
   For example, level of significance = 1/2%.
2. Formulate the statistical model.
   The grouped data in Section 3 is a random sample drawn from a single distribution.
3. Specify the null hypothesis H0 and the alternative hypothesis H1.
   H0: The assumed distribution is a Transformed Gamma Distribution with α = 3.2557, θ = 1861, and τ = 0.59689.
   H1: The assumed distribution is not the above.
4. Select a test statistic whose behavior is known.
   The Chi-Square Statistic computed above has approximately a Chi-Square Distribution with 5 degrees of freedom.176
5. Find the appropriate critical region.
   The critical region or rejection region is χ² > 16.750.177
6. Compute the test statistic on the assumption that H0 is true.
   The test statistic is χ² = 16.45.
7. Draw conclusions. If the test statistic lies in the critical region, then reject the null hypothesis.
   The test statistic is not in the critical region (since 16.45 ≤ 16.750), so we do not reject H0. We do not reject the fitted Trans. Gamma at 1/2%.
175 See Section 12.4 of Loss Models.
176 Assuming the null hypothesis is true. Degrees of freedom = 9 intervals - 1 - 3 fitted parameters = 5.
177 Consulting the Chi-Square table for 5 d.f. and P = 99.5% ⇔ significance level of 1/2%. The interval from 0 to 16.750 is a 99.5% confidence interval for the Chi-Square Statistic if H0 is true.
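The seven steps above reduce to a few lines once the test statistic and its distribution have been chosen; here is a minimal Python sketch for the Chi-Square example, with the 1/2% critical value of 16.750 (and the 1% value of 15.086) for 5 degrees of freedom taken from the table. The function name is illustrative.

```python
def chi_square_decision(statistic, critical_value, h0="H0"):
    """Steps 5-7: compare the computed statistic to the critical region."""
    if statistic > critical_value:   # the statistic falls in the rejection region
        return f"reject {h0}"
    return f"do not reject {h0}"

# Transformed Gamma fit; 9 intervals - 1 - 3 fitted parameters = 5 degrees of freedom.
print(chi_square_decision(16.45, 16.750))   # do not reject H0 at the 1/2% level
print(chi_square_decision(17.00, 16.750))   # a statistic of 17 would be rejected
print(chi_square_decision(16.45, 15.086))   # at 1% the critical value is 15.086: reject
```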
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 411 If the test statistic had been instead 17, then we would have rejected the null hypothesis. If the significance level had been 1% instead of 1/2%, then the critical region would have been instead χ 2 > 15.086. The test statistic would have been in the critical region, since 16.45 > 15.086. Thus we would have rejected the fitted Transformed Gamma at 1%. Null Hypothesis: In general, in hypothesis testing one tests the null hypothesis H0 versus an alternative hypothesis H1 . It is important which hypothesis is H0 and which is H1 ; as will be discussed, they are treated differently. In the example above, the null hypothesis was that the grouped data in Section 3 was a random sample drawn from a Transformed Gamma Distribution with α = 3.2557, θ = 1861, and τ = 0.59689. The alternative hypothesis was that the grouped data in Section 3 was not a random sample drawn from this Transformed Gamma Distribution. A large Chi-Square means it is unlikely H0 is true and therefore we would reject H0 . In an application to regression, the null hypothesis might be that a certain slope in a regression model is zero, while the alternative hypothesis is that this slope is not zero. If the universe of possibility is divided in a manner that includes a boundary, the null hypothesis must include the boundary.178 For example, if µ is the mean of a Normal Distribution, H0 might be µ ≥ 0, while H1 is µ < 0. Note that hypothesis tests are set up to disprove something, H0 , rather than prove something. In the Chi-Square example, the test is set up to disprove that the data came from a certain Transformed Gamma Distribution. For example, a dry sidewalk is evidence it did not rain. On the other hand a wet sidewalk might be caused by rain or something else such as a sprinkler system. A wet sidewalk can not prove that it rained, but a dry sidewalk is evidence that it did not rain. Similarly, a large Chi-Square value is evidence that the data was not drawn from the given distribution, and may lead one to reject the null hypothesis. On the other hand, a small Chi-Square value results in one not rejecting the null hypothesis; a small Chi-Square value does not prove the null hypothesis is true.
178 See page 326 of Loss Models.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 412 We do not reject H0 unless there is sufficient evidence to do so. This is similar to the legal concept of innocent (not guilty) until proven guilty. A trial does not prove one innocent. Exercise: A LogNormal and a Pareto Distribution have each been fit to the same data, grouped into 11 intervals. The Chi-Square Statistics are 16.2 and 16.7 respectively. What conclusions do you draw at a 2.5% significance level? [Solution: There are 2 fitted parameters and 11 - 1 - 2 = 8 degrees of freedom. The critical value for 2.5% is 17.535. Since 16.2 ≤ 17.535 we do not reject the LogNormal. Since 16.7 ≤ 17.535 we do not reject the Pareto. Comment: The Chi-Square for the LogNormal is somewhat lower, and since there are the same number of degrees of freedom, the p-value for the LogNormal is somewhat higher than for the Pareto. The LogNormal is a somewhat better fit than the Pareto.] In the above exercise, at a 2.5% significance level we do not reject either the LogNormal or the Pareto fit. We were unable to reject two contradictory hypotheses that the data is a random sample from a fitted Pareto or LogNormal Distribution. Clearly the data canʼt be a random sample from both of these distributions. It would be somewhat troubling to “accept” two contradictory hypotheses. Thus “do not reject H0 ” is more precise than and preferable to “accept H0 ”. According to Loss Models, one should not use the term “accept H0 ”. Nevertheless, it is common for actuaries, including perhaps some members of the exam committee, to use the terms “do not reject H0 ” and “accept H0 ” synonymously. For many actuaries in common usage: do not reject ⇔ accept. Test Statistic: A hypothesis test needs a test statistic whose distribution is known. In the above example, the test statistic was the Chi-Square Statistic. In a subsequent example, the test statistic is the Kolmogorov-Smirnov Statistic. In other tests, one would use the Normal Table, t, or F Tables. Critical Values: The critical values are the values used to decide whether to reject H0 . For example, in the above Chi-Square test, the critical value (for 1/2% and 5 degrees of freedom) was 16.750. We reject H0 if the Chi-Square statistic is greater than 16.750. The critical value(s) form the boundary (other than ±∞) of the rejection or critical region. rejection region ⇔ critical region ⇔ if test statistic is in this region then we reject H0 .
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 413 Significance Level: The significance level, α, of the test is a probability level selected prior to performing the test. In the above Chi-Square example 1/2% was selected. Using the Chi-Square table attached to the exam, one can perform tests at significance levels of 10%, 5%, 2.5%, 1%, and 1/2%. For example, a significance level of 5% uses the column listed as P = 1 - 5% = 95%. If Prob[test statistic will take on a value at least as unusual as the computed value | H0 is true] is less than or equal to the significance level chosen, then we reject the H0 . If not, we do not reject H0 . The result of any hypothesis test depends on the significance level chosen. Therefore, in practical applications the choice of the significance level is usually important. Exercise: A Weibull Distribution has been fit to data grouped into 11 intervals. The Chi-Square Statistic is 18.3. What conclusions do you draw at different significance levels? [Solution: There are 2 fitted parameters and 11 - 1 - 2 = 8 degrees of freedom. The critical values for 10%, 5%, 2.5%, 1%, and 1/2% shown in the Chi-Square Table for 8 degrees of freedom are: 13.362, 15.507, 17.535, 20.090, 21.955. Since 18.3 > 13.362, reject the Weibull at 10%. Since 18.3 > 15.507, reject the Weibull at 5%. Since 18.3 > 17.535, reject the Weibull at 2.5%. Since 18.3 ≤ 20.090, do not reject the Weibull at 1%. Since 18.3 ≤ 21.955, do not reject the Weibull at 1/2%.] The results of this exercise would usually be reported as: reject the Weibull at 2.5%, do not reject the Weibull at 1%. Since we reject at 2.5%, we also automatically reject at 5% and 10%. Since we do not reject at 1%, we also automatically do not reject at 1/2%. Types of Errors: There are two important types of errors that can result when performing hypothesis testing:179 Type I Error
Reject H0 when it is true.
Type II Error
Do not reject H0 when it is false.
Exercise: A Chi-Square test was performed at a 1/2% significance level for 5 degrees of freedom. The Chi-Square Statistic was greater than the critical value of 16.75. Therefore, H0 was rejected. What is the probability of a Type I error? 179
We are assuming you set everything up correctly. These errors are due to the random fluctuations present in all data sets and the incomplete knowledge of the underlying risk process which led one to perform a hypothesis test in the first place.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 414 [Solution: If H0 is true, then there is a 1/2% chance that the Chi-Square Statistic will be greater than 16.750, due to random fluctuation in the limited sample represented by the observed data; the survival function at 16.750 of a Chi-Square Distribution with ν = 5 is 1/2%. The Chi-Square Statistic was greater than 16.750, therefore the probability of a Type I error is less than 1/2%.] Rejecting H0 at a significance level of α, means the probability of a Type I error is at most α. We would like both a small probability of rejecting when we should not (making a Type I error), and a small probability of failing to reject when we should (making a Type II error). But there is a trade-off. Reducing the probability of one type of error, increases the probability of the other type of error. With more relevant data one can reduce the probability of both types of errors. p-value: The p-value = Prob[test statistic takes on a value less in agreement with H0 than its calculated value]. If the p-value is less than or equal to the chosen significance level, then we reject H0 .180 Exercise: For 2 degrees of freedom, the Chi-Square Statistic is 8. What is the p-value? [Solution: Since the critical value for 2.5% is 7.378 and the critical value for 1% is 9.210, and 7.378 < 8 < 9.210, the p-value is between 1% and 2.5%. Reject at 2.5%, do not reject at 1%. Using a computer or noting that the Chi-Square Distribution with 2 degrees of freedom is an Exponential Distribution with mean 2, the p-value is: S(8) = e-8/2 = 1.83%.] When applying hypothesis testing to test the fit of a distribution to data, the larger the p-value the better the fit. Small p-values indicate a poor fit. According to Loss Models:181 p-values > 10% do not indicate any support for the alternative hypothesis H1 , while p-values < 1% indicate strong support for H1 .
180
Some sources instead say that we reject H0 if the p-value is less than the chosen significance level. There should not be a case on the exam or in practical applications where the p-value is exactly equal to the significance level. 181 See page 329 of Loss Models.
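The p-value in the exercise above can be checked the same way. A brief sketch, using the statistic of 8 and the 2 degrees of freedom from that exercise:

# p-value for a Chi-Square statistic of 8 with 2 degrees of freedom.
# With 2 degrees of freedom the Chi-Square is an Exponential with mean 2,
# so the survival function at 8 is exp(-8/2) = 1.83%.
from math import exp
from scipy.stats import chi2

print(chi2.sf(8, df=2), exp(-8 / 2))    # both print about 0.0183, i.e. 1.83%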
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 415 Power of a Test: The power of a test is the probability of rejecting the null hypothesis, when H1 is true. Prob[Type II error] = 1 - Power of the test = probability of failing to reject H0 when it is false. Thus, everything else equal, large power of a test is good. Assume that you have a data set with 100 values. H0 is that the data is from a Normal Distribution with variance 30 and mean = 5. H1 is that the data is a Normal Distribution with variance 30 and mean = 7. What is the power of applying the Normal test at a 5% significance level? We reject H0 if the observed mean is large; we perform a one-sided test. The observed mean is Normally Distributed, with a variance of: 30/100 = 0.3. If H0 is true, Prob[observed mean > x] = 1 - Φ[(x - 5)/ 0.3 ]. Consulting the Normal Table, this probability is 5% when x = 5 + (1.645) 0.3 = 5.90. Power = the probability of rejecting the null hypothesis when H1 is true = Probability[Normal Distribution with variance 0.3 and mean = 7 is greater than 5.90] = Φ[(7 - 5.90)/ 0.3 ] = Φ(2.01) = 97.8%. Exercise: What is the power of instead applying the Normal test at a 2.5% significance level? [Solution: Performing a one-sided test, we reject when the observed mean is greater than: 5 + (1.960) 0.3 = 6.07. Power = Φ[(7 - 6.07)/ 0.3 ] = Φ(1.70) = 95.5%.] In general, all other things being equal, the smaller the significance level, the smaller the power of the test. A smaller significance level results in less chance of rejecting H0 even though it is true, a Type I error, but a greater chance of failing to reject H0 even though it is false, a Type II error. Decision
                            H0 true                        H0 False
Reject H0                   Type I Error ⇔ p-value         Correct ⇔ Power
Do not reject H0            Correct ⇔ 1 - p-value          Type II Error ⇔ 1 - Power
In general, there is a trade-off between Type I and Type II errors. Making the probability of one type of error smaller, usually makes the probability of the other type of error larger.
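The power calculation above is easy to reproduce numerically. A minimal sketch, using the same assumptions as the example: Normal data with known variance 30, a sample of 100, H0 mean 5 versus H1 mean 7, and a one-sided test.

# Power of the one-sided Normal test described above.
from math import sqrt
from scipy.stats import norm

n, variance = 100, 30
mu0, mu1 = 5, 7                                   # the H0 mean and the H1 mean
se = sqrt(variance / n)                           # standard error of the sample mean

for significance in (0.05, 0.025):
    cutoff = mu0 + norm.ppf(1 - significance) * se      # reject H0 if the sample mean exceeds this
    power = norm.sf(cutoff, loc=mu1, scale=se)          # Prob[reject H0 | H1 is true]
    print(f"significance {significance:.1%}: cutoff {cutoff:.2f}, power {power:.1%}")   # 97.8% and 95.5%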
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 416 A hypothesis test is uniformly most powerful if it has the greatest power (largest probability of rejecting H0 when it is false) of any test with the same (or smaller) significance level.182 We compare the power of tests for a given size data set. The larger the data set, the easier it is to reject H0 when it is false; the larger the data set, the more powerful a given test. Exercise: What is the power of applying the Normal test at a 2.5% significance level and assuming 200 data points? [Solution: Performing a one-sided test, we reject when the observed mean is greater than: 5 + (1.960) 0.15 = 5.76. Power = Φ[(7 - 5.76)/ 0.15 ] = Φ(3.20) = 99.93%. Comment: Increasing the sample size from 100 to 200 increased the power of the test from 95.5% to 99.93%.] For example, when computing the Chi-Square statistic, the numerator has terms proportional to the square of the sample size and the denominator has terms that are proportional to the sample size. Thus as the sample size increases, if everything else stays the same, the value of the test statistic will increase, while the critical values stay the same, and thus we are more likely to reject H0 when it is false.183 In the case of the Kolmogorov-Smirnov test, the critical values decrease as one over the square root of the sample size. Thus as the sample size increases, if everything else stays the same, the test statistic is the same but the critical values are smaller, and thus we are more likely to reject H0 when it is false.184
182 See Definition 12.8 in Loss Models.
183 See page 454 of Loss Models. The Anderson-Darling Statistic acts similarly. If the data set doubled in size, all else being equal, the test statistic would double, while the critical value remained the same, and thus we are more likely to reject H0 when it is false.
184 Questions will often involve determining the K-S Statistic for a data set of size 5, solely so that you can compute the statistic under exam conditions. The K-S test would have little power if applied to such very small data sets.
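As discussed above, a given test becomes more powerful as the data set grows. A small sketch extending the previous Normal example, which reproduces the 99.93% figure for 200 data points:

# Power of the one-sided Normal test at a 2.5% significance level, as the sample size grows.
from math import sqrt
from scipy.stats import norm

variance, mu0, mu1, significance = 30, 5, 7, 0.025
for n in (50, 100, 200, 400):
    se = sqrt(variance / n)
    cutoff = mu0 + norm.ppf(1 - significance) * se
    print(n, round(norm.sf(cutoff, loc=mu1, scale=se), 4))    # n = 200 gives about 0.9993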
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 417 Kolmogorov-Smirnov Example: The application of the Kolmogorov-Smirnov Statistic, to be discussed subsequently, is another important example of hypothesis testing. For example, with 130 points as in the ungrouped data in Section 2, the critical values for the K-S Statistic are: Significance Level = α
0.20        0.10        0.05        0.01
Critical Value for n = 130:    0.0938      0.107       0.119       0.143
For the Pareto Distribution fit to the ungrouped data in Section 2 via Method of Moments, α = 2.658 and θ = 518,304, the K-S Statistic is 0.131. The critical value for 5% is: 1.36 / 130 = 0.119. Since 0.131 > 0.119, we can reject the fit of the Pareto at a 5% significance level. On the other hand, the critical value for 1% is: 1.63 / 130 = 0.143 > 0.131, so one can not reject the Pareto at the 1% significance level. Mechanically, the K-S Statistic for the Pareto of .131 is bracketed by 0.119 and 0.143. One rejects to the left and does not reject (accepts) to the right. Reject at 5% and do not reject at 1%. The steps of hypothesis testing are: 1. Choose a 5% significance level 2. Formulate a statistical model: The ungrouped data in Section 2 is a random sample of independent draws from a single distribution. 3. Specify the Null Hypothesis: The distribution is a Pareto with α = 2.658 and θ = 518,304.185 The alternative hypothesis is that it is not the above distribution. 4. Select the K-S Statistic (whose behavior is known.) 5. The critical region or rejection region is K-S ≥ 1.36 / 130 = 0.119. 6. The test statistic is computed as 0.131, as stated above. 7. Draw the conclusion, that we reject the null hypothesis; since the computed statistic is in the critical region we reject the fit at 5%.
185
This is a Pareto Distribution fit to the ungrouped data in Section 2 via the Method of Moments.
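The Kolmogorov-Smirnov test of the Method of Moments Pareto, as outlined in the seven steps above, can be sketched as follows. The parameters α = 2.658 and θ = 518,304 are those quoted in the text; since the 130 ungrouped losses of Section 2 are not reproduced here, the sketch simulates a stand-in data set purely so that the code runs.

# Sketch of the K-S test of a Method of Moments Pareto against the data.
import numpy as np
from scipy.stats import kstest

alpha, theta = 2.658, 518_304                        # Pareto parameters quoted in the text
pareto_cdf = lambda x: 1.0 - (theta / (theta + x)) ** alpha

# Stand-in for the 130 observed losses (simulated from the fitted Pareto, for illustration only).
rng = np.random.default_rng(0)
losses = theta * ((1.0 - rng.uniform(size=130)) ** (-1.0 / alpha) - 1.0)

result = kstest(losses, pareto_cdf)                  # Kolmogorov-Smirnov statistic
for level, coefficient in [("5%", 1.36), ("1%", 1.63)]:
    critical = coefficient / np.sqrt(len(losses))    # 0.119 and 0.143 for n = 130
    decision = "reject" if result.statistic > critical else "do not reject"
    print(f"{level}: critical value {critical:.3f}, statistic {result.statistic:.3f}, {decision} H0")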
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 418 If instead in Step 1 one had chosen a 1% significance level, then the critical region would have been K-S ≥ 1.63 / 130 = 0.143. The computed statistic of 0.131 is outside this critical region; thus we do not reject the Null Hypothesis at a 1% significance. We have not proven the Null Hypothesis is true; rather the data doesn't contradict the Null Hypothesis at this significance level of 1%. Fitted Distributions: In this example, the parameters of the Pareto Distribution were estimated by fitting to the ungrouped data in Section 2. When we then compare this fitted Pareto Distribution to this same data, the Pareto Distribution matches the data better than it otherwise would, since fitting determines parameters that produce a distribution that is close to the data. Therefore, the Kolmogorov-Smirnov Statistic is smaller than it would have been if we had instead picked the parameters of the Pareto in advance. We reject the null hypothesis when the Kolmogorov-Smirnov statistic is large. Therefore, if one uses the table of Kolmogorov-Smirnov critical values, one would have a lower probability of rejecting H0 .186 This increases the probability of a Type II error, failing to reject when one should. It decreases the probability of a Type I error, rejecting when one should not. In general, when a distribution is fit to data, and no adjustment is made to the critical values, the probability of a Type II error increases, while the probability of a Type I error decreases, compared to using parameters specified in advance.187 This would be the case for the usual applications of the Kolmogorov-Smirnov and Anderson-Darling tests, to be discussed subsequently. For the Chi-Square test, as discussed previously, we would reduce the degrees of freedom by the number of fitted parameters. This would adjust the critical values for the effect of fitting, and there is no obvious impact on the probabilities of errors compared to using parameters specified in advance.
186
Assuming one did not adjust the table of critical values for the effect of fitting parameters. There is no such adjustment for the K-S Statistic discussed on the syllabus. One might use simulation to estimate this effect. See “Mahlerʼs Guide to Simulation.”
187 See page 448 of Loss Models. See also statement A of 4, 5/05, Q.19.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 419 Fitting to Half the Data and Comparing the Fit to the Other Half of the Data:188 One way around this potential problem with the use of the Kolmogorov-Smirnov test when comparing a fitted distribution is to select at random half of the data. Then fit the distribution to this half of the data. Then compare the fitted distribution to the remainder of the given data set. For example, I selected at random half of the ungrouped data in Section 2:189 400, 2800, 4500, 10400, 14100, 15500, 19400, 22100, 29600, 32200, 32500, 39500, 39900, 41200, 42800, 45900, 49200, 54600, 56700, 59100, 62500, 63600, 66900, 68100, 68900, 72100, 80700, 84500, 91700, 96600, 106800, 113000, 115000, 117100, 126600, 127600, 128000, 131300, 134300, 135800, 146100, 150300, 171800, 183000, 209400, 212900, 225100, 233200, 244900, 253400, 284300, 395800, 437900, 442700, 463000, 469300, 571800, 737700, 766100, 846100, 920300, 981100, 1546800, 2211000, 2229700. Then a Pareto Distribution was fit via maximum likelihood to these 65 values: α = 1.824 and θ = 251,823.190 This fitted Pareto Distribution was then compared to the remaining 65 values from the ungrouped data in Section 2, that had not been used in the fit. The Kolmogorov-Smirnov Statistic was 0.079.191 For 65 data points, the 20% critical value is: 1.07/ 65 = 0.133. Since 0.079 < 0.133, we do not reject this maximum likelihood Pareto at 20%. After performing the hypothesis test, if the fit is acceptable, one would then fit the distribution to the entire original data set. In this case, the maximum likelihood Pareto for the entire data set in Section 2 has α = 1.702 and θ = 240,151, as discussed previously.
188 See page 448 of Loss Models.
189 A random permutation of the numbers 1 to 130 was produced, using the function RandPermutation in Mathematica. Then the first half of this random permutation was used to select the values from the original data set. How to generate random permutations and subsets is discussed in a subsection of Section 4 of “Mahlerʼs Guide to Simulation.” You almost certainly do not need to know how to do so for your exam.
190 When a Pareto was fit via maximum likelihood to all 130 data points, α = 1.702 and θ = 240,151.
191 For the maximum likelihood Pareto fit to all the data and then compared to all of this data, the Kolmogorov-Smirnov Statistic was 0.059. With 1/2 the data, one would have expected a K-S Statistic here of about: (0.059)√2 = 0.083.
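The fit-to-half, test-on-the-other-half procedure just described can also be sketched. The Pareto parameters α = 1.824 and θ = 251,823 are those quoted above for the maximum likelihood fit to the selected half (the fitting itself is not shown); the losses are again a simulated stand-in, since the actual data are not reproduced here.

# Sketch of fitting to a random half of the data and testing the fit on the other half.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(130)
losses = 240_151 * ((1.0 - rng.uniform(size=130)) ** (-1.0 / 1.702) - 1.0)   # stand-in data only

permutation = rng.permutation(len(losses))           # a random permutation of the indices
fit_half = losses[permutation[:65]]                  # fit a Pareto to this half (fit not shown)
test_half = losses[permutation[65:]]                 # compare the fitted Pareto to this half

alpha, theta = 1.824, 251_823                        # maximum likelihood fit quoted in the text
result = kstest(test_half, lambda x: 1.0 - (theta / (theta + x)) ** alpha)

critical_20 = 1.07 / np.sqrt(len(test_half))         # 20% critical value, about 0.133 for 65 points
print("do not reject at 20%" if result.statistic <= critical_20 else "reject at 20%")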
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 420 Problems: 14.1 (2 points) You assume that the mean severity is $4000. An insurer wrote 1000 exposure and observed 85 claims. You observe an average severity of $5000 with an average variance of 35 million. You test the null hypothesis that the severity assumption is adequate. (This means that you will reject the null hypothesis only if the assumptions are too low.) At what significance level do you reject the null hypothesis? (Assume the observed variance is that of the populationʼs size of loss distribution. Use the Normal Approximation.) A. 10% B. 5% C. 2.5% D. 1% E. 0.5% 14.2 (2 points) Let H0 be the hypothesis that a particular set of claims are drawn from a Pareto distribution with parameters α = 2 and θ = 1 million. Let H1 be the hypothesis that this set of claims are drawn from a Pareto distribution with parameters α < 2 and θ = 1 million. You then observe 5 claims all of which are greater than 1 million. You reject the hypothesis H0 at which of the following levels of significance? A. 10%
B. 5%        C. 1%        D. 0.1%        E. 0.01%
14.3 (2 points) One has fit both a Gamma and an Exponential Distribution to the same data via maximum likelihood. The loglikelihood for the Gamma is: -1542.1. The loglikelihood for the Exponential is: -1545.8. Perform a likelihood ratio test using the form and all of the terminology of hypothesis testing. 14.4 (2 points) Let H0 be the hypothesis that a particular set of claims are drawn from a distribution F. Let H1 be the hypothesis that this set of claims are drawn from a distribution with a heavier righthand tail than F. You then observe 3 claims, all of which are greater than x. How large does F(x) have to be, so that you reject the hypothesis H0 at 10% significance? A. 30%
B. 54%        C. 73%        D. 90%        E. 97%
14.5 (1 point) Which of the following statements about hypothesis testing is false? A. The p-value is the probability given H0 is true, that the test statistic takes on a value equal to its calculated value or a value less in agreement with H0 (in the direction of H1 ). B. The p-value is the chance of a Type II error. C. If the p-value is less than the chosen significance level, then we reject H0 . D. A p-value of less than 1% for a Chi-Square test of a loss distribution, indicates strong support for the hypothesis that the sample did not come from this loss distribution. E. None of the above statements is false.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 421 14.6 (2 points) Let H0 be the hypothesis that a particular set of claims are drawn from a distribution F. Let H1 be the hypothesis that this set of claims are drawn from a distribution with a heavier righthand tail than F. You then observe 3 claims, the maximum of which is greater than x. How large does F(x) have to be, so that you reject the hypothesis H0 at 10% significance? A. 30%
B. 54%        C. 73%        D. 90%        E. 97%
14.7 (2 points) You are given the following: • Loss sizes follow an Exponential distribution with mean θ. • The null hypothesis, H0 : θ ≥ 1000, is tested against the alternative hypothesis, H1 : θ < 1000. • 100 losses that sum to 83,000 were observed. Determine the p-value of this test. Use the Normal Approximation. A. 1/2% B. 1% C. 5% D. 10%
E. 20%
14.8 (2 points) You assume that the frequency is given by a Poisson Distribution with a mean of 0.07. An insurer wrote 1000 exposure and observed 85 claims. You test the null hypothesis that the frequency assumption is adequate. (This means that you will reject the null hypothesis only if the assumptions are too low.) At what significance level do you reject the null hypothesis? (Use the Normal Approximation.) A. 10% B. 5% C. 2.5% D. 1% E. None of A, B, C, or D. 14.9 (2, 5/83, Q.10) (1.5 points) Let X1 , X2 , . . . , X16 be a random sample from a normal distribution with mean µ and variance 16. In testing the null hypothesis H0 : µ = 0 against the alternative hypothesis H1 : µ = 1, the critical region is X > k. If the significance level (size) of the test is 0.03, then the respective values of k and the probability of a Type II error for µ = 1 are: A. 0.48, 0.02
B. 0.48, 0.97        C. 1.88, 0.19        D. 1.88, 0.81        E. 1.88, 0.97
14.10 (2, 5/85, Q.22) (1.5 points) Let p represent the proportion of defectives in a manufacturing process. To test H0 : p ≤ 1/4 versus H1 : p > 1/4, a random sample of size 5 is taken from the process. If the number of defectives is 4 or more, the null hypothesis is rejected. What is the probability of rejecting H0 if p = 1/5? A. 6/3125
B. 4/625        C. 21/3125        D. 3104/3125        E. 621/625
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 422 14.11 (2, 5/85, Q.23) (1.5 points) Let X have the density function f(x) = (θ + 1)xθ for 0 < x < 1. The hypothesis H0 : θ = 1 is to be rejected in favor of H1 : θ = 2 if X > 0.90. What is the probability of a Type I error? A. 0.050 B. 0.095 C. 0.190 D. 0.810
E. 0.905
14.12 (2, 5/85, Q.42) (1.5 points) A researcher wants to test H0 : θ = 0 versus H1 : θ = 1, where θ is a parameter of a population of interest. The statistic W, based on a random sample of the population, is used to test the hypothesis. Suppose that under H0 , W has a normal distribution with mean 0 and variance 1, and under H1 , W has a normal distribution with mean 4 and variance 1. If H0 is rejected when W > 1.50, then what are the probabilities of a Type I or Type II error respectively? A. 0.07 and 0.01 B. 0.07 and 0.99 C. 0.31 and 0.01 D. 0.31 and 0.99 E. 0.93 and 0.99 14.13 (4, 5/87, Q.50) (1 point) Which of the following are true regarding hypothesis tests? 1. The test statistic has a probability of α of falling in the critical region when H0 is true, where α is the level of significance. 2. One should reject the H0 when the test statistic falls outside of the critical region. 3. The fact that the test criteria is not significant proves that the null hypothesis is true. A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3 14.14 (2, 5/88, Q.20) (1.5 points) A single observation X from the distribution with density function α f(x) = , for x > 1 is used to test the null hypothesis H0 : α = 2 against the alternative (x + 1)α + 1 H1 : α = 4. Let H0 be rejected if X < k for some k. If the probability of a Type I error is 3/4, what is the probability of a Type II error? A. 1/16 B. 1/4 C. 7/16 D. 1/2 E. 15/16 14.15 (2, 5/88, Q.45) (1.5 points) One hundred random observations are taken from a Normal Distribution, with mean µ and variance 4. To test H0 : µ = 3 versus H1 : µ > 3, a critical region of the form X > c is to be used. What is the value of c such that the probability of a Type I error is 0.10? A. 3.17 B. 3.26 C. 3.33 D. 3.51 E. 5.56
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 423 14.16 (4, 5/89, Q.52) (3 points) An insurer calculates its premium rates for a certain line of insurance assuming a Poisson claim frequency with a mean = 0.01 and an expected claim severity of $1,500. The insurer wrote 10,000 of these policies and observed 115 claims with an average severity of $2,000 and an average standard deviation of $2,000. Using a 5% level of significance, test the null hypothesis that the underlying assumptions are adequate. (This means that the insured will reject the null hypothesis only if his assumptions are too low.) (Note: Assume the sample standard deviation matches the population's claim severity standard deviation. Use the normal approximation.) A. Do not reject underlying frequency assumption, do not reject underlying severity assumption B. Do not reject underlying frequency assumption, reject underlying severity assumption C. Reject underlying frequency assumption, do not reject underlying severity assumption D. Reject underlying frequency assumption, reject underlying severity assumption E. Cannot be determined 14.17 (2, 5/90, Q.17) (1.7 points) Let X1 , X2 be a random sample from a Poisson distribution with mean θ. The null hypothesis H0 : θ = 5 is to be tested against the alternative hypothesis H1 : θ ≠ 5 using the test statistic X = (X1 + X2 )/2. What is the probability of a Type I error if the critical region is | X - 5| ≥ 4? 8
A. 1 - Σ (y = 2 to 8) e^-5 5^y / y!
B. 1 - Σ (y = 1 to 9) e^-5 5^y / y!
C. 1 - Σ (y = 2 to 8) e^-10 10^y / y!
D. 1 - Σ (y = 0 to 17) e^-10 10^y / y!
E. 1 - Σ (y = 3 to 17) e^-10 10^y / y!
14.18 (2, 5/90, Q.29) (1.7 points) Let X1 , X2 ,. . . . , Xn be a random sample from a normal distribution with mean µ and variance 50. The null hypothesis H0 : µ = 10 is to be tested against the alternative hypothesis H1 : µ = 15 using the critical region X ≥ 13.75. What is the smallest sample size required to ensure that the probability of a Type II error is less than or equal to 0.31? A. 2 B. 4 C. 5 D. 8 E. 20
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 424 14.19 (2, 5/92, Q.20) (1.7 points) Let p be the probability of success of a Bernoulli trial, and let X be the number of successes in 4 trials. In testing the null hypothesis H0 : p = 0.50 against the alternative hypothesis H1 : p = 0.25, the critical region is X ≤ 1. What is the probability of a Type II error? A. 27/128 B. 67/256 C. 5/16
D. 11/16        E. 189/256
14.20 (2, 5/92, Q.22) (1.7 points) Let X1 , X2 be a random sample from a distribution with density function f(x) = θxθ−1 for 0 < x < 1, where θ > 0. The null hypothesis H0 : θ = 3 is tested against the alternative hypothesis H1 : θ = 2 using the statistic Y = max(X1 , X2 ). If the critical region is {Y: Y < 1/2}, then what is the probability of a Type I error? A. 1/64 B. 1/20 C. 1/4 D. 3/4 E. 63/64 14.21 (1 point) In the previous question, what is the probability of a Type II error? 14.22 (2, 2/96, Q.16) (1.7 points) Let X be a single observation from a continuous distribution with density f(x) = exp[-|x-θ|]/2, for -∞ < x < ∞. The null hypothesis H0 : θ = 0 is tested against the alternative hypothesis H1 : θ = 1. The null hypothesis is rejected if X > k. The probability of a Type I error is 0.05. Calculate the probability of a Type II error. A. 0.0184 B. 0.1359 C. 0.8641 D. 0.9500 E. .9816 14.23 (2, 2/96, Q.19) (1.7 points) Let X1 ,..., Xn and Y1 ,..., Yn be independent random samples from normal distributions with means µX and µY and variances 2 and 4, respectively. The null hypothesis H0 : µX = µY is rejected in favor of the alternate hypothesis H1 : µX > µY if X - Y > k. Determine the smallest value of n for which a test of significance level (size) 0.025 has power of at least 0.5 when µX = µY + 2. A. 3
B. 4        C. 5        D. 6        E. 8
14.24 (2, 2/96, Q.20) (1.7 points) Five hypotheses are to be tested using five independent test statistics. A common significance level (size) for each test is desired which ensures that the probability of rejecting at least one hypothesis is 0.4, when all five hypotheses are true, Determine the desired common significance level (size). A. 0.040 B. 0.080 C. 0.097 D. 0.167 E. 0.400
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 425 14.25 (4B, 5/96, Q.31) (2 points) You are given the following: • A portfolio consists of 100 identical and independent risks.
• The number of claims per year for each risk follows a Poisson distribution with mean θ. • You wish to test the null hypothesis H0 : θ = 0.01 against the alternative hypothesis H1 : θ > 0.01.
• The null hypothesis will be rejected if the number of claims for the entire portfolio in the latest year is greater than or equal to 3. Without using a normal approximation, determine the significance level of this test. A. Less than 0.01 B. At least 0.01, but less than 0.05 C. At least 0.05, but less than 0.10 D. At least 0.10, but less than 0.20 E. At least 0.20 14.26 (4B, 5/97, Q.29) (3 points) You are given the following: • A portfolio of independent risks is divided into two classes.
• The number of claims per year for each risk follows a Poisson distribution with mean θ, where θ may vary by class, but does not vary within each class.
• The observed number of claims for the latest year has been recorded as follows:
   Class        Number of Risks        Number of Claims
   1            100                    4
   2            25                     0
• For each class individually, you wish to test the null hypothesis H0 : θ = 0.10 against the alternative hypothesis H1 : θ < 0.10.
Determine which of the following statements is true. A. H0 will be rejected at the 0.01 significance level for both classes. B. H0 will be rejected at the 0.05 significance level for both classes, but will be not be rejected at the 0.01 level for both classes. C. H0 will be rejected at the 0.05 significance level for Class 1, but will not be rejected at the 0.05 level for Class 2. D. H0 will be rejected at the 0.05 significance level for Class 2, but not be rejected at the 0.05 level for Class 1. E. H0 will not be rejected at the 0.05 significance level for both classes.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 426 14.27 (4B, 11/98, Q.7) (2 points) You are given the following: • Claim sizes follow a Pareto distribution, with parameters α (unknown) and θ = 10,000. • The null hypothesis, H0 : α = 0.5, is tested against the alternative hypothesis, H1 : α < 0.5. • One claim of 9,600,000 is observed. Determine which of the following statements is true. A. H0 will be rejected at the 0.01 significance level. B. H0 will be rejected at the 0.02 significance level, but not at the 0.01 level. C. H0 will be rejected at the 0.05 significance level, but not at the 0.02 level. D. H0 will be rejected at the 0.10 significance level, but not at the 0.05 level. E. H0 will not be rejected at the 0.10 significance level. 14.28 (4B, 11/99, Q.15) (2 points) You are given the following: • The annual number of claims follows a Poisson distribution with mean λ . • The null hypothesis, H0 : λ = m, is to be tested against the alternative hypothesis, H1 : λ < m, based on one year of data. • The significance level must not be greater than 0.05. Determine the smallest value of m for which the critical region could be nonempty. A. Less than 0.5 B. At least 0.5, but less than 1.5 C. At least 1.5, but less than 2.5 D. At least 2.5, but less than 3.5 E. At least 3.5 14.29 (4B, 11/99, Q.30) (1 point) You wish to test the hypothesis that a set of data arises from a given parametric distribution with given parameters. (Thus, no parameters are estimated from the data.) Which of the following statements is true? A. The value of the Chi-Square statistic depends on the endpoints of the chosen classes. B. The value of the Chi-Square statistic depends on the number of parameters of the distribution. C. The value of the Kolmogorov-Smirnov statistic depends on the endpoints of the chosen classes. D. The value of the Kolmogorov-Smirnov statistic depends on the number of parameters of the distribution. E. None of the above statements is true.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 427 14.30 (CAS3, 5/05, Q.24) (2.5 points) Which of the following statements about hypothesis testing are true? 1. A Type I error occurs if H0 is rejected when it is true. 2. A Type II error occurs if H0 is rejected when it is true. 3. Type I errors are always worse than Type II errors. A. 1 only B. 2 only C. 3 only D. 1 and 3 only
E. 2 and 3 only
14.31 (CAS3, 11/07, Q.7) (2.5 points) You are given the following information on a random sample:
• Y = X1 +...+ Xn where the sample size, n, is equal to 25 and the random variables are independent and identically distributed
• Xi has a Poisson distribution with parameter λ • H0 : λ = 0.1 • H1 : λ < 0.1 • The critical region to reject H0 is Y ≤ 3 Calculate the significance level of the test. A. Less than .50 B. At least .50, but less than .60 C. At least .60, but less than .70 D. At least .70, but less than .80 E. At least .80
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 428 Solutions to Problems: 14.1. A. The variance of the mean severity for 85 claims is: 35 million/85 = 411,765. The chance that we would observe an average severity of 5000 or more: 1 -Φ[(5000-4000)/ 411,765 ] = 1 - Φ(1.56) = 1 - 0.941 = 0.059 < 0.10. Thus we reject at 10%. Since .059 > .05, we do not reject at 5%. Alternately, the total observed losses are: (85)(5000) = 425,000. The expected losses for 85 claims are: (85)(4000) = 340,000. The variance of the sum of 85 claims is: (85)(35 million) = 2975 million. Thus the chance of observing 425,000 or more of losses is: 1 - Φ[(425,000 - 340,000)/ 2975 million ] = 1 - Φ(1.56) = 1 - 0.941 = 0.059 < 0.10. Thus we reject at 10%, but not at 5%. 14.2. D. If H0 is true, then the probability that a claim exceeds $1 million = {θ/(θ+x)}α = (1/2)2 = .25. Thus the probability that five claims all exceed $1 million is (.25)5 = .098%. We can reject at a 0.1% significance level since there is less than a 0.1% chance of rejecting the null hypothesis H0 when it is in fact true. We can not reject at a .01% level since .098% > .01%. Comment: The p-value is .098%. 14.3. H0 is that the data is a random draw from the simpler Exponential Distribution, a special case of the Gamma Distribution. H1 is that the data is a random draw from the Gamma Distribution, a generalization of the Exponential Distribution with one more parameter α. The test statistic is twice the difference in the loglikelihoods: (2){-1542.1 - (-1545.8)} = 7.4. The test statistic follows a Chi-Square distribution with 1 degree of freedom, the difference in the number of fitted parameters for the Exponential and Gamma. For significance levels of 5%, 2.5%, 1%, and 1/2% the critical values are: 3.84, 5.02, 6.64, and 7.88. Since 7.4 > 6.64, reject H0 in favor of H1 at a 1% significance level. (At a 1% level, the rejection or critical region is χ2 > 6.64.) Since 7.4 ≤ 7.88, do not reject H0 in favor of H1 at a 1/2% significance level. (At a 1/2% level, the rejection or critical region is χ2 > 7.88.) The p-value of the test is between 1/2% and 1%. 14.4. B. Prob[observation | H0 true] = S(x)3 . We would reject H0 at 10%, if this probability were less than 10%. S(x)3 < 0.1. ⇒ S(x) < 0.464. ⇔ F(x) > 0.536.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 429 14.5. B. The p-value = Prob[Type I error] = Prob[rejecting H0 when it is true]. ⇒ Statement B is false. In general, a small p-value indicates a bad fit. Loss Models states that a p-value of less than 1% indicates strong support for H1 , the alternative hypothesis that the data sample did not come from this loss distribution. ⇒ Statement D is true. 14.6. E. Assuming the data is drawn from F, the distribution function of the maximum is F(x)3 . Prob[observation | H0 true] = 1 - F(x)3 . We would reject H0 at 10%, if this probability were less than 10%. 1 - F(x)3 < 0.1. ⇒ F(x)3 > .9. ⇔ F(x) > 0.965. 14.7. C. The sum of 100 independent, identically distributed Exponentials has mean 100θ and variance 100θ2. Assuming θ = 1000, Prob[sum ≤ 83,000] ≅ Φ[(83000 - 100000)/10000] = Φ[-1.7] = 4.46%. Comment: Similar to Exercise 12.13 in Loss Models. We work with the value of θ in H0 closest to H1 . If θ > 1000, Prob[sum ≤ 83,000] is smaller than 4.46%. For example, if θ = 1100, Prob[sum ≤ 83,000] ≅ Φ[
(83,000 - 110,000) / 11,000] = Φ[-2.45] = 0.71%.
A graph of Prob[sum ≤ 83,000], as a function of theta:
[Graph omitted: Prob[sum ≤ 83,000] on the vertical axis (0.01 to 0.04) versus theta on the horizontal axis (1000 to 1300); the probability declines as theta increases.]
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 430 14.8. B. The expected number of claims is: (1000)(.07) = 70. The variance of the number of claims is 70. The standard deviation is:
√70 = 8.37. The chance that we would observe 85 or more claims
from the assumed Poisson is: 1 - Φ((84.5-70)/8.37) = 1 - Φ(1.73) = 0.0418. 0.05 > 0.0418 > 0.025 ⇒ we reject at 5%, and do not reject at 2.5%. 14.9. D. X is Normal with mean µ and variance: 16/16 = 1 The probability of rejecting if µ = 0 is: 1 - Φ(k) = .03. ⇒ k = 1.88. Prob[Type II error] = Prob[not rejecting | µ = 1] = Φ(1.88 - 1) = Φ(.88) = 0.81. 14.10. C. Prob[4 or more defectives | p = 1/5] = (5)(1/5)4 (4/5) + (1/5)5 = 21/3125. 14.11. C. f(x) = (θ + 1)xθ. F(x) = xθ+1. Prob[Type I error] = Prob[reject | H0 ] = Prob[X > 0.90 | θ = 1] = 1 - 0.92 = 0.19. 14.12. A. Prob[Type I Error] = Prob[Rejecting H0 | H0 is true] = 1 - Φ(1.5/1) = 0.0668. Prob[Type II Error] = Prob[Not Rejecting H0 | H1 is true] = Φ((1.5-4)/1) = Φ(-2.5) = 0.0062. 14.13. A. 1. True. This is the definition of the significance level. For example, for a Chi-Square Test, a 1% significance means that there is a 1% chance that χ2 > critical value. 2. False. One should not reject the null hypothesis (at the given level of significance) when the test statistic falls outside of the critical region. 3. False. The fact that the test criteria is not significant merely tells us that the data do not contradict the null hypothesis, rather than proving that H0 is true. 14.14. A. f(x) = αx−(1+α), for x > 1. F(x) = 1 - x−α, for x > 1. 3/4 = Prob[Type I Error] = Prob[reject when should not] = Prob[X < k | H0 ] = Prob[X < k | α = 2] = 1 - k-2. ⇒ 1/4 = 1/k2 . ⇒ k = 2. Prob[Type II Error] = Prob[do not reject when should] = Prob[X ≥ 2 | H1 ] = Prob[X ≥ 2 | α = 4] = 2-4 = 1/16. Comment: A Single Parameter Pareto Distribution, with θ = 1. 14.15. B. 0.10 = Prob[Type I error] = Prob[Reject when H0 true] = Prob[ X > c | µ = 3] = 1 - Φ[(c- 3)/ 4 / 100 ]. ⇒ .10 = Φ((c- 3)/.2) ⇒ (c - 3)/.2 = 1.282. ⇒ c = 3.256.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 431 14.16. B. Mean number of claims is (10000)(.01) = 100. For 10000 independent policies each with a variance of .01, the variance is (10000)(.01) = 100. Thus the chance of 115 claims or more is approximately: 1- Φ((114.5 - 100)/ 100 ) = 1 - Φ(1.45) = 1- 0.9265 = 7.35%. Since 7.35% > 5%, we can not reject the frequency hypothesis at 5%. If the standard deviation is $2000 for a single claim, then for the average of 115 claims it is 2000 / 115 = 186.5. The expected mean severity is 1500. Prob[observed severity ≥ 2000] ≅ 1- Φ( (2000 - 1500)/186.5 ) = 1 - Φ(2.68) = 1- .9963 = 0.37%. Since .37% < 5%, we can reject the severity hypothesis at 5%. Comment: Note the use of the continuity correction in approximating the discrete frequency distribution by a Normal, even though using 114.5 rather than 115 makes no difference to the solution in this case. 14.17. E. If H0 is true, X1 + X2 is Poisson with mean 10. | X - 5| ≥ 4 ⇔ X ≤ 1 or X ≥ 9 ⇔ X1 + X2 ≤ 2 or X1 + X2 ≥ 18. 17
Prob[reject when H0 true] = 1 - Prob[2 < X1 + X2 < 18] = 1 - Σ (y = 3 to 17) e^-10 10^y / y!.
14.18. D. Probability of a Type II Error is: Prob[failing to reject | H1 true] = Prob[ X < 13.75 | H1 true] = Φ[(13.75 - 15)/ 50 / n ] = Φ(-.1768 n ). Set this probability equal to .31: Φ(-0.1768 n ) = .31. ⇒ -0.1768 n = -0.50. ⇒ n = 8. Thus the smallest possible n is 8. Comment: For n = 8, Probability of a Type II Error is: Φ[(13.75 - 15)/ 50 / 8 ] = Φ[-0.5] = 0.3085. 14.19. B. Prob[Type II error] = Prob[failing to reject when H0 false] = Prob[X > 1 | H1 ] = 1 - Prob[X = 0 or X = 1 | p = 0.25] = 1 - .754 - (4)(.25)(.753 ) = 67/256. 14.20. A. f(x) = θxθ−1. F(x) = xθ. Prob[Type I error] = Prob[rejecting H0 when H0 is true] = Prob[max(X1 , X2 ) < 1/2 | θ = 3] = Prob[X1 < 1/2 | θ = 3]Prob[X2 < 1/2 | θ = 3] = (1/2)3 (1/2)3 = 1/64.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 432 14.21. Prob[Type II error] = Prob[not rejecting H0 | H1 ] = 1 - Prob[max(X1 , X2 ) < 1/2 | θ = 2] = 1 - Prob[X1 < 1/2 | θ = 2] Prob[X2 < 1/2 | θ = 2] = 1 - (1/2)2(1/2)2 = 1 - 1/16 = 0.9375. Comment: Power = Prob[rejecting H0 | H1 ] = 1/16. If the critical region is {Y: Y < c}, then power = c4 and significance = c6. Not a good test! 14.22. C. By integrating f(x) from x to ∞: S(x) = exp[-(x-θ)]/2, x > θ. 0.05 = Prob[Type I error] = Prob[rejecting H0 when it is true] = Prob[X > k | θ = 0] = exp[-k]/2. ⇒ 0.01 = e-k. ⇒ k = 2.303. Prob[Type II error] = Prob[not rejecting H0 | H1 ] = Prob[X < 2.303 | θ = 1] = 1 - exp[-(2.303-1)]/2 = 1 - e-1.303/2 = 0.8641. 14.23. D. X - Y is Normal with mean µX - µY and variance 2/n + 4/n = 6/n. When µX = µY, X - Y has mean 0. When µX = µY + 2, X - Y has mean 2. .025 = Prob[Rejecting H0 when it is true] = Prob[ X - Y > k | µX = µY] = 1 - Φ[k/ 6 / n ].
⇒ k/√(6/n) = 1.96. ⇒ k = 4.801/√n. 0.5 ≤ Prob[Rejecting H0 when µX = µY + 2] = Prob[ X - Y > 4.801/√n | µX = µY + 2] = 1 - Φ[{(4.801/√n) - 2}/√(6/n)].
⇒ {(4.801/√n) - 2}/√(6/n) ≤ 0. ⇒ 4.801/√n ≤ 2. ⇒ n ≥ 5.76. ⇒ n = 6. 14.24. C. For each test independently, α = Prob[Reject when hypothesis is true]. 0.6 = Prob[rejecting none of the hypotheses when they are all true] = (1 - α)5 . ⇒ α = 0.097. 14.25. C. If H0 is true, each individual has a distribution which is Poisson with mean .01. The sum of independent Poisson variables is Poisson; thus the portfolio has a Poisson Distribution with a mean of (100)(.01) = 1. For such a Poisson the chance of x claims is given by: f(x) = e-1 (1)x / x! = e-1 / x! . The chance of 3 or more claims (and thus rejecting the null hypothesis) is: 1 - {f(0) + f(1) + f(2)} = 1 - {e-1 + e-1 + e-1 /2 } = 1 - 2.5e-1 = 1 - 0.920 = 0.080. Comment: The significance level α of the test is the chance of rejecting the null hypothesis when it is in fact true. If one used the Normal Approximation, the chance of 3 or more claims is approximately: 1 - Φ((2.5 -1)/
√1) = 1 - Φ(1.5) = 1 - .9332 = .0668.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 433 14.26. C. Adding up independent Poisson Distributions gives a Poisson Distribution with the sum of the individual parameters. Therefore each class has a Poisson Distribution. If each risk in Class 1 has a mean of θ1, then Class 1 has a mean of 100 θ1. If θ1 = 0.1, then Class 1 has a mean of 10. Thus observing only 4 claims might lead one to believe that θ1 < .1. The chance of observing 4 or fewer claims is for a Poisson Distribution with mean λ: e−λ{1 + λ + λ2/2 + λ3/6 + λ4/24}. If θ1 = .1, then λ = 100θ1 = 10, and the chance of 4 or fewer claims is 644e-10 = .029. Thus for Class 1 we reject H0 at the 0.05 significance level, but do not reject H0 at the 0.01 significance level. If θ2 = .1, then Class 2 has a Poisson with a mean of (25)(.1) = 2.5. Thus observing only 0 claims might lead one to believe that θ2 < .1. The chance of observing 0 (or fewer claims) is for a Poisson Distribution with mean λ: e−λ. If θ2 = .1, then λ = 2.5, and the chance of 0 claims is e-2.5 = 0.082. Thus for Class 2 we do not reject H0 at the 0.05 significance level. 14.27. C. S(9,600,000) = {θ/(θ + 9,600,000)}α = (10000/9,610,000).5 = 3.2%. Thus since 5% > 3.2% > 2%, we reject H0 at 5% and do not reject at 2%. Comments: For smaller α, the Pareto is heavier-tailed and there is more chance of getting a claim as large as 9.6 million or larger. For example, if instead α = 0.3, then S(9,600,000) = 12.7%, and we would not reject at 10% the hypothesis that α = 0.3. The more extreme the observation, the more likely we are to reject the null hypothesis. If for example, the single claim observed had been instead 50 million, then the chance of observing such a large claim if α = 0.5 would be only 1.4%, so we would in that case reject at 2% and do not reject at 1%.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 434 14.28. D. If the observed number of claims is too low, then we will reject the null hypothesis that λ = m in favor of the alternative hypothesis that λ is smaller than m. Assume for example m = 5 and we observe only 1 claim. Then given the null hypothesis the chance that we could have observed 1 or fewer claims is e-5 + 5e-5 = 4.0%. Thus we could reject the null hypothesis at a 5% significance level. In general we would reject the null hypothesis at a 5% significance level if we observe c claims and F(c) ≤ 5%. The critical region consists of those values of c for which we would reject at the given significance level. For example, for m =5, the critical region for a 5% significance level is {0, 1}. Now the critical region will always be empty if we can not reject when we observe zero claims. This will be the case if F(0) = e-m > significance level = α . Thus the critical region is empty if m < -ln(α). If α = .05 then the critical region is empty for m < -ln(.05) = 2.996. If α < .05, then -ln(α) > 2.996. Thus for example, if α = .01, then the critical region is empty for m < -ln(.01) = 4.605. So over all α ≤ .05, the smallest m for which the critical region can be nonempty is m = 2.996 and α = 5%. The corresponding critical region is {0}. 14.29. A. Statement A is true. If one groups the data differently the value of the Chi-Square statistic changes. Statement B is false. The value of the Chi-Square statistic never depends directly on the number of parameters. Statement C is false, since we should not group the data in order to calculate the K-S Statistic. Statement D is false, because the K-S Statistic never depends directly on the number of parameters. Comment: Note that even if the value of the Chi-Square statistic changes, this may or may not effect your conclusion as to whether you do not reject or reject the hypothesis at a given significance level. If one had fit parameters, then the number of degrees of freedom of the Chi-Square statistic would have been reduced by the number of fitted parameters; however, the value of the Chi-Square statistic never depends directly on the number of parameters, whether estimated from data or not. 14.30. A. Statement 1 is true. A Type II error occurs if H0 is not rejected when it is false. ⇒ Statement 2 is false. Depending on the situation being modeled, either type of error can be worse. Comment: From a purely statistical point of view, one wants to avoid both types of errors, and neither is inherently worse. However, for a given sample size, decreasing the probability of one type of error increases the probability of the other type of error.
2013-4-6, Fitting Loss Distributions §14 Hypothesis Testing, HCM 10/14/12, Page 435 14.31. D. The sum of 25 independent Poisson Distributions is another Poisson with 25 times the mean. Significance level of the test = Prob[reject when H0 is true] = Prob[number of claims ≤ 3 from a Poisson with mean (25)(.1) = 2.5] = f(0) + f(1) + f(2) + f(3) = e-2.5(1 + 2.5 + 2.52 /2 + 2.53 /6) = 75.8%. Comment: One would not usually perform a statistical test with a significance level of 76%! Significance levels are usually something like 5% or 1%. Since H1 : λ < 0.1, it makes sense to reject when the observed number of claims is small.
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 436
Section 15, The Schwarz Bayesian Criterion192 The Likelihood Ratio Test can be used when one distribution is a special case (or limit) of the other. Similar tests can be performed even when this is not the case. By fitting more parameters we can increase the maximum loglikelihood, but there is the potential problem of overfitting the data. One can avoid this by penalizing those fits with more parameters. The Schwarz Bayesian Criterion is an example of such a “penalized loglikelihood value”. One adjusts the loglikelihoods by subtracting in each case the penalty: (number of fitted parameters) ln(number of data points) / 2. penalty = (r/2) ln(n) = r ln( n ), where r = number of parameters and n = number of data points. One then compares these penalized loglikelihoods directly; larger is better. Exercise: Both a Pareto (2 parameters) and a Transformed Gamma Distribution (3 parameters) have been fit to the same 200 points via the Method of Maximum Likelihood. The loglikelihoods are -820.09 for the Transformed Gamma and -822.43 for the Pareto. Use the Schwarz Bayesian Criterion to compare the fits. [Solution: The penalized likelihood value for the Transformed Gamma is: -820.09 - (3)ln(200)/2 = -828.04. The penalized likelihood value for the Pareto is: -822.43 - (2)ln(200)/2 = -827.73. Since -827.73 > -828.04, the Pareto is the better fit.] Note that the distribution with more parameters receives a larger penalty to its loglikelihood. This is consistent with the principle of parsimony. We avoid using more parameters unless it is worthwhile. Also the penalties are larger for larger data sets; the increase is as per the log of the size of the data set.193 The improvement in the fit from adding an extra parameter has to be larger in order to be worthwhile in the case of a larger data set.
192
See page 461 of Loss Models. “Estimating the Dimension of a Model,” by Gideon Schwarz, The Annals of Statistics, 1978, Vol. 6, No.2. 193 The likelihood ratio test does not use the size of the data set, although since it relies on an asymptotic result the likelihood ratio test should not be applied to very small data sets.
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 437 Testing Other Hypothesis: As with the likelihood ratio test, one can use the Schwarz Bayesian Criterion in order to test other hypotheses. For example, one can test hypotheses involving restrictions on the relationships of the parameters of the distributions of two related data sets, such as in the following previously discussed example. Phil and Sylvia are competitors in the light bulb business. You were able to test 20 of Philʼs bulbs and 10 of Sylviaʼs bulbs: Number of Bulbs Total Lifetime Average Lifetime Phil 20 20,000 1000 Sylvia 10 15,000 1500 You assume that the distribution of the lifetime (in hours) of a light bulb is Exponential. Using maximum likelihood, separately estimating θ for Philʼs Exponential Distribution, θ = 1000 with corresponding maximum loglikelihood: -20000/1000 - 20ln(1000) = -158.155. Separately estimating θ for Sylviaʼs Exponential Distribution, θ = 1500. The corresponding maximum loglikelihood is: -15000/1500 - 10ln(1500) = -83.132. Using maximum likelihood applied to all the data, estimating θ for Philʼs Exponential Distribution restricted by Sylviaʼs claim that her light bulbs burn twice as long as Philʼs, θP = (20000 + 15000/2)/(20 + 10) = 917. θS = 2θP = 1834. The maximum loglikelihood is: -20000/917 - 20ln(917) - 15000/1834 - 10ln(1834) = -241.554. The unrestricted maximum loglikelihood is: -158.155 - 83.132 = -241.287, somewhat better than the restricted maximum likelihood of -241.554. Let the null hypothesis H0 be that Sylviaʼs light bulbs burn twice as long as Philʼs. Let the alternative H1 be that H0 is not true. Then we can use the Schwarz Bayesian Criterion as follows. The penalty is: (number of fitted parameters) ln(number of data points)/2 = (number of fitted parameters)ln(30)/2 = (number of fitted parameters)(1.701).
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 438 For the unrestricted model we have two parameters, and the penalized loglikelihood is: -241.287 - (2)(1.701) = -244.689. For the restricted model we have one parameters, and the penalized loglikelihood is: -241.554 - (1)(1.701) = -243.255. Since -243.255 > -244.689, the unrestricted model is not significantly better than the restricted model, and we do not reject H0 . One reason we did not reject Sylviaʼs claim was due to the small sample size. Exercise: Redo the above example with the following data: Number of Bulbs Total Lifetime Average Lifetime Phil 2000 2,000,000 1000 Sylvia 1000 1,500,000 1500 [Solution: Separate estimate of θ for Philʼs Exponential Distribution, θ = 1000. The corresponding maximum loglikelihood is: -2000000/1000 - 2000ln(1000) = -15815.51. Separate estimate of θ for Sylviaʼs Exponential Distribution, θ = 1500. The corresponding maximum loglikelihood is: -1500000/1500 - 1000ln(1500) = -8313.22. Restricted by Sylviaʼs claim, θP = (2000000 + 1500000/2)/(2000 + 1000) = 917. θS = 2θP = 1834. The maximum loglikelihood is: -2000000/917 - 2000ln(917) - 1500000/1834 - 1000ln(1834) = -24155.38. Unrestricted loglikelihood is: -15815.51 - 8313.22 = -24128.73. Unrestricted penalized loglikelihood is: -24128.73 - (2)ln(3000)/2 = -24136.74. Restricted penalized loglikelihood is: -24155.38 - (1)ln(3000)/2 = -24159.38. -24136.74 > -24159.38, the unrestricted model is significantly better, and we reject H0 .]
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 439 Problems: 15.1 (2 points) Two distributions have been fit via maximum likelihood to the same data. One of the distributions is a special case of the other, with one fewer parameter. If the result of applying the likelihood ratio test at a 1/2% significance level and the Schwarz Bayesian Criterion are the same, regardless of the specific values of the loglikelihoods, then how many points are in the data set? A. 1500 B. 2000 C. 2500 D. 3000 E. 3500 Use the following information for the next 2 questions: • Various distributions have each been fit to the same set of 300 data points via the method of Maximum Likelihood. • The loglikelihoods are as follows: Distribution Number of Parameters Loglikelihood Transformed Beta 4 -2582 Transformed Gamma 3 -2583 Generalized Pareto 3 -2585 Inverse Gaussian 2 -2589 LogNormal 2 -2590 Exponential 1 -2592 15.2 (2 points) Based on the Schwarz Bayesian Criterion penalized likelihood value discussed in Loss Models, rank the following three models from best to worst: Transformed Beta, Transformed Gamma, Inverse Gaussian. A. Transformed Beta, Transformed Gamma, Inverse Gaussian B. Transformed Beta, Inverse Gaussian, Transformed Gamma C. Transformed Gamma, Transformed Beta, Inverse Gaussian D. Inverse Gaussian, Transformed Beta, Transformed Gamma E. None of the above. 15.3 (2 points) Based on the Schwarz Bayesian Criterion penalized likelihood value discussed in Loss Models, rank the following three models from best to worst: Generalized Pareto, LogNormal, Exponential. A. Generalized Pareto, LogNormal, Exponential B. Generalized Pareto, Exponential, LogNormal C. LogNormal, Generalized Pareto, Exponential D. Exponential, Generalized Pareto, LogNormal E. None of the above.
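The restricted and unrestricted Exponential fits in the light bulb example can be checked directly, since the Exponential loglikelihood is -(sum of the data)/θ - n ln(θ), and the restricted estimate has the closed form used above. A sketch for the larger data set of 2000 and 1000 bulbs:

# Restricted vs. unrestricted Exponential fits for the light bulb example,
# compared via Schwarz Bayesian Criterion penalized loglikelihoods.
from math import log

n_phil, total_phil = 2000, 2_000_000
n_sylvia, total_sylvia = 1000, 1_500_000

def exponential_loglikelihood(total, n, theta):
    return -total / theta - n * log(theta)

# Unrestricted: each theta estimated separately; the maximum likelihood estimate is the sample mean.
unrestricted = (exponential_loglikelihood(total_phil, n_phil, total_phil / n_phil)
                + exponential_loglikelihood(total_sylvia, n_sylvia, total_sylvia / n_sylvia))

# Restricted by H0 (Sylvia's bulbs last twice as long as Phil's): the closed form used in the text.
theta_phil = (total_phil + total_sylvia / 2) / (n_phil + n_sylvia)
restricted = (exponential_loglikelihood(total_phil, n_phil, theta_phil)
              + exponential_loglikelihood(total_sylvia, n_sylvia, 2 * theta_phil))

n = n_phil + n_sylvia
print(unrestricted - 2 * log(n) / 2)    # about -24,136.7 with two fitted parameters
print(restricted - 1 * log(n) / 2)      # about -24,160 with one; smaller, so H0 is rejected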
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 440 15.4 (2 points) A two-point mixture of Inverse Gaussian Distributions (5 parameters) and an Inverse Transformed Gamma (3 parameters) are each fit via maximum likelihood to the same 50 values. The likelihood for the fitted two-point mixture of Inverse Gaussian Distributions is 2.1538 x 10-169. The likelihood for the fitted Inverse Transformed Gamma is 8.4492 x 10-171. Based on the Schwarz Bayesian Criterion, which of these two is the better fit? 15.5 (4 points) You are given the following data on the number of claims during a year on 421,240 motor vehicle insurance policies. Number of Claims per Policy Number of Policies 0 370,412 1 46,545 2 3,935 3 317 4 28 5 3 6 or more 0 A mixture of two Poissons is fit to this data with result: λ 1 = 0.103, λ2 = 0.366, and the weight to the first Poisson is p = 0.89, with corresponding loglikelihood of -171,133.5. Use the Schwarz Bayesian Criterion to compare this fit to that of the maximum likelihood Geometric Distribution. 15.6 (3 points) You have the following data from the state of West Carolina: Region Number of Claims Aggregate Losses Average Size of Loss Rural 5000 500,000 100 Urban 10,000 1,250,000 125 You assume that the distribution of sizes of claims is exponential. Based on data from other states, you assume that the mean claim size for Urban insureds is 1.2 times that for Rural insureds. Let H0 be that the mean claim size in West Carolina for Urban is 1.2 times that for Rural. Using the Schwarz Bayesian Criterion, test the hypothesis H0 .
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 441 15.7 (3 points) Use the following information: • Various frequency distributions have each been fit to the same set of 500 data points via the method of Maximum Likelihood. • The loglikelihoods are as follows: Distribution Number of Parameters Loglikelihood Poisson 1 -932.11 Negative Binomial 2 -928.80 Compound Poisson-Binomial 3 -926.77 Compound Negative Binomial-Binomial 4 -924.43 2 Point Mixture of Binomial and Negative Binomial 5 -919.62 Based on the Schwarz Bayesian Criterion penalized loglikelihood value, which is the best model? A. Poisson B. Negative Binomial C. Compound Poisson-Binomial D. Compound Negative Binomial-Binomial E. Two Point Mixture of Binomial and Negative Binomial 15.8 (4, 11/00, Q.10) (2.5 points) You are given: (i) Sample size = 100 (ii) The negative loglikelihoods associated with five models are: Model Number Of Parameters Negative Loglikelihood Generalized Pareto 3 219.1 Burr 3 219.2 Pareto 2 221.2 Lognormal 2 221.4 Inverse Exponential 1 224.2 (iii) The form of the penalty function is r ln(n)/2. Which of the following is the best model, using the Schwarz Bayesian Criterion? (A) Generalized Pareto (B) Burr (C) Pareto (D) Lognormal (E) Inverse Exponential Comment: The original question has been rewritten in order to match the current syllabus. 15.9 (4, 11/06, Q.22 & 2009 Sample Q.266) (2.9 points) Five models are fitted to a sample of n = 260 observations with the following results: Model Number of Parameters Loglikelihood I 1 - 414 II 2 - 412 III 3 - 411 IV 4 - 409 V 6 - 409 Determine the model favored by the Schwarz Bayesian criterion. (A) I (B) II (C) III (D) IV (E) V
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 442 Solutions to Problems: 15.1. C. Let d = difference in loglikelihoods. Using the likelihood ratio test, one consults the ChiSquare Table for one degree of freedom and rejects the simpler distribution at 1/2% if and only if 2d > 7.88 ⇔ d > 3.94. Using the Schwarz Bayesian Criterion, with one extra parameter the penalized loglikelihood for the more complicated distribution is better if and only if d > ln(n)/2. The two results are always the same if 3.94 = ln(n)/2 ⇒ n = e7.88 = 2644. 15.2. C. One adjusts the loglikelihoods by subtracting in each case: (number of fitted parameters) ln(number of data points)/2. In this case with 300 points, ln(number of data points)/2 = ln(300)/2 = 2.852. The penalized likelihood values are: Transformed Beta: -2582 -(4)(2.852) = -2593.4. Transformed Gamma: -2583 -(3)(2.852) = -2591.6. Inverse Gaussian: -2589 -(2)(2.852) = -2594.7. Thus the models from best to worst are: Transformed Gamma, Transformed Beta, Inverse Gaussian. 15.3. B. One adjusts the loglikelihoods by subtracting in each case: (number of fitted parameters) ln(number of data points)/2. In this case with 300 points, ln(number of data points)/2 = ln(300)/2 = 2.852. The penalized likelihood values are: Generalized Pareto: -2585 - (3)(2.852) = -2593.6. LogNormal: -2590 - (2)(2.852) = -2595.7. Exponential: -2592 - (1)(2.852) = -2594.9. Thus the models from best to worst are: Generalized Pareto, Exponential, LogNormal. 15.4. The loglikelihood for the two-point mixture of Inverse Gaussian Distributions is: ln(2.1538 x 10-169) = ln(2.1538) - 169 ln(10) = -388.37. Penalized loglikelihood is: -388.37 - (5/2)ln(50) = -398.15. The loglikelihood for the Inverse Transformed Gamma is: ln(8.4492 x 10-171) = ln(8.4492) - 171 ln(10) = -391.61. Penalized loglikelihood is: -391.61 - (3/2)ln(50) = -397.48. -397.48 > -398.15. Thus based on the Schwarz Bayesian Criterion, the Inverse Transformed Gamma is better. Comment: Based on “Efficient Stochastic Modeling”, by Yvonne C. Chueh, in Contingencies, January/February 2005.
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 443
15.5. For the Geometric Distribution, the method of maximum likelihood is equal to the method of moments.
β = X = {(0)(370412) + (1)(46545) + (2)(3935) + (3)(317) + (4)(28) + (5)(3)}/421240 = 0.13174.
f(0) = 1/(1+β). ln f(0) = -ln(1+β) = -ln(1.13174) = -0.123756.
f(x+1) = f(x)β/(1+β). ln f(x+1) = ln f(x) + ln(0.13174/1.13174) = ln f(x) - 2.150681.
Loglikelihood is: (370412)(-0.123756) + (46545)(-2.274437) + (3935)(-4.425118) + (317)(-6.575799) + (28)(-8.726480) + (3)(-10.877161) = -171,478.7.
The penalty for the Schwarz Bayesian Criterion is: (# of parameters)ln(n)/2 = (# of parameters)ln(421240)/2 = (# of parameters)(6.5).
Penalized loglikelihood for the Geometric: -171,478.7 - 6.5 = -171,485.2.
Penalized loglikelihood for the mixture of two Poissons: -171,133.5 - (3)(6.5) = -171,153.
The penalized loglikelihood for the mixture of two Poissons is larger and therefore the mixture of two Poissons is a (much) better fit to this data than the Geometric Distribution.
Comment: The data is taken from page 45 of Risk Theory, by Beard, Pentikainen, and Pesonen. See also Tables 15.13 and 16.18 in Loss Models.
15.6. For an Exponential Distribution, ln f(x) = -x/θ - ln(θ). Loglikelihood is: -Σxi/θ - n ln(θ).
Separate estimate of θ for the Rural Exponential Distribution: θ = 100, the same as the method of moments. The maximum loglikelihood is: -500,000/100 - 5000 ln(100) = -28,025.85.
Separate estimate of θ for the Urban Exponential Distribution: θ = 125. The corresponding maximum loglikelihood is: -1,250,000/125 - 10,000 ln(125) = -58,283.14.
Restricted by H0, θU = 1.2 θR, the loglikelihood for the combined sample is:
-500,000/θR - 5000 ln(θR) - 1,250,000/(1.2 θR) - 10,000 ln(1.2 θR).
Setting the partial derivative with respect to θR equal to zero, and solving:
θR = (500,000 + 1,250,000/1.2)/(5000 + 10,000) = 102.78. θU = (1.2)(102.78) = 123.33.
The corresponding maximum loglikelihood is:
-500,000/102.78 - 5000 ln(102.78) - 1,250,000/123.33 - 10,000 ln(123.33) = -86,311.76.
Unrestricted loglikelihood is: -28,025.85 - 58,283.14 = -86,308.99.
The penalty is: (number of fitted parameters) ln(number of data points)/2.
Unrestricted penalized loglikelihood is: -86,308.99 - (2)ln(15,000)/2 = -86,318.61.
Restricted penalized loglikelihood is: -86,311.76 - (1)ln(15,000)/2 = -86,316.57.
-86,318.61 < -86,316.57, the unrestricted model is not significantly better than the restricted model, and we do not reject H0.
Comment: See 4, 11/00, Q. 34.
2013-4-6, Fitting Loss Dists. §15 Schwarz Bayesian Criterion, HCM 10/14/12, Page 444
15.7. B. Using the Schwarz Bayesian Criterion one adjusts the loglikelihoods by subtracting in each case: (number of fitted parameters) ln(number of data points)/2 = (number of fitted parameters) ln(500)/2 = (number of fitted parameters)(3.107).

Model                        Number of Parameters   Loglikelihood   Penalty   Penalized Loglikelihood
Poisson                               1                -932.11       3.107          -935.22
Negative Binomial                     2                -928.80       6.214          -935.01
Compound Pois.-Bin.                   3                -926.77       9.321          -936.09
Comp. Neg. Bin. - Bin.                4                -924.43      12.428          -936.86
Mixed Bin. and Neg. Bin.              5                -919.62      15.535          -935.15
The largest penalized loglikelihood is that for the Negative Binomial.
15.8. C. Using the Schwarz Bayesian Criterion one adjusts the loglikelihoods by subtracting in each case: (number of fitted parameters) ln(number of data points)/2 = (number of fitted parameters) ln(100)/2 = (number of fitted parameters)(2.303).

Model              Number of Parameters   Loglikelihood   Penalty   Penalized Loglikelihood
Gen. Pareto                 3                 -219.1        6.909          -226.01
Burr                        3                 -219.2        6.909          -226.11
Pareto                      2                 -221.2        4.606          -225.81
LogNormal                   2                 -221.4        4.606          -226.01
Inverse Expon.              1                 -224.2        2.303          -226.50
The largest penalized loglikelihood is that for the Pareto.
Comment: From best to worst, the models are: Pareto, LogNormal and Generalized Pareto tied, Burr, Inverse Exponential. Since they have the same number of parameters and the Pareto has a larger loglikelihood, the Pareto is better than the LogNormal. Since they have the same number of parameters and the Generalized Pareto has a larger loglikelihood, the Generalized Pareto is better than the Burr.
15.9. A. Using the Schwarz Bayesian Criterion one adjusts the loglikelihoods by subtracting in each case: (number of fitted parameters) ln(number of data points)/2 = (number of fitted parameters) ln(260)/2 = (number of fitted parameters)(2.780).

Model   Number of Parameters   Loglikelihood   Penalty   Penalized Loglikelihood
I                1                -414.00        2.78          -416.78
II               2                -412.00        5.56          -417.56
III              3                -411.00        8.34          -419.34
IV               4                -409.00       11.12          -420.12
V                6                -409.00       16.68          -425.68
The largest penalized loglikelihood is that for Model I.
Comment: Since Model V has the same loglikelihood as Model IV, but more parameters, Model V is inferior to Model IV.
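The arithmetic in these solutions is mechanical enough to script. The following sketch (plain Python, with the parameter counts and loglikelihoods of problem 15.9 typed in by hand) reproduces the penalized loglikelihoods and picks out Model I:

```python
import math

# Loglikelihoods and parameter counts from problem 15.9 (n = 260 observations).
n = 260
models = {"I": (1, -414.0), "II": (2, -412.0), "III": (3, -411.0),
          "IV": (4, -409.0), "V": (6, -409.0)}

penalty_per_parameter = math.log(n) / 2  # ln(260)/2 = 2.780

# Schwarz Bayesian Criterion: subtract (number of parameters) ln(n)/2 from each loglikelihood.
penalized = {name: loglik - r * penalty_per_parameter
             for name, (r, loglik) in models.items()}
for name, value in penalized.items():
    print(f"Model {name}: {value:.2f}")

best = max(penalized, key=penalized.get)
print("Favored model:", best)  # Model I, matching answer (A)
```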
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 445
Section 16, Kolmogorov-Smirnov Test, Basics194
The Kolmogorov-Smirnov Statistic is a formal way to compare the fitted/assumed and empirical distribution functions, which can be applied to ungrouped data. The Kolmogorov-Smirnov Statistic is computed for ungrouped data by finding the maximum absolute difference between the empirical distribution function and the fitted/assumed distribution function:
D = max over x of |(empirical distrib. function at x) - (theoretical distrib. function at x)|.
This maximum occurs just before or just after one of the observed points, where the empirical distribution function has a jump.195 By definition the Kolmogorov-Smirnov Statistic is greater than or equal to zero.196 The smaller the Kolmogorov-Smirnov statistic the better the fit between the distribution and the observed data.197
194 See Section 16.4.1. in Loss Models.
195 This maximum difference between any two distributions is sometimes referred to as the maximum discrepancy between the two distributions.
196 Therefore, one performs a one-sided test, rather than a two-sided test as with the use of the Normal distribution.
197 Instead of testing the fit of a single sample to a distribution, one can perform a similar test to check whether two samples of data come from the same distribution. This test is sometimes also referred to as a Kolmogorov-Smirnov test. One takes the maximum difference of the two empirical distribution functions. But then one compares to a different statistic than the one considered in Loss Models.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 446
Exercise: An Exponential distribution with θ = 1000 is compared to the following data set: 197, 325, 981, 2497.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
[Solution: The K-S statistic is 0.2225, from the absolute difference at or just after 325.

   x     Assumed F(x)   Empirical Distribution   Absolute Value of Assumed - Empirical
                                 0
  197       0.1788                                            0.1788
                                                              0.0712
                                0.25
                                                              0.0275
  325       0.2775                                            0.2225
                                0.5
                                                              0.1251
  981       0.6251                                            0.1249
                                0.75
                                                              0.1677
 2497       0.9177                                            0.0823
                                 1
For example, at 324.999 the assumed Exponential Distribution is: 1 - e^(-324.999/1000) = 0.2775. The empirical distribution at 324.999 is 1/4. The absolute difference at 324.999 is: |0.2775 - 0.25| = 0.0275. At 325, the fitted Exponential Distribution is: 1 - e^(-325/1000) = 0.2775. The empirical distribution at 325 is 2/4. Therefore, the absolute difference at 325 is: |0.2775 - 0.5| = 0.2225.
Comment: While the Exponential Distribution is continuous at 325, and therefore F(324.999) ≅ F(325), the empirical distribution function has a jump discontinuity of size 1/4 at 325, as well as at the other observed points.]
Note the way the spreadsheet in the above solution was set up. Leave three blank lines between each of the observed values, where the observed values are ranked from smallest to largest. Put next to each observed value the fitted/assumed distribution function at that value. On the line between each observed value, put the empirical distribution function. The empirical distribution function starts at zero before the first observation and increases each time by 1/(number of data points), up to one after the last observation. In the case of a repeated value in the data set, there would be a double jump in the empirical distribution function.198
198
See for example, 4B, 5/95, Q.11. If a value shows up three times there will be a triple jump in the empirical distribution function, etc.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 447
Then the absolute differences are taken between all of the adjacent rows. For example, I took |0.6251 - 0.75| = 0.1249 and |0.9177 - 0.75| = 0.1677. In this case with four points, there are a total of eight comparisons. Going down this column of absolute differences, the K-S statistic is the largest value, in this case 0.2225.
Some people may prefer to arrange this same calculation somewhat differently.199 Let F*(x) be the assumed distribution, Fn(x-) be the empirical distribution function just before x, and Fn(x) be the empirical distribution function at x. Then we compute both |Fn(x-) - F*(x)| and |Fn(x) - F*(x)|:

   x     Absolute difference   Fn(x-)   F*(x)    Fn(x)   Absolute difference
  197          0.1788            0      0.1788    0.25         0.0712
  325          0.0275           0.25    0.2775    0.5          0.2225
  981          0.1251           0.5     0.6251    0.75         0.1249
 2497          0.1677           0.75    0.9177     1           0.0823
As previously, the K-S statistic is the largest absolute difference, D = 0.2225.200
199 200
See Table 16.3 in Loss Models. We check both columns of absolute differences.
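For readers who like to check such tables by machine, here is a short sketch (plain Python, assuming the Exponential with θ = 1000 and the four observations above, which have no repeated values); it reproduces D = 0.2225:

```python
import math

data = sorted([197, 325, 981, 2497])
theta = 1000.0
F = lambda x: 1 - math.exp(-x / theta)  # assumed Exponential distribution function

n = len(data)
differences = []
for i, x in enumerate(data):
    fn_minus = i / n        # empirical distribution function just before x
    fn_plus = (i + 1) / n   # empirical distribution function at x
    differences.append(abs(fn_minus - F(x)))
    differences.append(abs(fn_plus - F(x)))

ks_statistic = max(differences)
print(round(ks_statistic, 4))  # 0.2225
```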
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 448
A Weibull Example:
For the ungrouped data in Section 2 there are 130 points. Thus one must compute the absolute difference between the empirical and fitted distribution 260 times, twice the number of points.
For example, the 78th point is 146,100. Thus the empirical distribution function is 77/130 = 0.5923 just before 146,100 and 78/130 = 0.6000 just after 146,100. There is a jump discontinuity at 146,100. For the Weibull distribution fit to this data via maximum likelihood with parameters θ = 231,158 and τ = 0.6885, F(146,100) = 1 - exp[-(146100/231158)^0.6885] = 0.5177. Thus just before 146,100 the absolute difference between empirical and fitted is: 0.5923 - 0.5177 = 0.0746. Just after 146,100 the absolute difference between empirical and fitted is: 0.6000 - 0.5177 = 0.0823.
It turns out that the maximum such absolute difference occurs just after the 77th point, 135,800. The empirical distribution is 0.5923, the fitted distribution is 0.5001, and the absolute difference is 0.0922. Thus 0.0922 is the Kolmogorov-Smirnov Statistic for the Weibull distribution fit by maximum likelihood to the ungrouped data in Section 2.
Here is a larger portion of the calculation of the Kolmogorov-Smirnov Statistic for the Weibull distribution fit by maximum likelihood:

    x       Fitted Weibull   Empirical Distribution   Difference just before x   Difference at x
 122,000        0.4748             0.5077                     0.0252                 0.0329
 123,100        0.4769             0.5154                     0.0308                 0.0385
 126,600        0.4835             0.5231                     0.0319                 0.0396
 127,300        0.4848             0.5308                     0.0383                 0.0460
 127,600        0.4853             0.5385                     0.0454                 0.0531
 127,900        0.4859             0.5462                     0.0526                 0.0603
 128,000        0.4861             0.5538                     0.0601                 0.0678
 131,300        0.4921             0.5615                     0.0618                 0.0695
 132,900        0.4950             0.5692                     0.0666                 0.0743
 134,300        0.4975             0.5769                     0.0718                 0.0795
 134,700        0.4982             0.5846                     0.0788                 0.0865
 135,800        0.5001             0.5923                     0.0845                 0.0922
 146,100        0.5177             0.6000                     0.0746                 0.0823
 150,300        0.5246             0.6077                     0.0754                 0.0831
 171,800        0.5574             0.6154                     0.0502                 0.0579
 173,200        0.5595             0.6231                     0.0559                 0.0636
 177,700        0.5659             0.6308                     0.0572                 0.0649
 183,000        0.5732             0.6385                     0.0576                 0.0653
 183,300        0.5736             0.6462                     0.0649                 0.0726
 190,100        0.5827             0.6538                     0.0634                 0.0711
 209,400        0.6071             0.6615                     0.0467                 0.0544
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 449
Difference between Empirical and Fitted Distributions, D(x) plot:
Let D(x) = empirical distribution - fitted/assumed distribution.201 Then a good fit would have D(x) of small magnitude for all x.
K-S Statistic = max over x of |D(x)|.
The smaller the Kolmogorov-Smirnov Statistic, the better the fit. One can usefully graph D(x), the difference function. For example, here is the difference between the empirical distribution function and the maximum likelihood Weibull Distribution, for the ungrouped data in Section 2:202
[Graph of D(x) for the maximum likelihood Weibull; x (000) runs from 1 to 1000 on a log scale, and the vertical axis from -0.10 to 0.10.]
For example, 140,000 is between 135,800 and 146,100, the 77th and 78th values out of 130. The fitted Weibull distribution at 140,000 is: 1 - exp[-(140000/231158)^0.6885] = 0.5074. Therefore, D(140,000) = 77/130 - 0.5074 = 0.0849.
Graphically the K-S statistic is the maximum distance this difference curve, the plot of D(x), gets from the x-axis, either above or below. In this case this maximum distance is 0.0922, which occurs at 135,800. In the region from about 100,000 to 200,000 there is a poor fit; the empirical distribution function increases quickly in this region due to a large number of claims reported in this size category.
201 202
See page 446 in Loss Models. D stands for difference, deviation or discrepancy. While the computer has drawn the graph as continuous, there is a jump discontinuity at each of the data points.
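A quick spot check of the point just computed (a sketch, assuming the maximum likelihood Weibull parameters θ = 231,158 and τ = 0.6885 quoted above):

```python
import math

theta, tau = 231158.0, 0.6885  # maximum likelihood Weibull parameters for the Section 2 data
weibull_cdf = lambda x: 1 - math.exp(-((x / theta) ** tau))

# 140,000 lies between the 77th and 78th of the 130 ordered values,
# so the empirical distribution function there is 77/130.
D_140000 = 77 / 130 - weibull_cdf(140000)
print(round(D_140000, 4))  # about 0.0849
```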
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 450
The Kolmogorov-Smirnov Statistic has a clear and simple graphical interpretation. It is the maximum distance D(x) gets from the x-axis, or equivalently the maximum distance between the graphs of the empirical and the fitted distribution functions.
The larger the Kolmogorov-Smirnov Statistic, the worse the fit. ⇔ The further the difference curve gets from the x-axis, the worse the fit.
For curves fit to the ungrouped data in Section 2 by the Method of Maximum Likelihood, the values of the Kolmogorov-Smirnov Statistic are:203

Distribution         K-S Statistic
LogLogistic              0.047
ParaLogistic             0.049
Pareto                   0.059
Gen. Pareto              0.060
Burr                     0.060
Trans. Gamma             0.064
LogNormal                0.082
Weibull                  0.092
Gamma                    0.132      reject fit at 5%
Exponential              0.240      reject fit at 1%
Inverse Gaussian         0.373      reject fit at 1%
Hypothesis Testing:
Loss Models states that the Kolmogorov-Smirnov test should only be used on individual data.204 One can use the Kolmogorov-Smirnov Statistic to test whether some ungrouped data was drawn from an assumed distribution. For the Kolmogorov-Smirnov Statistic the critical value is inversely proportional to the square root of the number of points. Here is a table of critical values:205 206 207

Significance Level = α     0.20      0.10      0.05      0.01
Critical Value = c         1.07/√n   1.22/√n   1.36/√n   1.63/√n

203
How to determine the significance levels is discussed below. See the Section on fitting to ungrouped data via maximum likelihood for the parameters of the fitted distributions. 204 See page 428. As discussed in the next section, one can get bounds on the K-S statistic for grouped data 205 If needed to answer an exam question, this or a similar table will be included within the question. 206 These critical values to more accuracy plus some additional ones: 20% 1.0727, 10% 1.2238, 5% 1.3581, 2.5% 1.4802, 1% 1.6276, 0.5% 1.7308, 0.1% 1.9495. 207 These critical values are approximate and should be good for 15 or more data points. For smaller sample sizes, the critical values are somewhat larger than those given by these formulas. For example, here are 20% critical values for various small sample sizes (do not divide by n ): 4 0.494, 5 0.446, 6 0.411, 7 0.381, 8 0.358, 9 0.339, 10 0.322. Similar 5% critical values: 4 0.624, 5 0.564, 6 0.521, 7 0.486, 8 0.457, 9 0.432, 10 0.411. Taken from “Kolmogorov-Smirnov: A Goodness of Fit Test for Small Samples,” by J. Romeu.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 451
These critical values, d/√n, for significance level α are determined from the asymptotic distribution of the maximum absolute deviation, by finding the value d such that the limiting distribution function equals 1 - α:
1 - α = 1 - 2 Σ (-1)^(r-1) exp(-2 r² d²), where the sum runs over r = 1 to ∞.
In other words, α = 2 Σ (-1)^(r-1) exp(-2 r² d²).
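As a numerical check on this relation (a sketch in plain Python, keeping the first twenty terms of the series), the tabulated critical values do reproduce the stated significance levels:

```python
import math

def significance_level(d, terms=20):
    # alpha = 2 * sum over r >= 1 of (-1)^(r-1) exp(-2 r^2 d^2)
    return 2 * sum((-1) ** (r - 1) * math.exp(-2 * r * r * d * d)
                   for r in range(1, terms + 1))

for d in (1.07, 1.22, 1.36, 1.63):
    print(d, round(significance_level(d), 3))
# prints roughly 0.20, 0.10, 0.05, 0.01, matching the table of critical values
```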
While these critical values are often used by actuaries when comparing data to distributions fit to that same data, the correct critical values in this situation are smaller than those in the table.208
The distribution of the K-S Statistic is independent of the loss distribution.209 Thus we can use a single table to apply the K-S test, regardless of whether the fitted distribution is an Exponential, Pareto, etc. In order to enter the table, we only need to know the number of data points, n. So for example, with 130 points as in the ungrouped data in Section 2, the critical values for the K-S Statistic are:

Significance Level = α          0.20     0.10     0.05     0.01
Critical Value for n = 130      0.0938   0.107    0.119    0.143
For the Gamma Distribution, the K-S Statistic is 0.132. Since 0.132 > 0.119, we can reject the fit of the Gamma at a 5% significance level. On the other hand, the critical value for 1% is: 1.63/√130 = 0.143 > 0.132, so one can not reject the Gamma at the 1% significance level. Mechanically, the K-S Statistic for the Gamma of 0.132 is bracketed by 0.119 and 0.143. One rejects to the left and does not reject to the right. Reject at 5%, and do not reject at 1%.
For the Pareto, the K-S Statistic is 0.059. Since 0.059 < 0.0938, we do not reject the fit of the Pareto at 20%.
208
See page 450 of Loss Models, and “Mahlerʼs Guide to Simulation.” There is no simple adjustment based on the number of fitted parameters. 209 In my problems, I give a demonstration of why the K-S Statistic is distribution free.
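The bracketing argument can be scripted as well; a sketch, assuming the maximum likelihood K-S statistics quoted above and n = 130 data points:

```python
import math

n = 130
critical = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}

def smallest_rejection_level(ks):
    # Return the smallest tabulated significance level at which the fit is rejected.
    rejected = [alpha for alpha, c in critical.items() if ks > c / math.sqrt(n)]
    return min(rejected) if rejected else None

for name, ks in [("Pareto", 0.059), ("Weibull", 0.092),
                 ("Gamma", 0.132), ("Exponential", 0.240)]:
    print(name, smallest_rejection_level(ks))
# Pareto None (not rejected even at 20%), Weibull None,
# Gamma 0.05 (reject at 5% but not at 1%), Exponential 0.01 (reject at 1%)
```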
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 452
Exercise: A distribution has been fit to 10,000 claims. The Kolmogorov-Smirnov (K-S) statistic is 1.30%. Test the hypothesis that the data came from this distribution.
[Solution: With 10,000 points, the critical values for the K-S Statistic are:

Significance Level = α              0.20    0.10    0.05    0.01
Critical Value for n = 10,000       1.07%   1.22%   1.36%   1.63%

Since 1.22% < 1.30% < 1.36%, we reject the hypothesis at 10% and do not reject at 5%.]
Graphical version of Rejection:
Here is the difference between the empirical distribution function and the maximum likelihood Pareto Distribution, for the ungrouped data in Section 2:
[Graph of D(x) for the maximum likelihood Pareto; x (000) runs from 1 to 1000 on a log scale, the vertical axis from -0.10 to 0.10, with horizontal lines drawn at ±0.0938.]
The maximum absolute difference is 0.059, the K-S statistic; the Pareto fits this data well. The critical value for 130 data points and 20% significance level is 0.0938. The critical region or rejection region is outside the horizontal lines at ±0.0938. The difference curve stays between the horizontal lines at ±0.0938, the critical value for 20%, and therefore we do not reject the fit (we accept the fit) at 20%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 453
For the Weibull, the K-S Statistic is 0.0922. Since 0.0922 < 0.0938, we (just barely) do not reject the fit of the Weibull at a 20% significance level. Below is shown the difference between the empirical distribution function and the maximum likelihood Weibull Distribution. Since the difference curve gets further from the x-axis than was the case for the Pareto Distribution, the Weibull is not as good a fit.
[Graph of D(x) for the maximum likelihood Weibull; x (000) runs from 1 to 1000 on a log scale, the vertical axis from -0.10 to 0.10, with horizontal lines drawn at ±0.0938.]
The difference curve (barely) stays between the horizontal lines at ±0.0938, the critical value for 20%, and therefore we do not reject the fit (we accept the fit) at 20%. We note that for small sizes of loss, the empirical distribution is less than the fitted Weibull Distribution, and therefore the fitted Weibull has a thicker lefthand tail than the empirical.210 For large sizes of loss, the empirical distribution is somewhat less than the fitted Weibull Distribution; the empirical survival function is somewhat greater than the fitted survival function. Therefore the fitted Weibull Distribution has a somewhat thinner righthand tail than the empirical. Since there are only four losses of size greater than 2 million in the ungrouped data set in Section 2, it is difficult to draw conclusions about the righthand tail.
210 For example, for the fitted Weibull with θ = 231,158 and τ = 0.6885, F(10000) = 10.9%, while there are 8 out of 130 values in the data set ≤ 10,000, for an Empirical Distribution Function at 10,000 of: 8/130 = 6.2%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 454
For the Exponential, the K-S Statistic is 0.240. Since 0.240 > 0.143, we reject the fit of the Exponential at a 1% significance level. Below is shown the difference between the empirical distribution function and the maximum likelihood Exponential Distribution:
[Graph of D(x) for the maximum likelihood Exponential; x (000) runs from 1 to 1000 on a log scale, the vertical axis from -0.2 to 0.2, with horizontal lines drawn at ±0.143.]
Since the difference curve gets far from the x-axis, the Exponential is not a good fit. One can reject the Exponential distribution at 1%, since it goes outside the band around the x-axis formed by the 1% significance line, y = ±0.143. The rejection or critical region for 1% is outside the horizontal lines at y = ±.143. If the difference curve anywhere enters that critical region, then we reject at 1%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 455
Tails of Distributions:
D(x) plots can be used to compare the tails of a distribution to those of the data. The righthand tail refers to large losses, in other words as x approaches infinity. Prob[X > c] = S(c). Thus the Survival Function measures how much probability is in the righthand tail.
If the model has more probability in the righthand tail than the data, then the righthand tail of the model distribution is said to be too thick. For the righthand tail, that means for large x the Survival Function of the model is larger than the empirical survival function. Thus, if the righthand tail of the model distribution is too thick, then D(x) = empirical distribution - assumed distribution = assumed survival function - empirical survival function is positive in the righthand tail.
Here is an example, where the model has too thick of a righthand tail compared to the data:211
[Graph of D(x) versus x; the difference curve is positive in the righthand tail.]
If the model has less probability in the righthand tail than the data, then the righthand tail of the model distribution is said to be too thin. For the righthand tail, that means for large x the Survival Function of the model is smaller than the empirical survival function. Thus, if the righthand tail of the model distribution is too thin, then D(x) is negative in the righthand tail.
211
Both the model and the data have a distribution function of one at infinity, so D(∞) = 0.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 456
The lefthand tail refers to small losses, in other words as x approaches zero (or, in the case of the Normal Distribution, negative infinity). Prob[X ≤ c] = F(c). Thus the Distribution Function measures how much probability is in the lefthand tail.
If the model has more probability in the lefthand tail than the data, then the lefthand tail of the model distribution is said to be too thick. For the lefthand tail, that means for small x the Distribution Function of the model is larger than the empirical distribution function. Thus, if the lefthand tail of the model distribution is too thick, then D(x) is negative in the lefthand tail.
Here is an example, where the model has too thick of a lefthand tail compared to the data:212
[Graph of D(x) versus x; the difference curve is negative in the lefthand tail.]
If the model has less probability in the lefthand tail than the data, then the lefthand tail of the model distribution is said to be too thin. For the lefthand tail, that means for small x the Distribution Function of the model is smaller than the empirical distribution function. Thus, if the lefthand tail of the model distribution is too thin, then D(x) is positive in the lefthand tail. Righthand Tail of the Model is Too Thick. ⇔ D(x) is Positive for large x. Righthand Tail of the Model is Too Thin. ⇔ D(x) is Negative for large x. Lefthand Tail of the Model is Too Thick. ⇔ D(x) is Negative for small x. Lefthand Tail of the Model is Too Thin. ⇔ D(x) is Positive for small x. In all cases, we compare the probability in the tail for the model and the data. 212
Both the loss model and the data have a distribution function of zero at zero, so D(0) = 0.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 457
Area Under the Difference Graph, D(x):
D(x) = empirical distribution - fitted/assumed distribution = Fn(x) - F(x) = S(x) - Sn(x).
The sample mean, X, is equal to the integral of the empirical survival function, Sn(x). The mean of a distribution, E[X], is equal to the integral of its survival function, S(x).213 Therefore, the integral of D(x) is: E[X] - X. In other words, the area under the difference graph, counting areas below the x-axis as negative, is the difference between the mean of the distribution and the sample mean.
Exercise: One observes losses of sizes: 3, 6, 12, 15. Assume a distribution uniform from zero to 20. Graph D(x).
[Solution: F(x) = x/20, x < 20.
Fn(x) = 0 for x < 3; 0.25 for 3 ≤ x < 6; 0.50 for 6 ≤ x < 12; 0.75 for 12 ≤ x < 15; 1 for 15 ≤ x.
The graph of D(x) = Fn(x) - F(x):
[Graph of D(x) for x from 0 to 20, with jumps at 3, 6, 12, and 15; the vertical axis runs from about -0.1 to 0.2.]
Comment: There are jump discontinuities in the empirical distribution function at each of the observed points, and therefore also in the difference graph, D(x).] 213
Assuming the distribution has support starting at zero.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 458
Exercise: Compute the area under D(x), counting areas below the x-axis as negative.
[Solution: From 0 to 3 we have a triangle below the x axis with area: (3)(3/20)/2 = 9/40.
[Graph of D(x) for x from 0 to 20, showing the triangles formed between the difference curve and the x-axis; the vertical axis runs from about -0.15 to 0.25.]
From 3 to 5 we have a triangle above the x axis with area: (2)(2/20)/2 = 4/40.
From 5 to 6 we have a triangle below the x axis with area: (1)(1/20)/2 = 1/40.
From 6 to 10 we have a triangle above the x axis with area: (4)(4/20)/2 = 16/40.
From 10 to 12 we have a triangle below the x axis with area: (2)(2/20)/2 = 4/40.
From 12 to 15 we have a triangle above the x axis with area: (3)(3/20)/2 = 9/40.
From 15 to 20 we have a triangle above the x axis with area: (5)(5/20)/2 = 25/40.
The area below D(x) is: (-9 + 4 - 1 + 16 - 4 + 9 + 25)/40 = 1.]
For the uniform distribution from 0 to 20, E[X] = 10. For the sample: 3, 6, 12, 15, X = 9.
Thus for this example, E[X] - X = 10 - 9 = 1, which is indeed the area under D(x).
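The identity can also be checked numerically; a sketch for the uniform-on-[0, 20] example (the grid size is arbitrary):

```python
data = [3, 6, 12, 15]

def D(x):
    # D(x) = empirical distribution - assumed Uniform(0, 20) distribution
    Fn = sum(1 for v in data if v <= x) / len(data)
    F = min(x / 20.0, 1.0)
    return Fn - F

# Numerical integral of D(x) from 0 to 20 (midpoint rule on a fine grid).
steps = 20000
width = 20.0 / steps
area = sum(D((i + 0.5) * width) for i in range(steps)) * width
print(round(area, 3))  # about 1.0 = E[X] - sample mean = 10 - 9
```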
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 459 Distributions fit via Method of Moments: For Curves fit to the ungrouped data in Section 2 by the Method of Moments, the values of the Kolmogorov-Smirnov Statistic are:214
Distribution         K-S Statistic
Inverse Gaussian         0.099      reject fit at 20%
LogNormal                0.100      reject fit at 20%
Pareto                   0.131      reject fit at 5%
Weibull                  0.158      reject fit at 1%
Exponential              0.240      reject fit at 1%
Gamma                    0.275      reject fit at 1%
Different Criteria For Choosing Between Fits of Distributions:
We have covered a number of different ways of deciding which distribution best fits a given data set.

Criterion                          Good Fit
Chi-Square                         Small215
Kolmogorov-Smirnov Statistic       Small
Anderson-Darling Statistic216      Small
Likelihood or Loglikelihood        Large
Penalized Loglikelihood217         Large
214
See the Section on fitting via the Method of Moments for the parameters of the fitted distributions. One wants a large corresponding “p-value”, which is the Survival Function of the Chi-Square Distribution (for the appropriate number of degrees of freedom) at the value of the Chi-Square Statistic. 216 To be discussed subsequently. 217 The Schwarz Bayesian Criterion. 215
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 460
Graphing the Difference of Two Empirical Distribution Functions:
One can also graph the difference between two different empirical distributions. For example, here is a comparison of fire damage for protected and unprotected buildings.218
[Graph: Frame, Protected Distribution - Unprotected Distribution, plotted against % of Value from 0.1 to 100 on a log scale; the vertical axis runs from -0.06 to 0.02.]
[Graph: Brick, Protected Distribution - Unprotected Distribution, plotted against % of Value from 0.1 to 100 on a log scale; the vertical axis runs from -0.04 to 0.01.]
218
The data were taken from “Rating by Layer of Insurance,” by Ruth E. Salzmann, PCAS 1963. For four different classes of building, she shows the number of fire losses of size less than or equal to a given percent of value of the building. This data were previously displayed in my section on ogives.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 461 Approaches to Selecting Models:219 Loss Models refers to looking at the above items as “score based approaches” to selecting a model. Score Based Approaches: 1. Chi-Square Statistic 2. p-value of Chi-Square Test 3. Kolmogorov-Smirnov Statistic 4. Anderson-Darling Statistic 5. Likelihood or Loglikelihood 6. Schwarz Bayesian Criterion Which distribution fits better may depend on which criterion one uses. Remember the principle of parsimony, which says one should not use more parameters than are necessary. Also Loss Models suggests that one limit the number of different models considered. If one has a vast array of models, one of them will happen to fit the data, even if it would not help to predict the future. Loss Models summarizes its advice as:220 1. Use a simple model if possible. 2. Restrict the universe of possible models. In addition to the “score based approaches”, there are graphical techniques, such as ogives, histograms, graphs of difference functions, p-p plots, graphical comparisons of mean excess losses, etc. Loss Models includes these graphical approaches in what he calls “judgement based approaches” to selecting a model. Judgement Based Approaches: 1. Reviewing graphs. 2. Focusing on important items for the particular application.221 3. Relying on models that have worked well in similar situations in the past.222 In many cases, score based approaches will narrow down the viable models to a few, and then judgement would be required in order to decide between these good candidates. 219
See Section 16.5 of Loss Models. See Section 16.5.1 of Loss Models. 221 For example, when pricing excess insurance, one would focus on the righthand tail of a size of loss distribution. 222 An actuary is not expected to reinvent the wheel each time he estimates a quantity. If fitting a Pareto distribution using maximum likelihood has worked well for many years for pricing your insurerʼs excess insurance, then you are likely to do so again this year without necessarily checking alternatives. Also modeling judgments are often based in whole or in part on reading the actuarial literature describing how other actuaries have modeled similar situations. 220
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 462 Problems: For the following questions, use the following table for the Kolmogorov-Smirnov statistic. α
α     0.20      0.10      0.05      0.025     0.01
c     1.07/√n   1.22/√n   1.36/√n   1.48/√n   1.63/√n
16.1 (1 point) A distribution has been fit to 1000 claims. The Kolmogorov-Smirnov (K-S) statistic is 0.035. Which of the following is true? A. Do not reject the fit at 20%. B. Do not reject the fit at 10%. Reject the fit at 20%. C. Do not reject the fit at 5%. Reject the fit at 10%. D. Do not reject the fit at 1%. Reject the fit at 5%. E. Reject the fit at 1%. 16.2 (3 points) A Pareto distribution with parameters α = 1.5 and θ = 1000 was fit to the following five claims: 179, 352, 918, 2835, 6142. What is the value of the Kolmogorov-Smirnov (K-S) statistic? A. less than 0.24 B. at least 0.24 but less than 0.25 C. at least 0.25 but less than 0.26 D. at least 0.26 but less than 0.27 E. at least 0.27 16.3 (3 points) A LogNormal distribution with parameters µ = 7.72 and σ = 0.944 was fit to the following five claims: 410, 1924, 2635, 4548, 6142. What is the value of Kolmogorov-Smirnov (K-S) statistic? A. less than 0.15 B. at least 0.15 but less than 0.20 C. at least 0.20 but less than 0.25 D. at least 0.25 but less than 0.30 E. at least 0.30 16.4 (3 points) For f(x) = 2e-2x, using the method of inversion, five claims are simulated using random numbers: 0.280, 0.673, 0.372, 0.143, 0.961. Compute the Kolmogorov-Smirnov Statistic. A. less than 0.13 B. at least 0.13 but less than 0.16 C. at least 0.16 but less than 0.19 D. at least 0.19 but less than 0.22 E. at least 0.22
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 463 16.5 (3 points) For f(x) = 3,000 / (10 +x)4 , using the method of inversion, five claims are simulated using random numbers: 0.280, 0.673, 0.372, 0.143, 0.961. Compute the Kolmogorov-Smirnov Statistic. A. less than 0.13 B. at least 0.13 but less than 0.16 C. at least 0.16 but less than 0.19 D. at least 0.19 but less than 0.22 E. at least 0.22 16.6 (2 points) Given the distribution f(x) = 5 x4 , 0 < x < 1, and the sample: 0.90, 0.70, 0.75, 0.80, 0.50, 0.65, what is the value of the Kolmogorov-Smirnov Statistic? A. less than 0.35 B. at least 0.35 but less than 0.40 C. at least 0.40 but less than 0.45 D. at least 0.45 but less than 0.50 E. at least 0.50 16.7 (1 point) Using the result of the previous question, which of the following is true? A. Do not reject the fit at 20%. B. Do not reject the fit at 10%. Reject the fit at 20%. C. Do not reject the fit at 5%. Reject the fit at 10%. D. Do not reject the fit at 1%. Reject the fit at 5%. E. Reject the fit at 1%. 16.8 (4 points) You observe the following 10 claims: 241,513 110,493
231,919 139,647 105,310 220,942 125,152 161,964 116,472 105,829
A Distribution Function: F(x) = 1 - (x/100000)^(-2.8), x > 100,000, has been fit to this data.
Determine the value of the Kolmogorov-Smirnov Statistic.
A. less than 0.13 B. at least 0.13 but less than 0.16 C. at least 0.16 but less than 0.19
D. at least 0.19 but less than 0.22 E. at least 0.22
16.9 (3 points) You observe 5 claims of different sizes. A distribution function is compared to this data.
What is the smallest possible value of the Kolmogorov-Smirnov Statistic?
(A) 0 (B) 0.05 (C) 0.10 (D) 0.15 (E) 0.20
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 464
16.10 (2 points) Below is shown a graph of D(x), the difference between the empirical distribution function and a fitted distribution function.
[Graph of D(x) for x from 0 to 40,000; the vertical axis runs from -0.08 to 0.08.]
Which of the following statements are true?
1. The left tail of the fitted distribution is too thick.
2. The right tail of the fitted distribution is too thick.
3. The Kolmogorov-Smirnov Statistic is between 0.05 and 0.06.
A. 1 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 465
16.11 (4 points) For the following four losses: 40, 150, 230, 400, and an Exponential Distribution as per Loss Models, which of the following values of θ has the best value of the Kolmogorov-Smirnov (K-S) goodness of fit statistic?
A. 190 B. 210 C. 230 D. 250 E. 270
16.12 (1 point) Below is shown a graph of D(x), the difference between the empirical distribution function and a fitted distribution function.
[Graph of D(x) for x from 0 to 20,000; the vertical axis is marked from -0.20 to 0.20 in steps of 0.01.]
What is the value of the Kolmogorov-Smirnov Statistic?
A. 0.12 B. 0.14 C. 0.16 D. 0.18 E. 0.20
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 466
16.13 (1 point) According to Loss Models, which of the following are score-based approaches to selecting models?
1. Comparing Kolmogorov-Smirnov Statistics.
2. Examining graphs of the difference between the empirical distribution and the models.
3. Comparing p-values of the Chi-Square Statistics.
A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C or D
16.14 (2 points) For a single data set, for various continuous distributions on [0, 1], below are shown graphs of D(x). In which case is the mean of the distribution greater than the sample mean?
[Five small graphs of D(x) on [0, 1], labeled A, B, C, D, and E.]
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 467
16.15 (2 points) A Uniform distribution on 0 to 10 is compared to the following data set: 1, 1, 3, 3, 6, 6, 6, 7, 9, 9.
What is the value of the Kolmogorov-Smirnov (K-S) statistic?
A. 0.10 B. 0.15 C. 0.20 D. 0.25 E. 0.30
Use the following information for the next two questions:
(i) The following are six observed claim amounts: 700 1200 2000 4600 6000 9500
(ii) An Exponential Distribution is fit to this data via maximum likelihood.
Reject H0 at 20%.
C. Do not reject H0 at 5%.
Reject H0 at 10%.
D. Do not reject H0 at 1%.
Reject H0 at 5%.
E. Reject H0 at 1%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 468 16.19 (3 points) There are four data points: 10, 20, 30, 40. You are given the following graph of D(x): D(x)
10
20
30
- 0.1
- 0.2
- 0.3
- 0.4
- 0.5
- 0.6 Which of the following is F(x)? A. Uniform Distribution from 0 to 60. B. Exponential Distribution with θ = 25. C. Weibull Distribution with τ = 0.4 and θ = 7.5. D. Pareto Distribution with α = 4 and θ = 75. E. LogNormal Distribution with µ = 2 and σ = 1.1.
40
50
60
x
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 469 16.20 (4, 5/86, Q.5) (3 points) Given the sample 1, 1.5, 2, 2.5, and 2.75, you wish to test the goodness of fit of the distribution with a probability density function: f(x) = 2x/9, 0 ≤ x ≤ 3. What is the value of the Kolmogorov-Smirnov (K-S) goodness of fit statistic? A. Less than 0.11 B. At least 0.11, but less than 0.13 C. At least 0.13, but less than 0.15 D. At least 0.15, but less than 0.17 E. 0.17 or more. 16.21 (4, 5/87, Q.61) (3 points) Assume that the random variable X has the probability density function: f(x;θ) = θ + 2(1 - θ)x , 0≤x≤1, with parameter θ, 0 ≤ θ ≤ 2. What is the Kolmogorov-Smirnov statistic to test the fit of the distribution with θ = 0 given the following sample? 0.45, 0.5, 0.55, 0.75 A. Less than 0.20 B. At least 0.20, but less than 0.35 C. At least 0.35, but less than 0.50 D. At least 0.50, but less than 0.65 E. 0.65 or more. 16.22 (160, 5/87, Q.5) (2.1 points) Two lives are observed, beginning at t = 0. One dies at t1 = 5; the other dies at t2 = 9. The survival function S(t) = 1 - t/10 is hypothesized. Calculate the Kolmogorov-Smirnov statistic. (A) 0.4 (B) 0.5 (C) 0.6 (D) 0.7
(E) 0.8
16.23 (4, 5/88, Q.57) (2 points) Given the distribution f(x) = θxθ-1, 0 < x < 1, θ = 3, and the sample 0.7, 0.75, 0.8, 0.5, 0.65, what is the value of the Kolmogorov-Smirnov (K-S) statistic? A. Less than 0.1 B. At least 0.1, but less than 0.2 C. At least 0.2, but less than 0.3 D. At least 0.3, but less than 0.4 E. 0.4 or more
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 470 16.24 (4, 5/90, Q.49) (2 points) The following observations: 1.7, 1.6, 1.6, 1.9 are taken from a random sample. You wish to test the goodness of fit of a distribution with probability density function given by f(x) = x/2, for 0 ≤ x ≤ 2. Using the Kolmogorov-Smirnov statistic, you should: A. Do not reject at both the .01 level and the .10 level B. Do not reject at the .01 level but reject at the .10 level C. Do not reject at the .10 level but reject at the .01 level D. Reject at both the .01 level and the .10 level E. Cannot be determined 16.25 (160, 5/90, Q.17) (2.1 points) From a laboratory study of nine lives, you are given: (i) The times of death are 1, 2, 4, 5, 5, 7, 8, 9, 9. (ii) It has been hypothesized that the underlying distribution is uniform from 0 to 11. Calculate the Kolmogorov-Smirnov (K-S) statistic for the hypothesis. (A) 0.12 (B) 0.14 (C) 0.16 (D) 0.18 (E) 0.20 16.26 (4B, 11/92, Q.26) (2 points) You are given the following: • You have segregated 25 losses into 4 classes based on size of loss.
• •
A Pareto distribution with known parameters is believed to fit the observed data.
The chi-square statistic and the Kolmogorov-Smirnov statistic have been calculated to test the distribution's goodness of fit. Which of the following are true regarding these two statistics? 1. The chi-square statistic has an approximate chi-square distribution with 4 degrees of freedom. 2. The critical value of the Kolmogorov-Smirnov statistic, c, is inversely proportional to the square root of the sample size. 3. Calculating the Kolmogorov-Smirnov statistic required testing at most 8 values. A. 1 only B. 2 only C. 3 only D. 1, 2 only E. 1, 3 only
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 471 16.27 (4B, 11/93, Q.10) (3 points) A random sample of 5 claims x1 ,..., x5 is taken from the probability density function f(xi) =
α θα , α, θ, xi > 0. In ascending order the observations are: (θ + xi)α + 1
43, 145, 233, 396, 775. Suppose the parameters are α = 1.0 and θ = 400. Determine the Kolmogorov-Smirnov statistic for the fitted distribution. A. Less than 0.050 B. At least 0.050, but less than 0.140 C. At least 0.140, but less than 0.230 D. At least 0.230, but less than 0.320 E. At least 0.320 16.28 (4B, 5/95, Q.11) (2 points) Given the sample 0.1, 0.4, 0.8, 0.8, 0.9 you wish to test the goodness of fit of the distribution with a probability density function given by f(x) = (1 + 2x) / 2 , 0 ≤ x ≤ 1. Determine the Kolmogorov-Smirnov goodness of fit statistic. A. Less than 0.15 B. At least 0.15, but less than 0.20 C. At least 0.20, but less than 0.25 D. At least 0.25, but less than 0.30 E. At least 0.30 Use the following information for the next 2 questions: • A random sample of 20 observations of a random variable X yields the following values: 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0 • The null hypothesis, H0 , is that X has a uniform distribution on the interval [0, 20]. 16.29 (4B, 11/95, Q.9) (2 points) Determine the value of the Kolmogorov-Smirnov statistic used to test H0 . A. Less than 0.075 B. At least 0.075, but less than 0.125 C. At least 0.125, but less than 0.175 D. At least 0.175, but less than 0.225 E. At least 0.225 16.30 (4B, 11/95, Q.10) (1 point) Which of the following statements is true? A. H0 will be rejected at the 0.01 significance level. B. H0 will be rejected at the 0.05 significance level but not at the 0.01 level. C. H0 will be rejected at the 0.10 significance level but not at the 0.05 level. D. H0 will be rejected at the 0.20 significance level but not at the 0.10 level. E. H0 will not be rejected at the 0.20 significance level.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 472 Use the following information for the next 2 questions: •
Claim sizes follow a lognormal distribution with parameters µ and σ .
•
A random sample of five claims yields the values 0.1, 0.5, 1.0, 2.0, and 10.0 (in thousands).
16.31 (4B, 11/98, Q.3) (2 points) Determine the maximum likelihood estimate of σ . A. Less than 1.6 B. At least 1.6, but less than 1.8 C. At least 1.8, but less than 2.0 D. At least 2.0, but less than 2.2 E. At least 2.2 16.32 (4B, 11/98. Q.4) (2 points) Determine the value of the Kolmogorov-Smirnov statistic using the maximum likelihood estimates. A. Less than 0.07 B. At least 0.07, but less than 0.09 C. At least 0.09, but less than 0.11 D. At least 0.11, but less than 0.13 E. At least 0.13 16.33 (4, 5/00, Q.11) ( 2.5 points) The size of a claim for an individual insured follows an inverse exponential distribution with the following probability density function: f(x | θ) =
θ e- θ / x ,x>0 x2
The parameter θ has a prior distribution with the following probability density function: g(θ) =
e- θ / 4 ,θ>0 4
For a particular insured, the following five claims are observed: 1 2 3 5 13 Determine the value of the Kolmogorov-Smirnov statistic to test the goodness of fit of f(x l θ = 2). (A) Less than 0.05 (B) At least 0.05, but less than 0.10 (C) At least 0.10, but less than 0.15 (D) At least 0.15, but less than 0.20 (E) At least 0.20
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 473 16.34 (4, 5/01, Q.12) (2.5 points) You are given the following random observations: 0.1 0.2 0.5 1.0 1.3 You test whether the sample comes from a distribution with probability density function: f(x) = 2/ (1+x)3 , x > 0. Calculate the Kolmogorov-Smirnov statistic. (A) 0.01 (B) 0.06 (C) 0.12 (D) 0.17 (E) 0.19 16.35 (4, 11/02, Q.17 & 2009 Sample Q. 40) (2.5 points) You are given: (i) A sample of claim payments is: 29 64 90 135 182 (ii) Claim sizes are assumed to follow an exponential distribution. (iii) The mean of the exponential distribution is estimated using the method of moments. Calculate the value of the Kolmogorov-Smirnov test statistic. (A) 0.14 (B) 0.16 (C) 0.19 (D) 0.25 (E) 0.27 16.36 (4, 11/04, Q.38 & 2009 Sample Q.160) (2.5 points) You are given a random sample of observations: 0.1 0.2 0.5 0.7 1.3 You test the hypothesis that the probability density function is: f(x) = 4/(1+x)5 , x > 0. Calculate the Kolmogorov-Smirnov test statistic. (A) Less than 0.05 (B) At least 0.05, but less than 0.15 (C) At least 0.15, but less than 0.25 (D) At least 0.25, but less than 0.35 (E) At least 0.35 16.37 (4, 5/05, Q.1 & 2009 Sample Q.172) (2.9 points) You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis, H0 , that the probability density function for the population is: f(x) = 4/(1 + x)5 , x > 0. (iii) Critical values for the Kolmogorov-Smirnov test are: Level of Significance: 0.10 0.05 Critical Value: 1.22/ n 1.36/ n Determine the result of the test. (A) Do not reject H0 at the 0.10 significance level.
0.025
0.01
1.48/ n
1.63/ n
(B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level. (C) Reject H0 at the 0.05 significance level, but not at the 0.025 significance level. (D) Reject H0 at the 0.025 significance level, but not at the 0.01 significance level. (E) Reject H0 at the 0.01 significance level.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 474 16.38 (4, 5/07, Q.20) (2.5 points) You use the Kolmogorov-Smirnov goodness-of-fit test to assess the fit of the natural logarithms of n = 200 losses to a distribution with distribution function F*. You are given: (i) The largest value of |Fn (x) - F*(x)| occurs for some x between 4.26 and 4.42. (ii)
Observed x 4.26 4.30 4.35 4.36 4.39 4.42
F*(x)
Fn (x-)
Fn (x)
0.584 0.599 0.613 0.621 0.636 0.638
0.505 0.510 0.515 0.520 0.525 0.530
0.510 0.515 0.520 0.525 0.530 0.535
(iii) Commonly used large-sample critical values for this test are 1.22 / 1.36 /
n for α = 0.05, 1.52 /
n for α = 0.02, and 1.63 /
n for α = 0.10,
n for α = 0.01.
Determine the result of the test. (A) Do not reject H0 at the 0.10 significance level. (B) Reject H0 at the 0.10 significance level, but not at the 0.05 significance level. (C) Reject H0 at the 0.05 significance level, but not at the 0.02 significance level. (D) Reject H0 at the 0.02 significance level, but not at the 0.01 significance level. (E) Reject H0 at the 0.01 significance level
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 475
Solutions to Problems:
16.1. B. For 1000 points, the critical values for the K-S stat. are 1.22/√1000 = 0.0386 for 10% and 1.07/√1000 = 0.0338 for 20%. 0.0338 < 0.035 < 0.0386.
Thus one can reject at 20% but does not reject at 10%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 476
16.2. D. At each of the observed claim sizes, compute the values of the fitted Pareto distribution: F(x) = 1 - (θ/(θ+x))^α = 1 - (1000/(1000 + x))^1.5. So for example, F(352) = 1 - (1000/(1000 + 352))^1.5 = 0.3639. Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.) The largest absolute difference is: F(2835) - 0.6 = 0.8668 - 0.6 = 0.2668 = K-S statistic.

   X     Fitted F(X)   Empirical Distribution   Absolute Value of Fitted - Empirical
                                0
  179       0.2189                                           0.2189
                                                             0.0189
                               0.2
                                                             0.1639
  352       0.3639                                           0.0361
                               0.4
                                                             0.2235
  918       0.6235                                           0.0235
                               0.6
                                                             0.2668
 2835       0.8668                                           0.0668
                               0.8
                                                             0.1476
 6142       0.9476                                           0.0524
                                1

An alternate way to arrange this same calculation, as per Table 16.3 in Loss Models:

   x     Absolute difference   Fn(x-)   F*(x)    Fn(x)   Absolute difference
  179          0.2189            0      0.2189    0.2          0.0189
  352          0.1639           0.2     0.3639    0.4          0.0361
  918          0.2235           0.4     0.6235    0.6          0.0235
 2835          0.2668           0.6     0.8668    0.8          0.0668
 6142          0.1476           0.8     0.9476     1           0.0524
The K-S Statistic is 0.2668, the maximum absolute difference, checking in both columns. Fn (x-) is the empirical distribution function just before x. F*(x) is the assumed distribution function at x. Fn (x) is the empirical distribution function at x.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 477
Comment: The Pareto Distribution (shown dashed) and the Empirical Distribution Function:
[Graph: the fitted Pareto distribution function (dashed) and the empirical distribution function, for x from 0 to 8000; the vertical axis runs from 0 to 1.]
A graph of the Empirical Distribution Function minus Fitted Pareto Distribution:
[Graph of the difference, for x from 0 to 8000; the vertical axis runs from about -0.25 to 0.05.]
The maximum absolute difference of 0.2668 occurs just before 2835.
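For what it is worth, the same two-column comparison coded earlier reproduces this answer; a sketch assuming the Pareto with α = 1.5 and θ = 1000 and the five observed claims:

```python
data = [179, 352, 918, 2835, 6142]
alpha, theta = 1.5, 1000.0
F = lambda x: 1 - (theta / (theta + x)) ** alpha  # assumed Pareto distribution function

n = len(data)
# Compare F(x) against the empirical distribution just before and at each observation.
ks = max(max(abs(i / n - F(x)), abs((i + 1) / n - F(x)))
         for i, x in enumerate(sorted(data)))
print(round(ks, 4))  # 0.2668, answer D
```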
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 478 16.3. C. At each of the observed claim sizes, compute the values of the fitted LogNormal distribution. F(x) = Φ[{ln(x) − µ} / σ] = Φ[{ln(x) - 7.72} / 0.944]. So for example, F(410) = Φ((ln(410) - 7.72)/0.944) = Φ(-1.80) = 1 - 0.9641 = 0.0359. Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 5 or 10 comparisons.) The largest absolute difference is F(1924) - 20% = 0.4325 - 0.2 = 0.2325 = K-S statistic. X
(LN(X)-7.72)/.944
Fitted F(X)
410
-1.80
0.0359
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.0359 0.1641
0.2 0.2325 1924
-0.17
0.4325 0.0325 0.4 0.1675
2635
0.17
0.5675 0.0325 0.6 0.1704
4548
0.74
0.7704 0.0296 0.8 0.0554
6142
1.06
0.8554 0.1446 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 479 16.4. E. An Exponential Distribution with (inverse) scale parameter of 2, F(x) = 1 - e-2x. For the inversion method let y = F(x), 1 - y = e-2x , x= - {ln (1-y) } / 2. Thus for y = .280, .673, .372, .143, .961; x = .1643, .5589, .2326, .0772, 1.6221. Sort the values of x from lowest to highest. One then computes the values for the exponential distribution at these values of x, and compares them to the empirical distribution function. Arranging things in the following pattern aids the comparison, which should take place twice for each point, once “just before” the value of x and once “just after”. The maximum absolute difference occurs just after the third claim, at which the empirical distribution function is 3/5 = 0.6 but the fitted distribution is 0.372. The K-S statistic is this maximum absolute difference of 0.228.
x
Empirical Distribution Function
Fitted Distribution Function
Absolute Difference
0.000 0.143 0.0772
0.143 0.057 0.200 0.080
0.1643
0.280 0.120 0.400 0.028
0.2326
0.372 0.228 0.600 0.073
0.5589
0.673 0.127 0.800 0.161
1.6221
0.961 0.039 1.000
Comment: The method of inversion is discussed in “Mahlerʼs Guide to Simulation.”
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 480 16.5. E. A Pareto Distribution with α = 3 and θ = 10, F(x) = 1 - (1+ x/10)-3. For the inversion method let y = F(x), 1 - y = (1+ x/10)-3 . x = 10{ (1-y)-1/3 -1 }. Thus for y = .280, .673, .372, .143, .961; x = 1.157, 4.515, 1.677, 0.528, 19.488.
x
Empirical Distribution Function
Fitted Distribution Function
Absolute Difference
0.000 0.143 0.528
0.143 0.057 0.200 0.080
1.157
0.280 0.120 0.400 0.028
1.677
0.372 0.228 0.600 0.073
4.515
0.673 0.127 0.800 0.161
19.488
0.961 0.039 1.000
Comment: The method of inversion is discussed in “Mahlerʼs Guide to Simulation.” Note that by the definition of the inversion method the values of the distribution function at the five simulated claim sizes is given by the five random numbers. Thus performing the inversion was unnecessary; one need not even compute the values of x. One computes the K-S statistic by comparing the given random numbers, corresponding to the values of the distribution function, to the values of the empirical distribution, which at the observed claim sizes takes on the values 1/5, 2/5, 3/5, 4/5 and 1. Thus one compares the given random numbers (ordered from smallest to largest) to the uniform distribution. By this “trick” one sees that the K-S statistic is really independent of the particular distribution. If you do not follow this important idea, see the previous problem in which the method inversion is applied with the same random numbers, but to an Exponential Distribution; one gets the same answer for the K-S Statistic as here.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 481 16.6. E. The Distribution function is F(x) = x5 , 0 ≤ x ≤ 1. Note that one has to rank the sample data from smallest to largest in order to compute the empirical distribution.
x
Empirical Distribution Function
Fitted Distribution Function
Absolute Difference
0.000 0.031 0.500
0.031 0.135 0.167 0.051
0.650
0.116 0.217 0.333 0.165
0.700
0.168 0.332 0.500 0.263
0.750
0.237 0.429 0.667 0.339
0.800
0.328 0.506 0.833 0.243
0.900
0.590 0.410 1
16.7. C. For 6 data points, the critical values for the K-S stat. are 1.63 /
6 = 0.665 for 1%,
1.36 / 6 = 0.555 for 5%, 1.22/ 6 = 0.498 for 10%, and 1.07/ 6 = 0.437 for 20%. 0.498 < 0.506 < 0.555 . Thus one can reject at 10%, but does not reject at 5%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 482 16.8. D.
x
Empirical Distribution Function 0.000
Fitted Distribution Function
Absolute Difference 0.135
105,310
0.135 0.035 0.100 0.047
105,829
0.147 0.053 0.200 0.044
110,493
0.244 0.056 0.300 0.048
116,472
0.348 0.052 0.400 0.066
125,152
0.466 0.034 0.500 0.107
139,647
0.607 0.007 0.600 0.141
161,964
0.741 0.041 0.700 0.191
220,942
0.891 0.091 0.800 0.105
231,919
0.905 0.005 0.900 0.015
241,513
0.915 0.085 1.000
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 483 16.9. C. Let the five values from smallest to largest be: x1 , x2 , x3 , x4 , x5 . In order to compute the K-S Statistic, the first two comparisons we need to make are: |F(x1 ) - 0| and |F(x1 ) - 0.2|. At least one of these is greater than or equal to 0.1. If F(x1 ) = .1, then each of these absolute differences is .1. If F(x1 ) = .1, F(x2 ) = .3, F(x3 ) = .5, F(x4 ) = .7, and F(x5 ) = .9, then all of the comparisons result in 0.1. The smallest possible value of the Kolmogorov-Smirnov Statistic is 0.1. Comment: With N distinct values, the smallest possible K-S Statistic is 1/(2N). 16.10. E. For small x, the empirical distribution function is less than the fitted distribution. The left tail of the fitted distribution is too thick; #1 is true. For large x, the empirical distribution function is greater than the fitted distribution. For large x, the empirical survival function is less than the fitted survival function. The right tail of the fitted distribution is too thick; #2 is true. For some small x, the difference is about -.07. The K-S Statistic is the largest absolute value of the difference, which is about .07; #3 is false. Comment: Items 1 and 2 are similar to 4, 11/01, Q.6, involving p-p plots. The graph was based on 1000 simulated values from a Weibull with τ = 1/2 and θ = 1000, and a Lognormal Distribution fit to this simulated data via maximum likelihood. 16.11. D. As computed below, the K-S statistics are: .296, .260, .229, .202, .227. θ = 250, has the smallest and therefore the best K-S statistic. 190 X
Exponential Distribution
40
0.190
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.190 0.060
0.25 0.296 150
0.546 0.046 0.5 0.202
230
0.702 0.048 0.75 0.128
400
0.878 0.122 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 484 210 X
Exponential Distribution
40
0.173
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.173 0.077
0.25 0.260 150
0.510 0.010 0.5 0.166
230
0.666 0.084 0.75 0.101
400
0.851 0.149 1
230 X
Exponential Distribution
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.160
40
0.160 0.090 0.25 0.229
150
0.479 0.021 0.5 0.132
230
0.632 0.118 0.75 0.074
400
0.824 0.176 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 485 250 X
Exponential Distribution
40
0.148
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.148 0.102
0.25 0.201 150
0.451 0.049 0.5 0.101
230
0.601 0.149 0.75 0.048
400
0.798 0.202 1
270 X
Exponential Distribution
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.138
40
0.138 0.112 0.25 0.176
150
0.426 0.074 0.5 0.073
230
0.573 0.177 0.75 0.023
400
0.773 0.227 1
Comment: Note that the method of moments and maximum likelihood fit, θ = 205, does not have the best K-S statistic. Which θ is “best” depends on what criterion you use. 16.12. D. The Kolmogorov-Smirnov Statistic is the maximum distance from the x-axis, either above or below, of the difference curve, in this case about 0.18. 16.13. B. Examining graphs is a judgment-based approach, according to Prof. Klugman.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 486 16.14. D. The area under D(x), counting areas below the x-axis as negative, is the difference between the mean of the distribution and the sample mean. Therefore, when the mean of the distribution is greater than the sample mean, the area under the D(x) graph is positive, which is graph D. Alternately, D(x) = empirical distribution - assumed distribution. Therefore, for graph D, the sample has more probability for small x than does the assumed distribution. Therefore, the assumed distribution has more large values than the sample. Therefore, for graph D, the mean of the distribution is greater than the sample mean. Comment: A data set of size 1000 was simulated from a Beta Distribution with a = 2, b = 2, and θ = 1. Then the empirical distribution function for this data was compared to: A. Beta Distribution with a = 2, b = 2, and θ = 1, with mean 1/2. B. Beta Distribution with a = 1, b = 2, and θ = 1, with mean 1/3. C. Beta Distribution with a = 3, b = 3, and θ = 1, with mean 1/2. D. Beta Distribution with a = 2, b = 1, and θ = 1, with mean 2/3. E. Uniform Distribution from 0 to 1, a Beta Distribution with a = 1, b = 1, and θ = 1, with mean 1/2. In graphs A, C, and E, the area under D(x) is close to zero, and E[X] is approximately equal to X . In graph B, the area under D(x) is negative, and E[X] < X .
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 487 16.15. C. The K-S statistic is .2, from the absolute difference just before 6. X
Assumed F(X)
1
0.1
Empirical Distribution 0
Absolute Value of Assumed - Empirical 0.1 0.1
0.2 0.1 3
0.3 0.1 0.4 0.2
6
0.6 0.1 0.7 0.0
7
0.7 0.1 0.8 0.1
9
0.9 0.1 1
16.16. B. & 16.17. C. For the Exponential Distribution, maximum likelihood equals method of moments: θ^ = (700 + 1200 + 2000 + 4600 + 6000 + 9500)/6 = 4000. D(x) = empirical distribution - fitted/assumed distribution. D(5999) = 4/6 - (1 - e-5999/4000) = 2/3 - .7768 = -0.110. D(6000) = 5/6 - (1 - e-6000/4000) = 5/6 - .7769 = 0.056. Comment: The empirical distribution function, and thus D(x), has a jump discontinuity at each of the observed values.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 488 16.18. C. The distribution function is: F(x) = x2 /36, 0 < x < 6. F(2) = 1/9. F(4) = 4/9. The Empirical Distribution Function is 0.4 at 2 and 1 at 4. Thus the four comparisons are: |1/9 - 0| = 1/9, |1/9 - 0.4| = 0.2889, |4/9 - 0.4| = 0.0444, |4/9 - 1| = 0.5556. Thus the K-S statistic is 0.5556. α
0.20 |
c
0.10 |
1.07/ n
0.05 |
.025
0.01
|
1.22/ n
1.36/ n
| 1.48/ n
1.63/ n
Customizing the above Table for n = 5, the 10% critical value is: 1.22/ 5 = 0.5456. The 5% critical value is: 1.36/ 5 = 0.6082. 0.5456 < 0.5556 < 0.6082. ⇒ Reject at 10% but not at 5%. Comment: The probability of drawing a sample of size 5 from a continuous distribution and getting only two different values is virtually zero. 16.19. E. At 30, the empirical distribution function is 3/4. D(30) = 3/4 - F(30). Based on the graph D(30) ≅ -0.14, so F(30) = 0.75 + 0.15 = 0.90. For the Uniform Distribution from 0 to 60, F(30) = 1/2. For the Exponential Distribution with θ = 25, F(30) = 1 - e-30/25 = 0.70. For the Weibull Distribution with τ = 0.4 and θ = 7.5, F(30) = 1 - exp[-(30/7.5)0.4] = 0.82. For the Pareto Distribution with α = 4 and θ = 75, F(30) = 1 - (75/105)4 = 0.74. For the LogNormal Distribution with µ = 2 and σ = 1.1, F(30) = Φ[
ln(30) - 2 ] = Φ[1.27] = 0.898. 1.1
Thus of the choices, F(x) is the LogNormal Distribution. Comment: There are other places one can check, to eliminate the choices other than E. x Empirical Distribution Function LogNormal Distribution D(x) 1010
0 0.25
0.608 0.608
-0.608 -0.358
2020
0.25 0.50
0.817 0.817
-0.567 -0.317
3030
0.50 0.75
0.899 0.899
-0.399 -0.149
4040
0.75 1.00
0.938 0.938
-0.188 -0.062
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 489 16.20. D. f(x) = (2/9)x ⇒ F(x) = x2 /9, 0 ≤ x ≤ 3. At each of the observed claim sizes, compute the values of this distribution. Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.) The largest absolute difference is: 1 - F(2.75) = 1 - 0.8403 = 0.1597 = K-S statistic. X
Fitted F(X)
1
0.1111
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.1111 0.0889
0.2 0.0500 1.5
0.2500 0.1500 0.4 0.0444
2
0.4444 0.1556 0.6 0.0944
2.5
0.6944 0.1056 0.8 0.0403
2.75
0.8403 0.1597 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 490 16.21. C. For θ = 0, f(x) = 2x and thus F(x) = x2 . At each of the observed claim sizes, compute the values of this distribution. Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 4, or 8 comparisons.) The largest absolute difference is: 0.75 -F(0.55) = 0.75 - 0.3025 = 0.4475 = K-S statistic. X
Fitted F(X)
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.2025
0.45
0.2025 0.0475 0.25 0.0000
0.5
0.2500 0.2500 0.5 0.1975
0.55
0.3025 0.4475 0.75 0.1875
0.75
0.5625 0.4375 1
16.22. B. Just before 5, the empirical distribution function is 0 and F(5) = 5/10 = 1/2. Absolute difference is 1/2. At 5, the empirical distribution function is 1/2 and F(5) = 5/10 = 1/2. Absolute difference is 0. Just before 9, the empirical distribution function is 1/2 and F(5) = 9/10 = .9. Absolute difference is .4. At 9, the empirical distribution function is 1 and F(5) = 9/10 = .9. Absolute difference is .1. Kolmogorov-Smirnov statistic is 1/2.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 491 16.23. E. At each of the observed claim sizes, compute the values of the fitted distribution: F(x) = xθ = x3 . Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.) The largest absolute difference is: 1 - F(0.8) = 1 - 0.512 = 0.488. X
Fitted F(X)
0.5
0.1250
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.1250 0.0750
0.2 0.0746 0.65
0.2746 0.1254 0.4 0.0570
0.7
0.3430 0.2570 0.6 0.1781
0.75
0.4219 0.3781 0.8 0.2880
0.8
0.5120 0.4880 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 492 16.24. B. Since we have 4 points, the critical values are 1.22/ 4 = 0.610 and 1.63 4 4 = 0.815. By integrating the density function, the distribution function is: x2 / 4 for 0 ≤ x ≤2. One compares the Empirical and Fitted Distribution Functions just before and just after each observed data point:
x
Empirical Distribution Function
Fitted Distribution Function
Absolute Difference
0.000 0.640 1.600
0.640 0.140 0.500 0.223
1.700
0.723 0.028 0.750 0.152
1.900
0.902 0.098 1.000
The largest absolute difference of .64 occurs just before x = 1.6. F(1.59999) - 0 = 0.64. We compare the K-S statistic of 0.64 to the critical values: 0.64 > 0.61, so that we reject at 10%. 0.64 ≤ 0.815 so we do not reject at 1%. Comment: Choice C is impossible; whenever one rejects at the 1% significance level one automatically also rejects at the 10% significance level. Note that since 1.6 occurs twice in the sample, the Empirical Distribution Function jumps from 0 to 2/4 = .5 at x = 1.6, (we get a double jump,) and we have to perform only 6 rather than 8 comparisons.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 493 16.25. D. Note that since 5 appears twice in the data set, the empirical distribution jumps from 3/9 to 5/9. t
Assumed F(t)
1
0.0909
Empirical Distribution 0
Absolute Value of Assumed - Empirical 0.0909 0.0202
0.1111 0.0707 2
0.1818 0.0404 0.2222 0.1414
4
0.3636 0.0303 0.3333 0.1212
5
0.4545 0.1010 0.5556 0.0808
7
0.6364 0.0303 0.6667 0.0606
8
0.7273 0.0505 0.7778 0.0404
9
0.8182 0.1818 1
16.26. B. 1. False. The Chi-Square Statistic has a number of degrees of freedom equal to 4 - 1 = 3. 2. True. 3. False. The K-S stat. is computed on the ungrouped data. Since there are 25 losses and we usually have to test just before and just after each point, we would expect to have to test 50 values. Comment: If the Pareto was fit to the data, then the degrees of freedom for the Chi-Square would be two less for two fitted parameters: 4 - 1 - 2 = 1. It is not totally clear from the question whether one has the 25 ungrouped losses. In any case, one should not apply the K-S test to grouped data; thus statement #3 is still not true.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 494 16.27. E. For the Pareto Distribution, F(x) = 1 - (θ/(θ+x))α. For α = 1 and θ = 400, F(x) = 1 - {400/(400+x)} = x / (400 + x). In order to compute the Kolmogorov-Smirnov statistic, one must compare the Empirical and Fitted distributions “just before” and “just after” the observed points. In this case the maximum absolute difference is 0.34 and occurs just after the fifth observed point. The fitted F(775) = .6596, while the empirical distribution function is 1. X
Fitted F(X)
43
0.0971
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.0971 0.1029
0.2 0.0661 145
0.2661 0.1339 0.4 0.0319
233
0.3681 0.2319 0.6 0.1025
396
0.4975 0.3025 0.8 0.1404
775
0.6596 0.3404 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 495 16.28. E. F(x) = (x + x2 )/2, 0 ≤ x ≤1. In order to compute the Kolmogorov-Smirnov statistic, one must compare the Empirical and Fitted distributions “just before” and “just after” the observed points. In this case the maximum absolute difference is .320 and occurs just before the third observed point. The fitted F(0.8) = 0.720, or if you prefer F(0.799999) = 0.720, while the empirical distribution function is 2/5 = 0.4. X
Fitted F(X)
0.1
0.055
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.055 0.145
0.2 0.080 0.4
0.280 0.120 0.4 0.320
0.8
0.720 0.080 0.8 0.055
0.9
0.855 0.145 1
Comment: Since the point 0.8 appears twice in the data, there are 2 x 4 = 8 comparisons (rather than 2 times 5 = 10 comparisons) and the empirical distribution function jumps from 2/5 = 0.4 to 4/5 = 0.8 at 0.8 (rather than a jump of 1/5.)
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 496 16.29. E. At x = 5 the fitted distribution is 5/20 = 0.25 while the empirical distribution function is 0.5. At x = 15 the fitted distribution is 15/20 = 0.75 while the empirical distribution function is 1. The K-S Statistic is maximum over all x of: | empirical distribution - fitted distribution | = |0.5-0.25| = |1-0.75| = 0.25. The absolute difference attains it maximum at x = 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,and 15. The easiest way to see this is via a graph. Hereʼs a graph of the empirical and assumed uniform distribution functions:
Hereʼs a graph of the difference between the empirical distribution function and the assumed uniform distribution function:
One can also compare the assumed and empirical distributions just before and just after each data point. A portion of that calculation can be seen below:
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 497
x
Empirical Distribution Function
Assumed Distribution Function
Absolute Difference
0.400 0.175 4.5
0.225 0.225 0.450 0.200
5
0.250 0.250 0.500 0.200
6
0.300 0.250 0.550
x
Empirical Distribution Function
Assumed Distribution Function
Absolute Difference
0.900 0.200 14
0.700 0.250 0.950 0.200
15
0.750 0.250 1.000 0.200
16
0.800 0.200 1.000
16.30. D. For 20 observations the critical values are 1.07 / significance level critical value
0.20 0.239
0.10 0.273
20 , etc.: 0.05 0.304
0.01 0.364
One compares the K-S Statistic from the previous question of 0.25 to these critical values. Since 0.25 > 0.239 we reject at the 0.20 significance level. Since 0.25 ≤ 0.273 we do not reject at the 0.10 significance level.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 498 16.31. A. The Method of Maximum Likelihood applied to the LogNormal is the same as the Method of Maximum Likelihood applied to the underlying Normal Distribution. In turn the Method of Maximum Likelihood applied to the Normal is the same as the Method of Moments. The average of the log claim sizes is 6.9078 and the second moment of the log claim sizes is 50.03. Thus the variance of the log claim sizes is: 50.03 - (6.90782 ) = 2.312. σ2 = 2.312, thus σ = 1.521. x
ln(x)
square of ln(x)
100 500 1000 2000 10000
4.6052 6.2146 6.9078 7.6009 9.2103
21.2076 38.6214 47.7171 57.7737 84.8304
6.9078
50.0300
Alternately, for a LogNormal Distribution, the log density is: ln f(x) = -.5 ({ln(x)−µ} /σ)2 - ln(x) - ln(σ) - .5 ln(2π). The loglikelihood for n points xi is: Σ {-.5 ({ln(xi)−µ} /σ)2 - ln(xi)} - n ln(σ) - .5n ln(2π). Taking the partial derivatives with respect to µ and σ and setting them equal to zero: 0 = Σ{ln(xi)−µ} /σ2, or Σln(xi) = nµ. 0 = Σ {ln(xi)−µ}2 /σ3 - n/σ, or Σ {ln(xi)−µ}2 = nσ2. Thus µ = (1/n)Σ ln(xi) and σ2 = (1/n)Σ {ln(xi)−µ}2 . Then proceed as above.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 499 16.32. E. The maximum likelihood estimate is µ = 6.908 and σ = 1.521. (This was the solution to the previous exam question.) At each of the observed claim sizes, compute the values of the fitted LogNormal distribution: F(x) = Φ[{ln(x)−µ} / σ] . So for example, F(2000) = Φ[{ln(x)−µ} / σ] = Φ[{ln(2000)−6.908} / 1.521] = Φ[.46] = .6772. Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.) The largest absolute difference is 0.1345 = K-S statistic. X
ln(x)
(ln(x) - mu)/sigma
Fitted F(X)
Empirical Distribution 0
Empirical - Observed 0.0655
100
4.605
-1.51
0.0655 0.1345 0.2 0.1228
500
6.215
-0.46
0.3228 0.0772 0.4 0.1000
1000
6.908
-0.00
0.5000 0.1000 0.6 0.0772
2000
7.601
0.46
0.6772 0.1228 0.8 0.1345
10000
9.210
1.51
0.9345 0.0655 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 500 16.33. D. F(x) = e-θ/x = e-2/x, an Inverse Exponential Distribution. The K-S statistic is 0.1679. X
Fitted F(X)
1
0.1353
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.1353 0.0647
0.2 0.1679 2
0.3679 0.0321 0.4 0.1134
3
0.5134 0.0866 0.6 0.0703
5
0.6703 0.1297 0.8 0.0574
13
0.8574 0.1426 1
Comment: We are told to test the goodness of fit of f(x l θ = 2), therefore, one makes no use of g(θ). Rather than compute a mixed distribution over all θ, θ is set equal to 2. The information on g(θ) was used to answer the previous exam question, 4, 5/00, Q.10.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 501 16.34. E. By integrating the density, F(x) = 1 - (1+x)-2. The K-S statistic is the largest absolute difference: 0.189. X
Assumed F(X)
0.1
0.1736
Empirical Distribution 0
Absolute Value of Assumed - Empirical 0.1736 0.0264
0.2 0.1056 0.2
0.3056 0.0944 0.4 0.1556
0.5
0.5556 0.0444 0.6 0.1500
1
0.7500 0.0500 0.8 0.0110
1.3
0.8110 0.1890 1
Comment: This is a Pareto Distribution with α = 2 and θ = 1.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 502 16.35. E. θ = observed mean = (29 + 64 + 90 + 135 + 182)/5 = 100. At each of the observed claim sizes, compute the values of the fitted Exponential distribution: F(x) = 1 - e-x/100. Then compare each fitted probability to the empirical distribution function just before and just after each observed claim value. (Thus there are twice 5, or 10 comparisons.) The largest absolute difference is F(64) - 0.2 = 0.4727 - 0.2 = 0.2727 = K-S statistic. X
Fitted F(X)
29
0.2517
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.2517 0.0517
0.2 0.2727 64
0.4727 0.0727 0.4 0.1934
90
0.5934 0.0066 0.6 0.1408
135
0.7408 0.0592 0.8 0.0380
182
0.8380 0.1620 1
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 503 16.36. E. F(x) = 1 - 1/(1 + x)4 , x >0.
x
Empirical Distribution Function
Assumed Distribution Function
Absolute Difference
0.0 0.317 0.1
0.317 0.117 0.2 0.318
0.2
0.518 0.118 0.4 0.402
0.5
0.802 0.202 0.6 0.280
0.7
0.880 0.080 0.8 0.164
1.3
0.964 0.036 1.0
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 504 16.37. D. At each of the observed values, compute the values of the assumed distribution: F(x) = 1 - 1/(1 + x)4 . Then compare each to the empirical distribution function just before and just after each observed value. The largest absolute difference is: |F(0.7) - 0.2| = |0.8803 - 0.2| = 0.6803. X
Assumed F(X)
Empirical Distribution 0
Absolute Value of Assumed - Empirical 0.5177
0.2
0.5177 0.3177 0.2 0.6803
0.7
0.8803 0.4803 0.4 0.5233
0.9
0.9233 0.3233 0.6 0.3486
1.1
0.9486 0.1486 0.8 0.1643
1.3
0.9643 0.0357 1
The critical values are: 1.22/ 5 = 0.546, 0.608, 0.662, 0.729. 0.662 < 0.680 < 0.729. ⇒ Reject at 2.5% and do not reject at 1%.
2013-4-6, Fitting Loss Distributions §16 K-S Test Basics, HCM 10/14/12, Page 505 16.38. D. We take all the absolute differences |Fn (x) - F*(x)| and |Fn (x-) - F*(x)|, and find the largest one. The K-S statistic is 0.111. x
Absolute difference
Fn(x-)
F*(x)
Fn(x)
Absolute difference
4.26
0.0790
0.505
0.584
0.510
0.0740
4.30
0.0890
0.510
0.599
0.515
0.0840
4.35
0.0980
0.515
0.613
0.520
0.0930
4.36
0.1010
0.520
0.621
0.525
0.0960
4.39
0.1110
0.525
0.636
0.530
0.1060
4.42
0.1080
0.530
0.638
0.535
0.1030
Customize the table for n = 200: α
10%
5%
2%
1%
c 0.086 0.096 0.107 0.115 Since 0.107 < 0.111 < 0.115, we reject H0 at 2% and do not reject at 1%. Comment: I have no idea why the question said “the fit of the natural logarithms of n = 200 losses to a distribution.” I think it meant to say, “the fit of n = 200 losses to a distribution.” F*(x) is the assumed distribution, Fn (x-) is the empirical distribution function just before x, and Fn (x) is the empirical distribution function at x. See Table 16.3 in Loss Models. We only look between 4.26 and 4.42, since we are given that the maximum absolute difference occurs in that region; we are not given any information on what happens outside that region.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 506
Section 17, Kolmogorov-Smirnov Test, Advanced In this section, additional ideas related to the Kolmogorov-Smirnov Statistic are discussed: how to compute the K-S Statistic when data is truncated and/or censored, confidence intervals for the underlying distribution function, and how to get bounds on the K-S Statistic when one has grouped data. Censoring: One can also compute the Kolmogorov-Smirnov Statistic in the case of censoring from above. Let u be the censorship point from above. Then the Kolmogorov-Smirnov Statistic is defined as: D = Max | ( empirical distrib. function at x) - (theoretical distrib. function at x) | x≤u
Exercise: An Exponential distribution with θ = 1000 is compared to the following data set censored from above at 2000: 197, 325, 981, 2000. What is the value of the Kolmogorov-Smirnov (K-S) statistic? [Solution: The K-S statistic is 0.2225, from the absolute difference at or just after 325. X
Assumed F(X)
Empirical Distribution 0
Absolute Value of Assumed - Empirical 0.1788
197
0.1788 0.0712 0.25 0.0275
325
0.2775 0.2225 0.5 0.1251
981
0.6251 0.1249 0.75 0.1147
2000
0.8647
Comment: In this case the K-S statistic was the same as in a similar exercise without censorship. If prior to censorship the maximum departure had occurred for large losses, the K-S statistic would have been smaller after censorship.] With censoring from above, we only make comparisons below the censorship point, including one just before the censorship point.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 507 At the censorship point, both the empirical distribution and the fitted/assumed distribution after adjusting it for censorship, are unity; the difference is zero. In the presence of censoring from above, the critical values should be somewhat smaller; because we are unable to make comparisons above the censorship point, there is less opportunity for random fluctuation to produce a large K-S statistic.223 The assumption we are testing is that the unlimited losses came from a certain distribution. When we have data censored from above, the distribution function jumps to 1 at the censorship point. For example, with a policy limit of 25,000, losses of size 25,000 or greater all appear in our data base at 25,000. If for example the unlimited data followed an Exponential Distribution with mean 100,000, F(25000) for this Exponential Distribution is: 1 - e-0.25 = 0.221. Yet the empirical distribution function based on data simulated from this Exponential censored from above at 25,000 would be one at 25,000. We would expect to see about 78% of the losses listed in our simulated censored data base as 25,000! A comparison at 25,000 would tell us nothing about whether our assumption is true. It would be inappropriate to compare 1 to F(25000) for a distribution such as the Exponential that has not been altered for censoring.224 Truncation: One can also compute the Kolmogorov-Smirnov Statistic in the case of truncation from below. Let t be the truncation point from below. Then the Kolmogorov-Smirnov Statistic is defined as: D = Max | ( empirical distrib. function at x) - (theoretical distrib. function at x) | t≤x
Where the theoretical distribution has been altered for the effects of truncation. 225
223
See page 450 of Loss Models and “Mahlerʼs Guide to Simulation.” One could compare 1 to 1, the theoretical distribution at the censorship point after adjusting for the effects of censoring. This is similar to comparing the empirical and theoretical distributions at infinity in the absence of censoring. Taking the absolute difference between 1 and 1, would not affect the Kolmogorov-Smirnov Statistic. 225 See “Mahlerʼs Guide to Loss Distributions,” or page 442 of Loss Models. 224
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 508 Exercise: Losses prior to truncation are assumed to follow an Exponential distribution with θ = 1000. This assumption is compared to the following data set truncated from below at 250: 325, 981, 2497. What is the value of the Kolmogorov-Smirnov (K-S) statistic? [Solution: After truncation from below at 250, G(x) =
F(x) - F(250) e- 250 / 1000 - e - x / 1000 = S(250) e- 250 / 1000
= 1 - e-(x-250)/1000, x > 250. X
Assumed Dist. after Truncation
325
0.0723
Empirical Distribution 0
Absolute Value of Assumed - Empirical 0.0723 0.2611
0.3333 0.1852 981
0.5186 0.1481 0.6667 0.2276
2497
0.8943 0.1057 1
The K-S statistic is 0.2611, from the absolute difference at or just after 325.]
Truncation and Censoring: Let t be the truncation point from below, and u be the censorship point from above. F* is the fitted/assumed distribution, after altering a ground-up, unlimited size of loss distribution for the affects of any truncation from below.226 Fn is the empirical distribution. The Kolmogorov-Smirnov Statistic is defined as:227 D = Max | Fn (x) - F*(x) | t≤x≤u
In the absence of truncation and censoring, this reduces to the previously discussed definition of the Kolmogorov-Smirnov Statistic.
226 227
What I have previously called G(x). See page 448 in Loss Models.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 509 Exercise: Losses prior to truncation and censoring are assumed to follow a Weibull distribution with θ = 3000 and τ = 1/2. This assumption is compared to the following data set truncated from below at 1000 and censored from above at 10,000: 1219, 1737, 2618, 3482, 4825, 6011, 10,000, 10,000, 10,000, 10,000. What is the value of the Kolmogorov-Smirnov (K-S) statistic? [Solution: After truncation from below at 1000, F*(x) =
F(x) - F(1000) exp[- 1000 / 3000 ] - exp[- x / 3000 ] = S(1000) exp[- 1000 / 3000 ]
= 1 - 1.7813 exp[- x / 3000 ], 1000 < x < 10,000. X
Assumed Dist. after Truncation
1219
0.0583
Empirical Distribution 0.0
Absolute Value of Assumed - Empirical 0.0583 0.0417
0.1 0.0677 1737
0.1677 0.0323 0.2 0.1001
2618
0.3001 0.0001 0.3 0.0935
3482
0.3935 0.0065 0.4 0.0989
4825
0.4989 0.0011 0.5 0.0675
6011
0.5675 0.0325 0.6 0.1130
10000
0.7130
The K-S statistic is 0.1130, from the absolute difference just before 10,000.]
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 510 Confidence Interval for the Distribution Function: The Kolmogorov-Smirnov Statistic can be used to place error bars around the empirical distribution. In this case of the ungrouped data from Section 2 with 130 data points, the critical value for 1% is 0.143. Prob[Max | Empirical Distribution - Actual Underlying Distribution Function | > 0.143] = 1%. There is a 99% probability that the actual distribution function from which this data was drawn is within ±0.143 of the empirical distribution function, for all x. Below are shown bands of ±0.143 around the empirical distribution function (thick line). 1
0.8
0.6
0.4
0.2
1000
10,000
100,000
1 million
x
The bands are restricted to never go below 0 or above 1, as for any distribution function. 99% of the time, the distribution will remain everywhere inside these bands. Only 1 time in a 100 will the true distribution lie outside for any size of loss.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 511 One could get a confidence interval for the distribution function at one specific size of loss, by using the variance of the empirical distribution function. However, this confidence band gotten via the K-S Statistic applies to all sizes of loss simultaneously. As shown below, the 10% critical value of 0.107 would lead to narrower error bars that would contain the true distribution 100% - 10% = 90% of the time: 1
0.8
0.6
0.4
0.2
1000
10,000
100,000
1 million
x
In general, if the Kolmogorov-Smirnov critical value is c for a significance level of α, then with a probability of 1 - α: {empirical distribution function - c} ≤ F(x) ≤ {empirical distribution function + c} For the ungrouped data in Section 2, there are 130 claims. Therefore, the critical value for the K-S Statistic at 20% is: 1.07 / 130 = 9.38%. Thus at any point the actual distribution function F(x) underlying the risk process that presumably generated this data, whatever it is, has a 80% chance of being within ±9.38% of the observed value. Note that this result does not depend on any assumption about the form of F(x)! For example, the empirical distribution function for the ungrouped data in Section 2 is 60.00% at 150,000.228 Thus there is an 80% chance that F(150000) is within the interval 60.00% ± 9.38% or [0.5062, 0.6938]. Similarly, the empirical distribution function is 94.62% at 1 million.229 Thus there is an 80% chance that F(1 million) is within the interval 94.62% ± 9.38%. 228 229
Of the 130 claims in Section 2, 78 are less than or equal to 150,000. 78/130 = 0.6. Of the 130 claims in Section 2, 123 are less than or equal to 1,000,000. 123/130 = 0.9462.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 512 However, since any distribution function never exceeds unity, the 80% confidence interval for F(1 million) is: [0.8524, 1]. The same adjustment would need to be made for confidence intervals that would go below zero, since the distribution function is always greater than or equal to zero. Below are shown the 99% confidence intervals, as well as the maximum likelihood Exponential Distribution (dashed). Since the Exponential Distribution gets outside of the 99% confidence bands, we reject it at 1%.230 1
0.8
0.6
0.4
0.2
1000
10,000
100,000
1 million
x
Exercise: You observe a set of 1000 claims. Ranked from smallest to largest, the 721st claim is $24,475, while the 722nd claim is $25,050. Using the K-S Statistic, determine a 95% confidence interval for the value of the underlying distribution function at $25,000. [Solution: The observed value of the distribution function at $25000 is: 721 / 1000 = 0.721. The critical value for the K-S statistic at 5% is: 1.36/ 1000 = 0.043. So the 95% confidence interval is: 0.721 ± 0.043 = (0.678, 0.764). ] To get a confidence interval around the empirical distribution function for the underlying distribution from which the data was drawn: 80% confidence interval ⇔ ± 20% critical value of K-S Statistic 90% confidence interval ⇔ ± 10% critical value of K-S Statistic 95% confidence interval ⇔ ± 5% critical value of K-S Statistic 99% confidence interval ⇔ ± 1% critical value of K-S Statistic Restrict the interval to be at least 0 and no more than 1. 230
This is just the same information as presented in a previous difference graph, arranged in a different manner.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 513 This result of using the K-S Statistic to get confidence intervals for the Distribution Function is very powerful. It has great practical value when one is dealing with the center of the Distribution function or one has many data points. For example, with 10,000 data points the critical value for a 1% significance level is 1.63%. So one can construct narrow bands around the empirical distribution function in this case. Another way to think about it, with enough data there is sometimes no reason to fit a distribution rather than just use the empirical distribution. In contrast, for the ungrouped data in Section 2, in the tail of the distribution the error bars constructed by the Kolmogorov-Smirnov Statistic are not useful. Grouped Data: The Kolmogorov-Smirnov Distribution is designed to work with individual data, in other words ungrouped data. However, for grouped data one can compute the maximum absolute difference between the fitted and empirical distributions. While this maximum discrepancy between the two distributions is useful to test the relative closeness of the different fits, the same statistical test for the Kolmogorov-Smirnov Statistic as applied with ungrouped data does not apply, since one is only testing the difference at the endpoints of the intervals.231 For example for the Burr distribution232 fit via maximum likelihood to the grouped data in Section 3, the maximum absolute difference between fitted and empirical distribution functions is .0014 and is calculated as follows: Bottom of Interval $ Thous. 0 5 10 15 20 25 50 75 100
Top of Interval $ Thous. 5 10 15 20 25 50 75 100 Infinity
# claims in the Interval 2208 2247 1701 1220 799 1481 254 57 33
Burr F(upper)
Empirical F(upper)
Absolute Difference
0.2202 0.4464 0.6170 0.7364 0.8175 0.9652 0.9909 0.9970 1.0000
0.2208 0.4455 0.6156 0.7376 0.8175 0.9656 0.9910 0.9967 1.0000
0.0006 0.0009 0.0014 0.0012 0.0000 0.0004 0.0001 0.0003 0.0000
10000
231 232
As the intervals become narrower the distinction between grouped and ungrouped data is less important. The Burr Distribution, F(x) = 1- (1/(1+(x/θ)γ))α, fit via maximum likelihood has parameters:
α = 3.9913, θ = 40,467 and γ = 1.3124.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 514 For the curves fit via Maximum Likelihood to the grouped data in Section 3 the results are:233 Maximum Absolute Difference Fitted vs. Empirical Distribution Burr 0.0014 Generalized Pareto 0.0037 Transformed Gamma 0.0063 Gamma 0.0171 Weibull 0.0199 LogNormal 0.0274 The Burr distribution is once again the best fit to the grouped data in Section 3, while none of the two parameter curves fit this grouped data well. When a curve fits as badly as the Weibull, one can in fact reject the fit. The Kolmogorov-Smirnov Statistic is at least 0.0199, since the KolmogorovSmirnov Statistic would be the maximum, taken over more points, of the absolute difference. Therefore, since the critical value at a 1% significance level for 10,000 points is 0.0163, we can in fact reject the Weibull at a 1% significance level. In any particular situation involving grouped data, one can put bounds on the K-S Statistic. The K-S Statistic is at least the maximum absolute discrepancy between the empirical and theoretical Distribution Functions. For example, in the case of the Burr, the K-S Statistic is at least 0.0014 as shown above.
233
See the Section on fitting via maximum likelihood to grouped data, for the parameters of the fitted distributions.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 515 One can also put an upper bound on the K-S Statistic, by putting bounds on what the Empirical Distribution Function would look like if the data had been ungrouped. For example in the case of the Burr, by checking just before and just after each endpoint:234 235 x .001 4999.999 5000.001 9999.999 10,000.001 14,999.999 15,000.001 19999.999 20,000.001 24,999.999 25,000.001 49999.999 50,000.001 74,999.999 75,000.001 99,999.999 100,000.001 1 trillion236
Empirical Distribution Maximum Minimum .2208 0 .2208 0 .4455 .2208 .4455 .2208 .6156 .4455 .6156 .4455 .7376 .6156 .7376 .6156 .8175 .7376 .8175 .7376 .9656 .8175 .9656 .8175 .9910 .9656 .9910 .9656 .9967 .9910 .9967 .9910 1.0000 .9967 1.0000 .9967
Theoretical Distribution 0 .2202 .2202 .4464 .4464 .6170 .6170 .7364 .7364 .8175 .8175 .9652 .9652 .9909 .9909 .9970 .9970 1.0000
Absolute Differences Theoretical vs. Empirical .2208 0 .0006 .2202 .2253 .0006 .0009 .2256 .1692 .0009 .0014 .1715 .1206 .0014 .0012 .1208 .0811 .0012 .0000 .0801 .1481 .0000 .0004 .1477 .0258 .0004 .0001 .0253 .0058 .0001 .0003 .0060 .0030 .0003 .0000 .0033
This results in absolute differences of up to 0.2256, resulting from the maximum possible empirical distribution just below 10,000 of 0.2208 versus the theoretical distribution there of 0.4464. One can arrange this calculation in the form of a spreadsheet. It makes use of the same values for the theoretical and empirical distribution functions as used in the previous spreadsheet used to compute the lower bound for the K-S Statistic. However, now one offsets the Empirical Distribution Function both up a row and down a row compared to the Theoretical Distribution Function. Then one has to compute two columns of absolute differences and search for the largest value in either of the two columns.
The Burr Distribution, F(x) = 1 - 1/{1+(x/θ)γ}α, fit via maximum likelihood has parameters: α = 3.9913, θ = 40,467 and γ = 1.3124. 235 While the computation is somewhat mind numbing, some clever people can pick out the likely placed where the largest values will result. In real world applications one would have this computation programmed, while on the exam one would hopefully have very few intervals. 236 Any very large number such that the theoretical distribution function is very close to unity. 234
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 516 The spreadsheet to calculate the maximum possible K-S Statistic looks as follows for the maximum likelihood Burr Distribution237 versus the Grouped Data from Section 3: Endpoint of Interval $ Thous. 0 5 10 15 20 25 50 75 100 Infinity
Absolute Difference For Maximum K-S Stat. 0.2202 0.2256 0.1715 0.1208 0.0799 0.1477 0.0253 0.0060 0.0033
Empirical Distribution
0.0000 0.2208 0.4455 0.6156 0.7376 0.8175 0.9656 0.9910 0.9967
Burr
Empirical Distribution
Absolute Difference For Maximum K-S Stat.
0.0000 0.2202 0.4464 0.6170 0.7364 0.8175 0.9652 0.9909 0.9970 1.0000
0.2208 0.4455 0.6156 0.7376 0.8175 0.9656 0.9910 0.9967 1.0000
0.2208 0.2253 0.1692 0.1206 0.0811 0.1481 0.0258 0.0058 0.0030
Based on the above spreadsheet and the previous spreadsheet, in the case of the Burr, the Kolmogorov-Smirnov statistic is at least 0.0014 and at most 0.2256. If there were many narrow intervals, this technique would allow one to get a fairly good estimate of the K-S statistic. The same computation can be understood graphically. Here the Burr Distribution (dashed) is compared to the minimum possible empirical distribution function (solid), which assumes that in each interval all of the claims occur at the upper end: Prob. 1.0 0.8 0.6 0.4 0.2
20000
40000
60000
80000
100000
The Burr Distribution, F(x) = 1 - {1/(1+(x/θ)γ)}α, fit via maximum likelihood has parameters: α = 3.9913, θ = 40,467 and γ = 1.3124. 237
size
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 517 Here the Burr Distribution (dashed) is compared to the maximum possible empirical distribution function (solid), which assumes that in each interval all of the claims occur at the lower end: Prob. 1.0 0.8 0.6 0.4 0.2
20000
40000
60000
80000
100000
size
Here are the differences between the fitted Burr Distribution minus the minimum possible Empirical Distribution Function (above the x-axis), and the fitted Burr Distribution minus the maximum possible Empirical Distribution Function (below the x-axis):
0.2
0.1
20000
40000
60000
80000
100000
- 0.1
- 0.2 The maximum absolute difference, at the marked point (10000, 0.2256), corresponds to the upper bound on the K-S statistic of 0.2256, computed previously.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 518 Exercise: Compute the bounds on the Kolmogorov-Smirnov statistic for the Weibull distribution fit via maximum likelihood to the Grouped Data in Section 3. Note this Weibull has parameters: θ = 16,184 and τ = 1.0997. [Solution: Bottom of Interval $ Thous. 0 5 10 15 20 25 50 75 100
Top of Interval $ Thous. 5 10 15 20 25 50 75 100 Infinity
# claims in the Interval 2208 2247 1701 1220 799 1481 254 57 33
Weibull F(upper)
Empirical F(upper)
Absolute Difference
0.2403 0.4451 0.6014 0.7170 0.8007 0.9685 0.9955 0.9994 1.0000
0.2208 0.4455 0.6156 0.7376 0.8175 0.9656 0.9910 0.9967 1.0000
0.0195 0.0004 0.0142 0.0206 0.0168 0.0029 0.0045 0.0027 0.0000
10000
Thus the K-S Statistic is at least 0.0206. Endpoint of Interval $ Thous. 0 5 10 15 20 25 50 75 100 Infinity
Absolute Difference For Maximum K-S Stat.
Empirical Distribution
0.2403 0.2243 0.1559 0.1014 0.0631 0.1510 0.0299 0.0084 0.0033
0.0000 0.2208 0.4455 0.6156 0.7376 0.8175 0.9656 0.9910 0.9967
Weibull
Empirical Distribution
Absolute Difference For Maximum K-S Stat.
0.0000 0.2403 0.4451 0.6014 0.7170 0.8007 0.9685 0.9955 0.9994 1.0000
0.2208 0.4455 0.6156 0.7376 0.8175 0.9656 0.9910 0.9967 1.0000
0.2208 0.2052 0.1705 0.1362 0.1005 0.1649 0.0225 0.0012 0.0006
Thus the K-S Statistic is at most 0.2403.] Loss Models states that the Kolmogorov-Smirnov test should only be used on individual data, in other words on ungrouped data. However, as has been discussed, if one has grouped data, a maximum and minimum for the Kolmogorov-Smirnov statistic can be determined.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 519 In some situations, such as when one has many narrow intervals and a large amount of data, one can usefully apply a statistical hypothesis using the same Kolmogorov-Smirnov critical values as used for ungrouped data. However, usually one does not apply the Kolmogorov-Smirnov test to grouped data. One could run a simulation in order to determine critical values to apply to a particular grouped data situation with many narrow intervals, that an actuary might be frequently encountering.238 Such a test, using the maximum observed absolute difference, would not be called the Kolmogorov-Smirnov test, although it would clearly be somehow related. The power of such a test would be less than the Kolmogorov-Smirnov test applied to the same data in ungrouped form.
Comparing Two Empirical Distributions: Similar techniques to those discussed can be used to test the hypothesis that two data sets were each drawn from the same distribution of unknown type. The test statistic is: D = Max | (first empirical distrib. function at x) - (second empirical distrib. function at x) |. x As previously, the maximum occurs just before or just at one of the data points. However, now we have to check at each of the data points in each of the two data sets. Let n1 be the size of the first data set, and n2 be the size of the second data set. The critical values can be approximated for large data sets by letting n = n1 n2 /(n1 + n2 ), and using the previously given table of critical values for the K-S Statistic. Exercise: If the sample sizes are 30 and 50, what is the critical value for 10%? [Solution: n = (30)(50)/80 = 18.75. The critical value is: 1.22/ 18.75 = 0.282.] Thus, with sample sizes of 30 and 50, if D calculated as above were greater than 0.282, one would reject at 10% the null hypothesis that the two data sets were drawn from the same distribution.
238
See “Mahlerʼs Guide to Simulation.”
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 520 Problems: For the following questions, use the following table for the Kolmogorov-Smirnov statistic. α
0.20 |
c
0.10 |
1.07/ n
0.05 |
1.22/ n
0.025 |
1.36/ n
0.01 |
1.48/ n
1.63/ n
17.1 (2 points) You observe a set of 10,000 claims. Ranked from smallest to largest, the 9700th claim is $1,096,000, while the 9701th claim is $1,112,000. Use the Kolmogorov-Smirnov statistic in order to determine a 90% confidence interval for the value of the underlying distribution function at $1,100,000. A. [0.94, 1.00] B. [0.9537, 0.9863] C. [0.9564, 0.9836] D. [0.9578, 0.9822] E. [0.9593, 0.9807] 17.2 (2 points) You are given the following information about a random sample: (i) The sample size equals five. (ii) Two of the sample observations are known to exceed 50, and the remaining three observations are 20, 30 and 45. Ground up unlimited losses are assumed to follow an Exponential Distribution with θ = 65. Calculate the value of the Kolmogorov-Smirnov test statistic. A. Less than 0.20 B. At least 0.20, but less than 0.22 C. At least 0.22, but less than 0.24 D. At least 0.24, but less than 0.26 E. At least 0.26 17.3 (4 points) You observe the following five ground-up claims from a data set that is truncated from below at 100: 120 150 190 260 580 Loss sizes prior to truncation are assumed to follow a Loglogistic distribution with γ = 2 and θ = 150. Calculate the value of the Kolmogorov-Smirnov test statistic. A. Less than 0.16 B. At least 0.16, but less than 0.18 C. At least 0.18, but less than 0.20 D. At least 0.20, but less than 0.22 E. At least 0.22
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 521 17.4 (2 points) You observe the following 10 claims: 241,513 110,493
231,919 139,647
105,310 220,942
125,152 161,964
116,472 105,829
Use the Kolmogorov-Smirnov statistic in order to determine an 80% confidence interval for the value of the underlying distribution function at $150,000. A. [0.24, 0.96] B. [0.26, 0.94] C. [0.28, 0.92] D. [0.30, 0.90] E. [0.32, 0.88] 17.5 (3 points) Losses truncated from below at 50 are of size: 64, 90, 132, 206. Loss sizes prior to truncation are assumed to follow an Exponential distribution with θ = 100. Calculate the value of the Kolmogorov-Smirnov test statistic. (A) 0.19 (B) 0.21 (C) 0.23 (D) 0.25 (E) 0.27 17.6 (3 points) With a deductible of 500 and a maximum covered loss of 5000, the following five payments are made: 179, 352, 968, 1421, 4500. Loss sizes prior to the effect of the deductible and maximum covered loss are assumed to follow a Pareto Distribution with parameters α = 2 and θ = 1000. Calculate the value of the Kolmogorov-Smirnov test statistic. (A) 0.19 (B) 0.21 (C) 0.23 (D) 0.25 (E) 0.27 17.7 (3 points) Losses prior to censoring are assumed to follow a Weibull distribution with θ = 3000 and τ = 2. This assumption is compared to the following data set censored from above at 5000: 737, 1618, 2482, 3003, 5000. What is the value of the Kolmogorov-Smirnov (K-S) statistic? (A) 0.17 (B) 0.19 (C) 0.21 (D) 0.23 (E) 0.25 17.8. You observe the following 35 losses: 6 7 11 14 15
17 18 19 25 29
30 34 38 40 41
48 49 53 60 63
78 103 124 140 192
198 227 330 361 421
514 546 750 864 1638
Use the Kolmogorov-Smirnov statistic in order to get a 90% confidence band for the distribution from which this data was drawn. Which of the following is the resulting interval for the value of the underlying distribution function at 200? A. [0.47, 1.00] B. [0.49, 0.99] C. [0.51, 0.97] D. [0.53, 0.95] E. [0.55, 0.93]
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 522 17.9 (4, 5/88, Q.58) (2 points) A random sample was taken from an unknown distribution function F(x): Rank x Rank x Rank x Rank x 1 3 5 12 9 30 13 99 2 4 6 13 10 40 14 105 3 9 7 15 11 55 15 115 4 10 8 20 12 78 16 129 Using the Kolmogorov-Smirnov statistic, calculate a 90% confidence band for F(x) at x = 40. A. The lower bound of the confidence band is less than 0.30 and the upper bound of the confidence band is greater than 0.95 B. The lower bound of the confidence band lies between 0.30 and 0.35 and the upper bound of the confidence band lies between 0.90 and 0.95 C. The lower bound of the confidence band lies between 0.35 and 0.40 and the upper bound of the confidence band lies between 0.80 and 0.90 D. The lower bound of the confidence band lies between 0.40 and 0.45 and the upper bound of the confidence band lies between 0.70 and 0.80 E. The lower bound of the confidence band lies between 0.45 and 0.50 and the upper bound of the confidence band lies between 0.60 and 0.70 17.10 (4B, 11/93, Q.22) (2 points) A random sample, x1 , ..., x20 is taken from a probability distribution function F(x). 1.07, 1.07, 1.12, 1.35, 1.48, 1.59, 1.60, 1.74, 1.90, 2.02, 2.05, 2.07, 2.15, 2.16, 2.21, 2.34, 2.75, 2.80, 3.64, 10.42 Determine the 90% confidence band for F(x) at x = 1.50 using the Kolmogorov-Smirnov statistic. A. (-0.023, 0.523) B. (0, 0.523) C. (0, 0.804) D. (1.070, 1.900) E. (1.070, 2.020)
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 523 17.11 (4B, 5/97, Q.28 & Course 4 Sample Exam 2000, Q. 23) (2 points) You are given the following: • Forty (40) observed losses have been recorded in thousands of dollars and are grouped as follows: Interval Number of ($000) Losses (1, 4/3) 16 [4/3, 2) 10 [2, 4) 10 [4, ∞) 4 • The null hypothesis, H0 , is that the random variable X underlying the observed losses, in thousands, has the density function f(x) = 1/x2 , 1 < x < ∞. Since exact values of the losses are not available, it is not possible to compute the exact value of the Kolmogorov-Smirnov statistic used to test H0 . However, it is possible to put bounds on the value of this statistic. Based on the information above, determine the smallest possible value and the largest possible value of the Kolmogorov-Smirnov statistic used to test H0 . A. Smallest possible value = 0.10, Largest possible value = 0.25 B. Smallest possible value = 0.10, Largest possible value = 0.40 C. Smallest possible value = 0.15, Largest possible value = 0.25 D. Smallest possible value = 0.15, Largest possible value = 0.40 E. Smallest possible value = 0.25, Largest possible value = 0.40
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 524 17.12 (4B, 5/99, Q.12) (2 points) You are given the following: • One hundred claims greater than 3,000 have been recorded as follows: Interval Number of Claims (3,000, 5,000] 6 (5,000, 10,000] 29 (10,000, 25,000] 39 (25,000, ∞) 26 • Claims of 3,000 or less have not been recorded. • The null hypothesis, H0 , is that claim sizes follow a Pareto distribution, with parameters α = 2 and θ = 25,000 . Since exact values of the claims are not available, it is not possible to compute the exact value of the Kolmogorov-Smirnov statistic used to test H0 . However, it is possible to put bounds on the value of this statistic. Referring to the information above, determine the smallest possible value of the Kolmogorov-Smirnov statistic used to test H0 . A. Less than 0.03 B. At least 0.03, but less than 0.06 C. At least 0.06, but less than 0.09 D. At least 0.09, but less than 0.12 E. At least 0.12
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 525 Solutions to Problems: 17.1. D. The value of the empirical distribution function at $1,100,000 is 9700 / 10000 = 0.9700. The critical value for the K-S- stat. at 10% is 1.22/ 10,000 = 0.0122. So the 90% confidence interval is 0.9700 ± 0.0122. Comment: Note that since the K-S statistic is never negative, the 90% confidence interval uses the critical value for 10%. The variance of the empirical distribution function at 1.1 million is approximately: (.97)(1 - .97)/10000 = .00000291. Thus a 90% confidence interval for the empirical distribution function at 1.1 million is approximately: .9700 ± 1.645 0.00000291 = 0.9700 ± 0.0028. This is much narrower than the confidence interval gotten via the K-S Statistic, which applies to all sizes of loss simultaneously. 17.2. E. F(x) = 1 - exp[-(x/65)], x < 50. X
Assumed F(X)
20
0.2649
Empirical Distribution 0.0
Absolute Value of Assumed - Empirical 0.2649 0.0649
0.2 0.1697 30
0.3697 0.0303 0.4 0.0996
45
0.4996 0.1004 0.6 0.0634
50
0.5366
The largest absolute difference is: |0.2649 - 0| = 0.2649 = K-S statistic. Comment: We make the final comparison just before the censorship point of 50: |0.5366 - 0.6| = .0634.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 526 17.3. B. Prior to truncation the distribution function is: F(x) = (x/150)2 / {1 + (x/150)2 } = x2 /(22500 + x2 ). F(100) = 0.3077. S(100) = 0.6923. After truncation from below at 100 the distribution function is: G(x) = {F(x) - F(100)} / S(100) = (F(x) - 0.3077)/0.6923. At each of the observed loss sizes, compute the values of G(x). Then compare each value to the empirical distribution function just before and just after each observed loss. The largest absolute difference is 0.161 = K-S statistic. X
F(x)
100
0.3077
0.0000 G(x)
Empirical Distribution 0
Absolute Value of G(x)0.0000 - Empirical
0 0.1192 120
0.3902
0.1192 0.0808 0.2 0.0778
150
0.5000
0.2778 0.1222 0.4 0.0454
190
0.6160
0.4454 0.1546 0.6 0.0393
260
0.7503
0.6393 0.1607 0.8 0.1094
580
0.9373
0.9094 0.0906 1
17.4. B. Since 6 claims out of 10 are less than or equal to 150,000, the value of the empirical distribution function at $150,000 is 6/10 = .6. The critical value for the K-S Statistic at 20% is: 1.07/(100.5) = 0.34. So the 80% confidence interval is: 0.60 ± 0.34. Comment: Since the K-S Stat. is never negative, the 80% confidence interval uses the critical value for 20%. 1.07/ n is the 20% critical value from the K-S table. You will be provided the K-S Table in any question where you need it.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 527 17.5. B. Prior to truncation the distribution function is: F(x) = 1 - e-x/100. After truncation from below at 50 the distribution function is: G(x) = {F(x) - F(50)}/S(50) = {e-50/100 - e-x/100}/e-50/100 = 1 - e-(x-50)/100. At each of the observed loss sizes, compute the values of G(x). Then compare each value to the empirical distribution function just before and just after each observed loss. The largest absolute difference is 0.2101 = K-S statistic. X
G(x)
Empirical Distribution 0
Absolute Value of G(x) - Empirical 0.0000
0.0000 0.0000 0 0.1306 64
0.1306 0.1194 0.25 0.0797
90
0.3297 0.1703 0.5 0.0596
132
0.5596 0.1904 0.75 0.0399
206
0.7899 0.2101 1
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 528 17.6. C. Prior to truncation and censoring the distribution function is: F(x) = 1 - {1000/(1000 + x)}2 . After truncation from below at 500 and censoring from above at 5000 the distribution function is: G(x) = {F(x) - F(500)}/S(500) = 1 - S(x)/S(500) = 1 - {1000/(1000 + x)}2 /{1000/1500}2 = 1 - {1500/(1000 + x)}2 , 500 < x < 5000. Payments of: 179, 352, 968, 1421, 4500, correspond to losses of size: 679, 852, 1468, 1921, 5000 or more. At each of the loss sizes below the censorship point of 5000, compute the values of G(x). Then compare each value to the empirical distribution function just before and just after each observed loss. We also compare the empirical distribution function and G(x) just before 5000. The largest absolute difference is 0.2306 = K-S statistic. X
G(x)
Empirical Distribution 0
Absolute Value of G(x) - Empirical 0.0000
0.0000 0.0000 0 0.2019 679
0.2019 0.0019 0.2 0.1440
852
0.3440 0.0560 0.4 0.2306
1468
0.6306 0.0306 0.6 0.1363
1921
0.7363 0.0637 0.8 0.1375
5000
0.9375
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 529 17.7. A. F(x) = 1 - exp[-(x/3000)2 ], x < 5000. X
Assumed F(X)
Empirical Distribution 0.0
Absolute Value of Assumed - Empirical 0.0586
737
0.0586 0.1414 0.2 0.0524
1618
0.2524 0.1476 0.4 0.0956
2482
0.4956 0.1044 0.6 0.0329
3003
0.6329 0.1671 0.8 0.1378
5000
0.9378
The largest absolute difference is: |0.6329 - 0.8| = 0.1671 = K-S statistic. Comment: We make the final comparison just before the censorship point of 5000: |0.9378 - 0.8| = 0.1378. 17.8. D. Since 26 claims out of 35 are less than or equal to 200, the empirical distribution function at 200 is 26/35 = 0.74. The critical value for the K-S Statistic at 10% is 1.22/35.0.5 = .0206. So the 90% confidence interval is: 0.74 ± 0.21. 17.9. B. For the Kolmogorov-Smirnov Statistic, if the critical value is c for a significance level of α, then with a probability of 1 - α: {empirical distribution function - c} ≤ F(x) ≤ {empirical distribution function + c}. For a probability of 90%, α = 0.10, so with n = 16, c = 1.22 / The empirical distribution function at x = 40 is: 10/16 = .625. The confidence interval is: 0.625 ± 0.305 = 0.32 to 0.93.
16 = 0.305.
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 530 17.10. B. Since we want a 90% confidence interval α = 0.10; for α = 0.10, there is only a 10% chance that the K-S Statistic will be greater than the critical value. For 20 observed points and α = 0.10, the critical value is: 1.22(20-0.5) = 0.273. Thus the absolute value of the difference between the empirical and actual underlying distribution functions is at most .273, with 90% confidence. (The K-S Stat. is the maximum over all values of x of this absolute difference.) The empirical distribution at x = 1.50 is 5/20 = 0.25, since the fifth observed point is 1.48 and the sixth observed point is 1.59. Thus F(1.50) is 0.250 ± 0.273, which leads to a confidence interval of: (0 , 0.523). Comment: Note that since any distribution function is always greater than or equal to zero and less than or equal to one, the distribution function at 1.50 must not be negative. This is why choice A is not correct. (Select the one best answer.)
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 531
17.11. D. The K-S statistic is defined as the maximum absolute difference of the empirical and theoretical Distribution Functions. By integrating the given density function, the theoretical Distribution Function is: F(x) = 1 - 1/x, 1 < x < ∞. Thus:
x      Empirical Distribution   Theoretical Distribution   Absolute Difference
1      0                        0                          0
4/3    0.40                     0.25                       0.15
2      0.65                     0.50                       0.15
4      0.90                     0.75                       0.15
∞      1                        1                          0
Thus we observe absolute differences of 0, 0.15, 0.15, 0.15, and 0. Thus the K-S statistic is at least 0.15. However, it may be that the absolute difference is bigger at some point which is not the endpoint of an interval. For example, we know that the empirical distribution function at 1.99999 is at least 0.40 and at most 0.65. Thus the absolute difference of the empirical and theoretical distributions there is at most the larger of: |0.65 - 0.50| = 0.15 and |0.40 - 0.50| = 0.10. Similarly checking just before and just after each endpoint:
x                Empirical Maximum   Empirical Minimum   Theoretical   Absolute Differences, Theoretical vs. Empirical
1.0001           0.40                0                   0             0.40 and 0
1.3333           0.40                0                   0.25          0.15 and 0.25
1.3334           0.65                0.40                0.25          0.40 and 0.15
1.9999           0.65                0.40                0.50          0.15 and 0.10
2.0001           0.90                0.65                0.50          0.40 and 0.15
3.9999           0.90                0.65                0.75          0.15 and 0.10
4.0001           1                   0.90                0.75          0.25 and 0.15
extremely large  1                   0.90                1.00          0 and 0.10
This results in absolute differences of up to 0.40 (the maximum possible empirical distribution just above either 1, 4/3, or 2, versus the theoretical distribution there). This calculation can be arranged in a spreadsheet (similar to that used to get the minimum K-S Statistic for grouped data) as follows:
Endpoint of Interval ($ Thous.)   Empirical Distribution (just below)   Theoretical Distribution   Empirical Distribution (just above)   Absolute Difference For Maximum K-S Stat. (below)   Absolute Difference For Maximum K-S Stat. (above)
1            -        0.0000   0.4000   -        0.4000
1.3333333    0.0000   0.2500   0.6500   0.2500   0.4000
2            0.4000   0.5000   0.9000   0.1000   0.4000
4            0.6500   0.7500   1.0000   0.1000   0.2500
Infinity     0.9000   1.0000   -        0.1000   -
Comments: If there were many narrow intervals, this technique would allow one to get a fairly good estimate of the K-S statistic. One needs to offset the columns of Empirical Distributions one row up and down and then get two columns of absolute differences.
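As a rough illustration of the spreadsheet technique just described, the following Python sketch (not from the study guide; the variable names are mine) bounds the largest possible K-S statistic for the grouped data of 17.11 by comparing the theoretical distribution at each interval endpoint with the empirical distribution just below and just above that endpoint:

import math

boundaries = [1.0, 4/3, 2.0, 4.0, math.inf]    # interval endpoints
emp_cum    = [0.00, 0.40, 0.65, 0.90, 1.00]    # empirical F at each endpoint
theo       = [0.00, 0.25, 0.50, 0.75, 1.00]    # theoretical F(x) = 1 - 1/x at each endpoint

candidates = []
for j in range(len(boundaries)):
    if j + 1 < len(boundaries):   # just above endpoint j, the empirical can be as large as emp_cum[j + 1]
        candidates.append(abs(emp_cum[j + 1] - theo[j]))
    if j - 1 >= 0:                # just below endpoint j, the empirical can be as small as emp_cum[j - 1]
        candidates.append(abs(emp_cum[j - 1] - theo[j]))
print(max(candidates))            # 0.40, the maximum possible K-S statistic for this grouping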
2013-4-6, Fitting Loss Distributions §17 K-S Test Advanced, HCM 10/14/12, Page 532
17.12. C. The Kolmogorov-Smirnov statistic is the maximum over all x of the absolute difference between the empirical and theoretical distribution functions. In this case we can only observe the absolute difference at a few values of x. However, the K-S statistic must be greater than or equal to the maximum of these observed absolute differences.
x       Number of Claims ≤ x and > 3000   Empirical Distribution Function   Theoretical Distribution Function   Absolute Difference
3000    0     0       0.0000   0.0000
5000    6     0.06    0.1289   0.0689
10000   35    0.35    0.3600   0.0100
25000   74    0.74    0.6864   0.0536
Comment: Both distributions are for the data truncated from below at 3000.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 533
Section 18, p-p Plots239 240
The K-S Statistic provides a formal technique for comparing the fitted and empirical distribution functions. Plots of the difference function provide a graphical technique to compare empirical and fitted distributions. Another graphical technique to analyze a fit is via a p-p plot. In a p-p plot one graphs the fitted distribution versus the estimated percentile at that data point.
For example, the ungrouped data in Section 2 has 130 values: 300, 400, 2800, ... , 4,802,200. The Pareto fit by Maximum Likelihood to this data has parameters α = 1.702 and θ = 240,151.
F(300) = 1 - {240,151 / (300 + 240,151)}^1.702 = 0.0021.
So corresponding to the loss of 300, we plot the point (1/131, 0.0021).
F(400) = 1 - {240,151 / (400 + 240,151)}^1.702 = 0.0028.
Thus the second point plotted is (2/131, 0.0028).
F(4,802,200) = 1 - {240,151 / (4,802,200 + 240,151)}^1.702 = 0.9944.
So corresponding to the loss of 4,802,200, we plot the point (130/131, 0.9944).
For the ungrouped data in Section 2, the 104th loss out of 130 is 406,900. Thus 406,900 is the estimate of the 104/131 = 79.39th percentile. For the Maximum Likelihood Pareto, F(406,900) = 1 - {240,151/(406,900 + 240,151)}^1.702 = 0.8149. So corresponding to the loss of 406,900, we plot the point (0.7939, 0.8149). We note that 0.7939, corresponding to a smoothed empirical percentile, and 0.8149, representing the fitted Distribution Function, are close, indicating a reasonable fit at this portion of the distribution. For the p-p plot, a better fit occurs when the plotted points are close to the comparison line x = y.
239 See pages 445-446 of Loss Models.
240 Personally, I find the type of exhibits discussed previously, in which one graphs the difference between the empirical and fitted distribution functions, much more useful in practical applications than p-p plots. The eye can easily distinguish the differences from the horizontal x-axis, rather than comparing to a line at a 45 degree angle as in the p-p plots. In addition, one can easily add the K-S critical values as horizontal lines in difference graphs, allowing one to perform the K-S test graphically. Finally, one can easily translate back to the sizes of loss, which are shown right on the x-axis of the difference graph; this allows one to quickly pick out those size ranges in which the fit is not as good. This would require the backup calculations that produced the p-p plot; it cannot be done directly from the p-p plot itself.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 534
One could proceed similarly for each of the 130 losses in the ungrouped data set in Section 2. For this maximum likelihood Pareto, the resulting p-p plot is:
[p-p plot of the fitted Pareto versus the sample, both axes from 0 to 1.]
If one has a set of n losses, x1 , x2 ,..., xn , from smallest to largest, to which a Distribution F(x) has been fit, then the p-p plot consists of the n points: ( i/(n+1), F(xi) ). One also includes on the p-p plot, the comparison line x=y. The closer the plotted points stay to the comparison line, the better the fit. Exercise: For the ungrouped data in Section 2, the 78th loss out of 130 is 146,100. The Weibull fit by Maximum Likelihood to this data has parameters θ = 231,158 and τ = 0.6885. What point would this produce on a p-p plot? [Solution: 146,100 is the estimate of the 78/131 = 59.5th percentile. For this Weibull, F(146,100) = 1 - exp[-(146,100/231,158).6885] = 0.518. Therefore, corresponding to the loss of 146,100, we plot the point (0.595 , 0.518).]
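A short sketch of this construction in Python (illustrative only; the function name pp_points is mine), checked against the Weibull exercise above:

import math

def pp_points(losses, cdf):
    # p-p plot points (i/(n+1), F(x_i)) for the losses sorted from smallest to largest
    x = sorted(losses)
    n = len(x)
    return [(i / (n + 1), cdf(x[i - 1])) for i in range(1, n + 1)]

# Check against the exercise: the 78th of 130 losses is 146,100, and the maximum
# likelihood Weibull has theta = 231,158 and tau = 0.6885.
weibull_cdf = lambda x: 1.0 - math.exp(-(x / 231158) ** 0.6885)
print(78 / 131, round(weibull_cdf(146100), 3))   # about (0.595, 0.518)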
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 535
The whole p-p plot for the maximum likelihood Weibull fit to the ungrouped data in Section 2 is:
[p-p plot of the fitted Weibull versus the sample, both axes from 0 to 1.]
For small losses, the fitted Weibull Distribution function is larger than the empirical distribution function.241 Therefore, the fitted Weibull distribution has too thick of a left tail. For very large size losses, the fitted Weibull Distribution function is larger than the empirical distribution function; for large losses the fitted survival function is smaller than the sample survival function. Therefore, the fitted Weibull distribution has too thin of a right tail. In the neighborhood of where the sample distribution is 0.5, the slope of the curve is less than 1, the slope of the comparison line. Therefore, the fitted Weibull distribution has less probability than the sample distribution, near the sample median.
241 The curved line is above the 45° comparison line.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 536
By comparing the two p-p plots, one sees that the Weibull Distribution is a worse fit to this data than the Pareto (thick),242 since the Weibullʼs furthest departure from the comparison line x = y is larger than that of the Pareto:
[p-p plot of both fits versus the sample, both axes from 0 to 1, with the Weibull curve departing further from the 45° comparison line than the Pareto.]
While we have only applied the p-p plot to ungrouped data, one can apply a similar technique to grouped data. Of course one can only plot at the endpoints of the intervals. One plots the empirical distribution at each endpoint versus the fitted distribution function at that endpoint.
242 This was seen previously by looking at the K-S statistics and graphs of the differences between the fitted and empirical distributions.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 537 Tails of Distributions: One can use p-p plots to compare the tails of a fitted distribution and the data.243 For example, here is a p-p plot:
For small losses, the dashed line is below the 45° comparison line. Therefore, for small losses, the fitted distribution function is smaller than the sample distribution function. Therefore, the fitted distribution has too thin of a left tail. For large size losses, the fitted distribution function is smaller than the sample distribution function; for large losses the fitted survival function is larger than the sample survival function. 243
See 4, 11/01, Q.6.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 538
Here is a graph of the survival functions of the sample (solid) and fitted (dashed), for very large sizes of loss, summarized by the following values:
F(x) Sample   F(x) Fitted   S(x) Sample   S(x) Fitted
0.900         0.870         0.100         0.130
0.950         0.910         0.050         0.090
0.970         0.940         0.030         0.060
0.990         0.975         0.010         0.025
Fitted S(x) → 0 less quickly as x → ∞ ⇒ the fitted distribution has too thick of a right tail. Here is an approximate graph of the difference of the sample and fitted distribution functions, as a function of the sample distribution function:
This fitted distribution has too thin of a lefthand tail and too thick of a righthand tail.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 539 In the neighborhood of where the sample distribution is .5, the slope of the curve is more than 1. As the sample distribution (solid) increases from 0.4 to 0.6, the fitted distribution (dashed) increases from about 0.34 to 0.65:
Therefore, the fitted distribution has more probability than the sample distribution, near the sample median.
N versus N+1: In some cases we use in the denominator N, the number of data points, while in others we use N + 1: Smoothed empirical estimate of percentiles ⇒ N+1 in the denominator. p-p plots ⇒ N+1 in the denominator. Empirical Distribution Function ⇒ N in the denominator. Kolmogorov-Smirnov Statistic ⇒ N in the denominator.
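A two-line illustration of the two denominators (a sketch, not part of the study guide):

losses = [197, 325, 981, 2497]                     # any 4 ordered losses
n = len(losses)
pp_x  = [i / (n + 1) for i in range(1, n + 1)]     # smoothed percentiles / p-p plot: 0.2, 0.4, 0.6, 0.8
emp_F = [i / n for i in range(1, n + 1)]           # empirical distribution / K-S: 0.25, 0.50, 0.75, 1.00
print(pp_x, emp_F)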
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 540 Problems: 18.1 (2 points) The graph below shows a p-p plot of a fitted distribution compared to a sample.
[p-p plot: Fitted versus Sample, both axes from 0 to 1, with the 45° comparison line.]
Which of the following is true? 1. The lefthand tail of the fitted distribution is too thin. 2. The righthand tail of the fitted distribution is too thin. 3. The fitted distribution has less probability around the sample median than does the sample distribution. A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. none of A, B, C, or D
18.2 (1 point) You are constructing a p-p plot. The 27th out of 83 losses from smallest to largest is 142. The fitted distribution is an Exponential with mean 677. What point should one plot?
A. (0.321, 0.189) B. (0.325, 0.189) C. (0.321, 0.811) D. (0.325, 0.811) E. None of the above
18.3 (3 points) A Pareto distribution with parameters α = 1.5 and θ = 1000 is being compared to the following five claims: 179, 352, 918, 2835, 6142. Construct the p-p plot.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 541
18.4 (1 point) Which of the following p-p plots indicates the best model of the data?
[Five candidate p-p plots, labeled A, B, C, D, and E, each graphing the fitted distribution versus the sample distribution on axes from 0 to 1.]
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 542
18.5. (3 points) You are given the following p-p plot:
[p-p plot: Fitted versus Sample, both axes from 0 to 1.]
The plot is based on the sample: 10 15 20 30 50 70 100 150 200
Determine the fitted model underlying the p-p plot.
(A) Inverse Exponential with θ = 23.
(B) Pareto with α = 1 and θ = 50.
(C) Uniform on [0, 200].
(D) Exponential with mean 95.
(E) Normal with mean 50 and standard deviation 30.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 543 18.6 (2 points) A Weibull distribution with parameters τ = 0.8 and θ = 200 is being compared to the following four losses: 19, 62, 98, 385. Which of the following is the p-p plot?
[Four candidate p-p plots, labeled A, B, C, and D.]
E. None of A, B, C, or D.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 544
18.7 (4, 11/01, Q.6 & 2009 Sample Q.59) (2.5 points) The graph below shows a p-p plot of a fitted distribution compared to a sample.
[p-p plot: Fitted versus Sample, both axes from 0 to 1 in steps of 0.1.]
Which of the following is true? (A) The tails of the fitted distribution are too thick on the left and on the right, and the fitted distribution has less probability around the median than the sample. (B) The tails of the fitted distribution are too thick on the left and on the right, and the fitted distribution has more probability around the median than the sample. (C) The tails of the fitted distribution are too thin on the left and on the right, and the fitted distribution has less probability around the median than the sample. (D) The tails of the fitted distribution are too thin on the left and on the right, and the fitted distribution has more probability around the median than the sample. (E) The tail of the fitted distribution is too thick on the left, too thin on the right, and the fitted distribution has less probability around the median than the sample.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 545
18.8 (4, 5/05, Q.5 & 2009 Sample Q.176) (2.9 points) You are given the following p-p plot:
[p-p plot: F(x) versus Fn(x), both axes from 0 to 1.]
The plot is based on the sample: 1 2 3 15 30 50 51 99 100
Determine the fitted model underlying the p-p plot.
(A) F(x) = 1 - x^-0.25, x ≥ 1
(B) F(x) = x / (1 + x), x ≥ 0
(C) Uniform on [1, 100]
(D) Exponential with mean 10
(E) Normal with mean 40 and standard deviation 40
18.9 (4, 11/05, Q.31 & 2009 Sample Q.241) (2.9 points) You are given:
(i) The following are observed claim amounts: 400 1000 1600 3000 5000 5400 6200
(ii) An exponential distribution with θ = 3300 is hypothesized for the data.
(iii) The goodness of fit is to be assessed by a p-p plot and a D(x) plot.
Let (s, t) be the coordinates of the p-p plot for a claim amount of 3000. Determine (s - t) - D(3000).
(A) -0.12 (B) -0.07 (C) 0.00 (D) 0.07 (E) 0.12
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 546 Solutions to Problems: 18.1. D. For small losses, the fitted distribution function is smaller than the empirical distribution function. (The curve is below the 45° comparison line.) Therefore, the fitted distribution has too thin of a left tail. For large size losses, the fitted distribution function is larger than the empirical distribution function; for large losses the fitted survival function is smaller than the sample survival function. Therefore, the fitted distribution has too thin of a right tail. In the neighborhood of where the sample distribution is 0.5, the slope of the curve is less than 1. (At x = 0.4, y is about 0.24; at x = 0.6, y is about 0.36. For a 0.2 change in x, y changes by only about 0.12.) Therefore, the fitted distribution has less probability than the sample distribution, near the sample median. Comment: Similar to 4, 11/01, Q.6. 18.2. A. 142 is the estimate of the 27/(83+1) = 32.1th percentile. For this Exponential, F(142) = 1 - exp(-142/677) = 0.189. So corresponding to the loss of 142, plot the point (0.321 , 0.189).
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 547 18.3. For the Pareto Distribution, F(x) = 1 - (θ/(θ+x))α = 1 - (1000/(1000 +x))1.5. F(179) = .2189. F(352) = 1 - (1000/(1000 + 352))1.5 = .3639. F(918) = .6235. F(2835) = .8668. F(6142) = .9476. One plots the points ( i/(n+1), F(xi) ), for i = 1 to n = 5. The five plotted points are: (1/6, 0.2189), (2/6, 0.3639), (3/6, 0.6235), (4/6, 0.8668), (5/6, 0.9476).
18.4. B. If the model is a good one, the points in the p-p plot will lie very close to the straight line from (0, 0) to (1, 1). Models C and E are very poor. Of the remaining three, plot B appears to have the points closest to the line. Comment: These are all p-p plots to my ungrouped data in Section 2. Plot A is vs. the maximum likelihood LogNormal with µ = 11.5875 and σ = 1.60326. Plot B is vs. the maximum likelihood Pareto with α = 1.702 and θ = 240,151. Plot C is vs. the maximum likelihood Exponential with θ = 312,675. Plot D is vs. the maximum likelihood Weibull with θ = 231,158 and τ = 0.6885. Plot E is vs. a Pareto with α = 2 and θ = 150,000.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 548
18.5. D. There are 9 values in the sample. Each plotted point should be: {i/10, F(xi)}. The first plotted point is about {0.1, 0.1}. For choice B, F(10) = 1 - 50/(50 + 10) = 1/6 = 0.167. For choice C, F(10) = 10/200 = 0.05. Eliminating choices B and C. The last plotted point is about {0.9, 0.88}. For choice E, F(200) = Φ((200 - 50)/30) = Φ(5) ≅ 1. Eliminating choice E. The eighth plotted point is about {0.8, 0.8}. For choice A, F(150) = e^(-23/150) = 0.858. Eliminating choice A.
Comment: Similar to 4, 5/05, Q.5. One could check all of the plotted points versus choice D. They should be:
x:      10     15     20     30     50     70     100    150    200
i/10:   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
F(xi):  0.100  0.146  0.190  0.271  0.409  0.521  0.651  0.794  0.878
Here are p-p plots of the various choices:
(A) F(x) = e^(-23/x), x ≥ 0: [p-p plot of Fitted versus Sample]
(B) F(x) = x / (50 + x), x ≥ 0: [p-p plot]
(C) Uniform on [0, 200]: [p-p plot]
(D) Exponential with mean 95: [p-p plot]
(E) Normal with mean 50 and standard deviation 30: [p-p plot]
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 550
18.6. A. F(19) = 1 - exp[-(19/200)^0.8] = 0.141. F(62) = 1 - exp[-(62/200)^0.8] = 0.324. F(98) = 1 - exp[-(98/200)^0.8] = 0.432. F(385) = 1 - exp[-(385/200)^0.8] = 0.815. Thus we plot 4 points: (0.2, 0.141), (0.4, 0.324), (0.6, 0.432), (0.8, 0.815).
18.7. E. For small losses, the fitted distribution function is larger than the sample distribution function. (The curved line is above the 45° comparison line.) Therefore, the fitted distribution has too thick of a left tail. For large size losses, the fitted distribution function is larger than the sample distribution function; for large losses the fitted survival function is smaller than the sample survival function. Here is a graph of the survival functions of the sample (solid) and fitted (dashed), for very large sizes of loss (y-axis: Survival Function, from 0 to 0.1), summarized by the following values:
F(x) Sample   F(x) Fitted   S(x) Sample   S(x) Fitted
0.890         0.930         0.110         0.070
0.900         0.945         0.100         0.055
0.920         0.965         0.080         0.035
0.970         0.990         0.030         0.010
Fitted S(x) → 0 more quickly as x → ∞ ⇒ the fitted distribution has too thin of a right tail. In the neighborhood of where the sample distribution is .5, the slope of the curve is less than 1. As the sample distribution increases from .4 to .6, the fitted distribution only increases from about .34 to .42. Therefore, the fitted distribution has less probability than the sample distribution, near the (sample) median. Comment: The slope is greater than one in the left hand tail. Therefore, the fitted density was larger than the sample density in the left hand tail. The slope is less than one in the right hand tail. Therefore, the fitted density was smaller than the sample density in the right hand tail.
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 551
18.8. A. There are 9 values in the sample. Each plotted point should be: {i/10, F(xi)}. The first plotted point is {0.1, 0}. For choices B, D, and E, F(1) ≠ 0, so these choices are eliminated. The last plotted point is about {0.9, 0.7}. For choice A, F(100) = 1 - 1/√10 = 0.684 ≅ 0.7. For choice C, F(100) = 1 ≠ 0.7, eliminating choice C.
Comment: One could check more of the plotted points versus choice A. They should be:
x:      1      2      3      15     30     50     51     99     100
i/10:   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
F(xi):  0.000  0.159  0.240  0.492  0.573  0.624  0.626  0.683  0.684
For example, the fifth plotted point is at about {0.5, 0.58}, and F(30) = 1 - 30^-0.25 = 0.573 ≅ 0.58. Here are p-p plots of the various choices:
(A) F(x) = 1 - x^-0.25, x ≥ 1: [p-p plot of Fitted versus Sample]
(B) F(x) = x / (1 + x), x ≥ 0: [p-p plot]
(C) Uniform on [1, 100]: [p-p plot]
(D) Exponential with mean 10: [p-p plot]
(E) Normal with mean 40 and standard deviation 40: [p-p plot]
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 553
18.9. B. The claim of size 3000 is the 4th out of 7, so the first coordinate of the p-p plot is: 4/(7 + 1) = 0.5. For the Exponential, F(3000) = 1 - e^(-3000/3300) = 0.5971. Thus the point corresponding to the claim of size 3000 in the p-p plot is: (0.5, 0.5971). The D(x) plot is the difference graph, the difference between the empirical and theoretical distribution functions. The empirical distribution function at 3000 is 4/7, while the theoretical distribution function is: 1 - e^(-3000/3300) = 0.5971. Therefore D(3000) = 4/7 - 0.5971 = -0.0257. (s - t) - D(3000) = (0.5 - 0.5971) - (4/7 - 0.5971) = -0.0714.
Comment: Note the order of the difference in D(x): empirical - theoretical. The p-p plot uses n + 1 in its denominator, while the empirical distribution function uses n. Also note that the empirical distribution jumps to 4/7 at 3000, but is 3/7 just before 3000. Here is the entire p-p plot:
[p-p plot: Fitted versus Sample, both axes from 0 to 1.]
2013-4-6, Fitting Loss Distributions §18 p-p Plots, HCM 10/15/12, Page 554
Here is the difference graph out to 10,000:
[Graph of D(x) versus x from 0 to 10,000, with D(x) ranging from about -0.20 to 0.15.]
The largest distance from the x-axis occurs just before 5000. At 4999.99, the empirical distribution function is 4/7, while the Exponential Distribution is: 1 - e^(-4999.99/3300) = 0.780. Therefore, the Kolmogorov-Smirnov Statistic is: |4/7 - 0.780| = 0.209.
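The whole of 18.9 can be reproduced in a few lines. This Python sketch is my own illustration, not the study guide's; it simply follows the solution above:

import math

claims = [400, 1000, 1600, 3000, 5000, 5400, 6200]
exp_cdf = lambda x: 1.0 - math.exp(-x / 3300.0)   # hypothesized Exponential, theta = 3300

n = len(claims)
j = sorted(claims).index(3000) + 1   # 3000 is the 4th claim out of 7
s = j / (n + 1)                      # p-p plot coordinate uses n + 1: s = 0.5
t = exp_cdf(3000)                    # fitted F(3000) = 0.5971
D = j / n - t                        # D(x) uses n: empirical minus theoretical = -0.0257
print(round((s - t) - D, 4))         # -0.0714, answer B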
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 555
Section 19, Anderson-Darling Test244
Similar to the Kolmogorov-Smirnov (K-S) statistic, the Anderson-Darling statistic also tests how well an ungrouped data set is fit by a given distribution. The computation of the Anderson-Darling statistic is somewhat different, and is based on giving more weight to differences in either of the two tails.
No Truncation or Censoring:
In the absence of truncation or censoring, for a data set of size n, {y1, y2, ..., yn}, from smallest to largest, the Anderson-Darling statistic, A^2, can be computed as:
A^2 = -n - (1/n) Σi=1 to n (2i - 1) {ln[F(yi)] + ln[S(yn+1-i)]} = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)].245
Exercise: An Exponential distribution with θ = 1000 was fit to the following four claims: 197, 325, 981, 2497. What is the value of the Anderson-Darling statistic?
[Solution: F(x) = 1 - e^(-x/1000). F(197) = 0.1788. ln(F(197)) = -1.7214. S(x) = e^(-x/1000). ln(S(x)) = -x/1000.
(1)ln(F(197)) + (3)ln(F(325)) + (5)ln(F(981)) + (7)ln(F(2497)) = (1)(-1.7214) + (3)(-1.2820) + (5)(-0.4699) + (7)(-0.0859) = -8.518.
(1)ln(S(2497)) + (3)ln(S(981)) + (5)ln(S(325)) + (7)ln(S(197)) = (1)(-2.497) + (3)(-0.981) + (5)(-0.325) + (7)(-0.197) = -8.444.
i   2i - 1   yi     F(yi)    ln F(yi)   S(yn+1-i)   ln S(yn+1-i)
1   1        197    0.1788   -1.7214    0.0823      -2.497
2   3        325    0.2775   -1.2820    0.3749      -0.981
3   5        981    0.6251   -0.4699    0.7225      -0.325
4   7        2497   0.9177   -0.0859    0.8212      -0.197
Sum                          -8.5185                -8.4440
Anderson-Darling statistic = -4 - (1/4)(-8.518 - 8.444) = 0.241. Comment: For this situation, we have previously calculated the K-S statistic as 0.2225.] Just as with the K-S statistic, the Anderson-Darling statistic is always positive, and large values indicate a bad fit.246
244 See Section 16.4.2 of Loss Models.
245 See for example, Survival Models by Dick London, not on the syllabus.
246 Unlike the K-S statistic, the Anderson-Darling Statistic can be greater than 1.
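The first formula for A^2 translates directly into a few lines of code. The sketch below is my own illustration (the helper name is mine, not the author's); it reproduces the 0.241 of the exercise above for the Exponential with θ = 1000:

import math

def anderson_darling(data, cdf):
    # A^2 = -n - (1/n) * sum of (2i - 1) * {ln F(y_i) + ln S(y_{n+1-i})}, no truncation or censoring
    y = sorted(data)
    n = len(y)
    total = 0.0
    for i in range(1, n + 1):
        F_i = cdf(y[i - 1])            # F at the i-th smallest observation
        S_rev = 1.0 - cdf(y[n - i])    # S at the (n + 1 - i)-th smallest observation
        total += (2 * i - 1) * (math.log(F_i) + math.log(S_rev))
    return -n - total / n

exp_cdf = lambda x: 1.0 - math.exp(-x / 1000.0)
print(round(anderson_darling([197, 325, 981, 2497], exp_cdf), 3))   # about 0.241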
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 556
Hypothesis Testing:
According to Loss Models, the critical values for the Anderson-Darling statistic are:247
significance level α:   10%     5%      1%
critical value c:       1.933   2.492   3.880
Thus for the above exercise with an Anderson-Darling statistic of 0.241, we do not reject the fit of the Exponential Distribution at 10%. Exercise: If the Anderson-Darling Statistic had been 3, what conclusion would we draw? [Solution: 2.492 < 3 < 3.880. Reject to the left and do not reject to the right. Reject at 5%, and do not reject the fit at 1%.] Note that unlike the critical values for the Kolmogorov-Smirnov Statistic, the critical values for the Anderson-Darling Statistic do not depend on the sample size. Comparing the Anderson-Darling and Kolmogorov-Smirnov Statistics: For various distributions fit by the Method of Maximum Likelihood to the ungrouped data in Section 2 with 130 losses,248 the values of the Anderson-Darling Statistic are:
              Anderson-Darling Statistic      K-S Statistic
Pareto        0.229                           0.059
LogNormal     0.825                           0.082
Weibull       1.222                           0.092
Gamma         2.501  reject at 5%             0.132  reject at 5%
Exponential   11.313 reject at 1%             0.240  reject at 1%
The Pareto is an excellent fit, while the Exponential is a horrible fit. In this case, the ranking of the fits and the results of hypothesis testing are the same for the Anderson-Darling and the KolmogorovSmirnov statistics. In general, the Anderson-Darling statistic may give different results than the K-S statistic, due to the Anderson-Darling applying more weight to the differences in the tails.249 247
See page 450 of Loss Models. If asked to do hypothesis testing, this table should be included in the question. These critical values are not strictly applicable to small samples. While these critical values are often used when comparing data to distributions fit to that same date, the correct critical values in this case are smaller. See page 429 of Loss Models. When there is censoring from above, then the critical values need to be smaller. 248 See the Section on fitting to ungrouped data via maximum likelihood for the parameters of the fitted distributions. 249 Looking at the previously presented graphs of the difference functions, in the case of the Method of Maximum Likelihood applied to the ungrouped data in Section 1, the major discrepancies were in the range 100,000 to 200,000 rather than in the tails.
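Applying the critical values to the five fitted distributions above is mechanical; here is a short Python sketch (my own illustration, using only the critical values from the table above):

CRITICAL = [(0.10, 1.933), (0.05, 2.492), (0.01, 3.880)]   # (significance level, critical value)

def ad_conclusion(a2):
    rejected = [alpha for alpha, c in CRITICAL if a2 > c]
    # reject at the smallest significance level whose critical value is exceeded
    return f"reject at {min(rejected):.0%}" if rejected else "do not reject at 10%"

for name, a2 in [("Pareto", 0.229), ("LogNormal", 0.825), ("Weibull", 1.222),
                 ("Gamma", 2.501), ("Exponential", 11.313)]:
    print(name, ad_conclusion(a2))   # Gamma: reject at 5%; Exponential: reject at 1%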
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 557
For many applications of loss distributions, we are very concerned with the fit in the right hand tail, and thus the extra weight applied there in the computation of the Anderson-Darling statistic is very useful. On the other hand, for many applications of loss distributions, we are unconcerned with the fit in the left hand tail, and thus the extra weight applied there in the computation of the Anderson-Darling statistic is counterproductive.
General Formula:
The Anderson-Darling Statistic is defined as:
A^2 ≡ n ∫t to u {Fn(x) - F*(x)}^2 f*(x) / {F*(x) S*(x)} dx,
where F* is the model distribution to which we are comparing,250 Fn is the empirical distribution, n is the number of data points, t is the truncation point from below, and u is the censorship point from above. Thus the Anderson-Darling statistic is small when the model distribution closely matches the empirical distribution, and thus their squared difference is small.
Anderson-Darling statistic small ⇔ good fit. Anderson-Darling statistic large ⇔ bad fit.
The Anderson-Darling statistic is a weighted average of the squared difference between the empirical distribution function and the model distribution function. The weights are 1/{F(x)S(x)}. F(x)S(x) is proportional to the variance of the empirical distribution function. Thus the weights are approximately inversely proportional to the variance of the empirical distribution.
Near the middle the weights are close to: 1 / {(1/2)(1/2)} = 4.
In the left hand tail, the weights are larger, for example: 1 / {(1/10)(9/10)} = 11.1.
Similarly, in the right hand tail, the weights are larger, for example: 1 / {(9/10)(1/10)} = 11.1.
Thus, the Anderson-Darling Statistic weights more heavily discrepancies in either tail.
250 After altering a ground-up, unlimited size of loss distribution for the effects of any truncation from below. What I have called H.
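For concreteness, the weights quoted above are simply 1/{F(1 - F)} evaluated at a few percentiles (a quick check of the arithmetic, not from the study guide):

for F in (0.5, 0.1, 0.9):
    print(F, round(1.0 / (F * (1.0 - F)), 1))   # 4.0 in the middle, 11.1 in either tail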
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 558
It turns out, using the fact that Fn(x) is constant on intervals, the above integral reduces to:251 252
A^2 = -n F*(u) + n Σi=0 to k Sn(yi)^2 ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)^2 ln[F*(yi+1) / F*(yi)],
where n is the number of data points, k is the number of data points which are not censored from above, t is the truncation point from below, and u is the censorship point from above. t = y0 < y1 < y2 ... < yk-1 < yk < yk+1 = u.
In the absence of truncation and censoring, the above formula becomes:253
A^2 = -n + n Σi=0 to n-1 Sn(yi)^2 {ln[S(yi)] - ln[S(yi+1)]} + n Σi=1 to n Fn(yi)^2 {ln[F(yi+1)] - ln[F(yi)]},
where Fn is the empirical distribution function, Fn(yi) = i/n, Sn is the empirical survival function, Sn(yi) = 1 - i/n, y0 = 0, and yn+1 = ∞.
Exercise: An Exponential distribution with θ = 1000 was fit to the following four claims: 197, 325, 981, 2497. Using the above formula, compute the Anderson-Darling statistic.
[Solution: F(x) = 1 - e^(-x/1000). S(x) = e^(-x/1000). ln(S(x)) = -x/1000. n = number of data points = 4. Fn(yi) = i/4. Sn(yi) = 1 - Fn(yi) = 1 - i/4.
Σi=0 to n-1 Sn(yi)^2 {ln[S(yi)] - ln[S(yi+1)]} = 0.5278:
i   yi    Sn(yi)   ln S(yi)   ln S(yi+1)   contribution
0   0     1.0000   0.0000     -0.1970      0.1970
1   197   0.7500   -0.1970    -0.3250      0.0720
2   325   0.5000   -0.3250    -0.9810      0.1640
3   981   0.2500   -0.9810    -2.4970      0.0948
                               Sum          0.5278
251 See page 450 of Loss Models. In the absence of censoring from above, yn+1 = ∞ and one should not include the final term in the first summation; otherwise one would be asked to take ln(0).
252 One can, instead of having the first term, have the final summation go to i = k+1, with yk+2 = ∞, and F(yk+2) = 1.
253 With no truncation, t = 0 = y0. With no censoring, k = n and u = ∞ = yn+1, and we do not include the final term in the sum involving ln S.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 559
Σi=1 to n Fn(yi)^2 {ln[F(yi+1)] - ln[F(yi)]} = 0.5324:
i   yi     Fn(yi)   ln F(yi)   ln F(yi+1)   contribution
1   197    0.2500   -1.7214    -1.2820      0.0275
2   325    0.5000   -1.2820    -0.4699      0.2030
3   981    0.7500   -0.4699    -0.0859      0.2160
4   2497   1.0000   -0.0859    0.0000       0.0859
                                Sum          0.5324
Anderson-Darling statistic = -4 + (4)(0.5278) + (4)(0.5324) = 0.241.] This matches the result previously obtained. One can show that in the absence of truncation and censoring, the first formula I gave for the Anderson-Darling statistic, matches the more complicated formula given in Loss Models. Derivation of the formula for the Anderson-Darling Statistic: Assume no truncation or censoring. Let x1 , ..., xn be the data from smallest to largest. Let x0 and xn+1 be the lower and upper endpoints of the support of F. Then the empirical distribution function is: Fn (x) = j/n for xj ≤ x < xj+1, j = 0 to n. For xj ≤ x < xj+1: n{Fn (x) - F(x)}2 f(x) /{F(x)S(x)} = n{j/n - F(x)}2 f(x) /{F(x)S(x)} = (j2 /n) f(x) /{F(x)S(x)} - 2j f(x)/S(x) + nf(x)F(x)/S(x). Let y = S(x). dy = -f(x) dx
Then A^2 ≡ n ∫ {Fn(x) - F(x)}^2 f(x) / {F(x)S(x)} dx
= Σ (j^2/n) ∫ f(x)/{F(x)S(x)} dx - Σ 2j ∫ f(x)/S(x) dx + Σ n ∫ f(x)F(x)/S(x) dx
= -Σ (j^2/n) ∫ dy/(y - y^2) + 2 Σ j ∫ dy/y - n Σ ∫ (1 - y)/y dy.
∫ dy/(y - y^2) = ∫ dy/y - dy/(1 - y) = ln(y) + ln(1-y). ∫ (1 - y)/y dy = ∫ dy/y - ∫ dy = ln(y) - y.
At the top of each interval, y = S(xj+1). At the bottom of each interval, y = S(xj).
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 560 A 2 = (-1/n)Σ j2 {lnS(xj+1) + lnF(xj+1) - lnS(xj) - lnF(xj)} + 2Σ j {lnS(xj+1) - lnS(xj)} - nΣ {lnS(xj+1) - S(xj+1) - lnS(xj) + S(xj)} . Assume for example that n = 3, with three data points from smallest to largest: x1 , x2 , x3 . Then, A2 = (-1/3){lnS(x2 ) + lnF(x2 ) - lnS(x1 ) - lnF(x1 ) + 4(lnS(x3 ) + lnF(x3 ) - lnS(x2 ) - lnF(x2 )) + 9(lnS(x4 ) + lnF(x4 ) - lnS(x3 ) - lnF(x3 )) } + 2{lnS(x2 ) - lnS(x1 ) + 2(lnS(x3 ) - lnS(x2 )) + 3(lnS(x4 ) - lnS(x3 ))} - 3{lnS(x1 ) - S(x1 ) - lnS(x0 ) + S(x0 ) + lnS(x2 ) - S(x2 ) - lnS(x1 ) + S(x1 ) + lnS(x3 ) - S(x3 ) - lnS(x2 ) + S(x2 ) + lnS(x4 ) - S(x4 ) - lnS(x3 ) + S(x3 )} = -3lnS(x4 ) + (5/3)lnS(x3 ) + lnS(x2 ) + (1/3) lnS(x1 ) - 3 lnF(x4 ) + (5/3)lnF(x3 ) + lnF(x2 ) + (1/3) lnF(x1 ) + 6lnS(x4 ) - 2lnS(x3 ) - 2lnS(x2 ) - 2lnS(x1 ) - 3 lnS(x4 ) + 3 lnS(x0 ) + 3 S(x4 ) - 3 S(x0 ) = -3 + (-1/3){lnS(x3 ) + 3lnS(x2 ) + 5 lnS(x1 )} + (-1/3){lnF(x1 ) + 3lnF(x2 ) + 5 lnF(x3 )}.254 This is of the stated form: n
A^2 = -n - (1/n) Σi=1 to n (2i - 1) ln[F(yi) S(yn+1-i)] = -n - (1/n) Σi=1 to n (2i - 1) {ln[F(yi)] + ln[S(yn+1-i)]}.
Censoring:255
Here is an example of the computation of the Anderson-Darling Statistic, when there is censorship from above at u.
A^2 = -n F*(u) + n Σi=0 to k Sn(yi)^2 ln[S*(yi) / S*(yi+1)] + n Σi=1 to k Fn(yi)^2 ln[F*(yi+1) / F*(yi)].
k = the number of data points which are not censored from above. Therefore, the summations in the Anderson-Darling statistic are only taken over those data points that have not been censored from above. Also, the first term is the number of data points multiplied by the value of the uncensored distribution at the censorship point.
254 Using the facts that: S(x4) = 0, S(x0) = 1, lnS(x0) = ln(1) = 0, and lnF(x4) = ln(1) = 0.
255 See the second half of example 16.6 in Loss Models.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 561
Exercise: An Exponential distribution with θ = 1000 is compared to the following data set censored from above at 2000: 197, 325, 981, 2000. What is the value of the Anderson-Darling statistic?
[Solution: F*(x) = 1 - e^(-x/1000), x < 2000. S*(x) = e^(-x/1000), x < 2000. ln(S*(x)) = -x/1000.
n = number of data points = 4. k = number of data points not censored from above = 3.
Fn(yi) = i/4. Sn(yi) = 1 - Fn(yi) = 1 - i/4. yk+1 = y4 = u = 2000.
Σi=0 to k Sn(yi)^2 ln[S*(yi) / S*(yi+1)] = 0.4967:
i   yi     Sn(yi)   ln S*(yi)   ln S*(yi+1)   contribution
0   0      1.0000   0.0000      -0.1970       0.1970
1   197    0.7500   -0.1970     -0.3250       0.0720
2   325    0.5000   -0.3250     -0.9810       0.1640
3   981    0.2500   -0.9810     -2.0000       0.0637
4   2000
                                 Sum           0.4967
Σi=1 to k Fn(yi)^2 ln[F*(yi+1) / F*(yi)] = 0.4130:
i   yi     Fn(yi)   ln F*(yi)   ln F*(yi+1)   contribution
1   197    0.2500   -1.7214     -1.2820       0.0275
2   325    0.5000   -1.2820     -0.4699       0.2030
3   981    0.7500   -0.4699     -0.1454       0.1825
4   2000   1.0000   -0.1454     0.0000
                                 Sum           0.4130
F*(u) = 1 - e-2000/1000 = 0.8647. Anderson-Darling statistic = -(4)(0.8647) + (4)(0.4967) + (4)(0.4130) = 0.180.]
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 562
Truncation:256
Here is an example of the computation of the Anderson-Darling Statistic, when there is truncation from below at t.
Exercise: Losses prior to truncation are assumed to follow an Exponential distribution with θ = 1000. This assumption is compared to the following data set truncated from below at 250: 325, 981, 2497. What is the value of the Anderson-Darling Statistic?
[Solution: After truncation from below at 250, F*(x) = {F(x) - F(250)} / S(250) = (e^(-250/1000) - e^(-x/1000)) / e^(-250/1000) = 1 - e^(-(x - 250)/1000), x > 250. ln(S*(x)) = -(x - 250)/1000, x > 250.
n = number of data points = 3. k = number of data points not censored from above = 3.
Fn(yi) = i/3. Sn(yi) = 1 - Fn(yi) = 1 - i/3. y0 = t = truncation point = 250. y4 = ∞.
In the absence of censoring we do not include the last term in the first summation, which would otherwise involve taking the log of infinity.
Σi=0 to 2 Sn(yi)^2 {ln[S*(yi)] - ln[S*(yi+1)]} = 0.5350:
i   yi     Sn(yi)   ln S*(yi)   ln S*(yi+1)   contribution
0   250    1.0000   0.0000      -0.0750       0.0750
1   325    0.6667   -0.0750     -0.7310       0.2916
2   981    0.3333   -0.7310     -2.2470       0.1684
3   2497
                                 Sum           0.5350
Σi=1 to 3 Fn(yi)^2 {ln[F*(yi+1)] - ln[F*(yi)]} = 0.5729:
i   yi     Fn(yi)   ln F*(yi)   ln F*(yi+1)   contribution
1   325    0.3333   -2.6275     -0.6567       0.2190
2   981    0.6667   -0.6567     -0.1117       0.2422
3   2497   1.0000   -0.1117     0.0000        0.1117
                                 Sum           0.5729
F*(u) = F*(∞) = 1. Anderson-Darling statistic = -3 + (3)(0.5350) + (3)(0.5729) = 0.324.]
256 See the first half of example 16.6 in Loss Models.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 563
Truncation and Censoring:
Here is an example involving both truncation from below and censoring from above.
Exercise: Losses prior to truncation and censoring are assumed to follow a Weibull distribution with θ = 3000 and τ = 1/2. This assumption is compared to the following data set truncated from below at 1000 and censored from above at 10,000: 1219, 1737, 2618, 3482, 4825, 6011, 10,000, 10,000, 10,000, 10,000. What is the value of the Anderson-Darling statistic?
[Solution: After truncation from below at 1000, F*(x) = {F(x) - F(1000)} / S(1000) = (e^(-(1000/3000)^0.5) - e^(-(x/3000)^0.5)) / e^(-(1000/3000)^0.5) = 1 - 1.7813 e^(-(x/3000)^0.5), 1000 < x < 10,000.
n = number of data points = 10. k = number of data points not censored from above = 6.
Fn(yi) = i/10. Sn(yi) = 1 - Fn(yi) = 1 - i/10.
Σi=0 to k Sn(yi)^2 ln[S*(yi) / S*(yi+1)] = 0.5123:
i   yi      Sn(yi)   S*(yi)   ln S*(yi)   ln S*(yi+1)   contribution
0   1000    1.0000   1.0000   0.0000      -0.0601       0.0601
1   1219    0.9000   0.9417   -0.0601     -0.1836       0.1000
2   1737    0.8000   0.8323   -0.1836     -0.3568       0.1109
3   2618    0.7000   0.6999   -0.3568     -0.5000       0.0702
4   3482    0.6000   0.6065   -0.5000     -0.6909       0.0687
5   4825    0.5000   0.5011   -0.6909     -0.8382       0.0368
6   6011    0.4000   0.4325   -0.8382     -1.2484       0.0656
7   10000            0.2870
                                           Sum           0.5123
Σi=1 to k Fn(yi)^2 ln[F*(yi+1) / F*(yi)] = 0.2106:
i   yi      Fn(yi)   F*(yi)   ln F*(yi)   ln F*(yi+1)   contribution
1   1219    0.1000   0.0583   -2.8417     -1.7855       0.0106
2   1737    0.2000   0.1677   -1.7855     -1.2036       0.0233
3   2618    0.3000   0.3001   -1.2036     -0.9328       0.0244
4   3482    0.4000   0.3935   -0.9328     -0.6954       0.0380
5   4825    0.5000   0.4989   -0.6954     -0.5665       0.0322
6   6011    0.6000   0.5675   -0.5665     -0.3382       0.0822
7   10000            0.7130
                                           Sum           0.2106
F*(u) = F*(10000) = 1 - 1.7813 exp[-(10000/3000)^0.5] = 0.7130. Anderson-Darling statistic = -(10)(0.7130) + (10)(0.5123) + (10)(0.2106) = 0.099.]
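The reduced formula with truncation and censoring can be coded once and checked against the exercises above. The following Python sketch is my own (the function name and argument layout are assumptions, not the author's); with the Weibull inputs it reproduces A^2 of about 0.099:

import math

def anderson_darling_tc(uncensored, n_censored, t, u, cdf):
    # A^2 for data left truncated at t and right censored at u (requires a finite u).
    # uncensored: sorted losses with t < y < u; n_censored: observations censored at u.
    n = len(uncensored) + n_censored              # total number of data points
    k = len(uncensored)                           # points not censored from above
    St = 1.0 - cdf(t)
    Fstar = lambda x: (cdf(x) - cdf(t)) / St      # distribution adjusted for truncation at t
    Sstar = lambda x: (1.0 - cdf(x)) / St
    y = [t] + list(uncensored) + [u]              # y_0 = t, y_1..y_k data, y_{k+1} = u
    sum_S = sum((1 - i / n) ** 2 * math.log(Sstar(y[i]) / Sstar(y[i + 1])) for i in range(0, k + 1))
    sum_F = sum((i / n) ** 2 * math.log(Fstar(y[i + 1]) / Fstar(y[i])) for i in range(1, k + 1))
    return -n * Fstar(u) + n * sum_S + n * sum_F

weibull_cdf = lambda x: 1.0 - math.exp(-(x / 3000.0) ** 0.5)
print(round(anderson_darling_tc([1219, 1737, 2618, 3482, 4825, 6011],
                                n_censored=4, t=1000, u=10000, cdf=weibull_cdf), 3))   # about 0.099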
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 564 Problems: 19.1 (3 points) For the following four losses: 40, 150, 230, 400, and an Exponential Distribution with θ = 200, what is the value of the Anderson-Darling statistic? A. 0.05
B. 0.10
C. 0.20
D. 0.30
E. 0.40
19.2 (4 points) A Pareto distribution with parameters α = 1.5 and θ = 1000 was fit to the following five claims: 179, 352, 918, 2835, 6142. What is the value of the Anderson-Darling statistic? A. less than 0.35 B. at least 0.35 but less than 0.40 C. at least 0.40 but less than 0.45 D. at least 0.45 but less than 0.50 E. at least 0.50 19.3 (2 points) A Weibull Distribution, F*(x), has been fit to 80 uncensored sizes of loss, yj. 79
Σ {1 - Fn(yj)}2 {ln[1 - F*(yj)] - ln[1 - F*(yj+1)]} = 0.4982. j=0 80
Σ Fn(yj)2 {lnF*(yj+1) - lnF*(yj)} = 0.5398. j=1
Use the following table for the Anderson-Darling statistic: significance level α: 10%
5%
1%
critical value c: 1.933 2.492 3.880 Which of the following are true with respect this Weibull fit? A. Do not reject the fit at 10%. B. Do not reject the fit at 5%. Reject the fit at 10%. C. Do not reject the fit at 1%. Reject the fit at 5%. D. Reject the fit at 1%. E. None of the above. 19.4 (3 points) A distribution was fit to the following 5 untruncated and uncensored losses: 410, 1924, 2635, 4548, and 6142. The corresponding values of the fitted distribution are: .0355, .4337, .5659, .7720, and .8559. What is the value of the Anderson-Darling statistic? A. 0.15 B. 0.20 C. 0.25 D. 0.30 E. 0.35
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 565 Use the following information for the next two questions:
• There are five observed sizes of loss: x1 < x2 < x3 < x4 < x5 . • A parametric distribution, F(x), has been fit to this data. • F(x1 ) = 0.1.
F(x2 ) = 0.3.
F(x3 ) = 0.5.
F(x4 ) = 0.7.
F(x5 ) = 0.9.
19.5 (1 point) Compute the Kolmogorov-Smirnov Statistic. A. 0.09 B. 0.10 C. 0.11 D. 0.12 E. 0.13 19.6 (2 points) Compute the Anderson-Darling Statistic. A. 0.09 B. 0.10 C. 0.11 D. 0.12 E. 0.13
Use the following information for the next two questions:
• There are five observed sizes of loss: x1 < x2 < x3 < x4 < x5 . • A parametric distribution, F(x), has been fit to this data. • F(x1 ) = 0.1.
F(x2 ) = 0.3.
F(x3 ) = 0.6.
F(x4 ) = 0.7.
F(x5 ) = 0.9.
19.7 (1 point) Compute the Kolmogorov-Smirnov Statistic. A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21 19.8 (2 points) Compute the Anderson-Darling Statistic. A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21
Use the following information for the next two questions:
• There are five observed sizes of loss: x1 < x2 < x3 < x4 < x5 . • A parametric distribution, F(x), has been fit to this data. • F(x1 ) = 0.1.
F(x2 ) = 0.3.
F(x3 ) = 0.5.
F(x4 ) = 0.7.
19.9 (1 point) Compute the Kolmogorov-Smirnov Statistic. A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21 19.10 (2 points) Compute the Anderson-Darling Statistic. A. 0.17 B. 0.18 C. 0.19 D. 0.20 E. 0.21
F(x5 ) = 0.8.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 566 19.11 (1 point) A distribution has been fit to data. The Anderson-Darling Statistic is 2.11. Use the following table for the Anderson-Darling statistic: significance level α: 10%
5%
1%
critical value c: 1.933 2.492 3.880 Which of the following are true with respect this fit? A. Do not reject the fit at 10%. B. Do not reject the fit at 5%. Reject the fit at 10%. C. Do not reject the fit at 1%. Reject the fit at 5%. D. Reject the fit at 1%. E. None of the above. 19.12 (6 points) You observe the following 10 losses truncated from below at 100,000: 241,513 110,493
231,919 139,647
105,310 220,942
125,152 161,964
116,472 105,829
A Distribution Function: F(x) = 1 - (x/100000)-2.8, x > 100,000, has been fit to this data. Determine the value of the Anderson-Darling statistic. A. 0.1 B. 0.2 C. 0.3 D. 0.4 E. 0.5 19.13 (4 points) Losses prior to truncation and censoring are assumed to follow a Weibull distribution with θ = 3000 and τ = 2. This assumption is compared to the following data set censored from above at 5000: 737, 1618, 2482, 3003, 5000. What is the value of the Anderson-Darling statistic? (A) 0.16 (B) 0.18 (C) 0.20 (D) 0.22 (E) 0.24 19.14 (1 point) Which of the following statements are true? 1. For the Kolmogorov-Smirnov test, if the sample size were to double, with each number showing up twice instead of once, the test statistic would double and the critical values would remain unchanged. 2. For the Anderson-Darling test, if the sample size were to double, with each number showing up twice instead of once, the test statistic would double and the critical values would remain unchanged. 3. For the Likelihood Ratio test, if the sample size were to double, with each number showing up twice instead of once, the test statistic would double and the critical values would remain unchanged. A. 1 B. 2 C. 3 D. 1, 2 E. 2, 3 19.15 (4 points) Losses truncated from below at 50 are of size: 64, 90, 132, 206. Loss sizes prior to truncation are assumed to follow an Exponential distribution with θ = 100. Calculate the value of the Anderson-Darling statistic. (A) 0.19 (B) 0.21 (C) 0.23 (D) 0.25
(E) 0.27
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 567 19.16 (4 points) With a deductible of 500 and a maximum covered loss of 5000, the following five payments are made: 179, 352, 968, 1421, 4500. Loss sizes prior to the effect of the deductible and maximum covered loss are assumed to follow a Pareto Distribution with parameters α = 2 and θ = 1000. Calculate the value of the Anderson-Darling statistic. (A) 0.24 (B) 0.26 (C) 0.28 (D) 0.30
(E) 0.32
19.17 (3 points) Assume lifetimes are uniform on (0, 100). You observe three deaths at ages: 40, 70, and 80. Compute the Anderson-Darling Statistic. (A) 0.51 (B) 0.54 (C) 0.57 (D) 0.60 (E) 0.63 19.18 (1 point) A certain distribution is being compared to data. Let H0 be that the data was drawn from this distribution. Let H1 be that the data was not drawn from this distribution. The Anderson-Darling Statistic is 3.2. Use the following table for the Anderson-Darling statistic: significance level α:
10%
5%
1%
critical value c: 1.933 2.492 3.880 What are the probabilities of Type I and Type II errors? 19.19 (3 points) Assume lifetimes are uniform on (0, 100). You observe a single death at age 60. Compute the Anderson-Darling Statistic using u
A2
=n
{Fn(x) - F * (x)}2 f * (x) dx. F * (x) S * (x) t
∫
(A) 0.31
(B) 0.34
(C) 0.37
(D) 0.40
(E) 0.43
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 568 19.20 (3 points) Prior to truncation and censoring f(x) = 2x, 0 < x < 1. Data has been left truncated at 0.3 and right censored at 0.7: 0.6, 0.7+. Compute the value of the Anderson-Darling statistic. A. less than 0.10 B. at least 0.10 but less than 0.12 C. at least 0.12 but less than 0.14 D. at least 0.14 but less than 0.16 E. at least 0.16 19.21 (3 points) Prior to the affect of a maximum covered loss, losses are assumed to follow a LogNormal Distribution with µ = 11 and σ = 1. Data has been collected from policies with a 50,000 maximum covered loss: 20,000, 30,000, 50,000+, 50,000+. Compute the value of the Anderson-Darling statistic. (A) 0.08 (B) 0.10 (C) 0.12 (D) 0.14
(E) 0.16
19.22 (3 points) Assume that the random variable x has the probability density function: f(x) = 0.18 - 0.016x , 0 ≤ x ≤ 10. Suppose that a sample is truncated at x = 6 so that values below this amount are excluded. The sample is then observed to be: 7.0, 7.5, 8.0, 8.0. Compute the value of the Anderson-Darling statistic. (A) 0.7 (B) 0.8 (C) 0.9 (D) 1.0 (E) 1.1 19.23 (Course 160 Sample Exam #3, 1994, Q.12) (1.9 points) With respect to methods used to test the acceptability of a fitted parametric model as a representation of the true underlying model, according to Loss Models which of the following are true? I. Let Ej be the expected number of observations in interval j. Then the Chi-Square goodness of fit test works best when the Ej are about equal. II. The Kolmogorov-Smirnov (K-S) statistic is the smallest absolute deviation between the empirical and model distribution functions. III. The Anderson-Darling statistic is a departure measure that weights the expected squared deviations between the empirical and model distribution functions. (A) I and II only (B) I and III only (C) II and III only (D) I, II and III (E) The correct answer is not given by (A), (8), (C) or (D).
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 569 19.24 (4, 11/04, Q.22 & 2009 Sample Q.149) (2.5 points) If the proposed model is appropriate, which of the following tends to zero as the sample size goes to infinity? (A) Kolmogorov-Smirnov test statistic (B) Anderson-Darling test statistic (C) Chi-square goodness-of-fit test statistic (D) Schwarz Bayesian adjustment (E) None of (A), (B), (C) or (D) 19.25 (4, 5/05, Q.19 & 2009 Sample Q.189) (2.9 points) Which of the following statements is true? (A) For a null hypothesis that the population follows a particular distribution, using sample data to estimate the parameters of the distribution tends to decrease the probability of a Type II error. (B) The Kolmogorov-Smirnov test can be used on individual or grouped data. (C) The Anderson-Darling test tends to place more emphasis on a good fit in the middle rather than in the tails of the distribution. (D) For a given number of cells, the critical value for the chi-square goodness-of-fit test becomes larger with increased sample size. (E) None of (A), (B), (C) or (D) is true. 19.26 (4, 11/05, Q.34 & 2009 Sample Q.244) (2.9 points) Which of statements (A), (B), (C), and (D) is false? (A) The chi-square goodness-of-fit test works best when the expected number of observations varies widely from interval to interval. (B) For the Kolmogorov-Smirnov test, when the parameters of the distribution in the null hypothesis are estimated from the data, the probability of rejecting the null hypothesis decreases. (C) For the Kolmogorov-Smirnov test, the critical value for right censored data should be smaller than the critical value for uncensored data. (D) The Anderson-Darling test does not work for grouped data. (E) None of (A), (B), (C) or (D) is false.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 570 Solutions to Problems: 19.1. D. F(x) = 1 - e-x/200. F(40) = .1813. ln(F(40)) = -1.7078. S(x) = e-x/200. ln(S(x)) = - x/200. (1)ln(F(40)) + (3)ln(F(150)) + (5)ln(F(230)) + (7)ln(F(400)) = -6.5474. (1)ln(S(2497)) + (3)ln(S(981)) + (5)ln(S(325)) + (7)ln(S(197)) = -10.6. i
2i - 1
yi
F(yi)
lnF(yi)
S(yn+i-1)
lnS(yn+i-1)
1 2 3 4
1 3 5 7
40 150 230 400
0.1813 0.5276 0.6834 0.8647
-1.7078 -0.6394 -0.3807 -0.1454
0.1353 0.3166 0.4724 0.8187
-2 -1.15 -0.75 -0.2
-6.5474
-10.6000
Anderson-Darling statistic = A2 = -n - (1/n)Σ (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} = - 4 - (1/4)(-6.5474 - 10.6) = 0.287. A2 = -n - (1/n)Σ (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} n-1
n
Alternately, A2 = -n + nΣ Sn (yi)2 {ln(S(yi)) - ln(S(yi+1))} + nΣ Fn (yi)2 {ln(F(yi+1)) - ln(F(yi))} = i=0
i=1
-4 + (4){(12 )ln(1/.8187) + (.752 )ln(.8187/.4724) + (.52 )ln(.4724/.3166) + (.252 )ln(.3166/.1353)} + (4){(.252 )ln(.5276/.1813) + (.52 )ln(.6834/.5276) + (.752 )ln(.8647/.6834) + (12 )ln(1/.8647)} = 0.287.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 571 19.2. E. At each of the observed claim sizes, compute the values of the fitted Pareto distribution: F(x) = 1 - (θ/(θ+x))α = 1 - (1000/(1000 +x))1.5. So for example, F(352) = 1 - (1000/(1000 + 352))1.5 = 0.3639. (1)ln(F(179)) + (3)ln(F(352)) + (5)ln(F(918)) + (7)ln(F(2835)) + (9)ln(F(6142)) = -8.3984. (1)ln(S(6142)) + (3)ln(S(2835)) + (5)ln(S(918)) + (7)ln(S(352)) + (9)ln(S(179)) = -19.2720. i
2i - 1
yi
F(yi)
lnF(yi)
S(yn+i-1)
lnS(yn+i-1)
1 2 3 4 5
1 3 5 7 9
179 352 918 2835 6142
0.2189 0.3639 0.6235 0.8668 0.9476
-1.5193 -1.0109 -0.4724 -0.1429 -0.0538
0.0524 0.1332 0.3765 0.6361 0.7811
-2.9490 -2.0163 -0.9769 -0.4524 -0.2470
-8.3984
-19.2720
Anderson-Darling statistic = - 5 - (1/5)(-8.3984 - 19.2720) = 0.534. n-1
n
Alternately, A2 = -n + nΣ Sn (yi)2 {ln(S(yi)) - ln(S(yi+1))} + nΣ Fn (yi)2 {ln(F(yi+1)) - ln(F(yi))} = i=0
i=1
-5 + (5){(12 )ln(1/.7811) + (.82 )ln(.7811/.6361) + (.62 )ln(.6361/.3765) + (.42 )ln(.3765/.1332) + (.22 )ln(.1332/.0524) + (.22 )ln(.3639/.2189) + (.42 )ln(.6235/.3639) + (.62 )ln(.8668/.6235) + (.82 )ln(.9476/.8668) + (12 )ln(1/.9476)} = 0.534. 19.3. C. Since there is no censoring, censorship value = u = ∞, the term for j = 80 drops out of the first summation, and k = number of uncensored points = number of points = n = 80. The Anderson-Darling Statistic is computed as: k
k
A 2 = -nF*(u) + nΣ Sn (yj)2 {ln(S*(yj)) - ln(S*(yj+1))} + nΣ Fn (yj)2 {ln(F*(yj+1)) - ln(F*(yj))} j= 0
j=1
= -80 + (80)(.4982) + (80)(.5398) = 3.04. Since 2.492 < 3.04 < 3.880, we reject the fit at 5% and do not reject at 1%.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 572 19.4. E. (1)ln(F(410)) + (3)ln(F(1924)) + (5)ln(F(2635)) + (7)ln(F(4548)) + (9)ln(F(6142)) = -11.9029. (1)ln(S(6142)) + (3)ln(S(4548)) + (5)ln(S(2635)) + (7)ln(S(1924)) + (9)ln(S(410)) = -14.8506. i
2i - 1
yi
F(yi)
lnF(yi)
S(yn+i-1)
lnS(yn+i-1)
1 2 3 4 5
1 3 5 7 9
410 1924 2635 4548 6142
0.0355 0.4337 0.5659 0.7720 0.8559
-3.3382 -0.8354 -0.5693 -0.2588 -0.1556
0.1441 0.2280 0.4341 0.5663 0.9645
-1.9372 -1.4784 -0.8345 -0.5686 -0.0361
-11.9029
-14.8506
Anderson-Darling statistic = - 5 - (1/5)(-11.9029 - 14.8506) = 0.351. n-1
n
Alternately, A2 = -n + nΣ Sn (yi)2 {ln(S(yi)) - ln(S(yi+1))} + nΣ Fn (yi)2 {ln(F(yi+1)) - ln(F(yi))} = i=0
i=1
-5 + (5){(12 )ln(1/.9645) + (.82 )ln(.9645/.5663) + (.62 )ln(.5663/.4341) + (.42 )ln(.4341/.2280) + (.22 )ln(.2280/.1441) + (.22 )ln(.4337/.0355) + (.42 )ln(.5659/.4337) + (.62 )ln(.7720/.5659) + (.82 )ln(.8559/.7720) + (12 )ln(1/.8559)} = 0.351. Comment: Based on a fitted LogNormal Distribution with parameters µ = 7.72 and σ = .944. 19.5. B. The Kolmogorov-Smirnov Statistic is 0.1. X
Fitted F(X)
x1
0.1
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.1 0.1
0.2 0.1 x2
0.3 0.1 0.4 0.1
x3
0.5 0.1 0.6 0.1
x4
0.7 0.1 0.8 0.1
x5
0.9 0.1 1
Comment: This is the smallest possible K-S statistic for 5 unique data points.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 573 19.6. E. (1)ln(F(x1 )) + (3)ln(F(x2 )) + (5)ln(F(x3 )) + (7)ln(F(x4 )) + (9)ln(F(x5 )) = ln(.1) + 3ln(.3) + 5ln(.5) + 7ln(.7) + 9ln(.9) = -12.825. (1)ln(S(x5 )) + (3)ln(S(x4 )) + (5)ln(S(x3 )) + (7)ln(S(x2 )) + (9)ln(S(x1 )) = ln(.1) + 3ln(.3) + 5ln(.5) + 7ln(.7) + 9ln(.9) = -12.825. Anderson-Darling statistic = A2 = -n - (1/n)Σ (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} = - 5 - (1/5)(-12.825 - 12.825) = ..130. n-1
n
Alternately, A2 = -n + nΣ Sn (yi)2 {ln(S(yi)) - ln(S(yi+1))} + nΣ Fn (yi)2 {ln(F(yi+1)) - ln(F(yi))} = i=0
i=1
-5 + (5){(12 )ln(1/.9) + (.82 )ln(.9/.7) + (.62 )ln(.7/.5) + (.42 )ln(.5/.3) + (.22 )ln(.3/.1)} + (5){(.22 )ln(.3/.1) + (.42 )ln(.5/.3) + (.62 )ln(.7/.5) + (.82 )ln(.9/.7) + (12 )ln(1/.9)} = 0.130. 19.7. D. The Kolmogorov-Smirnov Statistic is 0.2. X
Fitted F(X)
x1
0.1
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.1 0.1
0.2 0.1 x2
0.3 0.1 0.4 0.2
x3
0.6 0.0 0.6 0.1
x4
0.7 0.1 0.8 0.1
x5
0.9 0.1 1
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 574 19.8. A. (1)ln(F(x1 )) + (3)ln(F(x2 )) + (5)ln(F(x3 )) + (7)ln(F(x4 )) + (9)ln(F(x5 )) = ln(.1) + 3ln(.3) + 5ln(.6) + 7ln(.7) + 9ln(.9) = -11.914. (1)ln(S(x5 )) + (3)ln(S(x4 )) + (5)ln(S(x3 )) + (7)ln(S(x2 )) + (9)ln(S(x1 )) = ln(.1) + 3ln(.3) + 5ln(.4) + 7ln(.7) + 9ln(.9) = -13.941. Anderson-Darling statistic = A2 = -n - (1/n)Σ (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} = - 5 - (1/5)(-11.914 -13.941) = 0.171. n-1
n
Alternately, A2 = -n + nΣ Sn (yi)2 {ln(S(yi)) - ln(S(yi+1))} + nΣ Fn (yi)2 {ln(F(yi+1)) - ln(F(yi))} = i=0
i=1
-5 + (5){(12 )ln(1/.9) + (.82 )ln(.9/.7) + (.62 )ln(.7/.4) + (.42 )ln(.4/.3) + (.22 )ln(.3/.1)} + (5){(.22 )ln(.3/.1) + (.42 )ln(.6/.3) + (.62 )ln(.7/.6) + (.82 )ln(.9/.7) + (12 )ln(1/.9)} = 0.171. 19.9. D. The Kolmogorov-Smirnov Statistic is 0.2. X
Fitted F(X)
x1
0.1
Empirical Distribution 0
Absolute Value of Fitted - Empirical 0.1 0.1
0.2 0.1 x2
0.3 0.1 0.4 0.1
x3
0.5 0.1 0.6 0.1
x4
0.7 0.1 0.8 0.0
x5
0.8 0.2 1
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 575 19.10. D. (1)ln(F(x1 )) + (3)ln(F(x2 )) + (5)ln(F(x3 )) + (7)ln(F(x4 )) + (9)ln(F(x5 )) = ln(.1) + 3ln(.3) + 5ln(.5) + 7ln(.7) + 9ln(.8) = -13.885. (1)ln(S(x5 )) + (3)ln(S(x4 )) + (5)ln(S(x3 )) + (7)ln(S(x2 )) + (9)ln(S(x1 )) = ln(.2) + 3ln(.3) + 5ln(.5) + 7ln(.7) + 9ln(.9) = -12.132. Anderson-Darling statistic = A2 = -n - (1/n)Σ (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} = -5 - (1/5)( -13.885 - 12.132) = 0.203. n-1
n
Alternately, A2 = -n + nΣ Sn (yi)2 {ln(S(yi)) - ln(S(yi+1))} + nΣ Fn (yi)2 {ln(F(yi+1)) - ln(F(yi))} = i=0
i=1
-5 + (5){(12 )ln(1/.9) + (.82 )ln(.9/.7) + (.62 )ln(.7/.5) + (.42 )ln(.5/.3) + (.22 )ln(.3/.2)} + (5){(.22 )ln(.3/.1) + (.42 )ln(.5/.3) + (.62 )ln(.7/.5) + (.82 )ln(.8/.7) + (12 )ln(1/.8)} = 0.203. Comment: One of the values of the fitted distribution function differs by 0.1 from the optimal value. While the K-S statistic is the same as in the previous set of questions, the Anderson Darling statistic is not. In this case, the discrepancy was in the right hand tail rather than in the middle. Since the tails get more weight, the Anderson-Darling statistic is larger here than in the previous set of questions. 19.11. B. 1.933 < 2.11 < 2.492 ⇒ reject the fit at 10% and do not reject at 5%. 19.12. D. This is a Single Parameter Pareto Distribution, set up to work directly with data truncated from below. Anderson-Darling statistic = A2 = -n - (1/n)Σ (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} = - 10 - (1/10)(-43.5665 - 60.6007) = 0.417. i
2i - 1
yi
F(yi)
lnF(yi)
S(yn+i-1)
lnS(yn+i-1)
1 2 3 4 5 6 7 8 9 10
1 3 5 7 9 11 13 15 17 19
105310 105829 110493 116472 125152 139647 161964 220942 231919 241513
0.1349 0.1467 0.2438 0.3475 0.4665 0.6074 0.7408 0.8914 0.9051 0.9153
-2.0035 -1.9194 -1.4116 -1.0570 -0.7626 -0.4985 -0.3000 -0.1150 -0.0997 -0.0885
0.0847 0.0949 0.1086 0.2592 0.3926 0.5335 0.6525 0.7562 0.8533 0.8651
-2.4689 -2.3554 -2.2196 -1.3502 -0.9351 -0.6282 -0.4269 -0.2794 -0.1586 -0.1449
-43.5665
Alternately,
k
k
A 2 = -nF*(u) + nΣSn (yi)2 ln(S*(yi)/S*(yi+1)) + nΣFn (yi)2 ln(F*(yi+1)/F*(yi)), i=0
i=1
-60.6007
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 576 n = number of data points = 10. k = number of data points not censored from above = 10. Fn (yi) = i/10. Sn (yi) = 1 - Fn (yi) = 1 - i/10. y 0 = t = truncation point = 100,000. y11 = ∞. In the absence of censoring we do not include the last term in the first summation, which would otherwise involve taking the log of infinity. 9
Σ Sn(yi)2 {ln(S*(yi)) - ln(S*(yi+1))} = 0.6060. i=0
i
yi
Sn(yi)
lnS*(yi)
lnS*(yi+1)
contribution
0 1 2 3 4 5 6 7 8 9
100000 105310 105829 110493 116472 125152 139647 161964 220942 231919 241513
1.0000 0.9000 0.8000 0.7000 0.6000 0.5000 0.4000 0.3000 0.2000 0.1000
0.0000 -0.1449 -0.1586 -0.2794 -0.4269 -0.6282 -0.9351 -1.3502 -2.2196 -2.3554
-0.1449 -0.1586 -0.2794 -0.4269 -0.6282 -0.9351 -1.3502 -2.2196 -2.3554 -2.4689
0.1449 0.0111 0.0773 0.0723 0.0725 0.0767 0.0664 0.0783 0.0054 0.0011 0.6060
10
Σ Fn(yi)2{ln(F*(yi+1)) - ln(F*(yi))} = 0.4357. i=1
i     yi         Fn(yi)    ln F*(yi)    ln F*(yi+1)    contribution
1     105,310    0.1000    -2.0035      -1.9194          0.0008
2     105,829    0.2000    -1.9194      -1.4116          0.0203
3     110,493    0.3000    -1.4116      -1.0570          0.0319
4     116,472    0.4000    -1.0570      -0.7626          0.0471
5     125,152    0.5000    -0.7626      -0.4985          0.0660
6     139,647    0.6000    -0.4985      -0.3000          0.0715
7     161,964    0.7000    -0.3000      -0.1150          0.0907
8     220,942    0.8000    -0.1150      -0.0997          0.0098
9     231,919    0.9000    -0.0997      -0.0885          0.0091
10    241,513    1.0000    -0.0885       0.0000          0.0885
                                              Sum = 0.4357
F*(u) = F*(∞) = 1. Anderson-Darling statistic = -10 + (10)(.6060) + (10)(.4357) = 0.417.
19.13. C. F*(x) = 1 - exp[-(x/3000)²], x < 5000. n = number of data points = 5. k = number of data points not censored from above = 4. Fn(yi) = i/5. Sn(yi) = 1 - Fn(yi) = 1 - i/5. y5 = u = 5000.
Σ_{i=0}^{k} Sn(yi)² {ln(S*(yi)) - ln(S*(yi+1))} = 0.4714.
i    yi      Sn(yi)    ln S*(yi)    ln S*(yi+1)    contribution
0       0    1.0000     0.0000       -0.0604         0.0604
1     737    0.8000    -0.0604       -0.2909         0.1475
2    1618    0.6000    -0.2909       -0.6845         0.1417
3    2482    0.4000    -0.6845       -1.0020         0.0508
4    3003    0.2000    -1.0020       -2.7778         0.0710
(y5 = u = 5000.)                           Sum = 0.4714
Σ_{i=1}^{k} Fn(yi)² {ln(F*(yi+1)) - ln(F*(yi))} = 0.5061.
i    yi      Fn(yi)    ln F*(yi)    ln F*(yi+1)    contribution
1     737    0.2000    -2.8376      -1.3768          0.0584
2    1618    0.4000    -1.3768      -0.7019          0.1080
3    2482    0.6000    -0.7019      -0.4575          0.0880
4    3003    0.8000    -0.4575      -0.0642          0.2517
(y5 = u = 5000.)                          Sum = 0.5061
F*(u) = 1 - exp[-(5000/3000)²] = 0.9378. Anderson-Darling statistic = -(5)(.9378) + (5)(.4714) + (5)(.5061) = 0.199.
19.14. E. 1. False. The Fitted and Empirical Distribution Functions would remain the same, as would the K-S Statistic. However, the critical values go down as 1/√n, so they would be 1/√2 as large as before. 2. True. The Fitted and Empirical Distribution Functions would remain the same. However, the formula for the Anderson-Darling Statistic has a factor of n, so that the statistic would double. The table of critical values for the Anderson-Darling Statistic does not depend on sample size. 3. True. The fitted parameters of the distributions would remain the same. The same terms would enter into the sum that is the loglikelihood, except each term would show up twice, resulting in twice the loglikelihood. The test statistic, twice the difference in loglikelihoods, would be double what it was. One would still have the number of degrees of freedom equal to the difference in the number of parameters, and therefore, one would get the same critical values when one entered the Chi-Square Table.
Comment: See pages 449, 450, 454, and 455 of Loss Models.
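The bookkeeping in solutions such as 19.13 can be checked numerically. Below is a minimal Python sketch of the right-censored form of the Anderson-Darling statistic used above; the function name and arguments are my own choices, not from the syllabus or the textbook. With F*(x) = 1 - exp[-(x/3000)²], observations 737, 1618, 2482, 3003 and one value censored at u = 5000, it reproduces A² of about 0.199.

```python
import math

def anderson_darling_censored(ys, cdf, n, u):
    # ys: the k uncensored observations (sorted); n: total sample size
    # (n - k observations are censored from above at u); cdf: fitted F*.
    k = len(ys)
    pts = [0.0] + sorted(ys) + [u]          # y0 = 0, y_{k+1} = u
    F = [cdf(x) for x in pts]
    S = [1.0 - f for f in F]
    Fn = [i / n for i in range(k + 1)]      # empirical distribution at y0, ..., yk
    Sn = [1.0 - f for f in Fn]
    # A^2 = -n F*(u) + n Sum_{i=0}^{k} Sn(yi)^2 ln[S*(yi)/S*(yi+1)]
    #               + n Sum_{i=1}^{k} Fn(yi)^2 ln[F*(yi+1)/F*(yi)]
    s1 = sum(Sn[i] ** 2 * (math.log(S[i]) - math.log(S[i + 1])) for i in range(k + 1))
    s2 = sum(Fn[i] ** 2 * (math.log(F[i + 1]) - math.log(F[i])) for i in range(1, k + 1))
    return -n * cdf(u) + n * s1 + n * s2

# Check against 19.13:
Fstar = lambda x: 1.0 - math.exp(-(x / 3000.0) ** 2)
print(anderson_darling_censored([737, 1618, 2482, 3003], Fstar, n=5, u=5000))  # about 0.199
```

With no censoring one would instead stop the first sum at i = k - 1 and use F*(∞) = 1, as described in the solutions above.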
19.15. C. After truncation from below at 50, F*(x) = {F(x) - F(50)}/S(50) = (e^(-50/100) - e^(-x/100))/e^(-50/100) = 1 - e^(-(x-50)/100), x > 50. ln(S*(x)) = -(x - 50)/100, x > 50. n = number of data points = 4. k = number of data points not censored from above = 4. Fn(yi) = i/4. Sn(yi) = 1 - Fn(yi) = 1 - i/4. y0 = t = truncation point = 50. y5 = ∞. In the absence of censoring we do not include the last term in the first summation, which would otherwise involve taking the log of infinity.
Σ_{i=0}^{3} Sn(yi)² {ln(S*(yi)) - ln(S*(yi+1))} = 0.4375.
i    yi     Sn(yi)    ln S*(yi)    ln S*(yi+1)    contribution
0     50    1.0000     0.0000       -0.1400         0.1400
1     64    0.7500    -0.1400       -0.4000         0.1462
2     90    0.5000    -0.4000       -0.8200         0.1050
3    132    0.2500    -0.8200       -1.5600         0.0462
(y4 = 206.)                               Sum = 0.4375
Σ_{i=1}^{4} Fn(yi)² {ln(F*(yi+1)) - ln(F*(yi))} = 0.6199.
i    yi     Fn(yi)    ln F*(yi)    ln F*(yi+1)    contribution
1     64    0.2500    -2.0353      -1.1096          0.0579
2     90    0.5000    -1.1096      -0.5806          0.1323
3    132    0.7500    -0.5806      -0.2359          0.1939
4    206    1.0000    -0.2359       0.0000          0.2359
                                         Sum = 0.6199
F*(u) = F*(∞) = 1. Anderson-Darling statistic = -4 + (4)(0.4375) + (4)(0.6199) = 0.230.
19.16. D. Prior to truncation and censoring the distribution function is: F(x) = 1 - {1000/(1000 + x)}². After truncation from below at 500 and censoring from above at 5000 the distribution function is: F*(x) = {F(x) - F(500)}/S(500) = 1 - S(x)/S(500) = 1 - {1000/(1000 + x)}²/{1000/1500}² = 1 - {1500/(1000 + x)}², 500 < x < 5000. Payments of: 179, 352, 968, 1421, 4500, correspond to losses of size: 679, 852, 1468, 1921, 5000 or more. n = number of data points = 5. k = number of data points not censored from above = 4. Fn(yi) = i/5. Sn(yi) = 1 - Fn(yi) = 1 - i/5.
Σ_{i=0}^{k} Sn(yi)² {ln(S*(yi)) - ln(S*(yi+1))} = 0.6692.
i    yi      Sn(yi)    S*(yi)    ln S*(yi)    ln S*(yi+1)    contribution
0     500    1.0000    1.0000     0.0000       -0.2255         0.2255
1     679    0.8000    0.7981    -0.2255       -0.4216         0.1255
2     852    0.6000    0.6560    -0.4216       -0.9959         0.2067
3    1468    0.4000    0.3694    -0.9959       -1.3329         0.0539
4    1921    0.2000    0.2637    -1.3329       -2.7726         0.0576
5    5000              0.0625
                                                     Sum = 0.6692
Σ_{i=1}^{k} Fn(yi)² {ln(F*(yi+1)) - ln(F*(yi))} = 0.3287.
i    yi      Fn(yi)    F*(yi)    ln F*(yi)    ln F*(yi+1)    contribution
1     679    0.2000    0.2019    -1.6002      -1.0671          0.0213
2     852    0.4000    0.3440    -1.0671      -0.4611          0.0970
3    1468    0.6000    0.6306    -0.4611      -0.3061          0.0558
4    1921    0.8000    0.7363    -0.3061      -0.0645          0.1546
5    5000              0.9375
                                                    Sum = 0.3287
F*(u) = F*(5000) = 1 - {1500/(1000 + 5000)}² = 0.9375.
Anderson-Darling statistic = A² = -nF*(u) + n Σ_{i=0}^{k} Sn(yi)² ln[S*(yi)/S*(yi+1)] + n Σ_{i=1}^{k} Fn(yi)² ln[F*(yi+1)/F*(yi)] =
-(5)(.9375) + (5)(.6692) + (5)(.3287) = 0.302.
19.17. E. A² = -n - (1/n) Σ_{i=1}^{n} (2i - 1) ln[F(yi) S(yn+1-i)] =
-3 - (1/3){ln[F(40)S(80)] + 3ln[F(70)S(70)] + 5ln[F(80)S(40)]} = -3 - (1/3){ln[(.4)(.2)] + 3ln[(.7)(.3)] + 5ln[(.8)(.6)]} = 0.626.
19.18. 2.492 < 3.2 < 3.880. Therefore, if one rejects H0, the chance of making a Type I error is between 1% and 5%. A Type II error would occur if we failed to reject H0 when it is not true. There is no way to determine the probability of a Type II error from the given information.
19.19. E. In this case, n = 1, t = 0, u = 100, Fn(x) = 0 for x < 60 and 1 for x ≥ 60, F*(x) = x/100, and f*(x) = 1/100.
A² = n ∫_t^u {Fn(x) - F*(x)}² f*(x) / {F*(x) S*(x)} dx =
∫_0^60 (0 - x/100)² (1/100) / {(x/100)(1 - x/100)} dx + ∫_60^100 (1 - x/100)² (1/100) / {(x/100)(1 - x/100)} dx =
0.01 ∫_0^60 x / (100 - x) dx + 0.01 ∫_60^100 (100 - x) / x dx =
0.01 ∫_0^60 {100 / (100 - x) - 1} dx + 0.01 ∫_60^100 {100 / x - 1} dx =
(0.01){100 ln(100/40) - 60 + 100 ln(100/60) - 40} = ln(2.5) - ln(.6) - 1 = 0.427.
Comment: A² = -n - (1/n) Σ_{i=1}^{n} (2i - 1){ln(F(yi)) + ln(S(yn+1-i))} = -1 - {ln(.6) + ln(.4)} = 0.427.
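The integral form of the Anderson-Darling statistic used in 19.19 can also be verified by brute-force numerical integration. The short Python script below is my own check, not part of the syllabus; it uses a midpoint Riemann sum for the case n = 1, t = 0, u = 100, F*(x) = x/100, with the single observation at 60.

```python
# Numerical check of the Anderson-Darling integral in 19.19.
def integrand(x):
    Fn = 0.0 if x < 60 else 1.0          # empirical distribution function
    Fs = x / 100.0                        # fitted F*(x)
    fs = 1.0 / 100.0                      # fitted density f*(x)
    return (Fn - Fs) ** 2 * fs / (Fs * (1.0 - Fs))

n_obs, steps, total = 1, 200000, 0.0
h = 100.0 / steps
for i in range(steps):                    # midpoint Riemann sum over (0, 100)
    x = (i + 0.5) * h
    total += integrand(x) * h
print(n_obs * total)                      # about 0.427, matching ln(2.5) - ln(0.6) - 1
```

The integrand goes to zero at both endpoints, so the simple midpoint sum is adequate here.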
19.20. C. Prior to truncation and censoring the distribution function is: F(x) = x², 0 < x < 1. F(0.3) = 0.09. After truncation from below at 0.3 the distribution function is: F*(x) = {F(x) - F(0.3)}/S(0.3) = (x² - 0.09)/0.91, 0.3 < x < 1. n = number of data points = 2. k = number of data points not censored from above = 1. Fn(yi) = i/2. Sn(yi) = 1 - Fn(yi) = 1 - i/2.
Σ_{i=0}^{k} Sn(yi)² {ln(S*(yi)) - ln(S*(yi+1))} =
Sn(0.3)² {ln(S*(0.3)) - ln(S*(0.6))} + Sn(0.6)² {ln(S*(0.6)) - ln(S*(0.7))} = 1² {ln(1) - ln(0.7033)} + (1/2)² {ln(0.7033) - ln(0.5604)} = 0.4088.
Σ_{i=1}^{k} Fn(yi)² {ln(F*(yi+1)) - ln(F*(yi))} =
Fn(0.6)² {ln(F*(0.7)) - ln(F*(0.6))} = (1/2)² {ln(0.4396) - ln(0.2967)} = 0.0983.
F*(u) = F*(0.7) = {0.7² - 0.09}/0.91 = 0.4396.
Anderson-Darling statistic = A² = -nF*(u) + n Σ_{i=0}^{k} Sn(yi)² ln[S*(yi)/S*(yi+1)] + n Σ_{i=1}^{k} Fn(yi)² ln[F*(yi+1)/F*(yi)] =
-(2)(0.4396) + (2)(0.4088) + (2)(0.0983) = 0.1350.
19.21. E. For the LogNormal Distribution, F(20,000) = Φ[(ln(20,000) - 11)/1] = Φ[-1.10] = 0.1357.
F(30,000) = Φ[(ln(30,000) - 11)/1] = Φ[-0.69] = 0.2451.
F(50,000) = Φ[(ln(50,000) - 11)/1] = Φ[-0.18] = 0.4286.
n = number of data points = 4. k = number of data points not censored from above = 2. Fn(yi) = i/4. Sn(yi) = 1 - Fn(yi) = 1 - i/4.
Σ_{i=0}^{k} Sn(yi)² {ln(S*(yi)) - ln(S*(yi+1))} =
Sn(0)² {ln(S*(0)) - ln(S*(20,000))} + Sn(20,000)² {ln(S*(20,000)) - ln(S*(30,000))} + Sn(30,000)² {ln(S*(30,000)) - ln(S*(50,000))} = 1² {ln(1) - ln(0.8643)} + (3/4)² {ln(0.8643) - ln(0.7549)} + (1/2)² {ln(0.7549) - ln(0.5714)} = 0.2916.
Σ_{i=1}^{k} Fn(yi)² {ln(F*(yi+1)) - ln(F*(yi))} =
Fn(20,000)² {ln(F*(30,000)) - ln(F*(20,000))} + Fn(30,000)² {ln(F*(50,000)) - ln(F*(30,000))} = (1/4)² {ln(0.2451) - ln(0.1357)} + (1/2)² {ln(0.4286) - ln(0.2451)} = 0.1767.
F*(u) = F(50,000) = 0.4286.
Anderson-Darling statistic = A² = -nF*(u) + n Σ_{i=0}^{k} Sn(yi)² ln[S*(yi)/S*(yi+1)] + n Σ_{i=1}^{k} Fn(yi)² ln[F*(yi+1)/F*(yi)] =
-(4)(0.4286) + (4)(0.2916) + (4)(0.1767) = 0.1588.
19.22. B. Prior to truncation the distribution function is: F(x) = 0.18x - 0.008x², 0 < x < 10. F(6) = 0.792. After truncation from below at 6 the distribution function is: F*(x) = {F(x) - F(6)}/S(6) = (0.18x - 0.008x² - 0.792)/0.208, 6 < x < 10. F*(6) = 0. F*(7) = 0.3654. F*(7.5) = 0.5192. F*(8) = 0.6538. n = number of data points = 4. k = number of data points not censored from above = 4. Fn(yi) = i/4. Sn(yi) = 1 - Fn(yi) = 1 - i/4.
Σ_{i=0}^{k} Sn(yi)² {ln(S*(yi)) - ln(S*(yi+1))} =
Sn(6)² {ln(S*(6)) - ln(S*(7))} + Sn(7)² {ln(S*(7)) - ln(S*(7.5))} + Sn(7.5)² {ln(S*(7.5)) - ln(S*(8))} + Sn(8)² {ln(S*(8)) - ln(S*(8))} = 1² {ln(1) - ln(0.6346)} + (3/4)² {ln(0.6346) - ln(0.4808)} + (1/2)² {ln(0.4808) - ln(0.3462)} + 0 = 0.6930, where since there is no censoring we have left out the last term that would involve ln(S*(∞)) = ln(0).
Σ_{i=1}^{k} Fn(yi)² {ln(F*(yi+1)) - ln(F*(yi))} =
Fn(7)² {ln(F*(7.5)) - ln(F*(7))} + Fn(7.5)² {ln(F*(8)) - ln(F*(7.5))} + Fn(8)² {ln(F*(8)) - ln(F*(8))} + Fn(8)² {ln(F*(∞)) - ln(F*(8))} = (1/4)² {ln(0.5192) - ln(0.3654)} + (1/2)² {ln(0.6538) - ln(0.5192)} + 0 + 1² {ln(1) - ln(.6538)} = 0.5045.
F*(u) = F*(∞) = 1.
Anderson-Darling statistic = A² = -nF*(u) + n Σ_{i=0}^{k} Sn(yi)² ln[S*(yi)/S*(yi+1)] + n Σ_{i=1}^{k} Fn(yi)² ln[F*(yi+1)/F*(yi)] =
-(4)(1) + (4)(0.6930) + (4)(0.5045) = 0.790. 19.23. B. Statement I is true. See page 452 of Loss Models. Statement II is false; the K-S statistic is the largest absolute deviation. See page 448 of Loss Models. Statement III is true. See page 450 of Loss Models.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 584 19.24. A. If H0 is true, the probability that the K-S test statistic will be greater than the critical value is by definition the corresponding significance level. The critical values for the Kolmogorov-Smirnov all contain 1/ n , where n is the sample size. Thus as the sample size goes to infinity the critical values go to zero. Let δ and ε be very small values. Let the critical value corresponding to ε/2 be c/ n . Then take c/ n = δ, or n = (c/δ)2 . Prob[K-S ≥ δ | sample size ≥ (c/δ)2 ] ≤ ε/2. Therefore, the probability that the K-S test statistic is larger than δ can be made less than ε, for sufficiently large n. The K-S test statistic tends to zero as the sample size goes to infinity. For the Anderson-Darling test statistic and the Chi-square goodness-of-fit test statistic, the critical values are independent of the sample size; these test statistics do not tend to zero. The Schwarz Bayesian adjustment is: (number of fitted parameters) ln(number of data points) / 2, which goes to infinity as the sample size goes to infinity. Comment: As the sample size goes to infinity, the empirical distribution function approaches the underlying distribution from which the sample was drawn. Therefore, if the distribution we are comparing to is this underlying distribution, in other words if the proposed model is appropriate, then the empirical distribution function approaches the model distribution, as the sample size goes to infinity. As the sample size goes to infinity, the Kolmogorov-Smirnov test statistic tends to zero. When computing the Chi-square goodness-of-fit test statistic, for a given interval, the Expected does approach the Observed as a percent. In other words, |Expected - Observed|/ Expected goes to zero for each interval. However, the contribution from each interval, (Expected - Observed)2 / Expected, does not decline. Notice the square in the numerator. If H0 is true, the sum of the contributions, the Chi-Square Statistic, approaches a Chi-Square Distribution. A key idea, is that the critical values for the ChiSquare are not a function of the number of data points, as opposed to the number of intervals. For example, let us assume 5 degrees of freedom. Then the critical value at 5% is 11.07. Thus there is a 5% chance that the Chi-Square Statistic will be greater than 11.07, for large amounts of data, if H0 is true. Prob[χ2 > 11.07 | n = 1 million] = 5%. Prob[χ2 > 11.07 | n = 100 million] = 5%. Prob[χ2 > 11.07 | n = 10,000 million] = 5%. Clearly the Chi-Square Statistic is not going to zero as n approaches infinity. You might look at some of the simulations of p-values in “Mahlerʼs Guide to Simulation.”
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 585 19.25. E. Loss Models states at page 448, “When the parameters are estimated from the data, the test statistic tends to be smaller than it would have been had the parameter values been prespecified. That is because the method itself tries to choose parameters that produce a distribution that is close to the data. In that case, the test become approximate. Because rejection of the null hypothesis occurs for large values of the test statistic (for the K-S, Chi-Square, and Anderson-Darling tests), the approximation tends to increase the probability of a Type II error while decreasing the probability of a Type I error. Among the test presented here, only the chi-square test has a built-in correction for this situation. Modifications for the other tests have been developed, but they will not be presented here.” Thus statement A is false. Loss Models states that the K-S test should only be used on individual data. (In a footnote at page 448 it mentions that it is possible to modify the K-S test for use with grouped data.) Thus B is false. The Anderson-Darling test tends to place more emphasis on a good fit in the tails of the distribution rather than in the middle of the distribution. Thus statement C is false. For the chi-square goodness-of-fit test, the critical values depend on the number of degrees of freedom, which depend in turn on the number of intervals rather than the sample size. Thus statement D is false. Comment: In the case of the Chi-Square or K-S statistics, using the sample data to estimate the parameters of the distribution produces a better than expected match between assumed and expected, and therefore tends to decrease the statistic. If one did not somehow compensate for this, (for example by reducing the degrees of freedom in the Chi-Square by the number of fitted parameters), then the probability of rejecting (at a specific significance level) would decline. Therefore, the probability of rejecting when H0 is true would go down; the probability of a Type I error would decline. The probability of failing to reject when H0 is false would go up; the probability of a Type II error would increase. In practical applications, one should compensate for using the sample data to estimate the parameters of the distribution. (Loss Models states that when the data set is used to estimate the parameters of the null hypothesis distribution, the correct critical values for the Kolmogorov-Smirnov test are smaller.) If one adjusts properly, it is not obvious why the probability of a Type II error would change. The Kolmogorov-Smirnov statistic can be calculated for individual data, and the critical values to apply a hypothesis test to this statistic have been tabulated. (There are separate tables to apply for small samples, not covered on your exam.) If one has grouped data, a maximum and minimum for the Kolmogorov-Smirnov statistic can be determined. In some situations, such as when one has many narrow intervals and lots of data, this is sufficient to allow one to usefully apply a statistical hypothesis using these same K-S critical values. These situations do come up in actuarial work. However, usually one does not apply the Kolmogorov-Smirnov test to grouped data. One could run a simulation, see “Mahlerʼs Guide to Simulation”, to come up with critical values to apply to a particular grouped data situation with many narrow intervals, that an actuary might be frequently encountering. 
Such a test, using the maximum observed absolute difference, would not be called the Kolmogorov-Smirnov test, although it would clearly be somehow related. The power of such a test would be less than the K-S test applied to the same data in ungrouped form.
2013-4-6, Fitting Loss Dists. §19 Anderson-Darling Test, HCM 10/15/12, Page 586 19.26. A. Loss Models states that, the chi-square goodness-of-fit test works best when the expected number of observations are about equal from interval to interval. Thus Statement A is false. Loss Models states that, when the data is used to estimate the parameters of the distribution, the correct critical values for the Kolmogorov-Smirnov test decrease. One rejects the null hypothesis when the test statistic is greater than the critical values. Therefore, if one does not revise the critical values, the probability of rejecting the null hypothesis decreases. More generally, Loss Models states that when the parameters are estimated from the data, if one does not revise the critical values, the probability of a Type I error is lowered, since the probability of rejecting the null hypothesis decreases. Statement B is not false. Loss Models states that when u < ∞, in other words when the data is right censored, the (correct) critical values for the Kolmogorov-Smirnov test are smaller. Statement C is true. As with the Kolmogorov-Smirnov test, the Anderson-Darling test is designed to work with ungrouped data. Statement D is not false. Comment: See pages 448, 450, and 452 of Loss Models. The writer of this question searched several pages of the textbook to find verbal statements to test. While this type of question is not common on this exam, there have been others such as 4, 5/05, Q.19.
Section 20, Percentile Matching Applied to Truncated Data
Assume we wish to fit a distribution to the ground-up total limits losses,257 but all we have is truncated data. Then one can work with the truncated distribution function that corresponds to the distribution function of the ground-up total limits losses. In this section, fitting via Percentile Matching in this situation will be discussed. In subsequent sections, the Method of Moments and Maximum Likelihood will be discussed for this situation.
If one has data truncated and shifted from below, where the payments after the application of a deductible have been recorded, then one can translate to data truncated from below where the loss amounts of the insured have been recorded, or vice versa.
When data is truncated from below at the value d, losses of size less than d are not in the reported data base. As discussed previously, the distribution function is revised as follows:
G(x) = {F(x) - F(d)} / S(d), x > d.
One can apply percentile matching to data truncated from below by working with this revised distribution function. One matches G(x) to the empirical distribution at a number of points equal to the number of parameters of the chosen type of distribution.
For example, assume that the ungrouped data in Section 2 were truncated from below at $150,000; only the 52 losses in the final two columns would be reported. Then for example for the Exponential Distribution, for truncation from below at 150,000 the revised Distribution Function is:
G(x) = {(1 - e^(-x/θ)) - (1 - e^(-150,000/θ))} / {1 - (1 - e^(-150,000/θ))} = 1 - e^(-(x - 150,000)/θ).
If we were to match the observed percentile p at the observed claim size x: p = 1 - e-(x−150000)/θ ⇒ θ = -(x - 150000) / {ln(1-p)}. The 26th claim (out of the 52 losses greater than $150,000) is $406,900. Therefore matching at the 26 / (1 + 52) percentile, would imply θ = -256,900 / {ln(1 - 26/53)} = 3.81 x 105 . Exercise: Fit an exponential distribution to the data in Section 2 truncated at $150,000, via percentile matching at the 40th observed claim. [Solution: θ = - (766,100 - 150,000 )/ {ln(1 - 40/53)} = 4.38 x 105 .] 257
Unaffected by deductibles or maximum covered losses.
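The Exponential calculation above is simple enough to put into a few lines of code. The Python sketch below is mine (the function name is hypothetical, not from the text); it matches the smoothed empirical percentile p = i/(n + 1) at the i-th ordered loss x for data truncated from below at d, using θ = -(x - d) / ln(1 - p).

```python
import math

def exponential_theta_from_percentile(x, d, i, n):
    # Percentile matching for an Exponential ground-up distribution, data truncated
    # from below at d: G(x) = 1 - exp(-(x - d)/theta) matched at p = i/(n + 1).
    p = i / (n + 1.0)
    return -(x - d) / math.log(1.0 - p)

# The two examples worked above (52 losses truncated from below at 150,000):
print(exponential_theta_from_percentile(406_900, 150_000, 26, 52))  # about 3.81 x 10^5
print(exponential_theta_from_percentile(766_100, 150_000, 40, 52))  # about 4.38 x 10^5
```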
2013-4-6, Fitting Loss Dists. §20 Percentile Match. Trunc., HCM 10/15/12, Page 588 Problems: 20.1 (1 point) From a policy with a $1000 deductible, you observe 4 claims with the following amounts paid by the insurer: $1000, $2000, $3000, $5000. You fit an exponential distribution F(x) = 1 - e-x/θ to this data via percentile matching. If the matching is performed at the $5000 claim payment, then the parameter θ is in which of the following intervals? A. less than 1500 B. at least 1500 but less than 2000 C. at least 2000 but less than 2500 D. at least 2500 but less than 3000 E. at least 3000 Use the following information for the next two questions: From a policy with a $10,000 deductible, you observe 6 claims with the following amounts paid by the insurer: $22,000, $28,000, $39,000, $51,000, $80,000, and $107,000. 20.2 (2 points) You assume the losses prior to the impact of the deductible follow a Pareto Distribution with θ = 40,000. Percentile matching is performed at the payment of size $51,000. What is the fitted α parameter? (A) 1.2
(B) 1.4
(C) 1.6
(D) 1.8
(E) 2.0
20.3 (4 points) You assume the losses prior to the impact of the deductible follow a LogNormal Distribution with µ = 10. Percentile matching is performed at the payment of size $80,000. What is the fitted σ parameter? (A) 1.2
(B) 1.4
(C) 1.6
(D) 1.8
(E) 2.0
20.4 (2 points) From a policy with a $50 deductible and a 70% coinsurance factor, you observe 9 claims with the following amounts paid by the insurer: $28, $70, $105, $140, $203, $224, $350, $504, $665. You assume the losses prior to the impact of the deductible and coinsurance factor follow an Exponential Distribution F(x) = 1 - e-x/θ. Percentile matching is performed at the payment of size $224. What is the fitted θ? A. 300
B. 325
C. 350
D. 400
E. 450
2013-4-6, Fitting Loss Dists. §20 Percentile Match. Trunc., HCM 10/15/12, Page 589 20.5 (3 points) You are given: (i) There is a deductible of 100. (ii) A random sample of 100 losses is distributed as follows: Loss Range Number of Losses (100 – 200] 31 (200 – 400] 39 (400 – 750] 15 (750 – 1000] 5 over 1000 10 FIt via percentile matching a Weibull Distribution to the payments. Fit at the 70th and 90th percentiles. Estimate the probability that a loss will exceed 2000. A. less than 2.0% B. at least 2.0% but less than 2.5% C. at least 2.5% but less than 3.0% D. at least 3.0% but less than 3.5% E. at least 3.5% 20.6 (3 points) From a policy with a $100 deductible and an 80% coinsurance factor, you observe 14 claims with the following amounts paid by the insurer: $40, $80, $80, $120, $160, $200, $320, $400, $560, $720, $1120, $1520, $3920, $7920. You assume the losses prior to the impact of the deductible and coinsurance factor follow a Loglogistic Distribution. Percentile matching is performed at the payment of sizes of $80 and $400. What is S(10,000) for the fitted Loglogistic Distribution? A. 0.5% B. 1.0% C. 1.5% D. 2.0% E. 2.5% 20.7 (5 points) You have the following data on the frequency of large claims under Medicare: Amount Annual Number of Claims Exceeding Amount per 1000 members 50,000 54.8 100,000 14.8 200,000 1.2 Fit a Weibull Distribution to the ground up claims using the above information.
2013-4-6, Fitting Loss Dists. §20 Percentile Match. Trunc., HCM 10/15/12, Page 590 Solutions to Problems: 20.1. E. The insurerʼs payments above a deductible is data truncated and shifted. The distribution function for the data truncated and shifted at 1000 is G(x) = {F(x+1000) - F(1000)} / {1-F(1000)} = {e-1000/θ - e-(x+1000)/θ } / e-1000/θ = 1 - e-x/θ. The observed 4/(4+1) = 80th percentile is 5000. Therefore, G(5000) = 1 - e-5000/θ = 0.80. Solving, θ = 3107. Comment: Note that the exponential distribution is the same after truncation and shifting; this implies its constant mean excess loss. 20.2. A. For the data truncated and shifted from below, G(x) = (F(x+d) - F(d))/S(d). The payment of $51,000 is the 4th of 6 payments, so it corresponds to an estimate of the 4/7 percentile of G. F(d) = F(10,000) = 1 - (1 + 10,000/40,000)−α = 1 - 0.8α. F(51000 + d) = F(61,000) = 1 - (1 + 61,000/40,000)−α = 1 - 0.396α. Matching the observed and theoretical percentiles: 4/7 = G(51.000) = (F(61.000) - F(10.000)) / {1 - F(10.000)} = {0.8α - 0.396α} / (0.8)α = 1 - 0.495α. Solving, α = ln(3/7) / ln(0.495) = 1.2. Alternately, for data truncated from below, G(x) = (F(x) - F(d))/{1 - F(d)}. A payment of $51,000 corresponds to a loss of $61,000. The payment of $51,000 is the 4th of 6 payments, so it corresponds to an estimate of the 4/7 percentile of G. F(10,000) = 1 - (1 + 10,000/40,000)−α = 1 - 0.8α. F(61,000) = 1 - (1 + 61,000/40,000)−α = 1 - 0.396α. Matching the observed and theoretical percentiles: 4/7 = G(61,000) = (F(61,000) - F(10,000)) / {1 - F(10,000)} = {0.8α - 0.396α} / (0.8)α = 1 - 0.495α. Solving: α = ln(3/7) / ln(0.495) = 1.2. Alternately, the non-zero payments follow another Pareto Distribution with the same α and θ = (original θ) + d = 40,000 + 10,000 = 50,000. ⎛ 50,000 ⎞ G(51,000) = 1 - ⎜ ⎟ ⎝ 50,000 + 51,000⎠
α
= 1 - 0.495α.
Matching the observed and theoretical percentiles: 4/7 = 1 - 0.495α. ⇒ α = ln(3/7) / ln(0.495) = 1.2. Comment: A Pareto with parameters α and θ, after truncating and shifting at d, is another Pareto with parameters α and θ + d.
2013-4-6, Fitting Loss Dists. §20 Percentile Match. Trunc., HCM 10/15/12, Page 591 20.3. C. For the data truncated from below, G(x) = (F(x) - F(d))/{1 - F(d)}. A payment of $80,000 corresponds to a loss of $90,000. The payment of $80,000 is the 5th of 6 payments, so it corresponds to an estimate of the 5/7 percentile of G. Matching the observed and theoretical percentiles: 5/7 = G(90000) = (F(90000) - F(10000))/{1 - F(10000)} = {Φ((ln(90000) - 10)/σ) - Φ((ln(10000) - 10)/σ)} /{1 - Φ((ln(10000) - 10)/σ)} = {Φ(1.408/σ) - Φ(-.790/σ)} /{1 - Φ(-.790/σ)} = {Φ(1.408/σ) + Φ(.790/σ) - 1} /Φ(.790/σ). Plugging in the given values of σ, for σ =1.6, G(90,000) is closest to the desired 5/7 = 0.714. σ
{Φ(1.408/σ) + Φ(.790/σ) - 1} / Φ(.790/σ)
1.2
Φ(1.17) + Φ(.66) - 1} /Φ(.66) = (.8790 + .7454 - 1)/.7454 = 0.838
1.4
Φ(1.01) + Φ(.56) - 1} /Φ(.56) = (.8438 + .7123 - 1)/.7123 = 0.781
1.6
Φ(0.88) + Φ(.49) - 1} /Φ(.49) = (.8106 + .6879 - 1)/.6879 = 0.725
1.8
Φ(0.78) + Φ(.44) - 1} /Φ(.44) = (.7823 + .6700 - 1)/.6700 = 0.675
2.0
Φ(0.70) + Φ(.40) - 1} /Φ(.40) = (.7580 + .6554 - 1)/.6554 = 0.631
Comment: Beyond what you are likely to be asked on your exam. One needs to solve numerically; a more exact answer is σ = 1.642. If both µ and σ were allowed to vary, then one would have to match at two percentiles. For example, if one matched at payments of $39,000 and $80,000, then solving numerically would give µ = 10.5 and σ = 1.26. 20.4. C. The $224 payment is 6 out of 9, and thus corresponds to an estimate of the 6/(9+1) = 60th percentile. Translate the $224 payment back to what it would have been in the absence of the coinsurance factor: 224/0.7 = 320. The insurerʼs payments excess of a deductible is data truncated and shifted. The distribution function for the data truncated and shifted at 50 is: G(x) = {F(x+50) - F(50)} / {1-F(50)} = {e-50/θ - e-(x +50)/θ } / e-50/θ = 1 - e-x/θ. The observed 60th percentile corresponds to a payment of $320 after the deductible. Therefore, we want G(320) = 1 - e-320/θ = .60. Solving, θ = -320/ln(.4) = $349. Comment: Due to the memory property of the Exponential Distribution, ignoring the coinsurance, the distribution of non-zero payments excess of the deductible is the same as that of the ground up losses. For θ = 349, Prob[Loss ≤ 50] = 1 - e-50/349 = 0.1335, and Prob[Loss ≤ 370] = 1 - e-370/349 = 0.6536. For θ = 349, ignoring the coinsurance, Prob[non-zero payment ≤ 320] = 1 - e-320/349 = .600 = (0.6536 - 0.1335)/(1 - 0.1335). For θ = 349, including the 70% coinsurance, Prob[non-zero payment ≤ (320)(0.7) = 224] = 0.600.
20.5. C. The 70th and 90th percentiles of the data are losses of size 400 and 1000. These correspond to payments of 300 and 900.
0.7 = 1 - exp[-(300/θ)^τ]. ⇒ (300/θ)^τ = 1.20397. 0.9 = 1 - exp[-(900/θ)^τ]. ⇒ (900/θ)^τ = 2.30259.
(900/300)^τ = 1.9125. ⇒ τ = 0.5902. θ = 219.0.
Prob[loss > 2000] = Prob[payment > 1900] = S(1900) = exp[-(1900/219)^0.5902] = 2.79%.
20.6. B. $80 shows up twice in the sample, so it corresponds to: 2.5/15. $400 corresponds to a smoothed empirical estimate of percentile of 8/15. A payment of $80 corresponds to a loss of: (80/0.8) + 100 = $200. A payment of $400 corresponds to a loss of: (400/0.8) + 100 = $600.
For the Loglogistic, VaRp[X] = θ (p^(-1) - 1)^(-1/γ). Thus, 200 = θ (15/2.5 - 1)^(-1/γ), and 600 = θ (15/8 - 1)^(-1/γ).
Dividing the two equations: 3 = {(7/8) / 5}^(-1/γ). ⇒ γ = -ln(7/40) / ln(3) = 1.587.
⇒ θ = (600)(7/8)^(1/1.587) = 552.
For the Loglogistic, F(x) = (x/θ)^γ / {1 + (x/θ)^γ}. ⇒ S(x) = 1 / {1 + (x/θ)^γ}.
S(10,000) = 1 / {1 + (10,000/552)^1.587} = 1.0%.
Comment: Check. F(200) = (200/552)^1.587 / {1 + (200/552)^1.587} = 0.1664. 2.5/15 = 1/6 = 0.1667.
F(600) = (600/552)^1.587 / {1 + (600/552)^1.587} = 0.5330. 8/15 = 0.5333.
2013-4-6, Fitting Loss Dists. §20 Percentile Match. Trunc., HCM 10/15/12, Page 593 20.7. We are not given the survival functions, but we do know that: S(100,000) / S(50,000) = 14.8/54.8, while S(200,000) / S(50,000) = 1.2/54.8. For the Weibull Distribution, S(x) = exp[-(x/θ)τ]. Thus we have two equations in two unknowns: 14.8/54.8 = exp[-(100,000 / θ)τ] / exp[-(50,000 / θ)τ] = exp[(50,000τ - 100,000τ) / θτ]. 1.2/54.8 = exp[-(200,000 / θ)τ] / exp[-(50,000 / θ)τ] = exp[(50,000τ - 200,000τ) / θτ]. Taking logs: -1.3091 = (50,000τ - 100,000τ) / θτ. -3.8214 = (50,000τ - 200,000τ) / θτ. Dividing the two equations: 2.9191 = (50,000τ - 200,000τ) / (50,000τ - 100,000τ) = (1 - 4τ) / ( 1 - 2τ).
⇒ 2.9191 - 2.9191(2^τ) = 1 - 4^τ. Let x = 2^τ; then x² - 2.9191x + 1.9191 = 0. ⇒ (x - 1)(x - 1.9191) = 0. ⇒ x = 1 or x = 1.9191. x = 1 ⇒ 2^τ = 1 ⇒ τ = 0, not a viable answer. x = 1.9191 ⇒ 2^τ = 1.9191 ⇒ τ = 0.9404.
⇒ θ^0.9404 = (50,000^0.9404 - 200,000^0.9404) / (-3.8214) = 18,419. ⇒ θ = 34,324.
Comment: Mathematically equivalent to percentile matching with data truncated from below at 50,000. For the truncated data, at 100,000 the distribution function is: 1 - 14.8/54.8 = 0.7299. For the truncated data, at 200,000 the distribution function is: 1 - 1.2/54.8 = 0.9781. For the Weibull distribution truncated from below at 50,000, at 100,000 the distribution function is: 1 - exp[-(100,000/34,324)^0.9404] / exp[-(50,000/34,324)^0.9404] = 0.7299. For the Weibull distribution truncated from below at 50,000, at 200,000 the distribution function is: 1 - exp[-(200,000/34,324)^0.9404] / exp[-(50,000/34,324)^0.9404] = 0.9781. Data taken from “Who Moved My Deductible?”, by Mark Troutman, in the May 2012 issue of Contingencies.
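The Weibull fit in 20.7 can also be cross-checked numerically. The Python sketch below is my own; with Weibull survival S(x) = exp(-(x/θ)^τ), the two observed exceedance ratios give (1 - 4^τ)/(1 - 2^τ) = ln(54.8/1.2)/ln(54.8/14.8), which is solved for τ by bisection, after which θ follows.

```python
import math

target = math.log(54.8 / 1.2) / math.log(54.8 / 14.8)   # about 2.9191

def ratio(tau):
    return (1.0 - 4.0 ** tau) / (1.0 - 2.0 ** tau)       # equals 2^tau + 1, increasing in tau

lo, hi = 0.1, 3.0
for _ in range(100):                                      # bisection
    mid = 0.5 * (lo + hi)
    if ratio(mid) < target:
        lo = mid
    else:
        hi = mid
tau = 0.5 * (lo + hi)
theta = ((100_000 ** tau - 50_000 ** tau) / math.log(54.8 / 14.8)) ** (1.0 / tau)
print(tau, theta)                                         # about 0.94 and 34,000, matching the solution
```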
Section 21, Method of Moments Applied to Truncated Data
One can apply the Method of Moments to truncated data. If the original data prior to truncation (ground up claim amounts) is assumed to follow F(x), then the mean of the data truncated and shifted at d is e(d), the mean excess loss at d.
Average Payment per Payment = (E[X] - E[X ∧ d]) / S(d) = e(d).
If data is truncated from below, no shifting, then the average value is d more:
e(d) + d = ∫_d^∞ t f(t) dt / S(d).
Thus if you know the form of e(d), this helps in fitting by the method of moments to truncated data. Truncation from Below: For example assume that the losses in the absence of any deductible would follow a Pareto distribution function with α = 3: F(x) = 1 - {θ /(θ +x)}3 . Assume that the following data has been truncated from below at 5: 10, 15, 20, 30, and 50. Then the method of moments can be is used to estimate the parameter θ. For the Pareto Distribution, the mean excess loss is; e(x) = (θ+x)/(α−1). For α = 3, e(5) = (θ+5)/2. The average size of the data truncated from below at 5 is: e(5) + 5 = θ/2 + 7.5. The observed mean for the data truncated from below is: (10+15+20+30+50) / 5 = 25. Matching the observed mean to the theoretical mean: 25 = θ/2 + 7.5. Thus θ = (2)(25 - 7.5) = 35. For the ungrouped data in Section 2 truncated from below at $150,000, the mean is 684,550. For an Exponential Distribution, e(d) = θ. Thus for d = 150,000, e(d) + d = θ + 150000. Therefore, fitting a ground-up Exponential Distribution to the ungrouped data in Section 2 truncated at $150,000 via the method of moments: θ + 150,000 = 684,550.258 Therefore, θ = 684,550 - 150,000 = 534,550. In general, for a truncation point of d from below, the method of moments applied to the Exponential Distribution: θ = Σ (xi - d) / N = Σxi /N - d = (mean of the truncated data) - d. 258
We assume that the losses prior to truncation from below follow an Exponential Distribution.
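As a quick numerical illustration of the two examples above, here is a short Python sketch (the variable names are mine). For the Exponential, θ is the mean of the truncated data minus d; for the Pareto with α = 3, the truncated mean e(d) + d = (θ + d)/(α - 1) + d is inverted for θ.

```python
# Method of moments with data truncated from below at d.
truncated_pareto_sample = [10, 15, 20, 30, 50]   # truncated from below at d = 5
d, alpha = 5.0, 3.0
mean = sum(truncated_pareto_sample) / len(truncated_pareto_sample)   # 25
theta_pareto = (mean - d) * (alpha - 1.0) - d                        # (25 - 5)(2) - 5 = 35
print(theta_pareto)

# Exponential fit to the data truncated from below at 150,000, observed mean 684,550:
print(684_550 - 150_000)                                             # theta = 534,550
```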
Truncation from Below, Two Parameter Case:
Applying the method of moments to two parameter distributions is more complicated, since one needs to match both the first and second moments. The second moment can be computed as:
∫_d^∞ t² f(t) dt / S(d).
Such integrals are computed in the same manner as those for the moments and Limited Expected Values. While these integrals are normally too complicated to be computed on the exam, one can make use of the formulas for the limited moments provided in Appendix A of Loss Models. The second moment of the data truncated from below can be put in terms of the Limited Second Moment as follows:
E[(X ∧ d)²] = ∫_0^d t² f(t) dt + S(d)d².
E[X²] = ∫_0^∞ t² f(t) dt.
Therefore, ∫_d^∞ t² f(t) dt = ∫_0^∞ t² f(t) dt - ∫_0^d t² f(t) dt = E[X²] + S(d)d² - E[(X ∧ d)²].
Then the second moment of the data truncated from below is the above integral divided by S(d):
2nd moment of the data truncated from below at d = ∫_d^∞ t² f(t) dt / S(d) =
(E[X²] + S(d)d² - E[(X ∧ d)²]) / S(d) = (E[X²] - E[(X ∧ d)²]) / S(d) + d².
Exercise: What is the second moment for a LogNormal Distribution truncated from below at d?
[Solution: {E[X²] + S(d)d² - E[(X ∧ d)²]} / S(d) = (exp(2µ + 2σ²) - exp(2µ + 2σ²)Φ[(ln(d) - µ - 2σ²)/σ]) / {1 - Φ[(ln(d) - µ)/σ]} = exp(2µ + 2σ²){1 - Φ[(ln(d) - µ - 2σ²)/σ]} / {1 - Φ[(ln(d) - µ)/σ]}.]
Exercise: What are the first and second moments for a LogNormal Distribution with parameters µ = 3.8 and σ = 1.5, truncated from below at 1,000?
2013-4-6, Fitting Loss Dists. §21 Method Moments Trunc., HCM 10/15/12, Page 596 [Solution: first moment = e(d) + d = exp(µ + σ2/2){1 - Φ[(ln(d) − µ − σ2)/σ] }/ {1 - Φ[(ln(d) − µ)/σ]} = e4.925{1 - Φ[.5718]} / {1- Φ[2.071] } = (137.7) (0.2837)/0.0191 = 2045. second moment = exp(2µ +2σ2){1- Φ[ln(d) -µ - 2σ2)/σ]} / {1- Φ[(ln(d) -µ)/σ] } = e12.1{1 - Φ[-.928]} / {1- Φ[2.071] } = (179,872) (0.8233)/ 0.0191 = 7.7 million.] Higher moments can be computed using the formula:259 nth moment of the data truncated from below at d = (E[Xn ] + S(d)dn - E[(X ∧ d)n ]) / S(d) In order to apply the method of moments to a two parameter distribution and data truncated from below, one has to compute the first and second moments for the given distribution type when the small losses are not reported and then one has to match the equations for these moments to the observed values and solve (numerically). For example, assume you have data truncated from below at 1000, with the observed first moment is 3000 and the observed second moment is 71 million. Assume a LogNormal Distribution would have fit the data prior to truncation from below. Then to apply the method of moments in order to fit a LogNormal Distribution, one writes down two equations, matching the observed and theoretical values:260 exp(µ + σ2/2){1 − Φ[(ln1000 − µ − σ2)/σ] }/ {1 − Φ[(ln1000 − µ)/σ]} = 3000 exp(2µ +2σ2){1- Φ[(ln1000 -µ - 2σ2)/σ]} / {1- Φ[(ln1000 -µ)/σ] } = 71 million These equations can be solved numerically,261 in this case µ = -0.112 and σ = 2.473. Fitting to Data Truncated and Shifted from Below:262 When data is truncated and shifted from below at the value d, losses of size less than d are not in the reported data base, and larger losses have their reported values reduced by d. The distribution function and the probability density functions are revised as follows: G(x) = { F(x+d) - F(d) } / S(d), x > 0. g(x) = f(x+d) / S(d), x > 0. 259
For n =1 this reduces to e(L) + L. For n =2 it reduces to the formula derived above. The current formula is derived in the exact same way. 260 The right hand sides are the first and second moments of a LogNormal Normal truncated from below at 1000. 261 While one can not be asked to solve these equations on the exam, in theory you could be asked to set them up. 262 One can always convert data truncated and shifted from below to data truncated (and unshifted) from below or vice versa. Thus one can work in whichever context is easier for you.
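The pair of LogNormal moment-matching equations above (truncation point 1000, observed moments 3000 and 71 million) can be solved with any general-purpose root finder. Below is a small Python sketch using scipy; it is my own illustration, not part of the text, and convergence depends on a reasonable starting point.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

d, m1, m2 = 1000.0, 3000.0, 71e6     # truncation point and observed moments

def equations(params):
    mu, sigma = params
    z = (np.log(d) - mu) / sigma
    # First and second moments of a LogNormal truncated from below at d:
    first = np.exp(mu + sigma**2 / 2) * (1 - norm.cdf(z - sigma)) / (1 - norm.cdf(z))
    second = np.exp(2*mu + 2*sigma**2) * (1 - norm.cdf(z - 2*sigma)) / (1 - norm.cdf(z))
    return [first - m1, second - m2]

mu, sigma = fsolve(equations, x0=[1.0, 2.0])
print(mu, sigma)                      # roughly -0.11 and 2.47, as quoted above
```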
2013-4-6, Fitting Loss Dists. §21 Method Moments Trunc., HCM 10/15/12, Page 597 For example, one could fit to data that is truncated and shifted from below via method of moments. If the original data (ground-up claim amounts) is assumed to follow: F(x) = 1 - {250000/(250000+x)}α, and if the data is truncated and shifted at 150,000, then G(x) = {1 - (250000/(400000+x))α) - (1 - (5/8)α)} / {1 - ( 1 - (5/8)α} = 1 - {400,000/(400,000+x)}α, for x > 0. This happens to be a Pareto with scale parameter = 400,000, and shape parameter equal to α. Thus the theoretical mean of the data truncated and shifted at 150,000 is: 400,000 / (α-1). If one truncates and shifts at 150,000 the ungrouped data in Section 2, then the observed mean turns out to be 534,550. This is 150,000 less than the observed mean of 684,550 for this data truncated from below at 150,000. Setting the observed and theoretical mean equal: 400,000 / (α-1) = 534,550. Thus α = 1.75. In general, if the original data prior to truncation (ground up claim amounts) is assumed to follow F(x), then the mean of the data truncated and shifted at d is e(d), the mean excess loss at d. So in our example where F(x) = 1 - {250000/(250000+x)}α, the mean excess loss of this Pareto Distribution is: e(x) = (θ+x)/(α-1). e(150000) =(250000 + 150000)/(α-1) = 400,000 / (α-1), as determined above. Coinsurance Factors: Since a coinsurance factor is applied last, we can remove its effects and convert to a question not involving coinsurance. For example, if one has a coinsurance factor of 80%, we can divide each payment by 0.80 in order to convert it to what it would have been in the absence of a coinsurance factor. Then the fitting can be performed in the usual manner. Fitting to Data Truncated from Above: As discussed in a prior section, when data is truncated from above at the value L, losses of size greater than L are not in the reported data base. The distribution function and the probability density functions are revised as follows: G(x) = F(x) / F(L), x ≤ L. g(x) = f(x) / F(L), x ≤ L. Mathematically, data truncated from above is parallel to truncation from below, and similar techniques apply. As discussed in a subsequent section, fitting loss distributions to censored data is more common in actuarial applications, than fitting to data truncated from above.
The theoretical mean of data truncated from above at L is:263
∫_0^L x g(x) dx = ∫_0^L x f(x) / F(L) dx = ∫_0^L x f(x) dx / F(L) = {E[X ∧ L] - L S(L)} / F(L).
For example, if one assumes the original untruncated losses follow an Exponential, F(x) = 1 - e^(-x/θ), then the mean of the data truncated from above at L is:
(E[X ∧ L] - L S(L)) / F(L) = {θ(1 - e^(-L/θ)) - L e^(-L/θ)} / (1 - e^(-L/θ)) = θ - L e^(-L/θ)/(1 - e^(-L/θ)) = θ - L/(e^(L/θ) - 1).
For example, the mean of the grouped data in Section 3 if it were truncated from above at 20,000 is: $64,897,000 / 7376 = $8798. If we assume that the original untruncated losses follow an Exponential Distribution, F(x) = 1 - e^(-x/θ), then the method of moments would say that: θ - 20000/(e^(20000/θ) - 1) = 8798. One could then numerically solve for θ ≅ 27,490.
263
The mean of the data truncated from above is similar to the Limited Expected Value. However, the Limited Expected value includes contributions from all size losses, while the data truncated from above excludes large losses. The Limited Expected Value consists of two terms. The contribution from the small losses is what is included in the data truncated from above. The other term of L S(L) is the contribution of large losses, which are not recorded when data is truncated from above. Thus the numerator of the mean of the data truncated from above is: E[X ∧ L] - L S(L). The denominator is F(L) since losses of size greater than L are not recorded.
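Solving the method-of-moments equation above numerically is straightforward; the Python sketch below is mine (the bracketing values are my own choices) and uses bisection on θ - L/(e^(L/θ) - 1) = 8798.

```python
import math

L, observed_mean = 20_000.0, 8798.0

def truncated_mean(theta):
    # Mean of an Exponential truncated from above at L.
    return theta - L / (math.exp(L / theta) - 1.0)

lo, hi = 1_000.0, 1_000_000.0           # truncated_mean is increasing in theta
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if truncated_mean(mid) < observed_mean:
        lo = mid
    else:
        hi = mid
print(0.5 * (lo + hi))                   # about 27,490
```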
Problems:
21.1 (2 points) The following five ground up losses have been observed from policies each with a deductible of 10: 60, 75, 80, 105, and 130. We assume that the untruncated data follows the distribution function: F(x) = 1 - {100/(100 + x)}^α. The method of moments is used to fit this distribution. What is the estimate of the parameter α?
A. less than 2.2 B. at least 2.2 but less than 2.3 C. at least 2.3 but less than 2.4 D. at least 2.4 but less than 2.5 E. at least 2.5
Use the following information for the next 3 questions: From a policy with a $10,000 deductible, you observe 6 claims with the following amounts paid by the insurer: $22,000, $28,000, $39,000, $51,000, $80,000, and $107,000.
21.2 (1 point) You fit an Exponential Distribution to this data via the method of moments. What is the fitted θ parameter? (A) 40,000
(B) 45,000
(C) 50,000
(D) 55,000
(E) 60,000
21.3 (2 points) You fit a Pareto Distribution with α = 4 to this data via the method of moments. What is the fitted θ parameter? A. less than 145,000 B. at least 145,000 but less than 150,000 C. at least 150,000 but less than 155,000 D. at least 155,000 but less than 160,000 E. at least 160,000 21.4 (5 points) You fit a LogNormal Distribution with σ = 1.5 to this data via the method of moments. What is the fitted µ parameter? A. 9.0
B. 9.2
C. 9.4
D. 9.6
E. 9.8
2013-4-6, Fitting Loss Dists. §21 Method Moments Trunc., HCM 10/15/12, Page 600 21.5 (3 points) The size of accidents is believed to follow a distribution function: F(x) = 1 - (50000 / x)q , for x > 50,000. Suppose a sample of data is truncated from above at x = 150,000. The sample of 7 accidents is then observed to be: 55,000, 60,000, 65,000, 70,000, 75,000, 80,000, and 90,000. The parameter q is fit to this data using the method of moments. Which of the following intervals contains the fitted value of q? A. less than 2.4 B. at least 2.4 but less than 2.6 C. at least 2.6 but less than 2.8 D. at least 2.8 but less than 3.0 E. at least 3.0 21.6 (2 points) From a policy with a $50 deductible and a 70% coinsurance factor, you observe 9 claims with the following amounts paid by the insurer: $28, $70, $105, $140, $203, $224, $350, $504, $665. You assume the losses prior to the impact of the deductible and coinsurance factor follow an Exponential Distribution F(x) = 1 - e-x/θ. Using the Method of Moments, what is the fitted θ? A. less than 370 B. at least 370 but less than 380 C. at least 380 but less than 390 D. at least 390 but less than 400 E. at least 400 21.7 (4, 5/87, Q.62) (2 points) Assume that the random variable x has the probability density function: f(x ; θ) = θ + 2(1 - θ)x , 0 ≤ x ≤ 1, with parameter θ, 0 ≤ θ ≤ 2. Suppose that a sample is truncated at x = 0.60 so that values below this amount are excluded. The sample is then observed to be 0.70, 0.75, 0.80, 0.80 Using the method of moments, what is the estimate for the parameter θ? 1
You may use ∫_0.6^1 x f(x; θ) dx = 0.5227 - 0.2027θ.
A. Less than 1.00
B. At least 1.00, but less than 1.30
C. At least 1.30, but less than 1.60
D. At least 1.60, but less than 1.90
E. 1.90 or more.
Solutions to Problems:
21.1. C. The distribution function for the data truncated from below at 10 is: G(x) = {F(x) - F(10)} / {1 - F(10)} = {(100/110)^α - (100/(100+x))^α} / (100/110)^α = 1 - {110/(100 + x)}^α, x > 10. The survival function is 1 from 0 to 10, and {110/(100 + x)}^α from 10 to ∞. Therefore, the mean of the truncated distribution is:
∫_0^10 1 dx + ∫_10^∞ 110^α / (100 + x)^α dx = 10 + 110/(α - 1).
Set the mean of the truncated distribution equal to the mean of the truncated data: (60 + 75 + 80 + 105 + 130)/5 = 90 = 10 + 110/(α - 1). Thus α = 19/8 = 2.375. Alternately, the distribution function for the data truncated and shifted by 10 is: G(x) = {F(x+10) - F(10)} / {1-F(10)} = {(100/110)α - (100/(110+x))α }/ (100/110)α = 1 - {110/(110 + x)}α. This is a Pareto Distribution, with parameters α and 110, and mean 110 / (α - 1). The truncated and shifted data is the losses minus the deductible of 10. Setting the mean of the truncated and shifted distribution equal to the observed mean payment of 80: 80 = 110 / (α−1). Thus α = 19/8 = 2.375. Comment: The average claim for the data truncated and shifted by 10 is e(10), the mean excess loss at 10. For the Pareto Distribution: e(x) = (x+θ) / (α−1), so e(10) = (10 + 100) / (α−1) = 110 / (α−1). 21.2. D. After truncation and shifting, one gets the same Exponential Distribution, due to the memoryless property of the Exponential. Therefore, matching means: θ = ($22,000 + $28,000 + $39,000 + $51,000 + $80,000 + $107,000)/6= $54,500. Alternately, the mean after truncation and shifting is the mean excess loss, e(10000). For the Exponential, e(x) = θ, so we set θ = observed mean = $54,500.
2013-4-6, Fitting Loss Dists. §21 Method Moments Trunc., HCM 10/15/12, Page 602 21.3. C. After truncation and shifting at 10,000, one gets: G(x) = {F(x+10000) - F(10000)}/S(10000) = {(1+10000/θ)-4 - (1+(x+10000)/θ)-4}/(1+10000/θ)-4 = 1 - (1+x/(θ+10000))-4, which is a Pareto Distribution with α = 4 and θʼ = θ + 10000. This second Pareto has mean: (θ + 10000)/(4-1). Set this equal to the observed mean of 54,500 and solve for θ = 153,500. Alternately, the mean after truncation and shifting is the mean excess loss, e(10000). For the Pareto, e(x) = (θ + x)/(α−1). e(10000) = (θ + 10000)/3. So we set (θ + 10000)/3 = observed mean = $54,500. Therefore, θ = $153,500. 21.4. C. For the LogNormal Distribution, E[X] = exp(µ + σ2/2) = exp(µ + 1.125) = 3.080eµ. E[X
∧ 10000] = exp(µ + σ2/2)Φ[(ln(10000) − µ − σ2)/σ] + 10000S(10000) =
3.080eµΦ[4.640 − µ/1.5] + 10000{1 − Φ[6.140 − µ/1.5]}. S(10000) = 1 - Φ[(ln(10000) − µ)/σ] = 1 − Φ[6.140 − µ/1.5]. The mean after truncation and shifting is the mean excess loss, e(10000) = (E[X] - E[X ∧ 10000] )/S(10000) = 3.080eµ{1 - Φ[4.640 − µ/1.5]}/{1 - Φ[6.140 − µ/1.5] } - 10000. We want e(10000) = observed mean = $54,500. One needs to try all the given choices for µ in order to see which one satisfies this equation. The equation is satisfied for µ ≅ 9.4. mu
Mean
LEV[10000]
S(10000)
e(10000)
9.0 9.2 9.4 9.6 9.8
24,959 30,485 37,235 45,479 55,548
6612 7036 7438 7809 8150
0.4443 0.4960 0.5517 0.6026 0.6517
41,295 47,277 54,009 62,512 72,730
2013-4-6, Fitting Loss Dists. §21 Method Moments Trunc., HCM 10/15/12, Page 603 21.5. C. This is a Single Parameter Pareto, with density f(x) = q 50000q x−(q+1). Adjusting the density for truncation from above: g(x) = f(x) / F(150000) = q 50000q x−(q+1) / {1- 3−q}. The mean of the data truncated from above is the integral of xg(x) from 50000 to 150000: 150000
150000
mean = ∫ xg(x) dx = ∫ {q 50000q / (1- 3−q)} x −q dx = 50000
50000 x = 150000
= -(q / {(q-1)(1- 3−q)}) 50000q x−(q-1)
]
x = 50000
= (q / {(q-1)(1- 3−q)}) 50000q {50000−(q-1) - 150000−(q-1)} = (q / {(q-1)(1- 3−q)}) 50000{1 - (1/3)q-1}. The observed average is: 495000 / 7 = 70,714. Thus one wants: 70714 = (q / {(q-1)(1- 3−q)}) 50000{1 - (1/3)q-1} or 1.414 = q{1 - (1/3)q-1} / {(q-1)(1- 3−q)}. One can solve for q numerically as about 2.72 or just try values of q: q 2.6 2.7 2.8
q{1 - (1/3)^(q-1)} / { (q-1)(1- (1/3)^q)} 1.427 1.416 1.405
21.6. A. The mean payment is: ($28 + $70 + $105 + $140 + $203 + $224 + $350 + $504 + $665)/9= 254.33. Prior to the effect of the coinsurance factor, the mean would be: 254.33/.7 = 363. After truncation and shifting, one gets the same Exponential Distribution, due to the memoryless property of the Exponential. Therefore, matching means θ = 363. Comment: Since the coinsurance factor is applied last, we can remove its effects and convert to a question involving only a deductible.
2013-4-6, Fitting Loss Dists. §21 Method Moments Trunc., HCM 10/15/12, Page 604 21.7. D. The density function for the data truncated from below at .6 is g(x) = f(x) / (1-F(.6)). Integrating f(x) = θ + 2(1 - θ)x we get F(x) = θx + (1 - θ)x2, 0≤x≤1 . Thus F(.6) = .6θ + (.36)(1-θ) = .36 + .24θ. Thus g(x) = f(x) / (1-F(.6)) = f(x)/( .64 - .24θ). The mean of the truncated distribution is the integral of xg(x) from .6 to 1. This is (the integral of xf(x) from .6 to 1)/( .64 - .24θ) = (.5227 - .2027θ) /(.64 - .24θ). Setting this equal to the observed mean of (.7 + .75 + .8 + .8)/4 = .7625, we have: (.5227 - .2027θ) /(.64 - .24θ) = .7625. Thus θ = 0.0347 / 0.0197 = 1.76.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 605
Section 22, Maximum Likelihood Applied to Truncated Data One can apply the Method of Maximum Likelihood to truncated data, with the distribution assumed to fit the data prior to truncation.264 Therefore one has to modify either the distribution or density function for the effects of truncation. Data Truncated from Below: For data truncated from below at a value d, also called left truncated at d, losses of size d or less are not reported.265 For example, if an insurance policy has a deductible of 1000, then the insurance company would pay nothing unless a loss exceeds 1000. For a large loss, it would pay 1000 less than the loss. Exercise: On a policy with a $1000 deductible, there are three losses of size: $700, $3000, $5000. How much does the insurer pay? [Solution: The insurer pays 0, 2000, and 4000, for a total of $6000.] The data for this policy would be reported to the actuary in either of two ways:
• Two losses of sizes $3000 and $5000. • Two payments of sizes $2000 and $4000.266 In either case, the actuary would not know how many losses there were of size $1000 or less, nor would he know the exact sizes of any such losses. For an ordinary deductible of size d, the losses have been truncated from below at d. For a franchise deductible, the small losses would not be reported, and the insurer would pay all of large loss. This data is truncated from below at the amount of the franchise deductible. The Method of Maximum Likelihood can be applied to truncated data in order to fit distributions. If one assumes that the original ground up losses followed F(x), then one gets the likelihood of the data truncated from below using either G(x) for grouped data or g(x) for ungrouped data. G(x) = 264
F(x)- F(d) , x > d. S(d)
g(x) =
f(x) , x > d. S(d)
In a subsequent section, one assumes the truncated data is directly fit by a Single Parameter Pareto Distribution. This is the usual way data from a policy with a deductible of d is reported. 266 The size of payments would be after the deductible has been subtracted. 265
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 606 The log density for data truncated from below at d is: ln (f(x)/S(d)) = ln[f(x)] - ln[S(d)]. For data truncated from below at d, a loss of size x greater than d contributes f(x) / S(d) to the likelihood, or ln[f(x) / S(d)] to the loglikelihood. Exercise: Prior to the effects of a 1000 deductible, the losses follow a distribution F(x), with density f(x). There were two losses reported of sizes 3000 and 5000. Write down the likelihood. [Solution: The first loss contributes: f(3000)/S(1000). The second loss contributes: f(5000)/S(1000). Thus the likelihood is: f(3000) f(5000) / S(1000)2 . Comment: The solution would be the same if we had instead observed two payments of sizes 2000 and 4000.] Exercise: In the previous exercise, determine the maximum likelihood fit of an Exponential Distribution with mean θ. [Solution: The likelihood is: (e-3000/θ/θ) (e-5000/θ/θ) / (e-1000/θ)2 = e-6000/θ/θ2. The loglikelihood is: -6000/θ - 2lnθ. Thus, 0 = 6000/θ2 - 2/θ. ⇒ θ = 6000/2 = 3000.] In general, the method of maximum likelihood applied to ungrouped data truncated from below requires that one maximize:267 Σ ln(f(xi)) - N ln(S(d)) For the Exponential Distribution, one can solve in closed form for the method of maximum likelihood applied to data truncated from below at d as follows: Σ ln(f(xi)) - N ln(S(d)) = Σ {-ln(θ) -xi / θ} - N(-d/θ) Taking the derivative of the loglikelihood with respect to θ and setting it equal to zero: 0 = Σ {-1/θ + xi/θ2 } - Nd/θ2 , therefore Σθ = Σxi - Nd. ⇒ θ = Σ(xi -d)/ N = average payment. Thus for the Exponential Distribution, the method of moments and the method of maximum likelihood applied to ungrouped data give the same result, as was the case in the absence of truncation from below. For more complicated distributions, in order to maximize the likelihood, one must use numerical methods.
267
As always, one can maximize either the likelihood or the loglikelihood. When fitting to ungrouped data one works with the density, while when fitting to grouped data one works with the distribution function.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 607 ⎛ ⎞α 1 For example, fitting the Burr distribution, F(x) = 1 - ⎜ γ ⎟ , via maximum likelihood to ⎝ 1 + (x / θ) ⎠ ungrouped data truncated from below at d, one would need to maximize: Σ ln(f(xi)) - N ln(S(d)) = Σ ln(αγ(xi/θ)γ(1 + (xi/θ)γ)−(α + 1) /xi) - N n((1/(1 + (d/θ)γ))α) = Σ {ln(α) + ln(γ) + (γ−1)ln(xi) − γln(θ) - (α+1)ln(1 + (xi/θ)γ)} - Nαln(1 + (d/θ)γ)) = (γ−1)Σln(xi) - (α+1)Σ ln(1 + (xi/θ)γ) + N ln(α) + N ln(γ) - Nγln(θ) - Nαln(1+ (d/θ)γ)). Numerically maximizing this loglikelihood for the ungrouped data in Section 2 truncated from below at d = 150,000 gives parameters:268 α = 1.142, θ = 2.917 x 105 , γ = 1.460. The survival function of the fitted Burr Distribution adjusted for truncation from below at 150,000 is compared to the empirical survival function (thick) for the data truncated from below at 150,000: 1.00 0.50
0.20 0.10 0.05
0.02 200
500
1000
2000
5000
x (000)
The Burr distribution seems to fit fairly well. One should apply appropriate caution in inferring the behavior of the size of loss distribution below the truncation point from a curve fit to data above a truncation point.
268
One would use the exact same numerical techniques to maximize this loglikelihood as discussed in the absence of truncation from below.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 608 Data Truncated and Shifted From Below: Similarly one could apply the Method of Maximum Likelihood to data truncated and shifted from below. If one assumes that the original ground up losses followed F(x), then one gets the likelihood of the data truncated and shifting using either G(x) for grouped data or g(x) for ungrouped data. G(x) =
F(x + d) - F(d) , x > 0. S(d)
g(x) =
f(x + d) , x > 0. S(d)
Exercise: Assume the losses in the grouped data in Section 3 were truncated and shifted from below at 50,000. Then how would this data have been recorded? [Solution: Interval ($000) Number of Accidents Loss on Accidents in the Interval ($000) 0-25 254 2,603 25-50 57 2,043 50 - ∞ 33 2,645 ] If one assumed the losses prior to modification followed a distribution F, then the likelihood of this data truncated and shifted from below at 50,000 would be: {G(25,000) - G(0)}254 {G(50,000) - G(25,000)}57 {1 - G(50,000)}33 = {F(75,000) - F(50,000)}254 {F(100,000) - F(75,000)}57 {1-F(100,000)}33 / S(50000)344. Data Truncated from Above: As discussed in a prior section, when data is truncated from above at the value L, losses of size greater than L are not in the reported data base. The distribution function and the probability density functions are revised as follows: G(x) = F(x) / F(L),
x ≤ L.
g(x) = f(x) / F(L),
x ≤ L.
The loglikelihood for grouped data truncated from above at a value L is: Σni ln(G(ci+1) - G(ci)) = Σni {ln[(F(ci+1) - F(ci))/F(L)]} = Σni ln(F(ci+1) - F(ci)) - Σni ln(F(L)) = Σni ln[F(ci+1) - F(ci)] - n ln(F(L)). where ni are the observed number of losses in the ith interval [ci, ci+1], and Σni = n.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 609 For example, assume the grouped data in Section 3 were truncated from above at 20,000. Then the data would have been recorded as: Interval ($000) Number of Accidents Loss on Accidents in the Interval ($000) 0-5 2208 5,974 5 -10 2247 16,725 10-15 1701 21,071 15-20 1220 21,127 Thus the loglikelihood would be: (2208) ln[F(5000)] + (2247) ln[F(10000) - F(5000)] + (1701) ln[F(15000) - F(10000)] + (1220) ln[F(20000) - F(15000)] - (2208+2247+1701+1220) ln[F(20000)]. This could be maximized by the usual numerical techniques in order to solve for the fitted parameter(s) of the assumed distribution. Data Truncated from Below and Above:269 For data truncated from below at d and truncated from above at L, the distribution and density functions are revised as follows: F(x) - F(d) f(x) G(x) = , d < x ≤ L. g(x) = , d < x ≤ L. F(L) - F(d) F(L) - F(d) For ungrouped data, the likelihood to be maximized is the product of these densities at the observed points: Π {f(xi) /(F(L) - F(d))}. Each term in the product has in the denominator the probability remaining after truncation. Frequency Distributions: Frequency Distributions can be truncated from below and/or truncated from above. For example, assume that the time lag to settle a claim is distributed via a Poisson Distribution with mean λ: f(t) = e−λλ t/t!, t = 0, 1, 2, ... If for some reason we only know how many claims were settled at time lags up to 3, then the data is truncated from above: g(t) = f(t)/(f(0) + f(1) + f(2) + f(3)) = (λt/t!)/(1 + λ + λ2/2 + λ3/6), t = 0, 1, 2, 3. Exercise: You observe the following numbers of claims settled at time lags 0, 1, 2, and 3: 7, 10, 15, 8. What is the loglikelihood for a Poisson with λ = 4? 269
See 4B, 5/99, Q.25 and the related 4B, 11/99, Q.6.
[Solution: g(t) = 0.04225 (4^t/t!). g(0) = 0.0423, g(1) = 0.1690, g(2) = 0.3380, g(3) = 0.4507.
The loglikelihood is: 7 ln(g(0)) + 10 ln(g(1)) + 15 ln(g(2)) + 8 ln(g(3)) = -62.6.]
Below is a graph of the loglikelihood as a function of λ. The maximum likelihood is for λ = 2.04.
[Graph: loglikelihood versus λ, for λ from 1 to 5; the loglikelihood ranges from about -85 to -55, with its maximum near λ = 2.04.]
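Here is a small sketch (my own, in Python with scipy, rather than the author's software) that maximizes this truncated Poisson loglikelihood for the observed counts 7, 10, 15, 8; it should land near the λ = 2.04 quoted above:

import math
from scipy.optimize import minimize_scalar

counts = {0: 7, 1: 10, 2: 15, 3: 8}   # claims settled at lags 0 through 3

def negloglik(lam):
    # Poisson truncated from above at lag 3: g(t) = (lam^t / t!) / (1 + lam + lam^2/2 + lam^3/6)
    denom = sum(lam**t / math.factorial(t) for t in range(4))
    return -sum(n * (t * math.log(lam) - math.log(math.factorial(t)) - math.log(denom))
                for t, n in counts.items())

fit = minimize_scalar(negloglik, bounds=(0.01, 10.0), method="bounded")
print("maximum likelihood lambda = %.3f" % fit.x)   # roughly 2.04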
Normally, one would use numerical methods in order to solve for the maximum likelihood fit. However, with only 2 time intervals one can solve algebraically. Exercise: You observe 7 claims settled at a time lag of 0, and 10 claims settled at a time lag of 1. The number of claims settled at lags of 2 or more are unknown. What is the maximum likelihood Poisson? [Solution: g(0) = f(0)/{f(0) + f(1)} = 1/(1 + λ). g(1) = f(1)/{f(0) + f(1)} = λ/(1 + λ). loglikelihood = 7 ln(g(0))) + 10 ln(g(1)) = -7 ln(1+λ) + 10 ln(λ) - 10 ln(1+λ). Setting the partial derivative with respect to λ equal to zero: -17/(1+λ) + 10/λ = 0. ⇒ λ = 10/7.] One could also solve if for example the data were only available for lags 1 and 2. In this case, the data has been truncated from below as well as from above. Exercise: You observe 9 claims settled at a time lag of 1, and 12 claims settled at a time lag of 2. The number of claims settled at other lags are unknown. What is the maximum likelihood Poisson? [Solution: g(1) = f(1)/{f(1) + f(2)} = e−λλ/(e−λλ + e−λλ 2/2) = 2/(2 + λ). g(2) = f(2)/{f(1) + f(2)} = λ/(2 + λ). loglikelihood = 9 ln(g(1))) + 12 ln(g(2)) = 9 ln(2) - 9 ln(2+λ) + 12 ln(λ) - 12 ln(2+λ). Setting the partial derivative with respect to λ equal to 0, -21/(2 + λ) + 12/λ = 0. ⇒ λ = 8/3.]
One can combine the loglikelihoods from different years, assuming the expected settlement pattern is the same for each year. For example, assume the following claims settlement activity for a book of claims as of the end of 1999:270

Number of Claims Settled
Year Reported    Settled in 1997    1998    1999
1997             Unknown            13      11
1998                                15      12
1999                                        14
Assume the expected claim settlement pattern is Poisson (with the lag in years), with the same λ for each report year. Then the loglikelihood is the sum of the loglikelihoods from each year.
For report year 1997, with data for only lags 1 and 2, the loglikelihood is:
13 ln(g(1)) + 11 ln(g(2)) = 13 ln(2/(2 + λ)) + 11 ln(λ/(2 + λ)) = 13 ln(2) + 11 ln(λ) - 24 ln(2 + λ).
For report year 1998, with data for only lags 0 and 1, the loglikelihood is:271
15 ln(g(0)) + 12 ln(g(1)) = 15 ln(1/(1 + λ)) + 12 ln(λ/(1 + λ)) = 12 ln(λ) - 27 ln(1 + λ).
For report year 1999, the loglikelihood is: 14 ln(g(0)) = 14 ln(1) = 0.
The total loglikelihood is: 13 ln(2) + 11 ln(λ) - 24 ln(2 + λ) + 12 ln(λ) - 27 ln(1 + λ).
Setting the derivative of the loglikelihood with respect to λ equal to zero:
23/λ - 24/(2 + λ) - 27/(1 + λ) = 0. ⇒ 28λ^2 + 9λ - 46 = 0. λ = 1.13 is the maximum likelihood Poisson.
The Geometric Distribution has the same memoryless property as the Exponential Distribution. Therefore, under truncation from below, one gets something that looks similar to the original distribution.
Exercise: What is the density for a Geometric Distribution, after truncation from below at zero?
[Solution: f(t) = β^t/(1 + β)^(t+1). g(t) = f(t)/(1 - f(0)) = {β^t/(1 + β)^(t+1)}/{1 - 1/(1 + β)} = β^(t-1)/(1 + β)^t, t = 1, 2, 3, ...]
270
Similar to 4, 5/01, Q.34. See also the somewhat simpler 4, 11/04, Q.32.
271 While for simplicity I have used the same letter g for years 1997 and 1998, the truncation points are different, and therefore the densities after truncation are different.
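For the combined report years above, the first order condition 23/λ - 24/(2 + λ) - 27/(1 + λ) = 0 can also be solved numerically rather than via the quadratic; a brief sketch (my own, assuming scipy is available):

from scipy.optimize import brentq

# Derivative of the combined loglikelihood, set equal to zero.
score = lambda lam: 23.0/lam - 24.0/(2.0 + lam) - 27.0/(1.0 + lam)
print("maximum likelihood lambda = %.3f" % brentq(score, 0.01, 10.0))   # roughly 1.13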
Assume the following settlement activity for a book of claims:

Number of Claims Settled
Year Reported    Settled in Year 1    2      3
1                Unknown              123    37
2                                     300    111
3                                            282
Exercise: Assume the expected settlement pattern is a Geometric Distribution as per Loss Models. Given the above settlement activity, what is the contribution to the loglikelihood from report year 2?
[Solution: f(t) = β^t/(1 + β)^(t+1). g(t) = f(t)/(f(0) + f(1)) = {β^t/(1 + β)^(t+1)} / {1/(1 + β) + β/(1 + β)^2} = β^t (1 + β)^(1-t)/(1 + 2β).
g(0) = (1 + β)/(1 + 2β). g(1) = β/(1 + 2β).
Thus the contribution to the loglikelihood is: 300 ln(g(0)) + 111 ln(g(1)) = 300 ln(1 + β) + 111 ln(β) - 411 ln(1 + 2β).]
Exercise: Assume the expected settlement pattern is a Geometric Distribution as per Loss Models. Given the above settlement activity, what is the contribution to the loglikelihood from report year 1?
[Solution: f(t) = β^t/(1 + β)^(t+1). g(t) = f(t)/(f(1) + f(2)) = {β^t/(1 + β)^(t+1)}/{β/(1 + β)^2 + β^2/(1 + β)^3}.
g(1) = (1 + β)/(1 + 2β). g(2) = β/(1 + 2β).
Thus the contribution to the loglikelihood is: 123 ln(g(1)) + 37 ln(g(2)) = 123 ln(1 + β) + 37 ln(β) - 160 ln(1 + 2β).]
Assuming a Geometric Distribution, the overall loglikelihood is:
300 ln(1 + β) + 111 ln(β) - 411 ln(1 + 2β) + 123 ln(1 + β) + 37 ln(β) - 160 ln(1 + 2β) + 0 = 423 ln(1 + β) + 148 ln(β) - 571 ln(1 + 2β).
Setting the derivative of the loglikelihood with respect to β equal to zero:
423/(1 + β) + 148/β - (2)(571)/(1 + 2β) = 0. ⇒ -275β + 148 = 0. β = 148/275 = 0.538 is the maximum likelihood fit.
Here is a graph of the loglikelihood as a function of β:
[Graph: loglikelihood versus β, for β from 0.4 to 0.8; the loglikelihood ranges from about -330 to -327, with its maximum near β = 0.538.]
Some of you would have found it convenient to reparameterize the Geometric Distribution as f(t) = (1-p)p^t, with p = β/(1+β).272
In that case the loglikelihood would be: 148 ln(p) - 571 ln(1 + p).
The maximum likelihood p = 148/(571 - 148) = 148/423, which implies β = p/(1 - p) = 148/275, matching the result above.
272 See 4, 5/01, Q.34.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 614 An Example of a Practical Application to Hurricane Data: As a practical example of fitting to severity data truncated from below, consider the following data on losses from 164 hurricanes, all adjusted to a common level.273 Losses are in thousands of dollars: 502, 2439, 3954, 5717, 8223, 8796, 9005, 9557, 10278, 10799, 11414, 11446, 11837, 12256, 13296, 13449, 15474, 16208, 17116, 17658, 17866, 17976, 18497, 18825, 18891, 18946, 19065, 19357, 19612, 19929, 20289, 20416, 27091, 28690, 29419, 31069, 31388, 31447, 32860, 34636, 37659, 41277, 41746, 42825, 43577, 47299, 55046, 58145, 58548, 61970, 63152, 63918, 64533, 65024, 65139, 66228, 67732, 69972, 71158, 75739, 75760, 76846, 86278, 87098, 96877, 118642, 122518, 127951, 132787, 133682, 133959, 147702, 155351, 169071, 189781, 189900, 192283, 194630, 194890, 208433, 217219, 224907, 275001, 283869, 293910, 305313, 348405, 356558, 362090, 366142, 368245, 393073, 400501, 403169, 438296, 465074, 484223, 525681, 534237, 547711, 596026, 605316, 643598, 646193, 662658, 668635, 687544, 696402, 775971, 783072, 825054, 836911, 888088, 894836, 923918, 942310, 956927, 970828, 1028039, 1119560, 1163819, 1176396, 1191386, 1270333, 1356989, 1371030, 1378549, 1435127, 1460391, 1624995, 1650468, 1709809, 1755434, 1910703, 1979274, 2087738, 2124106, 2584891, 2728296, 2735157, 2853627, 2949789, 3096434, 3476218, 3686521, 3746855, 3762550, 3912101, 4568366, 4709959, 5432151, 5529261, 5855343, 6265912, 7976601, 8196810, 9816472, 9965606, 10009409, 11518111, 16146375, 16485683, 24486691, 49728840. While this data has not been truncated from below at a specific value, it has still been restricted to larger storms. Only those large tropical storms in which winds attain speeds of at least 74 miles per hour are labeled hurricanes.274 While this is a useful distinction for practical purposes, this dividing line is somewhat arbitrary. There is a continuum from weak tropical storms, to those just below hurricane strength, to those hurricanes of category 1, up to those of category 5. However, this data base excludes any tropical storm which did not make it to hurricane strength. Thus the data has been in some sense truncated from below. In order to analyze the severity I have truncated from below at 10,000 (meaning $10 million in insured losses), leaving 156 severities. Then in each case I will assume the underlying density prior to truncation was f(x), while the density after truncation is: f(x)/S(10,000). Therefore, the loglikelihood is: Σln(fxi) - N ln[1-S(10,000)]. 273
These are insured losses for hurricanes hitting the continental United States from 1900 to 1999. The reported losses have been adjusted to a year 2000 level for inflation, changes in per capita wealth (to represent the changes in property value above the rate of inflation), changes in insurance utilization, and changes in number of housing units (by county). See “A Macro Validation Dataset for U.S. Hurricane Models”, by Douglas J. Collins and Stephen P. Lowe, CAS Forum, Winter 2001. 274 I have seen the definition of hurricane vary by one m.p.h.
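A sketch of this fit in Python (an assumption on my part; the author used Mathematica). The loglikelihood for data truncated from below at d is Σ ln f(xi) - N ln S(d). Only a few of the severities are typed out here; running the same function on all 156 severities above 10,000 should come close to the Weibull parameters reported in the table on the next page:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def fit_weibull_left_truncated(severities, d):
    """Maximum likelihood Weibull for ground-up data truncated from below at d."""
    x = np.asarray(severities, dtype=float)

    def negloglik(params):
        tau, theta = params
        if tau <= 0 or theta <= 0:
            return np.inf
        # sum of ln f(x_i) minus N ln S(d)
        return -(weibull_min.logpdf(x, c=tau, scale=theta).sum()
                 - len(x) * weibull_min.logsf(d, c=tau, scale=theta))

    return minimize(negloglik, x0=[0.5, np.median(x)], method="Nelder-Mead")

# First few of the hurricane severities above 10,000 (in thousands of dollars).
sample = [10278, 10799, 11414, 11446, 11837, 12256, 13296, 13449, 15474, 16208]
print(fit_weibull_left_truncated(sample, d=10_000).x)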
As a first step I graphed the empirical mean excess loss (mean residual life):
[Graph: empirical mean excess loss e(x) ($ million) versus x ($ million), for x up to 10,000.]
The mean excess loss is increasing, perhaps a little less than linearly.275 Thus I tried to fit distributions with increasing mean excess losses. The best fits via maximum likelihood were:

Distribution                       Parameters                                               Negative Loglikelihood
LogNormal                          µ = 11.9113, σ = 2.53218                                 2282.56
Weibull                            θ = 224,510, τ = 0.327437                                2279.32
Mixed Weibull-LogNormal            θ = 400,847, τ = 0.370266, µ = 9.84798, σ = 0.0461508,   2271.00
                                   p = weight to Weibull = 0.94854
5 Point Mixture of Exponentials    θ1 = 6453, θ2 = 44,138, θ3 = 667,790,                    2276.42
                                   θ4 = 3,921,035, θ5 = 18,114,718,
                                   p1 = 0.239005, p2 = 0.256331, p3 = 0.304175, p4 = 0.172696

275 The Pareto was too heavy-tailed to fit this data, while distributions like the Weibull for τ < 1, and the LogNormal, whose mean excess losses increase less than linearly, fit well.
Based on the loglikelihoods, the mixture of a Weibull and a LogNormal seems to be the best fit.
Exercise: Use the Loglikelihood Ratio test in order to test the fit of the mixed Weibull-LogNormal.
[Solution: The 2 parameter Weibull Distribution is a special case of the 5 parameter Weibull-LogNormal Distribution. Thus we compare (2)(2279.32 - 2271.00) = 16.64 to a Chi-Square Distribution with 5 - 2 = 3 degrees of freedom. Since 16.64 > 12.838, we reject at 1/2% the hypothesis that the simpler Weibull Distribution should be used, and use instead the mixed Weibull-LogNormal Distribution.
Comment: Since the 2 parameter LogNormal Distribution has a worse loglikelihood than the Weibull, the mixed Weibull-LogNormal fares even better in comparison to the LogNormal than it did to the Weibull.]
One could also apply the Schwarz Bayesian Criterion:

Distribution                       # Parameters    Loglikelihood    Penalty276    Penalized Loglikelihood
LogNormal                          2               -2282.56         5.05          -2287.61
Weibull                            2               -2279.32         5.05          -2284.37
Mixed Weibull-LogNormal            5               -2271.00         12.62         -2283.62
5-Point Mixture of Exponentials    9               -2276.42         22.72         -2299.14

Based on the Schwarz Bayesian Criterion, the best fit is the mixed Weibull-LogNormal, followed by the Weibull, LogNormal, and the 5 point mixture of Exponentials.
One can also compute the Kolmogorov-Smirnov and Anderson-Darling statistics:

Distribution                               K-S Statistic    Anderson-Darling Statistic
LogNormal                                  6.84%            1.089
Weibull                                    6.32%            0.610
2 point mixture of Weibull & LogNormal     4.72%            0.257
5 point mixture of Exponentials            3.63%            0.118

Using the K-S statistics, all of these fits are not rejected at 20%, since the critical value for 156 data points is: 1.07/√156 = 8.57%. The 9 parameter 5-point mixture of Exponentials has a somewhat better (smaller) K-S statistic than the 5 parameter mixed Weibull-LogNormal.
Using the Anderson-Darling statistics, all of these fits are not rejected at 10%, since the critical value is 1.933. The 9 parameter 5-point mixture of Exponentials has the best (smallest) Anderson-Darling statistic.
276
penalty = (# of parameters) ln(# points)/2 = (# of parameters) ln(156)/2 = (# of parameters)(2.525).
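The penalized loglikelihoods in the table above can be reproduced directly from this penalty formula; a short sketch (mine, not the author's):

from math import log

n = 156   # number of severities after truncation
fits = {  # loglikelihood and number of parameters, from the tables above
    "LogNormal": (-2282.56, 2),
    "Weibull": (-2279.32, 2),
    "Mixed Weibull-LogNormal": (-2271.00, 5),
    "5-point Mixture of Exponentials": (-2276.42, 9),
}
for name, (loglik, k) in fits.items():
    penalty = k * log(n) / 2   # Schwarz Bayesian Criterion penalty
    print(name, round(loglik - penalty, 2))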
Here are graphs of the differences between the empirical and fitted distributions, out to $1 billion:277
Empirical Distribution Function minus Maximum Likelihood LogNormal:
[Graph: D(x) versus x ($ million), for x from 20 to 1000 on a log scale; D(x) ranges from about -0.06 to 0.06.]
277
With the x-axis on a log scale and only shown up to $1 billion. Since one is likely to be very interested in the righthand tail, for the two best fits, below are also shown difference graphs from $1 billion to $100 billion in size.
Empirical Distribution Function minus Maximum Likelihood Weibull:
[Graph: D(x) versus x ($ million), for x from 20 to 1000 on a log scale; D(x) ranges from about -0.04 to 0.06.]
Empirical Distribution Function minus Maximum Likelihood 2 point mixture of Weibull & LogNormal:
[Graph: D(x) versus x ($ million), for x from 20 to 1000 on a log scale; D(x) ranges from about -0.02 to 0.04.]
Empirical Distribution Function minus Maximum Likelihood 5 point mixture of Exponentials:
[Graph: D(x) versus x ($ million), for x from 20 to 1000 on a log scale; D(x) ranges from about -0.02 to 0.03.]
Note that the K-S Statistic of 0.0363 for the 5-point mixture of Exponentials corresponds to the maximum distance this difference curve gets from the x-axis, either above or below.
Graphs of the differences between the empirical and fitted distributions, out to $100 billion:
Empirical Distribution Function minus Maximum Likelihood 2 point mixture of Weibull & LogNormal:
[Graph: D(x) versus x ($ billion), for x from 2 to 100 on a log scale; D(x) ranges from about -0.005 to 0.015.]
Empirical Distribution Function minus Maximum Likelihood 5 point mixture of Exponentials:
[Graph: D(x) versus x ($ billion), for x from 2 to 100 on a log scale; D(x) ranges from about -0.010 to 0.010.]
The nth moment for a distribution truncated from below at 10,000 (meaning $10 million in insured losses) is:
{∫ from 10,000 to ∞ of x^n f(x) dx} / S(10,000).
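For the fitted Weibull this truncated moment can be computed in closed form using the incomplete Gamma function; here is a sketch (my own, using scipy) which should reproduce approximately the Weibull column of the table below:

import math
from scipy.special import gamma, gammaincc

tau, theta, d = 0.327437, 224_510.0, 10_000.0   # fitted Weibull and truncation point

def truncated_moment(n):
    # {integral from d to infinity of x^n f(x) dx} / S(d), with the numerator written
    # via the regularized upper incomplete Gamma function for a Weibull.
    a = 1.0 + n / tau
    u = (d / theta) ** tau
    return theta**n * gamma(a) * gammaincc(a, u) / math.exp(-u)

m1, m2 = truncated_moment(1), truncated_moment(2)
cv = (m2 - m1**2) ** 0.5 / m1
print("mean ($ billion) =", round(m1 / 1e6, 3), " CV =", round(cv, 2))   # about 2.07 and 3.7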
Using a computer to compute these integrals,278 I obtained the first four moments, which in turn were used to compute the Coefficient of Variation, Skewness, and Kurtosis:

                    Data     5pt. Exp.    Weibull    LogNormal      Weib-LogN
Mean ($ billion)    1.841    1.849        2.068      4.289          2.019
CV                  2.75     2.87         3.73       22.8           3.24
Skewness            6.46     8.27         17.68      13,955         13.01
Kurtosis            55       110          833        1.2 x 10^11    405
The LogNormal is much too heavy-tailed in comparison to the data! The 5-point mixture of Exponentials is similar to the data, with the mixture of the Weibull and LogNormal less so.279
Based on the empirical data over the last century, the frequency of hurricanes hitting the continental U.S. with more than $10,000,000 in insured loss (equivalent to 10,000 in the data base) is: 156/100 = 1.56.280
Using any of the fitted distributions, the chance that such a hurricane will have damage greater than x is: S(x)/S(10,000).
Exercise: Using the fitted Weibull Distribution, what is the expected frequency of hurricanes with damage > $10 billion?
[Solution: Given a hurricane of size > 10,000, the probability it is of size > 10,000,000 (equivalent to $10 billion) is:
S(10,000,000)/S(10,000) = exp[-(10,000,000/224,510)^0.327437]/exp[-(10,000/224,510)^0.327437] = 0.0312/0.6970 = 4.48%.
Thus the expected frequency per year is: (4.48%)(1.56) = 6.99%.]
278 Using Mathematica.
279 The empirical skewness is very sensitive to the presence or absence of a single very large hurricane. The empirical kurtosis is extremely sensitive to the presence or absence of a single very large hurricane.
280 This point estimate is subject to error.
The average "return time" is defined as 1/(average frequency). Using the fitted Weibull Distribution, the estimated average return time for a hurricane of size > $10 billion is: 1/0.0699 = 14.3 years.
Here are some estimated return times in years:

Size ($ billion):      1       5       10      25      50       100
Empirical              2.17    7.14    16.7    100     N.A.     N.A.
Weibull                2.28    7.08    14.3    48.1    158.5    707
Weibull-LogNormal      2.16    6.78    14.3    53.9    208.4    1196
LogNormal              2.43    6.65    11.4    25.5    50.8     108
Mixed Exponential      2.12    6.96    16.4    66.4    275.0    4348
So we expect a storm of more than $5 billion on average once every 7 years. As the size gets bigger, the estimates get more and more uncertain. Those for a storm of size greater than $100 billion need to be taken with an extremely large grain of salt.281 In any case, fitting size of loss distributions is only one method of approaching this question. The preferred technique currently is to simulate possible hurricanes using meteorological data and estimate the likely damage using exposure data on the location and characteristics of insured homes combined with engineering and physics data.282
281
A subsequent section will discuss the variance of estimates of functions of the parameters of size of loss distributions. Note that in this case there is error in both the estimated overall frequency and the estimated survival function. 282 "A Formal Approach to Catastrophe Risk Assessment in Management", by Karen M. Clark, PCAS 1986, or "Use of Computer Models to Estimate Loss Costs," by Michael A. Walters and Francois Morin, PCAS 1997.
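The return times in the Weibull row of the table above follow from the fitted survival function and the 1.56 annual frequency; a short sketch (mine):

import math

tau, theta, d = 0.327437, 224_510.0, 10_000.0   # fitted Weibull; amounts in thousands
freq = 1.56                                      # hurricanes per year above $10 million
S = lambda x: math.exp(-(x / theta) ** tau)

for billions in (1, 5, 10, 25, 50, 100):
    annual = freq * S(billions * 1_000_000) / S(d)   # expected frequency above this size
    print(billions, "->", round(1.0 / annual, 1), "years")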
Loss Modelʼs Example, Claim Count Development:283

Claims due to automobile collisions are reported to insurers relatively quickly. In contrast, claims due to medical malpractice often take many years to be reported to an insurer. For example, claims of mistakes during the delivery of a baby are often pursued 18 or more years after the birth. Therefore, if an insurer wrote a lot of medical malpractice insurance covering incidents during 1995, it still expects more claims to be made in the future. This contrasts to automobile collision insurance, in which all claims relating to accidents during 1995 should have been made a long time ago. For lines of insurance such as medical malpractice with significant reporting delays, estimating how many claims will eventually be reported is an important actuarial task. This example in Loss Models tries to illustrate this.
Here is data on medical malpractice report lags:284

Lag (months)    # claims      Lag (months)    # claims      Lag (months)    # claims
0-6             4             60-66           22            120-126         7
6-12            6             66-72           24            126-132         17
12-18           8             72-78           21            132-138         5
18-24           38            78-84           17            138-144         8
24-30           45            84-90           11            144-150         2
30-36           36            90-96           9             150-156         6
36-42           62            96-102          7             156-162         2
42-48           33            102-108         13            162-168         0
48-54           29            108-114         5
54-60           24            114-120         2             SUM             463
Note that the data is grouped into intervals. For example, 38 claims are reported with lags of between 18 and 24 months. 463 claims have been reported by 168 months (14 years). It is assumed that additional claims will be reported beyond 168 months, but at the time this data was available enough time has not passed for that to happen; the data is truncated from above at 168 months. Loss Models believes this data has a light tail, and therefore fits a Weibull Distribution. 283
See Section 18.3 of Loss Models, not on the Syllabus. Similar ideas are covered in “Estimation of the Distribution of Report Lags by the Method of Maximum Likelihood” by Edward Weissner, PCAS 1978. 284 See Table 18.2 of Loss Models. Taken from “ Report Lag Distributions: Estimation and Application to IBNR Counts,” by Frank Accomando and Edward Weissner, 1988 Casualty Loss Reserve Seminar, transcript available on the CAS webpage.
For the Weibull Distribution F(x) = 1 - exp[-(x/θ)^τ]. After truncation from above at 168, G(x) = F(x)/F(168) = (1 - exp[-(x/θ)^τ])/(1 - exp[-(168/θ)^τ]).
Therefore the loglikelihood is:
4 ln(G(6) - G(0)) + 6 ln(G(12) - G(6)) + ... + 2 ln(G(162) - G(156)) + 0 ln(G(168) - G(162))
= 4 ln((F(6) - F(0))/F(168)) + 6 ln((F(12) - F(6))/F(168)) + ... + 2 ln((F(162) - F(156))/F(168))
= 4 ln(F(6) - F(0)) + 6 ln(F(12) - F(6)) + ... + 2 ln(F(162) - F(156)) - 463 ln(F(168))
= 4 ln(1 - exp[-(6/θ)^τ]) + 6 ln(exp[-(6/θ)^τ] - exp[-(12/θ)^τ]) + ... + 2 ln(exp[-(156/θ)^τ] - exp[-(162/θ)^τ]) - 463 ln(1 - exp[-(168/θ)^τ]).
The loglikelihood is a function of τ and θ. For example, for τ = 1.7 and θ = 67, the loglikelihood turns out to be -1419.32.
Using numerical methods, it turns out that the loglikelihood is maximized for τ = 1.71268 and θ = 67.3002 at -1419.30.
The Weibull Distribution fit via maximum likelihood has τ = 1.71268 and θ = 67.3002.285
Exercise: Using the fitted Weibull, what percentage of claims are expected to be reported after 120 months?
[Solution: S(x) = exp[-(x/θ)^τ]. S(120) = exp[-(120/67.3002)^1.71268] = 6.77%.]
Exercise: What is the empirical survival function at 120 months, for the data truncated from above at 168?
[Solution: Out of 463 reported claims, there were 47 reported after 120 months. 47/463 = 0.102.]
The distribution function for data truncated from above at 168 is: G(x) = F(x)/F(168).
Exercise: What is the fitted survival function at 120 months, for the data truncated from above at 168?
[Solution: F(120) = 1 - exp[-(120/67.3002)^1.71268] = 0.9323. F(168) = 1 - exp[-(168/67.3002)^1.71268] = 0.9917.
1 - F(120)/F(168) = 1 - 0.9323/0.9917 = 0.060.]
Exercise: Using the fitted Weibull, how many claims do you expect to be reported in total?
[Solution: F(168) = 1 - exp[-(168/67.3002)^1.71268] = 0.9917. Thus we estimate that the 463 claims reported by 168 months are 99.17% of the total. Thus, expected total = 463/0.9917 = 466.9.]
285
Matching the result in Section 18.3.2 of Loss Models.
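A sketch of this fit (my own, in Python with scipy; the inputs are the grouped report lags above, truncated from above at 168 months). It should come out near τ = 1.713 and θ = 67.3:

import numpy as np
from scipy.optimize import minimize

counts = np.array([4, 6, 8, 38, 45, 36, 62, 33, 29, 24,
                   22, 24, 21, 17, 11, 9, 7, 13, 5, 2,
                   7, 17, 5, 8, 2, 6, 2, 0])      # claims by six-month lag interval
edges = np.arange(0, 169, 6)                      # interval endpoints 0, 6, ..., 168

def negloglik(params):
    tau, theta = params
    if tau <= 0 or theta <= 0:
        return np.inf
    F = 1.0 - np.exp(-(edges / theta) ** tau)     # Weibull distribution function
    probs = np.diff(F) / F[-1]                    # G(c_{i+1}) - G(c_i), truncated at 168
    return -np.sum(counts * np.log(np.maximum(probs, 1e-300)))

fit = minimize(negloglik, x0=[1.5, 70.0], method="Nelder-Mead")
print(fit.x, -fit.fun)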
Exercise: For a newer data set there are 252 claims reported by 60 months. Using the fitted Weibull, τ = 1.71268 and θ = 67.3002, how many claims do you expect to be reported in total?
[Solution: F(60) = 1 - exp[-(60/67.3002)^1.71268] = 0.5602. ⇒ 252/0.5602 = 449.8.]
Here is a comparison between survival functions corresponding to truncation from above at 168 months for the data (dots) and the fitted Weibull Distribution (curve):
[Graph: survival functions versus lag in months, for lags from 0 to about 150; both decline from 1 toward 0.]
Here is a comparison between the fitted and empirical distributions at the endpoints of the intervals, similar to a p-p plot:
[Graph: fitted versus empirical distribution function values, both axes from 0 to 1.]
It appears that there may be significant differences between the empirical and fitted distribution functions. We can use the Chi-Square statistic to test the fit.
Exercise: Use the Chi-Square Statistic to test the fitted Weibull Distribution. Group the data using the largest number of groups such that the expected number of claims in each group is at least 5.
[Solution: It turns out to be necessary to group the intervals 132 to 138 and 138 to 144, since they would otherwise have 4.35 and 3.48 expected claims, each fewer than 5. Similarly it is necessary to group the last four intervals into an interval from 144 to 168.

Bottom of    Top of      Observed    Weibull D. at      Fitted      Chi-
Interval     Interval    # claims    Top of Interval    # claims    Square
0            6           4           0.01579            7.37        1.54
6            12          6           0.05084            16.36       6.56
12           18          8           0.09922            22.59       9.42
18           24          38          0.15720            27.07       4.41
24           30          45          0.22169            30.11       7.36
30           36          36          0.29000            31.89       0.53
36           42          62          0.35979            32.59       26.55
42           48          33          0.42911            32.36       0.01
48           54          29          0.49634            31.39       0.18
54           60          24          0.56022            29.82       1.14
60           66          22          0.61983            27.83       1.22
66           72          24          0.67455            25.55       0.09
72           78          21          0.72404            23.10       0.19
78           84          17          0.76817            20.60       0.63
84           90          11          0.80700            18.13       2.81
90           96          9           0.84076            15.76       2.90
96           102         7           0.86976            13.54       3.16
102          108         13          0.89439            11.50       0.20
108          114         5           0.91509            9.66        2.25
114          120         2           0.93229            8.03        4.53
120          126         7           0.94645            6.61        0.02
126          132         17          0.95800            5.39        25.02
132          144         13          0.97476            7.83        3.42
144          168         10          0.99170            7.91        0.55

Total                    463                            463         104.71
For example, the Weibull Distribution Function at 72 is: 1 - exp[-(72/67.3002)^1.71268] = 0.67455.
The fitted number of claims for the interval 66 to 72 is: 463(F(72) - F(66))/F(168) = (463)(0.67455 - 0.61983)/0.99170 = 25.55.
There are 24 intervals used to compute the Chi-square, and two fitted parameters. Therefore, there are 24 - 1 - 2 = 21 degrees of freedom. The critical value at 0.5% is 41.401.286 Since 104.71 > 41.401, we reject the fitted Weibull Distribution at 0.5%.]
The maximum likelihood Weibull Distribution is a poor fit to this data.
286
Using a somewhat larger Chi-Square Table than is attached to the exam.
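The Chi-Square statistic in the solution above can be reproduced with a few lines; a sketch (mine, assuming numpy):

import numpy as np

tau, theta = 1.71268, 67.3002    # maximum likelihood Weibull from above
edges = np.array([0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84,
                  90, 96, 102, 108, 114, 120, 126, 132, 144, 168])
observed = np.array([4, 6, 8, 38, 45, 36, 62, 33, 29, 24, 22, 24,
                     21, 17, 11, 9, 7, 13, 5, 2, 7, 17, 13, 10])

F = 1.0 - np.exp(-(edges / theta) ** tau)
fitted = observed.sum() * np.diff(F) / F[-1]      # expected claims, truncated at 168
print(round(np.sum((observed - fitted) ** 2 / fitted), 1))   # about 104.7; 21 degrees of freedom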
Problems:

22.1 (3 points) The size of accidents is believed to follow a distribution function:
F(x) = 1 - {200,000 / (200,000 + x^2)}^α.
Suppose a sample is truncated at x = 500, so that values below this amount are excluded. The sample of 1000 accidents is then observed to be:
Interval       Number of Accidents
500 - 1000     800
1000 - 5000    190
over 5000      10
The parameter α is fit to this data using the method of maximum likelihood.
Which of the following functions would one maximize?
A. {1 - 0.375^α}^1000 {0.375^α - 0.01786^α}^200 {0.01786^α}^10
B. {1 - 0.375^α}^800 {0.375^α - 0.01786^α}^190 {0.01786^α}^10
C. {0.375^α}^1000 {0.375^α - 0.01786^α}^200 {1 - 0.01786^α}^10
D. {0.375^α}^800 {0.375^α - 0.01786^α}^190 {1 - 0.01786^α}^10
E. None of the above

22.2 (3 points) You observe 3 claims from three policies with different deductible amounts:
Claim Number    Policy Number    Deductible Amount    Amount Paid by The Insurer
C2319           MAC22            1000                 1,500
C1671           MAC31            none                 18,000
C3753           MAC12            5000                 31,000
You assume that in the absence of any deductibles the losses would follow a Distribution Function F(x), with probability density function f(x). Let S(x) = 1 - F(x).
What is the likelihood?
A. f(1500) f(18,000) f(31,000)
B. f(2500) f(18,000) f(36,000)
C. f(1500) f(18,000) f(31,000) / {S(1000) S(5000)}
D. f(2500) f(18,000) f(36,000) / {S(1000) S(5000)}
E. None of the above.
22.3 (3 points) The report lag for claims is assumed to be exponentially distributed: F(x) = 1 - e^-λx, where x is the delay in reporting. You have the following data on the cumulative number of claims reported for a given accident month, but you are yet to have any information on any claims that may take more than 3 months to be reported.
Delay    Cumulative Number of Claims Reported
1        7
2        12
3        16
In order to estimate λ by the method of maximum likelihood, which of the following functions should be maximized? Treat the given data as if it is ungrouped.
A. ln(λ) - 29λ - ln(1 - e^-3λ)
B. 16 ln(λ) - 29λ - 16 ln(1 - e^-3λ)
C. ln(λ) - 35λ - ln(1 - e^-3λ)
D. 16 ln(λ) - 35λ - 16 ln(1 - e^-3λ)
E. 16 ln[λ/(1 - e^-3λ)]

22.4 (3 points) You observe 3 claims from three policies with different deductible amounts and coinsurance factors:
Deductible Amount    Coinsurance Factor    Amount Paid by The Insurer
1000                 80%                   600
2000                 none                  2,500
500                  90%                   5,220
You assume that in the absence of any deductible and coinsurance factor the losses would follow a Distribution Function F(x), with probability density function f(x). Let S(x) = 1 - F(x). The parameters of this distribution function are fit to this data via the method of maximum likelihood.
Which of the following functions should you maximize?
A. f(1600) f(4500) f(5720) / {S(500) S(1000) S(2000)}
B. f(1600) f(4500) f(5720) / {S(450) S(800) S(2000)}
C. f(1750) f(4500) f(6300) / {S(500) S(1000) S(2000)}
D. f(1750) f(4500) f(6300) / {S(450) S(800) S(2000)}
E. None of the above.
Use the following information on the settlement activity for a book of claims for the next two questions:

Number of Claims Settled
Year Reported    Settled in Year 1    2      3
1                Unknown              30     4
2                                     203    27
3                                            212
22.5 (3 points) Let t be the time lag in settling a claim. Assume that t follows a Poisson Distribution. Determine the maximum likelihood estimate of the parameter λ. A. less than 0.11 B. at least 0.11 but less than 0.12 C. at least 0.12 but less than 0.13 D. at least 0.13 but less than 0.14 E. at least 0.14 22.6 (3 points) Let t be the time lag in settling a claim. Assume that t has the probability density function f(t) = (1-p)pt, t = 0, 1, 2, ... Determine the maximum likelihood estimate of the parameter p. A. less than 0.11 B. at least 0.11 but less than 0.12 C. at least 0.12 but less than 0.13 D. at least 0.13 but less than 0.14 E. at least 0.14 22.7 (3 points) From a policy with a deductible of 5000, you observe the following 5 payments: 1000, 3000, 6000, 15,000, and 40,000. You assume that in the absence of any deductibles the losses would follow a Distribution Function F(x), with probability density function f(x). S(x) = 1 - F(x). The parameters of this distribution function are fit to this data via the method of maximum likelihood. Which of the following functions should you maximize? A. f(1000) f(3000) f(6000) f(15000) f(40000) B. f(1000) f(3000) f(6000) f(15000) f(40000) / S(5000)5 C. f(6000) f(8000) f(11000) f(20000) f(45000) D. f(6000) f(8000) f(11000) f(20000) f(45000) / F(5000)5 E. None of the above.
22.8 (3 points) You study the time between accidents and reports of claims. The study was terminated at time 3.
Time of Accident    Time of Claim Report    Number of Reported Claims
0                   t ≤ 1                   18
0                   1 < t ≤ 2               13
0                   2 < t ≤ 3               9
1                   t ≤ 2                   14
1                   2 < t ≤ 3               10
2                   t ≤ 3                   11
Let F(t) denote the distribution function of the time lag from accident to claim report. Which of the following is the likelihood function?
(A) F(1)^32 {F(2) - F(1)}^23 {F(3) - F(2)}^9 / {F(2)^24 F(3)^40}
(B) F(1)^43 {F(2) - F(1)}^23 {F(3) - F(2)}^9 / {F(2)^24 F(3)^40}
(C) F(1)^32 {F(2) - F(1)}^23 {F(3) - F(2)}^9 / {F(2)^32 F(3)^55}
(D) F(1)^43 {F(2) - F(1)}^23 {F(3) - F(2)}^9 / {F(2)^32 F(3)^55}
(E) None of A, B, C, or D.

22.9 (3 points) The size of losses prior to the impact of any deductible is believed to follow a Weibull Distribution. The following data has been truncated from below at 5.
Interval    Number of Losses
(5, 10]     60
(10, 50]    25
(50, ∞)     15
Using the method of maximum likelihood, which of the following functions would one maximize?
A. {exp[-(5/θ)^τ] - exp[-(10/θ)^τ]}^60 {exp[-(10/θ)^τ] - exp[-(50/θ)^τ]}^25 exp[-(50/θ)^τ]^15 / exp[-(5/θ)^τ]^40
B. {1 - exp[-(10/θ)^τ]}^60 {exp[-(10/θ)^τ] - exp[-(50/θ)^τ]}^25 exp[-(50/θ)^τ]^15 / exp[-(5/θ)^τ]^40
C. {exp[-(5/θ)^τ] - exp[-(10/θ)^τ]}^60 {exp[-(10/θ)^τ] - exp[-(50/θ)^τ]}^25 exp[-(50/θ)^τ]^15 / exp[-(5/θ)^τ]^100
D. {1 - exp[-(10/θ)^τ]}^60 {exp[-(10/θ)^τ] - exp[-(50/θ)^τ]}^25 exp[-(50/θ)^τ]^15 / exp[-(5/θ)^τ]^100
E. None of the above
22.10 (3 points) Size of loss is believed to follow a distribution function F. S = 1 - F. The number of losses in the middle interval is unknown because your coworker spilled coffee on the report:
Interval           Number of Losses
0 - 10,000         100
10,000 - 25,000    unknown
over 25,000        10
What is the likelihood function?
A. F(10,000)^100 S(25,000)^10
B. F(10,000)^100 S(25,000)^10 / {F(10,000) S(25,000)}^110
C. F(10,000)^100 S(25,000)^10 / {F(25,000) - F(10,000)}^110
D. F(10,000)^100 S(25,000)^10 / {F(10,000) + S(25,000)}^110
E. None of the above 22.11 (3 points) The number of accidents for each driver follows a Poisson distribution. You have the following information on 10,000 drivers who had accidents: Number of Accidents: 1 2 3 4 5 or more Number of Drivers: 9776 200 21 3 0 You have no information on how many drivers had no accidents. Estimate the parameter λ using the method of maximum likelihood. A. 0.04
B. 0.05
C. 0.06
D. 0.07
E. 0.08
22.12 (2 points) The ground-up losses are assumed to follow an Exponential Distribution with mean θ. Suppose a sample is truncated at x = 500, so that values below this amount are excluded. The sample of 1000 accidents is then observed to be: Interval Number of Accidents 500 - 1000 800 over 1000 200 Fit the parameter θ to this data using the method of maximum likelihood. A. less than 150 B. at least 150 but less than 200 C. at least 200 but less than 250 D. at least 250 but less than 300 E. at least 300
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 634 Use the following information for the next 2 questions: From a policy with a $1000 deductible, you observe 4 loss amounts (prior to the subtraction of the deductible): $1500, $3000, $5000, $15,000. 22.13 (1 point) You fit an Exponential Distribution to this data via the method of maximum likelihood. What is the fitted θ parameter? A. less than 5000 B. at least 5000 but less than 6000 C. at least 6000 but less than 7000 D. at least 7000 but less than 8000 E. at least 8000 22.14 (3 points) Draw the p-p plot comparing the Exponential fitted in the previous question to that data.
22.15 (3 points) You are given: (i) The number of claims follows a Poisson distribution with mean λ. (ii) Observations other than 0, 1, and 2 have been deleted from the data. (iii) The data contain an equal number of observations of 0, 1 and 2. Determine the maximum likelihood estimate of λ. (A) 1.1
(B) 1.2
(C) 1.3
(D) 1.4
(E) 1.5
22.16 (2 points) The ground-up losses are assumed to follow an Exponential Distribution with mean θ. From polices with a 500 franchise deductible, we have data that is left truncated at x = 500: Interval Number of Accidents 500 - 1000 100 over 1000 80 From polices with a 1000 franchise deductible, we have data that is left truncated at x = 1000: Interval Number of Accidents 1000 - 2000 200 over 2000 50 Fit the parameter θ to the combined data using the method of maximum likelihood. A. 620
B. 640
C. 660
D. 680
E. 700
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 635 Use the following information for the next 6 questions: The total compensation of the 100 best paid chief executives in the United States during 2004 was as follows in millions of dollars: 8.42, 8.43, 8.50, 8.66, 8.81, 8.95, 8.98, 9.17, 9.20, 9.33, 9.48, 9.68, 9.71, 9.75, 9.80, 9.90, 9.90, 9.96, 10.00, 10.00, 10.07, 10.08, 10.16, 10.28, 10.51, 10.56, 10.58, 10.64, 10.64, 10.67, 10.73, 10.78, 10.80, 10.80, 10.85, 10.93, 11.03, 11.06, 11.11, 11.19, 11.26, 11.49, 11.62, 11.80, 11.99, 12.11, 12.28, 12.35, 12.49, 12.49, 12.58, 12.68, 12.7, 12.84, 12.85, 12.95, 13.07, 13.27, 13.35, 13.63, 13.76, 13.77, 13.93, 14.00, 14.25, 15.00, 15.04, 15.21, 15.50, 15.78, 15.89, 15.99, 16.36, 16.34, 16.39, 16.82, 16.94, 17.02, 17.79, 18.77, 20.04, 20.40, 20.91, 20.98, 21.00, 21.33, 21.85, 22.47, 22.51, 23.77, 23.93, 24.00, 27.99, 28.92, 29.79, 30.96, 32.13, 32.72, 35.12, 38.66. Use a computer to help perform the calculations. 22.17 (4 points) Draw a graph of the mean residual life as a function of size. 22.18 (3 points) Based on this graph, pick a simple parametric model and fit it to this data via maximum likelihood. Note that this data has been truncated from below. Take this truncation point as 8.40. 22.19 (4 points) For the distribution fit in the previous question, draw a difference graph, D(x). 22.20 (2 points) Test the fit using the Kolmogorov-Smirnov Statistic. α
      0.20       0.10       0.05       0.025      0.01
c     1.07/√n    1.22/√n    1.36/√n    1.48/√n    1.63/√n
1.63/ n
22.21 (4 points) Add a parameter to the previously fit distribution and fit this more complicated distribution via maximum likelihood. 22.22 (2 points) Use the Likelihood Ratio test to compare the simpler and more complicated fits.
22.23 (4B, 11/94, Q.27) (1 point) You plan to use several steps to fit and test the accuracy of the fit of a loss distribution to ungrouped data that has been truncated from below. In what order would you use the following procedures in fitting and analyzing the fit of a distribution: 1. Method of moments estimation 2. Maximum likelihood estimation 3. Comparing the fitted and empirical limited expected value functions A. 1, 2, 3 B. 2, 1, 3 C. 3, 1, 2 D. 3, 2, 1 E. The order of use is irrelevant.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 636 22.24 (4B, 5/95, Q.15) (2 points) You are given the following: • Six observed losses have been recorded in thousands of dollars and are grouped as follows: Interval Number of Losses (0, 2) 2 [2, 5) 4 • There is no record of the number of losses at or above $5,000. • The random variable underlying the observed losses, in thousands, has the density function f(x) = λe-λx, x > 0, λ > 0. Which of the following functions must be maximized to find the maximum likelihood estimate of λ? A. (1 - e-2λ)2 (e-2λ-e-5λ)4 B. (1 - e-2λ)2 (e-2λ-e-5λ)4 (e-5λ)6 C. (1 - e-2λ)2 (e-2λ-e-5λ)4 (1-e-5λ)6 D. (1 - e-2λ)2 (e-2λ-e-5λ)4 (e-5λ)-6 E. (1 - e-2λ)2 (e-2λ-e-5λ)4 (1-e-5λ)-6 22.25 (4B, 5/96, Q.29) (3 points) You are given the following: • Forty (40) observed losses from a long-term disability policy with a one-year elimination period have been recorded and are grouped as follows: Years of Disability Number of Losses (1,2) 10 [2,∞) 30 • You wish to shift the observations by one year and fit them to a Pareto distribution with parameters θ (unknown) and α = 1. Determine the maximum likelihood estimator of θ. A. 1/3
B. 1/2
C. 1
D. 2
E. 3
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 637 22.26 (4B, 11/96, Q.19) (2 points) You are given the following: • Three losses have been recorded as follows: 200, 300, 500.
• •
Losses below 100 have not been recorded. The random variable X underlying the losses has the density function f(x;θ) and the cumulative distribution function F(x;θ).
Which of the following functions must be maximized to find the maximum likelihood estimate of θ? A. f(200;θ) f(300;θ) f(500;θ) D.
f(200;θ) f(300;θ) f(500;θ) F(100;θ)3
B.
f(200;θ) f(300;θ) f(500;θ) f(100;θ)3
E.
f(200;θ) f(300;θ) f(500;θ) {1 - F(100;θ)}3
C.
f(200;θ) f(300;θ) f(500;θ) {1 - f(100;θ)}3
22.27 (4B, 5/99, Q.25) (2 points) You are given the following: • Losses have been truncated from below at k1 and truncated from above at k2 . •
The random variable X underlying the losses has the density function f(x; θ) and the cumulative distribution function F(x; θ).
Which of the following is the form of the function that must be maximized to find the maximum likelihood estimate of θ? A. Π f(xi ; θ)
D.
∏ f(xi ; θ) F(k2; θ) {1 - F(k1 ; θ)}
B.
E.
∏ f(xi ; θ) F(k2; θ) 1 - F(k1 ; θ)
C.
∏ f(xi ; θ) {1
- F(k2 ; θ)}
1 - F(k1; θ)
∏ f(x i ; θ) F(k2 ; θ) - F(k1; θ)
22.28 (4B, 11/99 Q.6) (2 points) You are given the following: • The random variable X underlying the losses has the density function f(x; θ) and the cumulative distribution function F(x;θ). • All loss amounts for losses below k1 and above k2 are available. However, information on the number of losses between k1 , and k2 and the amounts of these losses has been misplaced. What is the likelihood function of θ? A. Π f(xi ; θ)
D.
∏ f(x i ; θ) F(k2 ; θ) - F(k1; θ)
B.
E.
∏ f(xi ; θ) F(k1; θ) {1 - F(k2; θ)}
∏ f(xi ; θ) 1 + F(k1 ; θ) - F(k2; θ)
C.
∏ f(xi ; θ) F(k2; θ) {1 - F(k1 ; θ)}
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 638 22.29 (4, 5/01, Q.34) (2.5 points) You are given the following claims settlement activity for a book of automobile claims as of the end of 1999: Number of Claims Settled Year Year Settled Reported 1997 1998 1997 Unknown 3 1998 5 1999
1999 1 2 4
L = (Year Settled – Year Reported) is a random variable describing the time lag in settling a claim. The probability function of L is fL (l) = (1-p)pl, for l = 0, 1, 2, ... Determine the maximum likelihood estimate of the parameter p. (A) 3/11 (B) 7/22 (C) 1/3 (D) 3/8 (E) 7/15 22.30 (4, 11/01, Q.10 & 2009 Sample Q.61) (2.5 points) You observe the following five ground-up claims from a data set that is truncated from below at 100: 125 150 165 175 250 You fit a ground-up exponential distribution using maximum likelihood estimation. Determine the mean of the fitted distribution. (A) 73 (B) 100 (C) 125 (D) 156 (E) 173 22.31 (4, 11/04, Q.26 & 2009 Sample Q.152) (2.5 points) You are given: (i) A sample of losses is: 600 700 900 (ii) No information is available about losses of 500 or less. (iii) Losses are assumed to follow an exponential distribution with mean θ. Determine the maximum likelihood estimate of θ. (A) 233
(B) 400
(C) 500
(D) 733
(E) 1233
22.32 (4, 11/04, Q.32 & 2009 Sample Q.156) (2.5 points) You are given: (i) The number of claims follows a Poisson distribution with mean λ. (ii) Observations other than 0 and 1 have been deleted from the data. (iii) The data contain an equal number of observations of 0 and 1. Determine the maximum likelihood estimate of λ. (A) 0.50
(B) 0.75
(C) 1.00
(D) 1.25
(E) 1.50
22.33 (2 points) In the previous question, change bullet iii to read instead: (iii) The data contains twice as many observations of 0 as of 1. Determine the maximum likelihood estimate of λ. (A) 0.50
(B) 0.75
(C) 1.00
(D) 1.25
(E) 1.50
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 639 22.34 (4, 5/07, Q.31) (2.5 points) You are given: (i) An insurance company records the following ground-up loss amounts, which are generated by a policy with a deductible of 100: 120 180 200 270 300 1000 2500 (ii) Losses less than 100 are not reported to the company. (iii) Losses are modeled using a Pareto distribution with parameters θ = 400 and α. Use the maximum likelihood estimate of α to estimate the expected loss with no deductible. (A) Less than 500 (B) At least 500, but less than 1000 (C) At least 1000, but less than 1500 (D) At least 1500, but less than 2000 (E) At least 2000 22.35 (CAS 3L, 5/12, Q.19) (2.5 points) You are given the following information:
• The number of claims follows a Poisson distribution with mean λ. • Observations other than 1 and 2 have been deleted from the data. • In the remaining data set, 75% of the observations are of 1 and 25% are of 2. • Calculate the maximum likelihood estimate of λ. A. Less than 0.15 B. At least 0.15, but less than 0.30 C. At least 0.30, but less than 0.45 D. At least 0.45, but less than 0.60 E. At least 0.60
Solutions to Problems:

22.1. B. The distribution function truncated below at 500 is:
G(x) = {F(x) - F(500)} / {1 - F(500)} = [{200000/(200000 + 500^2)}^α − {200000/(200000 + x^2)}^α] / {200000/(200000 + 500^2)}^α = 1 - {450000/(200000 + x^2)}^α.
Thus G(500) = 0, G(1000) = 1 - {450000/(200000 + 1000^2)}^α = 1 - 0.375^α, and G(5000) = 1 - {450000/(200000 + 5000^2)}^α = 1 - 0.01786^α.
The contribution to the likelihood function from each interval is the difference in distribution functions at the top and bottom of each interval to the power of the number of losses:
{1 - 0.375^α}^800 {0.375^α - 0.01786^α}^190 {0.01786^α}^10.
Comment: The distribution is a Burr with θ = √200,000 and γ = 2.
22.2. D. One needs to compute the likelihood for each claim and then multiply the terms together. Since the first policy has a deductible, we need to modify the likelihood for truncation and shifting from below. With a $1000 deductible, the density has to be divided by S(1000). Thus the likelihood for the first claim is: f(2500)/S(1000). Note that we use the density at 2500 in the numerator, since the loss in the absence of a deductible would be 1000 + 1500 = 2500. For the second claim of 18,000, there is no deductible, so that the likelihood is f(18,000). The likelihood for the third claim with a deductible of 5000 is: f(36000)/S(5000). Deductible Amount Paid by Size of Amount The Insurer Loss Likelihood 1000 1,500 2,500 f(2500)/S(1000) 0 18,000 18,000 f(18000) 5000 31,000 36,000 f(36000)/S(5000) Multiplying the individual likelihoods together: f(2500) f(18000) f(36000)/ {S(1000) S(5000)}. Comment: One could write the likelihood of the second claim similarly to the other two as: f(18000) / S(0); remember that S(0) = 1 - F(0) = 1 - 0 = 1.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 641 22.3. B. The Distribution Function of the data truncated from above at 3 is G(x) = F(x)/F(3) = (1 - e-λx) / (1 - e-3λ). The density function is g(x) = Gʼ(x) = λe-λx / (1 - e-3λ). The log density is: ln g(x) = ln(λ) - λx - ln(1 - e-3λ). One wishes to maximize the loglikelihood. Σ ln g(xi) = n ln(λ) - λΣxi - n ln(1 - e-3λ). In this case we have a total of 16 data points, so n =16. There are 7 claims with delay 1, 5 claims with delay 2, and 4 claims with delay 3. Thus Σxi = (7)(1) + (5)(2) + (3)(4) = 29. Thus the sum of the loglikelihoods is: 16 ln(λ) - 29λ - 16 ln(1 - e-3λ). Comment: See “Estimation of the Distribution of Report Lags by the Method of Maximum Likelihood” by Edward Weisner, PCAS 1978. In the solution it has been assumed the data is ungrouped. If in fact the delays are assumed to be grouped into intervals (0,1], (1,2], and (2,3], one gets a different likelihood function. 22.4. C. Since the coinsurance factor is applied last, we can remove its effects and convert to a question involving only deductibles. 600/0.8 = 750. 5220/0.9 = 5800. Thus the three payments prior to the effect of the coinsurance factors are: 750, 2500, and 5800. Compute the likelihood for each claim: Deductible Amount Paid by Insurer Size of Amount Prior to Coinsurance Loss Likelihood 1000 750 1750 f(1750)/S(1000) 2000 2500 4500 f(4500)/S(2000) 500 5800 6300 f(6300)/S(500) Multiplying the individual likelihoods together one gets: f(1750) f(4500) f(6300) / {S(500) S(1000) S(2000)}.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 642 22.5. E. The loglikelihood is the sum of the loglikelihoods from each year. For report year 1, with data for only lags 1 and 2, g(1) = f(1)/{f(1) + f(2)} = λe −λ/{λe−λ + λ 2e−λ/2} = 2/(2+λ), and g(2) = f(2)/{f(1) + f(2)} = λ/(2+λ). The contribution to the loglikelihood from report year 1 is: 30 ln(g(1)) + 4 ln(g(2)) = 30 ln(2/(2+λ)) + 4 ln(λ/(2+λ)) = 30 ln(2) + 4 ln(λ) - 34 ln(2+λ). For report year 2, with data for only lags 0 and 1, g(0) = f(0)/{f(0) + f(1)} = e−λ/{e−λ + λe −λ} = 1/(1+λ), and g(1) = f(1)/{f(0) + f(1)} = λ/(1+λ). The contribution to the loglikelihood is: 203 ln(g(0)) + 27 ln(g(1)) = 203 ln(1/(1+λ)) + 27 ln(λ/(1+λ)) = 27 ln(λ) - 230 ln(1+λ). For report year 3, the loglikelihood is: 212 ln (g(0)) = 212 ln(1) = 0. The total loglikelihood is: 30 ln(2) + 31 ln(λ) - 34ln(2+λ) - 230 ln(1+λ). Setting the derivative of the loglikelihood with respect to λ equal to zero: 31/λ - 34/(2+λ) - 230/(1+λ) = 0. 233λ2 + 401λ - 62 = 0. λ = 0.143. 22.6. D. The loglikelihood is the sum of the loglikelihoods from each year. For report year 1, with data for only lags 1 and 2, g(1) = f(1)/{f(1) + f(2)} = 1/(1+p), and g(2) = f(2)/{f(1) + f(2)} = p/(1+p). The contribution to the loglikelihood is: 30 ln(g(1)) + 4 ln(g(2)) = 30 ln(1/(1+p)) + 4 ln(p/(1+p)) = 4 ln(p) - 34 ln(1+p). For report year 2, with data for only lags 0 and 1, the loglikelihood is: 203 ln(g(0)) + 27 ln(g(1)) = 203 ln(1/(1+p)) + 27 ln(p/(1+p)) = 27 ln(p) - 230 ln(1+p). For report year 3, the loglikelihood is: 212 ln (g(0)) = 212 ln(1) = 0. The total loglikelihood is: 31 ln(p) - 264 ln(1+p). Setting the derivative of the loglikelihood with respect to λ equal to zero: 31/p - 264/(1+p) = 0. p = 31/(264-31) = 31/233 = 0.133. Comment: Similar to 4, 5/01, Q.34. 22.7. E.
Deductible    Amount Paid by    Size of    Likelihood
Amount        The Insurer       Loss
5000          1000              6000       f(6000)/S(5000)
5000          3000              8000       f(8000)/S(5000)
5000          6000              11000      f(11000)/S(5000)
5000          15000             20000      f(20000)/S(5000)
5000          40000             45000      f(45000)/S(5000)
Multiplying the individual likelihoods together one gets: f(6000) f(8000) f(11000) f(20000) f(45000)/S(5000)^5.
22.8. A. This data is truncated from above. For example, any accidents at time 0 that took more than 3 years to be reported did not make it into our data base. Data for accidents at time 0 is truncated from above at 3. G(x) = F(x)/F(3).
Contribution to the likelihood from the 18 claims reported by time 1 is: G(1)^18 = F(1)^18/F(3)^18.
Contribution from the 13 claims reported between time 1 and 2 is: {G(2) - G(1)}^13 = (F(2) - F(1))^13/F(3)^13.

Time of Acc.   Time of Report   x        Trunc. Point   Reported Claims   Contribution
0              t ≤ 1            (0, 1]   3              18                F(1)^18/F(3)^18
0              1 < t ≤ 2        (1, 2]   3              13                (F(2) - F(1))^13/F(3)^13
0              2 < t ≤ 3        (2, 3]   3              9                 (F(3) - F(2))^9/F(3)^9
1              t ≤ 2            (0, 1]   2              14                F(1)^14/F(2)^14
1              2 < t ≤ 3        (1, 2]   2              10                (F(2) - F(1))^10/F(2)^10
2              t ≤ 3            (0, 1]   1              11                F(1)^11/F(1)^11 = 1

Likelihood = product of the contributions =
F(1)^32 (F(2) - F(1))^23 (F(3) - F(2))^9 / {F(2)^24 F(3)^40}.
22.9. C. The survival function truncated below at 5 is: 1 - G(x) = S(x)/S(5) = exp[-(x/θ)^τ]/exp[-(5/θ)^τ].
Thus G(5) = 0, G(10) = 1 - exp[-(10/θ)^τ]/exp[-(5/θ)^τ], and G(50) = 1 - exp[-(50/θ)^τ]/exp[-(5/θ)^τ].
The contribution to the likelihood function from each interval is the difference in distribution functions at the top and bottom of each interval to the power of the number of losses:
{(exp[-(5/θ)^τ] - exp[-(10/θ)^τ])/exp[-(5/θ)^τ]}^60 {(exp[-(10/θ)^τ] - exp[-(50/θ)^τ])/exp[-(5/θ)^τ]}^25 {exp[-(50/θ)^τ]/exp[-(5/θ)^τ]}^15
= {exp[-(5/θ)^τ] - exp[-(10/θ)^τ]}^60 {exp[-(10/θ)^τ] - exp[-(50/θ)^τ]}^25 exp[-(50/θ)^τ]^15 / exp[-(5/θ)^τ]^100.
22.10. D. One has to rescale f in order to take into account the missing data.
g(x) = f(x)/{1 - (F(25000) - F(10000))} = f(x)/{1 + F(10000) - F(25000)}, x ≤ 10000 or x ≥ 25000.
G(10000) = F(10000)/{1 + F(10000) - F(25000)}.
Integrating from 25000 to ∞, 1 - G(25000) = (1 - F(25000))/{1 + F(10000) - F(25000)}.
Likelihood = G(10000)^100 (1 - G(25000))^10 = F(10000)^100 (1 - F(25000))^10 / {1 + F(10000) - F(25000)}^110.
Comment: Similar to 4B, 11/99, Q.6. g integrates to unity over its support.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 644 22.11. B. The data is truncated from below. g(x) = f(x)/{1 - f(0)} = (e−λ λx / x! )/ (1 - e−λ). ln g(x) = -λ + xlnλ - ln(x!) - ln(1 - e−λ).
∂ Σ ln g(xi) / ∂λ = Σ {-1 + xi/λ − e−λ/(1 - e−λ)} = -N(1 + e−λ/(1 - e−λ)) + Σ xi/λ. Setting this partial derivative equal to zero: λ/(1 - e−λ) = Σ xi / N = (9776 + (2)(200) + (3)(21) + (4)(3))/10000 = 1.0251. Try the given choices of lambda on the left hand side of the equation and λ = 0.05. Alternately, e−λ = 1 - λ + λ2/2 - ... ⇒ λ/(1 - e−λ) ≅ λ/(λ - λ2/2) = 2/(2 - λ). Set 2/(2 - λ) = 1.0251. ⇒ λ ≅ 0.050. 22.12. E. The first interval contributes to the loglikelihood: 800 ln[{F(1000) - F(500)}/S(500)] = 800 ln[(e-500/θ - e-1000/θ)/e-500/θ] = 800 ln[(1 - e-500/θ)]. The second interval contributes to the loglikelihood: 200 ln[S(1000)/S(500)] = 200 ln[e-1000/θ/e-500/θ] = 200 ln[e-500/θ]. Let y = e-500/θ. Then the loglikelihood is: 800ln(1 - y) + 200lny. Setting the derivative equal to zero: 800/(1 - y) = 200/y. ⇒ 800y = 200 - 200y. ⇒ y = .20. ⇒ .20 = e-500/θ. ⇒ θ = 310.7. 22.13. B. If the ground-up losses follow an Exponential, then the non-zero payments excess of a deductible follow the same Exponential. For ungrouped data, the method of maximum likelihood is the same as the method of moments for an Exponential. θ = average of the payments = (500 + 2000 + 4000 + 14,000)/4 = 5125.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 645 22.14. We need to either compare the truncated and shifted data to the Exponential distribution altered for the effect of truncation and shifting, or compare the truncated data to the Exponential distribution altered for the effect of truncation. For the fitted Exponential with θ = 5125, F(500) = 1 - e-500/5125 = 0.0930. F(2000) = 1 - e-2000/5125 = 0.3231. F(4000) = 1 - e-4000/5125 = 0.5418. F(14000) = 1 - e-14000/5125 = 0.9349. Alternately, the Exponential with θ = 5125 adjusted for truncation is: G(x) = {F(x) - F(1000)}/S(1000) = (e-1000/5125 - e-x/5125)/e-1000/5125 = 1 - e-(x-1000)/5125. G(1500) = 1 - e-500/5125 = 0.0930. G(3000) = 1 - e-2000/5125 = 0.3231. G(5000) = 1 - e-4000/5125 = 0.5418. G(15000) = 1 - e-14000/5125 = 0.9349. The horizontal coordinates are j/(n+1), where n = 4 the number of points in the data. Thus the plotted points are: (1/5, 0.0930), (2/5, 0.3231), (3/5, 0.5418), and (4/5, 0.9349). Here is the p-p plot:
Comment: Similar to Example 16.3 in Loss Models.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 646 22.15. D. The data has been truncated from above. Truncated density at 0 is: f(0)/F(2) = e−λ/(e−λ + λe−λ + λ2e−λ/2) = 2/(2 + 2λ + λ2). Truncated density at 1 is: f(1)/F(2) = λe−λ/(e−λ + λe−λ + λ2e−λ/2) = 2λ/(2 + 2λ + λ2). Truncated density at 2 is: f(2)/F(2) = (λ2e−λ/2)/(e−λ + λe−λ + λ2e−λ/2) = λ2/(2 + 2λ + λ2). Let n/3 = number of zeros observed = number of ones observed = number of twos observed. Then the loglikelihood is: (n/3){ln(2) - ln(2 + 2λ + λ2)} + (n/3){ln(2λ) - ln(2 + 2λ + λ2)} + (n/3){2ln(λ) - ln(2 + 2λ + λ2)} = (2n/3)ln(2) + n ln(λ) - n ln(2 + 2λ + λ2). Setting the derivative with respect to λ equal to zero: 0 = n/λ - n(2 + 2λ)/(2 + 2λ + λ2).
⇒ 2 + 2λ + λ2 = λ(2 + 2λ). ⇒ λ2 = 2. ⇒ λ = 2 = 1.414. Comment: Similar to 4, 11/04, Q.32. 22.16. A. From a 500 franchise deductible, we have data that is left truncated at x = 500. The first interval contributes to the loglikelihood: 100 ln[{F(1000) - F(500)}/S(500)] = 100 ln[(e-500/θ - e-1000/θ)/e-500/θ] = 100 ln[(1 - e-500/θ)]. The second interval contributes to the loglikelihood: 80 ln[S(1000)/S(500)] = 80 ln[e-1000/θ/e-500/θ] = 80 ln[e-500/θ]. From a 10000 franchise deductible, we have data that is left truncated at x = 1000. The first interval contributes to the loglikelihood: 200 ln[{F(2000) - F(1000)}/S(1000)] = 100 ln[(e-1000/θ - e-2000/θ)/e-1000/θ] = 200 ln[(1 - e-1000/θ)]. The second interval contributes to the loglikelihood: 50 ln[S(2000)/S(1000)] = 50 ln[e-2000/θ/e-1000/θ] = 50 ln[e-1000/θ]. Let y = e-500/θ. Then the loglikelihood is: 100ln(1 - y) + 80lny + 200ln(1 - y2 ) + 50lny2 = 100ln(1 - y) + 80lny + 200ln(1 - y) + 200ln(1 + y) + 100lny = 180lny + 300ln(1 - y) + 200ln(1 + y). Setting the derivative equal to zero: 180/y - 300/(1 - y) + 200/(1 + y) = 0.
⇒ 180(1 - y2 ) - 300y(1 + y) + 200y(1 - y) = 0. ⇒ 680y2 + 100y - 180 = 0. ⇒ 34y2 + 5y - 9 = 0. ⇒ y = {-5 + 25 - (4)(34)(-9)}/68 = 0.4462. ⇒ e-500/θ = 0.4462. ⇒ θ = 620.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 647 22.17. For example, the mean residual life at 30 is: (.96 + 2.13 + 2.72 + 5.12 + 8.66)/5 = 3.918. MRL 8 7 6
5 4
10
15
20
25
30
x
Comment: At values above the truncation point, the mean residual life is unaffected by truncation from below. 22.18. The mean residual life appears roughly constant or perhaps decreasing somewhat. Therefore, the righthand tail is relatively light. For example, take an Exponential Distribution. For an Exponential, truncation and shifting from below gives the same Exponential. Thus we can fit to the data truncated and shifted from below at 8.4. Then the method of moments is the same as maximum likelihood. Therefore, θ^ = average of data - 8.40 = 14.84 - 8.40 = 6.44. If the ground up losses follow this Exponential, then the data truncated from below at 8.4 follows: F(x) = 1 - e-x/6.44/e-8.40/6.44 = 1 - e-(x - 8.4)/6.44, a shifted Exponential. Comment: The results of fitting a Weibull or Gamma will be discussed subsequently.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 648 22.19. D(x) = Empirical Distribution at x - Fitted Distribution at x. D(x) 0.06 0.04 0.02
10
15
20
25
30
35
40
x
- 0.02 - 0.04 - 0.06 For example at 30, the empirical distribution is 95/100 = 0.9500, and the fitted distribution is: 1 - e-(30 - 8.4)/6.44 = 0.9651. D(30) = .9500 - .9651 = -0.0151. 22.20. The Kolmogorov-Smirnov Statistic is the greatest distance of the difference graph from the x-axis, which is about 0.070. More precisely, just before 9.80 the empirical distribution function is 0.11 and the fitted function is: 1 e-(9.8 - 8.4)/6.44 = 0.1803. The K-S statistic is |.11 - .1803| = 0.0703. For 100 points, the 20% critical value is 1.07/ 100 = 0.107. Since 0.070 < 0.107, at 20% we do not reject the fit of the shifted Exponential. 22.21. For example, one could add τ and get a Weibull Distribution. Remembering that the data is truncated from below, via numerical methods using the Exponential as a starting point, the maximum likelihood Weibull has θ = 5.83 and τ = 0.947. Alternately, one could add α and get a Gamma Distribution. Remembering that the data is truncated from below, via numerical methods using the Exponential as a starting point, the maximum likelihood Gamma has α = 0.777 and θ = 6.99.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 649 22.22. The shifted Exponential has a loglikelihood of -286.242. The shifted Weibull has a loglikelihood of -286.224. The shifted Gamma has a loglikelihood of -286.221. Comparing the Exponential and the Weibull, the statistic is: 2{-286.224 - (-286.242)} = 0.036. For 1 degree of freedom, the 10% critical value for the Chi-Square is 2.706. Since 0.036 < 2.706, at 10% we do not reject the simpler Exponential in favor of the more complicated Weibull. Comparing the Exponential and the Gamma, the statistic is: 2{-286.221 - (-286.242)} = 0.042. For 1 degree of freedom, the 10% critical value for the Chi-Square is 2.706. Since 0.042 < 2.706, at 10% we do not reject the simpler Exponential in favor of the more complicated Gamma. 22.23. A. The concept being tested is that the Method of Moments is often used first to get a preliminary estimate of parameters since it is usually faster and easier than the method of maximum likelihood. This estimate in turn can be used as the starting point for a numerical algorithm to determine those parameters that have the maximum likelihood. Finally one can test the fit(s) by comparing the fitted and empirical statistics such as the Limited Expected Value. 22.24. E. The data has been truncated from above at $5000. The distribution function is: F(x) = 1 - e-λx, x > 0. For the data truncated from above, the distribution is: G(x) = F(x) / F(5) = (1 - e-λx)/(1 - e-5λ), 0 < x < 5. The likelihood of a claim in each interval is given by the difference of the distribution function H at the two endpoints. For the first interval (0,2) that is: (1 - e-2λ)/(1 - e-5λ) - 0. For the second interval [2,5) that is: 1 - (1 - e-2λ)/(1 - e-5λ) = (e-2λ - e-5λ)/(1 - e-5λ). The likelihood function is the product of the likelihoods for the intervals, each taken to the power of the number of claims observed in that interval: {(1 - e-2λ)/(1 - e-5λ)}2 {(e-2λ - e-5λ)/(1 - e-5λ)}4 = (1 - e-2λ ) 2 ( e-2λ - e-5λ ) 4 (1 - e-5λ ) - 6. Comment: One could maximize the sum of the loglikelihoods instead of the likelihoods, but that is not one of the choices given here.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 650
22.25. E. Let x = years of disability minus one year, then x follows a Pareto with α = 1:
F(x) = 1 - (θ/(θ+x))^α = 1 - θ/(θ+x) = x/(θ+x).
Take the difference in distribution functions at the top and bottom of each interval to the power of the number of claims in that interval:
Bottom of Interval   Top of Interval   Number of Claims   F(Bottom of Interval)   F(Top of Interval)   Difference of Distrib.   Contribution to Likelihood
0                    1                 10                 0                       1/(1+θ)              1/(1+θ)                  {1/(1+θ)}^10
1                    ∞                 30                 1/(1+θ)                 1                    θ/(1+θ)                  {θ/(1+θ)}^30
The likelihood is therefore the product of the contributions from each interval: {1/(1+θ)}^10 {θ/(1+θ)}^30.
The loglikelihood is therefore: 30 ln(θ) - 40 ln(1+θ).
To maximize the likelihood set the (partial) derivative of the loglikelihood with respect to θ equal to zero: 0 = 30/θ - 40/(1+θ). Solving, θ = 3.
Comment: The assumption here is that the truncated and shifted data follow a Pareto, rather than that the original ground up losses follow a Pareto.
22.26. E. Data truncated from below at 100 has a density function of g(x) = f(x) / (1 - F(100)).
The likelihood of the observation is: g(200)g(300)g(400) = f(200)f(300)f(500)/[1 - F(100)]^3.
Comment: Note that we are assuming that the ground-up losses follow f(x;θ).
22.27. E. The density function for the data truncated has to be adjusted for the probability missing from the data. The remaining probability is F(k2; θ) - F(k1; θ). The density after truncation is thus f(x; θ) / {F(k2; θ) - F(k1; θ)}. The likelihood to be maximized is the product of these densities at the observed points:
∏ [f(xi; θ) / {F(k2; θ) - F(k1; θ)}].
22.28. E. If we eliminate data from the data base, in a manner similar to truncation, then the densities need to be divided by the remaining probability, so that the altered densities will integrate to unity. In this case we have eliminated the data on medium size losses, which have probability: F(k2) - F(k1). Thus the remaining probability is only: 1 - {F(k2) - F(k1)} = 1 + F(k1) - F(k2). Therefore, the density corresponding to the altered data is: f(x) / {1 + F(k1) - F(k2)}, for x < k1 or x > k2. The likelihood is then the product of this density at the observed points:
∏ [f(xi; θ) / {1 + F(k1; θ) - F(k2; θ)}].
Comment: This is a very unusual situation, the reverse of a combination of truncation from below and truncation from above; we do not have data on the medium size losses.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 651
22.29. D. The loglikelihood is the sum of the loglikelihoods from each year.
For report year 1997, with data for only lags 1 and 2, g(n) = f(n)/(f(1) + f(2)).
g(1) = f(1)/(f(1) + f(2)) = (1-p)p/{(1-p)p + (1-p)p^2} = 1/(1+p).
g(2) = f(2)/(f(1) + f(2)) = (1-p)p^2/{(1-p)p + (1-p)p^2} = p/(1+p).
Therefore the loglikelihood is: 3 ln(g(1)) + 1 ln(g(2)) = 3 ln(1/(1+p)) + ln(p/(1+p)) = ln(p) - 4 ln(1+p).
For report year 1998, with data for only lags 0 and 1, g(n) = f(n)/(f(0) + f(1)).
g(0) = f(0)/(f(0) + f(1)) = (1-p)/{(1-p) + (1-p)p} = 1/(1+p).
g(1) = f(1)/(f(0) + f(1)) = (1-p)p/{(1-p) + (1-p)p} = p/(1+p).
Therefore the loglikelihood is: 5 ln(g(0)) + 2 ln(g(1)) = 5 ln(1/(1+p)) + 2 ln(p/(1+p)) = 2 ln(p) - 7 ln(1+p).
For report year 1999, the loglikelihood is: 4 ln(g(0)) = 4 ln(1) = 0.
Thus the total loglikelihood is: 3 ln(p) - 11 ln(1+p). Setting the derivative of the loglikelihood with respect to p equal to zero: 3/p - 11/(1+p) = 0. p = 3/(11-3) = 3/8.
Comment: This is a Geometric Distribution with p = β/(1+β).
[Graph of the loglikelihood as a function of p, for p from 0 to 1; the loglikelihood runs from about -8.5 up to about -6.5 and is maximized at p = 3/8.]
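As a quick numerical check of Solution 22.29, one can evaluate the total loglikelihood 3 ln(p) - 11 ln(1+p) on a grid; a minimal sketch:

```python
import math

def loglik(p):
    """Total loglikelihood from Solution 22.29: 3 ln(p) - 11 ln(1+p), for 0 < p < 1."""
    return 3.0 * math.log(p) - 11.0 * math.log(1.0 + p)

# crude grid search; the maximum is at p = 3/8 = 0.375
best = max((loglik(i / 1000.0), i / 1000.0) for i in range(1, 1000))
print(best)   # approximately (-6.45, 0.375)
```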
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 652
22.30. A. The data would be truncated and shifted from below if we subtract the deductible of 100 from each of the losses: 25, 50, 65, 75, 150. If the losses prior to the impact of a deductible follow an Exponential, then the data truncated and shifted follows the same Exponential, due to the memoryless property of the Exponential. Therefore, fitting by the method of moments, which in this case is the same as the method of maximum likelihood: θ = (25 + 50 + 65 + 75 + 150)/5 = 73.
Alternately, after truncation from below, g(x) = f(x)/S(100) = (e^(-x/θ)/θ)/e^(-100/θ) = e^(-(x-100)/θ)/θ, x > 100. ln(g(x)) = -(x-100)/θ - ln(θ).
The loglikelihood = -(125-100)/θ - ln(θ) - (150-100)/θ - ln(θ) - (165-100)/θ - ln(θ) - (175-100)/θ - ln(θ) - (250-100)/θ - ln(θ) = -365/θ - 5 ln(θ).
Setting the partial derivative with respect to θ equal to zero: 365/θ^2 - 5/θ = 0. θ = 73.
22.31. A. The data would be truncated and shifted from below if we subtract the truncation point of 500 from each of the losses: 600 - 500 = 100, 700 - 500 = 200, 900 - 500 = 400. If the ground up losses follow an Exponential, then the data truncated and shifted follows the same Exponential, due to the memoryless property of the Exponential. Therefore, fitting by the method of moments, which in this case is the same as the method of maximum likelihood: θ = (100 + 200 + 400)/3 = 233.3.
Alternately, after truncation from below, g(x) = f(x)/S(500) = (e^(-x/θ)/θ)/e^(-500/θ) = e^(-(x-500)/θ)/θ, x > 500. ln(g(x)) = -(x-500)/θ - ln(θ).
The loglikelihood = -(600-500)/θ - ln(θ) - (700-500)/θ - ln(θ) - (900-500)/θ - ln(θ) = -700/θ - 3 ln(θ).
Setting the partial derivative with respect to θ equal to zero: 700/θ^2 - 3/θ = 0. θ = 233.3.
Comment: This is mathematically the same as a deductible of 500, and we are given the sizes of loss rather than payment. For an Exponential Distribution, θ^ = (total payments) / (# of uncensored values) = (100 + 200 + 400)/3 = 233.3.
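The Exponential shortcut used in Solutions 22.30 and 22.31 (fit the truncated and shifted data by its sample mean) is easy to verify numerically; a minimal sketch, with the two small data sets taken from the solutions above:

```python
def exponential_theta_after_deductible(losses, d):
    """Maximum likelihood theta for an Exponential, given ground-up losses observed
    from a policy truncated from below at d: the average of (loss - d)."""
    payments = [x - d for x in losses]
    return sum(payments) / len(payments)

print(exponential_theta_after_deductible([125, 150, 165, 175, 250], 100))   # 73.0
print(exponential_theta_after_deductible([600, 700, 900], 500))             # 233.33...
```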
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 653 22.32. C. The data has been truncated from above. Truncated density at 0 is: f(0)/F(1) = e−λ/(e−λ + λe−λ) = 1/(1+λ). Truncated density at 1 is: f(1)/F(1) = λe−λ/(e−λ + λe−λ) = λ/(1+λ). Let n/2 = number of zeros observed = number of ones observed. Then the loglikelihood is: -(n/2)ln(1 + λ) + (n/2){ln(λ) - ln(1 + λ)} = (n/2)ln(λ) - n ln(1 + λ). Setting the derivative with respect to λ equal to zero: 0 = n/(2λ) - n/(1 + λ). ⇒ λ = 1. Comment: If one were to set f(0) = f(1): e−λ = λe−λ. ⇒ λ = 1. An example of where an intuitive approach works for the Poisson. Also if λ is integer, the modes are λ - 1 and λ. Thus if λ = 1, then 0 and 1 are the modes. In general, if observations other than x and x+1 have been deleted from the data, and the data contain an equal number of observations of x and x+1, then the truncated densities are: g(x) = f(x)/{f(x) + f(x+1)} = (e−λλ x/x!)/{e−λλ x/x! + e−λλ x+1/(x+1)!} = (x + 1)/(x +1 + λ), and g(x+1) = f(x+1)/{f(x) + f(x+1)} = (e−λλ x+1/(x+1)!)/{e−λλ x/x! + e−λλ x+1/(x+1)!} = λ/(x + 1 + λ). Let n/2 = number of x observed = number of x+1 observed. Then the loglikelihood is: (n/2)ln(x + 1) - (n/2)ln(x + 1 + λ) + (n/2)ln(λ) - (n/2)ln(x + 1 + λ) Setting the derivative with respect to λ equal to zero: 0 = n/(2λ) - n/(x + 1 + λ). ⇒ x + 1 + λ = 2λ. ⇒ λ = x + 1. If one were to set f(x) = f(x+1): e−λλ x/x! = e−λλ x+1/(x+1)!. ⇒ λ = x + 1. 22.33. A. Right truncated density at 0 is: f(0)/F(1) = e−λ/(e−λ + λe−λ) = 1/(1+λ). Right truncated density at 1 is: f(1)/F(1) = λe−λ/(e−λ + λe−λ) = λ/(1+λ). Let 2n/3 = number of zeros observed, n/3 = number of ones observed. Then the loglikelihood is: -(2n/3)ln(1 + λ) + (n/3){ln(λ) - ln(1 + λ)} = (n/3)ln(λ) - n ln(1 + λ). Setting the derivative with respect to λ equal to zero: 0 = n/(3λ) - n/(1 + λ). ⇒ λ = 1/2. Comment: f(0) = 2f(1) ⇒ e−λ = 2λe−λ. ⇒ λ = 1/2.
2013-4-6, Fitting Loss Distributions §22 Maximum Like. Trunc., HCM 10/15/12, Page 654 22.34. A. S(x) = {θ/(θ + x)}α. S(100) = (400/500)α = (4/5)α. f(x) = α 400α /(400 + x)α+1. The density truncated from below at 100 is: f(x)/S(100) = α 500α /(400 + x)α+1. Log density truncated from below at 100 is: ln(α) + α ln(500) - (α + 1)ln(400 + x). Loglikelihood is: N ln(α) + N α ln(500) - (α + 1)Σln(400 + xi). Setting the partial derivative of the loglikelihood with respect to α equal to zero: 0 = N/α + N ln(500) - Σln(400 + xi) = N/α - Σln[(400 + xi)/500].
⇒ α = N/Σln[(400 + xi)/500] = 7/(ln[520/500] + ln[580/500] + ln[600/500] + ln[670/500] + ln[700/500] + ln[1400/500] + ln[2900/500]) = 1.849. E[X] = θ/(α - 1) = 400/(1.849 - 1) = 471.
Comment: Fitting via maximum likelihood to a Pareto Distribution with θ known, α^ = (number of uncensored values)/Σln[(θ + xi)/(θ + di)], where xi is the ith loss, possibly censored from above, and di is the deductible for the policy for xi. Here all di = 100 and no loss is censored.
22.35. E. The probability of observing 1 claim in the data is:
f(1)/{f(1) + f(2)} = λe^(-λ)/{λe^(-λ) + λ^2 e^(-λ)/2} = 2/(2 + λ).
The probability of observing 2 claims in the data is:
f(2)/{f(1) + f(2)} = (λ^2 e^(-λ)/2)/{λe^(-λ) + λ^2 e^(-λ)/2} = λ/(2 + λ).
Let n be the total number of observations. Then the loglikelihood is:
(3n/4) ln[2/(2 + λ)] + (n/4) ln[λ/(2 + λ)] = (n/4) {3 ln[2] + ln[λ] - 4 ln[2 + λ]}.
Setting the derivative with respect to λ equal to zero: 0 = (n/4) {1/λ - 4/(2+λ)}. ⇒ 4λ = 2 + λ. ⇒ λ = 2/3.
Comment: The data has been truncated from below and truncated from above. For a Poisson, nice intuitive results tend to hold. Since 75% of the observations are of 1 and 25% are of 2, set f(1) = 3f(2).
⇒ λe^(-λ) = (3)(λ^2 e^(-λ)/2). ⇒ λ = 2/3.
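A short numerical check of Solution 22.35, maximizing the per-observation loglikelihood (3/4) ln[2/(2+λ)] + (1/4) ln[λ/(2+λ)] on a grid; the sketch uses only the expression derived above.

```python
import math

def avg_loglik(lam):
    """Average loglikelihood per observation for Poisson data truncated to the values 1 and 2."""
    return 0.75 * math.log(2.0 / (2.0 + lam)) + 0.25 * math.log(lam / (2.0 + lam))

best = max((avg_loglik(i / 1000.0), i / 1000.0) for i in range(1, 5000))
print(best)   # maximized near lambda = 2/3
```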
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 655
Section 23, Single Parameter Pareto Distribution, Data Truncated from Below
Rather than assuming that the ground-up losses come from a particular distribution function, one may assume directly that the truncated losses come from a particular distribution. The Single Parameter Pareto Distribution is designed precisely to work well with data truncated from below.287
F(x) = 1 - (θ/x)^α, x > θ.
f(x) = α θ^α / x^(α+1) = (α/θ) (θ/x)^(α+1), x > θ.
In order to work with data truncated from below at $150,000, one would take θ = 150,000. No further adjustment is needed to work directly with this distribution. The Single Parameter Pareto Distribution works directly with data truncated from below; in this case there is no need to alter the form of the distribution for truncation. Choose theta equal to the truncation point from below. x is the size of loss prior to subtracting the deductible. Percentile Matching: Percentile matching proceeds as follows: p1 = F(x1 ) = 1 - (x1 / θ) - α. Thus, 1 - p1 = (x1 / θ) - α. Taking natural logs of both sides: ln(1 - p1 ) = -α ln (x1 / θ). Solving for α : α = - ln(1 - p1 ) / ln(x1 / θ). For example, for the data in Section 2 truncated at $150,000, for percentile matching to the 26th claim (out of the 52 losses > $150,000) of $406,900: p 1 = 26/(52+1) = 26/53 and x1 = 406,900, and θ = 150,000. Thus α = - ln(1-p1 ) / ln (x1 / θ) = - ln(1- 26/53) / ln (406,900 / 150,000) = -(-.6745) / .9979 = 0.676. Exercise: Fit a Single Parameter Pareto distribution to this data truncated at 150,000, via percentile matching at the 40th observed claim. [Solution: The 40th observed claim (of those greater than 150,000) is 766,100. Thus α = - {ln(1 - 40/53)} / ln(766,100/150,000) = - (-1.4053) / 1.6307 = 0.862. Alternately, from Appendix A, VaRp [X] = θ (1- p) - 1/ α . Thus, 766,100 = (150,000) (1 - 40/53)-1/α. ⇒ (13/53)1/α = 0.1958. ⇒ α = 0.862.] 287
If one has data truncated and shifted from below, in other words the payments after the application of the deductible, then one can translate it to data truncated from below, and then work directly with the Single Parameter Pareto Distribution.
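A minimal sketch of the Single Parameter Pareto set up for data truncated from below, together with the percentile matching calculation done above (θ = 150,000, smoothed 26/53 percentile at a loss of 406,900); nothing here goes beyond the formulas of this section.

```python
import math

def sp_pareto_cdf(x, alpha, theta):
    """Single Parameter Pareto: F(x) = 1 - (theta/x)^alpha, for x > theta."""
    return 1.0 - (theta / x) ** alpha

def alpha_from_percentile(x1, p1, theta):
    """Percentile matching: solve p1 = 1 - (x1/theta)^(-alpha) for alpha."""
    return -math.log(1.0 - p1) / math.log(x1 / theta)

theta = 150000.0
alpha = alpha_from_percentile(406900.0, 26.0 / 53.0, theta)
print(round(alpha, 3))                        # about 0.676
print(sp_pareto_cdf(406900.0, alpha, theta))  # recovers 26/53
```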
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 656
Method of Moments:
The method of moments is easily applied to the Single Parameter Pareto distribution, since only the first moment needs to be matched, and the mean is given by αθ/(α - 1), for α > 1.
X̄ = αθ/(α - 1). ⇒ α = X̄/(X̄ - θ).
For example, for the data in Section 2 truncated from below at $150,000, the observed first moment is 684,550. Thus if we fit via method of moments a Single Parameter Pareto Distribution with θ = 150,000 to this data truncated from below at 150,000:
α = X̄/(X̄ - θ) = 684,550/(684,550 - 150,000) = 1.281.
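The same method of moments calculation as a one-line function; a sketch using only the relation α = X̄/(X̄ - θ) derived above.

```python
def sp_pareto_method_of_moments(sample_mean, theta):
    """Method of moments for the Single Parameter Pareto: alpha = mean/(mean - theta), valid for mean > theta."""
    return sample_mean / (sample_mean - theta)

print(round(sp_pareto_method_of_moments(684550.0, 150000.0), 3))   # 1.281
```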
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 657 Method of Maximum Likelihood: The method of maximum likelihood is easy to apply to the Single Parameter Pareto distribution and data truncated from below. Exercise: Assume that the following data has been truncated from below at 5: 10, 15, 20, and 50. Fit a Single Parameter Pareto to this data via the method of maximum likelihood. [Solution: For a Single Parameter Pareto, with θ = 5, f(x) = α 5α / xα+1.
Σ ln f(xi) = Σ {ln(α) + α ln(5) - (α+1) ln(xi)}.
∂Σ ln f(xi)/∂α = Σ {1/α + ln(5) - ln(xi)} = N/α - Σln(xi/5).
Setting equal to zero the partial derivative with respect to alpha of the loglikelihood:
N/α - Σln(xi/5) = 0. α = N / Σln(xi/5) = 4/(ln(2) + ln(3) + ln(4) + ln(10)) = 0.730.]
In general, for a Single Parameter Pareto Distribution, the maximum likelihood fit is: α = N / Σ ln[xi/θ].
Exercise: Derive the above formula, by using a change of variables to translate a Single Parameter Pareto Distribution into an Exponential Distribution with mean 1/α.
[Solution: F(x) = 1 - (θ/x)^α, x > θ. Let y = ln[x/θ]. Then x/θ = e^y.
Substituting into F(x), F(y) = 1 - e^(-αy), an Exponential Distribution with mean 1/α.
The maximum likelihood fit for the Exponential is equal to the method of moments, 1/α = Σyi/N.
Since maximum likelihood is invariant under change of variables, this must also be the maximum likelihood fit for a Single Parameter Pareto Distribution. α = N / Σyi = N / Σln(xi/θ).]
For example, for the data in Section 2 truncated at $150,000, N = 52 and
Σln(xi/θ) = ln(150,300/150,000) + ... + ln(4,802,200/150,000) = 57.6794.
⇒ α = 52/57.6794 = 0.902.288
288 The Single Parameter Distributions fit via the three methods differ.
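A sketch of the maximum likelihood formula α = N / Σ ln(xi/θ), checked against the small exercise above (data 10, 15, 20, 50 truncated from below at 5):

```python
import math

def sp_pareto_mle(losses, theta):
    """Maximum likelihood for the Single Parameter Pareto with data truncated from below at theta."""
    return len(losses) / sum(math.log(x / theta) for x in losses)

print(round(sp_pareto_mle([10, 15, 20, 50], 5), 3))   # 0.730
```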
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 658 Problems: Use the following information for the next 3 questions: From a policy with a $10,000 deductible, you observe 6 claims with the following amounts paid by the insurer: $22,000, $28,000, $39,000, $51,000, $80,000, and $107,000. 23.1 (2 points) You fit a Single Parameter Pareto Distribution to this data via the method of moments. What is the fitted α parameter? (A) 1.0
(B) 1.2
(C) 1.4
(D) 1.6
(E) 1.8
23.2 (2 points) You fit a Single Parameter Pareto Distribution to this data via percentile matching. The matching is performed at the payment of size $39,000. What is the fitted α parameter? (A) 0.2
(B) 0.4
(C) 0.6
(D) 0.8
(E) 1.0
23.3 (2 points) You fit a Single Parameter Pareto Distribution to this data via the Method of Maximum Likelihood. What is the fitted α parameter? (A) 0.6
(B) 0.8
(C) 1.0
(D) 1.2
(E) 1.4
23.4 (3 points) The size of loss is a Single Parameter Pareto Distribution with θ = 10. Three losses were observed: 11, 12, 13. If α is known to be an integer, determine the maximum likelihood estimate of α. A. 4
B. 5
C. 6
D. 7
E. 8
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 659 Use the following information for the next 3 questions: You observe the following 10 sizes of loss: 241,513 231,919 105,310 125,152 110,493 139,647 220,942 161,964
116,472 105,829
23.5 (2 points) A distribution: F(x) = 1 - (x/100000)-q, x > 100,000, is fit to this data via percentile matching at the 82nd percentile. Determine the value of q. A. 1.8 B. 2.0 C. 2.2 D. 2.4 E. 2.6 23.6 (2 points) A distribution: F(x) = 1 - (x/100000)-q, x > 100,000, is fit to this data via the Method of Moments. Determine the value of q. A. less than 2.0 B. at least 2.0 but less than 2.3 C. at least 2.3 but less than 2.6 D. at least 2.6 but less than 2.9 E. at least 2.9 23.7 (3 points) A distribution: F(x) = 1 - (x/100000)-q, x > 100,000, is fit to this data via the Method of Maximum Likelihood. Determine the value of q. A. less than 2.0 B. at least 2.0 but less than 2.3 C. at least 2.3 but less than 2.6 D. at least 2.6 but less than 2.9 E. at least 2.9
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 660
For the next two questions, use the following forty (40) observed losses that have been recorded in thousands of dollars and have been grouped as follows:
Interval ($000)   Number of Losses   Total Losses ($000)
(1, 4/3)          16                 20
[4/3, 2)          10                 15
[2, 4)            10                 35
[4, ∞)            4                  20
23.8 (2 points) Via matching at the 65th percentile, fit a Single Parameter Pareto Distribution. What is the resulting estimate of the probability that a loss will exceed 5000?
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%
23.9 (2 points) Via the method of moments, fit a Single Parameter Pareto Distribution. What is the resulting estimate of the probability that a loss will exceed 5000?
A. less than 6%
B. at least 6% but less than 7%
C. at least 7% but less than 8%
D. at least 8% but less than 9%
E. at least 9%
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 661
23.10 (3 points) Let x1, x2,..., x200 and y1, y2,..., y300 denote independent random samples of losses from State X and State Y, respectively.
Σ ln(xi) = 1000 and Σ xi = 30,000, where the sums run over i = 1 to 200.
Σ ln(yi) = 1500 and Σ yi = 50,000, where the sums run over i = 1 to 300.
Single-parameter Pareto distributions with θ = 100, but different values of α, are used to model the sizes of individual losses in these states. Past experience indicates that the expected size of individual losses in State X is 3/4 times the expected size of individual losses in State Y. You intend to calculate the maximum likelihood estimate of α for State X, using the data from both states.
Which of the following equations must be solved?
(A) 500/α - 1421/(α + 3)^2 = 79.
(B) 500/α - 1421/α^2 = 0.
(C) 500/α - 1421/α^2 = 1000.
(D) 500/α - 1421/(α + 3)^2 - 300/(α + 3) = 79.
(E) 500/α - 1421/(α + 3) - 300/α^2 = 1000.
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 662
23.11 (4, 5/86, Q.54) (3 points) You are given the grouped loss data shown below. You assume that the losses follow a probability density function f(x) = α 100^α x^(-α-1), for 100 ≤ x < ∞. This corresponds to a distribution function F(x) = 1 - (100/x)^α. Which of the following expressions represents the maximum likelihood estimate of α?
Interval       Number of losses
100 - 200      1
200 - 500      0
500 and over   1
A. ln[ln 10 / ln 5] / ln 2
B. ln[ln 2] / ln 2
C. ln[ln 5] / ln 2
D. ln[ln 5]
E. Cannot be determined from the information given.
23.12 (4B, 11/92, Q.4) (2 points) A random sample of n claims x1, x2,..., xn, is taken from the distribution function: F(x) = 1 - x^(-α), x > 1. Determine the maximum likelihood estimator of α.
A. n / Σ ln(xi)
B. n / Σ xi
C. Σ ln(xi) / n
D. Σ xi / n
E. n / Σ exp(xi)
(All sums run over i = 1 to n.)
23.13 (4B, 11/97, Q.18) (2 points) You are given the following: • The random variable X has the density function f(x) = αx−(α+1) , 1 < x < ∞, α > 1. • A random sample is taken of the random variable X. Determine the limit of the method of moments estimator of α, as the sample mean goes to infinity. A. 0
B. 1/2
C. 1
D. 2
E. ∞
23.14 (4, 5/00, Q.21) (2.5 points) You are given the following five observations: 521 658 702 819 1217 You use the single-parameter Pareto with cumulative distribution function F(x) = 1 - (500/x)α , x > 500, α > 0. Calculate the maximum likelihood estimate of the parameter α. (A) 2.2
(B) 2.5
(C) 2.8
(D) 3.1
(E) 3.4
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 663
23.15 (4, 11/04, Q.18 & 2009 Sample Q.146) (2.5 points) Let x1, x2,..., xn and y1, y2,..., ym denote independent random samples of losses from Region 1 and Region 2, respectively. Single-parameter Pareto distributions with θ = 1, but different values of α, are used to model losses in these regions. Past experience indicates that the expected value of losses in Region 2 is 1.5 times the expected value of losses in Region 1. You intend to calculate the maximum likelihood estimate of α for Region 1, using the data from both regions. Which of the following equations must be solved?
(A) n/α - Σln(xi) = 0.
(B) n/α - Σln(xi) + m(α + 2)/(3α) - 2Σln(yi)/(α + 2)^2 = 0.
(C) n/α - Σln(xi) + 2m/{3α(α + 2)} - 2Σln(yi)/(α + 2)^2 = 0.
(D) n/α - Σln(xi) + 2m/{α(α + 2)} - 6Σln(yi)/(α + 2)^2 = 0.
(E) n/α - Σln(xi) + 3m/{α(3 - α)} - 6Σln(yi)/(3 - α)^2 = 0.
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 664 Solutions to Problems: 23.1. B. The Single Parameter Pareto Distribution is designed to work directly with the data truncated from below. One takes θ = deductible = 10,000. E[X] = αθ/(α - 1) = 10000α/(α - 1). The losses to the insured corresponding to the listed payments are: $32,000, $38,000, $49,000, $61,000, $90,000, and $117,000, with mean $64,500. Matching means: 10000α/(α - 1) = 64,500. α = 1.18. Comment: The mean only exists for α > 1, so choice A can be eliminated. The given data is in terms of payments to the insured; i.e., truncated and shifted from below. In order to work with the Single Parameter Pareto Distribution, one needs to translate the data into losses to the insured; i.e., truncated from below. 23.2. B. The Single Parameter Pareto Distribution is designed to work directly with the data truncated from below. One takes θ = deductible = 10,000. F(x) = 1 - (x/10000)−α, x > 10,000. The payment of $39,000, which corresponds to a loss of $49,000 to the insured, is the 3rd of 6 payments, so it corresponds to an estimate of the 3/(6+1) = 3/7 percentile. Matching the observed and theoretical percentiles: Require that: 3/7 = F($49,000) = 1 - (4.9)−α. Solving, α = - ln(4/7)/ln(4.9) = 0.35. Alternately, from Appendix A, VaRp [X] = θ (1- p) - 1/ α . Thus, 49,000 = (10,000) (1 - 3/7)-1/α. ⇒ (7/4)1/α = 4.9. ⇒ α = 0.35. Comment: The smoothed empirical estimate of the pth percentile out of a sample of size N is the p (N+1) element from smallest to largest. For p = 3/7 and N = 6: p (N+1) = (3/7)(6 + 1) = 3. ⇒ The 3rd element is the estimate of the 3/7 percentile, in other words where the distribution function is 3/7. 23.3. A. For a Single Parameter Pareto, with θ = 10000, f(x) = α 10000α / xα+1.
Σ ln f(xi) = Σ ln(α) + α ln(10000) - (α+1)ln(xi). ∂Σ ln f(xi) /∂α = Σ (1/α) + ln(10000) - ln(xi) = N/α - Σln(xi/10000). Setting equal to zero the partial derivative with respect to alpha of the loglikelihood: N/α - Σln(xi/10000) = 0. ⇒ α = N / [ (Σln(xi / 10000)) ] . Add 10,000 to each payment in order to get the corresponding size of loss. x1 /10000 = (22000 + 10000)/10000 = 3.2. α = 6/{ln(3.2) + ln(3.8) + ln(4.9) + ln(6.1) + ln(9.0) + ln(11.7)} = 0.57.
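The computation in Solution 23.3 is the same maximum likelihood formula with θ = 10,000 after adding back the deductible; a small sketch using the six payments from the problem:

```python
import math

payments = [22000, 28000, 39000, 51000, 80000, 107000]
deductible = 10000

losses = [p + deductible for p in payments]   # sizes of loss, truncated from below at 10,000
alpha = len(losses) / sum(math.log(x / deductible) for x in losses)
print(round(alpha, 2))   # 0.57
```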
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 665 23.4. C. f(x) = αθα/xα+1 = α10α/xα+1. The likelihood is: f(11)f(12)f(13). Plug in the five choices for α and see that α = 6 produces the largest likelihood. alpha
f(11)
f(12)
f(13)
Likelihood
4 5 6 7 8
0.248369 0.282237 0.307895 0.326555 0.339278
0.160751 0.167449 0.167449 0.162798 0.155045
0.107732 0.103588 0.095620 0.085813 0.075440
0.004301 0.004896 0.004930 0.004562 0.003968
Comment: If α was not restricted to integer, α^ = 3/(ln(11/10) + ln(12/10) + ln(13/10)) = 5.56. 23.5. B. First put the claims in order from smallest to largest. The 9th claim is 231,919 and is an estimate of the 9/(1+10) = 82nd percentile. Set 1 - (231919/100000)-q = .82. Then 2.31919-q = .18. Taking logarithms: -qln(2.31919) = ln(.18). Solve for q = -ln(.18)/ln(2.31919) = 2.04. Alternately, from Appendix A, VaRp [X] = θ (1- p) - 1/ α . Thus, 231,919 = (100,000) (1 - 9/11)-1/α. ⇒ (11/2)1/α = 2.31919. ⇒ α = 2.03. Comment: One could linearly interpolate between 231,919 at the 81.8 percentile and 241,513 at the 90.9 percentile, and instead use 234,028 as the observed 82nd percentile. This would give instead a result of q = -ln(.18)/ln(2.34028) = 2.02. 23.6. D. The average of the 10 claims is 155,924. For the Single Parameter Pareto Distribution, the mean is: {α / (α - 1)} θ. For θ = 100000 and α = q, set mean = 100000q/(q-1) = 155924. Solve for q = 2.788. 23.7. C. The density function for this Single Parameter Pareto Distribution is f(x) = qx-(q+1)100000q . The loglikelihood is Σ lnf(xi) = Σ lnq + qln100000 - (q+1)ln(xi). To maximize this, set the partial derivative with respect to q equal to zero: Σ { 1/q + ln 100000 - ln(xi) } = 0. N/q - Σln(xi / 100000). q = N / Σln(xi / 100000) = 10 / 3.917 = 2.553.
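For Solution 23.7, the same maximum likelihood formula applied to the ten losses given with the problem; a minimal check:

```python
import math

losses = [241513, 231919, 105310, 125152, 110493, 139647, 220942, 161964, 116472, 105829]
q = len(losses) / sum(math.log(x / 100000.0) for x in losses)
print(round(q, 3))   # 2.553
```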
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 666 23.8. D. & 23.9. A. Since the data is truncated from below at 1, take θ = 1. The 65th percentile of the data is at 2. 0.65 = F(2) = 1 - 1/2α. ⇒ α = 1.515. S(5) = 1/51.515 = 8.73%. The mean of the data is: (20 + 15 + 35 + 20) / 40 = 2.25. αθ 2.25 = E[X] = = α / (α - 1). ⇒ α = 1.800. α −1 S(5) = 1/51.8 = 5.52%. Comment: Data taken from 4B, 5/96, Q.23. 23.10. D. For the Single Parameter Pareto Distribution, E[X] = αθ/(α-1). For θ = 100, E[X] = 100α/(α-1). Let a be the shape parameter for State Y, while α is the shape parameter for State X. Then we are given: (3/4){100a/(a-1)} = 100 α/(α-1). ⇒ (3a)(α - 1) = (4α)(a - 1).
⇒ a = 4α/(α + 3). For the Single Parameter Pareto Distribution, f(x) = αθ^α/x^(α+1) = (100^α) α/x^(α+1).
ln f(x) = α ln(100) + ln(α) - (α+1)ln(x).
The contribution to the loglikelihood from State X is: 200 α ln(100) + 200 ln(α) - (α+1)(1000).
The contribution to the loglikelihood from State Y is: 300 a ln(100) + 300 ln(a) - (a+1)(1500) = 1200 ln(100) α/(α + 3) + (300){ln(4) + ln(α) - ln(α+3)} - {4α/(α + 3) + 1}(1500).
Loglikelihood is: 200 α ln(100) + 200 ln(α) - (α+1)(1000) + 1200 ln(100) α/(α + 3) + (300){ln(4) + ln(α) - ln(α+3)} - {4α/(α + 3) + 1}(1500).
Setting the derivative with respect to α equal to zero:
200 ln(100) + 200/α - 1000 + 1200 ln(100)·3/(α + 3)^2 + 300/α - 300/(α+3) - {12/(α + 3)^2}(1500) = 0.
⇒ 500/α - 1421/(α + 3)^2 - 300/(α + 3) = 79.
Comment: Similar to 4, 11/04, Q.18. An example of restricted maximum likelihood. Solving on a computer, α^ = 2.95.
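The comment above mentions solving on a computer; a minimal root-finding sketch for the equation of choice (D), which reproduces α ≈ 2.95:

```python
from scipy.optimize import brentq

def equation(alpha):
    """Left side minus right side of: 500/alpha - 1421/(alpha+3)^2 - 300/(alpha+3) = 79."""
    return 500.0 / alpha - 1421.0 / (alpha + 3.0) ** 2 - 300.0 / (alpha + 3.0) - 79.0

print(round(brentq(equation, 0.5, 10.0), 2))   # about 2.95
```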
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 667 23.11. A. For grouped data, the likelihood is computed by taking the difference in distribution functions at the top and bottom of each interval and then taking the product to the power of the number of claims in that interval: Bottom of Top of Number of F(Bottom F(Top of Difference Contribution Interval Interval Claims of Interval) Interval) of Distrib. to Likelihood 100
200
1
0
200
500
0
1 - 2−α
500
∞
1
1 - 5−α
1 - 2−α
1 - 2−α
1 - 5−α
2−α - 5−α 5−α
1
1 - 2−α 1 5−α
Likelihood = (1 - 2−α)(1)(5−α) = 5−α - 10−α. In order to maximize the likelihood, set the partial derivative with respect to α of the likelihood equal to zero. (-ln5)(5−α) - (-ln10)(10−α) = 0. ⇒ 2α = (ln10)/ (ln5). α ln 2 = ln[ ln10 / ln 5 ]. Therefore, α = ln[ ln10 / ln5 ] / ln2. Comment: α = ln[ ln10 / ln 5 ] / ln2 = ln(1.4307) / 0.693 = 0.517. d bx / dx = (ln b) bx . 23.12. A. f(xi) = α xi−(α+1). ⇒ ln f(xi) = ln(α) - (α+1)ln(xi). Set ∂Σ ln f(xi) / ∂α = Σ {(1/α) - ln(xi)} = 0. Therefore, α = n / Σ ln(xi) Comment: This is a Single Parameter Pareto Distribution, with θ = 1. 23.13. C. The mean of the distribution can be computed as follows: ∞
∞
∞
x=∞
∫xf(x)dx = ∫x αx−(α−1) dx = ∫ αx−αdx = -αx−(α−1) / (α-1) ] = α / (α − 1). 1
1
1
x=1
Then in order to apply the method of moments, one sets the theoretical mean equal to the sample mean and solves for the single parameter α. α / (α − 1) = sample mean. Thus α = (sample mean)/ (sample mean-1). As the sample mean goes to infinity the estimated alpha goes to 1. Comment: A Single Parameter Pareto Distribution, with θ = 1 and mean = (α / (α − 1))θ = α / (α − 1). As alpha approaches one, the theoretical mean approaches infinity. For α ≤ 1, the mean does not exist.
2013-4-6, Fitting Loss Distributions §23 Single Param. Pareto, HCM 10/15/12, Page 668 23.14. B. f(x) = α 500α / xα+1. ln f(x) = ln(α) + α ln(500) - (α+1)ln(x).
Σ ln (f(xi)) = Σ { ln(α) + α ln(500) − (α+1)ln(xi) }. The derivative with respect to α is: Σ { (1/α) + ln(500) - ln(xi) } = (n/α) - Σ ln(xi/500). Setting this derivative equal to zero: 0 = (n/α) - Σ ln(xi/500). Solving: α = n / Σ ln(xi/500) = 5/ {ln(521/500) + ln(658/500) + ln(702/500) + ln(819/500) + ln(1217/500)} = 5/ (.04114 + .27460 + .33933 + .49348 + .88954) = 5/2.0381= 2.45. Comment: It is implicitly assumed that the data is truncated from below at 500, which is why one uses a Single Parameter Pareto Distribution with θ = 500. 23.15. D. For the Single Parameter Pareto Distribution, E[X] = αθ/(α-1). For θ = 1, E[X] = α/(α-1). Let a be the shape parameter for the second region, while α is the shape parameter for the first region. Then we are given: a/(a-1) = 1.5α/(α-1). ⇒ a = 3α/(α+2). For the Single Parameter Pareto Distribution, f(x) = αθα/xα+1 = α/xα+1. ln f(x) = ln(α) - (α+1)ln(x). The contribution to the loglikelihood from region one is: n ln(α) - (α+1)Σln(xi). The contribution to the loglikelihood from region two is: m ln(a) - (a+1)Σln(yi) = m {ln(3) + ln(α) - ln(α+2)} - {3α/(α+2) + 1}Σln(yi). Loglikelihood is: n ln(α) - (α+1)Σln(xi) + m ln(3) + m ln(α) - m ln(α+2) - {3α/(α+2) + 1}Σln(yi). Setting the derivative with respect to α equal to zero: n/α - Σln(xi) + m/α - m/(α+2) - 3{(α + 2 - α)/(α+2)2 }Σln(yi) = 0.
⇒ n/α - Σln(xi) + 2m/{α(α+2)} - 6Σln(yi)/(α+2)2 = 0. Comment: An example of restricted maximum likelihood. Choice A does not involve in any way the data from the second region, and thus can not be right.
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 669
Section 24, Fitting to Censored Data289 For data censored at a maximum covered loss u, the precise size of any claim of size u or greater is not reported; all you know about such claims is that they are of at least size u. This is the usual way data from a policy with a maximum covered loss u is reported. The revised Distribution Function under right censoring at u is: ⎧F(x) G(x) = ⎨ ⎩1
x < u x = u
Percentile Matching: Percentile matching can be used to fit a curve as with uncensored data, except that one can only match to percentiles corresponding to sizes less than u. (For sizes greater than or equal to u, one does not have the required data to determine a percentile.) Method of Moments: The method of moments can be used to fit curves to censored data. However, one needs to calculate the moments for censored distributions. The mean for right censored data is the Limited Expected Value E[X ∧ u] for the censorship value u. Exercise: Prior to the effect of a maximum covered loss, losses are assumed to follow a Pareto Distribution, with θ = 8 and α unknown. Data from policies, each with a maximum covered loss of 10, has an observed mean of 2.857. Estimate α using the method of moments. [ Solution: For the Pareto Distribution, E[X ∧ x] = {θ/(α-1)} {1-(θ/(θ+x))α−1}. E[X ∧ 10] = {8/(α-1)} {1-(8/(8+10))α−1} = {8/(α-1)} {1-(4/9)α−1} = 2.857. One can solve numerically, and get α = 3.4.] The second moment for censored data is E[(X ∧ L)2 ]: E[(X ∧ L)2 ] = S(L) L2 +
L
∫ t2 f(t) dt .
0
289
A good paper on this and other subjects of practical importance is “Estimating Casualty Insurance Loss Distributions” by Gary Patrik, PCAS 1980.
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 670
Formulas for E[(X ∧ L)2 ] and higher limited moments are given in Appendix A of Loss Models. For example, assume that prior to the effect of a maximum covered loss, losses are assumed to follow a LogNormal Distribution. Data from policies, each with a maximum covered loss of 50,000, has an observed mean of 49,882 and second moment of 2490 million. Here is how one would estimate µ and σ using the method of moments. For the LogNormal Distribution the first two limited moments are: E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 - Φ[(lnx − µ)/σ]}. E[(X ∧ x)2 ] = exp[2µ + 2σ2]Φ[{ln(x) − (µ+ 2σ2)} / σ] + x2 {1 - Φ[{ln(x) − µ} / σ]}. We then can write down two equations in two unknowns, by matching the observed and theoretical limited moments, for a limit of 50,000. Note that ln(50000) = 10.8198. 49,882 = exp(µ + σ2/2)Φ[(10.8198 − µ − σ2)/σ] + 50000{1 - Φ[(10.8198 − µ)/σ]}. 2490 million = exp[2µ + 2σ2]Φ[{10.8198 − (µ+ 2σ2)} / σ] + 500002 {1 - Φ[{10.8198 − µ} / σ] }. One could solve numerically to obtain: µ = 12.4 and σ = 0.7. Method of Maximum Likelihood: The method of Maximum Likelihood can be applied to censored data in order to fit distributions. In the case of grouped data the method is straightforward. One has a final interval from the censoring point to infinity. For example, if the grouped data in Section 3 were censored at $100,000, we would have the exact same information on the number of accidents in each interval; thus the maximum likelihood estimate would be identical. (However, the 33 accidents of size greater than or equal to $100,000 would have 33 x $100,000 = $3.3 million of reported losses rather than $4.295 million. The method of maximum likelihood makes no use of this information, and thus is unaffected.) For data censored from above at u, the interval from u to ∞ is treated as it would be for grouped data. Each loss of size greater than or equal to u contributes S(u) to the likelihood, or ln S(u) to the loglikelihood. The loglikelihood is: Σ ln f(xi) + M ln S(u), where M is the number of claims greater than or equal to the right censorship point u, and one sums ln f(xi) over the claims less than the censorship point.
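Where all policies share a single maximum covered loss u, the loglikelihood described above is Σ ln f(xi) over the uncensored claims plus M ln S(u); here is a minimal sketch for an Exponential model (the Exponential is chosen only for illustration and is an assumption, not part of the text above):

```python
import math

def exponential_censored_loglik(theta, uncensored, num_censored, u):
    """Loglikelihood for an Exponential: sum of ln f(x) over uncensored claims plus M ln S(u)."""
    ll = sum(-x / theta - math.log(theta) for x in uncensored)
    ll += num_censored * (-u / theta)
    return ll

# For an Exponential this is maximized at theta = (total payments) / (number of uncensored claims).
```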
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 671
Exercise: It is assumed that prior to the effects of a maximum covered loss, the losses follow a distribution F(x), with density f(x). The following data has been collected from polices with a maximum covered loss of 100: 13, 25, 67, 81, 100+, 100+, 100+. Write down the likelihood. [Solution: Each of the small losses contributes f(x). The three large losses each contribute 1 - F(100) = S(100). Thus the likelihood is: f(13)f(25)f(67)f(81) S(100)3 .] For a specific distribution one could then solve numerically for the parameters that maximize either this likelihood or the loglikelihood of: ln f(13) + ln f(25) + ln f(67) + ln f(81) + 3 ln S(100). This mixture of terms involving the distribution and density function is appropriate for this purpose because the expected contributions to the partial derivatives of the loglikelihood is the same whether the data is grouped or ungrouped. Specifically, let θ be one of the parameters of the Distribution Function F(x). The expected contribution to the partial derivative of the loglikelihood of the claims larger than the censorship point u is: u
u
S(u){∂lnS(u)/∂θ} = ∂S(u)/∂θ = - ∂F(u)/∂θ = −{∂/∂θ}∫ f(x)dx = − ∫ ∂f(x)/∂θ dx. 0
0
The expected contribution to the partial derivative of the loglikelihood of the claims smaller than the censorship point u is: u
u
F(u) E[∂lnf(x)/∂θ ] = F(u) E[(∂f(x)/∂θ) / f(x)] = ∫ {(∂f(x)/∂θ) / f(x)} f(x) dx = ∫ ∂f(x)/∂θ dx. 0
0
Thus the two expected contributions to the partial derivative of the loglikelihood add up to zero, at the true values of the parameters. Thus setting the partial derivatives equal to zero is an appropriate fitting technique. In practical applications one is often faced with multiple data sets with different censorship points.290 One can just add up the different loglikelihoods and maximize this overall sum. Exercise: For the following claims data, what is the loglikelihood? Amount Paid Maximum Covered Loss 600 10,000 10,000 10,000 3000 20,000 20,000 20,000 [Solution: ln f(600) + ln S(10000) + ln f(3000) + ln S(20000).] 290
Policies are sold with different maximum covered losses.
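For the exercise above, with several policies having different maximum covered losses, the loglikelihood simply adds one ln f or ln S term per claim; a minimal sketch, again using an Exponential purely as an illustrative and assumed choice of model:

```python
import math

# (payment, maximum covered loss) pairs from the exercise above
claims = [(600, 10000), (10000, 10000), (3000, 20000), (20000, 20000)]

def loglik(theta):
    """ln f(x) for claims below their maximum covered loss, ln S(u) for claims at it (Exponential model)."""
    total = 0.0
    for x, u in claims:
        if x < u:
            total += -x / theta - math.log(theta)   # ln f(x)
        else:
            total += -u / theta                     # ln S(u)
    return total

theta_hat = (600 + 10000 + 3000 + 20000) / 2   # total payments over number of uncensored claims
print(theta_hat, loglik(theta_hat))
```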
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 672
Censoring from Below:291 You have observed the following three loss amounts: 2000, 5000, 10,000. Two other amounts are known to be less than or equal to 1000.292 Since we do not know the exact size of the two small losses, this data has been censored from below. One can fit a distribution via Maximum Likelihood, by treating the losses less than or equal to 1000 as being grouped into an interval from 0 to 1000. Each small loss contributes to the likelihood the probability covered by the interval from 0 to 1000: F(1000). Each large loss contributes to the likelihood the density at the size of loss. Exercise: What is the loglikelihood for the above data? [Solution: 2 ln F(1000) + ln f(2000) + ln f(5000) + ln f(10000).] Exercise: Fit an Inverse Pareto Distribution with θ = 4000 to the above data via maximum likelihood. [Solution: F(x) = {x/(x + θ)}τ. ln F(x) = τ ln[x/(x + θ)]. ln F(1000) = τ ln[1/5] = -τ ln[5]. f(x) = τθ xτ−1/(x + θ)τ+1. ln f(x) = lnτ + lnθ + (τ - 1)ln[x] - (τ + 1)ln[x + θ]. loglikelihood is: -2τ ln[5] + 3 lnτ + 3ln4000 + (τ - 1)(ln2000 + ln5000 + ln10000) - (τ + 1)(ln6000 + ln9000 + ln14000). Setting the partial derivative of the loglikelihood with respect to tau equal to zero: 0 = -2 ln[5] + 3/τ + (ln2000 + ln5000 + ln10000) - (ln6000 + ln9000 + ln14000).
⇒ τ = 3 / {2 ln[5] + (ln6000 + ln9000 + ln14000) - (ln2000 + ln5000 + ln10000)} = 0.572.]
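A quick numerical check of the Inverse Pareto fit in the Censoring from Below exercise; the sketch just evaluates the closed form for τ derived above.

```python
import math

tau = 3 / (2 * math.log(5)
           + math.log(6000) + math.log(9000) + math.log(14000)
           - math.log(2000) - math.log(5000) - math.log(10000))
print(round(tau, 3))   # 0.572
```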
291
See 4, 11/06, Q. 5. It can occur in practical applications that an actuary would know the size of each big loss, while not having readily available the size of each smaller loss. For example, in Workers Compensation Insurance, medical only losses of size less than 2000 commonly are grouped together for purposes of reporting them to a Rating Bureau. 292
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 673
Problems: Use the following information for the next seven questions: You observe five claims from a policy with a 1000 maximum covered loss: 150, 400, 900, 1000, 1000. 24.1 (2 points) You fit an exponential distribution F(x) = 1 - e-λx to this data via the method of moments. The parameter λ is in which of the following intervals? A. 0.0008
B. 0.0009
C. 0.0010
D. 0.0011
E. 0.0012
24.2 (2 points) You fit an exponential distribution F(x) = 1 - e-λx to this data via percentile matching at the 50th percentile. The parameter λ is in which of the following intervals? A. 0.0008
B. 0.0009
C. 0.0010
D. 0.0011
E. 0.0012
24.3 (3 points) You fit a uniform distribution on (0, b) to this data via the method of moments. Determine the fitted value of b. A. 1500 B. 1600 C. 1700 D. 1800 E. 1900 24.4 (3 points) You fit a uniform distribution on (0, b) to this data via maximum likelihood. Determine the fitted value of b. A. 1500 B. 1600 C. 1700 D. 1800 E. 1900 24.5 (4 points) You fit a LogNormal Distribution, with σ = 2 to this data via the method of moments. The parameter µ is in which of the following intervals? A. less than 6.0 B. at least 6.0 but less than 6.5 C. at least 6.5 but less than 7.0 D. at least 7.0 but less than 7.5 E. at least 7.5 24.6 (3 points) You fit a Pareto Distribution with α = 3 to this data via the method of moments. What is the fitted θ parameter? (A) 2000
(B) 2500
(C) 3000
(D) 3500
(E) 4000
24.7 (2 points) You fit a Single Parameter Pareto Distribution with θ = 100 to this data via percentile matching at the 40th percentile. What is the fitted α parameter? (A) 0.3
(B) 0.5
(C) 0.7
(D) 0.9
(E) 1.1
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 674
Use the following information about a random sample for the next two questions: (i) The sample size equals five. (ii) The sample is from an Exponential distribution as per Loss Models. (iii) Two of the sample observations are known to exceed 1000, and the remaining three observations are: 200, 500, 800. 24.8 (2 points) Calculate the maximum likelihood estimate of θ. (A) 1000
(B) 1050
(C) 1100
(D) 1150
(E) 1200
24.9 (3 points) Draw the p-p plot comparing the fitted Exponential fitted to the data.
Use the following information for the next two questions: You observe claims from several policies with different maximum covered losses: Claim Number Policy Number Claim Size Maximum Covered Loss C833 WCMA7 150 500 C745 WCMA3 600 1000 C666 WCMA3 1000 1000 C272 WCMA12 1300 2000 C724 WCMA14 2000 2000 You fit an exponential distribution F(x) = 1 - e-λx to this data via the method of maximum likelihood. 24.10 (2 points) Determine the likelihood. A. λ5 e−6500λ
B. λ3 e−3000λ
C. λ5 e−2300λ
D. λ3 e−5050λ
E. None of A, B, C, or D. 24.11 (2 points) What is the fitted value of the parameter λ? A. 0.0006
B. 0.0007
C. 0.0008
D. 0.0009
E. 0.0010
24.12 (2 points) 100 losses are observed. Five of the losses are: 50, 100, 200, 200, 400. All that is known about the other 95 losses is that they exceed 500. Determine the maximum likelihood estimate of the mean of an exponential model. (A) Less than 9000 (B) At least 9000, but less than 10,000 (C) At least 10,000, but less than 11,000 (D) At least 11,000, but less than 12,000 (E) At least 12,000
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 675
24.13 (4 points) An insurer writes a policy with a $25,000 maximum covered loss and an 80% coinsurance factor. There are 9 payments of sizes: $256, $1500, $2032, $4720, $5792, $8040, $15,200, $20,000, $20,000. It is assumed that prior to the effect of the maximum covered loss and the coinsurance factor, the losses follow a LogLogistic Distribution, as per Loss Models. A Loglogistic Distribution is fit via Percentile Matching, with the matching performed at the payments of sizes: $1500 and $8040. What is the mode of the fitted LogLogistic Distribution. A. 200 B. 300 C. 400 D. 500 E. 600 24.14 (2 points) You observe 6 claims from three policies with different maximum covered losses, each with no deductible: Claim Policy Maximum Amount Paid by Number Number Covered Loss The Insurer C1573 MAC19 20,000 13,000 C1574 MAC19 20,000 20,000 C3313 MAC33 10,000 500 C4445 MAC33 10,000 9,000 C6720 MAC17 30,000 30,000 C6754 MAC17 30,000 30,000 You assume that in the absence of any maximum covered loss that the losses would follow a Distribution Function F(x), with probability density function f(x). Let S(x) = 1 - F(x). What is the likelihood? A. f(500) f(9000) f(13,000) S(20,000) S(30,000)2 B. f(500) f(9000) f(13,000) S(10,000) S(20,000) S(30,000) f(500) f(9000) f(13,000) S(20,000) S(30,000)2 C. S(500) S(9000) S(13,000) D.
f(500) f(9000) f(13,000) S(10,000) S(20,000) S(30,000) S(500) S(9000) S(13,000)
E. None of the above.
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 676
24.15 (3 points) The following seven payments are from a policy with a maximum covered loss of 100:
10, 15, 25, 40, 70, 100, and 100.
A Weibull distribution with τ = 3 is fit via maximum likelihood. What is the maximum likelihood estimate of θ? (A) Less than 75 (B) At least 75, but less than 80 (C) At least 80, but less than 85 (D) At least 85, but less than 90 (E) At least 90 24.16 (3 points) You are given a sample of losses from an exponential distribution. However, if a loss is 25,000 or greater, it is reported as 25,000. The summarized sample is: Reported Loss Number Total Amount Less than 10,000 36 154,000 [10,000, 25,000) 24 424,000 25,000 18 450,000 Total 78 1,028,000 Determine the maximum likelihood estimate of θ, the mean of the exponential distribution. (A) 15,000
(B) 16,000 (C) 17,000
(D) 18,000
(E) 19,000
24.17 (3 points) The maximum covered loss is 500. A sample of 9 payments is: 41, 133, 172, 194, 220, 235, 350, 500, 500. A LogNormal Distribution is fit by matching at the 50th and 67th percentiles. What is the mean of the fitted distribution? (A) 310 (B) 330 (C) 350 (D) 370 (E) 390 24.18 (3 points) The following four payments are from a policy with a maximum covered loss of 10,000: 1000, 3000, 6000, 10,000. A Pareto distribution with θ = 5000 is fit via maximum likelihood. What is the maximum likelihood estimate of α? (A) 0.8
(B) 1.0
(C) 1.2
(D) 1.4
(E) 1.6
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 677
24.19 (4, 5/85, Q.53) (2 points) It is given that a random sample of n claim values xi, i = 1, 2, ... n, from the probability density function (p.d.f.) f(x;θ) are censored from above at c. That is, the recorded value of a claim = min (xi, c), i = 1, 2, ..., n. Which of the following expressions represents the likelihood function with respect to this sample? Assume that nu values are uncensored and nc are censored. Uncensored values = xi, i= 1, 2, . . ., nu , Censored values = c, i = 1, 2,...nc, and that F(x;θ) is the distribution function of X, x
F(x) =
∫
f(t;θ) dt .
0 nu
A.
∏ f(xi ;θ)
B.
i=1
C.
nu
nc
i=1
i=1
∏ f(xi ;θ) ∏F(c;θ)
D.
nu
nc
i=1
i=1
nu
nc
i=1
i=1
∏ f(xi ;θ) ∏ f(c;θ) ∏ f(xi ;θ) ∏ {1 - F(c;θ)}
E. None of the above 24.20 (4, 11/00, Q.22) (2.5 points) You are given the following information about a random sample: (i) The sample size equals five. (ii) The sample is from a Weibull distribution with τ = 2. (iii) Two of the sample observations are known to exceed 50, and the remaining three observations are 20, 30 and 45. Calculate the maximum likelihood estimate of θ . (A) Less than 40 (B) At least 40, but less than 45 (C) At least 45, but less than 50 (D) At least 50, but less than 55 (E) At least 55 24.21 (2 points) Redo the previous question, 4, 11/00, Q.22, but with τ = 1. A. 25
B. 35
C. 45
D. 55
E. 65
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 678
24.22 (4, 5/01, Q.7) (2.5 points) You are given a sample of losses from an exponential distribution. However, if a loss is 1000 or greater, it is reported as 1000. The summarized sample is: Reported Loss Number Total Amount Less than 1000 62 28,140 1000 38 38,000 Total 100 66,140 Determine the maximum likelihood estimate of θ, the mean of the exponential distribution. (A) Less than 650 (B) At least 650, but less than 850 (C) At least 850, but less than 1050 (D) At least 1050, but less than 1250 (E) At least 1250 24.23 (2 points) Redo the previous question, 4, 5/01, Q.7, ignoring the values given in the “Total Amount” column. A. 1040
B. 1060
C. 1080
D. 1100
E. 1120
24.24 (2 points) In 4, 5/01, Q.7, determine the method of moments estimate of θ. A. 1040
B. 1060
C. 1080
D. 1100
E. 1120
24.25 (4, 11/02, Q.40 & 2009 Sample Q. 56) (2.5 points) You are given the following information about a group of policies: Claim Payment Maximum Covered Loss 5 50 15 50 60 100 100 100 500 500 500 1000 Determine the likelihood function. (A) f(50) f(50) f(100) f(100) f(500) f(1000) (B) f(50) f(50) f(100) f(100) f(500) f(1000) / {1 - F(1000)} (C) f(5) f(15) f(60) f(100) f(500) f(500) (D) f(5) f(15) f(60) f(100) f(500) f(500) / {1 - F(1000)} (E) f(5) f(15) f(60) {1 - F(100)} {1 - F(500)} f(500)
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 679
24.26 (4, 11/03, Q.6 & 2009 Sample Q. 4) (2.5 points) You are given: (i) Losses follow a Single-parameter Pareto distribution with density function: f(x) = α/xα+1, x > 1, 0 < α < ∞. (ii) A random sample of size five produced three losses with values 3, 6 and 14, and two losses exceeding 25. Determine the maximum likelihood estimate of α. (A) 0.25
(B) 0.30
(C) 0.34
(D) 0.38
(E) 0.42
24.27 (4, 11/04, Q.24 & 2009 Sample Q.150) (2.5 points) You are given: (i) Losses are uniformly distributed on (0, θ), with θ > 150. (ii) The policy limit is 150. (iii) A sample of payments is: 14, 33, 72, 94, 120, 135, 150, 150. Estimate θ by matching the average sample payment to the expected payment per loss. (A) 192
(B) 196
(C) 200
(D) 204
(E) 208
24.28 (4, 5/05, Q.31 & 2009 Sample Q.199) (2.9 points) Personal auto property damage claims in a certain region are known to follow the Weibull distribution: F(x) = 1 - exp[-(x/θ)0.2], x > 0. A sample of four claims is: 130 240 300 540. The values of two additional claims are known to exceed 1000. Determine the maximum likelihood estimate of θ. (A) Less than 300 (B) At least 300, but less than 1200 (C) At least 1200, but less than 2100 (D) At least 2100, but less than 3000 (E) At least 3000
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 680
24.29 (4, 11/06, Q.5 & 2009 Sample Q.250) (2.9 points) You have observed the following three loss amounts: 186 91 66 Seven other amounts are known to be less than or equal to 60. Losses follow an inverse exponential with distribution function F(x) = e−θ/x, x > 0 Calculate the maximum likelihood estimate of the population mode. (A) Less than 11 (B) At least 11, but less than 16 (C) At least 16, but less than 21 (D) At least 21, but less than 26 (E) At least 26 24.30 (3 points) In the previous question, 4, 11/06, Q.5, fit via maximum likelihood an Inverse Pareto Distribution with θ = 100. What is the fitted value of τ? (A) Less than 0.3 (B) At least 0.3, but less than 0.4 (C) At least 0.4, but less than 0.5 (D) At least 0.5, but less than 0.6 (E) At least 0.6
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 681
Solutions to Problems: 24.1. A. The average for data censored from above at 1000 is E[X ∧ 1000]. The observed average is: (150 + 400 + 900 + 1000 + 1000) / 5 = 690. For the exponential distribution E[X ∧ x] = (1/λ)(1 - e-λx). Therefore 690 = (1/λ)(1 - e-λx). Solving numerically λ ≅ 0.000794. (One can try the endpoints of the given intervals, in order to determine which interval λ is in.) 24.2. A. Since the 3rd claim out of 5 is 900, and 3/6 = 50%, the estimated 50th percentile is 900. Thus set .5 = 1 - e-900λ. Solving λ = - ln(.5)/900 = 0.00077. Comment: The percentiles greater than 50% are greater than 900. Since the 4th loss out of 5 is at least 1000 (the maximum payment is 1000, regardless of how large the loss), and 4/6 = 66.7%, the percentiles greater than 66.7 are at least 1000. Percentile matching can not be performed at percentiles greater than 50%. At smaller percentiles, percentile matching proceeds as it would without censoring from above. 1000
∫
24.3. B. For b > 1000, E[X ∧ 1000] = x/b dx + (1 - 1000/b)(1000) = 1000 - 500000/b. 0
The observed average is: (150 + 400 + 900 + 1000 + 1000) / 5 = 690. Setting E[X ∧ 1000] = 690: 1000 - 500000/b = 690. ⇒ b = 1613.
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 682
24.4. C. For b > 1000, the likelihood is: f(150) f(400) f(900) S(1000) S(1000) =
1 1 1 b - 1000 b - 1000 (b - 1000)2 = . b b b b b b5
Loglikelihood is: 2 ln(b - 1000) - 5 ln(b). Set the derivative with respect to b equal to zero:
2 5 - = 0. ⇒ b = 1667. b - 1000 b
Comment: In this case, we can set the empirical and theoretical distribution functions equal at 1000: 1000 / b = 3 /5. ⇒ b = 1667. A graph of the loglikelihood as a function of b, verifying that we have indeed found the maximum: loglikelihood
- 24.5
- 25.0
1500
2000
2500
3000
b
24.5. D. There is only one parameter remaining, so we match means. The observed mean is: (150 + 400 + 900 + 1000 + 1000)/5 = 690. The average for data censored at 1000 is the Limited Expected Value E[X ∧ 1000]. For the LogNormal Distribution, E[X ∧ x] = exp(µ + σ2/2)Φ[(lnx − µ − σ2)/σ] + x {1 − Φ[(lnx − µ)/σ]} . E[X ∧ 1000] = exp(µ + 2)Φ[(6.9078 − µ - 4)/2] + 1000 {1 − Φ[(6.9078 − µ)/2]}. Setting E[X ∧ 1000] = 690, one can solve numerically, µ = 7.04. One can try the endpoints of the given intervals, in order to determine which interval µ is in: mu LEV at 1000
6 507
6.5 597
7 684
7.5 761
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 683
24.6. D. There is only one parameter remaining, so we match means. The observed mean is: (150 + 400 + 900 + 1000 + 1000)/5 = 690. The average for data censored from above at 1000 is E[X ∧ 1000]. For the Pareto Distribution, E[X ∧ x] = {θ/(α−1)}{1−(θ/(θ+x))α−1}. E[X ∧ 1000] = {θ/2}{1−(θ/(θ + 1000))2 }. Setting E[X ∧ 1000] = 690, {θ/2}{1−(θ/(θ + 1000))2 } = 690. ⇒ θ − θ3/(θ + 1000)2 = 1380. θ(θ + 1000)2 − θ3 = 1380(θ + 1000)2 . ⇒ 620θ2 - 1,760,000θ - 1,380,000,000 = 0. θ = {1,760,000 +
1,760,0002 + (4)(620)(1,380,000,000) } / {(2)(620)} = 3479.
24.7. A. (.4)(N+1) = (.4)(6) = 2.4. The estimated 40th percentile is: (.6)(400) + (.4)(900) = 600. Thus set .4 = 1 - (100/600)α. ⇒ 0.6 = 1/6α. ⇒ α = ln(.6)/ln(1/6) = 0.285. 24.8. D. f(x) = exp(-(x/θ))/θ. S(x) = exp(-(x/θ)). The losses not affected by the maximum covered loss each contribute to the loglikelihood: lnf(xi) = - (xi/θ) - ln(θ). Losses affected by the maximum covered loss each contribute: lnS(1000) = - (1000/θ). Thus the loglikelihood for the five observations is: - (200/θ) - ln(θ) - (500/θ) - ln(θ) - (800/θ) - ln(θ) - (1000/θ) - (1000/θ) = -3500/θ - 3ln(θ). Setting the partial derivative of the loglikelihood with respect to theta equal to zero: 0 = 3500/θ2 - 3/θ. ⇒ θ = 3500/3 = 1167.
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 684
24.9. We are not able to plot the censored values, since 1000+ corresponds to a loss of 1000 or more. For the fitted Exponential with θ = 1167, F(200) = 1 - e-200/1167 = 0.1575. F(500) = 1 - e-500/1167 = 0.3485. F(800) = 1 - e-800/1167 = 0.4962. The horizontal coordinates are j/(n+1), where n = 5 the number of points in the data. Thus the plotted points are: (1/6, 0.1575), (2/6, 0.3485), and (3/6, 0.4962). Here is the p-p plot:
Comment: Similar to Example 16.3 in Loss Models. 24.10. D. For the Exponential distribution f(x) = λe-λx. The contribution to the likelihood of claims less than the maximum covered loss is f(x). The contribution to the likelihood of claims equal to the maximum covered loss is: 1 - F(x) = e-λx. Thus the likelihood is: (λe−150λ) (λe−600λ) (e−1000λ) (λe−1300λ) (e−2000λ) = λ3 e−5050λ . 24.11. A. From the previous problem the likelihood is λ3 e−5050λ. Setting the derivative of the likelihood equal to zero: 0 = 3λ2 e−5050λ - 5050 λ3 e−5050λ. Therefore 3 - 5050λ = 0. Thus λ = 0.00059. Comment: Working with the loglikelihood: 3 ln(λ) - 5050λ, would give the same result.
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 685
24.12. B. f(x) = e-x/θ/θ. ln f(x) = -x/θ - lnθ. S(x) = e-x/θ. ln S(x) = -x/θ. loglikelihood = ln f(50) + ln f(100) + 2ln f(200) + ln f(400) + 95ln S(500) = -50/θ - lnθ - 100/θ - lnθ + (2)(-200/θ - lnθ) - 400/θ - lnθ + (95)(500/θ) = -48450/θ - 5lnθ. Setting the derivative equal to zero: 0 = 48450/θ2 - 5/θ. ⇒ θ = 48450/5 = 9690 24.13. B. The payments of $1500 and $8040 correspond to losses of sizes 1500/0.8 = 1875 and 8040/0.8 = 10,050. Since these are both below the maximum covered loss, we can perform percentile matching ignoring the maximum covered loss. The 1875 loss is 2 out of 9, corresponding to the 2/(9+1) = 20th percentile. The 10050 loss is 6 out of 9, corresponding to the 6/(9+1) = 60th percentile. Matching, we want S(1875) = 1 - 0.2 = 0.8 and S(10,050) = 1 - 0.6 = 0.4. S(x) = 1/{1+(x/θ)γ}. Therefore, 1/{1+(1875/θ)γ} = .8, and 1/{1+(10050/θ)γ} = .4. Inverting, (1875/θ)γ + 1 = 1.25, and (10050/θ)γ + 1 = 2.5. (1875/θ)γ = .25, and (10050/θ)γ = 1.5. Thus (10050/1875)γ = 6. γ = ln(6) / ln(10,050/1875) = 1.0672. ⇒ θ = 10,050 /(1.5)1/γ = 6873. ⎛ γ - 1⎞ 1/ γ Mode is: θ ⎜ = (6873) (0.0672 / 2.0672)1/1.0672 = 277. ⎟ ⎝ γ + 1⎠ 24.14. A. One needs to compute the likelihood for each claim and then multiply the terms together. The first claim is below the maximum covered loss, and thus its likelihood is the density f(3900). The second claim is at the 20000 maximum covered loss, and thus its likelihood is: 1 - F(20000) = S(20000). We employ the Distribution Function rather than the density; claims that exhaust the policy are treated as would be grouped data. In this case the second claim is in the interval [20,000 , ∞). The remaining claims are treated similarly to the first two claims, depending on whether or not they exhaust their maximum covered loss: Maximum Amount Paid by Size of Covered Loss The Insurer Accident Likelihood 20,000 13,000 13,000 f(13000) 20,000 20,000 ≥20,000 S(20000) 10,000 500 500 f(500) 10,000 9000 9000 f(9000) 30,000 30,000 ≥30,000 S(30000) 30,000 30,000 ≥30,000 S(30000) Multiplying the individual likelihoods together: f(500) f(9000) f(13000) S(20000) S(30000)2 .
2013-4-6,
Fitting Loss Distributions §24 Censored Data, HCM 10/15/12, Page 686
24.15. B. f(x) = τ(x/θ)τ exp(-(x/θ)τ)/x = 3x2 exp(-(x/θ)3)/θ3. S(x) = exp(-(x/θ)τ) = exp(-(x/θ)3).
The losses not affected by the maximum covered loss contribute to the loglikelihood: ln f(xi) = ln(3) + 2 ln(xi) - (xi/θ)3 - 3 ln(θ).
Losses affected by the maximum covered loss contribute: ln S(100) = -(100/θ)3.
Thus the loglikelihood for the seven observations is:
{ln(3) + 2ln(10) - (10/θ)3 - 3ln(θ)} + {ln(3) + 2ln(15) - (15/θ)3 - 3ln(θ)} + {ln(3) + 2ln(25) - (25/θ)3 - 3ln(θ)} + {ln(3) + 2ln(40) - (40/θ)3 - 3ln(θ)} + {ln(3) + 2ln(70) - (70/θ)3 - 3ln(θ)} - (100/θ)3 - (100/θ)3
= 5ln(3) + 2{ln(10) + ln(15) + ln(25) + ln(40) + ln(70)} - 2,427,000/θ3 - 15ln(θ).
Setting the partial derivative with respect to theta equal to zero: 0 = 7,281,000/θ4 - 15/θ. θ3 = 485,400. θ = 78.6.
Comment: Similar to 4, 11/00, Q.22. [A graph of the loglikelihood as a function of theta, for theta from 70 to 90, shows a maximum near θ = 78.6.]
24.16. C. f(x) = e-x/θ/θ. ln f(x) = -lnθ - x/θ. S(x) = e-x/θ. ln S(25000) = - 25000/θ. Those losses of size less than 25000, each contribute to the loglikelihood: ln f(xi) = -lnθ - xi/θ. The sum of these contribution is: -(36+24) lnθ - (154000+424000)/θ. Each loss of size 25000 or more contributes to the loglikelihood: ln S(25000) = - 25000/θ. The sum of these contributions is: 18(- 25000/θ) = -450000/θ. Therefore, the total loglikelihood is: -60 lnθ - 1028000/θ. Setting the partial derivative with respect to θ of the loglikelihood equal to zero: -60/θ + 1028000/θ2 = 0. ⇒ θ = 1028000/60 = 17,133. Comment: Similar to 4, 5/01, Q.7. 24.17. A. The estimate of the 50th percentile is: (.5)(9 + 1) = 5th loss = 220. The estimate of the 67th percentile is: (.67)(9 + 1) = 6.7th loss = (.3)(235) + (.7)(350) = 315.5. .5 = F(220) = Φ[(ln220 - µ)/σ)]. ⇒ 0 = (ln220 - µ)/σ. ⇒ µ = ln220 = 5.394. .67 = F(315.5) = Φ[(ln315.5 - µ)/σ)]. ⇒ 0.44 = (ln315.5 - µ)/σ.
⇒ σ = (ln315.5 - ln220)/ 0.44 = 0.819. Mean = exp[µ + σ2/2] = exp[5.394 + 0.8192 /2] = 308. 24.18. C. loglikelihood is: ln f(1000) + ln f(3000) + ln f(6000) + ln S(10000). S(x) = {θ/(x + θ)}α. ln S(x) = α ln[θ/(x + θ)]. ln S(10000) = α ln[5/15] = -α ln3. f(x) = αθα/(x + θ)α+1. ln f(x) = lnα + αlnθ - (α + 1)ln[x + θ]. loglikelihood is: 3lnα + 3αln5000 - (α + 1)(ln6000 + ln8000 + ln11000) - α ln3. Setting the partial derivative of the loglikelihood with respect to alpha equal to zero: 0 = 3/α +3ln5000 - (ln6000 + ln8000 + ln11000) - ln3.
⇒ α = 3/{ln6000 + ln8000 + ln11000 + ln3 - 3ln5000} = 1.18. 24.19. D. The likelihood function is the product of the likelihoods of the individual observations. For each censored point, all we know is that the original claim value was greater than or equal to c, and this has likelihood: 1 - F(c;θ). The product of these likelihoods is the second term of choice D. For the uncensored points, the likelihood is f(xi;θ). The product of these likelihoods is the first term of choice D. Thus the likelihood function is choice D.
24.20. D. f(x) = τ(x/θ)τ exp(-(x/θ)τ)/x = 2x exp(-(x/θ)2)/θ2. S(x) = exp(-(x/θ)τ) = exp(-(x/θ)2).
The losses not affected by the maximum covered loss each contribute to the loglikelihood: ln f(xi) = ln(2) + ln(xi) - (xi/θ)2 - 2ln(θ).
Losses affected by the maximum covered loss each contribute: ln S(50) = -(50/θ)2.
Thus the loglikelihood for the five observations is:
{ln(2) + ln(20) - (20/θ)2 - 2ln(θ)} + {ln(2) + ln(30) - (30/θ)2 - 2ln(θ)} + {ln(2) + ln(45) - (45/θ)2 - 2ln(θ)} - (50/θ)2 - (50/θ)2
= 3ln(2) + ln(20) + ln(30) + ln(45) - 8325/θ2 - 6ln(θ).
Setting the partial derivative with respect to theta equal to zero: 0 = 16,650/θ3 - 6/θ. ⇒ θ2 = 16,650/6 = 2775. ⇒ θ = 52.7.
Comment: Alternately, for an Exponential Distribution fit via maximum likelihood:
θ^ = (sum of payments) / (number of uncensored values).
We can transform the Weibull into “Exponential Land” by raising things to the power τ, then fit via maximum likelihood, and then translate back to “Weibull Land” by taking the result to the 1/τ power:
θ^ = {Σ Min[xi, u]τ / (number of uncensored values)}1/τ = {(202 + 302 + 452 + 502 + 502)/3}1/2 = 52.7.
Without truncating or censoring, to fit a Weibull with tau fixed via maximum likelihood: θ^ = (Σ xiτ / N)1/τ.
With both truncating at di and censoring at ui, to fit a Weibull with tau fixed via maximum likelihood:
θ^ = {Σ (Min[xi, ui]τ - diτ) / (number of uncensored values)}1/τ.
[A graph of the loglikelihood as a function of theta, for theta from 48 to 60, shows a maximum near θ = 52.7.]
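Here is a minimal Python sketch of the “Exponential Land” shortcut just described (the function name and the use of numpy are my own; the data are the five observations of 24.20 and 24.21).

```python
import numpy as np

def weibull_theta_censored(losses, censor_points, tau):
    """Weibull theta MLE with tau fixed, data censored from above (no truncation).
    losses: uncensored loss amounts; censor_points: maximum covered losses
    for the censored observations."""
    total = np.sum(np.array(losses) ** tau) + np.sum(np.array(censor_points) ** tau)
    return (total / len(losses)) ** (1.0 / tau)

print(weibull_theta_censored([20, 30, 45], [50, 50], tau=2))  # about 52.7 (24.20)
print(weibull_theta_censored([20, 30, 45], [50, 50], tau=1))  # 65, the Exponential case (24.21)
```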
24.21. E. A Weibull with τ = 1 is an Exponential Distribution. f(x) = e-x/θ/θ. ln f(x) = -x/θ - lnθ. S(x) = e-x/θ. ln S(x) = -x/θ.
loglikelihood = ln f(20) + ln f(30) + ln f(45) + ln S(50) + ln S(50) = -20/θ - lnθ - 30/θ - lnθ - 45/θ - lnθ - 50/θ - 50/θ = -195/θ - 3lnθ.
Setting the derivative equal to zero: 0 = 195/θ2 - 3/θ. ⇒ θ = 195/3 = 65.
24.22. D. f(x) = e-x/θ/θ. ln f(x) = -lnθ - x/θ. S(x) = e-x/θ. ln S(1000) = -1000/θ.
Those losses of size less than 1000 each contribute to the loglikelihood: ln f(xi) = -lnθ - xi/θ. The sum of these contributions is: -62 lnθ - 28,140/θ.
Each loss of size 1000 or more contributes to the loglikelihood: ln S(1000) = -1000/θ. The sum of these contributions is: (38)(-1000/θ) = -38,000/θ.
Therefore, the total loglikelihood is: -62 lnθ - 66,140/θ.
Setting the partial derivative with respect to θ of the loglikelihood equal to zero: -62/θ + 66,140/θ2 = 0. θ = 66,140/62 = 1067.
Alternately, we get the same result as for fitting an Exponential Distribution via maximum likelihood to ungrouped data:
θ = (sum of the payments) / (number of uncensored values) = 66,140/62 = 1067.
Comment: While this is grouped data, we are given the sum of losses for each interval in addition to the number of items in each interval. In the case of the Exponential Distribution, the contribution to the loglikelihood of the losses of size less than 1000 only depends on the number of data points and the sum of their values. Therefore, we have just enough data to proceed in the same manner as if the data were ungrouped. The Exponential Distribution is the exception! Due to the simple form of its density, the sum of uncensored losses is enough in order to determine the maximum likelihood Exponential. For other distributions there may be summary statistics that determine the maximum likelihood fit, but if so they are more complicated than the sum of the uncensored losses.
24.23. A. The likelihood is: (1 - e-1000/θ)62 (e-1000/θ)38. Let y = e-1000/θ. Likelihood: (1 - y)62 y38.
Setting the derivative with respect to y equal to zero: 0 = -62(1 - y)61 y38 + 38(1 - y)62 y37. 0 = -62y + 38(1 - y). ⇒ y = 0.38. ⇒ e-1000/θ = 0.38. ⇒ θ = 1034.
24.24. E. E[X ∧ 1000] = {28,140 + (1000)(38)}/100 = 661.40.
For the Exponential Distribution, E[X ∧ x] = θ(1 - e-x/θ). We want 661.4 = θ(1 - e-1000/θ).
theta    theta(1 - exp[-1000/theta])
1040     642.4
1060     647.3
1080     652.1
1100     656.8
1120     661.4
Checking the choices, this is so for θ = 1120.
Comment: With censored data, for the Exponential Distribution the method of moments is not equal to the method of maximum likelihood.
24.25. E. Each claim that is below the maximum covered loss contributes f(x) to the likelihood. Each claim that exhausts the maximum covered loss contributes 1 - F(x) = S(x) to the likelihood.
Claim Payment   Maximum Covered Loss   Exhausts Maximum Covered Loss   Contribution to the Likelihood
5               50                     No                              f(5)
15              50                     No                              f(15)
60              100                    No                              f(60)
100             100                    Yes                             1 - F(100)
500             500                    Yes                             1 - F(500)
500             1000                   No                              f(500)
The likelihood is the product of the contributions: f(5) f(15) f(60) [1 - F(100)] [1 - F(500)] f(500).
Comment: f and F are the density and distribution function of the unlimited ground up losses.
24.26. A. S(x) = 1/xα. S(25) = 1/25α. ln S(25) = -α ln(25). f(x) = α/xα+1. ln f(x) = lnα - (α+1)lnx.
Each small loss contributes ln f(x), while each loss censored from above at 25 contributes ln S(25), and thus the loglikelihood is: 3 lnα - (α+1)(ln3 + ln6 + ln14) - 2α ln(25).
Setting the derivative with respect to α equal to zero: 0 = 3/α - (ln3 + ln6 + ln14) - 2 ln(25).
⇒ α = 3/{ln3 + ln6 + ln14 + 2ln(25)} = 0.251.
24.27. E. The average of the observations is: (14 + 33 + 72 + 94 + 120 + 135 + 150 + 150)/8 = 96.
For the uniform distribution on (0, θ), with θ > 150:
E[X ∧ 150] = (integral from 0 to 150 of x f(x) dx) + 150 S(150) = (integral from 0 to 150 of x/θ dx) + 150(1 - 150/θ) = 1502/(2θ) + 150 - 1502/θ = 150 - 11,250/θ.
96 = 150 - 11,250/θ. ⇒ θ = 208.3.
24.28. E. f(x) = 0.2 x-0.8 exp[-(x/θ)0.2]/θ0.2. ln f(x) = ln(0.2) - 0.8 ln(x) - (x/θ)0.2 - 0.2 ln(θ). ln S(x) = -(x/θ)0.2. ln S(1000) = -(1000/θ)0.2.
Loglikelihood is: 4 ln(0.2) - 0.8{ln(130) + ln(240) + ln(300) + ln(540)} - {1300.2 + 2400.2 + 3000.2 + 5400.2 + (2)10000.2}/θ0.2 - 0.8 ln(θ) = -20.2505/θ0.2 - 0.8 ln(θ) + constants.
Setting the derivative of the loglikelihood equal to zero: 0 = (0.2)(20.2505)/θ1.2 - 0.8/θ. ⇒ θ0.2 = 5.0626. ⇒ θ = 3326.
Comment: A Weibull Distribution, with τ = 0.2.
24.29. A. The seven amounts that are in the interval from 0 to 60 each contribute F(60) to the likelihood; the larger losses whose values we know each contribute f(x). For the Inverse Exponential, F(x) = e-θ/x and f(x) = θ e-θ/x/x2. The likelihood is:
F(60)7 f(66) f(91) f(186) = (e-θ/60)7 (θ e-θ/66/662) (θ e-θ/91/912) (θ e-θ/186/1862) = exp[-θ(7/60 + 1/66 + 1/91 + 1/186)] θ3/{662 912 1862}.
loglikelihood is: -0.1482θ + 3 ln(θ) + constants. Set the partial derivative equal to zero: 0 = -0.1482 + 3/θ. ⇒ θ = 3/0.1482 = 20.2.
The mode of an Inverse Exponential is: θ/2 = 20.2/2 = 10.1.
Alternately, let y = 1/x; then y follows an Exponential with mean µ = 1/θ.
µ = (Sum of Y)/(number of uncensored values) = (7/60 + 1/66 + 1/91 + 1/186)/3. θ = 3/(7/60 + 1/66 + 1/91 + 1/186) = 20.2. Mode is: θ/2 = 20.2/2 = 10.1.
Comment: The data has been censored from below.
24.30. B. For the Inverse Pareto, F(x) = {x/(x + θ)}τ = (1 + θ/x)-τ, and f(x) = τθ xτ-1/(x + θ)τ+1.
ln F(x) = τ ln[x] - τ ln[x + θ].
lnf(x) = ln[τ] + ln[θ] + (τ-1)ln[x] - (τ+1)ln[x+θ].
The seven amounts that are in the interval from 0 to 60, each contribute F(60) to the likelihood; the larger losses whose values we know each contribute f(x). The likelihood is: F(60)7 f(66) f(91) f(186) The loglikelihood is: 7 ln[F(60)] + lnf(66) + lnf(91) + lnf(186) = 7 τ ln[60] - 7τ ln[160] + 3ln[τ] + 3ln[100] + (τ-1)(ln[66] + ln[91] + ln[186]) - (τ+1)(ln[166] + ln[191] + ln[286]) . Set the partial derivative with respect to tau equal to zero: 0 = 7 ln[60] - 7 ln[160] + 3/τ + ln[66] + ln[91] + ln[186] - (ln[166] + ln[191] + ln[286]). ⇒ 3/τ = 8.960. ⇒ τ = 0.335. Comment: The data has been censored from below.
Section 25, Fitting to Data Truncated and Censored293
In practical applications, it is quite common to work with data truncated and shifted from below at d, and censored from above at u. Then the distribution function becomes in terms of the size of the payment x:
G(x) = {F(x + d) - F(d)}/S(d), for 0 < x < u - d, and G(x) = 1 for x = u - d.
The distribution function becomes in terms of the size of the loss x:
G(x) = {F(x) - F(d)}/S(d), for d < x < u, and G(x) = 1 for x = u.
In this combined situation with both a deductible and a maximum covered loss, one combines the mathematical adjustments required in each of the separate situations.
Deductible   Maximum Covered Loss   Contribution to the Likelihood of a Loss of Size x
None         None                   f(x)
d            None                   f(x)/S(d) for x > d
None         u                      f(x) for x < u, and S(u) for x ≥ u
d            u                      f(x)/S(d) for u > x > d, and S(u)/S(d) for x ≥ u
Note that the final situation covers all the other cases. No deductible ⇔ d = 0. No maximum covered loss ⇔ u = ∞.
293
See “Mahlerʼs Guide to Survival Analysis” for additional examples of applying maximum likelihood to data that has been truncated and/or censored.
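As an illustration of the table above, here is a small Python sketch (the function name and the example Exponential severity are my own assumptions, not part of the text) that returns one observation's loglikelihood contribution given its deductible d and maximum covered loss u.

```python
import math

def loglik_contribution(loss, d, u, ln_f, ln_S):
    """Loglikelihood contribution of one ground-up loss under a deductible d
    and maximum covered loss u (use d = 0 and u = infinity when absent)."""
    if loss >= u:                    # censored from above
        return ln_S(u) - ln_S(d)
    return ln_f(loss) - ln_S(d)      # truncated and shifted from below at d

# Example: Exponential severity with mean theta.
theta = 42250.0
ln_f = lambda x: -x / theta - math.log(theta)
ln_S = lambda x: -x / theta
print(loglik_contribution(3500, 500, 25000, ln_f, ln_S))
```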
Exercise: It is assumed that prior to the effects of deductibles and maximum covered losses, the losses follow a Distribution F(x), with density f(x) and Survival Function S(x). The following data on five payments by an insurer has been collected from policies each with a deductible of 10 and a maximum covered loss of 50: 6, 12, 27, 40, 40. Determine the likelihood.
[Solution: The payment of 6 corresponds to a loss of 16. A payment of 40 corresponds to a loss of size 50 or more. The contributions are: f(16)/S(10), f(22)/S(10), f(37)/S(10), S(50)/S(10), S(50)/S(10).
The likelihood is: f(16) f(22) f(37) S(50)2 / S(10)5.]
Exercise: If in the previous exercise, it is assumed that prior to the effects of deductibles and maximum covered losses, the losses follow a ParaLogistic Distribution, determine the loglikelihood.
[Solution: The likelihood is: f(16) f(22) f(37) S(50)2 / S(10)5. Therefore, the loglikelihood is: ln f(16) + ln f(22) + ln f(37) + 2 ln S(50) - 5 ln S(10).
For a ParaLogistic Distribution, F(x) = 1 - {1/(1 + (x/θ)α)}α and f(x) = α2 θ-α xα-1 (1 + (x/θ)α)-(α+1).
Therefore ln f(x) = 2 ln(α) - α ln(θ) + (α-1) ln(x) - (α+1) ln(1 + (x/θ)α), and ln S(x) = -α ln[1 + (x/θ)α].
Therefore, the loglikelihood is: 6 ln(α) - 3α ln(θ) + (α-1){ln(16) + ln(22) + ln(37)} - (α+1){ln[1 + (16/θ)α] + ln[1 + (22/θ)α] + ln[1 + (37/θ)α]} - 2α ln[1 + (50/θ)α] + 5α ln[1 + (10/θ)α].]
One could then solve numerically for the parameters that maximize this loglikelihood. The solution turns out to be: α = 1.38 and θ = 41.6, with a loglikelihood of -14.1395.
Note that in the above exercise, each of the losses of size x less than the maximum covered loss u, but greater than the deductible d, results in a small payment and contributes to the loglikelihood: ln[f(x)/S(d)] = ln f(x) - ln S(d). Each of the losses of size x, x ≥ u, contributes to the loglikelihood: ln[S(u)/S(d)] = ln S(u) - ln S(d).
Thus with both a deductible d and a maximum covered loss u, the method of maximum likelihood requires that one maximize: Σ ln f(xi) + M ln S(u) - N ln S(d), where M is the number of losses greater than or equal to the maximum covered loss, N is the total number of losses, and one sums ln f(xi) over the losses less than the maximum covered loss u, but greater than the deductible d.
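To make the "solve numerically" step concrete, here is a hedged Python sketch using scipy (a sketch under my own choices of function names and starting values; the ParaLogistic formulas are those given above) that maximizes this loglikelihood over (α, θ).

```python
import numpy as np
from scipy.optimize import minimize

uncensored = np.array([16.0, 22.0, 37.0])   # losses between the deductible and the limit
d, u, n_cens = 10.0, 50.0, 2                # deductible, maximum covered loss, censored count

def ln_S(x, a, t):   # ParaLogistic survival function
    return -a * np.log1p((x / t) ** a)

def ln_f(x, a, t):   # ParaLogistic density
    return 2 * np.log(a) - a * np.log(t) + (a - 1) * np.log(x) - (a + 1) * np.log1p((x / t) ** a)

def negloglik(params):
    a, t = params
    if a <= 0 or t <= 0:
        return np.inf
    return -(np.sum(ln_f(uncensored, a, t)) + n_cens * ln_S(u, a, t)
             - (len(uncensored) + n_cens) * ln_S(d, a, t))

res = minimize(negloglik, x0=[1.0, 30.0], method="Nelder-Mead")
print(res.x, -res.fun)   # roughly alpha = 1.38, theta = 41.6, loglikelihood = -14.14
```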
Exponential Distribution, Maximum Likelihood:
Assume that we have the following data:
Deductible   Maximum Covered Loss   Payment   Loss
500          25,000                 3000      3500
500          50,000                 49,500    50,000 or more
1000         25,000                 24,000    25,000 or more
1000         50,000                 8000      9000
The loglikelihood is: ln[f(3500)/S(500)] + ln[S(50000)/S(500)] + ln[S(25000)/S(1000)] + ln[f(9000)/S(1000)].
Let us assume that prior to the effects of truncation and censoring, the losses follow an Exponential Distribution. For an Exponential Distribution, f(x) = e-x/θ/θ and S(x) = e-x/θ.
The loglikelihood is: -3500/θ - ln(θ) + 500/θ - 50000/θ + 500/θ - 25000/θ + 1000/θ - 9000/θ - ln(θ) + 1000/θ = -84,500/θ - 2 ln(θ).
Setting the derivative of the loglikelihood with respect to θ equal to zero: 84,500/θ2 - 2/θ = 0. ⇒ θ = 84,500/2 = 42,250.
Note that in general for an Exponential Distribution, each censored value contributes: ln[S(ui)/S(ti)] = -ui/θ + ti/θ = -(ui - ti)/θ, and each uncensored value contributes: ln[f(xi)/S(ti)] = -xi/θ - ln(θ) + ti/θ = -(xi - ti)/θ - ln(θ).
Therefore, the loglikelihood is: -(sum of payments)/θ - (number of uncensored values) ln(θ).
Setting the derivative of the loglikelihood with respect to θ equal to zero: (sum of payments)/θ2 - (number of uncensored values)/θ = 0.
Therefore, the maximum likelihood fit for an Exponential Distribution is: θ^ = (sum of payments) / (number of uncensored values).
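A minimal Python check of this shortcut on the four observations above (the function name is my own):

```python
def exponential_theta_mle(payments, n_uncensored):
    """Exponential theta MLE with truncation from below and censoring from above:
    theta-hat = (sum of payments) / (number of uncensored values)."""
    return sum(payments) / n_uncensored

# Payments: 3000, 49,500, 24,000, 8000; two of the four losses are uncensored.
print(exponential_theta_mle([3000, 49500, 24000, 8000], 2))  # 42,250
```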
Pareto Distribution with θ fixed, Maximum Likelihood:294
Assume that we have the previous data:
Deductible   Maximum Covered Loss   Payment   Loss
500          25,000                 3000      3500
500          50,000                 49,500    50,000 or more
1000         25,000                 24,000    25,000 or more
1000         50,000                 8000      9000
As before, the loglikelihood is: ln[f(3500)/S(500)] + ln[S(50000)/S(500)] + ln[S(25000)/S(1000)] + ln[f(9000)/S(1000)].
For a Pareto Distribution, f(x) = α θα/(θ + x)α+1 and S(x) = {θ/(θ + x)}α. The loglikelihood is:
ln[α(θ + 500)α/(θ + 3500)α+1] + ln[(θ + 500)α/(θ + 50000)α] + ln[(θ + 1000)α/(θ + 25000)α] + ln[α(θ + 1000)α/(θ + 9000)α+1]
= 2 ln[α] + α ln[(θ + 500)/(θ + 3500)] - ln[θ + 3500] + α ln[(θ + 500)/(θ + 50000)] + α ln[(θ + 1000)/(θ + 25000)] + α ln[(θ + 1000)/(θ + 9000)] - ln[θ + 9000].
Let us assume θ is known and we wish to fit α via maximum likelihood. Then setting the derivative of the loglikelihood with respect to α equal to zero:
0 = 2/α + ln[(θ + 500)/(θ + 3500)] + ln[(θ + 500)/(θ + 50000)] + ln[(θ + 1000)/(θ + 25000)] + ln[(θ + 1000)/(θ + 9000)].
⇒ α = 2/(ln[(θ + 3500)/(θ + 500)] + ln[(θ + 50000)/(θ + 500)] + ln[(θ + 25000)/(θ + 1000)] + ln[(θ + 9000)/(θ + 1000)]).
Exercise: If θ = 10,000, what is the fitted value of α?
[Solution: α^ = 2/(ln[13500/10500] + ln[60000/10500] + ln[35000/11000] + ln[19000/11000]) = 0.541.]
In general, when fitting via maximum likelihood to a Pareto Distribution with θ known:
α^ = (number of uncensored values) / Σ ln[(θ + xi)/(θ + di)],
where xi is the ith loss, possibly censored from above, and di is the deductible for the policy for xi.
294
This case is much less important than that of an Exponential Distribution.
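A short Python sketch of this closed form, checked against the exercise above (the function name is my own):

```python
import math

def pareto_alpha_mle(losses, deductibles, theta, n_uncensored):
    """Pareto alpha MLE with theta known; losses may be censored from above."""
    denom = sum(math.log((theta + x) / (theta + d)) for x, d in zip(losses, deductibles))
    return n_uncensored / denom

# Losses 3500 and 9000 are uncensored; 50,000 and 25,000 are censoring points.
print(pareto_alpha_mle([3500, 50000, 25000, 9000], [500, 500, 1000, 1000],
                       theta=10000, n_uncensored=2))   # about 0.541
```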
Weibull Distribution with τ fixed, Maximum Likelihood:295
In general, for the Weibull Distribution with tau fixed, the maximum likelihood fit to ungrouped data is:
θ^ = {Σ (Min[xi, ui]τ - diτ) / (number of uncensored values)}1/τ.
Transform the Weibull into “Exponential Land” by raising things to the power τ, then fit via maximum likelihood, and then translate back to “Weibull Land” by taking the result to the 1/τ power.
Assume that we have the previous data:
Deductible   Maximum Covered Loss   Payment   Loss
500          25,000                 3000      3500
500          50,000                 49,500    50,000 or more
1000         25,000                 24,000    25,000 or more
1000         50,000                 8000      9000
Using the shortcut to fit the Weibull with τ = 2:
Σ (Min[xi, ui]τ - diτ) = (35002 - 5002) + (50,0002 - 5002) + (25,0002 - 10002) + (90002 - 10002) = 3216 million.
θ^ = {Σ (Min[xi, ui]τ - diτ) / (number of uncensored values)}1/τ = (3216 million / 2)1/2 = 40,100.
295
For τ = 1, the Weibull Distribution is an Exponential Distribution.
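The same shortcut in Python for the data above with τ = 2 (a sketch; the function name is my own):

```python
def weibull_theta_trunc_cens(losses, limits, deductibles, tau, n_uncensored):
    """Weibull theta MLE with tau fixed, data truncated at d_i and censored at u_i."""
    total = sum(min(x, u) ** tau - d ** tau
                for x, u, d in zip(losses, limits, deductibles))
    return (total / n_uncensored) ** (1.0 / tau)

print(weibull_theta_trunc_cens([3500, 50000, 25000, 9000],
                               [25000, 50000, 25000, 50000],
                               [500, 500, 1000, 1000], tau=2, n_uncensored=2))  # about 40,100
```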
Problems:
25.1 (2 points) The size of accidents is assumed to follow a Pareto Distribution with α = 3 and θ = 10,000. A policy is written with a $500 deductible and a $25,000 maximum covered loss. Assume that the maximum covered loss is applied first and then the deductible is applied. Which of the following is the Distribution function for data collected from this policy, where x is the payment by the insurer?
A. 1.1576 - {10,500/(10,000 + x)}3, for x < 25,000; 1 for x = 25,000
B. 1.1576 - {10,500/(10,000 + x)}3, for x < 24,500; 1 for x = 24,500
C. 1 - {10,500/(10,500 + x)}3, for x < 25,000; 1 for x = 25,000
D. 1 - {10,500/(10,500 + x)}3, for x < 24,500; 1 for x = 24,500
E. None of the above.
Use the following information for the next three questions:
From a policy with a $10,000 deductible and $100,000 maximum covered loss, you observe 6 claims with the following amounts paid by the insurer: $22,000, $28,000, $39,000, $51,000, $90,000, and $90,000.
25.2 (3 points) You fit a Pareto Distribution with α = 3 to this data via percentile matching. The matching is performed at the payment of size $51,000. What is the fitted θ parameter?
Use the following information for the next three questions: From a policy with a $10,000 deductible and $100,000 maximum covered loss, you observe 6 claims with the following amounts paid by the insurer: $22,000, $28,000, $39,000, $51,000, $90,000, and $90,000. 25.2 (3 points) You fit a Pareto Distribution with α = 3 to this data via percentile matching. The matching is performed at the payment of size $51,000. What is the fitted θ parameter? (A) 50,000
(B) 75,000
(C) 100,000 (D) 125,000 (E) 150,000
25.3 (2 points) You fit an Exponential Distribution to this data via the method of moments. What is the fitted θ parameter? (A) 76,000
(B) 77,000
(C) 78,000
(D) 79,000
(E) 80,000
25.4 (3 points) You fit a Pareto Distribution with θ = 100,000 to this data via the method of maximum likelihood. What is the fitted α parameter? (A) 1.25
(B) 1.50
(C) 1.75
(D) 2.00
(E) 2.25
25.5 (3 points) A policy has an ordinary deductible of 5, coinsurance of 90%, and a maximum covered loss of 100 (before application of the deductible and coinsurance). You observe 3 individual payments: 9, 63, 85.5. It is assumed that the ground-up losses follow a LogLogistic Distribution, with θ = 1 and γ unknown. Determine the likelihood function for estimating γ via maximum likelihood.
A. 1 / {(1 + 15γ)(1 + 75γ)(1 + 100γ)}
B. (1 + 5γ)3 / {(1 + 15γ)(1 + 75γ)(1 + 100γ)}
C. (1 + 5γ)3 γ2 1125γ-1 / {(1 + 15γ)2 (1 + 75γ)2 (1 + 100γ)}
D. (1 + 5γ)3 γ3 112,500γ-1 / {(1 + 15γ)2 (1 + 75γ)2 (1 + 100γ)2}
E. None of the above.
25.6 (3 points) You observe 6 claims from three policies with different maximum covered losses and deductible amounts:
Claim Number   Policy Number   Deductible Amount   Maximum Covered Loss   Amount Paid by The Insurer
C1673          MAC29           none                20,000                 3200
C1674          MAC29           none                20,000                 20,000
C2313          MAC23           100                 10,000                 500
C4745          MAC23           100                 10,000                 9900
C2720          MAC13           500                 30,000                 6000
C3754          MAC13           500                 30,000                 29,500
You assume that in the absence of any deductibles and maximum covered losses that the losses would follow a Distribution Function F(x), with probability density function f(x). Let S(x) = 1 - F(x). What is the likelihood?
A. f(600) f(3200) f(6500) f(10,000) f(20,000) f(30,000) / {S(100) S(500)}
B. f(600) f(3200) f(6500) S(10,000) S(20,000) S(30,000) / {S(100) S(500)}
C. f(600) f(3200) f(6500) S(10,000) S(20,000) S(30,000) / {S(100)2 S(500)2}
D. f(600) f(3200) f(6500) {S(10,000) - F(100)} S(20,000) {S(30,000) - F(500)} / {S(100)2 S(500)2}
E. None of the above.
Use the following information for the next two questions:
You observe the following payments from 10 losses:
Deductible   Maximum Covered Loss   Payment   Size of Loss
None         None                   300       300
None         None                   70,000    70,000
500          None                   11,500    12,000
1000         None                   1,000     2,000
None         25,000                 11,000    11,000
None         50,000                 50,000    50,000 or more
500          25,000                 14,500    15,000
500          100,000                200       700
1000         50,000                 49,000    50,000 or more
1000         100,000                28,000    29,000
25.7 (3 points) An Exponential Distribution is fit to the ground up unlimited losses via maximum likelihood. Determine the estimated mean of the Exponential Distribution, θ. A. 20,000
B. 25,000
C. 30,000
D. 35,000
E. 40,000
25.8 (4 points) A Weibull Distribution with τ = 1/2 is fit to the ground up unlimited losses via maximum likelihood. Determine the estimated θ. A. 20,000
B. 25,000
C. 30,000
D. 35,000
E. 40,000
25.9 (3 points) You are given the following 6 losses (before the deductible is applied):
Loss      Number of Losses   Deductible   Policy Limit
800       1                  0            10,000
5000      2                  1000         10,000
≥25,000   2                  0            25,000
4000      1                  2000         50,000
Past experience indicates that these losses follow a Pareto distribution with parameters α and θ = 5,000. Determine the maximum likelihood estimate of α.
(A) Less than 1.0
(B) At least 1.0, but less than 1.5
(C) At least 1.5, but less than 2.0
(D) At least 2.0, but less than 2.5
(E) At least 2.5
25.10 (2 points) You observe 4 claims from policies with different maximum covered losses and deductible amounts:
Deductible Amount   Maximum Covered Loss   Amount Paid by the Insurer
none                25,000                 25,000
none                50,000                 3,000
1000                50,000                 49,000
2000                100,000                3,000
You assume that in the absence of any deductibles and maximum covered losses that the losses would follow a Distribution Function F(x), with probability density function f(x). Let S(x) = 1 - F(x). Which of the following is the likelihood?
A. f(3000)2 f(25,000) f(49,000)
B. f(3000) f(5000) S(25,000) S(50,000)
C. f(3000) f(5000) f(25,000) f(50,000) / {S(1000) S(2000)}
D. f(3000) f(5000) S(25,000) {S(50,000) - S(1000)} / {S(1000) S(2000)}
E. None of the above.
25.11 (3 points) For an insurance policy:
(i) The policy has an ordinary deductible of 500 per loss.
(ii) YP is the claim payment per payment random variable.
(iii) Past experience indicates that YP follows a Pareto distribution with parameters α and θ = 3000.
(iv) A random sample of five claim payments for this policy is: 500, 1000, 2000, 4500+, 9500+, where + indicates that the data has been right censored.
Determine the maximum likelihood estimate of Pr(YP > 4000).
A. 30% B. 34% C. 38% D. 42% E. 46%
25.12 (2 points) You are given:
(i) A sample of five losses is: 1500, 2000, 2000, 2500, ≥ 5000. (The fifth loss is only known to be at least 5000.)
(ii) No information is available about losses of 1000 or less.
(iii) Losses are assumed to follow an exponential distribution with mean θ. Determine the maximum likelihood estimate of θ. (A) 2000
(B) 2200
(C) 2400
(D) 2600
(E) 2800
25.13 (2 points) You observe 4 claim payments from a policy with a $1000 franchise deductible and a $25,000 maximum covered loss: $1500, $4000, $9000, $25,000. You assume that in the absence of any deductibles and maximum covered losses that the losses would follow a Distribution Function F(x), with probability density function f(x). S(x) = 1 - F(x). What is the likelihood?
25.14
(4B, 5/98, Q.27 & 11/98, Q.10) (2 points) You are given the following:
• A portfolio contains two types of policies.
• Type Y policies have no deductible and a maximum covered loss of k.
• Type Z policies have a deductible of k and no maximum covered loss.
• A total of 50 losses that are less than k have been recorded on Type Y policies, y1, y2, ..., y50.
• A total of 75 losses that exceed k have been recorded on Type Y policies.
• Losses that are less than k are not recorded on Type Z policies.
• A total of 75 losses that exceed k have been recorded on Type Z policies, z1, z2, ..., z75.
• The random variable X underlying the losses on both types of policies has the density function f(x;θ) and the cumulative distribution function F(x;θ).
Which of the following functions must be maximized to find the maximum likelihood estimate of θ? 50
A.
50
∏ f(yi ; θ)
∏ f(zj; θ) {1 - F(k; θ)}75
j=1
i=1
j=1
75
50
75
∏ f(zj; θ)
i=1 50
∏ f(yi ; θ) C.
75
∏ f(yi ; θ)
i=1
∏ f(zj; θ)
∏ f(yi ; θ)
j=1
{1 - F(k;
D.
θ)} 75
50
75
i=1
j=1
∏ f(yi ; θ) ∏ f(zj; θ) E.
B.
i=1
75
∏ f(zj; θ)
{1 - F(k; θ)}75
j=1
F(k; θ) 75
F(k; θ) 75
{1 - F(k; θ)} 75
25.15 (Course 4 Sample Exam 2000, Q.18) A group consisting of ten independent lives has a health insurance policy with an ordinary deductible of 250, coinsurance of 80%, and a maximum covered loss of 1,000 (before application of the deductible and coinsurance). In the past year, the following individual payments were made to members of the group: 40, 120, 160, 280, 600 (loss in excess of maximum), 600(loss in excess of maximum). Determine the likelihood function for estimating the parameters of the ground-up loss distribution using f(x) to represent the probability density function and F(x) to represent the cumulative distribution function.
25.16 (4, 5/05, Q.27 & 2009 Sample Q.196) (2.9 points) You are given the following 20 bodily injury losses (before the deductible is applied):
Loss      Number of Losses   Deductible   Policy Limit
750       3                  200          ∞
200       3                  0            10,000
300       4                  0            20,000
>10,000   6                  0            10,000
400       4                  300          ∞
Past experience indicates that these losses follow a Pareto distribution with parameters α and θ = 10,000. Determine the maximum likelihood estimate of α.
(A) Less than 2.0
(B) At least 2.0, but less than 3.0
(C) At least 3.0, but less than 4.0
(D) At least 4.0, but less than 5.0
(E) At least 5.0
25.17 (4, 5/07, Q.1) (2.5 points) For a dental policy, you are given:
(i) Ground-up losses follow an exponential distribution with mean θ.
(ii) Losses under 50 are not reported to the insurer.
(iii) For each loss over 50, there is a deductible of 50 and a policy limit of 350.
(iv) A random sample of five claim payments for this policy is: 50, 150, 200, 350+, 350+, where + indicates that the original loss exceeds 400.
Determine the likelihood function L(θ).
A. e-1100/θ/θ5
B. e-1300/θ/θ5
C. e-1350/θ/θ5
D. e-1100/θ/θ3
E. e-1350/θ/θ3
25.18 (1 point) In 4, 5/07, Q.1, determine the maximum likelihood value of θ.
2013-4-6, Fitting Loss Distributions §25 Trunc. & Censored Data, HCM 10/15/12, Page 704 Solutions to Problems: 25.1. D. Due to the deductible of 500, G(x) = {F(x+500) - F(500) }/{1 - F(500)} = {(10000/10500)3 - (10000/(10500 +x))3 }/(10000/10500)3 = 1 - (10500/(10500 +x))3 , x< 24500. G(24500) = 1, since at the maximum covered loss the insurer pays: 25000 - 500 = 24500. 25.2. E. For the data truncated from below, G(x) = (F(x) - F(d))/{1 - F(d)}. A payment of $51,000 corresponds to a loss of $61,000. The payment of $51,000 is the 4th of 6 payments, so it corresponds to an estimate of the 4/7 percentile of G. F(10,000) = 1 - (1 + 10000/θ)−3 . F(61,000) = 1 - (1 + 61000/θ)−3. Matching the observed and theoretical percentiles: 4/7 = G(61000) = (F(61000)-F(10000))/S(10000) = 1 - {(1 + 10000/θ)/(1 + 61000/θ)}3 . (7/3)1/3 = (θ + 61000)/(θ + 10000). Solving, θ = (61000 - 13264)/(1.3264 - 1) = 146,250. Comment: Since the payment of $51,000 does not exhaust the maximum covered loss, we can do percentile matching and we can ignore the maximum covered loss in doing so. As per usual, we assume the data prior to the effects of truncation and censoring are drawn from a Pareto Distribution. 25.3. C. X = ($22,000 + $28,000 + $39,000 + $51,000 + $90,000 + $90,000)/6 = $53,333. The theoretical mean of the nonzero payments with a maximum covered loss of 100,000 and a deductible of 10,000 is: {E[X ∧ 100000] - E[X ∧ 10000]}/S(10000). For the Exponential Distribution, E[X {E[X
∧
∧
x] = θ(1- e-x/θ). Thus,
100000] - E[X ∧ 10000]}/S(10000) = θ(e-10000/θ - e-100000/θ)/e-10000/θ
= θ(1 - e-90000/θ). Setting the observed and theoretical means equal to each other: $53,333 = θ(1 - e-90000/θ). Solving numerically, θ = $77,800. theta theta(1 - exp(-90000/theta))
$76,000 $52,745
$77,000 $53,074
$78,000 $53,397
$79,000 $53,715
$80,000 $54,028
Comment: Method of Moments is not equal to the Method of Maximum Likelihood. As per usual, we assume the data prior to the effects of truncation and censoring are drawn from an Exponential Distribution.
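The "solving numerically" step in 25.3 can be done with a one-dimensional root finder; here is a hedged Python sketch (the bracketing interval is my own choice).

```python
from scipy.optimize import brentq
import math

# Solve 53,333 = theta * (1 - exp(-90000/theta)) for theta (see 25.3).
f = lambda theta: theta * (1 - math.exp(-90000.0 / theta)) - 53333.0
print(brentq(f, 50000, 150000))   # roughly 77,800
```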
25.4. C. For convenience put everything in thousands. f(x) = α 100α/(100 + x)α+1. S(x) = 100α/(100 + x)α.
Amount Paid by The Insurer   Size of Accident   Contribution to the Likelihood
22                           32                 f(32)/S(10) = α 110α/132α+1
28                           38                 f(38)/S(10) = α 110α/138α+1
39                           49                 f(49)/S(10) = α 110α/149α+1
51                           61                 f(61)/S(10) = α 110α/161α+1
90                           ≥100               S(100)/S(10) = 110α/200α
90                           ≥100               S(100)/S(10) = 110α/200α
Therefore, the loglikelihood is: 4ln(α) + 6α ln(110) - (α+1){ln(132) + ln(138) + ln(149) + ln(161)} - 2α ln(200).
Setting the derivative of the loglikelihood with respect to α equal to zero:
4/α + 6ln(110) - {ln(132) + ln(138) + ln(149) + ln(161)} - 2ln(200) = 0.
α = 4/{ln(1.32) + ln(1.38) + ln(1.49) + ln(1.61) + 2ln2 - 6 ln(1.1)} = 1.747.
Alternately, when fitting via maximum likelihood to a Pareto Distribution with θ known:
α^ = (number of uncensored values) / Σ ln[(θ + xi)/(θ + di)]
= 4/{ln(132/110) + ln(138/110) + ln(149/110) + ln(161/110) + 2 ln(200/110)} = 1.747.
Comment: As per usual, we assume the data prior to the effects of truncation and censoring are drawn from a Pareto Distribution.
25.5. C. For this LogLogistic Distribution, with θ = 1, F(x) = xγ/(1 + xγ), S(x) = 1/(1 + xγ), and f(x) = γxγ-1/(1 + xγ)2.
After truncation from below at 5, the density, below the maximum payment, is: f(x)/S(5) = {γxγ-1/(1 + xγ)2}/{1/(1 + 5γ)}.
For each payment, we determine the size of the loss. For example, the payment of 63 corresponds to a loss of 63/0.9 + 5 = 75. A loss of 100 or more would result in a payment of (100 - 5)(0.9) = 85.5. Then the contribution to the likelihood is either the density or the survival function for the data truncated from below at 5.
Payment   Loss    Contribution to Likelihood
9         15      f(15)/S(5) = (1 + 5γ){γ 15γ-1/(1 + 15γ)2}
63        75      f(75)/S(5) = (1 + 5γ){γ 75γ-1/(1 + 75γ)2}
85.5      ≥100    S(100)/S(5) = (1 + 5γ)/(1 + 100γ)
Multiplying together all of the contributions, the likelihood is: (1 + 5γ){γ15γ−1 / (1 + 15γ)2 }(1 + 5γ){γ75γ−1 / (1 + 75γ)2 }(1 + 5γ)/(1 + 100γ) = (1 + 5γ) 3 γ2 {1125}γ−1 /{(1 + 15γ) 2 (1 + 75γ) 2 (1 + 100γ)}.
2013-4-6, Fitting Loss Distributions §25 Trunc. & Censored Data, HCM 10/15/12, Page 707 25.6. C. One needs to compute the likelihood for each claim and then multiply the terms together. The first policy has no deductible so that the first two claims are only affected by censoring. The first claim is below the maximum covered loss, and thus its likelihood is the density f(3200). The second claim is at the 20000 maximum covered loss, and thus its likelihood is: S(20000). Since the second policy has a deductible, we need to modify the likelihood for truncation and shifting from below. With a $100 deductible, the density has to be divided by S(100). Thus the likelihood for the third claim is: f(600)/S(100). Note that we employ the density rather than the Distribution Function; claims that do not exhaust the maximum covered loss are treated as would be ungrouped data. Note that we use the density at 600 in the numerator, since the loss in the absence of a deductible would be 100 + 500 = 600. For the fourth claim of 9900, the maximum covered loss of 10,000 has been exhausted. Thus we employ the Survival Function rather than the density; claims that exhaust the policy are treated as would be grouped data. G(y) = {F(y+100) - F(100) } / S(100). Thus the likelihood for the fourth claim is: 1 - G(9900) = 1 - {F(9900+100) - F(100) } / S(100) = {S(100) + F(100) - F(10000)} /S(100) = {1 - F(10000)} /S(100) = S(10000) / S(100). The claims from the third policy are treated similarly to the claims from the second policy. Deductible Maximum Amount Paid by Size of Contribution to the Amount Covered Loss The Insurer Accident Likelihood 0 20,000 3200 3200 f(3200) 0 20,000 20,000 ≥20000 S(20000) 100 10,000 500 600 f(600)/S(100) 100 10,000 9900 ≥10000 S(10000) /S(100) 500 30,000 6000 6500 f(6500)/S(500) 500 30,000 29,500 ≥30000 S(30000) / S(500) Multiplying the individual likelihoods together one gets: f(600)f(3200)f(6500)S(20000)S(10000)S(30000)/{S(100)2 S(500)2 }. Comment: Difficult, since it combines the effects of deductibles and maximum covered losses. 25.7. C. Let di be the deductible and ui be the maximum covered loss for loss i. Uncensored values contribute to the loglikelihood: ln {f(xi)/S(di)} = -xi/θ - lnθ + di/θ = -(xi - di)/θ - lnθ. Censored values contribute to the loglikelihood: ln(S(ui)/S(di)) = di/θ - ui/θ = -(ui - di)/θ. Therefore, the loglikelihood is: -8lnθ - (300 + 70000 + 11500 + 1000 + 11000 + 50000 + 14500 + 200 + 49000 + 28000)/θ = -8lnθ - 235500/θ. Setting the partial derivative with respect to θ equal to zero: 0 = -8/θ + 235500/θ2. ⇒ θ = 235,500/8 = 29,438. Comment: θ = {Σ(xi - di) + Σ (ui - di)}/ (# uncensored) = (sum of payments) / (# uncensored values). uncen.
cen.
25.8. A. For the Weibull Distribution, S(x) = exp(-(x/θ)τ), and f(x) = τ(x/θ)τ exp(-(x/θ)τ)/x.
Let di be the deductible and ui be the maximum covered loss for loss i.
Uncensored values contribute to the loglikelihood: ln{f(xi)/S(di)} = -(xi/θ)0.5 - 0.5 ln xi + ln 0.5 - 0.5 lnθ + (di/θ)0.5.
Censored values contribute to the loglikelihood: ln(S(ui)/S(di)) = -(ui/θ)0.5 + (di/θ)0.5.
Therefore, the loglikelihood is: Σ over uncensored of {-(xi/θ)0.5 - 0.5 ln xi + ln 0.5 - 0.5 lnθ} - Σ over censored of (ui/θ)0.5 + Σ (di/θ)0.5.
Setting the partial derivative with respect to θ equal to zero:
0 = -4/θ + (0.5){Σ xi0.5 + Σ ui0.5 - Σ di0.5}/θ1.5. ⇒ θ0.5 = {Σ xi0.5 + Σ ui0.5 - Σ di0.5}/8.
Σ xi0.5 over uncensored values = √300 + √70,000 + √12,000 + √2000 + √11,000 + √15,000 + √700 + √29,000 = 860.3.
Σ ui0.5 over censored values = √50,000 + √50,000 = 447.2.
Σ di0.5 = √0 + √0 + √500 + √1000 + √0 + √0 + √500 + √500 + √1000 + √1000 = 162.0.
⇒ θ = {(860.3 + 447.2 - 162.0)/8}2 = 20,503.
Comment: θ^ = {Σ (Min[xi, ui]τ - diτ) / (number of uncensored values)}1/τ.
25.9. A. f(x) = α 5000α/(5000 + x)α+1. S(x) = {5000/(5000 + x)}α.
Size of Loss   Deductible   Policy Limit   Contribution to the Likelihood per Loss   Number
800            0            10,000         f(800) = α 5000α/5800α+1                  1
5000           1000         10,000         f(5000)/S(1000) = α 6000α/10000α+1        2
≥25,000        0            25,000         S(25000) = 5000α/30000α                   2
4000           2000         50,000         f(4000)/S(2000) = α 7000α/9000α+1         1
Therefore, the loglikelihood is: ln(α) + αln(5000) - (α+1)ln(5800) + 2ln(α) + 2αln(6000) 2(α+1)ln(10000) + 2αln(5000) - 2αln(30000) + ln(α) + αln(7000) - (α+1)ln(9000) = 4ln(α) - 5.0049α + constants. Setting the derivative of the loglikelihood with respect to α equal to zero: 4/α - 5.0049 = 0. ⇒ α = 4/5.0049 = 0.799. Comment: Similar to 4, 5/05, Q.27. None of the losses is both truncated and censored. Fitting via maximum likelihood to a Pareto Distribution with θ known, α^ = (number of uncensored values)/Σln[(θ + xi)/(θ + di)], where xi is the ith loss, possibly censored from above, and di is the deductible for the policy for xi. Here α^ = 4/(ln[(5000 + 800)/5000] + 2ln[(5000 + 5000)/(5000 + 1000)] + 2ln[(5000+ 25000)/5000] + ln[(5000 + 4000)/(5000 + 2000)] = 0.799.
2013-4-6, Fitting Loss Distributions §25 Trunc. & Censored Data, HCM 10/15/12, Page 710 25.10. E. One needs to compute the likelihood for each claim, and then multiply the contributions together. The first claim had no deductible and is at the 25000 maximum covered loss, and thus its likelihood is: S(25000). The second claim had no deductible and is below the 25000 maximum covered loss, and thus its likelihood is: f(3000). The third claim has exhausted the 50000 - 1000 = 49,000 of coverage provided; the loss is of size greater than or equal to 50,000. Therefore, the likelihood of the third claim is the survival function at 50000 of the data truncated from below at 1000: S(50000)/S(1000). The fourth claim was truncated from below at 2000, but was unaffected by the maximum covered loss. Therefore, the likelihood of the fourth claim is the density at 2000 + 3000 = 5000 of the data truncated from below at 2000: f(5000)/S(2000). Deductible Maximum Amount Paid by Size of Contribution to the Amount Covered Loss The Insurer Accident Likelihood 0 25,000 25,000 ≥25,000 S(25000) 0 50,000 3,000 3000 f(3000) 1000 50,000 49,000 ≥50,000 S(50000)/S(1000) 2000 100,000 3,000 5000 f(5000)/S(2000) Multiplying the individual likelihoods together one gets: f(3000)f(5000)S(25000)S(50000)/{S(1000)S(2000)}. Comment: Difficult, since it combines the effects of deductibles and maximum covered losses.
2013-4-6, Fitting Loss Distributions §25 Trunc. & Censored Data, HCM 10/15/12, Page 711 25.11. E. If YP follows a Pareto with θ = 3000, then the ground up losses follow a Pareto with the same α and θ = 3000 - 500 = 2500. The five payments correspond to losses of 1000, 1500, 2500, 5000+, 10,000+. When fitting via maximum likelihood to a Pareto Distribution with θ known: α^ =
number of uncensored values , θ + xi ∑ ln[ θ + d ] i
where xi is the ith loss, possibly censored from above, and di is the deductible for the policy for xi. α^ =
3 ln(3500 / 3000) + ln(4000 / 3000) + ln(5000 / 3000) + ln(7500 / 3000) + ln(12,500 / 3000)
= 0.9102. Thus YP follows a Pareto with α = 0.9102 and θ = 3000. Pr(YP > 4000) = {3000/(3000 + 4000)}0.9102 = 46.24%. Alternately, we are told that the payments after the application of the deductible, but prior to any maximum covered loss is Pareto with θ = 3000. f(y) = α(3000α)(3000 + y)−(α + 1). S(y) = {3000/(3000 + y)}α. Size of Payment
Contribution to Likelihood
500
f(500) = α3000α/3500(α + 1)
1000
f(1000) = α3000α/4000(α + 1)
2000
f(2000) = α3000α/5000(α + 1)
≥ 4500
S(4500) = 3000α/7500α
≥ 9500
S(4500) = 3000α/12500α
Therefore, the loglikelihood is: 3ln(α) + 5αln(3000) - (α+1){ln(3500) + ln(4000) + ln(5000)} - αln(7500) - α ln(12,500). Setting the derivative of the loglikelihood with respect to α equal to zero: 3/α + 5ln(3000) - {ln(3500) + ln(4000) + ln(5000)} - ln(7500) - ln(12,500) = 0. ⇒ α = 0.9102. Thus YP follows a Pareto with α = 0.9102 and θ = 3000. Pr(YP > 4000) = {3000/(3000 + 4000)}0.9102 = 46.24%. Comment: The ground up unlimited losses follow a Pareto with α = 0.9102 and θ = 3000 - 500 = 2500.
2013-4-6, Fitting Loss Distributions §25 Trunc. & Censored Data, HCM 10/15/12, Page 712 25.12. A. The data is truncated from below and censored from above. Pretend there is a 1000 deductible. Then the payments are: 500, 1000, 1000, 1500, 4000+. θ = (sum of the payments) / (# of uncensored values) = 8000/4 = 2000. 25.13. The data has been truncated from below at 1000 and censored from above at 25,000. For a franchise deductible of 1000, if a loss is greater than 1000, then the insurer pays the whole loss. Payment Loss Size Contribution to the Likelihood 1500 1500 f(1500) / S(1000) 4000 4000 f(4000) / S(1000) 9000 9000 f(9000) / S(1000) 25,000 25,000 or more S(25,000) / S(1000) Thus the likelihood is: f(1500) f(4000) f(9000) S(25,000) / S(1000)4 .
25.14. A. The likelihood for the 50 losses of less than k that have been recorded on Type Y is: ∏i=1..50 f(yi).
The likelihood for the 75 losses that have exceeded k that have been recorded on Type Y is: S(k)75.
Thus the likelihood for the claims on Policy Type Y is the product: {∏i=1..50 f(yi)} S(k)75.
The likelihood for each recorded loss for Type Z is given by the density for truncation from below at k: f(zj)/S(k). Thus the likelihood for the 75 losses that have exceeded k that have been recorded on Type Z is: ∏j=1..75 {f(zj)/S(k)} = {∏j=1..75 f(zj)}/S(k)75.
Therefore, the combined likelihood for the two types of policies is:
{∏i=1..50 f(yi)} S(k)75 {∏j=1..75 f(zj)}/S(k)75 = {∏i=1..50 f(yi)} {∏j=1..75 f(zj)}.
Comment: It is very easy in this case to get the correct answer by accident, due to cancellation of the terms involving the tail probabilities at k. For example, if instead there had been 100 losses that exceeded k that had been recorded on Type Z policies, z1, z2, ..., z100, then the solution would have been:
{∏i=1..50 f(yi)} S(k)75 ∏j=1..100 {f(zj)/S(k)} = {∏i=1..50 f(yi)} {∏j=1..100 f(zj)}/S(k)25.
Deductible
Policy Limit
Contribution to the Likelihood per Loss
750
200
None
f(750)/S(200) = α10200α/10750(α + 1)
3
200
None
10,000
f(200) = α10000α/10200(α + 1)
3
300
None
20,000
f(300) = α10000α/10300(α + 1)
4
>10,000
None
10,000
S(10000) = 10000α/20000α
6
400
300
None
f(400)/S(300) = α10300α/10400(α + 1)
4
Therefore, the loglikelihood is: 3ln(α) + 3αln(10200) - 3(α+1)ln(10750) + 3ln(α) + 3αln(10000) - 3(α+1)ln(10200) + 4ln(α) + 4αln(10000) - 4(α+1)ln(10300) + 6αln(10000) - 6αln(20000) + 4ln(α) + 4αln(10300) - 4(α+1)ln(10400) = 14ln(α) - 4.53273α + constants. Setting the derivative of the loglikelihood with respect to α equal to zero: 14/α - 4.53273 = 0. ⇒ α = 14/4.53273 = 3.089. Comment: None of the losses is both truncated and censored.
Number
2013-4-6, Fitting Loss Distributions §25 Trunc. & Censored Data, HCM 10/15/12, Page 715 25.17. D. For an uncensored payment of size x < 350, the contribution to the likelihood is: f(x + 50)/S(50) = {e-(x + 50)/θ/θ} / e-50/θ = e-x/θ/θ. For a censored payment of size 350, the contribution to the likelihood is: S(350 + 50)/S(50) = e-400/θ / e-50/θ = e-350/θ. Therefore, the likelihood is: (e-50/θ/θ) (e-150/θ/θ) (e-200/θ/θ) e-350/θ e-350/θ = e- 1 1 0 0 / θ /θ3. Comment: The maximum covered loss and censorship point from above is: u = 50 + 350 = 400. The data is truncated from below at d = 50. We are given the payment amounts; the data are shifted by having subtracted the deductible of 50. 25.18. Likelihood is: e-1100/θ/θ3. Loglikelihood is: -1100/θ - 3 ln(θ). Setting the derivative with respect to θ equal to zero: 1100/θ2 - 3/ θ = 0. ⇒ θ = 1100/3 = 367. Comment: For an Exponential fit via maximum likelihood to ungrouped data, θ = (Sum of payments) / (number of uncensored values). In this case, θ = (50 + 150 + 200 + 350 + 350)/3 = 1100/3.
Section 26, Properties of Estimators We are interested in what is expected to happen if we were to use an estimator again and again to make a particular estimate. Various desirable properties of estimators will be discussed. Estimates versus Estimators: Using maximum likelihood to fit a Exponential distribution is an example of an estimator. A fitted Exponential parameter of θ = 140 is an example of an estimate. An estimator is a procedure used to estimate a quantity of interest. An estimate is the result of using an estimator. For example, an estimator of the Loss Elimination Ratio at 10,000 might be to fit a LogNormal distribution via the method of moments and use the LER at 10,000 of the fitted distribution. The resulting estimate of the LER at 10,000 might be 12.3%. Note that the result of this estimator depends on the data sample; with a different random sample of data we would expect to get a somewhat different estimate of the LER at 10,000. Therefore, the estimator is a random variable. An estimator is a random variable or random function. An estimate is a number or a function. Types of Errors: model error ⇔ (inadvertently) used inappropriate model or assumptions. For example, one may believe the sizes of loss are from a Gamma Distribution, while they are actually from a Weibull Distribution. sampling frame error ⇔ using a sample different from the population to be modeled. For example, one may attempt to use data from individual lines of insurance: fire, theft, liability, etc., in order to model the first multiline homeowners policies.296 sampling error ⇔ nonrepresentative sample due to random fluctuation. The effect of sampling error depends on how much random fluctuation there is in the data, which depends on the size of the sample, as well as the properties of the particular estimator chosen. 296
Many decades ago, actuaries did precisely this before experience was collected under the then new homeowners line of insurance.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 717 Point Estimators: A point estimator provides a single value, or point estimate as an estimate of a quantity of interest.297 An example of a point estimator would be taking the value of the 7th out of 9 claims as an estimate of the 70th percentile. Another example, would be to take the ratio θ/(α-1) for the parameters of a fitted Pareto as the estimate of the mean size of loss for the given source of data. One wants point estimators to be: unbiased, consistent and have a small mean squared error. Bias: The Bias of an estimator is the expected value of the estimator minus the true value. An unbiased estimator has a Bias of zero.298 While being unbiased is a desirable property for an estimator:299 1. There may be another estimator that is biased but has a smaller mean squared error.300 2. Unbiasedness is usually not preserved under a change in parameters.301 3. Sometimes an estimator may be unbiased but also silly. Sample Mean: The sample mean is: X = ΣXi / n. Exercise: The random variables X1 , X2 , ... , Xn , are independent and identically distributed. Demonstrate that the sample mean is an unbiased estimator of the mean of their distribution. [Solution: E[ X ] = E[ΣXi / n] = ΣE[Xi] /n = ΣE[X] / n = n E[X] / n = E[X].] The sample mean is an unbiased estimator of the mean of the distribution from which the sample was drawn. 297
Point estimates differ from interval estimates. A point estimate of the average size of loss might be $13,891. An interval estimate might be: $13,891 ± $710, with 90% confidence. 298 See Definition 12.1 in Loss Models. 299 See page 395 of the first edition of Loss Models 300 As discussed below the mean squared error is equal to the variance plus the square of the bias. The minimal variance for an unbiased estimator of a parameter is the Rao-Cramer lower bound: -1 / n E [∂2 ln f(x) / ∂θ2]. Thus the MSE of an unbiased estimator is ≥ -1 / n E [∂2 ln f(x) / ∂θ2]. 301 Unbiasedness is not preserved by a nonlinear change of variables. See for example, 4, 11/05, Q.28.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 718 Let us assume we have X1 , X2 and X3 , three independent, identically distributed variables. Since they are independent, their variances add, and since they are identical they each have the same variance, Var[X]: Var[X1 + X2 + X3 ] = Var[X1 ] + Var[X1 ] + Var[X3 ] = 3Var[X]. X = (X1 + X2 + X3 )/3. Var[ X ] = Var[(X1 + X2 + X3 )/3] = Var[X1 + X2 + X3 ]/32 = (3Var[X])/32 = Var[X]/3. Exercise: We generate five independent random variables, each from an Exponential Distribution with θ = 7. What is the variance of their average? [Solution: Var[ X ] = Var[(X1 + X2 + X3 + X4 + X5 )/5] = Var[X1 + X2 + X3 + X4 + X4 ]/52 = (5Var[X])/52 = Var[X]/5 = θ2/5 = 49/5 = 9.8.] For n independent, identically distributed variables: n
Var[ X ] = Var[ΣXi / n] = Var[ΣXi]/n2 = ΣVar[Xi]/n2 = n Var[X]/n2 = Var[X]/n.
Var[ X ] = Var[X] / n. The variance of an average is proportional to the inverse of the sample size!302 Exercise: We generate 10 independent random variables, each from a distribution with mean 3 and variance 8. What is E[ X 2 ]? [Solution: E[ X ] = E[X] = 3. Var[ X ] = Var[X]/10 = 8/10 = 0.8. E[ X 2 ] = Var[ X ] + E[ X ]2 = 0.8 + 32 = 9.8.] Empirical Moments: In a similar manner, let us estimate the second moment as: ΣXi2 / n. The expected value of this estimator is: E[ΣXi2 / n] = ΣE[Xi2 ] /n = ΣE[X2 ] / n = n E[X2 ] / n = E[X2 ]. Thus the empirical second moment is an unbiased estimator of the second moment. More generally, E[ΣXik / n] = ΣE[Xik] /n = ΣE[Xk] / n = n E[Xk] / n = E[Xk]. 302
The skewness of X goes down as 1/ n . Excess = kurtosis - 3. The excess of X goes down as 1/n. See equation 5.14.2 in Statistical Methods by Snedecor and Cochran.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 719 Thus the empirical kth moment is an unbiased estimator of the kth moment of the distribution from which the data was drawn. Sample Variance:
The sample variance, Σ(Xi - X)2/(N - 1), is an unbiased estimator of the variance of the distribution from which the sample was drawn.303
ΣXi = N X . ⇒ Σ(Xi - X ) = 0.
Thus if one knows any N - 1 of the Xi - X , then one knows the
remaining one. We lose a degree of freedom; we have N - 1 rather than N degrees of freedom. Therefore, in order to obtain an unbiased estimator of the variance, we put N - 1 rather than N in the denominator. Exercise: Demonstrate that the sample variance is an unbiased estimator for a random sample of size 3. [Solution: Let X1 , X2 , and X3 , be three independent identically distributed variables. X = (X1 + X2 + X3 ) / 3. (X1 - X )2 = (2X1 /3 - X2 /3 - X3 /3)2 = 4X1 2 /9 + X2 2 /9 + X3 2 /9 - 4X1 X2 /9 - 4X1 X3 /9 + 2X2 X3 /9. E[(X1 - X )2 ] = E[4X1 2 /9 + X2 2 /9 + X3 2 /9 - 4X1 X2 /9 - 4X1 X3 /9 + 2X2 X3 /9] = (4/9)E[X2 ] + (1/9)E[X2 ] + (1/9)E[X2 ] - (4/9)E[X]E[X] - (4/9)E[X]E[X] + (2/9)E[X]E[X] = (2/3){E[X2 ] - E[X]2 } = (2/3)Var[X]. E[(X1 - X )2 ] = E[(X2 - X )2 ] = E[(X3 - X )2 ] = (2/3)Var[X]. The expected value of the sample variance is: E[{(X1 - X )2 + (X2 - X )2 + (X3 - X )2 }/(3-1)] = {(2/3)Var[X] + (2/3)Var[X] + (2/3)Var[X]}/2 = Var[X].]
303
The sample variance is calculated for a data set in order to estimate the variance of the distribution from which this data was drawn. The sample variance is on average equal to the “true variance”. If one knows the form of the distribution, one does not use the sample variance!
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 720 Proof That the Sample Variance is Unbiased: Assume we have a sample of size n > 1, from a distribution with finite first and second moments. n - 1 1 X1 - X = X1 n n
n
∑ Xi . i=2
(n - 1)2 n - 1 2 (X1 - X ) = X1 2 - 2 X1 2 n n2 (n - 1)2 n - 1 = X1 2 - 2 X1 2 n n2
n
∑ i=2 n
∑ i=2
1 Xi + 2 n
1 Xi + 2 n
n
n
∑ ∑ XiXj j=2 i=2 n
∑ i=2
Xi2
1 + 2 n
n
n
∑ ∑ XiXj . i=2 j>1, j≠i
Due to independence, for i ≠ j, E[XiXj] = E[X] E[X] = E[X]2 . Also E[XiXi] = E[X2 ]. Therefore, E[(X1 - X )2 ] = 1 (n - 1)2 2 ] - 2 n - 1 E[X] (n-1) E[X] + 1 (n-1) E[X2 ] + E[X (n-1)(n-2) E[X]2 = n2 n2 n2 n2
E[X2 ] {
(n - 1)2 1 (n - 1)2 1 2 {2 + (n-1)} E[X] (n-1)(n-2)} = n2 n2 n2 n2
E[X2 ]
2 2 (n2 - 2n + 1) + (n- 1) 2 (2)(n - 2n + 1) - (n - 3n + 2) = E[X] n2 n2
E[X2 ]
2 n2 - n 2 n - n = E[X2 ] n - 1 - E[X]2 n - 1 = n - 1 Var[X]. E[X] n2 n2 n n n
By symmetry, for all i, E[(Xi - X )2 ] = i= n
∑ (Xi - X )2
Thus E[s2 ] = E[ i = 1 n - 1
n - 1 Var[X]. n
i= n
∑ E[(Xi - X )2]
]= i= 1
n - 1
n =
n - 1 Var[X] n = Var[X]. n- 1
Thus the sample variance is an unbiased estimator of the variance of the distribution from which the sample was drawn.
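A quick simulation sketch of this fact (the Normal distribution, sample size, and seed are my own illustrative choices): on average the N - 1 denominator recovers the true variance, while dividing by N comes in low by the factor (N - 1)/N discussed in the next subsection.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var, n, trials = 4.0, 5, 200_000

samples = rng.normal(loc=0.0, scale=true_var ** 0.5, size=(trials, n))
sample_var = samples.var(axis=1, ddof=1)     # divide by N - 1
empirical_var = samples.var(axis=1, ddof=0)  # divide by N

print(sample_var.mean())     # close to 4.0 (unbiased)
print(empirical_var.mean())  # close to (4/5)(4.0) = 3.2 (biased low)
```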
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 721 Asymptotically Unbiased: For an asymptotically unbiased estimator, as the number of data points, N → ∞, the bias approaches zero.304 305 In other words, as the sample size N → ∞ approaches infinity, the expected value of the estimator approaches the true value of the quantity being estimated.
One estimator of the variance is: Σ(Xi - X)2/N.
One could call this estimator with N in the denominator the empirical variance, in order to distinguish it from the sample variance with N - 1 in the denominator.306
Σ(Xi - X )2/N = Σ(Xi2 - 2 X Xi + X 2)/N = ΣXi2/N - 2 X ΣXi/N + X 2 = ΣXi2/N - X 2. empirical variance =
∑ (Xi
- X )2
N
∑ (Xi sample variance =
- X )2
N - 1
(empirical variance) =
= empirical second moment - (sample mean)2 .
N = N - 1
∑ (Xi
- X )2
=
N
N (empirical variance). N - 1
N - 1 (sample variance). N
As discussed previously, the sample variance is an unbiased estimator of the variance, and therefore the expected value of the sample variance is Var[X].
Thus, the empirical variance estimator,
∑ (Xi
- X )2
N
has an expected value of:
∑ (Xi Therefore, since its expected value is not Var[X],
- X )2
N
∑ (Xi However, the expected value of
- X )2
N
304
N - 1 Var[X]. N
is a biased estimator of the variance.
goes to Var[X] as N → ∞.
See Definition 12.2 in Loss Models. While all unbiased estimators are also asymptotically unbiased, in common usage asymptotically unbiased is reserved for estimators that are biased. 306 Loss Models does not use a name for this estimator with N in the denominator. 305
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 722 Therefore,
∑ (Xi
∑ (Xi
- X )2
N
- X )2
N
is an asymptotically unbiased estimator of the variance.
is an excellent example of an asymptotically unbiased estimator,
∑ (Xi while the sample variance
- X )2
N - 1
is an excellent example of an unbiased estimator.307
Consistency: When based on a large number of observations, a consistent estimator, also called weakly consistent, has a very small probability that it will differ by a large amount from the true value.308 Let ψn be the estimator with a sample size of n and c be the true value, then ψ is a consistent estimator if given any ε > 0: limit Probability[| ψn - c | < ε] = 1. n→∞
This is similar to the idea behind Classical Credibility; the chance of large fluctuations from the true result goes to zero as n → ∞. We can pick a k such that the estimate will be within ±k of the true value at least P% of the time, provided the sample size n is large enough. For any fixed k, P→ 1 as n → ∞. Most estimators used by actuaries are consistent. For example the sample mean is a consistent estimator of the underlying mean, assuming the data are independent draws from a single distribution with finite mean.309 310 An example of an inconsistent estimator is the regression estimator of the slope, when a relevant variable is (unknowingly) omitted from the model.311 307
While technically the sample variance is asymptotically unbiased, calling it that would convey less information than saying that the sample variance is unbiased 308 See Definition 12.3 in Loss Models. A consistent estimator may also be defined as one that converges stochastically to the true value See Introduction to Mathematical Statistics, by Hogg & Craig. 309 The Law of Large Numbers. See for example, An Introduction to Probability Theory and Its Applications by Feller, of A First Course in Probability by Ross. 310 While it is not necessary that the distribution have a finite variance, if it does then consistency follows from the Central Limit Theorem. 311 See Section 7.3.1. of Economic Models and Economic Forecasts by Pindyck and Rubinfeld, not on the Syllabus.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 723 Another example of an inconsistent estimator is the “actuarial estimator” of qx, which is no longer used.312 Exercise: Use as an estimator of the mean, the first element of a data set of size N. Is this estimator consistent? [Solution: The estimate resulting from this estimator does not depend on the sample size N. The probability of a large error is independent of N and therefore does not approach zero as N approaches infinity. This estimator is not consistent. Comment: This is not a sensible estimator to use.] If an estimator does not seem sensible, it is unlikely to be consistent. Mean Squared Errors: The mean square error (MSE) of an estimator is the expected value of the squared difference between the estimate and the true value.313 The smaller the MSE, the better the estimator, all else equal. ^
^
^ 2
MSE[θ ] = Var[θ ] + Bias[θ ] . One can demonstrate this relationship as follows: MSE( θ^ ) = E[( θ^ - θ)2 ] = E[ θ^ 2 - 2θ θ^ + θ2] = E[ θ^ 2 ] - 2θE[ θ^ ] + θ2 = E[ θ^ 2 ] - E[ θ^ ]2 + E[ θ^ ]2 - 2θE[ θ^ ] + θ2 = Var( θ^ ) + (E[ θ^ ] - θ)2 = Var[ θ^ ] + Bias[ θ^ ]2 . The mean squared error is equal to the variance plus the square of the bias. Thus for an unbiased estimator, the mean square error is equal to the variance. Variances: Let ψn be asymptotically unbiased estimator, whose variance goes to zero as n, the number of data points, goes to infinity. Then as n goes to infinity, since both the variance and bias go to zero, the Mean Squared Error, MSE(ψn ) = Var(ψn ) + [Bias(ψn )]2 , also goes to zero.
312 313
See Section 6.4 of Survival Models and Their Estimation by Dick London, not on the Syllabus. See Definition 12.4 in Loss Models.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 724 Let c be the true value, then for any ε > 0: MSE(ψn ) = c−ε
∫
-∞ -ε2
(ψ n - c )2 f(ψn ) dψn + c−ε
∫
-∞
f(ψn ) dψn + ε2
∞
∫
c+ε
c+ε
∫
c−ε
(ψ n - c )2 f(ψn ) dψn +
∞
∫ (ψn
-∞ ∞
∫
c+ε
- c )2 f(ψ n) dψn =
(ψ n - c )2 f(ψn ) dψn ≥
f(ψn ) dψn = ε2 Probability[ |ψn - c | ≥ ε ].
We have shown that MSE(ψn )/ε2 ≥ Probability[ |ψn - c | ≥ ε ] ≥ 0. Therefore, since as n goes to infinity, MSE(ψn ) goes to zero: limit Probability[ |ψn - c | ≥ ε ] = 0.
n→∞
Therefore, an asymptotically unbiased estimator, whose variance goes to zero as the number of data points goes to infinity, is consistent (weakly consistent). i=n
For example, ∑ (xi - x)2 / n is an asymptotically unbiased estimator of the variance. For a i=1
distribution with finite fourth moment, the variance of this estimator goes to zero. Therefore, in that i=n
case, ∑ (xi - x)2 / n is a consistent estimator of the variance. i=1
The minimal variance for an unbiased estimator of a parameter, that does not appear in the definition of the support of a density, is the Cramer-Rao (Rao-Cramer) lower bound:314 −1 1 = . n E [∂2 ln f(x) / ∂ 2θ] n E [(∂ ln f(x) / ∂θ) 2] Thus an unbiased estimator of a parameter has the smallest MSE among unbiased estimators if and only if its variance attains the Cramer-Rao lower bound. A Uniformly Minimum Variance Unbiased Estimator (UMVUE) is unbiased and has the smallest variance among unbiased estimators.315 The Cramer-Rao lower bound would apply to estimating θ the mean of an Exponential Distribution. The Cramer-Rao lower bound would not apply to estimating ω for Demoivreʼs Law, a uniform distribution from (0, ω), since ω appears in the definition of the support: 0 ≤ x ≤ ω. 315 See Definition 12.5 in Loss Models. This is equivalent to the smallest MSE among unbiased estimators. Some other unbiased estimator may have the same variance, but not a smaller variance than the UMVUE. Also, the UMVUE must have this property regardless of the true value of the quantity that is being estimated. 314
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 725 Demonstrating that the Two Forms of the Cramer-Rao Lower Bound are Equal:316 f(x) is a density with one parameter θ.
[∂2 f(x)f(x)/ ∂θ2 ] = ∫ ∂2 f(x)f(x)/ ∂θ2 f(x) dx = ∫ ∂2 f(x) / ∂θ2 dx = ∂θ∂22 [ ∫ f(x) dx] = ∂θ∂22 [1] = 0.
E
⎛ ∂ f(x) / ∂θ ⎞ 2 ∂ ∂ ⎡ ∂ f(x) / ∂θ ⎤ ∂2 f(x) / ∂θ2 However, ∂2 ln f(x) / ∂θ 2 = - ⎜ [∂ ln f(x) / ∂θ] = ⎢ ⎥= ⎟ . ∂θ ∂θ ⎣ f(x) f(x) f(x) ⎦ ⎝ ⎠ Therefore, E[ ∂2 ln f(x) / ∂θ 2 ] = E[
⎛ ∂ f(x) / ∂θ ⎞ 2 ∂2 f(x) / ∂θ2 2 ] - E[ ⎜ ⎟ ] = 0 - E[ (∂ ln f(x) / ∂θ ) ]. f(x) f(x) ⎝ ⎠ 2
E[ ∂2 ln f(x) / ∂θ 2 ] = -E[ (∂ ln f(x) / ∂θ ) ].
316
See Volume 2 of Kendallʼs Advanced Theory of Statisticʼs, not on the syllabus.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 726 Demonstrating the Cramer-Rao Lower Bound: Assume t is an unbiased estimator of some function of the single parameter θ, g(θ). Let L be the likelihood, given a sample of size n. Since t is an unbiased estimator of g(θ): E[t] = ∫ ...∫ t L dx 1 ... dxn = g(θ). Therefore, ∫ ...∫ t ∂L / ∂θ dx1 ... dxn = ∂g(θ)/ ∂θ . ∂lnL / ∂θ = ( ∂L / ∂θ)/L. ⇒ ∂L / ∂θ = L ∂lnL / ∂θ . Therefore, ∫ ...∫ t L ∂lnL / ∂θ dx1 ... dxn = ∂g(θ)/ ∂θ . ∫ ...∫ L dx 1 ... dxn = 1. ⇒ ∫ ...∫ ∂L / ∂θ dx1 ... dxn = 0. ⇒ ∫ ...∫ L ∂lnL / ∂θ dx1 ... dxn = 0. Therefore, ∂g(θ)/ ∂θ = ∫ ...∫ t L ∂lnL / ∂θ dx1 ... dxn - g(θ) ∫ ...∫ L ∂lnL / ∂θ dx1 ... dxn = E[t ∂lnL / ∂θ ] - E[t] E[ ∂lnL / ∂θ ] = Cov[t, ∂lnL / ∂θ ]. Cov[t, ∂lnL / ∂θ ] ≤
Var[t] Var[∂lnL / ∂θ] .317 ⇒ Var[t] ≥ Cov[t, ∂lnL / ∂θ ]2 / Var[ ∂lnL / ∂θ ].
E[ ∂lnL / ∂θ ] = ∫ ...∫ L ∂lnL / ∂θ dx1 ... dxn = 0. ⇒ Var[ ∂lnL / ∂θ ] = E[( ∂lnL / ∂θ )2 ]. Therefore, Var[t] ≥ ( ∂g(θ)/ ∂θ )2 / E[( ∂lnL / ∂θ )2 ]. For the special case, where g(θ) = θ, Var[t] ≥
317
1 1 = . E [(∂ L / ∂θ)2] n E [(∂ ln f(x) / ∂θ) 2]
The covariance ≤ the product of the standard deviations. An example of the Cauchy-Schwarz inequality.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 727 Maximum Likelihood Estimators: Maximum likelihood estimators may be biased, but as n → ∞ the bias approaches zero; thus they are asymptotically unbiased. Also as n → ∞, the maximum likelihood estimator is Normally distributed with a variance that approaches the Cramer-Rao lower bound, so that maximum likelihood estimators asymptotically have the smallest MSE and smallest variance. Maximum likelihood estimators are asymptotically UMVUE. Since Maximum Likelihood Estimators are asymptotically unbiased and since their variance goes to zero as the sample size goes to infinity, they are consistent. So under certain regularity conditions, the maximum likelihood estimator has the desirable properties for large samples. Generally it also does okay for smaller samples as well. This is why in spite of the necessity to solve numerically, maximum likelihood is perhaps the most commonly used method of estimation. As has been discussed, the maximum likelihood estimator is invariant under changes of the parameters. It will be discussed subsequently how to calculate the variance of maximum likelihood estimators. Method of Moments Estimators: For method of moments, we set E[X] = X . Since the sample mean, X , is an unbiased estimator of the underlying mean, method of moments is an unbiased estimator of the mean.
For two parameters, we set E[X] =
X , and E[X2 ] =
∑Xi2 . N
Xi2 ∑ This is equivalent to setting E[X] = X , and Var[X] = N
Since
∑ (Xi
- X )2
N
-
X2
∑ (Xi =
- X )2
N
.
is a biased estimator of the variance, for two (or more) parameters, method of
moments is a biased estimator of the variance; however, it is asymptotically unbiased. Usually method of moment estimators have a larger MSE than maximum likelihood.318 318
With the exception of those cases where they are the same, for large samples maximum likelihood has a smaller mean squared error than method of moments.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 728 Efficiency:319 An unbiased estimator is efficient if for a given sample size it has the smallest variance of any unbiased estimator. For an unbiased estimator, the variance is equal to the mean squared error. Thus an unbiased estimator has the smallest MSE among unbiased estimators if and only if it has the smallest variance among unbiased estimators. Maximum Likelihood estimators are asymptotically efficient. An Example of a Test of Estimators: Here is an example of a simulation experiment to estimate the mean squared errors of various (point) estimators of the excess ratio at $1 million, R($1 million).320 One hundred sets of 1000 random claims from a Pareto Distribution with α = 2 and θ = 100,000 were simulated. For each of these 100 data sets, Pareto distributions were fit using various methods. Then the resulting estimated Excess Ratio at $1 million was compared to the Excess Ratio for the true underlying Pareto Distribution321. The details of the methods used are: Percentile Matching is at the 75th and 95th percentiles. The Method of Moments. The Method of Maximum Likelihood. The Mean Residual Lives were matched at $100,000 and $500,000. 322 The Minimum Chi-Square method, not on the syllabus, was based on grouping the data into the intervals: 0-10,000, 10,000-25,000, 25,000-50,000, 50,000-100,000, 100,000-250,000, 250,000-500,000, and greater than 500,000. Find numerically the Pareto Distribution with the smallest Chi-Square Statistic. 319
See Section 2.3.2 of Econometric Models and Econometric Forecasts by Pindyck and Rubinfeld, not on the Syllabus. 320 The efficiency of estimator ψ relative to another estimator φ is: MSE(φ) / MSE (ψ). The estimator with the smaller MSE is the more efficient; an estimatorʼs relative efficiency is inversely proportional to its MSE. Higher efficiency is better. 321 For the Pareto Distribution R(x) = (1 + x / θ )1 - α. For α = 2 and θ = 100,000, R(1 million) = 9.09%. While this method is not covered on the syllabus, it is easy for the Pareto. e(x) = (x +θ) / (α − 1). Therefore, θ = { e(x1 )x2 - e(x2 )x1 } / { e(x2 ) - e(x1 ) } and α = 1 + {x2 + θ} / e(x2 ). 322
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 729 The Minimum cdf Distance method, not on the syllabus, was based on summing the squared difference of the empirical and the fitted Distribution Functions at the points: 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000. Find numerically the Pareto Distribution with the smallest such sum of squares. The Minimum LEV Distance method, not on the syllabus, is based on summing the squared difference of the empirical and the fitted Limited Expected Values at the points: 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000. Find numerically the Pareto Distribution with the smallest such sum of squares. The Minimum K-S Statistic Method, not on the syllabus, solves numerically for the Pareto Distribution with the minimum Kolmogorov-Smirnov Statistic for the given data set. Shown below are the results of this simulation experiment of estimating Excess Ratios at $1 million, R($1 million).323 Mean
Bias = Estimate True Value
Variance of Excess Ratios
Mean Square Error
0.00097
0.00097
MSE for Max. Like. / MSE
Method
Excess Ratio
Actual Pareto (2,100000)
0.0909
Percentile Matching
0.1067
0.0158
0.00277
0.00302
0.322
Method of Moments
0.0484
-0.0425
0.00050
0.00231
0.422
Maximum Likelihood
0.0945
0.0036
0.00096
0.00097
1.000
Matching Mean Residual Lives
0.0716
-0.0193
0.00453
0.00490
0.198
Minimum Chi-Square
0.0979
0.0070
0.00123
0.00128
0.761
Minimum cdf Distance
0.0976
0.0067
0.00156
0.00160
0.606
Minimum LEV Distance
0.0979
0.0070
0.00128
0.00133
0.732
Minimum K-S Statistic
0.0994
0.0085
0.00282
0.00289
0.336
In each case the mean and variance are computed from the estimated first and second moments. For a given method of estimation one sums over the 100 data sets: 100
100
∑ R($1 million) i=1
100
323
∑
= estimated first moment.
R($1 million) 2
i=1
The excess ratio is one minus the loss elimination ratio.
100
= estimated second moment.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 730 Then the Mean Square Error = (Bias)2 + Variance. Note that the Cramer-Rao lower bound is .00097 in this case.324 Thus the minimum Mean Square Error for an unbiased estimator is .00097. The estimated MSE for the Method of Maximum Likelihood is equal to that minimum. This is not surprising since asymptotically maximum likelihood estimators have the smallest MSE and smallest variance. For large samples, the Method of Maximum Likelihood is expected to have a small MSE. The Method of Maximum Likelihood has the lowest MSE of the methods tested. Note that the Method of Moments performs particularly poorly in this example, because the underlying Pareto Distribution has no finite second moment and thus no finite variance. Thus trying to match the variance is not particularly useful in this case. In this test we have measured the errors that result from the repeated use of a procedure, rather than how good the estimate is for any single application of a procedure. The errors that would result from the repeated use of a procedure is what is referred to when we discuss the qualities of an estimator. The use of simulation allows one to compare the estimates to the true underlying value, something that is not possible in practical applications.
324
This will be shown in a problem in the section on the variance of functions of estimated parameters.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 731 An Example of Maximum Likelihood Applied to a LogNormal:325 Maximum likelihood estimators are asymptotically unbiased and asymptotically efficient. Thus they perform well for large sample sizes. However, maximum likelihood estimators may perform relatively poorly for small sample sizes. For samples from a LogNormal one can determine in closed form the bias and variance of two estimators of the mean: the sample mean and Maximum Likelihood. I have included the mathematics in the next subsection. Assume losses follow a LogNormal Distribution with µ = 6 and σ = 2. For various sample sizes, below are listed the bias, standard deviation, and root mean squared error:326 Sample Size 4 8 10 15 20 25 100 1000 10,000 100,000
Sample Mean Bias StdDev 0 10,912 0 7716 0 6901 0 5635 0 4880 0 4365 0 2182 0 690 0 218 0 69
RMSE 10,912 7716 6901 5635 4880 4365 2182 690 218 69
Maximum Likelihood Bias StdDev ∞ ∞ 2879.69 ∞ 1927.50 21,957 1060.73 6421 733.04 4161 560.31 3223 123.79 1138 11.97 330 1.19 103 0.12 33
RMSE ∞ ∞ 22,042 6508 4225 3272 1144 330 103 33
For this example, for samples of size less than 17, maximum likelihood has a larger root mean squared error than the sample mean. For samples of size more than 16, maximum likelihood has a smaller root mean squared error than the sample mean.327 Maximum likelihood performs well for large sample sizes, but in this example maximum likelihood performs poorly for very small sample sizes.
325
Based on an example in Brainstorms by Glenn G. Meyers in the August 2010 Actuarial Review. Mean Squared Error = Variance + Bias2 . The Root Mean Squared Error is the square root of the Mean Squared Error. 327 The crossover point depends on the sigma parameter of the LogNormal Distribution. The larger sigma, the higher the crossover point. For sigma = 1, for samples of size less than 12, maximum likelihood has a larger root mean squared error than the sample mean. For sigma = 3, for samples of size less than 27, maximum likelihood has a larger root mean squared error than the sample mean. 326
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 732 Mathematics of the LogNormal Example: In general, the sample mean is an unbiased estimator of the mean. For a sample of size n, Var[ X ] = Var[X] / n. For a LogNormal Distribution with parameters µ and σ: Mean = exp[µ + σ2/2]. Second moment = exp[2µ + 2σ2]. Variance = exp(2µ + σ2) {exp(σ2) -1}. For µ = 6 and σ = 2, the variance = 476,279,085. Thus, Var[ X ] = 476,279,085/n Since X is unbiased, MSE[ X ] = Var[ X ].
⇒ RMSE[ X ] = 21,824 / n . For example, with n = 1000, RMSE[ X ] = 690. For the maximum likelihood LogNormal: µ^
n
=
∑ ln(xi) / n.
i= 1
^2 σ
n
=
∑ {ln(xi)
^ }2 / n. - µ
i= 1
The ln(xi) are independent, identically distributed Normals, each with mean µ and variance σ2. Therefore, µ^ is also Normal, with mean µ, and variance: nσ2 / n2 = σ2 / n. Therefore, exp[ µ^ ] is LogNormal with parameters µ, and σ2 / n. Therefore, E[exp[ µ^ ]] = exp[µ + σ2 / (2n)], and E[exp[ µ^ ]2 ] = exp[2µ + 2σ2 / n]. The ln(xi) are a sample of size n from a Normal. µ^ is the sample mean of that Normal sample.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 733 n
Thus,
∑ {ln(xi)
^ }2 / (n - 1) is the Sample Variance, S2 , of that Normal sample. - µ
i= 1
^ 2 = S2 (n - 1)/n. Therefore, σ
For a sample of size n from a Normal Distribution with standard deviation σ, (n - 1) S2 / σ2 has a Chi-Square Distribution with n - 1 degrees of freedom.328 A Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν/2 and θ = 2. ^ 2 = S2 (n - 1)/n ⇔ (σ2/n) (Chi-Square Distribution with n - 1 degrees of freedom) σ
⇔ (σ2/n) (Gamma Distribution with α = (n - 1)/2 and θ = 2) ⇔ Gamma Distribution with α = (n - 1)/2 and θ = 2σ2/n.329 ^ 2 / 2 follows a Gamma Distribution with α = (n - 1)/2 and θ = σ2/n. Thus, σ ^ 2 / 2] follows a LogGamma Distribution with α = (n - 1)/2 and θ = σ2/n. Therefore, exp[ σ
The LogGamma has F(x) = Γ[α, ln(x)/θ], and E[Xk] = (1 - kθ)−α, for θ < 1/k. ^ 2 / 2]] = Therefore, E[exp[ σ ^ 2 / 2]2 ] = E[exp[ σ
1 for σ2 < n, and (n 1) / 2 2 (1 - σ / n)
1 for 2σ2 < n. (1 - 2σ 2 / n)(n - 1) / 2
^ 2 /2]. The maximum likelihood estimator of the mean is: exp[ µ^ + σ
Maximum Iikelihood applied to a sample from a Normal Distribution is the same as the method of ^ are also independent for moments; the estimates of µ and σ are independent. Thus µ^ and σ
maximum likelihood applied to a LogNormal Distribution.330 328
For Exam 4/C, you are not responsible for this result for samples from a Normal Distribution. As under uniform inflation, multiplying a Gamma Variable by a constant, we get another Gamma Distribution with alpha the same, but with theta multiplied by that constant. 330 Maximum likelihood is invariant under one-to-one monotonic change of variables. 329
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 734 Therefore, the expected value of the maximum likelihood estimator is: ^ ^ 2 /2]] = E[exp[ µ ^ 2 /2]] = exp[µ + σ2 / (2n)] E[exp[ µ^ + σ ]] E[exp[ σ
1 for σ2 < n. (1 - σ 2 / n)(n - 1) / 2
For µ = 6, σ = 2, and n = 1000, the expected value of the maximum likelihood estimator is: 2992.93. The bias is the expected value of the estimator minus the true value. exp[µ + σ 2 / (2n)] The bias of the maximum Iikelihood estimator of the mean is: - exp[µ + σ2 / 2]. (n 1) / 2 2 (1 - σ / n) Note that since the expected value of the estimator does not exist for n ≤ σ2, the bias does not exist unless n > σ2. For n = 1000, the bias of the maximum likelihood estimator is: 2992.93 - 2980.96 = 11.97. The second moment of the maximum likelihood estimator is: ^ 2 ^ 2 /2]2 ] = E[exp[ µ ^ 2 /2]2 ] = E[exp[ µ^ + σ ] ] E[exp[ σ
exp[2µ + 2σ2 / n]
1 for 2σ2 < n. (1 - 2σ 2 / n)(n - 1) / 2
Thus the variance of the maximum likelihood estimator is: ⎧exp[µ + σ 2 / (2n)]⎫ 2 exp[2µ + 2σ 2 / n] exp[2µ + σ 2 / n] exp[2µ + 2σ 2 / n] = . ⎨ ⎬ (1 - 2σ 2 / n)(n - 1) / 2 ⎩ (1 - σ 2 / n)(n - 1) / 2 ⎭ (1 - 2σ 2 / n)(n - 1) / 2 (1 - σ 2 / n)n - 1 Note that since the second moment of the estimator does not exist for n ≤ 2σ2, the variance does not exist unless n > 2σ2. For µ = 6, σ = 2, and n = 1000, the variance of the maximum likelihood estimator is: 9,066,254 - 8,957,604 = 108,650. Therefore, the mean squared error of the maximum likelihood estimator is: 108,650 + 11.972 = 108,793. For n = 1000, the root mean squared error of the maximum likelihood estimator is:
108,793 = 330.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 735 Problems: 26.1 (1 point) Which of the following are true about method of maximum likelihood estimators? 1. unbiased 2. asymptotically normal 3. have the smallest asymptotic variance A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D. 26.2 (3 points) Starfleet is testing two new portable instruments, one from Terra and one from Vulcan, each for measuring the subspace field intensity. The measurements of the two instruments are independent. Let c be the true strength of the subspace field. The instrument from Terra has an expected value of 0.9c and a variance of 0.4c2 , while the instrument from Vulcan has an expected value of 1.2c and a variance of 0.2c2 . Starfleet is willing to use an estimate of the form: ψ = αt + βv, where α and β are constants, t is the estimate from the Terran instrument and v is the estimate from the Vulcan instrument. What is the value of β that corresponds to the unbiased estimator with the smallest expected Mean Squared Error of such estimators? A. Less than 0.60 B. At least 0.60, but less than 0.70 C. At least 0.70, but less than 0.80 D. At least 0.80, but less than 0.90 E. At least 0.90 26.3 (2 points) The New Republic is testing two instruments, one from Tatooine and one from Naboo, each for measuring the force. The measurements of the two instruments are independent. Assume that the two instruments have the same mean and variance. The New Republic is willing to use an estimate of the form: ψ = αt + βn, where α and β are constants, t is the estimate from the Tatooinian instrument and n is the estimate from the Nabooan instrument. What is the value of α/β that corresponds to the unbiased estimator with the smallest expected Mean Squared Error of such estimators? 26.4 (2 points) You are given: x 0 25 50 Pr[X = x] 0.2 0.3 0.4
100 0.1 n
∑ (Xi
The population mean, µ, and variance, σ2, are estimated by X = ΣXi/n and Sn 2 = i=1 respectively. Calculate the bias of Sn 2 , when n = 10. (A) -80 (B) -60 (C) -40 (D) -20
(E) 0
- X )2 n
,
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 736 26.5 (2 points) You are given the following: •
Losses follow an Exponential distribution, with mean θ unknown.
•
7 losses have been observed.
•
X is the average of these seven losses.
•
You estimate θ as c X .
Determine the value of c that minimizes the mean squared error of this estimator of θ. A. B. C. D. E.
Less than 0.8 At least 0.8, but less than 0.9 At least 0.9, but less than 1.0 At least 1.0, but less than 1.1 1.1 or more
26.6 (3 points) Losses are of sizes 0, 1, or 5, with probabilities 30%, 60%, and 10% respectively. For a sample of size 2, the variance is estimated using the sample variance. What is the mean squared error? A. Less than 11 B. At least 11, but less than 12 C. At least 12, but less than 13 D. At least 13, but less than 14 E. 14 or more 26.7 (3 points) You are given two estimators, A and B, of the same unknown quantity. (i) Bias of A is -10. Variance of A is 20. (ii) Bias of B is 5. Variance of B is 30. (iii) The correlation of A and B is 0.7. Estimator C is a weighted average of the two estimators A and B, such that: C = wA + (1-w)B. Determine the value of w that minimizes the mean squared error of C. A. Less than 0.1 B. At least 0.1, but less than 0.2 C. At least 0.2, but less than 0.3 D. At least 0.3, but less than 0.4 E. 0.4 or more
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 737 26.8 (3 points) You are given the following: •
Losses follow a Pareto distribution, with parameters θ (unknown) and α = 4.
•
5 losses have been observed.
•
X is the average of these five losses.
•
You estimate θ as c X .
Determine the value of c that minimizes the mean squared error of this estimator of θ. A. B. C. D. E.
Less than 2.4 At least 2.4, but less than 2.8 At least 2.8, but less than 3.2 At least 3.2, but less than 3.6 3.6 or more
Use the following information for the next two questions: Losses are uniformly distributed from 0 to b. You observe 10 losses. You estimate b by taking the maximum of these 10 values. 26.9 (2 points) Determine the bias of this estimator of b. A. -b/11 B. -b/10 C. 0 D. b/10 E. b/11 26.10 (2 points) Determine the Mean Squared Error of this estimator of b. A. b2 /132
B. b2 /66
C. b2 /60
D. b2 /12
E. b2 /11
26.11 (3 points) The random independent, identically distributed variables X1 , X2 , ... , X100, have an exponential distribution with mean θ. One uses Σ Xi2 / 100 as an estimator of θ2. Calculate the mean-squared error of this estimator. (A) 1.0 θ4
(B) 1.1 θ4
(C) 1.2 θ4
(D) 1.3 θ4
(E) 1.4 θ4
26.12 (2 points) You are given the following: •
Losses follow a LogNormal distribution, with parameters µ and σ.
•
4 losses have been observed: X1 , X2 , X3 , X4 .
•
You estimate µ as: c ln[X1 X2 X3 X4 ].
Determine the value of c that minimizes the mean squared error of this estimator of µ. A. 1/4
B. 1/(4σ2)
C. 1/(4 + σ2)
D. µ2/(µ2 + σ2)
E. µ2/(4µ2 + σ2)
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 738 Use the following information for the next three questions: A population consist of five members, with values: 0, 2, 2, 5, 6. A random sample of size 3 is drawn without replacement. Determine the Mean Square Error of each of the following estimators of the population mean. 26.13 (2 points) The mean of the 3 values. A. 0.6 B. 0.8 C. 1.0 D. 1.2
E. 1.4
26.14 (2 points) The median of the 3 values. A. Less than 0.8 B. At least 0.8, but less than 1.1 C. At least 1.1, but less than 1.4 D. At least 1.4, but less than 1.7 E. 1.7 or more 26.15 (2 points) The average of the smallest and largest of the 3 values. A. Less than 0.8 B. At least 0.8, but less than 1.1 C. At least 1.1, but less than 1.4 D. At least 1.4, but less than 1.7 E. 1.7 or more 26.16 (3 points) Two independent estimators, ψ and φ, are available for estimating the parameter, θ, of a given loss distribution. To test their performance, you have conducted 1000 simulated trials of each estimator, using θ = 7, with the following results: 1000
∑
i=1
ψi = 7232,
1000
∑
i=1
ψi 2
1000
= 52,480,
∑
i=1
φi = 6917,
1000
∑
i=1
φi2 = 48,126.
Consider the class of estimators of θ which are of the form: wψ + (1-w)φ. Determine the value of w that results in an estimator with the smallest mean squared error. A. 40% B. 45% C. 50% D. 55% E. 60% 26.17 (3 points) The mean squared error of α^ as an estimator of α is 0.49. E[ α^ ] = 2.7. If the bias of α^ as an estimator of α is -0.4, what is the bias of α^ 2 as an estimator of α2? A. -2.0
B. -1.6
C. -1.2
D. -0.8
E. -0.4
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 739 26.18 (3 points) The random independent, identically distributed variables X1 , X2 , ... , Xn , have an exponential distribution with mean θ. One uses cΣXi2 /n as an estimator of θ2. Determine the value of c, such that the mean square error of this estimator is minimized. (A) 1/12 (B) n/(10 + 2n) (C) n/(8 + 4n) (D) n/(6 + 6n) (E) 1/2 26.19 (2 points) The true value of θ is 10. For a certain estimator of θ, when applied to a sample of size n, the estimate is 10 with probability 1 - 1/n, and n with probability 1/n. Which of the following statements about this estimator of θ are true? A. It is unbiased and is not consistent. B. It is asymptotically unbiased and is not consistent. C. It is not asymptotically unbiased and is not consistent. D. It is asymptotically unbiased and is consistent. E. It is not asymptotically unbiased and is consistent. 26.20 (3 points) Losses are Exponential with mean θ. You observe 3 losses. You estimate θ by taking the sample median of these 3 values. Determine the bias of this estimator of θ. A. -θ/6
B. -θ/3
C. 0
D. θ/3
E. θ/6
26.21 (2 points) Three observed values of the random variable X are: 1 1 4 You estimate the mode of X by taking the most common value in {X1 , X2 , X3 }. Estimate the bias of this estimator of the mode. (A) 0.6 (B) 0.8 (C) 1.0 (D) 1.2
(E) 1.4
26.22 (2 points) Two different estimators, η and ζ, are available for estimating the parameter, θ, of a given loss distribution. To test their performance, you have conducted 100 simulated trials of each estimator, using θ = 1.4, with the following results: 100
∑ ηi = 156,
100
∑ ηi 2 = 257,
100
∑ ζi = 127,
100
i=1
i=1
i=1
i=1
∑ ζi2
= 321
As measured in this simulation, what is the ratio of the Mean Squared Error of estimator η to the Mean Squared Error of estimator ζ? A. 0.1
B. 0.2
C. 0.4
D. 0.6
E. 0.8
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 740 26.23 (2 points) Let θ^ n be the estimate of a parameter θ for a sample size of n. The true value of θ is 100. E[ θ^ n ] = 100n/(n-1), for n = 2, 3, 4, ... Var[ θ^ n ] = 60/n, for n = 2, 3, 4, ... Which of the following statements are true about the estimator θ^ n ? I. θ^ n is an asymptotically unbiased estimator of θ. II. θ^ n is a consistent estimator of θ. III. The mean square error of θ^ 25 is less than 19. A. I and II only B. I and III only C. II and III only D. I, II, and III E. None of A, B, C, or D 26.24 (3 points) The number of claims in a year is Bernoulli with q = 0.4. (60% chance of no claim and 40% of a claim.) List all possible samples of 3 independent years. For each sample, calculate its probability, sample mean, sample variance (with n - 1 in the denominator) and biased estimator of the variance (with n in the denominator). Then use this information to calculate the expected value of these three estimators. 26.25 (3 points) The random variables X1 , X2 , ... , Xn , are independent and identically distributed with probability density function f(x) = λe-λx, x > 0. Calculate the bias of 1/ X as an estimator of λ. 26.26 (3 points) X and Y are each estimators of the same parameter θ. E[X] = 0.8θ. Var[X] = 0.3θ2. E[Y] = 1.3θ. Var[Y] = 0.5θ2. Cov[X, Y] = -0.2θ2. Let Z = aX + bY. Find a and b such that Z is the Uniformly Minimum Variance Unbiased Estimator (UMVUE) of θ.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 741 26.27 (2 points) Three observed values of the random variable X are: 1 1 4 You estimate the mean of X by taking: (X1 + X2 + X3 )/3. Determine the bias of this estimator of the mean. (A) -0.4 (B) -0.2 (C) 0 (D) 0.2
(E) 0.4
26.28 (1 point) A and B are each estimators of the same quantity. Estimator A has a bias of -4 and a mean squared error of 30. Estimator B has twice the variance but half the bias of Estimator A. Determine the mean squared error of Estimator B. A. 26 B. 28 C. 30 D. 32 E. 34 26.29 (4 points) A discrete distribution can take on only two values, 0 and 8 equally likely. i= 4
∑ (Xi - X)2
(a) For a sample of size four, demonstrate that the sample variance, s2 = i = 1 4 - 1
,
is an unbiased estimator of the variance of the distribution. (b) Let the sample standard deviation, s, be the square root of the sample variance. For a sample of size four, demonstrate that the sample standard deviation is a biased estimator of the standard deviation of the distribution. (c) For a sample of size four, demonstrate that s4 is a biased estimator of σ4 of the distribution. 26.30 (3 points) X follows a Uniform Distribution on (0, ω). One will estimate ω by taking the maximum of a random sample. Demonstrate that this estimator is consistent. 26.31 (3 points) A discrete distribution can take on only two values, 0 and 8 with probabilities of 3/4 and 1/4 respectively. i =4
2 For a sample of size four, let
∑ (Xi - X )3
i =1
3
be an estimator of the third central moment.
Determine whether this is an unbiased estimator of the third central moment of this distribution. 26.32 (1 point) X follows a Pareto Distribution with parameters α and θ = 10. For what values of α is the sample mean a consistent estimator of the mean of the Pareto?
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 742 26.33 (2, 5/83, Q.15) (1.5 points) Let T1 and T2 be estimators of a population parameter θ based upon the same random sample. If Ti is distributed normally with mean θ and variance σi2 > 0, i = 1, 2, and if T = bT1 + (1 - b)T2 , then T is an unbiased estimator of θ whose variance is minimized by choosing b as: A.
σ2 σ1
B.
σ22 σ12
D.
σ22 - Cov[T 1 , T2] σ12 - 2Cov[T1 , T2 ] + σ 22
E.
σ 22 - Cov[T1 , T2 ]/ 2 σ12 - 2Cov[T1 , T2 ] + σ 22
C.
σ 22 σ12 + σ22
26.34 (2, 5/83, Q.20) (1.5 points) Let X be a random variable with mean 2. Let S and T be unbiased estimators of the second and third moments, respectively, of X about the origin. Which of the following is an unbiased estimator of the third moment of X about its mean? A. T - 6S + 16
B. T- 3S + 2
C. (T - 2)3 - 3(S - 2)2
D. (T- 2)3
E. T - 8
26.35 (2, 5/83, Q.33) (1.5 points) Let X1 , X2 , . . . , Xn be a random sample of size n ≥ 2 from a Poisson distribution with mean λ. Consider the following three statistics as estimators of λ. n
I. X = ∑ Xi / n i=1
n
II. ∑ (Xi - X )2 / (n -1) i=1
III. 2X1 - X2
Which of these statistics are unbiased? A. I only B. II only C. III only E. The correct answer is not given by A, B, C, or D
D. I, II, and III
26.36 (2, 5/85, Q.21) (1.5 points) Let X1 , . . . , Xn be a random sample from a distribution with density function f(x) = 3λx2 exp[-λx3 ], for 0 < x < ∞. What is the Rao-Cramer lower bound for the variance of unbiased estimators of λ (i.e., the minimum variance bound) given that E(X3 ) = 1/λ and E(X6 ) = 2/λ2. A. 3λ/n
B. 2λ/n
C. 3λ2/n
D. 2λ2/n
E. λ2/n
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 743 26.37 (2, 5/88, Q.5) (1.5 points) Let X1 ,. . ., Xn be independent continuous random variables with common density f(x I λ) = λe−λx, x > 0, for which E(Xi) = 1/λ and Var(Xi) =1/λ2. What is the Rao-Cramer lower bound for the variance of an unbiased estimator of λ? A.
1 nλ
B.
1 n λ2
C.
n λ2
D.
λ2 n- 1
E.
λ2 n
26.38 (2, 5/88, Q.14) (1.5 points) Let X and Y be jointly distributed random variables. After observing X, the corresponding value of Y is predicted by some function h(X). Let h0 (X) = E(Y I X) be a specific predictor function. Which of the following statements about h0 (X) is always true? A. Among all possible choices of h(X), the predictor h0 (X) minimizes the expected absolute error E[IY - h(X)I]. B. h0 (X) is a linear function of X. C. Among all possible choices of h(X), the predictor h0 (X) minimizes, the squared error {Y - h(X)}2 . D. Among all possible choices of h(X), the predictor h0 (X) minimizes the expected squared error E[{Y - h(X)}2 ]. E. Among all possible choices of h(X), the predictor h0 (X) minimizes the variance Var[h(X)]. 26.39 (2, 5/88, Q.36) (1.5 points) Let X be a random variable with a binomial distribution with parameters m and q, and let q^ = x/m. Then q^ is an unbiased estimator of q. Which of the following is an unbiased estimator of q(1 - q)? A. q^ (1- q^ )
B. q^ (1- q^ ) / (m-1)
C. q^ (1- q^ ) / m
D. q^ (1- q^ )(m - 1) / m
E. q^ (1- q^ ) m / (m - 1)
26.40 (4, 5/88, Q.50) (1 point) Which of the following are true? 1. A consistent estimator is defined as one whose variance is minimal. 2. An unbiased estimator is defined as one whose expected value is equal to the underlying parameter. 3. Maximum likelihood estimators are unbiased for samples under general conditions. A. 1 B. 2 C. 3 D. 1, 2 E. 2, 3
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 744 26.41 (4, 5/89, Q.46) (1 point) Given the following information, calculate the mean square error (MSE) of the maximum likelihood estimator θ1 and the MSE of the method of moments estimator θ2:
Mean Variance
θ1
θ2
3.2 0.5
3.3 0.6
The estimated parameter θ has an actual value of θ = 3. The Mean Square Errors (MSE) of the estimators θ1 and θ2 are respectively: A. 0.04, 0.09
B. 0.29, 0.45
C. 0.50, 0.60
D. 0.54, 0.69
E. None of A, B, C, or D
26.42 (4, 5/89, Q.47) (1 point) Independent observations x1 , ..., x6 will be taken from a normal distribution with an unknown mean and variance. Based on this sample you will estimate the variance to be 6
∑
(xi - X )2
S2 = i=1
6
6
∑ xi
with X = i=1 . 6
If the actual variance for the normal distribution is σ2 = 2, what is the absolute value of the bias in your estimation procedure for the variance? A. Less than 0.1 B. At least 0.1, but less than 0.2 C. At least 0.2, but less than 0.3 D. At least 0.3, but less than 0.4 E. 0.4 or more 26.43 (4, 5/91, Q.28) (1 point) Loss Models describes three desirable properties of an estimator, α^ , of α . Match each of these properties with the correct mathematical description. a. Consistent
1. E [ α^ ] = α
b. Unbiased
2. E [ α^ ] = α, Var[ α^ ] ≤ Var[ α ], where α is any other estimator of α
~
~
~
such that E [ α ] = α. c. UMVUE
3. For any ε > 0, Pr{ | α^ - α | < ε} → 1 as n→ ∞, where n is the sample size.
A. a=1, b=2, c=3
B. a=2, b=1, c=3
C. a=1, b=3, c=2
D. a=3, b=2, c=1
E. a=3, b=1, c=2
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 745 26.44 (4B, 5/92, Q.2) (1 point) Which of the following are true? 1. The expected value of an unbiased estimator of a parameter is equal to the true value of the parameter. 2. If an estimator is asymptotically unbiased, the probability that an estimate based on n observations differs from the true parameter by more than some fixed amount converges to zero as n grows large. 3. A consistent estimator is one with a minimal variance. A. 1 only B. 3 only C. 1 and 2 only D. 1, 2, and 3 E. The correct answer is not given by A, B, C, or D 26.45 (4B, 5/92, Q.17) (2 points) You are given that the underlying size of loss distribution for disability claims is a Pareto distribution with parameters α and θ = 6000. You have used 10 random observations, maximum likelihood estimation, and simulation to determine the following for α^ the maximum likelihood estimator of α: E[ α^ ] = 2.20, M.S.E.( α^ ) = 1.00 (M.S.E. = mean square error) Determine the variance of α^ if α = 2. A. B. C. D. E.
Less than 0.70 At least 0.70 but less than 0.85 At least 0.85 but less than 1.00 At least 1.00 but less than 1.15 At least 1.15
26.46 (4B, 11/92, Q.8) (1 point) You are given the following information: X is a random variable whose distribution function has parameter α = 2.00. Based on n random observations of X you have determined:
•
E[α1] = 2.05 where α1 is an estimator of α having variance = 1.025.
•
E[α2] = 2.05 where α2 is an estimator of α having variance = 1.050.
•
As n increases to ∞, P[|α1 - α| > ε] approaches 0 for any ε > 0.
Which of the following are true? 1. α1 is an unbiased estimator of α. 2. α2 has a smaller Mean Squared Error than α1. 3. α1 is a consistent estimator of α. A. 1 only
B. 2 only
C. 3 only
D. 1, 3 only
E. 2, 3 only
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 746 26.47 (4B, 5/93, Q.15) (1 point) Which of the following are basic properties of maximum likelihood estimators? 1. The variance of the maximum likelihood estimator is equal to its mean square error. 2. Maximum likelihood estimators are invariant under parameter transformation. 3. Maximum likelihood estimators have an asymptotically normal distribution. A. 2 B. 1, 2 C. 1, 3 D. 2, 3 E. 1, 2, 3 26.48 (4B, 11/93, Q.13) (3 points) You are given the following: • Two instruments are available for measuring a particular (non-zero) distance. • X is the random variable representing the measurement using the first instrument and Y is the random variable representing the measurement using the second instrument. • X and Y are independent. • E[X] = 0.8m; E[Y] = m; Var[X] = m2 ; and Var[Y] = 1.5m2 where m is the true distance. Consider the class of estimators of m which are of the form Z = αX + βY. Within this class of estimators of m, determine the value of α that makes Z an unbiased estimator with smallest mean squared error. A. Less than 0.45 B. At least 0.45, but less than 0.50 C. At least 0.50, but less than 0.55 D. At least 0.55, but less than 0.60 E. At least 0.60 26.49 (4B, 5/95, Q.27) (2 points) Two different estimators, ψ and φ, are available for estimating the parameter, β, of a given loss distribution. To test their performance, you have conducted 75 simulated trials of each estimator, using β = 2, with the following results: 75
75
∑ ψi = 165, ∑ i=1
ψi2 i=1
= 375,
75
75
i=1
i=1
∑ φi = 147, ∑ φi 2 = 312.
Let MSE(ψ) = the mean squared error of estimator ψ. Let MSE(φ) = the mean squared error of estimator φ. In this simulation, what is MSE(ψ) / MSE(φ)? A. B. C. D. E.
Less than 0.50 At least 0.50, but less than 0.65 At least 0.65, but less than 0.80 At least 0.80, but less than 0.95 At least 0.95, but less than 1.00
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 747 26.50 (4B, 5/96, Q.12) (1 point) Which of the following must be true of a consistent estimator? 1. It is unbiased. 2. For any small quantity ε, the probability that the absolute value of the deviation of the estimator from the true parameter value is less than ε tends to 1 as the number of observations tends to infinity. 3. It has minimal variance. A. 1 B. 2 C. 3
D. 2, 3
E. 1, 2, 3
26.51 (4B, 11/96, Q.21) (2 points) You are given the following: • The expectation of a given estimator is 0.50.
• •
The variance of this estimator is 1.00.
The bias of this estimator is 0.50. Determine the mean square error of this estimator. A. 0.75 B. 1.00 C. 1.25 D. 1.50
E. 1.75
26.52 (4B, 11/97, Q.15) (1 point) Which of the following statements are always true with respect to the estimation of a parameter? 1. The maximum likelihood estimator differs from the method of moments estimator. 2. Point estimates are preferable to interval estimates. 3. All Bayesian estimators are unbiased. A. None of the above statements are always true. B. 1 C. 2 D. 3 E. 1, 2 26.53 (4B, 11/98, Q.20) (1 point) Which of the following statements are always true with respect to maximum likelihood estimation with small samples? 1. The estimator is unbiased. 2. The variance of the estimator is equal to the Rao-Cramer lower bound. 3. The estimator has a normal distribution. A. None of the above statements are always true. B. 1 C. 2 D. 3 E. 1, 2
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 748 26.54 (4, 5/00, Q.14) (2.5 points) Which of the following statements about evaluating an estimator is false? (A) Modeling error is not possible with empirical estimation. ^
^
^
(B) MSE( θ) = Var( θ) + [Bias( θ)]2 n
∑ (Xi
(C) Sn2 = i=1
- X )2 is an asymptotically unbiased estimator of variance.
n
^
^
^
(D) If θ n is asymptotically unbiased and lim Var( θ n ) = 0, then θ n is weakly consistent. n→∞
(E) An estimate is a random variable or a random function. Note: I have rewritten the question due to a change in the syllabus. 26.55 (4, 11/02, Q.31 & 2009 Sample Q. 49) (2.5 points) You are given: x 0 1 2 3 Pr[X = x] 0.5 0.3 0.1 0.1 The method of moments is used to estimate the population mean, µ, and variance, σ2, n
∑ (Xi
by X and Sn 2 = i=1 (A) -0.72
- X )2 n
(B) -0.49
, respectively. Calculate the bias of Sn 2 , when n = 4. (C) -0.24
(D) -0.08
(E) 0.00
26.56 (4, 11/04, Q.40 & 2009 Sample Q.161) (2.5 points) Which of the following statements is true? (A) A uniformly minimum variance unbiased estimator is an estimator such that no other estimator has a smaller variance. (B) An estimator is consistent whenever the variance of the estimator approaches zero as the sample size increases to infinity. (C) A consistent estimator is also unbiased. (D) For an unbiased estimator, the mean squared error is always equal to the variance. (E) One computational advantage of using mean squared error is that it is not a function of the true value of the parameter.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 749 26.57 (CAS3, 5/05, Q.21) (2.5 points) An actuary obtains two independent, unbiased estimates, Y 1 and Y2 , for a certain parameter. The variance of Y1 is four times that of Y2 . A new unbiased estimator of the form k1 Y 1 + k2 Y 2 is to be constructed. What value of k1 minimizes the variance of the new estimate? A. Less than 0.18 B. At least 0.18, but less than 0.23 C. At least 0.23, but less than 0.28 D. At least 0.28, but less than 0.33 E. 0.33 or more 26.58 (4, 5/05, Q.16 & 2009 Sample Q.186) (2.9 points) For the random variable X, you are given: (i) E[X] = θ, θ > 0. (ii) Var(X) = θ2/25. (iii) θ^ = {k/(k+1)} X, k > 0. (iv) MSEθ^(θ) = 2[bias ^θ(θ)] 2 . Determine k. (A) 0.2 (B) 0.5 (C) 2
(D) 5
(E) 25
26.59 (4, 11/05, Q.28 & 2009 Sample Q.238) (2.9 points) The random variable X has the exponential distribution with mean θ. Calculate the mean-squared error of X2 as an estimator of θ2. (A) 20 θ4
(B) 21 θ4
(C) 22 θ4
(D) 23 θ4
(E) 24 θ4
26.60 (CAS3, 5/06, Q.3) (2.5 points) Mrs. Actuarial Gardner has used a global positioning system to lay out a perfect 20-meter by 20-meter gardening plot in her back yard. Her husband, Mr. Actuarial Gardner, decides to estimate the area of the plot. He paces off a single side of the plot and records his estimate of the length. He repeats this experiment an additional 4 times along the same side. Each trial is independent and follows a Normal distribution with mean 20 meters and a standard deviation of 2 meters. He then averages his results and squares that number to estimate the total area of the plot. Which of the following is a true statement regarding Mr. Gardenerʼs method of estimating the area? A. On average, it will underestimate the true area by at least 1 square meter. B. On average, it will underestimate the true area by less than 1 square meter. C. On average, it is an unbiased method. D. On average, it will overestimate the true area by less than 1 square meter. E. On average, it will overestimate the true area by at least 1 square meter.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 750 26.61 (CAS3, 5/06, Q.4) (2.5 points) The random sample x1 , x2 , ..., xn is from a Normal distribution with unknown mean and variance. Which of the following statements are true? i =n
I.
∑ (xi - x)2 / (n - 1) is a unbiased estimator of the variance of X
i=1 i= n
II.
∑ xi2 / n is a unbiased estimator of E[X2].
i= 1 i= n
III.
∑ (xi - x)2 / n is a consistent estimator of the variance.
i= 1
A. None of the above statements are true. B. I and II C. I and III D. II and III E. I, II, and III Note: This past exam question has been rewritten to match the syllabus of your exam. 26.62 (CAS3, 11/07, Q.4) (2.5 points) You are given the following information:
• Y = X1 +...+ Xn where n = 100 • X1 ,..., Xn is a random sample from a Gamma distribution with α = 4 and θ unknown Calculate the value, c, to produce an unbiased estimate of θ from c Y. A. Less than 0.0035 B. At least 0.0035, but less than 0.0045 C. At least 0.0045, but less than 0.0055 D. At least 0.0055, but less than 0.0065 E. At least 0.0065
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 751 26.63 (CAS3L, 11/08, Q.5) (2.5 points) You are given the following information about a random variable X.
• X follows a Gamma Distribution •α=3 • θ = 100 Using the mean square error criterion, calculate the constant value, c, which is the best estimate of X. A. Less than 50 B. At least 50, but less than 150 C. At least 150, but less than 250 D. At least 250, but less than 350 E. At least 350 26.64 (CAS3L, 5/10, Q.26) (2.5 points) You are given the following information: • X is a random variable with a probability density function f(x, θ) where θ is unknown. • E(X) = θ ^
•θ=
n
∑ ai xi , where the xi's are from a random sample of size n = 5. i=1
• A table of ai as shown below: Option
a1
a2
a3
a4
a5
I II III IV
0.1 0.2 0.4 0.6
0.2 0.1 0.1 0.1
0.1 0.1 0.1 0.1
0.3 0.2 0.2 0.2
0.5 0.4 0.3 0.6
Which option will make θ^ an unbiased estimator of θ? A. Option I B. Option II C. Option III D. Option IV E. None of the above
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 752 26.65 (CAS3L, 5/11, Q.17) (2.5 points) You are given the following information:
• A random variable X is uniformly distributed on the interval (0, θ). • θ is unknown. • For a random sample of size n, an estimate of θ is Yn = max {X1 , X2 , ... , Xn }. • The cumulative distribution function. of Yn is: ⎧ 1 t > θ ⎪ FYn (t) = ⎨(t / θ)n 0 < t ≤ θ ⎪ 0 t ≤ 0 ⎩ Which of the following is a consistent estimator of θ? A. Yn
B. Yn / (n - 1)
C. Yn (n + 1)
D. Yn (n + 1) (n - 1)
E. Yn / (n + 1)
26.66 (CAS3L, 11/11, Q.17) (2.5 points) You are given the following:
• There are three available estimators for θ: X, Y, and Z. • θ is a parameter in a probability distribution function. • The variance of the estimators are Var(X) = 1.9; Var(Y) = 1.0; Var(Z) = 2.1;
• The difference between the expected value of the estimator and the true parameter are Bias(X, θ) = 0.5 Bias(Y, θ) = -1.0 Bias(Z, θ) = 0.0
• You wish to rank the estimators in order to minimize the mean square error (MSE). • Hint: MSE(X) = E[(X-θ)2 ] Choose the ranking of MSE(X), MSE(Y), and MSE(Z) from lowest to highest. A. MSE(X) < MSE(Y) < MSE(Z) B. MSE(X) < MSE(Z) < MSE(Y) C. MSE(Y) < MSE(Z)
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 753 Solutions to Problems: 26.1 C. 1. False, 2. True, 3. True. 26.2. B. UMVUE means an unbiased estimator with the smallest variance. ψ = αt + βv. E[ψ] = αE[t] + βE[v]. For ψ unbiased, E[ψ] = c. Thus c = α(.9c) + β(1.2c). Thus α = (1 - 1.2β)/.9. Since t and v are independent, Var[ψ] = α2Var[t] + β2 Var[v] = α2(.4c2) + β2 (.2c2 ) = c2 ((.2)β2 + (.4){(1 - 1.2β)/.9}2 ) = c2 (.9111β2 - 1.1852β + .4938). We need to minimize the mean squared error of ψ. The mean squared error is the sum of the variance and the square of the bias. Since the bias is zero we need to minimize the variance of ψ.
∂ Var[ψ] / ∂β = c2 (1.8222β - 1.1852). Setting the partial derivative equal to zero: β = 1.1852 / 1.8222 = 0.650. Therefore, α = (1 - 1.2β)/0.9 = 0.244. Comment: In spite of the feeble attempt at humor, this is just a revision of 4B, 11/93, Q.13. 26.3. Let µ be the common mean and σ2 be the common variance. Let c be the true value of the force. In order for ψ = αt + βn to be unbiased, c = αµ+ βµ. ⇒ α = c/µ - β. Var[ψ] = α2σ2 + β2σ2 = σ2{(c/µ − β)2 + β2}. Since ψ is unbiased, the MSE is equal to the square of the variance. In order to minimize the MSE, set the derivative of Var[ψ] with respect to β equal to zero: 0 = σ2{-2(c/µ − β) + 2β}. ⇒ β = c/(2µ). ⇒ α = c/(2µ). ⇒ α/β = 1. Comment: One obtains information from the independent reading of each instrument. If the two instruments have the same characteristics, then we give their readings equal weight. If in addition, each of the instruments were unbiased, then µ = c, and the weight given to each reading would be: c/(2c) = 50%.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 754 26.4. A. mean = (0)(.2) + (25)(.3) + (50)(.4) + (100)(.1) = 37.5. 2nd moment = (02 )(.2) + (252 )(.3) + (502 )(.4) + (1002 )(.1) = 2187.5. variance = 2187.5 - 37.52 = 781.25. S 102 = Σ(Xi - X )2 / 10 = (9/10)(Σ(Xi - X )2 / 9) = (9/10)(sample variance). The sample variance is unbiased. ⇒ E[sample variance] = 781.25.
⇒ E[S102 ] = (.9)(781.25) = 703.125. Bias = expected value of the estimator - true value = 703.125 - 781.25 = -78.1. Comment: Similar to 4, 11/02, Q.31. 1. The sample variance is only used when we have a data set. 2. If we had a sample (of size 4 from this distribution) the sample variance would be an unbiased estimator of the variance of the distribution. 3. In this case, we know the distribution and thus its variance. 4. The estimator given is: (sample variance) (N-1)/N = (sample variance)(3/4). 26.5. B. For an Exponential, E[X] = θ, Var[X] = θ2. Var[ X ] = Var[X]/N = θ2/7. Var[ θ^ ] = Var[c X ] = c2 Var[ X ] = c2 θ2/7. E[ θ^ ] = E[c X ] = cE[ X ] = cE[X] = cθ. Bias[ θ^ ] = E[ θ^ ] - θ = θ(c - 1). MSE = Var[ θ^ ] + Bias[ θ^ ]2 = c2 θ2/7 + θ2(c - 1)2 = θ2{c2 /7 + (c - 1)2 }. Set equal to zero the derivative of MSE with respect to c: 0 = θ2{2c/7 + 2(c - 1)}. ⇒ c = 7/8 = 0.875. Comment: The method of moments estimator, θ^ = X is unbiased, but does not have the smallest mean squared error (for finite sample sizes.)
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 755 26.6. D. For the given distribution: E[X] = (0.3)(0) + (0.6)(1) + (0.1)(5) = 1.1. E[X2 ] = (0.3)(02 ) + (0.6)(12 ) + (0.1)(52 ) = 3.1. Var[X] = 3.1 - 1.12 = 1.89. Sample Variance = {(X1 - X )2 + (X2 - X )2 }/(2 - 1) = (X1 - X )2 + (X2 - X )2 . Sample
Probability
Sample Variance
Squared Difference from True Variance
0, 0
.09
0
1.892
1, 1
.36
0
1.892
5, 5
.01
0
1.892
0, 1 or 1, 0
.36
.5
1.392
0, 5 or 5, 0
.06
12.5
10.612
1, 5 or 5, 1
.12
8
6.112
Mean Squared Error = (0.46)(1.892 ) + (0.36)(1.392 ) + (0.06)(10.612 ) + (0.12)(6.112 ) = 13.57. Comment: Somewhat beyond what you are likely to be asked on the exam. The expected value of the sample variance is: (0.36)(.5) + (0.06)(12.5) + (0.12)(8) = 1.89 = true value of the variance. In general, the sample variance is an unbiased estimator, with mean squared error of: E[(X - E[X])4 ] / n + (3 - n)E[(X - E[X])2 ]2 / (n2 - n). In this case, n = 2, E[(X - E[X])2 ] = Var[X] = 1.89, and E[(X - E[X])4 ] = (0.3)(1.14 ) + (0.6)(.14 ) + (0.1)(3.94 ) = 23.5737. E[(X - E[X])4 ] / n + (3 - n)E[(X - E[X])2 ]2 / (n2 - n) = 23.5737/2 + 1.892 /2 = 13.57.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 756 26.7. D. Cov[A, B] = Corr[A, B] Var[A]Var[B] = (0.7) 600 = 17.15. Var[C] = Var[wA + (1-w)B] = w2 Var[A] + (1-w)2 Var[B] + 2w(1-w)Cov[A, B] = 20w2 + 30(1-w)2 + 34.3w(1-w) = 15.7w2 - 25.7w + 30. Bias[C] = Bias[wA + (1-w)B] = wBias[A] + (1-w)Bias[B] = (-10)w + (5)(1 - w) = 5 - 15w. MSE[C] = Var[C] + Bias[C]2 = 15.7w2 - 25.7w + 30 + (5 - 15w)2 = 240.7w2 - 175.7w + 55. Setting the derivative with respect to w equal to zero: 481.4w - 175.7 = 0. ⇒ w = 175.7/481.4 = .365. Comment: Bias[A] = E[A] - true value. ⇒ Bias[B] = E[B] - true value. Bias[C] = E[C] - true value = wE[A] + (1-w)E[B] - true value = wBias[A] + (1-w)Bias[B]. Here is a graph of the Mean Squared Error as a function of w: MSE 120 100 80 60 40
0.2
0.4
0.6
0.8
1
w
26.8. A. For a Pareto, E[X] = θ/(α−1) = θ/3, E[X2 ] = 2θ2/{(α−1)(α−2)} = θ2/3, Var[X] = 2θ2/9. Var[ X ] = Var[X]/N = (2θ2/9)/5 = 2θ2/45. Var[ θ^ ] = Var[c X ] = c2 Var[ X ] = 2c2 θ2/45. E[ θ^ ] = E[c X ] = cE[ X ] = cE[X] = cθ/3. Bias[ θ^ ] = E[ θ^ ] - θ = θ(c/3 - 1). MSE = Var[ θ^ ] + Bias[ θ^ ]2 = c2 2θ2/45 + θ2(c/3 - 1)2 = θ2{2c2 /45 + (c/3 - 1)2 }. Set equal to zero the derivative of MSE with respect to c: 0 = θ2{4c/45 + (2/3)(c/3 - 1)}. ⇒ c = 15/7 = 2.14. Comment: The method of moments estimator, θ^ = (α-1) X = 3 X is unbiased, but does not have the smallest mean squared error (for finite sample sizes.)
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 757 26.9. A. The distribution function for the maximum of 10 values is: Prob[all 10 values < x] = F(x)10 = (x/b)10. The expected value of this maximum is the integral of its survival function: b
∫0 1 - (x / b)10 dx = b - b/11 = b10/11. Bias of the estimator = Expected Value of the Estimator - True Value = b10/11 - b = -b/11. Comment: See Example 12.5 in Loss Models. For n observations the bias is -b/(n+1); the estimator is biased downwards. 26.10. B. The distribution function for the maximum of 10 values is: (x/b)10. The density for the maximum of 10 values is: d{(x/b)10}/ dx = 10(x/b)9 /b = 10x9 /b10, 0 ≤ x ≤ b. The second moment of this estimator is: b
∫0 x2 10x9 / b10 dx = 10b2/12. Variance of this estimator = 10b2/12 - (b10/11)2 = 5b2/726. MSE = Variance + Bias2 = 5b2 /726 + (-b/11)2 = b2 /66. Comment: See Example 12.10 in Loss Models. For n observations the MSE is: 2b2 /{(n+1)(n+2)}. 26.11. C. E[X2 ] = 2θ2. ⇒ Bias = 2θ2 - θ2 = θ2. E[X4 ] = 24θ4. Var[X2 ] = E[(X2 )2 ] - E[X2 ]2 = 24θ4 - (2θ2)2 = 20θ4. Variance of the average of X2 for a sample of size 100 has Variance: Var[X2 ]/100 = 0.2θ4. MSE = Variance + Bias2 = 0.2θ4 + (θ2)2 = 1.2θ4. Comment: Similar to 4, 11/05, Q.28. For an Exponential Distribution, E[Xk] = k!θk.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 758 26.12. E. c ln[X1 X2 X3 X4 ] = c(lnX1 + lnX2 + lnX3 + lnX4 ). Each ln(Xi) is an independent Normal with the same parameters µ and σ. Thus E[c(lnX1 + lnX2 + lnX3 + lnX4 )] = c 4 E[Xi] = 4cµ. Bias = 4cµ - µ= µ(4c - 1). Var[c(lnX1 + lnX2 + lnX3 + lnX4 )] = c2 4 Var[Xi] = c2 4σ2. MSE = c2 4σ2 + {µ(4c - 1)}2 = c2 4σ2 + µ2(4c - 1)2 . Setting the derivative with respect c equal to zero: 0 = 8cσ2 + 8µ2(4c - 1). ⇒ c(4µ2 + σ2) = µ2. ⇒ c = µ2/(4µ 2 + σ2). 26.13. B. The population mean is: (0 + 2 + 2 + 5 + 6)/5 = 3. There are 10 equally likely subsets of size 3, drawn without replacement: {0, 2, 2}, {0, 2, 5}, {0, 2, 5}, {0, 2, 6}, {0, 2, 6}, {0, 5, 6}, {2, 2, 5}, {2, 2, 6}, {2, 5, 6}, {2, 5, 6}. The means are: 4/3, 7/3, 7/3, 8/3, 8/3, 11/3, 3, 10/3, 13/3, 13/3. The expected value of these estimates is: 3. Bias = 3 - 3 = 0. Second Moment of these estimates is: 9.8. Variance is: 9.8 - 32 = 0.8. MSE = Variance + Bias2 = 0.8 + 02 = 0.8. Comment: Similar to Exercise 12.9 in Loss Models. 26.14. E. The medians of the 10 subsets are: 2, 2, 2, 2, 2, 5, 2, 2, 5, 5. The expected value of these estimates is: 2.9. Bias = 2.9 - 3 = -0.1. Second Moment of these estimates is: 10.3. Variance is: 10.3 - 2.92 = 1.89. MSE = Variance + Bias2 = 1.89 + (-0.1)2 = 1.9. Comment: We are asked to calculate the MSE as an estimator of the mean. 26.15. A. For the 10 subsets, the average of the smallest and largest of the 3 values are: 1, 2.5, 2.5, 3, 3, 3, 3.5, 4, 4, 4. The expected value of these estimates is: 3.05. Bias = 3.05 - 3 = 0.05. Second Moment of these estimates is: 10.075. Variance is: 10.075 - 3.052 = 0.7725. MSE = Variance + Bias2 = 0.7725 + 0.052 = 0.775. Comment: In this case, the mean squared error of this estimator is less than that of the average of the three values.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 759 26.16. D. E[ψ] = 7232/1000 = 7.232. Bias[ψ] = 7.232 - 7 = 0.232. E[ψ2] = 52,480/1000 = 52.480. Var[ψ] = 52.480 - 7.2322 = 0.1782. E[φ] = 6917/1000 = 6.917. Bias[φ] = 6.917 - 7 = -0.083. E[φ2] = 48,126/1000 = 48.126. Var[φ] = 48.126 - 6.9172 = 0.2811. Bias[wψ + (1-w)φ] = w Bias[ψ] + (1 - w)Bias[φ] = w 0.232 + (1 - w)(-0.083) = 0.315w - 0.083. Var[wψ + (1-w)φ] = w2 Var[ψ] + (1 - w)2 Var[φ] = w2 0.1782 + (1 - w)2 (0.2811). MSE[wψ + (1-w)φ] = Var[wψ + (1-w)φ] + Bias[wψ + (1-w)φ]2 = = w2 0.1782 + (1 - w)2 (0.2811) + (0.315w - 0.083)2 . Setting the partial derivative of the MSE with respect w equal to zero: 0 = 2w 0.1782 - 2(1 - w)(0.2811) + 2(0.315)(0.315w - 0.083). 0 = w 0.1782 - 0.2811 + w 0.2811 + 0.0992w - 0.0261. w = .3072/.5585 = 0.550. 26.17. A. -0.4 = Bias[ α^ ] = E[ α^ ] - α = 2.7 - α. ⇒ α = 3.1. 0.49 = MSE[ α^ ] = Var[ α^ ] + Bias[ α^ ]2 = Var[ α^ ] + 0.42 . ⇒ Var[ α^ ] = 0.49 - 0.42 = 0.33. 0.33 = Var[ α^ ] = E[ α^ 2] - E[ α^ ]2 = E[ α^ 2] - 2.72 . ⇒ E[ α^ 2] = 0.33 + 2.72 = 7.62. Bias[ α^ 2] = E[ α^ 2] - α2 = 7.62 - 3.12 = -1.99. 26.18. B. E[cΣXi2 /n] = (c/n)ΣE[Xi2 ] = (c/n)n2θ2 = 2cθ2. Bias = 2cθ2 - θ2 = (2c - 1)θ2. Var[cΣXi2 /n] = (c/n)2 ΣVar[Xi2 ] = (c/n)2 n {E[X4 ] - E[X2 ]2 } = (c2 /n){24θ4 - (2θ2)2 } = 20c2 θ4/n. MSE = Variance + Bias2 = 20c2 θ4/n + (2c - 1)2 θ4. Setting the partial derivative with respect to c equal to zero: 0 = 40cθ4/n + 4(2c - 1)θ4. ⇒ 40c/n + 8c = 4. ⇒ c = 1/(10/n + 2) = n/(10 + 2n). Comment: Similar to 4, 11/05, Q.28. For n = 1, c = 1/12. As n → ∞, c → 1/2. 26.19. E. The expected value of the estimate is: (10)(1 - 1/n) + (n)(1/n) = 11 - 10/n. The bias is: (11 - 10/n) - 10 = 1 - 10/n. Thus the estimator is biased. The bias approaches 1 ≠ 0 as n approaches infinity, and thus the estimator is not asymptotically unbiased. However, the probability of a large error goes to zero as n approaches infinity, and thus the estimator is consistent. Comment: The probability of an error is 1/n, which goes to zero as n approaches infinity.
26.20. A. The density for the sample median of 3 values at x is:
Prob[one value = x, one value < x, one value > x] = 6 f(x) F(x) S(x) = 6(e^(−x/θ)/θ)(1 − e^(−x/θ)) e^(−x/θ) = (6/θ)(e^(−2x/θ) − e^(−3x/θ)).
Therefore, the expected value of this median is:
∫0^∞ x (6/θ)(e^(−2x/θ) − e^(−3x/θ)) dx = (6/θ){(θ/2)^2 − (θ/3)^2} = 5θ/6.
Bias of the estimator = Expected Value of the Estimator - True Value = 5θ/6 - θ = -θ/6. Comment: See Example 12.4 in Loss Models. We can get the median to be x in 3! = 6 different combinations. This is a Gamma type integral. The estimator is biased downwards. 26.21. B. The mode of the original set is 1. The mode of a subset will be 4, if there are 2 or more fours selected. The number of fours in a subset of size three is Binomial with m = 3 and q = 1/3. For this Binomial: f(2) + f(3) = (3)(1/3)2 (2/3) + (1/3)3 = 7/27. Thus 7/27 of the time the estimate will be four, and the remainder of the time it will be 1. The expected value of the estimate is: (7/27)(4) + (20/27)(1) = 48/27 = 1.777. The bias is: 1.777 - 1 = 0.777. 26.22. A. E(η) = 156/100 = 1.56. E(η2) = 257/100 = 2.57. Var(η) = 2.57 - 1.562 = 0.136. Mean Square error of η = Square of Bias plus Variance = (1.56 -1.4)2 + 0.136 = 0.162. E(ζ) = 127/100 = 1.27. E(ζ2) = 321/100 = 3.21. Var (ζ) = 3.21 - 1.272 = 1.597. Mean Square error of ζ = Square of Bias plus Variance = (1.27 -1.4)2 + 1.597 = 1.614. MSE(η) / MSE(ζ) = 0.162 /1.614 = 0.100. Comment: ζ is not as “efficient” an estimator as η, because ζ has a larger mean square error.
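A simulation sketch of 26.20 above; θ = 10 is an arbitrary choice made only for this illustration.

```python
import numpy as np

# Median of 3 exponential draws with mean theta; its expected value should be 5*theta/6.
rng = np.random.default_rng(1)
theta, trials = 10.0, 500_000
med = np.median(rng.exponential(theta, size=(trials, 3)), axis=1)
print(med.mean())    # about 5*theta/6 = 8.33, i.e. bias of about -theta/6
```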
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 761 26.23. A. Bias = Expected value of the estimator - true value. For sample size n, Bias = 100n/(n-1) - 100 = 100/(n-1). As n approaches infinity the bias approaches zero; the estimator is asymptotically unbiased. Statement I is True. The variance of the estimator goes to zero as n approaches infinity. An asymptotically unbiased estimator, whose variance goes to zero as n approaches infinity, is consistent. Statement II is True. MSE = Variance + Bias2 . MSE[ θ^ 25] = 60/25 + (100/24)2 = 19.76. Statement III is False. 26.24. Samples: {0, 0, 0} with probability: .63 = .216. {0, 0, 1} in any order, with probability: (3)(.62 )(.4) = .432. {0, 1, 1} in any order, with probability: (3)(.6)(.42 ) = .288. {1, 1, 1} with probability: .43 = .064. Sample Sample Mean Sample Variance “Biased” Estimator of Variance {0, 0, 0} 0 0 0 {0, 0, 1} 1/3 1/3 2/9 {0, 1, 1} 2/3 1/3 2/9 {1, 1, 1} 1 0 0 Expected Value of the Sample Mean: (.216)(0) + (.432)(1/3) + (.288)(2/3) + (.064)(1) = 0.40. Expected Value of the Sample Variance: (.216)(0) + (.432)(1/3) + (.288)(1/3) + (.064)(0) = 0.24. Expected Value of the biased estimator of the variance (with n in the denominator): (.216)(0) + (.432)(2/9) + (.288)(2/9) + (.064)(0) = 0.16. Comment: The Bernoulli has mean 0.4, and variance (.4)(1 - .4) = 0.24. The sample mean is an unbiased estimator of the mean. The sample variance is an unbiased estimator of the variance. The bias of the biased estimator of the variance is in this case: 0.16 - 0.24 = -0.08.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 762 26.25. 1/ X = n/Σxi. Each Xi is Exponential with mean 1/λ. Therefore, Σxi is Gamma with α = n and θ = 1/λ. The distribution function of this Gamma is Γ[n; λy]. Doing a change of variables, z = 1/y, 1/Σxi has a distribution function, 1 - Γ[n; λ/z]. Thus, 1/Σxi follows an Inverse Gamma Distribution, with α = n and θ = λ. E[1/Σxi] = mean of this Inverse Gamma = λ/(n-1). ^
E[ λ ] = E[1/ X ] = E[n/Σxi] = n λ/(n-1). ^
Bias = λ - λ = n λ/(n-1) - λ = λ/(n-1). Alternately, Σxi is Gamma with α = n and θ = 1/λ. E[1/Σxi] = minus first moment of this Gamma = λ/(n-1). Proceed as before Comment: For an Exponential with mean θ, X is an unbiased estimator of θ. However, 1/ X is a biased estimator of the hazard rate λ. 26.26. E[Z] = aE[X] + bE[Y] = a0.8θ + b1.3θ. For this to be an unbiased estimator we require that θ = E[Z] = a0.8θ + b1.3θ.
⇒ 1 = .8a + 1.3b. ⇒ a = 1.25 - 1.625b. Var[Z] = a2 Var[X] + b2 Var[Y] + 2abCov[X, Y] = a2 0.3θ2 + b2 0.5θ2 + 2ab(-0.2θ2) = θ2 {0.3(1.25 - 1.625b)2 + b2 0.5 - 0.4(1.25 - 1.625b)b} = θ2 {1.9421875b2 - 1.71875b + 0.46875}. In order to minimize this variance, set the derivative with respect to b equal to zero: 0 = 3.884375b - 1.71875. ⇒ b = 0.4425. ⇒ a = 0.5309.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 763 26.27. C. The mean of the original set is 2. The number of fours in a subset of size three is Binomial with m = 3 and q = 1/3. Prob[no fours] = (2/3)3 = 8/27.
Sample Mean = 1.
Prob[one four] = (3)(1/3)(2/3)2 = 12/27. Sample Mean = 2. Prob[two fours] = (3)(1/3)2 (2/3) = 6/27.
Sample Mean = 3.
Prob[three fours] = (1/3)3 = 1/27. Sample Mean = 4. The expected value of the estimate is: (8/27)(1) + (12/27)(2) + (6/27)(3) + (1/27)(4) = 2. The bias is: 2 - 2 = 0. Comment: The sample mean is an unbiased estimator of the underlying mean. 26.28. D. 30 = (-4)2 + Var[A] ⇒ Var[A] = 14. ⇒ Var[B] = (2)(14) = 28. Bias[B] = -4/2 = -2. ⇒ MSE[B] = 28 + (-2)2 = 32.
26.29. This discrete distribution has a mean of 4 and a variance of: {(0 − 4)^2 + (8 − 4)^2}/2 = 16.
There are the following possible samples of size 4:
4 zeros, probability 1/16: sample variance = 0.
3 zeros, probability 4/16: mean = 2, sample variance = (4 + 4 + 4 + 36)/3 = 16.
2 zeros, probability 6/16: mean = 4, sample variance = (16 + 16 + 16 + 16)/3 = 64/3.
1 zero, probability 4/16: mean = 6, sample variance = (36 + 4 + 4 + 4)/3 = 16.
no zeros, probability 1/16: sample variance = 0.
(a) The expected value of the sample variance is: (1/16)(0) + (4/16)(16) + (6/16)(64/3) + (4/16)(16) + (1/16)(0) = 16.
This is equal to the variance of the distribution; the sample variance is an unbiased estimator of the variance (as it is in general).
(b) The expected value of the sample standard deviation is: (1/16)(0) + (4/16)(4) + (6/16)(8/√3) + (4/16)(4) + (1/16)(0) = 2 + √3.
Since the standard deviation of the distribution is 4 ≠ 2 + √3, the sample standard deviation is a biased estimator of the standard deviation.
(c) The expected value of s^4 is: (1/16)(0)^2 + (4/16)(16)^2 + (6/16)(64/3)^2 + (4/16)(16)^2 + (1/16)(0)^2 = 298.667.
For the distribution, σ^4 = 16^2 = 256. 298.667 ≠ 256, so s^4 is a biased estimator of σ^4.
Comment: The sample variance, s^2, is an unbiased estimator of the variance. However, unbiasedness is not preserved under nonlinear transformations, such as taking the square root. Taking the positive square root is a concave down function. Therefore, by a version of Jensenʼs inequality: E[s] = E[√(s^2)] ≤ √(E[s^2]) = √(σ^2) = σ. For distributions that are not all concentrated at a single point, the inequality is strict, and the sample standard deviation is biased downwards. Similarly, since x^2 is a concave up function, s^4 = (s^2)^2 is biased upwards as an estimator of σ^4.
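A simulation sketch of 26.29, drawing samples of size 4 from the two-point distribution {0, 8} with equal probabilities.

```python
import numpy as np

# Sample standard deviation (ddof=1) of 4 draws from {0, 8} with equal probabilities.
rng = np.random.default_rng(2)
x = rng.choice([0.0, 8.0], size=(500_000, 4))
s = x.std(axis=1, ddof=1)
print(s.mean())        # about 2 + sqrt(3) = 3.73, below sigma = 4
print((s**4).mean())   # about 298.7, above sigma^4 = 256
```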
26.30. Let M be the maximum of a sample of size n. Then Prob[M < m] = Prob[all Xi < m] = (m/ω)^n.
Differentiating, the density of the maximum is: n m^(n−1)/ω^n, m ≤ ω.
Thus the expected value of the maximum is:
∫0^ω m n m^(n−1)/ω^n dm = ω n/(n+1).
As n approaches infinity, this expected value approaches ω. Thus the estimator is asymptotically unbiased.
The second moment of the maximum is:
∫0^ω m^2 n m^(n−1)/ω^n dm = ω^2 n/(n+2).
Thus, the variance of the maximum is: ω^2 n/(n+2) − {ω n/(n+1)}^2 = ω^2 n/{(n+2)(n+1)^2}.
As n approaches infinity, this variance goes to zero. An asymptotically unbiased estimator whose variance goes to zero as the sample size goes to infinity is consistent, and thus the given estimator is consistent.
Comment: See Example 12.7 in Loss Models. The maximum of the sample is the maximum likelihood estimator of ω, therefore it must be consistent.
26.31. The distribution has a mean of 2. The third central moment of the distribution is: (3/4)(0 − 2)^3 + (1/4)(8 − 2)^3 = 48.
There are the following possible samples of size 4:
4 zeros, probability 81/256: Σ(Xi − X̄)^3 = 0.
3 zeros, probability 108/256: mean = 2, Σ(Xi − X̄)^3 = (3)(−2)^3 + 6^3 = 192.
2 zeros, probability 54/256: mean = 4, Σ(Xi − X̄)^3 = (2)(−4)^3 + (2)(4)^3 = 0.
1 zero, probability 12/256: mean = 6, Σ(Xi − X̄)^3 = (−6)^3 + (3)(2)^3 = −192.
no zeros, probability 1/256: Σ(Xi − X̄)^3 = 0.
The expected value of the estimator is: (2/3){(81/256)(0) + (108/256)(192) + (54/256)(0) + (12/256)(−192) + (1/256)(0)} = 48.
Since the expected value of the estimator is the same as that of the third central moment, it is an unbiased estimator of the third central moment.
Comment: For a sample of size n > 2, it can be shown that an unbiased estimator of the third central moment is: {n/[(n − 1)(n − 2)]} Σ(Xi − X̄)^3, where the sum runs from i = 1 to n. For n = 4, n/{(n − 1)(n − 2)} = 2/3.
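A simulation sketch of 26.31; the sampling below is an illustrative check, not part of the original solution.

```python
import numpy as np

# Unbiased estimator of the third central moment, checked on {0 w.p. 3/4, 8 w.p. 1/4}.
rng = np.random.default_rng(3)
n, trials = 4, 400_000
x = rng.choice([0.0, 8.0], p=[0.75, 0.25], size=(trials, n))
xbar = x.mean(axis=1, keepdims=True)
stat = n / ((n - 1) * (n - 2)) * ((x - xbar)**3).sum(axis=1)
print(stat.mean())     # about 48, the true third central moment
```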
26.32. The sample mean is consistent provided the mean is finite. This is true for the Pareto Distribution provided α > 1. Comment: For α > 2 the variance is finite, and the sample mean is consistent due to the Central Limit Theorem, as shown in general in Example 12.6 in Loss Models. However, it is sufficient that the mean be finite due to the Law of Large Numbers.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 767 26.33. D. Var[T] = Var[bT1 + (1 - b)T2 ] = b2 Var[T1 ] + (1-b)2 Var[T2 ] + 2b(1-b)Cov[T1 , T2 ]. Set the derivative with respect to b equal to zero: 0 = 2bVar[T1 ] - 2(1-b)Var[T2 ] + (2 - 4b)Cov[T1 , T2 ]. b(Var[T1 ] + Var[T2 ] - 2Cov[T1 , T2 ]) = Var[T2 ] - Cov[T1 , T2 ]. b = (Var[T2 ] - Cov[T1 , T2 ])/(Var[T1 ] + Var[T2 ] - 2Cov[T1 , T2 ]) = (σ 2 2 - Cov[T1 , T2 ]) / (σ 1 2 - 2Cov[T1 , T2 ] + σ2 2). Comment: E[T] = bθ + (1-b)θ = θ. Thus T is an unbiased estimator estimator of θ. We do not use the fact that the Ti are normally distributed. 1 - b = (σ1 2 - Cov[T1 , T2 ])/(σ1 2 - 2Cov[T1 , T2 ] + σ2 2). If T1 and T2 were independent, then b = σ2 2/(σ1 2 + σ2 2). 26.34. A. Third moment of X about its mean = E[(X - 2)3 ] = E[X3 - 6X2 + 12X - 8] = E[X3 ] - 6E[X2 ] + 12E[X] - 8 = T - 6S + (12)(2) - 8 = T - 6S + 16. Comment: In general, E[(X - µ)3 ] = E[X3 ] - 3µE[X2 ] + 2µ3. 26.35. D. λ is both the mean and the variance of the Poisson Distribution. X is an unbiased estimator of the mean and thus of λ. Estimator II is the sample variance, an unbiased estimator of the variance and thus of λ. E[2X1 - X2 ] = 2E[X1 ] - E[X2 ] = 2λ - λ = λ. Thus estimator III is unbiased. 26.36. E. f(x) = 3λx2 exp[-λx3 ]. ln f(x) = ln3 + lnλ + 2lnx - λx3 . ∂ ln f(x)/∂λ = 1/λ - x3 .
∂2 ln f(x)/∂λ2 = -1/λ2. n E[ ∂2 ln f(x)/∂λ2 ] = n/λ2. Rao-Cramer lower bound = λ2/n. Alternately, E[(∂ ln f(x)/∂λ)2 ] = E[(1/λ - x3 )2 ] = E[1/λ2 - 2x3 /λ + x6 ] = 1/λ2 - 2(1/λ)/λ + 2/λ2 = 1/λ2. Rao-Cramer lower bound = 1/{nE[(∂ ln f(x)/∂λ)2 ]} = λ2/n. Comment: This a Weibull Distribution with τ = 3 and θ = 1/λ1/3. E[X3 ] = θ3 Γ[1 + 3/τ] = (1/λ)Γ[2] = 1/λ. E[X6 ] = θ6 Γ[1 + 6/τ] = (1/λ)2Γ[3] = 2/λ2. 26.37. E. ln f(x) = −λx + lnλ. ∂ln f(x)/∂λ = -x + 1/λ. ∂2ln f(x)/∂λ2 = -1/λ2. E[∂2ln f(x)/∂λ2] = -1/λ2. Rao-Cramer lower bound: -1/{n E[∂2ln f(x)/∂λ2]} = λ2/n.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 768 26.38. D. The median of the conditional distribution minimizes the expected absolute error. Thus statement A is not true. The conditional expected value is linear in some special situations, but generally is not. Thus statement B is not true. There is no estimator that can minimize the squared error, as opposed to the expected squared error. Thus statement C is not true. The mean of the conditional distribution minimizes the expected squared error. Thus statement D is true. Statement E is false. 26.39. E. q^ = x/m. E[ q^ ] = E[x/m] = E[x]/m = mq/m = q. E[ q^ 2 ] = E[(x/m)2 ] = E[x2 ]/m2 = (Var[x] + E[x]2 )/m2 = {mq(1-q) + m2 q2 }/m2 = q(1-q)/m + q2 . E[ q^ (1- q^ )] = E[ q^ ] - E[ q^ 2 ] = q - q(1-q)/m - q2 = q(1-q)(m-1)/m. ^ ^ Therefore, q(1q) m/(m-1) is an unbiased estimator of q(1-q).
Alternately, this Binomial can be thought of as a series of m Bernoulli Trials. q(1 − q) is the variance of this Bernoulli Distribution. We know that the sample variance is an unbiased estimator of the variance. Now there have been x successes out of m Bernoulli trials. Thus X̄ = x/m = q̂, and
S^2 = {x(1 − q̂)^2 + (m − x)(0 − q̂)^2}/(m − 1) = {x − 2xq̂ + xq̂^2 + mq̂^2 − xq̂^2}/(m − 1) = {x − 2xq̂ + mq̂^2}/(m − 1) = {q̂m − 2mq̂^2 + mq̂^2}/(m − 1) = {q̂m − mq̂^2}/(m − 1) = q̂(1 − q̂)m/(m − 1).
Therefore, q̂(1 − q̂)m/(m − 1) is an unbiased estimator of q(1 − q).
26.40. B. 1. False. 2. True. 3. False. Maximum Likelihood Estimators are often biased, (but they are asymptotically unbiased.) 26.41. D. The Mean Square Error of an estimator is equal to the Variance plus square of the Bias. For the first estimator: MSE = 0.5 + (3.2-3)2 = 0.54. For the second estimator: MSE = 0.6 + (3.3-3)2 = 0.69.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 769 26.42. D. E[S2 ]= (5/6)σ2, thus the absolute value of the bias is: | (5/6)σ2 - σ2 | = (1/6)σ2 = 2/6 = 0.333. Comment: nS2 / (n-1) is an unbiased estimator of σ2, therefore E[ nS2 / (n - 1)] = σ2. n E[S2 ] / (n - 1) = σ2. Therefore, E[S2 ] = (n - 1)σ2 / n. 26.43. E. 1. is the definition of unbiased estimator. 2. is the definition of Uniformly Minimum Variance Unbiased Estimator (UMVUE). 3. is the definition of a consistent estimator. 26.44. A. 1. T. Definition of Unbiased. 2. False. For an asymptotically unbiased estimator, the average error goes to zero as n goes to infinity. However, there can still be a large chance of a large error, as long as the errors are of opposite sign. For example, a 50% chance of an error of +1000 and a 50% chance of an error of -1000, gives an average error of 0. 3. False. 26.45. C. The MSE = Variance + Bias2 . Bias = 2.2 - 2. Thus Variance = 1.00 - (2.2-2)2 = 0.96. 26.46. C. 1. False. An unbiased estimator would have its expected value equal to the true value, but 2 ≠ 2.05. 2. False. MSE(α1) = (0.052 ) + 1.025 < (0.052 ) + 1.050 = MSE(α2). 3. True. 26.47. D. 1. False: Mean Square Error = Variance of an estimator + Bias2 . Thus statement #1 would imply that the Bias of Maximum Likelihood Estimators is always zero; i.e. , that maximum likelihood estimators are always unbiased. This is not true. 2. True. 3. True. 26.48. E. Z = αX + βY. E[Z] = αE[X] + βE[Y]. For Z unbiased, E[Z] = m. Thus m = α(.8m) + βm. Thus β = 1 - 0.8α. Since X and Y are independent, Var[Z] = α2Var[X] + β2Var[Y] = α2m2 + β21.5m2 = m2 (α2 + 1.5(1 - 0.8α)2) = m2 (1.96α2 - 2.4α + 1.5). The mean squared error is the sum of the variance and the square of the bias. Since the bias is zero, we need to minimize the variance of Z.
∂Var[Z] / ∂α = m2 (3.92α - 2.4). Setting the partial derivative equal to zero: m2 (3.92α - 2.4) = 0. Therefore, α = 2.4 / 3.92 = 0.612.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 770 26.49. B. E(ψ) = 165/75 = 2.2. E(ψ2) = 375/75 = 5. Var(ψ) = 5 - 2.22 = 0.16. Mean Square error of ψ = Square of Bias plus Variance = (2.2 -2)2 + 0.16 = 0.200. E(φ) = 147/75 = 1.96. E(φ2) = 312/75 = 4.16. Var (φ) = 4.16 - 1.962 = 0.318. Mean Square error of φ = Square of Bias plus Variance = (1.96 -2)2 + 0.318 = 0.320. MSE(ψ) / MSE( φ) = 0.200 / 0.320 = 0.625. 26.50. B. Statement 2 is the definition of a consistent estimator, and therefore must be true of a consistent estimator. Neither statement 1 or 3 must be true of a consistent estimator. 26.51. C. Mean squared error = variance plus the square of the bias = 1.00 + 0.52 = 1.25. 26.52. A. 1. False. These two estimators sometimes differ and sometimes are the same. 2. False. Whether a point or interval estimate is preferable would depend on the application. An interval estimate provides more information, but a point estimate is usually simpler to obtain. 3. False. An estimator is unbiased if its expected value is equal to that of the item we are trying to estimate. If one uses a different error function than the squared error function, which in turn results in a different estimator than the mean, such as the median or mode, then one does not have a necessarily unbiased estimator. If one is using the squared error measure, then the Bayesian estimate is the mean of the posterior distribution. This is unbiased, provided the prior distribution used in our computation is a correct model of the situation. While we assume in many computational exam questions that the selected prior distribution used is “correct”, in actual applications the actuary would have to select the prior distribution. In those circumstances, the selected prior distribution is unlikely to match reality and thus even the squared error Bayesian Estimator is unlikely to be unbiased. 26.53. A. While the method of maximum likelihood is asymptotically unbiased, for small samples it is usually biased. So #1 is false. While the method of maximum likelihood is asymptotically of minimum variance, for small samples the variance is usually greater than the Rao-Cramer lower bound. So #2 is false. While for the method of maximum likelihood as the sample size approaches infinity, the errors approach a Normal Distribution, for small samples this is not necessarily true. So #3 is false. Comment: All of three are true in the limit as the sample size approaches infinity.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 771 26.54. E. With empirical estimation there is no model and thus no modeling error; A is true. B is true. Sn2 is biased, (since we divided by n rather than n-1) with an expected value of ((n-1)/n)(Variance.) As n goes to infinity, the expected value of Sn2 goes to the variance. Therefore, Sn2 is an asymptotically unbiased estimator of the variance; C is true. MSE = Variance + Bias2 . An asymptotically unbiased estimator, whose variance goes to zero as the number of data points goes to infinity, has its Mean Squared Error also go to zero. Therefore, the probability of large errors goes to zero, and the estimator is consistent, also called weakly consistent. D is true. E is false, since an estimator, as opposed to an estimate, is a random variable or a random function. 26.55. C. mean = (0)(0.5) + (1)(0.3) + (2)(0.1) + (3)(0.1) = 0.8. 2nd moment = (02 )(0.5) + (12 )(0.3) + (22 )(0.1) + (32 )(0.1) = 1.6. variance = 1.6 - 0.82 = 0.96. S 4 2 = Σ(Xi - X )2 / 4 = (3/4){Σ(Xi - X )2 / 3} = (3/4)(sample variance). The sample variance is unbiased. ⇒ E[sample variance] = 0.96. ⇒ E[S4 2 ] = (3/4)(0.96) = 0.72. Bias = expected value of the estimator - true value = 0.72 - 0.96 = -0.24. Comment: In this case we know the distribution from which the data is drawn. Therefore, we know the variance of this distribution, 0.96. We would normally use the sample variance when the underlying variance is unknown. The sample variance is calculated for a data set in order to estimate the variance of the distribution from which this data was drawn. While the sample variance is on average equal to the “true variance”, 0.96 in this case, for any given sample the sample variance will usually have a value different than 0.96. For example, the sample of size four: {0, 1, 1, 2}, has a sample variance of 2/3.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 772 26.56. D. Statement A: No. “no other unbiased estimator has a smaller variance.” Statement B: No. If there is a bias, that does not decline as the sample size increases to infinity, then the estimator is not consistent. For example, assume the true value is 1, but the estimator is always 2. Then the variance of the estimator is zero, but the estimator is not consistent, since for any n, | ψn - c | = 1. Statement C: No. For example, let the true value be 0. Let ψn = 0 with probability 1 - 2n and ψn = 2n with probability 1/2n . Then E[ψn ] = 1 regardless of n, and thus the estimator is biased. Yet the probability of large errors goes to zero, so the estimator is consistent. Statement D: True. MSE = Bias2 + Variance. So Bias = 0 ⇒ MSE = Variance. Alternately, Bias = 0 ⇔ E[ α^ ] = α. ⇒ MSE = E[( α^ - α)2 ] = E[( α^ - E[ α^ ])2 ] = Variance. Statement E: False. Comment: Let ψn be the estimator with a sample size of n and c be the true value, then ψ is a consistent estimator if given any ε > 0: limit Probability{| ψn - c | < ε} = 1. n →∞
Statement B would have been true if it instead read “An estimator is consistent whenever the mean squared error of the estimator approaches zero as the sample size increases to infinity”, or “An asymptotically unbiased estimator is consistent whenever the variance of the estimator approaches zero as the sample size increases to infinity”. 26.57. B. Let the parameter be θ. E[Y1 ] = θ. E[Y2 ] = θ. E[k1 Y 1 + k2 Y 2 ] = θ.
⇒ k1 E[Y1 ] + k2 E[Y2 ] = θ. ⇒ k1 θ + k2 θ = θ. ⇒ k1 + k2 = 1. Var[k1 Y 1 + k2 Y 2 ] = k1 2 Var[Y1 ] + k2 2 Var[Y2 ] = (1 - k2 )2 4Var[Y2 ] + k2 2 Var[Y2 ]. Set equal to zero the partial derivative with respect k2 : 0 = -8(1 - k2 ) Var[Y2 ] + 2 k2 Var[Y2 ]. ⇒ k2 = 4/5. ⇒ k1 = 1/5 = 0.20. Comment: Weight each estimator inversely proportional to its variance.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 773 26.58. D. E[ θ^ ] = E[{k/(k+1)} X] = {k/(k+1)}E[X] = {k/(k+1)}θ. Bias = E[ θ^ ] - θ = −θ/(k + 1). Var[ θ^ ] = Var[{k/(k+1)} X] = {k/(k+1)}2 Var[X] = {k/(k+1)}2 θ2/25. MSE = Var + Bias2 . We are given that in this case MSEθ^(θ) = 2[bias ^θ(θ)] 2 .
⇒ Var[ θ^ ] = Bias2 . ⇒ {k/(k+1)}2 θ2/25 = {-θ/(k + 1)}2 . ⇒ k2 = 25. ⇒ k = 5. 26.59. B. E[X2 ] = 2θ2. ⇒ Bias = 2θ2 - θ2 = θ2. E[X4 ] = 24θ4. Var[X2 ] = E[(X2 )2 ] - E[X2 ]2 = 24θ4 - (2θ2)2 = 20θ4. MSE = Variance + Bias2 = 20θ4 + (θ2)2 = 21θ4. Alternately, MSE = E[(X2 - θ2)2 ] = E[X4 - 2X2 θ2 + θ4] = E[X4 ] - 2θ2E[X2 ] + θ4 = 24θ4 - 2θ2(2θ2) + θ4 = 21θ4. Comment: As shown in Appendix A attached to the exam, for an Exponential Distribution, the kth moment is: E[Xk] = k!θk. 26.60. D. There are a total of five observations. The average of 5 independent, identically distributed variables has the same mean and 1/5 the variance; it is Normal with mean 20 and variance 4/5 = 0.8. The expected value of the estimator is E[ X 2 ], which is the second moment of a Normal with µ = 20 and σ2 = 0.8, which is: 0.8 + 202 = 400.8. The true value of the area is: 202 = 400. The bias is: 400.8 - 400 = 0.8. It will overestimate on average by 0.8 square meters. Comment: Even though we have an unbiased estimator of the length of a side, squaring it does not give an unbiased estimator of the area, since E[X2 ] ≥ E[X]2 . Variance ≥ 0. ⇒ E[X2 ] ≥ E[X]2 .
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 774 26.61. D. The sample variance is an unbiased estimator of the variance. However, unbiasedness is not preserved under nonlinear transformations, such as taking the square root. Thus statement I is not true. i=n
i=n
i=n
i=1
i=1
i=1
E[ ∑ xi2 / n] = ∑ E[xi2 ]/n = E[X2 ]; ∑ xi2 / n is an unbiased estimator of E[X2 ]. Thus statement II is true. The sample variance, with n-1 in the denominator, is an unbiased estimator of the variance, therefore, i=n
∑ (xi - x)2 / n is an asymptotically unbiased estimator of the variance.
i=1
For a distribution with finite fourth moment, such as the Normal, the variance of this estimator goes to zero as n → ∞. An asymptotically unbiased estimator, whose variance goes to zero as the number of data points goes to infinity, is consistent. Thus statement III is true. i=n
Alternately, for a Normal Distribution ∑ (xi - x)2 / n is the maximum likelihood estimator of the i=1
variance, and thus it is a consistent estimator of the variance. Comment: For any distribution with a finite kth moment, E[
i= n
i= n
i= 1
i= 1
∑ xik / n] = ∑ E[xik]/n = E[Xk].
i= n
Thus
∑ xik / n is an unbiased estimator of the kth moment.
i= 1
For k = 1, the sample mean is an unbiased estimator of the mean of the distribution. For the Normal, or any distribution with finite fourth moment, the sample variance is a consistent estimator of the variance. i=n
I do not believe that either the sample variance or ∑ (xi - x)2 / n are consistent estimators of the i=1
variance for a distribution like a Pareto with α = 3, without a finite fourth moment. i=n
For the Normal, ∑ (xi - x)2 / σ2 is Chi-Square with n-1 degrees of freedom, a Gamma Distribution i=1
with α = (n-1)/2 and θ = 2, with mean n-1, and variance 2(n-1). i=n
i=n
i=1
i=1
Var[ ∑ (xi - x)2 / n ] = Var[ ∑ (xi - x)2 / σ2 ](σ2/n)2 = 2(n-1)σ4/n2 .
26.62. A. E[Y] = E[ΣXi] = 100 E[X] = 100 α θ = 400 θ. ⇒ E[Y/400] = θ. Thus in order to get unbiased estimator of θ, we take c = 1/400 = 0.025
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 775 26.63. D. In order to minimize the squared error loss function we use the mean: αθ = (3)(100) = 300. Alternately, the expected squared error is: E[(X - c)2 ] = E[X2 ] - 2cE[X] + c2 . Setting the derivative with respect to c equal to zero: 0 = -2E[X] + 2c. c = E[X] = αθ = (3)(100) = 300. 26.64. B. E[ θ^ ] = E[ ∑ ai xi ] =
∑ ai E[xi ] = ∑ ai θ = θ ∑ ai .
Thus E[ θ^ ] = θ, and the estimator is unbiased if and only if
∑ ai = 1.
This is the case for Option II. Comment: This could be an Exponential Distribution. If the ai were all equal to 1/n, then the estimator is the sample mean, an unbiased estimator of the underlying mean.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 776 26.65. A. Yn is the maximum likelihood estimator of θ. Therefore it is consistent. Alternately, for a sample from a uniform distribution, the expected values of the order statistics are evenly spaced; E[Yn ] = θ n / (n + 1). As n approaches infinity, E[Yn ] approaches θ; Yn is an asymptotically unbiased estimator of θ. A consistent estimator should seem sensible. None of the other choices are asymptotically unbiased, so choice A makes the most sense. Alternately, by differentiating the distribution function the density of the maximum is: n t n-1 / θn , 0 < t ≤ θ. θ
E[Yn ] =
∫0 t n tn - 1 / θn dt = θ n / (n + 1).
Therefore, Yn is an asymptotically unbiased estimator of θ. θ
E[Yn 2 ] =
∫0 t2 n tn - 1 / θn dt = θ2 n / (n + 2).
Var[Yn ] = θ2 n / (n + 2) - {θ n / (n + 1)}2 = θ2
n . (n + 2) (n +1)2
Therefore, Var[Yn ] goes to zero as n approaches infinity. Since Yn is also an asymptotically unbiased estimator of θ, Yn is consistent. Alternately, in a sample from the uniform distribution on the interval (0, θ), the rth order statistic follows a Beta Distribution with parameters a = r , b = n + 1 - r, and θ. The maximum follows a Beta Distribution with parameters a = n , b = n + 1 - n = 1, and θ. The Beta Distribution has mean = θ and thus variance = θ2
a a (a +1) , second moment = θ2 , a+b (a + b) (a + b + 1)
ab . (a+ b +1)
(a + b)2
Therefore, E[Yn ] = θ n / (n + 1), E[Yn 2 ] = θ2 n / (n + 2), and Var[Yn ] = θ2 n / (n + 2) - {θ n / (n + 1)}2 = θ2
n . Proceed as before. (n + 2) (n +1)2
Comment: One could also get E[Yn ] by integrating its survival function from 0 to θ. Similar to CAS3L, 5/09, Q.24, CAS3, 11/05, Q.6, and CAS3L, 5/08, Q.2.
2013-4-6, Fitting Loss Dists. §26 Properties of Estimators, HCM 10/15/12, Page 777 26.66. C. MSE = Var + Bias2 .
MSE(X) = 1.9 + 0.5^2 = 2.15.
MSE(Y) = 1.0 + (−1)^2 = 2.00.
MSE(Z) = 2.1 + 0^2 = 2.10.
MSE(Y) < MSE(Z) < MSE(X).
Section 27, Variances of Estimates, Method of Moments

One can use the Method of Moments to obtain point estimates of the parameters of a distribution. However, there is a variance in these estimates due to estimation error. For the case when the distribution has a single parameter, it is relatively easy to quantify that variance, and then use this variance of the estimated parameter in order to estimate the variance of a function of the parameter.331

Variance of an Estimated Mean:

As discussed previously, the variance of an average declines in proportion to the inverse of the sample size: Var[X̄] = Var[X]/n.

Variance of Functions:

Let h be a function of y. In general, Δh(y) ≅ (∂h/∂y)(Δy). The variance is the second central moment, the expected squared difference between values and the mean.
Therefore, Var[h(y)] ≅ (Δh(y))^2 ≅ (∂h/∂y)^2 (Δy)^2 ≅ (∂h/∂y)^2 Var[y].
Thus, Var[h(θ̂)] ≅ (∂h/∂θ)^2 Var[θ̂].332
For example, if h(y) = 1/y, then ∂h/∂y = −1/y^2. Var[1/y] = Var[h(y)] ≅ (∂h/∂y)^2 Var[y] = Var[y]/y^4.

Exercise: You estimate y as 10, with the variance of that estimate equal to 9. What is an approximate 95% confidence interval for ln(y)?
[Solution: Let h(y) = ln(y). ∂h/∂y = 1/y. Var[h(y)] ≅ (∂h/∂y)^2 Var[y] = Var[y]/y^2 = 9/10^2 = 0.09. StdDev[h(y)] ≅ 0.3.
Approximate 95% confidence interval for ln(y): ln(10) ± (1.96)(0.3) = 2.3 ± 0.6.]

Variance of an Estimated Parameter:

Assume we have applied the Method of Moments to a single parameter distribution. For example, assume we fit a Pareto with α = 2.5 (fixed) and θ unknown via method of moments. The mean of this Pareto is θ/(α−1) = θ/1.5. Since there is one parameter, the method of moments involves matching the mean: X̄ = θ/1.5.

331 When dealing with method of moments when two (or more) parameters vary, the variance of the estimated parameters is more difficult to estimate. One can use the methods shown at pages 350-351 of Kendallʼs Advanced Theory of Statistics, Volume 1 (1994) by Stuart & Ord.
332 This is a one dimensional special case of the delta method, to be discussed subsequently. See page 351 of Kendallʼs Advanced Theory of Statistics, Volume 1 (1994) by Stuart & Ord.
Therefore θ̂ = 1.5X̄. Therefore Var(θ̂) = 1.5^2 Var(X̄) = 2.25 Var(X)/n.
For X from a Pareto Distribution, with α = 2.5:
Var(X) = 2nd moment − mean^2 = 2θ^2/{(α−1)(α−2)} − θ^2/(α−1)^2 = θ^2 α/{(α−1)^2 (α−2)} = θ^2 (2.5)/{(1.5^2)(0.5)} = (20/9)θ^2.
Thus, Var(θ̂) = 2.25 Var(X)/n = (2.25)(20/9)θ^2/n = (5/n)θ^2.
For the ungrouped data in Section 2, the observed mean is 312,675. Thus for α = 2.5 (fixed), the fitted θ via the Method of Moments is: (312,675)(2.5 − 1) = 469,013.
The variance of this estimate of θ is approximately: (5/n)θ^2 = 8.46 x 10^9, for n = 130 data points. This corresponds to a standard deviation of about 92 thousand.
Thus an approximate 95% confidence interval for θ is: θ = 469,013 ± (1.96)(92 thousand) ≅ 469 ± 180 thousand.
The steps in order to estimate the variance of a single parameter fit by the Method of Moments are:
1. Write the estimated parameter as a function of X̄.
2. Write down the variance of the function from step 1 in terms of the variance of X̄, using Var[h(θ̂)] ≅ (∂h/∂θ)^2 Var[θ̂].
3. Write down the variance of the observed mean in terms of the process variance of a single draw from the distribution: Var(X̄) = Var(X)/n.
4. Write down the process variance of a single draw from the distribution.
5. Combine the results of steps 1 through 4.
Assume that we hold θ fixed and allow α to vary, instead of vice versa as above. For example, take θ = 500,000 in a Pareto Distribution and try to fit α via the Method of Moments to the ungrouped data in Section 2. The mean of this Pareto is θ/(α−1) = 500,000/(α−1). Since there is one parameter, the method of moments involves matching the mean: X̄ = 500,000/(α−1). α̂ = 1 + 500,000/X̄.
Then using the fact that Var[1/y] ≅ Var[y]/y^4:
Var[α̂] = 500,000^2 Var[1/X̄] ≅ 500,000^2 Var[X̄]/X̄^4 = (500,000^2/{500,000/(α−1)}^4) Var[X̄] = {(α − 1)^4/500,000^2} Var[X̄].
For a Pareto Distribution, Var(X) = αθ^2/{(α−1)^2 (α−2)}, and therefore, Var(X̄) = Var(X)/n = (1/n) αθ^2/{(α−1)^2 (α−2)}.
Thus Var[α̂] ≅ {(α−1)^4/500,000^2} Var[X̄] = {(α−1)^4/500,000^2} (1/n) 500,000^2 α/{(α−1)^2 (α−2)} = α(α−1)^2/{n(α−2)}.
For the ungrouped data in Section 2, the observed mean is 312,675. Thus the estimate of α = 1 + (500,000/312,675) = 2.599.
For α = 2.599 and n = 130, the approximate variance is: α(α−1)^2/{n(α−2)} = 0.0853. This corresponds to a standard deviation of about 0.292.
Thus an approximate 95% confidence interval for α is: α̂ = 2.599 ± (1.96)(0.292) ≅ 2.60 ± 0.58.

Variance of Functions of An Estimated Parameter:

For the ungrouped data in Section 2, the observed mean is 312,675. Thus for a Pareto Distribution with α = 2.5 (fixed), the fitted θ via the Method of Moments is: θ = (312,675)(2.5 − 1) = 469,013. As discussed previously, the variance of this estimate of θ is approximately: (5/n)θ^2 = 8.46 x 10^9, for n = 130 data points.

Exercise: For a Pareto Distribution with α = 2.5 and θ = 469,013, determine F(100,000).
[Solution: F(100,000) = 1 − {469,013/(469,013 + 100,000)}^2.5 = 0.383.]

Exercise: Assume we have fit a Pareto Distribution with fixed α = 2.5 to 130 observations via the Method of Moments and obtained a point estimate of θ = 469,013. What is an approximate 90% confidence interval for the Distribution Function at 100,000?
[Solution: h(θ) = 1 − {θ/(θ + 100,000)}^2.5.
∂h/∂θ = −2.5{θ/(θ + 100,000)}^1.5 {100,000/(θ + 100,000)^2} = −250,000 θ^1.5/(θ + 100,000)^3.5 = −250,000(469,013^1.5)/(469,013 + 100,000)^3.5 = −5.78 x 10^−7.
Thus Var[h(θ)] ≅ (∂h/∂θ)^2 Var[θ̂] = (−5.78 x 10^−7)^2 (8.46 x 10^9). StdDev[h(θ)] ≅ (5.78 x 10^−7)(9.20 x 10^4) = 0.053.
Thus an approximate 90% confidence interval for the Distribution Function at 100,000 is: 0.383 ± (1.645)(0.053) ≅ 0.38 ± 0.09.]
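The delta-method calculation in the exercise above can also be scripted. The sketch below is not part of the original guide; it simply re-evaluates the same quantities using the inputs already given (α = 2.5 fixed, n = 130 losses, observed mean 312,675).

```python
import math

# Delta-method interval for F(100,000) under a Pareto with alpha fixed at 2.5.
alpha, n, mean = 2.5, 130, 312_675
theta = mean * (alpha - 1)                      # method of moments: about 469,013
var_theta = 5 / n * theta**2                    # Var(theta-hat) = (5/n) theta^2, about 8.46e9

x = 100_000
F = 1 - (theta / (theta + x))**alpha            # point estimate, about 0.383
dF_dtheta = -alpha * (theta / (theta + x))**(alpha - 1) * x / (theta + x)**2
var_F = dF_dtheta**2 * var_theta                # delta method
half_width = 1.645 * math.sqrt(var_F)           # about 0.087
print(F, half_width)                            # approximate 90% interval: 0.383 +/- 0.09
```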
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 781 Problems: Use the following information for the next 4 questions: You observe 1000 individual claims, which you assume follow a Pareto Distribution. You are given that: 1000
1000
i=1
i=1
1000
1000
1000
1000
∑ xi = 72,134 ∑ 100 1+ x = 7.038 i i=1 ∑ 106.91 + x = 6.682 i i=1
∑ xi2
= 22,535,997
1 2 = 0.0546 i=1 (100 + x i)
∑
1 2 = 0.0488 (106.9 + x ) i i=1
∑
27.1 (1 point) Assuming α = 2.5, estimate θ using the Method of Moments. A. Less than 100 B. At least 100, but less than 105 C. At least 105, but less than 110 D. At least 110, but less than 115 E. At least 115 27.2 (2 points) Estimate the variance of the estimate of θ in the previous question. A. Less than 50 B. At least 50, but less than 52 C. At least 52, but less than 54 D. At least 54, but less than 56 E. At least 56
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 782 27.3 (1 point) Assuming θ = 100, estimate α using the Method of Moments. A. Less than 2.0 B. At least 2.0, but less than 2.2 C. At least 2.2, but less than 2.4 D. At least 2.4, but less than 2.6 E. At least 2.6 27.4 (3 points) Estimate the variance of the estimate of α in the previous question. A. Less than 0.005 B. At least 0.005, but less than 0.008 C. At least 0.008, but less than 0.011 D. At least 0.011, but less than 0.014 E. At least 0.014
27.5 (3 points) You are given the following:
• The random variable X has the density function: f(x) = (1/θ) e-x/θ, x > 0. • A random sample of 6 observations of X yields the values: 10, 12, 15, 17, 19, 23. Estimate the variance of the method of moments estimator of θ. A. Less than 30 B. At least 30, but less than 35 C. At least 35, but less than 40 D. At least 40, but less than 45 E. At least 45
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 783 Use the following information for the next 4 questions: You observe 10 individual claims: 1534, 567, 383, 434, 563, 132, 1262, 748, 299, 516 which you assume follow a Gamma Distribution. 27.6 (1 point) Assuming α = 3, estimate θ using the Method of Moments. A. Less than 190 B. At least 190, but less than 200 C. At least 200, but less than 210 D. At least 210, but less than 220 E. At least 220 27.7 (3 points) Estimate the standard deviation of the estimate of θ in the previous question. A. Less than 20 B. At least 20, but less than 25 C. At least 25, but less than 30 D. At least 30, but less than 35 E. At least 35 27.8 (1 point) Assuming θ = 200, estimate α using the Method of Moments. A. Less than 3.0 B. At least 3.0, but less than 3.2 C. At least 3.2, but less than 3.4 D. At least 3.4, but less than 3.6 E. At least 3.6 27.9 (2 points) Estimate the variance of the estimate of α in the previous question. A. Less than 0.4 B. At least 0.4, but less than 0.5 C. At least 0.5, but less than 0.6 D. At least 0.6, but less than 0.7 E. At least 0.7
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 784 Use the following information to answer the next eight questions: 10,000 individual claims are observed with a total of $1,050,000 in losses. 27.10 (1 point) An exponential distribution F(x) = 1 - e-λx is fit to this data via the method of moments. What is the resulting estimate of λ? A. less than 0.01 B. at least 0.01 but less than 0.02 C. at least 0.02 but less than 0.03 D. at least 0.03 but less than 0.04 E. at least 0.04 27.11 (2 points) What is the standard deviation of the estimate of λ in the prior question? A. less than 0.00008 B. at least 0.00008 but less than 0.00009 C. at least 0.00009 but less than 0.00010 D. at least 0.00010 but less than 0.00011 E. at least 0.00011 27.12 (1 point) Using the exponential distribution fit by the method of moments, what is the estimated process variance of the claim severity? A. 9,000 B. 10,000 C. 11,000 D. 12,000 E. 13,000 27.13 (2 points) What is the standard deviation of the estimate of the variance of the claim severity in the prior question? A. less than 250 B. at least 250 but less than 300 C. at least 300 but less than 350 D. at least 350 but less than 400 E. at least 400 27.14 (1 point) Using the exponential distribution fit by the method of moments, what is the estimated probability of a claim size greater than $200? A. less than 12% B. at least 12% but less than 13% C. at least 13% but less than 14% D. at least 14% but less than 15% E. at least 15%
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 785 27.15 (2 points) What is the standard deviation of the estimate in the prior question? A. less than 0.001 B. at least 0.001 but less than 0.002 C. at least 0.002 but less than 0.003 D. at least 0.003 but less than 0.004 E. at least 0.004 27.16 (1 point) Using the exponential distribution fit by the method of moments, what is the estimated loss elimination ratio for $100? A. less than 40% B. at least 40% but less than 50% C. at least 50% but less than 60% D. at least 60% but less than 70% E. at least 70% 27.17 (3 points) What is the standard deviation of the loss elimination ratio estimated in the previous question? A. less than 0.001 B. at least 0.001 but less than 0.002 C. at least 0.002 but less than 0.003 D. at least 0.003 but less than 0.004 E. at least 0.004
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 786 Use the following information for the next 6 questions: • You observe 100,000 losses for a total of $873 million. • Based on having analyzed similar data from previous years, you believe these losses were drawn from a Weibull Distribution, as per Loss Models, with parameters τ = 1/4 and θ unknown. 27.18 (1 point) Estimate θ using the method of moments. A. Less than 370 B. At least 370, but less than 380 C. At least 380, but less than 390 D. At least 390, but less than 400 E. At least 400 27.19 (2 points) What is the standard deviation of the estimate of θ? A. Less than 6 B. At least 6, but less than 7 C. At least 7, but less than 8 D. At least 8, but less than 9 E. At least 9 27.20 (1 point) Estimate the probability of the next loss having a size greater than 5000. A. Less than 13% B. At least 13%, but less than 14% C. At least 14%, but less than 15% D. At least 15%, but less than 16% E. At least 16% 27.21 (2 points) What is the standard deviation of the estimate that is the solution to the previous question? A. Less than 0.15% B. At least 0.15%, but less than 0.16% C. At least 0.16%, but less than 0.17% D. At least 0.17%, but less than 0.18% E. At least 0.18% 27.22 (1 point) Estimate the 95th percentile of the size of loss distribution. A. 25,000 B. 26,000 C. 27,000 D. 28.000 E. 29,000 27.23 (1 point) What is the standard deviation of the estimate that is the solution to the previous question? A. 750 B. 770 C. 790 D. 810 E. 830
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 787 27.24 (4B, 11/95,Q.17) (3 points) You are given the following: •
Losses follow a Pareto distribution, with parameters θ (unknown) and α = 3.
•
300 losses have been observed.
Determine the variance of the method of moments estimator of θ. A. 0.0025 θ2
B. 0.0033 θ2
C. 0.0050 θ2
D. 0.0100 θ2
E. 0.0133 θ2
27.25 (4B, 5/98, Q.22) (2 points) You are given the following: • The random variable X has the density function f(x) = e-x/θ / θ , 0 < x < ∞ , θ > 0. •
θ is estimated by an estimator based on a large random sample of size n.
•
p is the proportion of the observations in the sample that are greater than 1.
•
The probability that X is greater than 1 is estimated by the estimator exp(-1/θ) .
Determine the approximate variance of the estimator for the probability that X is greater than 1 if the estimator of θ is the average of X observed in the random sample. A.
θ2 n
B.
e− 1 / θ n
C.
e− 1 / θ nθ
D.
e− 2 / θ n θ2
E.
(1 - e− 1 / θ ) e− 1 / θ n
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 788 Solutions to Problems: 27.1. C. Since there is one parameter (remaining), the method of moments consists of matching means. The observed mean is 72134 / 1000 = 72.134. The mean of this Pareto is θ/(α-1) = θ/1.5. Since there is one parameter, the method of moments involves matching the mean: mean = θ / 1.5. Therefore θ = 1.5 mean = (1.5)(72.134) = 108.2. 27.2. E. θ = 1.5 mean. Therefore Var(θ) = 1.52 Var(mean) = 2.25 Var(mean). Var(mean) = (1/n) Var(x). For x from a Pareto Distribution, with α = 1.5: Var(x) = 2nd moment - mean 2 = 2θ2 / {(α−1)(α−2)} - θ2 / (α−1)2 = θ2α / { (α−2)(α−1)2} = θ2 {2.5/(.5)(1.5)2 } = (20/9)θ2 . Therefore, Var(θ) = (1.52 ) Var(mean) = (2.25/n) Var(x) = (5/n)θ2 . For n =1000 and θ = 108.2 (from the previous solution): Var(θ) = (5/1000)(108.22 ) = 58.54. Comment: One could instead estimate Var(x) from the empirical variance. However, what I have done is the preferred solution; use the Pareto assumption with alpha = 2.5. Then using the fitted theta from the previous question, allows us to estimate Var(x). This is the preferred technique because: 1. It is consistent with how we estimated theta in previous question. 2. It avoids relying on the second moment of the data, which for a sample from a Pareto, even of size 1000, can be subject to considerable random fluctuation. 27.3. C. Since there is one parameter (remaining), the method of moments consists of matching means. The observed mean is: 72,134 / 1000 = 72.134. The mean of the Pareto is: θ/(α-1) = 100/(α-1). Set 100/(α-1) = 72.134. ⇒ α = 1 + (100/72.134) = 2.386.
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 789 27.4. D. The 5 steps are as follows: 1. Write the estimated parameter as a function of the observed mean: The mean of this Pareto is: θ/(α-1) = 100/(α−1). Since there is one parameter, the method of moments involves matching the mean: X = 100/(α−1). Therefore the method of moments estimator of the single parameter is: α^ = 1 + (100 / X ). 2. Write down the variance of the function from step 1 in terms of the variance of the observed mean: Var[1/y] ≅ (∂(1/y)/ ∂y)2 Var[y] = Var[y]/y4 . Since the α^ = 1 + (100 / X ), Var[ α^ ] ≅ 1002 Var[1/ X ] = 1002 Var[ X ] / X 4 = 1002 Var[ X ] / (100/(α−1))4 = {(α − 1)4 / 1002}Var[ X ]. 3. Write down the variance of the observed mean in terms of the process variance of a single draw from the distribution: Var( X ) = Var(X) / n . 4. Write down the process variance of a single draw from a Pareto distribution: Var(X) = θ2 α /{(α−1)2 (α−2)} = 1002 α /{(α−1)2 (α−2)}. 5. Combine the results of steps 1 through 4: Thus, Var[ α^ ] ≅ {(α − 1)4 / 1002}Var[ X ] = {(α − 1)4 / 1002}(1/n)Var[X] = {(α − 1)4 / 1002}(1/n)1002 α /{(α−1)2 (α−2)} = α(α−1)2 / {(α−2)n}. For n =1000 and α^ = 2.386 (from the previous solution), Var[ α^ ] ≅ (2.386)(1.3862 ) / 386 = 0.0119. 27.5. D. In order to apply the Method of Moments: θ = mean = (10+12+15+17+19+23)/6 = 16. Var[θ] = Var[mean] = Var[X]/n = θ2/n = 162 /6 = 42.67. Comment: For the Exponential Distribution, the Method of Maximum Likelihood applied to ungrouped data is the same as the Method of Moments. Thus this is also the variance of the estimate of θ via the Method of Maximum Likelihood. 27.6. D. Mean = 643.8. θ = mean/ α = 643.8/3 = 214.6.
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 790 27.7. E. Since the estimated θ = mean/α, VAR[estimated θ] = VAR[observed mean]/α2. The observed mean is (1/n) times the sum of n independent identically distributed Gammas. Therefore, the observed mean has variance: (1/n2 )(nVAR[Gamma]) = (1/n)VAR[Gamma] = (1/n) αθ2. Thus VAR[estimated θ] = (1/α2)(1/n) αθ2 = θ2 / (n α). For n =10, α = 3 and θ = 214.6 (from previous solution), Var(θ) = (214.62 ) / 30 = 1535. Thus the standard deviation of the estimated θ is:
1535 = 39.2.
Alternately, for the Gamma with alpha fixed, the method of moments is equal to maximum likelihood, so one can use the delta method. For a Gamma Distribution with alpha fixed, f(x) = θ−αxα−1 e−x /θ / Γ(α). lnf(x) = -αln(θ) + (α-1) ln(x) - x/θ - ln[Γ(α)]. ∂2ln[f(x)] = α/θ2 - 2x/θ3. ∂θ2 Thus, E[
∂ln[f(x)] = (-α/ θ) + x/θ2. ∂θ
However, E[X] = αθ.
∂2ln[f(x)] ] = α/θ2 - 2(αθ) / θ3 = -α/θ2. ∂θ2
Therefore, for n data points the estimated variance of θ is: θ2 / (n α). Proceed as before. 27.8. C. Mean = 643.8. α = mean / θ = 643.8/200 = 3.22. 27.9. A. α = mean / θ. Thus Var(α) = Var(mean) / θ2 . Var(mean) = (1/n) Var(x). For x from a Gamma Distribution: Var(x) = αθ2 . Therefore, Var(α) = Var(mean) / θ2 = (1/n) (αθ2 ) / θ2 = α/n. For n =10 and α = 3.22 (from previous solution) , Var(α) = 3.22/10 = 0.322. 27.10. A. For the method of moments, one sets 1/λ = observed mean = 105. Therefore, λ = 1/mean = 1/105 = 0.00952.
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 791 27.11. C. ^λ = 1/ X . Var[ ^λ ] = (∂ ^λ /∂ X )2 Var[ X ] = (-1/ X 2 )2 Var[ X ] = Var[ X ]/mean4 = (Var[x]/n)/(1/λ)4 = (1/λ2)(1/n)λ4 = λ2/n = (.00952)2 / 10000 = 9.06 x 10-9. Thus the standard deviation = 9.06 x 10 - 9 = 0.0000952. Alternately, for the Exponential Distribution fit to ungrouped data, the Method of Moments is equal to the Method of Maximum Likelihood. ln f(x) = -λx + lnλ.
∂ln f(x) / ∂λ = -x + 1/λ.
∂2 ln f(x / ∂λ2 = -1/λ2.
E[∂2 ln f(x) / ∂λ2] = -1/λ2.
Thus Var[ ^λ ] ≅ -1/{nE[∂2 ln f(x) / ∂λ2]} = λ2/n = (.00952)2 / 10000 = 9.06 x 10-9. Thus the standard deviation =
9.06 x 10 - 9 = 0.0000952.
27.12. C. Variance of an Exponential Distribution is 1/λ2 = 1/ .009522 = 11,034. 27.13. A. If we were to let h(λ) = 1 / λ2, then Var[h] ≅ (∂h / ∂λ)2 Var[λ] = (-2/λ3)2 λ 2/n = 4 λ-4 /n = 4(.00952)-4 / 10000 = 48698. Therefore, the standard deviation =
48,698 = 221.
27.14. D. 1 - F(x) = e−λx. 1- F(200) = e-(200)(.00952) = 0.149. 27.15. C. From a previous solution, Var[ ^λ ] ≅ λ2/n. Let the survival function be h(x;λ) = e−λx, then Var[h] ≅ (∂h / ∂λ)2 Var[ ^λ ] = (-xe−λx)2 λ2/n = (λx)2 e−2λx /n = {(.00952)(200)}2 e-2(200)(.00952) / 10000 = .00000805. Therefore, the standard deviation =
0.00000805 = 0.00284.
Comment: Thus an interval estimate of the tail probability at 200 is: .149 ± .006. 27.16. D. For the exponential distribution the loss elimination ratio = 1 - e-λx = 0.614. 27.17. D. Let the loss elimination ratio be h(x;λ) = 1 - e−λx.
∂h / ∂λ = x e-λx = 38.6 for x = 100 and λ = .00952. Var[h] ≅ (∂h / ∂λ)2 Var[λ] = (38.62 ) (9.06 x 10-9) = 1.35 x 10-5. Standard deviation = 0.00367.
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 792 27.18. A. The mean of a Weibull Distribution is θΓ(1+1/τ) = θ Γ(5) = 24θ. The observed mean is $873 million / 100,000 = $8730. 24θ = 8730. θ = 8730/24 = 363.75. 27.19. E. The second moment of a Weibull Distribution is: θ2Γ(1+2/τ) = θ2Γ(9) = 40320θ2. Therefore, the variance of a Weibull Distribution is: 40320θ2 - (24θ)2 = 39744θ2 = 39744(363.75)2 . θ = mean/24. Var[θ] = Var[mean]/242 = {Var[x]/(100,000)}/ 576 = 39744(363.75)2 / 57,600,000 = 0.00069(363.75)2 . ⇒ The standard deviation of the estimate of θ is: 363.75
0.00069 = 9.55.
27.20. C. S(x) = exp[-(x/θ)τ]. S(5000) = exp[-(5000/363.75)1/4] = 0.146. 27.21. E. S(x) = exp(-(x/θ)τ). ∂ S(x) / ∂θ = exp(-(x/θ)τ) τxτ/θτ+1 .
∂ S(5000) / ∂θ = (.146)(1/4)(50001/4)/(363.755/4) = .000193. Var[S(5000)] ≅ (∂S(x)/ ∂θ)2 (Var[θ]) = (.0001932 )(Var[θ]). Therefore, the standard deviation of the estimate of the survival function at 5000 is: (.000193)(standard deviation of the estimate of θ) = (.000193)(9.55) = 0.00185. Comment: An approximate 95% confidence interval for S(5000) is: 0.146 ± 0.004, assuming these losses are actually drawn from a Weibull Distribution with τ = 1/4. 27.22. E. 0.95 = 1 - exp[-(x/θ)τ] = 1 - exp[-(x/363.75)1/4]. ⇒ exp(-(x/363.75)1/4) = 0.05.
⇒ (x/363.75)1/4 = ln(20).⇒ x = 363.75 (ln(20))4 = 29,296. 27.23. B. Estimate of the 95th percentile = θ (ln(20))4 . Therefore, the standard deviation of the estimate of the 95th percentile is: (ln(20))4 (standard deviation of the estimate of θ) = (80.54)(9.55) = 769. 27.24. D. Since there is one parameter, the method of moments involves matching the mean: mean = θ / (α -1) = θ / 2. Thus θ = 2 (mean). Thus Var(θ) = 22 Var(mean) = 4 Var(mean). Var(mean) = (1/n) Var(x). For x from a Pareto Distribution, with α = 3: Var(x) = 2nd moment - mean2 = 2θ2 / {(α−1)(α−2)} - θ2 / (α−1)2 = θ2 {2/(2)(1) - 1/22 } = 0.75 θ2 . Thus, Var(θ) = 4 Var(mean) = (4/n) Var (x) = (4/300) 0.75 θ2 = 0.01 θ2 .
2013-4-6, Fitting Loss Dists. §27 Variances Meth. Moments, HCM 10/15/12, Page 793 27.25. D. Using the empirical mean as the estimator for θ is applying the Method of Moments. θ = mean. Var[θ] = Var[mean] = Var[X]/n = θ2/n. The quantity of interest is: S(1) = e-1/θ. ∂ ln S(1) / ∂θ = e-1/θ /θ2 . Var[S(1)] ≅ (∂S(1) / ∂θ)2 Var[θ] = (e-1/θ /θ2 )2 θ2/n = e−2/θ / (nθ2). Alternately, for the Exponential Distribution, as applied to ungrouped data the method of moments is equal to the method of maximum likelihood. ln f(x) = -x/θ - lnθ. ∂ ln f(x) / ∂θ = x/θ2 -1/θ.
∂2 ln f(x) / ∂θ2 = -2x/θ3 +1/θ2. E[∂2 ln f(x) / ∂θ2] = -2E[x]/θ3 +1/θ2 = -2θ/θ2 + 1/θ2 = -1/θ2. The quantity of interest is S(1) = e-1/θ. ∂ ln S(1) / ∂θ = e-1/θ /θ2 . Thus the approximate variance of the estimator for the probability that X is greater than 1 is: -(∂S(1) / ∂θ)2 / {n E[∂2 lnf(θ) / ∂θ2 ]} = -(e-1/θ /θ2 )2 / {n (-1/θ2)} = e−2/θ / (nθ2).
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 794
Section 28, Variance of Estimated Single Parameters, Maximum Likelihood If one has fit a distribution to data, one has obtained a point estimate of the parameter(s) of the distribution. However, there is a variance in that estimate due to estimation error. In this section, for the case with one parameter fit via Maximum Likelihood, how to quantify that variance will be discussed, as well as how to use this variance to obtain interval estimates. Assuming the form of the distribution and the other parameters are fixed, the approximate variance of the estimate of a single parameter using the method of maximum likelihood is given by negative the inverse of the product of the number of data points times the expected value of the second partial derivative of the log of the density:333 Variance of ^θ ≅
nE
[∂2
−1 1 = . 2 ln f(x) / ∂ θ] n E [(∂ ln f(x) / ∂θ) 2]
The above denominator is called the information, or Fisherʼs Information, the one dimensional special case of the information matrix to be discussed in a subsequent section. (Fisherʼs) Information = -n E [∂2 ln f(x) / ∂θ2] = n E [(∂ ln f(x) / ∂θ)2]. Exercise: An Exponential Distribution, f(x) = λe-λx, has been fit to 500 data points via the method of maximum likelihood. The fitted value of λ = 0.03. What is the variance of this estimate of λ? [Solution: ln f(x) = -λx + lnλ.
∂ ln f(x) / ∂λ = -x +1/λ.
∂2 ln f(x) / ∂λ2 = -1/λ2.
E[∂2 ln f(x) / ∂λ2] = -1/λ2. Thus Var[λ] ≅ -1/nE[∂2 ln f(x) / ∂λ2] = λ2/n = (0.03)2 / 500 = 0.0000018.] This formula can be used when one has held all but one parameter in a distribution fixed, and the remaining parameter has been estimated by maximum likelihood. For example, assume we have a Pareto with θ = 300,000.334 As noted in a previous section, fitting to the ungrouped data in Section 2 by maximum likelihood gives α = 1.963.335 ln f(x) = ln(α) + α ln(300,000) − (α+1)ln(300,000 + x). The derivative with respect to α is: (1/α) + ln(300,000) − ln(300,000 + x). The second derivative with respect to α is: −1/α2. ⇒ E [∂2 ln f(x) / ∂α2] = E[−1/α2] = -1/α2. 333
This is the Cramer-Rao lower bound. Maximum likelihood estimators are asymptotically UMVUE. Note that one can also use an alternate formula involving the expected value of the square of the first partial derivative. 334 Since θ is taken as fixed, this is really only a one parameter distribution. 335 This differs from the maximum likelihood curve when both parameters are allowed to vary freely.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 795 In general, write the log density, take the second partial derivative, and then take the expected value with respect to x. The variance of the estimate of α is approximately: -1 / {n E [∂2 ln f(x) / ∂α2]} = -1/(-n/α2) = α2 / n = 1.9632 / 130 = 0.0296. Thus standard deviation is approximately
0.0296 = 0.172.
In this case as well as in general, the variance of the estimate is inversely proportional to the number of data points used to fit the distribution. This can be a useful final check of your solution. This is same behavior as we had for the variance of an average, X .336 For maximum likelihood estimators, the errors are asymptotically normal. Thus ±1.96 standard deviations is an approximate 95% confidence interval for α. Using ±1.96 standard deviations, our estimate of α is: 1.96 ± 0.34. The information is: -n E [∂2 ln f(x) / ∂θ2]. One can calculate the expected value of the partial second derivative by integration with respect to x. Alternately, the expected value of the partial second derivative can be approximated by summing the partial second derivative at the fitted parameters over all the observed loss values and then dividing by the number of observed losses.337 In either case, when dealing with a particular distribution, one requires the second partial derivatives for that distributionʼs log density with respect to its parameters.338 Second Partial Derivatives of the Log Density: Distribution
Parameter
Second Partial Derivative of Log Density
Pareto
α
-1/α2
Pareto
θ
{(α+1)/(θ+x)2 } - { α / θ2}
LogNormal
µ
-1/ σ2
LogNormal
σ
{σ2 - 3(lnx − µ)2} / σ4
Gamma
α
−ψ′(α)
Gamma
θ
α/θ2 - 2x/θ3
336
For the Exponential, method of moments is equal to maximum likelihood, and therefore the variance of the maximum Iikelihood estimate of theta is the variance of X . 337 When multiplied by -n, this approximation is called the observed information. See page 397 of Loss Models. 338 One can approximate the second partial derivative by numerical means. See page 397 of Loss Models.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 796 Exponential
θ
1/θ2 - 2x/θ3
Weibull
θ
τ/θ2 - τ(τ+1) xτ θ−(τ+2)
Weibull
τ
{−1/τ2} − (ln(x/θ))2(x/θ)τ
Single Parameter
α
−1/α2
Burr
α
−1/α2
Burr
θ
{γ/(θ(1+(x/θ)γ)2}{1 +(x/θ)γ(1 − γ − α(1+γ+(x/θ)γ))}
Burr
γ
(1+α)(x/θ)γ (ln(x/θ))2/{(1+(x/θ)γ)}2 − 1/γ2
Trans. Gamma
α
−ψ′(α)
Trans. Gamma
θ
{ τ α / θ2} − τ (τ+1) xτθ−(τ+2)
Trans. Gamma
τ
{−1/τ2 } - (ln(x/θ))2 (x/θ)τ
Gen. Pareto
α
ψ′(α + τ) − ψ′(α)
Gen. Pareto
θ
(α + τ)/(θ + x)2 - α/θ2
Gen. Pareto
τ
ψ′(α +τ) − ψ′(τ)
Pareto
Where ψ′(α) is the trigamma function. The trigamma function is the derivative of the digamma function. The digamma function ψ(α) is equal to the derivative with respect to α of ln Γ(α). ψ(α) = d { ln Γ(α)} / dα . ψ′(α) = d2 { ln Γ(α)} / dα2. The trigamma function can be computed from the series summed from n = 0 to infinity: ψ′(α) = Σ (α + n) −2.
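Several of the entries above involve the trigamma function. As a hedged illustration, the series given in the last paragraph can be evaluated with a few lines of Python (my own sketch, not the author's); the tail correction is a simple integral approximation I have added so that a moderate number of terms suffices.

```python
def trigamma(alpha, terms=1000):
    """Approximate psi'(alpha) = sum over n >= 0 of 1/(alpha + n)^2.
    The tail beyond the last term is approximated by the integral of
    1/x^2, which equals 1/(alpha + terms)."""
    total = sum(1.0 / (alpha + n) ** 2 for n in range(terms))
    return total + 1.0 / (alpha + terms)

# For example, psi'(3.11) should be roughly 0.379, the value used
# in one of the problems below.
print(trigamma(3.11))
```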
Example for a Weibull Distribution with θ fixed:
Here is another example of how to use these second partial derivatives to estimate the variance of a parameter of a distribution fit by maximum likelihood, assuming the other parameter(s) have been selected in advance and are fixed. Assume we have a Weibull with θ = 225,000.339 Fitting by maximum likelihood to the ungrouped data in Section 2 turns out to give τ = 0.686.340 f(x) = τ(x/θ)τ exp(-(x/θ)τ) / x = τ xτ−1 exp(-(x/225,000)τ)/225,000τ. lnf(x) = ln(τ) + (τ-1) ln(x) - (x/225,000)τ - τ ln(225,000). The derivative with respect to τ is: 1/τ + ln(x) - ln(x/225,000) (x/225,000)τ - ln(225,000). ∂2 ln f(x) / ∂τ2 = -1/τ2 - (ln(x/225,000))2 (x/225,000)τ.
E[∂2 ln f(x) / ∂τ2] = -1/τ2 - E[(ln(x/225,000))2 (x/225,000)τ] ≅ -1/0.6862 - (1/n) Σ[(ln(xi/225,000))2 (xi/225,000)0.686] = -1/0.6862 - 315.718/130 = -0.3036.341 When we approximate the information by substituting in the observed values as was done here, the result is called the observed information. Thus the variance of the estimate of τ is approximately:
Var[τ^] ≅ -1 / {n E[∂2 ln f(x) / ∂τ2]} = 1 / {(130)(0.3036)} = 0.0253.
StdDev[τ^] ≅ √0.0253 = 0.159. This gives an interval estimate of τ of: 0.69 ± 0.31.342
339 Since θ is taken as fixed, this is really only a one parameter distribution.
340 Note that this differs from the maximum likelihood fit when both parameters are allowed to vary freely.
341 Note that to compute this value, we have to take the sum over all data points of (ln(x/θ))2 (x/θ)0.686. For the ungrouped data in Section 2, this sum equals 315.718.
342 Using ±1.96 standard deviations, and assuming that θ = 225,000.
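The observed-information calculation in this example can be sketched in Python as follows. This is my own illustration: since the ungrouped data of Section 2 is not reproduced here, the sketch simulates 130 stand-in losses from the fitted Weibull, so its output will not match the 315.718 or 0.0253 quoted above.

```python
import math, random

theta = 225000.0   # held fixed, as in the example above
tau = 0.686        # maximum likelihood estimate with theta fixed

# Stand-in data: 130 losses simulated from this Weibull, since the
# actual ungrouped data of Section 2 is not reproduced here.
random.seed(1)
losses = [random.weibullvariate(theta, tau) for _ in range(130)]

# Observed information: minus the sum over the data of
# d^2 ln f / d tau^2 = -1/tau^2 - (ln(x/theta))^2 (x/theta)^tau.
observed_information = -sum(-1.0 / tau ** 2
                            - math.log(x / theta) ** 2 * (x / theta) ** tau
                            for x in losses)

var_tau = 1.0 / observed_information
std_tau = math.sqrt(var_tau)
print(var_tau, (tau - 1.96 * std_tau, tau + 1.96 * std_tau))
```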
Problems:
28.1 (1 point) A distribution with a single parameter γ has been fit via maximum likelihood to 1321 claims. E[∂2 ln f(x) / ∂γ2] = -0.45. What is the standard deviation of the estimate of γ?
A. 0.02
B. 0.03
C. 0.04
D. 0.05
E. 0.06
28.2 (3 points) You are given the following:
•
The random variable X has the density function: f(x) = (1/θ) e-x/θ, x > 0.
•
A random sample of 6 observations of X yields the values: 10, 12, 15, 17, 19, 23.
Estimate the variance of the maximum likelihood estimator of θ. A. Less than 30 B. At least 30, but less than 35 C. At least 35, but less than 40 D. At least 40, but less than 45 E. At least 45
28.3 (2 points) A distribution F(x) = 1 - {1000 / (1000 + x)}α is fit by the method of maximum likelihood to 200 claims. The resulting estimate of the single parameter is α = 4. What is the variance of this estimate? A. less than 0.10 B. at least 0.10 but less than 0.12 C. at least 0.12 but less than 0.14 D. at least 0.14 but less than 0.16 E. at least 0.16
28.4 (3 points) A distribution F(x) = 1 - {θ/(θ+x)}3 is fit by the method of maximum likelihood to 200 claims. The resulting estimate of the single parameter is θ = 500. Assume that Σ 1/(500 + xi)2 = 0.00048, where the sum is over the 200 claims. What is the variance of this estimate?
A. less than 2000 B. at least 2000 but less than 2100 C. at least 2100 but less than 2200 D. at least 2200 but less than 2300 E. at least 2300
28.5 (2 points) A LogNormal Distribution with σ = 3 is fit via maximum likelihood to 100 claims. The fitted value of the parameter is µ^. Determine the Cramer-Rao lower bound.
A. 1/µ^
B. 0.01
C. 0.09
D. µ^/10
E. µ^/100
28.6 (2 points) 10 Claims have been observed. The underlying distribution is assumed to be Gamma, with parameters α = 8 and theta unknown. Which of the following is an expression for the variance of the maximum likelihood estimator of theta? A. 80 / θ
B. 80 / θ2
C. θ2 / 80
D. θ / 80
E. None of A, B, C, or D.
Use the following information for the next 2 questions: You observe 1000 individual claims, which you assume follow a Pareto Distribution. You are given the following sums over the 1000 claims:
Σ xi = 72,134
Σ xi2 = 22,535,997
Σ 1/(100 + xi) = 7.038
Σ 1/(100 + xi)2 = 0.0546
Σ 1/(106.9 + xi) = 6.682
Σ 1/(106.9 + xi)2 = 0.0488
28.7 (2 points) Assuming θ = 100, α has been estimated using the Method of Maximum Likelihood. The resulting estimate of α is 2.366. Estimate the variance of this estimate of α. A. Less than 0.005 B. At least 0.005, but less than 0.008 C. At least 0.008, but less than 0.011 D. At least 0.011, but less than 0.014 E. At least 0.014
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 800 28.8 (3 points) Assuming α = 2.5, θ has been estimated using the Method of Maximum Likelihood. The resulting estimate of θ = 106.9. Estimate the variance of this estimate of θ. A. Less than 20 B. At least 20, but less than 22 C. At least 22, but less than 24 D. At least 24, but less than 26 E. At least 26 Use the following information for the next 3 questions: You observe 10 individual claims: 1534, 567, 383, 434, 563, 132, 1262, 748, 299, 516 which you assume follow a Gamma Distribution. 28.9 (1 point) Assuming α = 3, estimate θ using the Method of Maximum Likelihood. A. Less than 190 B. At least 190, but less than 200 C. At least 200, but less than 210 D. At least 210, but less than 220 E. At least 220 28.10 (3 points) Estimate the standard deviation of the estimate of θ in the previous question. A. Less than 20 B. At least 20, but less than 25 C. At least 25, but less than 30 D. At least 30, but less than 35 E. At least 35 28.11 (3 points) Assuming θ = 200, α has been estimated using the Method of Maximum Likelihood. The resulting estimate of α = 3.11. Estimate the variance of this estimate of α. You may use: Γ(3.11) = 2.219, ψ(3.11) = 0.965, and ψ′(3.11) = 0.379, where ψ(α) = d ln Γ(α) /dα, the digamma function, and ψ′(α) = d ψ(α) /dα. A. Less than 0.2 B. At least 0.2, but less than 0.3 C. At least 0.3, but less than 0.4 D. At least 0.4, but less than 0.5 E. At least 0.5
28.12 (3 points) A LogNormal Distribution with µ = 5 is fit via maximum likelihood to 100 claims. The fitted value of the parameter is σ^. Determine the Cramer-Rao lower bound.
A. 0.01
B. σ^/10
C. σ^/100
D. σ^2/100
E. σ^2/200
28.13 (3 points) A random sample of size 100 is drawn from a distribution with probability density function: f(x) = 2θ2 / (θ + x)3 , 0 < x < ∞, θ > 0. Determine the coefficient of variation of the maximum likelihood estimate of θ. A. Less than 0.07 B. At least 0.07, but less than 0.10 C. At least 0.10, but less than 0.13 D. At least 0.13, but less than 0.16 E. At least 0.16 28.14 (3 points) f(x) = 3(x2 /θ3) exp[-(x/θ)3]. What is E[∂2 ln f(x) / ∂θ2]? A. -3/θ2
B. -6/θ2
C. -12/θ2
D. -24/θ2
E. None of A, B, C, or D.
28.15 (3 points) A density f(x) = λ3x2 e−λx/2 is fit by the method of maximum likelihood to 25 claims. The resulting estimate of the single parameter is λ = 5000. What is an approximate 95% confidence interval for this estimate? A. [4850, 5150] B. [4500, 5500] C. [4200, 5800] D. [3850, 6150]
E. [3400, 6600]
28.16 (3 points) A distribution F(x) = Φ[{ln(x) - 10} / σ] is fit by the method of maximum likelihood to 100 claims. The resulting estimate of the single parameter is σ = 2. Assume that the sum over the 100 claims Σ {ln(xi) - 10}2 = 900. What is the variance of this estimate?
A. less than 0.004 B. at least 0.004 but less than 0.006 C. at least 0.006 but less than 0.008 D. at least 0.008 but less than 0.010 E. at least 0.010
28.17 (3 points) A Weibull Distribution with τ fixed and θ unknown has been fit via maximum likelihood to a sample of n points. Determine Var[θ^].
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 802 28.18 (2 points) The probability density function is f(x) = (1 + γ) xγ for 0 < x < 1, γ > -1. A sample of size 50 is drawn and the maximum likelihood estimate for γ is computed to be 3. Determine the asymptotic variance of the maximum likelihood estimate of γ. A. 0.26
B. 0.28
C. 0.30
D. 0.32
E. 0.34
28.19 (4, 5/85, Q.57) (3 points) The following five numbers have been drawn from a distribution with probability density function f(x) = λe-λx: 1.59, 3.71, 4.09, 5.31, 13.03.
a. Find the maximum likelihood estimate, λ^, of λ.
b. Estimate the variance of the λ^ obtained in (a). You may assume that the maximum likelihood estimator has an approximate normal distribution.
c. Use λ^ to estimate Pr[X>5].
28.20 (4, 5/90, Q.54) (3 points) A single observation, x, is taken from a normal distribution with mean µ = 0 and variance σ2 = θ. The normal distribution has its probability density function given by: f(x) = exp[-(x - µ)2 / (2σ2)] / (σ√(2π)), -∞ < x < ∞. Let θ^ be the maximum likelihood estimator of θ. Which of the following is the variance of θ^? A. 1 / θ
B. 1 / θ2
C. 1 / 2θ
D. 2θ
E. 2θ2
28.21 (4, 5/91, Q.48) (2 points) The following sample of 10 claims is observed: 1500 6000 3500 3800 1800 5500 4800 4200 3900 3000 The underlying distribution is assumed to be Gamma, with parameters α = 12 and θ unknown. Which of the following is an expression for the variance of the maximum likelihood estimator of θ? A. 1200/θ
B. 12 /38000
C. θ2 /12
D. θ2 /120
E. θ2 /1200
28.22 (4B, 5/92, Q.22) (2 points) A random sample of n claims, x1, x2, ..., xn, is taken from the following exponential distribution: f(x) = (1/θ) e-x/θ, x > 0. Determine the variance of the maximum likelihood estimator for θ.
A. (Σ xi)2 / n
B. (Σ xi)2 / n2
C. θ / (2n)
D. θ2 / n
E. θ2 / (2n)
28.23 (4B, 11/92, Q.5) (2 points) A random sample of n claims x1, x2, ..., xn, is taken from the distribution function: F(x) = 1 - x−α, x > 1. Determine the asymptotic variance of the maximum likelihood estimator for α.
A. (Σ xi)2 / n
B. 2n / α
C. α / (2n)
D. n / α2
E. α2 / n
28.24 (4B, 11/92, Q.25) (2 points) You are given the following information:
• A random sample of 30 observations, x1, x2, ..., x30, has been observed from the distribution: f(x) = exp[-x2/(2θ)] / √(2πθ), -∞ < x < ∞.
• The maximum likelihood estimator θ1 of θ is an unbiased estimator.
• The estimated information of θ1 is 15/θ2.
If θ = 1.85, determine a 95% confidence interval for θ1. A. (1.403, 2.297) B. (0.914, 2.786) C. (1.162, 2.538) D. (1.499, 2.201) E. Cannot be determined.
28.25 (4B, 5/93, Q.8) (2 points) A random sample of n claims x1, x2, ..., xn, is taken from the probability density function f(xi) = exp[-(xi - 1000)2/(2θ)] / √(2πθ), -∞ < xi < ∞.
Let θ1 be the maximum likelihood estimator of θ. Determine the asymptotic variance of θ1.
A. Σ (xi − 1000)2 / n
B. θ / (2n)
C. 2n / θ
D. 2θ2 / n
E. θ2 / n
28.26 (4B, 11/93, Q.9) (2 points) A random sample of 5 claims x1 ,..., x5 is taken from the probability density function: f(xi) = αθα (θ+xi)-(α+1), α, θ, xi > 0. In ascending order the observations are: 43, 145, 233, 396, 775. Given that θ = 1000, let α^ be the maximum likelihood estimate of α. Determine the asymptotic variance of α^ . A. 10 /α2
B. α2 /10
C. 5 /α
D. α2 / 5
E. α2 /25
28.27 (4B, 5/94, Q.28) (2 points) You are given the following:
• A random sample of 40 observations, x1, x2, ..., x40, has been observed from the distribution: f(x) = λe−λx, x > 0.
• λ1 is the maximum likelihood estimator of λ.
• The information of λ1 is 40 / λ12.
If λ1 = 5.00, determine a 95% confidence interval for λ.
A. (4.375, 5.625)
B. (4.209, 5.791)
C. (3.775, 6.225)
D. (3.450, 6.550)
E. (2.521, 7.479)
28.28 (4B, 11/94, Q.9) (2 points) You are given the following: A random sample of 40 observations, x1, ..., x40 has been observed from the distribution f(x) = exp[-x2/(2θ)] / √(2πθ), -∞ < x < ∞.
The maximum likelihood estimator θ1 of θ is an unbiased estimator. The information of θ1 is 20 / θ12. If θ1 = 2.00, estimate the Mean Square Error of θ1. A. 0.20 B. 0.45 C. 5.00 D. 10.00 E. Cannot be determined from the given Information.
28.29 (4B, 11/94, Q.22) (3 points) You are given the following: The claims frequency rate for a group of insureds is believed to be an exponential distribution with unknown parameter λ, F(x) = 1 - e-λx, x > 0, where X = the claims frequency rate. Ten random observations of X yield the following sample in ascending order: 0.001, 0.003, 0.053, 0.062, 0.127, 0.131, 0.377, 0.382, 0.462, 0.481. Summary statistics for the sample data are:
∑ xi = 2.079 ;
10
i=1
i=1
∑ xi2
= 0.773 ;
10
∑ ln[xi ] = -25.973 i=1
λ^ is the maximum likelihood estimator for λ. Use the normal distribution to determine a 95% confidence interval for λ based upon the sample data. A. (0.20, 0.22)
B. (0.27, 9.36)
C. (1.83, 7.79)
D. (2.50, 7.12)
E. (3.29, 6.33)
28.30 (4B, 5/96, Q.15) (2 points) You are given the following: •
The random variable X has the density function f(x) = (1/θ) e-x/θ , x >0.
•
A random sample of three observations of X yields the values 10, 20, and 30.
Estimate the variance of the maximum likelihood estimator of θ. A. Less than 50 B. At least 50, but less than 150 C. At least 150, but less than 250 D. At least 250, but less than 350 E. At least 350
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 806 28.31 (IOA 101, 4/00, Q.15) (13.5 points) An engineer is interested in estimating the probability that a particular electrical component will last at least 12 hours before failing. In order to do this, a random sample of n components is tested to destruction and their failure times, x1 , x2 ,..., xn are recorded. The engineer models failure times by assuming that they come from a distribution with distribution function, F, and probability density function, f, given below. F(x) = 1 - 1/(1 + x)α, f(x) = α/(1 + x)α+1, α > 0, x > 0. (i) (6 points) Determine α^ , the maximum likelihood estimator of α, and, assuming n is large, use asymptotic theory to determine an approximate 95% confidence interval for α. (ii) (4.5 points) A sample of size n = 80 leads to a maximum likelihood estimate of α of 0.56. Use this value to: (a) estimate the probability a component will fail before 12 hours, (b) determine an approximate upper 95% one-sided confidence interval for α, and (c) hence determine an approximate 95% one-sided confidence interval which provides an upper bound for the probability in part (ii)(a) above. (iii) (3 points) Sixty-one of the eighty components tested in part (ii) failed before 12 hours, so a second engineer estimates the failure probability by 61/80 = 0.7625, and constructs an upper 95% confidence interval based on the binomial distribution. (a) Construct this interval for the probability a component will fail before 12 hours, and (b) comment on the advantages and disadvantages of this method when compared to the method of part (ii). 28.32 (4, 11/01, Q.22 & 2009 Sample Q.69) (2.5 points) You fit an exponential distribution to the following data: 1000 1400 5300 7400 7600. Determine the coefficient of variation of the maximum likelihood estimate of the mean, θ. (A) 0.33
(B) 0.45
(C) 0.70
(D) 1.00
(E) 1.21
28.33 (4, 11/05, Q.18 & 2009 Sample Q.229) (2.9 points) A random sample of size n is drawn from a distribution with probability density function: f(x) = θ/(θ + x)2 , 0 < x < ∞, θ > 0. Determine the asymptotic variance of the maximum likelihood estimator of θ. (A)
3θ2 n
(B)
1 3nθ 2
(C)
3 nθ2
(D)
n 3θ2
(E)
1 3θ2
Solutions to Problems:
28.1. C. Var[γ] = -1/{n E[∂2 ln f(x) / ∂γ2]} = -1/{(1321)(-0.45)} = 0.00169. Therefore, the standard deviation of the estimate of γ is: √0.00169 = 0.041.
28.2. D. f(x) = (1/θ)e-x/θ. Σln(f(xi)) = Σ(−lnθ - xi/θ). ∂Σln(f(xi)) /∂θ = Σ(−1/θ + xi/θ2 ). Setting the partial derivative with respect to θ of the loglikelihood equal zero: 0 = Σ(−1/θ + xi/θ2 ) = -n/θ + (1/θ2 )Σ xi. Therefore the maximum likelihood estimate of θ is: (1/n)Σ xi = (10 + 12 + 15 + 17 + 19 + 23)/6 = 16.
∂2 ln(f(x)) / ∂θ2 = 1/θ2 - 2x/θ3 . For this Exponential Distribution, the mean E[x] = θ. Therefore, E[ ∂2 ln(f(x)) /∂θ2 ] = E[1/θ2 -2x/θ3 ] = 1/θ2 - (2/θ3 )E[x] = 1/θ2 - (2/θ3 )(θ) = -1/θ2 . Variance of θ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = θ2 / n. In this case n = 6, and the estimated θ is 16. Thus Variance of θ ≅ 162 /6 = 42.67. Comment: For the Exponential Distribution, the Method of Maximum Likelihood applied to ungrouped data is the same as the Method of Moments, thus the variance of the estimates are the same. 28.3. A. f(x) = α1000α/(1000+x)α+1. ln f(x) = ln(α) + α ln(1000) - (α + 1)ln(1000 + x).
∂ ln f(xi) / ∂α = {(1/α) + ln(1000) - ln(1000 + xi)}. ∂2 ln f(xi) / ∂α2 = -1/α2. Thus the Variance of α ≅ -1 / {n E[∂2 ln f(x) / ∂α2]} = α2 / n = 16 / 200 = 0.08. Alternately, the one by one information matrix is: -n E [∂2 ln f(x) / ∂α2] = n / α2 = 200 /16 = 12.5. Thus the Variance of α ≅ (information matrix)-1 = 1/12.5 = 0.08. Comment: This is a Pareto Distribution with theta fixed at 1000.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 808 28.4. B. For the Pareto, f(x) = (αθα)(θ + x)−(α + 1). ln f(x) = ln(α) + α ln(θ) - (α+1)ln(θ + x). ∂ ln f(x) / ∂θ = α /θ - (α+1)/(θ + x). ∂2 ln f(x) / ∂θ2 = −α /θ2 + (α+1)/(θ + x)2. We need the expected value of the second partial derivative of the log density with respect to θ. The expected value of the first term is: - α / θ2. 200
The Expected Value of 1 / (θ+x)2 ≅ (1 / 200) Σ (500 + xi)-2 = .00048 / 200 = 2.4 x 10-6. i=1
Therefore the Expected Value of this second partial derivative is approximately: (4)(2.4 x 10-6) - 3/5002 = -2.4 x 10-6. Therefore the Variance of the estimated θ ≅ -1 / {n E [∂2 ln f(x) / ∂θ2]} = -1/{200(-2.4 x 10-6)} = 2083. Alternately, one could let x be a random draw from a Pareto Distribution: ∞ ∞ 1 f(x) α θα α θα α E[ ]= ∫ dx = ∫ dx = = . + α + α 2 2 3 2 (x+ θ) (x +θ) (x + θ) (α + 2) θ (α + 2) θ2 0 0
E[
∂2 lnf α α α ] = - 2 + (a+1) =. 2 2 ∂θ θ (α + 2) θ (α + 2) θ2
Therefore the Variance of the estimated θ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = (α + 2)θ2 / (nα) = (5)(5002 ) / {200(3)} = 2083. Comment: Pareto distribution with α = 3. α /{(α+2) θ2 } = (3)/{(3+2)(5002 )} = 0.0000024. In this question, the exact value of E [1/(θ+x)2 ] matches the observed average value for (500 + xi)-2 of: 0.00048 / 200. While in this case the information and its approximation the observed information are equal, in general the two values will differ. On the exam, if they go out of their way to give you the sum of (500 + xi)-2 as here, then just go ahead and use it. If the two methods of solution give different answers, hopefully both will be given credit.
28.5. C. f(x) = exp[-(ln(x) − µ)2 / (2σ2)] / (xσ√(2π)).
lnf(x) = -(ln(x) − µ)2/(2σ2) - ln(x) - ln(σ) - ln(2π)/2.
∂ lnf / ∂µ = (ln(x) − µ)/σ2. ∂2 lnf / ∂µ2 = -1/σ2.
The Cramer-Rao lower bound is: -1 / {n E[∂2 ln f(x) / ∂µ2]} = σ2/n = 9/100 = 0.09.
28.6. C. For a Gamma Distribution with alpha fixed at 8, f(x) = θ−8 x7 e−x/θ / Γ(8). lnf(x) = -8 ln(θ) + 7 ln(x) - x/θ - ln[Γ(8)].
∂ ln f(x) / ∂θ = -8/θ + x/θ2. ∂2 ln f(x) / ∂θ2 = 8/θ2 - 2x/θ3.
E[∂2 ln f(x) / ∂θ2] = 8/θ2 - 2(8θ)/θ3 = -8/θ2.
Therefore, for 10 data points the estimated variance is: -1 / {(10)(-8 / θ2)} = θ2 / 80. Comment: For a Gamma Distribution, the expected value of the 2nd partial derivative of the log density with respect to θ is: α/θ2 - 2E[X]/θ3 = α/θ2 - 2αθ/θ3 = -α/θ2. 28.7. B. For the Pareto f(x) = (αθα)(θ + x)−(α + 1). The log density is: ln f(x) = ln(α) + α ln(θ) − (α+1)ln(θ + x). The derivative with respect to α is: 1/α + ln(θ) − ln(θ + x) Thus, the second derivative with respect to α is: -1/α2 . The expected value of the second partial derivative of the log density with respect to α is: −1/α2 . Thus the Variance of α^ ≅ -1 / {n E [∂2 ln f(x) / ∂α2]} = α2 / n = 2.3662 / 1000 = 0.00560. Comment: Compare the variance here of .0056 to that computed in a previous question for the method of moments of 0.0119.
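As an informal check of this result (my own sketch, not part of the original solution), one can simulate Gamma samples with alpha fixed at 8 and verify that the variance of the maximum likelihood estimate of theta, which with alpha fixed equals the sample mean divided by alpha, is close to theta2/80.

```python
import random, statistics

alpha, theta, n = 8, 100.0, 10
random.seed(1)
estimates = []
for _ in range(20000):
    sample = [random.gammavariate(alpha, theta) for _ in range(n)]
    # With alpha fixed, maximum likelihood gives theta-hat = sample mean / alpha.
    estimates.append(statistics.mean(sample) / alpha)

print(statistics.variance(estimates))   # should be close to theta^2 / 80 = 125
```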
28.8. B. For the Pareto f(x) = αθα (θ + x)−(α + 1). ln f(x) = ln(α) + α ln(θ) − (α+1) ln(θ + x).
∂ ln f(x) / ∂θ = α/θ − (α+1)/(θ + x). ∂2 ln f(x) / ∂θ2 = −α/θ2 + (α+1)/(θ + x)2.
We need the expected value of the second partial derivative of the log density with respect to θ. The expected value of the first term is: -α/θ2 = -2.5/106.9².
The Expected Value of 1/(θ+x)2 ≅ (1/1000) Σ (106.9 + xi)-2 = 4.88 x 10-5.
Therefore the Expected Value of this second partial derivative is: (3.5)(4.88 x 10-5) - 2.5/106.9² = -4.80 x 10-5.
Therefore the Variance of the estimated θ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = -1/{1000(-4.80 x 10-5)} = 20.8.
Alternately, one could let x be a random draw from a Pareto Distribution. Integrating from 0 to ∞:
E[1/(x+θ)2] = ∫ f(x)/(x+θ)2 dx = ∫ αθα/(x+θ)α+3 dx = αθα / {(α+2)θα+2} = α / {(α+2)θ2}.
E[∂2 lnf / ∂θ2] = -α/θ2 + (α+1) α/{(α+2)θ2} = -α / {(α+2)θ2}.
Therefore the Variance of the estimated θ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = {(α+2) θ2 } / (nα) = (4.5)(106.92 ) / 2500 = 20.6. Comment: Difficult! α /{(α+2) θ2 } = 2.5/ {(4.5)(106.92 )} = 0.0000486. In this question, E[1/(θ+x)2 ] does not (precisely) match the observed average value for (106.9 + xi)-2 of: .0000488. In general the information and its approximation the observed information will differ, resulting in (somewhat) different estimates for the variance of the estimate of θ. On the exam, if the examiners go out of their way to give you the sum of (106.9 + xi)-2 as here, then just go ahead and use it. If the two alternatives give different answers, hopefully both will be given credit.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 811 28.9. D. For the Gamma with α fixed, the Method of Maximum Likelihood applied to ungrouped data is equal to the Method of Moments. θ = mean/ α = 643.8/3 = 214.6. Alternately, for the Gamma f(x) = θ−αxα−1 e−λx/θ /Γ(α), ln f(x) = -α lnθ +(α-1) lnx -x/θ - lnΓ(α). 0 = ∂Σln f(xi) / ∂θ = Σ {-α/θ +xi/θ2} = -nα/θ + Σ xi/θ2. Therefore, θ = {Σ xi / n} / α = 643.8/3 = 214.6. 28.10. E. For the Gamma, f(x) = θ−αxα−1 e−x/θ /Γ(α). ln f(x) = -α lnθ +(α-1) lnx -x/θ - lnΓ(α). The partial derivative with respect to θ is: -α/θ + x /θ2. Thus, the second partial derivative with respect to θ is: α/θ2 - 2x /θ3. E[∂2 ln f(x) / ∂θ2] = E[α/θ2 - 2xi /θ3] = α/θ2 - 2 E[x]/θ3 = α/θ2 - 2αθ /θ3 = -α/θ2. Thus the Variance of θ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = θ2/nα. For n =10, α = 3, and θ = 214.6 (from previous solution), Var(θ) = (214.62 ) / 30 = 1535. Thus the standard deviation of the estimated θ is
√1535 = 39.2.
Comment: Since for α fixed in the Gamma the method of moments is equal to method of maximum likelihood, the variances of the estimates are the same. 28.11. B. For the Gamma, f(x) = θ−αxα−1 e−x/θ /Γ(α). ln f(x) = -α lnθ + (α-1) lnx - x/θ - lnΓ(α). The partial derivative with respect to α is: −ln(θ) + ln(xi) − ψ(α). Thus, the second partial derivative with respect to α is: −ψʼ(α) . The expected value of the second partial derivative of the log density with respect to α is −ψʼ(α). Thus the Variance of α ≅ -1 / n E[∂2 ln f(x) / ∂α2] = 1/ {nψʼ(α)}. For n =10 and α = 3.11 (given in the question), Var(α) = 1/ {10ψʼ(3.11)} = 1/{(10)(.379)} = 0.264. Comment: Beyond what is likely to be asked on the exam. Note that the variance of the estimate here is less than that for the similar estimate from the method of moments. Note that the first derivative of the sum of the loglikelihoods is: -nln(θ) + Σln(xi) − nψ(α). Setting this equal to zero yields the equation:
ψ(α) = ln(λ) + (1/n) Σln(xi) = ln(0.005) + (1/10)(62.618) = 0.963. One can then find by numerical methods the corresponding value of α which is approximately 3.11.
28.12. E. f(x) = exp[-(ln(x) − µ)2 / (2σ2)] / (xσ√(2π)).
lnf(x) = -(ln(x) − µ)2/(2σ2) - ln(x) - ln(σ) - ln(2π)/2.
∂ lnf / ∂σ = (ln(x) − µ)2/σ3 - 1/σ. ∂2 lnf / ∂σ2 = -3(ln(x) − µ)2/σ4 + 1/σ2.
(ln(x) − µ)/σ is Normal with mean 0 and standard deviation one.
Therefore, E[(ln(x) − µ)2/σ2] is the second moment of a Standard Normal Distribution: E[(ln(x) − µ)2/σ2] = 1² + 0² = 1.
Therefore, E[∂2 lnf / ∂σ2] = -3/σ2 + 1/σ2 = -2/σ2.
The Cramer-Rao lower bound is: -1 / {n E[∂2 ln f(x) / ∂σ2]} = σ2/(2n) = σ^2/200.
28.13. D. f(x) = 2θ2/(θ + x)3. ln f(x) = ln(2) + 2 ln(θ) - 3 ln(θ + x).
∂ ln f(x) / ∂θ = 2/θ - 3/(θ + x). ∂2 ln f(x) / ∂θ2 = -2/θ2 + 3/(θ + x)2.
E[1/(θ + x)2] = ∫ {1/(θ + x)2} {2θ2/(θ + x)3} dx = 2θ2 (-1/4)(θ + x)-4, evaluated from x = 0 to x = ∞, = 1/(2θ2).
Therefore, E[∂2 ln f(x) / ∂θ2] = -2/θ2 + 3/(2θ2) = -1/(2θ2).
Variance of θ^ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = 2θ2/n = 2θ2/100 = 0.02θ2.
Coefficient of Variation of θ^ = √(0.02θ2) / θ = 0.1414.
Comment: Similar to 4, 11/05, Q.18. A Pareto Distribution with α = 2 fixed. For a Pareto Distribution with α fixed, the Variance of θ^ is: (α + 2)θ2/(nα).
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 813 28.14. E. ln f(x) = ln(3) + 2ln(x) - 3ln(θ) - (x/θ)3. ∂ ln f(x) / ∂θ = -3/θ + 3x3 /θ4. ∂2 ln f(x) / ∂θ2 = 3/θ2 - 12x3 /θ5. E[∂2 ln f(x) / ∂θ2] = 3/θ2 - 12E[x3 ]/θ5. This is a Weibull distribution with τ = 3. Therefore, E[x3 ] = θ3Γ(1 + 3/3) = θ3Γ(2) = θ3. E[∂2 ln f(x) / ∂θ2] = 3/θ2 - 12θ3/θ5 = -9/θ2. Comment: In general, for a Weibull Distribution, E[∂2 ln f(x) / ∂θ2] = -τ2/θ2. 28.15. D. f(x) = λ3x2 e−λx /2. lnf(x) = 3ln(λ) + 2 ln(x) - λx - ln(2). ∂ ln f(x) / ∂λ = (3/ λ) - x. ∂2 ln f(x) / ∂λ2 = -3/λ2 . E [∂2 ln f(x) / ∂λ2] = -3/λ2 = -3/50002 . Therefore the estimated variance is: -1/{25(-3/50002 } = 333333. The standard deviation is 577. A 95% confidence interval is: 5000 ± (1.96)(577) = 5000 ± 1132. Comment: This is a Gamma Distribution with alpha fixed at 3, and θ = 1/λ. The second partial derivative of the loglikelihood with respect to λ is: - α / λ2. 28.16. C. This is a LogNormal Distribution with mu fixed at 10. f(x) = exp[-.5 ({ln(x) − µ} / σ)2] /{xσ 2 π ). lnf(x) = -.5 ({ln(x) − µ} / σ)2 - ln(x) - ln(σ) - .5ln(2π). ∂ ln f(x) / ∂σ = {ln(x) − µ}2 / σ3 - 1/σ. ∂2 ln f(x) / ∂σ2 = -3{ln(x) − µ}2 / σ4 + 1/σ2. E [∂2 ln f(x) / ∂σ2] ≅ {σ2 - 3Σ(lnxi − µ)2/n} / σ4 = {22 - (3)(900/100)} / 24 = -1.4375. Therefore the estimated variance ≅ -1 / {n E[∂2 ln f(x) / ∂σ2]} = 1/(100)(1.4375) = 0.0070. Comment: The information has been approximated by the observed information. 28.17. f(x) = τ(x/θ)τ exp(-(x/θ)τ) / x.
ln f(x) = ln(τ) + (τ−1) ln(x) - (x/θ)τ - τ ln(θ).
∂ ln f(x) / ∂θ = τ xτ/θτ+1 - τ /θ.
∂2 ln f(x) / ∂θ2 = -τ(τ+1) xτ/θτ+2 + τ /θ2.
E[∂2 ln f(x) / ∂θ2] = -τ(τ+1) E[xτ]/θτ+2 + τ /θ2. For the Weibull, E[xn ] = θn Γ(1+ n/τ). ⇒ E[xτ] = θτΓ(1+ τ/τ) = θτ Γ(2) = θτ.
⇒ E[∂2 ln f(x) / ∂θ2] = -τ(τ+1) θτ/θτ+2 + τ /θ2 = -τ2/θ2 Var[ θ^ ] = -1/{n(−τ2/θ2)} = (θ2/τ2)/n. ^
Comment: For τ = 1, we get an Exponential Distribution with Var[ θ ] = θ2/n.
28.18. D. ln[f(x)] = ln[1 + γ] + γ ln[x].
∂ ln[f(x)] / ∂γ = 1/(1+γ) + ln[x]. ∂2 ln[f(x)] / ∂γ2 = -1/(1+γ)2.
E[∂2 ln[f(x)] / ∂γ2] = -1/(1+γ)2 = -1/16.
⇒ Asymptotic variance is: -1 / {n E[∂2 ln f(x) / ∂γ2]} = 16/50 = 0.32.
Comment: A Beta Distribution with a = γ + 1, b = 1, and θ = 1.
28.19. The likelihood L(λ) =
5
f(xi) = λ5 exp(−λ ∑ xi ). ∏ i=1
5
ln L(λ) = 5 ln λ - λ
i=1
∑ xi . i=1
Setting the partial derivative with respect to λ of ln L(λ) equal to zero:
∂ ln L(λ) / ∂ λ = 5 / λ −
5
5
i=1
i=1
∑ xi = 0. ⇒ λ = 5 / ∑ xi = 5 / 27.73 = 0.1803.
b. ∂2 ln f(λ) / ∂ λ2 = -5 / λ2. VAR(λ) ≅ -1 / {n E[∂2 ln f(λ) / ∂ λ2]} = -1/ (-5/λ2) = λ2 / 5 = .18032 /5 = 0.006502. Alternately, VAR(λ) ≅ 1 / {n E[{∂ ln f(λ) / ∂ λ}2]} ≅ 1/ Σ (1/λ - xi)2 = 1 = (5.546 - 1.59)2 + (5.546 - 3.71) 2 + (5.546 - 4.09)2 + (5.546 - 5.31)2 + (5.546 - 13.03)2 1 / (15.65 + 3.37 + 2.12 + 0.06 + 56.01) = 1/ 77.21 = 0.01295. c. ∞
Pr[X > 5] =
∫ x=5
λ e- λx
dx =
x =∞ λx -e ]
= e −5λ = e-(5)(0.1803) = e-0.9016 = 0.406.
x= 5
Comment: Based on the different approximations, two solutions to part b were given full credit.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 815 28.20. E. f(x) = (1/σ)(1/ 2 π ) exp(-.5{(x-µ)/σ}2 ) = (1/ 2 πθ ) exp(-.5{x2 /θ}). ln f(x) = (-1/2)ln(θ) + (-1/2)ln(2π) - (x2 /2θ). ∂ ln(f(x)) / ∂θ = (-1/2)(1/θ) + (1/2θ2 )x2 . The second partial derivative with respect to θ is: (1/2)(1/θ2 ) + (-1/θ3 )x2 . We need the expected value. E[x2 ] = 2nd moment = Variance + mean2 = θ + 02 = θ. Therefore, E[∂2 ln f(x) / ∂θ2 ] = (1/2)(1/θ2) + (-1/θ3)(θ) = -1/(2θ2 ). Variance of θ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2 ]} = 2θ2 /n. For a single data point n = 1, therefore Variance of θ ≅ 2θ2 . 28.21. D. For the Gamma f(x) = θ−αxα−1 e−x/θ /Γ(α). The log density is: ln f(x) = -α lnθ +(α-1) lnx - x/θ - lnΓ(α). ∂ ln f(x) / ∂θ = -α/θ + x/θ2. ∂2 ln f(x) / ∂θ2 = α/θ2 - 2x/θ3. E[∂2 ln f(x) / ∂θ2 ] = α/θ2 - 2E[x]/θ3 = α/θ2 - 2αθ /θ3 = -α/θ2. Variance of θ ≅ 1 / {n E[∂2 ln f(x) / ∂θ2]} = θ2/(nα). For n =10, α = 12, Variance of θ^ ≅ θ2/120. 28.22. D. f(x) = (1/θ) e-x/θ. ln f(x) = - ln(θ) - x/θ. ∂ ln f(x) / ∂θ = -1/θ + (x/θ2). ∂2 ln f(x) / ∂θ2 = 1/θ2 − (2x/θ3 ). The Variance of θ^ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = -1 / {n E[1/θ2 −(2x/θ3 )]} = (- θ3 / n) / {θ - 2E[x]} = (-θ3 / n) / {θ - 2θ} = θ2 / n. Comment: Note that E[X] is the mean, which for this exponential distribution is θ. Since the method of moments and maximum likelihood are equal, so are the variance of the estimates. 28.23. E. f(x) = αx−(α+1). lnf(x) = ln(α) - (α+1)ln(x).
∂ ln f(x) / ∂α = (1/α) - ln(x). ∂2 ln f(x) / ∂α2 = -1/α2. Therefore the information is: -n E[∂2 ln f(x) / ∂α2] = n / α2. Therefore, taking the inverse of the information, the Asymptotic Variance = α2 / n . Comment: A Single Parameter Pareto Distribution.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 816 28.24. B. The estimated variance is the inverse of the information. In this case variance ≅ θ2 /15 ≅ 1.852 / 15 = .228. Thus the standard deviation is about
0.2282 =
.4777. A 95% confidence interval is about ±1.96 standard deviations: 1.85 ± (1.96)(.4777) = 1.85 ± .936 = (0.914, 2.786). Comment: This is a Normal distribution, but with parameter θ equal to the variance, rather than the usual parameter σ equal to the standard deviation. 28.25. D. ln f(xi) = -.5ln(2πθ) - (x -1000)2 /(2θ).
∂ ln(f(x)) / ∂θ = (-1/2)(1/θ) + (x-1000)2 / (2θ2). The second partial derivative with respect to θ is: (1/2)(1/θ2) + (-1/θ3)(x-1000)2 . We need the expected value of the above. The final term involves the squared deviation of x from the mean of 1000. Taking the expected value of such terms is one definition of the variance. Therefore, the expected value of the last term is: Ex[(x -1000)2 ] = Var(x) = θ. Therefore, the expected value of the second partial derivative is: (1/2)(1/θ2) + (-1/θ3)(θ) = -1/(2θ2). Variance of θ ≅ -1/ {n E[∂2 ln f(x) / ∂θ2]} = 2θ2/n. Comment: Since we expect the variance of the estimate to decrease as the number of points increases, we can eliminate choice C. This a Normal distribution with fixed mean 1000 and with the variance rather than the standard deviation as the parameter. The maximum likelihood estimate of θ is: Σ(xi - 1000)2 /n. 28.26. D. ∂ ln f(xi) / ∂α = {(1/α) + ln(1000) - ln(1000 + xi)}.
∂2 ln f(xi) / ∂α2 = -1/α2. Therefore the information is: -n E[∂2 ln f(xi) / ∂α2] = n / α2. With 5 data points, n = 5. Therefore, taking the inverse of the information, the Asymptotic Variance = α2 / n = α2 / 5. Comment: -1/{nE[∂2 ln f(xi) / ∂α2]} is an approximation to the variance, which gets better and better as the sample size goes to infinity. Thus it is referred to as the asymptotic variance. If an exam question refers to the “asymptotic variance”, it means for you to use the usual formula: variance ≅ -1/{nE[∂2 ln f(xi) / ∂α2]}.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 817 28.27. D. The inverse of the information is λ12 / 40, which for λ1 = 5 is: 25 / 40 = 0.625. This is the approximate variance of the estimate of λ; thus the standard deviation of the estimate is 0.791. Using the standard normal approximation, a 95% confidence interval is about ±1.96 standard deviations, which in this case is: 5.00 ± 1.55. Comment: Even though the information was given, you should know how to compute it. ln f(x) = ln(λ) - λx. ∂ln f(x) / ∂λ = 1/λ − x. ∂2ln f(x) / ∂λ2 = -1/λ2. Thus the information is: -n E[∂2 ln f(x) / ∂λ2] = -n E[-1/λ2] = n / λ2 = 40/λ2. 28.28. A. Since we're told we have an unbiased estimator, the Mean Square Error is equal to the variance of the estimated parameter. (MSE = Variance + Bias2 .) The estimated variance is the inverse of the information evaluated at the estimated parameter: 1 / {20 / θ12 } = 4/20 = 0.2 Comment: Note that in the Normal distribution, usually θ is replaced by σ2. 28.29. C. For the exponential distribution the method of maximum likelihood applied to ungrouped data equals the method of moments. Therefore λ^ = 10/2.079 = 4.81. The next step is to estimate the variance by computing the information. f(x) = λe-λx. ln f(x) = ln λ − λx. ∂ lnf(x) / ∂λ = 1/λ − x. ∂2 lnf(x) / ∂λ2 = -1/λ2. Information = - n E[ ∂2 lnf(x) / ∂λ2] = n/λ2 = 10/4.812 = 0.432. estimated variance = 1/information = 2.31. Error bars for 95% confidence interval are ± 1.96 standard deviations = ± 1.96 (2.31).5 = ± 2.98. The interval estimate for λ is therefore 4.81 ± 2.98. Comment: The exponential is not usually not used for frequency. 28.30. B. f(x) = (1/θ)e-x/θ . ln(f(x)) = −lnθ - x/θ. ∂ln(f(x)) /∂θ = −1/θ + x/θ2 .
∂2 ln(f(x)) /∂θ2 =1/θ2 - 2x/θ3 . For this Exponential Distribution, the mean E[x] = θ. Therefore, E[∂2 ln(f(x)) /∂θ2 ] = E[1/θ2 -2x/θ3 ] = 1/θ2 - (2/θ3 )E[x] = 1/θ2 - (2/θ3 )θ = -1/θ2 . Variance of θ ≅ -1 / {n E [∂2 ln f(x) / ∂θ2]} = θ2 / n. In this case n = 3, and the estimated θ is: (x1 + x2 + x3 )/3 = (10+20+30) /3 = 20. Thus Variance of θ ≅ 202 /3 = 133.33.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 818 28.31. (i) ln f(x) = lnα - (α+1)ln(1 + x). Loglikelihood is: n ln α - (α+1)Σln(1 + xi). Set the derivative with respect to alpha equal to zero: 0 = n/α - Σln(1 + xi). α^ = n/Σln(1 + xi). Second derivative of log density with respect to alpha: -1/α2. Var[ α^ ] = -1/{nE[-1/α2]} = α2/n. An approximate 95% confidence interval for α is: α^ ± 1.96 α^ / n = {n/Σln(1 + xi)}(1 ± 1.96/ n ). (ii) (a) F(12) = 1 - 1 /(1 + 12).56 = 76.2%. (b) An approximate upper 95% one-sided confidence interval for α: α^ + 1.645 α^ / n = 0.56(1 + 1.645/ 80 ) = 0.663. α ≤ 0.663. (c) The bigger α the smaller the survival function and the larger the distribution function at 12. Thus a 95% confidence upper bound for F(12) is: 1 - 1 /(1 + 12).663 = 81.7%. (iii) (a) The variance of the empirical distribution function is: (0.7625)(1 - 0.7625)/80 = 0.002264. An upper 95% confidence interval for F(12) based on the binomial distribution is: .7625 + 1.645 0.002264 = 84.1%. (b) The first engineer makes a modeling assumption, that the data follows a Pareto Distribution with θ = 1. If he has not made a modeling error, the first engineerʼs method is more powerful. Note that the while the two point estimates for F(12) are similar, the first engineerʼs confidence interval is narrower. The second engineerʼs method is valid whether or not the data follows a Pareto Distribution. The second engineer did not need to make any parametric assumptions about the data.
2013-4-6, Fitting Loss Dists. §28 Variances of Parameters , HCM 10/15/12, Page 819 28.32. B. For the Exponential, the method of moments equals the method of maximum likelihood. θ = observed mean ⇒ Var[θ] = Var[mean] = Var[X]/n = θ2/5. StdDev[θ]/θ = (θ2/5).5/θ = 1/ 5 = 0.447. Alternately, f(x) = e-x/θ/θ. lnf(x) = -x/θ - ln(θ). ∂lnf(x) / ∂θ = x/θ2 - 1/θ.
∂2 lnf(x) / ∂θ2 = -2x/θ3 + 1/θ2. Var[ θ^ ] = -1/ {n E[∂2 lnf(x) / ∂θ2 ]} = (1/n)/E[2x/θ3 - 1/θ2] = (1/5)/(2θ/θ3 - 1/θ2) = θ2/5. Proceed as before. Comment: ^θ = 4540. Stddev[ ^θ ] = 4540 /
5 = 2030.
The Cramer-Rao lower bound is an asymptotic result; the variance approaches the Cramer-Rao lower bound as the sample size approaches infinity. Nevertheless, we use it on the exam even for small samples. Here it is exact. The fitted θ = 4540. 28.33. A. f(x) = θ/(θ + x)2 . ln f(x) = ln(θ) - 2ln(θ + x). ∂ ln f(x) / ∂θ = 1/θ - 2/(θ + x). ∂2 ln f(x) / ∂θ2 = -1/θ2 + 2/(θ + x)2 . ∞
x=∞
∫
E[1/(θ + x)2 ] = 1/(θ + x)2 θ/(θ + x)2 dx = θ(-1/3)/(θ + x)3 ] = 1/(3θ2). 0
x=0
Therefore, E[∂2 ln f(x) / ∂θ2] = -1/θ2 + 2/(3θ2) = -1/(3θ2). Variance of θ^ ≅ -1 / {n E[∂2 ln f(x) / ∂θ2]} = 3θ2/n. Comment: The asymptotic variance always goes down as 1/n, eliminating choices D and E. For larger values of θ^ , one expects a larger variance of θ^ , eliminating all the choices but A. A Pareto Distribution with α = 1 fixed. ∞
The density of any Pareto Distribution integrates to one: 1 =
∫ α θα / (x+ θ)α + 1 dx. 0
⇒
∞
∫ 1/ (x + θ)α + 1 dx = 1/(α θα). 0
This would allow you to do the integral required in the solution.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 820
Section 29, Information Matrix and Covariance Matrix This section continues where the previous section left off. It will be shown how to compute the variance of estimated parameters when more than one parameter is being fit.343 In the next section, it will be shown how to use the inverse of the information matrix, the covariance matrix, in order to compute the variance of functions of the estimated parameters using the delta method. Information Matrix:344 Information Matrix = -n E[
∂2 ln f(x) ∂ ln f(x) ∂ ln f(x) = n E[ ] ]. ∂θi ∂θj ∂θi ∂θ j
Memorize this formula! The elements of the information matrix are given by negative the number of points times the expected value of the second partial derivatives with respect to the pairs of parameters.345 Alternately, one can get the information matrix as the number of points times the expected value of product of the partial derivatives with respect to the pairs of parameters. In the case of a single parameter, the information is a one by one matrix, in other words a number. For the Exponential with a single parameter θ, the information is: -n E [∂2 ln f(x) / ∂θ2]. For example for the Pareto Distribution with two parameters, α and θ, the information matrix is two by two with elements: ⎛-n E ∂2 ln f(x) -n E ∂2 ln f(x) ⎞ [ ∂α2 ] [ ∂α ∂θ ] ⎟ ⎜ ⎜ ⎟ . 2 2 ⎜-n E[ ∂ ln f(x)] -n E[∂ ln f(x)] ]⎟ ⎝ ⎠ ∂α ∂θ ∂θ2
Or alternately:
⎛n E[ ∂ ln f(x) ∂ ln f(x)] n E[∂ ln f(x) ∂ ln f(x)]⎞ ⎜ ∂α ∂α ∂α ∂θ ⎟ . ⎜ ⎟ ⎜ n E[∂ ln f(x) ∂ ln f(x)] n E[ ∂ ln f(x) ∂ ln f(x)]⎟ ⎝ ∂θ ∂θ ⎠ ∂α ∂θ
In general the information matrix is symmetric with number of rows and columns equal to the number of parameters of the particular distribution function.346 343
Exam questions should only deal with the two parameter case, in addition to the one parameter case. See page 395 of Loss Models. 345 The matrix is symmetric and has diagonal elements based on second partial derivatives with respect to a single parameter. Each diagonal element looks like a denominator of a Rao-Cramer (Cramer-Rao) lower bound. 346 Thus an Exponential Distribution would have a one by one information matrix, while a Burr Distribution, with three parameters, would have a three by three information matrix. 344
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 821 An Example of an Information Matrix: As an example, the information matrix for the Pareto Distribution fit by maximum likelihood to the ungrouped data in Section 2, with parameters α = 1.702 and θ = 240,151 is: 347 α ⎛ 44.8770 ⎜ θ ⎝ -2.00343 x 10 - 4
-2.00343 x 10 - 4 ⎞ ⎟ 1.036329 x 10 - 9 ⎠
This information matrix was computed as follows.348 For the Pareto Distribution, f(x) = αθα/(θ + x)α+1. Thus the log density, ln f(x) = lnα + α lnθ − (α+1)ln(θ + x). The partial derivative of the log density with respect to α is: (1/α) + lnθ − ln(θ + x). Thus, the second derivative of the log density with respect to α is: -1/α2. Thus the expected value of the second derivative of the log density with respect to α is: -1/α2. The element of the information matrix A11 is given by negative the number of points times this expected value: A11 = -n E [∂2 ln f(x) / ∂α2] = n / α2 = 130/1.7022 = 44.8770. The mixed second derivative of the log density with respect to α and θ is: ∂2 ln f(x) = 1/θ - 1/(θ + x). ∂α ∂θ When as here there are terms involving x, one can get the expected value with respect to x via integration. ∞
E[1/(θ+ x)] =
∫0
f(x) dx = θ + x
∞
∫0
α θα 1 dx = α θα α + 1 θ + x (θ + x)
∞
∫0 (θ + x)α + 2 dx 1
= α θα / {(α+1) θα+1} = α /{(α+1)θ}. E
∂2 ln f(x) ] = 1/θ - α /{(α+1)θ} = 1 /{(α+1)θ}. ∂α ∂θ
A 12 = A21 = -n E [ 347
∂2 ln f(x) ] = -n /{(α+1)θ} = -130/{(2.702)(240,151)} = -0.000200343. ∂α ∂θ
The alpha next to the first row indicates that the first row refers to alpha, as does the first column. Similarly the second row and second column refer to theta. α and θ clarify which way we are writing the matrix. 348 While computing all of the elements of a 2 by 2 information matrix is too long for an exam question, you might be asked to compute one of these elements.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 822 The partial derivative of the log density with respect to θ is: α/θ - (α+1)/(θ + x). Thus, the second derivative of the log density with respect to θ is: ∂2 lnf α α + 1 = + . ∂θ2 θ2 (x +θ) 2 ∞ ∞ 1 f(x) α θα α θα α E[ ]= ∫ dx = ∫ dx = = + α + α 2 2 3 2 2. (x+ θ) (x +θ) (x + θ) (α + 2) θ (α + 2) θ 0 0
E[
∂2 lnf α α α ] = - 2 + (a+1) =. 2 2 ∂θ θ (α + 2) θ (α + 2) θ2
A 22 = -nE[∂2 ln f(x) / ∂θ2 ] = n
α = (130)(1.702)/{(3.702)(240,1512 )} = 1.036329 x 10-9. (α + 2) θ2
Therefore, in general, the Information Matrix for a Pareto Distribution is:349 ⎛ ⎞ n -n α ⎜ 2 θ (α + 1) ⎟⎟ ⎜ α -n nα ⎟ θ ⎜⎜ ⎟ ⎝ θ (α + 1) θ2 (α + 2) ⎠
Observed Information: Rather than use the expected value of partial derivatives of the log density, as was done previously, instead one can work with the loglikelihood. Recall that given a data set and the type of distribution, the loglikelihood is a function of the parameter(s). For a distribution with two parameters α and θ, such as the Pareto, the loglikelihood is: ln L(α, θ) = ∑ln f(xi; α, θ). Then A12 = A21 = -n E [∂2 ln f(x) / ∂α∂θ] ≅ -n (1/n)∑∂2 ln f(xi) / ∂α∂θ = - ∂2(∑ln f(xi) )/ ∂α∂θ = - ∂2(ln L(α, θ) )/ ∂α∂θ. Similarly, A11 = -n E [∂2 ln f(x) / ∂α2] ≅ -∂2(ln L(α, θ) )/ ∂α2, and A 22 = -n E [∂2 ln f(x) / ∂θ2] ≅ -∂2(ln L(α, θ) )/ ∂θ2. 349
This is not something to be memorized. Rather you should know that the information matrix depends on the particular distribution type, in this case Pareto, the parameters of the distribution, and the sample size. In practical applications, the form of the information matrix would be programmed on a computer once.
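If you want to verify these entries numerically, here is a short Python sketch (mine, not the text's) that evaluates the three distinct elements of this Pareto information matrix at alpha = 1.702, theta = 240,151, and n = 130; it should reproduce approximately 44.88, -2.00 x 10-4, and 1.04 x 10-9.

```python
n, alpha, theta = 130, 1.702, 240151.0

a11 = n / alpha ** 2                             # -n E[d^2 ln f / d alpha^2]
a12 = -n / ((alpha + 1.0) * theta)               # -n E[d^2 ln f / d alpha d theta]
a22 = n * alpha / ((alpha + 2.0) * theta ** 2)   # -n E[d^2 ln f / d theta^2]

print(a11, a12, a22)
```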
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 823 The information matrix can be approximated by the observed information: A ij ≅ - ∂2 loglikelihood / ∂θi∂θj = - ∂2 ln L / ∂θi∂θj . Information Matrix ≅ Observed Information = - ∂2 Loglikelihood / ∂θi∂θj , where the (numerical) second partial derivatives are evaluated at the maximum likelihood fit. For example for the Pareto Distribution with two parameters, α and θ, the observed information is a symmetric two by two matrix with elements: ∂2 loglikelihood ∂2 loglikelihood ⎞ α ⎛⎜ − − ⎟ ∂α 2 ∂α ∂θ ⎜ ∂2 loglikelihood ∂2 loglikelihood ⎟⎟ ⎜ − θ ⎝− ⎠ ∂α ∂θ ∂θ2
Note that the loglikelihood is the sum over n data points. If one has 10 times as many data points, one expects 10 times the loglikelihood. Thus the loglikelihood already has the sample size in it. Therefore, there is no need to multiply by n in the formula for the observed information. An Example of an Observed Information Matrix: As an example, the observed information matrix for the Pareto Distribution fit by maximum likelihood to the ungrouped data in Section 2, with parameters α = 1.702 and θ = 240,151 is: α θ
⎛ 44.8770 ⎜ ⎝ -2.002379 x 10- 4
-2.002379 x 10 - 4 ⎞ ⎟ 1.037286 x 10 - 9 ⎠
This observed information matrix was computed as follows: For the Pareto Distribution, f(x) = αθα/(θ + x)α+1. Thus the log density, ln f(x) = lnα + α lnθ − (α+1)ln(θ + x). The loglikelihood is: n lnα + n α lnθ − (α+1)Σln(θ + xi). The partial derivative of the loglikelihood with respect to α is: n/α + nlnθ − Σln(θ + xi). Thus, the second derivative of the log density with respect to α is: -n/α2. The element of in the first row and first column of the observed information is: - ∂2(ln L(α, θ) )/ ∂α2 = n / α2 = 130/1.7022 = 44.8770.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 824 The mixed second derivative of the loglikelihood with respect to α and θ is: ∂2(ln L(α, θ) )/ ∂α∂θ = n/θ - Σ (θ+ xi)-1. When there are terms involving x, previously the expected value was obtained via integration. Instead in computing the observed information, one sums over the observed data points. Σ (240151 + xi)-1 = 1/(240151 + 300) + 1/(240151 + 400) + ... + 1/(240151 + 4,802,200). The off-diagonal elements of the 2 by 2 observed information matrix are: -∂2(ln L(α, θ) )/ ∂α∂θ = Σ (θ+ xi)-1 - n /θ = Σ (240151 + xi)-1 - 130 /240151 = 0.000340947 - 0.000541326 = -2.00379 x 10-4. The partial derivative of the loglikelihood with respect to θ is: nα/θ - (α+1)Σ(θ + xi)-1. Thus, the second derivative of the loglikelihood with respect to θ is: ∂2(ln L(α, θ) )/ ∂θ2 = -nα / θ2 + (α+1)Σ(θ + xi)-2. The element of in the second row and second column of the observed information is: −∂2(ln L(α, θ) )/ ∂θ2 = nα / θ2 - (α+1)Σ (θ + xi)-2 = (130)(1.702)/2401512 - 2.702Σ(240151 + xi)-2 = 3.836490 x 10-9 - (2.702)(1.0359749 x 10-9) = 1.037286 x 10-9. Properties of the Method of Maximum Likelihood: For maximum likelihood estimation one has: • asymptotically unbiased estimator. • consistent estimator. • whose errors are asymptotically normal or multivariate normal.
• variance that goes to zero as the inverse of the sample size. • variance that approaches the Cramer-Rao lower bound as the sample size → ∞.
• asymptotically UMVUE (Uniformly Minimum Variance Unbiased Estimator). • variance-covariance matrix ≅ the inverse of the Information Matrix. Other methods of estimation generally have asymptotic variances that are greater than that of maximum likelihood.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 825 Regularity Conditions for Nice Properties to Hold for the Method of Maximum Likelihood:350
• The log density is three times differentiable with respect to the parameter(s). ∂f
•
∫ ∂θ dx = 0.
•
∫ ∂θ2
∂2f
• -∞ <
dx = 0.
∫
∂2ln[f] f(x) dx < 0. ∂θ 2
• There exists a function H(x), such that:
∫ H(x) f(x) dx < ∞, with
350
∂3ln[f] < H(x). ∂θ3
See Theorem 15.5 in Loss Models.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 826 Variance-Covariance Matrix: The information matrix is important and useful since when using the method of maximum likelihood the inverse of the information matrix gives an approximate variance-covariance matrix for the fitted parameters.351 352 The diagonal elements of the variance-covariance matrix are the variances. The off-diagonal elements are the covariances. For example, if the two parameters of a distribution are α and θ, then the variance-covariance matrix is:353 ⎛ ⎞ ^ ^ ]⎟ α ⎜ Var[α^ ] Covar[α, θ ⎜⎜ ⎟⎟ ^ ^] θ ⎝ Covar[α, Var[θ^] ⎠ θ
In general, since the Information Matrix is proportional to the number of data points, the variance is inversely proportional to the number of data points used to fit the distribution. This can be a useful final check of your solution. Note that the inverse of a two-by-two matrix is given as follows: ⎛a b⎞ ⎛ d -b⎞ Inverse of ⎜ ⎟ is: ⎜ ⎟ / (ad - bc). ⎝ -c a ⎠ ⎝ c d⎠ As an example, for the maximum likelihood Pareto, the elements of the information matrix were computed above: α ⎛ 44.8770 -2.00343 x 10- 4 ⎞ ⎜ ⎟. θ ⎝ -2.00343 x 10- 4 1.036329 x 10 -9 ⎠
31450.9 ⎛ 0.162689 ⎞ The inverse of this matrix is: ⎜ ⎟ . 9 ⎝ 31450.9 7.04503 x 10 ⎠ Thus the estimated variance of the first parameter α is the 1-1 element: 0.163. The estimated variance of the second parameter θ is the 2-2 element: 7.05 x 109 . 351
When you hold all but one parameter fixed, this yields as a special case the formula for the variance of an estimated parameter, discussed in a previous section. 352 The covariance of X and Y is: Cov[X,Y] = E[XY] - E[X]E[Y]. Cov[X,X] = Var[X]. Covariances are discussed in “Mahlerʼs Guide to Buhlmann Credibility and Bayesian Analysis.” 353
α^ is used to denote the estimate of the parameter α.
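Here is a brief Python sketch (my own, assuming the NumPy library is available) of inverting the information matrix computed earlier for the Pareto fit; it should give variances of roughly 0.163 and 7.05 x 109, a covariance of roughly 31,451, and a correlation of roughly 0.93, matching the values discussed in this section.

```python
import numpy as np

info = np.array([[44.8770, -2.00343e-4],
                 [-2.00343e-4, 1.036329e-9]])

cov = np.linalg.inv(info)          # approximate variance-covariance matrix
var_alpha, var_theta = cov[0, 0], cov[1, 1]
covariance = cov[0, 1]
correlation = covariance / np.sqrt(var_alpha * var_theta)
print(var_alpha, var_theta, covariance, correlation)
```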
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 827 Since the estimated parameters are approximately jointly normally distributed, taking 1.96 standard deviations gives a 95% confidence interval for each parameter: In this case, α = 1.7 ± 0.8, while θ = 240 ± 165 thousand. At least 90% of probability will be in the resulting rectangles of values of both parameters.354 The covariance of α and θ is the 1,2 or 2,1 element of the above inverse of the information matrix: 31,451. Therefore, the estimated correlation between the two parameters is:355 ^
^
Corr[ α , θ] =
Covar[α^ , θ^] = Var[α^ ] Var[θ^]
31450.9 (0.162689) (7.04503 x 109 )
= 0.929,
which means that as the estimated α increases the estimated θ is also likely to increase.356 Confidence Ellipse: An ellipse that contains approximately 95% of the probability for the parameters looks as follows: Theta
[Figure: the 95% confidence ellipse for (alpha, theta), with alpha on the horizontal axis running from about 1 to 2.5 and theta on the vertical axis running from about 100,000 to 400,000.]
354
Since at most 5% is outside the bands formed by looking at each parameter separately. The correlation is defined as the covariance divided by the product of the standard deviations: Corr[X,Y] = Cov[X,Y] / { Var[X] Var[Y] . 356 The probability in two dimensions is concentrated near a line. For many purposes, the change as both parameters move together is less than when just one of them moves. So in this case, larger alphas and thetas go together, and as will be seen using the delta method, the resulting estimates of various quantities: mean, loss elimination ratio, etc., do not change as much as if just one parameter changed by the given amount. In general, the estimated parameters for the Pareto Distribution are highly correlated. The correlation of the estimated α and θ is: 1 - 1/ ( α +1)2 , close to 1. For the Lognormal Distribution, the estimated µ and σ are independent. For the Gamma Distribution, the estimated α and θ are negatively correlated. 355
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 828 For the bivariate normal distribution, the probability enclosed by the ellipse given by {(α − µα)/σα}2 + {(θ − µθ)/σθ}2 - 2 ρ(α − µα)(θ − µθ) / (σασθ) = a2 (1 − ρ2 ) is: 1 - exp(-a2 / 2).357 For a = 2.45, the probability inside such an ellipse is: 1 - exp(-2.452 / 2) = 0.95. For µα = 1.702, µθ = 240,151, σα2 = .163, σθ2 = 7.05 x 109 , ρ = 0.929, one obtains the previously shown 95% confidence ellipse. Other values of “a“ will produce ellipses corresponding to different levels of confidence. By using the inverses of the other Information Matrices, similar ellipses could be drawn for the estimates of parameters of the other two parameter curves fit by Maximum Likelihood. A Simulation Experiment: One hundred sets of data each of size 130, the same size as the data set in Section 2, were simulated from a Pareto distribution with parameters as per the maximum likelihood curve fit to the ungrouped data in Section 2: α = 1.702 , θ = 240,151. Then to each of these simulated data sets was fit a Pareto distribution via the method of maximum likelihood. The results of this simulation experiment were: Theta 700000
[Figure: scatterplot of the 100 fitted (alpha, theta) pairs, with alpha on the horizontal axis from about 1.5 to 3.5 and theta on the vertical axis from about 200,000 to 700,000.]
For data sets of size 130, there is a large variation in the estimated parameters around the true underlying parameters of α = 1.702 , θ = 240,151, indicated by a circle. 357
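This simulation experiment can be sketched in a few lines of Python; the following is my reconstruction of the procedure described above (not the author's code), using SciPy's general-purpose optimizer for the two-parameter maximum likelihood fit.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
alpha_true, theta_true, n = 1.702, 240151.0, 130

def neg_loglik(params, x):
    a, t = params
    if a <= 0 or t <= 0:
        return np.inf
    # Pareto log density: ln(a) + a ln(t) - (a+1) ln(t + x).
    return -np.sum(np.log(a) + a * np.log(t) - (a + 1.0) * np.log(t + x))

fits = []
for _ in range(100):
    # Inverse transform: if U ~ Uniform(0,1), theta*(U**(-1/alpha) - 1) is Pareto.
    u = rng.uniform(size=n)
    x = theta_true * (u ** (-1.0 / alpha_true) - 1.0)
    res = minimize(neg_loglik, x0=[1.5, 200000.0], args=(x,), method="Nelder-Mead")
    fits.append(res.x)

fits = np.array(fits)
print(fits.mean(axis=0), fits.std(axis=0))   # spread of the 100 fitted (alpha, theta) pairs
```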
See the Handbook of Mathematical Functions, by Abramowitz, et. al., p. 940.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 829 I repeated this simulation experiment but with data sets each of size 13000, one hundred times the size of the data set in Section 2. As shown below for 100 data sets each of size 13,000, there is a much smaller variation in the estimated parameters around the true underlying parameters of α = 1.702 , θ = 240,151, indicated by a circle.358
[Figure: scatterplot of the 100 fitted (alpha, theta) pairs for samples of size 13,000, with alpha from about 1.6 to 1.8 and theta from about 220,000 to 260,000.]
So we conclude from either the estimated variances or the simulation experiments, that assuming the data follows a Pareto distribution there is still a reasonably wide range of parameters. (This is not uncommon when dealing with relatively small data sets.) The information matrix will be used subsequently in order to estimate the variance of functions of the parameters, such as estimated excess ratios.
358
Note that the previous graph was on a different scale.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 830 Mixed Second Partial Derivatives of Log Densities: While it is very unlikely that an exam question will require you to know any of the 2 by 2 information matrices associated with the 2 parameter distributions, it is not a bad idea to understand how they are derived and be prepared to do so for a special case of one of these distributions with all but one parameter held constant. In the prior section, were given the second partial derivatives with respect to single parameters. Below for some distributions are the mixed second partial derivatives that allow one to compute the remainder of the information matrix: Distribution
Par. 1
Par. 2
Mixed Second Partial Derivative of Log Density
θ
{1/θ} − {1/(θ+x)}
Pareto
α
LogNormal
σ
µ
−2{(lnx)−µ}/σ3
Gamma
α
θ
−1/θ
Weibull
θ
τ
- 1/θ + xτθ−(τ+1) (τln (x/θ) + 1)
Burr
α
θ
(γ/θ)/(1 + (θ/x)γ)
Burr
α
γ
−(ln (x/θ)) /(1 + (θ/x)γ)
Burr
θ
Trans. Gamma
α
θ
−τ / θ
Trans. Gamma
α
τ
ln (x/ θ)
Trans. Gamma
θ
τ
- α/θ + xτθ−(τ+1) (τln(x/θ) + 1)
Gen. Pareto
α
θ
{1/θ} − {1/(θ+x)}
Gen. Pareto
α
τ
ψ′(α + τ)
Gen. Pareto
θ
τ
−1/(θ+x)
γ
{(1+(x/θ)γ)(α(x/θ)γ -1) + (1+α)γ(x/θ)γ ln(x/θ)} /{θ(1+ (x/θ)γ)2}
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices, HCM 10/15/12, Page 831

More Examples of Observed Information Matrices:

Here are the observed information matrices computed for the various maximum likelihood curves fit to the ungrouped data in Section 2:

Pareto (α, θ): A11 = 44.8770, A12 = A21 = -2.002379 x 10⁻⁴, A22 = 1.037286 x 10⁻⁹
LogNormal (µ, σ): A11 = 50.575, A12 = A21 = 0, A22 = 101.15
Gamma (α, θ): A11 = 496.4, A12 = A21 = 0.0002422, A22 = 2.628 x 10⁻¹⁰
Weibull (θ, τ): A11 = 1.14939 x 10⁻⁹, A12 = A21 = -0.000265962, A22 = 580.486
Burr (α, θ, γ): A11 = 37.988, A12 = A21 = -1.62173 x 10⁻⁴, A22 = 7.91847 x 10⁻¹⁰, A33 = 221.037, A13 = A31 = 4.60802, A23 = A32 = 0.000103393
Transformed Gamma (α, θ, τ): A11 = 29.8476, A12 = A21 = 0.0479359, A22 = 0.0000854893, A33 = 22258.3, A13 = A31 = -634.798, A23 = A32 = -1.29141
Generalized Pareto (α, θ, τ): A11 = 40.2402, A12 = A21 = -1.666687 x 10⁻⁴, A22 = 7.95041 x 10⁻¹⁰, A33 = 173.27, A13 = A31 = -57.6527, A23 = A32 = 3.108468 x 10⁻⁴
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices, HCM 10/15/12, Page 832

Examples of Variances of Estimated Parameters:

Using these observed information matrices, here are the estimated variances for the parameters for the distributions fit by maximum likelihood to the ungrouped data:359

Distribution     Par.   Var.       Par.   Var.        Par.   Var.
Pareto           α      0.161      θ      6.95e+9
LogNormal        µ      0.0200     σ      0.0099
Gamma            α      3.66e-3    θ      3374
Weibull          θ      9.73e+8    τ      0.00193
Burr             α      0.636      θ      3.24e+10    γ      0.0146
Trans. Gamma     α      14.88      θ      1.655e+7    τ      0.01607
Gen. Pareto      α      0.238      θ      2.11e+10    τ      0.0244
These are not computed in the same manner as the variances estimated in a previous section. Those were the variances of one parameter assuming all other parameters were fixed, as opposed to allowing all parameters to vary freely. Here we have to take the matrix inverse of the observed information matrix, as opposed to merely working with the inverse of a single number. This is not a difficulty for practical applications, but it does make this computation less likely to appear on the exam.360
359
The variances for the Pareto differ slightly from those computed before. These are computed using the observed information matrix, while the previous ones were computed using the information matrix. 360 In practical applications a computer program computes the information matrix and inverts it. Unfortunately, for the three parameter distributions this is sensitive, and one has to be careful throughout the computation to retain many significant digits, starting with the fitted parameters.
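For example, the Pareto entries in the table above follow from inverting the Pareto observed information matrix listed above. A minimal sketch, assuming numpy:

import numpy as np

obs_info = np.array([[44.8770, -2.002379e-4],
                     [-2.002379e-4, 1.037286e-9]])
cov = np.linalg.inv(obs_info)            # approximate covariance matrix of (alpha, theta)
var_alpha, var_theta = cov[0, 0], cov[1, 1]
corr = cov[0, 1] / np.sqrt(var_alpha * var_theta)
print(var_alpha, var_theta, corr)        # roughly 0.161, 6.95e9, and a correlation near 0.93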
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices, HCM 10/15/12, Page 833

A LogNormal Example:361

Let us assume we have fit a LogNormal Distribution via maximum likelihood to a sample of size n.

For the LogNormal, f(x) = exp[-(ln(x) - µ)² / (2σ²)] / (x σ √(2π)).

ln f(x) = -(ln(x) - µ)² / (2σ²) - ln(x) - ln(σ) - ln(2π)/2.

∂ ln[f(x)] / ∂µ = (ln(x) - µ) / σ².

∂² ln[f(x)] / ∂µ² = -1/σ².

∂² ln[f(x)] / ∂µ∂σ = -2(ln(x) - µ) / σ³.

∂ ln[f(x)] / ∂σ = (ln(x) - µ)² / σ³ - 1/σ.

∂² ln[f(x)] / ∂σ² = -3(ln(x) - µ)² / σ⁴ + 1/σ².

Exercise: What is the information matrix?
[Solution: The 1,1 element of the Information Matrix is: -n E[∂² ln[f(x)] / ∂µ²] = -n E[-1/σ²] = n/σ².
ln[X] is Normal with mean µ and variance σ². Therefore, E[ln(x)] = µ, and E[(ln(x) - µ)²] = σ².
The 2,2 element of the Information Matrix is:
-n E[∂² ln f(x) / ∂σ²] = -n E[-3(ln(x) - µ)²/σ⁴ + 1/σ²] = -n(-3σ²/σ⁴ + 1/σ²) = 2n/σ².
The off-diagonal elements of the information matrix are:
-n E[∂² ln[f(x)] / ∂µ∂σ] = -n E[-2(ln(x) - µ)/σ³] = 2n(E[ln x] - µ)/σ³ = (2n)(µ - µ)/σ³ = 0.
Thus the Information Matrix is:
( n/σ²      0     )
(   0     2n/σ²   )
Comment: A Maximum Likelihood fit to a LogNormal is equivalent to fitting via the Method of Moments a Normal Distribution to the log claim sizes. Since for the fitted Normal, µ and σ are independent, this is true for the LogNormal as well. Thus the off-diagonal elements of both the Information Matrix and its inverse are zero.]
Thus the inverse of the Information Matrix is:
( σ²/n       0       )
(   0     σ²/(2n)    )

361 See Example 15.13 in Loss Models.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices, HCM 10/15/12, Page 834

Therefore, Var[µ^] = σ²/n, and Var[σ^] = σ²/(2n). The variance of the estimate of mu is twice the variance of the estimate of sigma. Also Cov[µ^, σ^] = 0. For the LogNormal, the estimates of mu and sigma are independent.
For the ungrouped data in Section 2, the Maximum Likelihood LogNormal Distribution has µ = 11.5875, and σ = 1.60326. Here is a graph of the loglikelihood as a function of mu and sigma, with the maximum loglikelihood marked by a dot:

[Three-dimensional graph of the loglikelihood as a function of µ and σ, with a dot at the maximum.]
At the maximum loglikelihood, the first partial derivatives with respect to µ and σ are zero. Therefore, near the maximum loglikelihood, the change in the loglikelihood can be approximated by the second order terms in the multivariate Taylor series. Letting ll(µ, σ) be the loglikelihood, with µ0 and σ0 corresponding to the maximum loglikelihood:

ll(µ, σ) - ll(µ0, σ0) ≅ (µ - µ0)² (∂²ll/∂µ²)/2 + (σ - σ0)² (∂²ll/∂σ²)/2 + (µ - µ0)(σ - σ0) ∂²ll/∂µ∂σ.

2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices, HCM 10/15/12, Page 835

∂²ll/∂µ², ∂²ll/∂σ², and ∂²ll/∂µ∂σ, evaluated at the maximum likelihood fit, are minus the elements of the observed information. The bigger the absolute value of these second partial derivatives, the more sensitive the loglikelihood is to a change in the values of the parameters.362
In this case, for the LogNormal, the observed information is:
( 50.575       0      )
(   0      101.150    )
Thus the change in the loglikelihood as we vary µ is half of that as we vary σ. Here are graphs of the loglikelihood as we vary only one of the two parameters, while keeping the other one constant at its value at the maximum loglikelihood:363

[Two plots of the loglikelihood, one as a function of Mu (from about 11.55 to 11.65) and one as a function of Sigma (from about 1.55 to 1.70); the loglikelihood ranges from about -1752.2 down to -1752.8.]
We see that the loglikelihood varies more as sigma varies than as mu varies. The more sensitive the loglikelihood is to a change in the value of a parameter, the easier it is to find exactly where its maximum is. Therefore, it is easier to estimate sigma than it is to estimate mu. The variance of the estimate of sigma is smaller than the variance of the estimate of mu.

362 The non-mixed 2nd partial derivatives are both negative, since the loglikelihood surface is concave downwards.
363 These are cross-sections of the three dimensional graph of the loglikelihood shown previously.
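Tying these pieces together numerically, here is a minimal sketch, assuming numpy, for the LogNormal fit to the Section 2 data (µ = 11.5875, σ = 1.60326, n = 130): the covariance matrix is the inverse of the information matrix, so Var[µ^] = σ²/n and Var[σ^] = σ²/(2n), giving approximate 95% confidence intervals.

import numpy as np

mu_hat, sigma_hat, n = 11.5875, 1.60326, 130
info = np.array([[n / sigma_hat**2, 0.0],
                 [0.0, 2 * n / sigma_hat**2]])   # roughly ((50.6, 0), (0, 101.2)) as above
cov = np.linalg.inv(info)
se_mu, se_sigma = np.sqrt(np.diag(cov))
print("95% CI for mu:   ", mu_hat - 1.96 * se_mu, mu_hat + 1.96 * se_mu)
print("95% CI for sigma:", sigma_hat - 1.96 * se_sigma, sigma_hat + 1.96 * se_sigma)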
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 836 Intuitively the variances of the estimated parameters can be thought of as being the inverse of the speed with which the loglikelihood falls from its maximum given a change in a parameter. For example consider the following diagram:364
Movements in parameter 1 from the optimal position reduce loglikelihood more quickly than similar movements in parameter 2. The loglikelihood curve becomes steeper in the parameter 1 direction than in the parameter 2 direction. The second partial derivative of the loglikelihood with respect to parameter 1 is negative but with a large absolute value, with the result that the Cramer-Rao lower bound for parameter 1, being minus one over the second partial derivative, is small. In contrast, the second partial derivative of the loglikelihood with respect to parameter 2 has a smaller absolute value, and thus the variance for the estimate of parameter 2 is larger, indicating greater uncertainty.
364
Taken from “A Practitioner's Guide to Generalized Linear Models,” by Duncan Anderson, Sholom Feldblum, Claudine Modlin, Doris Schirmacher, Ernesto Schirmacher, and Neeza Thandi, a study note on the syllabus of the CAS exam on advanced ratemaking.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 837 Problems: 29.1 (1 point) Use the following information:
• A set of grouped data includes losses > 100,000. • Using the Method of Maximum Likelihood, a Single Parameter Pareto Distribution with θ = 100,000 has been fit to this grouped data.
• The Method of Maximum Likelihood fitted parameter is α = 1.904. • The information is 193.36. Which of the following is an approximate 90% confidence interval for α? A. [1.88, 1.92]
B. [1.85, 1.95]
C. [1.81, 1.99]
D. [1.78, 2.02]
E. [1.76, 2.04]
29.2 (2 points) You are given the following:
•
The parameters of a loss distribution are α and β.
•
The maximum likelihood estimators of these parameters have information matrix
A(α, β) = ( 496   242 )
          ( 242   263 )
What is the correlation coefficient of the maximum likelihood estimates of α and β? A. -2/3
B. -1/3
C. 0
D. 1/3
E. 2/3
29.3 (4 points) A sample of observations is fit via maximum likelihood to a parametric family f(x; α, β).
The loglikelihood function is: -8894.6/β² + 1506.37α/β² - 65α²/β² - 130 lnβ - 1625.83.
Determine the estimated covariance matrix of the maximum likelihood estimator of the parameters α and β.
(A)
⎛0.0198 0 ⎞ ⎜ ⎟ 0.0099⎠ ⎝ 0
(B)
⎛ 0.0198 ⎜ ⎝ - 0.0070
- 0.0070⎞ ⎟ 0.0099 ⎠
(C)
⎛50.6 0 ⎞ ⎜ ⎟ ⎝ 0 101.2⎠
(D)
⎛ 50.6 35.6 ⎞ ⎜ ⎟ ⎝ 35.6 101.2 ⎠
(E)
None of the above
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 838 Use the following information for the next three questions: Data x1 , x2 , x3 , ... xn , has been fit via Maximum Likelihood to a Weibull Distribution with parameters θ and τ, in that order. The fitted parameters are θ = 20.0 and τ = 0.700. The inverse of the 2 x 2 information matrix is given by: ⎛ 7.3 0.021 ⎞ ⎜ ⎟ ⎝ 0.021 0.00015⎠ 29.4 (1 point) Which of the following is an approximate 95% confidence interval for the parameter θ? A. B. C. D. E.
[16.7, 23.3] [15.7, 24.3] [14.7, 25.3] [13.7, 26.3] [12.7, 27.3]
29.5 (1 point) Which of the following is an approximate 95% confidence interval for the parameter τ? A. B. C. D. E.
[0.676, 0.724] [0.680, 0.720] [0.684, 0.716] [0.688, 0.712] [0.692, 0.708]
29.6 (2 points) What is the correlation between the estimates of the parameters θ and τ? A. less than 50% B. at least 50% but less than 55% C. at least 55% but less than 60% D. at least 60% but less than 65% E. at least 65%
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 839 Use the following information for the next three questions: Data x1 , x2 , x3 , ... xn , has been fit to a Weibull Distribution with parameters θ and τ, in that order. The 2 x 2 information matrix is given by: ⎛ p q⎞ ⎜ q r⎟ ⎝ ⎠ n
Let h(y) =
∑ ln[xi / θ]y (xi / θ)τ . i=1
29.7 (2 points) The element of the Information Matrix p, is given by which of the following? −τ −τ A. 2 {n - (τ+1)h(0)} B. 2 {n + θh(1)} θ θ C.
−τ {n - (τ+1)h(0) + θh(1)} θ2
D.
−τ {n - (τ+1)h(0) + θh(1) + h(2)} θ2
E. None of the above. 29.8 (2 points) The element of the Information Matrix q, is given by which of the following? A.
n - τ h(1) θ
B.
n - h(0) - τ h(1) θ
C.
n - h(0) θ
D.
n θ
E. None of the above. 29.9 (2 points) The element of the Information Matrix r, is given by which of the following? n n A. 2 B. 2 + h(0) τ τ C.
n h(1) + 2 τ θ
E. None of the above.
D.
n h(2) + 2 2 τ θ
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 840 Use the following information for the next three questions: A distribution with two parameters has been fit via maximum likelihood. ⎛128,866 19, 489⎞ The information matrix is ⎜ ⎟. 3065 ⎠ ⎝ 19,489 29.10 (2 points) What is the standard deviation of the estimated first parameter? A. less than 0.009 B. at least 0.009 but less than 0.011 C. at least 0.011 but less than 0.013 D. at least 0.013 but less than 0.015 E. at least 0.015 29.11 (2 points) What is the standard deviation of the estimated second parameter? A. less than 0.08 B. at least 0.08 but less than 0.09 C. at least 0.09 but less than 0.10 D. at least 0.10 but less than 0.11 E. at least 0.11 29.12 (2 points) What is the correlation between the estimates of the two parameters? A. less than -0.6 B. at least -0.6 but less than -0.2 C. at least -0.2 but less than 0.2 D. at least 0.2 but less than 0.6 E. at least 0.6 29.13 (4 points) A sample of size n has been fit via maximum likelihood to a Normal Distribution (x -µ)2 exp[] 2σ2 . with parameters the mean µ and the standard deviation σ, in that order. f(x) = σ 2π Determine the information matrix and the covariance matrix. 29.14 (4B, 11/95, Q.22) (2 points) You are given the following:
•
The parameters of a loss distribution are α and β.
•
The maximum likelihood estimators of these parameters have information matrix ⎛ 75 −20 ⎞ A(α, β) = ⎜ ⎟ 6 ⎠ ⎝−20
Determine the approximate variance of the maximum likelihood estimate of α. A. 0.12
B. 0.4
C. 1.5
D. 6
E. 75
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 841 29.15 (4B, 11/97, Q.16) (2 points) You are given the following: • The parameters of a loss distribution are α and β. ^ • The maximum likelihood estimators of these parameters, α^ and β, have information matrix
⎛ 6 −20 ⎞ A(α, β) = ⎜ ⎟ ⎝−20 75 ⎠ Determine the approximate variance of α^ . A. 0.12
B. 0.40
C. 1.50
D. 6.00
E. 75.00
29.16 (4B, 5/99, Q.22) (2 points) You are given the following: • The parameters of a loss distribution are α and β. • The maximum likelihood estimators of these parameters, have variance-covariance matrix ⎛0.12 0.40⎞ ⎜ ⎟ ⎝ 0.40 1.50 ⎠ Determine the approximate correlation coefficient of the maximum likelihood estimates of α and β. A. Less than -0.90 B. At least -0.90, but less than -0.30 C. At least -0.30, but less than 0.30 D. At least 0.30, but less than 0.90 E. At least 0.90
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 842 29.17 (4, 11/00, Q.13) (2.5 points) A sample of ten observations comes from a parametric family f(x, y; θ1, θ2) with loglikelihood function ln L(θ1, θ2) =
10
∑ ln f(xi , yi; θ 1, θ2 ) = -2.5θ12 - 3θ1θ2 - θ22 + 5θ1 + 2θ2 + k, where k is a constant. i=1
Determine the estimated covariance matrix of the maximum likelihood estimator, ⎛ θ^ 1 ⎞ ⎜^ ⎟ ⎝ θ2 ⎠ (A)
⎛0.5 0.3 ⎞ ⎜ ⎟ ⎝ 0.3 0.2⎠
(B)
⎛ 20 −30 ⎞ ⎜ ⎟ ⎝−30 50 ⎠
(C)
⎛0.2 0.3 ⎞ ⎜ ⎟ ⎝ 0.3 0.5⎠
(D)
⎛5 3⎞ ⎜ ⎟ ⎝3 2⎠
(E)
⎛ 2 -3⎞ ⎜ ⎟ ⎝-3 5 ⎠
29.18 (4, 11/03, Q.18 & 2009 Sample Q. 14) (2.5 points) The information associated with the maximum likelihood estimator of a parameter θ is 4n, where n is the number of observations. Calculate the asymptotic variance of the maximum likelihood estimator of 2θ. (A)
1 2n
(B)
1 n
(C)
4 n
(D) 8n
(E) 16n
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 843 Solutions to Problems: 29.1. D. The inverse of the information is approximately the variance of the estimate of alpha. 1/193.36 = .005172. Thus the standard deviation is: .005172 = 0.0719. A 90% confidence interval corresponds to ± 1.645 standard deviations = ± 0.12. So the interval estimate for alpha is: 1.90 ± 0.12 = 1.78 to 2.02. 29.2. A. The inverse of the information matrix is an approximate Covariance Matrix: ⎛ 0.00366 -0.00337⎞ ⎜ ⎟ ⎝-0.00337 0.00690⎠ ^
Thus the approximate variance of α^ is: 0.00366. The approximate variance of β is: .00690. ^
The approximate covariance of α^ and β is: -0.00337. ^
Thus the approximate correlation of α^ and β is: -.00337/
(.00366)(.00690) = -0.67.
Comment: The determinant of A is (496)(263) - (242)(242) = 71884. The elements of the inverse of A are: 263/71884, -242/71884, -242/71884, 496/71884 = 0.00366 , -0.00337, -0.00337, 0.00690. 29.3. A. We fit via maximum likelihood by setting the partial derivatives of the loglikelihood equal to zero. ∂ ln L /∂α = 1506.37/β2 - 130α/β2 = 0.
∂ ln L /∂β = 2(8894.6 - 1506.37α + 65α2)/β3 - 130/β = 0. From the first equation α = 1506.37/130 = 11.5875. From the second equation, β2 = 2(8894.6 - 1506.37α + 65α2)/130 = 2.57074. β = 1.60335. ∂2 ln L /∂α2 = - 130/β2 = -50.6. ∂2 ln L /∂α∂β = -3012.74/β3 + 260α/β3 = 0.
∂2 ln L /∂β2 = -6(8894.6 - 1506.37α + 65α2)/β4 + 130/β2 = -101.2. 0 ⎞ ⎛ 50.6 The Information Matrix is approximately: -∂2 loglikelihood / ∂θi∂θj = ⎜ . 101.2⎟⎠ ⎝ 0 ⎛0.0198 0 ⎞ The covariance matrix is the inverse of the Information Matrix: ⎜ ⎟. 0.0099⎠ ⎝ 0 Comment: This is the loglikelihood of a LogNormal Distribution, with α rather than µ and β rather than σ as the parameters, fit to the 130 losses in Section 2.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 844 29.4. C. The variances and covariance are the elements of the inverse of the information matrix. Therefore the variance of the estimate of θ is 7.3. The standard deviation is therefore 2.7. An approximate 95% confidence interval is plus or minus 1.96 standard deviations around the given estimate of θ of 20.0: 20.0 ± (1.96)(2.7) = 20.0 ± 5.3. 29.5. A. The variances and covariance are the elements of the inverse of the information matrix. Therefore the variance of the estimate of τ is .00015. The standard deviation is therefore .0122. An approximate 95% confidence interval is plus or minus 1.96 standard deviations around the given estimate of τ of 0.700: 0.70 ± (1.96)(.0122) = 0.700 ± .024. 29.6. D. The correlation is the covariance divided by the square root of the product of the variances. The variances and covariance are the elements of the inverse of the information matrix. Correlation = (.021) / {( 7.3)(.00015)}.5 = 0.635. Comment: In this case, θ^ and ^τ are positively correlated. Therefore, if we overestimate θ we are likely to also overestimate τ. If we underestimate θ we are likely to also underestimate τ. This is bad news is we are trying to estimate the value of a function similar to θ τ. This is good news is we are trying to estimate the value of a function similar to θ / τ. Thus we would expect the variance of a function of the parameters to depend on the form of the function, as well as the variances and covariances of the parameters. As discussed in a subsequent section, this is indeed the case for the delta method. 29.7. A. For the Weibull Distribution, f(x) = τ(x/θ)τ exp(-(x/θ)τ) /x. ln f(x) = ln(τ) + τln(x/θ) - (x/θ)τ - ln(x). ∂2 ln f(x) / ∂θ2 = τ/θ2 - τ(τ+1) xτ θ−(τ+2). Therefore, −n E[∂2 ln f(x) / ∂θ2] ≅ -n τ/ θ2 + τ(τ+1)θ−2Σ(xi/θ)τ = -n τ/ θ2 + τ(τ+1)θ−2 h(0). 29.8. B. ∂2 ln f(x) / ∂θ∂τ = - 1/θ + xτθ−(τ+1) (τln(x/θ) + 1). Thus, −n E[∂2 ln f(x) / ∂θ∂τ] ≅ n/θ − (1/θ)Σ(xi/θ)τ − (τ/θ)Σln(x/θ)(xi/θ)τ = n/θ - (1/θ)h(0) - (τ/θ)h(1). 29.9. E. ln f(x) = ln(τ) + τln(x/θ) - (x/θ)τ - ln(x). ∂2 ln f(x) / ∂τ2 = −1/τ2 − (ln(x/θ))2 (x/θ)τ. Therefore, −n E [∂2 ln f(x) / ∂τ2] ≅ n/τ2 + Σln(xi/θ))2 (xi/θ)τ = n/τ2 + h(2).
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 845 29.10. D., 29.11. C. & 29.12. A. The inverse of the information matrix is: ⎛ 3065 ⎛0.0002023 -0.001286⎞ -19,489 ⎞ ⎜ ⎟ / {(128866)(3065) - (19489)(19489)} = ⎜ ⎟. ⎝-19,489 128,866 ⎠ ⎝ -0.001286 0.008504 ⎠ Variance of the estimated first parameter: .0002023. Standard deviation of the estimated first parameter: 0.0002023 = 0.0142. Variance of the estimated second parameter: .008504. Standard deviation of the estimated second parameter: 0.008504 = 0.0922. The correlation is the covariance divided by the square root of the product of the variances. The covariance of the estimates of the two parameters is -0.001286. Correlation = -0.001286 /
(0.0002023)(0.008504) = -0.980.
29.13. ln f(x) = -(x-µ)2 /(2σ2 ) - lnσ - .5ln(2π). ∂ ln f(x) / ∂µ = (x-µ)/σ2 .
∂2 ln f(x) / ∂µ2 = -1/σ2 . E[∂2 ln f(x) / ∂µ2] = -1/σ2 . ∂ ln f(x) / ∂σ = (x-µ)2/σ3 - 1/σ. ∂2 ln f(x) / ∂σ2 = -3(x-µ)2/σ4 + 1/σ2. E[(x-µ)2 ] = σ2. ⇒ E[∂2 ln f(x) / ∂σ2] = E[-3(x-µ)2/σ4 + 1/σ2] = -3σ2/σ4 + 1/σ2 = -2/σ2.
∂2 ln f(x) / ∂µ∂σ = -2(x-µ)/σ3 . E[(x-µ)] = 0. ⇒ E[∂2 ln f(x) / ∂µ∂σ] = E[-2(x-µ)/σ3 ] = 0. ⎛n / σ2 0 ⎞ Therefore, the Information Matrix is: ⎜ ⎟. 2n / σ2⎠ ⎝ 0 ⎛σ 2 / n ⎞ 0 Taking the inverse of the Information Matrix, the Covariance Matrix is: ⎜ ⎟. σ 2 / (2n )⎠ ⎝ 0 Comment: Due to the invariance of maximum likelihood to change of variables, these are also the Information Matrix and the Covariance Matrix for the LogNormal Distribution. 29.14. A. The approximate Variance-Covariance Matrix is given by the inverse of the information ⎛ 6 20⎞ matrix. In this case the inverse of the given information matrix is: ⎜ ⎟ / 50. ⎝20 75⎠ The variance of alpha is the upper left element of this inverse matrix: 6/50 = 0.12. Comments: The determinant of A is: (6)(75)-(-20)(-20) = 50. The elements of the inverse of A are: 75/50, -(-20)/50, -(-20)/50, 6/50 = 1.5, 0.4, 0.4, 0.12.
2013-4-6, Fitting Loss Distributions §29 Info. & Covar. Matrices , HCM 10/15/12, Page 846 29.15. C. The inverse of the information matrix is an (approximate) Variance-Covariance Matrix. ⎛ 1.5 0.4 ⎞ The inverse of the given information matrix A is: ⎜ ⎟ . ⎝ 0.4 0.12⎠ Thus the approximate variance of α is 1.5. 29.16. E. The variance of the estimate of α is 0.12. The variance of the estimate of β is 1.5. The covariance of alpha and beta is: 20/50 = 0.4. The approximate correlation coefficient of the estimates of α and β is: 0.4 /
(0.12)(1.5) = 0.943.
29.17. E. ln L(θ1, θ2) = -2.5θ12 - 3θ1θ2 - θ22 + 5θ1 + 2θ2 + k.
∂ln L / ∂θ1 = -5θ1 - 3θ2 + 5. ∂2 ln L / ∂θ12 = -5. ∂ln L / ∂θ2 = -3θ1 - 2θ2 + 2. ∂2 ln L / ∂θ22 = -2. ∂2 ln L / ∂θ1∂θ2 = - 3. The 2 by 2 Information Matrix can be estimated in terms of the loglikelihood as: 2 ⎛ - ∂ LnL ⎜ ∂θ12 ⎜ ∂2 LnL ⎜⎝ ∂θ ∂θ 1
2
∂2 LnL ⎞ ∂θ1∂θ2 ⎟ ⎛ 5 3⎞ =⎜ ⎟, ∂2 LnL ⎟⎟ ⎝ 3 2⎠ ∂θ2 2 ⎠ -
The inverse of the Information Matrix is the estimated covariance matrix: ⎛ 2 -3⎞ ⎜ -3 5 ⎟ / {(5)(2) - (3)(3)} = ⎝ ⎠
⎛ 2 -3⎞ ⎜ ⎟. ⎝ -3 5 ⎠
29.18. B. Var[ θ^ ] = Information-1 = 1/(4n). Var[2 θ^ ] = 22 Var[ θ^ ] = 4/(4n) = 1/n. Comment: “information” ⇔ the one by one information matrix for a single parameter.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 847
Section 30, Variance of Functions of Maximum Likelihood Parameters

When fitting via the Method of Maximum Likelihood, the inverse of the Information Matrix is an approximate Variance-Covariance Matrix of the estimated parameters. In this section, it will be shown how to use this Variance-Covariance matrix in order to compute the variance of functions of the estimated parameters using the so-called delta method. This will involve computing the gradient vector of the function.

Gradient Vector:

The gradient vector of a function h of the parameter(s) consists of the partial derivatives of the function h with respect to the parameter(s): ∂h/∂θi. For example, for the Pareto Distribution with two parameters, α and θ, the gradient vector of a function h of the parameters has two elements:
( ∂h/∂α )
( ∂h/∂θ )
where, as is conventional, I have represented the gradient vector as a column vector.

Exercise: Let R(x) = {θ/(θ + x)}^(α−1), the excess ratio of a Pareto Distribution.
What is the gradient vector for R(x)?
[Solution: ( ∂R(x)/∂α )   ( ln[θ/(θ + x)] {θ/(θ + x)}^(α−1) )
           ( ∂R(x)/∂θ ) = ( (α − 1) x θ^(α−2) / (θ + x)^α   ). ]

Therefore, the gradient vector of the excess ratio at $1 million for a Pareto Distribution with, for example, α = 1.702, θ = 240,151 is:365
( ln[θ/(θ + x)] {θ/(θ + x)}^(α−1) )   (    -0.5185     )
( (α − 1) x θ^(α−2) / (θ + x)^α   ) = ( 7.445 x 10⁻⁷   ).

Delta Method:

The variance-covariance matrix can be used together with the gradient vector in order to estimate the variance of a function of the parameters when using the maximum likelihood fitted distribution.
365
This is the Pareto fit by maximum likelihood to the ungrouped data in Section 1.
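The gradient vector just evaluated can be checked numerically. This is a small sketch, not from the Study Guide, assuming only the Python standard library; it plugs α = 1.702, θ = 240,151, and x = $1 million into the partial derivatives given in the solution above (variable names are mine).

import math

alpha, theta, x = 1.702, 240151.0, 1.0e6
dR_dalpha = math.log(theta / (theta + x)) * (theta / (theta + x)) ** (alpha - 1)
dR_dtheta = (alpha - 1) * x * theta ** (alpha - 2) / (theta + x) ** alpha
print(dR_dalpha, dR_dtheta)   # approximately -0.5185 and 7.445e-7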
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 848 Using the so-called delta method, the asymptotic variance of the estimate of the function of the parameters is the matrix product of:366 (transpose of gradient vector) (Inverse of information matrix )(gradient vector) = (transpose of gradient vector) (Covariance Matrix) (gradient vector). Common functions to apply the delta method to besides the estimated parameters themselves are: the cumulative distribution function, survival function, moments of the distribution, loss elimination ratios, or limited expected values.367 With several parameters the computations can get too long to perform on an exam. However, when there is only one parameter θ, the gradient vector and the information matrix are both onedimensional. Therefore, the variance of the estimate of a function of a single parameter θ, h(q), becomes:368 Var[h(θ)] ≅ -
(∂h(θ)/∂θ)² / {n E[∂² ln f(θ)/∂θ²]} = (∂h(θ)/∂θ)² Var[θ^].
For example, let f(x) = e-x/θ /θ. Then ln f(x) = -x/θ - lnθ. ∂ ln f(x) / ∂θ = x/θ2 -1/θ.
∂² ln f(x)/∂θ² = -2x/θ³ + 1/θ². E[∂² ln f(x)/∂θ²] = -2θ/θ³ + 1/θ² = -1/θ². Thus Var[θ^] ≅ -1/{n E[∂² ln f(x)/∂θ²]} = θ²/n.
If we were to let h(x;θ) be the cumulative distribution function, h(x;θ) = 1 - e^(-x/θ), then
Var[h] ≅ (∂h/∂θ)² Var[θ^] = {-(x/θ²) e^(-x/θ)}² (θ²/n) = x² e^(-2x/θ) / (n θ²).
Exercise: An Exponential Distribution, f(x) = e-x/θ /θ, has been fit to 500 data points via the method of maximum likelihood. The fitted value of θ = 33.3. What is the variance of the estimate of the probability of a claim being less than 70? [Solution: Let h(x) = 1 - e -x /θ. Var[h] ≅ x2 e−2x/θ /(θ2n) = 702 e-4.204 / (33.32 500) = 0.00013.]
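The one-dimensional delta method in the exercise above is easy to script. A minimal sketch, assuming only the Python standard library, with θ = 33.3, n = 500, and x = 70 as in the exercise:

import math

theta_hat, n, x = 33.3, 500, 70.0
dF_dtheta = -(x / theta_hat**2) * math.exp(-x / theta_hat)   # derivative of F(x) = 1 - exp(-x/theta)
var_theta = theta_hat**2 / n                                 # Var of the MLE of theta for the Exponential
var_F = dF_dtheta**2 * var_theta
print(var_F)                                                 # approximately 0.00013, as in the solution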
366
See Theorem 15.6 of Loss Models. In a numerical example below, the delta method is applied to estimated excess ratios. 368 This one dimensional special case of the delta method was discussed in a previous section. 367
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 849

Variance of Estimated Excess Ratios:

As an example I will apply these techniques to the excess ratios estimated in a previous section. One computes the gradient vector of partial derivatives of the excess ratio with respect to the parameters. The asymptotic variance estimate is the matrix product of:
(transpose of gradient vector) (Inverse of the information matrix) (gradient vector)
The gradient vector can be computed in either of two ways. One can take the formula for the Excess Ratio in terms of the parameters and compute the partial derivatives with respect to each parameter. Then one can plug in the values of the parameters and limit. In many cases when working on a computer, it is easier to compute the gradient vector numerically. One need only compute the excess ratio for the given limit while varying one of the parameters a small amount from the given value.
This is how each technique would be applied to compute the gradient vector for the Pareto Distribution for α = 1.702, θ = 240,151, for the excess ratio at $1 million.
The first method uses the formula for the excess ratio of a Pareto: R(x) = {θ/(θ + x)}^(α−1).
∂R(x)/∂α = ln[θ/(θ + x)] {θ/(θ + x)}^(α−1) = -0.5185.
∂R(x)/∂θ = (α − 1) x θ^(α−2) / (θ + x)^α = 7.445 x 10⁻⁷.
This gives a gradient vector of: (-0.5185, 7.445 x 10⁻⁷).
The second method computes the excess ratio for $1 million for 3 sets of parameters:
α = 1.702, θ = 240,151
R($1 million) = 0.315850015
α =1.702 + .0001 , θ = 240,151
R($1 million) = 0.315798165
α =1.702 , θ = 240,151 + 1
R($1 million) = 0.315850759
(0.315798165 - 0.315850015)/0.0001 = -0.5185, and (0.315850759 - 0.315850015)/1 = 7.44 x 10⁻⁷. This gives a gradient vector of: (-0.5185, 7.44 x 10⁻⁷). Thus this method results in the same estimated gradient vector. In many actual applications the second method is easier to program, as one does not have to work out a formula for the partial derivatives of the Excess Ratio.
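For readers who want to see the delta method as code, here is a minimal sketch, assuming numpy, of the matrix product carried out by hand on the next page, using the gradient vector just obtained and the covariance matrix from the previous section:

import numpy as np

grad = np.array([-0.5185, 7.445e-7])        # (dR/dalpha, dR/dtheta) at alpha = 1.702, theta = 240,151
cov = np.array([[0.162689, 31450.9],
                [31450.9, 7.04503e9]])      # inverse of the information matrix
var_R = grad @ cov @ grad                   # approximately 0.0234
print(var_R, 2 * np.sqrt(var_R))            # variance, and "twice the standard deviation" (about 0.31)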
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 850 Using the estimated Variance-Covariance matrix computed in the previous section, one gets:369 (Covariance Matrix) (gradient vector) = ⎛ 0.162689 ⎞ ⎛-0.0609391⎞ 31450.9 ⎞ ⎛ -0.5185 = ⎜ ⎟ ⎜ ⎟ ⎜ ⎟. ⎝ 31450.9 7.04503 x 109 ⎠ ⎝ 7.445 x 10-7 ⎠ ⎝ -11062.3 ⎠ Multiplying this result by the transpose of the gradient vector, gives: ⎛ -0.0609391⎞ (-0.5185, 7.445 x 10-7) ⎜ ⎟ = ⎝ -11062.3 ⎠ (-0.5185) (-0.0609391) + (7.445 x 10-7 )(-11062.3) = 0.0234, the approximate variance of the estimated excess ratio at $1 million, using the fitted Maximum Likelihood Pareto distribution. Computed in a similar manner, but using the observed information, below are shown the approximate variances for the estimated excess ratios for $1 million and $5 million estimated using maximum likelihood curves fit to the ungrouped data in Section 2. Also shown are twice the standard deviation, which can be used to produce very approximate 95% confidence intervals for the estimated excess ratios.
Distribution
Excess Ratio $1 million
Twice Standard Deviation
Data
0.245
Pareto
0.316
0.31
0.0233
0.115
0.24
0.0139
Weibull
0.132
0.076
0.00146
0.001
0.0019
0.000000906
Gamma
0.103
0.058
0.000846
0.000
0.00011
2.76e-9
Trans. Gamma
0.193
0.11
0.00282
0.011
0.018
0.0000793
Gen. Pareto
0.300
0.30
0.0232
0.100
0.23
0.0131
Burr
0.297
0.32
0.0256
0.097
0.25
0.015
LogNormal
0.374
0.16
0.00614
0.108
0.097
0.00236
Variance
Twice Standard Deviation
Variance
0.000
(0.162689)(-0.5185) + (31450.9)(7.445 x 10-7) = -0.0609391. (31450.9)(-0.5185) + (7.04503 x 109 )(7.445 x 10-7) = -11062.3. 369
Excess Ratio $5 million
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 851 Below are shown the excess ratios (thick) estimated from the maximum likelihood Pareto as well as 95% confidence intervals:
Excess Ratio
130 claims
0.50
0.20 13,000 claims 0.10 130 claims
0.05
13,000 claims
0.02
0.2
0.5
1.0
2.0
5.0
10.0
Size (million)
The wider confidence intervals are for a sample with only 130 claims; the confidence intervals are quite wide and the estimates of excess ratios are not of much practical value. Also shown are the narrower confidence intervals that would result if there had been 13,000 claims, 100 times as many data points. With even more data points, the confidence intervals would be even narrower, with the 1 standard deviation decreasing as: . number of data points Types of Errors: Letʼs assume we used the Method of Maximum Likelihood in order to estimate the future excess ratio at $1 million. There are many reasons why the future observed excess ratios will differ from our estimate. First, the future observed excess ratio is subject to statistical error due to random fluctuations. Thus even if we were to perfectly estimate the future expected excess ratio at $1 million, the actual observed excess ratio in any given year would differ from our estimate. This variability of observed results around expected is usually measured by the process variance.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 852 In addition, it may be that the data set we are using is not completely appropriate for estimating future excess ratios For example, we may not have properly adjusted for loss development beyond a certain report, or it may be that this data set includes a different mix of types of business then we will insure in the future period,.370 or there may have been a recent law change for which we did not properly adjust, etc. There are errors in the estimated parameters due to the difficulty of estimating the parameters from the limited observed data. This is what is being estimated by the use of the delta method. Our finite sample of data may not be representative due to chance, which in turn can lead to an error in the estimated parameters. This is sometimes referred to as parameter variance. There is error due to the fact that the estimation method is not perfect.371 While Maximum Likelihood is not perfect, it is asymptotically unbiased and as the sample size gets large the variance approaches the Rao-Cramer (Cramer-Rao) lower bound. We are using this asymptotic limit in order to estimate the variances. There can be error due to the use of the wrong model or an incomplete model. An example would be if the claims followed a LogNormal distribution (without our knowing it), but we used a Pareto distribution to model them. This is what Loss Models calls model error.372 In practical applications, one has to be particularly wary of the possibility that the different types of error may add. For example, the maximum likelihood Gamma fit to the ungrouped data in Section 2 has parameters α = 0.5825 , θ = 536,801. Therefore, the estimated mean is: αθ = (0.5825)(536,801) = 312,687. The Observed Information Matrix as listed in the previous section is: ⎛ 496.4 .0002422 ⎞ ⎜ ⎟. ⎝.0002422 2.628 x 10− 10 ⎠ The covariance matrix is the inverse of the Information Matrix: ⎛0.003661 -3374 ⎞ ⎜ ⎟. 6.914 x 109 ⎠ ⎝ -3374
370
An example of sampling frame error. No estimation method is perfect (as applied to finite data sets.) 372 This is sometimes referred to as specification error. 371
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 853 Exercise: Using the delta method, what is the variance of the estimated mean? [Solution: ∂(αθ)/∂α = θ = 536801. ∂(αθ)/∂θ = α = 0.5825. ⎛0.003661 (536801, 0.5825) ⎜ ⎝ -3374
-3374 ⎞ ⎛536,801⎞ ⎟ ⎜ ⎟ = 1.291 x 109 .] 6.914 x 109 ⎠ ⎝ 0.5825 ⎠
Thus assuming this data was drawn from a Gamma Distribution, a 95% confidence interval for the mean is: 312,687 ± 1.96
1.291 x 109 = 313 ± 70 thousand.
Exercise: Assuming this data was drawn from a Gamma Distribution, what is the expected sum of the next 26 losses? [Solution: (26)(312,687) = 8,129,862.] There are at least two reasons why this estimate will not match the observation. The first is due to the variance in the estimated mean. The second is due to the random fluctuation in the loss sizes. The second is estimated via the process variance. For a single claim from a Gamma Distribution, the process variance = αθ2 = (0.5825)(536,8012 ) = 1.679 x 1011. The variance for a sum of 26 independent loss sizes is: (26)(1.679 x 1011) = 4.365 x 1012. The variance of the sum due to the variance of the estimated mean is: Var[26 mean] = 262 Var[mean] = (676)(1.291 x 109 ) = 8.727 x 1011. These two sources of error add, and therefore the mean squared error is:373 4.365 x 1012 + 8.727 x 1011 = 5.238 x 1012. Therefore, a 95% confidence interval for the sum of the next 26 losses is: 8,129,862 ± 1.96
5.238 x 1012 = 8.1 ± 4.5 million.
If instead we are interested in the expected aggregate losses over the next year, then we need an estimate of the frequency as well as the severity. Exercise: Assume you observed the 130 losses in Section 2 over 5 years. Fit a Poisson Distribution via Maximum Likelihood. [Solution: It is the method of moments estimate, λ = 130/5 = 26.] In general, the MSE = Variance + Bias2 . Maximum Likelihood estimation is asymptotically unbiased. Therefore for simplicity, I have ignored the bias here. However, in actual applications to small data sets, the bias could be significant. One could estimate the bias via a simulation experiment. 373
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 854 This estimate of the mean frequency has a variance of: Var[λ] = Var[mean] = Var[X]/n = λ/n = 26/130 = 0.20. The estimated mean annual aggregate loss, assuming the frequency is Poisson and the severity is Gamma, is: (26)(312,687) = 8,129,862. Exercise: If X and Y are independent, what is the variance of XY? [Solution: Var[XY]= E[(XY)2 ] - E[XY]2 = E[X2 ]E[Y2 ] - (E[X]E[Y])2 = (Var[X] + E[X]2 )(Var[Y] + E[Y]2 ) - E[X]2 E[Y]2 = Var[X]Var[Y] + E[Y]2 Var[X] + E[X]2 Var[Y].] Now the estimated mean aggregate loss is: (estimated mean frequency)(estimated mean severity) Therefore, assuming the errors in the estimated frequency and severity are independent, then Var[estimated mean aggregate loss] = Var[est. freq.]Var[est. sev.] + (sev2 )Var[estimated freq.] + (freq2 )Var[estimated sev.] = (.20)(1.291 x 109 ) + (312,6872 )(.20) + (262 )(1.291 x 109 ) = 0.0026 x 1011 + 0.1955 x 1011 + 8.727 x 1011 = 8.925 x 1011.374 The process variance of the aggregate loss is: (mean freq.)(Process Variance of severity) + (mean sev.)2 (Process Variance of freq.) = (26)(1.679 x 1011) + (312,687)2 (26) = 4.365 x 1012 + 2.542 x 1012 = 6.907 x 1012. These two sources of error add, and therefore the mean squared error is:375 8.925 x 1011 + 6.907 x 1012 = 7.800 x 1012. Therefore, a 95% confidence interval for the sum of the next yearʼs aggregate loss is: 8,129,862 ± 1.96
374
7.800 x 1012 = 8.1 ± 5.5 million.376
Note that the term Var[est. freq.]Var[est. sev.] is relatively small. If one ignored it, one would get the same result as applying the delta method to the function XY, when X and Y are independent. 375 Again for simplicity, I have ignored the bias here. 376 The error bars when we are estimating both frequency and severity are somewhat wider than when we assumed we would have 26 losses.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 855 A LogNormal Example:377 Let us assume we have fit a LogNormal Distribution via maximum likelihood to a sample of size n. ⎛n / σ2 0 ⎞ Then as shown in the previous section, the Information Matrix is: ⎜ ⎟. 2n / σ 2⎠ ⎝ 0 0 ⎞ ⎛σ 2 / n The inverse of the Information Matrix is: ⎜ . σ2 / (2n)⎟⎠ ⎝ 0 Assume we will use the fitted LogNormal Distribution to estimate the mean. Let us apply the delta method to estimate the variance of this maximum likelihood estimator. The quantity of interest as a function of the parameters is: h(µ , σ) = exp(µ + σ2/2). ∂h = exp(µ + σ2/2). ∂µ
∂h = σ exp(µ + σ2/2) ∂σ
^ 2/2) (1, σ ^ ). Thus plugging in the fitted parameters, the gradient vector is: exp( µ^ + σ ^ = 2.1, determine the gradient vector for this Exercise: If the fitted parameters are, µ^ = 5.5 and σ
example of estimating the mean. ^ 2/2) (1, σ ^ ) = (2219.4) (1, 2.1) = (2219.4, 4660.8).] [Solution: exp( µ^ + σ
Exercise: For these fitted parameters, and n = 100, use the delta method to estimate the variance of this estimator of the mean. 0 ⎞ 0 ⎛σ 2 / n ⎛ 0.04410 ⎞ [Solution: The Covariance Matrix is: ⎜ = . ⎟ ⎜ σ2 / (2n)⎠ 0 0.02205⎟⎠ ⎝ 0 ⎝ 0 ⎛ 0.04410 ⎞ ⎛ 2219.4⎞ Variance = (2219.4, 4660.8) ⎜ 0 0.02205⎟⎠ ⎜⎝ 4660.8⎟⎠ ⎝ = (2219.42 )(0.0441) + (4660.82 )(0.02205) = 696,218. ^ 2/2) = 2219.4. An approximate 95% Comment: The point estimate of the mean is exp( µ^ + σ
confidence interval for the mean is: 2219.4 ± (1.960)
696,218 = (584, 3855).]
For this example, the Delta Method result for the variance of this estimator of the mean is: ^2 0 ⎞ 1⎞ ⎛σ 2 / n ^ 2/2) (1, σ ^) ^ + σ ^ 2/2) ⎛ ^ + σ ^ 2) σ (1 + σ ^ 2/2). µ µ exp( µ^ + σ exp( = exp(2 ⎜ ⎟ ⎜ σ^ ⎟ σ2 / (2n)⎠ n ⎝ 0 ⎝ ⎠ 377
See Example 15.13 in Loss Models, and 4, 5/00, Q.25.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 856 Comparing the Delta Method to the Exact Result: There are a number of reasons why the result of using the Delta Method is only an approximation to the actual variance of the maximum likelihood estimator estimator:378 1. For finite sample sizes the inverse of the information matrix is only an approximation to the actual covariance matrix. 2. In both the covariance matrix and the gradient vector we have substituted the fitted values for the parameters, which differ somewhat from their actual values. 3. The Delta Method is a kind of linear approximation.379 Previously, I presented the exact variances and biases for this LogNormal example.380 It was derived that the variance of the maximum likelihood estimator is:381 exp[2µ + σ 2 / n] exp[2µ + 2σ 2 / n] . (1 - 2σ 2 / n)(n - 1) / 2 (1 - σ 2 / n)n - 1 Using the Delta Method, the estimate of the variance of this estimator of the mean is: ^2 ^ 2) σ (1 + σ ^ 2/2). exp(2 µ^ + σ n
Assuming solely for illustrative purposes, that the fitted parameters are equal to the actual values of µ = 6 and σ = 2, for various sample sizes, below is a comparison of standard deviations: Sample Size 4 8 10 15 20 25 100 1000 10,000 100,000
Exact Standard Deviation ∞ ∞ 21,957 6421 4161 3223 1138 330 103 33
Delta Method Estimate of Standard Deviation 5163 3651 3265 2666 2309 2065 1033 327 103 33
We see that while the Delta Method does a poor job for small sample sizes, it does a good job of estimating the standard deviation of this estimator for large sample sizes. 378
In addition, we are assuming the data was drawn from a particular distribution, such as in this example a LogNormal. Modeling error would be another potential reason for a difference between the results of the delta method and the true variance of the estimator. 379 Similar to the concept of taking only the first few terms of a (multivariate) Taylor series. 380 See my section on The Properties of Estimators. 381 This variance is not finite unless n > 2σ2.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 857 I simulated this example for a sample size of 20: 1. Simulate a sample of size 20 from a LogNormal Distribution with µ = 6 and σ = 2. 2. Fit a LogNormal via maximum likelihood: µ^
n
=
∑ ln(xi) / n.
i= 1
^2 σ
n
=
∑ {ln(xi)
^ }2 / n. - µ
i= 1
3. Apply the Delta Method. The estimate of the standard deviation of the estimator of the mean is: ^ ^ 2/2) σ exp( µ^ + σ n
^ 2/2 . 1 + σ
4. Record the result of step 3 and return to step 1. Here is a histogram of the results of using the Delta Method for 1000 simulation runs: Prob. 0.30 Sample Size 20 0.25
0.20
0.15
0.10
0.05
5000
10000
15000
20000
25000
As mentioned above, the exact standard deviation of the estimator for n = 20 is 4161. The mean result of these simulations is 3002. We can see that the Delta Method does a relatively poor job for a small sample size.
StdDev
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 858 For a sample size of 100 instead, here is a histogram of the results of using the Delta Method for 1000 simulation runs: Prob.
0.20 Sample Size 100
0.15
0.10
0.05
1000
2000
3000
4000
As mentioned above, the exact standard deviation of the estimator for n = 100 is 1138. The mean result of these simulations is 1110. We can see that the Delta Method does a better job when the sample size is bigger.
StdDev
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 859 Finally here is a histogram of the results of using the Delta Method for 1000 simulation runs, with a sample size of 1000: Prob. 0.20
0.15 Sample Size 1000
0.10
0.05
200
250
300
350
400
450
500
StdDev
As mentioned above, the exact standard deviation of the estimator for n = 1000 is 330. The mean result of these simulations is 331. We can see that the Delta Method does a fairly good job when the sample size is large.382
382
Actuaries sometimes work with samples of size much bigger than 1000.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 860 Comparing the Inverse of the Information Matrix and the Covariance Matrix: The inverse of the information matrix is only an approximation to the covariance matrix. 0 ⎞ 383 ⎛σ 2 / n For this example, as shown above, the inverse of the information matrix is: ⎜ . σ2 / (2n)⎟⎠ ⎝ 0 A Maximum Likelihood fit to a LogNormal is equivalent to fitting via the Method of Moments a Normal Distribution to the log claim sizes. Since for the fitted Normal, µ and σ are independent, this is true for the LogNormal as well. Thus the off-diagonal elements of the covariance matrix are zero. For the maximum likelihood LogNormal: µ^ =
n
∑
^2 = σ
ln(xi) / n.
i= 1
n
∑ {ln(xi)
^ }2 / n. - µ
i= 1
The ln(xi) are independent, identically distributed Normals, each with mean µ and variance σ2. Therefore, µ^ is also Normal, with mean µ, and variance: nσ2 / n2 = σ2 / n. Thus the variance of µ^ estimated from the inverse of the information matrix and the actual variance of µ^ are the same, σ2 / n.384 The ln(xi) are a sample of size n from a Normal. µ^ is the sample mean of that Normal sample. n
Thus,
∑ {ln(xi)
^ }2 / (n - 1) is the Sample Variance, S2 , of that Normal sample. - µ
i= 1
^ 2 = S2 (n - 1)/n. Therefore, σ
For a sample of size n from a Normal Distribution with standard deviation σ, (n - 1) S2 / σ2 has a Chi-Square Distribution with n - 1 degrees of freedom.385 A Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν/2 and θ = 2. 383
In using the Delta Method, we would substitute the fitted parameters for the actual unknown parameters. Recall that in using the Delta Method, we would substitute the fitted parameters for their actual values. 385 For Exam 4/C, you are not responsible for this result for samples from a Normal Distribution. 384
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 861 ^ 2 = S2 (n - 1)/n ⇔ (σ2/n) (Chi-Square Distribution with n - 1 degrees of freedom) σ
⇔ (σ2/n) (Gamma Distribution with α = (n - 1)/2 and θ = 2) ⇔ Gamma Distribution with α = (n - 1)/2 and θ = 2σ2/n. ^ 2 follows a Gamma Distribution with α = (n - 1)/2, and θ = 2σ2/n. Thus, σ
^ 2 ] = mean of this Gamma Distribution = αθ = σ2(n - 1)/n. E[ σ ^ ] = the half moment of this Gamma = E[ σ
θ Γ[α + 1/2] / Γ[α] = σ
^ ] = σ2(n - 1)/n - σ2 (2/n) Γ[n/2]2 / Γ[(n - 1)/2]2 = Thus, Var[ σ
2 / n Γ[n/2] / Γ[(n - 1)/2].386
σ2 2 Γ[n/ 2]2 {n - 1 }. n Γ[(n - 1)/ 2]2
^ estimated from the inverse of the information matrix, which is This differs from the variance of σ
σ2 / (2n). ^ is: 2{n - 1 The ratio of the estimated to the actual variance of σ
2 Γ[n/ 2]2 Γ[(n - 1)/ 2]2
}.
With the aid of a computer, here is a table of this ratio for selected sample sizes: Sample Size 4 8 10 15 20 25 100 1000 10,000 100,000
Ratio of Estimated Variance to the Actual Variance of the Fitted Sigma 0.907042 0.962027 0.970811 0.981538 0.986509 0.989372 0.997462 0.999750 0.999975 0.999997
We see that in this example, the inverse of the information matrix is a good approximation to the covariance matrix even for small samples. It does an even better job for large samples.387 I have used the formula from Appendix A for E[Xk], with k = 1/2. One and only one of n/2 and (n-1)/2 is an integer. 387 Recall that in using the Delta Method, we would substitute the fitted parameters for their actual values. Here we have ignored the errors introduced by the use of the fitted parameters; these errors would be significant for smaller samples. 386
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 862 Covariances of Functions of the Parameters: The delta method can also be used to estimate the covariance of two different functions g and h of the parameters.388 covariance of g and h ≅ (transpose of gradient vector of g) (Inverse of the information matrix) (gradient vector of h). Letʼs see how the delta method can be used to estimate the covariance of the estimate of the mean and estimate of the median, in the case of fitting a Pareto to the ungrouped data in Section 2, via maximum likelihood. Exercise: What is the mean of a Pareto Distribution with α =1.702 and θ = 240,151? [Solution: mean = θ/(α−1) = 342,095.] Exercise: What is the median of a Pareto Distribution with α =1.702 and θ = 240,151? [Solution: 0.5 = 1 - {θ/(θ + median)}α. Therefore, median = θ(21/α - 1) = 120,721. ] Exercise: Let g = mean of this Pareto Distribution. What is the gradient vector of g? [Solution: g = θ/(α−1). ∂g / ∂α = -q/(α−1)2 = -487,315.
∂g / ∂θ = 1/(α−1) = 1. 4245. gradient vector = (-487,315, 1.4245).] Exercise: Let h = median of this Pareto Distribution. What is the gradient vector of h? h = θ(21/α -1). ∂h / ∂α = -θ ln(2) (21/α)/α2 = -86,349.
∂h / ∂θ = 21/α -1 = 0.5027. gradient vector = (-86,349, 0.5027).] Using the estimated Variance-Covariance matrix computed in the previous section, one gets: (transpose of gradient vector of g)(covariance matrix)(gradient vector of h) = ⎛ 0 .162689 31450.9 ⎞ ⎛ -86,349 ⎞ (-487,315, 1.4245) ⎜ ⎟ ⎜ ⎟ = 3.18 x 108 . ⎝ 31450.9 7.04503 x 109 ⎠ ⎝ 0.5027 ⎠ Thus the estimated covariance of the estimate of the mean and estimate of the median is: 3.18 x 108 . 388
If g and h are equal, this result just reduces to the previous version of the delta method, used to estimate the variance. The roles of g and h can be reversed.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 863 Exercise: What is the variance of the estimate of the mean? [Solution: (transpose gradient vector of g)(Inverse of the information matrix)(gradient vector of g) = ⎛ 0 .162689 31450.9 ⎞ ⎛ -487,315⎞ (-487,315, 1.4245) ⎜ ⎟⎜ ⎟ = 9.27 x 109 .] ⎝ 31450.9 7.04503 x 109 ⎠ ⎝ 1.4245 ⎠ Exercise: What is the variance of the estimate of the median? [Solution: (transpose gradient vector of h)(Inverse of the information matrix)(gradient vector of h) = ⎛ 0 .162689 31450.9 ⎞ ⎛ -86,349⎞ (-86,349 , 0.5027) ⎜ ⎟ ⎜ ⎟ = 2.63 x 108 .] ⎝ 31450.9 7.04503 x 109 ⎠ ⎝ 0 .5027 ⎠ Thus the estimated correlation of the estimate of the mean and the estimate of the median is: 3.18 x 108 (9.27 x 10 9) (2.63 x 108 )
= 0.204.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 864 Negative Binomial Distribution: These same techniques can be applied to frequency distributions.389 For the Negative Binomial Distribution as per Loss Models with two parameters β and r, the information matrix is two by two with elements: ⎛ -n E[∂2 ln f(x) / ∂β2] -n E[∂2 ln f(x) / ∂β ∂r]⎞ ⎜ ⎟ ⎝-n E[∂2 ln f(x) / ∂β∂r] -n E[∂2 ln f(x) / ∂r2 ] ⎠ Exercise: Assume a Negative Binomial with parameters β and r has been fit via the method of maximum likelihood and that the resulting Information Matrix is: ⎛514 83 ⎞ ⎜ ⎟ ⎝ 83 140⎠ What is the variance of the estimate of β? What is the variance of the estimate of r? What is the correlation of the estimates of r and β? [Solution: The inverse of the information matrix is an approximate variance-covariance matrix. ⎛514 83 ⎞ ⎛140 -83 ⎞ ⎛ 0.00215 −0.00128 ⎞ Inverse of ⎜ ⎟=⎜ ⎟ / 65071 = ⎜ ⎟. ⎝ 83 140⎠ ⎝ -83 514⎠ ⎝−0.00128 0.00790 ⎠ Thus the variance of the estimate of the first parameter β is: 0.00215. The variance of the estimate of the second parameter r is: 0.00790. The covariance of the estimates of β and r is -0.00128. Thus, the correlation of the estimates of r and β is:
-0.00128 (0.00215) (0.00790)
= -0.311.]
For a Negative Binomial with parameters β and r, the gradient vector for a function h of the ⎛∂h / ∂β⎞ parameters is: ⎜ ⎟. ⎝ ∂h / ∂r ⎠ If for example, the quantity of interest is the mean frequency, then h = rβ. ⎛r ⎞ Then the gradient vector is: ⎜ ⎟ , where as is conventional I have represented the gradient vector as ⎝β⎠ a column vector. 389
The one parameter cases were discussed in “Mahlerʼs Guide to Fitting Frequency Distributions.”
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 865 If instead the quantity of interest is the variance of the frequency, then h = rβ(1+β). ∂h = r(1+β) + rβ = r + 2rβ. ∂β
∂h / ∂r = β(1+β).
⎛ r + 2rβ ⎞ The gradient vector is: ⎜ ⎟ . ⎝β(1+ β)⎠
Exercise: Assume a Negative Binomial with parameters β and r has been fit via the method of maximum likelihood. The fitted parameters are β = 2.3 and r =1.5. The resulting Information Matrix is: ⎛514 83 ⎞ ⎜ ⎟ ⎝ 83 140⎠ What is the an approximate 95% confidence interval for the estimated mean frequency? ⎛ r ⎞ ⎛1.5 ⎞ [Solution: The gradient vector is: ⎜ ⎟ = ⎜ ⎟ . ⎝β⎠ ⎝2.3 ⎠ (transpose of gradient vector) (Inverse of the information matrix) (gradient vector) = ⎛ 0.00215 −0.00128 ⎞ ⎛1.5 ⎞ ⎛1.5 ⎞ (1.5, 2.3) ⎜ ⎟ ⎜ ⎟ = (0.000281, 0.01625) ⎜ ⎟ = 0.0378. ⎝−0.00128 0.00790 ⎠ ⎝2.3 ⎠ ⎝2.3 ⎠ The standard deviation is:
0.0378 = 0.194.
The point estimate of the mean is: rβ = (1.5)(2.3) = 3.45. Thus an approximate 95% confidence interval for the mean frequency is: 3.45 ± (1.96)(0.194) = 3.45 ± 0.38.] Exercise: What would the confidence interval have been if there had been 100 times as much data, all other things being equal? [Solution: The variance would have been one hundredth as large; the standard deviation would have been one tenth as large. Thus the approximate 95% confidence interval for the mean frequency would have been: 3.45 ± 0.04.]
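A minimal sketch, assuming numpy, of the Negative Binomial exercise above: invert the information matrix and apply the delta method to the mean frequency h = rβ.

import numpy as np

beta_hat, r_hat = 2.3, 1.5
info = np.array([[514.0, 83.0],
                 [83.0, 140.0]])       # information matrix for (beta, r)
cov = np.linalg.inv(info)              # approximate covariance matrix of the estimates
grad = np.array([r_hat, beta_hat])     # (dh/dbeta, dh/dr) for h = r*beta
var_mean = grad @ cov @ grad
print(r_hat * beta_hat, 1.96 * np.sqrt(var_mean))   # roughly 3.45 and 0.38, i.e. 3.45 +/- 0.38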
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 866 Problems: Use the following information for the next two questions: • A set of grouped data includes losses > 100,000.
• Using the Method of Maximum Likelihood, a Single Parameter Pareto Distribution with θ = 100,000 has been fit to this grouped data.
• The Method of Maximum Likelihood fitted parameter is α = 1.904. • The one-by-one Information Matrix (Fisher Information) is: 193.36. 30.1 (2 points) Use the Maximum Likelihood fit to estimate the mean excess loss (mean residual life) at 1 million. A. less than 1.1 million B. at least 1.1 million but less than 1.2 million C. at least 1.2 million but less than 1.3 million D. at least 1.3 million but less than 1.4 million E. at least 1.4 million 30.2 (2 points) What is the standard deviation of the estimate in the previous question? A. less than 75,000 B. at least 75,000 but less than 80,000 C. at least 80,000 but less than 85,000 D. at least 85,000 but less than 90,000 E. at least 90,000 30.3 (2 points) A distribution with parameters α and β is fit to some data via maximum likelihood. The estimated covariance matrix of the maximum Iikelihood estimates of α and β is: ⎛0.16 0.28⎞ ⎜ ⎟ ⎝0.28 0.70⎠ h(α, β) is a function of the parameters. Evaluated at the maximum likelihood fit of α and β:
∂h(α, β) ∂h(α, β) = -0.52, and = 0.12. ∂α ∂β
Estimate the variance of the maximum likelihood estimate of h(α, β). (A) Less than 0.02 (B) At least 0.02, but less than 0.03 (C) At least 0.03, but less than 0.04 (D) At least 0.04, but less than 0.05 (E) At least 0.05
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 867 Use the following information to answer the next eight questions: 10,000 individual claims are observed with a total of $1,050,000 in losses. 30.4 (1 point) An exponential distribution F(x) = 1 - e-λx is fit to this data via the method of maximum likelihood. What is the resulting estimate of λ? A. less than 0.01 B. at least 0.01 but less than 0.02 C. at least 0.02 but less than 0.03 D. at least 0.03 but less than 0.04 E. at least 0.04 30.5 (2 points) What is the standard deviation of the estimate of λ in the prior question? A. less than 0.00008 B. at least 0.00008 but less than 0.00009 C. at least 0.00009 but less than 0.00010 D. at least 0.00010 but less than 0.00011 E. at least 0.00011 30.6 (1 point) Using the exponential distribution fit by the method of maximum likelihood, what is the estimated process variance of the claim severity? A. less than 10,000 B. at least 10,000 but less than 11,000 C. at least 11,000 but less than 12,000 D. at least 12,000 but less than 13,000 E. at least 13,000 30.7 (2 points) What is the standard deviation of the estimate of the variance of the claim severity in the prior question? A. less than 250 B. at least 250 but less than 300 C. at least 300 but less than 350 D. at least 350 but less than 400 E. at least 400
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 868 30.8 (1 point) Using the exponential distribution fit by the method of maximum likelihood, what is the estimated probability of a claim size greater than $200? A. less than 12% B. at least 12% but less than 13% C. at least 13% but less than 14% D. at least 14% but less than 15% E. at least 15% 30.9 (2 points) What is the standard deviation of the estimate in the prior question? A. less than 0.001 B. at least 0.001 but less than 0.002 C. at least 0.002 but less than 0.003 D. at least 0.003 but less than 0.004 E. at least 0.004 30.10 (1 point) Using the exponential distribution fit by the method of maximum likelihood, what is the estimated loss elimination ratio for $100? A. less than 40% B. at least 40% but less than 50% C. at least 50% but less than 60% D. at least 60% but less than 70% E. at least 70% 30.11 (3 points) What is the standard deviation of the loss elimination ratio estimated in the previous question? A. less than 0.001 B. at least 0.001 but less than 0.002 C. at least 0.002 but less than 0.003 D. at least 0.003 but less than 0.004 E. at least 0.004
30.12 (2 points) Let h(α) = ln[-lnα]. Using maximum likelihood, the fitted value of α is 0.20, with Var[ α^ ] = 0.0001. Which of the following is an approximate 95% confidence interval for h(α)? A. 0.48 ± 0.01
B. 0.48 ± 0.03
C. 0.48 ± 0.06
D. 0.48 ± 0.10
E. 0.48 ± 0.15
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 869 Use the following information for the next 9 questions:
• •
F(x) = xa, 0 < x < 1. There are 1000 data points, xi. 1000
•
∑ xi = 854.8 i=1
1000
•
∑ ln[xi] = -169.3 i=1
30.13 (2 points) What is the maximum likelihood estimate of a? A. 5.89 B. 5.91 C. 5.93 D. 5.95 E. 5.97 30.14 (2 points) What is the standard deviation of the previous estimate? A. 0.2 B. 0.3 C. 0.4 D. 0.5 E. 0.6 30.15 (1 point) What is the maximum likelihood estimate of E[X]? A. 0.845 B. 0.850 C. 0.855 D. 0.860 E. 0.865 30.16 (2 points) What is the standard deviation of the previous estimate? A. 0.001 B. 0.002 C. 0.003 D. 0.004 E. 0.005 30.17 (2 points) Based on the maximum likelihood fit, what is the process variance for a single draw from this distribution? (A) Less than 0.015 (B) At least 0.015, but less than 0.017 (C) At least 0.017, but less than 0.019 (D) At least 0.019, but less than 0.021 (E) At least 0.021 30.18 (1 point) Based on the maximum likelihood fit, what is the estimate of the next random draw from this distribution, using the squared error loss function? A. .845 B. .850 C. .855 D. .860 E. .865 30.19 (2 points) What is the mean squared error of the estimate in the previous question? (A) Less than 0.015 (B) At least 0.015, but less than 0.017 (C) At least 0.017, but less than 0.019 (D) At least 0.019, but less than 0.021 (E) At least 0.021
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 870 30.20 (2 points) Based on the maximum likelihood fit, using the squared error loss function, one estimates the sum of the next 10,000 random draws from this distribution. What is the mean squared error of this estimate? (A) Less than 1500 (B) At least 1500, but less than 1600 (C) At least 1600, but less than 1700 (D) At least 1700, but less than 1800 (E) At least 1800 30.21 (3 points) Using the Method of Moments, what is a 95% confidence interval for the mean? (A) (0.853, 0.857) (B) (0.851, 0.859) (C) (0.849, 0.861) (D) (0.847, 0.863) (E) (0.845, 0.865)
30.22 (3 points) You model a loss process using a LogNormal Distribution with parameters µ and σ.
The maximum likelihood estimates of µ and σ are: µ^ = 3.3 and σ^ = 2.5.
The estimated covariance matrix of µ^ and σ^ is:
( 0.00048   0       )
( 0         0.00024 )
Using the delta method, estimate the standard deviation of the maximum likelihood estimate of the variance of the LogNormal distribution.
(A) Less than 15 million
(B) At least 15 million, but less than 20 million
(C) At least 20 million, but less than 25 million
(D) At least 25 million, but less than 30 million
(E) At least 30 million
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 871
Use the following information for the next ten questions:
Assume one has 1000 claims drawn at random from a Pareto Distribution, with parameters α = 2 and θ = 100,000.
The Information Matrix is:
( p   q )
( q   r )
30.23 (2 points) What is the value of p in the Information Matrix?
A. 210
B. 230
C. 250
D. 270
E. 290
30.24 (2 points) Let x be a random draw from a Pareto Distribution. What is the expected value of 1/(θ + x)?
A. α / (α + θ)
B. (α + 1) / (αθ)
C. (α − 1) / (α²θ)
D. (α + 1) / {(α − 1) θ}
E. α / {(α + 1) θ}
30.25 (2 points) What is the value of q in the Information Matrix?
A. less than -0.004
B. at least -0.004 but less than -0.003
C. at least -0.003 but less than -0.002
D. at least -0.002 but less than -0.001
E. at least -0.001
30.26 (2 points) Let x be a random draw from a Pareto Distribution. What is the expected value of 1/(θ + x)²?
A. (α + 2) / (α + θ²)
B. α / {(α + 2) θ²}
C. (α − 1) / (α²θ²)
D. (α + 1) / {(α − 1) θ²}
E. α² / {(α + 1) θ²}
30.27 (2 points) What is the value of r in the Information Matrix?
A. less than 1 x 10^-8
B. at least 1 x 10^-8 but less than 2 x 10^-8
C. at least 2 x 10^-8 but less than 3 x 10^-8
D. at least 3 x 10^-8 but less than 4 x 10^-8
E. at least 4 x 10^-8
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 872 30.28 (2 points) Define the Excess Ratio as: 1 - the Loss Elimination Ratio. What is the Excess Ratio at $1 million for this Pareto Distribution? A. less than 0.080 B. at least 0.080 but less than 0.085 C. at least 0.085 but less than 0.090 D. at least 0.090 but less than 0.095 E. at least 0.095 30.29 (2 points) What is the value of the first element of the gradient vector of the Excess Ratio at $1 million? A. less than -0.6 B. at least -0.6 but less than -0.5 C. at least -0.5 but less than -0.4 D. at least -0.4 but less than -0.3 E. at least -0.3 30.30 (2 points) What is the value of the second element of the gradient vector of the Excess Ratio at $1 million? A. less than 7 x 10-7 B. at least 7 x 10-7 but less than 8 x 10-7 C. at least 8 x 10-7 but less than 9 x 10-7 D. at least 9 x 10-7 but less than 10 x 10-7 E. at least 10 x 10-7 30.31 (3 points) What is the approximate variance of the estimated Excess Ratio at $1 million? A. less than 0.0006 B. at least 0.0006 but less than 0.0007 C. at least 0.0007 but less than 0.0008 D. at least 0.0008 but less than 0.0009 E. at least 0.0009 30.32 (1 point) Which of the following represents an approximate 90% confidence interval for the Excess Ratio at 1 million? A. (0.04, 0.14) B. (0.03, 0.15) C. [0, 0.18) D. [0, 0.20) E. None of A, B, C, or D.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 873
Use the following information for the next six questions:
• You observe 1000 claims of sizes xi, i = 1, 2, 3, ..., 1000.
• A Gamma Distribution has been fit to this data via the method of maximum likelihood, with result: α = 2.52 and θ = 59.52.
• Where ψ(y) = d ln[Γ(y)] / dy is the digamma function, one has ψ(2.52) = 0.713.
• Where ψ′(y) = d ψ(y) / dy, one has ψ′(2.52) = 0.486.
30.33 (3 points) What is the variance of the estimate of α? A. 0.009
B. 0.011
C. 0.013
D. 0.015
E. 0.017
30.34 (2 points) What is the variance of the estimate of θ? A. less than 7.5 B. at least 7.5 but less than 8.0 C. at least 8.0 but less than 8.5 D. at least 8.5 but less than 9.0 E. at least 9.0 30.35 (2 points) What is the correlation of the estimated α and θ? A. less than -95% B. at least -95% but less than -92% C. at least -92% but less than -89% D. at least -89% but less than -86% E. at least -86% 30.36 (2 points) What is the variance of the estimated mean? A. 9 B. 11 C. 13 D. 15 E. 17 30.37 (1 point) What is the estimated variance of this Gamma Distribution? A. less than 8600 B. at least 8600 but less than 8700 C. at least 8700 but less than 8800 D. at least 8800 but less than 8900 E. at least 8900 30.38 (2 points) Which of the following is an approximate 95% confidence interval for the estimated variance of this Gamma Distribution? A. [8740, 9110] B. [8520, 9330] C. [8200, 9650] D. [7900, 9940] E. [7750, 10100]
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 874
Use the following information for the next three questions:
A distribution with parameters α and β is fit to some data via maximum likelihood.
The estimated covariance matrix of the maximum likelihood estimates of α and β is:
( 0.16   0.28 )
( 0.28   0.70 )
g(α, β) and h(α, β) are functions of the parameters. Evaluated at the maximum likelihood fit of α and β:
∂g(α, β)/∂α = -0.52, ∂g(α, β)/∂β = 0.12, ∂h(α, β)/∂α = 0.41, and ∂h(α, β)/∂β = 0.23.
30.39 (2 points) Estimate the variance of the maximum likelihood estimate of g(α, β).
(A) 0.02
(B) 0.04
(C) 0.06
(D) 0.08
(E) 0.10
30.40 (2 points) Estimate the variance of the maximum likelihood estimate of h(α, β). (A) 0.08
(B) 0.10
(C) 0.12
(D) 0.14
(E) 0.16
30.41 (2 points) Estimate the covariance of the maximum likelihood estimates of g(α, β) and h(α, β). (A) -0.06
(B) -0.03
(C) 0
(D) 0.03
(E) 0.06
30.42 (3 points) Data has been fit via Maximum Likelihood to a Weibull Distribution. The fitted parameters are θ = 100.0 and τ = 0.40. The inverse of the 2 x 2 information matrix is given by:
(with rows and columns in the order θ, τ)
(  2       -0.01  )
( -0.01     0.001 )
What is the standard deviation of the estimate of the 90th percentile? A. 120 B. 125 C. 130 D. 135 E. 140
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 875 Use the following information for the next four questions: • Using the Method of Maximum Likelihood, a Pareto Distribution has been fit to grouped data.
• The Method of Maximum Likelihood parameters are α = 2.5884 and θ = 4.6415.
• The Inverse of the Information Matrix is:
( 2.761    6.150  )
( 6.150   14.118  )
30.43 (1 point) Use the Maximum Likelihood fit to estimate the Survival Function at 5. A. less than 16% B. at least 16% but less than 19% C. at least 19% but less than 22% D. at least 22% but less than 25% E. at least 25% 30.44 (4 points) What is the variance of the estimate in the previous question? A. less than 0.0010 B. at least 0.0010 but less than 0.0012 C. at least 0.0012 but less than 0.0014 D. at least 0.0014 but less than 0.0016 E. at least 0.0016 30.45 (2 points) The Layer Average Severity is defined as the average dollars in a layer per loss, (including those cases where the loss fails to penetrate the layer.) Use the Maximum Likelihood Fit to estimate the Layer Average Severity for the layer from 10 to 15. A. less than 0.10 B. at least 0.10 but less than 0.12 C. at least 0.12 but less than 0.14 D. at least 0.14 but less than 0.16 E. at least 0.16 30.46 (5 points) What is the variance of the estimate in the previous question? A. less than 0.010 B. at least 0.010 but less than 0.012 C. at least 0.012 but less than 0.014 D. at least 0.014 but less than 0.016 E. at least 0.016
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 876
30.47 (2 points) You model a loss process using a lognormal distribution with parameters µ and σ. You are given:
(i) The maximum likelihood estimates of µ and σ are: µ^ = 5.0 and σ^ = 0.8.
(ii) The estimated covariance matrix of µ^ and σ^ is:
( 0.00012   0       )
( 0         0.00006 )
Estimate the variance of the maximum likelihood estimate of the second moment of the lognormal distribution, using the delta method.
(A) 4 million (B) 5 million (C) 6 million (D) 7 million (E) 8 million
Use the following information for the next 3 questions:
Data has been fit via Maximum Likelihood to a Gamma Distribution. The fitted parameters are α = 4 and θ = 200.
The Information Matrix (with rows and columns in the order α, θ) is:
( 170   -8 )
(  -8    1 )
30.48 (2 points) What is the standard deviation of the estimate of the coefficient of variation? A. 0.004 B. 0.005 C. 0.006 D. 0.007 E. 0.008 30.49 (2 points) What is the standard deviation of the estimate of the skewness? Hint: For a Gamma Distribution, skewness = 2/ α . A. 0.010 B. 0.012 C. 0.014 D. 0.016
E. 0.018
30.50 (2 points) What is the standard deviation of the estimate of the kurtosis? Hint: For a Gamma Distribution, kurtosis = 3 + 6/α. A. 0.020
B. 0.024
C. 0.028
D. 0.032
E. 0.036
30.51 (8 points) You are given: (i) Losses follow a lognormal distribution with unknown parameters µ and σ. (ii) Maximum Iikelihood was applied to 2000 losses. (iii) The maximum likelihood estimates are µ^ = 7.0 and σ^ = 1.5. (iv) The fitted lognormal is used to estimate the average payment per loss for a deductible of 1000. Determine the standard deviation of that estimate.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 877
Use the following information for the next 10 questions:
The maximum likelihood Pareto Distribution has parameters α = 3.5 and θ = 20.
The Information Matrix is:
(  40.82   −5.556  )
( −5.556    0.7955 )
30.52 (1 point) What is the variance of the estimated α?
A. 0.1
B. 0.2
C. 0.3
D. 0.4
E. 0.5
30.53 (1 point) What is the variance of the estimated θ? (A) Less than 25 (B) At least 25, but less than 26 (C) At least 26, but less than 27 (D) At least 27, but less than 28 (E) At least 28 30.54 (1 point) What is the correlation of the estimated α and θ? A. 97.5%
B. 98.0%
C. 98.5%
D. 99.0%
E. 99.5%
30.55 (1 point) What is the maximum likelihood estimate of the mean? A. 4 B. 5 C. 6 D. 7 E. 8 30.56 (2 points) What is the variance of the previous estimate? A. 0.22 B. 0.24 C. 0.26 D. 0.28 E. 0.30 30.57 (1 point) What is the maximum likelihood estimate of the sum of the next 250 loss events? A. 1500 B. 1750 C. 2000 D. 2250 E. 2500 30.58 (3 points) What is the mean squared error of the previous estimate? A. 35,000 B. 40,000 C. 45,000 D. 50,000 E. 55,000 30.59 (1 point) What is the maximum likelihood estimate of the median? A. 4.2 B. 4.4 C. 4.6 D. 4.8 E. 5.0 30.60 (2 points) What is the variance of the previous estimate? A. 0.05 B. 0.06 C. 0.07 D. 0.08 E. 0.09 30.61 (3 points) What is the covariance of the estimate of the mean and the estimate of the median? A. 0.05 B. 0.06 C. 0.07 D. 0.08 E. 0.09
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 878
Use the following information for the next four questions:
A Negative Binomial Distribution has been fit to the following data via the method of Maximum Likelihood.
Number of Claims:    0        1      2     3    4    All
Number of Policies:  90,000   9,000  700   50   5    99,755
The fitted parameters are: β^ = 0.06020 and r^ = 1.7600.
The information matrix is:
( 2,750,825   94,091 )
(    94,091    3,252 )
30.62 (1 point) What is the standard deviation of β^?
A. 0.002
B. 0.003
C. 0.004
D. 0.005
E. 0.006
30.63 (1 point) What is the standard deviation of r^? A. 0.11
B. 0.13
C. 0.15
D. 0.17
E. 0.19
30.64 (2 points) What is the correlation of β^ and r^?
A. -0.99
B. -0.95
C. -0.90
D. -0.85
E. -0.80
30.65 (3 points) Which of the following is an approximate 95% confidence interval for the chance of a policyholder being claims free? A. [0.900, 0.904] B. [0.897, 0.907] C. [0.884, 0.920] D. [0.874, 0.930] E. [0.864, 0.940]
Use the following information for the next three questions:
The random variables X1, X2, ..., Xn are independent and identically distributed with a LogNormal Distribution with σ = 1.5 and µ unknown.
Let α = Σ ln(xi) / n. Let β = E[X]. We estimate β by β^ = exp[α].
30.66 (3 points) Determine the bias of this estimator of the mean.
30.67 (3 points) Determine the mean squared error of this estimator of the mean.
30.68 (2 points) Using the delta method, estimate the variance of this estimator of the mean.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 879 For the next three questions, use the following observed loss development factors from first to second report: 1.814, 1.790, 1.771, 1.789, 1.768, 1.761, 1.751, 1.735, 1.720. 30.69 (3 points) Fit a LogNormal Distribution via Maximum Likelihood. 30.70 (1 point) Use the fitted LogNormal Distribution to estimate the mean. 30.71 (6 points) Use the delta method to determine a 95% confidence for the estimate in the previous question.
30.72 (3 points) Data x1, x2, x3, ..., xn has been fit via Maximum Likelihood to a Weibull Distribution with parameters θ and τ, in that order. The fitted parameters are θ = 20.0 and τ = 0.700.
The inverse of the 2 x 2 information matrix is given by:
( 7.3     0.021   )
( 0.021   0.00015 )
What is the variance of the estimate of the distribution function at 100?
A. less than 0.00005
B. at least 0.00005 but less than 0.00010
C. at least 0.00010 but less than 0.00015
D. at least 0.00015 but less than 0.00020
E. at least 0.00020
30.73 (4 points) Data has been fit via Maximum Likelihood to a Pareto Distribution with parameters α and θ, in that order. The fitted parameters are α = 2.3 and θ = 97.
The inverse of the information matrix is:
( 0.14    8.2 )
( 8.2     509 )
Define the quartiles as the 25th, 50th, and 75th percentiles. Define the interquartile range as the difference between the third and first quartiles, in other words as the 75th percentile minus the 25th percentile.
What is the variance of the estimate of the interquartile range?
A. 14 B. 16 C. 18 D. 20 E. 22
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 880 30.74 (4B, 11/96, Q.16 & Course 4 Sample Exam 2000, Q.33) (2 points) You are given the following:
• The random variable X has the density function f(x) = e^(-x/θ)/θ, 0 < x < ∞, θ > 0.
• θ is estimated by the maximum likelihood estimator, θ^, based on a large sample of data.
• The probability that X is greater than k is estimated by the estimator exp(-k/θ^).
Determine the approximate variance of the estimator for the probability that X is greater than k.
A. Variance of θ^
B. exp(-k/θ) (Variance of θ^)
C. exp(-2k/θ) (Variance of θ^)
D. (k/θ²) exp(-k/θ) (Variance of θ^)
E. (k²/θ⁴) exp(-2k/θ) (Variance of θ^)
30.75 (4B, 5/99, Q.15) (3 points) You are given the following:
• Claim sizes follow a distribution with density function f(x) = e^(-x/θ)/θ, 0 < x < ∞, θ > 0.
• A random sample of 100 claims yields total aggregate losses of 12,500.
Using the maximum likelihood estimate of θ, determine the length of an approximate 95% confidence interval for the proportion of claims that are greater than 250. A. Less than 0.025 B. At least 0.025, but less than 0.050 C. At least 0.050, but less than 0.075 D. At least 0.075, but less than 0.100 E. At least 0.100
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 881
30.76 (4, 5/00, Q.25) (2.5 points) You model a loss process using a lognormal distribution with parameters µ and σ. You are given:
(i) The maximum likelihood estimates of µ and σ are: µ^ = 4.215 and σ^ = 1.093.
(ii) The estimated covariance matrix of µ^ and σ^ is:
( 0.1195   0      )
( 0        0.0597 )
(iii) The mean of the lognormal distribution is exp(µ + σ²/2).
Estimate the variance of the maximum likelihood estimate of the mean of the lognormal distribution, using the delta method.
(A) Less than 1500
(B) At least 1500, but less than 2000
(C) At least 2000, but less than 2500
(D) At least 2500, but less than 3000
(E) At least 3000
30.77 (4, 5/00, Q.34) (2.5 points) In a mortality study, the Weibull distribution with parameters λ and α was used as the survival model, and log time Y was modeled as Y = µ + σW, with W having the standard extreme value distribution. The parameters satisfy the relations:
λ = exp(-µ/σ) and α = 1/σ.
The maximum likelihood estimates of the parameters are µ^ = 4.13 and σ^ = 1.39.
The estimated variance-covariance matrix of µ^ and σ^ is:
( 0.075   0.016 )
( 0.016   0.048 )
Use the delta method to estimate the covariance of α^ and ln(λ^).
(A) Less than -0.054 (B) At least -0.054, but less than -0.018 (C) At least -0.018, but less than 0.018 (D) At least 0.018, but less than 0.054 (E) At least 0.054
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 882
30.78 (4, 5/01, Q.25) (2.5 points) You have modeled eight loss ratios as Yt = α + βt + εt, t = 1, 2, ..., 8, where Yt is the loss ratio for year t and εt is an error term. You have determined:
( α^ )   ( 0.50 )
( β^ ) = ( 0.02 )
Var[ (α^, β^)′ ] =
(  0.00055   -0.00010 )
( -0.00010    0.00002 )
Estimate the standard deviation of the forecast for year 10, Y10 = α^ + 10 ^β , using the delta method. (A) Less than 0.01 (B) At least 0.01, but less than 0.02 (C) At least 0.02, but less than 0.03 (D) At least 0.03, but less than 0.04 (E) At least 0.04 Use the following information for 4, 5/05, questions 9 and 10: The time to an accident follows an exponential distribution. A random sample of size two has a mean time of 6. Let Y denote the mean of a new sample of size two. 30.79 (4, 5/05, Q.9 & 2009 Sample Q.179) (2.9 points) Determine the maximum likelihood estimate of Pr(Y > 10). (A) 0.04 (B) 0.07 (C) 0.11 (D) 0.15 (E) 0.19 30.80 (4, 5/05, Q.10 & 2009 Sample Q.180) (2.9 points) Use the delta method to approximate the variance of the maximum likelihood estimator of FY(10). (A) 0.08
(B) 0.12
(C) 0.16
(D) 0.19
(E) 0.22
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 883
30.81 (4, 11/05, Q.14 & 2009 Sample Q.225) (2.9 points) You are given:
(i) Fifty claims have been observed from a lognormal distribution with unknown parameters µ and σ.
(ii) The maximum likelihood estimates are µ^ = 6.84 and σ^ = 1.49.
(iii) The covariance matrix of µ^ and σ^ is:
( 0.0444   0      )
( 0        0.0222 )
(iv) The partial derivatives of the lognormal cumulative distribution function are:
∂F/∂µ = -φ(z)/σ and ∂F/∂σ = -z φ(z)/σ.
(v) An approximate 95% confidence interval for the probability that the next claim will be less than or equal to 5000 is: [PL, PH].
Determine PL.
(A) 0.73
(B) 0.76
(C) 0.79
(D) 0.82
(E) 0.85
30.82 (4, 11/06, Q.34 & 2009 Sample Q.277) (2.9 points) You are given: (i) Loss payments for a group health policy follow an exponential distribution with unknown mean. (ii) A sample of losses is: 100 200 400 800 1400 3100 Use the delta method to approximate the variance of the maximum likelihood estimator of S(1500). (A) 0.019 (B) 0.025 (C) 0.032 (D) 0.039 (E) 0.045
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 884 Solutions to Problems: 30.1. B. For the Single Parameter Pareto, α > 1, e(x) = (E[X] - E[X ∧ x] )/ S(x) = {θ(α / (α − 1)) - θ [{α − (x/θ)1−α} / (α − 1)] } / (x / θ)- α = x / (α−1). e(1 million) = 1,000,000/(1.904 - 1) = 1,106,195. 30.2. D. For the Single Parameter Pareto, α > 1, e(x) = x / (α−1).
∂e(x)/∂α = −x/(α−1)². ∂e(1 million)/∂α = −1,000,000/(1.904 - 1)² = -1,223,666.
Since we only have a single parameter: Var[e(x)] ≅ (∂e(x)/∂α)² Var[α] = (-1,223,666)² (1/193.36).
Therefore the standard deviation of the estimate of e(x) is: 1,223,666/√193.36 = 87,999.
Comment: The inverse of the information is approximately the variance of the estimate of alpha. 1/193.36 = 0.005172.
30.3. A. The variance of the estimate of h(α, β) is:
(-0.52, 0.12) [0.16, 0.28; 0.28, 0.70] (-0.52, 0.12)′ = (-0.0496, -0.0616) · (-0.52, 0.12)′ = 0.0184.
30.4. A. For the Exponential Distribution the Method of Maximum Likelihood applied to ungrouped data is the same as the Method of Moments. For the method of moments, one sets 1/λ = observed mean = 105, resulting in an estimate λ = 0.00952.
30.5. C. ln f(x) = -λx + ln λ. ∂ ln f(x)/∂λ = -x + 1/λ.
∂² ln f(x)/∂λ² = -1/λ². E[∂² ln f(x)/∂λ²] = -1/λ².
Thus Var[λ] ≅ -1/{n E[∂² ln f(x)/∂λ²]} = λ²/n = (0.00952)²/10,000 = 9.06 x 10^-9.
Thus the standard deviation = √(9.06 x 10^-9) = 0.0000952.
30.6. C. Variance of an Exponential Distribution is 1/λ² = 1/0.00952² = 11,034.
30.7. A. If we were to let h(λ) = 1/λ², then Var[h] ≅ -(∂h/∂λ)²/{n E[∂² ln f/∂λ²]} = (∂h/∂λ)² Var[λ] = (-2/λ³)² λ²/n = 4λ^-4/n = (4)(0.00952)^-4/10,000 = 48,698.
Therefore, the standard deviation = √48,698 = 221.
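For those who like to verify the arithmetic, here is a short Python sketch (mine, not part of the original solution) that reproduces the chain of results in 30.4 through 30.7 from the figures given in the problem.

n = 10_000
mean = 1_050_000 / n           # observed mean severity = 105
lam = 1 / mean                 # maximum likelihood estimate of lambda, about 0.00952
var_lam = lam**2 / n           # Var[lambda-hat] = lambda^2 / n (inverse of the information)

process_var = 1 / lam**2       # variance of the fitted Exponential; 11,025 here, 11,034 in the solution, which uses the rounded 0.00952

# Delta method for h(lambda) = 1/lambda^2: Var[h] ~ (dh/dlambda)^2 Var[lambda-hat]
dh_dlam = -2 / lam**3
var_h = dh_dlam**2 * var_lam

print(lam, var_lam**0.5, process_var, var_h**0.5)
# roughly: 0.00952, 0.0000952, 11,025, 221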
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 885
30.8. D. 1 - F(x) = e^(−λx). 1 - F(200) = e^(-(200)(0.00952)) = 0.149.
30.9. C. Let the survival function be h(x; λ) = e^(−λx), then Var[h] ≅ -(∂h/∂λ)²/{n E[∂² ln f/∂λ²]} = (∂h/∂λ)² Var[λ] = (-x e^(−λx))² λ²/n = (λx)² e^(−2λx)/n = {(0.00952)(200)}² e^(-2(200)(0.00952))/10,000 = 0.00000805.
Therefore, the standard deviation = √0.00000805 = 0.00284.
Comment: Thus an interval estimate of the tail probability at 200 is: 0.149 ± 0.006.
30.10. D. For the exponential distribution the loss elimination ratio = 1 - e^(-λx) = 0.614.
30.11. D. f(x) = λe^(-λx). ln f(x) = ln(λ) − λx. First partial derivative of ln f(x) with respect to λ is 1/λ - x. Second partial derivative of ln f(x) with respect to λ is -1/λ². Therefore the expected value of the second partial derivative of ln f(x) with respect to λ is -1/λ². Therefore, the Information is n/λ². Its inverse is: λ²/n = (0.00952)²/10,000 = 9.06 x 10^-9. The gradient of the loss elimination ratio is its partial derivative with respect to λ, which is x e^(-λx) = 38.6, for x = 100 and λ = 0.00952. Therefore the variance is: (38.6)(9.06 x 10^-9)(38.6) = 1.35 x 10^-5, and the standard deviation = 0.00367.
30.12. C. h(α) = ln(-ln α). h(0.2) = 0.476. ∂h/∂α = 1/(α ln α), which at α = 0.2 is -3.107. Var[h(α)] ≅ (∂h/∂α)² Var[α^] = (-3.107)² (0.0001) = 0.000965. StdDev[h(α)] ≅ √0.000965 = 0.031. 95% confidence interval for h(α): 0.476 ± (1.96)(0.031) ≅ 0.48 ± 0.06.
30.13. B. f(x) = a x^(a-1). ln f(x) = ln(a) + (a-1)ln(x).
Σ ln f(xi) = n ln(a) + (a-1)Σ ln(xi) = 1000 ln(a) - (169.3)(a-1). Setting the derivative with respect to a equal to zero: 1000/a - 169.3 = 0. ⇒ a = 1000/169.3 = 5.907.
Comment: A Beta Distribution with b = 1 and θ = 1. In general, a = -n/Σ ln(xi).
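If you prefer not to differentiate by hand, the one-parameter delta method can also be checked with a numerical derivative. The Python sketch below is mine, not the author's method; the helper name delta_var is just for illustration. It reproduces the standard deviations found in 30.9 and 30.11.

import math

n, lam = 10_000, 1 / 105.0
var_lam = lam**2 / n

def delta_var(h, x, var_x, eps=1e-8):
    """One-parameter delta method with a central-difference derivative."""
    deriv = (h(x + eps) - h(x - eps)) / (2 * eps)
    return deriv**2 * var_x

surv_200 = lambda l: math.exp(-200 * l)        # S(200)
ler_100 = lambda l: 1 - math.exp(-100 * l)     # loss elimination ratio at 100

print(math.sqrt(delta_var(surv_200, lam, var_lam)))   # about 0.0028
print(math.sqrt(delta_var(ler_100, lam, var_lam)))    # about 0.0037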
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 886
30.14. A. ln f(x) = ln(a) + (a-1)ln(x). ∂ ln f(x)/∂a = 1/a + ln(x). ∂² ln f(x)/∂a² = -1/a².
Var[a^] = -1/{n E[∂² ln f(x)/∂a²]} = -1/{(1000)(-1/a²)} = a²/1000 = 5.907²/1000. Standard deviation = 5.907/√1000 = 0.1868.
Comment: Thus a 95% confidence interval for a is: 5.91 ± 0.37.
30.15. C. The mean is a/(a+1), either from the formula for a Beta Distribution, E[X] = θa/(a+b), with b = 1 and θ = 1, or by integrating x f(x) from 0 to 1. E[X] = a/(a+1) = 5.907/6.907 = 0.8552.
Comment: This is very close to the empirical mean.
30.16. D. h(a) = E[X] = a/(a+1). ∂h(a)/∂a = 1/(a+1)².
Var[h(a)] = (∂h(a)/∂a)² Var[a] = a²/{1000(a+1)⁴} = 5.907²/{1000(6.907)⁴} = 0.00001533.
Standard deviation = √0.00001533 = 0.00392.
30.17. B. By integrating x2 f(x) from 0 to 1, E[X2 ] = a/(a+2). Var[X] = a/(a+2) - {a/(a+1)}2 = a/{(a+1)2 (a+2)} = 5.907/{(6.9072 )(7.907)} = 0.01566. 30.18. C. Using the squared error loss function, the estimate of the next draw is the estimated mean: 0.8552. Comment: See “Mahlerʼs Guide to Buhlmann Credibility.” 30.19. B. The estimate of the next draw is the estimated mean: .8552. There are at least two reasons why this estimate will not match the observation. The first is due to the variance in the estimated mean. The second is due to the random fluctuation in the sizes. The second is estimated via the process variance. For a single draw the process variance is .01566 from a previous solution. The variance of the estimated mean is .00001532 from a previous solution. These two sources of error add, and therefore the mean squared error is: .01566 + .00001532 = 0.01568. Comment: With a single random draw, the error due to random fluctuation predominates. Since Maximum Likelihood is asymptotically unbiased, I have ignored any contribution to the mean squared error from the square of the bias.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 887 30.20. C. For the sum of 10,000 independent identical variables, the process variance is 10,000 times that for a single draw: (10000)(.01566) = 156.6. The estimated sum of 10,000 draws is 10000(mean). Var[10000mean] = 100,000,000Var[mean] = (100,000,000)(.00001532) = 1532. These two sources of error add, and therefore the mean squared error is: 156.6 + 1532 = 1689. Comment: With 10,000 draws, the error due to random fluctuation no longer predominates. A 95% confidence interval for the sum of the next 10,000 random draws is: (10000)(.8552) ± 1.96 1689 = 8552 ± 81. Other reasons for prediction errors include: the value of the parameter a is changing over time, or the distribution the data is drawn from is not in fact of the form F(x) = xa. 30.21. D. Using the method of moments, the estimated mean is the empirical mean: 854.8/1000 = 0.8548. a/(a+1) = mean. ⇒ a = mean/(mean -1) = 0.8548/(1 - 0.8548) = 5.887. E[X2 ] = a/(a+2). Var[X] = a/(a+2) - {a/(a+1)}2 = a/{(a+1)2 (a+2)}. Var[mean] = Var[X]/n = {a/{(a+1)2 (a+2)}}/n = ((5.887)/{(6.8872 )(7.887)})/1000 = 0.00001574. Standard deviation = .00397. ⇒ 95% confidence interval for the mean is: 0.855 ± 0.008. Comment: The variance of the method of moments estimate of 0.0000157 is slightly larger than that of the maximum likelihood estimate of 0.0000153. The variance of the maximum likelihood estimate is: a2 / {n(a+1)4 }, vs. {a/{n(a+1)2 (a+2)}} for the method of moments. The ratio of the variance of the method of moments estimate to that of maximum likelihood is: (a+1)2 /{(a(a+2)} = 1 + 1/{(a(a+2)} > 1. Thus in this case, maximum likelihood has a smaller variance than the method of moments, regardless of the sample size. In general, maximum likelihood has a smaller variance for large sample sizes, while for small sample sizes, either method can have a smaller variance than the other.
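Here is a brief Python check (not from the original text) of the F(x) = x^a calculations in 30.13 through 30.20, using only the data summary given in the problem.

n, sum_ln = 1000, -169.3

a = -n / sum_ln                      # MLE of a, about 5.907
var_a = a**2 / n                     # inverse of the information

mean = a / (a + 1)                   # fitted mean, about 0.8552
var_mean = var_a / (a + 1)**4        # delta method: (d mean/da)^2 Var[a], with d mean/da = 1/(a+1)^2

process_var = a / ((a + 1)**2 * (a + 2))   # Var[X] for F(x) = x^a

# MSE of predicting the next draw, and of predicting the sum of 10,000 draws
mse_one = process_var + var_mean
mse_sum = 10_000 * process_var + 10_000**2 * var_mean

print(a, var_mean, process_var, mse_one, mse_sum)
# roughly: 5.907, 0.0000153, 0.01566, 0.0157, 1689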
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 888 30.22. E. The function of the parameters in this case is the variance = 2nd moment - mean2 = exp(2µ + 2σ2) - exp(µ + σ2/2)2 = exp(2µ + σ2)(exp(σ2) -1). h(µ , σ) = exp(2µ + σ2)(exp(σ2) -1).
∂h/∂µ = 2 exp(2µ + σ²)(exp(σ²) - 1) = 2 exp(6.6 + 2.5²)(exp(2.5²) - 1) = 0.394 x 10⁹.
∂h/∂σ = 2σ exp(2µ + σ²)(exp(σ²) - 1) + exp(2µ + σ²) 2σ exp(σ²) = 2σ exp(2µ + σ²)(2 exp(σ²) - 1) = (2)(2.5) e^12.85 (1035) = 1.971 x 10⁹.
Thus the gradient vector is: (0.394 x 10⁹, 1.971 x 10⁹).
The inverse of the information matrix is given as:
( 0.00048   0       )
( 0         0.00024 )
Therefore, the variance of the estimate is:
(0.394 x 10⁹, 1.971 x 10⁹) [0.00048, 0; 0, 0.00024] (0.394 x 10⁹, 1.971 x 10⁹)′ = 1.007 x 10¹⁵.
The standard deviation of the estimate is: √(10.07 x 10¹⁴) = 32 million.
Comment: Similar to 4, 5/00, Q.25. The variance of the LogNormal Distribution is: exp(2µ + σ²)(exp(σ²) - 1) = e^12.85 (517) = 197 million. Thus an approximate 95% confidence interval for the variance of the LogNormal Distribution is: 197 ± 63 million. For σ = 2.5, and 13,021 data points, one would get the given covariance matrix, with elements: σ²/n, 0, 0, 0.5σ²/n.
30.23. C. -nE[∂² ln f(x)/∂α²] = -1000 E[-1/α²] = 1000/α² = 1000/4 = 250.
30.24. E. E[1/(θ+x)] = ∫₀^∞ f(x)/(x+θ) dx = ∫₀^∞ αθ^α/(x+θ)^(2+α) dx = αθ^α/{(α+1) θ^(1+α)} = α/{(α+1) θ}.
30.25. B. -nE[∂² ln f(x)/∂α∂θ] = -1000 E[(1/θ) − (1/(θ+x))] = 1000 (α/(α+1) − 1)/θ = -1000/{θ(α+1)} = -0.00333333.
30.26. B. E[1/(θ+x)²] = ∫₀^∞ f(x)/(x+θ)² dx = ∫₀^∞ αθ^α/(x+θ)^(3+α) dx = αθ^α/{(α+2) θ^(2+α)} = α/{(α+2) θ²}.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 889
30.27. E. -nE[∂² ln f(x)/∂θ²] = -1000 E[(α+1)/(θ+x)² − α/θ²] = 1000 {1 - (α+1)/(α+2)} (α/θ²) = (1000)α/{θ²(α+2)} = 5.00 x 10^-8.
Comment: Therefore, the Information Matrix for a Pareto Distribution is:
(  n/α²            -n/{θ(α+1)}     )
( -n/{θ(α+1)}       nα/{θ²(α+2)}   )
30.28. D. For the Pareto the excess ratio is: R(x) = 1 - E[X ∧ x]/E[X] = (1 + x/θ)^(1−α) = 0.0909.
30.29. E. ∂R/∂α = -ln(1 + x/θ) (1 + x/θ)^(1−α) = -0.218.
30.30. C. ∂R/∂θ = (1−α)(-x/θ²)(1 + x/θ)^(−α) = 8.26 x 10^-7.
30.31. E. The Information Matrix is:
(  250        -1/300     )
( -1/300       5 x 10^-8 )
The Inverse of the Information Matrix is:
[5 x 10^-8, 1/300; 1/300, 250] / {(5 x 10^-8)(250) - 1/300²} = 720,000 [5 x 10^-8, 1/300; 1/300, 250] =
( 0.036     2400        )
( 2400      180,000,000 )
The estimated variance is: (transpose of gradient)(Inverse of Information Matrix)(gradient) =
(-0.218, 8.26 x 10^-7) [0.036, 2400; 2400, 180,000,000] (-0.218, 8.26 x 10^-7)′ = 0.00097.
Comment: The inverse of the Information Matrix for a Pareto Distribution is:
( α²(α+1)²/n          θα(α+1)(α+2)/n       )
( θα(α+1)(α+2)/n      θ²(α+1)²(α+2)/(nα)   )
Thus, the correlation of the estimated α and θ is: √(α(α+2))/(α+1) = √(1 - 1/(α+1)²), close to 1.
30.32. A. The standard deviation is √0.00097 = 0.031. A 90% confidence interval is about ±1.645 standard deviations, since Φ(1.645) = 0.95. The estimated excess ratio is 0.0909. Thus the desired confidence interval is: 0.0909 ± (1.645)(0.031) = (0.04, 0.14).
Comment: Note that in general excess ratios are confined to the interval [0, 1].
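The Pareto information-matrix work in 30.23 through 30.32 can be verified with a few lines of Python. The sketch below (mine, not part of the original solution) builds the information matrix from the formulas just derived, inverts it, and applies the delta method to the excess ratio at 1 million.

import numpy as np

n, alpha, theta = 1000, 2.0, 100_000.0

info = np.array([[n / alpha**2,                -n / (theta * (alpha + 1))],
                 [-n / (theta * (alpha + 1)),  n * alpha / (theta**2 * (alpha + 2))]])
cov = np.linalg.inv(info)

x = 1_000_000.0
excess = (1 + x / theta)**(1 - alpha)                      # R(x), about 0.0909
grad = np.array([-np.log(1 + x / theta) * (1 + x / theta)**(1 - alpha),
                 (1 - alpha) * (-x / theta**2) * (1 + x / theta)**(-alpha)])

var_R = grad @ cov @ grad
print(excess, var_R, excess - 1.645 * var_R**0.5, excess + 1.645 * var_R**0.5)
# roughly: 0.0909, 0.00097, and a 90% interval of about (0.04, 0.14)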
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 890
30.33. B. For the Gamma Distribution, the log density is: ln f(x) = -α ln θ + (α−1) ln x - (x/θ) - ln(Γ(α)).
The partial derivative of ln f(x) with respect to α is: -ln θ + ln x - ψ(α).
The second partial derivative of ln f(x) with respect to α is: -ψ′(α).
The second partial derivative of ln f(x) with respect to θ is: α/θ² - 2x/θ³, which has an expected value of: α/θ² - 2E[x]/θ³ = α/θ² - 2αθ/θ³ = -α/θ².
The mixed second partial derivative of ln f(x) with respect to α and θ is: -1/θ.
Therefore the two by two information matrix is:
1000 [ψ′(α), 1/θ; 1/θ, α/θ²] = 1000 [ψ′(2.52), 1/59.52; 1/59.52, 2.52/59.52²] = [486, 16.8; 16.8, 0.711].
The inverse of the information matrix, an approximate variance-covariance matrix, is:
[0.711, -16.8; -16.8, 486] / {(486)(0.711) - 16.8²} = [0.0112, -0.265; -0.265, 7.66].
Therefore the variance of the estimated α is 0.0112.
Comment: Beyond what you are likely to be asked on your exam. An approximate 95% confidence interval for alpha is: 2.52 ± 1.96 √0.0112 = 2.52 ± 0.21.
30.34. B. The inverse of the information matrix is the approximate variance-covariance matrix. Therefore the variance of the estimated θ is 7.66.
Comment: An approximate 95% confidence interval for theta is: 59.52 ± 1.96 √7.66 = 59.5 ± 5.5.
30.35. C. The inverse of the information matrix is the approximate variance-covariance matrix. Therefore, the variance of the estimated α is 0.0112, the variance of the estimated θ is 7.66, and the covariance of the estimated α and θ is -0.265. Thus the correlation of the estimated α and θ is: -0.265/√((0.0112)(7.66)) = -90.5%.
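A quick Python check of 30.33 through 30.35 follows; it is a sketch of mine, not part of the original solution, and it uses scipy's polygamma for the trigamma value rather than the rounded 0.486 given in the problem, so the last digits may differ slightly.

import numpy as np
from scipy.special import polygamma

n, alpha, theta = 1000, 2.52, 59.52
trigamma = polygamma(1, alpha)        # psi'(2.52), about 0.486

info = n * np.array([[trigamma,   1 / theta],
                     [1 / theta,  alpha / theta**2]])
cov = np.linalg.inv(info)

var_alpha, var_theta, cov_at = cov[0, 0], cov[1, 1], cov[0, 1]
corr = cov_at / np.sqrt(var_alpha * var_theta)
print(var_alpha, var_theta, corr)    # roughly 0.011, 7.7, -0.90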
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 891 30.36. A. The mean of a Gamma Distribution is αθ. Let h = αθ. Then ∂h/∂α = θ and ∂h/ ∂θ = α. Therefore, the gradient vector is: (θ , α) = (59.52 , 2.52). Then the variance of the estimate of the mean is the matrix product of : ( transpose of gradient vector) (Inverse of the information matrix) (gradient vector) = ⎛0.0112 -0.265⎞ ⎛59.52⎞ ⎛59.52⎞ (59.52 , 2.52) ⎜ ⎟ ⎜ ⎟ = (0, 3.54) ⎜ ⎟ = 8.9. ⎝ -0.265 7.66 ⎠ ⎝ 2.52 ⎠ ⎝ 2.52 ⎠ Comment: The estimated mean is αθ = 150. An approximate 95% confidence interval for the mean is: 150 ± 1.96 8.9 = 150 ± 6. Note that since the estimates of alpha and theta are highly negatively correlated, the estimate of the mean derived from their product has a relatively small variance. 30.37. E. The variance of a Gamma Distribution is: αθ2 = (2.52) (59.522 ) = 8927. 30.38. D. The variance of a Gamma Distribution is: αθ2 = (2.52) (59.522 ) = 8927. Let h = αθ2. Then ∂h/∂α = θ2, and ∂h/ ∂θ = 2αθ. Therefore, the gradient vector is: (θ2 , 2αθ) = (59.522 , (2)(2.52) (59.52)) = (3543, 300). Then the variance of the estimate of the variance is the matrix product of: (transpose of gradient vector) (Inverse of the information matrix) (gradient vector) = ⎛0.0112 -0.265⎞ ⎛ 3543⎞ ⎛ 3543⎞ (3543, 300) ⎜ ⎟ ⎜ ⎟ = (-39.7, 1360) ⎜ ⎟ = 267 thousand. ⎝ -0.265 7.66 ⎠ ⎝ 300 ⎠ ⎝ 300 ⎠ An approximate 95% confidence interval for the variance of the Gamma Distribution is thus 8927 ± 1.96 267,000 = 8927 ± 1012 = [7915 , 9939 ] ≅ [7900, 9940]. 30.39. A. The variance of the estimate of g(α, β) is: ⎛0.16 0.28⎞ ⎛−0.52 ⎞ ⎛−0.52 ⎞ (-0.52, 0.12) ⎜ ⎟ ⎜ ⎟ = (-0.0496, -0.0616) ⎜ ⎟ = 0.0184. ⎝0.28 0.70⎠ ⎝ 0.12 ⎠ ⎝ 0.12 ⎠ 30.40. C. The variance of the estimate of h(α, β) is: ⎛0.16 0.28⎞ ⎛ 0.41⎞ ⎛ 0.41⎞ (0.41, 0.23) ⎜ ⎟ ⎜ ⎟ = (0.1300, 0.2758) ⎜ ⎟ = 0.1167. ⎝0.28 0.70⎠ ⎝0.23 ⎠ ⎝0.23 ⎠
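Solutions 30.39 and 30.40 (and 30.41 on the next page) are all the same matrix product, gradient′ (covariance matrix) gradient. The short sketch below is not part of the original solution; it simply carries out that product with numpy.

import numpy as np

cov = np.array([[0.16, 0.28],
                [0.28, 0.70]])
grad_g = np.array([-0.52, 0.12])
grad_h = np.array([0.41, 0.23])

var_g = grad_g @ cov @ grad_g        # about 0.0184
var_h = grad_h @ cov @ grad_h        # about 0.1167
cov_gh = grad_g @ cov @ grad_h       # about -0.0345
print(var_g, var_h, cov_gh)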
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 892 30.41. B. The covariance of the estimates of g(α, β) and h(α, β) is: ⎛0.16 0.28⎞ ⎛ 0.41⎞ ⎛ 0.41⎞ (-0.52, 0.12) ⎜ ⎟ ⎜ ⎟ = (-0.0496, -0.0616) ⎜ ⎟ = -0.0345. ⎝0.28 0.70⎠ ⎝0.23 ⎠ ⎝0.23 ⎠ 30.42. D. For the Weibull, VARp [X] = θ { -ln(1-p) }1/τ . Thus the 90th percentile is: h(θ, τ) = θ ln(10)1/τ. ∂h = ln(10)1/τ. ∂θ Plugging in the fitted parameters, the first component of the gradient vector is: ln(10)1/0.4 = 8.0452. ∂h -1 = θ ln(10)1/τ ln[ln(10)] 2 . ∂τ τ Plugging in the fitted parameters, the second component of the gradient vector is: (100) ln(10)1/0.4 ln[ln(10)] (-1/0.16) = -4193.72. Thus the variance of the estimate is: -0.01⎞ ⎛ 8.0452 ⎞ ⎛ 2 ⎛ 8.0452 ⎞ (8.0452, -4193.7) ⎜ = (58.027, -4.2742) ⎜ ⎟ ⎜ ⎟ ⎟ = 18,392. ⎝ -0.01 0.001⎠ ⎝ -4193.7⎠ ⎝ -4193.7⎠
The standard deviation of the estimate of the 90th percentile is: √18,392 = 135.6.
Comment: The estimate of the 90th percentile is: (100) ln(10)1/0.4 = 804.52. 30.43. A. For the Pareto, S(x) = (1 + x/θ)−α. S(5) = (1 + 5/4.6415)-2.5884 = 15.1%.
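Here is a short numerical check of the Weibull 90th-percentile calculation in 30.42 above; the code is a sketch of mine, not part of the original solution, and uses the inverse information matrix quoted in the problem.

import numpy as np

theta, tau = 100.0, 0.40
cov = np.array([[2.0, -0.01],
                [-0.01, 0.001]])

c = np.log(10.0)                          # -ln(1 - 0.9)
p90 = theta * c**(1 / tau)                # about 804.5
grad = np.array([c**(1 / tau),                                 # d p90 / d theta
                 -theta * c**(1 / tau) * np.log(c) / tau**2])  # d p90 / d tau

var_p90 = grad @ cov @ grad
print(p90, var_p90**0.5)                  # roughly 804.5 and 136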
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 893 30.44. C. For the Pareto Distribution, S(x) = (θ/(θ+x))α.
∂S(x)/ ∂α = ln(θ/(θ+x)) (θ/(θ+x))α ∂S(5)/ ∂α = ln(4.6415/(4.6415+5)) (4.6415/(4.6415+5))2.5884 = -0.1102. ∂S(x)/∂θ = {αθα−1(θ+x)α - α(θ+x)α−1θα}/(θ+x)2α = αθα−1{(θ+x) - θ}/(θ+x)α+1 = αθα−1x /(θ+x)α+1.
∂S(5)/ ∂θ = 2.5884 (4.64151.5884)(5) /(4.6415+5)3.5884 = 0.0436. Thus the gradient vector is: (-0.1102, 0.0440). The variance of the estimated value of the Survival Function is: (transpose of gradient vector) (Inverse of the information matrix) (gradient vector) = ⎛ 2.761 6.150 ⎞ ⎛−0.1102 ⎞ (-0.1102, 0.0436) ⎜ ⎟ ⎜ ⎟= ⎝6.150 14.118⎠ ⎝ 0.0436 ⎠ (-0.1102, 0.0436) (-0.0361, -0.0622) = 0.00127. 30.45. E. For the Pareto: E[X ∧ x] ={θ/(α-1)}{1-(θ/(θ+x))α−1}. The Layer Average Severity for the layer from 10 to 15 is: E[X ∧ 15] - E[X ∧ 10] = {4.6415/1.5884}{1-(4.6415/(4.6415+15))1.5884} {4.6415/1.5884} {1 - (4.6415/(4.6415+10))1.5884} = 2.6266 - 2.4509 = 0.1757.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 894 30.46. D. For the Pareto: E[X ∧ x] = {θ/(α-1)}{1-(θ/(θ+x))α−1}.
∂ E[X ∧ x]/ ∂α = {θ/(α-1)2 } {[ (θ/(θ+x))α−1 ((α-1)Ln[1+ x/θ] + 1)] -1 }. ∂ E[X ∧ x]/ ∂θ = -(x/θ)(1+x/θ)−α + {1-(θ/(θ+x))α−1}/(α-1). ∂ E[X ∧ 10]/ ∂α = {4.6415/1.58842 } { [(4.6415/(4.6415+10))1.5884 (1.5884Ln[1+10/4.6415] + 1)] -1 } = -1.00169.
∂ E[X ∧ 15]/ ∂α = {4.6415/1.58842 } { [(4.6415/(4.6415+10))1.5884 (1.5884Ln[1+15/4.6415] + 1)] -1 } = -1.22736.
∂ E[X ∧ 10]/ ∂θ = -(10/4.6415)(1+10/4.6415)-2.5884 + {1-(4.6415/(4.6415+10))1.5884}/(1.5884) = 0.417913.
∂ E[X ∧ 15]/ ∂θ = -(15/4.6415)(1+15/4.6415)-2.5884 + {1-(4.6415/(4.6415+15))1.5884}/(1.5884) = 0.488677. Thus the Gradient Vector for the Layer Average Severity is: {-1.22736 - (-1.00169), 4.88677 - 0.417913} = {-0.225668, 0.0707643}. The variance of the estimate value of the Survival Function is: ( transpose of gradient vector) (Inverse of the information matrix) (gradient vector) = ⎛ 2.761 6.150 ⎞ ⎛ -0.225668 ⎞ (-0.225668, 0.0707643) ⎜ ⎟ ⎜ ⎟= ⎝6.150 14.118⎠ ⎝0.0707643⎠ (-0.225668, 0.0707643) (-0.18787, -0.38881) = 0.0149. Comment: Thus using ±1.96 standard deviations, the Layer Average Severity for the layer from 10 to 15 is: 0.18 ± 0.24 = [0, 0.42]. (The Layer Average Severity is always ≥ 0.)
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 895 30.47 . D. The function of the parameters in this case is the second moment: h(µ , σ) = exp(2µ + 2σ2). ∂h = 2 exp(2µ + 2σ2) = 2 exp(10 + 1.28) = 158,443. ∂µ ∂h = 4 σ exp(2µ + 2σ2) = (4)(0.8) exp(10 + 1.28) = 253,508. ∂σ Thus the gradient vector is: (158,443, 253,508). Using the given inverse of the information matrix, the variance of the estimate is: 0 ⎞ ⎛ 158,443 ⎞ ⎛ 0.00012 (158,443, 253,508) ⎜ = 6.868 million. 0 0.00006⎟⎠ ⎜⎝ 253,508 ⎟⎠ ⎝ Comment: The estimated second moment is: e11.28 = 79,221. The standard deviation of this estimate is: 6.868 million = 2621. Thus an approximate 95% confidence interval for the second moment is: 79,221 ± 5137.
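A brief Python check of 30.47 follows (not part of the original solution), reproducing the delta-method variance of the lognormal second moment E[X²] = exp(2µ + 2σ²).

import numpy as np

mu, sigma = 5.0, 0.8
cov = np.array([[0.00012, 0.0],
                [0.0, 0.00006]])

m2 = np.exp(2 * mu + 2 * sigma**2)
grad = np.array([2 * m2, 4 * sigma * m2])
print(m2, grad @ cov @ grad)     # about 79,221 and 6.9 million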
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 896 30.48. B. & 30.49. B. & 30.50. E. The Information Matrix is:
(with rows and columns in the order α, θ)
( 170   -8 )
(  -8    1 )
The inverse of the information matrix is: [1, 8; 8, 170] / {(170)(1) - (-8)(-8)} = [1/106, 4/53; 4/53, 85/53].
For a Gamma, mean = αθ, and variance = αθ². Thus the CV is: θ√α/(αθ) = 1/√α.
∂CV/∂α = -1/(2α^1.5). ∂CV/∂θ = 0.
Plugging in the fitted parameters, the gradient vector is: (-0.0625, 0).
Variance of the estimate of the CV is:
(-0.0625, 0) [1/106, 4/53; 4/53, 85/53] (-0.0625, 0)′ = (-0.0625)²/106.
Standard Deviation of the estimate of the CV is: 0.0625/√106 = 0.00607.
Since for a Gamma Distribution the skewness is twice the CV, the standard deviation of the estimate of the skewness is: (2)(0.00607) = 0.0121.
Kurtosis = 3 + 6/α. ∂Kurtosis/∂α = -6/α². ∂Kurtosis/∂θ = 0.
Plugging in the fitted parameters, the gradient vector is: (-0.375, 0).
Variance of the estimate of the Kurtosis is:
(-0.375, 0) [1/106, 4/53; 4/53, 85/53] (-0.375, 0)′ = (-0.375)²/106.
Standard Deviation of the estimate of the Kurtosis is: 0.375/√106 = 0.0364.
Comment: Since the CV, skewness, and kurtosis each only depend on the shape parameter α, we only need the 1,1 element of the covariance matrix. Estimated CV is: 1/2. Estimated Skewness is 1. Estimated Kurtosis is: 3 + 6/4 = 4.5.
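The three answers in 30.48 through 30.50 can be checked together, since each statistic depends only on α. A Python sketch, not from the original text:

import numpy as np

alpha, theta = 4.0, 200.0
info = np.array([[170.0, -8.0],
                 [-8.0, 1.0]])
cov = np.linalg.inv(info)
var_alpha = cov[0, 0]                                   # 1/106

sd_cv = abs(-0.5 * alpha**-1.5) * var_alpha**0.5        # about 0.0061
sd_skew = 2 * sd_cv                                     # about 0.0121
sd_kurt = abs(-6 / alpha**2) * var_alpha**0.5           # about 0.0364
print(sd_cv, sd_skew, sd_kurt)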
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 897
30.51. f(x) = exp[-(ln(x) - µ)²/(2σ²)] / {x σ √(2π)}.
ln f(x) = -(ln(x) - µ)²/(2σ²) - ln(x) - ln[σ] - ln(2π)/2.
The off-diagonal elements of the Information Matrix are: -n E[∂2 ln f(x) / ∂σ ∂µ] = -n E[−2{(lnx)−µ}/σ3] = (2n) {E[lnx]−µ} /σ3 = (2n) {µ−µ} /σ3 = 0. Where E[lnx] is the mean of the Normal underlying the LogNormal, or µ. The 1,1 element of the Information Matrix is: -n E[∂2 ln f(x) / ∂µ2 ] = -n E[-1/σ2] = n /σ2. The 2,2 element of the Information Matrix is: -n E[∂2 ln f(x) / ∂σ2 ] = -n E[(σ2 - 3{lnx − µ}2 )/σ4] = -n {σ2 - 3σ2} /σ4 = 2n /σ2. Where E[{lnx − µ}2 ] is the variance of the Normal underlying the LogNormal, or σ2. ⎛n / σ2 0 ⎞ Thus the Information Matrix is: ⎜ ⎟. 2n / σ 2⎠ ⎝ 0 The inverse of the Information Matrix is: 0 ⎞ 0 0 ⎛σ 2 / n ⎛ 1.52 / 2000 ⎞ ⎛ 0.001125 ⎞ = = . ⎜ 0 ⎟ ⎜ ⎟ ⎜ 2 σ2 / (2n)⎠ 0 1.5 / 4000⎠ ⎝ 0 0.0005625⎟⎠ ⎝ ⎝ For the LogNormal, E[X] - E[X ∧ x] = ⎡ ln(x) − µ − σ2 ⎤ ⎡ ln(x) − µ ⎤ exp[µ + σ2/2] - exp(µ + σ2/2) Φ ⎢ x {1 Φ ⎥⎦ ⎢⎣ ⎥⎦ } = σ σ ⎣ ⎡ ln(x) − µ − σ2 ⎤ ⎡ ln(x) − µ ⎤ exp[µ + σ2/2] {1 - Φ ⎢ } x {1 Φ ⎥⎦ ⎢⎣ ⎥⎦ } . σ σ ⎣ ⎡ ln(x) − µ ⎤ ∂Φ ⎢ ⎥⎦ -1 ⎡ ln(x) − µ ⎤ σ ⎣ Now by the chain rule: = φ ⎥⎦ . ∂µ σ ⎢⎣ σ ⎡ ln(x) − µ ⎤ ∂Φ ⎢ ⎥⎦ ln(x) - µ ⎡ ln(x) − µ ⎤ σ ⎣ =φ⎢ ⎥⎦ . ∂σ σ2 σ ⎣ ⎡ ln(x) − µ ⎤ ∂Φ ⎢ - σ⎥ σ ⎣ ⎦ = -1 φ ⎡ ln(x) − µ - σ ⎤ . ⎥⎦ σ ⎢⎣ σ ∂µ
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 898 ⎡ ln(x) − µ ⎤ ∂Φ ⎢ - σ⎥ σ ⎣ ⎦ = ( - ln(x) - µ - 1) φ ⎡ ln(x) − µ - σ ⎤ . ⎢⎣ ⎥⎦ ∂σ σ2 σ Thus,
∂ {E[X] - E[X ∧ 1000]} = ∂µ ⎡ ln(1000 ) − µ − σ2 ⎤ ⎡ ln(1000 ) − µ ⎤ exp[µ + σ2/2] {1 - Φ ⎢ } + exp[µ + σ2/2] φ ⎢ - σ⎥ / σ ⎥ σ σ ⎣ ⎦ ⎣ ⎦ ⎡ ln(1000 ) − µ ⎤ - 1000 φ ⎢ ⎥⎦ / σ . σ ⎣
Plugging in the fitted parameters, the first component of the gradient vector is: exp[8.125] {1 - Φ[-1.56]} + exp[8.125] φ[-1.561] / 1.5 + (1000) φ[-0.0615] / 1.5 = (3378) (0.9406) + (3378) {Exp[-1.5612 / 2] / 2 π }/ 1.5 - (1000) {Exp[-0.06152 / 2] / 2 π }/ 1.5 = 3178. ∂ {E[X] - E[X ∧ 1000]} Similarly, = ∂σ ⎡ ln(1000 ) − µ − σ2 ⎤ σ exp[µ + σ2/2] {1 - Φ ⎢ ⎥⎦ } σ ⎣ ⎡ ln(1000 ) − µ ⎤ ln(x) - µ ⎡ ln(1000 ) − µ ⎤ ln(x) - µ + exp[µ + σ2/2] φ ⎢ - σ⎥ ( + 1) 1000 φ . ⎢⎣ ⎥⎦ σ σ2 σ σ2 ⎣ ⎦ Plugging in the fitted parameters, the second component of the gradient vector is: 1.5 exp[8.125] {1 - Φ[-1.56]} + exp[8.125] φ[-1.561] (0.9590) + (1000) φ[-0.0615] (0.0410 )= (1.5)(3378) (0.9406) + (3378) {Exp[-1.5612 / 2]/ 2 π }(0.9590) + (1000) {Exp[-0.06152 / 2] / 2 π } (0.0410) = 5165. Thus the gradient vector is: (3178, 5165). Variance of the estimate of the average payment per loss is: 0 ⎛ 0.001125 ⎞ ⎛ 3178⎞ (3178, 5165) ⎜ = 26,368. 0 0.0005625⎟⎠ ⎜⎝ 5165⎟⎠ ⎝ Comment: Similar to 4, 11/05, Q.14. The estimate of the average payment per loss is: ⎡ ln(x) − µ − σ2 ⎤ ⎡ ln(x) − µ ⎤ exp[µ + σ2/2] {1 - Φ ⎢ } x {1 Φ ⎥⎦ ⎢⎣ ⎥⎦ } = σ σ ⎣ (3378){1 - Φ[-1.56]} - (1000) {1 - Φ[-0.06]} = (3378)(0.9406) - (1000)(0.5239) = 2653.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 899 30.52. E., 30.53. B., and 30.54. A. The inverse of the information matrix is: (0.49620 3.46563) (3.46563 25.462 ) The variance of the estimated α is: 0.49620. The variance of the estimated θ is: 25.462. The covariance of the estimated α and θ is: 3.46563 The correlation of the estimated α and θ is: 3.46563/ (0.4962)(25.462) = 0.975. Comment: The given information matrix was calculated from a situation with 500 data points. 30.55. E. The mean is: θ/(α-1) = 20/2.5 = 8. 30.56. D. h(α,θ) = E[X] = θ/(α-1). ∂ h(α,θ) / ∂α = -θ/(α-1)2 = -3.2. ∂ h(α,θ) / ∂θ = 1/(α-1) = 0.4. Thus the variance of the estimated mean is: ⎛0.49620 3.46563 ⎞ ⎛-3.2⎞ (-3.2 , 0.4) ⎜ ⎟ ⎜ ⎟ = 0.283. ⎝3.46563 25.462 ⎠ ⎝ 0.4 ⎠ 30.57. C. (250)(8) = 2000. 30.58. E. This estimate will not match the observation due to the variance in the estimated mean and due to the random fluctuation in the sizes of loss. The first source of error contributes: Var(250 estimated mean) = 2502 Var[estimated mean] = (62500)(.283) = 17,687. The second source of error is estimated via the process variance. For a single draw from a Pareto Distribution the process variance is: E[X2 ] - E[X]2 = 2θ2 / {(α-1)(α-2)} - {θ/(α−1)}2 = αθ2 / {(α-1)2 (α-2)} = (3.5)(202 )/{(2.52 )(1.5)} = 149.33. For the sum of 250 independent identical variables, the process variance is 250 times that for a single draw: (250)(149.33) = 37,333. These two sources of error add, and therefore the mean squared error is: 17,687 + 37,333 = 55,020. Comment: Since Maximum Likelihood is asymptotically unbiased, I have ignored any contribution to the mean squared error from the square of the bias. An approximate 95% confidence interval for the sum of the next 250 loss events is: 2000 ± 460. 30.59. B. Set S(x) = 0.5. ⇒ 0.5 = (θ/(x+θ))α. ⇒ x = θ(21/α - 1) = (20)(21/3.5 - 1) = 4.380.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 900 30.60. C. h(α,θ) = median = θ(21/α - 1). ∂ h(α,θ) / ∂α = -θ 21/α − 1 ln(2)/α2 = -(20)(21/3.5 - 1)ln(2)/3.52 = -1.3795. ∂ h(α,θ) / ∂θ = 21/α - 1 = 21/3.5 - 1 = 0.2190. Thus the variance of the estimated mean is: ⎛0.49620 3.46563 ⎞ ⎛-1.3795⎞ (-1.3795 , 0.2190) ⎜ ⎟ ⎜ ⎟ = 0.0715. ⎝3.46563 25.462 ⎠ ⎝ 0.2190 ⎠ 30.61. D. The covariance of the estimate of the mean and the estimate of the median: ⎛0.49620 3.46563 ⎞ ⎛-3.2⎞ (-1.3795 , 0.219) ⎜ ⎟ ⎜ ⎟ = 0.0799. ⎝3.46563 25.462 ⎠ ⎝ 0.4 ⎠ Comment: Put the gradient vector of the median on one side of the covariance matrix and the gradient vector of the mean on the other. The correlation between the estimate of the mean and the 0.0799 estimate of the median is: = 0.56. (0.0719) (0.283) 30.62. E., 30.63. D., & 30.64. A. The inverse of the Information Matrix is: ⎛ 3252 -94,091 ⎞ ⎜ ⎟ ⎛0.000035132 -0.0010165⎞ ⎝-94,091 2,750,825⎠ =⎜ ⎟. (2,750,825)(3252) - (94,091)(94,091) ⎝ -0.0010165 0.029717 ⎠ ^
Var[β^] = 0.000035132. Var[r^] = 0.029717. Cov[β^, r^] = -0.0010165.
StdDev[β^] = √0.000035132 = 0.0059. StdDev[r^] = √0.029717 = 0.172.
Corr[β^, r^] = -0.0010165/√((0.000035132)(0.029717)) = -0.994.
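Inverting the 2 x 2 information matrix numerically reproduces these figures; a short check (not part of the original solution):

import numpy as np

info = np.array([[2_750_825.0, 94_091.0],
                 [94_091.0, 3252.0]])
cov = np.linalg.inv(info)

sd_beta = cov[0, 0]**0.5                     # about 0.0059
sd_r = cov[1, 1]**0.5                        # about 0.17
corr = cov[0, 1] / (sd_beta * sd_r)          # about -0.99
print(sd_beta, sd_r, corr)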
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 901 30.65. A. For a Negative Binomial Distribution, f(0) = (1+β)-r.
∂((1+β)^-r)/∂β = -r(1+β)^-(r+1). ∂((1+β)^-r)/∂r = -ln(1+β)(1+β)^-r.
Thus the (transpose of the) gradient vector is: (-r(1+β)^-(r+1), -ln(1+β)(1+β)^-r) = (-(1.76)(1.0602^-2.76), -ln(1.0602)(1.0602^-1.76)) = (-1.4978, -0.05274).
(transpose of gradient vector)(Inverse of the information matrix)(gradient vector) =
(-1.4978, -0.05274) [0.000035132, -0.0010165; -0.0010165, 0.029717] (-1.4978, -0.05274)′ = 0.00000088.
Thus the standard deviation of the estimate is: √0.00000088 = 0.00094.
The point estimate of the chance of zero claims is: (1+β)-r = 1.0602 -1.76 = 0.9022. Thus an interval estimate is: 0.9022 ± (1.96)(0.00094) = 0.902 ± 0.002. Comment: Although it does not affect this case, since we are estimating a probability, we should restrict ourselves to [0,1]. So for example if the result were instead 0.90 ± 0.16, then the confidence interval would be [0.74, 1].
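A quick Python check of 30.65 (a sketch of mine, not part of the original solution), using the covariance matrix obtained just above:

import numpy as np

beta, r = 0.06020, 1.7600
cov = np.array([[0.000035132, -0.0010165],
                [-0.0010165, 0.029717]])

f0 = (1 + beta)**-r
grad = np.array([-r * (1 + beta)**-(r + 1),              # d f(0) / d beta
                 -np.log(1 + beta) * (1 + beta)**-r])    # d f(0) / d r

var_f0 = grad @ cov @ grad
print(f0, f0 - 1.96 * var_f0**0.5, f0 + 1.96 * var_f0**0.5)   # about 0.902 +/- 0.002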
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 902
30.66. & 30.67. X is LogNormal with parameters µ and σ = 1.5. Therefore, ln(X) is Normal with parameters µ and σ = 1.5. The sum of independent and identically distributed variables has a sum of the means and variances. The sum of independent Normals is Normal. Therefore, Σ ln(xi) is Normal with parameters nµ and σ = 1.5√n. Therefore, α = Σ ln(xi)/n is Normal with parameters µ and σ = 1.5/√n.
Therefore, β^ = exp[α] is LogNormal with parameters µ and σ = 1.5/√n.
E[β^] = exp[µ + (1.5/√n)²/2] = exp[µ + 1.125/n]. E[X] = exp[µ + (1.5)²/2] = exp[µ + 1.125].
Therefore, the bias of this estimator of the mean is: exp[µ + 1.125/n] - exp[µ + 1.125] = exp[µ] (exp[1.125]^(1/n) - exp[1.125]).
E[β^²] = exp[2µ + 2(1.5/√n)²] = exp[2µ + 4.5/n].
Var[β^] = exp[2µ + 4.5/n] - exp[µ + 1.125/n]² = exp[2µ] (exp[4.5]^(1/n) - exp[2.25]^(1/n)).
MSE[β^] = Var[β^] + Bias[β^]² = exp[2µ] (exp[4.5]^(1/n) - exp[2.25]^(1/n)) + exp[2µ] (exp[1.125]^(1/n) - exp[1.125])² = exp[2µ] (exp[4.5]^(1/n) + exp[2.25] - 2 exp[1.125]^(1 + 1/n)).
Comment: If for example n = 10, then the bias is: exp[µ] (exp[1.125]^(1/10) - exp[1.125]) = -1.961 e^µ. For n = 10, the MSE is: exp[2µ] (exp[4.5]^(1/10) + exp[2.25] - 2 exp[1.125]^1.1) = 4.162 e^(2µ).
30.68. β^ = e^α. ∂e^α/∂α = e^α. ln(X) is Normal with parameters µ and σ = 1.5. Therefore, α = Σ ln(xi)/n is Normal with parameters µ and σ = 1.5/√n. Var[α] = 2.25/n. Using the delta method,
Var[β^] ≅ Var[α] (∂e^α/∂α)² = 2.25 e^(2α)/n = 2.25 β^²/n = 2.25 exp[2Σ ln(xi)/n]/n.
Comment: The estimate using the delta method is not equal to the actual Var[β^] computed in the previous solution. The delta method is an approximation.
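Because β^ = exp[α] is itself LogNormal here, the exact variance and the delta-method approximation can be compared directly. The simulation sketch below is mine, not part of the original solution, and uses the illustrative (not given) choices µ = 0 and n = 10; the delta-method figure is evaluated at α = µ for the comparison.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 1.5, 10, 100_000

alpha = rng.normal(mu, sigma / np.sqrt(n), size=trials)   # distribution of the log-mean
beta_hat = np.exp(alpha)

exact_var = np.exp(2 * mu) * (np.exp(4.5 / n) - np.exp(2.25 / n))   # formula from solution 30.67
delta_var = 2.25 * np.exp(2 * mu) / n                               # delta method evaluated at alpha = mu
print(beta_hat.var(), exact_var, delta_var)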
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 903 30.69. This is equivalent to fitting a Normal Distribution via Maximum Likelihood to the logs of the data. This is in turn the same as fitting a Normal Distribution via Method of Moments to the logs of the data. {ln(1.814) + ln(1.790) + ln(1.771) + ln(1.789) + ln(1.768) + ln(1.761) + ln(1.751) + ln(1.735) + ln(1.720)}/9 = 0.568911. {ln(1.814)2 + ln(1.790)2 + ln(1.771)2 + ln(1.789)2 + ln(1.768)2 + ln(1.761)2 + ln(1.751)2 + ln(1.735)2 + ln(1.720)2 }/9 = 0.323901. µ = 0.5689. σ2 = 0.323901 - 0.5689112 = 0.0002413. σ = 0.0155. Comment: Data for private passenger automobile liability paid losses, taken from Table 1 in “The Path of the Ultimate Loss Ratio Estimate,” by Michael G. Wacek, Variance, Fall 2007. These observed loss development factors are the paid losses 24 months after the beginning of an accident year divided by the paid losses 12 months after the beginning of an accident year. 30.70. mean = exp[0.5689 + 0.01552 /2] = 1.767.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 904 30.71. f(x) = exp[-.5 ({ln(x)−µ} /σ)2] / {xσ 2 π }. ln f(x) = -.5 ({ln(x)−µ} /σ)2 - lnx - lnσ - ln(2π)/2.
∂ ln f(x) / ∂µ = -µ/σ2. ∂2 ln f(x) / ∂µ2 = -1/σ2. ∂2 ln f(x) / ∂σ ∂µ = −2{(lnx)−µ}/σ3. ∂ ln f(x) / ∂σ = {lnx − µ}2 /σ3 - 1/σ. ∂2 ln f(x) / ∂σ2 = 3{lnx − µ}2 /σ4 + 1/σ2. E[lnx] is the mean of the Normal underlying the LogNormal, or µ. The off-diagonal elements of the Information Matrix are: -n E[∂2 ln f(x) / ∂σ ∂µ] = -n E[−2{(lnx)−µ}/σ3] = -2n {E[lnx]−µ} /σ3 = -2n {µ−µ} /σ3 = 0. The 1,1 element of the Information Matrix is: -n E[∂2 ln f(x) / ∂µ2 ] = -n E[−1/σ2] = n /σ2. E[{lnx − µ}2 ] is the variance of the Normal underlying the LogNormal, or σ2. The 2,2 element of the Information Matrix is: -n E[∂2 ln f(x) / ∂σ2 ] = -n E[(σ2 - 3{lnx − µ}2 )/σ4] = -n {σ2 −3σ2} /σ4 = 2n /σ2. ⎛n / σ2 0 ⎞ Thus the Information Matrix is: ⎜ ⎟. 2n / σ 2⎠ ⎝ 0 0 ⎞ ⎛σ 2 / n The inverse of the Information Matrix is: ⎜ . 2 σ / (2n)⎟⎠ ⎝ 0 ⎛.0000267 ⎞ 0 For n = 9 and σ = 0.0155, the Covariance Matrix is: ⎜ ⎟. 0 .0000133⎠ ⎝ The function of the parameters in this case is the mean: h(µ , σ) = exp(µ + σ2/2).
∂h / ∂µ = exp(µ + σ2/2) = exp[0.5689 + 0.01552 /2] = 1.767. ∂h / ∂σ = σ exp(µ + σ2/2) = (0.0155) exp[0.5689 + 0.01552 /2] = 0.027. ⎛1.767⎞ Thus the gradient vector is: ⎜ ⎟. ⎝0.027⎠ ⎛.0000267 ⎞ 0 The variance of the estimate is: (1.767, 0.027) ⎜ ⎟ 0 .0000133⎠ ⎝ 95% confidence interval for the mean is: 1.767 ± (1.960)
⎛1.767⎞ ⎜ ⎟ = 0.000834. ⎝0.027⎠
0.000834 = 1.767 ± 0.018.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 905 30.72. C. The first step is to compute the gradient vector, whose entries are the partial derivatives of the function in question, which in this case happens to be the Distribution Function. For the Weibull, F(x) = 1 - exp[(x/θ)τ], and the partial derivatives of the Distribution Function are:
∂F(x) / ∂θ = −τ(x/θ)τ exp[-(x/θ)τ] /θ, ∂F(x) / ∂τ = (x/θ)τ exp[-(x/θ)τ] ln(x/θ). At x = 100, θ = 20, and τ = 0.7, the gradient vector is : (-0.00494 , 0.227). The variance of the estimate value of the Distribution Function is: ( transpose of gradient vector) (Inverse of the information matrix) (gradient vector) = ⎛ 7.3 ⎛-0.00494 ⎞ 0.021 ⎞ ⎛-0.00494 ⎞ (-0.00494, 0.227) ⎜ ⎟⎜ ⎟ = (-0.0313, -0.0000697) ⎜ ⎟ = 0.000139. ⎝0.021 0.00015 ⎠ ⎝ 0.227 ⎠ ⎝ 0.227 ⎠ 30.73. A. For the Pareto Distribution, VaRp [X] = θ {(1-p)-1/α - 1}. Q 0.25 = VaR0.25[X] = θ {(0.75)-1/α - 1}. Q 0.75 = VaR0.75[X] = θ {(0.25)-1/α - 1}. Interquartile range = Q0.75 - Q0.25 = θ {(0.25)-1/α - (0.75)-1/α} = θ {41/α - (4/3)1/α} . Therefore, the function of interest is: h(α, θ) = θ {41/α - (4/3)1/α}. ∂h θ = θ {(-1/α2)ln(4)41/α - (-1/α2)ln(4/3)(4/3)1/α} = 2 {ln(4/3)(4/3)1/α - ln(4)41/α}. ∂α α ∂h = 41/α - (4/3)1/α. ∂θ Plugging in the fitted parameters of 2.3 and 97, the first component of the gradient vector is: (97 / 2.32 ) {ln(4/3)(4/3)1/2.3 - ln(4)41/2.3} = -40.467. Plugging in the fitted parameters of 2.3 and 97, the second component of the gradient vector is: 41/2.3 - (4/3)1/2.3 = 0.6939. Thus using the Delta Method, the variance of the estimate of the interquartile range is: ⎛ 0.14 8.2⎞ ⎛ -40.467⎞ ⎛ -40.467⎞ (-40.467, 0.6939) ⎜ = (0.0246, 21.366) ⎜ ⎟ ⎜ ⎟ ⎟ = 13.83. ⎝ 8.2 509⎠ ⎝ 0.6939 ⎠ ⎝ 0.6939 ⎠ Comment: The point estimate of the interquartile range is: θ {41/α - (4/3)1/α} = (97){41/2.3 - (4/3)1/2.3} = 67.3. Thus an approximate 95% confidence interval for the interquartile range is 67.3 ± 1.960
13.83 = 67.3 ± 7.3.
30.74. E. Let h(k;θ) = e-k/θ. Then ∂h / ∂θ = (k/θ2) e-k/θ. The variance of the estimate of a function of the single parameter θ is: (∂h / ∂θ)2 Var[θ] = (k2 /θ4) e-2k/θ Var(θ).
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 906 30.75. E. Since for ungrouped data for the exponential distribution the Method of Maximum Likelihood is the same as the Method of Moments, the fitted θ is 12500/100 = 125. f(x) = e-x/θ / θ. ln f(x) = -x/θ - ln(θ). ∂ ln f(x) / ∂θ = x / θ2 - 1/ θ . ∂2 ln f(x) / ∂θ2 = -2x / θ3 + 1/ θ2 . E [∂2 ln f(x) / ∂θ2] = -2E[x]/ θ3 + 1/ θ2 = -2θ/ θ3 + 1/ θ2 = -1/ θ2 . Var[θ] ≅ -1 / {n E [∂2 ln f(x) / ∂θ2]} = θ2 / n. The quantity of interest is the survival function: h(θ) = e-x/θ. ∂h / ∂θ = e-x/θ x/θ2. Var[h] ≅ -(∂h / ∂θ)2 / {n E[∂2 ln f / ∂θ2 ]} = (∂h / ∂θ)2 Var[θ] = (e-x/θ x/θ2)2 θ2 / n = e-2x/θ x2 / {θ2n} = e-2(250)/125 2502 / {1252 100} = e-4 / 25. Thus the standard deviation of the estimated survival function is e-2 / 5 = .027. Thus an approximate 95% confidence interval is ± (1.96)(.027) = ± .053, which has width 0.106. Comment: The estimated survival function at 250 is thus: S(250) = 0.135 ± 0.053.
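A short numerical check of 30.75 (not part of the original solution): the fitted theta is 125, and the delta method gives the width of a 95% confidence interval for S(250).

import math

n, theta = 100, 12_500 / 100
var_theta = theta**2 / n

s250 = math.exp(-250 / theta)
deriv = math.exp(-250 / theta) * 250 / theta**2      # dS(250)/d theta
sd = abs(deriv) * math.sqrt(var_theta)
print(s250, 2 * 1.96 * sd)                            # about 0.135 and 0.106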
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 907 30.76. D. The function of the parameters in this case is the mean: h(µ , σ) = exp(µ + σ2/2). ∂h = exp(µ + σ2/2) = exp(4.215 + (1.0932 )/2) = e4.812 = 123.0. ∂µ ∂h = σ exp(µ + σ2/2) = (1.093) exp(4.215 + (1.0932 )/2) = 134.5. ∂σ Thus the gradient vector is: (123.0, 134.5). Using the given inverse of the information matrix, the variance of the estimate is: ⎛0.1195 ⎛123.0⎞ 0 ⎞ ⎛123.0⎞ (123.0, 134.5) ⎜ ⎟ ⎜ ⎟ = (14.699, 8.030) ⎜ ⎟ = 1808 + 1080 = 2888. 0.0597⎠ ⎝134.5⎠ ⎝ 0 ⎝134.5⎠ Comment: Similar to Example 15.13 in Loss Models. A Maximum Likelihood fit to a LogNormal is equivalent to fitting via the Method of Moments a Normal Distribution to the log claim sizes. Since for the fitted Normal, µ and σ are independent, this is true for the LogNormal as well. Thus the off-diagonal elements of either the Information Matrix or its inverse are zero. Alternately, the off-diagonal elements of the Information Matrix are: -n E[∂2 ln f(x) / ∂σ ∂µ] = -n E[−2{(lnx)−µ}/σ3] = (2n) {E[lnx]−µ} /σ3 = (2n) {µ−µ} /σ3 = 0. Where E[lnx] is the mean of the Normal underlying the LogNormal, or µ. The 1,1 element of the Information Matrix is: -n E[∂2 ln f(x) / ∂µ2 ] = -n E[−1/σ2] = n /σ2. The 2,2 element of the Information Matrix is: -n E[∂2 ln f(x) / ∂σ2 ] = -n E[(σ2 -3{lnx − µ}2 )/σ4] = -n {σ2 −3σ2} /σ4 = 2n /σ2. Where E[{lnx − µ}2 ] is the variance of the Normal underlying the LogNormal, or σ2. ⎛n / σ2 0 ⎞ Thus the Information Matrix is: ⎜ ⎟. 2n / σ 2⎠ ⎝ 0 0 ⎞ ⎛σ 2 / n The inverse of the Information Matrix is: ⎜ . σ2 / (2n)⎟⎠ ⎝ 0 Given the fitted σ = 1.093, if the number of data points were equal to ten (n =10), then one would match the given inverse of the Information Matrix.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 908 ^ 30.77. B. We want to estimate the covariance of α^ and In( λ ).
Therefore, let g = α = 1/σ, and h = In(λ) = −µ/σ.
(∂g/ ∂µ, ∂g/ ∂σ) = (0, −1/σ2) = (0, -1/1.392 ) = (0, -0.518). (∂h/ ∂µ, ∂h/ ∂σ) = (−1/σ, µ/σ2) = (-1/1.39, 4.13/1.392 ) = (-0.719, 2.138). Using the given variance-covariance matrix of µ^ and σ^ , the covariance of g and h is: ⎛0.075 0.016⎞ ⎛-0.719⎞ (0, -0.518) ⎜ ⎟ ⎜ ⎟ = (-0.008288, -0.02486) ⎝0.016 0.048⎠ ⎝ 2.138 ⎠
⎛-0.719⎞ ⎜ ⎟ = -0.047. ⎝ 2.138 ⎠
Comment: One can reverse the roles of g and h, without affecting the answer. The two functions g and h are whatever they want the covariance of, in this case α and ln(λ). We want to get α and ln(λ) in terms of µ and σ, since these are the fitted parameters whose covariance matrix we are given. To do so, look in the question where it says "the parameters satisfy the relations: λ = exp(−µ/σ) and α = 1/σ." The Survival Analysis part of this question is no longer on the Syllabus, but one does not need to understand it in order to answer the question asked.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 909 30.78. C. h(α, β) = Y10 = α + 10β. ∂h/∂α = 1 and ∂h/ ∂β = 10. Therefore, the variance of the forecast is: ⎛ 0.00055 -0.00010 ⎞ ⎛ 1 ⎞ (1 10) ⎜ ⎟ ⎜ ⎟ = 0.00055. ⎝-0.00010 0.00002 ⎠ ⎝10 ⎠ The standard deviation of the forecast is:
0.00055 = 0.0235.
Alternately, Var[Y10] = Var[α + 10β] = Var[α] + Var[10β] + 2Cov[α, 10β] = Var[α] + 100Var[β] + 20Cov[α, β]
= 0.00055 + (100)(0.00002) + (20)(-0.0001) = 0.00055. √0.00055 = 0.0235.
Comment: The forecast is: 0.50 + (10)(0.02) = 0.70. Since in this case the gradient vector of (1, 10) does not depend on the values of the fitted α and β, one could solve the problem without them. Note that the variance of the forecast calculated here is similar to that in equation 8.12 in Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld, not on the syllabus. However, equation 8.12 includes a term of σ², the variance of ε, the error term in the model. This term takes into account the additional forecast error due to the random fluctuation of the loss ratio for year 10 around its expected value. The delta method only measures the error in forecasting the expected value, not the error due to random fluctuation of the future observation around its expected value. The delta method measures the effect of the random fluctuations contained in the observations used to fit the model, not the random fluctuations of future observations.
30.79. D. For the Exponential, maximum likelihood is equal to the method of moments. θ^ = 6.
The sum of two independent, identically distributed Exponentials is a Gamma Distribution, with α = 2 and θ = 6. This Gamma has density xe^(-x/6)/36.
P(average > 10) = P(sum > 20) = ∫ from 20 to ∞ of xe^(-x/6)/36 dx = [-xe^(-x/6)/6 - e^(-x/6)] evaluated from x = 20 to x = ∞ = e^(-20/6)(20/6 + 1) = 0.1546.
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 910
30.80. A. Var[θ^] = Var[X̄] = Var[X]/n = θ²/2 = 6²/2 = 18.
From the previous solution, the sum follows a Gamma Distribution with α = 2, f(x) = xe^(-x/θ)/θ².
P(average > 10) = P(sum > 20) = ∫ from 20 to ∞ of xe^(-x/θ)/θ² dx = [-xe^(-x/θ)/θ - e^(-x/θ)] evaluated from x = 20 to x = ∞ = e^(-20/θ)(20/θ + 1).
Thus the quantity of interest as a function of the estimated theta is: e^(-20/θ)(20/θ + 1).
h(θ) = FY(10) = 1 - Prob(average > 10) = 1 - e^(-20/θ)(20/θ + 1).
hʼ(θ) = -{e^(-20/θ)(20/θ + 1)(20/θ²) + e^(-20/θ)(-20/θ²)} = -e^(-20/θ)(400/θ³).
hʼ(6) = -e^(-20/6)(400/216) = -0.06606.
Applying the one dimensional delta method, Var[h(θ)] ≅ (∂h/∂θ)² Var[θ^].
Var[FY(10)] ≅ (-0.06606)²(18) = 0.07855.
Comment: “The maximum likelihood estimator of FY(10)” is the value of FY(10) = Prob(average ≤ 10) for an Exponential Distribution with the fitted theta. In general for a Gamma Distribution, F(x) = Γ[α; x/θ].
Γ[n; x] = 1 - Σ from i = 0 to n-1 of xⁱ e⁻ˣ / i!. ⇒ Γ[2; x] = 1 - e⁻ˣ - xe⁻ˣ.
⇒ Γ[2; x/θ] = 1 - e^(-x/θ) - (x/θ)e^(-x/θ). ⇒ Γ[2; 20/θ] = 1 - e^(-20/θ)(20/θ + 1).
⇒ FY(10) = Prob(average ≤ 10) = Prob(sum ≤ 20) = Γ[2; 20/θ] = 1 - e^(-20/θ)(20/θ + 1).
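The one dimensional delta method used above is easy to check numerically. Here is a minimal sketch in Python (not part of the original guide), reproducing the calculation for the fitted θ = 6 with n = 2 observations; the derivative is taken by finite differences rather than by hand.

import numpy as np

theta_hat, n = 6.0, 2
var_theta = theta_hat**2 / n                      # Var[theta^] = theta^2 / n = 18

def h(theta):
    # F_Y(10) = Prob(average of the two claims <= 10) = Gamma[2; 20/theta]
    return 1.0 - np.exp(-20.0 / theta) * (20.0 / theta + 1.0)

eps = 1e-6
h_prime = (h(theta_hat + eps) - h(theta_hat - eps)) / (2 * eps)   # about -0.06606
print(h_prime**2 * var_theta)                                     # about 0.0786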
2013-4-6, Fitting Loss Distributions §30 Variances of Functions, HCM 10/15/12, Page 911
30.81. C. F(5000) = Φ[(ln(5000) - 6.84)/1.49] = Φ[1.13] = 0.8708.
The quantity of interest is h(µ, σ) = F(5000) = Φ[z], where z = (ln(5000) - µ)/σ.
Using the chain rule, ∂h/∂µ = dΦ[z]/dz ∂z/∂µ = φ[z](-1/σ) = -φ[z]/σ.
∂h/∂σ = φ[z] ∂z/∂σ = -φ[z](ln(5000) - µ)/σ² = -φ[z] z/σ.
At the maximum likelihood estimates, z = 1.13 and: φ[1.13] = exp[-1.13²/2]/√(2π) = 0.2107.
∂h/∂µ = -0.2107/1.49 = -0.1414. ∂h/∂σ = -(0.2107)(1.13)/1.49 = -0.1598.
Therefore, the gradient vector is: (-0.1414, -0.1598).
Therefore, using the delta method, the variance of the estimate of F(5000) is:
(-0.1414, -0.1598) [[0.0444, 0], [0, 0.0222]] (-0.1414, -0.1598)ᵀ = (0.1414²)(0.0444) + (0.1598²)(0.0222) = 0.001454.
Standard Deviation is: √0.001454 = 0.0381.
95% confidence interval for F(5000) is: 0.8708 ± (1.96)(0.0381) = 0.796 to 0.945.
Comment: Difficult! φ[z] = exp[-z²/2]/√(2π) is the density of the Standard Normal Distribution, as shown on the first real page of the tables attached to the exam. No explicit use is made of the fact that the sample size was 50.
However, the information matrix for a LogNormal Distribution is: [[n/σ², 0], [0, 2n/σ²]],
and the inverse of the Information Matrix, the Covariance Matrix, is: [[σ²/n, 0], [0, σ²/(2n)]].
For n = 50 and σ = 1.49, one matches the given Covariance Matrix.
30.82. A. θ^ = X̄ = (100 + 200 + 400 + 800 + 1400 + 3100)/6 = 1000.
Var[θ^] = Var[X̄] = Var[X]/N = θ²/6 = 1000²/6.
S(1500) = e^(-1500/θ). ∂S(1500)/∂θ = (1500/θ²) e^(-1500/θ).
Evaluated at θ = 1000, ∂S(1500)/∂θ is: (1500/1000²) e^(-1.5).
Var[S^(1500)] = {(1500/1000²) e^(-1.5)}² (1000²/6) = 1.5² e⁻³/6 = 0.0187.
Comment: For the Exponential, method of moments is equal to maximum likelihood.
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 912
Section 31, Non-Normal Confidence Intervals390
The basis of the likelihood ratio test was that as the sample size approaches infinity, the distribution of the likelihood ratio statistic approaches a Chi-Square Distribution with number of degrees of freedom equal to the difference in the number of parameters:
(2){(maximum loglikelihood unrestricted) - (maximum loglikelihood restricted)} has approximately a Chi-Square Distribution.
For example, let us assume we have 10 claims totaling 2000. Let us fit an Exponential Distribution, f(x) = exp[-x/θ]/θ.
The loglikelihood is: Σ from i = 1 to n of ln[f(xi)] = Σ from i = 1 to n of {-xi/θ - ln[θ]} = -2000/θ - 10 ln[θ].
Then fitting an Exponential via maximum likelihood, θ^ = 200.
The maximum loglikelihood is: -2000/200 - 10 ln[200] = -62.9832.
Exercise: Use the likelihood ratio test, to test H0: θ = 100 versus H1: θ ≠ 100.
[Solution: The loglikelihood corresponding to θ = 100 is: -2000/100 - 10 ln[100] = -66.0517.
The likelihood ratio test statistic is: (2){-62.9832 - (-66.0517)} = 6.137.
For one fitted parameter, we look at the one degree of freedom row of the Chi-Square Table.
S(5.024) = 2.5% and S(6.635) = 1%. We reject the null hypothesis at 2.5%, but not at 1%.]
200 was a point estimate of θ. We can get an interval estimate for θ, by looking at those values of θ that produce large values of the loglikelihood function.391
For example, if we wanted a 90% confidence interval, we would use the 10% critical value from the Chi-Square Table. For one degree of freedom, the survival function at 2.706 is 10%.
Thus we would want those values of θ such that: Likelihood Ratio Test Statistic ≤ 2.706.
In other words, we want: (2){-62.9832 - (-2000/θ - 10 ln[θ])} ≤ 2.706.
-2000/θ - 10 ln[θ] ≥ -62.9832 - 2.706/2 = -64.34.
390
See Section 15.4 of Loss Models. The Likelihood Ratio Test has a critical region (rejection region) of the form: unrestricted maximum loglikelihood minus restricted maximum loglikelihood is large. Here our confidence interval is those values of θ where we would not reject the null hypothesis. In general, confidence intervals and hypothesis tests are closely connected. 391
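To make the likelihood ratio test in the exercise above concrete, here is a minimal sketch in Python (not part of the original guide). The sample size of 10 and the claim total of 2000 are taken from the running example; scipy is assumed to be available.

import numpy as np
from scipy.stats import chi2

n, total = 10, 2000.0                     # 10 claims totaling 2000

def loglik(theta):
    # Exponential loglikelihood: -total/theta - n*ln(theta)
    return -total / theta - n * np.log(theta)

theta_mle = total / n                     # maximum likelihood estimate = sample mean = 200
lr_stat = 2 * (loglik(theta_mle) - loglik(100.0))   # test H0: theta = 100
p_value = chi2.sf(lr_stat, df=1)          # 1 degree of freedom (one restricted parameter)
print(lr_stat, p_value)                   # statistic about 6.1, p-value about 0.013:
                                          # reject at 2.5%, but not at 1%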
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 913
We want those θ such that the loglikelihood is greater than or equal to -64.34.392
Here is a graph of the loglikelihood as a function of θ: -2000/θ - 10 ln[θ].
[Graph of the loglikelihood -2000/θ - 10 ln[θ], for θ from 150 to 400; the loglikelihood ranges from about -65.5 up to its maximum of about -63 near θ = 200.]
One can solve numerically, and the loglikelihood is equal to -64.34 for: θ = 123 and θ = 354.
Thus a 90% confidence interval for θ is: [123, 354].
In general, using the likelihood function, a confidence interval covering probability P consists of those values of the parameter(s) such that:
loglikelihood ≥ (maximum loglikelihood) - (Pth percentile of the Chi-Square Dist.)/2,
where the Chi-Square Distribution has number of degrees of freedom equal to the number of fitted parameters.
Exercise: In the above example, write down the equation for the form of a 95% confidence interval for θ.
[Solution: The maximum loglikelihood is -62.9832. For one degree of freedom, the Chi-Square Distribution is 95% at 3.841. The loglikelihood function is: -2000/θ - 10 ln[θ].
The confidence interval is those theta such that: -2000/θ - 10 ln[θ] ≥ -62.9832 - 3.841/2 = -64.904.
Comment: Solving numerically, the 95% confidence interval is: [114, 399].]
392
We include those values of theta with loglikelihoods close to the maximum loglikelihood. We exclude those values of theta with loglikelihoods much smaller than the maximum loglikelihood.
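The numerical solution quoted above can be reproduced with a root finder. The following is a minimal sketch in Python (not from the original text); the claim total of 2000 and n = 10 come from the running example, and scipy is assumed to be available.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

n, total = 10, 2000.0

def loglik(theta):
    return -total / theta - n * np.log(theta)

theta_mle = total / n                                 # 200
for p in (0.90, 0.95):
    target = loglik(theta_mle) - chi2.ppf(p, df=1) / 2.0
    g = lambda theta: loglik(theta) - target          # zero at the interval endpoints
    lower = brentq(g, 1.0, theta_mle)                 # root below the MLE
    upper = brentq(g, theta_mle, 5000.0)              # root above the MLE
    print(p, round(lower), round(upper))              # about (123, 354) and (114, 399), as in the text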
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 914 The confidence interval [114, 399] is not symmetric around the point estimate of θ = 200. This differs from the confidence interval one would get by using the Normal Approximation.393 Exercise: In the above example, use the Normal Approximation to get an approximate 95% confidence interval for θ. [Solution: θ^ = X . Var[ θ^] = Var[ X ] = Var[X]/n = θ2/n = 2002 /10 = 4000. Approximate 95% confidence interval for θ is: 200 ± 1.960
√4000 = 200 ± 124 = [76, 324]. ]
A Two Parameter Example:394 As discussed in a previous section, the maximum likelihood fit of a Pareto Distribution to the Ungrouped Data in Section 2 is α = 1.702, and θ = 240,151.395
For the Pareto Distribution: f(x) = αθ^α / (θ + x)^(α+1), and ln[f(x)] = ln(α) + α ln(θ) - (α+1) ln(θ + x).
Therefore, the loglikelihood is: n ln(α) + n α ln(θ) - (α+1) Σ ln(θ + xi).
For the ungrouped data in Section 2, n = 130, and Σ ln(240,151 + xi) = 1686.9743.
Therefore, the maximum loglikelihood is: 130 ln(1.702) + (130)(1.702) ln(240,151) - (2.702)(1686.974) = -1747.875.
In order to get a 95% confidence area for the parameters of the Pareto, since we have fit two parameters we need to look at the row of the Chi-Square Table for two degrees of freedom. F(5.991) = 95%.
Thus the 95% confidence area consists of those sets of alpha and theta such that:
n ln(α) + n α ln(θ) - (α+1) Σ ln(θ + xi) = loglikelihood ≥ -1747.875 - 5.991/2 = -1750.87.
393
Unless specifically told to do otherwise, on the exam to get a confidence interval, one would use the Normal Approximation, rather than the technique currently being discussed which is based on the Likelihood Ratio Test. 394 Similar to Example 15.15 in Loss Models. 395 One needs a computer to fit this two parameter distribution.
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 915
We want alpha and theta such that: n ln(α) + n α ln(θ) - (α+1) Σ ln(θ + xi) ≥ -1750.87.
Using a computer, this approximate 95% confidence area is:396
[Graph of the approximate 95% confidence area in the (alpha, theta) plane, covering roughly alpha from 1.0 to 3.0 and theta from 200,000 to 600,000.]
This differs somewhat from the 95% confidence ellipse shown in a previous section based on the covariance matrix and the Bivariate Normal Approximation:
[Graph of the 95% confidence ellipse in the (alpha, theta) plane, covering roughly alpha from 1 to 2.5 and theta from 100,000 to 400,000.]
396 Similar in concept to Figure 15.1 in Loss Models.
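In practice one can trace out such a two-parameter confidence region by evaluating the loglikelihood on a grid. Below is a minimal sketch in Python (not from the original text). The actual 130 ungrouped losses of Section 2 are not reproduced here, so for illustration the data are simulated from a Pareto; only the method, not the resulting region, matches the figure above.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Illustrative data only: simulate 130 losses from a Pareto (alpha = 1.7, theta = 240,000)
# as a stand-in for the ungrouped data of Section 2.
alpha0, theta0 = 1.7, 240e3
losses = theta0 * ((1.0 - rng.random(130)) ** (-1.0 / alpha0) - 1.0)
n = len(losses)

def negloglik(params):
    a, t = params
    if a <= 0 or t <= 0:
        return np.inf
    # Pareto loglikelihood: n ln(a) + n a ln(t) - (a+1) sum ln(t + x_i)
    return -(n * np.log(a) + n * a * np.log(t) - (a + 1.0) * np.log(t + losses).sum())

fit = minimize(negloglik, x0=[2.0, 300e3], method="Nelder-Mead")
cutoff = -fit.fun - 5.991 / 2.0                 # 95% cutoff: chi-square, 2 degrees of freedom
region = [(a, t)
          for a in np.linspace(0.5, 4.0, 120)
          for t in np.linspace(50e3, 900e3, 120)
          if -negloglik((a, t)) >= cutoff]      # grid points inside the approximate 95% area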
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 916 Problems: 31.1 (6 points) Losses are assumed to follow an Exponential Distribution with mean θ. You observe 1000 losses totaling 750,000. You wish to construct a Non-Normal 99% Confidence Interval for the fitted theta using the loglikelihood function. (a) (2 points) Determine an equation defining that 99% confidence interval. (b) (3 points) With the aid of a computer, determine the confidence interval. (c) (1 points) Determine a 99% confidence interval for the loss elimination ratio at 500. 31.2 (4 points) Claims are assumed to follow a Pareto distribution with parameters θ = 100 and α. A random sample of three claims is given below: 50 80 140 Which of the following equations defines a Non-Normal 95% Confidence Interval for alpha? (A) 3 ln(α) - 1.8687 α + 3.5032 ≥ 0. (B) 3 ln(α) - 0.5798 α + 4.0947 ≥ 0. (C) 3 ln(α) - 1.8687 α + 3.5032 ≥ 0. (D) 3 ln(α) - 0.5798 α + 4.0947 ≥ 0. (E) None of A, B, C, or D 31.3 (4 points) We draw a random sample of size N from a Normal Distribution with σ = 5. Let X be the sample mean and S2 be the sample variance. Which of the following equations defines a Non-Normal 90% Confidence Interval for µ? (A) |µ - X | ≥ 1.353/ N (B) |µ - X | ≥ 2.706/ N (C) (µ - X )2 - 1.353S2 ≥ 0 (D) (µ - X )2 - 2.706S2 ≥ 0 (E) None of A, B, C, or D
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 917 31.4 (3 points) Frequency is Poisson with mean λ. Out of 100 insureds, there are: 90 with no claims, 6 with one claim, 3 with two claims, and 1 with three claims. Which of the following equations defines a Non-Normal 95% Confidence Interval for λ? (A) 15ln(λ) - 100λ + 45.3773 ≥ 0. (B) 15ln(λ) - 100λ + 47.2978 ≥ 0. (C) 15ln(λ) + 100/λ - 45.3773 ≤ 0. (D) 15ln(λ) + 100/λ - 47.2978 ≤ 0. (E) None of A, B, C, or D 31.5 (5 points) Sizes of loss are distributed via a Weibull Distribution. A Weibull was fit via maximum likelihood to 10,000 losses: θ^ = 499.729, and ^τ = 3.03627. Let xi be the sizes of loss, then for the observed sample of size 10,000:
Σxi = 4,464,232, Σln[xi] = 60250.61967, and Σxi^3.03627 = 1,563,457,374,484.
Determine the set of thetas and taus that define a Non-Normal 90% Confidence Interval for the parameters of the Weibull Distribution from which this data was drawn.
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 918
Solutions to Problems:
31.1. (a) f(x) = exp[-x/θ]/θ.
The loglikelihood is: Σ from i = 1 to n of ln[f(xi)] = Σ from i = 1 to n of {-xi/θ - ln[θ]} = -750,000/θ - 1000 ln[θ].
Fitting an Exponential via maximum likelihood, θ^ = 750.
The maximum loglikelihood is: -750,000/750 - 1000 ln[750] = -7620.0732.
We want a 99% confidence interval. For the Chi-Square Distribution with one degree of freedom, S(6.635) = 1%.
Thus we would want those values of θ such that: loglikelihood ≥ (maximum loglikelihood) - (Pth percentile of the Chi-Square Dist.)/2.
-750,000/θ - 1000 ln[θ] ≥ -7620.0732 - 6.635/2 = -7623.3907. 750/θ + ln[θ] ≤ 7.6233907.
(b) The endpoints are where: 750/θ + ln[θ] = 7.6233907.
Using a computer, the solutions are: θ = 692.085 and θ = 814.561.
Thus an approximate 99% confidence interval for theta is: (692, 815).
(c) The Loss Elimination Ratio for an Exponential is: (Limited Expected Value)/Mean = 1 - e^(-x/θ).
Since this is a monotonic function of theta, we can plug in the endpoints of the confidence interval for theta to get a 99% confidence interval for LER(500).
1 - e^(-500/692) = 0.514. 1 - e^(-500/815) = 0.459.
An approximate 99% confidence interval for the loss elimination ratio at 500 is: (45.9%, 51.4%).
Comment: A graph of the loglikelihood as a function of theta:
[Graph of the loglikelihood -750,000/θ - 1000 ln[θ], for θ from 600 to 1000; the loglikelihood ranges from about -7680 up to its maximum of about -7620 near θ = 750.]
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 919
31.2. C. f(x) = αθ^α (θ + x)^-(α+1). Σ ln(f(xi)) = Σ {ln(α) + α ln(θ) - (α+1) ln(θ + xi)}.
The derivative with respect to α is: Σ {(1/α) + ln(θ) - ln(θ + xi)} = N/α - Σ ln{(θ + xi)/θ}.
Setting this derivative equal to zero: 0 = (N/α) - Σ ln{(θ + xi)/θ}. Solving for alpha:
α = N / Σ ln((θ + xi)/θ) = 3 / {ln[(100 + 50)/100] + ln[(100 + 80)/100] + ln[(100 + 140)/100]} = 1.6054.
The corresponding maximum loglikelihood is: 3 ln(α) + 3α ln(100) - (α+1) Σ ln(100 + xi) =
3 ln(1.6054) + (3)(1.6054) ln(100) - (2.6054){ln(150) + ln(180) + ln(240)} = -17.2642.
We want a 95% confidence interval. For the Chi-Square Distribution with one degree of freedom, S(3.841) = 5%.
Thus we want those values of α such that: loglikelihood ≥ (maximum loglikelihood) - (95th percentile of the Chi-Square Dist.)/2.
3 ln(α) + 3α ln(100) - (α+1){ln(150) + ln(180) + ln(240)} ≥ -17.2642 - 3.841/2 = -19.1847.
3 ln(α) - 1.8687 α + 3.5032 ≥ 0.
Comment: A graph of the loglikelihood as a function of alpha:
[Graph of the loglikelihood as a function of alpha, for alpha from 1 to 5; the loglikelihood ranges from about -20.0 up to its maximum of about -17.3 near alpha = 1.6.]
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 920
A graph of 3 ln(α) - 1.8687 α + 3.5032 as a function of alpha:
[Graph of 3 ln(α) - 1.8687 α + 3.5032, for alpha from 1 to 5; the function ranges from about -1.0 to 2.0, and is positive over the middle of that range.]
The 95% confidence interval consists of those values of alpha such that this function is nonnegative; the non-normal 95% confidence interval for alpha is approximately: [0.4, 4.2].
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 921
31.3. E. Maximum likelihood is equal to the method of moments. Thus the maximum likelihood occurs for µ = X̄.
Density is: f(x) = exp[-(x - µ)²/(2σ²)] / (σ√(2π)).
ln f(x) = -(x - µ)²/(2σ²) - ln(σ) - ln(2π)/2 = -(x - µ)²/50 - ln(5) - ln(2π)/2.
Loglikelihood is: -Σ(xi - µ)²/50 - N ln(5) - N ln(2π)/2.
Maximum loglikelihood is: -Σ(xi - X̄)²/50 - N ln(5) - N ln(2π)/2.
We want a 90% confidence interval. For the Chi-Square Distribution with one degree of freedom, S(2.706) = 10%.
Thus we want those values of µ such that: loglikelihood ≥ (maximum loglikelihood) - (90th percentile of the Chi-Square Dist.)/2.
-Σ(xi - µ)²/50 - N ln(5) - N ln(2π)/2 ≥ -Σ(xi - X̄)²/50 - N ln(5) - N ln(2π)/2 - 2.706/2.
⇔ Σ(xi - X̄)²/50 - Σ(xi - µ)²/50 + 1.353 ≥ 0.
⇔ µΣxi/25 - X̄Σxi/25 + N X̄²/50 - N µ²/50 + 1.353 ≥ 0.
⇔ 2µN X̄ - 2N X̄² + N X̄² - N µ² + 67.65 ≥ 0. ⇔ N µ² - 2µN X̄ + N X̄² - 67.65 ≤ 0.
At the boundaries of the confidence interval: N µ² - 2µN X̄ + N X̄² - 67.65 = 0.
µ = {2N X̄ ± √[4N² X̄² - (4)(N)(N X̄² - 67.65)]} / (2N) = X̄ ± 8.225/√N.
The loglikelihood is large for µ near X̄. Thus the 90% confidence interval is: X̄ - 8.225/√N ≤ µ ≤ X̄ + 8.225/√N.
Comment: For a sample from a Normal with known variance, with H0: µ = µ0, the form of the critical region (rejection region) from the Likelihood Ratio Test is: |X̄ - µ0| ≥ c.
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 922
31.4. A. Maximum likelihood is equal to the method of moments. Thus the maximum likelihood occurs for λ = X̄ = 0.15.
Density is: f(x) = λ^x e^(-λ)/x!. ln f(x) = x ln(λ) - λ - ln(x!).
Loglikelihood is: ln(λ) Σxi - Nλ - Σ ln(xi!) = 15 ln(λ) - 100λ - Σ ln(xi!).
Maximum loglikelihood is: 15 ln(0.15) - 15 - Σ ln(xi!) = -43.4568 - Σ ln(xi!).
We want a 95% confidence interval. For the Chi-Square Distribution with one degree of freedom, S(3.841) = 5%.
Thus we want those values of λ such that: loglikelihood ≥ (maximum loglikelihood) - (95th percentile of the Chi-Square Dist.)/2.
15 ln(λ) - 100λ - Σ ln(xi!) ≥ -43.4568 - Σ ln(xi!) - 3.841/2. ⇔ 15 ln(λ) - 100λ + 45.3773 ≥ 0.
Comment: A graph of the loglikelihood as a function of lambda:
[Graph of the loglikelihood as a function of lambda, for lambda from 0.05 to 0.25; the loglikelihood ranges from about -56 up to its maximum of about -47.3 near lambda = 0.15.]
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 923
A graph of 15 ln(λ) - 100λ + 45.3773 as a function of lambda:
[Graph of 15 ln(λ) - 100λ + 45.3773, for lambda from 0.05 to 0.25; the function ranges from about -6 to 2, and is positive over the middle of that range.]
The graph crosses the x-axis at λ = 0.086 and λ = 0.239. Thus an approximate Non-Normal 95% Confidence Interval for λ is: [0.086, 0.239]. The maximum loglikelihood at λ = 0.15 is -47.3280. The loglikelihood at either λ = 0.086 or λ = 0.239 is: -49.2485 = -47.3280 - 3.841/2.
2013-4-6, Fitting Loss Dists. §31 Non-Normal Confid. Inter., HCM 10/15/12, Page 924
31.5. f(x) = τ x^(τ-1) θ^(-τ) exp(-(x/θ)^τ). ln f(x) = ln(τ) + (τ - 1) ln(x) - τ ln(θ) - (x/θ)^τ.
Loglikelihood is: N ln(τ) + (τ - 1) Σ ln(xi) - Nτ ln(θ) - Σxi^τ/θ^τ = 10000 ln(τ) + (τ - 1)(60250.61967) - 10000τ ln(θ) - Σxi^τ/θ^τ.
Maximum loglikelihood is: 10000 ln(3.03627) + (3.03627 - 1)(60250.61967) - (10000)(3.03627) ln(499.729) - Σxi^3.03627/499.729^3.03627
= -54,882.993 - 1,563,457,374,484/156,345,995 = -64,882.977.
We want a 90% confidence interval. For the Chi-Square Distribution with two degrees of freedom, S(4.605) = 10%.
Thus we want those values of θ and τ such that: loglikelihood ≥ (maximum loglikelihood) - (90th percentile of the Chi-Square Dist.)/2.
10000 ln(τ) + (τ - 1)(60250.61967) - 10000τ ln(θ) - Σxi^τ/θ^τ ≥ -64,882.977 - 4.605/2.
⇔ 10000 ln(τ) + τ(60250.61967) - 10000τ ln(θ) - Σxi^τ/θ^τ ≥ -4634.660.
Comment: I simulated 10,000 losses from a Weibull Distribution with θ = 500 and τ = 3. For this data, Σxi³ = 1,241,688,196,543.
Thus it turns out that the loglikelihood for θ = 500 and τ = 3 is -64,884.386.
Thus for H0: θ = 500 and τ = 3, the likelihood ratio statistic is: (2){-64,882.977 - (-64,884.386)} = 2.819, with 2 degrees of freedom.
Since 2.819 < 4.605 we do not reject H0 at 10%; using a computer, the p-value is 24.4%.
For θ = 500 and τ = 3: 10000 ln(τ) + τ(60250.61967) - 10000τ ln(θ) - Σxi^τ/θ^τ
= 10000 ln(3) + (3)(60250.61967) - (10000)(3) ln(500) - 1,241,688,196,543/500³ = -4633.767 > -4634.660.
Thus θ = 500 and τ = 3 is in the Non-Normal 90% Confidence Interval for the parameters of the Weibull Distribution from which this data was drawn; that is why we do not reject H0 at 10%.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 925
Section 32, Minimum Modified Chi-Square397
How to compute the Chi-Square Statistic has been discussed. The Modified Chi-Square is similar. However, the contribution from each interval for the Modified Chi-Square is: (fitted - observed)²/observed.
Note that the denominator is the observed, rather than the fitted or assumed, as in the case of the (unmodified) Chi-Square.
For example for the Weibull distribution with θ = 16,184 and τ = 1.0997 fit to the grouped data in Section 3 by the Method of Maximum Likelihood, hereʼs the calculation of the Modified Chi-Square:398
# claims in the Interval 2208 2247 1701 1220 799 1481 254 57 33 10000
F(lower)
F(upper)
0.00000 0.24028 0.44508 0.60142 0.71696 0.80075 0.96848 0.99548 0.99939
0.24028 0.44508 0.60142 0.71696 0.80075 0.96848 0.99548 0.99939 1.00000
F(upper) minus F(lower) 0.24028 0.20480 0.15634 0.11553 0.08379 0.16774 0.02700 0.00391 0.00061
Fitted # claims 2402.8 2048.0 1563.4 1155.3 837.9 1677.4 270.0 39.1 6.1
(ObservedFitted)^2 /Observed 17.2 17.6 11.1 3.4 1.9 26.0 1.0 5.6 22.0
1.00000
10000
105.9
One can use Modified Chi-Square to fit distributions.399 Smaller Modified Chi-Square indicates a better fit, thus it makes sense to try to minimize the Modified Chi-Square. Using the method of Minimum Modified Chi-Square one numerically solves for those parameters that produce the smallest Modified Chi-Square. This can be applied to either grouped or ungrouped data. (In the case of ungrouped data one has to specify the intervals to use in computing the Modified Chi-Square statistic.) 397
See page 54 of the first edition of Loss Models, not on the syllabus. Minimum Modified Chi-Square is an example of a “minimum interval-based distance estimator.” There are additional estimators discussed in the first edition of Loss Models. Minimum Layer Average Severity is another example of a “minimum interval-based distance estimator.” Minimum Cumulative Distribution Function (cdf) Distance and Minimum Limited Expected Value (LEV) Distance are each examples of minimum distance estimators. 398 Compare it to the similar calculation of the (unmodified) Chi-Square of 204.6 in a previous Section. When as is the case here, the fit is very bad, the Chi-Square and Modified Chi-Square can be very different. When the fit is very good, the Chi-Square and modified Chi-Square are very close. 399 One could instead use the (unmodified) Chi-Square. Loss Models uses Minimum Modified Chi-Square, rather than Minimum Chi-Square. However, Loss Distributions by Hogg & Klugman, uses Minimum Chi-Square. For distributions that fit well, the results are similar.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 926 For the ungrouped data in Section 2, one can compute the Modified Chi-Square for various Pareto distributions.400 For example, for α = 1.9 and θ = 250,000, the Modified Chi-Square is 2.49:
Lower Endpoint 0 10,000 25,000 50,000 100,000 250,000 500,000 1,000,000
Upper Observed Endpoint # claims 10,000 8 25,000 13 50,000 12 100,000 24 250,000 35 500,000 19 1,000,000 12 infinity 7
1.9
250,000
F(lower) 0.000 0.072 0.166 0.293 0.472 0.732 0.876 0.953
F(upper) 0.072 0.166 0.293 0.472 0.732 0.876 0.953 1.000
Fitted # claims 9.34 12.20 16.53 23.34 33.76 18.71 10.01 6.11
(ObservedFitted)^2 /Observed 0.223 0.050 1.709 0.018 0.044 0.004 0.329 0.114
130
2.49
130
For the ungrouped data in Section 2, hereʼs the Modified Chi-Square for various Pareto distributions: 401 Theta (100 thousand)
1.7
Alpha 1.9
2.0 2.5 3.0 3.5
4.22 1.75 4.58 10.80
8.98 2.49 1.66 4.37
40 30 20 10 0
2.1
2.3
15.39 5.58 1.86 1.86
22.76 10.06 3.93 1.75
500000 400000 1.8
300000 2 2.2
400
200000
Using intervals with 0, 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, 1,000,000, and infinity as endpoints. 401 As previously, using intervals with 0, 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, 1,000,000, and infinity as endpoints.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 927 If one wants the Pareto Distribution with the Minimum Modified Chi-Square, then one has to solve numerically.402 Using a computer, it turns out that the minimum Modified Chi-Square Pareto has alpha = 1.981 and theta = 300 thousand, with Modified Chi-Square of 1.45:
Lower Endpoint 0 10,000 25,000 50,000 100,000 250,000 500,000 1,000,000 1.981
Upper Observed Endpoint # claims 10,000 8 25,000 13 50,000 12 100,000 24 250,000 35 500,000 19 1,000,000 12 infinity 7 300,000
F(lower) 0.000 0.063 0.147 0.263 0.434 0.699 0.857 0.945
F(upper) 0.063 0.147 0.263 0.434 0.699 0.857 0.945 1.000
Fitted # claims 8.18 10.89 15.15 22.26 34.40 20.50 11.51 7.12
(ObservedFitted)^2 /Observed 0.004 0.344 0.826 0.125 0.010 0.119 0.020 0.002
130
1.45
130
This Modified Chi-Square of 1.45 is smaller than that for the maximum likelihood curve which had α = 1.702, θ = 240,151, and has a Modified Chi-Square of 1.74:
Lower Endpoint 0 10,000 25,000 50,000 100,000 250,000 500,000 1,000,000 1.702
Upper Observed Endpoint # claims 10,000 8 25,000 13 50,000 12 100,000 24 250,000 35 500,000 19 1,000,000 12 infinity 7 240,151
130
F(lower) 0.000 0.067 0.155 0.275 0.447 0.703 0.853 0.939
F(upper) 0.067 0.155 0.275 0.447 0.703 0.853 0.939 1.000
Fitted # claims 8.72 11.44 15.62 22.34 33.28 19.46 11.19 7.95
(ObservedFitted)^2 /Observed 0.065 0.186 1.089 0.115 0.084 0.011 0.055 0.129
130
1.74
By definition, the Minimum Modified Chi-Square Pareto has a smaller Modified Chi-Square than any other Pareto. Thus it has a smaller (better) Modified Chi-Square than the Maximum Likelihood Pareto. Conversely, the Maximum Likelihood Pareto has a larger (better) likelihood than the Minimum Modified Chi-Square Pareto. If one then tests the significance of a curve fit by minimum Modified Chi-Square, the degrees of freedom are: (the number of intervals minus one) - (the number of fitted parameters.)403 402
One could use a built-in minimization routine, or one could use the Nelder-Mead simplex algorithm. In this case, I have used the function FindMinimum in Mathematica. 403 Fewer degrees of freedom makes it easier to reject a fitted curve. A smaller value of the Chi-Square corresponds to a good fit, when one has specifically minimized the Chi-Square.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 928 In this case the degrees of freedom are 8 -1 - 2 = 5, rather than 7 as for an a priori assumed distribution. One should avoid intervals with less than 5 observations.404 In that case one should group adjacent intervals together to create intervals with at least 5 observations. More generally, combining classes with a small number of observations into larger classes may improve the estimation. The following Pareto Distributions were each fit to the ungrouped data in Section 2: Method
Fitted Pareto Distributions α θ (000) 1.702 240 2.658 518 1.981 300
Maximum Likelihood Moments Minimum Modified Chi-Square
While these curves are similar, which method is used to fit the curve does make a difference, particularly if trying to extrapolate the behavior in the extreme tail of the distribution. The mean excess losses are compared for these three different fitted Pareto Distributions: e(x) (million) 7
Maximum Likelihood
6 5
Min. Mod. Chi-Square
4 3
Method of Moments
2 1 1
2
3
4
5
x (million)
By definition, the Minimum Modified Chi-Square Pareto has a smaller Modified Chi-Square Statistic than the other Pareto distributions. Similarly the Maximum Likelihood Pareto has a larger likelihood than the other Pareto distributions. Which Pareto fits best, depends on which criterion you use to define “best.” 404
A common rule of thumb for the use of the unmodified or modified Chi-Square Statistic is at least 5 expected claims per interval. If the expected number of claims is too small, the difference between the observed and expected number of claims is not well approximated by a Normal Distribution, which is what underlies the use of the Chi-Square Statistic.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 929 Minimum Chi-Square: One could instead use the (unmodified) Chi-Square to fit distributions.405 Minimum Chi-Square fitting works the same way as Minimum Modified Chi-Square; for distributions that fit well Minimum Chi-Square usually produces very similar results to Minimum Modified ChiSquare. For example, it turns out that the Minimum Chi-Square Pareto fit to the ungrouped data in Section 2 has α = 1.939 and θ = 286,000, with (unmodified) Chi-Square of 1.32:406
Lower Endpoint 0 10,000 25,000 50,000 100,000 250,000 500,000 1,000,000 1.939
Upper Observed Endpoint # claims 10,000 8 25,000 13 50,000 12 100,000 24 250,000 35 500,000 19 1,000,000 12 infinity 7 286,000
F(lower) 0.000 0.064 0.150 0.268 0.441 0.704 0.859 0.946
F(upper) 0.064 0.150 0.268 0.441 0.704 0.859 0.946 1.000
130
Fitted # claims 8.38 11.12 15.38 22.43 34.23 20.15 11.26 7.05
(ObservedFitted)^2 /Fitted 0.017 0.319 0.745 0.109 0.017 0.066 0.049 0.000
130
1.32
The Minimum Chi-Square Pareto is similar to the Minimum Modified Chi-Square Pareto with α = 1.980 and θ = 300 thousand. Note however, that the Minimum Chi-Square Pareto has a larger Modified Chi-Square of 1.48 than that of the Minimum Modified Chi-Square Pareto of 1.45.
Lower Endpoint 0 10,000 25,000 50,000 100,000 250,000 500,000 1,000,000 1.939 405
Upper Observed Endpoint # claims 10,000 8 25,000 13 50,000 12 100,000 24 250,000 35 500,000 19 1,000,000 12 infinity 7 286,000
130
F(lower) 0.000 0.064 0.150 0.268 0.441 0.704 0.859 0.946
F(upper) 0.064 0.150 0.268 0.441 0.704 0.859 0.946 1.000
Fitted # claims 8.38 11.12 15.38 22.43 34.23 20.15 11.26 7.05
(ObservedFitted)^2 /Observed 0.018 0.273 0.955 0.102 0.017 0.070 0.046 0.000
130
1.48
In a given practical application use at most one of these two methods. Do not use both. One could use a built-in minimization routine, or one could use the Nelder-Mead simplex algorithm. In this case, I have used the function FindMinimum in Mathematica. 406
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 930 Conversely, the Minimum Modified Chi-Square Pareto has a larger Chi-Square of 1.35 than that of the Minimum Chi-Square Pareto of 1.32:
Lower Endpoint 0 10,000 25,000 50,000 100,000 250,000 500,000 1,000,000 1.98
Upper Observed Endpoint # claims 10,000 8 25,000 13 50,000 12 100,000 24 250,000 35 500,000 19 1,000,000 12 infinity 7 300,000
130
F(lower) 0.000 0.063 0.147 0.263 0.434 0.699 0.857 0.945
F(upper) 0.063 0.147 0.263 0.434 0.699 0.857 0.945 1.000
Fitted # claims 8.17 10.88 15.14 22.26 34.40 20.51 11.51 7.13
(ObservedFitted)^2 /Fitted 0.004 0.413 0.652 0.136 0.011 0.111 0.020 0.002
130
1.35
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 931 Problems: 32.1 (4 points) One hundred observed losses have been recorded in thousands of dollars and are grouped as follows: Interval [100,150] [150,200] [200,∞] Number of claims 60 10 30 The random variable underlying the observed losses has the Distribution function F(x) = 1 - (x/100000)-q, x > 100,000. Which of the following values for the parameter q fits the data best using the minimum modified chi-square method? A. 1.7 B. 1.8 C. 1.9 D. 2.0 E. 2.1 32.2 (3 points) One hundred losses have been recorded and are grouped as follows: Interval [0, 5) [5, 10) [10, 25) [25, ∞) Number of losses 32 37 23 8 Losses $87 $284 $351 $270 Assuming an Exponential Distribution as per Loss Models, which of the following values for the parameter θ fits the data best using the minimum modified chi-square method? A. 8
B. 10
C. 12
D. 14
E. 16
32.3 (3 points) One hundred losses have been recorded and are grouped as follows: Interval [0, 5) [5, 10) [10, ∞) Number of losses 32 37 31 Losses $87 $284 $621 Losses follow the uniform distribution on (0, θ), θ > 10. Fit θ using the method of minimum modified chi-square. A. 13.0
B. 13.5
C. 14.0
D. 14.5
E. 15.0
32.4 (4B, 11/95, Q.19) (1 point) Which of the following statements are true about minimum modified chi-square estimation? 1. Minimum modified chi-square estimation with a continuous distribution requires grouping of the observations. 2. The statistic to be minimized is the modified chi-square goodness-of-fit statistic. 3 . The parameters in the distribution function of the model appear in both the numerator and the denominator of the statistic that is to be minimized. A. 1, 2 B. 1, 3 C. 2, 3 D. 1, 2, 3 E. None of A, B, C, or D
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 932 32.5 (4B, 5/98, Q.30) (1 point) You are given the following: • A number of claims have been observed and recorded for a given risk. • The null hypothesis, H0 , is that claim sizes for this risk follow a distribution that is a member of a family with unknown parameter α. •
α is estimated using both maximum likelihood estimation and
• •
minimum chi-square estimation. Chi-square tests are performed with both estimates using the Chi-Square statistic. The same class boundaries were used to obtain the minimum chi-square estimate of α and the same significance level are used in both tests.
Which of the following statements are true? 1. The number of degrees of freedom used in the chi-square tests should be one less than the number of classes used. 2. If the chi-square test performed with the minimum chi-square estimate of α results in acceptance of H0 , then the chi-square test performed with the maximum likelihood estimate of α must also result in acceptance of H0 . 3. If the chi-square test performed with the minimum chi-square estimate of a results in rejection of H0 , then the chi-square test performed with the maximum likelihood estimate of α must also result in rejection of H0 . A. 1
B. 2
C. 3
D. 1, 2
E. 1, 3
32.6 (4, 11/03, Q.30 & 2009 Sample Q.23) (2.5 point) For a sample of 15 losses, you are given: (i) Interval Observed Number of Losses (0, 2] 5 (2, 5] 5 (5, ∞) 5 (ii) Losses follow the uniform distribution on (0, θ). Estimate θ by minimizing the function
3
(Ej - Oj)2 , Oj j=1
∑
where Ej is the expected number of losses in the jth interval, and Oj is the observed number of losses in the jth interval. (A) 6.0
(B) 6.4
(C) 6.8
(D) 7.2
(E) 7.6
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 933 Solutions to Problems: 32.1. C. The Modified Chi-Square Statistic for the five values of q are calculated as follows: Lower Endpoint 100,000 150,000 200,000 1.7 Lower Endpoint 100,000 150,000 200,000 1.8 Lower Endpoint 100,000 150,000 200,000 1.9 Lower Endpoint 100,000 150,000 200,000 2.0 Lower Endpoint 100,000 150,000 200,000 2.1
Upper Observed Endpoint # claims F(lower) F(upper) 150,000 60 0.000 0.498 200,000 10 0.498 0.692 infinity 30 0.692 1.000 100 Upper Observed Endpoint # claims F(lower) F(upper) 150,000 60 0.000 0.518 200,000 10 0.518 0.713 infinity 30 0.713 1.000 100 Upper Observed Endpoint # claims F(lower) F(upper) 150,000 60 0.000 0.537 200,000 10 0.537 0.732 infinity 30 0.732 1.000 100 Upper Observed Endpoint # claims F(lower) F(upper) 150,000 60 0.000 0.556 200,000 10 0.556 0.750 infinity 30 0.750 1.000 100 Upper Observed Endpoint # claims F(lower) F(upper) 150,000 60 0.000 0.573 200,000 10 0.573 0.767 infinity 30 0.767 1.000 100
Fitted # claims 49.81 19.41 30.78
(ObservedFitted)^2 /Observed 1.732 8.863 0.020
100
10.62
Fitted # claims 51.80 19.48 28.72
(ObservedFitted)^2 /Observed 1.120 8.989 0.055
100
10.16
Fitted # claims 53.72 19.49 26.79
(ObservedFitted)^2 /Observed 0.658 9.005 0.343
100
10.01
Fitted # claims 55.56 19.44 25.00
(ObservedFitted)^2 /Observed 0.329 8.920 0.833
100
10.08
Fitted # claims 57.32 19.35 23.33
(ObservedFitted)^2 /Observed 0.120 8.747 1.485
100
10.35
The Modified Chi-Square is smallest for q = 1.9. Comment: The Minimum Modified Chi-Square is q = 1.91.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 934 32.2. B. The Modified Chi-Square Statistic for the five values of θ are calculated as follows: Lower Endpoint 0 5 10 25 8 Lower Endpoint 0 5 10 25 10 Lower Endpoint 0 5 10 25 12 Lower Endpoint 0 5 10 25 14 Lower Endpoint 0 5 10 25 16
Upper Observed Endpoint # claims F(lower) 5 32 0.000 10 37 0.465 25 23 0.713 infinity 8 0.956
F(upper) 0.465 0.713 0.956 1.000
100 Upper Observed Endpoint # claims F(lower) 5 32 0.000 10 37 0.393 25 23 0.632 infinity 8 0.918
F(upper) 0.393 0.632 0.918 1.000
100 Upper Observed Endpoint # claims F(lower) 5 32 0.000 10 37 0.341 25 23 0.565 infinity 8 0.875
F(upper) 0.341 0.565 0.875 1.000
100 Upper Observed Endpoint # claims F(lower) 5 32 0.000 10 37 0.300 25 23 0.510 infinity 8 0.832
F(upper) 0.300 0.510 0.832 1.000
100 Upper Observed Endpoint # claims F(lower) 5 32 0.000 10 37 0.268 25 23 0.465 infinity 8 0.790
F(upper) 0.268 0.465 0.790 1.000
100
The Modified Chi-Square is smallest for θ = 10.
Fitted # claims 46.47 24.88 24.26 4.39
(Observed-Fitted)^2 /Observed 6.55 3.97 0.07 1.63
100
12.21
Fitted # claims 39.35 23.87 28.58 8.21
(Observed-Fitted)^2 /Observed 1.69 4.66 1.35 0.01
100
7.71
Fitted # claims 34.08 22.46 31.01 12.45
(Observed-Fitted)^2 /Observed 0.13 5.71 2.79 2.48
100
11.11
Fitted # claims 30.03 21.01 32.19 16.77
(Observed-Fitted)^2 /Observed 0.12 6.91 3.67 9.61
100
20.31
Fitted # claims 26.84 19.64 32.57 20.96
(Observed-Fitted)^2 /Observed 0.83 8.15 3.98 21.00
100
33.96
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 935 32.3. D. For θ > 10, E1 = 100(5/θ) = 500/θ = E2 , E3 = 100(1 - 10/θ) = 100 - 1000/θ.
Σ (Ej - Oj)2/Oj = (500/θ − 32)2/32 + (500/θ − 37)2/37 + (69 - 1000/θ)2/31. Set the derivative with respect to θ equal to zero: 0 = (-2){(500/θ − 32)(500/θ2 )/32 + (500/θ − 37)(500/θ2 )/37 - (69 - 1000/θ)(1000/θ2 )/31}. 0 = (500/θ − 32)/32 + (500/θ − 37)/37 - (69 - 1000/θ)2/31. ⇒ 93.655/θ = 6.452. ⇒ θ = 14.5. 32.4. A. 1. True. By the definition of the (unmodified or modified) Chi-Square Statistic, the data must be grouped. 2. True. 3. False. The denominator is the observed number of losses in the interval, and does not depend on the parameters. For the (unmodified) Chi-Square, with the fitted number of losses in the denominator, statement #3 would be true. 32.5. C. Statement #1 would be true if the distribution were not fitted. However, with a fitted distribution, the number of degrees of freedom used in the chi-square tests should be one less than the number of classes used minus the number of fitted parameters. In this case, number of degrees of freedom = number of classes - 2. Therefore, Statement #1 is false. The Chi-Square statistic for the maximum likelihood distribution is larger than that for the minimum Chi-Square distribution. In each case with one fitted parameter, the number of degrees of freedom = the number of classes - 2. Therefore, if we accept the minimum Chi-Square distribution we could either accept or reject the maximum likelihood distribution, depending on how much larger its chi-square statistic. Thus Statement #2 is false. The Chi-Square statistic for the maximum likelihood distribution is larger than that for the minimum Chi-Square distribution. In each case, number of degrees of freedom = number of classes - 2. Therefore, if we reject the minimum Chi-Square distribution we also reject the maximum likelihood distribution. Thus Statement #3 is true. Comment: For example, assume there are 6 classes and thus 4 degrees of freedom. If the minimum chi-square distribution has a Chi-square of 9 and the maximum likelihood distribution has a Chi-square of 9.3, then we accept both at 5%. If the minimum chi-square distribution has a Chi-square of 9 and the maximum likelihood distribution has a Chi-square of 9.6, then we accept the minimum chi-square distribution at 5%, but reject the maximum likelihood distribution at 5%. If the minimum chi-square distribution has a Chi-square of 10, then the maximum likelihood distribution has a Chi-square of greater than 10, and we reject both at 5%.
2013-4-6, Fitting Loss Distributions §32 Min. Mod. Chi-Square , HCM 10/15/12, Page 936 32.6. E. For θ > 5, E1 = 15(2/θ) = 30/θ, E2 = 15(3/θ) = 45/θ, E3 = 15(1 - 5/θ) = 15 - 75/θ.
Σ (Ej - Oj)2/Oj = (30/θ − 5)2/5 + (45/θ − 5)2/5 + (10 - 75/θ)2/5. Set the derivative with respect to θ equal to zero: 0 = (-2/5){(30/θ − 5)30/θ2 + (45/θ − 5)45/θ2 - (10 - 75/θ)75/θ2 }. 0 = (30/θ − 5)30 + (45/θ − 5)45 - (10 - 75/θ)75. ⇒ 8550/θ = 1125. ⇒ θ = 7.6. Comment: Note that E1 + E2 + E3 = 30/θ + 45/θ + 15 - 75/θ = 15 = total number observed.
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 937
Section 33, Important Ideas & Formulas
Here are what I believe are the most important formulas and ideas from this study guide to know for the exam. Section 1 includes the important ideas from “Mahlerʼs Guide to Loss Distributions.”
Ogives & Histograms (Section 5)
An Ogive is an approximate graph of the Distribution function. Connect by straight lines the points: (xi, empirical distribution function at xi).
A Histogram is an approximate graph of the probability density function.
Height of each rectangle = (# losses in the interval) / {(total # losses)(width of interval)}.
An estimate of the density from the histogram, fn(x), has variance:
(losses in interval / N){1 - (losses in interval / N)} / {N (width of the interval)²}.
Kernel Smoothing (Section 6)
Using the uniform kernel, one centers a uniform distribution of a fixed bandwidth, ±b, at each of the data points, and weights these uniform distributions together in order to get the smoothed model. The larger the bandwidth, the more smoothing.
One can also use a triangular kernel of height 1/b and width 2b centered at each data point, or a Gamma kernel with fixed α and mean equal to each data point. The smaller α, the more smoothing.
f^(x) = Σ from j = 1 to n of p(yj) k_yj(x).        F^(x) = Σ from j = 1 to n of p(yj) K_yj(x).
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 938 Estimation of Percentiles (Section 7) The smoothed empirical estimate of the pth percentile is the p(N+1) claim from smallest to largest, where one linearly interpolates between two claim amounts if necessary. Percentile Matching (Section 8) If the distribution has j parameters, then match the empirical and theoretical distributions at j selected percentiles. Distributions likely to be asked about are: Exponential, Single Parameter Pareto, LogNormal, Weibull, and Loglogistic. Method of Moments (Section 9) If the distribution has j parameters, then match the first j moments of the empirical and theoretical distributions. Distributions likely to be asked about are: Exponential, Single Parameter Pareto, LogNormal, Pareto, Gamma, and Inverse Gaussian. Fitting by Maximum Likelihood (Sections 10 and 11) For ungrouped data, find the set of parameters such that Σln f(xi) is maximized. As applied here, the principle of parsimony, states that one should use the minimum number of parameters that get the job done, even though using a distribution with more parameters usually increases the loglikelihood. Maximum Likelihood is unaffected by change of variables, such as xτ and ln(x). For grouped data, maximize Σ ni ln[F(bi) - F(ai)]. Chi-Square Test (Section 12) The Chi-Square Statistic is used to test the fit of a distribution to Grouped data, and is computed by summing the contributions from each interval: (observed number of claims - expected # of claims )2 / expected # of claims. k
The Chi-Square Statistic is:
∑(Oj j=1
2
- Ej) / Ej .
χ 2 = Σ(Oi2 / Ei) - n.
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 939 To compute the number of Degrees of Freedom: 1. Unless the exam question has specifically told you which groups to use, use the groups for the data given in the question. 2. Determine whether any parameters have been fit to this data, and if so how many. 3. Degrees of freedom = (# intervals from step 1) - 1 - (# of fitted parameters, if any, from step #2). The better the fit of the curve to the data, the smaller the Chi-Square Statistic. The p-value is the value of the Survival Function of the Chi-Square Distribution (for the appropriate number of degrees of freedom) at the value of the Chi-Square Statistic. A large p-value indicates a good fit. Looking at the row of the Chi-Square Table corresponding to the appropriate number of degrees of freedom, see which entries bracket the calculated Chi-Square statistic; reject to the left, and do not reject to the right. Likelihood Ratio Test (Section 13) The Likelihood Ratio Test (or Loglikelihood Difference Test) proceeds as follows: 1. One has two distributions, one with more parameters than the other, both fit to the same data via Maximum Likelihood. 2. One of the distributions is a special case of the other. 3. Compute twice the difference in the loglikelihoods. 4. Compare the result of step 3 to the Chi-Square Distribution, with a # of degrees of freedom equal to the difference of the # of parameters of the two distributions. 5. Draw a conclusion as to whether the more general distribution fits significantly better than its special case. H0 is the hypothesis that the distribution with fewer parameters is appropriate; the alternative H1 is that the distribution with more parameters is appropriate. Hypothesis Testing (Section 14) One tests the null hypothesis H0 versus an alternative hypothesis H1 . Hypothesis tests are set up to disprove something, H0 , rather than prove anything. A hypothesis test needs a test statistic whose distribution is known. critical values ⇔ the values used to decide whether to reject H0 ⇔ boundaries of the rejection (critical) region other than ±∞. rejection region ⇔ critical region ⇔ if test statistic is in this region then we reject H0 . The significance level, α, of the test is a probability level selected prior to performing the test.
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 940 If given the value of the test statistic, the probability that H0 is true is less than or equal to the significance level chosen, then we reject the H0 . If not, we do not reject H0 . p-value = probability of rejecting H0 even though it is true = probability of a Type I error = Prob[test statistic takes on a value equal to its calculated value or a value less in agreement with H0 (in the direction of H1 ) | H0 ]. If the p-value is less than or equal to the chosen significance level, then we reject H0 . When applying hypothesis testing to test the fit of a distribution to data, the larger the p-value the better the fit. Type I Error ⇔ Reject H0 when it is true. Type II Error ⇔ Do not reject H0 when it is false. Rejecting H0 at a significance level of α ⇔ the probability of a Type I error is at most α. Power of the test ≡ probability of rejecting H0 when it is false = 1 - Prob[Type II error]. A hypothesis test is uniformly most powerful if it has the greatest power (largest probability of rejecting H0 when it is false) of any test with the same (or smaller) significance level. The larger the data set, the more powerful a given test. When a distribution is fit to data, and no adjustment is made to the critical values, the probability of a Type II error increases, while the probability of a Type I error decreases, compared to using parameters specified in advance. Schwarz Bayesian Criterion (Section 15) One adjusts the loglikelihoods by subtracting in each case the penalty: (r/2) ln(n) = (number of fitted parameters) ln(number of data points) / 2. One then compares these penalized loglikelihoods directly; larger is better. Kolmogorov-Smirnov Test (Sections 16-17) Without truncation or censoring, the Kolmogorov-Smirnov Statistic, D, is computed for ungrouped data by finding the maximum absolute difference between the empirical distribution function and the fitted distribution function: Max | (empirical distrib. function at x) - (theoretical distrib. function at x) | x
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 941 This maximum occurs just before or just after one of the observed points. The smaller the Kolmogorov-Smirnov statistic the better the fit. The Kolmogorov-Smirnov test should only be used on individual (ungrouped)data. For the Kolmogorov-Smirnov Statistic, the critical value is inversely proportional to the square root of the number of points. D(x) = empirical distribution - fitted/assumed distribution. One can usefully graph D(x); the K-S statistic is the maximum distance this difference curve gets from the x-axis, either above or below. Score Based Criterion for Selecting Models: Chi-Square Statistic or corresponding p-value, Kolmogorov-Smirnov Statistic, Anderson-Darling Statistic, Likelihood or Loglikelihood, Schwarz Bayesian Criterion. Judgement Based Criterion for Selecting Models: reviewing graphs, focusing on important items for the particular application, relying on models that have worked well in similar situations in the past. 1. Use a simple model if possible (principle of parsimony.) 2. Restrict the universe of possible models. If the critical value is c for a significance level of α, then with a probability of 1 - α: {empirical distribution function - c} ≤ F(x) ≤ {empirical distribution function + c}. Let t be the truncation point from below, and u be the censorship point from above. F* is the fitted/assumed distribution, after altering for the affects of any truncation from below. Fn is the empirical distribution. The Kolmogorov-Smirnov Statistic is defined as: D = Max | Fn (x) - F*(x) |. t≤x≤u
With censoring from above, we only make comparisons below the censorship point, including one just before the censorship point. For grouped data one can only compute bounds on the maximum absolute difference between the fitted and empirical distributions. p-p plots (Section 18) In a p-p plot one graphs the fitted distribution versus the estimated percentile at that data point. If one has a set of n losses, x1 , x2 ,..., xn , from smallest to largest, to which a Distribution F(x) has been fit, then the p-p plot consists of the n points: ( i/(n+1), F(xi) ).
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 942 One also includes on the p-p plot, the comparison line x=y. The closer the plotted points stay to the comparison line the better the fit. Anderson-Darling Test (Section 19) Without truncation or censoring, the Anderson-Darling statistic, A2 , is computed as: n
A 2 = -n - (1/n) ∑ (2i - 1) {ln[F(yi)] + ln[S(yn + 1- i )]} . i=1
Anderson-Darling statistic is always positive; large values indicate a bad fit. The Anderson-Darling statistic is a weighted average of the squared difference between the empirical distribution function and the model distribution function; it applies more weight to the differences between empirical and fitted distributions in the tails. The critical values for the Anderson-Darling Statistic do not depend on the sample size. Let t be the truncation point from below, and u be the censorship point from above. F* is the fitted/assumed distribution, after altering for the affects of any truncation from below. Fn is the empirical distribution. u
A2
{Fn(x) - F * (x)}2 f * (x) dx. F * (x) S * (x) t
≡ n ∫
k k S * (yi) F * (yi + 1) A 2 = -nF*(u) + n ∑ Sn (yi)2 ln[ + n Fn (yi)2 ln[ ∑ ] ], S * (y ) F * (y ) i + 1 i i=0 i=1
n is the number of data points, k is the number of data points which are not censored from above, t = y0 < y1 < y2 ... < yk-1 < yk < yk+ 1 = u. Fitting to Truncated Data (Sections 20 to 22) To fit a distribution to the ground up losses, when one has truncated data, one first alters the distribution for the effects of truncation prior to either comparing theoretical and observed (Percentile Matching or Method of Moments) or maximizing the likelihood (Maximum Likelihood). The method of maximum likelihood applied to ungrouped data truncated from below at d, requires that one maximize: Σ ln[f(xi)] - N ln[S(d)].
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 943 Single Parameter Pareto Distribution, Data Truncated from Below (Section 23) The Single Parameter Pareto Distribution, with support x > θ, is set to work directly with data truncated from below at θ, with no adjustment needed. Choose theta equal to the truncation point from below. x is the size of loss prior to subtracting the deductible. Fitting to Censored Data (Section 24) To fit a distribution to the ground up losses, when one has censored data, one first alters the distribution for the effects of censoring prior to either comparing theoretical and observed (Percentile Matching or Method of Moments) or maximizing the likelihood (Maximum Likelihood). For data censored from above at u, the interval from u to ∞ is treated as it would be for grouped data. Each loss of size greater than or equal to u contributes S(u) to the likelihood, or ln S(u) to the loglikelihood. Fitting to Data Truncated and Censored (Section 25) Deductible
Maximum Covered Loss
None
None
d
None
None
u
d
u
Contribution to the Likelihood of a Loss of Size x f(x) f(x) for x > d S(d) f(x) for u > x, and S(u) for x ≥ u f(x) S(u) for u > x > d, and for x ≥ u S(d) S(d)
The maximum likelihood fit to ungrouped data for an Exponential Distribution is: sum of payments ^ = . θ number of uncensored values
For the Pareto Distribution with theta fixed, the maximum likelihood fit to ungrouped data is: α^ =
number of uncensored values . θ + xi ln ∑ [θ + d ] i
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 944 For the Weibull Distribution with tau fixed, the maximum likelihood fit to ungrouped data is: ^
θ=
1 /τ ∑ (Min[xi , ui ]τ - diτ) ⎛ ⎞ . ⎝ number of uncensored values ⎠
Properties of Estimators (Section 26) estimator ⇔ procedure used to estimate a quantity of interest
⇔ a random variable or random function. estimate ⇔ result of using an estimator ⇔ a number or a function. model error ⇔ (inadvertently) used inappropriate model or assumptions. sampling frame error ⇔ using a sample different from the population to be modeled. sampling error ⇔ nonrepresentative sample due to random fluctuation. A point estimator provides a single value, or point estimate as an estimate of a quantity of interest. One wants point estimators to be: unbiased, consistent and have a small mean squared error. The Bias of an estimator is the expected value of the estimator minus the true value. An unbiased estimator has a bias of zero. Unbiasedness is usually not preserved under a change in parameters. The sample mean is an unbiased estimator of the mean of the distribution from which the sample was drawn. The sample variance is an unbiased estimator of the variance of the distribution from which the sample was drawn. Var( X ) = Var(X) / n. For an asymptotically unbiased estimator, as the number of data points, n → ∞, the bias approaches zero.
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 945 When based on a large number of observations, a consistent estimator, also called weakly consistent, has a very small probability that it will differ by a large amount from the true value. Let ψn be the estimator with a sample size of n and c be the true value, then ψ is a consistent estimator if given any ε > 0: limit
Probability{| ψn - c | < ε} = 1.
n →∞
The mean square error (MSE) of an estimator is the expected value of the squared difference between the estimate and the true value. ^
^
^ 2
MSE[θ ] = Var[θ ] + Bias[θ ] . An asymptotically unbiased estimator, whose variance goes to zero as the number of data points goes to infinity, is consistent (weakly consistent). The minimal variance for an unbiased estimator of a parameter, that does not appear in the definition of the support of a density, is the Cramer-Rao (Rao-Cramer) lower bound: −1 1 = . 2 2 n E [∂ ln f(x) / ∂ θ] n E [(∂ ln f(x) / ∂θ) 2] A Uniformly Minimum Variance Unbiased Estimator (UMVUE) is unbiased and has the smallest variance among unbiased estimators. The errors that would result from the repeated use of a procedure is what is referred to when we discuss the qualities of an estimator. For maximum likelihood estimation one has: • asymptotically unbiased estimator.
• consistent estimator. • whose errors are asymptotically normal or multivariate normal. • variance that goes to zero as the inverse of the sample size. • variance that approaches the Cramer-Rao(Rao-Cramer) lower bound as the sample size → ∞. • asymptotically UMVUE. • variance-covariance matrix ≅ the inverse of the Information Matrix.
2013-4-6, Fitting Loss Distributions §33 Important Ideas , HCM 10/15/12, Page 946 Variances of Estimates when using the Method of Moments (Section 27) In order to estimate the variance of a single parameter fit by the Method of Moments: 1. Write the estimated parameter as a function of X . 2. Write down the variance of the function from step 1 in terms of the variance of X , using ⎛ ∂h ⎞ 2 ^ Var[h(θ)] ≅ ⎜ ⎟ Var[ θ ]. ⎝ ∂θ ⎠ 3. Write down the variance of the observed mean in terms of the process variance of a single draw from the distribution: Var( X ) = Var(X) / n. 4. Write down the process variance of a single draw from the distribution. 5. Combine the results of steps 1 through 4. Variance of Estimated Single Parameters, Maximum Likelihood (Section 28) (Fisherʼs) Information = -n E [∂2 ln f(x) / ∂θ2] = n E [(∂ ln f(x) / ∂θ)2]. The approximate variance of the estimate of a single parameter using the method of maximum likelihood is given by negative the inverse of the product of the number of points times the expected value of the second partial derivative of the log density: −1 1 1 Variance of ^θ = = = . n E [∂2 ln f(x) / ∂ 2θ] n E [(∂ ln f(x) / ∂θ) 2] the information The variance of the estimate is inversely proportional to the number of data points used to fit the distribution. Other methods of estimation generally have asymptotic variances that are greater than that of maximum likelihood.
Information Matrix and Covariance Matrix (Section 29)

Information Matrix = -n E[∂² ln f(x) / (∂θi ∂θj)] = n E[{∂ ln f(x)/∂θi} {∂ ln f(x)/∂θj}].
The information matrix is symmetric, with number of rows and columns equal to the number of parameters of the particular distribution function. For one parameter the information matrix is called the information.
Information Matrix ≅ -∂² loglikelihood / ∂θi∂θj = the observed information.
When using the method of maximum likelihood, the inverse of the information matrix gives an approximate variance-covariance matrix for the fitted parameters. For example, with two fitted parameters α and θ:
( Var[α̂]         Covar[α̂, θ̂] )
( Covar[α̂, θ̂]   Var[θ̂]      )
Variance of Functions of the Estimated Parameters (Section 30)

The gradient vector is the vector of partial derivatives of the function with respect to the parameters, ∂h/∂θi; with two parameters α and θ it is (∂h/∂α, ∂h/∂θ)ʹ.
For maximum likelihood, using the delta method, the asymptotic variance of the estimate of the function is the matrix product:
(transpose of gradient vector) (Inverse of information matrix) (gradient vector).
The variance of the estimate of a function of a single parameter θ, h(θ), is:
Var[h(θ̂)] ≅ -(∂h(θ)/∂θ)² / {n E[∂² ln f(x) / ∂θ²]} = (∂h(θ)/∂θ)² Var[θ̂].
Non-Normal Confidence Intervals (Section 31) A confidence interval covering probability P, consists of those values of the parameter(s) such that: loglikelihood ≥ (maximum loglikelihood) - (Pth percentile of the Chi-Square Dist.)/2, where the Chi-Square Distribution has number of degrees of freedom equal to the number of fitted parameters.
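As an illustration only (no programming is needed for the exam), here is a minimal Python sketch of these formulas for an assumed Exponential density f(x) = e^(-x/θ)/θ, for which E[∂² ln f(x)/∂θ²] = -1/θ²; the function names are mine.

import math

# Minimal sketch, assuming an Exponential distribution with mean theta.
# The information is n/theta^2, so Var[theta_hat] is approximately theta^2/n.
def mle_variance(theta, n):
    information = n / theta**2
    return 1.0 / information

# Delta method for h(theta) = S(x) = exp(-x/theta):
# Var[h(theta_hat)] is approximately (dh/dtheta)^2 Var[theta_hat].
def delta_method_variance(theta, n, x):
    dh_dtheta = (x / theta**2) * math.exp(-x / theta)
    return dh_dtheta**2 * mle_variance(theta, n)

print(mle_variance(1000.0, 100))                 # 10000.0 = theta^2 / n
print(delta_method_variance(1000.0, 100, 1000))  # approximate variance of the estimated S(1000)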
Mahlerʼs Guide to Survival Analysis
Joint Exam 4/C
prepared by Howard C. Mahler, FCAS
Copyright 2013 by Howard C. Mahler.
Study Aid 2013-4-7
Howard Mahler
[email protected]
www.howardmahler.com/Teaching
Mahlerʼs Guide to Survival Analysis Copyright 2013 by Howard C. Mahler. This Study Guide covers what a student needs to know about Survival Analysis in Loss Models.1 Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important. Information presented in italics (or sections whose title is in italics) should not be needed to directly answer exam questions and should be skipped on first reading. It is provided to aid the readerʼs overall understanding of the subject, and to be useful in practical applications. Highly Recommended problems are double underlined. Recommended problems are underlined.2 Solutions to the problems in each section are at the end of that section.
Section #   Pages      Section Name
1           3-25       Introduction
2           26-81      Kaplan-Meier Product-Limit Estimator
3           82-109     Variance of the Product-Limit Estimator
4           110-139    Grouped Data
5           140-176    Nelson-Aalen Estimator
6           177-202    Variance of the Nelson-Aalen Estimator
7           203-225    Log-Transformed Confidence Intervals
8           226-275    Maximum Likelihood
9           276-300    Multiple Decrements
10          301-305    Important Formulas and Ideas

1 Closely related material is covered in “Mahlerʼs Guide to Fitting Loss Distributions.” Maximum Likelihood is covered in both study guides.
2 Note that problems include both some written by me and some from past exams. The latter are copyright by the Casualty Actuarial Society and the SOA and are reproduced here solely to aid students in studying for exams. In some cases Iʼve rewritten these questions in order to match the notation in the current Syllabus.
Course 4 Exam Questions by Section of this Study Aid3
[Chart: past exam question numbers listed by section of this study guide and by exam — Sample, 5/00, 11/00, 5/01, 11/01, 11/02, 11/03, 11/04, 5/05, 11/05, 11/06, 5/07.]
The CAS/SOA did not release the 5/02, 5/03, 5/04, 5/06, 11/07 and subsequent exams. Questions No Longer On the Syllabus: Sample: 13, 17, 26, 32. 5/00: 12, 23, 27. The Survival Analysis portions of question 34 are no longer on the syllabus. 11/00: 8, 15, 24, 29, 36. 5/01: 8, 15, 22, 26, 31, 35. 11/01: 8, 12, 20, 31. 11/02: 15, 19, 26. Question 33 is no longer directly covered on the syllabus. 11/03: 12. 11/04: 28, 34. 5/05: 18, 29. 11/05: 13, 24. 11/06: 28. 5/07: 22
3 The solutions and comments are solely the responsibility of the author; the CAS and SOA bear no responsibility for their accuracy. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams.
Section 1, Introduction

Survival Analysis can be applied to mortality situations: human lives, how long machines last until failure, how long it takes a claim to be reported, how long it takes a claim to be closed, etc. It can also be applied to size of loss data, where the size of loss is mathematically equivalent to the time of death or failure.
Subsequent sections will discuss: the Kaplan-Meier Product-Limit Estimator, and the Nelson-Aalen Estimator.

Empirical Distribution Function:

Four women are observed starting from when they were born:4
Ann dies at age 68. Betty dies at age 77. Carol dies at age 77. Debra dies at age 84.
One way to estimate the survival function is to use the Empirical Survival Function, which is one minus the Empirical Distribution Function.5
Empirical Distribution Function at t = (number with age of death ≤ t) / total.
Empirical Survival Function at t = (number with age of death > t) / total = 1 - Empirical Distribution Function at t.
For example, the Empirical Distribution Function at 70 is 1/4. The Empirical Survival Function at 70 is: 3/4 = 1 - 1/4.
Exercise: What is the Empirical Survival Function at 80?
[Solution: (number with age of death > 80)/total = 1/4.]
Exercise: What is the Empirical Distribution Function at 77?
[Solution: (number with age of death ≤ 77)/total = 3/4.]
Exercise: What is the Empirical Survival Function at 77?
[Solution: (number with age of death > 77)/total = 1/4 = 1 - 3/4.]
4 We have only four data points solely for simplicity. Mortality studies usually use thousands or millions of lives.
5 The Empirical Survival Function is also discussed in “Mahlerʼs Guide to Loss Distributions.”
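As an illustration only (no programming is needed for the exam), these definitions translate directly into a few lines of Python; the function names are mine.

# Minimal sketch: empirical distribution and survival functions
# for the four observed ages at death.
ages = [68, 77, 77, 84]

def empirical_F(t, data):
    # fraction of observations less than or equal to t
    return sum(1 for x in data if x <= t) / len(data)

def empirical_S(t, data):
    # fraction of observations strictly greater than t
    return 1.0 - empirical_F(t, data)

print(empirical_F(70, ages))   # 0.25
print(empirical_S(70, ages))   # 0.75
print(empirical_S(77, ages))   # 0.25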
Exercise: Given four losses of size 68, 77, 77, and 84, what is the Empirical Distribution function at 70?
[Solution: (# losses ≤ 70)/(# losses) = 1/4.
Comment: Note that this is mathematically the same as the example with four women. The Empirical Survival Function at 70 is: 1 - 1/4 = 3/4.]
The Empirical Distribution Function and the Empirical Survival Function both have jump discontinuities at the observed values. In between these jump discontinuities, they are constant on intervals.
Here is a graph of the Empirical Distribution Function for this example:
[Graph: step function rising from 0 to 1, with jumps at ages 68, 77, and 84; probability on the vertical axis, age from 65 to 90 on the horizontal axis.]
Here is a graph of the Empirical Survival Function for this example:
[Graph: step function decreasing from 1 to 0, with drops at ages 68, 77, and 84; probability on the vertical axis, age from 65 to 90 on the horizontal axis.]
The Empirical Survival Function is an unbiased and consistent estimator of the underlying survival function.6

Variance of the Empirical Distribution Function:

Assume the data are drawn from a Distribution Function F(x). Then each observed value has a chance of F(x) of being less than or equal to x. Thus the number of values observed less than or equal to x is a sum of N independent Bernoulli trials with chance of success F(x). Thus if one has a sample of N values, the number of values observed less than or equal to x is Binomially distributed with parameters N and F(x). Therefore, the Empirical Distribution Function is (1/N) times a Binomial Distribution with parameters N and F(x).
Therefore, the Empirical Distribution Function has a mean of F(x) and a variance of:
F(x) {1 - F(x)} / N = F(x) S(x) / N.
Similarly, the Empirical Survival Function has a mean of S(x) and a variance of: F(x) S(x) / N.
6 See Example 14.4 in Loss Models. Properties of estimators are discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
For the example with four women, the Empirical Survival Function at 77 was 1/4. Therefore, the estimated variance of this estimate is: (3/4)(1/4)/4 = 0.047.
Exercise: Estimate S(100) and the variance of that estimate, given the following 10 losses: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: S(100) = 4/10. Var[S(100)] = (6/10)(4/10)/10 = 0.024.]
Probabilities on Intervals:

Prob[70 < t ≤ 80] = Prob[t > 70] - Prob[t > 80] = S(70) - S(80).
In the example with 4 women, Prob[70 < t ≤ 80] = 3/4 - 1/4 = 1/2. Alternately, of the four values: 68, 77, 77, 84, two are in the interval; 2/4 = 1/2.
Exercise: Estimate Prob[90 < t ≤ 500], given the following 10 losses: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: S(90) = 0.4. S(500) = 0.1. S(90) - S(500) = 0.3. Alternately, 3 out of 10 values are in the interval.]
Exercise: Estimate Prob[90 ≤ t ≤ 500], given the following 10 losses: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: S(90-ε) = 0.5, where ε is extremely small and positive. S(500) = 0.1.
Prob[90 ≤ t ≤ 500] = S(90-ε) - S(500) = 0.4. Alternately, 4 out of 10 values are in the interval.]

Variance of Probabilities on Intervals:

The number of observations in an interval is Binomial with parameters N and the probability of being in that interval. Therefore:
Var[S(x) - S(y)] = (Probability in the interval)(1 - Probability in the interval) / N.7
Exercise: Given the following 10 losses: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746, what is the variance of the estimate of Prob[90 ≤ t ≤ 500]?
[Solution: S(90-ε) - S(500) = 0.4. Var[S(90-ε) - S(500)] = (0.4)(1 - 0.4)/10 = 0.024.]
7 The formulas for the variances of F(x) and S(x) are special cases of this more general formula. The number of losses in the interval is Binomial with parameters N and S(x) - S(y).
Covariance of Empirical Distribution Functions:

Given the following 10 losses: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746, we estimated Prob[90 < t ≤ 500] = S(90) - S(500) = 0.4 - 0.1 = 0.3.
Therefore, the variance of that estimate is:
Var[S(90) - S(500)] = Var[S(90)] + Var[S(500)] - 2Cov[S(90), S(500)].
If x ≤ y, then as shown subsequently: Cov[F(x), F(y)] = F(x)S(y)/N. Similarly, Cov[S(x), S(y)] = F(x)S(y)/N.8
Thus in the above example, Var[S(90) - S(500)] = F(90)S(90)/N + F(500)S(500)/N - 2F(90)S(500)/N
= (0.6)(0.4)/10 + (0.9)(0.1)/10 - (2)(0.6)(0.1)/10 = 0.021.
In general, for x ≤ y,
Var[S(x) - S(y)] = Var[S(x)] + Var[S(y)] - 2Cov[S(x), S(y)] = F(x)S(x)/N + F(y)S(y)/N - 2F(x)S(y)/N
= {F(x)S(x) + S(y) - S(y)² - 2F(x)S(y)}/N = {F(x)S(x) + S(x)S(y) - S(y)² - F(x)S(y)}/N
= {S(x) - S(y)}{F(x) + S(y)}/N = {S(x) - S(y)}{1 - (S(x) - S(y))}/N
= (Probability in the interval)(1 - Probability in the interval)/N.
Thus in this example, Var[S(90) - S(500)] = (0.3)(1 - 0.3)/10 = 0.021.
If x ≤ y, then Corr[S(x), S(y)] = Cov[S(x), S(y)] / √(Var[S(x)] Var[S(y)])
= {F(x)S(y)/N} / √({F(x)S(x)/N} {F(y)S(y)/N}) = √[{F(x)/S(x)} / {F(y)/S(y)}].
Exercise: Given the following 10 losses: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746, what is the correlation of S(90) and S(500)?
[Solution: √[{F(90)/S(90)} / {F(500)/S(500)}] = √[(0.6/0.4) / (0.9/0.1)] = 0.408.]
8 Note that if x = y, then this reduces to the previous formula for the variance.
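As an illustration only, the covariance formula can be checked by simulation; the following Python sketch is mine and assumes an Exponential population with mean 100 purely for convenience.

import math, random

# Sketch: compare the simulated covariance of the empirical distribution
# function at x and y with F(x) S(y) / N, for an assumed Exponential population.
random.seed(1)
N, trials, x, y = 10, 20000, 50, 200
F = lambda t: 1 - math.exp(-t / 100.0)   # true distribution function

Fx, Fy = [], []
for _ in range(trials):
    sample = [random.expovariate(1 / 100.0) for _ in range(N)]
    Fx.append(sum(v <= x for v in sample) / N)
    Fy.append(sum(v <= y for v in sample) / N)

mx, my = sum(Fx) / trials, sum(Fy) / trials
cov = sum((a - mx) * (b - my) for a, b in zip(Fx, Fy)) / trials
print(cov)                      # simulated covariance
print(F(x) * (1 - F(y)) / N)    # F(x) S(y) / N from the formula above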
Derivation of the Covariance of Empirical Distribution Functions: Take x ≤ y. Assume the data are drawn from a Distribution Function F(x). Then each observed value has a chance of F(x) of being less than or equal to x. Similarly, the probability of an observed value being greater than x but less than or equal to y is F(y) - F(x). Then the number of values in such disjoint intervals follow a Multinomial Distribution. Let n1 be the number observed in the first interval t ≤ x. Let n2 be the number observed in the second interval x < t ≤ y. Then Cov[n1 , n2 ] = -N Prob[first interval] Prob[second interval].9 The Empirical Distribution Function at x, F(x), is n1 /N. F(y) - F(x) = n2 /N. Cov[F(x), F(y)] = Cov[F(x), F(x) + F(y) - F(x)] = Cov[F(x), F(x)] + Cov[F(x), F(y) - F(x)] = Var[F(x)] + Cov[n1 /N, n2 /N] = F(x)S(x)/N - Prob[first interval] Prob[second interval]/N = F(x)S(x)/N - F(x) {F(y)-F(x)} / N = F(x) {S(x) + F(x) - F(y)} / N = F(x)S(y)/N.
9 See for example, Example 3h in A First Course in Probability by Sheldon Ross.
ps and qs:10

The probability of survival past time 70 + 10 = 80, given survival past time 70, is 10p70.
In the example with four women, 10p70 = S(80)/S(70) = (1/4)/(3/4) = 1/3.
The probability of failing at or before time 70 + 10 = 80, given survival past time 70, is 10q70.
In the example with four women, 10q70 = {S(70) - S(80)}/S(70) = {(3/4) - (1/4)}/(3/4) = 2/3.
10p70 + 10q70 = 1/3 + 2/3 = 1.
In general, y-xpx ≡ Prob[Survival past y | Survival past x] = S(y)/S(x).
y-xqx ≡ Prob[Not Surviving past y | Survival past x] = {S(x) - S(y)}/S(x) = 1 - y-xpx.
Also px ≡ 1px = Prob[Survival past x+1 | Survival past x] = S(x+1)/S(x).
qx ≡ 1qx = Prob[Death within one year | Survival past x] = 1 - S(x+1)/S(x).
Exercise: Estimate 100p50 and 300q100, given the following 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: S(50) = 8/10. S(150) = 3/10. 100p50 = S(150)/S(50) = (3/10)/(8/10) = 3/8.
S(100) = 4/10. S(400) = 1/10. 300q100 = 1 - S(400)/S(100) = 3/4.]
t|uqx ≡ Prob[x+t < time of death ≤ x+t+u | Survival past x] = {S(x + t) - S(x + t + u)}/S(x).
Note that t is the time delay, while u is the length of the interval whose probability we measure.
Exercise: In the previous exercise, estimate 100|200q70.
[Solution: 100|200q70 = {S(170) - S(370)}/S(70) = {(3/10) - (1/10)}/(6/10) = 1/3.]
10 See Section 3.2.2 of Actuarial Mathematics.
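As an illustration only, these conditional probabilities are simple ratios of empirical survival values; the short Python sketch below reproduces the two exercises above.

# Sketch: empirical p's and q's from the ten observed values.
values = [22, 35, 52, 69, 86, 90, 111, 254, 362, 746]

def S(t):
    return sum(v > t for v in values) / len(values)

print(S(150) / S(50))              # 100p50 = 3/8
print(1 - S(400) / S(100))         # 300q100 = 3/4
print((S(170) - S(370)) / S(70))   # 100|200q70 = 1/3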
Variance of ps and qs:11

With the 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746, the estimate of 100p50 = S(150)/S(50) = (3/10)/(8/10) = 3/8 = (number > 150)/(number > 50).
Conditional on having 8 values greater than 50, the number of values greater than 150 is Binomial with m = 8 and q = 100p50, and variance: 8 100p50 (1 - 100p50) = 8 100p50 100q50.
However, given 8 values greater than 50, 100p50 = (number > 150)/8.
Thus, Var[100p50 | S(50) = 8/10] = 8 100p50 100q50 / 8² = 100p50 100q50 / 8 = (3/8)(5/8)/8 = (3)(5)/8³.
Let nx ≡ number of values greater than x. Then by the above reasoning, y-xpx = ny/nx, and
Var[y-xpx | nx] = ny(nx - ny)/nx³.
Since y-xqx = 1 - y-xpx, Var[y-xqx | nx] = Var[y-xpx | nx] = ny(nx - ny)/nx³.
Exercise: Estimate Var[30q70 | 6 values greater than 70], given the following 10 values: 22, 35, 52, 69, 86, 90, 111, 254, 362, 746.
[Solution: 30p70 = S(100)/S(70) = (4/10)/(6/10) = 2/3. 30q70 = 1/3.
Var[30q70 | 6 values greater than 70] = 30p70 30q70 /6 = (2/3)(1/3)/6 = 1/27.
Alternately, Var[30q70 | n70 = 6] = n100(n70 - n100)/n70³ = (4)(6 - 4)/6³ = 1/27.]
11 See Example 14.5 in Loss Models.
Risk Sets:

An equivalent way to calculate the Empirical Survival Function is via the use of risk sets.
Recall the example, where four women are observed starting from when they were born:
Ann dies at age 68. Betty dies at age 77. Carol dies at age 77. Debra dies at age 84.
For example, at age 70 the risk set is the three lives: Betty, Carol, and Debra. These three lives could die (fail) at age 70. Ann, having died at age 68, can not die at age 70.
Define the risk set at age t as the people alive at age t.
Risk set at age t = those individuals who could have failed at age t.
Exercise: What is the risk set at age 80?
[Solution: The single life Debra.]
Let ri = number of lives that could have been observed to fail at age yi.
i   yi   ri
1   68   4
2   77   3
3   84   1
Exercise: What is the set of lives that failed at age 77?
[Solution: The two lives Betty and Carol.]
Let si = number of lives that failed at age yi.
i   yi   ri   si
1   68   4    1
2   77   3    2
3   84   1    1
We can use the above display to calculate the Empirical Survival Function.
For example, S(68) = 3/4 = (4 - 1)/4 = (r1 - s1)/r1.
S(77 | given live past 68) = 1/3 = (3 - 2)/3 = (r2 - s2)/r2.
S(77) = S(68) S(77 | given live past 68) = (3/4)(1/3) = 1/4.
i   yi   ri   si   (ri - si)/ri   S(yi)
1   68   4    1    3/4            3/4
2   77   3    2    1/3            (3/4)(1/3) = 1/4
3   84   1    1    0              (1/4)(0) = 0
This matches the results calculated previously for the Empirical Survival Function. The advantage of setting things up in this manner is it allows one to deal with left truncation and/or right censoring of data, as will be discussed in subsequent sections.
Call (ri - si)/ri the “survival ratio” at age yi.12 Then the Empirical Survival Function can be computed as a product of these survival ratios. To compute the Empirical Survival Function at age t, we multiply all the survival ratios for yi ≤ t.
Exercise: For the data set: 12, 15, 18, 18, 22, 22, 22, 27, compute the Empirical Survival Function using the risk set approach.
[Solution:
i   yi   ri   si   (ri - si)/ri   S(yi)
1   12   8    1    7/8            7/8
2   15   7    1    6/7            (7/8)(6/7) = 6/8
3   18   6    2    4/6            (6/8)(4/6) = 4/8
4   22   4    3    1/4            (4/8)(1/4) = 1/8
5   27   1    1    0              0
The Empirical Survival Function is: 1 for t < 12, 7/8 for 12 ≤ t < 15, 6/8 for 15 ≤ t < 18, 4/8 for 18 ≤ t < 22, 1/8 for 22 ≤ t < 27, 0 for t ≥ 27.]
Thus far we have discussed data sets with neither truncation nor censoring. Data sets with neither truncation nor censoring are sometimes referred to as complete data sets or complete data studies. How to estimate the Survival Function when there is truncation and/or censoring will be discussed in subsequent sections on the Kaplan-Meier and Nelson-Aalen Estimators.
12 Loss Models has no name for (ri - si)/ri = 1 - si/ri. I will call si/ri the “failure ratio”.
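As an illustration only, the survival-ratio calculation is easy to mechanize; the Python sketch below reproduces the solution above for complete data.

# Sketch: empirical survival function via risk sets (complete data).
data = [12, 15, 18, 18, 22, 22, 22, 27]

S, estimate = 1.0, {}
for y in sorted(set(data)):
    r = sum(v >= y for v in data)   # risk set: values that could fail at y
    s = data.count(y)               # values that did fail at y
    S *= (r - s) / r                # multiply by the survival ratio
    estimate[y] = S

print(estimate)   # {12: 0.875, 15: 0.75, 18: 0.5, 22: 0.125, 27: 0.0}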
Right Censoring / Censoring From Above:13

In our previous example, let us assume we had started to observe five women 90 years ago, including Edith who is still alive at age 90. Then we know Edithʼs age of death is greater than 90. Due to our not being able to observe Edith long enough, her data has been right censored (censored from above) at 90.
Similarly, Fran was alive at age 80, when for some reason we lost track of her. We know that Franʼs age of death is more than 80, but we do not know its exact value. Franʼs data is right censored at 80.
An observation x is right censored / censored from above at u if when x ≥ u it is recorded as u, but when x < u it is recorded as x.14
Let us assume we observe 3 payments from a policy with a limit of 25,000: 4000, 13,000, and 25,000. We know that the final loss is of size 25,000 or more, but we do not know its exact size. This data has been censored from above (right censored) at 25,000.
When there are policy limits, size of loss data is censored from above.

Left Truncation / Truncation from Below:15

In our previous example, let us assume we had started to observe Gale 20 years ago at age 70. However, a similar woman such as Georgia who was born at the same time but died at age 66 would not have been able to make it into our study when Gale did. Due to our starting to observe Gale when she was age 70, her data has been left truncated (truncated from below) at 70.
An observation x is left truncated / truncated from below at d if when x ≤ d it is not recorded, but when x > d it is recorded as x.16
Let us assume we observe 2 losses from a policy with a deductible of 5,000: 13,000, and 35,000. Then any loss of size 5,000 or less would not make it into this data base.17 There may have been no such losses or several such losses; we just do not know. This data has been truncated from below (left truncated) at 5,000.
When there are deductibles, size of loss data is truncated from below.
13 Right Censored and Censored from Above are two terms for the same thing. See also “Mahlerʼs Guide to Loss Distributions.”
14 See Definition 14.1 in Loss Models.
15 Left Truncation and Truncation from Below are two terms for the same thing. See also “Mahlerʼs Guide to Loss Distributions.”
16 See Definition 14.1 in Loss Models.
17 A loss of size 4000 would result in no payment by the insurer when there is a 5000 deductible. We assume such a loss would not make it into our data base.
Hazard Rate (Force of Mortality or Failure Rate):

The hazard rate, force of mortality, or failure rate, is defined as:
h(x) = f(x) / S(x), x ≥ 0.
h(x) can be thought of as the failure rate of machine parts. The hazard rate can also be interpreted as the force of mortality = probability of death / chance of being alive. For a given age x, it is the density of the deaths, divided by the number of people still alive.
Exercise: F(x) = 1 - e^(-x/10). What is the hazard rate?
[Solution: h(x) = f(x)/S(x) = {e^(-x/10)/10}/e^(-x/10) = 1/10.]
The hazard rate determines the survival (distribution) function and vice versa.
d ln(S(x))/dx = {dS(x)/dx}/S(x) = -f(x)/S(x) = -h(x). Thus h(x) = -d ln(S(x))/dx.
S(x) = exp[-∫0^x h(t) dt].18
Define the Cumulative Hazard Rate = H(x) = ∫0^x h(t) dt.
Then S(x) = exp[-H(x)]. H(x) = -ln[S(x)].
One can write the distribution function and the density function:
F(x) = 1 - exp[-H(x)]. f(x) = h(x) exp[-H(x)].
Exercise: For an Exponential Distribution with θ = 5, what is H(x)?
[Solution: S(x) = e^(-x/5). H(x) = -ln[S(x)] = x/5.]
Exercise: h(x) = 1/10. What is the distribution function?
[Solution: F(x) = 1 - e^(-x/10), an Exponential Distribution with θ = 10.]
h constant ⇔ the Exponential Distribution, with constant hazard rate of 1/θ = 1/mean.
The Exponential is the only continuous distribution with a constant hazard rate, and therefore constant mean excess loss.
18 The lower limit of the integral should be the lower end of the support of the distribution.
Exercise: h(x) = 3/(10 + x). What is the distribution function?
[Solution: H(x) = ∫0^x 3/(10 + t) dt = 3 {ln(10 + x) - ln(10)} = 3 ln[(10+x)/10].
F(x) = 1 - exp[-H(x)] = 1 - {10/(10+x)}^3. This is a Pareto Distribution with α = 3 and θ = 10.]
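As an illustration only, the relation S(x) = exp[-H(x)] can be checked numerically; the Python sketch below integrates the hazard rate from this exercise with a crude midpoint rule and compares the result with the Pareto survival function.

import math

# Sketch: numerically integrate h(x) = 3/(10 + x) and check that
# exp(-H(x)) matches the Pareto survival function {10/(10 + x)}^3.
def h(t):
    return 3.0 / (10.0 + t)

def H(x, steps=100000):
    dt = x / steps
    return sum(h((i + 0.5) * dt) for i in range(steps)) * dt

x = 20.0
print(math.exp(-H(x)))           # approximately 0.0370
print((10.0 / (10.0 + x))**3)    # exact: (10/30)^3 = 0.0370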
Problems: Use the following information for the next 14 questions: Twenty individuals were observed from birth. All were observed until death. The ages of death are as follows: 22, 37, 51, 64, 68, 68, 70, 71, 75, 76, 77, 77, 80, 80, 80, 82, 84, 87, 90, 98. 1.1 (1 point) Estimate S(70). 1.2 (1 point) What is the variance of the estimate in the previous question? 1.3 (5 points) List yi, ri, si, and use the risk set approach in order to estimate the Survival Function. 1.4 (1 point) Estimate H(80). 1.5 (1 point) Estimate Prob[65 < age of death ≤ 75]. 1.6 (2 points) What is the variance of the estimate in the previous question? 1.7 (1 point) Estimate Prob[70 ≤ age of death ≤ 80]. 1.8 (2 points) What is the variance of the estimate in the previous question? 1.9 (1 point) Estimate 10p 60. 1.10 (1 point) What is the variance of the estimate in the previous question, conditional on S(60)? 1.11 (1 point) Estimate 15q65. 1.12 (1 point) What is the variance of the estimate in the previous question, conditional on S(65)? 1.13 (1 point) Estimate 10|5q70. 1.14 (1 point) What is the variance of the estimate in the previous question, conditional on S(70)? 1.15 (2 points) A portfolio of policies has produced the following 9 losses: 500, 1000, 1000, 2000, 4000, 5000, 5000, 10000, 25000 Using the empirical model, estimate H(5000). (A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5
Use the following distribution of insureds by number of claims for the next 4 questions: Number of Claims 0 1 2 3 4 5&+ All Number of Insureds 490 711 572 212 15 0 2000 1.16 (1 point) What is the empirical estimate of the probability of having 2 claims? 1.17 (1 point) What is the variance of the estimate in the previous question? 1.18 (1 point) What is the empirical estimate of the probability of having more than 2 claims? 1.19 (1 point) What is the variance of the estimate in the previous question?
1.20 (2 points) One hundred machines are observed from when they are new. All were observed until failure. The following table gives the age at failure: Age Number of Failures 1 3 2 11 3 23 4 25 5 18 6 10 7 5 8 2 9 1 10 1 11 0 12 1 Let Y be the number of machines, out of 100, that fail after age 6. Let V1 denote the estimated variance of Y, if calculated without any distribution assumption. Let V2 denote the variance of Y, if calculated knowing that the survival function is that of a Weibull Distribution with τ = 2 and θ = 5. Determine V2 - V1 . (A) Less than -10 (B) At least -10, but less than -5 (C) At least -5, but less than 5 (D) At least 5, but less than 10 (E) At least 10
1.21 (2 points) The number of employees leaving a company for all reasons is tallied by the number of months since hire. The following data was collected for a group of 50 employees hired one year ago: Number of Months Since Hire Number Leaving the Company 1 1 2 1 3 2 5 2 7 1 10 1 12 1 Note: Assume that employees always leave the company after a whole number of months. Determine an approximate 95% confidence interval for S(6). 1.22 (1 point) A mortality study is done using all the death certificates from prior to 2000. Which of the following are true? A. This data is neither truncated nor censored. B. This data is left truncated. C. This data is right censored. D. This data is left truncated and right censored. E. None of the above. 1.23 (3 points) For a certain population, you are given that the hazard rate h(x) = .002 1.04x, x > 0. Calculate S(60). (A) 0.54 (B) 0.56
(C) 0.58
(D) 0.60
(E) 0.62
1.24 (160, 5/87, Q.7) (2.1 points) Two mortality studies with complete data have produced independent estimators of S(10). ^
Study 1 began with 300 lives and produced S 1 (10) = 0.60. ^
Study 2 began with 100 lives and produced S 2 (10) = 0.50. Calculate the estimated variance of the difference between these estimators. (A) 0.0008 (B) 0.0017 (C) 0.0025 (D) 0.0033 (E) The value cannot be determined from the information given.
1.25 (160, 5/90, Q.7) (2.1 points) From a complete data sample of n mice, you are given: (i) There are 45 survivors at the end of five weeks. (ii) There are 38 survivors at the end of six weeks. (iii) The estimate of the unconditional variance of the number of deaths between the end of the fifth and sixth week is 6.17. Calculate n. (A) 47 (B) 51 (C) 55 (D) 59 (E) 63 1.26 (160, 11/90, Q.3) (1.9 points) For a certain population, you are given that the hazard rate h(x) = 1/(1 + x), x > 0. Calculate 10|5q9 , the probability that an individual age 9, lives for 10 years and then dies within 5 years. (A) 0.096 (B) 0.097 (C) 0.098
(D) 0.099
(E) 0.100
1.27 (160, 5/91, Q.1) (1.9 points) You are given the hazard rate function: h(t) = 2t/(1 + t²), t > 0. Calculate 2q1.
(A) 0.5
(B) 0.6
(C) 0.7
(D) 0.8
(E) 0.9
1.28 (160, 5/91, Q.10) (1.9 points) A cohort of 10 lives is observed over the interval (0, 6]. You are given: (i) The observed times of death are 1, 2, 3, 4, 4, 5, 5, 6, 6, 6. ^
(ii) VU is the variance of S (3) when the survival distribution is assumed to be uniform on (0, ω], with ω = 6. ^
(iii) VO is the estimated variance of S (3) when no assumption about the underlying survival distribution is made. Calculate VU - VO. (A) -0.004
(B) -0.002
(C) 0.000
(D) 0.002
(E) 0.004
1.29 (Course 160 Sample Exam #2, 1996, Q.4) (1.9 points) A cohort of ten individuals is observed from time t = 0 until all have failed. You are given: (i) Time t Number of failures at time t 2 1 3 1 5 1 7 2 10 1 12 2 13 1 14 1 (ii) V1 denotes the estimated variance of 3 q^ 7 if calculated without any distribution assumption. (iii) V2 denotes the variance of 3 q^ 7 if calculated based on the survival distribution: S(t) = 1 - t/15, 0 ≤ t ≤ 15. Calculate V1 - V2 . (A) -0.015
(B) -0.007
(C) 0.000
(D) 0.007
(E) 0.015
1.30 (4B, 5/97, Q.22) (3 points) You are given the following:
•
Losses follow a distribution with a cumulative distribution function F(x;θ) = 1 - e -x/θ, 0 < x < ∞ , θ > 0.
•
You wish to estimate θ by minimizing the distance function w1 [Fn (500) - F(500;θ)]2 + w2 [Fn (2000) - F(2000;θ)]2 where Fn (x) is the empirical cumulative distribution function.
•
You wish the weights, w1 and w2 , to be inversely proportional to the variances of Fn (500) and Fn (2,000), respectively.
Although the true value of θ is not known, determine which of the following sets of weights would satisfy the above criterion if θ were 1,000. A. w1 = 0.18, w2 = 0.82 B. w1 = 0.33, w2 = 0.67 C. w1 = 0.50, w2 = 0.50 D. w1 = 0.67, w2 = 0.33 E. w1 = 0.82, w2 = 0.18
2013-4-7,
Survival Analysis §1 Introduction,
1.31 (4, 11/05, Q.1 & 2009 Sample Q.214) (2.9 points) A portfolio of policies has produced the following claims: 100 100 100 200 300 300 300 400 Determine the empirical estimate of H(300). (A) Less than 0.50 (B) At least 0.50, but less than 0.75 (C) At least 0.75, but less than 1.00 (D) At least 1.00, but less than 1.25
HCM 10/16/12,
500
600
(E) At least 1.25 1.32 (CAS3L, 11/08, Q.12) (2.5 points) You are given the force of mortality, µ(x) =
1 . 3 (100 - x)
Calculate the probability that a life aged 70 will die between ages 75 and 80. A. Less than 0.04 B. At least 0.04, but less than 0.06 C. At least 0.06, but less than 0.08 D. At least 0.08, but less than 0.10 E. At least 0.10
Page 21
2013-4-7,
Survival Analysis §1 Introduction,
HCM 10/16/12,
Solutions to Problems: 1.1. S(70) = 13/20 = 0.65 1.2. Variance of S(70) = F(70)S(70)/20 = (.35)(.65)/20 = 0.011375. 1.3.
i
yi
ri
si
(ri - si)/ri
S(yi)
1 22 20 1 19/20 19/20 2 37 19 1 18/19 (19/20)(18/19) = 18/20 3 51 18 1 17/18 (18/20)(18/19) = 17/20 4 64 17 1 16/17 (17/20)(16/17) = 16/20 5 68 16 2 14/16 (16/20)(14/16) = 14/20 6 70 14 1 13/14 (14/20)(13/14) = 13/20 7 71 13 1 12/13 (13/20)(12/13) = 12/20 8 75 12 1 11/12 (12/20)(11/12) = 11/20 9 76 11 1 10/11 (11/20)(10/11) = 10/20 10 77 10 2 8/10 (10/20)(8/10) = 8/20 11 80 8 3 5/8 (8/20)(5/8) = 5/20 12 82 5 1 4/5 (5/20)(4/5) = 4/20 13 84 4 1 3/4 (4/20)(3/4) = 3/20 14 87 3 1 2/3 (3/20)(2/3) = 2/20 15 90 2 1 1/2 (2/20)(1/2) = 1/20 16 98 1 1 0 0 Comment: The same estimates as the Empirical Distribution Function. 1.4. Empirical estimate of S(80) = (# of items > 80)/(Total # of items) = 5/20 = 0.25. Empirical estimate of H(80) = -ln[S(80)] = -ln(0.25) = 1.386. Comment: S(t) = Exp[-H(t)]. H(t) = -ln[S(t)]. 1.5. Prob[65 < age of death ≤ 75] = S(65) - S(75) = 16/20 - 11/20 = 5/20 = 0.25. Alternately, 5 out of 20 values are in the interval; 5/20 = .25. 1.6. (Prob. in the interval)(1 - Prob. in the interval)/N = (.25)(1 - .25)/20 = 0.009375. Alternately, Var[S(65) - S(75)] = Var[S(65)] + Var[S(75)] - 2Cov[S(65) , S(75)] = F(65)S(65)/N + F(75)S(75)/N - 2F(65)S(75)/N = (4/20)(16/20)/20 + (9/20)(11/20)/20 - (2)(4/20)(11/20)/20 = 75/203 = 0.009375. 1.7. Prob[70 ≤ age of death ≤ 80] = S(70-ε) - S(80) = 14/20 - 5/20 = 9/20 = 0.45. Alternately, 9 out of 20 values are in the interval; 9/20 = 0.45.
Page 22
2013-4-7,
Survival Analysis §1 Introduction,
HCM 10/16/12,
Page 23
1.8. (Prob. in the interval)(1 - Prob. in the interval)/N = (0.45)(1 - 0.45)/20 = 0.012375. Alternately, Var[S(70-ε) - S(80)] = Var[S(70-ε)] + Var[S(80)] - 2Cov[S(70-ε) , S(80)] = F(70-ε)S(70-ε)/N + F(80)S(80)/N - 2F(70-ε)S(80)/N = (6/20)(14/20)/20 + (15/20)(5/20)/20 - (2)(6/20)(5/20)/20 = 99/203 = 0.012375. 1.9. 10p 60 = S(70)/S(60) = (13/20)/(17/20) = 13/17. 1.10. Var[10p 60 | S(60) = 17/20] = 10p60 10q60/17 = (13/17)(4/17)/17 = 0.0106. 1.11. 15q65 = 1 - S(80)/S(65) = 1- (5/20)/(16/20) = 11/16. 1.12. Var[15q65 | S(65) = 16/20] = 15p 65 15q65/16 = (5/16)(11/16)/16 = 0.0134. 1.13. 10|5q70 = {S(80) - S(85)}/S(70) = {(5/20) - (3/20)}/(13/20) = 2/13. 1.14. Var[10|5q70 | S(70) = 13/20] = 10|5q70(1 - 10|5q70)/13 = (2/13)(11/13)/13 = 0.0100. 1.15. E. Empirical Model ⇔ Empirical Distribution Function. Empirical Survival Function at 5000 is: (# of items > 5000)/(Total # of items) = 2/9. Empirical estimate of H(1000) = -ln[S(5000)] = -ln(2/9) = 1.504. Comment: Similar to 4, 11/05, Q.1. 1.16. Prob[N = 2] = 572/2000 = 0.286. 1.17. (0.286)(1 - 0.286)/2000 = 0.000102. 1.18. Prob[N > 2] = (212 + 15)/2000 = .1135. 1.19. (0.1135)(1 - 0.1135)/2000 = 0.0000503. Comment: Similar to Exercise 14.14 in Loss Models. An approximate 95% confidence interval is: .114 ± .014. 1.20. D. The Empirical Survival Function at 6 is: (5 + 2 + 1 + 1 + 1)/100 = 0.1. V 1 = (100)(0.1)(1 - 0.1) = 9. The Survival Function of the Weibull at 6 is: exp[-(6/5)2 ] = 0.237. V 2 = (100)(0.237)(1 - 0.237) = 18.1. V2 - V1 = 18.1 - 9 = 9.1. Comment: Similar to Exercise 14.19 in Loss Models. We are adding rather than averaging independent, identically distributed variables.
2013-4-7,
Survival Analysis §1 Introduction,
HCM 10/16/12,
Page 24
1.21. S(6) = 1 - 6/50 = .88. Variance of that estimate is: (0.88)(1- 0.88)/50 = 0.00211. 95% confidence interval for S(6): 0.88 ± (1.960) 0.00211 = 0.88 ± 0.09 = (0.79, 0.97). 1.22. E. This data would include for example some people born in 1940 who have already died. However, someone born in 1940, who survived past age 60 would not be in this data. The data for people born in 1940 would be truncated from above (right truncated) at 60. Similarly, the data for people born in 1920 would be right truncated at 80. Comment: Loss Models does not discuss right truncation. Actually since the death certificates were not issued or are not available before some date, some of the data is also left truncated. For example, if the starting date were 1900, then anyone born in 1860 who died before age 40 is not in the data. Additional real world complications include people moving, in particular immigration and emigration. x
1.23. E. H(x) =
x
t=x
0
t=0
∫ h(t)dt = .002 ∫ 1.04t dt = .002 1.04 t / ln(1.04)] 0
= .002(1.04x - 1)/ln(1.04).
S(x) = exp[-H(x)] = exp[-.002(1.04x - 1)/ln(1.04)]. S(60) = exp[-0.002(1.0460 - 1)/ln(1.04)] = 0.6154. Comment: Gompertz Law, with parameters c = 1.04 and B = 0.002. ^
^
1.24. D. Var[ S 1 (10)] = (0.6)(1 - 0.6)/300 = 0.0008. Var[ S 2 (10)] = (0.5)(1 - 0.5)/100 = 0.0025. ^
^
^
^
Var[ S 1 (10) - S 2 (10)] = Var[ S 1 (10)] + Var[ S 2 (10)] = 0.0008 + 0.0025 = 0.0033. Comment: Since the estimators are independent, their variances add. 1.25. D. S(5) - S(6) = 45/n - 38/n = 7/n. Variance of the number of deaths between the end of the fifth and sixth week is: (7/n)(1 - 7/n)n = 7 - 49/n = 6.17. ⇒ n = 59. 1.26. E. H(t) = ∫ 0t h(x)dx = ln(1 + t). S(t) = exp[-H(t)] = 1/(1 + t). 10|5q 9 = {S(19) - S(24)}/S(9) = (1/20 - 1/25)/(1/10) = 0.1. 1.27. D. H(t) = ∫ 0t h(x)dx = ln(1 + t2 ). S(t) = exp[-H(t)] = 1/(1 + t2 ). 2 q1
= {S(1) - S(3)} / S(1) = (1/2 - 1/10) / (1/2) = 4/5 = 0.8.
2013-4-7,
Survival Analysis §1 Introduction, ^
HCM 10/16/12,
Page 25
^
1.28. E. Assuming uniform on (0, 6], S (3) = 1/2. Var[ S (3)] = (0.5)(1 - 0.5)/10 = 0.025. ^
^
With no distributional assumption, S (3) = 7/10. Var[ S (3)] = (0.7)(1 - 0.7)/10 = 0.021. V U - VO = 0.025 - 0.021 = 0.004. 1.29. A. There are 5 alive at age 7 of which one dies within 3 years. Therefore, based on the data, 3 q^ 7 = 1/5. V1 = (0.2)(0.8)/5 = 0.032. S(7) = 1 - 7/15 = 0.5333. S(10) = 1 - 10/15 = 0.3333. 3 q^ 7 = (0.5333 - 0.3333)/0.5333 = 0.375. V 2 = (0.375)(0.625)/5 = 0.0469. V1 - V2 = 0.032 - 0.0469 = -0.0149. 1.30. B. The Empirical Distribution Function at x has a variance of: F(x){1-F(x)} / N. In this case, F(x){1-F(x)} / N = (1 - e-x/θ)(e-x/θ) /N. Thus if θ = 1000, then w1 is inversely proportional to: (1-e-1/2)(e-1/2) /N = 0.239/N, while w2 is inversely proportional to: (1-e-2)(e-2) /N = 0.117/N. Thus w1 is proportional to: 4.18N, while w2 is proportional to: 8.55N. Thus w1 = 4.18/(4.18 + 8.55) = 0.33 and w2 = 8.55/(4.18 + 8.55) = 0.67. 1.31. D. Empirical estimate of S(300) = (# of items > 300)/(Total # of items) = 3/10. Empirical estimate of H(300) = -ln[S(300)] = -ln(0.3) = 1.204. Comment: S(t) = Exp[-H(t)]. H(t) = -ln[S(t)]. The empirical survival function is equal to the Product Limit estimator for this complete data set. While they intended for one to use the empirical survival function, some might have used instead the Nelson-Aalen estimator: H(300) = 3/10 + 1/7 + 3/6 = 0.943, getting a different answer. 1.32. C. h(x) = 1/{3(100 - x)}. x
H(x) =
∫ h(t) dt = (-1/3){ln(100 - x) - ln(100)} = (-1/3) ln(1 - x/100). 0
S(x) = exp[-H(x)] = exp[ln(1 - x/100) / 3] = (1 - x/100)1/3. S(70) = (1 - 70/100)1/3 = 0.6694. S(75) = (1 - 75/100)1/3 = 0.6300. S(80) = (1 - 80/100)1/3 = 0.5848. Probability that a life aged 70 will die between ages 75 and 80 is: {S(75) - S(80)}/S(70) = (0.6300 - 0.5848)/0.6694 = 6.75%. Comment: Modified Demoivreʼs Law, with x ≤ 100.
Section 2, Kaplan-Meier Product-Limit Estimator19

The Kaplan-Meier Product-Limit Estimator uses the risk set approach discussed previously in order to estimate the survival function. When there is no censoring or truncation, the Kaplan-Meier Product-Limit estimator reduces to the empirical Survival Function. However, as will be discussed, the Kaplan-Meier Product-Limit estimator can also be used in the presence of truncation and/or censoring.

Right Censoring / Censoring from Above:

Six women are observed starting from when they were born in 1914:
Ann dies at age 68. Betty dies at age 77. Carol dies at age 77. Debra dies at age 84.
Edith is alive at age 90 when the study ends in 2004.
Fran leaves the study at age 80, and dies sometime after that.
As previously, let ri = number of lives that could have failed at age yi, and let si = number of lives that failed at age yi.
i   yi   ri   si   (ri - si)/ri   S(yi)
1   68   6    1    5/6            5/6
2   77   5    2    3/5            (5/6)(3/5) = 3/6
3   84   2    1    1/2            (3/6)(1/2) = 1/4
Note that Fran is no longer in the study after age 80. Therefore, Fran is not available to be observed to die at age 84. Thus r3 is only 2, corresponding to Debra and Edith. As before, we calculate the survival ratios, (ri - si)/ri. Then the Product-Limit estimator of the survival function is a product of the survival ratios. We estimate the survival function as: S(t) = 1 for t < 68, S(t) = 5/6 for 68 ≤ t < 77, S(t) = 3/6 for 77 ≤ t < 84, and S(t) = 1/4 for 84 ≤ t ≤ 90.20 19 20
See pages 345-346 of Loss Models. We will later discuss how to estimate the tail of the survival function.
Note that prior to age 80, the first censorship point, the Product-Limit estimator is equal to the empirical Survival Function.
The Product-Limit estimator is based on the assumption that knowledge of a censoring time for an individual provides no further information about this personʼs likelihood of survival at a future time had the individual continued in the study.
yj is an uncensored value in the data. There are k unique values in the sample of size n.
sj ⇔ the number of values that failed at yj.
rj is the risk set at yj ⇔ the number of values that could have failed at yj.
The Product-Limit Estimator of the Survival Function is:
Sn(t) = ∏ over i = 1 to j-1 of (ri - si)/ri, for yj-1 ≤ t < yj.
Sn(yk) = ∏ over i = 1 to k of (ri - si)/ri, where the product is taken over all yi.
If the largest value is an uncensored value, in other words an observed death, and there are no censored values of the same size, then sk = rk and Sn(yk) = 0. If instead sk < rk, then Sn(yk) > 0.
In the example with 6 women, our last observed death was at 84, and the largest censorship point was 90. Ŝ(90) = 1/4 > 0. However, we assume that everybody will eventually die, and therefore that the survival function must go to zero as t approaches infinity. Thus it makes sense to have the estimated survival function decline after age 90.
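As an illustration only, the estimator multiplies a survival ratio at each observed death time; the Python sketch below reproduces the six-women example above.

# Sketch: Kaplan-Meier Product-Limit estimate for the six women above.
deaths   = [68, 77, 77, 84]   # observed ages at death
censored = [90, 80]           # Edith alive at 90; Fran leaves the study at 80

S, estimate = 1.0, {}
for y in sorted(set(deaths)):
    r = sum(t >= y for t in deaths) + sum(c >= y for c in censored)  # risk set
    s = deaths.count(y)                                              # deaths at y
    S *= (r - s) / r
    estimate[y] = S

print(estimate)   # {68: 0.833..., 77: 0.5, 84: 0.25}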
Extrapolating the Tail of the Survival Function:

Loss Models assumes an exponential decline in the survival function beyond the last observation or censorship point.21 In this example, for t > 90, we estimate S(t) = (1/4)^(t/90). Thus in this case, we estimate S(100) = (1/4)^(100/90) = 0.214.
Let w = maximum of the censorship points and data elements.
Then for t > w, let Sn(t) = {∏ over i = 1 to k of (ri - si)/ri}^(t/w), where the product is taken over all yj.
Exercise: There are 5 losses from a policy with a limit of 25,000: 3000, 5000, 18,000, 25,000, and 25,000. There are also 5 losses from a policy with a limit of 50,000: 5000, 12,000, 28,000, 32,000, 50,000. Use the combined data and the Kaplan-Meier Product-Limit Estimator in order to estimate the Survival Function. Extrapolate the tail of the Survival Function using the method used in Loss Models.
[Solution:
i   yi       ri   si   (ri - si)/ri   S(yi)
1   3000     10   1    9/10           9/10
2   5000     9    2    7/9            (9/10)(7/9) = 7/10
3   12,000   7    1    6/7            (7/10)(6/7) = 6/10
4   18,000   6    1    5/6            (6/10)(5/6) = 5/10
5   28,000   3    1    2/3            (5/10)(2/3) = 1/3
6   32,000   2    1    1/2            (1/3)(1/2) = 1/6.
The last observed value is at 32,000, while the largest censorship point is 50,000. Thus we extrapolate beyond 50,000. For x > 50,000, S(x) = (1/6)^(x/50000).
Comment: Note that below the smallest censorship point of 25,000, since there is no truncation, the estimated Survival Function is the Empirical Survival Function.]
For example, we estimate S(100,000) = (1/6)^(100000/50000) = 1/6² = 2.78%, and S(200,000) = (1/6)^(200000/50000) = 1/6⁴ = 0.077%.
21 See page 346 of Loss Models. While in a particular application other extrapolation techniques may be more reasonable, the exponential is the only one presented by Loss Models.
Here is a graph of the estimated survival function:
[Graph: step plot of the estimated survival function versus size of loss, from 0 to about 150,000, including the exponential extrapolation beyond 50,000.]
For example, the Product Limit Estimator is constant on the interval [18,000, 28,000) between the observations at 18,000 and 28,000. S(x) = 1/2 for 18,000 ≤ x < 28,000. Left Truncation / Truncation from Below: Five women are observed: Ann dies at age 68, enters the study at birth. Betty dies at age 77, enters the study at birth. Carol dies at age 77, enters the study at birth. Debra dies at age 84, enters the study at birth. Gale enters the study at age 70 and dies at age 79. Exercise: What is the risk set at age 68? [Solution: The four lives: Ann, Betty, Carol, and Debra.]
Exercise: What is the risk set at age 77?
[Solution: The four lives: Betty, Carol, Debra, and Gale.]
Note that Galeʼs data is truncated from below at age 70.22
We can estimate the Survival Function using the Product-Limit Estimator in the same manner as before:
i   yi   ri   si   (ri - si)/ri   S(yi)
1   68   4    1    3/4            3/4
2   77   4    2    2/4            (3/4)(2/4) = 3/8
3   79   2    1    1/2            (3/8)(1/2) = 3/16
4   84   1    1    0              0
Exercise: The following 4 losses are from policies with a deductible of 500: 700, 1100, 1500, 2700.
The following 3 losses are from policies with a deductible of 1000: 1200, 1500, 2100.
Use the Product-Limit Estimator to estimate the survival function.
[Solution:
i   yi     ri   si   (ri - si)/ri   S(yi)/S(500)
    500    4    0    1              1
1   700    4    1    3/4            3/4
2   1100   6    1    5/6            (3/4)(5/6) = 5/8
3   1200   5    1    4/5            (5/8)(4/5) = 1/2
4   1500   4    2    2/4            (1/2)(2/4) = 1/4
5   2100   2    1    1/2            (1/4)(1/2) = 1/8
6   2700   1    1    0              0 ]
Note that in this case, all the data is left truncated and the smallest truncation point is 500. Thus we have no information on what happens below 500. Thus we are really estimating S(yi)/S(500), the probability of survival to yi conditional on survival to 500. In general, when all the data is left truncated, in other words when nobody is in the study from birth, then the Product-Limit estimator gives an estimate of the survival conditional on survival to the smallest truncation point. For example, if all of the lives enter a study at age 50, then the estimated survival functions are all conditional upon survival to 50: S(x)/S(50).
22 While Gale made it into the study, she is an exemplar of a set of people, some of whom could not enter the study at age 70 because they had died before age 70. This is similar to a set of losses on a policy with a $1000 deductible, which are truncated from below at 1000. In this case a loss of size $3000 would make it into the data base, but would still be called truncated from below. See “Mahlerʼs Guide to Loss Distributions.”
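As an illustration only, the same loop handles left truncation once the risk set requires d < y ≤ size; the Python sketch below reproduces the deductible exercise above.

# Sketch: Product-Limit estimate with left truncation at the deductibles.
losses = [(700, 500), (1100, 500), (1500, 500), (2700, 500),
          (1200, 1000), (1500, 1000), (2100, 1000)]   # (size, deductible)

S, estimate = 1.0, {}
for y in sorted({size for size, d in losses}):
    r = sum(d < y <= size for size, d in losses)   # could have failed at y
    s = sum(size == y for size, d in losses)       # did fail at y
    S *= (r - s) / r
    estimate[y] = S   # estimates S(y)/S(500)

print(estimate)
# {700: 0.75, 1100: 0.625, 1200: 0.5, 1500: 0.25, 2100: 0.125, 2700: 0.0}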
Left Truncation and Right Censorship: Eight women are observed: Ann dies at age 68, enters the study at birth. Betty dies at age 77, enters the study at birth. Carol dies at age 77, enters the study at birth. Debra dies at age 84, enters the study at birth. Edith enters the study at birth and is alive at age 90 when the study ends. Fran enters the study at birth and leaves the study at age 80. Gale enters the study at age 70 and dies at age 79. Helen enters the study at age 65 and remains alive at age 90 when the study ends. Note that Edithʼs data is right censored at age 90, Franʼs data is right censored at age 80, Galeʼs data is left truncated at age 70, and Helenʼs data is left truncated at age 65 and right censored at age 90. Exercise: Estimate the Survival Function using the Product-Limit Estimator. [Solution: At age 68, the risk set is size 7: A through F, H. At age 77, the risk set is size 7: B through H. At age 79, the risk set is size 5: D through H. At age 84, the risk set is size 3: D, E, H. i yi ri si (ri - si)/ri S(yi) 1 68 7 1 6/7 6/7 2 77 7 2 5/7 (6/7)(5/7) = 30/49 3 79 5 1 4/5 (30/49)(4/5) = 24/49 4 84 3 1 2/3 (24/49)(2/3) = 16/49 = .327 Comment: For this very small sample, the estimated survival function at 90 is .327.] A value left truncated at d, enters the risk set at t > d, and is not in the risk set at d.23 Thus Gale is not in risk set at 70, but is in the risk set immediately after 70. An individual that dies at age x, is in the risk set at x. Thus Betty is in the risk set at 77. Betty could have been observed to die at age 77, and was. A value right censored at u, is in the risk set at u. Thus Fran is in the risk set at 80. Fran could have been observed to die at age 80. Exercise: Using the result of the previous exercise, estimate S(110). [Solution: S(110) = S(90)110/90 = (16/49)110/90 = 0.255. Comment: The Exponential Distribution has too heavy of a righthand tail to accurately extrapolate human mortality.] 23
If an individual enters the study at age d, then it is alive at age d, and was not available to die at age d. For a loss on a policy with deductible d to enter our data base, it must be of size strictly greater than d.
Here is a graph of the estimated Survival Function:
[Graph: step plot of the estimated survival function versus age, from about 65 to 140, including the exponential extrapolation beyond age 90.]
Exercise: In the previous exercises, estimate 10q70, 8|5q70, and 20p80.
[Solution: 10q70 = 1 - S(80)/S(70) = 1 - (24/49)/(6/7) = 1 - 4/7 = 3/7.
8|5q70 = {S(78) - S(83)}/S(70) = {(30/49) - (24/49)}/(6/7) = 1/7.
20p80 = S(100)/S(80) = (16/49)^(100/90) / (24/49) = 0.589.]
Exercise: You observe the following payments from 10 losses:
Deductible   Maximum Covered Loss   Payment
None         None                   300
None         None                   70,000
500          None                   11,500
1000         None                   1,000
None         25,000                 11,000
None         50,000                 50,000
500          25,000                 14,500
500          100,000                200
1000         50,000                 49,000
1000         100,000                28,000
Note that the maximum payment on any loss is: Maximum Covered Loss minus Deductible.
Estimate the Survival Function using the Product-Limit Estimator.
[Solution:
Deductible   Maximum Covered Loss   Payment   Size of Loss
None         None                   300       300
None         None                   70,000    70,000
500          None                   11,500    12,000
1000         None                   1,000     2,000
None         25,000                 11,000    11,000
None         50,000                 50,000    50,000 or more
500          25,000                 14,500    15,000
500          100,000                200       700
1000         50,000                 49,000    50,000 or more
1000         100,000                28,000    29,000

i   yi       ri   si   (ri - si)/ri   S(yi)
1   300      4    1    3/4            3/4
2   700      6    1    5/6            (3/4)(5/6) = 5/8
3   2000     8    1    7/8            (5/8)(7/8) = 35/64
4   11,000   7    1    6/7            (35/64)(6/7) = 15/32
5   12,000   6    1    5/6            (15/32)(5/6) = 25/64
6   15,000   5    1    4/5            (25/64)(4/5) = 5/16
7   29,000   4    1    3/4            (5/16)(3/4) = 15/64
8   70,000   1    1    0              0
Comment: Note that between 1000 and 29,000 there was no effect of truncation and censoring. Out of 8 values present at 1000, 3 of them survive beyond 29,000. Therefore, S(29,000) = (3/8) S(1000) = (3/8)(5/8) = 15/64.]

Time Intervals with No Entries or Withdrawals:

In general, when on an interval there are no entries or withdrawals (for causes other than death), one can group together the deaths and treat the Product-Limit Estimator as one would the Empirical Survival Function.
Exercise: No individuals enter or withdraw from a mortality study between the ages of 40 and 60. The Product-Limit estimate of S(40) is 0.84. 32 lives survive past age 40. There are single deaths at ages: 42, 43, 47, 50, 54, 57, 58, and 60. Determine the Product-Limit estimate of S(60).
[Solution: There are 8 deaths on (40, 60]. S(60) = {(32 - 8)/32} S(40) = (0.75)(0.84) = 0.63.
Alternately, S(60) = (0.84)(31/32)(30/31)(29/30)(28/29)(27/28)(26/27)(25/26)(24/25) = 0.63.]
Estimating the Mean:

In general, E[X] = ∫0^∞ S(t) dt.24
In other words, the mean is the area under the graph of the Survival Function.
The Kaplan-Meier Product Limit Estimator produces estimates of the Survival Function that are constant on intervals. Therefore, one can estimate the mean as a sum of survival function estimates on intervals times the width of those intervals.
For a previous exercise with size of loss data, we estimated:
S(x) = 1, x < 300. S(x) = 3/4, 300 ≤ x < 700. S(x) = 5/8, 700 ≤ x < 2000. S(x) = 35/64, 2000 ≤ x < 11,000. S(x) = 15/32, 11,000 ≤ x < 12,000. S(x) = 25/64, 12,000 ≤ x < 15,000. S(x) = 5/16, 15,000 ≤ x < 29,000. S(x) = 15/64, 29,000 ≤ x < 70,000. S(x) = 0, x ≥ 70,000.
[Graph: step plot of this estimated survival function, decreasing from 1 to 0 as the size of loss increases to 70,000.]
Exercise: For this example, estimate the mean. [Solution: The mean is the area under the survival function: (1)(300) + (3/4)(700 - 300) + (5/8)(2000 - 700) + (35/64)(11000 - 2000) + (15/32)(12000 - 11000) + (25/64)(15000 - 12000) + (5/16)(29000 - 15000) + (15/64)(70000 - 29000) = 21,959. Comment: In this case, the estimated survival function is zero at 70,000. If instead the last observed value was censored, as discussed previously, one could extrapolate exponentially. In that case, one would have to include the area under this extrapolated tail of the Survival Function.] 24
24 For distributions with support starting at zero. See “Mahlerʼs Guide to Loss Distributions.”
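As an illustration only, the mean is just a sum of heights times widths of the step survival function; the Python sketch below reproduces the 21,959 above.

# Sketch: mean as the area under the step survival function from the example.
breakpoints = [0, 300, 700, 2000, 11000, 12000, 15000, 29000, 70000]
S_values    = [1, 3/4, 5/8, 35/64, 15/32, 25/64, 5/16, 15/64]   # S on each interval

mean = sum(S * (b - a)
           for S, a, b in zip(S_values, breakpoints[:-1], breakpoints[1:]))
print(mean)   # 21959.375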
Estimating the Mean Residual Life:

In general, E[X ∧ x] = ∫0^x S(t) dt.25
Therefore, the losses excess of x are: E[X] - E[X ∧ x] = ∫x^∞ S(t) dt.26
In other words, the losses excess of x are the area under the graph of the Survival Function, from x to infinity.
Exercise: For the above example, estimate the dollars of loss excess of 20,000.
[Solution: E[X] - E[X ∧ 20000] = (5/16)(29000 - 20000) + (15/64)(70000 - 29000) = 12,422.]
The mean residual life or mean excess loss at x is: e(x) = (E[X] - E[X ∧ x])/S(x).27
Exercise: For the above example, estimate the mean residual life at 20,000.
[Solution: 12,422 / (5/16) = 39,750.]

Probability on Closed Versus Open Intervals:

S(16) = Prob[X > 16]. Thus S(16) - S(29) = Prob[X > 16] - Prob[X > 29] = Prob[16 < X ≤ 29].
S(15.999) - S(29) = Prob[X > 15.999] - Prob[X > 29] = Prob[15.999 < X ≤ 29] = Prob[16 ≤ X ≤ 29].
Ŝ(16) and Ŝ(16-) = Ŝ(15.999) would differ if there were a death observed at exactly age 16.28

Properties of the Product-Limit Estimator:

At times of death, the Product Limit Estimator is unbiased.29 Under certain regularity conditions, the Product-Limit estimator is a nonparametric maximum likelihood estimator. The Product-Limit estimator is consistent.
25
See “Mahlerʼs Guide to Loss Distributions.” See “Mahlerʼs Guide to Loss Distributions.” 27 See “Mahlerʼs Guide to Loss Distributions.” 28 See 4, 11/00, Q.4. 29 See page 356 of Loss Models. 26
Kernel Smoothing:30

The estimate of the Survival Function has jumps at each of the reported values; there is a corresponding discrete density. For example, in the previous exercise involving losses, f(700) = S(300) - S(700) = 3/4 - 5/8 = 1/8.
Exercise: What is the discrete density at 11,000?
[Solution: f(11,000) = S(2000) - S(11,000) = 35/64 - 15/32 = 5/64.
Comment: It turns out that in this case many of the other discrete densities, such as f(12,000), f(15,000), and f(29,000), are also 5/64.]
One can apply Kernel Smoothing to these discrete densities.
Exercise: Use a uniform kernel with bandwidth of 3000 to estimate f(13,000).
[Solution: The uniform kernel goes from 10,000 to 16,000 with density 1/6000. Only the three discrete densities at 11,000, 12,000, and 15,000 contribute. The kernel smoothed estimate of f(13,000) is:
{f(11,000) + f(12,000) + f(15,000)}/6000 = (5/64 + 5/64 + 5/64)/6000 = 0.0000391.]
Exercise: Use a triangular kernel with bandwidth of 5000 to estimate f(25,000).
[Solution: The triangular kernel has height 1/5000 at 25,000 and height zero at 20,000 and 30,000.
[Graph: triangular kernel centered at 25,000, rising from 0 at 20,000 to 0.0002 at 25,000 and falling back to 0 at 30,000.]
The kernel has a density at 29,000 of: (1/5)(1/5000) = 1/25000. Only the discrete density at 29,000 contributes. f(29,000) = S(15,000) - S(29,000) = 5/64. The kernel smoothed estimate of f(25,000) is: (5/64)/25000 = 0.0000031.]
30 See “Mahlerʼs Guide to Fitting Loss Distributions” and Section 14.3 of Loss Models.
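As an illustration only, the Python sketch below applies a uniform kernel to the discrete Product-Limit densities and reproduces the f(13,000) estimate above.

# Sketch: uniform-kernel smoothing of the discrete Product-Limit densities.
discrete = {300: 1/4, 700: 1/8, 2000: 5/64, 11000: 5/64,
            12000: 5/64, 15000: 5/64, 29000: 5/64, 70000: 15/64}

def uniform_kernel_density(x, bandwidth, points):
    total = 0.0
    for y, p in points.items():
        if abs(x - y) <= bandwidth:      # kernel has density 1/(2b) on [y-b, y+b]
            total += p / (2 * bandwidth)
    return total

print(uniform_kernel_density(13000, 3000, discrete))   # about 0.0000391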
Risk Sets:

ri is the risk set at yi ⇔ the number of values that could have failed at yi. In some exam questions you will be given the ri. In others, you will have to calculate them yourself.
In order to calculate the risk set, one approach is to just count up the number of values that could fail at each age/amount. In the previous example involving losses, at 700 there are 6 items that could fail:
Deductible   Maximum Covered Loss   Payment   Size of Loss      Could Fail @ 700?
None         None                   300       300               No
None         None                   70,000    70,000            Yes
500          None                   11,500    12,000            Yes
1000         None                   1,000     2,000             No
None         25,000                 11,000    11,000            Yes
None         50,000                 50,000    50,000 or more    Yes
500          25,000                 14,500    15,000            Yes
500          100,000                200       700               Yes
1000         50,000                 49,000    50,000 or more    No
1000         100,000                28,000    29,000            No
The loss of 300 could not fail at 700, since it had already failed at 300. The loss of 2000 could not fail at 700, because it is truncated from below at 1000; we only started to observe it after 1000 > 700. The payment of size 50,000 could not be observed to fail at 70,000, because it it censored from above at 50,000.31 Thus it would not be in the risk set at 70,000.
31 The insurer has paid 50,000, and the actuary would only know that the loss was of size 50,000 or more. The actuary could not observe the loss to be of size 70,000. If a person withdraws from a mortality study at age 50, then we know they died at some age greater than 50. That person would not be in the risk set at 70.
Some Approaches to Calculating the Risk Set:

In order to calculate the risk set, one can use the following formulas or verbal approaches.32
xi is a value in the data.
di is the left truncation point for xi. If xi is not left truncated, di = 0.33
ui is the right censoring point for xi. If xi is not right censored, ui = ∞.34 If there is a maximum covered loss that did not affect a particular payment, then set ui = ∞.
yi are the uncensored values in the data.
In the previous example:
di     ui        Payment   Size of Loss      yi
0      ∞         300       300               300
0      ∞         70,000    70,000            70,000
500    ∞         11,500    12,000            12,000
1000   ∞         1,000     2,000             2,000
0      ∞         11,000    11,000            11,000
0      50,000    50,000    50,000 or more    —
500    ∞         14,500    15,000            15,000
500    ∞         200       700               700
1000   50,000    49,000    50,000 or more    —
1000   ∞         28,000    29,000            29,000
For yj, an uncensored value in the data: rj is the risk set at yj ⇔ the number of values that could have failed at yj.
rj = (Number of yis ≥ yj) + (Number of uis ≥ yj) - (Number of dis ≥ yj).
For example at 700: There are 7 yis ≥ 700: 700, 2000, 11000, 12000, 15000, 29000, and 70000.
There are 2 uis ≥ 700: 50000, and 50000.
There are 3 dis ≥ 700: 1000, 1000, and 1000.
For yj, an uncensored value in the data: rj is the risk set at yj ⇔ the number of values that could have failed at yj. rj = (Number of yis ≥ yj) + (Number of uis ≥ yj) - (Number of dis ≥ yj). For example at 700: There are 7 yis ≥ 700: 700, 2000, 11000, 12000, 15000, 29000, and 70000. There are 2 uis ≥ 700: 50000, and 50000. There are 3 dis ≥ 700: 1000, 1000, and 1000. 32
See page 344 of Loss Models. Some people may find these formulas useful for exam purposes. Others will find useful the equivalent verbal forms given below. These formulas are useful for programming these techniques. 33 In the context of size of loss, d would be the size of a deductible. 34 In the context of size of loss, u would be the size of a maximum covered loss.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 39
r2 = 7 + 2 - 3 = 6, the same answer as gotten previously. This formula in verbal form is: Risk set at time of death yj = (Number who die at time ≥ yj) + (Number who leave the study at time ≥ yj) - (Number who enter the study at time ≥ yj) . Alternately: Risk set at time of death yj = (Number who enter the study at time < yj) - (Number who die at time < yj) - (Number who leave the study at time < yj) . This verbal form has an equivalent formula:35 rj = (Number of dis < yj) - (Number of yis < yj) - (Number of uis < yj) . For example at 700: There are 7 dis < 700: 0, 0, 0, 0, 500, 500, and 500. There is 1 yi < 700: 300. There are no uis < 700. r2 = 7 - 1 - 0 = 6, the same answer as gotten previously. Setting r0 = 0, the previous formula is equivalent to the following recursion formula:36 rj = rj - 1 + (Number of dis: yj - 1 ≤ di < yj) - sj - 1 - (Number of uis: yj - 1 ≤ ui < yj) . For example, given that r = 6 at 700, we get the risk set at 2000: There are 3 of dis: 700 ≤ di < 2000: 1000, 1000, and 1000. There is 1 failure at 700. There are no uis: 700 ≤ ui < 2000. Therefore, the risk set at 2000 is: 6 + 3 - 1 - 0 = 8. The verbal form of this recursion formula is: Change in the risk set = (Number of entries) - (Number of deaths) - (Number of withdrawals). One can use any of these formulas or verbal approaches in order to calculate the risk set. Applied correctly, they will all produce the same answer. 35 36
Formula 14.1 of Loss Models. Formula 14.2 of Loss Models.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 40
Problems: Use the following information for the next 6 questions: Group A Group B si ri si ri yi 1 2 3 4
10 15 20 15
100 80 60 30
15 20 30 5
100 90 70 40
2.1 (2 points) Determine the Product-Limit estimate of S(4) based on the data for Group A. (A) 24%
(B) 27%
(C) 30%
(D) 33%
(E) 36%
2.2 (2 points) Briefly discuss the truncation and/or censoring that could have produced the data for Group A. 2.3 (2 points) Determine the Product-Limit estimate of S(4) based on the data for Group B. (A) 24% (B) 27% (C) 30% (D) 33% (E) 36% 2.4 (2 points) Briefly discuss the truncation and/or censoring that could have produced the data for Group B. 2.5 (3 points) Determine the Product-Limit estimate of S(4) based on the data for both groups. (A) 24%
(B) 27%
(C) 30%
(D) 33%
(E) 36%
2.6 (1 point) Use extrapolation in order to determine the Product-Limit estimate of S(6) based on the data for both groups. (A) 15% (B) 17% (C) 19% (D) 21% (E) 23%
2.7 (2 points) The claim payments on a sample of ten policies are: 4 +
4
5+
5+
6
9
10+
12
15+
17
indicates that the loss exceeded the policy limit. Using the Product-Limit estimator, calculate the probability that the loss on a policy exceeds 10. (A) 0.47 (B) 0.50 (C) 0.53 (D) 0.56 (E) 0.59
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 41
2.8 (2 points) For a three year mortality study: 15 lives enter at the beginning. One additional life enters the study at each of the following times: 0.7, 1.3, 2.4. One life withdraws from the study at each of the following times: 0.5, 1.2, 1.6. One death occurs at each of the following times: 0.4, 0.8, 1.9, 2.7. What is the Product-Limit estimate of S(3)? (A) 72% (B) 73% (C) 74% (D) 75% (E) 76% 2.9 (1 point) For a mortality study you are given: (i) 200 people enter the estimation interval (x, x + 1] at exact age x. (ii) 35 people leave the study at x + 0.7. (iii) 15 deaths occur in (x, x + 0.7]. (iv) 5 deaths occur in (x + 0.7, x + 1]. Determine the product limit estimator of qx. (A) 0.100
(B) 0.102
(C) 0.104
(D) 0.106
(E) 0.108
Use the following information for the next 5 questions: You observe the following payments from 9 losses: Deductible Maximum Covered Loss Payment None 25,000 25,000 None 50,000 4,000 None 100,000 15,000 10,000 25,000 5,000 10,000 50,000 40,000 10,000 100,000 13,000 25,000 50,000 5,000 25,000 100,000 75,000 25,000 100,000 60,000 Note that the maximum payment on any loss is: Maximum Covered Loss minus Deductible. Use the Kaplan-Meier Product-Limit Estimator. 2.10 (1 point) Determine the estimate of S(60,000). 2.11 (1 point) Determine the estimate of S(150,000). 2.12 (1 point) Determine the estimate of Prob[15,000 ≤ X ≤ 50,000]. 2.13 (1 point) Determine the estimate of 15,000q10,000. 2.14 (1 point) Determine the estimate of 10,000|15,000q5,000.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 42
Use the following information for the next 8 questions: T represents the number of months from the time a claim is reported to the time the claim is closed. For ten claims all reported 36 months ago, 8 have been closed, with observed values of T: 2, 5, 5, 8, 9, 15, 26, 31. For an additional ten claims all reported 24 months ago, 6 have been closed, with observed values of T: 1, 3, 5, 12, 12, 20. 2.15 (1 point) Determine the Product-Limit estimate of S(30). 2.16 (1 point) Determine the Product-Limit estimate of S(48). 2.17 (1 point) Determine the Product-Limit estimate of Prob[12 ≤ T ≤ 36]. 2.18 (1 point) Determine the Product-Limit estimate of Prob[12 < T ≤ 36]. 2.19 (1 point) Determine the Product-Limit estimate of 6 q24. 2.20 (2 points) Determine the Product-Limit estimate of 12|18q6 . 2.21 (2 points) Use a uniform kernel with bandwidth of 5 in order to estimate f(25). 2.22 (3 points) Use a triangular kernel with bandwidth of 4 in order to estimate f(7).
2.23 (3 points) Eight cancer patients have reached a state of remission due to a particular chemotherapy. The length of time (in months), that they remain in remission is as follows: 8, 11+, 15, 21, 21+, 28, 36, 50, where + indicates an observation censored from above. Using the Product Limit Estimator, calculate the average length of remission. (A) 24 (B) 26 (C) 28 (D) 30 (E) 32 Use the following information for the next 3 questions: Five losses are from policies with a deductible of 500: 600, 800, 1000, 1400, 4000. Six losses are from policies with a deductible of 1000: 1100, 1400, 1800, 2000, 2000, 3000. Use the Product-Limit Estimator. 2.24 (1 point) Estimate 1000q500. 2.25 (1 point) Estimate 500q1000. 2.26 (2 points) Estimate 1000|2000q500.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 43
2.27 (3 points) All members of a study joined at birth, however, some may leave the study by means other than death. s8 = 5. s9 = 6. s10 = 3. The following product-limit estimates were obtained: S n (y8 ) = 0.8060, Sn (y9 ) = 0.7504, and Sn (y10) = 0.7141. Determine the number of censored observations between times y9 and y10. Assume no observations were censored at the death times. (A) 18 (B) 19 (C) 20 (D) 21 (E) 22 2.28 (3 points) For a ten year mortality study: 40 lives enter at the beginning. n lives entered the study at time 5. There were no withdrawals from the study. One death occurred at time 2, two deaths occurred at time 4, one death occurred at time 6, two deaths occurred at time 8, and one death occurred at time 10. The Product-Limit estimate of S(10) is 0.854. What is n? (A) 12 (B) 13 (C) 14 (D) 15 (E) 16 2.29 (2 points) 50 people enter a mortality study at time 0. At time 5, 15 people leave the study. 5 deaths occur before time 5. 10 deaths occur between time 5 and 15. Calculate the Product-Limit estimate of S(15). (A) 55% (B) 60% (C) 65% (D) 70% (E) 75% 2.30 (2 points) You are given the following data from a mortality study: Time 50 55 60 Number of Deaths -4 6 Product-Limit Estimate of S(t) 0.45 0.40 0.30 No new lives enter the study, but some lives may withdraw for reasons other than death. How many lives withdraw during 55 ≤ t < 60? A. 6 B. 7 C. 8 D. 9 E. 10 2.31 (2 points) 70 annuitants were observed from attainment of exact age 80. 5 deaths were observed; they occurred at ages: 80.2, 80.4, 80.4, 80.7, and 80.8. 20 annuitants left the study at age 80.3. 14 annuitants left the study at age 80.6. The remaining annuitants were alive at age 81. Determine the product limit estimate of q80. (A) 0.07
(B) 0.08
(C) 0.09
(D) 0.10
(E) 0.11
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Use the following information on ten males for the next 3 questions: Person Age at Entry Age at Withdrawal Age at Death Al 0 25 Bob 0 30 Cal 0 60 Don 0 75 Ed 0 80 Fred 10 75 Gil 20 80 Hal 30 45 Ian 40 90 Jim 50 85 Use the Product-Limit Estimator. 2.32 (2 points) Estimate S(75). (A) 48% (B) 49% (C) 50%
(D) 51%
(E) 52%
2.33 (2 points) Estimate the expected future lifetime at birth. (A) 66 (B) 68 (C) 70 (D) 72 (E) 74 2.34 (2 points) Estimate the mean residual life at age 65. (A) 12 (B) 14 (C) 16 (D) 18 (E) 20
2.35 (2 points) For a mortality study of laboratory mice, you are given: (i) 200 mice were observed at the start of the study. (ii) The following daily results were observed: t Deaths Withdrawals New Entrants 1 8 2 7 30 3 13 4 14 20 5 y 15 6 9 35 (iii) The product-limit estimate of S(t) at time t = 6 is 0.7321. Determine y. (A) 10 (B) 11 (C) 12 (D) 13 (E) 14
Page 44
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 45
2.36 (2 points) The results of a study with a calendar year 2003 observation period are as follows: Individual Birthdate Death A April 1, 1942 June 1, 2003 B July 1, 1942 C October 1, 1942 March 1, 2003 D January 1, 1943 E April 1, 1943 F July 1, 1943 October 1, 2003 G September 1, 1942 All individuals were in the study on January 1, 2003. No one left the study during the observation period other than by death. Using the Product-Limit estimator, estimate q60. (A) 0.42
(B) 0.44
(C) 0.46
(D) 0.48
(E) 0.50
Use the following information for the next two questions: For 10 lives you observed from birth the following: Age Event 27 1 death 33 1 death 46 1 death 58 1 death 60 2 withdrawals 63 1 death 67 1 death 72 1 death 79 1 death 2.37 (1 point) Estimate S(58), using the product limit estimator. A. 0.45 B. 0.50 C. 0.55 D. 0.60 E. 0.65 2.38 (1 point) Estimate S(75), using the product limit estimator. A. 0.10 B. 0.15 C. 0.20 D. 0.25 E. 0.30
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 46
Use the following information for the next 2 questions: There are a total of 10 payments from polices with different deductibles. From policies with no deductible: 300, 700, 2000, 7000. From policies with a deductible of 500: 400, 1000, 9000. From policies with a deductible of 1000: 200, 1000, 4000. Use the Product-Limit Estimator. 2.39 (2 points) Estimate the mean loss. A. 2150 B. 2200 C. 2250
D. 2300
E. 2350
2.40 (2 points) Estimate the loss elimination ratio at 1000. A. 0.32 B. 0.34 C. 0.36 D. 0.38 E. 0.40
2.41 (3 points) If an auto insurance policyholder has a claim, his policy is not renewed the next year. If an auto insurance policyholder does not pay his premium, his coverage is terminated. You are given the following information about 20 randomly selected auto insurance policyholders observed from time t = 0: Policyholder A did not pay his premium for year 3. Policyholder B had a claim in year 2. Policyholder C had a claim in year 4. Policyholder D had a claim in year 1. Policyholder E did not renew his policy for year 3. Policyholder F had a claim in year 3. Policyholder G did not pay his premium for year 2. Policyholder H did not renew his policy for year 5. Policyholder I had a claim in year 4. The 11 other policyholders paid their premiums and had no claims during their first 5 years of coverage. Using the Kaplan-Meier Product-Limit estimator, calculate the probability that a policyholder has no claims for 5 years. (A) Less than 0.70 (B) At least 0.70, but less than 0.71 (C) At least 0.71, but less than 0.72 (D) At least 0.72, but less than 0.73 (E) At least 0.73
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
2.42 (2 points) You are given: (i) All members of a mortality study are observed from birth. Some leave the study by means other than death. (ii) s2 = 35, s3 = 50, s4 = 20. (iii) The following Kaplan-Meier product-limit estimates were obtained: S n (y2 ) = 0.6927, Sn (y3 ) = 0.4303, Sn (y4 ) = 0.3124. (iv) Assume no observations were censored at the times of deaths. Determine how many observations were censored between times y3 and y4 . (A) 6
(B) 7
(C) 8
(D) 9
(E) 10
Use the following information on five females for the next 3 questions: Person Age at Entry Age at Withdrawal Age at Death Alice 0 58 Barbara 0 75 Cindy 60 90 Dorothy 60 64 Elizabeth 65 77 Use the Kaplan-Meier Product-Limit Estimator. 2.43 (2 points) Estimate S(80). (A) 14% (B) 15% (C) 16%
(D) 17%
(E) 18%
2.44 (2 points) Estimate the expected future lifetime at birth. (A) 66 (B) 68 (C) 70 (D) 72 (E) 74 2.45 (2 points) Estimate the mean residual life at age 70. (A) 12 (B) 14 (C) 16 (D) 18 (E) 20
Page 47
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 48
2.46 (3 points) Scientist Frank N. Stein is experimenting on orangutans. His assistant Igor shows an orangutan a box and a mango fruit. The orangutan has never seen this box before, which has a complicated latch. Igor places the mango fruit inside the box and latches the box closed. Then Igor puts the box inside the orangutanʼs cage and leaves. Dr. Stein observes how long from the beginning of the experiment until the orangutan opens the box and takes out the fruit. Dr. Stein repeats this experiment a total of 8 times on different orangutans. The results are as follows from the start of each experiment: Orangutan Result Beda Opened box in 7 minutes. Dumadi Opened box in 3 minutes. Fio Threw the box out of the cage after 4 minutes. Kasih Opened box in 6 minutes. Lomon Opened box in 5 minutes. Melati Opened box in 8 minutes.* Sayang Opened box in 10 minutes. Tengku Opened box in 5 minutes. * Dr. Stein was called away and did not observe the first 6 minutes of Melatiʼs efforts. Determine the Kaplan-Meier Product-Limit estimate of S(7). (A) 23% (B) 25% (C) 27% (D) 29% (E) 31% 2.47 (3 points) You observe the following claims data: Deductible Maximum Covered Loss Payment 500 5000 500 500 5000 3000 500 10,000 4500 500 10,000 9500 1000 5000 3000 1000 5000 4000 1000 10,000 2000 1000 10,000 5000 Determine the Kaplan-Meier Product-Limit estimate of S(6000)/S(500). (A) 13% (B) 16% (C) 19% (D) 22% (E) 25%
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 49
2.48 (3 points) Fifty cancer patients were observed from the time of diagnosis until the earlier of death or 40 months from diagnosis: Time In Months Since Diagnosis Number Of Deaths 5 1 10 3 15 2 20 x 25 x 30 2 35 0 40 3 Thirty additional cancer patients were observed starting at 12 months after diagnosis until the earlier of death or 40 months from diagnosis: Time In Months Since Diagnosis Number Of Deaths 15 1 20 2 25 0 30 1 35 2 40 1 ^
The Product Limit estimate S (40) is 0.678. Determine the value of x. (A) 0 (B) 1 (C) 2
(D) 3
(E) 4
2.49 (3 points) Barbie has estimated S(75)/S(50) = 0.6615, using the product limit estimator. Ken is checking Barbieʼs work. The risk set information she passed on to Ken unfortunately excludes any deaths prior to age 60: y s r 61 1 32 63 1 30 66 2 27 69 1 26 72 1 24 73 1 23 74 2 22 Ken notices that Barbie incorrectly used a death at age 73 as if it were at age 63. Determine the corrected estimate of S(75)/S(50). (A) 0.598 (B) 0.599 (C) 0.660 (D) 0.661 (E) Can not be determined
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
2.50 (3 points) For observation i of a survival study: • di is the left truncation point • xi is the observed value if not right censored • ui is the observed value if right censored You are given: Observation (i)
di
xi
ui
1 0 3 2 0 3 3 0 4 4 0 5 5 0 7 6 0 10 7 0 15 8 5 7 9 5 7 10 10 12 Determine the Kaplan-Meier Product-Limit estimate, S10(15). (A) Less than 0.07 (B) At least 0.07, but less than 0.08 (C) At least 0.08, but less than 0.09 (D) At least 0.09, but less than 0.10 (E) At least 0.10 2.51 (3 points) For a mortality study from first diagnosis of a certain disease:
• The study starts at time zero with 40 lives. • There are 5 new entrants at time 2. • There are 4 new entrants at time 4. • There are 8 withdrawals at time 5 • There is 1 withdrawal at each of times 1, 3, 4, and 10. • There are two deaths at time 4. • There is one death at each of times 2, 3, 5, 7, and 10. What is the product-limit estimate of H(8)? (A) Less than 0.15 (B) At least 0.15, but less than 0.16 (C) At least 0.16, but less than 0.17 (D) At least 0.17, but less than 0.18 (E) At least 0.18
Page 50
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 51
2.52 (3 points) Let X be the length of time in weeks from receipt of an application to join the Bushwood Country Club to when a decision is issued on the application. For 9 applications, the times from application to decision are: 2, 2, 4, 7, 8, 12, 12, 14, 18. 2 applications were withdrawn prior to a decision being issued, with times: 3, 8. 3 applications have yet to be resolved, with times: 4, 7, 12. Using the product limit estimator, estimate Prob(7 ≤ X ≤ 12). (A) 36% (B) 38% (C) 40% (D) 42% (E) 44% Use the following information for the next two questions:
• Initech wishes to estimate its retention rate of employees. • Initech is studying employees who leave other than due to: being fired or downsized, dying, retiring, or becoming disabled.
• Initech examines the records of those employed during the year 2015. Employee A B C D E F G H I J K
Months Employed Prior to 2015 0 5 8 10 123 223 192 66 17 30 0
Months Employed During 2015 3 4 5 12 12 5 6 8 6 6 7
Status Still with Initech Fired Quit Still with Initech Still with Initech Retired Died Quit Downsized Downsized Quit
2.53 (2 points) Determine the Product-Limit estimate of the probability that an employee leaves Initech within 24 months, other than due to being fired or downsized, dying, retiring, or becoming disabled. 2.54 (2 points) Using an exponential curve to extrapolate, what is the probability that an employee leaves Initech within 30 years, other than due to being fired or downsized, dying, retiring, or becoming disabled.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 52
2.55 (2 points) Liability insurance is offered with three policy limits: 100,000, 250,000, and 500,000. There were the following 6 payments on policies with a 100,000 limit: 10,000, 25,000, 50,000, 100,000+, 100,000+, 100,000+. There were the following 7 payments on policies with a 250,000 limit: 10,000, 20,000, 50,000, 125,000, 200,000, 250,000+, 250,000+. There were the following 5 payments on policies with a 500,000 limit: 5,000, 25,000, 75,000, 300,000, 500,000+. Using the Product Limit Estimator, determine the probability that a loss will be greater than 150,000 before any policy limit is applied. A. 0.42 B. 0.44 C. 0.46 D. 0.48 E. 0.50 2.56 (1 point) For a mortality study:
• All of the lives enter the study at birth. • The product limit estimate of S(77) is 0.4368. • There are two deaths at age 77. • After age 75 and before age 77 nobody dies but 9 lives withdraw from the study. • At age 75 the risk set is of size 46 and there are three deaths. Determine the product limit estimate of S(75). 2.57 (160, 11/86, Q.6) (2.1 points) You are given: Age at Individual Entry Withdrawal Death A 0 6 B 0 27 C 0 42 D 0 42 E 5 60 F 10 24 G 15 50 H 20 23 Using the product limit estimator of S(x), determine the expected future lifetime at birth. (A) 46.0 (B) 46.5 (C) 47.0 (D) 47.5 (E) 48.0
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 53
2.58 (160, 5/89, Q.10) (2.1 points) From a 1987 calendar year mortality study of five lives, you are given: Individual Date of Birth Date of Death 1 April 1, 1900 --2 July 1, 1900 June 1, 1987 3 January 1, 1901 --4 April 1, 1901 --5 July 1, 1901 November 1, 1987 Calculate the product limit estimator of q86. (A) 0.45
(B) 0.50
(C) 0.55
(D) 0.60
(E) 0.65
2.59 (160, 11/89, Q.12) (2.1 points) You are given: (i) 1,000 persons enter the estimation interval (x, x + 1] at exact age x. (ii) 200 withdrawals occur at x + 0.4. (iii) 38 deaths occur in (x, x + 0.4]. (iv) 51 deaths occur in (x + 0.4, x + 1]. Calculate the absolute difference between the maximum likelihood estimator qx under the uniform distribution and the product limit estimator of qx. (A) 0.0001
(B) 0.0004
(C) 0.0007
(D) 0.0010
(E) 0.0013
2.60 (160, 11/90, Q.16) (1.9 points) From a study over the interval (x, x+1], you are given: (i) 25 lives entered the study at age x. (ii) n lives entered the study at age x + 0.4. (iii) There were no withdrawals. (iv) Age At Death Number of Deaths x + 0.25 4 x + 0.50 2 x + 0.75 3 x + 1.00 4 (v) The product limit estimate of qx was 0.396. Determine n. (A) 8 (B) 11
(C) 15
(D) 20
(E) 25
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
2.61 (Course 160 Sample Exam #2, 1996, Q.10) (1.9 points) For a mortality study you are given: (i) There were no intermediate entrants. (ii) There was one death at time t3 . (iii) There were two deaths at time t4 . (iv) There was one death at time t5 . (v) There were no other deaths in [t3 , t5 ). (vi) The following product limit estimates of S(t): ^
t
S (t)
t3
0.72
t4
0.60
t5
0.50
Calculate the number of terminations from the study in [t4 , t5 ). (A) 0
(B) 1
(C) 2
(D) 3
(E) 4
2.62 (Course 160 Sample Exam #2, 1997, Q.9) (1.9 points) You are given that: (i) 100 people enter a mortality study at time 0. (ii) At time 6, 15 people leave. (iii) 10 deaths occur before time 6. (iv) 3 deaths occur between time 6 and time 10. Calculate the product limit estimate of S(10). 2.63 (Course 160 Sample Exam #3, 1997, Q.9) (1.9 points) For a mortality study you are given: (i) 100 people enter the estimation interval (x, x + 1] at exact age x. (ii) 15 people leave the study at x + 0.6. (iii) 10 deaths occur in (x, x + 0.6]. (iv) 3 deaths occur in (x + 0.6, x + 1]. (v) MLE is the maximum likelihood estimator of qx under the uniform distribution. (vi) PLE is the product limit estimator of qx. Calculate I MLE - PLE I. (A) 0.001 (B) 0.002
(C) 0.003
(D) 0.004
(E) 0.005
Page 54
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 55
2.64 (Course 160 Sample Exam #1, 1999, Q.8) (1.9 points) For a mortality study of laboratory mice, you are given: (i) 300 mice were observed at the start of the study. (ii) The following daily results were observed: t Deaths Withdrawals New Entrants 1 6 2 20 3 10 4 10 30 5 a 6 7 45 8 9 b 10 35 11 12 6 13 15 (iii) The product-limit estimate of S(t) at time t = 7 is 0.892 and at time t = 13 is 0.856. Determine a + b. (A) 13 (B) 14 (C) 25 (D) 37 (E) 42 2.65 (Course 4 Sample Exam 2000, Q.22) An insurance company wishes to estimate its four-year agent retention rate using data on all agents hired during the last six years. You are given:
• Using the Product-Limit estimator, the company estimates the proportion of agents remaining after 3.75 years of service as S(3.75) = 0.25.
• One agent resigned between 3.75 and 4 years of service. • Eleven agents have been employed longer than the agent who resigned between 3.75 and 4 years of service. Determine the Product-Limit estimate of S(4). 2.66 (4, 11/00, Q.4) (2.5 points) You are studying the length of time attorneys are involved in settling bodily injury lawsuits. T represents the number of months from the time an attorney is assigned such a case to the time the case is settled. Nine cases were observed during the study period, two of which were not settled at the conclusion of the study. For those two cases, the time spent up to the conclusion of the study, 4 months and 6 months, was recorded instead. The observed values of T for the other seven cases are as follows: 1 3 3 5 8 8 9 Estimate Pr[3 ≤ T ≤ 5] using the Product-Limit estimator. (A) 0.13 (B) 0.22 (C) 0.36 (D) 0.40 (E) 0.44
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 56
2.67 (4, 5/01, Q.4) (2.5 points) You are given the following times of first claim for five randomly selected auto insurance policies observed from time t = 0: 1 2 3 4 5 You are later told that one of the five times given is actually the time of policy lapse, but you are not told which one. The smallest Product-Limit estimate of S(4), the probability that the first claim occurs after time 4, would result if which of the given times arose from the lapsed policy? (A) 1 (B) 2 (C) 3 (D) 4 (E) 5 2.68 (4, 11/01, Q.4) (2.5 points) Which of the following statements about the Product-Limit estimator is false? (A) The Product-Limit estimator is based on the assumption that knowledge of a censoring time for an individual provides no further information about this personʼs likelihood of survival at a future time had the individual continued in the study. (B) If the largest study time corresponds to a death time, then the Product-Limit estimate of the survival function is undetermined beyond this death time. (C) When there is no censoring or truncation, the Product-Limit estimator reduces to the empirical survival function. (D) Under certain regularity conditions, the Product-Limit estimator is a nonparametric maximum likelihood estimator. (E) The Product-Limit estimator is consistent. 2.69 (4, 11/01, Q.19 & 2009 Sample Q.68) (2.5 points) For a mortality study of insurance applicants in two countries, you are given: Country A Country B si ri si ri ti 1 2 3 4
20 54 14 22
200 180 126 112
15 20 20 10
100 85 65 45
S T(t) is the Product-Limit estimate of S(t) based on the data for all study participants. S B(t) is the Product-Limit estimate of S(t) based on the data for study participants in Country B. Determine |ST(4) - SB(4)|. (A) 0.06 (B) 0.07 (C) 0.08
(D) 0.09
2.70 (2 points) In the previous question, what is SA(4)?
(E) 0.10
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 57
2.71 (4, 11/02, Q.25 & 2009 Sample Q. 46) (2.5 points) The claim payments on a sample of ten policies are: 2
3
3
5
5+
6
7
7+
9
10+
+
indicates that the loss exceeded the policy limit. Using the Product-Limit estimator, calculate the probability that the loss on a policy exceeds 8. (A) 0.20 (B) 0.25 (C) 0.30 (D) 0.36 (E) 0.40 2.72 (4, 11/04, Q.4 & 2009 Sample Q.135) (2.5 points) For observation i of a survival study: • di is the left truncation point • xi is the observed value if not right censored • ui is the observed value if right censored You are given: Observation (i)
di
xi
ui
1 0 0.9 2 0 1.2 3 0 1.5 4 0 1.5 5 0 1.6 6 0 1.7 7 0 1.7 8 1.3 2.1 9 1.5 2.1 10 1.6 2.3 Determine the Kaplan-Meier Product-Limit estimate, S10(1.6). (A) Less than 0.55 (B) At least 0.55, but less than 0.60 (C) At least 0.60, but less than 0.65 (D) At least 0.65, but less than 0.70 (E) At least 0.70
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 58
2.73 (4, 11/05, Q.5 & 2009 Sample Q.217) (2.9 points) For a portfolio of policies, you are given: (i) There is no deductible and the policy limit varies by policy. (ii) A sample of ten claims is: 350 350
500
500
500+
1000
1000+
1000+
1200
1500
where the symbol + indicates that the loss exceeds the policy limit. ^
(iii) S1(1250) is the product-limit estimate of S(1250). ^
(iv) S2 (1250) is the maximum likelihood estimate of S(1250) under the assumption that the losses follow an exponential distribution. ^
^
Determine the absolute difference between S1(1250) and S2 (1250). (A) 0.00
(B) 0.03
(C) 0.05
(D) 0.07
(E) 0.09
2.74 (4, 5/07, Q.38) (2.5 points) You are given: (i) All members of a mortality study are observed from birth. Some leave the study by means other than death. (ii) s3 = 1, s4 = 3 (iii) The following Kaplan-Meier product-limit estimates were obtained: S n (y3 ) = 0.65, Sn (y4 ) = 0.50, Sn (y5 ) = 0.25 (iv) Between times y4 and y5 , six observations were censored. (v) Assume no observations were censored at the times of deaths. Determine s5 . (A) 1
(B) 2
(C) 3
(D) 4
(E) 5
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 59
Solutions to Problems: 2.1. A. For Group A: yi
si
ri
(ri-si)/ri
1 2 3 4
10 15 20 15
100 80 60 30
0.900 0.812 0.667 0.500
S(yi) 1 0.900 0.731 0.487 0.244
2.2. r1 - s1 = 100 - 10 = 90. r2 = 80. Therefore, there could have been 10 values censored from above at 1. Alternately, there could have been 30 values censored from above at 1, and 20 values truncated from below at 1. r2 - s2 = 80 - 15 = 65. r3 = 60. Therefore, there could have been 5 values censored from above at 2. Alternately, there could have been 15 values censored from above at 2, and 10 values truncated from below at 2. r3 - s3 = 60 - 20 = 40. r4 = 30. Therefore, there could have been 10 values censored from above at 3. Alternately, there could have been 40 values censored from above at 3, and 30 values truncated from below at 3. Comment: There are many other possibilities besides those specifically mentioned. 2.3. D. For Group B: yi
si
ri
(ri-si)/ri
1 2 3 4
15 20 30 5
100 90 70 40
0.850 0.778 0.571 0.875
S(yi) 1 0.850 0.661 0.378 0.331
2.4. r1 - s1 = 100 - 15 = 85. r2 = 90. Therefore, there could have been 5 values truncated from below at 1. Alternately, there could have been 10 values censored from above at 1, and 15 values truncated from below at 1. r2 - s2 = 90 - 20 = 70. r3 = 70. Therefore, there could have been no values truncated or censored at 2. Alternately, there could have been 30 values censored from above at 2, and 30 values truncated from below at 2. r3 - s3 = 70 - 30 = 40. r4 = 40. Therefore, there could have been no values truncated or censored at 3. Alternately, there could have been 20 values censored from above at 3, and 20 values truncated from below at 3. Comment: There are many other possibilities besides those specifically mentioned.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 60
2.5. C. For Group A plus Group B: yi
si
ri
(ri-si)/ri
1 2 3 4
25 35 50 20
200 170 130 70
0.875 0.794 0.615 0.714
S(yi) 1 0.875 0.695 0.428 0.305
Comment: Similar to 4, 11/01, Q.19. ^
^
2.6. B. From the previous solution, S (4) = 0.305. Therefore, S (6) = 0.3056/4 = 0.168. 2.7. C. S10(10) = (8/10)(5/6)(4/5) = 0.533. yi
si
ri
(ri-si)/ri
4 6 9 12 17
2 1 1 1 1
10 6 5 3 1
0.800 0.833 0.800 0.667 0.000
S(yi) 1 0.800 0.667 0.533 0.356 0.000
Comment: Similar to 4, 11/02, Q.25. 2.8. B.
yi
si
ri
(ri-si)/ri
S(yi)
0.4 1 15 14/15 14/15 0.8 1 14 13/14 (13/14)(14/15) = 13/15 1.9 1 12 11/12 (11/12)(13/15) = 143/180 2.7 1 12 11/12 (11/12)(143/180) = 1573/2160 = 0.728. Comment: It is a 3 year study, so we observe the remaining 11 lives until time 3.
⇒ The estimate of S(3) equals the estimate of S(2.7). 2.9. D. In the interval (x, x + 0.7) there are no withdrawals or entries. ^
Therefore for the product limit estimator S (0.7) = 185/200 = 0.925, the empirical survival function. After the 35 people leave the study at x + 0.7, 200 - 15 - 35 = 150 people remain in the risk set. In the interval (x + 0.7, x + 1) there are no withdrawals or entries. ^
^
Therefore, for the product limit estimator S (1) = S (0.7)(145/150) = (0.925)(0.9667) = 0.8942. qx = 1 - 0.8942 = 0.1058.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
2.10 through 2.14. Deductible Maximum Covered Loss None 25,000 None 50,000 None 100,000 10,000 25,000 10,000 50,000 10,000 100,000 25,000 50,000 25,000 100,000 25,000 100,000 i yi ri si (ri - si)/ri 1 4000 3 1 2/3 2 15,000 5 2 3/5 3 23,000 3 1 2/3 4 30,000 4 1 3/4 5 85,000 2 1 1/2
Payment 25,000 4,000 15,000 5,000 40,000 13,000 5,000 75,000 60,000
HCM 10/16/12,
Page 61
Size of Loss 25,000 or more 4,000 15,000 15,000 50,000 or more 23,000 30,000 100,000 or more 85,000
S(yi) 2/3 (2/3)(3/5) = 2/5 (2/5)(2/3) = 4/15 (4/15)(3/4) = 1/5 (1/5)(1/2) = 1/10
S(60,000) = 1/5 = 0.2. S(150,000) = S(100,000)150000/100000 = (1/10)1.5 = 0.0316. Prob[15,000 ≤ X ≤ 50,000] = S(14999.9) - S(50000) = 2/3 - 1/5 = 0.04667. 15,000q 10,000 = (S(10,000) - S(25,000))/S(10,000) = (2/3 - 4/15)/(2/3) = 0.6. 10,000|15,000q 5,000
= (S(15,000) - S(30,000))/S(5000) = (.4 - .2)/(2/3) = 0.3.
Comment: We want Prob[15,000 ≤ X ≤ 50,000], which includes 15,000. S(15,000) = Prob[X > 15,000], and therefore S(15,000) - S(50,000) = Prob[15,000 ≤ X ≤ 50,000], which excludes 15,000. However, S(15,000 - ε) = Prob[X > 15,000 - ε] = Prob[X ≥ 15,000]. Therefore, S(15,000 - ε) - S(50,000) = Prob[15,000 ≤ X ≤ 50,000]. See 4, 11/00, Q.4. In order to get the risk sets one can use either formula approach. Risk set at time of death yj = (Number who die at time ≥ yj) + (Number who leave the study at time ≥ yj) - (Number who enter the study at time ≥ yj). claims that “die” at 4000 or more, numbers: 2, 3, 4, 6, 7, 9. claims that “leave” (other than due to death) at 4000 or more, numbers: 1, 5, 8. claims that “enter” at more than 4000, numbers: 4 through 9. r1 = 6 + 3 - 6 = 3. r2 = 5 + 3 - 3 = 5. r3 = 3 + 3 - 3 = 3. r4 = 2 + 2 - 0 = 4.
r5 = 1 + 1 - 0 = 2.
Alternately, Risk set at time of death yj = (Number who enter the study at time < yj) - (Number who die at time < yj) - (Number who leave the study at time < yj). r1 = 3 - 0 - 0 = 3.
r2 = 6 - 1 - 0 = 5.
r4 = 9 - 4 - 1 = 4.
r5 = 9 - 5 - 2 = 2.
r3 = 6 - 3 - 0 = 3.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 62
Alternately, one can count up how many are available to “die” whenever a “death” occurs. Risk set at 4000, includes claim numbers: 1, 2, 3. Risk set at 15,000, includes claim numbers: 1, 3, 4, 5, 6. Risk set at 23,000, includes claim numbers: 1, 5, 6. Risk set at 30,000, includes claim numbers: 5, 7, 8, 9. Risk set at 85,000, includes claim numbers: 8, 9. 2.15 through 2.20. There are 2 claims whose time of settlement is censored from above at 36, and 4 claims whose time of settlement is censored from above at 24. yi
si
ri
(ri-si)/ri
0 1 2 3 5 8 9 12 15 20 26 31
1 1 1 3 1 1 2 1 1 1 1
20 19 18 17 14 13 12 10 9 4 3
0.950 0.947 0.944 0.824 0.929 0.923 0.833 0.900 0.889 0.750 0.667
S(yi) 1 1 0.950 0.900 0.850 0.700 0.650 0.600 0.500 0.450 0.400 0.300 0.200
S(30) = 0.3. Extrapolate via an Exponential beyond the last censorship point of 36: S(48) = S(36)48/36 = .24/3 = 0.117. Prob[12 ≤ T ≤ 36] = S(11.99) - S(36) = .6 - .2 = 0.4. Prob[12 < T ≤ 36] = S(12) - S(36) = .5 - .2 = 0.3. 6 q 24 = 1 - S(30)/S(24) = 1 - .3/.4 = 1/4. 12|18q 6
= {S(18) - S(36)}/S(6) = (.45 - .2)/.7 = 0.357.
2.21. The estimate of the Survival Function has jumps at each of the reported values; there is a corresponding discrete density. The points 20 and 26 contribute to the smoothed estimate. f(20) = S(15) - S(20) = .450 - .400 = .05. f(26) = S(20) - S(26) = .400 - .300 = .1. The uniform kernel has density 1/10. The smoothed estimate of f(25) is: (.05 + .1)/10 = 0.015. Comment: The endpoints are included in the definition of the uniform kernel, so that the density at 20 contributes. See “Mahlerʼs Guide to Fitting Loss Distributions.” 2.22. The triangular kernel has height of 1/4 at 7, and height of zero at 3 and at 11. Thus it has density of 1/8 at 5, 3/16 at 8, and 1/8 at 9. The smoothed estimate of f(7) is: f(5)/8 + 3f(8)/16 + f(9)/8 = .15/8 + .05(3/16) + .05/8 = 0.0344.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 63
2.23. C. The mean is equal to an integral of the Survival Function. yi
si
ri
(ri-si)/ri
8 15 21 28 36 50
1 1 1 1 1 1
8 6 5 3 2 1
0.8750 0.8333 0.8000 0.6667 0.5000 0.0000
S(yi) 1 0.8750 0.7292 0.5833 0.3889 0.1944 0.0000
We add up the areas under the graph of the Survival Function, a series of rectangles: (8)(1) + (15 - 8)(.875) + (21 - 15)(.7292) + (28 - 21)(.5833) + (36 - 28)(.3889) + (50 - 36)(.1944) = 28.4. Comment: A graph of the estimated survival function: Prob. 1 0.8 0.6 0.4 0.2
10
20
30
40
50
x
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
2.24, 2.25, & 2.26. i
yi
ri
si
500 5 0 1 600 5 1 2 800 4 1 3 1000 3 1 4 1100 8 1 5 1400 7 2 6 1800 5 1 7 2000 4 2 8 3000 2 1 9 4000 1 1 1000q 500 = 1 - S(1500)/S(500) = 1- 1/4 = 3/4. 500q 1000
HCM 10/16/12,
(ri - si)/ri
S(yi)/S(500)
1 4/5 3/4 2/3 7/8 5/7 4/5 2/4 1/2 0
1 4/5 (4/5)(3/4) = 3/5 (3/5)(2/3) = 2/5 (2/5)(7/8) = 7/20 (7/20)(5/7) = 1/4 (1/4)(4/5) = 1/5 (1/5)(2/4) = 1/10 (1/10)(1/2) = 1/20 0
Page 64
= 1 - S(1500)/S(1000) = 1 - {S(1500)/S(500)}/{S(1000)/S(500)} = 1 - (1/4)/(2/5) = 3/8.
1000|2000q 500
= {S(1500) - S(3500)}/S(500) = 1/4 - 1/20 = 1/5.
Comment: Note that in this case, all the data is left truncated and the smallest truncation point is 500. Thus we have no information on what happens below 500. Thus we are really estimating S(yi)/S(500), the probability of survival to yi conditional on survival to 500. 2.27. B. Sn (y9 ) = Sn (y8 )(1 - 6/r9 ). ⇒ .8060/.7504 = 1 - 5/r9 ⇒ r9 = 87. S n (y10) = Sn (y9 )(1 - 3/r10). ⇒ .7504/.7141 = 1 - 3/r10. ⇒ r10 = 62. r9 - r10 = 87 - 62 = 25. s9 = 6. ⇒ number of censored observations between times y9 and y10 is: 25 - 6 = 19. Comment: Similar to Exercise 14.9 in Loss Models. 2.28. D.
yi
si
2 4 6 8 10
1 2 1 2 1
ri 40 39 37 + n 36 + n 34 + n
(ri-si)/ri
S(yi)
39/40 37/39 (36+n)/(37+n) (34+n)/(36+n) (33+n)/(34+n)
39/40 37/40 (37/40)(36+n)/(37+n) (37/40)(34+n)/(37+n) (37/40)(33+n)/(37+n)
.854 = (37/40)(33+n)/(37+n) ⇒ 1263.92 + 34.16n = 1221 + 37n. ⇒ n = 42.92/2.84 = 15.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 65
2.29. B. Since there are no entries or withdrawals between times 0 and 5, we can use the empirical survival function to estimate S(5) = 45/50. Since there are no entries or withdrawals between times 5 and 15, and there are 30 people in the study beyond time 5 of whom 10 die, the estimate of S(15)/S(5) = 20/30. The estimate of S(15) = (45/50)(20/30) = 60%. Comment: In the absence of truncation and censoring, the Product-Limit estimator is equal to the empirical survival function. 2.30. C. S(55)/S(50) = .40/.45 = 8/9. ⇒ 1 - 4/r = 8/9. ⇒ r = 36 at age 55. S(50)/S(40) = .30/.40 = 3/4. ⇒ 1 - 6/r = 3/4. ⇒ r = 24 at age 60. 36 - 4 - number that withdrew = 24. ⇒ number that withdrew = 8. Comment: Similar to Exercise 14.9 in Loss Models. 2.31. E. q80 = 1 - S(81)/S(80) = 1 - (69/70)(47/49)(32/33)(31/32) = 1 - .8882 = 0.1118. yi
si
ri
(ri-si)/ri
80.2 80.4 80.7 80.8
1 2 1 1
70 49 33 32
0.9857 0.9592 0.9697 0.9688
2.32. B.
yi
ri
25 7 45 7 75 6 80 4 90 1 S(75) = 24/49 = 49.0%.
si
(ri-si)/ri
S(yi)
1 1 2 1 1
6/7 6/7 4/6 3/4 0
6/7 36/49 24/49 18/49 0
S(yi)/S(80) 1 0.9857 0.9455 0.9168 0.8882
2.33. C. The mean is the integral of the Survival Function, which in this case involves adding up the areas under a step function. (25)(1) + (20)(6/7) + (30)(36/49) + (5)(24/49) + (10)(18/49) = 70.3. 2.34. D. e(65) = (integral of S(t) from 65 to infinity)/S(65) = {(10)(36/49) + (5)(24/49) + (10)(18/49)}/(36/49) = 10 + (5)(2/3) + (10)(1/2) = 18.33.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 66
^
2.35. A. S (6) = {192/200}{185/192}{202/215}{188/202}{(208 - y)/208}{(184 - y)/(193 - y)}.
⇒ (0.7321)(200)(215)(208)(193 - y) = (185)(188)(208 - y)(184 - y). ⇒ 34780y2 - 7085857.6y + 67354996.8 = 0. ⇒ y = {7085857.6 ± 7085857.62 - (4)(34780)(67354996.8) } / {(2)(34780)} = (7085857.6 ± 6390536)/69560. ⇒ y = 10 or 194. However, y can not be 194 since then there would be 208 - 194 - 15 = -1 lives at risk at time 6.
⇒ y= 10. Comment: Similar to Course 160 Sample Exam #1, 1999, Q.8. 2.36. E. Individual A B C D E F G ti
Birthdate April 1, 1942 July 1, 1942 Oct. 1, 1942 Jan. 1, 1943 April 1, 1943 July 1, 1943 Sept. 1, 1942 ri
Death June 1, 2003
Age on 1/1/03 60 & 9 months 60 & 6 months March 1, 2003 60 & 3 months 60 59 & 9 months Oct. 1, 2003 59 & 6 months 59 & 4 months si (ri - si)/ri S(ti)/S(60)
60 & 3 months 4 1 60 & 5 months 3 1 q60 = 1 - 1/2 = 1/2 = 0.500.
3/4 2/3
Age at Death or the End of 2003 61 years and 2 months 61 years and 6 months 60 years and 5 months 61 years 60 years and 9 months 60 years and 3 months 60 years and 4 months
3/4 1/2
Comment: At 60 & 3 months the risk set is: D to G. At 60 & 5 months the risk set is: C to E. The death of A at age 61 years & 2 months does not enter into the calculation of q60. 2.37. D. S(58) = (9/10)(8/9)(7/8)(6/7) = 6/10 = 0.6. Comment: Since there are no entries, and no withdrawals through age 58, the product limit estimator is the same as the empirical survival function, 6/10.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 67
2.38. B. S(75) = S(60)(3/4)(2/3)(1/2) = S(58)(1/4) = (0.6)(1/4) = 0.15. yi
si
ri
(ri-si)/ri
0 27 33 46 58 63 67 72 79
1 1 1 1 1 1 1 1
10 9 8 7 4 3 2 1
0.900 0.889 0.875 0.857 0.750 0.667 0.500 0.000
S(yi) 1 1 0.900 0.800 0.700 0.600 0.450 0.300 0.150 0.000
Comment: Since there are no entries, and no withdrawals between age 60 and age 75, the product limit estimator of S(75)/S(60) is the same as the empirical survival function, 1/4. Whenever one has a stretch of time with no entries or withdrawals, one gets similar cancellation of numerators and denominators in the product of ratios. 2.39. D. & 2.40. B. Convert the payments to losses by adding back in the deductibles. Losses: 300, 700, 2000, 7000, 900, 1500, 9500, 1200, 2000, 5000. The losses of sizes 900, 1500, and 9500, were left truncated at 500. The losses of sizes 1200, 2000, and 5000, were left truncated at 1000. yi
si
ri
(ri-si)/ri
300 700 900 1200 1500 2000 5000 7000 9500
1 1 1 1 1 2 1 1 1
4 6 5 7 6 5 3 2 1
0.7500 0.8333 0.8000 0.8571 0.8333 0.6000 0.6667 0.5000 0.0000
S(yi) 1 0.7500 0.6250 0.5000 0.4286 0.3571 0.2143 0.1429 0.0714 0.0000
The mean is equal to an integral of the Survival Function. We add up the areas under the graph of the Survival Function, in this case a series of rectangles: (300)(1) + (700 - 300)(0.75) + (900 - 700)(0.6250) + (1200 - 900)(0.5) + (1500 - 1200)(0.4286) + (2000 - 1500)(0.3571) + (5000 - 2000)(0.2143) + (7000 - 5000)(0.1429) + (9500 - 7000)(0.0714) = 2289. The E[X ∧ 1000] is equal to an integral of the Survival Function from 0 to 1000; add up the areas under the graph of the Survival Function from 0 to 1000, a series of rectangles: (300)(1) + (700 - 300)(0.75) + (900 - 700)(0.6250) + (1000 - 900)(0.5) = 775. LER[1000] = E[X ∧ 1000]/E[X] = 775/2289 = 33.9%.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 68
2.41. C. We treat nonpayment of premiums or not renewing a policy as censoring. yi
si
ri
(ri-si)/ri
1 2 3 4
1 1 1 2
20 18 15 14
0.9500 0.9444 0.9333 0.8571
S(t) 1 0.9500 0.8972 0.8374 0.7178
(19/20)(17/18)(14/15)(12/14) = 0.7178. For example, policyholders A and E did not receive coverage during year 3, but policyholder F did. During year 3, the risk set was size 15: C, F, H, I, plus 11 others. 2.42. D. .4303/.6927 = Sn (y3 )/Sn (y2 ) = (r3 - s3 )/r3 = (r3 - 50)/r3 .
⇒ .4303r3 = .6927r3 - 34.635. ⇒ r3 = 132. .3124/.4303 = Sn (y4 )/Sn (y3 ) = (r4 - s4 )/r4 = (r4 - 20)/r4 .
⇒ .3124r4 = .4303r4 - 8.606. ⇒ r4 = 73. Since there is no truncation, r4 = r3 - s3 - (# censored between y3 and y4 ). ⇒ 73 = 132 - 50 - (# censored between y3 and y4 ). ⇒ (# censored between y3 and y4 ) = 9. Comment: Similar to 4, 5/07, Q.38.
2.43. D.
yi
ri
si
(ri-si)/ri
S(yi)
58 2 64 3 77 2 90 1 S(80) = 1/6 = 16.7%.
1 1 1 1
1/2 2/3 1/2 0
1/2 1/3 1/6 0
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 69
2.44. B. The mean is the integral of the Survival Function, which in this case involves adding up the areas under a step function: (58)(1) + (64 - 58)(1/2) + (77 - 64)(1/3) + (90 - 77)(1/6) = 67.5. Comment: Such a small data set does not lead to accurate estimates. Here is a diagram of the areas we need to sum: S(x) 1
1 2 1 3 1 6
58 64
77
90
x
2.45. B. e(70) = (integral of S(t) from 70 to infinity)/S(70) = {(77 - 70)(1/3) + (90 - 77)(1/6)}/(1/3) = 13.5. 2.46. A. The risk set at time 3 is 7, since Melati could not be observed to fail at time 3. The risk set at time 5 is size 5: B, K, L, S, T. The risk set at time 6 is size 3: B, K, S. The risk set at time 7 is size 3: B, M, S.
^
yi
si
ri
(ri-si)/ri
3 5 6 7 8 10
1 2 1 1 1 1
7 5 3 3 2 1
0.857 0.600 0.667 0.667 0.500 0.000
S (7) = (6/7)(3/5)(2/3)(2/3) = 22.9%. Comment: Melatiʼs data is truncated from below at 6. Fioʼs data is censored from above at 4.
S(yi) 1 0.857 0.514 0.343 0.229 0.114 0.000
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
2.47. B.
Deductible 500 500 500
Maximum Covered Loss 5000 5000 10,000
Payment 500 3000 4500
Loss 1000 3500 5000
500
10,000
9500
10,000+
1000
5000
3000
4000
1000
5000
4000
5000+
1000 1000
10,000 10,000
2000 5000
3000 6000
yi
si
ri
(ri-si)/ri
1000 3000 3500 4000 5000 6000
1 1 1 1 1 1
4 7 6 5 4 2
0.750 0.857 0.833 0.800 0.750 0.500
Page 70
S(yi)/S(500) 1 0.750 0.643 0.536 0.429 0.321 0.161
Comment: Since all the data is truncated from below at 500 or more, we can only estimate the survival function conditional on survival to 500. Those losses on policies with a deductible of 1000 are not available to fail at size 1000; a loss of size 1000 on a policy with a deductible of 1000 would not be in the data base. Those losses on policies with a deductible of 1000 are not in the risk set at 1000. Lives truncated from below at d are not in the risk set at d. Lives censored from above at u are in the risk set at u.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
2.48. D. yi si
ri
HCM 10/16/12,
Page 71
(ri-si)/ri
5 1 50 49/50 10 3 49 46/49 15 3 76 73/76 20 x+2 73 (71 - x)/73 25 x 71 - x (71 - 2x)/(71 - x) 30 3 71 - 2x (68 - 2x)/(71 - 2x) 35 2 68 - 2x (66 - 2x)/(68 - 2x) 40 4 66 - 2x (62 - 2x)/(66 - 2x) Since there are only new entries at time 12, and no withdrawals, most of the numerators and denominators cancel: 46 (62 - 2x) 49 46 73 71- x 71- 2x 68 - 2x 66 - 2x 62 - 2x ^ S (40) = = . 71- x 71- 2x 68 - 2x 66 - 2x (50) (76) 50 49 76 73 0.678 =
46 (62 - 2x) . ⇒ x = 3. (50) (76)
Comment: For x = 3, here is the calculation of S: yi
si
ri
(ri-si)/ri
5 10 15 20 25 30 35 40
1 3 3 5 3 3 2 4
50 49 76 73 68 65 62 60
0.980 0.939 0.961 0.932 0.956 0.954 0.968 0.933
S(yi) 1 0.9800 0.9200 0.8837 0.8232 0.7868 0.7505 0.7263 0.6779
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 72
2.49. C. Using Barbieʼs risk set, 0.6615 = S(75)/S(50) = {S(60)/S(50)} (31/32) (29/30) (25/27) (25/26) (23/24) (22/23) (20/22) = Thus, Barbieʼs estimate of S(60)/S(50) = (1.4393)(0.6615) = 0.9521. Correcting Barbieʼs risk set, at age 73 s is one more, while at age 63 s is one less. In addition at ages 66, 69, 72, and 73 there is one more in the risk set: y s r 61 1 32 63 0 30 66 2 28 69 1 27 72 1 25 73 2 24 74 2 22 S(75)/S(50) = {S(60)/S(50)} (31/32) (26/28) (26/27) (24/25) (22/24) (20/22) (0.9521) (31/32) (26/28) (26/27) (24/25) (22/24) (20/22) = 0.6598. 2.50. B. At t = 3 there is a “death”; even though observation number 2 is censored from above at 3, it is still in the risk set at time 3. At time = 5 there is a “death”; observations 8 and 9 are truncated from below at 5, and they are not in the risk set at time = 5. They are available to die at time 5 + ε. At time = 10 there is a “death”; observation 10 is truncated from below at 10, and it is not in the risk set at time = 10. It is available to die at time 10 + ε. Time 3 risk set is: observations 1 to 7. Time 4 risk set is: observations 3 to 7. Time 5 risk set is: observations 4 to 7. Time 7 risk set is: observations 5 to 9 Time 10 risk set is: observations 6 and 7. Time 12 risk set is: observations 7 and 10. yi
si
ri
(ri-si)/ri
3 4 5 7 10 12
1 1 1 2 1 1
7 5 4 5 2 2
0.857 0.800 0.750 0.600 0.500 0.500
S 10(15) = (6/7)(4/5)(3/4)(3/5)(1/2)(1/2) = 5/7 = 0.0771. Comment: Similar to 4, 11/04, Q.4.
S(yi) 1 0.857 0.686 0.514 0.309 0.154 0.077
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 73
2.51. B. The risk set is as follows: x s r 2 1 40 - 1 = 39 3 1 39 - 1 + 5 = 43 4 2 43 - 1 - 1 = 41 5 1 41 - 2 - 1 + 4 = 42 7 1 42 - 1 - 8 = 33 10 1 33 - 1 = 32 S(8) = (38/39) (42/43) (39/41) (41/42) (32/33) = 0.8569. H(8) = - ln[0.8569] = 0.1544. 2.52. D. The risk set is as follows: x s r 2 2 14 4 1 11 7 1 9 8 1 7 12 2 5 14 1 2 18 1 1 S(7 - ε) = S(4) = (12/14) (10/11) = 0.7792. S(12) = (12/14) (10/11) (8/9) (6/7) (3/5) = 0.3562. Prob(7 ≤ X ≤ 12) = Prob[X > 7 - ε] - Prob[X > 12] = S(7 - ε) - S(12) = 0.7792 - 0.3562 = 0.4230. Comment: Both an application being withdrawn and an application that has yet to be resolved are treated as observations censored from above. 2.53. & 2.54. The risk set is as follows, where x is the total months employed: x s r 7 1 10 13 1 8 74 1 4 S(24) = (9/10)(7/8) = 0.7875. F(24) = 1 - 0.7875 = 0.2125. The last value in the data set is 228 months, right censored. S(228) = S(74) = (9/10)(7/8)(3/4) = 0.5906. 30 years is equivalent to 360 months. Extrapolating: S(300) = 0.5906360/228 = 0.4354. F(300) = 1 - 0.4354 = 0.5646.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 74
2.55. A. The risk set is as follows: x s r 5000 1 18 10,000 2 17 20,000 1 15 25,000 2 14 50,000 2 12 75,000 1 10 125,000 1 6 300,000 1 2 S(150,000) = (17/18)(15/17)((14/15)(12/14)(10/12)(9/10)(5/6) = (9/18)(5/6) = 5/12 = 0.4167. Comment: We have implicitly assumed that the size of loss distribution for insureds who buy the different policy limits would be the same prior to the impact of the policy limit. Since below 100,000 there is no censoring, S(75,000) = 9/18. 2.56. The risk set at age 77 is: 46 - 3 - 9 = 34. Thus (32/34) S(75) = S(77) = 0.4368. ⇒ S(75) = 0.4641. 2.57. B.
yi
si
ri
(ri - si)/ri
S(yi)
24 1 6 5/6 5/6 42 2 4 1/2 5/12 60 1 1 0 0 The estimated survival function is: 1 for 0 ≤ x < 24, 5/6 for 24 ≤ x < 42, 5/12 for 42 ≤ x < 60, 0 for 62 ≤ x. e(0) = E[X] = ∫S(t)dt = (1)(24) + (5/6)(42 - 24) + (5/12)(60 - 42) = 46.5. Comment: The risk set at 24 is B to G. The risk set at 42 is C to E, G. The risk set at 60 is E.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 75
2.58. C. The study starts on 1/1/87 and ends on 12/31/87. We need to determine the ages of the individuals when they enter and exit the study. Individual Age at Entry Age at Exit Reason for Exit 1 86 and 9 months 87 and 9 months Study Ends 2 86 and 6 months 86 and 11 months Death 3 86 87 Study Ends 4 85 and 9 months 86 and 9 months Study Ends 5 85 and 6 months 86 and 4 months Death Individual #5 died at age 86 and 4 months. The risk set is: #3, #4, #5. Individual #2 died at age 86 and 11 months. The risk set is: #1, #2, #3. yi si ri (ri - si/)ri S(yi)/S(86) 86 and 4 months 1 3 2/3 86 and 11 months 1 3 2/3 q86 = 1 - p86 = 1 - 4/9 = 5/9 = 0.556.
2/3 4/9
Comment: Nobody is in the study prior to age 86, thus all of the estimates are conditional on survival up to age 86. That is why we divide S(yi) by S(86). p86 = S(87)/S(86). 2.59. C. Since there are no withdrawals between x and x + 0.4, the product limit estimator of S(0.4) is the empirical survival function 1 - 38/1000 = 0.962. Similarly, since there are no withdrawals between x + 0.4 and x + 1, the product limit estimator of S(1)/S(0.4) is: 1 - 51/(1000 - 200 - 38) = 0.93307. Product limit estimator of qx is: 1 - S(1) = 1 - (0.962)(0.93307) = 1 - 0.89761 = 0.10239. Assuming a uniform survival function, S(t) = 1 - tqx, the loglikelihood is: 38 lnF(0.4) + 200 lnS(0.4) + 51 ln[S(0.4) - S(1)] + 711 ln S(1) = 38 ln(0.4qx) + 200 ln(1 - 0.4qx) + 51 ln(0.6qx) + 711 ln(1 - qx) = 38 ln(0.4) + 38 ln(qx) + 51 ln(0.6) + 200 ln(1 - 0.4qx) + 711 ln(1 - qx). Setting the derivative equal to zero: 89/qx - 200(0.4)/(1 - 0.4qx) - 711/(1 - qx) = 0. 89(1 - 0.4qx)(1 - qx) - 80qx(1 - qx) - 711qx (1 - 0.4qx) = 0 400qx2 - 915.6qx + 89 = 0. ⇒ qx = 0.10172. |0.10239 - 0.10172| = 0.00067.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
2.60. B.
yi
si
ri
x + 0.25 x + 0.50 x + 0.75 x + 1.00
4 2 3 4
25 21+n 19+n 16+n
(ri - si)/ri 21/25 (19+n)/(21+n) (16+n)/(19+n) (12+n)/(16+n)
HCM 10/16/12,
Page 76
S(yi) 21/25 (21/25)(19+n)/(21+n) (21/25)(16+n)/(21+n) (21/25)(12+n)/(21+n)
We are given, 0.396 = qx = 1 - S(x + 1) = 1 - (21/25)(12+n)/(21+n). ⇒ 15.1(21+n) = 21(12+n). ⇒ n = 65.1/5.9 = 11. 2.61. E. Let w be the number of terminations from the study in [t4 , t5 ). Let y be the number of people alive just before the deaths at time t4 . ^
^
S (4) = S (3)(y - 2)/y. ⇒ 0.6y = 0.72(y - 2). ⇒ y = 1.44/0.12 = 12. Just before the death at t5 there are alive: y - 2 - w = 10 - w. ^
^
S (5) = S (4)(10 - w - 1)/(10 - w). ⇒ 0.5(10 - w) = 0.6(9 - w). ⇒ w = 0.4/0.1 = 4. 2.62. Prior to time 6, it acts as if we have a complete data set, and the estimate of the survival function at 6 is the empirical survival function: (100 - 10)/100 = 0.9 Between time 6 and 10 nobody enters or withdraws, so again the estimate of S(10)/S(6) is the same as the empirical survival function: (75 - 3)/ 75 = 0.96. Thus the estimate of S(10) is: (0.9)(0.96) = 0.864. 2.63. C. In the interval (x, x + 0.6) there are no withdrawals or entries. ^
Therefore for the product limit estimator S (.6) = 90/100 = 0.9, the empirical survival function. After the 15 people leave the study at x + 0.6, 100 - 15 - 10 = 75 people remain in the risk set. In the interval (x + 0.6, x + 1) there are no withdrawals or entries. ^
^
Therefore, for the product limit estimator S (1) = S (.6)(72/75) = .8640. qx = 1 - 0.8640 = 0.1360. From the assumption of the uniform distribution: S(t) = 1 - tq, 0 ≤ t ≤ 1. S(0.6) = 1 - 0.6q. S(1) = 1 - q. f(t) = q, 0 ≤ t ≤ 1. Likelihood is: F(.6)10 {F(1) - F(.6)}3 S(.6)15 S(1)72. Loglikelihood is: 10 ln(0.6q) + 3 ln(.4q) + 15 ln(1 - 0.6q) + 72 ln(1 - q). Set the derivative of the loglikelihood with respect to q equal to zero: 0 = 10/q + 3/q - 9/(1 - 0.6q) - 72/(1 - q). ⇒ 13(1 - 0.6q)(1 - q) - 9q(1 - q) - 72q(1 - 0.6q) = 0.
⇒ 60q2 - 101.8q + 13 = 0. ⇒ q = {101.8 - 101.82 - (4)(60)(13) } / 120 = 0.1391. I MLE - PLE I = |0.1391 - 0.1360| = 0.0031.
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 77
2.64. A.
t Deaths Withdrawals New Entrants r 1 6 300 2 20 294 3 10 314 4 10 30 304 5 a 324 - a 6 324 - a 7 45 324 - a The new entrants at time 4 are not in the risk set at time 4, since these mice are known to be alive at time 4. ^
0.892 = S (7) = {(300 - 6)/300}{(314 - 10)/314}{(304 - 10)/304}{(324 - a)/324}.
⇒ (0.892)(300)(314)(304)(324) = (294)(304)(294)(324 - a). ⇒ a = 324 - 315 = 9. t 7 8 9 10 11 12 13 ^
Deaths b 6 -
Withdrawals 45 35 15
New Entrants -
r 324 - 9 = 315 270 270 270 - b 235 - b 235 - b 229 - b
^
0.856/0.892 = S (13)/ S (7) = {(270 - b)/270}{(229 - b)/(235 - b)}.
⇒ (0.856)(270)(235 - b) = (0.892)(270 - b)(229 - b). ⇒ 0.892b2 - 213.988b + 839.16 = 0. ⇒ b = {213.988 ± 213.9882 - (4)(.892)(839.16) } / {(2)(.892)} = (213.988 ± 206.87)/1.784. ⇒ b = 4 or 236. However, b can not be 236 since there are only 235 lives at risk at time 9. a + b = 9 + 4 = 13. Comment: Any withdrawals at time 4 would be in the risk set at time 4, since these mice are available to die at time 4. 2.65. At a time between 3.75 and 4 we get one agent resigning out of 1 + 11 = 12 that remain. (rj - sj)/rj = (12 - 1)/ 12 = 11/12. We estimate S(4) = (11/12) S(3.75) = (11/12)(0.25) = 0.229. Comment: Let us assume that the agent in the second bullet resigned at 3.8 years of service. Based on bullet 3, there are 11 + 1 = 12 agents in the risk set at time 3.8. Therefore, S(3.8) = (11/12)S(3.75) = 0.229. Since there no more resignations through 4 years of service, S(4) = S(3.8).
2013-4-7,
Survival Analysis §2 Product-Limit Estimator,
HCM 10/16/12,
Page 78
2.66. C. One can estimate the Survival Functions as follows: yi
si
ri
(ri-si)/ri
0 1 3 5 8 9
1 2 1 2 1
9 8 5 3 1
0.889 0.750 0.800 0.333 0.000
S(yi) 1 1 0.889 0.667 0.533 0.178 0.000
For example, there are 9 - 4 = 5 cases available to be settled at t = 5; 3 were settled before then and one was censored from above at 4 due to the conclusion of the study. Thus the risk set, r, at t = 5 is 5. At t = 5, one case is settled, so s, the number of times the uncensored observation 5 appears in the data set, is 1. S(5) = {(5-1)/5}S(3) = (4/5)(2/3) = 8/15. Pr[3 ≤ T ≤ 5] = Prob[T ≥ 3] - Prob[T > 5] = S(2.9999) - S(5) = 0.889 - 0.533 = 0.356. Alternately, the data reveal death times of 1, 3 (2 deaths), 5, 8 (2 deaths), and 9 and withdrawal times of 4 and 6. The Product-Limit estimate produces a discrete distribution with probability at each death time. The probability of dying at time 3 is given by: Prob[survive to 3]Prob[die at 3 | survive to 3] = (8/9)(2/8) = 2/9. The probability of dying at time 5 is given by: Prob[survive to 3]Prob[survive to 5 | survive to 3]Prob[die at 5 | survive to 5] = (8/9)(6/8)(1/5) = 2/15. The total probability of dying between times 3 and 5 inclusive is: 2/9 + 2/15 = 16/45 = 0.356. Comment: One has to be careful of the endpoints; this is a closed interval. In order to include 3, we need to use S(3 - ε) = Prob[T ≥ 3]. In contrast, S(3) = Prob[T > 3]. S(5) - S(3) = Prob[T > 3] - Prob[T > 5] = Prob[3 < T ≤ 5] ≠ Prob[3 ≤ T ≤ 5]; this can make a difference when working with discrete distributions such as the result of the Product Limit Estimator. 2.67. E. If 1st value is censored at 1: S5 (4) = (3/4)(2/3)(1/2) = 1/4. If 2nd value is censored at 2: S5 (4) = (4/5)(2/3)(1/2) = 4/15. If 3rd value is censored at 3: S5 (4) = (4/5)(3/4)(1/2) = 3/10. If 4th value is censored at 4: S5 (4) = (4/5)(3/4)(2/3) = 2/5. If 5th value is censored at 5: S5 (4) = (4/5)(3/4)(2/3)(1/2) = 1/5. The smallest estimate of S(4) occurs when the 5th value is censored at 5. Comment: Time of policy lapse means the policy was not renewed. For example, assume Joe only is with our insurer for 3 years, and he is claims free for these 3 years. We do not know when Joe would have had his first claim if he had remained with our insurance company; however, we know it would have been greater than 3. Joeʼs data is censored from above at 3.
2.68. B. If the largest study time corresponds to a death time, then at this time all of the remaining lives die; at this time sj = rj. Thus one multiplies by a factor of (rj - sj)/rj = 0. Therefore, the Product-Limit estimate of the survival function is zero beyond this death time.
Comment: For example, assume that in a mortality study we have only one life left in the risk set at age 103, and that life is observed to die at age 103. Then the Product-Limit estimate of S(103) = 0. We would also estimate S(x) = 0 for all x > 103.
2.69. B. SB(4) = {(100 - 15)/100}{(85 - 20)/85}{(65 - 20)/65}{(45 - 10)/45} = (85/100)(65/85)(45/65)(35/45) = 35/100 = 0.350.
ti    si    ri     (ri-si)/ri    S(t)
                                 1
1     15    100    0.850         0.850
2     20    85     0.765         0.650
3     20    65     0.692         0.450
4     10    45     0.778         0.350
ST(4) = {(300 - 35)/300}{(265 - 74)/265}{(191 - 34)/191}{(157 - 32)/157} = (265/300)(191/265)(157/191)(125/157) = 125/300 = 0.417.
ti    si    ri     (ri-si)/ri    S(t)
                                 1
1     35    300    0.883         0.883
2     74    265    0.721         0.637
3     34    191    0.822         0.523
4     32    157    0.796         0.417
|ST(4) - SB(4)| = 0.417 - 0.350 = 0.067.
Comment: There are no new entrants, and lives only leave through death; there is no truncation and censoring. Therefore, the Product-Limit Estimator is equal to the usual empirical survival function. For Country B, the empirical distribution function at 4 is: (15 + 20 + 20 + 10)/100 = 65/100 = 0.65; the empirical survival function at 4 is: 1 - 0.65 = 0.35.
2.70. There are no new entrants, and lives only leave through death; there is no truncation and censoring. Therefore, the Product-Limit Estimator is equal to the usual empirical survival function. For Country A, the empirical survival function at 4 is: (112 - 22)/200 = 0.45.
ti    si    ri     (ri-si)/ri    S(t)
                                 1
1     20    200    0.900         0.900
2     54    180    0.700         0.630
3     14    126    0.889         0.560
4     22    112    0.804         0.450
2.71. D. S10(8) = (9/10)(7/9)(6/7)(4/5)(3/4) = 18/50 = 0.36.
yi    si    ri    (ri-si)/ri    S(yi)
                                1
2     1     10    0.900         0.9
3     2     9     0.778         0.7
5     1     7     0.857         0.6
6     1     5     0.800         0.48
7     1     4     0.750         0.36
9     1     2     0.500         0.18
Comment: S10(t) = 0.36 for 7 ≤ t < 9. The Product Limit Estimator is constant on the intervals between observations.
2.72. E. S10(1.6) = (6/7)(5/6) = 5/7 = 0.714.
yi     si    ri    (ri-si)/ri    S(yi)
                                 1
0.9    1     7     0.857         0.857
1.5    1     6     0.833         0.714
1.7    1     5     0.800         0.571
2.1    2     3     0.333         0.190
Comment: S10(t) = 5/7 for 1.5 ≤ t < 1.7. The Product Limit Estimator is constant on the intervals between observations. At time 0.9, 7 lives are in the risk set: 1-7. At time 1.5, 6 lives are in the risk set: 3-8. Since it is left truncated at time 1.5, life #9 is not in the risk set at time 1.5. Since it is right censored at time 1.5, life #4 is in the risk set at time 1.5.
If this were size of loss data, then: di = deductible, xi is the size of an uncensored loss, and ui is the maximum covered loss if the loss has been censored from above.
If this were mortality data, then: di = age at entry to the study, xi is age at death (uncensored), and ui is age at withdrawal from the study (other than due to death) or when the study ends.
2.73. E. For the Exponential using maximum likelihood:
θ̂ = (sum of amounts paid)/(# of uncensored values) = (350 + ... + 1500)/7 = 7900/7 = 1128.6.
Ŝ2(1250) = exp[-1250/1128.6] = 0.330.
        si    ri    (ri - si)/ri
350     2     10    8/10
500     2     8     6/8
1000    1     5     4/5
1200    1     2     1/2
1500    1     1     0
Ŝ1(1250) = (8/10)(6/8)(4/5)(1/2) = 0.24. |Ŝ1(1250) - Ŝ2(1250)| = |0.24 - 0.330| = 0.090.
2.74. B. 0.5/0.65 = Sn (y4 )/Sn (y3 ) = (r4 - s4 )/r4 = (r4 - 3)/r4 . ⇒ 0.5r4 = 0.65r4 - 1.95. ⇒ r4 = 13. 0.25/0.5 = Sn (y5 )/Sn (y4 ) = (r5 - s5 )/r5 . ⇒ s5 = r5 /2. Now, r5 = r4 - s4 - 6 = 13 - 3 - 6 = 4. ⇒ s5 = 4/2 = 2. Comment: Given the outputs, solve for a missing input.
Section 3, Variance of the Product-Limit Estimator37

In the previous section, for the example with eight women, we estimated the Survival Function using the Kaplan-Meier Product-Limit Estimator:
i    yi    ri    si    (ri - si)/ri    S(yi)
1    68    7     1     6/7             6/7
2    77    7     2     5/7             (6/7)(5/7) = 30/49
3    79    5     1     4/5             (30/49)(4/5) = 24/49
4    84    3     1     2/3             (24/49)(2/3) = 16/49 = 0.327
5    90          0                     0.327

Greenwoodʼs Approximation:

Variance of the Product-Limit Estimator ≅ V̂[Sn(yj)] = Sn(yj)² Σi=1 to j si / {ri (ri - si)}.

In this example, using Greenwoodʼs approximation the variance of the estimate of S(77) is:
(30/49)² {1/((7)(6)) + 2/((7)(5))} = 0.0303.
Thus an approximate 95% confidence interval for S(77) is: 30/49 ± 1.96√0.0303 = 0.61 ± 0.34.
With so little data, the estimate is not very good.
Note that since F(x) + S(x) = 1, Var[Fn(x)] = Var[Sn(x)].
The terms that are summed in Greenwoodʼs approximation are:
si / {ri (ri - si)} = (si/ri) / {ri (ri - si)/ri} = (failure ratio)i / {ri (survival ratio)i}.
Therefore, ten times as much data would lead to one tenth the variance.
Here is a calculation of all the variances:
yi    si    ri    (ri-si)/ri    S(yi)    si/(ri(ri-si))    Cum. Sum    Var[S(yi)]
                                1
68    1     7     0.857         0.857    0.02381           0.02381     0.0175
77    2     7     0.714         0.612    0.05714           0.08095     0.0303
79    1     5     0.800         0.490    0.05000           0.13095     0.0314
84    1     3     0.667         0.327    0.16667           0.29762     0.0317

37 See Equation 14.3 in Loss Models.
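For readers who like to check such tables numerically, here is a short Python sketch of the same calculation (this is my own illustration, not something from Loss Models; the variable names are made up):

# Kaplan-Meier estimate and Greenwood's approximation for the eight women.
death_times = [68, 77, 79, 84]   # yi
risk_sets = [7, 7, 5, 3]         # ri
deaths = [1, 2, 1, 1]            # si
S = 1.0
greenwood_sum = 0.0
for y, r, s in zip(death_times, risk_sets, deaths):
    S *= (r - s) / r                       # multiply by the survival ratio
    greenwood_sum += s / (r * (r - s))     # accumulate Greenwood's sum
    variance = S**2 * greenwood_sum
    print(y, round(S, 3), round(variance, 4))
# The final line printed is 84 0.327 0.0317, matching the last row of the table above.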
As shown below, in the absence of truncation and censoring, Greenwoodʼs approximation gives the exact variance of the Empirical Survival Function.

Variance of ps and qs:

In this example, we would estimate 10p70 = S(80)/S(70) = (24/49)/(6/7) = 4/7 = 0.5714.
10p70 is the probability of survival to age 80, conditional on survival past 70.
Since 10p70 is conditional on survival past 70, perform all of the calculations beyond 70:
10p̂70 = Πi=2 to 3 (ri - si)/ri = (0.714)(0.800) = 0.571.
V̂[10p̂70] = (10p70)² Σi=2 to 3 si/{ri(ri - si)} = 0.5714² {2/((7)(5)) + 1/((5)(4))} = 0.0350.
yi    si    ri    (ri-si)/ri    S(yi)/S(70)    si/(ri(ri-si))    Cum. Sum    Var[S(yi)/S(70)]
                                1.000
77    2     7     0.714         0.714          0.05714           0.05714     0.0292
79    1     5     0.800         0.571          0.05000           0.10714     0.0350
84    1     3     0.667         0.381          0.16667           0.27381     0.0397
Note that since 10p70 + 10q70 = 1, Var[10q̂70] = Var[10p̂70].

Exercise: In the previous section, for an example with 10 losses the Survival Function was estimated using the Kaplan-Meier Product-Limit Estimator:
i    yi        ri    si    (ri - si)/ri    S(yi)
1    300       4     1     3/4             3/4
2    700       6     1     5/6             (3/4)(5/6) = 5/8
3    2000      8     1     7/8             (5/8)(7/8) = 35/64
4    11,000    7     1     6/7             (35/64)(6/7) = 15/32
5    12,000    6     1     5/6             (15/32)(5/6) = 25/64
6    15,000    5     1     4/5             (25/64)(4/5) = 5/16
7    29,000    4     1     3/4             (5/16)(3/4) = 15/64
8    70,000    1     1     0               0
Estimate the variances of these estimates.
[Solution:
yi        si    ri    (ri-si)/ri    S(yi)    si/(ri(ri-si))    Cum. Sum    Var[S(yi)]
                                    1
300       1     4     0.750         0.750    0.08333           0.08333     0.0469
700       1     6     0.833         0.625    0.03333           0.11667     0.0456
2,000     1     8     0.875         0.547    0.01786           0.13452     0.0402
11,000    1     7     0.857         0.469    0.02381           0.15833     0.0348
12,000    1     6     0.833         0.391    0.03333           0.19167     0.0292
15,000    1     5     0.800         0.312    0.05000           0.24167     0.0236
29,000    1     4     0.750         0.234    0.08333           0.32500     0.0179
70,000    1     1     0.000         0.000
In the above exercise, 10000p̂1000 = S(11000)/S(1000) = 0.469/0.625 = 0.750.
Equivalently, we start multiplying survival ratios at 1000: 10000p̂1000 = (0.875)(0.857) = 0.750.
V̂[10000p̂1000] = (10000p̂1000)² Σi=3 to 4 si/{ri(ri - si)} = 0.750² (0.01786 + 0.02381) = 0.0234.
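The same conditional calculation can be packaged as a small routine. The following Python sketch is only an illustration (the function name and data layout are my own, assuming one has the (yi, ri, si) triples from the Product-Limit table):

# Greenwood variance of a conditional survival probability such as 10000p1000:
# start the product and the Greenwood sum only at death times in (start, end].
def conditional_kpx(rows, start, end):
    """rows = list of (yi, ri, si); returns (estimate of kpx, Greenwood variance)."""
    p, gsum = 1.0, 0.0
    for yi, ri, si in rows:
        if start < yi <= end:
            p *= (ri - si) / ri
            gsum += si / (ri * (ri - si))
    return p, p**2 * gsum

rows = [(300, 4, 1), (700, 6, 1), (2000, 8, 1), (11000, 7, 1),
        (12000, 6, 1), (15000, 5, 1), (29000, 4, 1), (70000, 1, 1)]
print(conditional_kpx(rows, 1000, 11000))   # approximately (0.750, 0.0234)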
The Basis of Greenwoodʼs Approximation:38

Let the survival ratio be: (ri - si)/ri = ξi.
Assuming that the ri are given and fixed, then the ξi are independent. E[ξi] = S(yi)/S(yi-1).
Sn(yj) = Πi=1 to j ξi. E[Sn(yj)] = Πi=1 to j E[ξi] = S(yj)/S(y0).
Given ri, ri - si is Binomial with parameters: pi = S(yi)/S(yi-1), and m = ri.
The second moment of a Binomial Distribution is: variance + mean² = mq(1-q) + (mq)² = m²q²{1 + (1 - q)/(mq)}.
Therefore, ri - si has second moment: ri²pi²{1 + (1 - pi)/(ripi)}.

38 See pages 355 to 357 of Loss Models.
Therefore, E[ξi²] = E[(ri - si)²]/ri² = pi²{1 + (1 - pi)/(ripi)}.
Second moment of Sn(yj) = Πi=1 to j E[ξi²] = {Πi=1 to j pi²} {Πi=1 to j (1 + (1 - pi)/(ripi))}.
Πi=1 to j pi² = {S(y1)/S(y0)}² {S(y2)/S(y1)}² ... {S(yj)/S(yj-1)}² = S(yj)²/S(y0)².
Πi=1 to j {1 + (1 - pi)/(ripi)} ≅ 1 + Σi=1 to j (1 - pi)/(ripi).
Therefore, Var[Sn(yj)] ≅ {S(yj)²/S(y0)²} {1 + Σi=1 to j (1 - pi)/(ripi)} - S(yj)²/S(y0)²
= {S(yj)²/S(y0)²} Σi=1 to j (1 - pi)/(ripi).
Now, si/(ri - si) is an estimator of: (1 - pi)/pi = {S(yi-1) - S(yi)}/S(yi).
Also Sn(yj) is an estimator of: S(yj)/S(y0).
Therefore, Var[Sn(yj)] ≅ Sn(yj)² Σi=1 to j si/{(ri - si)ri}, which is Greenwoodʼs approximation.
Without Truncation and Censoring Greenwoodʼs Approximation is Exact:

Lemma: With n individuals in a mortality study, in the absence of truncation and censoring:
Σi=1 to j si/{ri(ri - si)} = wj/{n(n - wj)}, where wj = Σi=1 to j si.
Proof: For j = 1, it is true. Proof by induction; assume it is true for j - 1 and prove it is true for j.
Σi=1 to j si/{ri(ri - si)} = Σi=1 to j-1 si/{ri(ri - si)} + sj/{rj(rj - sj)}
= wj-1/{n(n - wj-1)} + sj/{(n - wj-1)(n - wj-1 - sj)}
= {wj-1(n - wj-1 - sj) + sjn} / {n(n - wj-1)(n - wj-1 - sj)} = {(n - wj-1)(wj-1 + sj)} / {n(n - wj-1)(n - wj-1 - sj)} = wj/{n(n - wj)}.
Therefore, using Greenwoodʼs Approximation, in the absence of truncation and censoring:
Var[Sn(yj)] ≅ Sn(yj)² Σ si/{ri(ri - si)} = Sn(yj) {(n - wj)/n} wj/{n(n - wj)} = Sn(yj)(wj/n)/n = Sn(yj)Fn(yj)/n.
In the absence of truncation and censoring, the Product-Limit Estimator is the Empirical Survival Function, and Greenwoodʼs Approximation is equal to its variance.

Covariances:39

Using the same type of approximations as Greenwood:
Cov[Sn(yj), Sn(yk)] ≅ [Sn(yj)] [Sn(yk)] Σi=1 to Min[j,k] si/{ri(ri - si)}.40
Corr[Sn(yj), Sn(yk)] ≅ √( {Σi=1 to Min[j,k] si/{ri(ri - si)}} / {Σi=1 to Max[j,k] si/{ri(ri - si)}} ).
For example, in the previous exercise:
Cov[Sn(5000), Sn(25000)] ≅ [Sn(5000)] [Sn(25000)] Σi=1 to 3 si/{ri(ri - si)} = (0.547)(0.312)(0.13452) = 0.0230.
In the previous exercise, Prob[5000 < X ≤ 25,000] = Sn(5000) - Sn(25000) = 0.547 - 0.312 = 0.235.
The variance of that estimate is:
Var[Sn(5000) - Sn(25000)] = Var[Sn(5000)] + Var[Sn(25000)] - 2Cov[Sn(5000), Sn(25000)]
= 0.0402 + 0.0236 - (2)(0.0230) = 0.0178.

39 Loss Models does not discuss covariances of the Kaplan-Meier Product-Limit Estimator.
40 Without truncating and censoring, this reduces to the formula for covariances of Empirical Survival Functions, discussed previously. For x ≤ y, Cov[S(x), S(y)] = F(x)S(y)/N.
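As a numerical check of these formulas, here is a brief Python sketch using the values from the previous exercise (my own illustration; the variable names are made up):

# Variance of an estimated layer probability Prob[5000 < X <= 25000], using the
# Greenwood variances and the covariance approximation Cov = S(a) S(b) (Greenwood sum up to a).
S_a, S_b = 0.547, 0.312          # Sn(5000) and Sn(25000)
var_a, var_b = 0.0402, 0.0236    # Greenwood variances at 5000 and 25000
gsum_to_a = 0.13452              # sum of si/(ri(ri - si)) over death times <= 5000
cov = S_a * S_b * gsum_to_a              # approximately 0.0230
var_layer = var_a + var_b - 2.0 * cov    # approximately 0.0179 (0.0178 above, after rounding cov)
print(round(cov, 4), round(var_layer, 4))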
Another Example:

Sometimes you are just given the risk set information in a question.41

Exercise: For a survival study with censored and truncated data, you are given:
Time (t)    Number at Risk at Time t    Failures at Time t
1           35                          4
2           28                          8
3           31                          5
4           24                          4
5           22                          3
Using the Product Limit Estimator, estimate the survival functions and the variance of each estimate.
[Solution: V̂[Sn(yj)] = Sn(yj)² Σi=1 to j si/{ri(ri - si)}.
ti    si    ri    (ri-si)/ri    S(ti)    si/(ri(ri-si))    Cum. Sum    Var[S(ti)]
                                1
1     4     35    0.886         0.886    0.00369           0.00369     0.00289
2     8     28    0.714         0.633    0.01429           0.01797     0.00719
3     5     31    0.839         0.531    0.00620           0.02418     0.00681
4     4     24    0.833         0.442    0.00833           0.03251     0.00636
5     3     22    0.864         0.382    0.00718           0.03969     0.00579
Comment: The variances of the estimates in the two tails are lower than those near the middle, as would be the case for the empirical survival function.
As discussed previously, for a complete data set, Var[Ŝ(x)] = F(x) S(x) / n.]

41 See for example, 4, 11/03, Q. 21.
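Many exam questions then ask for a linear (normal approximation) confidence interval built from such a table. A minimal Python sketch, using the time-3 row of the table above (the variable names are mine):

# 95% linear confidence interval for S(3) from the Product-Limit estimate
# and Greenwood's approximation of its variance.
S_hat = 0.531        # estimate of S(3)
var_hat = 0.00681    # Greenwood's approximation of Var[S(3)]
z = 1.96             # normal quantile for a 95% interval
half_width = z * var_hat**0.5
print(round(S_hat - half_width, 3), round(S_hat + half_width, 3))   # about 0.369 and 0.693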
Problems:

Use the following information for the next eight questions:
        Group A          Group B
yi      si      ri       si      ri
1       40      300      20      200
2       20      200      30      300
3       30      200      20      150
4       20      100      10      100
5       10      50       5       50
3.1 (1 point) Determine the Product-Limit estimate of S(2) based on the data for Group A. (A) 72% (B) 74% (C) 76% (D) 78% (E) 80% 3.2 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 1.5% (B) 2.0% (C) 2.5% (D) 3.0% (E) 3.5% 3.3 (2 points) Determine the Product-Limit estimate of S(4) based on the data for Group A. (A) 53% (B) 55% (C) 57% (D) 59% (E) 61% 3.4 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 1.5% (B) 2.0% (C) 2.5% (D) 3.0% (E) 3.5% 3.5 (2 points) Determine the Product-Limit estimate of S(4) based on the data for Group B. (A) 59%
(B) 61%
(C) 63%
(D) 65%
(E) 67%
3.6 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 1.5%
(B) 2.0%
(C) 2.5%
(D) 3.0%
(E) 3.5%
3.7 (3 points) Determine the Product-Limit estimate of S(4) based on the data for both groups. (A) Less than 55% (B) At least 55%, but less than 60% (C) At least 60%, but less than 65% (D) At least 65%, but less than 70% (E) At least 70% 3.8 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 1.5% (B) 2.0% (C) 2.5% (D) 3.0% (E) 3.5%
3.9 (3 points) Based on the following information, determine Greenwoodʼs approximation to the variance of the Product Limit estimate of S(40). yi ri si 10 800 200 20 1000 150 30 600 50 40 350 60 50 200 40 60 100 30 (A) 0.00015
(B) 0.00020
(C) 0.00025
(D) 0.00030
(E) 0.00035
Use the following information for the next two questions: For a study of patients with HIV, with time measured from when they were diagnosed: Time Event 0 40 new entrants 1.5 1 death 2.0 5 withdrawals from the study 2.7 1 death 3.0 10 new entrants to the study 3.3 1 death 4.0 5 withdrawals from the study 4.2 2 deaths 5.0 study ends 3.10 (3 points) Determine the standard deviation of the Product-Limit estimate of S(3). (A) 0.03 (B) 0.04 (C) 0.05 (D) 0.06 (E) 0.07 3.11 (3 points) Determine the standard deviation of the Product-Limit estimate of S(5). (A) 0.03 (B) 0.04 (C) 0.05 (D) 0.06 (E) 0.07
3.12 (3 points) For the interval from 0 to 1 year, the exposure (r) is 65 and the number of deaths (s) is 13. For the interval from 1 to 2 years the exposure is 90 and the number of deaths is 14. For 2 to 3 years the values are 45 and 5; for 3 to 4 years they are 30 and 4; and for 4 to 5 years they are 20 and 2.
Determine Greenwoodʼs approximation to the variance of Ŝ(4).
Use the following information for the next two questions: You are given the following censored data on the time until remission of symptoms of 10 cancer patients on chemotherapy. i Time to Remission (weeks) 1 8 Remission 2 10 Remission 3 17 Censored 4 22 Remission 5 32 Censored 6 33 Remission 7 33 Censored 8 47 Remission 9 82 Remission 10 104 Censored 3.13 (2 points) Determine the Product-Limit Estimator of S(104). (A) 14% (B) 16% (C) 18% (D) 20%
(E) 22%
3.14 (2 points) Using Greenwoodʼs approximation, determine the variance of the estimate in the previous question. (A) 0.020 (B) 0.025 (C) 0.030 (D) 0.035 (E) 0.040 Use the following information for the next two questions: Liability insurance policies are sold with three limits: 50, 100, and 250. You observe 10 claim payments from each type of policy. Limit of 50: 3, 9, 14, 27, 39, 50, 50, 50, 50, 50. Limit of 100: 4, 8, 17, 22, 55, 60, 100, 100, 100, 100. Limit of 250: 7, 12, 20, 32, 45, 70, 90, 125, 190, 250. 3.15 (2 points) Determine the Product-Limit Estimator of S(75). (A) 31% (B) 33% (C) 35% (D) 37%
(E) 39%
3.16 (3 points) Using Greenwoodʼs approximation, determine the variance of the estimate in the previous question. (A) 0.0080 (B) 0.0085 (C) 0.0090 (D) 0.0095 (E) 0.0100
3.17 (2 points) Margieʼs favorite soap opera on daytime TV is “The Young and the Beautiful.” “The Young and the Beautiful” has been on TV for 17 years. You have the following data on the length of time daytime soap operas have spent on TV: 1, 1, 2, 3, 3+, 5, 9, 11+, 15, 17+, 22, 25+, 31, 37+, 46+, 47+, 53+, where a + indicates that the soap opera is still on the air. Using the Product-Limit Estimator, what is the probability that “The Young and the Beautiful” will remain on TV for more than 20 additional years? A. Less than 65% B. At least 65% but less than 70% C. At least 70% but less than 75% D. At least 75% but less than 80% E. At least 80% 3.18 (2 points) What is the standard deviation of the estimate in the previous question? A. Less than 0.16 B. At least 0.16 but less than 0.17 C. At least 0.17 but less than 0.18 D. At least 0.18 but less than 0.19 E. At least 0.19
3.19 (3 points) Let y1 be the first time at which deaths occurred. S(y1 ) is estimated using the Product-Limit estimator. (0.83572, 0.90187) is a 90% linear confidence interval for S(y1 ). How many deaths occurred at time y1 ? A. 36
B. 37
C. 38
D. 39
E. 40
3.20 (3 points) For 1000 liability insurance claims you are given:
(i) Claims are submitted to an insurer t months after the accident occurs, t = 0, 1, 2...
(ii) There are no censored observations.
(iii) Ŝ(t) is calculated using the Kaplan-Meier product limit estimator.
(iv) cS²(t) = V̂ar[Ŝ(t)] / Ŝ(t)², where V̂ar[Ŝ(t)] is calculated using Greenwoodʼs approximation.
(v) Ŝ(39) = 0.313, cS²(39) = 0.012089, cS²(40) = 0.012250.
Determine the number of claims that were submitted to the insurer 40 months after an accident occurred.
(A) 11 (B) 13 (C) 15 (D) 17 (E) 19
3.21 (2 points) A dating service has collected data on the length of time in months from when a couple met to when they got married to each other. Here is the data on 20 couples: Couple Time Until Married Time Since the Couple Met 1 8 2 14 3 16 4 18 5 18 6 18 7 25 8 27 9-20 30 or more ^
S (t) is calculated using the Kaplan-Meier product limit estimator. c 2 (t) = S
^ ^ V ar [S (t)] ^
^ ^ [S , where Var (t)] is calculated using Greenwoodʼs approximation.
S (t)2
Determine cS(30). (A) 0.14
(B) 0.16
(C) 0.18
(D) 0.20
(E) 0.22
Use the following information for the next two questions: The results of a study are as follows: Individual Birthdate Event A April 1, 1932 Died on July 1, 2007 B July 1, 1932 C October 1, 1932 Left the study on March 1, 2008 D January 1, 1933 E April 1, 1933 F July 1, 1933 Died on December 1, 2009 G September 1, 1933 All individuals were in the study on January 1, 2007. The study ended on January 1, 2010. 3.22 (2 points) Using the Product Limit estimator, estimate 2 q75. (A) 0.36
(B) 0.38
(C) 0.40
(D) 0.42
(E) 0.44
3.23 (2 points) What is the standard deviation of the estimate in the previous question? (A) 0.09 (B) 0.12 (C) 0.15 (D) 0.18 (E) 0.21
3.24 (2 points) You are given: (i) Ten people join an exercise program on the same day. They stay in the program until they reach their weight loss goal or switch to a diet program. (ii) Experience for each of the ten members is shown below: Time at Which… Member
Reach Weight Loss Goal
Switch to Diet Program
1
6
2
11
3
14
4
14
5
14
6
18
7
23
8
27
9
34
10
40
(iii) The variable of interest is time to reach weight loss goal. Using the Product estimator, calculate the upper limit of the symmetric 90% linear confidence interval for the survival function S(20). (A) 0.78
(B) 0.80
(C) 0.82
(D) 0.84
(E) 0.86
3.25 (2 points) You are given: Individual A B C D E F G H
Entry 0 0 0 0 50 60 70 80
Age at Withdrawal 63 72 -
Death 72 63 70 72 82 89 ^
Using the Product-Limit estimator, determine the coefficient of variation of S (75). (A) 0.3
(B) 0.4
(C) 0.5
(D) 0.6
(E) 0.7
3.26 (160, 11/86, Q.11) (2.1 points) A mortality study yields the following results: ti ri si 41 100 10 42 200 20 43 200 20 The Product-Limit estimator is used to approximate 3 p 40. Using Greenwood's formula, estimate the variance of the approximation of 3 p 40. A. 0.0012
B. 0.0015
C. 0.0017
D. 0.0020
E. 0.0022
3.27 (160, 11/87, Q.6) (2.1 points) A mortality study yields the following results: ti
ri
si
63 100 10 64 120 12 65 110 15 The Product-Limit estimator is used to approximate 3 p 62. Using Greenwood's formula, estimate the standard deviation of the approximation of 3 p 62. A. 0.0412
B. 0.0415
C. 0.0420
D. 0.0425
E. 0.0432
3.28 (160, 5/88, Q.16) (2.1 points) For the estimation interval (x, x+1], you are given: (i) There are 15 people in the study at age x. (ii) One intermediate entrant occurs at each age x + 0.4, x + 0.7. (iii) One intermediate termination occurs at each age x + 0.2, x + 0.6. (iv) One death occurs at each age x + 0.1, x + 0.3, x + 0.5, x + 0.8. Using the estimated value of px and Greenwood's formula, calculate the estimated variance of the product limit estimator of qx. (A) 0.01337
(B) 0.01344
(C) 0.01350
(D) 0.01357
(E) 0.01363
3.29 (160, 11/88, Q.8) (2.1 points) From a two-year complete mortality study of 10,000 lives that retired at exact age 65, you are given: ^
(i) Greenwood's approximation of the variance of S (2) is .0000088. (ii) The observed mortality rates at age 65 and age 66 are equal and each less than 1/4. Determine the observed mortality rate at age 65. (A) 0.05 (B) 0.06 (C) 0.07 (D) 0.08 (E) 0.09
3.30 (160, 5/90, Q.9) (2.1 points) From a two-year mortality study of 1000 lives beginning at exact age 40, you are given: (i) The observed number of deaths in the interval (40, 41] is the same as the observed number of deaths in the interval (41, 42]. (ii) Greenwood's approximation of Var[S(2)] is 0.00016. (iii) The observed mortality rate q^ 40 < 0.2. Calculate q^ 40. (A) 0.096
(B) 0.097
(C) 0.098
(D) 0.099
(E) 0.100
3.31 (160, 5/90, Q.14) (2.1 points) For the estimation interval (x, x+2], you are given: (i) Ten lives enter observation at exact age x. (ii) One intermediate entrant occurs at each age x+0.8, x+1.0. (iii) One intermediate withdrawal occurs at age x+1.5. (iv) One death occurs at each age x+0.2, x+0.5, x+1.3, x+1.7. Using Greenwoodʼs approximation, calculate the estimated variance of the product limit estimator of 2 q x. (A) 0.0210
(B) 0.0214
(C) 0.0218
(D) 0.0222
(E) 0.0226
3.32 (Course 160 Sample Exam #1, 1996, Q.8) (1.9 points) For a study of 1000 lives over three years, you are given: (i) There are no new entrants or withdrawals. (ii) Deaths occur at the end of the year of death. (iii) ni is the number of people alive at time i, i = 0, 1, 2. (iv)
qi = 1.11 x 10-4 for i = 0, 1. n pi
(v)
qi = 1.18 x 10-4 for i = 2. n pi
Calculate the conditional variance of S(3) using Greenwood's approximation. (A) 1.83 x 10-4
(B) 1.85 x 10-4
(C) 1.87 x 10-4
(D) 1.89 x 10-4
(E) 1.91 x 10-4
3.33 (4, 5/00, Q.38) (2.5 points) A mortality study is conducted on 50 lives observed from time zero.
You are given:
(i)
Time    Number of Deaths    Number Censored
15      2                   0
17      0                   3
25      4                   0
30      0                   c30
32      8                   0
40      2                   0
(ii) Ŝ50(35) is the Product-Limit estimate of S(35).
(iii) V̂[Ŝ50(35)] is the estimate of the variance of Ŝ50(35) using Greenwoodʼs approximation.
(iv) V̂[Ŝ50(35)] / [Ŝ50(35)]² = 0.011467.
Determine c30, the number censored at time 30.
(E) 11
3.34 (4, 11/02, Q.33) (2.5 points) The following results were obtained from a survival study, using the Product-Limit estimator:
t     Ŝ(t)     √V̂[Ŝ(t)]
17    0.957    0.0149
25    0.888    0.0236
32    0.814    0.0298
36    0.777    0.0321
39    0.729    0.0348
42    0.680    0.0370
44    0.659    0.0378
47    0.558    0.0418
50    0.360    0.0470
54    0.293    0.0456
56    0.244    0.0440
57    0.187    0.0420
59    0.156    0.0404
62    0.052    0.0444
Determine the lower limit of the 95% linear confidence interval for the 75th percentile of the survival distribution. (A) 32 (B) 36 (C) 50 (D) 54 (E) 56
3.35 (4, 11/03, Q.21 & 2009 Sample Q.16) (2.5 points) For a survival study with censored and truncated data, you are given: Time (t) Number at Risk at Time t Failures at Time t 1 30 5 2 27 9 3 32 6 4 25 5 5 20 4 The probability of failing at or before Time 4, given survival past Time 1, is 3 q1 . Calculate Greenwoodʼs approximation of the variance of 3 q^ 1 . (A) 0.0067
(B) 0.0073
(C) 0.0080
(D) 0.0091
(E) 0.0105
3.36 (3 points) In the previous question, calculate Greenwoodʼs approximation of the variance of ^ 3 q 2.
(A) 0.0065
(B) 0.0070
(C) 0.0075
(D) 0.0080
(E) 0.0085
3.37 (5 points) Using the information in 4, 11/03, Q.21, and the Kaplan-Meier Product-Limit estimator, estimate S(1), S(2), S(3), S(4), and S(5). 3.38 (5 points) Calculate Greenwoodʼs approximation of the variance of each estimate in the previous question. 3.39 (5 points) Using the solutions to the previous questions, calculate linear 95% confidence intervals for S(1), S(2), S(3), S(4), and S(5).
3.40 (4, 11/06, Q.7 & 2009 Sample Q.252) (2.9 points) The following is a sample of 10 payments:
4    4    5+    5+    5+    8    10+    10+    12    15
where + indicates that a loss exceeded the policy limit.
Determine Greenwoodʼs approximation to the variance of the product-limit estimate Ŝ(11).
(B) 0.031
(C) 0.048
(D) 0.064
(E) 0.075
3.41 (4, 5/07, Q.12) (2.5 points) For 200 auto accident claims you are given:
(i) Claims are submitted t months after the accident occurs, t = 0, 1, 2...
(ii) There are no censored observations.
(iii) Ŝ(t) is calculated using the Kaplan-Meier product limit estimator.
(iv) cS²(t) = V̂ar[Ŝ(t)] / Ŝ(t)², where V̂ar[Ŝ(t)] is calculated using Greenwoodʼs approximation.
(v) Ŝ(8) = 0.22, Ŝ(9) = 0.16, cS²(9) = 0.02625, cS²(10) = 0.04045.
Determine the number of claims that were submitted to the company 10 months after an accident occurred. (A) 10 (B) 12 (C) 15 (D) 17 (E) 18
Solutions to Problems:

3.1. D. & 3.2. C. Ŝ(2) = {(300 - 40)/300} {(200 - 20)/200} = (0.867)(0.900) = 0.780.
yi    si    ri     (ri-si)/ri    S(yi)    si/(ri(ri-si))    Cum. Sum    Var[S(yi)]
                                 1
1     40    300    0.867         0.867    0.000513          0.000513    0.000385
2     20    200    0.900         0.780    0.000556          0.001068    0.000650
V̂[Ŝ(2)] = Ŝ(2)² Σ si/{ri(ri - si)} = (0.780²){40/{(300)(300 - 40)} + 20/{(200)(200 - 20)}}
= (0.6084)(0.001068) = 0.000650.
The standard deviation of the estimate of S(2) is: √0.000650 = 0.0255.
Comment: An approximate 95% confidence interval for S(2) is: 0.78 ± 0.05.
3.3. A. & 3.4. E.
4
S(4) = Π {(ri - si)/ri} = 0.530. ^
i =1
yi
si
ri
(ri-si)/ri
1 2 3 4
40 20 30 20
300 200 200 100
0.867 0.900 0.850 0.800
S(yi) 1 0.867 0.780 0.663 0.530
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.000513 0.000556 0.000882 0.002500
0.000513 0.001068 0.001951 0.004451
0.000385 0.000650 0.000857 0.001252
4
V [ S(4)] = S(4)2 Σ si/{ri(ri - si)} = (.5302 )(.004451) = 0.001252. ^ ^
^
i =1
The standard deviation of the estimate of S(4) is: 0.001252 = 0.035. Comment: An approximate 95% confidence interval for S(4) is: 0.53 ± 0.07. 3.5. C. & 3.6. E. For Group B: yi
si
ri
(ri-si)/ri
1 2 3 4
20 30 20 10
200 300 150 100
0.900 0.900 0.867 0.900
S(yi) 1 0.900 0.810 0.702 0.632
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.000556 0.000370 0.001026 0.001111
0.000556 0.000926 0.001952 0.003063
0.000450 0.000607 0.000962 0.001223
4
V [ S(4)] = S(4)2 Σ si/{ri(ri -si)} = (.6322 )(.003063) = 0.001223. ^ ^
^
i =1
The standard deviation of the estimate of S(4) is:
0.001223 = 0.035.
3.7. B. & 3.8. C. For Group A plus Group B: yi
si
ri
(ri-si)/ri
1 2 3 4
60 50 50 30
500 500 350 200
0.880 0.900 0.857 0.850
S(yi) 1 0.880 0.792 0.679 0.577
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.000273 0.000222 0.000476 0.000882
0.000273 0.000495 0.000971 0.001853
0.000211 0.000310 0.000448 0.000617
4
V [ S(4)] = S(4)2 Σ si/{ri(ri -si)} = (.5772 )(.001853) = 0.000617. ^ ^
^
i =1
The standard deviation of the estimate of S(4) is: 0.000617 = 0.025. Comment: A crude approximation to the variance is: S(4)(1 - S(4)) / N, the formula for the variance of the empirical survival function. With truncation and censoring, this formula does not apply, since the number of data points varies. However, if we take N ≅ 400, then we get: (.577)(1 - .577)/400 = .00061, close to the correct answer. 4
si = (0.4842 )(0.001336) = 0.000313. r (r s ) i i=1 i i
3.9. D. V [ S(40)] = S(40)2 ∑ ^ ^
^
yi
si
ri
(ri-si)/ri
10 20 30 40
200 150 50 60
800 1000 600 350
0.750 0.850 0.917 0.829
S(yi) 1 0.750 0.637 0.584 0.484
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.000417 0.000176 0.000152 0.000591
0.000417 0.000593 0.000745 0.001336
0.000234 0.000241 0.000254 0.000313
3.10. B. & 3.11. C. V [ S(3)] = S(3)2 Σ si/{ri(ri -si)} = (0.9462 )(0.001532) = 0.001372. ^ ^
^
0.001372 = 0.0370. yi
si
ri
(ri-si)/ri
1.5 2.7 3.3 4.2
1 1 1 2
40 34 43 37
0.975 0.971 0.977 0.946
S(yi) 1 0.975 0.946 0.924 0.874
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.000641 0.000891 0.000554 0.001544
0.000641 0.001532 0.002086 0.003630
0.000609 0.001372 0.001782 0.002775
V [ S(5)] = S(5)2 Σ si/{ri(ri -si)} = (0.8742 )(0.003630) = 0.002775. ^ ^
^
0.002775 = 0.0527.
^
3.12. A. S(4) = {(65 - 13)/65}{(90 - 14)/90}{(45 - 5)/45}{(30 - 4)/30} = 0.5204. V [ S(4)] = S(4)2 Σ si/{ri(ri -si)} = ^ ^
^
(0.52042 ){13/((65)(52)) + 14/((90)(76)) + 5/((45)(40)) + 4/((30)(26))} = (0.52042 )(0.01380) = 0.00374. yi
si
ri
(ri-si)/ri
1 2 3 4 5
13 14 5 4 2
65 90 45 30 20
0.8000 0.8444 0.8889 0.8667 0.9000
S(yi) 1 0.8000 0.6756 0.6005 0.5204 0.4684
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.003846 0.002047 0.002778 0.005128 0.005556
0.003846 0.005893 0.008671 0.013799 0.019354
0.002462 0.002689 0.003127 0.003737 0.004246
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.011111 0.013889 0.023810 0.050000 0.166667 0.500000
0.011111 0.025000 0.048810 0.098810 0.265476 0.765476
0.009000 0.016000 0.022950 0.029735 0.035507 0.025595
Comment: Similar to Exercise 14.20 in Loss Models. 3.13. C. and 3.14. B. yi
si
ri
(ri-si)/ri
8 10 22 33 47 82
1 1 1 1 1 1
10 9 7 5 3 2
0.900 0.889 0.857 0.800 0.667 0.500
S(yi) 1 0.900 0.800 0.686 0.549 0.366 0.183
3.15. E. and 3.16. D. yi
si
ri
(ri-si)/ri
3 4 7 8 9 12 14 17 20 22 27 32 39 45 55 60 70 90 125 190
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
30 29 28 27 26 25 24 23 22 21 20 19 18 17 11 10 9 8 3 2
0.967 0.966 0.964 0.963 0.962 0.960 0.958 0.957 0.955 0.952 0.950 0.947 0.944 0.941 0.909 0.900 0.889 0.875 0.667 0.500
S(yi) 1 0.9667 0.9333 0.9000 0.8667 0.8333 0.8000 0.7667 0.7333 0.7000 0.6667 0.6333 0.6000 0.5667 0.5333 0.4848 0.4364 0.3879 0.3394 0.2263 0.1131
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.001149 0.001232 0.001323 0.001425 0.001538 0.001667 0.001812 0.001976 0.002165 0.002381 0.002632 0.002924 0.003268 0.003676 0.009091 0.011111 0.013889 0.017857 0.166667 0.500000
0.001149 0.002381 0.003704 0.005128 0.006667 0.008333 0.010145 0.012121 0.014286 0.016667 0.019298 0.022222 0.025490 0.029167 0.038258 0.049369 0.063258 0.081115 0.247781 0.747781
0.001074 0.002074 0.003000 0.003852 0.004630 0.005333 0.005963 0.006519 0.007000 0.007407 0.007741 0.008000 0.008185 0.008296 0.008994 0.009400 0.009517 0.009343 0.012685 0.009571
Note that since there is no truncation or censoring between 0 and 50, S(45) = 16/30 = 0.5333. Then S(70) = S(45)(10/11)(9/10)(8/9) = (0.5333)8/11 = 0.388. Var[S(45)] = (0.5333)(1 - 0.5333)/30 = 0.008296.
⇒ Σ si/{ri(ri - si)} = 0.008296/0.53332 = 0.02917. ⇒ Var[S(70)] = 0.3882 {0.02917 + 1/((11)(10)) + 1/((10)(9)) + 1/((9)(8))} = 0.00952. 3.17. B. & 3.18. D. We want to estimate S(37)/S(17). Therefore, start calculating after 17. yi si ri (ri-si)/ri S(yi)/S(17) si/(ri(ri-si)) 22 31
1 1
7 5
6/7 4/5
6/7 (6/7)(4/5) = 24/35 = 0.686.
Var[Sn (37)/Sn (17)] = 0.6862 (1/42 + 1/20) = 0.0347.
1/42 1/20
0.0347 = 0.186.
3.19. B. The midpoint of the interval, .868795, is the estimate of S(y1 ). S(y1 ) = (r1 - s1 )/r1 = 0.868795. ⇒ s1 /r1 = 0.131205. ⇒ s1 = 0.131205r1 . For a 90% confidence interval, ±1.645 standard deviations. ⇒ Var[S(y1)] .
0.90187 - 0.868795 = 1.645
⇒ Var[S(y1 )] = {(0.90187 - 0.868795)/1.645}2 = 0.0004043. 0.0004043 = Var[S(y1 )] = S(y1 )2 (s1 /r1 )/(r1 - s1 )} = (0.8687952 ) (0.131205)/(r1 - s1 ).
⇒ r1 - s1 = 245. ⇒ 0.868795r1 = 245. ⇒ r1 = 282. ⇒ s1 = (0.131205)(282) = 37. Comment: Given the usual output, solve for a missing input. r1 and s1 have to be integers. ^
3.20. C. We have a complete data set, and thus S (t) is given by the empirical survival function. ^
r40/1000 = S (39) = 0.313. ⇒ r40 = 313. ^ ^ ^ [S c 2 (t) = Var (t)]/ S (t)2 = S
t
∑ si / {ri(ri -si )} .
i=0
Therefore, cS2 (40) - cS2 (39) = s40/{r40(r40 - s40)}. 0.012250 - 0.012089 = 0.000161 = s40/{r40(r40 - s40)} = s40/{313(313 - s40)}. ⇒ s40 = 15. Comment: Similar to 4, 5/07, Q.12. ^ ^ ^ [S 3.21. A. cS2 (30) = Var (30)] / S (30)2 = 1/{(20)(19)} + 1/{(19)(18)} + 2/{(17)(15)} + 1/{(14)(13)} = 0.01889.
cS(30) =
0.01889 = 0.1375.
yi
si
ri
(ri-si)/ri
8 14 18 25
1 1 2 1
20 19 17 14
0.950 0.947 0.882 0.929
S(yi) 1 0.950 0.900 0.794 0.737
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.002632 0.002924 0.007843 0.005495
0.002632 0.005556 0.013399 0.018893
0.002375 0.004500 0.008450 0.010273
^
S (30) = (19/20)(18/19)(15/17)(13/14) = 0.737. ^ ^ ^ [S Var (30)] = S (30)2 (1/{(20)(19)} + 1/{(19)(18)} + 2/{(17)(15)} + 1/{(14)(13)})
= (0.7372 )(0.01889) = 0.0103. Comment: For the decrement of death, where everyone is assumed to die after a finite time, S(∞) = 0. However, ere a couple who meet through the dating service may never marry each other, and S(∞) > 0.
3.22. A. & 3.23. E.
Individual    Birthdate        Death           Age on 1/1/07     Age at Death or Censoring
A             April 1, 1932    July 1, 2007    74 & 9 months     75 years and 3 months
B             July 1, 1932                     74 & 6 months     77 years and 6 months
C             Oct. 1, 1932                     74 & 3 months     75 years and 5 months
D             Jan. 1, 1933                     74                77 years
E             April 1, 1933                    73 & 9 months     76 years and 9 months
F             July 1, 1933     Dec. 1, 2009    73 & 6 months     76 years and 5 months
G             Sept. 1, 1933                    73 & 4 months     76 years and 4 months
yi               ri    si    (ri-si)/ri    si/{(ri - si)ri}
75 & 3 months    7     1     6/7           1/42
76 & 5 months    4     1     3/4           1/12
2p75 = S(77)/S(75) = (6/7)(3/4) = 9/14. 2q75 = 1 - 2p75 = 5/14 = 0.3571.
Var[2q75] = Var[2p75] = (9/14)² (1/42 + 1/12) = 0.04428. √0.04428 = 0.2104.
Comment: At 75 & 3 months the risk set is: A to G. At 76 & 5 months the risk set is: B, D, E, F.
3.24. C. S (20) = (8/9)(6/8)(4/5) = 0.533. yi
si
ri
(ri-si)/ri
11 14 18
1 2 1
9 8 5
0.889 0.750 0.800
S(yi) 1 0.889 0.667 0.533
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.013889 0.041667 0.050000
0.013889 0.055556 0.105556
0.010974 0.024691 0.030025
^ ^ ^ [S Var (20)] = S (20)2 (1/72 + 2/48 + 1/20) = 0.5332 (0.1056) = 0.0300.
Upper limit of the symmetric 90% linear confidence interval for S(20): 0.533 + (1.645)
0.0300 = 0.818.
Comment: Similar to 4, 5/07, Q.33, which used the Nelson-Aalen estimator. 3.25. D.
xi
si
ri
(ri-si)/ri
si/{(ri - si)ri}
63 70 72
1 1 2
6 4 4
5/6 3/4 2/4
1/30 1/12 2/8
^
^
S (75) = (5/6)(3/4)(2/4) = 5/16. Var[ S (75)] = (5/16)2 (1/30 + 1/12 + 2/8) = 0.03581. ^
coefficient of variation of S (75):
0.03581 = 0.6055. 5 / 16
Alternately, 1/ 30 + 1/ 12 + 2 / 8 = 0.6055. Comment: Risk set at 63 is: A to F. Risk set at 70 is: A, D, E, F; life G which enters at age 70 is not available to fail at exactly age 70. Risk set at 72 is: A, E, F, G.
3.26. A.
ti    ri     si    (ri - si)/ri    si/{(ri - si)ri}
41    100    10    0.9             0.001111
42    200    20    0.9             0.000556
43    200    20    0.9             0.000556
Estimate of 3p40 = (0.9)(0.9)(0.9) = 0.729.
Estimate of the variance of the estimate of 3p40 is:
(0.729²)(0.001111 + 0.000556 + 0.000556) = 0.00118.
3.27. A.
ti    ri     si    (ri - si)/ri    si/{(ri - si)ri}
63    100    10    0.9             0.001111
64    120    12    0.9             0.000926
65    110    15    0.864           0.001435
Estimate of 3p62 = (0.9)(0.9)(0.864) = 0.700.
Estimate of the variance of the estimate of 3p62 is:
(0.7²)(0.001111 + 0.000926 + 0.001435) = 0.00170. Standard deviation: √0.00170 = 0.0412.
3.28. A.
yi         ri    si    (ri - si)/ri    si/{(ri - si)ri}
x + 0.1    15    1     0.9333          0.004762
x + 0.3    13    1     0.9231          0.006410
x + 0.5    13    1     0.9231          0.006410
x + 0.8    12    1     0.9167          0.007576
Estimate of px = (0.9333)(0.9231)(0.9231)(0.9167) = 0.7290.
Estimate of the variance of the estimate of qx is:
(0.7290²)(0.004762 + 0.006410 + 0.006410 + 0.007576) = 0.01337.
S(2) = (1 - q)(1-q) = 1 - 2q + q2 . Var[ S (2)] = S(2){1 - S(2)}/10000 = (1 - 2q + q2 )(2q - q2 )/10000. ⇒ 0.0000088 = (1 - 2q + q2 )(2q - q2 )/10000.
⇒ 0.088 = 2q - 5q2 - q4 . Try the choices. For q = 0.05, 2q - 5q2 - q4 = 0.0875. Comment: Using Mathematica, the roots of q4 + 5q2 - 2q + 0.088 = 0 are: 0.050338, 0.340408, -0.195373 ± 2.25774i. The only real root less than 1/4 is 0.05. For small q, 2q - 5q2 - q4 ≅ 2q - 5q2 . 0.088 ≅ 2q - 5q2 . ⇒ q ≅ 0.050 or q ≅ 0.350.
3.30. E. In the absence of entries or withdrawals, Greenwood's approximation is exact and produces the same result as the variance of the empirical survival function. Let x be the number of deaths in each interval (40, 41] and (41, 42]. Var[S(2)] = (2x/1000)(1 - 2x/1000)/1000 = .00016. ⇒ x(1 - x/500) = 80.
⇒ x2 - 500x + 40000 = 0. x = (500 ± 300)/2. x = 100 or 400. q^ 40 = 100/1000 = 0.100 or 0.4. 3.31. B.
yi
si
ri
(ri - si)/ri
si/{(ri - si)ri}
0.2 1 10 9/10 1/90 0.5 1 9 8/9 1/72 1.3 1 10 9/10 1/90 1.7 1 8 7/8 1/56 Estimate of 2 p x is: (9/10)(8/9)(9/10)(7/8) = 0.63. Var[2 qx] = Var[2 p x] = (0.632 )(1/90 + 1/72 + 1/90 + 1/56) = 0.02142. 3.32. D. Let si be the number of people who die at time i. q0 /(p0 n0 ) = 1.11 x 10-4. ⇒ s1 /{(1000 - s1 )1000} = 1.11 x 10-4. ⇒ s1 = 100. q1 /(p1 n1 ) = 1.11 x 10-4. ⇒ s2 /{(900 - s2 )900} = 1.11 x 10-4. ⇒ s2 = 82. q2 /(p2 n2 ) = 1.18 x 10-4. ⇒ s3 /{(818 - s3 )818} = 1.18 x 10-4. ⇒ s3 = 72. In the absence of entries or withdrawals, Greenwood's approximation is exact and produces the same result as the variance of the empirical survival function. ^
S (3) = (1000 - 100 - 82 - 72)/1000 = 0.746. ^
Var[ S (3)] = (0.746)(1 - 0.746)/1000 = 1.8948 x 10- 4. 3.33. B. 0.011467 = V [S50(35)]/[S50(35)]2 = Σ si/{ri(ri -si)} = ^
2/{(50)(48)} + 4/{(45)(41)} + 8/{(41 - c30)(33 - c30)}, where the sum is taken through time 35.
⇒ 945 = (41 - c30)(33 - c30). ⇒ c30 = 6.
3.34. D. The 75th percentile of the distribution is where the survival function is: 1 - 0.75 = 0.25. For a 95% confidence interval we want ±1.96 standard deviations.
The lower limit is determined as the smallest value such that: Ŝ(t) ≤ 0.25 + 1.96√V̂[Ŝ(t)].
At t = 50 the two sides are 0.360 and 0.25 + (1.96)(0.0470) = 0.342, and the inequality does not hold.
At t = 54 the two sides are 0.293 and 0.25 + (1.96)(0.0456) = 0.339, and the inequality does hold.
The lower limit is 54.
Comment: Estimation of percentiles using the Product-Limit estimator is not explicitly covered in Loss Models, which replaced the reading that was on the syllabus at the time this exam question was asked.
The upper limit is determined as the largest value such that: Ŝ(t) ≥ 0.25 - 1.96√V̂[Ŝ(t)].
At t = 57 the two sides are 0.187 and 0.25 - (1.96)(0.0420) = 0.167, and the inequality does hold.
At t = 59 the two sides are 0.156 and 0.25 - (1.96)(0.0404) = 0.171, and the inequality does not hold.
The upper limit is 57.
r
1 2 3 4 5
27 32 25 20
s 9 6 5 4
(r-s)/r
S(t)/S(1)
0.667 0.812 0.800 0.800
1 0.667 0.542 0.433 0.347
4
si = 0.4332 {9/((27)(18)) + 6/((32)(26)) + 5/((25)(20))} = 0.0067. r (r s ) i i i i=2
V [3 q^ 1 ] = 3 p 1 2 ∑ ^
Comment: Note that since 3 p 1 + 3 q1 = 1, Var[3 q^ 1 ] = Var[3 p^ 1 ]. An approximate 95% confidence interval for 3 q1 is: 0.57 ± 0.16.
3.36. D. Since 3 q2 is conditional on survival to time two, perform all of the calculations beyond duration two. Using the Product-Limit estimator, the estimate of 3 p 2 = S(5)/S(2) = {(32-6)/32} {(25-5)/25} {(20-4)/20} = 0.520. yi
ri
si
(ri - si)/ri
S(yi)/S(2)
1 2 3 4 5
32 25 20
6 5 4
0.812 0.800 0.800
1.000 0.812 0.650 0.520
5
si = 0.5202 {6/((32)(26)) + 5/((25)(20)) + 4/((20)(16))} = 0.0080. r (r s ) i i i i=3
V [3 q^ 2 ] = 3 p 2 2 ∑ ^
Comment: It is as if we started the study at time 2. ^
^
^
3.37. S(1) = (30 - 5)/30 = 5/6 = 0.833. S(2) = S(1) (27 - 9)/27 = 5/9 = 0.556. yi
ri
si
(ri - si)/ri
S(yi)
0 1 2 3 4 5
30 27 32 25 20
5 9 6 5 4
0.833 0.667 0.812 0.800 0.800
1 0.833 0.556 0.451 0.361 0.289
^
^
^
^
^
^
S(3) = S(2) (32 - 6)/32 = 65/144 = 0.451. S(4) = S(3) (25 - 5)/25 = 13/36 = 0.361. S(5) = S(4) (20 - 3)/20 = 13/45 = 0.289. 1
^ ^
^
3.38. V [ S(1)] = S(1)2
∑ ri(ri i- si) = 0.8332 {5/((30)(25))} = 0.0046. s
i=1
1
^ ^
^
V [ S(2)] = S(2)2
∑ ri(ri i- si) = 0.5562 {5/((30)(25)) + 9/((27)(18))} = 0.5562 (0.02519) = s
i=1
0.0078. yi 0 1 2 3 4 5
si
ri
(ri-si)/ri
30 27 32 25 20
5 9 6 5 4
0.00667 0.01852 0.00721 0.01000 0.01250
S(yi)si/(ri(ri-si)) Cum.1Sum 0.00667 0.02519 0.03240 0.04240 0.05490
0.833 0.556 0.451 0.361 0.289
Var[S(yi)] 0.00463 0.00777 0.00660 0.00553 0.00458
3.39. For example, 95% confidence interval for S(1): 0.833 ± 1.96√0.00463 = [0.700, 0.967].
yi    S(yi)    Var[S(yi)]    Lower End    Upper End
1     0.833    0.00463       0.700        0.967
2     0.556    0.00777       0.383        0.728
3     0.451    0.00660       0.292        0.611
4     0.361    0.00553       0.215        0.507
5     0.289    0.00458       0.156        0.422
Comment: The confidence intervals would have been narrower with more data. For example with 100 times as much data: yi
si
0 1 2 3 4 5
ri
3000 2700 3200 2500 2000
yi 0 1 2 3 4 5
500 900 600 500 400
si
ri
(ri-si)/ri
3000 2700 3200 2500 2000
500 900 600 500 400
0.00007 0.00019 0.00007 0.00010 0.00013
(ri-si)/ri
S(yi)
0.833 0.667 0.812 0.800 0.800
1 0.833 0.556 0.451 0.361 0.289
S(yi)si/(ri(ri-si)) Cum.1Sum 0.00007 0.00025 0.00032 0.00042 0.00055
0.833 0.556 0.451 0.361 0.289
Var[S(yi)] 0.0000463 0.0000777 0.0000660 0.0000553 0.0000458
The variances are 1/100 of what they were. Variances goes down as 1/N, as expected. 3.40. B.
^
yi
ri
si
4 8
10 5
2 1 ^
S (11) = (8/10)(4/5) = 16/25. Var[ S (11)] = (16/25)2 {2/((10)(8)) + 1/((5)(4))} = 0.0307. ^
3.41. A. We have a complete data set, and thus Ŝ(t) is given by the empirical survival function.
r10/200 = Ŝ(9) = 0.16. ⇒ r10 = 32. cS²(t) = V̂ar[Ŝ(t)]/Ŝ(t)² = Σi=0 to t si/{ri(ri - si)}.
Therefore, cS²(10) - cS²(9) = s10/{r10(r10 - s10)}.
0.04045 - 0.02625 = 0.0142 = s10/{r10(r10 - s10)} = s10/{32(32 - s10)}. ⇒ s10 = 10.
Comment: There is no reason to assume any truncation. We assume t can only be an integer.
r9/200 = Ŝ(8) = 0.22. ⇒ r9 = 44. ⇒ s9 = r9 - r10 = 44 - 32 = 12.
cS² is the square of the coefficient of variation of the estimated survival function.
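The algebra used in the solutions to 3.20 and 3.41 can be carried out directly. A minimal Python sketch, assuming the risk set and the increase in cS² are known (the variable names are mine):

# From cS^2(10) - cS^2(9) = s10 / {r10 (r10 - s10)}, solve for s10.
r10 = 32                       # = 200 * S(9), since the data set is complete
d = 0.04045 - 0.02625          # = 0.0142, the increase in cS^2
s10 = d * r10**2 / (1 + d * r10)
print(round(s10))              # 10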
Section 4, Grouped Data42

Sometimes data is grouped into intervals. How to work with grouped data in the absence of truncating and censoring is discussed in “Mahlerʼs Guide to Loss Distributions.” In the absence of truncation and censoring, one can directly calculate the empirical survival function at the endpoints of intervals. In the presence of truncation and/or censoring, one can estimate the survival function using the Kaplan-Meier approach as discussed in this section.

Grouping Data:

As previously, eight women are observed:
Ann dies at age 68, enters the study at birth.
Betty dies at age 77, enters the study at birth.
Carol dies at age 77, enters the study at birth.
Debra dies at age 84, enters the study at birth.
Edith enters the study at birth and is alive at age 90 when the study ends.
Fran enters the study at birth and leaves the study at age 80.
Gale enters the study at age 70 and dies at age 79.
Helen enters the study at age 65 and remains alive at age 90 when the study ends.

Let us assume, that for some reason this individual data is grouped into intervals with endpoints: 0, 50, 60, 70, 80, and 90.
In the interval from 0 to 50, we have 6 entries (at birth), no deaths, and no withdrawals.
In the interval, 50 to 60, we have no entries, no deaths, and no withdrawals.
Gale enters the study at age 70, however she is not in the risk set at 70, but enters immediately after 70, so we will account for her entry in the interval from 70 to 80.
Therefore, in the interval, 60 to 70, we have 1 entry (Helen), 1 death (Ann), and no withdrawals.
In the interval, 70 to 80, we have 1 entry (Gale), 3 deaths (Betty, Carol, and Gale), and 1 withdrawal (Fran).
In the interval, 80 to 90, we have no entries, 1 death (Debra), and 2 withdrawals (Edith and Helen).

42 See Section 14.4 of Loss Models, where it is referred to as “the Kaplan-Meier Approximation for Large Data Sets.”
Assume one has an interval from cj to cj+1.
Let di be the number of truncated values in the data in [cj, cj+1).43
Let xi be the number of uncensored values in the data in (cj, cj+1].44
Let ui be the number of censored values in the data in (cj, cj+1].45
For this example, the data grouped in this manner would be displayed as:46
i    ci    ci+1    di    xi    ui
0    0     50      6     0     0
1    50    60      0     0     0
2    60    70      1     1     0
3    70    80      1     3     1
4    80    90      0     1     2
Note how one has a lot less information than was in the individual data.47 Therefore, a somewhat different technique will be used to estimate the risk set, which will be used to estimate the Survival Function.

Pi:48

Let Pi = the number of lives that were in interval i-1 that contribute to ri, the risk set for interval i.
Determining Pi is a useful intermediate step in determining the risk set.
All 6 lives in interval zero contribute to the risk set for interval one. P1 = 6.
All 6 lives in interval one contribute to r2. P2 = 6.
Since one of the lives in interval 2 dies, only 6 of the 7 lives in interval two contribute to r3. P3 = 6.
Since one of the lives in interval three withdraws and 3 die, only 3 of the 7 lives in interval three contribute to r4. P4 = 3.
In general, with truncation and censoring, we add the number of lives that enter, and subtract the number of lives who die or withdraw:
Pi = Pi-1 + di-1 - xi-1 - ui-1. P0 = 0.

43 Note that the interval is semi-closed, excluding the upper endpoint and including the lower endpoint. Left truncation can occur at the left endpoint.
44 Note that the interval is semi-closed, including the upper endpoint and excluding the lower endpoint.
45 Note that the interval is semi-closed, including the upper endpoint and excluding the lower endpoint. Right censoring can occur at the right endpoint.
46 I have labeled the first interval i = 0, in order to match the notation in Loss Models.
47 However, when one has a very large data set, grouping the data may make it easier to work with.
48 While earlier editions of Loss Models used Pi, the third edition does not. Nevertheless, the idea remains useful.
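As a quick illustration (a Python sketch of my own, not from Loss Models), the recursion for Pi can be computed directly from the grouped counts:

# P_i = P_{i-1} + d_{i-1} - x_{i-1} - u_{i-1}, starting from P_0 = 0.
d = [6, 0, 1, 1, 0]
x = [0, 0, 1, 3, 1]
u = [0, 0, 0, 1, 2]
P = [0]
for i in range(1, len(d)):
    P.append(P[i-1] + d[i-1] - x[i-1] - u[i-1])
print(P)   # [0, 6, 6, 6, 3], matching the exercise below.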
Exercise: For the eight women example, the grouped data was:
i    ci    ci+1    di    xi    ui
0    0     50      6     0     0
1    50    60      0     0     0
2    60    70      1     1     0
3    70    80      1     3     1
4    80    90      0     1     2
Apply the above formulas to this example.
[Solution: P0 = 0. P1 = P0 + d0 - x0 - u0 = 0 + 6 - 0 - 0 = 6.
P2 = P1 + d1 - x1 - u1 = 6 + 0 - 0 - 0 = 6.
P3 = P2 + d2 - x2 - u2 = 6 + 1 - 1 - 0 = 6.
P4 = P3 + d3 - x3 - u3 = 6 + 1 - 3 - 1 = 3.]

Assumptions about Entries and Exits:

Grouping decreases the amount of information. Assuming that all we had was this grouped version of the data, we now wish to estimate the Survival Function at each of the endpoints of intervals.
In order to determine the risk set to use for interval two, from 60 to 70, one would need to make an assumption. One reasonable assumption is that at a time of a death in the interval on average half of the truncated values in the interval are in the risk set, while the remaining half are not. Then one would use r2 = 6 + 1/2 = 6.5.
For interval three, from 70 to 80, one would also need to make an assumption as to how many of the censored values are in the risk set. One reasonable assumption is that at a time of a death in the interval on average half of the censored values in the interval are no longer in the risk set, while the remaining half are. Then one would use r3 = 6 + 1/2 - 1/2 = 6.
In general, assume that 0 ≤ α ≤ 1 of the truncated values in the interval are in the risk set, and that 0 ≤ β ≤ 1 of the censored values in the interval are no longer in the risk set.49
Note that if all entries and exits are assumed to be at the beginning of an interval, then α = 1 = β. If instead all entries and exits are assumed to be at the end of an interval, then α = 0 = β.
Then one determines the risk set: ri = Pi + αdi - βui.
Once one has the risk set, one would estimate the Survival Function using the Kaplan-Meier approach; one would multiply a series of “survival ratios”, (ri - xi)/ri.

49 While earlier editions of Loss Models used α and β, the third edition does not. The notation remains useful.
Exercise: For the eight women example, the grouped data was:
i    ci    ci+1    di    xi    ui
0    0     50      6     0     0
1    50    60      0     0     0
2    60    70      1     1     0
3    70    80      1     3     1
4    80    90      0     1     2
Assuming entries and exits are uniformly distributed on each interval, in other words using α = 1/2 and β = 1/2, estimate the survival functions at 70, 80, and 90.
[Solution: For example, r4 = P4 + d4/2 - u4/2 = 3 + 0/2 - 2/2 = 2.
i    ci    ci+1    di    xi    ui    Pi    ri     (ri-xi)/ri    S(ci+1)
0    0     50      6     0     0     0     3      1.0000        1.0000
1    50    60      0     0     0     6     6      1.0000        1.0000
2    60    70      1     1     0     6     6.5    0.8462        0.8462
3    70    80      1     3     1     6     6      0.5000        0.4231
4    80    90      0     1     2     3     2      0.5000        0.2115
For example, S(80) = {(r3 - x3)/r3} S(70) = (0.5)(0.8462) = 0.4231.]

Note that the results of applying the Product-Limit Estimator to this data in ungrouped form were:
S(t) = 1 for t < 68
S(t) = 6/7 = 0.857 for 68 ≤ t < 77
S(t) = 30/49 = 0.612 for 77 ≤ t < 79
S(t) = 24/49 = 0.490 for 79 ≤ t < 84
S(t) = 16/49 = 0.327 for 84 ≤ t ≤ 90.
In general, using the data in grouped form will produce estimates of the Survival Function that are similar to, but not the same as those obtained by working with the ungrouped data.

Policy Limits / Maximum Covered Losses:

In this example, we assumed that on average 1/2 of the censored values in the interval are no longer in the risk set. In general, we assume β of the censored values in the interval are no longer in the risk set, where 0 ≤ β ≤ 1. We had taken β = 1/2, which makes sense when everything is uniformly distributed on each interval. However, sometimes this is not a reasonable assumption.
For example, if we are dealing with size of loss data with policy limits, then the data is right censored at the amount of the policy limit. One would normally know the sizes of policy limits sold. If one groups the data so that each such policy limit is an endpoint of an interval, then one should take β = 0. In other words, all of the exits take place at the end of intervals.
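Here is a short Python sketch of the grouped-data calculation illustrated above, with α and β left as arguments (the function is my own illustration, not a routine from Loss Models):

# Kaplan-Meier approach applied to grouped data.
# Each interval supplies (d, x, u) = (entries, uncensored values, censored values);
# alpha and beta are the assumptions on when entries and exits occur.
def grouped_survival(intervals, alpha=0.5, beta=0.5):
    """intervals = list of (d_i, x_i, u_i); returns S at the right endpoint of each interval."""
    P, S, out = 0.0, 1.0, []
    for d, x, u in intervals:
        r = P + alpha * d - beta * u      # risk set for this interval
        if r > 0:
            S *= (r - x) / r              # multiply by the survival ratio
        P = P + d - x - u                 # lives carried into the next interval
        out.append(round(S, 4))
    return out

# Eight-women data grouped into 0-50, 50-60, 60-70, 70-80, 80-90:
print(grouped_survival([(6, 0, 0), (0, 0, 0), (1, 1, 0), (1, 3, 1), (0, 1, 2)]))
# [1.0, 1.0, 0.8462, 0.4231, 0.2115], matching the solution above.

With alpha = 1 and beta = 0, the same sketch reproduces the policy-limit example below: 0.5122, 0.3171, 0.1707, 0.1138.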
For example, assume one sells 25,000 and 50,000 policy limits, and one has the following grouped data:
i    ci       ci+1     xi     ui
0    0        5000     100    0
1    5000     10000    40     0
2    10000    25000    30     20
3    25000    50000    5      10
Exercise: Estimate the survival function, assuming all exits occur at the end of intervals, β = 0.
[Solution: In a case with no truncation, we let d0 = size of the data set = 100 + 40 + 30 + 5 + 20 + 10 = 205, and assume all these lives enter at time zero, in other words set α = 1.
i    ci        ci+1      di     xi     ui    Pi     ri     (ri-xi)/ri    S(ci+1)
0    0         5,000     205    100    0     0      205    0.5122        0.5122
1    5,000     10,000    0      40     0     105    105    0.6190        0.3171
2    10,000    25,000    0      30     20    65     65     0.5385        0.1707
3    25,000    50,000    0      5      10    15     15     0.6667        0.1138
For example, for the last interval, r3 = P3 - βu3 = 15 - (0)(10) = 15. All of the censored values in the last interval are of size 50,000, and therefore they are included in r3.]

Deductibles:

For example, assume the following grouped data without censoring:
i    ci      ci+1     di     xi
0    0       500      100    40
1    500     1000     100    60
2    1000    5000     0      80
3    5000    25000    0      20
As before, let Pi = the number of lives that were in interval i-1 that contribute to ri.
Then P1 = 100 - 40 = 60. P2 = 60 + 100 - 60 = 100. P3 = 100 + 0 - 80 = 20.
As before, one assumes that α of the truncated values in the interval are in the risk set. If the above data is for size of loss, and we sell policies with either no deductible or a deductible of 500, then a particular choice of α would be reasonable. In interval zero, we know all of the 100 truncated values enter at x = 0.50 In interval one, we know all of the 100 truncated values enter at x = 500. Thus in each case, the truncated values are available to fail during the entire interval. This corresponds to an assumption of α = 1. In other words, all of the entries take place at the start of intervals.

50 Left truncation at 0 is the same as no truncation, in other words no deductible.
Exercise: For the above data set, assuming all entries occur at the beginning of intervals, α = 1, estimate the survival function.
[Solution: For example, r1 = P1 + αd1 = 60 + 100 = 160.
i    ci      ci+1     di     xi    Pi     ri     (ri - xi)/ri    S(ci+1)
0    0       500      100    40    0      100    60/100          60/100 = 0.6
1    500     1000     100    60    60     160    100/160         (100/160)(0.6) = 0.375
2    1000    5000     0      80    100    100    20/100          (20/100)(0.375) = 0.075
3    5000    25000    0      20    20     20     0               0                          ]
Both Truncation and Censoring:

Assume we have the following grouped data with both left truncation and right censoring:
i    ci       ci+1     di     xi     ui
0    0        500      100    30     0
1    500      1000     200    90     0
2    1000     5000     0      120    0
3    5000     25000    0      30     10
4    25000    50000    0      15     5
For the above data, P1 = P0 + d0 - x0 - u0 = 0 + 100 - 30 - 0 = 70.
Exercise: For the above data, determine P2, P3, and P4.
[Solution: P2 = P1 + d1 - x1 - u1 = 70 + 200 - 90 - 0 = 180.
P3 = P2 + d2 - x2 - u2 = 180 + 0 - 120 - 0 = 60.
P4 = P3 + d3 - x3 - u3 = 60 + 0 - 30 - 10 = 20.]
In general, the choices of when exits and entries take place make a difference in the estimated Survival Function.51
As discussed previously, when one has size of loss data, and the deductibles and maximum covered losses sold are all endpoints of intervals, then one should choose α = 1 and β = 0. This corresponds to assuming that all values in an interval are available to fail starting at the lower endpoint of that interval and all exits occur at the righthand endpoint.
Assume that the above were size of loss data, and that policies are sold with either no deductible or a deductible of 500, and with a maximum covered loss of either 25,000 or 50,000. Then since 0, 500, 25,000, and 50,000 are among the endpoints of intervals, one should choose α = 1 and β = 0.

51 Exam questions should specify the assumptions on when entries and exits occur. In practical applications, in the absence of any other information, α = 1/2 and β = 1/2 are usually reasonable choices.
Exercise: For the above data set, assuming any entries occur at the beginning of each interval and any exits occur at the end of each interval, α = 1 and β = 0, estimate the survival function.
[Solution: For example, r1 = P1 + αd1 - βu1 = 70 + (1)(200) - (0)(0) = 270.
i    ci       ci+1     di     xi     ui    Pi     ri     (ri - xi)/ri    S(ci+1)
0    0        500      100    30     0     0      100    70/100          70/100 = 0.7
1    500      1000     200    90     0     70     270    180/270         (180/270)(0.7) = 0.4667
2    1000     5000     0      120    0     180    180    60/180          (60/180)(0.4667) = 0.1556
3    5000     25000    0      30     10    60     60     30/60           (30/60)(0.1556) = 0.0778
4    25000    50000    0      15     5     20     20     5/20            (5/20)(0.0778) = 0.0194   ]
For mortality studies, without any other information, a reasonable choice is to assume exits and entries are uniformly distributed, α = 1/2 = β. For example, assume the following mortality data:
i  ci   ci+1  di   xi   ui
0  0    25    500  10   100
1  25   50    300  40   60
2  50   70    200  160  80
3  70   80    100  220  60
4  80   90    40   240  80
5  90   100   20   90   10
6  100  110   0    10   0
The number of lives in interval 0 that neither die nor are censored is: 500 - 10 - 100 = 390. These 390 lives are P1, the risk set at the beginning of interval 1. In order to get r1, we need to add half of the number left truncated in interval 1 and subtract half of the number right censored in interval 1: r1 = 390 + 300/2 - 60/2 = 390 + 150 - 30 = 510.
In this example, ri = Pi + (1/2)di - (1/2)ui = Pi + xi/2 + (1/2)(di - ui - xi) = Pi + xi/2 + (1/2)(Pi+1 - Pi) = (Pi + Pi+1 + xi)/2.
Exercise: For the above data set, assuming exits and entries are uniformly distributed, α = 1/2 and β = 1/2, determine P2 and r2.
[Solution: P2 = P1 + d1 - x1 - u1 = 390 + 300 - 40 - 60 = 590.
r2 = P2 + d2/2 - u2/2 = 590 + 200/2 - 80/2 = 650.
Alternately, r2 = (P2 + P3 + x2)/2 = (590 + 550 + 160)/2 = 650.]
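As a quick numerical check of the identity ri = (Pi + Pi+1 + xi)/2 when α = β = 1/2, the following short sketch (again my own illustration, not from the text) recomputes r2 both ways for the mortality data above.

```python
d = [500, 300, 200, 100, 40, 20, 0]
x = [10, 40, 160, 220, 240, 90, 10]
u = [100, 60, 80, 60, 80, 10, 0]

# P[i] = lives carried into interval i
P = [0]
for i in range(len(d) - 1):
    P.append(P[i] + d[i] - x[i] - u[i])

i = 2
r_direct = P[i] + d[i] / 2 - u[i] / 2         # 590 + 100 - 40 = 650
r_identity = (P[i] + P[i + 1] + x[i]) / 2     # (590 + 550 + 160)/2 = 650
print(r_direct, r_identity)                   # 650.0 650.0
```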
Exercise: For the above data set, assuming exits and entries are uniformly distributed, α = 1/2 and β = 1/2, estimate the survival function.
[Solution: For example, r3 = P3 + d3/2 - u3/2 = 550 + 100/2 - 60/2 = 570.
i  ci   ci+1  di   xi   ui   Pi   ri   (ri - xi)/ri  S(ci+1)
0  0    25    500  10   100  0    200  0.9500        0.9500
1  25   50    300  40   60   390  510  0.9216        0.8755
2  50   70    200  160  80   590  650  0.7538        0.6600
3  70   80    100  220  60   550  570  0.6140        0.4053
4  80   90    40   240  80   370  350  0.3143        0.1274
5  90   100   20   90   10   90   95   0.0526        0.0067
6  100  110   0    10   0    10   10   0.0000        0.0000
For example, S(70) = {(r2 - x2)/r2} S(50) = (0.7538)(0.8755) = 0.6600.]
In certain circumstances, we may have additional information. For example, in the above situation we might know or suspect that many of the lives in the first interval entered the study at time zero. In that case, one could for the first interval take α > 1/2, for example α = 3/4.
Exercise: Taking α = 3/4 and β = 1/2 for i = 0, determine r0.
[Solution: r0 = (3/4)(500) - 100/2 = 375 - 50 = 325.]
Then S(25) = 315/325 = 0.9692. Assuming that for i = 1 we took, as before, α = 1/2 and β = 1/2, then r1 = 510 and S(50) = (470/510)S(25) = 0.8932.
Sometimes, one assigns a whole number age to a life when an insurance policy is issued. This assigned age is called an “insuring age.”52 If insuring ages are used, then insureds only enter the study at integer ages. Thus if the endpoints of the intervals are integer, and the intervals are of width one, then in each interval all of the truncation values are at the left endpoint and we would let α = 1. Similarly, insureds might only leave the study, for other than death, on their policy anniversary dates. In that case, if the endpoints of the intervals are integer, and the intervals are of width one, then in each interval all of the censored values are at the right endpoint and we would let β = 0.
While Loss Models only applies the Kaplan-Meier Product Limit approach to grouped data, once one has determined the risk set, there is no reason why the Nelson-Aalen approach, to be discussed subsequently, could not be applied.
See Exercise 14.33 in Loss Models. For example, a life insurance policy is issued to a person aged 31 years and 106 days. This person is assigned an insuring age of 31 years old. The insuring date of birth is on the date of the year that the policy was issued, rather than the insuredʼs actual birthday.
A Somewhat Different Approach than Used in Loss Models:53
Let us take the following sample of size of loss data:54
Occurrence  Occurrence  Attachment  Policy    Comment
Number      Size        Point       Limit
1           5,000       0           15,000
2           5,000       0           15,000
3           15,000      0           15,000    Censored Data
4           5,000       7,500       15,000    Deductible Data
5           5,000       0           30,000
6           15,000      0           30,000
7           25,000      0           30,000
8           10,000      15,000      30,000    Excess Data
9           15,000      0           100,000
10          25,000      0           100,000
11          30,000      0           100,000
12          50,000      15,000      100,000   Excess Data
Attachment point = d = left truncation point. Policy limit = maximum payment.
Policy limit plus attachment point = u = right censorship point.
Occurrence size = size of payment. Occurrence size + attachment point = size of loss.
Putting this data set in the format used in Survival Analysis on Exam 4/C:
Occurrence  y       d       u
1           5,000
2           5,000
3           15,000          15,000
4           12,500  7,500
5           5,000
6           15,000
7           25,000
8           25,000  15,000
9           15,000
10          25,000
11          30,000
12          65,000  15,000
53 Not on the syllabus of your exam! See pages 21 to 22 of “Increased Limit Factors for Liability Insurance Ratemaking”, by Joseph M. Palmer, formerly on the syllabus of CAS Exam on Ratemaking. The technique Palmer shows simplifies the calculation when one has very large data sets, as does ISO (Insurance Services Office).
54 The sample, taken from page 21 of Palmer, is small solely for illustrative purposes.
Exercise: Apply the Kaplan-Meier Product Limit Estimator to this (individual) data.
[Solution:
i  yi      ri  si  (ri - si)/ri  S(yi)
1  5,000   9   3   6/9           2/3
2  12,500  7   1   6/7           (2/3)(6/7) = 4/7
3  15,000  6   2   2/3           (4/7)(2/3) = 8/21
4  25,000  5   3   2/5           (8/21)(2/5) = 16/105
5  30,000  2   1   1/2           (16/105)(1/2) = 8/105
6  65,000  1   1   0             (8/105)(0) = 0
]
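For readers who like to verify such tables by computer, here is a small Python sketch of the product-limit estimator applied to individually listed data with left truncation points d and right censoring points u. The function and the data encoding are my own (not notation from Loss Models); applied to the twelve occurrences above it reproduces the risk sets 9, 7, 6, 5, 2, 1.

```python
def kaplan_meier(observations):
    """observations: list of (y, d, u), where y is the size of loss (None if censored),
    d is the left truncation point, and u is the censoring point (None if none)."""
    death_times = sorted({y for y, d, u in observations if y is not None})
    S, out = 1.0, []
    for t in death_times:
        # risk set: entered below t (d < t) and still under observation at t
        r = sum(1 for y, d, u in observations
                if d < t and ((y is not None and y >= t) or (u is not None and u >= t)))
        s = sum(1 for y, d, u in observations if y == t)
        S *= (r - s) / r
        out.append((t, r, s, S))
    return out

# The 12 occurrences: (size of loss, attachment point, censoring point)
data = [(5000, 0, None), (5000, 0, None), (None, 0, 15000), (12500, 7500, None),
        (5000, 0, None), (15000, 0, None), (25000, 0, None), (25000, 15000, None),
        (15000, 0, None), (25000, 0, None), (30000, 0, None), (65000, 15000, None)]
for t, r, s, S in kaplan_meier(data):
    print(t, r, s, round(S, 4))
```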
Thus for example, one would estimate S(10,000) = 2/3, S(20,000)/S(10,000) = (8/21)/(2/3) = 4/7, and S(40,000)/S(20,000) = (8/105)/(8/21) = 1/5.
If we were to group the individual data into intervals with endpoints of 10,000, 20,000, and 40,000:55
i  ci  ci+1  di  xi  ui  Pi
0  0   10    10  3   0   0
1  10  20    2   3   1   7
2  20  40    0   4   0   5
3  40  ∞     0   1   0   1
Exercise: Using α = 1/2 and β = 1/2, in other words assume that entries and exits are distributed uniformly on each interval, apply the “Kaplan-Meier Approximation for Large Data Sets” in Loss Models, to this grouped data.
[Solution: For example r1 = P1 + d1/2 - u1/2 = 7 + 2/2 - 1/2 = 7.5.
i  ci  ci+1  di  xi  ui  Pi  ri   (ri - xi)/ri  S(ci+1)
0  0   10    10  3   0   0   5    0.4000        0.4000
1  10  20    2   3   1   7   7.5  0.6000        0.2400
2  20  40    0   4   0   5   5    0.2000        0.0480
3  40  ∞     0   1   0   1   1    0.0000        0.0000
]
Exercise: Use α = 1 and β = 0; entries are at the beginning and exits are at the end of each interval.
[Solution: For example r1 = P1 + d1 = 7 + 2 = 9.
i  ci  ci+1  di  xi  ui  Pi  ri  (ri - xi)/ri  S(ci+1)
0  0   10    10  3   0   0   10  0.7000        0.7000
1  10  20    2   3   1   7   9   0.6667        0.4667
2  20  40    0   4   0   5   5   0.2000        0.0933
3  40  ∞     0   1   0   1   1   0.0000        0.0000
]
In this case, neither set of assumptions produces the same estimates as when using the original individual data prior to grouping.56
55 This would simplify calculations if we had for example 12,000 observations rather than just 12.
56 We had S(10,000) = 2/3 = 0.6667, and S(20,000) = 8/21 = 0.3810.
Palmer works on intervals, as in the “Kaplan-Meier Approximation for Large Data Sets” in Loss Models. For example, he estimates S(40,000)/S(20,000), the probability of survival past 40,000 given survival past 20,000, all in one piece. However, in Palmerʼs calculation of S(40,000)/S(20,000), he only includes occurrences where the size of loss is greater than 20,000, d ≤ 20,000, and u > 40,000.
Occurrence number 7 is a payment of size 25,000 with no deductible and a policy limit of 30,000. Thus the size of loss is 25,000, without truncation or censoring. Thus it is included in the individual calculation of S(25,000). However, occurrence number 7 is not included in Palmerʼs calculation of S(40,000)/S(20,000), because its u = 30,000 is not more than 40,000.
For this example, Palmerʼs calculation is as follows:
There are 6 occurrences with y > 10,000, d = 0, and u > 10,000: #3, #6, #7, #9, #10, #11.
There are 9 occurrences with d = 0 and u > 10,000: #1, #2, #3, #5, #6, #7, #9, #10, #11.
S(10,000) = 6/9 = 2/3.
There are 3 occurrences with y > 20,000, d ≤ 10,000, and u > 20,000: #7, #10, #11.
There are 6 occurrences with d ≤ 10,000 and u > 20,000: #4, #6, #7, #9, #10, #11.57
S(20,000)/S(10,000) = 3/6 = 1/2.
There is 1 occurrence with y > 40,000, d ≤ 20,000, and u > 40,000: #12.
There are 4 occurrences with d ≤ 20,000 and u > 40,000: #8, #10, #11, #12.58
S(40,000)/S(20,000) = 1/4.
                      Individual   Palmer   Kaplan-Meier Approximation for Large Data Sets
                      Data                  α = 1/2, β = 1/2      α = 1, β = 0
S(10,000)             2/3          2/3      2/5                   7/10
S(20,000)/S(10,000)   4/7          1/2      3/5                   2/3
S(40,000)/S(20,000)   1/5          1/4      1/5                   1/5
Palmerʼs technique produces similar answers to working with the individual data, but they differ somewhat due to working on intervals. The answers should be very similar for very large data sets. Palmerʼs technique produces similar answers to the “Kaplan-Meier Approximation for Large Data Sets” in Loss Models, but they differ somewhat due to the particular manner Palmer has of deciding which occurrences to include in each calculation. In Palmerʼs technique there is no need to select α and β as in the method shown in Loss Models.
57 Occurrence #8 has d = 15,000 > 10,000, and thus is not included.
58 Occurrence #7 has u = 30,000 ≤ 40,000, and thus is not included.
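Before turning to the problems, Palmerʼs interval-by-interval calculation can be sketched in a few lines of Python (my own illustration; Palmer gives no code). For each interval (a, b] it divides the number of occurrences known to have a loss above b by the number with loss above a, d ≤ a, and u > b; censored occurrence #3 is represented by its censoring point, which is adequate for these intervals.

```python
# Each occurrence: (size of loss y, attachment point d, censoring point u = d + policy limit).
# For censored occurrence #3 the loss is only known to be at least 15,000.
occs = [(5000, 0, 15000), (5000, 0, 15000), (15000, 0, 15000), (12500, 7500, 22500),
        (5000, 0, 30000), (15000, 0, 30000), (25000, 0, 30000), (25000, 15000, 45000),
        (15000, 0, 100000), (25000, 0, 100000), (30000, 0, 100000), (65000, 15000, 115000)]

def palmer_ratio(a, b):
    """Estimate S(b)/S(a): restrict to occurrences with loss > a, d <= a, and u > b."""
    denom = [(y, d, u) for y, d, u in occs if y > a and d <= a and u > b]
    numer = [o for o in denom if o[0] > b]
    return len(numer), len(denom)

for a, b in [(0, 10000), (10000, 20000), (20000, 40000)]:
    n, m = palmer_ratio(a, b)
    print(f"S({b})/S({a}) = {n}/{m}")
# S(10000)/S(0) = 6/9, S(20000)/S(10000) = 3/6, S(40000)/S(20000) = 1/4
```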
Problems:
Use the following information about a group of 100 lives for the next two questions:
Interval   Number of Deaths in Interval   Number of Lives Censored in Interval
(0-10]     3                              0
(10-20]    1                              7
(20-30]    2                              15
(30-40]    1                              6
(40-50]    2                              6
(50-60]    7                              3
(60-70]    12                             4
(70-80]    15                             2
(80-90]    8                              1
(90-100]   3                              1
(100-110]  1                              0
Assume that exits are uniformly distributed within each interval.
4.1 (2 points) Estimate S(60).
A. 70% B. 72% C. 74% D. 76% E. 78%
4.2 (2 points) Estimate S(80).
A. 25% B. 27% C. 29% D. 31% E. 33%
Use the following information about a group of losses to answer the next three questions:
            Number of Losses          Number of Losses   Number of Losses
ci   ci+1   Truncated in [ci, ci+1)   in (ci, ci+1]      Censored in (ci, ci+1]
0    5      200                       80                 0
5    10     150                       120                0
10   25     100                       70                 50
25   50     0                         40                 40
50   100    0                         20                 30
Policies were sold with either no deductible, a deductible of 5, or a deductible of 10.
Policies were sold with either a maximum covered loss of 25, 50, or 100.
4.3 (3 points) Estimate S(100).
A. 10% B. 11% C. 12% D. 13% E. 14%
4.4 (2 points) Estimate the probability that a payment from a policy with a deductible of 5 will be greater than 10.
A. 48% B. 50% C. 52% D. 54% E. 56%
4.5 (2 points) Estimate the 80th percentile of the size of loss distribution.
A. 28 B. 30 C. 32 D. 34 E. 36
Use the following information about a group of lives for the next four questions:
            Number of Lives           Number of Deaths   Number of Lives
ci   ci+1   Truncated in [ci, ci+1)   in (ci, ci+1]      Censored in (ci, ci+1]
70   71     500                       12                 20
71   72     50                        16                 25
72   73     40                        14                 30
73   74     30                        17                 35
74   75     20                        15                 40
4.6 (2 points) Assume that entries and exits are uniformly distributed within each interval.
Estimate S(73)/S(70).
A. 88% B. 89% C. 90% D. 91% E. 92%
4.7 (3 points) Assume that entries and exits are uniformly distributed within each interval.
Estimate S(75)/S(70).
A. 83% B. 84% C. 85% D. 86% E. 87%
4.8 (2 points) Assume that entries occur at the beginning of each interval.
Assume that exits occur at the end of each interval.
Estimate S(73)/S(70).
A. 88% B. 89% C. 90% D. 91% E. 92%
4.9 (3 points) Assume that entries occur at the beginning of each interval.
Assume that exits occur at the end of each interval.
Estimate S(75)/S(70).
A. 83% B. 84% C. 85% D. 86% E. 87%
Use the following information for the next two questions:
For each of 70 diesel generator fans, the number of hours of running time from its first being put into service until fan failure or until the end of the study (whichever came first) was recorded. Values that have been censored from above have been indicated by a +.
4.5, 4.6+, 11.5, 11.5, 15.6+, 16.0, 16.6+, 18.5+, 18.5+, 18.5+, 18.5+, 18.5+, 20.3+, 20.3+, 20.3+, 20.7, 20.7, 20.8, 22.0+, 30.0+, 30.0+, 30.0+, 30.0+, 31.0, 32.0+, 34.5, 37.5+, 37.5+, 41.5+, 41.5+, 41.5+, 41.5+, 43.0+, 43.0+, 43.0+, 43.0+, 46.0, 48.5+, 48.5+, 48.5+, 48.5+, 50.0+, 50.0+, 50.0+, 61.0+, 61.0, 61.0+, 61.0+, 63.0+, 64.5+, 64.5+, 67.0+, 74.5+, 78.0+, 78.0+, 81.0+, 81.0+, 82.0+, 85.0+, 85.0+, 85.0+, 87.5+, 87.5, 87.5+, 94.0+, 99.0+, 101.0+, 101.0+, 101.0+, 115.0+.
4.10 (4 points) Use the Kaplan-Meier Product-Limit Estimator to estimate the Survival Function through age 115.
4.11 (4 points) Group the above data into intervals of width 20. Based on this grouped data, using the Kaplan-Meier approximation for large data sets estimate the Survival Function through age 120, assuming censorship is uniformly distributed within each interval.
4.12 (3 points) Loss data from policies with deductibles of 250 and 500 and maximum covered losses of 10,000 and 25,000 were collected. The results are given below:
                    Deductible
Range               250    500    Total
(250, 500]          80            80
(500, 1,000]        100    60     160
(1,000, 5,000]      150    110    260
(5,000, 10,000)     130    70     200
(10,000, 25,000)    40     20     60
At 10,000           60     30     90
At 25,000           40     10     50
Total               600    300    900
Using the Kaplan-Meier approximation for large data sets, assuming any truncation occurs at the beginning of each interval and any censoring occurs at the end of each interval, estimate S(25000)/S(250).
(A) Less than 8% (B) At least 8%, but less than 9% (C) At least 9%, but less than 10% (D) At least 10%, but less than 11% (E) At least 11%
Use the following information for the next five questions:
A mortality study of retired lives was conducted. You are given:
Age        Number of Lives        Number of Deaths   Number of Lives
Interval   Entering in Interval   in Interval        Withdrawing in Interval
60  61     30                     1                  14
61  62     40                     2                  13
62  63     12                     5                  17
63  64     22                     7                  10
64  65     50                     10                 15
Use the Kaplan-Meier approximation for large data sets.
Assume that entries and exits are uniformly distributed within each interval.
4.13 (1 point) Estimate p60.
A. 0.86 B. 0.88 C. 0.90 D. 0.92 E. 0.94
4.14 (1 point) Estimate 2p60.
A. 0.79 B. 0.81 C. 0.83 D. 0.85 E. 0.87
4.15 (1 point) Estimate 3p60.
A. 0.71 B. 0.73 C. 0.75 D. 0.77 E. 0.79
4.16 (1 point) Estimate 4p60.
A. 0.49 B. 0.51 C. 0.53 D. 0.55 E. 0.57
4.17 (1 point) Estimate 5p60.
A. 0.42 B. 0.44 C. 0.46 D. 0.48 E. 0.50
Use the following information for the next two questions:
Age        Number of Lives        Number of Deaths   Number of Lives
Interval   Entering in Interval   in Interval        Exiting in Interval
(0, 1]     1500                   20                 1000
(1, 2]     1500                   14                 1160
(2, 3]     1500                   10                 612
Assume that entries occur at the beginning of each interval.
Assume that exits are uniformly distributed within each interval.
4.18 (2 points) Using the Kaplan-Meier approximation for large data sets, estimate S(3).
(A) Less than 0.94 (B) At least 0.94, but less than 0.95 (C) At least 0.95, but less than 0.96 (D) At least 0.96, but less than 0.97 (E) At least 0.97
4.19 (2 points) Using Greenwoodʼs approximation, determine the variance of the estimate in the previous question.
(A) 0.000028 (B) 0.000029 (C) 0.000030 (D) 0.000031 (E) 0.000032
4.20 (3 points) Assume that whole-number ages have been assigned to each insured; thus the ages in the following data base are insuring ages.
            Number of Lives           Number of Deaths   Number of Lives
ci   ci+1   Truncated in [ci, ci+1)   in (ci, ci+1]      Censored in (ci, ci+1]
80   81     300                       15                 100
81   82     200                       21                 150
82   83     100                       16                 150
83   84     60                        18                 100
84   85     30                        20                 100
Assume that exits are uniformly distributed within each interval.
Estimate S(85)/S(80).
A. 52% B. 54% C. 56% D. 58% E. 60%
4.21 (3 points) The following table was calculated based on loss amounts for a group of similar insurance policies:
cj       dj    uj   xj    Pj
250      200   0    41    0
500      400   0    172   159
1000     300   0    127   387
2000     0     40   84    560
3000     0     52   58    436
4000     0     64   42    326
5000     0     41   31    220
6000     0     22   27    148
7000     0     15   23    99
8000     0     17   18    61
9000     0     14   12    26
10,000   0     0    0     0
Assume any truncation occurs at the beginning of each interval and any censoring occurs at the end of each interval.
For a policy with a deductible of 1000, given that there is a claim payment, estimate the probability that this claim payment is in excess of 4000.
(A) Less than 0.30 (B) At least 0.30, but less than 0.40 (C) At least 0.40, but less than 0.50 (D) At least 0.50, but less than 0.60 (E) At least 0.60
4.22 (2 points) You are given the following information about 1000 claims:
Claim Size          Number of Claims in     Number of Claims       Number of Claims
Interval            Interval, Uncensored    Censored in Interval   Truncated in Interval
(0, 10,000]         400                     50                     600
(10,000, 25,000]    300                     100                    300
(25,000, 100,000]   100                     50                     100
Using the Kaplan-Meier approximation for large data sets, estimate S(100,000), assuming that truncations occur at the beginning of each interval and censoring at the end of each interval.
(A) Less than 0.030 (B) At least 0.030, but less than 0.040 (C) At least 0.040, but less than 0.050 (D) At least 0.050, but less than 0.060 (E) At least 0.060
4.23 (Course 160 Sample Exam #2, 1996, Q.8) (1.9 points)
Using the Kaplan-Meier approximation for large data sets, you have determined:
Interval   Deaths   Risk Set
(0, 1]     3        15
(1, 2]     24       80
(2, 3]     5        25
(3, 4]     6        60
(4, 5]     3        10
Calculate Greenwood's approximation to Var[Ŝ(4)].
(A) 0.0055 (B) 0.0056 (C) 0.0058 (D) 0.0061 (E) 0.0063
4.24 (4, 11/01, Q.27 & 2009 Sample Q.73) (2.5 points)
You are given the following information about a group of 10 claims:
Claim Size Interval   Number of Claims in Interval   Number of Claims Censored in Interval
(0-15,000]            1                              2
(15,000-30,000]       1                              2
(30,000-45,000]       4                              0
Assume that claim sizes and censorship points are uniformly distributed within each interval.
Estimate the probability that a claim exceeds 30,000.
(A) 0.67 (B) 0.70 (C) 0.74 (D) 0.77 (E) 0.80
4.25 (4, 5/05, Q.7) (2.9 points) Loss data for 925 policies with deductibles of 300 and 500 and policy limits of 5,000 and 10,000 were collected. The results are given below:
                   Deductible
Range              300    500    Total
(300, 500]         50            50
(500, 1,000]       50     75     125
(1,000, 5,000)     150    150    300
(5,000, 10,000)    100    200    300
At 5,000           40     80     120
At 10,000          10     20     30
Total              400    525    925
Assume any truncation occurs at the beginning of each interval and any censoring occurs at the end of each interval.
Using the Kaplan-Meier approximation for large data sets, estimate F(5000).
(A) 0.25 (B) 0.32 (C) 0.40 (D) 0.51 (E) 0.55
4.26 (4, 5/07, Q.26) (2.5 points) The following table was calculated based on loss amounts for a group of motorcycle insurance policies:
cj       dj   uj   xj   Pj
250      6    0    1    0
500      6    0    2    5
1000     7    1    4    9
2750     0    1    7    11
5500     0    1    1    3
6000     0    0    1    1
10,000   0    0    0    0
Assume any truncation occurs at the beginning of each interval and any censoring occurs at the end of each interval.
Using the procedure in the Loss Models text, estimate the probability that a policy with a deductible of 500 will have a claim payment in excess of 5500.
(A) Less than 0.13 (B) At least 0.13, but less than 0.16 (C) At least 0.16, but less than 0.19 (D) At least 0.19, but less than 0.22 (E) At least 0.22
Solutions to Problems:
4.1. E. and 4.2. C. In this case with no truncation, we let d0 = size of the data set = 100, and assume all these lives enter at time zero. Assuming that exits are uniformly distributed within each interval, with β = 1/2, here are the estimated survival functions:
i   ci   ci+1  di   xi  ui  Pi  ri    (ri - xi)/ri  S(ci+1)
0   0    10    100  3   0   0   100   0.9700        0.9700
1   10   20    0    1   7   97  93.5  0.9893        0.9596
2   20   30    0    2   15  89  81.5  0.9755        0.9361
3   30   40    0    1   6   72  69    0.9855        0.9225
4   40   50    0    2   6   65  62    0.9677        0.8928
5   50   60    0    7   3   57  55.5  0.8739        0.7802
6   60   70    0    12  4   47  45    0.7333        0.5721
7   70   80    0    15  2   31  30    0.5000        0.2861
8   80   90    0    8   1   14  13.5  0.4074        0.1165
9   90   100   0    3   1   5   4.5   0.3333        0.0388
10  100  110   0    1   0   1   1     0.0000        0.0000
For example, P2 = P1 - x1 - u1 = 97 - 1 - 7 = 89, and r2 = P2 - u2/2 = 89 - 15/2 = 81.5.
S(30) = S(20) (r2 - x2)/r2 = (0.9596)(81.5 - 2)/81.5 = 0.9361.
4.3. A. We assume all truncation occurs at the deductible values, which are the beginning of intervals, and all censoring takes place at the values of the maximum covered losses, which are at the end of intervals, α = 1 and β = 0.
i  ci  ci+1  di   xi   ui  Pi   ri   (ri - xi)/ri  S(ci+1)
0  0   5     200  80   0   0    200  0.6000        0.6000
1  5   10    150  120  0   120  270  0.5556        0.3333
2  10  25    100  70   50  150  250  0.7200        0.2400
3  25  50    0    40   40  130  130  0.6923        0.1662
4  50  100   0    20   30  50   50   0.6000        0.0997
For example, P3 = P2 + d2 - x2 - u2 = 150 + 100 - 70 - 50 = 130. r2 = P2 + d2 = 150 + 100 = 250. S(25) = S(10) (r2 - x2 )/r2 = (0.3333)(250 - 70)/250 = 0.2400. 4.4. B. From the previous solution S(10) = 0.3333, and S(25) = 0.2400. Linearly interpolating, S(15) = (0.3333)(2/3) + (0.2400)(1/3) = .3022. The probability that a payment from a policy with a deductible of 5 will be greater than 10 = Prob[Loss > 15 | Loss > 5] = S(15)/S(5) = 0.3022/0.6000 = 0.504.
4.5. E. From a previous solution S(25) = 0.2400, and S(50) = 0.1662. We wish to find x, such that S(x) = 20%.
Linearly interpolating, x = (25)(4/7.38) + (50)(3.38/7.38) = 36.45.
Check: (0.2400)(36.45 - 25)/(50 - 25) + (0.1662)(50 - 36.45)/(50 - 25) = 0.200.
4.6. B. Assuming that entries and exits are uniformly distributed within each interval, α = 1/2 and β = 1/2, here are the estimated survival functions:
i  ci  ci+1  di   xi  ui  Pi   ri     (ri - xi)/ri  S(ci+1)/S(70)
0  70  71    500  12  20  0    240    0.9500        0.9500
1  71  72    50   16  25  468  480.5  0.9667        0.9184
2  72  73    40   14  30  477  482    0.9710        0.8917
For example, P2 = P1 + d1 - x1 - u1 = 468 + 50 - 16 - 25 = 477. r2 = P2 + d2/2 - u2/2 = 477 + 40/2 - 30/2 = 482.
S(73)/S(70) = {S(72)/S(70)} (r2 - x2)/r2 = (0.9184)(482 - 14)/482 = 0.8917.
4.7. A. Assuming that entries and exits are uniformly distributed within each interval, α = 1/2 and β = 1/2, here are the estimated survival functions:
i  ci  ci+1  di   xi  ui  Pi   ri     (ri - xi)/ri  S(ci+1)/S(70)
0  70  71    500  12  20  0    240    0.9500        0.9500
1  71  72    50   16  25  468  480.5  0.9667        0.9184
2  72  73    40   14  30  477  482    0.9710        0.8917
3  73  74    30   17  35  473  470.5  0.9639        0.8595
4  74  75    20   15  40  451  441    0.9660        0.8302
For example, P2 = P1 + d1 - x1 - u1 = 468 + 50 - 16 - 25 = 477. r2 = P2 + d2/2 - u2/2 = 477 + 40/2 - 30/2 = 482.
S(73)/S(70) = {S(72)/S(70)} (r2 - x2)/r2 = (0.9184)(482 - 14)/482 = 0.8917.
4.8. E. Assuming that entries occur at the beginning of each interval, and exits occur at the end of each interval, α = 1 and β = 0, here are the estimated survival functions:
i  ci  ci+1  di   xi  ui  Pi   ri   (ri - xi)/ri  S(ci+1)/S(70)
0  70  71    500  12  20  0    500  0.9760        0.9760
1  71  72    50   16  25  468  518  0.9691        0.9459
2  72  73    40   14  30  477  517  0.9729        0.9202
For example, P2 = P1 + d1 - x1 - u1 = 468 + 50 - 16 - 25 = 477. r2 = P2 + d2 = 477 + 40 = 517. S(73)/S(70) = {S(72)/S(70)} (r2 - x2 )/r2 = (0.9459)(517 - 14)/517 = 0.9202.
4.9. D. Assuming that entries occur at the beginning of each interval, and exits occur at the end of each interval, α = 1 and β = 0, here are the estimated survival functions:
i  ci  ci+1  di   xi  ui  Pi   ri   (ri - xi)/ri  S(ci+1)/S(70)
0  70  71    500  12  20  0    500  0.9760        0.9760
1  71  72    50   16  25  468  518  0.9691        0.9459
2  72  73    40   14  30  477  517  0.9729        0.9202
3  73  74    30   17  35  473  503  0.9662        0.8891
4  74  75    20   15  40  451  471  0.9682        0.8608
For example, P2 = P1 + d1 - x1 - u1 = 468 + 50 - 16 - 25 = 477. r2 = P2 + d2 = 477 + 40 = 517. S(73)/S(70) = {S(72)/S(70)} (r2 - x2 )/r2 = (0.9459)(517 - 14)/517 = 0.9202. 4.10.
i   xi    ri  si  (ri - si)/ri  S(xi)
1   4.5   70  1   69/70         69/70 = 0.9857
2   11.5  68  2   66/68         (69/70)(66/68) = 0.9567
3   16    65  1   64/65         (0.9567)(64/65) = 0.9420
4   20.7  55  2   53/55         (0.9420)(53/55) = 0.9077
5   20.8  53  1   52/53         (0.9077)(52/53) = 0.8906
6   31.0  47  1   46/47         (0.8906)(46/47) = 0.8717
7   34.5  45  1   44/45         (0.8717)(44/45) = 0.8530
8   46.0  34  1   33/34         (0.8530)(33/34) = 0.8272
9   61.0  26  1   25/26         (0.8272)(25/26) = 0.7954
10  87.5  9   1   8/9           (0.7954)(8/9) = 0.7070
The estimated Survival Function is: 1 for t < 4.5, 69/70 for 4.5 ≤ t < 11.5, 0.9567 for 11.5 ≤ t < 16, 0.9420 for 16 ≤ t < 20.7, 0.9077 for 20.7 ≤ t < 20.8, 0.8906 for 20.8 ≤ t < 31.0, 0.8717 for 31.0 ≤ t < 34.5, 0.8530 for 34.5 ≤ t < 46.0, 0.8272 for 46.0 ≤ t < 61.0, 0.7954 for 61.0 ≤ t < 87.5, 0.7070 for 87.5 ≤ t ≤ 115.
4.11. The data grouped into intervals is:
i  ci   ci+1  xi  ui
0  0    20    4   8
1  20   40    5   11
2  40   60    1   15
3  60   80    1   10
4  80   100   1   10
5  100  120   0   4
In this case with no truncation, we let d0 = size of the data set = 70, and assume all these lives enter at time zero. Assume censorship is uniformly distributed within each interval.
Then for example, P2 = 58 - 5 - 11 = 42, and r2 = 42 - 15/2 = 34.5.
i  ci   ci+1  di  xi  ui  Pi  ri    (ri - xi)/ri  S(ci+1)
0  0    20    70  4   8   0   66    62/66         62/66 = 0.9394
1  20   40    0   5   11  58  52.5  47.5/52.5     (0.9394)(47.5/52.5) = 0.8499
2  40   60    0   1   15  42  34.5  33.5/34.5     (0.8499)(33.5/34.5) = 0.8253
3  60   80    0   1   10  26  21    20/21         (0.8253)(20/21) = 0.7860
4  80   100   0   1   10  15  10    9/10          (0.7860)(9/10) = 0.7074
5  100  120   0   0   4   4   2     2/2           0.7074
We have estimated: S(20) = 0.9394, S(40) = 0.8499, S(60) = 0.8253, S(80) = 0.7860, S(100) = 0.7074, and S(120) = 0.7074.
Comment: Previously for the ungrouped version of this data set, we had estimated for example S(80) = 0.7954. While the estimates are similar, they are not identical.
4.12. C. We assume all truncated values in an interval are in the risk set, α = 1. We assume all censored values in an interval are in the risk set, β = 0.
i  ci     ci+1   di   xi   ui  Pi   ri   (ri - xi)/ri  S(ci+1)/S(250)
0  250    500    600  80   0   0    600  0.8667        0.8667
1  500    1000   300  160  0   520  820  0.8049        0.6976
2  1000   5000   0    260  0   660  660  0.6061        0.4228
3  5000   10000  0    200  90  400  400  0.5000        0.2114
4  10000  25000  0    60   50  110  110  0.4545        0.0961
d0 = 600 = total # losses from policies with 250 ded. = # of claims truncated from below at 250. d1 = 300 = total # losses from policies with 500 ded. = # of claims truncated from below at 500. x1 = 100 + 60 = 160= number of losses of size 5000 to 10,000. u3 = 60 + 30 = 90 = number of losses censored from above at 10,000. u4 = 40 + 10 = 50 = number of losses censored from above at 25,000. r1 = P1 + d1 = 520 + 300 = 820. P2 = P1 + d1 - x1 - u1 = 520 + 300 - 160 - 0 = 660. Alternately, the risk set for the first interval is all 600 of the losses from the policies with 250 deductible. S(500)/S(250) = (600 - 80)/600 = 520/600 = 0.8667. The risk set for the next interval adds all 300 of the losses from the policies with 500 deductible added to the 520 losses that survived the previous interval. Thus the risk set is: 520 + 300 = 820. S(1000)/S(250) = (0.8667)(820 - 160)/820 = (0.8667)660/820 = 0.6976. The risk set for the next interval is all 660 losses that survived the previous interval. S(5000)/S(250) = (0.6976)(660 - 260)/660 = (0.6976)400/660 = 0.4228. The risk set for the next interval is all 400 losses that survived the previous interval. S(10000)/S(250) = (0.4228)(400 - 200)/200 = (0.4228)200/400 = 0.2114. The risk set for the next interval is the 200 losses that survived the previous interval, minus the 90 losses censored from above at 10,000; the risk set is: 200 - 90 = 110. Alternately, the risk set is the 60 losses that fail in this interval, plus the 50 losses censored from above at 25,000; the risk set is: 60 + 50 = 110. S(25000)/S(250) = (0.2114)(110 - 60)/110 = (0.2114)(50)/110 = 0.0961. Comment: Similar to 4, 5/05, Q.7. α = 1 and β = 0, is the usual assumption when dealing with size of loss data, with the deductibles and maximum covered losses at the endpoints of intervals. Since all of the data is truncated from below, everything is conditional on being greater than 250. Therefore, we are estimating S(10000)/S(250), rather than S(10000). If we have a loss of size 4000 from a policy with a 250 deductible, (one of the 150 shown in the data), we have a truncation point from below at 250, which contributes 1 to d0 , and a loss of size 4000, which contributes 1 to x2 . If we have a loss of size 100,000 from a policy with a 500 deductible and a 25,000 maximum covered loss, (one of the 10 shown in the data), we have a truncation point from below at 500, which contributes 1 to d1 , and a loss of size greater than or equal to 25,000, which contributes 1 to u4 .
4.13 to 4.17. B., B., A., E., C. Assuming that entries and exits are uniformly distributed within each interval, α = 1/2 and β = 1/2.
i  ci  ci+1  di  xi  ui  Pi  ri    (ri - xi)/ri  S(ci+1)/S(60)
0  60  61    30  1   14  0   8     0.8750        0.8750
1  61  62    40  2   13  15  28.5  0.9298        0.8136
2  62  63    12  5   17  40  37.5  0.8667        0.7051
3  63  64    22  7   10  30  36    0.8056        0.5680
4  64  65    50  10  15  35  52.5  0.8095        0.4598
For example, P3 = P2 + d2 - x2 - u2 = 40 + 12 - 5 - 17 = 30. r3 = P3 + d3/2 - u3/2 = 30 + 22/2 - 10/2 = 36.
S(64)/S(60) = {S(63)/S(60)} (r3 - x3)/r3 = (0.7051)(36 - 7)/36 = 0.5680.
5p60 = S(65)/S(60) = 0.4598.
4.18. D. Assuming that entries occur at the beginning of each interval, α = 1. Assuming that exits are uniformly distributed within each interval, β = 1/2.
i  ci  ci+1  di    xi  ui    Pi   ri    (ri - xi)/ri  S(ci+1)
0  0   1     1500  20  1000  0    1000  0.9800        0.9800
1  1   2     1500  14  1160  480  1400  0.9900        0.9702
2  2   3     1500  10  612   806  2000  0.9950        0.9653
For example, P2 = P1 + d1 - x1 - u1 = 480 + 1500 - 14 - 1160 = 806. r2 = P2 + d2 - u2/2 = 806 + 1500 - 612/2 = 2000.
S(3) = (980/1000)(1386/1400)(1990/2000) = 0.9653.
4.19. A. Var[S(3)] = S(3)² (20/{(1000)(980)} + 14/{(1400)(1386)} + 10/{(2000)(1990)}) = (0.9653²)(0.000030136) = 0.0000281.
4.20. A. Since these are insuring ages, we assume that the lives enter at the beginning of each interval, α = 1. Assuming that exits are uniformly distributed within each interval, β = 1/2.
i  ci  ci+1  di   xi  ui   Pi   ri   (ri - xi)/ri  S(ci+1)/S(80)
0  80  81    300  15  100  0    250  0.9400        0.9400
1  81  82    200  21  150  185  310  0.9323        0.8763
2  82  83    100  16  150  214  239  0.9331        0.8177
3  83  84    60   18  100  148  158  0.8861        0.7245
4  84  85    30   20  100  90   70   0.7143        0.5175
For example, P2 = P1 + d1 - x1 - u1 = 185 + 200 - 21 - 150 = 214. r2 = P2 + d2 - u2 /2 = 214 + 100 - 150/2 = 239. S(83)/S(80) = {S(82)/S(80)} (r2 - x2 )/r2 = (0.8763)(239 - 16)/239 = 0.8177.
4.21. D. Assume any truncation occurs at the beginning of each interval and any censoring occurs at the end of each interval, α = 1 and β = 0. Therefore, rj = Pj + (1)dj - (0)uj = Pj + dj.
j   cj      cj+1    dj   uj  xj   Pj   rj   (rj - xj)/rj  S(cj+1)/S(250)
0   250     500     200  0   41   0    200  0.7950        0.7950
1   500     1000    400  0   172  159  559  0.6923        0.5504
2   1000    2000    300  0   127  387  687  0.8151        0.4486
3   2000    3000    0    40  84   560  560  0.8500        0.3813
4   3000    4000    0    52  58   436  436  0.8670        0.3306
5   4000    5000    0    64  42   326  326  0.8712        0.2880
6   5000    6000    0    41  31   220  220  0.8591        0.2474
7   6000    7000    0    22  27   148  148  0.8176        0.2023
8   7000    8000    0    15  23   99   99   0.7677        0.1553
9   8000    9000    0    17  18   61   61   0.7049        0.1095
10  9000    10000   0    14  12   26   26   0.5385        0.0589
11  10,000  ∞       0    0   0    0    0
Prob[X > 5000 | X > 1000] = S(5000)/S(1000) = {S(5000)/S(250)}/{S(1000)/S(250)} = 0.2880/0.5504 = 0.523. Alternately, {(687 - 127)/687}{(560 - 84)/560}{(436 - 58)/436}{(326 - 42)/326} = (0.8151)(0.8500)(0.8670)(0.8712) = 0.523. Comment: Similar to 4, 5/07, Q.26. In order to answer the question asked, one does not use the data from intervals above 5000. 4.22. B. There are 600 claims on policies with a deductible of 0. Thus in the interval (0, 10,000], r = 600. S(10,000) = (600 - 400)/600 = 1/3. Of these 600 claims, the number that come forward to the next interval is: 600 - 400 - 50 = 150. There are 300 claims on policies with a deductible of 10,000. Thus in the interval (10,000, 25,000], r = 300 + 150 = 450. S(25,000)/S(10,000) = (450 - 300)/450 = 1/3. Of these 450 claims, the number that come forward to the next interval is: 450 - 300 - 100 = 50. There are 100 claims on policies with a deductible of 25,000. Thus in the interval (25,000, 100,000], r = 100 + 50 = 150. S(100,000)/S(25,000) = (150 - 100)/150 = 1/3. S(100,000) = (1/3)(1/3)(1/3) = 1/27 = 0.037.
4.23. A. Ŝ(4) = (12/15)(56/80)(20/25)(54/60) = 0.4032.
i  ci  ci+1  xi  ri  (ri - xi)/ri  S(ci+1)  xi/((ri - xi)ri)  Sum      Var[S(ci+1)]
0  0   1     3   15  0.8000        0.8000   0.01667           0.01667  0.01067
1  1   2     24  80  0.7000        0.5600   0.00536           0.02202  0.00691
2  2   3     5   25  0.8000        0.4480   0.01000           0.03202  0.00643
3  3   4     6   60  0.9000        0.4032   0.00185           0.03388  0.00551
4  4   5     3   10  0.7000        0.2822   0.04286           0.07673  0.00611
Var[Ŝ(4)] = 0.4032² {3/((12)(15)) + 24/((56)(80)) + 5/((20)(25)) + 6/((54)(60))} = 0.00551.
4.24. C. The total number of claims is 10. The risk set for the first interval is: 10 - 2/2 = 9.
The risk set for the second interval is: 7 - 2/2 = 6.
S(30000) = {(9 - 1)/9}{(6 - 1)/6} = 20/27 = 0.741.
Comment: We assume on average half of the values censored in the interval do not contribute to the risk set for that interval; β = 1/2. There is no truncation. All of the “lives” are available to “die” from “age” 0.
i  ci      ci+1    di  xi  ui  Pi  ri
0  0       15000   10  1   2   0   9
1  15000   30000   0   1   2   7   6
2  30000   45000   0   4   0   4   4
P0 = 0. P1 = 0 + 10 - 1 - 2 = 7. P2 = 7 - 1 - 2 = 4.
r0 = 10 - 2/2 = 9. r1 = 7 - 2/2 = 6. r2 = 4 - 0/2 = 4.
4.25. E. We assume all truncated values in an interval are in the risk set, α = 1. We assume all censored values in an interval are in the risk set, β = 0.
i  ci    ci+1   di   xi   ui   Pi   ri   (ri - xi)/ri  S(ci+1)/S(300)
0  300   500    400  50   0    0    400  0.8750        0.8750
1  500   1000   525  125  0    350  875  0.8571        0.7500
2  1000  5000   0    300  120  750  750  0.6000        0.4500
3  5000  10000  0    300  30   330  330  0.0909        0.0409
For example, r1 = P1 + d1 = 350 + 525 = 875. P2 = P1 + d1 - x1 - u1 = 350 + 525 - 125 - 0 = 750. S(5000)/S(300) = 0.4500. 1 - 0.450 = 0.550. Alternately, the risk set for the first interval is all 400 of the losses from the policies with 300 deductible. S(500)/S(300) = (400 - 50)/400 = 350/400 = .875. The risk set for the next interval adds all 525 of the losses from the policies with 500 deductible added to the 350 losses that survived the previous interval. Thus the risk set is: 350 + 525 = 875. S(1000)/S(300) = (0.875)(875 - 125)/875 = (0.875)750/875 = 0.750. The risk set for the next interval is all 750 losses that survived the previous interval. S(5000)/S(300) = (0.750)(750 - 300)/750 = (0.750)450/750 = 0.450. 1 - 0.450 = 0.550. The risk set for the next interval is the 450 losses that survived the previous interval, minus the 120 losses censored from above at 5000; the risk set is: 450 - 120 = 330. S(10000)/S(300) = (0.450)(330 - 300)/330 = (0.450)30/330 = 0.0409. Comment: Since all of the data is truncated from below, everything is conditional on being greater than 300. Therefore, we are really estimating S(5000)/S(300), rather than S(5000). 1 - S(5000)/S(300) = {S(300) - S(5000)}/S(300) = {F(5000) - F(300)}/S(300), the distribution at 5000 after the data has been truncated from below at 300. “Loss data for 925 policies with,” is at best confusing. What we really have is data on 925 losses from policies with these provisions. Recall that a policy limit of 5000 means that the maximum payment is 5000. Thus with a policy limit of 5000 and 500 deductible the data would be censored from above at: 5000 + 500 = 5500, the maximum covered loss, rather than 5000 as assumed in the question. This exam question should have read: “Data on losses from policies with deductibles of 300 and 500 and maximum covered losses of 5,000 and 10,000 were collected. Using the Kaplan-Meier approximation for large data sets, with α = 1 and β = 0, in other words assuming all truncated values and all censored values in an interval are in the risk set, estimate 1 - S(5000)/S(300).” α = 1 and β = 0, is the usual assumption when dealing with size of loss data, with the deductibles and maximum covered losses at the endpoints of intervals.
4.26. B. We assume all truncated values in an interval are in the risk set, α = 1. We assume all censored values in an interval are in the risk set, β = 0. Therefore, rj = Pj + (1)dj - (0)uj = Pj + dj. For example, r2 = 9 + 7 = 16.
j  cj      cj+1    dj  uj  xj  Pj  rj  (rj - xj)/rj  S(cj+1)/S(250)
0  250     500     6   0   1   0   6   0.8333        0.8333
1  500     1000    6   0   2   5   11  0.8182        0.6818
2  1000    2750    7   1   4   9   16  0.7500        0.5114
3  2750    5500    0   1   7   11  11  0.3636        0.1860
4  5500    6000    0   1   1   3   3   0.6667        0.1240
5  6000    10000   0   0   1   1   1   0.0000        0.0000
6  10,000  ∞       0   0   0   0   0
For a policy with a deductible of 500, there will only be a claim payment if the loss exceeds 500, and a claim payment will be in excess of 5500 if a loss exceeds 6000. Prob[X > 6000 | X > 500] = S(6000)/S(500) = {S(6000)/S(250)} / {S(500)/S(250)} = 0.1240/0.8333 = 0.1488. Alternately, {(11 - 2)/11} {(16 - 4)/16} {(11 - 7)/11} {(3 - 1)/3} = 18/121 = 0.1488. Comment: Should have instead read: “For a policy with a deductible of 500 and no maximum covered loss, given that there is a claim payment, estimate the probability that this claim payment is in excess of 5500." The question gave you the values of Pj, but one could have calculated them as follows. P0 = 0. P1 = P0 + d0 - u0 - x0 = 0 + 6 - 0 - 1 = 5. P2 = P1 + d1 - u1 - x1 = 5 + 6 - 0 - 2 = 9. P3 = P2 + d2 - u2 - x2 = 9 + 7 - 1 - 4 = 11. P4 = P3 + d3 - u3 - x3 = 11 + 0 - 1 - 7 = 3. α = 1 and β = 0 corresponds to all the truncated values entering at the beginning of each interval and all of the censored values leaving at the end of each interval. For example, there are 7 losses truncated by a deductible of 1000 and 1 loss censored by a maximum covered loss of 2750. Similar to Exercise 14.34 in Loss Models. When all of the data is from policies with deductibles, then the estimates are of S(x)/S(d), where d is the smallest deductible of any of the policies. In their solutions manual, for Exercise 14.34, all of the values calculated are really S(cj)/S(250).
Section 5, Nelson-Aalen Estimator
The Nelson-Aalen Estimator is another nonparametric estimator of the Survival Function that can be used, even when there is truncation and/or censoring. As with the Kaplan-Meier Product-Limit Estimator, the Nelson-Aalen Estimator uses the risk set. However, the Nelson-Aalen Estimator first estimates the cumulative hazard rate, H.
Estimating the Cumulative Hazard Rate:59
In a previous section, the example with eight women had the following risk set:
i  yi  ri  si
1  68  7   1
2  77  7   2
3  79  5   1
4  84  3   1
Define the failure ratio = si/ri = (number who die at age yi) / (number who can die at age yi).60
For example at age 77, the failure ratio is: 2/7. Each failure ratio is an estimate of the hazard rate:
i  yi  ri  si  h(yi) = si/ri
1  68  7   1   1/7
2  77  7   2   2/7
3  79  5   1   1/5
4  84  3   1   1/3
We get an estimated discrete version of h(yi). By summing up these estimated hazard rates we get an estimate of the Cumulative Hazard Rate.
Nelson-Aalen Estimator of the Cumulative Hazard Rate:
Ĥ(yj) = Σ_{i=1}^{j} si/ri.
Ĥ(68) = 1/7. Ĥ(77) = 1/7 + 2/7 = 3/7. Ĥ(79) = 1/7 + 2/7 + 1/5 = 22/35.
Ĥ(84) = 1/7 + 2/7 + 1/5 + 1/3 = 101/105.
59 See page 348 of Loss Models.
60 Loss Models does not call this ratio anything.
These estimates of the Cumulative Hazard Rate have jump discontinuities at the observed times of death. For example, Ĥ(t) = 3/7 for 77 ≤ t < 79 and Ĥ(t) = 22/35 for 79 ≤ t < 84.
Loss Models uses k for the number of distinct values at which a death occurs. In this example k = 4.
Ĥ(t) = Σ_{i=1}^{k} si/ri, t ≥ yk.
In this case, Ĥ(t) = 101/105 for t ≥ 84.
Estimated Survival Function:
Then one can use these estimates of the Cumulative Hazard Rates to estimate the Survival Function. Ŝ(68) = exp[-Ĥ(68)] = e^(-1/7) = 0.867.
Ŝ(yj) = exp[-Ĥ(yj)].
yi  si  ri  si/ri  H(yi)  S(yi)
68  1   7   0.143  0.143  0.867
77  2   7   0.286  0.429  0.651
79  1   5   0.200  0.629  0.533
84  1   3   0.333  0.962  0.382
Note that these estimates of the Survival Function using the Nelson-Aalen Estimator are similar to but differ somewhat from those gotten previously using the Kaplan-Meier Product-Limit Estimator.
      Estimate of the Survival Function
yi    Kaplan-Meier   Nelson-Aalen
68    0.857          0.867
77    0.612          0.651
79    0.490          0.533
84    0.327          0.382
In general, the two estimators differ. In the absence of truncation and censoring, the Nelson-Aalen Estimator does not reduce to the Empirical Survival Function, as does the KaplanMeier Product-Limit Estimator. In this example, our last observed death was at 84, and the largest censorship point was 90. Thus it makes sense to have the estimated survival function decline after age 90.
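A minimal Python sketch of the Nelson-Aalen estimator (my own illustration, not from Loss Models) is shown below; run on the eight-women example it reproduces the Ĥ(yi) and Ŝ(yi) columns above.

```python
import math

def nelson_aalen(y, s, r):
    """y: death times, s: deaths at each time, r: risk set at each time.
    Returns (yi, H, S) at each death time."""
    H, out = 0.0, []
    for yi, si, ri in zip(y, s, r):
        H += si / ri                  # add the failure ratio at this death time
        out.append((yi, H, math.exp(-H)))
    return out

for yi, H, S in nelson_aalen([68, 77, 79, 84], [1, 2, 1, 1], [7, 7, 5, 3]):
    print(yi, round(H, 3), round(S, 3))
# 68 0.143 0.867 / 77 0.429 0.651 / 79 0.629 0.533 / 84 0.962 0.382
```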
As discussed previously with respect to the Product-Limit Estimator, Loss Models assumes an exponential decline in the survival function beyond the last observation or censorship point.
In this case, for t > 90, Ŝ(t) = (0.382)^(t/90). For example, Ŝ(100) = (0.382)^(100/90) = 0.343.
Exercise: You observe the following payments from 10 losses:
Deductible  Maximum Covered Loss  Payment
None        None                  300
None        None                  70,000
500         None                  11,500
1000        None                  1,000
None        25,000                11,000
None        50,000                50,000
500         25,000                14,500
500         100,000               200
1000        50,000                49,000
1000        100,000               28,000
Note that the maximum payment on any loss is: Maximum Covered Loss minus Deductible.
Use the Nelson-Aalen Estimator to estimate the Cumulative Hazard Rate and the corresponding Survival Function.
[Solution: For example, H(29,000) = H(15,000) + 1/4 = 1.051 + 0.250 = 1.301.
Deductible  Maximum Covered Loss  Payment  Size of Loss
None        None                  300      300
None        None                  70,000   70,000
500         None                  11,500   12,000
1000        None                  1,000    2,000
None        25,000                11,000   11,000
None        50,000                50,000   50,000 or more
500         25,000                14,500   15,000
500         100,000               200      700
1000        50,000                49,000   50,000 or more
1000        100,000               28,000   29,000
yi      si  ri  si/ri  H(yi)  S(yi)
300     1   4   0.250  0.250  0.779
700     1   6   0.167  0.417  0.659
2,000   1   8   0.125  0.542  0.582
11,000  1   7   0.143  0.685  0.504
12,000  1   6   0.167  0.851  0.427
15,000  1   5   0.200  1.051  0.350
29,000  1   4   0.250  1.301  0.272
70,000  1   1   1.000  2.301  0.100
S(29,000) = e^(-1.301) = 0.272.]
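The bookkeeping in this exercise — converting each payment, deductible, and maximum covered loss into a size of loss, a truncation point d, and a censoring point u, and then forming the risk sets — can also be scripted. The following sketch is my own illustration (the helper name and encoding are assumptions); it reproduces the risk sets 4, 6, 8, 7, 6, 5, 4, 1 in the solution above.

```python
def to_loss_record(payment, deductible, max_covered):
    """Return (loss, d, censoring point); loss is None if only a lower bound is known."""
    d = deductible or 0
    loss = payment + d                      # size of loss = payment + deductible
    if max_covered is not None and loss >= max_covered:
        return (None, d, max_covered)       # censored at the maximum covered loss
    return (loss, d, None)

records = [to_loss_record(p, ded, mcl) for ded, mcl, p in [
    (None, None, 300), (None, None, 70000), (500, None, 11500), (1000, None, 1000),
    (None, 25000, 11000), (None, 50000, 50000), (500, 25000, 14500), (500, 100000, 200),
    (1000, 50000, 49000), (1000, 100000, 28000)]]

for t in sorted({y for y, d, u in records if y is not None}):
    # risk set: truncation point below t, and loss or censoring point at or above t
    r = sum(1 for y, d, u in records
            if d < t and ((y is not None and y >= t) or (u is not None and u >= t)))
    print(t, r)
```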
From these estimates of the Survival Function, one can estimate other quantities. For example, 10000q5000 = 1 - S(15000)/S(5000) = 1 - 0.350/0.582 = 0.399. Exercise: Estimate 10000|20000q5000. [Solution: 10000|20000q5000 = {S(15000) - S(35000)}/S(5000) = (0.350 - 0.272)/0.582 = 0.134.] Kernel Smoothing:61 The estimate of the Cumulative Hazard Rate has jumps at each of the reported values; there is a corresponding discrete density. For example, in the previous exercise h(11,000) = H(11,000) - H(2000) = 0.685 - 0.542 = 0.143 = 1/7. One can apply Kernel Smoothing to these discrete hazard rates. Exercise: Use a uniform kernel with bandwidth of 5000 to estimate h(10,000). [Solution: Only points within 5000 of 10,000 contribute. Each uniform kernel has height 1/10,000 and width 10,000. h(11,000) = 0.143. h(12,000) = 0.167. h(15,000) = 0.200. The kernel smoothed estimate of h(10,000) is: h(11000)/10000 + h(12000)/10000 + h(15000)/10000 = (0.143 + 0.167 + 0.200) / 10,000 = 0.000051.] Note that this is analogous to the application of kernel smoothing to the empirical distribution function.62 Nelson-Aalen estimate of h(t) at observed deaths. ⇔ For the empirical distribution function, point masses of probability at the observed values. Nelson-Aalen estimate of H(t) with jump discontinuities at the observed deaths. ⇔ Empirical distribution function with jump discontinuities at the observed values. Exercise: Use a triangular kernel with bandwidth of 5000 to estimate h(15,000). [Solution: Only points within 5000 of 15,000 contribute. The triangular kernel centered at 15,000 is 1/5000 at 15,000. The triangular kernel centered at 12,000 is: 0.4/5000 at 15,000. The triangular kernel centered at 11,000 is: 0.2/5000 at 15,000. Thus the kernel smoothed estimate of h(15,000) is: {(1/5)h(11000) + (2/5)h(12000) + (1)h(15000)} / 5000 = 0.143/25000 + 0.167/12500 + 0.200/5000 = 0.000059.] 61 62
See “Mahlerʼs Guide to Fitting Loss Distributions” and Section 14.3 of Loss Models. See “Mahlerʼs Guide to Fitting Loss Distributions.”
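The kernel smoothing of these discrete hazard estimates is likewise easy to reproduce. The sketch below (my own illustration) implements the uniform and triangular kernels with bandwidth 5000 and matches the two exercise answers, roughly 0.000051 and 0.000059; as in the exercise, a point lying exactly at the edge of the bandwidth is given full weight by the uniform kernel.

```python
# Discrete Nelson-Aalen hazard estimates h(y) = s/r at the observed loss sizes:
h = {300: 0.250, 700: 0.167, 2000: 0.125, 11000: 0.143,
     12000: 0.167, 15000: 0.200, 29000: 0.250, 70000: 1.000}

def smoothed_hazard(x, bandwidth, kernel="uniform"):
    total = 0.0
    for y, hy in h.items():
        u = abs(x - y) / bandwidth
        if u <= 1:                                      # within the bandwidth
            if kernel == "uniform":
                total += hy / (2 * bandwidth)           # uniform kernel height 1/(2b)
            else:
                total += hy * (1 - u) / bandwidth       # triangular kernel, peak 1/b
    return total

print(smoothed_hazard(10000, 5000, "uniform"))      # ≈ 0.000051
print(smoothed_hazard(15000, 5000, "triangular"))   # ≈ 0.000059
```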
Kaplan-Meier Product-Limit Estimator vs. Nelson-Aalen Estimator:
As was seen before, the two estimators give similar but not identical estimates.
For the Kaplan-Meier Product-Limit Estimator: Sn(yj) = Π_{i=1}^{j} (ri - si)/ri.
Thus the corresponding estimate of the Cumulative Hazard Rate is:
Hn(yj) = -ln(Sn(yj)) = -Σ_{i=1}^{j} ln[(ri - si)/ri] = -Σ_{i=1}^{j} ln[1 - si/ri] ≅ Σ_{i=1}^{j} si/ri = Ĥ(yj) = Nelson-Aalen Estimator.
Thus in general, the Kaplan-Meier Product-Limit Estimator and the Nelson-Aalen Estimator produce similar estimates of the Cumulative Hazard Rate and the corresponding Survival Function.
-ln(1 - x) = x + x²/2 + x³/3 + ... > x, for x > 0.
Therefore, Hn(yj) = -Σ_{i=1}^{j} ln[1 - si/ri] > Σ_{i=1}^{j} si/ri = Ĥ(yj).
Therefore, Exp[-Hn(yj)] = Sn(yj) < Ŝ(yj) = Exp[-Ĥ(yj)].
The Kaplan-Meier Product-Limit Estimator of the Survival Function is (somewhat) less than the Nelson-Aalen Estimator of the Survival Function.
Problems:
Use the following information for the next three questions:
        Group A       Group B
yi      si    ri      si    ri
1       10    100     15    100
2       15    80      20    90
3       20    60      30    70
4       15    30      5     40
Use the Nelson-Aalen Estimator of the Cumulative Hazard Rate.
5.1 (2 points) Determine the estimate of S(4) based on the data for Group A.
(A) 27% (B) 30% (C) 33% (D) 37% (E) 40%
5.2 (2 points) Determine the estimate of S(4) based on the data for Group B.
(A) 27% (B) 30% (C) 33% (D) 37% (E) 40%
5.3 (3 points) Determine the estimate of S(4) based on the data for both groups.
(A) 27% (B) 30% (C) 33% (D) 37% (E) 40%
5.4 (2 points) The claim payments on a sample of ten policies are:
4  4  5+  5+  6  9  10+  12  15+  17
+ indicates that the loss exceeded the policy limit.
Using the Nelson-Aalen estimator, calculate the probability that the loss on a policy exceeds 15.
(A) 0.35 (B) 0.40 (C) 0.45 (D) 0.50 (E) 0.55
5.5 (2 points) In a study of claim payment times, you are given:
(i) The data were neither truncated nor censored.
(ii) At most one claim was paid at any one time.
(iii) There are a total of 20 claims.
Determine the Nelson-Aalen estimate of the cumulative hazard function, H(t), immediately following the sixth paid claim.
(A) 0.35 (B) 0.37 (C) 0.39 (D) 0.41 (E) 0.43
5.6 (3 points) n lives were observed from birth. None were censored and no two lives died at the same age. At the time of the fourth death, the Nelson-Aalen estimate of the cumulative hazard rate is 0.15121 and at the time of the fifth death it is 0.19288. At the time of the eighth death, what is the difference between the Nelson-Aalen estimate of the Survival Function and the Kaplan-Meier estimate of the Survival Function?
(A) 0.002 (B) 0.003 (C) 0.004 (D) 0.005 (E) 0.006
Use the following information for the next 5 questions:
You observe the following payments from 9 losses:
Deductible  Maximum Covered Loss  Payment
None        25,000                25,000
None        50,000                4,000
None        100,000               15,000
10,000      25,000                5,000
10,000      50,000                40,000
10,000      100,000               13,000
25,000      50,000                5,000
25,000      100,000               75,000
25,000      100,000               60,000
Note that the maximum payment on any loss is: Maximum Covered Loss minus Deductible.
Use the Nelson-Aalen Estimator of the Cumulative Hazard Rate.
5.7 (1 point) Determine the estimate of S(60,000).
5.8 (1 point) Determine the estimate of S(150,000).
5.9 (1 point) Determine the estimate of Prob[15,000 ≤ X ≤ 50,000].
5.10 (1 point) Determine the estimate of 15q10.
5.11 (1 point) Determine the estimate of 10|15q5.
5.12 (2 points) n lives were observed from birth. None were censored and no two lives died at the same age. At the time of the seventh death, the Nelson-Aalen estimate of the cumulative hazard rate is 0.1897 and at the time of the eighth death it is 0.2200. Estimate the value of the survival function at the time of the fourth death. (A) 0.89 (B) 0.90 (C) 0.91 (D) 0.92 (E) 0.93
Use the following information for the next 3 questions: Five losses are from policies with a deductible of 500: 600, 800, 1000, 1400, 4000. Six losses are from policies with a deductible of 1000: 1100, 1400, 1800, 2000, 2000, 3000. Use the Nelson-Aalen Estimator of the Cumulative Hazard Rate. 5.13 (1 point) Estimate 1000q500. 5.14 (1 point) Estimate 500q1000. 5.15 (1 point) Estimate 1000|2000q500. Use the following information for the next 8 questions: T represents the number of months from the time a claim is reported to the time the claim is closed. For ten claims all reported 36 months ago, 8 have been closed, with observed values of T: 2, 5, 5, 8, 9, 15, 26, 31. For an additional ten claims all reported 24 months ago, 6 have been closed, with observed values of T: 1, 3, 5, 12, 12, 20. Use the Nelson-Aalen Estimator of the Cumulative Hazard Rate. 5.16 (2 points) Determine the estimate of S(30). 5.17 (1 point) Determine the estimate of S(48). 5.18 (1 point) Determine the estimate of Prob[12 ≤ T ≤ 36]. 5.19 (1 point) Determine the estimate of Prob[12 < T ≤ 36]. 5.20 (1 point) Determine the estimate of 6 q24. 5.21 (1 point) Determine the estimate of 12|18q6 . 5.22 (2 points) Using a uniform kernel with bandwidth 5, estimate the hazard rate at 5. 5.23 (3 points) Using a triangular kernel with bandwidth 3, estimate the hazard rate at 10.
5.24 (2 points) Using the Product-Limit Estimator: S(x) = 1 for 0 ≤ x < 5, S(x) = 5/6 for 5 ≤ x < 15, S(x) = 5/9 for 15 ≤ x < 30, 65/144 for 30 ≤ x < 50. Using the Nelson-Aalen estimator, estimate S(40). (A) 0.42 (B) 0.44 (C) 0.46 (D) 0.48 (E) 0.50
5.25 (2 points) For a data set without truncation or censoring, there are no simultaneous deaths. The deaths occur at times y1, y2, etc. The Nelson-Aalen estimate of S(ym) is 0.680. Two thirds of the lives in the data set survive beyond ym. Determine m.
(A) 1 (B) 2 (C) 3 (D) 4 (E) 5
5.26 (4 points) For a large data set of size n without truncation or censoring, there are no simultaneous deaths. Derive an approximation formula for Ŝ(yn), the Nelson-Aalen estimate of the survival function beyond the last data point. Then briefly compare and contrast Ŝ(yn) with both the smoothed empirical estimate of the percentile corresponding to yn and the empirical survival function.
5.27 (3 points) For a mortality study:
The first two deaths occur at time 4 and time 13.
Some individuals withdrew from the study at time 10.
There were no other withdrawals and nobody entered the study after time 0.
The Kaplan-Meier Product-Limit estimate of S(13) is positive.
The sum of the Kaplan-Meier Product-Limit estimate of S(13) and the Nelson-Aalen estimate of H(13) is 714/713.
Determine the number of withdrawals.
(A) 3 (B) 4 (C) 5 (D) 6 (E) 7
5.28 (2 points) You observe the following eight values of X: 13, 16, 18, 25, 37, 43, 50, 62.
Using the Nelson-Aalen estimator, determine Prob[25 ≤ X ≤ 50].
(A) 39% (B) 41% (C) 43% (D) 45% (E) 47%
5.29 (3 points) For a mortality study:
At time = 10, x people die and 3 people withdraw from the study.
At time = 20, y people die and 7 people enter the study.
At time = 30, 4 people die and 2 people withdraw from the study.
Between time 0 and time 40 there are no other deaths, withdrawals or entries.
Using the Nelson-Aalen estimator: Ŝ(15) = 0.9045, Ŝ(25) = 0.6065, and Ŝ(35) = 0.4724.
Determine x + y.
(A) 5 or less (B) 6 (C) 7 (D) 8 (E) 9 or more
5.30 (2 points) For a mortality study:
Individual  Age of Entry  Age of Termination  Cause of Termination
1           0             20                  Censored
2           0             8                   Censored
3           0             16                  Death
4           5             20                  Censored
5           10            12                  Death
6           15            20                  Censored
Determine the Nelson-Aalen estimate of H(20).
(A) 0.2 (B) 0.3 (C) 0.4 (D) 0.5 (E) 0.6
5.31 (2 points) A mortality study is conducted on monkeys from birth. You are given:
• Using the Nelson-Aalen estimator, Ŝ(7) = 0.339.
• Five monkeys died between age 7 and 8, each at a different age.
• Fifteen monkeys survived beyond age 8.
Determine the Nelson-Aalen estimate of S(8).
(A) 0.250 (B) 0.252 (C) 0.254 (D) 0.256 (E) 0.258
5.32 (2 points) A mortality study has been conducted of persons with a certain disease. The observation period ended on July 1, 1996. No withdrawals occurred. Using the following data on five persons, calculate the probability of death within one year of diagnosis using the Nelson-Aalen Estimator.
Person  Date of Diagnosis   Date of Death
U       October 1, 1994     January 1, 1996
V       October 1, 1994     July 1, 1995
W       January 1, 1996     January 1, 1997
X       November 1, 1995    February 1, 1996
Y       March 1, 1996       May 1, 1996
(A) 0.4 (B) 0.5 (C) 0.6 (D) 0.7 (E) 0.8
5.33 (3 points) You are given:
(i) The complete data set: 72, 64, 80, 76, 72, X.
(ii) k = 4
(iii) s1 = 1
(iv) r4 = 1
(v) The Nelson-Aalen Estimate Ŝ(78) < 29%.
Determine X.
(A) 64 (B) 68 (C) 72 (D) 76 (E) 80
5.34 (3 points) You observe the following claims data:
Deductible  Maximum Covered Loss  Payment
1           10                    4
1           10                    9
1           25                    1
1           25                    14
2           10                    3
2           10                    6
2           25                    4
2           25                    20
Using the Nelson-Aalen estimator, calculate S(15)/S(1).
(A) 17% (B) 20% (C) 23% (D) 26% (E) 29%
5.35 (2 points) Joe Kerr and Gwen Peng are actuaries at the Gotham Insurance Company. Joe applied the Nelson-Aalen estimator to some mortality data. In the course of checking Joe's work, Gwen finds a mistake. Ten of the lives that Joe had thought had died at age 70 had actually died at a later age. What is the ratio of Gwen's corrected estimate of S(70) to Joe's original estimate of S(70)? Define any notation used to answer the question.
5.36 (3 points) You are given the following data from a clinical study:
Time  Event
0.0   30 new entrants
2.1   1 death
3.0   13 withdrawals
3.0   2 deaths
3.0   10 new entrants
5.0   8 withdrawals
6.2   1 death
Calculate the absolute difference between the Product Limit estimate of S(7) and the Nelson-Aalen estimate of S(7).
(A) Less than 0.004 (B) At least 0.004, but less than 0.005 (C) At least 0.005, but less than 0.006 (D) At least 0.006, but less than 0.007 (E) At least 0.007
Use the following information for the next two questions:
• Extra Inning games in baseball last more than 9 innings. • You observed the length in innings of 16 extra innings baseball games: 11, 10, 19, 12, 10, 13, 10, 11, 15, 22, 13, 10, 14, 11, 12, 16.
• Use the Nelson-Aalen estimator. 5.37 (2 points) What is the probability that an extra inning game will last more than 15 innings? 5.38 (2 points) Given that an extra inning game has lasted more than 11 innings, what is the probability that it will last more than 15 innings?
5.39 (3 points) You are given: (i) The following is a sample of 8 losses: 500, 1000, 1000, 1000, 2000, 3000, 5000, 25,000 ^
(ii) H1 (x) is the Nelson-Aalen empirical estimate of the cumulative hazard rate function. ^
(iii) H2 (x) is the maximum likelihood estimate of the cumulative hazard rate function under the assumption that the sample is drawn from a Pareto Distribution with θ = 1000. ^
^
Calculate | H2 (1500) - H1 (1500)|. (A) 0.14
(B) 0.16
(C) 0.18
(D) 0.20
(E) 0.22
5.40 (3 points) You are given the following information on a set of corporations, taken from a data base of corporations that were in existence in 1940 or were founded later: Corporation A, Founded in 1951, Still in Existence in 2010. Corporation B, Founded in 1925, Still in Existence in 2010. Corporation C, Founded in 1908, Ceased to Exist in 1983. Corporation D, Founded in 1973, Ceased to Exist in 2005. Corporation E, Founded in 1893, Ceased to Exist in 1968. Corporation F, Founded in 1964, Still in Existence in 2010. Corporation G, Founded in 1879, Still in Existence in 2010. Use the Nelson-Aalen Estimator to estimate the probability that a similar corporation founded in 2010 will survive to at least 2100. (A) 47% (B) 50% (C) 53% (D) 56% (E) 59%
5.41 (2 points) You are given: (i) In a complete mortality study no two deaths occurred at the same time. (ii) 23 lives survived past age 70. (iii) 15 lives survived past age 75. (iv) The Nelson-Aalen estimate of the survival function at age 70 is 0.43. Determine the Nelson-Aalen estimate of the survival function at age 75. (A) 0.280 (B) 0.282 (C) 0.284 (D) 0.286 (E) 0.288 5.42 (2 points) You are given for a mortality study: (i) Deaths occur at times t1 , t2 , t3 , ... (ii) The Nelson-Aalen estimate of S(t1 ) = 0.918176. (iii) The Nelson-Aalen estimate of S(t2 ) = 0.872047. Determine the product limit estimate of S(t2 ). 5.43 (160, 11/86, Q.2) (2.1 points) The results of using the product-limit (Kaplan-Meier) estimator of S(x) for a certain data set are: 1.0 for 0 ≤ x < a, 49/50 for a ≤ x < b, 1,911/2,000 for b ≤ x < c, and 36,309/40,000 for c ≤ x < d. Determine the Nelson-Aalen estimate of S(c). (A) e-23/250
(B) e-93/1000
(C) e-19/200
(D) e-97/1000
(E) e-1/10
5.44 (160, 5/87, Q.14) (2.1 points) In a mortality study, the following observations are made: (i) x persons die, 1 withdraws and 1 enters at time t = 1. (ii) y persons die and 1 enters at t = 2. (iii) 1 person dies at t = 3. ^
Based on these observations, three values of H(t), the Nelson-Aalen estimate of the cumulative hazard rate function at time t are: ^
^
^
H(1.5) = 0.20, H(2.5) = 0.45, H(3.5) = 0.55. Determine x + y. (A) 3 (B) 4
(C) 5
(D) 6
(E) 7
5.45 (160, 5/87, Q.15) (2.1 points) Five individuals are observed over the interval (0, 1]. Individual Time At Entry Observation 1 0 Survived to time 1 2 0 Withdrew at time 1/3 3 0 Died at time 1/2 4 1/3 Died at time 5/6 5 2/3 Survived to time 1 ^
The Nelson-Aalen estimate of the cumulative hazard function over (0, 1] is denoted by H(1). The maximum likelihood estimate for the cumulative hazard function over (0.1], assuming a constant ~
^
~
force of hazard, is denoted by H(1). Determine H(1) - H(1). (A) -0.08
(B) -0.06
(C) 0.00
(D) 0.06
(E) 0.08
5.46 (160, 11/87, Q.14) (2.1 points) You are given the following data from a clinical study: Time Event 0.0 20 new entrants 1.1 1 death 1.5 9 terminations 2.3 1 death 3.0 1 new entrant 3.2 1 death 4.7 1 termination 6.0 2 deaths Calculate the absolute difference between the Product Limit estimate of S(6) and the Nelson-Aalen estimate of S(6). (A) 0.01 (B) 0.03 (C) 0.05 (D) 0.08 (E) 0.11 5.47 (160, 11/87, Q.18) (2.1 points) In a complete data study, with only one death at each point, H(t) is estimated by the Nelson-Aalen method. ^
^
^
You are given H(t10) = 0.669, and H(t11) = 0.769. Calculate H(t2 ). (A) 0.103
(B) 0.108
(C) 0.113
(D) 0.118
(E) 0.123
5.48 (160, 5/88, Q.15) (2.1 points) You are given the following for a complete data study: (i) No simultaneous deaths occur. (ii) One-third of the original entrants are surviving after k deaths, at time tk. (iii) The Nelson-Aalen estimate of H(tk) = 0.95. Determine k. (A) 2 (B) 4
(C) 6
(D) 8
(E) 10
5.49 (160, 11/88, Q.11) (2.1 points) A study starts with 12 newly diagnosed AIDS patients. Six additional AIDS patients enter the study 9 months after diagnosis. There is one withdrawal from the study, 6 months after diagnosis. One death occurs at each of: 6 months after diagnosis, 9 months after diagnosis, and 11 months after diagnosis. Estimates of the mortality rate during the first year after diagnosis, q, are made using: I. The Nelson-Aalen estimator. ll. The product-limit estimator. Calculate the absolute difference between the product-limit estimate of q and the Nelson-Aalen estimate of q. (A) 0.01 (B) 0.02 (C) 0.03 (D) 0.04 (E) 0.05 5.50 (160, 11/88, Q.12) (2.1 points) In a complete data study, with only one death at each death point, tk, H(t) is estimated by the Nelson-Aalen method. ^
^
You are given H(t7 ) = 0.18472 and H(t8 ) = 0.21414. Determine the sample size. (A) 40 (B) 41 (C) 42
(D) 43
(E) 44
5.51 (160, 11/89, Q.13) (2.1 points) In a study with no intermediate entrants, you are given: (i) The first 2 deaths occur at times t1 and t2 . (ii) The product limit estimate of S(t2 ) is not zero. (iii) The sum of the product limit estimate of S(t2 ) and the Nelson-Aalen estimate of H(t2 ) is 17/16. (iv) All withdrawals occur within (t1 , t2 ). Determine the number of withdrawals. (A) 2 (B) 3 (C) 4
(D) 5
(E) 6
5.52 (160, 5/90, Q.12) (2.1 points) From a complete data study, you are given: (i) There is only one death at each death point. (ii) H(t) is estimated by the Nelson-Aalen method. ^
(iii) H(t2 ) ≅ 0.1144. Determine the product limit estimate of S(t2 ). (A) 0.86
(B) 0.87
(C) 0.88
(D) 0.89
(E) 0.90
5.53 (160, 5/91, Q.17) (1.9 points) For a complete data study with original sample size 16, the product limit estimate of S(12) is 0.9375. Calculate the Nelson-Aalen estimate of S(12). (A) 0.9337 (B) 0.9356 (C) 0.9375 (D) 0.9394 (E) 0.9413
5.54 (Course 160 Sample Exam #3, 1994, Q.11) (1.9 points) For a complete data study, you are given: (i) There is only one death at each death point. (ii) H(t) is estimated by the Nelson-Aalen method. (iii) H(t7 ) = 0.3726, where t7 denotes the time at which the seventh death occurs. Calculate the product limit estimate of S(t7 ). (A) 0.66
(B) 0.67
(C) 0.68
(D) 0.69
(E) 0.70
5.55 (Course 160 Sample Exam #1, 1996, Q.11) (1.9 points) In a complete data study, (i) The Nelson-Aalen estimate of H(t), immediately following the third death, is 1.0. (ii) The first two deaths occurred simultaneously, before the third death. Calculate the product limit estimate of S(t3 ). (A) 0.25
(B) 0.33
(C) 0.37
(D) 0.40
(E) 0.50
5.56 (Course 160 Sample Exam #3, 1997, Q.8) (1.9 points) For a complete data study, you are given: (i) All items were observed from t = 0. (ii) Failure is the only decrement. (iii) No simultaneous failures were observed. (iv) The Nelson-Aalen estimates for H(t9 ) and H(t10) were 0.511 and 0.588, respectively. ^
Using the Nelson-Aalen estimator, calculate S (t3 ). (A) 0.860
(B) 0.867
(C) 0.872
(D) 0.875
(E) 0.876
5.57 (Course 4 Sample Exam 2000, Q.2) The number of employees leaving a company for all reasons is tallied by the number of months since hire. The following data was collected for a group of 50 employees hired one year ago: Number of Months Since Hire Number Leaving the Company 1 1 2 1 3 2 5 2 7 1 10 1 12 1 Determine the Nelson-Aalen estimate of the cumulative hazard at the sixth month since hire. Note: Assume that employees always leave the company after a whole number of months.
5.58 (2 points) In the previous question, determine the Nelson-Aalen estimate of the survival ^
function at the 12th month since hire, S(12). 5.59 (4, 5/00, Q.4) (2.5 points) For a mortality study with right-censored data, you are given: Time Number of Deaths Number at Risk 5 2 15 7 1 12 10 1 10 12 2 6 ^
^
Calculate S(12) based on the Nelson-Aalen estimate H(12). (A) 0.48
(B) 0.52
(C) 0.60
(D) 0.65
(E) 0.67
5.60 (4, 5/00, Q.8) (2.5 points) You are given the following data on time to death: (i)
Time 10 34 47 75 156 171
Number of Deaths 1 1 1 1 1 1
^
Number of Risks 20 19 18 17 16 15
H(ti) 0.0500 0.1026 0.1582 0.2170 0.2795 0.3462
^
(ii) H(ti) is the Nelson-Aalen estimate of the cumulative hazard function. Determine the kernel-smoothed estimate of the hazard rate at 100, using bandwidth 60 and the uniform kernel. (A) 0.0010 (B) 0.0015 (C) 0.0029 (D) 0.0590 (E) 0.0885 5.61 (4, 11/02, Q.4 & 2009 Sample Q. 33) (2.5 points) In a study of claim payment times, you are given: (i) The data were not truncated or censored. (ii) At most one claim was paid at any one time. (iii) The Nelson-Aalen estimate of the cumulative hazard function, H(t), immediately following the second paid claim, was 23/132. Determine the Nelson-Aalen estimate of the cumulative hazard function, H(t), immediately following the fourth paid claim. (A) 0.35 (B) 0.37 (C) 0.39 (D) 0.41 (E) 0.43
5.62 (4, 11/03, Q.40 & 2009 Sample Q.30) (2.5 points) You are given the following about 100 insurance policies in a study of time to policy surrender: (i) The study was designed in such a way that for every policy that was surrendered, a new policy was added, meaning that the risk set, rj, is always equal to 100. (ii) Policies are surrendered only at the end of a policy year. (iii) The number of policies surrendered at the end of each policy year was observed to be: 1 at the end of the 1st policy year 2 at the end of the 2nd policy year 3 at the end of the 3rd policy year n at the end of the nth policy year (iv) The Nelson-Aalen empirical estimate of the cumulative distribution function at time n, ^
F(n), is 0.542. What is the value of n? (A) 8
(B) 9
(C) 10
(D) 11
(E) 12
5.63 (4, 11/04, Q.36 & 2009 Sample Q.158) (2.5 points) You are given: (i) The following is a sample of 15 losses: 11, 22, 22, 22, 36, 51, 69, 69, 69, 92, 92, 120, 161, 161, 230 ^
(ii) H1 (x) is the Nelson-Aalen empirical estimate of the cumulative hazard rate function. ^
(iii) H2 (x) is the maximum likelihood estimate of the cumulative hazard rate function under the assumption that the sample is drawn from an exponential distribution. ^
^
Calculate | H2 (75) - H1 (75)|. (A) 0.00
(B) 0.11
(C) 0.22
(D) 0.33
(E) 0.44
5.64 (4, 5/05, Q.3 & 2009 Sample Q.174) (2.9 points) You are given: (i) A mortality study covers n lives. (ii) None were censored and no two deaths occurred at the same time. (iii) tk = time of the kth death. ^
(iv) A Nelson-Aalen estimate of the cumulative hazard rate function is H(t2 ) = 39/380. Determine the Kaplan-Meier product-limit estimate of the survival function at time t9 . (A) Less than 0.56 (B) At least 0.56, but less than 0.58 (C) At least 0.58, but less than 0.60 (D) At least 0.60, but less than 0.62 (E) At least 0.62 5.65 (4, 11/06, Q.14 & 2009 Sample Q.258) (2.9 points) For the data set 200 300 100 400 X you are given: (i) k = 4 (ii) s2 = 1 (iii) r4 = 1 ^
(iv) The Nelson-Åalen Estimate H(410) > 2.15 Determine X. (A) 100 (B) 200
(C) 300
(D) 400
(E) 500
5.66 (4, 11/06, Q.20 & 2009 Sample Q.264) (2.9 points) You are given: (i) The following data set: 2500 2500 2500 3617 3662 4517 5000 5000 6010 6932 7500 7500 ^
(ii) H1 (7000) is the Nelson-Åalen estimate of the cumulative hazard rate function calculated under the assumption that all of the observations in (i) are uncensored. ^
(iii) H2 (7000) is the Nelson-Åalen estimate of the cumulative hazard rate function calculated under the assumption that all occurrences of the values 2500, 5000 and 7500 in (i) reflect right-censored observations and that the remaining observed values are uncensored. ^
^
Calculate | H1 (7000) - H2 (7000)|. (A) Less than 0.1 (B) At least 0.1, but less than 0.3 (C) At least 0.3, but less than 0.5 (D) At least 0.5, but less than 0.7 (E) At least 0.7 5.67 (4, 11/06, Q.31 & 2009 Sample Q.274) (2.9 points) For a mortality study with right censored data, you are given the following: Time Number of Deaths Number at Risk 3 1 50 5 3 49 6 5 k 10 7 21 You are also told that the Nelson-Åalen estimate of the survival function at time 10 is 0.575. Determine k. (E) 46 (A) 28 (B) 31 (C) 36 (D) 44
Solutions to Problems:

5.1. C. For Group A: H(t4) = 10/100 + 15/80 + 20/60 + 15/30 = 1.121. S(t4) = exp[-1.121] = 0.326.
yi    si    ri     si/ri    H(yi)    S(yi)
1     10    100    0.100    0.100    0.905
2     15    80     0.188    0.287    0.750
3     20    60     0.333    0.621    0.537
4     15    30     0.500    1.121    0.326
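The Nelson-Aalen arithmetic above is easy to script. As a check on the table, here is a minimal Python sketch (my own illustration, not part of the syllabus readings) that reproduces the Group A figures:

```python
import math

# Group A: deaths s_i and risk sets r_i at y = 1, 2, 3, 4
s = [10, 15, 20, 15]
r = [100, 80, 60, 30]

H = 0.0
for si, ri in zip(s, r):
    H += si / ri               # Nelson-Aalen: H(y) is the running sum of s_i / r_i

print(round(H, 3))             # 1.121
print(round(math.exp(-H), 3))  # S(t4) = exp[-H(t4)] = 0.326
```

Changing s and r to the Group B or combined figures reproduces the tables in solutions 5.2 and 5.3.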
5.2. E. For Group B:
yi    si    ri     si/ri    H(yi)    S(yi)
1     15    100    0.150    0.150    0.861
2     20    90     0.222    0.372    0.689
3     30    70     0.429    0.801    0.449
4     5     40     0.125    0.926    0.396
5.3. D. For Group A plus Group B:
yi    si    ri     si/ri    H(yi)    S(yi)
1     25    200    0.125    0.125    0.882
2     35    170    0.206    0.331    0.718
3     50    130    0.385    0.715    0.489
4     20    70     0.286    1.001    0.367
5.4. B. H(15) = 2/10 + 1/6 + 1/5 + 1/3 = 0.900. S(15) = exp[-0.9] = 0.407.
yi    si    ri    si/ri    H(yi)    S(yi)
4     2     10    0.200    0.200    0.819
6     1     6     0.167    0.367    0.693
9     1     5     0.200    0.567    0.567
12    1     3     0.333    0.900    0.407
17    1     1     1.000    1.900    0.150
5.5. A. H(t) = Σ si/ri. At the time of the sixth claim:
H(y6) = 1/20 + 1/19 + 1/18 + 1/17 + 1/16 + 1/15 = 0.346.
5.6. D. H(t5) - H(t4) = 1/(n-4). ⇒ 0.19288 - 0.15121 = 1/(n-4). ⇒ n = 28.
H(t8) = 1/28 + 1/27 + 1/26 + 1/25 + 1/24 + 1/23 + 1/22 + 1/21 = 0.32943.
S(t8) = exp[-0.32943] = 0.7193. Since there is no truncation or censoring, the Kaplan-Meier estimate of the Survival Function is the empirical Survival Function: 20/28 = 0.7143. The difference is: 0.7193 - 0.7143 = 0.0050.
Comment: See Exercise 14.8 in Loss Models.

5.7 to 5.11.
Deductible    Maximum Covered Loss    Payment    Size of Loss
None          25,000                  25,000     25,000 or more
None          50,000                  4,000      4,000
None          100,000                 15,000     15,000
10,000        25,000                  5,000      15,000
10,000        50,000                  40,000     50,000 or more
10,000        100,000                 13,000     23,000
25,000        50,000                  5,000      30,000
25,000        100,000                 75,000     100,000 or more
25,000        100,000                 60,000     85,000

yi        si    ri    si/ri    H(yi)    S(yi)
4,000     1     3     0.333    0.333    0.717
15,000    2     5     0.400    0.733    0.480
23,000    1     3     0.333    1.067    0.344
30,000    1     4     0.250    1.317    0.268
85,000    1     2     0.500    1.817    0.163

S(60,000) = 0.268. S(150,000) = S(100,000)^(150,000/100,000) = 0.163^1.5 = 0.066.
Prob[15,000 ≤ X ≤ 50,000] = S(14,999.9) - S(50,000) = 0.717 - 0.268 = 0.449.
15q10 = {S(10) - S(25)}/S(10) = (0.717 - 0.344)/0.717 = 0.520.
10|15q5 = {S(15) - S(30)}/S(5) = (0.480 - 0.268)/0.717 = 0.296.

5.12. B. H(t8) - H(t7) = 1/(n-7). ⇒ 0.2200 - 0.1897 = 1/(n-7). ⇒ n = 40.
H(t4) = 1/40 + 1/39 + 1/38 + 1/37 = 0.1040. S(t4) = exp[-0.1040] = 0.901.
Comment: See Exercise 14.8 in Loss Models.
5.13, 5.14, & 5.15. ti
si
ri
si/ri
H(ti)
S(ti)/S(500)
500 600 800 1000 1100 1400 1800 2000 3000 4000
0 1 1 1 1 2 1 2 1 1
5 5 4 3 8 7 5 4 2 1
0.000 0.200 0.250 0.333 0.125 0.286 0.200 0.500 0.500 1.000
0.000 0.200 0.450 0.783 0.908 1.194 1.394 1.894 2.394 3.394
1.000 0.819 0.638 0.457 0.403 0.303 0.248 0.150 0.091 0.034
1000q 500
= 1 - S(1500)/S(500) = 1 - 0.303 = 0.697.
500q 1000
= 1 - S(1500)/S(1000) = 1 - {S(1500)/S(500)}/{S(1000)/S(500)}
= 1 - 0.303/0.457 = 0.337. 1000|2000q 500 = {S(1500) - S(3500)}/S(500) = 0.303 - 0.091 = 0.212. Comment: Note that in this case, all the data is left truncated and the smallest truncation point is 500. Thus we have no information on what happens below 500. Thus we are really estimating S(yi)/S(500), the probability of survival to yi conditional on survival to 500. The risk set at 1000 includes the last three of the losses from policies with a deductible of 500. The risk set at 1000 includes none of the losses from policies with a deductible of 1000. Values truncated from below at 1000, are not in the risk set at 1000. Rather they enter the risk set just after 1000. With a deductible of 1000, a loss of size 1000 would result in no payment and thus would not enter into the data base.
5.16 through 5.23. There are 2 claims whose time of settlement is censored from above at 36, and 4 claims whose time of settlement is censored from above at 24. ti
si
ri
si/ri
H(ti)
S(ti)
0 1 2 3 5 8 9 12 15 20 26 31
1 1 1 3 1 1 2 1 1 1 1
20 19 18 17 14 13 12 10 9 4 3
0.050 0.053 0.056 0.176 0.071 0.077 0.167 0.100 0.111 0.250 0.333
0.050 0.103 0.158 0.335 0.406 0.483 0.650 0.750 0.861 1.111 1.444
0.951 0.902 0.854 0.716 0.666 0.617 0.522 0.473 0.423 0.329 0.236
S(30) = 0.329. Extrapolate via an Exponential beyond the last censorship point of 36: S(48) = S(36)48/36 = 0.2364/3 = 0.146. Prob[12 ≤ T ≤ 36] = S(11.99) - S(36) = 0.617 - 0.236 = 0.381. Prob[12 < T ≤ 36] = S(12) - S(36) = 0.522 - 0.236 = 0.286. 6 q 24 = 1 - S(30)/S(24) = 1 - 0.329/0.423 = 0.222. 12|18q 6
= {S(18) - S(36)}/S(6) = (0.473 - 0.236)/0.716 = 0.331.
The uniform kernel is 1/10 from 0 to 10. The smoothed estimate of h(10) is: (h(1) + h(2) + h(3) + h(5) + h(8) + h(9))/10 = (1/20 + 1/19 + 1/18 + 3/17 + 1/14 + 1/13)/10 = 0.483/10 = 0.0483. The triangular kernel is 1/3 at 10 and zero at 7 and 13. The smoothed estimate of h(10) is: h(8)/9 + 2h(9)/9 + h(12)/9 = (1/14)/9 + (1/13)(2/9) + (2/12)/9 = 0.0435. Comment: Some of the times of closing are censored from above at 24 and some are censored from above 36. We are not interested in the actual time a claim is closed. What we want to do is line everything up with respect to the time between when the claim is reported and when it is closed. This difference is mathematically equivalent to the age at death for a person. We are concerned with the age of the claim when it is closed, or how long each claim survives in the open status. In a different example, Jane is born 1/1/00 and dies 1/1/85 at age 85. Sally is born 1/1/20 and dies 1/1/95 at age 75. For a survival analysis, we are only interested in the ages at death, or how long each individual survives. It does not matter that Jane died before Sally. Jane would be in the risk set at age 80, while Sally would not. There is not a risk set at a particular point in time, such as 1/1/80.
5.24. E. 5/6 = (r1 - s1 )/r1 = 1 - s1 /r1 . ⇒ s1 /r1 = 1/6. 5/9 = (5/6)(r2 - s2 )/r2 . ⇒ s2 /r2 = 1/3. 65/144 = (5/9)(r3 - s3 )/r3 . ⇒ s3 /r3 = 3/16. ^
^
^
H(30) = 1/6 + 1/3 + 3/16 = 0.6875. S(40) = S(30) = e-0.6875 = 0.503. Comment: In general, the Nelson-Aalen Estimator is greater than the Kaplan-Meier Product-Limit Estimator. So we know the answer is greater than 65/144 = 0.451. 5.25. D. Try the given values of m. For m = 4, the original number of lives must be: (3)(4) = 12. ^
^
H(y4 ) = 1/12 + 1/11 + 1/10 + 1/9 = 0.3854. S(y4 ) = e-0.3854 = 0.680. n + 1/ 2 ^
5.26. H(yn ) = 1/n + 1/(n-1) + ... + 1 ≅ 1 +
∫ 3/2
dx / x = 1 + ln(2n/3 + 1/3).
^
S(yn ) ≅ exp[-(1 + ln(2n/3 + 1/3))] = e-1/(2n/3 + 1/3) = (3/e)/(2n + 1) = 1.104/(2n + 1). The smoothed empirical estimate of the percentile corresponding to yn is 1/(n+1). Thus these two estimates of the chance that the next random individual will live longer than y n differ, with the Nelson-Aalen estimate being about half of that from the smoothed empirical estimate. In contrast the empirical survival function at yn is 0. Comment: Well beyond what you will be asked on your exam. ^
For example, for n = 100, S(y100) = exp[-5.18738] = 0.00559. While, 1.104/201 = 0.00549. ^
A better approximation is: S (yn ) ≅ Exp[-γ]/n, where γ is Eulerʼs Gamma; γ ≅ 0.577216. 5.27. E. Let n be the number of individuals at time 0, and w be the number who withdrew at time 10. Then r1 = n and r2 = n - 1 - w. S n (13) = {(n-1)/n}{(n - 2 - w)/(n - 1 - w)} = (n-1)(n - 2 - w)/{n(n - 1 - w)}. ^
^
H(13) = 1/n + 1/(n - 1 - w). Sn (13) + H(13) = (2n - 1 - w + n2 - 3n + 2 - nw + w)/{n(n - 1 - w)} = (n2 - n - nw + 1)/{n(n - 1 - w)} = 1 + 1/{n(n - 1 - w)} = 714/713. ⇒ n(n - 1 - w) = 713. ⇒ n = 31 and n - 1 - w = 23. ⇒ w = 7. Comment: Difficult! The only factors of 713 are 23 and 31, both prime. n > n - 1 - w. If instead n = 713 and n - 1 - w = 1, then Sn (13) = 0.
^
5.28. E. H(18) = 1/8 + 1/7 + 1/6 = .4345. ^
H(50) = 1/8 + 1/7 + 1/6 + 1/5 + 1/4 + 1/3 + 1/2 = 1.7179. ^
^
Prob[25 ≤ X ≤ 50] = Prob[ X > 25 - ε] - Prob[X > 50] = S(25 - ε) - S(50) = e-0.4345 - e-1.7179 = 0.6476 - 0.1794 = 0.47. ^
^
^
5.29. D. H(15) = -ln(0.9045) = 0.10. H(25) = -ln(0.6065) = 0.50. H(35) = -ln(0.4724) = 0.75. x/n = .1. y/(n - 3 - x) = 0.5 - 0.1 = 0.4. 4/(n + 4 - x - y) = 0.75 - 0.5 = 0.25. Solving three equations in three unknowns: n = 20, x = 2, and y = 6. x + y = 8. ^
5.30. D. H(20) = 1/4 + 1/4 = 1/2. ti
si
ri
si/ri
H(ti)
S(ti)
12 16
1 1
4 4
0.250 0.250
0.250 0.500
0.779 0.607
5.31. D. H(7) = -lnS(7) = -ln(0.339) = 1.0818. H(8) = H(7) + 1/20 + 1/19 + 1/18 + 1/17 + 1/16 = 1.3613. S(6) = exp[-H(6)] = e-1.3613 = 0.256. 5.32. C. Since observing ended on July 1, 1996, person Wʼs death is not observed. Person Date of Diagnosis Date of Death U October 1, 1994 January 1, 1996 Died 15 months after diagnosis V October 1, 1994 July 1, 1995 Died 9 months after diagnosis W January 1, 1996 beyond July 1, 1996 Survived beyond 6 months X November 1, 1995 February 1, 1996 Died 3 months after diagnosis Y March 1, 1996 May 1, 1996 Died 2 months after diagnosis yi si ri si/ri H(yi) 2 3 9 15
1 1 1 1
5 4 2 1
1/5 1/4 1/2 1
.20 .45 .95 1.95
H(12 months) = 0.95. S(12 months) = e-0.95 = 0.387. F(12 months) = 1 - 0.387 = 0.613.
5.33. C. A complete data set means that there is no truncation or censoring. Loss Models uses k for the number of distinct values at which a death occurs. k = 4 means we have 4 distinct values in the data set. Thus X is either 64, 72, 76, or 80. s1 = 1. ⇒ X ≠ 64. r4 = 1. ⇒ X ≠ 80. ^
^
^
^
If X = 72, H(78) = 1/6 + 3/5 + 1/2 = 1.267. S (78) = e-1.267 = 28.2%. OK. If X = 76, H(78) = 1/6 + 2/5 + 2/3 = 1.233. S (78) = e-1.233 = 29.1%. Not OK. Comment: Similar to 4, 11/06, Q.14. If X = 64, then s1 = 2. If X = 80, then r4 = 2. If the data set is: 72, 64, 80, 76, 72, 72, then: yi
si
ri
si/ri
H(yi)
S(yi)
64 72 76 80
1 3 1 1
6 5 2 1
0.1667 0.6000 0.5000 1.0000
0.1667 0.7667 1.2667 2.2667
0.8465 0.4646 0.2818 0.1037
If instead the data set were: 72, 64, 80, 76, 72, 76, then: yi
si
ri
si/ri
H(yi)
S(yi)
64 72 76 80
1 2 2 1
6 5 3 1
0.1667 0.4000 0.6667 1.0000
0.1667 0.5667 1.2333 2.2333
0.8465 0.5674 0.2913 0.1072
5.34. C.
Deductible 1
Maximum Covered Loss 10
Payment 4
Loss 5
1 1 1 2 2 2 2
10 25 25 10 10 25 25
9 1 14 3 6 4 20
10+ 2 15 5 8 6 22
yi
si
ri
si/ri
H(yi) - H(1)
2 5 6 8 15
1 2 1 1 1
4 7 5 4 2
0.250 0.286 0.200 0.250 0.500
0.250 0.536 0.736 0.986 1.486
S(yi)/S(1) 1 0.779 0.585 0.479 0.373 0.226
Comment: Since all the data is truncated from below at 1 or more, we can only estimate the survival function conditional on survival to 1. Those losses on policies with a deductible of 2 are not available to fail at size 2; a loss of size 2 on a policy with a deductible of 2 would not be in the data base. Those losses on policies with a deductible of 2 are not in the risk set at 2.
5.35. Gwen's estimate of H(70) will differ from Joe's due to the difference in the final ratio added to the sum. Gwen's risk set at 70 should be the same, but the number of deaths is ten less. Let r70 be the size of the risk set at age 70. HGwen(70) = HJoe(70) - 10/r70. S Gwen(70) = Exp[-(HJoe(70) - 10/r70)] = Exp[10/r70] Exp[-HJoe(70)] = Exp[10/r70] SJoe(70). S Gwen(70)/SJoe(70) = Exp[10/r7 0]. 5.36. B. At time 3, the 13 lives that withdraw from the study are still in the risk set. At time 3, the 10 new entrants are not in the risk set; they are alive at exactly time 3. These 10 new entrants would be in the risk at time 3 + ε. ti
si
ri
(ri - si)/ri si/ri 2.1 1 30 29/30 1/30 3.0 2 29 27/29 2/29 6.2 1 16 15/16 1/16 Product Limit Estimator of S(7) = (29/30)(27/29)(15/16) = 0.84375. Nelson-Aalen estimate of H(7) = 1/30 + 2/29 + 1/16 = 0.16480. S(7) = e-0.16480 = 0.84806 |0.84375 - 0.84806| = 0.0043. Comment: Similar to 160, 11/87, Q.14. 5.37. & 5.38. x s 10 4 11 3 12 2 13 2 14 1 15 1 16 1 19 1 22 1 H(15) = 4/16
The risk set is as follows. r 16 12 9 7 5 4 3 2 1 + 3/12 + 2/9 + 2/7 + 1/5 + 1/4 = 1.4579.
S(15) = e-1.4579 = 0.233. H(11) = 4/16 + 3/12 = 0.5. S(11) = e-0.5 = 0.607. S(15) / S(11) = 0.233 / 0.607 = 0.384. Alternately, S(15) / S(11) = exp[-(2/9 + 2/7 + 1/5 + 1/4)] = 0.384.
5.39. C. For the Pareto Distribution with θ = 1000 fixed, maximum likelihood gives:
α = N / Σ ln[(θ + xi)/θ] = 8 / {ln(1.5) + 3 ln(2) + ln(3) + ln(4) + ln(6) + ln(26)} = 0.7984.
S2(1500) = {1000/(1000 + 1500)}^0.7984 = 0.4812. H2(1500) = -ln S2(1500) = 0.7315.
H1(1500) = Σ si/ri = 1/8 + 3/7 = 0.5536.

yi       si    ri    si/ri     H(yi)
500      1     8     0.1250    0.1250
1,000    3     7     0.4286    0.5536

|H2(1500) - H1(1500)| = |0.7315 - 0.5536| = 0.1779.
Comment: Similar to 4, 11/04, Q.36. The Nelson-Aalen estimator is nonparametric; in contrast, the maximum likelihood estimate assumes the data was drawn from a Pareto Distribution with θ = 1000. The Nelson-Aalen estimator is constant on intervals between “deaths”.
[Figure: the Nelson-Aalen estimate of the cumulative hazard rate (dots and horizontal lines) compared to the maximum likelihood Pareto cumulative hazard rate (curve), for x from 0 to 30,000.]
5.40. A.
Corporation    Truncation    End
A              0             Censored at 59
B              15            Censored at 85
C              32            Death at 75
D              0             Death at 32
E              47            Death at 75
F              0             Censored at 46
G              61            Censored at 131

At age 32, the risk set is: A, B, D, F, or Size 4. At age 75, the risk set is: B, C, E, G, or Size 4.
y     s    r    h      H(y)    S(y)
32    1    4    1/4    1/4     e^-1/4
75    2    4    1/2    3/4     e^-3/4
We do have data but no deaths from age 75 to 90.
S (90) = S (75) = e-3/4 = 47.2%. Comment: Corporation C, which is left truncated at 32 is not in the risk set at age 32. The survival of corporations over the next century will probably be different than that over the previous century. 5.41. C. H(70) = -ln[S(70)] = -ln[0.43] = 0.84397. H(75) = H(70) + 1/23 + 1/22 + 1/21 + 1/20 + 1/19 + 1/18 + 1/17 + 1/16 = 1.2600. S(75) = exp[-H(75)] = e-1.2600 = 0.2836. 5.42. C. 49/50 = (ra - sa)/ra. ⇒ sa/ra = 1/50. 1,911/2,000 = (49/50)(rb - sb )/rb . ⇒ sb /rb = 1/40. 36,309/40,000 = (1,911/2,000)(rc - sc)/rc. ⇒ sc/rc = 1/20. H(c) = 1/50 + 1/40 + 1/20 = 19/200. ⇒ S(c) = e- 1 9 / 2 0 0. Comment: The Kaplan-Meier Product Limit estimator of S has a jump discontinuity at every age where a death is observed; the estimator is constant on intervals between observed deaths. For example, (ra - sa)/ra = S(a) = 49/50 = S(b - ε). Similarly, {(ra - sa)/ra} {(rb - sb )/rb } = S(b) = 1,911/2,000 = S(c - ε). The Nelson-Aalen estimator of H has a jump discontinuity at every age where a death is observed; the estimator is constant on intervals between observed deaths. The deaths observed at age c, enter into the calculation of H(c). H(c) = sa/ra + sb /rb + sc/rc.
5.43. H(t1 ) = -ln[0.918176] = 0.085366. ⇒ s1 /r1 = 0.085366. H(t2 ) = -ln[0.872047] = 0.136912. ⇒ s2 /r2 = 0.136912 - 0.085366 = 0.051546. Thus the product limit estimate of S(t2 ) is: (1 - s1 /r1 ) (1 - s2 /r2 ) = (1 - 0.085366) (1 - 0.051546) = 0.867488. 5.44. D. Assume originally z people were alive. x/z = H(1.5) = .2. ⇒ x = .2z. y/(z - x) = H(2.5) - H(1.5) = .45 - .20. ⇒ y = (z - x)/4 ⇒ y = .2z. 1/(z + 1 - x - y) = H(3.5) - H(2.5) = .55 - .45. ⇒ 10 = z + 1 - x - y. ⇒ z = 15.
⇒ x = 3. y = 3. x + y = 6. Comment:
si
ri
si/ri
H
3 3 1
15 12 10
1/5 .20 1/4 .45 1/10 .55
5.45. A. Constant force of hazard. ⇔ Exponential. For the Exponential, for maximum likelihood, θ^ = (total time observed)/(number of deaths observed) = (1 + 1/3 + 1/2 + 1/2 + 1/3)/2 = 4/3. ~
h(x) = 1/ θ^ = 3/4. H(1) = 3/4. ^
At time 1/2: one death with 3 at risk. At time 5/6: one death with 3 at risk. H(1) = 1/3 + 1/3 = 2/3. ^
~
H(1) - H(1) = 2/3 - 3/4 = -1/12 = -0.0833. 5.46. B.
si
ti
ri
(ri - si)/ri
si/ri
1.1 1 20 19/20 1/20 2.3 1 10 9/10 1/10 3.2 1 10 9/10 1/10 6.0 2 8 3/4 1/4 Product Limit Estimator of S(6) = (19/20)(9/10)(9/10)(3/4) = 0.5771. Nelson-Aalen estimate of H(6) = 1/20 + 1/10 + 1/10 + 1/4 = .5. S(6) = e-0.5 = 0.6065. |0.5771 - 0.6065| = 0.0294. ^
^
^
5.47. A. 0.1 = H(t11) - H(t10) = 1/r11. ⇒ r11 = 10. ⇒ r1 = 20. ⇒ H(t2 ) = 1/20 + 1/19 = 0.1026.
5.48. B. H(tk) = 1/n + 1/(n-1) + ... + 1/(1 + n/3) = 0.95. The sum has 2n/3 terms and is approximately: (2n/3){1/n + 1/(1 + n/3)}/2 = 1/3 + 1/(3/n + 1). 1/3 + 1/(3/n + 1) ≅ 0.95 n ≅ 4.8. n has to be a multiple of 3, try n = 6. 1/6 + 1/5 + 1/4 + 1/3 = .95. Thus n = 6 and k = (2/3)6 = 4. Comment: One can try the choices, starting with C, and then moving up or down if necessary. 5.49. A.
t s r (r - s)/r s/r 6 1 12 11/12 1/12 9 1 10 9/10 1/10 11 1 15 14/15 1/15 Product Limit Estimator of q = 1 - (11/12)(9/10)(14/15) = 0.23. Nelson-Aalen estimate of H(1) = 1/12 + 1/10 + 1/15 = 0.25. q = 1 - e-0.25 = 0.2212. |0.23 - 0.2212| = 0.0088. ^
^
5.50. B. 0.21414 - 0.18472 = H(t8 ) - H(t7 ) = 1/r8 . ⇒ r8 = 34.
⇒ original sample size is r1 = 34 + 7 = 41. 5.51. D. Let w be the number of withdrawals. i ri si 1
r1
1
2
r1 - 1 - w
1
The product limit estimate of S(t2 ) is: {(r1 - 1)/r1 }{(r1 - 2 - w)/(r1 - 1 - w)} = (r1 2 - 3r1 - wr1 + 2 + w)/{r1 (r1 - 1 - w)}. The Nelson-Aalen estimate of H(t2 ) is: 1/r1 + 1/(r1 - 1 - w) = (2r1 - 1 - w)/{r1 (r1 - 1 - w)} The sum of the product limit estimate of S(t2 ) and the Nelson-Aalen estimate of H(t2 ) is: (r1 2 - r1 - wr1 + 1)/{r1 (r1 - 1 - w)} = 17/16. ⇒ r1 2 - r1 - wr1 = 16. ⇒ r1 (r1 - 1 - w) = 16.
⇒ r1 = 16 and r1 - 1 - w = 1, or r1 = 8 and r1 - 1 - w = 2. In the former case, w = 16 = r1 and S(t2 ) = 0. Therefore, we must have the latter case, r1 = 8 and w = r1 - 3 = 5.
^
5.52. D. H(t2 ) = 1/n + 1/(n-1) = 0.1144. ⇒ n = 18. Product limit estimator of S(t2 ) is: (17/18)(16/17) = 8/9 = 0.8889. Comment: Using Nelson-Aalen: S(t2 ) = e-0.1144 = 0.8919. The Nelson-Aalen estimate is approximately the same as the product limit estimator, but is always greater. So a good guess for the product limit estimator of S(t2 ) is 0.89 or maybe 0.88. 5.53. D. For a complete study, the product limit estimate is equal to the empirical survival function. S(12) = 0.9375 = 1 - 1/16. Therefore, 1 death has occurred by time 12. Therefore, the Nelson-Aalen estimate of H(12) is 1/16.
⇒ The Nelson-Aalen estimate of S(12) is: exp[-1/16] = 0.9394. Comment: The Nelson-Aalen estimate is approximately the same as the product limit estimator, but is always greater. Therefore, only choices D and E are possible. 5.54. C. 0.3726 = H(t7 ) = 1/n + 1/(n-1) + 1/(n-2) + 1/(n-3) + 1/(n-4) + 1/(n-5) + 1/(n-6) ≅ 7/(n-3).
⇒ n ≅ 21.8. One can check that n = 22 produces the stated value for H(t7 ). For a complete study, the product limit estimate is equal to the empirical survival function.
⇒ Product limit estimate of S(t7 ) is: (22 - 7)/22 = 15/22 = 0.682. Comment: Product limit estimate of S(t7 ) is: (21/22)(20/21)(19/20)(18/19)(17/18)(16/17)(15/16). 5.55. A. 1.0 = H(t3 ) = 2/n + 1/(n - 2). ⇒ n2 - 5n + 4 = 0. ⇒ n = 1 or 4. But n ≥ 3, so n = 4. For a complete study, the product limit estimate is equal to the empirical survival function.
⇒ Product limit estimate of S(t3 ) is: (4 - 3)/4 = 1/4 = 0.25. ^
^
5.56. B. 0.588 - 0.511 = H(t10) - H(t9 ) = 1/(n - 9). ⇒ n = 9 + 1/0.077 = 22. ^
^
H(t3 ) = 1/22 + 1/21 + 1/20 = 0.1431. S (t3 ) = e-0.1431 = 0.8667.
5.57. i
ti
ri
si
1 2 3 4
1 2 3 5
50 49 48 46
1 1 2 2
^
H(6) =
4
∑ si / ri = 1/50 + 1/49 + 2/48 + 2/46 = 0.1256. i=1
^
^
Comment: S(6) = exp[- H(6)] = e-.1256 = 0.882. 5.58. i
ti
1 2 3 4 5 6 7
1 2 3 5 7 10 12
ri
si
50 49 48 46 44 43 42
1 1 2 2 1 1 1
7 ^
H(12) = Σ si/ri = 1/50 + 1/49 + 2/48 + 2/46 + 1/44 + 1/43 + 1/42 = 0.1953. i=1 ^
^
S(12) = exp[- H(12)] = e-0.1953 = 0.823. ^
5.59. B. H(12) = Σ si/ri = 2/15 + 1/12 + 1/10 + 2/6 = 0.650. ^
^
S(12) = exp[- H(12)] = e-0.650 = 0.522. ^
Comment: H(12) is the sum of si/ri, over all times of death less than or equal to 12.
5.60. B. Three times are within 60 of 100: 47, 75 and 156, and thus contribute to the kernel-smoothed estimate. The height of the uniform kernel with bandwidth 60 is 1/120. The Nelson-Aalen estimate of the cumulative hazard function increases at each time of death by si/ri. Each point contributes to the kernel-smoothed estimate: (1/120)(si/ri). The kernel-smoothed estimate of the hazard rate at 100 is: (1/120)(1/18 + 1/17 + 1/16) = 0.00147. ^
Comment: One could use the given H(ti) in order to get the jumps in the cumulative hazard rate: 0.1582 - 0.1026 = 1/18, 0.2170 - 0.1582 = 1/17, 0.2795 - 0.2170 = 1/16. ^
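The kernel-smoothing step is just a weighted sum of the Nelson-Aalen jumps si/ri that fall within one bandwidth of the evaluation point. Here is a hedged Python sketch of that calculation (the dictionary of jumps is taken from the table given in the problem; the function name is my own):

```python
# Nelson-Aalen jumps s_i/r_i at the observed death times from the problem
jumps = {10: 1/20, 34: 1/19, 47: 1/18, 75: 1/17, 156: 1/16, 171: 1/15}

def uniform_kernel_hazard(x, b):
    # h(x) = sum over death times y of (s/r) * k_b(x - y), uniform kernel of height 1/(2b)
    return sum(sr / (2 * b) for y, sr in jumps.items() if abs(x - y) <= b)

print(round(uniform_kernel_hazard(100, 60), 5))   # 0.00147, matching answer (B)
```

With a triangular or other kernel, only the kernel weight changes; the jumps si/ri are the same.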
5.61. C. H(t) = Σ si/ri . At the time of the second claim: ^
23/132 = H(t2 ) = 1/r1 + 1/r2 = 1/r1 + 1/(r1 - 1). ⇒ r1 = 12. ^
H(t4 ) = 1/12 + 1/11 + 1/10 + 1/9 = 0.385. ^
^
^
^
5.62. E. S(n) = 1 - F(n) = 1 - 0.542 = 0.458. H(n) = - ln S(n) = -ln(0.458) = 0.781. ^
H(n) =
n
n
i=1
i=1
∑ si / ri = ∑ i / 100 = (n(n+1)/2)/100 = n(n+1)/200.
Setting n(n+1)/200 = 0.781 ⇒ n2 + n - 156.2 = 0 ⇒ n = 12. Alternately, try n = 10: H = (1 + 2 + ... + 10)/100 = 0.55. Too small, so try n = 11: H = 0.55 + 11/100 = 0.66. Too small, so try n = 12: H = 0.66 + 12/100 = 0.78, OK.
5.63. B. For the Exponential, θ^ = X = (11 + 22 + 22 + 22 + 36 + 51 + 69 + 69 + 69 + 92 + 92 + 120 + 161 + 161 + 230)/15 = 81.8. ^
H2 (75) = -ln S2 (75) = -ln(exp(-75/81.8)) = 75/81.8 = 0.9169. H1 (75) = Σ si/ri = 1/15 + 3/14 + 1/11 + 1/10 + 3/9 = 0.8052. ^
^
yi
si
ri
si/ri
H(yi)
11 22 36 51 69 92 120 161 230
1 3 1 1 3 2 1 2 1
15 14 11 10 9 6 4 3 1
0.0667 0.2143 0.0909 0.1000 0.3333 0.3333 0.2500 0.6667 1.0000
0.0667 0.2810 0.3719 0.4719 0.8052 1.1385 1.3885 2.0552 3.0552
^
| H2 (75) - H1 (75)| = |0.9169 - 0.8052| = 0.1117. Comment: With no censoring or truncation, for the Exponential the method of maximum likelihood is the same as the method of moments. For the Exponential, h(t) = 1/θ and H(t) = t/θ. ^
5.64. A. 39/380 = H(t2 ) = 1/n + 1/(n-1). ⇒ 39n2 - 799n + 380 = 0. ⇒ n = 20. The product limit estimator of S(t9 ) is the same as the empirical survival function: 11/20 = 0.55. Comment: It has been assumed that there are no new entries. The other root of the quadratic equation, n = 19/39 = 0.4872, is not a valid solution to this situation. 5.65. A. We need to assume there is no truncation or censoring. Loss Models uses k for the number of distinct values at which a death occurs. k = 4 means we have 4 distinct values in the data set. Thus X is either 100, 200, 300, or 400. s2 = 1. ⇒ X ≠ 200. r4 = 1. ⇒ X ≠ 400. ^
If X = 100, H(410) = 2/5 + 1/3 + 1/2 + 1 = 2.23. OK. ^
If X = 300, H(410) = 1/5 + 1/4 + 2/3 + 1 = 2.12. Not OK. Comment: A poor exam question, that has almost nothing to do with actuarial applications of ideas on the syllabus. If X = 200, then s2 = 2. If X = 400, then r4 = 2.
^
5.66. D. H1 (7000) = 3/12 + 1/9 + 1/8 + 1/7 + 2/6 + 1/4 + 1/3. ^
^
^
H2 (7000) = 1/9 + 1/8 + 1/7 + 1/4 + 1/3. | H1 (7000) - H2 (7000)| = 3/12 + 2/6 = 0.58. Comment: When we assume 2500 and 5000 are censored values, these exits do not contribute to ^
the calculation of H2 , and thus the absolute difference just involves the terms in H1 (7000) contributed by these values. ^
^
5.67. C. 0.575 = S (10) = exp[- H(10)] = exp[-(1/50 + 3/49 + 5/k +7/21)]. ⇒ k = 36.
Section 6, Variance of the Nelson-Aalen Estimator63

We are interested in the variance of the estimate of the Cumulative Hazard Rate gotten by using the Nelson-Aalen Estimator, H(yj) = Σ si/ri, with the sum taken over i = 1 to j.

Var[H(yj)] = Σ si/ri^2, summed over i = 1 to j.64

For the example with eight women:
yi    si    ri    si/ri    H(yi)     si/ri^2     Var[H(yi)]
68    1     7     0.143    0.1429    0.020408    0.020408
77    2     7     0.286    0.4286    0.040816    0.061224
79    1     5     0.200    0.6286    0.040000    0.101224
84    1     3     0.333    0.9619    0.111111    0.212336

Var[H(84)] = 1/7^2 + 2/7^2 + 1/5^2 + 1/3^2 = 0.2123.

Variance of the Estimated Survival Function:

Let y = e^-x. Then by the delta method, Var[y] = (∂y/∂x)^2 Var[x] = (-e^-x)^2 Var[x] = y^2 Var[x].
Apply this result with y = S(t) and x = H(t). Then:

Var[S(yj)] ≅ S(yj)^2 Var[H(yj)] ≅ S(yj)^2 Σ si/ri^2 = exp[-2 Σ si/ri] Σ si/ri^2, with each sum over i = 1 to j.

For the example with eight women:
yi    si    ri    si/ri    H(yi)     S(yi)    si/ri^2     Var[H(yi)]    Var[S(yi)]
68    1     7     0.143    0.1429    0.867    0.020408    0.020408      0.015336
77    2     7     0.286    0.4286    0.651    0.040816    0.061224      0.025982
79    1     5     0.200    0.6286    0.533    0.040000    0.101224      0.028795
84    1     3     0.333    0.9619    0.382    0.111111    0.212336      0.031012

S(84) = exp[-H(84)] = e^-0.9619 = 0.382.
Var[S(84)] = S(84)^2 Var[H(84)] = (0.382^2)(0.2123) = 0.0310.

63 See page 359 of Loss Models.
64 This is sometimes called the Aalen estimate of the variance of the cumulative hazard rate.
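The tables above can be reproduced with a few lines of code. The following Python sketch (an illustration of mine, not taken from Loss Models) carries H, its estimated variance, and the delta-method variance of S through the four death times:

```python
import math

# Death times y with (s_i, r_i) for the example with eight women.
data = [(68, 1, 7), (77, 2, 7), (79, 1, 5), (84, 1, 3)]

H, var_H = 0.0, 0.0
for y, s, r in data:
    H += s / r                 # Nelson-Aalen cumulative hazard
    var_H += s / r**2          # estimated variance of H
    S = math.exp(-H)
    var_S = S**2 * var_H       # delta method: Var[S] ~ S^2 Var[H]
    print(y, round(H, 4), round(var_H, 6), round(S, 3), round(var_S, 4))

# Final row printed: 84  0.9619  0.212336  0.382  0.031
```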
(Linear) Confidence Intervals:65

In the previous section, for an example with different deductibles and maximum covered losses, estimates were determined of the Cumulative Hazard Rate and Survival Rate:
yi        si    ri    si/ri    H(yi)    S(yi)
300       1     4     0.250    0.250    0.779
700       1     6     0.167    0.417    0.659
2,000     1     8     0.125    0.542    0.582
11,000    1     7     0.143    0.685    0.504
12,000    1     6     0.167    0.851    0.427
15,000    1     5     0.200    1.051    0.350
29,000    1     4     0.250    1.301    0.272
70,000    1     1     1.000    2.301    0.100

Exercise: Estimate the variance of the estimates of H and S obtained above.
[Solution:
yi        si    ri    si/ri    H(yi)     S(yi)    si/ri^2    Var[H(yi)]    Var[S(yi)]
300       1     4     0.250    0.2500    0.779    0.06250    0.06250       0.03791
700       1     6     0.167    0.4167    0.659    0.02778    0.09028       0.03923
2,000     1     8     0.125    0.5417    0.582    0.01562    0.10590       0.03584
11,000    1     7     0.143    0.6845    0.504    0.02041    0.12631       0.03213
12,000    1     6     0.167    0.8512    0.427    0.02778    0.15409       0.02808
15,000    1     5     0.200    1.0512    0.350    0.04000    0.19409       0.02371
29,000    1     4     0.250    1.3012    0.272    0.06250    0.25659       0.01901
70,000    1     1     1.000    2.3012    0.100    1.00000    1.25659       0.01260   ]

Then a 95% (linear) confidence interval for H(29000) would be: 1.3012 ± 1.96 √0.2566 = 1.30 ± 0.99 = (0.31, 2.29).
Similarly, one could get a 95% (linear) confidence interval for S(29000) as follows: 0.272 ± 1.96 √0.0190 = 0.27 ± 0.27 = (0, 0.54). However, this is not the preferred technique to use to get a confidence interval for S. Rather, one can multiply by minus one and exponentiate the 95% confidence interval for H(29000), in order to get a 95% confidence interval for S(29000): (e^-2.29, e^-0.31) = (0.10, 0.73). This latter technique has the advantage of avoiding the possibility of values of S outside the interval 0 to 1. Also, the variance of S used in the prior technique was approximated via the delta method; the latter technique avoids using this approximation. In this situation, the default is the exponentiating technique;66 use the approximate delta method only when told to.67

65 Log-Transformed Confidence Intervals will be discussed in the next section.
66 See 4, 5/05, Q.15.
67 See 4, 11/05, Q.20.
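Both constructions are mechanical. A short Python sketch, using the estimates at 29,000 from the table above (the variable names are mine; 1.960 is the standard normal 97.5th percentile):

```python
import math

H, var_H = 1.3012, 0.25659      # Nelson-Aalen estimates at 29,000
S, var_S = 0.272, 0.01901
z = 1.960                       # 95% two-sided normal quantile

lo_H, hi_H = H - z * math.sqrt(var_H), H + z * math.sqrt(var_H)
print((round(lo_H, 2), round(hi_H, 2)))                            # (0.31, 2.29) for H

# Preferred interval for S: multiply by -1 and exponentiate the interval for H.
print((round(math.exp(-hi_H), 2), round(math.exp(-lo_H), 2)))      # (0.10, 0.73)

# Delta-method (linear) interval for S, used only when the question asks for it.
print((round(S - z * math.sqrt(var_S), 2), round(S + z * math.sqrt(var_S), 2)))  # (0.00, 0.54)
```

Note that the exponentiated interval is guaranteed to stay inside (0, 1), while the linear interval for S can spill outside [0, 1].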
Kaplan-Meier Product-Limit Estimator vs. Nelson-Aalen Estimator:

For the Kaplan-Meier Product-Limit Estimator: Var[Sn(yj)] = Sn(yj)^2 Σ si/{ri(ri - si)}, summed over i = 1 to j.
For the Nelson-Aalen Estimator: Var[S(yj)] ≅ S(yj)^2 Var[H(yj)] = S(yj)^2 Σ si/ri^2, summed over i = 1 to j.

As discussed previously, the Kaplan-Meier Product-Limit Estimator and the Nelson-Aalen Estimator produce similar estimates of the Survival Function. For si small compared to ri, 1/(ri - si) ≅ 1/ri. Therefore, in general the variance of the two estimated Survival Functions should be somewhat similar. For the example with different deductibles and maximum covered losses, here is a comparison of the results obtained previously:

          Kaplan-Meier Product-Limit      Nelson-Aalen
yi        S(yi)      Var[S(yi)]           S(yi)      Var[S(yi)]
300       0.750      0.0469               0.779      0.0379
700       0.625      0.0456               0.659      0.0392
2,000     0.547      0.0402               0.582      0.0358
11,000    0.469      0.0348               0.504      0.0321
12,000    0.391      0.0292               0.427      0.0281
15,000    0.312      0.0236               0.350      0.0237
29,000    0.234      0.0179               0.272      0.0190
70,000    0.000                           0.100      0.0126
The Basis of the Formula for the Variance of the Nelson-Aalen Estimator of H:68

Define the failure ratio φi = si/ri. Assuming that the ri are given and fixed, the φi are independent. The number of deaths at time ti is approximately Poisson.69 Thus the variance of the number of deaths at ti is equal to its mean, which we can estimate by the observed value si.

Var[φi] = Var[si/ri] = Var[si]/ri^2 ≅ si/ri^2.

H(yj) = Σ si/ri = Σ φi, summed over i = 1 to j.

Var[H(yj)] = Σ Var[φi] ≅ Σ si/ri^2, summed over i = 1 to j.

68 See page 359 of Loss Models.
69 One could have instead assumed a Binomial. For small values of q, a Binomial approaches a Poisson.
A Simulation Experiment:

400 losses were simulated from a Weibull Distribution with θ = 30,000 and τ = 1/2.70
100 losses had neither a deductible nor a policy limit applied. 100 losses had a 5000 deductible applied. 100 losses had a 50,000 policy limit applied. 100 losses had a 100,000 policy limit applied. Then the Nelson-Aalen estimator was applied to the combined data.
Here is a graph of the estimated cumulative hazard rates (dots) versus that of the Weibull:71
[Figure: the Nelson-Aalen estimates of H(x) (dots) and the Weibull cumulative hazard rate (curve), for x from 0 to 350,000; H(x) runs from 0 to about 4.]

70 This Weibull has a heavier righthand tail than the Exponential, but a lighter righthand tail than the LogNormal.
71 S(x) = Exp[-√(x/30,000)]. H(x) = √(x/30,000). An overestimate of H(x) ⇔ an underestimate of S(x).
Here is a graph of the estimated cumulative hazard rates plus or minus 1.96 standard deviations versus the cumulative hazard rate of the Weibull (thick):
[Figure: the Nelson-Aalen estimate of H(x) ± 1.96 standard deviations and the Weibull cumulative hazard rate (thick curve), for x from 0 to 700,000; H(x) runs from 0 to about 8.]
For each observed uncensored value in the data set, there is an approximate 95% confidence interval for H(x) based on H(x) and Var[H(x)]. The true underlying Cumulative Hazard Rate, that of the Weibull Distribution from which the losses were simulated, remains inside the approximate 95% confidence intervals based on the data. With more data, the confidence intervals would have been narrower.
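The experiment described above is straightforward to replicate. Here is a Python sketch of my own (with an arbitrary seed, so the numbers will not match the graphs above exactly) that simulates the four blocks of losses, applies the Nelson-Aalen estimator treating the deductible as a left-truncation point and the policy limit as a censoring point, and compares the estimate and a rough 95% band to the true Weibull cumulative hazard H(x) = √(x/30,000):

```python
import numpy as np

rng = np.random.default_rng(12345)           # arbitrary seed; results will vary
theta, tau = 30_000.0, 0.5

def true_H(x):                               # Weibull cumulative hazard
    return (x / theta) ** tau

# Four blocks of 100 losses: unmodified, 5000 deductible, 50,000 limit, 100,000 limit.
losses = theta * (-np.log(rng.uniform(size=400))) ** (1 / tau)   # inverse transform
ded    = np.r_[np.zeros(100), np.full(100, 5_000.0), np.zeros(200)]
limit  = np.r_[np.full(200, np.inf), np.full(100, 50_000.0), np.full(100, 100_000.0)]

# Losses at or below their deductible never enter the data base.
keep = losses > ded
loss, d, u = losses[keep], ded[keep], limit[keep]
x    = np.minimum(loss, u)                   # observed value, censored at the limit
dead = loss < u                              # True if the loss size itself was observed

# Nelson-Aalen with left truncation at d and right censoring at u.
ys = np.sort(x[dead])
H, v, H_hat, var_H = 0.0, 0.0, [], []
for y in ys:
    r = np.sum((d < y) & (x >= y))           # risk set just before y
    H += 1.0 / r                             # one "death" at each simulated value
    v += 1.0 / r**2
    H_hat.append(H)
    var_H.append(v)

for q in (0.25, 0.50, 0.75, 0.95):           # compare at a few points
    i = int(q * (len(ys) - 1))
    half = 1.96 * var_H[i] ** 0.5
    print(f"x = {ys[i]:9.0f}   NA H = {H_hat[i]:5.2f} +/- {half:4.2f}   true H = {true_H(ys[i]):5.2f}")
```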
Problems:

Use the following information for the next six questions:
          Group A         Group B
yi        si     ri       si     ri
1         40     300      20     200
2         20     200      30     300
3         30     200      20     150
4         20     100      10     100
5         10     50       5      50
6.1 (2 points) Determine the Nelson-Aalen estimate of H(4) based on the data for Group A. (A) Less than 0.45 (B) At least 0.45, but less than 0.50 (C) At least 0.50, but less than 0.55 (D) At least 0.55, but less than 0.60 (E) At least 0.60 6.2 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 0.02 (B) 0.03 (C) 0.04 (D) 0.05 (E) 0.06 6.3 (2 points) Determine the Nelson-Aalen estimate of H(4) based on the data for Group B. (A) Less than 0.45 (B) At least 0.45, but less than 0.50 (C) At least 0.50, but less than 0.55 (D) At least 0.55, but less than 0.60 (E) At least 0.60 6.4 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 0.02 (B) 0.03 (C) 0.04 (D) 0.05 (E) 0.06 6.5 (3 points) Determine the Nelson-Aalen estimate of H(4) based on the data for both groups. (A) Less than 0.45 (B) At least 0.45, but less than 0.50 (C) At least 0.50, but less than 0.55 (D) At least 0.55, but less than 0.60 (E) At least 0.60 6.6 (2 points) Determine the standard deviation of the estimate in the previous question. (A) 0.02
(B) 0.03
(C) 0.04
(D) 0.05
(E) 0.06
6.7 (2 points) Based on the following information, use the Nelson-Aalen Estimator to estimate the survival function at 40. yi ri si 10 800 20 1000 30 600 40 350 50 200 60 100 (A) 0.52
200 150 50 60 40 30 (B) 0.54
(C) 0.56
(D) 0.58
(E) 0.60
6.8 (2 points) What is the standard deviation of the estimate in the previous question? (A) 0.015 (B) 0.016 (C) 0.017 (D) 0.018 (E) 0.019 Use the following information for the next two questions: Three hundred cancer patients were observed from the time of diagnosis until the earlier of death or 48 months from diagnosis. Deaths occurred during the study as follows: Time In Months Since Diagnosis Number Of Deaths 6 22 12 35 18 26 24 19 30 13 36 11 42 8 48 7 6.9 (3 points) Using the Nelson-Aalen Estimator, determine a 95% linear confidence interval for H(40). (A) (0.49, 0.60) (B) (0.44, 0.65) (C) (0.46, 0.66) (D) (0.45, 0.59) (E) (0.43, 0.61) 6.10 (4 points) Using the Nelson-Aalen Estimator, determine a 95% linear confidence interval for S(48). (A) (0.49, 0.60) (B) (0.44, 0.65) (C) (0.46, 0.66) (D) (0.45, 0.59) (E) (0.43, 0.61)
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 185
6.11 (2 points) For a complete data study, you are given: (i) There are no simultaneous deaths. (ii) Using the Nelson-Aalen estimator, the estimate of the cumulative hazard rate at the time of the fifth death is 0.21822. Determine the standard deviation of this estimate. (A) 0.06 (B) 0.07 (C) 0.08 (D) 0.09 (E) 0.10 6.12 (2 points) Ten individuals are observed in a complete data study. The Kaplan-Meier Product-Limit estimate of S(54) is 0.8. Determine the variance of the Nelson-Aalen estimator of H(54). A. 1/100 B. 1/50 C. 1/40 D. 181/8100 E. Can not be determined Use the following information for the next two questions: Liability insurance policies are sold with three limits: 50, 100, and 250. You observe 10 claim payments from each type of policy. Limit of 50: 3, 9, 14, 27, 39, 50, 50, 50, 50, 50. Limit of 100: 4, 8, 17, 22, 55, 60, 100, 100, 100, 100. Limit of 250: 7, 12, 20, 32, 45, 70, 90, 125, 190, 250. 6.13 (3 points) Determine the Nelson-Aalen estimate of S(75). (A) 34% (B) 36% (C) 38% (D) 40%
(E) 42%
6.14 (4 points) Determine the variance of the estimate in the previous question. (A) 0.0090 (B) 0.0095 (C) 0.0010 (D) 0.0015
(E) 0.0011
6.15 (3 points) For a mortality study with right-censored data, the cumulative hazard rate is estimated using the Nelson-Aalen estimator. You are given: (i) No deaths occur between times yi and yi+1. (ii) A 90% linear confidence interval for H(yi) is (0.28954, 0.45546). (iii) A 90% linear confidence interval for H(yi+1) is (0.34016, 0.52984). Calculate the number of deaths observed at time ti+1. (A) 4
(B) 5
(C) 6
(D) 7
(E) 8
Use the following information for the next two questions: You are given the following censored data on the time until remission of symptoms of 10 cancer patients on chemotherapy. i Time to Remission (weeks) 1 8 Remission 2 10 Remission 3 17 Censored 4 22 Remission 5 32 Censored 6 33 Remission 7 33 Censored 8 47 Remission 9 82 Remission 10 104 Censored 6.16 (2 points) Determine the Nelson-Aalen Estimator of S(104). (A) 19% (B) 21% (C) 23% (D) 25%
(E) 27%
6.17 (2 points) Determine the variance of the estimate in the previous question. (A) 0.020 (B) 0.022 (C) 0.024 (D) 0.026
(E) 0.028
Use the following information for the next two questions: For a survival study with censored and truncated data, you are given: Time (t) Number at Risk at Time t Failures at Time t 1 30 5 2 27 9 3 32 6 4 25 5 5 20 4 Use the Nelson-Aalen Estimator. 6.18 (3 points) Calculate the variance of 3 q^ 1 . (A) 0.0050
(B) 0.0056
(C) 0.0062
(D) 0.0068
(E) 0.0074
6.19 (3 points) Determine a 90% linear confidence interval for 3 q^ 2 . A. (0.303, 0.585)
B. (0.276, 0.612)
C. (0.255, 0.633)
D. (0.218, 0.670)
E. (0.188, 0.700)
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 187
Use the following information for the next two questions: You are given the following times of first claim for five randomly selected auto insurance policies observed from time t = 0: 1 2 3 4 5 You are later told that one of the five times given is actually the time of policy lapse, but you are not told which one. 6.20 (2 points) The smallest Nelson-Aalen estimate of H(4) would result if which of the given times arose from the lapsed policy? (A) 1 (B) 2 (C) 3 (D) 4 (E) 5 6.21 (3 points) The largest coefficient of variation of the Nelson-Aalen estimate of H(4) would result if which of the given times arose from the lapsed policy? (A) 1 (B) 2 (C) 3 (D) 4 (E) 5
Use the following information for the next two questions: A survival study gave (0.302, 0.876) as the symmetric linear 90% confidence interval for the cumulative hazard rate at 7, H(7). 6.22 (2 points) Using the delta method, determine a symmetric linear 90% confidence interval for S(7). (A) (0.36, 0.68) (B) (0.38, 0.70) (C) (0.40, 0.71) (D) (0.42, 0.74) (E) (0.44, 0.76) 6.23 (1 point) Without using the delta method, determine a symmetric linear 90% confidence interval for S(7). (A) (0.36, 0.73) (B) (0.38, 0.73) (C) (0.40, 0.74) (D) (0.42, 0.74) (E) (0.44, 0.75)
6.24 (3 points) A mortality study yields the following results: ti ri si 72 80 20 73 140 22 74 90 25 The Nelson-Aalen estimator is used to estimate 3 p 71. Estimate the standard deviation of the estimate of 3 p 71. A. 0.04
B. 0.05
C. 0.06
D. 0.07
E. 0.08
Use the following information for the next two questions: The results of a study with a calendar year 2003 observation period are as follows: Individual Birthdate Death A April 1, 1932 June 1, 2003 B July 1, 1932 C October 1, 1932 March 1, 2003 D January 1, 1933 E April 1, 1933 F July 1, 1933 October 1, 2003 G September 1, 1933 All individuals were in the study on January 1, 2003. No one left the study during the observation period other than by death. 6.25 (2 points) Using the Nelson-Aalen estimator, estimate q70. (A) 0.43
(B) 0.44
(C) 0.45
(D) 0.46
(E) 0.47
6.26 (2 points) What is the standard deviation of the estimate in the previous question? (A) 0.19 (B) 0.20 (C) 0.21 (D) 0.22 (E) 0.23
6.27 (2 points) A serious accident occurred at the Ignobyl nuclear power plant. 12 workers were exposed to large doses of radiation. Deaths were observed at the following times (in years) after the accident: 2, 3, 3, 5, 6, 7, 9, 9. 4 of these workers remained alive at 10 years after the accident. Determine the Nelson-Aalen estimate of S(10). (A) 0.32 (B) 0.34 (C) 0.36 (D) 0.38 (E) 0.40 6.28 (2 points) What is the standard deviation of the estimate in the previous question? (A) 0.11 (B) 0.13 (C) 0.15 (D) 0.17 (E) 0.19
6.29 (2 points) You are given: Individual A B C D E F G H
Entry 0 0 0 0 5 10 15 20
Age at Withdrawal 26 33 52 71
Death 45 60 45 18 ^
Using the Nelson-Aalen estimator, determine the variance of H(60). (A) 0.27
(B) 0.29
(C) 0.31
(D) 0.33
(E) 0.35
6.30 (4, 5/00, Q.19) (2.5 points) For a mortality study with right-censored data, the cumulative hazard rate is estimated using the Nelson-Aalen estimator. You are given: (i) No deaths occur between times ti and ti+1. (ii) A 95% linear confidence interval for H(ti) is (0.07125, 0.22875). (iii) A 95% linear confidence interval for H(ti+1) is (0.15607, 0.38635). Calculate the number of deaths observed at time ti+1. (A) 4
(B) 5
(C) 6
(D) 7
(E) 8
6.31 (4, 11/00, Q.20) (2.5 points) Fifteen cancer patients were observed from the time of diagnosis until the earlier of death or 36 months from diagnosis. Deaths occurred during the study as follows: Time In Months Since Diagnosis Number Of Deaths 15 2 20 3 24 2 30 d 34 2 36 1 ^
The Nelson-Aalen estimate H(35) is 1.5641. ^
Calculate the estimate of the variance of H(35). (A) Less than 0.10 (B) At least 0.10, but less than 0.15 (C) At least 0.15, but less than 0.20 (D) At least 0.20, but less than 0.25 (E) At least 0.25 6.32 (4, 5/01, Q.14) (2.5 points) For a mortality study with right-censored data, you are given: ti si ri 1 15 100 8 20 65 17 13 40 25 31 31 Calculate the standard deviation of the Nelson-Aalen estimator of the cumulative hazard function at time 20. (A) Less than 0.05 (B) At least 0.05, but less than 0.10 (C) At least 0.10, but less than 0.15 (D) At least 0.15, but less than 0.20 (E) At least 0.20
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 191
6.33 (4, 5/05, Q.15 & 2009 Sample Q.185) (2.9 points) Twelve policyholders were monitored from the starting date of the policy to the time of first claim. The observed data are as follows: Time of First Claim 1 2 3 4 5 6 7 Number of Claims 2 1 2 2 1 2 2 Using the Nelson-Aalen estimator, calculate the 95% linear confidence interval for the cumulative hazard rate function H(4.5). (A) (0.189, 1.361) (B) (0.206, 1.545) (C) (0.248, 1.402) (D) (0.283, 1.266) (E) (0.314, 1.437) 6.34 (4, 11/05, Q.17 & 2009 Sample Q.228) (2.9 points) For a survival study, you are given: (i) Deaths occurred at times y1 < y2 < ... < y9 . (ii) The Nelson-Aalen estimates of the cumulative hazard function at y3 and y4 are: ^
^
H(y3 ) = 0.4128 and H(y4 ) = 0.5691. (iii) The estimated variances of the estimates in (ii) are: ^ ^ ^ [H ^ [H Var (y3 )] = 0.009565 and Var (y4 )] = 0.014448.
Determine the number of deaths at y4 . (A) 2
(B) 3
(C) 4
(D) 5
(E) 6
6.35 (4, 11/05, Q.20 & 2009 Sample Q.231) (2.9 points) A survival study gave (0.283, 1.267) as the symmetric linear 95% confidence interval for H(5). Using the delta method, determine the symmetric linear 95% confidence interval for S(5). (A) (0.23, 0.69) (B) (0.26, 0.72) (C) (0.28, 0.75) (D) (0.31, 0.73) (E) (0.32, 0.80)
6.36 (4, 5/07, Q.33) (2.5 points) You are given: (i) Eight people join an exercise program on the same day. They stay in the program until they reach their weight loss goal or switch to a diet program. (ii) Experience for each of the eight members is shown below: Time at Which… Member
Reach Weight Loss Goal
Switch to Diet Program
1
4
2
8
3
8
4
12
5
12
6
12
7
22
8
36
(iii) The variable of interest is time to reach weight loss goal. Using the Nelson-Aalen estimator, calculate the upper limit of the symmetric 90% linear confidence interval for the cumulative hazard rate function H(12). (A) 0.85
(B) 0.92
(C) 0.95
(D) 1.06
(E) 1.24
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Solutions to Problems:
^
6.1. D. & 6.2. E. H(4) =
4
∑ si / ri = 0.583. i=1
ti
si
ri
si/ri
H(ti)
si/ri^2
Var[H(ti)]
1 2 3 4
40 20 30 20
300 200 200 100
0.133 0.100 0.150 0.200
0.1333 0.2333 0.3833 0.5833
0.000444 0.000500 0.000750 0.002000
0.000444 0.000944 0.001694 0.003694
^ ^
V [ H(4)] =
4
∑ si / ri2 = 0.003694. i=1
The standard deviation of the estimate of H(4) is: 0.003694 = 0.0608. Comment: An approximate 95% linear confidence interval for H(4) is: 0.58 ± 0.12. 6.3. A. & 6.4. D. For Group B: ti
si
ri
si/ri
H(ti)
si/ri^2
Var[H(ti)]
1 2 3 4
20 30 20 10
200 300 150 100
0.100 0.100 0.133 0.100
0.1000 0.2000 0.3333 0.4333
0.000500 0.000333 0.000889 0.001000
0.000500 0.000833 0.001722 0.002722
4
V [ H(4)] = Σ si/ri2 = 0.002722. ^ ^
i =1
The standard deviation of the estimate of H(4) is:
0.002722 = 0.0522.
6.5. C. & 6.6. C. For Group A plus Group B: ti
si
ri
si/ri
H(ti)
si/ri^2
Var[H(ti)]
1 2 3 4
60 50 50 30
500 500 350 200
0.120 0.100 0.143 0.150
0.1200 0.2200 0.3629 0.5129
0.000240 0.000200 0.000408 0.000750
0.000240 0.000440 0.000848 0.001598
^ ^
V [ H(4)] =
4
∑ si / ri2 = 0.001598. i=1
The standard deviation of the estimate of H(4) is:
0.001598 = 0.0400.
Page 193
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen, ^
6.7. A. & 6.8. C. H(4) =
4
∑ si / ri = 0.6548.
HCM 10/16/12,
Page 194
^
S(4) = e-0.6548 = 0.520.
i=1
ti
si
ri
si/ri
H(ti)
S(ti)
si/ri^2
Var[H(ti)]
Var[S(ti)]
1 2 3 4
200 150 50 60
800 1000 600 350
0.250 0.150 0.083 0.171
0.2500 0.4000 0.4833 0.6548
0.779 0.670 0.617 0.520
0.000313 0.000150 0.000139 0.000490
0.000313 0.000463 0.000601 0.001091
0.000190 0.000208 0.000229 0.000295
4
^ ^
V [ H(4)] =
∑ si / ri2 = .001091.
^ ^
^
^ ^
V [ S(4)] = S(4)2 V [ H(4)] = .000295.
0.000295 = 0.0172.
i=1
Comment: Let y = e-x. Then by the delta method, Var[y] ≅ (∂y/∂x)2 Var[x] = (-e-x)2 Var[x] = y 2 Var[x]. Apply this with y = S(t) and x = H(t). 6.9. E. & 6.10. A. A 95% linear confidence interval for H(40): .5189 ± 1.96
0.002194 = 0.52 ± 0.09.
ti
si
ri
si/ri
H(ti)
si/ri^2
Var[H(ti)]
6 12 18 24 30 36 42 48
22 35 26 19 13 11 8 7
300 278 243 217 198 185 174 166
0.073 0.126 0.107 0.088 0.066 0.059 0.046 0.042
0.0733 0.1992 0.3062 0.3938 0.4594 0.5189 0.5649 0.6070
0.000244 0.000453 0.000440 0.000403 0.000332 0.000321 0.000264 0.000254
0.000244 0.000697 0.001138 0.001541 0.001873 0.002194 0.002458 0.002712
^
^
S(48) = exp[- H(48)] = e-0.6070 = 0.545. ^ ^
^
^ ^
V [ S(48)] = S(48)2 V [ H(48)] = 0.5452 (0.002712) = 0.0008055. A 95% linear confidence interval for S(48): 0.545 ± 1.96
0.0008055 = (0.49, 0.60).
Alternately, a 95% linear confidence interval for H(48) is: . 0.6070 ± 1.96 0.002712 = (0.505, 0.709). Multiplying by minus one and exponentiating, a 95% confidence interval for S(48) is: (e-0.709, e-0.505) = (0.49, 0.60). ^
6.11. E. H(y5 ) = 1/n + 1/(n-1) + 1/(n-2) + 1/(n-3) + 1/(n-4) ≅ 5/(n-2). 5/(n-2) ≅ .21822. ⇒ n ≅ 24.9. Try n = 25, then 1/25 + 1/24 + 1/23 + 1/22 + 1/21 = 0.21822. ^
Var[ H(y5 )] = 1/252 + 1/242 + 1/232 + 1/222 + 1/212 = 0.009560.
0.009560 = 0.0978.
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 195
6.12. E. S(54) = 8/10. ⇒ There were a total of two deaths on or before time 54. ^
If the first two deaths were not simultaneous, then Var[ H(54)] = 1/102 + 1/92 = 181/8100. ^
If the first two deaths were simultaneous, then Var[ H(54)] = 2/102 = 1/50. Since it is not specified, we can not determine which applies. 6.13. D. and 6.14. B. yi
si
ri
si/ri
H(ti)
S(ti)
si/ri^2
Var[H(ti)]
Var[S(ti)]
3 4 7 8 9 12 14 17 20 22 27 32 39 45 55 60 70 90 125 190
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
30 29 28 27 26 25 24 23 22 21 20 19 18 17 11 10 9 8 3 2
0.033 0.034 0.036 0.037 0.038 0.040 0.042 0.043 0.045 0.048 0.050 0.053 0.056 0.059 0.091 0.100 0.111 0.125 0.333 0.500
0.0333 0.0678 0.1035 0.1406 0.1790 0.2190 0.2607 0.3042 0.3496 0.3972 0.4472 0.4999 0.5554 0.6143 0.7052 0.8052 0.9163 1.0413 1.3746 1.8746
0.9672 0.9344 0.9016 0.8689 0.8361 0.8033 0.7705 0.7377 0.7049 0.6722 0.6394 0.6066 0.5738 0.5410 0.4940 0.4470 0.4000 0.3530 0.2529 0.1534
0.001111 0.001189 0.001276 0.001372 0.001479 0.001600 0.001736 0.001890 0.002066 0.002268 0.002500 0.002770 0.003086 0.003460 0.008264 0.010000 0.012346 0.015625 0.111111 0.250000
0.001111 0.002300 0.003576 0.004947 0.006427 0.008027 0.009763 0.011653 0.013719 0.015987 0.018487 0.021257 0.024343 0.027804 0.036068 0.046068 0.058414 0.074039 0.185150 0.435150
0.001039 0.002008 0.002907 0.003735 0.004492 0.005180 0.005796 0.006342 0.006818 0.007223 0.007558 0.007822 0.008016 0.008139 0.008803 0.009205 0.009346 0.009226 0.011845 0.010242
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 196
^
6.15. B. H(yi) is the center of the interval (0.28954, 0.45546) or 0.37250. ^
H(yi+1) is the center of the interval (0.34016, 0.52984) or 0.43500. i+1 ^
^
H(yi+1) = Σ sj/rj = H(yi) + si+1/ri+1. ⇒ si+1/ri+1 = 0.43500 - 0.37250 = 0.0625. The interval (0.28954, 0.45546) is: 0.37250 ± 0.08296. ^
^
Since this is a 90% confidence interval, 1.645 StdDev[ H(yi)] = 0.08296 ⇒ Var[ H(yi)] = 0.002543. The interval (0.34016, 0.52984) is: 0.43500 ± 0.09484. ^
^
This is a 90% confidence interval, 1.645 StdDev[ H(yi+1)] = 0.09484. ⇒ Var[ H(yi+1)] = 0.003324. i+1 ^
^
Var[ H(yi+1)] = Σ sj/rj2 = Var[ H(yi)] + si+1/ri+12 . ⇒ si+1/ri+12 = 0.003324 - 0.002543 = 0.000781. Solving two equations in two unknowns: ri+1 = 80 and si+1 = 5. Comment: Similar to 4, 5/00, Q.19. 6.16. D. & 6.17. E.
6
H(104) = Σ si/ri = 1.3873. S(104) = e-1.3873 = 0.250. ^
^
i =1
ti
si
ri
si/ri
H(ti)
S(ti)
si/ri^2
Var[H(ti)]
Var[S(ti)]
8 10 22 33 47 82
1 1 1 1 1 1
10 9 7 5 3 2
0.100 0.111 0.143 0.200 0.333 0.500
0.1000 0.2111 0.3540 0.5540 0.8873 1.3873
0.905 0.810 0.702 0.575 0.412 0.250
0.010000 0.012346 0.020408 0.040000 0.111111 0.250000
0.010000 0.022346 0.042754 0.082754 0.193865 0.443865
0.008187 0.014650 0.021063 0.027329 0.032870 0.027686
^ ^
V [ H(104)] =
6
∑ si / ri2 = 0.4439. i=1
^ ^
^
^ ^
V [ S(104)] = S(104)2 V [ H(104)] = 0.0277.
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 197
6.18. C. Since 3 p 1 is conditional on survival past one, perform all of the calculations beyond duration one. 3 p 1 = S(4)/S(1) = 0.4863. t
r
1 2 3 4 5
s
27 32 25 20
9 6 5 4
s/r
H(t) - H(1)
S(t)/S(1)
0.3333 0.1875 0.2000 0.2000
0 0.3333 0.5208 0.7208 0.9208
1.0000 0.7165 0.5940 0.4863 0.3982
i=4
^
q^
V [3 1 ] =
^
p^
V [3 1 ] = 3 p 1 2
∑ si / ri2 = 0.48632 {9/272 + 6/322 + 5/252} = 0.00620. i=2
Comment: Data taken from 4, 11/03, Q.21. 6.19. A. Since 3 q2 is conditional on survival to time two, perform all of the calculations beyond duration two. 3 p 2 = S(5)/S(2) = 0.5557. 3 q2 = 1 - 0.5557 = 0.4443. t
r
s
s/r
H(t) - H(2)
S(t)/S(2)
2 3 4 5
32 25 20
6 5 4
0.1875 0.2000 0.2000
0 0.1875 0.3875 0.5875
1.0000 0.8290 0.6788 0.5557
i=5
^
q^
V [3 2 ] = 3 p 2 2
∑ si / ri2 = 0.55572 {6/322 + 5/252 + 4/202} = 0.00737. i=3
90% confidence interval for 3 q^ 2 is: 0.4443 ± (1.645) 0.00737 = 0.444 ± 0.141 = (0.303, 0.585). Comment: S(5)/S(2) = {S(5)/S(1)}/{S(2)/S(1)} = 0.3982/0.7165 = 0.5557.
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 198
6.20 D. & 6.21. C. If the observation at 5 is censored, H(4) = 1/5 + 1/4 + 1/3 + 1/2 = 1.283. If the observation at 4 is censored, H(4) = 1/5 + 1/4 + 1/3 = .783. If the observation at 3 is censored, H(4) = 1/5 + 1/4 + 1/2 = .95. If the observation at 2 is censored, H(4) = 1/5 + 1/3 + 1/2 = 1.033. If the observation at 1 is censored, H(4) = 1/4 + 1/3 + 1/2 = 1.083. The smallest H(4) occurs if the observation at 4 is censored. If the observation at 5 is censored, Var[H(4))] = 1/52 + 1/42 + 1/32 + 1/22 = 0.4636. CV =
0.4636 / 1.283 = 0.531.
If the observation at 4 is censored, Var[H(4))] = 1/52 + 1/42 + 1/32 = 0.2136. CV =
0.2136 / 0.783 = 0.590.
If the observation at 3 is censored, Var[H(4))] = 1/52 + 1/42 + 1/22 = 0.3525. CV =
0.3525 / 0.95 = 0.625.
If the observation at 2 is censored, Var[H(4))] = 1/52 + 1/32 + 1/22 = 0.4011 CV =
0.4011 / 1.033 = 0.613.
If the observation at 1 is censored, Var[H(4))] = 1/42 + 1/32 + 1/22 = 0.4236. CV =
0.4236 / 1.083 = 0.601.
The largest CV occurs if the observation at 3 is censored. 6.22. C. The center of the confidence interval is the point estimate of H: (0.302 + 0.876)/2 = 0.589. 1.645
Var[H] = 0.876 - 0.589. Var[H] = 0.03044.
S = Exp[-H] = Exp[-0.589] = 0.5549. ∂S/∂H = -Exp[-H]. Var[S] ≅ (∂S/∂H)2 Var[H] = Exp[-2H] Var[H] = Exp[-(2)(0.589)] 0.03044 = 0.009372. Symmetric linear 90% confidence interval for S: . 0.5549 ± 1.645 0.009372 = 0.5549 ± 0.1593 = (0.3956, 0.7142). Comment: Similar to 4, 11/05, Q.20. 6.23. D. If as is more commonly done, one instead exponentiated the confidence interval for H, one gets: (e-0.876, e-0.302) = (0.4164, 0.7393).
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 199
6.24. A. In order to calculate p71, blank out 71 and prior rows ti
ri
71 72 73 74
80 140 90
si/ri2
si
si /ri
20 22 25
0.2500 0.1571 0.2778
0.2500 + 0.1571 + 0.2778 = 0.6849.
0.003125 0.001122 0.003086
Estimate of 3 p 71 = e-0.6849 = 0.5041.
Variance of the estimate of 3 p 71 is: (0.50412 )(0.003125 + 0.001122 + 0.003086) = 0.001863. Standard Deviation of the estimate of 3 p 71 is: 0.001863 = 0.0432. 6.25. B. Individual A B C D E F G yi
Birthdate April 1, 1932 July 1, 1932 Oct. 1, 1932 Jan. 1, 1933 April 1, 1933 July 1, 1933 Sept. 1, 1933 ri
70 & 3 months 70 & 5 months
4 3
Death June 1, 2003 March 1, 2003
Oct. 1, 2003 si
si/ri
1 1
1/4 1/3
Age on 1/1/03 70 & 9 months 70 & 6 months 70 & 3 months 70 69 & 9 months 69 & 6 months 69 & 4 months H(yi) - H(70)
Age at Death or the End of 2003 71 years and 2 months 71 years and 6 months 70 years and 5 months 71 years 70 years and 9 months 70 years and 3 months 70 years and 4 months
1/4 7/12
q70 = 1 - S(71)/S(70) = 1 - e-7/12 = 0.4420. Comment: At 70 & 3 months the risk set is: D to G. At 70 & 5 months the risk set is: C to E. The death of A at age 71 years & 2 months does not enter into the calculation of q70. 6.26. E. Var[H(71) - H(70)] = 1/42 + 1/32 = .1736. Var[q70] = Var[p70] = (p702 )Var[H(71) - H(70)] = (.55802 )(.1736) = .05405. Standard Deviation of the estimate of q70 is: 0.05405 = 0.2325.
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 200
6.27. D. & 6.28. B. H(10) = Σ si/ri = 0.9775. S(10) = e-0.9775 = 0.3763. ^
^
ti
si
ri
si/ri
H(ti)
S(ti)
si/ri^2
Var[H(ti)]
Var[S(ti)]
2 3 5 6 7 9
1 2 1 1 1 2
12 11 9 8 7 6
0.0833 0.1818 0.1111 0.1250 0.1429 0.3333
0.0833 0.2652 0.3763 0.5013 0.6441 0.9775
0.9200 0.7671 0.6864 0.6058 0.5251 0.3763
0.006944 0.016529 0.012346 0.015625 0.020408 0.055556
0.006944 0.023473 0.035819 0.051444 0.071852 0.127408
0.005878 0.013812 0.016877 0.018877 0.019814 0.018038
V [ H(10)] = Σ si/ri2 = 0.127408. V [ S(10)] = S(10)2 V [ H(10)] = 0.018038. ^ ^
^ ^
6.29. E.
xi
si
ri
si/ri2
18 45 60
1 2 1
7 5 2
1/49 2/25 1/4
^
^ ^
0.018038 = 0.1343.
^
Var[ H(60)] = 1/49 + 2/25 + 1/4 = 0.350. ^
Comment: H(60) = 1/7 + 2/5 + 1/2 = 1.043. Risk set at 18 is: A to G. Risk set at 45 is: C to F, H. Risk set at 60 is: D, H. ^
6.30. E. H(ti) is the center of the interval (0.07125, 0.22875) or 0.1500. ^
H(ti+1) is the center of the interval (0.15607, 0.38635) or 0.2712. i+1 ^
^
H(ti+1) = Σ sj/rj = H(ti) + si+1/ri+1. ⇒ si+1/ri+1 = 0.2712 - 0.1500 = 0.1212. The interval (0.07125, 0.22875) is 0.1500 ± 0.07875. ^
^
Since this is a 95% confidence interval, 1.96 StdDev[ H(ti)] = 0.07875. ⇒ Var[ H(ti)] = 0.00161. The interval (0.15607, 0.38635) is 0.2712 ± 0.1152. ^
^
This is a 95% confidence interval ⇒ 1.96 StdDev[ H(ti+1)] = 0.1152. ⇒ Var[ H(ti+1)] = 0.00345. i+1 ^
^
Var[ H(ti+1)] = Σ sj/rj2 = Var[ H(ti)] + si+1/ri+12 . ⇒ si+1/ri+12 = 0.00345 - 0.00161 = 0.00184. Solving two equations in two unknowns: ri+1 = 66 and si+1 = 8.
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 201
^
6.31. D. H(35) = Σ si/ri = 2/15 + 3/13 + 2/10 + d/8 + 2/(8 - d) = 1.5641. ⇒ d/8 + 2/(8 -d) = 1. ^
⇒ d = 4. Var[ H(35)] = Σ si/ri2 = 2/152 + 3/132 + 2/102 + 4/82 + 2/42 = 0.234. ^
6.32. C. Var[ H(20)] = Σ si/ri2 = 15/1002 + 20/652 + 13/402 = .01436.
0.01436 = 0.120.
^
Comment: H(20) = Σ si/ri = 15/100 + 20/65 + 13/40 = 0.783. 4
^
6.33. A.
H(4.5) =
∑ si / ri = 0.7746. i=1
ti
si
ri
si/ri
H(ti)
si/ri^2
Var[H(ti)]
1 2 3 4
2 1 2 2
12 10 9 7
0.1667 0.1000 0.2222 0.2857
0.1667 0.2667 0.4889 0.7746
0.013889 0.010000 0.024691 0.040816
0.013889 0.023889 0.048580 0.089397
^ ^
V [ H(4.5)] =
4
∑ si / ri2 = 0.089397. i=1
^
The standard deviation of H(4.5) is:
0.089397 = 0.2990.
A 95% confidence interval for H(4.5) is: 0.7746 ± (1.960)(0.2990) = (0.1886, 1.3606). ^
^
6.34. D. s4 /r4 = H(y4 ) - H(y3 ) = 0.5691 - 0.4128 = 0.1563. ^ ^ ^ [H ^ [H s4 /r4 2 = Var (y4 )] - Var (y3 )] = 0.014448 - 0.009565 = 0.004883.
Dividing the first equation by the second equation: r4 = 0.1563/0.004883 = 32. ⇒ s4 = 5.
2013-4-7,
Survival Analysis §6 Variance of Nelson-Aalen,
HCM 10/16/12,
Page 202
6.35. A. The center of the confidence interval is the point estimate of H: (0.283 + 1.267)/2 = 0.775. 1.960
^ ] = 1.267 - 0.775. ⇒ Var[H
S = Exp[-H] = Exp[-0.775] = 0.4607.
^ ] = 0.06301. Var[H
∂S = -Exp[-H]. ∂H
Using the delta method: ∂S 2 Var[S] ≅ ( ) Var[H] = Exp[-2H] Var[H] = Exp[-(2)(.775)] 0.06301 = 0 .013374. ∂H Symmetric linear 95% confidence interval for S: 0.4607 ± 1.960 0.013374 = 0.4607 ± 0.2267 = (0.2340, 0.6874). Comment: If as is more commonly done, one instead exponentiated the confidence interval for H, one gets: (e-1.267, e-0.283) = (0.2817, 0.7535), which is choice C. However, here we are told to use the delta method, so we do so. 6.36. D. For example, the risk set at time 8 consists of members 2 to 8, and is of size 7. xi
si
ri
si/ri
H(ti)
si/ri^2
Var[H(ti)]
8 12 22 36
1 2 1 1
7 5 2 1
0.143 0.400 0.500 1.000
0.1429 0.5429 1.0429 2.0429
0.020408 0.080000 0.250000 1.000000
0.020408 0.100408 0.350408 1.350408
^
^
H(12) = 1/7 + 2/5 = 0.5429. Var[ H(12)] = 1/72 + 2/52 = 0.100408. A symmetric 90% linear confidence for H(12) is: 0.5429 ± 1.645
0.100408 = 0.5429 ± 0.5213 = 0.0216 to 1.064.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 203
Section 7, Log-Transformed Confidence Intervals72 In addition to the usual linear confidence intervals, log-transformed confidence intervals can also be obtained. Before discussing log-transformed intervals, let us review linear intervals. Linear Confidence Intervals for the Survival Function: One could use the previously discussed estimates and their variances in order to get linear confidence intervals. In a previous section, for the example with 10 losses, using the Kaplan-Meier Product-Limit Estimator, the following was obtained: yi
si
ri
(ri-si)/ri
300 700 2,000 11,000 12,000 15,000 29,000 70,000
1 1 1 1 1 1 1 1
4 6 8 7 6 5 4 1
0.750 0.833 0.875 0.857 0.833 0.800 0.750 0.000
S(yi) 1 0.750 0.625 0.547 0.469 0.391 0.312 0.234 0.000
si/(ri(ri-si))
Cum. Sum
Var[S(yi)]
0.08333 0.03333 0.01786 0.02381 0.03333 0.05000 0.08333
0.08333 0.11667 0.13452 0.15833 0.19167 0.24167 0.32500
0.0469 0.0456 0.0402 0.0348 0.0292 0.0236 0.0179
Then for example, a 95% linear confidence interval for S(2000) is: 0.547 ± 1.96 0.0402 = 0.547 ± 0.393 = (0.154, 0.940).
72
See pages 358-359 of Loss Models.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 204 Here is a graph of 95% linear confidence intervals for the Survival Function: Prob. 1.2
1
0.8
0.6
0.4
0.2
Size 10000
20000
30000
40000
50000
60000
70000
Exercise: For this example, determine a 95% linear confidence interval for S(29,000). [Solution: 0.234 ± 1.96 0.0179 = 0.234 ± 0.262 = (-0.028, 0.496). Comment: We know the survival function is between 0 and 1. It can not be negative!] When we have limited data as in this example, it is not uncommon for linear confidence intervals to extend outside of the interval [0, 1]. One way to avoid this problem is to use log-transformed confidence intervals. Log-Transformed Confidence Intervals for the Survival Function: Assume we have used the Kaplan-Meier Product-Limit Estimator and we want a log-transformed confidence interval for the Survival Function to cover probability P. Let y be such that Φ(y) = (1+P)/2.73 ^ ] y Var[S n Let U = exp Sn lnSn
[
]
.
Then the log-transformed confidence interval is: (Sn ( t )1 / U, Sn ( t )U ). 73
See “Mahlerʼs Guide to Classical Credibility.” If P = 95%, then y = 1.960. If P = 90%, then y = 1.645.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 205 For example, to get a 95% log-transformed confidence interval for S(2000): U = exp[
1.96 0.0402 ] = e-1.1908 = 0.304. 0.547 ln0.547
(0.5471/0.304, 0.5470.304 ) = (0.137, 0.832). Here is a graph of 95% log-transformed confidence intervals for the Survival Function: Prob.
0.8
0.6
0.4
0.2
10000
20000
30000
40000
50000
60000
Size 70000
Exercise: Determine a 95% log-transformed confidence interval for S(29,000). [Solution: U = exp[
1.96 0.0179 ] = e-0.7716 = 0.462. 0.234 ln0.234
(0.2341/0.462, 0.2340.462 ) = (0.043, 0.511). ] Unlike a linear confidence interval, a log-transformed confidence interval for the survival function is always between 0 and 1.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 206 Basis of the Formula for the Log-Transformed Confidence Interval for the Survival Function: The delta method allows one to estimate the variance of functions:74 Var[g(y)] ≅ (
∂g 2 ) Var[y]. ∂y
Exercise: Use the delta method in order to estimate the variance of the cumulative hazard rate H in terms of that of the survival function S. ∂H [Solution: H = -lnS. = -1/S. Var[H] ≅ (-1/S)2 Var[S] = Var[S]/S2 .] ∂S The log-transformed confidence interval comes from constructing an approximate linear confidence interval for ln(H). Exercise: Use the delta method in order to estimate the variance of lnH in terms of that of H. ∂ lnH [Solution: = 1/H. Var[ln(H)] ≅ Var[H]/H2 .] ∂H Therefore, Var[ln(H)] ≅ Var[H]/H2 ≅ Var[S]/(H2 S 2 ) = Var[S] / {(lnS)2 S 2 }. Therefore, an approximate linear confidence interval for ln(H) to cover probability P is: ln(H) ± y
Var[ln(H)] ≅ ln(H) ± y
Where we have let U = exp[y
Var[S] / {S lnS} = ln(H) ± ln(U).
Var[S] / {S lnS}].
Exponentiating, gives an approximate confidence interval for H to cover probability P: exp[ln(H) ± ln(U)] = (exp[ln(H) + ln(U)], exp[ln(H) - ln(U)] = (HU, H/U). Exponentiating with a minus sign, gives an approximate confidence interval for S: (exp[-H/U], exp[-HU]) = (exp[-H]1/U, exp[-H]U) = (S1/U, SU). This is the desired log-transformed confidence interval for S. Note that we can go from a confidence interval for lnH to a confidence interval for H because ex is a one-to-one monotonic function. We can go from a confidence interval for H to a confidence interval for S because e-x is a one-to-one monotonic function. 74
See “Mahlerʼs Guide to Fitting Loss Distributions.”
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 207 Log-Transformed Confidence Intervals for the Cumulative Hazard Rate: Assume we have used the Nelson-Aalen Estimator and we want a log-transformed confidence interval for H to cover probability P. Let y be such that Φ(y) = (1+P)/2.
[
Let U = exp −
^ (t) ] y Var[ H
]
.75
H(t) Then the log-transformed confidence interval is: (H(t)U, H(t)/U).76 77 In a previous section, for the example with eight women, using the Nelson-Aalen Estimator, the following was obtained: yi
si
ri
si/ri
H(yi)
S(yi)
si/ri^2
Var[H(yi)]
Var[S(yi)]
68 77 79 84
1 2 1 1
7 7 5 3
0.143 0.286 0.200 0.333
0.1429 0.4286 0.6286 0.9619
0.8669 0.6514 0.5334 0.3822
0.020408 0.040816 0.040000 0.111111
0.020408 0.061224 0.101224 0.212336
0.015336 0.025982 0.028795 0.031012
Thus in order to get a 95% log-transformed confidence interval for H(77): U = exp[-1.96 0.061224 / 0.4286] = e-1.1315 = 0.3225. (H(77)U, H(77)/U) = ((0.4286)(0.3225), 0.4286/0.3225) = (0.138, 1.329). Note that since we have very little data, the confidence interval is wide. Exercise: Determine a 95% log-transformed confidence interval for H(84). [Solution: U = exp[-1.96 0.212336 / 0.9619] = e-0.9389 = 0.391. (H(84)U, H(84)/U) = ((0.9619)(0.391), 0.9619/0.391) = (0.376, 2.46).] By exponentiating with a minus sign, we get a corresponding 95% log-transformed confidence interval for S(84): (e-2.46, e-0.376) = (0.085, 0.687).78 79 Alternately, S(84) = exp[-H(84)] = e-0.9619 = 0.3822. As discussed previously, Var[S] ≅ S2 Var[H]. Var[S(84)] ≅ (0.38222 )(0.212336) = 0.031012. 75
Note that it is arbitrary whether or not one has the minus sign in the exponential; including the minus sign results in a value of U less than 1. If one does not include the minus sign, then what I have called U would be 1/U. The resulting log-transformed confidence interval would be the same. While the formula for “U” here is different than that given previously when getting a log-transformed confidence interval for S, the two formulas will give the same result by substituting Var[S] ≅ S2 Var[H], as discussed below. 76 The log-transformed confidence interval for H always contains only positive values. Unlike the survival function, values of H can be greater than 1. 77 The point estimate of H is the geometric average of the endpoints of the log-transformed confidence interval. 78 Since the values of H are positive, the log-transformed confidence interval for S is always between 0 and 1. 79 Rather than use the delta method, we have first gotten a confidence interval for H(84). Then we make use of the fact that exponentiating is a monotone function. For any monotone function g(x), Prob[a ≤ x ≤ b] = Prob[g(a) ≤ x ≤ g(b)]. Therefore, Prob[a ≤ H(84) ≤ b] = Prob[-b ≤ -H(84) ≤ -a] = Prob[exp[-b] ≤ exp[-H(84] = S(84) ≤ exp[-a]].
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 208 As discussed previously with respect to log-transformed confidence intervals for S, take Var[S] / {S lnS}] = exp[1.96
U = exp[y
0.031012 / {(0.3822)(-0.9619)}] = e-0.9389 = 0.391.
(S1/U, SU) = (0.38221/0.391, 0.38220.391) = (0.085, 0.687). The same result as above. Basis of the Formula for the Log-Transformed Confidence Interval for H: Applying the delta method to the function ln(x), Var[ln(H)] ≅ Var[H]/H2 . Therefore, an approximate linear confidence interval for ln(H) to cover probability P is: ln(H) ± y
Var[ln(H)] ≅ ln(H) ± y
Where we have let U = exp[-y
Var[H] / H = ln(H) ± ln(U). Var[H(t)] / H(t)].
Exponentiating, gives an approximate confidence interval for H to cover probability P:80 exp[ln(H) ± ln(U)] = (exp[ln(H) + ln(U)], exp[ln(H) - ln(U)] ) = (HU, H/U). Log-Transformed versus Linear Confidence Intervals: Often the log-transformed and linear confidence intervals will be similar. In either case, we are using a Normal Approximation. ^
The linear confidence interval for H will be appropriate if H is approximately Normal. ^
The log-transformed confidence interval for H will be appropriate if ln( H) is approximately Normal. As an example, I simulated 20 values from a Weibull Distribution with τ = 2 and θ = 100. Then I used the Nelson-Aalen estimator in order to estimate H(80). ^
^
I did this a total of 1000 times and looked at the distributions of the values of H(80) and ln[ H(80)]. ^
Skewness: Kurtosis:
^
H(80)
ln[ H(80)]
0.51 3.4
-0.35 3.1
The Normal Distribution has skewness of 0 and a kurtosis of 3. Therefore, in this case, a ^
log-transformed confidence interval, which is based on an assumption that ln[ H(80)] is approximately Normal, seems to be slightly better to use than a linear confidence interval.81 80
Rather than use the delta method, we have first gotten a confidence interval for ln(H). Then we make use of the fact that exponentiating is a monotone function. For any monotone function g(x), Prob[a ≤ x ≤ b] = Prob[g(a) ≤ x ≤ g(b)]. Therefore, Prob[a ≤ ln(H) ≤ b] = Prob[exp[a] ≤ exp[ln(H)] = H ≤ exp[b]]. 81 It depends on the distribution, sample size, value at which one looks, truncating, censoring, etc.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 209 Problems: Use the following information for the next 4 questions: For a survival study with censored and truncated data, you are given: Time (t) Number at Risk at Time t Failures at Time t 1 300 35 2 270 49 3 320 56 4 250 65 5 200 74 7.1 (5 points) Using the Kaplan-Meier Product-Limit estimator and Greenwoodʼs approximation, determine linear 95% confidence intervals for S(1), S(2), S(3), S(4), and S(5). 7.2 (6 points) Using the Kaplan-Meier Product-Limit estimator and Greenwoodʼs approximation, determine log-transformed 95% confidence intervals for S(1), S(2), S(3), S(4), and S(5). 7.3 (5 points) Using the Nelson-Aalen estimator, determine linear 95% confidence intervals for H(1), H(2), H(3), H(4), and H(5). 7.4 (6 points) Using the Nelson-Aalen estimator, determine log-transformed 95% confidence intervals for H(1), H(2), H(3), H(4), and H(5). 7.5 (3 points) For a survival study, you are given: (i) The Product-Limit estimator Sn (t5 ) is used to construct confidence intervals for S(t5 ). (ii) The 95% log-transformed confidence interval for S(t5 ) is (0.383, 0.855). Determine the 95% symmetric linear confidence interval for S(t5 ). 7.6 (3 points) For a survival study, you are given: (i) No withdrawals occur. (ii) 3 deaths occur at time t1 . (iii) 2 deaths occur at time t2 . (iv) The Nelson-Aalen estimate of S(t2 ) is 0.9752. Determine the upper end of the 95% log-transformed confidence interval for S(t2 ). A. .980
B. .985
C. .990
D. .995
E. .997
7.7 (2 points) (0.100, 0.200) is a linear 90% confidence interval for S(ti). Determine the lower end of the log-transformed 95% confidence interval for S(ti). (A) 9.0%
(B) 9.3%
(C) 9.6%
(D) 9.9%
(E) 10.2%
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 210 Use the following information for the next 4 questions: 9 losses are from policies with a deductible of 5: 10, 10, 10, 15, 15, 25, 25, 50, 100. 7 losses are from policies with a deductible of 10: 15, 15, 20, 20, 25, 50, 75. 7.8 (2 points) Determine the Nelson-Aalen estimate of 20q5 . A. 67%
B. 69%
C. 71%
D. 73%
E. 75%
7.9 (3 points) Determine the upper end of a log-transformed 90% confidence interval for the estimate in the previous question. A. 86% B. 88% C. 90% D. 92% E. 94% 7.10 (2 points) Determine the Product-Limit estimate of 30p 10. A. 29%
B. 31%
C. 33%
D. 35%
E. 37%
7.11 (3 points) Determine the lower end of a log-transformed 90% confidence interval for the estimate in the previous question. A. 8% B. 9% C. 10% D. 11% E. 12%
Use the following information on a study of Aids patients for the next two questions: Time Event 0 20 new entrants 1.5 1 death 2.0 5 withdrawals from the study 2.7 1 death 3.0 10 new entrants to the study 3.3 1 death 4.0 5 withdrawals from the study 4.2 2 deaths 5.0 study ends 7.12 (3 points) Using the Product-Limit Estimator, determine a 90% log-transformed confidence interval for S(5). (A) (0.56, 0.85)
(B) (0.54, 0.87)
(C) (0.52, 0.89)
(D) (0.50, 0.91)
(E) (0.48, 0.93)
7.13 (3 points) Using the Nelson-Aalen Estimator, determine a 90% log-transformed confidence interval for H(5). (A) (0.15, 0.58)
(B) (0.13, 0.60)
(C) (0.11, 0.62)
(D) (0.09, 0.64)
(E) (0.07, 0.66)
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 211 Use the following information for the next four questions: Fifty cancer patients were observed from the time of diagnosis until the earlier of death or 36 months from diagnosis. Deaths occurred during the study as follows: Time In Months Since Diagnosis Number Of Deaths 6 11 12 9 18 8 24 4 30 2 36 3 7.14 (2 points) Using the Product-Limit Estimator, determine a 95% log-transformed confidence interval for S(12). (A) (0.45, 0.72) (B) (0.47, 0.73) (C) (0.50, 0.75) (D) (0.52, 0.76) (E) (0.55, 0.78) 7.15 (2 points) Using the Nelson-Aalen Estimator, determine a 95% log-transformed confidence interval for S(12). (A) (0.45, 0.72) (B) (0.47, 0.73) (C) (0.50, 0.75) (D) (0.52, 0.76) (E) (0.55, 0.78) 7.16 (3 points) Using the Product-Limit Estimator, determine a 95% log-transformed confidence interval for S(36). (A) (0.17, 0.37) (B) (0.16, 0.38) (C) (0.15, 0.39) (D) (0.14, 0.40) (E) (0.13, 0.41) 7.17 (3 points) Using the Nelson-Aalen Estimator, determine a 95% log-transformed confidence interval for S(36). (A) (0.21, 0.40) (B) (0.20, 0.41) (C) (0.19, 0.42) (D) (0.18, 0.43) (E) (0.17, 0.44)
7.18 (2 points) A survival study gave (1.44, 2.30) as a log-transformed confidence interval for H(t0 ). Determine the linear confidence interval for H(t0 ), covering the same probability as this log-transformed confidence interval. What is the lower endpoint of this linear confidence interval? A. 1.39 B. 1.41 C. 1.43 D. 1.45 E. 1.47
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 212 Use the following information for the next three questions: Liability insurance policies are sold with two different limits: 50 and 100. You observe 10 claim payments from each type of policy. Limit of 50: 3, 9, 14, 27, 39, 50, 50, 50, 50, 50. Limit of 100: 4, 8, 17, 22, 55, 60, 100, 100, 100, 100. 7.19 (2 points) Using the Product-Limit Estimator, determine a 95% log-transformed confidence interval for S(55). (A) (0.23, 0.55) (B) (0.21, 0.67) (C) (0.19, 0.69) (D) (0.17, 0.71) (E) (0.15, 0.73) 7.20 (2 points) Using the Product-Limit Estimator, determine a 95% log-transformed confidence interval for S(70). (A) (0.20, 0.55) (B) (0.18, 0.57) (C) (0.16, 0.59) (D) (0.14, 0.61) (E) (0.12, 0.63) 7.21 (3 points) Using the Nelson-Aalen Estimator, determine a 95% log-transformed confidence interval for S(55). (A) (0.23, 0.68) (B) (0.21, 0.70) (C) (0.19, 0.72) (D) (0.17, 0.74) (E) (0.15, 0.76)
7.22 (Course 4 Sample Exam 2000, Q.6) (0.360, 0.640) is a linear 95% confidence interval for S(tM). Determine the log-transformed 95% confidence interval for S(tM). 7.23 (4, 11/01, Q.37 & 2009 Sample Q.77) (2.5 points) A survival study gave (1.63, 2.55) as the 95% linear confidence interval for H(t0 ). Calculate the 95% log-transformed confidence interval for H(t0 ). (A) (0.49, 0.94) (B) (0.84, 3.34) (C) (1.58, 2.60) (D) (1.68, 2.50) (E) (1.68, 2.60) 7.24 (4, 11/02, Q.8 & 2009 Sample Q. 36) (2.5 points) For a survival study, you are given: (i) The Product-Limit estimator Sn (t0 ) is used to construct confidence intervals for S(t0 ). (ii) The 95% log-transformed confidence interval for S(t0 ) is (0.695, 0.843). Determine Sn (t0 ). (A) 0.758
(B) 0.762
(C) 0.765
(D) 0.769
(E) 0.779
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 213 7.25 (4, 11/03, Q.22 & 2009 Sample Q.17) (2.5 points) For a survival study with censored and truncated data, you are given: Time (t) Number at Risk at Time t Failures at Time t 1 30 5 2 27 9 3 32 6 4 25 5 5 20 4 Calculate the 95% log-transformed confidence interval for H(3), based on the Nelson-Aalen estimate. (A) (0.30, 0.89) (B) (0.31, 1.54) (C) (0.39, 0.99) (D) (0.44, 1.07) (E) (0.56, 0.79) 7.26 (4 points) Using the information in 4, 11/03, Q.22, and the Nelson-Aalen estimator, calculate H(1), H(2), H(4), and H(5). 7.27 (4 points) Calculate the variance of the estimates in the previous question. 7.28 (8 points) Calculate the 95% linear confidence intervals and the 95% log-transformed confidence intervals for the estimates in the previous question.
7.29 (4, 11/04, Q.12 & 2009 Sample Q.141) (2.5 points) The interval (0.357, 0.700) is a 95% log-transformed confidence interval for the cumulative hazard rate function at time t, where the cumulative hazard rate function is estimated using the Nelson-Aalen estimator. Determine the value of the Nelson-Aalen estimate of S(t). (A) 0.50 (B) 0.53 (C) 0.56 (D) 0.59 (E) 0.61 7.30 (2 points) In the previous question, estimate the variance of the Nelson-Aalen estimate of the cumulative hazard rate function at time t. (A) 0.0058 (B) 0.0062 (C) 0.0066 (D) 0.0070 (E) 0.0074
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 214 Solutions to Problems: 7.1. t
r
s
(r-s)/r
S(t)
s/{r(r-s)}
Sum
Var[S(t)]
0 1 2 3 4 5
300 270 320 250 200
35 49 56 65 74
0.8833 0.8185 0.8250 0.7400 0.6300
1 0.8833 0.7230 0.5965 0.4414 0.2781
0.000440 0.000821 0.000663 0.001405 0.002937
0.000440 0.001261 0.001924 0.003330 0.006266
0.000344 0.000659 0.000685 0.000649 0.000485
sr S(t) 0t Sum 1 1 0.883 2 0.723 3 0.596 4 0.441 5 0.278
Var[S(t)]
Lower End
Upper End
0.000344 0.000659 0.000685 0.000649 0.000485
0.847 0.673 0.545 0.391 0.235
0.919 0.773 0.647 0.491 0.321
^
S(2) = {(300 - 35)/300}{(270 - 49)/270} = 0.7230. V [ S(2)] = S(2)2 Σ si/{ri(ri - si)} = (0.72302 ){35/((300)(265)} + 49/((270)(221)} = ^ ^
^
(0.5227)(0.001261) = 0.000659. For example, 95% confidence interval for S(1): 0.883 ± 1.96 7.2. U = exp[1.96 0t 1 2 3 4 5
S(t) 1 0.883 0.723 0.596 0.441 0.278
0.000344 = [0.847, 0.919].
V[S] / {S ln S}]. Lower end: S1/U. Upper end: SU. Var[S(t)]
U
Lower End
Upper End
0.000344 0.000659 0.000685 0.000649 0.000485
0.718 0.807 0.847 0.871 0.886
0.841 0.669 0.543 0.391 0.236
0.914 0.770 0.645 0.490 0.322
For example for t = 1, U = exp[1.96
0.000344 / {(0.883) ln(0.883)}] = 0.718.
Lower end:0 .8831/0.718 = 0.841. Upper end: 0.8830.718 = 0.914.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 215 7.3. H(2) = Σ si/ri = 35/300 + 49/270 = 0.2981. ^
V [ H(2)] = Σ si/ri2 = 35/3002 + 49/2702 = 0.001061. ^ ^
ti
ri
si
s/r
H(ti)
s/r^2
Var[H(ti)]
1 2 3 4 5
300 270 320 250 200
35 49 56 65 74
0.1167 0.1815 0.1750 0.2600 0.3700
0.1167 0.2981 0.4731 0.7331 1.1031
0.000389 0.000672 0.000547 0.001040 0.001850
0.000389 0.001061 0.001608 0.002648 0.004498
ti0 Sum sr H(ti) 1 1 0.1167 2 0.2981 3 0.4731 4 0.7331 5 1.1031
7.4. U = exp[-1.96 ti0 1 2 3 4 5
H(ti) 1 0.1167 0.2981 0.4731 0.7331 1.1031
Var[H(ti)]
Lower End
Upper End
0.000389 0.001061 0.001608 0.002648 0.004498
0.078 0.234 0.395 0.632 0.972
0.155 0.362 0.552 0.834 1.235
Var[H] / H]. Lower end: HU. Upper end: H/U. Var[H(ti)]
U
Lower End
Upper End
0.000389 0.001061 0.001608 0.002648 0.004498
0.7180 0.8072 0.8469 0.8715 0.8877
0.084 0.241 0.401 0.639 0.979
0.163 0.369 0.559 0.841 1.243
7.5. The lower endpoint is Sn (t5 )1/U. Sn (t5 )1/U = 0.383. ⇒ (ln Sn (t5 ))/U = ln0.383 = -0.9597. The upper endpoint is Sn (t5 )U. Sn (t5 )U = 0.855. ⇒ (ln Sn (t0 ))U = ln0.855 = -0.1567.
⇒ (ln Sn (t5 ))2 = (-0.9597)( -0.1567). ⇒ ln Sn (t5 ) = ±0.3878. ⇒ Sn (t5 ) = e-0.3878 = 0.679. U = ln Sn (t5 )/(-0.9597) = ln0.679 /(-0.9597) = 0.4034. But U = exp[1.96
V[S] / {S ln S}]. ⇒ 0.4034 = exp[1.96
V[S] / {.679 ln .679}]
⇒ V[S] = 0.0148. Therefore 95% symmetric linear confidence interval for S(t5 ) is: 0.679 ± 1.96 0.0148 = 0.679 ± 0.238 = (0.441, 0.917). Comment: Similar to 4, 11/02, Q.8. Note that 0 ≤ Sn (t5 ) ≤ 1, so that e0.3878 = 1.47 is not a acceptable solution.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 216 ^
7.6. C. H(t2 ) = -ln(0.9752) = 0.02511 = 3/n + 2/(n-3). ⇒ 0.02511n(n-3) = 3(n-2) + 2n.
⇒ 0.02511n2 - 5.0753n + 9 = 0 ⇒ n = 200. ^
Var[ H(20)] = Σ si/ri2 = 3/2002 + 2/1972 = 0.0001265. U = exp[-1.96
Var[H] / H] = exp[-1.96 0.0001265 )/ 0.02511] = 0.416.
The upper end is: S(t2 )U = 0.9752.416 = 0.990. Comment: The lower end is: S(t2 )1/U = 0.97521/.416 = 0.941. 7.7. C. The given 90% linear confidence interval is: 0.150 ± 0.050. Thus 0.150 is the estimate of S(ti), and 0.050 is 1.645 times the standard deviation of the estimate of S(ti). Var[S(ti)] = 0.050/1.645 = 0.0304. Let U = exp[1.96
Var[S(ti)] /{S(ti) lnS(ti)}] = exp[(1.96)(0.0304)/{(0.15)ln(0.15)}] = 0.8111.
The upper end of the log-transformed confidence interval is: S(ti)U = 0.150.8111 = 0.215. The lower end of the log-transformed confidence interval is: S(ti)1/U = 0.151/0.8111 = 0.096 The log-transformed 95% confidence interval is: (0.096, 0.215). Comment: Similar to Course 4 Sample Exam, Q.6 and 4, 11/01, Q.37. 7.8. D. Since all the data is truncated from below, everything is conditional on survival to 5. t
r
s
s/r
H(t)
S(t)
10 15 20 25 50 75 100
9 13 9 7 4 2 1
3 4 2 3 2 1 1
0.3333 0.3077 0.2222 0.4286 0.5000 0.5000 1.0000
0.3333 0.6410 0.8632 1.2918 1.7918 2.2918 3.2918
0.7165 0.5268 0.4218 0.2748 0.1667 0.1011 0.0372
20q 5
= 1 - S(25)/S(5) = 1 - 0.2748 = 0.7252.
Comment: The losses from polices with a deductible of 10 are not in the risk set at 10. Rather they enter the risk set just after 10. With a deductible of 10, a loss of size 10 would result in no payment and thus would not enter into the data base.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 217 7.9. B. 90% log-transformed confidence interval, (S1/U, SU), U = exp[-1.645
^
^ ] / H]. Var[H
t
r
s
s/r
H(t)
s/r^2
Var[H]
U
10 15 20 25 50 75 100
9 13 9 7 4 2 1
3 4 2 3 2 1 1
0.3333 0.3077 0.2222 0.4286 0.5000 0.5000 1.0000
0.3333 0.6410 0.8632 1.2918 1.7918 2.2918 3.2918
0.0370 0.0237 0.0247 0.0612 0.1250 0.2500 1.0000
0.0370 0.0607 0.0854 0.1466 0.2716 0.5216 1.5216
0.3868 0.5314 0.5730 0.6141 0.6197 0.5955 0.5399
S(t)
Lower
Upper
0.7165 0.5268 0.4218 0.2748 0.1667 0.1011 0.0372
0.4225 0.2993 0.2217 0.1220 0.0555 0.0213 0.0022
0.8790 0.7113 0.6098 0.4523 0.3294 0.2555 0.1691
10 15 20 25 50 75 100
A log-transformed 90% confidence interval for 20p 5 = S(25)/S(5) is: (0.1220, 0.4523). A log-transformed 90% confidence interval for 20q5 = 1 - 20p 5 is: (1 - 0.4523, 1 - 0.1220) = (0.5477, 0.8780). 7.10. B. Since all the data is truncated from below, everything is conditional on survival to 5. t
r
s
s/r
H(t)
S(t)/S(10)
10 15 20 25 50 75 100
9 13 9 7 4 2 1
3 4 2 3 2 1 1
0.6667 0.6923 0.7778 0.5714 0.5000 0.5000 0.0000
0.6667 0.4615 0.3590 0.2051 0.1026 0.0513 0.0000
1.0000 0.6923 0.5385 0.3077 0.1538 0.0769 0.0000
30p 10
= S(40)/S(10) = {S(40)/S(5)}{S(10)/S(5)} = 0.2051/0.6667 = 0.3076.
Comment: One can start all of the calculations beyond 10: S(40)/S(10) = (9/13)(7/9)(4/7) = 4/13.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 218 7.11. E. 90% log-transformed confidence interval, (Sn (t)1/U, Sn (t)U), where U = exp[1.645
Var[Sn(t)] / {Sn (t) lnSn (t)}].
Since we want 30p 10 = S(40)/S(10) = Prob[Survive past 40 | survive past 10], start all of the calculations beyond 10. t
r
s
(r-s)/r
S(t)
s/{r(r-s)}
Cum. Sum.
15 20 25 50 75 100
13 9 7 4 2 1
4 2 3 2 1 1
0.6923 0.7778 0.5714 0.5000 0.5000 0.0000
0.6923 0.5385 0.3077 0.1538 0.0769 0.0000
0.0342 0.0317 0.1071 0.2500 0.5000
0.0342 0.0659 0.1731 0.4231 0.9231
Var[S]
U
Lower
Upper
0.01639 0.01912 0.01639 0.01001 0.00546
0.4373 0.5054 0.5595 0.5646 0.5400
0.4313 0.2938 0.1217 0.0363 0.0087
0.8515 0.7313 0.5171 0.3476 0.2503
15 20 25 50 75
A log-transformed 90% confidence interval for 30p 10 = S(40)/S(10) is: (0.1217, 0.5171). 7.12. B. 90% log-transformed confidence interval, (Sn (t)1/U, Sn (t)U), where U = exp[1.645
Var[Sn(t)] / {Sn (t) lnSn (t)}].
ti
ri
si
(r-s)/r
S(ti)
s/{r(r-s)}
Cum. Sum.
1.5 2.7 3.3 4.2
20 14 23 17
1 1 1 2
0.9500 0.9286 0.9565 0.8824
0.9500 0.8821 0.8438 0.7445
0.002632 0.005495 0.001976 0.007843
0.002632 0.008126 0.010102 0.017946
ti
VAR[S(ti)]
U
Lower
Upper
1.5 2.7 3.3 4.2
0.002375 0.006324 0.007193 0.009947
0.1930 0.3065 0.3778 0.4738
0.7666 0.6642 0.6379 0.5365
0.9902 0.9623 0.9378 0.8695
A log-transformed 90% confidence interval for S(5) is: (0.5365, 0.8695). Comment: It saves a little time to note that Var[Sn (5)] / Sn (5)2 = 0.017946, so that U = exp[(1.645) 0.017946 / ln(0.7445)] = 0.4738.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 219 7.13. B. 90% log-transformed confidence interval, (H U, H/U), where U = exp[-1.645
^
^ ] / H]. Var[H
ti
ri
si
s/r
H(ti)
s/r^2
VAR[H(ti)]
U
1.5 2.7 3.3 4.2
20 14 23 17
1 1 1 2
0.0500 0.0714 0.0435 0.1176
0.0500 0.1214 0.1649 0.2826
0.0025 0.0051 0.0019 0.0069
0.0025 0.0076 0.0095 0.0164
0.1930 0.3069 0.3784 0.4743
ti
Lower
Upper
1.5 2.7 3.3 4.2
0.0097 0.0373 0.0624 0.1340
0.2591 0.3956 0.4358 0.5957
A log-transformed 90% confidence interval for H(5) is: (0.1340, 0.5957). Comment: Multiplying by minus one and exponentiating, a log-transformed 90% confidence interval for S(5) is: (0.5512, 0.8746). 7.14. A. S(12) = {(r1 - s1 )/r1 } {(r2 - s2 )/r2 } = {(50 - 11)/50}{(39 - 9)/39} = 0.6000. Var[S(12)] = S(12)2 {s1 /{r1 (r1 - s1 )} + s2 /{r2 (r2 - s2 )}} = 0.60002 {11/{(50)(50 - 11)} + 9/{(39)(39 - 9)}} = (0.36)(0.013333) = 0.004800. U = exp[1.96
Var[Sn(t)] / {Sn (t) lnSn (t)}] = exp[(1.96)
0.004800 / {(0.6000 ln(0.6000)}]
= e-0.44305 = 0.6421. A 95% log-transformed confidence interval for S(12) is: (Sn (12)1/U, Sn (12)U) = (0.61/0.6421, 0.60.6421) = (0.451, 0.720). Comment: H = - ln[S] = - ln[0.6] = 0.5108. ∂H / ∂S = -1/S. ⇒ Var[H] ≅ (-1/S)2 Var[S] = Var[S]/S2 = 0.004800/0.62 = 0.013333. U = exp[-1.96
Var[H] = 0.11547
Var[H] / H] = exp[-(1.96)(0.11547)/0.5108] = e-0.4431 = 0.6421.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 220 ^
7.15. C. H(12) = s1 /r1 + s2 /r2 = 11/50 + 9/39 = 0.4508. ^
Var[ H(12)] = s1 /r1 2 + s2 /r2 2 = 11/502 + 9/392 = 0.01032. ^
^
S (12) = exp[- H(12)] = e-0.4508 = 0.6371. U = exp[-1.96
^
^ ] / H] = exp[-1.96 Var[H
0.01032 / 0.4508] = e-0.4417 = 0.6430.
A log-transformed 95% confidence interval for S(12) is: (S1/U, SU) = (0.63711/0.6430, 0.63710.6430) = (0.496, 0.748). Alternately, a 95% log-transformed confidence interval for H(12) is: (HU, H/U) = ((0.4508)(0.6430), 0.4508/0.6430) = (0.2899, 0.7011). Multiplying by -1 and exponentiating, a 95% log-transformed confidence interval for S(12) is: ( e-0.7011, e-0.2899) = (0.496, 0.748). 7.16. C. 95% log-transformed confidence interval, (Sn (t)1/U, Sn (t)U), where U = exp[1.96
Var[Sn(t)] / {Sn (t) lnSn (t)}].
ti
ri
si
(r-s)/r
S(ti)
6 12 18 24 30 36
50 39 30 22 18 16
11 9 8 4 2 3
0.7800 0.7692 0.7333 0.8182 0.8889 0.8125
0.7800 0.6000 0.4400 0.3600 0.3200 0.2600
ti
VAR[S(ti)]
U
Lower
Upper
6 12 18 24 30 36
0.000000 0.002769 0.003836 0.003877 0.003774 0.003467
1.0000 0.7143 0.7146 0.7176 0.7187 0.7193
0.7800 0.4891 0.3170 0.2408 0.2049 0.1537
0.7800 0.6943 0.5562 0.4804 0.4409 0.3795
s/{r(r-s)}
Cum. Sum.
0.007692 0.012121 0.010101 0.006944 0.014423
0.000000 0.007692 0.019814 0.029915 0.036859 0.051282
A log-transformed 95% confidence interval for S(36) is: (0.1487, 0.3860). Comment: S50(36) = 1 - (11 + 9 + 8 + 4 + 2 + 3)/50 = 13/50 = 0.26. Var[S50(36)] = (0.26)(1 - 0.26)/50 = 0.003848.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 221 7.17. D. 95% log-transformed confidence interval, (S1/U, SU), where U = exp[-1.96
^
^ ] / H]. Var[H
ti
ri
si
s/r
H(ti)
s/r^2
VAR[H(ti)]
U
6 12 18 24 30 36
50 39 30 22 18 16
11 9 8 4 2 3
0.2200 0.2308 0.2667 0.1818 0.1111 0.1875
0.2200 0.4508 0.7174 0.8993 1.0104 1.1979
0.0044 0.0059 0.0089 0.0083 0.0062 0.0117
0.0044 0.0103 0.0192 0.0275 0.0336 0.0454
0.5538 0.6430 0.6848 0.6968 0.7006 0.7058
ti
S(ti)
Lower
Upper
6 12 18 24 30 36
0.8025 0.6371 0.4880 0.4069 0.3641 0.3018
0.6722 0.4961 0.3508 0.2751 0.2364 0.1832
0.8853 0.7484 0.6118 0.5344 0.4927 0.4294
A log-transformed 95% confidence interval for S(36) is: (0.1832, 0.4294). 7.18. A. For the given log-transformed confidence interval: 1.44 = H U, and 2.30 = H/U. Therefore, H = (1.44)(2.30) = 1.8199, and U = 1.44 / 2.30 = 0.79126. Alternately, U = H/2.30 = 1.8199/2.30 = 0.79126. Therefore, 0.79126 = U = exp[-y StdDev[H(t)] / H(t)].
⇒ y StdDev[H(t)] = 0.23413 H = (0.23413)(1.8199) = 0.4261. The linear confidence interval is: H ± y StdDev[H(t)] = 1.8199 ± 0.4261 = (1.3938, 2.2460). Comment: Similar to 4, 11/01, Q.37. Your U could have been the inverse of my U, and you would have gotten the same answer.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 222 7.19. B. & 7.20. D. 95% log-transformed confidence interval, (Sn (t)1/U, Sn (t)U), where U = exp[1.96
Var[Sn(t)] / {Sn (t) lnSn (t)}].
ti
ri
si
(r-s)/r
S(ti)
s/{r(r-s)}
Cum. Sum.
3 4 8 9 14 17 22 27 39 55 60
20 19 18 17 16 15 14 13 12 6 5
1 1 1 1 1 1 1 1 1 1 1
0.9500 0.9474 0.9444 0.9412 0.9375 0.9333 0.9286 0.9231 0.9167 0.8333 0.8000
0.9500 0.9000 0.8500 0.8000 0.7500 0.7000 0.6500 0.6000 0.5500 0.4583 0.3667
0.002632 0.002924 0.003268 0.003676 0.004167 0.004762 0.005495 0.006410 0.007576 0.033333 0.050000
0.002632 0.005556 0.008824 0.012500 0.016667 0.021429 0.026923 0.033333 0.040909 0.074242 0.124242
Var[S]
U
Lower
Upper
0.002375 0.004500 0.006375 0.008000 0.009375 0.010500 0.011375 0.012000 0.012375 0.015596 0.016704
0.1408 0.2499 0.3221 0.3745 0.4150 0.4473 0.4740 0.4963 0.5152 0.5043 0.5023
0.6947 0.6560 0.6038 0.5511 0.4999 0.4505 0.4030 0.3573 0.3134 0.2129 0.1357
0.9928 0.9740 0.9490 0.9198 0.8875 0.8525 0.8153 0.7761 0.7349 0.6747 0.6041
3 4 8 9 14 17 22 27 39 55 60
A log-transformed 95% confidence interval for S(55) is: (0.2129, 0.6747). There are no values in the interval, (60, 70], and therefore a log-transformed 95% confidence interval for S(70) is the same as that for S(60): (0.1357, 0.6041). Comment: S20(39) = 11/20 = 0.55. Var[S20(39)] = (0.55)(1 - 0.55)/20 = 0.012375.
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 223 7.21. A. 95% log-transformed confidence interval, (S1/U, SU), where U = exp[-1.96
^
^ ] / H]. Var[H
t
r
s
s/r
H(t)
s/r^2
Var[H]
U
3 4 8 9 14 17 22 27 39 55
20 19 18 17 16 15 14 13 12 6
1 1 1 1 1 1 1 1 1 1
0.0500 0.0526 0.0556 0.0588 0.0625 0.0667 0.0714 0.0769 0.0833 0.1667
0.0500 0.1026 0.1582 0.2170 0.2795 0.3462 0.4176 0.4945 0.5779 0.7445
0.0025 0.0028 0.0031 0.0035 0.0039 0.0044 0.0051 0.0059 0.0069 0.0278
0.0025 0.0053 0.0084 0.0118 0.0157 0.0202 0.0253 0.0312 0.0381 0.0659
0.1409 0.2500 0.3222 0.3746 0.4151 0.4475 0.4742 0.4966 0.5157 0.5087
S(t)
Lower
Upper
0.9512 0.9025 0.8537 0.8049 0.7562 0.7074 0.6586 0.6099 0.5611 0.4750
0.7012 0.6633 0.6120 0.5603 0.5100 0.4614 0.4145 0.3694 0.3261 0.2314
0.9930 0.9747 0.9503 0.9219 0.8905 0.8565 0.8203 0.7822 0.7423 0.6847
3 4 8 9 14 17 22 27 39 55
A log-transformed 95% confidence interval for S(55) is: (0.2314, 0.6847). 7.22. The given 95% linear confidence interval is: 0.500 ± 0.140. Thus 0.500 is the estimate of S(tM), and 0.140 is 1.96 times the standard deviation of the estimate of S(tM). Let U = exp[1.96
Var[S(tM)] / {S(tM) lnS(tM)}] = exp[0.140/{(0.5)ln(0.5)}] = 0.6677.
The upper end of the log-transformed confidence interval is: S(tM)U = 0.50.6677 = 0.630. The lower end of the log-transformed confidence interval is: S(tM)1/U = .51/0.6677 = 0.354. The log-transformed 95% confidence interval is: (0.354, 0.630).
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 224 7.23. E. The given 95% linear confidence interval is: (1.63, 2.55) = 2.09 ± 0.46. Thus 2.09 is the estimate of H(t0 ), and 0.46 is 1.96 times the standard deviation of the estimate of H(t0 ). Let U = exp[-1.96
Var[H(t0)] / H(t0 )] = exp[-0.46/2.09] = 0.8024.
The lower end of the log-transformed confidence interval is: H(t0 )U = (2.09)(0.8024) = 1.677. The upper end of the log-transformed confidence interval is: H(t0 )/U = 2.09/0.8024 = 2.605. Alternately, let U = exp[1.96
Var[H(t0)] / H(t0 )] = exp[0.46/2.09] = 1.2462.
The lower end of the log-transformed confidence interval is: H(t0 )/U = 2.09/1.2462 = 1.677. The upper end of the log-transformed confidence interval is: H(t0 )U = (2.09)(1.2462) = 2.605. Comment: 0.8024 = 1/1.2462. The corresponding 95% log-transformed confidence interval for S(t0 ) is: (e-2.605, e-1.677) = (0.074, 0.187). 7.24. E. The lower endpoint is Sn (t0 )1/U. Sn (t0 )1/U = 0.695. ⇒ (ln Sn (t0 ))/U = ln0.695 = -0.3638. The upper endpoint is Sn (t0 )U. Sn (t0 )U = 0.843. ⇒ (ln Sn (t0 ))U = ln0.843 = -0.1708.
⇒ (ln Sn (t0 ))2 = (-0.3638)(-0.1708). ⇒ ln Sn (t0 ) = ±0.2493. ⇒ Sn (t0 ) = e-0.2493 = 0.779. Comment: Note that 0 ≤ Sn (t0 ) ≤ 1, so that e0.2493 = 1.283 is not a acceptable solution. ^
7.25. D. H(3) = 5/30 + 9/27 + 6/32 = 0.6875. ^
Var[ H(3)] = Σ si/ri2 = 5/302 + 9/272 + 6/322 = 0.02376. U = exp[- (1.96)
^
^ (3)] / H(3)] = exp[- (1.96) Var[H
0.02376 / 0.6875] = exp-0.43945 = 0.6444.
The confidence interval is: (HU, H/U) = ((0.6875)(0.6444), 0.6875/0.6444) = (0.443, 1.067). Comment: The 95% linear confidence interval for H(3) is: ^
H(3) ± (1.96)
^ (3)] = 0.6875 ± (1.96) Var[H
0.02376 = (0.385, 0.990).
7.26. t
r
s
s/r
H(t)
1 2 3 4 5
30 27 32 25 20
5 9 6 5 4
0.167 0.333 0.188 0.200 0.200
0.1667 0.5000 0.6875 0.8875 1.0875
2013-4-7, Survival Analysis §7 Log-Transf. Confidence Intervals, HCM 10/16/12, Page 225 7.27. t
r
s
s/r^2
VAR[H(t)]
1 2 3 4 5
30 27 32 25 20
5 9 6 5 4
0.00556 0.01235 0.00586 0.00800 0.01000
0.00556 0.01790 0.02376 0.03176 0.04176
7.28. 95% linear confidence intervals: t
H(t)
VAR[H(t)]
Lower
Upper
1 2 3 4 5
0.1667 0.5000 0.6875 0.8875 1.0875
0.00556 0.01790 0.02376 0.03176 0.04176
0.021 0.238 0.385 0.538 0.687
0.313 0.762 0.990 1.237 1.488
For example, .8875 ± 1.96 0.03176 = (0.538, 1.237). ^
^
95% log-transformed confidence intervals, ( HU, H/U), where U = exp[-1.96 t
H(t)
VAR[H(t)]
U
Lower
Upper
1 2 3 4 5
0.1667 0.5000 0.6875 0.8875 1.0875
0.00556 0.01790 0.02376 0.03176 0.04176
0.4161 0.5919 0.6444 0.6746 0.6919
0.069 0.296 0.443 0.599 0.752
0.401 0.845 1.067 1.316 1.572
For example, exp[-(1.96) 0.04176 /1.0875] = 0.6919. Then, (1.0875)(0.6919) = .752, and 1.0875/0.6919 = 1.572. 7.29. E. The log-transformed confidence interval is: (H(t)U, H(t)/U).
⇒ H(t)U = 0.357 and H(t)/U = 0.700. ⇒ H(t)2 = (0.357)(0.700). ⇒ H(t) = 0.500. ⇒ S(t) = e-0.500 = 0.607. Comment: The point estimate of H is the geometric average of the endpoints of the log-transformed confidence interval. 7.30. E. U = exp[-y
^
^ ] / H]. Var[H
Var[H(t)] / H(t)]. For 95% confidence, y = 1.960.
U2 = 0.357/0.700 = 0.510. ⇒ U = 0.714. ⇒ ln0.714 = -y Var[H(t)] / H(t).
⇒ 0.337 = 1.96 Var[H(t)] / 0.5. ⇒ Var[H(t)] = {(0.337)(0.5)/1.96}2 = 0.0074.
2013-4-7,
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Page 226
Section 8, Maximum Likelihood Using the Product-Limit or the Nelson-Aalen Estimators, one does not assume any specific form of the survival function. Rather in these empirical methods one relies on the data. Instead, one can assume a form of the survival function. Then one can fit its parameter(s) via for example maximum likelihood.82 Simple cases to which this can be applied include the Exponential Distribution, the Weibull Distribution, the Uniform Distribution, and an assumption of constant hazard rates on intervals. For ungrouped data {x1 , x2 , ... , xn } define: Likelihood = Π f(xi) Loglikelihood = Σ ln f(xi) In order to fit a chosen type of distribution by maximum likelihood, you maximize the likelihood or equivalently maximize the loglikelihood. In other words, for ungrouped data you find the set of parameters such that either Π f(xi) or Σ ln f(xi) is maximized. An untruncated and uncensored value will contribute f(x) to the likelihood and ln f(x) to the loglikelihood. A value right censored at u, will contribute S(u) to the likelihood and ln S(u) to the loglikelihood. A value left truncated at d, if not censored will contribute f(x)/S(d) to the likelihood, and if right censored at u will contribute S(u)/S(d) to the likelihood. Exponential Distribution: For the Exponential Distribution with mean θ, S(x) = e-x/θ and f(x) = e-x/θ/θ. Exercise: For 4 lives you observe deaths at times: 20, 52, 61, 91. Fit an Exponential Distribution via maximum likelihood. [Solution: ln f(x) = -x/θ - lnθ. Loglikelihood is: -20/θ - lnθ - 52/θ - lnθ - 61/θ - lnθ - 91/θ - lnθ = -224/θ - 4lnθ. Setting the derivative with respect to θ equal to 0: 0 = 224/θ2 - 4/θ. ⇒ θ = 224/4 = 56.]
82
Fitting via maximum likelihood is covered much more extensively in “Mahlerʼs Guide to Fitting Loss Distributions.”
2013-4-7,
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Page 227
For the Exponential Distribution with ungrouped data, in the absence of truncating and censoring, the method of maximum likelihood is equal to the method of moments; the fitted θ is equal to the average of the observations.83 With truncating and/or censoring things get more complicated. Exercise: For a mortality study: Individual Age of Entry Age of Termination Cause of Termination 1 0 20 Censored 2 0 8 Censored 3 0 16 Death 4 5 20 Censored 5 10 12 Death 6 15 20 Censored Fit an Exponential Distribution via maximum likelihood. [Solution: Individual Contribution to the Loglikelihood 1
ln S(20) = -20/θ.
2
ln S(8) = -8/θ.
3
ln f(16) = -lnθ - 16/θ.
4
ln{S(20)/S(5)} = ln S(20) - ln S(5) = -20/θ + 5/θ = -15/θ.
5
ln{f(12)/S(10)} = -lnθ - 12/θ + 10/θ = -lnθ - 2/θ.
6
ln{S(20)/S(15)} = -20/θ + 15/θ = -5/θ.
The loglikelihood is the sum of the contributions: -66/θ - 2lnθ. Setting the derivative with respect to θ equal to 0: 0 = 66/θ2 - 2/θ. ⇒ θ = 66/2 = 33. Comment: For example, the estimate of S(50) is: e-50/33 = 22.0%.] In general, for the Exponential Distribution, the maximum likelihood fit to ungrouped data is:84 θ^ =
sum of payments . number of uncensored values
Which in this context is: θ^ =
sum of the times spent in the study number of deaths observed
= (20 + 8 + 16 + 15 + 2 + 5)/2 = 66/2 = 33.
83 84
See “Mahlerʼs Guide to Fitting Loss Distributions.” See “Mahlerʼs Guide to Fitting Loss Distributions.”
2013-4-7,
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Page 228
Weibull Distribution:85 ⎡ ⎛ x ⎞ τ⎤ ⎡ ⎛ x ⎞ τ⎤ ⎛ x ⎞τ τ ⎜ ⎟ exp ⎢-⎜ ⎟ ⎥ τ xτ − 1 exp⎢-⎜ ⎟ ⎥ ⎝ θ⎠ ⎣ ⎝ θ⎠ ⎦ ⎣ ⎝ θ⎠ ⎦ For a Weibull Distribution, S(x) = exp[-(x/θ)τ], and f(x) = = . τ x θ ln S(x) = -(x/θ)τ.
ln f(x) = lnτ + (τ - 1) ln x - (x/θ)τ - τ lnθ.
For the Weibull and the data in the previous exercise, the loglikelihood is determined as follows: Individual Contribution to the Loglikelihood 1
ln S(20) = -(20/θ)τ.
2
ln S(8) = -(8/θ)τ.
3
ln f(16) = lnτ + (τ - 1) ln16 - (16/θ)τ - τ lnθ.
4
ln{S(20)/S(5)} = ln S(20) - ln S(5) = (5/θ)τ - (20/θ)τ.
5
ln{f(12)/S(10)} = lnτ + (τ - 1) ln12 - (12/θ)τ - τ lnθ + (10/θ)τ.
6
ln{S(20)/S(15)} = (15/θ)τ - (20/θ)τ.
The loglikelihood is the sum of the contributions: 2lnτ + τ(ln12 + ln16) - 2τ lnθ - {3(20τ) + 8τ + 16τ + 12τ - 5τ - 10τ - 15τ}/θτ - ln12 - ln16. If for a Weibull Distribution, τ is taken as fixed and known, one can fit θ via maximum likelihood. Exercise: For τ = 2, fit θ via maximum likelihood. [Solution: The loglikelihood is: 2ln2 + 2(ln12 + ln16) - 4 lnθ - {3(202 ) + 82 + 162 + 122 - 52 - 102 - 152 }/θ2 - ln12 - ln16. Set the derivative with respect to θ equal to zero: 0 = -4/θ + (2)(1314)/θ3. ⇒ θ =
657 = 25.63.]
The estimate of S(50) is: exp[-(50/25.63)2 ] = 2.2%. If both τ and θ are unknown, in order to maximize the likelihood, one needs to use numerical methods. In the above example, τ = 2.85709 and θ = 23.672 correspond to the maximum likelihood. The estimate of S(50) is: exp[-(50/23.672)2.85709] = 0.02%. 85
For τ = 1, the Weibull Distribution is an Exponential Distribution.
2013-4-7,
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Page 229
In general, for the Weibull Distribution with tau fixed, the maximum likelihood fit to ungrouped data is: 1 /τ ∑ (Min[xi , ui ]τ - diτ) ⎛ ⎞ . θ= ⎝ number of uncensored values ⎠ ^
Transform the Weibull into “Exponential Land” by raising things to the power τ, then fit via maximum Iikelihood, and then translate back to “Weibull Land” by taking the result to the 1/τ power. For the example, using instead the shortcut to fit the Weibull with τ = 2: ∑ (Min[xi , ui]τ - diτ ) = 202 + 82 + 162 + (202 - 52 ) + (122 -102 ) + (202 - 152 ) = 1314. ^
θ=
1 /τ ∑ (Min[xi , ui ]τ - diτ) ⎛ ⎞ = 1314 / 2 = 25.63, matching the previous result. ⎝ number of uncensored values ⎠
2013-4-7,
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Page 230
DeMoivreʼs Law: Under De Moivreʼs Law, one assumes a uniform density from 0 to ω, with no lives surviving beyond age ω.86 S(t) = 1 - t/ω, 0 < t < ω.
f(x) = 1/ω, 0 < t < ω.
Exercise: For 4 lives you observe deaths at ages: 20, 52, 61, 91. Fit De Moivreʼs Law via maximum likelihood. [Solution: Since we observe someone live to age 91, we know that ω must be at least 91. ln f(x) = -ln ω. The loglikelihood is: -4 ln ω. The loglikelihood is a decreasing function of ω. In other words, smaller ω correspond to larger loglikelihoods. Thus the loglikelihood is maximized for the smallest possible ω, which is 91. Comment: If ω were for example 80, then we could not see a death at age 91. If deaths were uniform from 0 to 80, then we would not see a death at an age greater than 80.] For the Uniform Distribution on the interval (0, ω) with ungrouped data, in the absence of truncating and censoring, the method of maximum likelihood fitted ω is equal to the maximum of the observations.87 With truncating and/or censoring things get more complicated. Exercise: For a mortality study: Individual Age of Entry Age of Termination 1 0 20 2 0 20 3 0 16 4 10 20 5 10 12 6 10 20 Fit De Moivreʼs Law via maximum likelihood.
86 87
See Section 3.7 of Actuarial Mathematics. See “Mahlerʼs Guide to Fitting Loss Distributions.”
Cause of Termination Censored Censored Death Censored Death Censored
2013-4-7, [Solution:
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Individual
Contribution to the Loglikelihood
1
ln S(20) = ln(1 - 20/ω).
2
ln S(20) = ln(1 - 20/ω).
3
ln f(16) = -lnω.
4
ln S(20) - ln S(10) = ln(1 - 20/ω) - ln(1 - 10/ω).
5
ln f(12) - ln S(10) = -lnω - ln(1 - 10/ω).
6
ln S(20) - ln S(10) = ln(1 - 20/ω) - ln(1 - 10/ω).
Page 231
The loglikelihood is the sum of the contributions: -2lnω + 4ln(1 - 20/ω) - 3ln(1 - 10/ω). Setting the derivative with respect to ω equal to zero: 0 = -2/ω + 80/(ω2(1 - 20/ω)) - 30/(ω2(1 - 10/ω)). ⇔ ω2 - 55ω + 300 = 0. ⇒ ω = 48.86. Comment: For example, the estimate of S(30) is: 1 - 30/48.86 = 38.6%.] Uniform Distribution on an Interval: Often one assume a constant density over short intervals; in other words a uniform distribution over a short interval.88 Exercise: For 50 lives aged 20 you observe deaths at times: 22 and 28. The other 48 lives survive beyond age 30. Let S(t) be the survival function conditional on survival to age 20 for someone aged 20 + t, t > 0. In other words, S(t) = tp 20. Assume that the corresponding density is constant over the interval from t = 0 to 10. (We assume no specific form of the density for t > 10.) Determine the density that maximizes the likelihood. [Solution: Let f(x) = u for t ≤ 10. S(x) = 1 - ut for t ≤ 10. Loglikelihood is: ln(u) + ln(u) + 48ln(1 - 10u) = 2 ln(u) + 48ln(1 - 10u). Setting the derivative with respect to u equal to 0: 0 = 2/u - 480/(1 - 10u). ⇒ u = 1/250 = 0.004.] q20 = Prob[death by age 21 | alive at age 20] = 0.004. 5 p 20
= Prob[alive at age 25 | alive at age 20] = 1 - (5)(0.004) = 0.98.
10p 20
= 1 - (10)(0.004) = 0.96.
Note that since 10p 20 = 0.96, we know that 11p 20 ≤ 0.96. However, we have not assumed that the same density applies beyond age 30, so we can not estimate 11p 20. 88
See Section 3.6 of Actuarial Mathematics.
2013-4-7,
Survival Analysis §8 Maximum Likelihood,
HCM 10/16/12,
Page 232
Constant Hazard Rates: Exercise: Assume a constant hazard rate of 10 for 0 < t < ∞. What is the Survival Function? t
[Solution: H(t) =
∫ h(s) ds = 10t.
S(t) = exp[-H(t)] = e-10t, exponential with mean 1/10.]
0
If one has a constant hazard rate of λ, one has an Exponential Distribution with mean 1/λ. If however one has a constant hazard rate on each of two or more intervals, the Survival Function is somewhat more complicated. For example, assume a hazard rate of 10 for 0 < t ≤ 5, and a hazard rate of 20 for t > 5. Then for t ≤ 5, H(t) = 10t, and for t > 5: t
H(t) =
∫ h(s) ds = (10)(5) + 20(t - 5) = 20t - 50.
S(t) = exp[-H(t)] = exp[50 - 20t].
0
Thus S(t) = e-10t for t ≤ 5 and S(t) = e50-20t for t > 5. In general, if there is a hazard rate of λ1 for 0 < t ≤ b and λ2 for t > b, then S(t) = exp[-λ1t] for t ≤ b and S(t) = exp[-λ1b - λ2(t - b)] for t > b. Exercise: Assume a hazard rate of λ1 for 0 < t ≤ 5, and a hazard rate of λ2 for t > 5. For 100 lives, you observe deaths at times: 1, 3, 6, 9, 12. You stop observing at t = 15, when 95 of the original 100 remain alive. Estimate λ1 and λ2 via maximum likelihood. [Solution: S(t) = exp[-λ1t] for t ≤ 5 and S(t) = exp[-λ15 - λ2(t - 5)] for t > 5. f(t) = λ1exp[-λ1t] for t ≤ 5 and f(t) = λ2exp[-λ15 - λ2(t - 5)] for t > 5. Each of the uncensored values contributes ln f(t) to the loglikelihood, while the 95 censored values each contribute lnS(15) = -λ15 - λ210. The loglikelihood is: lnλ1 - λ1 + lnλ1 - 3λ1 + lnλ2 - 5λ1 - λ2 + lnλ2 - 5λ1 - 4λ2 + lnλ2 - 5λ1 - 7λ2 - 95(-5λ1 - 10λ2) = 2lnλ1 - 494λ1 + 3lnλ2 - 962λ2. Setting the partial derivative with respect to λ1 equal to zero: 0 = 2/λ1 - 494. ⇒ λ1 = 0.00405. Setting the partial derivative with respect to λ2 equal to zero: 0 = 3/λ2 - 962. ⇒ λ2 = 0.00312. Comment: For example, the estimate of S(20) = exp[-(5)(0.00405) - (20 - 5)(0.00312)] = 93.5%.]
Problems: 8.1 (2 points) 10 lives were first observed at age 50. At age 55, one life died and another two lives left the study. The remaining 7 lives survived until age 60, when the study ended. Assuming a constant density between age 50 and 60, use maximum likelihood to estimate q59. (A) Less than 0.9% (B) At least 0.9%, but less than 1.0% (C) At least 1.0%, but less than 1.1% (D) At least 1.1%, but less than 1.2% (E) At least 1.2% 8.2 (3 points) You are modeling the survival times of industrial fans. You assume a constant hazard rate from age 0 to age 15. You assume another constant hazard rate beyond age 15. Seven fans are observed from when they are new. Five of the fans fail at ages: 6, 13, 17, 22, 24. The remaining two fans survive beyond age 25. Using maximum likelihood, estimate S(20). A. 45% B. 46% C. 47% D. 48% E. 49% 8.3 (3 points) S(t) = (1 - t/ω)1/3, 0 < t < ω. 5 lives are observed from birth. One life dies at age 30. The other 4 lives are still alive at age 40. Estimate ω using maximum likelihood. (A) Less than 80 (B) At least 80, but less than 85 (C) At least 85, but less than 90 (D) At least 90, but less than 95 (E) At least 95 8.4 (3 points) 200 individuals are observed starting at age 55. 10 die between ages 55 and 60. 50 leave the study at age 60. 15 die between ages 60 and 65. Let A be the Product-Limit Estimator of 10q55. Assuming the time to death is uniform between age 55 and 65, let B be the maximum likelihood estimate of 10q55. Determine B/A. A. 0.90
B. 0.95    C. 1.00    D. 1.05    E. 1.10
Use the following information for the next 22 questions: Let ti be data on how long 30 new robots survive in days, prior to breaking down and needing to be repaired: 11, 14, 25, 31, 59, 68, 152, 297, 335, 358, 402, 410, 435, 450, 508, 513, 575, 998, 1019, 1189, 2884, 3414, 3803, 4088, 4334, 4712, 7571, 9850, 11736, 13810.
Σ ti = 74,051.    Σ √ti = 1134.19.    Σ ln[ti] = 192.86.    Σ ti² = 580,309,325.
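The four summary statistics quoted above can be reproduced directly from the 30 values; the snippet below is just a verification aid I have added (not part of the original guide), with the guide's reported totals shown in the comments.

```python
import numpy as np

t = np.array([11, 14, 25, 31, 59, 68, 152, 297, 335, 358, 402, 410, 435, 450,
              508, 513, 575, 998, 1019, 1189, 2884, 3414, 3803, 4088, 4334,
              4712, 7571, 9850, 11736, 13810], dtype=float)

print(t.sum())            # 74,051
print(np.sqrt(t).sum())   # about 1134.19
print(np.log(t).sum())    # about 192.86
print((t ** 2).sum())     # 580,309,325
```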
8.5 (1 point) What is the empirical distribution function at 1000? A. 50% B. 55% C. 60% D. 65% E. 70% 8.6 (1 point) What is the empirical survival function at 5000? A. 11%
B. 13%    C. 15%    D. 17%    E. 19%
8.7 (1 point) Fit an Exponential Distribution via maximum likelihood and estimate S(5000). A. 11%
B. 13%    C. 15%    D. 17%    E. 19%
8.8 (3 points) Determine a 90% linear confidence interval for the estimate in the previous question. 8.9 (3 points) Fit a Weibull Distribution with τ = 1/2 via maximum likelihood and estimate S(5000). A. 11%
B. 13%    C. 15%    D. 17%    E. 19%
8.10 (4 points) Determine a 90% linear confidence interval for the estimate in the previous question. 8.11 (4 points) If one fits both θ and τ via maximum likelihood, then θ^ = 1608 and ^τ = 0.590, with a corresponding loglikelihood of -255.58. (i) Use the likelihood ratio test in order to test the hypothesis that this data follows an Exponential Distribution versus the alternate hypothesis that it follows a Weibull Distribution. (ii) Use the likelihood ratio test in order to test the hypothesis that this data follows a Weibull Distribution with τ = 1/2 versus the alternate hypothesis that it follows a Weibull Distribution with both τ and θ unknown.
8.12 (5 points) The maximum likelihood Weibull has θ̂ = 1608 and τ̂ = 0.590. The information matrix, with rows and columns corresponding to (θ, τ), is:
⎛ 0.00000404   -0.00816 ⎞
⎝ -0.00816      158     ⎠
Determine a 90% linear confidence interval for the estimate of S(5000).
8.13 (1 point) Using the Kaplan-Meier Product-Limit Estimator, estimate S(5000).
A. 11%    B. 13%    C. 15%    D. 17%    E. 19%
8.14 (2 points) Determine a 90% linear confidence interval for the estimate in the previous question.
8.15 (3 points) Using the Nelson-Aalen Estimator, estimate S(5000).
A. 11%    B. 13%    C. 15%    D. 17%    E. 19%
8.16 (4 points) Determine a 90% linear confidence interval for the estimate in the previous question.
8.17 (2 points) Assume that the study of robots was ended at 5000 days, so that the last four values are reported as 5000. Fit an Exponential Distribution via maximum likelihood and estimate S(5000).
A. 8%    B. 10%    C. 12%    D. 14%    E. 16%
8.18 (3 points) Determine a 90% linear confidence interval for the estimate in the previous question.
8.19 (4 points) Assume that the study of robots was ended at 5000 days, so that the last four values are reported as 5000. Fit a Weibull Distribution with τ = 1/2 via maximum likelihood and estimate S(5000).
A. 8%    B. 10%    C. 12%    D. 14%    E. 16%
8.20 (4 points) Determine a 90% linear confidence interval for the estimate in the previous question.
8.21 (2 points) Assume that the study was started 500 days after the robots were new, so that the first 14 out of 30 values would not have been reported. For this data truncated from below at 500:
Σ ti = 71,004.    Σ √ti = 951.485.    Σ ln[ti] = 126.26.    Σ ti² = 579,226,186.
Fit an Exponential Distribution via maximum likelihood and estimate S(5000). (A) Less than 10% (B) At least 10%, but less than 15% (C) At least 15%, but less than 20% (D) At least 20%, but less than 25% (E) At least 25% 8.22 (3 points) Determine a 90% linear confidence interval for the estimate in the previous question. 8.23 (3 points) Assume that the study was started 500 days after the robots were new, so that the first 14 out of 30 values would not have been reported. For this data truncated from below at 500:
Σ ti = 71,004.    Σ √ti = 951.485.    Σ ln[ti] = 126.26.    Σ ti² = 579,226,186.
Fit a Weibull Distribution with τ = 1/2 via maximum likelihood and estimate S(5000). (A) Less than 10% (B) At least 10%, but less than 15% (C) At least 15%, but less than 20% (D) At least 20%, but less than 25% (E) At least 25% 8.24 (4 points) Determine a 90% linear confidence interval for the estimate in the previous question. 8.25 (1 point) Assume that the study was started 500 days after the robots were new. Using the Kaplan-Meier Product-Limit Estimator, estimate S(5000)/S(500). (A) Less than 20% (B) At least 20%, but less than 25% (C) At least 25%, but less than 30% (D) At least 30%, but less than 35% (E) At least 35%
8.26 (2 points) Assume that the study was started 500 days after the robots were new. Using the Nelson-Aalen Estimator, estimate S(5000)/S(500). (A) Less than 20% (B) At least 20%, but less than 25% (C) At least 25%, but less than 30% (D) At least 30%, but less than 35% (E) At least 35%
8.27 (3 points) 50 lives are first observed at age 60. 5 of these lives die prior to age 61, with the other 45 lives surviving to age 61. An additional 30 lives are first observed at age 60.6. One of these lives dies prior to age 61. Assuming that there is a constant density between ages 60 and 61, determine the maximum likelihood estimate of q60. A. 9.0%
B. 9.5%    C. 10.0%    D. 10.5%    E. 11.0%
8.28 (2 points) You observe five claims. The times (in months) from reporting to settlement are: 5, 11, 13, 19, 30. A uniform distribution from 0 to ω is fit by maximum likelihood. Compare and contrast this uniform distribution to the empirical distribution function. 8.29 (3 points) S(t) = 1 - t/ω, 0 ≤ t ≤ ω. Ten artificial satellites are observed from when they are new. Six of them fail at times: 3, 8, 12, 17, 19, 22. At time 24, four satellites are still functioning. At this time, Hasty Helen estimates ω via maximum likelihood. The four remaining satellites fail at times: 25, 30, 32, 38. At this time, Patient Pamela estimates ω via maximum likelihood. What is the difference between Pamelaʼs estimate of ω and Helenʼs estimate of ω? A. -2
B. -1    C. 0    D. 1    E. 2
8.30 (3 points) You observe four losses of sizes: 12, 35, 80, and 140. You assume a constant hazard rate from 0 to 20. You assume another constant hazard rate from 20 to 100. You assume yet another constant hazard rate beyond 100. Using maximum likelihood, estimate S(150). A. 5% B. 6% C. 7% D. 8% E. 9%
8.31 (2 points) You observe four lives from birth. The first life left the study at age 45, while still alive. The second life left the study at age 62, while still alive. The third life left the study at age 70, while still alive. The fourth life left the study at age 87, while still alive. Assume a Weibull Distribution with τ = 1/2. Using maximum likelihood, estimate θ. A. 65
B. 66    C. 67    D. 68    E. Can not be determined
8.32 (3 points) Six locomotives were first observed when they were 10 years old. They were observed for y additional years. Four of the locomotives failed at ages: 16, 27, 30, 39. At age 10 + y, the two remaining locomotives are still functioning. The survival function is uniform as a function of age t: S(t) = 1 - t/ω, 0 ≤ t ≤ ω. The maximum likelihood estimate of ω is 76. Determine y. A. 42
B. 43    C. 44    D. 45    E. 46
8.33 (4 points) For a portfolio of policies, you are given: (i) There is no deductible and the policy limit varies by policy. (ii) A sample of ten claims is: 350 350
500   500   500+   1000   1000+   1000+   1200   1500
where the symbol + indicates that the loss exceeds the policy limit.
(iii) Ŝ1(1250) is the Nelson-Aalen estimate of S(1250).
(iv) Ŝ2(1250) is the maximum likelihood estimate of S(1250) under the assumption that the losses follow a Weibull Distribution with τ = 2.
Determine the absolute difference between Ŝ1(1250) and Ŝ2(1250).
(A) 0.045    (B) 0.055    (C) 0.065    (D) 0.075    (E) 0.085
8.34 (160, 11/86, Q.8) (2.1 points) A cohort of four individuals is observed from time t = 0 to time t = 4. You are given: Individual Time of Entry Time of Death A 0 1 B 0 2 C 1 2 D 1 Individual D survives beyond time t = 4. Assuming that the survival distribution function for each individual is of the form S(t) = e-kt, determine the maximum likelihood estimate of k. (A) 1/3
(B) 3/7    (C) 1/2    (D) 4/7    (E) 2/3
8.35 (165, 11/86, Q.10) (1.8 points) For a group of 100 impaired individuals followed from birth, the force of mortality is assumed to be a step-wise linear function of age such that: µx = b, 0 ≤ x < 1; µx = 2b, 1 ≤ x < 2; µx = 3b, 2 ≤ x < 3. You are given that deaths occur at the following exact ages: x Number of Deaths 0.5 40 1.5 40 2.5 20 Determine the maximum likelihood estimate of b. (A) 2/5 (B) 10/23 (C) 10/21 (D) 10/19 (E) 10/17 8.36 (160, 5/87, Q.16) (2.1 points) Five mice are observed over the interval (0, 4]. The time at which each mouse dies is indicated below: Mouse Exact Time of Death 1 2 2 1 3 2 4 5 3 Mouse 4 survives to the end of the observation period. Assuming that the survival distribution function is of the form S(t) = exp[-kt3 /3], t > 0, determine the maximum-likelihood estimator for k. (A) 0.02
(B) 0.04    (C) 0.06    (D) 0.11    (E) 0.33
8.37 (160, 5/87, Q.18) (2.1 points) A mortality study is made of five individuals. The exponential distribution S(x) = e^(-λx) is fit to the data. You are given:
(i) Deaths are recorded at times 2, 3, 5, 8, and s, where 0 < s < 6.
(ii) λ̂1 is the maximum-likelihood estimator of λ.
(iii) λ̂2 is the maximum-likelihood estimator obtained if the data were censored from above at t = 6.
(iv) 1/λ̂2 - 1/λ̂1 = 0.50.
Determine λ̂2.
(A) 0.20    (B) 0.22    (C) 0.25    (D) 0.28    (E) 0.35
8.38 (165, 5/87, Q.10) (2.1 points) For a group of three lives followed from birth, the force mortality is assumed to be µx = kx2 , with k > 0. You are given that the three deaths occurred at exact ages 4, 5, and 6. Determine the maximum-likelihood estimate of k. (A) 1/135 (B) 1/120 (C) 1/77 (D) 1/45 (E) 3/77 8.39 (160, 11/87, Q.3) (2.1 points) You have observed 20 lives over the time interval (0,15). Four deaths occurred during that interval, and the remaining 16 lives are active at t = 15. You are modeling the experience using the following density function: f(t) = 1/w, 0 ≤ t ≤ w. Calculate the maximum likelihood estimate of w. (A) 55 (B) 60 (C) 65 (D) 70 (E) 75 8.40 (160, 11/87, Q.12) (2.1 points) Ten mice are each exact age x at the start of an experiment. Four mice die between time 0 and time 1, and six mice are alive at time 1. All deaths are assumed to occur at age x + 1/2. The Survival Function is assumed to follow a Pareto with α = 1, tp x = θ/(θ + t), 0 ≤ t ≤ 1. Determine the maximum likelihood estimate of the probability of death between time 0 and time 1. (A) 0.24 (B) 0.28 (C) 0.30 (D) 0.37 (E) 0.40 8.41 (160, 5/88, Q.14) (2.1 points) You are given: (i) Ix is linear over the interval [65, 66]. (ii) 10 lives are observed to enter at exact age 65. (iii) 2 of the 10 die before reaching age 66. (iv) 2 of the 10 leave observation alive at age 65.5. Determine the maximum likelihood estimate of q65. (A) 0.221
(B) 0.222    (C) 0.223    (D) 0.224    (E) 0.225
8.42 (160, 5/88, Q.17) (2.1 points) You are given: (i) 4 lives are observed from time t = 0 until death. (ii) Deaths occur at times t = 1, 2, 3, and 4. (iii) The lives are assumed to follow a Weibull distribution with h(t) = kt. Determine the maximum likelihood estimate of k. (A) 0.13 (B) 0.27 (C) 0.40 (D) 0.53 (E) 0.63
8.43 (165, 11/88, Q.11) (1.7 points) For a group of three individuals, the force of mortality is assumed to be a function of age such that µx = kx. You are given the following data regarding age at entry and age at termination: Life Exact Age at Entry Termination I 20 Death at exact age 40 II 30 Death at exact age 40 III 20 Withdrawal at exact age 30 Determine the maximum likelihood estimate of k. (A) 1/1200 (B) 1/1000 (C) 1/800 (D) 1/600
(E) 1/400
8.44 (160, 5/89, Q.12) (2.1 points) From a study of the age interval (70, 71), you are given: (i) There were no withdrawals. (ii) Out of the 80 lives that entered observation at age 70, three deaths occurred, all at age 70.75. (iii) Out of the 26 intermediate entrants, which all occurred at age 70 + r, the only death was at age 70.75. (iv) On the assumption that the underlying survival distribution was linear, the maximum likelihood estimate of q70 is 0.042. Determine r. (A) 0.38 (B) 0.39
(C) 0.40    (D) 0.41    (E) 0.42
8.45 (160, 11/89, Q.11) (2.1 points) From a complete mortality study of n lives, you are given: (i) The n lives were observed from June 1, 1989 through October 30, 1989. (ii) Each life was exact age x on January 1, 1989. (iii) There were four deaths observed. (iv) The maximum likelihood estimate for qx was 0.20, assuming a uniform survival distribution. Determine n. (A) 44 (B) 45
(C) 46    (D) 47    (E) 48
8.46 (160, 11/89, Q.14) (2.1 points) For a group of 12 lives, you are given: (i) There are 5 deaths in the interval (0, 1]. (ii) There are 4 deaths in the interval (1, 2]. (iii) There are 3 deaths in the interval (2, 3]. (iv) The group is assumed to follow an exponential distribution with hazard rate λ. Calculate the maximum likelihood estimate of e−λ. (A) 0.33
(B) 0.45    (C) 0.58    (D) 0.71    (E) 0.83
8.47 (165, 11/89, Q.10) (1.7 points) For a group of four seriously ill individuals diagnosed at the same time, the force of mortality during the first 6 years of observation is of the form µt = kt, where t is the time since diagnosis. The future Iifetimes of these individuals are mutually independent. The following was observed over the six-year period: Individual Status I died at end of 3rd year II still seriously ill III still seriously ill IV cured at end of 5th year Determine the maximum Iikelihood estimate of k. (A) 0.017 (B) 0.019 (C) 0.029 (D) 0.038 (E) 0.057 8.48 (160, 5/90, Q.15) (2.1 points) A mortality study of the first year after diagnosis of a dread disease includes the concomitant variable of smoking status. You are given: (i) The hazard rate of nonsmokers is kt, t > 0. (ii) The hazard rate of smokers is twice that of nonsmokers. (iii) Individual Status Observation Interval Reason for Leaving Observation 1 Nonsmoker (0, 1/2) Death 2 Nonsmoker (0, 1) Survival 3 Smoker (0, 1/4) Death 4 Smoker (0, 1/2) Withdrawal Determine the maximum likelihood estimate of k. (A) 0.7 (B) 0.9 (C) 1.1 (D) 1.6 (E) 2.1 8.49 (160, 5/90, Q.16) (2.1 points) Ten laboratory mice are observed for a period of five days. You are given: (i) Seven mice die during the observation period, with the following distribution of deaths: Exact Time of Death in Days 2 3 4 5 Number of Deaths 1 2 1 3 (ii) The lives in the study are subject to an exponential survival function with hazard rate λ. ^
Determine λ by the method of maximum likelihood. (A) 0.17
(B) 0.20    (C) 0.23    (D) 0.26    (E) 0.29
8.50 (165, 5/90, Q.11) (1.7 points) For a group of four animals followed from birth, the force of mortality is µx = kx3 , with k > 0. The four animals die at exact ages 2, 3, 4, and 5. Determine the maximum likelihood estimate of k. (A) 0.001 (B) 0.004 (C) 0.016 (D) 0.032
(E) 0.071
8.51 (160, 11/90, Q.15) (1.9 points) From a study of 10 lives over the age interval (x, x + 1], you are given: (i) Three deaths are observed. (ii) The only withdrawal is observed at age x + 1/4. (iii) The underlying mortality distribution is uniform. Determine the maximum likelihood estimate of qx. (A) 0.319
(B) 0.321    (C) 0.324    (D) 0.327    (E) 0.329
8.52 (160, 11/90, Q.17) (1.9 points) From a study of 9 laboratory animals over the interval (0, 1], you are given: (i) One dies at 0.25. (ii) Another dies at 0.75. (iii) The other 7 survive to 1.00. (iv) T is the maximum likelihood estimate of q0 under the exponential distribution assumption. (v) U is the maximum likelihood estimate of q0 under the linear distribution assumption. Determine U - T. (A) -0.002 (B) -0.001
(C) 0.000    (D) 0.001    (E) 0.002
8.53 (160, 11/90, Q.20) (1.9 points) You are given: (i) The following two patients are observed for calendar year 1989 following an operation: Patient Date of Operation Date of Death 1 July 1, 1988 July 1, 1989 2 October 1, 1988 --(ii) The underlying mortality distribution is exponential with hazard rate λ. (iii) The maximum likelihood estimate of λ is 1.00. Calculate the cause and date of termination of Patient 2. (A) Died April 1, 1989 (B) Withdrew April 1, 1989 (C) Withdrew July 1, 1989 (D) Died October 1, 1989 (E) Withdrew October 1, 1989 8.54 (165, 11/90, Q.13) (1.9 points) For a group of four individuals that come under observation at age 3, you are given: (i) µx = bx; and (ii) the exact ages at death are 5, 7, 8, and 10 for the four individuals. Determine the maximum likelihood estimate of b. (A) 0.00 (B) 0.01 (C) 0.02 (D) 0.03 (E) 0.04
8.55 (160, 5/91, Q.15) (1.9 points) From a mortality study over the time interval (0, 1], you are given: (i) The lives are assumed to be subject to the exponential survival distribution. (ii) There were 104 lives scheduled to withdraw from the study, all at time t = 1/4. (iii) There were 32 deaths observed. (iv) Eight of the 32 deaths were from those lives scheduled to withdraw. (v) There were no unscheduled withdrawals. (vi) The maximum likelihood estimate of q0 is 1/10. Determine the number of lives that survived under observation to time t = 1. (A) 264 (B) 267 (C) 270 (D) 273 (E) 276 8.56 (160, 5/91, Q.18) (1.9 points) You are given: (i) Four independent lives are observed from time t = 0 until death. (ii) Deaths occur at exact times t = 1, 2, 3, and 4. (iii) The lives are assumed to be subject to the probability density function f(t) = t e-t/c/c2 , t > 0. Calculate the maximum likelihood estimate for c. (A) 0.20 (B) 0.80 (C) 1.25 (D) 2.50
(E) 5.00
8.57 (165, 11/94, Q.13) (1.9 points) For a mortality study covering the period January 1, 1990 to January 1, 1994, the following was observed: Life Age on January 1, 1990 Observation A 1 Died January 1, 1993 B 2 Still Alive January 1, 1994 C 1 Withdrew January 1, 1991 D 4 Withdrew January 1, 1992 The force of mortality has the form µx = kx. Determine the maximum likelihood estimate of k. (A) 1/70
(B) 1/35    (C) 3/70    (D) 3/35    (E) 4/35
8.58 (Course 160 Sample Exam #3, 1994, Q.10) (1.9 points) For a mortality study over (x, x+1], you are given: (i) Ten lives enter at exact age x. (ii) There are no intermediate entrants. (iii) The only three withdrawals occur at x+0.4. (iv) One death is observed. (v) The uniform distribution is assumed for mortality. Calculate the maximum likelihood estimate of qx. (A) 0.120
(B) 0.121    (C) 0.122    (D) 0.123    (E) 0.124
8.59 (Course 160 Sample Exam #3, 1994, Q.14) (1.9 points) For a complete data study, you are given: (i) The force of mortality, h(t) = kt, k > 0, t > 0. (ii) The exact times of death are 1, 1, 2, 4, 5, 6, and 6. Calculate the maximum likelihood estimate of k. (A) 0.09 (B) 0.12 (C) 0.18 (D) 0.25 (E) 0.28 8.60 (165, 5/96, Q.16) (1.9 points) The following table gives most of the observed experience of four lives in a mortality study which began January 1, 1990 and ended January 1, 1995: Life Exact Age on January 1, 1990 Observation A 4 Died January 1, 1991 B 2 Still Alive January 1, 1995 C 5 Died January 1, 1993 D 1 --The force of mortality has the form µx = kx. The maximum likelihood estimate of k is 1/27. Life D did not die while under observation. Determine the observation for life D. (A) Withdrew January 1, 1991 (B) Withdrew January 1, 1992 (C) Withdrew January 1, 1993 (D) Withdrew January 1, 1994 (E) Still Alive January 1, 1995 8.61 (Course 160 Sample Exam #1, 1996, Q.9) (1.9 points) The following data were produced from observing a sample of five individuals to estimate qx: (i) All came under observation at age x + r, 0 ≤ r < 0.5. (ii) One withdrew from the study at age x + 0.75. (iii) Two withdrew from the study at age x + 0.5. (iv) One died at age x + 0.5. (v) One stayed in the study until age x + 1. (vi) tqx = t/(θ + t), 0 ≤ t ≤ 1. (vii) The maximum likelihood estimate of qx is 4/11. Determine r. (A) 0.14 (B) 0.16
(C) 0.18    (D) 0.20    (E) 0.22
8.62 (Course 160 Sample Exam #1, 1996, Q.10) (1.9 points) For five patients observed over (x, x+1], you are given: Patient Age at entry Age at exit Mode of exit 1 x x + 0.3 withdrawal 2 x x + 0.5 death 3 x x + 1.0 survivor 4 x + 0.1 x + 1.0 survivor 5 x + 0.3 x + 0.7 death The underlying survival function is assumed to be linear over the age interval (x, x+1]. Calculate the maximum likelihood estimate of qx. (A) 0.475
(B) 0.500    (C) 0.507    (D) 0.513    (E) 0.522
8.63 (Course 160 Sample Exam #1, 1996, Q.14) (1.9 points) Two patients were observed over (0, 5]. You are given: (i) One died at t = 4. (ii) The other was still alive at t = 5. (iii) The survival function is of the form S(t) = (1 + t/10)−α, t > 0, α > 0. Determine the maximum likelihood estimate for α. (A) 0.6
(B) 0.9    (C) 1.1    (D) 1.3    (E) 2.5
8.64 (Course 160 Sample Exam #2, 1996, Q.11) (1.9 points) For a study over (40, 41], you are given: (i) 50 lives begin the study at exact age 40. (ii) There are no new entrants. (iii) Five deaths are observed. (iv) Ten withdrawals occur, all at age 40.2. (v) Deaths are subject to a uniform distribution. Calculate the maximum likelihood estimate of q40. (A) 0.112
(B) 0.113    (C) 0.116    (D) 0.119    (E) 0.120
8.65 (Course 160 Sample Exam #2, 1996, Q.12) (1.9 points) For a mortality study over (x, x+1], you are given: (i) Ten lives enter observation at age x. (ii) There were no intermediate entries. (iii) There was one death. (iv) One life withdrew from the study at age x + 1/2. (v) The underlying survival function is assume to be linear. Estimate qx using maximum likelihood estimation. (A) 0.100
(B) 0.102    (C) 0.104    (D) 0.106    (E) 0.108
8.66 (Course 160 Sample Exam #2, 1996, Q.13) (1.9 points) You are given: (i) Ten lives are subject to the survival function S(t) = (1 - t/k)1/2, 0 ≤ t ≤ k from birth until death. (ii) The first two deaths in the sample occurred at time t = 10. (iii) The study is ended at t = 10. Calculate the maximum likelihood estimate of k. (A) 25.0 (B) 25.7 (C) 26.4 (D) 27.1 (E) 27.8 8.67 (Course 160 Sample Exam #2, 1996, Q.14) (1.9 points) A sample of 500 light bulbs are tested for failure beginning at time t = 0. You are given: (i) The study is ended at time t = 4. (ii) Five light bulbs fail before time t = 4 with times at failure of 1.1, 3.2, 3.3, 3.5, and 3.9. (iii) Time of failure is subject to an Exponential Distribution, with hazard rate λ. Calculate the maximum likelihood estimate of λ. (A) 0.00249
(B) 0.00251    (C) 0.00253    (D) 0.00255    (E) 0.00257
8.68 (165, 11/97, Q.16) (1.9 points) For a group of 20 lives exactly age 2, the force of mortality is assumed to be µx = 1/(b + x), b ≥ 0, x ≥ 2. There is 1 death and 19 withdrawals at exact age 3. Determine the maximum likelihood estimate of b. (A) 14 (B) 15 (C) 16 (D) 17
(E) 18
8.69 (Course 160 Sample Exam #3, 1997, Q.11) (1.9 points) Two research teams studied five diseased cows. You are given: (i) The survival distribution function is S(t) = (ω - t)/ω, 0 ≤ t ≤ ω. (ii) Each cow came under observation at t = 0 . (iii) The times of death were 1, 3, 4, 4, and 6. (iv) Research Team X, impatient to publish results, terminated its observations at t = 5 and estimated ω using the method of maximum likelihood with incomplete data. (v) Research Team Y waited for the last cow to die and estimated ω using the method of maximum likelihood with complete data. Compute the absolute value of the difference between Research Team X's and Research Team Y's maximum likelihood estimates of ω. (A) 0.25
(B) 0.30    (C) 0.35    (D) 0.40    (E) 0.45
8.70 (Course 160 Sample Exam #3, 1997, Q.13) (1.9 points) For a study of four automobile engines, you are given: (i) The engines are subject to a uniform survival distribution S(t) = (ω - t)/ω, 0 ≤ t ≤ ω. (ii) Failures occurred at times 4, 5, and 7; the remaining engine was still operational at time r. (iii) The observation period was from time 3 to time r. (iv) The MLE of ω is 13.67. Determine r. (A) 11 (B) 12
(C) 13    (D) 14    (E) 15
8.71 (Course 160 Sample Exam #3, 1997, Q.18) (1.9 points) Strike activities in an industry were observed from 6/1/96 to 8/31/96. You are given: (i) Plant Start Date of Strike End Date of Strike A June 10 June 20 B July 5 July 20 C July 25 August 8 D August 11 August 15 E August 24 Unsettled as of August 31 (ii) There are no other strikes in effect during the observation period. (iii) Three models are suggested to study strike durations: I: Exponential with mean 12.5 days. II: Exponential with mean 10 days. III: Weibull with θ = 14.14 days and τ = 2 (iv) Goodness of fit of a model is measured by the size of its likelihood. Rank the three models by goodness of fit from best to worst. (A) I, II, III (B) II, I, III (C) II, III, I (D) III, I, II (E) III, II, I 8.72 (Course 160 Sample Exam #3, 1997, Q.19) (1.9 points) For an industrial company that purchases 10 identical new machines, you are given: (i) The times of failure of the machines are 3, 7, 8, 12, 12, 13, and 14. (ii) Three machines had not failed at time 15, the end of the observation period. (iii) The survival pattern of the machines follows the piecewise exponential model defined by hazard rates: λ1 on 0 ≤ t < 5, λ2 on 5 ≤ t < 10, and λ3 on t ≥ 10. (iv) λ1, λ2, and λ3 are estimated using the maximum likelihood approach. ^
Calculate Ŝ(12).
(A) 0.510    (B) 0.516    (C) 0.518    (D) 0.520    (E) 0.522
8.73 (Course 160 Sample Exam #1, 1999, Q.10) (1.9 points) For a fertility study over the interval (35, 36], you are given: (i) Pregnancies are assumed to be subject to a uniform distribution. (ii) Number of Lives Age of Entry Scheduled Ending Age Number Who Became Pregnant 10 35.4 36.0 6 20 35.0 36.0 8 Calculate the maximum likelihood estimate of the pregnancy rate over the interval (35, 36]. (A) 0.51 (B) 0.53 (C) 0.55 (D) 0.57 (E) 0.59 8.74 (Course 160 Sample Exam #1, 1999, Q.14) (1.9 points) For a study of failure time of batteries, you are given: (i) The hazard rate is h(t) = λ1 for 0 ≤ t < 2, and h(t) = λ2 for t ≥ 2. (ii) Five batteries contribute to the study. (iii) Each battery is observed from age 0. (iv) The results of the study are: Battery Observation 1 Fails at age 1.7 2 Leaves the study at age 1.5 3 Leaves the study at age 2.6 4 Fails at age 3.3 5 Leaves the study at age 3.5 Determine the maximum likelihood estimate of λ2 - λ 1. (A) 0.17
(B) 0.19    (C) 0.21    (D) 0.23    (E) 0.25
8.75 (4, 11/05, Q.6 & 2009 Sample Q.218) (2.9 points) The random variable X has survival function:
S_X(x) = θ⁴ / (θ² + x²)².
Two values of X are observed to be 2 and 4. One other value exceeds 4. Calculate the maximum likelihood estimate of θ. (A) Less than 4.0 (B) At least 4.0, but less than 4.5 (C) At least 4.5, but less than 5.0 (D) At least 5.0, but less than 5.5 (E) At least 5.5
8.76 (4, 11/06, Q.18 & 2009 Sample Q.262) (2.9 points) You are given: (i) At time 4 hours, there are 5 working light bulbs. (ii) The 5 bulbs are observed for p more hours. (iii) Three light bulbs burn out at times 5, 9, and 13 hours, while the remaining light bulbs are still working at time 4 + p hours. (iv) The distribution of failure times is uniform on (0, ω). (v) The maximum likelihood estimate of ω is 29. Determine p. (A) Less than 10 (B) At least 10, but less than 12 (C) At least 12, but less than 14 (D) At least 14, but less than 16 (E) At least 16
Solutions to Problems: 8.1. E. Let the density be u. S(t) = 1 - u(t - 50). f(t) = u. The death at age 55 contributes: ln f(55) = ln u. The two exits at age 55 contribute: 2 ln S(55) = 2 ln(1 - 5u). The seven exits at age 60 contribute: 7 ln S(60) = 7 ln(1 - 10u). The loglikelihood is: ln(u) + 2 ln(1 - 5u) + 7 ln(1 - 10u). Setting the derivative with respect to u equal to zero: 1/u - 10/(1 - 5u) - 70/(1 - 10u) = 0. ⇒ 500u2 - 95u + 1 = 0 ⇒ u = 0.01118. q59 = u/(1 - 9u) = 1.24%. Comment: See Exercise 15.31 in Loss Models. 8.2. E. Let the first hazard rate be λ1 and the second be λ2.
H(t) = ∫_0^t h(x) dx.
For t ≤ 15, H(t) = ∫_0^t λ1 dx = λ1 t.
For t > 15, H(t) = ∫_0^t h(x) dx = ∫_0^15 λ1 dx + ∫_15^t λ2 dx = 15λ1 + λ2(t - 15).
S(t) = exp[-H(t)]. For t ≤ 15, S(t) = exp[-λ1 t]. For t > 15, S(t) = exp[-15λ1 - λ2(t - 15)].
ln S(25) = -15λ1 - λ2(25 - 15) = -15λ1 - 10λ2.
f(t) = -dS(t)/dt = h(t) S(t). For t ≤ 15, f(t) = λ1 exp[-λ1 t]. ln f(6) = ln λ1 - 6λ1.
For t > 15, f(t) = λ2 exp[-15λ1 - λ2(t - 15)]. ln f(22) = ln λ2 - 15λ1 - 7λ2.
loglikelihood is: ln f(6) + ln f(13) + ln f(17) + ln f(22) + ln f(24) + 2 ln S(25)
= (ln λ1 - 6λ1) + (ln λ1 - 13λ1) + (ln λ2 - 15λ1 - 2λ2) + (ln λ2 - 15λ1 - 7λ2) + (ln λ2 - 15λ1 - 9λ2) + 2(-15λ1 - 10λ2)
= 2 ln λ1 + 3 ln λ2 - 94λ1 - 38λ2.
Setting the partial derivative with respect to λ1 equal to zero: 2/λ1 = 94. ⇒ λ1 = 1/47.
Setting the partial derivative with respect to λ2 equal to zero: 3/λ2 = 38. ⇒ λ2 = 3/38.
S(20) = exp[-15λ1 - 5λ2] = exp[-15/47 - 15/38] = e^(-0.7139) = 0.490.
Alternately, since a constant hazard rate would be an Exponential Distribution, one can apply the following shortcuts. To get the first hazard rate, censor the data from above at 15: 6, 13, 15+, 15+, 15+, 15+, 15+. Now fit an Exponential via maximum likelihood: θ1 = (sum of payments)/(# of uncensored values) = {6 + 13 + (5)(15)}/2 = 47. ⇒ λ1 = 1/47. To get the second hazard rate, truncate and shift the data from below at 15: 2, 7, 9, 10+, 10+. Now fit an Exponential via maximum likelihood: θ2 = (sum of payments)/(# of uncensored values) = (2 + 7 + 9 + 10 + 10)/3 = 38/3. ⇒ λ2 = 3/38. Up to 15, we get an Exponential: S(t) = exp[-λ1 t] = exp[-t/47]. ⇒ S(15) = exp[-15/47]. Beyond 15, S(t)/S(15) = exp[-λ2(t - 15)] = exp[-3(t - 15)/38]. ⇒ S(20)/S(15) = exp[-(5)(3)/38]. S(20) = exp[-15/47] exp[-15/38] = e^(-0.7139) = 0.490.
Comment: One could graph the hazard rate (1/47 up to age 15, 3/38 thereafter), the cumulative hazard rate, the survival function, and the density; the graphs are not reproduced here.
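A small sketch of the censor-above / truncate-and-shift shortcut described above (my own illustration, not from the original solution); it simply applies θ̂ = (total observed time)/(number of deaths) on each piece.

```python
import numpy as np

times = np.array([6.0, 13.0, 17.0, 22.0, 24.0])   # observed failure ages
n_censored, censor_age, b = 2, 25.0, 15.0

# Piece 1: censor everything above b = 15.
t1 = np.minimum(np.concatenate([times, np.full(n_censored, censor_age)]), b)
deaths1 = np.sum(times <= b)
lam1 = deaths1 / t1.sum()                      # 2 / 94 = 1/47

# Piece 2: truncate and shift from below at b = 15.
over = np.concatenate([times[times > b], np.full(n_censored, censor_age)]) - b
deaths2 = np.sum(times > b)
lam2 = deaths2 / over.sum()                    # 3 / 38

S20 = np.exp(-lam1 * b - lam2 * (20.0 - b))
print(lam1, lam2, round(float(S20), 3))        # about 0.0213, 0.0789, 0.49
```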
8.3. A. f(t) = -dS(t)/dt = (1/3) (1 - t/ω)-2/3/ω. The loglikelihood is: ln f(30) + 4 ln S(40) = -ln(3) - (2/3)ln(1 - 30/ω) - ln(ω) + (4/3) ln(1 - 40/ω). Setting the derivative with respect to ω equal to zero: -20/(ω2 - 30ω) - 1/ω + (160/3)/(ω2 - 40ω) = 0. ⇒ 3ω2 - 310ω + 6000 = 0. ⇒ ω = 77.5. Comment: Similar to Exercise 15.32 in Loss Models. If instead the survival function were uniform, then the maximum likelihood fit is ω = 200.
8.4. B. Using the Product-Limit estimator: 10p 55 = (190/200)(125/140) =0 .8482. The estimate of 10q55 is: A = 1 - 0.8482 = 0.1518. Assuming uniformity, the likelihood is: {S(55) - S(60)}10 S(60)50 {S(65) - S(60)}15 S(65)125. Letting the uniform density be u, the likelihood is: (5u)10 (1 - 5u)50 (5u)15 (1 - 10u)125 ~ u25(1 - 5u)50(1 - 10u)125. Setting the derivative with respect to u equal to zero: 0 = 25u24(1 - 5u)50(1 - 10u)125 - 250u25(1 - 5u)49(1 - 10u)125 - 1250u25(1 - 5u)50(1 - 10u)124.
⇒ 0 = (1 - 5u)(1 - 10u) - 10u(1 - 10u) - 50u(1 - 5u). ⇒ 400u2 - 75u + 1 = 0. ⇒ u = 0.01445. ⇒ The estimate of 10q55 is: B = (10)(0.01445) = 0.1445. B/A = 0.1445/0.1518 = 0.952. Comment: Similar to Exercise 15.36 in Loss Models. 8.5. C. (# ≤ 1000)/(total #) = 18/30 = 60%. 8.6. B. (# > 5000)/(total #) = 4/30 = 13.3%. 8.7. B. θ = Σ ti / 30 = 74051/30 = 2468.4. S(5000) = e-5000/2468.4 = 13.2%. Comment: The corresponding maximum loglikelihood is: -Σ ti /θ - Nln(θ) = -30 - 30ln2468.4 = -264.34. 8.8. Var[ θ^ ] = Var[ X ] = Var[X]/30 = θ2/30 = 2468.42 /30 = 203,100. S(5000) = exp[-5000/θ].
∂S(5000)/ ∂θ = exp[-5000/θ](5000/θ2) = exp[-5000/2468.4](5000/2468.42) = 0.00010825. ^
Var[ S(5000)] = (∂S(5000)/ ∂θ)2 Var[ θ^ ] = (0.00010825)2 (203,100) ^
StdDev[Ŝ(5000)] = (0.00010825) √203,100 = 0.0488. 90% linear confidence interval: 0.132 ± (1.645)(0.0488) = 0.132 ± 0.080 = (0.052, 0.212). Comment: See “Mahlerʼs Guide to Fitting Loss Distributions.”
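A sketch of the delta-method calculation in 8.8 (an added illustration only; it assumes the Exponential MLE θ̂ = x̄ with Var[θ̂] = θ̂²/n, as in the solution above):

```python
import numpy as np

n, theta = 30, 74051 / 30            # MLE of the Exponential mean, about 2468.4
var_theta = theta ** 2 / n           # about 203,100

S = np.exp(-5000 / theta)            # point estimate, about 0.132
dS_dtheta = S * 5000 / theta ** 2    # derivative of S(5000) with respect to theta
se = abs(dS_dtheta) * np.sqrt(var_theta)

z = 1.645
print(round(S, 3), (round(S - z * se, 3), round(S + z * se, 3)))
# about 0.132 and (0.052, 0.212)
```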
8.9. C. The density of Weibull is: f(x) = τ(x/θ)τ exp(-(x/θ)τ) /x. ln f(x) = ln(τ) + (τ-1)ln(x) - (x/θ)τ - τln(θ). loglikelihood = Σ ln f(ti) = N ln(τ) + (τ-1) Σ ln(ti) - (Σ tiτ) /θτ - Nτln(θ) = 30ln(0.5) + 192.86(0.5 - 1) - (Σ ti.5) /θ.5 - 30(1/2)ln(θ) = -117.22 - 1134.19 /θ.5 - 15ln(θ). Setting the derivative with respect to θ equal to zero: 567.095 /θ1.5 - 15/θ = 0.
⇒ θ = 37.8062 = 1429.3. ⇒ S(5000) = exp[-(5000/1429.3).5] = 15.4%. Alternately, for a Weibull with tau fixed, θτ = (Σ xiτ) /N. θ.5 = 1134.19/30.
⇒ θ = 1429.3. ⇒ S(5000) = 15.4%. Comment: The corresponding maximum loglikelihood is: -117.22 - 1134.19 /θ.5 - 15ln(θ) = -117.22 - 1134.19 /1429.3.5 - 15ln(1429.3) = -256.20. 8.10. ln f(x) = ln(τ) + (τ-1)ln(x) - (x/θ)τ - τln(θ). ∂ ln f(x) / ∂θ = τxτ/θτ+1 - τ/θ. ∂2 ln f(x) / ∂θ2 = τ/θ2 - τ(τ+1) xτ θ−(τ+2). nE[∂2 ln f(x) / ∂θ2] ≅ (30)(0.5)/1429.32 - {(0.5)(1.5)/1429.32.5}Σ xi0.5 = 0.000007343 - (0.75/1429.32.5)(1134.19) = -0.000003671. Var[ θ^ ] ≅ - 1/{nE[∂2 ln f(x) / ∂θ2]} = 272,381. S(5000) = exp[-(5000/θ)0.5]. ∂S(5000)/ ∂θ = exp[-(5000/θ)0.5](1/2)50000.5/θ1.5 = (1/2)exp[-(5000/1429.3)0.5]50000.5/1429.31.5 = 0.0001008. ^
Var[ S(5000)] ≅ (∂S(5000)/ ∂θ)2 Var[ θ^ ] = (0.0001008)2 (272,381) ^
StdDev[Ŝ(5000)] = (0.0001008) √272,381 = 0.0526. 90% linear confidence interval: 0.154 ± (1.645)(0.0526) = 0.154 ± 0.087 = (0.067, 0.241). Comment: Difficult. See “Mahlerʼs Guide to Fitting Loss Distributions.”
8.11. Using the likelihood ratio test in order to test the hypothesis that this data follows an Exponential Distribution versus the alternate hypothesis that it follows a Weibull Distribution, the statistic is: 2{-255.58 - (-264.34)} = 17.52. Comparing to the Chi-Square Distribution with one degree of freedom, 17.52 > 7.88. Thus we reject at 0.5% the Exponential in favor of the Weibull.
Using the likelihood ratio test in order to test the hypothesis that this data follows a Weibull Distribution with τ = 1/2 versus the alternate hypothesis that it follows a Weibull Distribution, the statistic is: 2{-255.58 - (-256.20)} = 1.24. Comparing to the Chi-Square Distribution with one degree of freedom, 1.24 < 3.84. Thus we do not reject at 5% the Weibull Distribution with τ = 1/2 in favor of the general Weibull Distribution.
8.12. The estimate of S(5000) is: exp[-(5000/1608)^0.59] = 14.2%.
The covariance matrix is the inverse of the information matrix:
⎛ 276,352    14.3    ⎞
⎝ 14.3       0.00708 ⎠
S(5000) = exp[-(5000/θ)^τ].
∂S(5000)/∂θ = exp[-(5000/θ)^τ] τ 5000^τ / θ^(τ+1) = exp[-(5000/1608)^0.59] (0.5898) 5000^0.59 / 1608^1.59 = 0.0001016.
∂S(5000)/∂τ = -exp[-(5000/θ)^τ] (5000/θ)^τ ln(5000/θ) = -exp[-(5000/1608)^0.59] (5000/1608)^0.59 ln(5000/1608) = -0.314.
Gradient Vector = (0.0001016, -0.314).
Var[Ŝ(5000)] ≅ (0.0001016, -0.314) ⎛ 276,352  14.3    ⎞ ⎛ 0.0001016 ⎞ = 0.00264.
                                    ⎝ 14.3     0.00708 ⎠ ⎝ -0.314    ⎠
90% linear confidence interval: 0.142 ± 1.645 √0.00264 = 0.142 ± 0.085 = (0.057, 0.227).
Comment: Difficult. See “Mahlerʼs Guide to Fitting Loss Distributions.”
8.13. B. Since there is no truncation or censoring, the Kaplan-Meier Product-Limit Estimator is equal to the empirical Survival Function at 5000: 4/30 = 13.3%.
Comment: The empirical methods, such as the Kaplan-Meier Product-Limit Estimator and Nelson-Aalen Estimator, require no assumption as to the form of the distribution.
8.14. The variance is that of the empirical survival function: (4/30)(1 - 4/30)/30 = 0.00385.
0.133 ± 1.645 √0.00385 = 0.133 ± 0.102 = (0.031, 0.235).
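The gradient-times-covariance-matrix step in 8.12 can be checked with a few lines of linear algebra. This is my own added sketch (not part of the original solution); it takes the information matrix from the problem as given and uses the fitted θ̂ = 1608, τ̂ = 0.590.

```python
import numpy as np

theta, tau = 1608.0, 0.590
info = np.array([[0.00000404, -0.00816],
                 [-0.00816, 158.0]])          # information matrix from the problem
cov = np.linalg.inv(info)                     # about [[276,352, 14.3], [14.3, 0.00708]]

S = np.exp(-(5000 / theta) ** tau)            # about 0.142
# gradient of S(5000) with respect to (theta, tau)
dS_dtheta = S * tau * 5000 ** tau / theta ** (tau + 1)
dS_dtau = -S * (5000 / theta) ** tau * np.log(5000 / theta)
grad = np.array([dS_dtheta, dS_dtau])

var_S = grad @ cov @ grad                     # about 0.00264
half_width = 1.645 * np.sqrt(var_S)
print(round(S, 3), round(S - half_width, 3), round(S + half_width, 3))
# about 0.142, 0.057, 0.227
```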
8.15. C. Ĥ(5000) = 1/30 + 1/29 + 1/28 + ... + 1/5 = 1.91165. Ŝ(5000) = e^(-1.91165) = 14.8%.
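A short check of the Nelson-Aalen figures (an added illustration, not from the original text). With no censoring or truncation, the risk set at the k-th smallest of the 30 observations is 31 - k, so the 26 failures below 5000 have risk sets 30 down to 5; the same sums give the variance used in 8.16 below.

```python
import numpy as np

risk_sets = np.arange(30, 4, -1)                 # 30, 29, ..., 5
H = np.sum(1.0 / risk_sets)
print(round(H, 5), round(np.exp(-H), 3))         # about 1.91165 and 0.148
print(round(np.sum(1.0 / risk_sets ** 2), 6))    # about 0.188539 (variance of H-hat)
```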
8.16. The variance of Ĥ(5000) is: 1/30² + 1/29² + 1/28² + ... + 1/5² = 0.188539.
90% linear confidence interval for Ĥ(5000) is: 1.91165 ± 1.645 √0.188539 = 1.912 ± 0.714 = (1.198, 2.626).
Multiplying by -1 and exponentiating, a 90% linear confidence interval for S(5000) is: (0.072, 0.302).
8.17. A. ln f(t) = -t/θ - ln(θ). ln S(5000) = -5000/θ.
loglikelihood is: -Σ_{i=1}^{26} ti/θ - 26 ln(θ) + (4)(-5000/θ) = -31084/θ - 26 ln(θ) - 20000/θ = -51084/θ - 26 ln(θ).
Setting the derivative with respect to θ equal to zero: 0 = 51084/θ² - 26/θ.
⇒ θ = 51084/26 = 1965. ⇒ S(5000) = e-5000/1965 = 7.85%. Comment: Similar to Exercise 15.33 in Loss Models. The corresponding maximum loglikelihood is -233.16. Using the Kaplan-Meier Product-Limit Estimator, since there is no truncation and the first censorship point is 5000, the estimate of S(5000) is equal to one minus the empirical Distribution Function at 5000: 1 - 26/30 = 4/30 = 13.3%. Using the Nelson-Aalen Estimator, ^ ^ H(5000) = 1/30 + 1/29 + 1/28 + ...+ 1/5 = 1.91165, and S(5000) = e-1.91165 = 14.8%.
8.18. ∂ loglikelihood/ ∂θ = 51084/θ2 - 26/θ. ∂2 loglikelihood/ ∂θ2 = -102168/θ3 + 26/θ2. Var[ θ^ ] ≅ -1/ (-102168/θ3 + 26/θ2) = θ3/(102168 - 26θ) = 19653 /(102168 - (26)(1965)) = 148,544. S(5000) = exp[-5000/θ].
∂S(5000)/ ∂θ = exp[-5000/θ](5000/θ2) = exp[-5000/1965](5000/19652) = 0.00010167. ^
Var[ S(5000)] ≅ (∂S(5000)/ ∂θ)2 Var[ θ^ ] = (0.00010167)2 (148,544). ^
StdDev[Ŝ(5000)] = (0.00010167) √148,544 = 0.0392. 90% linear confidence interval: 0.079 ± (1.645)(0.0392) = 0.079 ± 0.064 = (0.015, 0.143). Comment: See “Mahlerʼs Guide to Fitting Loss Distributions.”
8.19. E. ln f(x) = ln(τ) + (τ-1)ln(x) - (x/θ)τ - τln(θ) = ln(0.5) - ln(x)/2 - x0.5/θ0.5 - ln(θ)/2. ln S(5000) = -(5000/θ)τ = -70.71/θ0.5. 26
loglikelihood is: Σ {ln(0.5) - ln(x)/2 - x0.5/θ0.5 - ln(θ)/2} + (4)(-70.71/θ0.5) = i
-26ln(2) - 155.83/2 - 722.08/θ0.5 - 13ln(θ) - 282.84/θ0.5 = -26ln(2) - 77.92 - 1004.92/θ0.5 - 13ln(θ). Setting the derivative with respect to θ equal to zero: 0 = 502.46/θ1.5 - 13/θ.
⇒ θ = (502.46/13)2 = 1494. ⇒ S(5000) = exp[-(5000/1494)0.5] = 16.1%. Comment: The corresponding maximum loglikelihood is -216.955. If one fits both θ and τ via maximum likelihood, then using a computer to numerically maximize the loglikelihood: θ^ = 1568 and ^τ = 0.589, with a corresponding loglikelihood of -216.475. Using this fitted Weibull, S(5000) = exp(-(5000/1568)0.589) = 13.8%. 8.20. ∂ loglikelihood/ ∂θ = 502.46/θ1.5 - 13/θ. ∂2 loglikelihood/ ∂θ2 = -753.69/θ2.5 + 13/θ2. Var[ θ^ ] ≅ -1/ (-753.69/θ2.5 + 13/θ2) = θ2.5/(753.69 - 13θ.5) = 14942.5 / {753.69 - (13)(1494.5)} = 343,431. S(5000) = exp[-(5000/θ)0.5]. ∂S(5000)/ ∂θ = exp[-(5000/θ)0.5](1/2)50000.5/θ1.5 = (1/2)exp[-(5000/1494)0.5]50000.5/14941.5 = 0.00009827. ^
Var[ S(5000)] ≅ (∂S(5000)/ ∂θ)2 Var[ θ^ ] = (0.00009827)2 (343,431) ^
StdDev[Ŝ(5000)] = (0.00009827) √343,431 = 0.0576. 90% linear confidence interval: 0.161 ± (1.645)(0.0576) = 0.161 ± 0.095 = (0.066, 0.256). Comment: Difficult. See “Mahlerʼs Guide to Fitting Loss Distributions.”
8.21. E. ln f(t) = -t/θ - ln(θ). ln S(500) = -500/θ. Each value contributes: ln f(x) - ln S(500). loglikelihood is: Σ (-ti /θ - ln(θ) + 500/θ) = -71004/θ - 16ln(θ) + (16)(500)/θ = -63004/θ - 16ln(θ). Setting the derivative with respect to θ equal to zero: 0 = 63004/θ2 - 16/θ.
⇒ θ = 63004/16 = 3938. ⇒ S(5000) = e-5000/3938 = 28.1%. Comment: The corresponding maximum loglikelihood is -148.454. 8.22. ∂ loglikelihood/ ∂θ = 63004/θ2 - 16/θ. ∂2 loglikelihood/ ∂θ2 = -126008/θ3 + 16/θ2. Var[ θ^ ] ≅ - 1/ (-126008/θ3 + 16/θ2) = θ3/(126008 - 16θ) = 39383 /{126008 - (16)(3938)} = 969363. S(5000) = exp[-5000/θ].
∂S(5000)/ ∂θ = exp[-5000/θ](5000/θ2) = exp[-5000/3938](5000/39382) = .00009057. ^
Var[ S(5000)] ≅ (∂S(5000)/ ∂θ)2 Var[ θ^ ] = (0.00009057)2 (969,363) ^
StdDev[Ŝ(5000)] = (0.00009057) √969,363 = 0.0892. 90% linear confidence interval: 0.281 ± (1.645)(0.0892) = 0.281 ± 0.147 = (0.134, 0.428). Comment: See “Mahlerʼs Guide to Fitting Loss Distributions.”
8.23. B. ln f(x) = ln(τ) + (τ-1)ln(x) - (x/θ)τ - τln(θ) = ln(0.5) - ln(x)/2 - x0.5/θ0.5 - ln(θ)/2. ln S(500) = -(500/θ)τ = -22.36/θ0.5. Each value contributes: ln f(x) - ln S(500). loglikelihood is: Σ (ln(0.5) - ln(ti)/2 - ti0.5/θ0.5 - ln(θ)/2 + 22.36/θ0.5 = -16ln(2) - 126.26/2 - 951.485/θ0.5 - 16ln(θ)/2 + (16)(22.36)/θ0.5 = -593.725/θ0.5 - 8ln(θ) - 74.22. Setting the derivative with respect to θ equal to zero: 0 = 296.863/θ1.5 - 8/θ.
⇒ θ = (296.863/8)² = 1377. ⇒ S(5000) = exp[-(5000/1377)^0.5] = 14.9%.
Alternately, we can use the shortcut for a Weibull with tau fixed:
Σ (Min[xi, ui]^τ - di^τ) = Σ (√ti - √500) = 951.485 - 16√500.
θ̂ = {Σ (Min[xi, ui]^τ - di^τ) / (number of uncensored values)}^(1/τ) = (951.485/16 - √500)² = 1377.
⇒ S(5000) = exp[-(5000/1377)0.5] = 14.9%. Comment: The corresponding maximum loglikelihood is -148.043. For a Weibull Distribution with both θ and τ unknown, using a computer to numerically maximize the loglikelihood: θ^ = 2528 and ^τ = 0.6787, with a corresponding loglikelihood of -147.855. Using this fitted Weibull, S(5000) = exp(-(5000/2528)0.6787) = 20.4%. 8.24. ∂ loglikelihood/ ∂θ = 296.863/θ1.5 - 8/θ. ∂2 loglikelihood/ ∂θ2 = -445.295/θ2.5 + 8/θ2. Var[ θ^ ] ≅ - 1/ (-445.295/θ2.5 + 8/θ2) = θ3/(445.295θ.5 - 8θ) = 13773 /{445.295 1377 - (8)(1377)} = 474,034. S(5000) = exp[-(5000/θ)0.5]. ∂S(5000)/ ∂θ = exp[-(5000/θ)0.5](1/2)50000.5/θ1.5 = (1/2)exp[-(5000/ 1377)0.5]50000.5 / 13771.5 = 0.00010292. ^
Var[ S(5000)] ≅ (∂S(5000)/ ∂θ)2 Var[ θ^ ] = (0.00010292)2 (474,034) ^
StdDev[Ŝ(5000)] = (0.00010292) √474,034 = 0.0709. 90% linear confidence interval: 0.149 ± (1.645)(0.0709) = 0.149 ± 0.117 = (0.032, 0.266). Comment: Difficult. See “Mahlerʼs Guide to Fitting Loss Distributions.”
8.25. C. (15/16)(14/15)...(4/5) = 4/16 = 25%. Comment: Since all the data is truncated from below at 500, the estimate is conditional on survival to 500. Using the maximum likelihood Exponential, the estimate of S(5000)/S(500) is: e-5000/3938 / e-500/3938 = e-4500/3938 = 31.9%. ^
8.26. C. Conditional on survival to 500, H(5000) = 1/16 + 1/15 + 1/14 + ... + 1/5 = 1.2974. ^ ^ S(5000)/ S(500) = e-1.2974 = 27.3%.
Comment: Using the maximum likelihood Weibull, the estimate of S(5000)/S(500) is: exp(-(5000/2528)0.6787)/exp(-(500/2528)0.6787) = 28.5%. 8.27. B. Assume between ages 60 and 61, that f(t) = w. Note that q60 = w. Contribution of 5 lives who died prior to age 61 is: 5 ln(w). Contribution of 45 lives who survived to age 61 is: 45 ln(1 - w). Contribution of 1 life who died prior to age 61 is: ln[f(t)/S(60.6)] = ln(w) - ln(1 - 0.6w). Contribution of 29 lives is: 29ln[S(61)/S(60.6)] = 29{ln(1 - w) - ln(1 - 0.6w)}. The loglikelihood is: 6 ln(w) + 74ln(1 - w) - 30ln(1 - 0.6w). Setting the partial derivative with respect to w equal to zero: 6/w - 74/(1 - w) + 18/(1 - 0.6w) = 0. ⇒ 15w2 - 32.8w + 3 = 0. ⇒ w = 0.0956. Comment: Similar to Exercise 15.29 in Loss Models.
8.28. The maximum likelihood uniform distribution has ω = the maximum observation = 30. Thus f(t) = 1/30, 0 ≤ t ≤ 30. F(t) = t/30, 0 ≤ t ≤ 30. In contrast to this continuous distribution, the empirical distribution function has jump discontinuities at each observation: F(t) = 0, 0 ≤ t < 5. F(t) = 1/5, 5 ≤ t < 11. F(t) = 2/5, 11 ≤ t < 13. F(t) = 3/5, 13 ≤ t < 19. F(t) = 4/5, 19 ≤ t < 30. F(t) = 1, t ≥ 30. So while both distributions are zero at 0 and one at 30, the two are significantly different. [Graph omitted: the empirical step function plotted against the straight line F(t) = t/30, with the uniform distribution dashed.]
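The comparison described in 8.28 is easy to reproduce; the sketch below (my own addition, not part of the original solution) evaluates both distribution functions on a grid and reports the largest gap between them.

```python
import numpy as np

obs = np.array([5, 11, 13, 19, 30], dtype=float)
omega = obs.max()                    # maximum likelihood omega = 30

grid = np.linspace(0, 40, 401)
F_uniform = np.clip(grid / omega, 0, 1)                           # fitted uniform CDF
F_empirical = np.searchsorted(np.sort(obs), grid, side="right") / len(obs)

# largest gap between the two distribution functions on this grid
# (roughly 0.2, occurring just below t = 30)
print(round(np.max(np.abs(F_uniform - F_empirical)), 3))
```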
8.29. A. Pamelaʼs estimate of ω is the maximum of the ten uncensored observations: 38. For Helen, the loglikelihood is: ln f(3) + ln f(8) + ln f(12) + ln f(17) + ln f(19) + ln f(22) + 4 S(24) = -6ln(ω) + 4 ln(1 - 24/ω). Setting the partial derivative with respect to ω equal to zero: 0 = -6/ω + 96/(ω2 - 24ω). ⇒ 6ω = 240. ⇒ ω = 40. The difference between Pamelaʼs estimate of ω and Helenʼs estimate of ω: 38 - 40 = -2. Comment: Similar to Exercise 15.32 in Loss Models.
8.30. D. Let the hazard rates be λ1, λ2, and λ3. H(t) = ∫_0^t h(x) dx.
For t ≤ 20, H(t) = λ1 t. For 20 < t ≤ 100, H(t) = 20λ1 + λ2(t - 20). For t > 100, H(t) = 20λ1 + 80λ2 + λ3(t - 100).
S(t) = exp[-H(t)]. For t ≤ 20, S(t) = exp[-λ1 t]. For 20 < t ≤ 100, S(t) = exp[-20λ1 - λ2(t - 20)]. For t > 100, S(t) = exp[-20λ1 - 80λ2 - λ3(t - 100)].
f(t) = -dS(t)/dt. For t ≤ 20, f(t) = λ1 exp[-λ1 t]. For 20 < t ≤ 100, f(t) = λ2 exp[-20λ1 - λ2(t - 20)]. For t > 100, f(t) = λ3 exp[-20λ1 - 80λ2 - λ3(t - 100)].
loglikelihood is: ln f(12) + ln f(35) + ln f(80) + ln f(140) = (ln λ1 - 12λ1) + (ln λ2 - 20λ1 - 15λ2) + (ln λ2 - 20λ1 - 60λ2) + (ln λ3 - 20λ1 - 80λ2 - 40λ3) = ln λ1 + 2 ln λ2 + ln λ3 - 72λ1 - 155λ2 - 40λ3.
Setting the partial derivative with respect to λ1 equal to zero: 1/λ1 = 72. ⇒ λ1 = 1/72.
Setting the partial derivative with respect to λ2 equal to zero: 2/λ2 = 155. ⇒ λ2 = 2/155.
Setting the partial derivative with respect to λ3 equal to zero: 1/λ3 = 40. ⇒ λ3 = 1/40.
S(150) = exp[-20λ1 - 80λ2 - 50λ3] = exp[-20/72 - 160/155 - 50/40] = exp[-2.5600] = 0.0773.
Comment: Similar to Exercise 15.37 in Loss Models.
8.31. E. S(t) = exp[-(t/θ)^0.5]. ln S(t) = -(t/θ)^0.5. Thus the loglikelihood is: -(45/θ)^0.5 - (62/θ)^0.5 - (70/θ)^0.5 - (87/θ)^0.5 = -32.28/θ^0.5. Due to the minus sign, this is an increasing function of θ. As θ approaches ∞, the loglikelihood approaches 0. The value of θ cannot be determined.
Comment: When all of the values have been censored from above, we cannot determine the maximum likelihood value of θ for a Weibull with τ fixed.
8.32. C. The loglikelihood is: ln[f(16)/S(10)] + ln[f(27)/S(10)] + ln[f(30)/S(10)] + ln[f(39)/S(10)] + 2 ln[S(10+y)/S(10)] = -4 ln(ω) - 6 ln(1 - 10/ω) + 2 ln(1 - (10 + y)/ω). Setting the partial derivative with respect to ω equal to zero: 0 = -4/ω - 60/(ω² - 10ω) + (20 + 2y)/(ω² - (10+y)ω).
⇒ 0 = -2 - 30/(ω - 10) + (10 + y)/(ω - (10+y)). Substituting ω = 76: 0 = -2 - 30/66 + (10 + y)/(66 - y). ⇒ (27/11)(66 - y) = 10 + y. ⇒ y = 44. Comment: Similar to Exercise 15.36 in Loss Models. Given the usual output, solve for the missing input. 8.33. D. For the Weibull, S(x) = exp[-(x/θ)2 ]. Let y = x2 , then y follows an Exponential distribution, S(y) = exp[-y / θ2]. Fitting this exponential with mean µ = θ2 using maximum likelihood applied to the squares of the original data: µ = (sum of amounts paid)/(# of uncensored values) = (3502 + ... + 15002 )/7 = 7,685,000/7 = 1,097,857. ⇒ θ =
√1,097,857 = 1048.
Ŝ2(1250) = exp[-(1250/1048)²] = 0.241.
x:       350    500    1000   1200   1500
si:       2      2      1      1      1
ri:      10      8      5      2      1
si/ri:  0.20   0.25   0.20   0.50   1
Ĥ1(1250) = 0.2 + 0.25 + 0.2 + 0.5 = 1.15. Ŝ1(1250) = e^(-1.15) = 0.317.
| S1(1250) - S2 (1250)| = |0.317 - 0.241| = 0.076. Comment: Setup taken from 4, 11/05, Q.5. 8.34. B. Likelihood is: f(1) f(2) {f(2)/S(1)}{S(4)/S(1)} = ke-k ke-2k {ke-2k/e-k} {e-4k/e-k} = k3 e-7k. Set the derivative with respect to k equal to zero: 0 = 3k2 e-7k - 7k3 e-7k. ⇒ k = 3/7. Comment: For an Exponential θ^ = (total time observed)/(# of deaths) = (1 + 2 + 1 + 3)/3 = 7/3.
⇒ k = 1/ θ^ = 3/7.
x
8.35. D. H(x) = ∫ b dx = bx for 0 ≤ x < 1. H(1) = b. 0 x
H(x) = H(1) + ∫ 2b dx = b + 2b(x-1) = 2bx - b for 1 ≤ x < 2; 1 x
H(x) = H(2) + ∫ 3b dx = 3b + 3b(x-2) = 3bx - 3b for 2 ≤ x < 3. 2
S(x) = exp[-bx] for 0 ≤ x < 1; S(x) = exp[b] exp[-2bx] for 1 ≤ x < 2; S(x) = exp[3b] exp[-3bx] for 2 ≤ x < 3. f(x) = b e-bx, 0 ≤ x < 1; f(x) = exp[b] 2b e-2bx, 1 ≤ x < 2; f(x) = exp[3b] 3b e-3bx, 2 ≤ x < 3. Loglikelihood: 40 ln f(.5) + 40 ln f(1.5) + 20 ln f(2.5) = 40 {ln(b) - (.5)b} + 40{b + ln(b) + ln(2) - 2b(1.5)} + 20{3b + ln(b) + ln(3) - 3b(2.5)} = 100ln(b) - 190b + 40ln(2) + 20ln(3). Set the derivative with respect to b equal to 0: 100/b - 190 = 0. ⇒ b = 10/19. 8.36. D. f(t) = kt2 exp[-kt3 /3]. Likelihood is: f(1) f(2)2 f(3) S(4) = {k exp[-k/3]}{4k exp[-8k/3]}2 {9k exp[-9k]}{exp[-64k/3]} = 144 k4 e-108k/3. Setting the derivative with respect to k equal to zero: 0 = 144{4k3 e-108k/3 - (108/3)k4 e-108k/3}. ⇒ 4 = k108/3. ⇒ k = 1/9 = 0.111. 8.37. B. For the Exponential, for maximum likelihood: θ^ = (total time observed)/(number of deaths observed). ^
1/λ̂1 = (18 + s)/5. With censoring from above at 6, we observe that the fourth life survived to time 6: 1/λ̂2 = (16 + s)/4.
1/λ̂2 - 1/λ̂1 = 0.50. ⇒ (16 + s)/4 - (18 + s)/5 = 0.50. ⇒ s = 2. ⇒ λ̂2 = 4/(16 + 2) = 2/9 = 0.222.
8.38. D. h(x) = kx². H(x) = kx³/3. S(x) = exp[-kx³/3]. f(x) = kx² exp[-kx³/3]. ln f(x) = ln k + 2 ln x - kx³/3.
loglikelihood: 3 ln k + 2(ln 4 + ln 5 + ln 6) - (k/3)(64 + 125 + 216) = 3 ln k + 9.575 - 135k.
Set the derivative with respect to k equal to zero: 3/k - 135 = 0. ⇒ k = 3/135 = 1/45.
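Several of these problems use a force of mortality of the form h(x) = k x^m with complete data from birth; in that case the loglikelihood Σ[ln k + m ln xi - k xi^(m+1)/(m+1)] is maximized at k̂ = n(m+1)/Σ xi^(m+1). The helper below is my own generalization (under that assumption), checked against 8.38, 8.42, and 8.50.

```python
def k_mle(ages, m):
    # complete data, force of mortality h(x) = k * x**m:
    # loglikelihood = sum(ln k + m ln x - k x**(m+1)/(m+1)),
    # maximized at k = n*(m+1) / sum(x**(m+1))
    return len(ages) * (m + 1) / sum(x ** (m + 1) for x in ages)

print(k_mle([4, 5, 6], 2))       # 1/45, as in 8.38
print(k_mle([1, 2, 3, 4], 1))    # 4/15, as in 8.42
print(k_mle([2, 3, 4, 5], 3))    # about 0.0164, as in 8.50
```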
8.39. E. Likelihood is: F(t)4 S(15)16 = (15/w)4 (1 - 15/w)16. Set the derivative equal to zero: 0 = 15{-4(1/w)5 (1 - 15/w)16 + (15/w2 )16(1/w)4 (1 - 15/w)15}.
⇒ (1 - 15/w) = 60/w. ⇒ w = 75. 8.40. D. S(t) = θ/(θ + t). f(t) = θ/(θ + t)2 . Likelihood is: f(1/2)4 S(1)6 = {θ/(θ + 0.5)2 }4 {θ/(θ + 1)}6 = θ10/{(θ + 0.5)8 (θ + 1)6 }. Loglikelihood is: 10 ln(θ) - 8 ln(θ + 0.5) - 6 ln(θ + 1). 0 = 10/θ - 8/(θ + 0.5) - 6/(θ + 1). ⇒ 10(θ + 0.5)(θ + 1) - 8θ(θ + 1) - 6θ(θ + 0.5) = 0.
⇒ 4θ2 - 4θ - 5 = 0. ⇒ θ = 1.673. ⇒ 1 - S(1) = 1/(1.673 + 1) = 0.374. Comment: Equivalent to the Balducci (hyperbolic) hypothesis, not on the syllabus. See Actuarial Mathematics. 8.41. E. Ix is linear over the interval [65, 66]. ⇒ 0.5p 65 = 1 - q65/2. Loglikelihood is: 2ln(q65) + 2ln(1 - q65/2) + 6ln(1 - q65). Setting the derivative equal to zero: 0 = 2/q65 - 1/(1 - q65/2) - 6/(1 - q65). ⇒ 2(1 - q65/2)(1 - q65) - q65(1 - q65) - 6q65(1 - q65/2) = 0. ⇒ 5q652 - 10q65 + 2 = 0.
⇒ q65 = (10 - 60 )/10 = 0.2254. 8.42. B. H(t) = kt2 /2. S(t) = exp[-kt2 /2]. f(t) = kt exp[-kt2 /2]. ln f(t) = lnk + lnt - kt2 /2. Loglikelihood is: 4lnk + ln(1) + ln(2) + ln(3) + ln(4) - k(1/2 + 2 + 9/2 + 8). Setting the derivative equal to zero: 0 = 4/k - 15. ⇒ k = 4/15 = 0.266. 8.43. D. h(x) = kx. H(x) = kx2 /2. S(x) = Exp[-kx2 /2]. f(x) = kx Exp[-kx2 /2]. Likelihood is: {f(40)/S(20)} {f(40)/S(30)} {S(30)/S(20)} = f(40)2 /S(20)2 . Loglikelihood is: 2ln f(40) - 2lnS(20) = 2 lnk + 2 ln40 - 1600k - 2(-200k) = 2lnk + 2ln40 - 1200k. Set the derivative with respect to k equal to zero: 2/k = 1200. ⇒ k = 1/600.
8.44. E. For convenience start time at age 70, and refer to q70 as q. From the assumption that the underlying survival distribution is linear: S(t) = 1 - tq, 0 ≤ t ≤ 1. S(r) = 1 - rq. S(1) = 1 - q. f(t) = q, 0 ≤ t ≤ 1. Likelihood is: f(0.75)3 S(1)77 {f(0.75)/S(r)} {S(1)/S(r)}25 = f(0.75)4 S(1)102 / S(r)26. Loglikelihood is: 4 ln f(0.75) + 102 ln S(1) - 26 ln S(r) = 4 ln(q) + 102 ln(1 - q) - 26 ln(1 - rq). Set the derivative of the loglikelihood with respect to q equal to zero: 0 = 4/q - 102/(1 - q) + 26r/(1 - rq). Plus in q = 0.042 and then solve for r.
⇒ r = 11.234/26.472 = 0.424. 8.45. A. All of the lives survived the first five months after age x, and then we observe them for the next five months. The likelihood is: {(F(10/12) - F(5/12))/S(5/12)}4 {S(10/12)/S(5/12)}n-4. The loglikelihood is: 4ln[F(10/12) - F(5/12)] + (n - 4)ln[S(10/12))] - nln[S(5/12)]. Assuming a uniform survival function, the loglikelihood is: 4 ln(5qx/12) + (n - 4)ln(1 - 10qx/12) - n ln(1 - 5qx/12). Setting the derivative equal to zero: 4/qx - (n - 4)(5/6)/(1 - 5qx/6) + n(5/12)/(1 - 5qx/12) = 0. Plugging in qx = 0.20: 4/.2 + (n - 4)(5/6)/(1 - 5(.2)/6) + n(5/12)/(1 - (.2)5/12) = 0.
⇒ 20 - (n - 4) + n5/11 = 0. ⇒ n = 24(11/6) = 44. 8.46. B. Let y = e−λ. Loglikelihood is: 5 ln F(1) + 4 ln {F(2) - F(1)} + 3 ln{F(3) - F(2)} = 5 ln(1 - y) + 4 ln(y - y2 ) + 3ln(y2 - y3 ) = 12 ln(1 - y) + 10 ln(y). Set the derivative with respect to y equal to zero: 0 = -12/(1 - y) + 10/y. ⇒ y = 10/22 = 5/11 = 0.455. 8.47. B. h(t) = kt. H(t) = kt2 /2. S(t) = Exp[-kt2 /2]. f(t) = k t Exp[-kt2 /2]. Likelihood is: f(3) S(6) S(6) S(5). Loglikelihood is: ln f(3) + ln S(5) + 2 ln S(6) = lnk + ln3 - 4.5k - 12.5k + 2(-18k) = lnk + ln3 - 53k. Set the derivative with respect to k equal to zero: 1/k = 53. ⇒ k = 1/53 = 0.0189.
8.48. E. For nonsmokers, h(t) = kt. H(t) = kt2 /2. S(t) = exp[-kt2 /2]. f(t) = kt exp[-kt2 /2]. For smokers, h(t) = 2kt. H(t) = kt2 . S(t) = exp[-kt2 ]. f(t) = 2kt exp[-kt2 ]. The loglikelihood is: ln[k(1/2) exp[-k/8]] - k/2 + ln[2k(1/4) exp[-k/16]] - k/4 = 2 ln(k) - 2 ln (2) - 15k/16. Setting the derivative equal to zero: 2/k - 15/16 = 0. ⇒ k = 32/15 = 2.13. Comment: A Weibull base Survival Function. A simple example of a Cox Proportional Hazards Model, which used to be covered on the syllabus. ^
8.49. A. For the Exponential, λ = 1/ θ^ = (number of deaths observed)/(total time observed) = 7/(2 + 3 + 3 + 4 + 5 + 5 + 5 + 5 + 5 + 5) = 1/6 = 0.167. 8.50. C. h(x) = kx3 . H(x) = kax4 /4. S(x) = exp[-kx4 /4]. f(x) = kx3 exp[-kx4 /4]. ln f(x) = lnk + 3lnx - kx4 /4. loglikelihood: 4lnk + 3(ln2 + ln3 + ln4 + ln5) - (k/4)(16 + 81 + 256 + 625) = 4 lnk + 3(ln2 + ln3 + ln4 + ln5) - 244.5k. Set the derivative with respect to k equal to zero: 4/k - 244.5 = 0. ⇒ k = 0.0164. Comment: A Weibull Distribution with τ = 4, and θ = 1/k1/4. 8.51. D. S(t) = 1 - qt, 0 ≤ t ≤ 1. f(t) = q, 0 ≤ t ≤ 1. Loglikelihood is: 3 ln(q) + ln(1 - q/4) + 6 ln(1 - q). Set the derivative with respect to q equal to zero: 3/q - .25/(1 - q/4) - 6/(1 - q) = 0. ⇒ 3(1 - q/4)(1 - q) - .25q(1 - q) - 6q(1 - q/4) = 0.
⇒ 5q2 - 20q + 6 = 0. ⇒ q = 0.3267. 8.52. D. For the Exponential, θ^ = (total time observed)/(number of deaths observed) = (0.25 + 0.75)/2 = 4. q0 = F(1) = 1 - e-0.25 = 0.22120. For the linear assumption, S(t) = 1 - qt, 0 ≤ t ≤ 1. f(t) = q, 0 ≤ t ≤ 1. Loglikelihood is: 2 ln(q) + 7 ln(1 - q). Set the derivative with respect to q equal to zero: 2/q - 7/(1 - q) = 0. ⇒ q = 2/9 = 0.22222. 0.22222 - 0.22120 = 0.00102.
8.53. C. For the Exponential, λ̂ = 1/θ̂ = (number of deaths observed)/(total time observed).
Since this is a calendar year 1989 study, we observed patient #1 for 1/2 year. Assume we observe patient #2 for y years.
If there is one death observed, then 1 = λ̂ = 1/(1/2 + y). ⇒ y = 1/2.
⇒ We observed patient #2 for 1/2 year. ⇒ Patient #2 withdrew July 1, 1989.
Comment: If there are two deaths observed, then 1 = λ̂ = 2/(1/2 + y). ⇒ y = 1.5.
⇒ We observed patient #2 for 1.5 years. However, under a calendar year study it is possible to observe the patient for at most one year. 8.54. E. h(x) = bx. H(x) = bx2 /2. S(x) = Exp[-bx2 /2]. f(x) = b x Exp[-bx2 /2]. ln f(x) = ln(b) + ln(x) - bx2 /2. lnS(x) = - bx2 /2. Loglikelihood: ln{f(5)/S(3)} + ln{f(7)/S(3)} + ln{f(8)/S(3)} + ln{f(10)/S(3)} = 4ln(b) + ln(5) + ln(7) + ln(8) + ln(10) - b(25 + 49 + 64 + 100)/2 - 4(-4.5b) = 4ln(b) - 101b + constants. Set the derivative with respect to b equal to zero: 4/b - 101 = 0. ⇒ b = 4/101 = 0.0396. 8.55. B. Let y be number of lives that survived under observation to time t = 1. Let λ be the hazard rate of the Exponential. S(t) = e-λt. f(t) = λe-λt. 24 lives died during [0, 1], 8 lives died during [0, 1/4], 104 - 8 = 96 lives survived to time 1/4, and y lives survived to time 1. The loglikelihood is: 24 ln F(1) + 8 ln F(1/4) + 96 ln S(1/4) + y ln S(1) = 24 ln(1 - e-λ) + 8 ln(1 - e-λ/4) + 96(-λ/4) + y(-λ). Set the derivative equal to zero: 24e-λ/(1 - e-λ) + 2e-λ/4/(1 - e-λ/4) - 24 - y = 0. We are given q0 = 1/10. ⇒ 1 - e-λ = 0.1. ⇒ e-λ = 0.9.
⇒ y = (24)(0.9)/0.1 + 2(0.91/4)/(1 - 0.91/4) - 24 = 216 + 75 - 24 = 267. 8.56. C. ln f(t) = ln(t) - t/c - 2ln(c). Loglikelihood is: ln(1) + ln(2) + ln(3) + ln(4) - (1 + 2 + 3 + 4)/c - 8ln(c). Set the derivative equal to zero: 10/c2 - 8/c = 0. ⇒ c = 10/8 = 1.25. Alternately, this is a Gamma Distribution with α = 2 and θ = c. For the Gamma with α fixed, maximum likelihood is equal to the method of moments. X = (1 + 2 + 3 + 4)/4 = 2.5. θ^ = X /α = 2.5/2 = 1.25.
8.57. B. h(x) = kx. H(x) = kx²/2. S(x) = Exp[-kx²/2]. f(x) = kx Exp[-kx²/2].
Likelihood is: {f(4)/S(1)} {S(6)/S(2)} {S(2)/S(1)} {S(6)/S(4)} = f(4)S(6)²/{S(4)S(1)²}.
Loglikelihood is: ln f(4) + 2 lnS(6) - lnS(4) - 2 lnS(1) = lnk + ln4 - 8k - 36k + 8k + k = lnk + ln4 - 35k.
Set the derivative with respect to k equal to zero: 1/k = 35. ⇒ k = 1/35.
Alternately, a Weibull Distribution with τ = 2, and θ = √(2/k):
S(x) = Exp[-(x/θ)²] = Exp[-x²/θ²] = Exp[-x²/(2/k)] = Exp[-kx²/2].
We can use the shortcut for a Weibull with tau fixed.
∑(Min[xi, ui]^τ - di^τ) = (4² - 1²) + (6² - 2²) + (2² - 1²) + (6² - 4²) = 70.
θ̂ = {∑(Min[xi, ui]^τ - di^τ) / (number of uncensored values)}^(1/τ) = √(70/1) = √70.
θ = √(2/k). ⇒ k = 2/θ² = 2/70 = 1/35.
8.58. D. S(t) = 1 - qt, 0 ≤ t ≤ 1. f(t) = q, 0 ≤ t ≤ 1. Loglikelihood is: ln(q) + 3ln(1 - 0.4q) + 6 ln(1 - q). Set the derivative with respect to q equal to zero: 1/q - 1.2/(1 - 0.4q) - 6/(1 - q) = 0. ⇒ (1 - 0.4q)(1 - q) - 1.2q(1 - q) - 6q(1 - 0.4q) = 0.
⇒ 4q² - 8.6q + 1 = 0. ⇒ q = {8.6 - √(8.6² - 16)}/8 = 0.1234.
Comment: The choices are too close together.
8.59. B. h(t) = kt. H(t) = kt²/2. S(t) = Exp[-kt²/2]. f(t) = kt Exp[-kt²/2]. ln f(t) = ln k + ln t - kt²/2.
Loglikelihood is: 7 ln k + 2 ln1 + ln2 + ln4 + ln5 + 2 ln6 - (k/2)(1 + 1 + 4 + 16 + 25 + 36 + 36).
Set the derivative with respect to k equal to zero: 7/k - 59.5 = 0. ⇒ k = 7/59.5 = 0.118.
8.60. C. Assume that life D withdrew from the study at age y or was still alive when the study ended at January 1, 1995, so that his data is censored from above at y.
h(x) = kx. H(x) = kx²/2. S(x) = Exp[-kx²/2]. f(x) = kx Exp[-kx²/2].
Loglikelihood is: ln{f(5)/S(4)} + ln{S(7)/S(2)} + ln{f(8)/S(5)} + ln{S(y)/S(1)} = ln(k) + ln(5) - 12.5k + 8k - 24.5k + 2k + ln(k) + ln(8) - 32k + 12.5k - ky²/2 + 0.5k = 2 ln(k) + ln(5) + ln(8) - (46 + y²/2)k.
Set the derivative with respect to k equal to zero: 2/k = 46 + y²/2. ⇒ k = 2/(46 + y²/2).
Set this equal to the given estimate for k: 1/27 = 2/(46 + y²/2). ⇒ y = 4.
Life D withdrew at age 4. ⇔ Life D withdrew 4 - 1 = 3 years after the start of the study.
⇔ Life D Withdrew January 1, 1993. 8.61. E. loglikelihood is: ln S(0.75) + 2ln S(0.5) + ln f(0.5) + ln S(1) - 5 ln S(r) = ln[θ/(θ + 0.75)] + 2ln[θ/(θ + 0.5)] + ln[θ/(θ + 0.5)2 ] + ln[θ/(θ + 1)] - 5ln[θ/(θ + r)] = 5ln(θ + r) - 4ln(θ + 0.5) - ln(θ + 0.75) - ln(θ + 1). Set the derivative with respect to θ equal to zero: 5/(θ + r) = 4/(θ + 0.5) + 1/(θ + 0.75) + 1/(θ + 1). We are given that the maximum likelihood estimate of qx is 4/11. ⇒ 4/11 =1/(θ +1). ⇒ θ = 7/4.
⇒ 5/(1.75 + r) = 4/2.25 + 1/2.5 + 1/2.75. ⇒ 1.75 + r = 5/2.5414. ⇒ r = 0.2174. 8.62. C. S(t) = 1 - qt, 0 ≤ t ≤ 1. f(t) = q, 0 ≤ t ≤ 1. Loglikelihood is: lnS(0.3) + lnf(0.5) + lnS(1) + ln[S(1)/S(0.1)] + ln[f(0.7)/S(0.3)] = ln(1 - 0.3q) + ln(q) + ln(1 - q) + ln(1 - q) - ln(1 - 0.1q) + ln(q) - ln(1 - 0.3q) = 2ln(q) + 2ln(1 - q) - ln(1 - 0.1q). Set the derivative with respect to q equal to zero: 2/q - 2/(1 - q) + 0.1/(1 - 0.1q) = 0. ⇒ 2(1 - 0.1q)(1 - q) - 2q(1 - 0.1q) + 0.1q(1 - q) = 0.
⇒ 0.3q² - 4.1q + 2 = 0. ⇒ q = {4.1 - √(4.1² - (4)(0.3)(2))}/0.6 = 0.5066.
8.63. D. f(t) = (α/10)(1 + t/10)^-(α+1).
Loglikelihood is: lnf(4) + lnS(5) = lnα - ln10 - (α+1)ln1.4 - αln1.5 = lnα - ln10 - 0.7419α - ln1.4.
Set the derivative with respect to α equal to zero: 1/α - 0.7419 = 0. ⇒ α = 1.348.
Comment: A Pareto Distribution with θ = 10.
8.64. E. S(t) = 1 - qt, 0 ≤ t ≤ 1. f(t) = q, 0 ≤ t ≤ 1. Loglikelihood is: 5ln(q) + 10ln(1 - .2q) + 35ln(1 - q). Set the derivative with respect to q equal to zero: 5/q - 10(0.2)/(1 - 0.2q) - 35/(1 - q) = 0. ⇒ 5(1 - 0.2q)(1 - q) - 2q(1 - q) - 35q(1 - 0.2q) = 0.
⇒ 10q² - 43q + 5 = 0. ⇒ q = {43 - √(43² - 200)}/20 = 0.1196.
8.65. D. The loglikelihood is: ln F(1) + ln S(1/2) + 8 ln S(1) = ln(q) + ln(1 - q/2) + 8 ln(1 - q).
Set the derivative equal to zero: 0 = 1/q - (1/2)/(1 - q/2) - 8/(1 - q).
⇒ 5q² - 10q + 1 = 0. ⇒ q = {10 - √(100 - 20)}/10 = 0.1056.
8.66. A. f(t) = -dS(t)/dt = (1/2)(1 - t/k)^(-1/2)/k.
Loglikelihood is: 2 ln f(10) + 8 ln S(10) = 2{-ln(2) - (1/2)ln(1 - 10/k) - ln(k)} + 8(1/2) ln(1 - 10/k) = -2ln(2) + 3ln(1 - 10/k) - 2 ln(k).
Setting the derivative with respect to k equal to zero: (10/k²)(3)/(1 - 10/k) - 2/k = 0. ⇒ 15/k = 1 - 10/k. ⇒ k = 25.
8.67. B. For the Exponential, λ̂ = 1/θ̂ = (number of deaths observed)/(total time observed) = 5/{1.1 + 3.2 + 3.3 + 3.5 + 3.9 + (495)(4)} = 1/399 = 0.002506.
8.68. E. h(x) = 1/(b+x). H(x) = ∫ from 2 to x of h(t) dt = ln(b+x) - ln(b+2). S(x) = (b+2)/(b+x). f(x) = (b+2)/(b+x)².
Likelihood is: f(3) S(3)19. Loglikelihood is: ln f(3) + 19 ln S(3) = ln(b+2) - 2ln(b+3) + 19ln(b+2) - 19ln(b+3) = 20ln(b+2) - 21ln(b+3). Set the derivative with respect to b equal to zero: 20/(b+2) - 21/(b+3) = 0. ⇒ 20b + 60 = 21b + 42. ⇒ b = 18. 8.69. A. For Team X, the likelihood is: f(1)f(3)f(4)f(4)S(5) = (1/ω)4 (1 - 5/ω). Loglikelihood is: - 4ln(ω) + ln(1 - 5/ω). Set the derivative equal to zero: 0 = - 4/ω + (5/ω2)/(1 - 5/ω). ⇒ 4(1 - 5/ω) = 5/ω. ⇒ ω = 6.25. For Team Y, the likelihood is: f(1)f(3)f(4)f(4)f(6) = (1/ω)5 . This is a decreasing function of ω, so we want the smallest possible ω, which is the largest observed time of death, 6. Team X - Team Y = 6.25 - 6 = 0.25.
8.70. A. The likelihood is: {f(4)/S(3)} {f(5)/S(3)} {f(7)/S(3)} {S(r)/S(3)} = (1/ω)3 {(ω - r)/ω} / {(ω - 3)/ω}4 = (ω - r)/(ω - 3)4 . Loglikelihood is: ln(ω - r) - 4ln(ω - 3). Set the derivative equal to zero: 0 = 1/(ω - r) - 4/(ω - 3). ⇒ ω - 3 = 4(ω - r).
⇒ r = (3/4)(ω + 1) = (3/4)(13.67 + 1) = 11. 8.71. D. The likelihood is: f(10)f(15)f(14)f(4)S(7). The loglikelihood is: ln f(10) + ln f(15) + ln f(14) + ln f(4) + ln S(7). For an Exponential, ln f(x) = -x/θ - ln(θ), and ln S(x) = -x/θ. For Model I, the loglikelihood is: -(10 + 15 + 14 + 4 + 7)/12.5 - 4 ln(12.5) = -14.10. For Model II, the loglikelihood is: -(10 + 15 + 14 + 4 + 7)/10 - 4 ln(10) = -14.21. For a Weibull, ln f(x) = ln(τ) + (τ-1)ln(x) - (x/θ)τ - τln(θ), and ln S(x) = -(x/θ)τ. For Model III, the loglikelihood is: 4ln(2) + ln(10) + ln(15) + ln(14) + ln(4) -(102 + 152 + 142 + 42 + 72 )/14.142 - (4)(2)ln(14.14) = -12.31. Best fit is the largest loglikelihood. From best to worst: III, I, II. Comment: Ranking the loglikelihoods is the same as ranking the likelihoods. One should not directly compare the likelihoods for models with different numbers of parameters. One could use the Likelihood Ratio Test or the Schwarz Bayesian Criterion,as discussed in “Mahlerʼs Guide to Fitting Loss Distributions.”
8.72. B. H(t) = ∫ h(x) dx. For t < 5, H(t) = λ1t. For 5 ≤ t < 10, H(t) = 5λ1 + λ2(t - 5). For t ≥ 10, H(t) = 5λ1 + 5λ2 + λ3(t - 10).
S(t) = exp[-H(t)]. For t < 5, S(t) = exp[-λ1t]. For 5 ≤ t < 10, S(t) = exp[-5λ1 - λ2(t - 5)]. For t ≥ 10, S(t) = exp[-5λ1 - 5λ2 - λ3(t - 10)].
f(t) = -dS(t)/dt. For t < 5, f(t) = λ1 exp[-λ1t]. For 5 ≤ t < 10, f(t) = λ2 exp[-5λ1 - λ2(t - 5)]. For t ≥ 10, f(t) = λ3 exp[-5λ1 - 5λ2 - λ3(t - 10)].
Loglikelihood is: ln f(3) + ln f(7) + ln f(8) + 2 ln f(12) + ln f(13) + ln f(14) + 3 lnS(15) = -3λ1 + lnλ1 - 5λ1 - 2λ2 + lnλ2 - 5λ1 - 3λ2 + lnλ2 + 2(-5λ1 - 5λ2 - 2λ3 + lnλ3) - 5λ1 - 5λ2 - 3λ3 + lnλ3 - 5λ1 - 5λ2 - 4λ3 + lnλ3 + 3(-5λ1 - 5λ2 - 5λ3) = lnλ1 + 2lnλ2 + 4lnλ3 - 48λ1 - 40λ2 - 26λ3.
Setting the partial derivative with respect to λ1 equal to zero: 1/λ1 = 48. ⇒ λ1 = 1/48.
Setting the partial derivative with respect to λ2 equal to zero: 2/λ2 = 40. ⇒ λ2 = 2/40 = 1/20.
Setting the partial derivative with respect to λ3 equal to zero: 4/λ3 = 26. ⇒ λ3 = 4/26 = 2/13.
Ŝ(12) = exp[-5λ1 - 5λ2 - 2λ3] = exp[-5/48 - 5/20 - 4/13] = exp(-0.66186) = 0.5159.
Comment: The total times observed during the intervals are 48, 40, and 26.
In each case, λ̂i = (# of deaths observed in interval i)/(total time observed in interval i).
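The comment above translates directly into a few lines of code. This is only a sketch, assuming the data behind 8.72 are deaths at 3, 7, 8, 12, 12, 13, 14 and three lives censored at 15, as read off the loglikelihood.

import math

death_times = [3, 7, 8, 12, 12, 13, 14]
censor_times = [15, 15, 15]
breaks = [0, 5, 10, 15]            # the intervals [0, 5), [5, 10), [10, 15)

lambdas = []
for lo, hi in zip(breaks[:-1], breaks[1:]):
    # Exposure: the time each observation spends inside the interval.
    exposure = sum(min(max(t - lo, 0), hi - lo) for t in death_times + censor_times)
    deaths = sum(lo < t <= hi for t in death_times)
    lambdas.append(deaths / exposure)

print(lambdas)                                        # [1/48, 1/20, 2/13]
H_12 = 5*lambdas[0] + 5*lambdas[1] + 2*lambdas[2]
print(math.exp(-H_12))                                # about 0.516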
8.73. A. The likelihood is: {(F(1) - F(.4))/S(.4)}6 {S(1)/S(.4)}4 F(1)8 S(1)12. The loglikelihood is: 6 ln[F(1) - F(.4)] + 4 ln[S(1)] - 10 ln[S(.4)] + 8ln[F(1)] + 12 ln[S(1)]. S(t) = 1 - qt, 0 ≤ t ≤ 1. f(t) = q, 0 ≤ t ≤ 1. Loglikelihood is: 6ln(0.6q) + 4ln(1 - q) - 10ln(1 - 0.4q) + 8ln(q) + 12ln(1 - q) = 6ln(0.6) + 14ln(q) + 16ln(1 - q) - 10ln(1 - 0.4q). Set the derivative with respect to q equal to zero: 14/q - 16/(1 - q) + 4/(1 - 0.4q) = 0.
⇒ 7(1 - 0.4q)(1 - q) - 8q(1 - 0.4q) + 2q(1 - q) = 0. ⇒ 4q² - 15.8q + 7 = 0.
⇒ q = {15.8 - √(15.8² - (4)(4)(7))}/8 = 0.5085.
8.74. B. H(t) = ∫ h(x) dx. For t < 2, H(t) = λ1t. For t ≥ 2, H(t) = 2λ1 + λ2(t - 2).
S(t) = exp[-H(t)]. For t < 2, S(t) = exp[-λ1t]. For t ≥ 2, S(t) = exp[-2λ1 - λ2(t - 2)].
f(t) = -dS(t)/dt. For t < 2, f(t) = λ1 exp[-λ1t]. For t ≥ 2, f(t) = λ2 exp[-2λ1 - λ2(t - 2)].
Loglikelihood is: ln f(1.7) + ln S(1.5) + ln S(2.6) + ln f(3.3) + ln S(3.5) = -1.7λ1 + lnλ1 - 1.5λ1 - 2λ1 - 0.6λ2 + lnλ2 - 2λ1 - 1.3λ2 - 2λ1 - 1.5λ2 = lnλ1 + lnλ2 - 9.2λ1 - 3.4λ2.
Setting the partial derivative with respect to λ1 equal to zero: 1/λ1 = 9.2. ⇒ λ1 = 0.1087.
Setting the partial derivative with respect to λ2 equal to zero: 1/λ2 = 3.4. ⇒ λ2 = 0.2941.
λ2 - λ1 = 0.2941 - 0.1087 = 0.1854.
Comment: λ̂i = (# of deaths observed in interval i)/(total time observed in interval i).
Exercise 15.30 in Loss Models. 8.75. E. S(x) = θ4/(θ2 + x2 )2 . ln S(x) = 4 lnθ - 2 ln(θ2 + x2 ). f(x) = 4xθ4/(θ2 + x2 )3 . ln f(x) = ln4 + ln x + 4 lnθ - 3 ln(θ2 + x2 ). Loglikelihood is: ln f(2) + ln f(4) + ln S(4) = ln4 + ln2 + 4 lnθ - 3 ln(θ2 + 4) + ln4 + ln4 + 4 lnθ - 3 ln(θ2 + 16) + 4 lnθ - 2 ln(θ2 + 16) = 12 lnθ - 3 ln(θ2 + 4) - 5 ln(θ2 + 16) + constants. Setting the derivative of the loglikelihood equal to zero: 0 = 12/θ - 6θ/(θ2 + 4) - 10θ/(θ2 + 16). ⇒ θ4 - 26θ2 - 192 = 0.
⇒ (θ² - 32)(θ² + 6) = 0. ⇒ θ² = 32. ⇒ θ = √32 = 5.657.
Comment: A Pareto Distribution with α = 2 and θ² rather than θ as the scale parameter.
8.76. D. The distribution of failure times is uniform from 4 to ω.
The loglikelihood is: lnf(5) + lnf(9) + lnf(13) + 2lnS(4 + p) = 3ln[1/(ω - 4)] + 2ln[{ω - (4+p)}/(ω - 4)] = 2ln[ω - 4 - p] - 5ln[ω - 4].
Set the derivative of the loglikelihood with respect to ω equal to 0: 2/(ω - 4 - p) = 5/(ω - 4).
Since we know the solution is ω = 29, we have: 2/(29 - 4 - p) = 5/(29 - 4). ⇒ p = 15.
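A short numerical check of 8.75, assuming scipy is available: maximize the loglikelihood 12 ln(θ) - 3 ln(θ² + 4) - 5 ln(θ² + 16) directly rather than solving the quartic.

import math
from scipy.optimize import minimize_scalar

def negative_loglikelihood(theta):
    return -(12*math.log(theta) - 3*math.log(theta**2 + 4) - 5*math.log(theta**2 + 16))

print(minimize_scalar(negative_loglikelihood, bounds=(0.1, 100.0), method="bounded").x)
# about 5.657 = sqrt(32)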
Section 9, Multiple Decrements89

A disabled worker is receiving social security disability benefits. There are a number of reasons why he may stop receiving benefits under disability:
1. Death
2. Recovery (he is no longer disabled)
3. Conversion to old age retirement benefits
Each of these is a cause of decrement. This is called a multiple decrement situation.90
One can use the methods previously discussed to estimate the survival function for each decrement separately. Leaving due to another decrement is treated as right censoring. For example, when working on death, the observation of a worker who leaves due to recovery or conversion would act as an observation right censored with respect to death. If instead working on recovery, then the observation of a worker who leaves due to death or conversion would act as an observation right censored with respect to recovery.
The corresponding probabilities estimated in this way are denoted qʼ and are called the single decrement rates. For a life aged x:
tqxʼ(d) = Prob[death within t years | other decrements do not occur].
tqxʼ(r) = Prob[recovery within t years | other decrements do not occur].
tqxʼ(c) = Prob[conversion within t years | other decrements do not occur].
For example, if working on death separately, assume we estimate S(55) = 0.772 and S(56) = 0.705.
Then q55ʼ(d) = (0.772 - 0.705) / 0.772 = 0.0868.
As usual, define tpxʼ(d) = 1 - tqxʼ(d). In the above example, p55ʼ(d) = 1 - 0.0868 = 0.9132.
The total probability of decrement is that due to all causes, qx(T). px(T) = 1 - qx(T).
89 See page 369 of Loss Models.
90 See Sections 10 and 11 of Actuarial Mathematics for an extensive discussion of multiple decrements. I do not believe you should worry about being asked detailed questions about multiple decrements on this exam. However, it would not hurt to briefly review anything you may have learned on a previous exam.
Given a life of age 55, p55(T) = Prob[continuing to receive disability benefits until age 56] = Prob[not dying by age 56] Prob[not recovering by age 56] Prob[not converting by age 56] = p55ʼ(d) p55ʼ(r) p55ʼ(c).
In general, tpx(T) = tpxʼ(d) tpxʼ(r) tpxʼ(c).
More generally, we take a product of the pʼ(g) over the different decrements:
tpx(T) = Π tpxʼ(g).91
tqx(T) = 1 - Π (1 - tqxʼ(g)).
Exercise: q55ʼ(d) = 0.0868. q55ʼ(r) = 0.0067. q55ʼ(c) = 0.0210. Calculate q55(T). [Solution: q55(T) = 1 - (1 - 0.0868)(1 - 0.0067)(1 - 0.0210) = 1 - (0.9132)(0.9933)(0.9790) = 0.1120.] Putting this exercise into words, for a life aged 55:
• There is an 8.68% chance they die within the next year. We do not care whether they recovered or started receiving old age benefits prior to dying.
• Conditional on them not dying, there is a 0.67% chance they recover within the next year. We do not care whether or not they receive old age benefits.
• Conditional on them not dying, there is a 2.10% chance they receive old age benefits within the next year. We do not care whether or not they recover.
• There is an 11.20% chance that they will stop getting disability benefits during the next year. There is an 88.80% chance that they will still be getting disability benefits in one year.
Alan is 55 on January 1. If Alan dies on March 1, then he would be included in the first probability. Alan will be included whether or not he was receiving disability benefits at that time.
Barry is 55 on January 1. If Barry recovers on June 1, then he would be included in the second probability. Barry will be included whether or not he was receiving old age benefits at that time.
Chris is 55 on January 1. If Chris converts to old age benefits on August 1, then he would be included in the third probability. Chris will be included whether or not he was receiving disability benefits at that time.
91 See equation 10.5.2 in Actuarial Mathematics.
We are also interested in multiple decrement probabilities denoted q. For a life aged x:
tqx(d) = Prob[stop getting benefits within t years due to death].
tqx(r) = Prob[stop getting benefits within t years due to recovery].
tqx(c) = Prob[stop getting benefits within t years due to conversion].
Note that a disabled life aged 55 might die at age 57 in an automobile accident, preventing either the possibility of converting to old age benefits at age 60 or recovering at age 58. The q(d) as opposed to qʼ(d) includes the interaction of the competing decrements.92 There will be fewer conversions due to deaths that occur first. There will be fewer cases of death stopping benefits, due to those cases where benefits have already been stopped due to other causes.
The total probability of decrement is the sum of the multiple decrement probabilities due to different causes:
tqx(T) = tqx(d) + tqx(r) + tqx(c).
More generally, we take a sum of the q(g) over the different decrements:
tqx(T) = Σ tqx(g).93
There are a number of ways to estimate the multiple decrement probabilities q(g) from the single decrement rates qʼ(g). One method is to assume constant forces of decrement on the interval from x to x + 1. Another method is to assume for each cause of decrement a uniform distribution on the interval from x to x + 1. In either case:94
qx(g) = qx(T) ln[pxʼ(g)] / ln[px(T)] = qx(T) ln[1 - qxʼ(g)] / ln[1 - qx(T)].95
92 It may be helpful to think of the decrements competing to occur first.
93 See equation 10.2.19 in Actuarial Mathematics.
94 See Sections 10.5.3 and 10.5.4, as well as Exercise 10.22, in Actuarial Mathematics.
95 See equation 10.5.9 in Actuarial Mathematics. Since px(T) is a product of the individual px(g), ln[px(T)] is a sum of the individual ln[px(g)]. Thus we are dividing qx(T) into pieces.
Exercise: q55ʼ(d) = 0.0868. q55ʼ(r) = 0.0067. q55ʼ(c) = 0.0210. Calculate q55(d), q55(r), and q55(c). [Solution: From the solution to the previous exercise q55(T) = 0.1120. q55(d) = (0.1120) ln(1 - 0.0868) / ln(1 - 0.1120) = (0.1120) (-0.09080/-0.11878) = 0.0856. q 55(r) = (0.1120) ln(1 - 0.0067) / ln(1 - 0.1120) = (0.1120) (-0.00672/-0.11878) = 0.0063. q 55(c) = (0.1120) ln(1 - 0.0210) / ln(1 - 0.1120) = (0.1120)( -0.02122/-0.11878) = 0.0200. Comment: q55(d) + q55(r) + q55(c) = 0.0856 + 0.0063 + 0.0200 = 0.1119, equal to q55(T) subject to rounding.] Putting the results of this exercise into words, for a life aged 55:
• There is an 8.56% chance they die within the next year, prior to recovering from their disability or converting to old age benefits.
• There is a 0.63% chance they recover within the next year, prior to converting to old age benefits, and of course prior to dying.
• There is a 2.00% chance they convert to old age benefits within the next year, prior to recovering from their disability, and of course prior to dying.
Joe is 55 on January 1. If Joe dies on May 1 while he is still receiving disability benefits, then he would be included in the first probability. However, if Joe had stopped receiving disability benefits on February 1, then he would be included in one of the other two probabilities. If he had recovered on February 1, then he would go in the second probability, while if he had started receiving old age benefits on February 1 he would go in the third probability, q55(c).
Ken is 55 on January 1. If Ken recovered on July 1 while he is still receiving disability benefits, then he would be included in the second probability, q55(r).
Larry is 55 on January 1. If Larry started to receive old age benefits on October 1, and up until that time he was receiving disability benefits, then he would be included in the third probability, q55(c).
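The constant-force split used in the exercises above is mechanical, so a small code sketch may help; the function name below is only illustrative, and the numbers reproduce q55(d), q55(r), and q55(c).

import math

def split_total_decrement(q_primes):
    """q_primes: dict of single decrement rates qx'(g).
    Returns qx(T) and the multiple decrement probabilities qx(g),
    using qx(g) = qx(T) ln[1 - qx'(g)] / ln[1 - qx(T)]."""
    p_total = 1.0
    for q_prime in q_primes.values():
        p_total *= (1 - q_prime)
    q_total = 1 - p_total
    q = {g: q_total * math.log(1 - q_prime) / math.log(p_total)
         for g, q_prime in q_primes.items()}
    return q_total, q

q_total, q = split_total_decrement({"d": 0.0868, "r": 0.0067, "c": 0.0210})
print(round(q_total, 4), {g: round(v, 4) for g, v in q.items()})
# 0.112 {'d': 0.0856, 'r': 0.0063, 'c': 0.02}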
Assuming we start with 10,000 lives each aged 55, we expect on average during the next year:
• 856 will die while still receiving disability benefits.96
• 63 will recover while still receiving disability benefits.97
• 200 will convert to old age benefits while still receiving disability benefits.98
• The remaining 8880 will still be getting disability benefits in one year.99
• 868 will die. We do not care whether they recovered or started receiving old age benefits prior to dying.100
• 0.67% or 67 of them recover within the next year.101 We do not care whether or not they receive old age benefits.
• 2.10% or 210 will receive old age benefits.102 We do not care whether or not they recover.
Note that while one can die either before or after recovering or receiving old age benefits, one can not recover or receive old age benefits after dying. One can recover from oneʼs disability after starting to receive old age benefits. However, there would be no way for the Social Security actuary to know this. Once the individual starts receiving old age benefits, they would leave this actuaryʼs data base. Similarly, one can start to receive old age benefits after recovering from oneʼs disability. However, once the individual recovered and stopped receiving disability benefits, they would leave this actuaryʼs data base.
96 q55(d) = 8.56%.
97 q55(r) = 0.63%.
98 q55(c) = 2.00%.
99 The values add up to 9999 rather than 10,000, due to rounding. p55(T) = 1 - 0.1120 = 0.8880.
100 q55ʼ(d) = 8.68%. 868 - 856 = 12 will die while no longer receiving disability benefits.
101 q55ʼ(r) = 0.67%. 67 - 63 = 4 will recover from their disability while receiving old age benefits.
102 q55ʼ(c) = 2.10%. 210 - 200 = 10 will start receiving old age benefits after having recovered from their disability.
Exercise: q56ʼ(d) = 0.0886. q56ʼ(r) = 0.0058. q56ʼ(c) = 0.0304. Calculate q56(d), q56(r), and q56(c).
[Solution: q56(T) = 1 - (1 - 0.0886)(1 - 0.0058)(1 - 0.0304) = 0.1214.
q56(d) = (0.1214) ln(1 - 0.0886) / ln(1 - 0.1214) = 0.0870.
q56(r) = (0.1214) ln(1 - 0.0058) / ln(1 - 0.1214) = 0.0055.
q56(c) = (0.1214) ln(1 - 0.0304) / ln(1 - 0.1214) = 0.0290.
Comment: q56(d) + q56(r) + q56(c) = 0.0870 + 0.0055 + 0.0290 = 0.1215, equal to q56(T) subject to rounding.]
Assuming we start with 10,000 lives each aged 55, then by age 56 the expected number still receiving disability benefits is: (10,000) p55(T) = (10,000)(0.8880) = 8880.
From the solution to the above exercise, of these 8880 lives, we expect that by age 57:
(0.0870)(8880) = 772.56 will die while still receiving disability benefits.
(0.0055)(8880) = 48.84 will recover while still receiving disability benefits.
(0.0290)(8880) = 257.52 will convert to old age benefits while still receiving disability benefits.
(8880) p56(T) = (8880)(0.8786) = 7801.97 will continue to be receiving disability benefits.103
Therefore, assuming we start with 10,000 lives each aged 55, then by age 57 we expect:
856 + 773 = 1629 will die while still receiving disability benefits.
63 + 49 = 112 will recover while still receiving disability benefits.
200 + 258 = 458 will convert to old age benefits while still receiving disability benefits.104
103 772.56 + 48.84 + 257.52 + 7801.97 = 8880, subject to rounding.
104 1629 + 112 + 458 + 7802 = 10,000, subject to rounding.
Problems:

Use the following information for the next 3 questions:
The following table presents the results for 50 five-year term insurance policies, issued to men aged 60. The column headed “First observation” gives the age of the insured at which the policy was first observed. The column headed “Last observation” gives the age of the insured at which the policy was last observed. Sometimes an insured will stop paying their premium and insurance coverage will no longer be provided; in other words, their policy will lapse. The column headed “Event” is coded “l” for lapse, “d” for death, and “e” for expiration of the five-year period.

Policy   First Observation   Last Observation   Event
1        60.0                61.3               l
2        60.0                62.1               d
3        60.0                62.8               d
4        60.0                63.5               l
5        60.0                64.0               l
6        60.0                64.4               l
7        60.0                64.6               d
8-30     60.0                65.0               e
31       63.0                63.6               d
32       63.0                64.8               l
33-50    63.0                65.0               e
Assume each decrement has a constant force over each age. Andrew was issued a five-year term policy at age 60. Andrew is still alive at age 64 with his policy still in force. 9.1 (2 points) Determine the probability that Andrewʼs policy will still be in effect just before he turns 65. A. 91.5% B. 92.0%
C. 92.5%
D. 93.0%
E. 93.5%
9.2 (2 points) Determine the probability that during the next year the insurer will pay a death benefit on Andrewʼs policy. A. 2.23% B. 2.28%
C. 2.33%
D. 2.38%
E. 2.43%
9.3 (2 points) Determine the probability that Andrewʼs policy will not lapse during the next year. A. 93.18%
B. 95.40%
C. 95.45%
D. 97.67%
E. 97.72%
Use the following information for the next 4 questions:
The following table presents the results for 40 ten-year term insurance policies, issued to people of the same age. The column headed “First observation” gives the duration at which the policy was first observed. The column headed “Last observation” gives the duration at which the policy was last observed. The column headed “Event” is coded “s” for surrender, “d” for death, and “e” for expiration of the ten-year period.

Policy   First Observation   Last Observation   Event
1        0                   1.3                d
2        0                   2.1                s
3        0                   3.8                s
4        0                   4.5                d
5        0                   6.0                s
6        0                   7.4                s
7        0                   8.2                d
8-30     0                   10.0               e
31       5.0                 6.6                d
32       5.0                 8.8                s
33-40    5.0                 10.0               e
9.4 (2 points) For death, determine the Nelson-Aalen estimate of H(10).
(A) 0.12 (B) 0.13 (C) 0.14 (D) 0.15 (E) 0.16
9.5 (3 points) Determine the variance of the estimate in the previous question.
(A) Less than 0.0030
(B) At least 0.0030, but less than 0.0035
(C) At least 0.0035, but less than 0.0040
(D) At least 0.0040, but less than 0.0045
(E) At least 0.0045
9.6 (2 points) For surrender, determine the Nelson-Aalen estimate of H(10).
(A) 0.12 (B) 0.13 (C) 0.14 (D) 0.15 (E) 0.16
9.7 (3 points) Determine the variance of the estimate in the previous question.
(A) Less than 0.0030
(B) At least 0.0030, but less than 0.0035
(C) At least 0.0035, but less than 0.0040
(D) At least 0.0040, but less than 0.0045
(E) At least 0.0045
Use the following information for the next two questions:
In a mortality study 400 lives have survived to age 78. No new lives enter the study. Lives leave the study for only two reasons: Death or Surrender of their policy.

Age        Number of Deaths   Number of Surrenders
78 to 79   28                 12
79 to 80   32                 10
80 to 81   37                 9

Assume deaths and surrenders are uniformly distributed throughout each interval.
9.8 (2 points) Estimate q78(death).
A. 0.04   B. 0.05   C. 0.06   D. 0.07   E. 0.08
9.9 (2 points) Estimate q79(surrender).
A. 0.01   B. 0.02   C. 0.03   D. 0.04   E. 0.05
9.10 (2 points) There are two decrements: death and retirement. You have estimated the following survival function for death, treating any retirements as right censoring the data: S(68) = 0.700. S(69) = 0.650. You have estimated the following survival function for retirement, treating any deaths as right censoring the data: S(68) = 0.300. S(69) = 0.200. Assume each decrement has a constant force over each year of age. Determine q68(retirement). A. 0.32
B. 0.33
C. 0.34
D. 0.35
E. 0.36
9.11 (6 points) For a triple decrement model:

x    qxʼ(1)   qxʼ(2)   qxʼ(3)
63   0.020    0.030    0.250
64   0.025    0.035    0.200
65   0.030    0.040    0.150

Each decrement has a constant force over each year of age.
Calculate q63(1), q64(1), q65(1), q63(2), q64(2), q65(2), q63(3), q64(3), and q65(3).
Use the following information for the next 4 questions:
The following table presents the results for 10 five-year term insurance policies, issued to people of the same age. Each policy was observed from time of issue.

Policy   Time of Death   Time of Surrender
1        1.8             -
2        2.5             -
3        3.6             -
4        -               3.0
5        -               3.3
6        -               4.1
7-10     -               -
9.12 (2 points) For death, determine the Nelson-Aalen estimate of H(5). (A) 0.38 (B) 0.41 (C) 0.44 (D) 0.47 (E) 0.50 9.13 (2 points) Determine the variance of the estimate in the previous question. (A) 0.04 (B) 0.05 (C) 0.06 (D) 0.07 (E) 0.08 9.14 (2 points) For surrender, determine the Nelson-Aalen estimate of H(5). (A) 0.38 (B) 0.41 (C) 0.44 (D) 0.47 (E) 0.50 9.15 (2 points) Determine the variance of the estimate in the previous question. (A) Less than 0.04 (B) At least 0.04, but less than 0.05 (C) At least 0.05, but less than 0.06 (D) At least 0.06, but less than 0.07 (E) At least 0.07
9.16 (3 points) 100 lives are observed starting at age 20. The study ends at age 30.
One withdrawal occurs at each of ages: 22, 26, 27.
One death due to accident occurs at each of ages: 22, 24, 28.
One death due to other than accident occurs at each of ages: 24, 26.
Using the Product-Limit Estimator, estimate 10p20, 10p20(a), and 10p20(n), where (a) refers to death due to accidents and (n) refers to death due to other than accidents.
Use the following information for the next 4 questions:
The following table presents the results for 40 ten-year term insurance policies, issued to people of the same age. The column headed “First observation” gives the duration at which the policy was first observed. The column headed “Last observation” gives the duration at which the policy was last observed. The column headed “Event” is coded “s” for surrender, “d” for death, and “e” for expiration of the ten-year period.

Policy   First Observation   Last Observation   Event
1        0                   1.3                d
2        0                   2.1                s
3        0                   3.8                s
4        0                   4.5                d
5        0                   6.0                s
6        0                   7.4                s
7        0                   8.2                d
8-30     0                   10.0               e
31       5.0                 6.6                d
32       5.0                 8.8                s
33-40    5.0                 10.0               e
9.17 (2 points) For death, determine the Product-Limit Estimator of S(10). (A) 0.85 (B) 0.86 (C) 0.87 (D) 0.88
(E) 0.89
9.18 (2 points) Using Greenwoodʼs approximation, determine the variance of the estimate in the previous question. (A) Less than 0.0030 (B) At least 0.0030, but less than 0.0035 (C) At least 0.0035, but less than 0.0040 (D) At least 0.0040, but less than 0.0045 (E) At least 0.0045 9.19 (2 points) For surrender, determine the Product-Limit Estimator of S(10). (A) 0.85 (B) 0.86 (C) 0.87 (D) 0.88
(E) 0.89
9.20 (2 points) Using Greenwoodʼs approximation, determine the variance of the estimate in the previous question. (A) 0.0034 (B) 0.0036 (C) 0.0038 (D) 0.0040 (E) 0.0042
Use the following information for the next three questions:
For a double-decrement study, q′j(i) is the probability that a person aged cj withdraws due to decrement (i) prior to age cj+1 in an environment where the other decrement is not possible.

j   cj   q′j(1)   qj(T)
0   0    0.060    0.135
1   10   0.030    0.127
2   20   0.040    0.098
3   30   0.050    0.102
4   40   0.060    0.111
9.21 (2 points) Group A consists of 10,000 individuals observed at age 0. Group A is affected by only decrement (2). Determine the expected number of individuals in Group A that survive to be at least 30 years old. (A) 7200 (B) 7400 (C) 7600 (D) 7800 (E) 8000 9.22 (3 points) Group B consists of 10,000 individuals observed at age 0. Group B is affected by both decrements (1) and (2). Determine the expected number of individuals in Group B that withdraw due to decrement (2) by age 30 years old. (A) 2000 (B) 2100 (C) 2200 (D) 2300 (E) 2400 9.23 (3 points) Group B consists of 10,000 individuals observed at age 0. Group B is affected by both decrements (1) and (2). Determine the expected number of individuals in Group B that withdraw due to decrement (1) by age 30 years old. (A) 900 (B) 1000 (C) 1100 (D) 1200 (E) 1300 9.24 (2 points) There are two decrements death and retirement. You have estimated the following survival function for death, treating any retirements as right censoring the data: S(62) = 0.80. S(63) = 0.74. You have estimated the following survival function for retirement, treating any deaths as right censoring the data: S(62) = 0.77. S(63) = 0.60. Assume each decrement has a constant force over each year of age. Determine q62(death). (A) Less than 0.068 (B) At least 0.068, but less than 0.070 (C) At least 0.070, but less than 0.072 (D) At least 0.072, but less than 0.074 (E) At least 0.074
9.25 (8 points) You are studying a group of couples all of whom were married in their 20s. Couples leave the study for only two reasons: Death or Divorce. Of the original couples, 1000 are still married 30 years after they were wed.

Time from Marriage   Number of Deaths   Number of Divorces
30 to 31 years       11                 23
31 to 32 years       14                 19
32 to 33 years       13                 20
33 to 34 years       12                 25
34 to 35 years       17                 21

Assume deaths and divorces are uniformly distributed throughout each interval.
Using the product limit estimator, estimate the probability of a marriage that has survived 30 years ending due to divorce prior to reaching 35 years.

Use the following information for the next four questions:
An insurance company has sold term life insurance policies. Sometimes an insured will stop paying their premium and insurance coverage will no longer be provided; in other words, their policy will lapse. There are two decrements: death and lapses.
You have estimated the following survival function for death, treating any lapses as right censoring the data: S(80) = 0.391. S(81) = 0.350. S(82) = 0.301.
You have estimated the following survival function for lapses, treating any deaths as right censoring the data: S(80) = 0.537. S(81) = 0.485. S(82) = 0.440.
Assume each decrement has a constant force over each year of age.
9.30 (CAS3, 11/03, Q.11) (2.5 points) Given:

         qx(1)   qx(2)   qx(3)   qx(T)
x < 40   0.10    0.04    0.02    0.16
x ≥ 40   0.20    0.04    0.02    0.26
Calculate 5|q38(1).
A. Less than 0.06
B. At least 0.06, but less than 0.07
C. At least 0.07, but less than 0.08
D. At least 0.08, but less than 0.09
E. At least 0.09
9.31 (CAS3, 5/04, Q.6) (2.5 points) A biological experiment begins with 100 identical independent trials. Each test can be terminated on any day for the following reasons: (1) human error, (2) successful completion of the experiment, or (3) all other reasons.
Given the following triple-decrement mortality table:

x   qx(1)   qx(2)   qx(3)
0   0.08    0.02    0.10
1   0.10    0.25    0.20
2   0.20    0.65    0.15

Calculate the expected number of tests to reach a successful completion during the third day.
A. Less than 10
B. At least 10, but less than 20
C. At least 20, but less than 30
D. At least 30, but less than 40
E. At least 40
9.32 (4, 11/06, Q.17 & 2009 Sample Q.261) (2.9 points) For a double-decrement study, you are given:
(i) The following survival data for individuals affected by both decrements (1) and (2):

j   cj   qj(T)
0   0    0.100
1   20   0.182
2   40   0.600
3   60   1.000
(ii) q′j(2) = 0.05 for all j (iii) Group A consists of 1000 individuals observed at age 0. (iv) Group A is affected by only decrement (1). Determine the Kaplan-Meier multiple-decrement estimate of the number of individuals in Group A that survive to be at least 40 years old. (A) 343 (B) 664 (C) 736 (D) 816 (E) 861
Solutions to Problems:

9.1. D., 9.2. B., and 9.3. C. For death, lapsing censors the observations from above. There is one death between ages 64 and 65, at age 64.6. At age 64.6 the risk set is of size 43.
Thus for death S(65)/S(64) = 42/43. q64ʼ(d) = 1 - 42/43 = 1/43 = 0.0233.
For lapses, death censors the observations from above. There are two lapses between ages 64 and 65, at ages 64.4 and 64.8. (The lapse at age 64 will not affect S(65)/S(64), since the corresponding term in the product will appear in both the numerator and denominator.) At age 64.4 the risk set is of size 44. At age 64.8 the risk set is of size 42.
Thus for lapses S(65)/S(64) = (43/44)(41/42) = p64ʼ(l). q64ʼ(l) = 1 - (43/44)(41/42) = 0.0460.
p64(T) = (1 - 0.0233)(1 - 0.0460) = 0.9318.
The probability that Andrewʼs policy will still be in effect just before he turns 65 is 93.18%.
q64(T) = 1 - (1 - 0.0233)(1 - 0.0460) = 1 - 0.9318 = 0.0682.
q64(d) = (0.0682) ln(1 - 0.0233) / ln(0.9318) = 0.0228.
The probability that during the next year the insurer will pay a death benefit on Andrewʼs policy is the probability that he will die without his policy having lapsed first: 2.28%.
q64(l) = (0.0682) ln(1 - 0.0460) / ln(0.9318) = 0.0455. p64(l) = 1 - 0.0455 = 0.9545.
The probability that Andrewʼs policy will not lapse during the coming year is 95.45%.
Alternately, we can start at age 60. For death, treating any lapses as censoring from above:

ti     si   ri   (ri-si)/ri   S(ti)/S(60)
62.1   1    29   0.9655       0.9655
62.8   1    28   0.9643       0.9310
63.6   1    46   0.9783       0.9108
64.6   1    43   0.9767       0.8896
p64ʼ(d) = 0.8896/0.9108 = 0.9767. q64ʼ(d) = (0.9108 - 0.8896)/0.9108 = 0.0233.
For lapses, treating any deaths as censoring from above:

ti     si   ri   (ri-si)/ri   S(ti)/S(60)
61.3   1    30   0.9667       0.9667
63.5   1    47   0.9787       0.9461
64.0   1    45   0.9778       0.9251
64.4   1    44   0.9773       0.9041
64.8   1    42   0.9762       0.8825
p 64ʼ(l) = 0.8825/0.9251 = 0.9540. q64ʼ(l) = (0.9251 - 0.8825)/0.9251 = 0.0460. Proceed as before.
Comment: The probability that within the next year Andrew will die whether or not the policy is still in effect is: q64ʼ(d) = 2.33%. The probability that within the next year Andrew will die, while the policy is still in effect, and thus the insurer will pay a death benefit is: q64(d) = 2.28%. 2.28% < 2.33%. The probability that Andrew dies during the year, but that his policy had lapsed first, and therefore the insurer did not pay a death benefit is: 2.33% - 2.28% = 0.05%.
The probability that within the next year conditional on Andrew not dying his policy will lapse is: q64ʼ(l) = 4.60%. The probability that within the next year Andrew will let his policy lapse (while he is still alive) is: q64(l) = 4.55%. There is a chance that Andrew dies during the year, but if he had lived his policy would have lapsed during the year. 4.55% < 4.60%. An insured can die either before or after their policy lapses, but their policy can only lapse while they are alive, in other words prior to dying.
The probability that Andrewʼs policy will still be in effect just before he turns 65 is 93.18%. The probability that Andrewʼs policy will not be in effect just before he turns 65 is: 6.82% = q64(T). The probability that Andrewʼs policy will not lapse during the coming year is 95.45%. The probability that Andrewʼs policy will lapse during the coming year is: 4.55% = q64(l). The probability that during the next year the insurer will pay a death benefit is: 2.28% = q64(d). Subject to rounding, 4.55% + 2.28% = 6.82% = q64(T).

9.4. B. & 9.5. D. For death, surrender censors the observations from above.

ti    si   ri   si/ri   H(ti)    si/ri²     Var[H(ti)]
1.3   1    30   0.033   0.0333   0.001111   0.001111
4.5   1    27   0.037   0.0704   0.001372   0.002483
6.6   1    35   0.029   0.0989   0.000816   0.003299
8.2   1    33   0.030   0.1292   0.000918   0.004217
9.6. E. & 9.7. E. For surrender, death censors the observations from above.

ti    si   ri   si/ri   H(ti)    si/ri²     Var[H(ti)]
2.1   1    29   0.034   0.0345   0.001189   0.001189
3.8   1    28   0.036   0.0702   0.001276   0.002465
6.0   1    36   0.028   0.0980   0.000772   0.003236
7.4   1    34   0.029   0.1274   0.000865   0.004101
8.8   1    32   0.031   0.1586   0.000977   0.005078
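Both tables follow the same recipe, so here is a minimal sketch (plain Python) that reproduces them from the (si, ri) pairs; the risk sets are the ones shown above.

def nelson_aalen(events):
    """events: list of (s_i, r_i) in time order. Returns (H, Var[H])."""
    H = var = 0.0
    for s, r in events:
        H += s / r
        var += s / r**2
    return H, var

# Deaths (9.4 and 9.5): risk sets 30, 27, 35, 33.
print(nelson_aalen([(1, 30), (1, 27), (1, 35), (1, 33)]))            # (0.1292, 0.00422)
# Surrenders (9.6 and 9.7): risk sets 29, 28, 36, 34, 32.
print(nelson_aalen([(1, 29), (1, 28), (1, 36), (1, 34), (1, 32)]))   # (0.1586, 0.00508)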
9.8. D. For death, treating surrender as right censoring: r78 = 400 - 12/2 = 394.
p78ʼ(d) = S(79)/S(78) = (394 - 28)/394. q78ʼ(d) = 28/394 = 14/197.
For surrender, treating death as right censoring: r78 = 400 - 28/2 = 386.
p78ʼ(s) = S(79)/S(78) = (386 - 12)/386. q78ʼ(s) = 12/386 = 6/193.
q78(T) = 1 - (1 - 14/197)(1 - 6/193) = 1 - (183/197)(187/193) = 0.09994.
q78(d) = q78(T) ln(1 - q78ʼ(d))/ln(1 - q78(T)) = (0.09994)ln(183/197)/ln(1 - 0.09994) = (0.09994)(-0.07372)/(-0.10529) = 0.0700.
Comment: q78(s) = q78(T)ln(1 - q78ʼ(s))/ln(1 - q78(T)) = (0.09994)ln(187/193)/ln(1 - 0.09994) = 0.0300.
9.9. C. For death, treating surrender as right censoring: r79 = 360 - 10/2 = 355.
p79ʼ(d) = S(80)/S(79) = (355 - 32)/355. q79ʼ(d) = 32/355.
For surrender, treating death as right censoring: r79 = 360 - 32/2 = 344.
p79ʼ(s) = S(80)/S(79) = (344 - 10)/344. q79ʼ(s) = 10/344.
q79(T) = 1 - (1 - 32/355)(1 - 10/344) = 1 - (323/355)(334/344) = 0.11659.
q79(s) = q79(T) ln(1 - q79ʼ(s))/ln(1 - q79(T)) = (0.11659)ln(334/344)/ln(1 - 0.11659) = 0.0277.
Comment: q79(d) = q79(T)ln(1 - q79ʼ(d))/ln(1 - q79(T)) = 0.0888.
9.10. A. q68ʼ(d) = (0.7 - 0.65)/0.7 = 0.07143. q68ʼ(r) = (0.3 - 0.2)/0.3 = 0.33333.
q68(T) = 1 - (1 - 0.07143)(1 - 0.33333) = 0.38095.
q68(r) = (0.38095)ln(1 - 0.33333)/ln(1 - 0.38095) = 0.3221.
Comment: q68(d) = (0.38095)ln(1 - 0.07143)/ln(1 - 0.38095) = 0.0589.
Note that q68(r) + q68(d) = 0.3221 + 0.0589 = 0.3810 = q68(T).
9.11. For example, q65(T) = 1 - (1 - 0.03)(1 - 0.04)(1 - 0.15) = 0.2085.
q65(1) = (0.2085)ln(1 - 0.03)/ln(1 - 0.2085) = 0.0272.

x    qxʼ(1)   qxʼ(2)   qxʼ(3)   qx(T)    qx(1)    qx(2)    qx(3)
63   0.020    0.030    0.250    0.2871   0.0171   0.0258   0.2441
64   0.025    0.035    0.200    0.2473   0.0220   0.0310   0.1942
65   0.030    0.040    0.150    0.2085   0.0272   0.0364   0.1449
Comment: Multiple decrement table taken from 3, 11/01, Q.38. The solution to that question is:
2q64(1) = q64(1) + p64(T) q65(1) = 0.0220 + (1 - 0.2473)(0.0272) = 0.0425.
9.12. A. & 9.13. B. For death, surrender censors the observations from above.

ti    si   ri   si/ri   H(ti)    si/ri²     Var[H(ti)]
1.8   1    10   0.100   0.1000   0.010000   0.010000
2.5   1    9    0.111   0.2111   0.012346   0.022346
3.6   1    6    0.167   0.3778   0.027778   0.050123
Comment: Similar to the data in Table 13.4 in Loss Models.
9.14. D. & 9.15. E. For surrender, death censors the observations from above.

ti    si   ri   si/ri   H(ti)    si/ri²     Var[H(ti)]
3.0   1    8    0.125   0.1250   0.015625   0.015625
3.3   1    7    0.143   0.2679   0.020408   0.036033
4.1   1    5    0.200   0.4679   0.040000   0.076033
Comment: This is an example of a situation with two decrements, death and surrender.
For death, treating surrender as censoring, Ŝ(5) = e^-0.3778 = 0.685.
For surrender, treating death as censoring, Ŝ(5) = e^-0.4679 = 0.626.
These correspond to single decrement rates. Where x is the age at issue of these policies,
5qxʼ(d) = Prob[death within 5 years | other decrements do not occur] = 1 - 0.685 = 0.315,
and 5qxʼ(s) = Prob[surrender within 5 years | other decrements do not occur] = 1 - 0.626 = 0.374.
9.16. For death due to all causes:

ti   si   ri    (ri-si)/ri   S(ti)/S(20)
22   1    100   0.9900       0.9900
24   2    98    0.9796       0.9698
26   1    96    0.9896       0.9597
28   1    93    0.9892       0.9494
For death due to accidents, death due to other causes censors the observations from above:

ti   si   ri    (ri-si)/ri   S(ti)/S(20)
22   1    100   0.9900       0.9900
24   1    98    0.9898       0.9799
28   1    93    0.9892       0.9694
For death due to other causes, death due to accidents censors the observations from above:

ti   si   ri   (ri-si)/ri   S(ti)/S(20)
24   1    98   0.9898       0.9898
26   1    96   0.9896       0.9795
9.17. D. & 9.18. B. For death, surrender censors the observations from above.

yi    si   ri   (ri-si)/ri   S(yi)   si/{ri(ri-si)}   Cum. Sum   Var[S(yi)]
1.3   1    30   0.967        0.967   0.001149         0.001149   0.001074
4.5   1    27   0.963        0.931   0.001425         0.002574   0.002230
6.6   1    35   0.971        0.904   0.000840         0.003414   0.002792
8.2   1    33   0.970        0.877   0.000947         0.004361   0.003353

Ŝ(10) = ∏ (ri - si)/ri over the four death times = 0.877.
V̂[Ŝ(10)] = Ŝ(10)² Σ si/{ri(ri - si)} = (0.877²)(0.004361) = 0.00335.
Comment: Policy #1 is first observed at time = 0. At time = 1.3 the policyholder dies. Policy #2 is first observed at time = 0. At time = 2.1 the policyholder surrenders the policy, meaning our observation is censored from above. Policy #31 is first observed at time = 5, meaning our data is truncated from below. At time = 6.6 this policyholder dies. Policy #32 is first observed at time = 5. At time = 8.8 this policyholder surrenders the policy, meaning our observation is censored from above. When dealing with the decrement due to death, we only get contributions to the estimator at times of death. The last death observed takes place at 8.2. To get the risk set at t = 8.2, add up all the policies on which we could have observed a death at time 8.2. For example, policy #1 we could not because that policyholder died at time 1.3. Policy #6 we could not because, that policy had been surrendered at time 7.4. For policy # 10 we could. For policy # 31 we could not because that policyholder died at time 6.6. For policy # 32 we could, because that policy was in force from t = 5 until time 8.8. For policy # 37 we could, since that policy was in force from t = 5 to t = 10. The risk set contains: At time 1.3, policies 1 to 30, a total of 30. At time 4.5, policies 4 to 30, a total of 27. At time 6.6, policies 6 to 30 and 31 to 40, a total of 25 + 10 = 35. At time 8.2, policies 7 to 30 and 32 to 40, a total of 24 + 9 = 33.
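The product-limit and Greenwood computations in 9.17-9.20 follow the same pattern; the sketch below (plain Python) is only an illustration, and reproduces both answers from the (si, ri) pairs in the tables.

def kaplan_meier(events):
    """events: list of (s_i, r_i) in time order.
    Returns (S, Greenwood approximation to Var[S])."""
    S, greenwood_sum = 1.0, 0.0
    for s, r in events:
        S *= (r - s) / r
        greenwood_sum += s / (r * (r - s))
    return S, S**2 * greenwood_sum

# Deaths (9.17 and 9.18): risk sets 30, 27, 35, 33.
print(kaplan_meier([(1, 30), (1, 27), (1, 35), (1, 33)]))            # (0.877, 0.00335)
# Surrenders (9.19 and 9.20): risk sets 29, 28, 36, 34, 32.
print(kaplan_meier([(1, 29), (1, 28), (1, 36), (1, 34), (1, 32)]))   # (0.851, 0.00380)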
9.19. A. & 9.20. C. For surrender, death censors the observations from above.

yi    si   ri   (ri-si)/ri   S(yi)   si/{ri(ri-si)}   Cum. Sum   Var[S(yi)]
2.1   1    29   0.966        0.966   0.001232         0.001232   0.001148
3.8   1    28   0.964        0.931   0.001323         0.002554   0.002214
6.0   1    36   0.972        0.905   0.000794         0.003348   0.002743
7.4   1    34   0.971        0.879   0.000891         0.004239   0.003272
8.8   1    32   0.969        0.851   0.001008         0.005247   0.003801

Ŝ(10) = ∏ (ri - si)/ri over the five surrender times = 0.851.
V̂[Ŝ(10)] = Ŝ(10)² Σ si/{ri(ri - si)} = (0.851²)(0.005247) = 0.00380.
Comment: When dealing with the decrement due to surrender, we only get contributions to the estimator at times of surrender. 9.21. D. p′j(1) p′j(2) = p′j(T) . p′0(2) = (1 - 0.135)/(1 - 0.060) = 0.9202. p′1(2) = (1 - 0.127)/(1 - 0.030) = 0.9000. p′2(2) = (1 - 0.098)/(1 - 0.040) = 0.9396. (10,000)(0.9202)(0.9000)(0.9396) = 7782. Comment: Similar to 4, 11/06, Q.17. 9.22. B. qj(2) = qj(T) ln( p′j(2) )/ ln(pj(T)). q0 (2) = (0.135)ln(0.9202) / ln(1 - 0.135) = 0.0774. q1 (2) = (0.127)ln(0.9000) / ln(1 - 0.127) = 0.0985. q2 (2) = (0.098)ln(0.9396) / ln(1 - 0.098) = 0.0592. Expected number who withdraw from age 0 to 10 due to decrement (2): (10,000)(0.0774) = 774. Expected number who survive to age 10: (10,000)(1 - 0.135) = 8650. Expected number who withdraw from age 10 to 20 due to decrement (2): (8650)(0.0985) = 852. Expected number who survive to age 20: (8650)(1 - 0.127) = 7551. Expected number who withdraw from age 20 to 30 due to decrement (2): (7551)(0.0592) = 447. Expected number who withdraw by age 30 due to decrement (2): 774 + 852 + 447 = 2073.
9.23. C. qj(1) = qj(T) ln( p′j(1) )/ ln(pj(T)). q0 (1) = (0.135)ln(1 - 0.060) / ln(1 - 0.135) = 0.0576. q1 (1) = (0.127)ln(1 - 0.030) / ln(1 - 0.127) = 0.0285. q2 (1) = (0.098)ln(1 - 0.040) / ln(1 - 0.098) = 0.0388. Expected number who withdraw from age 0 to 10 due to decrement (1): (10,000)(0.0576) = 576. Expected number who survive to age 10: (10,000)(1 - 0.135) = 8650. Expected number who withdraw from age 10 to 20 due to decrement (1): (8650)(0.0285) = 247. Expected number who survive to age 20: (8650)(1 - 0.127) = 7551. Expected number who withdraw from age 20 to 30 due to decrement (1): (7551)(0.0388) = 293. Expected number who withdraw by age 30 due to decrement (1): 576 + 247 + 293 = 1116. Comment: q0 (1) + q0 (2) = 0.0576 + 0.0774 = 0.135 = q0 (T). Number expected to survive to age 30: (10000)(1 - .135)(1 - .127)(1 - .098) = 6811. Number expected to withdraw by age 30 due to decrement (1) is 1116. Number expected to withdraw by age 30 due to decrement (2) is 2073. 6811 + 1116 + 2073 = 10,000. 9.24. A. q62ʼ(d) = (0.8 - 0.74)/0.8 = 0.075. q62ʼ(r) = (0.77 - 0.60)/0.77 = 0.2208. q62(T) = 1 - (1 - 0.075)(1 - 0.2208) = 0.2792 q62(d) = (0.2792)ln(1 - 0.075)/ln(1 - 0.2792) = 0.0665. Comment: q62(r) = (0.2792)ln(1 - 0.2208)/ln(1 - 0.2792) = 0.2128. While this question covered material on the syllabus of Exam 3L and Exam MLC, it is similar to 4, 11/06, Q.17. Therefore, you might be asked a similar question on your exam.
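A compact restatement of the chained calculation in 9.22 and 9.23 (plain Python), using the q′j(1) and qj(T) values from the problem. The written solution rounds each intermediate result, which is why it shows 1116 for decrement (1) while the unrounded chain gives about 1115.

import math

rows = [  # (q'_j(1), q_j(T)) for j = 0, 1, 2, i.e. ages 0-10, 10-20, 20-30
    (0.060, 0.135),
    (0.030, 0.127),
    (0.040, 0.098),
]

lives = 10000.0
withdrawals_1 = withdrawals_2 = 0.0
for q1_prime, q_total in rows:
    p_total = 1 - q_total
    q2_prime = 1 - p_total / (1 - q1_prime)      # since p(T) = p'(1) p'(2)
    q1 = q_total * math.log(1 - q1_prime) / math.log(p_total)
    q2 = q_total * math.log(1 - q2_prime) / math.log(p_total)
    withdrawals_1 += lives * q1
    withdrawals_2 += lives * q2
    lives *= p_total

print(round(withdrawals_1), round(withdrawals_2))   # about 1115 and 2073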
9.25. For death, treating divorce as right censoring: r30 = 1000 - 23/2 = 988.5.
p 30ʼ(death) = (988.5 - 11)/988.5 = 0.98887.
r31 = 1000 - 11 - 23 - 19/2 = 956.5.
p 31ʼ(death) = (956.5 - 14)/956.5 = 0.98536.
r32 = 1000 - 11 - 23 - 14 - 19 - 20/2 = 923.
p 32ʼ(death) = (923 - 13)/923 = 0.98592.
r33 = 1000 - 11 - 23 - 14 - 19 - 13 - 20 - 25/2 = 887.5. p 33ʼ(death) = (887.5 - 12)/887.5 = 0.98647. r34 = 1000 - 11 - 23 - 14 - 19 - 13 - 20 - 12 - 25 - 21/2 = 852.5. p 34ʼ(death) = (852.5 - 17)/852.5 = 0.98006. For divorce, treating death as right censoring: r30 = 1000 - 11/2 = 994.5.
p 30ʼ(divorce) = (994.5 - 23)/994.5 = 0.97687.
r31 = 1000 - 11 - 23 - 14/2 = 959.
p 31ʼ(divorce) = (959 - 19)/959 = 0.98019.
r32 = 1000 - 11 - 23 - 14 - 19 - 13/2 = 926.5.
p 32ʼ(divorce) = (926.5 - 20)/926.5 = 0.97841.
r33 = 1000 - 11 - 23 - 14 - 19 - 13 - 20 - 12/2 = 894. p 33ʼ(divorce) = (894 - 25)/894 = 0.97204. r34 = 1000 - 11 - 23 - 14 - 19 - 13 - 20 - 12 - 25 - 17/2 = 854.5. p 34ʼ(divorce) = (854.5 - 21)/854.5 = 0.97542.
p 30(T) = (0.98887)(0.97687) = 0.96600.
q30(divorce) = q30(T) ln(p30ʼ(divorce))/ln(p30(T)) = (0.03400)ln(0.97687)/ln(0.96600) = 0.02300. p 31(T) = (0.98536)(0.98019) = 0.96584. q31(divorce) = q31(T) ln(p31ʼ(divorce))/ln(p31(T)) = (0.03416)ln(0.98019)/ln(0.96584) = 0.01967. p 32(T) = ( 0.98592)(0.97841) = 0.96463. q32(divorce) = q32(T) ln(p32ʼ(divorce))/ln(p32(T)) = (0.03537)ln(0.97841)/ln(0.96463) = 0.02144. p 33(T) = (0.98647)(0.97204) = 0.95889. q33(divorce) = q33(T) ln(p33ʼ(divorce))/ln(p33(T)) = (0.04111)ln(0.97204)/ln(0.95889) = 0.02777. p 34(T) = (0.98006)(0.97542) = 0.95597. q34(divorce = q34(T) ln(p34ʼ(divorce))/ln(p34(T)) = (0.04403)ln(0.97542)/ln(0.95597) = 0.02434. 5 p 30(divorce)
= (1 - 0.02300) (1 - 0.01967) (1 - 0.02144) (1 - 0.02777) (1 - 0.02434) = 0.8890.
5 q 30(divorce)
= 1 - 0.8890 = 0.1110.
Comment: The probability in these five years of a marriage not ending due to divorce is 88.90%; this would mean that either the marriage does not end or ends due to death.
9.26. E., 9.27. A., 9.28. D., and 9.29. C. q80ʼ(d) = (0.391 - 0.350)/0.391 = 0.1049. q80ʼ(l) = (0.537 - 0.485)/0.537 = 0.0968. p 80(T) = (1 - 0.1049)(1 - 0.0968) = 0.8085.
q80(T) = 1 - 0.8085 = 0.1915.
q80(l) = (0.1915) ln(1 - 0.0968) / ln(0.8085) = 0.0917. p80(l) = 1 - 0.0917 = 90.83%. q80(d) = (0.1915) ln(1 - 0.1049) / ln(0.8085) = 9.98%. q81ʼ(d) = (0.350 - 0.301)/0.350 = 0.1400. q81ʼ(l) = (0.485 - 0.440)/0.485 = 0.0928. p 81(T) = (1 - 0.1400)(1 - 0.0928) = 0.7802.
q81(T) = 1 - 0.7802 = 0.2198.
q81(d) = (0.2198) ln(1 - 0.1400) / ln(0.7802) = 0.1336. q81(l) = (0.2198) ln(1 - 0.0928) / ln(0.7802) = 0.0862. Assuming we have 10,000 lives each age 80. Then the expected number who will be alive and still have coverage at age 81 is: 10,000 p80(T) = (10,000) (0.8085) = 8085. Of these 8085 lives, the number we expect to lapse by age 82 is: (8085) (0.0862) = 697. The total number of lapses between age 80 and age 82 is: 917 + 697 = 1614. The probability of not lapsing within two years is: 1 - 1614 /10,000 = 83.86%. Of these 8085 lives, the number we expect to die while covered by age 82 is: (8085) (0.1336) = 1080. The total number of death benefits paid by age 82 is: 998 + 1080 = 2078. The probability of paying a death benefit within two years is: 2078 /10,000 = 20.78% Comment: Note that q80(d) + q80(l) = 0.0998 + 0.0917 = 0.1915 = q80(T). 9.17% = q80(l) is the chance that the policy will lapse over the next year. There are cases where the policy would have lapsed if the insured had remained alive, but the insured died first and therefore the policy did not lapse. 9.68% = q80ʼ(l) is the probability that the policy will lapse over the next year, conditional on the insured not dying over the next year. 9.98% = q80(d) is the chance over the next year that the insured will die prior to the policy lapsing, and therefore that the insurer will pay a death benefit. There are cases where the policy lapsed and then the insured died later in the year, and therefore the policy did not pay a death benefit. 10.49% = q80ʼ(d) is the probability that the insured will die over the next year, whether or not his policy is still providing coverage. The probability of still having coverage in two years is: p 80(T) p81(T) = (0.8085) (0.7802) = 0.6308.
9.30. A. For a life age 38, the probability of surviving five years is: (1 - 0.16)²(1 - 0.26)³ = 0.2859.
5|q38(1) = q43(1) Prob[surviving 5 years | age 38] = (0.2)(0.2859) = 0.0572.
Comment: Similarly, 5|q38(2) = (0.04)(0.2859) = 0.0114.
9.31. C. q0(T) = 0.08 + 0.02 + 0.10 = 0.20. p0(T) = 1 - 0.2 = 0.8.
q1(T) = 0.10 + 0.25 + 0.20 = 0.55. p1(T) = 1 - 0.55 = 0.45.
Expected number of tests to make it to day three: (0.8)(0.45)(100) = 36.
Expected number of tests to reach a successful completion during the 3rd day: (0.65)(36) = 23.4.
9.32. D. For a double-decrement study, q′j(i) is the probability that a person aged cj withdraws due to decrement (i) prior to age cj+1 in an environment where the other decrement is not possible.
p0(T) = p′0(1) p′0(2). ⇒ (1 - 0.1) = p′0(1)(1 - 0.05). ⇒ p′0(1) = 0.9/0.95 = 0.9474.
Similarly, p′1(1) = (1 - 0.182)/0.95 = 0.8611.
For Group A where only decrement 1 applies, S(40) = S(20) S(40)/S(20) = p′0(1) p′1(1) = 0.8158.
(1000)(0.8158) = 815.8.
Section 10, Important Formulas and Ideas

Introduction (Section 1)

Empirical Distribution Function at t = (number with age of death ≤ t) / total.
Empirical Survival Function at t = (number with age of death > t) / total = 1 - Empirical Distribution Function at t.
Empirical Distribution Function has mean of F(x) and a variance of: F(x) S(x) / N.
Empirical Survival Function has mean of S(x) and a variance of: F(x) S(x) / N.
Var[S(x) - S(y)] = (Probability in the interval)(1 - Probability in the interval) / N.
y-xpx ≡ Prob[Survival past y | Survival past x] = S(y)/S(x).
y-xqx ≡ Prob[Not Surviving past y | Survival past x] = {S(x) - S(y)}/S(x) = 1 - y-xpx.
px ≡ 1px = Prob[Survival past x+1 | Survival past x] = S(x+1)/S(x).
qx ≡ 1qx = Prob[Death within one year | Survival past x] = 1 - S(x+1)/S(x).
t|uqx ≡ Prob[x+t < time of death ≤ x+t+u | Survival past x] = {S(x+t) - S(x+t+u)}/S(x).
Risk set at age t = those individuals who could have failed at age t.
ri = number of lives that could have been observed to fail at age yi. si = number of lives that failed at age yi.
An observation x is right censored / censored from above at u if when x ≥ u it is recorded as u, but when x < u it is recorded as x.
An observation x is left truncated / truncated from below at d if when x ≤ d it is not recorded, but when x > d it is recorded as x.
h(x) = f(x)/S(x), x ≥ 0.
Cumulative Hazard Rate = H(x) = ∫₀ˣ h(t) dt.
S(x) = exp[-H(x)]. H(x) = -ln[S(x)].
The Exponential Distribution has constant hazard rate of 1/θ = 1/mean.

Kaplan-Meier Product-Limit Estimator (Sections 2 and 3)

yj is an uncensored value in the data. There are k unique values in the sample of size n.
sj ⇔ the number of values that failed at yj.
rj is the risk set at yj ⇔ the number of values that could have failed at yj.
Product-Limit Estimator of the Survival Function:
Sn(t) = ∏_{i=1}^{j-1} (ri - si)/ri, for yj-1 ≤ t < yj.
Sn(yk) = ∏_{i=1}^{k} (ri - si)/ri, where the product is taken over all yi.
Let w = maximum of the censorship points and data elements. Then for t > w, let:
Sn(t) = {∏_{i=1}^{k} (ri - si)/ri}^(t/w), where the product is taken over all yi:
an exponential decline in the survival function beyond the last observation or censorship point.
When there is no censoring or truncation, the Product-Limit estimator reduces to the empirical Survival Function.
When all the data is left truncated, in other words when nobody is in the study from birth, then the Product-Limit estimator gives an estimate of survival conditional on survival to the smallest truncation point.
A value left truncated at d enters the risk set at t > d, and is not in the risk set at d.
An individual that dies at age x is in the risk set at x.
A value right censored at u is in the risk set at u.
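A minimal Python sketch of the Product-Limit estimator, its exponential tail extrapolation, and Greenwoodʼs approximation for its variance (the Greenwood formula is stated a little further below); the (yj, rj, sj) triples are illustrative assumptions:

```python
# Illustrative death times with risk sets r_j and deaths s_j (assumed values):
data = [  # (y_j, r_j, s_j)
    (1.0, 30, 3),
    (2.5, 25, 4),
    (4.0, 18, 2),
    (6.0, 10, 5),
]

def product_limit(t):
    """Kaplan-Meier estimate: product over y_j <= t of (r_j - s_j)/r_j."""
    S = 1.0
    for y, r, s in data:
        if y <= t:
            S *= (r - s) / r
    return S

def greenwood_var(t):
    """Greenwood: S_n(t)^2 times the sum over y_j <= t of s_j / (r_j (r_j - s_j))."""
    total = sum(s / (r * (r - s)) for y, r, s in data if y <= t)
    return product_limit(t) ** 2 * total

w = 6.0  # largest observation or censorship point (assumed)
def S_tail(t):
    """Exponential extrapolation beyond w: S_n(w)^(t/w)."""
    return product_limit(w) ** (t / w)

print(product_limit(4.0), greenwood_var(4.0), S_tail(9.0))
```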
rj = (Number of yis ≥ yj) + (Number of uis ≥ yj) - (Number of dis ≥ yj).
⇔ Risk set at time of death yj = (Number who die at time ≥ yj) + (Number who leave the study at time ≥ yj) - (Number who enter the study at time ≥ yj).
Risk set at time of death yj = (Number who enter the study at time < yj) - (Number who die at time < yj) - (Number who leave the study at time < yj).
⇔ rj = (Number of dis < yj) - (Number of yis < yj) - (Number of uis < yj).
rj = rj-1 + (Number of dis: yj-1 ≤ di < yj) - sj-1 - (Number of uis: yj-1 ≤ ui < yj).
⇔ Change in the risk set = (Number of entries) - (Number of deaths) - (Number of withdrawals).
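A minimal sketch of the first of these risk-set relations, using small illustrative vectors of entry times d, death times y, and withdrawal times u:

```python
# Illustrative entry (d), death (y), and withdrawal (u) times:
entries     = [0, 0, 0, 2, 3]     # d_i: when each life joins the study
death_times = [4, 6, 8]           # y_j: observed times of death
withdrawals = [5, 7]              # u_i: times lives left the study (censored)

def risk_set(yj):
    """r_j = #(deaths >= y_j) + #(withdrawals >= y_j) - #(entries >= y_j)."""
    return (sum(1 for y in death_times if y >= yj)
            + sum(1 for u in withdrawals if u >= yj)
            - sum(1 for d in entries if d >= yj))

print([risk_set(y) for y in death_times])   # [5, 3, 1] for this illustrative data
```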
Greenwoodʼs Approximation: Var[Sn(yj)] ≅ [Sn(yj)]² Σ_{i=1}^{j} si / {ri (ri - si)}.
In the absence of truncation and censoring, Greenwoodʼs approximation gives the exact result for the variance of the Empirical Survival Function.

Grouped Data (Section 4)

di = number of left truncated values in interval i, [ci, ci+1).
ui = number of right censored values in interval i, (ci, ci+1].
xi = number of deaths (failures) in interval i, (ci, ci+1].
Pi = number of lives that were in interval i-1 and contribute to ri, the risk set for interval i.
Pi = Pi-1 + di-1 - xi-1 - ui-1.
Assume that 0 ≤ α ≤ 1 of the truncated values in the interval are in the risk set.
Assume that 0 ≤ β ≤ 1 of the censored values in the interval are no longer in the risk set.
ri = Pi + αdi - βui.
If α = 1/2 and β = 1/2, then ri = (Pi + Pi+1 + xi)/2.
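A minimal sketch of this grouped-data recursion with hypothetical interval counts; α = β = 1/2 is assumed, and everyone is taken to enter the study via the entry counts di:

```python
# Hypothetical counts per interval: d_i entries, x_i deaths, u_i withdrawals.
d = [100, 20, 10]
x = [5, 8, 6]
u = [10, 12, 15]
alpha = beta = 0.5

P = [0]                                   # P_0 = 0: nobody in the study before interval 0
for i in range(len(d)):
    P.append(P[i] + d[i] - x[i] - u[i])   # P_{i+1} = P_i + d_i - x_i - u_i

r = [P[i] + alpha * d[i] - beta * u[i] for i in range(len(d))]
print(r)                                  # risk set for each interval
# Check the shortcut for alpha = beta = 1/2: r_i = (P_i + P_{i+1} + x_i)/2.
print([(P[i] + P[i + 1] + x[i]) / 2 for i in range(len(d))])
```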
Nelson-Aalen Estimator (Sections 5 and 6)

H^(yj) = Σ_{i=1}^{j} si/ri.
S^(yj) = exp[-H^(yj)].
Var[H^(yj)] = Σ_{i=1}^{j} si/ri².
Var[S^(yj)] ≅ S^(yj)² Var[H^(yj)].
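A minimal sketch of the Nelson-Aalen estimator and its variance, reusing the same illustrative (yj, rj, sj) triples as in the Kaplan-Meier sketch above:

```python
import math

# Illustrative death times with risk sets r_j and deaths s_j (assumed values):
data = [(1.0, 30, 3), (2.5, 25, 4), (4.0, 18, 2), (6.0, 10, 5)]

def nelson_aalen(t):
    """H^(t) = sum over y_j <= t of s_j/r_j, plus its estimated variance sum of s_j/r_j^2."""
    H = sum(s / r for y, r, s in data if y <= t)
    var_H = sum(s / r ** 2 for y, r, s in data if y <= t)
    return H, var_H

H, var_H = nelson_aalen(4.0)
S = math.exp(-H)                 # S^(t) = exp[-H^(t)]
var_S = S ** 2 * var_H           # Var[S^] ≈ S^2 Var[H^]
print(H, var_H, S, var_S)
```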
Log-Transformed Confidence Intervals (Section 7)

y is defined by Φ(y) = (1 + P)/2, where P is the confidence level.
U = exp[ y √Var[Sn(t)] / {Sn(t) ln Sn(t)} ].
Log-transformed confidence interval for S(t) is: (Sn(t)^(1/U), Sn(t)^U).
U = exp[ -y √Var[H^(t)] / H^(t) ].
Log-transformed confidence interval for H(t) is: (H^(t) U, H^(t)/U).
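A minimal sketch of both log-transformed intervals; the point estimates and variances are illustrative values, and 1.96 is the y corresponding to a 95% confidence level:

```python
import math

y = 1.96                      # Phi(y) = (1 + 0.95)/2 for a 95% confidence level

# Illustrative point estimates and variances (assumed values):
S_n, var_S = 0.60, 0.0040     # product-limit estimate and its Greenwood variance
H_hat, var_H = 0.51, 0.0110   # Nelson-Aalen estimate and its variance

# Interval for S(t): (S^(1/U), S^U) with U = exp[y sqrt(var) / (S ln S)].
U_S = math.exp(y * math.sqrt(var_S) / (S_n * math.log(S_n)))
ci_S = (S_n ** (1 / U_S), S_n ** U_S)

# Interval for H(t): (H U, H/U) with U = exp[-y sqrt(var) / H].
U_H = math.exp(-y * math.sqrt(var_H) / H_hat)
ci_H = (H_hat * U_H, H_hat / U_H)

print(ci_S, ci_H)
```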
Maximum Likelihood (Section 8)

Likelihood = Π f(xi). Loglikelihood = Σ ln f(xi).
Find the set of parameters such that either Π f(xi) or Σ ln f(xi) is maximized.
For the Exponential Distribution with ungrouped data, in the absence of truncation and censoring, the method of maximum likelihood gives the same result as the method of moments.
For the Exponential Distribution, the maximum likelihood fit to ungrouped data is:
θ^ = (sum of payments) / (number of uncensored values).
For the Uniform Distribution on the interval (0, ω) with ungrouped data, in the absence of truncation and censoring, the maximum likelihood fit of ω is the maximum of the observations.
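A minimal sketch of the Exponential maximum likelihood fit with right censoring (here, payments capped at an assumed policy limit); the losses are illustrative:

```python
# Illustrative ground-up losses and a policy limit that right censors payments:
losses = [300, 800, 1200, 2500, 7000, 9000]
limit = 5000

payments = [min(x, limit) for x in losses]           # censored values recorded at the limit
n_uncensored = sum(1 for x in losses if x < limit)   # only uncensored values in the denominator

theta_hat = sum(payments) / n_uncensored             # MLE of theta for the Exponential
print(theta_hat)                                     # (300+800+1200+2500+5000+5000)/4 = 3700
```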
Multiple Decrements (Section 9)

tqxʼ(1) = Prob[failure within t years due to decrement 1 | other decrements do not occur] = single decrement rate due to decrement 1.
One can use the methods previously discussed to estimate the survival function for each decrement separately and thus the single decrement rate; leaving due to another decrement is treated as right censoring.
tqx(1) = Prob[failure within t years due to decrement 1].
tqx(T) = Prob[failure within t years due to any of the decrements].
tpx(T) = Π tpxʼ(g).
tqx(T) = Σ tqx(g).
qx(g) = qx(T) ln[1 - qxʼ(g)] / ln[1 - qx(T)].
tqx(T) = 1 - Π (1 - tqxʼ(g)).
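A minimal sketch of these conversion formulas as helper functions; the sample rates are the age-81 single decrement rates used in the solutions earlier in this section:

```python
import math

def total_p(p_primes):
    """p(T) = product of the single decrement survival probabilities p'(g)."""
    out = 1.0
    for p in p_primes:
        out *= p
    return out

def q_from_single(q_prime_g, q_T):
    """q(g) = q(T) ln[1 - q'(g)] / ln[1 - q(T)]."""
    return q_T * math.log(1 - q_prime_g) / math.log(1 - q_T)

# Single decrement rates at age 81 from the earlier solution (death, lapse):
q_primes = [0.1400, 0.0928]
q_T = 1 - total_p([1 - q for q in q_primes])   # 1 - (0.86)(0.9072) = 0.2198
print(q_T)
print([round(q_from_single(qp, q_T), 4) for qp in q_primes])   # ≈ [0.1336, 0.0862]
```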